experience-based access management & privacy-preserving record linkage, elizabeth ashley durham, thursday, november 11, 2010


Page 1: experience-based access management & privacy-preserving record linkage

experience-based access management & privacy-preserving record linkage

elizabeth ashley durham
thursday, november 11, 2010

Page 2: experience-based access management & privacy-preserving record linkage

TRUST 2010 2

roadmap

• experience-based access management
• privacy-preserving record linkage
  – definition
  – steps in record linkage
  – experiment
  – conclusions
  – open research questions in record linkage


Page 4: experience-based access management & privacy-preserving record linkage

access management

• Least Privilege: How can we limit provider access to only the information required to do their job?

• Identity and Access Management (IAM)
  – ex: role-based access controls

• IAM in health care organizations
  – complex workflow
  – routine emergencies

TRUST 2010 4

Page 5: experience-based access management & privacy-preserving record linkage

5

the problem with access controls

Ideal Model

Enforced Control

the problem

TRUST 2010

L. Røstad and Ø. Nytrø. Access control and integration of health care systems: an experience report and future challenges. Proc. Availability, Reliability & Security, 2007; 871-878.

study: 43% of providers accessed records for which they did not have permissions

Page 6: experience-based access management & privacy-preserving record linkage

6

the experience-based access management (EBAM) lifecycle

Ideal Model

Enforced Control

Access Log Expected Model

TRUST 2010

C. Gunter, D. Liebovitz, and B. Malin. “EBAM: Experience-Based Access Management for Healthcare”. USENIX HealthSec’10 workshop

For more information, see:
• USENIX Health Security workshop: http://www.usenix.org/event/healthsec10/
• Copy of the paper: http://seclab.uiuc.edu/pubs/GunterML10.pdf
• Video of the presentation: http://www.usenix.org/multimedia/healthsec10gunter

Page 7: experience-based access management & privacy-preserving record linkage

TRUST 2010 7

record linkage in surveillance

access logs

“Karen Lewis”

human resources

“Karen Lewis”

hospital privacy office

Page 8: experience-based access management & privacy-preserving record linkage

TRUST 2010 8

roadmap

• experience-based access management
• privacy-preserving record linkage
  – definition
  – steps in record linkage
  – experiment
  – conclusions
  – open research questions in record linkage

Page 9: experience-based access management & privacy-preserving record linkage

TRUST 2010 9

privacy-preserving record linkage (pprl)

set of records from dataholder A

FirstName  LastName  BirthDay  BirthMonth  BirthYear  Gender
Karyn      Lewis     28        Sept        1990       F
Marty      Smith     19        Apr         1982       M
Jon        Smyth     04        Feb         1960       M
Joy        Beck      08        May         1980       F
Laura      Root      27        Aug         1945       F

set of records from dataholder B

FirstName  LastName  BirthDay  BirthMonth  BirthYear  Gender
John       Smith     01        Feb         1960       M
Bob        Beck      19        Mar         1980       M
Bob        Taylor    07        Jun         1972       M
Karen      Lewis     28        Sept        1990       F
Alice      Todd      27        Aug         1965       F

Page 10: experience-based access management & privacy-preserving record linkage

TRUST 2010 10

roadmap

• experience-based access management
• privacy-preserving record linkage
  – definition
  – applications
  – steps in record linkage
  – experiment
  – conclusions
  – open research questions in record linkage

Page 11: experience-based access management & privacy-preserving record linkage

11

steps in record linkage

blocking → field comparison → record pair comparison → record pair classification → matches / non-matches

* I assume a common schema and method of data standardization. I also assume that the records from an institution have been deduplicated (i.e., record linkage has been applied within each institution such that an individual is represented by only a single record within an institution).

TRUST 2010

Page 12: experience-based access management & privacy-preserving record linkage

TRUST 2010 12

field comparison

fields:             FirstName  LastName  BirthDay  BirthMonth  BirthYear  Gender
record a:           John       Smith     04        Mar         1962       M
record b:           Jon        Smyth     04        Mar         1960       M
comparison vector:  0.75       0.8       1         1           0.975      1

Page 13: experience-based access management & privacy-preserving record linkage

TRUST 2010 13

roadmap

• experience-based access management
• privacy-preserving record linkage
  – definition
  – applications
  – steps in record linkage
  – experiment
  – conclusions
  – open research questions in record linkage

Page 14: experience-based access management & privacy-preserving record linkage

privacy-preserving field comparison experiment

the dataset

• 1,000 records from the North Carolina Voter Registration database
• fields: Last Name, First Name, Middle Name, Birth State, City, State, Street Name, Street Type, Street Suffix, Race, Gender
• 1,000 “corrupted” records (data corrupter: KATHRYN MCMILLAN → KATHY MEMILLAN)
• repeated 100 times to examine statistical significance

P. Christen and A. Pudjijono, “Accurate Synthetic Generation of Realistic Personal Information.” Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, 2009.

TRUST 2010 14

Page 15: experience-based access management & privacy-preserving record linkage

TRUST 2010 15

privacy-preserving field comparison experiment

• option 1: hash & compare
• option 2: secure edit similarity
• option 3: bloom filter

Page 16: experience-based access management & privacy-preserving record linkage

TRUST 2010 16

privacy-preserving field comparison

option 1: hash & compare

record a:           John  Smith  04  Mar  1962  M
hashed record a:    xy9l  br3f   xt  ves  vr3d  ns

record b:           Jon   Smyth  04  Mar  1960  M
hashed record b:    nw2   vwer   xt  ves  xd6   ns

comparison vector:  0     0      1   1    0     1

SHA-1, “salting” used to prevent dictionary attack
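A minimal sketch of the hash & compare option, assuming the salt is a secret shared by the two data holders and is simply prepended to the normalized field value before hashing. The names `SALT`, `hash_field`, `hash_record`, and `compare_hashed` are hypothetical and chosen only for illustration; the slide does not say how the salt was actually applied.

```python
import hashlib

# Hypothetical shared secret; assumed to be agreed on by both data holders.
SALT = b"shared-secret-salt"


def hash_field(value: str, salt: bytes = SALT) -> str:
    """Return the salted SHA-1 hex digest of a normalized field value."""
    normalized = value.strip().lower().encode("utf-8")
    return hashlib.sha1(salt + normalized).hexdigest()


def hash_record(record):
    """Hash every field of a record independently."""
    return [hash_field(value) for value in record]


def compare_hashed(hashed_a, hashed_b):
    """Emit 1 where two digests are identical and 0 otherwise."""
    return [1 if x == y else 0 for x, y in zip(hashed_a, hashed_b)]


record_a = ["John", "Smith", "04", "Mar", "1962", "M"]
record_b = ["Jon", "Smyth", "04", "Mar", "1960", "M"]

vector = compare_hashed(hash_record(record_a), hash_record(record_b))
print(vector)  # [0, 0, 1, 1, 0, 1]
```

Because a one-character difference yields an unrelated digest, near matches such as John/Jon score 0, which is why this option can only produce a binary comparison vector.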

Page 17: experience-based access management & privacy-preserving record linkage

privacy-preserving field comparison experiment

option 2: secure edit similarity

17

• edit distance: the minimal number of insertions, deletions, and substitutions required to convert one string into another

• edit similarity:

$$\text{edit similarity}(string_1, string_2) = 1 - \frac{\text{edit distance}(string_1, string_2)}{\max(\text{length}(string_1), \text{length}(string_2))}$$

• “secure” edit distance: calculated by iteratively using homomorphic encryption to compute the value of each cell of the matrix used in the dynamic programming algorithm to calculate edit distance

W. Du, M. J. Atallah, “Protocols for Secure Remote Database Access with Approximate Matching”, Technical Report, CERIAS, Purdue University, 2001.

TRUST 2010
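The edit similarity definition on this slide can be sketched in plaintext as follows. This is only the ordinary dynamic programming computation, not the secure protocol: the homomorphic encryption layer is omitted entirely, and the function names are hypothetical.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance, filling the dynamic programming matrix row by row."""
    previous = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, char_s in enumerate(s, start=1):
        current = [i]
        for j, char_t in enumerate(t, start=1):
            cost = 0 if char_s == char_t else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def edit_similarity(s: str, t: str) -> float:
    """1 minus the edit distance, normalized by the longer string's length."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1 - edit_distance(s, t) / longest


print(edit_similarity("john", "jon"))     # 0.75
print(edit_similarity("smith", "smyth"))  # 0.8
```

These two values coincide with the FirstName and LastName entries of the comparison vector on the field comparison slide.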

Page 18: experience-based access management & privacy-preserving record linkage

privacy-preserving field comparison experiment

option 3: Bloom filters

record a: john → bigrams: _j jo oh hn n_ → Bloom filter α
record b: jon → bigrams: _j jo on n_ → Bloom filter β

[Figure: each bigram is hashed by h1 and h2 into a bit array, giving α for record a and β for record b]

$$\text{Dice coefficient} = \frac{2\,|\alpha \cap \beta|}{|\alpha| + |\beta|} = \frac{2 \cdot 5}{13} \approx 0.77$$

1,000 bits & 30 hash functions (all variations of SHA-1, “salting” used to prevent dictionary attack)

Rainer Schnell, Tobias Bachteler, and Jorg Reiher. “Privacy-preserving record linkage using Bloom filters,” BMC Medical Informatics and Decision Making (9). 2009

TRUST 2010 18
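A minimal sketch of the Bloom filter comparison, using the filter length and number of hash functions stated on this slide. How the 30 SHA-1 variations were derived is not given, so the construction below (SHA-1 over a salt, a hash index, and the bigram) is an assumption, as are the names `SALT`, `bigrams`, `bloom_filter`, and `dice`.

```python
import hashlib

NUM_BITS = 1000    # filter length used in the experiment
NUM_HASHES = 30    # number of hash functions used in the experiment
SALT = b"shared-secret-salt"  # hypothetical shared secret


def bigrams(value: str):
    """Pad with '_' and split into overlapping bigrams: john -> _j jo oh hn n_."""
    padded = f"_{value.lower()}_"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def bloom_filter(value: str, num_bits: int = NUM_BITS, num_hashes: int = NUM_HASHES):
    """Return the set of bit positions that are switched on for a field value."""
    positions = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            # Assumption: the k-th hash function is SHA-1 over salt, index, and bigram.
            message = SALT + b":" + str(k).encode("ascii") + b":" + gram.encode("utf-8")
            digest = hashlib.sha1(message).digest()
            positions.add(int.from_bytes(digest, "big") % num_bits)
    return positions


def dice(alpha, beta) -> float:
    """Dice coefficient: 2|alpha ∩ beta| / (|alpha| + |beta|)."""
    if not alpha and not beta:
        return 1.0
    return 2 * len(alpha & beta) / (len(alpha) + len(beta))


alpha = bloom_filter("john")
beta = bloom_filter("jon")
print(bigrams("john"), bigrams("jon"))
print(round(dice(alpha, beta), 2))
```

Representing each filter as a set of switched-on positions keeps the Dice computation a direct transcription of the formula; a bit array would behave the same way.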

Page 19: experience-based access management & privacy-preserving record linkage

TRUST 2010 19

privacy-preserving field comparison experiment: run time

2.5 GHz quad core PC with 4GB of memory

Elizabeth Durham, Yuan Xue, Murat Kantarcioglu, and Bradley Malin. Submitted to Information Fusion. 2010.

[Figure: run time (seconds; axis ticks from 100 to 100,000,000 in powers of ten) by field comparison method: Exact Matching, Bloom Filter, Estimated Edit Similarity, Embedding, Jaro Winkler, Phonetic Filter, Trigrams. Data labels shown: 323; 391; 74,088,275.]

Page 20: experience-based access management & privacy-preserving record linkage

TRUST 2010 20

privacy-preserving field comparison experiment

correctness

Elizabeth Durham, Yuan Xue, Murat Kantarcioglu, and Bradley Malin. Submitted to Information Fusion. 2010.

[Figure: true positive rate (axis from 75% to 100%) by field comparison method: Exact Matching, Bloom Filter, Edit Similarity, Embedding, Jaro Winkler, Phonetic Filter, Trigrams. Data labels shown: 83.97%; 99.45%; 98.54%.]

$$\text{True Positive Rate} = \frac{\#\,\text{true positives}}{\#\,\text{true positives} + \#\,\text{false positives}}$$

Page 21: experience-based access management & privacy-preserving record linkage

TRUST 2010 21

roadmap

• experience-based access management
• privacy-preserving record linkage
  – definition
  – applications
  – steps in record linkage
  – experiment
  – conclusions
  – open research questions in record linkage

Page 22: experience-based access management & privacy-preserving record linkage

TRUST 2010 22

conclusions

            hash & compare    bloom filter    secure edit distance
accuracy:
speed:
security:
overall:

Page 23: experience-based access management & privacy-preserving record linkage

TRUST 2010 23

roadmap

• experience-based access management
• privacy-preserving record linkage
  – definition
  – applications
  – steps in record linkage
  – experiment
  – conclusions
  – open research questions in record linkage

Page 24: experience-based access management & privacy-preserving record linkage

TRUST 2010

centralized distributed

24

open research questions in record linkage

Page 25: experience-based access management & privacy-preserving record linkage

TRUST 2010 25

thanks

NLM 2-T15LM07450-06
NIH R01 LM009989
NSF CNS-0964063 (EBAM)
NSF CCF-0424422 (TRUST)

ebam

privacy-preserving record linkage

Page 26: experience-based access management & privacy-preserving record linkage

TRUST 2010 26

roadmap

• experience-based access management
• privacy-preserving record linkage
  – definition
  – applications
  – steps in record linkage
  – experiment
    • design
    • results
  – open research questions in record linkage

blocking → field comparison → record pair comparison → record pair classification

Page 27: experience-based access management & privacy-preserving record linkage

TRUST 2010 27

blocking

[Figure: record pair comparisons between one set of records (John Smith, …; Bob Beck, …; Bob Taylor, …; Karen Lewis, …; Alice Todd, …) and the other (Jon Smyth, …; Joy Beck, …; Marty Smith, …; Karyn Lewis, …; Laura Root, …); legend: match, non-match]

no blocking: |A||B| = 25 record pair comparisons

blocking (first letter of last name): 4 record pair comparisons
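The effect of blocking on the number of record pair comparisons can be sketched with the ten example records from the definition slide. The blocking key (first letter of the last name) is the one named on this slide; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

# (first name, last name) pairs taken from the example records in the deck.
records_a = [("Karyn", "Lewis"), ("Marty", "Smith"), ("Jon", "Smyth"),
             ("Joy", "Beck"), ("Laura", "Root")]
records_b = [("John", "Smith"), ("Bob", "Beck"), ("Bob", "Taylor"),
             ("Karen", "Lewis"), ("Alice", "Todd")]


def blocking_key(record):
    """Blocking key from the slide: first letter of the last name."""
    return record[1][0].upper()


def candidate_pairs(a, b):
    """Pair only those records that fall into the same block."""
    blocks = defaultdict(list)
    for rec in b:
        blocks[blocking_key(rec)].append(rec)
    return [(rec_a, rec_b)
            for rec_a in a
            for rec_b in blocks.get(blocking_key(rec_a), [])]


all_pairs = list(product(records_a, records_b))
blocked_pairs = candidate_pairs(records_a, records_b)
print(len(all_pairs), len(blocked_pairs))  # 25 4
```

The four surviving pairs come from the S block (two pairs), the B block, and the L block; any true match whose last names begin with different letters would be missed by this key.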

Page 28: experience-based access management & privacy-preserving record linkage

TRUST 2010 28

roadmap

• experience-based access management
• privacy-preserving record linkage
  – definition
  – applications
  – steps in record linkage
  – experiment
    • design
    • results
  – open research questions in record linkage

blocking → field comparison → record pair comparison → record pair classification

Page 29: experience-based access management & privacy-preserving record linkage

TRUST 2010

continuous fellegi-sunter

29

* Note this assumes a uniform distribution of similarity scores.

Edward H. Porter and William E. Winkler, “Approximate String Comparison and its Effect on an Advanced Record Linkage System”, Research Report RR97/02, U.S. Census Bureau. 1997.

Page 30: experience-based access management & privacy-preserving record linkage

record pair comparison

Fellegi-Sunter (FS)

• conditional probability vectors:
  • m[i] = P(a[i] == b[i] | (a,b) is a match)*
  • u[i] = P(a[i] == b[i] | (a,b) is a non-match)*
  where i = 1, … , # fields

• weight vectors:
  • agreement weight: wa[i] = log(m[i] / u[i])
  • disagreement weight: wd[i] = log((1 - m[i]) / (1 - u[i]))

• scoring:

$$\mathrm{score}(a,b) = \sum_{i=1}^{k} \begin{cases} wa[i], & a[i] = b[i] \\ wd[i], & a[i] \neq b[i] \end{cases}$$

  where k = # fields

The conditional probability vectors and weight vectors are calculated once per record linkage, over all record pairs; the score is calculated for each record pair.

* The Expectation Maximization (EM) algorithm, or a subset of records for which the true match status is known, can be used to determine these conditional probabilities.

30

Page 31: experience-based access management & privacy-preserving record linkage

TRUST 2010

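conditional probability vectors: m[i] = 0.90, u[i] = 0.50

weight vectors:
  • agreement weight: wa[i] = log(0.90 / 0.50)
  • disagreement weight: wd[i] = log((1 - 0.90) / (1 - 0.50))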

31

fellegi-sunter

I. Fellegi and A. Sunter, “A theory for record linkage.” Journal of the American Statistical Association, 1969.
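A minimal sketch of the Fellegi-Sunter weights and record pair score, using the example probabilities from this slide (0.90 and 0.50). Two things are assumptions here: the logarithm base (base 2 is used, since the slides do not state one) and the use of the same probabilities for every field. The function names are hypothetical.

```python
import math


def fs_weights(m, u):
    """Agreement and disagreement weights for each field."""
    agree = [math.log2(mi / ui) for mi, ui in zip(m, u)]
    disagree = [math.log2((1 - mi) / (1 - ui)) for mi, ui in zip(m, u)]
    return agree, disagree


def fs_score(a, b, agree, disagree):
    """Sum the agreement weight where fields agree, else the disagreement weight."""
    return sum(wa if x == y else wd
               for x, y, wa, wd in zip(a, b, agree, disagree))


# Assumption: identical m and u for all six fields, for illustration only.
m = [0.90] * 6
u = [0.50] * 6
agree, disagree = fs_weights(m, u)

record_a = ["John", "Smith", "04", "Mar", "1962", "M"]
record_b = ["Jon", "Smyth", "04", "Mar", "1960", "M"]
score = fs_score(record_a, record_b, agree, disagree)
print(agree[0], disagree[0], score)
```

The weights depend only on m and u, so they are computed once, while `fs_score` is evaluated for every record pair. With exact field agreement, three agreeing and three disagreeing fields give a negative score, because each disagreement weight is larger in magnitude than each agreement weight at these probabilities.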

Page 32: experience-based access management & privacy-preserving record linkage

TRUST 2010 32

roadmap

• experience-based access management
• privacy-preserving record linkage
  – definition
  – applications
  – steps in record linkage
  – experiment
    • design
    • results
  – open research questions in record linkage

blocking → field comparison → record pair comparison → record pair classification

Page 33: experience-based access management & privacy-preserving record linkage

TRUST 2010

record pair classification

record                      record                       match score
mimi williams 02.02.82 f    bill rogers 02.02.81 m       0
mimi williams 02.02.82 f    momo williams 01.01.81 f     +2
mimi williams 02.02.82 f    william rogers 01.01.81 m    0
bill rogers 01.01.81 m      bill rogers 02.02.81 m       +3
bill rogers 01.01.81 m      momo williams 01.01.81 f     +1
bill rogers 01.01.81 m      william rogers 01.01.81 m    +3
jack abbott 03.03.83 m      bill rogers 02.02.81 m       +1
jack abbott 03.03.83 m      momo williams 01.01.81 f     0
jack abbott 03.03.83 m      william rogers 01.01.81 m    +1

classification: 2 record pairs labeled match, 7 labeled non-match

33
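A minimal sketch of threshold-based record pair classification over the nine example pairs. The match score is assumed here to be the number of exactly agreeing fields, which reproduces the scores shown on this slide; the threshold of 3 is a hypothetical choice that yields two matches, and the function names are illustrative.

```python
from itertools import product

# (first name, last name, birth date, gender) for the example records.
records_a = [
    ("mimi", "williams", "02.02.82", "f"),
    ("bill", "rogers", "01.01.81", "m"),
    ("jack", "abbott", "03.03.83", "m"),
]
records_b = [
    ("bill", "rogers", "02.02.81", "m"),
    ("momo", "williams", "01.01.81", "f"),
    ("william", "rogers", "01.01.81", "m"),
]

THRESHOLD = 3  # hypothetical cut-off, not stated on the slide


def match_score(a, b):
    """Assumed scoring: count the fields on which the two records agree exactly."""
    return sum(1 for x, y in zip(a, b) if x == y)


def classify(a, b, threshold=THRESHOLD):
    """Label a record pair by comparing its match score with the threshold."""
    return "match" if match_score(a, b) >= threshold else "non-match"


pairs = list(product(records_a, records_b))
scores = [match_score(a, b) for a, b in pairs]
labels = [classify(a, b) for a, b in pairs]
print(scores)                 # [0, 2, 0, 3, 1, 3, 1, 0, 1]
print(labels.count("match"))  # 2
```

Under this threshold the record "bill rogers 01.01.81 m" is matched to two different records, which is the kind of conflict that classifying each pair independently cannot rule out.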

Page 34: experience-based access management & privacy-preserving record linkage

34

open research questions in record linkage

[Figure: record pair comparisons between one set of records (John Smith, …; Bob Beck, …; Bob Taylor, …; Karen Lewis, …; Alice Todd, …) and the other (Jon Smyth, …; Joy Beck, …; Marty Smith, …; Karyn Lewis, …; Laura Root, …); legend: match, non-match]

no blocking: |A||B| = 25 record pair comparisons

blocking (first letter of last name): 4 record pair comparisons

TRUST 2010

Page 35: experience-based access management & privacy-preserving record linkage

TRUST 2010 35

open research questions in record linkage

actual

first name  last name  birth date  gender
bill        rogers     01.01.81    m
mimi        williams   02.02.82    f
jack        abbott     03.03.83    m

first name  last name  birth date  gender
bill        rogers     02.02.81    m
momo        williams   01.01.81    f
william     rogers     01.01.81    m

predicted

first name  last name  birth date  gender
bill        rogers     01.01.81    m
mimi        williams   02.02.82    f
jack        abbott     03.03.83    m

first name  last name  birth date  gender
bill        rogers     02.02.81    m
momo        williams   01.01.81    f
william     rogers     01.01.81    m