De-identifying Clinical Data Khaled El Emam, CHEO RI & uOttawa

The De-identification of Clinical Data


DESCRIPTION

A comprehensive presentation on why the de-identification of clinical information is necessary for secondary uses and how to do it effectively.


Page 1: The De-identification of Clinical Data

De-identifying Clinical Data
Khaled El Emam, CHEO RI & uOttawa

Page 2: The De-identification of Clinical Data

www.ehealthinformation.ca

Electronic Health Information Laboratory, CHEO Research Institute, 401 Smyth Road, Ottawa K1H 8L1, Ontario; www.ehealthinformation.ca

Page 3: The De-identification of Clinical Data

Secondary Use/Disclosure

[Diagram: data flows among individuals, custodian, agent, and recipient, with arrows labelled collection, use, and disclosure]

Page 4: The De-identification of Clinical Data

Data Flows

• Mandatory disclosures
• Uses by an agent for secondary purposes
• Permitted discretionary disclosures for secondary purposes
• Other disclosures for secondary purposes

Page 5: The De-identification of Clinical Data

Obtaining Consent - I

• Sometimes it is not possible or practical to obtain consent:
– Making contact to obtain consent may reveal the individual’s condition to others against their wishes
– The size of the population may be too large to obtain consent from everyone
– Many patients may have relocated or died

Page 6: The De-identification of Clinical Data

Obtaining Consent - II

– There may be a lack of an existing or continuing relationship with the patients
– There is a risk of inflicting psychological, social, or other harm by contacting individuals or their families in delicate circumstances
– It would be difficult to contact individuals through advertisements and other public notices

Page 7: The De-identification of Clinical Data

Impact of Obtaining Consent

• In the case where explicit consent is used, consenters and non-consenters differ on:
– age, sex, race, marital status, educational level, socioeconomic status, health status, mortality, lifestyle factors, functioning
• The consent rate for express consent varied from 16% to 93%

Page 8: The De-identification of Clinical Data

Limiting Principles

• Do not collect, use, or disclose PHI if other information will serve the purpose
• For example, even if it is easier to disclose a whole record, that should not be done if lesser information will reasonably satisfy the purpose
• De-identification would be one element in limiting the amount of PHI that is collected/used/disclosed

Page 9: The De-identification of Clinical Data

Breaches

• In many large research hospitals and hospital networks it is simply not possible to control and manage all of the databases and data sets that are created, used, and disclosed for research
• Breach frequency and severity are growing
• De-identification provides one way to manage the risks, however

Page 10: The De-identification of Clinical Data

Trust

• Patients change their behavior if they perceive a threat to privacy
• This can have a negative impact on the quality of the data that is used for research

Page 11: The De-identification of Clinical Data


Page 12: The De-identification of Clinical Data

Deloitte Survey (2007)

• N=827 respondents in North America
• 43% reported more than 10 privacy breaches within the last 12 months in their organizations
• Over 85% reported at least one privacy breach
• Over 63% reported multiple privacy breaches requiring notification
• Breaches involving 1000+ records were reported by 34% of respondents

Page 13: The De-identification of Clinical Data

Verizon Study

• Based on forensic engagements conducted by Verizon
• Breaches resulting from external sources: 73%
• Caused by insiders: 18%
• Implicated business partners: 39%
• The median number of records involved in an insider breach was 10 times more than in an external breach
• Biggest causes are errors and hackers

Page 14: The De-identification of Clinical Data


Page 15: The De-identification of Clinical Data

HIMSS Leadership Survey

• Survey of healthcare IT executives, n=307
• Conducted in the 2007-2008 timeframe
• 24% of respondents reported that they have had a security breach in their organization in the last 12 months
• 16% of respondents reported that they have had a security breach in their organization in the last 6 months
• Half indicated that an internal security breach is a concern to their organizations

Page 16: The De-identification of Clinical Data

HIMSS Analytics Report

• IT executives and security officers at healthcare institutions; n=263
• Half of respondents are concerned with internal inadvertent access to patient data
• 13% indicated that their organization has had a security breach in the last 12 months
• 80% of these were internal breaches

Page 17: The De-identification of Clinical Data

Medical Record Breaches 2008

• For all of 2008 (datalossdb.org)
• 83 breaches involving medical records (14% of total)
• Approx. 7.2 million records involved in these breaches (21.5% of all records)

Page 18: The De-identification of Clinical Data

Does this Happen Here?

• Do you know of any cases where computer equipment was stolen from a hospital? Did this equipment contain personal health information?
• Do you know of any cases where memory sticks with data on them were lost?
• Does anyone email data to their Hotmail or Gmail accounts so that they can access them from home or while travelling?
• Do people still share passwords?

Page 19: The De-identification of Clinical Data

Known Data Leaks

• PHI on second hand computers
• Leaks through peer-to-peer file sharing networks
• PowerPoint files on the Internet
• Password protected files sent by email

Page 20: The De-identification of Clinical Data

Identity Theft

• William Ernst Black (Edmonton 1999)
• The creation of identity packages using information about dead children who were living in one jurisdiction but died in another ($37k for each identity package)
• Example: drug smuggler was caught with these identity packages
• Example: American getting free medical care in Canada

Page 21: The De-identification of Clinical Data

Patient Concerns

• There is evidence (from surveys) that the general public has changed their behavior to adjust for perceived privacy risks with respect to their PHI:
– 15% to 17% of US adults
– 11% to 13% of Canadian adults
• There is also evidence that vulnerable populations exhibit similar behaviors (e.g., adolescents, people with HIV or at high risk for HIV, those undergoing genetic testing, mental health patients, and battered women)

Page 22: The De-identification of Clinical Data

Behavior Change - I

• Going to another doctor
• Paying out of pocket when insured to avoid disclosure
• Not seeking care to avoid disclosure to an employer or to not be seen entering a clinic by other members of the community
• Giving inaccurate or incomplete information on medical history
• Asking a doctor not to record a health problem or record a less serious or embarrassing one

Page 23: The De-identification of Clinical Data

Behavior Change - II

• 87% of US physicians reported that a patient had asked them not to include certain information in their record
• 78% of US physicians reported that they have withheld information due to privacy concerns

Page 24: The De-identification of Clinical Data


Page 25: The De-identification of Clinical Data

Asymmetry Principle - I

• Trust is hard to gain but easy to lose:
– Negative events/news carry more weight than positive ones (negativity bias); it is more diagnostic
– Avoiding loss: people weight negative information more greatly in an effort to avoid loss
– Sources of negative information appear more credible (positive information seems self-serving)

Page 26: The De-identification of Clinical Data

Asymmetry Principle - II

– People interpret information according to their prior beliefs: if they have negative prior beliefs then negative events will reinforce that and positive events will have little impact
– Undecided individuals tend to be affected more by negative information
– People with positive prior beliefs may feel betrayed by negative information/events

Page 27: The De-identification of Clinical Data

Canadian Public - 2007

[Bar chart, y-axis 0 to 100, by region: Total, BC, Alberta, Prairies, Ont, Que, Atlantic, Territories; bar values in extraction order: 39, 37, 46, 34, 40, 37, 44, 35]

In your opinion, how safe and secure is the health information which EXISTS about you? (5-7 on a 7 pt scale)

Page 28: The De-identification of Clinical Data

Canadian Public - 2003

[Bar chart, x-axis 0 to 100; legend: Agree (5-7), Neither (4), Disagree (1-3), DK/NR]

I really worry that my personal health information might be used for other purposes in the future which have little to do with my health

Page 29: The De-identification of Clinical Data

How not to De-identify

• Just removing the name and address information is not enough
• It is quite easy to re-identify individuals from the other data that is left
• There are a number of public real-life examples of re-identification actually happening

Page 30: The De-identification of Clinical Data

Example Data With PHI

Page 31: The De-identification of Clinical Data

Types of Variables

• Identifying variables: variables that can directly identify a patient
• Quasi-identifiers: variables that can indirectly identify a patient
• Sensitive variables: sensitive clinical information that the patient would not want to be known beyond the circle of care

Page 32: The De-identification of Clinical Data

De-identified Data?

Page 33: The De-identification of Clinical Data

Examples of Re-identification

Page 34: The De-identification of Clinical Data

Examples of Re-identification

Page 35: The De-identification of Clinical Data

Examples of Re-identification

Page 36: The De-identification of Clinical Data

Examples of Re-identification

Page 37: The De-identification of Clinical Data

User #4417749

• “tea for good health”
• “numb fingers”, “hand tremors”
• “dry mouth”
• “60 single men”
• “dog that urinates on everything”
• “landscapers in Lilburn, Ga”
• “homes sold in shadow lake subdivision gwinnett county georgia”

Page 38: The De-identification of Clinical Data

Thelma Arnold

• 62 year old widow living in Lilburn, Ga, re-identified by the New York Times
• She has three dogs

Page 39: The De-identification of Clinical Data

What Happened Next?

• Maureen Govern, CTO of AOL “resigns”
• Abdur Chowdhury, AOL researcher who released the data, was fired
• Abdur’s boss in the research department was fired
• Big embarrassment for AOL

Page 40: The De-identification of Clinical Data

Examples of Re-identification

Page 41: The De-identification of Clinical Data

Examples of Re-identification

Page 42: The De-identification of Clinical Data

Examples of Re-identification

Page 43: The De-identification of Clinical Data

Uniqueness in the US Population

• Studies show that between 63% and 87% of the US population is unique on their date of birth + ZIP code + gender
• Uniqueness makes it quite easy to re-identify individuals using a variety of techniques
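To make the uniqueness idea concrete, here is a minimal Python sketch that computes the share of records that are the only ones with their combination of date of birth, ZIP code, and gender. The records, the field names, and the function name `fraction_unique` are made up for illustration; they are not taken from the studies mentioned on the slide.

```python
from collections import Counter

def fraction_unique(records, quasi_identifiers):
    """Return the share of records whose quasi-identifier combination occurs exactly once."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records) if records else 0.0

# Hypothetical toy data, not real patients.
records = [
    {"dob": "1957-03-12", "zip": "30047", "gender": "M"},
    {"dob": "1957-03-12", "zip": "30047", "gender": "M"},
    {"dob": "1978-07-01", "zip": "30047", "gender": "M"},
    {"dob": "1968-12-09", "zip": "30047", "gender": "F"},
]

share = fraction_unique(records, ["dob", "zip", "gender"])
print(f"{share:.0%} of records are unique on DoB + ZIP + gender")
```

On a real data set the same count would be run against the full population file, since uniqueness in a small sample can overstate uniqueness in the population.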

Page 44: The De-identification of Clinical Data

Uniqueness in Canadian Population

[Line chart: y-axis "Percent Uniques" (0% to 100%); x-axis "Number of Characters in Postal Code" (1 to 6); series: PC, PC + Gender, PC + DoB, PC + DoB + Gender]

Page 45: The De-identification of Clinical Data

Example

• This example shows the risk of re-identification using just demographics

Page 46: The De-identification of Clinical Data

Types of Disclosure

• Identity Disclosure: being able to determine the identity associated with a record
• Attribute Disclosure: discovering something new about an individual known to be in the database

Page 47: The De-identification of Clinical Data

Disclosure and Invasion-of-Privacy

• An important first criterion is deciding on the sensitivity of the data and the potential for harm to the patients from a secondary use/disclosure
• If the invasion-of-privacy is deemed low then there may not be a need to de-identify the data

Page 48: The De-identification of Clinical Data

Invasion-of-Privacy - I

• The personal information in the Data is highly detailed
• The information in the Data is of a highly sensitive and personal nature
• The information in the Data comes from a highly sensitive context
• Many people would be affected if there was a Data breach or the Data was processed inappropriately by the recipient/agent

Page 49: The De-identification of Clinical Data

Invasion-of-Privacy - II

• If there was a Data breach or the Data was processed inappropriately by the recipient/agent, that may cause direct and quantifiable damages and measurable injury to the patients
• If the recipient/agent is located in a different jurisdiction, there is a possibility, for practical purposes, that the data sharing agreement will be difficult to enforce

Page 50: The De-identification of Clinical Data

Invasion-of-Privacy – Consent - I

• There is a provision in the relevant legislation permitting the disclosure/use of the Data without the consent of the patients
• The Data was unsolicited or given freely or voluntarily by the patients with little expectation of it being maintained in total confidence

Page 51: The De-identification of Clinical Data

Invasion-of-Privacy – Consent - II

• The patients have provided express consent that their Data can be disclosed for this secondary Purpose when it was originally collected or at some point since then
• The custodian has consulted well-defined groups or communities regarding the disclosure of the Data and had a positive response

Page 52: The De-identification of Clinical Data

Invasion-of-Privacy – Consent - III

• A strategy for informing/notifying the public about potential disclosures for the recipient’s secondary Purpose was in place when the data was collected or since then
• Obtaining consent from the individuals at this point is inappropriate or impractical

Page 53: The De-identification of Clinical Data

Identity Disclosure

• Three common types:
– Prosecutor risk
– Journalist risk
– Rareness
• All three are concerned with the risk of re-identifying a single individual

Page 54: The De-identification of Clinical Data

Prosecutor vs. Journalist

• If all of the following are true then prosecutor risk is relevant:
– The data represents the whole population such that everyone is known to be in it or the sampling fraction is very high
– If not the whole population, it is possible for an intruder to know that a particular person has a record in the data
  • Patient may self-reveal
  • Data collection method is revealing
• Otherwise journalist risk is relevant

Page 55: The De-identification of Clinical Data

Prosecutor Risk - I

• The intruder has background information about a specific individual known to be in the database
• The amount of background information will depend on the intruder
• The intruder is attempting to find the record belonging to that individual in the database

Page 56: The De-identification of Clinical Data

Prosecutor Risk - II

• Examples of intruders:
– Neighbor
– Ex-spouse
– Employer
– Relative

Page 57: The De-identification of Clinical Data

Example

Date of Birth   Gender   Postal Code   Diagnosis
12/03/1957      M        K0J 1P0       …
01/7/1978       M        K0J 1P0       …
09/12/1968      F        K0J 1P0       …
17/08/1987      F        K0J 1P0       …
25/02/1974      F        K0J 1T0       …
23/05/1985      M        K0J 1T0       …
14/03/1965      F        K0J 2A0       …
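One common way to put a number on prosecutor risk for a table like this is to group the records that share the same quasi-identifier values and take one over the group size as each record's re-identification probability. The slide does not state this formula, so treat it as an assumption. The choice of gender and postal code as the intruder's background knowledge and the helper name `prosecutor_risks` are likewise illustrative.

```python
from collections import Counter

# Rows from the example slide: (date of birth, gender, postal code).
rows = [
    ("12/03/1957", "M", "K0J 1P0"),
    ("01/7/1978", "M", "K0J 1P0"),
    ("09/12/1968", "F", "K0J 1P0"),
    ("17/08/1987", "F", "K0J 1P0"),
    ("25/02/1974", "F", "K0J 1T0"),
    ("23/05/1985", "M", "K0J 1T0"),
    ("14/03/1965", "F", "K0J 2A0"),
]

def prosecutor_risks(rows, key):
    """Map each row to 1 / (number of rows sharing its quasi-identifier key)."""
    sizes = Counter(key(r) for r in rows)
    return [1 / sizes[key(r)] for r in rows]

# Assumption: the intruder knows only the target's gender and postal code.
risks = prosecutor_risks(rows, key=lambda r: (r[1], r[2]))
for row, risk in zip(rows, risks):
    print(row[1], row[2], f"risk={risk:.2f}")
print("maximum risk:", max(risks))
```

If the intruder also knew the date of birth, every row in this small table would form its own group and each record's risk would be 1.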

Page 58: The De-identification of Clinical Data

Selecting Variables – Prosecutor - I

• In the best case assumption, a neighbor would know:
– Address and telephone information about the VIP
– Household and dwelling information (number of children, value of property, type of property)
– Key dates (births, deaths, weddings)
– Visible characteristics: gender, race, ethnicity, language spoken at home, weight, height, physical disabilities
– Profession

Page 59: The De-identification of Clinical Data

Selecting Variables – Prosecutor - II

• What would an ex-spouse know:
– The same things that a neighbor would know
– Basic medical history (allergies, chronic diseases)
– Income, years of schooling
• All of these variables would be considered quasi-identifiers if they appear in the database

Page 60: The De-identification of Clinical Data

Journalist Risk

• The journalist is not looking for a specific person; re-identifying any person will do
• The journalist has access to a database that s/he can use for matching
• This is called an identification database

Page 61: The De-identification of Clinical Data

Journalist Matching Example

[Diagram: a Medical Database (clinical and lab data) and an Identification DB (name, address, telephone no.) are linked through shared quasi-identifiers: DoB, initials, gender, postal code]
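The matching in the diagram can be sketched as a join on the shared quasi-identifiers. This is a minimal Python illustration, assuming exact-match linkage and made-up records; the field names and the `link` helper are hypothetical, and a real attack could use fuzzier matching.

```python
from collections import defaultdict

QUASI_IDENTIFIERS = ("dob", "initials", "gender", "postal_code")

def link(medical, identification, keys=QUASI_IDENTIFIERS):
    """Return (medical_record, identity_record) pairs where exactly one
    identity record shares the medical record's quasi-identifier values."""
    index = defaultdict(list)
    for person in identification:
        index[tuple(person[k] for k in keys)].append(person)
    matches = []
    for record in medical:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a single candidate means the record is re-identified
            matches.append((record, candidates[0]))
    return matches

# Hypothetical records for illustration only.
medical = [
    {"dob": "1957-03-12", "initials": "JS", "gender": "M", "postal_code": "K0J 1P0", "lab": "..."},
    {"dob": "1968-12-09", "initials": "AB", "gender": "F", "postal_code": "K0J 1P0", "lab": "..."},
]
identification = [
    {"name": "John Smith", "address": "1 Main St", "dob": "1957-03-12",
     "initials": "JS", "gender": "M", "postal_code": "K0J 1P0"},
    {"name": "Ann Brown", "address": "2 Oak Ave", "dob": "1968-12-09",
     "initials": "AB", "gender": "F", "postal_code": "K0J 1P0"},
    {"name": "Amy Baker", "address": "3 Elm Rd", "dob": "1968-12-09",
     "initials": "AB", "gender": "F", "postal_code": "K0J 1P0"},
]

for record, person in link(medical, identification):
    print(person["name"], "->", record["lab"])
```

The second medical record stays unlinked because two people in the identification database share its quasi-identifier values.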

Page 62: The De-identification of Clinical Data

Assessing Journalist Risk

• In general, we want to know how rare the quasi-identifier values would be in the population (e.g., homeowners/professionals/civil servants in the geographic area of interest)
• If the combination is not rare then the journalist risk is small

Page 63: The De-identification of Clinical Data

Selecting Variables – Journalist - I

• Depends on what information can be obtained in an identification database
• For an external intruder, likely variables are those available in public registries:
– Key dates (birth, death, marriage)
– Profession
– Home address and telephone number
– Type of dwelling
– Gender, ethnicity, race
– Income if a highly paid public servant

Page 64: The De-identification of Clinical Data

Selecting Variables – Journalist - II

• Assume that an internal intruder would be able to get all relevant administrative data:
– Key dates (birth, death, admission, discharge, visit)
– Gender, address, telephone number

Page 65: The De-identification of Clinical Data

Inference of Variables - I

• Even though a particular quasi-identifier may not be known to the intruder (prosecutor risk), available in an identification database (journalist), or available in the disclosed database (all three risks), it may be possible to infer it from other variables
• Variables that can be inferred should be treated as quasi-identifiers

Page 66: The De-identification of Clinical Data

Inference of Variables - II

• Inferred variables should be added to the disclosed database if they are not there because they may be used in a re-identification attack, and you want to take them into account during risk assessment

Page 67: The De-identification of Clinical Data

Inference Examples

• Gender, ethnicity, religious origin from name
• Age from graduation date
• Profession from payer of insurance claim (e.g., civil servants have a single health insurer)
• Age and gender from a diagnostic or lab code (e.g., mammogram or PSA test)
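Two of the inferences listed above can be written as simple rules, which shows how an inferred variable ends up as an extra quasi-identifier column. This is only a sketch: the assumed graduation age of 22, the test-to-gender lookup, the field names, and the function name `infer_quasi_identifiers` are all illustrative assumptions rather than anything specified on the slide.

```python
def infer_quasi_identifiers(record, graduation_age=22):
    """Return extra quasi-identifiers that can be guessed from fields already in the record.

    graduation_age and the test-to-gender table are illustrative assumptions.
    """
    inferred = {}
    if "graduation_year" in record:
        # Rough birth year: graduation year minus an assumed age at graduation.
        inferred["approx_birth_year"] = record["graduation_year"] - graduation_age
    likely_gender = {"mammogram": "F", "PSA": "M"}  # hypothetical lookup table
    if record.get("lab_test") in likely_gender:
        inferred["likely_gender"] = likely_gender[record["lab_test"]]
    return inferred

print(infer_quasi_identifiers({"graduation_year": 1990, "lab_test": "PSA"}))
```

For risk assessment, the inferred values would be added to the data set as columns and included among the quasi-identifiers.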

Page 68: The De-identification of Clinical Data

Rareness

• If individuals are rare on the quasi-identifiers, then they are at higher prosecutor and journalist re-identification risk
• If an individual has a rare and visible characteristic/feature, then that also makes them easier to re-identify (e.g., put an ad on the radio)

Page 69: The De-identification of Clinical Data

Attribute Disclosure

• If there is very little variation on sensitive variables
• The data set can represent a whole population or some subset
• Learn something new about a person without actually finding which record belongs to them
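A simple way to look for attribute disclosure is to find groups of records that share quasi-identifier values and also share a single sensitive value: anyone known to be in such a group is linked to that value without their exact record being found. The sketch below assumes made-up records and field names, and `homogeneous_groups` is a hypothetical helper.

```python
from collections import defaultdict

def homogeneous_groups(records, quasi_identifiers, sensitive):
    """Return {group key: value} for groups of 2+ records that all share one sensitive value."""
    values = defaultdict(set)
    sizes = defaultdict(int)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        values[key].add(r[sensitive])
        sizes[key] += 1
    # Groups of size one are skipped: those are identity disclosure, not attribute disclosure.
    return {k: next(iter(v)) for k, v in values.items() if len(v) == 1 and sizes[k] > 1}

# Hypothetical toy data.
records = [
    {"gender": "M", "postal_code": "K0J 1P0", "diagnosis": "diabetes"},
    {"gender": "M", "postal_code": "K0J 1P0", "diagnosis": "diabetes"},
    {"gender": "F", "postal_code": "K0J 1P0", "diagnosis": "asthma"},
    {"gender": "F", "postal_code": "K0J 1P0", "diagnosis": "fracture"},
]

print(homogeneous_groups(records, ["gender", "postal_code"], "diagnosis"))
```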

Page 70: The De-identification of Clinical Data

A Pragmatic Approach

• It is important to ensure that the quasi-identifiers are plausible for the data and the recipients of the data
• If you select many quasi-identifiers, then that will by definition increase the re-identification risk
• Ideally, each selected quasi-identifier should be associated with a realistic re-identification scenario

Page 71: The De-identification of Clinical Data

Constructing an Identification DB

• This may be a single physical database or a join of multiple sources together to construct a virtual database
• It will have the quasi-identifiers as well as identity information, but will not have the sensitive information (e.g., clinical or financial details)
• The sources may be public and free, public and for a fee, or fully commercial
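A minimal sketch of the "virtual database" idea for use in a risk assessment, assuming the sources can be joined on an exact name match. Real record linkage is rarely that clean; the source names, field names, and records here are fictitious.

```python
def build_identification_db(*sources):
    """Merge records from several sources that share a 'name' key (outer join)."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["name"], {}).update(record)
    return list(merged.values())


# Fictitious sources, each holding identity plus some quasi-identifiers.
membership = [{"name": "A. Example", "profession": "physician", "practice_region": "K"}]
directory = [{"name": "A. Example", "home_region": "M"}]
registry = [
    {"name": "A. Example", "birth_year": 1970},
    {"name": "B. Sample", "birth_year": 1982},
]

identification_db = build_identification_db(membership, directory, registry)
for row in identification_db:
    print(row)
```

The result holds identity and quasi-identifiers only, with no clinical or financial details, which matches the description on this slide.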

Page 72: The De-identification of Clinical Data

Examples of Identification DBs - I

• These are databases or sources (Canada):
– Obituaries: available from newspapers and funeral homes; there are obituary aggregator sites that make this simple
– PPSR: Personal Property Security Registration; contains information on loans secured by property (e.g., cars)
– Land Registry: information on house ownership

Page 73: The De-identification of Clinical Data

Examples of Identification DBs - II

– Membership Lists: provide comprehensive listings of professionals (e.g., doctors, lawyers, civil servants)
– Salary Disclosure Reports: provided by governments for those earning higher than a certain threshold
– White Pages: public telephone directory
– Job Sites: CVs posted in public and closed job web sites
– Donations: disclosures of donations to political parties (include address)

Page 74: The De-identification of Clinical Data

Voter Lists - I

• Cannot legally be used for purposes outside of an election (in Canada)
• But, a charity allegedly supporting a terrorist group (Tamil Tigers) was found by the RCMP to have Canadian voter lists
• Volunteers do not necessarily destroy or dispose of the lists after an election (and in many cases do not sign anything before they get them)

Page 75: The De-identification of Clinical Data

Voter Lists - II

• It is not expensive (or difficult) to become a candidate in an election and get the voter list:
– Alberta: $500
– BC: $100
– NB: $100 (+nominated by 25 electors)
– Ontario: $100
– Quebec: $0 (+nominated by 100 electors)
• Canadian voter lists do not contain the DoB (yet)

Page 76: The De-identification of Clinical Data

Economics of Identification DBs

• Some data sources have a fee for each individual record/search
• This makes the cost of creating an identification database quite high
• This may impose a large economic burden on an intruder and act as a deterrent from creating identification databases
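A back-of-the-envelope sketch of why per-record fees add up. The fee, the population size, and the number of sources searched per person are hypothetical figures, not values from the slides.

```python
def lookup_cost(num_individuals: int, fee_per_search: float, searches_per_individual: int = 1) -> float:
    """Total fee for searching every individual in each pay-per-search source."""
    return num_individuals * searches_per_individual * fee_per_search


# Hypothetical: 10,000 people, 2 fee-based sources each, 8 dollars per search.
print(lookup_cost(10_000, 8.0, 2))  # 160000.0
```

The cost grows linearly with the number of people covered, so a broad identification database is far more expensive than a targeted search for one individual.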

Page 77: The De-identification of Clinical Data

Internal Identification Databases

• An internal intruder may have access to administrative databases that can act as an identification DB
• For example, in a hospital an internal intruder may have access to all admissions; this is not sensitive data, so it is less protected, but it has enough demographics that it can be good as an identification database
• This puts internal intruders at a huge advantage

Page 78: The De-identification of Clinical Data

Internal Access

• An internal intruder can get access to such an administrative database:
– had access in a previous position but that access was not revoked
– people in the organization share access credentials, so the intruder can use someone else’s credentials to get the administrative database
– has access as part of his/her job and there are no audit trails
– internal systems are not well protected because internal people are trusted, and the intruder knows how to break into the system to get the data

Page 79: The De-identification of Clinical Data

Public Registries

• In the following slides I will explain how to create identification databases from public registries in Canada

Page 80: The De-identification of Clinical Data

Professional Groups - I

We can construct identification databases for specific professional groups

[Diagram sources: Membership Lists, PPSR, White Pages]

Page 81: The De-identification of Clinical Data

Professional Groups - II

• College of Physicians and Surgeons of Ontario
• Law Society of Upper Canada
• Professional Engineers Ontario
• College of Occupational Therapists
• College of Physical Therapists
• Public servants (e.g., GEDS)
• …

Page 82: The De-identification of Clinical Data

What is the success rate?

                                                                              CPSO    LSUC
• Ability to get home postal codes (source: PPSR and telephone directory)     60%     45%
• Ability to get practice/firm postal codes (source: CPSO/LSUC)               100%    100%
• Ability to get date of birth (source: PPSR)                                 40%     45%
• Ability to get gender (source: CPSO/genderizing LSUC)                       100%    100%
• Ability to get initials (source: CPSO/LSUC)                                 100%    100%

Page 83: The De-identification of Clinical Data

What is the success rate by gender?

                                                                              CPSO    LSUC
MALE
• Ability to get home postal codes (source: PPSR and telephone directory)     63%     48%
• Ability to get date of birth (source: PPSR)                                 45%     48%
FEMALE
• Ability to get home postal codes (source: PPSR and telephone directory)     49%     40%
• Ability to get date of birth (source: PPSR)                                 29%     40%

Page 84: The De-identification of Clinical Data

Homeowners

We can construct identification databases for specific postal codes

[Diagram sources: Land Registry, PPSR, Canada Post, White Pages]

Page 85: The De-identification of Clinical Data

What is the success rate?

                                      Ott     To
• Ability to get initials             93%     100%
• Ability to get DoB                  33%     40%
• Ability to get telephone number     80%     50%
• Ability to get gender               87%     95%

Page 86: The De-identification of Clinical Data

Re-id Risk for Homeowners

• The number of households per postal code is quite small (Ott: 15; To: 20)
• The individuals (homeowners) were unique on common combinations of quasi-identifiers (e.g., gender and DoB)
• For these individuals re-identification risk is very high

Page 87: The De-identification of Clinical Data

Civil Servants - I

• GEDS is on the Internet: Government Electronic Directory Services
• There are 386,630 individuals in the federal government (159,652 in Ontario and 28,046 in Alberta)
• GEDS has approx. 170,000 entries
• Incomplete because: organizations can opt out, some individuals need to opt in, and some employees and orgs are exempted (e.g., CSIS, DND)

Page 88: The De-identification of Clinical Data

Civil Servants - II

• We selected a sample of 40 individuals in health care related federal departments in Ontario
• Able to get home address for 50%, home telephone number for 40%, gender for 100%, DoB for 22.5%
• Provincial governments have similar sources

Page 89: The De-identification of Clinical Data

Re-identification Threshold

• There is a spectrum of re-identification risk
• When does the probability of re-identification become so high that the information is deemed identifiable?
• Canadian privacy law tends not to be precise about this
• Gordon case: serious possibility test

Page 90: The De-identification of Clinical Data

Canadian Definitions - I

Privacy Law: Definition

Ontario PHIPA: “Identifying information” means information that identifies an individual or for which it is reasonably foreseeable in the circumstances that it could be utilized, either alone or with other information, to identify an individual.

Nfld PPHI: “Identifying information” means information that identifies an individual or for which it is reasonably foreseeable in the circumstances that it could be utilized either alone or together with other information to identify an individual.

Sask THIPA: “De-identified personal health information” means personal health information from which any information that may reasonably be expected to identify an individual has been removed.

Page 91: The De-identification of Clinical Data

Canadian Definitions - II

Privacy Law: Definition

Alberta HIA: “Individually identifying” means that the identity of the individual who is the subject of the information can be readily ascertained from the information; “nonidentifying” means that the identity of the individual who is the subject of the information cannot be readily ascertained from the information.

NB PPIA: “Identifiable individual” means an individual can be identified by the contents of the information because the information includes the individual’s name, makes the individual’s identity obvious, or is likely in the circumstances to be combined with other information that includes the individual’s name or makes the individual’s identity obvious.

Page 92: The De-identification of Clinical Data

Re-identification Risk Spectrum

Page 93: The De-identification of Clinical Data

Re-identification Threshold

• Privacy legislation treats the threshold in two ways:
– Discretionary/permitted disclosures and uses = threshold can be anywhere along the spectrum
– Only de-identified information without consent = information is identifiable or not; there is no spectrum
• Any systematic approach to dealing with thresholds must cover both

Page 94: The De-identification of Clinical Data

Threshold Precedents - I

• We will use healthcare precedents as an indication of the risk that society has agreed to take:
– The largest probability of re-identification that is used in any policy or guideline document in Canada or the US is 0.33
– If the probability is > 0.33 then the information would certainly be considered identifiable

Page 95: The De-identification of Clinical Data

Threshold Precedents - II

– The most common probability of re-identification used in disclosure control of health data is 0.2 (cell size of 5)
– It makes sense that a value of 0.2 would be used as a “default” risk
• Below 0.33 there are many degrees of de-identification
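A minimal sketch of the link between cell size and probability, assuming the worst-case risk in a cell of k records is 1/k (so a cell size of 5 gives 0.2, and a cell size of 3 gives roughly 0.33). The equivalence class sizes below are hypothetical and only show how the share of records counted as high risk moves with the chosen threshold.

```python
def max_risk_for_cell_size(cell_size: int) -> float:
    """Worst-case re-identification probability when a cell holds cell_size records."""
    return 1 / cell_size


def fraction_above_threshold(class_sizes, threshold):
    """Fraction of records whose risk (1 / class size) exceeds the threshold."""
    total = sum(class_sizes)
    at_risk = sum(size for size in class_sizes if 1 / size > threshold)
    return at_risk / total


# Hypothetical equivalence class sizes for one data set (22 records in total).
sizes = [1, 2, 4, 5, 10]
print(max_risk_for_cell_size(5))               # 0.2
print(fraction_above_threshold(sizes, 0.2))    # 7 of 22 records exceed 0.2
print(fraction_above_threshold(sizes, 0.33))   # 3 of 22 records exceed 0.33
```

With the same data, more than twice as many records count as high risk under the 0.2 threshold as under 0.33.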

Page 96: The De-identification of Clinical Data

Example

• The choice of threshold has a significant impact on risk assessment results

Page 97: The De-identification of Clinical Data

De-identification Techniques

[Diagram labels: D1, D2, D3; identifying variables; quasi-identifying variables; Randomization, Coding, Heuristics, Analytics, Suppression]

Page 98: The De-identification of Clinical Data

Examples of Analytics

• Table aggregation – disclose only summary tables
• Generalization
• Record or variable suppression
• Geographic aggregation
• Sub-sampling
• Adding noise

Page 99: The De-identification of Clinical Data

Common De-identification Heuristic

• If geographic area has a small population, then:
– Suppress all data from that area
– Aggregate the geographic area
• Applied for a variety of data sets, including public health data sets
• For many applications this heuristic results in significant loss of data or imperils analysis
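A minimal sketch of the two options in this heuristic, using a 20,000 population cutoff as the default. Aggregating to the first character of the area code is a hypothetical choice of parent region, and the area codes, populations, and records are made up.

```python
def suppress_small_areas(records, area_population, cutoff=20_000):
    """Drop every record that comes from an area with a population below the cutoff."""
    return [r for r in records if area_population[r["area"]] >= cutoff]


def aggregate_small_areas(records, area_population, cutoff=20_000):
    """Replace small areas with a coarser region (here: first character of the code)."""
    result = []
    for r in records:
        copy = dict(r)
        if area_population[r["area"]] < cutoff:
            copy["area"] = r["area"][0]  # hypothetical parent region
        result.append(copy)
    return result


# Fictitious populations and records.
area_population = {"K1H": 25_000, "K0A": 8_000}
records = [
    {"area": "K1H", "diagnosis": "asthma"},
    {"area": "K0A", "diagnosis": "diabetes"},
]
print(suppress_small_areas(records, area_population))
print(aggregate_small_areas(records, area_population))
```

Suppression loses the second record entirely, while aggregation keeps it at a coarser geography. This sketch does not check whether the aggregated region itself clears the cutoff.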

Page 100: The De-identification of Clinical Data

Examples

• HIPAA: 20k rule
• Census Bureau: 100k rule
• Statistics Canada: 70k rule
• British Census: 120k rule

Page 101: The De-identification of Clinical Data

The Problem

• Such generic rules ignore the specific variables that are included in a data set
• A smaller cutoff should be used if few variables are in a data set
• A larger cutoff should be used if many variables are in a data set

Page 102: The De-identification of Clinical Data

Automation - I

Page 103: The De-identification of Clinical Data

Automation - II

Page 104: The De-identification of Clinical Data

Province            Our GAPS Models    20,000 Cutoff    70,000 Cutoff    100,000 Cutoff
                    FSA     Pop        FSA     Pop      FSA     Pop      FSA     Pop
Alberta             55%     84%        38%     71%      1.4%    5%       0       0
British Columbia    68%     87%        46%     70%      1.1%    4%       0       0
Manitoba            59%     88%        39%     68%      0       0        0       0
New Brunswick       20%     51%        4.5%    19%      0       0        0       0
Newfoundland        55%     83%        30%     62%      0       0        0       0
Nova Scotia         47%     82%        16%     43%      0       0        0       0
Ontario             69%     91%        49%     76%      1.4%    5%       0.2%    1%
PEI                 57%     90%        43%     79%      0       0        0       0
Quebec              59%     84%        36%     63%      1%      5%       0.25%   0
Saskatchewan        60%     93%        49%     84%      2%      7%       0       2%

Page 105: The De-identification of Clinical Data

Risk Methodology

• De-identification by itself is not sufficient:
– Using low thresholds results in rapid data quality deterioration
– Using high thresholds is perceived as too risky
– We want to create incentives for the data recipients to improve their security and privacy practices
• Methodology allows you to select and justify a threshold

Page 106: The De-identification of Clinical Data

Managing Re-identification Risk

[Diagram labels: Amount of De-identification, Risk Exposure, Mitigating Controls, Motives & Capacity, Invasion-of-Privacy; links marked + and -]

Page 107: The De-identification of Clinical Data

The Tradeoffs

                                  Ability to Re-identify the Data
                                  Low              High
Mitigating Controls    Low        balanced         dangerous
                       High       conservative     balanced

[Annotations on the figure: higher cost burden on data recipient; lower data quality]

Page 108: The De-identification of Clinical Data

Steps in Risk Methodology

• The methodology has two steps to evaluate the overall risks
• First we determine the probability of a re-identification attempt
• Then we determine the re-identification risk to use

Page 109: The De-identification of Clinical Data

Determining Pr Re-identification Attempts


Page 110: The De-identification of Clinical Data

Determining Risk Threshold to Use


Page 111: The De-identification of Clinical Data

Implementation of Methodology

• An important component of this methodology is the ability to audit the data recipient/agent receiving the data
• Update audits are performed regularly
• Data sharing agreements are put in place for external recipients and external agents (internal ones usually covered by employment agreements)
• The elements in the security maturity profile are part of the data sharing agreement

Page 112: The De-identification of Clinical Data

Compliance Audits

• The audits use a publicly available checklist
• Audit results would be generally accepted so that recipients do not need to get audited repeatedly for different disclosures
• Intended to be rapid (one or two days on-site) and cheap ($1k to $2k)

Page 113: The De-identification of Clinical Data

Example - Pharmacy Data

• Request to CHEO for prescription data from a commercial data broker
• Concern that this data could potentially identify patients
• We performed a study to evaluate re-identification risk and come up with an anonymous version of the data

Page 114: The De-identification of Clinical Data

Prescription Records Example

• Patient age in days
• Patient gender
• Forward Sortation Area
• Admission date
• Discharge date
• Diagnosis
• Dispensed drug

• Gender
• Length of stay in days
• Quarter and year of admission
• Patient’s region (first character of the postal code)
• Patient’s age in weeks
• Diagnosis
• Dispensed drug

• Regular third party privacy/security audits
• Breach notification protocols must be in place
• Restrictions on further distribution of raw data
• Data destruction provisions
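A minimal sketch of the generalizations this slide appears to describe, reading its two field lists as the raw record and the de-identified record. That reading, the field names, and the sample values are assumptions; the slide does not say how each field was actually derived.

```python
from datetime import date


def generalize_record(raw: dict) -> dict:
    """Map a raw prescription record to coarser, less identifying fields."""
    admission = raw["admission_date"]
    discharge = raw["discharge_date"]
    quarter = (admission.month - 1) // 3 + 1
    return {
        "gender": raw["gender"],
        # Both dates are replaced by a length of stay plus a coarse admission period.
        "length_of_stay_days": (discharge - admission).days,
        "admission_quarter": f"{admission.year}-Q{quarter}",
        # Forward Sortation Area reduced to the first character of the postal code.
        "region": raw["fsa"][0],
        # Age in days reduced to age in completed weeks.
        "age_weeks": raw["age_days"] // 7,
        "diagnosis": raw["diagnosis"],
        "dispensed_drug": raw["dispensed_drug"],
    }


# Fictitious raw record.
raw = {
    "age_days": 100,
    "gender": "F",
    "fsa": "K1H",
    "admission_date": date(2008, 5, 14),
    "discharge_date": date(2008, 5, 17),
    "diagnosis": "asthma",
    "dispensed_drug": "salbutamol",
}
print(generalize_record(raw))
```

Diagnosis and dispensed drug pass through unchanged, while the dates, geography, and age are the fields that get coarsened.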

Page 115: The De-identification of Clinical Data

An Example Deployment

Page 116: The De-identification of Clinical Data

An Example Deployment

Page 117: The De-identification of Clinical Data

An Example Deployment