Limiting Disclosure in Hippocratic Databases
Kristen LeFevre, Rakesh Agrawal, Vuk Ercegovac, Raghu Ramakrishnan, Yirong Xu, David DeWitt
VLDB, August 31, 2004
Slide 2
8/31/2004 Limiting Disclosure in Hippocratic Databases2
Presentation Outline
- Hippocratic Databases framework for managing privacy, including the problem of limiting disclosure
- Overview of our proposal for integrating policy-driven disclosure control into an existing relational database environment
- Brief discussion of alternative cell-level enforcement models
- Optimized implementation of opt-in and opt-out choices
- Overview of performance evaluation
- Conclusions
Slide 3
Hippocratic Databases and Limited Disclosure
- Hippocratic Databases have been proposed as a framework for managing privacy-sensitive information
- Limited disclosure is one of the defining principles of this framework
- Limited disclosure includes three main ideas:
  - Privacy Policy: organizations define a set of rules describing to whom data may be disclosed (recipients) and how the data may be used (purposes)
  - Consent: data subjects are given control over who may see their personal information and under what circumstances
  - Disclosure Control: the database ensures that privacy policy and data subject consent are enforced with respect to all data access, limiting the outflow of information from the database
Slide 4
Motivating Example
- Consider a group of athletes registering for a major international competition
- Personal information is collected from each athlete, possibly including name, age, nationality, address, phone number, and visa status
- Data must be managed according to the organizing committee's privacy policy:
  - Government officials are allowed to see visa information for the purpose of venue security
  - Team travel agents may see the contact information for athletes from their own country for making travel arrangements
  - The organizing committee may not disclose an athlete's information to journalists without the athlete's consent
Slide 5
Limited Disclosure Framework Goals
- Provide techniques for enforcing a broad class of privacy policy rules
- Privacy policy enforcement should require little or no modification to existing application code
- Policy rules should be stored and managed by the database
- Provide limited disclosure enforcement at the cell level
Slide 6
Limited Disclosure Framework Overview
[Diagram components: Policy Definition, Privacy Meta-Data, Subject Consent, Consent Info, Data Table, Query, Query Modifier]
- Start with an existing database environment with associated applications
- Privacy policy is defined and stored in the database in privacy meta-data tables
- When providing information, data subjects also provide consent for various uses of their data
- Queries are modified so that results respect privacy policy and consent
Slide 7
Policy Definition
- Privacy policy is defined using one of the following XML-based policy definition languages:
  - Platform for Privacy Preferences (P3P)
  - Enterprise Privacy Authorization Language (EPAL)
Slide 8
Privacy Meta-Data and Policy Meta-Language
- Privacy meta-language for expressing the privacy policy in the database
  - Not tied to one particular policy language
  - Many practical P3P and EPAL policies can be translated to this language
- Privacy policy is a set of rules, each naming the data (table and column), the purpose and recipient, and an optional condition
  - The condition must be a predicate that can be expressed in SQL
- Privacy policy rules are stored in the database
Slide 9
Privacy Meta-Data Example
- Journalists may only see athletes' names for the purpose of writing articles with explicit consent
- Government officials may see athletes' visa information for security purposes

Policy rules
Policy  Rule  Purpose   Recipient   Table     Column   CondID
P1      R1    Security  Govt Off.   Athletes  Visa     -
P1      R2    Security  Govt Off.   Athletes  Name     -
P1      R3    Travel    Travel Ag.  Athletes  Name     -
P1      R4    Travel    Travel Ag.  Athletes  Phone    -
P1      R5    Articles  Journalist  Athletes  Name     C1
P1      R6    Articles  Journalist  Athletes  Address  C2

Conditions
CondID  Predicate
C1      EXISTS (SELECT Name_choice FROM Athlete_choices WHERE Athletes.Athlete# = Athlete_choices.Athlete# AND Athlete_choices.Name_choice = 1)
C2      EXISTS (SELECT Name_choice FROM Athlete_choices WHERE Athletes.Athlete# = Athlete_choices.Athlete# AND Athlete_choices.Address_choice = 1)
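A minimal sketch of how the rule and condition tables above could be stored and consulted, using Python's built-in sqlite3 module. The table names `policy_rules` and `conditions`, the column names, the `lookup` helper, and the use of `athlete_id` in place of `Athlete#` are all assumptions made for illustration; the deck does not show the actual meta-data schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE conditions (cond_id TEXT PRIMARY KEY, predicate TEXT NOT NULL);
CREATE TABLE policy_rules (
    policy TEXT, rule TEXT, purpose TEXT, recipient TEXT,
    tbl TEXT, col TEXT,
    cond_id TEXT REFERENCES conditions(cond_id)  -- NULL means unconditional
);
INSERT INTO conditions VALUES
  ('C1', 'EXISTS (SELECT 1 FROM athlete_choices c WHERE c.athlete_id = athletes.athlete_id AND c.name_choice = 1)'),
  ('C2', 'EXISTS (SELECT 1 FROM athlete_choices c WHERE c.athlete_id = athletes.athlete_id AND c.address_choice = 1)');
INSERT INTO policy_rules VALUES
  ('P1', 'R1', 'Security', 'Govt Off.',  'Athletes', 'Visa',    NULL),
  ('P1', 'R2', 'Security', 'Govt Off.',  'Athletes', 'Name',    NULL),
  ('P1', 'R3', 'Travel',   'Travel Ag.', 'Athletes', 'Name',    NULL),
  ('P1', 'R4', 'Travel',   'Travel Ag.', 'Athletes', 'Phone',   NULL),
  ('P1', 'R5', 'Articles', 'Journalist', 'Athletes', 'Name',    'C1'),
  ('P1', 'R6', 'Articles', 'Journalist', 'Athletes', 'Address', 'C2');
""")

def lookup(recipient, purpose, tbl, col):
    """Return ('deny', None), ('allow', None) or ('conditional', predicate)."""
    row = conn.execute(
        """SELECT r.cond_id, c.predicate
           FROM policy_rules r LEFT JOIN conditions c ON c.cond_id = r.cond_id
           WHERE r.recipient = ? AND r.purpose = ? AND r.tbl = ? AND r.col = ?""",
        (recipient, purpose, tbl, col),
    ).fetchone()
    if row is None:          # no rule mentions this cell: disclosure is prohibited
        return ("deny", None)
    cond_id, predicate = row
    if cond_id is None:      # rule exists and carries no condition
        return ("allow", None)
    return ("conditional", predicate)

visa_for_officials = lookup("Govt Off.", "Security", "Athletes", "Visa")
name_for_journalists = lookup("Journalist", "Articles", "Athletes", "Name")
phone_for_journalists = lookup("Journalist", "Articles", "Athletes", "Phone")
```

Treating the absence of a rule as "deny" is a design choice of this sketch; it matches the idea that only explicitly permitted disclosures flow out of the database.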
Slide 10
Query Modification
- Implemented two alternative algorithms for modifying queries to incorporate policy rules and consent information
- Queries are modified in such a way that query results follow one of our cell-level semantic models
Slide 11
Enforcement Models
- Row (tuple)-level enforcement is insufficient for enforcing arbitrary policies when existing database schemas are not designed with the policy in mind
Slide 12
An Example

Table Athletes
Athlete#  Name              Age  Address    Phone
1         Michael Phelps    19   Baltimore  111-1111
2         Natalie Coughlin  22   Berkeley   222-2222
3         Ian Thorpe        23   Sydney     333-3333
4         Jenny Thompson    31   New York   444-4444

Consent information for journalists writing stories (X marks a prohibited cell)
#  Athlete#  Name  Age  Address  Phone
1
2  X         X     X    X        X
3            X     X
4                  X    X        X
Slide 13
Row-Level Enforcement

Table Athletes
Athlete#  Name              Age  Address    Phone
1         Michael Phelps    19   Baltimore  111-1111
2         Natalie Coughlin  22   Berkeley   222-2222
3         Ian Thorpe        23   Sydney     333-3333
4         Jenny Thompson    31   New York   444-4444

Consent information for journalists writing stories (X marks a prohibited cell)
#  Athlete#  Name  Age  Address  Phone
1
2  X         X     X    X        X
3            X     X
4                  X    X        X
Slide 14
Row-Level Enforcement

Result after filtering Athlete #2, because no consent is provided
Athlete#  Name            Age  Address    Phone
1         Michael Phelps  19   Baltimore  111-1111
3         Ian Thorpe      23   Sydney     333-3333
4         Jenny Thompson  31   New York   444-4444

Consent information for journalists writing stories (X marks a prohibited cell)
#  Athlete#  Name  Age  Address  Phone
1
2  X         X     X    X        X
3            X     X
4                  X    X        X

- Row-level enforcement must either disclose prohibited information or restrict information that should be available!
Slide 15
Enforcement Models
- Cell-level enforcement
  - Table Semantics model
  - Query Semantics model
Slide 16
Table Semantics Enforcement
1. Mask prohibited cells with the null value
2. Filter rows where the primary key is prohibited
3. Conceptually, the query is performed on top of this view
Slide 17
Table Semantics Enforcement
- SQL's null value represents "no value"
- Desirable semantics for prohibited values:
  - Predicates applied to null never evaluate to true
  - Null does not join with other values
  - Null is not included when computing aggregates
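The three null properties listed above can be checked directly. The following is a small sketch in Python with the built-in sqlite3 module; the table name `masked` and its contents are hypothetical, chosen to resemble a table whose prohibited cells have already been replaced with null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE masked (name TEXT, age INTEGER);
INSERT INTO masked VALUES
  ('Michael Phelps', 19),
  (NULL, NULL),              -- both cells prohibited
  ('Jenny Thompson', NULL);  -- age prohibited
""")

# 1. A predicate on null is never true, and neither is its negation.
older = conn.execute("SELECT name FROM masked WHERE age > 30").fetchall()
not_older = conn.execute("SELECT name FROM masked WHERE NOT (age > 30)").fetchall()

# 2. Null does not join with other values (not even with another null).
join_count = conn.execute(
    "SELECT COUNT(*) FROM masked a JOIN masked b ON a.age = b.age"
).fetchone()[0]

# 3. Null is skipped by aggregates over the column.
avg_age, counted = conn.execute("SELECT AVG(age), COUNT(age) FROM masked").fetchone()
```

Here only the row with age 19 satisfies either form of the predicate, joins with itself, or contributes to the aggregates, so a masked age cannot be probed through selection, join, or aggregation.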
Slide 18
Table Semantics Enforcement

Table Athletes
Athlete#  Name              Age  Address    Phone
1         Michael Phelps    19   Baltimore  111-1111
2         Natalie Coughlin  22   Berkeley   222-2222
3         Ian Thorpe        23   Sydney     333-3333
4         Jenny Thompson    31   New York   444-4444

Consent information (X marks a prohibited cell)
#  Athlete#  Name  Age  Address  Phone
1
2  X         X     X    X        X
3            X     X
4                  X    X        X

Step 1: mask prohibited cells with null
Athlete#  Name            Age   Address    Phone
1         Michael Phelps  19    Baltimore  111-1111
null      null            null  null       null
3         null            null  Sydney     333-3333
4         Jenny Thompson  null  null       null

Step 2: filter rows where the primary key is prohibited
Athlete#  Name            Age   Address    Phone
1         Michael Phelps  19    Baltimore  111-1111
3         null            null  Sydney     333-3333
4         Jenny Thompson  null  null       null
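As a rough illustration of the two steps above, the masked-and-filtered table can be expressed as an ordinary SQL view. This sketch uses Python's sqlite3; the view name `athletes_journalist`, the `athlete_id` and `*_choice` column names, and the 0/1 encoding of consent are assumptions for the example, not the schema used in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (
    athlete_id INTEGER PRIMARY KEY, name TEXT, age INTEGER, address TEXT, phone TEXT);
INSERT INTO athletes VALUES
  (1, 'Michael Phelps',   19, 'Baltimore', '111-1111'),
  (2, 'Natalie Coughlin', 22, 'Berkeley',  '222-2222'),
  (3, 'Ian Thorpe',       23, 'Sydney',    '333-3333'),
  (4, 'Jenny Thompson',   31, 'New York',  '444-4444');

-- 1 = consent given, 0 = prohibited (mirrors the X marks in the consent table)
CREATE TABLE athlete_choices (
    athlete_id INTEGER PRIMARY KEY, id_choice INTEGER, name_choice INTEGER,
    age_choice INTEGER, address_choice INTEGER, phone_choice INTEGER);
INSERT INTO athlete_choices VALUES
  (1, 1, 1, 1, 1, 1),
  (2, 0, 0, 0, 0, 0),
  (3, 1, 0, 0, 1, 1),
  (4, 1, 1, 0, 0, 0);

CREATE VIEW athletes_journalist AS
SELECT a.athlete_id,
       CASE WHEN c.name_choice    = 1 THEN a.name    END AS name,     -- step 1: mask
       CASE WHEN c.age_choice     = 1 THEN a.age     END AS age,
       CASE WHEN c.address_choice = 1 THEN a.address END AS address,
       CASE WHEN c.phone_choice   = 1 THEN a.phone   END AS phone
FROM athletes a JOIN athlete_choices c ON c.athlete_id = a.athlete_id
WHERE c.id_choice = 1;                                                -- step 2: filter
""")

visible = conn.execute(
    "SELECT * FROM athletes_journalist ORDER BY athlete_id"
).fetchall()
```

A `CASE` with no `ELSE` branch yields null, which is what performs the masking; queries issued against the view then see only the permitted cells.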
Slide 19
Enforcement Models
- Cell-level enforcement
  - Table Semantics model
  - Query Semantics model
Slide 20
Query Semantics Enforcement
1. Mask prohibited cells with the null value
2. Execute the query on top of the masked table
3. Filter rows that are entirely null from the result set
Slide 21
Query Semantics Enforcement

Issue query: SELECT Name, Age FROM Athletes

Table Athletes after masking prohibited cells with null
Athlete#  Name            Age   Address    Phone
1         Michael Phelps  19    Baltimore  111-1111
null      null            null  null       null
3         null            null  Sydney     333-3333
4         Jenny Thompson  null  null       null

Result under Table Semantics
Name            Age
Michael Phelps  19
null            null
Jenny Thompson  null

Result under Query Semantics (rows that are entirely null are filtered from the result set)
Name            Age
Michael Phelps  19
Jenny Thompson  null
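The difference between the two models for this query can be reproduced with a short sketch, again in Python with sqlite3. The schema is cut down to the columns the query touches, and the names `athlete_id`, `id_choice`, `name_choice`, and `age_choice` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (athlete_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO athletes VALUES
  (1, 'Michael Phelps', 19), (2, 'Natalie Coughlin', 22),
  (3, 'Ian Thorpe', 23),     (4, 'Jenny Thompson', 31);
CREATE TABLE athlete_choices (
    athlete_id INTEGER PRIMARY KEY, id_choice INTEGER,
    name_choice INTEGER, age_choice INTEGER);
INSERT INTO athlete_choices VALUES (1,1,1,1), (2,0,0,0), (3,1,0,0), (4,1,1,0);
""")

# Masked table shared by both models: prohibited cells become null.
masked = """
SELECT a.athlete_id AS athlete_id, c.id_choice AS id_choice,
       CASE WHEN c.name_choice = 1 THEN a.name END AS name,
       CASE WHEN c.age_choice  = 1 THEN a.age  END AS age
FROM athletes a JOIN athlete_choices c ON c.athlete_id = a.athlete_id
"""

# Table semantics: drop rows whose primary key is prohibited, then project.
table_semantics = conn.execute(
    f"SELECT name, age FROM ({masked}) WHERE id_choice = 1 ORDER BY athlete_id"
).fetchall()

# Query semantics: project first, then drop result rows that are entirely null.
query_semantics = conn.execute(
    f"SELECT name, age FROM ({masked}) "
    "WHERE name IS NOT NULL OR age IS NOT NULL ORDER BY athlete_id"
).fetchall()
```

Under table semantics Ian Thorpe's row survives as an all-null result row because his primary key is permitted; under query semantics that row is removed because nothing in the projected columns is visible.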
Slide 22
Query Modification Example (Table Semantics)

Original query:
    SELECT Name
    FROM Athletes
    WHERE Name = 'Michael Phelps'

Modified query:
    SELECT CASE WHEN EXISTS (SELECT Name_Choice
                             FROM Athlete_Choices
                             WHERE Athletes.Athlete# = Athlete_Choices.Athlete#
                               AND Athlete_Choices.Name_Choice = 1)
                THEN Name ELSE null END
    FROM Athletes
    WHERE Name = 'Michael Phelps'
      AND EXISTS (SELECT Athlete#_Choice
                  FROM Athlete_Choices
                  WHERE Athletes.Athlete# = Athlete_Choices.Athlete#
                    AND Athlete_Choices.Athlete#_Choice = 1)
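One way to mechanize a rewrite of this kind is sketched below in Python with sqlite3. It is not the algorithm from the talk: instead of inlining the CASE expressions as in the example above, it replaces the table reference with a masked derived table, so that predicates are evaluated on masked values. The helpers `masked_source` and `rewrite`, the `athlete_id` key, and the `<column>_choice` naming convention are all hypothetical, and the textual substitution is deliberately naive.

```python
import sqlite3

def masked_source(table, key, columns, choices):
    """Build a derived table that masks each column and filters on the key choice."""
    cases = ", ".join(
        f"CASE WHEN c.{col}_choice = 1 THEN t.{col} END AS {col}" for col in columns
    )
    return (
        f"(SELECT t.{key} AS {key}, {cases} "
        f"FROM {table} t JOIN {choices} c ON c.{key} = t.{key} "
        f"WHERE c.{key}_choice = 1)"
    )

def rewrite(query, table, key, columns, choices):
    """Naively substitute the first 'FROM <table>' with the masked derived table."""
    source = masked_source(table, key, columns, choices)
    return query.replace(f"FROM {table}", f"FROM {source} AS {table}", 1)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (athlete_id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO athletes VALUES
  (1, 'Michael Phelps'), (2, 'Natalie Coughlin'),
  (3, 'Ian Thorpe'),     (4, 'Jenny Thompson');
CREATE TABLE athlete_choices (
    athlete_id INTEGER PRIMARY KEY, athlete_id_choice INTEGER, name_choice INTEGER);
INSERT INTO athlete_choices VALUES (1,1,1), (2,0,0), (3,1,0), (4,1,1);
""")

def run(name):
    """Rewrite and execute the example query for one (trusted, literal) name."""
    original = f"SELECT name FROM athletes WHERE name = '{name}'"
    modified = rewrite(original, "athletes", "athlete_id", ["name"], "athlete_choices")
    return conn.execute(modified).fetchall()

phelps = run("Michael Phelps")
thorpe = run("Ian Thorpe")
coughlin = run("Natalie Coughlin")
```

A production rewriter would operate on the parsed query rather than on its text, and would take the masking conditions from the stored policy rules instead of a naming convention.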
Slide 23
Database-level disclosure control
- The database is the best place to enforce limited disclosure
- More efficient, flexible, and secure than an application-level approach
  - Need not fetch prohibited data from the database
  - A naive application-level approach leads to privacy leaks when enforcement is applied at the cell level
  - Consider the query SELECT Name, Age FROM Athletes WHERE Age > 30
Slide 24
Example: Difficulties of application-level disclosure control

Table Athletes
Athlete#  Name              Age  Address    Phone
1         Michael Phelps    19   Baltimore  111-1111
2         Natalie Coughlin  22   Berkeley   222-2222
3         Ian Thorpe        23   Sydney     333-3333
4         Jenny Thompson    31   New York   444-4444

Consent information (X marks a prohibited cell)
#  Athlete#  Name  Age  Address  Phone
1
2  X         X     X    X        X
3            X     X
4                  X    X        X

Step 1: query the database; retrieve results to the application
Name            Age
Jenny Thompson  31

Step 2: check policy and consent info; replace prohibited cells with null
Name            Age
Jenny Thompson  null

- Based on this query, it is easy to infer that Jenny Thompson's age is greater than 30!
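The leak described above is easy to reproduce. The sketch below, in Python with sqlite3, contrasts naive application-side masking with evaluating the predicate on already-masked data inside the database. The helper `allowed`, the extra `athlete_id` column fetched by the application, and the `*_choice` column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE athletes (athlete_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO athletes VALUES
  (1, 'Michael Phelps', 19), (2, 'Natalie Coughlin', 22),
  (3, 'Ian Thorpe', 23),     (4, 'Jenny Thompson', 31);
CREATE TABLE athlete_choices (
    athlete_id INTEGER PRIMARY KEY, name_choice INTEGER, age_choice INTEGER);
INSERT INTO athlete_choices VALUES (1,1,1), (2,0,0), (3,0,0), (4,1,0);
""")

def allowed(athlete_id, column):
    """Application-side consent check for one cell (column is a trusted literal)."""
    row = conn.execute(
        f"SELECT {column}_choice FROM athlete_choices WHERE athlete_id = ?",
        (athlete_id,),
    ).fetchone()
    return row is not None and row[0] == 1

# Naive application-level control: run the unmodified query, then mask cells.
fetched = conn.execute(
    "SELECT athlete_id, name, age FROM athletes WHERE age > 30"
).fetchall()
app_level = [
    (name if allowed(i, "name") else None, age if allowed(i, "age") else None)
    for i, name, age in fetched
]

# Database-level control: the predicate sees the masked age, so no row qualifies.
db_level = conn.execute("""
SELECT name, age FROM (
    SELECT CASE WHEN c.name_choice = 1 THEN a.name END AS name,
           CASE WHEN c.age_choice  = 1 THEN a.age  END AS age
    FROM athletes a JOIN athlete_choices c ON c.athlete_id = a.athlete_id)
WHERE age > 30
""").fetchall()
```

The application-level result still contains a row for Jenny Thompson with a null age, and the mere presence of that row reveals that the hidden age exceeds 30; the database-level result is empty.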
Slide 25
Database-level disclosure control
- The database is a logical place to enforce limited disclosure
- More efficient and flexible than an application-level rule engine approach
  - Need not fetch prohibited data from the database
  - A naive application-level approach leads to privacy leaks when enforcement is applied at the cell level
  - Consider the query SELECT Name, Age FROM Athletes WHERE Age > 30
  - The alternative approach performs much of the query processing in the application
  - It is even more complicated to compute aggregates and joins when some cells are prohibited!
Slide 26
Optimized Implementation of Opt-in and Opt-out Conditions
- SQL queries offer much flexibility for defining disclosure conditions
- In practice, simple opt-in and opt-out choices are often used to express subject consent and are extremely important
  - Sufficient for expressing P3P policy rules
  - Sufficient for expressing many HIPAA-mandated policies, for example
- Implemented several techniques for storing consent and optimizing this type of condition
Slide 27
Optimized Implementation of Opt-in and Opt-out Conditions
- Several alternative storage techniques:
  - Internal column (inline) representation
  - External, single table representation
  - External, multiple table representation
Slide 28
Optimized Implementation of Opt-in and Opt-out Conditions
Internal column representation

Table Athletes (choice columns stored inline)
                                                        Choices
Athlete#  Name              Age  Address    Phone     | Athlete#  Name  Age  Address  Phone
1         Michael Phelps    19   Baltimore  111-1111  | yes       yes   yes  yes      yes
2         Natalie Coughlin  22   Berkeley   222-2222  | no        no    no   no       no
3         Ian Thorpe        23   Sydney     333-3333  | yes       no    no   yes      yes
4         Jenny Thompson    31   New York   444-4444  | yes       yes   no   no       no
Slide 29
Optimized Implementation of Opt-in and Opt-out Conditions
External, single table representation

Table Athletes
Athlete#  Name              Age  Address    Phone
1         Michael Phelps    19   Baltimore  111-1111
2         Natalie Coughlin  22   Berkeley   222-2222
3         Ian Thorpe        23   Sydney     333-3333
4         Jenny Thompson    31   New York   444-4444

Consent Table
ID  Athlete#  Name  Age  Address  Phone
1   yes       yes   yes  yes      yes
2   no        no    no   no       no
3   yes       no    no   yes      yes
4   yes       yes   no   no       no
Slide 30
Optimized Implementation of Opt-in and Opt-out Conditions
External, multiple table representation

Table Athletes
Athlete#  Name              Age  Address    Phone
1         Michael Phelps    19   Baltimore  111-1111
2         Natalie Coughlin  22   Berkeley   222-2222
3         Ian Thorpe        23   Sydney     333-3333
4         Jenny Thompson    31   New York   444-4444

Positive Consent Tables (one table per column, listing the athletes who consented)
Athlete#: 1, 3, 4
Name:     1, 4
Age:      1
Address:  1, 3
Phone:    1, 3
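To make the three storage options concrete, the sketch below builds each one for the Name choice only and masks the Name column against it, using Python's sqlite3. All table and column names (`athletes_inline`, `athlete_choices`, `name_consent`, `athlete_id`, `name_choice`) are assumptions for illustration, and the masking expressions are one plausible form rather than the ones measured in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Internal (inline): the choice lives in the data table itself.
CREATE TABLE athletes_inline (
    athlete_id INTEGER PRIMARY KEY, name TEXT, name_choice INTEGER);
INSERT INTO athletes_inline VALUES
  (1, 'Michael Phelps', 1), (2, 'Natalie Coughlin', 0),
  (3, 'Ian Thorpe', 0),     (4, 'Jenny Thompson', 1);

-- Data table shared by the two external representations.
CREATE TABLE athletes (athlete_id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO athletes SELECT athlete_id, name FROM athletes_inline;

-- External, single table: one row of choices per athlete.
CREATE TABLE athlete_choices (athlete_id INTEGER PRIMARY KEY, name_choice INTEGER);
INSERT INTO athlete_choices SELECT athlete_id, name_choice FROM athletes_inline;

-- External, multiple tables: one table per choice, positive consent only.
CREATE TABLE name_consent (athlete_id INTEGER PRIMARY KEY);
INSERT INTO name_consent VALUES (1), (4);
""")

inline = conn.execute("""
SELECT CASE WHEN name_choice = 1 THEN name END
FROM athletes_inline ORDER BY athlete_id""").fetchall()

single = conn.execute("""
SELECT CASE WHEN c.name_choice = 1 THEN a.name END
FROM athletes a JOIN athlete_choices c ON c.athlete_id = a.athlete_id
ORDER BY a.athlete_id""").fetchall()

multiple = conn.execute("""
SELECT CASE WHEN EXISTS (SELECT 1 FROM name_consent n
                         WHERE n.athlete_id = a.athlete_id)
            THEN a.name END
FROM athletes a ORDER BY a.athlete_id""").fetchall()
```

All three queries return the same masked column; they differ in where the choice is read from, which is the source of the storage tradeoffs.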
Slide 31
Overview of Performance Experiments
- Implemented query modification algorithms on top of DB2 version 8.1
- Focused on measuring performance for unconditional rules and for rules with opt-in and opt-out choices
- Experimental setup:
  - Synthetic dataset based on the Wisconsin Benchmark
  - Dual-processor 1.8 GHz AMD machine running Windows 2000 Server
  - 2 gigabytes of memory, 50-megabyte buffer pool
  - Queries run warm and cold; here we report the warm numbers (error less than 5% with 95% confidence)
Slide 32
[Figure: Elapsed Time (seconds) vs. Choice Selectivity (%); series: Unmodified, Modified Internal, Modified External Multiple]
- Measured performance of a query selecting all records from a 5-million-record table
- Compared performance of original and modified queries for varied choice selectivity
- Not surprisingly, performance is actually better for modified queries when we use privacy enforcement as an additional selection condition
  - Able to use indexes on choice values
  - Shows the importance of database-level privacy enforcement for performance
Slide 33
- Measured overhead cost using a query that selects all records
- Choice selectivity = 100%
  - Worst-case scenario: no rows are filtered due to privacy constraints, but the query incurs all costs of cell-level checking
- Full bar represents elapsed time; bottom portion of bar is CPU time
- Much of the cost of privacy enforcement is CPU cost, so it scales well as queries become more I/O intensive
Slide 34
Additional Performance Results
- Cost of rewriting queries is small
  - Must only be done once if the query is pre-compiled
- Found that the query semantics enforcement model is often faster than table semantics because frequently more rows are filtered
- Tradeoffs between choice storage techniques:
  - Number of choices stored for a particular table: as more choices are stored, performance of the internal representation suffers
  - Number of choices enforced for a particular query: as more choices are enforced, performance of the external multiple representation suffers
- Tradeoffs between query modification algorithms
  - Described in the paper
Slide 35
Conclusions
- Limited disclosure is a necessary component of a comprehensive data privacy management system
- Proposed a framework for enforcing limited disclosure at the database level
  - More efficient and flexible than application-level disclosure control
  - Techniques also have broader use for other applications requiring policy-driven, fine-grained disclosure control
- Framework can be deployed to an existing environment with minimal modification to legacy applications and existing schemas
- Query modification and consent storage approaches are efficient enough to be viable in practice