Lecture 10 : 590.03 Fall 12 1
Post-processing outputs for better utility
CompSci 590.03
Instructor: Ashwin Machanavajjhala
Announcement
• Project proposal submission deadline is Fri, Oct 12 noon.
Recap: Differential Privacy
For every pair of inputs D1, D2 that differ in one value, and for every output O,
the adversary should not be able to distinguish between D1 and D2 based on O:
$\log \frac{\Pr[A(D_1) = O]}{\Pr[A(D_2) = O]} < \epsilon \quad (\epsilon > 0)$
Recap: Laplacian Distribution
[Figure: density of the Laplace distribution Lap(λ)]
[Diagram: the researcher sends a query q to the database; the true answer is q(d), and the researcher receives q(d) + η, where η is random noise.]
Noise density: $h(\eta) \propto \exp(-|\eta| / \lambda)$
Privacy depends on the λ parameter
Mean: 0, Variance: $2\lambda^2$
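The density and variance above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the sampler uses the standard fact that the difference of two independent exponentials with mean λ is Laplace-distributed with scale λ.

```python
import math
import random

def laplace_pdf(eta, lam):
    """Density of Lap(lam) at eta: exp(-|eta| / lam) / (2 * lam)."""
    return math.exp(-abs(eta) / lam) / (2.0 * lam)

def sample_laplace(lam, rng=random):
    """Draw one Lap(lam) sample as the difference of two exponentials.

    Each exponential has rate 1 / lam (mean lam); their difference is
    Laplace-distributed with mean 0 and scale lam.
    """
    return rng.expovariate(1.0 / lam) - rng.expovariate(1.0 / lam)

def laplace_variance(lam):
    """Variance of Lap(lam), i.e. 2 * lam^2."""
    return 2.0 * lam ** 2

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, only to make the demo repeatable
    draws = [sample_laplace(1.0, rng) for _ in range(5)]
    print(draws)
    print(laplace_variance(1.0))
```

A larger λ spreads the density out (more noise, more privacy); a smaller λ concentrates it around 0.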
Recap: Laplace Mechanism
Thm: If the sensitivity of the query is S, then adding noise drawn from Lap(λ) to the true answer, with
$\lambda = S / \epsilon$
guarantees ε-differential privacy.
Sensitivity: the smallest number S(q) such that, for any d, d’ differing in one entry,
$\| q(d) - q(d') \|_1 \le S(q)$
Histogram query: Sensitivity = 2
• Variance / error on each entry = $2\lambda^2 = 2 \times 4 / \epsilon^2 = O(1/\epsilon^2)$
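As a rough illustration of the theorem, the sketch below adds Lap(S/ε) noise to each answer of a query and checks the log-ratio bound from the definition of differential privacy for a single numeric answer. All function names are hypothetical, and the sampler (difference of two exponentials) is one of several standard ways to draw Laplace noise.

```python
import random

def laplace_mechanism(true_answers, sensitivity, epsilon, rng=random):
    """Return true_answers with independent Lap(sensitivity / epsilon) noise added."""
    lam = sensitivity / epsilon
    return [a + rng.expovariate(1.0 / lam) - rng.expovariate(1.0 / lam)
            for a in true_answers]

def per_entry_variance(sensitivity, epsilon):
    """Variance of the noise on each entry: 2 * (S / epsilon)^2."""
    return 2.0 * (sensitivity / epsilon) ** 2

def log_density_ratio(output, answer1, answer2, lam):
    """log of h(output - answer1) / h(output - answer2) for Lap(lam) noise."""
    return (abs(output - answer2) - abs(output - answer1)) / lam

if __name__ == "__main__":
    epsilon = 0.5
    histogram = [20, 10, 8, 5]                      # made-up counts
    print(laplace_mechanism(histogram, 2, epsilon, random.Random(1)))
    print(per_entry_variance(2, epsilon))           # 8 / epsilon^2 = 32.0
```

For a single query whose answers on neighbouring inputs differ by at most S, the log-ratio is bounded by S/λ = ε at every possible output, which is exactly the guarantee stated in the theorem.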
This class
• What is the optimal method to answer a batch of queries?
How to answer a batch of queries?
• Database of values {x1, x2, …, xk}
• Query set (each noisy answer η is the true answer plus noise δ):
– Value of x1: η1 = x1 + δ1
– Value of x2: η2 = x2 + δ2
– Value of x1 + x2: η3 = x1 + x2 + δ3
• But we know that η1 and η2 should sum up to η3!
Two Approaches
• Constrained inference
– Ensure that the returned answers are consistent with each other.
• Query Strategy
– Answer a different set of strategy queries A
– Answer original queries using A
– Examples: Universal Histograms, Wavelet Mechanism, Matrix Mechanism
Constrained Inference
Constrained Inference
• Let x1 and x2 be the original values. We observe noisy values η1, η2 and η3.
• We would like to reconstruct the best estimators y1 (for x1), y2 (for x2) and y3 (for x1 + x2) from the noisy values.
• That is, we want to find the values of y1, y2, y3 that solve:
$\min_{y_1, y_2, y_3} \; (y_1 - \eta_1)^2 + (y_2 - \eta_2)^2 + (y_3 - \eta_3)^2 \quad \text{s.t.} \quad y_1 + y_2 = y_3$
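For this three-query example the least-squares problem has a closed form. Substituting y3 = y1 + y2 and setting the two partial derivatives to zero gives 2·y1 + y2 = η1 + η3 and y1 + 2·y2 = η2 + η3. The sketch below solves that system; the function name `reconcile` and the sample numbers are illustrative, not taken from the lecture.

```python
def reconcile(eta1, eta2, eta3):
    """Least-squares estimates (y1, y2, y3) subject to y1 + y2 = y3.

    Obtained by substituting y3 = y1 + y2 into the objective and solving
    the resulting 2x2 linear system.
    """
    y1 = (2 * eta1 - eta2 + eta3) / 3
    y2 = (2 * eta2 - eta1 + eta3) / 3
    y3 = y1 + y2  # equals (eta1 + eta2 + 2 * eta3) / 3
    return y1, y2, y3

if __name__ == "__main__":
    # Noisy answers that are inconsistent: 10 + 20 != 33.
    print(reconcile(10, 20, 33))  # (11.0, 21.0, 32.0)
```

The discrepancy η3 − (η1 + η2) is split into thirds: each of y1 and y2 moves up by one third of it and y3 moves down by one third. When the noisy answers are already consistent, they are returned unchanged.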
Constrained Inference [Hay et al VLDB 2010]
Sorted Unattributed Histograms
• Counts of diseases
– (without associating a particular count to the corresponding disease)
• Degree sequence: List of node degrees
– (without associating a degree to a particular node)
• Constraint: The values are sorted
Sorted Unattributed Histograms
True values: 20, 10, 8, 8, 8, 5, 3, 2
Noisy values: 25, 9, 13, 7, 10, 6, 3, 1 (noise from Lap(1/ε))
Proof:?
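One way to enforce the sortedness constraint is to take the least-squares projection of the noisy values onto non-increasing sequences. The sketch below does this with the standard pool-adjacent-violators procedure from isotonic regression. This is an assumption about how to compute the projection, not necessarily the formulation used in the lecture or in Hay et al., and the function name is hypothetical.

```python
def project_non_increasing(noisy):
    """Closest (in squared error) non-increasing sequence to `noisy`.

    Pool-adjacent-violators: scan left to right, and whenever a value
    exceeds the mean of the block before it, merge the two blocks and
    replace them by their common mean.
    """
    blocks = []  # each block is [sum of values, number of values]
    for value in noisy:
        blocks.append([float(value), 1])
        # Merge while the last block's mean is larger than the previous one's.
        while (len(blocks) > 1 and
               blocks[-1][0] / blocks[-1][1] > blocks[-2][0] / blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    result = []
    for total, count in blocks:
        result.extend([total / count] * count)
    return result

if __name__ == "__main__":
    noisy = [25, 9, 13, 7, 10, 6, 3, 1]
    print(project_non_increasing(noisy))
    # [25.0, 11.0, 11.0, 8.5, 8.5, 6.0, 3.0, 1.0]
```

On the noisy values from the slide, the out-of-order pairs (9, 13) and (7, 10) are each averaged, which pulls them toward the true run of equal values 8, 8, 8.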
Sorted Unattributed Histograms
Sorted Unattributed Histograms
• n: number of values in the histogram
• d: number of distinct values in the histogram
• ni: number of times the i-th distinct value appears in the histogram
Query Strategy
[Diagram: private data I, original query workload W, and strategy query workload A. The strategy answers A(I) are released under differential privacy as noisy strategy answers Ã(I), from which the noisy workload answers W̃(I) are derived.]
Range Queries
• Given a set of values {x1, x2, …, xn}
• Range query: q(j,k) = xj + … + xk
Q: Suppose we want to answer all range queries?
Strategy 1: Answer all range queries using the Laplace mechanism
• Sensitivity = $O(n^2)$, since a single xi can appear in $O(n^2)$ of the $O(n^2)$ range queries
• $O(n^4/\epsilon^2)$ error on each range query, i.e. $O(n^6/\epsilon^2)$ total error across all $O(n^2)$ range queries
• May reduce using constrained optimization …
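The quadratic sensitivity can be checked by brute force: count, for each position i, how many range queries q(j,k) contain xi, and take the maximum. The sketch below does this under the simplifying assumption that neighbouring inputs change a single xi by 1, so constant factors such as the 2 in the histogram sensitivity are ignored. The function name is hypothetical.

```python
def range_workload_sensitivity(n):
    """Largest number of range queries q(j, k), 1 <= j <= k <= n, containing one x_i."""
    best = 0
    for i in range(1, n + 1):
        count = sum(1
                    for j in range(1, n + 1)
                    for k in range(j, n + 1)
                    if j <= i <= k)
        best = max(best, count)
    return best

if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, range_workload_sensitivity(n))
```

Position i is contained in i·(n − i + 1) ranges, which peaks near the middle at roughly n²/4, so the Laplace scale S/ε grows quadratically in n.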
Range Queries
• Given a set of values {x1, x2, …, xn}
• Range query: q(j,k) = xj + … + xk
Q: Suppose we want to answer all range queries?
Strategy 2: Answer all xi queries using the Laplace mechanism. Answer range queries by summing the noisy xi values.
• $O(1/\epsilon^2)$ error for each xi.
• Error(q(1,n)) = $O(n/\epsilon^2)$, since n independent noisy values are summed
• Total error on all range queries: $O(n^3/\epsilon^2)$
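The cubic total can be verified by summing the error of every range query directly. In the sketch below, `leaf_variance` stands for the noise variance on a single xi (for example 8/ε² for a histogram with sensitivity 2). Both the parameter and the function name are illustrative assumptions.

```python
def strategy2_total_error(n, leaf_variance):
    """Sum of the noise variances of all range queries answered from noisy x_i.

    q(j, k) adds up k - j + 1 independent noisy values, so its variance is
    (k - j + 1) * leaf_variance.
    """
    total = 0.0
    for j in range(1, n + 1):
        for k in range(j, n + 1):
            total += (k - j + 1) * leaf_variance
    return total

if __name__ == "__main__":
    n = 8
    print(strategy2_total_error(n, 1.0))   # 120.0
    print(n * (n + 1) * (n + 2) / 6)        # closed form, also 120.0
```

The closed form n(n+1)(n+2)/6 times the per-leaf variance is cubic in n, matching the O(n³/ε²) bound on the slide.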
Universal Histograms for Range Queries
Strategy 3: Answer sufficient statistics using the Laplace mechanism. Answer range queries using the noisy sufficient statistics.
x1 x2 x3 x4 x5 x6 x7 x8
x12 x34 x56 x78
x1234 x5678
x1-8
[Hay et al VLDB 2010]
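Universal Histograms for Range Queries
• Sensitivity: log n, so each noisy count has variance $2 \log^2 n / \epsilon^2$
• q(2,6) = x2 + x3 + x4 + x5 + x6, Error = $2 \times 5 \log^2 n / \epsilon^2$
• q(2,6) = x2 + x34 + x56, Error = $2 \times 3 \log^2 n / \epsilon^2$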
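The decomposition of q(2,6) into x2 + x34 + x56 can be computed with the usual top-down recursion over a binary tree of intervals. The sketch below is illustrative: nodes are represented as (start, end) leaf intervals, the function names are hypothetical, and the sensitivity is passed in as a parameter so that the slide's value of log n can be plugged in.

```python
import math

def decompose(lo, hi, node_lo, node_hi):
    """Tree nodes, as (start, end) leaf intervals, whose disjoint union is [lo, hi]."""
    if lo > node_hi or hi < node_lo:
        return []                        # node does not overlap the query
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]      # node lies entirely inside the query
    mid = (node_lo + node_hi) // 2
    return (decompose(lo, hi, node_lo, mid) +
            decompose(lo, hi, mid + 1, node_hi))

def query_variance(num_nodes, sensitivity, epsilon):
    """Variance of a sum of num_nodes counts, each with Lap(sensitivity / epsilon) noise."""
    return num_nodes * 2.0 * (sensitivity / epsilon) ** 2

if __name__ == "__main__":
    nodes = decompose(2, 6, 1, 8)
    print(nodes)                                   # [(2, 2), (3, 4), (5, 6)]
    print(query_variance(len(nodes), math.log2(8), 1.0))
```

Summing three noisy nodes instead of five noisy leaves is what reduces the error from 2 × 5 log²n/ε² to 2 × 3 log²n/ε² in the example.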
Universal Histograms for Range Queries
• Every range query can be answered by summing O(log n) different noisy answers (at most two per level of the tree)
• Maximum error on any range query = $O(\log^3 n / \epsilon^2)$
• Total error on all range queries = $O(n^2 \log^3 n / \epsilon^2)$
Universal Histograms & Constrained Inference
• Can further reduce the error by enforcing constraints such as
x1234 = x12 + x34 = x1 + x2 + x3 + x4
• 2-pass algorithm to compute a consistent version of the counts
[Hay et al VLDB 2010]
Universal Histograms & Constrained Inference
• Pass 1: (Bottom Up)
• Pass 2: (Top down)
[Hay et al VLDB 2010]
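The slide names the two passes without giving their formulas. The sketch below is one reading of the two-pass scheme for a complete binary tree with equal noise variance at every node. The weights are recalled from Hay et al. and should be checked against the paper before being relied on. The heap-order array layout (root at index 0, children of v at 2v+1 and 2v+2) and the function name are assumptions made for illustration.

```python
def two_pass_consistency(noisy):
    """Consistent counts for a complete binary tree of noisy counts in heap order."""
    m = len(noisy)
    if m == 0 or (m + 1) & m != 0:
        raise ValueError("expected 2^L - 1 nodes of a complete binary tree")
    first_leaf = m // 2
    z = [0.0] * m
    height = [1] * m  # leaves have height 1

    # Pass 1 (bottom up): blend each node's own noisy count with the sum of
    # its children's estimates, weighting the own count by
    # (2^l - 2^(l-1)) / (2^l - 1), where l is the node's height.
    for v in range(m - 1, -1, -1):
        if v >= first_leaf:
            z[v] = float(noisy[v])
        else:
            left, right = 2 * v + 1, 2 * v + 2
            height[v] = height[left] + 1
            l = height[v]
            own = (2 ** l - 2 ** (l - 1)) / (2 ** l - 1)
            z[v] = own * noisy[v] + (1 - own) * (z[left] + z[right])

    # Pass 2 (top down): split the gap between a parent's final count and
    # the sum of its children's estimates equally between the two children.
    h = [0.0] * m
    h[0] = z[0]
    for v in range(first_leaf):
        left, right = 2 * v + 1, 2 * v + 2
        gap = (h[v] - (z[left] + z[right])) / 2
        h[left] = z[left] + gap
        h[right] = z[right] + gap
    return h

if __name__ == "__main__":
    # Root 33 with leaves 10 and 20: consistent counts are about 32, 11, 21.
    print(two_pass_consistency([33, 10, 20]))
```

By construction of the second pass, every parent equals the sum of its two children. On the three-node tree the result agrees with the least-squares solution of the earlier problem of minimizing the squared distance to η1, η2, η3 subject to y1 + y2 = y3.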
Universal Histograms & Constrained Inference
• Resulting consistent counts
– Have lower error than noisy counts (up to 10 times smaller in some cases)
– Unbiased estimators
– Have the least error amongst all unbiased estimators
Next Class
• Constrained inference
– Ensure that the returned answers are consistent with each other.
• Query Strategy
– Answer a different set of strategy queries A
– Answer original queries using A
– Examples: Universal Histograms, Wavelet Mechanism, Matrix Mechanism