What Can We Learn Privately?
Sofya Raskhodnikova, Penn State University
Joint work with Shiva Kasiviswanathan (Los Alamos), Homin Lee (UT Austin), Kobbi Nissim (Ben Gurion), and Adam Smith (Penn State)
To appear in SICOMP special issue for FOCS '08
Private Learning
• Goal: machine learning algorithms that protect the privacy of individual examples (people, organizations, ...)
• Desiderata
‒ Privacy: worst-case guarantee (differential privacy)
‒ Learning: distributional guarantee (e.g., PAC learning)
• This work
‒ Characterize classification problems learnable privately
‒ Understand the power of popular models for private analysis
What Can We Compute Privately?
Prior work:
• Function evaluation [DiNi, DwNi, BDMN, EGS, DMNS, …]
‒ Statistical Query (SQ) learning [Blum Dwork McSherry Nissim 05]
‒ Learning mixtures of Gaussians [Nissim Raskhodnikova Smith 08]
• Mechanism design [McSherry Talwar 07]
This work: PAC learning (in general, not captured by function evaluation)
Some subsequent work:
• Learning [Chaudhuri Monteleoni 08, McSherry Williams 09, Beimel Kasiviswanathan Nissim 10, Sarwate Chaudhuri Monteleoni]
• Statistical inference [Smith 08, Dwork Lei 09, Wasserman Zhou 09]
• Synthetic data [Machanavajjhala Kifer Abowd Gehrke Vilhuber 08, Blum Ligett Roth 08, Dwork Naor Reingold Rothblum Vadhan 09, Roth Roughgarden 10]
• Combinatorial optimization [Gupta Ligett McSherry Roth Talwar 10]
[Diagram: a user asks algorithm A "Tell me f(x)"; A answers with f(x) + noise.]
Our Results 1: What is Learnable Privately
PAC* = PAC learnable with poly samples, not necessarily efficiently
[Venn diagram: PAC* = Private PAC*; inside it, the privately PAC-learnable classes contain parity as well as all of SQ (halfplanes, conjunctions, …), which sits inside the PAC-learnable classes.]
Basic Privacy Models
[Diagram: three settings. Centralized: individuals x1, …, xn send their data to a single trusted curator A. Local noninteractive: each xi passes through its own randomizer Ri exactly once. Local (interactive): the randomizers R1, …, Rn may be queried adaptively.]
• Most work in data mining: "randomized response", "input perturbation", "Post Randomization Method" (PRAM), "Framework for High-Accuracy Strict-Privacy Preserving Mining" (FRAPP) [W65, AS00, AA01, EGS03, HH02, MS06]
• Advantages:
‒ private data never leaves a person's hands
‒ easy distribution of extracted information (e.g., CD, website)
Our Results 2: Power of Private Models
PAC* = PAC learnable with poly samples, not necessarily efficiently
[Diagram: Centralized = PAC*; Local = SQ; Local noninteractive = nonadaptive SQ. Parity separates the centralized model from the local one; masked parity separates interactive from noninteractive local.]
Definition: Differential Privacy [DMNS06]
Intuition: users learn roughly the same thing about me whether or not my data is in the database.
[Diagram: databases x = (x1, x2, x3, …, xn) and x′ = (x1, x2, x′3, …, xn), differing in a single entry, are each fed to algorithm A, producing outputs A(x) and A(x′).]
A randomized algorithm A is ε-differentially private if for all databases x, x′ that differ in one element, and for all sets of answers S:
Pr[A(x) ∈ S] ≤ e^ε · Pr[A(x′) ∈ S]
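The definition can be checked exactly for a toy mechanism. The sketch below (illustrative, not from the talk; `rr_dist` and `keep_prob` are my own names) computes the two output distributions of one-bit randomized response on neighboring databases and verifies the e^ε bound:

```python
import math

def rr_dist(bit, keep_prob):
    """Exact output distribution of randomized response on one bit:
    report the true bit with probability keep_prob, otherwise flip it."""
    return {bit: keep_prob, 1 - bit: 1 - keep_prob}

keep_prob = 0.75                              # keep the true bit 3/4 of the time
eps = math.log(keep_prob / (1 - keep_prob))   # this choice gives eps = ln 3

# Two databases differing in one element: the single bit is 0 vs. 1.
d0, d1 = rr_dist(0, keep_prob), rr_dist(1, keep_prob)

# Differential privacy: Pr[A(x) = w] <= e^eps * Pr[A(x') = w] for every output w.
for w in (0, 1):
    assert d0[w] <= math.exp(eps) * d1[w] + 1e-12
    assert d1[w] <= math.exp(eps) * d0[w] + 1e-12
```

Here the bound is tight: the worst-case ratio 0.75/0.25 equals e^ε exactly.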
Properties of Differential Privacy
Composition: if algorithms A1 and A2 are ε-differentially private, then the algorithm that outputs (A1(x), A2(x)) is 2ε-differentially private.
Meaningful in the presence of arbitrary external information.
Learning: An Example*
• Bank needs to decide which applicants are bad credit risks
• Goal: given a sample of past customers (labeled examples), produce a good prediction rule (hypothesis) for future loan applicants
• Reasonable hypotheses given this data:
‒ Predict YES iff (!Recent Delinquency) AND (% down > 5)
‒ Predict YES iff 100·(Mmp/inc) − (% down) < 25
*Example taken from Blum, FOCS03 tutorial
example yi                                          label zi
% down | Recent delinquency? | High debt? | Mmp/inc | Good Risk?
  10   |        No           |    No      |  0.32   |   Yes
  10   |        No           |    Yes     |  0.25   |   Yes
   5   |        Yes          |    No      |  0.30   |   No
  20   |        No           |    No      |  0.31   |   Yes
   5   |        No           |    No      |  0.32   |   No
  10   |        Yes          |    Yes     |  0.38   |   No
PAC Learning: The Setting
Algorithm draws independent examples from some distribution P, labeled by some target function c.
PAC Learning: The Setting
Algorithm outputs hypothesis h (a function from points to labels).
PAC Learning: The Setting
• Hypothesis h is good if it mostly agrees with target c: Pr_{y~P}[h(y) ≠ c(y)] ≤ α (accuracy).
• Require that h is good with probability at least 1 − β (confidence).
[Diagram: a new point drawn from P is labeled by h.]
PAC Learning Definition [Valiant 84]
A concept class C is a set of functions {c : D → {0,1}}, together with their representation.
Definition. Algorithm A PAC learns concept class C if, for all c in C, all distributions P, and all α, β in (0, 1/2):
• given poly(1/α, 1/β, size(c)) examples drawn from P, labeled by some c in C,
• A outputs a good hypothesis (of accuracy α) with probability 1 − β, in poly time.
(* For PAC*, the requirements "in poly time" and "of poly length" are dropped.)
Private Learning
Input: database x = (x1, x2, …, xn), where xi = (yi, zi), yi ~ P, and zi = c(yi) (zi is the label of example yi):

% down | Recent delinquency? | High debt? | Mmp/inc | Good Risk?
  10   |        No           |    No      |  0.32   |   Yes
  10   |        No           |    Yes     |  0.25   |   Yes
   5   |        Yes          |    No      |  0.30   |   No
  20   |        No           |    No      |  0.31   |   Yes
  25   |        No           |    No      |  0.30   |   Yes

Output: a hypothesis, e.g., predict Yes if 100·(Mmp/inc) − (% down) < 25
• Algorithm A privately PAC learns concept class C if:
‒ Utility: A PAC learns concept class C (an average-case guarantee)
‒ Privacy: A is ε-differentially private (a worst-case guarantee)
How Can We Design Private Learners?
Previous privacy work focused on function approximation.
First attempt: view the non-private learner as a function to be approximated.
Problem: a "close" hypothesis may mislabel many points.
PAC* = Private PAC*
Theorem. Every PAC* learnable concept class can be learned privately, using a poly number of samples.
Proof: Adapt the exponential mechanism [MT07]:
score(x, h) = # of examples in x correctly classified by hypothesis h
Output hypothesis h from C with probability ∝ e^{ε·score(x,h)} (may take exponential time).
Privacy: for any hypothesis h,
Pr[h is output on input x] / Pr[h is output on input x′] = (e^{ε·score(x,h)} · ∑_{h′} e^{ε·score(x′,h′)}) / (e^{ε·score(x′,h)} · ∑_{h′} e^{ε·score(x,h′)}) ≤ e^{2ε},
since changing one example changes every score by at most 1 (e.g., score(x,h) = 4 vs. score(x′,h) = 3).
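A minimal sketch of this sampler (function names and the toy threshold class are my own, not from the talk). It computes every hypothesis's score by brute force, matching the exponential-time caveat above:

```python
import math
import random

def exp_mech_learn(examples, hypotheses, eps):
    """Exponential mechanism: output hypothesis h with probability
    proportional to exp(eps * score(x, h)), where score(x, h) is the
    number of examples in x that h classifies correctly."""
    scores = [sum(1 for y, z in examples if h(y) == z) for h in hypotheses]
    top = max(scores)  # shift scores before exponentiating; ratios unchanged
    weights = [math.exp(eps * (s - top)) for s in scores]
    return random.choices(hypotheses, weights=weights, k=1)[0]

# Toy concept class: threshold functions on {0, ..., 9}.
hypotheses = [lambda y, t=t: int(y >= t) for t in range(11)]
target = lambda y: int(y >= 4)
examples = [(y, target(y)) for y in range(10)]
h = exp_mech_learn(examples, hypotheses, eps=2.0)
```

With large ε the best-scoring hypothesis is returned almost surely; with small ε the output distribution flattens out, which is where the privacy comes from.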
PAC* = Private PAC*
Theorem. Every PAC* learnable concept class can be learned privately, using a poly number of samples.
Proof: score(x, h) = # of examples in x correctly classified by h; output hypothesis h from C with probability ∝ e^{ε·score(x,h)}.
Utility (learning):
• The best hypothesis correctly labels all n examples, so its weight is e^{εn}.
• Each bad hypothesis mislabels more than an α fraction of the examples, so its weight is at most e^{ε(1−α)n}.
Private version of "Occam's razor":
Pr[output h is bad] ≤ (# bad hypotheses) · e^{ε(1−α)n} / e^{εn} ≤ |C| · e^{−εαn} ≤ β.
Sufficient to ensure n ≥ (ln|C| + ln(1/β)) / (εα). Then, with probability 1 − β, the output h labels a 1 − α fraction of the examples correctly.
• "Occam's razor": if n ≥ (ln|C| + ln(1/β)) / α, then a hypothesis that does well on the examples also does well on the distribution P.
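For concreteness (the numbers and function names below are illustrative, not from the talk), the private and non-private sample bounds can be compared for conjunctions over d = 100 variables, where |C| = 3^d:

```python
import math

def private_sample_bound(c_size, alpha, beta, eps):
    """n >= (ln|C| + ln(1/beta)) / (eps * alpha), the private bound."""
    return math.ceil((math.log(c_size) + math.log(1 / beta)) / (eps * alpha))

def occam_sample_bound(c_size, alpha, beta):
    """n >= (ln|C| + ln(1/beta)) / alpha, the non-private Occam bound."""
    return math.ceil((math.log(c_size) + math.log(1 / beta)) / alpha)

# Conjunctions over d = 100 variables: each variable appears positively,
# negatively, or not at all, so |C| = 3^100.
n_priv = private_sample_bound(3 ** 100, alpha=0.1, beta=0.05, eps=0.5)
n_occam = occam_sample_bound(3 ** 100, alpha=0.1, beta=0.05)
```

Privacy costs only a 1/ε factor in sample size here: with ε = 0.5 the private learner needs twice as many examples.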
Our Results: What is Learnable Privately
Note: parity with noise is thought to be hard, so private PAC ≠ learnable with noise.
[Venn diagram, as before: PAC* = Private PAC*; the privately PAC-learnable classes contain parity and all of SQ (halfplanes, conjunctions, …) [BDMN05], inside the PAC-learnable classes.]
Efficient Learner for Parity
Parity problems. Domain: D = {0,1}^d. Concepts: c_r(y) = ⟨r, y⟩ (mod 2). Input: x = ((y1, c_r(y1)), …, (yn, c_r(yn))).
• Each example (yi, c_r(yi)) is a linear constraint on r:
‒ (1101, 1) translates to r1 + r2 + r4 = 1 (mod 2)
• Non-private learning algorithm:
‒ find r by solving the set of linear equations over GF(2) imposed by the input x
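A sketch of this non-private learner (the helper name `solve_gf2` is my own): Gaussian elimination over GF(2), returning one consistent r with free coordinates set to 0:

```python
def solve_gf2(equations, d):
    """Solve a system <r, y> = z (mod 2) for r in {0,1}^d by Gaussian
    elimination; return one solution, or None if inconsistent.
    Each equation is (y, z) with y a tuple of d bits and z a bit."""
    rows = [(list(y), z) for y, z in equations]
    pivots = {}                     # pivot column -> row index
    r = 0
    for col in range(d):
        # find a row at or below r with a 1 in this column
        for i in range(r, len(rows)):
            if rows[i][0][col]:
                rows[r], rows[i] = rows[i], rows[r]
                break
        else:
            continue                # free column
        y_p, z_p = rows[r]
        for i in range(len(rows)):  # eliminate this column everywhere else
            if i != r and rows[i][0][col]:
                rows[i] = ([a ^ b for a, b in zip(rows[i][0], y_p)],
                           rows[i][1] ^ z_p)
        pivots[col] = r
        r += 1
    if any(not any(y) and z for y, z in rows):
        return None                 # a row reads 0 = 1: inconsistent
    sol = [0] * d                   # free variables set to 0
    for col, i in pivots.items():
        sol[col] = rows[i][1]
    return sol

# Recover a 4-bit secret from four labeled examples (illustrative data).
secret = (1, 0, 1, 1)
ys = [(1, 1, 0, 1), (0, 1, 1, 0), (1, 0, 0, 0), (0, 0, 0, 1)]
examples = [(y, sum(a * b for a, b in zip(y, secret)) % 2) for y in ys]
r_hat = solve_gf2(examples, 4)
```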
The Effect of a Single Example
• Let Vi be the space of feasible solutions for the set of equations imposed by (y1, c_r(y1)), …, (yi, c_r(yi)).
• Add a fresh example (y_{i+1}, c_r(y_{i+1})) and consider the new solution space V_{i+1}.
• Then either |V_{i+1}| ≥ |Vi|/2, or |V_{i+1}| = 0 (the system becomes inconsistent).
The solution space changes drastically only when the non-private learner fails.
[Diagram: example solution set {000, 001, 100, 101}; the new constraint "third coordinate is 0" keeps half of it ({000, 100}), while the new constraint "second coordinate is 1" empties it.]
Private Learner for Parity
Algorithm A:
1. With probability 1/2, output "fail". (This smooths out extreme jumps in the Vi.)
2. Construct x_S by picking each example from x independently with probability ε.
3. Solve the system of equations imposed by the examples in x_S; let V be the set of feasible solutions.
4. If V = ∅, output "fail". Otherwise, choose r from V uniformly at random and output c_r.
Lemma [utility]. The algorithm PAC-learns parity with n = O((non-private sample size)/ε).
Proof idea: conditioned on passing step 1, we get the same utility as with εn examples. By repeating a few times, we pass step 1 w.h.p.
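A sketch of algorithm A (names are my own; the feasible set V is enumerated by brute force over {0,1}^d, which is fine for small d but not efficient — an efficient learner would solve the subsampled system by elimination instead):

```python
import itertools
import random

def private_parity_learner(examples, d, eps):
    """Steps 1-4 of algorithm A: fail w.p. 1/2, keep each example
    independently w.p. eps, then output a uniformly random r consistent
    with the subsample (or fail if the subsystem is inconsistent)."""
    if random.random() < 0.5:
        return None                                        # step 1: "fail"
    sub = [e for e in examples if random.random() < eps]   # step 2: subsample
    feasible = [r for r in itertools.product((0, 1), repeat=d)
                if all(sum(a * b for a, b in zip(r, y)) % 2 == z
                       for y, z in sub)]                   # step 3: V
    return random.choice(feasible) if feasible else None   # step 4
```

On input consistent with a target c_r, V is never empty, so "fail" comes only from step 1; repeating the algorithm a few times passes step 1 w.h.p., as in the utility lemma.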
Private Learner for Parity
Lemma. Algorithm A is 4ε-differentially private.
Proof: For inputs x and x′ that differ in position i, show that for all outcomes the probability ratio is ≤ 1 + 4ε.
• The changed input xi enters the sample S with probability ε.
• The probability of "fail" goes up or down by at most ε/2:
Pr[A(x) fails] / Pr[A(x′) fails] ≤ (Pr[A(x′) fails] + ε/2) / Pr[A(x′) fails] ≤ 1 + ε, since Pr[A(x′) fails] ≥ 1/2.
• For a hypothesis r (note that A(x) and A(x′) behave identically when i ∉ S):
Pr[A(x) = r] / Pr[A(x′) = r] = (ε·Pr[A(x) = r | i∈S] + (1−ε)·Pr[A(x) = r | i∉S]) / (ε·Pr[A(x′) = r | i∈S] + (1−ε)·Pr[A(x′) = r | i∉S]) ≤ 2ε/(1−ε) + 1 ≤ 4ε + 1 for ε ≤ 1/2.
• The second inequality uses Pr[A(x) = r | i∈S] / Pr[A(x) = r | i∉S] ≤ 2; intuitively, this follows from |Vi| ≥ |Vi−1|/2.
Reminder: Local Privacy-Preserving Protocols
• Interactive
• Non-interactive
[Diagram: a user interacts with participants x1, …, xn only through their local randomizers R1, …, Rn.]
Statistical Query (SQ) Learning [Kearns 93]
• Same guarantees as the PAC model, but the algorithm no longer has access to individual examples.
• The algorithm asks queries g: D × {0,1} → {0,1}; the SQ oracle answers with E_{y~P}[g(y, c(y))] ± τ, the probability that a random labeled example (~ P) satisfies g.
• Requirements: tolerance τ > 1/poly(...); g can be evaluated in poly time; poly running time.
Theorem [BDMN05]. Any SQ algorithm can be simulated by a private algorithm.
Proof: [DMNS06] perturb query answers using Laplace noise.
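A sketch of that simulation (names are mine): answer an SQ query with the empirical average plus Laplace noise of scale 1/(εn), since changing one labeled example moves the average by at most 1/n:

```python
import random

def private_sq_answer(examples, g, eps):
    """Empirical average of g over the labeled examples, plus Laplace noise
    of scale 1/(eps * n); one example changes the average by at most 1/n,
    so the answer is eps-differentially private."""
    n = len(examples)
    avg = sum(g(y, z) for y, z in examples) / n
    lam = eps * n  # difference of two Exp(lam) draws is Laplace, scale 1/lam
    noise = random.expovariate(lam) - random.expovariate(lam)
    return avg + noise

# e.g. privately estimate Pr[label = 1] on a toy labeled sample
data = [(y, y % 2) for y in range(1000)]
answer = private_sq_answer(data, lambda y, z: z, eps=0.5)
```

The noise scale shrinks as 1/n, so for n well above 1/(ετ) the private answer stays within the SQ tolerance τ w.h.p.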
(Non-interactive) Local = (Non-adaptive) SQ
Theorem. Any (non-adaptive) SQ algorithm can be simulated by a (non-interactive) local algorithm.
Local protocol for SQ:
‒ For each i, participant i computes a noisy bit R(xi) on their own.
‒ The sum of the noisy bits allows approximation of the query answer.
‒ R (applied by each participant) is differentially private.
‒ If all SQ queries are known in advance (non-adaptive), the protocol is non-interactive.
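A sketch of this protocol (names are mine): each participant applies ε-DP randomized response to their bit, and the untrusted aggregator debiases the sum:

```python
import math
import random

def local_randomizer(bit, eps):
    """eps-DP randomized response: keep the bit with prob e^eps/(1+e^eps)."""
    keep = math.exp(eps) / (1 + math.exp(eps))
    return bit if random.random() < keep else 1 - bit

def aggregate(noisy_bits, eps):
    """Debias the mean of the noisy reports to estimate the true fraction:
    E[report] = (1 - keep) + true_fraction * (2*keep - 1)."""
    keep = math.exp(eps) / (1 + math.exp(eps))
    mean = sum(noisy_bits) / len(noisy_bits)
    return (mean - (1 - keep)) / (2 * keep - 1)

bits = [1] * 300 + [0] * 700    # true fraction of 1s is 0.3
reports = [local_randomizer(b, eps=1.0) for b in bits]
estimate = aggregate(reports, eps=1.0)
```

Each participant randomizes on their own, so private data never leaves their hands; the server sees only the noisy reports.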
(Non-interactive) Local = (Non-adaptive) SQ
Theorem. Any (non-interactive) local algorithm can be simulated by a (non-adaptive) SQ algorithm.
Technique: rejection sampling.
Proof idea [non-interactive case]: To simulate randomizer R: D → W on entry zi, we need to output w ∈ W with probability p(w) = Pr_{z~P}[R(z) = w].
Let q(w) = Pr[R(0) = w]; by differential privacy of R, q approximates p up to a factor of e^ε.
1. Sample w from q(w).
2. With probability p(w)/(q(w)·e^ε), output w.
3. With the remaining probability, repeat from (1).
Use SQ queries to estimate p(w). Idea: p(w) = Pr_{z~P}[R(z) = w] = E_{z~P}[Pr[R(z) = w]].
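A sketch of the rejection-sampling step (the distributions and names are illustrative), assuming p(w) and q(w) are known exactly; the proof instead estimates p(w) with SQ queries:

```python
import math
import random

def rejection_sample(p, q, q_sampler, eps):
    """Output w with probability p(w), given samples from q and the DP
    guarantee p(w) <= e^eps * q(w): accept w with prob p(w)/(q(w)*e^eps)."""
    while True:
        w = q_sampler()
        if random.random() < p[w] / (q[w] * math.exp(eps)):
            return w

# Toy output distributions over W = {0, 1} satisfying p <= e^eps * q.
eps = math.log(2)
p = {0: 0.6, 1: 0.4}   # target: Pr_{z ~ P}[R(z) = w]
q = {0: 0.5, 1: 0.5}   # proposal: Pr[R(0) = w]
draws = [rejection_sample(p, q, lambda: random.choice((0, 1)), eps)
         for _ in range(5000)]
```

Each round accepts with probability ∑_w q(w) · p(w)/(q(w)e^ε) = e^{−ε}, so the expected number of rounds is only e^ε.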
Non-interactive Local ⊊ Interactive Local
Masked Parity Problems
Concepts: c_{r,a} : {0,1}^{d + log d + 1} → {+1, −1}, indexed by r ∈ {0,1}^d and a ∈ {0,1}:
c_{r,a}(y, i, b) = (−1)^{⟨r,y⟩ (mod 2) + a} if b = 0, and (−1)^{r_i} if b = 1.
• (Adaptive) SQ learner: two rounds of communication.
• Non-adaptive SQ learner: needs 2^{d−1} samples.
‒ The proof uses a Fourier-analytic argument similar to the proof that parity is not in SQ.
Summary
• PAC* is privately learnable
‒ non-efficient learners
• Known problems in PAC are efficiently privately learnable
‒ parity
‒ SQ [BDMN05]
‒ what else is in PAC?
• Equivalence of the local model and SQ:
‒ Local = SQ
‒ Local non-interactive = non-adaptive SQ
• Interactivity helps in the local model
‒ Local non-interactive ⊊ Local
‒ non-adaptive SQ ⊊ SQ
Open questions
• Separate efficient learning from efficient private learning
• Better private algorithms for SQ problems
• Other learning models