What Can We Learn Privately?
Sofya Raskhodnikova, Penn State University
Joint work with Shiva Kasiviswanathan (Los Alamos), Homin Lee (UT Austin), Kobbi Nissim (Ben Gurion), and Adam Smith (Penn State)
To appear in SICOMP special issue for FOCS '08
Private Learning
• Goal: machine learning algorithms that protect the privacy of individual examples (people, organizations, ...)
• Desiderata
‒ Privacy: worst-case guarantee (differential privacy)
‒ Learning: distributional guarantee (e.g., PAC learning)
• This work
‒ Characterize classification problems learnable privately
‒ Understand the power of popular models for private analysis
What Can We Compute Privately?
Prior work:
• Function evaluation [DiNi, DwNi, BDMN, EGS, DMNS, …]
‒ Statistical Query (SQ) learning [Blum Dwork McSherry Nissim 05]
‒ Learning mixtures of Gaussians [Nissim Raskhodnikova Smith 08]
• Mechanism design [McSherry Talwar 07]
This work: PAC learning (in general, not captured by function evaluation)
Some subsequent work:
• Learning [Chaudhuri Monteleoni 08, McSherry Williams 09, Beimel Kasiviswanathan Nissim 10, Sarwate Chaudhuri Monteleoni]
• Statistical inference [Smith 08, Dwork Lei 09, Wasserman Zhou 09]
• Synthetic data [Machanavajjhala Kifer Abowd Gehrke Vilhuber 08, Blum Ligett Roth 08, Dwork Naor Reingold Rothblum Vadhan 09, Roth Roughgarden 10]
• Combinatorial optimization [Gupta Ligett McSherry Roth Talwar 10]
[Diagram: a user asks algorithm A "Tell me f(x)"; A answers with f(x) + noise.]
Our Results 1: What is Learnable Privately
PAC* = PAC learnable with poly samples, not necessarily efficiently
[Venn diagram: PAC* = Private PAC*; inside it, the privately PAC-learnable classes contain parity as well as all of SQ (halfplanes, conjunctions, …), which sits inside the PAC-learnable classes.]
Basic Privacy Models
[Diagram: three settings. Centralized: individuals x1, …, xn send their data to a single trusted curator A. Local noninteractive: each xi passes through its own randomizer Ri exactly once. Local (interactive): the randomizers R1, …, Rn may be queried adaptively.]
• Most work in data mining: "randomized response", "input perturbation", "Post Randomization Method" (PRAM), "Framework for High-Accuracy Strict-Privacy Preserving Mining" (FRAPP) [W65, AS00, AA01, EGS03, HH02, MS06]
• Advantages:
‒ private data never leaves a person's hands
‒ easy distribution of extracted information (e.g., CD, website)
Our Results 2: Power of Private Models
PAC* = PAC learnable with poly samples, not necessarily efficiently
[Diagram: Centralized = PAC*; Local = SQ; Local noninteractive = nonadaptive SQ. Parity separates the centralized model from the local one; masked parity separates interactive from noninteractive local.]
Definition: Differential Privacy [DMNS06]
Intuition: users learn roughly the same thing about me whether or not my data is in the database.
[Diagram: databases x = (x1, x2, x3, …, xn) and x′ = (x1, x2, x′3, …, xn), differing in a single entry, are each fed to algorithm A, producing outputs A(x) and A(x′).]
A randomized algorithm A is ε-differentially private if for all databases x, x′ that differ in one element, and for all sets of answers S:
Pr[A(x) ∈ S] ≤ e^ε · Pr[A(x′) ∈ S]
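The definition can be checked exactly for a toy mechanism. The sketch below (illustrative, not from the talk; `rr_dist` and `keep_prob` are my own names) computes the two output distributions of one-bit randomized response on neighboring databases and verifies the e^ε bound:

```python
import math

def rr_dist(bit, keep_prob):
    """Exact output distribution of randomized response on one bit:
    report the true bit with probability keep_prob, otherwise flip it."""
    return {bit: keep_prob, 1 - bit: 1 - keep_prob}

keep_prob = 0.75                              # keep the true bit 3/4 of the time
eps = math.log(keep_prob / (1 - keep_prob))   # this choice gives eps = ln 3

# Two databases differing in one element: the single bit is 0 vs. 1.
d0, d1 = rr_dist(0, keep_prob), rr_dist(1, keep_prob)

# Differential privacy: Pr[A(x) = w] <= e^eps * Pr[A(x') = w] for every output w.
for w in (0, 1):
    assert d0[w] <= math.exp(eps) * d1[w] + 1e-12
    assert d1[w] <= math.exp(eps) * d0[w] + 1e-12
```

Here the bound is tight: the worst-case ratio 0.75/0.25 equals e^ε exactly.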
Properties of Differential Privacy
Composition: if algorithms A1 and A2 are ε-differentially private, then the algorithm that outputs (A1(x), A2(x)) is 2ε-differentially private.
Meaningful in the presence of arbitrary external information.
Learning: An Example*
• Bank needs to decide which applicants are bad credit risks
• Goal: given a sample of past customers (labeled examples), produce a good prediction rule (hypothesis) for future loan applicants
• Reasonable hypotheses given this data:
‒ Predict YES iff (!Recent Delinquency) AND (% down > 5)
‒ Predict YES iff 100·(Mmp/inc) − (% down) < 25
*Example taken from Blum, FOCS03 tutorial
example yi                                          label zi
% down | Recent delinquency? | High debt? | Mmp/inc | Good Risk?
  10   |        No           |    No      |  0.32   |   Yes
  10   |        No           |    Yes     |  0.25   |   Yes
   5   |        Yes          |    No      |  0.30   |   No
  20   |        No           |    No      |  0.31   |   Yes
   5   |        No           |    No      |  0.32   |   No
  10   |        Yes          |    Yes     |  0.38   |   No
PAC Learning: The Setting
Algorithm draws independent examples from some distribution P, labeled by some target function c.
PAC Learning: The Setting
Algorithm outputs hypothesis h (a function from points to labels).
PAC Learning: The Setting
• Hypothesis h is good if it mostly agrees with target c: Pr_{y~P}[h(y) ≠ c(y)] ≤ α (accuracy).
• Require that h is good with probability at least 1 − β (confidence).
[Diagram: a new point drawn from P is labeled by h.]
PAC Learning Definition [Valiant 84]
A concept class C is a set of functions {c : D → {0,1}}, together with their representation.
Definition. Algorithm A PAC learns concept class C if, for all c in C, all distributions P, and all α, β in (0, 1/2):
• given poly(1/α, 1/β, size(c)) examples drawn from P, labeled by some c in C,
• A outputs a good hypothesis (of accuracy α) with probability 1 − β, in poly time.
(* For PAC*, the requirements "in poly time" and "of poly length" are dropped.)
Private Learning
Input: database x = (x1, x2, …, xn), where xi = (yi, zi), yi ~ P, and zi = c(yi) (zi is the label of example yi):

% down | Recent delinquency? | High debt? | Mmp/inc | Good Risk?
  10   |        No           |    No      |  0.32   |   Yes
  10   |        No           |    Yes     |  0.25   |   Yes
   5   |        Yes          |    No      |  0.30   |   No
  20   |        No           |    No      |  0.31   |   Yes
  25   |        No           |    No      |  0.30   |   Yes

Output: a hypothesis, e.g., predict Yes if 100·(Mmp/inc) − (% down) < 25
• Algorithm A privately PAC learns concept class C if:
‒ Utility: A PAC learns concept class C (an average-case guarantee)
‒ Privacy: A is ε-differentially private (a worst-case guarantee)
How Can We Design Private Learners?
Previous privacy work focused on function approximation.
First attempt: view the non-private learner as a function to be approximated.
Problem: a "close" hypothesis may mislabel many points.
PAC* = Private PAC*
Theorem. Every PAC* learnable concept class can be learned privately, using a poly number of samples.
Proof: Adapt the exponential mechanism [MT07]:
score(x, h) = # of examples in x correctly classified by hypothesis h
Output hypothesis h from C with probability ∝ e^{ε·score(x,h)} (may take exponential time).
Privacy: for any hypothesis h,
Pr[h is output on input x] / Pr[h is output on input x′] = (e^{ε·score(x,h)} · ∑_{h′} e^{ε·score(x′,h′)}) / (e^{ε·score(x′,h)} · ∑_{h′} e^{ε·score(x,h′)}) ≤ e^{2ε},
since changing one example changes every score by at most 1 (e.g., score(x,h) = 4 vs. score(x′,h) = 3).
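A minimal sketch of this sampler (function names and the toy threshold class are my own, not from the talk). It computes every hypothesis's score by brute force, matching the exponential-time caveat above:

```python
import math
import random

def exp_mech_learn(examples, hypotheses, eps):
    """Exponential mechanism: output hypothesis h with probability
    proportional to exp(eps * score(x, h)), where score(x, h) is the
    number of examples in x that h classifies correctly."""
    scores = [sum(1 for y, z in examples if h(y) == z) for h in hypotheses]
    top = max(scores)  # shift scores before exponentiating; ratios unchanged
    weights = [math.exp(eps * (s - top)) for s in scores]
    return random.choices(hypotheses, weights=weights, k=1)[0]

# Toy concept class: threshold functions on {0, ..., 9}.
hypotheses = [lambda y, t=t: int(y >= t) for t in range(11)]
target = lambda y: int(y >= 4)
examples = [(y, target(y)) for y in range(10)]
h = exp_mech_learn(examples, hypotheses, eps=2.0)
```

With large ε the best-scoring hypothesis is returned almost surely; with small ε the output distribution flattens out, which is where the privacy comes from.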
PAC* = Private PAC*
Theorem. Every PAC* learnable concept class can be learned privately, using a poly number of samples.
Proof: score(x, h) = # of examples in x correctly classified by h; output hypothesis h from C with probability ∝ e^{ε·score(x,h)}.
Utility (learning):
• The best hypothesis correctly labels all n examples, so its weight is e^{εn}.
• Each bad hypothesis mislabels more than an α fraction of the examples, so its weight is at most e^{ε(1−α)n}.
Private version of "Occam's razor":
Pr[output h is bad] ≤ (# bad hypotheses) · e^{ε(1−α)n} / e^{εn} ≤ |C| · e^{−εαn} ≤ β.
Sufficient to ensure n ≥ (ln|C| + ln(1/β)) / (εα). Then, with probability 1 − β, the output h labels a 1 − α fraction of the examples correctly.
• "Occam's razor": if n ≥ (ln|C| + ln(1/β)) / α, then a hypothesis that does well on the examples also does well on the distribution P.
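For concreteness (the numbers and function names below are illustrative, not from the talk), the private and non-private sample bounds can be compared for conjunctions over d = 100 variables, where |C| = 3^d:

```python
import math

def private_sample_bound(c_size, alpha, beta, eps):
    """n >= (ln|C| + ln(1/beta)) / (eps * alpha), the private bound."""
    return math.ceil((math.log(c_size) + math.log(1 / beta)) / (eps * alpha))

def occam_sample_bound(c_size, alpha, beta):
    """n >= (ln|C| + ln(1/beta)) / alpha, the non-private Occam bound."""
    return math.ceil((math.log(c_size) + math.log(1 / beta)) / alpha)

# Conjunctions over d = 100 variables: each variable appears positively,
# negatively, or not at all, so |C| = 3^100.
n_priv = private_sample_bound(3 ** 100, alpha=0.1, beta=0.05, eps=0.5)
n_occam = occam_sample_bound(3 ** 100, alpha=0.1, beta=0.05)
```

Privacy costs only a 1/ε factor in sample size here: with ε = 0.5 the private learner needs twice as many examples.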
Our Results: What is Learnable Privately
Note: parity with noise is thought to be hard, so private PAC ≠ learnable with noise.
[Venn diagram, as before: PAC* = Private PAC*; the privately PAC-learnable classes contain parity and all of SQ (halfplanes, conjunctions, …) [BDMN05], inside the PAC-learnable classes.]
Efficient Learner for Parity
Parity problems. Domain: D = {0,1}^d. Concepts: c_r(y) = ⟨r, y⟩ (mod 2). Input: x = ((y1, c_r(y1)), …, (yn, c_r(yn))).
• Each example (yi, c_r(yi)) is a linear constraint on r:
‒ (1101, 1) translates to r1 + r2 + r4 = 1 (mod 2)
• Non-private learning algorithm:
‒ find r by solving the set of linear equations over GF(2) imposed by the input x
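A sketch of this non-private learner (the helper name `solve_gf2` is my own): Gaussian elimination over GF(2), returning one consistent r with free coordinates set to 0:

```python
def solve_gf2(equations, d):
    """Solve a system <r, y> = z (mod 2) for r in {0,1}^d by Gaussian
    elimination; return one solution, or None if inconsistent.
    Each equation is (y, z) with y a tuple of d bits and z a bit."""
    rows = [(list(y), z) for y, z in equations]
    pivots = {}                     # pivot column -> row index
    r = 0
    for col in range(d):
        # find a row at or below r with a 1 in this column
        for i in range(r, len(rows)):
            if rows[i][0][col]:
                rows[r], rows[i] = rows[i], rows[r]
                break
        else:
            continue                # free column
        y_p, z_p = rows[r]
        for i in range(len(rows)):  # eliminate this column everywhere else
            if i != r and rows[i][0][col]:
                rows[i] = ([a ^ b for a, b in zip(rows[i][0], y_p)],
                           rows[i][1] ^ z_p)
        pivots[col] = r
        r += 1
    if any(not any(y) and z for y, z in rows):
        return None                 # a row reads 0 = 1: inconsistent
    sol = [0] * d                   # free variables set to 0
    for col, i in pivots.items():
        sol[col] = rows[i][1]
    return sol

# Recover a 4-bit secret from four labeled examples (illustrative data).
secret = (1, 0, 1, 1)
ys = [(1, 1, 0, 1), (0, 1, 1, 0), (1, 0, 0, 0), (0, 0, 0, 1)]
examples = [(y, sum(a * b for a, b in zip(y, secret)) % 2) for y in ys]
r_hat = solve_gf2(examples, 4)
```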
The Effect of a Single Example
• Let Vi be the space of feasible solutions for the set of equations imposed by (y1, c_r(y1)), …, (yi, c_r(yi)).
• Add a fresh example (y_{i+1}, c_r(y_{i+1})) and consider the new solution space V_{i+1}.
• Then either |V_{i+1}| ≥ |Vi|/2, or |V_{i+1}| = 0 (the system becomes inconsistent).
The solution space changes drastically only when the non-private learner fails.
[Diagram: example solution set {000, 001, 100, 101}; the new constraint "third coordinate is 0" keeps half of it ({000, 100}), while the new constraint "second coordinate is 1" empties it.]
Private Learner for Parity
Algorithm A:
1. With probability 1/2, output "fail". (This smooths out extreme jumps in the Vi.)
2. Construct x_S by picking each example from x independently with probability ε.
3. Solve the system of equations imposed by the examples in x_S; let V be the set of feasible solutions.
4. If V = ∅, output "fail". Otherwise, choose r from V uniformly at random and output c_r.
Lemma [utility]. The algorithm PAC-learns parity with n = O((non-private sample size)/ε).
Proof idea: conditioned on passing step 1, we get the same utility as with εn examples. By repeating a few times, we pass step 1 w.h.p.
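A sketch of algorithm A (names are my own; the feasible set V is enumerated by brute force over {0,1}^d, which is fine for small d but not efficient — an efficient learner would solve the subsampled system by elimination instead):

```python
import itertools
import random

def private_parity_learner(examples, d, eps):
    """Steps 1-4 of algorithm A: fail w.p. 1/2, keep each example
    independently w.p. eps, then output a uniformly random r consistent
    with the subsample (or fail if the subsystem is inconsistent)."""
    if random.random() < 0.5:
        return None                                        # step 1: "fail"
    sub = [e for e in examples if random.random() < eps]   # step 2: subsample
    feasible = [r for r in itertools.product((0, 1), repeat=d)
                if all(sum(a * b for a, b in zip(r, y)) % 2 == z
                       for y, z in sub)]                   # step 3: V
    return random.choice(feasible) if feasible else None   # step 4
```

On input consistent with a target c_r, V is never empty, so "fail" comes only from step 1; repeating the algorithm a few times passes step 1 w.h.p., as in the utility lemma.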
Private Learner for Parity
Lemma. Algorithm A is 4ε-differentially private.
Proof: For inputs x and x′ that differ in position i, show that for all outcomes the probability ratio is ≤ 1 + 4ε.
• The changed input xi enters the sample S with probability ε.
• The probability of "fail" goes up or down by at most ε/2:
Pr[A(x) fails] / Pr[A(x′) fails] ≤ (Pr[A(x′) fails] + ε/2) / Pr[A(x′) fails] ≤ 1 + ε, since Pr[A(x′) fails] ≥ 1/2.
• For a hypothesis r (note that A(x) and A(x′) behave identically when i ∉ S):
Pr[A(x) = r] / Pr[A(x′) = r] = (ε·Pr[A(x) = r | i∈S] + (1−ε)·Pr[A(x) = r | i∉S]) / (ε·Pr[A(x′) = r | i∈S] + (1−ε)·Pr[A(x′) = r | i∉S]) ≤ 2ε/(1−ε) + 1 ≤ 4ε + 1 for ε ≤ 1/2.
• The second inequality uses Pr[A(x) = r | i∈S] / Pr[A(x) = r | i∉S] ≤ 2; intuitively, this follows from |Vi| ≥ |Vi−1|/2.
Reminder: Local Privacy-Preserving Protocols
• Interactive
• Non-interactive
[Diagram: a user interacts with participants x1, …, xn only through their local randomizers R1, …, Rn.]
Statistical Query (SQ) Learning [Kearns 93]
• Same guarantees as the PAC model, but the algorithm no longer has access to individual examples.
• The algorithm asks queries g: D × {0,1} → {0,1}; the SQ oracle answers with E_{y~P}[g(y, c(y))] ± τ, the probability that a random labeled example (~ P) satisfies g.
• Requirements: tolerance τ > 1/poly(...); g can be evaluated in poly time; poly running time.
Theorem [BDMN05]. Any SQ algorithm can be simulated by a private algorithm.
Proof: [DMNS06] perturb query answers using Laplace noise.
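A sketch of that simulation (names are mine): answer an SQ query with the empirical average plus Laplace noise of scale 1/(εn), since changing one labeled example moves the average by at most 1/n:

```python
import random

def private_sq_answer(examples, g, eps):
    """Empirical average of g over the labeled examples, plus Laplace noise
    of scale 1/(eps * n); one example changes the average by at most 1/n,
    so the answer is eps-differentially private."""
    n = len(examples)
    avg = sum(g(y, z) for y, z in examples) / n
    lam = eps * n  # difference of two Exp(lam) draws is Laplace, scale 1/lam
    noise = random.expovariate(lam) - random.expovariate(lam)
    return avg + noise

# e.g. privately estimate Pr[label = 1] on a toy labeled sample
data = [(y, y % 2) for y in range(1000)]
answer = private_sq_answer(data, lambda y, z: z, eps=0.5)
```

The noise scale shrinks as 1/n, so for n well above 1/(ετ) the private answer stays within the SQ tolerance τ w.h.p.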
(Non-interactive) Local = (Non-adaptive) SQ
Theorem. Any (non-adaptive) SQ algorithm can be simulated by a (non-interactive) local algorithm.
Local protocol for SQ:
‒ For each i, participant i computes a noisy bit R(xi) on their own.
‒ The sum of the noisy bits allows approximation of the query answer.
‒ R (applied by each participant) is differentially private.
‒ If all SQ queries are known in advance (non-adaptive), the protocol is non-interactive.
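A sketch of this protocol (names are mine): each participant applies ε-DP randomized response to their bit, and the untrusted aggregator debiases the sum:

```python
import math
import random

def local_randomizer(bit, eps):
    """eps-DP randomized response: keep the bit with prob e^eps/(1+e^eps)."""
    keep = math.exp(eps) / (1 + math.exp(eps))
    return bit if random.random() < keep else 1 - bit

def aggregate(noisy_bits, eps):
    """Debias the mean of the noisy reports to estimate the true fraction:
    E[report] = (1 - keep) + true_fraction * (2*keep - 1)."""
    keep = math.exp(eps) / (1 + math.exp(eps))
    mean = sum(noisy_bits) / len(noisy_bits)
    return (mean - (1 - keep)) / (2 * keep - 1)

bits = [1] * 300 + [0] * 700    # true fraction of 1s is 0.3
reports = [local_randomizer(b, eps=1.0) for b in bits]
estimate = aggregate(reports, eps=1.0)
```

Each participant randomizes on their own, so private data never leaves their hands; the server sees only the noisy reports.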
(Non-interactive) Local = (Non-adaptive) SQ
Theorem. Any (non-interactive) local algorithm can be simulated by a (non-adaptive) SQ algorithm.
Technique: rejection sampling.
Proof idea [non-interactive case]: To simulate randomizer R: D → W on entry zi, we need to output w ∈ W with probability p(w) = Pr_{z~P}[R(z) = w].
Let q(w) = Pr[R(0) = w]; by differential privacy of R, q approximates p up to a factor of e^ε.
1. Sample w from q(w).
2. With probability p(w)/(q(w)·e^ε), output w.
3. With the remaining probability, repeat from (1).
Use SQ queries to estimate p(w). Idea: p(w) = Pr_{z~P}[R(z) = w] = E_{z~P}[Pr[R(z) = w]].
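A sketch of the rejection-sampling step (the distributions and names are illustrative), assuming p(w) and q(w) are known exactly; the proof instead estimates p(w) with SQ queries:

```python
import math
import random

def rejection_sample(p, q, q_sampler, eps):
    """Output w with probability p(w), given samples from q and the DP
    guarantee p(w) <= e^eps * q(w): accept w with prob p(w)/(q(w)*e^eps)."""
    while True:
        w = q_sampler()
        if random.random() < p[w] / (q[w] * math.exp(eps)):
            return w

# Toy output distributions over W = {0, 1} satisfying p <= e^eps * q.
eps = math.log(2)
p = {0: 0.6, 1: 0.4}   # target: Pr_{z ~ P}[R(z) = w]
q = {0: 0.5, 1: 0.5}   # proposal: Pr[R(0) = w]
draws = [rejection_sample(p, q, lambda: random.choice((0, 1)), eps)
         for _ in range(5000)]
```

Each round accepts with probability ∑_w q(w) · p(w)/(q(w)e^ε) = e^{−ε}, so the expected number of rounds is only e^ε.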
Non-interactive Local ⊊ Interactive Local
Masked Parity Problems
Concepts: c_{r,a} : {0,1}^{d + log d + 1} → {+1, −1}, indexed by r ∈ {0,1}^d and a ∈ {0,1}:
c_{r,a}(y, i, b) = (−1)^{⟨r,y⟩ (mod 2) + a} if b = 0, and (−1)^{r_i} if b = 1.
• (Adaptive) SQ learner: two rounds of communication.
• Non-adaptive SQ learner: needs 2^{d−1} samples.
‒ The proof uses a Fourier-analytic argument similar to the proof that parity is not in SQ.
Summary
• PAC* is privately learnable
‒ non-efficient learners
• Known problems in PAC are efficiently privately learnable
‒ parity
‒ SQ [BDMN05]
‒ what else is in PAC?
• Equivalence of the local model and SQ:
‒ Local = SQ
‒ Local non-interactive = non-adaptive SQ
• Interactivity helps in the local model
‒ Local non-interactive ⊊ Local
‒ non-adaptive SQ ⊊ SQ
Open questions
• Separate efficient learning from efficient private learning
• Better private algorithms for SQ problems
• Other learning models