From Stability to Differential Privacy
Abhradeep Guha Thakurta
Yahoo! Labs, Sunnyvale
Thesis: Stable algorithms yield differentially private algorithms
Differential privacy: A short tutorial
Privacy in Machine Learning Systems

[Figure: individuals $d_1, d_2, \ldots, d_{n-1}, d_n$ each contribute a record to a trusted learning algorithm, which releases summary statistics to users: 1. classifiers, 2. clusters, 3. regression coefficients. An attacker also observes the released statistics.]
Privacy in Machine Learning Systems

[Figure: individuals $d_1, \ldots, d_n$ feed a learning algorithm, whose output is released to users]

Two conflicting goals:
1. Utility: Release accurate information
2. Privacy: Protect the privacy of individual entries

Balancing the tradeoff is a difficult problem:
1. Netflix Prize database attack [NS08]
2. Facebook advertisement system attack [Korolova11]
3. Amazon recommendation system attack [CKNFS11]

Data privacy is an active area of research:
• Computer science, economics, statistics, biology, social sciences, …
Differential Privacy [DMNS06, DKMMN06]

Intuition:
• An adversary learns essentially the same thing irrespective of your presence or absence in the data set
• Data sets $D$ and $D'$ differing in one entry are called neighboring data sets
• Requirement: neighboring data sets induce close distributions on outputs

[Figure: randomized algorithm M (with its random coins) is run on data set $D$ and on a neighboring data set $D'$; the output distributions $M(D)$ and $M(D')$ are close]
Differential Privacy [DMNS06, DKMMN06]

Definition: A randomized algorithm M is $(\epsilon, \delta)$-differentially private if
• for all data sets $D$ and $D'$ that differ in one element, and
• for all sets of answers $S$:
$$\Pr[M(D) \in S] \le e^{\epsilon} \Pr[M(D') \in S] + \delta$$

Semantics of Differential Privacy
• Differential privacy is a condition on the algorithm
• The guarantee is meaningful in the presence of any auxiliary information
• Typically, think of the privacy parameters as $\epsilon$ a small constant and $\delta \ll 1/n$, where $n$ = # of data samples
• Composition: the $\epsilon$'s and $\delta$'s add up over multiple executions
Laplace Mechanism [DMNS06]

Data set $D \in \mathcal{U}^n$, and let $f: \mathcal{U}^n \to \mathbb{R}$ be a function on $D$.

Sensitivity: $S(f) = \max_{\text{neighbors } D, D'} |f(D) - f(D')|$

1. Sample a random variable $Z$ from $\mathrm{Lap}(S(f)/\epsilon)$
2. Output $f(D) + Z$

Theorem (Privacy): The algorithm is $\epsilon$-differentially private.
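To make the mechanism concrete, here is a minimal Python sketch (numpy only); the counting query and the synthetic data are illustrative assumptions, not part of the talk.

```python
import numpy as np

def laplace_mechanism(data, f, sensitivity, epsilon, rng=None):
    """Release f(data) + Lap(sensitivity / epsilon) noise."""
    rng = rng or np.random.default_rng()
    return f(data) + rng.laplace(scale=sensitivity / epsilon)

# Illustrative query: "how many entries exceed 0.5?" Changing one entry
# changes the count by at most 1, so its sensitivity S(f) is 1.
data = np.random.default_rng(0).random(1000)
count_above_half = lambda d: float(np.sum(d > 0.5))
print(laplace_mechanism(data, count_above_half, sensitivity=1.0, epsilon=0.1))
```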
This Talk
1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Perturbation stability (a.k.a. zero local sensitivity)
Perturbation Stability

[Figure: function $f$ applied to data set $D$ yields output $f(D)$]

Stability of $f$ at $D$: the output does not change on changing any one entry of $D$.
Equivalently, the local sensitivity of $f$ at $D$ is zero.
Distance to Instability Property

• Definition: A function $f$ is $k$-stable at a data set $D$ if for any data set $D'$ with $|D \triangle D'| \le k$, $f(D') = f(D)$
• Distance to instability: the number of entries of $D$ that must be changed before the output of $f$ changes
• Objective: Output $f(D)$ while preserving differential privacy

[Figure: the space of all data sets, split into stable and unstable data sets; $D$ lies in the stable region, at some distance from the unstable set]
Propose-Test-Release (PTR) framework [DL09, KRSY11, Smith T.'13]

A Meta-algorithm: Propose-Test-Release (PTR)

1. Compute a noisy distance to instability: $\hat{d} \leftarrow \mathrm{dist}_f(D) + \mathrm{Lap}(1/\epsilon)$
2. If $\hat{d} > \log(1/\delta)/\epsilon$, then return $f(D)$, else return $\bot$

Theorem: The algorithm is $(\epsilon, \delta)$-differentially private.

Theorem: If $f$ is $\Omega(\log(1/\delta)/\epsilon)$-stable at $D$, then w.p. $\ge 1 - \delta$ the algorithm outputs $f(D)$.
Basic tool: Laplace mechanism
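A minimal Python sketch of PTR, assuming the caller supplies a `dist_to_instability` function with global sensitivity one (computing it efficiently is the hard part, as the talk discusses later); the threshold follows the reconstruction above.

```python
import numpy as np

def propose_test_release(data, f, dist_to_instability, epsilon, delta,
                         rng=None):
    """Release f(data) only if data is (noisily) far from instability.

    dist_to_instability(data): how many entries must change before the
    output of f changes; assumed to have global sensitivity one.
    Returns f(data), or None standing in for "bottom".
    """
    rng = rng or np.random.default_rng()
    noisy_dist = dist_to_instability(data) + rng.laplace(scale=1.0 / epsilon)
    if noisy_dist > np.log(1.0 / delta) / epsilon:
        return f(data)  # f is stable in a large neighborhood of data
    return None         # refuse to answer
```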
This Talk

1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Sample and aggregate framework [NRS07, Smith11, Smith T.'13]
Sample and Aggregate Framework

[Figure: the data set $D$ is split into blocks $D_1, \ldots, D_m$ by subsampling; the learning algorithm runs on each block, and an aggregator combines the $m$ outputs into the final output]
Sample and Aggregate Framework

Theorem: If the aggregator is $(\epsilon, \delta)$-differentially private, then the overall framework is $(\epsilon, \delta)$-differentially private.

Assumption: Each entry appears in at most one data block.

Proof idea: Each data entry affects only one data block.
A differentially private aggregator using the PTR framework [Smith T.'13]

Assumption: $r$ discrete possible outputs (candidate models) $S_1, \ldots, S_r$

[Figure: each block $D_1, \ldots, D_m$ votes for one candidate model; a histogram of vote counts over $S_1, S_2, \ldots, S^*, \ldots, S_r$, with $S^*$ the top-voted model]

A Differentially Private Aggregator

Function $f(D)$: the candidate output with the maximum number of votes

PTR + Report-Noisy-Max Aggregator
1. Let $\mathrm{dist}(D)$ be the gap between the vote counts of the highest and second-highest scoring models
2. If $\mathrm{dist}(D) + \mathrm{Lap}(1/\epsilon) > \log(1/\delta)/\epsilon$, then return $S^*$, else return $\bot$

Observation: $\mathrm{dist}(D)$, the gap between the counts of the highest and the second-highest scoring model, acts as a proxy for the distance to instability.
Observation: The algorithm is always computationally efficient.
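A Python sketch of this aggregator, given one vote per block. The noise scale $2/\epsilon$ reflects that changing one data entry changes at most one vote and so moves the gap by at most 2; the exact scaling in [Smith T.'13] differs.

```python
import numpy as np
from collections import Counter

def ptr_noisy_max(votes, epsilon, delta, rng=None):
    """Release the top-voted model only if its lead over the runner-up
    survives Laplace noise; otherwise return None ("bottom")."""
    rng = rng or np.random.default_rng()
    ranked = Counter(votes).most_common()
    winner, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # One changed entry flips at most one vote, so the gap moves by <= 2.
    noisy_gap = (top - runner_up) + rng.laplace(scale=2.0 / epsilon)
    if noisy_gap > np.log(1.0 / delta) / epsilon:
        return winner
    return None
```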
Analysis of the aggregator under subsampling stability [Smith T.’13]
Subsampling Stability

[Figure: blocks $D_1, \ldots, D_m$ are drawn from the data set $D$ by random subsampling with replacement, and the function $f$ is applied to each block]

Stability: $f$ is $q$-subsampling stable at $D$ if $f(\hat{D}) = f(D)$ w.p. $\ge 3/4$, where $\hat{D}$ includes each entry of $D$ independently w.p. $q$.
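The definition can be checked empirically by Monte Carlo; a small sketch (the trial count and the use of numpy arrays are incidental choices, and `f` is assumed to tolerate small subsamples).

```python
import numpy as np

def estimate_subsampling_stability(data, f, q, trials=200, rng=None):
    """Estimate Pr[f(subsample) == f(data)], where the subsample keeps
    each entry independently w.p. q; q-subsampling stability asks for
    this probability to be at least 3/4."""
    rng = rng or np.random.default_rng()
    data = np.asarray(data)
    target = f(data)
    hits = sum(
        1 for _ in range(trials)
        if f(data[rng.random(len(data)) < q]) == target
    )
    return hits / trials
```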
A Private Aggregator using Subsampling Stability

[Figure: voting histogram (in expectation) over $S_1, S_2, \ldots, S^*, \ldots, S_r$; by subsampling stability, $S^*$ receives at least $\frac{3}{4}m$ votes in expectation, and all other candidates together at most $\frac{1}{4}m$]

• $D_i$: sample each entry from $D$ w.p. $q$
• Each entry of $D$ appears in about $qm$ data blocks in expectation
PTR + Report-Noisy-Max Aggregator

• $D_i$: sample each entry from $D$ w.p. $q$
• Each entry of $D$ appears in $O(qm)$ data blocks w.h.p.

1. Let $\mathrm{dist}(D)$ be the gap between the top two vote counts over $S_1, S_2, \ldots, S^*, \ldots, S_r$
2. If $\mathrm{dist}(D) + \mathrm{Lap}(1/\epsilon) > \log(1/\delta)/\epsilon$, then return $S^*$, else return $\bot$

Theorem: The above algorithm is $(\epsilon, \delta)$-differentially private.

Theorem: If $f$ is $q$-subsampling stable at $D$ with $q = O(\epsilon/\log(1/\delta))$, then with probability at least $1 - \delta$, the true answer $f(D)$ is output.
A Private Aggregator using Subsampling Stability
Notice: Utility guarantee does not depend on the number of candidate models
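Putting the pieces together: a self-contained sketch of subsample-and-aggregate model selection for the LASSO. It uses disjoint blocks rather than with-replacement subsampling for simplicity; the block count `m`, the regularization `lam`, and the thresholds are illustrative assumptions, and sklearn's `Lasso` stands in for the estimator.

```python
import numpy as np
from collections import Counter
from sklearn.linear_model import Lasso

def lasso_support(X, y, lam):
    """Support of the LASSO estimate, as a hashable tuple of indices."""
    theta = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    return tuple(np.flatnonzero(np.abs(theta) > 1e-8))

def private_support_selection(X, y, epsilon, delta, m=32, lam=0.1, rng=None):
    """Run the LASSO on m disjoint blocks, vote over supports, and
    release the winner via the PTR + report-noisy-max aggregator."""
    rng = rng or np.random.default_rng()
    blocks = np.array_split(rng.permutation(X.shape[0]), m)
    votes = Counter(lasso_support(X[b], y[b], lam) for b in blocks)
    ranked = votes.most_common()
    gap = ranked[0][1] - (ranked[1][1] if len(ranked) > 1 else 0)
    # Each entry sits in exactly one block, so one changed entry flips
    # at most one vote and the gap moves by at most 2.
    if gap + rng.laplace(scale=2.0 / epsilon) > np.log(1.0 / delta) / epsilon:
        return ranked[0][0]
    return None
```

A call such as `private_support_selection(X, y, epsilon=1.0, delta=1e-6)` either returns the winning support or `None` ($\bot$), mirroring the utility guarantee above: the answer never depends on how many candidate supports were voted on.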
This Talk
1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Sparse linear regression in high-dimensions and the LASSO
Sparse Linear Regression in High-dimensions ($p \gg n$)

• Data set: $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$
• Assumption: Data generated by a noisy linear system

$$y_i = \langle x_i, \theta^* \rangle + w_i$$

where $x_i$ is the feature vector, $\theta^* \in \mathbb{R}^{p \times 1}$ is the parameter vector, and $w_i$ is the field noise.

Data normalization:
• Each $x_i$ is sub-Gaussian
Sparse Linear Regression in High-dimensions ($p \gg n$)

$$y_{n \times 1} = X_{n \times p}\, \theta^*_{p \times 1} + w_{n \times 1}$$

(response vector = design matrix × parameter vector + field noise)

• Sparsity: $\theta^*$ has $s$ non-zero entries
• Bounded norm: $\|\theta^*\|$ is bounded (up to an arbitrarily small constant)

Model selection problem: Find the non-zero coordinates of $\theta^*$
Sparse Linear Regression in High-dimensions ($p \gg n$)

$$y_{n \times 1} = X_{n \times p}\, \theta^*_{p \times 1} + w_{n \times 1}$$

Model selection: Find the non-zero coordinates (the support) of $\theta^*$

Solution: LASSO estimator [Tibshirani94, EFJT03, Wainwright06, CT07, ZY07, …]

$$\hat{\theta} \in \arg\min_{\theta \in \mathbb{R}^p} \frac{1}{2n} \|y - X\theta\|_2^2 + \Lambda \|\theta\|_1$$
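A non-private baseline in Python: generate data from the noisy linear model above and recover $\mathrm{supp}(\theta^*)$ with sklearn's `Lasso`. The dimensions, noise level, and $\Lambda$ scaling here are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 1000, 5                    # high-dimensional: p >> n
X = rng.standard_normal((n, p))           # random Gaussian design
theta_star = np.zeros(p)
theta_star[:s] = 1.0                      # s-sparse parameter vector
y = X @ theta_star + 0.1 * rng.standard_normal(n)  # field noise

# Lambda on the order of sigma * sqrt(log p / n) is the classical
# scaling for support recovery.
lam = 2 * 0.1 * np.sqrt(np.log(p) / n)
theta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
print("recovered support:", np.flatnonzero(np.abs(theta_hat) > 1e-8))
```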
Consistency of the LASSO Estimator

Consistency conditions* [Wainwright06, ZY07]:
• $\Gamma$: support of the underlying parameter vector $\theta^*$; $X_\Gamma$, $X_{\Gamma^c}$: columns of $X$ inside and outside $\Gamma$
• Incoherence: the columns of $X_{\Gamma^c}$ are nearly orthogonal to those of $X_\Gamma$, i.e. $\|X_{\Gamma^c}^\top X_\Gamma (X_\Gamma^\top X_\Gamma)^{-1}\|_\infty \le 1 - \gamma$ for a constant $\gamma > 0$
• Restricted Strong Convexity (RSC): the minimum eigenvalue of $\frac{1}{n} X_\Gamma^\top X_\Gamma$ is bounded away from zero

Theorem*: Under proper choice of $n$ and $\Lambda$, the support of the LASSO estimator $\hat{\theta}$ equals the support of $\theta^*$.
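A numpy sketch that checks the two conditions for a given design matrix and support; the threshold $\gamma$ and the eigenvalue bound are illustrative assumptions.

```python
import numpy as np

def consistency_conditions_hold(X, support, gamma=0.1, eig_bound=0.25):
    """Check mutual incoherence and RSC for design X and a support set."""
    n, p = X.shape
    S = np.asarray(support)
    Sc = np.setdiff1d(np.arange(p), S)
    XS, XSc = X[:, S], X[:, Sc]
    # Incoherence: || X_{Sc}^T X_S (X_S^T X_S)^{-1} ||_inf <= 1 - gamma
    incoherence = np.abs(XSc.T @ XS @ np.linalg.inv(XS.T @ XS)).sum(axis=1).max()
    # RSC: minimum eigenvalue of (1/n) X_S^T X_S bounded below
    min_eig = np.linalg.eigvalsh(XS.T @ XS / n).min()
    return incoherence <= 1 - gamma and min_eig >= eig_bound

# Random Gaussian designs satisfy both conditions w.h.p. (illustration):
X = np.random.default_rng(1).standard_normal((200, 50))
print(consistency_conditions_hold(X, support=[0, 1, 2]))
```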
Stochastic Consistency of the LASSO

Theorem [Wainwright06, ZY07]: If each data entry is drawn i.i.d. from a suitably well-conditioned (e.g., sub-Gaussian) distribution, then the assumptions above are satisfied w.h.p.
We show [Smith, T.'13]:

Consistency conditions ⇒ Proxy conditions (efficiently testable with privacy) ⇒ Perturbation stability
This Talk
1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Interlude: A simple subsampling-based private LASSO algorithm [Smith, T.'13]
Notion of Neighboring Data Sets

[Figure: a data set $D$ = ($n \times p$ design matrix, $n \times 1$ response vector), with one highlighted row $(x_i, y_i)$]

[Figure: the data set $D'$, identical except that the row $(x_i, y_i)$ is replaced by $(x_i', y_i')$]

$D$ and $D'$ are neighboring data sets.
Recap: Subsampling Stability

[Figure: blocks $D_1, \ldots, D_m$ are drawn from $D$ by random subsampling with replacement, and the function $f$ is applied to each block]

Stability: $f$ is $q$-subsampling stable at $D$ if $f(\hat{D}) = f(D)$ w.p. $\ge 3/4$, where $\hat{D}$ includes each entry of $D$ independently w.p. $q$.
Recap: PTR + Report-Noisy-Max Aggregator

Assumption: All candidate models come from a discrete set $S_1, S_2, \ldots, S_k$

[Figure: $f$ is applied to each block $D_1, \ldots, D_m$, and each block votes for one candidate model; a histogram of vote counts, with $S^*$ the top-voted model]

• $D_i$: sample each entry from $D$ w.p. $q$
• Each entry of $D$ appears in $O(qm)$ data blocks w.h.p.
• Fix the number of blocks $m$

1. Let $\mathrm{dist}(D)$ be the gap between the top two vote counts over $S_1, S_2, \ldots, S^*, \ldots, S_r$
2. If $\mathrm{dist}(D) + \mathrm{Lap}(1/\epsilon) > \log(1/\delta)/\epsilon$, then return $S^*$, else return $\bot$
Subsampling Stability of the LASSO

Stochastic assumptions: each data entry is drawn i.i.d. from a suitably well-conditioned (e.g., sub-Gaussian) distribution; the noise $w$ is sub-Gaussian.

$$y_{n \times 1} = X_{n \times p}\, \theta^*_{p \times 1} + w_{n \times 1}$$

(response vector = design matrix × parameter vector + field noise)

Theorem [Wainwright06, ZY07]: Under proper choice of $n$ and $\Lambda$, the support of the LASSO estimator equals the support of $\theta^*$.

Theorem: Under proper choice of $n$, $\Lambda$, and $q$, the output of the aggregator equals the support of $\theta^*$.

Notice the gap in the required sample size $n$ (and in the scale of $\Lambda$) compared to the non-private LASSO.
Perturbation-stability-based private LASSO and optimal sample complexity [Smith, T.'13]
Recap: Distance to Instability Property

• Definition: A function $f$ is $k$-stable at a data set $D$ if for any data set $D'$ with $|D \triangle D'| \le k$, $f(D') = f(D)$
• Distance to instability: the number of entries of $D$ that must be changed before the output of $f$ changes
• Objective: Output $f(D)$ while preserving differential privacy

[Figure: the space of all data sets, split into stable and unstable data sets; $D$ lies in the stable region, at some distance from the unstable set]

Recap: Propose-Test-Release Framework (PTR)

1. Compute a noisy distance to instability: $\hat{d} \leftarrow \mathrm{dist}_f(D) + \mathrm{Lap}(1/\epsilon)$
2. If $\hat{d} > \log(1/\delta)/\epsilon$, then return $f(D)$, else return $\bot$

Theorem: The algorithm is $(\epsilon, \delta)$-differentially private.

Theorem: If $f$ is $\Omega(\log(1/\delta)/\epsilon)$-stable at $D$, then w.p. $\ge 1 - \delta$ the algorithm outputs $f(D)$.
Needed: a query with global sensitivity one, standing in for the distance to instability.
Instantiation of PTR for the LASSO

LASSO: $\hat{\theta} \in \arg\min_{\theta} \frac{1}{2n} \|y - X\theta\|_2^2 + \Lambda \|\theta\|_1$

• Set the function $f(D)$ = support of $\hat{\theta}$
• Issue: For this $f$, the distance to instability might not be efficiently computable
From [Smith, T.'13], and the roadmap for this part of the talk:

Consistency conditions ⇒ Proxy conditions (efficiently testable with privacy) ⇒ Perturbation stability
Perturbation Stability of the LASSO

LASSO: $\hat{\theta} \in \arg\min_{\theta} J_D(\theta)$, where $J_D(\theta) = \frac{1}{2n} \|y - X\theta\|_2^2 + \Lambda \|\theta\|_1$

Theorem: The consistency conditions on the LASSO are sufficient for perturbation stability.

Proof sketch:
1. Analyze the Karush-Kuhn-Tucker (KKT) optimality conditions at the optima: $0 \in \partial J_D(\hat{\theta})$ for the objective on $D$, and $0 \in \partial J_{D'}(\hat{\theta}')$ for the objective on the neighboring data set $D'$
2. Show that $\mathrm{supp}(\hat{\theta})$ is stable via a "dual certificate" on stable instances

Argue using the optimality conditions of $\hat{\theta}$ and $\hat{\theta}'$:
1. No zero coordinate of $\hat{\theta}$ becomes non-zero in $\hat{\theta}'$ (use the mutual incoherence condition)
2. No non-zero coordinate of $\hat{\theta}$ becomes zero in $\hat{\theta}'$ (use the restricted strong convexity condition)
Perturbation Stability Test for the LASSO

$\Gamma$: support of $\hat{\theta}$; $\Gamma^c$: complement of the support of $\hat{\theta}$

Test for the following (the real test is more complex):
• Restricted Strong Convexity (RSC): the minimum eigenvalue of $\frac{1}{n} X_\Gamma^\top X_\Gamma$ is bounded away from zero
• Strong stability: the (absolute) coordinates of the gradient of the least-squared loss in $\Gamma^c$ stay well below $\Lambda$

Intuition: strong convexity ensures $\mathrm{supp}(\hat{\theta}) \subseteq \mathrm{supp}(\hat{\theta}')$:
1. Strong convexity ensures $\|\hat{\theta} - \hat{\theta}'\|$ is small
2. If the minimum non-zero coordinate of $\hat{\theta}$ is large, no non-zero coordinate can become zero
3. The consistency conditions imply the minimum non-zero coordinate is large
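The simplified test translates directly into numpy; `eig_bound` and the margin fraction are illustrative thresholds, and the real test in [Smith, T.'13] is more complex.

```python
import numpy as np

def lasso_stability_test(X, y, theta_hat, lam, eig_bound=0.25,
                         margin_frac=0.5):
    """Simplified perturbation-stability test for a LASSO solution:
    (1) RSC on the support and (2) strong stability, i.e., the
    least-squares gradient on the complement stays well below lam."""
    n, p = X.shape
    S = np.flatnonzero(np.abs(theta_hat) > 1e-8)
    if S.size == 0:
        return False
    Sc = np.setdiff1d(np.arange(p), S)
    min_eig = np.linalg.eigvalsh(X[:, S].T @ X[:, S] / n).min()
    grad = -X.T @ (y - X @ theta_hat) / n   # gradient of least-squares loss
    margin = lam - np.abs(grad[Sc]).max()
    return min_eig >= eig_bound and margin >= margin_frac * lam
```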
Geometry of the Stability of LASSO

[Figure: the LASSO objective along a coordinate in $\Gamma^c$ (dimension 1 and dimension 2 in $\Gamma^c$), plotted around the minimizer $\hat{\theta}$; the $\ell_1$ penalty contributes a kink at zero with slopes $+\Lambda$ and $-\Lambda$]

Intuition: strong stability ensures no zero coordinate in $\hat{\theta}$ becomes non-zero in $\hat{\theta}'$:
• For the minimizer to move along $\Gamma^c$, the perturbation to the gradient of the least-squared loss has to be large
Geometry of the Stability of LASSO

Gradient of the least-squared loss: $-X^\top (y - X\hat{\theta}) = (a_1, \ldots, a_p)$, with coordinates split between $\Gamma$ and $\Gamma^c$

• Strong stability: $|a_i| < \Lambda$ for all $i \in \Gamma^c$, so $\hat{\theta}$ has a sub-gradient of zero for LASSO($D'$)

[Figure: the LASSO objective along a coordinate in $\Gamma^c$ around $\hat{\theta}$, with slopes $+\Lambda$ and $-\Lambda$ from the $\ell_1$ penalty]
Making the Stability Test Private (Simplified)

• Test for Restricted Strong Convexity: $g_1$ = minimum eigenvalue of $\frac{1}{n} X_\Gamma^\top X_\Gamma$
• Test for strong stability: $g_2$ = margin by which the gradient coordinates in $\Gamma^c$ stay below $\Lambda$
• Issue: tested naively, the sensitivities of these statistics can be large
• Our solution: a proxy distance $\hat{d}$, a function of $g_1$ and $g_2$, which has global sensitivity of one
• When the consistency conditions hold, $g_1$ and $g_2$ are both large and insensitive

1. Compute $\hat{d}$ = function of $g_1$ and $g_2$
2. If $\hat{d} + \mathrm{Lap}(1/\epsilon) > \log(1/\delta)/\epsilon$, then return $\mathrm{supp}(\hat{\theta})$, else return $\bot$
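A sketch of the private test, under the loudly assumed Lipschitz property that one changed entry moves each statistic by at most `lip`; flooring then yields a proxy with global sensitivity one. The actual construction in [Smith, T.'13] is more refined.

```python
import numpy as np

def private_stability_check(g1, g2, lip, epsilon, delta, rng=None):
    """PTR on a proxy distance built from the test statistics g1, g2.

    Assumption: changing one entry moves g1 and g2 by at most lip, so
    d = floor(min(g1, g2) / lip) changes by at most 1 between
    neighboring data sets, i.e., it has global sensitivity one."""
    rng = rng or np.random.default_rng()
    d = np.floor(min(g1, g2) / lip)
    noisy = d + rng.laplace(scale=1.0 / epsilon)
    return noisy > np.log(1.0 / delta) / epsilon
```

If the check returns True, release $\mathrm{supp}(\hat{\theta})$; otherwise output $\bot$.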
Private Model Selection with Optimal Sample Complexity

Theorem: The algorithm is $(\epsilon, \delta)$-differentially private.

Theorem: Under the consistency conditions and proper choice of $n$ and $\Lambda$, w.h.p. the support of $\theta^*$ is output.

Nearly optimal sample complexity: essentially matching the non-private LASSO.
Thesis: Stable algorithms yield differentially private algorithms
Two notions of stability:
1. Perturbation stability
2. Subsampling stability
This Talk
1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Concluding Remarks

1. The sample and aggregate framework with the PTR + report-noisy-max aggregator is a generic tool for designing learning algorithms
• Example: learning with non-convex models [Bilenko, Dwork, Rothblum, T.]
2. The propose-test-release framework is an interesting tool whenever one can compute the distance to instability efficiently
3. Open problem: private high-dimensional learning without assumptions like incoherence and restricted strong convexity