From Stability to Differential Privacy


From Stability to Differential Privacy

Abhradeep Guha Thakurta, Yahoo! Labs, Sunnyvale

Thesis: Stable algorithms yield differentially private algorithms

Differential privacy: A short tutorial

Privacy in Machine Learning Systems

Individuals contribute their records d₁, d₂, …, d_{n−1}, d_n to a trusted learning algorithm

The algorithm releases summary statistics to users:
1. Classifiers
2. Clusters
3. Regression coefficients

An attacker may also observe the released output

Privacy in Machine Learning Systems

Individuals d₁, d₂, …, d_{n−1}, d_n → Learning algorithm → Users

Two conflicting goals:

1. Utility: Release accurate information

2. Privacy: Protect privacy of individual entries

Balancing the tradeoff is a difficult problem:

1. Netflix prize database attack [NS08]

2. Facebook advertisement system attack [Korolova11]

3. Amazon recommendation system attack [CKNFS11]

Data privacy is an active area of research:

• Computer science, economics, statistics, biology, social sciences, …


Differential Privacy [DMNS06, DKMMN06]

Intuition:

โ€ข Adversary learns essentially the same thing irrespective of your presence or absence in the data set

• Data sets D and D′ that differ in a single individual's record are called neighboring data sets

โ€ข Require: Neighboring data sets induce close distribution on outputs

(Figure: the same randomized algorithm M, with its random coins, is run on data set D and on a neighboring data set D′ in which one record d₁ has been replaced; the output distributions M(D) and M(D′) must be close)

Differential Privacy [DMNS06, DKMMN06]

Definition:

A randomized algorithm M is (ε, δ)-differentially private if
• for all data sets D and D′ that differ in one element
• for all sets of answers S:

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ

• Differential privacy is a condition on the algorithm
• The guarantee is meaningful in the presence of any auxiliary information
• Typically, think of privacy parameters: ε a small constant (e.g., 0.1) and δ ≪ 1/n, where n = # of data samples
• Composition: the ε's and δ's add up over multiple executions

Semantics of Differential Privacy

Laplace Mechanism [DMNS06]

Data set D and a real-valued function f on D

Sensitivity: S(f) = max over neighboring data sets D, D′ of |f(D) − f(D′)|

1. Sample a random variable Z from Lap(S(f)/ε)
2. Output f(D) + Z

Theorem (Privacy): The algorithm is ε-differentially private
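To make this concrete, here is a minimal Python sketch of the Laplace mechanism for a counting query; the data, the query, and all parameter values are illustrative choices, not taken from the talk.

```python
import numpy as np

def laplace_mechanism(data, query, sensitivity, epsilon, rng=None):
    """Release query(data) + Lap(sensitivity / epsilon) noise."""
    rng = rng or np.random.default_rng()
    return query(data) + rng.laplace(scale=sensitivity / epsilon)

# Example: a counting query ("how many entries exceed 0.5").
# Changing one entry changes the count by at most 1, so sensitivity = 1.
data = np.random.default_rng(0).random(1000)
count_query = lambda d: float(np.sum(d > 0.5))
print(laplace_mechanism(data, count_query, sensitivity=1.0, epsilon=0.1))
```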

This Talk

1. Differential privacy via stability arguments: A meta-algorithm

2. Sample and aggregate framework and private model selection

3. Non-private sparse linear regression in high-dimensions

4. Private sparse linear regression with (nearly) optimal rate

Perturbation stability (a.k.a. zero local sensitivity)

Perturbation Stability

Function f applied to data set D produces output f(D)

Stability of f at D: the output does not change on changing any one entry of D
Equivalently, the local sensitivity of f at D is zero

Distance to Instability Property

• Definition: A function f is stable at a data set D if for any neighboring data set D′ (‖D − D′‖_H ≤ 1), f(D′) = f(D)

• Distance to instability: the number of entries of D that must be changed to reach an unstable data set

• Objective: Output f(D) while preserving differential privacy

(Figure: the space of all data sets, split into stable and unstable data sets; D lies at some distance from the unstable region)

Propose-Test-Release (PTR) framework [DL09, KRSY11, Smith, T. '13]

A Meta-algorithm: Propose-Test-Release (PTR)

1. Compute d̂ ← dist(D) + Lap(1/ε)
2. If d̂ > log(1/δ)/ε, then return f(D), else return ⊥

Theorem: The algorithm is (ε, δ)-differentially private

Theorem: If f is k-stable at D for k ≳ log(1/δ)/ε, then w.h.p. the algorithm outputs f(D)

Basic tool: Laplace mechanism
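A minimal sketch of the PTR meta-algorithm as stated on this slide. The helper dist_to_instability is assumed to be supplied by the caller, to have global sensitivity one, and to measure how many entries must change before f changes; computing it efficiently is exactly the difficulty addressed later in the talk.

```python
import math
import numpy as np

def propose_test_release(data, f, dist_to_instability, epsilon, delta, rng=None):
    """PTR: release f(data) only if the data set is (noisily) far from instability.

    dist_to_instability(data) is assumed to have global sensitivity 1; the
    threshold log(1/delta)/epsilon keeps the chance of releasing f(data) on an
    unstable data set below delta.
    """
    rng = rng or np.random.default_rng()
    noisy_dist = dist_to_instability(data) + rng.laplace(scale=1.0 / epsilon)
    if noisy_dist > math.log(1.0 / delta) / epsilon:
        return f(data)   # stable case: the exact (un-noised) answer is released
    return None          # "bottom": refuse to answer
```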

This Talk

1. Differential privacy via stability arguments: A meta-algorithm

2. Sample and aggregate framework and private model selection

3. Non-private sparse linear regression in high-dimensions

4. Private sparse linear regression with (nearly) optimal rate

Sample and aggregate framework [NRS07, Smith11, Smith, T. '13]

Sample and Aggregate Framework

(Figure: the data set is subsampled into blocks D₁, …, D_m; the learning algorithm is run on each block, and an aggregator combines the m block outputs into a single output)

Sample and Aggregate Framework

Theorem: If the aggregator is differentially private, then the overall framework is differentially private

Assumption: Each entry appears in at most one data block

Proof: Each data entry affects only one data block
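A skeleton of the framework under the assumption above (each entry lands in exactly one block); learner and private_aggregator are placeholders for the block-level algorithm and any differentially private aggregator.

```python
import numpy as np

def sample_and_aggregate(data, learner, private_aggregator, num_blocks, rng=None):
    """Split the data into disjoint blocks, run the learner on each block, and
    combine the block outputs with a differentially private aggregator.
    Privacy follows because each entry influences only one block output."""
    rng = rng or np.random.default_rng()
    shuffled = rng.permutation(len(data))
    blocks = np.array_split(shuffled, num_blocks)
    block_outputs = [learner([data[i] for i in block]) for block in blocks]
    return private_aggregator(block_outputs)
```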

A differentially private aggregator using the PTR framework [Smith, T. '13]

Assumption: a discrete set of possible outputs (candidate models S₁, S₂, …, S*, …, S_r)

(Figure: each data block D₁, …, D_m votes for one candidate; the votes form a histogram of counts over the candidates)

A Differentially Private Aggregator

Function f: the candidate output with the maximum number of votes

PTR + Report-Noisy-Max Aggregator

1. Compute the gap between the vote counts of the highest and second-highest scoring candidates
2. If gap + Lap(1/ε) > log(1/δ)/ε, then return the top candidate, else return ⊥

Observation: the gap between the counts of the highest and second-highest scoring models acts as the distance to instability of f

Observation: The algorithm is always computationally efficient
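A sketch of this aggregator in Python, assuming the block outputs (candidate models) are hashable objects such as tuples; the threshold and the use of the raw gap as the distance proxy are simplifications of the slide's elided formula.

```python
import math
from collections import Counter
import numpy as np

def ptr_noisy_max_aggregator(block_outputs, epsilon, delta, rng=None):
    """Return the most-voted candidate if its lead over the runner-up survives
    Laplace noise and a log(1/delta)/epsilon threshold; otherwise refuse."""
    rng = rng or np.random.default_rng()
    counts = Counter(block_outputs).most_common()
    top_candidate, top_count = counts[0]
    runner_up = counts[1][1] if len(counts) > 1 else 0
    gap = top_count - runner_up            # proxy for the distance to instability
    if gap + rng.laplace(scale=1.0 / epsilon) > math.log(1.0 / delta) / epsilon:
        return top_candidate
    return None
```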

Analysis of the aggregator under subsampling stability [Smith, T. '13]

Subsampling Stability

Data set D; draw random subsamples (with replacement) D₁, …, D_m

Stability: f(Dᵢ) = f(D) w.p. ≥ 3/4 over the choice of the subsample

A Private Aggregator using Subsampling Stability

Voting histogram (in expectation) over candidates S₁, S₂, …, S*, …, S_r: the true answer S* receives roughly 3/4 of the m votes, every other candidate far fewer, so the gap between the top two counts is large

• D₁, …, D_m: sample each entry from D w.p. q
• Each entry of D appears in about q·m data blocks (in expectation)

PTR + Report-Noisy-Max Aggregator

• D₁, …, D_m: sample each entry from D w.p. q
• Each entry of D appears in O(q·m) data blocks w.h.p.

1. Compute the gap between the highest and second-highest vote counts over S₁, S₂, …, S*, …, S_r
2. If gap + Lap(1/ε) exceeds the threshold (≈ log(1/δ)/ε), then return the top candidate, else return ⊥

Theorem: The above algorithm is (ε, δ)-differentially private

Theorem: If f is q-subsampling stable for a suitable q = q(ε, δ), then with high probability the true answer f(D) is output

A Private Aggregator using Subsampling Stability

Notice: Utility guarantee does not depend on the number of candidate models

This Talk

1. Differential privacy via stability arguments: A meta-algorithm

2. Sample and aggregate framework and private model selection

3. Non-private sparse linear regression in high-dimensions

4. Private sparse linear regression with (nearly) optimal rate

Sparse linear regression in high-dimensions and the LASSO

Sparse Linear Regression in High-dimensions (p ≫ n)

• Data set: D = {(x₁, y₁), …, (x_n, y_n)}, where xᵢ ∈ ℝᵖ is a feature vector and yᵢ ∈ ℝ a response
• Assumption: Data generated by a noisy linear system

yᵢ = ⟨xᵢ, θ*⟩ + wᵢ   (parameter vector θ* ∈ ℝᵖ, field noise wᵢ)

In matrix form: y = Xθ* + w, with y ∈ ℝⁿ (response vector), X ∈ ℝ^{n×p} (design matrix), w ∈ ℝⁿ

• Data normalization / distributional assumption: the feature vectors xᵢ are sub-Gaussian
• Sparsity: θ* has s non-zero entries
• Bounded norm: ‖θ*‖ is bounded by a constant

Model selection problem: Find the non-zero coordinates of θ*

Sparse Linear Regression in High-dimensions (p ≫ n)

y = Xθ* + w, with y ∈ ℝⁿ (response vector), X ∈ ℝ^{n×p} (design matrix), θ* ∈ ℝᵖ (parameter vector), w ∈ ℝⁿ (field noise)

Model selection: Find the non-zero coordinates (the support) of θ*

Solution: the LASSO estimator [Tibshirani94, EFJT03, Wainwright06, CT07, ZY07, …]

θ̂ = argmin over θ ∈ ℝᵖ of (1/2n)‖y − Xθ‖₂² + λ‖θ‖₁
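As a point of reference, a minimal non-private sketch of LASSO-based support recovery using scikit-learn; the synthetic data, the √(log p / n) scaling of λ, and the support threshold are illustrative choices, not prescribed by the talk.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 1000, 5                       # n samples, p features, s-sparse
X = rng.standard_normal((n, p))
theta_star = np.zeros(p)
theta_star[:s] = 1.0                         # true support: first s coordinates
y = X @ theta_star + 0.1 * rng.standard_normal(n)

lam = 2.0 * np.sqrt(np.log(p) / n)           # usual scaling for support recovery
lasso = Lasso(alpha=lam).fit(X, y)           # sklearn minimizes (1/2n)||y - X0||^2 + alpha*||0||_1
support = np.flatnonzero(np.abs(lasso.coef_) > 1e-6)
print(support)                               # ideally recovers {0, ..., s-1}
```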

Sparse Linear Regression in High-dimensions (p ≫ n)

y = Xθ* + w, with y ∈ ℝⁿ (response vector), X ∈ ℝ^{n×p} (design matrix), w ∈ ℝⁿ (field noise)

Support recovery by the LASSO relies on two conditions on the design matrix: Incoherence and Restricted Strong Convexity

Consistency of the LASSO Estimator

Consistency conditions* [Wainwright06, ZY07]:
• Γ: support of the underlying parameter vector θ*
• Incoherence: the columns of X outside Γ are only weakly correlated with the columns of X_Γ
• Restricted Strong Convexity (RSC): the minimum eigenvalue of (1/n) X_Γᵀ X_Γ is bounded away from zero

Theorem*: Under a proper choice of λ and n, the support of the LASSO estimator equals the support of θ*

Stochastic Consistency of the LASSO

Theorem [Wainwright06, ZY07]: If each data entry is drawn i.i.d. from a suitably well-behaved (e.g., sub-Gaussian) distribution, then the assumptions above are satisfied w.h.p.

We show [Smith, T. '13]:

Consistency conditions ⇒ Proxy conditions (efficiently testable with privacy) ⇒ Perturbation stability

This Talk

1. Differential privacy via stability arguments: A meta-algorithm

2. Sample and aggregate framework and private model selection

3. Non-private sparse linear regression in high-dimensions

4. Private sparse linear regression with (nearly) optimal rate

Interlude: A simple subsampling-based private LASSO algorithm [Smith, T. '13]

Notion of Neighboring Data Sets

Data set D = (X, y): an n × p design matrix X together with an n-dimensional response vector y; row i is the pair (xᵢ, yᵢ)

Data set D′ = (X′, y′): obtained from D by replacing one row (xᵢ, yᵢ) with (xᵢ′, yᵢ′)

D and D′ are neighboring data sets

Recap: Subsampling Stability

Data set D; draw random subsamples (with replacement) D₁, …, D_m

Stability: f(Dᵢ) = f(D) w.p. ≥ 3/4 over the choice of the subsample

Recap: PTR + Report-Noisy-Max Aggregator

Assumption: all candidate models come from a discrete set S₁, S₂, …, S*, …, S_k

(Figure: f is run on each block D₁, …, D_m; each block votes for one candidate, producing a histogram of vote counts over the candidates)

• D₁, …, D_m: sample each entry from D w.p. q
• Each entry of D appears in O(q·m) data blocks w.h.p.
• Fix the subsampling parameters q and m

1. Compute the gap between the highest and second-highest vote counts over S₁, S₂, …, S*, …, S_r
2. If gap + Lap(1/ε) exceeds the threshold log(1/δ)/ε, then return the top candidate, else return ⊥

Subsampling Stability of the LASSO

Stochastic assumptions: each data entry of the design matrix X is drawn from a well-behaved (e.g., sub-Gaussian) distribution; the noise w is drawn i.i.d. (e.g., Gaussian)

y = Xθ* + w, with y ∈ ℝⁿ (response vector), X ∈ ℝ^{n×p} (design matrix), θ* ∈ ℝᵖ (parameter vector), w ∈ ℝⁿ (field noise)

Theorem [Wainwright06, ZY07]: Under a proper choice of λ and n, the support of the LASSO estimator equals the support of θ* (w.h.p.)

Theorem: Under a proper choice of λ, n, and the subsampling parameters, the output of the aggregator equals the support of θ* (w.h.p.)

Notice the gap: the subsampling-based algorithm needs a noticeably larger sample size n than the non-private LASSO, and the scale of λ must be adjusted for the smaller blocks; removing this gap is the goal of the next part
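A sketch of the simple subsampling-based private LASSO described in this part: run the LASSO on random blocks, represent each block's support as a hashable tuple, and feed the votes to a PTR + report-noisy-max aggregator such as the one sketched earlier. Block sizes, λ, and the sampling scheme here are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_support(X, y, lam):
    """Support of the LASSO estimator, as a tuple (hashable, so it can be voted on)."""
    coef = Lasso(alpha=lam).fit(X, y).coef_
    return tuple(np.flatnonzero(np.abs(coef) > 1e-6))

def subsampled_lasso_votes(X, y, lam, num_blocks, block_size, rng=None):
    """Run the LASSO on random subsamples; each block casts one vote for a support."""
    rng = rng or np.random.default_rng()
    votes = []
    for _ in range(num_blocks):
        idx = rng.choice(len(y), size=block_size, replace=False)
        votes.append(lasso_support(X[idx], y[idx], lam))
    return votes

# The list of votes can then be passed to a PTR + report-noisy-max aggregator
# (see the earlier sketch) to obtain a differentially private support estimate.
```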

Perturbation-stability-based private LASSO and optimal sample complexity [Smith, T. '13]

Recap: Distance to Instability Property

• Definition: A function f is stable at a data set D if for any neighboring data set D′ (‖D − D′‖_H ≤ 1), f(D′) = f(D)
• Distance to instability: the number of entries of D that must be changed to reach an unstable data set
• Objective: Output f(D) while preserving differential privacy

(Figure: the space of all data sets, split into stable and unstable data sets; D lies at some distance from the unstable region)

Recap: Propose-Test-Release Framework (PTR)

1. Compute d̂ ← dist(D) + Lap(1/ε)
2. If d̂ > log(1/δ)/ε, then return f(D), else return ⊥

Theorem: The algorithm is (ε, δ)-differentially private

Theorem: If f is k-stable at D for k ≳ log(1/δ)/ε, then w.h.p. the algorithm outputs f(D)

TBD: a query with global sensitivity one that can stand in for the distance to instability

Instantiation of PTR for the LASSO

LASSO: θ̂ = argmin over θ ∈ ℝᵖ of (1/2n)‖y − Xθ‖₂² + λ‖θ‖₁

• Set the function f(D) = support of θ̂
• Issue: For this f, the distance to instability might not be efficiently computable

From [Smith, T. '13]:

Consistency conditions ⇒ Proxy conditions (efficiently testable with privacy) ⇒ Perturbation stability

This talk: the proxy conditions, and how they imply perturbation stability of the LASSO support

Perturbation Stability of the LASSO

LASSO: θ̂ = argmin over θ ∈ ℝᵖ of (1/2n)‖y − Xθ‖₂² + λ‖θ‖₁

Theorem: The consistency conditions on the LASSO are sufficient for perturbation stability

Proof Sketch:
1. Analyze the Karush-Kuhn-Tucker (KKT) optimality conditions at θ̂
2. Show that support(θ̂) is stable via a "dual certificate" on stable instances

Proof Sketch (continued): let J_D denote the LASSO objective on D and J_{D′} the objective on a neighboring D′; the optimality conditions are 0 ∈ ∂J_D(θ̂) and 0 ∈ ∂J_{D′}(θ̂′)

Argue using the optimality conditions of θ̂ and θ̂′:
1. No zero coordinate of θ̂ becomes non-zero in θ̂′ (use the mutual incoherence condition)
2. No non-zero coordinate of θ̂ becomes zero in θ̂′ (use the restricted strong convexity condition)

Perturbation Stability Test for the LASSO

Γ: support of θ̂;   Γᶜ: complement of the support of θ̂

Test for the following (the real test is more complex):
• Restricted Strong Convexity (RSC): the minimum eigenvalue of (1/n) X_Γᵀ X_Γ is bounded away from zero
• Strong stability: λ minus the (absolute) coordinates of the gradient of the least-squared loss on Γᶜ is large

Intuition: restricted strong convexity ensures supp(θ̂) ⊆ supp(θ̂′):
1. Strong convexity ensures ‖θ̂ − θ̂′‖ is small
2. If the smallest non-zero (absolute) coordinate of θ̂ is large, then none of those coordinates can become zero in θ̂′
3. The consistency conditions imply the smallest non-zero coordinate of θ̂ is large
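A sketch of the two (non-private) test statistics named on this slide, computed from the data and the LASSO solution; the actual test in [Smith, T. '13] is more involved, and the thresholds are omitted here.

```python
import numpy as np

def stability_test_statistics(X, y, theta_hat, lam):
    """Return (RSC statistic, strong-stability statistic).

    RSC statistic: smallest eigenvalue of (1/n) X_G^T X_G restricted to the
    support G of theta_hat.  Strong-stability statistic: lam minus the largest
    absolute gradient coordinate of the least-squared loss outside the support."""
    n, p = X.shape
    support = np.flatnonzero(np.abs(theta_hat) > 1e-6)
    off_support = np.setdiff1d(np.arange(p), support)

    X_s = X[:, support]
    rsc = np.linalg.eigvalsh(X_s.T @ X_s / n).min() if support.size else np.inf

    grad = -X.T @ (y - X @ theta_hat) / n     # gradient of (1/2n)||y - X theta||^2
    strong_stab = lam - np.max(np.abs(grad[off_support])) if off_support.size else np.inf
    return rsc, strong_stab
```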

Geometry of the Stability of LASSO

(Figure: the LASSO objective along a dimension in Γᶜ, plotted around θ̂; the ℓ₁ penalty creates a kink at zero with slopes +λ and −λ on either side)

Intuition: strong stability ensures that no zero coordinate of θ̂ becomes non-zero in θ̂′
• For the minimizer to move along a dimension in Γᶜ, the perturbation to the gradient of the least-squared loss has to be large (comparable to λ)

Gradient of the least-squared loss: −Xᵀ(y − Xθ̂), with coordinates a₁, …, a_p split between Γ and Γᶜ

• Strong stability: for every i ∈ Γᶜ, |aᵢ| stays strictly below λ, so coordinate i keeps a sub-gradient of zero for the LASSO objective on the perturbed data set

Test for Restricted Strong Convexity: the minimum eigenvalue of (1/n) X_Γᵀ X_Γ

Test for strong stability: λ minus the largest absolute gradient coordinate of the least-squared loss on Γᶜ

Issue: if these statistics are used directly, their sensitivities depend on the scale of the data and can be much larger than one

Our solution: a proxy distance d̂(D)
• d̂(D) has global sensitivity of one

Making the Stability Test Private (Simplified)

g₁: the RSC test statistic;   g₂: the strong-stability test statistic
The proxy distance is designed so that it is large only when g₁ and g₂ are both large, and so that it is insensitive (global sensitivity one)

1. Compute d̂ = a function of g₁ and g₂ (the proxy distance)
2. If d̂ + Lap(1/ε) > log(1/δ)/ε, then return supp(θ̂), else return ⊥
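A heavily simplified sketch in the spirit of this slide. The proxy distance below is a hypothetical construction: scale1 and scale2 are assumed bounds on how much one data entry can move each statistic, so the clipped minimum moves by at most one per entry; the actual proxy in [Smith, T. '13] is built and calibrated differently.

```python
import math
import numpy as np

def private_support_release(support, g1, g2, scale1, scale2, epsilon, delta, rng=None):
    """Release `support` only if a noisy proxy distance clears the PTR threshold.

    g1, g2: the RSC and strong-stability statistics.  scale1, scale2: assumed
    per-entry sensitivities of g1 and g2, so that the proxy below has global
    sensitivity at most one.  Illustrative only."""
    rng = rng or np.random.default_rng()
    proxy_distance = min(g1 / scale1, g2 / scale2)   # small if either test fails
    noisy = proxy_distance + rng.laplace(scale=1.0 / epsilon)
    if noisy > math.log(1.0 / delta) / epsilon:
        return support
    return None
```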

Private Model Selection with Optimal Sample Complexity

Theorem: The algorithm is (ε, δ)-differentially private

Theorem: Under the consistency conditions and a proper choice of λ and n, w.h.p. the support of θ* is output. Here n is essentially what the non-private LASSO already requires.

Nearly optimal sample complexity

Thesis: Stable algorithms yield differentially private algorithms

Two notions of stability:

1. Perturbation stability

2. Subsampling stability

This Talk

1. Differential privacy via stability arguments: A meta-algorithm

2. Sample and aggregate framework and private model selection

3. Non-private sparse linear regression in high-dimensions

4. Private sparse linear regression with (nearly) optimal rate

Concluding Remarks

1. The sample and aggregate framework with the PTR + report-noisy-max aggregator is a generic tool for designing learning algorithms

โ€ข Example: learning with non-convex models [Bilenko,Dwork,Rothblum,T.]

2. Propose-test-release framework is an interesting tool if one can compute distance to instability efficiently

3. Open problem: Private high-dimensional learning without assumptions like incoherence and restricted strong convexity
