From Stability to Differential Privacy
Abhradeep Guha Thakurta, Yahoo! Labs, Sunnyvale
Thesis: Stable algorithms yield differentially private algorithms
Differential privacy: A short tutorial
Privacy in Machine Learning Systems

[Figure: individuals' records d_1, d_2, …, d_{n−1}, d_n are collected by a trusted learning algorithm, which releases summary statistics (1. classifiers, 2. clusters, 3. regression coefficients) to users; an attacker may also observe the released output.]
Privacy in Machine Learning Systems
[Figure: records d_1, d_2, …, d_{n−1}, d_n feed the learning algorithm; its output is released to users.]
Two conflicting goals:
1. Utility: Release accurate information
2. Privacy: Protect privacy of individual entries
Balancing the tradeoff is a difficult problem:
1. Netflix prize database attack [NS08]
2. Facebook advertisement system attack [Korolova11]
3. Amazon recommendation system attack [CKNFS11]
Data privacy is an active area of research:
• Computer science, economics, statistics, biology, social sciences, …
Differential Privacy [DMNS06, DKMMN06]

Intuition:
• The adversary learns essentially the same thing irrespective of your presence or absence in the data set
• D and D′ are called neighboring data sets if they differ in one individual's entry
• Require: Neighboring data sets induce close distributions on outputs
[Figure: the randomized algorithm M, with its random coins, is run on neighboring data sets D and D′; the two output distributions M(D) and M(D′) are close.]
Differential Privacy [DMNS06, DKMMN06]
Definition:
A randomized algorithm M is (ε, δ)-differentially private if
• for all data sets D and D′ that differ in one element,
• for all sets of answers S:
Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ

• Differential privacy is a condition on the algorithm
• The guarantee is meaningful in the presence of any auxiliary information
• Typically, think of the privacy parameters as: ε a small constant, and δ ≪ 1/n, where n = # of data samples
• Composition: the ε's and δ's add up over multiple executions
Semantics of Differential Privacy
Laplace Mechanism [DMNS06]
Let D be a data set and f a function on data sets
Sensitivity: S(f) = maximum of |f(D) − f(D′)| over all neighboring D, D′
1. Sample a random variable Z from Lap(S(f)/ε)
2. Output f(D) + Z
Theorem (Privacy): The algorithm is ε-differentially private
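A minimal sketch of the Laplace mechanism in Python, assuming a counting query of sensitivity 1; the function name and toy data are illustrative, not from the talk:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value + Lap(sensitivity/epsilon): epsilon-DP."""
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query has sensitivity 1: changing one record moves the
# count by at most one.
data = np.array([0, 1, 1, 0, 1, 1, 1, 0])
true_count = int(data.sum())
print(true_count, laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5))
```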
This Talk
1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Perturbation stability (a.k.a. zero local sensitivity)
Perturbation Stability

[Figure: a function f maps a data set D to an output f(D).]

Stability of f at D: the output does not change on changing any one entry of D
Equivalently, the local sensitivity of f at D is zero
Distance to Instability Property
• Definition: A function f is k-stable at a data set D if for any data set D′ differing from D in at most k entries, f(D′) = f(D)
• Distance to instability: the largest k such that f is k-stable at D
• Objective: Output f(D) while preserving differential privacy

[Figure: the space of all data sets, split into stable and unstable regions; D lies at some distance from the nearest unstable data set.]
Propose-Test-Release (PTR) framework [DL09, KRSY11, Smith T.'13]

A Meta-algorithm: Propose-Test-Release (PTR)

1. d̂ ← distance to instability of f at D, plus Lap(1/ε)
2. If d̂ > log(1/δ)/ε, then return f(D), else return ⊥

Theorem: The algorithm is (ε, δ)-differentially private
Theorem: If f is (log(1/δ) + log(1/β))/ε-stable at D, then w.p. at least 1 − β the algorithm outputs f(D)
Basic tool: Laplace mechanism
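A sketch of the PTR meta-algorithm, under assumptions: `distance_to_instability` must be a sensitivity-1 lower bound on the true distance, and the mode example with its gap-based distance is illustrative rather than from the talk:

```python
import numpy as np
from collections import Counter

def propose_test_release(data, f, distance_to_instability, epsilon, delta,
                         rng=None):
    """PTR sketch: release f(data) only if a noisy lower bound on the
    distance to instability clears the threshold log(1/delta)/epsilon."""
    rng = rng or np.random.default_rng()
    noisy_dist = distance_to_instability(data) + rng.laplace(scale=1.0 / epsilon)
    if noisy_dist > np.log(1.0 / delta) / epsilon:
        return f(data)   # stable instance: safe to release exactly
    return None          # "bottom": refuse to answer

# Illustrative f: the mode of a discrete list. Changing one entry moves one
# vote, shrinking the top-two gap g by at most 2, so (g - 1) // 2 is a
# sensitivity-1 lower bound on the distance to instability.
def mode(data):
    return Counter(data).most_common(1)[0][0]

def mode_distance(data):
    counts = sorted(Counter(data).values(), reverse=True)
    gap = counts[0] - (counts[1] if len(counts) > 1 else 0)
    return max(0, (gap - 1) // 2)

print(propose_test_release(["a"] * 80 + ["b"] * 5, mode, mode_distance,
                           epsilon=1.0, delta=1e-6))
```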
This Talk

1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Sample and aggregate framework [NRS07, Smith11, Smith T.'13]
Sample and Aggregate Framework

[Figure: the data set is split into blocks D_1, …, D_k; the learning algorithm is run on each block, and an aggregator combines the k outputs into the released output.]

Assumption: Each entry appears in at most one data block
Theorem: If the aggregator is differentially private, then the overall framework is differentially private
Proof: Each data entry affects only one data block, and hence only one input to the aggregator
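A skeleton of the framework under the stated assumption (disjoint blocks, so each entry affects one intermediate output); the aggregator passed in must itself be differentially private, e.g., the PTR + report-noisy-max aggregator sketched below:

```python
import numpy as np

def sample_and_aggregate(data, learner, aggregator, num_blocks, rng=None):
    """Skeleton of sample-and-aggregate: split the data into disjoint
    blocks, run the (non-private) learner on each block, and combine the
    k outputs with a differentially private aggregator. Because each entry
    lands in exactly one block, it affects exactly one input to the
    aggregator, and the pipeline inherits the aggregator's guarantee."""
    rng = rng or np.random.default_rng()
    idx = rng.permutation(len(data))
    blocks = np.array_split(idx, num_blocks)
    outputs = [learner(data[block]) for block in blocks]  # non-private runs
    return aggregator(outputs)                            # the only private step
```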
A differentially private aggregator using the PTR framework [Smith T.'13]
Assumption: m discrete possible outputs (candidate models) m_1, m_2, …, m_m

[Figure: blocks D_1, …, D_k each cast a vote for one candidate model; the votes are tallied into a count for each of m_1, …, m_m.]
A Differentially Private Aggregator

Function f: the candidate output with the maximum number of votes

PTR + Report-Noisy-Max Aggregator
1. gap ← (count of the highest-voted model) − (count of the second-highest)
2. If gap + Lap(1/ε) > log(1/δ)/ε, then return the highest-voted model, else return ⊥

Observation: The distance to instability of f is governed by the gap between the counts of the highest and the second-highest scoring model
Observation: The algorithm is always computationally efficient
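A sketch of this aggregator following the two observations above; the threshold calibration is simplified (the gap statistic changes by at most 2 when one block changes, a constant elided here):

```python
import numpy as np
from collections import Counter

def ptr_noisy_max_aggregator(block_outputs, epsilon, delta, rng=None):
    """Vote-based aggregator: each block's output is one vote over the
    discrete candidate set. The distance to instability of the argmax is
    governed by the top-two vote gap, which is cheap to compute, so the
    aggregator is efficient for any number of candidates."""
    rng = rng or np.random.default_rng()
    tally = Counter(block_outputs).most_common()
    best, best_count = tally[0]
    second_count = tally[1][1] if len(tally) > 1 else 0
    gap = best_count - second_count
    # Simplified PTR test on the gap (constant-factor calibration elided).
    if gap + rng.laplace(scale=1.0 / epsilon) > np.log(1.0 / delta) / epsilon:
        return best
    return None

# Toy usage: 50 blocks vote; 45 agree on candidate model "m3".
votes = ["m3"] * 45 + ["m1"] * 3 + ["m7"] * 2
print(ptr_noisy_max_aggregator(votes, epsilon=0.5, delta=1e-6))
```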
Analysis of the aggregator under subsampling stability [Smith T.'13]
Subsampling Stability

[Figure: random subsamples D̂_1, …, D̂_k are drawn from the data set D (with replacement across blocks), and the function f is run on each subsample.]

Stability: f is q-subsampling stable at D if f(D̂) = f(D) w.p. at least 3/4, where the subsample D̂ includes each entry of D independently w.p. q
A Private Aggregator using Subsampling Stability

[Figure: the expected voting histogram over candidate models m_1, m_2, …, m_m; the true answer f(D) receives the large majority of the k votes, and every other model receives a small fraction.]

• D̂_1, …, D̂_k: sample each entry from D into each block w.p. q
• Each entry of D appears in O(qk) data blocks w.h.p.

PTR + Report-Noisy-Max Aggregator
1. gap ← (count of the highest-voted model) − (count of the second-highest)
2. If the noisy gap clears the PTR threshold, return the highest-voted model, else return ⊥
(one entry now affects O(qk) blocks, so the noise and threshold are calibrated to that sensitivity)
Theorem: The above algorithm is (ε, δ)-differentially private
Theorem: If f is q-subsampling stable at D (for a suitable q depending on ε and δ), then with high probability the true answer f(D) is output

Notice: The utility guarantee does not depend on the number of candidate models m
This Talk
1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Sparse linear regression in high-dimensions and the LASSO
Sparse Linear Regression in High-dimensions (p ≫ n)
• Data set: D = {(x_1, y_1), …, (x_n, y_n)}, where x_i ∈ ℝ^p and y_i ∈ ℝ
• Assumption: Data generated by a noisy linear system

y_i = ⟨x_i, θ*⟩ + w_i

(x_i: feature vector; θ* ∈ ℝ^p: parameter vector; w_i: field noise)

Data normalization:
• Each feature vector x_i is bounded
• The noise w_i is sub-Gaussian
In matrix form:

y_{n×1} = X_{n×p} θ*_{p×1} + w_{n×1}

(y: response vector; X: design matrix; θ*: parameter vector; w: field noise)

• Sparsity: θ* has s non-zero entries
• Bounded norm: the norm of θ* is bounded (up to an arbitrarily small constant)
Model selection problem: Find the non-zero coordinates (i.e., the support) of θ*
Solution: The LASSO estimator [Tibshirani94, EFJT03, Wainwright06, CT07, ZY07, …]

LASSO: θ̂(D) ∈ argmin_θ (1/2n) ‖y − Xθ‖₂² + Λ‖θ‖₁
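A non-private illustration of LASSO model selection, using sklearn's `Lasso` (whose `alpha` plays the role of Λ); the dimensions, noise level, and regularization are made-up values:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 500, 5                       # high-dimensional: p >> n
X = rng.standard_normal((n, p))
theta_star = np.zeros(p)
theta_star[:s] = 1.0                        # true support = {0, ..., s-1}
y = X @ theta_star + 0.1 * rng.standard_normal(n)

# sklearn's Lasso minimizes (1/2n)||y - Xw||^2 + alpha*||w||_1, so alpha
# plays the role of Lambda; Lambda ~ sigma*sqrt(log(p)/n) is the classical
# scaling for support recovery.
model = Lasso(alpha=0.05).fit(X, y)
print(sorted(np.flatnonzero(model.coef_)))  # typically recovers [0, 1, 2, 3, 4]
```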
Consistency of the LASSO Estimator

Consistency conditions* [Wainwright06, ZY07]:
• Γ: support of the underlying parameter vector θ*
• Incoherence: the columns of X outside Γ are nearly orthogonal to the columns inside Γ
• Restricted Strong Convexity (RSC): the minimum eigenvalue of (1/n) X_Γᵀ X_Γ is bounded away from zero

Theorem*: Under a proper choice of Λ and n, the support of the LASSO estimator θ̂ equals the support of θ*

Stochastic Consistency of the LASSO

Theorem [Wainwright06, ZY07]: If each data entry is drawn i.i.d. from a suitable distribution (e.g., Gaussian), then the assumptions above are satisfied w.h.p.
We show [Smith, T.'13]:

Consistency conditions ⇒ Proxy conditions (efficiently testable with privacy) ⇒ Perturbation stability
This Talk
1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Interlude: A simple subsampling-based private LASSO algorithm [Smith, T.'13]
Notion of Neighboring Data Sets

[Figure: a data set D = (X, y), i.e., a design matrix together with a response vector; D′ is obtained from D by replacing one row (x_i, y_i) with (x_i′, y_i′).]

D and D′ are neighboring data sets: they differ in exactly one record
Recap: Subsampling Stability

[Figure: random subsamples D̂_1, …, D̂_k drawn from the data set D.]

Stability: f is q-subsampling stable at D if f(D̂) = f(D) w.p. at least 3/4, where D̂ includes each entry of D independently w.p. q
Recap: PTR + Report-Noisy-Max Aggregator

Assumption: All candidate models come from a discrete set {m_1, m_2, …, m_m}

[Figure: blocks D̂_1, …, D̂_k each vote for the model recovered from their subsample; the votes are tallied over the candidate models.]

• D̂_1, …, D̂_k: sample each entry from D into each block w.p. q
• Each entry of D appears in O(qk) data blocks w.h.p.
• Fix q and k appropriately (as functions of ε and δ)
1. gap ← (count of the highest-voted model) − (count of the second-highest)
2. If the noisy gap clears the PTR threshold, return the highest-voted model, else return ⊥
Subsampling Stability of the LASSO

Stochastic assumptions: Each data entry is drawn i.i.d. from a suitable distribution; the noise w is sub-Gaussian

y_{n×1} = X_{n×p} θ*_{p×1} + w_{n×1}

Theorem [Wainwright06, ZY07]: Under a proper choice of Λ and n, the support of the LASSO estimator equals the support of θ*

Theorem: Under a proper choice of Λ, n, and the subsampling rate q, the output of the private aggregator equals the support of θ* w.h.p.

Notice the gap between the sample sizes n required by the two theorems, and the corresponding scale of Λ
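A sketch of the interlude's subsampling-based private LASSO, combining the pieces above; the calibration of q, k, Λ, and the PTR threshold is simplified and illustrative rather than the exact calibration from [Smith, T.'13]:

```python
import numpy as np
from sklearn.linear_model import Lasso

def private_lasso_support_subsampling(X, y, k, q, lam, epsilon, delta,
                                      rng=None):
    """Interlude algorithm, sketched: run the LASSO on k random subsamples
    (each row kept w.p. q), let each block vote for a support set, and pass
    the votes through a simplified PTR + noisy-max test."""
    rng = rng or np.random.default_rng()
    n = X.shape[0]
    votes = {}
    for _ in range(k):
        mask = rng.random(n) < q                    # subsample rows w.p. q
        if mask.sum() < 2:
            continue
        coef = Lasso(alpha=lam).fit(X[mask], y[mask]).coef_
        supp = tuple(np.flatnonzero(coef))          # this block's vote
        votes[supp] = votes.get(supp, 0) + 1
    ranked = sorted(votes.items(), key=lambda kv: -kv[1])
    gap = ranked[0][1] - (ranked[1][1] if len(ranked) > 1 else 0)
    if gap + rng.laplace(scale=1.0 / epsilon) > np.log(1.0 / delta) / epsilon:
        return ranked[0][0]
    return None
```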
Perturbation-stability-based private LASSO and optimal sample complexity [Smith, T.'13]
Recap: Distance to Instability Property

• Definition: A function f is k-stable at a data set D if for any data set D′ differing from D in at most k entries, f(D′) = f(D)
• Distance to instability: the largest k such that f is k-stable at D
• Objective: Output f(D) while preserving differential privacy

[Figure: the space of all data sets, split into stable and unstable regions; D lies at some distance from the nearest unstable data set.]

Recap: Propose-Test-Release Framework (PTR)

1. d̂ ← distance to instability of f at D, plus Lap(1/ε)
2. If d̂ > log(1/δ)/ε, then return f(D), else return ⊥

Theorem: The algorithm is (ε, δ)-differentially private
Theorem: If f is (log(1/δ) + log(1/β))/ε-stable at D, then w.p. at least 1 − β the algorithm outputs f(D)

TBD: Some query with global sensitivity one that serves as a proxy for the distance to instability
Instantiation of PTR for the LASSO

LASSO: θ̂(D) ∈ argmin_θ (1/2n) ‖y − Xθ‖₂² + Λ‖θ‖₁
• Set the function f(D) = support of θ̂(D)
• Issue: For this f, the distance to instability might not be efficiently computable

From [Smith, T.'13]: Consistency conditions ⇒ Proxy conditions (efficiently testable with privacy) ⇒ Perturbation stability
This talk: Consistency conditions ⇒ Proxy conditions (efficiently testable with privacy) ⇒ Perturbation stability
Perturbation Stability of the LASSO

Theorem: The consistency conditions for the LASSO are sufficient for perturbation stability

Proof sketch:
1. Analyze the Karush-Kuhn-Tucker (KKT) optimality conditions at θ̂(D)
2. Show that support(θ̂) is stable using a "dual certificate" argument on stable instances
Proof sketch (continued): subgradient optimality of the LASSO objective J_D(θ) = (1/2n) ‖y − Xθ‖₂² + Λ‖θ‖₁
• On D: 0 ∈ ∂_θ J_D(θ̂)
• On a neighboring D′: 0 ∈ ∂_θ J_{D′}(θ̂′)

Argue using the optimality conditions of θ̂ and θ̂′:
1. No zero coordinate of θ̂ becomes non-zero in θ̂′ (use the mutual incoherence condition)
2. No non-zero coordinate of θ̂ becomes zero in θ̂′ (use the restricted strong convexity condition)
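A small numeric check of the KKT conditions underlying this sketch, assuming sklearn's parameterization (objective (1/2n)‖y − Xθ‖₂² + α‖θ‖₁): on the support the gradient of the least-squared loss equals Λ·sign(θ̂), and off the support it is at most Λ in magnitude; the data here is synthetic:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 100, 20
X = rng.standard_normal((n, p))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(n)

lam = 0.1
theta = Lasso(alpha=lam).fit(X, y).coef_
grad = X.T @ (y - X @ theta) / n          # gradient of the least-squared loss

on = np.flatnonzero(theta)                 # support Gamma
off = np.flatnonzero(theta == 0)           # complement Gamma^c
# KKT stationarity, up to solver tolerance:
print(np.allclose(grad[on], lam * np.sign(theta[on]), atol=1e-2))  # = Lambda*sign
print(np.all(np.abs(grad[off]) <= lam + 1e-3))                     # <= Lambda
```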
Perturbation Stability Test for the LASSO

Γ: support of θ̂; Γ^c: complement of the support of θ̂

Test for the following (the real test is more complex):
• Restricted Strong Convexity (RSC): the minimum eigenvalue of (1/n) X_Γᵀ X_Γ is bounded away from zero
• Strong stability: on Γ^c, the absolute coordinates of the gradient of the least-squared loss are below Λ with a large margin
Intuition: Strong convexity ensures supp(θ̂′) ⊇ supp(θ̂)
1. Strong convexity ensures ‖θ̂′ − θ̂‖ is small
2. If the smallest non-zero coordinate of θ̂ is large, then no coordinate in Γ can be driven to zero
3. The consistency conditions imply the smallest non-zero coordinate of θ̂ is large
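A simplified version of the two-part test in numpy (the real test in [Smith, T.'13] is more complex); the thresholds are left as parameters and assumed to come from the consistency conditions:

```python
import numpy as np

def stability_test(X, y, theta_hat, lam, eig_threshold, margin):
    """Simplified two-part perturbation-stability test for a LASSO
    solution. Assumes a non-empty support. Returns True iff both proxy
    conditions hold."""
    n = X.shape[0]
    supp = np.flatnonzero(theta_hat)
    comp = np.flatnonzero(theta_hat == 0)
    # (a) Restricted strong convexity on the support.
    min_eig = np.linalg.eigvalsh(X[:, supp].T @ X[:, supp] / n)[0]
    rsc_ok = min_eig >= eig_threshold
    # (b) Strong stability off the support: gradient coordinates of the
    #     least-squared loss stay below Lambda with a margin, so no zero
    #     coordinate can activate under a one-record perturbation.
    grad_off = np.abs(X[:, comp].T @ (y - X @ theta_hat)) / n
    stable_ok = np.all(grad_off <= lam - margin)
    return bool(rsc_ok and stable_ok)
```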
Geometry of the Stability of LASSO

[Figure: the LASSO objective along a coordinate in Γ^c of θ̂; the ℓ₁ penalty contributes slopes +Λ and −Λ on either side of zero, creating a kink at zero.]

Intuition: Strong stability ensures that no zero coordinate of θ̂ becomes non-zero in θ̂′
• For the minimizer to move along a coordinate j ∈ Γ^c, the perturbation to the gradient of the least-squared loss has to be large
• Gradient of the least-squared loss along coordinate j: −X_jᵀ (y − X θ̂)
• Strong stability: for every j ∈ Γ^c with |X_jᵀ (y − X θ̂)| well below Λ, the coordinate θ̂_j = 0 has a subgradient of zero for the LASSO objective
Test for Restricted Strong Convexity: check that the minimum eigenvalue of (1/n) X_Γᵀ X_Γ clears a threshold
Test for strong stability: check that max over j ∈ Γ^c of |X_jᵀ (y − X θ̂)| is below Λ with a margin
Issue: Computed naively, these two statistics can have large sensitivity
Our solution: A proxy distance that has global sensitivity of one
Making the Stability Test Private (Simplified)

• d_1: the margin by which the RSC test passes; d_2: the margin by which the strong-stability test passes
• On stable instances, d_1 and d_2 are both large and insensitive
1. Compute the proxy distance d̂ as a function of d_1 and d_2
2. If d̂ + Lap(1/ε) > log(1/δ)/ε, then return support(θ̂(D)), else return ⊥
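A sketch of the simplified private test; the discretization units and the claim that min(d_1, d_2) behaves as a sensitivity-one proxy are assumptions standing in for the exact construction in [Smith, T.'13]:

```python
import numpy as np

def private_support_release(X, y, theta_hat, lam, epsilon, delta,
                            eig_unit, margin_unit, rng=None):
    """Simplified private stability test: d1 and d2 discretize the margins
    by which the RSC and strong-stability tests pass; their minimum is used
    as a proxy distance, treated here as having global sensitivity one.
    All units and thresholds are illustrative."""
    rng = rng or np.random.default_rng()
    n = X.shape[0]
    supp = np.flatnonzero(theta_hat)
    comp = np.flatnonzero(theta_hat == 0)
    min_eig = np.linalg.eigvalsh(X[:, supp].T @ X[:, supp] / n)[0]
    d1 = np.floor(min_eig / eig_unit)                    # RSC margin, in units
    grad_off = np.abs(X[:, comp].T @ (y - X @ theta_hat)) / n
    d2 = np.floor((lam - grad_off.max()) / margin_unit)  # stability margin
    d_hat = min(d1, d2)                                  # proxy distance
    if d_hat + rng.laplace(scale=1.0 / epsilon) > np.log(1.0 / delta) / epsilon:
        return np.sort(supp)                             # release the support
    return None
```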
Private Model Selection with Optimal Sample Complexity
Theorem: The algorithm is (ε, δ)-differentially private
Theorem: Under the consistency conditions and a proper choice of Λ and n, w.h.p. the support of θ* is output

Nearly optimal sample complexity: the required n matches the non-private LASSO up to logarithmic factors
Thesis: Stable algorithms yield differentially private algorithms
Two notions of stability:
1. Perturbation stability
2. Subsampling stability
This Talk
1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Concluding Remarks
1. The sample and aggregate framework with the PTR + report-noisy-max aggregator is a generic tool for designing private learning algorithms
• Example: learning with non-convex models [Bilenko, Dwork, Rothblum, T.]
2. The propose-test-release framework is an interesting tool whenever one can compute the distance to instability efficiently
3. Open problem: Private high-dimensional learning without assumptions like incoherence and restricted strong convexity