From Stability to Differential Privacy
Abhradeep Guha Thakurta
Yahoo! Labs, Sunnyvale
Thesis: Stable algorithms yield differentially private algorithms
Differential privacy: A short tutorial
Privacy in Machine Learning Systems

[Figure: individuals $d_1, d_2, \ldots, d_{n-1}, d_n$ each contribute a record to a trusted learning algorithm, which releases summary statistics to users: 1. classifiers, 2. clusters, 3. regression coefficients. An attacker also observes the released statistics.]
Privacy in Machine Learning Systems

[Figure: individuals $d_1, \ldots, d_n$ feed a learning algorithm, whose output is released to users]

Two conflicting goals:
1. Utility: Release accurate information
2. Privacy: Protect the privacy of individual entries

Balancing the tradeoff is a difficult problem:
1. Netflix Prize database attack [NS08]
2. Facebook advertisement system attack [Korolova11]
3. Amazon recommendation system attack [CKNFS11]

Data privacy is an active area of research:
• Computer science, economics, statistics, biology, social sciences, …
Differential Privacy [DMNS06, DKMMN06]

Intuition:
• An adversary learns essentially the same thing irrespective of your presence or absence in the data set
• Data sets $D$ and $D'$ differing in one entry are called neighboring data sets
• Requirement: neighboring data sets induce close distributions on outputs

[Figure: randomized algorithm M (with its random coins) is run on data set $D$ and on a neighboring data set $D'$; the output distributions $M(D)$ and $M(D')$ are close]
Differential Privacy [DMNS06, DKMMN06]

Definition: A randomized algorithm M is $(\epsilon, \delta)$-differentially private if
• for all data sets $D$ and $D'$ that differ in one element, and
• for all sets of answers $S$:
$$\Pr[M(D) \in S] \le e^{\epsilon} \Pr[M(D') \in S] + \delta$$

Semantics of Differential Privacy
• Differential privacy is a condition on the algorithm
• The guarantee is meaningful in the presence of any auxiliary information
• Typically, think of the privacy parameters as $\epsilon$ a small constant and $\delta \ll 1/n$, where $n$ = # of data samples
• Composition: the $\epsilon$'s and $\delta$'s add up over multiple executions
Laplace Mechanism [DMNS06]

Data set $D \in \mathcal{U}^n$, and let $f: \mathcal{U}^n \to \mathbb{R}$ be a function on $D$.

Sensitivity: $S(f) = \max_{\text{neighbors } D, D'} |f(D) - f(D')|$

1. Sample a random variable $Z$ from $\mathrm{Lap}(S(f)/\epsilon)$
2. Output $f(D) + Z$

Theorem (Privacy): The algorithm is $\epsilon$-differentially private.
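To make the mechanism concrete, here is a minimal Python sketch (numpy only); the counting query and the synthetic data are illustrative assumptions, not part of the talk.

```python
import numpy as np

def laplace_mechanism(data, f, sensitivity, epsilon, rng=None):
    """Release f(data) + Lap(sensitivity / epsilon) noise."""
    rng = rng or np.random.default_rng()
    return f(data) + rng.laplace(scale=sensitivity / epsilon)

# Illustrative query: "how many entries exceed 0.5?" Changing one entry
# changes the count by at most 1, so its sensitivity S(f) is 1.
data = np.random.default_rng(0).random(1000)
count_above_half = lambda d: float(np.sum(d > 0.5))
print(laplace_mechanism(data, count_above_half, sensitivity=1.0, epsilon=0.1))
```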
This Talk
1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Perturbation stability (a.k.a. zero local sensitivity)
Perturbation Stability

[Figure: function $f$ applied to data set $D$ yields output $f(D)$]

Stability of $f$ at $D$: the output does not change on changing any one entry of $D$.
Equivalently, the local sensitivity of $f$ at $D$ is zero.
Distance to Instability Property

• Definition: A function $f$ is $k$-stable at a data set $D$ if for any data set $D'$ with $|D \triangle D'| \le k$, $f(D') = f(D)$
• Distance to instability: the number of entries of $D$ that must be changed before the output of $f$ changes
• Objective: Output $f(D)$ while preserving differential privacy

[Figure: the space of all data sets, split into stable and unstable data sets; $D$ lies in the stable region, at some distance from the unstable set]
Propose-Test-Release (PTR) framework [DL09, KRSY11, Smith T.'13]

A Meta-algorithm: Propose-Test-Release (PTR)

1. Compute a noisy distance to instability: $\hat{d} \leftarrow \mathrm{dist}_f(D) + \mathrm{Lap}(1/\epsilon)$
2. If $\hat{d} > \log(1/\delta)/\epsilon$, then return $f(D)$, else return $\bot$

Theorem: The algorithm is $(\epsilon, \delta)$-differentially private.

Theorem: If $f$ is $\Omega(\log(1/\delta)/\epsilon)$-stable at $D$, then w.p. $\ge 1 - \delta$ the algorithm outputs $f(D)$.
Basic tool: Laplace mechanism
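A minimal Python sketch of PTR, assuming the caller supplies a `dist_to_instability` function with global sensitivity one (computing it efficiently is the hard part, as the talk discusses later); the threshold follows the reconstruction above.

```python
import numpy as np

def propose_test_release(data, f, dist_to_instability, epsilon, delta,
                         rng=None):
    """Release f(data) only if data is (noisily) far from instability.

    dist_to_instability(data): how many entries must change before the
    output of f changes; assumed to have global sensitivity one.
    Returns f(data), or None standing in for "bottom".
    """
    rng = rng or np.random.default_rng()
    noisy_dist = dist_to_instability(data) + rng.laplace(scale=1.0 / epsilon)
    if noisy_dist > np.log(1.0 / delta) / epsilon:
        return f(data)  # f is stable in a large neighborhood of data
    return None         # refuse to answer
```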
This Talk

1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Sample and aggregate framework [NRS07, Smith11, Smith T.'13]
Sample and Aggregate Framework

[Figure: the data set $D$ is split into blocks $D_1, \ldots, D_m$ by subsampling; the learning algorithm runs on each block, and an aggregator combines the $m$ outputs into the final output]
Sample and Aggregate Framework

Theorem: If the aggregator is $(\epsilon, \delta)$-differentially private, then the overall framework is $(\epsilon, \delta)$-differentially private.

Assumption: Each entry appears in at most one data block.

Proof idea: Each data entry affects only one data block.
A differentially private aggregator using the PTR framework [Smith T.'13]

Assumption: $r$ discrete possible outputs (candidate models) $S_1, \ldots, S_r$

[Figure: each block $D_1, \ldots, D_m$ votes for one candidate model; a histogram of vote counts over $S_1, S_2, \ldots, S^*, \ldots, S_r$, with $S^*$ the top-voted model]

A Differentially Private Aggregator

Function $f(D)$: the candidate output with the maximum number of votes

PTR + Report-Noisy-Max Aggregator
1. Let $\mathrm{dist}(D)$ be the gap between the vote counts of the highest and second-highest scoring models
2. If $\mathrm{dist}(D) + \mathrm{Lap}(1/\epsilon) > \log(1/\delta)/\epsilon$, then return $S^*$, else return $\bot$

Observation: $\mathrm{dist}(D)$, the gap between the counts of the highest and the second-highest scoring model, acts as a proxy for the distance to instability.
Observation: The algorithm is always computationally efficient.
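A Python sketch of this aggregator, given one vote per block. The noise scale $2/\epsilon$ reflects that changing one data entry changes at most one vote and so moves the gap by at most 2; the exact scaling in [Smith T.'13] differs.

```python
import numpy as np
from collections import Counter

def ptr_noisy_max(votes, epsilon, delta, rng=None):
    """Release the top-voted model only if its lead over the runner-up
    survives Laplace noise; otherwise return None ("bottom")."""
    rng = rng or np.random.default_rng()
    ranked = Counter(votes).most_common()
    winner, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # One changed entry flips at most one vote, so the gap moves by <= 2.
    noisy_gap = (top - runner_up) + rng.laplace(scale=2.0 / epsilon)
    if noisy_gap > np.log(1.0 / delta) / epsilon:
        return winner
    return None
```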
Analysis of the aggregator under subsampling stability [Smith T.’13]
Subsampling Stability

[Figure: blocks $D_1, \ldots, D_m$ are drawn from the data set $D$ by random subsampling with replacement, and the function $f$ is applied to each block]

Stability: $f$ is $q$-subsampling stable at $D$ if $f(\hat{D}) = f(D)$ w.p. $\ge 3/4$, where $\hat{D}$ includes each entry of $D$ independently w.p. $q$.
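The definition can be checked empirically by Monte Carlo; a small sketch (the trial count and the use of numpy arrays are incidental choices, and `f` is assumed to tolerate small subsamples).

```python
import numpy as np

def estimate_subsampling_stability(data, f, q, trials=200, rng=None):
    """Estimate Pr[f(subsample) == f(data)], where the subsample keeps
    each entry independently w.p. q; q-subsampling stability asks for
    this probability to be at least 3/4."""
    rng = rng or np.random.default_rng()
    data = np.asarray(data)
    target = f(data)
    hits = sum(
        1 for _ in range(trials)
        if f(data[rng.random(len(data)) < q]) == target
    )
    return hits / trials
```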
A Private Aggregator using Subsampling Stability

[Figure: voting histogram (in expectation) over $S_1, S_2, \ldots, S^*, \ldots, S_r$; by subsampling stability, $S^*$ receives at least $\frac{3}{4}m$ votes in expectation, and all other candidates together at most $\frac{1}{4}m$]

• $D_i$: sample each entry from $D$ w.p. $q$
• Each entry of $D$ appears in about $qm$ data blocks in expectation
PTR + Report-Noisy-Max Aggregator

• $D_i$: sample each entry from $D$ w.p. $q$
• Each entry of $D$ appears in $O(qm)$ data blocks w.h.p.

1. Let $\mathrm{dist}(D)$ be the gap between the top two vote counts over $S_1, S_2, \ldots, S^*, \ldots, S_r$
2. If $\mathrm{dist}(D) + \mathrm{Lap}(1/\epsilon) > \log(1/\delta)/\epsilon$, then return $S^*$, else return $\bot$

Theorem: The above algorithm is $(\epsilon, \delta)$-differentially private.

Theorem: If $f$ is $q$-subsampling stable at $D$ with $q = O(\epsilon/\log(1/\delta))$, then with probability at least $1 - \delta$, the true answer $f(D)$ is output.
A Private Aggregator using Subsampling Stability
Notice: Utility guarantee does not depend on the number of candidate models
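Putting the pieces together: a self-contained sketch of subsample-and-aggregate model selection for the LASSO. It uses disjoint blocks rather than with-replacement subsampling for simplicity; the block count `m`, the regularization `lam`, and the thresholds are illustrative assumptions, and sklearn's `Lasso` stands in for the estimator.

```python
import numpy as np
from collections import Counter
from sklearn.linear_model import Lasso

def lasso_support(X, y, lam):
    """Support of the LASSO estimate, as a hashable tuple of indices."""
    theta = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    return tuple(np.flatnonzero(np.abs(theta) > 1e-8))

def private_support_selection(X, y, epsilon, delta, m=32, lam=0.1, rng=None):
    """Run the LASSO on m disjoint blocks, vote over supports, and
    release the winner via the PTR + report-noisy-max aggregator."""
    rng = rng or np.random.default_rng()
    blocks = np.array_split(rng.permutation(X.shape[0]), m)
    votes = Counter(lasso_support(X[b], y[b], lam) for b in blocks)
    ranked = votes.most_common()
    gap = ranked[0][1] - (ranked[1][1] if len(ranked) > 1 else 0)
    # Each entry sits in exactly one block, so one changed entry flips
    # at most one vote and the gap moves by at most 2.
    if gap + rng.laplace(scale=2.0 / epsilon) > np.log(1.0 / delta) / epsilon:
        return ranked[0][0]
    return None
```

A call such as `private_support_selection(X, y, epsilon=1.0, delta=1e-6)` either returns the winning support or `None` ($\bot$), mirroring the utility guarantee above: the answer never depends on how many candidate supports were voted on.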
This Talk
1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Sparse linear regression in high-dimensions and the LASSO
Sparse Linear Regression in High-dimensions ($p \gg n$)

• Data set: $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$
• Assumption: Data generated by a noisy linear system

$$y_i = \langle x_i, \theta^* \rangle + w_i$$

where $x_i$ is the feature vector, $\theta^* \in \mathbb{R}^{p \times 1}$ is the parameter vector, and $w_i$ is the field noise.

Data normalization:
• Each $x_i$ is sub-Gaussian
Sparse Linear Regression in High-dimensions ($p \gg n$)

$$y_{n \times 1} = X_{n \times p}\, \theta^*_{p \times 1} + w_{n \times 1}$$

(response vector = design matrix × parameter vector + field noise)

• Sparsity: $\theta^*$ has $s$ non-zero entries
• Bounded norm: $\|\theta^*\|$ is bounded (up to an arbitrarily small constant)

Model selection problem: Find the non-zero coordinates of $\theta^*$
Sparse Linear Regression in High-dimensions ($p \gg n$)

$$y_{n \times 1} = X_{n \times p}\, \theta^*_{p \times 1} + w_{n \times 1}$$

Model selection: Find the non-zero coordinates (the support) of $\theta^*$

Solution: LASSO estimator [Tibshirani94, EFJT03, Wainwright06, CT07, ZY07, …]

$$\hat{\theta} \in \arg\min_{\theta \in \mathbb{R}^p} \frac{1}{2n} \|y - X\theta\|_2^2 + \Lambda \|\theta\|_1$$
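A non-private baseline in Python: generate data from the noisy linear model above and recover $\mathrm{supp}(\theta^*)$ with sklearn's `Lasso`. The dimensions, noise level, and $\Lambda$ scaling here are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 1000, 5                    # high-dimensional: p >> n
X = rng.standard_normal((n, p))           # random Gaussian design
theta_star = np.zeros(p)
theta_star[:s] = 1.0                      # s-sparse parameter vector
y = X @ theta_star + 0.1 * rng.standard_normal(n)  # field noise

# Lambda on the order of sigma * sqrt(log p / n) is the classical
# scaling for support recovery.
lam = 2 * 0.1 * np.sqrt(np.log(p) / n)
theta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
print("recovered support:", np.flatnonzero(np.abs(theta_hat) > 1e-8))
```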
Consistency of the LASSO Estimator

Consistency conditions* [Wainwright06, ZY07]:
• $\Gamma$: support of the underlying parameter vector $\theta^*$; $X_\Gamma$, $X_{\Gamma^c}$: columns of $X$ inside and outside $\Gamma$
• Incoherence: the columns of $X_{\Gamma^c}$ are nearly orthogonal to those of $X_\Gamma$, i.e. $\|X_{\Gamma^c}^\top X_\Gamma (X_\Gamma^\top X_\Gamma)^{-1}\|_\infty \le 1 - \gamma$ for a constant $\gamma > 0$
• Restricted Strong Convexity (RSC): the minimum eigenvalue of $\frac{1}{n} X_\Gamma^\top X_\Gamma$ is bounded away from zero

Theorem*: Under proper choice of $n$ and $\Lambda$, the support of the LASSO estimator $\hat{\theta}$ equals the support of $\theta^*$.
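A numpy sketch that checks the two conditions for a given design matrix and support; the threshold $\gamma$ and the eigenvalue bound are illustrative assumptions.

```python
import numpy as np

def consistency_conditions_hold(X, support, gamma=0.1, eig_bound=0.25):
    """Check mutual incoherence and RSC for design X and a support set."""
    n, p = X.shape
    S = np.asarray(support)
    Sc = np.setdiff1d(np.arange(p), S)
    XS, XSc = X[:, S], X[:, Sc]
    # Incoherence: || X_{Sc}^T X_S (X_S^T X_S)^{-1} ||_inf <= 1 - gamma
    incoherence = np.abs(XSc.T @ XS @ np.linalg.inv(XS.T @ XS)).sum(axis=1).max()
    # RSC: minimum eigenvalue of (1/n) X_S^T X_S bounded below
    min_eig = np.linalg.eigvalsh(XS.T @ XS / n).min()
    return incoherence <= 1 - gamma and min_eig >= eig_bound

# Random Gaussian designs satisfy both conditions w.h.p. (illustration):
X = np.random.default_rng(1).standard_normal((200, 50))
print(consistency_conditions_hold(X, support=[0, 1, 2]))
```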
Stochastic Consistency of the LASSO

Theorem [Wainwright06, ZY07]: If each data entry is drawn i.i.d. from a suitably well-conditioned (e.g., sub-Gaussian) distribution, then the assumptions above are satisfied w.h.p.
We show [Smith, T.'13]:

Consistency conditions ⇒ Proxy conditions (efficiently testable with privacy) ⇒ Perturbation stability
This Talk
1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Interlude: A simple subsampling-based private LASSO algorithm [Smith, T.'13]
Notion of Neighboring Data Sets

[Figure: a data set $D$ = ($n \times p$ design matrix, $n \times 1$ response vector), with one highlighted row $(x_i, y_i)$]

[Figure: the data set $D'$, identical except that the row $(x_i, y_i)$ is replaced by $(x_i', y_i')$]

$D$ and $D'$ are neighboring data sets.
Recap: Subsampling Stability

[Figure: blocks $D_1, \ldots, D_m$ are drawn from $D$ by random subsampling with replacement, and the function $f$ is applied to each block]

Stability: $f$ is $q$-subsampling stable at $D$ if $f(\hat{D}) = f(D)$ w.p. $\ge 3/4$, where $\hat{D}$ includes each entry of $D$ independently w.p. $q$.
Recap: PTR + Report-Noisy-Max Aggregator

Assumption: All candidate models come from a discrete set $S_1, S_2, \ldots, S_k$

[Figure: $f$ is applied to each block $D_1, \ldots, D_m$, and each block votes for one candidate model; a histogram of vote counts, with $S^*$ the top-voted model]

• $D_i$: sample each entry from $D$ w.p. $q$
• Each entry of $D$ appears in $O(qm)$ data blocks w.h.p.
• Fix the number of blocks $m$

1. Let $\mathrm{dist}(D)$ be the gap between the top two vote counts over $S_1, S_2, \ldots, S^*, \ldots, S_r$
2. If $\mathrm{dist}(D) + \mathrm{Lap}(1/\epsilon) > \log(1/\delta)/\epsilon$, then return $S^*$, else return $\bot$
Subsampling Stability of the LASSO

Stochastic assumptions: each data entry is drawn i.i.d. from a suitably well-conditioned (e.g., sub-Gaussian) distribution; the noise $w$ is sub-Gaussian.

$$y_{n \times 1} = X_{n \times p}\, \theta^*_{p \times 1} + w_{n \times 1}$$

(response vector = design matrix × parameter vector + field noise)

Theorem [Wainwright06, ZY07]: Under proper choice of $n$ and $\Lambda$, the support of the LASSO estimator equals the support of $\theta^*$.

Theorem: Under proper choice of $n$, $\Lambda$, and $q$, the output of the aggregator equals the support of $\theta^*$.

Notice the gap in the required sample size $n$ (and in the scale of $\Lambda$) compared to the non-private LASSO.
Perturbation-stability-based private LASSO and optimal sample complexity [Smith, T.'13]
Recap: Distance to Instability Property

• Definition: A function $f$ is $k$-stable at a data set $D$ if for any data set $D'$ with $|D \triangle D'| \le k$, $f(D') = f(D)$
• Distance to instability: the number of entries of $D$ that must be changed before the output of $f$ changes
• Objective: Output $f(D)$ while preserving differential privacy

[Figure: the space of all data sets, split into stable and unstable data sets; $D$ lies in the stable region, at some distance from the unstable set]

Recap: Propose-Test-Release Framework (PTR)

1. Compute a noisy distance to instability: $\hat{d} \leftarrow \mathrm{dist}_f(D) + \mathrm{Lap}(1/\epsilon)$
2. If $\hat{d} > \log(1/\delta)/\epsilon$, then return $f(D)$, else return $\bot$

Theorem: The algorithm is $(\epsilon, \delta)$-differentially private.

Theorem: If $f$ is $\Omega(\log(1/\delta)/\epsilon)$-stable at $D$, then w.p. $\ge 1 - \delta$ the algorithm outputs $f(D)$.
Needed: a query with global sensitivity one, standing in for the distance to instability.
Instantiation of PTR for the LASSO

LASSO: $\hat{\theta} \in \arg\min_{\theta} \frac{1}{2n} \|y - X\theta\|_2^2 + \Lambda \|\theta\|_1$

• Set the function $f(D)$ = support of $\hat{\theta}$
• Issue: For this $f$, the distance to instability might not be efficiently computable
From [Smith, T.'13], and the roadmap for this part of the talk:

Consistency conditions ⇒ Proxy conditions (efficiently testable with privacy) ⇒ Perturbation stability
Perturbation Stability of the LASSO

LASSO: $\hat{\theta} \in \arg\min_{\theta} J_D(\theta)$, where $J_D(\theta) = \frac{1}{2n} \|y - X\theta\|_2^2 + \Lambda \|\theta\|_1$

Theorem: The consistency conditions on the LASSO are sufficient for perturbation stability.

Proof sketch:
1. Analyze the Karush-Kuhn-Tucker (KKT) optimality conditions at the optima: $0 \in \partial J_D(\hat{\theta})$ for the objective on $D$, and $0 \in \partial J_{D'}(\hat{\theta}')$ for the objective on the neighboring data set $D'$
2. Show that $\mathrm{supp}(\hat{\theta})$ is stable via a "dual certificate" on stable instances

Argue using the optimality conditions of $\hat{\theta}$ and $\hat{\theta}'$:
1. No zero coordinate of $\hat{\theta}$ becomes non-zero in $\hat{\theta}'$ (use the mutual incoherence condition)
2. No non-zero coordinate of $\hat{\theta}$ becomes zero in $\hat{\theta}'$ (use the restricted strong convexity condition)
Perturbation Stability Test for the LASSO

$\Gamma$: support of $\hat{\theta}$; $\Gamma^c$: complement of the support of $\hat{\theta}$

Test for the following (the real test is more complex):
• Restricted Strong Convexity (RSC): the minimum eigenvalue of $\frac{1}{n} X_\Gamma^\top X_\Gamma$ is bounded away from zero
• Strong stability: the (absolute) coordinates of the gradient of the least-squared loss in $\Gamma^c$ stay well below $\Lambda$

Intuition: strong convexity ensures $\mathrm{supp}(\hat{\theta}) \subseteq \mathrm{supp}(\hat{\theta}')$:
1. Strong convexity ensures $\|\hat{\theta} - \hat{\theta}'\|$ is small
2. If the minimum non-zero coordinate of $\hat{\theta}$ is large, no non-zero coordinate can become zero
3. The consistency conditions imply the minimum non-zero coordinate is large
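The simplified test translates directly into numpy; `eig_bound` and the margin fraction are illustrative thresholds, and the real test in [Smith, T.'13] is more complex.

```python
import numpy as np

def lasso_stability_test(X, y, theta_hat, lam, eig_bound=0.25,
                         margin_frac=0.5):
    """Simplified perturbation-stability test for a LASSO solution:
    (1) RSC on the support and (2) strong stability, i.e., the
    least-squares gradient on the complement stays well below lam."""
    n, p = X.shape
    S = np.flatnonzero(np.abs(theta_hat) > 1e-8)
    if S.size == 0:
        return False
    Sc = np.setdiff1d(np.arange(p), S)
    min_eig = np.linalg.eigvalsh(X[:, S].T @ X[:, S] / n).min()
    grad = -X.T @ (y - X @ theta_hat) / n   # gradient of least-squares loss
    margin = lam - np.abs(grad[Sc]).max()
    return min_eig >= eig_bound and margin >= margin_frac * lam
```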
Geometry of the Stability of LASSO

[Figure: the LASSO objective along a coordinate in $\Gamma^c$ (dimension 1 and dimension 2 in $\Gamma^c$), plotted around the minimizer $\hat{\theta}$; the $\ell_1$ penalty contributes a kink at zero with slopes $+\Lambda$ and $-\Lambda$]

Intuition: strong stability ensures no zero coordinate in $\hat{\theta}$ becomes non-zero in $\hat{\theta}'$:
• For the minimizer to move along $\Gamma^c$, the perturbation to the gradient of the least-squared loss has to be large
Geometry of the Stability of LASSO

Gradient of the least-squared loss: $-X^\top (y - X\hat{\theta}) = (a_1, \ldots, a_p)$, with coordinates split between $\Gamma$ and $\Gamma^c$

• Strong stability: $|a_i| < \Lambda$ for all $i \in \Gamma^c$, so $\hat{\theta}$ has a sub-gradient of zero for LASSO($D'$)

[Figure: the LASSO objective along a coordinate in $\Gamma^c$ around $\hat{\theta}$, with slopes $+\Lambda$ and $-\Lambda$ from the $\ell_1$ penalty]
Making the Stability Test Private (Simplified)

• Test for Restricted Strong Convexity: $g_1$ = minimum eigenvalue of $\frac{1}{n} X_\Gamma^\top X_\Gamma$
• Test for strong stability: $g_2$ = margin by which the gradient coordinates in $\Gamma^c$ stay below $\Lambda$
• Issue: tested naively, the sensitivities of these statistics can be large
• Our solution: a proxy distance $\hat{d}$, a function of $g_1$ and $g_2$, which has global sensitivity of one
• When the consistency conditions hold, $g_1$ and $g_2$ are both large and insensitive

1. Compute $\hat{d}$ = function of $g_1$ and $g_2$
2. If $\hat{d} + \mathrm{Lap}(1/\epsilon) > \log(1/\delta)/\epsilon$, then return $\mathrm{supp}(\hat{\theta})$, else return $\bot$
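A sketch of the private test, under the loudly assumed Lipschitz property that one changed entry moves each statistic by at most `lip`; flooring then yields a proxy with global sensitivity one. The actual construction in [Smith, T.'13] is more refined.

```python
import numpy as np

def private_stability_check(g1, g2, lip, epsilon, delta, rng=None):
    """PTR on a proxy distance built from the test statistics g1, g2.

    Assumption: changing one entry moves g1 and g2 by at most lip, so
    d = floor(min(g1, g2) / lip) changes by at most 1 between
    neighboring data sets, i.e., it has global sensitivity one."""
    rng = rng or np.random.default_rng()
    d = np.floor(min(g1, g2) / lip)
    noisy = d + rng.laplace(scale=1.0 / epsilon)
    return noisy > np.log(1.0 / delta) / epsilon
```

If the check returns True, release $\mathrm{supp}(\hat{\theta})$; otherwise output $\bot$.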
Private Model Selection with Optimal Sample Complexity

Theorem: The algorithm is $(\epsilon, \delta)$-differentially private.

Theorem: Under the consistency conditions and proper choice of $n$ and $\Lambda$, w.h.p. the support of $\theta^*$ is output.

Nearly optimal sample complexity: essentially matching the non-private LASSO.
Thesis: Stable algorithms yield differentially private algorithms
Two notions of stability:
1. Perturbation stability
2. Subsampling stability
This Talk
1. Differential privacy via stability arguments: A meta-algorithm
2. Sample and aggregate framework and private model selection
3. Non-private sparse linear regression in high-dimensions
4. Private sparse linear regression with (nearly) optimal rate
Concluding Remarks

1. The sample and aggregate framework with the PTR + report-noisy-max aggregator is a generic tool for designing learning algorithms
• Example: learning with non-convex models [Bilenko, Dwork, Rothblum, T.]
2. The propose-test-release framework is an interesting tool whenever one can compute the distance to instability efficiently
3. Open problem: private high-dimensional learning without assumptions like incoherence and restricted strong convexity