1
Differentially Private Analysis of Graphs and Social Networks
Sofya Raskhodnikova
Pennsylvania State University
2
Graphs and networks
Image source: Nykamp DQ, “An introduction to networks.” From Math Insight. http://mathinsight.org/network_introduction.
Many types of data can be represented as graphs, where nodes represent individuals and edges capture relationships.
3
Potentially sensitive information in graphs
• Social, romantic and sexual relationships
• "Friendships" in an online social network
• Financial transactions
• Phone calls and email communication
• Doctor-patient relationships
Source: Christakis, Fowler. The Spread of Obesity in a Large Social Network over 32 Years. N Engl J Med 2007; 357:370-379
Source: B. Aven. The effects of corruption on organizational networks and individual behavior. MIT workshop: Information and Decision in Social Networks, 2011.
4
Two conflicting goals
• Privacy: protecting information of individuals.
• Utility: drawing accurate conclusions about aggregate information.
5
"Anonymized" graphs still pose privacy risks
• False dichotomy: personally identifying vs. non-personally identifying information.
• Links and any other information about an individual can be used for de-anonymization.
Bearman, Moody, Stovel. Chains of affection: The structure of adolescent romantic and sexual networks, American J. Sociology, 2008
In a typical real-life network, many nodes have unique neighborhoods.
6
Some published de-anonymization attacks
– Movie ratings [Narayanan, Shmatikov 08]: de-identified Netflix users based on information from a public movie database (IMDb).
– Social networks [Backstrom, Dwork, Kleinberg 07; Narayanan, Shmatikov 09; Narayanan, Shi, Rubinstein 12]: re-identified users in an anonymized online social network (Twitter) based on information from a public online social network (Flickr).
– Computer networks [Coull, Wright, Monrose, Collins, Reiter 07; Ribeiro, Chen, Miklau, Townsley 08, …]: can re-identify individuals based on external sources.
7
Who'd want to de-anonymize a social network graph?
• A government agency, for surveillance.
• A phisher or spammer, to write a personalized message.
• A health insurance provider, to check preexisting conditions.
• Marketers, to focus advertising on influential nodes.
• Stalkers, nosy neighbors, colleagues, or employers.
image sources: Andrew Joyner, http://dukeromkey.com/
8
What information can be released without violating privacy?
Differential privacy (for graph data)
Differential privacy [Dwork McSherry Nissim Smith 06]
An algorithm A is ε-differentially private if, for all pairs of neighbor graphs G, G′ and all possible outputs o:
Pr[A(G) = o] ≤ (1 + ε) · Pr[A(G′) = o].
Graph G
9
image source http://www.queticointernetmarketing.com/new-amazing-facebook-photo-mapper/
Algorithm
Data processing
output
Data release
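The guarantee above can be sanity-checked numerically for the Laplace mechanism. (The standard form of the bound uses e^ε, which is close to 1 + ε for small ε; the check below uses e^ε.) A minimal sketch, not from the talk, assuming a sensitivity-1 statistic such as an edge count under edge privacy:

```python
import math

def laplace_density(x, mu, scale):
    """Density of the Laplace distribution centered at mu."""
    return math.exp(-abs(x - mu) / scale) / (2 * scale)

eps = 0.5
sensitivity = 1.0          # e.g., edge count under edge privacy
scale = sensitivity / eps  # Laplace mechanism noise scale

# On neighboring graphs the statistic differs by at most the sensitivity.
f_G, f_Gprime = 10.0, 11.0

# At every output o, the ratio of output densities is bounded by exp(eps).
worst_ratio = max(
    laplace_density(o / 10, f_G, scale) / laplace_density(o / 10, f_Gprime, scale)
    for o in range(250)
)
print(worst_ratio, "<=", math.exp(eps))
```

The worst-case ratio is attained at outputs below both means and equals exactly e^ε, confirming that the noise scale sensitivity/ε suffices.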
10
Two variants of differential privacy for graphs
• Edge differential privacy
Two graphs are neighbors if they differ in one edge.
• Node differential privacy
Two graphs are neighbors if one can be obtained from the other by deleting a node and its adjacent edges.
[Figures: example pairs of neighboring graphs for each variant]
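The two neighbor relations are easy to make concrete. A small sketch (not from the talk) that enumerates both kinds of neighbors of a graph given as an edge list:

```python
from itertools import combinations

def edge_neighbors(n, edges):
    """All graphs differing from (n, edges) in exactly one edge."""
    edges = frozenset(frozenset(e) for e in edges)
    for u, v in combinations(range(n), 2):
        e = frozenset((u, v))
        yield edges ^ {e}  # add the edge if absent, remove it if present

def node_neighbors(n, edges):
    """All graphs obtained by deleting one node and its adjacent edges."""
    edges = frozenset(frozenset(e) for e in edges)
    for v in range(n):
        yield frozenset(e for e in edges if v not in e)

# Triangle on 3 nodes: one edge-neighbor per node pair; deleting any
# node of the triangle leaves a single edge.
tri = [(0, 1), (1, 2), (0, 2)]
print(len(list(edge_neighbors(3, tri))))
print(sorted(len(g) for g in node_neighbors(3, tri)))
```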
12
Some useful properties of differential privacy
• Robustness to side information.
• Post-processing: an algorithm that runs an ε-differentially private subroutine (and does not otherwise touch the data) is ε-differentially private.
• Composition: an algorithm that runs subroutines that are ε₁-, …, ε_k-differentially private is (ε₁ + ⋯ + ε_k)-differentially private.
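Composition is what makes privacy budgeting work in practice: to answer several queries under a total budget ε, split it across the queries. A hedged sketch (the statistic values and sensitivities below are made-up placeholders, not values from the talk):

```python
import random

rng = random.Random(7)

def laplace_release(true_value, sensitivity, eps):
    """Laplace mechanism: an eps-differentially private release."""
    scale = sensitivity / eps
    # A Laplace sample is the difference of two exponential samples.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_value + noise

# By composition, k releases with budget eps_i each cost sum(eps_i) total.
total_eps = 1.0
stats = {"edges": 120.0, "triangles": 35.0}  # placeholder values
per_query_eps = total_eps / len(stats)
noisy = {name: laplace_release(v, 1.0, per_query_eps)
         for name, v in stats.items()}
print(noisy)
```

Splitting the budget evenly is the simplest policy; uneven splits favoring the statistics that need more accuracy are equally valid under the same composition bound.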
13
Is differential privacy too strong?
• No weaker notion has been proposed that satisfies all three useful properties.
• We can actually attain it for many useful statistics!
14
What graph statistics can be computed accurately with differential privacy?
15
Graph statistics
• Number of edges
• Counts of small subgraphs (e.g., triangles, k-triangles, k-stars)
• Degree distribution: the fraction of nodes of each degree d (the degree of a node is the number of connections it has)
• Number of connected components
• How different our graph is from a graph with a certain property
• Correlation between node attributes (e.g., body fat index) and network attributes (e.g., number of obese neighbors)
• Sizes of graph cuts
• …
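As one concrete illustration (a sketch under assumptions, not the talk's algorithm): under edge privacy, adding or removing one edge changes the degrees of two nodes, moving each between two histogram buckets, so the degree histogram has L1 sensitivity at most 4 and can be released with Laplace noise:

```python
import random
from collections import Counter

rng = random.Random(0)

def degree_histogram(n, edges):
    """hist[d] = number of nodes of degree d (d ranges over 0..n-1)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    hist = [0] * n
    for v in range(n):
        hist[deg[v]] += 1
    return hist

def noisy_histogram(hist, eps, sensitivity=4.0):
    """Edge privacy: one edge changes two degrees, moving two nodes
    between buckets, so the L1 sensitivity is at most 4."""
    scale = sensitivity / eps
    return [c + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
            for c in hist]

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
hist = degree_histogram(5, edges)
print(hist)
print(noisy_histogram(hist, eps=1.0))
```

Dividing each noisy count by n gives the fraction of nodes of each degree d. Node privacy is much harder: one node can change up to n degrees at once, which is exactly the high-sensitivity challenge discussed next.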
16
Tools used in differentially private graph algorithms
• Smooth sensitivity: a more nuanced notion of sensitivity than the one mentioned in the previous talk
• Sample and aggregate
• Maximum flow
• Linear and convex programming
• Random projections
• Iterative updates
• Postprocessing
17
Differentially private graph analysis
A taste of techniques
Basic question: how to compute a statistic f(G)
Graph G
18
image source http://www.queticointernetmarketing.com/new-amazing-facebook-photo-mapper/
Algorithm
Data processing
Approximation to f(G)
How accurately can an ε-differentially private algorithm compute f(G)?
Data release
19
Challenge for node privacy: high sensitivity
• The sensitivity of a function f is
  Δf = max over node-neighbors G, G′ of |f(G) − f(G′)|.
• Previous talk: if f has low sensitivity, it can be computed accurately with differential privacy (by the Laplace mechanism).
• Challenge: even simple graph statistics have high sensitivity.
20
Challenge for node privacy: high sensitivity
• Sensitivity of a function f:
  Δf = max over node-neighbors G, G′ of |f(G) − f(G′)|.
• Examples, for graphs on n nodes:
  – f₁(G) = number of edges in G: Δf₁ = n − 1.
  – f₂(G) = number of triangles in G: Δf₂ = (n − 1)(n − 2)/2.
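These node sensitivities can be verified exhaustively on tiny graphs. A brute-force sketch (exponential in the number of node pairs, so for illustration only):

```python
from itertools import combinations

def node_sensitivity(f, n):
    """Brute-force: max |f(G) - f(G')| over every graph G on n labeled
    nodes and every node-neighbor G' (one node and its edges deleted)."""
    pairs = list(combinations(range(n), 2))
    best = 0
    for mask in range(2 ** len(pairs)):
        edges = {pairs[i] for i in range(len(pairs)) if mask >> i & 1}
        f_g = f(edges)
        for v in range(n):
            sub = {e for e in edges if v not in e}
            best = max(best, abs(f_g - f(sub)))
    return best

def num_edges(edges):
    return len(edges)

def num_triangles(edges):
    nodes = {v for e in edges for v in e}
    return sum(1 for a, b, c in combinations(sorted(nodes), 3)
               if {(a, b), (a, c), (b, c)} <= edges)

print(node_sensitivity(num_edges, 4))      # n - 1 = 3
print(node_sensitivity(num_triangles, 4))  # (n-1)(n-2)/2 = 3
```

Both maxima are attained at the complete graph, where the deleted node touches every other node.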
21
Idea: project onto graphs with low sensitivity.
[Kasiviswanathan Nissim Raskhodnikova Smith 13]
See also [Blocki Blum Datta Sheffet 13, Chen Zhou 13]
22
"Projections" on graphs of small degree
• Consider graphs of degree at most d, where d is a parameter.
• Examples, restricted to graphs of degree at most d:
  – f₁(G) = number of edges in G: Δf₁ = d.
  – f₂(G) = number of triangles in G: Δf₂ = d(d − 1)/2.
[Figure: graphs of degree at most d as a subset of all graphs]
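The most naive way to "project" is truncation: delete high-degree nodes until every degree is at most d. This is only a warm-up sketch (not the talk's method; truncation by itself can change a lot when a single node changes, which motivates the smoother constructions that follow):

```python
def truncate(edges, d):
    """Naive projection: repeatedly delete a node whose degree exceeds d,
    together with its edges, until all degrees are at most d."""
    edges = set(edges)
    while True:
        deg = {}
        for u, v in edges:
            deg[u] = deg.get(u, 0) + 1
            deg[v] = deg.get(v, 0) + 1
        bad = [v for v, k in deg.items() if k > d]
        if not bad:
            return edges
        drop = bad[0]
        edges = {e for e in edges if drop not in e}

# Star with center 0 and four leaves: truncating to degree 2 deletes
# the center and with it every edge.
star = [(0, i) for i in range(1, 5)]
print(truncate(star, 2))              # set()
print(truncate([(0, 1), (1, 2)], 2))  # unchanged: degrees already <= 2
```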
23
Lipschitz extensions
A function f′ is a Lipschitz extension of f if:
– f′ agrees with f on graphs of degree at most d, and
– the sensitivity of f′ is low.
• Release f′ using the Laplace mechanism.
• Lipschitz extensions exist for all real-valued functions.
• We design Lipschitz extensions that can be computed efficiently using linear and convex programming for:
  – subgraph counts
  – degree distribution
[Figure: f′ = f on the subset of graphs of degree at most d, inside the set of all graphs]
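For the edge count, a flow-based Lipschitz extension can be computed with maximum flow, in the spirit of [Kasiviswanathan Nissim Raskhodnikova Smith 13]. The network layout below is a plausible reading of that construction, not a quotation from the talk: on graphs of degree at most d the value equals the true edge count, and high-degree nodes can contribute at most d.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp maximum flow on a capacity dict {u: {v: capacity}}."""
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Find the bottleneck along the path and push flow.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap.setdefault(v, {}).setdefault(u, 0)
            cap[v][u] += bottleneck
        flow += bottleneck

def edge_count_extension(n, edges, d):
    """Flow-based extension of the edge count:
    s -> left copy of each node (capacity d), left(u) -> right(v) and
    left(v) -> right(u) for each edge (capacity 1), right copy -> t
    (capacity d). The extension is max flow / 2."""
    cap = {"s": {}}
    for v in range(n):
        cap["s"][("L", v)] = d
        cap[("R", v)] = {"t": d}
    for u, v in edges:
        cap.setdefault(("L", u), {})[("R", v)] = 1
        cap.setdefault(("L", v), {})[("R", u)] = 1
    return max_flow(cap, "s", "t") / 2

tri = [(0, 1), (1, 2), (0, 2)]
star = [(0, i) for i in range(1, 5)]
print(edge_count_extension(3, tri, d=2))   # agrees with edge count: 3.0
print(edge_count_extension(5, star, d=2))  # capped below the true count 4
```

On the triangle (all degrees 2) every unit-capacity arc saturates and the extension equals the edge count; on the 4-leaf star the degree-4 center is capped at d = 2, so the extension changes gently as nodes change.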
24
Summary
• Accurate subgraph counts for realistic graphs can be computed by node-private algorithms.
  – Use Lipschitz extensions and linear programming.
  – This is one example of many graph statistics that node-private algorithms do well on.
25
What can’t be computed differentially privately?
• Differential privacy explicitly excludes the possibility of computing anything that depends on one person's data:
  – Is there a node in the graph that has atypical connections?
  – Are there "suspicious communication patterns"?
26
What we are working on
• Node differentially private algorithms for releasing
  – a large number of graph statistics at once
  – synthetic graphs
• Exciting area of research:
  – Edge-private algorithms [Nissim, Raskhodnikova, Smith 07; Hay, Rastogi, Miklau, Suciu 09; Hay, Li, Miklau, Jensen 09; Hardt, Rothblum 10; Karwa, Raskhodnikova, Smith, Yaroslavtsev 11; Karwa, Slavkovic 12; Blocki, Blum, Datta, Sheffet 12; Gupta, Roth, Ullman 12; Mir, Wright 12; Kifer, Lin 13; …]
  – Node-private algorithms [Gehrke, Lui, Pass 12; Blocki, Blum, Datta, Sheffet 13; Kasiviswanathan, Nissim, Raskhodnikova, Smith 13; Chen, Zhou 13; Raskhodnikova, Smith; …]
27
Conclusions
• We are close to having edge-private and node-private algorithms that work well in practice for many basic graph statistics.
• Accurate node-private algorithms were thought to be impossible only a few years ago.
• Differential privacy is influencing other scientific disciplines.
  – Next talk: reducing false discovery rate.