27
1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

Embed Size (px)

Citation preview

Page 1: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

1

Differentially Private Analysis of Graphs and Social Networks

Sofya RaskhodnikovaPennsylvania State University

Page 2: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

2

Graphs and networks

Image source: Nykamp DQ, “An introduction to networks.” From Math Insight. http://mathinsight.org/network_introduction.

Many types of data can be represented as graphs, where nodes represent individuals and edges capture relationships.

Page 3: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

3

Potentially sensitive information in graphs

• Social, romantic and sexual relationships• “Friendships” in an online social network• Financial transactions• Phone calls and email communication• Doctor-patient relationships

Source: Christakis, Fowler. The Spread of Obesity in a Large Social Network over 32 Years. N Engl J Med 2007; 357:370-379

Source: B. Aven. The effects of corruption on organizational networks and individual behavior. MIT workshop: Information and Decision in Social Networks, 2011.

Page 4: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

4

Two conflicting goals

• Privacy: protecting information of individuals.• Utility: drawing accurate conclusions about aggregate

information.

Privacy Utility

Page 5: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

5

• False dichotomy: personally identifying vs. non-personally identifying information.• Links and any other information about individual can be used for de-anonymization.

``Anonymized’’ graphs still pose privacy risk

Bearman, Moody, Stovel. Chains of affection: The structure of adolescent romantic and sexual networks, American J. Sociology, 2008

In a typical real-life network, many nodes have unique neighborhoods.

Page 6: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

6

Some published de-anonymization attacks

– Movie ratings [Narayanan, Shmatikov 08]De-identified Netflix users based on information from a public movie database IMDb.

– Social networks [Backstrom, Dwork, Kleinberg 07;Narayanan, Shmatikov 09; Narayanan, Shi, Rubinstein 12]

Re-identified users in an online social network (anonymized Twitter) based information from a public online social network (Flickr).

– Computer networks[Coull, Wright, Monrose, Collins, Reiter 07;

Ribeiro, Chen, Miklau, Townsley 08,…]

Can reidentify individuals based on external sources.

Movies

People

Page 7: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

7

• Government agency for surveillance.• A phisher/spammer to write a personalized message.• Health insurance provider to check preexisting

conditions. • Marketers to focus advertising on influential nodes.• Stalkers, nosy neighbors, colleagues, or employers.

Who’d want to de-anonymize a social network graph?

image sources: Andrew Joyner, http://dukeromkey.com/

Page 8: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

8

What information can be released

without violating privacy?

Page 9: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

Differential privacy (for graph data)

Differential privacy [Dwork McSherry Nissim Smith 06]Given an algorithm A is -differentially private if for all pairs of neighbors and all possible outputs o:

𝑷𝒓 [ 𝑨 (𝑮 )=𝒐 ] ≤ (𝟏+𝝐 )⋅𝑷𝒓 [𝑨 (𝑮′ )=𝒐 ] .

Graph G

9image source http://www.queticointernetmarketing.com/new-amazing-facebook-photo-mapper/

Algorithm

Data processing

output

Data release

Page 10: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

10

Two variants of differential privacy for graphs

• Edge differential privacy

Two graphs are neighbors if they differ in one edge.

• Node differential privacy

Two graphs are neighbors if one can be obtained from the other by deleting a node and its adjacent edges.

G: G:

G: G:

Page 11: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

Differential privacy (for graph data)

Differential privacy [Dwork McSherry Nissim Smith 06]An algorithm A is -differentially private if for all pairs of neighbors and all possible outputs o:

𝑷𝒓 [ 𝑨 (𝑮 )=𝒐 ] ≤ (𝟏+𝝐 )⋅𝑷𝒓 [𝑨 (𝑮′ )=𝒐 ]

Graph G

11image source http://www.queticointernetmarketing.com/new-amazing-facebook-photo-mapper/

Algorithm

Data processing

output

Data release

Page 12: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

12

Some useful properties of differential privacy• Robustness to side information• Post-processing:

an algorithm that runs an -differentially private subroutine is an -differentially private.

• Composition: an algorithm that runs subroutines that are

-differentially private is -differentially private.

Page 13: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

13

Is differential privacy too strong?

• No weaker notion has been proposed that satisfies all three useful properties.

• We can actually attain it for many useful statistics!

Page 14: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

14

What graph statistics can be computed accurately

with differential privacy?

Page 15: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

15

• Number of edges• Counts of small subgraphs

(e.g., triangles, -triangles, -stars)• Degree distribution

• Number of connected components• How different our graph is from a graph with a certain property• Correlation between node attributes (e.g., body fat index) and

network attributes (e.g., number of obese neighbors).• Sizes of graph cuts.• …

Graph statistics

… …

Fraction of nodes of degree d

Degree d

……

The degree of a node isthe number of connections it has.

Page 16: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

16

Tools used in differentially private graph algorithms

• Smooth sensitivity– A more nuanced notion of sensitivity than the one

mentioned in the previous talk• Sample and aggregate• Maximum flow• Linear and convex programming• Random projections• Iterative updates• Postprocessing

Page 17: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

17

Differentially private graph analysis

A taste of techniques

Page 18: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

Basic question: how to compute a statistic fGraph G

18image source http://www.queticointernetmarketing.com/new-amazing-facebook-photo-mapper/

Algorithm

Data processing

Approximation to

How accurately can an -differentially private algorithm compute f(G)?

Data release

Page 19: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

19

• Sensitivity of a function is

• Previous talk: if has low sensitivity, it can be computed accurately with differential privacy (by Laplace mechanism).

• Challenge: Even simple graph statistics have high sensitivity.

𝝏 𝒇 = max(𝐧𝐨𝐝𝐞 )𝐧𝐞𝐢𝐠𝐡𝐛𝐨𝐫 𝑠 𝐺 ,𝐺 ′

|𝑓 (𝐺 )− 𝑓 (𝐺′ )|

Challenge for node privacy: high sensitivity

𝒇 (𝑮 ′ ) 𝒇 (𝑮)

Page 20: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

20

• Sensitivity of a function is

• Examples: (G) is the number of edges in G. (G) is the number of triangles in G.

max(𝐧𝐨𝐝𝐞 )𝐧𝐞𝐢𝐠𝐡𝐛𝐨𝐫 𝑠𝐺 ,𝐺 ′

|𝑓 (𝐺 )− 𝑓 (𝐺′ )|

= . =.

Challenge for node privacy: high sensitivity

for graphs on nodes:

Page 21: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

21

Idea: project onto graphs with low sensitivity.

[Kasiviswanathan Nissim Raskhodnikova Smith 13]

See also [Blocki Blum Datta Sheffet 13, Chen Zhou 13]

Page 22: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

22

Consider graphs of degree , where

• Examples: (G) is the number of edges in G. (G) is the number of triangles in G.

= . =.

“Projections” on graphs of small degree

Graphs of degree .

All graphs

Page 23: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

23

Lipschitz extensions

• Release using Laplace mechanism

• There exist Lipschitz extensions for all real-valued functions• We design Lipschitz extensions that can be computed

efficiently using linear and convex programming for – subgraph counts– degree distribution

A function is a Lipschitz extension of if agrees with on graphs of degree Sensitivity of is low

Graphs of degree .

All graphs

𝒇 ′= 𝒇

Page 24: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

24

Summary

• Accurate subgraph counts for realistic graphs can be computed by node-private algorithms– Use Lipschitz extensions and linear programming– It is one example of many graph statistics that node-private

algorithms do well on.

Page 25: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

25

What can’t be computed differentially privately?

• Differential privacy explicitly excludes the possibility of computing anything that depends on one person’s data:– Is there a node in the graph that has atypical connections?– ``suspicious communication patterns’’?

Page 26: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

26

What we are working on

• Node differentially private algorithms for releasing– a large number of graph statistics at once– synthetic graphs

• Exciting area of research:– Edge-private algorithms [Nissim, Raskhodnikova, Smith 07; Hay, Rastogi, Miklau, Suciu 09; Hay, Li, Miklau, Jensen 09; Hardt, Rothblum 10; Karwa, Raskhodnikova, Smith, Yaroslavtsev 11; Karwa, Slavkovic 12; Blocki, Blum, Datta, Sheffet 12; Gupta, Roth, Ullman 12; Mir, Wright 12; Kifer, Lin 13, …]

– Node-private algorithms [Gehrke Lui Pass 12; Blocki Blum Datta Sheffet 13, Kasiviswanathan Nissim Raskhodnikova Smith 13, Chen Zhou 13, Raskhodnikova Smith, ..]

Page 27: 1 Differentially Private Analysis of Graphs and Social Networks Sofya Raskhodnikova Pennsylvania State University

27

Conclusions

• We are close to having edge-private and node-private algorithms that work well in practice for many basic graph statistics.

• Accurate node-private algorithms were thought to be impossible only a few years ago.

• Differential privacy is influencing other scientific disciplines– Next talk: reducing false discovery rate.