You are who you know: Inferring user profiles in online social networks

Alan Mislove†‡§, Bimal Viswanath†, Krishna P. Gummadi†, Peter Druschel†

§ Northeastern University   † MPI-SWS   ‡ Rice University

February 5th, 2010, WSDM


05.02.2010 WSDM’10 Alan Mislove

Facebook and personal data

Users upload information to sites like Facebook
  • Profile information
  • Status updates
  • Photos, videos

Privacy model for data
  • Choose what to reveal
  • And what to keep private

When reasoning about privacy
  • Don’t often consider implicit data
  • What our friends reveal about ourselves

What is implicit data?

Example: MIT’s Project Gaydar
  • Predict sexual orientation based on friends

Exploiting homophily
  • People associate with others like them

What about other attributes?
  • Using friends-of-friends?

This talk

Explore how much implicit data exists on online social networks
  • Or, how much information can be inferred?
  • How much data is needed to be able to infer?

Focus on one source: social network

Develop methodology to infer user attributes
  • Test on real-world network data

Roadmap

1. Idea: Use communities to infer attributes

2. Collect fine-grained community data

3. Do attribute-based communities exist?

4. How well can we infer user attributes?

Idea: Use communities

Project Gaydar used 1-hop friends

Using >1 hop friends is challenging
  • Exponential growth in size
  • Unclear relationship to source

Look for groupings of users
  • Called communities
  • Potentially share attributes

Leverage literature in community detection

What do we mean by communities?

  • Group: Users who share a common attribute
  • Community: Users more densely connected than overall graph




Social network data

Crawled two Facebook networks
  • Rice University (university)
  • New Orleans (regional)

Picked known seed user
  • Crawled all of his friends, added new users to list
  • Effectively performed a BFS of graph

Network        Users     Avg. Degree
Rice ugrad     1,220     35.4
Rice grad      501       6.5
New Orleans    63,731    24.2
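The crawl described above can be sketched as a breadth-first traversal. This is only an illustration under stated assumptions: `get_friends` is a hypothetical callable standing in for whatever fetched a user's friend list, and the toy graph is invented for the example, not data from the talk.

```python
from collections import deque


def bfs_crawl(seed, get_friends):
    """Breadth-first crawl of a friendship graph from one seed user.

    get_friends is a hypothetical callable: user -> iterable of friends.
    Returns the set of discovered users and the set of undirected links.
    """
    seen = {seed}
    queue = deque([seed])
    edges = set()
    while queue:
        user = queue.popleft()
        for friend in get_friends(user):
            # Store each link once, regardless of direction.
            edges.add(frozenset((user, friend)))
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return seen, edges


def average_degree(users, edges):
    """Each undirected link contributes to the degree of two users."""
    return 2 * len(edges) / len(users)


# Toy graph (invented): users e and f are not reachable from seed a.
toy = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
    "e": ["f"],
    "f": ["e"],
}
users, edges = bfs_crawl("a", lambda u: toy[u])
print(sorted(users), average_degree(users, edges))
```

Note that a crawl of this kind only reaches users connected to the seed, which is why `e` and `f` never appear in the toy result.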

Collecting attributes

Rice
  Obtained authoritative information
    • Queried student directory
    • College (dormitory), major(s), year
  Could not collect Facebook profiles

New Orleans
  Collected Facebook profiles
  Extracted all attributes
    • E.g., high school, groups, gender
    • Attributes are freeform text


Do attributes define communities?

Put users into groups based on attributes
  • Determine if these are communities

Need metric to rate communities

Modularity rates community strength
  • Range [-1,1]
  • 0 represents what is expected in a random graph
  • ≥0.25 represents community structure
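As a rough sketch of how an attribute-based grouping could be scored, the snippet below computes modularity with the standard definition Q = Σ_i (e_ii − a_i²), where e_ii is the fraction of links inside group i and a_i is the fraction of link ends attached to group i. The function name, the toy graph, and both groupings are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, group_of):
    """Modularity of a grouping of an undirected graph.

    edges:    list of (u, v) pairs, each link listed once.
    group_of: dict mapping each user to a group label (e.g. an attribute value).
    """
    m = len(edges)
    inside = defaultdict(int)  # links with both ends in the group
    ends = defaultdict(int)    # link ends attached to the group
    for u, v in edges:
        gu, gv = group_of[u], group_of[v]
        ends[gu] += 1
        ends[gv] += 1
        if gu == gv:
            inside[gu] += 1
    return sum(inside[g] / m - (ends[g] / (2 * m)) ** 2 for g in ends)


# Toy graph (invented): two triangles joined by a single link.
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

# Grouping that matches the triangles, e.g. a shared attribute value.
by_triangle = {1: "x", 2: "x", 3: "x", 4: "y", 5: "y", 6: "y"}
# Grouping unrelated to the link structure.
by_parity = {n: n % 2 for n in range(1, 7)}

q_good = modularity(toy_edges, by_triangle)
q_bad = modularity(toy_edges, by_parity)
print(q_good, q_bad)
```

On this toy graph the triangle grouping scores above the 0.25 threshold, while the parity grouping scores below zero.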

Attribute communities for Rice ugrads

Communities based on shared college or year
  • Multiple, overlapping community structures

Attribute              Communities   Community Size (Min / Avg / Max)   Modularity
major                  65            1 / 23 / 105                       0.004
matriculation year     4             95 / 305 / 398                     0.259
residential college    9             130 / 135 / 142                    0.385


Using communities to infer attributes

Can we detect a single attribute community?
  • Given a few users in the community

Previous approaches proposed (local community detection)
  • Not designed for social networks
  • Never evaluated on a large-scale social network

Propose a new algorithm to detect a specific community
  • Problem: How to evaluate community strength?

Normalized Conductance

How strong is a particular community A?

Conductance previously proposed
  • But, biased towards large communities

Metric: Normalized conductance C
  • Fraction of A’s links within A
  • Relative to a random graph

Range is [-1,1]
  • 0 represents no stronger than random

$$C = \frac{e_{AA}}{e_{AA} + e_{AB}} - \frac{e_A e_A}{e_A e_A + e_A e_B}$$

where B denotes the rest of the network.
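A minimal sketch of the metric follows. The slide does not define every symbol, so the counting conventions here are assumptions: e_AA is taken as the number of links with both ends in A, e_AB as the number of links leaving A, and e_A and e_B as the total degree of A and of the rest of the network. The helper names and the toy graph are also invented for the example.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) links."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def normalized_conductance(adj, community):
    """C = e_AA/(e_AA + e_AB) - e_A*e_A/(e_A*e_A + e_A*e_B).

    Counting conventions are assumptions (see the text above the code).
    """
    a = set(community)
    # Each internal link is seen from both of its ends, so halve the count.
    e_aa = sum(1 for u in a for v in adj[u] if v in a) // 2
    e_ab = sum(1 for u in a for v in adj[u] if v not in a)
    e_a = sum(len(adj[u]) for u in a)
    e_b = sum(len(adj[u]) for u in adj if u not in a)
    if e_a == 0:
        return 0.0  # A has no links at all
    observed = e_aa / (e_aa + e_ab)
    expected = (e_a * e_a) / (e_a * e_a + e_a * e_b)
    return observed - expected


# Toy graph (invented): two triangles joined by a single link.
adj = build_adjacency([(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)])

c_triangle = normalized_conductance(adj, {1, 2, 3})
c_everyone = normalized_conductance(adj, set(adj))
print(c_triangle, c_everyone)
```

Taking the whole graph as the community gives a score of 0 here, which illustrates how subtracting the random-graph term removes the reward for simply being large.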

Algorithm

Given seed users, find a community by
  • Adding users
  • Stopping at some point

At each step, add user who increases normalized conductance by the most

Stop when no user increases normalized conductance
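The greedy procedure above might look roughly like the following. This is a sketch, not the authors' implementation: restricting candidates to neighbors of the current community, the tie-breaking rule, and the counting conventions inside `normalized_conductance` are all assumptions, and the toy graph is invented.

```python
def normalized_conductance(adj, community):
    """C = e_AA/(e_AA + e_AB) - e_A*e_A/(e_A*e_A + e_A*e_B).

    Assumed conventions: e_AA and e_AB count links, e_A and e_B are total degrees.
    """
    a = set(community)
    e_aa = sum(1 for u in a for v in adj[u] if v in a) // 2
    e_ab = sum(1 for u in a for v in adj[u] if v not in a)
    e_a = sum(len(adj[u]) for u in a)
    e_b = sum(len(adj[u]) for u in adj if u not in a)
    if e_a == 0:
        return 0.0
    return e_aa / (e_aa + e_ab) - (e_a * e_a) / (e_a * e_a + e_a * e_b)


def grow_community(adj, seeds):
    """Greedily add the user that raises normalized conductance the most."""
    community = set(seeds)
    best = normalized_conductance(adj, community)
    while True:
        # Assumption: only users adjacent to the community are candidates.
        frontier = {v for u in community for v in adj[u]} - community
        scored = [
            (normalized_conductance(adj, community | {v}), v) for v in frontier
        ]
        if not scored:
            break
        score, user = max(scored)  # ties fall back to comparing user ids
        if score <= best:
            break  # no candidate improves the score, so stop
        community.add(user)
        best = score
    return community


# Toy graph (invented): two triangles joined by the link 3-4.
adj = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
found = grow_community(adj, {1})
print(sorted(found))
```

Starting from user 1, the sketch recovers the first triangle and stops, because adding user 4 would lower the score.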

How to evaluate?

Evaluate performance using precision and recall
  • Algorithm takes as input a fraction of the users sharing the attribute

Ideally want a precision and recall of 1.0

recall = fraction of remaining attribute-sharing users identified

precision = fraction of identified users that share attribute
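The two definitions above translate into a short helper. One detail is an assumption: the revealed (seed) users are excluded from the identified set before computing precision, so that the algorithm gets no credit for users it was given. The function name and the sample sets are invented for illustration.

```python
def precision_recall(identified, true_members, revealed):
    """Precision and recall for one attribute community.

    identified:   users returned by the inference algorithm
    true_members: all users who actually share the attribute
    revealed:     the users given to the algorithm as input
    """
    # Assumption: revealed users are not counted as identified.
    identified = set(identified) - set(revealed)
    remaining = set(true_members) - set(revealed)
    hits = identified & remaining
    precision = len(hits) / len(identified) if identified else 0.0
    recall = len(hits) / len(remaining) if remaining else 0.0
    return precision, recall


# Invented example: 4 users share the attribute, 1 of them is revealed.
p, r = precision_recall(
    identified={1, 2, 3, 7},
    true_members={1, 2, 3, 4},
    revealed={1},
)
print(p, r)
```

In this example the algorithm finds two of the three remaining members and makes one wrong guess, so both values come out at two thirds.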

Can we infer Rice undergrad classes?

Yes; different communities show different characteristics
  • In next graphs, average across all groups

[Figure: recall and precision versus fraction of users revealed, shown for 1st Years, 2nd Years, 3rd Years, and 4th Years]

Inferring other attributes

[Figure: recall and precision versus fraction of users revealed, shown for Dormitory and Matriculation year, with a comparison against other algorithms]

Can we infer user-provided attributes?

Use New Orleans data

Much more challenging
  • Freeform text
  • Non-authoritative attributes
  • Missing data
  • Most not communities (gender, birthday, etc.)

Results for 92 groups
  • With conductance > 0.2

[Figure: recall and precision versus fraction of users revealed]

Summary

Ongoing online social network privacy debate
  • Focuses mainly on explicitly provided attributes

Demonstrated that many attributes can be inferred
  • Even if user didn’t provide them

Good interpretation: Can reduce burden on users
  • Don’t have to fill in entire profile

Bad interpretation: Can figure out attributes users don’t reveal
  • Privacy is a function of what friends reveal


Questions?



Backup slides


Facebook privacy debate

Debate over privacy model and defaults
  • Who can see users’ attributes, status, friends

Scale, intensity of debate illustrates importance

So far, focused on explicit data
  • Things the user uploaded or provided

What about implicit data?
  • Data users didn’t explicitly reveal?

Obtaining authoritative information

Additional information from student directory and alumni directory

Found matches for
  • 1,233 (20.0%) undergraduates
  • 548 (8.9%) graduate students
  • 2,093 (33.9%) alumni

Focus on undergraduate network
  • Obtained college, major(s), year

Similar results for others

Modularity

How good is a community division?

Metric: Modularity Q
  • Fraction of links within communities
  • Relative to a random graph

Range is [-1,1]
  • 0 represents no more community structure than random

Modularity > 0.25 indicates strong communities

$$Q = \sum_i \left(e_{ii} - a_i^2\right) = \operatorname{Tr}\, e - \lVert e^2 \rVert$$
