
Online Social Networks: Navigation, Search, Recommendationeugene/cs190/lectures/april23-osn3.pdf · Theorem: If a= 1 and outdegree is polylogarithmic, can s ~ O(log n) Group structure


Online Social Networks: Navigation, Search, Recommendation


Many slides adapted from Lada Adamic (Michigan)


Today's Plan

Final project details: recap and tips

Searching a social network

Real systems: node recommendation


Search in structured networks


[Figure: power-law graph; labels show the number of nodes found: 1, 6, 54, 63, 67, 2, 94]

How would you search for a node here?


What about here?


gnutella network fragment


[Figure: cumulative nodes found at step vs. step (0 to 100); series: high degree seeking 1st neighbors, high degree seeking 2nd neighbors]

50% of the files in a 700 node network can be found in < 8 steps

Gnutella network


And here?


here?


here?

Source: http://maps.google.com

here?

Source: http://maps.google.com

here?

Source: http://maps.google.com

Small world experiments review

Milgram (1960s); Dodds, Muhamad, Watts (2003)

Given a target individual and a particular property, pass the message to a person you correspond with who is “closest” to the target.

Short chain lengths: six degrees of separation

Typical strategy: if far from the target, choose someone geographically closer; if geographically close to the target, choose someone professionally closer.

[Figure: map with NE and MA marked]

Source: undetermined

Source: NASA, U.S. Government; http://visibleearth.nasa.gov/view_rec.php?id=2429

Is this the whole picture?

Why are small worlds navigable?

Source: Watts, D.J., Strogatz, S.H. (1998) Collective dynamics of 'small-world' networks. Nature 393:440-442.

How are people able to find short paths?

How to choose among hundreds of acquaintances?

Strategy: simple greedy algorithm. Each participant chooses the correspondent who is closest to the target with respect to the given property.

Models:

• geography: Kleinberg (2000)

• hierarchical groups: Watts, Dodds, Newman (2001), Kleinberg (2001)

• high degree nodes: Adamic, Puniyani, Lukose, Huberman (2001), Newman (2003)

Reverse small world experiment

• Killworth & Bernard (1978):

• Given hypothetical targets (name, occupation, location, hobbies, religion…) participants choose an acquaintance for each target

• Acquaintance chosen based on

• (most often) occupation, geography

• only 7% because they “know a lot of people”

• Simple greedy algorithm: most similar acquaintance

• two-step strategy rare

Source: Peter D. Killworth and H. Russell Bernard (1978). The Reverse Small World Experiment. Social Networks 1:159–92.

How many hops actually separate any two individuals in the world?

• Participants are not perfect in routing messages

• They use only local information

• “The accuracy of small world chains in social networks”, Peter D. Killworth, Chris McCarty, H. Russell Bernard & Mark House:

– Analyze 10920 shortest path connections between 105 members of an interviewing bureau,

– together with the equivalent conceptual, or ‘small world’ routes, which use individuals’ selections of intermediaries.

– This permits the first study of the impact of accuracy within small world chains.

– The mean small world path length (3.23) is 40% longer than the mean of the actual shortest paths (2.30)

– Model suggests that people make a less than optimal small world choice more than half the time.


Spatial search

“The geographic movement of the [message] from Nebraska to Massachusetts is striking. There is a progressive closing in on the target area as each new person is added to the chain”

S. Milgram, 'The small world problem', Psychology Today 1, 61, 1967

Nodes are placed on a lattice and connect to nearest neighbors; additional links are placed with $p_{uv} \sim d_{uv}^{-r}$

Kleinberg, 'The Small World Phenomenon, An Algorithmic Perspective', Proc. 32nd ACM Symposium on Theory of Computing, 2000. (Nature 2000)

No locality

When r = 0, links are randomly distributed, $p \sim p_0$; ASP ~ log(n), where n is the size of the grid

When r = 0, the expected time of any decentralized algorithm is at least $a_0 n^{2/3}$

When r < 2, the expected time is at least $a_r n^{(2-r)/3}$

Overly localized links on a lattice

When r > 2, expected search time ~ $N^{(r-2)/(r-1)}$

[Figure: lattice with $p \sim 1/d^4$]

Links balanced between long and short range

When r = 2, the expected time of a decentralized algorithm (DA) is at most $C (\log N)^2$

[Figure: lattice with $p \sim 1/d^2$]
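The lattice model from the preceding slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grid is a small n x n square without wraparound, each node gets exactly one long-range contact, distance is Manhattan distance, and the function names (`build_long_range_links`, `greedy_route`) are hypothetical, not taken from Kleinberg's paper.

```python
import random


def manhattan(u, v):
    """Lattice (Manhattan) distance between grid points u and v."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def build_long_range_links(n, r, rng):
    """Give each node of an n x n grid one long-range contact with p_uv ~ d_uv^(-r)."""
    nodes = [(i, j) for i in range(n) for j in range(n)]
    links = {}
    for u in nodes:
        others = [v for v in nodes if v != u]
        weights = [manhattan(u, v) ** (-r) for v in others]
        links[u] = rng.choices(others, weights=weights, k=1)[0]
    return links


def lattice_neighbors(u, n):
    """Nearest neighbors of u on the grid (no wraparound, an assumption)."""
    i, j = u
    candidates = [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
    return [(a, b) for a, b in candidates if 0 <= a < n and 0 <= b < n]


def greedy_route(source, target, n, links):
    """Forward to whichever contact is closest to the target; return the hop count."""
    current, steps = source, 0
    while current != target:
        options = lattice_neighbors(current, n) + [links[current]]
        current = min(options, key=lambda v: manhattan(v, target))
        steps += 1
    return steps


rng = random.Random(0)
links = build_long_range_links(8, 2.0, rng)
print(greedy_route((0, 0), (7, 7), 8, links))
```

Because some lattice neighbor is always one step closer to the target, the greedy walk never takes more hops than the lattice distance; the long-range links can only shorten it. Re-running with different values of r on a larger grid is one way to explore the r < 2, r = 2, and r > 2 regimes described above.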


Testing search models on social networks

Advantage: have access to the entire communication network and to individuals' attributes

Use a well defined network:

• HP Labs email correspondence over 3.5 months

• Edges are between individuals who sent at least 6 email messages each way

• 450 users

• median degree = 10, mean degree = 13

• average shortest path = 3

Node properties specified:

• degree

• geographical location

• position in organizational hierarchy

Can greedy strategies work?

the network otherwise known as sample.gdf


Strategy 1: High degree search

Power-law degree distribution of all senders of email passing through HP Labs

[Figure: log-log plot of frequency (proportion of senders) vs. outdegree (number of recipients sender has sent email to); series: outdegree distribution, a = 2.0 fit]

Filtered network (at least 6 messages sent each way)

[Figure: p(k) vs. number of email correspondents, k (0 to 80), shown on linear and semi-log axes]

Degree distribution no longer power-law, but Poisson

It would take 40 steps on average (median of 16) to reach a target!
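A degree-seeking search of the kind discussed here might look like the sketch below. It is one possible reading, not the procedure used in the HP Labs study: the graph is a plain adjacency dictionary, the walker checks whether the target is a direct neighbor, otherwise moves to the highest-degree neighbor it has not yet visited, and backtracks when stuck. The function name and the toy graph are hypothetical.

```python
def high_degree_search(adj, source, target):
    """Walk toward high-degree nodes until the target is a neighbor.

    adj maps each node to the set of its neighbors.
    Returns the number of steps taken, or None if the target is unreachable.
    """
    visited = {source}
    path = [source]          # stack of nodes, used for backtracking
    steps = 0
    while path:
        current = path[-1]
        if current == target:
            return steps
        if target in adj[current]:
            return steps + 1
        unvisited = [v for v in adj[current] if v not in visited]
        if unvisited:
            # Prefer the neighbor with the most connections.
            nxt = max(unvisited, key=lambda v: len(adj[v]))
            visited.add(nxt)
            path.append(nxt)
        else:
            path.pop()       # dead end: step back (counted as a step)
        steps += 1
    return None


toy = {
    "a": {"b", "hub"},
    "b": {"a"},
    "hub": {"a", "c", "d", "t"},
    "c": {"hub"},
    "d": {"hub"},
    "t": {"hub"},
    "z": set(),
}
print(high_degree_search(toy, "a", "t"))   # goes a -> hub -> t
```

On a graph with a few large hubs this reaches most nodes quickly; when degrees are narrowly distributed, as in the filtered network above, there is little for the strategy to exploit.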


Strategy 2: Geography

Communication across corporate geography

[Figure: email links between building floors 1L, 1U, 2L, 2U, 3L, 3U, 4U]

87% of the 4000 links are between individuals on the same floor

source: Adamic and Adar, How to search a social network, Social Networks, 27(3), p.187-203, 2005.

Cubicle distance vs. probability of being linked

[Figure: log-log plot of proportion of linked pairs vs. distance in feet; series: measured, 1/r, 1/r^2 (optimum for search)]

source: Adamic and Adar, How to search a social network, Social Networks, 27(3), p.187-203, 2005.

LiveJournal

• LiveJournal provides an API to crawl the friendship network + profiles
  – friendly to researchers
  – great research opportunity

• basic statistics
  – Users (stats from April 2006)
  – How many users, and how many of those are active?
    • Total accounts: 9980558
    • ... active in some way: 1979716
    • ... that have ever updated: 6755023
    • ... updating in last 30 days: 1300312
    • ... updating in last 7 days: 751301
    • ... updating in past 24 hours: 216581

Predominantly female & young demographic

• Male: 1370813 (32.4%)

• Female: 2856360 (67.6%)

• Unspecified: 1575389

Age distribution

Age   Users
13    18483
14    87505
15    211445
16    343922
17    400947
18    414601
19    405472
20    371789
21    303076
22    239255
23    194379
24    152569
25    127121
26    98900
27    73392
28    59188
29    48666

Geographic Routing in Social Networks

• David Liben-Nowell, Jasmine Novak, Ravi Kumar, Prabhakar Raghavan, and Andrew Tomkins (PNAS 2005)

• data used

– Feb. 2004

– 500,000 LiveJournal users with US locations

– giant component (77.6%) of the network

– clustering coefficient: 0.2


Degree distributions

• The broad degree distributions we’ve learned to know and love

– but more probably lognormal than power law

broader indegree than outdegree distribution

Source: http://www.tomkinshome.com/andrew/papers/science-blogs/pnas.pdf

Results of a simple greedy geographical algorithm

• Choose source s and target t randomly

• Try to reach target’s city – not target itself

• At each step, the message is forwarded from the current message holder u to the friend v of u geographically closest to t

• Variant 1: stop if d(v,t) > d(u,t)
  – 13% of the chains are completed

• Variant 2: if d(v,t) > d(u,t), pick a neighbor at random in the same city if possible, else stop
  – 80% of the chains are completed

Source: http://www.tomkinshome.com/andrew/papers/science-blogs/pnas.pdf
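The two stopping rules on this slide can be compared with a small sketch. Everything concrete here is an assumption made for illustration: positions are 2D points, distance is Euclidean, friend lists are Python lists, and the function name `geo_greedy`, its `fallback` flag, and the toy data are hypothetical rather than taken from the paper.

```python
import math
import random


def geo_greedy(friends, pos, city, s, t, fallback=False, rng=None, max_hops=100):
    """Forward a message from s toward t's city; return hop count or None on failure.

    fallback=False: give up when the best friend v is farther from t than u is.
    fallback=True:  in that case, hand the message to a random friend in u's city,
                    and give up only if there is none.
    """
    rng = rng or random.Random(0)

    def dist(a):
        return math.dist(pos[a], pos[t])

    u, hops = s, 0
    while city[u] != city[t] and hops < max_hops:
        if not friends[u]:
            return None
        v = min(friends[u], key=dist)          # friend geographically closest to t
        if dist(v) > dist(u):
            if not fallback:
                return None
            same_city = [w for w in friends[u] if city[w] == city[u]]
            if not same_city:
                return None
            v = rng.choice(same_city)
        u = v
        hops += 1
    return hops if city[u] == city[t] else None


pos = {"a": (0, 0), "b": (0, 1), "c": (5, 0), "t": (10, 0)}
city = {"a": "A", "b": "A", "c": "C", "t": "T"}
friends = {"a": ["b"], "b": ["a", "c"], "c": ["b", "t"], "t": ["c"]}

print(geo_greedy(friends, pos, city, "a", "t"))                 # strict rule fails
print(geo_greedy(friends, pos, city, "a", "t", fallback=True))  # fallback succeeds
```

In the toy data, the only friend of `a` is slightly farther from the target, so the strict rule stops immediately, while the same-city fallback lets the chain continue through `b` and `c`.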


the geographic basis of friendship

• d = d(u,v) the distance between pairs of people

• The probability that two people are friends given their distance is equal to

– P(d) = e + f(d), where e is a constant independent of geography

– e is 5.0 × 10^-6 for LiveJournal users who are very far apart

Source: http://www.tomkinshome.com/andrew/papers/science-blogs/pnas.pdf

the geographic basis of friendship

• The average user will have ~ 2.5 non-geographic friends

• The other friends (5.5 on average) are distributed according to an approximate 1/distance relationship

• But 1/d was proved not to be navigable by Kleinberg, so what gives?

Source: http://www.tomkinshome.com/andrew/papers/science-blogs/pnas.pdf
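The figure of about 2.5 non-geographic friends is consistent with the numbers quoted earlier in these slides: the background probability e = 5.0 × 10^-6 multiplied by the roughly 500,000 users in the data set. The short check below assumes that this product is how the figure arises; the variable names are illustrative.

```python
# Values quoted in the slides.
background_prob = 5.0e-6      # e: friendship probability independent of distance
n_users = 500_000             # LiveJournal users with US locations
geographic_friends = 5.5      # average number of distance-dependent friends

# Expected number of friends explained by the distance-independent term alone.
non_geographic = background_prob * n_users
total_friends = non_geographic + geographic_friends

print(round(non_geographic, 2), round(total_friends, 2))
```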


Navigability in networks of variable geographical density

• Kleinberg assumed a uniformly populated 2D lattice

• But population is far from uniform

• population networks and rank-based friendship

– probability of knowing a person depends not on absolute distance but on relative distance (i.e. how many people live closer): $\Pr[u \to v] \sim 1/\mathrm{rank}_u(v)$

Source: http://www.tomkinshome.com/andrew/papers/science-blogs/pnas.pdf
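Rank-based friendship can be sketched as follows. The assumptions are that people are points in the plane, that rank_u(v) is v's position when everyone else is sorted by distance from u (ties broken arbitrarily by sort order), and that the probabilities are normalized to sum to one. The function name `rank_based_probs` is hypothetical.

```python
import math


def rank_based_probs(pos, u):
    """Return Pr[u -> v] proportional to 1 / rank_u(v) for every other person v."""
    others = sorted((v for v in pos if v != u),
                    key=lambda v: math.dist(pos[u], pos[v]))
    weights = {v: 1.0 / (i + 1) for i, v in enumerate(others)}  # rank starts at 1
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


sparse = {"u": (0, 0), "a": (1, 0), "b": (2, 0), "c": (10, 0)}
dense = {name: (x / 100, y / 100) for name, (x, y) in sparse.items()}

print(rank_based_probs(sparse, "u"))
print(rank_based_probs(dense, "u"))   # same probabilities despite shorter distances
```

Shrinking all distances by a factor of 100 leaves the probabilities unchanged, which is the point of using rank instead of absolute distance when population density varies.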


what if we don’t have geography?


does community structure help?


Hierarchical small world models

Kleinberg, 'Small-World Phenomena and the Dynamics of Information', NIPS 14, 2001

Individuals classified into a hierarchy, $h_{ij}$ = height of the least common ancestor.

$p_{ij} \sim b^{-a h_{ij}}$

e.g. state-county-city-neighborhood, industry-corporation-division-group

[Figure: hierarchy tree of height h with branching factor b = 3]

Theorem: If a = 1 and outdegree is polylogarithmic, decentralized search can achieve s ~ O(log n)

Group structure models:

Individuals belong to nested groups

q = size of smallest group that v, w belong to

$f(q) \sim q^{-a}$

Theorem: If a = 1 and outdegree is polylogarithmic, decentralized search can achieve s ~ O(log n)
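The hierarchical link probability can be made concrete with a short sketch. It assumes the individuals are the leaves of a complete b-ary tree numbered left to right, so that the height of the least common ancestor can be found by repeated integer division; the function names are hypothetical.

```python
def lca_height(i, j, b):
    """Height of the least common ancestor of leaves i and j in a complete b-ary tree."""
    h = 0
    while i != j:
        i //= b
        j //= b
        h += 1
    return h


def link_probs(i, n_leaves, b, a=1.0):
    """Normalized link probabilities from leaf i, with p_ij ~ b^(-a * h_ij)."""
    weights = {j: b ** (-a * lca_height(i, j, b))
               for j in range(n_leaves) if j != i}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


probs = link_probs(0, 9, 3, a=1.0)
print(probs[1], probs[3])   # same-group leaf vs. leaf in a different group
```

With a = 1, each level of the hierarchy receives the same total weight: there are (b - 1) b^(h-1) leaves at height h, each weighted b^(-h), giving (b - 1)/b per level. Links are therefore spread evenly across scales, analogous to the r = 2 case on the lattice.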


Why search is fast in hierarchical topologies

[Figure: source S and target T, with nested regions R and R']

$\lambda^2 |R| < |R'| < \lambda |R|$

$k = c \log^2 n$: calculate the probability that s fails to have a link in R'

hierarchical models with multiple hierarchies

individuals belong to hierarchically nested groups

$p_{ij} \sim \exp(-a x)$

multiple independent hierarchies h = 1, 2, ..., H coexist, corresponding to occupation, geography, hobbies, religion…

Source: Identity and Search in Social Networks: Duncan J. Watts, Peter Sheridan Dodds, and M. E. J. Newman; Science 17 May 2002 296: 1302-1305. <http://arxiv.org/abs/cond-mat/0205383v1>


Identity and search in social networks

Watts, Dodds, Newman (2001)

Message chains fail at each node with probability p

Network is 'searchable' if a fraction r of messages reach the target:

$q = \langle (1 - p)^L \rangle \ge r$

[Figure: curves for N = 102400, N = 204800, N = 409600]

Source: Identity and Search in Social Networks: Duncan J. Watts, Peter Sheridan Dodds, and M. E. J. Newman;

Science 17 May 2002 296: 1302-1305. <http://arxiv.org/abs/cond-mat/0205383v1>
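The searchability condition bounds how long chains can be before too few of them survive. The sketch below treats every chain as having the same length L, which simplifies the average in the condition above, and uses p = 0.25 and r = 0.05 purely as illustrative values echoing the attrition figures quoted later in these slides. The function names are hypothetical.

```python
import math


def delivery_fraction(p, length):
    """Fraction of chains of a given length that survive per-hop failure probability p."""
    return (1 - p) ** length


def max_chain_length(p, r):
    """Largest length L for which (1 - p)^L >= r, as a real number."""
    return math.log(r) / math.log(1 - p)


limit = max_chain_length(0.25, 0.05)
print(round(limit, 1))                   # chains longer than this fall below r
print(delivery_fraction(0.25, 10) >= 0.05, delivery_fraction(0.25, 11) >= 0.05)
```

Under these assumptions, a network counts as searchable only if typical chains stay at or below roughly ten hops.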


Small World Model, Watts et al.

Fits Milgram's data well

Model parameters:

N = 10^8

z = 300

g = 100

b = 10

a = 1, H = 2

L_model = 6.7

L_data = 6.5

more slides on this: http://www.aladdin.cs.cmu.edu/workshops/wsa/papers/dodds-2004-04-10search.pdf

does it work in practice? back to HP Labs: Organizational hierarchy


Email correspondence superimposed on the organizational hierarchy

source: Adamic and Adar, How to search a social network, Social Networks, 27(3), p.187-203, 2005.


Example of search path

[Figure: example search path in the organizational hierarchy; edge labels: distance 1 (three times), distance 2]

hierarchical distance = 5

search path distance = 4

Probability of linking vs. distance in hierarchy

in the 'searchable' regime: 0 < a < 2 (Watts, Dodds, Newman 2001)

[Figure: probability of linking vs. hierarchical distance h; series: observed, fit exp(-0.92*h)]

Results

[Figure: histogram of number of pairs vs. number of steps in search]

distance   hierarchy   geography   geodesic   org   random
median     4           7           3          6     28
mean       5.7 (4.7)   12          3.1        6.1   57.4

[Figure: histogram of number of pairs vs. number of steps; series: hierarchy, geography]

source: Adamic and Adar, How to search a social network, Social Networks, 27(3), p.187-203, 2005.


Expt 2: Searching a social networking website

Source: ClubNexus - Orkut Buyukkokten, Tyler Ziemann

Profiles:

• status (UG or G)
• year
• major or department
• residence
• gender

Personality (choose 3 exactly):

• you: funny, kind, weird, …
• friendship: honesty/trust, common interests, commitment, …
• romance: (same as friendship)
• freetime: socializing, getting outside, reading, …
• support: unconditional accepters, comic-relief givers, eternal optimists

Interests (choose as many as apply)

• books: mystery & thriller, science fiction, romance, …
• movies: western, biography, horror, …
• music: folk, jazz, techno, …
• social activities: ballroom dancing, barbecuing, bar-hopping, …
• land sports: soccer, tennis, golf, …
• water sports: sailing, kayaking, swimming, …
• other sports: sky diving, weightlifting, billiards, …

Differences between data sets

HP Labs email network

• complete image of communication network

• affinity not reflected

Online community

• partial information of social network

• only friends listed

Degree Distribution for Nexus Net

[Figure: number of users with so many links vs. number of links, shown on linear and log-log axes]

2469 users, average degree 8.2

source: Adamic and Adar, How to search a social network, Social Networks, 27(3), p.187-203, 2005.

Problem: how to construct hierarchies?

Probability of linking by separation in years

[Figure: prob. two undergrads are friends vs. separation in years (0 to 3); series: data, (x+1)^-1.1 fit]

[Figure: prob. two grads are friends vs. separation in years (0 to 5); series: data, (x+1)^-1.7 fit]

source: Adamic and Adar, How to search a social network, Social Networks, 27(3), p.187-203, 2005.

Hierarchies not useful for other attributes:

Geography

[Figure: probability of being friends vs. distance between residences (0 to 600)]

Other attributes: major, sports, freetime activities, movie preferences…

source: Adamic and Adar, How to search a social network, Social Networks, 27(3), p.187-203, 2005.


Strategy using user profiles

prob. two undergrads are friends (consider simultaneously)

• both undergraduate, both graduate, or one of each

• same or different year

• both male, both female, or one of each

• same or different residences

• same or different major/department

Results

strategy      median   mean
random        133      390
high degree   39       137
profile       21       53

With an attrition rate of 25%, 5% of the messages get through at an average of 4.8 steps, hence the network is barely searchable.

Summary

• Individuals associate on different levels into groups.

• Group structure facilitates decentralized search using social ties.

• Hierarchy search faster than geographical search

• A fraction of 'important' individuals are easily findable

• Humans may be more resourceful in executing search tasks:
  – making use of weak ties
  – using more sophisticated strategies

Link Recommendation on Social Networks

• Basics of recommender systems

• Friends on Facebook

• Connections on LinkedIn

• WTF ("who to follow") on Twitter (to be continued)


Recommender Systems

• Systems which take user preferences about items as input and output recommendations

• Early examples

• Bellcore Music Recommender (1995)

• MIT Media Lab: Firefly (1996)

Best example: Amazon.com

Worst example: Amazon.com

Also:

Netflix

eBay

Google Reader

iTunes Genius

digg.com

Hulu.com


Recommender Systems

• Basic idea

– recommend item i to user u for the purpose of
  • Exposing them to something they would not have otherwise seen
  • Leading customers to the Long Tail
  • Increasing customers’ satisfaction

• Data for recommender systems (need to know who likes what)
  – Purchases/rentals
  – Ratings
  – Web page views
  – Which do you think is best?

Recommender Systems

• Two types of data:
  – Explicit data: user provides information about their preferences
    • Pro: high quality ratings
    • Con: Hard to get: people cannot be bothered
  – Implicit data: infer whether or not user likes product based on behavior
    • Pro: Much more data available, less invasive
    • Con: Inference often wrong (does purchase imply preference?)

• In either case, data is just a big matrix
  – Users x items
  – Entries binary or real-valued

• Biggest Problem:
  – Sparsity: most users have not rated most products.

[Figure: sparse users x items ratings matrix]

Recommender Systems: Models

Two camps on how to make recommendations:

– Collaborative Filtering (CF)
  • Use collective intelligence from all available rating information to make predictions for individuals
  • Depends on the fact that user tastes are correlated and commutative:
  • If Alice and Bob both like X and Alice likes Y then Bob is more likely to like Y

– Content based
  • Extracts “features” from items for a big regression or rule-based model
  • See www.nanocrowd.com

– 15 years of research in the field

– Conventional wisdom:
  • CF performs better when there is sufficient data
  • Content-based is useful when there is little data
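The Alice and Bob intuition behind collaborative filtering can be sketched with binary "likes". This is a minimal user-based variant chosen for illustration, not a method named in the slides: users are compared by Jaccard similarity of their liked items, and each unseen item is scored by the summed similarity of the users who like it. The names `jaccard` and `recommend` and the toy data are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two sets of liked items, between 0 and 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(likes, user):
    """Score items the user has not liked by the similarity of users who like them."""
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        sim = jaccard(likes[user], items)
        for item in items - likes[user]:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: -kv[1])


likes = {
    "alice": {"X", "Y"},
    "bob": {"X"},
    "carol": {"Z"},
}
print(recommend(likes, "bob"))   # Y ranks above Z because Alice shares X with Bob
```

With very few likes per user the similarities are mostly zero, which illustrates why sparsity is a problem and why content-based features help when data is scarce.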


Detour: the Netflix Prize


Netflix

• A US-based DVD rental-by-mail company

• >10M customers, 100K titles, ships 1.9M DVDs per day

Good recommendations = happy customers

Netflix Prize

• October, 2006: offers $1,000,000 for an improved recommender algorithm

• Training data
  – 100 million ratings
  – 480,000 users
  – 17,770 movies
  – 6 years of data: 2000-2005

• Test data
  – Last few ratings of each user (2.8 million)
  – Evaluation via RMSE: root mean squared error
  – Netflix Cinematch RMSE: 0.9514

• Competition
  – $1 million grand prize for 10% improvement
  – If 10% not met, $50,000 annual “Progress Prize” for best improvement

Training data (sample)

user   movie   score   date
1      21      1       2002-01-03
1      213     5       2002-04-04
2      345     4       2002-05-05
2      123     4       2002-05-05
2      768     3       2003-05-03
3      76      5       2003-10-10
4      45      4       2004-10-11
5      568     1       2004-10-11
5      342     2       2004-10-11
5      234     2       2004-12-12
6      76      5       2005-01-02
6      56      4       2005-01-31

Test data (sample)

user   movie   score   date
1      212     ?       2003-01-03
1      1123    ?       2002-05-04
2      25      ?       2002-07-05
2      8773    ?       2002-09-05
2      98      ?       2004-05-03
3      16      ?       2003-10-10
4      2450    ?       2004-10-11
5      2032    ?       2004-10-11
5      9098    ?       2004-10-11
5      11012   ?       2004-12-12
6      664     ?       2005-01-02
6      1526    ?       2005-01-31
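For reference, RMSE and the grand prize threshold can be computed as below. The `rmse` helper and the two-rating example are illustrative; only the Cinematch score of 0.9514 and the 10% target come from the slide.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


cinematch_rmse = 0.9514
# A 10% improvement means an RMSE at or below 90% of the Cinematch score.
grand_prize_target = 0.9 * cinematch_rmse

print(round(grand_prize_target, 4))
print(rmse([3, 4], [4, 4]))   # one rating off by 1, one exact
```

Because errors are squared before averaging, a few badly wrong predictions cost more than many slightly wrong ones.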


Netflix Prize

• Competition design

• Hold-out set created by taking last 9 ratings for each user

– Non-random, biased set

• Hold-out set split randomly three ways:

– Probe Set – appended to training data to allow unbiased estimation of RMSE

– Submit ratings for the (Quiz+Test) Sets – Netflix returns RMSE on the Quiz Set only

– Quiz Set results posted on public leaderboard, but Test Set used to determine the winner!

» Prevents overfitting
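The competition design above can be sketched as follows. This is a minimal illustration, not Netflix's actual procedure: the function name, the row layout `(user, movie, score, date)`, and the equal-thirds split are all assumptions made for the example.

```python
import random
from collections import defaultdict

def build_holdout(ratings, n_last=9, seed=0):
    """Split ratings into (training, probe, quiz, test).

    ratings: iterable of (user, movie, score, date) with ISO date strings.
    The last n_last ratings of each user form the hold-out set, which is
    then shuffled and cut into three parts (equal thirds is an assumption).
    """
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    training, holdout = [], []
    for rows in by_user.values():
        rows.sort(key=lambda r: r[3])      # ISO dates sort chronologically
        training.extend(rows[:-n_last])    # everything before the last n_last
        holdout.extend(rows[-n_last:])     # non-random, biased toward recent ratings

    rng = random.Random(seed)
    rng.shuffle(holdout)                   # random three-way split
    third = len(holdout) // 3
    probe = holdout[:third]
    quiz = holdout[third:2 * third]
    test = holdout[2 * third:]
    return training, probe, quiz, test
```

Contestants would see scores for the training and probe rows only; quiz and test scores stay hidden, with feedback given on the quiz part alone.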

70

Page 71

Data Characteristics

[Figure: Mean Score vs. Date of Rating. x-axis: Date (2000 to 2006); y-axis: Mean Score (3.2 to 3.8)]

[Figure: distribution of ratings. x-axis: Rating (1 to 5); y-axis: Percentage (0 to 40); series: Training (m = 3.60) and Probe (m = 3.67)]

71

Page 72

Ratings per movie/user

Mean Rating  # Ratings  User ID
1.90         17,651     305344
1.81         17,432     387418
1.22         16,560     2439493
4.26         15,811     1664010
4.08         14,829     2118461
1.37         9,820      1461435

Avg #ratings/user: 208

Avg #ratings/movie: 5627

72

Page 73

Data Characteristics

• Most Loved Movies

Count   Avg rating  Title
137812  4.593       The Shawshank Redemption
133597  4.545       Lord of the Rings: The Return of the King
180883  4.306       The Green Mile
150676  4.460       Lord of the Rings: The Two Towers
139050  4.415       Finding Nemo
117456  4.504       Raiders of the Lost Ark

Most Rated Movies

Miss Congeniality

Independence Day

The Patriot

The Day After Tomorrow

Pretty Woman

Pirates of the Caribbean

Highest Variance

The Royal Tenenbaums

Lost In Translation

Pearl Harbor

Miss Congeniality

Napoleon Dynamite

Fahrenheit 9/11

73

Page 74

8pm6am10/18pm

• ARRRRGH! We have one more chance….

74

Page 75

75

Page 76

76

Test Set Results

• BellKor’s Pragmatic Chaos: 0.8567

• The Ensemble: 0.8567

• Tie breaker was submission date/time

• They won by 20 minutes!

But really:

• BellKor’s Pragmatic Chaos: 0.856704

• The Ensemble: 0.856714

• Also, a combination of the BPC (10.06%) and Ensemble (10.06%) submissions results in a 10.19% improvement!
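Why combining two equally good submissions can beat both is easy to see on a toy case: if the two predictors make partly opposite errors, averaging cancels some of them. The sketch below uses synthetic numbers (not Netflix data) and a plain weighted average; the actual blending used by the teams is not described on the slide.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def blend(pred_a, pred_b, weight_a=0.5):
    """Weighted average of two prediction lists (weight_a is a free parameter)."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(pred_a, pred_b)]

# Synthetic toy data, chosen so the two models err in different directions.
actual  = [4.0, 3.0, 5.0, 2.0]
model_a = [4.5, 2.5, 5.5, 1.5]
model_b = [3.5, 3.5, 4.5, 1.5]

blended = blend(model_a, model_b)
# Each model alone has RMSE 0.5; the blend has RMSE 0.25 on this toy data.
```

The gain depends entirely on how uncorrelated the two models' errors are; two identical predictors would gain nothing from blending.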

Page 77

BellKor Approach

• The prize winning solutions were an ensemble of many separate solution sets

• Progress Prize 2007: 103 sets

• Progress Prize 2008 (w/Big Chaos): 205 sets

• Grand Prize 2009 (w/ BC and Pragmatic Theory): > 800 sets!!

– Used two main classes of models

• Nearest Neighbors

• Latent Factor Models (via Singular Value Decomposition)

• Also regularized regression, not a big factor

• Teammates used neural nets and other methods

• Approaches mainly algorithmic, not statistical in nature

77

Page 78

Data representation (excluding dates)

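[Figure: sparse rating matrix with movies 1 to 6 as rows and users 1 to 12 as columns. Blank cells are unknown ratings; filled cells hold a rating between 1 and 5. The column positions of the blank cells are not recoverable here; the known ratings of each movie, in increasing user order, are:]

movie 1: 1 3 5 5 4
movie 2: 5 4 4 2 1 3
movie 3: 2 4 1 2 3 4 3 5
movie 4: 2 4 5 4 2
movie 5: 4 3 4 2 2 5
movie 6: 1 3 3 2 4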

78

Page 79

Nearest Neighbors

79

Page 80

Nearest Neighbors

[Figure: movies × users rating matrix (6 movies, 12 users). Blank cells are unknown ratings; filled cells hold a rating between 1 and 5]

80

Page 81

Nearest Neighbors

[Figure: movies × users rating matrix (6 movies, 12 users), with the cell for movie 1 and user 5 marked "?"]

- estimate rating of movie 1 by user 5

81

Page 82

Nearest Neighbors

[Figure: movies × users rating matrix (6 movies, 12 users), with the cell for movie 1 and user 5 marked "?"]

Neighbor selection: identify movies similar to movie 1 that were rated by user 5

82

Page 83

Nearest Neighbors

[Figure: movies × users rating matrix (6 movies, 12 users), with the cell for movie 1 and user 5 marked "?"]

Compute similarity weights: s_13 = 0.2, s_16 = 0.3

83

Page 84

Nearest Neighbors

[Figure: movies × users rating matrix (6 movies, 12 users), with the cell for movie 1 and user 5 filled in as 2.6]

Predict by taking weighted average: (0.2*2 + 0.3*3)/(0.2 + 0.3) = 2.6

84

Page 85

Nearest Neighbors

– To predict the rating for user u on item i:

• Use similar users’ ratings for similar movies:

\hat{r}_{ui} = \frac{\sum_{j \in N(i,u)} s_{ij} \, r_{uj}}{\sum_{j \in N(i,u)} s_{ij}}

r_ui = rating for user u and item i

b_ui = baseline rating for user u and item i

s_ij = similarity between items i and j

N(i,u) = neighborhood of item i for user u (might be fixed at k)
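The weighted-average prediction can be sketched directly from the definitions above. The function and argument names are hypothetical; the example call reuses the numbers from the earlier slides (user 5 rated movie 3 with 2 and movie 6 with 3, with similarities 0.2 and 0.3 to movie 1).

```python
def predict_item_knn(user_ratings, similarities, k=None):
    """Estimate a user's rating of a target item from similar items.

    user_ratings: {item j: r_uj} for items the user has already rated.
    similarities: {item j: s_ij} similarity of item j to the target item i.
    k: optional cap on the neighborhood size N(i,u).
    """
    # Neighborhood: items the user rated for which a similarity is known.
    neighbors = [j for j in user_ratings if j in similarities]
    neighbors.sort(key=lambda j: similarities[j], reverse=True)
    if k is not None:
        neighbors = neighbors[:k]

    denominator = sum(similarities[j] for j in neighbors)
    if denominator == 0:
        raise ValueError("no usable neighbors for this prediction")
    numerator = sum(similarities[j] * user_ratings[j] for j in neighbors)
    return numerator / denominator

# Slide example: (0.2*2 + 0.3*3) / (0.2 + 0.3), which is about 2.6.
estimate = predict_item_knn({3: 2, 6: 3}, {3: 0.2, 6: 0.3})
```

With `k=1` only the most similar rated movie (movie 6) would be used, and the estimate would simply be that movie's rating.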

85

Page 86

Nearest Neighbors

• Useful to “center” the data, and model residuals

• What is s_ij?

– Cosine distance

– Correlation

• What is N(i,u)?

– Top-k

– Threshold

• What is b_ui?

• How to deal with missing values?

• Choose several different options and throw them in!
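Two of the similarity options listed above can be sketched as follows. One common way to handle missing values, assumed here, is to compute each measure only over the users who rated both items; the function names and the dictionary layout are illustrative choices, not taken from the slides.

```python
import math

def cosine_similarity(ratings_i, ratings_j):
    """Cosine similarity of two items over users who rated both.

    ratings_i, ratings_j: {user: rating} for each item.
    """
    common = ratings_i.keys() & ratings_j.keys()
    if not common:
        return 0.0
    dot = sum(ratings_i[u] * ratings_j[u] for u in common)
    norm_i = math.sqrt(sum(ratings_i[u] ** 2 for u in common))
    norm_j = math.sqrt(sum(ratings_j[u] ** 2 for u in common))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)

def pearson_correlation(ratings_i, ratings_j):
    """Pearson correlation of two items over users who rated both."""
    common = ratings_i.keys() & ratings_j.keys()
    n = len(common)
    if n < 2:
        return 0.0
    mean_i = sum(ratings_i[u] for u in common) / n
    mean_j = sum(ratings_j[u] for u in common) / n
    cov = sum((ratings_i[u] - mean_i) * (ratings_j[u] - mean_j) for u in common)
    var_i = sum((ratings_i[u] - mean_i) ** 2 for u in common)
    var_j = sum((ratings_j[u] - mean_j) ** 2 for u in common)
    if var_i == 0 or var_j == 0:
        return 0.0
    return cov / math.sqrt(var_i * var_j)
```

Correlation centers the ratings before comparing them, which is one reason it behaves differently from cosine similarity on raw 1-to-5 ratings, where all values are positive.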

86

Page 87

Nearest Neighbors, cont

• This is called “item-item” NN

– Can also do user-user

– Which do you think is better?

• Advantages of NN

– Few modeling assumptions

– Easy to explain to users

– Most popular RS tool

87

Page 88

Nearest Neighbors, Modified

• Problem with traditional k-NN:

• Similarity weights are calculated globally, and

• do not account for correlation among the neighbors

– We estimate the weights (w_ij) simultaneously via a least squares optimization: basically, a regression using the ratings in the neighborhood.

– Shrinkage helps address correlation

– (don’t try this at home)

88

Page 89

Latent factor models – Singular Value Decomposition

SVD finds concepts

[Figure: movies placed in a two-factor space. One axis runs from "geared towards females" to "geared towards males", the other from "serious" to "escapist". Movies shown: The Princess Diaries, The Lion King, Braveheart, Lethal Weapon, Independence Day, Amadeus, The Color Purple, Dumb and Dumber, Ocean’s 11, Sense and Sensibility]

89

Page 90

Matrix Decomposition - SVD

90

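[Figure: the sparse items × users rating matrix (blank cells unknown) is approximated (~) by the product of an items × factors matrix and a factors × users matrix. Example with 3 factors (concepts).]

Item factors (6 items × 3 factors):

 .1   -.4   .2
-.5    .6   .5
-.2    .3   .5
1.1   2.1   .3
-.7   2.1  -2
-1     .7   .3

User factors (3 factors × 12 users):

1.1  -.2   .3   .5   -2   -.5   .8   -.4   .3   1.4  2.4  -.9
-.8   .7   .5   1.4   .3  -1   1.4   2.9  -.7   1.2  -.1  1.3
2.1  -.4   .6   1.7  2.4   .9  -.3    .4   .8    .7  -.6   .1

Each user and each item is described by a feature vector across concepts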

Page 91

Factorization-based modeling

[Figure: the items × users rating matrix approximated (~) by the product of the item-factor and user-factor matrices from the 3-factor example]

• This is a strange way to use SVD!

– Usually for reducing dimensionality, here for filling in missing data!

– Special techniques to do SVD w/ missing data

• Alternating Least Squares = variant of EM algorithms

• Probably most popular model among contestants

– 12/11/2006: Simon Funk describes an SVD based method
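One way to fit a factor model when most cells are missing is to run stochastic gradient descent over the observed ratings only. The sketch below is a generic illustration of that idea under assumed hyperparameters and synthetic toy data; it is neither Alternating Least Squares nor a reproduction of the method Simon Funk described.

```python
import math
import random

def factorize(ratings, n_users, n_items, n_factors=3, epochs=1000,
              learning_rate=0.02, reg=0.02, seed=0):
    """Fit user factors p and item factors q on observed ratings only.

    ratings: list of (user, item, rating) with 0-based indices.
    The predicted rating is the dot product of p[user] and q[item].
    """
    rng = random.Random(seed)
    p = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pf * qf for pf, qf in zip(p[u], q[i]))
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                # Gradient step on squared error with L2 regularization.
                p[u][f] += learning_rate * (err * qi - reg * pu)
                q[i][f] += learning_rate * (err * pu - reg * qi)
    return p, q

def predict(p, q, user, item):
    """Predicted rating for any (user, item) cell, observed or missing."""
    return sum(pf * qf for pf, qf in zip(p[user], q[item]))

def train_rmse(p, q, ratings):
    """RMSE of the factor model on the observed ratings."""
    total = sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))

# Synthetic toy data (4 users, 3 items); cells not listed are missing.
TOY_RATINGS = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2),
               (2, 2, 5), (3, 0, 1), (3, 1, 4), (3, 2, 4)]
```

After fitting, `predict(p, q, 0, 2)` gives an estimate for a cell that was never observed, which is exactly the "filling in missing data" use described above.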

91

Page 92

Latent Factor Models, Modified

• Problem with traditional SVD:

– User and item factors are determined globally

– Each user described as a fixed linear combination across factors

– What if there are different people in the household?

• Let the linear combination change as a function of the item rated.

• Substitute pu with pu(i), and add similarity weights

• Again, don’t try this at home!

92

Page 93

First 2 Singular Vectors

93

Page 94

Incorporating Implicit Data

• Implicit Data: what you choose to rate is an important piece of information, separate from how you rate it.

• Helps incorporate negative information, especially for those users with low variance.

• Can be fit in NN or SVD

94

Page 95

WTF: The Who to Follow Service at Twitter

• Twitter's user recommendation service, responsible for creating millions of connections daily between users based on shared interests, common connections, and other related factors.

• Reference: http://www.stanford.edu/~rezab/papers/wtf_overview.pdf

95

Page 96

Facebook EdgeRank

• http://techcrunch.com/2010/04/22/facebook-edgerank/

http://econsultancy.com/us/blog/7885-the-ultimate-guide-to-the-facebook-edgerank-algorithm

http://cs229.stanford.edu/proj2007/DaniyalzadeLipus-FacebookFriendSuggestion.pdf

• To be continued…. http://cameronmarlow.com/papers

96