School of Information University of Michigan SI 614 Search in structured networks Lecture 15


Page 1: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

School of Information, University of Michigan

SI 614: Search in structured networks

Lecture 15

Page 2: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Search in structured networks

Page 3: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Small world experiments review

Milgram (1960s); Dodds, Muhamad, Watts (2003)

[Figure: map with labels NE and MA]

Given a target individual and a particular property, pass the message to a person you correspond with who is “closest” to the target.

Short chain lengths: six degrees of separation

Typical strategy: if far from the target, choose someone geographically closer; if geographically close to the target, choose someone professionally closer.

Page 4: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Is this the whole picture?

Why are small worlds navigable?

Page 5: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

But how are people able to find short paths?

How to choose among hundreds of acquaintances?

Strategy: simple greedy algorithm. Each participant chooses the correspondent who is closest to the target with respect to the given property.

Models

geography: Kleinberg (2000)

hierarchical groups: Watts, Dodds, Newman (2001); Kleinberg (2001)

high degree nodes: Adamic, Puniyani, Lukose, Huberman (2001); Newman (2003)

Page 6: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Reverse small world experiment

Killworth & Bernard (1978): Given hypothetical targets (name, occupation, location, hobbies, religion…), participants choose an acquaintance for each target

Acquaintance chosen based on (most often) occupation, geography

Only 7% chosen because they “know a lot of people”

Simple greedy algorithm: most similar acquaintance

Two-step strategy rare

Page 7: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Spatial search

“The geographic movement of the [message] from Nebraska to Massachusetts is striking. There is a progressive closing in on the target area as each new person is added to the chain”

S. Milgram, ‘The small world problem’, Psychology Today 1, 61 (1967)

Nodes are placed on a lattice and connect to their nearest neighbors.

Additional links are placed with probability $p_{uv} \sim d_{uv}^{-r}$, where $d_{uv}$ is the lattice distance between u and v.

Kleinberg, ‘The Small World Phenomenon: An Algorithmic Perspective’, Proc. 32nd ACM Symposium on Theory of Computing, 2000. (Nature 2000)
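The lattice model and the greedy forwarding rule can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: a small n x n grid without wrap-around, Manhattan distance as the lattice distance, and exactly one long-range contact per node. The function names (`build_kleinberg_grid`, `greedy_route`) are hypothetical and not taken from the cited paper.

```python
import random


def lattice_distance(u, v):
    """Manhattan distance between two grid cells (assumed lattice metric)."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def build_kleinberg_grid(n, r, rng):
    """Build contacts on an n x n grid.

    Each node keeps its nearest lattice neighbours plus one long-range
    contact v, chosen with probability proportional to d(u, v) ** -r.
    """
    nodes = [(x, y) for x in range(n) for y in range(n)]
    contacts = {}
    for u in nodes:
        x, y = u
        local = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < n and 0 <= y + dy < n
        ]
        others = [v for v in nodes if v != u]
        weights = [lattice_distance(u, v) ** -r for v in others]
        long_range = rng.choices(others, weights=weights, k=1)[0]
        contacts[u] = local + [long_range]
    return contacts


def greedy_route(contacts, source, target):
    """Forward to whichever contact is closest to the target until it is reached.

    A lattice neighbour is always one step closer, so the loop terminates.
    """
    path = [source]
    current = source
    while current != target:
        current = min(contacts[current], key=lambda v: lattice_distance(v, target))
        path.append(current)
    return path


if __name__ == "__main__":
    grid = build_kleinberg_grid(10, 2, random.Random(1))
    route = greedy_route(grid, (0, 0), (9, 9))
    print("steps taken:", len(route) - 1)
```

Varying `r` in this sketch (for example 0, 2, and 4) and averaging the number of steps over many source-target pairs is one way to explore how the exponent affects greedy delivery time, although the asymptotic differences only become pronounced on much larger grids.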

Page 8: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

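No locality

When r = 0, links are randomly distributed: $p \sim p_0$, independent of distance.

Average shortest path (ASP) ~ log(n), where n is the size of the grid.

When r = 0, the expected delivery time of any decentralized algorithm is at least $a_0 n^{2/3}$.

When r < 2, the expected time is at least $a_r n^{(2-r)/3}$.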

Page 9: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Overly localized links on a lattice

When r > 2, expected search time ~ $N^{(r-2)/(r-1)}$

$p \sim 1/d^4$

Page 10: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Links balanced between long and short range

When r = 2, the expected time of a decentralized algorithm (DA) is at most $C (\log N)^2$

$p \sim 1/d^2$
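A short calculation suggests why the exponent 2 balances long and short range on a two-dimensional lattice. It is a sketch under the assumption of Manhattan distance with boundary effects ignored, so that exactly 4d nodes lie at distance d from a given node.

```latex
% Assumption: 2-D lattice, Manhattan distance, no boundary effects,
% so 4d nodes lie at distance d; link probability per node is proportional to d^{-2}.
\Pr\bigl[\text{long-range link lands at distance in } [2^{j}, 2^{j+1})\bigr]
  \;\propto\; \sum_{d=2^{j}}^{2^{j+1}-1} 4d \cdot d^{-2}
  \;=\; 4 \sum_{d=2^{j}}^{2^{j+1}-1} \frac{1}{d}
  \;\approx\; 4 \ln 2
```

The result does not depend on j, so each distance scale (each doubling of distance) receives roughly the same share of long-range links. With about $\log_2 N$ scales, a link at the scale a greedy step needs is available with probability of order $1/\log N$.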

Page 11: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Kleinberg, ‘Small-World Phenomena and the Dynamics of Information’, NIPS 14, 2001

Hierarchical network models:

Individuals are classified into a hierarchy; $h_{ij}$ = height of the least common ancestor of i and j.

$p_{ij} \sim b^{-\alpha h_{ij}}$

[Figure: example hierarchy with branching ratio b = 3 and height h]

e.g. state-county-city-neighborhood, industry-corporation-division-group

Theorem: If $\alpha = 1$ and the out-degree is polylogarithmic, decentralized search time s ~ O(log n)

Group structure models:

Individuals belong to nested groups; q = size of the smallest group that v and w both belong to

$f(q) \sim q^{-\alpha}$

Theorem: If $\alpha = 1$ and the out-degree is polylogarithmic, decentralized search time s ~ O(log n)
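The hierarchical linking rule can be made concrete with a small Python sketch. It assumes the individuals are the leaves of a complete b-ary tree, numbered 0, 1, 2, … from left to right, and that the probability of linking to j is proportional to b raised to the power of minus alpha times the height of the least common ancestor. The function names are hypothetical.

```python
def lca_height(i, j, b):
    """Height of the least common ancestor of leaves i and j.

    Assumes a complete b-ary tree whose leaves are numbered left to right,
    so the parent of position k at any level is k // b.
    """
    h = 0
    while i != j:
        i //= b
        j //= b
        h += 1
    return h


def link_probabilities(i, n_leaves, b, alpha):
    """Normalised probability that leaf i links to each other leaf j,
    with weight b ** (-alpha * h_ij)."""
    weights = {
        j: b ** (-alpha * lca_height(i, j, b))
        for j in range(n_leaves)
        if j != i
    }
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


if __name__ == "__main__":
    probs = link_probabilities(0, 9, b=3, alpha=1)
    for j, p in sorted(probs.items()):
        print(j, lca_height(0, j, 3), round(p, 4))
```

With b = 3 and nine leaves, leaf 0 has two siblings at height 1 and six leaves at height 2, so the sketch shows how the total link weight is split between the two levels for a chosen alpha.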

Page 12: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Sketch of proof

[Figure: source S and target T; nested groups R and R’ with |R| < |R’| < |R|]

k = c log² n; calculate the probability that s fails to have a link in R’

Page 13: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Identity and search in social networks
Watts, Dodds, Newman (Science, 2001)

individuals belong to hierarchically nested groups

multiple independent hierarchies h = 1, 2, …, H coexist, corresponding to occupation, geography, hobbies, religion…

$p_{ij} \sim \exp(-\alpha x)$

Page 14: School of Information University of Michigan SI 614 Search in structured networks Lecture 15
Page 15: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Identity and search in social networks
Watts, Dodds, Newman (2001)

Message chains fail at each node with probability p

Network is ‘searchable’ if a fraction r of messages reach the target:

$q = \langle (1-p)^{L} \rangle \geq r$

[Figure: curves for N = 102400, N = 204800, and N = 409600]
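The searchability condition can be turned into a bound on chain length. The sketch below makes a simplifying assumption that is not on the slide: every chain has the same length L, so the average over chains is dropped and the condition becomes (1 - p)^L >= r, which gives L <= ln r / ln(1 - p). The values p = 0.25 and r = 0.05 are used only as illustrative inputs.

```python
import math


def delivery_fraction(p, length):
    """Fraction of chains of a given length that survive per-node failure probability p."""
    return (1 - p) ** length


def max_chain_length(p, r):
    """Largest chain length L for which (1 - p) ** L >= r.

    Simplification: all chains are assumed to have the same length.
    """
    return math.log(r) / math.log(1 - p)


if __name__ == "__main__":
    p, r = 0.25, 0.05  # illustrative values
    bound = max_chain_length(p, r)
    print("maximum chain length:", round(bound, 2))
    print("fraction delivered at that length:", round(delivery_fraction(p, bound), 4))
```

Under these assumptions, chains much longer than about ten steps would deliver fewer than 5% of messages when a quarter of them are dropped at every step.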

Page 16: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Small World Model, Watts et al.

Fits Milgram’s data well

Model parameters: $N = 10^8$, z = 300, g = 100, b = 10, $\alpha = 1$, H = 2

$L_{model}$ = 6.7, $L_{data}$ = 6.5

more slides on this:
http://www.aladdin.cs.cmu.edu/workshops/wsa/papers/dodds-2004-04-10search.pdf

Page 17: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

High degree search

[Figure: acquaintances Mary, Bob, and Jane; “Who could introduce me to Richard Gere?”]

Adamic et al., Phys. Rev. E 64, 46135 (2001)
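A high-degree search strategy can be sketched as follows. This is an illustrative version under stated assumptions, not the cited paper's exact procedure: each node knows its neighbours and their degrees, delivers directly if the target is a neighbour, and otherwise forwards to the best-connected neighbour that has not yet held the message. The graph and the name `high_degree_search` are hypothetical.

```python
def high_degree_search(adj, source, target):
    """Pass the message to the highest-degree unvisited neighbour.

    adj maps each node to the set of its neighbours. Returns the path taken,
    or None if the message reaches a dead end.
    """
    path = [source]
    visited = {source}
    current = source
    while current != target:
        if target in adj[current]:
            path.append(target)
            return path
        unvisited = [v for v in adj[current] if v not in visited]
        if not unvisited:
            return None  # dead end: every neighbour has already held the message
        current = max(unvisited, key=lambda v: len(adj[v]))
        visited.add(current)
        path.append(current)
    return path


if __name__ == "__main__":
    # Hypothetical acquaintance graph for illustration only.
    adj = {
        "me": {"Mary", "Bob", "Jane"},
        "Mary": {"me"},
        "Bob": {"me", "Jane"},
        "Jane": {"me", "Bob", "agent", "x"},
        "agent": {"Jane", "target", "y"},
        "x": {"Jane"},
        "y": {"agent"},
        "target": {"agent"},
    }
    print(high_degree_search(adj, "me", "target"))
```

The design choice here is to stop at a dead end rather than backtrack, which keeps the sketch short; a fuller version might revisit earlier nodes and try their next-best neighbours.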

Page 18: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Small world experiments so far

Classic small world experiment:

Given a target individual, forward to one of your acquaintances

Observe chains but not the rest of the social network

Reverse small world experiment (Killworth & Bernard)

Given a hypothetical individual, which of your acquaintances would you choose?

Observe individual’s social network and possible choices, but not resulting chains or complete social network

Page 19: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Testing search models on social networks

Advantage: have access to the entire communication network and to individuals’ attributes

Use a well defined network: HP Labs email correspondence over 3.5 months

Edges are between individuals who sent at least 6 email messages each way

450 users; median degree = 10, mean degree = 13; average shortest path = 3

Node properties specified: degree, geographical location, position in organizational hierarchy

Can greedy strategies work?
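The edge rule (at least 6 messages sent each way) can be sketched as a filter over an email log. The input format, a list of (sender, recipient) pairs, and the function name `reciprocal_edges` are assumptions made for illustration.

```python
from collections import Counter


def reciprocal_edges(messages, threshold=6):
    """Return undirected edges between pairs who each sent the other
    at least `threshold` messages.

    messages: iterable of (sender, recipient) pairs, one per email.
    """
    counts = Counter(messages)  # (sender, recipient) -> number of messages
    edges = set()
    for (a, b), n_ab in counts.items():
        if a != b and n_ab >= threshold and counts.get((b, a), 0) >= threshold:
            edges.add(frozenset((a, b)))
    return edges


if __name__ == "__main__":
    log = (
        [("a", "b")] * 6
        + [("b", "a")] * 7
        + [("a", "c")] * 10
        + [("c", "a")] * 5
    )
    print(reciprocal_edges(log))
```

Requiring the threshold in both directions drops one-way traffic such as announcements, which is one plausible reason for using a reciprocal rule.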

Page 20: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Strategy 1: High degree search

Power-law degree distribution of all senders of email passing through HP labs

[Figure: log-log plot of the proportion of senders (frequency) vs. outdegree (number of recipients the sender has sent email to); outdegree distribution with a power-law fit of exponent 2.0]

Page 21: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Filtered network (at least 6 messages sent each way)

[Figure: degree distribution p(k) vs. number of email correspondents k, shown on linear and on semi-logarithmic axes]

Degree distribution no longer power-law, but Poisson

It would take 40 steps on average (median of 16) to reach a target!

Page 22: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Strategy 2: Geography

Page 23: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Communication across corporate geography

[Figure: building map with areas labeled 1L, 1U, 2L, 2U, 3L, 3U, 4U]

87% of the 4000 links are between individuals on the same floor

Page 24: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Cubicle distance vs. probability of being linked

[Figure: log-log plot of the proportion of linked pairs vs. distance in feet; curves: measured, 1/r, and 1/r² (optimum for search)]

Page 25: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Strategy 3: Organizational hierarchy

Page 26: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Email correspondence superimposed on the organizational hierarchy

Page 27: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Example of search path

[Figure: search path drawn on the organizational hierarchy; three steps labeled distance 1 and one step labeled distance 2]

hierarchical distance = 5
search path distance = 4
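One simple way to compute a distance between two people in an organizational chart is the number of reporting links on the tree path between them. The sketch below uses that definition as an assumption; the study behind these slides may define hierarchical distance differently (for instance, by treating people who share a manager as distance 1). The chart and function names are hypothetical.

```python
def ancestors(parent, node):
    """Chain from a node up to the root: [node, manager, manager's manager, ...]."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def hierarchical_distance(parent, a, b):
    """Number of reporting links on the tree path between a and b.

    parent maps each person to their manager; the root has no entry.
    Assumed definition, used here only for illustration.
    """
    depth_from_b = {node: d for d, node in enumerate(ancestors(parent, b))}
    for depth_from_a, node in enumerate(ancestors(parent, a)):
        if node in depth_from_b:
            return depth_from_a + depth_from_b[node]
    raise ValueError("nodes are not in the same hierarchy")


if __name__ == "__main__":
    # Hypothetical reporting structure.
    parent = {"vp1": "ceo", "vp2": "ceo", "m1": "vp1", "m2": "vp2", "e1": "m1"}
    print(hierarchical_distance(parent, "e1", "m2"))
```

A greedy hierarchy search would then forward the message to whichever email contact has the smallest such distance to the target.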

Page 28: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Probability of linking vs. distance in hierarchy

in the ‘searchable’ regime: $0 < \alpha < 2$ (Watts, Dodds, Newman 2001)

[Figure: probability of linking vs. hierarchical distance h; observed values with fit exp(-0.92*h)]

Page 29: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Results

[Figure: histograms of number of pairs vs. number of steps in search, for the hierarchy and geography strategies]

distance   hierarchy   geography   geodesic   org   random
median     4           7           3          6     28
mean       5.7 (4.7)   12          3.1        6.1   57.4

Page 30: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Expt 2

Searching a social networking website

Page 31: School of Information University of Michigan SI 614 Search in structured networks Lecture 15
Page 32: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Profiles:

status (UG or G), year, major or department, residence, gender

Personality (choose exactly 3):
you: funny, kind, weird, …
friendship: honesty/trust, common interests, commitment, …
romance: same options as friendship
freetime: socializing, getting outside, reading, …
support: unconditional accepters, comic-relief givers, eternal optimists

Interests (choose as many as apply):
books: mystery & thriller, science fiction, romance, …
movies: western, biography, horror, …
music: folk, jazz, techno, …
social activities: ballroom dancing, barbecuing, bar-hopping, …
land sports: soccer, tennis, golf, …
water sports: sailing, kayaking, swimming, …
other sports: sky diving, weightlifting, billiards, …

Page 33: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Differences between data sets

HP labs email network
• complete image of communication network
• affinity not reflected

Online community
• partial information of social network
• only friends listed

Page 34: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Degree Distribution for Nexus Net: 2469 users, average degree 8.2

[Figure: number of users vs. number of links, shown on linear and on log-log axes]

Page 35: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Problem: how to construct hierarchies?

Probability of linking by separation in years

[Figure: probability that two undergrads are friends vs. separation in years; data with $(x+1)^{-1.1}$ fit]

[Figure: probability that two grads are friends vs. separation in years; data with $(x+1)^{-1.7}$ fit]

Page 36: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Hierarchies not useful for other attributes:

Geography

[Figure: probability of being friends vs. distance between residences]

Other attributes: major, sports, freetime activities, movie preferences…

Page 37: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Strategy using user profiles

prob. two undergrads are friends (consider simultaneously)

• both undergraduate, both graduate, or one of each

• same or different year

• both male, both female, or one of each

• same or different residences

• same or different major/department

Results

strategy      median   mean
random        133      390
high degree   39       137
profile       21       53

With an attrition rate of 25%, 5% of the messages get through at an average of 4.8 steps; hence the network is barely searchable.
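A profile-based strategy of this kind can be sketched as two steps: estimate, from the observed friendships, how likely two users with a given combination of shared attributes are to be friends, then forward to the friend whose profile makes a tie to the target most likely. The sketch below is an assumption-laden illustration, not the study's actual procedure; the attribute keys (`status`, `gender`, `year`, `residence`, `major`) and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

SAME_OR_DIFFERENT = ("year", "residence", "major")  # hypothetical attribute keys


def pair_features(u, v):
    """Joint description of a pair: status mix, gender mix, and which attributes match."""
    status = tuple(sorted((u["status"], v["status"])))
    gender = tuple(sorted((u["gender"], v["gender"])))
    same = tuple(u[a] == v[a] for a in SAME_OR_DIFFERENT)
    return (status, gender) + same


def estimate_link_probability(profiles, friendships):
    """Fraction of user pairs with each feature combination that are friends.

    profiles: user id -> attribute dict; friendships: set of frozenset pairs.
    """
    linked, total = Counter(), Counter()
    for i, j in combinations(sorted(profiles), 2):
        f = pair_features(profiles[i], profiles[j])
        total[f] += 1
        if frozenset((i, j)) in friendships:
            linked[f] += 1
    return {f: linked[f] / total[f] for f in total}


def choose_next(friends, target, profiles, prob):
    """Pick the friend estimated most likely to be friends with the target."""
    return max(
        friends,
        key=lambda v: prob.get(pair_features(profiles[v], profiles[target]), 0.0),
    )


if __name__ == "__main__":
    ug = {"status": "UG", "gender": "F", "year": 1, "residence": "A", "major": "math"}
    g = {"status": "G", "gender": "M", "year": 3, "residence": "B", "major": "cs"}
    profiles = {1: ug, 2: dict(ug), 3: g, 4: dict(g), 5: dict(ug)}
    prob = estimate_link_probability(profiles, {frozenset((1, 2))})
    print(choose_next([3, 2], 5, profiles, prob))
```

Estimating one probability per full attribute combination mirrors the "consider simultaneously" idea, at the cost of needing enough pairs in each combination for the estimate to be meaningful.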

Page 38: School of Information University of Michigan SI 614 Search in structured networks Lecture 15

Search Conclusions

Individuals associate on different levels into groups.

Group structure facilitates decentralized search using social ties.

Hierarchy search faster than geographical search

A fraction of ‘important’ individuals are easily findable

Humans may be more resourceful in executing search tasks:
• making use of weak ties
• using more sophisticated strategies