School of Information University of Michigan SI 614 Search in random networks Lecture 16

School of InformationUniversity of Michigan

SI 614Search in random networks

Lecture 16

MotivationPower-law (PL) networks, social and P2P

Analysis of scaling of search strategies in PL networks

Simulationartificial power-law topologies, real Gnutella networks

Comparison with existing P2P search strategiesReflector, Morpheus

Path finding

Directed SearchFreenet

Search in random networks

2

Mary

Bob

Jane

Who couldintroduce me toRichard Gere?

How do we search?

AT&T Call Graph

Aiello et al. STOC ‘00

# o

f te

lep

hon

e n

um

be

rsfr

om

wh

ich

ca

lls w

ere

ma

de

# of telephone numbers called

100

101

100

101

102

number of neighbors

pro

po

rtio

n o

f n

od

es

datapower-law fit = 2.07

Gnutella network

power-law link distribution

summer 2000,data provided by Clip2

Preferential attachment model

Nodes join at different times

The more connections a node has, the more likely it is to acquirenew connections

Growth process produces power-law network

host cache

pingping

ping

ping

ping

ping ping

file sharing w/o a central index

queries broadcast to every node within radius ttl as network grows, encounter a bandwidth barrier (dial up modems cannot keep up with query traffic, fragmenting the network)

Gnutella and the bandwidth barrier

Clip 2 reportGnutella: To the Bandwidth Barrier and Beyondhttp://www.clip2.com/gnutella.html#q17

1

6

54

63

67

2

94

number ofnodes found

power-law graph

93

number ofnodes found

13

711

1519

Poisson graph

Search with knowledge of 2nd neighbors

Outline of search strategy

pass query onto only one neighbor at each step

requires that nodes sign query- avoid passing message onto a node twice

requires knowledge of one’s neighbors degree- pass to the highest degree node

requires knowledge of one’s neighbors neighbors- route to 2nd degree neighbors

OPTIONS

Generating functions

M.E.J. Newman, S.H. Strogatz, and D.J. Watts ‘Random graphs with arbitrary degree distributions and their

applications’, PRE, cond-mat/0007235

Generating functions for degree distributions

Useful for computing moments of degree distribution, component sizes, and average pathlengths

00

( ) kk

k

G x p x

Fun with generating functions

normalization condition: probabilities sum to 1

1)1(0

0

kkpG

derivatives: the generating function contains all the information of the degree distribution

00

!

1

xk

k

k x

G

kp

Fun with generating functions (cont’d)

Expected degree of a randomly chosen vertex

'0(1)k

k

k kp G

Higher moments of degree distribution

1

0 )(

x

n

kk

nn xGdx

dxpkk

Example: Poisson distribution

Let p = z/N be the probability of an edge existing between two vertices (z is the average degree)

kkNkN

k

xppk

NxG

)1()(

00

)1(~)1( xzN epxp for large N

)1(0 )(' xzzexG

zG )1('0

zkxk

k

k ezkx

G

kp

!

1

!

10

0 just the regular Poisson

distribution

Introducing cutoffs

max 1k N a node cannot have more connections than there are other nodes

This is important for exponents close to 2

1 1

11kp C

x 2 2

6C

1000

( 1000, 2) ~ 0.001kp k p

Probability that none of the nodes in a 1,000 node graph has 1000 or more neighbors:

1000(1 ( 1000, 2)) ~ 0.36p k

without a cutoff, for = 2have > 50% chance of observing a node with more neighbors than there are nodes

for = 2.1, have a 25% chance

# of sites linking to the site

prop

ortio

n of

site

s w

/ so

man

y lin

ks

1000

Selecting from a variety of cutoffs

Nk max1.

2. /k

k eCkp Newman et al.

3.

0

Ckpk

1CNk

otherwise

Generating Function

kCN

k

xkCxG

1

10

Aiello et al.

1 million websites (~ 1997)

N

Aiello’s ‘conservative’ vs. Havlin’s ‘natural’ cutoff

1

1

* 1

~

kN p

Ck N

k N

cutoff where expectednumber of nodes of degreek is 1

k

n(k)

k

n(k)

1

1

cutoff so thatexpected number of nodesof degree > k is 1

max

max

1

1 1max

1

1max

* 1

~

~

~

kk k

k k

N p

ck N

k N

k N

The imposed cutoff can have a dramaticeffect on the properties of the graph

degrees drawn at random, for = 2, and N = 1000

00

( ) kk

k

G x p x

is the probability that a randomlychosen vertex has degree k

~kp k

is a generating function

'0(1)k

k

k kp G is the expected degree of a randomlychosen vertex

'0

1 '0 1

G xG x

G is the distribution of remaining

outgoing edges following and edge

assuming neighbors don’t share edges

11 '1

'02 GGz is the expected number of second

degree neighbors

22

2

2 2

2

1

11

Generating functions for degree distributions Random graphs with arbitrary degree distributions and their applications by Newman, Strogatz & Watts

search with knowledge of first neighbors

max

max

maxmax

max

max

01

' 1 10 0

1

' 1 1 20 max

1 1

'' 1 101 ' '

10 0

1 2'

20

3 2' max1 '

0

( )

( ) ( )

1(1) 1

2

( )( )

(1) (1)

( 1)(1)

( 2) 21(1)

(1)

kk

kk

kk

kk

kk

G x c k x

G x G x c k xx

G k c k k dk k

G x cG x k x

G G x

ck k x

G

kG

G

2max( 1) (3 )

( 2)(3 )

k

Generating function with cutoff

Average degree of vertex

constant in Nfor 2<<3, and kmax~Na, decreaseswith N

Average number of neighborsfollowing an edge

search with knowledge of first neighbors (cont’d)

3 3' 3max max

1 1 max' 20 max

1 2(1)

(1) (3 ) 1 (3 )B

k kz G k

G k

In the limit t->2, ' max1

max

(1)log( )

kG

k

Let’s for the moment ignore the fact that as we do a random walk, we encounter neighborsthat we’ve seen before

s = number of steps =1B

N

z

Search time with different cutoffs

0.18(2.1)s N

3 3m

2

ax

( ) ,2 3N N

sk N

NIf kmax = N,

max

max

log(log( )

, 2)N k

s Nk

2

21

33max 1

( ) ,2 3NN N

sk

N

If kmax = N1/(-1),

max

max

log( )(2) log( )

N ks

kN

0.1(2.1) Ns

search with knowledge of first neighbors (cont’d)

2 3 /13

3max

,2 3

( )

N Ns N

kN

If kmax = N,

So the best we can do is for exponents close to 2 N

2nd neighbor random walk, ignoring overlap:

15.0~1.2, NNS 213~, NNS

2

2

~

s B

B

n z N

NS

z N

232' max

2 1 1 1 21 max

2( ( )) (1)

1 (3 )Bx

kz G G x G

x k

Following the degree sequence

a ~ s = # of steps taken

1.0deg 1.2, NNS

2nd neighbors, ignoring overlap:

Go to highest degree node, then next highest, … etc.

max

max

1 11 max~

k

D k az Nk dk Nak

max

max

' 2(2 )1 1

2( 2) 2 4 /

( ) ~

~ ~

Dz G x Nak

s k N

0 10 20 30 40 50 60 70 80 90 100

1

= 2.00 = 2.25 = 2.50 = 2.75 = 3.00 = 3.25 = 3.50 = 3.75

degree of node

deg

ree

of

nei

gh

bo

r -

1d

egre

e o

f n

od

e

2

20

10

5

Ratio of the degree of a node to the expected degree of its highest degree neighbor for 10,000 node power-law graphs of varying exponents

Actor collaboration graph (imdb database)

~ 2.0-2.2

Exponents close to 2 required to search effectively

World Wide Web, ~ 2.0-2.3, high degree nodes: directories, search engines

Social networks, AT&T call graph ~ 2.1

Gnutella

100 101 102 103 104100

101

102

103

104

105

number of costars

num

ber

of a

ctor

s/ac

tres

ses actors, = 2

actresses, = 2.1

15

50

18 17

8

10

9

6

Following the degree sequence

Complications

Should not visit same node more than once

Many neighbors of current node being visited were also neighbors of previously visited nodes, and there is a bias toward high degree nodes being ‘seen’ over and over again

0 100 200 300 400 500 6000

5

10

15

20

25

30

step

deg

ree

of

no

de

not visitedvisitedneighbors visited

Status and degree of node visited

1

10-2

0.1

1

step

pro

po

rtio

n o

f n

od

es

fo

un

d a

t s

tep

random walk

10 102

103

104

105

106

10-3

10-4

0 20 40 60 80 100

0.2

0.4

0.6

0.8

1

step

cum

ula

tive

no

des

fo

un

d a

t st

ep

random walk degree sequence

seeking high degree nodesspeeds up the search process

about 50% of a 10,000 node graphis explored in the first 12 steps

Progress of exploration in a 10,000 node graph knowing2nd degree neighbors

12

degree sequence

101

102

103

104

105

100

101

102

103

size of graph

cove

rtim

e fo

r h

alf

the

no

des

random walk = 0.37 fit

degree sequence = 0.24 fit

Scaling of search time with size of graph

Comparison with a Poisson graph

100

101

102

103

100

101

102

de

gre

e o

f c

urr

en

t n

od

e

step

Poisson power-law

101

102

104

106

100

101

102

103

104

105

number of nodes in graph

co

ve

r ti

me

fo

r 1

/2 o

f g

rap

h

constant av. deg. = 3.4

= 1.0 fit

10

xzexG

xGxGz

xxG 001

expected degree and expecteddegree following a link are equal

scaling is linear

0 20 40 60 80 1000

0.2

0.4

0.6

0.8

1

step

cum

ula

tive

no

des

fo

un

d a

t st

ep

high degree seeking 1st neighborshigh degree seeking 2nd neighbors

50% of the files in a 700 node network can be found in < 8 steps

Gnutella network

• Maintain a list of files in their neighborhood

• Check query against list.

• Periodically contact neighbors to maintain list

• Append ID to each query processed

Required modifications to nodes

Tradeoff

storage/cpu(available)

bandwidth(limited)

for

• localized indexing• traffic routed to high degree nodes

Partial implementation:

Theory vs. reality:

• overloading high degree nodes but no worse than original scenario where all nodes handle all traffic

assume high degree -> high bandwidth so can carry the traffic load

• fewer nodes used for routing, system is more susceptible to maliciousattack

Clip2 Distributed Search Solutionshttp://dss.clip2.com© Clip2.com, Inc.

Broadband user runningReflector

Broadband user runningGnutella

Dial-up user runningGnutella

http://dss.clip2.com/

LimeWire, BearShare:drop connections to unresponsive hosts drives slower hosts to have fewer connections &move to edge of network

Kazaa, BearShare defender, Morpheus SuperNodes

from Clip2: Morpheus out of the Underworldhttp://www.openp2p.com/pub/a/p2p/2001/07/02/morpheus.html

Connection-preferencing rules

Supernodes

Conclusions

Search is faster and scales in power-law networks

Networks intended to be searched, such as Gnutella,have a favorable P-L topology

High degree strategy has partially been implemented in existing p2pclients, such as BearShare, Kazaa & Morpheus

A PL link distribution shortens the average shortest path

1

1

1

21

1 zz

zzz

r

rr

Poisson: = z1

PL: > z1

1 1.5 2 2.5 3 3.5 4 4.5 510

0

101

102

103

104

105

106

radius

neig

hbor

s at

rad

ius

power-law =2.5Poisson =1.0

102

104

106

0

2

4

6

N

PLPS

A

B

What about the shortest path discovered along the way?

B.J. Kim et al. ‘Path finding strategies in scale-free networks’, PRE (65) 027103.

each node passes message to highest degree neighbor it hasn’t passed the message to previously

‘cut off’ loops

102

103

104

105

3

3.5

4

4.5

5

5.5

6

6.5

7

7.5

8

N

av

. pa

th le

ng

th f

ou

nd

PL high degree0.72*ln(N)

A high degree seeking strategy finds shortest paths whose average scales logarithmically with the size of the graph

102

103

104

105

101

102

N

av

. pa

th le

ng

th f

ou

nd

PLPoisson

N0.46

N0.48

Scaling of the path length found using a• random strategy on a PL graph• high-degree strategy on a Poisson graph

102

103

104

105

100

101

102

103

104

N

me

dia

n s

ea

rch

co

st

PL high degreePL randPoisson high degree

But…Search costs are prohibitive, might as well do a BFS

Freenet

Queries are passed to one peer at a time.

Queries routed to high degree nodes.

Has a power-law topology Theodore Hong, ‘Performance’ chapter in O’Reilley’s “Peer-to-Peer, Harnessing the Power of Disruptive Technologies”

Scales as N0.275 with the size of the network, N.

Theodore Hong, power - law link distribution of a simulated Freenet network

Theodore Hong, scaling of mean search time

on a simulated Freenet network

Node specialization key to Freenet’s speed

Each node forwards query to node with “closest” hash key

Node passing back a match remembers the address the data came from

Results in nodes developing a bias towards a part of the keyspace

112659 ?356?

340388

396 135214

356340388

396 135214

Queries are naturally routed to high degree nodesUse keys for orientation

Applications to peer to peer networks

Adriana Iamnitchi, Matei Ripeanu, Ian Foster “Small-World File-Sharing Communities”, http://arxiv.org/abs/cs.DC/0307036create localized indeces for peers with similar download patterns

Foreseer:Proposed P2P architecture with friend & neighbor overlay friend: has shared a file neighbor: short ping time

Fletcher, George , Sheth, Hardik and Börner, Katy. (2004). Unstructured Peer-to-Peer Networks: Topological Properties and Search Performance. Third International Joint Conference on Autonomous Agents and MUlti-Agent Systems. W6: Agents and Peer-to-Peer Computing, Moro, Gianluca, Bergmanschi, Sonia and Aberer, Karl, Eds., New York, July 19-23, pp. 2-13.http://ella.slis.indiana.edu/~katy/paper/04-fletcher.pdf

http://arxiv.org/abs/cs.DC/0307036

http://ella.slis.indiana.edu/~katy/paper/04-fletcher.pdf

How do networks become navigable?Aaron Clauset and Cris Moore

arxiv.org/abs/cond-mat/0309415

In the limit N->long range link distribution becomes 1/r,r = lattice distance betweennodes

Documents

School of Information University of Michigan SI 614 Search in random networks Lecture 16