Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst


Problem

Given a large, possibly dynamic network, how does one efficiently sample/crawl it to characterize it accurately?

• degree distribution
• centrality
• clustering
• …

Motivation

• understanding technological networks, social networks
  ◦ Internet, wireless networks
  ◦ on-line social networks such as Facebook, MySpace, Orkut, YouTube, …
• when network dataset not available
  ◦ size, lack of global view, dynamics

Outline

review of sampling

random walks (RWs)

multiple coupled RWs

results

Sampling methods

random sampling
• uniform vertex sampling
  ◦ θi: fraction of vertices with degree i
  ◦ a sampled vertex has degree i with probability θi
• uniform edge sampling
  ◦ πi: probability that a sampled vertex has degree i
  ◦ πi = i × θi / (average degree)

crawling
• snowball sampling: commonly used, highly biased
• random walk
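The relation between θi and πi can be checked numerically. The following is a minimal Python sketch, not taken from the slides: the function names and the toy degree sequence (a star with three leaves) are hypothetical and chosen only for illustration.

```python
from collections import Counter

def degree_fractions(degrees):
    """theta[i]: fraction of vertices with degree i."""
    n = len(degrees)
    return {i: c / n for i, c in sorted(Counter(degrees).items())}

def edge_sampling_probs(theta):
    """pi[i] = i * theta[i] / (average degree)."""
    avg_degree = sum(i * t for i, t in theta.items())
    return {i: i * t / avg_degree for i, t in theta.items()}

# Hypothetical toy graph: a star with one degree-3 center and three degree-1 leaves.
degrees = [1, 1, 1, 3]
theta = degree_fractions(degrees)   # what uniform vertex sampling observes
pi = edge_sampling_probs(theta)     # what uniform edge sampling observes
```

In this toy case the single degree-3 vertex is only a quarter of the vertices, yet edge sampling returns it half the time, which is why edge sampling reaches high degree vertices more easily.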

Random sampling: accuracy of estimates

• Estimate θi: fraction of vertices with degree i
• Budget: B samples
• Accuracy: Normalized root Mean Squared Error (NMSE)
• uniform vertex: head GOOD, tail BAD
• uniform edge: head BAD, tail GOOD

[Figure: NMSE vs. in-degree]
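The accuracy metric can be sketched in code. The slides give only its name, so this sketch assumes that NMSE means the root mean squared error of the estimate divided by the true θi. The population of degrees, the budget, and the number of runs are hypothetical.

```python
import math
import random

def nmse(estimates, true_value):
    """Root mean squared error of the estimates, normalized by the true value."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / true_value

def uniform_vertex_estimate(degrees, i, budget, rng):
    """Estimate theta_i from `budget` independent uniform vertex samples."""
    sample = [rng.choice(degrees) for _ in range(budget)]
    return sum(1 for d in sample if d == i) / budget

rng = random.Random(0)
# Hypothetical population: 90% of vertices have degree 1, 10% have degree 50.
degrees = [1] * 900 + [50] * 100
head_runs = [uniform_vertex_estimate(degrees, 1, 100, rng) for _ in range(200)]
tail_runs = [uniform_vertex_estimate(degrees, 50, 100, rng) for _ in range(200)]
err_head = nmse(head_runs, 0.9)   # common (head) degree
err_tail = nmse(tail_runs, 0.1)   # rare (tail) degree
```

Under these assumptions the rare degree has the larger normalized error, in line with uniform vertex sampling being less accurate in the tail.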

Uniform vertex vs. edge sampling

• Flickr graph (1.7M vertices, 22M edges)
• budget: B = |V|/100
• vertex: head GOOD, tail BAD
• edge: head BAD, tail GOOD

[Figure: uniform vertex vs. edge sampling, Flickr graph]

Pros & Cons

uniform vertex
Pros:
◦ independent sampling
◦ OSN needs numeric user IDs, e.g., Livejournal, Flickr, MySpace, Facebook, ...
Cons:
◦ resource intensive (sparse user ID space)
◦ difficult to sample large degree vertices

uniform edge
Pros:
◦ independent sampling
◦ easy to sample high degree vertices
Cons:
◦ no public OSN interface to sample edges

Random walk (RW)

• start at node v
• randomly select a neighbor of v
• repeat until B samples are collected

sampling with replacement
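The random walk procedure can be sketched in a few lines of Python. This is an illustrative sketch only: the adjacency-list representation, the function name, and the small example graph are assumptions, and every vertex is assumed to have at least one neighbor.

```python
import random

def random_walk(adj, start, budget, rng):
    """Collect `budget` vertex samples (with replacement) from a simple random walk."""
    samples = []
    v = start
    for _ in range(budget):
        v = rng.choice(adj[v])   # move to a uniformly chosen neighbor of v
        samples.append(v)
    return samples

# Hypothetical undirected graph: a triangle (0, 1, 2) with a pendant vertex 3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walk = random_walk(adj, start=0, budget=10, rng=random.Random(1))
```

Because the walk may revisit vertices, the same vertex can appear several times in `walk`.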

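RW sampling

Random walk sampling produces a biased estimate $\hat{\theta}_i^{RW}$ of $\theta_i$, but the bias is easily corrected.

At steady state: $\theta_i^{RW} = \theta_i \, i / \text{avg. degree}$

Corrected estimate: $\hat{\theta}_i = \text{Norm} \cdot \hat{\theta}_i^{RW} / i$, where Norm is the normalizing constant.

[Figure: CCDF]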
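The bias correction can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each visited vertex of degree d is weighted by 1/d, and the weights are then normalized to sum to one (the role of Norm). The graph and function name are hypothetical, and the walk is assumed to run on a connected, non-bipartite undirected graph.

```python
import random
from collections import defaultdict

def rw_degree_estimate(adj, start, budget, rng):
    """Estimate theta_i from a random walk, reweighting each sample by 1/degree."""
    weights = defaultdict(float)
    v = start
    for _ in range(budget):
        v = rng.choice(adj[v])
        d = len(adj[v])
        weights[d] += 1.0 / d        # undo the degree-proportional sampling bias
    norm = sum(weights.values())     # normalizing constant
    return {d: w / norm for d, w in sorted(weights.items())}

# Hypothetical graph: triangle (0, 1, 2) plus pendant vertex 3.
# True fractions: theta_1 = 0.25, theta_2 = 0.5, theta_3 = 0.25.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
estimate = rw_degree_estimate(adj, start=0, budget=20000, rng=random.Random(7))
```

Without the 1/d weights, the raw visit frequencies would instead approach i × θi / (average degree).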

Pros & Cons

uniform vertex
Pros:
◦ independent sampling
◦ OSN needs numeric user IDs, e.g., Livejournal, Flickr, MySpace, Facebook, ...
Cons:
◦ resource intensive (sparse user ID space)
◦ difficult to sample large degree vertices

random walk
Pros:
◦ asymptotically unbiased
◦ easy to sample high degree vertices
◦ low cost resource-wise
Cons:
◦ graph must be connected
◦ large estimation errors when graph loosely connected
◦ length of transient?

Hybrid sampling

• uniform vertex samples A and C subgraphs but is expensive
• RW samples A or C but is cheap

[Figure: subgraphs A and C]

Combine advantages of uniform vertex & RWs?

Multiple random walks

• m independent, uniformly placed RWs
• split budget B among them

Pros
◦ cover all components with high probability (whp) as m increases
Cons
◦ bias due to transient
◦ difficult to combine estimates

Couple the RWs?

Frontier Sampling (FS)

m coupled walkers

• B: sampling budget
• S = {v1, … , vm}: initial set of m vertices; E’ = ∅

(1) select vr ∈ S with probability proportional to deg(vr)
(2) walk one step from vr
(3) add walked edge to E’ and update vr
(4) return to (1) (until m + |E’| = B)
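Steps (1) to (4) can be sketched in Python. This is a hedged illustration, not the authors' code: the function name, the adjacency-list representation, and the example graph are hypothetical. It also assumes the initial vertices are chosen uniformly at random and that every vertex has at least one neighbor.

```python
import random

def frontier_sampling(adj, m, budget, rng):
    """Sketch of FS with m coupled walkers; returns the list of walked edges E'."""
    vertices = list(adj)
    walkers = [rng.choice(vertices) for _ in range(m)]   # S (assumed uniform start)
    edges = []                                           # E'
    while m + len(edges) < budget:
        # (1) pick a walker with probability proportional to its vertex degree
        degs = [len(adj[v]) for v in walkers]
        r = rng.choices(range(m), weights=degs)[0]
        u = walkers[r]
        # (2) walk one step from the chosen vertex
        v = rng.choice(adj[u])
        # (3) record the walked edge and update the walker's position
        edges.append((u, v))
        walkers[r] = v
    return edges

# Hypothetical graph: two triangles joined by the single edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
edges = frontier_sampling(adj, m=2, budget=12, rng=random.Random(0))
```

The m initial vertices count against the budget, so with m = 2 and B = 12 the loop records 10 edges.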

FS properties

• Random walk on G^m (the m-th Cartesian power of G)
• At steady state
  ◦ samples edges uniformly
  ◦ as m → ∞, walkers uniformly distributed in graph
• m coupled RWs start approximately in steady state
  ◦ short transient

Sample paths for θ1 estimate (Flickr graph)

Plot the evolution of the θ1 estimate over n, the number of steps

Sampling errors

• large connected component of Flickr graph
• accuracy metric: NMSE of CCDF

[Figure: NMSE vs. in-degree]

Sampling errors: GAB graph

• 2 Albert-Barabasi graphs with average degrees 2 and 10, connected by one edge

[Figure: NMSE vs. in-degree]

Distributed FS

• m independent walkers
• walker i takes its next step after an exponentially distributed time with rate equal to the current node degree (mean 1/degree)
• walkers run for time T, then report to central site
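A distributed variant can be simulated on one machine as a rough sketch. The key assumption here, which is an interpretation rather than something the slide spells out, is that a walker at a vertex of degree d waits an exponential time with rate d (mean 1/d), so that the next walker to move is chosen with probability proportional to degree, as in FS step (1). The function name, graph, and parameters are hypothetical.

```python
import random

def distributed_fs(adj, m, horizon, rng):
    """Simulate m independent walkers with exponential waiting times up to time `horizon`."""
    vertices = list(adj)
    events = []
    for _ in range(m):
        v = rng.choice(vertices)               # assumed uniform initial placement
        t = rng.expovariate(len(adj[v]))       # rate = degree, mean wait = 1/degree
        while t <= horizon:
            w = rng.choice(adj[v])
            events.append((t, v, w))           # edge (v, w) walked at time t
            v = w
            t += rng.expovariate(len(adj[v]))
    events.sort()                              # the central site merges reports by time
    return [(u, v) for _, u, v in events]

# Hypothetical graph: two triangles joined by the single edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
edges = distributed_fs(adj, m=3, horizon=5.0, rng=random.Random(0))
```

Each walker runs without coordinating with the others; only the final merge at the central site needs the timestamps.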

Future work

• analyzing, speeding up convergence
• other forms of coupling
• other graph statistics
• study how graph structure affects sampling efficiency
  ◦ power law vs exponential tail
  ◦ spatial correlation, independence vs. SRD vs. LRD
• application to different networks
  ◦ wireless, social, wireless/social
