Basic Concepts Natural Random Walk Random Walks Characterization Metropolis Hastings Applications
Random Walks: Basic Concepts and Applications
Laura Ricci
Dipartimento di Informatica
25 July 2012
PhD in Computer Science
Outline
1 Basic Concepts
2 Natural Random Walk
3 Random Walks Characterization
4 Metropolis Hastings
5 Applications
Random Walk: Basic Concepts
A random walk, in synthesis:
given an undirected graph and a starting point, select a neighbour at random
move to the selected neighbour and repeat the same process until a termination condition is verified
the random sequence of points selected in this way is a random walk on the graph
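The three steps above translate almost literally into a loop. This is a minimal sketch, assuming the graph is stored as a dict from each vertex to the list of its neighbours and that the termination condition is simply a fixed number of moves (`max_steps`); the function name `random_walk` and the toy cycle are illustrative choices, not part of the slides.

```python
import random

def random_walk(graph, start, max_steps, rng=random):
    """Return the list of vertices visited by a random walk on `graph`.

    graph: dict mapping each vertex to the list of its neighbours (undirected).
    Termination condition (an assumption): stop after `max_steps` moves.
    """
    walk = [start]
    current = start
    for _ in range(max_steps):
        current = rng.choice(graph[current])  # select a neighbour at random
        walk.append(current)                  # move there and record it
    return walk

# Toy example: a 4-node cycle 0-1-2-3-0.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
walk = random_walk(cycle, start=0, max_steps=10, rng=random.Random(42))
print(walk)
```

Any other stopping rule (reaching a target node, exhausting a TTL) only changes the loop condition.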
A Random Walk
The Natural Random Walk
Natural Random Walk
Given an undirected graph G = (V, E), with n = |V| and m = |E|, a natural random walk is a stochastic process that starts from a given vertex and then repeatedly selects one of the current vertex's neighbours uniformly at random to visit.
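The natural random walk is defined by the following transition matrix P:
$$P(x, y) = \begin{cases} \dfrac{1}{\mathrm{degree}(x)} & \text{if } y \text{ is a neighbour of } x \\ 0 & \text{otherwise} \end{cases}$$
where degree(x) is the degree of node x, i.e. the number of its neighbours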
Natural Random Walk
note that we assume an undirected graph, i.e. if the walker can go from i to j, it can also go from j to i
this does not imply that the probability of the transition i → j is the same as that of the transition j → i
it depends on the degrees of the two nodes: P(i, j) = 1/degree(i), while P(j, i) = 1/degree(j)
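The transition matrix and the asymmetry just described can be checked on a small graph. This sketch assumes the five-node "house" graph used later in these slides, with an edge list we inferred from its degree sequence (the figure is not reproduced); exact `Fraction`s are used so that each row visibly sums to 1.

```python
from fractions import Fraction

def natural_rw_matrix(graph):
    """Transition matrix of the natural random walk, as a dict of dicts.

    P[x][y] = 1/degree(x) if y is a neighbour of x, and 0 otherwise.
    """
    P = {x: {y: Fraction(0) for y in graph} for x in graph}
    for x, neighbours in graph.items():
        for y in neighbours:
            P[x][y] = Fraction(1, len(neighbours))
    return P

# Five-node "house" graph; the edge list is our assumption.
house = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 5], 4: [2, 5], 5: [3, 4]}
P = natural_rw_matrix(house)
print(P[1][2], P[2][1])  # 1/2 1/3: the edge 1-2 is walked with different probabilities
```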
Natural Random Walk: Stationary Distribution
Stationary Distribution
Given a connected, non-bipartite graph (so that the random walk is irreducible and aperiodic) with a set of nodes N and a set E of edges, the probability of being at a particular node v converges to the stationary distribution
$$\pi_{RW}(v) = \frac{\mathrm{degree}(v)}{2|E|} \propto \mathrm{degree}(v)$$
if we run the random walk for sufficiently long, we get arbitrarily close to the distribution π_RW(v)
put in a different way: the fraction of time spent at a node is directly proportional to the degree of the node
the probability of sampling a node depends on its degree
the natural random walk is inherently biased towards nodes with higher degree
Natural Random Walk Stationary Distribution
Consider a random walk on the house graph below:
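$$(d_1, d_2, d_3, d_4, d_5) = (2, 3, 3, 2, 2)$$
the degrees sum to 2|E| = 12, so the stationary distribution is
$$(\pi_1, \pi_2, \pi_3, \pi_4, \pi_5) = \left(\tfrac{2}{12}, \tfrac{3}{12}, \tfrac{3}{12}, \tfrac{2}{12}, \tfrac{2}{12}\right)$$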
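A quick simulation shows the time spent at each vertex approaching degree(v)/2|E|. This is a sketch under stated assumptions: the house graph's edge list (apex 1, upper corners 2 and 3, lower corners 4 and 5) is inferred from the degree sequence above, and the walk length and seed are arbitrary.

```python
import random
from collections import Counter
from fractions import Fraction

# House graph; edge list assumed: roof apex 1, upper corners 2 and 3, lower corners 4 and 5.
house = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 5], 4: [2, 5], 5: [3, 4]}

def stationary_rw(graph):
    """pi_RW(v) = degree(v) / (2|E|); the degrees sum to 2|E|."""
    two_m = sum(len(nbrs) for nbrs in graph.values())
    return {v: Fraction(len(graph[v]), two_m) for v in graph}

def visit_frequencies(graph, start, steps, seed=0):
    """Fraction of time a natural random walk spends at each vertex."""
    rng = random.Random(seed)
    counts = Counter()
    v = start
    for _ in range(steps):
        v = rng.choice(graph[v])
        counts[v] += 1
    return {u: counts[u] / steps for u in graph}

pi = stationary_rw(house)
freq = visit_frequencies(house, start=1, steps=200_000)
for v in sorted(house):
    print(v, pi[v], round(freq[v], 3))
```

The triangle 1-2-3 makes this walk aperiodic, so the empirical frequencies settle close to 2/12, 3/12, 3/12, 2/12, 2/12.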
Natural Random Walk Stationary Distribution
A king moves at random on an 8 × 8 chessboard. The number of moves available from the various locations is as follows:
interior tiles: 8 moves
edge tiles: 5 moves
corner tiles: 3 moves
There are 36 interior, 24 edge and 4 corner tiles, so the degrees sum to 36 × 8 + 24 × 5 + 4 × 3 = 420 = 2|E| (each of the 210 edges counts as a move in both directions)
Therefore, the stationary probability is 8/420 for each interior tile, 5/420 for each edge tile and 3/420 for each corner tile
This gives an idea of the time spent by the king on each kind of tile during the random walk
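The tile counts and the total of 420 can be verified by enumerating the king's moves. This is a small sketch; the helper name `king_degree` is our own.

```python
from collections import Counter

def king_degree(r, c, size=8):
    """Number of legal king moves from square (r, c) on a size x size board."""
    return sum(
        1
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0) and 0 <= r + dr < size and 0 <= c + dc < size
    )

degrees = {(r, c): king_degree(r, c) for r in range(8) for c in range(8)}
total = sum(degrees.values())        # equals 2|E|
print(total)                         # 420
print(Counter(degrees.values()))     # how many squares have 8, 5 and 3 moves
print({d: f"{d}/{total}" for d in (8, 5, 3)})
```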
Random Walks and Markov Chains
Time Reversible Markov Chain: when the chain starts from its stationary distribution, the probability of a state sequence occurring is equal to the probability of the reversed state sequence
running time backward does not affect the distribution at all
Consider a Markov chain with state space S = {s_1, ..., s_k} and probability transition matrix P. A stationary distribution π on S is said to be time reversible for the chain if, ∀ i, j ∈ {1, ..., k}, we have:
$$\pi(i) \times P(i, j) = \pi(j) \times P(j, i)$$
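The detailed balance condition is easy to test mechanically. This is a minimal sketch that assumes the chain is given as a dict of dicts of exact `Fraction`s (to avoid floating-point round-off); `is_time_reversible` and the two-state example are hypothetical illustrations.

```python
from fractions import Fraction

def is_time_reversible(pi, P):
    """Check detailed balance: pi(i) P(i, j) == pi(j) P(j, i) for all i, j.

    pi: dict state -> probability; P: dict of dicts of transition probabilities.
    """
    states = list(pi)
    return all(pi[i] * P[i][j] == pi[j] * P[j][i] for i in states for j in states)

# Two-state chain: from 'a' stay or jump to 'b' with prob 1/2 each; from 'b' always go to 'a'.
P = {"a": {"a": Fraction(1, 2), "b": Fraction(1, 2)},
     "b": {"a": Fraction(1), "b": Fraction(0)}}
pi = {"a": Fraction(2, 3), "b": Fraction(1, 3)}  # stationary: 2/3 * 1/2 + 1/3 * 1 = 2/3
print(is_time_reversible(pi, P))  # True
```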
Time Reversible Markov Chain
Time Reversibility interpretation
think of π(i) × P(i, j) as the limiting long-run fraction of transitions made by the Markov chain that go from state i to state j
time reversibility requires that the long-run fraction of i → j transitions is the same as that of j → i transitions, ∀ i, j
note that this is a more stringent requirement than stationarity, which equates the long-run fraction of transitions that go out of state i to the long-run fraction of transitions that go into state i
Natural Random Walk and Markov Chains
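the natural random walk on a graph is a time reversible Markov chain with respect to its stationary distribution
as a matter of fact, writing d = 2|E| for the sum of all degrees, for all neighbours v_i and v_j:
$$\pi(i) \times P(i, j) = \frac{d_i}{d} \times \frac{1}{d_i} = \frac{1}{d}$$
$$\pi(j) \times P(j, i) = \frac{d_j}{d} \times \frac{1}{d_j} = \frac{1}{d}$$
otherwise, if v_i and v_j are not neighbours, P(i, j) = P(j, i) = 0, so both sides vanish and
$$\pi(i) \times P(i, j) = \pi(j) \times P(j, i) \tag{1}$$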
Random Walks and Markov Chains
not all walks on a graph are reversible Markov chains
consider the following walker on a cycle of four nodes: at each time step, the walker moves one step clockwise with probability 3/4 and one step counter-clockwise with probability 1/4
π = (1/4, 1/4, 1/4, 1/4) is the only stationary distribution
Random Walks and Markov Chains
The transition graph is the following one:
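since π is the only stationary distribution, it is sufficient to show that π = (1/4, 1/4, 1/4, 1/4) is not reversible to conclude that the chain is not reversible
$$\pi(1) \times P(1, 2) = \frac{1}{4} \times \frac{3}{4} = \frac{3}{16}$$
$$\pi(2) \times P(2, 1) = \frac{1}{4} \times \frac{1}{4} = \frac{1}{16}$$
since 3/16 ≠ 1/16, the detailed balance condition fails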
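The same computation in code: the uniform distribution passes the stationarity test πP = π but fails detailed balance. This sketch numbers the cycle's nodes 0 to 3 instead of 1 to 4 (a labelling assumption) and uses exact fractions.

```python
from fractions import Fraction

n = 4
# Walk on a 4-node cycle: clockwise (i -> i+1) w.p. 3/4, counter-clockwise w.p. 1/4.
P = [[Fraction(0)] * n for _ in range(n)]
for i in range(n):
    P[i][(i + 1) % n] = Fraction(3, 4)
    P[i][(i - 1) % n] = Fraction(1, 4)

pi = [Fraction(1, n)] * n

# Stationarity: (pi P)(j) = sum_i pi(i) P(i, j) must equal pi(j).
pi_P = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
print(pi_P == pi)  # True: the uniform distribution is stationary

# Detailed balance fails on the edge between states 0 and 1.
print(pi[0] * P[0][1], pi[1] * P[1][0])  # 3/16 1/16
```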
Random Walk Metrics
Important measures of a random walk:
Access or Hitting Time, H_ij: expected number of steps before node j is visited, starting from node i.
Commute Time: expected number of steps in the random walk starting at i before node j is visited and then node i is reached again, i.e. H_ij + H_ji.
Cover Time: expected number of steps to reach every node, starting from a given initial distribution.
Graph Cover Time: maximum cover time over all starting vertices.
Mixing Rate: measures how fast the random walk converges to the stationary distribution (steady state).
Computing Random Walk Metrics: A Warm Up Example
the values of the metrics depend on
the graph topology
the probability fluxes on the graph
let us consider the (simple and not realistic) case of a complete graph with nodes {0, ..., n − 1}
each pair of vertices is connected by an edge
consider a natural random walk on this graph and compute
the access time for a pair of nodes
the cover time of the random walk
A Warm Up Example: Hitting Time
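each node has the same number of connections to other nodes
so we can consider a generic pair of nodes, for instance node 0 and node 1, and compute H(0, 1) without loss of generality
from any node the walk moves to each of the other n − 1 nodes with probability 1/(n − 1), so the probability that, starting from node 0, we reach node 1 for the first time at the t-th step is
$$\left(\frac{n-2}{n-1}\right)^{t-1} \times \frac{1}{n-1}$$
so the expected hitting time is:
$$H(0, 1) = \sum_{t=1}^{\infty} t \times \left(\frac{n-2}{n-1}\right)^{t-1} \times \frac{1}{n-1} = n - 1$$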
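A Monte Carlo check of H(0, 1) = n − 1. This sketch simulates the natural random walk on the complete graph K_n directly; the trial count, seed and n = 10 are arbitrary assumptions.

```python
import random

def hitting_time_complete(n, trials=20_000, seed=1):
    """Monte Carlo estimate of H(0, 1) for the natural random walk on K_n.

    From any vertex the walk moves to one of the other n - 1 vertices uniformly.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        v, steps = 0, 0
        while v != 1:
            # choose uniformly among the n - 1 vertices different from v
            w = rng.randrange(n - 1)
            v = w if w < v else w + 1
            steps += 1
        total += steps
    return total / trials

n = 10
print(hitting_time_complete(n), n - 1)  # estimate close to 9
```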
A Warm Up Example: Cover Time
The problem of covering a complete graph is closely related to the Coupon Collector Problem:
since you are eager for cereals, you often buy them
each box of cereal contains one of n different coupons
each coupon is chosen independently and uniformly at random from the n ones
you cannot collaborate with other people to collect the coupons!
when you have collected one of every type of coupon, you win a prize!
under these conditions, what is the expected number of boxes of cereal you have to buy before you win the prize?
A Warm Up Example: Cover Time
The coupon collector problem is modelled through geometric random variables
Geometric Distribution:
a sequence of independent trials repeated until the first success
each trial succeeds with probability p
$$P(X = n) = (1 - p)^{n-1} p$$
$$E[X] = \frac{1}{p}$$
A Warm Up Example: Cover Time
Modelling the coverage problem as a coupon collector problem:
the coupons are the vertices of the graph
collecting a coupon corresponds to visiting a new node
A Warm Up Example: Cover Time
Cover time may be modelled by a sequence of geometric random variables
Let us define a vertex as collected when it has been visited at least once by the random walk
X: number of steps of the random walk needed to collect all the vertices of the graph
X_i: number of steps taken after having collected i − 1 vertices, up to and including the step that collects a new vertex
$$X = \sum_{i=1}^{n} X_i$$
we are interested in computing the expected value E[X]
A Warm Up Example: Cover Time
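X_i is a geometric random variable with success probability
$$p_i = 1 - \frac{i-1}{n} = \frac{n-i+1}{n}, \qquad E[X_i] = \frac{1}{p_i} = \frac{n}{n-i+1}$$
(this treats each step as a uniform draw among all n vertices, as in the coupon collector model; the walk on the complete graph cannot stay where it is, which changes the result only by lower-order terms)
by exploiting the linearity of expectation
$$E[X] = E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{n}{n-i+1} = n \sum_{i=1}^{n} \frac{1}{i}$$
the summation $\sum_{i=1}^{n} 1/i$ is the harmonic number H_n ≈ log n
we can conclude that the cover time (time to collect all the vertices of the graph) is ≈ n log n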
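The coupon collector model behind this result can be simulated and compared with n × H_n. This sketch follows the model used in the derivation (each step is a uniform draw among all n vertices), not the exact walk on K_n; n = 20, the trial count and the seed are arbitrary.

```python
import random

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1 / i for i in range(1, n + 1))

def coupon_collector_steps(n, rng):
    """Uniform draws from n items until every item has been seen at least once."""
    seen, steps = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        steps += 1
    return steps

n, trials = 20, 20_000
rng = random.Random(7)
estimate = sum(coupon_collector_steps(n, rng) for _ in range(trials)) / trials
print(round(estimate, 2), round(n * harmonic(n), 2))  # both about 72
```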
A Warm Up Example: Cover Time
Note that
if we want to collect all the nodes through a single random walk, we obviously need at least n steps
the extra log n factor is a reasonable delay due to the randomness of the approach
this result has been obtained for a completely connected (regular) graph
in general, the cover time depends on the topology of the graph and on the probability fluxes on it
The Lollipop Graph
in the "lollipop graph" (a clique on about n/2 vertices, attached at vertex u to a path of about n/2 vertices that ends at v), the asymmetry in the number of neighbours implies very different hitting times H_uv and H_vu
every random walk starting at v has no option but to go in the direction of u, so H_vu = Θ(n²)
a random walk starting at u has very little probability of proceeding along the path towards v, so H_uv = Θ(n³)
the cover time of the graph is high (Θ(n³)) for a similar reason
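The asymmetry can be observed by simulation. This sketch builds a lollipop with a clique of k = 8 vertices and a path of 8 edges (sizes are arbitrary assumptions; the slide's figure is not reproduced). For these sizes the commute-time identity C(u, v) = 2|E| × R_eff(u, v) = 2 × 36 × 8 = 576, together with H_vu = 8² = 64 for the path, gives H_uv = 512, so the estimates should land near 512 and 64.

```python
import random

def lollipop(k, length):
    """Clique on vertices 0..k-1 plus a path u=0 - k - k+1 - ... - (k+length-1) = v.

    Returns (graph, u, v); the junction vertex u = 0 belongs to the clique.
    """
    g = {i: [j for j in range(k) if j != i] for i in range(k)}
    prev = 0
    for p in range(k, k + length):
        g.setdefault(p, [])
        g[prev].append(p)
        g[p].append(prev)
        prev = p
    return g, 0, prev

def mean_hitting_time(g, src, dst, trials, seed):
    """Monte Carlo estimate of the hitting time from src to dst."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        v, steps = src, 0
        while v != dst:
            v = rng.choice(g[v])
            steps += 1
        total += steps
    return total / trials

g, u, v = lollipop(k=8, length=8)
h_uv = mean_hitting_time(g, u, v, trials=2000, seed=3)
h_vu = mean_hitting_time(g, v, u, trials=2000, seed=4)
print(round(h_uv), round(h_vu))  # H_uv is several times larger than H_vu
```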
The Metropolis Hastings Method
Markov chains (and random walks) are a very useful and general tool for simulations
suppose we want to simulate a random draw from some distribution π on a finite set S
for instance: generate a set of numbers chosen according to a power law distribution, or to a Gaussian distribution, ...
we can exploit the basic theorem of Markov chains:
find an irreducible, aperiodic probability transition matrix P such that its stationary distribution is π (πP = π)
run the corresponding Markov chain for a sufficiently long time
now the problem is to find a matrix P whose stationary distribution is π
The Metropolis Hastings Method
start from a given connected undirected graph defined on a set S of states. For instance:
a P2P network overlay
the social relationship graph
define the probability transition matrix P such that the random walk on this graph converges to the stationary distribution π
the definition of P depends on:
the graph topology
the probability distribution π we want to obtain
different graph topologies lead to different matrices P with stationary distribution π
The Metropolis Hastings Method
to obtain a stationary distribution where the probability of visiting each node is proportional to its degree, simply run a natural random walk
in general, however, we want to obtain a different, arbitrary distribution π, such that:
π(i) ∝ degree(i) × f(i)
or, equivalently,
π(i) ∝ π_RW(i) × f(i)
Our Goal
find a simple way to modify the random walk transition probabilities so that the modified probability transition matrix has stationary distribution π.
The Metropolis Hastings Method
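To obtain the distribution π such that:
$$\pi(i) \propto \mathrm{degree}(i) \times f(i) \tag{2}$$
the Metropolis Hastings method defines the following transition matrix, where N(i) is the set of neighbours of i:
$$P(i, j) = \begin{cases} \dfrac{1}{\mathrm{degree}(i)} \times \min\left\{1, \dfrac{f(j)}{f(i)}\right\} & \text{if } j \in N(i) \\ 1 - \sum_{k \in N(i)} P(i, k) & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \tag{3}$$
Theorem
Let π be the distribution defined by (2) and P(i, j) defined by (3); then πP = π, i.e. π is a stationary distribution.
(Indeed, for neighbours i and j and positive f, π(i) × P(i, j) ∝ degree(i) f(i) × min{1, f(j)/f(i)} / degree(i) = min{f(i), f(j)}, which is symmetric in i and j, so the chain satisfies detailed balance.)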
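The theorem can be checked exactly on a small example. This sketch builds matrix (3) and the normalised distribution (2) with `Fraction`s and verifies πP = π; the house graph's edge list and the values of f are arbitrary assumptions chosen only for illustration.

```python
from fractions import Fraction

def metropolis_hastings_matrix(graph, f):
    """Matrix (3): P(i,j) = 1/deg(i) * min(1, f(j)/f(i)) for j in N(i),
    P(i,i) = 1 - sum of the others, 0 otherwise."""
    P = {i: {j: Fraction(0) for j in graph} for i in graph}
    for i, nbrs in graph.items():
        for j in nbrs:
            P[i][j] = Fraction(1, len(nbrs)) * min(Fraction(1), f[j] / f[i])
        P[i][i] = 1 - sum(P[i][j] for j in nbrs)
    return P

def target(graph, f):
    """Distribution (2): pi(i) proportional to degree(i) * f(i), normalised."""
    w = {i: len(graph[i]) * f[i] for i in graph}
    z = sum(w.values())
    return {i: w[i] / z for i in graph}

# Assumed example: house graph and an arbitrary positive f.
house = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 5], 4: [2, 5], 5: [3, 4]}
f = {1: Fraction(1), 2: Fraction(2), 3: Fraction(1, 2), 4: Fraction(3), 5: Fraction(1)}

P, pi = metropolis_hastings_matrix(house, f), target(house, f)
pi_P = {j: sum(pi[i] * P[i][j] for i in house) for j in house}
print(pi_P == pi)  # True
```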
The Metropolis Hastings Method
A modified version of the natural random walk
To run the natural random walk, at each time step, choose a random neighbour and go there.
To run Metropolis Hastings, suppose we are at node i; generate the next random transition as follows:
start out in the same way as the natural random walk, by choosing a random neighbour j of i: j is a "candidate" state
then make a further probabilistic decision: "accept the candidate" and move to j, or "reject it" and stay at i
the probability of accepting the candidate is given by the extra factor
$$\min\left\{1, \frac{f(j)}{f(i)}\right\}$$
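The propose-then-accept step maps onto a few lines of code. This is a minimal sketch of one Metropolis Hastings move with a float-valued f; the graph, the values of f, the seed and the walk length are assumptions used only to show that visit frequencies approach degree(i) × f(i), normalised.

```python
import random
from collections import Counter

def mh_step(graph, f, i, rng):
    """One Metropolis Hastings move from vertex i."""
    j = rng.choice(graph[i])                  # candidate: uniform random neighbour
    if rng.random() < min(1.0, f[j] / f[i]):  # accept with prob min{1, f(j)/f(i)}
        return j
    return i                                  # reject: stay at i

# Assumed example: house graph, target pi(i) proportional to degree(i) * f(i).
house = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 5], 4: [2, 5], 5: [3, 4]}
f = {1: 1.0, 2: 2.0, 3: 0.5, 4: 3.0, 5: 1.0}
rng, v, counts, steps = random.Random(0), 1, Counter(), 300_000
for _ in range(steps):
    v = mh_step(house, f, v, rng)
    counts[v] += 1
weights = {i: len(house[i]) * f[i] for i in house}
z = sum(weights.values())
for i in sorted(house):
    print(i, round(counts[i] / steps, 3), round(weights[i] / z, 3))
```

Note that a rejected candidate still counts as a step spent at i; dropping those repeated visits would bias the estimate.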
The Metropolis Hastings Method
Let us consider the "correction factor"
$$\min\left\{1, \frac{f(j)}{f(i)}\right\}$$
if f(j) ≥ f(i), the minimum is 1 and the chain definitely moves from i to j
if f(j) < f(i), the minimum is f(j)/f(i) and the chain moves to j with probability f(j)/f(i)
if f(j) is much smaller than f(i), the desired distribution π places much less probability on j than on i, relative to π_RW
the chain should then make a transition from i to j much less frequently than the random walk does
this is accomplished in the Metropolis Hastings chain by usually rejecting the candidate j
Metropolis Hastings: Adjusting for Degree Bias
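avoid the bias toward higher-degree nodes
build a uniform distribution even for non-regular graphs
define the transition matrix as follows:
$$P(i, j) = \begin{cases} \dfrac{1}{\mathrm{degree}(i)} \times \min\left\{1, \dfrac{\mathrm{degree}(i)}{\mathrm{degree}(j)}\right\} & \text{if } j \in N(i) \\ 1 - \sum_{k \in N(i)} P(i, k) & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$
the bias toward higher-degree nodes is removed by reducing the probability of transitioning to higher-degree nodes at each step
this instantiates the general formula with
$$f(j) = \frac{1}{\mathrm{degree}(j)}, \qquad f(i) = \frac{1}{\mathrm{degree}(i)}$$
so that π(i) ∝ degree(i) × f(i) = 1, i.e. π is uniform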
Metropolis Hastings: Adjusting for Degree Bias
The algorithm for selecting the next step from the node x
Select a neighbour y of x uniformly at random
Query y for a list of its neighbours, to determine its degree
Generate a random number p uniformly at random between 0 and 1
If p ≤ degree(x)/degree(y), y is the next step.
Otherwise, remain at x as the next step
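The five steps above can be sketched as a single function. In this sketch, "querying" y is modelled by reading `len(graph[y])` from a local dict (in a real P2P system it would be a network request); the house graph's edge list, seed and walk length are assumptions.

```python
import random
from collections import Counter

def next_step(graph, x, rng):
    """Degree-corrected Metropolis Hastings step, following the five steps above."""
    y = rng.choice(graph[x])          # select a neighbour y of x uniformly at random
    deg_y = len(graph[y])             # "query" y for its neighbour list / degree
    p = rng.random()                  # uniform random number in [0, 1)
    if p <= len(graph[x]) / deg_y:    # accept if p <= degree(x)/degree(y)
        return y
    return x                          # otherwise stay at x

house = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 5], 4: [2, 5], 5: [3, 4]}
rng, x, counts, steps = random.Random(5), 1, Counter(), 300_000
for _ in range(steps):
    x = next_step(house, x, rng)
    counts[x] += 1
print({v: round(counts[v] / steps, 3) for v in sorted(house)})  # each close to 0.2
```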
Metropolis Hastings: An Example
Consider again the house graph:
on the left: the random walk stationary distribution, π_RW = (2/12, 3/12, 3/12, 2/12, 2/12)
on the right: the target distribution, π_TARGET = (4/12, 2/12, 2/12, 2/12, 2/12)
Metropolis Hastings: An Example
Consider again the house graph:
to define f(i) for every i, exploit the relation
$$\pi_{TARGET}(i) = f(i)\,\pi_{RW}(i) = f(i)\,\frac{d(i)}{2|E|}$$
for instance
$$f(1) = \frac{4/12}{2/12} = 2$$
similarly, f(2) = 2/3 = f(3) and f(4) = 1 = f(5).
Metropolis Hastings: An Example
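For instance, the transition probabilities out of node 1 are computed as follows:
$$P(1, 2) = \frac{1}{2} \times \min\left\{1, \frac{2/3}{2}\right\} = \frac{1}{6}$$
and, since likewise P(1, 3) = 1/6,
$$P(1, 1) = 1 - \frac{1}{6} - \frac{1}{6} = \frac{2}{3}$$
and so on ...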
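The whole example can be reproduced with exact arithmetic: derive f from the two distributions, build the Metropolis Hastings matrix, and confirm P(1, 2) = 1/6, P(1, 1) = 2/3 and that π_TARGET is stationary. This sketch assumes the house graph's edge list (inferred from its degrees) and the target values implied by f on the previous slide.

```python
from fractions import Fraction

house = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 5], 4: [2, 5], 5: [3, 4]}
two_m = sum(len(n) for n in house.values())  # 2|E| = 12
pi_rw = {i: Fraction(len(house[i]), two_m) for i in house}
pi_target = {1: Fraction(4, 12), 2: Fraction(2, 12), 3: Fraction(2, 12),
             4: Fraction(2, 12), 5: Fraction(2, 12)}
f = {i: pi_target[i] / pi_rw[i] for i in house}  # f(i) = pi_TARGET(i) / pi_RW(i)

P = {i: {j: Fraction(0) for j in house} for i in house}
for i, nbrs in house.items():
    for j in nbrs:
        P[i][j] = Fraction(1, len(nbrs)) * min(Fraction(1), f[j] / f[i])
    P[i][i] = 1 - sum(P[i][j] for j in nbrs)

print(f[1], f[2], f[4])    # 2 2/3 1
print(P[1][2], P[1][1])    # 1/6 2/3
pi_P = {j: sum(pi_target[i] * P[i][j] for i in house) for j in house}
print(pi_P == pi_target)   # True
```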
Random Walks: Applications
Random walks are exploited to model different scenarios in mathematics and physics:
Brownian motion of dust particles
statistical mechanics
Random Walks in Computer Science
epidemic diffusion of information
generating random samples from a large set (for instance, a set of nodes from a complex network)
computation of aggregate functions on complex sets ...
Random Walks in Computer Science
The graph
nodes: nodes of a peer-to-peer network, vertices of a social graph
edges: P2P overlay connections, social relations in a social network
since a random walk can be viewed as a Markov chain, it is completely characterized by its stochastic matrix
we are interested in the probabilities to be assigned to the edges of the graph in order to obtain a given probability distribution
Random Walks: Sampling a Complex Network
measuring the characteristics of complex networks:
P2P networks
Online Social Networks
World Wide Web
the complete dataset is not available due to
the huge size of the network
privacy concerns
sampling techniques are essential for the practical estimation of network properties
study the properties of the network based on a small but representative sample.
Random Walks: Sampling a Complex Network
Parameters which may be estimated by a random walk
topological characteristics
node degree distribution
clustering coefficient
network size
network diameter
node characteristics
link bandwidth
number of shared files
number of friends in a social network
A large number of proposals have appeared in recent years.
Random Walks: Sampling A Complex Network
Random walk convergence detection mechanisms are required to obtain useful samples:
a valid sample of a network may be derived from a random walk only when the distribution is stationary
this is true only asymptotically
how many of the initial samples in each walk have to be discarded to lose dependence on the starting point (burn-in problem)?
a further problem:
how many samples must be collected to obtain a representative sample?
Random Walks: Convergence Detection
A naive approach:
run the sampling process long enough and proactively discard a number of initial burn-in samples
pro: simplicity
con: from a practical point of view, the length of the burn-in phase should be minimized, to reduce bandwidth consumption and computational time
A more refined approach: estimate convergence from a set of statistical properties of the walks as they are collected
several techniques from the Markov chain diagnostics literature
Geweke’s Diagnostics
basic idea: take two non-overlapping parts of the random walk
compare the means of both parts, using a difference-of-means test
if the two parts are separated by many iterations, the correlation between them is low
see if the two parts of the chain belong to the same distribution
the two parts should be identically distributed when the walk converges
Geweke’s Diagnostics
Let X be a single sequence of samples of the metric of interest, obtained by a random walk. Consider:
a prefix X_a of X (usually 10% of X)
a longer suffix X_b of X (usually 50% of X)
compute the z statistic
$$z = \frac{E(X_a) - E(X_b)}{\sqrt{\mathrm{Var}(X_a) + \mathrm{Var}(X_b)}} \tag{4}$$
where E(·) is the sample mean of a segment and Var(·) the estimated variance of that sample mean
if |z| > T, where T is a threshold value (usually 1.96), the iterates from the prefix segment were not yet drawn from the target distribution and should be discarded
otherwise declare convergence
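Equation (4) with the usual 10% / 50% split can be sketched as below. As a simplifying assumption, the variance of each segment mean is estimated as sample variance / segment length, which ignores autocorrelation (Geweke's original diagnostic uses spectral-density estimates instead); `geweke_z` and the two toy sequences are illustrative only.

```python
from statistics import mean, variance

def geweke_z(samples, first=0.1, last=0.5):
    """z statistic of eq. (4): prefix mean vs suffix mean (i.i.d. variance estimate)."""
    n = len(samples)
    xa = samples[: int(first * n)]          # prefix X_a
    xb = samples[n - int(last * n):]        # suffix X_b
    var_a = variance(xa) / len(xa)          # estimated variance of mean(X_a)
    var_b = variance(xb) / len(xb)          # estimated variance of mean(X_b)
    return (mean(xa) - mean(xb)) / (var_a + var_b) ** 0.5

stationary = [float(i % 10) for i in range(1000)]  # toy sequence with a stable mean
trending = [float(i) for i in range(1000)]         # toy sequence still drifting
print(abs(geweke_z(stationary)) > 1.96, abs(geweke_z(trending)) > 1.96)  # False True
```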
Convergence: Using Multiple Parallel Random Walks
a single walk may get trapped in a cluster while exploring the network
the chain may stay for a long time in some non-representative region
this may lead to an erroneous diagnosis of convergence
to improve convergence, use multiple parallel random walks
Gelman Rubin Diagnostics:
uses parallel chains and discards the initial values of each chain
checks whether all the chains converge to approximately the same target distribution
the test outputs a single value R that is a function of the means and variances of all chains
failure indicates the need to run longer chains: burn-in is yet to be completed
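A basic form of the Gelman Rubin statistic compares the within-chain variance W with the between-chain variance B. This sketch assumes m chains of equal length n and omits the small correction factors that published variants add; the synthetic "mixed" and "stuck" chains are made-up data. A common rule of thumb treats R well above 1 (for instance above about 1.1) as a failure.

```python
import random
from statistics import mean, variance

def gelman_rubin(chains):
    """Potential scale reduction factor R for chains of equal length n."""
    n = len(chains[0])
    means = [mean(c) for c in chains]
    W = mean(variance(c) for c in chains)   # average within-chain variance
    B = n * variance(means)                 # between-chain variance
    V = (n - 1) / n * W + B / n             # pooled estimate of the target variance
    return (V / W) ** 0.5

rng = random.Random(0)
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
stuck = [[rng.gauss(mu, 1) for _ in range(1000)] for mu in (0, 0, 3, 3)]
print(round(gelman_rubin(mixed), 3), round(gelman_rubin(stuck), 3))
```

Here the "stuck" chains mimic walks trapped in two different clusters, which is exactly the situation a single-chain test could miss.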
Applications: Random Membership Management
a Membership Service provides a node with a list of the members of a dynamic network
the overhead of maintaining the full list of members may be too high for a complex network:
a node maintains a random subset of the nodes.
A random-walk based membership service: a random sample of size k is computed by node i as follows:
start k Metropolis Hastings random walks in parallel
the probability of visiting a node converges to the uniform one
each node visited by the random walk sends its membership information (for instance its IP address) to i, which updates its local membership set
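A sketch of this membership sampling, under explicit assumptions: which visited nodes actually report is not fully specified on the slide, so here each walk contributes only the node where it stops after a fixed number of steps (the hypothetical parameter `steps`), giving node i a sample of size k; the walks run sequentially rather than in parallel, and the star topology is a toy example where a natural walk would sit on the hub half of the time.

```python
import random
from collections import Counter

def mh_uniform_walk(graph, start, steps, rng):
    """Degree-corrected Metropolis Hastings walk; returns the final vertex."""
    x = start
    for _ in range(steps):
        y = rng.choice(graph[x])
        if rng.random() <= len(graph[x]) / len(graph[y]):
            x = y
    return x

def sample_membership(graph, i, k, steps=50, seed=0):
    """Node i runs k walks; the vertex where each walk stops reports back to i."""
    rng = random.Random(seed)
    return [mh_uniform_walk(graph, i, steps, rng) for _ in range(k)]

# Star graph: hub 0 with leaves 1..5.
star = {0: [1, 2, 3, 4, 5], **{leaf: [0] for leaf in range(1, 6)}}
sample = sample_membership(star, i=1, k=6000)
print(Counter(sample))  # each of the 6 vertices appears roughly 1000 times
```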
Applications: Load Balancing
Load imbalance in P2P networks may be caused by several factors:
uneven distribution of data
heterogeneity in node capacities.
Load-biased Random Walks:
sample nodes with probabilities proportional to their load
discover overloaded nodes more often
each node issues a random walker that persistently runs as a node sampler
overloaded nodes exchange tasks with more lightly loaded nodes
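The slides do not say how the load bias is implemented; one way to obtain it, sketched here as an assumption, is to reuse the Metropolis Hastings rule with f(x) = load(x)/degree(x), so that degree(x) × f(x) = load(x). The ring overlay and the load values are hypothetical.

```python
import random
from collections import Counter

def load_biased_step(graph, load, x, rng):
    """MH step whose stationary distribution is proportional to load(x)."""
    y = rng.choice(graph[x])
    # f(y)/f(x) with f = load/degree
    ratio = (load[y] * len(graph[x])) / (load[x] * len(graph[y]))
    return y if rng.random() < min(1.0, ratio) else x

# Hypothetical 4-peer ring with one overloaded peer (load values are made up).
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
load = {0: 1.0, 1: 1.0, 2: 6.0, 3: 2.0}
rng, x, counts, steps = random.Random(2), 0, Counter(), 200_000
for _ in range(steps):
    x = load_biased_step(ring, load, x, rng)
    counts[x] += 1
total = sum(load.values())
print({v: (round(counts[v] / steps, 3), load[v] / total) for v in ring})
```

The overloaded peer 2 is visited about 60% of the time, so it is discovered far more often than under a natural walk, which on this ring would visit every peer equally.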
Applications: Index Free Data Search
Index-free search methods as alternatives to structured DHTs:
flooding
random walks
Popularity-biased Random Walks
content popularity of a peer p_i: number of queries satisfied by p_i divided by the total number of queries received by p_i
define a bias in the search towards more popular peers
Index-Free Searching: each peer is probed with a probability proportional to the square root of its query popularity
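A square-root popularity bias can again be expressed as a Metropolis Hastings chain, which is our assumption and not necessarily the cited method: choosing f(i) = sqrt(popularity(i))/degree(i) gives π(i) ∝ sqrt(popularity(i)). The helper names, the 4-peer overlay and the query counts are hypothetical; the check verifies πP = π numerically.

```python
import math

def popularity(satisfied, received):
    """Content popularity of a peer: queries satisfied / queries received."""
    return satisfied / received

def sqrt_biased_matrix(graph, pop):
    """MH transition matrix with stationary distribution proportional to sqrt(pop):
    f(i) = sqrt(pop(i)) / degree(i)."""
    f = {i: math.sqrt(pop[i]) / len(graph[i]) for i in graph}
    P = {i: {j: 0.0 for j in graph} for i in graph}
    for i, nbrs in graph.items():
        for j in nbrs:
            P[i][j] = min(1.0, f[j] / f[i]) / len(nbrs)
        P[i][i] = 1.0 - sum(P[i][j] for j in nbrs)
    return P

# Hypothetical 4-peer overlay and query statistics (made-up numbers).
overlay = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
pop = {0: popularity(1, 10), 1: popularity(4, 10), 2: popularity(9, 10), 3: popularity(1, 4)}
P = sqrt_biased_matrix(overlay, pop)
z = sum(math.sqrt(p) for p in pop.values())
pi = {i: math.sqrt(pop[i]) / z for i in overlay}
pi_P = {j: sum(pi[i] * P[i][j] for i in overlay) for j in overlay}
print({i: round(pi[i], 3) for i in overlay})
```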