A measure of betweenness centrality based on random walks Author: M. E. J. Newman

Presented by:

Amruta Hingane

Department of Computer Science

Kent State University

Overview

• Introduction

• Centrality Measures

• Types Of Betweenness

• Random-walk Betweenness

– A current flow analogy

– Random walks

• Comparison Of Different Betweenness Measures

• Correlation With Other Measures

• Example Applications

Centrality Measures

Degree:

• Simplest centrality measure

• Number of edges incident on a vertex in a network

• The number of ties an actor has in social network parlance

• A measure in some sense of the popularity of an actor.

[Figure: example graph on vertices a, b, and c]

Degree of b = ?
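As a minimal sketch of the definition, degree can be computed by counting how many edges touch each vertex. The edge list below is a hypothetical three-vertex graph chosen for illustration; it is not necessarily the graph drawn on the slide, and `degrees` is an illustrative helper name.

```python
from collections import Counter

def degrees(edges):
    """Return {vertex: degree} for an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1  # each edge contributes one tie to each endpoint
        deg[v] += 1
    return dict(deg)

# Hypothetical graph (an assumption): b is tied to both a and c.
edges = [("a", "b"), ("b", "c")]
print(degrees(edges))  # {'a': 1, 'b': 2, 'c': 1}
```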

Centrality Measures Continued…

Closeness:

• Centrality measure which is the mean geodesic (shortest-path) distance between a vertex and all other vertices reachable from it.

• Measure of how long it will take information to spread from a given vertex to others in the network
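The closeness definition above can be sketched with a breadth-first search on an unweighted graph. The adjacency-list input and the function name `mean_geodesic_distance` are assumptions made for this example. With this definition a lower value means a more central vertex.

```python
from collections import deque

def mean_geodesic_distance(adj, source):
    """Mean shortest-path distance from source to every other reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit is along a shortest path
                dist[v] = dist[u] + 1
                queue.append(v)
    others = [d for v, d in dist.items() if v != source]
    return sum(others) / len(others) if others else 0.0

# Hypothetical path graph 0-1-2-3: the inner vertices are closer to everyone.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(mean_geodesic_distance(adj, 0), mean_geodesic_distance(adj, 1))
```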

Betweenness:

• Measure of the extent to which a vertex lies on the paths between others

Betweenness

• The betweenness of a vertex i is defined to be the fraction of shortest paths between pairs of vertices in a network that pass through i.

$$b_i = \frac{\sum_{s<t} g_i^{(st)} / n_{st}}{\frac{1}{2} n (n-1)}$$

n = total number of vertices in the network

$g_i^{(st)}$ = number of geodesic paths from vertex s to vertex t that pass through i

$n_{st}$ = total number of geodesic paths from s to t
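The formula can be evaluated directly, if inefficiently, by counting geodesic paths with one breadth-first search per vertex. This is a brute-force sketch, not the O(mn) algorithm. It assumes an undirected, unweighted graph given as an adjacency list, and it assumes that a path from s to t counts as passing through its own endpoints (chosen to match the convention that source and target carry one unit of flow later in these slides). The function names are illustrative.

```python
from collections import deque
from fractions import Fraction

def bfs_counts(adj, s):
    """Distances from s and the number of geodesic paths from s to each vertex."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on some geodesic
                sigma[v] += sigma[u]
    return dist, sigma

def shortest_path_betweenness(adj):
    """b_i = sum over s<t of g_i(st)/n_st, divided by n(n-1)/2 (needs n >= 2)."""
    vertices = sorted(adj)
    n = len(vertices)
    info = {v: bfs_counts(adj, v) for v in vertices}
    total = {v: Fraction(0) for v in vertices}
    for a, s in enumerate(vertices):
        dist_s, sigma_s = info[s]
        for t in vertices[a + 1:]:
            if t not in dist_s:
                continue                 # s and t are in different components
            for i in vertices:
                dist_i, sigma_i = info[i]
                on_geodesic = (i in dist_s and t in dist_i
                               and dist_s[i] + dist_i[t] == dist_s[t])
                if on_geodesic:
                    g = sigma_s[i] * sigma_i[t]   # geodesics s -> i -> t
                    total[i] += Fraction(g, sigma_s[t])
    return {v: total[v] / Fraction(n * (n - 1), 2) for v in vertices}

# Hypothetical path graph 0-1-2: the middle vertex lies on every geodesic.
path = {0: [1], 1: [0, 2], 2: [1]}
print(shortest_path_betweenness(path))
```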

Shortest-path Betweenness

• A measure of the extent to which an actor has control over information flowing between others.

• In a network in which flow is entirely or at least mostly along geodesic paths, the betweenness of a vertex measures how much flow will pass through that particular vertex.

• Betweenness can be calculated for all vertices in time O(mn), where m is the number of edges and n is the number of vertices

Example of Shortest-path Betweenness

Shortest path betweenness

Vertices A and B will have high (shortest-path) betweenness in this configuration, while vertex C will not

Drawbacks of Shortest-path Betweenness

• Does information flow only along geodesic paths?

• Does a news item, rumor, fad, or message know the ideal route?

• More likely, a message travelling from one place to another wanders around more or less randomly, encountering whom it will.

• Certainly it is possible for information to flow between two individuals via a third mutual acquaintance, even when the two individuals in question are themselves well acquainted

• A realistic betweenness measure should include non-geodesic paths in addition to geodesic ones

Flow betweenness

• Flow betweenness of a vertex i is defined as the amount of flow through vertex i when the maximum flow is transmitted from s to t, averaged over all s and t.

• Flow betweenness can be thought of as measuring the betweenness of vertices in a network in which a maximal amount of information is continuously pumped between all sources and targets.

• Maximum flow from a given s to all reachable targets t can be calculated in worst-case time O(m^2), and hence the flow betweenness for all vertices can be calculated in time O(m^2 n)

Example of Flow Betweenness

While calculating flow betweenness, vertices A and B will get high scores while vertex C will not

Drawbacks of Flow Betweenness

• Does information “know” the ideal route (or one of the ideal routes) from each source to each target, in order to realize the maximum flow?

• Although the flow betweenness does take account of paths other than the shortest path this still seems unrealistic for many practical situations

• Flow betweenness suffers from some of the same drawbacks as shortest-path betweenness: in practice, flow often does not take any sort of ideal path from source to target, be it the shortest path, the maximum flow path, or another kind of ideal path

Random Walk Betweenness

• Random-walk betweenness of a vertex i is equal to the number of times that a random walk starting at s and ending at t passes through i along the way, averaged over all s and t

• Random-walk betweenness can be calculated for all vertices in a network in worst-case time O((m + n) n^2) using matrix methods

• This measure is appropriate to a network in which information wanders about essentially at random until it finds its target

Basic Matrix Notations

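[Figure: an undirected and a directed example graph, each on vertices 1 to 4]

Adjacency matrices

Undirected graph (A is symmetric):

$$A = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix}$$

Directed graph:

$$A = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{pmatrix}$$

Degree matrix:

$$K = \begin{pmatrix} k_1 & 0 & \cdots & 0 \\ 0 & k_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & k_n \end{pmatrix}, \qquad k_i = \sum_{j=1}^{n} a_{ij}$$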
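The matrices on this slide can be rebuilt from edge lists, which makes the notation concrete. In this sketch the edge lists are read off the two 4 x 4 adjacency matrices shown on the slide; the helper names `adjacency_matrix` and `degree_matrix` are illustrative, and vertices are assumed to be numbered 1 to n.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build an n x n 0/1 adjacency matrix; vertices are numbered 1..n."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u - 1][v - 1] = 1
        if not directed:
            a[v - 1][u - 1] = 1   # undirected edges are stored in both directions
    return a

def degree_matrix(a):
    """Diagonal matrix K with k_i = sum_j a_ij on the diagonal."""
    n = len(a)
    return [[sum(a[i]) if i == j else 0 for j in range(n)] for i in range(n)]

undirected = adjacency_matrix(4, [(1, 2), (1, 3), (1, 4), (3, 4)])
directed = adjacency_matrix(4, [(1, 2), (1, 3), (3, 4), (4, 1)], directed=True)

for row in degree_matrix(undirected):
    print(row)
```

For the undirected graph the diagonal of K holds the degrees 3, 1, 2, 2.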

Basic Matrix Notations Continued…

• Rank of a matrix A = maximal number of linearly independent rows/columns

• A square matrix $A_{n \times n}$ is invertible if and only if rank A = n

• Product of the eigenvalues of a matrix = determinant of the matrix

• $Ax = \lambda x$, where A is a matrix, x is a non-zero eigenvector, and λ is the corresponding eigenvalue

• Singular matrix: the determinant is zero, so the matrix is non-invertible

Current Flow Analogy

Current-flow betweenness of a vertex i is the amount of current that flows through i, averaged over all source and target points

Kirchhoff's current law: the net current flowing into or out of any vertex is zero

i = i1 + i2

ixy = [V(x) – V(y)] / Rxy

Injected current = 1 unit

Extracted current = 1 unit

Resistance = 1 unit

Current flow betweenness

By Kirchhoff's law of current conservation, the voltages satisfy:

$$\sum_j A_{ij} (V_i - V_j) = \delta_{is} - \delta_{it}$$

$A_{ij}$ is an element of the adjacency matrix: $A_{ij} = 1$ if there is an edge between i and j, and 0 otherwise

$\delta_{ij}$ is the Kronecker delta: $\delta_{ij} = 1$ if i = j, and 0 otherwise

Current flow betweenness Continued…

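Since $\sum_j A_{ij} = k_i$, the degree of vertex i, the voltage equation becomes

$$k_i V_i - \sum_j A_{ij} V_j = \delta_{is} - \delta_{it}$$

Therefore, in matrix form,

$$(D - A) \cdot V = s$$

D = diagonal matrix with elements $D_{ii} = k_i$

s = source vector with elements $s_i = +1$ for i = s, $-1$ for i = t, and 0 otherwise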
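A small worked case may help here. The graph is a hypothetical three-vertex path 1 - 2 - 3 (an assumption for illustration, not a graph from the slides), with unit current injected at s = 1 and extracted at t = 3.

```latex
D - A =
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}
-
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
=
\begin{pmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix},
\qquad
s = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}

% Written out row by row, (D - A) V = s reads:
V_1 - V_2 = 1, \qquad -V_1 + 2V_2 - V_3 = 0, \qquad -V_2 + V_3 = -1
```

Adding the three equations gives 0 = 0, so only two of them are independent: the voltages are fixed only up to an additive constant.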

Current flow betweenness Continued…

V cannot be obtained by inverting D − A directly, because that matrix is singular

Fix the voltage of one vertex v to zero ($V_v = 0$), so that all voltages are measured relative to v, and remove the vth row of D − A

Remove the vth column as well, leaving $D_v - A_v$, a square $(n-1) \times (n-1)$ matrix

$$V = (D_v - A_v)^{-1} \cdot s$$

Add the removed row and column back in, with all entries equal to zero, and call the resulting matrix T

Voltage at i:

$$V_i^{(st)} = T_{is} - T_{it}$$
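To make the procedure concrete, here is a worked case on a hypothetical three-vertex path 1 - 2 - 3 (an illustrative assumption), taking v = 3 as the reference vertex.

```latex
% Remove row and column 3 of D - A, then invert (the determinant is 2 - 1 = 1):
D_v - A_v = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix},
\qquad
(D_v - A_v)^{-1} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}

% Add back a zero row and column in position 3:
T = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}

% Voltages for s = 1, t = 3:
V_i^{(13)} = T_{i1} - T_{i3}
\quad\Longrightarrow\quad
\left( V_1, V_2, V_3 \right) = (2,\ 1,\ 0)
```

Each edge carries a voltage drop of 1 across unit resistance, so one unit of current flows along the path from vertex 1 to vertex 3, as expected.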

Current flow betweenness Continued…

Current through i = half of the sum of the absolute values of the currents flowing along the edges incident on that vertex

$$I_i^{(st)} = \tfrac{1}{2} \sum_j A_{ij} \left| V_i^{(st)} - V_j^{(st)} \right| = \tfrac{1}{2} \sum_j A_{ij} \left| T_{is} - T_{it} - T_{js} + T_{jt} \right| \quad \text{for } i \neq s, t$$

$$I_s^{(st)} = 1, \qquad I_t^{(st)} = 1$$

Betweenness (average of the current flow over all source-target pairs):

$$b_i = \frac{\sum_{s<t} I_i^{(st)}}{\frac{1}{2} n (n-1)}$$

* For a graph with more than one component, the betweenness is calculated separately for each component
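As a hand check of these formulas, consider a hypothetical three-vertex path 1 - 2 - 3 with unit resistances (an illustrative assumption). Taking vertex 3 as the reference gives the matrix T shown below, from which every current follows.

```latex
T = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}

% Pair (1,3): V = (2, 1, 0), and vertex 2 lies between source and target
I_2^{(13)} = \tfrac{1}{2} \left( |1 - 2| + |1 - 0| \right) = 1

% Pair (1,2): V = (1, 0, 0), and vertex 3 hangs off the target and carries nothing
I_3^{(12)} = \tfrac{1}{2} \, |0 - 0| = 0

% Pair (2,3): V = (1, 1, 0), and vertex 1 hangs off the source and carries nothing
I_1^{(23)} = \tfrac{1}{2} \, |1 - 1| = 0

% Averaging over the 3 pairs, with I = 1 at every source and target:
b_1 = \frac{1 + 1 + 0}{3} = \frac{2}{3}, \qquad
b_2 = \frac{1 + 1 + 1}{3} = 1, \qquad
b_3 = \frac{0 + 1 + 1}{3} = \frac{2}{3}
```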

Time of Calculation

Inversion of the matrix takes O(n^3)

Betweenness equation takes O(mn) for each vertex or O(mn^2) for all of them

Total running time to calculate current flow betweenness for all vertices = O((m + n) n^2)

O(n^3) for a sparse graph

Random Walks

[Figure: a message m wanders at random through the network from source s to target t]

s = source

t = target

m = message

Random-walk Betweenness

• Definition: a message starts at source s and performs a random walk until it reaches target t. The number of times it passes through vertex i on its journey, averaged over a large number of trials of the random walk and then over all possible source/target pairs s, t, is the random-walk betweenness of i.

• Betweenness of vertex i is the net number of times a walk passes through i: if a walk passes through a vertex and later passes back through it in the opposite direction, the two passes cancel out.
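The word "net" can be illustrated with a small Monte Carlo experiment for a single source/target pair. This is a sketch under stated assumptions, not the matrix method used in the rest of the talk: signed edge traversals are accumulated over many walks, and only then is the absolute value taken, so that back-and-forth steps cancel. The function name, the number of walks, the seed, and the four-vertex square graph are all illustrative choices, and vertex labels are assumed to be integers.

```python
import random
from collections import defaultdict

def simulate_net_flow(adj, s, t, walks=20000, seed=0):
    """Estimate the net random-walk flow through each vertex for one (s, t) pair."""
    rng = random.Random(seed)
    signed = defaultdict(float)   # signed[(u, v)], u < v: net traversals u -> v
    for _ in range(walks):
        u = s
        while u != t:             # the walk stops as soon as it reaches t
            v = rng.choice(adj[u])
            if u < v:
                signed[(u, v)] += 1
            else:
                signed[(v, u)] -= 1   # a step in the opposite direction cancels
            u = v
    flow = {}
    for i in adj:
        if i in (s, t):
            flow[i] = 1.0         # convention: endpoints carry one unit of flow
        else:
            incident = sum(abs(x) for (u, v), x in signed.items() if i in (u, v))
            flow[i] = 0.5 * incident / walks
    return flow

# Hypothetical square 0-1-3-2-0: two equally good routes from 0 to 3.
square = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
flow = simulate_net_flow(square, 0, 3)
print(flow)
```

By symmetry, the estimates for vertices 1 and 2 should each come out close to one half, however much the individual walks wander back and forth.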

Calculating Random-walk Betweenness

• Consider an absorbing random walk, a walk that starts at vertex s and makes random moves around the network until it finds itself at vertex t and then stops.

• If at some point in this walk we find ourselves at vertex j, then the probability that we will find ourselves at i on the next step is given by the matrix element $M_{ij} = A_{ij} / k_j$, for $j \neq t$

[Figure: network with source s, target t, and vertices i and j]

Calculating Random-walk Betweenness Continued…

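$A_{ij}$ = element of the adjacency matrix

$k_j = \sum_i A_{ij}$ = degree of vertex j

In matrix notation,

$$M = A \cdot D^{-1}$$

D = diagonal matrix with elements $D_{ii} = k_i$

The walk is absorbed at t and never leaves it, so $M_{it} = 0$ for all i

Removing row and column t from each matrix gives

$$M_t = A_t \cdot D_t^{-1}$$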
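A short sketch of the transition matrix may make the absorbing condition easier to see. It assumes an undirected graph with no isolated vertices, given as a 0/1 adjacency matrix with 0-based indices; the function name `transition_matrix` and the example graph are illustrative choices.

```python
from fractions import Fraction

def transition_matrix(a, t):
    """M_ij = A_ij / k_j for j != t; column t is zero, so the walk never leaves t."""
    n = len(a)
    k = [sum(a[i][j] for i in range(n)) for j in range(n)]   # k_j = sum_i A_ij
    return [[Fraction(0) if j == t else Fraction(a[i][j], k[j])
             for j in range(n)]
            for i in range(n)]

# Hypothetical 4-vertex graph with edges 0-1, 0-2, 0-3, 2-3; target is vertex 3.
a = [[0, 1, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 1],
     [1, 0, 1, 0]]
m = transition_matrix(a, t=3)

# Every column except t sums to 1 (some neighbour is always chosen);
# column t sums to 0 because the walk has stopped.
column_sums = [sum(m[i][j] for i in range(4)) for j in range(4)]
print(column_sums)
```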

Calculating Random-walk Betweenness Continued…

For a walk starting at s,

probability of being at vertex j after r steps: $[M_t^r]_{js}$

probability that the next step is then taken to an adjacent vertex i: $k_j^{-1} [M_t^r]_{js}$

Summing over all values of r from 0 to ∞ gives the total number of times we go from j to i, averaged over all possible walks:

$$k_j^{-1} \left[ (I - M_t)^{-1} \right]_{js}$$

In matrix notation we can write this as an element of the vector

$$V = D_t^{-1} \cdot (I - M_t)^{-1} \cdot s = (D_t - A_t)^{-1} \cdot s$$
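The two steps compressed into this slide can be spelled out. The derivation below assumes a connected network, so that the walk is eventually absorbed and the series converges; it uses only the definition $M_t = A_t D_t^{-1}$ and the rule $(XY)^{-1} = Y^{-1} X^{-1}$.

```latex
% Step 1: sum the geometric series over the number of steps r
\sum_{r=0}^{\infty} M_t^{\,r} = (I - M_t)^{-1}

% Step 2: substitute M_t = A_t D_t^{-1} and combine the two inverses
D_t^{-1} \left( I - A_t D_t^{-1} \right)^{-1}
  = \left[ \left( I - A_t D_t^{-1} \right) D_t \right]^{-1}
  = \left( D_t - A_t \right)^{-1}
```

The result is the same reduced matrix that appears in the current-flow calculation, which is why the random-walk and current-flow measures coincide.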

Calculating Random-walk Betweenness Continued..

$|V_i - V_j|$ = net flow of the random walk along the edge from j to i

Betweenness (final net flow of random walks through vertex i):

$$b_i = \frac{\sum_{s<t} I_i^{(st)}}{\frac{1}{2} n (n-1)}$$

Summary

• Construct the matrix D−A, where D is the diagonal matrix of vertex degrees and A is the adjacency matrix.

• Remove any single row, and the corresponding column. For example, one could remove the last row and column.

• Invert the resulting matrix and then add back in a new row and column consisting of all zeros in the position from which the row and column were previously removed (e.g., the last row and column). Call the resulting matrix T, with elements Tij .

• Calculate the betweenness $b_i$ from the equation above, using the values of $I_i^{(st)}$ obtained from the elements of T.
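The four steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not an optimized implementation: the graph is connected, undirected, and has at least two vertices; it is supplied as a 0/1 adjacency matrix; exact fractions are used so the result can be checked by hand; and the helper names `invert` and `random_walk_betweenness` are illustrative.

```python
from fractions import Fraction

def invert(matrix):
    """Gauss-Jordan inversion with exact rational arithmetic.

    Raises StopIteration if the matrix is singular (e.g. a disconnected graph).
    """
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def random_walk_betweenness(a):
    """Random-walk betweenness of every vertex (connected graph, n >= 2)."""
    n = len(a)
    k = [sum(row) for row in a]
    # Steps 1 and 2: form D - A and drop the last row and column.
    reduced = [[(k[i] if i == j else 0) - a[i][j] for j in range(n - 1)]
               for i in range(n - 1)]
    # Step 3: invert, then pad with a zero row and column to obtain T.
    inv = invert(reduced)
    T = [row + [Fraction(0)] for row in inv] + [[Fraction(0)] * n]
    # Step 4: sum I_i^(st) over all pairs s < t and normalise.
    total = [Fraction(0)] * n
    for s in range(n):
        for t in range(s + 1, n):
            for i in range(n):
                if i in (s, t):
                    total[i] += 1          # I_s = I_t = 1 by convention
                else:
                    total[i] += Fraction(sum(
                        abs(T[i][s] - T[i][t] - T[j][s] + T[j][t])
                        for j in range(n) if a[i][j]
                    )) / 2
    return [x / Fraction(n * (n - 1), 2) for x in total]

# Hypothetical path graph 0-1-2.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(random_walk_betweenness(path))
```

On the path graph the middle vertex scores 1 and each end vertex scores 2/3. The triple loop over s, t, and i, with a scan over neighbours, follows the formula literally; a faster version would iterate over an edge list rather than scanning matrix rows.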

Comparison

• In each network, we intuitively expect vertex C to have betweenness lower than that of vertices A and B, but higher than that of vertices X and Y.

Comparison Continued…

• Shortest path betweenness fails to give a higher score to vertex C in the first network than to any of the other vertices within the two communities, while flow betweenness has the same problem with vertex C in the second network.

• Random-walk measure orders the vertices correctly in each case.

Correlation With Other Measures

• Shortest-path betweenness is known to be strongly correlated with vertex degree in most networks

• If the two are strongly correlated, then why calculate betweenness, when degree is almost the same and much easier to calculate?

• There are usually a small number of vertices in a network for which betweenness and degree are very different, and betweenness is needed to identify these vertices.

Example on Correlation With Other Measures

• Vertices with higher degree or higher shortest-path betweenness tend also to have higher random-walk betweenness.

• This misses the real point of interest, that there are a few vertices that have random-walk betweenness values quite different from their scores on the other two measures.

• The size of the vertices increases linearly with their random-walk betweenness.

• The highlighted vertices are those for which the random-walk betweenness is substantially greater than shortest-path betweenness (a factor of two or more).

The largest component of a network of sexual contacts between high-risk actors in the city of Colorado Springs

Example continued…

Example Applications

The network of intermarriage relations between the 15th century Florentine families

Example 1

Ranking on random-walk betweenness

• Medici come out well ahead of the competition, and they easily best their arch-rivals, the Strozzi.

• It is suggested that it was in part the Medici’s skillful manipulation of this marriage network that led to their eventual dominance of the Florentine political landscape.

Example 2

• Increasing size of vertices: score on the random-walk betweenness measure

• A: brokers who establish connections between different groups

• B: lying on paths where there are two (or more) paths to an outlying group of vertices

The largest component of the co-authorship network of scientists working on networks

Thank You
