
Introduction Important Concepts in MCL Algorithm MCL Algorithm The Features of MCL Algorithm Summary


Page 1: Introduction  Important Concepts in MCL Algorithm  MCL Algorithm  The Features of MCL Algorithm  Summary

Introduction

Important Concepts in MCL Algorithm

MCL Algorithm

The Features of MCL Algorithm

Summary


Decompose a network into subnetworks based on some topological properties

Usually we look for dense subnetworks


Algorithms:
◦ Exact: have proven solution quality and time complexity
◦ Approximate: heuristics are used to make them efficient

Example algorithms:
◦ Highly connected subgraphs (HCS)
◦ Restricted neighborhood search clustering (RNSC)
◦ Molecular Complex Detection (MCODE)
◦ Markov Cluster Algorithm (MCL)


Intuition:
◦ Highly connected nodes could be in one cluster
◦ Weakly connected nodes could be in different clusters

Model:
◦ A random walk may start at any node
◦ Starting at node r, if a random walk will reach node t with high probability, then r and t should be clustered together.


An undirected graph and its adjacency matrix representation.

An undirected graph and its adjacency list representation.


Theorem. Let M be the adjacency matrix for graph G. Then each (i, j) entry in $M^r$ is the number of paths of length r from vertex i to vertex j.

Note: This is the standard power of M, not a Boolean product.
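
The theorem can be checked on a small example. The sketch below is only an illustration: the helper names `mat_mul` and `mat_pow` and the three-node path graph are assumptions made for this example, not part of the slides. It uses plain integer matrix multiplication, as the note requires, and counts paths in which vertices may repeat.

```python
def mat_mul(a, b):
    """Standard (non-Boolean) product of two matrices given as lists of rows."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_pow(m, r):
    """Raise a square matrix to the non-negative integer power r."""
    n = len(m)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(r):
        result = mat_mul(result, m)
    return result


# Hypothetical example graph: the path 0 - 1 - 2.
M = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

M2 = mat_pow(M, 2)
M3 = mat_pow(M, 3)

# Entry (i, j) of M2 counts the length-2 paths from i to j.
for row in M2:
    print(row)
```

For instance, `M2[1][1]` counts the two length-2 routes that leave vertex 1 and come back (via vertex 0 or via vertex 2).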


Graph power
◦ The kth power of a graph G: a graph with the same set of vertices as G and an edge between two vertices iff there is a path of length at most k between them
◦ The number of paths of length k between any two nodes can be calculated by raising the adjacency matrix of G to the exponent k
◦ Then, G’s kth power is defined as the graph whose adjacency matrix is given by the sum of the first k powers of the adjacency matrix, $\sum_{i=1}^{k} A^i$, where a nonzero off-diagonal entry indicates an edge.
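
As a rough sketch of this definition, the code below sums the first k powers of an adjacency matrix and marks every nonzero off-diagonal entry as an edge. The function name `graph_power` and the four-node path graph are assumptions chosen for illustration.

```python
def mat_mul(a, b):
    """Standard matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def graph_power(adj, k):
    """Adjacency matrix of the kth graph power, from the sum A + A^2 + ... + A^k."""
    n = len(adj)
    total = [[0] * n for _ in range(n)]
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        power = mat_mul(power, adj)  # power now holds the next power of A
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
    # A nonzero off-diagonal entry means some path of length <= k exists.
    return [[1 if i != j and total[i][j] > 0 else 0 for j in range(n)]
            for i in range(n)]


# Hypothetical example graph: the path 0 - 1 - 2 - 3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

G2 = graph_power(A, 2)
G3 = graph_power(A, 3)
```

In this example the end vertices 0 and 3 are three steps apart, so they are joined in `G3` but not in `G2`.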


(Figure: a graph G and its powers $G^2$ and $G^3$.)


Given a weighted graph G(V,E,w), the all-pairs shortest paths problem is to find the shortest paths between all pairs of vertices vi, vj ∈ V.

A number of algorithms are known for solving this problem.


Consider the multiplication of the weighted adjacency matrix with itself, except that we replace the multiplication operation in matrix multiplication by addition, and the addition operation by minimization.

Notice that the product of the weighted adjacency matrix with itself returns a matrix that contains the shortest paths of at most 2 edges between any pair of nodes (assuming zeros on the diagonal and infinity for missing edges).

It follows from this argument that $A^n$ contains all shortest paths.
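
The modified product described above can be sketched as follows. The function name `min_plus`, the use of infinity for missing edges, and the four-node weighted path are assumptions made for this illustration.

```python
INF = float("inf")


def min_plus(a, b):
    """Matrix 'product' with multiplication replaced by + and addition by min."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical weighted path 0 - 1 - 2 - 3 with unit weights.
# Diagonal entries are 0; INF marks a missing edge.
W = [
    [0, 1, INF, INF],
    [1, 0, 1, INF],
    [INF, 1, 0, 1],
    [INF, INF, 1, 0],
]

W2 = min_plus(W, W)   # shortest paths using at most 2 edges
W3 = min_plus(W2, W)  # shortest paths using at most 3 edges

print(W2[0][2], W2[0][3], W3[0][3])
```

Vertex 3 is not reachable from vertex 0 within two edges, so that entry stays infinite in `W2` and becomes finite only in `W3`.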


Markov process
◦ The probability that a random walker will take an edge at node u depends only on u and the given edge.
◦ It does not depend on the walker's previous route.
◦ This assumption simplifies the computation.


Network flow is used to approximate the partition.

There is an initial amount of flow injected into each node.

At each step, a percentage of the flow goes from a node to its neighbors via the outgoing edges.


Edge Weight
◦ Similarity between two nodes
◦ Considered as the bandwidth or connectivity.
◦ If an edge has a higher weight than another, then more flow will pass over that edge.
◦ The amount of flow is proportional to the edge weight.
◦ If there are no edge weights, then we can assign the same weight to all edges.


Two natural clusters (figure with labeled nodes A and B)

When the flow reaches the border points, it is more likely to return than to cross the border.


When the flow reaches A, it has four possible outcomes.
◦ Three lead back into the cluster, one leaks out.
◦ ¾ of the flow will return, only ¼ leaks.

Flow will accumulate in the center of a cluster (island).

The border nodes will starve.


Simulation of random flow in a graph

Two Operations: Expansion and Inflation

Intrinsic relationship between MCL process result and cluster structure


Observation 1:

The number of higher-length paths in G is large for pairs of vertices lying in the same dense cluster, and small for pairs of vertices belonging to different clusters.


Observation 2:

A Random Walk in G that visits a dense cluster will likely not leave the cluster until many of its vertices have been visited


n×n adjacency matrix A.
◦ A(i,j) = weight on edge from i to j
◦ If the graph is undirected, A(i,j) = A(j,i), i.e. A is symmetric

n×n transition matrix P.
◦ P is row stochastic
◦ P(i,j) = probability of stepping to node j from node i: $P(i,j) = A(i,j) / \sum_k A(i,k)$
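
A minimal sketch of this normalization, assuming the adjacency matrix is stored as a list of rows; the function name `transition_matrix` and the three-node example are hypothetical. Each row is divided by its own sum, which is what makes P row stochastic.

```python
def transition_matrix(adj):
    """Row-normalize an adjacency matrix: P[i][j] = A[i][j] / sum_k A[i][k]."""
    rows = []
    for row in adj:
        total = sum(row)
        # A node without outgoing edges keeps an all-zero row in this sketch.
        rows.append([w / total if total else 0.0 for w in row])
    return rows


# Hypothetical example graph: the path 0 - 1 - 2 with unit weights.
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

P = transition_matrix(A)
for row in P:
    print(row)
```

The middle node has two neighbours, so its row splits the probability evenly between them.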


• Flow: Transition probability from a node to another node.
• Flow matrix: Matrix with the flows among all nodes; the ith column represents flows out of the ith node. Each column sums to 1.

Example: a path graph 1 - 2 - 3. Node 2 sends flow 0.5 to each of nodes 1 and 3; nodes 1 and 3 each send flow 1 to node 2.

Flow matrix:

       1      2      3
1      0     0.5     0
2     1.0     0     1.0
3      0     0.5     0


Adjacency matrix A and transition matrix P (figure: an example graph whose edges carry weight 1 in A and probabilities 1 and 1/2 in P)

(Figure sequence: a random walk on the example graph at t=0, t=1, t=2 and t=3, with transition probabilities 1 and 1/2 on the edges.)


$x_t(i)$ = probability that the surfer is at node i at time t

$x_{t+1}(i) = \sum_j (\text{probability of being at node } j) \cdot \Pr(j \to i) = \sum_j x_t(j)\,P(j,i)$

$x_{t+1} = x_t P = x_{t-1} P^2 = x_{t-2} P^3 = \dots = x_0 P^{t+1}$

What happens when the surfer keeps walking for a long time?
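
The recurrence can be iterated directly. In the sketch below the function name `step`, the three-node path graph, and the choice to start the surfer at node 0 are all assumptions for illustration; `P` is row stochastic and `x` is treated as a row vector.

```python
def step(x, P):
    """One step of the walk: x_next(i) = sum_j x(j) * P(j, i)."""
    n = len(x)
    return [sum(x[j] * P[j][i] for j in range(n)) for i in range(n)]


# Row-stochastic transition matrix of the hypothetical path 0 - 1 - 2.
P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]

x = [1.0, 0.0, 0.0]  # the surfer starts at node 0 with certainty
history = [x]
for t in range(3):
    x = step(x, P)
    history.append(x)

for t, dist in enumerate(history):
    print(t, dist)
```

On this particular graph the distribution keeps alternating between the middle node and the two end nodes instead of settling, which is one way to see why the starting distribution and the graph structure both matter for long walks.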


Measure or sample any of these quantities (higher-length paths, random walks) and deduce the cluster structure from the behavior of the sampled quantities.

Cluster structure will show itself as a peaked distribution of the quantities

A lack of cluster structure will result in a flat distribution


Markov Chain

Random Walk on Graph

Some Definitions in MCL


A Random Process with Markov Property

Markov Property: given the present state, future states are independent of the past states

At each step the process may change its state from the current state to another state, or remain in the same state, according to a certain probability distribution.


A walker takes off on some arbitrary vertex

The walker successively visits new vertices by arbitrarily selecting one of the outgoing edges

There is not much difference between a random walk on a graph and a finite Markov chain.


Simple Graph

A simple graph is an undirected graph in which every nonzero weight equals 1.


Associated Matrix

The associated matrix of G, denoted $M_G$, is defined by setting the entry $(M_G)_{pq}$ equal to $w(v_p, v_q)$.


Markov Matrix

The Markov matrix associated with a graph G is denoted by $T_G$ and is formally defined by letting its qth column be the qth column of $M_G$ normalized to sum to 1.


In MCL, the associated matrix and the Markov matrix are actually computed for the matrix M + I.

I denotes the identity matrix: ones on the diagonal and zeros elsewhere.

This adds a loop to every vertex of the graph, because a walker may stay in the same place in the next step.
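
Putting the definitions together, a Markov matrix with self-loops might be built as in the sketch below. The function name `markov_matrix` and the three-node example are assumptions; note that, unlike the row-stochastic P used earlier, the normalization here is by column, as in the Markov matrix definition.

```python
def markov_matrix(adj):
    """Column-normalize (M + I): add a loop to every vertex, then scale columns."""
    n = len(adj)
    with_loops = [[adj[p][q] + (1 if p == q else 0) for q in range(n)]
                  for p in range(n)]
    col_sums = [sum(with_loops[p][q] for p in range(n)) for q in range(n)]
    return [[with_loops[p][q] / col_sums[q] for q in range(n)]
            for p in range(n)]


# Hypothetical simple graph: the path 0 - 1 - 2.
M = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

T = markov_matrix(M)
for row in T:
    print(row)
```

Because every vertex has a loop, every column sum is at least 1, so the division is always defined.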


Find Higher-Length Path

Starting point: in the associated matrix, the quantity $(M^k)_{pq}$ has a straightforward interpretation as the number of paths of length k between $v_p$ and $v_q$.


(Figure: the associated matrix $M_G$ of an example graph and the squared matrix $(M_G + I)^2$.)


Flow is easier within dense regions than across sparse boundaries; however, in the long run, this effect disappears.

Matrix powers can be used to find higher-length paths, but the effect diminishes as the flow goes on.


Idea: How can we change the distribution of transition probabilities such that preferred neighbours are further favoured and less popular neighbours are demoted?

MCL solution: raise all the entries in a given column to a certain power greater than 1 (e.g. squaring), then rescale the column so that it sums to 1 again.
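
A small sketch of this step on a single column; the function name `inflate_column` and the example column are assumptions made for illustration.

```python
def inflate_column(column, r=2):
    """Raise each entry to the power r, then rescale the column to sum to 1."""
    powered = [value ** r for value in column]
    total = sum(powered)
    return [value / total for value in powered]


# Hypothetical column of transition probabilities with one preferred neighbour.
column = [0.5, 0.25, 0.25]
inflated = inflate_column(column, r=2)
print(inflated)
```

After squaring and rescaling, the preferred neighbour holds about two thirds of the probability instead of one half, while the other two shrink from a quarter to about a sixth each.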


Expansion operation: power of the matrix, expansion of dense regions

Inflation operation: as mentioned above, elimination of unfavoured regions


Input: A, the adjacency matrix.

Initialize M to $M_G$, the canonical transition matrix: M := $M_G$ := $(A + I) D^{-1}$

Expand: M := M*M
◦ Enhances flow to well-connected nodes as well as to new nodes.

Inflate: M := M.^r (r usually 2), then renormalize columns
◦ Increases inequality in each column. “Rich get richer, poor get poorer.”

Prune
◦ Saves memory by removing entries close to zero.

Converged?
◦ No: go back to Expand
◦ Yes: output clusters
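
The loop can be sketched in plain Python as below. This is an illustrative reading of the flowchart, not a reference implementation: the function names, the pruning threshold `prune_below`, the tolerance `tol`, the cap `max_iter`, the extra column rescaling after pruning, and the two-triangle example graph are all assumptions.

```python
def mat_mul(a, b):
    """Standard matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def normalize_columns(m):
    """Rescale every column to sum to 1 (all-zero columns stay zero)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def mcl(adj, r=2, prune_below=1e-6, max_iter=100, tol=1e-9):
    """Expand, inflate and prune until the matrix stops changing."""
    n = len(adj)
    # Initialize: add self-loops (A + I), then normalize columns.
    m = normalize_columns([[adj[i][j] + (1 if i == j else 0) for j in range(n)]
                           for i in range(n)])
    for _ in range(max_iter):
        expanded = mat_mul(m, m)                       # Expand: M := M*M
        inflated = normalize_columns(                  # Inflate: M := M.^r
            [[v ** r for v in row] for row in expanded])
        pruned = [[v if v >= prune_below else 0.0 for v in row]
                  for row in inflated]                 # Prune tiny entries
        new_m = normalize_columns(pruned)              # assumed re-scaling step
        change = max(abs(new_m[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = new_m
        if change < tol:                               # Converged?
            break
    return m


# Hypothetical input: triangles {0,1,2} and {3,4,5} joined by the edge 2 - 3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

flow = mcl(adj)
# Each row that still carries flow lists the nodes attracted to that row's node.
clusters = sorted({tuple(j for j, v in enumerate(row) if v > 0.5)
                   for row in flow} - {()})
print(clusters)
```

Dense matrices are used here only to keep the sketch short; the pruning step matters most when the matrix is stored sparsely, which this example does not attempt.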


http://www.micans.org/mcl/ani/mcl-animation.html


Find attractors: node a is an attractor if $M_{aa}$ is nonzero.

Find attractor systems: if a is an attractor, then the set of its neighbours is called an attractor system.

If a node has an arc to any node of an attractor system, that node belongs to the same cluster as that attractor system.
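
A simplified sketch of this interpretation step is shown below. It assumes a converged column-stochastic matrix in which column j holds the flow out of node j, and it treats every attractor on its own rather than first grouping attractors into attractor systems. The function name `interpret` and the hand-made 4×4 limit matrix are hypothetical.

```python
def interpret(m, eps=1e-9):
    """Read attractors and (possibly overlapping) clusters from a limit matrix."""
    n = len(m)
    # A node a is an attractor if its diagonal entry M[a][a] is nonzero.
    attractors = [a for a in range(n) if m[a][a] > eps]
    clusters = []
    for a in attractors:
        # Node j joins the cluster of a if j sends flow to a (entry M[a][j]).
        members = [j for j in range(n) if m[a][j] > eps]
        if members not in clusters:
            clusters.append(members)
    return attractors, clusters


# Hypothetical limit matrix: nodes 0 and 3 are attractors, node 1 flows to 0,
# and node 2 splits its flow evenly between the two attractors.
limit = [
    [1.0, 1.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 1.0],
]

attractors, clusters = interpret(limit)
print(attractors)
print(clusters)
```

Node 2 appears in both clusters, which is how overlapping clusters of the kind listed in the next example arise.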


Attractor set = {1,2,3,4,5,6,7,8,9,10}
The attractor systems are {1,2,3}, {4,5,6,7}, {8,9}, {10}
The overlapping clusters are {1,2,3,11,12,15}, {4,5,6,7,13}, {8,9,12,13,14,15}, {10,12,13}


How many steps are required before the algorithm converges to an idempotent matrix?

The number is typically somewhere between 10 and 100

The effect of inflation on cluster granularity


MCL simulates random walks on a graph to find clusters

Expansion promotes dense regions while inflation demotes the less favoured regions

There is an intrinsic relationship between the MCL result and the cluster structure


Scalable Graph Clustering using Stochastic Flows

The original algorithm for clustering graphs using stochastic flows.

Advantages:
• Simple and elegant.
• Widely used in Bioinformatics because of its noise tolerance and effectiveness.

Disadvantages:
• Very slow.
  - Takes 1.2 hours to cluster a 76K node social network.
• Prone to output too many clusters.
  - Produces 1416 clusters on a 4741 node PPI network.
