Kavosh: a new algorithm for finding network motifs


Jin Chen, Fall 2012

Michigan State University

Motivation of this paper

• It presents a new algorithm for finding size-k network motifs in directed or undirected networks, using less memory and CPU time than other algorithms

• Input: A large directed or undirected network

• Output: Network motifs which occur in the input network.

Basic Terminology

Slides adapted from Shalev Itzkovitz’s talk given at IPAM, UCLA, in July 2005


Transcriptional regulation of a gene by a protein


Or Motifs


Definition of Network Motifs

• Patterns that occur in a real network significantly more often than in randomized networks are called NETWORK MOTIFS.

• Randomized networks: networks with the same characteristics as the real network (for example, the same vertex degrees), but in which the edges between nodes are placed at random.


R. Milo et al. Science 2002; vol 298:824-827

Existing algorithms

• mFinder: size 3-4, directed and undirected

• PAJEK: size 3-4, directed and undirected, with visualization

• FANMOD: size 8, directed and undirected, sampling, with visualization

• NeMoFinder: size 13, undirected

Kavosh consists of 4 steps

• Enumeration: finding all subgraphs of a given size that occur in the input graph

• Classification: classifying each found subgraph into isomorphic groups

• Random graph generation: generating random networks that keep the vertex degrees of the input network (enumeration and classification are also performed on the random networks)

• Motif identification: distinguishing motifs among all found subgraphs on the basis of statistical measures

Enumeration

• All subgraphs that include a particular vertex are discovered

• Subsequently, this vertex is removed from the network, and the process is repeated for the next vertex
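The root-and-remove scheme above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation. It assumes a hypothetical adjacency format (a dict mapping each vertex to the set of its neighbours, with comparable vertex labels) and treats a directed network through its underlying undirected graph, since only connectivity matters for enumeration. Allowing only vertices with a larger label than the root is equivalent to having removed all earlier roots. Following the per-level idea of Kavosh, vertices are picked level by level from the unvisited neighbours of the previous level, and every candidate offered at a level is marked visited, so no vertex set is produced twice. The function names are hypothetical.

```python
from itertools import combinations


def enumerate_from_root(adj, root, k):
    """Connected k-vertex sets whose smallest label is `root`, each found once."""
    found = []
    visited = {root}

    def extend(parents, remainder, chosen):
        if remainder == 0:
            found.append(frozenset(chosen))
            return
        # Candidates for the next tree level: unvisited neighbours of the
        # previous level whose label is larger than the root's.
        candidates = []
        for p in parents:
            for w in sorted(adj[p]):
                if w > root and w not in visited:
                    visited.add(w)
                    candidates.append(w)
        # Take between 1 and `remainder` vertices from this level.
        for size in range(1, min(len(candidates), remainder) + 1):
            for combo in combinations(candidates, size):
                extend(combo, remainder - size, chosen + list(combo))
        # Unmark this level's candidates before returning to the caller.
        for w in candidates:
            visited.discard(w)

    extend([root], k - 1, [root])
    return found


def enumerate_subgraphs(adj, k):
    """Process roots in label order; the larger-label rule removes earlier roots."""
    found = []
    for root in sorted(adj):
        found.extend(enumerate_from_root(adj, root, k))
    return found
```

For the path 1-2-3-4, `enumerate_subgraphs({1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}, 3)` yields the vertex sets {1, 2, 3} and {2, 3, 4}, each exactly once.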


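Example of enumeration: the k-1 vertices other than the root are taken from successive levels of a tree rooted at that vertex, and each composition (ordered sum of positive integers) of k-1 fixes how many vertices come from each level. To find all size-3 induced subgraphs in G, the compositions are (1,1) and (2).

To find all size-4 induced subgraphs in G, the compositions are (1,1,1), (1,2), (2,1) and (3).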
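The compositions listed above can be generated with a short recursive helper. This is an illustrative sketch; the name `compositions` is hypothetical. Part i of each composition is read as the number of vertices taken from level i of the tree rooted at the current vertex.

```python
def compositions(n):
    """Yield every composition (ordered sum of positive integers) of n."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        # Take `first` vertices at this level, then spread the rest over deeper levels.
        for rest in compositions(n - first):
            yield (first,) + rest


k = 4
print(list(compositions(k - 1)))  # [(1, 1, 1), (1, 2), (2, 1), (3,)]
```

A positive integer n has 2^(n-1) compositions, so the number of level patterns doubles with each extra vertex in the subgraph.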

[Figure: enumeration example with compositions (1,1) and (2); panels after removing node 1, and after removing nodes 1 and 2]

Time Complexity of Enumeration

Typically, graph partition problems fall under the category of NP-hard problems. Solutions to these problems are generally derived using heuristics and approximation algorithms.

However, uniform graph partitioning or a balanced graph partition problem can be shown to be NP-complete to approximate within any finite factor. Even for special graph classes such as trees and grids, no reasonable approximation algorithms exist, unless P=NP. … When not only the number of edges between the components is approximated, but also the sizes of the components, it can be shown that no reasonable fully polynomial algorithms exist for these graphs.

from Wikipedia

Classification

• NAUTY: an algorithm for identifying isomorphic subgraphs

• NAUTY uses canonical matrix as the unique identifier of a subgraph

• Two subgraphs are isomorphic if and only if their canonical matrices are the same

Canonical Matrix and Labeling

Adjacency matrix

Node order (1,2,3,4) 0011100100010000

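• Permuting the node labels gives a new adjacency matrix.

• Reading the matrix row by row turns it into a string that represents the graph.

• Canonical labeling: the node order that yields the maximal (or minimal) string; that string is the canonical label.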

Node order (2,1,3,4) 0101001100010000
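The string-based canonical labeling described above can be illustrated with a brute-force Python sketch. It is not how NAUTY works (NAUTY prunes the search over node orders rather than trying all of them); it uses the "maximal string" convention, and the function names and the edge-set representation are assumptions for illustration. Trying all k! node orders is only practical for small subgraphs. An undirected subgraph can be handled by storing each edge in both directions.

```python
from itertools import permutations


def adjacency_string(order, edges):
    """Read the adjacency matrix row by row, with rows and columns in `order`."""
    return "".join("1" if (u, v) in edges else "0" for u in order for v in order)


def canonical_string(nodes, edges):
    """Canonical label: the maximal adjacency string over all node orders."""
    return max(adjacency_string(order, edges) for order in permutations(nodes))


# Directed example graph from the slide: 1->3, 1->4, 2->1, 2->4, 3->4
edges = {(1, 3), (1, 4), (2, 1), (2, 4), (3, 4)}
print(adjacency_string((1, 2, 3, 4), edges))  # 0011100100010000
print(adjacency_string((2, 1, 3, 4), edges))  # 0101001100010000

# Relabelling the nodes (1->4, 2->3, 3->2, 4->1) gives an isomorphic graph,
# so its canonical string is identical.
relabelled = {(4, 2), (4, 1), (3, 4), (3, 1), (2, 1)}
print(canonical_string([1, 2, 3, 4], edges) == canonical_string([1, 2, 3, 4], relabelled))  # True
```

Grouping subgraphs into isomorphism classes then amounts to using `canonical_string` as a dictionary key.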

Canonical Labeling

[Table: each subgraph and its string]

NAUTY

• The world's fastest isomorphism testing program is Nauty, by Brendan D. McKay, Professor in the Research School of Computer Science, Australian National University.

• Nauty (No AUTomorphisms, Yes?) is a set of efficient C language procedures to produce a canonically-labeled isomorph of the graph, for isomorphism testing.

• It can test most graphs of less than 100 vertices in well under a second.

• Nauty has been successfully ported to a variety of operating systems and C compilers.

http://www.cs.sunysb.edu/~algorith/implement/nauty/implement.shtml

McKay, B.D. Practical Graph Isomorphism, Congressus Numerantium, 30 (1981) 45-87

Random graph generation

• Switching operations are applied to the edges of the input network repeatedly until the network is well randomized.

This process does not change the vertex degrees.
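A degree-preserving switching step for a directed network can be sketched as follows. This uses one common form of edge switching, assumed here for illustration: two edges a→b and c→d are replaced by a→d and c→b, and the switch is skipped if it would create a self-loop or a duplicate edge, so every vertex keeps its in-degree and out-degree. The slide does not say how many switches make a network "well randomized"; `n_attempts` and the function names are hypothetical.

```python
import random
from collections import Counter


def switch_randomize(edges, n_attempts, seed=None):
    """Randomize a simple directed edge list by repeated edge switching.

    Each accepted switch turns a->b, c->d into a->d, c->b, so every vertex
    keeps its out-degree and in-degree.
    """
    rng = random.Random(seed)
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_attempts):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Reject switches that would create a self-loop...
        if a == d or c == b:
            continue
        # ...or an edge that already exists (a multi-edge).
        if (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def degree_counts(edges):
    """(out-degree, in-degree) Counters, used to check the invariant."""
    return Counter(u for u, _ in edges), Counter(v for _, v in edges)


original = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3), (2, 0), (3, 1)]
randomized = switch_randomize(original, n_attempts=100, seed=7)
print(degree_counts(randomized) == degree_counts(original))  # True
```

Rejected attempts still count toward `n_attempts`, so the number of accepted switches can be smaller than requested.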

Motif determination

• Two statistical measures

– Z-score

$$Z(G_p) = \frac{N_p - \bar{N}_R}{\sigma}$$

where $N_p$ is the number of times motif $G_p$ occurs in the input network, $\bar{N}_R$ is the mean number of times $G_p$ occurs in the random networks, and $\sigma$ is the standard deviation of those random counts. The larger the Z-score, the more significant the network motif.

– P-value: the number of random networks in which motif $G_p$ occurs more often than in the biological (input) network, divided by the total number of random networks. The P-value ranges from 0 to 1; the smaller the P-value, the more significant the network motif.
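Both measures can be computed from the count of a subgraph class in the input network and its counts in the random networks. A minimal sketch: the function names are hypothetical, the population standard deviation is an assumption (the slides do not say which estimator is used), and "more often" is read as a strict inequality.

```python
import math
import statistics


def z_score(n_real, random_counts):
    """Z = (N_p - mean of random counts) / standard deviation of random counts."""
    mean = statistics.mean(random_counts)
    sigma = statistics.pstdev(random_counts)  # population SD (an assumption)
    if sigma == 0:
        # Degenerate case: every random network has the same count.
        return 0.0 if n_real == mean else math.copysign(math.inf, n_real - mean)
    return (n_real - mean) / sigma


def p_value(n_real, random_counts):
    """Fraction of random networks in which the subgraph occurs more often."""
    return sum(1 for c in random_counts if c > n_real) / len(random_counts)


random_counts = [2, 4, 6]  # hypothetical counts from three random networks
print(z_score(10, random_counts))  # about 3.674
print(p_value(10, random_counts))  # 0.0
```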

Parameters in Kavosh

The following thresholds are used to define a network motif in the Kavosh paper:

• The frequency in the real network is larger than 4

• Using 1000 randomized networks, P-value < 0.01

• Using 1000 randomized networks, Z-score > 1.0
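The three thresholds above can be combined into a single filter. An illustrative sketch: `is_motif` is a hypothetical name, its inputs are assumed to have been computed already from the 1000 randomized networks, and the candidate data below are made up for the example.

```python
def is_motif(frequency, z_score, p_value, min_frequency=4, max_p=0.01, min_z=1.0):
    """Kavosh-style thresholds: frequency > 4, P-value < 0.01, Z-score > 1.0."""
    return frequency > min_frequency and p_value < max_p and z_score > min_z


# Hypothetical candidates: class id -> (frequency, Z-score, P-value)
candidates = {"A": (12, 3.2, 0.0), "B": (3, 5.0, 0.0), "C": (40, 0.8, 0.2)}
motifs = [cid for cid, stats in candidates.items() if is_motif(*stats)]
print(motifs)  # ['A']
```

All three conditions must hold at once: "B" fails the frequency test and "C" fails the Z-score and P-value tests.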

Performance

E. coli gene regulatory network

Number of nodes: 672; number of edges: 1276


Contribution

• Designed a new algorithm that finds network motifs in both directed and undirected networks and can handle motif sizes larger than 8

• A new method to enumerate all the subgraphs

Discussions

• Are there better algorithms for isomorphism testing?

(LEDA Technical report)


• What is the bottleneck of this algorithm?

– Enumeration of subgraphs: the number of vertex combinations grows exponentially

– Calculation of the canonical labeling for all the subgraphs


• Is it unbelievable?

