Detecting Network Motifs in Gene Co-expression Networks

Detecting Network Motifs in Gene Co-expression

Networks

Shahar ArtziShahar Artzi

OutlineOutline

IntroductionIntroduction to Network Motifs

Introduction Introduction to Detecting Network Motifs

Co-expression NetworkCo-expression Network

Network Motif DiscoveryNetwork Motif Discovery

Scale-Free NetworksScale-Free Networks

ResultResult

DiscussionDiscussion

Network MotifsNetwork MotifsWhat is “Network Motifs” ? What is “Network Motifs” ?

Network Motifs are defined as patterns ofNetwork Motifs are defined as patterns of

interconnections that recur in manyinterconnections that recur in manydifferent parts of a network at frequenciesdifferent parts of a network at frequenciesmuch higher than those found inmuch higher than those found inrandomized networks.randomized networks.

Why we need them?Why we need them? * To help us understand how biological networks work.* To help us understand how biological networks work.

* Exact forecasting of operation and reaction in the * Exact forecasting of operation and reaction in the

network under given situations. network under given situations.

The concept of “network motifs” was first proposed by Uri Alon’s group :

Schematic View of Network Motif DetectionSchematic View of Network Motif Detection

Examples for motifsExamples for motifs

• FeedForward Loop FeedForward Loop ((היזון קדמיהיזון קדמי))Found in neural networks.Found in neural networks.It seems to be Used for neutralize It seems to be Used for neutralize ““Biological Noise”.Biological Noise”.

• Single-Input ModuleSingle-Input Module ((פס יצורפס יצור))

Implemented in gene control networksImplemented in gene control networks

Examples for motifsExamples for motifs

• Parallel pathsParallel paths

Found in neural networks.Found in neural networks.(and not so much in gene networks)(and not so much in gene networks)

INTRODUCTIONINTRODUCTIONto Detecting Network Motifs

Functionally related genes could be clustered together based on similar expression profiles.

Problem :

General clustering algorithms produce clusters ofrelatively large size, making it difficult to test the cluster of interest using wet-lab experiments.General clustering algorithms do not provide reasonably detailed information about the relationship among genes in a cluster.This makes it even more difficult for individual researchers to verify the predictions experimentally.

We need a way to break big clusters into smaller ones and thereby provide more detailed insights into relationships among genes within subclusters.

To achieve this goal:

we need additional independent information.

In this research they used Protein sequence information

The algorithms are able to decompose the clusters of genes into smaller ones by integrating protein domain information into the clustering algorithm.

INTRODUCTIONINTRODUCTIONto Detecting Network Motifs

Co-expression NetworkCo-expression Network

The genes are vertices (nodes).

An unweighted, and undirected edge (connection) is put between two genes if they are co-expressed with correlation higher than a specified threshold.

Correlation MatrixCorrelation Matrix AdjacencyAdjacency Matrix Matrix

Cutoff : 0.8Cutoff : 0.8

Network Motif DiscoveryNetwork Motif Discovery

Here we extended the concept of network motif (Alon’s group proposal) to the labeled graph by also considering patterns of vertex labels.

NetworkNetwork Motif Discovery Motif DiscoveryFigure B shows 7 hypothetical genes in a co-expression

network forming three distinct instances of the network motif as described in Figure A.

II and III are overlappingII and III are overlappingI and II are non-overlappingI and II are non-overlapping(I and III are also non – overlapping)(I and III are also non – overlapping)

Network Motif DiscoveryNetwork Motif Discovery Schematic View

< G, k, f >

Enumeration of k-vertex cliques

Groups of cliques

Network motifs

Protein domainProtein domain

f := number of non-overlapping f := number of non-overlapping cliquescliques

Network Motif DiscoveryNetwork Motif DiscoveryScoring Method

First step :First step :Starting from a labeled graph whose vertices (genes) were labeled with their corresponding annotated Pfam protein domain information.

A generic clique-based clustering algorithm was applied to this labeled co-expression network to search for patterns of highly co-expressed genes or network motifs.

We’ll go over the clustering algorithm briefly.The algorithm work in Scoring Method :


Our first step is to determine which genes appear to discriminate best among sample types.

To accomplish this, a discrimination score is calculated for each gene.

Only the best genes (those with the highest scores) are retained for subsequent steps.

Subtracting the sum of the standard deviations of a gene within each group allows us to eliminate, or at least diminish, the importance of any gene whose expression levels vary excessively.

The data is obtained in an n x m matrix.


The algorithm can be described in pidgin ALGOL as follows:

From : “A COMBINATORIAL APPROACH TO THE ANALYSIS OF DIFFERENTIAL GENE EXPRESSION DATA The Use of Graph Algorithms for Disease Prediction and Screening”*Michael A. Langston1, Lan Lin1, Xinxia Peng2, Nicole E. Baldwin1, Christopher T. Symons1, Bing Zhang3 and Jay R. Snoddy3


Alternative approach :

Elimination of outliers before computing the scores -

outliers might affect both the median and the standard deviation.

we modified our approach by adding a screening phase, in which we first compute the medians and the standard deviations for each gene within each group, then check the expression values corresponding to that gene, discarding those at least three standard deviations away from the group median.

We subsequently re-compute all medians and standard deviations using only the retained values.


We describe this modified algorithm in pidgin ALGOL:

continuance continuance

From : “A COMBINATORIAL APPROACH TO THE ANALYSIS OF DIFFERENTIAL GENE EXPRESSION DATA The Use of Graph Algorithms for Disease Prediction and Screening”*Michael A. Langston1, Lan Lin1, Xinxia Peng2, Nicole E. Baldwin1, Christopher T. Symons1, Bing Zhang3 and Jay R. Snoddy3


Algorithm cont’ : Algorithm cont’ :

Network Motif DiscoveryNetwork Motif Discovery Subsequent Steps

For a specified k, we scanned all k-vertex

cliques and grouped all cliques found based on the protein domain information.

Next a parameter f specifying the minimum number of mutual non-overlapping instances in a network motif was used to trim the list of putative network motifs.

Only the putative network motifs having at least

f non-overlapping instances were kept as network motifs.

We further assessed the statistical significance of each

detected network motif by comparison to randomized networks.

Network Motif DiscoveryNetwork Motif Discovery Subsequent Steps

Starting from the real co-expression network, we generated a randomized network by randomly permuting the domain labels of all genes while leaving the connection structure of the graph untouched, and then ran the same network motif detection procedure on the resulting randomized network.

This process was repeated 1,000 times and the percentage of times the same network motif was found in the randomized networks was defined as the p-value for the network motif.

Network Motif DiscoveryNetwork Motif Discovery domain matching levels

Here we propose only domain matching levels:

Matching level 2 : requires that two proteins have the exact same type of domain, the same number of each type of domain and all domains in the same order in the

respective protein sequences.

Matching level 4 :only requires the same type of domain.

The network motif detection procedure was run separately using different domain matching levels.

ExamplesExamples

A B C A B C1 1’

Domain Matching level 2Domain Matching level 2

Domain Matching Level 4Domain Matching Level 4

2 2’A A C AC

Scale-FreeScale-Free Networks Networks

DEFINITION:DEFINITION: Scale-free networks, including the Internet, Scale-free networks, including the Internet, are characterized by an uneven distribution of are characterized by an uneven distribution of connectedness. Instead of the nodes of these networks connectedness. Instead of the nodes of these networks having a random pattern of connections, some nodes act having a random pattern of connections, some nodes act as "very connected" hubs, a fact that dramatically as "very connected" hubs, a fact that dramatically influences the way the network operates. influences the way the network operates.

Barabasi and his colleagues mapped the connectedness Barabasi and his colleagues mapped the connectedness

of the Web.of the Web. Their experiment yielded a connectivity map that they Their experiment yielded a connectivity map that they christened "scale-free." christened "scale-free."

Scale Free Vs RandomScale Free Vs RandomRandom networks suffer from random failures because each node important as any other

"scale free" networks are more immune to random failure due to the redundancy of paths linking nodes

connectivity ensured by few highly connected nodes

"scale free" networks are prone to catastrophic failure when key "hubs"are attacked

Scale Free Vs RandomScale Free Vs Random(a) constructed by laying down N nodes and connecting each pair with probability p. This network has N = 10 and p = 0.2.

(b) A new node (red) connects to two existing nodes in the network (black) at time t + 1. This new node is much more likely to connect to highly connected nodes, a phenomenon called preferential attachment.

(c) The network connectivity can be characterized by the probability P(k) that a node has k links. For random graphs P(k) is strongly peaked at k = <k> and decays exponentially for large k.

(d) A scale-free network does not have a peak in P(k), and decays as a power law P(k) ~ k g at large k.

(e) A random network - most nodes have approximately the same number of links.

(f) The majority of nodes in a scale-free network have one or two links, but a few nodes have a large number of links; this guarantees that the system is fully connected

Scale Free Vs RandomScale Free Vs Random

• More than 60% of nodes (green) can be reached from the five most connected nodes (red) • compared with only 27% in the random network. • This demonstrates the key role that hubs play in the scale-free network.• Both networks contain the 130 nodes and 430 links.

ResultsResults

To convert a correlation matrix to a corresponding co-expression network, a suitable cutoff value for the correlation coefficient must be chosen.

Based on the previous reports that biological networks, including co-expression networks, follow a scale-free distribution of connectivities, we chose a cutoff

value which gave fewer vertices with higher degree (connectivity).

• The correlation cutoff value of 0.95 is appropriate

ResultsResultsUsing a series of values for parameters k and f, we found a number of putative network motifs under different domain matching levels

ResultsResults

As shown in Table 1, both increasing k, the size of network motifs and f, the minimum number of non overlapping instances, decrease the number of network motifs detected.

To gain further confidence in our predictions, we used yeast protein interaction data that includes rich protein complex information, and searched within this data for instances of putative malaria network motifs.

We can see (in the next slides) that more malaria network motifs were supported by yeast interaction data as the parameters became more stringent.

ResultsResults Confirmation of Prediction by Yeast Protein Interactions

Percentage of network motifs having instancein yeast PPI network.Domain matching level 2

ResultsResults Confirmation of Prediction by Yeast Protein Interactions

Percentage of network motifs having instancein yeast PPI network.Domain matching level 4

Results -Results - Example 1

This figure shows a putative network motif detected under domain matching level 2, k = 6 and f = 2. This motif consists of six highly co-expressed genes.


This figure shows the instances formed by different combinations

of 27 genes detected in the yeast protein interaction network for

the malaria network motif shown in the last figure.

• Protein complex 11635 contains six genes forming an exact instance of the predicted network motif.


We hypothesized that individual instances of a network motif could function in different locations and times, dependent upon regulation.

The malaria time series data enables

us to test this hypothesis by examining the temporal expression profiles of instances of network motifs.

Next figure shows such an example network motif detected with parameter values at domain matching level 2, k = 3 and f = 2.



Apparently these six genes all have similar expression profiles and the only major difference is the timing.There is a phase difference between two instances while all three genes within each of two instances have the same expression profiles.Having instances in yeast protein interaction data provide further support that these genes do interact directly

• More Resultshttp://mouse.ornl.gov/~xpv/camda04/index.html

DISCUSSIONDISCUSSION

massive amounts of experimental

data have been collected

New computational approaches are needed to analyze these data in an integrative way

provide more reliable results with finer resolution for experimental verification

Here we propose a new strategy to analyze gene expression data by integrating a diversity of additional information, such as:

• primary sequence information • protein interaction data.

DISCUSSIONDISCUSSION

Our approach can easily make use of cross-species information.

Biological hypothesis

Modularity of biological networks

How genes within a module are orchestrated at mRNA level is our next research question.

THE THE ENDEND

ReferencesReferences

Detecting Network Motifs in Gene Co-expression Networks Xinxia Peng et al

http://www.weizmann.ac.il/mcb/UriAlon/index.htmlNetwork Motifs: Simple Building Blocks of Complex Networks R. Milo et al

Documents

Detecting Network Motifs in Gene Co-expression Networks