41
Detecting Network Motifs in Gene Co- expression Networks Shahar Artzi Shahar Artzi

Detecting Network Motifs in Gene Co-expression Networks

  • Upload
    lynch

  • View
    63

  • Download
    0

Embed Size (px)

DESCRIPTION

Detecting Network Motifs in Gene Co-expression Networks. Shahar Artzi. Outline. Introduction to Network Motifs Introduction to Detecting Network Motifs Co-expression Network Network Motif Discovery Scale-Free Networks Result Discussion. Network Motifs. What is “Network Motifs” ? - PowerPoint PPT Presentation

Citation preview

Page 1: Detecting Network Motifs in Gene Co-expression Networks

Detecting Network Motifs in Gene Co-expression

Networks

Shahar ArtziShahar Artzi

Page 2: Detecting Network Motifs in Gene Co-expression Networks

OutlineOutline

IntroductionIntroduction to Network Motifs

Introduction Introduction to Detecting Network Motifs

Co-expression NetworkCo-expression Network

Network Motif DiscoveryNetwork Motif Discovery

Scale-Free NetworksScale-Free Networks

ResultResult

DiscussionDiscussion

Page 3: Detecting Network Motifs in Gene Co-expression Networks

Network MotifsNetwork MotifsWhat is “Network Motifs” ? What is “Network Motifs” ?

Network Motifs are defined as patterns ofNetwork Motifs are defined as patterns of

interconnections that recur in manyinterconnections that recur in manydifferent parts of a network at frequenciesdifferent parts of a network at frequenciesmuch higher than those found inmuch higher than those found inrandomized networks.randomized networks.

Why we need them?Why we need them? * To help us understand how biological networks work.* To help us understand how biological networks work.

* Exact forecasting of operation and reaction in the * Exact forecasting of operation and reaction in the

network under given situations. network under given situations.

Page 4: Detecting Network Motifs in Gene Co-expression Networks

The concept of “network motifs” was first proposed by Uri Alon’s group :

Page 5: Detecting Network Motifs in Gene Co-expression Networks

Schematic View of Network Motif DetectionSchematic View of Network Motif Detection

Page 6: Detecting Network Motifs in Gene Co-expression Networks

Examples for motifsExamples for motifs

• FeedForward Loop FeedForward Loop ((היזון קדמיהיזון קדמי))Found in neural networks.Found in neural networks.It seems to be Used for neutralize It seems to be Used for neutralize ““Biological Noise”.Biological Noise”.

• Single-Input ModuleSingle-Input Module ((פס יצורפס יצור))

Implemented in gene control networksImplemented in gene control networks

Page 7: Detecting Network Motifs in Gene Co-expression Networks

Examples for motifsExamples for motifs

• Parallel pathsParallel paths

Found in neural networks.Found in neural networks.(and not so much in gene networks)(and not so much in gene networks)

Page 8: Detecting Network Motifs in Gene Co-expression Networks

INTRODUCTIONINTRODUCTIONto Detecting Network Motifs

Functionally related genes could be clustered together based on similar expression profiles.

Problem :

General clustering algorithms produce clusters ofrelatively large size, making it difficult to test the cluster of interest using wet-lab experiments.General clustering algorithms do not provide reasonably detailed information about the relationship among genes in a cluster.This makes it even more difficult for individual researchers to verify the predictions experimentally.

Page 9: Detecting Network Motifs in Gene Co-expression Networks

We need a way to break big clusters into smaller ones and thereby provide more detailed insights into relationships among genes within subclusters.

To achieve this goal:

we need additional independent information.

In this research they used Protein sequence information

The algorithms are able to decompose the clusters of genes into smaller ones by integrating protein domain information into the clustering algorithm.

INTRODUCTIONINTRODUCTIONto Detecting Network Motifs

Page 10: Detecting Network Motifs in Gene Co-expression Networks

Co-expression NetworkCo-expression Network

The genes are vertices (nodes).

An unweighted, and undirected edge (connection) is put between two genes if they are co-expressed with correlation higher than a specified threshold.

Correlation MatrixCorrelation Matrix AdjacencyAdjacency Matrix Matrix

Cutoff : 0.8Cutoff : 0.8

Page 11: Detecting Network Motifs in Gene Co-expression Networks

Network Motif DiscoveryNetwork Motif Discovery

Here we extended the concept of network motif (Alon’s group proposal) to the labeled graph by also considering patterns of vertex labels.

Page 12: Detecting Network Motifs in Gene Co-expression Networks

NetworkNetwork Motif Discovery Motif DiscoveryFigure B shows 7 hypothetical genes in a co-expression

network forming three distinct instances of the network motif as described in Figure A.

II and III are overlappingII and III are overlappingI and II are non-overlappingI and II are non-overlapping(I and III are also non – overlapping)(I and III are also non – overlapping)

Page 13: Detecting Network Motifs in Gene Co-expression Networks

Network Motif DiscoveryNetwork Motif Discovery Schematic View

< G, k, f >

Enumeration of k-vertex cliques

Groups of cliques

Network motifs

Protein domainProtein domain

f := number of non-overlapping f := number of non-overlapping cliquescliques

Page 14: Detecting Network Motifs in Gene Co-expression Networks

Network Motif DiscoveryNetwork Motif DiscoveryScoring Method

First step :First step :Starting from a labeled graph whose vertices (genes) were labeled with their corresponding annotated Pfam protein domain information.

A generic clique-based clustering algorithm was applied to this labeled co-expression network to search for patterns of highly co-expressed genes or network motifs.

We’ll go over the clustering algorithm briefly.The algorithm work in Scoring Method :

Page 15: Detecting Network Motifs in Gene Co-expression Networks

Network Motif DiscoveryNetwork Motif DiscoveryScoring Method

Our first step is to determine which genes appear to discriminate best among sample types.

To accomplish this, a discrimination score is calculated for each gene.

Only the best genes (those with the highest scores) are retained for subsequent steps.

Subtracting the sum of the standard deviations of a gene within each group allows us to eliminate, or at least diminish, the importance of any gene whose expression levels vary excessively.

The data is obtained in an n x m matrix.

Page 16: Detecting Network Motifs in Gene Co-expression Networks

Network Motif DiscoveryNetwork Motif DiscoveryScoring Method

The algorithm can be described in pidgin ALGOL as follows:

From : “A COMBINATORIAL APPROACH TO THE ANALYSIS OF DIFFERENTIAL GENE EXPRESSION DATA The Use of Graph Algorithms for Disease Prediction and Screening”*Michael A. Langston1, Lan Lin1, Xinxia Peng2, Nicole E. Baldwin1, Christopher T. Symons1, Bing Zhang3 and Jay R. Snoddy3

Page 17: Detecting Network Motifs in Gene Co-expression Networks

Network Motif DiscoveryNetwork Motif DiscoveryScoring Method

Alternative approach :

Elimination of outliers before computing the scores -

outliers might affect both the median and the standard deviation.

we modified our approach by adding a screening phase, in which we first compute the medians and the standard deviations for each gene within each group, then check the expression values corresponding to that gene, discarding those at least three standard deviations away from the group median.

We subsequently re-compute all medians and standard deviations using only the retained values.

Page 18: Detecting Network Motifs in Gene Co-expression Networks

Network Motif DiscoveryNetwork Motif DiscoveryScoring Method

We describe this modified algorithm in pidgin ALGOL:

continuance continuance

From : “A COMBINATORIAL APPROACH TO THE ANALYSIS OF DIFFERENTIAL GENE EXPRESSION DATA The Use of Graph Algorithms for Disease Prediction and Screening”*Michael A. Langston1, Lan Lin1, Xinxia Peng2, Nicole E. Baldwin1, Christopher T. Symons1, Bing Zhang3 and Jay R. Snoddy3

Page 19: Detecting Network Motifs in Gene Co-expression Networks

Network Motif DiscoveryNetwork Motif DiscoveryScoring Method

Algorithm cont’ : Algorithm cont’ :

Page 20: Detecting Network Motifs in Gene Co-expression Networks

Network Motif DiscoveryNetwork Motif Discovery Subsequent Steps

For a specified k, we scanned all k-vertex

cliques and grouped all cliques found based on the protein domain information.

Next a parameter f specifying the minimum number of mutual non-overlapping instances in a network motif was used to trim the list of putative network motifs.

Only the putative network motifs having at least

f non-overlapping instances were kept as network motifs.

We further assessed the statistical significance of each

detected network motif by comparison to randomized networks.

Page 21: Detecting Network Motifs in Gene Co-expression Networks

Network Motif DiscoveryNetwork Motif Discovery Subsequent Steps

Starting from the real co-expression network, we generated a randomized network by randomly permuting the domain labels of all genes while leaving the connection structure of the graph untouched, and then ran the same network motif detection procedure on the resulting randomized network.

This process was repeated 1,000 times and the percentage of times the same network motif was found in the randomized networks was defined as the p-value for the network motif.

Page 22: Detecting Network Motifs in Gene Co-expression Networks

Network Motif DiscoveryNetwork Motif Discovery domain matching levels

Here we propose only domain matching levels:

Matching level 2 : requires that two proteins have the exact same type of domain, the same number of each type of domain and all domains in the same order in the

respective protein sequences.

Matching level 4 :only requires the same type of domain.

The network motif detection procedure was run separately using different domain matching levels.

Page 23: Detecting Network Motifs in Gene Co-expression Networks

ExamplesExamples

A B C A B C1 1’

Domain Matching level 2Domain Matching level 2

Domain Matching Level 4Domain Matching Level 4

2 2’A A C AC

Page 24: Detecting Network Motifs in Gene Co-expression Networks

Scale-FreeScale-Free Networks Networks

DEFINITION:DEFINITION: Scale-free networks, including the Internet, Scale-free networks, including the Internet, are characterized by an uneven distribution of are characterized by an uneven distribution of connectedness. Instead of the nodes of these networks connectedness. Instead of the nodes of these networks having a random pattern of connections, some nodes act having a random pattern of connections, some nodes act as "very connected" hubs, a fact that dramatically as "very connected" hubs, a fact that dramatically influences the way the network operates. influences the way the network operates.

Barabasi and his colleagues mapped the connectedness Barabasi and his colleagues mapped the connectedness

of the Web.of the Web. Their experiment yielded a connectivity map that they Their experiment yielded a connectivity map that they christened "scale-free." christened "scale-free."

Page 25: Detecting Network Motifs in Gene Co-expression Networks

Scale Free Vs RandomScale Free Vs RandomRandom networks suffer from random failures because each node important as any other

"scale free" networks are more immune to random failure due to the redundancy of paths linking nodes

connectivity ensured by few highly connected nodes

"scale free" networks are prone to catastrophic failure when key "hubs"are attacked

Page 26: Detecting Network Motifs in Gene Co-expression Networks

Scale Free Vs RandomScale Free Vs Random(a) constructed by laying down N nodes and connecting each pair with probability p. This network has N = 10 and p = 0.2.

(b) A new node (red) connects to two existing nodes in the network (black) at time t + 1. This new node is much more likely to connect to highly connected nodes, a phenomenon called preferential attachment.

(c) The network connectivity can be characterized by the probability P(k) that a node has k links. For random graphs P(k) is strongly peaked at k = <k> and decays exponentially for large k.

(d) A scale-free network does not have a peak in P(k), and decays as a power law P(k) ~ k g at large k.

(e) A random network - most nodes have approximately the same number of links.

(f) The majority of nodes in a scale-free network have one or two links, but a few nodes have a large number of links; this guarantees that the system is fully connected

Page 27: Detecting Network Motifs in Gene Co-expression Networks

Scale Free Vs RandomScale Free Vs Random

• More than 60% of nodes (green) can be reached from the five most connected nodes (red) • compared with only 27% in the random network. • This demonstrates the key role that hubs play in the scale-free network.• Both networks contain the 130 nodes and 430 links.

Page 28: Detecting Network Motifs in Gene Co-expression Networks

ResultsResults

To convert a correlation matrix to a corresponding co-expression network, a suitable cutoff value for the correlation coefficient must be chosen.

Based on the previous reports that biological networks, including co-expression networks, follow a scale-free distribution of connectivities, we chose a cutoff

value which gave fewer vertices with higher degree (connectivity).

• The correlation cutoff value of 0.95 is appropriate

Page 29: Detecting Network Motifs in Gene Co-expression Networks

ResultsResultsUsing a series of values for parameters k and f, we found a number of putative network motifs under different domain matching levels

Page 30: Detecting Network Motifs in Gene Co-expression Networks

ResultsResults

As shown in Table 1, both increasing k, the size of network motifs and f, the minimum number of non overlapping instances, decrease the number of network motifs detected.

To gain further confidence in our predictions, we used yeast protein interaction data that includes rich protein complex information, and searched within this data for instances of putative malaria network motifs.

We can see (in the next slides) that more malaria network motifs were supported by yeast interaction data as the parameters became more stringent.

Page 31: Detecting Network Motifs in Gene Co-expression Networks

ResultsResults Confirmation of Prediction by Yeast Protein Interactions

Percentage of network motifs having instancein yeast PPI network.Domain matching level 2

Page 32: Detecting Network Motifs in Gene Co-expression Networks

ResultsResults Confirmation of Prediction by Yeast Protein Interactions

Percentage of network motifs having instancein yeast PPI network.Domain matching level 4

Page 33: Detecting Network Motifs in Gene Co-expression Networks

Results -Results - Example 1

This figure shows a putative network motif detected under domain matching level 2, k = 6 and f = 2. This motif consists of six highly co-expressed genes.

Page 34: Detecting Network Motifs in Gene Co-expression Networks

Results -Results - Example 1

This figure shows the instances formed by different combinations

of 27 genes detected in the yeast protein interaction network for

the malaria network motif shown in the last figure.

• Protein complex 11635 contains six genes forming an exact instance of the predicted network motif.

Page 35: Detecting Network Motifs in Gene Co-expression Networks

Results -Results - Example 2

We hypothesized that individual instances of a network motif could function in different locations and times, dependent upon regulation.

The malaria time series data enables

us to test this hypothesis by examining the temporal expression profiles of instances of network motifs.

Next figure shows such an example network motif detected with parameter values at domain matching level 2, k = 3 and f = 2.

Page 36: Detecting Network Motifs in Gene Co-expression Networks

Results -Results - Example 2

Page 37: Detecting Network Motifs in Gene Co-expression Networks

Results -Results - Example 2

Apparently these six genes all have similar expression profiles and the only major difference is the timing.There is a phase difference between two instances while all three genes within each of two instances have the same expression profiles.Having instances in yeast protein interaction data provide further support that these genes do interact directly

• More Resultshttp://mouse.ornl.gov/~xpv/camda04/index.html

Page 38: Detecting Network Motifs in Gene Co-expression Networks

DISCUSSIONDISCUSSION

massive amounts of experimental

data have been collected

New computational approaches are needed to analyze these data in an integrative way

provide more reliable results with finer resolution for experimental verification

Here we propose a new strategy to analyze gene expression data by integrating a diversity of additional information, such as:

• primary sequence information • protein interaction data.

Page 39: Detecting Network Motifs in Gene Co-expression Networks

DISCUSSIONDISCUSSION

Our approach can easily make use of cross-species information.

Biological hypothesis

Modularity of biological networks

How genes within a module are orchestrated at mRNA level is our next research question.

Page 40: Detecting Network Motifs in Gene Co-expression Networks

THE THE ENDEND

Page 41: Detecting Network Motifs in Gene Co-expression Networks

ReferencesReferences

Detecting Network Motifs in Gene Co-expression Networks Xinxia Peng et al

http://www.weizmann.ac.il/mcb/UriAlon/index.htmlNetwork Motifs: Simple Building Blocks of Complex Networks R. Milo et al