Systems Biology, April 25th 2007, Thomas Skøt Jensen, Technical University of Denmark

Networks and Network Topology


Network Example - The Internet

http://www.jeffkennedyassociates.com:16080/connections/concept/image.html

Co-authorship at Max Planck

http://www.jeffkennedyassociates.com:16080/connections/concept/image.html

Network Measures

• Degree k_i

• Degree distribution P(k)

• Mean path length

• Network Diameter

• Clustering Coefficient

Network Analysis

• Paths: metabolic and signaling pathways

• Cliques: protein complexes

• Hubs: regulatory modules

• Subgraphs: maximally weighted

Graphs

• A graph G=(V,E) is a set of vertices V and a set of edges E

• A subgraph G’ of G is induced by some V’ ⊆ V and E’ ⊆ E

• Graph properties:

– Connectivity (node degree, paths)

– Cyclic vs. acyclic

– Directed vs. undirected

Sparse vs Dense

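• G(V, E), where |V| = n and |E| = m are the numbers of vertices and edges

• A graph is sparse if m ~ n

• A graph is dense if m ~ n^2

• A simple undirected graph is complete when m = n(n-1)/2, i.e. every pair of vertices is joined by an edge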

Connected Components

• G(V,E)

• |V| = 69

• |E| = 71

• 6 connected components
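Counting connected components can be sketched in a few lines of Python. This is a minimal illustration, assuming an undirected graph stored as a dictionary of neighbor sets; the toy graph and the function name are hypothetical and are not the 69-node network shown on the slide.

```python
from collections import deque

def connected_components(adjacency):
    """Return a list of components (sets of nodes) of an undirected graph."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        components.append(component)
    return components

# Hypothetical toy graph: a chain a-b-c, a pair d-e, and an isolated node f.
toy_graph = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "d": {"e"}, "e": {"d"},
    "f": set(),
}
components = connected_components(toy_graph)
print(len(components), "connected components")
```

An isolated node counts as a component of its own, which is why a graph with only slightly more edges than nodes can still fall apart into several pieces.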

Paths

A path is a sequence {x_1, x_2, …, x_n} such that (x_1,x_2), (x_2,x_3), …, (x_{n-1},x_n) are edges of the graph.

A closed path, i.e. one with x_n = x_1, is called a graph cycle or circuit.

Shortest-Path between nodes

Longest Shortest-Path

Small-world Network

• Every node can be reached from every other by a small number of hops or steps

• High clustering coefficient and low mean shortest-path length

– Random graphs don’t necessarily have high clustering coefficients

• Social networks, the Internet, and biological networks all exhibit small-world network characteristics

Network Representation

• Regulatory interactions (protein-DNA): gene A regulates gene B

• Functional complex (protein-protein): gene A binds gene B

• Metabolic pathways: the reaction product of gene A is a substrate for gene B

Representation of Metabolic Reactions

Network Measures: Degree

Degree Distribution

P(k) is the probability of each degree k, i.e. the fraction of nodes having that degree.

For random networks, P(k) is approximately Poisson: bell-shaped and peaked around the mean degree.

For real networks the distribution is often a power law:

$P(k) \sim k^{-\gamma}$

Such networks are said to be scale-free.
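As a small illustration of P(k) as "the fraction of nodes having that degree", the sketch below tabulates the degree distribution of a graph stored as a dictionary of neighbor sets. The star-shaped toy graph and the function name are hypothetical, chosen only to make the hub visible in the output.

```python
from collections import Counter

def degree_distribution(adjacency):
    """Map each degree k to the fraction of nodes that have degree k."""
    degrees = [len(neighbors) for neighbors in adjacency.values()]
    counts = Counter(degrees)
    n = len(degrees)
    return {k: counts[k] / n for k in sorted(counts)}

# Hypothetical toy graph: one hub connected to four leaves.
star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"},
}
p_k = degree_distribution(star)
print(p_k)
```

For a real network, plotting these values on log-log axes is the usual way to check whether the points fall roughly on a straight line, as a power law would predict.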

Interconnected Regions: Modules

Clustering Coefficient

$C_I = \frac{n_I}{\binom{k}{2}} = \frac{2 n_I}{k(k-1)}$

k: number of neighbors of node I

n_I: number of edges between node I’s neighbors

The density of the network surrounding node I, characterized as the number of triangles through I. Related to network modularity.

Example: the center node has 8 (grey) neighbors and there are 4 edges between the neighbors, so

C = 2*4 /(8*(8-1)) = 8/56 = 1/7
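The worked example on this slide (8 neighbors, 4 edges between them) can be checked with a short Python sketch that computes C = 2n/(k(k-1)) for a single node. The graph is stored as a dictionary of neighbor sets; which neighbors are joined by the 4 edges is an assumption, since the figure is not available here, but the result depends only on the count.

```python
from fractions import Fraction
from itertools import combinations

def clustering_coefficient(adjacency, node):
    """Fraction of possible edges between `node`'s neighbors that exist."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return Fraction(0)  # no neighbor pairs, so no triangles are possible
    n_links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return Fraction(2 * n_links, k * (k - 1))

# Center node with 8 neighbors (hypothetical names n0..n7).
neighbors = [f"n{i}" for i in range(8)]
adjacency = {"center": set(neighbors)}
for name in neighbors:
    adjacency[name] = {"center"}

# Four edges between neighbors; the exact pairs are assumed.
for u, v in [("n0", "n1"), ("n2", "n3"), ("n4", "n5"), ("n6", "n7")]:
    adjacency[u].add(v)
    adjacency[v].add(u)

c_center = clustering_coefficient(adjacency, "center")
print(c_center)
```

Using `Fraction` keeps the result exact, so it can be compared directly with the hand calculation.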

Hierarchical Networks

Detecting Hierarchical Organization

Knock-out Lethality and Connectivity

[Figure: log-log plot of P(k) against degree k, with power-law fit $y = 1.2\,x^{-1.91}$]

[Figure: % essential genes against degree k]


Target the hubs to have an efficient safe sex education campaign

Lewin Bo, et al., Sex i Sverige; Om sexuallivet i Sverige 1996, Folkhälsoinstitutet, 1998

Scale-Free Networks are Robust

• Complex systems (cells, the Internet, social networks) are resilient to component failure

• Network topology plays an important role in this robustness

– Even if ~80% of randomly chosen nodes fail, the remaining ~20% still maintain network connectivity

• Attack vulnerability if hubs are selectively targeted

• In yeast, only ~20% of proteins are lethal when deleted, and these essential proteins are 5 times more likely to have degree k>15 than k<5.
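The contrast between random failure and targeted attack can be illustrated on a tiny hub-and-leaf graph. This is only a sketch under stated assumptions: the 10-node graph, the node names, and the choice of "fraction of nodes in the largest connected component" as the connectivity measure are all hypothetical and not taken from the slides.

```python
def largest_component_fraction(adjacency, removed):
    """Fraction of the original nodes left in the largest connected
    component after the nodes in `removed` are deleted."""
    remaining = set(adjacency) - set(removed)
    seen = set()
    best = 0
    for start in remaining:
        if start in seen:
            continue
        # Depth-first search restricted to the surviving nodes.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor in remaining and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best / len(adjacency)

# Two connected hubs, each with four leaves (10 nodes in total).
hub_graph = {"h1": {"h2"}, "h2": {"h1"}}
for hub in ("h1", "h2"):
    for i in range(4):
        leaf = f"{hub}_leaf{i}"
        hub_graph[hub].add(leaf)
        hub_graph[leaf] = {hub}

after_leaf_failure = largest_component_fraction(hub_graph, {"h1_leaf0"})
after_hub_attack = largest_component_fraction(hub_graph, {"h1", "h2"})
print(after_leaf_failure, after_hub_attack)
```

Losing a low-degree node barely changes the network, while removing the two hubs leaves only isolated nodes. Most nodes in a scale-free network are low-degree, so a random failure is far more likely to hit a leaf than a hub.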

Other Interesting Features

• Cellular networks are disassortative: hubs tend not to interact directly with other hubs.

• Hubs tend to be “older” proteins (so far claimed for protein-protein interaction networks only)

• Hubs also seem to be under more evolutionary pressure: their protein sequences are more conserved than average between species (shown in yeast vs. worm)

• Experimentally determined protein complexes tend to contain solely essential or solely non-essential proteins, which is further evidence for modularity.

Summary: Network Measures

• Degree k_i: the number of edges involving node i

• Degree distribution P(k): the probability (frequency) of nodes of degree k

• Mean path length: the average shortest path between all node pairs

• Network diameter: the longest shortest path

• Clustering coefficient: a high CC is found for modules
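Mean path length and diameter both come from the same table of shortest-path distances. The sketch below computes them with breadth-first search on an unweighted, undirected graph stored as a dictionary of neighbor sets. The function names and the four-node chain are hypothetical, and unreachable pairs are simply skipped, which is one possible convention among several.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Shortest-path length (in edges) from `source` to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

def path_length_statistics(adjacency):
    """Return (mean shortest-path length, diameter) over reachable node pairs."""
    lengths = []
    for source in adjacency:
        for target, d in bfs_distances(adjacency, source).items():
            if target != source:
                lengths.append(d)
    return sum(lengths) / len(lengths), max(lengths)

# Hypothetical toy graph: a chain a-b-c-d.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
degrees = {node: len(neighbors) for node, neighbors in chain.items()}
mean_length, diameter = path_length_statistics(chain)
print(degrees, mean_length, diameter)
```

Each unordered pair is visited twice (once from each end), which leaves both the mean and the maximum unchanged.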

Finding Overrepresented Motifs

Metabolic and Transcription Factor Networks

Overrepresented Motifs

Identifying protein complexes in protein-protein interaction networks

Identifying protein complexes from PPI data

Barabasi & Oltvai, Nature Reviews, 2004

Identifying protein complexes from protein-protein interaction data requires computational tools.

The MCODE algorithm

MCODE: Molecular Complex Detection

The three steps of MCODE

1. Vertex weighting

2. Complex prediction

3. Post-processing

Vertex (node) weighting

1. Find neighbors

2. Get highest k-core graph

3. Calculate density of k-core graph

4. Calculate vertex (node) weight: Density * kmax

K-core graph: a graph of minimal degree k, i.e. all nodes must have at least k connections.

Density: the number of observed edges, E, divided by the total number of possible edges, Emax, where

Emax = V (V-1)/2 (networks without loops)
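The four vertex-weighting steps can be sketched as follows. This is a simplified illustration and not the reference MCODE implementation: the function names are hypothetical, the graph is a dictionary of neighbor sets, and the node itself is assumed to be included in its own neighborhood.

```python
def highest_k_core(adjacency, nodes):
    """Return (k, core) for the highest k-core of the subgraph induced by `nodes`."""
    best_k, best_core = 0, set(nodes)
    core = set(nodes)
    k = 1
    while core:
        # Repeatedly drop nodes with fewer than k neighbors inside the core.
        changed = True
        while changed:
            low = {n for n in core if len(adjacency[n] & core) < k}
            core -= low
            changed = bool(low)
        if core:
            best_k, best_core = k, set(core)
        k += 1
    return best_k, best_core

def density(adjacency, nodes):
    """Observed edges divided by Emax = V(V-1)/2 (no self-loops)."""
    v = len(nodes)
    if v < 2:
        return 0.0
    edges = sum(len(adjacency[n] & nodes) for n in nodes) // 2
    return edges / (v * (v - 1) / 2)

def vertex_weight(adjacency, node):
    neighborhood = adjacency[node] | {node}                  # step 1 (assumed to include the node)
    k_max, core = highest_k_core(adjacency, neighborhood)   # step 2
    return density(adjacency, core) * k_max                 # steps 3 and 4

# Hypothetical toy graph: a triangle a-b-c with a pendant node p attached to a.
graph = {"a": {"b", "c", "p"}, "b": {"a", "c"}, "c": {"a", "b"}, "p": {"a"}}
weights = {node: vertex_weight(graph, node) for node in graph}
print(weights)
```

Nodes inside the triangle get weight 2 (a fully connected 2-core), while the pendant node only reaches a 1-core and gets weight 1, so densely connected regions stand out.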

Molecular complex prediction

Complex prediction

1. Seed the complex with the highest-weighted node

2. Include neighbors if the vertex weight is above threshold (VWP)

3. Repeat step 2 until no more nodes can be included
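The seed-and-grow procedure above might look like the sketch below. It assumes that vertex weights have already been computed and that the inclusion threshold is the seed weight reduced by the fraction VWP; that reading of the threshold, the function name, and the toy weights are assumptions for illustration only.

```python
def predict_complex(adjacency, weights, seed, vwp=0.2):
    """Grow a complex outward from `seed`, adding neighbors whose weight
    exceeds the threshold (assumed here to be seed weight * (1 - vwp))."""
    threshold = weights[seed] * (1 - vwp)
    complex_nodes = {seed}
    frontier = [seed]
    while frontier:  # step 3: repeat until nothing more can be added
        node = frontier.pop()
        for neighbor in adjacency[node]:
            if neighbor not in complex_nodes and weights[neighbor] > threshold:
                complex_nodes.add(neighbor)  # step 2: include the neighbor
                frontier.append(neighbor)
    return complex_nodes

# Hypothetical toy graph and precomputed weights.
graph = {"a": {"b", "c", "p"}, "b": {"a", "c"}, "c": {"a", "b"}, "p": {"a"}}
weights = {"a": 2.0, "b": 2.0, "c": 2.0, "p": 1.0}

seed = max(weights, key=weights.get)  # step 1: highest-weighted node
predicted = predict_complex(graph, weights, seed)
print(sorted(predicted))
```

A fuller implementation would also mark nodes as used so that later seeds do not rebuild the same complex; that bookkeeping is left out here.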

Post-processing

Complex post-processing

1. Complexes must contain at least a 2-core graph

2. Include neighbors if the vertex weight is above the fluff parameter (optional)

3. Haircut: Remove nodes with a degree less than two (optional)
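The haircut step can be sketched as a pruning loop. Whether pruning is applied once or repeated until no low-degree nodes remain is not stated on the slide; the version below assumes it is repeated, which leaves the 2-core of the complex. The function name and toy graph are hypothetical.

```python
def haircut(adjacency, complex_nodes):
    """Remove nodes with fewer than two connections inside the complex,
    repeating until no such node remains (assumed behavior)."""
    kept = set(complex_nodes)
    while True:
        dangling = {n for n in kept if len(adjacency[n] & kept) < 2}
        if not dangling:
            return kept
        kept -= dangling

# Hypothetical complex: a triangle a-b-c with a tail a-p-q.
graph = {
    "a": {"b", "c", "p"}, "b": {"a", "c"}, "c": {"a", "b"},
    "p": {"a", "q"}, "q": {"p"},
}
trimmed = haircut(graph, {"a", "b", "c", "p", "q"})
print(sorted(trimmed))
```

A complex that is only a chain of nodes is pruned away entirely, which matches the requirement that a complex contain at least a 2-core.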

Identifying active subgraphs

Active Subgraphs

Find high-scoring subnetworks based on data integration

Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signaling circuits in molecular interaction networks. Bioinformatics. 2002;18 Suppl 1:S233-40.

Scoring a Sub-graph

Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signaling circuits in molecular interaction networks. Bioinformatics. 2002;18 Suppl 1:S233-40.

Significance Assessment of Active Module

Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signaling circuits in molecular interaction networks. Bioinformatics. 2002;18 Suppl 1:S233-40.

Score distributions for the 1st - 5th best scoring modules before (blue) and after (red) randomizing Z-scores (“states”). Randomization disrupts correlation between gene expression and network location.

Finding “Active” Pathways in a Large Network is Hard

• Finding the highest-scoring subnetwork is NP-hard, so we use heuristic search algorithms to identify a collection of high-scoring subnetworks (local optima)

• Simulated annealing and/or greedy search starting from an initial subnetwork “seed”

• Considerations: Local topology, sub-network score significance (is score higher than would be expected at random?), multiple states (conditions)
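A greedy search from a seed can be sketched as below. The slides do not give the scoring formula, so this sketch assumes an aggregate score of the form sum(z) / sqrt(k) over the k nodes of the subnetwork and leaves out any calibration against random node sets. The z-scores, graph, and function names are hypothetical.

```python
import math

def aggregate_score(z_scores, nodes):
    """Assumed subnetwork score: sum of node z-scores divided by sqrt(k)."""
    return sum(z_scores[n] for n in nodes) / math.sqrt(len(nodes))

def greedy_subnetwork(adjacency, z_scores, seed):
    """Grow a subnetwork from `seed`, adding the best neighbor while the score improves."""
    current = {seed}
    best = aggregate_score(z_scores, current)
    while True:
        # Candidates are nodes adjacent to the current subnetwork.
        candidates = set().union(*(adjacency[n] for n in current)) - current
        scored = [(aggregate_score(z_scores, current | {c}), c)
                  for c in sorted(candidates)]
        if not scored:
            return current, best
        top_score, top_node = max(scored)
        if top_score <= best:
            return current, best  # no neighbor improves the score: local optimum
        current.add(top_node)
        best = top_score

# Hypothetical chain a-b-c-d with made-up z-scores.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
z = {"a": 3.0, "b": 2.0, "c": -1.0, "d": 4.0}

nodes, score = greedy_subnetwork(graph, z, "a")
whole_network_score = aggregate_score(z, set(graph))
print(sorted(nodes), round(score, 3), whole_network_score)
```

Starting from `a`, the search stops at {a, b} because adding the low-scoring node `c` would lower the score, even though the whole chain scores higher. This is the kind of local optimum that motivates simulated annealing or restarting from several seeds.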


Summary

• Network measures: degree, network diameter, degree distributions, clustering coefficient

• Network modularity and robustness from hubs

• Analyzing networks: finding motifs, identifying modules (complexes)

• Data integration: finding active subnetworks