1. Lecture WS 2005/06

Bioinformatics III 1

Bioinformatics III “Systems biology”

“Integrative cell biology”

“Cellular networks”

“Computational cell biology”

Course will teach mathematical methods that are applied

from protein complexes to interaction networks

Content

Week1 networks in biology: effects of different topologies

Week2 intro of protein complexes: exp. data

Week3 protein networks: computational analysis

Week4 protein networks: graphical layout (force minimization)

Week5 protein networks: quality check (Bayesian analysis)

Week6 protein networks: modularity

Week7 FFT protein-protein docking, fitting into EM maps, tomography

Week8 transcription, regulatory networks, motifs

Week9 integration of interactome and regulome (Lichtenberg)

Week10+11 metabolic networks: metabolic flux analysis, extreme pathways,

elementary modes, C13 method

Week12 mathematical modelling of signal transduction networks

Week13 integration of protein networks with metabolic pathways

Week14 exam

Appetizer 1

Cell cycle proteins that are part

of complexes or other physical

interactions are shown within

the circle.

For the dynamic proteins, the

time of peak expression is

shown by the node color;

static proteins are represented

as white nodes.

Outside the circle, the dynamic

proteins without interactions

are positioned and colored

according to their peak time.

Lichtenberg et al. Science 307, 724 (2005)

Appetizer 2

a, Schematics and summary of properties for the endogenous and exogenous sub-networks.

b, Graphs of the static and condition-specific networks. Transcription factors and target genes are shown as nodes in the upper and lower sections of each graph respectively, and regulatory interactions are drawn as edges; they are coloured by the number of conditions in which they are active. Different conditions use distinct sections of the network.

c, Standard statistics (global topological measures and local network motifs) describing network structures. These vary between endogenous and exogenous conditions; those that are high compared with other conditions are shaded. (Note, the graph for the static state displays only sections that are active in at least one condition, but the table provides statistics for the entire network including inactive regions.)

Luscombe, Babu, … Teichmann, Gerstein, Nature 431, 308 (2004)

Appetizer 3

Klamt & Stelling Trends Biotech 21, 64 (2003)

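[Figure: example reaction network with metabolites A, B, C, D, P, external metabolites A(ext), B(ext), C(ext), and reactions R1, R2, R3, R4, R5, R6, R7f, R7b, R8, R9]

3 EFMs are not systemically independent:

EFM1 = EP4 + EP5

EFM2 = EP3 + EP5

EFM4 = EP2 + EP3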

Mathematical techniques covered

Mathematical graphs

– classification of protein-protein interaction networks

– algorithms on graphs

– regulatory networks

Fourier transformation

– protein-protein docking, pattern matching

Linear and convex algebra

– metabolic networks

Ordinary and stochastic differential equations

– kinetic modelling of signal transduction pathways

Literature

lecture slides will be available prior to lectures

suggested reading: links will be put up on course website

http://gepard.bioinformatik.uni-saarland.de/teaching...

assignments

10 - 12 weekly assignments planned

Homework assignments are handed out in the Thursday lectures and are

available on the course website on the same day.

Homework will include many programming assignments. You can program in

any popular programming language. We recommend powerful scripting languages

such as Python or Perl that allow problems to be solved efficiently.

Solutions must be returned by 14:00 on Thursday of the following week

to Tihamer Geyer in room 1.09 Geb. 17.1, first floor, or handed in prior (!) to the

lecture starting at 14:15. Two students may submit one joint solution.

Also possible: submit solution by e-mail as 1 printable PDF-file to

[email protected].

Tutorial: participation is recommended but not mandatory. Date: Tue 11-13 ?

Homework submitted on Thursdays will be discussed on the following Tuesday.

Each student needs to present their solution to one of the assignments on the

blackboard once in the tutorial session.

Schein = successful written exam

The successful participation in the lecture course („Schein“) will be certified upon

successful completion of the written exam in February 2006.

Participation in the exam is open to those students who have received 50% of

credit points for the assignments and presented once during the tutorials.

Unless published otherwise on the course website at least 3 weeks prior to the exam,

the exam will be based on all material covered in the lectures and in the

assignments.

In case of illness please send an e-mail to:

[email protected] and provide a medical certificate.

A „second and final chance“ exam will be offered in April 2006.

tutor

Dr. Tihamer Geyer – assignments

Geb. 17.1, room 1.09

[email protected]

Systems biology

Biological research in the 1900s followed a reductionist approach:

detect unusual phenotype → isolate/purify 1 protein/gene → determine its

function

However, it is increasingly clear that discrete biological function can only rarely

be attributed to an individual molecule.

new task of understanding the structure and dynamics of the complex

intercellular web of interactions that contribute to the structure and function of

a living cell.

Systems biology

The development of high-throughput data-collection techniques,

e.g. microarrays, protein chips, yeast two-hybrid screens,

makes it possible to interrogate all cell components simultaneously at any given time.

There exist various types of interaction webs/networks:

- protein-protein interaction network

- metabolic network

- signalling network

- transcription/regulatory network ...

These networks are not independent but form a „network of networks“.

DOE initiative: Genomes to Life

a coordinated effort

slides borrowed from talk of

Marvin Frazier, Life Sciences Division, U.S. Dept of Energy

Facility I Production and Characterization of Proteins

Estimating Microbial Genome Capability

• Computational Analysis

– Genome analysis of genes, proteins, and operons

– Metabolic pathways analysis from reference data

– Protein machines estimate from PM reference data

• Knowledge Captured

– Initial annotation of genome

– Initial perceptions of pathways and processes

– Recognized machines, function, and homology

– Novel proteins/machines (including prioritization)

– Production conditions and experience

Facility II Whole Proteome Analysis

Modeling Proteome Expression, Regulation, and Pathways

• Analysis and Modeling

– Mass spectrometry expression analysis

– Metabolic and regulatory pathway/network analysis and modeling

• Knowledge Captured

– Expression data and conditions

– Novel pathways and processes

– Functional inferences about novel proteins/machines

– Genome super annotation: regulation, function, and processes (deep knowledge about cellular subsystems)

Facility III Characterization and Imaging of Molecular Machines

Exploring Molecular Machine Geometry and Dynamics

• Computational Analysis, Modeling and Simulation

– Image analysis/cryoelectron microscopy

– Protein interaction analysis/mass spec

– Machine geometry and docking modeling

– Machine biophysical dynamic simulation

• Knowledge Captured

– Machine composition, organization, geometry,

assembly and disassembly

– Component docking and dynamic simulations

of machines

Facility IV Analysis and Modeling of Cellular Systems

Simulating Cell and Community Dynamics

• Analysis, Modeling and Simulation

– Couple knowledge of pathways, networks, and

machines to generate an understanding of

cellular and multi-cellular systems

– Metabolism, regulation, and machine simulation

– Cell and multicell modeling and flux visualization

• Knowledge Captured

– Cell and community measurement data sets

– Protein machine assembly time-course data sets

– Dynamic models and simulations of cell processes

“Genomes To Life” Computing Roadmap

[Figure: roadmap chart with the axes "Biological Complexity" and "Computing and Information Infrastructure Capabilities". Labelled stages: Comparative Genomics; Constraint-Based Flexible Docking; Constrained rigid docking; Genome-scale protein threading; Community metabolic regulatory, signaling simulations; Molecular machine classical simulation; Protein machine Interactions; Cell, pathway, and network simulation; Molecule-based cell simulation. Marker: Current U.S. Computing]

Are biological networks special?

Albert-Laszlo Barabasi

Statistical physics:

Tries to find universal scaling laws of systems,

e.g. how does the dynamics of a glass change

when you lower the temperature?

Phase-transition „critical slowing down“.

„Relaxation times in spin-glasses or glasses are observed to

grow to such an extent at low temperatures that these systems

do not reach thermal equilibrium on experimentally accessible

time-scales. Properties of such systems are then often found to

depend on their history of preparation; such systems are said to

age.

Similar observations are made in coarsening dynamics at first

order phase transitions. Some properties of spin-glasses and

glasses must therefore be studied via dynamical approaches

which allow taking possible history dependence explicitly into

account.“

Power laws

A power law relationship between two scalar quantities x and y is any such that the

relationship can be written as

$y = a x^k$

where a (the constant of proportionality) and k (the exponent of the power law) are

constants.

Power laws can be seen as a straight line on a log-log graph since, taking logs of

both sides, the above equation is equal to

$\log(y) = k \log(x) + \log(a)$

which has the same form as the equation for a line

$y = m x + c$

Power laws are observed in many fields, including physics, biology, geography,

sociology, economics, and war and terrorism. They are among the most frequent

scaling laws that describe the scaling invariance found in many natural phenomena.

www.wikipedia.org
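The log-log relationship above suggests one simple way to estimate a and k from data: fit a straight line to (log x, log y). The following is a minimal sketch of that idea, not a method prescribed by the lecture; the function name `fit_power_law` and the synthetic data values are assumptions chosen for illustration.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares line fit to log(y) = k*log(x) + log(a); returns (a, k)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    k = sxy / sxx                 # slope m of the line = exponent k
    log_a = mean_y - k * mean_x   # intercept c of the line = log(a)
    return math.exp(log_a), k

# Synthetic, noise-free data from y = 2.5 * x^(-1.8) (illustrative values only).
xs = [1, 2, 4, 8, 16, 32]
ys = [2.5 * x ** -1.8 for x in xs]
a, k = fit_power_law(xs, ys)
print(f"a = {a:.3f}, k = {k:.3f}")
```

For noisy empirical data a plain least-squares fit on log-log axes can be biased, so this should be read as an illustration of the straight-line form rather than a recommended estimator.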

First breakthrough: scale-free metabolic networks

(d) The degree distribution, P(k), of the metabolic network illustrates its scale-free topology.

(e) The scaling of the clustering coefficient C(k) with the degree k illustrates the hierarchical

architecture of metabolism.

(f) The flux distribution in the central metabolism of Escherichia coli follows a power law.

Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)

Second breakthrough: Yeast protein interaction network: first example of a scale-free network

A map of protein–protein interactions in

Saccharomyces cerevisiae, which is

based on early yeast two-hybrid

measurements, illustrates that a few

highly connected nodes (which are also

known as hubs) hold the network

together.

The largest cluster, which contains

78% of all proteins, is shown.

The colour of a node indicates the

phenotypic effect of removing the

corresponding protein (red = lethal,

green = non-lethal, orange = slow

growth, yellow = unknown).

Barabasi & Oltvai, Nature Rev Gen 5, 101 (2004)

Degree

Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)

The most elementary characteristic of a node is its

degree (or connectivity), k, which tells us how

many links the node has to other nodes.

a In the undirected network, node A has k = 5.

b In networks in which each link has a selected

direction there is an incoming degree, $k_{in}$, which

denotes the number of links that point to a node,

and an outgoing degree, $k_{out}$, which denotes the

number of links that start from it.

E.g., node A in b has $k_{in} = 4$ and $k_{out} = 1$.

An undirected network with N nodes and L links is

characterized by an average degree <k> = 2L/N

(where <> denotes the average).

Why?
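One way to see why <k> = 2L/N: every link has two ends, so summing the degrees of all nodes counts each link exactly twice. The sketch below checks this on a small toy network. The edge lists are hypothetical layouts, chosen only so that node A has the degrees quoted on the slide (k = 5 undirected; k_in = 4, k_out = 1 directed); they are not the actual figure.

```python
from collections import defaultdict

def degrees(edges):
    """Return {node: degree} for an undirected edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1   # each link adds one to the degree of both endpoints
        deg[v] += 1
    return dict(deg)

def in_out_degrees(arcs):
    """Return ({node: k_in}, {node: k_out}) for a directed arc list (src, dst)."""
    k_in, k_out = defaultdict(int), defaultdict(int)
    for src, dst in arcs:
        k_out[src] += 1
        k_in[dst] += 1
    return dict(k_in), dict(k_out)

# Hypothetical undirected network: A has five neighbours, B and C are linked.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "G"), ("B", "C")]
deg = degrees(edges)
N, L = len(deg), len(edges)
avg_k = sum(deg.values()) / N   # equals 2L/N because each link is counted twice

# Hypothetical directed network: four arcs point to A, one arc leaves A.
arcs = [("B", "A"), ("C", "A"), ("D", "A"), ("E", "A"), ("A", "G")]
k_in, k_out = in_out_degrees(arcs)
print(deg["A"], avg_k, k_in["A"], k_out["A"])
```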

Degree distribution

Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)

The degree distribution, P(k), gives the

probability that a selected node has exactly k

links.

P(k) is obtained by counting the number of nodes

N(k) with k = 1,2... links and dividing by the total

number of nodes N.

The degree distribution allows us to distinguish

between different classes of networks.
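The counting recipe for P(k) can be sketched in a few lines. This is an illustrative example under stated assumptions: the function name `degree_distribution` and the degree table are made up for the demonstration and do not come from the lecture.

```python
from collections import Counter

def degree_distribution(degree_by_node):
    """P(k) = N(k) / N: fraction of nodes that have exactly k links."""
    n = len(degree_by_node)
    counts = Counter(degree_by_node.values())   # N(k) for every observed k
    return {k: counts[k] / n for k in sorted(counts)}

# Hypothetical degree table for a six-node network.
deg = {"A": 5, "B": 2, "C": 2, "D": 1, "E": 1, "G": 1}
P = degree_distribution(deg)
for k, p in P.items():
    print(k, round(p, 3))
```

Because P(k) is normalised by N, distributions from networks of different sizes can be compared directly.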

Clustering coefficient

Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)

In many networks, if node A is connected to B, and B is

connected to C, then it is highly probable that A also has

a direct link to C. This phenomenon can be quantified

using the clustering coefficient

$C_I = \frac{2 n_I}{k_I (k_I - 1)}$

where $n_I$ is the number of links connecting the $k_I$

neighbours of node I to each other.

In other words, $n_I$ gives the number of 'triangles' that go

through node I, whereas $k_I (k_I - 1)/2$ is the total number of

triangles that could pass through node I, should all of

node I's neighbours be connected to each other.

Clustering coefficient

Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)

For example, only one pair of node A's five neighbours in a are

linked together (B and C), which gives $n_A = 1$ and $C_A = 2/20$. By

contrast, none of node F's neighbours link to each other, giving

$C_F = 0$. The average clustering coefficient, <C>, characterizes

the overall tendency of nodes to form clusters or groups.
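The worked example for nodes A and F can be reproduced with a short sketch of the clustering coefficient, C = 2n / (k (k - 1)). The edge list is a hypothetical layout, assumed only to match the numbers quoted above (A has five neighbours with one linked pair, F has unlinked neighbours); the helper names are also assumptions.

```python
from collections import defaultdict
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def clustering_coefficient(adj, node):
    """2n / (k (k - 1)), where n counts links among the k neighbours of node."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0   # convention chosen here: no neighbour pair, so no triangle
    n_links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * n_links / (k * (k - 1))

# Hypothetical network consistent with the example in the text.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "G"),
         ("B", "C"), ("F", "G"), ("F", "H")]
adj = build_adjacency(edges)
C_A = clustering_coefficient(adj, "A")   # 2 * 1 / (5 * 4) = 2/20
C_F = clustering_coefficient(adj, "F")   # neighbours G and H are not linked
avg_C = sum(clustering_coefficient(adj, v) for v in adj) / len(adj)
print(C_A, C_F, avg_C)
```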

An important measure of the network's structure is the function

C(k), which is defined as the average clustering coefficient of all

nodes with k links. For many real networks $C(k) \sim k^{-1}$, which is

an indication of a network's hierarchical character.

The average degree <k>, average path length <ℓ> and average

clustering coefficient <C> depend on the number of nodes and

links (N and L) in the network. By contrast, the P(k) and C(k )

functions are independent of the network's size and they

therefore capture a network's generic features, which allows

them to be used to classify various networks.

Random networks

Barabasi & Oltvai, Nature Rev Gen 5, 101 (2004)

Aa

The Erdős–Rényi (ER) model of a random network starts with N

nodes and connects each pair of nodes with probability p, which

creates a graph with approximately pN(N-1)/2 randomly placed

links.

Ab

The node degrees follow a Poisson distribution, where most

nodes have approximately the same number of links (close to

the average degree <k>). The tail (high k region) of the degree

distribution P(k) decreases exponentially, which indicates that

nodes that significantly deviate from the average are extremely

rare.

Ac

The clustering coefficient is independent of a node's degree, so

C(k) appears as a horizontal line if plotted as a function of k.

The mean path length is proportional to the logarithm of the

network size, $\ell \sim \log N$, which indicates that it is characterized by

the small-world property.

Why?
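A possible starting point for the "Why?": in the ER model each node has N-1 potential partners, each linked independently with probability p, so its degree is binomially distributed and approaches a Poisson distribution for large N and small p. The sketch below generates such a graph and compares the link count with pN(N-1)/2. The function name `erdos_renyi`, the parameter values and the seed are assumptions for illustration.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Connect each of the n(n-1)/2 node pairs independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

n, p = 500, 0.02                        # illustrative values
edges = erdos_renyi(n, p, seed=1)
expected_links = p * n * (n - 1) / 2    # pN(N-1)/2
mean_degree = 2 * len(edges) / n        # <k> = 2L/N, close to p(N-1)
print(len(edges), expected_links, mean_degree)
```

The observed number of links fluctuates around the expected value from run to run; only the average matches pN(N-1)/2 exactly.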

Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)

Scale-free networks

Scale-free networks are characterized by a power-law degree

distribution; the probability that a node has k links follows

$P(k) \sim k^{-\gamma}$, where $\gamma$ is the degree exponent.

The probability that a node is highly connected is statistically

more significant than in a random graph, the network's properties

often being determined by a relatively small number of highly

connected nodes („hubs“, see blue nodes in Ba).

In the Barabási–Albert model of a scale-free network, at each

time point a node with M links is added to the network; it

connects to an already existing node I with probability $\Pi_I = k_I / \sum_J k_J$,

where $k_I$ is the degree of node I and J is the index denoting the

sum over network nodes. The network that is generated by this

growth process has a power-law degree distribution with $\gamma = 3$.
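The growth rule with preferential attachment can be sketched as follows. This is a simplified illustration, not a reference implementation: the fully connected starting core of m + 1 nodes, the rejection step that forces the m targets to be distinct, and the names `barabasi_albert` and `stubs` are all assumptions made here.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=None):
    """Grow a network to n nodes; each new node attaches to m existing nodes."""
    rng = random.Random(seed)
    # Assumed initial condition: a fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # 'stubs' holds every node once per link end, so a uniform draw from it
    # selects node I with probability k_I / sum_J k_J.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:          # redraw until m distinct targets
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

edges = barabasi_albert(2000, 2, seed=7)   # illustrative size and seed
deg = Counter(node for edge in edges for node in edge)
mean_k = sum(deg.values()) / len(deg)
print(len(edges), mean_k, max(deg.values()))
```

Requiring distinct targets makes the attachment probability only approximately proportional to degree, which is usually acceptable for a demonstration of hub formation.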

Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)

Scale-free networks

(Bb) Power-law distributions are seen as a straight

line on a log–log plot.

(Bc) The network that is created by the Barabási–

Albert model does not have an inherent modularity,

so C(k) is independent of k.

Scale-free networks with degree exponents $2 < \gamma < 3$,

a range that is observed in most biological and

non-biological networks, are ultra-small, with the

average path length following ℓ ~ log log N, which

is significantly shorter than log N that characterizes

random small-world networks.

Importance of the degree exponent

Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)

The value of $\gamma$ in $P(k) \sim k^{-\gamma}$ determines many properties of the

system. The smaller the value of $\gamma$, the more important the role

of the hubs is in the network.

In general, the unusual properties of scale-free networks are

valid only for $\gamma < 3$.

For $2 < \gamma < 3$ there is a hierarchy of hubs, with the most

connected hub being in contact with a small fraction of all

nodes.

For $\gamma = 2$ a hub-and-spoke network emerges, with the largest

hub being in contact with a large fraction of all nodes.

For $\gamma < 3$, the dispersion of the P(k) distribution, defined as

$\sigma^2 = \langle k^2 \rangle - \langle k \rangle^2$, increases with the number of nodes (that is, $\sigma$ diverges),

resulting in a series of unexpected features, such as a high

degree of robustness against accidental node failures.

For $\gamma > 3$, the hubs are not relevant, most unusual features are

absent, and in many respects the scale-free network behaves

like a random one.

Shortest path and mean path length

Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)

The distance in networks is measured by the path length,

which tells us how many links we need to pass through to

travel between two nodes.

As there are many alternative paths between two nodes,

the shortest path — the path with the smallest number of

links between the selected nodes — has a special role.

In directed networks, the distance $\ell_{AB}$ from node A to node

B is often different from the distance $\ell_{BA}$ from B to A. E.g. in

b, $\ell_{BA} = 1$, whereas $\ell_{AB} = 3$.

Often there is no direct path between two nodes. As shown

in b, although there is a path from C to A, there is no path

from A to C. The mean path length, <ℓ>, represents the

average over the shortest paths between all pairs of nodes

and offers a measure of a network's overall navigability.
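Shortest path lengths in an unweighted network can be computed with breadth-first search. The sketch below uses a hypothetical directed graph, constructed only to reproduce the values quoted above (distance 1 from B to A, distance 3 from A to B, a path from C to A but none from A to C); it is not the actual figure. Averaging over reachable ordered pairs only is an assumption made here, since the text does not say how unreachable pairs are treated.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search: {node: number of links on the shortest path}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:            # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def mean_path_length(adj):
    """Average shortest path over all ordered pairs that are connected."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    total = pairs = 0
    for s in nodes:
        for t, d in shortest_path_lengths(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else float("inf")

# Hypothetical directed network: arcs B->A, A->D, D->E, E->B, C->A.
adj = {"A": ["D"], "D": ["E"], "E": ["B"], "B": ["A"], "C": ["A"]}
from_A = shortest_path_lengths(adj, "A")
from_B = shortest_path_lengths(adj, "B")
mean_l = mean_path_length(adj)
print(from_A["B"], from_B["A"], "C" in from_A, mean_l)
```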

First breakthrough: scale-free metabolic networks

(d) The degree distribution, P(k), of the metabolic network illustrates its scale-free topology.

(e) The scaling of the clustering coefficient C(k) with the degree k illustrates the hierarchical

architecture of metabolism (The data shown in d and e represent an average over 43

organisms).

(f) The flux distribution in the central metabolism of Escherichia coli follows a power law,

which indicates that most reactions have small metabolic flux, whereas a few reactions, with

high fluxes, carry most of the metabolic activity. It should be noted that on all three plots the

axes are logarithmic and a straight line on such log–log plots indicates a power-law scaling.

CTP, cytidine triphosphate; GLC, aldo-hexose glucose; UDP, uridine diphosphate; UMP,

uridine monophosphate; UTP, uridine triphosphate.

Barabasi & Oltvai, Nature Reviews Genetics 5, 101 (2004)

Summary

Many cellular networks show properties of scale-free networks

- protein-protein interaction networks

- metabolic networks

- genetic regulatory networks (where nodes are individual genes and links are

derived from expression correlation e.g. by microarray data)

- protein domain networks

However, not all cellular networks are scale-free.

E.g. the transcription regulatory networks of S. cerevisiae and E. coli are examples

of mixed scale-free and exponential characteristics.

It is a topic of ongoing debate whether the analysis of subnetworks (available data

is sparse) allows conclusions on the underlying topology of the entire network.

Next lecture:

- mathematical properties of networks

- origin of scale-free topology

- topological robustness

Barabasi & Oltvai, Nature Rev Gen 5, 101 (2004)