SYSTEMS BIOLOGY

Lukasz Huminiecki, DPhil

Nobel medical institute, Karolinska, Stockholm & Ludwig Institute for Cancer Research, Uppsala

Please, tell me who you are!

Raise your hand if you are:

Computer scientist/mathematician

Computational biologist/bioinformatician

Experimental biologist

Undergraduate

Postgraduate

Post-doc

WHAT IS ”SYSTEMS BIOLOGY”?

”Systems biology is the coordinated study of biological systems by (1) investigating the components of cellular networks and their interactions, (2) applying experimental high-throughput and whole-genome techniques, and (3) integrating computational methods with experimental efforts.” – first sentence of the Preface to Klipp E et al. ”Systems Biology in Practice”, WILEY-VCH, 2005.

What do you think?

Back to the Roots?

In fact, early critics argued that molecular approaches are too reductionist, attempting to explain complex biological phenomena through the actions of a few genes or proteins.

There is a cyclical element to all progress!

Before the era of the molecular revolution physiology-oriented biologists were much more used to looking at living things as systems.

Four areas of systems biology on which I will focus today

• Analysis of expression patterns

• Mathematical modeling

• Phylogenetics

• Web-resources and data integration

PART 1: EXPRESSION PATTERN EVOLUTION

Classic view of evolution through gene duplication

• Susumu Ohno, 1970. Evolution by Gene Duplication. Springer, Berlin

• “Natural selection merely modified while redundancy created"

• The neo-functionalization model

Genome-scale tests (1)

Genome-scale tests (2)

• Nembaware et al. 2002: Impact of the presence of paralogs on sequence divergence in a set of mouse-human orthologs. Genome Research

Gene Expression Atlas

• http://expression.gnf.org

• 101 human (microchip U95A) and 89 mouse (microchip U74A) Affymetrix experiments

• Huminiecki L, Lloyd AT, Wolfe KH. Congruence of tissue expression profiles from Gene Expression Atlas, SAGEmap and TissueInfo databases. BMC Genomics. 2003 Jul 29;4(1):31

• Mapping to Ensembl via LocusLink

• TRIBE families and Ka/Ks calculations using yn00 from PAML

Huminiecki et al. “Congruence of tissue expression profiles from GEA, SAGEmap and TissueInfo databases”. BMC Genomics

R vs. Ks in paralogs

One-to-one orthologs

Human or mouse duplication

Cumulative plots

Randomisation test

                    R > 0.6           R > 0.7           R > 0.8           R > 0.9
Human duplication   91% (p = 0.37)    58% (p = 0.0042)  52% (p = 0.0043)  60% (p = 0.038)
Mouse duplication   61% (p = 0.0111)  48% (p = 0.0027)  36% (p = 0.0015)  24% (p = 0.0018)

The percentages indicate the ratios between the fractions of genes having a particular R-value in sets of orthologues with the human (163 sets) or mouse (139 sets) duplication versus the group of one-to-one orthologues (1,324 pairs).
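A randomisation test of this kind can be sketched as follows. This is only an illustration: the slides do not describe the exact resampling scheme, so the pooling of both gene sets, the one-sided comparison and the function names are all assumptions. The function returns the ratio of fractions (the percentage in the table) and an empirical p-value.

```python
import random


def fraction_above(values, threshold):
    """Fraction of correlation values strictly greater than the threshold."""
    return sum(v > threshold for v in values) / len(values)


def randomisation_test(duplicated, one_to_one, threshold, n_iter=10000, seed=0):
    """Return (ratio, p_value) for a deficit of high-R genes after duplication.

    duplicated / one_to_one: lists of expression-profile correlations R.
    Assumed scheme: draw random sets of the same size as `duplicated` from the
    pooled data and count how often they show an equally low fraction.
    """
    rng = random.Random(seed)
    observed = fraction_above(duplicated, threshold)
    reference = fraction_above(one_to_one, threshold)
    pooled = list(duplicated) + list(one_to_one)
    hits = 0
    for _ in range(n_iter):
        sample = rng.sample(pooled, len(duplicated))
        if fraction_above(sample, threshold) <= observed:
            hits += 1
    ratio = observed / reference if reference > 0 else float("nan")
    p_value = (hits + 1) / (n_iter + 1)  # add-one correction avoids p = 0
    return ratio, p_value
```

A ratio well below 100% with a small p-value would indicate that highly correlated expression profiles are rarer after duplication than among one-to-one orthologues.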

Sub-functionalisation

• Force et al. argue that neo-functionalisation alone could not account for the high accumulation of duplicated genes in eukaryotes

• Duplication-degeneration-complementation (DDC)

• Should lead to tissue-specific expression!

Tissue-specific genes evolve faster and are more likely to belong to large gene families

Gene expression patterns are, in evolutionary perspective, surprisingly labile!

Literature

• Khaitovich P, Weiss G, Lachmann M, Hellmann I, Enard W, Muetzel B, Wirkner U, Ansorge W, Paabo S. A neutral model of transcriptome evolution. PLoS Biol. 2004 May;2(5):E132. Epub 2004 May 11.

• Huminiecki L, Wolfe KH. Divergence of spatial gene expression profiles following species-specific gene duplications in human and mouse. Genome Res. 2004 Oct;14(10A):1870-9.

• Jordan IK, Marino-Ramirez L, Koonin EV. Evolutionary significance of gene expression divergence. Gene. 2005 Jan 17;345(1):119-26. Epub 2004 Dec 29.

• Khaitovich P, Paabo S, Weiss G. Toward a neutral evolutionary model of gene expression. Genetics. 2005 Jun;170(2):929-39. Epub 2005 Apr 16.

• Liao BY, Zhang J. Evolutionary conservation of expression profiles between human and mouse orthologous genes. Mol Biol Evol. 2006 Mar;23(3):530-40. Epub 2005 Nov 9.

The take home message

• An entirely new paradigm is emerging in evolutionary biology: expression patterns can change dramatically in the course of evolution.

• This impacts on our understanding of biodiversity, human origins, and drug discovery.

Broad goals of collaboration with Pfizer

We aim towards a set of heuristic rules to identify the most “druggable” GPCRs and the best model species in which to conduct preclinical tests. By “druggable” it is meant those which possess any single or combination of characteristics favourable to drug development, such as: (1) conserved sequence, (2) tissue-specificity, and (3) expression domain not overlapping with other members of the family.

Conserved sequence suggests that function is the same, and that drugs will have similar efficacy. A tissue-specific gene facilitates targeting into specific organs or tumour types, and is less likely to engage in multiple functions - both of these features are likely to result in advantageous toxicological profiles. Non-overlapping expression domain minimises the possibility of functional redundancy. Finally, the best animal model for preclinical trials is likely to be the species with the most “human” expression pattern of the target gene, especially in tissues directed for therapeutic intervention, as well as in toxicologically important organs, such as heart, lung, liver, kidney, and brain.

Specific goals of collaboration with Pfizer

• Generate high quality RNA preparations from 20 organs from duplicate male and female rat, guinea pig and dog samples, for comparison with commercial human RNA samples.

• Using qPCR techniques, determine the expression profile of at least 25 genes (with representatives from the histaminergic, serotonergic, and adrenergic GPCR families) in each of these tissues.

• Analyse data to consider congruence in expression profiles between species from an evolutionary bioinformatics perspective, in addition to gaining a deeper understanding into the degree of human-animal model translation and therefore into the suitability of animal species used for functional efficacy and toxicological studies at Pfizer.

Results: RNA isolation

a) b) c)

Polytron/RNAeasy with additional acid phenol step and DNAaway for difficult tissues

The Database

[Database schema diagram. Tables: cumulative, genes, RT_samples, run, RT_summary, tissue, preps. Fields shown include id, symbol, assay, gene, tissue, rt, ct, prep, species, date, technician, kit, samples, description, dilution, tissue_index, tissue_name, donor, ratio, yield, housekeep_actb, housekeep_hprt, count, run, dev.]

The Ct value

• Two-tube comparative method with ”virtual” housekeeping gene

• Amplification assumed to be exponential with 100% efficiency, Cts scaled accordingly

• a) histogram of Ct-values for over 6000 reactions; b) standard deviations in triplicates; c) ACTB plotted against HPRT1.
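The comparative Ct calculation can be sketched as below. With 100% efficiency the amount of product doubles in every cycle, so a difference of one cycle corresponds to a two-fold difference in template. How the ”virtual” housekeeping gene was constructed is not stated on the slide; taking the mean of the ACTB and HPRT1 Ct values is an assumption made here for illustration, as is the function name.

```python
def relative_expression(ct_target, ct_actb, ct_hprt1):
    """Expression of a target gene relative to a 'virtual' housekeeping gene.

    Assumption: the virtual reference Ct is the mean of the ACTB and HPRT1 Cts.
    Assumes exponential amplification with 100% efficiency (factor 2 per cycle).
    """
    virtual_reference_ct = (ct_actb + ct_hprt1) / 2.0
    delta_ct = ct_target - virtual_reference_ct
    return 2.0 ** (-delta_ct)
```

A target that crosses the threshold three cycles later than the reference is therefore estimated at one eighth of the reference level.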

• Tissue RNAs from rat, guinea pig and dog were isolated. Human RNAs were purchased from Clontech.

• Human, rat and canine expression profiles of just under 40 genes have been examined thus far. Approximately 8 thousand assays have been performed.

• A number of striking differences in expression patterns have been revealed.

• Thus far, the most remarkable expression shifts have been observed in heart and aorta, among histamine, prostacyclin and adrenergic beta receptors. Numerous changes were also localised to the uterus.

• Apart from divergent expression patterns, mean expression levels also appeared rather different for many genes.

• Differences in expression between prostanoid receptors may have implications for the pharmacology of troublesome COX-2 inhibitors (such as Celebrex, Bextra, and Vioxx).

Results overview

PART 2: MODELING

Mathematical Modeling of Metabolism and Gene Expression

• Dr. Edda Klipp

• Kinetic Modeling Group

• Lecture in the series

• “Genes and Genomes: the Future of Biology”

What is a model?

Yeast, mouse – as models for human

Verbal explanation

A sequence of letters ATTCGAGGTATA for DNA sequence

Wiring scheme

Mathematical description: Boolean Network, Differential Equations, Stochastic Equations

- Abstraction

- (Simplified) representation allowing for understanding

Edda Klipp, Kinetic Modeling Group

Why modeling?

Even the behavior of simple systems can usually not be predicted intuitively and from experience.

The behavior of complex dynamical processes cannot be predicted with sufficient precision just from experience.

For prediction and explanation of processes one needs a model.

Experimental observations: many simple and complex processes

isolated enzymatic reaction

temporal processes in metabolic networks

pattern of gene expression and regulation

Why modeling?

Advantages

- Time scales may be stretched or compressed.

- Solution algorithms / computer programmes can often be used independently of the actually modeled system.

- Costs of modeling are lower than for experiments.

- Representation of quantities that are experimentally hidden.

- No risk for real systems, no interaction between investigation and system.

Why modeling?

Burning questions

- How is cellular response to environmental changes and stress regulated?

- How should a cell be treated to yield a high output of a desired product (Biotechnology)?

- Where should a drug operate to cure a disease (Health care)?

- Is our knowledge about a network/pathway complete?

Structure of the system

[Figure: reaction scheme with metabolites Sext, S1 to S6 and Smito; reactions are labelled fast or slow]

Variables, parameters, constants

State variables - set of variables describing the system completely

Dimension of the system = number of independent state variables

How many variables are used in my model?

too few – system is under-determined

too many – system is over-determined and may be contradictory

Do the units of variables and parameters etc. fit together?

Boundary of the system

Biological processes are complex phenomena

Central dogma of molecular biology:

Gene → mRNA → Proteins → Cellular processes

Direction of discovery

known                            to be predicted

Structure                        Function
Protein interactions             Biochemical action
Metabolic pathways               Concentration changes
Enzyme sets                      Influence of perturbations
:                                Possible behavior, bifurcations

Function                         Structure
Transmission of a signal         Sequence of signaling compounds
Time course of concentrations    Possible protein interactions
:                                :

Concept of state

The state of a system is a snapshot of the system at a given time that contains enough information to predict the behaviour of the system for all future times. The state of the system is described by the set of variables that must be kept track of in a model.

Different models of gene regulation have different representations of the state:

Boolean model: a state is a list recording, for each gene involved, whether it is expressed („1“) or not expressed („0“)

Differential equation model: a list of concentrations of each chemical entity

Probabilistic model: a current probability distribution and/or a list of actual numbers of molecules of a type

Each model defines what it means by the state of a system. Given the current state, the model predicts what state/s can occur next.

Kinetics – change of state

Reaction: A → B with rate constant k

Deterministic, continuous time and state: e.g. ODE model. The concentration of A decreases and the concentration of B increases. The concentration change per time interval dt is given by

dB/dt = k·A

Probabilistic, discrete time and state: transformation of a molecule of type A into a molecule of type B. The probability of this event in a time interval dt is given by

P(a−1, t+dt | a, t) = a·k·dt

a – number of molecules of type A

Deterministic, discrete time and state: e.g. Boolean network model. Presence (or activity) of B at time t+1 depends on presence (or activity) of A at time t:

B(t+1) = f(A(t))
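The deterministic and probabilistic views of the reaction A → B can be compared with a small sketch. It is an illustration only: the function names, the fixed time step and the per-molecule conversion probability k·dt (valid for small dt) are assumptions, not part of the lecture material.

```python
import math
import random


def deterministic_a(a0, k, t):
    """Analytic solution of dA/dt = -k*A, the ODE description."""
    return a0 * math.exp(-k * t)


def stochastic_a(a0, k, t_end, dt=0.01, seed=0):
    """Discrete description: in each step of length dt every A molecule
    is converted into B with probability k*dt (assumes k*dt << 1)."""
    rng = random.Random(seed)
    a, b = a0, 0
    n_steps = int(round(t_end / dt))
    for _step in range(n_steps):
        converted = sum(1 for _molecule in range(a) if rng.random() < k * dt)
        a -= converted
        b += converted
    return a, b
```

For large molecule numbers the stochastic count of A stays close to the ODE solution; for small numbers the fluctuations become comparable to the mean, which is when the probabilistic description matters.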

Boolean Models

(George Boole, 1815-1864)

Each gene can assume one of two states: expressed („1“) or not expressed („0“)

Background:

Not enough information for more detailed description

Increasing complexity and computational effort for more specific models

(discrete, deterministic)

Replacement of continuous functions (e.g. Hill function) by step function

Boolean Models

A Boolean network is characterized by

- the number of nodes („genes“): N

- the number of inputs per node (regulatory interactions): k

The dynamics are described by rules:

„if input value/s at time t is/are...., then output value at t+1 is....“

A Boolean network always has a finite number of possible states and, therefore, a finite number of state transitions.

[Figure: example network topologies on nodes A, B, C, D, including a linear chain and a ring]

Boolean Models

Truth functions

One input:

input    output
  p      0    p    not p    1
  0      0    0      1      1
  1      0    1      0      1
rule     0    1      2      3

Two inputs:

input    output
p  q
0  0     0  0  0  0  0  0  0  0  1  1  1  1  1  1  1  1
0  1     0  0  0  0  1  1  1  1  0  0  0  0  1  1  1  1
1  0     0  0  1  1  0  0  1  1  0  0  1  1  0  0  1  1
1  1     0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1
rule     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

Rule 1 = And, rule 7 = Or, rule 8 = Nor

Example: A → B with B(t+1) = not (A(t)), i.e. one-input rule 2
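The rule numbers in the two-input table are simply the output column read as a binary number, with the row for inputs 00 as the most significant bit. A minimal sketch (the function name is hypothetical):

```python
def boolean_rule(rule, p, q):
    """Output of the two-input truth function number `rule` (0..15).

    Rows of the truth table are ordered 00, 01, 10, 11; the output for
    row 00 is the most significant bit of the rule number.
    """
    row = 2 * p + q
    return (rule >> (3 - row)) & 1
```

With this encoding, rule 1 reproduces And, rule 7 reproduces Or and rule 8 reproduces Nor, as labelled in the table.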

Boolean Models

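[Figure: regulatory network of four genes a, b, c, d and their proteins A, B, C, D, showing transcription, translation, activation (+) and repression; below it, the corresponding Boolean network on nodes a, b, c, d]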

Boolean network

a(t+1) = a(t)

b(t+1) = (not c(t)) and d(t)

c(t+1) = a(t) and b(t)

d(t+1) = not c(t)

State transitions (state abcd at time t → state at time t+1):

0000 → 0001        1000 → 1001
0001 → 0101        1001 → 1101
0010 → 0000        1010 → 1000
0011 → 0000        1011 → 1000
0100 → 0001        1100 → 1011
0101 → 0101        1101 → 1111
0110 → 0000        1110 → 1010
0111 → 0000        1111 → 1010

Steady state: 0101

Cycle: 1000 → 1001 → 1101 → 1111 → 1010 → 1000
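The four update rules of this example network can be simulated directly. The sketch below enumerates all 16 states and follows a trajectory; the function names are hypothetical, the rules are those given on the slide.

```python
from itertools import product


def step(state):
    """Apply the four update rules once; state is a tuple (a, b, c, d) of 0/1."""
    a, b, c, d = state
    return (
        a,                       # a(t+1) = a(t)
        int((not c) and d),      # b(t+1) = (not c(t)) and d(t)
        int(a and b),            # c(t+1) = a(t) and b(t)
        int(not c),              # d(t+1) = not c(t)
    )


def as_string(state):
    """Render a state tuple as a string such as '0101'."""
    return "".join(str(bit) for bit in state)


def trajectory(state, n_steps):
    """List of states visited from `state` in n_steps updates, as strings."""
    visited = [as_string(state)]
    for _ in range(n_steps):
        state = step(state)
        visited.append(as_string(state))
    return visited


# Complete state transition table: every state has exactly one successor.
transitions = {as_string(s): as_string(step(s)) for s in product((0, 1), repeat=4)}
```

Calling `trajectory((1, 0, 0, 0), 5)` follows the cycle of the network, while any state with a = 0 ends in the steady state 0101 within a few steps.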

Boolean Models

- The number of states is finite (2^N for N nodes), as is the number of state changes.

- The system may reach steady states or cycles.

- Not every state can be reached from every other state.

- The successor state is unique, the predecessor state is not.

Advantages: easy description with simple rules, no parameters, computationally not demanding

Drawbacks: no intermediate values

Description with Differential Equations

X + DNA → X-DNA (rate constant k1)

X-DNA → X + DNA (rate constant k-1)

Nucleic acids + DNA → mRNA + DNA (rate constant k1)

mRNA → Nucleic acids (rate constant k-1)

Amino acids + mRNA → Proteins + mRNA (rate constant k2)

Proteins → Amino acids (rate constant k-2)

d[X-DNA]/dt = k1·[X]·[DNA] − k-1·[X-DNA]

dS/dt = f(S)

S – vector of concentrations

f – function(s), often non-linear

Basic Elements of Biochemical Networks

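[Figure: network of four metabolites S1 to S4 connected by reactions v1 to v5, with the corresponding system of differential equations for dS1/dt to dS4/dt in terms of the parameters p1 to p5]

Initial values: S1[0] = 0, S2[0] = 0, S3[0] = 0, S4[0] = 1

Parameters: p1 = 1, p2 = 1, p3 = 1, p4 = 0.5, p5 = 0.5

[Plot: time courses S[t] of S1, S2, S3 and S4, time axis from 0 to 5]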

Systems equations

r – number of reactions

Si – metabolite concentrations

vj – reaction rates

nij – stoichiometric coefficients

dSi/dt = Σ(j = 1..r) nij·vj

Network properties: N = {nij}

Individual reaction properties: v = v(S, p)

Kinetics, dynamics, admissible steady state fluxes, conservation relations
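The systems equation dSi/dt = Σ nij·vj separates the network structure (the matrix N) from the kinetics (the rate vector v). A minimal sketch for a hypothetical two-metabolite chain, → S1 → S2 →, with a constant influx and mass-action rates; the network, parameter names and the simple Euler integrator are assumptions chosen for illustration:

```python
# Stoichiometric matrix N: rows = metabolites S1, S2; columns = reactions v1..v3
N = [
    [1, -1, 0],   # S1 is produced by v1 and consumed by v2
    [0, 1, -1],   # S2 is produced by v2 and consumed by v3
]


def rates(s, p):
    """Rate vector v(S, p): constant influx, then two mass-action steps."""
    s1, s2 = s
    return [p["v_in"], p["k2"] * s1, p["k3"] * s2]


def dsdt(s, p):
    """Right-hand side dS/dt = N * v(S, p)."""
    v = rates(s, p)
    return [sum(n_ij * v_j for n_ij, v_j in zip(row, v)) for row in N]


def simulate(s0, p, t_end, dt=0.01):
    """Explicit Euler integration from s0 up to t_end."""
    s = list(s0)
    for _ in range(int(round(t_end / dt))):
        s = [x + dt * dx for x, dx in zip(s, dsdt(s, p))]
    return s
```

After a sufficiently long time the concentrations stop changing although the fluxes remain non-zero, which is the steady state N·v = 0.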

ODE - concept of steady state

dS/dt = 0   or   N·v(S, p) = 0

• no change of concentrations

• but (usually) non-vanishing fluxes or rates

[Plot: variable versus time]

To restrict modeling to the main aspects, often the asymptotic behaviour of dynamic systems is analyzed (behavior after a sufficiently long time). It may be

- oscillatory

- chaotic

- in many relevant situations the system will reach a steady state.

Data Bases

GO (Gene Ontology), http://www.geneontology.org: functional description of gene products

KEGG (Kyoto Encyclopedia of Genes and Genomes), http://www.genome.ad.jp/kegg/: reference knowledge base offering information about genes and proteins, biochemical compounds and reactions, and pathways

BRENDA (Comprehensive Enzyme Information System), http://www.brenda.uni-koeln.de: curated database containing functional data for individual enzymes

NCBI (National Center for Biotechnology Information), http://www.ncbi.nlm.nih.gov/: provides several databases:

- molecular databases, with information about nucleotide sequences, proteins, genes, molecular structures, and gene expression

- taxonomy database: names and lineages of more than 130,000 organisms

SPAD (Signaling PAthway Database), http://www.grt.kyushu-u.ac.jp/spad/index.html: information about signaling pathways (schemes, links)

JWS Online, model database, http://jjj.biochem.sun.ac.za/database/index.html: published models, implemented in Mathematica®. Models can be simulated.

Biomodels, model database, http://www.biomodels.net/: published models, implemented in SBML

Modeling Tools

•BALSA•BASIS•BIOCHAM•BioCharon•biocyc2SBML•BioGrid•BioModels•BioNetGen•BioPathway Explorer•Bio Sketch Pad•BioSens•BioSPICE Dashboard•BioSpreadsheet•BioTapestry•BioUML•BSTLab•CADLIVE•CellDesigner•Cellerator•CellML2SBML•Cellware•CL-SBML•COPASI

•Cytoscape•DBsolve•Dizzy•E-CELL•ecellJ•ESS•FluxAnalyzer•Fluxor•Gepasi•INSILICO discovery•JACOBIAN•Jarnac•JDesigner•JigCell•JWS Online•Karyote*•KEGG2SBML•Kinsolver*•libSBML•MathSBML•MesoRD•MetaboLogica•MetaFluxNet

•MMT2•Modesto•Moleculizer•Monod•Narrator•NetBuilder•Oscill8•PANTHER Pathway•PathArt•PathScout•PathwayLab•Pathway Tools•PathwayBuilder•PaVESy•PNK•Reactome•ProcessDB•PROTON•pysbml•PySCeS•runSBML•SBML ODE Solver•SBMLeditor

•SBMLmerge•SBMLR•SBMLSim•SBMLToolbox•SBToolbox•SBW•SCIpath•Sigmoid*•SigPath•SigTran•SIMBA•SimBiology•Simpathica•SimWiz•SmartCell•SRS Pathway Editor•StochSim•STOCKS•TERANODE Suite•Trelis•Virtual Cell•WinSCAMP•XPPAUT

http://sbml.org

Conclusions

• Mathematical models of cellular processes allow for a testable representation of experimental knowledge.

• Models clarify systemic and dynamic properties of the investigated object.

• Models allow simulating processes independent of the experiment.

• Modeling reveals regulatory properties of cellular networks. Osmostress response:

– The role of channel Fps1 in osmoresponse

– The ability to repeated stimulation and the contribution of phosphatases

– Feedback loops / signal integration and separation

• Models can have predictive value

– Mutant phenotypes

– Effect of intervention

– Integration of external signals to cell cycle progression

– Critical cell size for G1/S transition

Process of model development

- Analysis of the objects to be modeled

- Formulation of the scientific PROBLEMS

- Design of a simple model

  - as „cartoon“

  - in mathematical terms

- Solve the respective (mathematical) problems

- Comparison of results with real system (EXPERIMENT)

- Difference: iterative enhancement of the models (structure, parameters, …)

Example: distribution of molecules on both sides of a membrane

Ai, Ao

dAi/dt = f(Ai, Ao, C, p)

If we did not make models, we would not know why they are wrong

Modeling

Mathematical Models for Cellular Processes

[Flow diagram with the following elements:]

Metabolic and Regulatory Networks

Structural knowledge + experimental data

ODE systems

System analysis: simulation, parameter identification

System understanding + prediction

Basic Elements of Biochemical Networks

Glucose1-P → Glucose6-P → Fructose6-P

Metabolites: Glucose1-P, Glucose6-P, Fructose6-P

Reactions: v1 (Phosphoglucomutase), v2 (Glucose-phosphate isomerase)

Design of structured metabolic models

1. Determination of system limits

[Figure: G1P → G6P → F6P with reactions v1 and v2; the system boundary separates the system from the external surroundings on both sides]

2. Balancing

Concentration change = Production – Degradation + Transport

dG6P/dt = v1 − v2 + vTransport

3. Assignment of kinetics

Rate as function of concentrations and parameters

v1 = Vmax·G1P / (KM + G1P)
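The three design steps can be put together in a short sketch for the balance of glucose 6-phosphate. It assumes, for illustration only, that both reactions are irreversible and follow Michaelis-Menten kinetics; the parameter names and values are hypothetical.

```python
def michaelis_menten(s, vmax, km):
    """Rate as a function of substrate concentration s and parameters."""
    return vmax * s / (km + s)


def dg6p_dt(g1p, g6p, params, transport=0.0):
    """Balance for G6P: production by v1, degradation by v2, plus transport.

    Assumption: v1 (phosphoglucomutase) and v2 (glucose-phosphate isomerase)
    are treated as irreversible Michaelis-Menten reactions.
    """
    v1 = michaelis_menten(g1p, params["vmax1"], params["km1"])
    v2 = michaelis_menten(g6p, params["vmax2"], params["km2"])
    return v1 - v2 + transport
```

When the substrate concentration equals KM, the rate is exactly half of Vmax, which is a quick sanity check for any implementation of the rate law.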

Hypothesis Generation

- establish a mathematical model of the network

- define a performance function

- calculate parameters optimizing the performance function

- compare prediction with experimental data

Possible theoretical approaches:

Structure → Function: Modelling of Systems Dynamics

Function → Structure: Evolutionary Optimization

Homeostasis, appropriate response, experimental data

Network, control pattern, parameters

Model examples - Metabolism

In Vivo Analysis of Metabolic Dynamics in S. cerevisiae: M. Rizzi, M. Baltes, U. Theobald, M. Reuss. Biotechnol Bioeng. 55: 592–608, 1997.

Representation of Metabolism in the KEGG data base

www.kegg/kegg2.jp

Model examples – Signaling pathways

[Figure: three signaling modules.
G-Protein: signal-activated receptor Ra*, exchange of GDP for GTP on the G protein, release of P.
MAP K cascade: signal, MAP KKKK; MAP KKK, MAP KKK-P, MAP KKK-PP; MAP KK, MAP KK-P, MAP KK-PP; MAP K, MAP K-P, MAP K-PP; each phosphorylation step converts ATP to ADP.
Phosphorelay system: signal, A and A-P (ATP/ADP), B and B-P, C and C-P, release of P, rate constants k1 to k4.]

Common properties

Cellular networks have a high degree of connectivity.

The processes are reactions, molecular interactions:

binding

intramolecular transformations

release

Differences in modeling of different parts are due to appropriate approximations.

Concentrations

Signalling

Proteins: low, ~100-300 nmol/L (~10^3-10^4 molecules per cell) (catalysts and substrates)

Metabolism

Enzymes: low

Metabolites: higher

ATP ~2 mmol/L

Network Characteristics

Signaling

Reactions can be

- catalysed by enzymes

- autocatalytic.

The network is given by the existing proteins and their interactions.

Metabolism

All reactions are catalysed by enzymes.

The network is determined by the existing enzymes (which do not necessarily interact).

Metabolites need not be there initially.

Network Characteristics

Signaling

[Figure: MAP K, MAP K-P, MAP K-PP; labels ATP, ADP and P on the conversion steps]

State changes: change in phosphorylation states

Coding of information

But: conservation of (MAPK + MAPK-P + MAPK-PP) in the considered time window

Metabolism

[Figure: Glucose, Gluc 6-P, Fruc 6-P, Fruc 1,6-PP; labels ATP, ADP and P on the conversion steps]

Important feature: flux through the pathway, (final) transformation of metabolites

Phosphorylation: energy transfer

Rate equations… are a choice of the modeler

Signaling

MAP K → MAP K-P (ATP → ADP), catalysed by MAP KK-PP

Catalyst and substrate have about the same concentration (E ≈ S)

Binding slow compared to intramolecular rearrangements.

First order kinetics, mass action kinetics:

v = k·E·S, k = k(ATP)

Metabolism

Glucose → Gluc 6-P (ATP → ADP), catalysed by Hexokinase, Mg2+

Typical choice: Michaelis-Menten kinetics

E + S ⇌ ES → E + P (binding fast, conversion slow)

Requirement: E << S

v = Vmax·S / (KM + S), Vmax = k2·Etot
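The Michaelis-Menten rate law follows from the scheme E + S ⇌ ES → E + P when the complex ES is assumed to be in a quasi-steady state, which is reasonable under the stated requirement E << S. A short derivation is sketched below; the names k_1 and k_{-1} for the binding and dissociation rate constants are assumptions introduced here, while k_2 is the rate constant of the slow conversion step.

```latex
\begin{align}
\frac{d[ES]}{dt} &= k_1 [E][S] - (k_{-1} + k_2)[ES] \approx 0,
  \qquad E_{tot} = [E] + [ES] \\
[ES] &= \frac{E_{tot}\,[S]}{K_M + [S]},
  \qquad K_M = \frac{k_{-1} + k_2}{k_1} \\
v &= k_2 [ES] = \frac{V_{max}\,[S]}{K_M + [S]},
  \qquad V_{max} = k_2 E_{tot}
\end{align}
```

When catalyst and substrate have comparable concentrations, as is typical in signaling, the quasi-steady-state step is no longer justified and the mass action form is the safer choice.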

Spatial effects

Metabolism

„well stirred“

Molecules are considered to meet with probability according to their concentration (mass action).

Spatial effects usually neglected.

Signaling

„well stirred“ ???

Low number of molecules, highly organised complexes, often membrane-bound.

Spatial effects should be considered (problem with ODEs), at least as „compartmentalisation“

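Temporal characterisation

Metabolism

Time constants for reactions: for A ⇌ B with rate constants k+ and k-, τ = 1/(k+ + k-)

Time constants for metabolites: defined for each metabolite Si from the stoichiometric coefficients nij and the dependence of the rates vj on Si

nij - stoichiometric coefficients

Characteristic time Tc: definition acc. to Llorens et al. 1998

Signaling

Heinrich et al., 2002

Transition time: τi = ∫0..∞ t·Xi(t) dt / ∫0..∞ Xi(t) dt

Duration: ϑi = sqrt( ∫0..∞ t²·Xi(t) dt / ∫0..∞ Xi(t) dt − τi² )

Amplitude: Si = ∫0..∞ Xi(t) dt / (2·ϑi)

[Plot: signal versus time, with transition time, duration and amplitude marked]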
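The three signaling measures attributed to Heinrich et al. (2002) are ratios of time integrals of the signal X(t): the transition time is the mean time of the signal, the duration is its spread around that mean, and the amplitude is the height of a rectangle of width twice the duration with the same area. A numerical sketch using trapezoidal integration over sampled data (the function name and the sampling are assumptions):

```python
import math


def signal_measures(times, values):
    """Return (transition_time, duration, amplitude) of a sampled signal X(t).

    times and values are equally long lists; the signal is assumed to have
    decayed to (almost) zero by the last time point.
    """
    def integrate(ys):
        # Trapezoidal rule on the given time grid
        return sum(
            (times[i + 1] - times[i]) * (ys[i] + ys[i + 1]) / 2.0
            for i in range(len(times) - 1)
        )

    area = integrate(values)
    first_moment = integrate([t * x for t, x in zip(times, values)])
    second_moment = integrate([t * t * x for t, x in zip(times, values)])

    transition_time = first_moment / area
    duration = math.sqrt(second_moment / area - transition_time ** 2)
    amplitude = area / (2.0 * duration)
    return transition_time, duration, amplitude
```

For an exponentially decaying signal X(t) = exp(−λt) the integrals can be done by hand and give a transition time of 1/λ, a duration of 1/λ and an amplitude of 0.5, which makes a convenient check.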

Conclusions

Models for Metabolism and Signaling can use the same design principles.

Metabolism and Signaling may take place in

- different areas of the cells

- different regions of the concentration space

- different time scales

Signaling models have to account for the hierarchy in the system

Regulatory couplings (feedback) distribute control in both cases.

EXAMPLE TGFbeta signal transduction: the SMAD engine

Overview of the pathway

• Ligand dimer binds to receptor heterotetramer (type I and II receptors, both Ser/Thr kinases)

• r-SMAD1/5/8 versus r-SMAD 2/3

• Phosphorylated r-SMAD binds SMAD4 and travels to the nucleus

• Ubiquitylation (SMURF1-dependent and independent)

LET'S LOOK UP THE TGFbeta PATHWAY!

www.reactome.org

Example: Vilar et al. 2006, PLoS Computational Biology

Signal Processing in the TGFbeta Superfamily Ligand/Receptor Network

From Vilar et al. 2006, 2:1 0036-0045, PLoS Computational Biology

14 ligands, 5 type II and 7 type I receptors – this results in 50 different ligand/receptor complexes

Figure 2

Unusual features of the TGFbeta pathway

Simple core transduction engine (two SMAD channels: 2/3 and 1/5/8) but very complex, diverse responses (42 ligands, 5 type II and 7 type I receptors, 300 target genes)

Receptors are constitutively internalised and recycled – only app. 10% present on the plasma membrane at any time

Comparatively late activation peak: app. 60 minutes (compare with EGFR of only 5 minutes)

Several negative feedback loops, including:

- constitutive degradation

- ligand-induced degradation (Smad7-Smurf2)

From Vilar et al. 2006, 2:1 0036-0045, PLoS Computational Biology

Ki = 1/3 min

30 min

60 min

Klid = 1/4 min

Figure 3

Sources of experimental data

Mitchell H, Choudhury A, Pagano RE, Leof EB. Ligand-dependent and -independent transforming growth factor-beta receptor recycling regulated by clathrin-mediated endocytosis and Rab11. Mol Biol Cell, 2004, 15: 4166-4178:

• Recycling rate – Figure 3 (app. 30 min)

• Internalisation rate – Figure 4

Di Guglielmo GM, Le Roy C, Goodfellow AF, Wrana JL. Distinct endocytic pathways regulate TGF-beta receptor signalling and turnover. Nat Cell Biol, 2003, 5: 410-421:

• Internalisation rate – Table 1 - receptors are internalised through the clathrin pathway and lipid-caveolar compartments with similar rates

• Degradation rate – Figure 3 – app. 400 min

Figure 3, Mitchell et al.

Figure 3. TGF-beta receptors recycle at the same rate in the presence and absence of ligand. (A) Mb202 1-18 cells were processed for imaging and fluorescence quantitation as in Figure 2, B and C, except 10 ng/ml GM-CSF was included in both incubations. Bar, 10 µm. (B) Cultures were labeled with 125I-Fab anti-GM-CSF receptor- for 2 h at 4°C in the presence ( ) or absence ( ) of 10 ng/ml GM-CSF. After washing and incubation at 37°C for 30 min (in the presence or absence of 10 ng/ml GM-CSF), labeled receptor antibody was removed by acid wash and the cultures returned to 37°C. (…) Results are expressed as percentage of the total cell-associated radioactive counts after the first acid strip and before further incubation at 37°C, and indicate the mean ± SD of two experiments done in duplicate.

Figure 4, Mitchell et al.

• Figure 4. TGF-beta receptors internalize at the same rate regardless of activation state. Mb202 1-18 cells were prebound with radiolabeled antibody in the presence ( ) or absence ( ) of 10 ng/ml GM-CSF as in Figure 3B and then incubated at 37°C for the indicated times. Surface antibody was removed by acid treatment at 4°C, after which cells were processed to determine internalized radioactivity (see Materials and Methods). Results are expressed as percentage of total cell-associated radioactive counts before incubation at 37°C and indicate the mean ± SD of two experiments done in duplicate.

Table 1, Di Guglielmo et al. Quantitation of TGF-beta receptor distribution by immunoelectron microscopy

From Vilar et al. 2006, 2:1 0036-0045, PLoS Computational Biology

Figure 4

Plasma membrane concentrations

[lRiRii] - ligand/heterotetramer receptor complex

[l] - ligand

[Ri] - receptor type i

[Rii] - receptor type ii

kα - ligand/receptor complex formation rate

kcd - constitutive degradation rate

klid - ligand induced degradation rate

ki - internalisation rate

Endosomal concentrations

[lRiRii] - ligand/heterotetramer receptor complex

[Ri] - receptor type i

[Rii] - receptor type ii

ki - internalisation rate

kr - recycling rate
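How internalisation and recycling determine the share of receptors on the plasma membrane can be illustrated with a much reduced sketch: a single receptor species without ligand, exchanged between a surface pool and an endosomal pool. This is not the model of Vilar et al.; the equations, the function names, the reading of the rate values as per-minute constants (ki = 1/3, kr = 1/30) and the value of kcd are assumptions made for illustration.

```python
def surface_fraction(ki, kr):
    """Steady-state fraction of receptors on the plasma membrane.

    Follows from ki * R_surface = kr * R_endosome at steady state.
    """
    return kr / (ki + kr)


def simulate_receptors(production, ki, kr, kcd, t_end, dt=0.1):
    """Euler integration of surface (r_s) and endosomal (r_e) receptor pools.

    Assumed equations (no ligand present):
        d r_s / dt = production - (ki + kcd) * r_s + kr * r_e
        d r_e / dt = ki * r_s - kr * r_e
    """
    r_s, r_e = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        d_rs = production - (ki + kcd) * r_s + kr * r_e
        d_re = ki * r_s - kr * r_e
        r_s += dt * d_rs
        r_e += dt * d_re
    return r_s, r_e
```

With ki = 1/3 and kr = 1/30 per minute the surface fraction is 1/11, in line with the statement that only about 10% of receptors are on the plasma membrane at any time.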

From Vilar et al. 2006, 2:1 0036-0045, PLoS Computational Biology

Late and long

Slower rates for internalisation and recycling: Ki = 1/10 min kr = 1/100 min

Figure 5

From Vilar et al. 2006, 2:1 0036-0045, PLoS Computational Biology

CIR makes the difference

Figure 6

PART 3: PHYLOGENETICS

WHAT I WILL TALK ABOUT

• A BIT OF THEORY

• EXAMPLES (CRISPs AND SMADs)

• MEGA PACKAGE - HOWTO

• INTERPRETATIONS (what can a simple BLAST search, multiple sequence alignment, or a tree, tell me about BIOLOGY)

First Things First (definitions)

• Phylogenetic analysis

• Phylogenetic tree
  – rooted
  – unrooted

• Homology
  – paralogy
  – orthology
    • one-to-one
    • co-orthology

• Nucleotide substitutions
  – synonymous
  – non-synonymous

A phylogenetic analysis of a family of related nucleic acid or protein sequences is a determination of how the family members might have been derived during evolution

Phylogenetic tree – a graphical representation that depicts evolutionary relationships between a set of related sequences. Most-alike sequences are placed at the outer ends of two branches that are joined below into a lower common branch, representing their derivation from an ancestral sequence. An unrooted tree does not provide information on the common ancestor to the group.

What is phylogenetics?

The simplest tree

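[Figure: the simplest tree. Species A and Species B descend from an ancestral species along the axis of evolutionary time; likewise Gene A and Gene B descend from an ancestral gene. Labelled parts of the tree: branches, node, root]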

Homologs. Genes whose sequences are so similar that they almost certainly arose from a common ancestor gene

(1) Orthologs are genes in different species that arose from a single gene in the most recent common ancestor of those species – that is, by a process of speciation

(2) Paralogs, on the other hand, are genes in the same species that arose from a single gene in an ancestral species by a process of duplication

Who is Who of -ologs

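[Figure: gene tree along the axis of evolutionary time. An ancestral gene gives rise to Gene A1, Gene A2, Gene A2b, Gene B1 and Gene B2; the relationships are labelled as paralogs, co-orthologs and 1:1 orthologs]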

Non-synonymous substitution – a nucleotide substitution that results in an amino acid change (dn)

Synonymous substitution – a ”silent” nucleotide substitution, often in the third codon position, that does not result in an amino acid change (ds)

dn/ds – the simplest test for the rate of evolution (dn/ds < 1, > 1, or = 1)
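Whether a codon difference is synonymous can be decided by translating both codons with the standard genetic code. The sketch below only classifies and counts codon differences between two aligned coding sequences; it does not normalise by the number of synonymous and non-synonymous sites or correct for multiple substitutions, as programs such as yn00 do, so it is not a dn/ds estimate. The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(codon_a, codon_b):
    """Return 'identical', 'synonymous' or 'non-synonymous' for two codons."""
    if codon_a == codon_b:
        return "identical"
    if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
        return "synonymous"
    return "non-synonymous"


def count_substitutions(seq_a, seq_b):
    """Count differing codons of each kind in two aligned, gap-free sequences."""
    counts = {"synonymous": 0, "non-synonymous": 0}
    for i in range(0, len(seq_a) - 2, 3):
        kind = classify_substitution(seq_a[i:i + 3], seq_b[i:i + 3])
        if kind != "identical":
            counts[kind] += 1
    return counts
```

For example, GAA and GAG both encode glutamate, so a change in the third position is synonymous, whereas GAA to GAT replaces glutamate with aspartate.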

Synonymous or non-synonymous?

EXAMPLE

cysteine-rich secretory proteins (CRISPs)

There are three CRISP genes in human, rat and mouse. However, their nomenclature is misleading

• None of the genes are simple one-to-one orthologs

• A single ancestral gene at the base of the vertebrate lineage was most likely subject to two rounds of gene duplication before the human/rodent split, but the picture is complicated by species-specific duplications and lineage-specific losses

• A surprisingly high number of changes in gene expression patterns have occurred during the evolution of the CRISP family. For detailed discussion, please see: (Huminiecki and Wolfe, Genome Research, 2004)

EXAMPLE TGFbeta signal transduction: the SMAD engine


Interesting phylogenetic phenomena

• DPP/BMP Type-1 receptor and an r-SMAD found in a non-bilaterian cnidarian (Acropora millepora) – has the pathway evolved in a context other than dorsoventral patterning?

• Two SMAD4 in frogs: XSMAD4α and XSMAD4β. Worms could also have two co-SMADs (Sma-4 and Daf-3), but only one SMAD4 is expected in mammals!

What is the ancestral SMAD?

• Hypothesis: an ancestral SMAD – CoRe-SMAD – worked as a homodimer. The gene duplicated and gave rise to an r-SMAD and a co-SMAD

• But where did the i-SMADs come from?

– i-SMADs evolve faster (evidence: average dn/ds, length of protein branches, missing phosphorylation motif, and L3 sequence not conserved between DAD and i-SMAD6, 7);

– (((mad, dsmad2), medea),dad)

– (((((SMAD1,SMAD5), SMAD9),SMAD2, SMAD3), SMAD4), SMAD6, SMAD7)
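The two bracketed topologies above are written in a Newick-like notation: each pair of parentheses groups the descendants of one common ancestor. As a minimal sketch, assuming names only (no branch lengths or internal labels), they can be read into nested tuples with a small recursive parser. The function names `parse_tree` and `leaf_names` are illustrative; established packages offer full Newick support.

```python
def parse_tree(text):
    """Parse a name-only, Newick-like string into nested tuples of leaf names."""
    text = text.replace(" ", "").rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1  # consume ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        if start == pos:
            raise ValueError("missing leaf name")
        return text[start:pos]

    tree = parse_node()
    if pos != len(text):
        raise ValueError("unexpected trailing characters")
    return tree


def leaf_names(tree):
    """List leaf names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaf_names(child))
    return names


fly = parse_tree("(((mad, dsmad2), medea),dad)")
vertebrate = parse_tree("(((((SMAD1,SMAD5), SMAD9),SMAD2, SMAD3), SMAD4), SMAD6, SMAD7)")
print(fly)
print(leaf_names(vertebrate))
```

Read this way, the fly topology places dad outside the group formed by the r-SMADs (mad, dsmad2) and the co-SMAD (medea).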

[Figure: neighbour-joining tree of SMAD proteins (amino-acid PAM matrix). Labels, in order of appearance: vertebrate SMAD1,5,9 and D. melanogaster Mad; vertebrate receptor SMADs and D. melanogaster dSMAD2; sma-2; daf-8; sma-3; daf-3; sma-4; vertebrate co-SMADs and D. melanogaster Medea (Med, dSMAD4); daf-14; tag-68; D. melanogaster Dad; vertebrate SMAD7 and vertebrate SMAD6. Scale bar: 0.5.]

Fascinating C. elegans SMADs

Positive selection in sma/daf branches?

Sma genes control body size, while daf genes control dauer formation. Lengths of protein branches suggested that daf genes underwent a period of very fast protein evolution. Could it be positive selection in response to environmental change? dn/ds test positive!

Daf       Corresponding SMAD   Evidence
Daf-3     co-SMAD(?)           nj_PAM, newfeld2_MH1_ml
          i-SMAD               newfeld2_p-loop_degenerate
Daf-8     r-SMAD               nj_PAM, newfeld2_p-loop
Daf-14    co-SMAD(?)           nj_PAM
          co-SMAD              newfeld2_p-loop(2S)
Tag-68    i-SMAD               nj_PAM

Interpretations of phylogenies

How could all this help in my project?

I will propose just a few ideas – please, join in, voice your suggestions, discuss your favourite gene family!!!

Application 1: ”Evolutionary Saga”, or my gene family over the eons

Is the gene family present in bacteria, yeast, plants, non-bilaterian animals? To find out, just run a BLAST search against GenBank and read the names of the species with hits. Can one infer from this how old the family is?

How many gene duplication events were there, and when did they occur? Have there been any deletions? Has the intron number changed, or are there no introns at all (suggestive of retroposition)?

Can these events be correlated with the development of a new body plan, new organs, or novel physiology? Is this correlation supported by the sites of expression?

Metazoan phylogeny

[Figure: metazoan phylogeny. Taxa shown: Porifera (sponges); Cnidaria (jellyfish, coral); Flatworms; Molluscs (gastropods); Annelids (leeches); Arthropods (insects); Vertebrates (fish, birds, mammals); Urochordates; Cephalochordates; Hemichordates; Echinoderms (sea urchins, starfish); Nematodes (?). A bracket marks the Bilateria. Annotations on the tree: Wnt, TGFbeta, FGF2R?]

Expansion of the signal transduction toolkit

           Cnidaria and porifera   C. elegans   Drosophila   Human
TGFbeta    1(?)                    4            -            27
Wnt        >1                      5            7            18
FGF        -                       1            1            23

Increased anatomical complexity (diversification of body plans and body parts)

Application 2: ”My Gene and the Genome”, or how does my favourite gene compare to other members of the gene family?

How many related genes are there, how similar are they, and where are they physically located in the genome (most duplications are tandem, head-to-tail)?

Evidence for functional redundancy? (important for knockouts)

Tissue-specific expression patterns, or do they overlap (expression.gnf.org)?

Genomic context (www.ensembl.org)

Application 3: ”Special Sites in my Gene”

Multiple sequence alignment:

- regions of conservation

- regions of change

Important for the design of my next deletion mutant, hybridization probe, or a set of primers

Visual inspection of the multiple sequence alignment will be sufficient in most cases (check out Pfam or ENSEMBL for precomputed alignments of your favourite family – www.ensembl.org)
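When an alignment is too long to scan by eye, the same idea can be sketched in a few lines: score each column by the share of sequences carrying its most common residue. The toy alignment below is hypothetical, and the scoring rule (gaps count against conservation) is one simple choice among many, not a standard taken from Pfam or ENSEMBL.

```python
from collections import Counter

# Hypothetical toy alignment: three aligned protein fragments, "-" marks a gap.
ALIGNMENT = [
    "MKTAYL",
    "MKSAYL",
    "MKTA-L",
]


def conservation(alignment):
    """Return, per column, the fraction of sequences sharing the commonest residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = Counter(res for res in column if res != "-")
        top = residues.most_common(1)[0][1] if residues else 0
        scores.append(top / len(alignment))  # gaps lower the score
    return scores


def conserved_columns(alignment, threshold=1.0):
    """Return 0-based indices of columns whose score reaches the threshold."""
    return [i for i, s in enumerate(conservation(alignment)) if s >= threshold]


print(conservation(ALIGNMENT))
print(conserved_columns(ALIGNMENT))
```

Fully conserved columns are candidates for functionally important sites; variable columns are better targets when a probe or primer has to distinguish between family members.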

Reference Bioinformatics: Sequence and Genome Analysis

David W. Mount

CSHL lab manual series

Great introduction to the field

Reference Molecular Evolution and Phylogenetics

Masatoshi Nei, Sudhir Kumar

Nuts and bolts of tree drawing methods

Reference From DNA to Diversity: Molecular Genetics and the Evolution of Animal Design

S. Carroll, J. Grenier, S. Weatherbee

Interpretations

THANK YOU!