Page 1: Comparative Biology observable Parameters:time rates, selection Unobservable Evolutionary Path observable Most Recent Common Ancestor ? ATTGCGTATATAT….CAG

Comparative Biology

[Figure: observable sequences (ATTGCGTATATAT….CAG) connected by an unobservable evolutionary path to their Most Recent Common Ancestor (?), with an arrow giving the time direction. Parameters: time, rates, selection.]

Key Questions:

•Which phylogeny?
•Which ancestral states?
•Which process?

Key Generalisations:

•Homologous objects
•Co-modelling
•Genealogical Structures?

Page 2:

Structure of Biology: Physical Systems and Evolution

•Data: Sequences, Structures, Expression Levels, …

•Models: M1, .., Mk. Framework for model formulation.

•Knowledge & Representation: Scientific Texts, Systems Biology Markup Language, Process Algebras, …

•Structure of Biological Systems: Atoms, Molecules, Networks, Motors, Central Dogma, Genetic Code, …

Dynamics - the system as a physical entity

Evolution - the system has evolved

•Part of individuals in a population
•Part of species in the tree of life

Page 3:

The Data

• Sequence Data

• Metabonomics/Metabolomics and Small Molecule Detection

• Expression Data

• Proteomics and Protein Interactions

• Structures from Crystallography, NMR and Cryo-EM

• Single Molecule Measurements

• Microscopy

Page 4:

Example of Reduction/Levels

Enzyme catalysis:

Such reductions are based on “biological concepts”.

A molecular dynamics sample path involving one catalysis event:

Set of E + S initial states → ES states? → Set of E + P final states

10^9 time steps

10^4 atoms

Discrete models of one catalysis event (the reduction):

E + S → ES → E + P (3-5 steps)

Other clear reductions:

Individual molecules → Concentration of molecules

Set of atoms → Nucleotide

Lipid molecules → Membrane

Page 5:

Elements of Physical Dynamic Modeling

Time:

•Continuous Time
•Discrete Time: 0, 1, 2, …, k
•No Time - Equilibrium

State & Space:

•Continuous Space
•Discrete Space
•No Space or Space Homogeneity

Time/Space dependency:

•Discrete Time: 0, 1, …, k-i, …, k-1, k

Deterministic or Stochastic (p0, p1, p2, p3), in Discrete Time or Continuous Time

Complicated & contentious.

Page 6:

Physical Dynamic Modeling: Key Models

Molecular Dynamics: Quantum Mechanics, Classical Potential

Continuous Time Markov Chains / Gillespie Algorithm

Ordinary Differential Equations - ODE

Partial Differential Equations - PDE (Turing Model)

Stochastic Ordinary Differential Equations - SODE

Stochastic Partial Differential Equations - SPDE

Models on Networks: Boolean Networks, Kinetic Models
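
As an illustration of the Continuous Time Markov Chain / Gillespie entry above, here is a minimal sketch that simulates the discrete enzyme scheme E + S ⇌ ES → E + P. The function name, the reversible binding step, and the rate constants `k_on`, `k_off`, `k_cat` are assumptions made for this example; they are not taken from the slides.

```python
import random


def gillespie_enzyme(e, s, k_on, k_off, k_cat, t_max, seed=0):
    """Simulate E + S <-> ES -> E + P with the Gillespie algorithm.

    All rate constants are hypothetical illustration values.
    Returns a list of (time, E, S, ES, P) tuples.
    """
    rng = random.Random(seed)
    es, p, t = 0, 0, 0.0
    trajectory = [(t, e, s, es, p)]
    while True:
        # Propensities of binding, unbinding and catalysis.
        rates = [k_on * e * s, k_off * es, k_cat * es]
        total = sum(rates)
        if total == 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Pick which reaction fires, proportionally to its propensity.
        u = rng.random() * total
        if u < rates[0]:
            e, s, es = e - 1, s - 1, es + 1
        elif u < rates[0] + rates[1]:
            e, s, es = e + 1, s + 1, es - 1
        else:
            e, es, p = e + 1, es - 1, p + 1
        trajectory.append((t, e, s, es, p))
    return trajectory


if __name__ == "__main__":
    path = gillespie_enzyme(e=5, s=50, k_on=0.01, k_off=0.1, k_cat=0.5, t_max=50.0)
    print("events:", len(path) - 1, "final state:", path[-1])
```

Each event changes the state by one reaction, so the total enzyme count (E + ES) and the total substrate count (S + ES + P) stay constant along the path.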

Page 7:

Elusive Biological Concepts: Emergence

Other EBCs: function, robustness, modularity, purpose, top-down, downward causation.

Strong emergence (never observed): The dynamic laws for k components are not deducible from their properties and their relationships.

Weak emergence: something “new” emerges.

Reduction:

•Lower level: High dimensional, detailed description
•Higher level: Low dimensional; “Surprising” stable, robust properties

Questions: Automatic detection of emergence? How frequent is it? Does selection pull out emergent systems?

Ex.1 Network Dynamics: Large set of enzymes and atoms → Oscillations, sensitive amplification

Ex.2 Neural Networks: Large set of cells → Ability to calculate, consciousness

Page 8:

Levels & Objects

Level | Example(s) | Data | Modelling Techniques
Atomic, Molecules | -globin, water, cell membrane | Single molecule measurements, X-ray diffraction of crystals, NMR | Classical potentials and Newtonian Dynamics, Quantum Mechanics
Molecular complexes | Ribosome, hemoglobin | Single molecule measurements | Mechanical analogue models, Continuous Time Markov Chain with finite state space
Molecule concentrations | | Concentration of metabolites, fate of isotopes in different molecules | ODEs (many molecules/concentrations), kinetics
Metabolic Network | Citric Acid Cycle | Enzyme and metabolite concentrations and metabonomics | ODEs, Kinetic Models, Flux Analysis
Regulatory Network | -globins and their regulators | Expression data | Boolean networks, Petri Nets, ODEs
Signal Transduction | Mitogen-activated protein kinase (MAPK) | Protein Interaction and Expression Data | ODEs, Continuous Time Markov Chains
Protein Interaction Network | Yeast PIN | Mass Spectroscopy | No dynamics involved, i.e. a data type.
Motors | Flagellar Motor | Microscopy, single molecule fluorescence | Mechanical Analogue Models
Cell(s) | B-Cell, zygote, E. coli | Microscopy, expression data, proteomics, .. | Integration of genetic, mechanical and network models.
Tissue | Cancer | | Partial differential equations (PDEs), cellular automata.
Organ | Liver, lung, heart | Mechanical measurements | Multilevel integrated modelling, including mechanics.

Page 9:

How to Compare?

Examples: Protein Structures, Networks, Craniums/Shape

Homologous - Non-Homologous?

Homologous components:

A C G T
A - T T

Matching - Similarity - Distance

Distance from shortest paths

The ideal: the probability of the first observation, multiplied by the sum over possible evolutionary trajectories to the second observation.

[Figure, labelled “Informal”: a set (AG, T, AC, CT) and a pair (AC, CT), with P( ) P( ).]

Page 10:

“Natural” Evolutionary Modeling

Components: Birth and Death Process. Components are born at a birth rate and die at a death rate.

Discrete states: Continuous Time Finite States Markov Chains. Initially all rates the same.

[Figure: p0, p1, p2, p3]

Continuous states: Continuous Time Continuous States Markov Process - specifically Diffusion. Initially simplest Diffusion: Brownian Motion, then Ornstein-Uhlenbeck.
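
The discrete-state case with all rates equal can be made concrete with the Jukes-Cantor model on nucleotides. The sketch below is an illustration under stated assumptions: the function names, the per-pair rate `alpha`, and the uniform starting distribution are choices made here, not something the slides specify. The transition probability is the closed form of the sum over all substitution histories between two states.

```python
import math

NUCLEOTIDES = "ACGT"


def jc69_transition(x, y, t, alpha=1.0):
    """P(X(t) = y | X(0) = x) when every change x -> y (x != y) has rate alpha."""
    decay = math.exp(-4.0 * alpha * t)
    if x == y:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


def pair_log_likelihood(seq1, seq2, t, alpha=1.0):
    """log P(seq1, seq2) for two gap-free aligned sequences separated by time t.

    Assumes independent sites and a uniform distribution (1/4 each) for the
    first sequence, so each site contributes P(a) * P(a -> b in time t).
    Requires t > 0 if the sequences differ anywhere.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(
        math.log(0.25 * jc69_transition(a, b, t, alpha))
        for a, b in zip(seq1, seq2)
    )


if __name__ == "__main__":
    for t in (0.01, 0.1, 1.0):
        print(t, pair_log_likelihood("ATTGCG", "ATTGTG", t))
```

Scanning `t` as in the usage lines gives a crude likelihood curve for the divergence time of the pair.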

Page 11:

Comparative Biology

Nucleotides/Amino Acids

Continuous Quantities

Sequences

Gene Structure

Structure: RNA, Protein

Networks: Metabolic Pathways, Protein Interaction, Regulatory Pathways, Signal Transduction

Macromolecular Assemblies

Motors

Shape

Patterns

Tissue/Organs/Skeleton/….

Dynamics: MD movements of proteins, Locomotion

Culture

Language: Vocabulary, Grammar, Phonetics, Semantics

• Observed or predicted?

• Choice of Representation.

Page 12:

Comparative Biology: Evolutionary Models

Object | Type | Reference
Nucleotides/Amino Acids/codons | CTFS (continuous time finite state) | Jukes-Cantor 69 + 500 other
Continuous Quantities | CTCS | Felsenstein 68 + 50 other
Sequences | CT countable S | Thorne, Kishino, Felsenstein, 91 + 40
Gene Structure | Matching | DeGroot, 07
Genome Structure | CTCS | MM
Structure: RNA | SCFG-model like | Holmes, I. 06 + few others
Structure: Protein | |
Networks (Metabolic Pathways, Protein Interaction, Regulatory Pathways, Signal Transduction) | CT countable S | Snijder, T
Macromolecular Assemblies | |
Motors I | |
Shape | |
Patterns | |
Tissue/Organs/Skeleton/…. | |
Dynamics: MD movements of proteins, Locomotion | |
Culture | |
Language: Vocabulary | “Infinite Allele Model” (CTCS) | Swadesh, 52; Sankoff, 72, …
Language: Grammar | - |
Language: Phonetics, Semantics | |
Phenotype | |

Page 13:

“Natural” Co-Modeling

• Joint evolutionary modeling of X(t),Y(t).

The ideal, rarely if ever done.

• Conditional evolutionary modeling of X(t) given Y(t). The standard in comparative genomics. The distribution of Y(t) is not derived from evolution, but from practicality.

Protein Gene Prediction

RNA structure prediction

Regulatory signal prediction.

• Y(t) deterministic function of X(t)

Movement of proteins

Protein Structures

Page 14:

Examples

•RNA structure prediction

•Comparative Genomics

•Networks Patterns

•Protein Structures

Page 15:

Structure Dependent Molecular Evolution: RNA Secondary Structure

From Durbin et al. (1998), Biological Sequence Comparison

Secondary Structure: Set of paired positions.

A-U + C-G can base pair. Some other pairings can occur + triple interactions exist.

Pseudoknot – non nested pairing: i < j < k < l and i-k & j-l.

Page 16:

Simple String Generators

Variables (capital), Letters (small)

Context Free Grammar:

S --> aSa | bSb | aa | bb

One sentence (even length palindromes): S --> aSa --> abSba --> abaaba

Regular Grammar: Start with S

S --> aT | bS
T --> aS | bT | ε

One sentence – odd # of a’s: S --> aT --> aaS --> aabS --> aabaT --> aaba
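
To make the two grammars concrete, the sketch below decides membership for each of them. It is only an illustration: the function names are invented here, and the regular grammar is read as including a terminating rule T --> ε, an assumption inferred from the last step of the derivation aabaT --> aaba.

```python
def context_free_derives(word):
    """True if word is derivable from S --> aSa | bSb | aa | bb."""
    if word in ("aa", "bb"):
        return True
    return (
        len(word) >= 4
        and word[0] == word[-1]
        and word[0] in "ab"
        and context_free_derives(word[1:-1])
    )


def regular_derives(word):
    """True if word is derivable from S --> aT | bS, T --> aS | bT | (empty).

    The variable plays the role of a state: reading 'a' swaps S and T,
    reading 'b' keeps the state, and a derivation may only stop in T.
    """
    state = "S"
    for letter in word:
        if letter == "a":
            state = "T" if state == "S" else "S"
        elif letter != "b":
            return False
    return state == "T"


if __name__ == "__main__":
    print(context_free_derives("abaaba"))  # the slide's palindrome
    print(regular_derives("aaba"))         # the slide's odd-a sentence
```

The regular grammar needs only one state variable at the right end of the string, while the context free grammar grows from the middle outwards, which is what lets it match nested pairs.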

Page 17:

Stochastic Grammars

The grammars above classify all strings as belonging to the language or not.

Each variable has a finite set of substitution rules. Assigning probabilities to the use of each rule will assign probabilities to the strings in the language.

If there is a 1-1 derivation (creation) of a string, the probability of the string can be obtained as the product of the probabilities of the applied rules.

i. Start with S. S --> (0.3)aT | (0.7)bS    T --> (0.2)aS | (0.4)bT | (0.2)ε

S --> aT --> aaS --> aabS --> aabaT --> aaba

*0.3 *0.2 *0.7 *0.3 *0.2

ii. S --> (0.3)aSa | (0.5)bSb | (0.1)aa | (0.1)bb

S --> aSa --> abSba --> abaaba

*0.3 *0.5 *0.1
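
The product rule above can be checked mechanically. The following sketch replays the two derivations from the slide and multiplies the rule probabilities. The dictionary layout and the function name are assumptions for illustration; the third rule for T, printed with probability 0.2 and no right-hand side, is read here as T --> ε. The probabilities for T are copied as printed, although they sum to 0.8 rather than 1.

```python
# Rule probabilities as printed on the slide; ("T", "") is the assumed empty rule.
REGULAR = {
    ("S", "aT"): 0.3, ("S", "bS"): 0.7,
    ("T", "aS"): 0.2, ("T", "bT"): 0.4, ("T", ""): 0.2,
}
CONTEXT_FREE = {
    ("S", "aSa"): 0.3, ("S", "bSb"): 0.5, ("S", "aa"): 0.1, ("S", "bb"): 0.1,
}


def derive(grammar, rules, start="S"):
    """Apply rules in order to the leftmost matching variable.

    Returns the derived string and the product of the rule probabilities.
    """
    form, probability = start, 1.0
    for variable, replacement in rules:
        if variable not in form:
            raise ValueError(f"{variable} does not occur in {form!r}")
        form = form.replace(variable, replacement, 1)
        probability *= grammar[(variable, replacement)]
    return form, probability


if __name__ == "__main__":
    print(derive(REGULAR, [("S", "aT"), ("T", "aS"), ("S", "bS"), ("S", "aT"), ("T", "")]))
    print(derive(CONTEXT_FREE, [("S", "aSa"), ("S", "bSb"), ("S", "aa")]))
```

The first call multiplies 0.3, 0.2, 0.7, 0.3 and 0.2; the second multiplies 0.3, 0.5 and 0.1, matching the factors listed under each derivation.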

Page 18:

Secondary Structure Generators

S --> LS | L      (.869, .131)
F --> dFd | LS    (.788, .212)
L --> s | dFd     (.895, .105)
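
A sketch of how this stochastic grammar generates secondary structures, under the usual reading that s is an unpaired position and d…d is a base pair. That reading, the dot-bracket output, the function name, and the `max_len` safety cap are assumptions made for this example.

```python
import random

# Rule probabilities from the slide; "(" and ")" stand for the paired d ... d.
RULES = {
    "S": [(0.869, ["L", "S"]), (0.131, ["L"])],
    "F": [(0.788, ["(", "F", ")"]), (0.212, ["L", "S"])],
    "L": [(0.895, ["."]), (0.105, ["(", "F", ")"])],
}


def sample_structure(rng, max_len=1_000_000):
    """Sample one structure in dot-bracket notation by expanding from S.

    An explicit stack replaces recursion so deep stems cannot hit
    Python's recursion limit.
    """
    stack, out = ["S"], []
    while stack:
        symbol = stack.pop()
        if symbol in RULES:
            (p_first, first), (_, second) = RULES[symbol]
            chosen = first if rng.random() < p_first else second
            stack.extend(reversed(chosen))  # leftmost symbol is expanded next
        else:
            out.append(symbol)
            if len(out) > max_len:
                raise RuntimeError("sampled structure exceeded max_len")
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_structure(rng))
```

Because every pair is opened and closed by the same rule, each sampled string is a properly nested structure with no pseudoknots.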

Page 19:

RNA Structure Application

Knudsen & Hein, 2003

From Knudsen & Hein (1999)

Page 20:

Co-Modelling and Conditional Modelling

[Figure: panels labelled Observable and Unobservable; RNA bases U, C, G, A, C, A, U, A, C.]

Goldman, Thorne & Jones, 96

Knudsen.., 99

Eddy & co.

Meyer and Durbin 02; Pedersen …, 03; Siepel & Haussler 03

Pedersen, Meyer, Forsberg…, Simmonds 2004a,b

McCauley ….

Firth & Brown

i. P(Sequence | Structure)

ii. P(Structure)

$$P(\text{Structure} \mid \text{Sequence}) = \frac{P(\text{Sequence} \mid \text{Structure})\,P(\text{Structure})}{P(\text{Sequence})}$$

• Conditional Modelling

Needs: Footprinting - Signals (Blanchette)

AGGTATATAATGCG..... Pcoding{ATG-->GTG}
or
AGCCATTTAGTGCG..... Pnon-coding{ATG-->GTG}

Page 21:

Network Evolution

Statistics of Networks

Comparing Networks

Networks in Cellular Biology

A. Metabolic Pathways

B. Regulatory Networks

C. Signaling Pathways

D. Protein Interaction Networks - PIN

Empirical Facts

Dynamics on Networks (models)

Models of Network Evolution

Page 22:

A Model for Network Inference

•A core metabolism:

•A given set of metabolites:

•A given set of possible reactions - arrows not shown.

•A set of present reactions - M (black and red arrows)

Restriction R: A metabolism must define a connected graph

M + R defines

1. a set of deletable (dashed) edges D(M):

2. and a set of addable edges A(M):

Let λ be the rate of deletion and μ the rate of insertion. Write M' for M with one deletable edge removed and M'' for M with one addable edge inserted. Then

$$\frac{dP(M)}{dt} = \mu \sum_{M' \in D(M)} P(M') + \lambda \sum_{M'' \in A(M)} P(M'') - P(M)\,\bigl[\lambda\,|D(M)| + \mu\,|A(M)|\bigr]$$
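
The sets D(M) and A(M) are easy to compute for small graphs. The sketch below is an illustration under stated assumptions: reactions are treated as undirected edges between metabolites, "connected" is taken to mean that all metabolites lie in one component, and the names `deletable`, `addable`, `exit_rate`, `lam` and `mu` are invented here.

```python
def is_connected(nodes, edges):
    """True if the undirected graph (nodes, edges) has a single component."""
    nodes = set(nodes)
    if not nodes:
        return True
    neighbours = {v: set() for v in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in neighbours[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def deletable(nodes, present):
    """D(M): present edges whose removal keeps the graph connected."""
    return {e for e in present if is_connected(nodes, present - {e})}


def addable(possible, present):
    """A(M): possible edges not yet present (adding never disconnects)."""
    return possible - present


def exit_rate(nodes, possible, present, lam, mu):
    """Total rate of leaving M: deletion rate * |D(M)| + insertion rate * |A(M)|."""
    return lam * len(deletable(nodes, present)) + mu * len(addable(possible, present))


if __name__ == "__main__":
    nodes = {1, 2, 3, 4}
    possible = {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)}
    present = {(1, 2), (2, 3), (1, 3), (3, 4)}
    print(sorted(deletable(nodes, present)), sorted(addable(possible, present)))
```

In the usage example the three triangle edges are deletable, while the edge (3, 4) is not, because removing it would isolate metabolite 4.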

Page 23:

Likelihood of Homologous Pathways

Number of Metabolisms:

[Figure: metabolisms 1, 2, 3, 4]

+ 2 symmetrical versions

P( , )=P( )P( -> )

Eleni Giannoulatou

Approaches: Continuous Time Markov Chains with computational tricks.

MCMC

Importance Sampling

Page 24:

PIN Network Evolution

Barabasi & Oltvai, 2004; Berg et al., 2004; Wiuf et al., 2006

•A gene duplicates

•Inherits its connections

•The connections can change

Berg et al., 2004:

•Gene duplication slow ~10^-9/year

•Connection evolution fast ~10^-6/year

•Observed networks can be modeled as if node number was fixed.
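
The duplication step can be sketched in a few lines. This is a toy illustration only, not the model of any of the cited papers: the function name, the retention probability `p_keep`, and the choice to lose inherited links immediately instead of through a separate rewiring process are all assumptions.

```python
import random


def duplicate_and_diverge(adj, rng, p_keep=0.5):
    """Duplicate a random node; the copy keeps each inherited link with p_keep.

    adj maps an integer node id to the set of its neighbours and is updated
    in place. Returns (parent, child).
    """
    parent = rng.choice(sorted(adj))
    child = max(adj) + 1
    adj[child] = set()
    for neighbour in sorted(adj[parent]):
        # The duplicate inherits the parent's connection, then may lose it.
        if rng.random() < p_keep:
            adj[child].add(neighbour)
            adj[neighbour].add(child)
    return parent, child


if __name__ == "__main__":
    rng = random.Random(1)
    network = {0: {1, 2}, 1: {0}, 2: {0}}
    for _ in range(5):
        duplicate_and_diverge(network, rng, p_keep=0.7)
    print({node: sorted(links) for node, links in network.items()})
```

Connection change is much faster than duplication according to the rates quoted above, which is why a fixed node set with evolving links can be a reasonable approximation.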

Page 25:

Likelihood of PINs

•Can only handle 1 graph.

•Limited Evolution Model

[Figure labels: de-DAing, De-connecting, Data]

2386 nodes and 7221 links

Irreducible (and isomorphic)

735 nodes

)0,33,.66,.1(0

Wiuf et al., 2006

Page 26:

The Phylogenetic Turing Patterns I

Page 27:

The Phylogenetic Turing Patterns II

Stripes: p small    Spots: p large

Reaction-Diffusion Equations:

Analysis Tasks:

1. Choose Class of Mechanisms
2. Observe Empirical Patterns
3. Choose Closest set of Turing Patterns T1, T2,.., Tk,
4. Choose parameters p1, p2, .. , pk (sets?) behind T1,..

Evolutionary Modelling Tasks:

1. p(t1)-p(t2) ~ N(0, (t1-t2))
2. Non-overlapping intervals have independent increments

I.e. Brownian Motion

Scientific Motivation:

1. Is there evolutionary information on pattern mechanisms?
2. How do patterns evolve?
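
The two Brownian Motion properties listed under the evolutionary modelling tasks translate directly into a simulation of a pattern parameter p along one lineage. This is a sketch under assumptions: the function name and the variance scale `sigma2` are introduced here, with `sigma2 = 1` giving increments of variance equal to the elapsed time.

```python
import math
import random


def brownian_path(p0, times, sigma2=1.0, seed=0):
    """Simulate p at the given increasing time points, starting from p0.

    Each increment is an independent draw from N(0, sigma2 * dt),
    so non-overlapping intervals have independent increments.
    """
    rng = random.Random(seed)
    values = [p0]
    for t_prev, t_next in zip(times, times[1:]):
        dt = t_next - t_prev
        if dt < 0:
            raise ValueError("times must be non-decreasing")
        values.append(values[-1] + rng.gauss(0.0, math.sqrt(sigma2 * dt)))
    return values


if __name__ == "__main__":
    print(brownian_path(0.5, [0.0, 0.25, 0.5, 1.0]))
```

On a phylogeny the same function would be called once per branch, starting each child branch from the value reached at the end of its parent branch.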

Page 28:

Protein Structure

[Figure: -globin and Myoglobin (Known), with the history between them Unknown (?).]

300 amino acid changes

800 nucleotide changes

1 structural change

1.4 Gyr

1. Given Structure, what are the possible events that could happen?

2. What are their probabilities? Old fashioned substitution + indel process with bias.

Bias: Folding (Sequence → Structure) & Fitness of Structure

3. Summation over all paths.

Page 29:

Summary: The Virtues of Comparative Modeling

• It is the natural setup for much modeling and transfer of knowledge from one species/system to another.

• Even 1 system/species is an evolutionary observation:

[Figure: a single observation x (RNA bases U, C, G, A, C, A, U, A, C), with P(x): and P(Further history of x):]