Leptothorax gredosi Leptothorax racovitzae Camponotus herculeanus 0.99 0.58 0.99 0.96 0.76 0.76 0.91 1.00 0.58 1.00 0.99 0.91 Thomas Bayes 1702-1761


Bayesian inference

Computational phylogenetics, CSC, 10.-12.12.2006

Mikko Kolkkala

How to read a tree?

[Figure: a phylogenetic tree of ant samples, repeated over several builds of the slide. Tip labels run from Temcur312 and Temdul313 through Lepgre334 and Camher84 to Odosp526; the later builds mark the value 100 on the tree.]

Bayesian inference

Phylogenetic applications only very recently ("Why?" We'll return to that…)

Controversial philosophy
Subjective probability concept; degrees of belief measured as probabilities

A learning process
Prior and posterior probabilities

Spam filters

Subjective! Quack!

$$p(\Theta \mid D) = \frac{p(D \mid \Theta)\, p(\Theta)}{p(D)}$$

p = probability
D = data
Θ = model/hypothesis/parameters
| = read: "provided that" (conditional probability)

An example

Suppose we have ten identical-looking dice: nine ordinary and one loaded so that a six appears with probability 1/2. Let's pick one die at random. The probability of it being loaded is (of course) 1/10 (= prior).

Next, we roll the die once and get a six. What is the probability that we have picked the loaded die now?

$$p(\text{loaded die} \mid \text{a six}) = \frac{p(\text{a six} \mid \text{loaded die}) \cdot p(\text{loaded die})}{p(\text{a six})} = \frac{1/2 \cdot 1/10}{1/2 \cdot 1/10 + 1/6 \cdot 9/10} = \frac{1}{4} \quad (= \text{posterior})$$
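The dice calculation can be checked with a few lines of Python. This is only a sketch: the function name `posterior` and its arguments are made up for illustration, and exact fractions are used so that the result is not blurred by rounding. Feeding the posterior back in as the new prior shows the "learning process" after a second six.

```python
from fractions import Fraction

def posterior(prior, likelihood, likelihood_alt):
    """p(H | D) for a hypothesis H against its single alternative."""
    # p(D) = p(D | H) p(H) + p(D | not H) p(not H)
    evidence = likelihood * prior + likelihood_alt * (1 - prior)
    return likelihood * prior / evidence

p_six_loaded = Fraction(1, 2)    # p(a six | loaded die)
p_six_ordinary = Fraction(1, 6)  # p(a six | ordinary die)

# One loaded die among ten, one roll, one six
p_loaded = posterior(Fraction(1, 10), p_six_loaded, p_six_ordinary)
print(p_loaded)  # 1/4

# Learning: yesterday's posterior is today's prior (a second six is rolled)
p_after_two_sixes = posterior(p_loaded, p_six_loaded, p_six_ordinary)
print(p_after_two_sixes)  # 1/2
```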

An exercise: a reliable test?

Test for a rare disease (prevalence 0.1 %):
Disease: positive result with probability 0.99
No disease: positive result with probability 0.05

What is the probability that an individual who tests positive does not have the disease?

Answer: 0.98 (http://en.wikipedia.org/wiki/Bayesian_inference)
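The answer follows from the same theorem. A minimal sketch of the arithmetic, with variable names chosen here for illustration:

```python
prevalence = 0.001           # p(disease), i.e. 0.1 %
p_pos_given_disease = 0.99   # p(positive | disease)
p_pos_given_healthy = 0.05   # p(positive | no disease)

# p(positive), summed over both possibilities
p_pos = (p_pos_given_disease * prevalence
         + p_pos_given_healthy * (1 - prevalence))

# Bayes' theorem: p(no disease | positive)
p_healthy_given_pos = p_pos_given_healthy * (1 - prevalence) / p_pos
print(round(p_healthy_given_pos, 2))  # 0.98
```

Because the disease is so rare, the false positives from the healthy majority swamp the true positives.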

$$p(\text{model} \mid \text{data}) = \frac{p(\text{data} \mid \text{model}) \cdot p(\text{model})}{p(\text{data})}$$

"loaded die" = model
"a six" = data

$$p(\Theta \mid D) = \frac{p(D \mid \Theta)\, p(\Theta)}{p(D)}$$

From dice to biology:

Data: DNA alignment
Models: nucleotide substitution models, tree shape and branch lengths

$$p(\text{model} \mid \text{data}) = \frac{p(\text{data} \mid \text{model}) \cdot p(\text{model})}{p(\text{data})}$$

$$f(\Theta \mid X) = \frac{p(\Theta)\, l(X \mid \Theta)}{\int p(\Theta)\, l(X \mid \Theta)\, d\Theta}$$

f(Θ | X) = posterior distribution
p(Θ) = prior distribution
l(X | Θ) = likelihood function

$$p(\Theta \mid D) = \frac{p(D \mid \Theta)\, p(\Theta)}{p(D)}$$
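For a single parameter, the integral in the denominator can simply be approximated on a grid. The sketch below is a toy, not a phylogenetic analysis: the parameter is the unknown probability of rolling a six, and the data (3 sixes in 10 rolls), the flat prior and the function name `grid_posterior` are all assumptions made for illustration.

```python
def grid_posterior(prior, likelihood, n_points=1001):
    """Approximate the posterior density on an evenly spaced grid over [0, 1]."""
    step = 1.0 / (n_points - 1)
    thetas = [i * step for i in range(n_points)]
    unnorm = [prior(t) * likelihood(t) for t in thetas]
    # Trapezoidal rule for the denominator: integral of prior * likelihood
    evidence = step * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))
    return thetas, [u / evidence for u in unnorm]

# Hypothetical data: 3 sixes in 10 rolls
sixes, rolls = 3, 10
thetas, density = grid_posterior(
    prior=lambda t: 1.0,  # flat prior on [0, 1] (an assumption)
    likelihood=lambda t: t**sixes * (1 - t) ** (rolls - sixes),
)

step = thetas[1] - thetas[0]
post_mean = step * sum(t * d for t, d in zip(thetas, density))
print(f"posterior mean of theta: {post_mean:.3f}")
```

The posterior mean comes out close to 1/3. A grid like this works for one or two parameters; with tree topologies, branch lengths and substitution parameters the space is far too large, which is where MCMC comes in.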

If this Bayesian thing is so excellent, why hasn't it been used in phylogenetic analyses?

No-one can solve the equations!

Numerical solutions are possible, but only with powerful computers

MCMC = Markov Chain Monte Carlo

Parameters
• Tree topology
• Branch lengths
• Probabilities for nucleotide substitutions

"Exploring the tree space"

[Figure: probability plotted over parameter space, © Fredrik Ronquist]
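The idea of exploring a probability landscape can be sketched with a random-walk Metropolis sampler on a one-dimensional toy target. Everything here is an assumption for illustration: the two-peaked `unnorm_posterior` stands in for tree space, and the names and step size are arbitrary. Programs such as MrBayes use far more elaborate proposals over trees and model parameters.

```python
import math
import random

def unnorm_posterior(x):
    """Toy unnormalised posterior: a tall peak at -2 and a lower one at +2."""
    return math.exp(-0.5 * (x + 2) ** 2) + 0.5 * math.exp(-0.5 * (x - 2) ** 2)

def metropolis(target, start, n_gen, step=2.0, sample_freq=10, seed=1):
    """Random-walk Metropolis: only ratios of the target are ever needed."""
    rng = random.Random(seed)
    x, samples = start, []
    for gen in range(1, n_gen + 1):
        proposal = x + rng.gauss(0.0, step)   # symmetric proposal
        ratio = target(proposal) / target(x)
        if rng.random() < min(1.0, ratio):    # accept with probability min(1, ratio)
            x = proposal
        if gen % sample_freq == 0:            # keep every sample_freq-th state
            samples.append(x)
    return samples

samples = metropolis(unnorm_posterior, start=0.0, n_gen=50_000)
left_share = sum(s < 0 for s in samples) / len(samples)
print(f"share of samples around the taller peak: {left_share:.2f}")
```

The chain spends time in each region in proportion to its posterior probability, so the share of samples around the taller peak should land near two thirds without the normalising constant ever being computed.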

Metropolis-Coupled Markov Chain Monte Carlo
MCMCMC = (MC)³

"Heated chains"

"Flattened" parameter landscape

(MC)³

[Figures over three slides, © John Huelsenbeck; the last one is labelled "Swap of states".]
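Heated chains and state swaps can be sketched on a toy target as well. This is an illustration under stated assumptions, not the MrBayes implementation: the two-peaked `log_post`, the heating scheme beta_i = 1 / (1 + temp * i), the step size and all names are chosen here for the example. Chain 0 is the cold chain (beta = 1) and is the only one that is sampled.

```python
import math
import random

def log_post(x):
    """Toy log-posterior with two well-separated peaks at -3 and +3."""
    a = -0.5 * ((x + 3.0) / 0.7) ** 2
    b = -0.5 * ((x - 3.0) / 0.7) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def mc3(log_target, n_chains=4, n_gen=20_000, temp=0.5, step=1.0, seed=1):
    rng = random.Random(seed)
    # Heated chain i samples target**beta_i, a flattened landscape
    betas = [1.0 / (1.0 + temp * i) for i in range(n_chains)]
    states = [-3.0] * n_chains
    cold_samples = []
    for _ in range(n_gen):
        # Ordinary Metropolis update within every chain
        for i, beta in enumerate(betas):
            prop = states[i] + rng.gauss(0.0, step)
            log_ratio = beta * (log_target(prop) - log_target(states[i]))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                states[i] = prop
        # Propose a swap of states between two randomly chosen chains
        i, j = rng.sample(range(n_chains), 2)
        log_swap = (betas[i] - betas[j]) * (
            log_target(states[j]) - log_target(states[i]))
        if rng.random() < math.exp(min(0.0, log_swap)):
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])
    return cold_samples

cold = mc3(log_post)
right_share = sum(s > 0 for s in cold) / len(cold)
print(f"cold chain: share of generations spent at the right-hand peak = {right_share:.2f}")
```

The heated chains cross the valley between the peaks easily and hand their states to the cold chain through swaps, so the cold chain visits both peaks even though it would rarely cross on its own.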

p-values (posterior probabilities of clades) directly
No need for bootstrapping

Standard models

JC, F81, K80, HKY85, K81, TrN, TVM, TIM, SYM, GTR

Substitution types: 1-6
Nucleotide frequencies: equal / estimated from the data
Invariable sites: no / estimate ("+I")
Evolutionary rate: equal / Γ-distributed ("+G")
etc.

JC (Jukes-Cantor)

All substitution rates are equal (a); rows and columns are the four nucleotides:

$$\begin{pmatrix} \cdot & a & a & a \\ a & \cdot & a & a \\ a & a & \cdot & a \\ a & a & a & \cdot \end{pmatrix}$$

$$\pi_A = \pi_C = \pi_G = \pi_T = 1/4$$

$$D = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right)$$

[Figure: substitution diagrams among A, C, G and T; the value 0.75 is marked.]

GTR (General time-reversible model) . . .
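The Jukes-Cantor correction is easy to try out. A minimal sketch, assuming p is the observed proportion of differing sites between two aligned sequences; the function name and the example sequences are made up for illustration.

```python
import math

def jc_distance(seq1, seq2):
    """Jukes-Cantor distance D = -3/4 ln(1 - 4/3 p) for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences have a plain nucleotide
    pairs = [(a, b) for a, b in zip(seq1.lower(), seq2.lower())
             if a in "acgt" and b in "acgt"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)  # observed proportion of differences
    if p >= 0.75:
        return math.inf  # saturated: the logarithm is undefined
    return -0.75 * math.log(1 - 4 * p / 3)

# Hypothetical sequences differing at 1 site out of 10 (p = 0.1)
d = jc_distance("acgtacgtac", "acgtacgtaa")
print(round(d, 4))  # 0.1073
```

The corrected distance is slightly larger than the observed p = 0.1, because some sites are expected to have changed more than once.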

Characters independent? No way.

Time reversible: G→C = C→G?

RNA genes

SSR models (site-specific rates)

Different evolutionary rate for 1st/2nd/3rd codon positions

Problematic (see: Buckley et al. 2001 Syst. Biol. 50: 67-86)

Coding regions

But how to choose the model?

Well, nobody said it would be easy.

How many parameters does it take to fit an elephant?

30


“What do you consider the largest map that would be really useful?"

"About six inches to the mile."

"Only six inches! […]

We actually made a map of the country, on the scale of a mile to the mile!"

(Lewis Carroll 1893)

Choosing a model

AIC (Akaike information criterion)
AICc (corrected Akaike information criterion, for small samples)
BIC (Bayesian information criterion)

Programs:

Modeltest (bad)

FindModel (plop!)

MrAic

?
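All three criteria trade goodness of fit (the log-likelihood lnL) against the number of free parameters k. The sketch below uses the standard formulas; the log-likelihood values are hypothetical, and the parameter counts assume a 6-taxon unrooted tree (9 branch lengths) plus the substitution-model parameters. Using the alignment length as the sample size n is a common convention, not a settled matter.

```python
import math

def aic(lnL, k):
    return 2 * k - 2 * lnL

def aicc(lnL, k, n):
    # Small-sample correction added on top of AIC
    return aic(lnL, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(lnL, k, n):
    return k * math.log(n) - 2 * lnL

# model: (hypothetical lnL, number of free parameters k)
candidates = {
    "JC": (-2750.0, 9),
    "HKY85+G": (-2600.0, 14),
    "GTR+I+G": (-2598.0, 19),
}
n_sites = 500  # alignment length used as sample size (an assumption)

for name, (lnL, k) in candidates.items():
    print(f"{name:8s} AIC={aic(lnL, k):.1f} "
          f"AICc={aicc(lnL, k, n_sites):.1f} BIC={bic(lnL, k, n_sites):.1f}")

best = min(candidates, key=lambda m: aic(*candidates[m]))
print("lowest AIC:", best)
```

With these made-up numbers the richest model fits slightly better but not enough to pay for its extra parameters, so the middle model has the lowest score.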

Most commonly used program: MrBayes

Future? Alignment and phylogeny co-estimation
BAli-Phy (Redelings & Suchard 2005)
Beast (Lunter et al. 2005)

Redelings, B. D. & Suchard, M. A. 2005: Joint Bayesian estimation of alignment and phylogeny. Syst. Biol. 54: 401-418

Lunter, G. et al. 2005: Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinformatics 6: 83

Travelling salesman
Find the shortest route through cities (another NP-complete problem)

Cities (N)    Routes (N!)
10            10! = 3 628 800
69            69! = 1.7 x 10^98
24 987        24 987! = ?

How about studying them all? At a rate of a million routes per second it would take 5 x 10^84 years (for N = 69).

World record: Sweden, 24 987 cities, 84.8 CPU years
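The numbers on this slide can be reproduced in a few lines. A sketch assuming a year of 365.25 days; the constant names are made up. The factorial of 24 987 is too large for a floating-point number, so only its number of decimal digits is estimated, through the log-gamma function.

```python
import math

ROUTES_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600

for n_cities in (10, 69):
    routes = math.factorial(n_cities)
    years = routes / ROUTES_PER_SECOND / SECONDS_PER_YEAR
    print(f"{n_cities} cities: {routes:.2e} routes, {years:.1e} years")

# ln(N!) = lgamma(N + 1); divide by ln(10) to count decimal digits
digits = math.floor(math.lgamma(24_987 + 1) / math.log(10)) + 1
print(f"24 987! has about {digits} decimal digits")
```

Exhaustive search is hopeless long before the data set reaches a realistic size, and the same holds for the number of possible trees.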

Acknowledgements

Fredrik Ronquist
John Huelsenbeck
Wife and Mom

MrBayes
Huelsenbeck, J. & Ronquist, F. 2001: Bioinformatics 17: 754-755 (2005: v. 3.1)

- Command-line interface
- UNIX, Macintosh and PC platforms

MrBayes

Homepage
Manual
Wiki, FAQ
Mailing list (archives)

MrBayes: Running the analysis

All you have to do:

Type

execute filename.nex *

at the MrBayes > prompt and press enter.

* Replace filename.nex with your NEXUS file containing MrBayes commands (type the full path if the file is not in the same folder as the MrBayes program).

MrBayes: an example nexus file

    #nexus
    begin data;
      dimensions ntax=6 nchar=20;
      format datatype=dna;
      matrix
        Otus1 aaaaaaaaaaaaaaaaaaaa
        Otus2 aaaaaaaaaaaaaaaaaaaa
        Otus3 aaaaaaaaaaaaaaaaaaaa
        Otus4 cccccccccccccccccccc
        Otus5 gggggggggggggggggggg
        Otus6 tttttttttttttttttttt
      ;
    end;
    begin mrbayes;
      mcmcp ngen=100000 samplefreq=100;
      mcmc;
    end;


A real thing:

MrBayes: After the run

Summarize the parameter values, type: sump burnin=
Summarize the trees, type: sumt burnin=

With a proper burnin value

[Figure: burn-in, © Fredrik Ronquist]

MrBayes: After the run

Burnin discards the initial values sampled before the analysis reached convergence (burnin=2500 if you have run a million generations, sampled every 100th of them, and want to discard the first 25%).

Note: you have to run "enough" generations
- Check the plot generated by sump; there should be no obvious trends
- The standard deviation of split frequencies should be less than 0.01
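The burnin value counts sampled states, not generations. A small helper (hypothetical, not part of MrBayes) makes the arithmetic explicit:

```python
def burnin_samples(ngen, samplefreq, fraction):
    """Number of sampled states to discard for a given burn-in fraction."""
    n_samples = ngen // samplefreq  # how many states were written to file
    return int(n_samples * fraction)

# A million generations, sampled every 100th, first 25% discarded
print(burnin_samples(1_000_000, 100, 0.25))  # 2500

# The same fraction for a run with ngen=100000 samplefreq=100
print(burnin_samples(100_000, 100, 0.25))  # 250
```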

MrBayes: Models

Restriction: can handle only 24 substitution models

Command, for example: lset nst=6 rates=invgamma

Confused? Try typing: help lset

Priors, command: prset
Defaults (try help prset) should work fine for most analyses

Cladistic parsimony

Prefer the tree with the fewest evolutionary steps; only parsimony-informative sites count

    Otus1 aaaaaaaaaaaaaaaaaaaa
    Otus2 aaaaaaaaaaaaaaaaaaaa
    Otus3 aaaaaaaaaaaaaaaaaaaa
    Otus4 cccccccccccccccccccc
    Otus5 gggggggggggggggggggg
    Otus6 tttttttttttttttttttt

[Figure: tree with tips Otus1 to Otus6]
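Whether a site is parsimony-informative can be checked mechanically: at least two character states must each occur in at least two taxa. A sketch applied to the toy alignment above; the function name is chosen here for illustration.

```python
from collections import Counter

alignment = {
    "Otus1": "aaaaaaaaaaaaaaaaaaaa",
    "Otus2": "aaaaaaaaaaaaaaaaaaaa",
    "Otus3": "aaaaaaaaaaaaaaaaaaaa",
    "Otus4": "cccccccccccccccccccc",
    "Otus5": "gggggggggggggggggggg",
    "Otus6": "tttttttttttttttttttt",
}

def is_parsimony_informative(column):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2

columns = list(zip(*alignment.values()))  # one tuple of states per site
informative = [i for i, col in enumerate(columns) if is_parsimony_informative(col)]
print(len(columns), "sites,", len(informative), "parsimony-informative")
# 20 sites, 0 parsimony-informative
```

Every site differs between taxa, yet none is informative for parsimony: only the state a is shared by more than one taxon, so the sites cannot favour one tree over another.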

Fain and Houde 2004: Evolution 58: 2558-2573

MrBayes

Exercises:

1. Study program defaults with the help command (e.g. lset and prset)

2. Run the program with a few arbitrary sequences (e.g. palikka.nex)
- Try the sump and sumt commands with different burnin values
- Study the files made by the program: where is the tree?

3. Run the program with some real data (e.g. your own or birds.txt)
- Align the sequences
- Put them into a nexus file
- Try to find out how to select the JC, K2P and GTR models, with and without gamma-distributed rate variation and with and without correction for invariable sites
- Try the model suggested by FindModel (AIC criterion)