Maximum Likelihood

Historically the newest method. Popularized by Joseph Felsenstein (Seattle, Washington). Its slow uptake by the scientific community has to do with the difficulty of understanding the theory and also the initial absence of good-quality software offering a choice of models and ease of interaction with data.

Also, at the time, it was computationally intractable to analyse large datasets (consider that in the mid-1980s a typical desktop computer had a processor speed of less than 30 MHz).

In recent times, software, models and computer hardware have become sufficiently sophisticated that ML is becoming a method of choice.

ML: comparison with other methods

- ML is similar to many other methods in many ways, but fundamentally different.
- ML assumes a model of sequence evolution (as do Maximum Parsimony and distance matrix methods).
- ML attempts to answer the question: What is the probability that I would observe these data (a multiple sequence alignment), given a particular model of evolution (a tree and a process)?

Principle of Likelihood

L = Pr(D|H)

Pr(D|H) is the probability of getting the data D given hypothesis H.

In the context of molecular phylogenetics, D is the set of sequences being compared, and H is a phylogenetic tree and process, hence we want to find the likelihood of obtaining the observed sequences given a particular tree based on a process. The tree that makes our data the most probable evolutionary outcome is the maximum likelihood estimate of the phylogeny.

[Figure: the probability of the data given a model; the model is shown as a symmetric 4 x 4 matrix with entries a to g and the base composition (a, c, g, t).]

What is the probability of observing a datum?

If we flip a coin and get a head and we think the coin is unbiased, then the probability of observing this head is 0.5.

If we think the coin is biased so that we expect to get a head 80% of the time, then the likelihood of observing this datum (a head) is 0.8.

Therefore: The likelihood of making some observation is entirely dependent on the model that underlies our assumption.

Lesson: The datum has not changed, our model has. Therefore under the new model the likelihood of observing the datum has changed.

What is the probability of observing a 'G' nucleotide?

• Question: If we have a DNA sequence of one nucleotide in length and the identity of this nucleotide is 'G', what is the likelihood that we would observe this 'G'?

• Answer: In the same way as the coin-flipping observation, the likelihood of observing this 'G' is dependent on the model of sequence evolution that is thought to underlie the data.

• e.g.

– Model 1: frequency of G = 0.4 => likelihood (G) = 0.4

– Model 2: frequency of G = 0.1 => likelihood (G) = 0.1

– Model 3: frequency of G = 0.25 => likelihood (G) = 0.25

What about longer sequences?

If we consider a gene of length 2:

Gene 1: GA

The probability of observing this gene is the product of the probabilities of observing each character.

e.g. p(G) = 0.4; p(A) = 0.15 (for instance)

likelihood (GA) = 0.4 x 0.15 = 0.06

...or even longer sequences?

Gene 1: GACTAGCTAGACAGATACGAATTAC

Model (simple base frequency model):

p(A) = 0.15; p(C) = 0.2; p(G) = 0.4; p(T) = 0.25 (the sum of all probabilities must equal 1)

Like (Gene 1) = 0.000000000000000018452813

Note About Models

You might notice that our model of base frequency is not the optimal model for our observed data. If we had used the following model, whose frequencies match those observed in Gene 1 (10 A, 5 C, 5 G and 5 T out of 25 bases):

p(A) = 0.4; p(C) = 0.2; p(G) = 0.2; p(T) = 0.2

The likelihood of observing the gene is:

Like (Gene 1) = 0.4^10 x 0.2^15 = 0.0000000000000034359738 (a value almost 200 times higher)

Lesson: The datum has not changed, our model has. Therefore under the new model the likelihood of observing the datum has changed.
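The base-frequency calculation above can be sketched in a few lines of Python. This is a minimal illustration, not code from any phylogenetics package: the function names (`sequence_likelihood`, `sequence_log_likelihood`, `ml_base_frequencies`) and the dictionary layout of the models are assumptions made for this example. The log-likelihood variant is included because products of many small probabilities quickly underflow for realistic sequence lengths.

```python
import math
from collections import Counter

def sequence_likelihood(seq, freqs):
    """Likelihood of a sequence under a base-frequency-only model:
    the product of the model probability of each observed base."""
    return math.prod(freqs[base] for base in seq)

def sequence_log_likelihood(seq, freqs):
    """Same quantity on the log scale (a sum instead of a product)."""
    return sum(math.log(freqs[base]) for base in seq)

def ml_base_frequencies(seq):
    """Observed base frequencies; for this simple model they are the
    frequencies that maximise the likelihood of the sequence."""
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in "ACGT"}

gene1 = "GACTAGCTAGACAGATACGAATTAC"

# The two models discussed in the text (names are hypothetical).
model_a = {"A": 0.15, "C": 0.2, "G": 0.4, "T": 0.25}
model_b = {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2}

like_a = sequence_likelihood(gene1, model_a)
like_b = sequence_likelihood(gene1, model_b)

print("likelihood under model_a:", like_a)
print("likelihood under model_b:", like_b)
print("ratio model_b / model_a: ", like_b / like_a)
print("observed frequencies:    ", ml_base_frequencies(gene1))
```

Comparing `ml_base_frequencies(gene1)` with the second model shows why that model fits better: its frequencies are the ones actually observed in the gene.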

The Model

- The two parts of the model are the tree and the process (the process is often itself called "the model").
- The process is composed of the composition and the substitution process: the rate of change from one character state to another character state.

[Figure: Model = substitution matrix (symmetric 4 x 4, entries a to g) + base composition (a, c, g, t).]

Simple “time-reversible” Model

A simple model is that the rate of change from A to C or vice versa is 0.4, the composition of A is 0.25 and the composition of C is 0.25 (a simplified version of the Jukes and Cantor 1969 model).

P =
        A     C     G     T
  A     .     0.4   .     .
  C     0.4   .     .     .
  G     .     .     .     .
  T     .     .     .     .

Π = [0.25, 0.25, ., .]

Probability of the third nucleotide position in our current alignment

• p(A) = 0.25; p(C) = 0.25;

• Starting with A, the likelihood of the nucleotide is 0.25 and the likelihood of the substitution (branch) is 0.4. So the likelihood of observing these data is:

• *Likelihood(D|M) = 0.25 x 0.4 = 0.1

Note: you will get the same result if you start with C, since this model is reversible.

*The likelihood of the data, given the model.

[Figure: a single branch joining A and C, with substitution probability p = 0.4.]

Substitution Matrix

For nucleotide sequences, there are 16 possible ways to describe substitutions - a 4x4 matrix.

P =
  a  b  c  d
  e  f  g  h
  i  j  k  l
  m  n  o  p

Convention dictates that the order of the nucleotides is A,C,G,T

Note: for amino acids, the matrix is a 20 x 20 matrix and for codon-based models, the matrix is 61 x 61

Substitution matrix - an example

P =
        A      C      G      T
  A   0.976  0.010  0.007  0.007
  C   0.002  0.983  0.005  0.010
  G   0.003  0.010  0.979  0.007
  T   0.002  0.013  0.005  0.979

In this matrix, the probability of an A changing to a C is 0.01 and the probability of a C remaining the same is 0.983, etc.

Note: The rows of this matrix sum to 1 (to within rounding of the three-decimal entries), meaning that for every nucleotide we have covered all the possibilities of what might happen to it. The columns do not sum to anything in particular.
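The row-sum property is easy to check numerically. The sketch below stores the example matrix as a list of rows in the order A, C, G, T; the helper names `row_sums` and `is_row_stochastic` and the tolerance value are assumptions made for this illustration. The tolerance is deliberately loose because the entries are quoted to only three decimals.

```python
BASES = "ACGT"

# Example substitution matrix from the text; rows are the starting
# nucleotide, columns the resulting nucleotide, both in A, C, G, T order.
P = [
    [0.976, 0.010, 0.007, 0.007],
    [0.002, 0.983, 0.005, 0.010],
    [0.003, 0.010, 0.979, 0.007],
    [0.002, 0.013, 0.005, 0.979],
]

def row_sums(matrix):
    """Sum of each row: total probability over all outcomes for one base."""
    return [sum(row) for row in matrix]

def is_row_stochastic(matrix, tol=2e-3):
    """True if every entry is non-negative and every row sums to 1
    within `tol` (an assumed tolerance for three-decimal rounding)."""
    return all(
        min(row) >= 0.0 and abs(total - 1.0) <= tol
        for row, total in zip(matrix, row_sums(matrix))
    )

for base, total in zip(BASES, row_sums(P)):
    print(f"row {base}: sum = {total:.3f}")
print("row-stochastic within tolerance:", is_row_stochastic(P))
```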

Likelihood of

Gene 1: CCAT
Gene 2: CCGT

given the substitution matrix P from the example above and the base composition

Π = [0.1, 0.4, 0.2, 0.3]

The aim is to calculate the likelihood of the entire dataset, given a substitution matrix, a base composition and a branch length.

Likelihood of a two-sequence alignment

• CCAT
• CCGT

L = π(C) P(C→C) x π(C) P(C→C) x π(A) P(A→G) x π(T) P(T→T)

= 0.4 x 0.983 x 0.4 x 0.983 x 0.1 x 0.007 x 0.3 x 0.979 = 0.0000318

Likelihood of going from the first to the second sequence is 0.0000318
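The per-site product used for the two-sequence alignment can be written out as a short Python sketch. It assumes the example substitution matrix and base composition given earlier, with sites treated as independent; the function names `site_likelihood` and `alignment_likelihood` are hypothetical and chosen only for this illustration.

```python
import math

BASES = "ACGT"

# Example substitution matrix (rows: from, columns: to; order A, C, G, T).
P = [
    [0.976, 0.010, 0.007, 0.007],
    [0.002, 0.983, 0.005, 0.010],
    [0.003, 0.010, 0.979, 0.007],
    [0.002, 0.013, 0.005, 0.979],
]

# Base composition in the same A, C, G, T order.
PI = [0.1, 0.4, 0.2, 0.3]

def site_likelihood(x, y):
    """Probability of starting with base x (its frequency) times the
    probability that x is replaced by (or remains) y along the branch."""
    i, j = BASES.index(x), BASES.index(y)
    return PI[i] * P[i][j]

def alignment_likelihood(seq1, seq2):
    """Product of site likelihoods going from seq1 to seq2."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    return math.prod(site_likelihood(x, y) for x, y in zip(seq1, seq2))

like = alignment_likelihood("CCAT", "CCGT")
print("likelihood of CCAT -> CCGT:", like)
```

Each factor printed by `site_likelihood` corresponds to one pair of terms in the product above, so individual sites can be inspected separately when checking a hand calculation.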

Likelihood of a two-branch tree

[Figure: a tree with root O and two branches leading to the tips A and B.]

O is the origin or root. The likelihood can be calculated in three ways:

• from A to B in one step (this amounts to the previous method)
• from A to B in two steps (through O)
• in two parts starting at O.

Lesson about O

• O is an unknown sequence.
• We can only speculate what each position in the alignment would be if we could observe the sequence of O.
• What we do know is that the sum of all possibilities is equal to 1.
• Therefore we must sum the likelihoods for all possibilities of O.
• This becomes computationally intensive.
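Summing over the unknown root can be sketched as follows for the "two parts starting at O" calculation. This is an illustration under stated assumptions, not the method of any particular program: it assumes the example matrix P applies unchanged along both branches (O to A and O to B), that the root state is drawn from the base composition Π, and that sites are independent. The function names are hypothetical.

```python
import math

BASES = "ACGT"

# Example substitution matrix (rows: from, columns: to; order A, C, G, T).
# Assumption: the same matrix is used for both branches leaving O.
P = [
    [0.976, 0.010, 0.007, 0.007],
    [0.002, 0.983, 0.005, 0.010],
    [0.003, 0.010, 0.979, 0.007],
    [0.002, 0.013, 0.005, 0.979],
]

# Assumption: the root state follows the base composition.
PI = [0.1, 0.4, 0.2, 0.3]

def rooted_site_likelihood(a, b):
    """Sum, over every possible root state x in {A, C, G, T}, of
    P(root = x) * P(x -> a on one branch) * P(x -> b on the other)."""
    ia, ib = BASES.index(a), BASES.index(b)
    return sum(PI[x] * P[x][ia] * P[x][ib] for x in range(len(BASES)))

def rooted_alignment_likelihood(seq_a, seq_b):
    """Product of the per-site sums over the unknown root sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned (equal length)")
    return math.prod(rooted_site_likelihood(a, b) for a, b in zip(seq_a, seq_b))

# Position 1 of the alignment: both tips show C, the root is unknown.
print("site 1 (C, C):", rooted_site_likelihood("C", "C"))
print("whole alignment:", rooted_alignment_likelihood("CCAT", "CCGT"))
```

With two tips the sum has only four terms per site. The cost grows because every additional internal node of a larger tree contributes its own set of unknown states, which is why practical software organises this sum carefully rather than enumerating all combinations.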

[Figure: for position 1, the tips A and B both show {C}, while the root O could be any of {A, C, G, T}.]

Does changing a model affect the outcome?

There are different models:

Jukes and Cantor (JC69): All base compositions equal (0.25 each), rate of change from one base to another is the same.

Kimura 2-Parameter (K2P): All base compositions equal (0.25 each), different substitution rates for transitions and transversions.

Hasegawa-Kishino-Yano (HKY): Like the K2P, but with base composition free to vary.

General Time Reversible (GTR): Base composition free to vary, all possible substitutions can differ.
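The nesting of these four models can be made concrete with a small Python sketch. Note the assumptions: the models are written here as instantaneous rate matrices (Q) in their usual unnormalised parameterisation, rather than as the probability matrices P shown earlier; the parameter name `kappa` (transition/transversion rate ratio), the exchangeability keys such as "AG", and all function names are choices made for this illustration.

```python
BASES = "ACGT"
# Transitions are A<->G (purines) and C<->T (pyrimidines).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def gtr_rate_matrix(pi, rates):
    """GTR: Q[i][j] = r(i, j) * pi[j] with symmetric exchangeabilities r.
    `rates` is keyed by sorted base pairs: AC, AG, AT, CG, CT, GT."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                q[i][j] = rates["".join(sorted((x, y)))] * pi[j]
        q[i][i] = -sum(q[i])  # diagonal makes each row sum to zero
    return q

def hky_rate_matrix(pi, kappa):
    """HKY: free base composition, one rate ratio for transitions."""
    rates = {
        "".join(sorted((x, y))): (kappa if (x, y) in TRANSITIONS else 1.0)
        for x in BASES for y in BASES if x != y
    }
    return gtr_rate_matrix(pi, rates)

def k2p_rate_matrix(kappa):
    """K2P: HKY with equal base composition."""
    return hky_rate_matrix([0.25] * 4, kappa)

def jc69_rate_matrix():
    """JC69: K2P with transitions and transversions equally likely."""
    return k2p_rate_matrix(1.0)

def is_time_reversible(q, pi, tol=1e-12):
    """Detailed balance: pi[i] * Q[i][j] equals pi[j] * Q[j][i]."""
    return all(
        abs(pi[i] * q[i][j] - pi[j] * q[j][i]) <= tol
        for i in range(4) for j in range(4)
    )

pi = [0.1, 0.4, 0.2, 0.3]
for name, q, freqs in [
    ("JC69", jc69_rate_matrix(), [0.25] * 4),
    ("K2P", k2p_rate_matrix(2.0), [0.25] * 4),
    ("HKY", hky_rate_matrix(pi, 2.0), pi),
]:
    print(name, "reversible:", is_time_reversible(q, freqs))
```

Writing each simpler model as a special case of the next one mirrors the list above: JC69 is K2P with no transition bias, K2P is HKY with equal composition, and HKY is GTR with only two distinct exchangeabilities.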

Long-Branch Attraction

• In the case below, the wrong tree is often selected. ML will not be prone to this problem if the correct model of sequence evolution is used.

[Figure: two four-taxon trees on A, B, C and D, one labelled CORRECT TREE and the other WRONG TREE; branches are marked p and q, with p > q.]

Strengths of ML

• There is no need to ‘correct’ for anything, the models take care of superimposed substitutions.

• Accurate branch lengths.

• Each site has a likelihood.

• If the model is correct, we should retrieve the correct tree (if we have long enough sequences and a sophisticated enough model).

• ML uses all the data (no selection of sites based on informativeness, all sites are informative).

• ML not only tells you about the phylogeny of the sequences, but also the process of evolution that led to the observations of current sequences.

Weaknesses of ML

• Can be inconsistent if we use models that are not accurate.

• Model might not be sophisticated enough.

• Very computationally intensive. Might not be possible to examine all models (substitution matrices, tree topologies, etc.).
