Fabio Pardi, PhD student in the Goldman Group, European Bioinformatics Institute and University of Cambridge, UK. Joint work with: Barbara Holland, Mike Hendy, Nick Goldman.

The BME criterion for tree reconstruction and a Branch and Bound algorithm for BME-optimal trees

Page 1: The BME criterion for tree reconstruction and a Branch and Bound algorithm for BME-optimal trees

Fabio Pardi
PhD student in Goldman Group

European Bioinformatics Institute
and University of Cambridge, UK

Joint work with:

Barbara Holland, Mike Hendy, Nick Goldman

The BME criterion for tree reconstruction and a Branch and Bound algorithm for BME-optimal trees.

Page 2: The BME criterion for tree reconstruction and a Branch and Bound algorithm for BME-optimal trees

Balanced Minimum Evolution

What is BME?

BME stands for Balanced Minimum Evolution and is a (new) criterion for distance-based tree reconstruction.

It is based on Pauplin’s formula, $\Lambda_D(T)$, which estimates the total length of a tree from:
(1) its topology T,
(2) an estimated distance matrix $D = (d_{ij})$.
[Pauplin 2000 J Mol Evol 51]

The objective, as for any other Minimum Evolution (ME) method, is to find a T that minimises $\Lambda_D(T)$ (the “BME score”).

Page 3: The BME criterion for tree reconstruction and a Branch and Bound algorithm for BME-optimal trees

Balanced Minimum Evolution

Pauplin’s formula.

$\Lambda_D(T) = \sum_{i,j} w_{ij}(T)\, d_{ij}$

where $w_{ij}(T) = 1 / 2^{p_{ij}}$ and $p_{ij}$ is the number of branches between i and j in T.

How to get it:

[Figure: a tree with leaves labelled o(1), o(2), o(3), o(4), o(5)]

A reasonable estimate of the tree length:

$\Lambda_o = \tfrac12 \left( d_{o(1)o(2)} + d_{o(2)o(3)} + d_{o(3)o(4)} + d_{o(4)o(5)} + d_{o(5)o(1)} \right) = \tfrac12 \sum_i d_{o(i),o(i+1)}$

But $\Lambda_o$ depends on the ordering o… Pauplin’s formula can be obtained by averaging over all such o’s. [Semple & Steel 2004 Adv Appl Math 32]

It can also be generalised to multifurcating trees, but this is not relevant here, as it can be proven that BME-optimal trees are always bifurcating.
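As a hedged illustration of the two formulas on this slide, the Python sketch below computes Pauplin's length for a four-taxon tree and checks that it equals the average of Λo over the circular orderings compatible with that tree. Several things here are assumptions, not taken from the slides: the tree is stored as an adjacency dictionary, the distances are made up, the function names are hypothetical, and the sum in Pauplin's formula is read as running over ordered pairs i ≠ j, so each unordered pair carries weight 2^(1 − p_ij).

```python
from collections import deque
from itertools import combinations


def edge_counts_from(adj, source):
    """Breadth-first search: number of branches from `source` to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def pauplin_length(adj, d):
    """Pauplin's tree length, written as a sum over unordered leaf pairs.

    Assumption: the slide's sum is over ordered pairs, so each unordered
    pair {i, j} gets weight 2 * 2**(-p_ij) = 2**(1 - p_ij).
    """
    leaves = sorted(v for v in adj if len(adj[v]) == 1)
    total = 0.0
    for i, j in combinations(leaves, 2):
        p_ij = edge_counts_from(adj, i)[j]
        total += 2.0 ** (1 - p_ij) * d[frozenset((i, j))]
    return total


def ordering_length(order, d):
    """Lambda_o: half the length of the closed tour visiting the leaves in order o."""
    n = len(order)
    return 0.5 * sum(d[frozenset((order[i], order[(i + 1) % n]))] for i in range(n))


# Hypothetical example: the quartet tree ab|cd with internal nodes "u" and "v".
adj = {
    "a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
    "u": ["a", "b", "v"], "v": ["c", "d", "u"],
}
# Made-up distances (deliberately not additive on the tree).
d = {
    frozenset("ab"): 3.0, frozenset("cd"): 7.0,
    frozenset("ac"): 8.0, frozenset("ad"): 10.0,
    frozenset("bc"): 10.0, frozenset("bd"): 11.0,
}

score = pauplin_length(adj, d)
# Up to rotation and reflection, ab|cd has two compatible circular orderings.
orderings = [("a", "b", "c", "d"), ("a", "b", "d", "c")]
average = sum(ordering_length(o, d) for o in orderings) / len(orderings)
print(score, average)  # the two values agree
```

For larger trees the compatible orderings would have to be enumerated rather than listed by hand; the quartet is used only because its two orderings are easy to verify.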

Page 4: The BME criterion for tree reconstruction and a Branch and Bound algorithm for BME-optimal trees

Balanced Minimum Evolution

Neighbor Joining revealed!

[Gascuel & Steel 2006 MBE 23]

Until recently it was unclear whether NJ implicitly aimed at optimising some criterion. “NJ has some relation to unweighted least squares and some to minimum evolution, without being definable as an approximate algorithm for either” [Felsenstein’s textbook]

Recently it was shown that NJ can be seen as a greedy algorithm that aims to minimise the BME score. [Desper & Gascuel 2005 (in MEP book)]

Page 5: The BME criterion for tree reconstruction and a Branch and Bound algorithm for BME-optimal trees

Balanced Minimum Evolution

Since NJ tries to (but usually does not) minimise the BME criterion, what about better algorithms for this?

Desper and Gascuel’s program FASTME implements:
(1) A sequential addition strategy (which I will call Sadd).
(2) A hill-climbing search where NNIs are the possible moves (BNNI).

Page 8: The BME criterion for tree reconstruction and a Branch and Bound algorithm for BME-optimal trees

Balanced Minimum Evolution

The results of the FASTME heuristics (Sadd and BNNI) are very good (2 datasets of 2000 simulated 24-taxon distance matrices each, replicated from Desper and Gascuel 2002 J. Comp. Biol.):

              Dataset ‘small’                              Dataset ‘moderate’
Method        dRF(T, true T)  (change vs NJ)  freq. T opt.  dRF(T, true T)  (change vs NJ)  freq. T opt.
NJ            4.65            0%              61.0%         3.61            0%              61.0%
BIONJ         4.65            -0.06%          44.6%         3.53            -2.19%          48.7%
Sadd          4.98            6.99%           36.0%         4.05            12.21%          35.5%
NJ+BNNI       4.48            -3.66%          97.9%         3.47            -3.80%          98.1%
BIONJ+BNNI    4.48            -3.76%          98.0%         3.46            -4.05%          97.9%
Sadd+BNNI     4.50            -3.25%          97.7%         3.46            -3.91%          97.8%
BBBME         4.49            -3.38%          100%          3.46            -3.96%          100%

Other papers [e.g. Vinh & von Haeseler 2005 BMC Bio] also confirm that X + BNNI (BNNI started from the tree built by some method X) outperforms most (all?) existing distance methods.

Page 9: The BME criterion for tree reconstruction and a Branch and Bound algorithm for BME-optimal trees

Balanced Minimum Evolution

BNNI performs very well, but it may get stuck in local minima.

… constructing low-BME trees is good!

What about an exact algorithm for this problem?

Branch and Bound: explore the “meta-tree”.

Every time you enter a new node, you assess whether you should go back or continue, based on a lower bound LB on the score of the trees below.

If LB > current best score, then no optimal tree is below that node, so go back.

[Figure: the meta-tree with the current node T; for every T* below T, $\Lambda(T^*) \ge LB$]
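To make the meta-tree concrete, here is a minimal Python sketch that walks it exhaustively and counts the complete trees at the bottom. It assumes the usual construction, which the slides do not spell out but which matches the T ∪ k notation used later: the root is the unique 3-taxon tree, and the children of a tree are obtained by attaching the next taxon to each of its branches. The adjacency-set representation and the function names are hypothetical.

```python
def edges_of(adj):
    """List every branch once, as a pair (u, v) with u < v."""
    return [(u, v) for u in adj for v in adj[u] if u < v]


def insert_leaf(adj, edge, leaf, new_node):
    """Copy the tree and attach `leaf` to the middle of `edge`."""
    u, v = edge
    new = {node: set(nbrs) for node, nbrs in adj.items()}
    new[u].remove(v)
    new[v].remove(u)
    new[u].add(new_node)
    new[v].add(new_node)
    new[new_node] = {u, v, leaf}
    new[leaf] = {new_node}
    return new


def count_complete_trees(n):
    """Depth-first walk of the meta-tree, counting the n-taxon trees at its leaves."""

    def visit(adj, k):
        if k == n:  # all taxa 0..n-1 have been added
            return 1
        # One child per branch of the current tree: attach taxon k there.
        return sum(
            visit(insert_leaf(adj, edge, k, n + k - 2), k + 1)
            for edge in edges_of(adj)
        )

    # Taxa are 0..n-1; internal nodes are numbered from n upwards.
    start = {0: {n}, 1: {n}, 2: {n}, n: {0, 1, 2}}
    return visit(start, 3)


for n in range(3, 8):
    print(n, count_complete_trees(n))
```

Branch and Bound performs the same walk but abandons a node as soon as its lower bound exceeds the best score found so far.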

Page 10: The BME criterion for tree reconstruction and a Branch and Bound algorithm for BME-optimal trees

Balanced Minimum Evolution

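A B&B approach to find BME trees: the bound.

If the score can only increase along each root-to-leaf path of the meta-tree, then the score of the current tree is a LB.

Parsimony has this property but BME doesn’t, unless we assume the triangle inequality…

Why?

$\Lambda(T) = \mathrm{avg}_o\, \Lambda_o = \mathrm{avg}_o\, \tfrac12 \sum_i d_{o(i),o(i+1)}$

[Figure: a new taxon k inserted between the taxa i and j in an ordering o]

$\Lambda'_o - \Lambda_o = \tfrac12 \left( d_{ik} + d_{kj} - d_{ij} \right) \ge 0$

$\Lambda(T \cup k) - \Lambda(T) = \mathrm{avg}_o \left( \Lambda'_o - \Lambda_o \right) \ge 0$

[Figure: the meta-tree with the current node T; for every T* below T, $\Lambda(T^*) \ge LB$]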
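The step from Λo to Λ'o can be spelled out as follows. This reads the figure as inserting k between two taxa i and j that are consecutive in the ordering o, which is an interpretation of the slide rather than something it states in words:

```latex
\begin{align*}
\Lambda_o  &= \tfrac12 \bigl( \dots + d_{ij} + \dots \bigr), \\
\Lambda'_o &= \tfrac12 \bigl( \dots + d_{ik} + d_{kj} + \dots \bigr), \\
\Lambda'_o - \Lambda_o &= \tfrac12 \bigl( d_{ik} + d_{kj} - d_{ij} \bigr) \;\ge\; 0
\quad \text{whenever } d_{ij} \le d_{ik} + d_{kj}.
\end{align*}
```

All other terms of the tour are unchanged, so only the step from i to j is replaced by the detour through k. Since every term of the average is non-negative under the triangle inequality, the average is non-negative as well.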

Page 11: The BME criterion for tree reconstruction and a Branch and Bound algorithm for BME-optimal trees

Balanced Minimum Evolution

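A B&B approach to find BME trees: the bound.

Taking that idea further, we can drop the triangle inequality assumption and have that

$\Lambda(T \cup k) - \Lambda(T) \ge \tfrac12 \beta_k$

where $\beta_k = \min \{ d_{ik} + d_{jk} - d_{ij} \}$, the minimum being over the taxa i, j added before k.

[Figure: the meta-tree with the current node T and the trees T* below it]

For every T* below T, $\Lambda(T^*) \ge LB$, with

$\Lambda(T^*) \ge \Lambda(T) + \tfrac12 \sum_{k \notin T} \beta_k$

Which is good because:

1) The triangle inequality often does not hold.

2) The $\sum \beta_k$ above is usually positive, so this is a better bound than simply requiring an increase, $\Lambda(T^*) \ge \Lambda(T)$.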
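The bound can be tried out in a small Branch and Bound sketch. This is not BBBME: it is a minimal Python illustration under stated assumptions. Taxa are added in the fixed order 0, 1, 2, …, the score of every tree is recomputed naively from Pauplin's formula (summed over unordered pairs with weight 2^(1 − p_ij)), and the distance matrix and all function names are made up for the example.

```python
import math
from collections import deque
from itertools import combinations


def edges_of(adj):
    """List every branch once, as a pair (u, v) with u < v."""
    return [(u, v) for u in adj for v in adj[u] if u < v]


def insert_leaf(adj, edge, leaf, new_node):
    """Copy the tree and attach `leaf` to the middle of `edge`."""
    u, v = edge
    new = {node: set(nbrs) for node, nbrs in adj.items()}
    new[u].remove(v)
    new[v].remove(u)
    new[u].add(new_node)
    new[v].add(new_node)
    new[new_node] = {u, v, leaf}
    new[leaf] = {new_node}
    return new


def bme_score(adj, d):
    """Pauplin's formula, computed naively with one breadth-first search per leaf."""
    leaves = sorted(v for v in adj if len(adj[v]) == 1)
    total = 0.0
    for i in leaves:
        depth = {i: 0}
        queue = deque([i])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in depth:
                    depth[v] = depth[u] + 1
                    queue.append(v)
        # Weight 2**(1 - p_ij) per unordered pair, p_ij = branches between i and j.
        total += sum(2.0 ** (1 - depth[j]) * d[i][j] for j in leaves if j > i)
    return total


def beta(d, k):
    """beta_k: minimum of d_ik + d_jk - d_ij over pairs i, j added before k."""
    return min(d[i][k] + d[j][k] - d[i][j] for i, j in combinations(range(k), 2))


def branch_and_bound(d, use_bound=True):
    """Return (best score, best tree, number of meta-tree nodes visited)."""
    n = len(d)
    # tail[k] = 1/2 * sum of beta_m over the taxa m = k, ..., n-1 still to be added.
    tail = [0.0] * (n + 1)
    for k in range(n - 1, 2, -1):
        tail[k] = tail[k + 1] + 0.5 * beta(d, k)
    best = {"score": math.inf, "tree": None}
    visited = 0

    def visit(adj, k):
        nonlocal visited
        visited += 1
        score = bme_score(adj, d)
        if k == n:  # complete tree: keep it if it improves on the best so far
            if score < best["score"]:
                best["score"], best["tree"] = score, adj
            return
        if use_bound and score + tail[k] > best["score"]:
            return  # LB > current best score: go back
        for edge in edges_of(adj):
            visit(insert_leaf(adj, edge, k, n + k - 2), k + 1)

    # Taxa are 0..n-1; internal nodes are numbered from n. Start from the 3-taxon tree.
    visit({0: {n}, 1: {n}, 2: {n}, n: {0, 1, 2}}, 3)
    return best["score"], best["tree"], visited


# Made-up 5-taxon distance matrix, for illustration only.
d = [
    [0, 3, 8, 10, 9],
    [3, 0, 10, 11, 9],
    [8, 10, 0, 7, 6],
    [10, 11, 7, 0, 5],
    [9, 9, 6, 5, 0],
]
score, tree, visited = branch_and_bound(d)
print(score, visited)
```

Calling `branch_and_bound(d, use_bound=False)` visits every node of the meta-tree and should return the same score, which gives a simple way to check the pruning. A practical implementation would also start from a good heuristic tree so that pruning begins immediately.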

Page 12: The BME criterion for tree reconstruction and a Branch and Bound algorithm for BME-optimal trees

Balanced Minimum Evolution

A B&B approach to find BME trees: results and conclusions.

I implemented the algorithm in a program called BBBME. This allows us to see how far the heuristics in FASTME are from the optimum.

              Dataset ‘small’                              Dataset ‘moderate’
Method        dRF(T, true T)  (change vs NJ)  freq. T opt.  dRF(T, true T)  (change vs NJ)  freq. T opt.
NJ            4.65            0%              61.0%         3.61            0%              61.0%
BIONJ         4.65            -0.06%          44.6%         3.53            -2.19%          48.7%
Sadd          4.98            6.99%           36.0%         4.05            12.21%          35.5%
NJ+BNNI       4.48            -3.66%          97.9%         3.47            -3.80%          98.1%
BIONJ+BNNI    4.48            -3.76%          98.0%         3.46            -4.05%          97.9%
Sadd+BNNI     4.50            -3.25%          97.7%         3.46            -3.91%          97.8%
BBBME         4.49            -3.38%          100%          3.46            -3.96%          100%

FASTME’s heuristics are very good... The suboptimal trees produced by BNNI seem as good as the optimal trees.

Will these results also hold for larger distance matrices (≥ 24 taxa)?

Unfortunately, experimenting with larger distance matrices is hard.

Page 13: The BME criterion for tree reconstruction and a Branch and Bound algorithm for BME-optimal trees

Thanks:

Mike Hendy

Barbara Holland

Nick Goldman

David Penny

Mike Steel

Rick Desper

Olivier Gascuel

Page 14: The BME criterion for tree reconstruction and a Branch and Bound algorithm for BME-optimal trees

[Figure: histogram of running times; x-axis: running time in seconds (0 to 100), y-axis: frequency (0 to 1200)]

Running time on 24-taxon distance matrices: each run typically takes only a few seconds (on 2.80 GHz CPUs with 1.5 GB RAM).

But the running time still increases exponentially with the number of taxa: the B&B approach seems applicable up to ~40 taxa…
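For a sense of scale, the short Python sketch below counts the unrooted bifurcating topologies that an exhaustive search would have to score. It uses the standard fact that a tree with k leaves has 2k − 3 branches, so adding one more taxon multiplies the count by 2k − 3. The function name is hypothetical, and the numbers say nothing about how many trees the Branch and Bound actually visits.

```python
def num_unrooted_binary_trees(n):
    """Number of unrooted bifurcating topologies on n >= 3 labelled taxa."""
    count = 1  # exactly one topology for 3 taxa
    for k in range(3, n):
        count *= 2 * k - 3  # taxon k+1 can go on any of the 2k - 3 branches
    return count


for n in (6, 12, 24, 40):
    print(n, num_unrooted_binary_trees(n))
```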

Page 16: The BME criterion for tree reconstruction and a Branch and Bound algorithm for BME-optimal trees

Balanced Minimum Evolution

A Branch and Bound approach to find BME trees: computational aspects

If we are naïve, calculating the BME score $\Lambda(T')$ will take $O(k^2)$.

[Figure: T with k leaves and T’ with k+1 leaves; cost annotations $O(k^2)$ and $O(k^3)$]

However, one can use $\Lambda(T)$, and it turns out that $\Lambda(T')$ can then be calculated in $O(1)$.

Page 17: The BME criterion for tree reconstruction and a Branch and Bound algorithm for BME-optimal trees

Balanced Minimum Evolution

A Branch and Bound approach to find BME trees: computational aspects

[Figure: T with k leaves and T’ with k+1 leaves; cost annotations $O(1)$ and $O(k)$]

$\Lambda(T') = \Lambda(T) + f(\Delta_T)$

where $\Delta_T$ is a data structure (of $O(k^2)$ size) that needs to be updated for each new T. This takes $O(k \cdot \mathrm{diam}\,T) = O(k \log k)$. [Desper and Gascuel 2002 J. Comp. Biol.]