Perfect Phylogeny Tutorial #10 © Ilan Gronau Original slides by Shlomo Moran


Perfect Phylogeny

The underlying model:
• A character-vector is given for every species in S.
• Each character represents some observable trait.
• Each character takes values from a finite set.
• Basic underlying assumption: characters are homoplasy-free.

Homoplasy-Free Characters

Homoplasy-free characters have:
• no reversals
• no convergence

Homoplasy-free characters induce a convex coloring of the phylogenetic tree.

The Perfect Phylogeny Problem:
Given character-vectors for S, find (if they exist):
- a phylogenetic tree T over S (S is the leaf-set of T);
- convex character assignments to all vertices of T.

! This problem is generally NP-hard !
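Convexity can be checked mechanically once every vertex of the tree has a state. The sketch below assumes a rooted tree stored as a hypothetical child-to-parent map `PARENT` and one character's states in a dict; the names and the small tree are illustrative only. It relies on the fact that a set of vertices in a rooted tree is connected exactly when one of its vertices has its parent outside the set. It only verifies a given full assignment; it does not solve the problem of finding one.

```python
# Hypothetical rooted tree: child -> parent (the root maps to None).
PARENT = {"root": None, "x": "root", "z": "root",
          "A": "x", "E": "x", "B": "z", "D": "z"}

def is_convex(parent, color):
    """True if, for every state, the vertices carrying it form a connected subtree."""
    tops = {}  # state -> number of vertices whose parent carries a different state
    for v, state in color.items():
        p = parent[v]
        if p is None or color[p] != state:
            tops[state] = tops.get(state, 0) + 1
    return all(count == 1 for count in tops.values())

# State 1 appears only in the subtree below x: convex.
convex = {"root": 0, "x": 1, "A": 1, "E": 1, "z": 0, "B": 0, "D": 0}
# State 1 appears at A and at B, separated by state-0 vertices: not convex.
not_convex = {"root": 0, "x": 0, "A": 1, "E": 0, "z": 0, "B": 1, "D": 0}

print(is_convex(PARENT, convex))      # True
print(is_convex(PARENT, not_convex))  # False
```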

Directed Binary Perfect Phylogeny

Directed binary characters:
• 1 – property exists
• 0 – property doesn't exist
• Initially (at the root) no property exists.

Input: a binary coloring (C1,…,Cm) of a set S (an n×m binary matrix M).

Problem: Find a phylogenetic tree T over S (if one exists), s.t.
1. For j=1,…,m, the partial coloring induced by Cj is convex in T.
2. The root has state 0 in all characters.

We will present a polynomial-time solution.

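Example

Input (n species as rows, m characters as columns):

    C1 C2 C3 C4 C5
A    1  1  0  0  0
B    0  0  1  0  0
C    1  1  0  0  1
D    0  0  1  1  0
E    0  1  0  0  0

Possible output:

[Figure: a tree with a zero-root (00000), internal vertices (01000), (11000) and (00100), and leaves A (11000), B (00100), C (11001), D (00110), E (01000). The edges on which C2 and C3 turn on are marked.]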

An Important Observation

A tree is a directed perfect phylogeny for a given 0/1 matrix iff we can map each character to an edge/vertex on which this character was “turned on”.

Example:

    C1 C2 C3 C4 C5
A    1  1  0  0  0
B    0  0  1  0  0
C    1  1  0  0  1
D    0  0  1  1  0
E    0  1  0  0  0

[Figure: the tree over leaves A–E with each character C1–C5 marked on the edge where it turns on; the edge marked C2 is the origin of C2.]
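The observation can be tried out on the example: if every character is assigned to one edge, each leaf's vector is just the set of characters met on its path to the root. The sketch below assumes a hypothetical encoding (`PARENT` maps each vertex to its parent and to the character turned on along that edge, if any; `x`, `y`, `z` are made-up names for the internal vertices) and a tree shape derived from the example matrix.

```python
CHARS = ["C1", "C2", "C3", "C4", "C5"]

# Hypothetical encoding: vertex -> (parent, character turned on along the edge or None).
PARENT = {
    "x": ("root", "C2"), "E": ("x", None),
    "y": ("x", "C1"),    "A": ("y", None), "C": ("y", "C5"),
    "z": ("root", "C3"), "B": ("z", None), "D": ("z", "C4"),
}

def leaf_vector(leaf):
    """Collect the characters turned on along the path from the leaf up to the root."""
    turned_on = set()
    v = leaf
    while v != "root":
        v, ch = PARENT[v]
        if ch is not None:
            turned_on.add(ch)
    return [1 if c in turned_on else 0 for c in CHARS]

for leaf in "ABCDE":
    print(leaf, leaf_vector(leaf))
```

Each printed row can be compared against the corresponding row of the input matrix.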

Laminar Matrices

Definitions:
• Oj – the set of objects that have character Cj (Oj = {i : Mij = 1}).
• A collection of sets {S1 ,…, Sk} is laminar if for all i, j, either Si and Sj are disjoint, or one includes the other.

Theorem: A binary matrix M has a perfect phylogenetic tree iff the collection {O1 ,…, Om} is laminar.

Laminar:

    C1 C2 C3 C4 C5
A    1  1  0  0  0
B    0  0  1  0  0
C    1  1  0  0  1
D    0  0  1  1  0
E    0  1  0  0  0

Not laminar:

    C1 C2 C3 C4 C5
A    1  1  0  0  0
B    0  0  1  0  1
C    1  1  0  0  1
D    0  0  1  1  0
E    0  1  0  0  1
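The definition translates directly into a pairwise test. The following is a minimal sketch (function names are illustrative, and the quadratic pairwise check is the naive one, not the efficient method) applied to the two example matrices.

```python
from itertools import combinations

def character_sets(matrix):
    """O_j = set of row indices i with matrix[i][j] == 1, for every column j."""
    n_cols = len(matrix[0])
    return [{i for i, row in enumerate(matrix) if row[j] == 1} for j in range(n_cols)]

def is_laminar(sets):
    """True if every pair of sets is either disjoint or nested."""
    for a, b in combinations(sets, 2):
        if a & b and not (a <= b or b <= a):
            return False
    return True

# Rows are A, B, C, D, E; columns are C1..C5.
LAMINAR = [[1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [1, 1, 0, 0, 1],
           [0, 0, 1, 1, 0], [0, 1, 0, 0, 0]]
NOT_LAMINAR = [[1, 1, 0, 0, 0], [0, 0, 1, 0, 1], [1, 1, 0, 0, 1],
               [0, 0, 1, 1, 0], [0, 1, 0, 0, 1]]

print(is_laminar(character_sets(LAMINAR)))      # True
print(is_laminar(character_sets(NOT_LAMINAR)))  # False
```

In the second matrix, O2 = {A, C, E} and O5 = {B, C, E} overlap without either containing the other.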

Proof of Theorem

Assume M has a perfect phylogeny. Consider the edges labeled Ci and Cj:
• If there is a root-to-leaf path containing both edges (C1, C2 in the figure), then Oi includes Oj or vice versa.
• Otherwise, Oi and Oj are disjoint (C1, C3 in the figure).

[Figure: the example tree over leaves A–E with edges labeled C1–C5. C1 and C2 lie on a common root-to-leaf path; C1 and C3 do not.]

Proof of Theorem (cont)

Assume that the collection {O1 ,…, Ok} is laminar. We prove by induction on the number of characters k that M has a perfect phylogenetic tree.

Basis: one character. There are at most two (distinct) objects, one with and one without this character.

    C1
A    1
B    0

[Figure: a root with two leaves, A and B; the edge to A is labeled C1.]

Proof of Theorem (cont)

Assume that the collection {O1 ,…, Ok} is laminar.

Induction step: assume correctness for k-1 characters.
Consider a matrix with k characters (non-zero columns), and assume WLOG that O1 is not properly contained in Oj for any j > 1.
• S1 – the set of objects i for which Mi1 = 1.
• S2 – the remaining objects.

Claim: each character belongs to objects in S1 or S2, but not to both.
Why is this? By laminarity, each Oj is either disjoint from O1 (so Oj ⊆ S2) or nested with O1; since O1 is not properly contained in Oj, nesting means Oj ⊆ O1 = S1.

By induction there are trees T1 and T2 for S1 and S2.

    C1 C2 C3 C4 C5
A    1  1  0  0  0
B    0  0  1  0  0
C    1  1  0  0  1
D    0  0  1  1  0
E    1  0  0  0  0

S1 = {A,C,E}
S2 = {B,D}

[Figure: the subtrees T1 and T2, with the edge leading to T1 labeled C1.]
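The split used in the induction step can be checked on this slide's matrix. The sketch below is illustrative only: it picks a column with the most 1's as the maximal one (an assumption that is safe in a laminar family, since such a set cannot be properly contained in another) and then tests the claim for every remaining character.

```python
NAMES = ["A", "B", "C", "D", "E"]
# Matrix from this step of the proof (row E is 1 0 0 0 0).
M = [[1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [1, 1, 0, 0, 1],
     [0, 0, 1, 1, 0], [1, 0, 0, 0, 0]]

def split_on_maximal_column(matrix):
    """Return (column index j, S1, S2): S1 = rows with a 1 in column j, S2 = the rest."""
    columns = list(zip(*matrix))
    j = max(range(len(columns)), key=lambda c: sum(columns[c]))
    s1 = {i for i, row in enumerate(matrix) if row[j] == 1}
    s2 = set(range(len(matrix))) - s1
    return j, s1, s2

def claim_holds(matrix, s1, s2):
    """Every character's objects lie entirely in S1 or entirely in S2."""
    for column in zip(*matrix):
        objects = {i for i, bit in enumerate(column) if bit == 1}
        if not (objects <= s1 or objects <= s2):
            return False
    return True

j, s1, s2 = split_on_maximal_column(M)
print(j, sorted(NAMES[i] for i in s1), sorted(NAMES[i] for i in s2))
print(claim_holds(M, s1, s2))
```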

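Efficient Implementation

1. Sort the columns (characters) according to decreasing binary value, reading each column top to bottom as a binary number.

Claim: If the binary value of column i is larger than that of column j, then Oi is not a proper subset of Oj.

Proof: If Oi were a proper subset of Oj, column j would have a 1 wherever column i has one, plus at least one more, so its binary value would be larger. Hence a larger value for column i means the 1's of Oi are not all covered by the 1's of Oj, or the two columns would be equal.

Before sorting:

    C1 C2 C3 C4 C5
A    1  1  0  0  0
B    0  0  1  0  0
C    1  1  0  0  1
D    0  0  1  1  0
E    0  1  0  0  0

After sorting:

    C2 C1 C3 C5 C4
A    1  1  0  0  0
B    0  0  1  0  0
C    1  1  0  1  0
D    0  0  1  0  1
E    1  0  0  0  0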
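Step 1 can be sketched in a few lines. This version is an illustration, not the efficient method: it uses Python's built-in comparison sort on column tuples (lexicographic order on equal-length 0/1 tuples matches binary value with the top row as the most significant bit). The function name `sort_columns` is made up for this example.

```python
def sort_columns(matrix, labels):
    """Reorder columns by decreasing binary value (top row = most significant bit)."""
    columns = list(zip(*matrix))
    order = sorted(range(len(columns)), key=lambda j: columns[j], reverse=True)
    sorted_matrix = [[row[j] for j in order] for row in matrix]
    return sorted_matrix, [labels[j] for j in order]

# Rows are A, B, C, D, E.
M = [[1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [1, 1, 0, 0, 1],
     [0, 0, 1, 1, 0], [0, 1, 0, 0, 0]]

sorted_m, sorted_labels = sort_columns(M, ["C1", "C2", "C3", "C4", "C5"])
print(sorted_labels)  # ['C2', 'C1', 'C3', 'C5', 'C4']
for row in sorted_m:
    print(row)
```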

Efficient Implementation (cont)

2. Make a backwards linked list of the 1’s in each row: every 1 points at the previous 1 in its row.

Claim: If the columns are sorted, then the set of columns is laminar iff for each column i, all the links leaving column i point at the same column. (Why is this?)

If the matrix is laminar then these pointers define the inclusion hierarchy.

    C2 C1 C3 C5 C4
A    1  1  0  0  0
B    0  0  1  0  0
C    1  1  0  1  0
D    0  0  1  0  1
E    1  0  0  0  0

    C2 C1 C3 C5 C4
A    1  1  0  0  0
B    0  0  1  0  0
C    1  1  0  1  0
D    0  0  1  0  1
E    0  0  1  1  0

In the first matrix all links leaving each column agree. In the second, the links leaving C5 point at C1 (row C) and at C3 (row E), so it is not laminar.
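The pointer test of step 2 can be sketched as a single pass over the sorted matrix. In the sketch below, `ROOT = -1` is an assumed sentinel for "no earlier 1 in this row", and `inclusion_parents` is an illustrative name; it returns the common target of each column's links, or `None` as soon as two links leaving the same column disagree.

```python
ROOT = -1  # assumed sentinel: the 1 has no earlier 1 in its row

def inclusion_parents(sorted_matrix):
    """Map each column to the column its back-links point at, or return None on conflict."""
    parent = {}
    for row in sorted_matrix:
        prev = ROOT
        for j, bit in enumerate(row):
            if bit == 1:
                # The first link seen for column j fixes its target; later ones must agree.
                if parent.setdefault(j, prev) != prev:
                    return None
                prev = j
    return parent

# Columns are in sorted order C2, C1, C3, C5, C4; rows are A..E.
FIRST = [[1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [1, 1, 0, 1, 0],
         [0, 0, 1, 0, 1], [1, 0, 0, 0, 0]]
SECOND = [[1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [1, 1, 0, 1, 0],
          [0, 0, 1, 0, 1], [0, 0, 1, 1, 0]]

print(inclusion_parents(FIRST))   # {0: -1, 1: 0, 2: -1, 3: 1, 4: 2}
print(inclusion_parents(SECOND))  # None
```

Read with the sorted labels, the first result says C1 points at C2, C5 at C1, and C4 at C3, while C2 and C3 have no predecessor.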

Efficient Implementation (cont)

3. If the matrix is laminar, compute the inclusion hierarchy.
4. Reconstruct the topology of the phylogenetic tree and the ancestral character states.

    C2 C1 C3 C5 C4
A    1  1  0  0  0
B    0  0  1  0  0
C    1  1  0  1  0
D    0  0  1  0  1
E    1  0  0  0  0

[Figure: the inclusion hierarchy of the sorted columns (C2 ⊃ C1 ⊃ C5 and C3 ⊃ C4), and the reconstructed tree over leaves A–E, rooted at (00000), with each edge labeled by the character that turns on along it and each vertex labeled by its character vector.]
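Steps 3 and 4 can be sketched as follows, under stated assumptions: the inclusion hierarchy is written out by hand as a hypothetical `PARENT` map over sorted column indices (-1 meaning no containing character), each character gets one tree vertex named after it (the lower end of the edge on which it turns on), and each species hangs below the vertex of its last 1. This is one possible construction, not necessarily the one drawn on the slide.

```python
LABELS = ["C2", "C1", "C3", "C5", "C4"]   # sorted column order
NAMES = ["A", "B", "C", "D", "E"]
SORTED_M = [[1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [1, 1, 0, 1, 0],
            [0, 0, 1, 0, 1], [1, 0, 0, 0, 0]]
# Assumed inclusion hierarchy: column index -> index of the containing column (-1 = none).
PARENT = {0: -1, 1: 0, 2: -1, 3: 1, 4: 2}

def build_tree(sorted_matrix, labels, names, parent):
    """Return the tree as a dict: vertex -> list of children."""
    tree = {"root": []}
    for label in labels:
        tree[label] = []
    # One vertex per character, hung below the character that contains it.
    for j, p in parent.items():
        tree["root" if p == -1 else labels[p]].append(labels[j])
    # Each species hangs below its most specific character (its last 1 in sorted order).
    for name, row in zip(names, sorted_matrix):
        ones = [j for j, bit in enumerate(row) if bit == 1]
        tree[labels[ones[-1]] if ones else "root"].append(name)
    return tree

for vertex, children in build_tree(SORTED_M, LABELS, NAMES, PARENT).items():
    print(vertex, "->", children)
```

Vertices with a single leaf child (here C5 and C4) could be merged with that leaf; the sketch keeps them separate for clarity.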

Efficient Implementation - Summary

1. Sort the columns (characters) according to decreasing binary value.
2. Make a backwards linked list of the 1’s in each row.
3. If the matrix is laminar, compute the inclusion hierarchy.
4. Reconstruct the topology of the phylogenetic tree and the ancestral character states.

Complexity: O(mn) – use radix (bucket) sort in stage 1.

    C1 C2 C3 C4 C5
A    1  1  0  0  0
B    0  0  1  0  0
C    1  1  0  0  1
D    0  0  1  1  0
E    0  1  0  0  0

    C2 C1 C3 C5 C4
A    1  1  0  0  0
B    0  0  1  0  0
C    1  1  0  1  0
D    0  0  1  0  1
E    1  0  0  0  0
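The O(mn) bound for stage 1 can be illustrated with a least-significant-digit radix sort: one stable pass per row, starting from the bottom row, each pass moving the columns with a 1 ahead of those with a 0. The sketch below is one way to do it (the name `radix_sort_columns` is illustrative); each of the n passes touches all m columns once.

```python
def radix_sort_columns(matrix):
    """Return column indices ordered by decreasing binary value (top row most significant)."""
    order = list(range(len(matrix[0])))
    for row in reversed(matrix):              # least significant row first
        ones = [j for j in order if row[j] == 1]
        zeros = [j for j in order if row[j] == 0]
        order = ones + zeros                  # stable pass: 1's before 0's
    return order

# Rows are A, B, C, D, E; columns are C1..C5.
M = [[1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [1, 1, 0, 0, 1],
     [0, 0, 1, 1, 0], [0, 1, 0, 0, 0]]
LABELS = ["C1", "C2", "C3", "C4", "C5"]

print([LABELS[j] for j in radix_sort_columns(M)])  # ['C2', 'C1', 'C3', 'C5', 'C4']
```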
