Brandon Andrews CS6030. What is a phylogenetic tree? Goals in a phylogenetic tree generator ...
- Slide 1
- Brandon Andrews CS6030
- Slide 2
- What is a phylogenetic tree?
- Goals in a phylogenetic tree generator
- Distance-based method
- Fitch-Margoliash method
- Example
- Verification
- Demo
- Slide 3
- Also known as an evolutionary tree.
- Attempts to map the genetic similarity of organisms into a tree where longer branches indicate greater dissimilarity.
- [Figure: tree with leaves A, B, and C]
- B and C are similar.
- A and B are more similar than A and C, which are separated by a longer distance.
- Slide 4
- Given the sequences and their calculated or known dissimilarity, construct a tree that correctly maps this data.
- Naive method: generate every possible tree and grade its quality.
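To see why the naive method does not scale, the number of candidate trees can be counted. The sketch below assumes the trees are unrooted and strictly bifurcating, in which case there are (2n - 5)!! distinct topologies on n labeled leaves. The function name is illustrative and not part of the slides.

```python
def count_unrooted_trees(n_leaves: int) -> int:
    """Count unrooted, bifurcating tree topologies on n labeled leaves."""
    if n_leaves < 3:
        return 1
    count = 1
    # A tree with k - 1 leaves has 2(k - 1) - 3 = 2k - 5 branches,
    # and leaf k can be attached to any one of them.
    for k in range(4, n_leaves + 1):
        count *= 2 * k - 5
    return count


for n in (5, 10, 20):
    print(n, count_unrooted_trees(n))
```

Five sequences give only 15 candidate trees, but ten already give more than two million, so grading every tree quickly becomes impractical.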
- Slide 5
- Take a distance matrix that stores the distance from every sequence to every other sequence.
- Construct a tree that preserves these distances.
- Most methods do not preserve the distances exactly.
- Slide 6
- Fitch-Margoliash method: a clustering algorithm that works bottom-up to create an unrooted tree.
- Weights are used to help lower the error rate for long paths.
- Slide 7
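- Calculate a distance matrix.
- Hamming distance can be used, but a better dissimilarity function is advised.
- Example distance matrix (only the upper triangle is filled in):
        A    B    C    D    E
    A   0   22   39   39   41
    B   0    0   41   41   43
    C   0    0    0   18   20
    D   0    0    0    0   10
    E   0    0    0    0    0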
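As a minimal sketch of this step, the code below builds a distance matrix from Hamming distances. It assumes the sequences are already aligned to equal length. The function names and sample sequences are made up for illustration and are not the data behind the slides' example matrix.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs: list[str]) -> list[list[int]]:
    """Symmetric matrix of pairwise Hamming distances."""
    n = len(seqs)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = hamming(seqs[i], seqs[j])
    return matrix


# Hypothetical aligned sequences, for illustration only.
sequences = ["GATTACA", "GACTACA", "GATTTCA"]
for row in distance_matrix(sequences):
    print(row)
```

The sketch fills both triangles for easier lookup, whereas the slides store only the upper triangle.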
- Slide 8
- Add all the sequences to an array of nodes and mark them as leaves.
- Select the closest pair of nodes by scanning the distance matrix.
- Those two nodes, D and E in our example, make up two of the branches in a three-branch calculation to find the branch lengths. The third branch leads to the remaining nodes A, B, and C.
- [Figure: three branches d, e, and abc joining D, E, and the group A, B, C]
- dist(ABC, D) is the average distance from ABC to D.
- dist(ABC, E) is the average distance from ABC to E.
- d = (dist(D, E) + (dist(ABC, D) - dist(ABC, E))) / 2;
- e = dist(D, E) - d;
- abc = dist(ABC, D) - d;
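The three formulas above can be sketched as one small function. The function and parameter names are illustrative. `dist_rest_x` and `dist_rest_y` stand for the average distances from the remaining nodes to each of the two selected nodes.

```python
def three_branch_lengths(dist_xy: float, dist_rest_x: float, dist_rest_y: float):
    """Branch lengths for two selected nodes x, y and the remaining group.

    Mirrors the slide's formulas:
      x    = (dist(x, y) + (dist(rest, x) - dist(rest, y))) / 2
      y    = dist(x, y) - x
      rest = dist(rest, x) - x
    """
    x = (dist_xy + (dist_rest_x - dist_rest_y)) / 2
    y = dist_xy - x
    rest = dist_rest_x - x
    return x, y, rest


# Example values: dist(D, E) = 10, and the averages from A, B, C to D and to E.
d, e, abc = three_branch_lengths(10, (39 + 41 + 18) / 3, (41 + 43 + 20) / 3)
print(round(d, 2), round(e, 2), round(abc, 2))
```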
- Slide 9
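- dist(ABC, D) and dist(ABC, E): calculate each by taking the distance from each of the elements A, B, and C to the node and averaging them.
- dist(ABC, D) = (39 + 41 + 18) / 3 = 32.66..., truncated to 32.6
- dist(ABC, E) = (41 + 43 + 20) / 3 = 34.66..., truncated to 34.6
- d = (10 + (32.6 - 34.6)) / 2 = 4
- e = 10 - 4 = 6
- abc = 32.6 - 4 = 28.6
- Distance matrix with A, B, and C grouped:
          ABC     D      E
    ABC    0    32.6   34.6
    D      0     0     10
    E      0     0      0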
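The two averages can be reproduced from the example distance matrix. This is a minimal sketch: the values are the example's, stored as an upper triangle, and the names `LABELS`, `UPPER`, `lookup`, and `group_distance` are illustrative rather than taken from the demo.

```python
LABELS = ["A", "B", "C", "D", "E"]
# Upper-triangular example matrix; entries below the diagonal are unused.
UPPER = [
    [0, 22, 39, 39, 41],
    [0, 0, 41, 41, 43],
    [0, 0, 0, 18, 20],
    [0, 0, 0, 0, 10],
    [0, 0, 0, 0, 0],
]


def lookup(i: int, j: int) -> float:
    """Read a distance regardless of index order."""
    return UPPER[min(i, j)][max(i, j)]


def group_distance(group: list[str], node: str) -> float:
    """Average distance from every member of `group` to `node`."""
    target = LABELS.index(node)
    return sum(lookup(LABELS.index(g), target) for g in group) / len(group)


print(group_distance(["A", "B", "C"], "D"))  # 98 / 3
print(group_distance(["A", "B", "C"], "E"))  # 104 / 3
```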
- Slide 10
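- Now we can create a new node whose branch to the rest of the tree has length 28.6, and set the branches to D and E to their respective lengths, 4 and 6.
- Since D and E are leaves, their distances are kept as calculated. If they were not leaves, the average of their child distances would be subtracted, as seen later.
- [Figure: D and E joined at a new node with branch lengths 4 and 6; a branch of length 28.6 leads to A, B, C]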
- Slide 11
- The final step in this iteration is to update the nodes array and the distance matrix.
- In the nodes array, D and E are removed and the new merged node DE is appended to the end.
- In the distance matrix, D and E are removed and replaced by the merged node DE:
         A    B    C   DE
    A    0   22   39   40
    B    0    0   41   42
    C    0    0    0   19
    DE   0    0    0    0
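The matrix update can be sketched as follows. The sketch assumes each distance to the merged node is the plain average of the distances to the two merged nodes, which matches the example's numbers (for instance (39 + 41) / 2 = 40 for A). The dictionary layout and the function name `merge_nodes` are illustrative choices.

```python
def merge_nodes(dist: dict, x: str, y: str) -> dict:
    """Return a new symmetric matrix with x and y replaced by the node x + y."""
    merged = x + y
    others = [k for k in dist if k not in (x, y)]
    # Copy the distances among the untouched nodes.
    new = {a: {b: dist[a][b] for b in others} for a in others}
    new[merged] = {merged: 0.0}
    for k in others:
        # Assumption: unweighted average of the two merged rows.
        avg = (dist[k][x] + dist[k][y]) / 2
        new[k][merged] = avg
        new[merged][k] = avg
    return new


labels = "ABCDE"
upper = [
    [0, 22, 39, 39, 41],
    [0, 0, 41, 41, 43],
    [0, 0, 0, 18, 20],
    [0, 0, 0, 0, 10],
    [0, 0, 0, 0, 0],
]
# Expand the upper triangle into a symmetric dictionary of dictionaries.
dist = {
    a: {b: upper[min(i, j)][max(i, j)] for j, b in enumerate(labels)}
    for i, a in enumerate(labels)
}

step1 = merge_nodes(dist, "D", "E")
print(step1["A"]["DE"], step1["B"]["DE"], step1["C"]["DE"])
```

Applying the same function again to C and DE gives the distances for the merged node CDE used later in the example.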
- Slide 12
- Look at the new distance matrix and find the closest pair: C and DE.
- Now there is a special step. C is a leaf, so it gets the calculated distance. DE is not a leaf, so we need to subtract the average child distance from the branch length calculated for DE.
- [Figure: three branches c, de, and ab joining C, DE, and the group A, B]
- dist(AB, C) is the average distance from AB to C.
- dist(AB, DE) is the average distance from AB to DE.
- c = (dist(C, DE) + (dist(AB, C) - dist(AB, DE))) / 2;
- de = dist(C, DE) - c;
- ab = dist(AB, C) - c;
- Slide 13
- Merging A and B to calculate the average distance to C and DE.
- dist(AB, C) = (39 + 41) / 2 = 40
- dist(AB, DE) = (40 + 42) / 2 = 41
- Distance matrix with A and B grouped:
         AB    C   DE
    AB    0   40   41
    C     0    0   19
    DE    0    0    0
- Slide 14
- Average child distance example.
- Recursively take the average of each branch:
- ((5 + ((2 + (4 + 6) / 2) + 3) / 2) + 1) / 2 = 5.5
- [Figure: example tree with branch lengths 4, 6, 3, 1, 2, and 5]
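The recursion can be sketched as below. The tree layout is an assumption read off the expression on the slide: the two leaves on branches 4 and 6 share a parent, that parent (branch 2) and a leaf (branch 3) share the next parent, and that node (branch 5) and a leaf (branch 1) hang from the top. A leaf is represented as `None` and an internal node as a pair of `(branch_length, child)` tuples. Both conventions are illustrative.

```python
def average_child_distance(node) -> float:
    """Average distance from a node down to its leaves, taken branch by branch.

    A leaf (None) contributes 0. An internal node averages, over its two
    branches, the branch length plus the child's own average.
    """
    if node is None:
        return 0.0
    (len_a, child_a), (len_b, child_b) = node
    return (
        (len_a + average_child_distance(child_a))
        + (len_b + average_child_distance(child_b))
    ) / 2


inner = ((4, None), (6, None))   # leaves on branches 4 and 6
middle = ((2, inner), (3, None))
top = ((5, middle), (1, None))
print(average_child_distance(top))  # 5.5
```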
- Slide 15
- DE has child nodes, so we need to subtract the average child distance from its branch. Since both children of DE are leaves, this is (4 + 6) / 2 = 5.
- So now we calculate c, de, and ab:
- c = (dist(C, DE) + (dist(AB, C) - dist(AB, DE))) / 2 = (19 + (40 - 41)) / 2 = 9
- de = dist(C, DE) - c - AverageDistance(DE) = 19 - 9 - (4 + 6) / 2 = 5
- ab = dist(AB, C) - c = 40 - 9 = 31
- Notice that the distance at de replaces whatever was previously there.
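A sketch of the three-branch calculation with this correction folded in is shown below. The function name and the `avg_*` parameters are illustrative. Each `avg_*` value is the average child distance of the corresponding node and stays 0 for a leaf or for a group that has not been merged yet.

```python
def corrected_branch_lengths(
    dist_xy: float,
    dist_rest_x: float,
    dist_rest_y: float,
    avg_x: float = 0.0,
    avg_y: float = 0.0,
    avg_rest: float = 0.0,
):
    """Three-branch lengths, minus each node's average child distance."""
    x = (dist_xy + (dist_rest_x - dist_rest_y)) / 2
    y = dist_xy - x
    rest = dist_rest_x - x
    return x - avg_x, y - avg_y, rest - avg_rest


# Joining C (a leaf) and DE (average child distance 5):
c, de, ab = corrected_branch_lengths(19, 40, 41, avg_y=5)
print(c, de, ab)  # 9.0 5.0 31.0
```

The same function covers the last join in the example, where the remaining node CDE is itself a subtree: `corrected_branch_lengths(22, 39.5, 41.5, avg_rest=9.5)` yields 10, 12, and 20.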
- Slide 16
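- With the new node added:
- [Figure: C (branch 9) and DE (branch 5) joined at a new node; D and E hang from DE on branches 4 and 6; a branch of length 31 leads to A, B]
- Recalculated distance matrix:
          A     B    CDE
    A     0    22   39.5
    B     0     0   41.5
    CDE   0     0     0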
- Slide 17
- As before, choose the next closest nodes by looking at the distance matrix: A and B are chosen.
- Now a and b can be calculated directly since A and B are leaves, but notice we're linking two trees at cde, so we need a special step to subtract the average distance.
- [Figure: three branches a, b, and cde joining A, B, and CDE]
- dist(CDE, A) is the average distance from CDE to A.
- dist(CDE, B) is the average distance from CDE to B.
- a = (dist(A, B) + (dist(CDE, A) - dist(CDE, B))) / 2 = (22 + (39.5 - 41.5)) / 2 = 10
- b = dist(A, B) - a = 22 - 10 = 12
- cde = dist(CDE, A) - a = 39.5 - 10 = 29.5
- Slide 18
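- So the final branch length is 29.5 - AverageDistance(CDE):
- 29.5 - ((5 + (4 + 6) / 2) + 9) / 2 = 29.5 - 9.5 = 20
- [Figure: the subtree CDE (branches C 9, DE 5, D 4, E 6) joined to A and B (branches 10 and 12); the connecting branch cde changes from 29.5 to 20]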
- Slide 19
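- So we have a completely defined unrooted tree. How do we root it?
- Just take the last branch and divide it by two: 20 / 2 = 10 on each side of the root.
- [Figure: rooted tree with branches C 9, DE 5, D 4, E 6, A 10, B 12; the connecting branch of length 20 is split into two branches of length 10]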
- Slide 20
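- Original distance matrix:
        A    B    C    D    E
    A   0   22   39   39   41
    B   0    0   41   41   43
    C   0    0    0   18   20
    D   0    0    0    0   10
    E   0    0    0    0    0
- Distance matrix from the generated tree:
        A    B    C    D    E
    A   0   22   39   39   41
    B   0    0   41   41   43
    C   0    0    0   18   20
    D   0    0    0    0   10
    E   0    0    0    0    0
- Exact match. This rarely happens; usually the result is off by a small amount.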
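The verification can be sketched by summing branch lengths along the path between each pair of leaves in the generated tree and comparing the result with the original matrix. The edge list encodes the unrooted tree from the example. The internal node names X, Y, and Z and the function names are hypothetical labels introduced for this sketch.

```python
from collections import defaultdict

# Unrooted tree from the example; X, Y, Z are made-up internal node names.
EDGES = [
    ("A", "X", 10), ("B", "X", 12), ("X", "Y", 20),
    ("C", "Y", 9), ("Y", "Z", 5), ("D", "Z", 4), ("E", "Z", 6),
]

ORIGINAL = [
    [0, 22, 39, 39, 41],
    [0, 0, 41, 41, 43],
    [0, 0, 0, 18, 20],
    [0, 0, 0, 0, 10],
    [0, 0, 0, 0, 0],
]


def path_lengths(edges, start):
    """Distance from `start` to every node, following the unique tree paths."""
    adjacent = defaultdict(list)
    for u, v, w in edges:
        adjacent[u].append((v, w))
        adjacent[v].append((u, w))
    dist = {start: 0}
    stack = [start]
    while stack:
        u = stack.pop()
        for v, w in adjacent[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return dist


def tree_matrix(edges, leaves):
    """Upper-triangular matrix of leaf-to-leaf path lengths."""
    rows = []
    for i, a in enumerate(leaves):
        from_a = path_lengths(edges, a)
        rows.append([from_a[b] if j > i else 0 for j, b in enumerate(leaves)])
    return rows


print(tree_matrix(EDGES, "ABCDE") == ORIGINAL)
```

For distance matrices that no tree fits exactly, a tolerance-based comparison or a summed error would replace the equality check.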
- Slide 21
- http://sirisian.com/javascript/CS6030Project.html
- Slide 22
- Distance-based methods such as the Fitch-Margoliash method quickly produce very accurate trees when given an accurate distance matrix.