Computational Molecular Biology Biochem 218 – BioMedical Informatics 231 Doug Brutlag Professor

Embed Size (px)

Citation preview

  • Slide 1

Computational Molecular Biology Biochem 218 BioMedical Informatics 231 http://biochem218.stanford.edu/ http://biochem218.stanford.edu/ Doug Brutlag Professor Emeritus Biochemistry & Medicine (by courtesy) Phylogenies Slide 2 Cladogram Representation of Phylogenies A B C D E F G H I J L M N O P Q Years x 10 6 100 0 Years x 10 6 10 0 0 Slide 3 Dendrogram Representation of Phylogenies Substitutions per 100 Residues 100.00 GrowTree Phylogram February 1, 2010 Slide 4 Cladogram Slide 5 Phenogram Slide 6 Curve-O-Gram Slide 7 Eurogram Slide 8 RadialGram Slide 9 Methods for Determining Phylogenies Parsimony (character based) o Assigns mutations to branches o Minimize number of edits o Topology maximizes similarity of neighboring leaves Distance methods o Branch lengths = D(i,j)/2 for sequences i, j o Distances must be at least metric o Distances can reflect time or edits o Distance must be relatively constant per unit branch length A B C D E F G H I J L M N O P Q Slide 10 Methods for Determining Phylogenies Parsimony o Minimum mutation (Fitch, PAUP) o Minimal length encoding Probabilistic o Branch and Bound o Maximum likelihood Distance methods o Ultrametric Trees o Additive Trees o UPGMA o Neighbor Joining A B C D E F G H I J L M N O P Q Slide 11 Properties of Trees Rooted or Unrooted Nodes and Branches o Internal Nodes o External Nodes - leaves Operational Taxonomic Units Outgroups Topology One path/pair Distances X Y Z R1 1 3 5 4 2 2 1 X3 4 5 Z Y R2 21 X Z R1 R2 5 3 4 Y Slide 12 Orthologous Evolution Slide 13 Paralogous Evolution hemoglobins Slide 14 Challenges Making Trees: Gene Duplication versus Speciation Slide 15 Orthology and Paralogy HB Human WB Worm HA1 Human HA2 Human Yeast WA Worm Thanks to Seraphim Batzoglou Orthologs: Derived by speciation Paralogs: Gene Duplications Orthologs: Derived by speciation Paralogs: Gene Duplications Slide 16 Challenges Making Trees: Gene Conversion A A T C G C G A T A G C A T C A A T T C C C T C Thanks to Maryellen Ruvolo Slide 17 A A T C G C G A T A G C A T CC G C G A T C TC A T C A A T T C C C T C Challenges Making Trees: Gene Conversion Thanks to Maryellen Ruvolo Slide 18 A B C D Gene N Challenges Making Trees: Gene Conversion Gene M Thanks to Maryellen Ruvolo Orthologs: Derived by speciation Paralogs: Gene Duplications Orthologs: Derived by speciation Paralogs: Gene Duplications Slide 19 A N B N C M C N D N A M B M D M Challenges Making Trees: C M Has Been Converted from C N Thanks to Maryellen Ruvolo Orthologs: Derived by speciation Paralogs: Gene Duplications Orthologs: Derived by speciation Paralogs: Gene Duplications Gene N Gene M Slide 20 Consensus CG/LH Tree Thanks to Maryellen Ruvolo Slide 21 Gene conversion between 1 st & 2nd exons of LH, CG2 Genes LH Gen e CG2 Gene 168 nt 15nt No ConversionConversion ThankThank Thanks to Maryellen Ruvolo Slide 22 Challenges Making Trees: Varying Rates of Mutation Slide 23 Challenges Making Trees: Horizontal Gene Transfer Slide 24 Maximum Ultrametric Distance Trees Matrix D is ultrametric for tree T if: o If D is a symmetric n by n matrix of distances o T contains n leaves, one from each row or column o Each node of T labeled by one entry from D o Numbers from root to leaves strictly decrease o For any two leaves i, j, D(i,j) labels nearest common ancestor of i and j in tree Matrix DTree T Slide 25 Maximum Ultrametric Distance Trees A symmetric matrix D is ultrametric if and only if for every three leaves i, j, and k, there is a tie for the maximum distance between D(i,j), D(i,k) and D(j,k). U V IJ K Slide 26 Additive Distance Trees Matrix D Tree T A B C D 2 1 3 2 4 Slide 27 Distance Metrics Obey the Triangle Inequality D(i,j) D(i,k) +D(j,k) for all i, j, k (Max Score - Smith-Waterman Score) is a Metric if o If Gap-penalty 1+ Gap-size/(n-1) o Assuming match = 1 and mismatch = -1 Slide 28 Three Leaf Tree Observe D 1,2 D 1,3 D 2,3 Calculate L 1,A L 2,A L 3,A 1 2 3 A L1,A L2,A L3,A Slide 29 Three Leaf Tree Observe D 1,2 D 1,3 D 2,3 Calculate L 1,A L 2,A L 3,A D1,2=L1,A+L2,A D1,3=L1,A+L3,A D2,3=L2,A+L3,A 1 2 3 A L1,A L2,A L3,A Slide 30 Solution to Three Species Tree 1 2 3 A L1,A L2,A L3,A Slide 31 Four Species Tree Calculate L 1,A L 2,A L 3,B L 4,B, L A,B Observe D 1,2 D 1,3 D 1,4 D 2,3 D 2,4 D 3,4 1 2 3 4 L1,A L2,A A,B L3,A L4,A A B Slide 32 Four Species Topology Label species 1, 2, 3, and 4 so that: D(1,2) + D(3,4) D(1,3) + D(2,4) = D(1,4) + D(2,3) 1 2 3 A 4 B L1,A L2,A LA,B L3,A L4,A Slide 33 Solution for Four Species L1,A = 1/4*(D1,3 + D1,4 - D2,3 -D2,4) + 1/2*D1,2 L2,A = 1/4*(D2,3 + D2,4 - D1,3 - D1,4) + 1/2*D1,2 LB,3 = 1/4*(D1,3 + D2,3 - D1,4 - D2,4) + 1/2*D3,4 LB,4 = 1/4*(D1,4 + D2,4 - D1,3 - D2,3) + 1/2*D3,4 LA,B = 1/4*(D1,3 + D1,4 + D2,3 + D2,4) - 1/2*(D1,2 + D3,4) 1 2 3 A 4 B L1,A L2,A LA,B L3,A L4,A Slide 34 Four Species =>Three Topologies 1 2 3 A 4 B 1 3 2 A 4 B 1 2 A 3 4 Slide 35 Species, Distances, Branches & Topologies Slide 36 Slide 37 Number of Topologies for n Species Slide 38 UPGMA: Unweighted Pair Group Method with Arithmetic Average Where D1,(34) = (D1,3+D1,4)/2 and D2,(34) = (D2,3+D2,4)/2 1 2 34 1 2 3 4 Slide 39 UPGMA Dendrogram 123 4 1 2 3 4 Slide 40 UPGMA Clustering Slide 41 Neighbor Joining Method XY i j n 1 2 3... n-2 n-1 n n-2 n-3 n-4 n-5... 5 4 3 2 1 Slide 42 Nearest Neighbor Dendrogram A B C D E R W X Y Z Slide 43 New Hampshire Standard Tree Slide 44 SeqWeb GrowTree Program http://seqweb.stanford.edu:81/gcg-bin/analysis.cgi?program=evolution-prot http://seqweb.stanford.edu:81/gcg-bin/analysis.cgi?program=evolution-prot Slide 45 GrowTree Parameters Slide 46 GrowTree Distances http://seqweb.stanford.edu:81/gcg-bin/analysis.cgi?program=evolution-prot http://seqweb.stanford.edu:81/gcg-bin/analysis.cgi?program=evolution-prot Slide 47 GrowTree Phylogram (UPGMA) Slide 48 GrowTree Alignment Slide 49 GrowTree Neighbor Joining Tree http://seqweb.stanford.edu:81/gcg-bin/analysis.cgi?program=evolution-prot http://seqweb.stanford.edu:81/gcg-bin/analysis.cgi?program=evolution-prot Slide 50 GrowTree VegF Input http://seqweb.stanford.edu:81/gcg-bin/analysis.cgi?program=evolution-prot http://seqweb.stanford.edu:81/gcg-bin/analysis.cgi?program=evolution-prot Slide 51 GrowTree VegF Neighbor Joining Tree Slide 52 VegF Growth Factors http://en.wikipedia.org/wiki/Vascular_endothelial_growth_factor http://en.wikipedia.org/wiki/Vascular_endothelial_growth_factor Slide 53 GrowTree VegF UPGMA Tree http://seqweb.stanford.edu:81/gcg-bin/analysis.cgi?program=evolution-prot http://seqweb.stanford.edu:81/gcg-bin/analysis.cgi?program=evolution-prot Slide 54 GrowTree VegF Alignment http://seqweb.stanford.edu:81/gcg-bin/analysis.cgi?program=evolution-prot http://seqweb.stanford.edu:81/gcg-bin/analysis.cgi?program=evolution-prot