Lecture 5 : Phylogenies

Page 1: Lecture 5 : Phylogenies

Lecture 5 : Phylogenies

9/16/09

Page 2: Lecture 5 : Phylogenies

Translated BLAST (tblastn) = protein query vs. translated nucleotide database

Page 3: Lecture 5 : Phylogenies

Blasting GenBank - blastn

Z. bruijni = long-beaked echidna
T. aculeatus = echidna
T. rostratus = honey possum

AX8GS9DG01S

Page 4: Lecture 5 : Phylogenies

Blasting GenBank - discontiguous megablast - exactly the same as blastn

Z. bruijni = long-beaked echidna
T. aculeatus = echidna
T. rostratus = honey possum

AX9N23U7014

Page 5: Lecture 5 : Phylogenies

Blasting GenBank - megablast - same species but different order

Z. bruijni = long-beaked echidna
T. aculeatus = echidna
T. rostratus = honey possum

AX9TUM1G016

Page 6: Lecture 5 : Phylogenies

Blasting GenBank - tblastn

AX9DYYTE01N

T. aculeatus = echidna
S. brachyurus = quokka
S. crassicaudata = fat-tailed dunnart
M. fasciatus = numbat
I. obesulus = quenda

Page 7: Lecture 5 : Phylogenies

Species found by BLAST

I. obesulus = quenda = bandicoot
T. aculeatus = echidna
M. fasciatus = numbat
T. rostratus = honey possum
S. crassicaudata = fat-tailed dunnart
O. anatinus = platypus
S. brachyurus = quokka
Z. bruijni = long-beaked echidna

Page 8: Lecture 5 : Phylogenies

HomoloGene - can be reached from the NCBI home page

Scroll down - they are listed alphabetically

Page 9: Lecture 5 : Phylogenies

Questions

Phylogenies - what are they?

1. How do we build them?

2. What do they tell us?

Page 10: Lecture 5 : Phylogenies

Phylogeny

Evolutionary history of a group of organisms, especially as depicted in a family tree

Haeckel, 1879

Page 11: Lecture 5 : Phylogenies

Things trees might tell you:

How are organisms with a particular trait related?

Did the trait evolve multiple times or only once?

What is the evolutionary pathway?
  Of organisms
  Of genes

Page 12: Lecture 5 : Phylogenies

Molecules can be used to learn how organisms are related

Page 13: Lecture 5 : Phylogenies

To learn about vertebrate evolution: Compare >600 genes

1998

Page 14: Lecture 5 : Phylogenies

Used genes to measure time

1) Time since common ancestor with human

2) Time since two groups diverged

Page 15: Lecture 5 : Phylogenies

More recent version of vertebrate evolution which shows divergence times on the animal tree

Ponting 2008

Page 16: Lecture 5 : Phylogenies

[Figure: vertebrate tree. Tip labels: Orangutan, Human, Chimp, Rhesus monkey, Mouse, Rat, Dog, Cat, Horse, Cow, Opossum, Wallaby, Platypus, Anole, Chicken, Frog, Fish (Medaka, Fugu, Tetraodon, Zebrafish), Elephant shark, Lamprey]

Page 17: Lecture 5 : Phylogenies

Primates 25 MY

Mammals 100 MY

All vertebrates 550 MY

Tetrapods 420 MY

Fish 320 MY

Page 18: Lecture 5 : Phylogenies

Molecular clock

Molecules change at a steady rate

We can calibrate how fast they change using fossils

The molecules then become a timepiece to measure how recently different groups split off from each other
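The calibration idea can be sketched in a few lines of Python. This is only an illustration under a strict-clock assumption: the function names and the distance values are made up for the example, and the factor of 2 reflects that changes accumulate along both lineages after a split.

```python
def clock_rate(distance, split_time_my):
    """Substitutions per site per million years along one lineage.

    distance: sequence difference between two groups (hypothetical value)
    split_time_my: fossil-based age of their split, in millions of years
    """
    return distance / (2.0 * split_time_my)


def divergence_time(distance, rate):
    """Estimated time (MY) since two groups split, given a calibrated rate."""
    return distance / (2.0 * rate)


# Hypothetical calibration: two groups known from fossils to have split
# 100 MY ago differ at 20% of sites.
rate = clock_rate(0.20, 100.0)

# Another pair differs at 5% of sites; the clock dates their split.
estimate = divergence_time(0.05, rate)
print(f"rate = {rate:.4f} per site per MY, estimated split = {estimate:.1f} MY")
```

With these invented numbers the second pair comes out at roughly a quarter of the calibration age, simply because its distance is a quarter as large.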

Page 19: Lecture 5 : Phylogenies

Sequence conservation may be high

Gene might code for a protein which is highly constrained

Might have to interact with lots of other proteins

Selection might be quite strong

Page 20: Lecture 5 : Phylogenies

Sequence conservation may be low

Not much constraint

Few sites of interaction

Selection might be weak

Page 21: Lecture 5 : Phylogenies

Phylogeny steps

Align sequences so homologous amino acids (AA) can be compared

Determine the similarity between sequences

Use this to generate a relationship between sequences

Page 22: Lecture 5 : Phylogenies

ClustalW2 to align sequences

Page 23: Lecture 5 : Phylogenies

Put sequences in FASTA file

>TetraodonG1
MVWDGGIEPNGTEGKNFYIPMSNRTGIVRSPFEYPQYYLVDPIMFKMLALYMFFLICTGTPINGLTLLVTAQNKKLRQPLNYILVNLAVAGLIMCAFGFTITITSAINGYFILGATACAVEGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFTGTHAAVGVLFTWIMAFACAGPPLFGWSRYLPEGMQCSCGPDYYTLAPGYNNESYVIYMFVVHFFVPVFLIFFTYGSLVLTVRAAAQQQESESTQKAQREVTRMCILMVLGFLVAWTPYATFSGWIFMNKGAAFHPLTAALCAFFAKSSALYNPVIYVLMNKQFRNCMLSTFGMGGAVDDETSVSASKTEVSSVS

>ZebrafishG1
MNGTEGSNFYIPMSNRTGLVRSPYDYTQYYLAEPWKFKALAFYMFLLIIFGFPINVLTLVVTAQHKKLRQPLNYILVNLAFAGTIMVIFGFTVSFYCSLVGYMALGPLGCVMEGFFATLGGQVALWSLVVLAIERYIVVCKPMGSFKFSANHAMAGIAFTWFMACSCAVPPLFGWSRYLPEGMQTSCGPDYYTLNPEYNNESYVMYMFSCHFCIPVTTIFFTYGSLVCTVKAAAAQQQESESTQKAEREVTRMVILMVLGFLFAWVPYASFAAWIFFNRGAAFSAQAMAVPAFFSKTSAVFNPIIYVLLNKQFRSCMLNTLFCGKSPLGDDESSSVSTSKTEVSSVSPA

>CichlidG1
MAWEGGIEPNGTEGKNFYIPMSNRTGIVRSPFEYTQYYLADPIFFKLLAFYMFFLICTGTPINSLTLFVTAQNKKLRQPLNYILVNLAVAGLIMCCFGFTITITSAFNGYFILGSTFCAIEGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFSGAHAGAGVLFTWIMAMACAAPPLFGWSRYIPEGMQCSCGPDYYTLAPGFNNESYVIYMFVVHFFVPVFIIFFTYGSLVMTVKAAAAQQQDSASTQKAEKEVTRMCVLMVMGFLIAWTPYASFAGWIFMNKGASFSALTAAIPAFFAKSSALYNPVIYVLMNKQFRNCMLSTIGMGGMVEDETSVSTSKTEVSSVS
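For readers who want to check a FASTA file before submitting it, here is a minimal Python sketch of a reader. It assumes the usual layout: a header line starting with ">" followed by one or more sequence lines. The function name `read_fasta` is made up for this example, and the two records are truncated versions of the sequences above.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()       # header: everything after '>'
            records[name] = []
        elif name is not None:
            records[name].append(line)    # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


# Truncated records, for illustration only.
example = """>TetraodonG1
MVWDGGIEPNGTEGKNFYIPMSNRTG
>ZebrafishG1
MNGTEGSNFYIPMSNRTG
"""

seqs = read_fasta(example)
for name, seq in seqs.items():
    print(name, len(seq), "residues")
```

Keeping the header and the sequence on separate lines matters: if they run together, the alignment program will treat the name as part of the sequence.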

Page 24: Lecture 5 : Phylogenies

Aligned sequences .aln ; Jalview gives colored version

Funky tree .dnd (need special program to draw)

Scroll down this page for tree (use Phylogram)

Page 25: Lecture 5 : Phylogenies

CLUSTAL W (1.83) multiple sequence alignment

TetraodonG1     MVWDGGIEPNGTEGKNFYIPMSNRTGIVRSPFEYPQYYLVDPIMFKMLALYMFFLICTGT 60
CichlidG1       MAWEGGIEPNGTEGKNFYIPMSNRTGIVRSPFEYTQYYLADPIFFKLLAFYMFFLICTGT 60
ZebrafishG1     --------MNGTEGSNFYIPMSNRTGLVRSPYDYTQYYLAEPWKFKALAFYMFLLIIFGF 52
                         *****.***********:****::*.****.:*  ** **:***:**  *

TetraodonG1     PINGLTLLVTAQNKKLRQPLNYILVNLAVAGLIMCAFGFTITITSAINGYFILGATACAV 120
CichlidG1       PINSLTLFVTAQNKKLRQPLNYILVNLAVAGLIMCCFGFTITITSAFNGYFILGSTFCAI 120
ZebrafishG1     PINVLTLVVTAQHKKLRQPLNYILVNLAFAGTIMVIFGFTVSFYCSLVGYMALGPLGCVM 112
                *** ***.****:***************.** **  ****::: .:: **: **.  *.:

TetraodonG1     EGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFTGTHAAVGVLFTWIMAFACAGPPL 180
CichlidG1       EGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFSGAHAGAGVLFTWIMAMACAAPPL 180
ZebrafishG1     EGFFATLGGQVALWSLVVLAIERYIVVCKPMGSFKFSANHAMAGIAFTWFMACSCAVPPL 172
                ***:*****:**************************:. ** .*: ***:** :** ***

TetraodonG1     FGWSRYLPEGMQCSCGPDYYTLAPGYNNESYVIYMFVVHFFVPVFLIFFTYGSLVLTVR- 239
CichlidG1       FGWSRYIPEGMQCSCGPDYYTLAPGFNNESYVIYMFVVHFFVPVFIIFFTYGSLVMTVKA 240
ZebrafishG1     FGWSRYLPEGMQTSCGPDYYTLNPEYNNESYVMYMFSCHFCIPVTTIFFTYGSLVCTVKA 232
                ******:***** ********* * :******:***  ** :**  ********* **:

TetraodonG1     AAAQQQESESTQKAQREVTRMCILMVLGFLVAWTPYATFSGWIFMNKGAAFHPLTAALCA 299
CichlidG1       AAAQQQDSASTQKAEKEVTRMCVLMVMGFLIAWTPYASFAGWIFMNKGASFSALTAAIPA 300
ZebrafishG1     AAAQQQESESTQKAEREVTRMVILMVLGFLFAWVPYASFAAWIFFNRGAAFSAQAMAVPA 292
                ******:* *****::***** :***:***.**.***:*:.***:*:**:* . : *: *

TetraodonG1     FFAKSSALYNPVIYVLMNKQFRNCMLSTFGMGG--AVDDETS-VSASKTEVSSVS-- 351
CichlidG1       FFAKSSALYNPVIYVLMNKQFRNCMLSTIGMGG--MVEDETS-VSTSKTEVSSVS-- 352
ZebrafishG1     FFSKTSAVFNPIIYVLLNKQFRSCMLNTLFCGKSPLGDDESSSVSTSKTEVSSVSPA 349
                **:*:**::**:****:*****.***.*:  *     :**:* **:*********

Page 26: Lecture 5 : Phylogenies

Alignment is key

Any other analysis that you do is only as good as your alignment

If your alignment is bad subsequent analyses will be bad

Junk in = Junk out

Page 27: Lecture 5 : Phylogenies

Alignments

Tell you about sequence conservation
  How much is there?
  Where is it?

Page 28: Lecture 5 : Phylogenies

Calculate sequence similarities

Zebrafish   M--------NGTEGSNFYIPMSNR
Trout       M------Q-NGTEGSNFYIPMSNR
Medaka      M------E-NGTEGKNFYIPMNNR
Cod         M----RMEANGTEGKNFYIPMSNR
Halibut     MVWDGGIEPNGTEGKNFYIPMSNR
Tetraodon   MVWDGGIEPNGTEGKNFYIPMSNR
Goldfish    M--------NGTEGNNFYVPLSNR
Killifish   M---GYG-PNGTEGNNFYIPMSNK
            *        *****.***:*:.*:

Pairwise comparisons
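The "how much" and "where" of conservation can be read straight off the alignment columns. The Python sketch below marks only fully identical columns with "*". It does not attempt the ":" and "." marks that Clustal uses for similar residues, since those depend on residue groupings that are not part of this example. The function name is an assumption made for illustration.

```python
alignment = {
    "Zebrafish": "M--------NGTEGSNFYIPMSNR",
    "Trout":     "M------Q-NGTEGSNFYIPMSNR",
    "Medaka":    "M------E-NGTEGKNFYIPMNNR",
    "Cod":       "M----RMEANGTEGKNFYIPMSNR",
    "Halibut":   "MVWDGGIEPNGTEGKNFYIPMSNR",
    "Tetraodon": "MVWDGGIEPNGTEGKNFYIPMSNR",
    "Goldfish":  "M--------NGTEGNNFYVPLSNR",
    "Killifish": "M---GYG-PNGTEGNNFYIPMSNK",
}


def conserved_columns(seqs):
    """Return '*' for columns where every sequence has the same residue."""
    marks = []
    for column in zip(*seqs):            # walk the alignment column by column
        identical = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if identical else " ")
    return "".join(marks)


marks = conserved_columns(list(alignment.values()))
print(marks)
print(marks.count("*"), "of", len(marks), "columns are identical in all species")
```

In this fragment the conserved columns cluster in the NGTEG and NFY stretches, while the start of the protein, where several species have gaps, shows almost no conservation.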

Page 29: Lecture 5 : Phylogenies

Use tree to show sequence relationships

Short branches mean sequences are more similar

Long branches mean there are more differences

Page 30: Lecture 5 : Phylogenies

Q3. How do we build phylogenies?

Assume the relationships involve bifurcating branches

[Figure: the sequences ATC, ATG, ACG, CCG and CCC connected by bifurcating branches; the set of labels appears twice on the slide]

Page 31: Lecture 5 : Phylogenies

Methods to determine similarities

Parsimony

Distance

Maximum likelihood

Bayesian

Page 32: Lecture 5 : Phylogenies

Parsimony

The least complex explanation is the most likely to be correct
  Occam's razor

The preferred phylogenetic tree is the one that requires the fewest changes
  Count up # changes for all possible trees
  Find the shortest one
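Counting changes on a candidate tree can be sketched in Python with Fitch's algorithm, a standard way to get the minimum number of changes for a fixed tree. This is only an illustration: trees are written as nested tuples of taxon names, the taxon labels A to D are made up, and the three trees listed are rooted drawings of the three possible groupings of four taxa.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one column."""
    if isinstance(tree, str):                       # leaf: a taxon name
        return {states[tree]}, 0
    left_set, left_changes = fitch_site(tree[0], states)
    right_set, right_changes = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:                                      # children agree: no new change
        return common, left_changes + right_changes
    # children disagree: one change is needed on this part of the tree
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, seqs):
    """Total number of changes the tree requires, summed over all columns."""
    length = len(next(iter(seqs.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in seqs.items()}
        total += fitch_site(tree, column)[1]
    return total


seqs = {"A": "ATCG", "B": "ATCG", "C": "ACCG", "D": "ACCG"}
trees = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}
scores = {label: parsimony_score(tree, seqs) for label, tree in trees.items()}
best = min(scores, key=scores.get)
print(scores, "-> shortest tree:", best)
```

Grouping the two ATCG sequences together needs a single T/C change, while either alternative needs two, so the first tree is the most parsimonious.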

Page 33: Lecture 5 : Phylogenies

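Trees based on parsimony

[Figure: two candidate trees for the sequences ATCG, ATCG, ACCG, ACCG, with C/T changes marked on the branches (three change marks in total across both trees); one tree is labeled "Most parsimonious"]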

Page 34: Lecture 5 : Phylogenies

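Trees based on parsimony

[Figure: the same two candidate trees showing only the variable site (tips T, T, C, C and T, C, T, C), with C/T changes marked on the branches; one tree is labeled "Most parsimonious"]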

Page 35: Lecture 5 : Phylogenies

Can't always distinguish tree topologies

[Figure: two trees, each with tips T, T, C, C and one C/T change marked; labeled "Equally parsimonious"]

Page 36: Lecture 5 : Phylogenies

Other limitations

All changes are weighted the same
  C-T same as C-A
  Same no matter how long it takes for the change to occur

Page 37: Lecture 5 : Phylogenies

Distance methods

Calculate a numerical value for sequence differences
  Do for all pairwise combinations

Build tree by joining most similar sequences and then more divergent

Page 38: Lecture 5 : Phylogenies

Distance methods

Fast

Pretty robust

Only deals with data in pairs

Page 39: Lecture 5 : Phylogenies

Pairwise distances

Taxa1   AACGGTCATGGCGTTGCATT
Taxa2   AACGGTCAGGGCGTTGCATT
Taxa3   AACGGTCACGCCGCTGCATT

        1     2     3
1       0     .05   .15
2       .05   0     .15
3       .15   .15   0
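The matrix above can be reproduced with a short Python sketch that counts mismatched sites for every pair of taxa. The function name `p_distance` is an assumption made for this example; it returns the fraction of sites that differ.

```python
def p_distance(a, b):
    """Fraction of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences / len(a)


taxa = {
    "Taxa1": "AACGGTCATGGCGTTGCATT",
    "Taxa2": "AACGGTCAGGGCGTTGCATT",
    "Taxa3": "AACGGTCACGCCGCTGCATT",
}

names = list(taxa)
matrix = [[p_distance(taxa[r], taxa[c]) for c in names] for r in names]

for name, row in zip(names, matrix):
    print(name, "  ".join(f"{d:.2f}" for d in row))
```

Taxa1 and Taxa2 differ at 1 of 20 sites (0.05), and Taxa3 differs from each of them at 3 of 20 sites (0.15), which matches the table.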

Page 40: Lecture 5 : Phylogenies

Distance, d

p is fractional similarity of sequence

Simplest form of distance: d = 1 - p

AACGGTCATGGCGTTGCATT
AACGGTCACGGCGTTGCATT

p = 19/20
d = 0.05

Page 41: Lecture 5 : Phylogenies

Tree building

Neighbor joining
  Join most similar pair of sequences
  Add more divergent after

        1     2     3
1       0     .05   .15
2       .05   0     .15
3       .15   .15   0

[Figure: resulting tree with tips 1, 2 and 3]
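The "join the closest pair, then add the more divergent ones" idea can be sketched in Python as simple average-linkage clustering (UPGMA). This is deliberately simpler than true neighbor joining, which also corrects for lineages that evolve at different rates; the function name and the nested-tuple output format are assumptions made for this illustration.

```python
def upgma(names, dist):
    """Repeatedly join the two closest clusters; return the tree as nested tuples."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}   # id -> (tree, size)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), _ = min(d.items(), key=lambda item: item[1])    # closest pair
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        new_d = {}
        for c in clusters:
            d_ac = d[tuple(sorted((a, c)))]
            d_bc = d[tuple(sorted((b, c)))]
            # distance to the merged cluster = size-weighted average
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


names = ["1", "2", "3"]
dist = [[0.00, 0.05, 0.15],
        [0.05, 0.00, 0.15],
        [0.15, 0.15, 0.00]]
tree = upgma(names, dist)
print(tree)
```

For the matrix above, sequences 1 and 2 are joined first (distance 0.05) and sequence 3 is attached afterwards, giving the grouping (3, (1, 2)).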

Page 42: Lecture 5 : Phylogenies

How different can 2 sequences get?

At infinite time, two sequences match only by random chance
  Probability a base is the same = 1/4
  DNA only has 4 bases

Certain sites will start to change multiple times
  Need to account for these multiple hits

Page 43: Lecture 5 : Phylogenies

Random sequences

Write down 20 bases of sequence

Page 44: Lecture 5 : Phylogenies

Compare your sequence to this one

AGTCCGATTACGGCTAGCAG

What fraction of sites are the same in the two sequences?
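The classroom exercise can be repeated many times in Python. The sketch below assumes each base is drawn independently with equal probability for A, C, G and T; the helper names and the fixed random seed are choices made for this example.

```python
import random


def fraction_identical(a, b):
    """Fraction of positions at which two equal-length sequences match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def random_dna(length, rng):
    """A random sequence with equal base frequencies (an assumption)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


rng = random.Random(0)                   # fixed seed so runs are repeatable
reference = "AGTCCGATTACGGCTAGCAG"

trials = [fraction_identical(random_dna(len(reference), rng), reference)
          for _ in range(10000)]
mean_identity = sum(trials) / len(trials)
print(f"mean identity over {len(trials)} random sequences: {mean_identity:.3f}")
```

Any single 20-base comparison can land well above or below one quarter, but the average over many random sequences settles close to 0.25.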

Page 45: Lecture 5 : Phylogenies

Sequence similarity decays to 25% over long times

[Figure: sequence similarity (y-axis, 0 to 1.2) plotted against time (x-axis, 0 to 3.5)]

Page 46: Lecture 5 : Phylogenies

Sequence difference maxes at 0.75

[Figure: sequence difference (y-axis, 0 to 1) plotted against time (x-axis, 0 to 3.5)]

Page 47: Lecture 5 : Phylogenies

Sequence change accumulates linearly with time at beginning

[Figure: sequence difference (y-axis, 0 to 1) plotted against time (x-axis, 0 to 3.5)]
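One common way to write curves of this shape is the Jukes-Cantor model. That choice is an assumption here, since the slides do not name the model behind the plots. Using the lecture's symbols, p is the fractional similarity and d = 1 - p is the observed difference; K, introduced only for this sketch, is the true number of substitutions per site separating the two sequences, which grows in proportion to time under a steady clock.

```latex
p(K) = \frac{1}{4} + \frac{3}{4}\, e^{-4K/3},
\qquad
d(K) = 1 - p(K) = \frac{3}{4}\left(1 - e^{-4K/3}\right)

% Long times: similarity falls to chance level, difference saturates
\lim_{K \to \infty} p(K) = \frac{1}{4},
\qquad
\lim_{K \to \infty} d(K) = \frac{3}{4}

% Short times: expand the exponential to second order
d(K) \approx K - \frac{2}{3}K^{2} \approx K \quad \text{for small } K
```

The first line gives the decay toward 25% similarity, the second the ceiling at 0.75, and the third the near-linear behaviour at the beginning.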

Page 48: Lecture 5 : Phylogenies

DNA models

Use different DNA models to account for how sequences evolve with time
  Allows you to apply different molecular clocks
  Relate sequence change to time
  Clock is not linear except for small changes and short times

Models same as used in maximum likelihood methods
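As one concrete example of such a model, the Jukes-Cantor correction converts an observed fraction of differing sites into an estimated number of substitutions per site, allowing for sites that changed more than once. The lecture does not say which model it uses, so treat this Python sketch as an illustration of the simplest option; the function name is made up.

```python
import math


def jukes_cantor_distance(observed_difference):
    """Estimated substitutions per site from the observed fraction of differences.

    Assumes all four bases are equally frequent and all changes equally likely.
    """
    if not 0 <= observed_difference < 0.75:
        raise ValueError("observed difference must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * observed_difference / 3)


for observed in (0.05, 0.15, 0.50, 0.70):
    corrected = jukes_cantor_distance(observed)
    print(f"observed {observed:.2f} -> corrected {corrected:.3f}")
```

For small differences the correction barely matters, but it grows quickly as the observed difference approaches the 0.75 ceiling, which is the non-linear clock described above.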

Page 49: Lecture 5 : Phylogenies

How good is your tree?

Bootstrap approach
  Run the same method multiple times
  Subsample data each time
    Use 50% of data
  See how reproducible the trees are
  Count how many times a particular grouping occurs
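A minimal Python sketch of the resampling loop follows. It uses the standard bootstrap, which redraws alignment columns with replacement; the "use 50% of data" variant on the slide would instead draw half of the columns without replacement. To keep the example short, the "tree" is reduced to one question: which pair of taxa is closest. All function names are made up for this illustration, and ties are broken alphabetically.

```python
import random


def p_distance(a, b):
    """Fraction of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair(seqs):
    """The pair of taxa with the smallest distance (ties: alphabetical order)."""
    names = sorted(seqs)
    pairs = [(p_distance(seqs[a], seqs[b]), a, b)
             for i, a in enumerate(names) for b in names[i + 1:]]
    _, a, b = min(pairs)
    return (a, b)


def bootstrap_support(seqs, replicates=1000, seed=0):
    """Percent of resampled alignments in which each pair is the closest."""
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    counts = {}
    for _ in range(replicates):
        columns = [rng.randrange(length) for _ in range(length)]  # with replacement
        resampled = {name: "".join(seq[i] for i in columns)
                     for name, seq in seqs.items()}
        pair = closest_pair(resampled)
        counts[pair] = counts.get(pair, 0) + 1
    return {pair: 100 * n / replicates for pair, n in counts.items()}


taxa = {
    "Taxa1": "AACGGTCATGGCGTTGCATT",
    "Taxa2": "AACGGTCAGGGCGTTGCATT",
    "Taxa3": "AACGGTCACGCCGCTGCATT",
}
support = bootstrap_support(taxa, replicates=200)
print(support)
```

In real analyses the full tree-building method is rerun on every resampled alignment, and the percentage written at each node is the share of replicates that recover that grouping.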

Page 50: Lecture 5 : Phylogenies

Distance tree for rod and cone transducin alpha subunit

Branch lengths are proportional to sequence differences

Page 51: Lecture 5 : Phylogenies

Bootstrap values are given for each node; they tell how reproducible that grouping is

[Figure: tree with bootstrap values at the nodes: 58, 100, 100, 95, 98, 72, 69, 72, 98, 86, 98, 68, 97]