
Methods of molecular phylogeny (bio.biomedicine.gu.se/courses/ht09/genomics/molphy.pdf)



Methods of molecular phylogeny

Peter Norberg (Peter.norberg@gu.se)

Content

• Introduction to Evolution and taxonomy
• Phylogenetic analysis
• Algorithmics
• Applied phylogenetics
• Computer Software
• Practical session


Evolution

• Charles Darwin
• "Tree of life"
• Phylogenetic tree
• Root = ancestor of all species

Rooted or unrooted trees?

• Trees show evolutionary relationships

• The root shows the direction of evolution


Different representations

[Figure: the same tree for taxa A, B, C and D drawn in different styles]

Trees can be based on:

• Outer appearances (e.g. wings, shape of bills)
• Functionality
• Complexity
• A combination of these
• …
• DNA, RNA, amino acid (AA) sequences, gene order, …


Phylogenetic trees based on DNA

[Figure: a tree built from the DNA sequences AATTGGCC, AATAGGCC, AATAGGAC, AATTGGCG, AGTTGGCG, TATTGGCG and AATAGGCA]



Genomic region

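• Same genomic region for all taxa!
• Not too similar (too few differences to resolve the tree)
• Not too diverged (sequences become hard to align)
• Insertions/deletions (must be handled in the alignment)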

Sequence alignment

(1) AATGGCAACCGCATTCAGGATTTAA
(4) AATGGTAACCGCATTCAGGAATTA
(2) AATGGTAACCGCAAGGATTTAA
(3) ATGGTAACCGCATTGAGGATTTAA
(5) TGGTAACCGCATTCAGGAATTAA

Correct:
(1) AATGGCAACCGCATTCAGGATTTAA
(2) AATGGTAACCGCAA---GGATTTAA
(3) -ATGGTAACCGCATTGAGGATTTAA
(4) AATGGTAACCGCATTCAGGAATTA-
(5) --TGGTAACCGCATTCAGGATTTAA

Wrong:
(1) AATGGCAACCGCATTCAGGATTTAA
(2) AATGGTAACCGCAAGGATTTAA
(3) ATGGTAACCGCATTGAGGATTTAA
(4) AATGGTAACCGCATTCAGGAATTA
(5) TGGTAACCGCATTCAGGATTTAA


Sequence alignment, our example

AATTGGCC
AATAGGCC
AATAGGAC
AATTGGCG
AGTTGGCG
TATTGGCG
AATAGGCA

Phylogenetic principles

• Similar DNA sequences = closely related

• Inherited mutations.

• Simplest “route”!

• Homoplasy unlikely (not always true).


Homology vs. homoplasy

• Homology = similarity due to a common ancestor

• Homoplasy = similarity that arose independently (e.g. through convergent evolution), not from a common ancestor

Algorithms for constructing phylogenetic trees

• What is an algorithm?

• Several different phylogenetic algorithms exist.

• How do they work?


Algorithms for constructing phylogenetic trees

• Distance matrices
  – Neighbour Joining
  – UPGMA

• Maximum Parsimony

• Maximum Likelihood

• Bayesian inference

Distance matrices

• Based on the genetic distance

• Genetic distance based on nucleotide substitutions

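• Typically the number of differences divided by the total number of nucleotides compared

(1) AATTCCGG
(2) AATACCGG
(3) AATTAATG

Number of differences:
     1      2      3
1    0
2    1      0
3    3      4      0

Proportion of differences:
     1      2      3
1    0
2    0.125  0
3    0.375  0.5    0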
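The two matrices above can be reproduced with a short Python sketch. It assumes the sequences are already aligned to the same length and contain no gaps; the function names are our own.

```python
def count_differences(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def p_distance_matrix(seqs: list[str]) -> list[list[float]]:
    """Pairwise proportion of differing sites (differences / total sites)."""
    n = len(seqs)
    return [[count_differences(seqs[i], seqs[j]) / len(seqs[i]) for j in range(n)]
            for i in range(n)]


seqs = ["AATTCCGG", "AATACCGG", "AATTAATG"]
for row in p_distance_matrix(seqs):
    print(row)
# [0.0, 0.125, 0.375]
# [0.125, 0.0, 0.5]
# [0.375, 0.5, 0.0]
```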


Neighbour Joining

• Cluster in pairs

• Closest pair first (distances are corrected for how far each sequence is from all the others)

• => Similar sequences end up close together in the tree

• Fast algorithm!

[Figure: neighbour-joining tree of sequences 1, 2 and 3 built from the distance matrix]
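A minimal Python sketch of neighbour joining, following the usual Saitou and Nei formulation: at each step it joins the pair that minimises Q(i, j) = (n - 2) d(i, j) - Σk d(i, k) - Σk d(j, k), then replaces the pair with a new internal node. Because every pair ties for n = 3, the usage example uses a hypothetical five-taxon matrix instead of the three-sequence matrix above. The names u0, u1, ... for internal nodes are our own, and ties are broken by taking the first pair found.

```python
def neighbour_joining(names, dist):
    """Neighbour joining on a symmetric distance matrix.

    Returns (joins, final_edge). Each join is (node_i, node_j, branch_i, branch_j);
    new internal nodes are labelled "u0", "u1", ... in order of creation.
    """
    names = list(names)
    d = [list(row) for row in dist]
    joins = []
    while len(names) > 2:
        n = len(names)
        r = [sum(row) for row in d]  # total distance from each node to all others
        # Pick the pair minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the two joined nodes to the new internal node
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joins.append((names[i], names[j], li, lj))
        # Distances from the new internal node to every remaining node
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        names = [names[k] for k in keep] + [f"u{len(joins) - 1}"]
    # The last two nodes are connected by a single branch
    return joins, (names[0], names[1], d[0][1])


names = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
joins, final = neighbour_joining(names, dist)
print(joins[0])  # ('a', 'b', 2.0, 3.0)
```

The pair a, b is joined first even though d and e are closer (distance 3), because a and b are close to each other relative to their distances to everything else.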

Maximum Parsimony

• Utilizes so-called informative sites (at least two different nucleotides, each present in at least two sequences).

• Simplest path (fewest mutations)

• Build all possible trees.

• Choose the tree that requires the fewest mutations

• Relatively fast


Maximum Parsimony, example

(1) AATTCC
(2) AAGTCC
(3) AATTCC
(4) AAGTCT

[Figure: alternative trees for sequences 1 to 4, with each required mutation marked "a"]
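Counting the mutations each tree needs can be automated with Fitch's algorithm, a standard way to score a fixed tree under parsimony. The sketch below assumes binary trees written as nested Python tuples (where the root is placed does not change the parsimony score) and scores the three possible unrooted trees for the four example sequences. The function names are our own.

```python
def fitch(tree, column):
    """Fitch small-parsimony pass for one alignment column.

    tree: a leaf name (str) or a pair (left, right) of subtrees.
    column: dict mapping leaf name -> nucleotide at this site.
    Returns (possible ancestral states, number of mutations needed).
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, column)
    right_states, right_cost = fitch(right, column)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # No shared state: take the union and count one mutation
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Total number of mutations the tree needs, summed over all columns."""
    length = len(next(iter(seqs.values())))
    return sum(fitch(tree, {name: s[i] for name, s in seqs.items()})[1]
               for i in range(length))


def is_informative(column):
    """A site is informative if >= 2 states each occur in >= 2 sequences."""
    return sum(1 for base in set(column) if column.count(base) >= 2) >= 2


seqs = {"1": "AATTCC", "2": "AAGTCC", "3": "AATTCC", "4": "AAGTCT"}
trees = [(("1", "2"), ("3", "4")),
         (("1", "3"), ("2", "4")),
         (("1", "4"), ("2", "3"))]
for t in trees:
    print(t, parsimony_score(t, seqs))  # scores 3, 2 and 3
print([i for i in range(6) if is_informative([s[i] for s in seqs.values()])])  # [2]
```

Only the third column (index 2) is informative; the last column adds one mutation to every tree, so it cannot favour any of them.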

Maximum Likelihood and Bayesian inference

• Statistical method including an evolutionary model

• Combine the likelihoods of all columns (multiply them, i.e. sum the log-likelihoods)

• Calculate the likelihood for all possible trees

• Good but slow!

• Bayesian inference is faster
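As a minimal illustration of the likelihood idea (two sequences only, no tree search), the sketch below computes the log-likelihood of the distance between two aligned sequences under the Jukes and Cantor (JC69) substitution model by summing the log-likelihoods of the columns, then finds the best distance with a simple grid search. For JC69 the maximum-likelihood distance also has the closed form d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites, so the two results should agree. The choice of model, the grid and the function names are assumptions for this example.

```python
import math


def jc69_log_likelihood(a: str, b: str, d: float) -> float:
    """Log-likelihood of aligned sequences a and b separated by distance d
    (expected substitutions per site) under the JC69 model."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e  # probability a site keeps the same base
    p_diff = 0.25 - 0.25 * e  # probability of one specific different base
    total = 0.0
    for x, y in zip(a, b):
        # Each column contributes log(pi_x * P(x -> y)), with pi_x = 1/4
        total += math.log(0.25 * (p_same if x == y else p_diff))
    return total


def ml_distance(a: str, b: str, step: float = 0.001, d_max: float = 3.0) -> float:
    """Distance maximising the JC69 likelihood, found by grid search (d > 0)."""
    grid = [step * k for k in range(1, int(d_max / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(a, b, d))


def jc69_distance(a: str, b: str) -> float:
    """Closed-form JC69 distance (valid while p < 0.75)."""
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


a, b = "AATTCCGG", "AATTAATG"
print(ml_distance(a, b), jc69_distance(a, b))  # both about 0.52
```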


To test all possible trees

• Is it possible?

• => It takes far too long!

• With 20 taxa there are ~10^22 different possible rooted trees (10,000,000,000,000,000,000,000)

• What to do?

• => Use sophisticated (heuristic) algorithms to limit the search space

• These usually produce good results, but not necessarily the best tree
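The growth in the number of trees follows standard counting formulas: for n taxa there are (2n - 3)!! rooted and (2n - 5)!! unrooted binary trees, where k!! = k × (k - 2) × ... × 1. For 20 taxa the rooted count is about 8.2 × 10^21, which is the "~10^22" figure. A small Python sketch (function names are our own):

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_trees(n: int) -> int:
    """Number of rooted binary trees with n labelled leaves (n >= 2)."""
    return double_factorial(2 * n - 3)


def unrooted_trees(n: int) -> int:
    """Number of unrooted binary trees with n labelled leaves (n >= 3)."""
    return double_factorial(2 * n - 5)


print(unrooted_trees(4))  # 3
print(rooted_trees(20))   # 8200794532637891559375
```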

To root an unrooted tree

• Include an “outgroup”

• Outgroup = more distantly related (but not too distantly)

• Place the root where the outgroup connects to the tree


Rooting a tree

[Figure: an unrooted tree of taxa A to G that includes an outgroup, and the same tree rooted where the outgroup connects]
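Outgroup rooting can be sketched as follows, assuming the unrooted tree is stored as an adjacency list (each node mapped to its neighbours) and the outgroup is a single leaf. The root is placed on the branch where the outgroup joins the tree, and the result is returned as nested tuples. The example tree and the internal node names "x" and "y" are hypothetical.

```python
def root_with_outgroup(adj: dict[str, list[str]], outgroup: str):
    """Root an unrooted tree on the branch leading to the outgroup leaf.

    adj maps every node to the list of its neighbours (undirected tree).
    Returns a nested tuple: (outgroup, rest_of_tree).
    """
    if len(adj[outgroup]) != 1:
        raise ValueError("the outgroup must be a leaf")
    attachment = adj[outgroup][0]  # where the outgroup connects to the tree

    def build(node: str, parent: str):
        # Every neighbour except the one we came from becomes a child
        children = [c for c in adj[node] if c != parent]
        if not children:
            return node
        return tuple(build(c, node) for c in children)

    return (outgroup, build(attachment, outgroup))


# Hypothetical unrooted tree: O is the outgroup; x and y are internal nodes
adj = {
    "O": ["x"],
    "x": ["O", "y", "C"],
    "y": ["x", "A", "B"],
    "A": ["y"], "B": ["y"], "C": ["x"],
}
print(root_with_outgroup(adj, "O"))  # ('O', (('A', 'B'), 'C'))
```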

Significance

• Is the tree reliable?

• Is it the only probable tree?

• Bootstrap, jackknife, etc.


Bootstrap

• Construct many new sequence sets (e.g. 1000)

• Each new set is generated by randomly picking columns from the original alignment, with replacement, until it is as long as the original

• Apply the phylogenetic algorithm on all sets.

• Make one consensus tree from all trees
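The resampling step can be sketched in Python: columns are drawn at random with replacement, so some columns appear several times and others not at all, and the same columns are taken from every sequence. Building a tree from each replicate is left to a program such as dnapars; the function name and the fixed seed are our own choices for illustration.

```python
import random


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One bootstrap data set: sample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]  # same length as original
    # Use the same column indices for every sequence so columns stay intact
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


alignment = {
    "A": "AACTTAACCACGCTATCGATGCAATTATATA",
    "B": "AATTTGACTGCGGTACCGATCCAATTATATA",
    "C": "AATTTGACTGGGCTACCGATCCAATTATATA",
    "D": "AACTTAACCGCGCTACTGATCGAATTATATA",
}
rng = random.Random(1)  # fixed seed so the example is reproducible
replicates = [bootstrap_replicate(alignment, rng) for _ in range(1000)]
print(replicates[0]["A"])
```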

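Bootstrapping

Original alignment:
A: AACTTAACCACGCTATCGATGCAATTATATA
B: AATTTGACTGCGGTACCGATCCAATTATATA
C: AATTTGACTGGGCTACCGATCCAATTATATA
D: AACTTAACCGCGCTACTGATCGAATTATATA

Example of randomly picked columns:
A: CACC
B: TGCT
C: TGCT
D: CAGC

[Figure: bootstrap support for the three possible trees of A, B, C and D: A with D and B with C, 96; A with B and C with D, 3; A with C and B with D, 1]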
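Support values like these come from counting how often each group (clade) occurs among the replicate trees; a majority-rule consensus keeps the groups found in more than half of them. A minimal sketch, assuming each replicate tree is a nested tuple of leaf names with all trees rooted the same way; the 100 replicate trees in the usage example are hypothetical and only mirror the numbers in the figure.

```python
from collections import Counter


def clades(tree) -> set[frozenset]:
    """All groups of leaves below internal nodes, excluding the whole tree."""
    found = set()

    def leaves(node) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    everything = leaves(tree)
    found.discard(everything)  # the full leaf set is present in every tree
    return found


def clade_support(trees) -> dict[frozenset, float]:
    """Percentage of trees containing each clade."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))
    return {clade: 100.0 * k / len(trees) for clade, k in counts.items()}


def majority_rule(trees) -> set[frozenset]:
    """Clades present in more than half of the trees."""
    return {clade for clade, s in clade_support(trees).items() if s > 50.0}


trees = ([(("A", "D"), ("B", "C"))] * 96
         + [(("A", "B"), ("C", "D"))] * 3
         + [(("A", "C"), ("B", "D"))] * 1)
print(clade_support(trees)[frozenset({"A", "D"})])  # 96.0
print(sorted(sorted(c) for c in majority_rule(trees)))  # [['A', 'D'], ['B', 'C']]
```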


Pitfalls?

• Homoplasy (convergent evolution)
  - Selection pressure
  - Hypervariable regions
  - Random events

• Gene duplication

• Recombination
  - Different regions have different ancestries

Recombination

[Figure: recombination between sequences A and B, giving rise to recombinants]


Detection of recombinants

[Figure: phylogenetic trees of taxa A to I together with a query sequence X, used to detect recombinants]
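One simple way to look for recombinants is to slide a window along the alignment and record which reference sequence is most similar to the query in each window; a switch in the closest reference suggests that different regions have different ancestries. This is a simplified, distance-based sketch rather than a full phylogenetic analysis of each region; the window size, step, function names and toy sequences are assumptions.

```python
def closest_reference(query: str, refs: dict[str, str], start: int, end: int) -> str:
    """Reference with the fewest differences to the query in query[start:end]."""
    def diffs(seq: str) -> int:
        return sum(q != s for q, s in zip(query[start:end], seq[start:end]))
    return min(refs, key=lambda name: diffs(refs[name]))


def window_scan(query: str, refs: dict[str, str], window: int, step: int) -> list[tuple[int, str]]:
    """(window start, closest reference) for each window along the alignment."""
    return [(start, closest_reference(query, refs, start, start + window))
            for start in range(0, len(query) - window + 1, step)]


# Toy aligned sequences: X resembles A in the first half and B in the second
refs = {"A": "AAAAAAAAAACCCCCCCCCC", "B": "GGGGGGGGGGTTTTTTTTTT"}
X = "AAAAAAAAAATTTTTTTTTT"
print(window_scan(X, refs, window=10, step=10))  # [(0, 'A'), (10, 'B')]
```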


Phylogenetic networks

[Figure: trees and a phylogenetic network for taxa A, B, C, D and a recombinant R]

Applied phylogenetics

• Reconstruct evolutionary history

• Animals, plants, bacteria, viruses, plasmids, ……

• Establish evolutionary mechanisms

• Functional studies

• Trace pandemic diseases

• Forensic medicine


Examples


Practical session

Phylip
• Software package for phylogenetic analysis
• Several small (command-line) applications
• Many different algorithms
• Widely used by the scientific community

• seqboot -> Constructs bootstrap sets
• dnapars -> Constructs a maximum parsimony tree
• consence -> Constructs a consensus tree
• drawtree -> Draws the tree
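The PHYLIP programs read their alignment from a plain-text input file (by default named infile). In the sequential PHYLIP format, the first line gives the number of sequences and the number of sites, and each following line holds a name padded or truncated to exactly 10 characters, followed by the sequence. A small Python helper that writes this format (the function name is our own; see the PHYLIP documentation for other variants such as interleaved files):

```python
def to_phylip(seqs: dict[str, str]) -> str:
    """Format aligned sequences in sequential PHYLIP format."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    lines = [f"{len(seqs)} {lengths.pop()}"]
    for name, seq in seqs.items():
        lines.append(name[:10].ljust(10) + seq)  # names occupy exactly 10 characters
    return "\n".join(lines) + "\n"


seqs = {"1": "AATTCCGG", "2": "AATACCGG", "3": "AATTAATG"}
print(to_phylip(seqs), end="")  # save this text as "infile" before running seqboot
```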


Herpes Simplex Virus Type 1 & 2

• ~100 nm in diameter
• Capsid surrounded by an envelope
• Different glycoproteins in the envelope

• Usually asymptomatic
• Cause oral and genital lesions, encephalitis, meningitis and keratitis
• Transmitted via direct contact
• Lifelong infection in the sensory ganglia
• HSV-1: 70-80%, HSV-2: 20-30%

Photo by Linda M. Stannard, University of Cape Town.

HSV-1 US7 (Glycoprotein I)


Clinical samples

Isolate   Gender  Location
E4        F       brain (encephalitis)
25        M       brain (encephalitis)
274       F       brain (encephalitis)
1666      M       brain (encephalitis)
3355      F       brain (encephalitis)
7682      M       brain (encephalitis)
90132     F       oral
90147     M       oral
90238     M       oral
90395     F       oral
90444     F       brain (encephalitis)
90579     F       oral
90602     M       oral
94783     M       oral
97869     F       genital
981264    F       genital
982466    M       genital
983501    F       genital
993412    F       oral
993515    F       oral
993565    F       genital
993576    F       genital
993594    F       oral
993606    F       oral
993608    F       genital
993615    F       genital
993621    F       genital
993626    F       genital