Comparative Genomics in Vertebrates: Lessons from the Tetraodon nigroviridis genome


2

R. Hinegardner 1968

3

Identify human genes by comparison to a compact vertebrate genome

Tetraodon genomic sequence

Human genomic sequence Exons

4

Query: SPWTFPS*FLMSSSMKVPSWSRISSPM*GIL*STVSSST
       SPWTFPS* L+SSS+KV S S  SSPM*GIL  T SSST
Sbjct: SPWTFPS*LLISSSIKVSSSSFTSSPM*GILHKTXSSST

Query: LLFQLFLALSDLKQLRILHTDLKPDNVMLVD--EKELKIKLMDFGLALLTHEAKT--GTI
       +L Q+  AL  LK L ++H DLKP+N+MLVD   +  ++K++DFG A  +H +KT   T
Sbjct: ILQQVATALKKLKSLGLIHADLKPENIMLVDPVRQPYRVKVIDFGSA--SHVSKTVCSTY

Query: VNALAQYSHNEDEEEEEEHDFKVDKT-DLCDSKKHPE
       VNAL QY+ ++D+++ ++ + + +K  DL D +   E
Sbjct: VNALGQYNDDDDDDDGDDPEEREEKQKDLEDHRDDKE

Query: RYKELTEQQMPGALPPECTPNMDGPHARSVRREQSLHSFHTLFCRRCFKYDRFLH
       +YKELTEQQ+PGALPPECTPN+DGP+A+SV+REQSLHSFHTLFCRRCFKYD FLH
Sbjct: KYKELTEQQLPGALPPECTPNIDGPNAKSVQREQSLHSFHTLFCRRCFKYDCFLH

5

BLAST

Query:   A T T G C G T A T G C A G C G T A G C A A T T G C G A T A C

Subject: T T A C G C G A T G T A G A C A G C G T A G C A A T G T T G C A

Exact match: word of size W = 11 bases
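The seeding step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the BLAST implementation: the function name `shared_words` is hypothetical, and the sketch simply indexes every word of length W in the subject and looks up each query word. The two sequences are the ones shown on the slide.

```python
def shared_words(query: str, subject: str, w: int = 11):
    """Return (query_pos, subject_pos, word) for every exact word of length w found in both sequences."""
    # Index every word of length w in the subject by its start position.
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    # Look up each query word in that index.
    hits = []
    for i in range(len(query) - w + 1):
        word = query[i:i + w]
        for j in index.get(word, []):
            hits.append((i, j, word))
    return hits


# Sequences from the slide (spaces removed).
QUERY = "ATTGCGTATGCAGCGTAGCAATTGCGATAC"
SUBJECT = "TTACGCGATGTAGACAGCGTAGCAATGTTGCA"

hits = shared_words(QUERY, SUBJECT)
for q_pos, s_pos, word in hits:
    print(q_pos, s_pos, word)
```

The two sequences share a 12-base exact match, so two overlapping 11-base words are reported; each would serve as an anchor for the extension step.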

6

Blast:

Query:   A T T G C G T A T G C A G C G T A G C A A T T G C G A T A C

Subject: T T A C G C G A T G T A G A C A G C G T A G C A A T G T T G C A

T A T G C A G C G T A G C A A T

Scoring matrix NUC.4.4

     A   T   G   C   N
A    5  -4  -4  -4  -2
T   -4   5  -4  -4  -2
G   -4  -4   5  -4  -2
C   -4  -4  -4   5  -2
N   -2  -2  -2  -2  -1

+5-4-4+5

- 8 < X

X = threshold for cumulative score of successive mismatches = 21 by default

W
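The extension step can be illustrated with a short ungapped X-drop sketch. It assumes the NUC.4.4 scores for A, C, G and T only (+5 for a match, -4 for a mismatch), the default X = 21 quoted above, and a stopping rule in which extension ends once the score has fallen more than X below the best score seen so far. The function name `extend` and the exact stopping rule are assumptions made for illustration; real BLAST differs in many details.

```python
MATCH, MISMATCH = 5, -4  # NUC.4.4 scores for A, C, G, T (N is ignored in this sketch)


def extend(query, subject, qi, si, step, x_drop=21):
    """Extend without gaps from (qi, si) in direction step (+1 or -1).

    Returns (best_score_gain, positions_kept) for that direction.
    """
    score = best = 0
    length = best_len = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        score += MATCH if query[qi] == subject[si] else MISMATCH
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # the score dropped more than X below the best: stop extending
        qi += step
        si += step
    return best, best_len


QUERY = "ATTGCGTATGCAGCGTAGCAATTGCGATAC"
SUBJECT = "TTACGCGATGTAGACAGCGTAGCAATGTTGCA"

# Seed word of 11 bases: QUERY[10:21] == SUBJECT[14:25]
right = extend(QUERY, SUBJECT, 21, 25, +1)
left = extend(QUERY, SUBJECT, 9, 13, -1)
total_score = 11 * MATCH + right[0] + left[0]
print(left, right, total_score)
```

The extension keeps only the positions up to the best-scoring point in each direction, so trailing mismatches are trimmed from the reported alignment.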

7

Word “W” = 3 amino acids

(threshold “X”)

(threshold “T”)

L E C N Q L I P I A H K T C P E G K N L

H K T, H L T, H V T, H Y T, Y K T, N K T

L K C H N T Q L P F I Y K T C P E G K N

Extension

Automaton

TBLASTX, BLASTP, BLASTX
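The list of three-letter words on this slide can be read as the neighbourhood of a query word: words whose BLOSUM62 score against the query word reaches the threshold T. The sketch below assumes the query word is HKT (the first word listed) and T = 11; neither value is stated on the slide. Only the BLOSUM62 entries needed for these six words are included, and the helper names are hypothetical.

```python
# BLOSUM62 entries needed for this example (symmetric pairs stored once).
BLOSUM62_SUBSET = {
    ("H", "H"): 8, ("K", "K"): 5, ("T", "T"): 5,
    ("H", "Y"): 2, ("H", "N"): 1,
    ("K", "L"): -2, ("K", "V"): -2, ("K", "Y"): -2,
}


def pair_score(a, b):
    """Look up a residue pair in either orientation."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def word_score(query_word, candidate):
    """Sum of pairwise substitution scores between two words of equal length."""
    return sum(pair_score(a, b) for a, b in zip(query_word, candidate))


T = 11  # assumed threshold, not given on the slide
QUERY_WORD = "HKT"  # assumed query word
WORDS = ["HKT", "HLT", "HVT", "HYT", "YKT", "NKT"]

scores = {w: word_score(QUERY_WORD, w) for w in WORDS}
neighbourhood = [w for w in WORDS if scores[w] >= T]
print(scores)
```

Under these assumptions all six words pass the threshold, including YKT, which is the word actually present in the second sequence on the slide.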

8

   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1 -4
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1 -4
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1 -4
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1 -4
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1 -4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1 -4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1 -4
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1 -4
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1 -4
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2 -4
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0 -4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0 -4
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2 -4
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1 -4
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 -2 -1 -4
B -2 -1  3  4 -3  0  1 -1  0 -3 -4  0 -3 -3 -2  0 -1 -4 -3 -3  4  1 -1 -4
Z -1  0  0  1 -3  3  4 -2  0 -3 -3  1 -1 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1 -1 -1 -4
* -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4  1

BLOSUM62 scoring matrix

9

Results

TBLASTX:

- Non-substitutive scoring matrix: match = +15, mismatch = -12

- Initial anchoring word: W= 5

- Never more than 2 consecutive mismatches: X = 25
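The link between X = 25 and "never more than 2 consecutive mismatches" is simple arithmetic, sketched below. It assumes that extension is abandoned as soon as the cumulative penalty of successive mismatches exceeds X; the function name is hypothetical.

```python
def max_consecutive_mismatches(mismatch_penalty: int, x_drop: int) -> int:
    """Largest run of mismatches whose cumulative penalty does not exceed x_drop."""
    return x_drop // mismatch_penalty


# Exofish TBLASTX settings from the slide: mismatch = -12, X = 25
tblastx_limit = max_consecutive_mismatches(12, 25)
print(tblastx_limit)  # 2 mismatches cost 24 (tolerated); 3 would cost 36 (exceeds 25)
```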

10

33% of the Tetraodon genome

TBLASTX: W = 5, X = 25, non-substitutive matrix

(10 hours)

8.3 million alignments

322 annotated human genes

11

[Figure: plot with axes Length (bp) and % Id]

12

Exofish performance

Sensitivity: genes 62.5%, exons 27.5%

Specificity: 100%

On 322 genes

Human gene

Tetraodon matches

Ecores (Evolutionary Conserved Regions)

Ecores per gene: 2.58

13

Exofish

(fishing for exons… with fish exons)

Human genomic sequence

Tetraodon genome

Compute alignments

Assemble selected alignments

Ecores: evolutionary conserved regions

Select alignments

Filter repeats and low complexity regions
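One way to picture the "assemble selected alignments" step is as the merging of overlapping alignment intervals on the human sequence into single conserved regions. The sketch below is an assumption about what assembly could look like, not the Exofish implementation; the function name, the `max_gap` parameter and the example coordinates are all hypothetical.

```python
def assemble_ecores(alignments, max_gap=0):
    """Merge overlapping (start, end) intervals on the human sequence into ecores.

    Intervals separated by at most max_gap bases are also merged.
    """
    ecores = []
    for start, end in sorted(alignments):
        if ecores and start <= ecores[-1][1] + max_gap:
            # Overlaps (or nearly touches) the previous ecore: extend it.
            ecores[-1][1] = max(ecores[-1][1], end)
        else:
            ecores.append([start, end])
    return [tuple(e) for e in ecores]


# Hypothetical alignment coordinates on a human genomic sequence.
example = [(100, 180), (150, 240), (400, 460), (455, 500), (900, 950)]
ecores = assemble_ecores(example)
print(ecores)
```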

14

Genscan

Exofish

Annotation

Carnitine palmitoyl transferase I

15

Genscan

Exofish

Annotation

Similar to mouse HTF9C

Ran binding protein 1

KIAA1292 protein

16

Exofish 43728 ecores

Estimating the number of genes in a genome

RefSeq (13751 genes)

65 %

35 %

? ecores

(How many genes ?)

Human genome

17

How many genes in the human genome?

42066 ecores found in 42.4% of the human genome

42066 / 0.424 = 99212 ecores in the entire genome

11% of ecores correspond to pseudogenes

99212 x 0.89 = 88299 ecores correspond to genes

A gene possesses on average between 2.58 and 3.18 ecores

88299 / 3.18 = 27767 genes

88299 / 2.58 = 34224 genes

28000 < Human genes < 34000

Estimation based on 42% of the human genome (January 2000)
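The arithmetic on this slide can be reproduced step by step. The sketch below uses only the figures given above; the variable names are illustrative and intermediate values are rounded to whole numbers as on the slide.

```python
ECORES_FOUND = 42066          # ecores found in the sequenced fraction
GENOME_FRACTION = 0.424       # fraction of the human genome examined
PSEUDOGENE_FRACTION = 0.11    # share of ecores that fall in pseudogenes
ECORES_PER_GENE = (2.58, 3.18)  # average ecores per gene, low and high values

# Extrapolate to the whole genome, then remove pseudogene ecores.
total_ecores = round(ECORES_FOUND / GENOME_FRACTION)
gene_ecores = round(total_ecores * (1 - PSEUDOGENE_FRACTION))

# More ecores per gene means fewer genes, so the high ratio gives the lower bound.
lower_bound = round(gene_ecores / ECORES_PER_GENE[1])
upper_bound = round(gene_ecores / ECORES_PER_GENE[0])

print(total_ecores, gene_ecores, lower_bound, upper_bound)
```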

18

Genesweep (organized by Ewan Birney at Cold Spring Harbor in May 2000).

Science, 28 May 2000

19

Number of genes in eukaryotes: a paradox?

Organism                  Number of genes   Genome size
Influenza virus                         8      0.001 Mb
Poliovirus                              1      0.007 Mb
Mycoplasma genitalium                 480      0.58 Mb
Archaeoglobus fulgidus              2,420      2.18 Mb
Mesorhizobium loti                  6,746      7.03 Mb
Yeast                               6,000        16 Mb
Nematode                           19,000       100 Mb
Drosophila                         14,000       120 Mb
Arabidopsis                        25,000       100 Mb
Human                              30,000      3000 Mb
Paramecium                         40,000        80 Mb
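The paradox is easier to see as gene density (genes per megabase). The sketch below uses the eukaryote rows of the table as given; the dictionary layout and variable names are illustrative only.

```python
# (number of genes, genome size in Mb), taken from the table above
GENOMES = {
    "Yeast": (6000, 16),
    "Nematode": (19000, 100),
    "Drosophila": (14000, 120),
    "Arabidopsis": (25000, 100),
    "Human": (30000, 3000),
    "Paramecium": (40000, 80),
}

# Genes per megabase for each organism.
density = {name: genes / size_mb for name, (genes, size_mb) in GENOMES.items()}

for name, value in sorted(density.items(), key=lambda item: item[1]):
    print(f"{name:12s} {value:7.1f} genes/Mb")
```

With these figures, gene number does not track genome size: the human genome is the largest listed but has the lowest gene density.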

How to estimate the number of genes in a genome… without knowing the sequence?

[Figure: Published estimates, plotted by year from 1992 to 2006 on a scale of 20 000 to 160 000 genes. Labelled estimates: Antequera and Bird; Fields et al.; Liang et al.; Ewing and Green; Roest Crollius et al.; Lander et al.; one point is marked "?".]

21

64% of the genome is anchored to chromosomes

36% remains as independent sequences

350 Mb genome, 21 chromosomes

Whole-genome shotgun sequencing: 8X

Assembly with Arachne

Physical mapping

Gnathostomata (jawed vertebrates)

Chondrichthyes (cartilaginous fishes)

Actinopterygians (ray-finned fishes)

Osteichthyes (bony fishes)

Mammals

Tetrapods

Coelacanthimorpha

Sauropsida

Mus musculus

Homo sapiens

Gallus gallus

Oryzias latipes

Tetraodon nigroviridis

Takifugu rubripes

Danio rerio

Sarcopterygians (lobe-finned fishes)

Teleosts

Acipenseriforms (sturgeons, …)

Percomorphs

Otophysi

Cypriniforms

Beloniforms

Tetraodontiforms

225 my

Paleozoic

Mesozoic

23

Ancestral species

orthologs

paralogs

Species 1 Species 2

speciation

A B

duplication

B’

• A and B derive from an ancestral gene by speciation: they are orthologs

• B’ appears by duplication of B: they are paralogs

Signature?

Signature?

24

Ancestral genome

Duplication

Deletions

Intra-chromosomal rearrangements

Fusions and fissions

Translocations

Time (tens of millions of years)

25

Identification of duplicate genes

Tetraodon: n = 1078
Ks <= 0.35: n = 330 (30.6%)
Ks > 0.35: n = 748 (69.4%)

Takifugu: n = 995
Ks <= 0.35: n = 179 (18.0%)
Ks > 0.35: n = 816 (82.0%)
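The percentages on this slide follow directly from the counts on either side of the Ks = 0.35 cut-off. A minimal check, with a hypothetical helper name:

```python
def ks_split(n_low_ks: int, n_high_ks: int):
    """Return (total, % with Ks <= 0.35, % with Ks > 0.35), rounded to one decimal."""
    total = n_low_ks + n_high_ks
    return (
        total,
        round(100 * n_low_ks / total, 1),
        round(100 * n_high_ks / total, 1),
    )


tetraodon = ks_split(330, 748)
takifugu = ks_split(179, 816)
print(tetraodon, takifugu)
```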

26

Distribution of 748 duplicate genes in the Tetraodon genome

27

Common ancestor

duplication

diploidization

Homo sapiens Tetraodon nigroviridis

28

Human genome: synteny with the Tetraodon genome

Tetraodon genome: synteny with the human genome

29

30

The Paleozoic era

31

Drawing by Z. Burian under the direction of Prof. J. Augusta

The giant placoderm Dunkleosteus (~7 metres) chases two Cladoselache sharks

Gnathostomata (jawed vertebrates)

Chondrichthyes (cartilaginous vertebrates)

Actinopterygians (ray-finned fishes)

Osteichthyes (bony vertebrates)

Mammals

Tetrapods

Coelacanthimorpha

Sauropsida

Mus musculus

Homo sapiens

Gallus gallus

Oryzias latipes

Tetraodon nigroviridis

Takifugu rubripes

Danio rerio

Sarcopterygians (lobe-finned fishes)

Teleosts

Acipenseriforms (sturgeons, …)

Percomorphs

Otophysi

Cypriniforms

Tetraodontiforms

Beloniforms

33

34

The ancestral osteichthyes genome (bony vertebrates)

35

What are the intermediate steps in the evolution of the Tetraodon and human genomes?

36

Modeling the evolution of a duplicated Tetraodon chromosome

Gene order is progressively rearranged over time along Tetraodon and human chromosomes (independently)

The degree of rearrangement along a chromosome segment is thus a measure of elapsed time

Modeling a few simple cases of chromosomal rearrangements in Tetraodon:

1) No rearrangement
2) A recent fusion between two chromosomes
3) An ancient fusion between two chromosomes
4) A fission (break) of a chromosome
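The idea that the degree of rearrangement measures elapsed time needs some way to quantify rearrangement. One common and simple measure, used here purely as an illustration and not taken from the slides, is the number of breakpoints: gene adjacencies present in one gene order but absent from the other. The function name and the toy gene orders are hypothetical.

```python
def breakpoints(order_a, order_b):
    """Count adjacencies of order_a that are not adjacent (in either orientation) in order_b."""
    adjacent_in_b = {frozenset(pair) for pair in zip(order_b, order_b[1:])}
    return sum(
        1 for pair in zip(order_a, order_a[1:])
        if frozenset(pair) not in adjacent_in_b
    )


# Toy gene orders along one chromosome segment (genes labelled 1 to 5).
ancestral = [1, 2, 3, 4, 5]
unchanged = [1, 2, 3, 4, 5]
inverted = [1, 2, 4, 3, 5]  # genes 3 and 4 swapped by a small inversion

print(breakpoints(ancestral, unchanged), breakpoints(ancestral, inverted))
```

Under this measure a segment with no rearrangement scores 0, and the score grows as successive rearrangements break more of the original adjacencies.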

37

A simple case: no interchromosomal rearrangement after the duplication

Tetraodon nigroviridis

Homo sapiens

Ancestral genome

38

39

[Figure: Distribution of 6884 orthologs in their respective genomes. Axes: Tetraodon chromosomes (1 to 21) and human chromosomes (1 to 22, X).]

[Figure: Tetraodon chromosomes 9 and 11 against human chromosomes (1 to 22, X).]

40

41

42

Olivier Jaillon, Jean-Marc Aury, Jean-Louis Petit, Laurence Bouneau, Cécile Fischer, Alain Bernot, Sophie Nicaud, Carole Dossat, Béatrice Segurens, Corinne Dasilva, Marcel Salanoubat, Michael Levy, Nathalie Boudet, Véronique Anthouard, Claire Jubin, Vanina Castelli, Michael Katinka, Benoît Vacherie, Zineb Skalli, Laurence Cattolico, Julie Poulain, Simone Duprat, Philippe Brottier, Guillaume Lardier, Vincent Schachter, Francis Quetier, William Saurin, Claude Scarpelli, Patrick Wincker, Jean Weissenbach, Hugues Roest Crollius

Georges Lutfalla, Christian Biémont, Jean-Nicolas Volff

Jérôme Gouzy, Daniel Kahn

Nicole Stange-Thomann, Evan Mauceli, David Jaffe, Sheila Fisher, Kevin J. McKernan, Paul McEwan, Stephanie Bosak, Mike Zody, Jill Mesirov, Kerstin Lindblad-Toh, Bruce Birren, Chad Nusbaum, Eric S. Lander

Jean-Pierre Coutanceau, Catherine Ozouf-Costaz

Frédéric Brunet, Marc Robinson-Rechavi, Vincent Laudet

Sergi Castellano, Genis Parra, Charles Chapple, Roderic Guigó