The dynamics of nuclear gene order in the eukaryotes
Genome archaeology in the angiosperms
Todd Vision
Department of Biology
University of North Carolina at Chapel Hill
Comparative maps
Spaghetti Diagram: Livingstone et al 1999 Genetics 152:1183
Crop Circle: Gale & Devos 1998 PNAS 95:1972
Arabidopsis as a hub for plant comparative maps
Genome sizes in angiosperms
[Figure: bar chart of genome sizes in megabases (axis 0 to 1000); bar values 145, 262, 367, 367, 372, 415, 439, 473, 560, 622, 907; species labels not recovered]
Data from Arumuganathan & Earle (1991) Plant Mol Biol Rep 9:208-218
Tomato-Arabidopsis synteny
Bancroft (2001) TIG 17, 89 after Ku et al (2000) PNAS 97, 9121
Outline
• Ancient genome duplication
– How can we reconstruct genomic history?
• Computational challenges
• Role of different classes of gene duplication in genome evolution
Mayer et al. (2001) Genome Res. 11, 1167
Rice-Arabidopsis synteny
Paleotetraploidy?
The Arabidopsis Genome Initiative. 2000. Nature 408:796
Genomic dot-plot
gene  1  2  3  4  5  6  7  8
1     1  0  0  0  1  0  0  0
2     0  1  0  0  0  1  0  0
3     0  0  1  0  0  0  1  0
4     0  0  0  1  0  0  0  1
5     1  0  0  0  1  0  0  0
6     0  1  0  0  0  1  0  0
7     0  0  1  0  0  0  1  0
8     0  0  0  1  0  0  0  1
Chromosome copy 1: genes 1 2 3 4
Chromosome copy 2: genes 5 6 7 8
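A minimal sketch of how a dot-plot matrix like this one could be built. It assumes each gene carries a family label and that a cell is 1 when two genes share a label; the `families` mapping and its labels are hypothetical, chosen only to reproduce the toy example of two chromosome copies.

```python
def dot_plot(families):
    """Return a square 0/1 matrix: 1 where two genes belong to the same family."""
    genes = sorted(families)
    return [[1 if families[a] == families[b] else 0 for b in genes] for a in genes]

# Hypothetical family labels: genes 5-8 are duplicates of genes 1-4.
families = {1: "w", 2: "x", 3: "y", 4: "z", 5: "w", 6: "x", 7: "y", 8: "z"}
matrix = dot_plot(families)

for gene, row in zip(sorted(families), matrix):
    print(gene, *row)
```

The off-diagonal run of 1s (gene 1 with 5, 2 with 6, and so on) is the signature of a duplicated segment.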
Duplication vs. multiplication
Multiple duplications generate abundant overlaps among homeologous regions
Vision et al. (2000) Science 290:2114-7.
Segmental paralogy in Arabidopsis
Many duplicated segments but few duplication events
[Figure: histogram of frequency of blocks (axis 0 to 12) against amino acid substitution (axis 0 to 0.9); labels A, B, C, D, E, F]
Blanc, Hokamp, Wolfe (2003) Genome Res. 13, 137-144.
[Figure: phylogeny with Arabidopsis, tomato and rice]
Angiosperm Phylogeny Website. Version 2 August 2001. http://www.mobot.org/MOBOT/research/APweb/.
Block 37 after Asterid-Rosid split
Block 57 before monocot-dicot divergence
Raes, Vandepoele, Saeys, Simillion, Van de Peer (2003) J. Struct. Func. Genomics 3, 117-129
Divergence of homeologs
• Homeologs from age class C and older share less than a third of their genes
– Gene loss
– Or subsequent gene movement?
• There is no evidence for uneven proportions of duplicated genes between homeologs
Redundant gene function: SHATTERPROOF
Martin Yanofsky
Implications for comparative maps
• Networks of synteny
• Goodbye to pairwise comparisons
Outline
• Ancient genome duplication
– How can we reconstruct genomic history?
• Computational challenges
• Role of different classes of gene duplication in genome evolution
Ghosts and Muggles
Simillion, Vandepoele, Van Montagu, Zabeau, Van de Peer (2002) PNAS 99, 13627
Interspecies comparison can reveal Ghosts
Things needful
• Identification of highly diverged Muggles
• A systematic way to identify Ghosts
• Centralization of mapped and sequenced DNA markers from multiple species
FISH (Fast Identification of Segmental Homology)
• Identifies candidate segmental homologies
– Dynamic programming
• Statistically evaluates candidates
– Null model of transpositional duplication
• No permutations required
• Approaches limits to sensitivity
FISH under null model
k   observed number   standard error   upper bound   lower bound
2   45.8              0.06             47.6          40.1
3   2.28              0.02             2.39          1.78
4   0.113             0.003            0.120         0.079
5   0.006             0.001            0.006         0.004
6   0.0003            0.0002           0.0003        0.0002
eAssembler
• Reconstructs ancestral gene order by joining duplicated blocks with overlapping gene content
• Uses ‘breakpoint median’ as objective function
• Similar to algorithms used in sequence assembly
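A rough sketch of what a breakpoint-based objective looks like. It is not eAssembler's algorithm. It assumes unsigned gene orders over the same gene set, counts adjacencies lost between two orders, and finds a median by brute force, which is only feasible for a handful of genes. All function names are hypothetical.

```python
from itertools import permutations

def adjacencies(order):
    """Set of unordered neighboring gene pairs in a linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}

def breakpoint_distance(a, b):
    """Number of adjacencies in order a that are absent from order b."""
    return len(adjacencies(a) - adjacencies(b))

def total_distance(candidate, orders):
    """Sum of breakpoint distances from a candidate order to all given orders."""
    return sum(breakpoint_distance(candidate, o) for o in orders)

def breakpoint_median(orders):
    """Brute-force search for an order minimizing the total breakpoint distance."""
    genes = sorted(orders[0])
    return min(permutations(genes), key=lambda cand: total_distance(cand, orders))

orders = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 3, 2, 4]]
median = breakpoint_median(orders)
print(median, total_distance(median, orders))
```

Real duplicated blocks only partly overlap in gene content, so a practical objective has to handle missing genes; this toy version ignores that.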
Blanc, Hokamp, Wolfe (2003) Genome Res. 13, 137-144.
PHYTOME: integrating plant genome maps, sequences and phylogenies
From www.plantgdb.org
Outline
• Ancient genome duplication
– How can we reconstruct genomic history?
• Computational challenges
• Role of different classes of gene duplication in genome evolution
Gene duplications in a chromosomal context
• Turnover within gene families can be high
– Rate of duplication = 0.002 per gene per MY
– Half-life = 23 MY
• Three modes of duplication
– Tandem
– Transpositional
– Segmental
• How does the mode of origin affect the molecular and functional divergence of duplicate genes?
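A short worked example of what a 23 MY half-life implies, assuming duplicates are lost by simple exponential decay (the assumption behind quoting a half-life). The function names are hypothetical; the two constants come from the turnover figures above.

```python
import math

HALF_LIFE_MY = 23.0        # half-life of a duplicate, in millions of years
DUPLICATION_RATE = 0.002   # duplications per gene per million years

def loss_rate(half_life):
    """Per-MY loss rate implied by a half-life under exponential decay."""
    return math.log(2) / half_life

def surviving_fraction(t_my, half_life=HALF_LIFE_MY):
    """Fraction of duplicates still present after t_my million years."""
    return 0.5 ** (t_my / half_life)

print(round(loss_rate(HALF_LIFE_MY), 4))   # loss rate per MY
print(round(surviving_fraction(100), 3))   # fraction left after 100 MY
```

Under this assumption only about one duplicate in twenty survives 100 MY, which is why ancient duplications leave so few retained pairs.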
Gene family turnover
Lynch and Conery (2000) Science 290, 1151
Importance of tandem and transpositional duplications
~10% of genes are in tandem arrays
85% of dispersed duplications are not in blocks
• Duplicates on the same chromosome are 20% more common than expected by chance
• Duplicates on the same chromosome are 86% as distant as would be expected by chance
Aux/IAA and ARF sister families
• Importance in Arabidopsis
Diversification of the Aux/IAA gene family
David Remington and Jason Reed
Diversification of ARF gene family
Chromosome 2-4 complex: 242 duplicated gene pairs
[Figure: dot plot of chromosome 2 (5.6 Mb; axis 1200 to 2800) against chromosome 4 (4.6 Mb; axis 2600 to 4200); labels 45, 52, 49, 54, 56]
Substitutions in coding sequences
• silent substitutions (Ks) only alter the codon, not the resulting amino acid
• replacement substitutions (Ka) alter the amino acid
• Ka and Ks are standardized by the numbers of synonymous and nonsynonymous sites
Ratio of Ka to Ks
Ka/Ks < 1  selective constraint
Ka/Ks = 1  pure neutrality
Ka/Ks > 1  positive selection
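The three cases can be sketched as a small helper. The function name and the `tolerance` used to treat a ratio as "equal to 1" are hypothetical. In practice a statistical test, not a fixed cutoff, decides whether a ratio differs from 1.

```python
import math

def classify_ka_ks(ka, ks, tolerance=0.05):
    """Label a gene pair by its Ka/Ks ratio; tolerance is an illustrative cutoff."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if math.isclose(ratio, 1.0, abs_tol=tolerance):
        return "pure neutrality"
    return "selective constraint" if ratio < 1.0 else "positive selection"

print(classify_ka_ks(0.10, 0.50))  # ratio 0.2
print(classify_ka_ks(0.90, 0.30))  # ratio 3.0
```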
How have these ancient segmental duplicates diverged?
1. What is the variation in Ka and Ks among simultaneously duplicated pairs?
2. Do the Ka/Ks ratios suggest positive selection?
3. Do the members of each duplicated pair evolve at the same rate?
[Figure: histograms of Ka (axis 0 to 1; frequency axis 0 to 70) and Ks (axis 0 to 5; frequency axis 0 to 120); reported coefficients of variation 0.67 and 0.53]
Relationship between Ka and Ks
[Figure: scatter plot of Ka (axis 0 to 1) against Ks (axis 0 to 5), with the line Ka/Ks = 1; r2=0.558, p<0.001]
Relative rate test
[Figure: three-taxon tree with O (outgroup), A and B; branch lengths d1, d2, d3]
Compare the fit of a model in which d2 = d3 with one in which they are allowed to vary
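One way to make this comparison concrete, assuming it is done as a likelihood ratio test with one degree of freedom (the slide does not name the test). The log-likelihood values are hypothetical inputs from whatever program fits the two models.

```python
import math

def relative_rate_lrt(lnl_constrained, lnl_free):
    """Likelihood ratio test of d2 = d3 against d2, d3 free (1 degree of freedom).

    Returns the test statistic and its chi-square p-value.
    """
    stat = max(0.0, 2.0 * (lnl_free - lnl_constrained))
    # Survival function of a chi-square variable with 1 df, via the error function.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods for one gene pair.
stat, p = relative_rate_lrt(lnl_constrained=-1002.0, lnl_free=-1000.0)
print(stat, round(p, 4))
```

A small p-value rejects equal rates for A and B since their duplication.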
Relative rate tests
• 105 gene pairs could be evaluated against an outgroup
• >30 showed significantly unequal rates of evolution
• no evident chromosomal or regional biases
Distance measure   Significant pairs
protein            15
Ka                 29
Ks                 9
Are paralogs different from orthologs?
• Homologous genes are either
– Paralogs that diverged through duplication
– Orthologs that diverged through speciation
• Paralogs must coexist in the same genome. Do they diverge differently as a result?
• Comparison to 212 Arabidopsis-Brassica orthologs by Tiffin and Hahn (2002) JME 54, 746
– For all pairs, Ka/Ks < 1
– Ka/Ks unimodal around 0.14 (as opposed to 0.20)
– CV(Ks)/CV(Ka) is approximately 2
Conclusions
• A network of synteny due to duplication and gene loss makes deep comparative mapping difficult
• But phylogenetically-informed methods should allow us to go much deeper than at present
• Only by going deep will we be able to understand the varied roles of different kinds of duplication events in the diversification of gene families
Acknowledgements
• Arabidopsis genome evolution
– Daniel Brown
– Steven Tanksley
• Comparative mapping
– Peter Calabrese
– Sugata Chakravarty
– Luke Huan
• Evolution of duplicated genes
– Liqing Zhang
– Brandon Gaut
– David Remington
– Jason Reed
• Support
– USDA
– NSF
Conservation of gene orientation
parallel
convergent
divergent
Formulating the problem in terms of graph traversal
• nodes are matches
• edges are unidirectional
• edges have associated distances
The putative duplicated blocks consist of the paths through the graph that traverse edges with short distances
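A minimal sketch of this graph formulation, not the FISH implementation itself. It assumes matches are (x, y) gene positions on two segments, that an edge runs only to a match with larger x and y (so inverted blocks are ignored), and that edge distance is Manhattan distance. `max_dist` and `min_size` are hypothetical parameters.

```python
def find_blocks(matches, max_dist=3, min_size=3):
    """Chain matches (x, y) into collinear blocks by dynamic programming."""
    nodes = sorted(set(matches))
    best_len = [1] * len(nodes)   # longest chain ending at each node
    prev = [None] * len(nodes)    # predecessor of each node on that chain
    for j, (xj, yj) in enumerate(nodes):
        for i in range(j):
            xi, yi = nodes[i]
            dx, dy = xj - xi, yj - yi
            # Edges are unidirectional and must be short.
            if dx > 0 and dy > 0 and dx + dy <= max_dist:
                if best_len[i] + 1 > best_len[j]:
                    best_len[j] = best_len[i] + 1
                    prev[j] = i
    # Trace back from chain ends, longest first, using each node once.
    used, blocks = set(), []
    for end in sorted(range(len(nodes)), key=lambda k: -best_len[k]):
        path, k = [], end
        while k is not None and k not in used:
            path.append(nodes[k])
            used.add(k)
            k = prev[k]
        if len(path) >= min_size:
            blocks.append(path[::-1])
    return blocks

matches = [(1, 5), (2, 6), (3, 7), (4, 8), (2, 20), (10, 1)]
blocks = find_blocks(matches)
print(blocks)
```

The two isolated matches drop out because no short edge connects them to a chain of at least `min_size` nodes.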
Statistical framework
• Null model of duplications
– Single-gene duplication/random transposition
– Leads to uniformly distributed dots
• Null distribution for
– The edge distance between nearest neighbors
– The number of serially connected short edges
• Observed edge distances and path lengths analytically compared to null expectation
• Can be approximated by a permutation test
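A sketch of the permutation approximation, under stated assumptions. The null is imitated by shuffling the y positions of the matches, and the statistic is the number of matches with a short downstream edge. The statistic, `max_dist` and the function names are hypothetical stand-ins, not the analytic FISH calculation.

```python
import random

def count_short_edges(matches, max_dist=3):
    """Number of matches with a downstream neighbor within max_dist (Manhattan)."""
    count = 0
    for x1, y1 in matches:
        if any(x2 > x1 and y2 > y1 and (x2 - x1) + (y2 - y1) <= max_dist
               for x2, y2 in matches):
            count += 1
    return count

def permutation_p_value(matches, n_perm=999, max_dist=3, seed=0):
    """Compare the observed statistic with its distribution under shuffled positions."""
    rng = random.Random(seed)
    observed = count_short_edges(matches, max_dist)
    xs = [x for x, _ in matches]
    ys = [y for _, y in matches]
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(ys)  # break any collinearity between the two segments
        if count_short_edges(list(zip(xs, ys)), max_dist) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)

# Toy data: 20 perfectly collinear matches.
collinear = [(i, i + 100) for i in range(1, 21)]
observed, p_value = permutation_p_value(collinear)
print(observed, p_value)
```

The analytic null avoids this loop entirely, which matters when many segment pairs must be screened.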
Only a fraction of the genes are (still?) duplicated
Chr2 segment: 1183 genes
Chr4 segment: 1168 genes
326 duplicates (~28%)
271 (83%) pairwise duplications
Tandem substitutions
• correlation between Ka and Ks disappears when tandem substitutions are excluded
• could be due to
– doublet mutations
– compensatory substitutions
Pair    Annotation                       Chr 2 gene   Chr 4 gene
49.5    calmodulin-binding protein       At2g18750    AT4g31000
49.62   beta-expansin                    At2g20750    AT4g28250
49.63   NADH-ubiquinone oxidoreductase   At2g20800    AT4g28220
56.1    unknown transmembrane            At2g23810    AT4g30430
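Outgroup labels: tobacco 1698547, rice 8118436, Hemerocallis 3551953, potato 5734586
[Figure: relative rate trees for the four pairs; value labels 0.13, 0.16, 0.37, 0.30, 0.10, 0.22, 0.14, 0.16, 0.29, 0.22, 0.12, 0.70; significance labels p<0.0001, p<0.05, p<0.0001, p<0.01; assignment of values to pairs not recovered]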