Integrated Syntenic and Phylogenomic Analyses Reveal an Ancient Genome Duplication in Monocots [W]

Yuannian Jiao,a,1 Jingping Li,a,b,1 Haibao Tang,c,d,1 and Andrew H. Paterson a,2

a Plant Genome Mapping Laboratory, University of Georgia, Athens, Georgia 30602
b Institute of Bioinformatics, University of Georgia, Athens, Georgia 30602
c Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province 350002, China
d J. Craig Venter Institute, Rockville, Maryland 20850

ORCID IDs: 0000-0002-8987-2782 (Y.J.); 0000-0002-3460-8570 (H.T.)

Unraveling widespread polyploidy events throughout plant evolution is necessary for inferring the impacts of whole-genome duplication (WGD) on speciation and functional innovation and for guiding identification of true orthologs in divergent taxa. Here, we employed integrated syntenic and phylogenomic analyses to reveal an ancient WGD that shaped the genomes of all commelinid monocots, including grasses, bromeliads, bananas (Musa acuminata), ginger, palms, and other plants of fundamental, agricultural, and/or horticultural interest. First, comprehensive phylogenomic analyses revealed 1421 putative gene families that retained ancient duplication shared by Musa (Zingiberales) and grass (Poales) genomes, indicating an ancient WGD in monocots. Intergenomic synteny blocks of Musa and Oryza were investigated, and 30 blocks were shown to be duplicated before Musa-Oryza divergence an estimated 120 to 150 million years ago. Synteny comparisons of four monocot (rice [Oryza sativa], sorghum [Sorghum bicolor], banana, and oil palm [Elaeis guineensis]) and two eudicot (grape [Vitis vinifera] and sacred lotus [Nelumbo nucifera]) genomes also support this additional WGD in monocots, herein called Tau (τ). Integrating synteny and phylogenomic comparisons achieves better resolution of ancient polyploidy events than either approach individually, a principle that is exemplified in the disambiguation of a WGD series of rho (ρ)-sigma (σ)-tau (τ) in the grass lineages that echoes the alpha (α)-beta (β)-gamma (γ) series previously revealed in the Arabidopsis thaliana lineage.

INTRODUCTION

Whole-genome duplication (WGD), or polyploidy, is a primary source of raw genetic material for evolutionary novelties (Ohno, 1970; Lynch and Conery, 2000; Adams and Wendel, 2005; Soltis et al., 2009). Although a straight change in ploidy is generally expected to be deleterious and often lead to an evolutionary dead end (Otto and Whitton, 2000; Mayrose et al., 2011; Arrigo and Barker, 2012), many modern organisms have descended from paleopolyploidized ancestors, including plants (Grant et al., 2000; Vision et al., 2000; Bowers et al., 2003; Blanc and Wolfe, 2004; Jaillon et al., 2007; Ming et al., 2008; Tang et al., 2008a; Barker et al., 2009; Fawcett et al., 2009; Paterson et al., 2009; International Brachypodium Initiative, 2010; Jiao et al., 2011; Wang et al., 2011; D’Hont et al., 2012; Amborella Genome Project, 2013; Ming et al., 2013), animals (Christoffels et al., 2004; Jaillon et al., 2004; Dehal and Boore, 2005), and fungi (Kellis et al., 2004). WGDs are especially common in flowering plant (angiosperm) lineages, all of which have experienced paleopolyploidies in their evolutionary history (Grant et al., 2000; Vision et al., 2000; Bowers et al., 2003; Blanc and Wolfe, 2004; Cui et al., 2006; Ming et al., 2008, 2013; Tang et al., 2008a, 2010; Fawcett et al., 2009; Paterson et al., 2009; Schnable et al., 2009; Soltis et al., 2009; International Brachypodium Initiative, 2010; Jiao et al., 2011; Van de Peer, 2011; Wang et al., 2011; D’Hont et al., 2012; Amborella Genome Project, 2013; Singh et al., 2013). However, although several paleopolyploidy events have been intensively studied in eudicots, including the pan-core eudicot γ event (Jaillon et al., 2007; Tang et al., 2008b; Jiao et al., 2012; Ming et al., 2013), most paleopolyploidies in monocots have remained elusive due to the availability of fewer sequenced lineages.

The first WGD event discovered in the ancestral lineage leading to modern cereals (Poaceae family) was named “ρ.” It was first estimated to have occurred ~70 million years ago (MYA) based on molecular data (Paterson et al., 2004; Wang et al., 2005) and is possibly ~30 million years older considering recent fossil discoveries (Prasad et al., 2005). The main Poaceae clades (BEP and PACCAD) evolved as separate lineages for about two-thirds of the time since ρ (Paterson et al., 2004, 2009). Nonetheless, in those family members that have not experienced further genome duplications and subsequent diploidization, such as rice (Oryza sativa) and sorghum (Sorghum bicolor), synteny conservation spans large genomic regions covering hundreds to thousands of orthologous genes (Paterson et al., 2009). The rice and sorghum genomes also share 97 to 98% of post-ρ gene loss in orthologous regions (Paterson et al., 2009), typical of genome comparisons among other cereals (Schnable et al., 2009; International Brachypodium Initiative, 2010; Bennetzen et al., 2012) and supporting the deduction that the radiation of the major cereal lineages occurred ~20 million years or more after ρ (Paterson et al., 2009). The banana (Musa acuminata) genome has undergone three successive rounds of WGD since its divergence from grasses (D’Hont et al., 2012). The oil palm (Elaeis guineensis) and date palm (Phoenix dactylifera) genomes share one recent WGD (Al-Mssallem et al., 2013; Singh et al., 2013). Previous genomic analysis in the banana and oil palm provided no indications of an additional, more ancient WGD before the divergence of Poales and Zingiberales (D’Hont et al., 2012; Singh et al., 2013).

1 These authors contributed equally to this work.
2 Address correspondence to [email protected].
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) is: Andrew H. Paterson ([email protected]).
[W] Online version contains Web-only data.
www.plantcell.org/cgi/doi/10.1105/tpc.114.127597

This article is a Plant Cell Advance Online Publication. The date of its first appearance online is the official date of publication. The article has been edited and the authors have corrected proofs, but minor changes could be made before the final version is published. Posting this version online reduces the time to publication by several weeks.

The Plant Cell Preview, www.aspb.org © 2014 American Society of Plant Biologists. All rights reserved. 1 of 11

Nested within homoeologous regions formed in well-characterized genome duplications are sometimes found additional paralogs from more ancient paleopolyploidy events that have been obscured by subsequent reduplications and rediploidizations. For example, the event called “σ” in our earlier study (Tang et al., 2010) was suspected to comprise multiple events that could not be resolved due to lack of evidence from multiple monocot lineages needed for disambiguation.

Here, by comparing the genomes of the monocots rice, sorghum (Poales), oil palm (Arecales), and banana (Zingiberales), and eudicot genomes from grape (Vitis vinifera; Vitales) and sacred lotus (Nelumbo nucifera; Proteales, of basal eudicots), we show that prior to ρ, many monocot lineages had already experienced two distinct paleopolyploidies. We denote the first as “τ” and the following “σ” to complete a WGD series of ρ-σ-τ in the grass lineage that echoes the α-β-γ series in the Arabidopsis lineage. Key to this discovery, and of high value for clarifying cryptic genome duplications in other taxa, is integration of phylogenomic and synteny analyses (Figure 1) to explore genome paleo-evolution. This approach can be easily applied to multiple divergent genomes to circumscribe nested paleopolyploidy events, as exemplified in our study, and therefore is attractive for studying modern angiosperm genomes after a long and complex history of paleopolyploidies.

RESULTS

Phylogenomic Analysis

To investigate ancient WGDs, 15 monocot genomes were used to construct putative gene families or orthogroups (see Methods). Of the 39,427 total gene families, 11,503 containing at least one monocot and one eudicot sequence were used to construct maximum likelihood trees. Using a gene duplication scoring strategy previously described (Jiao et al., 2011), shared duplications (Figure 2) were identified from constructed gene family phylogenies after monocot-eudicot divergence but before grass-banana divergence (1450 duplications in 1421 gene families with bootstrap support [BS] ≥50%; 552 duplications in 540 families with BS ≥80%). Most gene families retaining high levels of shared duplicates are statistically enriched in Gene Ontology categories corresponding to kinase, transferase, transporter, transcription regulator, and transcription factor (Supplemental Data Set 1), consistent with the general post-genome duplication survivorship patterns (Maere et al., 2005; Paterson et al., 2006; Freeling, 2009).
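The placement of a duplication relative to the Musa-grass split can be illustrated with a toy gene tree. The sketch below is a simplified stand-in for the scoring strategy of Jiao et al. (2011), not a reproduction of it. It assumes a rooted tree encoded as nested tuples, hypothetical leaf names of the form `Species_geneID`, and it ignores bootstrap support. A node is counted as a duplication predating the Musa-grass divergence when both of its child clades contain at least one Musa gene and at least one grass gene.

```python
# Toy illustration only: tree encoding, leaf names, and species sets are assumptions.
GRASSES = {"Oryza", "Sorghum"}
MUSA = {"Musa"}


def species(leaf):
    """Extract the species prefix from a leaf such as 'Oryza_a1'."""
    return leaf.split("_", 1)[0]


def leaves(node):
    """Collect all leaf names below a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    collected = []
    for child in node:
        collected.extend(leaves(child))
    return collected


def has_grass_and_musa(node):
    """True when the clade contains at least one grass gene and one Musa gene."""
    found = {species(leaf) for leaf in leaves(node)}
    return bool(found & GRASSES) and bool(found & MUSA)


def shared_duplications(node):
    """Return bifurcating nodes whose two child clades both hold grass and Musa genes."""
    if isinstance(node, str):
        return []
    hits = []
    if len(node) == 2 and all(has_grass_and_musa(child) for child in node):
        hits.append(node)
    for child in node:
        hits.extend(shared_duplications(child))
    return hits


# A eudicot outgroup gene plus two monocot paralog clades (a and b).
tree = (
    "Vitis_g1",
    (
        (("Oryza_a1", "Sorghum_a1"), "Musa_a1"),
        (("Oryza_b1", "Sorghum_b1"), ("Musa_b1", "Musa_b2")),
    ),
)

hits = shared_duplications(tree)
print(len(hits))
```

In this example the only node that qualifies is the one joining paralog clades a and b. It sits inside the monocot clade, so the duplication falls after the monocot-eudicot divergence but before the Musa-grass divergence; the Musa-specific duplication (`Musa_b1`, `Musa_b2`) is not counted.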

Orthogroup gene trees show excesses of internal nodes potentially corresponding to WGD events. For example, WRKY transcription factors (orthogroup 1297) survived ancient duplication shared by grasses, palms, and Musa after divergence from eudicots (Supplemental Figure 1). Additional younger lineage-specific duplication events within the sampled grasses, palms, and Musa were largely consistent with published views of WGD timing (ρ and σ in grasses, three WGDs in Musa, and one in a palm progenitor) (Tang et al., 2010; D’Hont et al., 2012). Another exemplary gene family is the C3HC4-type RING zinc finger transcription factor family (orthogroup 1312), which showed almost maximal retention across many lineages, surviving two nested duplications in grass lineages, one duplication shared by palms, two nested duplications in Musa lineages, and the gamma duplication in eudicots (Supplemental Figure 2).

Multiple Synteny Alignments of Musa Using Oryza as Reference

We performed a careful analysis of Oryza-Musa synteny to pinpoint phylogenetic signals of deep-time monocot duplication events, identifying 1021 syntenic blocks using MCScanX (Wang et al., 2012), including 9371 gene pairs. Syntenic regions of Musa were then aligned against Oryza chromosomes, with 22,223 Oryza genes in regions that matched one to eight different Musa regions (hereinafter denoted as homologous clusters) and 513 Oryza genes in regions that matched 9 to 15 Musa regions. The redundancy ratio of 1 Oryza:8 Musa is consistent with three genome doublings in the Musa genome (2^3 = 8), while a redundancy of 1 Oryza to 9 to 15 Musa regions exceeds what three doublings can produce and therefore suggests at least one additional WGD.

Duplicated blocks were further dated by mapping the syntenic anchors onto phylogenetic trees (see Methods). Supplemental Figure 3A shows a homologous cluster of eight Musa regions aligned to Oryza chromosome 5. Most anchor genes on these eight regions were determined phylogenetically as having been duplicated after the Musa-Oryza divergence (Supplemental Figure 3B). In total, we found 47 such homologous clusters (1 Oryza to 4+ Musa) duplicated after Musa-Oryza divergence, supporting three previously identified Musa WGDs. We also found 30 homologous clusters that were generated by a more ancient duplication, before Musa-Oryza divergence. For example, one Oryza region (Chr3: 8.7 to 10.5 Mb) matched up to 11 regions in Musa (Figure 3A). The regions within Musa group A (Chr2: 17.1 to 17.3 Mb and Chr6: 32.9 to 33.0 Mb) and within group B (Chr5: 28.4 to 28.8 Mb, Chr6: 9.9 to 10.5 Mb and 32.1 to 32.6 Mb, Chr7: 12.0 to 12.5 Mb, Chr8: 25.8 to 26.0 Mb, Chr9: 2.4 to 3.0 Mb, Chr10: 7.2 to 7.4 Mb, 21.9 to 22.2 Mb and 27.7 to 28.0 Mb, Chr11: 23.6 to 24.2 Mb) were each duplicated after the divergence of Musa and Oryza, while the duplication of the ancestral blocks of A and B occurred before the divergence of Musa and Oryza (Figure 3B).
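The reasoning that links alignment depth to a minimum number of duplication rounds can be written as a short calculation. This is a minimal sketch under the assumption that every event is a doubling (not a triplication) and that no region is counted twice; the function name `min_doublings` is introduced here for illustration only.

```python
def min_doublings(max_depth):
    """Smallest number of genome doublings n such that 2**n >= max_depth.

    max_depth is the largest number of Musa regions matched by a single
    Oryza region in the synteny alignment.
    """
    if max_depth < 1:
        raise ValueError("depth must be a positive integer")
    # (max_depth - 1).bit_length() equals ceil(log2(max_depth)) for integers >= 1,
    # which avoids floating-point rounding.
    return (max_depth - 1).bit_length()


# Depths reported in the text: up to 8 for most Oryza genes, 9 to 15 for 513 genes.
rounds_for_8 = min_doublings(8)
rounds_for_9 = min_doublings(9)
rounds_for_15 = min_doublings(15)
print(rounds_for_8, rounds_for_9, rounds_for_15)
```

A depth of 8 is explained by three doublings, whereas any depth from 9 to 15 requires a fourth, which is the signal attributed in the text to an additional, more ancient WGD.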

Overview of Synteny Conservation among Six Representative Genomes

Genome alignments of four monocot (rice, sorghum, banana, and oil palm) and two eudicot (grape and sacred lotus) species (Supplemental Table 1) revealed clear evidence of σ and τ. In total, 97.66% (38,136) of rice genes, 98.70% (34,046) of sorghum genes, 96.50% (35,272) of banana genes, 86.10% (27,746) of oil palm genes, 95.76% (25,228) of grape genes, and 92.87% (24,782) of sacred lotus genes were covered in the aligned regions.

Comparison between lotus and oil palm genomes showed a mode of 2-to-4 multiplicity ratio between orthologs (Table 1), with the “2” indicative of the lotus lambda (λ) duplication (Ming et al., 2013) and the “4” indicating two paleotetraploidy events in the oil palm lineage that include τ and palm-specific “p” (Figure 4A) (Singh et al., 2013). The p WGD in oil palm and the σ and ρ WGDs in rice after their separation were reflected by the 2-to-4 multiplicity ratio between these two genomes (Table 1). Collectively, the ploidy ratios across these pairwise comparisons indicate one WGD (σ) in the lineage leading to grasses before ρ and an additional ancestral event (τ) preceding the Poaceae-Arecaceae divergence (Figure 4). These events imply a total of 8× paleo-multiplicity in the rice genome after monocot-dicot divergence, as confirmed by both lotus-rice and grape-rice comparisons (Table 1).
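The expected multiplicity ratios follow from the events that each lineage experienced after it separated from the other. The sketch below encodes the event histories described in the text and derives the ratios for comparison with Table 1. It is an illustration, not the authors' procedure: the labels `musa_1` to `musa_3` are placeholder names for the three Musa WGDs, events are matched by name only, and full retention of every duplicated region is assumed.

```python
from math import prod

# Fold increase per event; gamma is a triplication, all other events are doublings.
FOLD = {"gamma": 3}

# Events since the monocot-eudicot split, oldest first (placeholder names for Musa).
HISTORY = {
    "rice": ["tau", "sigma", "rho"],
    "sorghum": ["tau", "sigma", "rho"],
    "banana": ["tau", "musa_1", "musa_2", "musa_3"],
    "oil_palm": ["tau", "p"],
    "grape": ["gamma"],
    "sacred_lotus": ["lambda"],
}


def multiplicity(a, b):
    """Expected a:b multiplicity, counting only events not shared by both genomes."""
    shared = set(HISTORY[a]) & set(HISTORY[b])

    def depth(name):
        return prod(FOLD.get(event, 2) for event in HISTORY[name] if event not in shared)

    return depth(a), depth(b)


for pair in [("sacred_lotus", "oil_palm"), ("oil_palm", "rice"),
             ("grape", "rice"), ("banana", "rice")]:
    print(pair, multiplicity(*pair))
```

Under these assumptions the lotus-to-oil palm ratio is 2:4 and the grape-to-rice ratio is 3:8, matching the corresponding cells of Table 1.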

With successive WGDs followed by loss of most duplicated genes, ancestral gene orders become progressively more fragmented and discernible synteny blocks become smaller (Table 1). The number of anchor genes tends to show a negative correlation with species divergence (Table 1), indicating that reciprocal gene loss and/or gene transposition is a persistent process on a large evolutionary scale. Nonetheless, differential retention of ancestral loci in early Poales branches has slowed down in the cereal genomes after the most recent WGD, ρ. About 4.68% of banana and 6.85% of oil palm genes have no orthologs in the other studied monocots but do have grape and lotus (eudicot) orthologs. This percentage is smaller in rice (3.00%) and sorghum (3.51%). However, the majority (~80%) of ancestral loci that had been differentially retained in the lineage leading to cereals (since their divergence from other Commelinids 120 to ~83 MYA; The Angiosperm Phylogeny Group, 2009) are still syntenic in present-day rice and sorghum genomes. The remaining ~20% have been differentially lost or transposed since the divergence of the two species 80 to ~50 MYA (The Angiosperm Phylogeny Group, 2009).

Circumscribing the σ Duplication

Our grape-rice genome comparison showed that there are more paralogous regions in rice than those produced in the pan-grass ρ, indicating more ancient paleopolyploidizations designated as σ (Tang et al., 2010). Such nested WGDs are often best revealed by bottom-up approaches (Bowers et al., 2003), attempting to reverse the changes resulting from more recent events (in this case ρ) superimposed on them. Using such bottom-up reconstruction (see Methods), a total of 146 p (palm WGD) blocks covering 71.74% of the oil palm genome were merged into pre-p ancestral gene orders. Similarly, 140 ρ (most recent grass WGD) blocks covering 70.63% of the rice genome were merged into pre-ρ gene orders. Direct comparison between the ancestral orders resulted in 209 synteny blocks covering 96.34% of the pre-p order and 72.77% of the pre-ρ order, each pre-p region aligning with up to two paralogous pre-ρ regions, as exemplified in Figure 4B. This clearly revealed that the σ event occurred before the pan-Poaceae ρ event (~70 MYA; Paterson et al., 2004) but after the Poaceae-Arecaceae split (120 to ~83 MYA; The Angiosperm Phylogeny Group, 2009).

Figure 1. Schematic Diagram Detailing an Integrated Synteny and Phylogenomic Approach for Circumscribing WGD Events on a Species Phylogeny.

(A) Flowchart of the methodology.
(B) Rationale of ordering speciation and WGD events in the history of two lineages. Chromosome segments are represented by gray bars, and genes are represented by colored bricks.
(C) Illustration of homoeologous genes (anchors) mapping onto the phylogeny of a homologous gene family in the phylogenomic approach used to date these homologous regions.

As the second largest monocot family and fifth largest angiosperm family, the Poaceae is rich in morphological and ecological diversity. Much of the underlying genetic diversity may have resulted from the σ and ρ WGDs. The events may also have contributed to some grass-specific characters. Genomes from basal Poales species such as pineapple (Ananas comosus) could be useful to further narrow the dating of ρ and σ.
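The bottom-up step described in this section, merging two blocks left by the most recent WGD into a single pre-duplication gene order, can be sketched with a toy example. The code below is an assumed simplification and not the reconstruction procedure from the Methods: each block is a list of hypothetical gene family identifiers, each family occurs at most once per block, and the shared families (anchors) must be collinear. Genes retained in only one block are placed between the flanking anchors.

```python
def merge_homoeologs(block_a, block_b):
    """Merge two collinear homoeologous blocks into one putative ancestral order.

    Families present in both blocks act as anchors. Families retained in only
    one block (the other copy lost by fractionation) are inserted between the
    anchors that flank them, block_a entries first.
    """
    shared = set(block_a) & set(block_b)
    anchors_a = [gene for gene in block_a if gene in shared]
    anchors_b = [gene for gene in block_b if gene in shared]
    if anchors_a != anchors_b:
        raise ValueError("anchors are not collinear; this sketch cannot merge them")

    merged = []
    i = j = 0
    for anchor in anchors_a:
        # Copy single-copy genes that precede the anchor in each block.
        while block_a[i] != anchor:
            merged.append(block_a[i])
            i += 1
        while block_b[j] != anchor:
            merged.append(block_b[j])
            j += 1
        merged.append(anchor)
        i += 1
        j += 1
    # Append whatever follows the last anchor.
    merged.extend(block_a[i:])
    merged.extend(block_b[j:])
    return merged


# Two fractionated copies of one ancestral region (hypothetical family IDs).
copy_1 = ["f1", "f2", "f4", "f6"]
copy_2 = ["f1", "f3", "f4", "f5", "f6"]
ancestral = merge_homoeologs(copy_1, copy_2)
print(ancestral)
```

Merged orders of this kind carry more genes per region than either fractionated copy, which is why comparing reconstructed pre-p and pre-ρ orders can expose older duplications that direct genome comparison would miss.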

Unraveling the Precommelinid τ Duplication Event in Early Monocots

To simplify the study of τ in the deep monocot lineages, we compared oil palm and sacred lotus, each having experienced only one subsequent genome duplication. We reconstructed the oil palm pre-p and the sacred lotus pre-λ gene orders as described above for σ, covering 71.74 and 74.05% of the respective genomes. Alignment of the reconstructed genomes resulted in 203 ancestral synteny blocks covering 69.44% of the pre-λ order and 73.53% of the pre-p order. One pre-λ region generally matched up to two pre-p regions (Figure 4B), indicating that an oil palm ancestor experienced another more ancient paleotetraploidy event before p but after it diverged from the eudicots. Syntenic regions with more than two homoeologous copies retained are distributed across all 16 oil palm chromosomes, confirming that this event was genome-wide. Since we have shown above that oil palm did not share σ, this new paleotetraploidy is inferred to be τ (Figure 4A).

Figure 2. Phylogenetic Timing of Inferred Gene Duplications.

Values given are the number of orthogroups showing duplications at the specified branches on the trees generated with the maximum likelihood method with bootstrap value ≥50%.

Figure 3. Example Alignment Showing Homologous Musa Regions Duplicated by an Additional Ancient WGD before the Divergence of Musa and Oryza.

(A) Circle representations of the reference Oryza region and the 11 homologous Musa regions.
(B) Exemplar gene family tree shows the duplication time of anchor genes located on homologous Musa syntenic blocks. Black lines link Oryza-Musa orthologous gene pairs. Red lines link Musa paralogous gene pairs that were duplicated after the split of Musa and Oryza. Blue lines link Musa genes duplicated before the divergence of Musa and Oryza. Solid dots on the phylogenetic tree show recent duplications. The red star represents the ancient τ duplication. The yellow diamond marks the γ triplication. Numbers on branches are BS values. Branch lengths are drawn to scale, with the scale bar indicating the number of nucleotide substitutions per site.

Using the eudicot outgroup genomes of grape or lotus, which only had one paleopolyploidy in their own lineages, up to four orthologous regions can be identified in oil palm, indicating two WGDs, τ and p, and up to eight orthologous regions can be identified in rice, indicating three WGDs, τ, σ, and ρ (Figure 5). However, fast evolutionary rates and three WGDs in rice have severely increased paleoparalog loss, with only 10,877 paralogous copies of pre-τ loci retained (excluding tandem duplications) among 39,049 rice genes. In contrast, oil palm evolved at a much slower rate (about one half that of grasses) and had one less WGD; 16,092 pre-τ paralogs are retained (excluding tandem duplications) among 32,225 oil palm genes. In both genomes, recurring WGDs severely eroded signals of paleo-paralogy from τ, with pre-τ loci retained at 1.4 copies on average in rice (versus 8 at full retention) and 1.5 copies on average in oil palm (versus 4 at full retention).
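The retention figures quoted above can be put on a common scale with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the function names are introduced here for illustration, and the "fraction of full retention" is a descriptive ratio chosen for this example, not a statistic reported by the authors.

```python
def fraction_of_full_retention(mean_copies, wgd_rounds):
    """Mean copies per pre-tau locus divided by the 2**n copies expected with no loss."""
    return mean_copies / (2 ** wgd_rounds)


def paralog_share(paralog_count, gene_count):
    """Share of annotated genes that are retained pre-tau paralogs."""
    return paralog_count / gene_count


# Rice: three WGDs (tau, sigma, rho); oil palm: two WGDs (tau, p).
rice_retention = fraction_of_full_retention(1.4, 3)
palm_retention = fraction_of_full_retention(1.5, 2)

rice_share = paralog_share(10_877, 39_049)
palm_share = paralog_share(16_092, 32_225)

print(f"rice: {rice_retention:.1%} of full retention, {rice_share:.1%} of genes")
print(f"oil palm: {palm_retention:.1%} of full retention, {palm_share:.1%} of genes")
```

On this scale rice keeps roughly half as much of its maximum possible paleo-paralogy as oil palm, and retained pre-τ paralogs make up about 28% of rice genes versus about 50% of oil palm genes.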

Simultaneous Circumscription of Paleopolyploidy Events in Multiple Lineages

To integrate many of the individual synteny analyses described above, a streamlined procedure (see Methods) was applied to the rice, sorghum, banana, oil palm, grape, and sacred lotus genomes. The six genomes were compared in a pairwise manner and segmented at boundaries of synteny blocks, forming sets of putative ancestral regions (PARs; clusters of homoeologous regions with high anchor density) as described (Tang et al., 2010). By integrating syntenic mapping information from 1217 groups of PARs from 15 pairs of genome comparisons, paleopolyploidy levels were inferred on each branch of the given species phylogeny (Supplemental Figure 4). Two alternative topologies of the species tree were used to reflect contending evolutionary scenarios, which differ in whether Arecales are sister to both Poales (containing grasses) and Zingiberales (containing banana) (The Angiosperm Phylogeny Group, 2009) or whether Arecales and Zingiberales split after their most recent common ancestor (MRCA) diverged from the Poales MRCA (D’Hont et al., 2012; Singh et al., 2013). Estimation on the two species trees each inferred two genome duplications (σ and ρ) on the pan-grass lineage and one (τ) preceding the grass-palm-banana divergence. The results verified the individual analyses from previous studies and from the sections above and systematically dissected the six genomes into homoeologous clusters, the PARs, as produced from their paleopolyploid history.

DISCUSSION

Integrating synteny and phylogenomic comparisons achieves better resolution of ancient polyploidy events than either approach individually, a principle that is exemplified in the disambiguation of a WGD series of ρ-σ-τ in the grass lineages that echoes the α-β-γ series previously revealed in the Arabidopsis lineage.

Using this integrated approach, we were able to reveal a WGD, τ, in the early evolution of monocots, also confirming nine previously identified WGDs (Figure 4). Such an additional WGD was hinted at previously (Tang et al., 2010) but not discernible due to lack of genomic data from non-grass monocots. We infer τ to have occurred near the origin of monocots, before the origin of commelinids (which spans all sequenced monocot genomes thus far) but after the monocot-eudicot split. After this shared ancestral tetraploidy, grass lineages experienced another two tetraploidies. Oil palm experienced one tetraploidy, likely shared with other palms (Singh et al., 2013). Banana experienced three additional duplications (D’Hont et al., 2012), making it among the more duplicated monocot genomes known, a paleo-dotriacontaploid (2n=32x) like maize (Zea mays; Schnable et al., 2009).

Our results clearly showed that τ is shared by all 11 sequenced monocot genomes, i.e., all commelinids. Commelinids originated ~120 to ~100 MYA, not long after the origin of monocots ~150 to ~130 MYA (Hedges et al., 2006; The Angiosperm Phylogeny Group, 2009), and comprise ~18,800 Old World and New World monocot species that include many economically important plants, such as palm, banana, ginger, cereals, and grasses. The estimated timing of τ is very close to the monocot origin, making it analogous to the γ event foreshadowing the initial radiation of core eudicots (Figure 4A). Additional basal monocot genome sequences are needed to resolve the exact timing of τ.

Integrating Synteny and Phylogenomic Approaches for Systematic Genome Comparison

Repeated polyploidization and subsequent diploidization through fractionation and structural rearrangement are genome-wide mutational forces and a striking feature in plants, presumably contributing to their marvelous genetic, phenotypic, and ecological diversity (Bowers et al., 2003; Blanc and Wolfe, 2004; Adams and Wendel, 2005; Freeling and Thomas, 2006; Doyle et al., 2008; Fawcett et al., 2009; Freeling, 2009; Schnable et al., 2009; Soltis et al., 2009; Van de Peer et al., 2009; Paterson et al., 2010; D’Hont et al., 2012; Ming et al., 2013). Nonetheless, conserved genetic content and order are observed to different degrees in closely and distantly related plant genomes (Bonierbale et al., 1988; Kowalski et al., 1994; Paterson et al., 1996; Ku et al., 2000; Tang et al., 2008a; Tang et al., 2010). However, plant genome comparisons differ qualitatively from those that are suitable for mammals and most other taxa and require methods that reflect and represent the multiway homoeologous relationships resulting from paleopolyploidies.

Table 1. Multiplicity Ratios between Pairs of Genomes Resulting from Independent WGDs (Bottom Left Section; Row Genome:Column Genome) and Number of Anchors, with Number of Synteny Blocks in Parentheses (Upper Right Section)

              Rice   Sorghum      Banana          Oil Palm        Grape           Sacred Lotus
Rice          –      18,377 (57)  18,871 (1,779)  15,879 (826)    10,582 (956)    11,518 (1,015)
Sorghum       1:1    –            18,681 (1,755)  15,447 (802)    10,242 (913)    11,001 (958)
Banana        8:4    8:4          –               20,481 (1,546)  12,718 (1,363)  12,725 (1,327)
Oil palm      2:4    2:4          2:8             –               16,504 (1,007)  17,798 (1,056)
Grape         3:8    3:8          3:16            3:4             –               18,003 (685)
Sacred lotus  2:8    2:8          2:16            2:4             2:3             –

Early studies of WGD using whole-genome sequences were phylogenetic, based either on synonymous substitution rate (Ks) distribution between duplicated gene pairs (Lynch and Conery, 2000; Blanc and Wolfe, 2004) or topology of gene family trees (Bowers et al., 2003). Age grouping of paralog Ks values can be calculated without information about gene position and is the only method for dating WGDs using transcriptome data alone, but it risks blending gene families with different modes of Ks divergence. In addition, for ancient WGDs, base substitution often approaches saturation and sequence-based methods become less useful. An early phylogenetic approach for dating WGD based on discriminating duplication-first (external) and speciation-first (internal) tree topologies (Bowers et al., 2003) was constrained by limited EST sampling, data for few taxa, and lineage evolutionary rate variation (Tang et al., 2008b), although later improved with more rigorous taxon sampling in the phylogenomic dating approach (Jiao et al., 2011).

While the phylogenomic approach utilizes temporal signals, the synteny approach identifies WGDs from spatial signals, i.e., conservation of gene position and order along the chromosomes (Vandepoele et al., 2002; Haas et al., 2004; Tang et al., 2008a). Synteny conservation provides independent and reliable phylogenetic signals unaffected by DNA substitution rate variation (Tang et al., 2008a). In synteny-based WGD dating, coalescence of duplicated regions can be traced before or after speciation depending on the patterns of one-to-one correspondence versus one-to-multiple (or multiple-to-multiple) correspondence (Figure 1B).

Integration of temporal and spatial evidence (Figure 1) is by far the most accurate approach for the circumscription of WGD events. It also naturally connects studies of genome structure and molecular evolution. With the fast growth of genomic data, systematic comparisons of divergent taxa aided by WGD-aware ancestral reconstruction promise better delineation and interpretation of plant genome evolution by providing an effective framework for multiway genomic alignments, with the unique advantage of tolerating long evolutionary distance and extensive genome rearrangement.

Figure 4. Comparison of Six Genomes and History of Paleopolyploidization.

(A) Phylogeny of lineages under study with other angiosperm lineages abstracted. Paleopolyploidy events are represented by orange (duplication) or brown (triplication) circles filled with their names. Color shades on tree branches indicate positions of genome comparison and whether they precede or postdate the adjacent polyploidies.
(B) Dot plots (dots represent matching loci between two genomes) of example regions from each of the designated genome comparisons are shown, outlined by the same color as on the tree. Quota ratios of the alignments are shown below the dot plots. Paleopolyploidy events underlying the alignment patterns are detailed in the text. Geologic timing is based on estimation, as precise values are usually unknown for ancient events.

The Value of Ancestral Gene Order in Genome Comparisons

Inferred or approximate ancestral gene order often provides the best reference to thread genomic alignments in taxa with WGDs (Bowers et al., 2003; Kellis et al., 2004; Zheng et al., 2013). First, it compensates for post-WGD gene loss, increasing the proportion of aligned genes among homoeologous regions. Second, it expands synteny blocks as lineage-specific breakpoints are removed. Therefore, ancestral reconstruction reverses the effects of more recent WGDs and better reveals the interleaving pattern of gene loss (as illustrated in Figure 1; Kellis et al., 2004), greatly facilitating recovery of full syntenic mapping among homoeologous regions. When a WGD-free outgroup genome is not available, as is the case in many plant clades, ancestral gene order can be approximated by in silico reconstruction. Reconstructed ancestral genomes, while not necessarily identical to the true ancestral genome, are important references in effective genome alignments.

Figure 5. Multiple Alignment of a Set of Syntenic Regions in Oil Palm, Rice, and Lotus.

Triangles represent individual genes and their transcriptional orientations. Genes with no syntenic matches in the selected regions are not plotted. The event τ is the pan-Commelinid paleotetraploidy that is shared by the oil palm and rice lineages but not with the dicots. The events σ and ρ are two paleotetraploidies in the ancestral grass lineages that are not shared with the palm and banana lineages. The event λ is the paleotetraploidy in the lotus (Nelumbo nucifera) lineage in basal eudicots. Aligned genes in the homoeologous regions are merged into consensus orders to approximate the preduplicated ancestral regions. Ancestral genes with uncertain orientations are represented by squares.

Recurring polyploidization and diploidization, ubiquitous in plant evolution, greatly obscure the network of homology in plant genome comparisons. More than 20 ancestral and independent paleopolyploidies have been identified in ~50 sequenced angiosperms, affecting 100% of lineages. Consequently, many angiosperm genomes contain populations of homoeologous regions of different depth resulting from combination of different levels of paleopolyploidization and postpolyploidy gene loss (examples from the six studied genomes are shown in Supplemental Figure 5). Inter-genomic comparisons are complicated as well. Arabidopsis thaliana and rice, for example, have 829 synteny blocks averaging 43 genes, resulting from six WGDs, three in each lineage. A comparison between diploid Brassica rapa and banana genomes would involve as many as 52 copies involving eight WGD events (3 × 2 × 2 × 3 = 36 in Brassica and 2 × 2 × 2 × 2 = 16 in banana). It is beyond current technology to directly find and align all the homoeologous regions in such genomes. Instead, ancestral reconstruction based on an established framework of historical WGDs is able to effectively recover the grand hierarchy of homoeology.
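The copy-number arithmetic in the Brassica rapa and banana example can be checked with a short sketch. The helper name `homoeolog_depth` is hypothetical, and the per-event multipliers (3 for a triplication, 2 for a duplication) are simply read off the expression in the text; the calculation ignores postpolyploidy gene loss and is not code from this study.

```python
from math import prod


def homoeolog_depth(event_multipliers):
    """Maximum number of homoeologous copies of an ancestral region expected
    in one genome after a series of polyploidy events (gene loss ignored)."""
    return prod(event_multipliers)


# Multipliers copied from the expression in the text.
brassica_depth = homoeolog_depth([3, 2, 2, 3])
banana_depth = homoeolog_depth([2, 2, 2, 2])

# Total homoeologous regions to align in a Brassica-banana comparison.
total_regions = brassica_depth + banana_depth
print(brassica_depth, banana_depth, total_regions)
```

The depth grows multiplicatively with each event, which is why direct all-against-all alignment quickly becomes impractical and an ancestral reference is attractive.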

Variation of Lineage Nucleotide Evolutionary Rates and Estimated Ages of σ and τ

Nucleotide evolutionary rates can vary greatly among sites, gene families, nuclear and organellar genomes (Wolfe et al., 1987; Zhang et al., 2002; Mower et al., 2007; Gaut et al., 2011), and lineages with different life history traits and population characters (Smith and Donoghue, 2008; Gaut et al., 2011). Shared paleopolyploidy events provide inherent reference points to calibrate evolutionary rates among affected lineages. Homoeologs bearing synteny information are also intrinsically more accurate phylogenetic markers for multicopy gene families. By applying such reasoning, the Vitis lineage nucleotide substitution rate is estimated to be less than half that of Arabidopsis (Tang et al., 2008b), and the Nelumbo lineage rate is 30% slower than Vitis (Ming et al., 2013). The synonymous site substitution rate between paralogs formed in the shared τ WGD (Supplemental Figure 6) is ~1.7 times larger in rice than oil palm. This is much less than the ~5-fold difference between grasses (faster) and palms estimated from chloroplast ribulose-bisphosphate carboxylase large subunit (rbcL) genes (Gaut et al., 2011) and ~4-fold difference estimated from a combination of rDNA, chloroplast, and mitochondrial genes (Smith and Donoghue, 2008). These differences in rate estimates emphasize the heterogeneous nature of molecular evolution in plant genomes. Reliable WGD and homoeology identification is clearly essential for detailed evolutionary analysis on a genome-wide scale.

The pan-grass ρ event was estimated to be ~70 million years old (Paterson et al., 2004) based on descendant paralogs in rice having a Ks distribution of mode ~0.86 and estimated rice lineage rate of 6.5e-9 substitutions per synonymous site per year. Using the same rate, the age of σ (paralogous Ks mode ~1.65) was estimated to be ~127 MYA. The palm WGD p event was estimated to be ~75 MYA (D’Hont et al., 2012; Singh et al., 2013), with descendant paralogs in oil palm having a Ks distribution of mode ~0.36, giving an estimated oil palm lineage rate of ~2.4e-9. Paralogs of τ have Ks distributions of mode ~1.13 in oil palm and ~1.87 in rice. Taking the average rate of rice and oil palm lineages (4.45e-9) to be the approximate substitution rate on their MRCA lineage, the age of τ can be estimated to be ~73 MYA before the Arecaceae-Poaceae split using Ks distribution of oil palm paralogs or ~64 MYA using rice Ks distribution. Again, we note that recent fossil discoveries (Prasad et al., 2005) have a host of implications for estimated evolutionary rates and associated dates of these and many other key events in monocot evolution, a topic that we are exploring comprehensively under separate cover. Taking into consideration the uncertainties in molecular evolutionary rate estimation, and availability of paleontological records, τ likely occurred in primitive monocot branches. Due to a lack of fully sequenced early-diverging monocot genomes and limitation of resolution in current sequence-based dating methods, further refinement of the exact timing of σ and τ awaits future studies.

While disambiguation of ρ – σ – τ was manual, this approach to integrating synteny and phylogenomic comparisons and demonstration of its value to resolve otherwise cryptic paleopolyploidy events has provided results useful to validate efforts to automate the approach, another topic that we plan to report on comprehensively elsewhere.
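The Ks-based age estimates given earlier in this section follow from the standard relation T = Ks / (2 × rate), where the factor of 2 accounts for both paralogous copies accumulating substitutions. The sketch below reworks those numbers. The function names are hypothetical, and the 100 MYA Arecaceae-Poaceae split time is an assumption that is not stated in this passage; it was chosen from within the ~120 to ~100 MYA range given for the commelinid origin because it reproduces the reported tau figures.

```python
def wgd_age_mya(ks_mode, rate):
    """Age (million years) of a duplication from its paralog Ks mode,
    using T = Ks / (2 * rate)."""
    return ks_mode / (2.0 * rate) / 1e6


def lineage_rate(ks_mode, age_mya):
    """Substitutions per synonymous site per year implied by a dated WGD."""
    return ks_mode / (2.0 * age_mya * 1e6)


def age_before_split_mya(ks_mode, rate_after, rate_before, split_mya):
    """Time (million years) between a WGD and a later lineage split, when the
    rate differs before (ancestral branch) and after (descendant lineage)."""
    ks_per_copy = ks_mode / 2.0
    ks_since_split = rate_after * split_mya * 1e6
    return (ks_per_copy - ks_since_split) / rate_before / 1e6


RICE_RATE = 6.5e-9        # from the text
SPLIT_MYA = 100.0         # ASSUMPTION: not stated in this passage

rho_age = wgd_age_mya(0.86, RICE_RATE)      # close to the cited ~70 MYA
sigma_age = wgd_age_mya(1.65, RICE_RATE)    # reported as ~127 MYA
palm_rate = lineage_rate(0.36, 75.0)        # reported as ~2.4e-9
mean_rate = (RICE_RATE + palm_rate) / 2.0   # reported as 4.45e-9

tau_from_palm = age_before_split_mya(1.13, palm_rate, mean_rate, SPLIT_MYA)
tau_from_rice = age_before_split_mya(1.87, RICE_RATE, mean_rate, SPLIT_MYA)
print(round(sigma_age), round(tau_from_palm), round(tau_from_rice))
```

Because every quantity scales with the assumed rates and split time, the outputs inherit the same uncertainty that the text attributes to fossil calibration and rate heterogeneity.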

METHODS

Data Source

We selected 15 taxa to represent all of the major land plant lineages for which genome sequence data are available, including four rosid genomes (Arabidopsis thaliana, Populus trichocarpa, Theobroma cacao, and Vitis vinifera), two asterids (Solanum lycopersicum and Solanum tuberosum), one basal eudicot (Nelumbo nucifera), six monocots (Oryza sativa, Brachypodium distachyon, Sorghum bicolor, Elaeis guineensis, Phoenix dactylifera, and Musa acuminata), one lycophyte (Selaginella moellendorffii), and one moss (Physcomitrella patens). Genome data were downloaded from either Phytozome (version 6) or respective project websites.

Global Gene Family Phylogeny

The OrthoMCL method (Li et al., 2003) was used to cluster the complete set of protein-coding genes into orthogroups. For each orthogroup, amino acid alignments were generated with MAFFT (Katoh et al., 2005) using default parameters. Corresponding DNA sequences were then forced onto the amino acid alignments using custom Perl scripts. DNA alignments were trimmed using trimAl (Capella-Gutiérrez et al., 2009) using the heuristic “automated1” method. Then, maximum likelihood phylogenetic trees were constructed using RAxML version 7.2.1 (Stamatakis, 2006) with the GTRGAMMA model and 100 bootstrap replicates. In total, we constructed 11,503 phylogenetic trees containing monocot genes and at least one outgroup gene. Due to the classification stringency of OrthoMCL, two anchor genes on synteny blocks could be classified into different orthogroups; in such cases, the two orthogroups were combined to build the phylogeny used for dating the duplication of the anchors.
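The step of forcing DNA sequences onto amino acid alignments can be illustrated with a minimal sketch. The original work used custom Perl scripts that are not shown here, so the function below (`thread_codons`, a hypothetical name) is only one plausible way to do it: each aligned residue is replaced by its codon and each gap by three gap characters.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def thread_codons(aligned_protein, cds):
    """Map an unaligned coding sequence onto one gapped protein alignment row.

    aligned_protein: protein sequence with '-' for gaps.
    cds: the matching nucleotide coding sequence, without gaps.
    """
    # Split the CDS into complete codons.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")

    # Tolerate a trailing stop codon that has no residue in the protein.
    if len(codons) == n_residues + 1 and codons[-1].upper() in STOP_CODONS:
        codons = codons[:-1]
    if len(codons) != n_residues:
        raise ValueError("CDS length does not match the protein sequence")

    codon_iter = iter(codons)
    return "".join("---" if aa == "-" else next(codon_iter)
                   for aa in aligned_protein)


codon_row = thread_codons("M-KL", "ATGAAACTGTAA")
print(codon_row)
```

Threading codons this way keeps the reading frame intact, so downstream codon-aware analyses see in-frame columns.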

Phylogenomic Dating of Synteny Blocks

Duplication time of synteny blocks was dated by carefully mapping homologous anchor genes onto phylogenetic trees. We considered gene duplications only when the bootstrap (BS) value was equal to or greater than 50%. If homologous anchor genes were clustered in more than one orthogroup, we combined them and recomputed the phylogenetic trees for dating. After all anchor gene duplications were dated, we scored the duplication of synteny blocks by the following criteria: (1) at least two anchor genes for each region could be dated phylogenetically with BS ≥ 50%, and (2) the timing of the anchor gene duplications must be mostly consistent.
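The two scoring criteria can be sketched as a small decision function. This is a simplified illustration, not the authors' implementation: it treats each anchor as a (timing label, bootstrap) pair for the block as a whole, and it interprets "mostly consistent" as a strict majority, which is an assumption (`MIN_AGREEMENT` and `date_block` are hypothetical names).

```python
from collections import Counter

MIN_BOOTSTRAP = 50       # threshold stated in the text
MIN_DATED_ANCHORS = 2    # threshold stated in the text
MIN_AGREEMENT = 0.5      # ASSUMPTION: "mostly consistent" read as a majority


def date_block(anchor_calls):
    """Return the consensus duplication timing of a synteny block, or None.

    anchor_calls: list of (timing_label, bootstrap_percent), one per dated
    anchor gene duplication in the block.
    """
    supported = [label for label, bs in anchor_calls if bs >= MIN_BOOTSTRAP]
    if len(supported) < MIN_DATED_ANCHORS:
        return None
    label, count = Counter(supported).most_common(1)[0]
    if count / len(supported) > MIN_AGREEMENT:
        return label
    return None


calls = [("before split", 90), ("before split", 72),
         ("after split", 55), ("before split", 30)]
print(date_block(calls))
```

Anchors with weak bootstrap support are dropped before voting, so a single poorly resolved tree cannot change the call for a block.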

Circumscribing Paleopolyploidy Events in Multiple Lineages Based on Synteny Patterns

For each pair of the six input genomes (Supplemental Table 1), representative coding sequences from each gene locus were aligned by LASTZ (Harris, 2007) with default parameters. Weak matches with C-score (Putnam et al., 2008) <0.5 (indicating that their similarity falls below 50% of the best matched pair between the two genomes) were filtered out. Remaining matches within 30 to ~40 Manhattan distance units (40 for grape-oil palm, grape-rice, grape-sorghum, and rice-sorghum comparisons and 30 for the remaining comparisons) were clustered into the same synteny block, with blocks having at least five anchors retained. Chromosomes (or scaffolds) in each genome were segmented at major alignment breakpoints, resulting in smaller genomic regions with simple, continuous synteny patterns. Segmented regions between each pair of genomes were hierarchically clustered based on similarity of homolog density and filtered based on their statistical significance (P < 1e-30) by modeling the probability of homologous matches with a Poisson distribution as described previously. Each set of homologous regions is defined as a PAR (Tang et al., 2010). We produced a total of 1217 high-quality PARs across the studied genomes in this study.
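The filtering and clustering steps can be sketched as follows. Two assumptions are made and should be read as such: the C-score is computed with the commonly used definition score(A, B) / max(best score of A, best score of B), and anchors are grouped by single-linkage clustering on gene-rank coordinates within one chromosome pair. The function names are hypothetical, and this is not the pipeline used in the study.

```python
from collections import defaultdict


def cscore_filter(hits, cutoff=0.5):
    """Keep hits whose score is at least `cutoff` times the best score seen
    for either gene. hits: list of (gene_a, gene_b, positive_score)."""
    best = defaultdict(float)
    for a, b, score in hits:
        best[("a", a)] = max(best[("a", a)], score)
        best[("b", b)] = max(best[("b", b)], score)
    return [(a, b, s) for a, b, s in hits
            if s / max(best[("a", a)], best[("b", b)]) >= cutoff]


def cluster_anchors(points, max_dist=30, min_anchors=5):
    """Single-linkage clustering of anchors (x, y gene ranks on one chromosome
    pair) by Manhattan distance; small clusters are discarded."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (x1, y1) in enumerate(points):
        for j in range(i + 1, len(points)):
            x2, y2 = points[j]
            if abs(x1 - x2) + abs(y1 - y2) <= max_dist:
                parent[find(i)] = find(j)

    blocks = defaultdict(list)
    for i, point in enumerate(points):
        blocks[find(i)].append(point)
    return [sorted(b) for b in blocks.values() if len(b) >= min_anchors]


hits = [("a1", "b1", 100.0), ("a1", "b2", 40.0), ("a2", "b2", 60.0)]
kept = cscore_filter(hits)

anchors = [(i * 10, i * 10) for i in range(6)] + [(500, 20)]
blocks = cluster_anchors(anchors, max_dist=30, min_anchors=5)
print(len(kept), len(blocks))
```

The quadratic pairwise loop is fine for a sketch; a production tool would index anchors by position to avoid comparing distant pairs.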

Homologous regions inside each PAR were further refined at their boundaries to trim off unaligned proportions and then analyzed to infer genomic multiplicity using an evolutionary model based on pairwise synteny distance between the clustered regions. Synteny distance is calculated between pairs of aligned syntenic regions similar to the p-distance in nucleotide alignment (gene order alignment in our case), which was then used to infer their phylogenetic relationship using the UPGMA method. From the resulting tree topology, the copy number of paralogous segments in each genome was counted and used to deduce the multiplicity of genomic regions in each individual lineage since their divergence. The pairwise differences of WGD multiplicity were then combined to infer the number of WGDs on each branch of the given species phylogeny by applying the Fitch-Margoliash algorithm (Fitch and Margoliash, 1967). The Fitch-Margoliash algorithm applies a least-squares method to estimate per branch length (in our case, equivalent to the genome multiplicity) based on all pairwise distances between the genomes in comparison. The procedure was pipelined by an in-house Python script with manual inspections at various intermediate stages.
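A minimal sketch of the synteny distance and UPGMA steps is given below. The exact distance used in the study is not specified beyond its analogy to the p-distance, so the definition here (proportion of aligned gene positions retained by only one of two regions) is an assumption, as are the function names and the toy regions. The Fitch-Margoliash step is not covered.

```python
from itertools import combinations


def synteny_distance(region_a, region_b):
    """Proportion of aligned gene positions at which two regions differ.
    Each region is the set of alignment columns where it retains a gene."""
    union = region_a | region_b
    if not union:
        return 0.0
    return len(region_a ^ region_b) / len(union)


def upgma(names, dist):
    """Build a UPGMA tree as nested tuples.
    dist maps frozenset({name1, name2}) to a distance."""
    size = {name: 1 for name in names}
    d = dict(dist)
    while len(size) > 1:
        # Join the closest pair of current clusters.
        a, b = min(combinations(size, 2), key=lambda pair: d[frozenset(pair)])
        na, nb = size.pop(a), size.pop(b)
        merged = (a, b)
        # Size-weighted average distance to every remaining cluster.
        for c in size:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        size[merged] = na + nb
    return next(iter(size))


# Toy regions: two rice segments from a recent duplication and one palm segment.
regions = {
    "rice_1": {1, 2, 3, 4, 5, 6},
    "rice_2": {1, 2, 3, 4, 5, 7},
    "palm_1": {1, 3, 8, 9, 10, 11},
}
dist = {frozenset((a, b)): synteny_distance(regions[a], regions[b])
        for a, b in combinations(regions, 2)}
tree = upgma(list(regions), dist)
print(tree)
```

In the resulting topology the two rice segments group together before joining the palm segment, which is the kind of pattern from which lineage-specific copy number can be counted.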

Inferring Ancestral Gene Order Prior to Polyploidy Event

In WGDs, ancestral regions were duplicated and then diverged by sequence and structural mutations. Each descendant homoeologous region contains a fraction of mutated ancestral gene content. Once synteny blocks are identified, the ancestral content can be reconstructed by coalescence of all the descendant regions contained in them. While ancestral reconstruction can be performed at different resolutions to facilitate comparing regions with different extents of divergence, reconstruction of ancestral gene order recovers the alignability and syntenic mapping of homoeologous regions across highly diverged genomes, like those under this study. In our approach, first, multiple alignment of homoeologous regions was computed by the top-down algorithm implemented in MCscan (Tang et al., 2008b). Local alignment inconsistencies, mostly resulting from tandem gene duplications and micro-inversions, were linearized by dynamic programming maximizing anchor gene matching. The aligned homoeologous regions were then merged into a single preduplication segment by interpolation, following methods in previous studies (Bowers et al., 2003; Aury et al., 2006). This results in approximated ancestral gene order and syntenic mappings.
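The merging of aligned homoeologous regions into one preduplication segment can be sketched in a few lines. The sketch assumes that a multiple alignment already exists (each region is a row with one gene identifier or `None` per alignment column) and reduces the interpolation step to taking the union of retained positions; the function name and toy gene identifiers are hypothetical.

```python
def merge_homoeologs(aligned_regions):
    """Merge aligned homoeologous regions into an approximate ancestral order.

    aligned_regions: dict mapping region name to a list with one entry per
    alignment column (a gene ID, or None where the gene was lost).
    Returns one dict per ancestral gene slot: {region name: gene ID}.
    """
    lengths = {len(row) for row in aligned_regions.values()}
    if len(lengths) != 1:
        raise ValueError("all rows must span the same alignment columns")
    ancestral = []
    for col in range(lengths.pop()):
        retained = {name: row[col] for name, row in aligned_regions.items()
                    if row[col] is not None}
        if retained:  # skip columns lost in every region
            ancestral.append(retained)
    return ancestral


# Two regions with interleaved (complementary) gene loss.
alignment = {
    "region_A": ["a1", None, "a3", None],
    "region_B": [None, "b2", "b3", "b4"],
}
ancestral_order = merge_homoeologs(alignment)
shared = sum(1 for slot in ancestral_order if len(slot) == 2)
print(len(ancestral_order), shared)
```

In this toy case the two regions share only one gene directly, yet each maps to the four-slot ancestral segment at two and three positions, respectively, which illustrates how an ancestral reference compensates for gene loss.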

Synonymous Substitution (Ks) Calculation

For each pair of homologous genes, we aligned their protein sequences using ClustalW2 (Larkin et al., 2007) and converted the protein alignment to DNA alignment using PAL2NAL (Suyama et al., 2006). Some homologous genes could not produce reliable alignment for various reasons and were discarded from further analysis. Ks values were calculated using the Nei-Gojobori algorithm (Nei and Gojobori, 1986) implemented in the PAML package (Yang, 1997). Ks values for gene pairs with average GC3 >75% were considered unreliable and discarded (Tang et al., 2010). Ks values >3.0 indicate saturated substitutions at synonymous sites, and those gene pairs were excluded from further analysis.
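The two quality filters on Ks values can be expressed compactly. In this sketch, "average GC3" is taken to be the mean third-codon-position GC content of the two genes in a pair, which is an assumption; the function names are hypothetical, and the Ks value itself is supplied as input rather than computed.

```python
def gc3(cds):
    """GC fraction at third codon positions of an in-frame coding sequence."""
    third_positions = cds[2::3].upper()
    if not third_positions:
        raise ValueError("coding sequence is shorter than one codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


def keep_pair(cds_a, cds_b, ks, max_gc3=0.75, max_ks=3.0):
    """Apply the GC3 and Ks saturation filters described in the text."""
    mean_gc3 = (gc3(cds_a) + gc3(cds_b)) / 2.0
    return mean_gc3 <= max_gc3 and ks <= max_ks


print(keep_pair("ATGAAA", "ATGAAT", ks=1.2))   # moderate GC3, unsaturated Ks
print(keep_pair("ATGGCC", "ATGGCG", ks=1.2))   # GC3 above the threshold
print(keep_pair("ATGAAA", "ATGAAT", ks=3.5))   # saturated Ks
```

Filtering high-GC3 pairs matters particularly in grasses, where GC-rich genes can yield distorted Ks estimates.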

Supplemental Data

The following materials are available in the online version of this article.

Supplemental Figure 1. Exemplar Maximum Likelihood Phylogeny of WRKY Gene Family.

Supplemental Figure 2. Exemplar Maximum Likelihood Phylogeny of Zinc Finger Gene Family.

Supplemental Figure 3. Example Alignment Showing Eight Homologous Musa Regions Duplicated after the Divergence of Musa and Oryza.

Supplemental Figure 4. WGD Multiplicity Level Estimation on Each Branch of Given Species Phylogeny.

Supplemental Figure 5. Homoeolog Depth (Paleopolyploidy Level) of Genomic Regions in the Studied Taxa.

Supplemental Figure 6. Distributions of Synonymous Substitution Rates (Ks) among Groups of Orthologous or Paralogous Gene Pairs.

Supplemental Table 1. Sources and Basic Information about Angiosperm Genomes Used in Synteny Comparison.

Supplemental Data Set 1. Significant Enrichment of GO-SLIM Term for the Gene Families with Ancient Tau Duplication Measured by Hypergeometric Test Followed by FDR Multitest Adjustment Method.

ACKNOWLEDGMENTS

We appreciate funding from the National Science Foundation (DBI 0849896, MCB 0821096, and MCB 1021718).

AUTHOR CONTRIBUTIONS

Y.J., J.L., H.T., and A.H.P. designed the research. Y.J. and J.L. performed analyses. H.T. contributed analytic tools. Y.J., J.L., H.T., and A.H.P. wrote the article.

Received May 13, 2014; revised July 3, 2014; accepted July 10, 2014; published July 31, 2014.

REFERENCES

Adams, K.L., and Wendel, J.F. (2005). Polyploidy and genome evolution in plants. Curr. Opin. Plant Biol. 8: 135–141.


Al-Mssallem, I.S., Hu, S., Zhang, X., Lin, Q., Liu, W., Tan, J., Yu, X., Liu, J., Pan, L., Zhang, T., Yin, Y., Xin, C., et al. (2013). Genome sequence of the date palm Phoenix dactylifera L. Nat. Commun. 4: 2274.

Amborella Genome Project (2013). The Amborella genome and the evolution of flowering plants. Science 342: 1241089.

Arrigo, N., and Barker, M.S. (2012). Rarely successful polyploids and their legacy in plant genomes. Curr. Opin. Plant Biol. 15: 140–146.

Aury, J.M., et al. (2006). Global trends of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature 444: 171–178.

Barker, M.S., Vogel, H., and Schranz, M.E. (2009). Paleopolyploidy in the Brassicales: analyses of the Cleome transcriptome elucidate the history of genome duplications in Arabidopsis and other Brassicales. Genome Biol. Evol. 1: 391–399.

Bennetzen, J.L., et al. (2012). Reference genome sequence of the model plant Setaria. Nat. Biotechnol. 30: 555–561.

Blanc, G., and Wolfe, K.H. (2004). Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16: 1667–1678.

Bonierbale, M.W., Plaisted, R.L., and Tanksley, S.D. (1988). RFLP maps based on a common set of clones reveal modes of chromosomal evolution in potato and tomato. Genetics 120: 1095–1103.

Bowers, J.E., Chapman, B.A., Rong, J., and Paterson, A.H. (2003). Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422: 433–438.

Capella-Gutiérrez, S., Silla-Martínez, J.M., and Gabaldón, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25: 1972–1973.

Christoffels, A., Koh, E.G.L., Chia, J.M., Brenner, S., Aparicio, S., and Venkatesh, B. (2004). Fugu genome analysis provides evidence for a whole-genome duplication early during the evolution of ray-finned fishes. Mol. Biol. Evol. 21: 1146–1151.

Cui, L., et al. (2006). Widespread genome duplications throughout the history of flowering plants. Genome Res. 16: 738–749.

Dehal, P., and Boore, J.L. (2005). Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 3: e314.

D’Hont, A., et al. (2012). The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 488: 213–217.

Fawcett, J.A., Maere, S., and Van de Peer, Y. (2009). Plants with double genomes might have had a better chance to survive the Cretaceous-Tertiary extinction event. Proc. Natl. Acad. Sci. USA 106: 5737–5742.

Fitch, W.M., and Margoliash, E. (1967). Construction of phylogenetic trees. Science 155: 279–284.

Freeling, M. (2009). Bias in plant gene content following different sorts of duplication: tandem, whole-genome, segmental, or by transposition. Annu. Rev. Plant Biol. 60: 433–453.

Freeling, M., and Thomas, B.C. (2006). Gene-balanced duplications, like tetraploidy, provide predictable drive to increase morphological complexity. Genome Res. 16: 805–814.

Gaut, B., Yang, L., Takuno, S., and Eguiarte, L.E. (2011). The patterns and causes of variation in plant nucleotide substitution rates. Annu. Rev. Ecol. Evol. Syst. 42: 245–266.

Grant, D., Cregan, P., and Shoemaker, R.C. (2000). Genome organization in dicots: genome duplication in Arabidopsis and synteny between soybean and Arabidopsis. Proc. Natl. Acad. Sci. USA 97: 4168–4173.

Haas, B.J., Delcher, A.L., Wortman, J.R., and Salzberg, S.L. (2004). DAGchainer: a tool for mining segmental genome duplications and synteny. Bioinformatics 20: 3643–3646.

Harris, R.S. (2007). Improved Pairwise Alignment of Genomic DNA. PhD dissertation (The Pennsylvania State University).

Hedges, S.B., Dudley, J., and Kumar, S. (2006). TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics 22: 2971–2972.

International Brachypodium Initiative (2010). Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463: 763–768.

Jaillon, O., et al.; French-Italian Public Consortium for Grapevine Genome Characterization (2007). The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449: 463–467.

Jaillon, O., et al. (2004). Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature 431: 946–957.

Jiao, Y., et al. (2011). Ancestral polyploidy in seed plants and angiosperms. Nature 473: 97–100.

Jiao, Y., et al. (2012). A genome triplication associated with early diversification of the core eudicots. Genome Biol. 13: R3.

Katoh, K., Kuma, K., Toh, H., and Miyata, T. (2005). MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33: 511–518.

Kellis, M., Birren, B.W., and Lander, E.S. (2004). Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428: 617–624.

Kowalski, S.P., Lan, T.H., Feldmann, K.A., and Paterson, A.H. (1994). Comparative mapping of Arabidopsis thaliana and Brassica oleracea chromosomes reveals islands of conserved organization. Genetics 138: 499–510.

Ku, H.M., Vision, T., Liu, J., and Tanksley, S.D. (2000). Comparing sequenced segments of the tomato and Arabidopsis genomes: large-scale duplication followed by selective gene loss creates a network of synteny. Proc. Natl. Acad. Sci. USA 97: 9121–9126.

Larkin, M.A., et al. (2007). Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947–2948.

Li, L., Stoeckert, C.J., Jr., and Roos, D.S. (2003). OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13: 2178–2189.

Lynch, M., and Conery, J.S. (2000). The evolutionary fate and consequences of duplicate genes. Science 290: 1151–1155.

Maere, S., De Bodt, S., Raes, J., Casneuf, T., Van Montagu, M., Kuiper, M., and Van de Peer, Y. (2005). Modeling gene and genome duplications in eukaryotes. Proc. Natl. Acad. Sci. USA 102: 5454–5459.

Mayrose, I., Zhan, S.H., Rothfels, C.J., Magnuson-Ford, K., Barker, M.S., Rieseberg, L.H., and Otto, S.P. (2011). Recently formed polyploid plants diversify at lower rates. Science 333: 1257.

Ming, R., et al. (2013). Genome of the long-living sacred lotus (Nelumbo nucifera Gaertn.). Genome Biol. 14: R41.

Ming, R., et al. (2008). The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature 452: 991–996.

Mower, J.P., Touzet, P., Gummow, J.S., Delph, L.F., and Palmer, J.D. (2007). Extensive variation in synonymous substitution rates in mitochondrial genes of seed plants. BMC Evol. Biol. 7: 135.

Nei, M., and Gojobori, T. (1986). Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3: 418–426.

Ohno, S. (1970). Evolution by Gene Duplication. (Berlin: Springer).

Otto, S.P., and Whitton, J. (2000). Polyploid incidence and evolution. Annu. Rev. Genet. 34: 401–437.

Paterson, A.H., et al. (1996). Toward a unified genetic map of higher plants, transcending the monocot-dicot divergence. Nat. Genet. 14: 380–382.

Paterson, A.H., Bowers, J.E., and Chapman, B.A. (2004). Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc. Natl. Acad. Sci. USA 101: 9903–9908.

Paterson, A.H., Freeling, M., Tang, H., and Wang, X. (2010). Insights from the comparison of plant genome sequences. Annu. Rev. Plant Biol. 61: 349–372.


Paterson, A.H., Chapman, B.A., Kissinger, J.C., Bowers, J.E., Feltus, F.A., and Estill, J.C. (2006). Many gene and domain families have convergent fates following independent whole-genome duplication events in Arabidopsis, Oryza, Saccharomyces and Tetraodon. Trends Genet. 22: 597–602.

Paterson, A.H., et al. (2009). The Sorghum bicolor genome and the diversification of grasses. Nature 457: 551–556.

Prasad, V., Strömberg, C.A., Alimohammadian, H., and Sahni, A. (2005). Dinosaur coprolites and the early evolution of grasses and grazers. Science 310: 1177–1180.

Putnam, N.H., et al. (2008). The amphioxus genome and the evolution of the chordate karyotype. Nature 453: 1064–1071.

Schnable, P.S., et al. (2009). The B73 maize genome: complexity, diversity, and dynamics. Science 326: 1112–1115.

Singh, R., et al. (2013). Oil palm genome sequence reveals divergence of interfertile species in old and new worlds. Nature 500: 335–339.

Smith, S.A., and Donoghue, M.J. (2008). Rates of molecular evolution are linked to life history in flowering plants. Science 322: 86–89.

Soltis, D.E., Albert, V.A., Leebens-Mack, J., Bell, C.D., Paterson, A.H., Zheng, C., Sankoff, D., Depamphilis, C.W., Wall, P.K., and Soltis, P.S. (2009). Polyploidy and angiosperm diversification. Am. J. Bot. 96: 336–348.

Stamatakis, A. (2006). RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688–2690.

Suyama, M., Torrents, D., and Bork, P. (2006). PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34: W609–W612.

Tang, H., Bowers, J.E., Wang, X., and Paterson, A.H. (2010). Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proc. Natl. Acad. Sci. USA 107: 472–477.

Tang, H., Bowers, J.E., Wang, X., Ming, R., Alam, M., and Paterson, A.H. (2008a). Synteny and collinearity in plant genomes. Science 320: 486–488.

Tang, H., Wang, X., Bowers, J.E., Ming, R., Alam, M., and Paterson, A.H. (2008b). Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps. Genome Res. 18: 1944–1954.

The Angiosperm Phylogeny Group (2009). An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Bot. J. Linn. Soc. 161: 105–121.

Van de Peer, Y. (2011). A mystery unveiled. Genome Biol. 12: 113.

Van de Peer, Y., Maere, S., and Meyer, A. (2009). The evolutionary significance of ancient genome duplications. Nat. Rev. Genet. 10: 725–732.

Vandepoele, K., Saeys, Y., Simillion, C., Raes, J., and Van De Peer, Y. (2002). The automatic detection of homologous regions (ADHoRe) and its application to microcolinearity between Arabidopsis and rice. Genome Res. 12: 1792–1801.

Vision, T.J., Brown, D.G., and Tanksley, S.D. (2000). The origins of genomic duplications in Arabidopsis. Science 290: 2114–2117.

Wang, X., Shi, X., Hao, B., Ge, S., and Luo, J. (2005). Duplication and DNA segmental loss in the rice genome: implications for diploidization. New Phytol. 165: 937–946.

Wang, X., et al.; Brassica rapa Genome Sequencing Project Consortium (2011). The genome of the mesopolyploid crop species Brassica rapa. Nat. Genet. 43: 1035–1039.

Wang, Y., Tang, H., Debarry, J.D., Tan, X., Li, J., Wang, X., Lee, T.H., Jin, H., Marler, B., Guo, H., Kissinger, J.C., and Paterson, A.H. (2012). MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40: e49.

Wolfe, K.H., Li, W.H., and Sharp, P.M. (1987). Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc. Natl. Acad. Sci. USA 84: 9054–9058.

Yang, Z. (1997). PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13: 555–556.

Zhang, L., Vision, T.J., and Gaut, B.S. (2002). Patterns of nucleotide substitution among simultaneously duplicated gene pairs in Arabidopsis thaliana. Mol. Biol. Evol. 19: 1464–1473.

Zheng, C., Chen, E., Albert, V.A., Lyons, E., and Sankoff, D. (2013). Ancient eudicot hexaploidy meets ancestral eurosid gene order. BMC Genomics 14 (suppl. 7): S3.


DOI 10.1105/tpc.114.127597; originally published online July 31, 2014; Plant Cell


ADVANCING THE SCIENCE OF PLANT BIOLOGY © American Society of Plant Biologists