View
6
Download
0
Category
Preview:
Citation preview
1
LARGE-SCALE BIOLOGY 1 2
Breaking Free: the Genomics of Allopolyploidy-facilitated Niche Expansion 3
in White Clover 4
5
Andrew G. Griffiths1*
, Roger Moraga1, Marni Tausen
2,3, Vikas Gupta
2, Timothy P. 6
Bilton4, Matthew A. Campbell
5, Rachael Ashby
4, Istvan Nagy
6, Anar Khan
4, Anna 7
Larking1, Craig Anderson
1, Benjamin Franzmayr
1, Kerry Hancock
1, Alicia Scott
1, Nick 8
W. Ellison1, Murray P. Cox
5, Torben Asp
6, Thomas Mailund
3, Mikkel H. Schierup
3,7, 9
and Stig Uggerhøj Andersen2*
10 *To whom correspondence should be addressed. 11
12
Author affiliations: 13 1AgResearch, Grasslands Research Centre, Private Bag 11008, Palmerston North 4442, New 14
Zealand. 15 2Department of Molecular Biology and Genetics, Aarhus University, Gustav Wieds Vej 10, 16
8000 Aarhus C, Denmark. 17 3Bioinformatics Research Centre, Aarhus University, C. F. Møllers Allé 8, 8000 Aarhus C, 18
Denmark. 19 4AgResearch, Invermay Agricultural Centre, Private Bag 50034, Mosgiel 9053, New 20
Zealand. 21 5Bioinformatics and Statistics Group, Institute of Fundamental Sciences, Massey University, 22
Private Bag 11222, Palmerston North 4410, New Zealand. 23 6Department of Molecular Biology and Genetics, Aarhus University, Forsøgsvej 1, 4200 24
Slagelse, Denmark. 25 7Department of Bioscience, Aarhus University, Ny Munkegade 116, 8000 Aarhus C, 26
Denmark. 27
Corresponding authors 28
Andrew Griffiths 29
Address: AgResearch, Grasslands Research Centre, Private Bag 11008, Palmerston North 30
4442, New Zealand. 31
Tel: +6463518183 32
Email: andrew.griffiths@agresearch.co.nz 33
Stig U. Andersen 34
Address: Department of Molecular Biology and Genetics, Aarhus University, Gustav Wieds 35
Vej 10, 8000 Aarhus C, Denmark. 36
Tel: +4587154937 37
Email: sua@mbg.au.dk 38
39
Short Title: Allopolyploidy-facilitated niche expansion in White Clover 40
41
One-sentence Summary: Allopolyploid white clover arose in the last glaciation, retained 42
progenitor genome integrity, and the few genes switching between homoeologues across 43
tissues were enriched for flavonoid synthesis. 44
45
Material Distribution Footnote: The authors responsible for distribution of materials 46
integral to the findings presented in this article in accordance with the policy described in the 47
Plant Cell Advance Publication. Published on April 25, 2019, doi:10.1105/tpc.18.00606
©2019 American Society of Plant Biologists. All Rights Reserved
2
Instructions for Authors (www.plantcell.org) are: Andrew G. Griffiths 48
(andrew.griffiths@agresearch.co.nz) and Stig Uggerhøj Andersen (sua@mbg.au.dk). 49
50
ABSTRACT 51
The merging of distinct genomes, allopolyploidization, is a widespread phenomenon in 52
plants. It generates adaptive potential through increased genetic diversity, but examples 53
demonstrating its exploitation remain scarce. White clover, Trifolium repens, is a ubiquitous 54
temperate allotetraploid forage crop derived from two European diploid progenitors confined 55
to extreme coastal or alpine habitats. We sequenced and assembled the genomes and 56
transcriptomes of this species complex to gain insight into the genesis of white clover and the 57
consequences of allopolyploidization. Based on these data, we estimate that white clover 58
originated ~15-28,000 years ago during the last glaciation when alpine and coastal 59
progenitors were likely co-located in glacial refugia. We found evidence of progenitor 60
diversity carry-over through multiple hybridization events and show that the progenitor 61
subgenomes have retained integrity and gene expression activity as they travelled within 62
white clover from their original confined habitats to a global presence. At the transcriptional 63
level, we observed remarkably stable subgenome expression ratios across tissues. Among the 64
few genes that show tissue-specific switching between homoeologous gene copies, we found 65
flavonoid biosynthesis genes strongly overrepresented, suggesting an adaptive role of some 66
allopolyploidy-associated transcriptional changes. Our results highlight white clover as an 67
example of allopolyploidy-facilitated niche expansion, where two progenitor genomes, 68
adapted and confined to disparate and highly specialized habitats, expanded to a ubiquitous 69
global presence following glaciation-associated allopolyploidization. 70
Keywords: polyploidy; Trifolium; genome; progenitor; subgenome; evolution; legume; plant; 71
flavonoid; homoeologous gene expression; diversity; hybridization timing 72
73
74
3
INTRODUCTION 75
Polyploidy, where more than two genomes reside in a single nucleus, is widely distributed 76
among eukaryotes and is particularly prevalent in angiosperms (Otto and Whitton, 2000; 77
Mable, 2003; Wood et al., 2009), where it is considered a driver of speciation and 78
biodiversity (Leitch and Leitch, 2008). Genome duplication leads to autopolyploidy, whereas 79
interspecific hybridization followed by genome doubling or fusion of unreduced gametes 80
results in allopolyploidy, where divergent homoeologous subgenomes reside within the same 81
nucleus. These duplications can be recent (< 150 years) (Ownbey, 1950; Soltis et al., 2004; 82
Ainouche et al., 2009; Chester et al., 2012) or ancient, extending back to the origin of seed 83
plants 350 million years ago (Jia et al., 2013; Li et al., 2015). In many lineages, such 84
duplications are a recurring phenomenon (Soltis et al., 2016; Soltis and Soltis, 2016). Many 85
neopolyploids likely become extinct shortly after formation, and there is much debate over 86
the implications of polyploidization and whether it is an evolutionary jump-start or dead-end 87
(Otto, 2007; Madlung, 2013). Polyploidy events can confer enhanced fitness, phenotypic 88
plasticity and adaptability, and have been correlated with survival in stressful environments, 89
such as during the Cretaceous-Paleogene mass extinction (Gross et al., 2004; Leitch and 90
Leitch, 2008; Fawcett et al., 2009; Vanneste et al., 2014b; Vanneste et al., 2014a; Selmecki et 91
al., 2015). Consequences of genome duplication, particularly for allopolyploids, can include 92
rapid gene loss and rearrangement (McClintock, 1984; Rapp et al., 2009; Grover et al., 2012; 93
Tayalé and Parisod, 2013; Yoo et al., 2013; Garsmeur et al., 2014; Woodhouse et al., 2014) 94
through a range of mechanisms in which transposable element activity is implicated 95
(Woodhouse et al., 2014). Many studied allopolyploids show evidence of a genomic or 96
transcriptomic response to accommodating divergent genomes within a single nucleus, and 97
these processes can occur within very few generations (Chester et al., 2012; Grover et al., 98
2012; Tayalé and Parisod, 2013). 99
White clover (Trifolium repens L.) is an example of a relatively recent allopolyploid that 100
likely arose during the last major glaciations 13,000 to 130,000 years ago (Williams et al., 101
2012). A successful and highly variable perennial allotetraploid (AABB-type genome; 102
2n=4x=32) legume exhibiting disomic inheritance (Williams et al., 1998), it has a relatively 103
compact genome (1C=1093 Megabases (Mb)) (Bennett and Leitch, 2011) arranged in small 104
similar-sized chromosomes with few obvious features to distinguish homoeologues (Ansari et 105
al., 1999). Furthermore, its outbreeding nature leads to significant sequence polymorphism 106
and highly heterogeneous populations (Aasmo Finne et al., 2000; Zhang et al., 2010). White 107
clover has a widespread natural range encompassing grasslands of Europe, Western Asia and 108
4
North Africa across latitudes and altitudes (Daday, 1958) (Figure 1). Due to this ability to 109
occupy a wide range of climatic conditions, white clover is used extensively as a companion 110
forage in moist temperate agriculture, thereby extending its range globally (Daday, 1958; 111
Abberton et al., 2006) (Figure 1). Extant relatives of the paternal and maternal progenitors of 112
white clover have been identified as T. occidentale Coombe (Western Clover) and T. 113
pallescens Schreb. (Pale Clover) (Ellison et al., 2006; Williams et al., 2012). In contrast to 114
white clover, these diploid (2n=2x=16) species have very limited and disparate extreme 115
habitats at the margins of, but not overlapping with, the range of the white clover. The 116
creeping T. occidentale (To) is confined to within ~100 m of the shore in a maritime niche on 117
the western coasts of Europe (Coombe, 1961), whereas the non-creeping T. pallescens (Tp) is 118
restricted to European alpine habitats at altitudes between 1800 and 2700 m (Raffl et al., 119
2008) (Figure 1). The allopolyploidization event giving rise to white clover is thought to be 120
relatively recent due to the lack of significant divergence of To genic sequence and the To-121
like homoeologues, and is hypothesized to have occurred during the last major glaciations 122
when environmental conditions brought alpine and maritime species in close proximity in 123
glacial refugia (Williams et al., 2012). In contrast to white clover, which occupies markedly 124
different environments to To and Tp, most allopolyploids with extant progenitors share 125
similar habitats with either progenitor (Soltis et al., 2016). 126
Here, we sequenced and analysed the full genomes and transcriptomes of white clover and 127
extant relatives of its diploid progenitors to gain insight into the timing, process and 128
consequences of the allopolyploidization event. 129
130
131
5
RESULTS 132
Assembly of white clover and progenitor genomes 133
We sequenced inbred individuals of white clover (Tr) and extant relatives of its diploid 134
progenitors, To and Tp, using a range of Illumina sequencing libraries, including white clover 135
TruSeq Synthetic Long-Reads (TSLRs) (Supplemental Table 1). Extant progenitor genome 136
size estimates derived from the sequence data were similar (~530 Mb), together equalling 137
approximately the 1,174 Mb size estimate of the white clover genome (Table 1; Figure 1; 138
Supplemental Figure 1). To and Tp assemblies encompassed, respectively, 437 Mb 139
(N50=192 kilobases (kb)) and 382 Mb (N50=173 kb) of sequence, accounting for 82% and 140
72% of the predicted genome sizes (Figure 1, Table 1). The white clover assembly spanned 141
841 Mb (N50 = 122 kb), and approximated the sum of the assemblies of the extant 142
progenitors; suggesting that conflation of homoeologous sequences had been successfully 143
eliminated (Figure 1, Table 1). The Tp assembly was more fragmented than the other 144
species, which was expected due to the lower sequencing data input (Table 1, Supplemental 145
Table 1). 146
Scaffold ordering, pseudomolecule construction and assignment to subgenomes was guided 147
by pairwise linkage disequilibrium (LD), calculated as r2, between ~7,300 Genotyping-by-148
Sequencing (GBS) Single Nucleotide Polymorphism (SNP) markers on 3,364 scaffolds 149
(Figure 2A; Supplemental Figure 2), and alignment to the model forage legume Medicago 150
truncatula genome (Mt4.0) (Tang et al., 2014). Incorporation of single locus homoeologue-151
specific (SLHS) simple sequence repeat (SSR) markers (Supplemental Table 2) in the LD 152
analysis enabled alignment of pseudomolecules and homoeologues with a previous SSR-153
based genetic linkage map (Griffiths et al., 2013) (Supplemental Figure 2C-D). 154
The white clover pseudomolecules were used to order the assemblies for To and Tp, and the 155
resulting pseudomolecules were of similar size for the progenitors and their corresponding 156
white clover subgenomes (Figure 2B; Supplemental Table 3). As an assessment of genome 157
coverage in the assemblies, whole genome shotgun Illumina sequence reads (180 bp insert 158
size; Supplemental Table 1) representing 70×, 80× and 35× depth for To, Tp and white 159
clover, respectively, were mapped back to the corresponding assemblies, yielding a mapping 160
rate of ~95% (Supplemental Table 4). This indicates the assemblies encompass a high 161
proportion of their genomes. Alignment of the white clover To and Tp-derived subgenomes 162
(TrTo and TrTp, respectively) with the Mt4.0 genome (Tang et al., 2014) identified general 163
6
macrosynteny with some major rearrangements relative to M. truncatula (Figure 2C), 164
consistent with previous results (Griffiths et al., 2013). 165
Gene annotation for all three species was enhanced by use of RNA-sequence assemblies 166
derived from paired-end reads of transcripts expressed in floral, leaf, root and stolon/shoot 167
tissue (Pooled dataset) from each species, with an average of 77 million reads per sample 168
(Supplemental Tables 4 and 5). Of the raw reads, ~88%, 96%, and 95% mapped to the To, 169
Tp and white clover genome sequences, respectively, indicating these assemblies encompass 170
most of the gene space. The high read coverage facilitated gene annotation, with most gene 171
models based directly on transcriptional evidence (Supplemental Table 5). The number of 172
protein-coding genes was similar for the progenitors and the number of protein-coding genes 173
in white clover was close to the sum of the contributing progenitors (Supplemental Table 6). 174
175
To and Tp are the progenitors of white clover 176
Identification of the extant relatives of white clover’s diploid progenitors (Figure 1; Figure 177
3A) had been based in part on similarities with discrete portions of the chloroplast genome as 178
well as Internal Transcribed Spacer (ITS) sequences of the Tp and To, respectively (Ellison et 179
al., 2006). Expanding this comparison with the complete chloroplast genomes of white clover 180
and its extant progenitors, the alignment with Tp revealed 99.5% identical bases and a mean 181
coverage of 100% (Supplemental Table 7), confirming Tp as the likely chloroplast donor 182
(Supplemental Figure 3). Furthering this analysis, we surveyed the data generated from 183
whole-genome resequencing (~49-fold coverage) of four outbred white clover individuals 184
from different populations (Supplemental Table 8). Mapping white clover reads from each 185
individual against the chloroplast genomes of white clover, To, and Tp identified high 186
similarity with Tp (mean = 54 SNP variants), whereas there were many differences relative to 187
the To chloroplast (mean = 563 SNP variants) (Supplemental Table 9). In summary, 188
sequencing of five white clover individuals (the S9 and four resequenced individuals) confirm 189
Tp as the likely source of the white clover chloroplast and, therefore, the maternal progenitor 190
of white clover. 191
The 18S-ITS1-5.8S-ITS2-26S rDNA region of the nuclear genome harbouring the ITS 192
sequences forms a gene cluster comprising many hundreds of tandem repeat arrays located in 193
the nucleolar organising region (NOR) (Álvarez and Wendel, 2003). Previous analysis of 194
PCR-amplified white clover ITS sequences had shown them to be either identical or contain a 195
single nucleotide polymorphism (SNP) variant to those of To, and only the To-NOR was 196
7
detected (Ellison et al., 2006; Williams et al., 2012). In this study, we surveyed the 668,571 197
white clover TruSeq Synthetic Long-Reads (TSLRs) for ITS sequences and identified 774 198
full matches spanning the 623 base pair (bp) ITS1-5.8S rRNA-ITS2 region. Alignment with 199
extant progenitor ITS data identified 770 TSLRs containing SNPs diagnostic of To-derived 200
ITS and, additionally, four TSLRs diagnostic of Tp-derived ITS, supporting confirmation of 201
To and Tp as progenitors (Supplemental Figure 4). The prevalence of the To-ITS sequence 202
supports observations that the active NOR in white clover is To-derived (Williams et al., 203
2012), whereas detection of Tp-derived ITS is the first discovery of a remnant Tp-NOR. This 204
is further evidence that Tp contributed to the white clover nuclear genome in addition to the 205
chloroplast genome. The reduction in the Tp-NOR is consistent with nucleolar dominance 206
observed in allopolyploids whereby the rDNA locus of one progenitor is often silenced and 207
subsequently reduced or eliminated from the genome (Ansari et al., 2008; Kovarik et al., 208
2008). 209
Expanding comparisons beyond chloroplast and ITS evidence, we also assessed whole 210
genome similarities between extant progenitors and their corresponding white clover 211
subgenomes. We identified SNPs specific either to subgenome/progenitor pairs (TrTo = To ≠ 212
TrTp = Tp) or to each progenitor genome (To ≠ TrTo = TrTp = Tp; Tp ≠ TrTo= TrTp = To) using 213
HyLiTE (Duchemin et al., 2015), which determined genic SNP subgenome and genome 214
identity based on transcriptomic and genomic data from the three species mapped to To or Tp 215
reference gene models. We found ~8000 SNPs specific to each progenitor in transcribed 216
regions, whereas we identified ~50-fold more shared between each progenitor and its 217
corresponding subgenome (Supplemental Table 10). This indicates substantial conservation 218
of SNPs between the progenitor/subgenome pairs providing further evidence that To and Tp 219
are the likely progenitors. In conclusion, the combined ITS, chloroplast and broader genomic 220
evidence confirmed that white clover is derived from To and Tp progenitors with Tp the 221
likely maternal progenitor. 222
223
White clover originated ~15-28,000 years ago during the last glaciation 224
The analysis of genic SNPs specific to To and Tp using HyLiTE (Duchemin et al., 2015) 225
detected relatively few and similar numbers of genic SNPs (~8000) specific to each extant 226
progenitor, suggesting similar and possibly recent divergence times between each progenitor 227
and its corresponding white clover subgenome (Supplemental Table 10). We investigated 228
this further by estimating the timing of the white clover allopolyploidization event using an 229
8
isolation model (Mailund et al., 2012) that analysed pairwise alignments of progenitor 230
genomes (To and Tp) and white clover subgenomes (TrTo and TrTp) (Figure 3A). Whole 231
pseudomolecule alignment of three genome pairs (To vs. Tp; To vs. TrTo; and Tp vs. TrTp) 232
produced 10-32k alignments with average lengths ranging from 7.2-8.0 kb for each 233
comparison (Supplemental Figures 5-7; Supplemental Table 11). As the model assumes 234
neutral polymorphic sites, we masked genes in the alignments to remove regions likely to be 235
under selection. The mutation rate for white clover is unknown, but a base substitution rate of 236
6.5×10-9
per generation has been determined in Arabidopsis thaliana (Ossowski et al., 2010), 237
and mutation rate appears to increase with genome size (Lynch, 2010). Based on these 238
observations, we estimated the mutation rates of white clover and its progenitors to range 239
between 1.1×10-8
and 1.8×10-8
per base per generation (Supplemental Figure 8). This aligns 240
with the allotetraploid Arachis hypogaea (peanut) estimated genome-wide mutation rate of 241
1.6×10-8
(Bertioli et al., 2016). As the divergence time scales with the mutation rate used, the 242
mutation rate directly influences divergence time estimates. Based on a single generation per 243
year, the Arabidopsis 6.5×10-9
mutation rate placed the white clover progenitors divergence 244
~530k years ago, with the progenitor-subgenome divergence 43-48k years ago 245
(Supplemental Table 11). However, when using the proxy clover mutation rates, the 246
estimated progenitor divergence was ~191-313k years ago, and the progenitors and 247
corresponding subgenomes divergence was 15-28k years ago (Figure 3B-C; Supplemental 248
Table 11). Assuming the timing of the allopolyploidization event equated to that of 249
progenitor genome divergence from the white clover subgenomes (Figure 3A), the assessed 250
mutation rates placed the allopolyploidization within the glaciation period that culminated in 251
the last European Glacial Maximum ~20k years ago (Yokoyama et al., 2000; Mangerud et al., 252
2004), whereas the proxy clover rates coincided with the Glacial Maximum itself (Figure 253
3C). This was a period when the alpine and coastal progenitors were likely to be in close 254
proximity in glacial refugia. 255
256
White clover arose from multiple hybridization events 257
There is a range of allopolyploidization scenarios (Ramsey and Schemske, 1998; Mason and 258
Pires, 2015) that may have given rise to white clover, resulting in different genomic diversity 259
signatures. Common examples rely on unreduced gamete production and may include 260
scenarios such as a single hybridization and chromosome doubling event 261
(A×B=AB→AABB), or an interspecific hybridization of unreduced gametes 262
9
(AA×BB=AABB). If the genesis of white clover resulted from a single hybridization event, 263
this would be associated with a severe genetic bottleneck. We assessed this by analysing 264
whole-genome resequencing data (~49-fold coverage) from four outbred white clover 265
individuals from different populations (Supplemental Table 8). These genomes were then 266
evaluated with Pairwise Sequentially Markovian Coalescent (PSMC) models (Li and Durbin, 267
2011) using the mutation rates described above ranging from 6.5×10-9
to 1.8×10-8
per site per 268
generation. The PSMC model estimated current and past effective population sizes (Ne) based 269
on mutation rates of the four genomes and their subgenomes, which also infers Ne of the pre-270
allopolyploidization progenitor populations. For three of the four individuals, the model 271
suggested only a mild genetic bottleneck 2k to 15k generations ago when the Ne dropped to 272
~20k from a pre-bottleneck level of ~48k (Figure 3D; Supplemental Table 12). This was 273
followed by a rapid recent expansion that lagged the warming period after the last Glacial 274
Maximum (Mangerud et al., 2004) (Figure 3C-D). The fourth individual (27) had a larger 275
number of polymorphic sites than the other three, which could explain the difference in 276
estimated population history (Supplemental Table 8). In addition to the mutation rate, the 277
recombination rate also influences PSMC analysis, but we found the results stable over a 278
wide range of recombination settings (Supplemental Figure 9). The 20-fold increase in the 279
white clover Ne relative to the progenitor populations appears high and could be inflated as 280
the PSMC model has limited power to accurately determine Ne in the recent past (<20k years 281
ago) (Li and Durbin, 2011). The PSMC analysis did not identify a severe population 282
bottleneck coinciding with the hybridization event as would be expected if white clover arose 283
from two or few parent plants, suggesting that diversity may have been carried over to white 284
clover from its progenitors through multiple allopolyploidization events. These results were 285
corroborated using Multiple Sequentially Markovian Coalescent (MSMC) analysis (Schiffels 286
and Durbin, 2014), an extended version of PSMC where several individuals can be surveyed 287
simultaneously. Incorporating more individuals (or haplotypes) improves Ne estimation in 288
recent time periods (2k-30k years ago) (Schiffels and Durbin, 2014), complementing the 289
PSMC model. MSMC analysis identified a weak bottleneck, similar to the PSMC model, that 290
became non-existent leaving only a recent expansion in population size as resolution 291
improved with incremental addition of haplotypes (Figure 3E). As with the PSMC, the 292
MSMC results were consistent over a wide range of recombination and mutation settings 293
(Supplemental Figure 10). 294
These models, however, cannot reliably differentiate between a very severe bottleneck of 295
short duration and a less pronounced reduction of Ne over a longer period (Li and Durbin, 296
10
2011; Schiffels and Durbin, 2014). To distinguish between these possibilities, we 297
characterized diversity in extant white clover populations using Genotyping-by-Sequencing 298
(GBS)-derived SNPs identified in 200 individuals from 20 different clover populations 299
(Supplemental Dataset 1). We found a high level of diversity with polymorphism detected 300
at 7.0% of all reliably sequenced sites with an average heterozygosity of 0.21 yielding an 301
overall genome wide nucleotide diversity of 0.015. This is similar to other outbreeding 302
species such as teosinte (0.012 (Chen et al., 2017)), Arabidopsis lyrata and A. halleri (0.024 303
and 0.020) (Novikova et al., 2016), and higher than selfing species such as A. thaliana 304
(0.0060) (Novikova et al., 2016) and Medicago truncatula (0.0043) (Branca et al., 2011). 305
This dataset allowed us to generate a detailed allele, or site, frequency spectrum (SFS) of the 306
current white clover population sample against which we could compare simulated SFS 307
produced under different demographic scenarios. 308
The most extreme bottleneck would result from a single hybridization and chromosome 309
doubling event (A×B=AB→AABB), producing a single homozygous white clover founder 310
individual with no polymorphic sites, from which all diversity would accumulate through 311
mutations. This scenario exhibited a very strong bias against common alleles coupled with a 312
low rate of polymorphism, and was inconsistent with the observed data (Figure 3F). A 313
second possibility is that two unreduced gametes, one from each of the heterozygous diploid 314
progenitors, fused to produce a single white clover hybrid (AA×BB=AABB), thereby 315
carrying variation over to white clover from the two founder progenitors. This second 316
scenario produced similar SFS to the unreduced gametes model, albeit with a less severe 317
skew compared to the observed data (Figure 3F). These results suggested that a population 318
bottleneck alone could not explain the observed data. Instead, we modelled combinations of 319
various types of population bottlenecks with recent expansion of the population size up to 320
twenty-fold the size of the pre-allopolyploidization effective population size (Supplemental 321
Figure 11). Of all scenarios tested, a recent three-fold expansion of the effective population 322
size in the absence of a population bottleneck came closest to matching the observed data 323
(Figure 3F; Supplemental Figure 11; Supplemental Table 12). Simulations also matched 324
the observed mutation density when the progenitor population size was increased assuming 325
the PSMC analysis had underestimated the progenitor Ne which at ~48k was lower than the 326
usual 100k-1000k range for nuclear genomes of multicellular species (Lynch, 2010) (Figure 327
3F; Supplemental Table 12). 328
White clover is thus unlikely to have experienced a severe genetic bottleneck at the time of 329
allopolyploidization during the most recent glaciation ~15-28k years ago, and must have 330
11
arisen from multiple independently generated hybrids, which resulted in carryover of 331
progenitor diversity into the new species. In addition, our PSMC and MSMC analyses and 332
mutation accumulation simulation both indicated a rapid recent population expansion, which 333
is consistent with white clover quickly filling available niches as the climate warmed to 334
generate a larger effective population size and occupy a broader habitat range than its 335
progenitors had prior to allopolyploidization. 336
337
Progenitor genome integrity has been retained in white clover 338
The extant progenitors of white clover occupy restricted ranges in highly specialized niches 339
(Figure 1). If the progenitor genomes have remained intact, their coexistence within white 340
clover following allopolyploidization has enabled them to undergo global allopolyploidy-341
facilitated niche-expansion. To investigate subgenome integrity, we made further 342
comparisons between the progenitor genomes and their corresponding white clover 343
subgenome derivatives. At a whole-genome level, the sequence-derived estimated genome 344
size of white clover approximated the combined estimates of the contributing progenitors, 345
indicating that major chromosomal fragments had not been lost in white clover compared to 346
the extant progenitors (Table 1; Supplemental Figure 1). 347
Inter-homoeologous recombination could lead to rapid subgenome alterations, but white 348
clover is considered a functional diploid (Williams et al., 1998), and cytological evidence 349
suggests that homoeologous recombination events are very rare in white clover. To determine 350
if the genomic data supported these observations of subgenome independence, we assessed 351
the incidence of white clover Illumina TruSeq Synthetic Long Reads (TSLRs) containing 352
sequence from both progenitors. These long reads (total=668,571; average length=4,113 bp; 353
Supplemental Table 1) were unlikely to contain chimaeric artefacts as they were assembled 354
from single DNA fragments. We identified only five high confidence recombination events in 355
contigs derived from 38 TSLR reads (0.006% of the TSLR pool), where TSLRs contained 356
well-supported blocks arising from both progenitors (Supplemental Figure 12). The very 357
few signatures of homoeologous recombination in white clover agree with the lack of 358
observations of quadrivalent meiotic configurations in cytological studies (Ansari H. 359
(AgResearch), pers. comm.). In conclusion, there appears to have been very limited inter-360
homoeologous recombination in white clover. 361
Gene loss following polyploidization is common (Grover et al., 2012), and could also 362
compromise the integrity of the white clover subgenomes. To assess the level of gene loss, 363
12
we mapped sequencing reads from white clover genomic DNA to the combined reference 364
sequences of the extant progenitor genomes, requiring a unique match for each read. Of the 365
~39k and ~36k protein-coding genes found in both progenitor genomes (Supplemental 366
Table 6), ~36k and ~35k were covered by uniquely mapping white clover reads in To and 367
Tp, respectively, indicating no more than 5% gene loss. 368
Gene silencing is also commonly associated with genome hybridization or duplication, and 369
often precedes gene loss. We identified 68,475 protein-coding genes in white clover with 370
>99% based on direct transcriptional evidence (Supplemental Table 5), which approximated 371
the sums of To and Tp protein-coding gene counts (Supplemental Table 6). This indicated 372
that the subgenomes had retained a similar transcriptionally active gene complement to each 373
other and to the corresponding extant progenitors, suggesting a limited level of 374
allopolyploidy-associated gene silencing in white clover. 375
In summary, the white clover subgenome sizes and transcriptionally active gene counts 376
reflect those of the contributing progenitors, indicating that the subgenomes retained in white 377
clover have maintained independence and integrity since allopolyploidization. 378
Stable subgenome expression ratios across tissues 379
Transcriptional consequences of allopolyploidization encompass homoeologue expression 380
bias and expression level dominance (genomic dominance), and have been well-documented 381
in many polyploids (Grover et al., 2012). To better understand these consequences in white 382
clover through characterization of subgenome transcription patterns, we first identified 383
progenitor-specific SNPs by mapping genomic and Pooled transcriptomic data 384
(Supplemental Table 13) from To and Tp onto To gene models using HyLiTE (Duchemin et 385
al., 2015). We then assigned subgenome identity to white clover RNA-sequence (RNA-seq) 386
reads based on these SNPs. A total of 720 million white clover RNA-seq reads derived from 387
flowers, leaves, stolons/stems, and roots were processed, and 42% were assigned to a specific 388
subgenome. Most of the remaining reads could not be assigned as they did not overlap 389
subgenome-specific SNPs (Supplemental Table 14; Supplemental Dataset 2). The reads 390
were assigned to a similar number of genes on each subgenome (TrTo: 36,181 genes; TrTp: 391
34,942 genes), indicating near equivalence in active gene count per subgenome under 392
glasshouse conditions (Supplemental Table 14). 393
Previous studies in other allopolyploids have examined differential expression of 394
homoeologues across tissues, but as only few genes were examined or average tissue biases 395
were calculated, it remains unclear how often the direction of bias changes between tissues in 396
13
allopolyploids (Adams et al., 2003; Liu and Adams, 2007; Leach et al., 2014; Zhang et al., 397
2015b; Yang et al., 2016). For white clover, we found that 69% of the 19,954 genes, for 398
which expression ratios could be reliably determined, exhibited the same direction of 399
homoeologue expression bias across the tissues assessed (Figure 4A). An example of 400
consistent homoeologue expression ratio (TrTo/TrTp) between tissues is shown (Figure 4B), 401
and such homoeologue ratio stabilities persisted even when the correlation between total gene 402
expression levels (TrTo + TrTp) was low (Figure 4C). 403
404
Expression ratio outliers were enriched for flavonoid biosynthesis genes 405
The high frequency of white clover genes showing stable cross-tissue homoeologue 406
expression ratios suggested that this may represent the default state following 407
allopolyploidization. Genes deviating strongly from this pattern, could, therefore, represent 408
instances of adaptive gene expression control in white clover driven by selection. To 409
investigate this possibility, we selected genes with unusually large log(TrTo/TrTp) expression 410
ratio variance across tissues, identifying 1263 transcripts as expression ratio outliers 411
(Supplemental Dataset 3). 412
Based on gene annotations, the outliers appeared rich in enzyme-encoding genes, leading us 413
to focus on investigating biosynthetic pathways. We assigned enzyme categories (ECs) using 414
the MetaCyc database (Caspi et al., 2016) and found a striking enrichment for components of 415
the phenylpropanoid derivatives biosynthesis pathway (P=4.6×10-24
after Benjamini-416
Hochberg correction) and its subset, the flavonoid biosynthesis pathway (P=2.2×10-10
after 417
Benjamini-Hochberg correction), as well as an enrichment for biosynthesis of secondary 418
metabolites (P=1.9×10-15
) and plant cell structural components (P=1.9×10-10
) (Supplemental 419
Table 15). The initial experiment had been carried out using biological replicates pooled 420
prior to sequencing. To examine if the expression ratio outliers could be reproducibly 421
detected, we carried out two additional experiments with three replicates of four tissues. 422
These experiments were conducted using material from clones of the white clover S9 inbred 423
individual used for sequencing and its selfed progeny (S10), respectively, and we refer to 424
these three separate expression outlier detection experiments as “Pooled”, “S9” and “S10” 425
(Supplemental Tables 13 and 14). For the outliers in the S9 and S10 experiments, we found 426
enrichment of the same pathways (Supplemental Table 15). As indicated by the pathway 427
enrichment analysis, the 2,262 genes identified as expression ratio outliers showed a 428
pronounced overlap among the three experiments (Figure 4D). Repeating the pathway 429
14
enrichment analysis with genes identified as expression ratio outliers in at least two out of the 430
three experiments again highlighted the same pathways (Supplemental Table 15). To 431
increase stringency and take advantage of the replicate information, we performed ANOVA 432
using all samples from the three experiments. In addition to a high mean expression variance 433
across tissues we required that the means be significantly different in the ANOVA after 434
Bonferroni correction. The ANOVA-filtered results again pointed to the same enriched 435
pathways (Supplemental Table 15), and the identified genes showed consistent expression 436
ratios within tissues across experiments and replicates, with particularly pronounced contrasts 437
between expression ratios in roots and mature leaves (Figure 4E-G). The clustered heat map 438
of the outlier expression ratios (Figure 4F) highlighted not only the contrast between root 439
and leaves, but also a temporal and transgenerational similarity within tissues among the 440
Pooled, S9 and S10 samples. The outlier genes included putative transcription factors, 441
receptor-like kinases and other enzymes, but the genes associated with the above-mentioned 442
pathways made up the only clearly overrepresented sets (Supplemental Dataset 3). 443
To assess whether the pathway enrichment could occur by chance, we generated random gene 444
sets meeting the same subgenome read assignment cut-off as the expression ratio outliers. We 445
found that the white clover genome was not generally enriched for genes involved in the 446
pathways associated with expression outliers, as only a single pathway (PWY-7214 baicalein 447
degradation, P < 1×10-3
) was enriched in one out of ten random gene sets (Supplemental 448
Table 15). Many of the identified flavonoid pathway enzymes appeared to cluster relatively 449
closely in the biosynthetic pathway downstream of the metabolite naringenin, which is a key 450
branch point for production of isoflavonoids, flavonols, anthocyanins and condensed tannins 451
(Franzmayr et al., 2012) (Figure 4H). 452
Of 909 outlier genes detected in the ANOVA analysis, 904 contained SNPs between the 453
differentially expressed TrTo and TrTp homoeologues that resulted in amino acid substitutions. 454
The outliers did not, however, show above-average divergence at the protein level 455
(Supplemental Figure 13). 456
457
Expression ratio outliers diverged mainly post-allopolyploidization 458
The deviations from stable subgenome expression ratios across tissues could be 459
allopolyploidy-associated or result from parental inheritance of expression patterns that had 460
already diverged between the progenitors. To distinguish between these possibilities, we 461
compared the subgenome-specific white clover expression data to progenitor expression data. 462
15
For the Pooled and S10 experiments, we also conducted RNA-seq for the progenitors 463
(Supplemental Table 13), and analysed data from both these experiments. After 464
normalization of the expression data using TMM (Robinson et al., 2010), PCA analysis 465
clustered the four tissues and showed tight grouping of the white clover subgenomes for all 466
tissues, whereas the progenitors were more dispersed (Supplemental Figure 14A and E). If 467
the subgenome expression ratio deviations were derived from parental inheritance, the 468
outliers should also show extreme divergence in the inter-progenitor comparison. We first 469
tested this hypothesis by comparing Pearson correlation coefficients for logged expression 470
values across the four tissues in pairwise genome comparisons. As expected, we found the 471
outlier genes greatly skewed (p<10-27
, Kolmogorov-Smirnov test) towards lower correlation 472
coefficients for the inter-subgenome TrTo vs. TrTp comparison. We did not, however, observe 473
a similar skew for the inter-parental comparison To vs Tp, arguing against parental 474
divergence and inheritance being causal for the white clover expression ratio outliers 475
(Supplemental Figure 14B and F; Supplemental Dataset 4). 476
The correlation coefficient analysis captures the directions of change in expression levels 477
between tissues, but does not consider the magnitude of the expression differences between 478
the compared genomes. To examine these, we analysed deviance defined as the sum of the 479
squared differences between logged expression values in the pairwise genome comparisons. 480
Again, we found the largest difference between the outliers and all genes in the inter-481
subgenome comparison, with the outliers skewed towards higher deviance. We also observed 482
significant skews for all other comparisons, indicating that the outliers showed above average 483
expression deviance in the progenitors and between progenitors and subgenomes (p<10-76
, 484
Kolmogorov-Smirnov test) (Supplemental Figure 14C and G; Supplemental Dataset 4). 485
The expression outliers were selected based variance in log(TrTo/TrTp) ratios, making them 486
clearly distinct from the overall gene average (Supplemental Figure 14D and H). However, 487
we also detected a less pronounced, but statistically significant (p<10-93
, Kolmogorov-488
Smirnov test), skew towards higher variance for the inter-progenitor comparison 489
(Supplemental Figure 14D, Supplemental Dataset 4). The correlation, deviance and 490
variance analyses produced very similar results for the Pooled and S10 experiments 491
(Supplemental Figure 14). 492
Overall, the To vs Tp deviations appeared much less pronounced than the TrTo vs TrTp 493
deviations for the outlier gene set, suggesting a limited contribution of parental inheritance to 494
the outlier expression patterns. To quantify the effect, we classified genes as showing 495
parental expression inheritance when they displayed higher correlation coefficients and lower 496
16
deviance for the subgenome-progenitor than for the inter-progenitor comparison(s) and 497
showed subgenome-progenitor correlations coefficients >0.8. Only 104 and 239 genes met 498
these criteria in the Pooled and S10 experiments, respectively, 22 and 43 of which were 499
expression ratio outliers (Supplemental Dataset 4). A total of 19,954 genes and 909 outliers 500
were included in the analysis, making the outliers significantly enriched in genes showing 501
parental expression inheritance (p<10-4
, Chi-squared test with Yates correction). However, 502
parental expression inheritance only explained a few percent of the outliers detected, 503
suggesting that outlier expression divergence occurred mainly post-allopolyploidization. 504
DISCUSSION 505
White clover (Trifolium repens L.) is a successful allotetraploid naturalized globally in moist 506
temperate grasslands and is an integral component of temperate pastoral agriculture. By 507
contrast, its diploid extant European progenitor relatives remain in extreme specialized 508
habitats with T. occidentale (To) restricted to high salinity coastal niches and T. pallescens 509
(Tp) to alpine screes. Based on whole-genome data for both progenitors and white clover, our 510
work has provided a clear example of allopolyploidization-enabled niche-expansion, which 511
has facilitated global radiation of the previously confined specialist progenitor genomes. This 512
occurred without compromising progenitor genome integrity in the past ~20,000 years since 513
the allopolyploidization event, which took place in the glaciation that culminated in the last 514
European Glacial Maximum. This was a period when alpine and coastal species were likely 515
in proximity in glacial refugia. Such extreme conditions are conducive to allopolyploid 516
formation, as the Arctic is one of the most polyploid-rich regions with post-deglaciation 517
colonization dominated by polyploids (Brochmann et al., 2004). Furthermore, temperature 518
extremes can increase production of unreduced male gametes (De Storme and Geelen, 2014), 519
a pathway to repeated allopolyploid formation (Mason and Pires, 2015). This process also 520
facilitates carry-over of diversity from the progenitors to the allopolyploid, as we observed 521
for white clover. 522
Genome rearrangements and loss of duplicate genes are common features of polyploids as 523
they progress towards a diploid-like genome via fractionation. This process can occur at 524
markedly different speeds depending on the polyploid system, and is preceded by changes in 525
homoeologue expression (Soltis et al., 2015; Soltis et al., 2016). In white clover, this has 526
occurred to a very limited degree, as the genomes and functional gene complement of its 527
progenitors have been retained with little evidence of large-scale inter-homoeologous 528
recombination or unbalanced homoeologue expression bias. As more polyploids are 529
17
sequenced, it is clear there is a wide range of genomic responses to polyploidization events. 530
Some very recent polyploids, and polyploids with similar genesis times to white clover, 531
exhibit gene retention with little genome reduction and no evidence of biased fractionation or 532
major genetic changes (Spartina anglica 130 years ago) (Chelaifa et al., 2010; Ferreira de 533
Carvalho et al., 2013); Arachis hypogaea (peanut) 9.4 thousand years ago (kya) (Bertioli et 534
al., 2016); Triticum aestivum (hexaploid wheat) 9kya ((IWGSC). 2014); Coffea arabica 10-535
50kya (Cenci et al., 2012; Combes et al., 2015); and Capsella bursa-pastoris 100-300kya 536
(Douglas et al., 2015). Of these species, Coffea arabica (Bardil et al., 2011) and Spartina 537
anglica (Chelaifa et al., 2010) exhibit expression level or genomic dominance implicated in 538
the process of diploidization. The subgenomes of the allotetraploid cotton (Gossypium 539
hirsutum 1-2Mya)) have little gene loss but show evidence of asymmetric evolution (Zhang 540
et al., 2015b), as well as homoeologue expression bias in certain tissues or conditions (Yoo et 541
al., 2013; Zhang et al., 2015b). By contrast, some very recent (Tragapogon spp. 80ya 542
(Chester et al., 2012); synthetic wheat (Feldman et al., 1997), synthetic allotetraploid 543
Arabidopsis (Wang et al., 2004) and ancient polyploids (Brassica rapa 13-17Mya (Wang et 544
al., 2011; Cheng et al., 2012); Maize 5-12Mya (Schnable et al., 2011) have undergone 545
significant genome and homoeologue expression changes on the path to diploidization. It 546
may be that factors including progenitor relatedness and genome size similarity influence the 547
fate of the post-polyploid genomes. 548
With limited post-hybridization alterations to the subgenomes, white clover appears to have 549
resulted from a fortuitous combination of compatible progenitors, which allowed multiple 550
allopolyploidization events to occur in the contact zones of the progenitors, quickly 551
generating very diverse white clover populations. Furthermore, the repeated 552
allopolyploidization events suggest high compatibility between progenitor genomes. This 553
may explain why we found a stable background of subgenome expression ratios across 554
tissues on a per-homoeologous gene pair basis, which in turn allowed us to identify genes 555
with strongly aberrant expression patterns. Differential homoeologue expression across 556
tissues, during development and in response to stresses has been described in a number of 557
species, including cotton, wheat, and Tragopogon (Adams et al., 2003; Bottley et al., 2006; 558
Liu and Adams, 2007; Hovav et al., 2008; Buggs et al., 2010). It has also been investigated at 559
the whole-transcriptome level, where differential regulation was confirmed in coffee, 560
synthetic Brassica allotetraploids, cotton, wheat, and Brassica juncea (Combes et al., 2013; 561
Liu et al., 2015; Zhang et al., 2015a; Zhang et al., 2015b; Yang et al., 2016). Our analysis 562
strategy distinguishes itself from previous studies by first establishing a testable null 563
18
hypothesis, in this case stable homoeologue expression ratios across tissues, which allows 564
quantitative identification of expression ratio outliers. This phenomenon appears to be a 565
temporal and transgenerational feature as evidenced by the observed overlap in these outlier 566
gene among the different experiments which comprised either the same individuals or 567
progeny sampled at different times. Coupled with pathway enrichment analysis of the 568
expression ratio outliers, we identified the phenylpropanoid and flavonoid biosynthesis 569
pathways as putative targets of selection in white clover. This overrepresentation of the 570
biosynthetic pathways among genes with deviating expression patterns argues against the 571
outliers resulting from rare stochastic events, and instead suggests that selection has driven 572
tissue-specific differential use of homoeologue gene copies. The selective pressure in 573
question could be internal, driven by incompatibilities between diverged homoeologous 574
proteins. However, we did not observe increased diversity at the protein level for the outliers, 575
arguing against this possibility. Taken together with our results indicating that expression 576
divergence occurred mainly post-allopolyploidization, the expression ratio outlier patterns 577
could represent examples of adaptive, allopolyploidy-associated transcriptional changes. 578
Such changes may be influenced by features such as the development of specialised tissue- or 579
homoeologue-specific transcription factors, or localised epigenetic characteristics including 580
differential tissue-specific methylation among homoeologues. 581
As more polyploids are explored in greater depth, other examples of allopolyploids exhibiting 582
stable homoeologue expression ratios are appearing. Arabidopsis kamchatica, an 583
allotetraploid, arose at a similar time (~20 kya) (Akama et al., 2014) to white clover. 584
Similarly, it has greater environmental plasticity than its extant progenitors, although with a 585
significant overlap in distribution (Hoffmann, 2005). Interestingly, it exhibits stable 586
homoeologous expression ratios for the majority of homoeologous pairs, with a very small 587
proportion (~1%) altering expression ratios in response to abiotic stress (Akama et al., 2014; 588
Paape et al., 2016). In contrast to white clover, however, it appears that A. kamchatica has 589
combined the transcriptional patterns of both parental species (Paape et al., 2016). Another 590
recent polyploid also exhibits homoeologue expression features comparable to white clover. 591
In allohexaploid wheat, ~72% homoeologous gene triads showed stable expression ratios 592
across 15 tissues (Ramirez-Gonzalez et al., 2018), remarkably similar to the ~70% we 593
observed in white clover. Furthermore, of the homoeologue triads with variable expression 594
ratios across tissues, the majority did not reflect expression ratios of the wheat progenitor 595
species, indicating possible allopolyploidy-associated transcriptional change (Ramirez-596
Gonzalez et al., 2018). These genes were enriched for defence and external stimuli responses 597
19
involved in fitness (Ramirez-Gonzalez et al., 2018), with a possible role in adaptation for this 598
domesticated species. 599
In white clover, the overrepresentation of the phenylpropanoid, flavonoid and secondary 600
metabolite pathways suggests that secondary metabolites may be under particularly strong 601
selective pressure. Indeed, flavonoids are widely distributed in plants and 9,000 different 602
chemical structures have been characterized (Williams and Grayer, 2004). The pervasive 603
nature and extreme diversity of flavonoids, along with variation at the population level (Cao 604
et al., 2017), argues strongly for their role in plant adaptation, where they have been shown to 605
facilitate tolerance to a wide range of biotic and abiotic stresses including protection against 606
reactive oxygen species, pathogenic microbes and insect predation, as well as in signalling 607
and development (Mierziak et al., 2014; Mouradov and Spangenberg, 2014; Davies et al., 608
2018). It may be that white clover exploits homoeologous isozymes, an option available only 609
to allopolyploids, to fine-tune complex mixes of flavonoids and other secondary metabolites 610
across tissues, for instance to balance responses to symbionts and pathogens in roots or to 611
optimize herbivory deterrence and antifungal defence in leaves through differential regulation 612
of naringenin and its derivatives (Mierziak et al., 2014; Davies et al., 2018) (Figure 4H). 613
It will require further investigation to determine if allopolyploidy-associated transcriptional 614
reprogramming of flavonoid biosynthesis genes contributed to the evolutionary success of 615
white clover. The white clover species complex described here, with well-defined 616
progenitors, limited inter-homoeologue recombination, limited subgenome gene loss, and 617
clearly defined natural ranges, will facilitate such studies, and provides a powerful platform 618
for understanding how allopolyploidy can underpin adaptation and range expansion. As a 619
first step, the current study has highlighted white clover as an example of allopolyploidy-620
facilitated niche-expansion, where two diverged genomes were reunited by a changing 621
climate and colonized the globe. 622
623
20
MATERIALS AND METHODS 624
Plant Material. Trifolium repens L. (white clover): An inbred allopolyploid S9 individual 625
was selected for sequencing from the ninth sequential selfed generation derived by single 626
seed descent from a self-fertile white clover cv. ‘Crau’ derivative (Cousins and Woodfield, 627
2006). T. occidentale Coombe (western clover): an individual derived by controlled selfing 628
from an individual from a wild population (OCD16, Margot Forde Germplasm Centre, New 629
Zealand). T. pallescens Schreb. (pale clover): an individual from a wild population (AZ1895, 630
Margot Forde Germplasm Centre, New Zealand). For linkage disequilibrium mapping, a 631
population (PGen3×PGen9) of 93 F1 full-sib progeny was developed from a hand-pollinated 632
reciprocal pair cross of heterozygous individuals, PGen3 and PGen9. All plants were grown 633
under glasshouse conditions prior to DNA extraction. 634
DNA extraction and sequencing: For the three Trifolium species in this study, nuclear DNA 635
was extracted from young leaves as described (Anderson et al., 2018) and sequenced by 636
Macrogen (Seoul, South Korea). Using a range of platforms (Roche 454-GLX; Illumina 637
HiSeq 2000), libraries (shotgun, paired-end; mate-paired) and insert sizes (180 – 8000 bp), 638
sequence data were generated to an approximate coverage of 492×, 114×, and 230× for T. 639
occidentale, T. pallescens, and T. repens, respectively (Supplemental Table 1). Furthermore, 640
high molecular weight DNA was extracted (Anderson et al., 2018) to create six white clover 641
libraries of Illumina TruSeq Synthetic Long-Reads (TSLRs) generated by Illumina FastTrack 642
Sequencing Services Long Reads pipeline to provide a 3× coverage (Supplemental Table 1). 643
Original sequence data are deposited in NCBI under Bioprojects PRJNA523044, 644
PRJNA521254, and PRJNA523043 for white clover, T. occidentale, and T. pallescens, 645
respectively, as detailed in Accession Numbers (below). 646
RNA extraction and sequencing: Transcriptome analyses of the S9 T. repens individual and 647
extant progenitors T. occidentale and T. pallescens were derived from Illumina RNA-seq data 648
generated from four different tissue samples: mature fully expanded leaf (third node from 649
stolon tip); tap root (T. pallescens) nodal root or nodal root (T. repens and T. occidentale); 650
young opened flowers from the middle whorls of the inflorescence; internodal shoot (T. 651
pallescens) or stolon (T. repens and T. occidentale). The tissues were sampled in 2013 as 652
three biological replicates harvested at a single time point from plants grown in glasshouse 653
conditions, and frozen immediately in liquid nitrogen. Total RNA was isolated from non-654
floral tissue using an Isolate II RNA Plant Kit (Bioline, London, UK). Due to the high 655
phenolic content of white clover flowers, floral RNA was extracted as described (Jamalnasir 656
21
et al., 2013) with the modifications including replacement of chloroform:isoamyl alcohol 657
(24:1, v/v) with pure chloroform, followed by an additional purification step. The final 658
precipitated RNA was resuspended in Isolate II RNA Plant Kit (Bioline, London, UK) 659
extraction buffer prior to binding, washing and eluting from an Isolate II RNA Plant Kit 660
column (Bioline, London, UK). Prior to sequencing, RNA from the biological replicates for 661
each tissue were pooled, transported on dry-ice and sequenced as 500 bp paired-end libraries 662
at Macrogen (Seoul, South Korea) using the Illumina HiSeq 2000 2×100 bp paired-end 663
chemistry (Supplemental Table 13). Reads were filtered using Kraken (Wood and Salzberg, 664
2014) to eliminate viral RNA, and filtered for low quality using SolexaQA (Cox et al., 2010). 665
These data were used for gene annotation and homoeologue expression analysis and is 666
referred to as the “Pooled” data. 667
To further explore stable homoeologue expression across tissues of the T. repens S9 668
individual, additional RNA-seq data were generated at Novogene (Hong Kong) from each of 669
three cloned plants providing unpooled biological replicates per tissue sample (mature fully 670
expanded leaf; root (nodal/tap); stolon/shoot); and young open flowers from the middle 671
whorls of the inflorescence). Tissues were harvested simultaneously in 2017 from each of 672
three cloned plants grown in glasshouse conditions and snap frozen in liquid nitrogen and 673
extracted as described above. RNA was transported frozen on dry-ice to Novogene (Hong 674
Kong) and 250-300 bp insert libraries were sequenced using the Illumina NovoSeq 6000 675
2×150 bp paired-end chemistry (Supplemental Table 13). Reads were filtered for quality 676
and to remove viral RNA as described above. This RNA-seq dataset was used for replicated 677
homoeologue expression analysis and referred to as “S9”. 678
To further investigate conservation of homoeologue expression among extant progenitors and 679
white clover plants of the same age, RNA-seq data were generated from tissue 680
(emerging/young leaf; mature fully expanded leaf; stolon/shoot; root (nodal/tap)) from each 681
of three cloned plants (biological replicates) of an individual selfed progeny each of the 682
sequenced T. occidentale (selfed OCD16), T. pallescens (selfed AZ1895), and T. repens (S10) 683
These plants, co-located in the same glasshouse, were harvested simultaneously in 2018 at 684
the same time of day (early afternoon) as previous tissue collections to minimise effects of 685
diurnal variation on gene expression. RNA was extracted, sequenced and filtered as described 686
above, and the RNA-seq dataset referred to as “S10”. Original sequence data are deposited in 687
NCBI under Bioprojects PRJNA523044, PRJNA521254, and PRJNA523043 for white 688
clover, T. occidentale, and T. pallescens, respectively, as detailed in Accession Numbers 689
(below). 690
22
K-mer analysis for estimating genome size. Total and frequency of k-mers (n=17) was 691
counted in unassembled Illumina 180 bp insert paired-end sequence data for white clover and 692
its progenitors using Jellyfish v2.2.0 (Marçais and Kingsford, 2011) default parameters. The 693
data were plotted to determine maximum read depth for each specific 17-mer, and genome 694
size was estimated as (total 17-mer number)/(peak depth). 695
Genome Assembly: Raw sequence data for all three Trifolium species underwent quality 696
control with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), was then 697
trimmed using the DynamicTrim program from the SolexaQA package (Cox et al., 2010), 698
and in the case of mate-pair libraries, the high level of adapter contamination was reduced 699
using Trimmomatic (Bolger et al., 2014). 700
Assembly of T. pallescens was accomplished exclusively with the use of ALLPATHS-LG 701
(Gnerre et al., 2011) with default parameters. A low level of bacterial contamination was 702
found in the DNA, and these were filtered with the use of a simple in-house script using 703
Bowtie 2 (Langmead and Salzberg, 2012) for raw read mapping. The dataset underwent a 704
further error correction step by re-mapping pair-end data and searching for non-repeats with 705
uneven read distribution. 706
Assembly of T. occidentale was separated in two stages. In the first stage, an assembly was 707
produced using ALLPATHS-LG (Gnerre et al., 2011) with default parameters. All mate-pair 708
libraries were used, but only the single 180 bp pair-end library was incorporated into the 709
ALLPATHS-LG assembly. The other pair-end libraries were assembled separately using 710
Velvet (Zerbino and Birney, 2008). The raw reads were first subjected to digital 711
normalization using the khmer package (http://ged.msu.edu/papers/2012-diginorm/), using a 712
k-mer size of 31, a maximum coverage of 50, and a minimum coverage of 3. The filtered data 713
was assembled with Velvet using the following parameters: k-mer size of 55; expected 714
coverage of 25; coverage cutoff of 7.5; maximum coverage of 45; maximum divergence of 715
0.05; minimum scaffolding pair count of 7.0. The resulting contigs were compared to the 716
ALLPATHS-LG assembly, and used to extend and resolve gaps where long, high-quality 717
overlaps were found. A final round of gap filling using all pair-end data was done using an 718
in-house script (https://github.com/Lanilen/SemHelpers). This resulted in a longer assembly 719
with fewer ambiguities when compared to that of the similar sized genome of T. pallescens. 720
Assembly of T. repens using ALLPATHS-LG yielded unsatisfactory results, with a reduced 721
assembly of 600 Mb displaying a high degree of homoeologue conflation. Neither 722
ALLPATHS-LG nor MaSuRCA (Zimin et al., 2013) assemblers seemed capable of utilising 723
23
the Illumina TruSeq Synthetic Long Reads (TSLRs) (both running out of memory while 724
executing in a 64-core, 1 Tb RAM server), thus the final assembly was done using Velvet 725
(Zerbino and Birney, 2008). 726
All pair-end libraries were subjected to khmer digital normalization to a maximum coverage 727
of 50×, minimum coverage of 3×, and using a k-mer size of 31, independent of each other. 728
The resulting filtered libraries were assembled together with all Illumina TSLRs with Velvet 729
using the following parameters: k-mer size of 47;automatic expected coverage; minimum 730
coverage of 10; fixed insert length of 350 for longer pair-end libraries and 0 for 180 bp pair-731
end library; maximum branch length of 500; maximum divergence of 0.05; minimum contig 732
length of 1,000 (so as to match the filter used by ALLPATHS-LG and make assembly 733
comparisons more meaningful); and turning the conserveLong flag on. 734
The resulting assembly was scaffolded using the mate-pair libraries with the use of SSPACE-735
Basic 2.0 (Boetzer et al., 2011). Illumina TSLRs were split into 1,000 bp fragments, with a 736
generous 750 bp overlap, and added to the mate-pair library collection to take advantage of 737
the gap-filling option of SSPACE. 738
Once the assembly was completed, the Illumina TSLRs were used one last time for an error-739
correction step using in-house scripts (https://github.com/Lanilen/SemHelpers), by aligning 740
them against the assembly and looking for small indels in the alignment. When a gap or an 741
insertion was well supported by TSLRs, the reference genome was corrected with the 742
consensus from the TSLRs. 743
Pseudomolecule construction: White clover scaffolds were assigned to subgenomes and 744
ordered into pseudomolecules using pairwise linkage disequilibrium (LD), calculated as r2, 745
between microsatellite (SSR) markers and Genotyping-by-Sequencing (GBS)-derived 746
biallelic single nucleotide polymorphisms (SNPs) segregating in F1 progeny (n = 93) of a 747
white clover biparental cross (PGen3×PGen9). To enable linkage group identification and 748
distinguish homoeologues in this population, and to align the sequence assembly 749
pseudomolecules with previously published white clover linkage groups, DNA extracted 750
from the parent and progeny plants (Anderson et al., 2018) was screened with 16 single-locus 751
homoeologue-specific (SLHS) SSR markers (Supplemental Table 2) as described (Griffiths 752
et al., 2013). 753
A denser marker set for LD mapping was then generated by preparing GBS libraries from 754
each parent and 93 progeny as described (Elshire et al., 2011). Briefly, 100 ng of sample 755
DNA was digested with ApeKI, followed by ligation of 99 ng barcoded and reverse adapters, 756
24
then pooled and PCR amplified. The pooled libraries were sequenced on two lanes of an 757
Illumina HiSeq 2500, and samples demultiplexed using GBSX (Herten et al., 2015) with one 758
mismatch allowed in the enzyme and barcode. Each sample was mapped back to the 759
reference genome individually using Geneious 8 (Kearse et al., 2012) using default ‘Fast’ 760
parameters and discarding non-unique hits. SNPs were called from the resulting BAM files 761
using ‘ref_map’ from STACKS (Catchen et al., 2013) with a minimum depth of three reads 762
required to form a locus, with one mismatch allowed per locus. The Populations pipeline 763
from STACKS was used to consolidate individuals and generate SNPs across all the samples. 764
The resulting SNPs were filtered as follows. Individual heterozygous genotype calls where 765
the ratio of reads of one allele relative to the other exceeded 9:1 were reset to homozygous 766
for the most numerous allele read. Overall, approximately 1% of heterozygous genotypes 767
were reset to homozygous. SNPs were discarded that had a minor allele frequency (MAF) 768
less than 10%, or at least 50% missing genotype calls, or more than 75% heterozygous 769
genotypes. The remaining SNPs were grouped according to parental genotype, provided that 770
the read depth for both parents was 10 or more. Specifically, a SNP was classified as a 771
PGen3Het SNP if parent PGen3 was heterozygous and parent PGen9 was homozygous; or a 772
PGen9Het SNP if PGen3 was homozygous and PGen9 was heterozygous. These SNPs were 773
discarded if the frequency of the GBS homozygous major genotype was either less than 0.25, 774
or less than twice the frequency of the GBS homozygous minor genotype, as they may have 775
been misclassified. After filtering, the number of SNPs was 22,663 comprising 10,824 776
PGen3Het and 11,839 PGen9Het SNPs. 777
LD analysis was performed using the PGen3Het and PGen9Het SNPs based on a pseudo-778
backcross approach where LD was only considered between SNPs with the same parental 779
genotype configuration and along haplotypes inherited from the heterozygous parent. 780
Estimates of pairwise r2 values were computed using GUS-LD (Bilton et al., 2018), which 781
accounts for uncertainty in GBS genotypes associated with low read depth, where the major 782
allele frequencies were fixed at 0.5 and 𝜀 = 0. These LD estimates can be expressed in terms 783
of recombination fraction, therefore equating the LD analysis to a two-point linkage analysis 784
that accounts for read depth. 785
Linkage groups (LGs) representing parental chromosomes comprising either the PGen3Het 786
or PGen9Het parental SNP data were assembled using the following procedure. For SNPs 787
from GBS sequence tags mapped to the white clover assembly, an initial LG was formed by 788
selecting a set of SNPs from the largest group of SNPs in LD based on a visual inspection of 789
the matrix of r2 values. These LGs were assigned to the assembly pseudomolecules according 790
25
to where the SNP GBS tags mapped. The remaining SNPs aligned to a pseudomolecule were 791
added to the corresponding LG if the mean of the largest thirty r2 values between the 792
unmapped SNP and the LG SNPs was greater than 0.4. Each of the remaining unmapped 793
SNPs were then assigned to an LG based on the greatest mean of the ten largest r2 values 794
between the unmapped SNP and LG SNPs, provided the mean was greater than 0.6. To 795
ensure SNP placement in the LG was unequivocal, SNPs remained unassigned if the second 796
largest r2 mean was to another LG and was greater than 66% of the highest r
2 mean. After 797
assignment, only seven mapped SNPs were removed due to high LD estimates with SNPs in 798
other LGs. 799
Two additional filtering steps were performed on the SNPs mapped to an LG. Firstly, to 800
ensure only a single SNP was selected to represent a GBS sequence tag, SNPs were mapped 801
to the reference genome and placed into bins if separated by less than 180 base pairs. From 802
each bin, only the SNP with the highest mean r2 value across the LG was retained for further 803
analysis. The second filtering step removed SNPs that mapped to multiple loci in the genome. 804
Briefly, SNPs were mapped to a version of the reference genome in which positions that had 805
sufficient similarity that could result in tags being mapped to multiple locations were masked. 806
SNPs were only retained that mapped to the same place in both the masked and unmasked 807
genomes. After filtering, parental LGs comprised 3,492 and 3,785 SNPs for parents PGen3 808
and PGen9, respectively. 809
The single-locus homoeologue-specific SSR data described above were also incorporated to 810
align LGs with a previous genetic linkage map (Griffiths et al., 2013). Pairwise r2 values 811
between the SSRs and the mapped SNPs were estimated where the read depth for the SSR 812
genotype calls were set to infinite. Each SSR was then assigned to the LG group where there 813
were pairwise r2 values greater than 0.5. Depending on the parental configuration, some of 814
the SSRs were mapped in both parental LGs. 815
Ordering the SNPs and SSRs in each LG was achieved using the Sorting Points into 816
Neighbourhoods (SPIN) algorithm (Tsafrir et al., 2005) within the R package “seriation” 817
(Hahsler et al., 2016). Adjustments to the implementation of the neighbourhood algorithm 818
within R were made which consisted of solving the linear assignment problem using the 819
Hungarian algorithm and running the algorithm using multiple starting orders to ensure the 820
global maximum was obtained. The standard deviation parameter for the weighting matrix 821
was set to 15 for all LGs except two where the parameter was set at 10. Orientation of the 822
ordered LGs was determined by SNP alignment to the reference genome and was confirmed 823
by SSR position relative to the genetic linkage map (Griffiths et al., 2013). The order of the 824
26
SSR alleles among the 3,492 and 3,785 SNPs in the PGen3 and PGen9 parental maps, 825
respectively, show conservation of position within the LGs relative to previous linkage maps 826
(Supplemental Figure 2). 827
The GBS tags harbouring the ordered SNPs from the biparental population were mapped to 828
the white clover S9 sequence data and used to group and order an anchor set of 3,364 of the 829
22,100 scaffolds. This anchor set of pseudomolecules was assigned to the appropriate TrTo or 830
TrTp subgenome by BLAST alignment with the progenitor assemblies. To incorporate 831
remaining scaffolds not linked to the anchor set by LD, the anchor scaffolds were BLAST-832
aligned to the model forage legume Medicago truncatula genome (Mt4.0) (Tang et al., 2014) 833
which has general macrosynteny with some major rearrangements relative to white clover 834
(Griffiths et al., 2013). The non-linked scaffolds were assigned a pseudomolecule and order 835
relative to the anchor scaffolds based on proximity on the Mt4.0 genome. A small number 836
(4%; Supplemental Table 3) of white clover scaffolds had no strong alignment to Mt4.0, and 837
were assigned to Chromosome 0. In summary, the white clover scaffolds were ordered based 838
on LD of mapped GBS SNPs, assigned to subgenomes, and then supplemented with non-LD-839
linked scaffolds by alignment to the M. truncatula genome. The progenitor pseudomolecules 840
were assembled based on synteny to white clover. 841
Gene annotation: RNA-seq reads were mapped to three respective genomes using Bowtie 842
v2.1.0 (Langmead and Salzberg, 2012) / TopHat v2.0.9 (Kim et al., 2013) with a maximum 843
intron size of 30 kb. Genome-guided transcripts were then reconstructed using Cufflinks 844
v2.1.1 (Trapnell et al., 2012) and those transcripts were used as primary source for final set of 845
gene models (Sanggaard et al., 2014) (Supplemental Table 5). Gene annotation was 846
performed using a custom pipeline, which combined gene model evidence from experimental 847
data and ab initio methods. Gene model evidence was obtained using five parallel 848
approaches, 1) From Cufflinks as described in previous section, 2) Evidence from the 849
mapping of the de novo assembled transcripts using GMAP v20/07/12 algorithm (Wu and 850
Watanabe, 2005), 3) From AUGUSTUS v2.6.1 (Stanke et al., 2006), 4) GeneMark –E 851
(Lomsadze et al., 2005) and 5) GlimmerHMM v3.0.1 (Majoros et al., 2004) (Supplemental 852
Table 5). The five layers of evidence were merged to create a non-redundant gene model set 853
using a hierarchical filtering approach as described (Sanggaard et al., 2014). Cufflinks gene 854
models were given highest priority, followed by de novo assembled transcripts (GMAP). 855
Three ab initio predictors had lower priority in the following order: AUGUSTUS, GeneMark 856
and Glimmer. Glimmer appears to have been superseded by other prediction programs and 857
27
contributed only very few gene models. AUGUSTUS was trained using protein-coding 858
transcripts (>1000 nt) from Cufflinks. GeneMark uses ~10 Mbp of genome to fine-tune gene 859
model prediction parameters while GlimmerHMM was trained with an Arabidopsis gene set. 860
The final set of gene models were then divided into “Protein coding” and “Unclassified”. 861
Gene models were labeled as “Protein coding” if gene model had a coding sequence length 862
(CDS) larger than 300 or had a functional annotation based on the homology against non-863
redundant (nr) plant database. The “Unclassified” category is the remainder of transcripts 864
which did not fit the “Protein coding” category. 865
The annotation was further improved using the BRAKER2 pipeline tool (Hoff et al., 2017), 866
which is based on AUGUSTUS. The RNA transcription data were mapped using Spliced 867
Transcripts Alignment to a Reference (STAR) (Dobin et al., 2013). The references for the 868
white clover and the progenitors were soft-masked using RepeatMasker (Smit et al., 2015). 869
The BRAKER2 pipeline produced an abundance of genes, due to the AUGUSTUS 870
predictions. Many of these additional genes occurred in regions where we had no RNA-seq 871
mapping. Using the gene models from the BRAKER2 annotation, we calculated confidence 872
scores (0 to 1) based on the amount of overlap in exons found in the previous annotation. The 873
final curated gene set with confidence scores greater than 0.5 did not reduce Benchmarking 874
Universal Single-Copy Orthologs (BUSCO v3.0.2) (Simão et al., 2015) scores after filtering. 875
BUSCO was run against the embryophyta odb9 database downloaded on 28 August 2017. 876
Any genes not supported by the previous annotation were removed. Functional annotation 877
was done by blasting against all protein plant sequences from the NCBI database. The most 878
likely match which was not labelled as uncharacterised, hypothetical and predicted proteins 879
were chosen. 880
ITS and chloroplast analysis: ITS sequences (ITS 1 and 2 together with the 5.8S rRNA) 881
from T. pallescens (GenBank DQ312111) and T. occidentale (GenBank AF053168) were 882
used to search the collection of white clover Illumina TruSeq Synthetic Long Reads (TSLRs) 883
to identify diagnostic SNPs. The T. occidentale chloroplast sequence (GenBank KJ788289), 884
along with T. pallescens and T. repens chloroplast sequences (current work) were aligned 885
using MAUVE (Darling et al., 2004), using the ProgressiveMauve algorithm from Geneious 886
7 (Kearse et al., 2012), and LASTZ (Harris, 2007), using default parameters with inclusion of 887
the --chain parameter and a hit window of 10,000. A fourth chloroplast genome from T. 888
hybridum (GenBank KJ788286.1) was added to serve as an outgroup for comparison. 889
28
Expanding the chloroplast analysis to encompass more white clover individuals, four 890
resequenced clover individuals (described below), were mapped on to the chloroplast 891
sequences using BWA (Li and Durbin, 2011). Variants were called using the mapped reads 892
using the Genome Analysis Tool Kit (GATK) (DePristo et al., 2011) for all four individuals 893
combined and separately. The fewer the SNP variants identified with each chloroplast 894
comparison indicated greater similarity, hence the most likely chloroplast sequence the 895
individual white clover had inherited. 896
Recombination analysis: The assembled, unscaffolded white clover contigs were aligned to 897
the assembled genomes of the progenitors using BLAST to identify contigs where the top 898
high-scoring segment pairs (HSPs) alternated between To and Tp along the white clover 899
contig. Subsequently, all mapping TSLRs were compared against the reference genomes of T. 900
occidentale and T. pallescens using LASTZ (Harris, 2007), and the resulting alignments were 901
filtered to find individual TSLRs having high scoring matches against alternate ancestors on 902
each end of the read. 903
Whole-genome divergence estimation: Whole-chromosome alignments were generated 904
using LASTZ v1.02.00 (Harris, 2007) with the following settings: --hspthresh=12500 --905
strand=both --format=maf --ambiguous=iupac --chain. The resulting MAF alignments were 906
then masked at genic positions, to focus on neutral polymorphic sites, and used as input for 907
the IMCoalHMM isolation model software (Mailund et al., 2012) run with the multichain 908
Monte-Carlo mode on sets of 100 alignments for each of which the median split time was 909
recorded. 910
White clover resequencing, PSMC and MSMC analysis: Whole-genome resequencing 911
was performed using full Illumina sequencing (150 bp paired-end libraries) to ~49× coverage 912
for four divergent individuals selected from among 20 cultivars (Supplemental Table 8). 913
Each individual was mapped using BWA (Li and Durbin, 2011) and the number of total reads 914
and mapped reads can be found in Supplemental Table 8. Reads not properly paired were 915
removed. Diploid FASTA files were generated were generated using samtools mpileup, 916
bcftools and vcftools. Pairwise sequentially Markovian coalescent (PSMC) analysis (Li and 917
Durbin, 2011) was conducted using the diploid fasta files for each individual, and the results 918
of the separate PSMCs were combined and then plotted. Additional PSMC analyses were 919
performed to assess the influence of different recombination rates. With a mutation rate of 1.8 920
x 10-8
the following recombination rates were used: (4+25*2+4+6) in which the first 921
29
parameter spans 4 intervals, the next 25 parameters span 2 time intervals, the second to last 922
parameter spans 4 intervals and last parameter spans 6 intervals (Li and Durbin, 2011); or 923
(4+30*2+4+6+10) equivalent to a higher recombination rate in which the first parameter 924
spans 4 intervals, the next 30 parameters span 2 time intervals, the third to last parameter 925
spans 4 intervals, the second to last spans 6 intervals, and last spans 10 intervals 926
(Nadachowska-Brzyska et al., 2016). All scripts and settings can be found in the PSMC 927
folder at (https://github.com/MarniTausen/CloverAnalysisPipeline). Multiple sequentially 928
Markovian coalescent (MSMC) analysis (Schiffels and Durbin, 2014) was performed on the 929
same individuals as PSMC. MSMC in comparison to PSMC, allows the use of multiple 930
individuals (haplotypes) which increases resolution of population estimation in more recent 931
timeframes (2k - 30k years ago). MSMC was run with 8 haplotypes (4 individuals), 6 932
haplotypes, 4 haplotypes and 2 haplotypes. In the case of two haplotypes MSMC is similar to 933
PSMC, referred to as PSMC’. For each individual all of the pre-processing was run per 934
chromosome, as described on the MSMC manual 935
(https://github.com/stschiff/msmc/blob/master/guide.md). The final results of the MSMC 936
analysis contained unscaled time and effective population sizes, these were then scaled using 937
a generation time of 1 and the mutation rate of 1.8×10-8
. To compare the MSMC analyses 938
with various recombination rate:mutation rate ratios compared to the default ratio (0.25), we 939
assessed MSMC using increased and reduced ratios (0.35 and 0.15, respectively). 940
GBS sequencing and diversity profiling: A panel of 200 white clover genotypes was 941
sampled from 20 populations (Supplemental Dataset 1) and genotyped using genotyping by 942
sequencing (GBS) (Elshire et al., 2011). Briefly, 100 ng of DNA isolated using a CTAB 943
method was digested with ApeKI, ligated to modified Illumina adaptors, pooled to create 944
libraries, each consisting of up to 88 individually bar-coded DNA samples, then PCR-945
amplified as described (Elshire et al., 2011). Libraries were assessed for quality on an Agilent 946
DNA 1000 Assay (Agilent Technologies, Deutschland GmbH) prior to sequencing each 947
library on multiple lanes of an Illumina Hi-Seq flow cell to generate single-end sequences of 948
101 bp. Post-sequencing QC included removal of adaptor contamination using Scythe 949
(https://github.com/vsbuffalo/scythe) with a prior contamination rate set to 0.10. Sickle 950
(https://github.com/najoshi/sickle) was used to trim reads when the average quality score in a 951
sliding window (of 20 bp) fell below a phred (Ewing and Green, 1998) score of 20. At this 952
point reads shorter than 40 bp were also discarded. The reads were demultiplexed using sabre 953
(https://github.com/najoshi/sabre), and all reads originating from the sample were 954
30
concatenated. BWA (Li and Durbin, 2011) was used to align the reads against the white 955
clover reference and generate an alignment file in SAM format. Alignments were sorted by 956
coordinate with picard tools (http://broadinstitute.github.io/picard), and the Genome Analysis 957
Tool Kit (GATK) (DePristo et al., 2011) was used to generate a list of putative INDELs 958
(RealignerTargetCreator), perform local realignment around these putative INDELs 959
(IndelRealigner), and identify variants and call genotypes (UnifiedGenotyper). Only biallelic 960
variant sites with a mean mapping quality of 30 were retained. The SNP density was 961
calculated as the number of variants divided by the number of callable sites in bins of 100 kb. 962
All scripts and commands used in the analysis are available on Github in the GBS and GBS 963
visualization folders (https://github.com/MarniTausen/CloverAnalysisPipeline). 964
Simulation of mutation accumulation: Simulations were performed using msprime 965
(Kelleher et al., 2016) using mutation rates based on Arabidopsis (6.5×10-9
(Ossowski et al., 966
2010)) or genome size (1.1×10-8
(progenitors), 1.8×10-8
(white clover) (Lynch, 2010); 967
Supplemental Figure 8), a sample size of 400 (alleles), and a recombination rate 1×10-7
. 968
Multiple demographic models were run testing different hypotheses. The general structure 969
was a fast bottleneck, modeling the start of the population with a rapid increase to full 970
population size. For the Hybridizing and Doubling hypothesis, the population initiated with a 971
single hybrid individual and assuming it attains the same effective population size (Ne) as its 972
progenitors (120,000), the population achieves this in 20 generations of exponential growth. 973
The hypotheses of unreduced gametes and mild bottlenecks used the same model, assuming a 974
constant population size and introducing a bottleneck followed by 20 generations with rapid 975
growth. Each of the simulations produced a VCF file, where the SNP density was estimated 976
in the same manner as the GBS data was analysed. Site frequency spectra were also produced 977
and used to compare the simulated to the observed data. The scripts used for the simulations 978
and the precise settings can be found at 979
(https://github.com/MarniTausen/CloverAnalysisPipeline). 980
981
Homoeologue-specific transcriptional profiling: Raw genomic DNA and RNAseq reads 982
for all tissues were assessed for quality with SolexaQA++ version 3.14 (Cox et al., 2010). 983
The largest contiguous sections of reads with phred (Ewing and Green, 1998) quality scores 984
>13 were retained and reads with fewer than 25 bp discarded. 985
We applied the Hybrid Lineage Transcriptome Explorer (HyLiTE v1.6.2) (Duchemin et al., 986
2015) to obtain read count matrices for differential expression analysis. As input to HyLiTE, 987
31
we provided Sequence Alignment/Map (SAM) (Li et al., 2009) files. These SAM files were 988
generated by mapping of mRNA reads both paired and unpaired with Bowtie 2 (Langmead et 989
al., 2009; Langmead and Salzberg, 2012) against a set of 36,638 T. occidentale reference 990
gene models. For each tissue type a separate set of SAM files was created for each species of 991
clover. To create a combined analysis, these separate SAM files were merged across all 992
tissues. For increasing coverage for SNP calling, genomic DNA sequence data were mapped 993
under the same conditions to the reference gene models. From a pileup file generated with 994
SAMtools (Li et al., 2009), HyLiTE then identified polymorphisms indicative of parental 995
origin for RNAseq reads and classified reads to parental origin in the hybrid species. The 996
resulting output from HyLITE comprises tables with read counts for parent species to gene 997
models (counts for orthologues) and read counts of unambiguously assignable reads for the 998
hybrid for each homoeologue. The scripts and the workflow for running HyLiTE are 999
available at (https://github.com/MarniTausen/CloverAnalysisPipeline#hylite-pipeline). 1000
Analysis of subgenome expression ratio outliers: Expression ratios were first normalized 1001
across all progenitor and white clover tissue samples, using TMM (Robinson et al., 2010), 1002
which we have previously shown is well suited for correcting for compositional biases in 1003
cross-tissue comparisons (Munch et al., 2018). For white clover, the sum of the reads 1004
assigned to subgenomes was normalized, and the normalized sum was then redistributed on 1005
the white clover subgenomes, maintaining the original subgenome ratio, and then multiplied 1006
by two to maintain normalization with respect to the progenitor samples. PCA analysis was 1007
carried out using the prcomp R function and results were plotted using ggplot2. Pearson 1008
correlation coefficients, deviance and variance were calculated using standard R (version 1009
3.4.3) functions. To identify outliers, a subset of 19,954 genes were used, which showed an 1010
average of more than 1 TPM per sample after TMM normalization and for which at least 40% 1011
of the reads were informative for subgenome assignment in the HyLiTE analysis. The 1012
variance in loge(TrTo/TrTp) subgenome expression across tissues was calculated. Expression 1013
ratio outliers were identified for each of the three experiments (Pooled, S9, S10) as the genes 1014
with a loge(TrTo/TrTp) variance across tissues larger than 1.5 standard deviations above the 1015
mean. ANOVA analysis comparing groups of root, leaf, and stolon samples from all 1016
experiments, flower samples from Pooled and S9 experiments, and young leaf samples from 1017
the S10 experiment, was carried out using R version 3.4.3. Ten control gene sets were 1018
generated by random sampling from the set of 19,954 genes that fulfilled all read count and 1019
assignment requirements. All scripts used for the analysis can be found in Supplemental 1020
32
Dataset 3. Enzyme category (EC) numbers were assigned using the Pathway Tools 1021
(http://bioinformatics.ai.sri.com/ptools/) with the Metacyc database (v.2.2.6) by searching 1022
with the acquired Metacyc gene IDs. Pathway enrichment was evaluated for matches to the 1023
MetaCyc database with e-value less than 10e-10 using the “Genes enriched for pathways” 1024
function of the MetaCyc website (http://metacyc.org/), with the enrichment setting, Fisher’s 1025
exact test and Benjamini-Hochberg correction for multiple testing. 1026
The amount of diversity of the outlier genes between subgenomes was estimated using the 1027
SNPs called through the HyLiTE pipeline. The gene sequences were “mutated” according to 1028
the SNPs and translated into protein, if the amino acid change had occurred the sequence 1029
before mutating and after did not match. 1030
Accession Numbers: Original sequence data from this article can be found in the 1031
EMBL/Genbank data libraries with the following accession numbers which are also detailed 1032
in Supplemental Tables 16-18. 1033
White clover (Bioproject PRJNA523044): 1034
Genome sequencing: SRR8670776, SRR8670777, SRR8670778, SRR8670779, 1035
SRR8670780, SRR8670781, SRR8670782, SRR8670783, SRR8670784, SRR8670785, 1036
SRR8670786, SRR8670787, SRR8670788, SRR8670789, SRR8670790, SRR8670791, 1037
SRR8670792, SRR8670793, SRR8670794, SRR8691041 1038
RNA-seq Pooled: SRR8691037, SRR8691038, SRR8691039, SRR8691040 1039
RNA-seq S9: SRR8691836, SRR8691837, SRR8691838, SRR8691839, SRR8691840, 1040
SRR8691841, SRR8691842, SRR8691843, SRR8691844, SRR8691845, SRR8691846, 1041
SRR8691847 1042
RNA-seq S10: SRR8693957, SRR8693958, SRR8693959, SRR8693960, SRR8693961, 1043
SRR8693962, SRR8693963, SRR8693964, SRR8693965, SRR8693966, SRR8693967, 1044
SRR8693968 1045
1046
Trifolium occidentale (Bioproject PRJNA521254): 1047
33
Genome sequence: SRR8593467, SRR8593468, SRR8593469, SRR8593470, SRR8593471, 1048
SRR8593472, SRR8593473, SRR8593474, SRR8593475. 1049
RNA-seq Pooled: SRR8691453, SRR8692350, SRR8692351, SRR8692352. 1050
RNA-seq S10: SRR8692338, SRR8692339, SRR8692340, SRR8692341, SRR8692342, 1051
SRR8692343, SRR8692344, SRR8692345, SRR8692346, SRR8692347, SRR8692348, 1052
SRR8692349. 1053
1054
Trifolium pallescens (Bioproject PRJNA523043): 1055
Genome sequence: SRR8617465, SRR8617466, SRR8617467. 1056
RNA-seq Pooled: SRR8675752, SRR8675753, SRR8675754, SRR8675755. 1057
RNA-seq S10: SRR8692353, SRR8692354, SRR8692355, SRR8692356, SRR8692357, 1058
SRR8692358, SRR8692359, SRR8692360, SRR8692361, SRR8692362, SRR8692363, 1059
SRR8692364. 1060
1061
SUPPLEMENTAL DATA 1062
Supplemental Figure 1: K-mer based genome size estimates. 1063
Supplemental Figure 2: White clover assembly ordering and alignment to a genetic linkage 1064
map. 1065
Supplemental Figure 3: Chloroplast genome alignments of white clover and extant 1066
progenitors. 1067
Supplemental Figure 4:. Internal Transcribed Spacer 2 alignment for Trifolium occidentale, T. 1068
pallescens compared with Illumina TruSeq Synthetic Long Read sequences from white 1069
clover. 1070
Supplemental Figure 5: Trifolium occidentale (To) vs T. pallescens (Tp) chromosome 1071
alignment. 1072
34
Supplemental Figure 6: Trifolium occidentale (To) vs white clover TrTo subgenome 1073
chromosome alignment. 1074
Supplemental Figure 7: Trifolium pallescens (Tp) vs white clover TrTp subgenome 1075
chromosome alignment. 1076
Supplemental Figure 8: Estimation of mutation rates. 1077
Supplemental Figure 9: Pairwise Sequentially Markovian Coalescent (PSMC) analysis of 1078
four sequenced white clover individuals using different recombination rates. 1079
Supplemental Figure 10: Multiple Sequential Markovian Coalescent (MSMC) analysis of 1080
four sequenced white clover individuals using different settings. 1081
Supplemental Figure 11: Site frequency spectra and SNP density simulations of white clover 1082
with different effective population (Ne) size expansions. 1083
Supplemental Figure 12: Putative inter-homoeologous recombination breakpoints identified 1084
in white clover. 1085
Supplemental Figure 13: Protein level divergence among progenitors and white clover 1086
subgenomes for gene expression ratio outliers. 1087
Supplemental Figure 14: Comparison of extant progenitor and white clover expression 1088
patterns. 1089
Supplemental Figures 1-14 1090
Supplemental Tables 1-18 1091
Supplemental Table 1: Sequencing library statistics. 1092
Supplemental Table 2: Mapped single locus, homoeologue-specific microsatellite markers. 1093
Supplemental Table 3: Pseudomolecule statistics. 1094
Supplemental Table 4: RNA-seq data used in gene annotation and genomic 180 bp library 1095
statistics. 1096
Supplemental Table 5: Gene annotation sources. 1097
35
Supplemental Table 6: Gene annotation summary. 1098
Supplemental Table 7: Chloroplast alignment summary statistics. 1099
Supplemental Table 8: Read statistics for re-sequenced white clover individuals. 1100
Supplemental Table 9: Chloroplast mapping summary statistics for the four fully sequenced 1101
white clover individuals. 1102
Supplemental Table 10: Private progenitor and subgenome diagnostic SNPs. 1103
Supplemental Table 11: Clover genome alignment statistics and estimated divergence times. 1104
Supplemental Table 12: PSMC analysis and mutation accumulation simulation using a range 1105
of mutation rates. 1106
Supplemental Table 13: Summary of RNA-seq libraries from the Pooled, S9 and S10 1107
experiments. 1108
Supplemental Table 14: HyLiTE RNA-seq read assignment summary. 1109
Supplemental Table 15: Pathways enriched for outlier genes. 1110
Supplemental Table 16: T. occidentale accession numbers 1111
Supplemental Table 17: T. pallescens accession numbers 1112
Supplemental Table 18: T. repens accession numbers 1113
1114
Supplemental Dataset 1: GBS summary for white clover individuals 1115
Supplemental Dataset 2: HyLiTE RNA-seq read counts 1116
Supplemental Dataset 3: Expression ratio outliers 1117
Supplemental Dataset 4: R scripts used for expression analysis 1118
1119
36
ACKNOWLEDGEMENTS 1120
We thank Greig Cousins for the white clover S8 inbred cv. ‘Crau’ derivative, and Cindy 1121
Lawley and Karl Sluis of Illumina Inc for early access to the TruSeq Synthetic Long-Read 1122
sequencing technology. The research was funded by: Pastoral Genomics, a joint venture co-1123
funded by DairyNZ, Beef+Lamb New Zealand, Dairy Australia, AgResearch Ltd, New 1124
Zealand Agriseeds Ltd, Grasslands Innovation Ltd, DEEResearch, and the Ministry of 1125
Business, Innovation and Employment (New Zealand); Contract C10X1306 'Genomics for 1126
Production & Security in a Biological Economy'-Ministry of Business, Innovation and 1127
Employment (New Zealand); grant no. 4105-00007A from Innovation Fund Denmark 1128
(S.U.A.); and grant no. 10-081677 from The Danish Council for Independent Research | 1129
Technology and Production Sciences (S.U.A.). 1130
1131
AUTHORS’ CONTRIBUTIONS 1132
A.G.G. initiated the white clover genome sequencing project, co-developed the sequencing 1133
strategy, and selected the germplasm for sequencing. R.M. co-developed the sequencing 1134
strategy, performed genome assembly, synteny, and progenitor validation analyses. M.T. 1135
performed analysis of GBS, resequencing data and carried out PSMC and MSNC analysis, 1136
site frequency spectrum simulations, gene annotation and HyLiTE analysis. V.G. performed 1137
gene annotation. R.M. and V.G. performed subgenome recombination analysis. T.P.B. 1138
carried out LD analysis and prepared the genetic map. M.A.C. performed HyLiTE analysis of 1139
subgenome specific expression. R.A. performed GBS SNP discovery for the white clover 1140
biparental population. I.N. prepared GBS libraries for 200 clover accessions, extracted DNA 1141
for re-sequencing, and analysed data. A.K. co-developed the sequencing strategy. A.L. 1142
optimised RNA extraction methodology and prepared RNA for sequencing. C.A. and B.F 1143
produced the white clover S9 germplasm, extracted DNA for genome sequencing and from 1144
the biparental population, and carried out SSR genotyping. K.H. prepared RNA for 1145
sequencing. A.S. developed the white clover biparental population. N.W.E assembled the 1146
chloroplast genomes. M.P.C. supervised HyLiTE analysis. T.A. designed the GBS protocol 1147
used for the 200 accession diversity panel and analysed data. T.M. prepared scripts for 1148
divergence time analysis. M.H.S. conceived and supervised mutation accumulation 1149
simulations. S.U.A. performed divergence time and expression ratio outlier analysis. A.G.G. 1150
37
and S.U.A. designed and supervised the study, interpreted results and wrote the manuscript 1151
with input from all authors. 1152
1153
References 1154
(IWGSC)., I.W.G.S.C. (2014). A chromosome-based draft sequence of the hexaploid bread 1155
wheat (Triticum aestivum) genome. Science 345, 1251788. 1156
Aasmo Finne, M., Rognli, O.A., and Schjelderup, I. (2000). Genetic variation in a 1157
Norwegian germplasm collection of white clover (Trifolium repens L.). Euphytica 1158
112, 45-56. 1159
Abberton, M.T., Fothergill, M., Collins, R.P., and Marshall, A.H. (2006). Breeding forage 1160
legumes for sustainable and profitable farming systems. Asp. Appl. Biol. 80, 81-87. 1161
Adams, K.L., Cronn, R., Percifield, R., and Wendel, J.F. (2003). Genes duplicated by 1162
polyploidy show unequal contributions to the transcriptome and organ-specific 1163
reciprocal silencing. Proc. Natl. Acad. Sci. USA 100, 4649-4654. 1164
Ainouche, M.L., Fortune, P.M., Salmon, A., Parisod, C., Grandbastien, M.-A., 1165
Fukunaga, K., Ricou, M., and Misset, M.-T. (2009). Hybridization, polyploidy and 1166
invasion: lessons from Spartina (Poaceae). Biol. Invasions 11, 1159-1173. 1167
Akama, S., Shimizu-Inatsugi, R., Shimizu, K.K., and Sese, J. (2014). Genome-wide 1168
quantification of homeolog expression ratio revealed nonstochastic gene regulation in 1169
synthetic allopolyploid Arabidopsis. Nucleic Acids Res 42, e46. 1170
Álvarez, I., and Wendel, J.F. (2003). Ribosomal ITS sequences and plant phylogenetic 1171
inference. Mol. Phylogen. Evol. 29, 417-434. 1172
Anderson, C.B., Franzmayr, B.K., Hong, S.W., Larking, A.C., van Stijn, T.C., Tan, R., 1173
Moraga, R., Faville, M.J., and Griffiths, A.G. (2018). Protocol: a versatile, 1174
inexpensive, high-throughput plant genomic DNA extraction method suitable for 1175
genotyping-by-sequencing. Plant Methods 14, 75. 1176
Ansari, H.A., Ellison, N.W., and Williams, W.M. (2008). Molecular and cytogenetic 1177
evidence for an allotetraploid origin of Trifolium dubium (Leguminosae). 1178
Chromosoma 117, 159-167. 1179
Ansari, H.A., Ellison, N.W., Reader, S.M., Badaeva, E.D., Friebe, B., Miller, T.E., and 1180
Williams, W.M. (1999). Molecular cytogenetic organization of 5S and 18S-26S 1181
rDNA loci in white clover (Trifolium repens L.) and related species. Ann. Bot. 83, 1182
199-206. 1183
Bardil, A., de Almeida, J.D., Combes, M.C., Lashermes, P., and Bertrand, B. (2011). 1184
Genomic expression dominance in the natural allopolyploid Coffea arabica is 1185
massively affected by growth temperature. New Phytol. 192, 760-774. 1186
Bennett, M.D., and Leitch, I.J. (2011). Nuclear DNA amounts in angiosperms: targets, 1187
trends and tomorrow. Ann. Bot. 107, 467-590. 1188
Bertioli, D.J., Cannon, S.B., Froenicke, L., Huang, G., Farmer, A.D., Cannon, E.K., Liu, 1189
X., Gao, D., Clevenger, J., Dash, S., Ren, L., Moretzsohn, M.C., Shirasawa, K., 1190
Huang, W., Vidigal, B., Abernathy, B., Chu, Y., Niederhuth, C.E., Umale, P., 1191
Araujo, A.C., Kozik, A., Kim, K.D., Burow, M.D., Varshney, R.K., Wang, X., 1192
Zhang, X., Barkley, N., Guimaraes, P.M., Isobe, S., Guo, B., Liao, B., Stalker, 1193
H.T., Schmitz, R.J., Scheffler, B.E., Leal-Bertioli, S.C., Xun, X., Jackson, S.A., 1194
38
Michelmore, R., and Ozias-Akins, P. (2016). The genome sequences of Arachis 1195
duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut. Nat. 1196
Genet. 48, 438-446. 1197
Bilton, T.P., McEwan, J.C., Clarke, S.M., Brauning, R., van Stijn, T.C., Rowe, S.J., and 1198
Dodds, K.G. (2018). Linkage Disequilibrium Estimation in Low Coverage High-1199
Throughput Sequencing Data. Genetics 209, 389-400. 1200
Boetzer, M., Henkel, C.V., Jansen, H.J., Butler, D., and Pirovano, W. (2011). Scaffolding 1201
pre-assembled contigs using SSPACE. Bioinformatics 27, 578-579. 1202
Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for 1203
Illumina sequence data. Bioinformatics 30, 2114-2120. 1204
Bottley, A., Xia, G.M., and Koebner, R.M.D. (2006). Homoeologous gene silencing in 1205
hexaploid wheat. Plant J. 47, 897-906. 1206
Branca, A., Paape, T.D., Zhou, P., Briskine, R., Farmer, A.D., Mudge, J., Bharti, A.K., 1207
Woodward, J.E., May, G.D., Gentzbittel, L., Ben, C., Denny, R., Sadowsky, M.J., 1208
Ronfort, J., Bataillon, T., Young, N.D., and Tiffin, P. (2011). Whole-genome 1209
nucleotide diversity, recombination, and linkage disequilibrium in the model legume 1210
Medicago truncatula. Proc. Natl. Acad. Sci. USA 108, E864-870. 1211
Brochmann, C., Brysting, A.K., Alsos, I.G., Borgen, L., Grundt, H.H., Scheen, A.C., and 1212
Elven, R. (2004). Polyploidy in arctic plants. Biol. J. Linn. Soc. 82, 521-536. 1213
Buggs, R.J.A., Elliott, N.M., Zhang, L., Koh, J., Viccini, L.F., Soltis, D.E., and Soltis, 1214
P.S. (2010). Tissue-specific silencing of homoeologs in natural populations of the 1215
recent allopolyploid Tragopogon mirus. New Phytol. 186, 175-183. 1216
Cao, M., Fraser, K., Jones, C., Stewart, A., Lyons, T., Faville, M., and Barrett, B. 1217
(2017). Untargeted metabotyping Lolium perenne reveals population-level variation 1218
in plant flavonoids and alkaloids. Front. Plant Sci. 8, 133. 1219
Caspi, R., Billington, R., Ferrer, L., Foerster, H., Fulcher, C.A., Keseler, I.M., Kothari, 1220
A., Krummenacker, M., Latendresse, M., Mueller, L.A., Ong, Q., Paley, S., 1221
Subhraveti, P., Weaver, D.S., and Karp, P.D. (2016). The MetaCyc database of 1222
metabolic pathways and enzymes and the BioCyc collection of pathway/genome 1223
databases. Nucleic Acids Res. 44, 471-480. 1224
Catchen, J., Hohenlohe, P.A., Bassham, S., Amores, A., and Cresko, W.A. (2013). 1225
Stacks: an analysis tool set for population genomics. Mol. Ecol. 22, 3124-3140. 1226
Cenci, A., Combes, M.C., and Lashermes, P. (2012). Genome evolution in diploid and 1227
tetraploid Coffea species as revealed by comparative analysis of orthologous genome 1228
segments. Plant Mol. Biol. 78, 135-145. 1229
Chelaifa, H., Monnier, A., and Ainouche, M. (2010). Transcriptomic changes following 1230
recent natural hybridization and allopolyploidy in the salt marsh species Spartina x 1231
townsendii and Spartina anglica (Poaceae). New Phytol. 186, 161-174. 1232
Chen, J., Glémin, S., and Lascoux, M. (2017). Genetic diversity and the efficacy of 1233
purifying selection across plant and animal species. Mol. Biol. Evol. 34, 1417-1428. 1234
Cheng, F., Wu, J., Fang, L., Sun, S., Liu, B., Lin, K., Bonnema, G., and Wang, X. (2012). 1235
Biased gene fractionation and dominant gene expression among the subgenomes of 1236
Brassica rapa. PloS one 7, e36442. 1237
Chester, M., Gallagher, J.P., Symonds, V.V., Cruz da Silva, A.V., Mavrodiev, E.V., 1238
Leitch, A.R., Soltis, P.S., and Soltis, D.E. (2012). Extensive chromosomal variation 1239
in a recently formed natural allopolyploid species, Tragopogon miscellus 1240
(Asteraceae). Proc. Natl. Acad. Sci. USA 109, 1176-1181. 1241
Combes, M.-C., Dereeper, A., Severac, D., Bertrand, B., and Lashermes, P. (2013). 1242
Contribution of subgenomes to the transcriptome and their intertwined regulation in 1243
39
the allopolyploid Coffea arabica grown at contrasted temperatures. New Phytol. 200, 1244
251-260. 1245
Combes, M.C., Hueber, Y., Dereeper, A., Rialle, S., Herrera, J.C., and Lashermes, P. 1246
(2015). Regulatory divergence between parental alleles determines gene expression 1247
patterns in hybrids. Genome biology and evolution 7, 1110-1121. 1248
Coombe, D. (1961). Trifolium occidentale, a new species related to T. repens L. Watsonia 5, 1249
68-87. 1250
Cousins, G., and Woodfield, D.R. (2006). Effect of inbreeding on growth of white clover. In 1251
13th Australasian Plant Breeding Conference, C.F. Mercer, ed (Christchurch, New 1252
Zealand: New Zealand Grassland Association), pp. 568-572. 1253
Cox, M.P., Peterson, D.A., and Biggs, P.J. (2010). SolexaQA: At-a-glance quality 1254
assessment of Illumina second-generation sequencing data. BMC Bioinformatics 11, 1255
485. 1256
Daday, H. (1958). Gene frequencies in wild populations of Trifolium repens L. III. World 1257
distribution. Heredity 12, 169-184. 1258
Darling, A.C.E., Mau, B., Blattner, F.R., and Perna, N.T. (2004). Mauve: multiple 1259
alignment of conserved genomic sequence with rearrangements. Genome Res. 14, 1260
1394-1403. 1261
Davies, K.M., Albert, N.W., Zhou, Y., and Schwinn, K.E. (2018). Functions of flavonoid 1262
and betalain pigments in abiotic stress tolerance in plants. Annual Plant Reviews 1263
online 1, 1-41. 1264
De Storme, N., and Geelen, D. (2014). The impact of environmental stress on male 1265
reproductive development in plants: biological processes and molecular mechanisms. 1266
Plant, Cell Environ. 37, 1-18. 1267
DePristo, M.A., Banks, E., Poplin, R., Garimella, K.V., Maguire, J.R., Hartl, C., 1268
Philippakis, A.A., del Angel, G., Rivas, M.A., Hanna, M., McKenna, A., Fennell, 1269
T.J., Kernytsky, A.M., Sivachenko, A.Y., Cibulskis, K., Gabriel, S.B., Altshuler, 1270
D., and Daly, M.J. (2011). A framework for variation discovery and genotyping 1271
using next-generation DNA sequencing data. Nat. Genet. 43, 491-498. 1272
Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., 1273
Chaisson, M., and Gingeras, T.R. (2013). STAR: ultrafast universal RNA-seq 1274
aligner. Bioinformatics 29, 15-21. 1275
Douglas, G.M., Gos, G., Steige, K.A., Salcedo, A., Holm, K., Josephs, E.B., Arunkumar, 1276
R., Agren, J.A., Hazzouri, K.M., Wang, W., Platts, A.E., Williamson, R.J., 1277
Neuffer, B., Lascoux, M., Slotte, T., and Wright, S.I. (2015). Hybrid origins and 1278
the earliest stages of diploidization in the highly successful recent polyploid Capsella 1279
bursa-pastoris. Proc. Natl. Acad. Sci. USA 112, 2806-2811. 1280
Duchemin, W., Dupont, P.Y., Campbell, M.A., Ganley, A.R., and Cox, M.P. (2015). 1281
HyLiTE: accurate and flexible analysis of gene expression in hybrid and allopolyploid 1282
species. BMC Bioinformatics 16, 8. 1283
Ellison, N.W., Liston, A., Steiner, J.J., Williams, W.M., and Taylor, N.L. (2006). 1284
Molecular phylogenetics of the clover genus (Trifolium - Leguminosae). Mol. 1285
Phylogen. Evol. 39, 688-705. 1286
Elshire, R.J., Glaubitz, J.C., Sun, Q., Poland, J.A., Kawamoto, K., Buckler, E.S., and 1287
Mitchell, S.E. (2011). A robust, simple genotyping-by-sequencing (GBS) approach 1288
for high diversity species. PloS one 6, e19379. 1289
Ewing, B., and Green, P. (1998). Base-calling of automated sequencer traces using phred. 1290
II. Error probabilities. Genome Res. 8, 186-194. 1291
40
Fawcett, J.A., Maere, S., and Van de Peer, Y. (2009). Plants with double genomes might 1292
have had a better chance to survive the Cretaceous–Tertiary extinction event. Proc. 1293
Natl. Acad. Sci. USA 106, 5737-5742. 1294
Feldman, M., Liu, B., Segal, G., Abbo, S., Levy, A.A., and Vega, J.M. (1997). Rapid 1295
elimination of low-copy DNA sequences in polyploid wheat: a possible mechanism 1296
for differentiation of homoeologous chromosomes. Genetics 147, 1381-1387. 1297
Ferreira de Carvalho, J., Poulain, J., Da Silva, C., Wincker, P., Michon-Coudouel, S., 1298
Dheilly, A., Naquin, D., Boutte, J., Salmon, A., and Ainouche, M. (2013). 1299
Transcriptome de novo assembly from next-generation sequencing and comparative 1300
analyses in the hexaploid salt marsh species Spartina maritima and Spartina 1301
alterniflora (Poaceae). Heredity 110, 181-193. 1302
Franzmayr, B.K., Rasmussen, S., Fraser, K.M., and Jameson, P.E. (2012). Expression 1303
and functional characterization of a white clover isoflavone synthase in tobacco. Ann. 1304
Bot. 110, 1291-1301. 1305
Garsmeur, O., Schnable, J.C., Almeida, A., Jourda, C., D’Hont, A., and Freeling, M. 1306
(2014). Two evolutionarily distinct classes of paleopolyploidy. Mol. Biol. Evol. 31, 1307
448-454. 1308
Gnerre, S., MacCallum, I., Przybylski, D., Ribeiro, F.J., Burton, J.N., Walker, B.J., 1309
Sharpe, T., Hall, G., Shea, T.P., Sykes, S., Berlin, A.M., Aird, D., Costello, M., 1310
Daza, R., Williams, L., Nicol, R., Gnirke, A., Nusbaum, C., Lander, E.S., and 1311
Jaffe, D.B. (2011). High-quality draft assemblies of mammalian genomes from 1312
massively parallel sequence data. Proc. Natl. Acad. Sci. USA 108, 1513-1518. 1313
Griffiths, A.G., Barrett, B.A., Simon, D., Khan, A.K., Bickerstaff, P., Anderson, C.B., 1314
Franzmayr, B.K., Hancock, K.R., and Jones, C.S. (2013). An integrated genetic 1315
linkage map for white clover (Trifolium repens L.) with alignment to Medicago. BMC 1316
Genomics 14, 388. 1317
Gross, B.L., Kane, N.C., Lexer, C., Rosenthal, L., Fulco, D.M., Donovan, L.A., and 1318
Rieseberg, L.H. (2004). Reconstructing the Origin of Helianthus deserticola: 1319
Survival and Selection on the Desert Floor. Am. Nat. 164, 145-156. 1320
Grover, C.E., Gallagher, J.P., Szadkowski, E.P., Yoo, M.J., Flagel, L.E., and Wendel, 1321
J.F. (2012). Homoeolog expression bias and expression level dominance in 1322
allopolyploids. New Phytol. 196, 966-971. 1323
Hahsler, M., Buchta, C., and Hornik, K. (2016). Infrastructure for seriation. R package 1324
version 1.2-1. . 1325
Harris, R.S. (2007). Improved pairwise alignment of genomic DNA. In College of 1326
Engineering (The Pennsylvania State University), pp. 84. 1327
Herten, K., Hestand, M.S., Vermeesch, J.R., and Van Houdt, J.K. (2015). GBSX: a 1328
toolkit for experimental design and demultiplexing genotyping by sequencing 1329
experiments. BMC Bioinformatics 16, 73. 1330
Hoff, K.J., Lange, S., Lomsadze, A., Borodovsky, M., and Stanke, M. (2017). BRAKER2 1331
(http://github.com/Gaius-Augustus/BRAKER Downloaded September 28, 2018). 1332
Hoffmann, M.H. (2005). Evolution of the realized climatic niche in the genus Arabidopsis 1333
(Brassicaceae). Evolution 59, 1425-1436. 1334
Hovav, R., Udall, J.A., Chaudhary, B., Rapp, R., Flagel, L., and Wendel, J.F. (2008). 1335
Partitioned expression of duplicated genes during development and evolution of a 1336
single cell in a polyploid plant. Proc. Natl. Acad. Sci. USA 105, 6191-6195. 1337
Jamalnasir, H., Wagiran, A., Shaharuddin, N.A., and Samad, A.A. (2013). Isolation of 1338
high quality RNA from plant rich in flavonoids, Melastoma decemfidum Roxb ex. 1339
Jack. Australian Journal of Crop Science 7, 911-916. 1340
41
Jia, J., Zhao, S., Kong, X., Li, Y., Zhao, G., He, W., Appels, R., Pfeifer, M., Tao, Y., 1341
Zhang, X., Jing, R., Zhang, C., Ma, Y., Gao, L., Gao, C., Spannagl, M., Mayer, 1342
K.F., Li, D., Pan, S., Zheng, F., Hu, Q., Xia, X., Li, J., Liang, Q., Chen, J., 1343
Wicker, T., Gou, C., Kuang, H., He, G., Luo, Y., Keller, B., Xia, Q., Lu, P., 1344
Wang, J., Zou, H., Zhang, R., Xu, J., Gao, J., Middleton, C., Quan, Z., Liu, G., 1345
Yang, H., Liu, X., He, Z., and Mao, L. (2013). Aegilops tauschii draft genome 1346
sequence reveals a gene repertoire for wheat adaptation. Nature 496, 91-95. 1347
Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, 1348
S., Cooper, A., Markowitz, S., Duran, C., Thierer, T., Ashton, B., Meintjes, P., 1349
and Drummond, A. (2012). Geneious Basic: An integrated and extendable desktop 1350
software platform for the organization and analysis of sequence data. Bioinformatics 1351
28, 1647-1649. 1352
Kelleher, J., Etheridge, A.M., and McVean, G. (2016). Efficient coalescent simulation and 1353
genealogical analysis for large sample sizes. PLoS Comp. Biol. 12, e1004842. 1354
Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). 1355
TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions 1356
and gene fusions. Genome Biol 14, R36. 1357
Kovarik, A., Dadejova, M., Lim, Y.K., Chase, M.W., Clarkson, J.J., Knapp, S., and 1358
Leitch, A.R. (2008). Evolution of rDNA in Nicotiana allopolyploids: a potential link 1359
between rDNA homogenization and epigenetics. Ann. Bot. 101, 815-823. 1360
Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. 1361
Methods 9, 357-359. 1362
Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-1363
efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, 1364
1-10. 1365
Leach, L.J., Belfield, E.J., Jiang, C., Brown, C., Mithani, A., and Harberd, N.P. (2014). 1366
Patterns of homoeologous gene expression shown by RNA sequencing in hexaploid 1367
bread wheat. BMC Genomics 15, 276. 1368
Leitch, A.R., and Leitch, I.J. (2008). Genomic plasticity and the diversity of polyploid 1369
plants. Science 320, 481-483. 1370
Li, H., and Durbin, R. (2011). Inference of human population history from individual 1371
whole-genome sequences. Nature 475, 493-496. 1372
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., 1373
Abecasis, G., Durbin, R., and Subgroup, G.P.D.P. (2009). The Sequence 1374
Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079. 1375
Li, Z., Baniaga, A.E., Sessa, E.B., Scascitelli, M., Graham, S.W., Rieseberg, L.H., and 1376
Barker, M.S. (2015). Early genome duplications in conifers and other seed plants. 1377
Sci. Adv. 1, e1501084. 1378
Liu, Z., and Adams, K.L. (2007). Expression partitioning between genes duplicated by 1379
polyploidy under abiotic stress and during organ development. Curr. Biol. 17, 1669-1380
1674. 1381
Liu, Z., Xin, M., Qin, J., Peng, H., Ni, Z., Yao, Y., and Sun, Q. (2015). Temporal 1382
transcriptome profiling reveals expression partitioning of homeologous genes 1383
contributing to heat and drought acclimation in wheat (Triticum aestivum L.). BMC 1384
Plant Biol. 15, 152. 1385
Lomsadze, A., Ter-Hovhannisyan, V., Chernoff, Y.O., and Borodovsky, M. (2005). Gene 1386
identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids 1387
Res. 33, 6494-6506. 1388
Lynch, M. (2010). Evolution of the mutation rate. Trends Genet. 26, 345-352. 1389
42
Madlung, A. (2013). Polyploidy and its effect on evolutionary success: old questions 1390
revisited with new tools. Heredity 110, 99-104. 1391
Mailund, T., Halager, A.E., Westergaard, M., Dutheil, J.Y., Munch, K., Andersen, L.N., 1392
Lunter, G., Prüfer, K., Scally, A., Hobolth, A., and Schierup, M.H. (2012). A new 1393
Isolation with migration model along complete genomes infers very different 1394
divergence processes among closely related great ape species. PLoS Genet. 8, 1395
e1003125. 1396
Majoros, W.H., Pertea, M., and Salzberg, S.L. (2004). TigrScan and GlimmerHMM: two 1397
open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878-2879. 1398
Mangerud, J., Jakobsson, M., Alexanderson, H., Astakhov, V., Clarke, G.K.C., 1399
Henriksen, M., Hjort, C., Krinner, G., Lunkka, J.-P., Möller, P., Murray, A., 1400
Nikolskaya, O., Saarnisto, M., and Svendsen, J.I. (2004). Ice-dammed lakes and 1401
rerouting of the drainage of northern Eurasia during the Last Glaciation. Quatern. Sci. 1402
Rev. 23, 1313-1332. 1403
Marçais, G., and Kingsford, C. (2011). A fast, lock-free approach for efficient parallel 1404
counting of occurrences of k-mers. Bioinformatics 27, 764-770. 1405
Mason, A.S., and Pires, J.C. (2015). Unreduced gametes: meiotic mishap or evolutionary 1406
mechanism? Trends Genet. 31, 5-10. 1407
McClintock, B. (1984). The significance of responses of the genome to challenge. Science 1408
226, 792-801. 1409
Mierziak, J., Kostyn, K., and Kulma, A. (2014). Flavonoids as important molecules of 1410
plant interactions with the environment. Molecules 19, 16240-16265. 1411
Mouradov, A., and Spangenberg, G. (2014). Flavonoids: a metabolic network mediating 1412
plants adaptation to their real estate. Front. Plant Sci. 5, 620. 1413
Munch, D., Gupta, V., Bachmann, A., Busch, W., Kelly, S., Mun, T., and Andersen, S.U. 1414
(2018). The Brassicaceae family displays divergent, shoot-skewed NLR resistance 1415
gene expression. Plant Physiol. 176, 1598-1609. 1416
Nadachowska-Brzyska, K., Burri, R., Smeds, L., and Ellegren, H. (2016). PSMC analysis 1417
of effective population sizes in molecular ecology and its application to black-and-1418
white Ficedula flycatchers. Mol. Ecol. 25, 1058-1072. 1419
Novikova, P.Y., Hohmann, N., Nizhynska, V., Tsuchimatsu, T., Ali, J., Muir, G., 1420
Guggisberg, A., Paape, T., Schmid, K., Fedorenko, O.M., Holm, S., Sall, T., 1421
Schlotterer, C., Marhold, K., Widmer, A., Sese, J., Shimizu, K.K., Weigel, D., 1422
Kramer, U., Koch, M.A., and Nordborg, M. (2016). Sequencing of the genus 1423
Arabidopsis identifies a complex history of nonbifurcating speciation and abundant 1424
trans-specific polymorphism. Nat. Genet. 48, 1077-1082. 1425
Ossowski, S., Schneeberger, K., Lucas-Lledó, J.I., Warthmann, N., Clark, R.M., Shaw, 1426
R.G., Weigel, D., and Lynch, M. (2010). The rate and molecular spectrum of 1427
spontaneous mutations in Arabidopsis thaliana. Science 327, 92-94. 1428
Otto, S.P. (2007). The evolutionary consequences of polyploidy. Cell 131, 452-462. 1429
Ownbey, M. (1950). Natural hybridization and amphiploidy in the genus Tragopogon. Am. J. 1430
Bot. 37, 487-499. 1431
Paape, T., Hatakeyama, M., Shimizu-Inatsugi, R., Cereghetti, T., Onda, Y., Kenta, T., 1432
Sese, J., and Shimizu, K.K. (2016). Conserved but Attenuated Parental Gene 1433
Expression in Allopolyploids: Constitutive Zinc Hyperaccumulation in the 1434
Allotetraploid Arabidopsis kamchatica. Mol Biol Evol 33, 2781-2800. 1435
Raffl, C., Holderegger, R., Parson, W., and Erschbamer, B. (2008). Patterns in genetic 1436
diversity of Trifolium pallescens populations do not reflect chronosequence on alpine 1437
glacier forelands. Heredity 100, 526-532. 1438
43
Ramirez-Gonzalez, R.H., Borrill, P., Lang, D., Harrington, S.A., Brinton, J., Venturini, 1439
L., Davey, M., Jacobs, J., van Ex, F., Pasha, A., Khedikar, Y., Robinson, S.J., 1440
Cory, A.T., Florio, T., Concia, L., Juery, C., Schoonbeek, H., Steuernagel, B., 1441
Xiang, D., Ridout, C.J., Chalhoub, B., Mayer, K.F.X., Benhamed, M., Latrasse, 1442
D., Bendahmane, A., Wulff, B.B.H., Appels, R., Tiwari, V., Datla, R., Choulet, F., 1443
Pozniak, C.J., Provart, N.J., Sharpe, A.G., Paux, E., Spannagl, M., Brautigam, 1444
A., and Uauy, C. (2018). The transcriptional landscape of polyploid wheat. Science 1445
361. 1446
Ramsey, J., and Schemske, D.W. (1998). Pathways, mechanisms, and rates of polyploid 1447
formation in flowering plants. Annu. Rev. Ecol. Syst. 29, 467-501. 1448
Rapp, R.A., Udall, J.A., and Wendel, J.F. (2009). Genomic expression dominance in 1449
allopolyploids. BMC Biol. 7, 18. 1450
Robinson, M.D., McCarthy, D.J., and Smyth, G.K. (2010). edgeR: a Bioconductor 1451
package for differential expression analysis of digital gene expression data. 1452
Bioinformatics 26, 139-140. 1453
Sanggaard, K.W., Bechsgaard, J.S., Fang, X., Duan, J., Dyrlund, T.F., Gupta, V., Jiang, 1454
X., Cheng, L., Fan, D., Feng, Y., Han, L., Huang, Z., Wu, Z., Liao, L., Settepani, 1455
V., Thøgersen, I.B., Vanthournout, B., Wang, T., Zhu, Y., Funch, P., Enghild, 1456
J.J., Schauser, L., Andersen, S.U., Villesen, P., Schierup, M.H., Bilde, T., and 1457
Wang, J. (2014). Spider genomes provide insight into composition and evolution of 1458
venom and silk. Nature Communications 5, 3765. 1459
Schiffels, S., and Durbin, R. (2014). Inferring human population size and separation history 1460
from multiple genome sequences. Nat Genet 46, 919-925. 1461
Schnable, J.C., Springer, N.M., and Freeling, M. (2011). Differentiation of the maize 1462
subgenomes by genome dominance and both ancient and ongoing gene loss. Proc. 1463
Natl. Acad. Sci. USA 108, 4069-4074. 1464
Selmecki, A.M., Maruvka, Y.E., Richmond, P.A., Guillet, M., Shoresh, N., Sorenson, 1465
A.L., De, S., Kishony, R., Michor, F., Dowell, R., and Pellman, D. (2015). 1466
Polyploidy can drive rapid adaptation in yeast. Nature 519, 349-352. 1467
Simão, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., and Zdobnov, E.M. 1468
(2015). BUSCO: assessing genome assembly and annotation completeness with 1469
single-copy orthologs. Bioinformatics 31, 3210-3212. 1470
Smit, A.F.A., Hubley, R., and Green, P. (2015). RepeatMasker Open-4.0 1471
(http://www.repeatmasker.org Downloaded December 2018). 1472
Soltis, D.E., Visger, C.J., Marchant, D.B., and Soltis, P.S. (2016). Polyploidy: Pitfalls and 1473
paths to a paradigm. Am. J. Bot. 103, 1146-1166. 1474
Soltis, D.E., Soltis, P.S., Pires, J.C., Kovarik, A., Tate, J.A., and Mavrodiev, E. (2004). 1475
Recent and recurrent polyploidy in Tragopogon (Asteraceae): cytogenetic, genomic 1476
and genetic comparisons. Biol. J. Linn. Soc. 82, 485-501. 1477
Soltis, P.S., and Soltis, D.E. (2016). Ancient WGD events as drivers of key innovations in 1478
angiosperms. Curr. Opin. Plant Biol. 30, 159-165. 1479
Soltis, P.S., Marchant, D.B., Van de Peer, Y., and Soltis, D.E. (2015). Polyploidy and 1480
genome evolution in plants. Curr. Opin. Genet. Dev. 35, 119-125. 1481
Stanke, M., Keller, O., Gunduz, I., Hayes, A., Waack, S., and Morgenstern, B. (2006). 1482
AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34, 1483
W435-439. 1484
Tang, H., Krishnakumar, V., Bidwell, S., Rosen, B., Chan, A., Zhou, S., Gentzbittel, L., 1485
Childs, K.L., Yandell, M., Gundlach, H., Mayer, K.F., Schwartz, D.C., and 1486
Town, C.D. (2014). An improved genome release (version Mt4.0) for the model 1487
legume Medicago truncatula. BMC Genomics 15, 312. 1488
44
Tayalé, A., and Parisod, C. (2013). Natural pathways to polyploidy in plants and 1489
consequences for genome reorganization. Cytogenet. Genome Res. 140, 79-96. 1490
Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., 1491
Salzberg, S.L., Rinn, J.L., and Pachter, L. (2012). Differential gene and transcript 1492
expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Prot. 7, 1493
562-578. 1494
Tsafrir, D., Tsafrir, I., Ein-Dor, L., Zuk, O., Notterman, D.A., and Domany, E. (2005). 1495
Sorting points into neighborhoods (SPIN): data analysis and visualization by ordering 1496
distance matrices. Bioinformatics 21, 2301-2308. 1497
Vanneste, K., Maere, S., and Van de Peer, Y. (2014a). Tangled up in two: a burst of 1498
genome duplications at the end of the Cretaceous and the consequences for plant 1499
evolution. Philos. Trans. R. Soc. B 369, 20130353. 1500
Vanneste, K., Baele, G., Maere, S., and Van de Peer, Y. (2014b). Analysis of 41 plant 1501
genomes supports a wave of successful genome duplications in association with the 1502
Cretaceous–Paleogene boundary. Genome Res. 24, 1334-1347. 1503
Wang, J., Tian, L., Madlung, A., Lee, H.S., Chen, M., Lee, J.J., Watson, B., Kagochi, T., 1504
Comai, L., and Chen, Z.J. (2004). Stochastic and epigenetic changes of gene 1505
expression in Arabidopsis polyploids. Genetics 167, 1961-1973. 1506
Wang, X., Wang, H., Wang, J., Sun, R., Wu, J., Liu, S., Bai, Y., Mun, J.H., Bancroft, I., 1507
Cheng, F., Huang, S., Li, X., Hua, W., Freeling, M., Pires, J.C., Paterson, A.H., 1508
Chalhoub, B., Wang, B., Hayward, A., Sharpe, A.G., Park, B.S., Weisshaar, B., 1509
Liu, B., Li, B., Tong, C., Song, C., Duran, C., Peng, C., Geng, C., Koh, C., Lin, 1510
C., Edwards, D., Mu, D., Shen, D., Soumpourou, E., Li, F., Fraser, F., Conant, 1511
G., Lassalle, G., King, G.J., Bonnema, G., Tang, H., Belcram, H., Zhou, H., 1512
Hirakawa, H., Abe, H., Guo, H., Jin, H., Parkin, I.A., Batley, J., Kim, J.S., Just, 1513
J., Li, J., Xu, J., Deng, J., Kim, J.A., Yu, J., Meng, J., Min, J., Poulain, J., 1514
Hatakeyama, K., Wu, K., Wang, L., Fang, L., Trick, M., Links, M.G., Zhao, M., 1515
Jin, M., Ramchiary, N., Drou, N., Berkman, P.J., Cai, Q., Huang, Q., Li, R., 1516
Tabata, S., Cheng, S., Zhang, S., Sato, S., Sun, S., Kwon, S.J., Choi, S.R., Lee, 1517
T.H., Fan, W., Zhao, X., Tan, X., Xu, X., Wang, Y., Qiu, Y., Yin, Y., Li, Y., Du, 1518
Y., Liao, Y., Lim, Y., Narusaka, Y., Wang, Z., Li, Z., Xiong, Z., and Zhang, Z. 1519
(2011). The genome of the mesopolyploid crop species Brassica rapa. Nat. Genet. 43, 1520
1035-1039. 1521
Williams, C.A., and Grayer, R.J. (2004). Anthocyanins and other flavonoids. Natural 1522
product reports 21, 539-573. 1523
Williams, W.M., Mason, K.M., and Williamson, M.L. (1998). Genetic analysis of 1524
shikimate dehydrogenase allozymes in Trifolium repens L. Theor. Appl. Genet. 96, 1525
859-868. 1526
Williams, W.M., Ellison, N.W., Ansari, H.A., Verry, I.M., and Hussain, S.W. (2012). 1527
Experimental evidence for the ancestry of allotetraploid Trifolium repens and creation 1528
of synthetic forms with value for plant breeding. BMC Plant Biol. 12, 55. 1529
Wood, D.E., and Salzberg, S.L. (2014). Kraken: ultrafast metagenomic sequence 1530
classification using exact alignments. Genome Biol. 15, R46. 1531
Woodhouse, M.R., Cheng, F., Pires, J.C., Lisch, D., Freeling, M., and Wang, X. (2014). 1532
Origin, inheritance, and gene regulatory consequences of genome dominance in 1533
polyploids. Proc. Natl. Acad. Sci. USA 111, 5283-5288. 1534
Wu, T.D., and Watanabe, C.K. (2005). GMAP: a genomic mapping and alignment program 1535
for mRNA and EST sequences. Bioinformatics 21, 1859-1875. 1536
Yang, J., Liu, D., Wang, X., Ji, C., Cheng, F., Liu, B., Hu, Z., Chen, S., Pental, D., Ju, 1537
Y., Yao, P., Li, X., Xie, K., Zhang, J., Wang, J., Liu, F., Ma, W., Shopan, J., 1538
45
Zheng, H., Mackenzie, S.A., and Zhang, M. (2016). The genome sequence of 1539
allopolyploid Brassica juncea and analysis of differential homoeolog gene expression 1540
influencing selection. Nat. Genet. 48, 1225-1232. 1541
Yokoyama, Y., Lambeck, K., De Deckker, P., Johnston, P., and Fifield, L.K. (2000). 1542
Timing of the Last Glacial Maximum from observed sea-level minima. Nature 406, 1543
713-716. 1544
Yoo, M.J., Szadkowski, E., and Wendel, J.F. (2013). Homoeolog expression bias and 1545
expression level dominance in allopolyploid cotton. Heredity 110, 171-180. 1546
Zerbino, D.R., and Birney, E. (2008). Velvet: Algorithms for de novo short read assembly 1547
using de Bruijn graphs. Genome Res. 18, 821-829. 1548
Zhang, D., Pan, Q., Cui, C., Tan, C., Ge, X., Shao, Y., and Li, Z. (2015a). Genome-1549
specific differential gene expressions in resynthesized Brassica allotetraploids from 1550
pair-wise crosses of three cultivated diploids revealed by RNA-seq. Front. Plant Sci. 1551
6, 957. 1552
Zhang, T., Hu, Y., Jiang, W., Fang, L., Guan, X., Chen, J., Zhang, J., Saski, C.A., 1553
Scheffler, B.E., Stelly, D.M., Hulse-Kemp, A.M., Wan, Q., Liu, B., Liu, C., Wang, 1554
S., Pan, M., Wang, Y., Wang, D., Ye, W., Chang, L., Zhang, W., Song, Q., 1555
Kirkbride, R.C., Chen, X., Dennis, E., Llewellyn, D.J., Peterson, D.G., Thaxton, 1556
P., Jones, D.C., Wang, Q., Xu, X., Zhang, H., Wu, H., Zhou, L., Mei, G., Chen, 1557
S., Tian, Y., Xiang, D., Li, X., Ding, J., Zuo, Q., Tao, L., Liu, Y., Li, J., Lin, Y., 1558
Hui, Y., Cao, Z., Cai, C., Zhu, X., Jiang, Z., Zhou, B., Guo, W., Li, R., and Chen, 1559
Z.J. (2015b). Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) 1560
provides a resource for fiber improvement. Nat. Biotechnol. 33, 531-537. 1561
Zhang, X., Zhang, Y., Yan, R., Han, J., Fuzeng, H., Wang, J., and Cao, K. (2010). 1562
Genetic variation of white clover (Trifolium repens L.) collections from China 1563
detected by morphological traits, RAPD and SSR. Afr. J. Biotechnol. 9, 3033-3041. 1564
Zimin, A.V., Marçais, G., Puiu, D., Roberts, M., Salzberg, S.L., and Yorke, J.A. (2013). 1565
The MaSuRCA genome assembler. Bioinformatics 29, 2669-2677. 1566
1567
1568
46
FIGURE LEGENDS 1569
Figure 1. The range of white clover and extant relatives of its progenitors. The present 1570
day ranges of white clover (Trifolium repens) (Daday, 1958) (green), and extant relatives of 1571
its diploid progenitors T. occidentale (blue) and T. pallescens (orange). T. occidentale is 1572
found within ~100 m of the seashore whereas T. pallescens grows in alpine regions between 1573
1800 and 2700 m. 1574
1575
Figure 2. White clover genetic map and synteny with extant progenitor and model 1576
forage legume genomes. A) Genetic map based on linkage disequilibrium (LD) analysis of 1577
~7,300 scaffold-mapped markers from a F1 biparental mapping population (n=93). Red 1578
colour indicates a high level of pairwise linkage disequilibrium (r2) between markers. Black 1579
dots indicate single locus homoeologue-specific microsatellite markers from an existing, co-1580
linear genetic linkage map (Griffiths et al., 2013) (Supplemental Figure 2). Progenitor 1581
origin of linkage group homoeologues/subgenomes is denoted by O (Trifolium occidentale) 1582
and P (T. pallescens). B) Circos diagram showing inter-pseudomolecule relationships 1583
between the subgenomes of white clover (TrTo; TrTp) and their progenitors (To; Tp) in these 1584
assemblies. The outer ring (green shaded) represents pseudomolecules in megabases (Mb), 1585
and the inner ring (blue) depicts gene density as a proportion along the pseudomolecules (%). 1586
Coloured lines represent synteny blocks constructed by whole genome alignment using 1587
LASTZ (Harris, 2007) comprising matches over 33 kb in length. Blocks within 100 kb 1588
windows were merged and represented by a single line. Cross-progenitor links indicate 1589
regions of high conservation between the subgenome and both progenitors, not putative 1590
recombination events. C) A matrix plot assessment of synteny between white clover 1591
subgenomes (O and P) and the reference genome of Medicago truncatula, a model forage 1592
legume with eight pseudomolecules (Mt 1 to Mt 8). Synteny was based on alignment of the 1593
3,364 LD-ordered white clover anchor scaffolds with M. truncatula genome Mt4.0 (Tang et 1594
al., 2014). 1595
1596
Figure 3. The white clover allopolyploidization event. A) Schematic phylogenetic tree for 1597
white clover (Tr) and its progenitors T. occidentale (To) and T. pallescens (Tp). A common 1598
ancestor (A) gave rise to To and Tp (grey line), which hybridized (Hyb, blue line) to give rise 1599
to the two sub-genomes TrTo and TrTp in allopolyploid white clover. B) Blue dots indicate 1600
split time estimates based on sets of 100 alignment blocks derived from the pairwise genome 1601
47
comparisons indicated, e.g. To vs Tp. Dots are superimposed on box-and-whiskers plots, 1602
where the median is indicated by a black vertical line. Using a mutation rate of 1.8×10-8
, split 1603
time estimates suggest To and Tp diverged from a common ancestor ~192k years ago, 1604
whereas white clover TrTo and TrTp subgenomes split from their extant progenitors 15k and 1605
17k years ago, respectively, suggesting genesis of white clover ~16k years ago. C) White 1606
clover progenitor speciation (grey block) and hybridization giving rise to white clover (blue 1607
block) aligned with global temperature variation relative to present day average temperature 1608
based on ice core data (Jouzel et al., 2007). The blocks indicate the extent of possible 1609
divergence times using mutation rates in the range 1.1-1.8×10-8
(Supplemental Table 9). D) 1610
Pairwise sequentially Markovian coalescent (PSMC) curve based on whole-genome 1611
resequencing data from each of four white clover individuals. Each curve indicates inferred 1612
population size history through time. The analysis depicted here was carried out using a 1613
mutation rate of 1.8 x 10-8
. See Supplemental Table 11 for results with lower mutation rates. 1614
E) Multiple sequentially Markovian coalescent (MSMC) figure based on the same data as the 1615
PSMC figure. The number of haplotypes corresponds to the number of individuals included. 1616
8 haplotypes comprises all 4. 6 haplotypes comprises individuals 81, 122 and 183, all of the 1617
individuals which had the most pronounced bottlenecks. 4 haplotypes have 3 combinations of 1618
2 individuals, [1] includes 81 and 122, [2] includes 122 and 183, [3] includes 81 and 183. The 1619
results were scaled to a mutation rate of 1.8 x 10-8
. See Supplemental Figure 10 for MSMC 1620
using different mutations rates. F) Site frequency spectra (SFS) of simulations across 20k 1621
generations under the different demographic scenarios indicated on the right. EPS: Effective 1622
population size. The observed SFS is scaled to match total polymorphism count for the 1623
simulations. Dashed line represents the expected SFS under a constant effective population 1624
size. Green numbers indicate simulated SNP densities (%) and goodness of fit between 1625
observed and simulated data. The goodness of fit was calculated by first dividing the 1626
simulated and observed values for each bin with the maximum value of the two and then 1627
using the formula: Σ(observed-simulated)2/simulated. Lower values indicate a better fit. The 1628
observed SNP density was 7.0%. 1629
48
Figure 4. Homoeologue-specific expression analysis. A) Genes classified based on the 1630
number of tissues in which the majority of reads were derived from the TrTo or the TrTp 1631
homoeologue. The numbers underneath the graph indicate the number of tissues for which 1632
the statement on the left is true. Most of the genes (69%) show the same direction of bias 1633
across all four tissues. B) Log(TrTo/TrTp) expression ratios in flowers versus leaves. C) 1634
Spearman correlation coefficients for cross-tissue comparisons of the total expression level 1635
for both homoeologues (TrTo+TrTp) and for the homoeologue expression ratios (TrTo/TrTp). Fl: 1636
flower. Le: leaf. St: stolon/shoot. Ro: root. Error bars indicate 95% confidence intervals 1637
calculated using the formula tanh(arctanh(r2)±1.96/sqrt(n−3)), where r2 is the Pearson 1638
correlation estimate and n=19,954 is the number of observations. D) Venn diagram showing 1639
the overlaps between the expression outlier genes detected in the Pooled, S9 and S10 1640
experiments. E) Log(TrTo/TrTp) difference from the mean values from all three experiments 1641
(Pooled, S9, S10) plotted for a randomly selected gene and an expression ratio outlier. The 1642
grey horizontal line indicates the mean. m-leaf: mature leaf. leaf: emerging young leaf. F) 1643
Clustered heatmap showing row-normalized log(TrTo/TrTp) values for all 909 expression 1644
outliers detected using the ANOVA method. m-leaf: mature leaf. leaf: emerging young leaf. 1645
G) Smoothed scatter plot showing the log(TrTo/TrTp) difference from the mean for all 19,954 1646
genes and the 909 ANOVA outliers, respectively. H) Expression ratio outliers in flavonoid 1647
metabolism. Blue text highlights enzymes found as expression ratio outliers. Numbers above 1648
enzyme names are enzyme category (EC) identifiers. 1649
1650
49
Tables 1651
Table 1: Genome assembly statistics for white clover (Trifolium repens) and extant relatives 1652
of its diploid progenitors, T. occidentale and T. pallescens. 1653 1654
T. occidentale T. pallescens T. repens
Estimated Genome Size (bp)
ALLPATHS-LG estimate 530,459,686 534,411,997 1,174,038,073
k-mer (k = 17) 581,040,135 591,946,893 1,105,102,999
1C Genome size1 ND ND 1,093,000,000
Total Assembly Length 436,797,749 382,382,469 841,400,916
Estimated Genome Coverage 82% 72% 72%
Chloroplast Genome 133,780 130,394 132,086
Assembly Metrics
Assembled Scaffolds (>1000 bp) 7,229 6,923 22,100
Longest Scaffold (bp) 2,576,867 1,323,392 734,507
Average Length (bp) 61,288 55,234 38,072
N10 (bp; number in category) 599,981; 49 571,804; 48 318,557; 201
N50 (bp; number in category) 191,714; 624 172,944; 586 121,657; 2016
N90 (bp; number in category) 41,953; 2416 34,989; 2451 19,736; 8100
Total Undefined bases (Ns) 51,627,539 65,642,595 86,144,561
Gap Ratio 11.8% 17.2% 10.2%
BUSCO2 Scores of Assembly (based on 1440 reference genes)
Complete genes - Total 1353 (94%) 1353 (94%) 1321 (92%)
Complete genes - Single Copy 1203 (84%) 1197 (83%) 494 (34%)
50
Complete genes - Duplicated 150 (10%) 156 (11%) 827 (57%)
Fragmented genes 37 (3%) 33 (2%) 39 (3%)
Missing genes 50 (3%) 54 (4%) 80 (6%)
1Flow cytometry (Bennett and Leitch, 2011); ND=Not Determined 1655
2BUSCO=Benchmarked Universal Single-Copy Orthologs (Simão et al., 2015). 1656
1657
1658
1659
Figure 1. The range of white clover and extant relatives of its progenitors. The present-day ranges of white clover (Trifolium repens) (Daday, 1958) (green), and extant relatives of its diploid progenitors T. occidentale (blue) and T. pallescens (orange). T. occidentale is found within ~100 m of the seashore whereas T. pallescens grows in alpine regions between 1800 and 2700 m.
Figure 2. White clover genetic map and synteny with extant progenitor and model forage legume genomes. A) Genetic map based on linkage disequilibrium (LD) analysis of ~7,300 scaffold-mapped markers from a F1 biparental mapping population (n=93). Red colour indicates a high level of pairwise linkage disequilibrium (r2) between markers. Black dots indicate single locus homoeologue-specific microsatellite markers from an existing, co-linear genetic linkage map (Griffiths et al., 2013) (Supplemental Figure 2). Progenitor origin of linkage group homoeologues/subgenomes is denoted by O (Trifolium occidentale) and P (T. pallescens). B) Circos diagram showing inter-pseudomolecule relationships between the subgenomes of white clover (TrTo; TrTp) and their progenitors (To; Tp) in these assemblies. The outer ring (green shaded) represents pseudomolecules in megabases (Mb), and the inner ring (blue) depicts gene density as a proportion along the pseudomolecules (%). Coloured lines represent synteny blocks constructed by whole genome alignment using LASTZ (Harris, 2007) comprising matches over 33 kb in length. Blocks within 100 kb windows were merged and represented by a single line. Cross-progenitor links indicate regions of high conservation between the subgenome and both progenitors, not putative recombination events. C) A matrix plot assessment of synteny between white clover subgenomes (O and P) and the reference genome of Medicago truncatula, a model forage legume with eight pseudomolecules (Mt 1 to Mt 8). Synteny was based on alignment of the 3,364 LD-ordered white clover anchor scaffolds with M. truncatula genome Mt4.0 (Tang et al., 2014).
Figure 3. The white clover allopolyploidization event. A) Schematic phylogenetic tree for white clover (Tr) and its progenitors T. occidentale (To) and T. pallescens (Tp). A common ancestor (A) gave rise to To and Tp (grey line), which hybridized (Hyb, blue line) to give rise to the two sub-genomes TrTo and TrTp in allopolyploid white clover. B) Blue dots indicate split time estimates based on sets of 100 alignment blocks derived from the pairwise genome comparisons indicated, e.g. To vs Tp. Dots are superimposed on box-and-whiskers plots, where the median is indicated by a black vertical line. Using a mutation rate of 1.8×10-
8, split time estimates suggest To and Tp diverged from a common ancestor ~192k years ago, whereas white clover TrTo and TrTp subgenomes split from their extant progenitors 15k and 17k years ago, respectively, suggesting genesis of white clover ~16k years ago. C) White clover progenitor speciation (grey block) and hybridization giving rise to white clover (blue block) aligned with global temperature variation relative to present day average temperature based on ice core data (Jouzel et al., 2007). The blocks indicate the extent of possible divergence times using mutation rates in the range 1.1-1.8×10-8 (Supplemental Table 9). D) Pairwise sequentially Markovian coalescent (PSMC) curve based on whole-genome resequencing data from each of four white clover individuals. Each curve indicates inferred population size history through time. The analysis depicted here was carried out using a mutation rate of 1.8 x 10-8. See Supplemental Table 11 for results with lower mutation rates. E) Multiple sequentially Markovian coalescent (MSMC) figure based on the same data as the PSMC figure. The number of haplotypes corresponds to the number of individuals included. 8 haplotypes comprises all 4. 6 haplotypes comprises individuals 81, 122 and 183, all of the individuals which had the most pronounced bottlenecks. 4 haplotypes have 3 combinations of 2 individuals, [1] includes 81 and 122, [2] includes 122 and 183, [3] includes 81 and 183. The results were scaled to a mutation rate of 1.8 x 10-8. See Supplemental Figure 10 for MSMC using different mutations rates. F) Site frequency spectra (SFS) of simulations across 20k generations under the different demographic scenarios indicated on the right. EPS: Effective population size. The observed SFS is scaled to match total polymorphism count for the simulations. Dashed line represents the expected SFS under a constant effective population size. Green numbers indicate simulated SNP densities (%) and goodness of fit between observed and simulated data. The goodness of fit was calculated by first dividing the simulated and observed values for each bin with the maximum value of the two and then using the formula: Σ(observed-simulated)2/simulated. Lower values indicate a better fit. The observed SNP density was 7.0%.
Figure 4. Homoeologue-specific expression analysis. A) Genes classified based on the number of tissues in which the majority of reads were derived from the TrTo or the TrTp homoeologue. The numbers underneath the graph indicate the number of tissues for which the statement on the left is true. Most of the genes (69%) show the same direction of bias across all four tissues. B) Log(TrTo/TrTp) expression ratios in flowers versus leaves. C) Spearman correlation coefficients for cross-tissue comparisons of the total expression level for both homoeologues (TrTo+TrTp) and for the homoeologue expression ratios (TrTo/TrTp). Fl: flower. Le: leaf. St: stolon/shoot. Ro: root. Error bars indicate 95% confidence intervals calculated using the formula tanh(arctanh(r2)±1.96/sqrt(n−3)), where r2 is the Pearson correlation estimate and n=19,954 is the number of observations. D) Venn diagram showing the overlaps between the expression outlier genes detected in the Pooled, S9 and S10 experiments. E) Log(TrTo/TrTp) difference from the mean values from all three experiments (Pooled, S9, S10) plotted for a randomly selected gene and an expression ratio outlier. The grey horizontal line indicates the mean. m-leaf: mature leaf. leaf: emerging young leaf. F) Clustered heatmap showing row-normalized log(TrTo/TrTp) values for all 909 expression outliers detected using the ANOVA method. m-leaf: mature leaf. leaf: emerging young leaf. G) Smoothed scatter plot showing the log(TrTo/TrTp) difference from the mean for all 19,954 genes and the 909 ANOVA outliers, respectively. H) Expression ratio outliers in flavonoid metabolism. Blue text highlights enzymes found as expression ratio outliers. Numbers above enzyme names are enzyme category (EC) identifiers.
Parsed CitationsIWGSC)., I.W.G.S.C. (2014). A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science345, 1251788.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Aasmo Finne, M., Rognli, O.A., and Schjelderup, I. (2000). Genetic variation in a Norwegian germplasm collection of white clover(Trifolium repens L.). Euphytica 112, 45-56.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Abberton, M.T., Fothergill, M., Collins, R.P., and Marshall, A.H. (2006). Breeding forage legumes for sustainable and profitable farmingsystems. Asp. Appl. Biol. 80, 81-87.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Adams, K.L., Cronn, R., Percifield, R., and Wendel, J.F. (2003). Genes duplicated by polyploidy show unequal contributions to thetranscriptome and organ-specific reciprocal silencing. Proc. Natl. Acad. Sci. USA 100, 4649-4654.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Ainouche, M.L., Fortune, P.M., Salmon, A., Parisod, C., Grandbastien, M.-A., Fukunaga, K., Ricou, M., and Misset, M.-T. (2009).Hybridization, polyploidy and invasion: lessons from Spartina (Poaceae). Biol. Invasions 11, 1159-1173.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Akama, S., Shimizu-Inatsugi, R., Shimizu, K.K., and Sese, J. (2014). Genome-wide quantification of homeolog expression ratio revealednonstochastic gene regulation in synthetic allopolyploid Arabidopsis. Nucleic Acids Res 42, e46.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Álvarez, I., and Wendel, J.F. (2003). Ribosomal ITS sequences and plant phylogenetic inference. Mol. Phylogen. Evol. 29, 417-434.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Anderson, C.B., Franzmayr, B.K., Hong, S.W., Larking, A.C., van Stijn, T.C., Tan, R., Moraga, R., Faville, M.J., and Griffiths, A.G. (2018).Protocol: a versatile, inexpensive, high-throughput plant genomic DNA extraction method suitable for genotyping-by-sequencing.Plant Methods 14, 75.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Ansari, H.A., Ellison, N.W., and Williams, W.M. (2008). Molecular and cytogenetic evidence for an allotetraploid origin of Trifoliumdubium (Leguminosae). Chromosoma 117, 159-167.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Ansari, H.A., Ellison, N.W., Reader, S.M., Badaeva, E.D., Friebe, B., Miller, T.E., and Williams, W.M. (1999). Molecular cytogeneticorganization of 5S and 18S-26S rDNA loci in white clover (Trifolium repens L.) and related species. Ann. Bot. 83, 199-206.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Bardil, A., de Almeida, J.D., Combes, M.C., Lashermes, P., and Bertrand, B. (2011). Genomic expression dominance in the naturalallopolyploid Coffea arabica is massively affected by growth temperature. New Phytol. 192, 760-774.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Bennett, M.D., and Leitch, I.J. (2011). Nuclear DNA amounts in angiosperms: targets, trends and tomorrow. Ann. Bot. 107, 467-590.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Bertioli, D.J., Cannon, S.B., Froenicke, L., Huang, G., Farmer, A.D., Cannon, E.K., Liu, X., Gao, D., Clevenger, J., Dash, S., Ren, L.,Moretzsohn, M.C., Shirasawa, K., Huang, W., Vidigal, B., Abernathy, B., Chu, Y., Niederhuth, C.E., Umale, P., Araujo, A.C., Kozik, A., Kim,K.D., Burow, M.D., Varshney, R.K., Wang, X., Zhang, X., Barkley, N., Guimaraes, P.M., Isobe, S., Guo, B., Liao, B., Stalker, H.T., Schmitz,R.J., Scheffler, B.E., Leal-Bertioli, S.C., Xun, X., Jackson, S.A., Michelmore, R., and Ozias-Akins, P. (2016). The genome sequences ofArachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut. Nat. Genet. 48, 438-446.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Bilton, T.P., McEwan, J.C., Clarke, S.M., Brauning, R., van Stijn, T.C., Rowe, S.J., and Dodds, K.G. (2018). Linkage DisequilibriumEstimation in Low Coverage High-Throughput Sequencing Data. Genetics 209, 389-400.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Boetzer, M., Henkel, C.V., Jansen, H.J., Butler, D., and Pirovano, W. (2011). Scaffolding pre-assembled contigs using SSPACE.Bioinformatics 27, 578-579.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Bottley, A., Xia, G.M., and Koebner, R.M.D. (2006). Homoeologous gene silencing in hexaploid wheat. Plant J. 47, 897-906.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Branca, A., Paape, T.D., Zhou, P., Briskine, R., Farmer, A.D., Mudge, J., Bharti, A.K., Woodward, J.E., May, G.D., Gentzbittel, L., Ben, C.,Denny, R., Sadowsky, M.J., Ronfort, J., Bataillon, T., Young, N.D., and Tiffin, P. (2011). Whole-genome nucleotide diversity,recombination, and linkage disequilibrium in the model legume Medicago truncatula. Proc. Natl. Acad. Sci. USA 108, E864-870.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Brochmann, C., Brysting, A.K., Alsos, I.G., Borgen, L., Grundt, H.H., Scheen, A.C., and Elven, R. (2004). Polyploidy in arctic plants. Biol.J. Linn. Soc. 82, 521-536.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Buggs, R.J.A., Elliott, N.M., Zhang, L., Koh, J., Viccini, L.F., Soltis, D.E., and Soltis, P.S. (2010). Tissue-specific silencing of homoeologsin natural populations of the recent allopolyploid Tragopogon mirus. New Phytol. 186, 175-183.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Cao, M., Fraser, K., Jones, C., Stewart, A., Lyons, T., Faville, M., and Barrett, B. (2017). Untargeted metabotyping Lolium perennereveals population-level variation in plant flavonoids and alkaloids. Front. Plant Sci. 8, 133.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Caspi, R., Billington, R., Ferrer, L., Foerster, H., Fulcher, C.A., Keseler, I.M., Kothari, A., Krummenacker, M., Latendresse, M., Mueller,L.A., Ong, Q., Paley, S., Subhraveti, P., Weaver, D.S., and Karp, P.D. (2016). The MetaCyc database of metabolic pathways and enzymesand the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 44, 471-480.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Catchen, J., Hohenlohe, P.A., Bassham, S., Amores, A., and Cresko, W.A. (2013). Stacks: an analysis tool set for population genomics.Mol. Ecol. 22, 3124-3140.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Cenci, A., Combes, M.C., and Lashermes, P. (2012). Genome evolution in diploid and tetraploid Coffea species as revealed bycomparative analysis of orthologous genome segments. Plant Mol. Biol. 78, 135-145.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Chelaifa, H., Monnier, A., and Ainouche, M. (2010). Transcriptomic changes following recent natural hybridization and allopolyploidy inthe salt marsh species Spartina x townsendii and Spartina anglica (Poaceae). New Phytol. 186, 161-174.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Chen, J., Glémin, S., and Lascoux, M. (2017). Genetic diversity and the efficacy of purifying selection across plant and animal species.Mol. Biol. Evol. 34, 1417-1428.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Cheng, F., Wu, J., Fang, L., Sun, S., Liu, B., Lin, K., Bonnema, G., and Wang, X. (2012). Biased gene fractionation and dominant geneexpression among the subgenomes of Brassica rapa. PloS one 7, e36442.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Chester, M., Gallagher, J.P., Symonds, V.V., Cruz da Silva, A.V., Mavrodiev, E.V., Leitch, A.R., Soltis, P.S., and Soltis, D.E. (2012).Extensive chromosomal variation in a recently formed natural allopolyploid species, Tragopogon miscellus (Asteraceae). Proc. Natl.Acad. Sci. USA 109, 1176-1181.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Combes, M.-C., Dereeper, A., Severac, D., Bertrand, B., and Lashermes, P. (2013). Contribution of subgenomes to the transcriptomeand their intertwined regulation in the allopolyploid Coffea arabica grown at contrasted temperatures. New Phytol. 200, 251-260.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Google Scholar: Author Only Title Only Author and Title
Combes, M.C., Hueber, Y., Dereeper, A., Rialle, S., Herrera, J.C., and Lashermes, P. (2015). Regulatory divergence between parentalalleles determines gene expression patterns in hybrids. Genome biology and evolution 7, 1110-1121.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Coombe, D. (1961). Trifolium occidentale, a new species related to T. repens L. Watsonia 5, 68-87.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Cousins, G., and Woodfield, D.R. (2006). Effect of inbreeding on growth of white clover. In 13th Australasian Plant BreedingConference, C.F. Mercer, ed (Christchurch, New Zealand: New Zealand Grassland Association), pp. 568-572.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Cox, M.P., Peterson, D.A., and Biggs, P.J. (2010). SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencingdata. BMC Bioinformatics 11, 485.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Daday, H. (1958). Gene frequencies in wild populations of Trifolium repens L. III. World distribution. Heredity 12, 169-184.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Darling, A.C.E., Mau, B., Blattner, F.R., and Perna, N.T. (2004). Mauve: multiple alignment of conserved genomic sequence withrearrangements. Genome Res. 14, 1394-1403.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Davies, K.M., Albert, N.W., Zhou, Y., and Schwinn, K.E. (2018). Functions of flavonoid and betalain pigments in abiotic stress tolerancein plants. Annual Plant Reviews online 1, 1-41.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
De Storme, N., and Geelen, D. (2014). The impact of environmental stress on male reproductive development in plants: biologicalprocesses and molecular mechanisms. Plant, Cell Environ. 37, 1-18.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
DePristo, M.A., Banks, E., Poplin, R., Garimella, K.V., Maguire, J.R., Hartl, C., Philippakis, A.A., del Angel, G., Rivas, M.A., Hanna, M.,McKenna, A., Fennell, T.J., Kernytsky, A.M., Sivachenko, A.Y., Cibulskis, K., Gabriel, S.B., Altshuler, D., and Daly, M.J. (2011). Aframework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491-498.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., and Gingeras, T.R. (2013). STAR:ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Douglas, G.M., Gos, G., Steige, K.A., Salcedo, A., Holm, K., Josephs, E.B., Arunkumar, R., Agren, J.A., Hazzouri, K.M., Wang, W., Platts,A.E., Williamson, R.J., Neuffer, B., Lascoux, M., Slotte, T., and Wright, S.I. (2015). Hybrid origins and the earliest stages of diploidizationin the highly successful recent polyploid Capsella bursa-pastoris. Proc. Natl. Acad. Sci. USA 112, 2806-2811.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Duchemin, W., Dupont, P.Y., Campbell, M.A., Ganley, A.R., and Cox, M.P. (2015). HyLiTE: accurate and flexible analysis of geneexpression in hybrid and allopolyploid species. BMC Bioinformatics 16, 8.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Ellison, N.W., Liston, A., Steiner, J.J., Williams, W.M., and Taylor, N.L. (2006). Molecular phylogenetics of the clover genus (Trifolium -Leguminosae). Mol. Phylogen. Evol. 39, 688-705.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Elshire, R.J., Glaubitz, J.C., Sun, Q., Poland, J.A., Kawamoto, K., Buckler, E.S., and Mitchell, S.E. (2011). A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PloS one 6, e19379.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Ewing, B., and Green, P. (1998). Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186-194.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Fawcett, J.A., Maere, S., and Van de Peer, Y. (2009). Plants with double genomes might have had a better chance to survive theCretaceous–Tertiary extinction event. Proc. Natl. Acad. Sci. USA 106, 5737-5742.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Feldman, M., Liu, B., Segal, G., Abbo, S., Levy, A.A., and Vega, J.M. (1997). Rapid elimination of low-copy DNA sequences in polyploidwheat: a possible mechanism for differentiation of homoeologous chromosomes. Genetics 147, 1381-1387.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Ferreira de Carvalho, J., Poulain, J., Da Silva, C., Wincker, P., Michon-Coudouel, S., Dheilly, A., Naquin, D., Boutte, J., Salmon, A., andAinouche, M. (2013). Transcriptome de novo assembly from next-generation sequencing and comparative analyses in the hexaploidsalt marsh species Spartina maritima and Spartina alterniflora (Poaceae). Heredity 110, 181-193.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Franzmayr, B.K., Rasmussen, S., Fraser, K.M., and Jameson, P.E. (2012). Expression and functional characterization of a white cloverisoflavone synthase in tobacco. Ann. Bot. 110, 1291-1301.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Garsmeur, O., Schnable, J.C., Almeida, A., Jourda, C., D'Hont, A., and Freeling, M. (2014). Two evolutionarily distinct classes ofpaleopolyploidy. Mol. Biol. Evol. 31, 448-454.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Gnerre, S., MacCallum, I., Przybylski, D., Ribeiro, F.J., Burton, J.N., Walker, B.J., Sharpe, T., Hall, G., Shea, T.P., Sykes, S., Berlin, A.M.,Aird, D., Costello, M., Daza, R., Williams, L., Nicol, R., Gnirke, A., Nusbaum, C., Lander, E.S., and Jaffe, D.B. (2011). High-quality draftassemblies of mammalian genomes from massively parallel sequence data. Proc. Natl. Acad. Sci. USA 108, 1513-1518.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Griffiths, A.G., Barrett, B.A., Simon, D., Khan, A.K., Bickerstaff, P., Anderson, C.B., Franzmayr, B.K., Hancock, K.R., and Jones, C.S.(2013). An integrated genetic linkage map for white clover (Trifolium repens L.) with alignment to Medicago. BMC Genomics 14, 388.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Gross, B.L., Kane, N.C., Lexer, C., Rosenthal, L., Fulco, D.M., Donovan, L.A., and Rieseberg, L.H. (2004). Reconstructing the Origin ofHelianthus deserticola: Survival and Selection on the Desert Floor. Am. Nat. 164, 145-156.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Grover, C.E., Gallagher, J.P., Szadkowski, E.P., Yoo, M.J., Flagel, L.E., and Wendel, J.F. (2012). Homoeolog expression bias andexpression level dominance in allopolyploids. New Phytol. 196, 966-971.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Hahsler, M., Buchta, C., and Hornik, K. (2016). Infrastructure for seriation. R package version 1.2-1. .Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Harris, R.S. (2007). Improved pairwise alignment of genomic DNA. In College of Engineering (The Pennsylvania State University), pp.84.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Herten, K., Hestand, M.S., Vermeesch, J.R., and Van Houdt, J.K. (2015). GBSX: a toolkit for experimental design and demultiplexinggenotyping by sequencing experiments. BMC Bioinformatics 16, 73.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Hoff, K.J., Lange, S., Lomsadze, A., Borodovsky, M., and Stanke, M. (2017). BRAKER2 (http://github.com/Gaius-Augustus/BRAKERDownloaded September 28, 2018).
Hoffmann, M.H. (2005). Evolution of the realized climatic niche in the genus Arabidopsis (Brassicaceae). Evolution 59, 1425-1436.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Hovav, R., Udall, J.A., Chaudhary, B., Rapp, R., Flagel, L., and Wendel, J.F. (2008). Partitioned expression of duplicated genes duringdevelopment and evolution of a single cell in a polyploid plant. Proc. Natl. Acad. Sci. USA 105, 6191-6195.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Jamalnasir, H., Wagiran, A., Shaharuddin, N.A., and Samad, A.A. (2013). Isolation of high quality RNA from plant rich in flavonoids,
Melastoma decemfidum Roxb ex. Jack. Australian Journal of Crop Science 7, 911-916.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Jia, J., Zhao, S., Kong, X., Li, Y., Zhao, G., He, W., Appels, R., Pfeifer, M., Tao, Y., Zhang, X., Jing, R., Zhang, C., Ma, Y., Gao, L., Gao, C.,Spannagl, M., Mayer, K.F., Li, D., Pan, S., Zheng, F., Hu, Q., Xia, X., Li, J., Liang, Q., Chen, J., Wicker, T., Gou, C., Kuang, H., He, G., Luo,Y., Keller, B., Xia, Q., Lu, P., Wang, J., Zou, H., Zhang, R., Xu, J., Gao, J., Middleton, C., Quan, Z., Liu, G., Yang, H., Liu, X., He, Z., andMao, L. (2013). Aegilops tauschii draft genome sequence reveals a gene repertoire for wheat adaptation. Nature 496, 91-95.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S., Duran, C., Thierer,T., Ashton, B., Meintjes, P., and Drummond, A. (2012). Geneious Basic: An integrated and extendable desktop software platform for theorganization and analysis of sequence data. Bioinformatics 28, 1647-1649.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Kelleher, J., Etheridge, A.M., and McVean, G. (2016). Efficient coalescent simulation and genealogical analysis for large sample sizes.PLoS Comp. Biol. 12, e1004842.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes inthe presence of insertions, deletions and gene fusions. Genome Biol 14, R36.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Kovarik, A., Dadejova, M., Lim, Y.K., Chase, M.W., Clarkson, J.J., Knapp, S., and Leitch, A.R. (2008). Evolution of rDNA in Nicotianaallopolyploids: a potential link between rDNA homogenization and epigenetics. Ann. Bot. 101, 815-823.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357-359.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to thehuman genome. Genome Biol. 10, 1-10.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Leach, L.J., Belfield, E.J., Jiang, C., Brown, C., Mithani, A., and Harberd, N.P. (2014). Patterns of homoeologous gene expression shownby RNA sequencing in hexaploid bread wheat. BMC Genomics 15, 276.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Leitch, A.R., and Leitch, I.J. (2008). Genomic plasticity and the diversity of polyploid plants. Science 320, 481-483.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Li, H., and Durbin, R. (2011). Inference of human population history from individual whole-genome sequences. Nature 475, 493-496.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and Subgroup, G.P.D.P. (2009).The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Li, Z., Baniaga, A.E., Sessa, E.B., Scascitelli, M., Graham, S.W., Rieseberg, L.H., and Barker, M.S. (2015). Early genome duplications inconifers and other seed plants. Sci. Adv. 1, e1501084.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Liu, Z., and Adams, K.L. (2007). Expression partitioning between genes duplicated by polyploidy under abiotic stress and during organdevelopment. Curr. Biol. 17, 1669-1674.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Liu, Z., Xin, M., Qin, J., Peng, H., Ni, Z., Yao, Y., and Sun, Q. (2015). Temporal transcriptome profiling reveals expression partitioning ofhomeologous genes contributing to heat and drought acclimation in wheat (Triticum aestivum L.). BMC Plant Biol. 15, 152.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Lomsadze, A., Ter-Hovhannisyan, V., Chernoff, Y.O., and Borodovsky, M. (2005). Gene identification in novel eukaryotic genomes by
self-training algorithm. Nucleic Acids Res. 33, 6494-6506.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Lynch, M. (2010). Evolution of the mutation rate. Trends Genet. 26, 345-352.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Madlung, A. (2013). Polyploidy and its effect on evolutionary success: old questions revisited with new tools. Heredity 110, 99-104.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Mailund, T., Halager, A.E., Westergaard, M., Dutheil, J.Y., Munch, K., Andersen, L.N., Lunter, G., Prüfer, K., Scally, A., Hobolth, A., andSchierup, M.H. (2012). A new Isolation with migration model along complete genomes infers very different divergence processesamong closely related great ape species. PLoS Genet. 8, e1003125.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Majoros, W.H., Pertea, M., and Salzberg, S.L. (2004). TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders.Bioinformatics 20, 2878-2879.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Mangerud, J., Jakobsson, M., Alexanderson, H., Astakhov, V., Clarke, G.K.C., Henriksen, M., Hjort, C., Krinner, G., Lunkka, J.-P., Möller,P., Murray, A., Nikolskaya, O., Saarnisto, M., and Svendsen, J.I. (2004). Ice-dammed lakes and rerouting of the drainage of northernEurasia during the Last Glaciation. Quatern. Sci. Rev. 23, 1313-1332.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Marçais, G., and Kingsford, C. (2011). A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics27, 764-770.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Mason, A.S., and Pires, J.C. (2015). Unreduced gametes: meiotic mishap or evolutionary mechanism? Trends Genet. 31, 5-10.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
McClintock, B. (1984). The significance of responses of the genome to challenge. Science 226, 792-801.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Mierziak, J., Kostyn, K., and Kulma, A. (2014). Flavonoids as important molecules of plant interactions with the environment. Molecules19, 16240-16265.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Mouradov, A., and Spangenberg, G. (2014). Flavonoids: a metabolic network mediating plants adaptation to their real estate. Front.Plant Sci. 5, 620.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Munch, D., Gupta, V., Bachmann, A., Busch, W., Kelly, S., Mun, T., and Andersen, S.U. (2018). The Brassicaceae family displaysdivergent, shoot-skewed NLR resistance gene expression. Plant Physiol. 176, 1598-1609.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Nadachowska-Brzyska, K., Burri, R., Smeds, L., and Ellegren, H. (2016). PSMC analysis of effective population sizes in molecularecology and its application to black-and-white Ficedula flycatchers. Mol. Ecol. 25, 1058-1072.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Novikova, P.Y., Hohmann, N., Nizhynska, V., Tsuchimatsu, T., Ali, J., Muir, G., Guggisberg, A., Paape, T., Schmid, K., Fedorenko, O.M.,Holm, S., Sall, T., Schlotterer, C., Marhold, K., Widmer, A., Sese, J., Shimizu, K.K., Weigel, D., Kramer, U., Koch, M.A., and Nordborg, M.(2016). Sequencing of the genus Arabidopsis identifies a complex history of nonbifurcating speciation and abundant trans-specificpolymorphism. Nat. Genet. 48, 1077-1082.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Ossowski, S., Schneeberger, K., Lucas-Lledó, J.I., Warthmann, N., Clark, R.M., Shaw, R.G., Weigel, D., and Lynch, M. (2010). The rateand molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327, 92-94.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Otto, S.P. (2007). The evolutionary consequences of polyploidy. Cell 131, 452-462.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Ownbey, M. (1950). Natural hybridization and amphiploidy in the genus Tragopogon. Am. J. Bot. 37, 487-499.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Paape, T., Hatakeyama, M., Shimizu-Inatsugi, R., Cereghetti, T., Onda, Y., Kenta, T., Sese, J., and Shimizu, K.K. (2016). Conserved butAttenuated Parental Gene Expression in Allopolyploids: Constitutive Zinc Hyperaccumulation in the Allotetraploid Arabidopsiskamchatica. Mol Biol Evol 33, 2781-2800.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Raffl, C., Holderegger, R., Parson, W., and Erschbamer, B. (2008). Patterns in genetic diversity of Trifolium pallescens populations donot reflect chronosequence on alpine glacier forelands. Heredity 100, 526-532.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Ramirez-Gonzalez, R.H., Borrill, P., Lang, D., Harrington, S.A., Brinton, J., Venturini, L., Davey, M., Jacobs, J., van Ex, F., Pasha, A.,Khedikar, Y., Robinson, S.J., Cory, A.T., Florio, T., Concia, L., Juery, C., Schoonbeek, H., Steuernagel, B., Xiang, D., Ridout, C.J.,Chalhoub, B., Mayer, K.F.X., Benhamed, M., Latrasse, D., Bendahmane, A., Wulff, B.B.H., Appels, R., Tiwari, V., Datla, R., Choulet, F.,Pozniak, C.J., Provart, N.J., Sharpe, A.G., Paux, E., Spannagl, M., Brautigam, A., and Uauy, C. (2018). The transcriptional landscape ofpolyploid wheat. Science 361.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Ramsey, J., and Schemske, D.W. (1998). Pathways, mechanisms, and rates of polyploid formation in flowering plants. Annu. Rev. Ecol.Syst. 29, 467-501.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Rapp, R.A., Udall, J.A., and Wendel, J.F. (2009). Genomic expression dominance in allopolyploids. BMC Biol. 7, 18.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Robinson, M.D., McCarthy, D.J., and Smyth, G.K. (2010). edgeR: a Bioconductor package for differential expression analysis of digitalgene expression data. Bioinformatics 26, 139-140.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Sanggaard, K.W., Bechsgaard, J.S., Fang, X., Duan, J., Dyrlund, T.F., Gupta, V., Jiang, X., Cheng, L., Fan, D., Feng, Y., Han, L., Huang, Z.,Wu, Z., Liao, L., Settepani, V., Thøgersen, I.B., Vanthournout, B., Wang, T., Zhu, Y., Funch, P., Enghild, J.J., Schauser, L., Andersen, S.U.,Villesen, P., Schierup, M.H., Bilde, T., and Wang, J. (2014). Spider genomes provide insight into composition and evolution of venomand silk. Nature Communications 5, 3765.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Schiffels, S., and Durbin, R. (2014). Inferring human population size and separation history from multiple genome sequences. NatGenet 46, 919-925.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Schnable, J.C., Springer, N.M., and Freeling, M. (2011). Differentiation of the maize subgenomes by genome dominance and bothancient and ongoing gene loss. Proc. Natl. Acad. Sci. USA 108, 4069-4074.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Selmecki, A.M., Maruvka, Y.E., Richmond, P.A., Guillet, M., Shoresh, N., Sorenson, A.L., De, S., Kishony, R., Michor, F., Dowell, R., andPellman, D. (2015). Polyploidy can drive rapid adaptation in yeast. Nature 519, 349-352.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Simão, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., and Zdobnov, E.M. (2015). BUSCO: assessing genome assembly andannotation completeness with single-copy orthologs. Bioinformatics 31, 3210-3212.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Smit, A.F.A., Hubley, R., and Green, P. (2015). RepeatMasker Open-4.0 (http://www.repeatmasker.org Downloaded December 2018).
Soltis, D.E., Visger, C.J., Marchant, D.B., and Soltis, P.S. (2016). Polyploidy: Pitfalls and paths to a paradigm. Am. J. Bot. 103, 1146-1166.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Soltis, D.E., Soltis, P.S., Pires, J.C., Kovarik, A., Tate, J.A., and Mavrodiev, E. (2004). Recent and recurrent polyploidy in Tragopogon(Asteraceae): cytogenetic, genomic and genetic comparisons. Biol. J. Linn. Soc. 82, 485-501.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Soltis, P.S., and Soltis, D.E. (2016). Ancient WGD events as drivers of key innovations in angiosperms. Curr. Opin. Plant Biol. 30, 159-165.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Soltis, P.S., Marchant, D.B., Van de Peer, Y., and Soltis, D.E. (2015). Polyploidy and genome evolution in plants. Curr. Opin. Genet. Dev.35, 119-125.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Stanke, M., Keller, O., Gunduz, I., Hayes, A., Waack, S., and Morgenstern, B. (2006). AUGUSTUS: ab initio prediction of alternativetranscripts. Nucleic Acids Res 34, W435-439.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Tang, H., Krishnakumar, V., Bidwell, S., Rosen, B., Chan, A., Zhou, S., Gentzbittel, L., Childs, K.L., Yandell, M., Gundlach, H., Mayer,K.F., Schwartz, D.C., and Town, C.D. (2014). An improved genome release (version Mt4.0) for the model legume Medicago truncatula.BMC Genomics 15, 312.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Tayalé, A., and Parisod, C. (2013). Natural pathways to polyploidy in plants and consequences for genome reorganization. Cytogenet.Genome Res. 140, 79-96.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L., and Pachter, L. (2012).Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Prot. 7, 562-578.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Tsafrir, D., Tsafrir, I., Ein-Dor, L., Zuk, O., Notterman, D.A., and Domany, E. (2005). Sorting points into neighborhoods (SPIN): dataanalysis and visualization by ordering distance matrices. Bioinformatics 21, 2301-2308.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Vanneste, K., Maere, S., and Van de Peer, Y. (2014a). Tangled up in two: a burst of genome duplications at the end of the Cretaceousand the consequences for plant evolution. Philos. Trans. R. Soc. B 369, 20130353.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Vanneste, K., Baele, G., Maere, S., and Van de Peer, Y. (2014b). Analysis of 41 plant genomes supports a wave of successful genomeduplications in association with the Cretaceous–Paleogene boundary. Genome Res. 24, 1334-1347.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Wang, J., Tian, L., Madlung, A., Lee, H.S., Chen, M., Lee, J.J., Watson, B., Kagochi, T., Comai, L., and Chen, Z.J. (2004). Stochastic andepigenetic changes of gene expression in Arabidopsis polyploids. Genetics 167, 1961-1973.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Wang, X., Wang, H., Wang, J., Sun, R., Wu, J., Liu, S., Bai, Y., Mun, J.H., Bancroft, I., Cheng, F., Huang, S., Li, X., Hua, W., Freeling, M.,Pires, J.C., Paterson, A.H., Chalhoub, B., Wang, B., Hayward, A., Sharpe, A.G., Park, B.S., Weisshaar, B., Liu, B., Li, B., Tong, C., Song,C., Duran, C., Peng, C., Geng, C., Koh, C., Lin, C., Edwards, D., Mu, D., Shen, D., Soumpourou, E., Li, F., Fraser, F., Conant, G., Lassalle,G., King, G.J., Bonnema, G., Tang, H., Belcram, H., Zhou, H., Hirakawa, H., Abe, H., Guo, H., Jin, H., Parkin, I.A., Batley, J., Kim, J.S., Just,J., Li, J., Xu, J., Deng, J., Kim, J.A., Yu, J., Meng, J., Min, J., Poulain, J., Hatakeyama, K., Wu, K., Wang, L., Fang, L., Trick, M., Links, M.G.,Zhao, M., Jin, M., Ramchiary, N., Drou, N., Berkman, P.J., Cai, Q., Huang, Q., Li, R., Tabata, S., Cheng, S., Zhang, S., Sato, S., Sun, S.,Kwon, S.J., Choi, S.R., Lee, T.H., Fan, W., Zhao, X., Tan, X., Xu, X., Wang, Y., Qiu, Y., Yin, Y., Li, Y., Du, Y., Liao, Y., Lim, Y., Narusaka, Y.,Wang, Z., Li, Z., Xiong, Z., and Zhang, Z. (2011). The genome of the mesopolyploid crop species Brassica rapa. Nat. Genet. 43, 1035-1039.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Williams, C.A., and Grayer, R.J. (2004). Anthocyanins and other flavonoids. Natural product reports 21, 539-573.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Williams, W.M., Mason, K.M., and Williamson, M.L. (1998). Genetic analysis of shikimate dehydrogenase allozymes in Trifolium repens L.Theor. Appl. Genet. 96, 859-868.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Williams, W.M., Ellison, N.W., Ansari, H.A., Verry, I.M., and Hussain, S.W. (2012). Experimental evidence for the ancestry of allotetraploidTrifolium repens and creation of synthetic forms with value for plant breeding. BMC Plant Biol. 12, 55.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Wood, D.E., and Salzberg, S.L. (2014). Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15,R46.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Woodhouse, M.R., Cheng, F., Pires, J.C., Lisch, D., Freeling, M., and Wang, X. (2014). Origin, inheritance, and gene regulatoryconsequences of genome dominance in polyploids. Proc. Natl. Acad. Sci. USA 111, 5283-5288.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Wu, T.D., and Watanabe, C.K. (2005). GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics21, 1859-1875.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Yang, J., Liu, D., Wang, X., Ji, C., Cheng, F., Liu, B., Hu, Z., Chen, S., Pental, D., Ju, Y., Yao, P., Li, X., Xie, K., Zhang, J., Wang, J., Liu, F.,Ma, W., Shopan, J., Zheng, H., Mackenzie, S.A., and Zhang, M. (2016). The genome sequence of allopolyploid Brassica juncea andanalysis of differential homoeolog gene expression influencing selection. Nat. Genet. 48, 1225-1232.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Yokoyama, Y., Lambeck, K., De Deckker, P., Johnston, P., and Fifield, L.K. (2000). Timing of the Last Glacial Maximum from observedsea-level minima. Nature 406, 713-716.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Yoo, M.J., Szadkowski, E., and Wendel, J.F. (2013). Homoeolog expression bias and expression level dominance in allopolyploidcotton. Heredity 110, 171-180.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Zerbino, D.R., and Birney, E. (2008). Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821-829.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Zhang, D., Pan, Q., Cui, C., Tan, C., Ge, X., Shao, Y., and Li, Z. (2015a). Genome-specific differential gene expressions in resynthesizedBrassica allotetraploids from pair-wise crosses of three cultivated diploids revealed by RNA-seq. Front. Plant Sci. 6, 957.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Zhang, T., Hu, Y., Jiang, W., Fang, L., Guan, X., Chen, J., Zhang, J., Saski, C.A., Scheffler, B.E., Stelly, D.M., Hulse-Kemp, A.M., Wan, Q.,Liu, B., Liu, C., Wang, S., Pan, M., Wang, Y., Wang, D., Ye, W., Chang, L., Zhang, W., Song, Q., Kirkbride, R.C., Chen, X., Dennis, E.,Llewellyn, D.J., Peterson, D.G., Thaxton, P., Jones, D.C., Wang, Q., Xu, X., Zhang, H., Wu, H., Zhou, L., Mei, G., Chen, S., Tian, Y., Xiang,D., Li, X., Ding, J., Zuo, Q., Tao, L., Liu, Y., Li, J., Lin, Y., Hui, Y., Cao, Z., Cai, C., Zhu, X., Jiang, Z., Zhou, B., Guo, W., Li, R., and Chen, Z.J.(2015b). Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nat.Biotechnol. 33, 531-537.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Zhang, X., Zhang, Y., Yan, R., Han, J., Fuzeng, H., Wang, J., and Cao, K. (2010). Genetic variation of white clover (Trifolium repens L.)collections from China detected by morphological traits, RAPD and SSR. Afr. J. Biotechnol. 9, 3033-3041.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Zimin, A.V., Marçais, G., Puiu, D., Roberts, M., Salzberg, S.L., and Yorke, J.A. (2013). The MaSuRCA genome assembler. Bioinformatics29, 2669-2677.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Recommended