Upload
others
View
25
Download
0
Embed Size (px)
Citation preview
1
Running Head : GWAS in tomato for metabolic traits 1
2
Corresponding Author: 3
Christopher Sauvage (Ph.D) 4
INRA, UR1052, Génétique et Amélioration des Fruits et Légumes (GAFL) 5
Domaine St Maurice – 67 Allée des Chênes, CS 60094, F-84143 MONTFAVET 6
Cedex, France 7
Tél : +33 (0)4 32 72 27 54 8
Email: [email protected] 9
10
Research Area: 11
Genes, Development and Evolution 12
13
Plant Physiology Preview. Published on June 3, 2014, as DOI:10.1104/pp.114.241521
Copyright 2014 by the American Society of Plant Biologists
2
Title: Genome Wide Association in tomato reveals 44 candidate loci for fruit 14
metabolic traits. 15
16
Authors : Christopher Sauvage 1, Vincent Segura 2, Guillaume Bauchet 1,3, Rebecca 17
Stevens 1, Phuc Thi Do 4,5, Zoran Nikoloski 4, Alisdair R. Fernie4 and Mathilde 18
Causse 1 19
1- INRA, UR1052, GAFL, 67 allée des Chênes Domaine Saint Maurice – CS60094, 20
84143 Montfavet Cedex (France) 21
2- INRA, UR0588, 2163 avenue de la Pomme de Pin, 45075 Orléans Cedex 2 22
(France) 23
3- Syngenta Seeds, 12 chemin de l’Hobit, 31790 Saint Sauveur (France) 24
4- Max-Planck-Institut für Molekulare Pflanzenphysiologie, 14476 Potsdam-Golm 25
(Germany) 26
5- Faculty of Biology, VNU University of Science, Vietnam National University, 27
Hanoi. 334 Nguyen Trai, Thanh Xuan, Hanoi, Vietnam. 28
29
Summary : Genome wide association has shed light on the genetic architecture of 30
metabolic traits underlying fruit quality in tomato and allowed identification of 31
candidate genes. 32
33
34
3
Financial Source: 35
36
The EU EUSOL project PL016214-2 funded this work. 37
38
Corresponding Author : 39
40
Christopher Sauvage (Ph.D) 41
INRA, UR1052, Génétique et Amélioration des Fruits et Légumes (GAFL) 42
Domaine St Maurice - Allée des Chênes, CS 60094, F-84143 MONTFAVET Cedex, 43
France 44
Tél : +33 (0)4 32 72 27 54 45
Email: [email protected] 46
47
4
ABSTRACT 48
Genome-wide association studies have been successful in identifying genes involved 49
in polygenic traits and are valuable for crop improvement. Tomato (Solanum 50
lycopersicum) is a major crop and is highly appreciated worldwide for its health 51
value. We used a core collection of 163 tomato accessions composed of S. 52
lycopersicum, S. lycopersicum cv. cerasiforme and S. pimpinellifolium to map loci 53
controlling variation in fruit metabolites. Fruits were phenotyped for a broad range of 54
metabolites including amino acids, sugars and ascorbate. In parallel, the accessions 55
were genotyped with 5995 SNP markers spread over the whole genome. Genome 56
Wide Association analysis was conducted on a large set of metabolic traits that were 57
stable over two years, using a multi-locus mixed model as a general method for 58
mapping complex traits in structured populations (Segura et al., 2012) and applied to 59
tomato. We detected a total of 44 loci that were significantly associated with a total of 60
19 traits including sucrose, ascorbate, malate and citrate levels. These results not only 61
provide a list of candidate loci to be functionally validated but also a powerful 62
analytical approach for finding genetic variants that can be directly used for crop 63
improvement and deciphering the genetic architecture of complex traits. 64
5
INTRODUCTION 65
In crops, linkage mapping has proved invaluable for detecting quantitative trait loci 66
(QTL) for traits of interest and to unravel their underlying genetic architecture. This 67
approach is based on the analysis of the segregation of polymorphism between the 68
parental lines and their progeny. However, one of the limitations of this approach is 69
the reduced number of recombination events that occur per generation (for review, see 70
Korte & Farlow, 2013). This leads to extended linkage blocks that reduce the 71
accuracy of the linkage mapping. An alternative to linkage-based mapping studies is 72
to perform linkage disequilibrium (LD) mapping in a population of theoretically 73
unrelated individuals. The ancestral polymorphism segregating through this 74
population (or panel) is far more informative compared to the polymorphism of the 75
parental lines of the linkage mapping population (Mauricio, 2001). LD mapping, also 76
known as genome-wide association (GWA), relies on the natural patterns of LD in the 77
population investigated. The aim of GWA is to reveal trait-associated loci by taking 78
advantage of the level of LD. Depending on the decay of LD, the mapping resolution 79
can be narrowed from a large genomic portion where the level of LD is relatively high 80
to a single marker when the LD level is very low. 81
Following domestication, crops are prone to (1) increased levels of LD, (2) population 82
structure (remote common ancestry of large groups of individuals) and (3) cryptic 83
relatedness (the presence of close relatives in a sample of unrelated individuals) 84
(Riedelsheimer et al., 2012). Population structure and cryptic relatedness may lead to 85
false-positive association in GWA studies (Astle & Balding, 2009) but their effect is 86
now relatively well accounted for in mixed linear models (MLM) (for reviews see 87
(Sillanpää, 2010; Listgarten et al., 2012). The problem of high LD in GWA scans 88
must also be taken into account: Segura et al. (2012) investigated this difficulty by 89
proposing a multi-locus mixed-model (MLMM) that handles the confounding effect of 90
background loci that may be present throughout the genome due to linkage 91
disequilibrium. This approach revealed multiple loci in LD and associated with 92
sodium concentration in leaves in A. thaliana while previous methods failed to 93
identify this complex pattern (Segura et al., 2012). 94
In parallel, the development of cost-effective high-throughput sequencing 95
technologies has identified increasingly dense variant loci necessary to conduct GWA 96
scans, especially in model species such as rice for agronomic traits (Huang et al., 97
2010) or maize for drought tolerance (Lu et al., 2010 but see Soto-Cerda & Cloutier, 98
6
2012, for a complete review). However, GWA is not restricted to model species and is 99
becoming increasingly widespread in non-model ones such as sunflower (Mandel et 100
al., 2013) and tomato (Xu et al., 2013) where numerous associations have been 101
successfully identified for traits related to plant architecture (branching in the case of 102
sunflower) and fruit quality (e.g. fresh weight in tomato). 103
Tomato is a crop of particular interest as the fruit are an important source of fiber and 104
nutrients in the human diet and a model for the study of fruit development 105
(Giovannoni, 2001). Over the last two decades, numerous QTL have been identified 106
for traits such as fresh weight (fw) using linkage approaches (Frary et al., 2000; Zhang 107
et al., 2012; Chakrabarti et al., 2013) but also for other fruit related traits such as fruit 108
ascorbic acid levels (Stevens et al., 2007), sensory and instrumental quality traits 109
(Causse et al., 2002), sugar and organic acids (Fulton et al., 2002) or metabolic 110
components (Schauer et al., 2008). Large tomato germplasm collections have been 111
characterized at the molecular level using SSR (Ranc et al., 2008) and SNP markers 112
(Blanca et al., 2012; Shirasawa et al., 2013) giving insights into population structure, 113
tomato evolutionary history and the genetic architecture of traits of agronomical 114
interest. These screens of nucleotide diversity were made possible (for review, see 115
Bauchet & Causse, 2012) in the last couple of years due to the release of the tomato 116
genome sequence (The Tomato Genome Consortium, 2012) and derived genomic 117
tools such as a high density SNP genotyping array (Sim et al., 2012). The 118
combination of large germplasm collections, high-throughput genomic tools and traits 119
of economical interest provide a framework to apply GWA study in this species. In 120
tomato, previous association studies have been limited to a targeted region (e.g. 121
chromosome 2, Ranc et al., 2012), used low-density genome-wide distributed SNP 122
markers (Xu et al., 2013) or investigated a limited number of agronomical of traits 123
with low precision on the association panel (Shirasawa et al., 2013). 124
Using tomato (Solanum lycopersicum) as a model, we aimed to investigate the genetic 125
architecture of traits related to fruit metabolic composition, at high resolution. To 126
reach this objective, we carried out an investigation into LD patterns at the genome-127
wide scale and a GWA scan using the MLMM approach. We present results on the 128
genetic architecture of fruit metabolic composition for metabolites such as organic 129
acids, amino acids, sugars and ascorbate in tomato. 130
RESULTS 131
PHENOTYPING 132
7
We phenotyped a panel composed of 163 accessions for a total of 76 133
metabolic traits including amino acids, organic acids and sugars. The tomato diversity 134
panel was composed of 28 S. lycopersicum (S.L), 119 S. lycopersicum cv. cerasiforme 135
(S.C) and 16 S. pimpinellifolium (S.P) derived from the previously published core 136
collection described in Xu et al. (2013). From the set of 76 phenotypes, 36 of these 137
(47.3%) were highly correlated over the two years of sampling. Of these 36 138
phenotypes, significant differences between the three groups of tomato accessions 139
were identified for 26 phenotypes (70.3% - see Figure 1). The post-hoc TukeyHSD 140
test provided a more thorough investigation of the significant differences among the 141
three groups for each trait. Comparisons including S.P (S.P-S.L and S.P-S.C) were 142
more significantly different to the comparison S.L-S.C (see Figure 2). 143
The correlation pattern revealed clusters of highly correlated compounds in the 144
metabolic profile that largely corresponded to a functional classification of the 145
metabolites (Figure 1). For example, the concentration of fructose, sucrose, maltitol, 146
erythritol and maltose clustered together with SSC (Soluble Solid Content) while 147
amino acids (e.g. serine, threonine, methionine, asparagine) also clustered together. 148
We conducted GWA on this set of 36 phenotypes, which were stable (correlated) over 149
the two experimentation years using the MLMM approach (see Supplementary Table 150
S1 for the complete phenotype dataset). 151
GENOTYPING 152
From the initial 8,784 SNPs of the SOLCAP genotyping array, 7,720 (87.8%) 153
passed the manufacturing quality control and constituted our raw data set (see 154
Materials & Methods). From this raw data set, the quality filtering gave a total of 155
5,995 reliable SNP (77.6%), thus constituting the analysed data set for GWA. The 156
overall average percentage of missing data per locus was estimated at 3.84% in the 157
whole population, while ranging from 2.25% in S.L to 4.07% in S.P. The missing data 158
were imputed by the most common allele of the SNP as no missing data is allowed in 159
the MLMM. 160
The MAF (Minor Allele Frequency) values were evenly distributed from 161
0.001% to 0.5% and showed differences in their distribution between groups. The S.L 162
accessions showed an excess of rare variants with a skewed distribution of the MAF 163
values (median MAF= 0.107), while S.C and S.P accessions showed a broader 164
distribution of the MAF values (median MAF= 0.161 and 0.214, respectively). Such a 165
low median MAF in the S.L accessions may be attributed to (1) a higher proportion of 166
8
nearly monomorphic SNPs and (2) the shared ancestry within this group. This 167
observation is supported by a previous study that investigated the MAF pattern in sub-168
populations of tomato (Sim et al, 2012b). 169
POPULATION STRUCTURE 170
The pairwise-population FST differentiation index was estimated to be ~1% 171
(0.0102) between S.L and S.C, while between S.L and S.P and between S.C and S.P a 172
stronger population differentiation, estimated to be 0.2132 and to 0.1583 respectively, 173
was detected. These results are supported by the estimation of the population structure 174
using the STRUCTURE software. Following the ad hoc statistic ΔK, population 175
structure was apparent with the number of ancestral populations estimated to be two 176
(K=2). Whereas the first group was composed of a cluster of the S.L. accessions and 177
the S.C accessions (N=147), the second group was composed of a cluster of the S.P 178
accessions only (N=16). 179
ESTIMATES OF KINSHIP AND LINKAGE DISEQUILIBRIUM IN THE COLLECTION 180
Within the 163 accessions, the pairwise kinship estimates revealed a low 181
degree of relatedness between individuals with a mean overall estimate of 0.0738. 182
Pairwise linkage disequilibrium estimates (rs2) within each group revealed different 183
levels of LD along chromosomes. On average, LD was higher in S.L (rs2=0.57), 184
medium in S.C (rs2=0.54) and lower in S.P (rs
2=0.34). Within each group and for the 185
12 chromosomes, rs2 estimates ranged from 0.29 (K3) to 0.39 (K12) in S.P, from 186
0.5117 (K12) to 0.5619 (K11) in SC and from 0.52 (K9) to 0.62 (K6) in SL. More 187
details on LD estimates for each chromosome in the three groups by chromosome are 188
given in Table 1. 189
GENOME-WIDE ASSOCIATION 190
GWA was conducted trait by trait in order to dissect the optimal model 191
obtained from the MLMM. After correcting for multiple testing, GWA scan identified 192
a total of 44 loci that were significantly associated with 19 out of the 36 traits 193
(52.7%). These 44 loci were spread unevenly over the genome as all chromosomes 194
carried at least one association (chromosomes 1 and 12) but up to 10 associated loci 195
were located on chromosome 2. Moreover, the number of associated loci per trait 196
ranged from one (for 8 traits in total) to nine (for SSC). Table 2 reports the detailed 197
statistics of GWA (i.e. p-value, genomic location) for the loci associated with these 19 198
traits. For each trait, the heritability (estimated at the step 0 of the model, based on the 199
9
the variance component ���, computed for all markers and representing the estimated 200
genetic variance of the trait) ranged from 0.168 (threonate level) to 0.773 (proline 201
level) with a median value of 0.553 (over all traits), while the missing heritability (not 202
explained by the markers included in the model) ranged from 0.007 (threonine level) 203
to 0.458 (nicotinate level) with a median value of 0.250. The percentage of variation 204
explained for each trait was estimated from the optimal model obtained from the 205
MLMM. The percentage of variation explained ranged from 16.2% to 74.3% for the 206
aspartate level and the dehydroascorbate level traits, respectively, while for the SSC, 207
the percentage of variation was estimated as 0.611 (61%, see Table 3 for details). For 208
each trait, the Manhattan plot displaying p-values for each locus in relation to its 209
genomic location are shown in Supplementary Figure 1. 210
Finally, the peak SNP associated with SSC (Solcap_snp_sl_26678) that 211
belongs to a candidate gene (Solyc09g010080.2 (lin5), a fruit-specific Beta-212
fructofuranosidase or invertase), which plays a role in sugar metabolism (Fridman et 213
al., 2004), validates the methodological approach we employed by its mapping in our 214
panel. We identified putative candidate genes in the present study, especially in close 215
vicinity to 4 peak SNPs. For example, the peak SNP Solcap_snp_sl_26678 216
(chromosome 9, position: 2,411,368 bp) is associated with fruit ascorbate content, and 217
is located ≈423 kbp upstream of a monodehydroascorbate reductase (NADH)-like 218
protein (MDHAR, Solyc09g009390.2, position: 2,835,367 bp) previously shown to be 219
linked to fruit ascorbate levels under stress conditions (Stevens et al., 2008). 220
Similarly, the peak SNPs associated with nicotinate (Solcap_snp_sl_29349), malate 221
(Solcap_snp_sl_19899) and sucrose levels (Solcap_snp_sl_17956) are located at 680 222
kbp, 7.9 kbp and 68 kbp respectively from three putative candidate genes that play a 223
role in the genetic architecture of the variation of these traits that are described as a 224
nicotinate phosphoribosyl transferase protein (Solyc02g093290.2, position: 225
48,771,224 bp), an aluminum-activated malate transporter-like (Solyc06g072910.2, 226
position: 41,337,629 bp) and a sugar transporter (galactosylgalactosylxylosyl protein 227
3-beta-glucuronosyltransferase, Solyc04g076920.2, position: 59,461,803 bp), 228
respectively. 229
As a case study, we focused on the results associated with fruit malate content 230
by compiling all the results obtained for this trait. Figures 3 and 4 illustrate these 231
results. Malate levels were stable over the two years of the experiment (R2=0.621, 232
10
Figure 3A), differences in malate levels were significant between groups (Figure 2 233
and 3B) and the trait was normally distributed within the panel of accessions (Figure 234
3C). GWA identified two significant SNP associated with malate levels (see Figure 235
3D) without inflation in the distribution of p-values at the optimal step of the model 236
(see figure 3E) suggesting that population structure was well controlled. These two 237
SNPs explained a proportion of the trait variation estimated at 39% (see figure 3F). 238
For each peak SNP, located on chromosome 2 and 6, the allelic effects of each 239
genotypic class (homozygotes and heterozygote) were estimated (see figure 3 G and 240
H). Finally, we used the pairwise LD estimates (rs2) for each genomic location to (1) 241
narrow down the genomic interval and (2) seek for putative candidate genes in the 242
vicinity of the two peak SNP (Figure 4A and B), providing a local overview of the 243
extent of LD and revealing an aluminum-activated malate transporter-like 244
(Solyc06g072910.2, position: 41,337,629 bp) as a good candidate gene (see above). 245
246
DISCUSSION 247
The aims of the present study were to (1) investigate LD patterns in a panel of 163 248
tomato accessions including wild, admixed and cultivated accessions and (2) 249
implement a stepwise GWA approach to reveal associations between SNP markers 250
and traits related to fruit metabolites. We successfully achieved this objective with (1) 251
the investigation of the LD patterns revealing different levels of LD along 252
chromosomes and between the three groups constituting the panel and (2) the 253
detection of genome-wide associations for 19 fruit metabolic traits. Finally, we 254
demonstrated that GWA is powerful enough to link metabolic composition of fruits in 255
tomato with genetic variation at a high resolution, despite a high level of LD and 256
population structure. 257
258
METABOLITE PROFILING AND PHENOTYPING OF TRAITS 259
The phenotypic traits focused on in the present study were measured for two years in 260
a row (2007 and 2008) under similar growth conditions on an identical set of 163 261
tomato accessions. Only 36 traits (47.3%) were stable over the two years suggesting 262
that metabolite profiling is highly sensitive to the environmental conditions. Previous 263
studies have reported developmental stage × genotype or environmental × genotype 264
interactions for metabolite profiles, supporting the present results. For example, in 265
tomato, metabolite profiling of 26 compounds revealed significant genotype by 266
11
ripening stage interactions (Pascual-Banuls et al., 2013) whereas in durum wheat 267
significant variations in metabolites were attributed to genotype by environment 268
interactions (G×E, see Beleggia et al., 2013). 269
Investigation of correlations between the metabolites revealed significant 270
relationships between traits (see Figure 1). For example, a first cluster composed of 271
sugar-related traits (i.e. sucrose, fructose) as well as ascorbate and dehydroascorbate 272
levels were positively correlated. A second cluster of positively correlated metabolites 273
composed of eight proteogenic amino acids could be distinguished. The traits of these 274
two clusters were related and had significant and negative correlations. These 275
relationships have already been shown notably between several amino acids and 276
sugar-related traits (i.e. fructose and glucose, Pascual-Banuls et al., 2013). 277
Within the set of stable traits, ANOVA revealed significant differences between the 278
three groups of accessions (S.L, S.C and S.P) for 25 (25/37-67.5%) of these traits (see 279
Figure 2 for a boxplot representation). For example, the most significant differences 280
(p<1×10-09) were observed for ascorbate and dehydroascorbate levels or SSC with 281
higher levels in S.P compared to S.C and S.L. This was previously observed through 282
the detection of QTLs related to ascorbate levels (Stevens et al., 2007) and related to 283
SSC (Prudent et al., 2009), as well as through GWA for SSC (see Xu et al., 2013). 284
285
EXPLOITATION OF THE PATTERNS OF LINKAGE DISEQUILIBRIUM 286
The clear population structure allowed us to estimate the patterns of intra-287
chromosomal linkage disequilibrium in the three groups of accessions (S.L, S.C and 288
S.P) using a total of 5995 genome-wide markers. Our analysis of LD revealed 289
considerable variation across the tomato genome in the populations investigated. The 290
same observation was made in two previous studies that investigated LD patterns in 291
tomato on a genome-wide scale. Similar average r2 estimates (r2=0.464) were 292
obtained in fresh market tomato populations reflecting (1) the effects of selection on 293
genome variation and (2) the breeding history of tomato towards market 294
specialization (Robbins et al., 2011). However a second study highlighted biased 295
decay of LD between euchromatic and heterochromatic regions (Shirasawa et al., 296
2013). These previous studies support the high level of LD identified in the present 297
study. The different levels of LD may be interpreted as a direct effect of the 298
domestication that tomato (especially in SL) has undergone during its history through 299
bottlenecks and selective breeding that has led to a reduction in nucleotide diversity 300
12
and an extended LD following the elimination of recombinant lineages (Hamblin et 301
al., 2011). 302
In cultivated tomato, LD decays over large genomic regions (i.e. several hundreds of 303
Kb up to several Mb) and is advantageous for an association mapping approach, as it 304
requires fewer markers to cover the entire genome. On the other hand, the difficulty in 305
identifying the underlying causal polymorphism responsible for the phenotypic 306
variation represents the main drawback of these large blocks of LD. Identifying the 307
causal polymorphism from GWA signals remains challenging, especially in species 308
where dense genome coverage is still not achieved. To overcome these limitations, the 309
multiple locus mixed model (MLMM) proposed by Segura et al. (2012) handles the 310
confounding effect of background loci due to LD at the GWA scan step. This 311
approach out-performed the existing MLM notably by reducing the number of 312
significantly associated SNP rather than the number of peaks. This reduced the 313
number of candidate loci it was necessary to screen in order to identify the causal 314
polymorphism. 315
GENOME-WIDE ASSOCIATION FOR METABOLIC TRAITS 316
The GWA scan revealed a total of 44 loci (or peak SNP) associated with the variation 317
of 19 traits. These 44 loci accounted for various levels of estimated trait heritability 318
(from 0.168 to 0.773), missing heritability (from 0.007 to 0.458) and percentage of 319
trait variation (from 16.2% to 74.3%). These results suggest that different traits have 320
different genetic architectures: in some cases, a few genes may explain a large 321
proportion of the phenotypic variation (i.e. 2 loci explain 74.3% of the variation in 322
fruit dehydroascorbate) while numerous genes may only explain a fraction of the 323
phenotypic variation (i.e. 5 loci explain 33.2% of the ascorbate level variation). These 324
results are supported by similar observations in rice where various genetic 325
architectures were revealed using a GWAS approach for different traits of 326
agronomical interest (Zhao et al., 2011). However, it should be noted that in most 327
GWAS, significantly associated loci might contribute to a larger proportion of 328
phenotypic variation as many other small-to-medium-effect loci were not detected due 329
to the stringent threshold used in GWAS (False Discovery Rate - FDR) and the lack 330
of statistical significance for the control of false negatives caused by small effect sizes 331
(Visscher et al., 2012). Furthermore, the estimates of the missing heritability suggest 332
that for some traits most (or nearly all) loci underlying the variation in these traits 333
have been identified through the genome scan we conducted. For example, for 334
13
threonine, the missing heritability has been estimated to be 0.007, which means that 335
(1) all the loci responsible for the variation of this trait may have been identified and 336
(2) the genetic architecture of this trait may rely on a small number of genes (only one 337
associated locus in the present study). Conversely, for nicotinate, the genetic 338
architecture of the trait requires further investigation since the missing heritability has 339
been estimated at 0.458, which means that a large number of small effect loci or a 340
limited number of large effect loci remain to be identified. Taken together, these 341
observations suggest that the investigated genetic architecture is usually more 342
complex than it appears. 343
A total of 35 of the 44 associated loci (79.5%) were associated with the metabolic 344
traits (9 loci are associated with SSC) and accounted for between 16.2% (aspartate) 345
and 50.4% (rhamnose) of the percentage of variation of these traits. In a previous 346
investigation of QTL related to metabolic traits using a lower number of lines (N=76), 347
Schauer et al. (2008) detected 104 mQTL for 22 distinct amino acids in tomato. Our 348
results obtained using a GWAS approach contrast with these results in terms of the 349
number of QTLs. However, this difference may reflect the methodological principles 350
underlying both approaches. The more stringent threshold used in GWAS (i.e. FDR) 351
and the confounding effect of population structure may explain this difference. This 352
has been observed in the study of glucosinolate metabolites in Arabidopsis (Chan et 353
al., 2010) and the study of leaf metabolic profiles in maize (Riedelsheimer et al., 354
2012). In the latter study, when comparing a linkage mapping experiment to a GWA 355
scan, increased genetic variation was reported suggesting that the genetic variability is 356
greater in the GWAS as it relies on a larger genetic pool (several up to hundreds of 357
individuals) whereas a linkage experiment relies on a much narrower genetic pool (i.e. 358
a couple of parental lines) notably (Riedelsheimer et al., 2012). 359
The 44 associated loci are spread over the tomato genome as every chromosome 360
carries at least one association (chromosomes 1 and 12) with up to 10 on chromosome 361
2. In tomato, chromosome 2 was suggested to be interesting as it carries a lot of QTLs 362
for traits of interest such as fresh weight (fw2.2, Frary et al., 2000), fruit morphology 363
(Causse et al., 2002) or locule number (lcn 2.1, Muños et al., 2011) to name but a few 364
(see Ranc et al., 2012 and Xu et al., 2013 for more information) supporting the 365
identification of numerous associations on this chromosome in the present study. 366
In several cases, peak SNPs of different traits co-localized or the same peak SNP was 367
associated with different traits. For example, peak SNPs for sucrose and erythritol 368
14
levels (solcap_snp_sl_12459 and solcap_snp_sl_13558) co-localized in a region of 369
68kb on chromosome 2. In a similar way, peak SNPs associated with asparagine, 370
threonine and nicotinate levels were localized within an interval of 508.1 kbp on 371
chromosome 2 (solcap_snp_sl_32389 and solcap_snp_sl_29349 respectively). Such 372
co-localization of peak SNPs has been observed in other GWAS experiments in 373
tomato (Xu et al., 2013), Arabidopsis (Bergelson & Roux, 2010) or rice (Zhao et al., 374
2011) suggesting the presence of genes with pleiotropic effects or closely linked 375
genes. 376
On chromosome 2, both asparagine and threonine level traits are associated with the 377
same peak SNP (solcap_snp_sl_32389 – annotated as a Copine-like protein). These 378
two α-amino acids belong to the class of polar uncharged side chain amino acids and 379
are indirectly linked to the Krebs cycle as their biosynthesis relies on oxaloacetate, 380
which is, as an acceptor compound of this cycle, one of its major metabolic 381
intermediates. The identification of an association between asparagine and threonine 382
levels at the same peak SNP means that this locus is located in the close vicinity of 383
one or several crucial and pleiotropic effect gene(s) directly involved in the metabolic 384
pathway of both asparagine and threonine synthesis. This observation suggests that 385
the genomic region around the peak SNP solcap_snp_sl_32389 has to be investigated 386
further to seek for causal polymorphism and candidate genes underlying the genetic 387
architecture of the asparagine and threonine level traits. 388
Malate and citrate levels were associated with one peak SNP located on chromosome 389
6 (solcap_snp_sl_19899) and two peak SNPs located on chromosomes 2 and 6 390
(solcap_snp_sl_6196 and solcap_snp_sl_19899), respectively (see Figure 4 for 391
detailed Manhattan plots and LD patterns). Interestingly, malate and citrate were 392
associated with the same peak SNP (solcap_snp_sl_19899), located on chromosome 6 393
and annotated as a conserved gene of unknown function. This observation suggests 394
that either in this genomic region, the LD block that this peak SNP belongs to is 395
particularly extended or alternatively that this peak SNP was identified close to a gene 396
involved in the citrate and the malate metabolic pathways. The figure 4b representing 397
the local pattern of LD around the peak SNP suggests that the LD level is relatively 398
low in this genomic region. From a functional point of view, citrate and malate are 399
two organic compounds found in most ripe fruits (Etienne et al., 2013) and have been 400
demonstrated to be highly correlated with many important regulators of ripening in 401
studies that have investigated early fruit development (Mounet et al., 2009; Centeno et 402
15
al., 2011). This result suggests that in the present study we were able to identify a 403
peak SNP that is located near to one or several putative candidate gene(s) playing a 404
crucial role in the citrate and the malate metabolic pathways. Moreover, for the malate 405
level trait, the phenotype of heterozygous individuals is intermediate to that of 406
homozygotes (see Figure 3) suggesting an incomplete dominance effect. 407
SEARCHING FOR CANDIDATE GENES 408
In the present study, we conducted GWA using a MLMM to identify more precisely 409
putative candidate genes involved in the genetic architecture of fruit metabolic traits, 410
by taking advantage notably of the LD pattern. For numerous peak SNPs, their 411
functional annotation is not directly linked to the trait they are associated with. 412
However, for some of these peak SNPs, they directly target a previously characterized 413
candidate gene or are located in the close vicinity of putative candidate genes. For 414
example, the peak SNP associated with fruit SSC (Solcap_snp_sl_26678) belongs to a 415
previously validated candidate gene (Solyc09g010080.2, lin5) which encodes a cell 416
wall invertase, and is a locus for a QTL that positively affects tomato fruit sugar 417
content, and hence the important ‘soluble solids’ commercial trait (Schauer et al., 418
2006). However, more putative candidate genes have been identified in the present 419
study. For example, the peak SNP (Solcap_snp_sl_26678) located on chromosome 9 420
(position: 2,411,368 bp) and associated with fruit ascorbate levels is located near to 421
(423 kbp) a monodehydroascorbate reductase (NADH)-like protein (MDHAR, 422
Solyc09g009390.2, position: 2,835,367 bp) that has been previously identified using a 423
QTL fine mapping approach (Stevens et al., 2008). Similarly, the peak SNPs 424
associated with nicotinate, malate and sucrose levels (Solcap_snp_sl_29349, 425
Solcap_snp_sl_19899 and Solcap_snp_sl_17956, respectively) are also located near 426
to (680 kbp, 7.9 kbp and 68 kbp, respectively) putative candidate genes that play a 427
role in the genetic architecture of the variation of these traits. Indeed, these three 428
putative candidate genes are described as a nicotinate phosphoribosyl transferase 429
protein (Solyc02g093290.2, position: 48,771,224 bp), an aluminum-activated malate 430
transporter-like (Solyc06g072910.2, position: 41,337,629 bp) and a sugar transporter 431
(“galactosylgalactosylxylosyl protein 3-beta-glucuronosyltransferase”, 432
Solyc04g076920.2, position: 59,461,803 bp), respectively. Thus these results open the 433
door for subsequent analyses based on either fine localization of the putative 434
candidate gene using a targeted re-sequencing approach combined with GWAS to 435
identify and confirm the causal polymorphism or functional validation for example by 436
16
transgenic approaches to investigate the biological role of the putative candidate gene 437
(e.g. fine mapping of the fw 3.2 locus, Chakrabarti et al., 2013). 438
CONCLUSION AND PERSPECTIVES 439
These results show that high resolution GWA, by using a multi-locus mixed linear 440
model, has been successful in tomato in deciphering the genetic architecture of fruit 441
composition traits. This led to the identification of promising candidate loci that 442
underlie the genetic architecture of traits such as fruit malate and citrate levels, 443
opening the door to further validation and functional investigation of this locus. The 444
next analytical step relies on the integration of the recent methodological 445
developments such as data imputation (Marchini & Howie, 2010; Howie et al., 2012; 446
Porcu et al., 2013) and haplotype based models (Powell et al., 2012) and should 447
facilitate the identification of novel loci with a higher degree of accuracy. 448
MATERIAL AND METHODS 449
PLANT MATERIAL 450
The tomato diversity panel consisted of 163 accessions composed of 28 S. 451
lycopersicum (S.L), 119 S. lycopersicum cv. cerasiforme (S.C) and 16 S. 452
pimpinellifolium (S.P) derived from the previously published core collection 453
described in Xu et al. (2013). Cherry type tomato (S.C) is an admixture between 454
tomato (S.L) and its closest wild relative (S.P) possibly resulting from the frequent 455
hybridizations between them (Nesbitt & Tanksley, 2002; Ranc et al., 2008). In GWA 456
experiments, the power to detect genetic effects is linked to minor allele frequency 457
(MAF) at genotyped loci. The mixing of different groups, populations or subspecies 458
within a panel, will enhance the efficiency of the approach by capturing rare and 459
common variants. Using a diversity panel composed of several sub-populations or 460
species is a common practice in GWA experiments. For example, in O. sativa, Zhao 461
et al. (2011) used a worldwide diversity panel composed of 5 different species to 462
unravel the complex genetics underlying the natural variation of 34 traits in Oryza 463
sativa. In the present study, we mixed accessions of S.L, S.C and S.P in order to (1) 464
cover the broader range of phenotypic and genetic diversity, as we did not expected a 465
uniform distribution of SNP MAF within each group and to (2) overcome the limited 466
statistical power of GWA due to the skewed distribution of SNP MAF within the 467
panel, especially in terms of the detection of false positive associations (Tabangin et 468
al., 2009). 469
17
Plants (four replicates) were grown in the tunnel in Avignon (France) during summer 470
2007 and 2008 (growth conditions are also described in Xu et al., 2013). Fruits were 471
harvested at the ripe stage. Pericarp tissue from 5 fruits per accession was collected to 472
be frozen in liquid nitrogen and stored at -80°C for metabolomic profiling. DNA was 473
isolated from 100 mg frozen leaves using to the DNeasy Plant Mini Kit (Qiagen, 474
Valencia, California, USA) for the subsequent genotyping assay. Leaf samples 475
corresponded to fully expanded but non-senescent leaves. DNA was quantified using 476
the Quant-iT™ PicoGreen® dsDNA Assay Kit (Invitrogen) according to the 477
manufacturer’s protocol. 478
479
PHENOTYPIC VARIATION 480
METABOLITE PROFILING 481
Tomato pericarp tissue was homogenized, and the exact amount used for metabolite 482
extraction was defined. Three biological replicates were analysed for each accession. 483
Metabolite extraction, derivatization, gas chromatography-mass spectrometry analysis 484
(GC-MS), and data processing were performed as described in Schauer et al. (2006). 485
Metabolites were identified in comparison with database entries of authentic standards 486
(Kopka et al., 2005; Schauer et al., 2005). A total of 76 metabolites were measured in 487
the present experiment (see Supplementary Table S1, for the complete list) including 488
amino acids, sugars and sugar alcohols, and organic acids. 489
490
ASCORBIC ACID LEVEL 491
Ascorbate and dehydroascorbate were measured separately from the metabolite 492
profiling using a microplate assay as described in Stevens et al. (2006) on the frozen 493
pericarp material stored at -80°C. Extractions and assays were carried out in ice-cold 494
6% TCA in triplicate. The assay used was a spectrophotometric assay based on the 495
detection of dipyridyl-Fe2+ complexes following the reduction of Fe3+ to Fe2+ by the 496
reduced form of ascorbate present in the samples and comparison with standards of 497
known concentrations. Total ascorbate (reduced ascorbate + dehydroascorbate) was 498
measured by mixing the sample with 5mM DTT, to reduce dehydroascorbate, prior to 499
the assay. Dehydroascorbate concentrations were therefore calculated to be the 500
difference between the samples with and without DTT added. 501
502
SOLUBLE SOLID CONTENT AND SUGARS 503
18
Concentration of fructose, glucose and sucrose were determined within the 163 504
accessions for the two years of sampling using the micro-method developed by 505
Gomez et al. (2007). This method is precise, linear and accurate when compared to 506
HPLC methods. Measurement of soluble solid content (SSC) in °Brix was performed 507
as described in Xu et al. (2013) on fruit frozen powder derived from blending fruits 508
with liquid N2. SSC values primarily represent estimates of sugar content in fruits and 509
vegetables. 510
DATA NORMALISATION AND STATISTICAL ANALYSES 511
All descriptive statistics and analyses were performed using [R] 2.15.1 except if 512
otherwise specified. A non-parametric Kendall (W) test was used to assess agreement 513
among the biological replicates and to remove any outlier measurements. Thus, for 514
each year, biological replicates were averaged and the normal distribution of the data 515
was tested using a Shapiro–Wilk test. The normality test revealed that 29 of the 76 516
phenotypes (38.1%) were not normally distributed and were log10 transformed. For 517
each phenotype a linear regression revealed correlation between the two years. Only 518
highly correlated phenotypes (correlation coefficient R2 > 0.6) between the two years 519
of sampling were averaged and used in the GWA study. The pairwise correlation 520
between phenotypes was evaluated using a Spearman test (p<0.05). The graphical 521
representation of the pairwise correlation between each phenotype was produced 522
using the R package ‘Corrplot’ and the ‘hclust’ clustering method (Friendly, 2002). 523
Finally, for each phenotype, an analysis of variance (ANOVA) tested significant 524
differences between the groups of accessions (S.L / S.C / S.P). Then, a post-ANOVA 525
Tukey HSD (Honest Significant Difference) test created a set of confidence intervals 526
on the differences between the means of each trait, for which a significant association 527
was detected, to test for significant differences between the pairwise means among the 528
3 groups of the panel (i.e. significant difference for the mean level of malate between 529
S.L and S.P). Significance was declared for p values < 0.05. Finally, the individual 530
missing phenotype data that ranged from 0.033% to 12.7% (median of 0.32%), were 531
replaced by the mean value of the trait computed for all the accessions of the panel, as 532
required by the MLMM model. 533
534
GENOTYPING ARRAY AND SNP SELECTION 535
The SNP genotyping was performed using the Infinium assay (Illumina Inc.), 536
developed by the Solanaceae Coordinated Agricultural Project (SolCAP- Hamilton et 537
19
al., 2012; Sim et al., 2012) to genotype the collection of tomato accessions (according 538
to the manufacturer’s standard protocol). The probe sequences and SNP information 539
are available from SolCAP (http://solcap.msu.edu). The SNP calling rate threshold 540
per locus was set at 90%. A minor allele frequency (MAF) ranging from 0.037 to 0.45 541
was used to filter the raw genotype data set. The minimal MAF was set according to 542
the formula: (# of chromosomes/(2*#individuals)) as proposed by Aulchenko (2007). 543
The minimal success rate of genotyping per accession was fixed to 90%. All SNPs 544
and accessions that did not respect these criteria were removed using the --maf option 545
implemented in PLINK! (Purcell et al., 2007). 546
547
ESTIMATION OF POPULATION DIFFERENTIATION AND STRUCTURE 548
Initially the fixation index (FST, Weir & Cockerham, 1984) estimation was performed 549
between the three groups of tomato accessions to get an overview of the population 550
structure. Then, the STRUCTURE software 2.3.3 (Pritchard et al., 2000; Falush et al., 551
2003) was used to infer the number of ancestral populations based on the filtered SNP 552
data set (see results for the number of sites used) and thus to assign the 163 553
individuals to populations (Q matrix). The most likely number of clusters K in all 554
simulations was assumed to be in the range of K=1 to K=10. Ten replicates were 555
conducted for each K with a burn-in period of 1×106, followed by 5×106 MCMC steps 556
using the Bioportal computing resource (https://www. bioportal.uio.no/; Kumar et al., 557
2009). These parameters met the requirements for the use of the STRUCTURE software 558
proposed in Gilbert et al. (2012) to insure the reproducibility of the results of the 559
present study. The ad hoc statistic ΔK was used to determine the most probable K 560
(Evanno et al., 2005). The ancestry estimation using ADMIXTURE software (Alexander 561
et al., 2009), based on the maximum likelihood estimation of individual ancestries 562
from multi-locus SNP genotype datasets, was used to support the identification of the 563
ancestral populations performed with the STRUCTURE software. 564
565
KINSHIP AND LINKAGE DISEQUILIBRIUM ESTIMATION 566
SPAGeDi software (Hardy & Vekemans, 2002) was used to estimate the Ritland 567
(1995) matrix of pairwise kinship coefficient (K matrix) from the filtered SNP data 568
set (see results section) using a 10,000 bootstrap resampling procedure. Then, the 569
intra-chromosomal linkage disequilibrium (LD) between all pairs of sites was 570
estimated using an unbiased (as individuals are not independent) estimation (named 571
20
rs2), that uses the population structure matrix, and consisting of information about the 572
origins of each individual and the admixture proportions of each individual genome. 573
The method is implemented in the [R] package called LDCORSV (Mangin et al., 574
2011). Finally, the snp.plotter [R] package gave a graphical representation of the 575
pairwise LD estimates at a local scale 576
(http://cbdb.nimh.nih.gov/~kristin/snp.plotter.html). 577
578
GWA MAPPING 579
GWA analyses were performed with correction for population structure (Q) and 580
modeling phenotypic covariance with the Kinship (K) matrix. The QQplots were used 581
to determine the most appropriate correcting method for each analyzed phenotype. 582
Thus, these matrices were implemented into a modified version of the multi-locus 583
mixed model (MLMM) described in Segura et al. (2012) that takes into account the 584
population structure as a cofactor (see the mlmm_cof.r [R] script at 585
https://cynin.gmi.oeaw.ac.at/home/resources/mlmm). Briefly, the MLMM model is 586
based on the EMMA library (Kang et al., 2008). The approach relies on a simple, 587
stepwise mixed-model regression with forward inclusion and backward elimination, 588
while re-estimating the variance components of the model at each step. This method 589
increases the detection power and reduces the false discovery rate (FDR) when 590
compared to traditional single-locus approaches. Two model selection criteria are 591
implemented in MLMM for multi-testing correction: the extended Bayesian 592
information criterion (EBIC - Chen & Chen, 2008) and the multiple-Bonferroni 593
criterion (mBonf), defined as the largest model in which all cofactors have a p-value 594
below a Bonferroni-corrected threshold (we used a threshold of 0.05 – see Segura et 595
al. 2012 for details). From the optimal model obtained with MLMM (according to both 596
EBIC and mBonf criteria) the percentage of variation explained were obtained for 597
each phenotype. Moreover, for each trait, the phenotypic heritability was obtained at 598
the step 0 in the MLMM model, when no marker is included in the model, whereas the 599
missing heritability (percentage of the variance not explained by the markers) was 600
obtained at the optimal step of the MLMM model. Briefly, the MLMM model 601
partitions the phenotypic variance into genetic, random and explained variance at each 602
step, suggesting a natural stopping criterion (genetic variance of 0) for including 603
cofactors. This estimates of the explained and unexplained heritable variance of the 604
analysed trait. 605
21
DATA AVAILABILITY 606
To ensure their accessibility, the genotyping (5995 SNP) and phenotyping data 607
(N=36), as well as the structure and kinship matrices for the 163 accessions were 608
deposited on the GNPis repository hosted at https://urgi.versailles.inra.fr/association 609
(Steinbach et al., 2013). The complete phenotype dataset is also available in 610
Supplementary Table S1. 611
ACKNOWLEDGEMENTS 612
We gratefully acknowledge critical input from the associate editor, Jocelyn Rose, and 613
the two anonymous reviewers. The authors are grateful to Hélène Burck and Yolande 614
Carretero for their invaluable help, characterizing and maintaining the INRA tomato 615
Genetic Resources collection, to the Experimental Installation Unit of the INRA-616
UR1052 research unit for maintaining the facilities. Authors are also grateful to 617
Sophie Rolland, Nicolas Ranc, Stéphane Munos for their early involvement in the 618
experiment, as well as Gisèle Riqueau and Renaud Duboscq for DNA and tissue 619
sampling and ascorbate phenotyping. Finally the authors would like to thanks Martin 620
Ganal (TraitGenetics GmbH, Gatersleben, Germany) for his help with SNP array 621
genotyping. 622
623
624
AUTHOR CONTRIBUTIONS 625
Conceived and designed the experiments: AF, PTD, MC. Performed the experiments: 626
CS, PTD, AF. Analyzed the data: CS, VS, GB, RS, ZN. Wrote the paper: CS, VS, RS. 627
628
22
LITERATURE CITED 629
Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of 630
ancestry in unrelated individuals. Gen. Res. 19: 1655-1664. 631
Astle W, Balding DJ (2009) Population structure and cryptic relatedness in genetic 632
association studies. Stat. Sci. 24(4): 451-471. 633
Aulchenko YS, Ripke S, Isaacs A, van Duijn CM. (2007) GenABEL: an R library 634
for genome-wide association analysis. Bioinformatics 23(10): 1294-1296. 635
Bauchet G, Causse M (2012) Genetic diversity in tomato (Solanum lycopersicum) 636
and its wild relatives. In: Mahmut C ed. Genetic Diversity in Plants: ISBN: 637
978-953-51-0185-7, InTech, DOI: 10.5772/33073. 638
Beleggia R, Platani C, Nigro F, De Vita P, Cattivelli L, Papa R (2013) Effect of 639
genotype, environment and genotype-by-environment interaction on 640
metabolite profiling in durum wheat (Triticum durum Desf.) grain. J Cereal 641
Sci 57(2): 183-192. 642
Bergelson J, Roux F (2010) Towards identifying genes underlying ecologically 643
relevant traits in Arabidopsis thaliana. Nat Rev Genet 11(12): 867-879. 644
Blanca J, Canizares J, Cordero L, Pascual L, Diez MJ, Nuez F (2012) Variation 645
revealed by SNP genotyping and morphology provides insight into the origin 646
of the tomato. PLoS ONE 7(10): e48198. 647
Brachi B, Morris G, Borevitz J (2011) Genome-wide association studies in plants: 648
the missing heritability is in the field. Genome Biol 12(10): 232. 649
Causse M, Saliba-Colombani V, Lecomte L, Duffe P, Rousselle P, Buret M 650
(2002) QTL analysis of fruit quality in fresh market tomato: a few 651
chromosome regions control the variation of sensory and instrumental traits. J. 652
Exp. Bot. 53(377): 2089-2098. 653
Centeno DC, Osorio S, Nunes-Nesi A, Bertolo AL, Carneiro RT, Araujo WL, 654
Steinhauser MC, Michalska J, Rohrmann J, Geigenberger P, Oliver SN, 655
Stitt M, Carrari F, Rose JK, Fernie AR (2011) Malate plays a crucial role in 656
starch metabolism, ripening, and soluble solid content of tomato fruit and 657
affects postharvest softening. Plant Cell 23(1): 162-184. 658
Chakrabarti M, Zhang N, Sauvage C, Munos S, Blanca J, Canizares J, Diez MJ, 659
Schneider R, Mazourek M, McClead J, Causse M, van der Knaap E 660
(2013) A cytochrome P450 regulates a domestication trait in cultivated 661
tomato. Proc Natl Acad Sci U S A 110(42): 17125-17130. 662
Chan EKF, Rowe HC, Kliebenstein DJ (2010) Understanding the evolution of 663
defense metabolites in Arabidopsis thaliana using genome-wide association 664
mapping. Genetics 185(3): 991-1007. 665
Chen J, Chen Z (2008) Extended Bayesian information criteria for model selection 666
with large model spaces. Biometrika 95(3): 759-771. 667
Etienne A, Génard M, Lobit P, Mbeguié-A-Mbeguie D, Bugaud C (2013) What 668
controls fleshy fruit acidity? A review of malate and citrate accumulation in 669
fruit cells. J Exp Bot 64(6): 1451-1469. 670
Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of 671
individuals using the software structure: a simulation study. Mol. Ecol. 14(8): 672
2611-2620. 673
Falush D, Stephens M, Pritchard JK (2003) Inference of population structure using 674
multilocus genotype data: linked loci and correlated allele frequencies. 675
Genetics 164(4): 1567-1587. 676
23
Frary A, Nesbitt TC, Grandillo S, Knaap E, Cong B, Liu J, Meller J, Elber R, 677
Alpert KB, Tanksley SD (2000) fw2.2: a quantitative trait locus key to the 678
evolution of tomato fruit size. Science 289 (5476): 85-88. 679
Fridman E, Carrari F, Liu YS, Fernie AR, Zamir D (2004) Zooming in on a 680
quantitative trait for tomato yield using interspecific introgressions. Science 681
305 (5691): 1786-1789. 682
Friendly M (2002) Corrgrams: Exploratory displays for correlation matrices. Am Stat 683
56: 316-324. 684
Fulton TM, Bucheli P, Voirol E, Lopez J, Petiard V, Tanksley SD (2002) 685
Quantitative trait loci (QTL) affecting sugars, organic acids and other 686
biochemical properties possibly contributing to flavor, identified in four 687
advanced backcross populations of tomato. Euphytica 127(2): 163-177. 688
Gilbert KJ, Andrew RL, Bock DG, Franklin MT, Kane NC, Moore J-S, Moyers 689
BT, Renaut S, Rennison DJ, Veen T, Vines TH (2012) Recommendations 690
for utilizing and reporting population genetic analyses: the reproducibility of 691
genetic clustering using the program structure. Mol Ecol 21(20): 4925-4930. 692
Giovannoni J (2001) Molecular biology of fruit maturation and ripening. Annu Rev 693
Plant Physiol Plant Mol Biol 52: 725-749. 694
Gomez L, Bancel D, Rubio E, Vercambre G (2007) The microplate reader: an 695
efficient tool for the separate enzymatic analysis of sugars in plant tissues-696
validation of a micro-method. J Sci Food Agric 87(10): 1893-1905. 697
Hamblin MT, Buckler ES, Jannink JL (2011) Population genetics of genomics-698
based crop improvement methods. Trends Genet 27(3): 98-106. 699
Hamilton JP, Sim S-C, Stoffel K, Van Deynze A, Buell CR, Francis DM (2012) 700
Single nucleotide polymorphism discovery in cultivated tomato via 701
sequencing by synthesis. Plant Gen. 5(1): 17-29. 702
Hardy OJ, Vekemans X (2002) SPAGeDi: a versatile computer program to analyse 703
spatial genetic structure at the individual or population levels. Mol Ecol Notes 704
2: 612-620. 705
Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR (2012) Fast and 706
accurate genotype imputation in genome-wide association studies through pre-707
phasing. Nat Genet 44(8): 955-959. 708
Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, Li C, Zhu C, Lu T, Zhang Z, 709
Li M, Fan D, Guo Y, Wang A, Wang L, Deng L, Li W, Lu Y, Weng Q, 710
Liu K, Huang T, Zhou T, Jing Y, Lin Z, Buckler ES, Qian Q, Zhang QF, 711
Li J, Han B (2010) Genome-wide association studies of 14 agronomic traits 712
in rice landraces. Nat Genet 42(11): 961-967. 713
Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, Daly MJ, Eskin E 714
(2008) Efficient control of population structure in model organism association 715
mapping. Genetics 178(3): 1709-1723. 716
Kopka J, Schauer N, Krueger S, Birkemeyer C, Usadel B, Bergmuller E, 717
Dormann P, Weckwerth W, Gibon Y, Stitt M, Willmitzer L, Fernie AR, 718
Steinhauser D (2005) [email protected]: the Golm Metabolome Database. 719
Bioinformatics 21(8): 1635-1638. 720
Korte A, Farlow A (2013) The advantages and limitations of trait analysis with 721
GWAS: a review. Plant Methods 9(1): 29. 722
Kumar S, Skjaeveland A, Orr R, Enger P, Ruden T, Mevik B-H, Burki F, 723
Botnen A, Shalchian-Tabrizi K (2009) AIR: A batch-oriented web program 724
package for construction of supermatrices ready for phylogenomic analyses. 725
BMC Bioinformatics 10(1): 357. 726
24
Listgarten J, Lippert C, Kadie CM, Davidson RI, Eskin E, Heckerman D (2012) 727
Improved linear mixed models for genome-wide association studies. Nat 728
Methods 9(6): 525-526. 729
Loiselle BA, Sork VL, Nason J, Graham C (1995) Spatial genetic structure of a 730
tropical understory shrub, Psychotria officinalis (Rubiaceae). Am J Bot 82: 731
1420-1425. 732
Lu Y, Zhang S, Shah T, Xie C, Hao Z, Li X, Farkhari M, Ribaut J-M, Cao M, 733
Rong T, Xu Y (2010) Joint linkage-linkage disequilibrium mapping is a 734
powerful approach to detecting quantitative trait loci underlying drought 735
tolerance in maize. Proc Natl Acad Sci USA 107(45): 19585-19590. 736
Mandel JR, Nambeesan S, Bowers JE, Marek LF, Ebert D, Rieseberg LH, 737
Knapp SJ, Burke JM (2013) Association mapping and the genomic 738
consequences of selection in sunflower. PLoS Genet 9(3): e1003378. 739
Mangin B, Siberchicot A, Nicolas S, Doligez A, This P, Cierco-Ayrolles C (2011) 740
Novel measures of linkage disequilibrium that correct the bias due to 741
population structure and relatedness. Heredity 108: 285-291. 742
Marchini J, Howie B (2010) Genotype imputation for genome-wide association 743
studies. Nat Rev Genet 11(7): 499-511. 744
Mauricio R (2001) Mapping quantitative trait loci in plants: uses and caveats for 745
evolutionary biology. Nat Rev Genet 2(5): 370-381. 746
Mounet F, Moing A, Garcia V, Petit J, Maucourt M, Deborde C, Bernillon S, Le 747
Gall G, Colquhoun I, Defernez M, Giraudel JL, Rolin D, Rothan C, 748
Lemaire-Chamley M (2009) Gene and metabolite regulatory network 749
analysis of early developing fruit tissues highlights new candidate genes for 750
the control of tomato fruit composition and development. Plant Physiol 751
149(3): 1505-1528. 752
Muños S, Ranc N, Botton E, Bérard A, Rolland S, Duffé P, Carretero Y, Le 753
Paslier M-C, Delalande C, Bouzayen M, Brunel D, Causse M (2011) 754
Increases in tomato fruit size and locule number is controlled by two key SNP 755
located near Wuschel. Plant Physiol 156(4). 756
Nesbitt TC, Tanksley SD (2002) Comparative sequencing in the genus 757
Lycopersicon: implications for the evolution of fruit size in the domestication 758
of cultivated tomatoes. Genetics 162(1): 365 - 379. 759
Porcu E, Sanna S, Fuchsberger C, Fritsche LG (2013) Genotype imputation in 760
genome-wide association studies. Curr Protoc Hum Genet Chapter 1: Unit 1 761
25. 762
Powell JE, Kranis A, Floyd J, Dekkers JCM, Knott S, Haley CS (2012) Optimal 763
use of regression models in genome-wide association studies. Anim Genet 764
43(2): 133-143. 765
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure 766
using multilocus genotype data. Genetics 155(2): 945-959. 767
Prudent M, Causse M, Génard M, Tripodi P, Grandillo S, Bertin N (2009) 768
Genetic and physiological analysis of tomato fruit weight and composition: 769
influence of carbon availability on QTL detection. J Exp Bot 60(3): 923-937. 770
Purcell S, Neale BD, Todd-Brown K, Thomas L, Ferreira ME, Bender D, Maller 771
J, Sklar P, de Bakker PIW, Daly MJ, Sham PC (2007) Plink: a toolset for 772
whole-genome association and population-based linkage analysis. Am J Hum 773
Gen 81 (3):559-75. 774
25
Ranc N, Munos S, Santoni S, Causse M (2008) A clarified position for Solanum 775
lycopersicum var. cerasiforme in the evolutionary history of tomatoes 776
(Solanaceae). BMC Plant Biol 8: 130. 777
Ranc N, Munos S, Xu J, Le Paslier M-C, Chauveau A, Bounon R, Rolland S, 778
Bouchet JP, Brunel D, Causse M (2012) Genome-wide association mapping 779
in tomato (Solanum lycopersicum) is possible using genome admixture of 780
Solanum lycopersicum var. cerasiforme. G3 2(8): 853-864. 781
Riedelsheimer C, Lisec J, Czedik-Eysenberg A, Sulpice R, Flis A, Grieder C, 782
Altmann T, Stitt M, Willmitzer L, Melchinger AE (2012) Genome-wide 783
association mapping of leaf metabolic profiles for dissecting complex traits in 784
maize. Proc Natl Acad Sci U S A 109(23): 8872-8877. 785
Robbins C, Sim S-C, Yang W, Deynze AE, Knaap E, Joobeur T, Francis D 786
(2011) Mapping and linkage disequilibrium analysis with a genome-wide 787
collection of SNPs that detect polymorphism in cultivated tomato. J. Exp. Bot. 788
62(6): 1831-1845. 789
Schauer N, Semel Y, Balbo I, Steinfath M, Repsilber D, Selbig J, Pleban T, 790
Zamir D, Fernie AR (2008) Mode of inheritance of primary metabolic traits 791
in tomato. Plant Cell 20(3): 509-523. 792
Schauer N, Semel Y, Roessner U, Gur A, Balbo I, Carrari F, Pleban T, Perez-793
Melis A, Bruedigam C, Kopka J, Willmitzer L, Zamir D, Fernie AR 794
(2006) Comprehensive metabolic profiling and phenotyping of interspecific 795
introgression lines for tomato improvement. Nat Biotechnol 24(4): 447-454. 796
Schauer N, Steinhauser D, Strelkov S, Schomburg D, Allison G, Moritz T, 797
Lundgren K, Roessner-Tunali U, Forbes MG, Willmitzer L, Fernie AR, 798
Kopka J (2005) GC-MS libraries for the rapid identification of metabolites in 799
complex biological samples. FEBS Lett 579(6): 1332-1337. 800
Segura V, Vilhjalmsson BJ, Platt A, Korte A, Seren U, Long Q, Nordborg M 801
(2012) An efficient multi-locus mixed-model approach for genome-wide 802
association studies in structured populations. Nat Genet 44(7): 825-830. 803
Shirasawa K, Fukuoka H, Matsunaga H, Kobayashi Y, Kobayashi I, Hirakawa 804
H, Isobe S, Tabata S (2013) Genome-wide association studies using single 805
nucleotide polymorphism markers developed by re-sequencing of the genomes 806
of cultivated tomato. DNA Res 20(6): 593-603. 807
Sillanpää MJ (2010) Overview of techniques to account for confounding due to 808
population stratification and cryptic relatedness in genomic data association 809
analyses. Heredity 106(4): 511-519. 810
Sim S-C, Durstewitz G, Plieske Jr, Wieseke R, Ganal MW, Van Deynze A, 811
Hamilton JP, Buell CR, Causse M, Wijeratne S, Francis DM (2012) 812
Development of a large SNP genotyping array and generation of high-density 813
genetic maps in tomato. PLoS ONE 7(7): e40563. 814
Soto-Cerda BJ, Cloutier S (2012) Association mapping in plant genomes. In: (Ed.) 815
PMC ed. Genetic Diversity in Plants. ISBN: 978-953-51-0185-7, InTech, 816
DOI: 10.5772/33005. Available from: 817
http://www.intechopen.com/books/genetic-diversity-in-plants/association-818
mapping-in-plant-genomes. 819
Steinbach D, Alaux M, Amselem J, Choisne N, Durand S, Flores Rl, Keliet A-O, 820
Kimmel E, Lapalu N, Luyten I, Michotey Cl, Mohellibi N, Pommier C, 821
Reboux Sb, Valdenaire De, Verdelet D, Quesneville H (2013) GnpIS: an 822
information system to integrate genetic and genomic data from plants and 823
fungi. Database 2013. 824
26
Stevens R, Buret M, Duffe P, Garchery C, Baldet P, Rothan C, Causse M (2007) 825
Candidate genes and quantitative trait loci affecting fruit ascorbic acid content 826
in three tomato populations. Plant Physiol 143(4): 1943-1953. 827
Stevens R, Buret M, Garchery C, Carretero Y, Causse M (2006) Technique for 828
rapid, small-scale analysis of vitamin C levels in fruit and application to a 829
tomato mutant collection. J Agric Food Chem 54(17): 6159-6165. 830
Stevens R, Page D, Gouble B, Garchery C, Zamir D, Causse M (2008) Tomato 831
fruit ascorbic acid content is linked with monodehydroascorbate reductase 832
activity and tolerance to chilling stress. Plant Cell Environ 31(8): 1086-1096. 833
Tabangin ME, Woo JG, Martin LJ (2009) The effect of minor allele frequency on 834
the likelihood of obtaining false positives. BMC Proc 3 Suppl 7: S41. 835
The Tomato Genome Consortium (2012) The tomato genome sequence provides 836
insights into fleshy fruit evolution. Nature 485(7400): 635-641. 837
Visscher PM, Brown MA, McCarthy MI, Yang J (2012) Five years of GWAS 838
discovery. Am J Hum Genet 90(1): 7-24. 839
Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of 840
population structure. Evolution 38: 1358-1370. 841
Xu J, Ranc N, Munos S, Rolland S, Bouchet J-P, Desplat N, Paslier M-C, Liang 842
Y, Brunel D, Causse M (2013) Phenotypic diversity and association mapping 843
for fruit quality traits in cultivated tomato and related species. Theor Appl 844
Genet 126(3): 567-581. 845
Zhang N, Brewer M, Knaap E (2012) Fine mapping of fw3.2 controlling fruit 846
weight in tomato. Theor Appl Genet 125(2): 273-284. 847
Zhao K, Tung CW, Eizenga GC, Wright MH, Ali ML, Price AH, Norton GJ, 848
Islam MR, Reynolds A, Mezey J, McClung AM, Bustamante CD, 849
McCouch SR (2011) Genome-wide association mapping reveals a rich 850
genetic architecture of complex traits in Oryza sativa. Nat Commun 2: 467. 851
852
27
FIGURE LEGENDS 853
Figure 1 | Lower matrix displaying the correlations between each analyzed phenotype 854
adjusted for the year effect. The correlation coefficient (Spearman) ranges from -1 855
(red colour) to +1 (blue colour). 856
857
Figure 2 | Boxplot graphical representations of the distribution for the 19 traits that 858
showed a significant association. In all graphs, mean values labeled with different 859
letters are significantly different, whereas those with same letters are not (Tukey’s 860
test, P < 0.05). Abbreviation : S.C - Solanum cerasiforme; S.L - Solanum 861
lycopersicum; S.P - Solanum pimpinellifolium 862
863
Figure 3 | A focus on the malate level results. (A) Correlation for the malate level over 864
the two years of sampling in the collection of 163 accessions; (B) Variation of malate 865
level adjusted for the year effect within the three groups; (C) Distribution of the 866
adjusted malate level in the collection; (D) Manhattan plot for the 12 tomato 867
chromosomes (X-axis) and associated p-values for each marker (Y-axis); (E) QQplots 868
of the observed p-value distribution; (F) Evolution of genetic variance at each step of 869
the MLMM model (Blue: Genetic Variance Explained; Green: Total Genetic Variance; 870
Red: Error) for the optimal model (step=2-BIC criterion); (G and H) Allelic effect for 871
the two associated markers on chromosomes 2 and 6. Abbreviation : S.C - Solanum 872
cerasiforme; S.L - Solanum lycopersicum; S.P - Solanum pimpinellifolium 873
874
Figure 4 | Manhattan plots displaying the –log10(p-values) (Y-axis) over genomic 875
positions (X-axis) in a window of 2.5Mbp up and downstream of the two loci 876
associated with the malate level trait that are located on chromosome 2 (Fig 5a) and 877
chromosome 6 (Fig 5b). Different colour schemes are used to represent the pairwise 878
LD estimates (rs2) for each genomic location. 879
880
28
Table I | Intra-chromosomal linkage disequilibrium (rs2) in each tomato group. This estimate takes into account the effect of population structure 881
(see Mangin et al., 2011). 882
883
884
Mean pairwise rs2 K1 K2 K3 K4 K5 K6 K7 K8 K9 K10 K11 K12 all K
S. lycopersicum 0.5508 0.5988 0.5895 0.5570 0.6029 0.6235 0.5416 0.5397 0.5231 0.5539 0.5938 0.5389 0.5678
S. cerasiforme 0.5391 0.5318 0.5394 0.5191 0.5500 0.5530 0.5320 0.5337 0.5204 0.5315 0.5619 0.5117 0.5353
S. pimpinellifolium 0.3323 0.3239 0.2884 0.3872 0.3557 0.2604 0.3478 0.3338 0.3431 0.3923 0.2917 0.3968 0.3378
885
886
29
Table II | Detailed information for the 44 significant associations detected within the 36 traits analysed using the MLMM model. 887
Phenotype SNP1 Chromosome SNP Position (bp)2 p-value3 UniRef Annotation Locus name
(ITAG 2.3)4
ASA solcap_snp_sl_12749 6 36931366 1.42e-05 Peptide transporter, TGF-beta receptor, type
I/II extracellular region Solyc06g065020.2
ASA solcap_snp_sl_37057 7 63886939 2.94e-10 Conserved gene of unknown function Solyc07g064580.2
ASA solcap_snp_sl_26678 9 2411418 1.09e-07 Repressor of silencing 1 Solyc09g009080.2
ASA solcap_snp_sl_46662 9 61773785 1.07e-05 Gene of unknown function Solyc09g074480.1
ASA solcap_snp_sl_62616 11 3393838 4.66e-08 ATP dependent RNA helicase Solyc11g010310.1
Asparagine solcap_snp_sl_32389 2 48943496 1.93e-07 Copine-like protein Solyc02g093520.2
Aspartate solcap_snp_sl_11456 4 58318210 1.67e-07 BHLH transcription factor Solyc04g074810.2
SSC solcap_snp_sl_26136 2 29851816 7.79e-26 Mannose-6-phosphate isomerase Solyc02g063220.2
SSC CT232_snp229 2 43207682 7.73e-10 UV excision repair protein RAD23 protein Solyc02g085840.2
SSC solcap_snp_sl_63048 3 71026 0.0006 CXE carboxylesterase Solyc03g005100.2
SSC solcap_snp_sl_35206 6 1748271 2.92e-21 Auxin signaling F-box1 family protein Solyc06g007830.1
SSC solcap_snp_sl_53288 7 60078938 1.22e-12 Beta-1,3-galactosyl-O-glycosyl-glycoprotein
beta-1,6-N-acetylglucosaminyltransferase 7 Solyc07g054440.2
SSC solcap_snp_sl_65072 8 59477446 5.57e-08 Agenet domain containing protein Solyc08g078530.2
SSC solcap_snp_sl_39725 9 3477979 1.34e-33 Beta-fructofuranosidase (aka lin5) Solyc09g010080.2
SSC solcap_snp_sl_10594 11 2481288 1.89e-13 Single-stranded nucleic acid binding R3H
domain protein Solyc11g008250.1
SSC solcap_snp_sl_659 12 45751611 2.41e-06 Gene of unknown function NA
Citrate solcap_snp_sl_19899 6 41345468 1.48e-07 Conserved gene of unknown function Solyc06g072930.2
DHA solcap_snp_sl_69445 9 64606433 3.16e-39 Ubiquitin carboxyl-terminal hydrolase family
protein Solyc09g089560.2
30
DHA solcap_snp_sl_21770 11 3063738 8.49e-07 Pentatricopeptide repeat-containing protein SGN-U564017
Erythritol solcap_snp_sl_13558 2 36559326 1.24e-07 Pollen allergen Che a 1 Solyc02g076860.2
Erythritol solcap_snp_sl_60698 10 64445598 5.98e-16 Flavin oxidoreductase/NADH oxidase Solyc10g086220.1
Fructose solcap_snp_sl_16136 5 59787171 9.31e-07 Conserved gene of unknown function Solyc05g050500.1
Fructose solcap_snp_sl_27215 6 38384375 9.05e-07 Katanin p60 ATPase-containing subunit Solyc06g066810.2
Fucose solcap_snp_sl_20802 3 60860146 2.70e-07 UV excision repair protein RAD23 protein Solyc03g117780.2
Fucose solcap_snp_sl_53149 4 53628534 1.63e-06 Structural constituent of ribosome Solyc04g056530.1
GABA solcap_snp_sl_35255 6 1330594 5.53e-08 TatD DNase domain-containing
deoxyribonuclease Solyc06g007310.2
Malate solcap_snp_sl_6196 2 13905175 1.28e-06 Gene of unknown function SGN-U565892
Malate solcap_snp_sl_19899 6 41345468 2.48e-08 Conserved gene of unknown function Solyc06g072930.2
Nicotinate solcap_snp_sl_29349 2 49451582 3.83e-06 Uridylyltransferase PII Solyc02g094300.2
Proline solcap_snp_sl_100675 2 28798838 3.71e-06 Conserved gene of unknown function NA
Proline solcap_snp_sl_32499 6 21807134 3.91e-07 Membrane-associated progesterone receptor
component 1 Solyc06g035870.2
Rhamnose solcap_snp_sl_40309 1 84253735 2.61e-08 Embryo-specific 3 SGN-U565850
Rhamnose solcap_snp_sl_34196 3 59102190 2.32e-09 Conserved gene of unknown function Solyc03g115250.2
Rhamnose solcap_snp_sl_56631 8 1403227 9.41e-06 Patatin-1-Kuras 2 Solyc08g006860.2
Rhamnose solcap_snp_sl_39722 9 3484890 2.10e-10 Gene of unknown function SGN-U565153
Sucrose solcap_snp_sl_13549 2 36490995 2.57e-06 Conserved gene of unknown function Solyc02g076800.1
Sucrose solcap_snp_sl_17956 4 59392982 6.01e-05 Glutamyl-tRNA reductase Solyc04g076870.2
Sucrose solcap_snp_sl_29483 5 4037126 9.51e-09 Glycosyltransferase family GT8 protein Solyc05g009820.2
Threonate solcap_snp_sl_11456 4 58318160 5.73e-06 BHLH transcription factor Solyc04g074810.2
Threonine solcap_snp_sl_32389 2 48943446 3.75e-07 Copine-like protein Solyc02g093520.2
Tocopherol solcap_snp_sl_46445 10 2199297 4.35e-07 Conserved gene of unknown function Solyc10g008030.2
31
Tyramine solcap_snp_sl_14531 8 2587919 1.12e-05 Conserved gene of unknown function Solyc08g008120.2
Tyramine solcap_snp_sl_64706 8 57571484 1.18e-07 Lysine-specific demethylase 5A Solyc08g076390.2
Tyramine solcap_snp_sl_36166 11 762353 1.54e-06 Transcription regulator SGN-U275742
888
1 –SNP names as given in the SOLCAP SNP array (see http://solcap.msu.edu) 889
2 – SNP genomic position onto the tomato reference genome (v2.40) 890
3 – SNP p-values 891
4 – Name of the locus the peak SNP belongs to (according to the tomato genome annotation v2.30) 892
893
32
Table III | Summary of trait associations showing the heritability of the trait (h2, step 0 894
in the MLMM model), the missing heritability (h2 at the optimal model), the 895
percentage of associated variation of the trait (PVE), the number of significant loci 896
associated with the trait variation. 897
Phenotype Trait h2 Missing
h2 PVE
#
associations
ASA 0.553 0.333 0.561 5
Asparagine 0.417 0.208 0.220 1
Aspartate 0.284 0.301 0.162 1
Brix 0.600 0.185 0.611 9
Citrate 0.423 0.299 0.181 1
DHA 0.595 0.192 0.743 2
Erythritol 0.534 0.286 0.358 2
Fructose 0.565 0.250 0.386 2
Fucose 0.415 0.365 0.481 2
GABA 0.415 0.184 0.237 1
Malate 0.642 0.182 0.390 2
Nicotinate 0.595 0.458 0.279 1
Proline 0.773 0.282 0.461 2
Rhamnose 0.579 0.195 0.504 4
Sucrose 0.585 0.220 0.439 3
Threonate 0.168 0.174 0.170 1
Threonine 0.348 0.007 0.187 1
Tocopherol 0.306 0.261 0.224 1
Tyramine 0.612 0.347 0.472 3
Min 0.168 0.007 0.162 1
Max 0.773 0.458 0.743 9
Median 0.553 0.250 0.386 2
898
899
-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1
Citrate
Putrescine
Saccharate
Phenylalanine
Lysine
Methionine
Asparagine
Glutamine
betaAlanine
GABA
Serine
Threonine
Malate
Galacturonate
Glucuronate
Xylose
Glutarate2oxo
Tocopherol
Inositol1P
Erythritol
Brix
Fructose
Sucrose
Maltitol
Proline
ASA
DHA
Nicotinate
Tyramine
Glycerol3P
Glutamate
Aspartate
Threonate
Rhamnose
Fucose
Maltose
CitratePutrescineSaccharate
PhenylalanineLysine
MethionineAsparagineGlutaminebetaAlanine
GABASerine
ThreonineMalate
GalacturonateGlucuronate
XyloseGlutarate2oxoTocopherolInositol1PErythritol
BrixFructoseSucroseMaltitolProlineASADHA
NicotinateTyramineGlycerol3PGlutamateAspartateThreonateRhamnose
FucoseMaltose
S.C
S.L
S.P
1.1 1.3 1.5 1.7Groups
ASA
b
a
c
S.C
S.L
S.P
-1.0 0.0 0.5
Groups
Asparagine
a a
a
S.C
S.L
S.P
-0.4 0.0 0.4
Groups
Aspartate
a a
a
S.C
S.L
S.P
4 6 8 10
Groups
SSC
b
a
c
S.C
S.L
S.P
0.0 0.4 0.8
Groups
Citrate
a a
b
S.C
S.L
S.P
-0.4 0.0
Groups
DHA
b
a
c
S.C
S.L
S.P
-0.6 -0.2 0.2
Groups
Erythritol
b a
b
S.C
S.L
S.P
0.0 0.4 0.8
Groups
Fructose
b
a
c
S.C
S.L
S.P
-0.2 0.0 0.2
Groups
Fucose
b
a
c
S.C
S.L
S.P
-0.8 -0.2 0.4
Groups
GABA
b
c
a
S.C
S.L
S.P
-0.6 0.0 0.4
Groups
Malate
a b
a b
S.C
S.L
S.P
-0.2 0.2
Groups
Nicotinate
b
a
c
S.C
S.L
S.P
-1.0 0.0
Groups
Proline
b a
b
S.C
S.L
S.P
0.2 0.6 1.0
Groups
Rhamnose
b a
b
S.C
S.L
S.P
-0.6 0.0 0.4
Groups
Sucrose
b
a
c
S.C
S.L
S.P
-0.4 0.0 0.4
Groups
Threonate
a a
a
S.C
S.L
S.P
-0.4 0.0 0.4
Groups
Threonine
a a
a
S.C
S.L
S.P
0.2 0.6 1.0
Groups
Tocopherol
b a
a b
S.C
S.L
S.P
-0.4 0.0 0.4
Groups
Tyramine
b
a
c
-0.5 0.0 0.5
-0.4
0.00.20.4
Year effect on the measurement of malate content
Malate content measured in 2007
Mal
ate
cont
ent m
easu
red
in 2
008
S.P S.C S.L
-0.6
-0.2
0.2
0.6
Malate content in each species
Mal
ate
cont
ent
Distribution of Malate content in the Collection
Malate Content
Density
-0.6 -0.4 -0.2 0.0 0.2 0.4 0.6
0.0
0.5
1.0
1.5
02
46
812
Manhattan plot
Chromosome
-log10(pval)
1 2 3 4 5 6 7 8 9 10 11 12
0 1 2 3 4 5
02
46
810
QQplot for Optimal model according to mbonf
Expected ! log10(p)
Observed
!log 10(p)
0 5 10 15 20 25
0.0
0.2
0.4
0.6
0.8
1.0
Partition of variance at each step of MLMM
step
%var
CC CG GG
-0.6
-0.2
0.2
0.6
Allelic effect for the associated SNP on chromosome 2
Mal
ate
cont
ent
TT TC CC
-0.6
-0.2
0.2
0.6
Allelic effect for the associated SNP on chromosome 6
Mal
ate
cont
ent
(B)(A)
(C) (D)
(E)(F)
(G) (H)
Chromosome 2
11041792 15817593
0
1
2
3
4
5
6
−log
(p−v
alue
)
!
!!
! !!
!!
!
!!
!
! !!!!
!
!
!
!!!
!
!!
!
!!!!!! !
!!
!!
!!!
!
!!!!
!
!
!
!
!!!!!
!
!
!
!
!
!
!
!
!!
sample!
solc
ap_s
np_s
l_32
181
solc
ap_s
np_s
l_32
184
solc
ap_s
np_s
l_11
569
solc
ap_s
np_s
l_69
274
solc
ap_s
np_s
l_11
572
solc
ap_s
np_s
l_69
276
solc
ap_s
np_s
l_58
71so
lcap
_snp
_sl_
5872
solc
ap_s
np_s
l_58
75so
lcap
_snp
_sl_
5879
solc
ap_s
np_s
l_58
81so
lcap
_snp
_sl_
5882
solc
ap_s
np_s
l_58
83so
lcap
_snp
_sl_
5884
solc
ap_s
np_s
l_52
608
solc
ap_s
np_s
l_52
609
solc
ap_s
np_s
l_31
002
solc
ap_s
np_s
l_58
85so
lcap
_snp
_sl_
4082
3so
lcap
_snp
_sl_
4082
0so
lcap
_snp
_sl_
1386
solc
ap_s
np_s
l_13
85so
lcap
_snp
_sl_
1384
solc
ap_s
np_s
l_13
82so
lcap
_snp
_sl_
1381
solc
ap_s
np_s
l_40
819
solc
ap_s
np_s
l_13
80so
lcap
_snp
_sl_
1379
solc
ap_s
np_s
l_26
911
solc
ap_s
np_s
l_61
76so
lcap
_snp
_sl_
6177
solc
ap_s
np_s
l_61
80so
lcap
_snp
_sl_
5312
0so
lcap
_snp
_sl_
6183
solc
ap_s
np_s
l_61
84so
lcap
_snp
_sl_
6185
solc
ap_s
np_s
l_61
89so
lcap
_snp
_sl_
6192
solc
ap_s
np_s
l_31
177
solc
ap_s
np_s
l_61
94so
lcap
_snp
_sl_
6195
solc
ap_s
np_s
l_61
96so
lcap
_snp
_sl_
6199
solc
ap_s
np_s
l_62
01so
lcap
_snp
_sl_
6206
solc
ap_s
np_s
l_31
179
solc
ap_s
np_s
l_31
184
solc
ap_s
np_s
l_62
26so
lcap
_snp
_sl_
6227
solc
ap_s
np_s
l_62
28so
lcap
_snp
_sl_
6229
solc
ap_s
np_s
l_53
126
solc
ap_s
np_s
l_62
32so
lcap
_snp
_sl_
3118
9so
lcap
_snp
_sl_
3119
0so
lcap
_snp
_sl_
3119
1so
lcap
_snp
_sl_
3119
2so
lcap
_snp
_sl_
3119
3so
lcap
_snp
_sl_
6239
solc
ap_s
np_s
l_62
40so
lcap
_snp
_sl_
3119
6so
lcap
_snp
_sl_
3119
8so
lcap
_snp
_sl_
6247
solc
ap_s
np_s
l_62
49so
lcap
_snp
_sl_
6255
LD Map Type: r−squarePhysical Distance: 4775.8 kb
0 0.2 0.4 0.6 0.8 1
(A)
Chromosome 6
39116919 43961486
0123456789
101112
−log
(p−v
alue
)
!
!!
!
!
! ! !
!
!
! !
! !!
!
!!
!!!!
!
!
!
!
!
!
!
!!
!
!
!
!
!!!
!! !
!!
!!
!! ! !! !
! !
!
! !!
sample!
solc
ap_s
np_s
l_57
093
solc
ap_s
np_s
l_57
106
solc
ap_s
np_s
l_57
155
solc
ap_s
np_s
l_19
852
solc
ap_s
np_s
l_57
180
solc
ap_s
np_s
l_57
217
solc
ap_s
np_s
l_57
228
solc
ap_s
np_s
l_57
252
solc
ap_s
np_s
l_57
254
solc
ap_s
np_s
l_57
294
solc
ap_s
np_s
l_57
327
solc
ap_s
np_s
l_57
347
CL0
1610
2−04
29_s
olca
p_sn
p_sl
_573
52Bc
yc_8
68so
lcap
_snp
_sl_
5737
4so
lcap
_snp
_sl_
5737
6so
lcap
_snp
_sl_
5737
7so
lcap
_snp
_sl_
2442
8so
lcap
_snp
_sl_
5739
8so
lcap
_snp
_sl_
5743
5so
lcap
_snp
_sl_
5745
5so
lcap
_snp
_sl_
5746
5so
lcap
_snp
_sl_
5751
2so
lcap
_snp
_sl_
5751
3C
L015
545−
0057
_sol
cap_
snp_
sl_5
7521
solc
ap_s
np_s
l_24
436
solc
ap_s
np_s
l_24
437
solc
ap_s
np_s
l_19
898
solc
ap_s
np_s
l_19
899
solc
ap_s
np_s
l_57
533
solc
ap_s
np_s
l_24
440
solc
ap_s
np_s
l_57
574
solc
ap_s
np_s
l_24
449
SL10
882_
924
solc
ap_s
np_s
l_57
590
solc
ap_s
np_s
l_57
593
solc
ap_s
np_s
l_57
594
solc
ap_s
np_s
l_24
450
solc
ap_s
np_s
l_57
607
solc
ap_s
np_s
l_24
454
solc
ap_s
np_s
l_19
915
solc
ap_s
np_s
l_57
644
solc
ap_s
np_s
l_57
681
solc
ap_s
np_s
l_57
714
solc
ap_s
np_s
l_57
749
solc
ap_s
np_s
l_57
766
solc
ap_s
np_s
l_57
810
solc
ap_s
np_s
l_57
817
solc
ap_s
np_s
l_57
866
solc
ap_s
np_s
l_39
395
solc
ap_s
np_s
l_39
360
solc
ap_s
np_s
l_39
318
solc
ap_s
np_s
l_39
312
solc
ap_s
np_s
l_39
310
solc
ap_s
np_s
l_39
281
solc
ap_s
np_s
l_39
246
solc
ap_s
np_s
l_16
543
LD Map Type: r−squarePhysical Distance: 4844.6 kb
0 0.2 0.4 0.6 0.8 1
(B)