GWAS in tomato for metabolic traits - Plant Physiology · Bauchet & Causse, 2012) in the last couple of years due to the release of the tomato genome sequence (The Tomato Genome Consortium,

1

Running Head : GWAS in tomato for metabolic traits 1

2

Corresponding Author: 3

Christopher Sauvage (Ph.D) 4

INRA, UR1052, Génétique et Amélioration des Fruits et Légumes (GAFL) 5

Domaine St Maurice – 67 Allée des Chênes, CS 60094, F-84143 MONTFAVET 6

Cedex, France 7

Tél : +33 (0)4 32 72 27 54 8

Email: [email protected] 9

10

Research Area: 11

Genes, Development and Evolution 12

13

Plant Physiology Preview. Published on June 3, 2014, as DOI:10.1104/pp.114.241521

Copyright 2014 by the American Society of Plant Biologists

2

Title: Genome Wide Association in tomato reveals 44 candidate loci for fruit 14

metabolic traits. 15

16

Authors : Christopher Sauvage 1, Vincent Segura 2, Guillaume Bauchet 1,3, Rebecca 17

Stevens 1, Phuc Thi Do 4,5, Zoran Nikoloski 4, Alisdair R. Fernie4 and Mathilde 18

Causse 1 19

1- INRA, UR1052, GAFL, 67 allée des Chênes Domaine Saint Maurice – CS60094, 20

84143 Montfavet Cedex (France) 21

2- INRA, UR0588, 2163 avenue de la Pomme de Pin, 45075 Orléans Cedex 2 22

(France) 23

3- Syngenta Seeds, 12 chemin de l’Hobit, 31790 Saint Sauveur (France) 24

4- Max-Planck-Institut für Molekulare Pflanzenphysiologie, 14476 Potsdam-Golm 25

(Germany) 26

5- Faculty of Biology, VNU University of Science, Vietnam National University, 27

Hanoi. 334 Nguyen Trai, Thanh Xuan, Hanoi, Vietnam. 28

29

Summary : Genome wide association has shed light on the genetic architecture of 30

metabolic traits underlying fruit quality in tomato and allowed identification of 31

candidate genes. 32

33

34

3

Financial Source: 35

36

The EU EUSOL project PL016214-2 funded this work. 37

38

Corresponding Author : 39

40

Christopher Sauvage (Ph.D) 41

INRA, UR1052, Génétique et Amélioration des Fruits et Légumes (GAFL) 42

Domaine St Maurice - Allée des Chênes, CS 60094, F-84143 MONTFAVET Cedex, 43

France 44

Tél : +33 (0)4 32 72 27 54 45

Email: [email protected] 46

47

4

ABSTRACT 48

Genome-wide association studies have been successful in identifying genes involved 49

in polygenic traits and are valuable for crop improvement. Tomato (Solanum 50

lycopersicum) is a major crop and is highly appreciated worldwide for its health 51

value. We used a core collection of 163 tomato accessions composed of S. 52

lycopersicum, S. lycopersicum cv. cerasiforme and S. pimpinellifolium to map loci 53

controlling variation in fruit metabolites. Fruits were phenotyped for a broad range of 54

metabolites including amino acids, sugars and ascorbate. In parallel, the accessions 55

were genotyped with 5995 SNP markers spread over the whole genome. Genome 56

Wide Association analysis was conducted on a large set of metabolic traits that were 57

stable over two years, using a multi-locus mixed model as a general method for 58

mapping complex traits in structured populations (Segura et al., 2012) and applied to 59

tomato. We detected a total of 44 loci that were significantly associated with a total of 60

19 traits including sucrose, ascorbate, malate and citrate levels. These results not only 61

provide a list of candidate loci to be functionally validated but also a powerful 62

analytical approach for finding genetic variants that can be directly used for crop 63

improvement and deciphering the genetic architecture of complex traits. 64

5

INTRODUCTION 65

In crops, linkage mapping has proved invaluable for detecting quantitative trait loci 66

(QTL) for traits of interest and to unravel their underlying genetic architecture. This 67

approach is based on the analysis of the segregation of polymorphism between the 68

parental lines and their progeny. However, one of the limitations of this approach is 69

the reduced number of recombination events that occur per generation (for review, see 70

Korte & Farlow, 2013). This leads to extended linkage blocks that reduce the 71

accuracy of the linkage mapping. An alternative to linkage-based mapping studies is 72

to perform linkage disequilibrium (LD) mapping in a population of theoretically 73

unrelated individuals. The ancestral polymorphism segregating through this 74

population (or panel) is far more informative compared to the polymorphism of the 75

parental lines of the linkage mapping population (Mauricio, 2001). LD mapping, also 76

known as genome-wide association (GWA), relies on the natural patterns of LD in the 77

population investigated. The aim of GWA is to reveal trait-associated loci by taking 78

advantage of the level of LD. Depending on the decay of LD, the mapping resolution 79

can be narrowed from a large genomic portion where the level of LD is relatively high 80

to a single marker when the LD level is very low. 81

Following domestication, crops are prone to (1) increased levels of LD, (2) population 82

structure (remote common ancestry of large groups of individuals) and (3) cryptic 83

relatedness (the presence of close relatives in a sample of unrelated individuals) 84

(Riedelsheimer et al., 2012). Population structure and cryptic relatedness may lead to 85

false-positive association in GWA studies (Astle & Balding, 2009) but their effect is 86

now relatively well accounted for in mixed linear models (MLM) (for reviews see 87

(Sillanpää, 2010; Listgarten et al., 2012). The problem of high LD in GWA scans 88

must also be taken into account: Segura et al. (2012) investigated this difficulty by 89

proposing a multi-locus mixed-model (MLMM) that handles the confounding effect of 90

background loci that may be present throughout the genome due to linkage 91

disequilibrium. This approach revealed multiple loci in LD and associated with 92

sodium concentration in leaves in A. thaliana while previous methods failed to 93

identify this complex pattern (Segura et al., 2012). 94

In parallel, the development of cost-effective high-throughput sequencing 95

technologies has identified increasingly dense variant loci necessary to conduct GWA 96

scans, especially in model species such as rice for agronomic traits (Huang et al., 97

2010) or maize for drought tolerance (Lu et al., 2010 but see Soto-Cerda & Cloutier, 98

6

2012, for a complete review). However, GWA is not restricted to model species and is 99

becoming increasingly widespread in non-model ones such as sunflower (Mandel et 100

al., 2013) and tomato (Xu et al., 2013) where numerous associations have been 101

successfully identified for traits related to plant architecture (branching in the case of 102

sunflower) and fruit quality (e.g. fresh weight in tomato). 103

Tomato is a crop of particular interest as the fruit are an important source of fiber and 104

nutrients in the human diet and a model for the study of fruit development 105

(Giovannoni, 2001). Over the last two decades, numerous QTL have been identified 106

for traits such as fresh weight (fw) using linkage approaches (Frary et al., 2000; Zhang 107

et al., 2012; Chakrabarti et al., 2013) but also for other fruit related traits such as fruit 108

ascorbic acid levels (Stevens et al., 2007), sensory and instrumental quality traits 109

(Causse et al., 2002), sugar and organic acids (Fulton et al., 2002) or metabolic 110

components (Schauer et al., 2008). Large tomato germplasm collections have been 111

characterized at the molecular level using SSR (Ranc et al., 2008) and SNP markers 112

(Blanca et al., 2012; Shirasawa et al., 2013) giving insights into population structure, 113

tomato evolutionary history and the genetic architecture of traits of agronomical 114

interest. These screens of nucleotide diversity were made possible (for review, see 115

Bauchet & Causse, 2012) in the last couple of years due to the release of the tomato 116

genome sequence (The Tomato Genome Consortium, 2012) and derived genomic 117

tools such as a high density SNP genotyping array (Sim et al., 2012). The 118

combination of large germplasm collections, high-throughput genomic tools and traits 119

of economical interest provide a framework to apply GWA study in this species. In 120

tomato, previous association studies have been limited to a targeted region (e.g. 121

chromosome 2, Ranc et al., 2012), used low-density genome-wide distributed SNP 122

markers (Xu et al., 2013) or investigated a limited number of agronomical of traits 123

with low precision on the association panel (Shirasawa et al., 2013). 124

Using tomato (Solanum lycopersicum) as a model, we aimed to investigate the genetic 125

architecture of traits related to fruit metabolic composition, at high resolution. To 126

reach this objective, we carried out an investigation into LD patterns at the genome-127

wide scale and a GWA scan using the MLMM approach. We present results on the 128

genetic architecture of fruit metabolic composition for metabolites such as organic 129

acids, amino acids, sugars and ascorbate in tomato. 130

RESULTS 131

PHENOTYPING 132

7

We phenotyped a panel composed of 163 accessions for a total of 76 133

metabolic traits including amino acids, organic acids and sugars. The tomato diversity 134

panel was composed of 28 S. lycopersicum (S.L), 119 S. lycopersicum cv. cerasiforme 135

(S.C) and 16 S. pimpinellifolium (S.P) derived from the previously published core 136

collection described in Xu et al. (2013). From the set of 76 phenotypes, 36 of these 137

(47.3%) were highly correlated over the two years of sampling. Of these 36 138

phenotypes, significant differences between the three groups of tomato accessions 139

were identified for 26 phenotypes (70.3% - see Figure 1). The post-hoc TukeyHSD 140

test provided a more thorough investigation of the significant differences among the 141

three groups for each trait. Comparisons including S.P (S.P-S.L and S.P-S.C) were 142

more significantly different to the comparison S.L-S.C (see Figure 2). 143

The correlation pattern revealed clusters of highly correlated compounds in the 144

metabolic profile that largely corresponded to a functional classification of the 145

metabolites (Figure 1). For example, the concentration of fructose, sucrose, maltitol, 146

erythritol and maltose clustered together with SSC (Soluble Solid Content) while 147

amino acids (e.g. serine, threonine, methionine, asparagine) also clustered together. 148

We conducted GWA on this set of 36 phenotypes, which were stable (correlated) over 149

the two experimentation years using the MLMM approach (see Supplementary Table 150

S1 for the complete phenotype dataset). 151

GENOTYPING 152

From the initial 8,784 SNPs of the SOLCAP genotyping array, 7,720 (87.8%) 153

passed the manufacturing quality control and constituted our raw data set (see 154

Materials & Methods). From this raw data set, the quality filtering gave a total of 155

5,995 reliable SNP (77.6%), thus constituting the analysed data set for GWA. The 156

overall average percentage of missing data per locus was estimated at 3.84% in the 157

whole population, while ranging from 2.25% in S.L to 4.07% in S.P. The missing data 158

were imputed by the most common allele of the SNP as no missing data is allowed in 159

the MLMM. 160

The MAF (Minor Allele Frequency) values were evenly distributed from 161

0.001% to 0.5% and showed differences in their distribution between groups. The S.L 162

accessions showed an excess of rare variants with a skewed distribution of the MAF 163

values (median MAF= 0.107), while S.C and S.P accessions showed a broader 164

distribution of the MAF values (median MAF= 0.161 and 0.214, respectively). Such a 165

low median MAF in the S.L accessions may be attributed to (1) a higher proportion of 166

8

nearly monomorphic SNPs and (2) the shared ancestry within this group. This 167

observation is supported by a previous study that investigated the MAF pattern in sub-168

populations of tomato (Sim et al, 2012b). 169

POPULATION STRUCTURE 170

The pairwise-population FST differentiation index was estimated to be ~1% 171

(0.0102) between S.L and S.C, while between S.L and S.P and between S.C and S.P a 172

stronger population differentiation, estimated to be 0.2132 and to 0.1583 respectively, 173

was detected. These results are supported by the estimation of the population structure 174

using the STRUCTURE software. Following the ad hoc statistic ΔK, population 175

structure was apparent with the number of ancestral populations estimated to be two 176

(K=2). Whereas the first group was composed of a cluster of the S.L. accessions and 177

the S.C accessions (N=147), the second group was composed of a cluster of the S.P 178

accessions only (N=16). 179

ESTIMATES OF KINSHIP AND LINKAGE DISEQUILIBRIUM IN THE COLLECTION 180

Within the 163 accessions, the pairwise kinship estimates revealed a low 181

degree of relatedness between individuals with a mean overall estimate of 0.0738. 182

Pairwise linkage disequilibrium estimates (rs2) within each group revealed different 183

levels of LD along chromosomes. On average, LD was higher in S.L (rs2=0.57), 184

medium in S.C (rs2=0.54) and lower in S.P (rs

2=0.34). Within each group and for the 185

12 chromosomes, rs2 estimates ranged from 0.29 (K3) to 0.39 (K12) in S.P, from 186

0.5117 (K12) to 0.5619 (K11) in SC and from 0.52 (K9) to 0.62 (K6) in SL. More 187

details on LD estimates for each chromosome in the three groups by chromosome are 188

given in Table 1. 189

GENOME-WIDE ASSOCIATION 190

GWA was conducted trait by trait in order to dissect the optimal model 191

obtained from the MLMM. After correcting for multiple testing, GWA scan identified 192

a total of 44 loci that were significantly associated with 19 out of the 36 traits 193

(52.7%). These 44 loci were spread unevenly over the genome as all chromosomes 194

carried at least one association (chromosomes 1 and 12) but up to 10 associated loci 195

were located on chromosome 2. Moreover, the number of associated loci per trait 196

ranged from one (for 8 traits in total) to nine (for SSC). Table 2 reports the detailed 197

statistics of GWA (i.e. p-value, genomic location) for the loci associated with these 19 198

traits. For each trait, the heritability (estimated at the step 0 of the model, based on the 199

9

the variance component ��, computed for all markers and representing the estimated 200

genetic variance of the trait) ranged from 0.168 (threonate level) to 0.773 (proline 201

level) with a median value of 0.553 (over all traits), while the missing heritability (not 202

explained by the markers included in the model) ranged from 0.007 (threonine level) 203

to 0.458 (nicotinate level) with a median value of 0.250. The percentage of variation 204

explained for each trait was estimated from the optimal model obtained from the 205

MLMM. The percentage of variation explained ranged from 16.2% to 74.3% for the 206

aspartate level and the dehydroascorbate level traits, respectively, while for the SSC, 207

the percentage of variation was estimated as 0.611 (61%, see Table 3 for details). For 208

each trait, the Manhattan plot displaying p-values for each locus in relation to its 209

genomic location are shown in Supplementary Figure 1. 210

Finally, the peak SNP associated with SSC (Solcap_snp_sl_26678) that 211

belongs to a candidate gene (Solyc09g010080.2 (lin5), a fruit-specific Beta-212

fructofuranosidase or invertase), which plays a role in sugar metabolism (Fridman et 213

al., 2004), validates the methodological approach we employed by its mapping in our 214

panel. We identified putative candidate genes in the present study, especially in close 215

vicinity to 4 peak SNPs. For example, the peak SNP Solcap_snp_sl_26678 216

(chromosome 9, position: 2,411,368 bp) is associated with fruit ascorbate content, and 217

is located ≈423 kbp upstream of a monodehydroascorbate reductase (NADH)-like 218

protein (MDHAR, Solyc09g009390.2, position: 2,835,367 bp) previously shown to be 219

linked to fruit ascorbate levels under stress conditions (Stevens et al., 2008). 220

Similarly, the peak SNPs associated with nicotinate (Solcap_snp_sl_29349), malate 221

(Solcap_snp_sl_19899) and sucrose levels (Solcap_snp_sl_17956) are located at 680 222

kbp, 7.9 kbp and 68 kbp respectively from three putative candidate genes that play a 223

role in the genetic architecture of the variation of these traits that are described as a 224

nicotinate phosphoribosyl transferase protein (Solyc02g093290.2, position: 225

48,771,224 bp), an aluminum-activated malate transporter-like (Solyc06g072910.2, 226

position: 41,337,629 bp) and a sugar transporter (galactosylgalactosylxylosyl protein 227

3-beta-glucuronosyltransferase, Solyc04g076920.2, position: 59,461,803 bp), 228

respectively. 229

As a case study, we focused on the results associated with fruit malate content 230

by compiling all the results obtained for this trait. Figures 3 and 4 illustrate these 231

results. Malate levels were stable over the two years of the experiment (R2=0.621, 232

10

Figure 3A), differences in malate levels were significant between groups (Figure 2 233

and 3B) and the trait was normally distributed within the panel of accessions (Figure 234

3C). GWA identified two significant SNP associated with malate levels (see Figure 235

3D) without inflation in the distribution of p-values at the optimal step of the model 236

(see figure 3E) suggesting that population structure was well controlled. These two 237

SNPs explained a proportion of the trait variation estimated at 39% (see figure 3F). 238

For each peak SNP, located on chromosome 2 and 6, the allelic effects of each 239

genotypic class (homozygotes and heterozygote) were estimated (see figure 3 G and 240

H). Finally, we used the pairwise LD estimates (rs2) for each genomic location to (1) 241

narrow down the genomic interval and (2) seek for putative candidate genes in the 242

vicinity of the two peak SNP (Figure 4A and B), providing a local overview of the 243

extent of LD and revealing an aluminum-activated malate transporter-like 244

(Solyc06g072910.2, position: 41,337,629 bp) as a good candidate gene (see above). 245

246

DISCUSSION 247

The aims of the present study were to (1) investigate LD patterns in a panel of 163 248

tomato accessions including wild, admixed and cultivated accessions and (2) 249

implement a stepwise GWA approach to reveal associations between SNP markers 250

and traits related to fruit metabolites. We successfully achieved this objective with (1) 251

the investigation of the LD patterns revealing different levels of LD along 252

chromosomes and between the three groups constituting the panel and (2) the 253

detection of genome-wide associations for 19 fruit metabolic traits. Finally, we 254

demonstrated that GWA is powerful enough to link metabolic composition of fruits in 255

tomato with genetic variation at a high resolution, despite a high level of LD and 256

population structure. 257

258

METABOLITE PROFILING AND PHENOTYPING OF TRAITS 259

The phenotypic traits focused on in the present study were measured for two years in 260

a row (2007 and 2008) under similar growth conditions on an identical set of 163 261

tomato accessions. Only 36 traits (47.3%) were stable over the two years suggesting 262

that metabolite profiling is highly sensitive to the environmental conditions. Previous 263

studies have reported developmental stage × genotype or environmental × genotype 264

interactions for metabolite profiles, supporting the present results. For example, in 265

tomato, metabolite profiling of 26 compounds revealed significant genotype by 266

11

ripening stage interactions (Pascual-Banuls et al., 2013) whereas in durum wheat 267

significant variations in metabolites were attributed to genotype by environment 268

interactions (G×E, see Beleggia et al., 2013). 269

Investigation of correlations between the metabolites revealed significant 270

relationships between traits (see Figure 1). For example, a first cluster composed of 271

sugar-related traits (i.e. sucrose, fructose) as well as ascorbate and dehydroascorbate 272

levels were positively correlated. A second cluster of positively correlated metabolites 273

composed of eight proteogenic amino acids could be distinguished. The traits of these 274

two clusters were related and had significant and negative correlations. These 275

relationships have already been shown notably between several amino acids and 276

sugar-related traits (i.e. fructose and glucose, Pascual-Banuls et al., 2013). 277

Within the set of stable traits, ANOVA revealed significant differences between the 278

three groups of accessions (S.L, S.C and S.P) for 25 (25/37-67.5%) of these traits (see 279

Figure 2 for a boxplot representation). For example, the most significant differences 280

(p<1×10-09) were observed for ascorbate and dehydroascorbate levels or SSC with 281

higher levels in S.P compared to S.C and S.L. This was previously observed through 282

the detection of QTLs related to ascorbate levels (Stevens et al., 2007) and related to 283

SSC (Prudent et al., 2009), as well as through GWA for SSC (see Xu et al., 2013). 284

285

EXPLOITATION OF THE PATTERNS OF LINKAGE DISEQUILIBRIUM 286

The clear population structure allowed us to estimate the patterns of intra-287

chromosomal linkage disequilibrium in the three groups of accessions (S.L, S.C and 288

S.P) using a total of 5995 genome-wide markers. Our analysis of LD revealed 289

considerable variation across the tomato genome in the populations investigated. The 290

same observation was made in two previous studies that investigated LD patterns in 291

tomato on a genome-wide scale. Similar average r2 estimates (r2=0.464) were 292

obtained in fresh market tomato populations reflecting (1) the effects of selection on 293

genome variation and (2) the breeding history of tomato towards market 294

specialization (Robbins et al., 2011). However a second study highlighted biased 295

decay of LD between euchromatic and heterochromatic regions (Shirasawa et al., 296

2013). These previous studies support the high level of LD identified in the present 297

study. The different levels of LD may be interpreted as a direct effect of the 298

domestication that tomato (especially in SL) has undergone during its history through 299

bottlenecks and selective breeding that has led to a reduction in nucleotide diversity 300

12

and an extended LD following the elimination of recombinant lineages (Hamblin et 301

al., 2011). 302

In cultivated tomato, LD decays over large genomic regions (i.e. several hundreds of 303

Kb up to several Mb) and is advantageous for an association mapping approach, as it 304

requires fewer markers to cover the entire genome. On the other hand, the difficulty in 305

identifying the underlying causal polymorphism responsible for the phenotypic 306

variation represents the main drawback of these large blocks of LD. Identifying the 307

causal polymorphism from GWA signals remains challenging, especially in species 308

where dense genome coverage is still not achieved. To overcome these limitations, the 309

multiple locus mixed model (MLMM) proposed by Segura et al. (2012) handles the 310

confounding effect of background loci due to LD at the GWA scan step. This 311

approach out-performed the existing MLM notably by reducing the number of 312

significantly associated SNP rather than the number of peaks. This reduced the 313

number of candidate loci it was necessary to screen in order to identify the causal 314

polymorphism. 315

GENOME-WIDE ASSOCIATION FOR METABOLIC TRAITS 316

The GWA scan revealed a total of 44 loci (or peak SNP) associated with the variation 317

of 19 traits. These 44 loci accounted for various levels of estimated trait heritability 318

(from 0.168 to 0.773), missing heritability (from 0.007 to 0.458) and percentage of 319

trait variation (from 16.2% to 74.3%). These results suggest that different traits have 320

different genetic architectures: in some cases, a few genes may explain a large 321

proportion of the phenotypic variation (i.e. 2 loci explain 74.3% of the variation in 322

fruit dehydroascorbate) while numerous genes may only explain a fraction of the 323

phenotypic variation (i.e. 5 loci explain 33.2% of the ascorbate level variation). These 324

results are supported by similar observations in rice where various genetic 325

architectures were revealed using a GWAS approach for different traits of 326

agronomical interest (Zhao et al., 2011). However, it should be noted that in most 327

GWAS, significantly associated loci might contribute to a larger proportion of 328

phenotypic variation as many other small-to-medium-effect loci were not detected due 329

to the stringent threshold used in GWAS (False Discovery Rate - FDR) and the lack 330

of statistical significance for the control of false negatives caused by small effect sizes 331

(Visscher et al., 2012). Furthermore, the estimates of the missing heritability suggest 332

that for some traits most (or nearly all) loci underlying the variation in these traits 333

have been identified through the genome scan we conducted. For example, for 334

13

threonine, the missing heritability has been estimated to be 0.007, which means that 335

(1) all the loci responsible for the variation of this trait may have been identified and 336

(2) the genetic architecture of this trait may rely on a small number of genes (only one 337

associated locus in the present study). Conversely, for nicotinate, the genetic 338

architecture of the trait requires further investigation since the missing heritability has 339

been estimated at 0.458, which means that a large number of small effect loci or a 340

limited number of large effect loci remain to be identified. Taken together, these 341

observations suggest that the investigated genetic architecture is usually more 342

complex than it appears. 343

A total of 35 of the 44 associated loci (79.5%) were associated with the metabolic 344

traits (9 loci are associated with SSC) and accounted for between 16.2% (aspartate) 345

and 50.4% (rhamnose) of the percentage of variation of these traits. In a previous 346

investigation of QTL related to metabolic traits using a lower number of lines (N=76), 347

Schauer et al. (2008) detected 104 mQTL for 22 distinct amino acids in tomato. Our 348

results obtained using a GWAS approach contrast with these results in terms of the 349

number of QTLs. However, this difference may reflect the methodological principles 350

underlying both approaches. The more stringent threshold used in GWAS (i.e. FDR) 351

and the confounding effect of population structure may explain this difference. This 352

has been observed in the study of glucosinolate metabolites in Arabidopsis (Chan et 353

al., 2010) and the study of leaf metabolic profiles in maize (Riedelsheimer et al., 354

2012). In the latter study, when comparing a linkage mapping experiment to a GWA 355

scan, increased genetic variation was reported suggesting that the genetic variability is 356

greater in the GWAS as it relies on a larger genetic pool (several up to hundreds of 357

individuals) whereas a linkage experiment relies on a much narrower genetic pool (i.e. 358

a couple of parental lines) notably (Riedelsheimer et al., 2012). 359

The 44 associated loci are spread over the tomato genome as every chromosome 360

carries at least one association (chromosomes 1 and 12) with up to 10 on chromosome 361

2. In tomato, chromosome 2 was suggested to be interesting as it carries a lot of QTLs 362

for traits of interest such as fresh weight (fw2.2, Frary et al., 2000), fruit morphology 363

(Causse et al., 2002) or locule number (lcn 2.1, Muños et al., 2011) to name but a few 364

(see Ranc et al., 2012 and Xu et al., 2013 for more information) supporting the 365

identification of numerous associations on this chromosome in the present study. 366

In several cases, peak SNPs of different traits co-localized or the same peak SNP was 367

associated with different traits. For example, peak SNPs for sucrose and erythritol 368

14

levels (solcap_snp_sl_12459 and solcap_snp_sl_13558) co-localized in a region of 369

68kb on chromosome 2. In a similar way, peak SNPs associated with asparagine, 370

threonine and nicotinate levels were localized within an interval of 508.1 kbp on 371

chromosome 2 (solcap_snp_sl_32389 and solcap_snp_sl_29349 respectively). Such 372

co-localization of peak SNPs has been observed in other GWAS experiments in 373

tomato (Xu et al., 2013), Arabidopsis (Bergelson & Roux, 2010) or rice (Zhao et al., 374

2011) suggesting the presence of genes with pleiotropic effects or closely linked 375

genes. 376

On chromosome 2, both asparagine and threonine level traits are associated with the 377

same peak SNP (solcap_snp_sl_32389 – annotated as a Copine-like protein). These 378

two α-amino acids belong to the class of polar uncharged side chain amino acids and 379

are indirectly linked to the Krebs cycle as their biosynthesis relies on oxaloacetate, 380

which is, as an acceptor compound of this cycle, one of its major metabolic 381

intermediates. The identification of an association between asparagine and threonine 382

levels at the same peak SNP means that this locus is located in the close vicinity of 383

one or several crucial and pleiotropic effect gene(s) directly involved in the metabolic 384

pathway of both asparagine and threonine synthesis. This observation suggests that 385

the genomic region around the peak SNP solcap_snp_sl_32389 has to be investigated 386

further to seek for causal polymorphism and candidate genes underlying the genetic 387

architecture of the asparagine and threonine level traits. 388

Malate and citrate levels were associated with one peak SNP located on chromosome 389

6 (solcap_snp_sl_19899) and two peak SNPs located on chromosomes 2 and 6 390

(solcap_snp_sl_6196 and solcap_snp_sl_19899), respectively (see Figure 4 for 391

detailed Manhattan plots and LD patterns). Interestingly, malate and citrate were 392

associated with the same peak SNP (solcap_snp_sl_19899), located on chromosome 6 393

and annotated as a conserved gene of unknown function. This observation suggests 394

that either in this genomic region, the LD block that this peak SNP belongs to is 395

particularly extended or alternatively that this peak SNP was identified close to a gene 396

involved in the citrate and the malate metabolic pathways. The figure 4b representing 397

the local pattern of LD around the peak SNP suggests that the LD level is relatively 398

low in this genomic region. From a functional point of view, citrate and malate are 399

two organic compounds found in most ripe fruits (Etienne et al., 2013) and have been 400

demonstrated to be highly correlated with many important regulators of ripening in 401

studies that have investigated early fruit development (Mounet et al., 2009; Centeno et 402

15

al., 2011). This result suggests that in the present study we were able to identify a 403

peak SNP that is located near to one or several putative candidate gene(s) playing a 404

crucial role in the citrate and the malate metabolic pathways. Moreover, for the malate 405

level trait, the phenotype of heterozygous individuals is intermediate to that of 406

homozygotes (see Figure 3) suggesting an incomplete dominance effect. 407

SEARCHING FOR CANDIDATE GENES 408

In the present study, we conducted GWA using a MLMM to identify more precisely 409

putative candidate genes involved in the genetic architecture of fruit metabolic traits, 410

by taking advantage notably of the LD pattern. For numerous peak SNPs, their 411

functional annotation is not directly linked to the trait they are associated with. 412

However, for some of these peak SNPs, they directly target a previously characterized 413

candidate gene or are located in the close vicinity of putative candidate genes. For 414

example, the peak SNP associated with fruit SSC (Solcap_snp_sl_26678) belongs to a 415

previously validated candidate gene (Solyc09g010080.2, lin5) which encodes a cell 416

wall invertase, and is a locus for a QTL that positively affects tomato fruit sugar 417

content, and hence the important ‘soluble solids’ commercial trait (Schauer et al., 418

2006). However, more putative candidate genes have been identified in the present 419

study. For example, the peak SNP (Solcap_snp_sl_26678) located on chromosome 9 420

(position: 2,411,368 bp) and associated with fruit ascorbate levels is located near to 421

(423 kbp) a monodehydroascorbate reductase (NADH)-like protein (MDHAR, 422

Solyc09g009390.2, position: 2,835,367 bp) that has been previously identified using a 423

QTL fine mapping approach (Stevens et al., 2008). Similarly, the peak SNPs 424

associated with nicotinate, malate and sucrose levels (Solcap_snp_sl_29349, 425

Solcap_snp_sl_19899 and Solcap_snp_sl_17956, respectively) are also located near 426

to (680 kbp, 7.9 kbp and 68 kbp, respectively) putative candidate genes that play a 427

role in the genetic architecture of the variation of these traits. Indeed, these three 428

putative candidate genes are described as a nicotinate phosphoribosyl transferase 429

protein (Solyc02g093290.2, position: 48,771,224 bp), an aluminum-activated malate 430

transporter-like (Solyc06g072910.2, position: 41,337,629 bp) and a sugar transporter 431

(“galactosylgalactosylxylosyl protein 3-beta-glucuronosyltransferase”, 432

Solyc04g076920.2, position: 59,461,803 bp), respectively. Thus these results open the 433

door for subsequent analyses based on either fine localization of the putative 434

candidate gene using a targeted re-sequencing approach combined with GWAS to 435

identify and confirm the causal polymorphism or functional validation for example by 436

16

transgenic approaches to investigate the biological role of the putative candidate gene 437

(e.g. fine mapping of the fw 3.2 locus, Chakrabarti et al., 2013). 438

CONCLUSION AND PERSPECTIVES 439

These results show that high resolution GWA, by using a multi-locus mixed linear 440

model, has been successful in tomato in deciphering the genetic architecture of fruit 441

composition traits. This led to the identification of promising candidate loci that 442

underlie the genetic architecture of traits such as fruit malate and citrate levels, 443

opening the door to further validation and functional investigation of this locus. The 444

next analytical step relies on the integration of the recent methodological 445

developments such as data imputation (Marchini & Howie, 2010; Howie et al., 2012; 446

Porcu et al., 2013) and haplotype based models (Powell et al., 2012) and should 447

facilitate the identification of novel loci with a higher degree of accuracy. 448

MATERIAL AND METHODS 449

PLANT MATERIAL 450

The tomato diversity panel consisted of 163 accessions composed of 28 S. 451

lycopersicum (S.L), 119 S. lycopersicum cv. cerasiforme (S.C) and 16 S. 452

pimpinellifolium (S.P) derived from the previously published core collection 453

described in Xu et al. (2013). Cherry type tomato (S.C) is an admixture between 454

tomato (S.L) and its closest wild relative (S.P) possibly resulting from the frequent 455

hybridizations between them (Nesbitt & Tanksley, 2002; Ranc et al., 2008). In GWA 456

experiments, the power to detect genetic effects is linked to minor allele frequency 457

(MAF) at genotyped loci. The mixing of different groups, populations or subspecies 458

within a panel, will enhance the efficiency of the approach by capturing rare and 459

common variants. Using a diversity panel composed of several sub-populations or 460

species is a common practice in GWA experiments. For example, in O. sativa, Zhao 461

et al. (2011) used a worldwide diversity panel composed of 5 different species to 462

unravel the complex genetics underlying the natural variation of 34 traits in Oryza 463

sativa. In the present study, we mixed accessions of S.L, S.C and S.P in order to (1) 464

cover the broader range of phenotypic and genetic diversity, as we did not expected a 465

uniform distribution of SNP MAF within each group and to (2) overcome the limited 466

statistical power of GWA due to the skewed distribution of SNP MAF within the 467

panel, especially in terms of the detection of false positive associations (Tabangin et 468

al., 2009). 469

17

Plants (four replicates) were grown in the tunnel in Avignon (France) during summer 470

2007 and 2008 (growth conditions are also described in Xu et al., 2013). Fruits were 471

harvested at the ripe stage. Pericarp tissue from 5 fruits per accession was collected to 472

be frozen in liquid nitrogen and stored at -80°C for metabolomic profiling. DNA was 473

isolated from 100 mg frozen leaves using to the DNeasy Plant Mini Kit (Qiagen, 474

Valencia, California, USA) for the subsequent genotyping assay. Leaf samples 475

corresponded to fully expanded but non-senescent leaves. DNA was quantified using 476

the Quant-iT™ PicoGreen® dsDNA Assay Kit (Invitrogen) according to the 477

manufacturer’s protocol. 478

479

PHENOTYPIC VARIATION 480

METABOLITE PROFILING 481

Tomato pericarp tissue was homogenized, and the exact amount used for metabolite 482

extraction was defined. Three biological replicates were analysed for each accession. 483

Metabolite extraction, derivatization, gas chromatography-mass spectrometry analysis 484

(GC-MS), and data processing were performed as described in Schauer et al. (2006). 485

Metabolites were identified in comparison with database entries of authentic standards 486

(Kopka et al., 2005; Schauer et al., 2005). A total of 76 metabolites were measured in 487

the present experiment (see Supplementary Table S1, for the complete list) including 488

amino acids, sugars and sugar alcohols, and organic acids. 489

490

ASCORBIC ACID LEVEL 491

Ascorbate and dehydroascorbate were measured separately from the metabolite 492

profiling using a microplate assay as described in Stevens et al. (2006) on the frozen 493

pericarp material stored at -80°C. Extractions and assays were carried out in ice-cold 494

6% TCA in triplicate. The assay used was a spectrophotometric assay based on the 495

detection of dipyridyl-Fe2+ complexes following the reduction of Fe3+ to Fe2+ by the 496

reduced form of ascorbate present in the samples and comparison with standards of 497

known concentrations. Total ascorbate (reduced ascorbate + dehydroascorbate) was 498

measured by mixing the sample with 5mM DTT, to reduce dehydroascorbate, prior to 499

the assay. Dehydroascorbate concentrations were therefore calculated to be the 500

difference between the samples with and without DTT added. 501

502

SOLUBLE SOLID CONTENT AND SUGARS 503

18

Concentration of fructose, glucose and sucrose were determined within the 163 504

accessions for the two years of sampling using the micro-method developed by 505

Gomez et al. (2007). This method is precise, linear and accurate when compared to 506

HPLC methods. Measurement of soluble solid content (SSC) in °Brix was performed 507

as described in Xu et al. (2013) on fruit frozen powder derived from blending fruits 508

with liquid N2. SSC values primarily represent estimates of sugar content in fruits and 509

vegetables. 510

DATA NORMALISATION AND STATISTICAL ANALYSES 511

All descriptive statistics and analyses were performed using [R] 2.15.1 except if 512

otherwise specified. A non-parametric Kendall (W) test was used to assess agreement 513

among the biological replicates and to remove any outlier measurements. Thus, for 514

each year, biological replicates were averaged and the normal distribution of the data 515

was tested using a Shapiro–Wilk test. The normality test revealed that 29 of the 76 516

phenotypes (38.1%) were not normally distributed and were log10 transformed. For 517

each phenotype a linear regression revealed correlation between the two years. Only 518

highly correlated phenotypes (correlation coefficient R2 > 0.6) between the two years 519

of sampling were averaged and used in the GWA study. The pairwise correlation 520

between phenotypes was evaluated using a Spearman test (p<0.05). The graphical 521

representation of the pairwise correlation between each phenotype was produced 522

using the R package ‘Corrplot’ and the ‘hclust’ clustering method (Friendly, 2002). 523

Finally, for each phenotype, an analysis of variance (ANOVA) tested significant 524

differences between the groups of accessions (S.L / S.C / S.P). Then, a post-ANOVA 525

Tukey HSD (Honest Significant Difference) test created a set of confidence intervals 526

on the differences between the means of each trait, for which a significant association 527

was detected, to test for significant differences between the pairwise means among the 528

3 groups of the panel (i.e. significant difference for the mean level of malate between 529

S.L and S.P). Significance was declared for p values < 0.05. Finally, the individual 530

missing phenotype data that ranged from 0.033% to 12.7% (median of 0.32%), were 531

replaced by the mean value of the trait computed for all the accessions of the panel, as 532

required by the MLMM model. 533

534

GENOTYPING ARRAY AND SNP SELECTION 535

The SNP genotyping was performed using the Infinium assay (Illumina Inc.), 536

developed by the Solanaceae Coordinated Agricultural Project (SolCAP- Hamilton et 537

19

al., 2012; Sim et al., 2012) to genotype the collection of tomato accessions (according 538

to the manufacturer’s standard protocol). The probe sequences and SNP information 539

are available from SolCAP (http://solcap.msu.edu). The SNP calling rate threshold 540

per locus was set at 90%. A minor allele frequency (MAF) ranging from 0.037 to 0.45 541

was used to filter the raw genotype data set. The minimal MAF was set according to 542

the formula: (# of chromosomes/(2*#individuals)) as proposed by Aulchenko (2007). 543

The minimal success rate of genotyping per accession was fixed to 90%. All SNPs 544

and accessions that did not respect these criteria were removed using the --maf option 545

implemented in PLINK! (Purcell et al., 2007). 546

547

ESTIMATION OF POPULATION DIFFERENTIATION AND STRUCTURE 548

Initially the fixation index (FST, Weir & Cockerham, 1984) estimation was performed 549

between the three groups of tomato accessions to get an overview of the population 550

structure. Then, the STRUCTURE software 2.3.3 (Pritchard et al., 2000; Falush et al., 551

2003) was used to infer the number of ancestral populations based on the filtered SNP 552

data set (see results for the number of sites used) and thus to assign the 163 553

individuals to populations (Q matrix). The most likely number of clusters K in all 554

simulations was assumed to be in the range of K=1 to K=10. Ten replicates were 555

conducted for each K with a burn-in period of 1×106, followed by 5×106 MCMC steps 556

using the Bioportal computing resource (https://www. bioportal.uio.no/; Kumar et al., 557

2009). These parameters met the requirements for the use of the STRUCTURE software 558

proposed in Gilbert et al. (2012) to insure the reproducibility of the results of the 559

present study. The ad hoc statistic ΔK was used to determine the most probable K 560

(Evanno et al., 2005). The ancestry estimation using ADMIXTURE software (Alexander 561

et al., 2009), based on the maximum likelihood estimation of individual ancestries 562

from multi-locus SNP genotype datasets, was used to support the identification of the 563

ancestral populations performed with the STRUCTURE software. 564

565

KINSHIP AND LINKAGE DISEQUILIBRIUM ESTIMATION 566

SPAGeDi software (Hardy & Vekemans, 2002) was used to estimate the Ritland 567

(1995) matrix of pairwise kinship coefficient (K matrix) from the filtered SNP data 568

set (see results section) using a 10,000 bootstrap resampling procedure. Then, the 569

intra-chromosomal linkage disequilibrium (LD) between all pairs of sites was 570

estimated using an unbiased (as individuals are not independent) estimation (named 571

20

rs2), that uses the population structure matrix, and consisting of information about the 572

origins of each individual and the admixture proportions of each individual genome. 573

The method is implemented in the [R] package called LDCORSV (Mangin et al., 574

2011). Finally, the snp.plotter [R] package gave a graphical representation of the 575

pairwise LD estimates at a local scale 576

(http://cbdb.nimh.nih.gov/~kristin/snp.plotter.html). 577

578

GWA MAPPING 579

GWA analyses were performed with correction for population structure (Q) and 580

modeling phenotypic covariance with the Kinship (K) matrix. The QQplots were used 581

to determine the most appropriate correcting method for each analyzed phenotype. 582

Thus, these matrices were implemented into a modified version of the multi-locus 583

mixed model (MLMM) described in Segura et al. (2012) that takes into account the 584

population structure as a cofactor (see the mlmm_cof.r [R] script at 585

https://cynin.gmi.oeaw.ac.at/home/resources/mlmm). Briefly, the MLMM model is 586

based on the EMMA library (Kang et al., 2008). The approach relies on a simple, 587

stepwise mixed-model regression with forward inclusion and backward elimination, 588

while re-estimating the variance components of the model at each step. This method 589

increases the detection power and reduces the false discovery rate (FDR) when 590

compared to traditional single-locus approaches. Two model selection criteria are 591

implemented in MLMM for multi-testing correction: the extended Bayesian 592

information criterion (EBIC - Chen & Chen, 2008) and the multiple-Bonferroni 593

criterion (mBonf), defined as the largest model in which all cofactors have a p-value 594

below a Bonferroni-corrected threshold (we used a threshold of 0.05 – see Segura et 595

al. 2012 for details). From the optimal model obtained with MLMM (according to both 596

EBIC and mBonf criteria) the percentage of variation explained were obtained for 597

each phenotype. Moreover, for each trait, the phenotypic heritability was obtained at 598

the step 0 in the MLMM model, when no marker is included in the model, whereas the 599

missing heritability (percentage of the variance not explained by the markers) was 600

obtained at the optimal step of the MLMM model. Briefly, the MLMM model 601

partitions the phenotypic variance into genetic, random and explained variance at each 602

step, suggesting a natural stopping criterion (genetic variance of 0) for including 603

cofactors. This estimates of the explained and unexplained heritable variance of the 604

analysed trait. 605

21

DATA AVAILABILITY 606

To ensure their accessibility, the genotyping (5995 SNP) and phenotyping data 607

(N=36), as well as the structure and kinship matrices for the 163 accessions were 608

deposited on the GNPis repository hosted at https://urgi.versailles.inra.fr/association 609

(Steinbach et al., 2013). The complete phenotype dataset is also available in 610

Supplementary Table S1. 611

ACKNOWLEDGEMENTS 612

We gratefully acknowledge critical input from the associate editor, Jocelyn Rose, and 613

the two anonymous reviewers. The authors are grateful to Hélène Burck and Yolande 614

Carretero for their invaluable help, characterizing and maintaining the INRA tomato 615

Genetic Resources collection, to the Experimental Installation Unit of the INRA-616

UR1052 research unit for maintaining the facilities. Authors are also grateful to 617

Sophie Rolland, Nicolas Ranc, Stéphane Munos for their early involvement in the 618

experiment, as well as Gisèle Riqueau and Renaud Duboscq for DNA and tissue 619

sampling and ascorbate phenotyping. Finally the authors would like to thanks Martin 620

Ganal (TraitGenetics GmbH, Gatersleben, Germany) for his help with SNP array 621

genotyping. 622

623

624

AUTHOR CONTRIBUTIONS 625

Conceived and designed the experiments: AF, PTD, MC. Performed the experiments: 626

CS, PTD, AF. Analyzed the data: CS, VS, GB, RS, ZN. Wrote the paper: CS, VS, RS. 627

628

22

LITERATURE CITED 629

Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of 630

ancestry in unrelated individuals. Gen. Res. 19: 1655-1664. 631

Astle W, Balding DJ (2009) Population structure and cryptic relatedness in genetic 632

association studies. Stat. Sci. 24(4): 451-471. 633

Aulchenko YS, Ripke S, Isaacs A, van Duijn CM. (2007) GenABEL: an R library 634

for genome-wide association analysis. Bioinformatics 23(10): 1294-1296. 635

Bauchet G, Causse M (2012) Genetic diversity in tomato (Solanum lycopersicum) 636

and its wild relatives. In: Mahmut C ed. Genetic Diversity in Plants: ISBN: 637

978-953-51-0185-7, InTech, DOI: 10.5772/33073. 638

Beleggia R, Platani C, Nigro F, De Vita P, Cattivelli L, Papa R (2013) Effect of 639

genotype, environment and genotype-by-environment interaction on 640

metabolite profiling in durum wheat (Triticum durum Desf.) grain. J Cereal 641

Sci 57(2): 183-192. 642

Bergelson J, Roux F (2010) Towards identifying genes underlying ecologically 643

relevant traits in Arabidopsis thaliana. Nat Rev Genet 11(12): 867-879. 644

Blanca J, Canizares J, Cordero L, Pascual L, Diez MJ, Nuez F (2012) Variation 645

revealed by SNP genotyping and morphology provides insight into the origin 646

of the tomato. PLoS ONE 7(10): e48198. 647

Brachi B, Morris G, Borevitz J (2011) Genome-wide association studies in plants: 648

the missing heritability is in the field. Genome Biol 12(10): 232. 649

Causse M, Saliba-Colombani V, Lecomte L, Duffe P, Rousselle P, Buret M 650

(2002) QTL analysis of fruit quality in fresh market tomato: a few 651

chromosome regions control the variation of sensory and instrumental traits. J. 652

Exp. Bot. 53(377): 2089-2098. 653

Centeno DC, Osorio S, Nunes-Nesi A, Bertolo AL, Carneiro RT, Araujo WL, 654

Steinhauser MC, Michalska J, Rohrmann J, Geigenberger P, Oliver SN, 655

Stitt M, Carrari F, Rose JK, Fernie AR (2011) Malate plays a crucial role in 656

starch metabolism, ripening, and soluble solid content of tomato fruit and 657

affects postharvest softening. Plant Cell 23(1): 162-184. 658

Chakrabarti M, Zhang N, Sauvage C, Munos S, Blanca J, Canizares J, Diez MJ, 659

Schneider R, Mazourek M, McClead J, Causse M, van der Knaap E 660

(2013) A cytochrome P450 regulates a domestication trait in cultivated 661

tomato. Proc Natl Acad Sci U S A 110(42): 17125-17130. 662

Chan EKF, Rowe HC, Kliebenstein DJ (2010) Understanding the evolution of 663

defense metabolites in Arabidopsis thaliana using genome-wide association 664

mapping. Genetics 185(3): 991-1007. 665

Chen J, Chen Z (2008) Extended Bayesian information criteria for model selection 666

with large model spaces. Biometrika 95(3): 759-771. 667

Etienne A, Génard M, Lobit P, Mbeguié-A-Mbeguie D, Bugaud C (2013) What 668

controls fleshy fruit acidity? A review of malate and citrate accumulation in 669

fruit cells. J Exp Bot 64(6): 1451-1469. 670

Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of 671

individuals using the software structure: a simulation study. Mol. Ecol. 14(8): 672

2611-2620. 673

Falush D, Stephens M, Pritchard JK (2003) Inference of population structure using 674

multilocus genotype data: linked loci and correlated allele frequencies. 675

Genetics 164(4): 1567-1587. 676

23

Frary A, Nesbitt TC, Grandillo S, Knaap E, Cong B, Liu J, Meller J, Elber R, 677

Alpert KB, Tanksley SD (2000) fw2.2: a quantitative trait locus key to the 678

evolution of tomato fruit size. Science 289 (5476): 85-88. 679

Fridman E, Carrari F, Liu YS, Fernie AR, Zamir D (2004) Zooming in on a 680

quantitative trait for tomato yield using interspecific introgressions. Science 681

305 (5691): 1786-1789. 682

Friendly M (2002) Corrgrams: Exploratory displays for correlation matrices. Am Stat 683

56: 316-324. 684

Fulton TM, Bucheli P, Voirol E, Lopez J, Petiard V, Tanksley SD (2002) 685

Quantitative trait loci (QTL) affecting sugars, organic acids and other 686

biochemical properties possibly contributing to flavor, identified in four 687

advanced backcross populations of tomato. Euphytica 127(2): 163-177. 688

Gilbert KJ, Andrew RL, Bock DG, Franklin MT, Kane NC, Moore J-S, Moyers 689

BT, Renaut S, Rennison DJ, Veen T, Vines TH (2012) Recommendations 690

for utilizing and reporting population genetic analyses: the reproducibility of 691

genetic clustering using the program structure. Mol Ecol 21(20): 4925-4930. 692

Giovannoni J (2001) Molecular biology of fruit maturation and ripening. Annu Rev 693

Plant Physiol Plant Mol Biol 52: 725-749. 694

Gomez L, Bancel D, Rubio E, Vercambre G (2007) The microplate reader: an 695

efficient tool for the separate enzymatic analysis of sugars in plant tissues-696

validation of a micro-method. J Sci Food Agric 87(10): 1893-1905. 697

Hamblin MT, Buckler ES, Jannink JL (2011) Population genetics of genomics-698

based crop improvement methods. Trends Genet 27(3): 98-106. 699

Hamilton JP, Sim S-C, Stoffel K, Van Deynze A, Buell CR, Francis DM (2012) 700

Single nucleotide polymorphism discovery in cultivated tomato via 701

sequencing by synthesis. Plant Gen. 5(1): 17-29. 702

Hardy OJ, Vekemans X (2002) SPAGeDi: a versatile computer program to analyse 703

spatial genetic structure at the individual or population levels. Mol Ecol Notes 704

2: 612-620. 705

Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR (2012) Fast and 706

accurate genotype imputation in genome-wide association studies through pre-707

phasing. Nat Genet 44(8): 955-959. 708

Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, Li C, Zhu C, Lu T, Zhang Z, 709

Li M, Fan D, Guo Y, Wang A, Wang L, Deng L, Li W, Lu Y, Weng Q, 710

Liu K, Huang T, Zhou T, Jing Y, Lin Z, Buckler ES, Qian Q, Zhang QF, 711

Li J, Han B (2010) Genome-wide association studies of 14 agronomic traits 712

in rice landraces. Nat Genet 42(11): 961-967. 713

Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, Daly MJ, Eskin E 714

(2008) Efficient control of population structure in model organism association 715

mapping. Genetics 178(3): 1709-1723. 716

Kopka J, Schauer N, Krueger S, Birkemeyer C, Usadel B, Bergmuller E, 717

Dormann P, Weckwerth W, Gibon Y, Stitt M, Willmitzer L, Fernie AR, 718

Steinhauser D (2005) [email protected]: the Golm Metabolome Database. 719

Bioinformatics 21(8): 1635-1638. 720

Korte A, Farlow A (2013) The advantages and limitations of trait analysis with 721

GWAS: a review. Plant Methods 9(1): 29. 722

Kumar S, Skjaeveland A, Orr R, Enger P, Ruden T, Mevik B-H, Burki F, 723

Botnen A, Shalchian-Tabrizi K (2009) AIR: A batch-oriented web program 724

package for construction of supermatrices ready for phylogenomic analyses. 725

BMC Bioinformatics 10(1): 357. 726

24

Listgarten J, Lippert C, Kadie CM, Davidson RI, Eskin E, Heckerman D (2012) 727

Improved linear mixed models for genome-wide association studies. Nat 728

Methods 9(6): 525-526. 729

Loiselle BA, Sork VL, Nason J, Graham C (1995) Spatial genetic structure of a 730

tropical understory shrub, Psychotria officinalis (Rubiaceae). Am J Bot 82: 731

1420-1425. 732

Lu Y, Zhang S, Shah T, Xie C, Hao Z, Li X, Farkhari M, Ribaut J-M, Cao M, 733

Rong T, Xu Y (2010) Joint linkage-linkage disequilibrium mapping is a 734

powerful approach to detecting quantitative trait loci underlying drought 735

tolerance in maize. Proc Natl Acad Sci USA 107(45): 19585-19590. 736

Mandel JR, Nambeesan S, Bowers JE, Marek LF, Ebert D, Rieseberg LH, 737

Knapp SJ, Burke JM (2013) Association mapping and the genomic 738

consequences of selection in sunflower. PLoS Genet 9(3): e1003378. 739

Mangin B, Siberchicot A, Nicolas S, Doligez A, This P, Cierco-Ayrolles C (2011) 740

Novel measures of linkage disequilibrium that correct the bias due to 741

population structure and relatedness. Heredity 108: 285-291. 742

Marchini J, Howie B (2010) Genotype imputation for genome-wide association 743

studies. Nat Rev Genet 11(7): 499-511. 744

Mauricio R (2001) Mapping quantitative trait loci in plants: uses and caveats for 745

evolutionary biology. Nat Rev Genet 2(5): 370-381. 746

Mounet F, Moing A, Garcia V, Petit J, Maucourt M, Deborde C, Bernillon S, Le 747

Gall G, Colquhoun I, Defernez M, Giraudel JL, Rolin D, Rothan C, 748

Lemaire-Chamley M (2009) Gene and metabolite regulatory network 749

analysis of early developing fruit tissues highlights new candidate genes for 750

the control of tomato fruit composition and development. Plant Physiol 751

149(3): 1505-1528. 752

Muños S, Ranc N, Botton E, Bérard A, Rolland S, Duffé P, Carretero Y, Le 753

Paslier M-C, Delalande C, Bouzayen M, Brunel D, Causse M (2011) 754

Increases in tomato fruit size and locule number is controlled by two key SNP 755

located near Wuschel. Plant Physiol 156(4). 756

Nesbitt TC, Tanksley SD (2002) Comparative sequencing in the genus 757

Lycopersicon: implications for the evolution of fruit size in the domestication 758

of cultivated tomatoes. Genetics 162(1): 365 - 379. 759

Porcu E, Sanna S, Fuchsberger C, Fritsche LG (2013) Genotype imputation in 760

genome-wide association studies. Curr Protoc Hum Genet Chapter 1: Unit 1 761

25. 762

Powell JE, Kranis A, Floyd J, Dekkers JCM, Knott S, Haley CS (2012) Optimal 763

use of regression models in genome-wide association studies. Anim Genet 764

43(2): 133-143. 765

Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure 766

using multilocus genotype data. Genetics 155(2): 945-959. 767

Prudent M, Causse M, Génard M, Tripodi P, Grandillo S, Bertin N (2009) 768

Genetic and physiological analysis of tomato fruit weight and composition: 769

influence of carbon availability on QTL detection. J Exp Bot 60(3): 923-937. 770

Purcell S, Neale BD, Todd-Brown K, Thomas L, Ferreira ME, Bender D, Maller 771

J, Sklar P, de Bakker PIW, Daly MJ, Sham PC (2007) Plink: a toolset for 772

whole-genome association and population-based linkage analysis. Am J Hum 773

Gen 81 (3):559-75. 774

25

Ranc N, Munos S, Santoni S, Causse M (2008) A clarified position for Solanum 775

lycopersicum var. cerasiforme in the evolutionary history of tomatoes 776

(Solanaceae). BMC Plant Biol 8: 130. 777

Ranc N, Munos S, Xu J, Le Paslier M-C, Chauveau A, Bounon R, Rolland S, 778

Bouchet JP, Brunel D, Causse M (2012) Genome-wide association mapping 779

in tomato (Solanum lycopersicum) is possible using genome admixture of 780

Solanum lycopersicum var. cerasiforme. G3 2(8): 853-864. 781

Riedelsheimer C, Lisec J, Czedik-Eysenberg A, Sulpice R, Flis A, Grieder C, 782

Altmann T, Stitt M, Willmitzer L, Melchinger AE (2012) Genome-wide 783

association mapping of leaf metabolic profiles for dissecting complex traits in 784

maize. Proc Natl Acad Sci U S A 109(23): 8872-8877. 785

Robbins C, Sim S-C, Yang W, Deynze AE, Knaap E, Joobeur T, Francis D 786

(2011) Mapping and linkage disequilibrium analysis with a genome-wide 787

collection of SNPs that detect polymorphism in cultivated tomato. J. Exp. Bot. 788

62(6): 1831-1845. 789

Schauer N, Semel Y, Balbo I, Steinfath M, Repsilber D, Selbig J, Pleban T, 790

Zamir D, Fernie AR (2008) Mode of inheritance of primary metabolic traits 791

in tomato. Plant Cell 20(3): 509-523. 792

Schauer N, Semel Y, Roessner U, Gur A, Balbo I, Carrari F, Pleban T, Perez-793

Melis A, Bruedigam C, Kopka J, Willmitzer L, Zamir D, Fernie AR 794

(2006) Comprehensive metabolic profiling and phenotyping of interspecific 795

introgression lines for tomato improvement. Nat Biotechnol 24(4): 447-454. 796

Schauer N, Steinhauser D, Strelkov S, Schomburg D, Allison G, Moritz T, 797

Lundgren K, Roessner-Tunali U, Forbes MG, Willmitzer L, Fernie AR, 798

Kopka J (2005) GC-MS libraries for the rapid identification of metabolites in 799

complex biological samples. FEBS Lett 579(6): 1332-1337. 800

Segura V, Vilhjalmsson BJ, Platt A, Korte A, Seren U, Long Q, Nordborg M 801

(2012) An efficient multi-locus mixed-model approach for genome-wide 802

association studies in structured populations. Nat Genet 44(7): 825-830. 803

Shirasawa K, Fukuoka H, Matsunaga H, Kobayashi Y, Kobayashi I, Hirakawa 804

H, Isobe S, Tabata S (2013) Genome-wide association studies using single 805

nucleotide polymorphism markers developed by re-sequencing of the genomes 806

of cultivated tomato. DNA Res 20(6): 593-603. 807

Sillanpää MJ (2010) Overview of techniques to account for confounding due to 808

population stratification and cryptic relatedness in genomic data association 809

analyses. Heredity 106(4): 511-519. 810

Sim S-C, Durstewitz G, Plieske Jr, Wieseke R, Ganal MW, Van Deynze A, 811

Hamilton JP, Buell CR, Causse M, Wijeratne S, Francis DM (2012) 812

Development of a large SNP genotyping array and generation of high-density 813

genetic maps in tomato. PLoS ONE 7(7): e40563. 814

Soto-Cerda BJ, Cloutier S (2012) Association mapping in plant genomes. In: (Ed.) 815

PMC ed. Genetic Diversity in Plants. ISBN: 978-953-51-0185-7, InTech, 816

DOI: 10.5772/33005. Available from: 817

http://www.intechopen.com/books/genetic-diversity-in-plants/association-818

mapping-in-plant-genomes. 819

Steinbach D, Alaux M, Amselem J, Choisne N, Durand S, Flores Rl, Keliet A-O, 820

Kimmel E, Lapalu N, Luyten I, Michotey Cl, Mohellibi N, Pommier C, 821

Reboux Sb, Valdenaire De, Verdelet D, Quesneville H (2013) GnpIS: an 822

information system to integrate genetic and genomic data from plants and 823

fungi. Database 2013. 824

26

Stevens R, Buret M, Duffe P, Garchery C, Baldet P, Rothan C, Causse M (2007) 825

Candidate genes and quantitative trait loci affecting fruit ascorbic acid content 826

in three tomato populations. Plant Physiol 143(4): 1943-1953. 827

Stevens R, Buret M, Garchery C, Carretero Y, Causse M (2006) Technique for 828

rapid, small-scale analysis of vitamin C levels in fruit and application to a 829

tomato mutant collection. J Agric Food Chem 54(17): 6159-6165. 830

Stevens R, Page D, Gouble B, Garchery C, Zamir D, Causse M (2008) Tomato 831

fruit ascorbic acid content is linked with monodehydroascorbate reductase 832

activity and tolerance to chilling stress. Plant Cell Environ 31(8): 1086-1096. 833

Tabangin ME, Woo JG, Martin LJ (2009) The effect of minor allele frequency on 834

the likelihood of obtaining false positives. BMC Proc 3 Suppl 7: S41. 835

The Tomato Genome Consortium (2012) The tomato genome sequence provides 836

insights into fleshy fruit evolution. Nature 485(7400): 635-641. 837

Visscher PM, Brown MA, McCarthy MI, Yang J (2012) Five years of GWAS 838

discovery. Am J Hum Genet 90(1): 7-24. 839

Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of 840

population structure. Evolution 38: 1358-1370. 841

Xu J, Ranc N, Munos S, Rolland S, Bouchet J-P, Desplat N, Paslier M-C, Liang 842

Y, Brunel D, Causse M (2013) Phenotypic diversity and association mapping 843

for fruit quality traits in cultivated tomato and related species. Theor Appl 844

Genet 126(3): 567-581. 845

Zhang N, Brewer M, Knaap E (2012) Fine mapping of fw3.2 controlling fruit 846

weight in tomato. Theor Appl Genet 125(2): 273-284. 847

Zhao K, Tung CW, Eizenga GC, Wright MH, Ali ML, Price AH, Norton GJ, 848

Islam MR, Reynolds A, Mezey J, McClung AM, Bustamante CD, 849

McCouch SR (2011) Genome-wide association mapping reveals a rich 850

genetic architecture of complex traits in Oryza sativa. Nat Commun 2: 467. 851

852

27

FIGURE LEGENDS 853

Figure 1 | Lower matrix displaying the correlations between each analyzed phenotype 854

adjusted for the year effect. The correlation coefficient (Spearman) ranges from -1 855

(red colour) to +1 (blue colour). 856

857

Figure 2 | Boxplot graphical representations of the distribution for the 19 traits that 858

showed a significant association. In all graphs, mean values labeled with different 859

letters are significantly different, whereas those with same letters are not (Tukey’s 860

test, P < 0.05). Abbreviation : S.C - Solanum cerasiforme; S.L - Solanum 861

lycopersicum; S.P - Solanum pimpinellifolium 862

863

Figure 3 | A focus on the malate level results. (A) Correlation for the malate level over 864

the two years of sampling in the collection of 163 accessions; (B) Variation of malate 865

level adjusted for the year effect within the three groups; (C) Distribution of the 866

adjusted malate level in the collection; (D) Manhattan plot for the 12 tomato 867

chromosomes (X-axis) and associated p-values for each marker (Y-axis); (E) QQplots 868

of the observed p-value distribution; (F) Evolution of genetic variance at each step of 869

the MLMM model (Blue: Genetic Variance Explained; Green: Total Genetic Variance; 870

Red: Error) for the optimal model (step=2-BIC criterion); (G and H) Allelic effect for 871

the two associated markers on chromosomes 2 and 6. Abbreviation : S.C - Solanum 872

cerasiforme; S.L - Solanum lycopersicum; S.P - Solanum pimpinellifolium 873

874

Figure 4 | Manhattan plots displaying the –log10(p-values) (Y-axis) over genomic 875

positions (X-axis) in a window of 2.5Mbp up and downstream of the two loci 876

associated with the malate level trait that are located on chromosome 2 (Fig 5a) and 877

chromosome 6 (Fig 5b). Different colour schemes are used to represent the pairwise 878

LD estimates (rs2) for each genomic location. 879

880

28

Table I | Intra-chromosomal linkage disequilibrium (rs2) in each tomato group. This estimate takes into account the effect of population structure 881

(see Mangin et al., 2011). 882

883

884

Mean pairwise rs2 K1 K2 K3 K4 K5 K6 K7 K8 K9 K10 K11 K12 all K

S. lycopersicum 0.5508 0.5988 0.5895 0.5570 0.6029 0.6235 0.5416 0.5397 0.5231 0.5539 0.5938 0.5389 0.5678

S. cerasiforme 0.5391 0.5318 0.5394 0.5191 0.5500 0.5530 0.5320 0.5337 0.5204 0.5315 0.5619 0.5117 0.5353

S. pimpinellifolium 0.3323 0.3239 0.2884 0.3872 0.3557 0.2604 0.3478 0.3338 0.3431 0.3923 0.2917 0.3968 0.3378

885

886

29

Table II | Detailed information for the 44 significant associations detected within the 36 traits analysed using the MLMM model. 887

Phenotype SNP1 Chromosome SNP Position (bp)2 p-value3 UniRef Annotation Locus name

(ITAG 2.3)4

ASA solcap_snp_sl_12749 6 36931366 1.42e-05 Peptide transporter, TGF-beta receptor, type

I/II extracellular region Solyc06g065020.2

ASA solcap_snp_sl_37057 7 63886939 2.94e-10 Conserved gene of unknown function Solyc07g064580.2

ASA solcap_snp_sl_26678 9 2411418 1.09e-07 Repressor of silencing 1 Solyc09g009080.2

ASA solcap_snp_sl_46662 9 61773785 1.07e-05 Gene of unknown function Solyc09g074480.1

ASA solcap_snp_sl_62616 11 3393838 4.66e-08 ATP dependent RNA helicase Solyc11g010310.1

Asparagine solcap_snp_sl_32389 2 48943496 1.93e-07 Copine-like protein Solyc02g093520.2

Aspartate solcap_snp_sl_11456 4 58318210 1.67e-07 BHLH transcription factor Solyc04g074810.2

SSC solcap_snp_sl_26136 2 29851816 7.79e-26 Mannose-6-phosphate isomerase Solyc02g063220.2

SSC CT232_snp229 2 43207682 7.73e-10 UV excision repair protein RAD23 protein Solyc02g085840.2

SSC solcap_snp_sl_63048 3 71026 0.0006 CXE carboxylesterase Solyc03g005100.2

SSC solcap_snp_sl_35206 6 1748271 2.92e-21 Auxin signaling F-box1 family protein Solyc06g007830.1

SSC solcap_snp_sl_53288 7 60078938 1.22e-12 Beta-1,3-galactosyl-O-glycosyl-glycoprotein

beta-1,6-N-acetylglucosaminyltransferase 7 Solyc07g054440.2

SSC solcap_snp_sl_65072 8 59477446 5.57e-08 Agenet domain containing protein Solyc08g078530.2

SSC solcap_snp_sl_39725 9 3477979 1.34e-33 Beta-fructofuranosidase (aka lin5) Solyc09g010080.2

SSC solcap_snp_sl_10594 11 2481288 1.89e-13 Single-stranded nucleic acid binding R3H

domain protein Solyc11g008250.1

SSC solcap_snp_sl_659 12 45751611 2.41e-06 Gene of unknown function NA

Citrate solcap_snp_sl_19899 6 41345468 1.48e-07 Conserved gene of unknown function Solyc06g072930.2

DHA solcap_snp_sl_69445 9 64606433 3.16e-39 Ubiquitin carboxyl-terminal hydrolase family

protein Solyc09g089560.2

30

DHA solcap_snp_sl_21770 11 3063738 8.49e-07 Pentatricopeptide repeat-containing protein SGN-U564017

Erythritol solcap_snp_sl_13558 2 36559326 1.24e-07 Pollen allergen Che a 1 Solyc02g076860.2

Erythritol solcap_snp_sl_60698 10 64445598 5.98e-16 Flavin oxidoreductase/NADH oxidase Solyc10g086220.1

Fructose solcap_snp_sl_16136 5 59787171 9.31e-07 Conserved gene of unknown function Solyc05g050500.1

Fructose solcap_snp_sl_27215 6 38384375 9.05e-07 Katanin p60 ATPase-containing subunit Solyc06g066810.2

Fucose solcap_snp_sl_20802 3 60860146 2.70e-07 UV excision repair protein RAD23 protein Solyc03g117780.2

Fucose solcap_snp_sl_53149 4 53628534 1.63e-06 Structural constituent of ribosome Solyc04g056530.1

GABA solcap_snp_sl_35255 6 1330594 5.53e-08 TatD DNase domain-containing

deoxyribonuclease Solyc06g007310.2

Malate solcap_snp_sl_6196 2 13905175 1.28e-06 Gene of unknown function SGN-U565892

Malate solcap_snp_sl_19899 6 41345468 2.48e-08 Conserved gene of unknown function Solyc06g072930.2

Nicotinate solcap_snp_sl_29349 2 49451582 3.83e-06 Uridylyltransferase PII Solyc02g094300.2

Proline solcap_snp_sl_100675 2 28798838 3.71e-06 Conserved gene of unknown function NA

Proline solcap_snp_sl_32499 6 21807134 3.91e-07 Membrane-associated progesterone receptor

component 1 Solyc06g035870.2

Rhamnose solcap_snp_sl_40309 1 84253735 2.61e-08 Embryo-specific 3 SGN-U565850

Rhamnose solcap_snp_sl_34196 3 59102190 2.32e-09 Conserved gene of unknown function Solyc03g115250.2

Rhamnose solcap_snp_sl_56631 8 1403227 9.41e-06 Patatin-1-Kuras 2 Solyc08g006860.2

Rhamnose solcap_snp_sl_39722 9 3484890 2.10e-10 Gene of unknown function SGN-U565153

Sucrose solcap_snp_sl_13549 2 36490995 2.57e-06 Conserved gene of unknown function Solyc02g076800.1

Sucrose solcap_snp_sl_17956 4 59392982 6.01e-05 Glutamyl-tRNA reductase Solyc04g076870.2

Sucrose solcap_snp_sl_29483 5 4037126 9.51e-09 Glycosyltransferase family GT8 protein Solyc05g009820.2

Threonate solcap_snp_sl_11456 4 58318160 5.73e-06 BHLH transcription factor Solyc04g074810.2

Threonine solcap_snp_sl_32389 2 48943446 3.75e-07 Copine-like protein Solyc02g093520.2

Tocopherol solcap_snp_sl_46445 10 2199297 4.35e-07 Conserved gene of unknown function Solyc10g008030.2

31

Tyramine solcap_snp_sl_14531 8 2587919 1.12e-05 Conserved gene of unknown function Solyc08g008120.2

Tyramine solcap_snp_sl_64706 8 57571484 1.18e-07 Lysine-specific demethylase 5A Solyc08g076390.2

Tyramine solcap_snp_sl_36166 11 762353 1.54e-06 Transcription regulator SGN-U275742

888

1 –SNP names as given in the SOLCAP SNP array (see http://solcap.msu.edu) 889

2 – SNP genomic position onto the tomato reference genome (v2.40) 890

3 – SNP p-values 891

4 – Name of the locus the peak SNP belongs to (according to the tomato genome annotation v2.30) 892

893

32

Table III | Summary of trait associations showing the heritability of the trait (h2, step 0 894

in the MLMM model), the missing heritability (h2 at the optimal model), the 895

percentage of associated variation of the trait (PVE), the number of significant loci 896

associated with the trait variation. 897

Phenotype Trait h2 Missing

h2 PVE

#

associations

ASA 0.553 0.333 0.561 5

Asparagine 0.417 0.208 0.220 1

Aspartate 0.284 0.301 0.162 1

Brix 0.600 0.185 0.611 9

Citrate 0.423 0.299 0.181 1

DHA 0.595 0.192 0.743 2

Erythritol 0.534 0.286 0.358 2

Fructose 0.565 0.250 0.386 2

Fucose 0.415 0.365 0.481 2

GABA 0.415 0.184 0.237 1

Malate 0.642 0.182 0.390 2

Nicotinate 0.595 0.458 0.279 1

Proline 0.773 0.282 0.461 2

Rhamnose 0.579 0.195 0.504 4

Sucrose 0.585 0.220 0.439 3

Threonate 0.168 0.174 0.170 1

Threonine 0.348 0.007 0.187 1

Tocopherol 0.306 0.261 0.224 1

Tyramine 0.612 0.347 0.472 3

Min 0.168 0.007 0.162 1

Max 0.773 0.458 0.743 9

Median 0.553 0.250 0.386 2

898

899

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1

Citrate

Putrescine

Saccharate

Phenylalanine

Lysine

Methionine

Asparagine

Glutamine

betaAlanine

GABA

Serine

Threonine

Malate

Galacturonate

Glucuronate

Xylose

Glutarate2oxo

Tocopherol

Inositol1P

Erythritol

Brix

Fructose

Sucrose

Maltitol

Proline

ASA

DHA

Nicotinate

Tyramine

Glycerol3P

Glutamate

Aspartate

Threonate

Rhamnose

Fucose

Maltose

CitratePutrescineSaccharate

PhenylalanineLysine

MethionineAsparagineGlutaminebetaAlanine

GABASerine

ThreonineMalate

GalacturonateGlucuronate

XyloseGlutarate2oxoTocopherolInositol1PErythritol

BrixFructoseSucroseMaltitolProlineASADHA

NicotinateTyramineGlycerol3PGlutamateAspartateThreonateRhamnose

FucoseMaltose

Christopher Sauvage

Figure 1 | Lower matrix displaying the correlations between each analyzed phenotype adjusted for the year effect. The correlation coefficient (Spearman) ranges from -1 (red colour) to +1 (blue colour).

S.C

S.L

S.P

1.1 1.3 1.5 1.7Groups

ASA

b

a

c

S.C

S.L

S.P

-1.0 0.0 0.5

Groups

Asparagine

a a

a

S.C

S.L

S.P

-0.4 0.0 0.4

Groups

Aspartate

a a

a

S.C

S.L

S.P

4 6 8 10

Groups

SSC

b

a

c

S.C

S.L

S.P

0.0 0.4 0.8

Groups

Citrate

a a

b

S.C

S.L

S.P

-0.4 0.0

Groups

DHA

b

a

c

S.C

S.L

S.P

-0.6 -0.2 0.2

Groups

Erythritol

b a

b

S.C

S.L

S.P

0.0 0.4 0.8

Groups

Fructose

b

a

c

S.C

S.L

S.P

-0.2 0.0 0.2

Groups

Fucose

b

a

c

S.C

S.L

S.P

-0.8 -0.2 0.4

Groups

GABA

b

c

a

S.C

S.L

S.P

-0.6 0.0 0.4

Groups

Malate

a b

a b

S.C

S.L

S.P

-0.2 0.2

Groups

Nicotinate

b

a

c

S.C

S.L

S.P

-1.0 0.0

Groups

Proline

b a

b

S.C

S.L

S.P

0.2 0.6 1.0

Groups

Rhamnose

b a

b

S.C

S.L

S.P

-0.6 0.0 0.4

Groups

Sucrose

b

a

c

S.C

S.L

S.P

-0.4 0.0 0.4

Groups

Threonate

a a

a

S.C

S.L

S.P

-0.4 0.0 0.4

Groups

Threonine

a a

a

S.C

S.L

S.P

0.2 0.6 1.0

Groups

Tocopherol

b a

a b

S.C

S.L

S.P

-0.4 0.0 0.4

Groups

Tyramine

b

a

c

Christopher Sauvage

Figure 2 | Boxplot graphical representations of the distribution for the 19 traits that showed a significant association. In all graphs, mean values labeled with different letters are significantly different, whereas those with same letters are not (Tukey’s test, P < 0.05). Abbreviation : S.C - Solanum cerasiforme; S.L - Solanum lycopersicum; S.P - Solanum pimpinellifolium

Christopher Sauvage

-0.5 0.0 0.5

-0.4

0.00.20.4

Year effect on the measurement of malate content

Malate content measured in 2007

Mal

ate

cont

ent m

easu

red

in 2

008

S.P S.C S.L

-0.6

-0.2

0.2

0.6

Malate content in each species

Mal

ate

cont

ent

Distribution of Malate content in the Collection

Malate Content

Density

-0.6 -0.4 -0.2 0.0 0.2 0.4 0.6

0.0

0.5

1.0

1.5

02

46

812

Manhattan plot

Chromosome

-log10(pval)

1 2 3 4 5 6 7 8 9 10 11 12

0 1 2 3 4 5

02

46

810

QQplot for Optimal model according to mbonf

Expected ! log10(p)

Observed

!log 10(p)

0 5 10 15 20 25

0.0

0.2

0.4

0.6

0.8

1.0

Partition of variance at each step of MLMM

step

%var

CC CG GG

-0.6

-0.2

0.2

0.6

Allelic effect for the associated SNP on chromosome 2

Mal

ate

cont

ent

TT TC CC

-0.6

-0.2

0.2

0.6

Allelic effect for the associated SNP on chromosome 6

Mal

ate

cont

ent

(B)(A)

(C) (D)

(E)(F)

(G) (H)

Christopher Sauvage

Figure 3 | A focus on the malate level results. (A) Correlation for the malate level over the two years of sampling in the collection of 163 accessions; (B) Variation of malate level adjusted for the year effect within the three groups; (C) Distribution of the adjusted malate level in the collection; (D) Manhattan plot for the 12 tomato chromosomes (X-axis) and associated p-values for each marker (Y-axis); (E) QQplots of the observed p-value distribution; (F) Evolution of genetic variance at each step of the MLMM model (Blue: Genetic Variance Explained; Green: Total Genetic Variance; Red: Error) for the optimal model (step=2-BIC criterion); (G and H) Allelic effect for the two associated markers on chromosomes 2 and 6. Abbreviation : S.C - Solanum cerasiforme; S.L - Solanum lycopersicum; S.P - Solanum pimpinellifolium

Chromosome 2

11041792 15817593

0

1

2

3

4

5

6

−log

(p−v

alue

)

!

!!

! !!

!!

!

!!

!

! !!!!

!

!

!

!!!

!

!!

!

!!!!!! !

!!

!!

!!!

!

!!!!

!

!

!

!

!!!!!

!

!

!

!

!

!

!

!

!!

sample!

solc

ap_s

np_s

l_32

181

solc

ap_s

np_s

l_32

184

solc

ap_s

np_s

l_11

569

solc

ap_s

np_s

l_69

274

solc

ap_s

np_s

l_11

572

solc

ap_s

np_s

l_69

276

solc

ap_s

np_s

l_58

71so

lcap

_snp

_sl_

5872

solc

ap_s

np_s

l_58

75so

lcap

_snp

_sl_

5879

solc

ap_s

np_s

l_58

81so

lcap

_snp

_sl_

5882

solc

ap_s

np_s

l_58

83so

lcap

_snp

_sl_

5884

solc

ap_s

np_s

l_52

608

solc

ap_s

np_s

l_52

609

solc

ap_s

np_s

l_31

002

solc

ap_s

np_s

l_58

85so

lcap

_snp

_sl_

4082

3so

lcap

_snp

_sl_

4082

0so

lcap

_snp

_sl_

1386

solc

ap_s

np_s

l_13

85so

lcap

_snp

_sl_

1384

solc

ap_s

np_s

l_13

82so

lcap

_snp

_sl_

1381

solc

ap_s

np_s

l_40

819

solc

ap_s

np_s

l_13

80so

lcap

_snp

_sl_

1379

solc

ap_s

np_s

l_26

911

solc

ap_s

np_s

l_61

76so

lcap

_snp

_sl_

6177

solc

ap_s

np_s

l_61

80so

lcap

_snp

_sl_

5312

0so

lcap

_snp

_sl_

6183

solc

ap_s

np_s

l_61

84so

lcap

_snp

_sl_

6185

solc

ap_s

np_s

l_61

89so

lcap

_snp

_sl_

6192

solc

ap_s

np_s

l_31

177

solc

ap_s

np_s

l_61

94so

lcap

_snp

_sl_

6195

solc

ap_s

np_s

l_61

96so

lcap

_snp

_sl_

6199

solc

ap_s

np_s

l_62

01so

lcap

_snp

_sl_

6206

solc

ap_s

np_s

l_31

179

solc

ap_s

np_s

l_31

184

solc

ap_s

np_s

l_62

26so

lcap

_snp

_sl_

6227

solc

ap_s

np_s

l_62

28so

lcap

_snp

_sl_

6229

solc

ap_s

np_s

l_53

126

solc

ap_s

np_s

l_62

32so

lcap

_snp

_sl_

3118

9so

lcap

_snp

_sl_

3119

0so

lcap

_snp

_sl_

3119

1so

lcap

_snp

_sl_

3119

2so

lcap

_snp

_sl_

3119

3so

lcap

_snp

_sl_

6239

solc

ap_s

np_s

l_62

40so

lcap

_snp

_sl_

3119

6so

lcap

_snp

_sl_

3119

8so

lcap

_snp

_sl_

6247

solc

ap_s

np_s

l_62

49so

lcap

_snp

_sl_

6255

LD Map Type: r−squarePhysical Distance: 4775.8 kb

0 0.2 0.4 0.6 0.8 1

(A)

Chromosome 6

39116919 43961486

0123456789

101112

−log

(p−v

alue

)

!

!!

!

!

! ! !

!

!

! !

! !!

!

!!

!!!!

!

!

!

!

!

!

!

!!

!

!

!

!

!!!

!! !

!!

!!

!! ! !! !

! !

!

! !!

sample!

solc

ap_s

np_s

l_57

093

solc

ap_s

np_s

l_57

106

solc

ap_s

np_s

l_57

155

solc

ap_s

np_s

l_19

852

solc

ap_s

np_s

l_57

180

solc

ap_s

np_s

l_57

217

solc

ap_s

np_s

l_57

228

solc

ap_s

np_s

l_57

252

solc

ap_s

np_s

l_57

254

solc

ap_s

np_s

l_57

294

solc

ap_s

np_s

l_57

327

solc

ap_s

np_s

l_57

347

CL0

1610

2−04

29_s

olca

p_sn

p_sl

_573

52Bc

yc_8

68so

lcap

_snp

_sl_

5737

4so

lcap

_snp

_sl_

5737

6so

lcap

_snp

_sl_

5737

7so

lcap

_snp

_sl_

2442

8so

lcap

_snp

_sl_

5739

8so

lcap

_snp

_sl_

5743

5so

lcap

_snp

_sl_

5745

5so

lcap

_snp

_sl_

5746

5so

lcap

_snp

_sl_

5751

2so

lcap

_snp

_sl_

5751

3C

L015

545−

0057

_sol

cap_

snp_

sl_5

7521

solc

ap_s

np_s

l_24

436

solc

ap_s

np_s

l_24

437

solc

ap_s

np_s

l_19

898

solc

ap_s

np_s

l_19

899

solc

ap_s

np_s

l_57

533

solc

ap_s

np_s

l_24

440

solc

ap_s

np_s

l_57

574

solc

ap_s

np_s

l_24

449

SL10

882_

924

solc

ap_s

np_s

l_57

590

solc

ap_s

np_s

l_57

593

solc

ap_s

np_s

l_57

594

solc

ap_s

np_s

l_24

450

solc

ap_s

np_s

l_57

607

solc

ap_s

np_s

l_24

454

solc

ap_s

np_s

l_19

915

solc

ap_s

np_s

l_57

644

solc

ap_s

np_s

l_57

681

solc

ap_s

np_s

l_57

714

solc

ap_s

np_s

l_57

749

solc

ap_s

np_s

l_57

766

solc

ap_s

np_s

l_57

810

solc

ap_s

np_s

l_57

817

solc

ap_s

np_s

l_57

866

solc

ap_s

np_s

l_39

395

solc

ap_s

np_s

l_39

360

solc

ap_s

np_s

l_39

318

solc

ap_s

np_s

l_39

312

solc

ap_s

np_s

l_39

310

solc

ap_s

np_s

l_39

281

solc

ap_s

np_s

l_39

246

solc

ap_s

np_s

l_16

543

LD Map Type: r−squarePhysical Distance: 4844.6 kb

0 0.2 0.4 0.6 0.8 1

(B)

Christopher Sauvage

Figure 4 | Manhattan plots displaying the –log10(p-values) (Y-axis) over genomic positions (X-axis) in a window of 2.5Mbp up and downstream of the two loci associated with the malate level trait that are located on chromosome 2 (Fig 5a) and chromosome 6 (Fig 5b). Different colour schemes are used to represent the pairwise LD estimates (rs2) for each genomic location.

Documents

GWAS in tomato for metabolic traits - Plant Physiology · Bauchet & Causse, 2012) in the last couple of years due to the release of the tomato genome sequence (The Tomato Genome Consortium,