Genic microsatellite markers in plants: features and applications

Rajeev K. Varshney 1, Andreas Graner 1 and Mark E. Sorrells 2

1 Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstrasse 3, D-06466 Gatersleben, Germany
2 Department of Plant Breeding, Cornell University, Ithaca, NY 14853, USA

Corresponding author: Rajeev K. Varshney ([email protected] or [email protected]). Available online 25 November 2004.

Review. TRENDS in Biotechnology Vol.23 No.1 January 2005. www.sciencedirect.com. 0167-7799/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.tibtech.2004.11.005

Expressed sequence tag (EST) projects have generated a vast amount of publicly available sequence data from plant species; these data can be mined for simple sequence repeats (SSRs). These SSRs are useful as molecular markers because their development is inexpensive, they represent transcribed genes and a putative function can often be deduced by a homology search. Because they are derived from transcripts, they are useful for assaying the functional diversity in natural populations or germplasm collections. These markers are valuable because of their higher level of transferability to related species, and they can often be used as anchor markers for comparative mapping and evolutionary studies. They have been developed and mapped in several crop species and could prove useful for marker-assisted selection, especially when the markers reside in the genes responsible for a phenotypic trait. Applications and potential uses of EST-SSRs in plant genetics and breeding are discussed.

The analysis of DNA sequence variation is of major importance in genetic studies. In this context, molecular markers are a useful tool for assaying genetic variation, and have greatly enhanced the genetic analysis of crop plants. A variety of molecular markers, including restriction fragment length polymorphisms (RFLPs), random amplification of polymorphic DNAs (RAPDs), amplified fragment length polymorphisms (AFLPs) and microsatellites or simple sequence repeats (SSRs), have been developed in different crop plants [1,2]. Among different classes of molecular markers, SSR markers are useful for a variety of applications in plant genetics and breeding because of their reproducibility, multiallelic nature, codominant inheritance, relative abundance and good genome coverage [3]. SSR markers have been useful for integrating the genetic, physical and sequence-based physical maps in plant species, and simultaneously have provided breeders and geneticists with an efficient tool to link phenotypic and genotypic variation (for review, see [4]).

With the establishment of expressed sequence tag (EST) sequencing projects for gene discovery programs in several plant species, a wealth of DNA sequence information has been generated and deposited in online databases [5]. In addition, sequence data for many fully characterized genes and full-length cDNA clones have been generated for some plant species such as rice [6]. By using some computer programs, the sequence data for ESTs, genes and cDNA clones can be downloaded from GenBank and scanned for identification of SSRs, which are typically referred to as EST-SSRs or genic microsatellites (Figure 1). Subsequently, locus-specific primers flanking EST- or genic SSRs can be designed to amplify the microsatellite loci present in the genes. Thus, the generation of (genic) SSR markers is relatively easy and inexpensive because they are a byproduct of the sequence data from genes or ESTs that are publicly available. However, the generation of genic SSR markers is largely limited to those species or close relatives for which there is a sufficiently large number of ESTs available. Genic SSRs have some intrinsic advantages over genomic SSRs because they are quickly obtained by electronic sorting, and are present in expressed regions of the genome. The usefulness of these genic SSRs also lies in their expected transferability because the primers are designed from the more conserved coding regions of the genome. Because of the advantages of genic SSR markers over genomic SSR markers and the public availability of large quantities of sequence data, genic SSRs have been identified, developed and used in a variety of studies, for several plant species. In this article, we review the current status of research on genic microsatellites in plants and present a critical appraisal of the relative use of genic SSRs and genomic SSRs for specific purposes, showing a shifting paradigm in microsatellite research for crop breeding with a particular emphasis on cereals.

Identification, frequency and distribution of genic SSRs

Identification of SSRs in gene sequences of plant species was carried out as early as 1993 by Morgante and Olivieri [7]. However, at that time the volume of sequence data available for SSR analysis was limited (<5000 kb) and therefore only a few genic SSRs were reported. Only one SSR per 64.6 kb in monocotyledonous and one per 21.2 kb in dicotyledonous species were identified [8]. Subsequently, the sudden increase in the volume of sequence data generated from EST projects in several plant species facilitated the identification of genic SSRs in large numbers. For the identification of SSRs in publicly available EST and gene sequences, ‘regular expression matching’ or BLASTN tools were initially used in the FASTA or BLAST2 formatted sequences [9,10]. Subsequently, several Perl scripts, search


[Figure 1: flowchart, not reproduced. Panel labels: characterized and annotated genes; shotgun sequencing (ESTs); functional genomics; full-length cDNA clones; singletons; tentative consensi; unigenes; public databases such as NCBI and EMBL; available sequence data from genes or ESTs; database mining: identification of SSR in sequence data of ESTs or genes; primer designing for genic SSRs; amplification of genic loci; applications (diversity analysis; association mapping; gene tagging and QTL analysis; transferability and comparative mapping; genome mapping).]

Figure 1. A schematic representation of the development and application of genic simple sequence repeat (SSR) markers. NCBI, National Center for Biotechnology Information, Bethesda, MD, USA (http://www.ncbi.nih.gov/); EMBL, European Molecular Biology Laboratory, Heidelberg, Germany (http://www.embl-heidelberg.de/). These databases can be used to download publicly available ESTs or sequence data for a plant species available in the public domain. Abbreviation: QTL, quantitative trait loci.


modules or programs have been developed for recognition of SSR patterns in the sequence files (Table 1). Among different programs available in the public domain, the MIcroSAtellite (MISA) search module has some features that are useful for EST quality control and for designing the primer pairs for EST-SSRs in a batch file [11] (see also http://pgrc.ipk-gatersleben.de/misa/). MISA has been used in several studies, in different laboratories [11–15]. Another SSR finder, called Sputnik, has the useful feature of enabling the user to specify the percent imperfection allowed in the SSR [16] (see also C. Abajian; http://abajian.net/sputnik/index.html), and Perl scripts have been written to facilitate routing the output to a relational database and batch primer design for Primer3 (http://wheat.pw.usda.gov/ITMI/EST-SSR/LaRota/).
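The 'regular expression matching' approach mentioned above can be illustrated with a minimal sketch. It is not the code of MISA, Sputnik or any other tool in Table 1: the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for illustration, and only perfect di-, tri- and tetranucleotide repeats are reported.

```python
import re

# Hypothetical thresholds (motif length -> minimum number of repeats).
# Published studies use differing criteria; these values are illustrative only.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect SSRs; start is 0-based."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures one motif; \1{n,} demands at least n further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AAA" or "AGAG"), so each SSR is reported once.
            if any(
                motif == motif[:d] * (unit_len // d)
                for d in range(1, unit_len)
                if unit_len % d == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


# Toy EST-like sequence containing (AG)8 and (CTG)6.
example = "GGCC" + "AG" * 8 + "TTCA" + "CTG" * 6 + "AACC"
print(find_ssrs(example))
```

Changing the thresholds in `MIN_REPEATS` changes which repeats are counted, which is one reason SSR frequencies differ between studies that mine the same kind of data.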

Table 1. Tools for database mining

| Script or program | Refs |
| --- | --- |
| MIcroSAtellite (MISA) | http://pgrc.ipk-gatersleben.de/misa/; [11] |
| SSRFinder | [21] |
| BuildSSR | [23] |
| SSR Identification Tool (SSRIT) | [24] |
| Tandem Repeat Finder (TRF) | [59] |
| Tandem Repeat Occurrence Locator (TROLL) | [60] |
| CUGIssr | http://www.genome.clemson.edu/projects/ssr/ |
| Sputnik | C. Abajian; http://abajian.net/sputnik/index.html |
| Modified Sputnik | [16] |
| Modified Sputnik II | http://wheat.pw.usda.gov/ITMI/EST-SSR/LaRota/ |
| SSRSEARCH | ftp://ftp.gramene.org/pub/gramene/software/scripts/ssr.pl |

Because limited genomic sequence data are available for many plant species, EST databases have been screened for the development of genic SSRs. For example, ESTs have been scanned for the presence of SSRs in Arabidopsis [16], cotton [17,18], Festuca species [19], grapes [9], Medicago species [20], soybean [21], sugarcane [22], spruce [23] and cereals including barley [11–13,17,24], maize [13,16,21,24], rice [10,13,16,17,24], rye [13,15,25], sorghum [13,24] and wheat [13,16,17,21,24,26,27].

The abundance of SSRs (perfect and imperfect) in unigenes can range from 1 in every 100 to 1 in every 2 unigenes of rice, depending on the minimum length of the SSR repeat motif (M. La Rota et al., unpublished). Varshney et al. [13] estimated the density of SSRs in expressed regions for 75.2 Mb of barley, 54.7 Mb of maize, 43.9 Mb of rice, 3.7 Mb of rye, 41.6 Mb of sorghum and 37.5 Mb of wheat and found the overall average density of (redundant) SSRs to be 1 per 6.0 kb. Higher frequencies, however, were reported by Morgante et al. [16], with 2.1, 1.1 and 1.3 per kb for rice, maize and wheat, respectively. In this context, it is important to note that the overall frequency and the frequency of different lengths of SSRs and repeat motifs depend on the criteria used to identify SSRs in the database mining, and therefore varied widely in different studies. In wheat, for example, the frequency of SSRs in ESTs has been reported as 1 in 6.2 kb [13], 1 in 0.74 kb [16], 1 in 17.42 kb [21], 3.2% or ~1 in 1 kb [24], 1 in 9.2 kb [26] and 7.5% of the contigs [27]. In some of these studies, a redundant set of SSRs (see below) was taken into account for estimating the SSR frequencies [13,21,24,26]. Furthermore, different SSR search tools (MISA [13]; Sputnik [16]; SSRFinder [21]; SSRIT [24]; macro [26]; SSRSEARCH, TRF and RepeatMasker [27]) with different SSR search criteria and different datasets were used for EST database mining. In general, when the minimum repeat length is 20 bp, SSRs of various plant species are present in ~5% of the ESTs (for examples, see http://www.genome.clemson.edu/projects/ssr).
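The frequencies above are quoted in two forms, 'one SSR per X kb' and 'SSRs per kb', which are reciprocals of each other. The short sketch below converts between them using only the wheat figures given in the text; the function names are hypothetical helpers, not part of any cited tool.

```python
def ssrs_per_kb(spacing_kb):
    """Convert 'one SSR every spacing_kb kb' into 'SSRs per kb'."""
    return 1.0 / spacing_kb


def average_spacing_kb(total_kb, n_ssrs):
    """Average spacing in kb, given surveyed sequence length and SSR count."""
    return total_kb / n_ssrs


# Wheat EST spacings quoted in the text (kb per SSR), keyed by citation.
wheat_spacing_kb = {"[13]": 6.2, "[16]": 0.74, "[21]": 17.42, "[26]": 9.2}

for ref, spacing in wheat_spacing_kb.items():
    print(ref, round(ssrs_per_kb(spacing), 3), "SSRs per kb")

# Ratio between the sparsest and densest estimate for the same species.
fold_range = max(wheat_spacing_kb.values()) / min(wheat_spacing_kb.values())
print("fold difference between studies:", round(fold_range, 1))
```

On a common scale, the wheat estimates differ by more than twentyfold, consistent with the point that search criteria, tools and datasets drive much of the variation.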

Trinucleotide repeats (TNRs) are the most common, followed by either dinucleotide repeats (DNRs) or tetranucleotide repeats (TTNRs), depending on the report. For example, Varshney et al. [13] reported that among cereal species, TNRs were the most frequent (54–78%) followed by DNRs (17.1–40.4%) and TTNRs (3–6%). Frequencies and distribution of different repeat motifs varied substantially in studies by Morgante et al. [16], Gao et al. [21] and Kantety et al. [24]. In these reports, wheat TNRs ranged from 49% to 83% but DNRs and TTNR frequencies were similar. Proportions of the different rice SSRs were similar in these reports but for maize there were large differences, many of which could be attributed to different methods of screening and analyzing SSRs and to differences in the sources of DNA sequence. In a recent survey, the proportions of DNRs, TNRs and TTNRs and motifs observed varied with the length of the SSRs within and among barley, wheat and rice [M. La Rota et al., unpublished]. Yu et al. [14] reported that 74% of the TNRs were found in coding regions, 20% in 5′ UTRs and 6% in 3′ UTRs. By contrast, only 19% of the DNRs were in coding regions and 42% and 39% were in 5′ and 3′ UTRs, respectively. The abundance of trimeric SSRs in ESTs was attributed to the absence of frameshift mutations in coding regions when there is length variation in these SSRs [28]. Also, among the TNRs, codon repeats corresponding to small hydrophilic amino acids are perhaps more easily tolerated, and selection pressure probably eliminates codon repeats encoding hydrophobic and basic amino acids [29].
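The frameshift argument for the abundance of trimeric repeats comes down to simple arithmetic: a change in repeat number alters sequence length by motif length times the change in copies, and the reading frame is preserved only when that product is a multiple of three. The sketch below, with a hypothetical helper name, makes this explicit.

```python
def causes_frameshift(motif_len, repeat_change):
    """True if gaining or losing `repeat_change` copies of a motif of
    length `motif_len` (bp) shifts the reading frame of a coding sequence."""
    return (motif_len * repeat_change) % 3 != 0


# Trinucleotide repeats: any gain or loss of copies keeps the frame.
print([causes_frameshift(3, d) for d in (-2, -1, 1, 2)])

# Dinucleotide repeats: the frame is kept only when copies change in threes.
print([causes_frameshift(2, d) for d in (-1, 1, 2, 3)])
```

This matches the distribution reported by Yu et al. [14], in which most TNRs lay in coding regions while most DNRs lay in the UTRs.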

Some inherent issues of genic SSRs

Redundancy

Large-scale EST sequencing projects have been performed for several plant species [5,30]. However, random or shotgun sequencing within cDNA libraries leads to a high proportion of redundant ESTs [31]. For development of unique genic SSR markers, a nonredundant EST dataset (after clustering the redundant set of ESTs and defining the ‘unigene’ set) should be used for identification and development of EST-SSR markers. In some studies, the redundant EST dataset has been scanned first for the presence of ESTs containing SSRs (SSR-ESTs) and then the smaller dataset of redundant SSR-ESTs has been used to identify nonredundant SSR-ESTs or EST-SSRs after clustering and defining the unigene SSR-ESTs [11–13,24]. The frequency of SSRs in nonredundant ESTs (or SSR-ESTs) more accurately reflects the density of SSRs in the transcribed portion of the genome. Alternatively, all available ESTs can be assembled and consensus sequences from unigene datasets such as the gene indices at The Institute for Genome Research (TIGR; http://www.tigr.org/tdb/tgi/) or other sources can be used for proper development of nonredundant marker sets [24,27].

Robustness and high-quality markers

The practical use of polymerase chain reaction-based markers, especially in germplasm analysis, in which data integration and comparison are crucial, requires that each SSR marker be validated for quality and robustness of the amplification product. However, a portion of genomic SSRs, developed in the past, have produced faint bands or stuttering, as observed in wheat [32] and barley [33]. By contrast, SSR markers derived from genes have produced a high proportion of high-quality markers with strong bands and distinct allelic peaks in most reports [11,12,14,19,27,34,35]. High quality and robustness of amplification patterns, along with other merits (see later) associated with EST-SSR markers, enhance their value, especially for germplasm characterization.

Amplification rate and null alleles

Primer design is not an exact science, and a success rate of 60–90% amplification for both genomic and EST-SSRs has been reported in different studies [11,12,14,19,22,26]. Possible explanations for amplification failure with EST-SSR primers include: (i) one or both primers of the EST-SSR extend across a splice site; (ii) the presence of large introns in genomic DNA sequence; (iii) the use of questionable sequence information for primer development; and (iv) primers were derived from chimeric cDNA clones. Thus, the quality of the SSR-EST sequence for designing the primer pairs is important. In a survey, up to 9% of cereal ESTs were of low quality [30] and should be rejected for designing primer pairs for EST-SSRs [11].

Furthermore, compared with genomic SSRs, the amplicon size of EST-SSRs more frequently deviated from expectation [11,12,14,22,27]. This is probably due to the presence of introns and insertions-deletions (in-dels) in the corresponding genomic sequence, as was substantiated by sequence analysis [19]. Large in-dels (20+ bp) in the SSR-ESTs can alter amplicon size sufficiently to enable visualization of polymorphism on agarose gels, which, compared with the use of acrylamide gels, significantly reduces costs and increases throughput [14].

Null alleles (alleles that do not give a polymerase chain reaction product) were observed by using EST-SSR markers in studies on kiwifruit [36], rice [34], spruce [23] and wheat [26,35]. In wheat, occurrence of null alleles is common and has been reported earlier using genomic SSRs (for references, see [4]). Occurrence of null alleles can be explained by: (i) the deletion of microsatellite at a specified locus [37]; (ii) mutations (in-dels or substitutions) in the primer binding site [38]. Occurrence of null alleles complicates the interpretation of segregation data because heterozygotes cannot be identified and reaction failures cannot be detected. The latter can result in deviation from the expected Mendelian segregation ratios [36].
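Deviation from an expected Mendelian ratio is commonly assessed with a chi-square goodness-of-fit test. The sketch below is a generic illustration, not the procedure used in [36]; the genotype counts are hypothetical, and 5.991 is the standard 5% critical value for two degrees of freedom.

```python
def chi_square(observed, ratio):
    """Chi-square goodness-of-fit statistic for observed class counts
    against an expected segregation ratio such as (1, 2, 1)."""
    total = sum(observed)
    expected = [total * part / sum(ratio) for part in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_5PCT_DF2 = 5.991  # chi-square critical value, alpha = 0.05, df = 2

# Hypothetical F2 counts for a codominant marker (AA, AB, BB); expected 1:2:1.
well_behaved = [24, 52, 24]
# Hypothetical counts in which heterozygotes are under-scored, as could happen
# when one allele fails to amplify and AB plants are read as AA.
distorted = [45, 30, 25]

for counts in (well_behaved, distorted):
    stat = chi_square(counts, (1, 2, 1))
    print(counts, round(stat, 2), "distorted" if stat > CRITICAL_5PCT_DF2 else "fits 1:2:1")
```

A marker flagged in this way is not necessarily carrying a null allele; the test only indicates that the scored classes do not fit the expected ratio.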

Level of polymorphism

EST-SSR primers have been reported to be less polymorphic compared with genomic SSRs in crop plants because of greater DNA sequence conservation in transcribed regions [9,23,34,35,39,40]. It is noteworthy that for detection of polymorphism, EST-SSRs derived from 3′-ESTs were found to be superior to those derived from 5′-ESTs [9,21,26,41]. Owing to the process of cDNA generation (polyT priming), there is a preferential selection of untranslated regions (UTRs) within 3′-ESTs, resulting in more variation than in 5′-ESTs. Scott et al. [9] also reported that there were polymorphism differences among microsatellites derived from the 3′ UTR (most polymorphic at cultivar level), the 5′ UTR (most polymorphic between cultivar and species) and microsatellites within the coding sequence (most polymorphic between species and genera).

Interestingly, in a recent study on identification and genome mapping of EST-SSRs in kiwifruit (Actinidia spp.), 93.5% of the markers were polymorphic and segregating in a mapping population derived from the intraspecific cross between two genotypes of diploid Actinidia chinensis [36]. Saha et al. [19] reported that 66% of the tall fescue-derived EST-SSR primer pairs were polymorphic between parents of tall fescue and ryegrass populations, and 43% and 38% of these were polymorphic in rice and wheat, respectively.

Applications

Genetic mapping

Microsatellite markers, developed from genomic libraries, can belong to either the transcribed region or the nontranscribed region of the genome, and rarely is there information available regarding their functions. By contrast, genic microsatellite markers often have known or ‘putative’ functions and are gene targeted markers with the potential of representing functional markers in those cases where polymorphisms in the repeat motifs affect the function of the gene in which they reside [42]. Putative functions for a significant proportion of EST-SSR markers have been reported [11,14,18,43]. EST-SSR markers are one class of marker that can contribute to ‘direct allele selection’, if they are shown to be completely associated or even responsible for a targeted trait [44]. For example, recently, a Dof homolog (DAG1 gene that showed a strong effect on seed germination in Arabidopsis [45]) has been mapped on chromosome 1B of wheat by using wheat EST-SSR primers [43]. Similarly, Yu et al. [14] identified two EST-SSR markers linked to the photoperiod response gene (ppd) in wheat. Finally, mapping candidate genes can facilitate genome alignment across distantly related species [46,47].

In recent years, the EST-SSR loci have been integrated, or genome-wide genetic maps have been prepared, in several plant (mainly cereal) species (Table 2). A large number of genic SSRs have been placed on the genetic maps of wheat [14,27,41,43]. EST-SSRs have been mapped as a part of the transcript map of barley (R.K. Varshney et al., unpublished) [11]. Unlike genomic SSRs, genic microsatellite markers were not clustered around the centromere but, as expected, were concentrated in gene-rich regions [11,14,43]. It is believed that the distribution of genic SSRs in the genetic maps mirrors the distribution of genes along the genetic map. In some earlier reports dealing with genomic SSRs, microsatellite markers were associated with repetitive DNA or retrotransposons [4,48]; however, recent reports indicated that they are predominately associated with nonrepetitive DNA (M. La Rota et al., unpublished) [16,49].

Table 2. Genome mapping using genic simple sequence repeat (SSR) markers

| Plant species | Number of genic SSR loci mapped | Mapping population used | Refs |
| --- | --- | --- | --- |
| Barley | 185 | 3 DHs (Igri × Franka, Steptoe × Morex, OWBDom × OWBRec) | [11], R.K. Varshney et al. unpublished |
| | 39 | F2s (Lerche × BGR41936), DHs (Igri × Franka), wheat–barley addition lines | [56] |
| Cotton | 111 | BC1 lines [(TM1 × Hai7124) × TM1] | [18] |
| Kiwifruit | 138 | Intraspecific cross | [36] |
| Raspberry | 8 | Full-sib family (Glen Moy × Latham) | [61] |
| Rice | 91 | DHs (IR64 × Azucena), RILs (Milyang 23 × Gihobyeo, Lemont × Teqing, BS125 × WL02) | [10] |
| Rye | 39 | 4 mapping populations derived from reciprocal crosses (P87 × P105, N6 × N2, N7 × N2, N7 × N6) | [15] |
| Ryegrass | 91 | Three-generation population (Floregon × Manhattan) | [62] |
| Tall fescue (Festuca spp.) | 91 | Pseudo-test cross-population (HD28–56 × R43–64) | [57] |
| Wheat | 149 | RILs (W7984 × Opata85) | [14] |
| | 126 | RILs (W7984 × Opata85) | [27] |
| | 101 | RILs (W7984 × Opata85, Wenmai 6 × Shanhongmai), DHs (Lumai14 × Hanxuan 10) | [43] |
| White clover | 449 | Pseudotest cross-population (6525/5 × 364/7) | [63] |

Abbreviations: BC1, backcross population; DHs, doubled haploids; RILs, recombinant inbred lines.

Functional diversity

Characterization of genetic variation within natural populations and among breeding lines is crucial for effective conservation and exploitation of genetic resources for crop improvement programs. Molecular markers have proven useful for assessment of genetic variation in germplasm collections [50]. Evaluation of germplasm with SSRs derived from genes or ESTs might enhance the role of genetic markers by assaying the variation in transcribed and known-function genes, although there is a higher probability of bias owing to selection. Expansion and contraction of SSR repeats in genes of known function can be tested for association with phenotypic variation or, more desirably, biological function [51]. The presence of SSRs in the transcripts of genes suggests that they might have a role in gene expression or function; however, it remains to be seen whether any unusual phenotypic variation might be associated with the length of SSRs in coding regions, as was reported for several diseases in humans [49,52]. It has been shown that variation in repeat units of SSRs: (i) present in 5′ UTR affects the gene transcription and/or translation; (ii) present in coding region inactivate or activate genes or truncate protein; and (iii) present in 3′ UTR might be responsible for gene silencing or transcription slippage. However, the function of genes that contain SSRs and the role of the SSR motif in the function of the plant genes are poorly understood. In a computational study, microsatellite markers in the transcribed regions of rice and Arabidopsis were more frequently found in the 5′ UTRs than in coding regions or 3′ UTRs, suggesting that they can potentially function as factors in regulating gene expression [53]. In an experimental study in rice, variation in the number of GA or CT repeats in the 5′ UTR of the waxy gene was correlated with amylose content [51,54]. Similarly, microsatellite markers (CCG)n in 5′ UTRs of some ribosomal protein genes of maize were believed to be involved in the regulation of fertilization [55]. Thus, the mechanisms found in human or animal systems might also have a role in generating phenotypic diversity in plant species. However, the variation associated with deleterious characters is less likely to be represented in the germplasm collections of crop species than among natural populations because undesirable mutations are commonly culled from agricultural populations [34].

Several studies have found that genic SSRs are useful for estimating genetic relationship (Table 3), and at the same time provide opportunities to examine functional diversity in relation to adaptive variation [35,40]. In comparison to genomic SSRs, genic SSRs revealed less polymorphism (low polymorphic information content value) in germplasm characterization and genetic diversity studies [9,11,34,35,39,40,56].
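Polymorphic information content (PIC) is computed from the allele frequencies at a marker locus. The sketch below uses the widely cited definition of Botstein et al. (1980), PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j². This is an assumption for illustration: the text does not state which formula each cited study used, and some SSR studies report the simpler gene diversity value 1 − Σ p_i² under the same name.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphic information content for one locus, given allele
    frequencies that sum to 1 (Botstein et al. 1980 definition)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    gene_diversity = 1.0 - sum(p * p for p in allele_freqs)
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return gene_diversity - correction


# Hypothetical loci: two equally frequent alleles versus four.
print(pic([0.5, 0.5]))
print(pic([0.25, 0.25, 0.25, 0.25]))
```

A locus with few alleles at unequal frequencies gives a low PIC, which is the pattern reported for many genic SSRs relative to genomic SSRs.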

Transferability and comparative mapping

Perhaps the most important feature of the genic SSRmarkers is that these markers are transferable amongdistantly related species, whereas the genomic SSRs arenot suitable for this purpose. Transferability of such

Table 3. Utilization of genic simple sequence repeat (SSR) markers

Plant species Number of EST-SSR

markers used

Details of genotypes use

Alpine lady-fern 10 186 individuals (6 popul

Barley 38 54 cultivars

75 7 genotypes (parents of

10 23 genotypes representi

8 8 spring barley cultivars,

wild barley lines

17 (barley and wheat) 11 varieties

22 28 Germany barley culti

Coffea spp. 9 15 C. arabica and 8 C. ro

Fescue spp. 145 5 Fescue genotypes and

Medicago spp. 39 24 species and subspeci

Rice 129 14 genotypes (parents of

interspecific cross)

Rye 100 15 accessions (13 inbred

cultivars)

Sugarcane 21 5 genotypes

Wheat 20 52 elite exotic wheat gen

22 64 durum wheat accessi

10 (wheat and barley) 15 varieties

52 68 advanced wheat lines

20 56 old and new UK whe

64 18 species of Triticum–AaAbbreviations: EST, expressed sequence tag; DH, doubled haploid; PIC, polymorphic i

www.sciencedirect.com

markers to related species or genera has been demon-strated in several studies (Table 4). Recently, the potentialuse of EST-SSRs developed for barley and wheat has beendemonstrated for comparative mapping in wheat, rye andrice [46,47]. These studies suggested that EST-SSRmarkers could be used in related plant species for whichlittle information is available on SSRs or ESTs. Inaddition, the genic SSRs are good candidates for thedevelopment of conserved orthologous markers for geneticanalysis and breeding of different species. For example, aset of 12 barley EST-SSR markers was identified thatshowed significant homology with the ESTs of fourmonocotyledonous species (wheat, maize, sorghum andrice) and two dicotyledonous species (Arabidopsis andMedicago) and could potentially be used across thesespecies [47].

Two issues of importance for cross-species utilizationare frequency of amplification for a given set of primers

for estimation of genetic diversitya

d Average PIC Refs

ations) 0.49 [64]

0.45 [11]

3 DH mapping populations) – [12]

ng different geographic regions 0.60 [39]

8 Jordan and Syrian landraces and 8 0.38 [40]

0.36 [41]

vars and 2 wild barley accessions 0.38 [56]

busta species 0.32 [65]

2 genotypes each of wheat and rice – [57]

es of Medicago 0.66 [19]

six intersubspecific crosses and one 0.46 [34]

lines and two open-pollinated – [25]

0.62 [22]

otypes 0.44 [26]

ons 0.62 [35]

0.45 [41]

Average alleles 3.3 [66]

at varieties 0.40 [67]

egilops complex Average alleles 6.8 [68]

nformation content (unless otherwise specified).

Table 4. Interspecific and generic transferability of genic simple sequence repeat (SSR) markers

| Plant species, genic SSRs developed | Species, transferability recorded | Refs |
|---|---|---|
| Alpine lady-fern | 9 species from Woodsiaceae | [64] |
| Apricot | 21 Prunus accessions, one pear and six apple cultivars | [69] |
| Barley | Wheat, rye, rice | [11,47] |
| Barley and wheat | Wheat and barley | [41] |
| Coffee | 12 Coffea species and 4 Psilanthus species | [65] |
| Cotton | 2 cotton species | [17] |
| Grape | 7 species from 2 Vitaceae genera | [9] |
| | 46 species from Vitaceae family | [69] |
| | 25 species from 5 Vitaceae genera | [70] |
| | 8 species from 4 Vitaceae genera | [71] |
| Loblolly pine | Different subspecies and species of pine | [58] |
| Medicago (M. truncatula) | 6 Medicago species | [20] |
| Rice | Wild species of rice | [34] |
| Spruce (Picea spp.) | 23 spruce species | [23] |
| Sugarcane (Saccharum spp.) | Erianthus and Sorghum | [22] |
| Tall fescue (Festuca arundinacea) | Lolium spp., rice, wheat | [19] |
| Wheat | Barley, maize, rice | [14,46] |
| | Barley, maize, rice, rye and oats | [26] |
| | Rice, maize and soybean | [43] |
| | Aegilops and Triticum species | [68] |

Review TRENDS in Biotechnology Vol.23 No.1 January 2005 53

and probability of amplifying the same (orthologous) gene in multiple species. Studies have estimated that 44% to 60% of EST-SSR primer pairs designed for wheat or barley will also yield amplicons in rice [11,14,21,41,46,47]. Of tall fescue primers, 59% successfully amplified rice and 71% amplified wheat DNA [57]. Similarly, 96% of the primers designed for Medicago truncatula produced amplicons in six other Medicago species [19]. In a study of the transferability of Loblolly pine SSR markers to other pine species, Liewlaksaneeyanawin et al. [58] compared microsatellite markers developed from ESTs, unscreened genomic DNA, low-copy genomic DNA and undermethylated genomic DNA. Although all eight of the EST-SSR markers produced amplicons on all four species, the three groups of genomic SSR markers were only evaluated for transferability to Pinus contorta ssp. latifolia and 29%, 23% and 30% produced amplicons, respectively. In a comparison of methods for primer design, Yu et al. [46] found that aligning consensus sequences from two or more species to identify conserved regions for primer design was less efficient than designing species-specific primers and then testing them on other species.

Orthology can only be determined by comparing both similarity of amplicon sequences and genome location across species [46,47]. For example, Saha et al. [19] sequenced the products of one EST-SSR primer pair for three fescue species, ryegrass, rice and wheat, and all sequences were >85% similar. Sequence-based comparison of mapped barley EST-SSRs with genetically and/or physically mapped markers in wheat, rye and rice revealed several markers that showed an orthologous relationship between the examined cereal species [47]. Comparison of genome locations of polymorphic EST-SSR markers mapped in both wheat and rice also confirmed previously known genome relationships for most of the markers examined [46]. However, the assessment of colinearity was complicated by the detection of multiple polymorphic loci in either wheat or rice by 85% of the primer pairs. The tendency of EST-SSR primer pairs to detect more loci than genomic SSRs was also reported for tall fescue [57].
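The sequence-similarity half of such an orthology check can be sketched as a pairwise identity calculation over amplicon sequences that have already been aligned. This is a minimal illustration under stated assumptions, not the procedure used in the cited studies: the function names are hypothetical, the input is assumed to be pre-aligned (equal length, gaps written as "-"), columns containing a gap are simply skipped, and the 85% threshold is borrowed from the example above purely for demonstration.

```python
from itertools import combinations


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def all_pairs_similar(aligned_seqs, threshold=85.0):
    """True if every pair of named sequences exceeds the identity threshold."""
    return all(
        percent_identity(aligned_seqs[a], aligned_seqs[b]) > threshold
        for a, b in combinations(sorted(aligned_seqs), 2)
    )


if __name__ == "__main__":
    # Hypothetical aligned amplicon fragments from three species.
    amplicons = {
        "species_1": "ACGTACGTACGTACGTACGT",
        "species_2": "ACGTACGTACGAACGTACGT",
        "species_3": "ACGTACG-ACGTACGTACGT",
    }
    print(all_pairs_similar(amplicons))
```

High sequence similarity alone does not establish orthology; as noted above, it has to be combined with evidence from genome location.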

www.sciencedirect.com

Comparative account on genic and genomic microsatellite markers

A comparative analysis of genomic SSRs and genic SSRs reveals advantages to both; however, because of lower polymorphism, EST-SSRs are not as efficient as genomic SSRs for distinguishing closely related genotypes (for references, see [4]). Furthermore, the development of genic SSRs is restricted to those species for which there are sufficient sequence data (for ESTs or genes) available because SSRs are present in only 2% to 5% of the unigenes examined. Nevertheless, EST-SSR markers developed for a given species can successfully be used in a related species for a variety of purposes, including fingerprinting or diversity studies, comparative mapping and marker-assisted selection. Genic SSR and genomic SSR markers tend to be complementary for genome mapping, with genic microsatellites being less polymorphic but concentrated in the gene-rich regions. For assessment of functional diversity, the genic SSRs are useful; however, because of higher polymorphism, genomic SSRs are superior for fingerprinting or varietal identification studies.
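The proportion of unigenes that carry an SSR depends on how an SSR is defined, so a small sketch may help make the 2% to 5% figure concrete. The code below is an illustrative assumption rather than any of the published search tools: it uses regular expressions to find perfect di-, tri- and tetranucleotide repeats, and the minimum repeat numbers in `MIN_REPEATS` are hypothetical search criteria chosen for the example. Function names and test sequences are likewise invented.

```python
import re

# Assumed search criteria: motif length -> minimum number of tandem repeats.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def _is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    for k in range(1, len(motif)):
        if len(motif) % k == 0 and motif[:k] * (len(motif) // k) == motif:
            return False
    return True


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, n in sorted(min_repeats.items()):
        # One motif copy followed by at least n - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if _is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


def fraction_with_ssr(seqs):
    """Fraction of sequences containing at least one SSR under the criteria."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    return sum(1 for s in seqs if find_ssrs(s)) / len(seqs)


if __name__ == "__main__":
    unigenes = ["TTT" + "AG" * 7 + "CCC", "ACGTACGATCGATGCA"]
    print(find_ssrs(unigenes[0]), fraction_with_ssr(unigenes))
```

Relaxing or tightening the minimum repeat numbers changes the reported frequency, which is one reason SSR frequencies from different studies are not always directly comparable.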

Future directions of microsatellite marker research

With more DNA sequence data being generated daily, the trend is towards cross-referencing genes and genomes using sequence- and map-based tools. Because polymorphism is a major limitation for many species, microsatellite markers are a valuable tool for plant genetics and breeding.

Clearly, the most significant application of EST-SSRs is for comparative mapping, with good examples in graminaceous and leguminous species. A database of EST-SSR primer pairs that would amplify orthologous loci across species and that are uniformly distributed over the rice, Medicago and Arabidopsis genomes would be very useful to breeders and geneticists, especially for minor or underfunded crop species.

In the longer term, development of allele-specific markers for the genes controlling agronomic traits will be important for advancing the science of plant breeding. In this context, genic microsatellites are but one class of marker that can be deployed, along with single nucleotide polymorphisms and other types of markers that target functional polymorphisms within genes. The choice of the most appropriate marker system needs to be decided on a case-by-case basis and will depend on many issues, including the availability of technology platforms, costs for marker development, species transferability, information content and ease of documentation.

References

1 Philips, R.L. and Vasil, I.K. eds (2001) DNA-Based Markers in Plants, Kluwer Academic Publishers

2 Varshney, R.K. et al. (2004) Molecular maps in cereals: methodology and progress. In Cereal Genomics (Gupta, P.K. and Varshney, R.K. eds), pp. 35–82, Kluwer Academic Publishers

3 Powell, W. et al. (1996) Polymorphism revealed by simple sequence repeats. Trends Plant Sci. 1, 215–222

4 Gupta, P.K. and Varshney, R.K. (2000) The development and use of microsatellite markers for genetic analysis and plant breeding with emphasis on bread wheat. Euphytica 113, 163–185

5 Rudd, S. (2003) Expressed sequence tags: alternative or complement to whole genome sequences? Trends Plant Sci. 8, 321–329

6 Kikuchi, S. et al. (2003) Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science 301, 376–379

7 Morgante, M. and Olivieri, A.M. (1993) PCR-amplified microsatellites as markers in plant genetics. Plant J. 3, 175–182

8 Wang, Z. et al. (1994) Survey of plant short tandem DNA repeats. Theor. Appl. Genet. 88, 1–6

9 Scott, K.D. et al. (2000) Analysis of SSRs derived from grape ESTs. Theor. Appl. Genet. 100, 723–726

10 Temnykh, S. et al. (2000) Mapping and genome organization of microsatellite sequences in rice (Oryza sativa L.). Theor. Appl. Genet. 100, 697–712

11 Thiel, T. et al. (2003) Exploiting EST databases for the development of cDNA derived microsatellite markers in barley (Hordeum vulgare L.). Theor. Appl. Genet. 106, 411–422

12 Kota, R. et al. (2001) Generation and comparison of EST-derived SSRs and SNPs in barley (Hordeum vulgare L.). Hereditas 135, 145–151

13 Varshney, R.K. et al. (2002) In silico analysis on frequency and distribution of microsatellites in ESTs of some cereal species. Cell. Mol. Biol. Lett. 7, 537–546

14 Yu, J.K. et al. (2004) Development and mapping of EST-derived simple sequence repeat (SSR) markers for hexaploid wheat. Genome 47, 805–818

15 Khlestkina, E. et al. (2004) Mapping of 99 new microsatellite-derived loci in rye (Secale cereale L.) including 39 expressed sequence tags. Theor. Appl. Genet. 109, 725–732

16 Morgante, M. et al. (2002) Microsatellites are preferentially present with non-repetitive DNA in plant genomes. Nat. Genet. 30, 194–200

17 Saha, S. et al. (2003) Simple sequence repeats as useful resources to study transcribed genes of cotton. Euphytica 130, 355–364

18 Han, Z.-G. et al. (2004) Genetic mapping of EST-derived microsatellites from the diploid Gossypium arboreum in allotetraploid cotton. Mol. Gen. Genom. 272, 308–327

19 Saha, M.C. et al. (2004) Tall fescue EST-SSR markers with transferability across several grass species. Theor. Appl. Genet. 109, 783–791

20 Eujayl, I. et al. (2004) Medicago truncatula EST-SSRs reveal cross-species genetic markers for Medicago spp. Theor. Appl. Genet. 108, 414–422

21 Gao, L.F. et al. (2003) Analysis of microsatellites in major crops assessed by computational and experimental approaches. Mol. Breed. 12, 245–261

22 Cordeiro, G.M. et al. (2001) Microsatellite markers from sugarcane (Saccharum spp.) ESTs cross transferable to erianthus and sorghum. Plant Sci. 160, 1115–1123

23 Rungis, D. et al. (2004) Robust simple sequence repeat markers for spruce (Picea spp.) from expressed sequence tags. Theor. Appl. Genet. 109, 1283–1294

24 Kantety, R.V. et al. (2002) Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat. Plant Mol. Biol. 48, 501–510


25 Hackauf, B. and Wehling, P. (2002) Identification of microsatellite polymorphisms in an expressed portion of the rye genome. Plant Breed. 121, 17–25

26 Gupta, P.K. et al. (2003) Transferable EST-SSR markers for the study of polymorphism and genetic diversity in bread wheat. Mol. Genet. Genomics 270, 315–323

27 Nicot, N. et al. (2004) Study of simple sequence repeat (SSR) markers from wheat expressed sequence tags (ESTs). Theor. Appl. Genet. 109, 800–805

28 Metzgar, D. et al. (2000) Selection against frameshift mutations limits microsatellite expansion in coding DNA. Genome Res. 10, 72–80

29 Katti, M.V. et al. (2001) Differential distribution of simple sequence repeats in eukaryotic genome sequences. Mol. Biol. Evol. 18, 1161–1167

30 Sreenivasulu, N. et al. (2002) Mining functional information from cereal genomes – the utility of expressed sequence tags. Curr. Sci. 83, 965–973

31 Varshney, R.K. et al. (2004) A simple hybridization-based strategy for the generation of non-redundant EST collections – a case study in barley (Hordeum vulgare L.). Plant Sci. 167, 629–634

32 Stephenson, P. et al. (1998) Fifty new microsatellite loci for the wheat genetic map. Theor. Appl. Genet. 97, 946–949

33 Ramsay, L. et al. (2000) A simple sequence repeat-based linkage map of barley. Genetics 156, 1997–2005

34 Cho, Y.G. et al. (2000) Diversity of microsatellites derived from genomic libraries and GenBank sequences in rice (Oryza sativa L.). Theor. Appl. Genet. 100, 713–722

35 Eujayl, I. et al. (2001) Assessment of genotypic variation among cultivated durum wheat based on EST-SSRs and genomic SSRs. Euphytica 119, 39–43

36 Fraser, L.G. et al. (2004) EST-derived microsatellites from Actinidia species and their potential for mapping. Theor. Appl. Genet. 108, 1010–1016

37 Callen, D. et al. (1993) Incidence and origin of null alleles in the (AC)n microsatellite markers. Am. J. Hum. Genet. 52, 922–927

38 Lehman, T. et al. (1996) An evaluation of evolutionary constraints on microsatellite loci using null alleles. Genetics 144, 1155–1163

39 Chabane, K. et al. (2005) EST versus genomic derived microsatellite markers for genotyping wild and cultivated barley. Genet. Resour. Crop Evol. (in press)

40 Russell, J. et al. (2004) A comparison of sequence-based polymorphism and haplotype content in transcribed and anonymous regions of the barley genome. Genome 47, 389–398

41 Holton, T.A. et al. (2002) Identification and mapping of polymorphic SSR markers from expressed gene sequences of barley and wheat. Mol. Breed. 9, 63–71

42 Anderson, J.R. and Lubberstedt, T. (2003) Functional markers in plants. Trends Plant Sci. 8, 554–560

43 Gao, L.F. et al. (2004) One hundred and one new microsatellite loci derived from ESTs (EST-SSRs) in bread wheat. Theor. Appl. Genet. 108, 1392–1400

44 Sorrells, M.E. and Wilson, W.A. (1997) Direct classification and selection of superior alleles for crop improvement. Crop Sci. 37, 691–697

45 Papi, M. et al. (2000) Identification and disruption of an Arabidopsis zinc finger gene controlling seed germination. Genes Dev. 14, 28–33

46 Yu, J.K. et al. (2004) EST-derived SSR markers for comparative mapping in wheat and rice. Mol. Genet. Genomics 271, 742–751

47 Varshney, R.K. et al. (2005) Interspecific transferability and comparative mapping of barley EST-SSR markers in wheat, rye and rice. Plant Sci. 168, 195–202

48 Schulman, A. et al. (2004) Organization of retrotransposons and microsatellites in cereal genomes. In Cereal Genomics (Gupta, P.K. and Varshney, R.K. eds), pp. 83–118, Kluwer Academic Publishers

49 Li, Y.C. et al. (2004) Microsatellites within genes: structure, function, and evolution. Mol. Biol. Evol. 21, 991–1007

50 Mohammadi, S.A. and Prasanna, B.M. (2003) Analysis of genetic diversity in crop plants – salient statistical tools and considerations. Crop Sci. 43, 1235–1248

51 Ayers, N.M. et al. (1997) Microsatellites and a single nucleotide polymorphism differentiate apparent amylose classes in an extended pedigree of US rice germplasm. Theor. Appl. Genet. 94, 773–781


52 Cummings, C.J. and Zoghbi, H.Y. (2000) Trinucleotide repeats: mechanisms and pathophysiology. Annu. Rev. Genomics Hum. Genet. 1, 281–328

53 Fujimori, S. et al. (2003) A novel feature of microsatellites in plants: a distribution gradient along the direction of transcription. FEBS Lett. 554, 17–22

54 Bao, S. et al. (2002) Microsatellites in starch-synthesizing genes in relation to starch physicochemical properties in waxy rice (Oryza sativa L.). Theor. Appl. Genet. 105, 898–905

55 Dresselhaus, T. et al. (1999) Novel ribosomal genes from maize are differentially expressed in the zygotic and somatic cell cycles. Mol. Gen. Genet. 261, 416–427

56 Pillen, K. et al. (2000) Mapping new EMBL-derived barley microsatellites and their use in differentiating German barley cultivars. Theor. Appl. Genet. 101, 652–660

57 Saha, M.C. et al. (2004) A high-density linkage map of tall fescue based on SSR and AFLP markers. Theor. Appl. Genet. (in press)

58 Liewlaksaneeyanawin, C. et al. (2004) Single-copy, species-transferable microsatellite markers developed from loblolly pine ESTs. Theor. Appl. Genet. 109, 361–369

59 Benson, G. (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580

60 Castelo, A.T. et al. (2002) TROLL – Tandem Repeat Occurrence Locator. Bioinformatics 18, 634–636

61 Graham, J. et al. (2004) The construction of a genetic linkage map of red raspberry (Rubus idaeus subsp. idaeus) based on AFLPs, genomic-SSR and EST-SSR markers. Theor. Appl. Genet. 109, 740–749


62 Warnke, S.E. et al. (2004) Genetic linkage mapping of an annual × perennial ryegrass population. Theor. Appl. Genet. 109, 294–304

63 Barrett, B. et al. (2004) A microsatellite map of white clover. Theor. Appl. Genet. 109, 596–608

64 Woodhead, M. et al. (2003) Development of EST-SSRs from the alpine lady-fern, Athyrium distentifolium. Mol. Ecol. Notes 3, 287–290

65 Bhat, P. et al. (2004) Identification and characterization of gene (EST)-derived SSR markers from robusta coffee variety ‘CxR’ (an interspecific hybrid of Coffea canephora × C. congensis). Mol. Ecol. Notes DOI:10.1111/j-1471-8286.2004.00839

66 Dreisigacker, S. et al. (2003) SSR and pedigree analyses of genetic diversity among CIMMYT wheat lines targeted to different mega-environments. Crop Sci. 44, 381–388

67 Leigh, F. et al. (2003) Assessment of EST- and genomic microsatellite markers for variety discrimination and genetic diversity studies in wheat. Euphytica 133, 359–366

68 Bandopadhyay, R. et al. (2004) DNA polymorphism among 18 species of Triticum-Aegilops complex using wheat EST-SSRs. Plant Sci. 166, 349–356

69 Decroocq, V. et al. (2003) Development and transferability of apricot and grape EST microsatellite markers across taxa. Theor. Appl. Genet. 106, 912–922

70 Arnold, C. et al. (2002) The application of SSRs characterized for grape (Vitis vinifera) to conservation studies in Vitaceae. Am. J. Bot. 89, 22–28

71 Rossetto, M. et al. (2002) Evaluating the potential of SSR: flanking regions examining taxonomic relationships in the Vitaceae. Theor. Appl. Genet. 104, 61–66
