Cytoplasmic diversity of the cotton genus as revealed by chloroplast microsatellite markers

RESEARCH ARTICLE


Pengbo Li • Zhaohu Li • Huimin Liu • Jinping Hua

Received: 11 November 2012 / Accepted: 19 June 2013

© Springer Science+Business Media Dordrecht 2013

Abstract The diversity of chloroplast genomes, like that of nuclear and mitochondrial genomes, has played an important role in the evolution of plants. Chloroplast genome sequences supply irreplaceable information for genome analysis. To understand the genetic differentiation and relationships of cotton species, we investigated the cytoplasmic diversity of chloroplast genomes in 41 Gossypium accessions with 75 chloroplast simple sequence repeat (cpSSR) markers. The markers were developed from reference sequences of the chloroplast genomes of G. hirsutum and G. barbadense and covered approximately 12.6 kb. Among the 75 markers, 50 were polymorphic, with polymorphism information content values ranging from 0.11 to 0.88. Analyses of the dataset demonstrated that single copy regions were much more informative than inverted repeat regions. The non-coding sequences were well differentiated among these species. From the cpDNA haplotypes common to each genome group, we deduced that the E-genome species may be the oldest of the extant cotton species. The differentiation of A-genome species lagged behind that of AD-genome species. Neither G. herbaceum nor G. arboreum was the cytoplasmic donor of tetraploid species, strongly suggesting that AD genomes originated from an extinct ancestor of modern A-genome species. We speculate that the genetic differentiation of the chloroplast genome of each cotton species resulted from the dispersal of that species and its adaptation to local ecological conditions. These cpSSR markers provided valuable information on the diversity and differentiation of cotton during evolution.

Keywords Chloroplast · cpSSR · Genetic differentiation · Gossypium

P. Li · J. Hua
Key Laboratory of Crop Heterosis and Utilization of Ministry of Education, Beijing Key Laboratory of Crop Genetic Improvement, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, People’s Republic of China

P. Li · H. Liu
Key Laboratory of Crop Gene Resources and Germplasm Enhancement on Loess Plateau of Ministry of Agriculture, Institute of Cotton Research, Shanxi Academy of Agricultural Sciences, Yuncheng 044000, Shanxi, People’s Republic of China

Z. Li · J. Hua (✉)
College of Agronomy and Biotechnology, China Agricultural University, No 2, Yuanmingyuan West Road, Beijing 100193, People’s Republic of China
e-mail: [email protected]

Genet Resour Crop Evol
DOI 10.1007/s10722-013-0018-9

Introduction

Cotton (Gossypium spp.) comprises approximately 50 species (Fryxell 1992) distributed widely throughout the tropical and subtropical regions of the world. Gossypium species have been classified into eight diploid genome groups and one tetraploid genome group, based on morphological traits, cytogenetic pairing, and fertility of interspecific hybrids (Endrizzi et al. 1985; Wendel and Albert 1992). Four species are cultivated, the main one being upland cotton (G. hirsutum L.), which accounts for more than 90 % of cotton fiber output. Sea island cotton (G. barbadense L.) produces long-staple fibers and accounts for approximately 8 % of cotton production. The other two cultivated species, the diploids G. herbaceum L. and G. arboreum L., as well as selected wild relatives, are potential gene pools for crop improvement, as they harbor genes for high-quality fiber, high yield, and resistance to pests, pathogens, and adverse conditions. They even serve as sources of cytoplasmic male sterility (CMS) (Maqbool et al. 2008; Meyer 1975; Ulloa et al. 2005).

Traditionally, the systematics of cotton was based on morphological and cytogenetic characters, as well as on biogeography (Fryxell 1992). Molecular biology techniques have made it possible to analyze and exploit phylogenetic relationships and interspecific diversity. Various molecular datasets have been collected from Gossypium. Such studies have examined the divergence of chloroplast DNA (cpDNA) restriction-site variations (Wendel and Albert 1992), and the diversity of the 5S ribosomal gene and intergenic DNA sequences (Cronn et al. 1996), chloroplast genes (Cronn et al. 2002; Alvarez et al. 2005), nuclear genes (Seelanan et al. 1999; Small and Wendel 2000), nuclear-gene introns (Liu et al. 2001), and nuclear microsatellites (Wu et al. 2007). Most of these studies concluded that molecular systematics was complementary to, and largely congruent with, existing genome designations and geographical distributions. However, there are still some ambiguities regarding which genomes are more ancestral among extant cotton species and how these species dispersed to other regions.

The chloroplast is essential for various biological functions, such as photosynthesis, lipid metabolism, and starch and amino acid biosynthesis (Armbruster et al. 2011). Its conserved genome is descended from an ancient cyanobacterial endosymbiosis event (Leister 2003); however, some genes were transferred from the chloroplast to the host nuclear genome and the mitochondrial genome during the course of evolution (Timmis et al. 2004). Compared with the nuclear genome, the chloroplast genome shows a lower substitution rate, making it highly conserved (Wolfe et al. 1987; Clegg et al. 1994). Consequently, complementary information from chloroplast genomes has been used to study the origin, evolution, and diversity of various plants, including Amborella (Goremykin et al. 2003), Malus (Harris et al. 2002), Gnetales (Won and Renner 2005), Liriodendron (Yang et al. 2011), and Brassica (Zamani-Nour et al. 2013).

A wide range of chloroplast molecular markers, such as universal primers, PCR–RFLP (polymerase chain reaction–restriction fragment length polymorphism), cpSSR (chloroplast simple sequence repeat), and InDel (insertion/deletion) markers, have been used to reveal phylogenetic relationships among plant species (Taberlet et al. 1991; Ibrahim et al. 2007; Provan et al. 2001; Kelchner 2000). Chloroplast microsatellites are distributed randomly throughout the chloroplast genome (Provan et al. 1999c, 2001). cpSSR analysis is a high-resolution, specific, PCR-based assay for polymorphism. Since the first report of cpSSR analysis in Pinus (Powell et al. 1995), this technique has facilitated analyses of population differentiation and gene flow in a number of crops, including soybean (Powell et al. 1996), rice (Provan et al. 1997), maize (Provan et al. 1999a), barley (Provan et al. 1999b), wheat (Ishii et al. 2001), sunflower (Wills et al. 2005), sorghum (Li et al. 2010), oat (Li et al. 2009), and grapevine (Salmaso et al. 2010).

Because the chloroplast genome has a low mutation rate and does not recombine, its phylogeny can be reconstructed independently of that of the nuclear genome (Martin et al. 2005). The chloroplast genome is highly conserved, whereas the nuclear genome has recombined continuously during the evolution and speciation of cotton (Wendel and Cronn 2003). Restriction site mutations in cpDNA confirmed that the chloroplast genome of Gossypium is inherited through the female parent (Wendel 1989). The inheritance of chloroplast genomes therefore reflects patterns of seed flow and dispersal from progenitors to descendants. To uncover the differentiation and relationships of Gossypium chloroplast genomes, cpSSRs were identified from the complete chloroplast genomes of G. hirsutum (Lee et al. 2006) and G. barbadense (Ibrahim et al. 2006). We then developed 75 cpSSR markers from the identified SSR loci to investigate the diversity of chloroplast variation among 41 cotton germplasm accessions.



Materials and methods

Plant materials

We evaluated 41 germplasm accessions of Gossypium belonging to 28 species, six synthetic heterozygote types, and one unknown genotype (Table 1). The six synthetic heterozygote genotypes were hybrids derived from two G. hirsutum L. varieties (Coker201 and Sm3) as the male parents and three female parents: a G. harknessii Brandegee CMS line (Meyer 1975), G. hirsutum var. latifolium Hutchinson, and G. barbadense L. Each heterozygote was backcrossed with the male parent for at least 22 generations. The wild and semi-wild germplasm types of cotton were collected from the National Experiment Station of Cotton Wild Germplasms, Sanya, Hainan, China. The cultivated and synthetic germplasms were obtained from the experimental farm at China Agricultural University, Beijing. To verify the polymorphisms of cpSSR markers, we selected six Gossypium accessions (G. hirsutum ‘Sm3’, G. barbadense ‘H7124’, G. harknessii, G. anomalum Wawra et Peyritsch, G. longicalyx Hutchinson et Lee, and G. somalense (Gurke) Hutchinson) and two synthetic heterozygotes (G. harknessii × G. hirsutum ‘Coker201’ and G. harknessii × G. hirsutum ‘Sm3’). Hibiscus syriacus L., a related species of Malvaceae, was used to root the cluster trees.

Chloroplast SSR extraction and primer design

Sequences of the chloroplast genomes of G. hirsutum (DQ345959) and G. barbadense (AP009123) were downloaded from GenBank. The cpSSR loci were identified by searches conducted with SSR Extractor (http://www.aridolan.com/ssr/ssr.aspx?Header11:MenuLink1=4). The labile length of a motif was considered to be eight or more nucleotides (nt) (Rose and Falush 1998; Raubeson et al. 2007). In this study, eight nt was the threshold used to search for mononucleotide repeats in the inverted repeat (IR) region, and nine nt was the threshold for the small single copy (SSC) and large single copy (LSC) regions. A threshold of 10 nt was used to search for dinucleotide repeats. All cpSSR primers were designed using Primer 3 software (http://frodo.wi.mit.edu/primer3/input.htm). The optimal primer length was 20 nt and the annealing temperature (Tm) was 55 °C. The product size was set at 100–300 base pairs (bp) as a general range. The markers were named GCS followed by a number, the prefix being the acronym for Gossypium Chloroplast SSR.
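The repeat search described above can be approximated in a few lines of Python. This is a minimal sketch and not the SSR Extractor tool itself: the function name find_cpssrs, its parameters, and the example sequence are illustrative assumptions. Only the length thresholds (8 or 9 nt for mononucleotide repeats, 10 nt for dinucleotide repeats) come from the text.

```python
import re


def find_cpssrs(seq, mono_min=9, di_min=10):
    """Return sorted (start, motif, length) tuples for simple repeats.

    mono_min: minimum length (nt) of a mononucleotide run
              (the text uses 8 for the IR region and 9 for LSC/SSC).
    di_min:   minimum total length (nt) of a dinucleotide repeat.
    """
    seq = seq.upper()
    hits = []
    # Mononucleotide runs such as A10 or T9.
    for m in re.finditer(r"([ACGT])\1{%d,}" % (mono_min - 1), seq):
        hits.append((m.start(), m.group(1), len(m.group(0))))
    # Dinucleotide repeats such as (AT)5; the two bases of the unit must differ.
    units = -(-di_min // 2)  # ceiling division: minimum number of 2-nt units
    for m in re.finditer(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (units - 1), seq):
        hits.append((m.start(), m.group(1), len(m.group(0))))
    return sorted(hits)


# Hypothetical test sequence, not taken from a cotton chloroplast genome.
example = "GC" + "A" * 10 + "CG" + "AT" * 5 + "GGC" + "T" * 8
print(find_cpssrs(example))              # single-copy thresholds
print(find_cpssrs(example, mono_min=8))  # IR threshold also reports the T8 run
```

In practice the function would be called once per region, with the region boundaries taken from the genome annotation; those coordinates are not given in this section.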

DNA extraction and PCR analysis

DNA was extracted from fresh leaves of an individual plant of each sample type. Total DNA was extracted using the CTAB method as described by Song et al. (1998). The extraction buffer contained 2 mol l⁻¹ NaCl, 0.1 mol l⁻¹ Tris–HCl, 25 mmol l⁻¹ EDTA-Na2, 2 % (w/v) CTAB, 2 % (w/v) polyvinylpyrrolidone 40, and 2 % (w/v) β-mercaptoethanol, pH 8.0. The DNA was precipitated with cold ethanol and dissolved in TE buffer (0.01 M Tris–HCl, 0.001 M EDTA-Na2, pH 8.0). The quality of isolated DNA was checked by electrophoresis on a 0.8 % (w/v) agarose gel and by spectrophotometry at 260/280 nm. DNA solutions were stored at −20 °C. Polymerase chain reactions were carried out in a reaction volume of 15 μl containing 20 ng DNA, 1.5 μl Taq Platinum DNA polymerase reaction buffer, 2.0 μl MgCl2 (15 mM), 0.3 μl dNTP (10 mM), 2 μl each of forward and reverse primer (2 μM), and 1 U Taq Platinum DNA polymerase (Tiangen, ET104). The PCR conditions were as follows: 4 min at 94 °C, followed by 30 cycles of 30 s at 94 °C, 30 s at 50–55 °C (based on the Tm of the primers), and 60 s at 72 °C, with a final extension of 5 min at 72 °C. The PCR products were separated by electrophoresis on 6 % polyacrylamide gels containing 7 M urea and stained with silver (Wu et al. 1999). The molecular weight of each band was estimated based on the PCR products amplified from G. hirsutum and G. barbadense. The bands were scored as 1 (present) or 0 (absent) at each migrating position, and these values were used to construct a binary matrix.
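The presence/absence scoring described above can be sketched as follows. This is an illustrative assumption about the data layout, not the authors' procedure: the function name binary_matrix, the accession and marker labels, and the band sizes are all hypothetical.

```python
def binary_matrix(band_sizes):
    """Build a 0/1 matrix from scored bands.

    band_sizes maps accession -> {marker: set of band sizes (bp)}.
    Each (marker, size) pair seen in any accession becomes one column.
    """
    columns = sorted({(marker, size)
                      for markers in band_sizes.values()
                      for marker, sizes in markers.items()
                      for size in sizes})
    rows = {acc: [1 if size in markers.get(marker, ()) else 0
                  for marker, size in columns]
            for acc, markers in band_sizes.items()}
    return columns, rows


# Hypothetical scores for three accessions and two markers.
scores = {
    "acc1": {"M1": {191}, "M2": {156}},
    "acc2": {"M1": {190}, "M2": {156}},
    "acc3": {"M1": {191, 189}, "M2": {155}},  # two bands at M1
}
columns, rows = binary_matrix(scores)
for acc in sorted(rows):
    print(acc, rows[acc])
```

Keeping one column per migrating position means that an accession with two bands at one marker simply carries two 1s for that marker.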

Data analysis

Polymorphism information content (PIC) provides an

estimate of the discriminatory power of an SSR locus

by taking into account both the number of its alleles

and their relative frequencies in the target groups. If

each sample is homozygous, we can calculate the PIC

value according to the following formula:

$$\mathrm{PIC}_i = 1 - \sum_{j=1}^{n} P_{ij}^{2}$$

where $P_{ij}$ is the frequency of the jth allele for marker i, summed across n alleles (Anderson et al. 1993).
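As a small worked illustration of the PIC formula, the following sketch computes the value for one marker from a list of allele calls, one call per accession. The function name pic and the example allele sizes are assumptions made for illustration; they are not data from this study.

```python
from collections import Counter


def pic(alleles):
    """PIC = 1 - sum of squared allele frequencies for one marker."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical allele sizes (bp) scored in eight accessions.
calls = [191, 191, 190, 190, 189, 189, 188, 188]
print(round(pic(calls), 2))  # four equally frequent alleles
```

A marker with a single allele gives a PIC of 0, and the value approaches 1 as the number of equally frequent alleles grows.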



Table 1 Characterization of 41 cotton germplasm accessions used in this study

No. Taxon Genome Type Geographic origin

1 G. herbaceum L. subsp. africanum (Watt) Vollesen A1 Wild South Africa

2 G. herbaceum L. ‘Hongxing’ A1 Cultivated Asia and Africa

3 G. arboreum L. ‘Shixiya1’ A2 Cultivated Asia and Africa

4 G. anomalum Wawra et Peyritsch B1 Wild Africa

5 G. capitis-viridis Mauer B3 Wild Cape Verde Islands

6 G. sturtianum Willis C1 Wild Australia

7 G. thurberi Todaro D1 Wild America and Mexico

8 G. armourianum Kearney D2-1 Wild Mexico

9 G. harknessii Brandegee D2-2 Wild Mexico

10 G. davidsonii Kellogg D3-d Wild Mexico

11 G. klotzschianum Andersson D3-k Wild Mexico

12 G. aridum (Rose et Standley) Skovsted D4 Wild Galapagos Islands

13 G. raimondii Ulbrich D5 Wild Peru

14 G. gossypioides (Ulbrich) Standley D6 Wild Mexico

15 G. lobatum Gentry D7 Wild Mexico

16 G. trilobum (DC.) Skovsted D8 Wild Mexico

17 G. stocksii Masters E1 Wild Arabia

18 G. somalense (Gurke) Hutchinson E2 Wild North Africa

19 G. areysianum Deflers E3 Wild South Yemen

20 G. incanum (Schwartz) Hillcoat E4 Wild Yemen

21 G. longicalyx Hutchinson et Lee F1 Wild Africa

22 G. bickii Prokhanov G1 Wild Australia

23 G. nelsonii Fryxell G Wild Australia

24 G. australe F. von Mueller G Wild Australia

25 G. hirsutum L. ‘Coker201’ (AD)1 Cultivated Central America

26 G. hirsutum L. ‘Sm3’ (AD)1 Cultivated Central America

27 G. hirsutum var. richmondii Hutchinson (AD)1 Semi-wild Mexico

28 G. hirsutum var. palmeri Hutchinson (AD)1 Semi-wild Mexico

29 G. hirsutum var. latifolium Hutchinson (AD)1 Semi-wild Mexico

30 G. barbadense L. ‘H7124’ (AD)2 Cultivated South America

31 G. barbadense L. ‘Lihe’ (AD)2 Semi-wild Yunnan, China^a

32 G. tomentosum Nuttall ex Seemann (AD)3 Wild Hawaii Islands

33 G. mustelinum Miers ex Watt (AD)4 Wild Brazil

34 G. darwinii Watt (AD)5 Wild Galapagos Islands

35 G. sp. ‘NG1’ – Wild Hainan, China^a

36 G. harknessii × hirsutum ‘Coker201’ – Synthetic heterozygote –

37 G. harknessii × hirsutum ‘Sm3’ – Synthetic heterozygote –

38 G. hirsutum var. latifolium × hirsutum ‘Coker201’ – Synthetic heterozygote –

39 G. hirsutum var. latifolium × hirsutum ‘Sm3’ – Synthetic heterozygote –

40 G. barbadense × hirsutum ‘Coker201’ – Synthetic heterozygote –

41 G. barbadense × hirsutum ‘Sm3’ – Synthetic heterozygote –

^a Location of collection



The binary matrix was used to compute the genetic distance (Nei and Li 1979) between each pair of Gossypium germplasm accessions using TREECON 1.3b (Van de Peer and De Wachter 1994), and a rooted tree was generated using the neighbor-joining (NJ) clustering algorithm (Saitou and Nei 1987). The reliability of clusters in the dendrogram was tested by bootstrap analysis (Felsenstein 1985) with 1,000 replications.
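The pairwise distance step can be illustrated with the band-sharing form that is commonly attributed to Nei and Li (1979), D = 1 − 2Nxy/(Nx + Ny), where Nxy is the number of bands shared by two accessions and Nx and Ny are their band counts. The sketch below assumes this form; the exact computation inside TREECON may differ, and tree building and bootstrapping are not shown. The function names and the example rows are hypothetical.

```python
def nei_li_distance(x, y):
    """Band-sharing distance 1 - 2*Nxy / (Nx + Ny) for two 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    shared = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("at least one band is required")
    return 1.0 - 2.0 * shared / total


def distance_matrix(rows):
    """Return {(name_a, name_b): distance} for every ordered pair."""
    names = sorted(rows)
    return {(a, b): nei_li_distance(rows[a], rows[b])
            for a in names for b in names}


# Hypothetical binary rows (columns are marker/band positions).
rows = {
    "acc1": [0, 0, 1, 0, 1],
    "acc2": [0, 1, 0, 0, 1],
    "acc3": [1, 0, 1, 1, 0],
}
dist = distance_matrix(rows)
print(dist[("acc1", "acc2")], dist[("acc1", "acc3")])
```

A distance of 0 means two accessions share every scored band, which is how identical chloroplast haplotypes appear in the matrix.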

Results

Characterization of Gossypium cpSSRs

The cpSSR motifs were distributed randomly across the single-copy regions (both LSC and SSC) of the Gossypium chloroplast genome, whereas they were rare in the IR region. We obtained 100 mononucleotide and 16 dinucleotide cpSSRs, with lengths of 8–16 and 10–14 nt, respectively. The nucleotide combinations A/T and AT/TA were the most common motifs, accounting for 93.3 % of the total. This was expected because of the high A/T content of the chloroplast genome, especially in the non-coding sequences (Lee et al. 2006; Xu et al. 2012). We developed 75 cpSSR markers from these motifs, and all of them were successfully amplified from each of the eight verification Gossypium accessions. These GCS markers amplified a total of 12.6 kb of DNA sequence in G. hirsutum, accounting for approximately 7.9 % of the upland cotton chloroplast genome. Ten GCS markers were located within coding sequences, and the other markers were located in intergenic regions or introns. Most of the products were single bands (Fig. 1). However, some primers produced two bands, e.g., GCS32 in the three D-genome species G. klotzschianum Andersson, G. raimondii Ulbrich, and G. lobatum Gentry, and GCS71 and GCS77 in the E-genome species G. somalense. The presence of dual bands suggests that some altered sites or duplications exist within the haploid chloroplast genomes of certain cotton species.

Polymorphic profile revealed by cpSSRs

Of our 75 GCS markers, 50 were polymorphic when screened on the eight verification Gossypium accessions (Table 2). Forty-seven polymorphic markers were located in the SC regions and three were located in the IR region. The average number of polymorphic sites per kb of sequence was 0.42 in the LSC region and 0.49 in the SSC region, 3.5- and 4.1-fold that in the IR region (0.12), demonstrating that the rate of sequence divergence in the SC regions was higher than that in the IR region. The 50 cpSSR markers revealed 249 alleles in the 41 cotton accessions, with the number of polymorphic alleles ranging from 2 to 9 and a mean of 4.98. One exceptional marker, GCS80 (located in the intergenic region between ndhF and ycf1), identified 16 polymorphic alleles. Of the 75 markers, 10 were located in the coding sequences of the genes matK, rpoC2, rpoC1, ycf1 and rpoA. However, these markers were not polymorphic among the eight cotton accessions.

The PIC values of cpSSR markers differed by region and ranged from 0.11 to 0.88 with an average of 0.60. GCS74 and GCS75 were both located in an ndhA intron; their PIC values were 0.65 and 0.21, respectively. Nine cpSSR markers, all with five or more alleles, had PIC values higher than 0.75 (Table 3). These high-PIC sites were located in the intergenic regions ndhF-ycf1, psaJ-rpl33, trnL-rpl32, psaA-ycf3, rpoB-trnC, atpH-atpI, ndhD-ccsA, and ycf4-cemA and in an intron of rps16. Six of the highly polymorphic sequences were in the LSC region, and three were in the SSC region.

Fig. 1 Polymorphism of cpSSR markers GCS10 and GCS11 in some Gossypium accessions. Samples 1–8 are G. hirsutum ‘Sm3’, G. barbadense ‘H7124’, G. harknessii, G. harknessii × G. hirsutum ‘Coker201’, G. harknessii × G. hirsutum ‘Sm3’, G. anomalum, G. longicalyx and G. somalense, respectively
[Gel image not reproduced. Panel labels: GCS10, GCS11; lanes 1–8 and M; band sizes marked at 191, 156 and 186 bp.]

Table 2 Polymorphism of cpSSR markers in different regions

Region  Length in G. hirsutum (bp)  Number of markers  Polymorphic markers  Polymorphic rate (%)  Polymorphic sites/kb
LSC     88,816                      51                 37                   72.55                 0.42
IR      25,608                      8                  3                    37.50                 0.12
SSC     20,269                      16                 10                   62.50                 0.49
Total   134,693                     75                 50                   66.67                 0.37
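The derived columns of Table 2 follow directly from the region lengths and marker counts. The sketch below recomputes them as a check; the function name summarize and the dictionary layout are illustrative assumptions, while the numbers are those given in Table 2.

```python
# Region -> (length in bp, number of markers, polymorphic markers), from Table 2.
regions = {
    "LSC": (88816, 51, 37),
    "IR": (25608, 8, 3),
    "SSC": (20269, 16, 10),
}


def summarize(length_bp, n_markers, n_poly):
    """Return (polymorphic rate in %, polymorphic sites per kb), rounded to 2 dp."""
    rate = 100.0 * n_poly / n_markers
    per_kb = n_poly / (length_bp / 1000.0)
    return round(rate, 2), round(per_kb, 2)


for name, values in regions.items():
    print(name, summarize(*values))

# Column sums give the "Total" row.
total = tuple(sum(v[i] for v in regions.values()) for i in range(3))
print("Total", total, summarize(*total))
```

Dividing the single-copy values by the IR value (0.42/0.12 and 0.49/0.12) gives the 3.5- and 4.1-fold differences quoted in the text.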

Distinctive haplotypes in cotton accessions

We identified 25 distinctive cpDNA haplotypes, each present in only a single accession. These haplotypes were distributed among 14 cotton accessions (Table 4) and were found at high frequencies in American D-genome and Australian G-genome species. For example, G. thurberi Todaro, G. lobatum, and G. nelsonii Fryxell each harbored three unique alleles. In contrast, no unique haplotypes were detected among the A- and F-genomes. GCS80, located in the intergenic region between ndhF and ycf1, with allele sizes varying from 100 to 245 bp, harbored up to seven distinctive haplotypes distributed in three D-genome species, two G-genome species, one E-genome species, and one AD-genome species (Table 4). This might be due to insertions/deletions in the sequences flanking the cpSSR locus and at the boundary between the IR region and the SC region. The common cpDNA haplotypes of each genome type were defined as those present only in that genome category. Sixteen such haplotypes were identified in six genome types (Table 5). The B- and E-genome species each possessed five common haplotypes, C-genome species harbored three, and the A-, AD-, and G-genome species each had one.
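Haplotypes that occur in only one accession, as described at the start of this subsection, can be picked out of a genotype table with a simple count of carriers. The following is a minimal sketch under assumed inputs: the function name private_alleles, the accession and marker labels, and the allele sizes are hypothetical and are not the data behind Table 4.

```python
from collections import defaultdict


def private_alleles(genotypes):
    """Return {accession: [(marker, allele), ...]} for alleles seen in one accession only.

    genotypes maps accession -> {marker: allele size (bp)}.
    """
    carriers = defaultdict(set)
    for acc, calls in genotypes.items():
        for marker, allele in calls.items():
            carriers[(marker, allele)].add(acc)
    result = defaultdict(list)
    for (marker, allele), accs in sorted(carriers.items()):
        if len(accs) == 1:
            result[next(iter(accs))].append((marker, allele))
    return dict(result)


# Hypothetical genotypes for three accessions at two markers.
genotypes = {
    "accA": {"M1": 100, "M2": 150},
    "accB": {"M1": 100, "M2": 152},
    "accC": {"M1": 104, "M2": 150},
}
print(private_alleles(genotypes))
```

The same counting approach could be applied per genome group rather than per accession to find haplotypes restricted to one genome type.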

Relationship between number of cpSSR loci used

for analyses and accuracy

Can relatively few markers correctly reflect the differentiation and evolution of cotton species? To answer this question, we extracted two subsets of high-PIC cpSSR markers from our larger dataset and constructed cluster diagrams containing 28 Gossypium species. Compared with traditional systematics, the dendrogram constructed using only the nine markers with PIC values higher than 0.75 (see Table 3) failed to recover most of the commonly accepted clusters of cotton species (Fig. 2a). The D4-genome species Gossypium aridum (Rose et Standley) Skovsted was placed in the basal position, isolated from the other D-genome species. Members of the D-genome were separated by E-genome and F-genome species, and G-genome species were divided into two clusters. G. barbadense was closer to A-genome species than to the other four allotetraploid species in this cluster tree. Consequently, the cluster analysis derived from only nine markers poorly matched the current classification. To reduce distortion, we repeated the analysis with a slightly larger sample of 16 markers with PIC values higher than 0.70 (Table 3). This second dendrogram (Fig. 2b) grouped closely related species together, and the relationships among the genomes were more consistent with the current classification. However, both of these limited datasets were inadequate, and so we continued our analyses using all polymorphic markers.

Cytoplasmic diversity and clustering of Gossypium

The genetic distance among the 41 cotton accessions, based on pairwise comparisons of cpSSR marker alleles, ranged from 0.00 to 0.88. We generated an NJ cluster dendrogram based on genetic distance (Fig. 3). The cotton accessions were grouped into six clusters: A + AD, F, D, B, C + G, and E. Cluster A + AD included A-genome species and all allotetraploid accessions. Two A-genome species and G. herbaceum L. subsp. africanum (Watt) Vollesen grouped together with a robust bootstrap value. The genetic distances among A-genome species were the smallest among all of the genomes. Only two markers, GCS17 and GCS30, revealed the small amount of divergence between G. arboreum and G. herbaceum, and no diversity was detected between G. arboreum and G. herbaceum subsp. africanum by cpSSR markers. The clustering pattern indicated that allotetraploid species were likely derived from one female parent, and all members displayed extensive differentiation in the cpDNA genome. The two cultivated species G. hirsutum and G. barbadense were clustered relatively closely, and G. tomentosum Nuttall ex Seemann and G. darwinii Watt also had a close relationship. Gossypium mustelinum Miers ex Watt may have had an earlier evolutionary origin than the other four allotetraploid species. The five allotetraploid species were most closely



Table 3 Fifty polymorphic cpSSR markers in the cotton genus

Markers Sites^a Motif^b Forward primers Reverse primers Alleles PIC Size (bp)^c

GCS1 trnK intron A10 CGGATGGAGTAGATAATTTCC GGGAATAAACAGGGTTTTAGA 5 0.69 141

GCS3 rps16 intron C11 ATTGCAACGATTCGATAAAC ATGGATCTTTTTGACATGCT 3 0.58 177

GCS4 rps16 intron A11 GATCCATAAACCAGCAAATC TTTTTGAGCATTTTGAGAGTT 5 0.68 122

GCS5 rps16 intron A9 AAAAAGCATTCGTACTCTCA AAAAAGGGGTTAGAGACCAC 5 0.78 101

GCS6 rps16-trnQ A10 TGTATGATTGTCTGAATGCAA GCACGGTAGATTCAAAAAGA 5 0.66 199

GCS8 trnS-trnG A10T11 AGTCCTATTTCCGTTCCTATG GGATTCGACAAAAGGACTTA 5 0.72 196

GCS9 trnG intron T10 ACCTCTCAACGAAAGATTTG CCATGGATCTTTTCCTCATA 5 0.68 198

GCS10 atpH-atpI C14 TCAAAGGATAGACAAGAGCTG GGTCTAATGAATTCGTCCAT 7 0.78 191

GCS11 atpH-atpI A12 GGACGAATTCATTAGACCAA CCATTTCAGTCGATTTCTTC 5 0.72 156

GCS16 trnT-psbD T9 TCCGTCTACTAATTCATTCA TACCAATAAAAACAACATCC 3 0.58 186

GCS17 psbZ-trnG A11 TTCAGATTTTGAGACACATT TTACAGAAGTTTGACTGACC 5 0.72 98

GCS18 psaA-ycf3 (AT)7(TA)5 TCACGTGCACATTCATTACT TTCGTTTGATATTTCGTAAGG 8 0.79 173

GCS19 ndhJ-ndhK A12 CTCCCGCACTTTTCTTTT TTCACTATCTTCCCACGAAT 5 0.70 150

GCS21 ycf4-cemA T9 CCTTTCTTTTGTGCTCCTTA TTCGCGGGTTATCTAAACTA 3 0.53 200

GCS22 ycf4-cemA (AT)5 GCTCCTTCGTCTCAAAATC GTGCTTAGCCCTTGAATCTA 5 0.62 230

GCS23 ycf4-cemA T12 TTTCGAGATAAGCAAAGCAT GCCACGATTCTGCTATTTAC 5 0.75 134

GCS24 petA-psbJ T10 TCTAGGAATTGCTTTTACCG CCCCAATTTAGTCCAATTTA 5 0.67 113

GCS25 petA-psbJ G13 TCAATCTAAATTGGACTAAA TGAATTTAGAAAACAAAACC 5 0.72 103

GCS26 petA-psbJ A9 AGAAAAGGTTTGAATCTGGT CATAGCATCTGCTCTTCGAT 4 0.68 139

GCS30 psaJ-rpl33 T15C11 CGAAAAAGATTAGATCGAG CGTTCTACCTTCCTTATTTA 9 0.86 128

GCS31 psaJ-rpl33 T10 CTTTCAAGATTTGGTTTTGAG TTTCGAACACAACTGGTACA 3 0.62 180

GCS32 clpP intron T10 TAATCCAATTACCACCCTTC GATTGCTGAATCACAGACG 6 0.71 100

GCS33 clpP intron A10 CGAAAGCTAAGATAAAATTG GTAATAATAGCATGGCACTT 5 0.62 131

GCS34 clpP intron A9 GCATACGGTTCAACAAAAAT GCCCCTTCGTTAGAAATTAG 2 0.16 108

GCS35 petB intron A10 TAGTATCTGGAGCACGGAAT AAGAAAGGTTTGTCCTTTGA 3 0.26 200

GCS36 petB intron G12 TGTTTGAGCTGTACGAGATG GCTCTTCGAACCAATCATAG 4 0.73 102

GCS38 petD intron T10 GCTCCGTAAGATCCAGTAGA CCTTGTTTCACTCCGATAGT 6 0.52 148

GCS39 petD-rpoA A10 GAGCAACATTACCGATTGAT GAGAATTCACTTTGCCTTTG 3 0.38 123

GCS40 rpl36-rps8 T10 TCTATTAGACAACCCGTGCT GAGGCTCGACTAGAAGGAAT 4 0.59 130

GCS41 rps8-rpl14 T10 TCCCGAATTTTGATATAACC CGGGAATTGAGACAGTTAAA 5 0.48 191

GCS42 rpl14-rpl16 A11 TGCTACATTCAAATGGGTCT GGAAAGAAGTCTTGTCTTGG 4 0.66 150

GCS48 ycf1-rps15 A12 TATACGAATCAAATCGAAAC CATTTTGATATACACGATGA 4 0.67 100

GCS50 rpl32-ndhF (TA)5 ATCGGTCTTTTGATGTCATT GAAGCATTTTATGCGATTTT 4 0.59 128

GCS51 rpl32-ndhF A10 ATGAATAGAGATGGGAAAAT TCCTATAGATTTGAATGGAG 5 0.65 169

GCS53 atpA-atpF A9 CGTCGGCCCTAATAGTTAC GCTAATATTGGCTTGTTTGG 5 0.40 147

GCS54 atpI-rps2 A9 TGCGTTTGATATACCATTCA GTGATTAGTTTCGTCGGTGT 5 0.55 132

GCS57 rpoB-trnC T9 TTATGCTCTGGGGTTTACAT TGGACGATTCTTCTTCACTT 7 0.79 163

GCS58 petN-psbM T11 TTCGAAACGAAATACGAAGA GAAGGAAAAATGGAATGGA 5 0.50 164

GCS61 atpB-rbcL T10 AATTCGAACCCGAACTCTAT TAGATGTGAAAACAGGCGTA 4 0.66 188

GCS64 clpP intron T10 TTATTTCGTCTGTGATTCAG CAATTTTATCTTAGCTTTCG 4 0.36 197

GCS66 rps12-trnV T12 TTGGAATCTGGGTTCTTCTA AGGATCAAACCTATGGGACT 5 0.62 139

GCS71 trnI intron T9 GCAATGGGATGTGTCTATTT TACCATGGCAAGTATTTGTG 2 0.11 163

GCS73 trnR-trnN T8 AATGGAGTGGCCTTTTATTT GTACTTGCTCTGCTATTCTGC 3 0.48 159

GCS74 ndhA intron T9 TTCGTGGTTTTATCAGATCC ATTTCAACCCATTGTTTTCT 7 0.65 180

GCS75 ndhA intron T9 CGAGATCAATTCAGAAGCAC CTCGTGGGTCACAAATAAAT 2 0.21 166

GCS76 ndhI-ndhG A9 GCAATTCACGCCTAATAGAT AAAATCGTGTATTGGTCCAG 2 0.21 154

GCS77 ndhD-ccsA A9 TTCTGGACCACGAGAGTTAT ACGACCAATTTTAAAAACCA 7 0.77 186

GCS78 trnL-rpl32 A9 GGTTAGTTTCGACAATCCAG GGATTCTTATTTTCCCCATC 8 0.81 202



related to A-genome species, among all of the diploid

species. From the dendrogram (Fig. 3), the germplasm

NG1 clustered in a group with G. darwinii based on

cpSSR markers, which had five different alleles in five

markers GCS6, GCS8, GCS33, GCS48 and GCS57.

These results suggested that NG1 was a new accession

belonging to G. darwinii. Cluster F consisted only of

the F-genome species G. longicalyx. The position of

G. longicalyx just basal to the A-genome in the

dendrogram suggests that A-genome and F-genome

species were derived from a common recent ancestor

and have undergone remarkable differentiation. Clus-

ter D consisted of 10 D-genome species, suggesting

that these D-genome species were descended from a

common ancestor. Gossypium klotzschianum, G. da-

vidsonii Kellogg, G. raimondii, and G. lobatum were

clustered together in a sub-group with a high bootstrap

value. These species have close relationships in their

cytoplasmic origin. Gossypium armourianum Kearney

and G. harknessii showed a similarly close relationship.

These two species have similar morphological

characters and belong to the same subsection

Caducibracteolata. Gossypium trilobum (DC.) Skovsted and

G. thurberi formed another sub-group. They both

belong to subsection Houzingenia. G. aridum was

closely related to subsection Caducibracteolata and

G. gossypioides (Ulbrich) Standley was closely related

to subsection Houzingenia. However, G. aridum and

G. gossypioides clustered with little bootstrap support.

Cluster B contained B-genome species with a high

bootstrap value. Four diploid species native to Australia,

representing different genomes (both C and G),

were assembled into cluster C + G. The G-genome

species, G. nelsonii and G. australe F. von Mueller,

formed a reliable group. In contrast, G. bickii Prokha-

nov was the only G-genome species with a chloroplast

genome resembling that of the C-genome species

G. sturtianum Willis. Four E-genome species were

classified together as cluster E with high reliability.

Gossypium somalense and G. areysianum Deflers had a

comparatively close relationship. Similarly, G. inca-

num (Schwartz) Hillcoat and G. stocksii Masters

grouped together.

Table 4 Distinctive cpDNA haplotypes of Gossypium species

Species Genome Distinctive haplotypes^a

G. tomentosum (AD)3 GCS30-130

G. anomalum B1 GCS32-98

G. sturtianum C1 GCS8-200, GCS35-205, GCS77-193

G. thurberi D1 GCS18-168, GCS80-200

G. davidsonii D3-d GCS80-213

G. aridum D4 GCS53-153, GCS57-167

G. gossypioides D6 GCS38-150, GCS40-135, GCS80-188

G. lobatum D7 GCS10-197

G. trilobum D8 GCS25-93, GCS80-208

G. stocksii E1 GCS38-156

G. incanum E4 GCS53-149

G. bickii G1 GCS6-193, GCS18-162, GCS80-100

G. nelsonii G GCS74-176, GCS80-130

G. australe G GCS74-174, GCS80-150

^a The number following each cpSSR marker is the fragment size (bp) of that haplotype (similarly hereafter)

Table 5 Common cpDNA haplotypes for each genome type

Genome type No. of accessions Common haplotypes

A 3 GCS24-125

AD 15 GCS78-202

B 2 GCS41-188, GCS74-130, GCS78-186, GCS79-220, GCS80-245

C 1 GCS8-200, GCS35-205, GCS77-193

E 4 GCS1-135, GCS8-185, GCS24-105, GCS35-198, GCS74-182

G 3 GCS11-153
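The haplotype lists in Tables 4 and 5 can be reproduced from a table of allele calls with a few set operations. The sketch below is only an illustration under stated assumptions: a "distinctive" haplotype is taken to be a marker-size combination found in exactly one species, and a "common" haplotype one shared by every member of a genome group and absent elsewhere. The function names, species labels, and fragment sizes are hypothetical and are not data from this study.

```python
from collections import defaultdict


def distinctive_haplotypes(calls):
    """Return {species: [(marker, size), ...]} for alleles seen in exactly one species."""
    carriers = defaultdict(set)
    for species, alleles in calls.items():
        for marker, size in alleles.items():
            carriers[(marker, size)].add(species)
    result = defaultdict(list)
    for allele, species_set in carriers.items():
        if len(species_set) == 1:
            (only,) = species_set
            result[only].append(allele)
    return {species: sorted(found) for species, found in result.items()}


def common_haplotypes(calls, group):
    """Alleles shared by every member of `group` and absent from all other species."""
    group = set(group)
    shared = set.intersection(*(set(calls[s].items()) for s in group))
    outside = set()
    for species, alleles in calls.items():
        if species not in group:
            outside |= set(alleles.items())
    return sorted(shared - outside)


# Hypothetical allele calls: species -> {marker: fragment size in bp}.
calls = {
    "sp1": {"m1": 100, "m2": 150},
    "sp2": {"m1": 100, "m2": 152},
    "sp3": {"m1": 104, "m2": 150},
}

print(distinctive_haplotypes(calls))
print(common_haplotypes(calls, ["sp1", "sp2"]))
```

With real data, each accession would first be reduced to one allele call per marker, and accessions of the same species would be merged before the comparison.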

Table 3 continued

Markers Sites^a Motif^b Forward primers Reverse primers Alleles PIC Size (bp)^c

GCS79 rpl32-ndhF A9 TCCATAAATTGGTCAAGCTC AACTGATTGATTGTCTTCCAC 7 0.50 234

GCS80 ndhF-ycf1 T9 TTTTAGTAATTTCCTACTTT TATACATGACGATAATCAAT 16 0.88 234

^a Sites designated as ‘gene 1-gene 2’ indicate the intergenic region between gene 1 and gene 2

^b Repeat numbers following the motif refer to the chloroplast genome of G. hirsutum

^c Fragment size was calculated based on the chloroplast genome of G. hirsutum
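The PIC column can be read as a measure of how evenly the accessions are spread across the alleles of a marker. As a minimal sketch, assuming the widely used form PIC = 1 - sum(p_i^2) over allele frequencies p_i (the formulation of Anderson et al. 1993; whether exactly this form was applied here is an assumption), the value for one marker could be computed as follows. The fragment sizes in the example are hypothetical.

```python
from collections import Counter


def pic(alleles):
    """Polymorphism information content, computed as 1 - sum(p_i^2)."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical fragment sizes (bp) scored at one marker in eight accessions:
# four alleles, each carried by two accessions.
sizes = [163, 163, 164, 164, 165, 165, 166, 166]
print(round(pic(sizes), 2))  # 0.75
```

A monomorphic marker gives a PIC of 0, and the value approaches 1 as the number of equally frequent alleles grows.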



Fig. 2 Neighbor-joining (NJ) dendrogram of 28 cotton species derived from cpSSR markers with different PIC values. A Dendrogram constructed using nine markers with PIC values >0.75, composed of 72 alleles. B Dendrogram constructed using 16 markers with PIC values >0.70, composed of 107 alleles. Each accession is labeled with its genome symbol. Lower-case letter in parentheses denotes race or species name. Hibiscus syriacus (Hs) was used to root all trees. Bootstrap values >40 % are shown above branches. Scale bar represents genetic distance of 0.1 (Nei and Li 1979)

Fig. 3 Neighbor-joining (NJ) dendrogram of cotton accessions derived from diversities of all 50 polymorphic cpSSR markers. A + AD, F, D, C + G, B, and E are symbols for each cluster. Bootstrap values >40 % are shown above the branches. Hibiscus syriacus was used to root the tree. Scale bar represents genetic distance of 0.1 (Nei and Li 1979)
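The scale bars in Figs. 2 and 3 refer to the genetic distance of Nei and Li (1979). A minimal sketch of that measure, assuming the usual band-sharing form d = 1 - 2n_xy / (n_x + n_y), where n_x and n_y are the numbers of alleles scored in two accessions and n_xy is the number they share, is given below. The locus names and fragment sizes are hypothetical, and the exact settings used for the published trees are not reproduced here.

```python
def nei_li_distance(x, y):
    """Nei and Li (1979) distance between two sets of (locus, size) alleles."""
    shared = len(x & y)
    return 1.0 - 2.0 * shared / (len(x) + len(y))


# Hypothetical allele sets for two accessions scored at four loci.
a = {("locus1", 193), ("locus2", 200), ("locus3", 131), ("locus4", 100)}
b = {("locus1", 193), ("locus2", 185), ("locus3", 131), ("locus4", 100)}

print(nei_li_distance(a, b))  # 0.25
```

Because the chloroplast genome is haploid, each accession carries one allele per locus, so when both accessions are scored at the same loci the distance reduces to the fraction of loci at which they differ.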



Discussion

Our results indicated that the polymorphic rate was

higher in SC regions than in IR regions (Table 2),

and all polymorphic markers were located in non-

coding sequences (Table 3). This is due to the fact

that tandem repeats of microsatellites are usually

located in non-coding segments of DNA (Frankham

et al. 2004). That is, intergenic and intron sequences

in SC regions could harbor more information about

cytoplasmic diversity within the cotton genus. Given

the relative conservation and the low rate of differ-

entiation among chloroplast gene sequences, infor-

mation about cpSSR diversity can be used to infer

cotton phylogeny and to better understand its evolution.

Our results confirm that cpSSR analysis is a

useful tool to investigate plant classification and

chloroplast genome differentiation in lower-level

phylogenetic analyses (Powell et al. 1995; Kelchner

2000). In our study, 50 cpSSR markers distributed

across the chloroplast genome confirmed that G. sp.

‘NG1’ is a new accession of G. darwinii. However, a

sufficient number of markers must be used for

analyses of genetic diversity. cpDNA analyses based

on only a few genes and intergenic sequences are not

reliable (Shaw et al. 2005, 2007). For example, when

we constructed a phylogenetic tree using only nine

cpSSR markers, the distinct E- and F-genomes were

grouped together with the D-genome (Fig. 2a, b).

Similar results were obtained when Glycine acces-

sions were examined using only two chloroplast

microsatellite loci (Doyle et al. 1998). When we used

50 cpSSR markers to generate a cluster dendrogram,

its topology was highly analogous to that of

dendrograms constructed using traditional systematic

analyses (Fig. 3). In general, hybridization is able

to change nuclear genetic components rapidly. Could

it play a role in maternal inheritance of the chloro-

plast genome? We compared cpSSR alleles of six

synthetic heterozygotes with those of their maternal

donor species, and detected no variations after more

than 22 generations of crossing (Fig. 3). Compari-

sons of alleles also revealed that no variations have

arisen between the two artificial G. hirsutum varieties

Coker201 and Sm3 (Fig. 3). The results indicated

that limited hybridization does not cause cpSSR

variations; that is, the rate of variation in cpSSRs is

generally lower than that of nuclear SSRs (Provan

et al. 1999c).
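The comparison between synthetic hybrids and their maternal donors described above amounts to checking, marker by marker, whether the fragment sizes agree. A minimal sketch of such a check follows; the function name, locus names, and fragment sizes are hypothetical and serve only to show the logic.

```python
def differing_loci(offspring, maternal):
    """Markers scored in both profiles whose fragment sizes differ."""
    shared = offspring.keys() & maternal.keys()
    return sorted(m for m in shared if offspring[m] != maternal[m])


# Hypothetical cpSSR profiles: marker -> fragment size in bp.
maternal = {"locus1": 130, "locus2": 202, "locus3": 163}
hybrid = {"locus1": 130, "locus2": 202, "locus3": 163}

print(differing_loci(hybrid, maternal))  # []
```

An empty list corresponds to the outcome reported here, in which the hybrid retains the maternal cpSSR alleles at every marker scored.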

Allotetraploid cotton species clustered with A-gen-

ome taxa in the cpSSR analysis, supporting the

hypothesis that A-genome species or their ancestors

were the maternal donors of the allotetraploid cyto-

plasm (Wendel 1989). The genetic distance among

five allotetraploid species ranged from 0.20 to 0.53.

Even the intra-species distance of the AD1 genome

was up to 0.16. However, the distance between

G. herbaceum and G. arboreum was only 0.04. In

the analysis, there was no difference between G. her-

baceum subsp. africanum and G. arboreum. Thus, the

diversity both within and among species was greater

for AD-genome species than for A-genome species.

Assuming that the two types of chloroplast genome followed a

consistent mutation rate (Xu et al. 2012), the relatively

small degree of divergence between G. herbaceum and

G. arboreum suggests that the divergence of A-gen-

ome species occurred rather recently, perhaps after the

formation of the first allotetraploid species. Conse-

quently, the direct maternal donor of the AD-genome

might not be the extant A-genome species, but more

likely their ancestor (Wendel and Cronn 2003).

This cpSSR analysis and previous biosystematic

analyses of diploid Gossypium species yielded differ-

ent results about the D-genome. G. aridum and

G. lobatum are two species typically placed in the

subsection Erioxylum (Fryxell 1992), a relationship

reinforced by a nuclear ribosomal ITS sequence

analysis that clustered these species together (Seela-

nan et al. 1997). However, the clustering patterns from

the cpSSR data in the present study and from an

analysis of cpDNA restriction site variations (Wendel

and Albert 1992) showed that those two species were

highly differentiated, with G. lobatum more closely

resembling G. raimondii in the cpDNA. A clustering

pattern based on InDel differentiation of D-genome

chloroplasts showed a similar result (unpublished

data). From a morphological perspective, G. raimondii

resembles G. klotzschianum and G. davidsonii. There-

fore, the position of G. raimondii revealed by cpSSRs

in our research seems to be reliable. Gossypium

gossypioides has been placed in ‘‘questionable’’

clades: nuclear DNA analysis suggested it was the

basal species of the D genome (Small and Wendel

2000; Cronn et al. 2003; Alvarez et al. 2005), while the

results of cpDNA analyses suggested that this species

was not the earliest in the D genome. In our study, the

cluster containing G. davidsonii represents the basal-

most split from most other D-genome species.



Gossypium bickii clustered with G. nelsonii and

G. australe within the G genome in both morpholog-

ical and nuclear genetic analyses (Wendel et al. 1991;

Liu et al. 2001). There are marked differences between

G. bickii and G. sturtianum when comparing their

morphology and nuclear genes (Wendel et al. 1991;

Liu et al. 2001). However, the results of the present

study showed that G. bickii and G. sturtianum have

similar chloroplast genomes. The mechanism of this

phenomenon may be similar to that of the synthetic

heterozygotes in our study (Table 1). They conserved

the cytoplasmic genomes of their female parents, but their

nuclear genomes were transformed to G. hirsutum

(Fig. 3). In an ancient era, the nuclear genome of a

C-genome species may have been replaced by that of a G-genome

species via an accidental multi-generational interspe-

cific crossing event, whereas the chloroplast genome

was retained with few mutations. This event might

have occurred as a result of an interaction with some

ancient insect in Australia. Wendel and Cronn (2003)

considered this to be an example of a phenomenon

known as ‘‘cytoplasmic capture’’.

The subgenus Gossypium includes members of the

A-, B-, E- and F-genome groups according to tradi-

tional cotton systematics, which is based in part on the

geographic distribution of these species (Fryxell 1979,

1992). Our results indicate that differentiation among

the maternal genomes of members of this sub-group

occurred at an early stage. This finding supports those

of Cronn et al. (2002), who provided strong evidence

based on both chloroplast and nuclear DNA analyses

that these genomes did not form a single subgenus. We

suggest that A- and F-genome species, B-genome

species, and E-genome species should be classified

into three different subgenera.

The biggest difference between our cpSSR results

and those obtained from a 7121-bp chloroplast DNA

data set (Cronn et al. 2002) was the topology of

E-genome species. The former grouped cluster E as

the basal type among extant cotton taxa, while the

latter indicated that E-genome species were closely

related to A-, F- and D-genome species. From the

grouping process, we found that E-genome species

readily changed their positions in the cluster dendro-

gram (Figs. 2, 3). E-genome species clustered closely

with D-genome species in the analysis based on nine

high-value PIC markers (Fig. 2). However, when all

of the markers were used to construct the dendrogram,

cluster E was apart from the root of Gossypium. The

difference might be because of the low-value PIC

markers harbored in the common cpDNA haplotypes

of E-genome species (Table 5). These haplotypes

segregated four E-genome species apart from the other

species. When the results of cpSSR and nuclear DNA

analyses were compared, the consensus clusters were

C- and G-genomes, and A- and F-genomes (Cronn

et al. 2002, 2003). However, the topology of B-, E- and

D-genome clusters showed conflicting results (Seelanan

et al. 1997; Cronn et al. 2002). Our results suggest

that there has been evolutionary differentiation

between the chloroplast and nuclear genomes.

The relationships among Gossypium species deter-

mined from cpSSR analysis suggested a migration

route. The dendrogram indicated that E-genome species

were the first derived from the primordial

Gossypium, most likely in northern Africa and/or

western Asia. We surmise that the B-, C- and

G-genome species had a common maternal origin,

with the C- and G-genomes derived from the

B-genome, following long-distance dispersal to Aus-

tralia. Similarly, the D-genome dispersed to the

American continent. A-genome species were derived

from an F-genome ancestor, and then spread to the

Indian subcontinent and other regions of Africa.

American and Australian species have more particular

endemic characters in chloroplast genome evolution

as compared with African and Asian species. These

endemic characters resulted from adaptation to new

ecological conditions after migration.

Acknowledgments This work was supported by grants from

Ministry of Education (MOE), P. R. China (No. NCET-06-0106

and key project of MOE Grant No. 107012 to J. Hua). We thank

Professor Kunbo Wang (CRI, CAAS) for providing wild cotton

accessions. We also thank Professor Shu-Miaw Chaw (BRCAS,

Taiwan, China) for suggestions regarding cotton cluster tree

construction and Professor Yangdong Guo (China Agricultural

University) for helpful discussions.

References

Alvarez I, Cronn R, Wendel JF (2005) Phylogeny of the new

world diploid cottons (Gossypium L., Malvaceae) based on

sequences of three low-copy nuclear genes. Plant Syst Evol

252:199–214

Anderson JA, Churchill GA, Autrique JE, Tanksley SD, Sorrells

ME (1993) Optimizing parental selection for genetic

linkage maps. Genome 36:181–186

Armbruster U, Pesaresi P, Pribil M, Hertle A, Leister D (2011)

Update on chloroplast research: new tools, new topics, and

new trends. Mol Plant 4:1–16



Clegg MT, Gaut BS, Learn GH Jr, Morton BR (1994) Rates and

patterns of chloroplast DNA evolution. Proc Natl Acad Sci

USA 91:6795–6801

Cronn RC, Zhao X, Paterson AH, Wendel JF (1996) Polymor-

phism and concerted evolution in a tandemly repeated gene

family: 5S ribosomal DNA in diploid and allopolyploid

cottons. J Mol Evol 42:685–705

Cronn RC, Small RL, Haselkorn T, Wendel JF (2002) Rapid

diversification of the cotton genus (Gossypium: Malva-

ceae) revealed by analysis of sixteen nuclear and chloro-

plast genes. Am J Bot 89:707–725

Cronn R, Small RL, Haselkorn T, Wendel JF (2003) Cryptic

repeated genomic recombination during speciation in

Gossypium gossypioides. Evolution 57:2475–2489

Doyle JJ, Morgante M, Tingey SV, Powell W (1998) Size

homoplasy in chloroplast microsatellites of wild perennial

relatives of soybean (Glycine subgenus Glycine). Mol Biol

Evol 15:215–218

Endrizzi JE, Turcotte EL, Kohel RJ (1985) Genetics, cytogenet-

ics, and evolution of Gossypium. Adv Genet 23:271–375

Felsenstein J (1985) Confidence limits on phylogenies: an

approach using the bootstrap. Evolution 39:783–791

Frankham R, Ballou JD, Briscoe DA (2004) A primer of con-

servation genetics. Cambridge University Press, Cambridge

Fryxell PA (1979) The natural history of the cotton tribe. Texas

A & M University Press, College Station, TX

Fryxell PA (1992) A revised taxonomic interpretation of Gos-

sypium L. (Malvaceae). Rheedea 2:108–165

Goremykin VV, Hirsch-Ernst KI, Wolfl S, Hellwig FH (2003)

Analysis of the Amborella trichopoda chloroplast genome

sequence suggests that Amborella is not a basal angio-

sperm. Mol Biol Evol 20:1499–1505

Harris SA, Robinson JP, Juniper BE (2002) Genetic clues to the

origin of the apple. Trends Genet 18:426–430

Ibrahim RIH, Azuma JI, Sakamoto M (2006) Complete nucleo-

tide sequence of cotton (Gossypium barbadense L.) chlo-

roplast genome with a comparative analysis of sequence

among 9 dicot plants. Genes Genet Syst 81:311–321

Ibrahim RIH, Azuma JI, Sakamoto M (2007) PCR-RFLP anal-

ysis of the whole chloroplast DNA from three cultivated

species of cotton (Gossypium L.). Euphytica 156:47–56

Ishii T, Mori N, Ogihara Y (2001) Evaluation of allelic diversity

at chloroplast microsatellite loci among common wheat

and its ancestral species. Theor Appl Genet 103:896–904

Kelchner SA (2000) The evolution of non-coding chloroplast

DNA and its application in plant systematics. Ann Mo Bot

Gard 87:482–498

Lee SB, Kaittanis C, Jansen RK, Hostetler JB, Tallon LJ, Town

CD, Daniell H (2006) The complete chloroplast genome

sequence of Gossypium hirsutum: organization and phylo-

genetic relationships to other angiosperms. BMC Genomics

7:61

Leister D (2003) Chloroplast research in the genomic age.

Trends Genet 19:47–56

Li W-T, Peng Y-Y, Wei Y-M, Baum BR, Zheng Y-L (2009)

Relationships among Avena species as revealed by con-

sensus chloroplast simple sequence repeat (ccSSR) mark-

ers. Genet Resour Crop Evol 56:465–480

Li R, Zhang H, Zhou X, Guan Y, Yao F, Song G, Wang J, Zhang

C (2010) Genetic diversity in Chinese sorghum landraces

revealed by chloroplast simple sequence repeats. Genet

Resour Crop Evol 57:1–15

Liu Q, Brubaker CL, Green AG, Marshall DR, Sharp PJ, Singh

SP (2001) Evolution of the FAD2-1 fatty acid desaturase 5’

UTR intron and the molecular systematics of Gossypium

(Malvaceae). Am J Bot 88:92–102

Maqbool A, Zahur M, Irfan M, Younas M, Barozai K, Rashid B,

Husnain T, Riazuddin S (2008) Identification and expres-

sion of six drought-responsive transcripts through differ-

ential display in desi cotton (Gossypium arboreum). Mol

Biol 42:492–498

Martin W, Deusch O, Stawski N, Grunheit N, Goremykin V (2005) Chloroplast genome phylogenetics: why we need

independent approaches to plant molecular evolution.

Trends Plant Sci 10:204–209

Meyer VG (1975) Male sterility from Gossypium harknessii.

J Hered 66:23–27

Nei M, Li W (1979) Mathematical model for studying genetic

variation in terms of restriction endonucleases. Proc Natl

Acad Sci USA 76:5269–5273

Powell W, Morgante M, McDevitt R, Vendramin GG, Rafalski

JA (1995) Polymorphic simple sequence repeat regions in

chloroplast genomes: applications to the population

genetics of pines. Proc Natl Acad Sci USA 92:7759–7763

Powell W, Morgante M, Doyle JJ, McNicol JW, Tingey SV, Rafalski AJ (1996) Gene

pool variation in genus Glycine subgenus Soja revealed by

polymorphic nuclear and chloroplast microsatellites.

Genetics 144:793–803

Provan J, Corbett G, McNicol JW, Powell W (1997) Chloroplast

DNA variability in wild and cultivated rice (Oryza spp.)

revealed by polymorphic chloroplast simple sequence

repeats. Genome 40:104–110

Provan J, Lawrence P, Young G, Wright F, Bird R, Paglia G,

Cattonaro F, Morgante M, Powell W (1999a) Analysis of

the genus Zea (Poaceae) using polymorphic chloroplast

simple sequence repeats. Plant Syst Evol 218:245–256

Provan J, Russell JR, Booth A, Powell W (1999b) Polymorphic

chloroplast simple sequence repeat primers for systematic and

population studies in the genus Hordeum. Mol Ecol 8:505–511

Provan J, Soranzo N, Wilson NJ, McNicol JW, Morgante M,

Powell W (1999c) The use of uniparentally inherited

simple sequence repeat markers in plant population studies

and systematics. In: Hollingsworth PM, Bateman RM,

Gornall RJ (eds) Molecular systematics and plant evolu-

tion. Taylor and Francis, London, pp 35–50

Provan J, Powell W, Hollingsworth PM (2001) Chloroplast

microsatellites: new tools for studies in plant ecology and

evolution. Trends Ecol Evol 16:142–147

Raubeson LA, Peery R, Chumley TW, Dziubek C, Fourcade

HM, Boore JL, Jansen RK (2007) Comparative chloroplast

genomics: analyses including new sequences from the

angiosperms Nuphar advena and Ranunculus macranthus.

BMC Genomics 8:174

Rose O, Falush D (1998) A threshold size for microsatellite

expansion. Mol Biol Evol 15:613–615

Saitou N, Nei M (1987) The neighbor-joining method: a new

method for reconstructing phylogenetic trees. Mol Biol

Evol 4:406–425

Salmaso M, Vannozzi A, Lucchin M (2010) Chloroplast

microsatellite markers to assess genetic diversity and



origin of an endangered Italian grapevine collection. Am J

Enol Vitic 61:551–556

Seelanan T, Schnabel A, Wendel JF (1997) Congruence and

consensus in the cotton tribe. Syst Bot 22:259–290

Seelanan T, Brubaker CL, Stewart JM, Craven LA, Wendel JF

(1999) Molecular systematics of Australian Gossypium

section Grandicalyx (Malvaceae). Syst Bot 24:183–208

Shaw J, Lickey EB, Beck JT, Farmer SB, Liu WS, Miller J,

Siripun KC, Winder CT, Schilling EE, Small RL (2005)

The tortoise and the hare II: relative utility of 21 noncoding

chloroplast DNA sequences for phylogenetic analysis. Am

J Bot 92:142–166

Shaw J, Lickey EB, Schilling EE, Small RL (2007) Comparison

of whole chloroplast genome sequences to choose non-

coding regions for phylogenetic studies in Angiosperms:

the tortoise and the hare III. Am J Bot 94:275–288

Small RL, Wendel JF (2000) Phylogeny, duplication, and

intraspecific variation of Adh sequences in New World

diploid cottons (Gossypium L. Malvaceae). Mol Phyloge-

net Evol 16:73–84

Song GL, Cui RX, Wang KB, Guo LP, Li SH, Wang CY, Zhang

XD (1998) A rapid improved CTAB method for extraction

of cotton genomic DNA. Acta Gossypii Sin 10:273–275

Taberlet P, Gielly L, Pautou G, Bouvet J (1991) Universal

primers for amplification of three non-coding regions of

chloroplast DNA. Plant Mol Biol 17:1105–1109

Timmis JN, Ayliffe MA, Huang CY, Martin W (2004) Endo-

symbiotic gene transfer: organelle genomes forge eukary-

otic chromosomes. Nat Rev Genet 5:123–135

Ulloa M, Saha S, Jenkins JN, Meredith WR Jr, McCarty JC Jr,

Stelly DM (2005) Chromosomal assignment of RFLP

linkage groups harboring important QTLs on an intraspe-

cific cotton (Gossypium hirsutum L.) Joinmap. J Hered

96:132–144

Van de Peer Y, De Wachter R (1994) TREECON for Windows:

a software package for the construction and drawing of

evolutionary trees for the Microsoft Windows environ-

ment. Comput Appl Biosci 10:569–570

Wendel JF (1989) New World tetraploid cottons contain Old

World cytoplasm. Proc Natl Acad Sci USA 86:4132–4136

Wendel JF, Albert VA (1992) Phylogenetics of the cotton genus

(Gossypium): character-state weighted parsimony analysis

of chloroplast-DNA restriction site data and its systematic

and biogeographic implications. Syst Bot 17:115–143

Wendel JF, Cronn RC (2003) Polyploidy and the evolutionary

history of cotton. Adv Agron 78:139–186

Wendel JF, Stewart JMcD, Rettig JH (1991) Molecular evidence for

homoploid reticulate evolution in Australian species of

Gossypium. Evolution 45:694–711

Wills DM, Hester ML, Liu A, Burke JM (2005) Chloroplast SSR

polymorphisms in the compositae and the mode of organ-

ellar inheritance in Helianthus annuus. Theor Appl Genet

110:941–947

Wolfe KH, Li WH, Sharp PM (1987) Rates of nucleotide sub-

stitution vary greatly among plant mitochondrial, chloro-

plast, and nuclear DNAs. Proc Natl Acad Sci USA

84:9054–9058

Won H, Renner SS (2005) The chloroplast trnT–trnF region in

the seed plant lineage Gnetales. J Mol Evol 61:425–436

Wu GY, Pan HZ, Wu H (1999) Experiments in biochemistry and

molecular biology data handbook. Science Press, Beijing

Wu YX, Daud MK, Chen L, Zhu SJ (2007) Phylogenetic

diversity and relationship among Gossypium germplasm

using SSRs markers. Plant Syst Evol 268:199–208

Xu Q, Xiong GJ, Li PB, He F, Huang Y, Wang KB, Li ZH, Hua

JP (2012) Analysis of complete nucleotide sequences of 12

Gossypium chloroplast genomes: origin and evolution of

allotetraploids. PLoS ONE 7:e37128

Yang AH, Zhang JJ, Yao XH, Huang HW (2011) Chloroplast

microsatellite markers in Liriodendron tulipifera (Magnolia-

ceae) and cross-species amplification in L. chinense. Am J Bot

98:e123–e126

Zamani-Nour S, Clemens R, Mollers C (2013) Cytoplasmic

diversity of Brassica napus L., Brassica oleracea L. and

Brassica rapa L. as determined by chloroplast microsat-

ellite markers. Genet Resour Crop Evol 60:953–965

