
Functional centromeres in soybean include two distinct tandem repeats and a retrotransposon


Ahmet L. Tek · Kazunari Kashihara · Minoru Murata · Kiyotaka Nagaki

Received: 21 December 2009 / Revised: 29 January 2010 / Accepted: 4 February 2010
© Springer Science+Business Media B.V. 2010

Abstract The centromere, as the site of kinetochore assembly, is fundamental to the partitioning of genetic material during cell division. To determine the functional centromeres of soybean, we characterized the soybean centromere-specific histone H3 (GmCENH3) protein and developed an antibody against its N-terminal end. Using this antibody, we cloned centromere-associated DNA sequences by chromatin immunoprecipitation. Our analyses indicate that soybean centromeres are composed of two distinct satellite repeats (CentGm-1 and CentGm-4) and retrotransposon-related sequences (GmCR). The possible allopolyploid origin of the soybean genome is discussed in light of the centromeric satellite sequences present.

Keywords Soybean · centromere-specific histone H3 · kinetochore · satellite repeat · centromere-associated retrotransposon of Glycine max (GmCR)

Abbreviations
CENH3 Centromere-specific histone H3
EST Expressed sequence tag
RACE Rapid amplification of cDNA ends
PHEMES PIPES (piperazine-1,4-bis(2-ethanesulfonic acid)), HEPES (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid), EGTA (ethylene glycol tetraacetic acid), MgCl2, sorbitol
PBS Phosphate-buffered saline
ChIP Chromatin immunoprecipitation
DAPI 4′,6-Diamidino-2-phenylindole

Introduction

Faithful inheritance of genetic material requires proper chromosome segregation during mitosis and meiosis. The centromere, as the site of kinetochore assembly, plays a pivotal role in attaching each chromosome to the microtubule spindle. Recent studies have identified more than 80 proteins as constituents of the centromere and suggest that essential centromere features are conserved across a broad evolutionary scale (Meraldi et al. 2006; Cheeseman and Desai 2008).

Chromosome Research DOI 10.1007/s10577-010-9119-x

Responsible Editor: Herbert Macgregor.

Electronic supplementary material The online version of this article (doi:10.1007/s10577-010-9119-x) contains supplementary material, which is available to authorized users.

A. L. Tek · K. Kashihara · M. Murata · K. Nagaki (*) Research Institute for Bioresources, Okayama University, Chuo 2-20-1, Kurashiki 710-0046, Japan

A. L. Tek (*) Department of Agronomy, Faculty of Agriculture, Harran University, Sanliurfa, Turkey

Centromere-specific histone H3 (CENH3) is the major distinctive histone protein located primarily at the centromere. CENH3 is essential for proper centromere formation and chromosome segregation in eukaryotes. Compared with canonical histone H3, CENH3 forms a unique and more compact nucleosome structure at the centromere (Black et al. 2004). The first CENH3 was identified in humans through analysis of sera from scleroderma patients and was designated CENP-A (Palmer et al. 1991; Sullivan et al. 1994). Subsequently, CENH3 homologues have been characterized in Drosophila (Henikoff et al. 2000), Arabidopsis (Talbert et al. 2002), rice (Nagaki et al. 2004), maize (Zhong et al. 2002), tobacco (Nagaki et al. 2009), and Luzula (Nagaki et al. 2005). Thus, CENH3 is an evolutionarily conserved chromatin component that defines functional centromeres through its incorporation into nucleosomes (Sullivan et al. 2001; Jiang et al. 2003).

The centromere is a multilayered DNA–protein complex (Santaguida and Musacchio 2009). The DNA at the base of this complex is highly dynamic and poorly conserved (Hall et al. 2004; Ma et al. 2007). Why centromeres are seeded on fast-evolving DNA sequences of eukaryotic chromosomes is currently unknown. Centromeric regions in higher eukaryotes consist of highly repetitive, heavily methylated DNA spanning several megabases. The precise boundaries of centromeres remain to be determined for most chromosomes and species. Chromatin immunoprecipitation (ChIP) experiments using antibodies raised against CENH3 proteins have proved crucial in efforts to delineate the DNA sequences embedded at functional centromeres (Zhong et al. 2002; Nagaki et al. 2003, 2004, 2009; Lee et al. 2005; Nagaki and Murata 2005; Houben et al. 2007).

Soybean (Glycine max) is the most important grain legume in the world. It is considered a diploid (2n=40) legume species with a large and complex genome (1,100 Mbp; Arumuganathan and Earle 1991). Nevertheless, recent studies provide conflicting evidence for auto- (Straub et al. 2006) or allopolyploidy (Lee and Verma 1984), with an emphasis on a paleopolyploid origin (reviewed in Shoemaker et al. 2006). Approximately 36% of the soybean genome is heterochromatic (Singh and Hymowitz 1988). In higher plants, large genome size is correlated with an abundance of repetitive DNA elements, which are located mainly in centromeric regions and represent a major fraction of the genome (Jiang et al. 2003). However, little information is available regarding the sequence composition of the soybean genome. Recent studies indicate that the pericentromeric regions of soybean chromosomes contain the tandem repeat STR102 (Morgante et al. 1997) and derivatives of the Calypso (Wright and Voytas 2002) and SIRE1 (Laten et al. 2003) retroelements (Lin et al. 2005). Although the centromeric localization of two subfamilies of a tandem repetitive sequence, CentGm-1 and -2, was recently reported (Gill et al. 2009), their contribution to centromere function remains to be determined.

In an effort to identify the functional components of the centromere in the soybean genome, we first cloned cDNA encoding the CENH3 protein and raised a peptide antibody against the N-terminal region of the deduced amino acid sequence. Using this antibody in ChIP assays, we were able to clone centromeric DNA sequences wrapped around CENH3 in soybean chromosomes. This is the first report of functional centromere components identified in any legume species at both the DNA sequence and protein levels. Our results revealed the following new features of soybean centromere sequences: (1) soybean centromeres contain primarily two distinct satellite repeats; (2) a retrotransposon-related sequence is enriched at soybean centromeres; (3) centromeric satellite and retrotransposon sequences have accumulated to variable extents on each chromosome; (4) given the presence of two distinct centromeric satellites, the soybean genome could be an ancient allopolyploid, consistent with previous studies (Shoemaker et al. 2006).

Materials and methods

Plant material

Soybean (G. max cv. Mikawajima) seeds were obtained from commercial sources. Seeds were germinated on moistened filter paper at room temperature, and plants were grown in a greenhouse. Root tips from germinated seeds were used for cytological preparations. Chromatin for ChIP assays was prepared from fresh leaves of greenhouse-grown plants.

Identification of soybean CENH3

To identify G. max orthologs of the CENH3 protein, hereafter named GmCENH3, we searched the expressed sequence tag (EST) database (http://blast.ncbi.nlm.nih.gov/Blast.cgi) with the tblastn program, using the amino acid sequence of Arabidopsis CENH3, HTR12 (AF465800; Talbert et al. 2002), as the query.

RNA extraction and rapid amplification of cDNA ends (RACE)

Total RNA was isolated from young seedlings using the RNeasy Plant Mini kit (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions. The 5′ and 3′ ends of the CenH3 cDNA were cloned with the SMART RACE cDNA Amplification kit (BD, Franklin Lakes, NJ, USA) and the primers Gm1H3r (5′-TCACCAAGGCCTTCCTATTCCTC-3′) and Gm2H3f (5′-GGAACTGTGGCGCTTCGTGAGAT-3′). The amplified polymerase chain reaction (PCR) products were cloned into the pGEM-T Easy Vector (Promega, Madison, WI, USA), and the sequences of the RACE clones were verified by cycle sequencing (BigDye Terminator v1.1) on an ABI PRISM 310 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). Amino acid sequences were deduced from the DNA sequences and aligned with those of HTR12, rice CENH3 (OsCENH3, AY438635; Nagaki et al. 2009), maize CENH3 (ZmCENH3, AF519807; Zhong et al. 2002), canonical Arabidopsis histone H3 (AtH3, NM_113651), and H3.2 (AtH3.2, NM_001085048) using Geneious Pro v. 4 bioinformatics software (Biomatters, Auckland, New Zealand; http://www.geneious.com/).

Development of the GmCENH3 antibody and Western blotting

A peptide corresponding to the N-terminus of GmCENH3 (H2N-ARVKHTPASRKSAKKQAPRA-COOH, amino acids 2–21) was synthesized and injected into two rabbits. The resulting antisera were purified on an affinity Sepharose column coupled to the peptide. Protein extracts were derived from the input (I), supernatant (S), and pellet (P) fractions of the ChIP experiment (described below). Western blotting was performed using the ECL Plus (enhanced chemiluminescence) Western Blotting Detection System (Amersham Biosciences, GE Healthcare, Buckinghamshire, UK).

Immunostaining

Immunostaining of soybean centromeres with the rabbit anti-GmCENH3 antibody was performed as previously described (Nagaki et al. 2009). Briefly, root tips from germinated seeds were cut to a length of 1–2 cm, fixed with 3% paraformaldehyde and 0.2% Triton X-100 in PHEMES buffer (60 mM PIPES, 25 mM HEPES, 10 mM EGTA, 2 mM MgCl2, and 350 mM sorbitol, pH 6.7) for 20 min at 4°C, and washed twice in phosphate-buffered saline (PBS). The root tips were digested with 1% Cellulase Onozuka RS (Yakult Honsha Co., Minato-ku, Tokyo, Japan) and 0.5% Pectolyase Y23 (Kikkoman, Noda, Chiba, Japan) in PHEMES and then washed again in PBS. After squashing on poly-L-lysine-coated slides (Matsunami, Kishiwada, Osaka, Japan), cells were washed briefly in PBS. The GmCENH3 antibody, diluted 1:100 in TNB buffer (0.1 M Tris–HCl (pH 7.5), 0.15 M NaCl, and 0.5% blocking reagent (Roche, Basel, Switzerland)), was applied and incubated at 4°C overnight. After washing in PBS, the antibody was detected with Alexa Fluor 546-conjugated anti-rabbit antibody (Molecular Probes, Eugene, OR, USA). Chromosomes were stained with 4′,6-diamidino-2-phenylindole (DAPI) and visualized using a fluorescence microscope equipped with a chilled charge-coupled device (CCD) camera.

Chromatin immunoprecipitation (ChIP)

ChIP using the GmCENH3 antibody was performed on soybean nucleosomes as previously described (Nagaki et al. 2003, 2009). Leaves were ground in liquid nitrogen and suspended in Tris-buffered saline (1 mM Tris–HCl (pH 7.5), 3 mM CaCl2, and 2 mM MgCl2) containing 0.1 mM phenylmethylsulfonyl fluoride and complete protease inhibitor cocktail (Roche). The suspension was filtered through Miracloth (Calbiochem, Gibbstown, NJ, USA), and nuclei were collected by centrifugation. The nuclei were digested with micrococcal nuclease (Sigma, St. Louis, MO, USA) to prepare the chromatin solution, which was then incubated with the GmCENH3 antibody at 4°C overnight; antibody complexes were captured with Dynabeads Protein A (Invitrogen, Carlsbad, CA, USA). The specificity of the GmCENH3 antibody was tested in both the supernatant and pellet fractions by Western blotting.


GmCENH3-bound DNA in the pellet fraction was extracted with phenol/chloroform and precipitated with ethanol. After the ends of the precipitated DNA were polished with T4 DNA polymerase (Toyobo, Osaka, Japan), the DNA was incubated with Taq polymerase (Promega) to add 3′ adenine overhangs at both ends and was subsequently cloned into the pGEM-T Easy vector (Promega).

Slot blot hybridization

Slot blot hybridization was performed as described by Nagaki et al. (2009). Plasmid DNA isolated from single colonies was transferred onto a nylon membrane (Biodyne Plus, Pall) using a slot blot apparatus. To identify clones carrying CENH3-associated repetitive sequences, the ChIP-enriched DNA from the pellet fraction was labeled with a DIG High Prime labeling kit (Roche) and used as a probe, and signals were detected with the DIG luminescent detection system (Roche).

Fluorescence in situ hybridization (FISH)

Fluorescence in situ hybridization (FISH) was performed on metaphase chromosomes as previously described (Jiang et al. 1995; Nagaki et al. 2009). Probe DNAs were labeled with biotin-dUTP using Biotin-Nick Translation Mix (Roche) or with digoxigenin-dUTP using DIG-Nick Translation Mix (Roche). Hybridized probes were detected using streptavidin-conjugated Alexa Fluor 488 (Invitrogen) and rhodamine-conjugated anti-digoxigenin antibodies (Roche). Chromosome counterstaining and signal capture were performed as described for immunostaining.

Results

Identification and characterization of the soybean centromere-specific histone H3 variant

The amino acid sequence of Arabidopsis thaliana CENH3 (HTR12; Talbert et al. 2002) was used as a query in a tblastn search against the EST database. Four soybean cDNA sequences (FK014964, FG987884, FK018984, and BQ612173) containing the full open reading frame were identified as putative GmCENH3s. Based on these sequences, specific primers were designed and used for RACE-PCR analysis, which confirmed a full-length cDNA encoding a 158-amino-acid protein. The deduced GmCENH3 sequence was aligned with other plant CENH3s (Arabidopsis HTR12, rice OsCENH3, and maize ZmCENH3) and with the Arabidopsis canonical histone H3 (AtH3) and H3.2 (AtH3.2) proteins (Fig. 1).

Three key differences have been identified between centromeric and canonical histone H3 proteins (Malik and Henikoff 2003), and we adopted these bioinformatic criteria for analysis of the GmCENH3 protein. First, the deduced GmCENH3 sequence (158 amino acids in total) has a longer and more divergent N-terminal tail than AtH3 (136 amino acids in total). Second, the loop 1 region in the histone fold domain is longer than that of canonical histone H3s (eight amino acids, as opposed to seven for AtH3 and AtH3.2). Third, a conserved glutamine residue in AtH3 (Gln69) is replaced with an isoleucine residue (Ile88) in the deduced GmCENH3 protein. Together, these findings suggest that the soybean cDNAs identified encode the CENH3 protein. However, despite the ancient polyploid origin of soybean (Doyle et al. 2004; Shoemaker et al. 2006), we failed to detect a second transcribed CenH3 homolog in the soybean cDNA pool, unlike the situation in polyploid tobacco and rice (Nagaki et al. 2009; Hirsch et al. 2009).

Specificity of GmCENH3 antibody

Centromere specificity of the GmCENH3 antibody was tested on soybean root-tip cells using an immunofluorescence assay (Fig. 2). The antibody against a peptide corresponding to an N-terminal region of the deduced GmCENH3 sequence bound to the centromeres of metaphase chromosomes and to interphase nuclei. This provides direct in situ evidence for centromeric localization and is consistent with the constitutive localization of CENH3 in other plant species (Talbert et al. 2002; Nagaki et al. 2004, 2005, 2009). Typically, GmCENH3 was localized toward the outer edge of the kinetochore on chromatids. These data demonstrate that GmCENH3 is a CENH3 variant that localizes to functional soybean centromeres.

DNA sequences coprecipitated with GmCENH3

CENH3 is at the foundation of a complex network of proteins that forms a functional kinetochore. Centromeric DNA is packaged by CENH3 and other histones to form centromeric nucleosomes. To identify DNA sequences associated with soybean centromeres, we performed a ChIP assay using the GmCENH3 antibody. The efficiency of the ChIP assay was assessed by Western blotting of the input (I), supernatant (S), and pellet (P) fractions (Fig. 3). Compared with the input fraction, GmCENH3 signals were weaker in the supernatant fraction and stronger in the pellet fraction, indicating that GmCENH3 was enriched in the pellet fraction.

Because the Western blot showed a clear difference between the supernatant and pellet fractions, we purified the immunoprecipitated DNA from the pellet fraction and cloned it into a plasmid vector. To identify CENH3-associated repetitive DNA sequences, we used the immunoprecipitated DNA as a probe to screen the clones by slot blot hybridization. Twelve of 96 independent clones showed strong hybridization signals in the slot blot analysis. All 12 clones were sequenced, and the sequences were deposited in GenBank/EMBL/DDBJ under the following accession numbers: G29 (AB536703), G30 (AB536704), G48 (AB536705), G51 (AB536706), G56 (AB536707), G61 (AB536708), G69 (AB536709), G71 (AB536710), G75 (AB536711), G87 (AB536712), G92 (AB536713), and G93 (AB536714). Sequence analyses indicated that the selected clones could be classified into at least three types of DNA sequence: CentGm-1, CentGm-4, and GmCR.

Fig. 1 Identification of a sequence homologue of CENH3 in soybean. Multiple sequence alignment of CENH3 homologues from soybean (GmCENH3), Arabidopsis (HTR12), rice (OsCENH3), and maize (ZmCENH3). Arabidopsis canonical H3 (AtH3) and H3.2 (AtH3.2) proteins were included to compare functionally critical residues

Fig. 2 Immunolocalization of the GmCENH3 protein at soybean centromeres. DAPI-stained chromosomes (a) and an interphase cell (d) are shown in red. Signals derived from the GmCENH3 antibody are shown in green (b, e). Merged images of the chromosomes or interphase cell with GmCENH3 signals show specific centromere localization (c, f). Scale bar is 10 μm

CentGm-1 family

The first group comprised five independent clones (G48, G56, G61, G71, and G87) of the 12 clones selected by slot blot hybridization (Fig. S1). All of these clones consisted of head-to-tail tandem repeats of a 92-bp monomer with an average A/T content of 59%. These sequences showed significant homology to the previously reported soybean SB92 satellite repeats (Kolchinsky and Gresshoff 1995; Vahedian et al. 1995), which were recently divided into two subfamilies, CentGm-1 and -2 (Gill et al. 2009). We used G48 as a representative of this group, designated the CentGm-1 family, and hybridized it to soybean metaphase chromosomes (Fig. 4a, b, d). Although G48 produced centromeric signals on all 40 chromosomes, 24 of these signals were particularly strong (Fig. 4b). A FISH comparison of the CentGm-1 clones indicated sequence divergence, with variable signal intensities at soybean centromeres (Fig. 4i), consistent with previous reports (Vahedian et al. 1995; Gill et al. 2009).
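The monomer length and A/T content used above to characterize CentGm-1 are simple sequence statistics. A minimal Python sketch of how both can be computed from a cloned sequence follows. It is an illustration under stated assumptions, not the authors' procedure (the text does not say how the 92-bp period was determined): the shift-identity approach, the function names, and the toy sequence are hypothetical. Real satellite copies carry mutations, so identity at the true period is high rather than perfect.

```python
def at_content(seq: str) -> float:
    """Fraction of A and T bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "AT" for base in seq) / len(seq)


def shift_identity(seq: str, shift: int) -> float:
    """Fraction of positions i where seq[i] equals seq[i + shift]."""
    n = len(seq) - shift
    if n <= 0:
        return 0.0
    return sum(seq[i] == seq[i + shift] for i in range(n)) / n


def tandem_period(seq: str, min_period: int, max_period: int) -> int:
    """Smallest shift in [min_period, max_period] with the highest self-identity.

    Hypothetical helper: in a head-to-tail tandem array, shifting the sequence
    by one monomer length lines each copy up with the next, so identity
    peaks at that shift (and again at its multiples).
    """
    seq = seq.upper()
    best_shift, best_score = min_period, -1.0
    for shift in range(min_period, max_period + 1):
        score = shift_identity(seq, shift)
        if score > best_score:  # strict '>' keeps the smallest best shift
            best_shift, best_score = shift, score
    return best_shift
```

For example, `tandem_period("AATGCTA" * 10, 2, 20)` returns 7, the length of the toy monomer; on a real clone one would scan a range around the expected monomer size.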

A novel component of functional centromeres in soybean (CentGm-4)

The second group, designated CentGm-4, comprised five independent clones (G29, G30, G51, G75, and G93; Fig. S2). Of these, only the 432-bp G29 clone contained a single complete 411-bp monomer (positions 5–415) without any internal repeat motif. The 361-bp G30 clone consisted of two flanking segments (227 and 134 bp), at its 5′ and 3′ ends, that were both homologous to G29 (shown as G30-a and -b with thin and thick boxes, respectively, in Fig. S2). This implies that CentGm-4 monomers are arranged head to tail. The other three clones, the 147-bp G51, 356-bp G75, and 178-bp G93, showed homology to the 411-bp CentGm-4 monomer. Furthermore, a 153-bp region of G92 (positions 700–852; see below for details) displayed sequence homology to CentGm-4 (G92-b, indicated by the dotted box in Fig. S2).
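The head-to-tail inference from G30 rests on a simple observation: a clone made of the end of one monomer followed directly by the start of the next can only come from two adjacent copies in the same orientation. The Python sketch below illustrates that test. It assumes exact matches to a hypothetical 12-bp toy monomer; real clones such as G30 share only partial identity with G29, so in practice an alignment tool would replace the plain substring search.

```python
def occurs_in_head_to_tail_array(fragment: str, monomer: str) -> bool:
    """True if fragment can be read from consecutive same-orientation copies."""
    fragment, monomer = fragment.upper(), monomer.upper()
    # Enough copies to contain the fragment at any starting offset.
    copies = len(fragment) // len(monomer) + 2
    return fragment in monomer * copies


def spans_monomer_junction(fragment: str, monomer: str) -> bool:
    """True if fragment fits a head-to-tail array but not a single copy."""
    return (occurs_in_head_to_tail_array(fragment, monomer)
            and fragment.upper() not in monomer.upper())


# Hypothetical 12-bp toy monomer (the real CentGm-4 monomer is ~411 bp).
MONOMER = "ACGTTGCAAGGT"
# Toy clone: last 4 bases of one copy joined to the first 5 bases of the next.
CLONE = MONOMER[-4:] + MONOMER[:5]
```

Here `spans_monomer_junction(CLONE, MONOMER)` is True, whereas the hypothetical inverted junction `"AGGTACCTT"` (the same monomer end joined to the start of a reverse-complemented copy) is not found in the head-to-tail array.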

Fig. 3 Western blot analysis of soybean leaf extract using the GmCENH3 antibody. Two bands appear, at approximately 24 kDa (GmCENH3) and 48 kDa (IgG). Protein extracts are derived from the input (I), supernatant (S), and pellet (P) fractions of the ChIP experiment. The positions corresponding to GmCENH3 are indicated by an arrow. Protein size markers (M) are shown in kilodaltons

BLAST searches of the nonredundant nucleotide (NR) database with this repetitive sequence did not detect any significant similarity. However, BLAST searches of single-pass genome survey sequences (http://www.ncbi.nlm.nih.gov/dbGSS/index.html) and unfinished high-throughput genomic sequences (http://www.ncbi.nlm.nih.gov/HTGS/index.html) revealed homologous soybean sequences. Eleven bacterial artificial chromosome (BAC) clones (AC236207, AC236144, AC236205, AC236160, AC235824, AC236201, AC236204, AC236228, AC236171, AC236126, and AC236134) harbored monomers with 74% to 96% sequence identity to G29. For example, a partially assembled soybean BAC clone (AC236126) carrying a total of 69 copies of the approximately 411-bp CentGm-4 monomer contained a 12-kb region of monomers in head-to-tail orientation. This further supports the conclusion that CentGm-4 is a novel satellite repeat with an approximately 411-bp monomer, organized in tandem at soybean centromeres.
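The 74% to 96% values above are pairwise sequence identities. To make such a figure concrete, here is a minimal Python sketch that computes percent identity from two already-aligned sequences (gaps written as '-'). It assumes identity is counted over all alignment columns, which is one common convention; BLAST and other tools differ in how they count gaps and which region they report on, and the toy alignment is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences carry the same base.

    Gap characters ('-') never count as matches but do count as columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        a == b and a != "-"
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    )
    return 100.0 * matches / len(aligned_a)
```

For instance, `percent_identity("ACGT-ACGTA", "ACGTTACGAA")` returns 80.0: eight of the ten columns match, one column is a gap, and one is a mismatch.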

To confirm that the CentGm-4 family was derived from centromeres, we labeled G30, as a representative clone, with digoxigenin and used it for FISH analysis of metaphase chromosomes (Fig. 4a, c, d). The probe produced signals exclusively at the centromeres of all chromosomes (Fig. 4c). Fourteen of these signals were strong, and all of them co-localized with G48 signals (Fig. 4d), indicating that both CentGm-1 and CentGm-4 sequences are present at functional soybean centromeres.

Fig. 4 Fluorescence in situ hybridization analysis of DNA clones derived from the ChIP experiment on soybean metaphase chromosomes. CentGm-1 (G48) and CentGm-4 (425 bp, G30) probes were detected with avidin fluorescein (b, green) and rhodamine-conjugated anti-digoxigenin (c, red), respectively, on metaphase chromosomes (a–d). A retrotransposon-related sequence (G92-a, f) is primarily localized at the centromeres (g). The G71 clone (a CentGm-1 sequence, i) and the G69 clone (j) co-localized on soybean nuclei (k). Scale bar is 10 μm

Other DNA sequences at soybean centromeres (GmCR)

The last group comprised two unrelated clones (G69 and G92). The 421-bp G69 clone showed no significant sequence homology in BLAST searches of GenBank/EMBL/DDBJ. However, FISH analysis of soybean nuclei using G69 as a probe (Fig. 4h, j) yielded a signal pattern that completely overlapped the pattern obtained for clone G71, a CentGm-1 family sequence (Fig. 4i, k). This result suggests that G69, in addition to the CentGm-1 family, is part of the functional centromeres in soybean.

The 852-bp G92 clone contained a sequence at its 3′ end (positions 700–852) that is highly homologous to the CentGm-4 satellite, as shown in the sequence alignment (dotted box in Fig. S2). By contrast, the 700-bp sequence at the 5′ end of clone G92 was not homologous to G69 or to the CentGm-1 or -4 sequences, but was similar to retrotransposon-related sequences. To determine the chromosomal origin of this 700-bp sequence, designated G92-a, the corresponding region was amplified with a primer set (G92-af: 5′-TTGCAAGAGCCTAAGTG-3′ and G92-ar: 5′-TGTGGAGGTTGTCTAAGAG-3′) and labeled with digoxigenin. FISH analysis using G92-a as a probe showed centromeric localization on soybean chromosomes (Fig. 4e–g). Although the signals were weaker and more diffuse than those obtained with the CentGm-1 and CentGm-4 sequences, the specific enrichment of hybridization at the centromeres suggests that G92-a is a centromere-related retrotransposon sequence in soybean.
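Amplifying G92-a with G92-af and G92-ar relies on the usual PCR geometry: the forward primer matches the top strand as written, while the reverse primer anneals to the top strand, so it is the reverse complement of G92-ar that appears in the top-strand sequence. A minimal Python sketch of locating the predicted product in a known sequence follows. The primer sequences are those given above; the template and the function names are hypothetical, and the exact-match search ignores mismatches and multiple binding sites.

```python
from typing import Optional

# Complement table for upper- and lower-case DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def predicted_amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Return the predicted PCR product on the template's top strand.

    The product runs from the first exact match of the forward primer to the
    first downstream match of the reverse primer's reverse complement, with
    both primer sites included. Returns None if either site is missing.
    """
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


# Primer sequences as given in the text (5'->3').
G92_AF = "TTGCAAGAGCCTAAGTG"
G92_AR = "TGTGGAGGTTGTCTAAGAG"
```

With a toy template built as `"GGGG" + G92_AF + "ACGTACGTAC" + reverse_complement(G92_AR) + "CCCC"`, `predicted_amplicon` returns the stretch from the start of G92-af through the G92-ar binding site, without the flanking G and C runs.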

Further BLAST searches with G92-a indicatedhigh similarity to a group of retrotransposon sequen-ces (45m6-re-2, accession no. FJ197991; 45m6-re-1,accession no. FJ197992; 71h23-re-11, accession no.FJ197994; and 77p13-re-9, accession no. FJ197995 inFig. S3). These were previously characterized to beintact, but nonautonomous, since they lack fully

functional gag and pol genes (Wawrzynski et al.2008). This class of retrotransposons is highlyabundant in the soybean genome and has beenrepresented as family 6. They are classified separatelyfrom the previously identified SIRE1, Diaspora andCalypso, elements retroelements (Wright and Voytas2002; Laten et al. 2003; Yano et al. 2005). Family 6retrotransposons possess similar long terminalrepeats, primer binding sites, and polypurine tracks(Wawrzynski et al. 2008). Although we identifiedonly one retrotransposon-related clone bordered by acentromeric CentGm-4 satellite, our results clearlyshow that the retrotransposon-related sequenceGmCR is enriched at functional centromeres insoybean.

Discussion

Centromere research in plants contributes greatly to our understanding of centromere structure, function, and composition. Currently, comparisons of centromeric DNA are limited to species within the Gramineae and Brassicaceae, and these show that some DNA elements are conserved among closely related species (Hall et al. 2004). More extensive analysis in other plant families, such as the Leguminosae (Fabaceae), is therefore necessary to shed light on the conserved components of centromeres. Here, we characterize soybean functional centromeres and analyze the DNA sequences embedded in this rapidly evolving locus of the genome.

Although centromere research in legumes is still in its infancy, centromeric satellites have been identified in several species. For example, a satellite (Ljcen1) derived from a BAC clone was localized at centromeres in Lotus japonicus (Pedrosa et al. 2002). In Medicago truncatula, a 166-bp centromeric satellite (MtR3) was found to occupy from ∼450 kb to more than a megabase of heterochromatin on each chromosome (Kulikova et al. 2004). Recently, two subfamilies of the CentGm-1 family were identified in subsets of soybean centromeres (Gill et al. 2009). The CentGm-1 family is present only in G. max and its close progenitor Glycine soja, and not in other Glycine species (Kolchinsky and Gresshoff 1995; Vahedian et al. 1995; Gill et al. 2009). A similar lineage specificity was reported for a centromeric satellite (TrR350) in Trifolium repens (Ansari et al. 2004). In general, a single satellite repeat has been identified as the major constituent of centromeric DNA in legume species.

Two centromere satellite sequences in soybean

Our results reveal the presence of two distinct satellite sequences at soybean centromeres. This finding is surprising, since most species possess only a single centromeric satellite (Jiang et al. 2003; Hall et al. 2004). Nevertheless, the rapid evolution of centromeric satellites has been noted, with the appearance of diverged and/or novel centromeric sequences in Arabidopsis (Kawabe and Nasuda 2005) and Oryza (Lee et al. 2005; Bao et al. 2006). Based on FISH analysis, neither CentGm-1 nor CentGm-4 distinguishes particular soybean chromosomes in terms of localization: both satellites are broadly distributed, being detected at the centromeres of all chromosomes, and in this respect their organization is unusual. Although we did not observe chromosome-specific variants of either soybean centromeric satellite, some form of homogenization process could be underway. Analyses of centromeric DNA sequences across a wide range of organisms suggest that primary DNA sequences do not play a role in centromere identity and propagation (Sullivan et al. 2001). The presence of two distinct satellite repeats in functional soybean centromeres likewise suggests that centromere identity is determined by epigenetic mechanisms. Our FISH analysis also shows that CentGm-1 and CentGm-4 are present at different levels among soybean centromeres.

A satellite repeat and a centromere-specific retrotransposon (CR) element, both of which interact with CENH3, are a common feature of most cereal centromeres (Zhong et al. 2002; Nagaki et al. 2003, 2004; Nagaki and Murata 2005; Houben et al. 2007; Han et al. 2010). However, in other species, retrotransposons can dominate centromeres without any contribution from satellite repeats (Liu et al. 2008). Although CR elements are known for their exclusive localization at centromeres, GmCR is not strictly confined to centromeres in soybean but is distributed at and around them. A similar observation has been reported in Beta vulgaris and Beta procumbens for Ty3-gypsy-like elements (pBv26 and pBp10) that are enriched in centromeric and pericentromeric regions (Gindullis et al. 2001). Given the recent activation of retrotransposons (Wawrzynski et al. 2008), soybean centromeres appear to be composed primarily of two satellite sequences, with relatively few interruptions by retrotransposon-related elements. Further investigations are required to determine the composition and contribution of GmCR elements in soybean centromeres.

One intriguing aspect of the soybean centromeric satellite repeats CentGm-1 and CentGm-4 is their unusual monomer size. The 92-bp CentGm-1 monomer covers about half a nucleosome, while the 411-bp CentGm-4 monomer covers more than two nucleosomes. In most species, centromeric satellite monomers are roughly the length of DNA in a single nucleosome. An evolutionary constraint on the monomer length of centromeric satellites has been proposed across most species (Henikoff et al. 2001): for centromeres, evolution is suggested to restrict the nucleosomal unit length, reflecting a structural role within the genome, rather than any sequence motif shared across species. The soybean centromeric satellites, with their unusual monomer sizes, might therefore be an exception to this hypothesis. Because the CentGm-1 and -4 sequences were isolated by ChIP cloning in this study, these two satellites directly interact with GmCENH3, and both are probably packaged into nucleosomes at centromeres.
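The monomer-to-nucleosome comparison in this paragraph can be made explicit with simple ratios. This is a back-of-the-envelope sketch under two assumptions not stated in the text: about 147 bp of DNA wrapped around a nucleosome core particle, and an assumed round figure of about 180 bp for the nucleosome repeat length including linker DNA (actual plant repeat lengths vary).

$$
\frac{92\ \text{bp}}{147\ \text{bp}} \approx 0.63, \qquad \frac{411\ \text{bp}}{147\ \text{bp}} \approx 2.80
$$

$$
\frac{92\ \text{bp}}{180\ \text{bp}} \approx 0.51, \qquad \frac{411\ \text{bp}}{180\ \text{bp}} \approx 2.28
$$

Against the assumed ~180-bp repeat, CentGm-1 spans about half a nucleosome and CentGm-4 a little over two, matching the statements above; against the 147-bp core alone, CentGm-4 approaches three.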

Centromere–pericentromere junction in soybean

In eukaryotes, centromeres and their flanking pericentromeric regions often appear as densely packed, DAPI-bright heterochromatin. The border between these two domains is usually difficult to delineate precisely at both the chromosome and the DNA sequence level; nevertheless, the two domains are distinct. Satellite sequences present at centromeres extend beyond the functional boundaries of centromeres, so that only a portion of the satellite sequences is in direct contact with CENH3 (Zhong et al. 2002; Nagaki et al. 2003; Shibata and Murata 2004; Houben et al. 2007). In soybean, pericentromeres have a complex sequence organization. A recent analysis indicates that pericentromeric regions of soybean chromosomes bear retroelements and tandem repeats (Lin et al. 2005): a BAC clone, 076J21, which contains sequences homologous to a tandem repeat (STR120) and a retroelement (SIRE1), hybridizes to heterochromatic regions at the pericentromeres of all 40 chromosomes. Our analysis in this study indicated that soybean centromeres are mainly composed of two distinct satellite sequences (CentGm-1 and -4) and a centromeric retrotransposon (GmCR). The results from our study and others suggest that centromeric and pericentromeric regions are distinct in sequence composition, although both are embedded in a heterochromatic domain.

Allopolyploidy in soybean

The soybean genome has been considered an ancient polyploid and has been suggested to have undergone two or more rounds of large-scale duplication (Shoemaker et al. 1996). Such large-scale duplications could have made the soybean genome relatively complex in structure and may also have affected the DNA sequence organization of its centromeres. Less-diverged centromeric repeats that share local sequence similarity along the monomer are common in allopolyploid plants derived from distinct progenitor genomes (Kamm et al. 1995; Kawabe and Nasuda 2005). In the allohexaploid Coix aquatica, a 153-bp centromeric satellite derived from Coix lacryma-jobi localizes to 20 of the 30 chromosomes (Han et al. 2010). In another allopolyploid, Arabidopsis suecica, the centromeric satellites from the progenitors A. thaliana and Arabidopsis arenosa are retained on the chromosomes (Pontes et al. 2004), while all centromeres are recognized by the HTR12 antibody (Talbert et al. 2002). In these examples, parental centromeric satellites are largely retained in the polyploid in a chromosome-specific manner. The identification of two distinct satellite repeats (CentGm-1 and CentGm-4) at centromeres supports the hypothesis that soybean is an allopolyploid. However, our study found CentGm-1 and -4 to be extensively intermingled on each soybean centromere. It is possible that each satellite was derived from a different progenitor centromere of the current soybean genome; polyploidization, recombination, and/or restructuring could then account for the localization of both satellites on the same soybean centromeres. Previous studies showed that, among the relatives of soybean examined, only G. soja contains CentGm-1. An investigation of the distribution of CentGm-4 sequences among closely related species may shed light on the possible mechanisms of evolution in this important species.

Acknowledgements We thank Dr. Jiri Macas for help with sequence analysis. This work was supported by the Fellowship Program of the Japan Society for the Promotion of Science (JSPS) to ALT and KN.

References

Ansari HA, Ellison NW, Griffiths AG, Williams WM (2004) A lineage-specific centromeric satellite sequence in the genus Trifolium. Chromosome Res 12:357–367

Arumuganathan K, Earle ED (1991) Nuclear DNA content of some important plant species. Plant Mol Biol Rep 9:208–218

Bao W, Zhang W, Yang Q et al (2006) Diversity of centromeric repeats in two closely related wild rice species, Oryza officinalis and Oryza rhizomatis. Mol Genet Genomics 275:421–430

Black BE, Foltz DR, Chakravarthy S, Luger K, Woods VL, Cleveland DW (2004) Structural determinants for generating centromeric chromatin. Nature 430:578–582

Cheeseman IM, Desai A (2008) Molecular architecture of the kinetochore-microtubule interface. Nat Rev Mol Cell Biol 9:33–46

Doyle JJ, Doyle JL, Rauscher JT, Brown AHD (2004) Evolution of the perennial soybean polyploid complex (Glycine subgenus Glycine): a study of contrasts. Biol J Linn Soc 82:583–597

Gill N, Findley S, Walling JG et al (2009) Molecular and chromosomal evidence for allopolyploidy in soybean. Plant Physiol 151:1167–1174

Gindullis F, Desel C, Galasso I, Schmidt T (2001) The large-scale organization of the centromeric region in Beta species. Genome Res 11:253–265

Hall AE, Keith KC, Hall SE, Copenhaver GP, Preuss D (2004) The rapidly evolving field of plant centromeres. Curr Opin Plant Biol 7:108–114

Han Y, Wang G, Liu Z et al (2010) Divergence in centromere structure distinguishes related genomes in Coix lacryma-jobi and its wild relative. Chromosoma 119:89–98

Henikoff S, Ahmad K, Platero JS, van Steensel B (2000) Heterochromatic deposition of centromeric histone H3-like proteins. Proc Natl Acad Sci USA 97:716–721

Henikoff S, Ahmad K, Malik HS (2001) The centromere paradox: stable inheritance with rapidly evolving DNA. Science 293:1098–1102

Hirsch CD, Wu Y, Yan H, Jiang J (2009) Lineage-specific adaptive evolution of the centromeric protein CENH3 in diploid and allotetraploid Oryza species. Mol Biol Evol 26:2877–2885

Houben A, Schroeder-Reiter E, Nagaki K et al (2007) CENH3 interacts with the centromeric retrotransposon cereba and GC-rich satellites and locates to centromeric substructures in barley. Chromosoma 116:275–283

Jiang J, Gill BS, Wang GL, Ronald PC, Ward DC (1995) Metaphase and interphase fluorescence in situ hybridization mapping of the rice genome with bacterial artificial chromosomes. Proc Natl Acad Sci USA 92:4487–4491

Jiang J, Birchler JA, Parrott WA, Dawe RK (2003) A molecular view of plant centromeres. Trends Plant Sci 8:570–575

Kamm A, Galasso I, Schmidt T, Heslop-Harrison JS (1995) Analysis of a repetitive DNA family from Arabidopsis arenosa and relationships between Arabidopsis species. Plant Mol Biol 27:853–862

Kawabe A, Nasuda S (2005) Structure and genomic organization of centromeric repeats in Arabidopsis species. Mol Genet Genomics 272:593–602

Kolchinsky A, Gresshoff PM (1995) A major satellite DNA of soybean is a 92-base pairs tandem repeat. Theor Appl Genet 90:621–626

Kulikova O, Geurts R, Lamine M et al (2004) Satellite repeats in the functional centromere and pericentromeric heterochromatin of Medicago truncatula. Chromosoma 113:276–283

Laten HM, Havecker ER, Farmer LM, Voytas DF (2003) SIRE1, an endogenous retrovirus family from Glycine max, is highly homogeneous and evolutionarily young. Mol Biol Evol 20:1222–1230

Lee JS, Verma DP (1984) Structure and chromosomal arrangement of leghemoglobin genes in kidney bean suggest divergence in soybean leghemoglobin gene loci following tetraploidization. EMBO J 3:2745–2752

Lee HR, Zhang W, Langdon T et al (2005) Chromatin immunoprecipitation cloning reveals rapid evolutionary patterns of centromeric DNA in Oryza species. Proc Natl Acad Sci USA 102:11793–11798

Lin JY, Jacobus BH, SanMiguel P et al (2005) Pericentromeric regions of soybean (Glycine max L. Merr.) chromosomes consist of retroelements and tandemly repeated DNA and are structurally and evolutionarily labile. Genetics 170:1221–1230

Liu Z, Yue W, Li D et al (2008) Structure and dynamics of retrotransposons at wheat centromeres and pericentromeres. Chromosoma 117:445–456

Ma J, Wing RA, Bennetzen JL, Jackson SA (2007) Plant centromere organization: a dynamic structure with conserved functions. Trends Genet 23:134–139

Malik HS, Henikoff S (2003) Phylogenomics of the nucleosome. Nat Struct Biol 10:882–891

Meraldi P, McAinsh AD, Rheinbay E, Sorger PK (2006) Phylogenetic and structural analysis of centromeric DNA and kinetochore proteins. Genome Biol 7:R23

Morgante M, Jurman I, Shi L, Zhu T, Keim P, Rafalski JA (1997) The STR120 satellite DNA of soybean: organization, evolution and chromosomal specificity. Chromosome Res 5:363–373

Nagaki K, Murata M (2005) Characterization of CENH3 and centromere-associated DNA sequences in sugarcane. Chromosome Res 13:195–203

Nagaki K, Talbert PB, Zhong CX, Dawe RK, Henikoff S, Jiang J (2003) Chromatin immunoprecipitation reveals that the 180-bp satellite repeat is the key functional DNA element of Arabidopsis thaliana centromeres. Genetics 163:1221–1225

Nagaki K, Cheng Z, Ouyang S et al (2004) Sequencing of a rice centromere uncovers active genes. Nat Genet 36:138–145

Nagaki K, Kashihara K, Murata M (2005) Visualization of diffuse centromeres with centromere-specific histone H3 in the holocentric plant Luzula nivea. Plant Cell 17:1886–1893

Nagaki K, Kashihara K, Murata M (2009) A centromeric DNA sequence colocalized with a centromere-specific histone H3 in tobacco. Chromosoma 118:249–257

Palmer DK, O'Day K, Trong HL, Charbonneau H, Margolis RL (1991) Purification of the centromere-specific protein CENP-A and demonstration that it is a distinctive histone. Proc Natl Acad Sci USA 88:3734–3738

Pedrosa A, Sandal N, Stougaard J, Schweizer D, Bachmair A (2002) Chromosomal map of the model legume Lotus japonicus. Genetics 161:1661–1672

Pontes O, Neves N, Silva M et al (2004) Chromosomal locus rearrangements are a rapid response to formation of the allotetraploid Arabidopsis suecica genome. Proc Natl Acad Sci USA 101:18240–18245

Santaguida S, Musacchio A (2009) The life and miracles of kinetochores. EMBO J 28:2511–2531

Shibata F, Murata M (2004) Differential localization of the centromere-specific proteins in the major centromeric satellite of Arabidopsis thaliana. J Cell Sci 117:2963–2970

Shoemaker RC, Polzin K, Labate J et al (1996) Genome duplication in soybean (Glycine subgenus soja). Genetics 144:329–338

Shoemaker RC, Schlueter J, Doyle JJ (2006) Paleopolyploidy and gene duplication in soybean and other legumes. Curr Opin Plant Biol 9:104–109

Singh RJ, Hymowitz T (1988) The genomic relationship between Glycine max (L.) Merr. and G. soja Sieb. and Zucc. as revealed by pachytene chromosome analysis. Theor Appl Genet 76:705–711

Straub SC, Pfeil BE, Doyle JJ (2006) Testing the polyploid past of soybean using a low-copy nuclear gene—is Glycine (Fabaceae: Papilionoideae) an auto- or allopolyploid? Mol Phylogenet Evol 39:580–584

Sullivan KF, Hechenberger M, Masri K (1994) Human CENP-A contains a histone H3 related histone fold domain that is required for targeting to the centromere. J Cell Biol 127:581–592

Sullivan BA, Blower MD, Karpen GH (2001) Determining centromere identity: cyclical stories and forking paths. Nat Rev Genet 2:584–596

Talbert PB, Masuelli R, Tyagi AP, Comai L, Henikoff S (2002) Centromeric localization and adaptive evolution of an Arabidopsis histone H3 variant. Plant Cell 14:1053–1066

Vahedian M, Shi L, Zhu T, Okimoto R, Danna K, Keim P (1995) Genomic organization and evolution of the soybean SB92 satellite sequence. Plant Mol Biol 29:857–862

Wawrzynski A, Ashfield T, Chen NW et al (2008) Replication of nonautonomous retroelements in soybean appears to be both recent and common. Plant Physiol 148:1760–1771

Wright DA, Voytas DF (2002) Athila4 of Arabidopsis and Calypso of soybean define a lineage of endogenous plant retroviruses. Genome Res 12:122–131

Yano ST, Panbehi B, Das A, Laten HM (2005) Diaspora, a large family of Ty3-gypsy retrotransposons in Glycine max, is an envelope-less member of an endogenous plant retrovirus lineage. BMC Evol Biol 5:30

Zhong CX, Marshall JB, Topp C et al (2002) Centromeric retroelements and satellites interact with maize kinetochore protein CENH3. Plant Cell 14:2825–2836
