Complete Structure of the Chloroplast Genome of is Thaliana

Embed Size (px)

Citation preview

  • 8/8/2019 Complete Structure of the Chloroplast Genome of is Thaliana

    1/8

    DN A RESEARCH 6, 283-290 (1999)

    Complete Structure of the Chloroplast Genome of ArabidopsisthalianaShusei SATO, Yasukazu NAKAMURA, Takakazu K A N E K O , Erika ASAMIZU, and Satoshi TABATA*Kazusa DNA Research Institute, 1532-3 Yana, Kisarazu , Chiba 292-0812, Japan(Received 21 May 1999)Abstract

    The complete nucleotide sequence of the chloroplast genome of Arabidopsis thahana has been deter-mined. The genome as a circular DNA composed of 154,478 bp containing a pair of inverted repeats of26,264 bp, which are separated by small and large single copy regions of 17,780 bp and 84,170 bp, respec-tively. A total of 87 potential protein-coding genes including 8 genes duplicated in the inverted repeatregions, 4 ribosomal RNA genes and 37 tRNA genes (30 gene species) representing 20 amino acid specieswere assigned to th e genome on the basis of similarity to the chloroplast genes previously repo rted for otherspecies. The translated amino acid sequences from respective potential protein-coding genes showed 63.9%to 100% sequence similarity to those of the corresponding genes in the chloroplast genome of Nicotianatabacum, indicating the occurrence of significant diversity in the chloroplast genes between two dicot plants.The sequence da ta and gene information are available on the World Wide Web database KAOS (KazusaArabidopsis data Opening Site) at http://www.kazusa.or.jp/arabi/.Key words: Arabidopsis thaliana; chloroplast; genome sequencing

    1. IntroductionThe complete sequences of the chloroplast genomeswere first reported for tobacco1 and liverwort2 in 1986.Since then, the chloroplast genome sequences of a num-ber of land plants and algae have been determined.3^14The complete genome structure of a cyanobacteriumSynechocystis sp. PCC6803, the most primitive plant-type photosynthetic organism, has also been reported.15The accumulation of such data has made it possibleto study the evolutional relationship among the chloro-plast genomes and their ancestors. One notion derivedfrom such study is that there was a massive transferof genes from ancestral organelles to nuclei.16 Compari-son of nuclear and chloroplast genomes at the sequencelevel should provide invaluable information for under-standing of the origin and function of the chloroplast incells. In this respect, Arabidopsis thaliana, an excellentmodel organism for the analysis of the complex biolog-ical processes in plants,17 is the most appropriate ma-terial because entire genome sequencing of this plant isin progress18 '19 by international efforts in which we areinvolved.20 Here we determined the complete sequence ofthe chloroplast genome of this plant and compared withthe those of other chloroplasts reported to date. Struc-

    Communicated by Mituru Takanami* To whom correspondence should be address ed. Tel. +81-43 8-52-3933, Fax. +81-438-52-3934, E-mail: tabata@ kazus a.or.jp

    tural similarity with the genome of a cyanobacteriumSynecohcysitis sp. strain PCC6803 was also investigated.2. Materials and Methods2.1. DNA sources

    The Mitsui PI library of Arabidopsis thalianaColumbia, which has been used for sequencing of thechromosomal genome, was adopted for screening of thechloroplast DNA, as the library had been preparedfrom the whole cellular DNA.21 PI clones harboring thechloroplast genome sequences were isolated by screeningthe library with the following probes derived from thetobacco chloroplast:1 pTB30 (psaB), pTS8 (petB), pPac-nD (ndhD), and psbA-F (psbA), which were provided byDr. M. Sugiura of Nagoya University.2.2. DNA sequencing

    The nucleotide sequence of each PI insert was deter-mined according to the bridging shotgun method de-scribed previously.20 Briefly, the purified P I DNA wassubject to sonication followed by size-fractionation onagarose gel electrophoresis. Fractions of approximately1.0 kb and 2.5 kb were respectively cloned into M13mpl8and to construct the libraries of element and bridgeclones. Clones were propagated on microtiter dishes, andthe supernatants were used for preparation of sequence

    byguestonOctober10,2010

    dnaresearch.oxfordjournals

    .org

    Downloadedfrom

    http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/
  • 8/8/2019 Complete Structure of the Chloroplast Genome of is Thaliana

    2/8

    284 Sequencing of the Arabidopsis thaliana Chloroplast Genome [Vol. 6,

    petB

    84,170psbA1 / 154,478

    IR110,434ndhD ssc

    1 2 8 , 2 1 4

    Figure 1. Structure of the A. thaliana chloroplast genome and thepositions of sequenced PI clones. The outer circle shows theoverall structure of the chloroplast genome consisting of a largesingle-copy region (LSC), a small single-copy region (SSC) andinverted repeat regions (IRA and IRB) represented by thicklines. The positions of the genetic markers which were used forclone selection are indicated outside of the circle, and the re-gions covered by selected PI clones, MAB17, MAH2, and MCI3are indicated by inner arcs. The sequence information was ob-tained from the regions represented by thick lines on the clones.The initial order of the four regions deduced from the sequencedata was LSC-IRA-SSC-IRB by counterclockwise as shown inthis map, but the SSC sequence between IRA and IRB wasinverted to conform to the indication of reported chloroplastsequences and used for the further analyses.

    temp lates. For sequencing the element clones, single-stranded DNAs were prepared from 100 /ul each of phagesupernatants according to standard procedures and useddirectly as templates. Inserts of the bridge clones wereamplified by PCR in the reaction mixture of 20 /ul con-taining 2 jA of the phage su pernata nt, 50 mM KC1,10 mM Tris-HCl (pH 9.0), 0.1% Triton X100, 1.5 mMMgC b, 50 fiM each of dNTPs, 2 units of Taq polymerase(TaKaRa, Japan), and 100 nM each of the following setsof primers:KFw (5'-GGGTTTTCCCAGTCACGAC-3')KRv (5'-TTATGCTTCCGGCTCGTATGTTGTG-3')PCR amplification was performed through 30 cycles ofthe tem peratu re shift consisting of 96 C for 10 sec and70C for 60 sec, followed by the final extension at 70 Cfor 7 min in a PJ9600 thermal cycler. The products weresubjected to purification by polyethylene glycol and usedfor the sequencing reaction.2.3. DNA sequencing and data assembly

    The sequencing reaction was performed using the cy-cle sequencing kits (Dye-primer Cycle Sequencing kitand Dye-terminator Cycle Sequencing kit of PerkinElmer Applied Biosystems, USA) and reaction robots(Catalyst 800 of Applied Biosystems, USA), according

    to the protocol recommended by manufacturers. TheDNA sequencers used were type 373XL and 377XL ofPerkin Elmer Applied Biosystems. The single-pass se-quence data from one end of element clones and bothends of bridge clones were accumulated and assem-bled using Phred-Phrap programs (Phil Green, Univ. ofWashington, Seattle, USA) and the auto-assembler soft-ware of Applied Biosystems, USA.2-4- Computer-assisted data analysis

    The nucleotide sequences were translated in six framesusing the universal codon table, and each frame wassubjected to similarity search against the non-redundantprotein database, owl (release 29), using the BLASTPprogram.22 Positions of each local alignment, whichshowed similarity with scores of 70 or more to knownprotein sequences, were extracted and aligned along thequery sequences. If internal gaps occurred, th e align-ments below the score of 70 were re-searched to fill in thegaps.

    Structural RNA genes were identified by similaritysearch against the structural RNA data set from Gen-Bank with the BLASTN program,22 and defined as theregions with the local alignments showing 80% or moreidentity to the query sequences along 50 bp or more nu-cleotides. For assignment of tRNA genes, the tRNAscan-SE program23 was applied for prediction.

    3. Results and Discussion3.1. Overall structure of A. thaliana chloroplast genomeThe sequence of the chloroplast genome of Arabidop-sis thaliana sp. Columbia could be constructed by as-sembling the sequences of three partially overlappingPI clones. The complete genome finally deduced was154,478 bp in size. The sequences of nucleotide positions38,670-120,256, 120,257-154,478/1-29,018 and 29,019-38,679 were respectively obtained from clones MAB17,MAH2 and MC I3, as shown in Fig. 1. The genomeconsisted of a pair of inverted duplications of 26,264 bp(IRA and IRB) which are separated by long and shortsingle copy regions of 84,170 bp (LSC) and 17,780 bp(SSC). This overall structure of the A. thaliana chloro-plast genome is typical for land plant chloroplasts.24 '20Although the order of the four regions originally con-structed from MAB17 and MAH2 was LSC-IRA-SSC-IRB counterclockwise as shown in Fig. 1, we inverted thedirection of the SSC sequence between IRA and IRB toconform to the indication of previously reported chloro-plast sequences, and the sequence of the structural iso-form, LSC-IRB-SSC-IRA, was used for further analy-ses. The overall A+T content was 63.7%, which is sim-ilar to those of tobacco (62.2%), rice (61.1%) and maize(61.5%). The A+T content of the LSC and SSC regionswere 66.0% and 70.7%, respectively, whereas that of the

    byguestonOctober10,2010

    dnaresearch.oxfordjournals

    .org

    Downloadedfrom

    http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/
  • 8/8/2019 Complete Structure of the Chloroplast Genome of is Thaliana

    3/8

    No. 51 285

    psbi Im Rpsl>K imG"

    imS trnfM psaB

    pshB psbHpsb'I pelB* pel!)*

    ndhft ndhE ncihlpsaC ndhG

    120 13 0

    Gene expressionTranscription ^ |translation ^ |iRNA H

    Im V

    Photosynthetic apparatusPhokJsysiem! ^ ^ MPhoiosyslem II fl|^|Cytochrome b/f ^ ^ ^ ^ ^ ^ |

    Photosynthetic metabolism

    NADHttehydrogciu

    Figure 2. Gene organization of the A. thaliana chloroplast genome. The circular genome of the A. thaliana chloroplast was openedat the junction at IRA and LSC and is represented by a linear map starting from this junction point. The potential protein codingregions are indicated by boxes on both sides of the middle horizontal lines. The genes on the upper side are transcribed from leftto right, and the lower side, from right to left. The putative genes of which the function could be deduced by similarity search areindicated by the gene names. The genes classified into 9 groups according to the biological function are shown by different colorcodes. The intron-containing genes are indicated by asterisks, and the position and the length of the intron is shown by the dottedhorizontal line. The positions of ribosomal and tRNA genes are also shown in the map. The nucleotide sequence of the A. thalianachloroplast genome appears under the accession number AP000423 in the DDBJ/GenBank/EMBL DNA databases.

    IR-regions is 57.7% due to the presence of an rRNA genecluster.The shifts of the border positions between the two in-verted repeat regions (IRA and IRB) and two single copyregions (LSC and SSC) have been observed among var-ious chloroplast species.26~29 To evaluate the differenceof the IR lengths in the chloroplast genomes between A.thaliana (26,264 bp) and tobacco (25,339 bp), the ex-act IR border positions were compared with respect tothe adjacent genes between two species. Whereas a verysmall shift (2 bp) was observed for the junction of IRAand LSC, larger shifts were present at other three junc-tions. The same tendency was seen in the po sitions of theIR border between rice and maize chloroplast genome.5In A. thaliana, the junction between LSC and IRB is lo-

    cated within the rpsl9 gene, and the junction betweenIRB and SSC is within the ndhF gene. In tobacco, thesetwo genes are located in the single copy regions.3.2. Structural features of the putative protein-codinggenes

    The potential protein-coding regions were deduced asdescribed in Materials and Methods, and the positionsof a total of 87 genes including 79 unique gene speciesand 8 duplicated genes in the inverted repeat regionswere localized on the map (Fig. 2). The predicted aminoacid sequences of th e A. thaliana chloroplast genes werethen compared to those in the completely sequenced plas-tid and cyanobacterial genomes (Table 1). All of themshowed the highest identity to those of tobacco, although

    byguestonOctober10,2010

    dnaresearch.oxfordjournals

    .org

    Downloadedfrom

    http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/
  • 8/8/2019 Complete Structure of the Chloroplast Genome of is Thaliana

    4/8

    286 Sequencing of the Arabidopsis thaliana Chloroplast Genome [Vol. 6.

    Table 1. List of the potential protein-coding genes assigned in the A. thaliana chloroplast genome and identity with the orthologousgenes. The translated amino acid sequences of 79 assigned genes were compared with those of the corresponding genes in thegenomes of Nicotiana tabacum, Zea mays, Oryza sativa, Marcha ntia polymorpha, Pmus thunbergii, Epifagus virginiana, Euglenagracilis, Cyanophora paradoxa, Odontella sinensis, Porphyra purpurea, and Synechocystis sp. PCC6803.

    Gene expressionTranscriptionrpoAwo B

    rpoClrpo2Translan>l2rp!14

    rpU6n>120rp!22rp!23w\32rp!33rpi36ws2rps3rps4jps7rpsg

    rpsllrpsl2wsl4rpsl5rvsl6rpslSrpsl9

    79.60%92.00%91.10%74.60%

    'Jon96.90%90.20%90.30%82.90%72.30%95.70%84.60%86.40%100.00%89.40%87.60%89.10%97.40%84.30%88.40%95.10%90.20%86.40%7850%86.10%88.00%

    68.00%79.10%79.10%6950%

    Zmmtyt68.00%79.70%85.60%68.10%57.70%87.80%64.90%74.20%91 90%77.00%67.70%79.10%84.00%75.70%65.00%85.40%79.70%81.10%75.90%67.30%64.90%

    OryxtnUv,

    67.10%79.60%79.30%61.20%

    87.40%83.70%85.60%68.10%55.90%85.90%64 90%74.20%91.90%77.90%63.20%80.10%83.30%75.70%65.70%

    L 84.60%83 70%74.40%58.60%67.30%64.90%

    Mmbana a polym opba53.30%68.60%67.70%44.90%

    jure***. ,*jiar0009nniOCO!

    72.7%80.:%42.0%2 5 . : %81.3%6 7 5 %6 0 5 %53.8%65.0%52.4%58.0%5 7 5 %41.8%6.9.3%67 5%53 8%67.7%81.0%

    .111326* 1 3 rirl33OIIUW. . 0 6 I S|z7 . , i i n i3 .^ o i3 i . .n : i i

    .1X26 .111731 >0e44

    ^10521HCX1dlC520df!281rfrl20dKOX

    OthersaccDclpPmatKycflvcf2vcDvcf4vcf5vcf6ycf?yc(9

    VcflOorf76

    Mcebm Macm65.7%84.7%63.9%81.1%88.7%100.0%88.0%65.9%100.0%93.5%95.2%79,5%76.0%

    68.0%53.2%

    52.7%93.7%8 0 5 %65.0%96.6%83.9%83.9%62.0%53.4%

    47.3%66.5%52.5%

    92.1%80.0%65.3%100.0%83.9%90.3%59.0%

    March Boa pc VMcrpht

    70.2%74.0%33.9%31.0%19.2%87.2%64.3%53 1%86.2%67.7%83.9%49.4%

    Pumlbanbtrjii

    62.8%61.3%43.2%394%26.4%86.4%71.2%52.6%89.7%69.Q%75.8%-56.4%

    Epdtgnr \ffgmnnt

    57.2%84.2%47. C% i33.9%58.7%

    35.8%

    30.6%

    48.3%

    38.6%

    74.2%41.0%41.4%79 3%41.9%

    dtoat.lf..

    574%38.3%41.6%69.0%355%31.4%

    Porftoynprpum

    52.6%

    66.9%47.5%38.2%69 0%38.7%47.5%30.4%

    SynachocyiOi

  • 8/8/2019 Complete Structure of the Chloroplast Genome of is Thaliana

    5/8

    No. 5] S. Sato et al. 287

    Alynoacyl D domain Anticodon domain -stem stem loop stem stem loop stem

    Variable region TPsyC domain AmTnoacylstem loop stem stem

    LS Ctrn/1-GVG - 4. . 76GCOGATG TA OCC AAGTGGATTAA GGC A GTGGA TTGTGAA TTCACtmt-mu- 1717. .1761. Mi l. . 4347GGGTTGC TA ACTC AACGGTA OAGT A CTCOG CTTTTAA CCSACtraQ-OUG- 6 6 1 6 . 6 6 8 7TCGGG CG TA GCC AAGCGGTAA GCC A ACGGG TrTTGGT CCCGCtrnS-OCV- 7786. .7872GGAGAGA TS GCT GAGTCGACTAA ACC G TTGGA TTGCTAA TCCATtroS-VCC* 8646. .8668. 9383. . 9431GCGGGTA TA GTTT ACTGGTAA AACC C TTAGC CTTCCAA GCTAAtroP-VCff* 9690..9661GCGTCCA TT CTCT AATGGATA GGAC A TAGGT CTTCTAA ACCTTtrnC-lX* 27373. .27443GGCGGCA TG GCC GAGTCGTAA GGC G GGGGA CTCCAAA TCCTTtroD-GOC - 29801. . 29874

    G0GATT5 TA GTTC AATTGOTCA GAGC A CCGCC CTSTCAA GGCGCtrnY-CtH - 30323. . 30406OGGTCGA TS CCCG AGCGGTTAA TSGG G ACGGA CTGTAAA TTCGTtrnT-BBC - 30466. . 30638OCCCCCA TC GTCT AGTGGTTCA GGAC A TCTCT CTTTCAA GGAGGtmT-OSe* 31369. .314406CCCTTT TA ACTC AOTOGTA OAGT A ACGCC ATGGTAA GGCOTtrnS-OXi - 36312. .36403SGAGASA TG GCCG ASTSGTTGA TGGC T CCGOT CTTGAAA ACCGGtmff-OX'* 36490. .36660

    GCCGATA TA GT CGAATGGTAAA AT T TCTCC TTGCCAA GGAGAtrntB-CiU- 36704. .36777CGCGGGG TA GAGC AGTTTGGTA GCTC G CAAGG CTCATAA CCTTGtrnS-OGt * 4 4 8 2 7 . . 4 4 9 1 3GGAGAGA TG GCC GAGTGGTTGAA GGC G TAACA TTGGAAC TGTTAtrnT-VGO- 46213. .46286GCCCGCT TA GCTC AGAGGTTA GAGC A TCGCA TTTOTAA TGCGAtroL-au 4 6 8 9 4 . . 4 6 9 2 8 4 7 4 4 1 . . 4 7 4 9 0GGOGkTk TC GCO GliTTOOTkOk. CGC T kCGGk C TTJ J J J L TCCOTtroF-GU * 48176. . 48247GCCGGGA TA GCTC AGTTGGTA GAGC A GAGGA CTGAAAA TCCTCtrnY-OtC - 61199. . 61233 61833. . 61871AGGGCTA TA GCTC AGTTAGGTA GAGC A CCTCG TTTACAC CGAGCtrnU-OW 62056. . 62128ACCTACT TA ACTC ACTGGTTAtrnV-CCi - 66229. . 66302ACGCTCT TA GTTC AGTTCGGTAtmP-OSS - 66490. . 66663AGGGATG TA GCGC AGCTTGGTA

    CATCTAGTTTATTCTGTACGAGTTAATCGTACCCGATTGGTTTTCAAGCTTGOCAATATGTCTACCAGCAAGTCTATAGTTCATAAAAATAACTATCAGACAGGTCTO TA G A C nTTG nTA C CTGGTC

    GAGT A TTGCT TTCATAC GSCAGGAAC G TGGGT CTCCAAA ACCCAGCGC G TTTGT TTTGGGT ACAAA

    GTGTCAGGTCGAGTCATGTCATGTC

    GCGGG TTCAATT CCCGT CGTTCGC CCCGGG TTCGAGT CCCGG GCAACCC AGGAGG TTCGAAT CCTTC CGTCCCA GGAGGG TTCGAAT CCCTC TCTTTCC CGCGGG TTCGATT CCCGC TACCCGC TATAGG TTCAAAT CCTAT TSGACGC ACCCAG TTCAAAT CCGOG T5CCGCC TGCOGG TTCGAGC CCCGT CAGTCCC GGCTGG TTCAAAT CCAGC TCGGCCC AGGGGA TTCOACT TCCCC TGGGGGT AATCGG TTCAAAT CCGAT AAGGGGC TGAGGG TTCGAAT CCCTC TCTCTCC TGCOGO TTCGATT CCCGC TATCCOC CACGGG TTCAAAT CCTGT CTCCGCA AOAGOG TTCGAAT CCCTC TCTTTCC GATCGO TTCGATT CCGAT AGCCGGC TOkGGG TTC kiG T CCCTC TirtCCCC kACCAG TTCAAAT CTGGT TCrTGGC ATACGG TTCGAGT CCGTA TAGCCCT AATTGG TTCAAAT CCAAT AGTAGGT AGTAGG TTCAAAT CCTAC AGAGCGT GACGGG TTCAAAT CCTGT CATCCCT A

    IBtrnl-CMk - 8 6 3 1 2 . 8 6 3 8 6 ; IGCATCCA TG GCT GAATGGTTAAtrnL-Ciik - 94276. .94366; IGCCTTGG TG GTG AAATGGTAGAtrnr-GiCk * 100709. . 100760.AGGGATA TA ACTC AGCGGTAtrnl-GAUk * 102801. . 102837.GGGCTAT TA GCTC AGTGGTAtrnA-lASCk * 103665. .103702,GGGGATA TA GCTC AGTTGGTAtrnP-ACGk * 108302. .108375.GGGCTTG TA GCTC AGAGGATTAtrnS-OOnk - 109084. .109013,TCCTCAG TA GCTC AGTGGTA

    I * 1622 64. . 162337AGC G CCCAA CTCATAA TTGGCG AATTCI * 144293. .144373CAC G CGAGA CTCAAAA TCTCG TGCTAAAGAGCGTB - 137 940. . 137869GAGT G TCACC TTGACGT GGTGG AAGTC

    103567. .103 601 ; B - 135048. . 135082 1 35812. . 136846GAGC G CGCCC CTCATAA GGGCG AGGTC1 0 4 5 0 4 . . 1 0 4 6 3 8 ; B - 1 3 4 1 1 1 . . 1 3 4 1 46 1 3 4 9 4 7 . . 1 3 4 9 8 4GAGC T CCGCT CTTCCAA GOCGG ATGTCB - 13 034 7.. 130274GAGC A CGTGG CTACGAA CCACG GTCTCB 129565. . 129636GAGC G GTCGG CTGTTAA CTGAT T3GTC

    GTAGG TTCAATT CCTAC TGGATGC AGGAGG TTCGAGT CCTCT TCAAG GC AATCAO TTCGAGC CTGAT TATCCCT ATCTCG TTCAAAT CCAGG ATGOCCC AAGCGG TTCGAGT CCGCT TATCTCC AGGGGG TTCGAAT CCCTC CTCGCCC AGTAGG TTCGAAT CCTAC TTGGGGA G

    ssctrnL-lMG * 114270. .114349GCCGCTA TG GTG AAATTGGTAGA CAC G CTCCT CTTAGGA AGCAG TGCTAGAGCAT CTCGG TTCGAGT CCGAG TAGCGGC A

    Figure 3. Structure of the tRNA genes in the A. thaliana chloroplast genome. The nucleotide sequences, nucleotide positions in thegenome and the structural domains for the 37 tRNA genes are tabulated.Table 2. The codon-anticodon recognition pattern and codon usage for the A. thaliana chloroplast genome. Numerals indicate thefrequency of usage of each codon in 22,978 codon s in 79 species of potential protein coding genes.

    trnF-GAAUUU F 979UUC F 415UUA L 872 twL-UAADU G L 440 fmL-CAA

    UCU S 495UCC S 248UCA S 336UCG S 163

    tmS-GGA

    (mS-UGA

    UAU Y 704UAC Y 148UAA - 49UAG - 19

    (rnV-GUA tmC-GCAUGU C 203UGC C 68UGA - 12UGG W 400 mW-C C A

    CUU L 504CUC L 154CUA L 320CUG L 139

    tmL-UAGCCU P 373CCC P 171CCA P 259CCG P 117

    trnP-UGG

    CAU HCAC HCAA Q

    394128652

    CAG Q. 170

    tmH-GUG

    tmO-UUG

    CGU RCGC RCGA RCGG R

    299103313100

    tmff-ACG

    AUU I 1024AUC I 341AUA I 632

    tm/-GAUtrnl-CAU

    ACU T 481

    AUG M 519 ^ M . - g A UGUU V 483GUC V 152GUA V 455GUG V 176

    trnV-GAC(mV-UAC

    ACCACA

    T 216T 375

    (mT-GGU

    ACG T 115GCU A 594GCC A 186GCA A 342GCG A 132

    imr-UGU

    tmA-UGC

    AAU N 852AAC N 257AAA K 1016AAG K 270

    (mN-GUU

    imfC-UUU

    GAU D 704GAC D 166GAA E 944GAG E 270

    tmD-GUCtmE-UUC

    AGU SAGC SAGA R

    GGU GGGC GGGA G

    35995

    380AGG R 127

    534149637GGG G 248

    tmS-CCU

    trnfMJCU

    tmG-GCCfmG-UCC

    byguestonOctober10,2010

    dnaresearch.oxfordjournals

    .org

    Downloadedfrom

    http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/
  • 8/8/2019 Complete Structure of the Chloroplast Genome of is Thaliana

    6/8

    288(a )

    Sequencing of the Arabidopsis thaliana Chloroplast Genome(b)

    [Vol. 6,

    i i i

    (c )! ! ! ! 11

    / / / / / / / ^n i ii 1 i II in iWlfl O - N M *

    i iliii i i i i i i!S t 2 2 8 5 Kt CO CO 00 00 0 00! i i i i i i %

    (d) (e)f i l l11/ / /T 1M IsfiSlfiSi

    i i

    Figure 4. Structural comparison of gene clusters between the genomes of the A. thaliana chloroplast and the cyanobacterium Syne-chocystis sp. PCC6803. The clusters of the functionally related genes in A. thaliana chloroplast genome (upper) were compared withthe corresponding regions in the cyanobacterial genome (lower). Genes are represented by open boxes on upper (transcribed from leftto right) or lower (transcribed from right to left) side of the middle horizontal lines for each genome. The gene clusters compared are:(a) rpoB-rpoCl-rpoC2, (b) rpsl2-rps7, (c) rpl23-rpl2-rpsl9-rpl22-rps3-rpll6-rpll4-rps8-rpl36-rpsU -rpoA, (d) atpI-atpH-atpF-atpA,(e) ndhH-ndhA-ndhl-ndhG-ndhE-psaC-ndhD, and (f) psbB-psbT-psbN-psbH.

    the value varied from 64% to 100% depending on the genespecies, indicating that significant diversity was gener-ated between the two dicot plants. To obtain informa-tion on the relationship between gene function and di-vergence, comparison was made by dividing the identi-

    fied genes into the following three functional categories:genes related to gene expression, photosynthetic appa-ratus and photosynthetic metabolism. The genes classi-fied into gene expression varied from 72% to 100% (aver-age identity: 87.9%), those classified into photosynthetic

    byguestonOctober10,2010

    dnaresearch.oxfordjournals

    .org

    Downloadedfrom

    http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/
  • 8/8/2019 Complete Structure of the Chloroplast Genome of is Thaliana

    7/8

    No. 5] S. Sato et al. 289metabolism from 76% to 100% (average identity: 89.7%),and those classified into photosynthetic apparatus from82.0% to 100% (average identity: 97.0%). It is apparentthat those of the former two gene categories are moredivergent. Sequence conservation was also observed withthe genes in the cyanobacterial genome (25.2% to 90.1%:average identity 59.6%) with the highest values in thecategory of photosynthetic apparatus (47.5% to 90.1%:average identity 70.4%).3.3. Structural features of putative RNA-coding genes

    The A. thaliana chloroplast genome contained twocopies of ribosomal RNA gene clusters (16S - 23S - 4.5S -5S) in the two inverted repeat regions (Fig. 2). Each clus-ter was intervened by two tRNA genes, trnl and trnA,in the 16S and 23S spacer. The sequence identities withthose of tabacco chloroplast were 98-100% in the codingregions and 92% on avarage in the spacer regions.A total of 37 tRNA genes (30 gene species) represent-ing 20 amino acid species were identified in the genomeby similarity search and computer prediction. The nu-cleotide sequences of the gene products and the posi-tions are summarized in Fig. 3. Six of the 30 tRNAgene species, trnK-UUU, irnG-UCC, imL-UAA, trriV-UAC, trnl-GAU, and trnA-XJGG, contained intervenedsequences of 513-5,520 bp long at either the anticodonstem, the anticodon loop, or D-stem.On the basis of the structural information derived fromthe entire protein and tRNA gene constituents of thegenome, the frequency of codon usage and the recogni-tion patterns of the codons and the corresponding an-ticodons were deduced (Table 2). No significant codonusage bias was observed. Thirty tRNA species can suffi-ciently recognize all the codons used in the genome exceptfor trnR-ACG, where only one species of trriR was foundfor four kinds of the codons with different third letters.One possible explanation is that only the first two lettersof the codons are recognized by tRNA. Alternatively, itcould be that the corresponding tRN As are supplied fromthe nuclear genome.3-4- Genom e structure in com parison with cyanobac-teriuvn

    Seventy-four out of 79 genes in the A. thaliana chloro-plast genome were commonly found in the Synechocystisgenome (Table 2). The structures of 13 clusters consist-ing of two or more adjoining genes with rela ted functionsin the chloroplast genome were compared with those ofthe corresponding regions in the cyanobacterial genome.The number and relative positions of the genes were con-served in 7 small clusters: rpl33-rpsl8, atpB-atpE, ndhC-psbG-ndhJ, psaA-psaB, psbD-psbC, psbE-psbF-psbL-psbJ,and petB-petD. Deletion, addition, or rearrangement ofthe genes in either genome were observed in 6 clusters:rpoB-rpoCl-rpoC2, rpsl2-rps7, rpl23-rpl2-rpsl9-rpl22-

    rps3-rpll6-rpll4-rps8-rpl36-rpsll-rpoA, atpI-atpH-atpF-atpA, ndhH-ndhA-ndhl-ndhG-ndhE-psaC-ndhD, psbB-psbT-psbN-psbH, as shown in Fig. 4. Limited conser-vation of gene organization among the genomes of liv-erwort chloroplast, Escherichia coli, and Synechococcushas also been noted.30 These observations would not onlyprovide evidence supporting the endosymbiotic theory inwhich ancestral photosynthetic prokaryotes of cyanobac-teria are the origin of plant chloroplasts, but also suggestthat gene shuffling took place during the establishmentof the cyanobacterium species.

    Acknowledgments: We thank S. Sasamoto, K.Kawashima, S. Shinpo, A. Matsuno, A. Watanabe, M.Yasuda, M. Yatabe, and M. Yamada for excellent techni-cal assistance. We are grateful to Mitsui Plant Biotech-nology Research Institute, the Research Institute of In-novative Research for the Earth, the Arabidopsis Biolog-ical Resource Center at Ohio State University, and Dr.M. Sugiura for providing the DNA libraries and the DNAmarkers. This work was supported by the Kazusa DNAResearch Institute Foundation.R e f e r e n c e s

    1. Shinozaki, K., Ohme, M., Tanaka, M. et al. 1986, Thecomplete nucleotide sequence of the tobacco chloroplastgenome: its gene organization and expression, EMBO J.,5, 2043-2049.2. Ohyama, K., Fukazawa, H., Kohchi, T. et al. 1986,Chloroplast gene organization deduced from complete se-quence of liverwort Marchantia polymorpha chloroplastDNA, Nature, 322, 572-574.3. Hiratsuka, J., Shimada, H., Whittier, R. et al. 1898, Thecomplete sequence of the rice (Oryza sativa) chloroplastgenome: intermolecular recombination between distincttRNA genes accounts for a major plastid DNA inversionduring the evolution of the cereals, Mol. Gen. Genet.,217, 185-194.4. Wolfe, K. H., Morden, C. W., and Palmer, J. D. 1992,Function a nd evo lution of a minimal plastid genome froma nonphotosynthetic parasitic plant, Proc. Natl. Acad.Sci. USA, 89, 10648-10652.

    5. Maier, R. M., Neckermann, K., Igloi, G. L. et al. 1995,Complete sequence of the maize chloroplast genome:Gene content, hotspots of divergence and fine tuning ofgenetic information by transcript editing, J. Mol. Biol,251, 614-628.6. W akasugi, T., T sudzu ki, J., Ito, S. et al. 1994, Loss of allndh genes as determined by sequencing the entire chloro-plast genome of the black pine Pinus thunbergii, Proc.Natl. Acad. Sci. USA, 9 1 , 9794-9798.

    7. Wakasugi, T., Nishikawa, A., Yamada, K. et al. 1998,Com plete nucleotide sequence of the plastid genome froma fern, Psilotum nudum, Endocytobiosis Cell Res., 13(Suppl.), 147.8. Wakasugi, T., Nagai, T., Kapoor, M. et al. 1997, Com-plete nucleotide sequence of the chloroplast genome fromthe green alga Chlorella vulgaris: The existence of genespossibly involved in chloroplast division, Proc. Natl.

    byguestonOctober10,2010

    dnaresearch.oxfordjournals

    .org

    Downloadedfrom

    http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/
  • 8/8/2019 Complete Structure of the Chloroplast Genome of is Thaliana

    8/8

    290 Sequencing of the Arabidopsis thaliana C hloroplast Genome [Vol. 6,Acad. Sci. USA, 94, 5967-5972.9. Hallick, R. B., Hong, L., Drager, R. G. et al. 1993, Com-plete sequence of Euglena gracilis chloroplast DNA, Nu-cleic Acid Res., 21, 3537-3544.

    10. Reith, M. and Munholland, J. 1995, Complete nucleotidesequence of the Prophyra purpurea chloroplast genome,Plant. Mol. Biol. Rep., 13, 333-335.11. Kowallik, K. V., Stoeb e, B., Schaffran, I. et al. 1995, Thechloroplast genome of a chlorophyll a+c-containing alga,Odontella sinensis, Plant. Mol. Biol. Rep. 13, 336-342.12. Stirewalt, V . L., Michalowski, C. B. , Loffelhardt, W. etal . 1995, Nucleotide sequence of the cyanelle genome fromCyanophora paradoxa, Plant. Mol. Biol. Rep., 13, 327-9332.13. Ohta, N., Sato, N., and Kuroiwa, T. 1998, Analysis ofthe p lastid genome of protoflorideophyceous algae Cyani-dioschzon merolae, Plant. Cell Physioi, 39 (Suppl.), s54.14. Douglas, S. E. and Penny, S. L. 1999, Th e plastid genomeof the cryptophyte alga, Guillardia theta: Complete se-quence and conserved synteny groups confirm its commonancestry with red algae, J. Mol. Evol, 48, 236-244.15. Kaneko, T., Sato, S., Kotani, H. et al. 1996, Sequenceanalysis of the genome of the unicellular cyanobacteriumSynechocystis sp. strain PC C6803. II. Sequence determi-nation of the entire genome and assignment of potentialprotein-coding regions, DNA Res., 3, 109-139.16. Martin, W., Stobe, B., Goremykin, V. et al. 1998, Genetransfer to the nucleus and the evolution of chloroplasts,Nature, 393, 162-165.17. Meyerowitz, E. M. and Somervill, C. R. (eds) 1994, Ara-bidopsis, Cold Spring Harbor Laboratory Press.18. Kaiser, J. 1996, First global sequencing effort begins, Sci-ence, 274, 30.19. Meinke, D. W., Cherry, J. M., Dean, C. et al. 1998, Ara-bidopsis thaliana: A model plant for genome analysis,Science, 282, 662-682.20. Sato, S., Kotani, H., Nakamura, Y. et al. 1997, Struc-

    tural analysis of Arabidopsis thaliana chromosome 5. I.Sequence features of the 1.6 Mb regions covered by twentyphysically assigned PI clones, DNA Res., 4, 215-230.21. Liu, Y.-G., Mitsukawa, N., Vazquez-Tello, A., andWhittier, R. F. 1995, Generation of a high-quality PIlibrary of Arabidopsis suitable for chromosome walking,Plant J., 7, 351-358.22. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., andLipman , D. J . 1990, Basic local alignment search tool, J.Mol. Biol., 215, 403-410.23. Lowe, T. M. and Eddy, S. R. 1997, tRNAscan-SE: a pro-gram for improved detection of transfer RNA genes ingenomic sequence, Nucl. Acids Res., 25, 955-964.24. Sugiura, M. 1992, The chloroplast genome, Plant. Mol.Biol., 19 , 149-168.25. Sugiura, M., Hirose, T., and Sugita, M. 1998, Evolutionand mechanism of translation in chloroplasts, Annu. Rev.Genet, 32, 437-459.26. Sugita, M., Kato, A., Shimada, H. et al. 1984, Sequenceanalysis of the junctions between a large inverted repeatand single-copy regions in tobacco chloroplast DNA, Mol.Gen. Genet, 194, 200-205.27. Moon, E. and Wu., R. 1988, Organization and nucleotidesequence of genes at both junctions between the two in-verted repe ats and the large single-copy region in the ricechloroplast genome, Gene, 70, 1-12.28. Prombona, A. and Subramanian, A. R. 1989, A new rear-rangement of angiosperm chloroplast DNA in Rye (Secalecereale) involving translocation and duplication of the ri-bosomal rpS15 gene, J. Biol. Chem., 264, 19060-19065.29. Maier, R. M., Dory, I., Igloi, G. et al. 1990, The ndhHgenes of gramminean plastomes are linked with the junc-tions between small single copy and inverted repeat re-gions, Curr. Genet, 18 , 245-250.30. Ozeki, H., Ohyam a, K., Fukuzawa, H. et al. 1987, Geneticsystem of chloroplasts, Cold Spring Harbor Syrup. Quant.Biol., 52, 791-804.

    byguestonOctober10,2010

    dnaresearch.oxfordjournals

    .org

    Downloadedfrom

    http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/http://dnaresearch.oxfordjournals.org/