14
Copyright 0 1995 by the Genetics Society of America Complete Sequence and Gene Organization of the Mitochondrial Genome of the Land Snail Albinuria cornlea Evi Hatzoglou, George C. Rodakis and Rena Lecanidou Department of Biochemistry, Cell and Molecular Biology, and Genetics, University of Athens, Panepistimiopolis, Athens 157 01, Greece Manuscript received January 31, 1995 Accepted for publication May 15, 1995 ABSTRACT The complete sequence (14,130 bp) of the mitochondrial DNA (mtDNA) of the land snail Ahinaria coerulea was determined. It contains 13 protein, two rRNA and 22 tRNA genes. Twenty-four of these genes are encoded by one and 13 genes by the other strand. The gene arrangement shares almost no similarities with that of two other molluscs for which the complete gene content and arrangement are known, the bivalve Mytilus edulis and the chiton Kathanna tunicata; the protein and rRNA gene order is similar to that of another terrestrial gastropod, Cepaeu nemoralis. Unusual features include the following: (1) the absence of lengthy noncoding regions (there are only 141 intergenic nucleotides interspersed at different gene borders, the longest intergenic sequence being 42 nucleotides), (2) the presence of several overlapping genes (mostly tRNAs), (3) the presence of tRNA-like structures and other stem and loop structures within genes. An RNA editing system acting on tRNAs must necessarily be invoked for posttranscriptional extension of the overlapping tRNAs. Due to these features, and also because of the small size of its genes (e.g., it contains the smallest rRNA genes among the known coelomates), it is one of the most compact mitochondrial genomes known to date. I N the last few years, there has been an accelerated accumulation of sequence data on animal mitochon- drial genomes. Most of the complete mitochondrial DNA (mtDNA) sequences that have been published concern deuterostomes [vertebrates and echino- derms: human (ANDERSON et al. 1981), mouse (BIBB et al. 1981), rat (GADALETA et al. 1989), cow (ANDER- SON et al. 1982), fin whale (ARNASON et al. 1991), blue whale (ARNASON and GULLBERG 1993), harbor seal, Phoca vitulina (ARNASON andJOHNsSON 1992),Ameri- can opossum, Didelphis virgzniana UANKE et al. 1994), chicken (DESJARDINS and MORAIS 1990), Xenopus (ROE et al. 1985), carp (CHANG et al. 1994), sea ur- chins Paracentrotus lividus (CANTATORE et al. 1989) and Strongylocentrotus purpuratus (JACOBS et al. 1988), sea star, Asterina pectinijiera (AsAKAWA et al. 1991)], while the available sequences for protostome coelo- mates are limited to four arthropods [Drosophila yalzuba (CLARY and WOLSTENHOLME 1985a), bee (CRO- ZIER and CROZIER 1993), mosquito, Anopheles quadrima- culatus (MITCHELL et al. 1993), Artemia (PEREZ et al. 1994, EMBLdata bank accession number X69067)l and one mollusc [the chiton Katharina tunicata (BOORE and BROWN 1994b)l. Within the phylum of molluscs, con- siderable information is also availablefor Mytilus edulis, whose partial sequence but complete mtDNA gene con- tent and organization is known (HOFFMAN et al. 1992), Cmespondingauthw: Rena Lecanidou, Department of Biochemistry, Cell and Molecular Biology, and Genetics, University of Athens, Panepistimiopolis, Athens 157 01, Greece. E-mail: [email protected] Genetics 140 1353-1366 (August, 1995) as well as Cqbuea nemoralis, whose partial gene order (excluding tRNA genes) has been published (TERREIIT et al. 1994). The mtDNA gene content of coelomate metazoans is constant: it consists of 13 protein genes, two genes for the small and large ribosomal RNA subunits, and 22 tRNA genes, some of which are transcribed from one and some from the other mtDNA strand. An excep tion is observed in molluscs. Mytilus is missing a protein gene (ATPuse8), while it contains an extra tRNA gene and has all of its genes transcribed from one strand (HOFFMANN et al. 1992); Katharina, on the other hand, contains the standard set of 37 mitochondrial genes, as well as two extra tRNA genes, that may or may not be functional (BOORE and BROWN 1994b). Besides coding regions, there are also noncoding sequences in animal mtDNA and, more specifically, a major noncoding segment, which in deuterostomes (D- loop) contains a combination of sequence elements that are related with control of both replication and transcription (JACOBS et al. 1988; CLAYTON 1991, 1992; WOLSTENHOLME 1992; SHADEL and CLAY~ON 1993). The length of this region is extremely variable [121 nucleotides (nt), sea urchin (JACOBS et al. 1988) ; over 20 kb, pine weevil (BOYCE et al. 1989)l. The length variation is usually due to the presence of short (e.g., 10 bp) (GHMZZANI et al. 1993) or longer [e.g., 260 bp (BROUGHTON and DOWLINC 1994) or 1.2 kb (GJETVAJ et al. 1992)] repeated sequences. Size variation in non- coding sequences inevitably results in total length varia- tion, which can be observed even at the level of individ-

Complete Sequence and Gene Organization of the Mitochondrial … · 2002. 7. 5. · Accepted for publication May 15, 1995 ABSTRACT The complete sequence (14,130 bp) of the mitochondrial

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Complete Sequence and Gene Organization of the Mitochondrial … · 2002. 7. 5. · Accepted for publication May 15, 1995 ABSTRACT The complete sequence (14,130 bp) of the mitochondrial

Copyright 0 1995 by the Genetics Society of America

Complete Sequence and Gene Organization of the Mitochondrial Genome of the Land Snail Albinuria cornlea

Evi Hatzoglou, George C. Rodakis and Rena Lecanidou

Department of Biochemistry, Cell and Molecular Biology, and Genetics, University of Athens, Panepistimiopolis, Athens 157 01, Greece Manuscript received January 31, 1995

Accepted for publication May 15, 1995

ABSTRACT The complete sequence (14,130 bp) of the mitochondrial DNA (mtDNA) of the land snail Ahinaria

coerulea w a s determined. It contains 13 protein, two rRNA and 22 tRNA genes. Twenty-four of these genes are encoded by one and 13 genes by the other strand. The gene arrangement shares almost no similarities with that of two other molluscs for which the complete gene content and arrangement are known, the bivalve Mytilus edulis and the chiton Kathanna tunicata; the protein and rRNA gene order is similar to that of another terrestrial gastropod, Cepaeu nemoralis. Unusual features include the following: (1) the absence of lengthy noncoding regions (there are only 141 intergenic nucleotides interspersed at different gene borders, the longest intergenic sequence being 42 nucleotides), (2) the presence of several overlapping genes (mostly tRNAs), (3) the presence of tRNA-like structures and other stem and loop structures within genes. An RNA editing system acting on tRNAs must necessarily be invoked for posttranscriptional extension of the overlapping tRNAs. Due to these features, and also because of the small size of its genes (e.g., it contains the smallest rRNA genes among the known coelomates), it is one of the most compact mitochondrial genomes known to date.

I N the last few years, there has been an accelerated accumulation of sequence data on animal mitochon-

drial genomes. Most of the complete mitochondrial DNA (mtDNA) sequences that have been published concern deuterostomes [vertebrates and echino- derms: human (ANDERSON et al. 1981), mouse (BIBB et al. 1981), rat (GADALETA et al. 1989), cow (ANDER- SON et al. 1982), fin whale (ARNASON et al. 1991), blue whale (ARNASON and GULLBERG 1993), harbor seal, Phoca vitulina (ARNASON andJOHNsSON 1992), Ameri- can opossum, Didelphis virgzniana UANKE et al. 1994), chicken (DESJARDINS and MORAIS 1990), Xenopus (ROE et al. 1985), carp (CHANG et al. 1994), sea ur- chins Paracentrotus lividus (CANTATORE et al. 1989) and Strongylocentrotus purpuratus (JACOBS et al. 1988), sea star, Asterina pectinijiera (AsAKAWA et al. 1991)], while the available sequences for protostome coelo- mates are limited to four arthropods [Drosophila yalzuba (CLARY and WOLSTENHOLME 1985a), bee (CRO- ZIER and CROZIER 1993), mosquito, Anopheles quadrima- culatus (MITCHELL et al. 1993), Artemia (PEREZ et al. 1994, EMBLdata bank accession number X69067)l and one mollusc [the chiton Katharina tunicata (BOORE and BROWN 1994b)l. Within the phylum of molluscs, con- siderable information is also available for Mytilus edulis, whose partial sequence but complete mtDNA gene con- tent and organization is known (HOFFMAN et al. 1992),

Cmespondingauthw: Rena Lecanidou, Department of Biochemistry, Cell and Molecular Biology, and Genetics, University of Athens, Panepistimiopolis, Athens 157 01, Greece. E-mail: [email protected]

Genetics 140 1353-1366 (August, 1995)

as well as Cqbuea nemoralis, whose partial gene order (excluding tRNA genes) has been published (TERREIIT et al. 1994).

The mtDNA gene content of coelomate metazoans is constant: it consists of 13 protein genes, two genes for the small and large ribosomal RNA subunits, and 22 tRNA genes, some of which are transcribed from one and some from the other mtDNA strand. An excep tion is observed in molluscs. Mytilus is missing a protein gene (ATPuse8), while it contains an extra tRNA gene and has all of its genes transcribed from one strand (HOFFMANN et al. 1992); Katharina, on the other hand, contains the standard set of 37 mitochondrial genes, as well as two extra tRNA genes, that may or may not be functional (BOORE and BROWN 1994b).

Besides coding regions, there are also noncoding sequences in animal mtDNA and, more specifically, a major noncoding segment, which in deuterostomes (D- loop) contains a combination of sequence elements that are related with control of both replication and transcription (JACOBS et al. 1988; CLAYTON 1991, 1992; WOLSTENHOLME 1992; SHADEL and CLAY~ON 1993). The length of this region is extremely variable [121 nucleotides (nt), sea urchin (JACOBS et al. 1988) ; over 20 kb, pine weevil (BOYCE et al. 1989)l. The length variation is usually due to the presence of short (e.g., 10 bp) (GHMZZANI et al. 1993) or longer [e.g., 260 bp (BROUGHTON and DOWLINC 1994) or 1.2 kb (GJETVAJ et al. 1992)] repeated sequences. Size variation in non- coding sequences inevitably results in total length varia- tion, which can be observed even at the level of individ-

Page 2: Complete Sequence and Gene Organization of the Mitochondrial … · 2002. 7. 5. · Accepted for publication May 15, 1995 ABSTRACT The complete sequence (14,130 bp) of the mitochondrial

1354 E. Hatzoglou, G. C. Rodakis and R. Lecanidou

uals (reviewed by MORITZ et al. 1987). Extremes in size variation have been documented in molluscs: seven species of scallops have sizes ranging from 16.2 to 42 kb (SNYDER et al. 1987; LAROCHE et al. 1990; GJETVAJ et al. 1992).

The mtDNA gene organization is considered to be relatively constant within each metazoan phylum, where the observed variations mainly concern tRNA gene rearrangements (WOLSTENHOLME 1992) and nu- cleotide substitutions (CROZIER et al. 1989; PALUMBI and BENZIE 1991; AVISE et al. 1992; MARTIN et al. 1992). However, molluscs are the largest exception to this rule, since large variations in gene organiza- tion (e.g., between Mytilus and Katharina) (BOORE and BROWN 1994b) and sequence [Albinaria t u n i t a vs. Mytilus (LECANIDOU et al. 1994) and Katharina us. Mytilus (BOORE and BROWN 1994b)l have been found.

We have recently pointed out the first indications for this great diversity of molluscan mtDNA, as Albinaria sequences are more similar to the corresponding se- quences of Drosophila than of Mytilus, and have sug- gested that this may be attributed to either a polyphy- letic origin or to a high and differentiated evolutionaIy rate of molluscan mtDNA (DOURIS et al. 1995; LECANI- DOU et al. 1994). Boom and BROWN (1994b), on the basis of the observation that structural features of Ka- tharina resemble more those of Drosophila than of Mytilus, have proposed a fast mtDNA molecular clock for Mytilus.

In this paper, we present the complete mtDNA se- quence of the land snail A. coerulea. It is the smallest mtDNA among coelomate metazoans (14,130 bp) , does not contain any noncoding sequence longer than 42 nt, and its gene organization seems to have almost noth- ing in common with that of other known metazoans, with the exception of the major genes of another terres- trial gastropod, C. nemoralis.

MATERIALS AND METHODS

The mtDNA of A. coemleu was cloned in three consecu- tive HindIII segments: 6.55 kb (13A3, bases 1561-8123), 5.4 kb (13A12, bases 8124-13278), and 2.4 kb (13F6, bases 13279-1560) (Figure 2) (DOURIS et al. 1995). The follow- ing evidence indicates that the entire mitochondrial ge- nome is represented in these three Hind111 clones: (1) sequencing of an EcoRI clone (15A6, bases 282-2950) (Fig- ure 2) showed that it overlaps clones 13F6 and 13A3, which therefore must be adjacent; (2) clones 13A12 and 13F6 must also be adjacent, because a sequenced clone from a closely related Albinaria species (A28 from A. turrita) (LECANIDOU et al. 1994) overlaps them, and (3) the HindIII site between clones 13A3 and 13A12 (base 8124) (Figure 2) lies in a conserved region of the ND4 gene (bases 8113 to 8137 correspond to the amino acid sequence FLVKLPIY, which is identical in the chiton K. tunicata) (BOORE and BROWN 1994b).

Restriction fragments of the three HindIII clones, as well as of the overlapping EcoFU clone, were subcloned into plasmid vectors pUC8, 9, 18, 19, or pBluescript I1

FIGURE 1.-Gene map of the A. coeruleu mtDNA molecule. The outer circle represents the sense strand for genes tran- scribed clockwise and the inner circle for genes that are tran- scribed counterclockwise. tRNA genes are depicted by the one-letter amino acid code. Positive numbers at gene bound- aries indicate noncoding nucleotides; negative numbers indi- cate overlapping nucleotides of adjacent genes. Abbreviations used: ATPase68, ATP synthase subunits 6 and 8; COI-IIZ, cyte chrome c oxidase subunits I, 11, and 111; Cytb, cytochrome b apoenzyme; NDl-6and ND4L, NADH dehydrogenase subunits 1-6 and 4L; s-rRNA and l-rRNA, small and large subunits of ribosomal RNA.

KS (Stratagene). Further subcloning was performed after deletions of clones and subclones using the enzyme Exo- nuclease I11 (HENIKOFF 1987). Clones and derived sub- clones were amplified after transformation with CaC12/ RbC1, using as hosts Escherichia coli JM83. DNA was ex- tracted by mini plasmid preparations with alkaline lysis (SAMBROOK et al. 1989).

Both mtDNA strands were completely sequenced. Se- quence determination was performed from the ends of clones and subclones by the dideoxynucleotide chain termination method (SANGER et al. 1977) using Sequenase V 2.0 (US B i e chemical) and universal Ml3/pUC primers. Samples were resolved in 4 or 5% polyacrylamide, 7.5 M urea gels, and autoradiography was performed using Kodak X-Omat film. Sequences were identified by comparison with data from the EMBL data bank or from cited publications. The tRNA genes were identified by their potential to form the characteristic for mt-tRNA stem and loop structures. Sequence analysis was performed using computer programs developed by PUSTELL and KAFATOS (1984, 1986). Other computer programs used are cited in the text.

Northern hybridization was used to identify the location of the tRNALys gene. RNA was extracted from the feet (mus- cle) of 12 snails, treated with Proteinase K, extracted with phenol/chloroform and precipitated with ethanol ac- cording to SAMBROOK et al. (1989). Samples were resolved by electrophoresis in a 20 X 20 X 0.15 cm 4% polyacryl- amide, 6 M urea gel and electroblotted in 0.5X TBE to a Gene Screen Plus (DuPond, NEF-976) hybridization trans- fer membrane (SRIVASTAVA et al. 1993). The membrane was baked at 80" for 2 hr. mtDNA fragments used as probes

Page 3: Complete Sequence and Gene Organization of the Mitochondrial … · 2002. 7. 5. · Accepted for publication May 15, 1995 ABSTRACT The complete sequence (14,130 bp) of the mitochondrial

Land Snail Mitochondrial Genome 1355

were labelled by nick translation (SAMBROOK et al. 1989). Hybridization was performed at 65" in 0.4 M NaCl, 1% so- dium dodecyl sulfate, 50 mM Tris-HC1 pH 7.5. Autoradiog- raphy was performed using an intensifjmg screen (DuPond Cronex Lighting-Plus) and Kodak X-Omat film. Exposure was for 1-4 days.

The sequence of A. coeruka mtDNA has been deposited in the EMBL nucleotide sequence data bank under accesion number X83390.

RESULTS AND DISCUSSION

General features: A. coerulea mtDNA has the typical features of metazoan mtDNA but several of its charac- teristics are novel. Its length (14,130 bp) is the smallest among known coelomates with the exception of that belonging to yet another land gastropod, C. nemwalis, that is reported to have approximately the same size as Albinaria (TERRETT et al. 1994). The pseudocoelomate nematodes A. suum and C. elegans ( OKIMOTO et al. 1992) have equally small mtDNA genomes. The small size of Albinaria mDNA is the consequence of its compact gene organization, the small size of its genes, as well as the absence of a lengthy noncoding region (Figure 1). It contains all 37 genes typical of metazoan mtDNA 13 protein genes (ATPase6, ATPase8, COJ COII, COIII, Cytb, ND1, ND2, ND3, ND4, ND4L, ND5, ND6), two ribosomal RNA genes (l-rRNA, s-rRNA) and 22 tRNA genes. The sequence shown in Figure 2 constitutes the sense strand for 24 genes (majmstrand, coding for more genes), while 13 genes are transcribed in the opposite direction (minw strand, coding for fewer genes). The base composition of the major strand is T, 37.9%; C, 13.8%; A, 32.8%; G, 15.5%, and the G + T content is 53.4%. However, if we take into account the nucleotide frequencies at fourfold synonymous sites (considered by definition as neutral positions), we find that in these positions the major strand has almost the same G + T percentage as the minor strand (50.3 and 50.2%, respectively).

The A + T content (70.7%) of Albinaria is higher than that of deuterostomes [in echinoderms the A + T content ranges from 58.9%, S. pu7jburatus, (JACOBS et al. 1988) to 61.3%, A. pectinifma (ASAKAWA et al. 1991); in vertebrates from 55.6%, human (ANDERSON et al. 1981) to 63.2%, mouse (BIBB et al. 1981), as well as of other known molluscs [69.0%, K. tunicata (BOORE and BROWN 1994b); 62%, sequenced portions of M. edulis (HOFFMANN et al. 1992)], but lower than that of insects (78.6%, D. yakuba (CLARY and WOLSTENHOLME 1985a) ; 84.9%, Apis mellifera (CROZIER and CROZIER 1993); 77.4%, A. quadrimacukztus (MITCHELL et al. 1993)] and of pseudocoelomate nematodes (76296, C. elegans; 72.0%, A. suum) (OKIMOTO et al. 1992). The A + T content at fourfold synonymous sites (77.5% overall, 77.9% major strand, 76.3% minor strand) is indicative of a clear bias toward A + T.

As far as the dinucleotide composition of the mtDNA is concerned, Albinaria exhibits the same deficiencies as all known metazoan mtDNAs. The double-stranded dinucleotides CC * GG and GC - GC show a high ratio

of observed/expected frequency. p* = 1.44 and p* = 1.30, respectively; p* is the symmetrized dinucleotide odds ratio calculated according to BURGE et al. (1992). In contrast, the doublestranded dinucleotide CG * CG is the only under-represented (p* = 0.70). At present, there is no widely accepted explanation for the CpG suppression in animal mtDNAs (CARDON et al. 1994).

Gene arrangement: Close inspection of the Albinaria gene arrangement map (Figure 1) reveals certain inter- esting features. The Albinaria gene organization is novel, bearing almost no similarities to any published complete gene arrangements, including those of two other molluscs, Mytilus and Katharina (HOFFMAN et al. 1992; BOORE and BROWN 1994a,b). However, the partial gene organization of another terrestrial gastropod, C. nemwalis (TERRETT et al. 1994), reveals many similarities in the protein and rRNA gene order with Albinaria: only one gene rearrangement is required to intercon- vert the Albinaria and Cepaea protein and rRNA gene order, namely a transposition of the ND4 or COIIIgene. As the Cepaea complete sequence and gene organiza- tion has not been published, a comparison of tRNA genes of these two land gastropods cannot be made at present, although it can be inferred from looking At Figure 6 of TERRETT et al. (1994) that there are at least two positions in the genome map that differ: a region containing either tRNA genes or noncoding sequences present between genes coding for ND6 and ND5 in C e p aea is absent from Albinaria and no tRNA is present between s-rRNA and ATPase6 in Cepaea, whereas these two genes in Albinaria are separated by two tRNA genes.

No gene boundaries are shared between Albinaria ana Mytilus and only very few with Katharina and the arthre pods. More specifically, tRNAvd, lrRNA and tRNA&(CW) are directly adjacent in Albinaria, Katharina and the ai" thropods, but in Albinaria these genes are transcribed from the major strand; furthermore, Albinaria and Ka- tharina also share the gene boundaries between tRNAMd and srRNA, which are transcribed from the minor strand in both molluscs. The only other shared gene boundaries between Albinaria and other organisms are those between genes COIII and tRNATh, which is also observed in the nematodes (OKIMOTO et al. 1992), and between the S- rRNA and tRNAau, which is held in common with echin& derms (see SMITH et al. 1993).

It has been suggested from gene order comparisoxik that rearrangements involving tRNAs occur more fre- quently than rearrangements involving other genes (WOLSTENHOLME 1992). Thus, if the differences in rela- tive locations of tRNA are ignored, we can discern very limited similarities in protein gene borders between Albinaria and Mytilus involving the Cytb and COIIgenes, and the ND4, COIII and ND2 genes, although COIII is transcribed from the minor strand in Albinaria (while all Mytilus genes are transcribed from one strand). Per- haps more interesting is the proximity of ND2 and COI genes in many organisms including the molluscs Albina- ria, Katharina and Cepaea (but not Mytilus), the arthro-

Page 4: Complete Sequence and Gene Organization of the Mitochondrial … · 2002. 7. 5. · Accepted for publication May 15, 1995 ABSTRACT The complete sequence (14,130 bp) of the mitochondrial

1356 E. Hatzoglou, G. C. Rodakis and R. Lecanidou

A 1 CCGTTTTCCT~TTATTAGGTGTTCTATGT~TATTATAffiTGTAATTTACATAGTATTAAATATA~TTCCAGTTATCTTTTMTATTTMTTTA 101 T T T T C A A C C - G U j T T A A C T T T A A C T T ~ T T T A A T T T G T G A T - T ~ G T T T T T f f i T ~ T G G T A T T ~ M T T T ~ ~ G T G T T T T T C 201 TTTTTGCTMTGAATATATATCTGAAGATCATTATAT-TCCGTTTTGGTTGAATTTTMT-TTTGT~CATCTAT~TTCT~TTTT~ffi 301 T T C A A T T T T T A C T T T G C T T C T A G G T T G A G A T G C T C G ~ T T ~ T T C A T T T ~ T T T A A T T ~ T ~ ~ ~ T A A T T A T M T ~ T C T T C C T C A ~ T T T 401 C T G A C G C T A A T M C T A A T C G A T T A G G A G A T G T A T T A A T T A T T ~ T - T T T ~ T M T T ~ ~ T T - G T T T ~ C G T C ~ T T C C C C C C T A T ~ T 501 T R G T T T G G C T C T C A T C G A T T T T A T T T A C A A T T G C A A G T T T 601 A C C A G T A A G G G C A T T A G T C C A C T C T A G G A C A ~ ~ T T ~ G G C T ~ T C T A T T T A A T M T T C G T T G T T T T A T ~ T ~ T f f i T ~ C C ~ T - T A T A T 701 AGTTTAATGGGACTTGTTGGATCTATTACTTGTTTACTGGGRGGT~GTCGCATTATTTGRGTATGATTT~T~TT~CTTATCTACATTM 801 GTCAGTTGGGAGTTATAATATAT~TT~TCTTMTCTACCTT~CTAGCATTACTTCATTTATAT~TGCTATGTTT~ATACTATTTTT

1001 A T T R G T A T A T T G T G C T T M T ~ G A T T T C C A T T T G T T ~ T C A T T T T A C T ~ C A T T T M T C ~ ~ T A T T ~ T A T - T T G T A A T T T C T T T A 901 AGGGGCAGGACTTATTCTAATATGGGACT~GATTT~GTTTACTAGGTAGATTACTATATTCTT~CTATTGTTATCTCACTGCTTMT

1101 CATCCATAATATTTATATTAGGRACTCTCTCCTTACTG~TATATTCTATTCG~TMT-TTTTTATGTTGRG-TMTMTMTAAGTCCACCTTCTTA 1201 CTGTMTATGRGTTGACACAAAGCARGATATCCATATTCCCTTTRGCGGCTCTC~TGTATT~~TTMTAT~TAT~T~T~TCATATATA 1301 A C G T T T A G A T G G T C T T A A T T T T A T ~ G R G T T C T T T T T T T ~ T T T T T T T T - T T G T T A T - T T T ~ T M C T T T T ~ C 1401 C R A C T T T A A T A T T T T A G G C C C T A C A T C A T A T A A T C T A T T A ~ T T - C T A A A T C T T T A T T M T T T A T A T ~ ~ T T G A T C T T T C 1501 GATTAGTGA~CCAACTGAGTTATARGCARTTTTAATATATT~~TTCTTGGCGAGTTAT~~TATTTAATTGATTM-TTATATGTTRGTTACA

[ m5->

t nul-> ms-> I

1601 TGATTCTTGTTGGTTTGGTTAATAATTAT~TCCTTAT-T~TATTT~GTT~TATTAAATTTATGTATTTT~TATCRGT~CTTTTTA 1701 T A C T T T A T T G G R A C G T A A A G T A C T A A G C A G C A T A C A A A T T C ~ G A C C T A A ~ T T ~ T A T A T ~ T A T T A T T ~ C ~ T T ~ ~ T ~ C T T A 1801 AAATTATTTTTARAAGRGTTTTTTATTC~TTAATffiTAATTCCTTTATGTTTATAATTTT~CTCTATT~CCTM~-TTGATATTAT~ 1001 C T G T T T T T C C T A G T A T A T G A A T A T T ~ T T T T C A T ~ C T A T ~ A ~ A A T A T T A T T T G T C G ~ T T ~ C ~ T T T G T T T A T G T M T T A T T T T T G C C G G 2001 TTGATCATCTAACTCTAAATATTCTTTTTTATTCTTTTTTAGGGGG~T~~CTGCT-~ATTTCTTAT~TT~~TATTATTATTATTATTTTTTGCT 2101 G T T C T T A T A T A T C G G A C A T A T T C T T G A T A T ~ ~ T ~ T C T C ~ T T f f i T A T T A T T A T T T T T A T T A T T A T A T T T A T T T f f i T T T ~ T C A T G T C 2201 T A G C C G A G A - T C G T G C A T T T G A C T T T G ~ ~ T C C G A G T T R G T A T ~ f f i G T T T M T A T T ~ T A T T A T G G T f f i T A T A T T C ~ T T G C T

2401 CTAATTGCGATAGCATTTTTATTTGCTCGCGGGGTTTTATCCTC~TC~TAT~TTT~TTAT-CTTATGCT~TTTTTAC~TTT~GC 2301 CTTTTTAGCTGRGTATAGTTCGATTCTATTTATATGTATAATATCAACCGTTT~TTTTTGTAT~GGATATAATCTTTATTAT~TT~AATTTTG

2501 TATGCTGCATTTGCTAT-TACTTCTTTGAATTGTGT~CAT~~CTATTTAT~~TTATTATTATTATGTATff iTMTGTTTGT~CGT 2601 T T T T T A C T C T T T - T T A A T ~ T ~ T T C T ~ T T T M T A T T A A T T ~ T ~ ~ C G R G T G T ~ C T G T T T C C C T T M T T A T A T G ~ 2701 T G G A T C C A G C A T A G T A T T T A T T A T T G T G T T T T G C T G C T ~ C R G C A T T ~ C T ~ T T T A C T T G T T T G C T T T A T T ~ G T T M T T C T A G G

2801 T G T G A A A T A T T A G C A A T P A A T A A A R T T T T G T T T G C A R A A A 2901 A A T A T A G G A T C T A T T C T T G G T A T M T T T T A ~ T T T - T T G C T M C T G ~ T T C T A C T G T ~ T - T T A T T C G T C T ~ T T ~ T A ~ T T T ~ T 3001 CTATTATTCATATTATTCGTGATGT~CT~GATGATTTCTTCGTTTACTTCAT~MT~GCCT~CTTTTTTTCTTGTTTATGTAT~CCATAT 3101 C G G A C G T G G A C T A T A C T A T C G T A T A T T A T C C A T C C A C G T G T A T ~ T G G T ~ G T G T ~ T T T T T T T f f i T T - T ~ A C ~ T T T T T ~ G 3201 T A T G T A C T T C C A T G G G G T C C A T T T T ~ T G C T ~ T G T A A T T A - ~ T A T T A T ~ C G T ~ C ~ A T T T T ~ C C ~ T ~ T T ~ T G R G 3301 T T T G G G G A G G G T T C T C T G T T G G G C A T ~ T A C C C T ~ T C G T T T T T T ~ ~ T - T T T T T T A T T ~ C C T T T ~ T A T T ~ G T T ~ ~ A ~ T C A T A T 3401 T A T T T T T C T T C A T G A T A A G G G G T C A T C T A A C C C C A T T A G G T A A T T T A T T T C A C T T A A ~ C A T T T ~ C C A T A T T T ~ ~ T T ~ T ~ G T T 3501 G G A T T T T T A A T A G T A T T T G G T G T T C T G T T A A T M T C A C A T T T T T ~ ~ C C T T C T T T A C T ~ T ~ T C C C ~ T T A ~ T T ~ T ~ ~ C C C T A T ~ T ~ 3601 C T C C T A C T C A C A T T C M C C G G A A T G A T A T T T T T T T G T T C G C T T A T ~ T A T T C T T C G A T ~ A T T C C C T ~ T T ~ G f f i f f i T ~ T ~ T T A T T M T A T C 3701 TATTTTAATTCTATATTTTTTACCTTTATCTAGATATGG~TATTCCTGT~TAT~TATTATTTATCAffiTTTTATTTTGRATTTTRGT~TT 3801 ACTTTTATTATTCTGACATGACTAGGTGCTTGCGAGATT~CATATCTCTCATT~GGGTC~CTTACATTATTATATTTTTTMTGTTTTTAT

m1-> 1 I m4L->

W4L-> I t cy-->

3901 TATTAGGGATATCGAATAATTTAAATTTTAATCTTATT-TT-=A~TTT~TTAATTTATATT-TAATTT~TT-T-T-TAA~~ cy"> I tRwI ">

4001 M ~ T G T T T A R A C T T A T M T T A T A T T T T M T A T A T T A T T T A T ~ ~ M T T A A T A G C ~ T A T M T ~

4101 T A A A G C G T G G C T T T ~ G G G C T T R G R A A G A A A A T T T T T C T T T M T T ~ T C ~ G T A ~ T ~ G A - T T M T T T M T ~ T C C T G C G T C T C C G R T C

tRwI *s->

~ m me-> t mII+

4201 C A A A T A G A A A T A R T A T T A T T T C A T G A C ~ T G C A A T ~ ~ T T C T T A T C ~ T T T T T ~ C T G G T - T T ~ T ~ T T - T T A T G T T T T M T ~ T 4301 T A T C A A C T C G A A C T A T A C A T G A A G C T C A G T T A T T A ~ ~ C T G T ~ ~ T T T T A C ~ T T T C T T C T G G T G T ~ C T ~ C T C T T C ~ G A C T G C f f i T T

4501 MCTTCGATTCTTATATAATTCCGGRRGAGGATTTT~C~GTGATTATCGACTACTT~TT~TMCC~CTATGGTACCTTATff iTTT~TA 4401 ATTATATTTACTCGATGAGRGCAGGGTAGAGAGGGTATTATTATTTT-~TATTGGC~T~T~TACT~TAT~TGC~T~T-CATTT~

4601 T T M T G T T A T T A C T A C R A G G T G A ~ T A A T T C A T ~ T ~ C T T T A C ~ T A T ~ G f f i T T - T ~ T ~ G T T C ~ G A T T A A A T ~ T A T A G G 4701 T T T T W \ T G C A A A C T T G C C A G G A A T T T T A T T A C G G A ~ T ~ T C T ~ T T T G T G G G G C T A A C C A T T C T T T T A T ~ ~ A T T ~ ~ T T ~ C C A T T ~ T G T T

mI1-> 1 4801 A A A G A T T T T A T T A R A A T A T G T A A ~ ~ T T A A A C T T T T A T T - C T T ~ T T T T G G C C C T ~ T ~ G A T G G T ~ T T ~ - T ~ A G T T ~ T T T A

tRwI Tyr->

4901 T T T C C ~ T A T ~ T A T A T G T A A G T T A A R C A A A T G T C ~ G A ~ T R G T C A T A A T T A ~ T T A T A ~ T T G T ~ T T T A T A T ~ T A T A T ~ T T G T A ~ - Trp-> 1

5001 G T T A C C T ~ T A R A A A G G C T C T A ~ G T ~ T ~ T A T T M G T T - T T - C T A T ~ T - C C C T T M T A T T ~ T T ~ T G A G

5101 ~ T T C G ~ T T T C T A T A A R A A T A T T T T T ~ ~ T T T A A ~ ~ T ~ T T A A ~ G T G C T T T - C ~ ~ ~ <-m oln

5201 C A ~ - T C T A T G C A C T T A A T T C T G C C R C A T A G T A R A T G C G

5401 T A T T G G T C T T A A T T G R G G U T G A C T A A G T A A C C T A A T G G T T T C T T T T T M T T ~ T T ~ T A T T ~ T C T ~ T T ~

- m my-> m mi.->

<-" wml [ <-).TPaMB

5301 AAGCRAGTAAGGGA~CTATTGGTGTGRGTATAT~TGATTAATAACTMTA~TAAGTCCATAATAT~TGTAACG~TATMTT~CCCATT

1501 T A T A A A C T T A A A A G T - T ~ T A T A G A A G T A A R A A T A T A G G C T T G A R T G A TGTTCATTTATG

5601 A A G T A T T T C T T A G G T T C G A G C T T A A T A - T G C C A T T A A T G ~ - T M T A T ~ C C ~ C G ~ T A T A T T ~ ~ ~ T C ~ G T T - T ~ ~

5601 T T M T A T A G C C T G T ~ C C T ~ T A G T A A T ~ C C A G G G T T A T A T T T A ~ C A T ~ C C T A G T C G T M T T C C G T A G G T ~ T ~ T A T A C 5701 T C G R A T T A T A A T A C T M T A G A T T C A A T T - T T ~ G G T A G T A A T ~ ~ T ~ C ~ T ~ ~ T G ~ C T M T C T C T T T T T ~ ~ T

1901 C T - G G T T A T T A A A A A T M T T A A T A C T A T T A T M T T G T T A A T ~ C T T T T R G T T ~ T T A T A T G T C ~ T T G T T T ~ T T ~ T ~ ~ A A A T A T T T 6001 AATMTATTATTTTGATCTATTeTTCACGTCTTATTTCTTATTT-TTAT~TGAT~TATCGGTGTTMT~~CCAT~TTATTCCATCT

<-ATPase0 I <-m lun <-ATRaMI

<-ATPase6 1 6101 A A G G A T G A A A A T A A G T C T A C T A T - ~ A A T A T T G G A A A A

<" k g

6201 CTACAGAGTTATATCTAACTCATTTTTTATTAT~T~GTATTT~TTTTATACTAT~TAACTTA~A~~TffiTGTAAGTCCATATATTATTAffi 6301 G G C R G C T T C C A C T A C C C C T A C T T T G T T A C G R C T T A T C T C T 6401 ATCCTTATATATTACTTTTAAGTCCATCCAT~TTAT-TTTTTCATTTCGT~TCC~T-TTAATTAATGTMCTCAC~TMTTCTT~~T~C

6601 T A A T G C T T G G T G G T T A T C T A T A T G G T A R G T T C C C C P A T 6701 A A A T T A C T T A G T T G G G G T C G T C T C T A A T C C T A G T T T T A T ~ A T ~ T T T G T T A T ~ C A T ~ T T T T ~ T T A A T A T T C A T M T M T A

<-fRxI Qlu , t <-a-

6501 T G C A C C T T G A T T T G T C T T A A T A T T T ~ T T M T A T T C A T G A A C C C C T T T G T ~ T T C A T ~ ~ T A T A - T T T T T T ~ T A A G T C C A T A T M

6801 T T T C A C C A T T T A T T A T A T T A C T T T ~ T T A T M G T M T T T A A T A A A C T T G T A A G C T A A ~ T A T T A ~ ~ ~ T T T ~ C C G C ~ ~ ~ T T ~ ~ C C T M T ~ C 6901 C T A A A T T A A T A A A C T G T C T A A A T T A T A A T T T ~ T A T A T M T T T - T C T T ~ ~ G ~ T ~ A C A C T T ~ T C C A T A T T T ~ C ~ T A T T T A T T G

F k U m 2.-The complete nucleotide sequence of the circular A. coeruku mtDNA. Only one strand is shown, which corresponds to the sense strand for the majority of genes. Abbreviations used as in Figure 1; numbering starts at the first nucleotide after the ND6 termination codon. The putative first and last nucleotide for all genes are indicated; putative initiation and termination codons of translation, as well as the anticodon sequences of the tRNA genes, are in bold face characters. Arrows indicate the direction of transcription. The translated amino acid sequences of protein genes are available from the authors upon request.

pods (CLARY and WOLSTENHOLME 1985a; CROZIER and either directly adjacent (Katharina) or are separated by CROZIER 1993; PEREZ et al. 1994) (sequence of Artemia different tRNA genes. Interestingly, in sea urchins ND2 mtDNA submitted to EMBL with accession number and COI are separated by the 1-rRNA gene (JACOBS et X69067) and the vertebrates (see WOLSTENHOLME al. 1988; CANTATORE et al. 1989; DE GIORGI et al. 1991; 1992). In all the above, these two protein genes are SMITH et al. 1993).

Page 5: Complete Sequence and Gene Organization of the Mitochondrial … · 2002. 7. 5. · Accepted for publication May 15, 1995 ABSTRACT The complete sequence (14,130 bp) of the mitochondrial

Land Snail Mitochondrial Genome 1357

B 7001 C T T A C G A T A T C C A T - T T - G T G ~ ~ T T A G T G A G G A A <" 1 <-tam mt [ <-m3

7101 CAGTCRAGGA~PCCATAATATATTTCATATMT-TAAGATCARTT~CACTATTATAAAT-TMT~TMTMTT~~TGTTATAAATC

7301 A T T T C T T A A A G G G T C A A A G C C A C A T T C A A A T G G G G T T T T T A A T T C G T A G G A T G ~ T T ~ T T ~ T T M T ~ T ~ T A A T T A T T ~ T

7401 A R G C A T A A T ~ ~ T A A T G A T G T C G G T A G G A C G T A T - ~ A A T G G T T T ~ T T A T ~ T T T G G A T ~ C T A T T - T T ~ T ~ T ~ A T <-ND3 I <-tam .rr (Val -

7501 T C A T ~ T T A T A A R T T ~ T ~ A A ~ T T A A T T T - T A T A T T A T T ~ ~ G T T T C T T ~ ~ T A T T T M T T T T A T T ~ T G C ~ T A ~ T G T m aer ( A M ) - > [ N">

7701 AGTTGATTTGATTAACTCCTGGGACAGCATTAATTTTTCTTA~CTTATAGTTRCCTT~AGTATTAATT-CRTATAATATT~TTATAA 7601 TTATTTCCTTTTTGGGARGTT~CATTGACATTTATCATTCTAAGATCARTATCATTAACATATTTAAATATTMTTTTA~TT~TT~

7801 GTACATTGGGTGTTTARGRRGTTTAAATTTGGTATTGATAAT~~~~TTTTGTGTCTGTGATTTCTT~TTTTATGTAATATTT~TATCTTTGATC

8001 T G T T A T T A T T A T T A T A T T T G T A C T A T A C G G T A G G A T C T C T 8101 GCTAATAATAGCATTTTTAGTARAGCTTCCAAT~ATACCTGTCATTTATGACTACC~CTCACGTC~CCATT~TTCCATAGTATTG 8201 G C C G G T G T G T T A C T T A A A C T G G G G G G A T A C G G A C T G T A T A T A T T M T T A A T T T T A T T A T T ~ T A G T ~ T T G G T A A T T ~ T A A T C A T T ~ C 8301 TTTCATTGTGRGGTGCTGTTATTGCTTCGATTATTTGTATTCARCAAGTTGATATT~TCTGGT-TATTCTTCAGTTGCT-T-TTAGT 8401 ATCAGCTGGGATTTTAATGATAT-TTGAA~TAT~TGTG~TAATAAGATCARTAATTGCTCRTGGCTAT~TCRTCTGCTTTATTCGTATT~T 8501 AACTTATCATATTTAARAATTARGRGCCGAAGTTTARTATTCAT~~TATT~TATCTTTC~CAATAGCTTTTTATTGATTTCTATTTAGTT BIOI G T A T A A A T A T A G C R G C C C C A C C A A C A C T T A R C T T T A T T G G G 8701 CATTATTATATTTATTAW\GCGGGGTATRGATTATATATATATATAACCGTTAATCATG~TT~GTCTTTATATCACCCC~TT~TTAAA

ND4-> I 7

as01 CATCATTATCGTAGATTTAU.AARTCTATGCTTTAA~TT-TA~TAA~AATTTAT-TGATCCTCATCART~TCCTAATATAT~T so01 CATACTACATCTACGRARTGTTAAGATCARTATCATGCTGCTGCT~CC~~ACATGATG~TAGTTCT-TGATAATAATATGTACGT~AAATTCRCGA

9201 A T T A T R C T C A C C A T A T T W A T T ~ T ~ - T ~ T ~ T A A A C C C ~ T T G R G T A A T A T T T T C C T T C C G T C A A T G ~

9401 A T A C A G A G C C G A T T T C A R T T G A A G G T G C T A A C C T T C T A T G X W L A T A T G C C ~ T ~ C T ~ T M T A A A T - C T A T

7201 T T A C T A T R W \ T A A T A C A G G T C ~ T ~ T T C C R C

7901 CCTACTTTATTATTAATTTTACTATGAGGCTATTAAGATCARC~~GGAT~CAGGTTTTTATTTGATA~ATATACCGT~~T~TTRCCTTTAT

BBOl A A T G T A G A C T A T C A T G T T T T A A C C G C A C A T T T G T T A C ~ T T T A T T T T A T T A A T T C C ~ T T A T T T T C T G T ~ ~ T ~ C ~ A C T R C T - T A <-m Thr [ <-COIII

9101 A A A G A A A T A G G G T A C C G G T A A T C C A T ~ C G G T T G ~ A T ~ T G T ~ T T C C A T ~ ~ C R T ~ ~ T T ~ T ~ T G T T T C

9301 T G A T G R G C T C A R G T A A T A C T T A C A C e r G A T A A T A A T - ~ T A T T T A A T ~ T - T ~ T T ~ T - T M T T C C A A T T ~ T C

9501 ACCTAGTTTTARGCCTTTCACCACGTAAGTTGTAT~T~CTTGATAGGT~TCTC-CARTGTCRCGTCRTCATATAT~TAAT~TT 9601 A A C A A C A T A C C G T A T A A A A G T A R G T A A A T A G A G C C A A A R C C C

<-COIII I e701 A R G C ~ A C A T A C T T C G A C T - T R A T G R R R A O G T T T T T T T G C A T G

SBOI TACGGGTATCRTT~GTTGGGAGTT-TACCGTTGCTT~CCTT-TCTGTTCTTCT~CRTAATTATTTT~TCTAAGATCARTT ~ m Il"> [ m2->

9901 T T A R G T A T A A C A T C A R G T A A T T G A A T T A T T A T T T G A A T T G 10001 CAGGTGAAGGAATTATARTRTATA~TTTTAATTCARTCAGTTT~CAGTGATA~ATTAAAT~TTATATATTTTTGT~TCRT~GTCATCATA

10201 A A T A T T G G T A T T G T T G G G T T A C T T C T R A R A A T T G T C C C R R

10401 AGGAATAAGATGTATCRGAGGTAGATTATTTARATATTTTATT~TATGGGTTTTCTTTAGTMTCTT~CGTGTTTTTAT~~-~~TA 10301 TARGAGTARCATTAAGATCARTATTATTTGGRGCATTAATTGGTATAAATTTAA~TAC-T~TT~~T~erCARCTATT~~TAATffiTTffiTT

10601 GGTTTATTGTATTAGTATTTGCGATTTTARGTGCTGTTATTAGTCTTGTATACTATCT-TTTAGTGTAATATTTTTTATAAATAT~TAA~A

10101 T A T T T A T T T A T T T A T T T T T A T C R C T A T A C T A A T A C T G R R A T

10501 AGCATTAGGTTGAGCTTATT-TTAAGGGGTCTTCCGCCATTTATGCTATTTATTG~TTMTGTGTT~TATAATAAT~~AATTTAT

10701 T T T G A R G C A T T A T - T A G C T A T A T T T T T A C T T G T T A A T mz-> 1 tam Lym->

10001 A A T T ~ A A T T T U V \ T T R C G A A T T A A T A A G A T C A R T T ~ ~ G A ~ G A T T T T A C T C A A C - T ~ T ~ T A T T ~ T ~ T T G T A T A T G T T A T T T ~ ~ ~ T A T C T G A T [ COT->

10901 G T G G A A T A G T A G G A A C A G G T A T C T C T A T T A A T T C G A T T ~ C T ~ T A C T T C T G ~ T T ~ T ~ ~ C A T T T T T A T M T G T M T T G T T ~ ~ 11001 ACATGCATTTGTARTAATTTTTTTTATAGTAATACCTATTATAATT~TTTTGG-TTGAATAGTA~TTT~GATT~~C~GATAT- 11101 T T T C C T C G T A T G T T T T T G A T T G e r R C C A C ~ C T T T T A T T T T A T T A A T T T G T T C T - T G G T ~ T ~ ~ ~ T G G A 11201 CTGTATATCCGCCTCTAAGATCARGTTTAGCTCAT~~TGCTTCTGTTGACTTAGCTATTTTTT~CTTCATTT~TATCATTAAGATCART~T~ 11301 TGCAATTAATTTTATTACAACRATTTTTAATRTRCGTAGACCCGGTATAACCATGGAACGTGT~~TTATTTGTGTffiTCGATTCTAGTTACTGTTTTT 11401 TTATTATTACTATCATTACCTTCTTGCTGGGGCAATT~ATACTATTAACTGACC-CTTTAATACT~TTTTTTGATC~~~~T~G 11501 A T C C C A T T C T A T A T T A A G A T C A R C A A C R C T T A T T T T G A T T T T T T G G T C 11601 TAGTGCTATAAARCAACCTTTTGGGACATTAGGTATAATTTAT~CATAATTTCTATTGGTATTTT~TTTATTGTTT~~CATATATTTA~ 11701 GTTGGGAT~TGTAGATACRCGCGCTTATTTTACAGCCG~ACTATAATTATTGTAAGATCARTTCCAACTffiGATT-TTTTT~TGACT~TAAerATTT 11801 A T G G C T C A R A R G T T C A G T A T A C T G C A R G C A T A T A C T G A G T T C 11901 TTCATTAOATATTATACTTCATGATACTTACTATGTTGTGGCACATTTTCAerATGTTTTATCTATAG~TCTTT~ATCTTT~~TTT~T 12001 T T T T G A T T T C C A G T T A T A A C C G G A T T C T T C A C G A G C T T T T T C C C C C 12101 A A C A T T T T T T A G G G T T A G C T G G A A T R C C T C G A C G G T A T T C C 12201 T G T G T T T G C T G T A T T A T T G T T T G T A T T A A T T G T C T G A G A G

12301 T C T G A T G G A T A T T T T C C C C C T T T C A T ~ T T T A T C A A T ~ T A C A T T A C T A T ~ ~ T T A T T A T A A T C - T A A T A C T ~ T ~ ~ T

12401 G A G T T A G T T C C T A R T A T T T ~ T C T - T A T T ~ T A ~ ~ ~ T ~ T - ~ T ~ T ~ T A C C T T T T - T A A T ~ ~ ~ T G A T R C

COI-> I tam Val->

I">

12501 T A A A T T T A T T C T R R G A T T T T A T A T T T C C C G A A A A T A A A A O C T C 12601 A A T G A A A A G T T R G T A G T G A R A C R C T T T C A C T T T T A T T G A T T T T 12701 AAGCGAAGAATTTTATATAACAGTARRAATTTCTAATTAATAATATTARCTGGTAGTTTTATT~TAT~TATATTTATATATTAAATTCCTT-TT 12801 CTATTTATATATTATAAGATCARCCWCAGTATACTAATAATGTATCAATTAGTAAAT~TAT~TAAATATTAGT~TC~-TMTT~TT~GA 12901 C T G T T T A T C A A R R A C A T ~ C A A A R G T A G A T A T T T T T G G T G T T A T ~ G C C C R G T ~ T T T T ~ ~ C C ~ T A C C T T G A C T G T G ~ T A G C R T 13001 A A T A A T T T G R C T T T T A A A T G C e r ~ T ~ C G T A T ~ C T T G T C T C A T T ~ T A T T A C T T T A A A T T T A A T f f i T T ~ T ~ T A C T C 13101 A T A A T T T T A A T A A T - G C C T T ~ T T T T T T T T A 13201 T A T A T T A A T G A A A G T A A T A C G A T T A A A T ~ T T A T A G A 13301 C T A G G T A C T A T T A R G G C C G T T T T A A T A T A A T A T G T T C T G T T C G A A C T T T G T T A C C T ~ T G A T C T G R G T T ~ C G G C G T ~ ~ T C T T T

13401 CTATCTTTTATCTGGC~~ATAATAGTACGAARGGRCCTTATAGGCRCTTATTCAT~TATA~TTATTTTGGCRGAGTAT~TAT~C~~TCATA

13501 ATATAGTAGTTACCTACTAAATAAT~TAAGGGCRGTTRCTGGT~GTT-T~TATT~T~TRCTGTCTACCTTTA~TTTATRCTTTAA

13601 T T A A G A A G R T G T G G T T ~ c A T T A R G T ~ T A T A T T T T T c c

1"> 1 tam L.u (aN)->

m P"> m [ -I->

13701 GGTATTTATGATGCTGAARGGRRTCAATCCRATWU;CCTTTTATTAGererT~TACTTT~GTTATGT~TGTTCTATGATT~T~TTTAT~G 13801 T C T T G A T A T G C ~ A T A T T T T A T T T A T T G T C T A T A T T G G T G 13901 A T A T A T A T A A A T C A T T A T T G T A T G C G T G A W i A G C R G T A A T 14001 T A C T A G A G T A A A T A T T C C T T T A A T T T T T C T C T C

14101 ATTCTCATAGTTGAARGTAGCCARGTTTTAA NDI-> 1

FIGURE 2. - Continued

As can be seen in Figures 1 and 2, all Albinaria genes many cases of overlapping genes coded by the same are either directly adjacent or have very few nucleotides strand. Three types of such overlap may be identified, separating them. The total number of nucleotides be- involving protein-protein, tRNA-protein, or tRNA-tRNA tween genes is 141, the longest intergenic sequence junctions. Thus, there is one case of overlapping pro- of 42 nucleotides separating two genes transcribed in tein genes (hD5-h!D1,7-nt overlap), one case of overlap opposite directions (COIII and tRhCA'7. Five genes end between a tRNA and a protein gene ( tRNALYs-COI, 7-nt on abbreviated stop codons. Furthermore, there are overlap) and six cases of overlapping tRNA genes ( A s p

Page 6: Complete Sequence and Gene Organization of the Mitochondrial … · 2002. 7. 5. · Accepted for publication May 15, 1995 ABSTRACT The complete sequence (14,130 bp) of the mitochondrial

1358 E. Hatzoglou, G. C. Rodakis and R. Lecanidou

Cys, 5 nt; Gly-His, 4 nt; h ( C W ) - P r o , 4 nt; h ( U U . R ) - Gln, 2 nt; PreAla, 2 nt; Tyr-Tq, 5 nt). Finally, an exten- sive overlap of 12 nt is observed between two tRNA genes transcribed in opposite directions [tRNAs”(UCN) and tRNAs“(AGN)].

Another unusual feature in Albinaria mtDNA is the presence of four consecutive protein-protein gene junc- tions (h?D6-h?D5, h?D5-NDl, NDl-ND4L and ND4L-Cytb) that are not separated by an intervening tRNA. It has been suggested that the secondary structure of a tRNA gene between pairs of protein genes is needed to act as a signal for the precise cleavage of the polycistronic primary transcript (OJALA et al. 1980, 1981). In accor- dance with this hypothesis, almost all reported cases of protein-protein gene borders with no intervening tRNA have the potential to form hairpin structures (BIBB et al. 1981; CLARY and WOLSTENHOLME 1985a; OKIMOTO et al. 1992; BOORE and BROWN 199413). In Albinaria it is also possible to draw stem and loop struc- tures near or at the protein-protein gene boundaries (Figure 3). In the case of h?D6ND5 and NDl-ND4L bor- ders, the stem and loop structures are positioned more or less similarly with those reported for the nematodes and Katharina with respect to gene termini; the com- plete termination codons are positioned within the loops. In the case of ND5-ND1 and m4L-Cytb borders, however, a tRNA-like structure precedes a stable stem and loop structure that is located near the 3‘ end of ND5 and ND4L genes, respectively (in ND4L, there are two hairpins separated by 29 nt). It should be noted that NLI5 and NDl are overlapping genes and that h?D4L ends with an incomplete termination codon. These tRNA-like structures resemble normal tRNA with AAG [alternative Leu(CUN)] and TTT (Lys) anticodons re- spectively. The existence of superfluohs tRNA-like genes has been also reported for two other molluscs, Mytilus and Katharina. What is unusual with Albinaria, however, is that, in contrast to other molluscs, the tRNA- like structures are located totally within protein coding genes. Interestingly, an extension to the 5‘ end of the h?D5 gene in sea urchins can be folded into a secondary structure resembling a tRNA gene and is thought to be a remnant of a tRNAh(CurV) gene (CANTATORE et al. 1987, 1989; DE GIORGI et al. 1991). The significance of the Albinaria tRNA-like structures is not clear at pres- ent. However, we have found that they do not hybridize with low molecular weight RNA in Northern blots, while the two corresponding standard tRNA do (data not shown).

Control regions: An important question emerging from the gene organization of Albinaria mtDNA con- cerns the location of regions containing the signals for replication and transcription. In all metazoans, where such sequences have been identified, they are found in noncoding regions (WOLSTENHOLME 1992). What is unique in Albinaria mtDNA, however, is that it contains virtually no unassigned sequences of significant length

(the largest unassigned sequence is 42 nt and the sec- ond and third largest are 21 and 16 nt, respectively).

Interestingly, the decanucleotide AATATATATT, lo- cated between tRNAh(UUR) and ATPase8 within the l h t Albinaria unassigned sequence, is reminiscent of sequences present in sea urchins (TTATATATAA) and chicken and duck (ATATATAT). In Albinaria, this pal- indrome is even longer (tetradecanucleotide, TAAATA- TATAITTA), if we allow for a 2-nt overlap with the tRNAh(UUR) gene. Such palindromes have been impli- cated to function as bidirectional promoters or as recog- nition signals for enzymes involved in transcription or processing (JACOBS et al. 1988, 1989; CANTATORE et al. 1989; L’ABBE et al. 1991; RAMIREZ et al. 1993).

The longest Albinaria noncoding sequence (42 nt) contains yet another decanucleotide perfect palin- drome (ATAAATITAT), which is twice as long (TGCT- GATAAATTTATAAGCA) if we overlook a 5-nt overlap with the adjacent tRNA’[” gene.

Synthesis of the second (L) strand in a variety of metazoans (including several mammals, Xenopus, Dro- sophila and the nematodes) is supposedly initiated within a run of Ts situated in the loops of potential hairpins that can be formed from intergenic sequence ( C W G et al. 1985; ROE et al. 1985; CLARyand WOLSTEN- HOLME 1987; OKIMOTO et al. 1992). No such structures can be found in the short Albinaria noncoding se- quences. However, a sequence with the potential of forming a stable stem and loop structure is found within the Nll5 gene (Figure 3); this secondary structure has a T-rich loop and moreover its 5‘ end is identical to that of the OL loop in Drosophila (boxed sequence ATATAA in Figure 3) (CLARY and WOLSTENHOLME 1987). The significance of this structure is presently unknown.

Protein genes: Identification of protein genes was accomplished by comparison at the nucleotide/protein level (see Table 1 for genetic code) with already known mtDNA genes of a close relative to A. coerulea (A . turrita) (LECANIDOU et al. 1994) as well as with sequences of genes available in the EMBL data bank. Identification of ND4L was based on hydropathy profiles, predicted using the computer program SOAP of the PCGENE package (BAIROCH 1988) (data not shown), that were found to be very similar to corresponding profiles of Katharina, Mytilus and Drosophila.

Nine of the 13 protein genes of Albinaria mtDNA start with the orthodox translation initiation codon ATGtwo with ATA (Cytb, ND3), one with ATT (h.2)5) and one with TTG (COI). ATT is not used as initiation codon in Katharina and Mytilus but is quite frequent in Drosophila (CLARY and WOLSTENHOLME 1985a) and the nematodes (OKIMOTO et al. 1990,1992). TTG is not used for initiation in any of the known coelomates but is common in pseudocoelomate nematodes (OKIMOTO et al. 1990, 1992). Eight Albinaria protein genes termi- nate with the stop codons TAA or TAG (Figure 2). The other five are inferred to have incomplete translation

Page 7: Complete Sequence and Gene Organization of the Mitochondrial … · 2002. 7. 5. · Accepted for publication May 15, 1995 ABSTRACT The complete sequence (14,130 bp) of the mitochondrial

Land Snail Mitochondrial Genome 1359

G T A T

A C A I

T m6

“ G - C Y J A * C T * G G - T A - T

14110 16

G A A A

A T

T T T

T T T T

T T T T T T T * G C - G T - A T - A G T T G ’ T G A A - T T A

G G T T G G T T

G O T G A

G A

T - A T A - T T T T - A T - A

T - A

T - A T - A G - T

A - T T - A

C - G T - A

A - T T - A T T T - A

/:&i!iAACT/ I A A T T T - A

1394 1529 A - T T * G A - T T - A A - T A T A T T A A C

G A T C T - 1 . 1

T G A T T A T

1323 A - T 1654

T T 1 . 1 I I

T T G G A A

C C A G O T

G - T

T T T - A

A T T C A A G

A - T

C T T G T T T

A T

T T G - C T - A G T - A C - G T A A - T T A T T G A

nD1 T - A G - C

T - A T - A

T - A G - T G - T T - A r

A - T

T C

T - A T - A

C T G - C T A G G T G / : G A A T - A A T G A C T I / G T A G C T G T T T C C - G

C - G 2802 2821 nD4L ”-* T - A

A - T T - A

A - T

T C G A C G I G T - A A A

G G T A T G C T G C I C A I l l 1 T T T G T T C A T A

2530 2551 2615

G T C G A I I I I I I A

C C A G G T G T T O G

A - T A - T

T - A A - T

A T

T A

T T T

FIGURE 3.-Potential secondary structures near or at the junctions of hD6-hD5, h D 5 D 1 , NLll-hD4L, and hD4L-Cytb genes. The start and stop codons are indicated in each case. Within the hD5 and NLl4L genes, tRNA-like structures that are followed by a stem and loop structure may be drawn. Moreover, -300 nt before the end of hD5 another stem and loop structure may be formed, immediately after the boxed sequence ATATAA; a similar structure has been implicated as the origin of light strand replication in Drosophila (see text).

Page 8: Complete Sequence and Gene Organization of the Mitochondrial … · 2002. 7. 5. · Accepted for publication May 15, 1995 ABSTRACT The complete sequence (14,130 bp) of the mitochondrial

1360 E. Hatzoglou, G. C. Rodakis and R. Lecanidou

TABLE 1

Codon usage of A. coerulea mtDNAencoded proteins

Amino Codon N % acid

Phe TlT 240 6.7 TTC 25 0.7

Leu TTA 340 9.4 TTG 63 1.8

Leu CTT 83 2.3 CTC 14 0.4 CTA 79 2.2 CTG 24 0.7

Ile ATT 266 7.4 ATC 36 1.0

Met ATA 213 5.9 ATG 50 1.4

Val GTT 90 2.5 GTC 20 0.6 GTA 97 2.7 GTG 22 0.6

-I- Amino Codon N % acid

Ser TCT 77 2.1 TCC 15 0.4 TCA 68 1.9 TCG 14 0.4

Pro CCT 49 1.4 CCC 22 0.6 CCA 50 1.4 CCG 11 0.3

Thr ACT 77 2.1 ACC 23 0.6 ACA 73 2.0 ACG 12 0.3

Ala GCT 91 2.5 GCC 24 0.7 GCA 66 1.8 GCG 12 0.3

Amino Codon N % acid

TYr TAT 148 4.1 TAC 38 1.1

Term TAA 6 0.2 TAG 2 0.1

His CAT 61 1.7 CAC 13 0.4

Gln CAA 41 1.1 CAG 10 0.3

Asn AAT 106 2.9 AAC 32 0.9

AAG 13 0.4

GAC 15 0.4 Glu GAA 48 1.3

GAG 25 0.7

LYS AAA 73 2.0

ASP GAT 47 1.3

Amino Codon N % acid

Cys TGT 34 0.9 TGC 10 0.3

Trp TGA 68 1.9 TGG 20 0.6

Arg CGT 17 0.5 CGC 4 0.1 CGA 22 0.6 CGG 6 0.2

Ser AGT 49 1.4 AGC 28 0.8 AGA 64 1.8 AGG 30 0.8

Gly GGT 73 2.0 GGC 24 0.7 GGA 79 2.2 GGG 43 1.2

L

N, total number of particular codon in all proteins. The total number of codons is 3595; the incomplete termination codons were excluded.

termination codons (either T for ATPase6, ND3 and ND4L or TA for COI and Cytb).

In general, proteins coded by Albinaria mtDNA are quite short (Table 2). Five of them (COI, COII, COIII, Cytb, ND5) are the shortest among known metazoans (compare with Table 11 Of WOLSTENHOLME 1992). Also, five others (ATPase6, ND1, ND2, ND4, ND6) are the shortest among coelomate metazoans, since these are shorter only in the nematodes (the corresponding lengths in C. eleguns are 199, 291, 282, 409, and 145

amino acids). The only Albinaria proteins that are not the shortest among known molluscs are ATPase8 and ND4L; both are among the least conserved mitochon- drial proteins and exhibit a high degree of length varia- tion (WOLSTENHOLME 1992). It is worth noting that ATPase8 differs in length by >5% between two Albina- ria species (A . cornleu, 55 amino acids; A. tum'tu, 52 amino acids).

Among compared molluscs of Table 2, there is exten- sive variation not only in the size of their mitochondrial

TABLE 2

Comparison of the A. cod mitochondrial protein coding genes Wi-th those of A. turritu, C. neman-alis, K. tunica&, M . edulis and D. yakuba

Percentage of identity

A. coeruh A. coerulea A. coerulea A. coerula A. coerula Protein length (amino acids)

Protein A.c. A.t. C.n. K.t. M.e. D.y. us. A.t. us. C.n. us. K.t. us. M.e. us. D.y.

ATPASE6 214 - - 230 238 224 - - 34.0 30.2 34.0 ATPase8 55 52 54 53 A 53 50.9 19.6 28.1 A 26.8 COI 509 P - 513 P 512 98.9* - 72.6 (49.0) - (36.0) 71.5 (46.5) COII 224 224 - 229 P 228 92.4 (94.7) - 55.0 (54.1) - (37.4) 53.9 (54.9) COIII 259 - - 259 264 262 - 62.7 42.9 59.5 Cytb 367 - - 379 P 378 - - 53.6 (50.4) - (46.0) 53.6 (52.3) ND1 299 - - 316 P 324 - - 46.1 (47.3) - (40.2) 47.4 (49.8) ND2 307 - - 338 P 341 - - 24.9 (25.2) - (31.6) 30.9 (32.4) ND3 117 - - 120 116 117 - - 39.2 35.5 39.7 ND4 437 - - 442 P 446 - - 35.2 (29.9) - (33.1) 43.0 (39.5) ND4L 99 - 79 100 93 96 - 24.2 22.7 26.2 22.5 ND5 545 - - 571 P 573 - - 35.0 (37.0) - (33.9) 34.2 (35.7) ND6 155 P 180 166 158 174 75.3* 19.9 23.9 17.4 27.8

-

Percentage of amino acid identity was calculated by dividing the number of identical amino acid positions by the common length of the compared sequences. Numbers in parentheses show the percentage of identity among the partially sequenced portions of M. edulis protein coding genes and the corresponding regions of A. coerulea, A. turrita, K. tunicata and D. yakuba genes. Asterisk denotes the percentage identity values for the partially sequenced A. turrita COZ and ND6 genes. A, absent; P, partial. Alignment of compared amino acid sequences is not shown.

Page 9: Complete Sequence and Gene Organization of the Mitochondrial … · 2002. 7. 5. · Accepted for publication May 15, 1995 ABSTRACT The complete sequence (14,130 bp) of the mitochondrial

Land Snail Mitochondrial Genome 1361

srRNA

A.C. ( 5 ) TTTAATTT (61) TTTAAGATTTTFAATTATAT (27) ATTTAGGTTTA -(O)- X.t. ( 5 ) ...GG.. C (103) ........ A.C-......T. (46) ........... -TAG (18) .. D . Y . ( 6 ) ... T .... (61) ..... A.AA- ..... - .... (43 ) ..----- ... .-TAG (24) . .-. ....... .GT..

............... GT-TAATATACL-CCA-TAGTA K.t. [548] T......A....CT...bT... D . Y . [548] ............... T ..... AT...C.... .AG.. TAT...G-... GA

IrRNA

A.c. (52) TACCTTTTGCATAATGGT A.t. (53) .................. 1 [;;E!; 1 GTACGAAAGGACC ............. (25) K.t. (132) ............ T..... [lo851 .............

(24) (27)

U.O. (80) ............. G .... ........... .T (86) D . y . (179) ...... G..T..C.GC.. [ 10641 ............. x.1. (189) .................. [ 13141 .............

(52) (106)

FIGURE 4.-Alignment of the ends of the two ribosomal RNA genes. A.c., A. turrita; K.t., K. tunicata; M.e., M. edulis; D.Y., D. yakuba; X.Z., X . laevis. Numbers in parentheses denote regions where alignment is umbiguous; numbers in brackets denote the total intervening length of unaligned sequences between the ends of the molecules. Double underlining emphasizes inverse repeats forming a stem and loop structure; in the proposed secondary structure of Drosophila s-rRNA this is the last hairpin (CLARY and WOLSTENHOLME 1985b). Single underlining indicates an additional hypothetical stem and loop structure in Albinaria.

proteins but also in the percentage of identity they ex- hibit. As expected, the greatest identity is observed be- tween the two Albinaria species. When more distantly related species are compared, there are some dis- cordances with traditional taxonomic relationships, a conclusion we have previously reached from analysis of A . t u m h mtDNA sequences (LECANIDOU et al. 1994) and which is also reported by BOORE and BROWN (1994b).

Codon usage and codon bias: The genetic code of Albinaria (Table 1) has been presented previously (LE- CANIDOU et al. 1994) and is the same in all known mol- luscs (HOFFMANN et al. 1992; BOORE and BROWN 1994 a,b; TERRETT et al. 1994) as well as in Drosophila (CLARY and WOLSTENHOLME 1985a). It differs from the univer- sal code in that ATA codes for methionine, TGA for tryptophane, and AGA and AGG for serine.

As can be seen in Table 1, codons ending at A or T are much more frequent (-81%) than those ending in C or G. In fourfold synonymous codon families 77.5% of the codons end at A or T. Since any nucleotide substitution in these sites does not lead to an amino acid replacement, the AT bias in the third codon posi- tion should not be attributed to pressure of natural selection at the protein level.

Seven out of 62 amino acid codons, which consist exclusively of A and/or T, represent 38.4% of the total number of codons. Six of these are the most frequently used codons and correspond to the amino acids Leu, Ile, Phe, Met, Tyr, and Asn, which constitute 16.8, 8.4, 7.4, 7.3, 5.2 and 3.8% of the total number of amino acids, respectively. With the exception of Leu, this order of amino acid frequencies is different from that actually observed in Albinaria mitochondrial proteins, where Ser, and not Ile, is the second most frequent amino acid (9.7%), while Val (6.4%), Gly (6.1%), Ala (5.4%)

and Thr (5.2%) are more (or equally, Thr) frequent than Tyr and Asn. Thus, there is no direct correlation between the most commonly used codons and the most frequent amino acids. If the predominance of codons composed solely of A and/or T is attributed to AT bias (CROZIER and CROZIER 1993; BOORE and BROWN 1994b), then the relatively high percentage of certain amino acids that are not coded by strictly A and/or T codons (such as Ser, Thr and the aliphatic Val, Gly, Ala with similar physicochemical properties) must be attributed to selective pressure.

Ribosomal RNA genes: Identification of the Albina- ria s-rRNA and 1-rRNA genes was accomplished by com- parison with other known mitochondrial ribosomal RNA genes. The l-rRNA gene had been previously iden- tified in two cloned segments of A. turrita mtDNA (LE- CANIDOU et al. 1994). Using the complete A. coerulea mtDNA sequence as a guide, it is inferred that these two segments are actually consecutive and that the 5' and of the A. tum'ta 1-rRNA gene is at residue 631 of segment I (the sequence preceding the 5' end contains the tRNAval gene), while the 3' end is at nucleotide 923 of segment I1 [due to revision of the A. turrita tRNAb- (CUN)] (see discussion on tRNAs) .

The ends of the two Albinaria rRNAs cannot be pre- cisely determined because of uncertainties in the align- ments with known rRNAs. Figure 4 shows a comparison of the 5' and 3' regions, which is based on conserved sequence elements showing a significant degree of simi- larity. Since the size of the compared regions is smaller in Albinaria, we assume at present that each Albinaria rRNA gene occupies all of the available space between the tRNAMet and tRNAGL" (s-rRNA) and the tRNAva1 and tRNALcu genes ( 1-rRNA) .

At the 5' end of the Albinaria large rRNA gene, the first conserved region (18 nt) is located 52 nt down-

Page 10: Complete Sequence and Gene Organization of the Mitochondrial … · 2002. 7. 5. · Accepted for publication May 15, 1995 ABSTRACT The complete sequence (14,130 bp) of the mitochondrial

1362 E. Hatzoglou, G. C. Rodakis and R Lecanidou

A - T C - G T - A G T

T A A I l l 1 T T T G T A

A T T T A A A C A A I l l 1

T A A A T T G C A

T A A A A A - T

T - A T - A

T - A C - G

T T T A G T C

B

a a T T I A C A g T

C

A ISOLEUCINE A - T

A - T G - C

A - T C - G

G - T C - G A

C C C A T A I

T A G G C C G G A G T I

T T A C G G A

A G I I I I I I G

I I l l T T

G G A A T T - A A - T T - A C - G A - T

T T T G G A T

A ARGININE A - T

A - T

A - T T - A

T - A T - A G - C

A C T T T T A C

C A A I I I C A G C G

A I l l 1 G A T T A T

T C G C G C

A G A C T T

T - A C - G

0°C A - T G - C

T C T A T C G

T CYSTEINE G * T

T - A A - T

A - T G - T

A - T T - A

T A

A A A I I I I T T A T A A

A T T T G A T A T T c I I I I T T T T A A A C T T A G A A

A - T

A - T T - A

T - A G - C

T G T A G C A

A GLUlWUC ACID A - T

T - G G - C

A - T T - A

A - T T .G

T A

A A A A I I I I I I T C T C A A J T

A T A T G G A G T T I A

T A T A C A T A A G A A

T I I I I T A G

T - A T - A

A - T T - A T - A

T T T A T T C

FIGURE 5.-Cloverleaf representation of 22 putative A. coerulaa tRNA genes. In 11 of them a direct comparison is made with the corresponding A. tum'tu putative WAS. 0, identical nucleotides; small leters, differences; - - -, gaps in A. turritu. Arrow- shaped boxes encompassing a variable number of nucleotides denote overlapping regions that are supposedly modified posttran- scriptionally by RNA editing using the opposite T nucleotides of the 5' end as an internal guide.

Page 11: Complete Sequence and Gene Organization of the Mitochondrial … · 2002. 7. 5. · Accepted for publication May 15, 1995 ABSTRACT The complete sequence (14,130 bp) of the mitochondrial

Land Snail Mitochondrial Genome 1363

C 6 LYSINE T - A COI

f * I d : N A - T C

G A G I I I I I A T C T T A A

A G C C G G A A T T - 1 I C A

T A A G C A A C A A T T

T - A C - G A - T A - T A - T

T A T A T T T

D A THREONINE A - T

T - G T - A A - T C - G T - A G - C G

A A A T T T T A T

T T T C G T A A T G G I l l 1 A A T A A G C G T A A C

I l l - T

T - A A - T G - C A - T T - A

T A T A T G T

A METHIONINE T - A

A - T T - A

G - C T - A G - C A - T T

A A A I l l T T C C T

T C G A A I l l 1

G G A

A G C T T C A

T A G A A

A - T T - A

G - C G - T G - C

C A T A C A T

G SERINE AGN A - T

A - T

A - T G - C

A - T A - T T - G T T

A T C C T T A l T I I . I I I A A G G G A T I A T A T A T T - A T T - A G * T A - T G . T G - C

C A T A G C T

A PHENYIXANINE T - A

A - T A - T

T - A T - A

A - T A - T T T T C T T T

T A T T C G A G A A A

G A A A G C A

R A T A I I I I I T

I I I I A A

G - C C - G T - G

T G T G G A A

T SERINE UCN T - A

T - A A - T T - A A - T A - T T C T C C G T T A

I T T T C T I I I I

G G C A T A A

I A T A G c c

T A

A T T I I I I I

T G A T A A

A - T A - T

A - T G - C

T - A T A T A T G A

TYROSINE 1

t

A I t T a

FIGURE 5.-Continued

Page 12: Complete Sequence and Gene Organization of the Mitochondrial … · 2002. 7. 5. · Accepted for publication May 15, 1995 ABSTRACT The complete sequence (14,130 bp) of the mitochondrial

1364 E. Hatzoglou, G. C. Rodakis and R. Lecanidou

stream from the 3' end of the tRNAva' gene, this being the smallest length among all compared molecules (Fig- ure 4). A similar observation can be made for the 3' end, where a conserved sequence of 13 nt is located only 24-25 nt from the junction with tRNALeU. The presence of the conserved heptamer sequence TGGCAGA, defined as the rRNA transcription termina- tion signal in animal mitochondria (VALVERDE et al. 1994) in positions 8-14 of the adjacent tRNAh gene, should be noted.

Comparison of the 5' ends of the small rRNA genes shows that a sequence of 152 nt in Albinaria corre- sponds to a sequence of 236 nt in Katharina or 192 nt in Drosophila. In contrast, the Albinaria 3' end appears to be longer than the equivalent regions of Katharina and Drosophila. A sequence of 18 nt, which is almost identical among compared species, is immediately fol- lowed by an inverted repeat (double underlining in Figure 4) corresponding to the final stem and loop structure of Drosophila s-rRNA (CLARY and WOLSTEN- HOLME 198513). In Albinaria this region is followed by a sequence of 27 nt that could potentially form an extra stem and loop structure, as it contains a 5-nt inverted repeat (single underlining in Figure 4). In any case, the exact points of the rRNA gene ends must be deter- mined by more direct methods (VAN ETTEN et al. 1980; CLARY and WOLSTENHOLME 198513).

Even if we assume that the Albinaria ribosomal RNA genes occupy all of the available space between adjacent tRNA genes, they still represent the shortest rRNA genes among coelomate metazoans (s-rRNA/ I-rRNA: Albinaria, 759/1035 bp; Katharina 826/1275 bp; Mytilus, 945/1244 bp; Drosophila, 789/1326) but are larger than those of pseudocoelomate nematodes (compare with Table VI of WOLSTENHOLME 1992; OKIMOTO et al. 1992).

Transfer RNA genes: Identification of the standard set of 22 Albinaria tRNA genes was based on their pre- dicted cloverleaf structures, which define unambiguous anticodons (Figure 5). Some of the putative secondary structures of ten previously reported A. tum'ta tRNA- gene sequences (LECANIDOU et al. 1994) have been re- drawn to conform with the A. coerulea tRNA secondary structures (see discussion below and Figure 5). In addi- tion to these tRNA genes, two more sequences that can be folded into tRNA-like structures were detected (see gene arrangement) within protein coding genes (Fig- ure 3 ) .

All standard Albinaria tRNAs have the same antico- dons as those reported for Mytilus and Katharina; these are preceded by T and followed by either A or G (pu- rine). Five cases of mismatched base pairs at exactly equivalent positions are evident in the anticodon stem (at the top of five-membered stems; in Ala, Gly, Phe, Pro, Trp). Albinaria anticodon stems consist of 5 and sometimes 6 bp [Ser(AGN), Tyr, and probably Ser- (UCN)]; in the case of Ser(AGN), as many as 9 bp are possible. Although not shown, the Katharina and Myti- lus tRNAYm(AGN) also have the potential of forming 9-

bp anticodon stems, all three molluscan sequences be- ing very conserved in this region. Anticodon stems with an increased potential of base pairing have been also drawn for Ser{AGN} and Ser(UCN) of C. elegans (WOL STENHOLME 1992). The actual presence of a six-mem- bered anticodon stem has been demonstrated by direct RNA sequencing of a mammalian tRNA'"(UCN) (YOKO- GAWA et al. 1991; JANKE et al. 1994). Interestingly, the Albinaria tRNATy' gene conforms very well with the sec- ondary structure of mammalian tRNA'"(UCN), in that, in addition to the six-membered anticodon stem, it con- tains only one nucleotide between the 7-bp acceptor stem and the 4bp D stem. Finally, the Albinaria tRNA'"- (UCN) gene may equally well be drawn to conform with the secondary structure of its mammalian counterpart.

Amino-acyl stems consist of 7 bp. As can be noted in Figure 5, mismatching is observed in several amino- acyl stems [Asp, Gly, His, Leu(CUN), Leu(UUR), Lys, Pro, Ser(UCN), Val]. What is interesting is that in most cases of mismatches, we are dealing with overlapping genes: Asp with Cys, Gly with His, Leu(CUN) with Pro, Leu- (UUR) with Gln, Lys with COI, Pro with Ala. It is also worth noticing that in all such cases the 3' ends of the amino-acyl stems are almost invariably composed of T residues. Recently, it was demonstrated that in A. castel- lanii tRNA mitochondrial genes, certain bases of the 5' end (confined to the first 3 bp of the acceptor arm, where correct base pairing is presumably essential for biological activity) are modified posttranscriptionally by a process of RNA editing (LONERGAN and GRAY 1993); the specificity of editing, rather than being provided by guide R N A s , could be provided by the 3' end of the acceptor stem itself. In Albinaria we believe that a simi- lar mechanism of RNA editing is operating, but that in this case what we probably have is the 5' end of the acceptor stem acting as an internal guide for editing of the 3' end. This editing might thus resemble a primitive polyadenylation mechanism. Final verification must await direct tRNA sequence determination.

We thank professors R. A. D. CAMERON, M. J. SMITH and an anony- mous reviewer for helpful comments on the manuscript. We thank VASSILIS DOURIS and LARA KRAVARITI for their excellent technical assistance. This work was supported by the University of Athens and by the General Secretariat of Research and Technology of the Greek Ministry of Industry, Energy and Technology.

LITERATURE CITED

ANDERSON, S., A. T. BANKIER, B. G. BARRELL, M. H. L. DE BRUIJN, A. R. COULSON et al., 1981 Sequence and organization of the human mitochondrial genome. Nature 290: 457-465.

ANDERSON, S., M. H. L. DE BRUIJN, A. R. COULSON, I. C. EPERON, F. SANGER et al., 1982 Complete sequence of bovine mitochon- drial DNA: conserved features of the mammalian mitochondrial genome. J. Mol. Biol. 156: 683-717.

ARNASON, U., and E. JOHNSSON, 1992 The complete mitochondrial DNA sequence of the harbor seal, Phoca vitulina. J. Mol. Evol. 34: 493-505.

ARNASON, U., and A. GULLBERG, 1993 Comparison between the complete mtDNA sequences of the blue and the fin whale, two species that can hybridize in nature. J. Mol. Evol. 37: 312-322.

ARNASON, U., A. GULLBERC and B. WIDEGREN, 1991 The complete

Page 13: Complete Sequence and Gene Organization of the Mitochondrial … · 2002. 7. 5. · Accepted for publication May 15, 1995 ABSTRACT The complete sequence (14,130 bp) of the mitochondrial

Land Snail Mitochondrial Genome 1365

nucleotide sequence of the mitochondrial DNA of the fin whale, Balaaoptera physalus. J. Mol. Evol. 33: 556-568.

ASAKAWA, S., Y. KUMAZAWA, T. ARAKI, H. HIMENO, K. MIURA et al., 1991 Strand-specific nucleotide composition bias in echinoderm and vertebrate mitochondrial genomes. J. Mol. Evol. 3 2 511-520.

AVISE, J. C., B. W. BOWEN, T. R. LAMB, A. B. MEYLAN and E. BERMING HAM, 1992 Mitochondrial DNA evolution at a turtle’s pace: evi- dence for low genetic variability and reduced microevolutionary rate in the testudines. Mol. Biol. Evol. 9 457-473.

BAIROCH, A., 1988 PC/Gene version 5.15. University of Geneva, Switzerland.

BIBB, M. J., R. A. VAN ETTEN, C. T. WRIGHT, M. W. WALBERG and D.A. CLAYTON, 1981 Sequence and gene organization of mouse mitochondrial DNA. Cell 26: 167-180.

BOORE, J. L., and W. M. BROWN, 1994a Mitochondrial genomes and the phylogeny of mollusks. Nautilus (Suppl.) 2 61-78.

BOORE, J. L., and W. M. BROWN, 1994b Complete DNA sequence of the mitochondrial genome of the black chiton, Katharina tuni- cata. Genetics 138: 423-443.

BOYCE, T. H., M. E. ZWICK and C. F. AQUADRO, 1989 Mitochondrial DNA in the pine weevils: size, structure and heteroplasmy. Genet- ics 123: 825-836.

BROUGHTON, R. E., and T. E. DOWLING, 1994 Length variation in mitochondrial DNA of the minnow Cypn‘nella spiloptera. Genetics

BURGE, C., A. M. CAMBELL and S. KARLIN, 1992 Over- and under- representation of short oligonucleotides in DNA sequences. Proc. Natl. Acad. Sci. USA 8 9 1358-1362.

CANTATORE, P., M. N. GADALETA, M. ROBERTI, C. SACCONE and A. C. WILSON, 1987 Duplication and remoulding of tRNA genes dur- ing the evolutionary rearrangement of mitochondrial genomes. Nature 329: 853-855.

CANTATORE, P., M. ROBERTI, G. RAINALDI, M. N. GADALETA and C. SACCONE, 1989 The complete nucleotide sequence, gene orga- nization, and genetic code of the mitochondrial genome of Para- centrotus lividus. J. BIOI. Chem. 264 10965-10975.

CARDON, L. R., C. BURGE, D. A. CLAYTON and S. W I N , 1994 Pem- sive CpG suppression in animal mitochondrial genomes. Proc. Natl. Acad. Sci. USA 91: 3799-3803.

CHANG, D. D., T. W. WONG, J. E. HIXON and D. A. CLAYTON, 1985 Regulatory sequences for mammalian mitochondrial transcrip tion and replication, pp. 135-144 in Achievements and Perspectives of Mitochondrial Research, Vol. II. Biogenesis, edited by E. QUAGLIA- RIELLO, E. C. SLATOR, F. PALMIERI, C. SACCONE and A. M. KROON. Elsevier, Amsterdam.

CHANG, Y.S., F.-L. HUANG and T.-B. LO, 1994 The complete nucleo- tide sequence and gene organization of carp (Cypinus carpio) mitochondrial genome. J. Mol. Evol. 3 8 138-155.

CLARY, D. O., and D. R. WOLSTENHOLME, 1985a The mitochondrial DNA molecule of Drosophila yakuba: nucleotide sequence, gene organization, and genetic code. J. Mol. Evol. 2 2 252-271.

CLARY, D. O., and D. R. WOLSTENHOLME, 1985b The ribosomal RNA genes of Drosophila mitochondrial DNA. Nucleic Acids Res. 31:

CLARY, D. O., and D. R. WOLSTENHOLME, 1987 Drosophila mitochon- drial DNA conserved sequences in the A+T-rich region and supporting evidence for a secondary structure model of the small ribosomal RNA. J. Mol. Evol. 25: 116-125.

CLAYTON, D. A,, 1991 Replication and transcription of vertebrate mitochondrial DNA. Annu. Rev. Cell Biol. 7: 453-478.

CLAYTON, D. A,, 1992 Transcription and replication of animal mito- chondrial DNAs. Int. Rev. Cyt. 141: 217-232.

CROZIER, R. H., andY. C. CROZIER, 1993 The mitochondrial genome of the honeybee Apis meU@raa: complete sequence and genome organization. Genetics 133 97-117.

CROZIER, R. H., Y. C. CROZIER and A G. M A a m Y , 1989 The COI and COII region of honeybee mitochondrial DNA: evidence for variation in insect mitochondrial rates. Mol. Biol. Evol. 6: 399-411.

DE GIORGI, C., c . LANAVE, M. D. MUSCI and C. SACCONE, 1991 Mite chondrial DNA in the sea urchin Arbacia lixulu: evolutionary in- ferences from nucleotide sequence analysis. Mol. Bid. Evol. 8: 515-529.

DESJARDINS, P., and R. MOWS, 1990 Sequence and gene organiza- tion of the chicken mitochondrial genome: a novel gene order in higher in higher vertebrates. J. Mol. Biol. 212: 599-634.

DOURIS, V., G. C. RODAKIS, S. GIOKAS, M. MYLONAS and R. LECANIDOU, 1995 Mitochondrial DNA and morphological differentiation of

138 179-190.

4029-4044.

Ahinaria populations (Gastropoda: Clausiliidae). J. Moll. Studies

GADALETA, G., G. PEPE, G. DE CANDIA, C. QUAGLJARIELLO, E. SBISA et al., 1989 The complete nucleotide sequence of the &ttm nuruegicw mitochondrial genome: cryptic signals revealed by com- parative analysis between vertebrates. J. Mol. Evol. 2 8 497-516.

GHMZZANI, S. C., S. L. D. MACKAY, C. S. MADSEN, P. J. JAIPIS and W. W. HAUSWIRTH, 1993 Transcribed heteroplasmic repeated sequences in the porcine mitochondrial DNA D-loop region. J. Mol. Evol. 37: 36-47.

GJETVAJ, B., D. I. COOK and E. ZoUROS, 1992 Repeated sequences and large scale size variation of mitochondrial DNA a common feature among scallops (Bivalvia: Pectinidae). Mol. Biol. Evol. 9

HENIKOFF, S., 1987 Exonuclease I11 generated deletions for DNA sequence analysis. Promega Notes No. 8.

HOFFMANN, R. J., J. L. BOORE and W. M. BROWN, 1992 A novel mitochondrial genome organization for the blue mussel Mytilus edulis. Genetics 131: 397-412.

JACOBS, H., D. ELLIOT, V. MATH and A. FARGUHARSON, 1988 Nuclee tide sequence and gene organization of sea urchin mtDNA. J. Mol. Biol. 201: 185-217.

JACOBS, H. T., E. R. HERBERT and J. RANKINE, 1989 Sea urchin egg mitochondrial DNA contains a short displacement loop (D-loop) in the replication origin region. Nucleic Acids Res. 17: 8949- 8965.

JANKE, A,, G. FELDMAIER-FUCHS, W. K. THOMAS, A. VON HAESELER and S. PAho, 1994 The marsupial mitochondrial genome and the evolution of placental mammals. Genetics 137: 243-256.

L’ABBP, D., J. F. DUHAIME, B. F. LANG and R MOWS, 1991 The transcription of DNA in chicken mitochondria initiates from one major bidirectional promoter. J. Biol. Chem. 266 10844-10850.

LAROCHE, J., M. SNYDER, D. I. COOK, K. FULLER and E. ZOUROS, 1990 Molecular characterization of a repeat element causing large size variation in the mitochondrial DNA of the sea scallop Pluceopecten magellanim. Mol. Biol. Evol. 7: 45-64.

LECANIDOU, R., V. DOURIS and G. C. RODAKIS, 1994 Novel features of metazoan mtDNA revealed from sequence analysis of three mitochondrial DNA segments of the land snail Ahinaria tumita (Gastropoda: Clausiliidae). J. Mol. Evol. 3 8 369-382.

LONERGAN, K. M., and M. W. GRAY, 1993 Editing of transfer RNAs in Acanthamueba castelanii mitochondria. Science 259: 812-816.

MARTIN, A. P., G. J. P. NAYLOR and S. R. PALUMBI, 1992 Rates of mitochondrial DNA evolution in sharks are slow compared with mammals. Nature 357: 153-155.

MITCHELL, S. E., A. F. COCKBURN and J. A. SEAWRIGHT, 1993 The mitochondrial genome of Anopheles quadrimaculutuc species A complete nucleotide sequence and gene organization. Genome

MORITZ, C., T. E. DOWLING and W. M. BROWN, 1987 Evolution of animal mitochondrial DNA relevance for population biology and systematics. Annu. Rev. Ecol. Syst. 1 8 296-292.

OJALA, D., C. MERKEL, R. GELFAND and G. A ~ A R D I , 1980 The tRNA genes punctuate the reading of genetic information in human mitochondrial DNA. Cell 22: 393-403.

OJALA, D., J. MONTOYA and G. ATTARDI, 1981 tRNA punctuation model of RNA processing in human mitochondria. Nature 290 470-474.

OKIMOTO, R, J. L. MACFARLANE and D. R. WOLSTENHOLME, 1990 Evidence for the frequent use of l T G as the translation initiation codon of mitochondrial protein genes in the nematodes, Ascaris suum and Caaorhabditis elegans. Nucleic Acids Res. 18: 6113- 6118.

OKIMOTO, R., J. L. MACFARLANE, D. 0. CLARY and D. R. WOLSTEN- HOLME, 1992 The mitochondrial genomes of two nematodes Caaorhabditis elegans and Ascaris suum. Genetics 130: 471-498.

PALUMBI, S. R., andJ. BENZIE, 1991 Large mitochondrial DNA differ- ences between morphologically similar Paneid shrimb. Mol. Mar. BIOI. Biotech. 1: 27-34.

PEARSON, W. R., and D. J. LIPMAN, 1988 Improved tools for biologi- cal sequence comparison. Proc. Natl. Acad. Sci. USA 8 5 2444- 2448.

PEREZ, M. L., J. R. VALVERDE, B. BATUECAS, F. AMAT, R. MARGO d al., 1994 Speciation in the A l t a i a genus: mitochondrial DNA analysis of bisexual and parthenogenetic shrimps. J. Mol. Evol.

PUSTELL, J., and F. C. KAFATOS, 1984 A convenient and adaptable

61: 65-78.

106-124.

36: 1058-1073.

38: 156-168.

Page 14: Complete Sequence and Gene Organization of the Mitochondrial … · 2002. 7. 5. · Accepted for publication May 15, 1995 ABSTRACT The complete sequence (14,130 bp) of the mitochondrial

1366 E. Hatzoglou, G. C. Rodakis and R. Lecanidou

package of computer programs for DNA and protein sequence management, analysis and homology determination. Nucleic Acids Res. 1 2 643-655.

PUSTELL, J., and F. C. KAFATOS, 1986 A convenient and adaptable microcomputer environment for DNA and protein manipulation and analysis. Nucleic Acids Res. 1 4 479-488.

RAMIREZ, V., P. SAVOIE and R. MOW, 1993 Molecular characteriza- tion and evolution of a duck mitochondrial genome. J. Mol. Evol.

ROE, B. A., D.-P. MA, R. K. WIUON and J. F.-H. WONG, 1985 The complete nucleotide sequence of the Xenopus l0eUi.s mitochon- drial genome. J. BIOI. Chem. 260 9759-9774.

SAMBROOK, J., E. F. FRITSCH and T. MANIATIS, 1989 Molecular clon- ing: a laboratory manual. Cold Spring Laboratory Press, Cold Spring Harbor, NY.

SANGER, F., S. KICKLEN and A. R COULSON, 1977 DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 7 4

SHADEL, G. S., and D. A. CLAYTON, 1993 Mitochondrial transcription initiation. J. Biol. Chem. 468 16083-16086.

SMITH, M. J., A. ARNDT, S. GORSKI and E. FAJBER, 1993 The phylog: eny of echinoderm classes based on mitochondrial gene arrange- ments. J. Mol. Evol. 3 6 545-554.

37: 296-310.

5463-5467.

SNYDER, M., A. R FRASER, J. LAROCHE, K. E. GARTNER-KEPKAY and E. ZOUROS, 1987 Atypical mitochondrial DNA from the deep-sea scallop Plocopectm magellanicus. Proc. Natl. Acad. Sci. USA 8 4 7595-7599.

SRIVASTAVA, K. K., E. E. CABLE and H. L. BONKOVSKY, 1993 Improved resolution and sensitivity of Northern blots using polyacrylamide- urea gels. Biotechniques 15: 227-231.

TERRETT, J., S . MILES and R. H. THOMAS, 1994 The mitochondrial genome of Cepaea naoralis: gene order, base composition, and heteroplasmy. Nautilus (Suppl.) 2 79-84.

VALXERDE, J. R, R. MARCO and R. GARESSE, 1994 A conserved h e p tamer motif for ribosomal RNA transcription termination in ani- mal mitochondria. Proc. Natl. Acad. USA 91: 5368-5371.

VAN ETIXN, R. A., M. W. WALBERG and D. A. CLAWON, 1980 Precise localization and nucleotide sequence of the two mouse mito- chondrial rRNA genes and three immediately adjacent novel tRNA genes. Cell 2 2 157-170.

WOLSTENHOLME, D. R., 1992 Animal mitochondrial DNA Structure and evolution. Int. Rev. Cyt. 141: 173-216.

YOKOGAWA, T., Y. WATANABE, Y. KUMAZAWA, T. UEDA, I. HIRAO et al., 1991 A novel cloverleaf structure found in mammalian mito- chondrial tRNAs' (UCN). Nucleic Acids Res. 1 9 6101-6105.

Communicating editor: W.-H. LI