

THE JOURNAL OF BIOLOGICAL CHEMISTRY Vol. 267, No. 17, Issue of June 15, pp. 12211-12219, 1992 Printed in U.S.A.

Gene and Pseudogene of the Mouse Cation-dependent Mannose 6-Phosphate Receptor GENOMIC ORGANIZATION, EXPRESSION, AND CHROMOSOMAL LOCALIZATION*

(Received for publication, October 28, 1991)

Thomas Ludwig‡§, Ulrich Rüther¶, Rainer Metzger‖, Neal G. Copeland**, Nancy A. Jenkins**, Peter Lobel‡‡, and Bernard Hoflack‡
From the European Molecular Biology Laboratory, ‡Cell Biology and ¶Differentiation Programme, Postfach 10.22.09, Meyerhofstrasse 1 and the ‖Department of Pharmacology, University of Heidelberg, Im Neuenheimer Feld 366, 6900 Heidelberg, Federal Republic of Germany, the **Mammalian Genetics Laboratory, ABL-Basic Research Program, National Cancer Institute-Frederick Cancer Research and Development Center, Frederick, Maryland 21701, and the ‡‡Center for Advanced Biotechnology and Medicine and University of Medicine and Dentistry of New Jersey, Piscataway, New Jersey 08854

The cation-dependent mannose 6-phosphate receptor (CD-MPR) is one of the two transmembrane proteins involved in transport of lysosomal enzymes. We have cloned the mouse CD-MPR gene and also a very unusual processed-type CD-MPR pseudogene. They are both present at one copy per haploid genome and map to chromosomes 6 and 3, respectively.

Comparison of the complete 10-kilobase (kb) sequence of the functional gene with the cDNA indicates that it contains seven exons. Exon 1 encodes the 5'-untranslated region of the mRNA; the others (exons 2-7) encode the luminal, transmembrane, and cytoplasmic domains of the CD-MPR. Exon 7 also contains a 1.2-kb-long 3'-untranslated region of the mRNA. A unique transcription-initiation site was determined by primer extension of mouse liver mRNA. The promoter elements in the 5' upstream region of this site resemble those contained in genes constitutively transcribed. However, Northern blot analysis demonstrates that the CD-MPR is variably expressed in adult mouse tissues and during mouse development.

The pseudogene, which is flanked by direct repeats, is almost colinear with the cDNA, indicating that it presumably arose by reverse transcription of an mRNA. However, the pseudogene differs from the cDNA. It contains at its 5' end an additional 340-nucleotide (nt) sequence homologous to the promoter region of the functional gene. This sequence exhibits some promoter activity in vitro. Furthermore, a 24-nt insertion interrupts the region homologous to the 5'-noncoding region of the cDNA. In the functional gene, this 24-nt sequence occurs between exons 1 and 2, where it is flanked by typical consensus sequences of exon/intron boundaries. Therefore, it may represent an additional exon of the functional gene. These two features of the pseudogene suggest that expression of the CD-MPR gene may be regulated by use of different promoters and/or alternative splicing.

* This work was supported in part by the National Cancer Institute, Department of Health and Human Services, under Contract N01-CO-74101 (to N. G. C.), NATO under Contract 900226 (to B. H. and P. L.), and the Searle Scholars Program/The Chicago Community Trust (to P. L.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) X64068 (CD-MPR cDNA), X64069 (CD-MPR pseudogene), and X64070 (CD-MPR gene).

§ Supported by a predoctoral fellowship of the Boehringer Ingelheim Fonds. To whom correspondence should be addressed.

In higher eukaryotic cells, the mannose 6-phosphate receptors (MPRs)¹ are essential components of the targeting system that delivers newly synthesized lysosomal enzymes to lysosomes (for review see Kornfeld and Mellman, 1989). Two distinct MPRs have been characterized: the cation-dependent MPR (CD-MPR) and the cation-independent MPR (CI-MPR). The latter is also the receptor for insulin-like growth factor II (Morgan et al., 1987; MacDonald et al., 1988; Tong et al., 1988). Cloning and sequence analysis of the bovine, human, and mouse CD-MPR (Dahms et al., 1987; Pohlmann et al., 1987; Ma et al., 1991; Koster et al., 1991) and the bovine and human CI-MPR (Lobel et al., 1988; Oshima et al., 1988) revealed that they are distinct but related proteins encoded by different genes which presumably arose from a common ancestor. The extracytoplasmic domain of the CI-MPR is a repetitive structure consisting of 15 repeating units (Lobel et al., 1988); each of these repeats is similar to the luminal binding domain of the CD-MPR (Dahms et al., 1988). The MPRs are distributed over several cellular compartments (for review see Kornfeld and Mellman, 1989). In the trans-Golgi network, they mediate sorting of lysosomal enzymes from secretory proteins. At the plasma membrane, the CI-MPR binds and mediates endocytosis of extracellular ligands, but the CD-MPR does not bind ligands although this receptor is endocytosed.

It is still unclear why two distinct MPRs are involved in the transport of lysosomal enzymes. Some insight into their function has come from studies on CI-MPR-deficient cell lines. These cells secrete large amounts of their newly synthesized lysosomal enzymes (Gabel et al., 1983). Expression of the CI-MPR in these cells corrects this hypersecretion, demonstrating that this receptor is predominantly involved in their intracellular retention (Kyle et al., 1988; Lobel et al., 1989). The function of the CD-MPR in lysosomal enzyme trafficking is less clear, mostly due to the lack of cells devoid of this receptor. Overexpression of the CD-MPR in different cell systems suggests that this receptor is involved in both

¹ The abbreviations used are: MPR(s), mannose 6-phosphate receptor(s); CD-MPR, cation-dependent mannose 6-phosphate receptor; CI-MPR, cation-independent mannose 6-phosphate receptor; SDS, sodium dodecyl sulfate; kb, kilobase(s); bp, base pair(s); nt, nucleotide; CAT, chloramphenicol acetyltransferase.


secretion and intracellular retention of newly synthesized lysosomal enzymes. Overexpression of the human CD-MPR in BHK cells leads to a higher secretion of newly synthesized lysosomal enzymes (Chao et al., 1990), whereas the overexpression of the same receptor in CI-MPR-negative mouse L-cells results in a higher intracellular retention of lysosomal enzymes (Watanabe et al., 1990).

Klier and co-workers (1991) recently cloned the human CD-MPR gene and described its exon/intron boundaries. They concluded from the analyses of the promoter region that this gene is constitutively expressed. We have cloned the mouse CD-MPR gene, which is variably expressed in mouse tissues, and established its complete 10-kb sequence. We have also cloned and sequenced a very unusual processed-type CD-MPR pseudogene. The pseudogene is similar to the cDNA but has two additional features. It contains a 24-bp insert also found in the functional gene. Its 5' region is highly homologous to the promoter region of the functional gene. The structure of the pseudogene strongly suggests that the CD-MPR gene contains an additional exon and that its expression can be regulated by use of additional promoter elements and/or alternative splicing.

EXPERIMENTAL PROCEDURES

Materials-T4 DNA ligase, restriction enzymes, and T4 polynucleotide kinase were from Boehringer. [α-³²P]dCTP, [α-³⁵S]dATP, [γ-³²P]ATP, and [¹⁴C]chloramphenicol were from Amersham.

Library Screening-For the mouse CD-MPR cDNA, a mouse liver cDNA library (2 × 10⁶ primary plaques; Stratagene) was screened by hybridization using a full-length bovine cDNA (Dahms et al., 1987). For the genomic clones, "MboI-partial" genomic libraries of BALB/c mice² and C57BL/6J mice (1.2 × 10⁶ plaques each) were screened with the full-length mouse cDNA. Positive plaques were isolated and rescreened until pure preparations were obtained.

DNA Sequencing-The nucleotide sequences of the mouse CD-MPR cDNA, gene, and pseudogene were determined by the dideoxynucleotide chain termination method (Sanger et al., 1977) using Sequenase II reagents (United States Biochemical Corp.) and [α-³⁵S]dATP. DNA fragments were subcloned into the vector pBluescript SK(+) (Stratagene) and sequenced using both synthetic internal primers and primers located adjacent to the polylinker within the vector. Sequencing reaction products were electrophoresed on 6 and 4% polyacrylamide, 7 M urea gels.

RNA Extraction and Blot Hybridization Analysis-Total RNA was isolated using the guanidine hydrochloride procedure (Chirgwin et al., 1979). For Northern (RNA) blot analysis, total RNAs (20 µg/slot) were separated by 1% agarose gel electrophoresis and transferred to membranes (Genescreen). Filters were prehybridized in 7% SDS, 0.5 M NaPO4, pH 7.2 (Church and Gilbert, 1984) for 30 min at 65 °C. Hybridization was carried out in the same buffer supplemented with ³²P-labeled probes (2 × 10⁶ cpm/ml) for 16 h at 65 °C. The filters were washed twice for 5 min in 5% SDS, 3× SSC at 65 °C and then three times for 20 min in 1% SDS, 1× SSC. X-ray films were exposed for 1-2 days at -70 °C with intensifying screens.

Primer Extension-Primer extension analysis was performed according to Field and Gross (1985). The 5' end-labeled oligonucleotide 5'GGAAGACGCGTCCGGAGATCAGGGGTGCGGGGGCG3', complementary to the 5'-untranslated region of the CD-MPR mRNA, was used as a primer. The primer was annealed to 10 µg of total mouse liver RNA or tRNA. The hybrids were precipitated with ethanol, and the primer was extended with avian myeloblastosis virus reverse transcriptase (Stratagene). The primer extension products were analyzed on a 6% polyacrylamide, 7 M urea gel. A sequence reaction performed on a genomic DNA fragment with the same primer was run in parallel.
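Because the primer is complementary to the mRNA, its reverse complement gives the stretch of the 5'-untranslated region to which it anneals. The following is a minimal Python sketch of that relationship; the primer string is copied from the text, while the function and constant names are illustrative and not part of the original methods.

```python
# Primer sequence as printed in the text, written 5' -> 3'.
PRIMER = "GGAAGACGCGTCCGGAGATCAGGGGTGCGGGGGCG"

# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Sense-strand sequence (5' -> 3') expected to pair with the primer.
    target = reverse_complement(PRIMER)
    print(f"primer length: {len(PRIMER)} nt")
    print(f"annealing site (sense strand): {target}")
```

Searching a genomic or cDNA sequence for the string returned by `reverse_complement(PRIMER)` would locate the annealing site on the sense strand.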

Plasmid Constructions and Chloramphenicol Acetyltransferase (CAT) Assays-5' fragments of the CD-MPR gene and pseudogene (1.4 kb upstream of the ATG initiation codon) were amplified using the polymerase chain reaction with oligonucleotide primers containing restriction enzyme sites. The resulting BamHI-XhoI fragments

² R. Metzger, T. Ludwig, C. Berberich, B. Bunneman, F. Kirchhoff, W. Jaromilek, J. Mullins, and D. Ganten, submitted for publication.

FIG. 1. Organization of the mouse CD-MPR gene. A, restriction endonuclease cleavage map of the mouse CD-MPR. The hatched bar indicates the region that was sequenced. Restriction sites used for restriction fragment analysis and subcloning are indicated: B, BglII; E, EcoRI; H, HindIII; K, KpnI; S, SacI; X, XhoI. B, exon/intron structure and sequencing strategy. Filled or open boxes represent exons or 5'- and 3'-noncoding sequences, respectively. Introns and flanking sequences are depicted with thin lines. Exon 1* indicates the additional exon found in the pseudogene. Two potential polyadenylation signals (PA) are indicated. Arrows depict the direction and extent of nucleotide sequence determined from each primer. C, restriction endonuclease cleavage map of the intronless gene. The open box indicates the sequence flanked by the direct repeats. Restriction sites used for restriction fragment analysis and subcloning are indicated: E, EcoRI; H, HindIII; S, SacI; P, PstI.

were cloned into the CAT-expression vector pBLCAT3 (Luckow and Schutz, 1987). All polymerase chain reaction products were sequenced and were identical to the template sequences. Transfections of mouse L-cells with the resulting plasmids pBLCAT3-CDMPR and pBLCAT3-Pseudo were performed using the transfection reagent DOTAP (Boehringer, Mannheim, FRG) according to the supplier's manual. Cell lysates were prepared 48 h after transfection and assayed for CAT activity as described (Gorman et al., 1982). pBLCAT2 containing the HSV-TK promoter and pBLCAT3 without insert were used as a positive and negative control, respectively.

Interspecific Backcross Mapping-Interspecific backcross progeny were generated by mating (C57BL/6J × Mus spretus) F1 females and C57BL/6J males as described (Copeland and Jenkins, 1991). A total of 205 N2 progeny were obtained; a random subset of these N2 mice was used to map the CD-MPR locus (see text for details). DNA isolation, restriction enzyme digestion, agarose gel electrophoresis, Southern blot transfer, and hybridization were performed essentially as described (Jenkins et al., 1982). All blots were prepared with Zetabind nylon membrane (AMF-Cuno). We first mapped a cDNA probe that should hybridize to both the structural CD-MPR gene and the CD-MPR pseudogene loci. Fragments of 6.0 and 3.7 kb were detected in EcoRI-digested C57BL/6J DNA and 8.0 and 3.5 kb in EcoRI-digested M. spretus DNA. The 8.0-kb M. spretus-specific restriction fragment length polymorphism mapped to chromosome 6; the 3.5-kb fragment mapped to chromosome 3. To determine which locus corresponded to the CD-MPR structural gene, we mapped an intron-specific probe. This gene-specific probe identified only the locus on chromosome 6. A description of the probes and restriction fragment length polymorphisms for the loci linked to the CD-MPR locus, including the ras-related fibrosarcoma oncogene (Raf-2), ret protooncogene (Ret), lymphocyte antigen-4 (Ly-4), and fibroblast growth factor-6 (Fgf-6) loci, has been reported previously (Hogan et al., 1991). A description of the probes and restriction fragment length polymorphisms for the loci linked to the CD-MPR pseudogene on mouse chromosome 3, including fibrinogen γ polypeptide (Fgg), connexin-40 (Cxn-40), and nerve growth factor β (Ngfb) loci, has been described.³ Recombination distances were calculated as described (Green, 1981) using the computer program SPRETUS MADNESS. Gene order was determined by minimizing the number of recombination events required to explain the allele distribution patterns.

Other Methods-Preparation of plasmid DNA, phage DNA, and genomic DNA, electrophoresis and restriction mapping of DNA, and polymerase chain reaction were carried out essentially as described by Sambrook et al. (1989). Hybridization probes were labeled with [α-³²P]dCTP using a random oligonucleotide primer kit (Stratagene).

³ J.-A. Haeflinger, R. Bruzzone, N. A. Jenkins, D. J. Gilbert, N. G. Copeland, and D. L. Paul, submitted for publication.
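The two calculations named in this paragraph can be illustrated with a short Python sketch. It assumes the usual backcross estimate, in which the recombination distance is the percentage of recombinant animals and its standard error is the binomial one, and it finds gene order by brute force over all permutations. Whether SPRETUS MADNESS computes exactly these quantities is an assumption, and the locus names and haplotypes below are invented toy data, not results from the paper.

```python
import math
from itertools import permutations


def recombination_distance(recombinants, total):
    """Return (distance in cM, standard error in cM) for a backcross.

    Assumes distance = 100 * r and SE = 100 * sqrt(r * (1 - r) / n),
    where r is the fraction of recombinant animals among n typed.
    """
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    r = recombinants / total
    return 100.0 * r, 100.0 * math.sqrt(r * (1.0 - r) / total)


def count_recombinations(order, animals):
    """Count allele switches between adjacent loci, summed over animals."""
    return sum(
        sum(1 for a, b in zip(order, order[1:]) if animal[a] != animal[b])
        for animal in animals
    )


def best_order(loci, animals):
    """Return a locus order that minimizes the total recombination count."""
    return min(permutations(loci), key=lambda o: count_recombinations(o, animals))


if __name__ == "__main__":
    # Hypothetical counts: 5 recombinants among 100 typed N2 animals.
    print(recombination_distance(5, 100))

    # Toy haplotypes: "B" = C57BL/6J allele, "S" = M. spretus allele.
    toy_animals = [
        {"locusA": "B", "locusB": "B", "locusC": "S"},
        {"locusA": "S", "locusB": "B", "locusC": "B"},
        {"locusA": "B", "locusB": "B", "locusC": "B"},
    ]
    print(best_order(["locusA", "locusB", "locusC"], toy_animals))
```

An order and its reverse always give the same count, so the function returns whichever of the two it meets first. Exhaustive search is only practical for a handful of loci.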


FIG. 2. Complete nucleotide sequence of the mouse cation-dependent mannose 6-phosphate receptor gene. The layout of exons within the gene was determined using the consensus sequences of the exon/intron boundaries and by comparison to the mouse cDNA. Intron sequences are displayed in lowercase letters. The transcription initiation site and the end of the CD-MPR RNA are indicated by a bold/italic typed nucleotide (positions 1245 and 9922, respectively). Intron boundaries with the consensus gt and ag dinucleotides and the two putative poly(A) signals are underlined. The amino acid sequence of the mouse CD-MPR is also shown. Potential glycosylation sites are indicated by asterisks.

TABLE I
Exon/intron boundaries of the mouse CD-MPR gene
Nucleotide sequences at the exon and intron boundaries are shown in uppercase and lowercase letters, respectively. The intron sizes are shown in parentheses. Exon 1* indicates the putative exon found in the pseudogene.

Exon/intron number   Exon size (bp)   Exon     Intron                                     Exon
1                    160              TGACAC   gtgagt ... (2451) ... tttttttttccccag      AGAAAT
1*                   24               ATTGAG   gtgagt ... (606) ... acactcttccaaaag       AATGTT
2                    180              CAAAAG   gtaaag ... (842) ... ccttttcttgtatag       TTTTGA
3                    167              ATGGAA   gtaaga ... (1630) ... cttgtccctcctcag      GTAATT
4                    110              CTAGCG   gtaagg ... (135) ... tcttctcatcatcag       GCTAAT
5                    131              TGTCAT   gtgata ... (613) ... tcttttcctgtctag       ATTTGC
6                    127              GTAGCT   gtaagt ... (151) ... gctttttgcttccag       GATGGT
7                    1247+
Consensus sequence                             g t r a y (n )yag dxw

FIG. 3. Primer extension with mouse liver RNA. A ³²P-labeled oligonucleotide was annealed to total RNA (lane 1) and tRNA (lane 2) as described under "Experimental Procedures" and extended using avian myeloblastosis virus reverse transcriptase. The products were separated on 6% polyacrylamide, 7 M urea gels together with a sequencing reaction using the same oligonucleotide as primer on genomic DNA. Lanes GATC show the sequence of the noncoding strand. The arrow indicates the position of the primer extension product.

RESULTS

The Mouse CD-MPR Gene-We first isolated mouse CD-MPR cDNA clones from a mouse liver cDNA library. The longest clone had a 147-bp 5'-untranslated region, an 834-bp open reading frame, and a 1247-bp 3'-noncoding region containing two putative polyadenylation signals (217 and 1210 bp downstream of the termination codon). The deduced amino acid sequence (see Fig. 2) was identical to that of the mouse CD-MPR whose cDNAs were recently isolated from mouse fibroblast and embryo cDNA libraries (Ma et al., 1991 and Koster et al., 1991) and more than 90% identical to that of the bovine and human CD-MPRs (Dahms et al., 1987; Pohlmann et al., 1987).

A full-length cDNA was used to isolate the CD-MPR gene from a mouse genomic library. Sixteen positive clones were obtained and analyzed by restriction mapping. Fragments hybridizing with the probe were subcloned and sequenced. Fig. 1 shows a partial restriction map of the CD-MPR gene, its organization, and the sequencing strategy which was followed. The complete nucleotide sequence is displayed in Fig. 2. Comparison of this genomic sequence with that of the mouse cDNA revealed that the mouse CD-MPR is encoded by seven exons. All exon/intron boundaries exhibit the consensus sequences typical of splice junctions (Mount, 1982). Table I summarizes the exon/intron boundaries, the consensus splice donor and acceptor sites, as well as the length of the different exons and introns.
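The splice-junction statement can be checked mechanically against Table I. The sketch below, in Python, tests each intron for the gt...ag rule and sums the intron sizes. The intron ends and sizes are transcribed from Table I as it appears in this copy, with extraction spacing removed, so any transcription slip would carry over. The variable and function names are illustrative.

```python
# (intron label, first 6 nt, last 15 nt, size in bp), transcribed from Table I.
INTRONS = [
    ("1/1*", "gtgagt", "tttttttttccccag", 2451),
    ("1*/2", "gtgagt", "acactcttccaaaag", 606),
    ("2/3", "gtaaag", "ccttttcttgtatag", 842),
    ("3/4", "gtaaga", "cttgtccctcctcag", 1630),
    ("4/5", "gtaagg", "tcttctcatcatcag", 135),
    ("5/6", "gtgata", "tcttttcctgtctag", 613),
    ("6/7", "gtaagt", "gctttttgcttccag", 151),
]


def follows_gt_ag(donor: str, acceptor: str) -> bool:
    """True if the intron starts with gt and ends with ag."""
    return donor.lower().startswith("gt") and acceptor.lower().endswith("ag")


def total_intron_length(introns) -> int:
    """Sum of the intron sizes listed in the table."""
    return sum(size for _, _, _, size in introns)


if __name__ == "__main__":
    for label, donor, acceptor, size in INTRONS:
        status = "ok" if follows_gt_ag(donor, acceptor) else "non-canonical"
        print(f"intron {label}: {size} bp, {status}")
    print("total intron length:", total_intron_length(INTRONS), "bp")
```

The check covers only the invariant dinucleotides. Scoring the full donor and acceptor consensus of Mount (1982) would need a position weight matrix, which is not attempted here.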

The transcription initiation site of the CD-MPR gene was determined by primer extension of mouse liver mRNA. The size of the resulting fragment indicates that the transcription initiation site is a cytosine located 161 nt upstream of the translational initiation codon (Fig. 3). No product was formed when liver mRNA was replaced by yeast tRNA. The region upstream and downstream of the initiation site (between nt 720 and 1620 in Fig. 2) is unusually rich in G and C residues (60%). The frequency of the CpG dinucleotide is approximately equal to that of the GpC dinucleotide. Such sequences, called CpG islands (Gardiner-Garden and Frommer, 1987), as well as the lack of a "TATA box," are characteristic of constitutively expressed "housekeeping" genes (Dynan, 1986).
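The quantities mentioned here (G+C content and the CpG versus GpC comparison) are simple to compute. The Python sketch below also includes the observed/expected CpG ratio, with default thresholds of at least 200 bp, more than 50% G+C, and a ratio above 0.6. Those thresholds are the ones commonly attributed to Gardiner-Garden and Frommer (1987); they are an assumption here and are not stated in the text. The function names are illustrative, and the sequences in the usage section are toy strings, not the gene sequence.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C residues in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def dinucleotide_counts(seq: str) -> tuple:
    """Return (number of CpG, number of GpC) dinucleotides."""
    seq = seq.upper()
    return seq.count("CG"), seq.count("GC")


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed length, G+C, and obs/exp thresholds."""
    return (
        len(seq) >= min_len
        and gc_fraction(seq) > min_gc
        and cpg_obs_exp(seq) > min_ratio
    )


if __name__ == "__main__":
    rich = "CG" * 100   # toy GC-rich sequence
    poor = "AT" * 100   # toy AT-only sequence
    for name, s in (("rich", rich), ("poor", poor)):
        print(name, gc_fraction(s), dinucleotide_counts(s), looks_like_cpg_island(s))
```

Applied to nt 720-1620 of the deposited gene sequence, the same functions could be used to compare against the 60% G+C figure quoted above.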

The Mouse CD-MPR Pseudogene-During the sequencing of the genomic DNA, we noticed that several of the isolated clones lacked sequences corresponding to identified introns. Southern hybridization analysis of digested mouse genomic DNA using either an intron- or an exon-specific probe also revealed that a pseudogene lacking intron sequences was present in the mouse genome (not shown). The complete nucleotide sequence of this CD-MPR pseudogene is shown in Fig. 4. The CD-MPR pseudogene is bounded by 12/14-bp direct repeats (5' AAGAT/CT/ATATAATTC 3') at the 5' and 3' termini. The nucleotide sequences 5' upstream and 3' downstream of the 12/14-bp direct repeats were totally unrelated to the sequences of the CD-MPR gene (not shown).
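The "12/14-bp" notation can be made concrete by expanding the degenerate repeat into its two copies and counting matching positions. In the Python sketch below, the two strings take the first and the second alternative at each of the two variable positions. The notation alone does not say which alternatives occur together or which copy lies at which terminus, so that pairing is an assumption. The count is the same for either pairing in which both positions differ.

```python
# Two copies of the direct repeat, expanded from 5' AAGA[T/C][T/A]TATAATTC 3'.
# Assumption: first alternatives together, second alternatives together.
REPEAT_A = "AAGATTTATAATTC"
REPEAT_B = "AAGACATATAATTC"


def matching_positions(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b))


if __name__ == "__main__":
    matches = matching_positions(REPEAT_A, REPEAT_B)
    print(f"{matches}/{len(REPEAT_A)} positions identical")
```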


[Figure 4: sequence alignment of the CD-MPR pseudogene and cDNA.]

FIG. 4. Nucleotide sequence comparison of the CD-MPR pseudogene and cDNA. The ATG codon and stop codons are shown by open boxes. The 12/14-bp direct repeats that bound the intronless gene and the two putative poly(A) signals are underlined. Dots indicate identity with the cDNA sequence. Dashes indicate the absence of nucleotides. Numbers on the left and the right side of the sequence refer to the pseudogene and the cDNA sequences, respectively.

Comparison of the CD-MPR Gene, Pseudogene, and cDNA-We first compared the nucleotide sequence of the pseudogene with that of the cDNA (Fig. 4). A large part of the pseudogene (from nt 340 to 2614) is highly homologous to, and almost colinear with, the cDNA and contains the remnants of a poly(A) tail in the 3'-noncoding region. Several point mutations, deletions, and insertions can be found over this region of the pseudogene. Its sequence predicts that it would encode a protein 95% identical to the CD-MPR. However, a deletion at position 939 would introduce a stop codon at position 950, giving rise to a truncated CD-MPR made up of the first 141 amino acids.
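The way a single-nucleotide deletion shifts the reading frame and exposes a premature stop codon can be illustrated with a short Python sketch. The toy open reading frame and the function names below are hypothetical and are not the CD-MPR sequence; only the standard genetic code is assumed.

```python
# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def delete_base(seq, position):
    """Remove the nucleotide at a 1-based position."""
    return seq[:position - 1] + seq[position:]


# Hypothetical toy ORF: ATG GCA TTG ATA GCC TGA
wild_type = "ATGGCATTGATAGCCTGA"
mutant = delete_base(wild_type, 4)   # frame shifts after the ATG

print(translate(wild_type))  # full-length toy peptide
print(translate(mutant))     # truncated by an early in-frame stop
```

In the toy example the deletion brings a TGA triplet into frame two codons downstream, which is the same kind of event as the deletion at position 939 creating a stop codon at position 950.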

The pseudogene differs from the cDNA in that it contains a 24-nt insertion (nt 504-527). This 24-nt sequence is also found in the CD-MPR gene between exons 1 and 2 (positions 3855 and 3880) and is flanked by typical consensus sequences for intron/exon boundaries (see Table I). It would give rise to an additional 24-nt exon, referred to as exon 1*, which would encode a short open reading frame (Met-Ala-His-Lys) followed by a stop codon in frame with the initiation codon found in exon 2 of the CD-MPR gene.
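Whether a short upstream open reading frame lies in frame with a downstream initiation codon is a matter of simple index arithmetic. The Python sketch below uses a hypothetical leader sequence: one possible coding of Met-Ala-His-Lys followed by a stop codon, an arbitrary 6-nt spacer, and the start of a downstream reading frame. None of these nucleotides are taken from the CD-MPR gene, and the function name is an assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(seq, start=0):
    """Return (atg_index, stop_index) of the first ATG-initiated ORF, or None."""
    atg = seq.find("ATG", start)
    if atg == -1:
        return None
    for i in range(atg, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return atg, i
    return None


# Hypothetical leader: upstream ORF + stop, 6-nt spacer, downstream ORF start.
leader = "ATGGCTCACAAATAA" + "GCCACC" + "ATGTTCCCT"

upstream_atg, upstream_stop = first_orf(leader)
orf_length = upstream_stop - upstream_atg             # coding nt before the stop
main_atg = leader.find("ATG", upstream_stop + 3)      # next ATG after the stop
spacer = main_atg - (upstream_stop + 3)               # nt between stop and ATG
in_frame = (main_atg - upstream_atg) % 3 == 0

print(orf_length, spacer, in_frame)
```

With these toy values the upstream reading frame is 12 nt long, and its stop codon lies in the same frame as the downstream ATG.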

The 5' region of the pseudogene is 340 bp longer than the cDNA. This sequence is highly homologous to the region of the CD-MPR gene located 5' upstream of the transcription initiation site that we have identified (Fig. 5). In the gene, this sequence must represent the promoter region. Although it lacks a classical TATA box, it contains a "CAAT box" at position -121 and three of its reverse complements at positions -164, -218, and -295, which are recognized by CP1/CP2 and by CP1/CP2-RC (reverse complement) factors, respectively. We noted that the CAAT box at position -121 can be recognized by nuclear factor 1 (NF1) (Jones et al., 1987). All these elements are located much further upstream from the transcription initiation site than the usual -70 position. Consensus binding sites for the transcription factor SP1 (Briggs et al., 1986) can also be found at positions -237 and -253, as well as a site for factor AP2 (Imagawa et al., 1987) at position -257. The 340-nt-long homologous sequence


FIG. 5. Nucleotide sequences of the upstream region of the mouse CD-MPR gene and pseudogene. The upper sequence delineates the 5'-flanking region and the first noncoding exon of the CD-MPR gene. The lower sequence represents the 5' end of the pseudogene. Identical nucleotides are indicated by closed circles. The transcription initiation site of the CD-MPR mRNA is depicted as +1. The potential transcription factor binding sites and the direct repeat at the 5' end of the pseudogene are underlined.

GATCATCCTTGAAATCATGTACATACAAACATAAACAGATTCAGCAGGTTGTAGTGTATCTATGCAAATATTTGTAACTA

TAACAATTAAAGAAGATGAGACCATCAGTCTGAGTGGGGGACATGGGGAGAGCTGGGTGAAGGGGCCTTCGAAAGAAATG

GAAGAAGGAAAAAGTCAGAAGAAGGGGAAGTAATTATACTTTAATTAAAATGTATTAAAACTTTTAAGTTAACTTTTTTA

AAAAGTATAGGACACTACGGTGTTAGCAGTTTAAGCACAGTTCTACCTCTGACATGAATCCCTATCTAAGCAGAGTGAGG

GAGGTGCATGCAAAAGCACGTGAGAGTAAAAGGGGCATGAGCAGCTAAAGCCAAAAGAAACTTCAGAGGAAGAAATGAGT

AACCTGGAGAGCGAATCCAGGAAGCCGCAATTGTTTCAAACCAAGCTCTAAGAACAACTAGGGCTTGAGAAGTGGATAAT

AAATCCAAGGAGTCTTCAAATGACCTTACCAAAGGCACAAAGACACGGGTCCCCCCAAACTGAAGTACAGGCTAGAGGGC

CGGCACACATAGGGCAGGCGCCTTTGATATATGCCTTTGACAGTATGCTTATCTAGCAAGATGCCACAATAGGAAAATTC

ATTGCCCAGGTAACAGTAACAATAGTTATGTCCCGAAAGAATTAGAAAATTCTGGAACAAAGAAAATCAAGACAGGCAGG

ACCTATAACATTCCACTGTCTGGGTAGCAGAGTCAACTGCAAACCAGAAGCGAGCCCAGCCGACTGATGGCTAAACAGCG

CCTCACAGAGGCCACCAAGGTAGCAGAAGAGTCCTAGAAACTACAACTCCCAGAGAGCTTCGGGCGAGGGAACTACAACT

CCCAGCAAGCCCRAGGAGCGACACTCTTTGTAGTTCCACAGATCGCCCTCACGTACTGTCTTCA----GACCT~GA ............................................ AAGATTTATAATTCAACAAATCGCCCTCACGTAATGTCTTCAGTGGGACCTATTAGGA

CP1/CP2-RC

CACCTGGCGCATGACCCCGCGTGGATGAGGAGCGCCCG~AGAGTCCCTG~TCGAACCTC~TAGAA ............................................................................ C A C C T G G C A C A T G A C C C C G C A T G G A T G G G G A G C G C C C G C C T A G A A

AP-2 SP1 SP1 CP1/CP2-RC

AGGCGACCCGAAAAGAAGAAACTGATCTTGACAATCTCCGAGTG~AGTTGATGGGACCGGGGAACAGCGAACTAGG ...................................... .................................... AGGCGACCCGAAAAGAAGAAACTGATCTTGACAATCTC---AAG~AGTTGATGGGACCAGGGAACAGCGRRCTAGG

CP1/CP2-RC

CTGCTGA~CAGCGTCCAAAAGAGAGACCAGGGAACCAAGGTTCTGCGGTCACACGCCGCGGAGTCTAGTGGGAGGA ........................................................................... CTGCTGA~CAGCGTCCAGAAGAGAGACCAGGGAACCAAGGTTCTGTGGTCATAAGCCGGGGAGTCTAGTGGGAGGA

NF1/CTF

GCAGTTGCTCAGCAGGCTCTCGCTGCTTCCTGTTTCCGGCTTCTGGAGCGGGGCGCATTGAGGCGCCTAGGGAAGCGCTG t l

............................................................................ GCAGTTGCTCTGCAGGCTCTCGCTGCTTCCTGTTTCAGGCTTCTGGAGCGGGGCTCATTGAGGCGCCTAGGGGAGCGCTG

GCTTCAGAGAGGRRTCGAGGGACTGGGGTCTGCGCCCCCGCACCCCTGATCTCCGGACGCGTCTTCCAACCTCAGAGACA .............................................................................. GCTTCAGAGAGGAATCGAGGGACTGGGGTCTGTGCCCCCGCACCCCTGATCTCCGGACGAGTCTTCCAACCTCAGAGACA

CATTCTTGGGGGCTTCTGAGCCGCGGTTGCTCTGTTTTCCCGTGACAC +160 ............................................... CATTCTTGGGGGCTTCTGAGCCGCGGTTGCTCTGTTTTCCTGTGACAC 503

in the 5' region of the pseudogene also exhibits several point mutations. For example, the CP1/CP2-RC binding site at position 52 and the SP1 binding site at position 110 of the pseudogene are affected by nucleotide exchanges, suggesting that the promoter activity of the pseudogene could be altered.
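Promoter elements such as a CAAT box and its reverse complement can be located by scanning both strands of the 5'-flanking sequence and reporting positions relative to the transcription initiation site (+1). The Python sketch below is a minimal illustration under stated assumptions: the core motifs CCAAT and GGGCGG are commonly used consensus strings chosen here for illustration, the toy sequence is hypothetical, and the function names are not from this work.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, motif, tss_index):
    """Return sorted (position relative to +1, strand) for matches on either strand.

    tss_index is the 0-based index of the +1 nucleotide. Upstream positions
    are negative and there is no position 0, following the usual convention.
    """
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            offset = start - tss_index
            hits.append((offset + 1 if offset >= 0 else offset, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical toy promoter; the last nucleotide (index 24) is taken as +1.
toy_promoter = "TTATTGGCAGGGCGGTACCAATGAC"
tss = 24

print(find_sites(toy_promoter, "CCAAT", tss))    # CAAT box and its reverse complement
print(find_sites(toy_promoter, "GGGCGG", tss))   # GC box (SP1-type core motif)
```

Running the same scan on the aligned gene and pseudogene sequences would show which sites are lost to nucleotide exchanges in the pseudogene.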

Promoter Activity of the CD-MPR Gene and Pseudogene-In order to determine the promoter activity of the CD-MPR gene, a 1.4-kb fragment (nucleotides 1-1411 in Fig. 2) containing the 5'-flanking region of the gene and the first exon was cloned into the CAT vector pBLCAT3 (pBLCAT3 CD-MPR). Similarly, we introduced a 1.4-kb fragment containing the 5'-flanking region and the 5'-untranslated region of the pseudogene into pBLCAT3 (pBLCAT3 CD-MPR PSEUDO). Mouse L-cells were transfected with these constructs or with the vectors pBLCAT2, containing the HSV-TK promoter (reporter promoter), or pBLCAT3 (control vector). The corresponding cell lysates were then assayed for their chloramphenicol acetyltransferase activity. Fig. 6 shows that the 5'-flanking region of the CD-MPR gene gave a higher level of CAT expression, and therefore has a stronger promoter activity, than the HSV-TK promoter (pBLCAT2) used as a positive control. The pBLCAT3 CD-MPR PSEUDO construct produced a level of enzyme activity equal to that of pBLCAT2. However, its promoter activity is far lower than that of the CD-MPR promoter. This lower promoter activity could be attributed to the point mutations in the CP1/CP2-RC and SP1 binding sites described above. This result suggests that the CD-MPR pseudogene can be transcribed.

The CD-MPR Transcripts-Northern blot analysis using the entire mouse cDNA as a probe revealed two CD-MPR mRNAs of approximately 2.4 and 1.5 kb in length (Fig. 7). The size difference of the two messages reflects the use of two different polyadenylation signals (Dahms et al., 1987). The mouse CD-MPR cDNA also contains two different polyadenylation signals (ATTAAA; Jung et al., 1980; Tosi et al., 1981) 217 and 1210 bp downstream of the termination codon (Fig. 4). In adult mice, the amount of both CD-MPR mRNAs is very high in tissues such as liver, spleen, and kidney and low but still detectable in heart and pancreas (Fig. 7C). In the mouse embryonic stem cell line MBL-1 (Pease et al., 1990), the CD-MPR is also expressed in high abundance (Fig. 7A). During the early stages of development of the mouse embryo (days 7-15), transcripts of the CD-MPR are easily detected (Fig. 7B); however, expression of the CD-MPR decreases during the later stages of development. We conclude from these results that the CD-MPR is differentially expressed in mouse tissues and during mouse development.
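The relationship between the positions of the two polyadenylation signals and the sizes of the two messages can be checked with a few lines of Python. The sketch below is illustrative: the toy 3'-untranslated sequence and the function name are hypothetical, and only the reported offsets (217 and 1210 bp) and message sizes (2.4 and 1.5 kb) come from the text.

```python
def signal_offsets(utr3, signal="ATTAAA"):
    """Return the distance (in nt) from the start of the 3' UTR to each signal."""
    offsets = []
    i = utr3.find(signal)
    while i != -1:
        offsets.append(i)
        i = utr3.find(signal, i + 1)
    return offsets


# Hypothetical toy 3' UTR with two ATTAAA signals.
toy_utr = "GC" * 5 + "ATTAAA" + "GT" * 10 + "ATTAAA" + "CC"
print(signal_offsets(toy_utr))

# Reported values from the text.
reported_offsets = (217, 1210)          # bp downstream of the termination codon
reported_sizes_kb = (1.5, 2.4)          # approximate mRNA sizes on the Northern blot

spacing_between_signals = reported_offsets[1] - reported_offsets[0]
size_difference_nt = round((reported_sizes_kb[1] - reported_sizes_kb[0]) * 1000)
print(spacing_between_signals, size_difference_nt)
```

The spacing between the two signals (about 1 kb) is close to the approximate 0.9-kb difference between the two messages, which is consistent with alternative use of the two signals given the limited resolution of size estimates from Northern blots.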

FIG. 6. Promoter activity of the 5'-flanking regions of the CD-MPR gene and pseudogene. pBLCAT3-MCDMPR and pBLCAT3-PSEUDO were constructed as described under “Experimental Procedures” and transfected into mouse L-cells. After 48 h, CAT activity of the cell extracts was determined. pBLCAT2 containing the HSV-TK promoter and pBLCAT3 (vector alone) were used as positive and negative controls, respectively. The reaction products of the CAT assays were separated by TLC and detected by autoradiography. The lower spot represents the chloramphenicol used as substrate, the upper spots the acetylated chloramphenicol. Lanes: pBLCAT2, pBLCAT3, pBLCAT3 CD-MPR, pBLCAT3 CD-MPR pseudo.

Chromosomal Localization-The mouse chromosomal location of the structural CD-MPR gene and pseudogene was determined by interspecific backcross analysis (see “Experimental Procedures”). The results indicate that they are both present at one copy in the mouse genome (Fig. 8). The CD-MPR locus is localized on mouse chromosome 6, linked to the ras-related fibrosarcoma oncogene (Raf-1), ret protooncogene (Ret), lymphocyte antigen-4 (Ly-4), and fibroblast growth factor-6 (Fgf-6) (Fig. 8A). The ratios of the total number of mice exhibiting recombinant chromosomes to the total number of mice analyzed for each pair of loci and the most likely gene order are: centromere - Raf-1 - 2/178 - Ret - 2/189 - CD-MPR - 3/186 - Ly-4 - 3/188 - Fgf-6. The recombination frequencies (expressed as genetic distance in centiMorgans ± S.E.) are Raf-1, 1.1 ± 0.8; Ret, 1.1 ± 0.7; CD-MPR, 1.6 ± 0.9; Ly-4, 1.6 ± 0.9; Fgf-6. The CD-MPR pseudogene mapped to chromosome 3, linked to the fibrinogen γ polypeptide (Fgg), connexin-40 (Cxn-40), and nerve growth factor β (Ngfb) loci (Fig. 8B). The ratios of the total number of mice exhibiting recombinant chromosomes to the total number of mice analyzed for each pair of loci and the most likely gene order are: centromere - Fgg - 13/191 - Cxn-40 - 0/158 - CD-MPR-pseudo - 8/153 - Ngfb. The recombination frequencies are Fgg, 6.8 ± 1.8; [Cxn-40, CD-MPR-pseudo], 5.2 ± 1.8; Ngfb. The fact that no recombination between Cxn-40 and CD-MPR-pseudo was observed in 158 mice typed suggests that the two loci are within 1.9 centiMorgans (upper 95% confidence limit).
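The genetic distances quoted above follow directly from the recombinant ratios. The Python sketch below assumes a simple binomial standard error and, for the interval with no recombinants, the upper limit obtained by solving (1 - p)^n = 0.05; these formulas are an assumption about how such values are typically derived, not a statement of the exact procedure used here, and the function names are hypothetical.

```python
import math


def recombination_distance(recombinants, total):
    """Return (distance, standard error) in centiMorgans for a backcross interval."""
    p = recombinants / total
    se = math.sqrt(p * (1 - p) / total)
    return 100 * p, 100 * se


def upper_limit_no_recombinants(total, confidence=0.95):
    """Upper confidence limit (cM) when 0 recombinants are seen in `total` animals."""
    return 100 * (1 - (1 - confidence) ** (1 / total))


# Recombinant ratios reported in the text (recombinants, animals typed).
intervals = {
    "Raf-1 / Ret": (2, 178),
    "Ret / CD-MPR": (2, 189),
    "CD-MPR / Ly-4": (3, 186),
    "Ly-4 / Fgf-6": (3, 188),
    "Fgg / Cxn-40": (13, 191),
    "CD-MPR-pseudo / Ngfb": (8, 153),
}

for name, (r, n) in intervals.items():
    distance, se = recombination_distance(r, n)
    print(f"{name}: {distance:.1f} +/- {se:.1f} cM")

print(f"Cxn-40 / CD-MPR-pseudo: < {upper_limit_no_recombinants(158):.1f} cM")
```

Under these assumptions the computed values round to the distances, standard errors, and 1.9-cM upper limit given in the text.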

DISCUSSION

In this study we have determined the organization and the chromosomal localization of the mouse cation-dependent mannose 6-phosphate receptor gene and pseudogene. The CD-MPR gene maps to mouse chromosome 6 in a region which would extend the region of synteny with human chromosome 12. This agrees well with the localization of the human CD-MPR gene, which has been assigned to human chromosome 12 (Pohlmann et al., 1987). Comparison of the mouse CD-MPR gene with the corresponding cDNA indicates that it consists of 7 exons. The 5'-flanking region of the gene resembles that of a housekeeping gene, and transcription initiates at a unique site. During the preparation of this manuscript, Klier et al. (1991) reported the organization of the human CD-MPR gene. Their results are similar to ours, but there are also interesting differences. Analysis of the complete 10-kb sequence of the mouse gene and characterization of the pseudogene revealed the presence of a possible additional 24-nt-long exon (exon 1*) located between exons 1 and 2 of the mouse CD-MPR gene and flanked by typical consensus sequences of exon/intron boundaries. Such a sequence was not identified in the human gene, whose introns were not sequenced. Similarly, Klier et al. (1991) reported the existence of two transcription initiation sites in the human gene, whereas we consistently observed a unique site. Furthermore, we could show that a high G/C content is not restricted to the 5'-flanking region of the initiation site but also extends into the first intron. These G/C-rich regions are important for transcription of housekeeping genes (Gardiner-Garden and Frommer, 1987).

FIG. 7. Detection of the mouse CD-MPR transcripts. RNA from different sources (20 µg of total RNA/lane) was probed for CD-MPR transcripts as described under “Experimental Procedures.” a, mouse embryonic stem cells (MBL-1); b, 7-, 10-, 12-, 14-, 15-, 16-, 17-, and 19-day-old mouse embryos; c, different tissues from adult mice: S, spleen; P, pancreas; Li, liver; T, thymus; H, heart; Lu, lungs; K, kidney; M, muscle; B, brain; SG, salivary gland. The positions of the 28 S and 18 S rRNA molecules are indicated. Hybridization with a mouse glyceraldehyde phosphate dehydrogenase (GAPDH)-specific probe indicated that identical amounts of RNA were loaded (not shown).

FIG. 8. Chromosomal localization of the CD-MPR gene and pseudogene. A, position of the CD-MPR locus on mouse chromosome 6. The CD-MPR gene (M6pr) was localized to mouse chromosome 6 by interspecific backcross analysis. The segregation patterns of the CD-MPR and flanking genes in 170 backcross animals that were typed in common for CD-MPR are shown at the top of the figure. Each column represents the chromosome identified in the backcross progeny that was inherited from the (C57BL/6J X M. spretus) F1 parent. The shaded boxes represent the presence of a C57BL/6J allele, and white boxes represent the presence of a M. spretus allele. The number of offspring inheriting each type of chromosome is listed at the bottom of each column. A partial chromosome 6 linkage map showing the location of CD-MPR in relation to linked genes is shown at the bottom of the figure. Recombination distances between loci in centiMorgans are shown to the left of the chromosome, and the positions of loci in human chromosomes are shown to the right. References for the map positions of most loci in human chromosomes can be obtained from OMIM (Online Mendelian Inheritance in Man), a computerized database of human linkage information maintained by The William H. Welch Medical Library of The Johns Hopkins University (Baltimore, MD). B, position of the CD-MPR-pseudogene locus on mouse chromosome 3. The CD-MPR pseudogene (M6pr-Ips) was localized as described above, except that the segregation patterns are shown for the 148 animals that were typed in common for CD-MPR-pseudogene and flanking genes.

We have also characterized a CD-MPR pseudogene which maps to mouse chromosome 3. Pseudogenes are believed to arise either by gene duplication or by retroposition, in which an mRNA species is reverse-transcribed into DNA copies and randomly inserted into the genome (Brosius, 1991). The CD-MPR pseudogene exhibits all the hallmarks of a retroposon (also referred to as a processed-type pseudogene). It contains no introns, i.e. part of its sequence is colinear with and highly homologous to the mouse liver cDNA. It also contains the remnants of a poly(A) tail at the 3' end. It is bounded by the remnants of flanking direct repeats, which may have been used when integration occurred randomly into the genome (Rogers, 1985). The 95% nucleic acid identity between the gene and the pseudogene suggests that this reintegration is a relatively recent event. The presence of such a pseudogene has not been reported in humans.
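A percentage identity such as the 95% figure above is computed from a pairwise alignment. The Python sketch below shows one common convention, counting matches over aligned columns that contain no gap; the convention, the toy alignment, and the function name are assumptions for illustration and do not describe how the value in the text was obtained.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over gap-free columns of two equal-length aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        matches += a == b
    return 100 * matches / compared if compared else 0.0


# Hypothetical toy alignment: one mismatch and one single-nucleotide gap.
cdna_row = "ACGTACGTAC"
pseudogene_row = "ACGTTCG-AC"
print(f"{percent_identity(cdna_row, pseudogene_row):.1f}% identity")
```

Other conventions, for example counting gaps as mismatches, give slightly lower values for the same alignment.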

The CD-MPR pseudogene has two intriguing features. First, it contains the 24-nt insert that is also found in the gene (exon 1*). Its presence in a retroposon strongly suggests that it is used as an additional exon in the gene. The second is that its 5' sequence is highly homologous to the promoter region of the gene. Chloramphenicol acetyltransferase assays indicate that the 5' region of the pseudogene indeed confers some promoter activity, but less than that of the promoter of the gene. We attribute this loss of activity to point mutations affecting two of the putative binding sites for transactivating factors. We do not yet know whether this pseudogene is transcribed and functional in vivo. The corresponding transcripts have not been detected. The open reading frame of the pseudogene would code for a soluble protein that corresponds to the signal sequence and the first 122 amino acids of the CD-MPR luminal binding domain. It would lack a significant part of the luminal domain, including 2 cysteine residues, as well as the transmembrane and cytoplasmic domains. It is unclear whether such a soluble CD-MPR would bind Man-6-P-containing ligands, since the cysteine residues of the CD-MPR luminal domain are required for the ligand binding conformation (Wendland et al., 1991).

Our study and others (Klier et al., 1991) indicate that the promoter region of the CD-MPR gene resembles that of other so-called housekeeping genes. Northern blot analysis indicates, however, that the CD-MPR gene is differentially expressed in tissues of adult mice. In addition, the level of CD-MPR mRNAs is high during the early stages of mouse development but reduced during later stages. Similar observations using quantitative western blotting were reported for expression of the CI-MPR in rat embryos (Sklar et al., 1989). The unusual features of the CD-MPR pseudogene may suggest that expression of the CD-MPR gene can be regulated. The presence of a 5' extension in the pseudogene, homologous to the promoter region of the CD-MPR gene, may reflect synthesis of an alternative transcript using promoter elements further upstream of the promoter elements that we have described. We have shown that, in mouse liver, transcription of the CD-MPR mRNA initiates at a unique site. We interpret our data as indicating that this putative alternative initiation site is infrequently used or at least is used in tissues other than liver. Several genes, such as the amylase gene, are regulated in a tissue-specific manner through the production of different transcripts by the use of different promoters (Young et al., 1981).


The presence of the 24-bp insertion in the region of the pseudogene corresponding to the 5'-noncoding region of the CD-MPR cDNA also suggests that the CD-MPR mRNA is alternatively spliced. As mentioned above, this sequence may represent an additional exon in the mouse CD-MPR gene but is absent from the CD-MPR cDNA that we have isolated from mouse liver and from other cDNAs isolated from bovine liver (Dahms et al., 1987), human placenta (Pohlmann et al., 1987), mouse fibroblasts (Ma et al., 1991), and mouse embryos (Koster et al., 1991). Using probes specific for this 24-nt insertion, we have not been able to detect a transcript containing this sequence in the few tissues we have examined. Nevertheless, it cannot be excluded that this transcript is transiently expressed or present in low abundance. Alternative splicing can occur in a tissue-specific (Helfman et al., 1990), development stage-specific (Breitbart et al., 1985), or sex-specific (Nagoshi and Baker, 1990) manner. The interesting feature of this putative alternatively spliced exon is that it contains a 12-nt open reading frame followed by a stop codon which is six nucleotides upstream of and in frame with the initiation codon of the CD-MPR open reading frame. Such "minicistron"-like sequences are found in many eukaryotic mRNAs and play a role in their translation (for review see Kozak, 1989). Translation of downstream open reading frames can still occur by a termination-reinitiation mechanism but to a lower extent (Geballe and Mocarski, 1988; Ozawa et al., 1988). Thus, alternative splicing of such a regulatory element as well as use of an additional promoter could provide means of regulating expression of the CD-MPR. This may be biologically significant if one considers that the two MPRs compete for ligand binding in the trans-Golgi network (Stein et al., 1987) and may have different functions in lysosomal enzyme transport (Chao et al., 1990; Watanabe et al., 1990).

Acknowledgments-We thank Ulrike Bauer (EMBL, Heidelberg, FRG), D. J. Gilbert, D. A. Swing, and M. E. Barnstead (Frederick, MD) for excellent technical assistance. We thank Thomas Schimmang (EMBL) for providing the filter with the embryonic RNA. We also thank Drs. Catherine Ovitt, John Tooze, and Marino Zerial for helpful discussions and comments on the manuscript.

REFERENCES

Breitbart, R. E., Nguyen, H. T., Medford, R. M., Destree, A. T., Mahdavi, V., and Nadal-Ginard, B. (1985) Cell 41, 67-82

Briggs, M. R., Kadonaga, J. T., Bell, S. P., and Tjian, R. (1986) Science 234, 47-52

Brosius, J. (1991) Science 251, 753

Chao, H. H.-J., Waheed, A., Pohlmann, R., Hille, A., and von Figura, K. (1990) EMBO J. 9, 3507-3513

Chirgwin, J. M., Przybyla, A. E., MacDonald, R. J., and Rutter, W. J. (1979) Biochemistry 18, 5294-5299

Church, G. M., and Gilbert, W. (1984) Proc. Natl. Acad. Sci. U. S. A. 81, 1991-1995

Copeland, N. G., and Jenkins, N. A. (1991) Trends Genet. 7, 113-118

Dahms, N. M., Lobel, P., Breitmeyer, J., Chirgwin, J. M., and Kornfeld, S. (1987) Cell 50, 181-192

Dynan, W. S. (1986) Trends Genet. 2, 196-197

Field, L. J., and Gross, K. W. (1985) Proc. Natl. Acad. Sci. U. S. A. 82, 6198-6200

Gabel, C. A., Goldberg, D. E., and Kornfeld, S. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 775-779

Gardiner-Garden, M., and Frommer, M. (1987) J. Mol. Biol. 196, 261-282

Geballe, A. P., and Mocarski, E. S. (1988) J. Virol. 62, 3334-3340

Gorman, C. M., Moffat, L. F., and Howard, B. H. (1982) Mol. Cell. Biol. 2, 1044-1051

Green, E. L. (1981) in Genetics and Probability in Animal Breeding Experiments, pp. 77-113, Macmillan, New York

Helfman, D. M., Roscigno, R. F., Mulligan, G. J., Finn, L. A., and Weber, K. S. (1990) Genes & Dev. 4, 98-110

Hogan, A., Heyner, S., Charron, M. J., Copeland, N. G., Gilbert, D. J., Jenkins, N. A., Thorens, B., and Schultz, G. A. (1991) Development (Camb.) 113, 363-372

Imagawa, M., Chiu, R., and Karin, M. (1987) Cell 51, 251-260

Jenkins, N. A., Copeland, N. G., Taylor, B. A., and Lee, B. K. (1982) J. Virol. 43, 26-36

Jones, K., Kadonaga, J. T., Rosenfeld, P., Kelly, T., and Tjian, R. (1987) Cell 48, 79-89

Jung, A., Sippel, A. E., Grez, M., and Schütz, G. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 5759-5763

Klier, H.-J., von Figura, K., and Pohlmann, R. (1991) Eur. J. Biochem. 197, 23-28

Kornfeld, S., and Mellman, I. (1989) Annu. Rev. Cell Biol. 5, 483-525

Koster, A., Nagel, G., von Figura, K., and Pohlmann, R. (1991) Biol. Chem. Hoppe-Seyler 372, 297-300

Kozak, M. (1989) J. Cell Biol. 108, 229-241

Kyle, J. W., Nolan, C. M., Oshima, A., and Sly, W. S. (1988) J. Biol. Chem. 263, 16230-16235

Lobel, P., Dahms, N. M., and Kornfeld, S. (1988) J. Biol. Chem. 263, 2563-2570

Lobel, P., Fujimoto, K., Ye, R. D., Griffiths, G., and Kornfeld, S. (1989) Cell 57, 787-796

Luckow, B., and Schütz, G. (1987) Nucleic Acids Res. 15, 5490

Ma, Z., Grubb, J. H., and Sly, W. S. (1991) J. Biol. Chem. 266, 10589-10595

MacDonald, R. G., Pfeffer, S. R., Coussens, L., Tepper, M. A., Brocklebank, C. M., Mole, J. E., Anderson, J. K., Chen, E., Czech, M. P., and Ullrich, A. (1988) Science 239, 1134-1137

Morgan, D. O., Edman, J. C., Standring, D. N., Fried, V. A., Smith, M. C., Roth, R. A., and Rutter, W. J. (1987) Nature 329, 301-307

Mount, S. M. (1982) Nucleic Acids Res. 10, 459-472

Nagoshi, R. N., and Baker, B. S. (1990) Genes & Dev. 4, 89-97

Oshima, A., Nolan, C. M., Kyle, J. W., Grubb, J. H., and Sly, W. S. (1988) J. Biol. Chem. 263, 2553-2562

Ozawa, K., Ayub, J., and Young, N. (1988) J. Biol. Chem. 263, 10922-10926

Pease, S., Braghetta, P., Gearing, D., Grail, D., and Williams, R. L. (1990) Dev. Biol. 141, 344-352

Pohlmann, R., Nagel, G., Schmidt, B., Stein, M., Lorkowski, G., Krentler, C., Cully, J., Meyer, H. E., Grzeschik, K.-H., Mersmann, G., Hasilik, A., and von Figura, K. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 5575-5579

Rogers, J. H. (1985) Int. Rev. Cytol. 93, 187-279

Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, NY

Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467

Sharp, P. A. (1981) Cell 23, 643-646

Sklar, M. M., Kiess, W., Thomas, C. L., and Nissley, S. P. (1989) J. Biol. Chem. 264, 16733-16738

Stein, M., Zijderhand-Bleekemolen, J. E., Geuze, H., Hasilik, A., and von Figura, K. (1987) EMBO J. 6, 2677-2681

Tong, P. Y., Tollefsen, S. E., and Kornfeld, S. (1988) J. Biol. Chem. 263, 2585-2588

Tosi, M., Young, R. A., Hagenbuchle, O., and Schibler, U. (1981) Nucleic Acids Res. 9, 2313-2323

Watanabe, H., Grubb, J. H., and Sly, W. S. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 8036-8040

Wendland, M., von Figura, K., and Pohlmann, R. (1991) J. Biol. Chem. 266, 7132-7136

Young, R. A., Hagenbuchle, O., and Schibler, U. (1981) Cell 23, 451-458