THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1992 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 267, No. 9, Issue of March 25, pp. 6188-6196, 1992 Printed in U. S. A.

Human α2(VI) Collagen Gene
HETEROGENEITY AT THE 5'-UNTRANSLATED REGION GENERATED BY AN ALTERNATE EXON*

(Received for publication, September 4, 1991)

Biagio Saitta‡, Rupert Timpl§, and Mon-Li Chu¶‖

From the Department of Biochemistry and Molecular Biology and the ¶Department of Dermatology, Jefferson Institute of Molecular Medicine, Thomas Jefferson University, Philadelphia, Pennsylvania 19107 and the §Max-Planck-Institut für Biochemie, D-8033 Martinsried, Federal Republic of Germany

Cosmid clones containing the 5' region of the human α2(VI) collagen gene have been isolated and characterized. DNA sequencing indicates that the signal peptide and the amino-globular domain are encoded by four exons of 142, 596, 21, and 66 base pairs (bp). However, S1 nuclease and primer extension analyses show that the transcription start site is not present in the 142-bp exon. Two different 5' cDNA clones are generated by the anchored polymerase chain reaction. Using the 5' cDNA clones as probes, two untranslated exons (1 and 1A) are found 12 kilobase pairs upstream of the first coding exon. These two exons are alternatively used in human fibroblasts, and most transcripts contain exon 1 sequence. Exon 1 shows, by primer extension and S1 nuclease protection assay, two major and several minor transcription start sites. The promoter region contains a canonical TATA box, seven GGGCGG sequences, two possible CAAT boxes, and two sequences resembling AP2 binding sites. Exon 1A contains three alternative splice donor sites and is located 650 bp downstream of exon 1. The most 3' splice donor site of exon 1A is found within an Alu repeat sequence. Exon 1A is preceded by five GGGCGG sequences and one sequence resembling the AP2 binding site, although neither TATA nor CAAT boxes are found. Two additional GGGCGG sequences are located at the beginning of exon 1A. This study establishes that the human α2(VI) collagen gene is 36 kilobase pairs long and contains 30 exons. The 5'-untranslated and promoter regions are significantly different from the corresponding segments of the chicken gene. The human gene produces by alternative processing multiple mRNAs differing in the 5'-untranslated region as well as the 3'-coding and noncoding sequences.

* This work was supported in part by National Institutes of Health Grants AR 38912, AR 38923, AR 19616, and AR 39740 and by Deutsche Forschungsgemeinschaft Grant SFB 266. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M81834, M81835, and M81836.

‡ On leave of absence from the Istituto di Biologia dello Sviluppo, Consiglio Nazionale delle Ricerche, Palermo, Italy.

‖ To whom correspondence should be sent: Dept. of Biochemistry and Molecular Biology, Thomas Jefferson University, 233 S. 10th St., Philadelphia, PA 19107.

Type VI collagen forms a major class of microfibrils found in most tissues of vertebrates (for a review see Ref. 1). In situ, these microfibrils localize close to cells, nerves, blood vessels, and large collagen fibrils and are considered to have an anchoring function (2, 3). Consistent with such a function are the biochemical findings that type VI collagen binds cells (4, 5) and that its fusion protein binds type I collagen (6). The binding activity also implies that in addition to its structural role, type VI collagen may be involved in cell migration and differentiation and embryonic development (7).

Type VI collagen is unusual among collagens in that the noncollagenous domains comprise 80% of its total mass. It consists of three chains, α1(VI), α2(VI), and α3(VI), with molecular masses of 140, 140, and 300-340 kDa, respectively (1, 8, 9). Full-length cDNAs encoding all three chains from the human (10-12) and the chicken (6, 13-16) have been cloned and sequenced. Analysis of their primary structure indicates that each chain contains a central collagenous domain of 335-336 amino acid residues, which is flanked by a large globular domain at both amino and carboxyl termini. The globular domains consist of repetitive motifs of approximately 200 residues which share significant identity with similar domains in von Willebrand factor, certain integrins and complement components, and the cartilage matrix protein. The α1(VI) and α2(VI) chains have one such repeat in the amino-globular domain (N1) and two in the carboxyl-globular domain (C1 and C2). The α3(VI) chain has a total of 11 repetitive motifs, with 9 found in the amino-globular domain. Collectively, the domain structures suggest that the three type VI collagen genes may have evolved by exon shuffling and duplication of two primordial genes, one coding for the collagenous domain and the other the 200-residue repetitive motif.

The human gene encoding the α2(VI) collagen is present as a single copy on human chromosome 21q22.3 in proximity to the α1(VI) collagen gene (17). Recently we have isolated and characterized the genomic region coding for the carboxyl-globular domain and the triple-helical domain of the human α2(VI) chain (18, 19), and other investigators have reported the complete structure of the chicken α2(VI) collagen gene (20-22). In this study we have completed the isolation and characterization of the entire human α2(VI) collagen gene. We demonstrate here that even though the exon structure of the human and chicken α2(VI) genes is highly conserved, the 5' end of the human gene is significantly different from that of the chicken gene. Unlike the chicken gene, the human gene possesses a canonical TATA box in the promoter region, and transcripts predominantly initiate 43 and 47 bp¹ downstream of the TATA box. Perhaps a more striking feature has been elucidated by the cloning of the 5' end of the human α2(VI) cDNA, which has revealed multiple mRNA species that differ in their 5'-untranslated regions. This heterogeneity has been generated by the use of an alternative 5' exon that contains three splice donor sites.

¹ The abbreviations used are: bp, base pair(s); kb, kilobase pair(s); PCR, polymerase chain reaction.

MATERIALS AND METHODS

Isolation of Genomic DNA Clones-A human leukocyte genomic library constructed in pHV4 cosmid vector (18) and a human cosmid library in the vector pWE15 were obtained from Stratagene (La Jolla, CA) and screened by standard procedures (23). The probe used was a 1.8-kb EcoRI-BamHI fragment isolated from the 5' end of the a2(VI) cDNA clone, designated F225, which contained the 21 bp of the 5'-untranslated region, the coding regions for the amino-globular domain, and most of the triple-helical domain (10). Positive clones were purified by three or four sequential rounds of screening, and the cosmid DNA was prepared by the alkaline lysis method (23).

Sequence Analysis-Genomic DNA fragments were subcloned into Bluescript vectors (Stratagene). Plasmid DNA was prepared by a small scale boiling procedure (23) and sequenced by the dideoxy chain termination method (24) using 35S-dATP and Sequenase (U. S. Biochemical Corp.). Nucleotide sequencing was first performed with primers derived from the vector and the cDNA sequences. Further sequences were obtained with primers synthesized on the basis of the newly determined DNA sequences.

RNA Isolation-Established human diploid fibroblast cell lines 3349 and 1520 were obtained from the Coriell Institute for Medical Research (Camden, NJ). Osteosarcoma cells (Saos-2) were obtained from the American Type Culture Collection. Cells were grown in Dulbecco's modified Eagle's medium containing 10% fetal calf serum. Total RNA and poly(A)+ RNA were prepared as described previously (18). Total RNA of osteosarcoma cells was a generous gift of David Stokes, Thomas Jefferson University.

S1 Nuclease and Primer Extension Analyses-Nuclease S1 analysis was performed as described previously (18, 25). Primer extension analysis generally followed the protocol described in Sambrook et al. (23). Briefly, oligonucleotide primers were 5' end labeled with [γ-32P]ATP (3,000 Ci/mmol) using T4 polynucleotide kinase. The primers were annealed to 1 µg of poly(A)+ RNA at 50 °C for 1 h in 20 µl of 10 mM Tris, pH 8.0, 1 mM EDTA, and 75 mM KCl containing 1 unit of RNasin. After ethanol precipitation, the primers were extended for 1 h at 37 °C with 200 units of murine Moloney virus reverse transcriptase in 20 µl of a mixture containing 50 mM Tris, pH 8.3; 75 mM KCl; 3 mM MgCl2; 10 mM dithiothreitol; and a 0.5 mM concentration each of dATP, dCTP, dGTP, and dTTP, in the presence of 1 unit of RNasin and 50 ng of actinomycin D. The reaction mixtures were ethanol precipitated and separated on a 6% polyacrylamide sequencing gel.

Anchored PCR Cloning of the 5' cDNA-Anchored polymerase chain reaction (PCR) was performed as described by Loh et al. (26). Briefly, single-stranded cDNA was prepared from 1 µg of fibroblast poly(A)+ RNA using 1 ng of primer f (see Fig. 5) and 200 units of murine Moloney virus reverse transcriptase under conditions suggested by the manufacturer (Bethesda Research Laboratories). A poly(G) tail was added to the cDNA with terminal deoxynucleotide transferase in 2 mM CoCl2, 1 mM dGTP for 1 h at 37 °C. After phenol-chloroform extraction and ethanol precipitation, the cDNA was used for PCR amplification using 4.0 units of Taq polymerase (Promega, Madison, WI) and 70 ng of each primer in 100 µl of the standard buffer (Perkin-Elmer Cetus Instruments). The primers included one specific for the 5' end of the previously isolated cDNA (e in Fig. 5) and a mixture of the 33-mer EPSC primer (CGGAATTCTGCAGTCAGCT(C)14) and 19-mer EPS primer (CGGAATTCTGCAGTCAGCT) at a ratio of 1:9. The PCR was carried out in a thermal cycler (Coy Laboratory, Ann Arbor, MI) for 25 cycles of 2 min each at 94, 52, and 72 °C. Five µl of the final reaction mixture was ligated with SmaI-digested pUC19 plasmid, transformed into Escherichia coli (DH5α, Bethesda Research Laboratories), and screened with primer e (see Fig. 5). DNAs from positive clones were isolated and sequenced as described above.
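The relationship between the two anchor primers can be checked with a short sketch. The primer sequences are those printed above; the helper name `site_positions` and the choice of restriction sites to look for are illustrative assumptions, and the recognition sequences are standard values rather than anything stated in the text.

```python
# Anchor primers from the anchored PCR step (sequences as printed in the text).
EPS = "CGGAATTCTGCAGTCAGCT"    # 19-mer EPS primer
EPSC = EPS + "C" * 14           # EPSC carries a (C)14 tail that can pair with the poly(G) tail

# Standard recognition sequences (an assumption added for illustration,
# not taken from the paper).
SITES = {"EcoRI": "GAATTC", "PstI": "CTGCAG"}


def site_positions(seq: str, sites: dict[str, str]) -> dict[str, int]:
    """Return the 0-based start of each recognition site in seq (-1 if absent)."""
    return {name: seq.find(motif) for name, motif in sites.items()}


if __name__ == "__main__":
    print(len(EPS), len(EPSC))         # 19 33
    print(site_positions(EPS, SITES))  # {'EcoRI': 2, 'PstI': 7}
```

The lengths agree with the 19-mer and 33-mer designations, and the two overlapping restriction sites near the 5' end of the primer are the kind of feature commonly built into anchor primers to allow directional cloning, although here the products were cloned into a SmaI-digested vector.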

RESULTS

Isolation and Mapping of the Genomic Region Coding for the Amino-globular Domain-Two cosmid clones, 7a and B10, were isolated by screening of approximately 4 × 10⁵ clones from each of the two cosmid libraries. Restriction enzyme mapping indicated that B10 (35 kb in pHV4 vector) and 7a (35 kb in pWE15 vector) overlapped by 20 kb (Fig. 1). The 4.2-kb HindIII fragment from the B10 cosmid clone that hybridized to the 5' end of the cDNA was subcloned into Bluescript. The nucleotide sequences corresponding to the entire coding region, parts of introns, and the noncoding flanking regions were determined.

As shown in Figs. 1 and 2, the amino-globular domain is encoded by four exons (exons 2-5), the first of which (exon 2) codes for the signal peptide and amino acid residues 1-18. This exon also includes 21 bp of noncoding DNA found at the beginning of the F225 cDNA clone. In these experiments, the

[Figure 1 image labels: Cos B10 (35 kb); Cos 7a (35 kb); scale bars of 4 kb and 0.5 kb; 2.4-kb and 4.2-kb fragments; exons (1), (1A), (2), (3), (4), and (5).]

FIG. 1. Genomic organization of the 5' end of the human α2(VI) collagen gene. The top panel shows two overlapping cosmid clones, B10 and 7a. The middle panel shows the restriction map of clone B10, locations of the sequenced Alu repetitive elements, and positions of the 2.4-kb X-H and 4.2-kb H-H fragments. The Alu sequences are either in the same or opposite orientation as the α2(VI) collagen gene, depicted as Alu (An) and Alu (Tn), respectively. Restriction sites shown are: E, EcoRI; H, HindIII; B, BglI; P, PstI; X, XhoI. In the bottom panel a more detailed map of the subcloned region with a schematic representation of the exon/intron organization is presented. Exons coding for the 5'-untranslated region and for the amino-globular domain are 1, 1A, and 2-5, respectively. Exon 5 is a fusion exon also encoding the beginning of the triple helical domain. Exons are numbered from the 5' end (in parentheses below the exon), and the sizes of exons (in bp) are indicated above each exon.

FIG. 2. Nucleotide sequence of the 4.2-kb HindIII fragment (Fig. 1, middle panel) coding for the amino-globular domain. Underlined sequences represent the HindIII restriction sites. Arrows denote the direction of the primers e and f at the beginning and end of exon 2, respectively (see also Fig. 5). An Alu repetitive sequence is present in the last intron between positions 3866 and 4158 (dashed lines).

[Figure 2 sequence listing: nucleotide sequence with the deduced amino acid sequence shown beneath exons 2-5; approximately 0.8 kb of the intron between exons 4 and 5 is omitted from the listing.]

TABLE I
Splice junction sites in the 5'-untranslated region and in the amino-globular domain of the human α2(VI) collagen gene

Exon no.   Splice acceptor sites     Exon size (bp)   Splice donor sites
1          Start sites               41-63            ...CCAGgtgagcgc
1A         Start sites               115, 117         ...GCAGgtgtgcggᵃ
1A         Start sites               305, 307         ...TCTGgtgattatᵃ
1A         Start sites               395, 397         ...TGAGgtcaaggaᵇ
2          ccacccctgttgcagGACT...    142              ...CCAGᶜgtgccagg
3          ggtcctgtgccccagAGAA...    596              ...CATGgtgagccg
4          ctgtcttttctgcagAAAC...    21               ...AGAGgtgagtgg
5          gcttcctcgtttcagTGCT...    66               ...GAAGgtaagatg

ᵃ Internal splice donor sites in exon 1A (Fig. 5).
ᵇ Splice donor site within an Alu repeat sequence (Fig. 5).
ᶜ Indicates the presence of a split codon.

[Figure 3 image: lanes M, 1, 2, 3, and 4; size markers of 354, 220, 79, and 75 bp; schematic showing the end of the cDNA, the ATG codon, and the 84-bp protected fragment.]

FIG. 3. S1 nuclease analysis of the 5' end of exon 2. The probe was a 0.5-kb PstI-BglI genomic fragment as depicted in the schematic diagram. The probe was 5' 32P-labeled and annealed with poly(A)+ RNA from human skin fibroblasts (3349). Lane 1, probe alone; lane 2, probe plus S1 nuclease; lanes 3 and 4, 1 and 3 µg of poly(A)+ RNA, respectively, with probe and S1 nuclease. The probe (509 bp) and the protected fragment (84 bp) are indicated. Lane M, DNA size markers; the sizes in bp are indicated on the left. The samples were run on a 6% polyacrylamide gel, and autoradiography was performed overnight at -70 °C.

5' end of exon 2 could not be determined by comparison of the cDNA and genomic sequences because this cDNA is not a full-length clone. Exon 3 (596 bp) and exon 4 (21 bp) code for amino acid residues 19-217 and 218-225, respectively. Exon 5 (66 bp) contains 30 bp, which code for the last 10 amino acid residues of the amino-globular domain, and 36 bp for the beginning of the triple helix. As shown in Table I, the sequences of intron/exon junctions conform well to the consensus sequences of eukaryotic splice junctions (27, 28).
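The agreement with the consensus can be illustrated by a minimal check of the GT...AG rule against the junctions listed in Table I. The sketch assumes the table's convention that uppercase letters are exon and lowercase letters are intron; only the acceptor entries for exons 2 and 3 are used, and the function names are hypothetical.

```python
# Junction strings from Table I: uppercase = exon, lowercase = intron.
ACCEPTORS = {
    "2": "ccacccctgttgcagGACT",
    "3": "ggtcctgtgccccagAGAA",
}
DONORS = {
    "1": "CCAGgtgagcgc",
    "1A (first)": "GCAGgtgtgcgg",
    "1A (second)": "TCTGgtgattat",
    "1A (third)": "TGAGgtcaagga",
    "2": "CCAGgtgccagg",
    "3": "CATGgtgagccg",
    "4": "AGAGgtgagtgg",
    "5": "GAAGgtaagatg",
}


def intron_part(junction: str) -> str:
    """Keep only the lowercase (intronic) letters of a junction string."""
    return "".join(base for base in junction if base.islower())


def donor_ok(junction: str) -> bool:
    """A consensus donor intron begins with gt."""
    return intron_part(junction).startswith("gt")


def acceptor_ok(junction: str) -> bool:
    """A consensus acceptor intron ends with ag."""
    return intron_part(junction).endswith("ag")


if __name__ == "__main__":
    for exon, seq in DONORS.items():
        print("donor", exon, donor_ok(seq))
    for exon, seq in ACCEPTORS.items():
        print("acceptor", exon, acceptor_ok(seq))
```

This tests only the invariant dinucleotides; the fuller consensus (the pyrimidine-rich tract upstream of the acceptor and the extended donor motif) would need a position weight matrix, which is not attempted here.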

[Figure 4 image: panel A, nucleotide sequences of cDNA clones AS7, AS5, AS5a, and AS5b with the positions of the EPSC/EPS primers; panel B, schematic of exons 1, 1A, and 2 for the AS7, AS5, AS5a, and AS5b cDNAs.]

FIG. 4. Nucleotide sequence and schematic representation of the heterogeneous 5' cDNA clones for the human α2(VI) collagen. A, two cDNA clones obtained by anchored PCR, AS5 and AS7, and the two longer species of AS5 cDNA isolated by PCR, AS5a and AS5b, are shown. Underlined sequences are previously known common cDNA; double underlined are new sequences common for AS7 and AS5 species. Arrows show the nucleotide sequences and the orientation of the primers. Oligonucleotides EPSC/EPS are described under "Materials and Methods." B, schematic representation of exons 1, 1A, and 2 for the four different kinds of cDNAs, which differ in their 5'-untranslated regions.

Identification of the 5' End of Exon 2-To define the 5' end of exon 2, the S1 nuclease protection assay was performed using a 0.5-kb PstI-BglI genomic fragment that contained exon 2 (Fig. 3). The fragment was 5' end labeled with 32P at the BglI site, which is located in exon 2 corresponding to 78 bp from the 5' end of the cDNA F225. This probe protected a band of 84 bp. Inspection of the genomic sequence revealed a splice acceptor site that coincided with the S1 nuclease cleavage site (Fig. 2). Primer extension experiments using an oligonucleotide in this exon yielded several bands, whose lengths mapped the transcription start site approximately 40-60 bp upstream of the S1 protected site (data not shown). The S1 and primer extension analyses therefore suggested the presence of exon(s) upstream of exon 2.
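The S1 mapping result can be restated as simple arithmetic. The sketch below takes the reported fragment lengths at face value and assumes that both distances are measured from the same labeled BglI end; the variable and function names are illustrative.

```python
# Values reported in the text.
LABEL_TO_CDNA_START = 78   # bp from the labeled BglI site to the 5' end of cDNA F225
PROTECTED_FRAGMENT = 84    # bp protected from S1 nuclease digestion
CDNA_5UTR = 21             # bp of 5'-untranslated sequence present in F225


def extra_exon_bases(protected: int, label_to_cdna_start: int) -> int:
    """Bases of exon 2 lying upstream of the 5' end of the cDNA clone."""
    return protected - label_to_cdna_start


if __name__ == "__main__":
    extra = extra_exon_bases(PROTECTED_FRAGMENT, LABEL_TO_CDNA_START)
    print(extra)              # 6
    print(CDNA_5UTR + extra)  # 27
```

On these assumptions the splice acceptor lies 6 bp upstream of the first base of F225, so exon 2 would carry 21 + 6 = 27 bp of noncoding sequence, which agrees with the 27-bp figure reported for exon 2 in the cDNA extension analysis.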

Cloning of 5' cDNA Extensions-To locate the first exon the anchored PCR method was used to extend the 5' end of the cDNA. The amplified cDNA fragments were cloned, and nine independent clones (insert size, 80-100 bp) were characterized by DNA sequencing. All clones contained the 3' PCR primer e (Fig. 4), and the sequence upstream of the primer matched the genomic sequence until the putative splice acceptor site identified by the S1 nuclease assay. Thereafter the sequences diverged from the genomic DNA. As shown in Figs. 4 and 5, most clones contained the same sequence upstream of the splice acceptor site although the length of these clones differed by 5-10 bp. The longest clone, AS7, possessed 45 bp of sequence upstream of the splice junction (Fig. 4A). However, a single clone (AS5) possessed a different upstream sequence. Analysis of the 5' cDNA clones established that exon 2 is 142 bp long, which includes 27 bp of 5'-noncoding sequence.
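With the size of exon 2 and its noncoding portion established, the reading-frame position at each exon boundary follows from the exon sizes in Table I. The sketch below is a bookkeeping aid under the assumption that translation starts immediately after the 27 noncoding bases of exon 2 and runs uninterrupted through exon 5; the function name is hypothetical.

```python
# Exon sizes in bp (Table I) and the noncoding portion of exon 2 (text).
EXON_SIZES = {2: 142, 3: 596, 4: 21, 5: 66}
EXON2_NONCODING = 27


def boundary_phases(exon_sizes: dict[int, int], first_exon_noncoding: int) -> dict[int, tuple[int, int]]:
    """Map each exon to (complete codons so far, leftover nucleotides) at its 3' end."""
    coding = -first_exon_noncoding
    phases = {}
    for exon in sorted(exon_sizes):
        coding += exon_sizes[exon]
        phases[exon] = divmod(coding, 3)
    return phases


if __name__ == "__main__":
    for exon, (codons, leftover) in boundary_phases(EXON_SIZES, EXON2_NONCODING).items():
        print(f"exon {exon}: {codons} complete codons, {leftover} nt left over")
```

The single leftover nucleotide at the end of exon 2 corresponds to the split codon marked at that donor site in Table I, whereas exons 3, 4, and 5 each end on a codon boundary.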

Identification of Exon 1 and Exon 1A-The AS7 cDNA- and AS5 cDNA-specific oligonucleotides were found to hy- bridize to a common 2.4-kb XhoI-Hind111 fragment approxi-

Page 5: Human d(V1) Collagen Gene

6192 The 5' End of the Human cr2(VI) Collagen Gene 10 20 30 40 50 60 I I I I I I T" 80

I -744 agggtggggagtggggaccccagacagagccctaccagggacccctgtcactctgtccccggctgggctcaggtggggacctcacg

-659 gtggtcccagggcccagcaccgaagcccacctgtggtttccagcgggaaaggggtggcaggggtggctggccgcatgcccaggctc

-574 tgccCcaacctccgcgcccaggctctgctgtccctgccctcccggctccccaccctcaggccccaggagcagcagtttctgcagga Pat1

-489 gctcctgacccggggcctctcgcgggaggcctgagcaagcgggacacaggacacggggtaggggaggggtg-tgatgggg

-404 ggaaccctgcaccccccaggcagctgctaccaaggggcgagtcccagggcccccgtcggccctgcgtgcggggcgcggtccccaa

FIG. 5. Nucleotide sequence of the promoter and 6'-untranslated re- gions of the human a2(VI) collagen gene. Two major transcriptional start sites of exon 1 are indicated by hollow arrows, and weak initiation sites are in- dicated by solid arrows. The most 5' start site is labeled +l. Exon sequences are depicted by uppercase letters. The TATA box at -25 is shown. Underlined se- quences represent potential Spl binding sites. PstI and XhoI are restriction sites. Oligonucleotides used in the primer ex- tension, PCR, and first-strand cDNA synthesis (a-f), similarities with CAAT sequences (double-dashed lines), and AP2 binding sites (dashed lines) are in- dicated. An Alu repetitive sequence is represented between the brackets. The internal splice donor site6 of exon 1A are indicated by solid dots. Note that the Alu element overlaps with exon 1A. Methio- nine encoded by the ATG codon within exon 2 is indicated above the codon.

-319 cacccagggccccggaggcgacacagccccagccaggtcgtccgggaaatggggcgggggcgacgggcggcggggcccgggacgcg

-234 aagtccgagcagcagcqggcaggggctqgcgggggagctcggcccgggctgcaggggggtccccaccctctccacctcctcctgcc

-149 tcccgccctcgagggtccccgcttccc~~~~~~~ccccctcccgtgcccccggccccctcctc=~~~~~ccgcggggccgcagcgcttc

-64 Ctggcggcggggcgggtcaggccggcgg~g~gg~cgccggccgcggttccCTCCCTGCTGCTTCTCGGCG~

22 CCGCGCCTCGGGCCGTCGGGAGCGGAGCCTCCTCGGGACCAGgtqagcgcctcccggaccccgcacctggaagccgctcggcccgc

"""""" - - PStI

XhoI - -25 +ll t t 1 fr t

*+ exon 1

a - 4 b 107 gggggtgaccccgagtcctgggaag9c99cggcggcggcggctccgtccctcgggcccccgggaagggggactccag~~~~~-~~~acgg

192 cggggggctcggcgggttcggggctcctcctcgcggggctqgggccgcqcct~cccctgtqgctccgcgtctctqggtccgaccct

217 cgggcgcgcgacttggggccacctccccgcqg~ctcctctggcgcggagcggcctggtcggggtggggggggtccctgtctgcgcc

362 cgagctcggtgctgggacccccgctcccgagacgaccccggcaccgcacgccccgccaggccccgcgtctgcgagcggttcgggtc

447 cggctccggccccgcggggaagacgccccggctqgctgggacctccgggggcgcagqgcctctccccgggccggacggaaggggcg - 617 cgcccctcccggcctggagcccaccaggqccccgccaggcccaggagaagcPPgtc~gacggaggcggctccccagggcggcgggacc

t t exon 1 A -

7 0 2 CgggCtgacagcgacccgCAGCCCTGCCGGG~ACAC~CTGGGAcTCCGcCGGGGCGCTGGTGG~ccGCTGGGcCTG

787 GGTCTCCACTGCTGGCAACCGMCGGATCGGATCGGCCCTCTGTGGAGCCGCAGGTGTGCGGGCGAGCGGCGCCCATCCGGGCTGTGCCAG

872 CAGAACCCCGGTGCCCGCGCCTAGGACGCCCCTGGAGAAGGGACCTTCCCTTTGGGGTCGGAACCCAG~GGAGGGGCCTGCGAT

957 CCGCGGAGCTCCTTGTTCTTGGGATAACACAGCTCTGGCTTGGAGGCCCCCTTGCACTTCGACTCTGGTGATTTATTC~G~GG

1042 C~AGACCGGGCACGGTGCTCACGCCTGTAATCCCAACACTTGGGGAGGCCGAGGCGGGCAGATCACCTGAGgtcaaggagtcgaga

1127 ccagcctagcacagggtgaaagccgtctctctactaaaatacaaaaaaaattagccgggcgtggtggcagcacctgtaatcccagc

1212 taatcgggaggctgaggcaggagaaatcacttgaacctgggagqcgqaggttgcagtgagctgagatcgcgccactgactccagcc

"

e C -

0

1297 tgggrgagggagcgagactgtctcaaaaaaaaaaaaaaaaaaaaaaaaaaggaaagga~ggcccggtgagatgctttctcttaaac

1382 acggccctgcacgttgagttgctgcctcctgtggcctatttcacgtttatgcaaagtcgggcgcctgatgcggggctcacccgcca

1467 caagcagggggtcctg . . . . . . 12.Kb . . . . . . tgccagqggagaggcactggggqtgtctgagcgacccccacccctgttgcagG

13552 ACTTCAGGGCCACAGGTGCTGCCRAGATGCTCCAGGGCACCTGCTCCGTGCTCCTGCTCTGGGG~TCCTGGGGGCCATCCAGGCC Met exon 2

". e 13637 CAGCAGCAGGAGGTCATCTCGCCGGACh~TACCGAGAGACTGCCCAGgtgccagggtcgggccggggctctgggcatt

mately 12 kb upstream of exon 2 (Fig. 1). The DNA sequence of 1.5 kb of this genomic region was determined (Fig. 5). The results confirm that this region contains both AS7 and AS5 sequences and that the two sequences are 650 bp apart (Fig. 5). In addition, consensus splice donor sites are found at the junctions at which the sequences of the cDNA and genomic DNA diverge. Collectively, these data indicate that exon 2 can be spliced to two different upstream regions, exon 1 and exon 1A, generating the AS7 and AS5 cDNAs, respectively.
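The convention used in Fig. 5 (exon sequence in uppercase, flanking and intron sequence in lowercase) suggests a simple way to locate candidate donor junctions in a clean sequence file. The following is a minimal Python sketch, not the procedure used in this study; the function name `find_donor_junctions` and the toy sequence are hypothetical, and only the invariant GT dinucleotide of the donor consensus is checked.

```python
def find_donor_junctions(seq):
    """Return (index, has_gt) for each uppercase-to-lowercase transition.

    index is the 0-based position of the first lowercase (intron) base;
    has_gt is True when the intron begins with the canonical 'gt'.
    """
    junctions = []
    for i in range(1, len(seq)):
        if seq[i - 1].isupper() and seq[i].islower():
            junctions.append((i, seq[i:i + 2] == "gt"))
    return junctions


# Hypothetical toy sequence (not taken from Fig. 5):
# lowercase flank, exon, intron starting with gt, exon, lowercase flank.
toy = "ccgcagGACCAGgtgagcgcctTTCAGacctg"
junctions = find_donor_junctions(toy)
```

In this toy input the first transition is followed by gt and the second is not, so only the first would be kept as a candidate donor site.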

Identification of Three Splice Donor Sites in Exon 1A-The existence of the AS5 and AS7 mRNAs was verified further by PCR amplification of total RNA from fibroblast 3349 (Fig. 6A). Primers from exons 1 and 2 yielded an 89-bp product as predicted from amplification of the AS7 mRNA. Primers from exons 1A and 2 produced a major band at 94 bp as expected for the AS5 mRNA as well as a 284-bp product. Cloning of the latter two PCR products, however, generated three different kinds of clones with insert sizes of 89, 284, and 374 bp, respectively. The longest product was not visible on the agarose gel shown in Fig. 6A but appeared in the reamplification of the PCR products (data not shown). The sequence of the shortest cDNA was identical to that of the AS5 cDNA

(Fig. 4A). The 284-bp product, designated AS5a, contained a 195-bp insertion whose sequence was colinear with the genomic sequence downstream of exon 1A whereas the longest clone, designated AS5b, had an additional 90 bp of genomic sequence identified further downstream (Figs. 4A and 5). Interestingly, an Alu repetitive sequence is found in this region of the gene, and the longest clone (AS5b) included part of the Alu sequence. Thus, the data from PCR cloning indicate that there are two other 5' splice donor sites in exon 1A downstream from the one identified by anchored PCR cloning, and these three donor sites can be alternatively used in normal skin fibroblasts.
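The relationship between the three clone sizes can be checked with simple arithmetic: each more distal donor site adds the intervening genomic sequence to the product. The sketch below is illustrative only; the variable names are hypothetical, and the inputs are the clone insert size and insertion lengths quoted in the text.

```python
from itertools import accumulate

# Values quoted in the text: the shortest clone insert, then the two
# successive stretches of genomic sequence retained downstream of exon 1A.
shortest_bp = 89
insertions_bp = [195, 90]

# Running totals give the expected insert size for each donor site used.
sizes = list(accumulate([shortest_bp] + insertions_bp))
clone_sizes = dict(zip(["AS5", "AS5a", "AS5b"], sizes))
```

The running totals reproduce the reported insert sizes of the three kinds of clones.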

Primer Extension and S1 Nuclease Analyses of Exon 1-Nuclease S1 analysis of fibroblast mRNA using a 198-bp probe from exon 1 yielded two major protected fragments of 41 and 45 bp (Fig. 7). Longer exposure of the film revealed six larger fragments of lower intensity (data not shown). Protected fragments of a similar size were obtained with total RNA from Saos-2 osteosarcoma cells except that the intensity of an approximately 63-bp doublet was much higher than that observed in fibroblasts. Primer extension with an oligonucleotide from exon 1 yielded two major extension products of 41


FIG. 6. PCR analysis of the 5'-untranslated region of the human a2(VI) collagen mRNA. PCR was performed with specific primers after first-strand cDNA synthesis using mRNA isolated from human skin fibroblast 1520 and oligonucleotide f (Fig. 5). The products were separated on a 4% agarose gel (A). Lane 1 shows PCR products of 94 and 284 bp specific for exons 1A and 2 by using primers c and e as depicted in the schematic diagram in B. Lane 2 shows a PCR product of 89 bp specific for exons 1 and 2 by using primers a and e (B). Lane M represents DNA size markers. B, schematic diagram showing the primers for the PCR and the 198- and 172-bp single-stranded DNA probes used in the S1 nuclease analyses in Figs. 7 and 8.

and 43 bp and several minor products ranging in size from 45 to 63 bp. The sizes of the primer extension products agreed with those of the S1 nuclease analyses although the relative intensity of the bands in these two experiments differed. Comparison of the S1 protected band sizes with the sequencing ladder of the S1 probe allowed the assignment of the


FIG. 7. Analysis of exon 1 by S1 nuclease protection and primer extension. The probe for S1 analysis was a 198-bp single-stranded DNA generated by DNA polymerase I (Klenow fragment) using oligonucleotide b and XhoI restriction site (Fig. 6B). Lane M represents a 1-kb DNA size ladder. Lane 1 contains probe alone. Lane 2 contains probe plus S1 nuclease. Lanes 3 and 4 contain probe, S1 nuclease, and 1 and 3 µg, respectively, of fibroblast 3349 poly(A)+ RNA. Lane 5 contains probe, S1 nuclease, and 10 µg of total RNA of Saos-2 cells. Lane 6 represents primer extension analysis using 1 µg of fibroblast 3349 mRNA and oligonucleotide b (Fig. 6B). DNA fragments were run on a 6% polyacrylamide sequencing gel, and autoradiography was performed overnight at -70 °C.


FIG. 8. S1 nuclease analysis of exon 1A. The probe was a 172-bp single-stranded DNA synthesized by DNA polymerase I (Klenow fragment) using oligonucleotide d and PstI restriction site (Fig. 6B). Lane M represents DNA size markers. Lane 1 contains probe alone. Lane 2 contains probe plus S1 nuclease. Lanes 3 and 4 contain 2 and 10 µg, respectively, of fibroblast 3349 poly(A)+ RNA with probe and S1 nuclease. Protected fragments were run on a 6% polyacrylamide sequencing gel, and autoradiography was performed overnight at -70 °C.

unlikely that the promoter upstream of exon 1 transcribes both AS5 and AS7 mRNAs. Further inspection of the DNA sequence upstream of exon 1A revealed five GGGCGG sequences, and two additional GGGCGG sequences were found at the beginning of exon 1A. Nevertheless, canonical TATA and CAAT sequences were not found, nor was a splice acceptor site present at the 5' end of exon 1A. These data strongly suggest that another promoter is present upstream of exon 1A.
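An inspection of this kind can be repeated on a clean copy of the sequence by scanning both strands for the GGGCGG hexamer and testing for a TATA motif. The sketch below is a minimal illustration under stated assumptions, not the analysis performed here: the helper names `revcomp` and `find_motif_both_strands`, the toy sequence, and the use of TATAAA as the TATA-box pattern are all hypothetical choices.

```python
def revcomp(seq):
    """Reverse complement of an unambiguous DNA string (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_motif_both_strands(seq, motif="GGGCGG"):
    """Return sorted (position, strand) hits, 0-based on the forward strand.

    A '-' hit means the motif lies on the complementary strand, i.e. its
    reverse complement is present in the forward sequence at that position.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif.upper()), ("-", revcomp(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical toy sequence (not taken from Fig. 5).
toy = "ttGGGCGGacgtCCGCCCaaTATAAAgg"
hits = find_motif_both_strands(toy)
has_tata = "TATAAA" in toy.upper()
```

Applying the same scan to the printed sequence in Fig. 5 would require a verified sequence file, because any character outside A, C, G, and T breaks exact matching.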

DISCUSSION

The studies described in this report, in conjunction with our previous structural analyses of the 3' portion of the gene (18, 19), provide a complete characterization of the human a2(VI) collagen gene. Together, we have isolated and characterized three overlapping cosmid clones, B10, 7a, and D1 (Fig. 9), spanning 80 kb of genomic DNA. Structural mapping indicates that the entire a2(VI) gene consists of 30 exons spanning 36 kb of DNA. Two of the exons, 1A and 28A, are alternatively utilized to produce multiple mRNAs that differ in the 5'- and the 3'-untranslated regions, as well as in a segment coding for the carboxyl-globular domain (18). Recently, physical mapping using pulsed field gel electrophoresis suggested that the a2(VI) collagen gene is located within 700 kb from the telomere of the long arm of chromosome 21 (31).

The amino terminus of the a2(VI) chain consists of a signal peptide, a 200-amino acid residue repetitive motif (N1), and a short cysteine-rich segment connected to the triple-helical domain. The exons coding for this region correspond well with the protein subdomains. Specifically, the signal peptide is encoded by exon 2; the entire N1 domain is encoded by a single exon of 596 bp (exon 3); and the short connecting segment is encoded by exon 4 and the first half of exon 5. Similar to the organization of several other collagen genes, there is a junction exon (exon 5) that encodes the transition of the noncollagenous domain and the collagenous domain. This arrangement is in sharp contrast to that of the 3' end of the gene where an intron separates the regions coding for the triple-helical domain and the carboxyl-globular domain. Comparison with the chicken a2(VI) gene indicates that the exons coding for the amino-globular domain are strictly conserved between the human and the chicken (18-22).

The exon structures of the 200-residue repeats of von Willebrand factor, cartilage matrix protein, integrin receptor P150,95, and complement factor B have been reported (32-35). These studies demonstrate that each repeat can be encoded by 1-5 exons. We have shown previously that two repeats (C1 and C2) in the carboxyl-globular domain of the a2(VI) chain are encoded by either one or two exons. More recently we showed that each of the eight consecutive repeats in the amino-globular domain of the a3(VI) chain is encoded by a single exon (25). Interestingly, the boundaries of these repeats are always delineated by introns although the number of exons coding for each repeat varies. These observations suggest that the primordial gene for these repeats had no introns and that introns were acquired after the primordial gene had been duplicated and shuffled to separate genomic locations.

Analysis of the extreme 5' region of the gene presented here reveals that there is little sequence homology between the promoters of the human and chicken a2(VI) collagen genes (22). The human gene contains a canonical TATA box and two possible CAAT boxes whereas the chicken gene lacks both elements. A TATA box is thought to specify the precise position of transcription initiation (36), and thus genes without a TATA box often start transcription at multiple sites. Our primer extension and S1 nuclease analyses identified two major start sites of the human gene at 43 and 47 bp downstream of the TATA element. However, we also found six additional weak start sites in a 30-bp segment surrounding the major start sites. Because the TATA box is flanked by two strong potential SP1 binding sites (Fig. 5), it is conceivable that steric hindrance may prevent simultaneous binding of transcription factors to the closely spaced TATA and SP1 binding elements. Therefore, initiation from additional weak sites may be a consequence of transcription in a TATA-independent mode. In this regard it is of interest to note that both TATA-dependent and -independent modes of transcription are utilized by the mouse metallothionein gene (37). On the other hand, the chicken a2(VI) collagen gene, which lacks a TATA box, initiates transcription at multiple sites spread over a larger region of 60 bp (22). As in the chicken gene, multiple CpG sequences are found in the promoter region of the human gene. The CpG sequences are present at the 5' end of all housekeeping genes as well as many tissue-specific genes (38). Furthermore, a large intron (12.5 kb) is found in the 5'-untranslated region of the human gene. The translation start sites of most other collagen genes are usually present in the first exon.
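The CpG richness of a promoter region can be expressed numerically as the G+C fraction together with the ratio of observed to expected CpG dinucleotides, where the expected count is (number of C × number of G) / sequence length. The following is a minimal sketch, not a calculation reported in this study; the function name `cpg_stats` and the toy sequence are hypothetical.

```python
def cpg_stats(seq):
    """Return (gc_fraction, cpg_observed_over_expected) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CG cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count is c * g / n, so observed/expected = cpg * n / (c * g).
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


# Hypothetical toy sequence (not taken from Fig. 5).
toy = "GGGCGGCGCGTATAAACGCG"
gc, ratio = cpg_stats(toy)
```

On real data the calculation would normally be run over a window of a few hundred base pairs rather than a short fragment such as this one.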

Most strikingly, the human a2(VI) gene produces at least four mature mRNAs that differ in the sequence of the 5'-untranslated region. The sequence divergence begins at 27 bp upstream of the ATG codon. The most abundant mRNA


[Fig. 9 diagram. Labels recovered from the figure: Cos B10 (35 kb); Cos D1 (33 kb); scale bar, 3 kb; domains 5' UT, TH, and C; exons 1, 1A, 2-28, and 28A.]

FIG. 9. Schematic representation of the human a2(VI) collagen gene. The B10, 7a, and D1 overlapping genomic cosmid clones are indicated (top). The exons that comprise the 5'-untranslated region (5' UT), amino-globular domain (N), triple-helical domain (TH), and carboxyl-globular domain (C) are depicted between the arrows (middle). Alternative splicing at the 5' end and at the 3' end of the gene is shown (bottom).

species AS7 is transcribed from exon 1 whereas three low abundance mRNAs, AS5, AS5a, and AS5b, utilize the alternate exon 1A that contains three splice donor sites. It is of interest to note that the AS5b mRNA contains a donor splice site within an Alu repetitive sequence. To our knowledge this is the first demonstration of such an element being included in an exon. S1 nuclease analysis indicates that exon 1A starts at position 720 (Figs. 5 and 8). The three minor RNA species could initiate transcription either from exon 1A or, as does the AS7 RNA, from exon 1, followed by alternative splicing into exon 1A. If the latter is the case, mRNAs that share both exon 1 and 1A sequences should exist. The fact that anchored PCR cloning and PCR amplification using primers for exon 1 and 1A fail to detect such mRNA species seems to favor the idea that exon 1A contains its own promoter. In addition, there are seven SP1 binding sequences clustered in a region surrounding exon 1A, suggesting that this region could contain a second promoter. Further functional assays are necessary to delineate the precise mechanism by which these heterogeneous 5' transcripts are generated.

Alternative processing of RNA is a common mechanism used by eukaryotic cells to generate multiple transcripts from a single gene (for a review, see Ref. 39). In most of the cases, the expression of these multiple transcripts is regulated in a tissue-specific or temporal manner. There are at least two examples of a collagen gene using alternative promoters in modulating its tissue-specific expression. The a1(IX) collagen gene initiates transcription in the cornea at a site 20 kb downstream from the one used in the cartilage (40). As a result the cornea mRNA encodes an a1(IX) chain that lacks a significant portion of the amino-terminal globular domain. Similarly, a chondrocyte-specific promoter is found in the second intron of the chick a1(I) collagen gene (41). Transcription from this a1(I) promoter generates an mRNA with a different 5' end. This mRNA encodes short polypeptides unrelated to the a1(I) collagen. Consequently, the synthesis of the a1(I) chain is turned off in chondrocytes. The a2(VI) collagen gene described in this report represents another example of a collagen gene that utilizes alternative processing to produce mRNAs with different 5' ends. In this case, alternative processing does not alter the protein coding region.

There are several examples of genes that produce multiple mRNA transcripts that differ only in the 5'-untranslated sequences. The process involves different mechanisms. For example, the genes for a-amylase (42), insulin-like growth factors I and II (43-45), actin 5C (46), and aldolase A (47) transcribe from multiple promoters whereas the gene for hydroxymethylglutaryl-CoA reductase uses a combination of multiple transcription initiation sites and multiple 5' splice donor sites of an intron (48). It should be noted that all of these genes contain an intron before the translation start codon; therefore alternative processing of the 5' exons does not affect the coding sequence. Previous studies have suggested that structural elements such as hairpin loops and open reading frames in the 5'-untranslated region affect the translation efficiency of the mRNA (49, 50). Indeed, it has been shown that tissue-specific utilization of the transcriptional start sites for the complement protein C2 produces multiple mRNAs with differential translation efficiencies (51). The gene for insulin-like growth factor II transcribes two mRNAs with different 5'-untranslated regions. One species is found exclusively on the membrane-bound polysomes whereas the other is present in cytoplasmic particles and not directly engaged in protein synthesis (52). The 5'-untranslated regions of all four a2(VI) collagen mRNA species lack an AUG codon upstream of the authentic translation start site, and therefore the protein products of these four mRNAs remain the same. However, because all four 5'-untranslated regions are GC rich, computer analyses predict that stable hairpin structures can be formed. This suggests that the translation efficiency or the stability of the four mRNAs may be different. We have shown previously that the 3' end of the a2(VI) gene undergoes alternative splicing. This process gives rise to three transcripts that differ in the carboxyl-terminal coding sequence and the 3'-untranslated region. It is now interesting to consider the newly identified heterogeneity in the 5' end in the context of what is known of the diversity in the 3' end. For example, it is formally possible that the choices of the 5' and the 3' exons are intimately linked. Even though significant changes in the expression of these multiple a2(VI) collagen mRNAs have not yet been detected in different tissues and cell lines, we cannot exclude the possibility that


the minor mRNA species are preferentially utilized in some restricted tissues or during development. Further work is required to understand the biological significance of the alternative processing. The isolation and characterization of the entire gene represent an important step toward elucidating the mechanisms involved in the regulation of the a2(VI) collagen gene.

Acknowledgments-We thank Loretta Renkart for her excellent technical assistance and Dr. George Dodge and Sulagna Chakraborty for their critical reading of this manuscript.

REFERENCES

1. Timpl, R., and Engel, J. (1987) in Structure and Function of Collagen Types (Mayne, R., and Burgeson, R. E., eds) pp. 105-143, Academic Press, Orlando, FL

2. Bruns, R., Press, W., Engvall, E., Timpl, R., and Gross, J. (1986) J. Cell Biol. 103, 394-404

3. Keene, D. R., Engvall, E., and Glanville, R. W. (1988) J. Cell Biol. 107, 1995-2006

4. Wayner, E. A., and Carter, W. G. (1987) J. Cell Biol. 105, 1873-1884

5. Aumailley, M., Mann, K., von der Mark, H., and Timpl, R. (1989) Exp. Cell Res. 181, 463-474

6. Bonaldo, P., Russo, V., Bucciotti, F., Doliana, R., and Colombatti, A. (1990) Biochemistry 29, 1245-1254

7. Otte, A. P., Roy, D., Siemerink, M., Koster, C. H., Hochstenbach, F., Timmermans, A., and Durston, A. (1990) J. Cell Biol. 111, 271-278

8. Trueb, B., and Winterhalter, K. H. (1986) EMBO J. 5, 2815-2819

9. Colombatti, A., Bonaldo, P., Ainger, K., Bressan, G. M., and Volpin, D. (1987) J. Biol. Chem. 262, 14454-14460

10. Chu, M.-L., Conway, D., Pan, T., Baldwin, C., Mann, K., Deutzmann, R., and Timpl, R. (1988) J. Biol. Chem. 263, 18601-18606

11. Chu, M.-L., Pan, T.-C., Conway, D., Kuo, H. J., Glanville, R. W., Timpl, R., Mann, K., and Deutzmann, R. (1989) EMBO J. 8, 1939-1946

12. Chu, M.-L., Zhang, R.-Z., Pan, T.-C., Stokes, D., Conway, D., Kuo, H.-J., Glanville, R., Mayer, U., Mann, K., Deutzmann, R., and Timpl, R. (1990) EMBO J. 9, 385-393

13. Trueb, B., Schaeren-Wiemers, N., Schreier, T., and Winterhalter, K. H. (1989) J. Biol. Chem. 264, 136-140

14. Koller, E., Winterhalter, K. H., and Trueb, B. (1989) EMBO J. 8, 1073-1077

15. Bonaldo, P., Russo, V., Bucciotti, F., Bressan, G. M., and Colombatti, A. (1989) J. Biol. Chem. 264, 5575-5580

16. Bonaldo, P., and Colombatti, A. (1989) J. Biol. Chem. 264, 20235-20239

17. Weil, D., Mattei, M.-G., Passage, E., Nguyen, V. C., Pribula-Conway, D., Mann, K., Deutzmann, R., Timpl, R., and Chu, M.-L. (1988) Am. J. Hum. Genet. 42, 435-445

18. Saitta, B., Stokes, D. G., Vissing, H., Timpl, R., and Chu, M.-L. (1990) J. Biol. Chem. 265, 6473-6480

19. Saitta, B., Wang, Y.-M., Renkart, L., Zhang, R.-Z., Pan, T.-C., Timpl, R., and Chu, M.-L. (1991) Genomics 11, 145-153

20. Hayman, A. R., Koppel, J., Winterhalter, K. H., and Trueb, B. (1990) J. Biol. Chem. 265, 9864-9868

21. Hayman, A. R., Koppel, J., and Trueb, B. (1991) Eur. J. Biochem. 197, 177-184

22. Koller, E., Hayman, A. R., and Trueb, B. (1991) Nucleic Acids Res. 19, 485-491

23. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, pp. 1.38-1.39, 7.79-7.83, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY

24. Sanger, F., Nicklen, S., and Coulson, A. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467

25. Stokes, D. G., Saitta, B., Timpl, R., and Chu, M.-L. (1991) J. Biol. Chem. 266, 8626-8633

26. Loh, E. Y., Elliott, J. F., Cwirla, S., Lanier, L. L., and Davis, M. M. (1989) Science 243, 217-220

27. Mount, S. M. (1982) Nucleic Acids Res. 10, 59-72

28. Shapiro, M. B., and Senapathy, P. (1987) Nucleic Acids Res. 15, 7155-7174

29. Dynan, W. S., and Tjian, R. (1983) Cell 35, 79-87

30. Mitchell, P. J., and Tjian, R. (1989) Science 245, 371-378

31. Burmeister, M., Kim, S., Price, R., de Lange, T., Tantravahi, V., Myers, R. M., and Cox, D. R. (1991) Genomics 9, 19-31

32. Mancuso, D. J., Tuley, E. A., Westfield, L. A., Worrall, N. K., Shelton-Inloes, B. B., Sorace, J. M., Alevy, Y. G., and Sadler, J. E. (1989) J. Biol. Chem. 264, 19514-19527

33. Kiss, I., Deák, F., Holloway, R. G., Jr., Delius, H., Mebust, K. A., Frimberger, E., Argraves, W. S., Tsonis, P. A., Winterbottom, N., and Goetinck, P. F. (1989) J. Biol. Chem. 264, 8126-8134

34. Corbi, A. L., Garcia-Aguilar, J., and Springer, T. A. (1990) J. Biol. Chem. 265, 2782-2788

35. Campbell, R. D., and Porter, R. R. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 4464-4468

36. Ghosh, P. K., Lebowitz, P., Frisque, R. J., and Gluzman, Y. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 100-104

37. Garrity, P. A., and Wold, B. J. (1990) Mol. Cell. Biol. 10, 5646-5654

38. Gardiner-Garden, M., and Frommer, M. (1987) J. Mol. Biol. 196, 261-282

39. Smith, C. W. J., Patton, J. G., and Nadal-Ginard, B. (1989) Annu. Rev. Genet. 23, 527-577

40. Nishimura, I., Muragaki, Y., and Olsen, B. R. (1989) J. Biol. Chem. 264, 20033-20041

41. Bennett, V. D., and Adams, S. L. (1990) J. Biol. Chem. 265, 2223-2230

42. Young, R. A., Hagenbuchle, O., and Schibler, U. (1981) Cell 23, 451-458

43. Roberts, C. T., Lasky, S. R., Lowe, W. L., and LeRoith, D. (1987) Biochem. Biophys. Res. Commun. 146, 1154-1159

44. Lowe, W. L., Roberts, C. T., Lasky, S. R., and LeRoith, D. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 8946-8950

45. Sussenbach, J. S. (1989) Prog. Growth Factor Res. 1, 33-48

46. Bond, B. J., and Davison, N. (1986) Mol. Cell. Biol. 6, 2080-2088

47. Izzo, P., Costano, P., Lupo, A., Rippa, E., Paolella, G., and Salvatore, F. (1988) Eur. J. Biochem. 174, 569-578

48. Reynold, G. A., Goldstein, J. L., and Brown, M. S. (1985) J. Biol. Chem. 260, 10369-10377

49. Kozak, M. (1986) Cell 44, 283-292

50. Kozak, M. (1989) Mol. Cell. Biol. 9, 5134-5142

51. Horiuchi, T., Macon, K. J., Kidd, V. J., and Volanakis, J. E. (1990) J. Biol. Chem. 265, 6521-6524

52. Nielsen, F. C., Gammeltoft, S., and Christiansen, J. (1990) J. Biol. Chem. 265, 13431-13434