
Am J Hum Genet 37:719-732, 1985

Analysis of Three Restriction Fragment Length Polymorphisms in the Human Type II Procollagen Gene

CHARIS E. L. ENG¹ AND CHARLES M. STROM

SUMMARY

Cloned genomic DNA sequences corresponding to various regions of the human type II procollagen gene were used to analyze the DNA from 78 normal volunteers. Southern hybridization experiments detected polymorphic HindIII, BamHI, and EcoRI sites. The presence of the polymorphic HindIII site results in a 7.0-kilobase (kb) band, and the absence of this site results in a 14.0-kb band. When present, the BamHI polymorphic site yields a 4.8-kb band, and when absent, it yields a 7.2-kb band. The presence of the EcoRI polymorphic site results in a 3.7-kb band, and its absence results in a 7.0-kb band. Each polymorphic site was mapped. Analyses of the data demonstrated that the sites are present in overall gene frequencies of .39 for HindIII, .04 for BamHI, and .02 for EcoRI. Gene frequencies of the polymorphic sites were also studied with respect to race. The polymorphic sites are present in a Hardy-Weinberg distribution in the study population. Study of an extended family demonstrated that the segregation of the HindIII polymorphic site is consistent with Mendelian inheritance.

INTRODUCTION

Restriction fragment length polymorphisms (RFLPs) provide a powerful tool for the investigation of genetic linkage [1-10]. The detection of RFLPs was first utilized to examine the inheritance of mitochondrial and yeast genes [9, 10]. Botstein et al. [9] described a basis for constructing a human genomic genetic linkage map using RFLPs and random DNA sequences as probes. In this way, DNA marker loci could be arranged into linkage groups. Pedigrees could then be analyzed to determine whether RFLPs segregate with certain disease traits. In the β-globin gene cluster, seven polymorphic sites have been found in a span of 60 kb of DNA [2]. Calculations reveal a ratio of one polymorphic site per 300 noncoding base pairs in this gene cluster. RFLPs have also been discovered in many other human genes.

Received November 13, 1984; revised February 7, 1985.

C. E. L. E. is the recipient of the American Heart Association-Borg-Warner Medical Student Research Fellowship. C. M. S. is a Hartford Fellow and was supported by March of Dimes grant 6-35160, the Sprague Foundation, the Schweppe Foundation, the Amoco Foundation, and grants AM-33910 and HD-04583 from the National Institutes of Health.

¹ Both authors: Department of Pediatrics, The Joseph P. Kennedy, Jr. Mental Retardation Research Center, Committee on Developmental Biology, and the Committee on Genetics, The University of Chicago, 5841 South Maryland Avenue, Chicago, IL 60637.

© 1985 by the American Society of Human Genetics. All rights reserved. 0002-9297/85/3704-0007$02.00

The collagens are a family of related proteins that contain three polypeptide chains arranged in a triple helix and are major structural components of vertebrate connective tissue [11]. Type II collagen is the predominant collagen of cartilage and is also found in the notochord, the nucleus pulposus, and various regions of the eye [11, 12]. It consists of a homopolymer of three α1(II) (Col2A1) chains [11]. Previous work by Strom and Upholt resulted in the isolation of genomic clones corresponding to the human α1(II) procollagen gene [13]. The subclone pgHCol(II)A, which contains coding sequences corresponding to the carboxy-terminal propeptide region of the human type II procollagen gene [13], was used in Southern blot hybridization experiments with DNA from various mouse-human hybrid cell lines to demonstrate that the human α1(II) procollagen gene is on chromosome 12 [14].

We prepared DNA from 78 normal controls. The DNA was analyzed using Southern filters of BamHI, EcoRI, and HindIII digestions hybridized with various genomic DNA probes corresponding to the human type II procollagen gene [13]. We discovered polymorphic HindIII, BamHI, and EcoRI sites that are present in overall gene frequencies of .39, .04, and .02, respectively. The gene frequencies differ slightly between the black and Caucasian populations. The HindIII, BamHI, and EcoRI polymorphic sites are present in frequencies of .35, 0, and .01, respectively, in the Caucasian subgroup, and .48, .10, and .05, respectively, in the black subgroup of our study population. Analyses revealed that the polymorphic sites are present in a Hardy-Weinberg distribution in the study population, and analysis of an extended family demonstrated that the segregation of the HindIII polymorphic site is consistent with Mendelian inheritance.

Three polymorphic sites segregating in a population could generate up to eight (2³) different haplotypes. Only three haplotypes were found in the Caucasian study population, while four were observed in the black population.
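The figure of eight follows because each of the three sites is either present (+) or absent (-), giving 2³ combinations. The short sketch below enumerates them in the BamHI-EcoRI-HindIII order used in table 1 and lists which were not seen in each subgroup. The observed sets are transcribed from table 1B (haplotypes I, II, and IV in Caucasians; I-IV in blacks); the variable and function names are illustrative only.

```python
from itertools import product

# Site order follows table 1B: BamHI, EcoRI, HindIII; "+" means the site is present.
ALL_HAPLOTYPES = ["".join(h) for h in product("+-", repeat=3)]

# Haplotype labels and observed sets transcribed from table 1B.
LABELS = {"++-": "I", "+++": "II", "-+-": "III", "+--": "IV"}
OBSERVED = {
    "Caucasian": {"++-", "+++", "+--"},      # I, II, IV
    "black": {"++-", "+++", "-+-", "+--"},   # I-IV
}


def unobserved(group):
    """Possible haplotypes not seen in the given subgroup, in enumeration order."""
    return [h for h in ALL_HAPLOTYPES if h not in OBSERVED[group]]


if __name__ == "__main__":
    print(len(ALL_HAPLOTYPES), "possible haplotypes")
    for group in OBSERVED:
        seen = sorted(LABELS[h] for h in OBSERVED[group])
        print(group, "observed:", seen, "not observed:", unobserved(group))
```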

MATERIALS AND METHODS

Sample Population

Peripheral blood samples were obtained from laboratory volunteers, medical student volunteers, and their family members.

DNA Probes

The genomic subclone pgHCol(II)A contains a 2.0-kb EcoRI/BamHI insert in pBR322. The construction of this subclone has been described [13]. DNA sequence analyses revealed that pgHCol(II)A contains coding sequences specifying amino acids 29-79 of the carboxy-terminal propeptide (the numbering scheme corresponds to that used for the chicken α1(I) procollagen gene, in which amino acids are numbered sequentially beginning from the first residue of the carboxy-terminal telopeptide, as described by Fietzek and Kuhn [15] and as discussed by Sandell et al. [16]). This subclone also contains sequences coding for the entire carboxy-terminal telopeptide and triple helical amino acids 1,000-1,014 (of a total of 1,014) [13], in addition to triple helical amino acids 892-999 (W. B. Upholt and C. M. Strom, unpublished data, 1984). The subclone pgHCol(II)B contains a 2.7-kb EcoRI insert in pBR322, and its construction has been described [13]. DNA sequence analyses demonstrated that pgHCol(II)B contains coding sequences specifying triple helical amino acids 694-711 [13] and amino acids 712-891 (W. B. Upholt, C. M. Strom, G. Y. Paik, and C. E. L. Eng, unpublished data, 1984).

The human genomic subclone pG44 (generously supplied by Dr. L. Kunkel) contains a 2.6-kb HindIII/EcoRI insert in pBR322 and has been localized to Xq24-Xqter [17].

Southern Hybridization

DNA was prepared from 10-20 ml of heparinized peripheral blood by digestion with proteinase K and RNase followed by phenol:chloroform extractions [18]. Prior to restriction enzyme digestion, 5-10 μg of DNA was spermine precipitated by the addition of 0.11 vol of 0.1 M spermine [19] to eliminate inhibition of restriction enzyme digestion. Reaction mixtures contained restriction endonuclease at a concentration of 2-4 U/μg DNA. Reaction conditions were those recommended by the supplier (BRL, Gaithersburg, Md.). As one method to assure completeness of restriction digestion, an aliquot of each reaction mixture was removed and used to digest lambda DNA at half the enzyme:DNA concentration present in the genomic DNA reaction mixture. Since digestion of lambda DNA does not guarantee complete restriction digestion, further controls using digestion with varying restriction enzyme:DNA concentrations and hybridizations using various probes were also performed (see RESULTS), yielding similar results.
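The quantities in this paragraph are simple dilution and dose arithmetic. As a minimal sketch (the helper names are hypothetical), adding 0.11 vol of 0.1 M spermine to 1 vol of DNA solution gives a final spermine concentration of about 9.9 mM, and 2-4 U/μg applied to 5-10 μg of DNA corresponds to 10-40 U of enzyme per reaction.

```python
def final_molarity(stock_molar, added_volumes, sample_volumes=1.0):
    """Molarity after mixing `added_volumes` of a stock into `sample_volumes` of sample."""
    return stock_molar * added_volumes / (sample_volumes + added_volumes)


def enzyme_units(dna_ug, units_per_ug):
    """Total restriction enzyme units for a DNA mass at a given enzyme:DNA ratio."""
    return dna_ug * units_per_ug


if __name__ == "__main__":
    spermine_mM = final_molarity(0.1, 0.11) * 1000  # 0.11 vol of 0.1 M spermine
    print(f"final spermine: {spermine_mM:.1f} mM")
    low, high = enzyme_units(5, 2), enzyme_units(10, 4)
    print(f"enzyme per reaction: {low}-{high} U")
```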

Sequential restriction enzyme digestions were carried out as follows. The DNA was digested overnight with the first restriction enzyme; this reaction mixture was then extracted with phenol:chloroform, and the DNA was precipitated with ethanol, taken up in the appropriate buffer, and digested with the second enzyme. After overnight incubation at 37°C, the genomic DNA, lambda digestion controls, and size standards were electrophoresed in 0.7%-1.0% agarose gels at 5 V/cm for 6-8 hr, stained with ethidium bromide, photographed, and examined. The genomic DNA was then transferred to Zetabind (AMF, Cuno, Meriden, Conn.) filters by the procedure of Southern [20]. Prehybridization, hybridization, and washing procedures were those recommended by the manufacturer of Zetabind. Hybridizations were carried out in a solution containing 5× SSC (1× SSC = 0.15 M NaCl, 0.015 M Na citrate), 1× Denhardt's solution (0.02% Ficoll 400, 0.02% polyvinylpyrrolidone-360, and 0.02% bovine serum albumin), 0.02 M NaPO4, pH 6.7, 100 μg/ml sonicated salmon sperm DNA, 10% dextran sulfate, and 50% formamide.
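The salt strengths used in the hybridization (5× SSC) and wash (2× and 0.1× SSC) steps all scale from the 1× SSC definition above. A minimal sketch of that scaling follows; the total-sodium figure assumes the citrate is trisodium citrate, the usual SSC formulation, which the text itself does not specify.

```python
# 1x SSC composition in mol/L, as defined in the Methods.
SSC_1X = {"NaCl": 0.15, "Na citrate": 0.015}


def ssc_composition(strength):
    """Molar NaCl and citrate for an n-x SSC solution (e.g. strength=5 for 5x SSC)."""
    return {name: round(molar * strength, 6) for name, molar in SSC_1X.items()}


def sodium_molar(strength, na_per_citrate=3):
    """Total Na+ in mol/L; na_per_citrate=3 ASSUMES trisodium citrate."""
    comp = ssc_composition(strength)
    return round(comp["NaCl"] + na_per_citrate * comp["Na citrate"], 6)


if __name__ == "__main__":
    for label, strength in [("hybridization", 5), ("first wash", 2), ("stringent wash", 0.1)]:
        print(label, ssc_composition(strength), "Na+ =", sodium_molar(strength))
```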

Inserts of pgHCol(II)A, pgHCol(II)B, and pG44 were prepared and radiolabeled by nick-translation in the presence of [32P]deoxynucleoside triphosphates (Amersham, Arlington Heights, Ill.) to a specific activity of approximately 1.5 × 10⁷ cpm/μg DNA as described [13]. All hybridizations were carried out in 50% formamide. Hybridizations with radioactive pgHCol(II)A insert or pgHCol(II)B insert were performed at 51°C, and simultaneous hybridizations using pG44 insert and pgHCol(II)A insert were performed at 45°C under the ionic conditions described [13]. After hybridization, each filter was washed at room temperature with 2× SSC, 0.1% SDS for 15 min and with 0.1× SSC, 0.1% SDS for 15 min prior to washing twice at 60°C in 0.1× SSC, 0.1% SDS for 45 min. To eliminate the possibility of pBR322 or lambda DNA contamination, critical Southern filters were hybridized with nick-translated pBR322 or lambda DNA, and no specific hybridization was observed.

Restriction Mapping

Restriction maps were generated by single and sequential restriction digestions of DNA purified from recombinant clones or by end-labeling of recombinant DNA followed by restriction digestions as described [13].

RESULTS

A map of BamHI, EcoRI, and HindIII sites extending approximately 18 kb upstream from the 3' end (with respect to mRNA polarity) of the human type II procollagen gene is shown in figure 1. Two more HindIII sites are present within 1 kb upstream of H1 but are not shown in the figure because they have yet to be precisely mapped.

The 4.0-kb EcoRI fragment 3' to pgHCol(II)A and flanked by E3 and E4 in figure 1 was initially incorrectly positioned between pgHCol(II)A and pgHCol(II)B [13]. The initial position was inferred from experiments using chicken α1(II) procollagen probes hybridized to Southern filters of EcoRI digestions of the lambda human recombinant clone LgHCol(II)a [13]. Subsequently, sequencing data for pgHCol(II)A and pgHCol(II)B revealed that the exon that codes for triple helical amino acids 892-909 (of a total of 1,014) ends approximately 140 base pairs (bp) downstream from the 5' end of pgHCol(II)A, and the exon that codes for triple helical amino acids 856-891 begins approximately 50 bp upstream from the 3' end of pgHCol(II)B (G. Y. Paik, C. E. L. Eng, C. M. Strom, and W. B. Upholt, unpublished data, 1984). Since these two exons code for sequential amino acids, it is likely that pgHCol(II)A and pgHCol(II)B are contiguous. Other mapping and sequencing data have confirmed that the 4.0-kb EcoRI fragment is positioned as shown in figure 1.

FIG. 1.-Restriction enzyme cleavage map of approximately 18 kb of the human type II procollagen gene (map not reproduced; scale 0-18 kb). a, pgHCol(II)A; b, pgHCol(II)B; c, LgHCol(II)a; d, LgHCol(II)c; e, composite map. Restriction enzyme cleavage sites are denoted as follows: BamHI, B1-3; EcoRI, E1-4; HindIII, H1-3. The symbol Ea represents artificially created EcoRI sites for cloning purposes [13]. The polymorphic sites are marked H2, B2, and E3. Two further HindIII sites located within 1 kb upstream of H1 have yet to be precisely mapped and are therefore not shown (see text).

The mapping of the HindIII sites present in the isolated recombinant lambda genomic clones and pBR322 genomic subclones is important because of the discovery of a polymorphic HindIII site. Sequence analyses predicted the presence of a HindIII site close to the BamHI site at the 3' end of pgHCol(II)A [13]. The presence of this site was confirmed by the following experiments. In the construction of pgHCol(II)A, pBR322 was subjected to sequential EcoRI/BamHI digestion, and the 375-bp fragment was separated from the remainder of the plasmid by Sepharose 4B chromatography as described [13]. The 375-bp fragment contains the only HindIII site in pBR322 [21]. Therefore, after ligation with the insert, there is no longer a HindIII site in the pBR322 sequence of pgHCol(II)A. Purified pgHCol(II)A plasmid DNA was digested with BamHI, EcoRI, or HindIII separately, and with EcoRI/HindIII or EcoRI/BamHI sequentially. The single digestions with each restriction endonuclease linearized the plasmid (fig. 2, lanes 3-5). Sequential EcoRI/HindIII and EcoRI/BamHI digestions resulted in identical patterns consisting of a 2.0-kb fragment (insert) and a 3.8-kb fragment (pBR322 minus the 375-bp fragment) (fig. 2, lanes 1 and 2). Since pgHCol(II)A no longer contains the pBR322 HindIII site, these data confirm the existence of a HindIII site close to the BamHI site of pgHCol(II)A, as predicted by sequence analyses [13]; this site is labeled H3 in figure 1.
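The fragment sizes in this experiment follow from treating pgHCol(II)A as a 5.8-kb circle (the 2.0-kb insert plus the 3.8-kb pBR322 remainder): one cut linearizes it, and two cuts give two fragments whose sizes sum to the circle. The sketch below computes fragment sizes for a circular molecule. The map coordinates are hypothetical: the EcoRI end of the insert is placed at 0 kb, the BamHI end at 2.0 kb, and the nearby HindIII site at an assumed 1.95 kb, since its exact offset from the BamHI site is not given here.

```python
def circular_fragments(circle_kb, cuts_kb):
    """Fragment sizes (kb, largest first) from cutting a circular molecule at given positions."""
    cuts = sorted({c % circle_kb for c in cuts_kb})
    if not cuts:
        return []  # uncut circle: no linear fragments
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    sizes.append(circle_kb - cuts[-1] + cuts[0])  # fragment spanning the origin
    return sorted((round(s, 3) for s in sizes), reverse=True)


PLASMID_KB = 5.8
ECORI, HINDIII, BAMHI = 0.0, 1.95, 2.0  # HYPOTHETICAL map coordinates (kb)

if __name__ == "__main__":
    digests = {
        "BamHI": [BAMHI],
        "EcoRI": [ECORI],
        "HindIII": [HINDIII],
        "EcoRI/HindIII": [ECORI, HINDIII],
        "EcoRI/BamHI": [ECORI, BAMHI],
    }
    for name, cuts in digests.items():
        print(name, circular_fragments(PLASMID_KB, cuts))
```

At agarose-gel resolution the hypothetical 3.85/1.95-kb EcoRI/HindIII pair is indistinguishable from the 3.8/2.0-kb EcoRI/BamHI pair, which is why a HindIII site lying close to the BamHI site gives identical patterns in lanes 1 and 2.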

Screening of a human genomic DNA library with pgHCol(II)B resulted in the isolation of the lambda genomic clone LgHCol(II)c (fig. 1d). The clone contains three HindIII sites, one of which (H1) was mapped by single and sequential restriction digestions. The two other HindIII sites present upstream of H1 in LgHCol(II)c have neither been thoroughly mapped nor subjected to sequence analysis. The HindIII site marked H1 in figure 1 is approximately 10.5 kb upstream from the 3' end of LgHCol(II)c and 14.0 kb from the HindIII site in pgHCol(II)A.

FIG. 2.-Ethidium-bromide-stained agarose gel of pgHCol(II)A plasmid digested with various restriction enzymes, both alone and sequentially (size markers at 5.8, 3.8, and 2.0 kb). Lane 1, HindIII/EcoRI; lane 2, BamHI/EcoRI; lane 3, BamHI; lane 4, EcoRI; lane 5, HindIII.

DNA samples prepared from 78 volunteers were digested to completion with HindIII, BamHI, or EcoRI and hybridized to the human type II genomic probe pgHCol(II)A. These results are summarized in table 1.

TABLE 1

SUMMARY OF RESULTS OF HYBRIDIZATIONS OF DNA SAMPLES DIGESTED WITH BamHI, EcoRI, OR HindIII WITH THE HUMAN PROCOLLAGEN GENOMIC PROBE pgHCol(II)A

A. Frequency of BamHI, EcoRI, or HindIII genotypes in the normal population

Genotype     Caucasian, No. (%)   Black, No. (%)   Total, No. (%)
BamHI        (N = 31)             (N = 24)         (N = 55)
  +/+        31 (100)             19 (79)          50 (92)
  +/-        0                    5 (21)           5 (8)
EcoRI        (N = 35)             (N = 21)         (N = 56)
  +/+        34 (97)              20 (95)          54 (97)
  +/-        1 (3)                0                1 (1.5)
  -/-        0                    1 (5)            1 (1.5)
HindIII      (N = 48)             (N = 23)         (N = 71)
  -/-        18 (37.5)            6 (26)           24 (35)
  +/-        26 (54)              12 (52)          38 (52)
  +/+        4 (8.5)              5 (22)           9 (13)

B. Frequency of observed BamHI-EcoRI-HindIII haplotypes

Haplotype  BamHI  EcoRI  HindIII  Caucasian, No. (%)  Black, No. (%)  Total, No. (%)
                                  (N = 60)            (N = 40)        (N = 100)
I          +      +      -        37 (62)             21 (52.5)       58 (57)
II         +      +      +        22 (37)             16 (40)         38 (37)
III        -      +      -        0                   1 (2.5)         1 (2)
IV         +      -      -        1 (1)               2 (5)           3 (3)

When Southern filters of DNA samples digested with HindIII were hybridized with pgHCol(II)A, three patterns were observed (fig. 3A): a single 7.0-kb band, a 7.0-kb band plus a 14.0-kb band, or a single 14.0-kb band. Data cited above demonstrated that the HindIII sites labeled H1 and H3 in figure 1 are present in the recombinant clones. The polymorphic site must be upstream of H3 because the patterns are detected using the probe pgHCol(II)A. Therefore, it is likely that the 7.0-kb fragment is the result of a polymorphism that creates a HindIII site, not present in the recombinant clones, at the location marked H2 in figure 1. To confirm that the presence or absence of this site produces the observed patterns, five DNA samples with varying HindIII patterns were digested with HindIII alone (fig. 3A) or sequentially with HindIII and EcoRI (fig. 3B), and Southern filters were prepared and hybridized to pgHCol(II)A. All five samples had the single expected 2.0-kb EcoRI/HindIII band irrespective of whether they had the 7.0/7.0, 7.0/14.0, or 14.0/14.0 genotype when digested with HindIII alone, thus confirming that H3 is present in all cases. These data are also consistent with the possibility that the observed HindIII patterns result from a deletion of 7.0 kb of DNA 5' to H3. To eliminate this possibility, EcoRI digestions of DNA with varying HindIII hybridization patterns were hybridized with pgHCol(II)B. All samples had the single expected 9.5-kb band (data not shown). A deletion of 7.0 kb in the region between the sites marked H1 and H3 would have given a band smaller than 9.5 kb. These experiments confirm that the 14.0-kb band is due to the absence of the site marked H2 in figure 1 (- allele) and the 7.0-kb band is due to the presence of this site (+ allele).

FIG. 3.-Autoradiograms of Southern filters of genomic DNA digested with HindIII or EcoRI/HindIII and hybridized to pgHCol(II)A (marked bands: 14.0 and 7.0 kb). A, HindIII; B, sequential EcoRI/HindIII digestions. Lanes 1, 2, 3, 4, and 5 in panel B correspond to lanes 1, 4, 5, 7, and 3, respectively, in panel A.

To eliminate the possibility that the observations were a result of contamination by cloned DNA sequences, critical filters were hybridized with pBR322 and lambda DNA as probes, and no bands were observed.

To ensure that the HindIII restriction digestions were complete, three types of experiments were carried out. First, lambda control reactions were examined to ensure that the lambda DNA was digested to completion in all incubations prior to Southern transfer (see MATERIALS AND METHODS). Second, DNA samples from individuals with a -/- or +/- genotype were digested using increasing restriction enzyme:DNA concentrations of 2 U/μg DNA, 5 U/μg DNA, and 10 U/μg DNA. In all cases, the HindIII bands observed were identical in all digestions (data not shown). Third, Southern filters of HindIII digestions of genomic DNA with varying HindIII patterns were washed and autoradiographed to ensure that no probe remained bound, and were subsequently hybridized to a mixture of radioactive pG44 and pgHCol(II)A (fig. 4). The X-linked subclone pG44 hybridized to the expected single 5.5-kb band irrespective of the pgHCol(II)A hybridization pattern. If digestion were incomplete, additional bands would have been observed for the X-linked probe.

FIG. 4.-Autoradiogram of Southern filter of HindIII digestions of genomic DNA hybridized to a mixture of pgHCol(II)A and pG44. The 14.0- and 7.0-kb bands represent the pgHCol(II)A hybridization, and the 5.5-kb band represents the pG44 hybridization.

Numerical analyses of the HindIII digestion patterns revealed that the + allele has an overall gene frequency of .39 (p) and the - allele has an overall gene frequency of .61 (q) in the study population. The +/+ genotype is present in a frequency of .13 (p² = .15), the +/- genotype in a frequency of .52 (2pq = .48), and the -/- genotype in a frequency of .35 (q² = .37). In the Caucasian subgroup, the gene frequency of the + allele is .35 (p), and that of the - allele, .65 (q). The +/+ genotype is present in a frequency of .085 (p² = .12), the +/- in a frequency of .54 (2pq = .46), and the -/- in a frequency of .38 (q² = .42). In the black subgroup, the gene frequency of the + allele is .48 (p), and that of the - allele, .52 (q). The +/+ genotype is present in a frequency of .22 (p² = .23), the +/- in a frequency of .52 (2pq = .50), and the -/- in a frequency of .26 (q² = .27). Chi-square analysis revealed no statistically significant deviation from the Hardy-Weinberg distribution in the overall study population or in the two ethnic subgroups (P > .7). In addition, the differences in the distribution of HindIII genotypes between the black and Caucasian subgroups were not statistically significant (P = .25).
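The expected genotype frequencies in this paragraph are p², 2pq, and q², with p estimated by gene counting from the genotype counts in table 1A. A minimal sketch of that calculation and of a standard 1-df goodness-of-fit chi-square follows. The paper does not describe its exact test procedure, so this sketch is not guaranteed to reproduce the reported P values: applied to the pooled HindIII counts it gives p ≈ .39 and a chi-square of about 1.0 (P ≈ .3), which likewise indicates no significant departure from Hardy-Weinberg proportions.

```python
import math


def hwe_chi_square(n_plus_plus, n_plus_minus, n_minus_minus):
    """Gene-counting allele frequency and a 1-df chi-square test of Hardy-Weinberg proportions.

    Returns (p, chi2, p_value), where p is the frequency of the + allele.
    """
    n = n_plus_plus + n_plus_minus + n_minus_minus
    p = (2 * n_plus_plus + n_plus_minus) / (2 * n)
    q = 1 - p
    observed = (n_plus_plus, n_plus_minus, n_minus_minus)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (a monomorphic sample has nothing to test).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # 3 classes - 1 - 1 estimated allele frequency = 1 df; for 1 df,
    # the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value


if __name__ == "__main__":
    # Pooled HindIII genotype counts from table 1A: +/+ = 9, +/- = 38, -/- = 24.
    p, chi2, pv = hwe_chi_square(9, 38, 24)
    print(f"p = {p:.2f}, chi-square = {chi2:.2f}, P = {pv:.2f}")
```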

Segregation of the HindIII RFLP was analyzed in a family from the study population in which members of three generations were available (fig. 5). The data demonstrated that the segregation of the + and - alleles is consistent with Mendelian inheritance.

FIG. 5.-Pedigree and autoradiogram showing segregation of the HindIII polymorphic site in an extended family (pedigree and gel not reproduced; marked bands 14.0 and 7.0 kb). A, (+) = presence of polymorphic site; (-) = absence of polymorphic site. B, HindIII digestions of DNA samples from members of the pedigree shown in panel A. Lane 1 is B149; lane 2, B155; lane 3, B156; lane 4, B157; lane 5, B158; lane 6, B159; lane 7, B160; lane 8, B161.

When DNA samples were digested to completion with BamHI and Southern filters were prepared and hybridized to pgHCol(II)A, two patterns were observed (fig. 6A): a single 4.8-kb band, or a 4.8-kb band plus a 7.2-kb band. The BamHI sites marked B1, B2, and B3 in figure 1 are present in the isolated recombinant clones (see [13] and above). To determine whether the observed 7.2-kb fragment is the result of a loss of site B1 or B3, DNA samples from individuals possessing both BamHI hybridization patterns were digested sequentially with BamHI and EcoRI, and Southern filters were prepared and hybridized to pgHCol(II)A (fig. 6B) or pgHCol(II)B (fig. 6C). Hybridization with pgHCol(II)A revealed a single 2.0-kb band, thus demonstrating that the site marked B3 is present in all cases. Hybridizations using pgHCol(II)B revealed that DNA obtained from individuals with the 4.8/4.8 genotype gave the single expected 2.3-kb band (fig. 6C, lane 1). In contrast, DNA from individuals with the 4.8/7.2 genotype revealed bands at 2.3 kb and 5.0 kb (fig. 6C, lanes 2-4), thus confirming that the 7.2-kb band is due to the absence of the site marked B2 (- allele) and the 4.8-kb band is due to the presence of this site (+ allele). Maxam-Gilbert sequencing has been performed in both directions from the BamHI site labeled B2 in figure 1. The downstream sequence indicates that this site is approximately 42 bp from the exon coding for triple helical amino acids 694-711 [13]. The upstream sequence contains at least 100 bp of noncoding sequence (C. M. Strom, unpublished results, 1983). Therefore, this site probably lies within an intron.

To ensure that the BamHI restriction digestions were complete, lambda control reactions were examined to ensure that the lambda DNA was digested to completion in all incubations prior to Southern transfer (see MATERIALS AND METHODS). In addition, DNA samples from individuals with +/+ and +/- genotypes were digested with BamHI, and Southern filters were prepared and hybridized to pG44. The X-linked probe hybridized to the single expected 6.6-kb band in all cases, and no additional bands were observed that would indicate incomplete restriction digestion (data not shown). DNA samples from individuals with both genotypes were digested with increasing restriction enzyme:DNA concentrations of 2 U/μg, 5 U/μg, and 10 U/μg DNA. In all cases, the BamHI bands observed were identical in all digestions (data not shown).
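The reasoning used to assign the HindIII and BamHI alleles can be stated compactly: a probe detects the restriction fragment bounded by the nearest sites on either side of it, so adding or removing one polymorphic site changes which pair of sites bounds that fragment. The sketch below encodes that rule on a linear map. The coordinates are hypothetical, reconstructed only from the reported band sizes (H1 to H3 = 14.0 kb with H2 7.0 kb upstream of H3; B1 to B3 = 7.2 kb with B2 4.8 kb upstream of B3, assuming B1 is the next BamHI site upstream of B2), and each probe position is an assumed point inside the relevant fragment.

```python
from bisect import bisect_right


def probe_band_kb(sites_kb, probe_kb):
    """Size of the linear restriction fragment that contains the probe position."""
    sites = sorted(sites_kb)
    i = bisect_right(sites, probe_kb)
    if i == 0 or i == len(sites):
        raise ValueError("probe must lie between two mapped sites")
    return round(sites[i] - sites[i - 1], 3)


def allele_bands(fixed_sites_kb, polymorphic_site_kb, probe_kb):
    """Band sizes for the + allele (site present) and the - allele (site absent)."""
    plus = probe_band_kb(list(fixed_sites_kb) + [polymorphic_site_kb], probe_kb)
    minus = probe_band_kb(fixed_sites_kb, probe_kb)
    return plus, minus


def genotype_bands(genotype, plus_band, minus_band):
    """Distinct bands expected on a Southern blot for a genotype such as '+/-'."""
    return sorted({plus_band if allele == "+" else minus_band for allele in genotype.split("/")})


if __name__ == "__main__":
    # HYPOTHETICAL HindIII map: H1 = 0, H2 = 7.0, H3 = 14.0 kb; probe just upstream of H3.
    hind = allele_bands([0.0, 14.0], 7.0, probe_kb=13.0)
    # HYPOTHETICAL BamHI map: B1 = 0, B2 = 2.4, B3 = 7.2 kb; probe just upstream of B3.
    bam = allele_bands([0.0, 7.2], 2.4, probe_kb=6.0)
    for name, (plus, minus) in [("HindIII", hind), ("BamHI", bam)]:
        for g in ("+/+", "+/-", "-/-"):
            print(name, g, genotype_bands(g, plus, minus))
```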

FIG. 6.-Autoradiograms of Southern filters of BamHI or BamHI/EcoRI digestions of genomic DNA hybridized to various type II procollagen probes (marked bands include 7.2, 4.8, and 2.0 kb). A, BamHI digestions, hybridized to pgHCol(II)A. B, Sequential BamHI/EcoRI digestions, hybridized to pgHCol(II)A. Lanes 1, 2, and 3 in panel B correspond to lanes 1, 3, and 4, respectively, in panel A. C, Sequential BamHI/EcoRI digestions, hybridized to pgHCol(II)B. Lanes 1, 2, 3, and 4 in panel C correspond to lanes 1, 3, 4, and 5, respectively, in panel A.

Numerical analyses of the BamHI digestion patterns (summarized in table 1) revealed that the - allele has a gene frequency of .04 (q) and the + allele a gene frequency of .96 (p) in the overall study population. The +/+ genotype is present in a frequency of .92 (p² = .92), the +/- genotype in a frequency of .08 (2pq = .08), and the -/- genotype was not observed (q² = .002). In the Caucasian subgroup, the frequencies of the + and - alleles are 1 (p) and 0 (q), respectively, and the +/+ genotype is present in a frequency of 1 (p² = 1). In the black subgroup, the frequencies of the + and - alleles are .90 (p) and .10 (q), respectively. The +/+ genotype is present in a frequency of .79 (p² = .81), and the +/- genotype in a frequency of .21 (2pq = .18). Chi-square analysis revealed no statistically significant deviation of the BamHI genotypic frequencies from a Hardy-Weinberg distribution in the overall population or in the two ethnic groups (P > .9). However, the distributions of BamHI genotypes in the black and Caucasian subgroups differ significantly (P < .025).

When DNA samples from the study population were digested to completion with EcoRI and hybridized with pgHCol(II)A, three patterns were observed (fig. 7A): a single 3.7-kb band, a 3.7-kb band plus a 7.0-kb band, or a single 7.0-kb band. The three EcoRI sites marked E1, E2, and E3 in figure 1 are present in the recombinant clones (see [13] and above). Absence of the site marked E2 in figure 1 would be expected to yield a 13.5-kb band and therefore could not be responsible for the observed 7.0-kb band. Thus, the 7.0-kb polymorphic fragment is probably due to the absence of the site marked E3 in figure 1. To test this possibility, DNA prepared from individuals with the 3.7/3.7 genotype or the 3.7/7.0 genotype was sequentially digested with BamHI and EcoRI, and Southern filters were prepared and hybridized with pgHCol(II)A. Both samples had only the single 2.0-kb fragment (fig. 7B). These data indicate that the 7.0-kb band is probably the result of the absence of site E3 (- allele) and the 3.7-kb band is probably the result of the presence of site E3 (+ allele). However, we cannot eliminate the possibility that an insertion of 3.3 kb of DNA has occurred downstream from the site marked B3 in figure 1.

FIG. 7.-Autoradiograms of Southern filters of EcoRI or BamHI/EcoRI digestions of genomic DNA hybridized to pgHCol(II)A. A, EcoRI. B, Sequential BamHI/EcoRI digestions. Lanes 1 and 2 correspond to lanes 1 and 2, respectively, in panel A.

To ensure that the EcoRI restriction digestions were complete, lambda control reactions were examined to ensure that the lambda DNA was digested to completion in all incubations prior to Southern transfer (see MATERIALS AND METHODS). DNA samples from individuals with the +/+, +/-, or -/- genotypes were digested with EcoRI, and Southern filters were prepared and hybridized to pG44. The X-linked probe hybridized to the single expected 3.4-kb band (data not shown).

Numerical analyses of the EcoRI digestion patterns (summarized in table 1) revealed that the - allele has a gene frequency of .02 (q) and the + allele a gene frequency of .98 (p) in the overall study population. The +/+ genotype is present in a frequency of .97 (p² = .96), the +/- genotype in a frequency of .015 (2pq = .04), and the -/- genotype in a frequency of .015 (q² = .0004). In the Caucasian subgroup, the + and - alleles are present in gene frequencies of .99 (p) and .01 (q), respectively. The +/+ genotype is present in a frequency of .97 (p² = .98), the +/- genotype in a frequency of .03 (2pq = .02), and the -/- genotype was not observed. In the black subgroup, the + and - alleles are present in gene frequencies of .95 (p) and .05 (q). The +/+ genotype is present in a frequency of .95 (p² = .90), the -/- genotype in a frequency of .05 (q² = .0025), and the +/- genotype was not observed (2pq = .095). Chi-square analysis revealed no significant deviation from the Hardy-Weinberg distribution in the overall study population or in the ethnic subgroups (P > .25). Furthermore, the distributions of EcoRI genotypes in the black and Caucasian subgroups were not significantly different (P > .25).
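The comparison of genotype distributions between subgroups is a test of independence on a contingency table of counts. A minimal sketch of a Pearson chi-square test on an r × c table follows, with a closed-form upper-tail probability for integer degrees of freedom. It is not necessarily the procedure the authors used: several expected counts here are below 5, where a continuity correction or an exact test is often preferred. Applied to the BamHI counts in table 1A (Caucasian: 31 +/+, 0 +/-; black: 19 +/+, 5 +/-), it gives a chi-square of about 7.1 with 1 df (P ≈ .008), in line with the reported P < .025.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = math.exp(-x / 2)
        total = term
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) exp(-x/2) * sum_{j=1}^{(df-1)/2} x^(j-1/2) / (2j-1)!!
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(x)  # j = 1 term: x^(1/2) / 1!!
    for j in range(1, (df - 1) // 2 + 1):
        total += math.sqrt(2 / math.pi) * math.exp(-x / 2) * term
        term *= x / (2 * j + 1)
    return total


def contingency_chi2(table):
    """Pearson chi-square test of independence; assumes no all-zero row or column."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (obs - expected) ** 2 / expected
    df = (len(row_sums) - 1) * (len(col_sums) - 1)
    return chi2, df, chi2_sf(chi2, df)


if __name__ == "__main__":
    # Rows: Caucasian, black; columns: BamHI +/+, +/- (table 1A).
    chi2, df, p = contingency_chi2([[31, 0], [19, 5]])
    print(f"chi-square = {chi2:.2f}, df = {df}, P = {p:.4f}")
```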

Table 1 also presents the frequencies of the various BamHI-EcoRI-HindIII haplotypes present in the study population. This analysis was possible only because almost all individuals (98%) whose BamHI, EcoRI, and HindIII genotypes were determined were heterozygous at no more than one restriction site. However, two black individuals (2%) had BamHI, EcoRI, and HindIII genotypes of +/-, -/-, and +/-, respectively. Because they were heterozygous at two restriction sites, their haplotypes could not be assigned, and these data were omitted from the calculation of haplotype frequencies. Haplotype I is present at a higher frequency in the Caucasian subgroup than in the black subgroup. Haplotypes I, II, and IV were observed in the Caucasian sample, while haplotypes I-IV were observed in blacks. However, chi-square analysis using a 2 × 4 contingency table revealed that the distribution of haplotypes in the Caucasian and black subgroups was not significantly different (P = .25).
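Haplotypes can be assigned only when an individual is heterozygous at no more than one site because each additional heterozygous site doubles the number of ways the alleles can be split between the two chromosomes. The sketch below enumerates every unordered haplotype pair compatible with an unphased three-site genotype; haplotype labels follow table 1B, and the function names are illustrative.

```python
from itertools import product

# Haplotype labels from table 1B (site order BamHI, EcoRI, HindIII).
LABELS = {"++-": "I", "+++": "II", "-+-": "III", "+--": "IV"}


def haplotype_pairs(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype.

    genotype: one string per site, e.g. ("+/+", "+/+", "+/-").
    """
    per_site = []
    for g in genotype:
        a, b = g.split("/")
        # A homozygous site has one phase; a heterozygous site can be split two ways.
        per_site.append([(a, b)] if a == b else [(a, b), (b, a)])
    pairs = set()
    for choice in product(*per_site):
        h1 = "".join(site[0] for site in choice)
        h2 = "".join(site[1] for site in choice)
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def assignable(genotype):
    """True when exactly one haplotype pair fits, i.e. the phase is unambiguous."""
    return len(haplotype_pairs(genotype)) == 1


if __name__ == "__main__":
    single_het = ("+/+", "+/+", "+/-")  # heterozygous only at HindIII
    double_het = ("+/-", "-/-", "+/-")  # the genotype that could not be assigned
    for g in (single_het, double_het):
        pairs = haplotype_pairs(g)
        print(g, pairs, [[LABELS.get(h, "?") for h in pair] for pair in pairs])
```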

DISCUSSION

In this report, a HindIII polymorphic site occurring with a frequency of .39, a BamHI polymorphic site occurring with a frequency of .04, and an EcoRI polymorphic site occurring with a frequency of .02 were identified and mapped. The allelic frequencies were also calculated for blacks and Caucasians. Appropriate experiments were performed to eliminate the possibility that incomplete restriction enzyme digestion or contamination by cloned DNA sequences was responsible for the observations. The genotypes are present in a Hardy-Weinberg distribution in the study population. A family study revealed that the HindIII segregation pattern is consistent with Mendelian inheritance. This confirms preliminary reports by Strom and Chien [22] regarding these polymorphisms. We have previously determined that the recombinant cosmid clone isolated by Weiss et al. [23] represents the human type II procollagen gene ([14] and C. M. Strom and W. B. Upholt, unpublished data, 1984). There are some discrepancies between the map generated by Weiss et al. and the map in figure 1 regarding the precise placement of certain restriction sites; these are probably due to differences in mapping strategies. Preliminary reports by Solomon et al. noted a HindIII polymorphism present at an allelic frequency of .45 using DNA probes from their cosmid clones [24]. The sizes of the normal and polymorphic bands were not reported. Preliminary reports by Francomano et al. mentioned a polymorphic HindIII site in the human type II procollagen gene that generates an 8.0-kb fragment when present and an 18.0-kb fragment when absent [25]. They also reported a polymorphic BamHI site that yields a 4.8-kb fragment when present and a 7.0-kb fragment when absent. The - allele was found in one black out of 84 DNA samples from their study population [25]. The present study has confirmed the absence of the - allele in Caucasians. In our study population, the frequency of the + HindIII allele is .35 for Caucasians and .48 for blacks. Francomano et al. [25] reported the frequencies to be .47 in Caucasians and .37 in blacks. The frequencies of the + HindIII allele in the black and Caucasian subgroups of our study population are significantly different from those reported by Francomano et al. (P < .005). This may be due to sampling from two different populations. The BamHI and HindIII polymorphic sites reported by Solomon et al. and Francomano et al. are almost certainly the same sites reported by Strom and Chien [22] and mapped in this study.
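The report does not state which test was used to establish the Hardy-Weinberg distribution. One common approach, sketched here in Python under that assumption, estimates the + allele frequency p from genotype counts as (2·n(+/+) + n(+/-)) / 2N. It then compares the observed genotype counts with the expected Np², 2Npq, and Nq², and refers the chi-square statistic to a distribution with 1 degree of freedom (three genotype classes, minus one, minus one estimated allele frequency). The function name `hardy_weinberg_chi2` is hypothetical, and any counts passed to it are illustrative; none are taken from Table 1.

```python
import math
from typing import Tuple


def hardy_weinberg_chi2(n_pp: int, n_pm: int, n_mm: int) -> Tuple[float, float]:
    """Chi-square test of +/+, +/-, -/- genotype counts against Hardy-Weinberg proportions.

    Returns (chi-square statistic, P value) with 1 degree of freedom:
    3 genotype classes - 1 - 1 estimated allele frequency.
    """
    n = n_pp + n_pm + n_mm
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_pp + n_pm) / (2 * n)  # frequency of the + allele
    q = 1.0 - p                      # frequency of the - allele
    if p == 0.0 or q == 0.0:
        raise ValueError("site is monomorphic; the test is undefined")
    observed = (n_pp, n_pm, n_mm)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom the chi-square upper tail is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, `hardy_weinberg_chi2(25, 50, 25)` returns a statistic of 0 and P = 1, because those counts match Hardy-Weinberg proportions exactly (p = .5).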

Preliminary results revealed that the distribution of the HindIII genotypes in a sample of 23 Caucasian achondroplasts was statistically significantly different from that of the normal Caucasian population (C. E. L. Eng, R. M. Pauli, and C. M. Strom, unpublished results, 1985). This observation suggests that the HindIII genotypes segregate differently in the population of achondroplasts and may point to a relationship between the type II collagen gene and achondroplasia. Segregation of the HindIII polymorphic site was also analyzed in nine families with achondroplasia; these family studies were uninformative.

The use of RFLPs to establish genetic linkage is valuable in the study of genetic disorders. The polymorphic sites and haplotypes demonstrated in this report may eventually be used for linkage analyses of genetic diseases involving defective cartilage formation.
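Several comparisons in this report can be framed as chi-square tests of independence on contingency tables: the 2 × 4 comparison of haplotype distributions between Caucasians and blacks (explicitly described as such), the comparison of + HindIII allele frequencies with those of Francomano et al., and the comparison of HindIII genotypes between achondroplasts and normal Caucasians. The following minimal, standard-library Python sketch assumes integer degrees of freedom and no empty rows or columns. The helper names `chi2_sf` and `chi2_independence` are hypothetical. The upper tail is computed with the standard recurrence for the chi-square survival function rather than with a statistics library.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        k, q = 2, math.exp(-x / 2)  # df = 2: survival function is exp(-x/2)
    else:
        k, q = 1, math.erfc(math.sqrt(x / 2))  # df = 1: erfc(sqrt(x/2))
    # Recurrence: Q(x; k+2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    # evaluated in log space to avoid overflow for large x or df.
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def chi2_independence(table: Sequence[Sequence[float]]) -> Tuple[float, int, float]:
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom, P value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if total == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("drop empty rows or columns before testing")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)
```

With hypothetical counts, a 2 × 4 haplotype table such as `chi2_independence([[40, 20, 10, 0], [30, 15, 8, 3]])` returns the statistic, 3 degrees of freedom, and the P value. When expected counts are small, as they are for rare haplotypes, the chi-square approximation becomes unreliable, and an exact test is often preferred.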

ACKNOWLEDGMENTS

We thank C. Chien, T. Christides, and C. Belles for expert technical assistance. We also thank the Puscheck family and the Strom family for donating peripheral blood samples. We are grateful to Dr. W. B. Upholt for critical review of this manuscript.

REFERENCES

1. KAN YW, DOZY AM: Polymorphism of DNA sequence adjacent to human β-globin structural gene: relationship to sickle mutation. Proc Natl Acad Sci USA 75:5631-5635, 1978

2. ORKIN SH, KAZAZIAN HH, ANTONARAKIS SE, ET AL.: Linkage of β-thalassemia mutations and β-globin gene polymorphisms with DNA polymorphisms in human β-globin gene cluster. Nature 296:627-631, 1982

3. CHANG JC, KAN YW: A sensitive new prenatal test for sickle-cell anemia. N Engl J Med 307:30-32, 1982

4. PROCHOWNIK EV, ANTONARAKIS S, BAUER KA, ROSENBERG RD, FEARON ER, ORKIN SH: Molecular heterogeneity of inherited antithrombin III deficiency. N Engl J Med 308:1549-1552, 1983

5. ERLICH HA, STETLER D, SHENG-DONG R, NESS D, GRUMENT C: Segregation and mapping analysis of polymorphic HLA class I restriction fragments: detection of a novel fragment. Science 222:72-74, 1983

6. WOO SLC, LIDSKY AS, GUTTLER F, THIRUMALACHARY C, ROBSON KJH: Prenatal diagnosis of classical phenylketonuria by gene mapping. J Am Med Assoc 251:1998-2002, 1984

7. PHILLIPS JA, HJELLE BL, SEEBURG PH, ZACHMANN M: Molecular basis for familial isolated growth hormone deficiency type I. Proc Natl Acad Sci USA 78:6372-6375, 1981

8. TSIPOURAS P, MYERS JC, RAMIREZ F, PROCKOP DJ: Restriction fragment length polymorphism associated with the proα2(I) gene of human type I procollagen. J Clin Invest 72:1262-1267, 1983


9. BOTSTEIN D, WHITE RL, SKOLNICK M, DAVIS RW: Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314-331, 1980

10. HUTCHISON C, NEWBOLD J, POTTER S, EDGELL MH: Maternal inheritance of mammalian mitochondrial DNA. Nature 251:536-538, 1974

11. BORNSTEIN P, SAGE H: The biochemistry of collagens. Ann Rev Biochem 49:957-1003, 1980

12. VON DER MARK K: The collagens. Curr Top Dev Biol 14:199-225, 1980

13. STROM CM, UPHOLT WB: Isolation and characterization of genomic clones corresponding to the human type II procollagen gene. Nucleic Acids Res 12:1025-1038, 1984

14. STROM CM, EDDY RL, SHOWS TB: Localization of the human type II collagen gene to chromosome 12. Somat Cell Mol Genet 10:651-655, 1984

15. FIETZEK PP, KUHN K: The primary structure of collagen. Int Rev Connect Tissue Res 7:1-60, 1976

16. SANDELL LJ, PRENTICE HL, KRAVIS D, UPHOLT WB: Structure and sequence of the chicken type II procollagen gene. J Biol Chem 259:7826-7834, 1984

17. KUNKEL L, TANTRAVAHI U, EISENHARD M, LATT SA: Regional localization on the human X of DNA segments from flow sorted chromosomes. Nucleic Acids Res 10:1557-1578, 1982

18. MANIATIS T, FRITSCH EF, SAMBROOK J: Molecular Cloning: A Laboratory Manual. Cold Spring Harbor, N.Y., Cold Spring Harbor Laboratory, 1982

19. HOOPES BC, MCCLURE WR: Studies on the selectivity of DNA precipitation by spermine. Nucleic Acids Res 9:5493-5504, 1981

20. SOUTHERN EM: Detection of specific sequences among DNA fragments separated by gel electrophoresis. J Mol Biol 98:503-517, 1975

21. SUTCLIFFE JG: pBR322 restriction map derived from the DNA sequence: accurate DNA size markers up to 4361 nucleotide pairs long. Nucleic Acids Res 5:2721-2728, 1978

22. STROM CM, CHIEN C: Restriction fragment length polymorphisms in the human type II collagen gene. Fed Proc 43:1852, 1984

23. WEISS EH, CHEAH KSE, GROSVELD FG, DAHL HHM, SOLOMON E, FLAVELL R: Isolation and characterization of a human collagen α1(I)-like gene from a cosmid library. Nucleic Acids Res 10:1981-1994, 1982

24. SOLOMON E, HIORNS LR, PARKER M, CHEAH KSE, WEISS E, FLAVELL RA: Assignment of a human alpha-1(I)-like gene to chromosome 12, by molecular hybridization. Cytogenet Cell Genet 37:588-589, 1984

25. FRANCOMANO AM, NUNEZ AM, YAMADA Y, PHILLIPS JA III: Cartilage collagen gene analysis in human chondrodysplasias. Am J Hum Genet 36:137S, 1984
