
Extended analysis of the region encompassing the PRM1→PRM2→TNP2 domain: Genomic organization, evolution and gene identification


THE JOURNAL OF EXPERIMENTAL ZOOLOGY 282:245–253 (1998)

© 1998 WILEY-LISS, INC.


JEFFREY A. KRAMER,1 MARK D. ADAMS,2 GAUTAM B. SINGH,3 NORMAN A. DOGGETT,4 AND STEPHEN A. KRAWETZ1*

1Department of Obstetrics & Gynecology, and the Center for Molecular Medicine and Genetics, Wayne State University School of Medicine, Detroit, Michigan 48201

2The Institute for Genomic Research, Rockville, Maryland 20850

3Bioinformatics Algorithms Research Division, National Center for Genome Resources, Santa Fe, New Mexico 87505

4Center for Human Genome Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545

ABSTRACT The human male haploid expressed protamine 1 (PRM1)→protamine 2 (PRM2)→transition protein 2 (TNP2) locus comprises a coordinately regulated multigenic domain. This region of 16p13.13 has been used as a model to address how the organization of genes and genic domains within the human genome may influence tissue specific gene expression. Toward this goal, we have completed an extensive computational and biological analysis of the region encompassing the PRM1→PRM2→TNP2 domain. These analyses have revealed the likely genesis of this domain. Interestingly, the SOCS-1 gene and an hnRNP C-class pseudogene lie just 3´ of this domain. Regions of nuclear matrix attachment also mark these newly identified genes. J. Exp. Zool. 282:245–253, 1998. © 1998 Wiley-Liss, Inc.

Grant sponsor: U.S. D.O.E.; Grant number: W-7405-ENG-36.

*Correspondence to: Dr. Stephen A. Krawetz, Department of Obstetrics & Gynecology, and the Center for Molecular Medicine and Genetics, Wayne State University School of Medicine, 275 E. Hancock, Detroit, Michigan 48201.

Received 3 April 1998; Accepted 3 April 1998

The protamines are small, basic proteins containing a characteristic cluster of arginine residues. They are responsible for compacting sperm DNA during the final steps of spermiogenesis and are expressed solely in the testis. While fish contain multiple copies of the PRM1 (protamine 1) gene scattered throughout their genome (Dixon et al., ’86), avian species, like chicken, contain two tandemly arrayed copies (Oliva and Dixon, ’89). In contrast, the mammalian PRM1 gene, PRM2 (protamine 2) gene and TNP2 (transition protein 2) gene exist as a single coordinately expressed sperm-specific cluster. Together this cluster comprises a single chromatin domain (Choudhary et al., ’95).

Sequence analysis of the human PRM1→PRM2→TNP2 locus revealed a fourth candidate coding segment between the PRM2 and TNP2 genes that was termed “gene 4” (Nelson and Krawetz, ’94). The presence of this sequence was recently confirmed in the human gene cluster, and a similar sequence has also been shown to be present within the rat, mouse and bull Prm1→Prm2→Tnp2 gene clusters (Schlüter et al., ’96). It was suggested that the rat gene 4 homologue, termed Prm3, represented an expressed, protamine-related gene (Schlüter and Engel, ’95).

Computational analysis shows that the members of the mammalian PRM1→PRM2→gene4→TNP2 cluster share sequence similarity at both the nucleotide and amino acid levels. Even though this gene family is evolving at an unusually rapid rate (Retief and Dixon, ’93), this cluster is conserved in several mammalian species. Together with previous data (Krawetz and Dixon, ’88 and references therein), this supports the view that the PRM1→PRM2→gene4→TNP2 cluster arose from a single PRM1-like progenitor by a series of gene duplication events. However, it is likely that the human gene4/PRM3 is an inactive duplicated copy of PRM1 that was created during the genesis of the present mammalian PRM1→PRM2→TNP2 domain.

In order to determine whether this region was duplicated further during the course of the evolution of the human genome, we have begun to isolate and sequence a series of cosmid clones contiguous with this region. To date this analysis has revealed two additional genes. Within this region of 16p13.13, the SOCS-1 gene that is marked by a CpG island abuts the PRM1→PRM2→TNP2 genic domain. In addition, a member of the hnRNP C-class of genes lies just 3´ to this region.

MATERIALS AND METHODS

Evolutionary analysis of the human protamine gene cluster

DNA sequence alignments were performed using the coding regions of the four members of the protamine domain of human (Accession no. HSU15422), mouse (Accession no. Z47352), rat (Accession no. Z46939) and bull (Accession no. Z46938), with the quail protamine sequence (Accession no. M30275) as an outgroup. The alignment was performed on a Sun SPARC workstation using Clustal W (Thompson et al., ’94). Bootstrapped phylogenetic trees were generated from the resulting alignments using Treetool version 2.0.1 (Maidak et al., ’94).
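Bootstrap support values of this kind are obtained by repeatedly resampling the columns of the alignment with replacement and rebuilding the tree from each replicate. How Treetool implements this is not described here, so the following Python sketch illustrates only the column-resampling step. The function name `bootstrap_alignment` and the toy sequences are hypothetical and are not the protamine data.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: alignment columns resampled with replacement.

    `alignment` maps sequence names to aligned strings of equal length.
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


# Toy aligned sequences (hypothetical).
toy = {"seqA": "ATGGCCAGGT", "seqB": "ATGGCTAGAT", "seqC": "ATGACCAGG-"}
replicate = bootstrap_alignment(toy, random.Random(0))
```

Each replicate would then be passed to a tree-building method; the support for a branch is the number of replicates in which that branch is recovered.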

Screening the human chromosome 16 specific library and genomic sequencing

The Los Alamos National Laboratory human chromosome 16 specific cosmid library (LA16NC02) was prepared from flow-sorted chromosomes 16 that were partially digested with Sau3AI and cloned into the BamHI site of sCOS1 (Longmire et al., ’93). Each clone in the library was designated by its corresponding microtiter plate coordinates. High density gridded nylon membranes were prepared at a 4 × 4 × 96 density using the LANL gridding robot. These membranes were hybridized using 32P-radiolabeled sequence specific subclones, restriction fragments and PCR products as probes. PCR primers and conditions are available at the Internet address http://compbio.med.wayne.edu/. Membranes were prehybridized in a hybridization solution of 5× SSPE containing 0.1% polyvinylpyrrolidone, 0.1% Ficoll 400, 50% formamide, 1% SDS, and 0.1 mg/ml yeast tRNA for 1 hr at 45°C, then hybridized overnight at 45°C in fresh hybridization solution containing 1.25% polyethylene glycol 8000 and ~10^6 cpm/ml of the appropriate radiolabeled probe. The membranes were then washed for 30 min at ambient temperature in a solution of 2× SSPE with 0.1% SDS, then at 45°C–50°C in 0.1× SSPE with 0.1% SDS for an additional 15–30 min. Subsequently the membranes were autoradiographed overnight at –70°C. Northern and Southern hybridization analyses were essentially as described (Choudhary et al., ’95). The ~16 kb EcoRI terminal fragment of clone 356d7 that overlaps the 3´ region of hP3.1 was sequenced in its entirety using the strategy and techniques essentially as described (Fleischmann et al., ’95).

Computational analysis

All computational analyses were performed on a Sun SPARC workstation. Potential exons and CpG islands were identified using GRAIL, version 1.2 (Uberbacher and Mural, ’91). Uneven positional base preference and HpaII sites were defined using Staden (x)nip (Staden, ’88). Repetitive elements were identified using CENSOR at http://charon.lpi.org/~server/. GRAIL exons were searched against GenBank using BLAST (Altschul et al., ’90) at http://www.ncbi.nlm.nih.gov/BLAST/. Potential coding segments were also searched against the TIGR human cDNA database (HCD: Adams et al., ’95). MAR analysis was performed as previously described (Singh et al., ’97) using MAR-FINDER ver. 0.5. A copy of the latest version is available at http://www.ncgr.org/MarFinder/.
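HpaII cleaves at the sequence CCGG, so the density of HpaII sites tracks CpG-rich DNA. As a rough illustration of this kind of scan, the Python sketch below locates CCGG sites and computes the commonly used observed/expected CpG ratio for a sequence. It is not a reimplementation of GRAIL or of Staden (x)nip; the function names and the example sequences are assumptions made for illustration.

```python
def hpaii_sites(seq):
    """Return 1-based start positions of HpaII recognition sites (CCGG)."""
    seq = seq.upper()
    sites = []
    start = seq.find("CCGG")
    while start != -1:
        sites.append(start + 1)
        start = seq.find("CCGG", start + 1)
    return sites


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


# Toy sequence (hypothetical), not taken from the 16p13.13 contig.
example_sites = hpaii_sites("AACCGGTTCCGG")
example_ratio = cpg_obs_exp("CGCG")
```

In practice both measures would be computed in a sliding window along the contig and plotted against position.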

Functional MAR analysis

Computationally defined MARs were experimentally assessed as described (Kramer and Krawetz, ’96, ’97). Briefly, cells frozen in 50 mM Hepes, pH 7.5, buffer, containing 5 mM MgOAc, 10 mM NaCl plus 25% glycerol were rapidly hand-thawed, brought up to 1 ml with phosphate buffered saline (PBS), then centrifuged at 4°C for 5 min at 7,500g. The cell pellet was washed twice with PBS, then suspended in nuclei buffer composed of 10 mM Tris-HCl, pH 7.7, buffer containing 100 mM NaCl, 0.3 M sucrose, 3 mM MgCl2 plus 0.5% Triton X-100, then incubated for 15 min on ice. The resulting nuclei were washed twice with PBS, then suspended in 1 ml of a halo solution composed of 10 mM Tris-HCl, pH 7.7, buffer containing 10 mM EDTA and 2 M NaCl. The sample was incubated on ice for 15 min, then centrifuged at 4°C for 10 min at 12,000g. The supernatant was removed and the resulting nuclear halo structures were suspended in 100 µl of a 1× restriction buffer, then digested with a restriction endonuclease for 4 hr at 37°C. Following digestion, an equal volume of 4 M NaCl was added, the sample was then incubated for 15 min at 37°C, then centrifuged at 4°C for 30 min at 12,000g. The supernatant containing the matrix independent fraction was then transferred to a new tube, whereas the pellet containing the matrix bound fraction was suspended in 200 µl of TE. DNA was purified from both fractions then subjected to PCR amplification (Kramer and Krawetz, ’97).

RESULTS

Genesis of the mammalian PRM1→PRM2→TNP2 domain

Pairwise alignments of the genes within this region of human chromosome 16p13.13 revealed that the individual members of the domain were similar to one another at both the nucleotide and amino acid levels (data not shown). Among the four members of the human locus, the PRM1 gene showed the greatest similarity at the nucleotide level to the PRM2 gene (53%), but also showed similarity to gene4 (47%), while the TNP2 gene was most closely related to the PRM2 gene (47%). The corresponding bootstrapped phylogenetic tree presented in Figure 1a shows the relatedness of each of the members of several mammalian PRM1→PRM2→gene4→TNP2 clusters and supports the model of the genesis of this locus presented in Figure 1b. Additional comparisons of the 5´ UTR of human gene4 with the other members of the locus revealed only limited conservation of the putative promoter/enhancer elements that were present in protamines 1 and 2 and transition protein 2 genes (Kramer and Krawetz, unpublished observation). Similar comparisons of gene4/Prm3 sequences from several species revealed numerous differences in their putative promoter/enhancer regions including a possible repeat expansion in the 5´ UTR of the human gene. Both analyses are commensurate with the apparent lack of expression of this gene in human testis as described below.
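Similarity percentages such as these are derived from pairwise alignments. How gaps were treated is not stated, so the Python sketch below makes the assumption that gapped columns are excluded and identity is the fraction of matching residues among the remaining columns. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # assumption: columns containing a gap are not scored
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment (hypothetical): five ungapped columns, four of them identical.
identity = percent_identity("ATGC-A", "ATGGTA")
```

Other conventions, such as counting gaps as mismatches, would give lower values for the same alignment.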

Lack of expression of gene4

Human multiple tissue Northerns, and testis Northerns from human, mouse and transgenic mice containing 9 and 12 copies of the human PRM1→PRM2→gene4→TNP2 locus were utilized to assess the expression of gene4 with subclone probe 7a5 (pos. 22,764–23,488). In addition, PRM1 expression was assessed using subclone probe 3f1 (pos. 14,637–15,082). At high stringencies of hybridization (45°C) and washing (45°C–65°C), the gene4 specific probe 7a5 failed to identify any transcript, despite exposure times of up to 3 weeks. In contrast, using less stringent conditions, i.e., 37°C hybridization and washing, a ~0.5 kb testis transcript could be visualized after a one week exposure. However, under these less stringent conditions, hybridization to both the 28S and 18S rRNAs was also observed. When the stringency of washing was increased so as to remove hybridization to the rRNAs, the signal corresponding to that of the 0.5 kb transcript was also obliterated. It is interesting to note that when the PRM1 specific subclone 3f1 (pos. 14,637–15,082) was used as a probe under conditions of both high and low stringency, a similar sized ~0.5 kb testis transcript was observed. Comparison of the predicted gene4 coding sequence to the TIGR Human cDNA Database (HCD; Adams et al., ’95) and the human expressed sequence testis database (Pawlak et al., ’95) failed to identify a homologous sequence. Together these data suggest that at least in humans, gene4 has been inactivated.

Identification of clones contiguous to hP3.1

Screening the human chromosome 16 cosmid library identified several cosmid clones contiguous with the PRM1→PRM2→TNP2 domain. As shown in Figure 2, probe 8e4 (pos. 38,765–39,464), directed towards the 3´ end of the cosmid hP3.1, identified clones 356d7, 327b2 and 395c1. Cosmid 356d7 was selected for further analysis and restriction mapping was performed. A large ~16 kb EcoRI fragment that overlaps hP3.1 (Accession no. HSU15422) was gel purified, subcloned, then sequenced (Accession no. AC002047), yielding a 53,060 bp sequence contig of this region. Primers to the 3´ end of this region amplified a 354 bp product (pos. 52,616–52,969) that was then used as a probe to identify clone 345f2. Subclone 3d11 (pos. 938–1,433) was used as a probe to identify clones 381f11 and 424c10 that overlap with the 5´ end of hP3.1. As shown in Figure 2, an ~1.7 kb BamHI fragment is present toward the 5´ end of clone 381f11. This has been used as a probe to identify the distal 5´ overlapping clones 440e5, 380g6 and 432h7. Clone 440e5 has been selected for subsequent analysis.

Computational analysis

Computational analysis of the extended sequence of this 53,060 bp region was undertaken to assess the coding capacity of this region. As shown in Figure 3a and detailed in Table 1, 94 different repetitive elements are scattered throughout this region. These included 51 Alu’s, 18 L1s, 16 MIRs (mammalian-wide interspersed repeat), 5 MLTs (mammalian LTR-transposon), 3 MERs (medium reiteration repeat) and a single HRES (human HTLV-1 related endogenous retroviral sequence) element. Of the 51 Alu’s present, 21 correspond to the most ancient J-class, 24 are from the S-class and 6 are from the evolutionarily recent (~5 Myr old) Y-class.
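The element counts quoted above can be checked for internal consistency with a few lines of Python. The counts are copied from the text; the variable names are chosen here for illustration only.

```python
# Counts of repetitive elements by family, as stated in the text.
repeat_counts = {"Alu": 51, "L1": 18, "MIR": 16, "MLT": 5, "MER": 3, "HRES": 1}

# Alu elements broken down by subclass, as stated in the text.
alu_subclasses = {"J": 21, "S": 24, "Y": 6}

total_repeats = sum(repeat_counts.values())
alu_total = sum(alu_subclasses.values())
alu_share = 100 * repeat_counts["Alu"] / total_repeats

print(total_repeats, alu_total, round(alu_share, 1))  # 94 51 54.3
```

The family counts sum to the 94 elements reported, the subclass counts sum to the 51 Alu elements, and Alu elements account for roughly 54% of the repeats in the contig.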

As shown in Figure 3, the distribution of HpaII sites compared favorably to the location of CpG islands that are usually indicative of segments containing housekeeping genes. GRAIL analysis identified multiple regions with “good” or “excellent” coding potential (Fig. 3b, light and dark shaded regions respectively). Uneven positional base preference essentially followed the GRAIL predictions. In the forward strand, the members of the PRM1→PRM2→gene4→TNP2 domain were identified while in the reverse strand an hnRNP C2-like sequence was identified. Additional candidate GRAIL exons did not possess any significant similarity to any sequence deposited in GenBank. In contrast, uneven positional base preference also revealed the presence of the SOCS-1 gene.

MAR analysis reveals sperm specific and somatic MARs

MAR-FINDER analysis (Singh et al., ’97) summarized in Figure 4a showed four peaks with high, i.e., >70%, matrix association potential. Two of these peaks (pos. 8,750 and 34,400) have been previously biologically confirmed as sperm-specific MARs (Kramer and Krawetz, ’96). As shown in Figure 4b, biological analysis of the region circumscribing position 51,500 that is just 3´ of the region containing the hnRNP C-like sequence, revealed a third sperm-specific MAR. Like the SMARs of the PRM1→PRM2→TNP2 domain, this region is not associated with the somatic nuclear matrix. In contrast, the large peak centered at position 38,200 shown in Figure 4b, that lies just 5´ of the SOCS-1 gene represents a somatic MAR, similar to those of the β-globin gene (Kramer and Krawetz, ’96, ’97).

Fig. 1. Genesis of the PRM1→PRM2→TNP2 domain. (a) Phylogenetic analysis shows the relatedness of the members of the mammalian PRM1→PRM2→TNP2 domain. The bootstrapped phylogenetic tree suggests that PRM2 and TNP2 arose from an identical progenitor, while PRM1 shares a direct ancestor with gene4. The values at each node represent the number of trials (out of 1000) that placed each branch point at that position. The quail protamine sequence was used as an outgroup. (b) Subsequent to the divergence of mammals, an ancient progenitor of the current PRM1 gene was duplicated, giving rise to the PRM2 gene. The PRM1→PRM2 locus that arose then underwent another duplication, giving rise to the current mammalian PRM1→PRM2→gene4→TNP2 organization. Gene4 appears to have been inactivated in at least some species.

DISCUSSION

It has been established that the members of the PRM1→PRM2→TNP2 domain are homologous. All three genes contain a characteristic cluster of arginine residues arising largely from the use of AGG and AGA codons. This most likely reflects their evolution from a basic pentapeptide core (Black and Dixon, ’67). It has been shown that the position of the members of the mammalian domain is conserved in other mammals, i.e., mouse, bull and rat (Schlüter et al., ’96). Furthermore, it has been noted that each of the three human genes contains two exons and that the intron sects the first nucleotide of a codon (Schlüter et al., ’92).
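The preference for AGG and AGA among the six arginine codons can be quantified directly from a coding sequence. The Python sketch below tallies arginine codons in an in-frame sequence; the function name `arginine_codon_usage` and the short example sequence are hypothetical and do not represent any of the protamine genes.

```python
from collections import Counter

# The six codons that encode arginine in the standard genetic code.
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def arginine_codon_usage(cds):
    """Count arginine codons in an in-frame coding sequence (DNA alphabet)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return Counter(codon for codon in codons if codon in ARG_CODONS)


# Toy coding sequence (hypothetical): ATG AGG AGA CGT AGG TAA
usage = arginine_codon_usage("ATGAGGAGACGTAGGTAA")
agr_share = (usage["AGG"] + usage["AGA"]) / sum(usage.values())
```

For this toy sequence three of the four arginine codons are AGG or AGA; applied to the real coding regions, the same tally would show how strongly each gene favors those two codons.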

Analysis of the segment of human chromosome 16p13.13 encompassing the PRM1→PRM2→TNP2 domain had revealed a potential fourth coding region (Nelson and Krawetz, ’94). While the gene4 region does not appear to contain a second exon, it is important to note that the second exon in all members of this cluster contains a coding segment that is both very short (14–41 nucleotides) and divergent. As we have shown above, gene4 appears to have been inactivated. This would have removed selective pressure to maintain this sequence and thus would have enhanced its divergence. Nevertheless, as demonstrated by the phylogenetic analysis of Figure 1, gene4/Prm3 is clearly related to the other members of this dynamically evolving locus. This has led us to propose the model of the genesis of the locus shown in Figure 1b. Specifically, and in accord with other analyses (Krawetz et al., ’87), these data suggest that PRM1 was duplicated giving rise to PRM2. The PRM1→PRM2 locus was then duplicated again in mammals, giving rise to gene4 and TNP2. As defined both biologically and biophysically, the PRM1→PRM2→TNP2 locus forms a single DNase I-sensitive genic domain (Choudhary et al., ’95; Kramer and Krawetz, ’96). This domain can essentially be considered an independent evolutionary unit as it shares no apparent similarity to its neighboring gene containing region.

The insertion of numerous mammalian and primate specific repetitive elements into the PRM1→PRM2→TNP2 domain during the recent evolutionary past, including six of the new Y-class of Alu elements, suggests that this region remains recombinogenic. In light of the frequency of these events, we sought to determine whether this propensity toward insertion extended to its adjacent regions. Cosmid clones were identified that were contiguous with the cosmid hP3.1. Clone 356d7 has been sequenced as described. Other clones that will extend this region are being mapped and sequenced. As shown in Figure 3a, it is interesting to note that, like hP3.1, both 381f11 and 424c10 contain a large, i.e., ~7 kb, cluster of 14 repetitive elements (positions 3,078–10,059). This region contains two fragments from a single MLT1 element, eight fragments from three different classes of L1 elements and 11 fragments from 10 Alu elements that are essentially separated by less than 40 bp. Clones containing this dense cluster of repetitive elements, like 381f11 and hP3.1, were clonally unstable (Nelson and Krawetz, ’95), while 424c10 was refractory to growth in culture.

Fig. 2. Isolation of cosmid clones from the region surrounding the PRM1→PRM2→TNP2 domain from the human chromosome 16 library. Southern hybridization using specific restriction fragments or PCR product probes identified adjacent clones from the human chromosome 16 specific library (Doggett et al., ’95). BamHI (B), EcoRI (E) and HindIII (H) restriction sites are indicated at the top. Cosmid clones identified by their plate coordinates are indicated as lines; individual restriction fragments, subclones or PCR amplicons used for Southern analysis are indicated as shaded boxes. Closed arrowheads indicate the known ends of each cosmid insert; open arrowheads indicate that the end has yet to be established.

TABLE 1. Location and classification of repetitive elements

Position        Length  Classification  Position        Length  Classification

265–384         120     Alu-Jo          18,877–18,991   115     MIR
463–515         53      MER5A           19,007–19,284   278     Alu-Jb
553–619         67      L1ME3A          21,404–21,293   112     MLT1D
620–908         288     Alu-Sz          21,839–21,562   278     Alu-Sq
1,427–1,695     269     Alu-Jb          23,388–23,595   208     MIR
2,125–1,914     212     Alu-Sz          24,745–24,845   101     MIR
2,186–2,445     260     MER33           25,262–24,846   417     MLT1C
2,544–2,470     76      Alu-Sxzg        25,403–25,697   295     Alu-Jo
2,621–2,545     77      Alu-Sz          26,766–26,527   240     MIR
2,761–2,622     140     Alu-Jo          28,928–28,811   118     LINE2
3,078–3,246     159     LINE2           29,341–29,189   153     Alu-Jo
3,277–3,564     288     Alu-Sxz         29,470–29,345   126     Alu-Sz
3,598–3,831     234     LINE2           29,698–29,545   154     L1PA7
3,931–3,845     87      Alu-Sz          29,851–29,702   150     Alu-Sz
4,168–3,932     237     Alu-Sxzg        29,990–29,854   137     Alu-Jo
4,372–4,170     203     Alu-Sz          30,306–30,017   290     Alu-Y
4,437–4,860     424     MLT1D           31,232–31,376   145     MIR
4,875–5,125     251     Alu-Jo          31,692–31,519   174     MIR
5,148–5,209     62      MLT1D           32,711–32,909   199     MIR
5,214–6,231     1018    LINE2           33,536–33,399   138     Alu-J
6,570–6,293     278     Alu-Jb          33,601–33,887   287     Alu-Jb
6,578–7,536     959     L1MC3           34,583–34,348   236     Alu-Jb
7,554–7,827     274     Alu-Sg          34,879–34,591   289     Alu-Y
7,841–7,910     70      L1MC3           36,212–36,547   336     LINE2
7,912–8,104     193     L1              36,651–36,558   94      MIR
8,108–8,379     272     Alu-Y           37,549–37,810   262     Alu-J
8,380–8,741     362     L1              37,983–37,882   102     MIR
8,780–9,069     290     Alu-Sq          38,309–37,997   313     Alu-Jo
9,081–9,508     428     L1MC3           38,404–38,317   88      MIR
9,793–9,521     273     Alu-Y           40,431–40,056   376     HRES1
10,059–801      259     Alu-Jb          42,171–42,043   139     MIR
11,419–11,325   95      Alu-S           43,592–43,497   96      MIR
11,480–11,688   209     Alu-Sz          44,355–44,291   65      MIR
12,148–12,223   76      Alu-Sz          44,711–44,419   293     Alu-Jo
12,767–12,684   84      L1MC4           45,335–45,052   284     Alu-Jo
12,903–12,787   117     Alu-Jo          45,441–45,336   106     LINE2
13,079–12,917   164     L1MC4           45,779–45,613   167     LINE2
13,359–13,316   44      L1MB7           47,197–47,481   285     Alu-Sz
13,371–13,649   279     Alu-Sz          47,765–47,500   266     Alu-Jb
13,843–13,685   279     L1MB7           48,121–47,922   200     MIR
14,144–14,270   128     Alu-Jo          48,476–48,769   294     Alu-Sx
14,276–14,565   290     Alu-Sz          48,830–49,118   289     Alu-Sxz
15,879–15,591   279     Alu-Y           49,492–49,790   299     Alu-Sz
16,194–16,008   187     MER63A          50,445–50,313   133     MIR
17,727–17,438   290     Alu-Jo          50,887–51,023   137     MIR
18,045–17,757   289     Alu-Sq          51,163–51,445   283     Alu-Sz
18,674–18,267   408     MLT1D           52,013–51,714   300     Alu-Y
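Positions in Table 1 are written as coordinate pairs with thousands separators, and some pairs run from the higher to the lower coordinate. The Python sketch below parses such an entry and reports its inclusive span. Treating a descending pair as a reverse-orientation element is an assumption made here, since the table itself does not define the convention; the function names are hypothetical.

```python
def parse_interval(text):
    """Parse a 'start–end' entry (en dash, optional commas) from Table 1.

    Returns (start, end, strand). The strand is '+' when start <= end and '-'
    otherwise; reading descending pairs as reverse orientation is an assumption.
    """
    left, right = text.replace(",", "").split("–")
    start, end = int(left), int(right)
    strand = "+" if start <= end else "-"
    return start, end, strand


def inclusive_span(start, end):
    """Number of positions covered by the pair, counting both endpoints."""
    return abs(end - start) + 1


forward = parse_interval("265–384")
reverse = parse_interval("21,404–21,293")
```

For these two entries the inclusive spans are 120 and 112, matching the lengths listed in the table. Not every listed length equals the inclusive span of its coordinates, so a parser for the full table would keep the reported length as a separate field rather than recompute it.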


Fig. 3. Computational analysis. (a) A restriction map (top) of the ~53 kb region encompassing the human PRM1→PRM2→TNP2 domain shows BamHI (B), EcoRI (E), and HindIII (H) sites. Repetitive elements (bottom) were identified utilizing the Censor server. MIR elements are indicated as black bars, MERs as open bars with black stripes and L1 elements as open bars. The HRES repeat is indicated by a checkered bar. The various Alu repetitive elements are represented by shaded bars as noted, with the J-class repeats denoted with very light gray, S-class with light gray and Y-class with medium gray bars. (b) GRAIL analysis revealed several CpG islands (light gray bars, top panel). HpaII restriction sites (second panel from the top), indicative of CpG rich regions, and uneven base preference indicative of gene-containing regions (middle panel) were assessed using Staden (x)nip. GRAIL exons in both the forward and reverse orientations are shown (bottom two panels). Regions of “high” or “excellent” coding potential are shown as gray and black bars, respectively. The locations of the PRM1, PRM2, and TNP2 genes, as well as the SOCS-1 gene and the putative gene4/Prm3 and hnRNP-C16ψ pseudogenes are indicated.


In contrast, the other clones that did not contain this region presented no difficulties in either their growth or isolation. The role of this repetitive element cluster in chromosomal stability remains to be addressed.

Analysis of the entire 53,060 bp sequence of this segment of human chromosome 16p13.13 has identified the presence of the SOCS-1 gene abutting the PRM1→PRM2→TNP2 domain. In addition, an hnRNP C-like gene lies ~17 kb 3´ to the end of the protamine domain. Both of these gene regions are marked by functional MARs. The hnRNP C-like gene shared 92% identity with other members of the hnRNP C2 multigene copy family (Nakagawa et al., ’86). However, recent extended sequencing of this region has demonstrated that the hnRNP C-like sequence is a pseudogene (Kramer et al., unpublished observations) and has been termed hnRNP-C16ψ. It is interesting that within this region of the human genome MARs appear to mark each of these different genic domains. This may be a general feature of this region of chromosome 16 or of the human genome. As such, further analysis of this region to clarify the role of MARs is clearly warranted.

ACKNOWLEDGMENTS

This work was presented by S.A.K. at the 9th International Congress on Genes, Gene Families, and Isozymes, April 14–19, 1997, San Antonio, TX. J.A.K. is supported by a Lalor Foundation postdoctoral fellowship. N.A.D. is supported by the U.S. D.O.E. under contract W-7405-ENG-36. This work was supported in part by a F.M.R.E. grant to S.A.K. We thank Dr. Anthony R. Kerlavage from The Institute for Genomic Research for performing the searches of TIGR’s human cDNA database prior to its release. We also thank Dr. Jerzy Jurka from the Genetic Information Research Institute for assisting us with repetitive element nomenclature.

Fig. 4. MAR analysis. (a) The MAR-FINDER algorithm identified four potential sites of attachment to the nuclear matrix. These correspond to regions with >70% MAR-Potential. (b) Mature sperm and HeLa nuclear halo structures were prepared, then the matrix bound and matrix independent genomic segments were separated. PCR amplification revealed restriction fragments enriched in either the nuclear matrix bound or independent fractions. Individual restriction sites are indicated as vertical lines; fragments containing amplicons used in this analysis are indicated as boxes. Black boxes were strongly (>70%) enriched in the matrix bound (pellet) fraction; white boxes were enriched (>70%) in the matrix independent (supernatant), i.e., loop fraction. Gray boxes indicate segments that can be considered nuclear matrix associated.


LITERATURE CITEDAdams, M.D., A.R. Kerlavage, R.D. Fleischmann, R.A.

Fuldner, C.J. Bult, N.H. Lee, E.F. Kirkness, K.G. Weinstock,J.D. Gocayne, and O. White et al. (1995). Initial assess-ment of human gene diversity and expression patterns basedupon 83 million nucleotides of cDNA sequence. Nature,377(Suppl):3–174.

Altschul, S F., W. Gish, W. Miller, E.W. Myers, and D. J.Lipman (1990) Basic local alignment search tool. J. Mol.Biol., 215:403–410.

Black, J.A., and G.H. Dixon (1967) Evolution of protamine: Afurther example of partial gene duplication. Nature,216:152–154.

Choudhary, S.K., S.M. Wykes, J.A. Kramer, A.N. Mohamed,F. Koppitch, J.E. Nelson, and S. Krawetz (1995) A haploidexpressed gene cluster exists as a single chromatin domainin human sperm. J. Biol. Chem., 270:8755–8762.

Dixon, G.H., J.M. Aiken, J.M. Jankowski, D.I. McKenzie, R.Moir, and J.C. States (1986) Organization and evolution ofthe protamine genes of salmonid fishes. In: ChromosomalProteins and Gene Expression. G. Reeck, G. Goodwin andP. Pulgdomenech, eds. Plenum Publishing Co., New York,pp. 287–314.

Doggett, N.A., L.A. Goodwin, J.G. Tesmer, L.J. Meincke, C. Bruce, L.M. Clark, M.R. Altherr, A.A. Ford, H.-C. Chi, B.L. Marrone, J.L. Longmire, S.A. Lane, S.A. Whitmore, M.G. Lowenstein, R.D. Sutherland, M.O. Mundt, E.H. Knill, W.J. Bruno, C.A. Macken, D.C. Torney, J.-R. Wu, J. Griffith, G.R. Sutherland, L.L. Deaven, D.F. Callen, and R.K. Moyzis (1995) An integrated physical map of human chromosome 16. Nature, 377(Suppl):335–365.

Fleischmann, R.D., M.D. Adams, O. White, R.A. Clayton, E.F. Kirkness, A.R. Kerlavage, C.J. Bult, J.F. Tomb, B.A. Dougherty, and J.M. Merrick et al. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269:496–512.

Kramer, J.A., and S.A. Krawetz (1996) Nuclear matrix interactions within the sperm genome. J. Biol. Chem., 271:11619–11622.

Kramer, J.A., and S.A. Krawetz (1997) A PCR-based assay to determine nuclear matrix association. BioTechniques, 22:826–828.

Krawetz, S.A., W. Connor, and G.H. Dixon (1987) Cloning of bovine P1 protamine cDNA and the evolution of vertebrate P1 protamines. DNA, 6:47–57.

Krawetz, S.A., and G.H. Dixon (1988) Sequence similarities of the protamine genes: Implications for regulation and evolution. J. Mol. Evol., 27:291–297.

Longmire, J.L., N.C. Brown, L.J. Meincke, M.L. Campbell, K.L. Albright, J.J. Fawcett, E.W. Campbell, R.K. Moyzis, C.E. Hildebrand, G.A. Evans, and L.L. Deaven (1993) Construction and characterization of partial digest DNA libraries made from flow-sorted human chromosome 16. Genet. Anal.: Techniques Applications, 10:69–76.

Maidak, B.L., N. Larsen, M.J. McCaughey, R. Overbeek, G.T. Olsen, K. Fogel, J. Blandy, and C.R. Woese (1994) The Ribosomal Database Project. Nucl. Acids Res., 22:3485–3487.

Nakagawa, T.Y., M.S. Swanson, B.J. Wold, and G. Dreyfuss (1986) Molecular cloning of cDNA for the nuclear ribonucleoprotein particle C proteins: a conserved gene family. Proc. Natl. Acad. Sci. USA, 83:2007–2011.

Nelson, J.E., and S.A. Krawetz (1994) Characterization of a human locus in transition. J. Biol. Chem., 269:31067–31073.

Nelson, J.E., and S.A. Krawetz (1995) Mapping the clonally unstable recombinogenic PRM1→PRM2→TNP2 region of human 16p13.13-13.2. DNA Seq., 5:163–168.

Oliva, R., and G.H. Dixon (1989) Chicken protamine genes are intronless. J. Biol. Chem., 264:12472–12481.

Pawlak, A., C. Toussaint, I. Levy, F. Bulle, M. Poyard, R. Barouki, and G. Guellaen (1995) Characterization of a large population of mRNAs from human testis. Genomics, 26:151–158.

Retief, J.D., and G. Dixon (1993) Evolution of pro-protamine P2 gene in primates. Eur. J. Biochem., 214:609–615.

Schlüter, G., H. Kremling, and W. Engel (1992) The gene for human transition protein 2: Nucleotide sequence, assignment to the protamine gene cluster, and evidence for its low expression. Genomics, 14:377–383.

Schlüter, G., A. Celik, R. Obata, M. Schlicker, S. Hofferbert, A. Schlung, I.M. Adham, and W. Engel (1996) Sequence analysis of the conserved protamine gene cluster shows that it contains a fourth expressed gene. Mol. Reprod. Dev., 43:1–6.

Schlüter, G., and W. Engel (1995) The rat Prm3 gene is an intronless member of the protamine gene cluster and is expressed in haploid male germ cells. Cytogenet. Cell Genet., 71:352–355.

Singh, G.B., J.A. Kramer, and S.A. Krawetz (1997) Mathematical prediction of the regions of chromatin scaffold attachment. Nucl. Acids Res., 25:1419–1425.

Staden, R. (1988) Methods to define and locate patterns of motifs in sequences. CABIOS, 4:53–60.

Thompson, J.D., D.G. Higgins, and T.J. Gibson (1994) CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl. Acids Res., 22:4673–4680.

Uberbacher, E.C., and R.J. Mural (1991) Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. Proc. Natl. Acad. Sci. USA, 88:11261–11265.