

DNA RESEARCH 6, 37-44 (1999)

Characterization of a 1200-kb Genomic Segment of Chromosome 3p22-p21.3

Yataro DAIGO,¹,³ Minoru ISOMURA,¹,² Tadashi NISHIWAKI,¹ Mayumi TAMARI,¹ Shinji ISHIKAWA,¹ Mikio KAI,¹ Yasushi MURATA,¹ Kumiko TAKEUCHI,² Yuka YAMANE,² Rie HAYASHI,² Maiko MINAMI,² Masayuki A. FUJINO,³ Yoshiaki HOJO,⁴,⁵ Ikuo UCHIYAMA,⁴ Toshihisa TAKAGI,⁴ and Yusuke NAKAMURA¹,²,*

Laboratory of Molecular Medicine, Human Genome Center, Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan,¹ Department of Human Genome Analysis, the Cancer Chemotherapy Center, Japanese Foundation for Cancer Research, 1-37-1 Kami-Ikebukuro, Toshima-ku, Tokyo 170-8455, Japan,² First Department of Medicine, Yamanashi Medical University School of Medicine, 1110 Shimokato, Tamaho, Nakakoma-gun, Yamanashi 409-3898, Japan,³ Laboratory of Genome Database, Human Genome Center, Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan,⁴ and Information Systems Group, Hitachi, Ltd, Sinsuna Plaza 6-27. 1 Shinsuna, Koto-ku, Tokyo 136-8632, Japan⁵

(Received 2 December 1998; revised 28 December 1998)

Abstract

We previously determined the nucleotide sequence and characterized the 685-kb proximal half of CEPH YAC936c1, which corresponds to a portion of human chromosome 3p21.3. In the study reported here, we characterized the remaining 515-kb of this YAC clone corresponding to the telomeric half of its human insert. The newly sequenced region contained a total of ten genes including six reported previously: phospholipase C delta 1 (PLCD1), human activin receptor type IIB (hActR-IIB), organic cation transporter-like 1 (OCTL1), organic cation transporter-like 2 (OCTL2), oxidative stress response 1 (OSR1), and human xylulokinase-like protein (XYLB). The remaining four genes present in the telomeric region included two known genes, MyD88 and ACAA, and two novel genes. One of the novel sequences (designated ENGL) was found to encode an amino-acid sequence homologous to the family of DNA/RNA endonucleases, especially endonuclease G. The other gene, F56, revealed no significant homology to any known genes. These results disclosed complete physical and transcriptional maps of the 1200-kb region of 3p present in YAC936c1.

Key words: large-scale DNA sequencing; tumor suppressor genes; homozygously deleted region; physical and transcriptional maps; human chromosome 3p22-p21.3

1. Introduction

The short arm of chromosome 3 is thought to contain multiple tumor suppressor genes because it shows frequent losses of heterozygosity in carcinomas of the lung, uterus, esophagus, and kidney.¹⁻⁴ Our group has been performing large-scale DNA sequencing as a part of the Japanese Human Genome Project. In an effort to isolate one of these putative tumor suppressor genes, we have been examining genomic DNA corresponding to the region on chromosome 3p21.3 where a DNA segment is homozygously deleted in a lung-cancer cell line. By screening cDNAs using genomic DNA as a probe and using the GRAIL2⁵ and HEXON⁶ programs to predict exons, we earlier identified and reported four genes in the homozygously deleted region. However, all four were subsequently excluded as candidates for tumor-suppressor functions.⁷,⁸ As the homozygous deletion might exert a positional effect to influence expression of genes in the near vicinity, we expanded our DNA sequencing into the surrounding region. In studies reported previously, we were able to characterize the genomic structures of two of the known genes present in the relevant region, phospholipase C delta 1 (PLCD1)⁹ and activin receptor type IIB (hActR-IIB).¹⁰ In addition, we isolated four novel genes: organic cation transporter-like 1 (OCTL1),¹¹ organic cation transporter-like 2 (OCTL2),¹¹ human xylulokinase-like protein (XYLB),¹² and oxidative stress response 1 (OSR1).¹³ Here we describe completion of a sequence analysis for 1200-kb of chromosome 3p21.3, cloned in YAC936c1, which includes the homozygously deleted 685-kb segment reported previously.

Communicated by Yoshiyuki Sakaki

* To whom correspondence should be addressed. Tel. +81-3-5449-5372, Fax. +81-3-5445-5433, E-mail: [email protected]

Downloaded from https://academic.oup.com/dnaresearch/article/6/1/37/351730 by guest on 25 December 2021

2. Materials and Methods

2.1. Genomic clones and libraries

YAC clone 936c1 was obtained from the CEPH YAC library. Chromosomal mapping and chimerism of YAC936c1 were analyzed by fluorescence in situ hybridization as described previously.¹⁴ Cosmid contigs were constructed from YAC936c1 according to methods described previously.¹⁴

2.2. DNA sequencing

Twenty cosmid clones that constituted a contig of approximately 515-kb of YAC936c1, representing genomic material located on human chromosome 3p22-p21.3, were sequenced by the shotgun method. Briefly, these 20 clones were fragmented by sonication; DNA fragments of 1.5 to 6.0 kb were subcloned into pBluescript II SK(-). Two hundred randomly selected clones were sequenced at both ends from the T3 and T7 primer sites by the dye-terminator method, using an ABI Prism 377 DNA sequencer (Applied Biosystems). DNA sequences were assembled by means of the ABI "Assembler" computer software. Gaps between the assembled segments were closed by direct cosmid sequencing using primers designed from the end sequences of the assembled segments.

2.3. Isolation of cDNA clones

We analyzed genomic DNA sequences from the target region with two exon-prediction computer programs, GRAIL2 and HEXON, and performed exon-connection experiments by reverse-transcriptase PCR (RT-PCR) as described previously¹⁵,¹⁶ to investigate whether the predicted candidate exons were actually transcribed. Then we screened human cDNA libraries using the exon-connected products as probes, or performed 5' and 3' rapid amplification of cDNA ends (RACE) experiments with cDNA fragments using the Marathon cDNA amplification kit (Clontech) according to the manufacturer's instructions.

2.4. Northern-blot analysis

Human multiple-tissue blots (Clontech) were hybridized with cDNA fragments that had been labeled by random-oligonucleotide priming. The 16 tissues examined were as follows: heart, brain, placenta, lung, liver, skeletal muscle, kidney, pancreas, spleen, thymus, prostate, testis, ovary, small intestine, colon, and leukocyte. Pre-hybridization, hybridization and washing were performed according to the supplier's recommendations. The blots were autoradiographed and analyzed with a BAS 1000 image analyzer (FUJI).

2.5. Fluorescence in situ hybridization (FISH)

To confirm the chromosomal location of each gene, we performed FISH as described by Inazawa et al.¹⁷ For delineation of the G-banding pattern, metaphase chromosomes were prepared by thymidine synchronization and bromodeoxyuridine release, and placed on microscope slides. Cosmid clones containing the genes were labeled with biotin-16-dUTP (Boehringer) by nick-translation and were hybridized to the denatured metaphase chromosomes. Hybridization signals were rendered visible with FITC-avidin (Boehringer). Precise assignments of the signals were determined by visualization of the replicated G-bands.

3. Results

3.1. Physical mapping and characterization of genomic sequences

To determine the genomic sequence of the telomeric end of YAC936c1, we constructed a physical contig map extending 515-kb telomeric to the breakpoint of the homozygously deleted region. We were able to construct a "minimal tiling path" for this 515-kb segment with 20 cosmids (Fig. 1B). Fluorescence in situ hybridization using these clones as probes confirmed that they were derived from 3p22-p21.3 (data not shown).

Database searches found a large number of D-segments (sequence-tagged sites or expressed-sequence tags, Fig. 1A) or repetitive sequences that matched parts of the total 1,201,033-nucleotide region. Seventeen (CA)n microsatellites, and 957 copies of repetitive sequences such as Alu (437 copies) and LINEs (153 copies) (Fig. 1E, 1F, and 1G), MIR (128 copies), LTR elements (MaLRs, Retrovirus, and MER4 group: 141 copies), and DNA elements (MER1, MER2, and Mariners: 98 copies) were also detected. The GC content of this 1.2-Mb DNA segment was 44.0% on average, which corresponds to an H1 isochore.¹⁸ A total of 37 CpG islands (Fig. 1D), which are known to be frequently associated with 5' upstream regions of housekeeping genes and also some tissue-specific genes,¹⁹,²⁰ were detected in this region by GeneScope software.²¹ A VNTR (variable number of tandem repeat) sequence consisting of about 25 repeats of a 40-bp consensus unit (CACATATATTATATATTTCATACGTATTTCATATATATTT)n was found in the region covered by cosmid clone 467.
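The GC-content and CpG-island figures above rest on simple base counts. The following is a minimal sketch of such counts and is not the GeneScope procedure used in this study: the function names are hypothetical, and the thresholds (window of at least 200 bp, GC content of at least 50%, observed/expected CpG ratio of at least 0.6) are the commonly used criteria of Gardiner-Garden and Frommer,²⁰ assumed here for illustration only.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(window: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the three assumed thresholds to one sequence window."""
    return (len(window) >= min_len
            and gc_fraction(window) >= min_gc
            and cpg_obs_exp(window) >= min_ratio)
```

In practice such a test would be slid along the genomic sequence window by window; the window size and step are further choices that this sketch leaves open.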

3.2. Identification of genes in the 515-kb segment

To construct a transcription map of the 515-kb telomeric portion of the YAC, we analyzed genomic DNA sequences with two exon-prediction computer programs, GRAIL2 and HEXON. We were able to identify four known genes in this way; two of them (PLCD1 and hActR-IIB) had been reported previously by us. The genomic structures of the other two, MyD88 and ACAA, are summarized in Tables 1 and 2. MyD88, a myeloid differentiation primary response gene contained in cosmid 467, was found to span about 4.3 kb with five exons (Table 2). ACAA, the 3-oxoacyl-CoA thiolase gene, was covered by cosmid clones 467 and 543; it spanned an approximately 14.4-kb genomic region and consisted of 12 exons (Table 2). Although nucleotide sequences of exon-intron boundaries of the MyD88 and ACAA genes were reported by others,²²,²³ all of the splice signals obtained by our sequences were consensus ones.

Figure 1. Detailed physical map and structural features of the 1200-kb region of CEPH YAC936c1 corresponding to chromosome 3p22-p21.3. (A) Assignment of D-segments (STSs). (B) Locations of overlapping clones subjected to nucleotide sequencing. (C) Exon-intron organization of the identified genes. (D) CpG islands predicted by GeneScope. (E) Locations of microsatellite repeats (CA)n. A variable number of tandem repeat (VNTR) sequence, about 25 repeats of a 40-bp consensus unit (CACATATATTATATATTTCATACGTATTTCATATATATTT)n, was observed in the region covered by cosmid clone 467. (F) Locations of Alu repetitive elements. (G) Locations of LINE1 repetitive elements. (H) Precise locations of genes in the 130-kb segment covered by cosmid clones 306-543. Open portions of the boxes are coding regions, and filled areas are untranslated. F56 and ACAA shared exonic material at their 3' ends.

3.3. Isolation of novel genes

By means of exon-connection experiments and computer prediction of exons, with subsequent cDNA screening, we were able to define six additional genes in the 515-kb region, including four (OCTL1, OCTL2, XYLB, and OSR1) that we reported recently (Table 1). One of the two additional genes, designated F56, showed no significant homology to any known genes. The other, with similarity to endonuclease G, we designated endonuclease G-like (ENGL). Northern-blot analysis revealed that a 1.7-kb transcript of ENGL was expressed in all 16 human tissues examined (data not shown); in addition, a 2.1-kb band specific to heart, liver, skeletal muscle, and testis was detected. We identified two types of cDNA clones, 1460 and 1799 nucleotides long respectively (DNA sequences are not shown, but are available from GenBank, accession numbers AB020523 and AB020735). With SSCP analysis of several cancer materials, we have found no somatic mutation in these genes so far.

Figure 2 illustrates the genomic structure of the ENGL gene. The 1.7-kb transcript, ENGL-a, consisted of six exons; the 2.1-kb transcript, ENGL-b, shared only exons 4 and 5, and parts of exons 3 and 6 with ENGL-a. The 3' nucleotides of ENGL-b from 1405 to the 3' end were not present in the genomic DNA sequences we determined in the YAC. The 368-amino-acid protein encoded by the 1.7-kb transcript and a 149-amino-acid protein encoded by the 2.1-kb transcript share only 90 amino acids in their middle portions. A homology search using the FASTA program²⁴ with the ENGL-a sequence revealed that the predicted product was similar to proteins belonging to the DNA/RNA endonuclease family, especially human endonuclease G (ENDOG),²⁵ murine ENDOG,²⁶ and bovine ENDOG²⁷⁻²⁹ (38.2%, 37.1%, and 36.5% amino acid identity, respectively; Fig. 3).

Table 1. Transcription units identified from a 1200-kb region on 3p22-p21.3.

Transcription unit  Statusᵃ              Transcript size (kb)ᵇ  Expression profileᵇ  Number of exons  GRAIL2ᶜ  GENSCANᶜ  Accession number
trans-Golgi         known                ND                     ND                   11 (incomplete)  5        7         U41740
integrin α RLC      reported previously  4.1, 9.5               All tissues          28               26       20        D25303
HYA22               reported previously  4.6, 2.7               All tissues          8                5        3         D88153
villin-like         reported previously  2.2                    All tissues          15               13       13        D88154
PLCδ1               known                ND                     ND                   15               12       9         U09117
F56                 this report          6.0, 8.0               All tissues          37               24       21        AB020522
ACAA                known                ND                     ND                   12               6        9         X65140
MyD88               known                ND                     ND                   5                5        5         U70451
OSR1                reported previously  4.6                    All tissues          18               8        12        AB017642
OCTL1               reported previously  2.4, 2.6               All tissues          10               9        9         AB010438
OCTL2               reported previously  2.3, 2.4               All tissues          10               8        9         AB011082
XYLB                reported previously  2.3, 1.8               All tissues          18               13       13        AB015046
hActRIIB            known                ND                     ND                   11               11       10        X77533
ENGL                this report          1.7, 2.1               All tissues          6 (incomplete)   4        0         AB020523

a) The DNA sequence of each clone was determined, checked against the databases, and analyzed using exon prediction programs. "this report" means the novel gene isolated by us. "reported previously" indicates that the complete cDNA sequence and genomic structure was reported previously by us. b) Each novel cDNA clone was examined by hybridization against human multiple tissue Northern blots (Clontech; including 16 human tissues). c) The number of exons predicted by GRAIL2 or GENSCAN respectively.

Figure 2. Genomic structure of the ENGL gene. Locations of exons are indicated by numbered boxes, which are drawn to scale to indicate the size of each exon. Open portions of the boxes are coding regions, and filled areas are untranslated. The hatched box corresponds to the 413-bp 3' region of the ENGL-b cDNA that lies telomeric to YAC936c1. Arrows indicate the transcriptional direction of each gene. The genomic sequence data will appear in the DDBJ, EMBL, and GenBank databases with accession number AB008681.

4. Discussion

We have determined the complete DNA sequence of a 515-kb segment of genomic DNA corresponding to the telomeric half of CEPH YAC936c1 (human chromosome 3p22-p21.3). By combining the results with the 685-kb sequence of the centromeric half reported previously, we have completed sequence analysis of the entire region contained in this YAC. We now have a total of about 1.2 Mb of 3p genomic sequence (with an average sequence accuracy of 99.98%), which in addition to 14 genes contains 17 microsatellite sequences of 27 to 63 CA repeats each. It also contains 957 copies of various repetitive elements that together account for 34.4% of the entire sequence. The average GC content of the 1.2-Mb sequence was 44.0%, but among the 204 exons the GC content averaged 47.8%.
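A scan for (CA)n microsatellites of the kind counted above can be sketched with a regular expression. This is an illustration only and not the method used to annotate the sequence: the function name and the `min_units` threshold are assumptions, and the scan covers only the given strand, so a (CA)n tract on the opposite strand would appear here as (TG)n and would need a second pass.

```python
import re


def find_ca_repeats(seq: str, min_units: int = 10):
    """Return (start, repeat_units) for each uninterrupted (CA)n run.

    start is a 0-based offset; min_units is an assumed reporting threshold.
    """
    pattern = re.compile(r"(?:CA){%d,}" % min_units)
    return [(m.start(), (m.end() - m.start()) // 2)
            for m in pattern.finditer(seq.upper())]


# Toy sequence (made up): a 27-unit run and a 5-unit run that falls below the threshold.
toy = "GGT" + "CA" * 27 + "TTG" + "CA" * 5 + "G"
hits = find_ca_repeats(toy)
```

Because the pattern requires perfect CA units, a tract interrupted by a single substitution is reported as two shorter runs or not at all; repeat-annotation programs normally tolerate such interruptions.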

To address the possibility of deletions or rearrangements in the CEPH YAC clone, we used FISH analysis to map several cosmids derived from YAC936c1, including the clones that correspond to the URA or TRP end and the clones containing the isolated gene(s), to chromosome 3p21.3. In addition, we found that a total of 204 exons were distributed over the 1200-kb segment. When the isolated cDNA sequences were compared with the 1200-kb segment, all of the exons were found in this region in a compatible orientation. Hence no gross rearrangement or deletion of YAC936c1 was found in our sequences.

Table 2. Genomic structure of MyD88 and ACAA.

Genomic structure of MyD88 gene

Coding exon number  Exon length (bp)  cDNA position  Intron number  Intron length (bp)
1                   360               1-360          1              836
2                   135               361-495        2              390
3                   181               496-676        3              189
4                   92                677-768        4              284
5                   1888              769-2656

Genomic structure of ACAA gene

Coding exon number  Exon length (bp)  cDNA position  Intron number  Intron length (bp)
1                   207               1-207          1              479
2                   94                208-301        2              2732
3                   58                302-359        3              2082
4                   80                360-439        4              411
5                   43                440-482        5              2349
6                   99                483-581        6              1604
7                   82                582-663        7              1357
8                   190               664-853        8              537
9                   180               854-1033       9              515
10                  56                1034-1089      10             316
11                  146               1090-1235      11             2996
12                  407               1236-1642
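The cDNA positions listed in Table 2 follow from the cumulative exon lengths, and the genomic span of a gene is the sum of its exon and intron lengths. The sketch below recomputes the cDNA coordinates for both genes and the span of MyD88 from the Table 2 values; the function and variable names are hypothetical and serve only as an internal consistency check.

```python
from itertools import accumulate

# Exon and intron lengths (bp) as listed in Table 2.
MYD88_EXONS = [360, 135, 181, 92, 1888]
MYD88_INTRONS = [836, 390, 189, 284]
ACAA_EXONS = [207, 94, 58, 80, 43, 99, 82, 190, 180, 56, 146, 407]


def cdna_positions(exon_lengths):
    """1-based, inclusive (start, end) cDNA coordinates of each exon."""
    ends = list(accumulate(exon_lengths))
    starts = [1] + [end + 1 for end in ends[:-1]]
    return list(zip(starts, ends))


def genomic_span(exon_lengths, intron_lengths):
    """Length in bp from the first base of exon 1 to the last base of the final exon."""
    return sum(exon_lengths) + sum(intron_lengths)


myd88_coords = cdna_positions(MYD88_EXONS)
acaa_coords = cdna_positions(ACAA_EXONS)
myd88_span = genomic_span(MYD88_EXONS, MYD88_INTRONS)
```

For MyD88 the recomputed span is 4355 bp, in line with the figure of about 4.3 kb given in the Results.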

Figure 3. Alignment of the predicted amino acid sequences of the family of DNA/RNA endonucleases: human ENGL, human ENDOG, murine ENDOG, and bovine ENDOG (GenBank accession numbers AB020523, Q14249, O08600, and P38447 respectively). Shaded residues are conserved in all four proteins shown here.

One VNTR (variable number of tandem repeat) sequence of about 25 repeats of a 40-bp consensus unit was found between the 5' end of the OSR1 gene and the 3' end of the MyD88 gene. Evidence has lately emerged that some VNTR sequences likely play significant roles in the regulation of transcription of mRNA, and that certain allelic variants may be associated with personality traits or with susceptibility to diseases such as IDDM or epilepsy.³⁰⁻³⁵ For example, an allele of the intronic VNTR sequence present in the 5' promoter region of the serotonin transporter gene, which carries a 44 bp deletion, is significantly more common in individuals with anxiety-related personalities such as neuroticism, tension, and harm-avoidance than in the general population. Moreover, the region containing the sequence lacking those 44 bp is less active in promoting transcription of the reporter gene than the full VNTR sequence.³⁶ Hence, the VNTR we found in the 5' flanking regions of OSR1 and ACAA (see Fig. 1C and 1E) might influence transcription of those two genes.
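Counting the copies of a VNTR unit amounts to finding the longest head-to-tail run of that unit. The sketch below does this for exact copies of the 40-bp consensus unit reported here; it is illustrative only, the function name is hypothetical, and the test sequence is synthetic. Real VNTR copies usually diverge from the consensus, so an exact-match scan like this one would tend to undercount them.

```python
# 40-bp consensus unit of the VNTR found in cosmid clone 467.
UNIT = "CACATATATTATATATTTCATACGTATTTCATATATATTT"


def longest_tandem_run(seq: str, unit: str):
    """Return (start, copies) of the longest run of exact tandem copies of unit.

    Returns (-1, 0) if the unit does not occur at all.
    """
    seq, unit = seq.upper(), unit.upper()
    best_start, best_copies = -1, 0
    i = seq.find(unit)
    while i != -1:
        j, copies = i, 0
        while seq.startswith(unit, j):   # extend the run one unit at a time
            copies += 1
            j += len(unit)
        if copies > best_copies:
            best_start, best_copies = i, copies
        i = seq.find(unit, j)            # continue the search after this run
    return best_start, best_copies


# Synthetic example: 25 exact copies flanked by unrelated bases.
synthetic = "GGGTTT" + UNIT * 25 + "GGG"
run = longest_tandem_run(synthetic, UNIT)
```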

Using computer-assisted methods, we identified in the newly sequenced 515-kb genomic fragment four known genes (PLCD1, hActR-IIB, MyD88, and ACAA); six others encoded OCTL1, OCTL2, OSR1, XYLB, F56, and ENGL (Table 1). Two of those, F56 and ACAA, located in the 130-kb segment covered by cosmid clones 306-543 (Fig. 1H), shared their last exons. Therefore, expression of these two genes may be regulated by a common mechanism and their products may be involved in biophysiologically related pathways.

ENGL, a novel molecule, revealed significant homology to endonuclease G found in mitochondria of humans, mice, bovines, and yeast. This enzyme may function with ribonuclease as well as with deoxyribonuclease, playing a role in mitochondrial DNA (mtDNA) replication. Northern-blot analysis revealed that a 1.7-kb transcript was expressed in all 16 human tissues examined, although a 2.1-kb band was also present in heart, liver, skeletal muscle, and testis. The 2.1-kb transcripts appeared to lack exon 1, exon 2, and 747 bp of exon 6 (partial sequence), comprising a total of 278 codons and 330 bp of the 3' non-coding region, but they possessed an additional 952-bp exon, a 74-bp exon located 5' of exon 3, and 413 bp of the 3' region. On the basis of its sequence similarities and conserved motifs with respect to DNA/RNA endonucleases, we believe that ENGL protein likely plays a role in mtDNA replication. However, the role of the tissue-specific 2.1-kb transcript remains unclear.

The MyD88 gene, which is activated in M1 myeloleukemic cells in response to interleukin-6,³⁷ induces both growth arrest and terminal differentiation. Its first exon encodes a "death domain" similar to the intracellular segment of TNF receptor-1, and its C-terminal region is homologous to type I interleukin-1 receptor cytoplasmic domain.³⁸ Another known gene in the segment studied here encodes human peroxisomal 3-oxoacyl-CoA thiolase (ACAA), an enzyme operative in the peroxisomal beta-oxidation system.

We previously reported that three of four genes located in the 685-kb centromeric region of the 3p region contained in YAC936c1 were likely to be related to cell structure or to intracellular transport (trans-Golgi p230, HYA22, and villin-like).⁸ In the 515-kb segment sequenced here, a total of ten genes encoding proteins with various biophysiological functions are present; two of them are likely to be transmembrane transporters (OCTL1 and OCTL2) and the others may have important functions in metabolism, differentiation, or cell proliferation.⁹⁻¹³

The human genome project has engendered a number of exon-prediction computer programs whose accuracy is very important, even critical, to our purposes. We compared data obtained from analysis of the 1200-kb segment of 3p genomic DNA with GRAIL2 and GENSCAN. The latter, a recently designed program, differs from most existing gene-recognition algorithms.³⁹ The results revealed that 69% of the 204 exons included in the 14 genes were predicted by GENSCAN and 73% by GRAIL2 (see Table 1). One hundred twenty-six (64%) of the 196 fragments predicted as exons by GENSCAN, and 143 (77%) of the 186 candidate exons given "excellent" scores by the GRAIL2 program, were confirmed to be genuine exons. Only 13 (11%) of the 116 exons given "good" scores and four (6%) of the 66 exons given "marginal" scores by the GRAIL2 program were confirmed to be parts of genes (Table 3a).
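The 69% and 73% figures can be recomputed from the per-gene counts in Table 1. The sketch below assumes that the two prediction columns of Table 1 count the real exons recovered by each program, and that the values are as read from that table in row order; the names are hypothetical.

```python
# Per-gene counts as read from Table 1 (14 transcription units, same row order).
EXONS = [11, 28, 8, 15, 15, 37, 12, 5, 18, 10, 10, 18, 11, 6]
GRAIL2_FOUND = [5, 26, 5, 13, 12, 24, 6, 5, 8, 9, 8, 13, 11, 4]
GENSCAN_FOUND = [7, 20, 3, 13, 9, 21, 9, 5, 12, 9, 9, 13, 10, 0]


def percent_found(found, total):
    """Percentage of all real exons recovered by a program, pooled over genes."""
    return 100.0 * sum(found) / sum(total)


def genes_detected(found):
    """Number of genes for which a program recovered at least one exon."""
    return sum(1 for n in found if n > 0)


grail2_pct = percent_found(GRAIL2_FOUND, EXONS)
genscan_pct = percent_found(GENSCAN_FOUND, EXONS)
genscan_genes = genes_detected(GENSCAN_FOUND)
```

Under this reading the exon counts sum to 204, and the GENSCAN column contains a single zero (ENGL), consistent with 13 of the 14 genes being detected by that program.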

Among the computer-predicted exons, the first and last exons were also predicted relatively accurately: 18 (64%) of the 28 fragments predicted as first or last exons by GRAIL2, 19 (67%) of the 28 such exons predicted by GENSCAN, and 24 (86%) of the 28 predicted by GRAIL2 or GENSCAN together were confirmed as real exons (Table 3b). GENSCAN also detected 13 (92%) of the 14 genes present, and recognized 10 (71%) of them as complete genes. Combining these two exon-prediction programs should make it possible to routinely detect genes and determine their genomic structures effectively.

Table 3. Genomic DNA analysis data provided by GRAIL 2 and GENSCAN computer program.

(a)
Program   Score      Real exon / predicted exon
GRAIL 2   excellent  143/186 (77%)
GRAIL 2   good       13/116 (11%)
GRAIL 2   marginal   4/66 (6%)
GENSCAN              126/196 (64%)

(b)
Program            First (a) predicted/real exon  Last (b) predicted/real exon  (a)&(b)
GRAIL 2            8/14 (57%)                     10/14 (71%)                   18/28 (64%)
GENSCAN            9/14 (64%)                     10/14 (71%)                   19/28 (67%)
GRAIL 2 & GENSCAN  12/14 (86%)                    12/14 (86%)                   24/28 (86%)

a) The frequency of real exon/predicted exon. For the GRAIL 2 program, the groups were divided according to excellent, marginal, or good scores. b) The frequency of first or last real exon/predicted exon.
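The "real exon / predicted exon" entries in Table 3a are simple ratios of confirmed to predicted exons. A minimal sketch of that arithmetic follows, using the counts from Table 3a; the names are hypothetical, and the pooled GRAIL2 value is a derived illustration that does not appear in the table.

```python
# (confirmed, predicted) exon counts from Table 3a.
GRAIL2_BY_SCORE = {"excellent": (143, 186), "good": (13, 116), "marginal": (4, 66)}
GENSCAN_COUNTS = (126, 196)


def confirmed_fraction(confirmed: int, predicted: int) -> float:
    """Fraction of predicted exons that were confirmed as real exons."""
    if predicted <= 0:
        raise ValueError("predicted must be positive")
    return confirmed / predicted


def pooled_fraction(pairs) -> float:
    """Confirmed fraction over several score classes taken together."""
    pairs = list(pairs)
    return confirmed_fraction(sum(c for c, _ in pairs), sum(p for _, p in pairs))


grail2_pct = {score: round(100 * confirmed_fraction(c, p))
              for score, (c, p) in GRAIL2_BY_SCORE.items()}
genscan_pct = round(100 * confirmed_fraction(*GENSCAN_COUNTS))
grail2_pooled = pooled_fraction(GRAIL2_BY_SCORE.values())
```

Pooling all three GRAIL2 score classes gives a much lower confirmed fraction than the "excellent" class alone, which is why the score class matters when the two programs are compared. Percentages recomputed by rounding may differ by one point from printed values that were truncated.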

Acknowledgments: This work was supported partly by the Japan Science and Technology Corporation (JST) and by a "Research for the Future Program Grant (96L00102)" from the Japan Society for the Promotion of Science.

The total genomic sequence of this 1200-kb region is now available on the Internet (www-alis.tokyo.jst.go.jp/HGS/team.GK/3p21.3/map.html) and the 515-kb genomic sequence data will appear in the DDBJ, EMBL, and GenBank databases with accession numbers AB008681 and AB010443.

References

1. Yamakawa, K., Morita, R., Takahashi, E., Hori, T.,Ishikawa, J., and Nakamura, Y. 1991, A detailed deletionmapping of the short arm of chromosome 3 in sporadicrenal cell carcinoma, Cancer Res., 51, 4707-4711.

2. Yokoyama, S., Yamakawa, K., Tsuchiya, E., Murata, M.,Sakiyama, S., and Nakamura, Y. 1992, Deletion mappingon the short arm of chromosome 3 in squamous cell carci-noma and adenocarcinoma of the lung, Cancer Res., 52,873-877.

3. Hibi, K., Takahashi, T., Yamakawa, K. et al. 1992, Threedistinct regions involved in 3p deletion in human lungcancer, Oncogene, 7, 445-449.

4. Mori, T., Yanagisawa, A., Kato, Y. et al. 1994, Accumu-lation of genetic alterations during esophageal carcino-

Dow

nloaded from https://academ

ic.oup.com/dnaresearch/article/6/1/37/351730 by guest on 25 D

ecember 2021

No. 37] Y. Daigo et al. 43

genesis, Hum. Mol. Genet., 3, 1969-1971.5. Xu, Y., Mural, R. J., and Uberbacher, E. C. 1994, Con-

structing gene models from accurately predicted exons:an application of dynamic programming, Comput. Appl.Biosci., 10, 613-623.

6. Solovyev, V. V., Salamov, A. A., and Lawrence, C.B. 1994, Predicting internal exons by oligonucleotidecomposition and discriminate analysis of spliceable openreading frames, Nucleic Acids Res., 22, 5156-5163.

7. Hibi, K., Yamakawa, K., Ueda, R. et al. 1994, Aberrantupregulation of a novel integrin alpha subunit gene at3p21.3 in small cell lung cancer, Oncogene, 9, 611-619.

8. Ishikawa, S., Kai, M., Tamari, M. et al. 1997, Sequenceanalysis of a 685-kb genomic region on chromosome 3p22-p21.3 that is homozygously deleted in a lung carcinomacell line, DNA Res., 4, 35-43.

9. Ishikawa, S., Takahashi, T., Ogawa, M., and Nakamura,Y. 1997, Genomic structure of the human PLCD1 (phos-pholipase C delta 1) locus on 3p22-p21.3, Cytogenet. CellGenet, 78, 58-60.

10. Ishikawa, S., Kai, M., Murata, Y. et al. 1998, Genomicorganization and mapping of the human activin receptortype IIB (hActR-IIB) gene, J. Hum. Genet., 43, 132-134.

11. Nishiwaki, T., Daigo, Y., Tamari, M., Fujii, Y., andNakamura, Y., Molecular cloning, mapping, and char-acterization of two novel human genes, OCTL1 andOCTL2, bearing homology to organic-cation trans-porters, Cytogenet. Cell Genet., (In press).

12. Tamari, M., Daigo, Y., Ishikawa, S., and Nakamura, Y.1998, Genomic structure of a novel human gene (XYLB)on chromosome 3p22-p21.3 encoding a xylulokinase-likeprotein, Cytogenet. Cell Genet, 82, 101-104.

13. Tamari, M., Daigo, Y., and Nakamura, Y. Isolation andcharacterization of a novel serine threonine kinase geneon 3p22-p21.3, J. Hum. Genet, (In press).

14. Murata, Y., Tamari, M., Takahashi, T. et al. 1994, Char-acterization of an 800 kb region at 3p22-p21.3 that washomozygously deleted in a lung cancer cell line, Hum.Mol. Genet., 3, 1341-1344.

15. Fearon, E. R., Cho, K. R., Nigro, J. M. et al. 1990, Iden-tification of a chromosome 18q gene that is altered incolorectal cancers, Science, 247, 49-56.

16. Horii, A., Nakatsuru, S., Ichii, S., Nagase, H., andNakamura, Y. 1993, Multiple forms of the APC tran-scripts and their tissue-specific expression, Hum. Mol.Genet., 2, 283-287.

17. Inazawa, J., Saito, H., Ariyama, T., Abe, T., and Nakamura, Y. 1993, High resolution cytogenetic mapping of 342 new cosmid markers including 43 RFLP markers on human chromosome 17 by fluorescence in situ hybridization, Genomics, 17, 153-162.

18. Bernardi, G. 1995, The human genome: organization and evolutionary history, Annu. Rev. Genet., 29, 445-476.

19. Bird, A. P. 1986, CpG-rich island and the function of DNA methylation, Nature, 321, 209-213.

20. Gardiner-Garden, M. and Frommer, M. 1987, CpG islands in vertebrate genomes, J. Mol. Biol., 196, 261-282.

21. Murakami, K. and Takagi, T. 1998, Gene recognition by combination of several gene-finding programs, Bioinformatics, 14, 665-675.

22. Bonnert, T. P., Garka, K. E., Parnet, P., Sonoda, G., Testa, J. R., and Sims, J. E. 1997, The cloning and characterization of human MyD88: a member of an IL-1 receptor related family, FEBS Lett., 402, 81-84.

23. Bout, A., Franse, M. M., Collins, J., Blonden, L., Tager, J. M., and Benne, R. 1991, Characterization of the gene encoding human peroxisomal 3-oxoacyl-CoA thiolase (ACAA), No large DNA rearrangement in a thiolase-deficient patient, Biochim. Biophys. Acta, 1090, 43-51.

24. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. 1990, Basic local alignment search tool, J. Mol. Biol., 215, 403-410.

25. Tiranti, V., Rossi, E., Ruiz-Carrillo, A. et al. 1995, Chromosomal localization of mitochondrial transcription factor A (TCF6), single-stranded DNA-binding protein (SSBP), and endonuclease G (ENDOG), three human housekeeping genes involved in mitochondrial biogenesis, Genomics, 25, 559-564.

26. Prats, E., Noel, M., Letourneau, J. et al. 1997, Characterization and expression of the mouse endonuclease G gene, DNA Cell Biol., 16, 1111-1122.

27. Moos, M. Jr., Nguyen, N. Y., and Liu, T. Y. 1988, Reproducible high yield sequencing of proteins electrophoretically separated and transferred to an inert support, J. Biol. Chem., 263, 6005-6008.

28. Cote, J. and Ruiz-Carrillo, A. 1993, Primers for mitochondrial DNA replication generated by endonuclease G, Science, 261, 765-769.

29. Gerschenson, M., Houmiel, K. L., and Low, R. L. 1995, Endonuclease G from mammalian nuclei is identical to the major endonuclease of mitochondria, Nucleic Acids Res., 23, 88-97.

30. Bennett, S. T., Wilson, A. J., Esposito, L. et al. 1997, Insulin VNTR allele-specific effect in type 1 diabetes depends on identity of untranslated paternal allele, Nature Genet., 17, 350-353.

31. Lafreniere, R. G., Rochefort, D. L., Chretien, N. et al. 1997, Unstable insertion in the 5' flanking region of the cystatin B gene is the most common mutation in progressive myoclonus epilepsy type 1, EPM1, Nature Genet., 15, 298-302.

32. Lalioti, M. D., Scott, H. S., Buresi, C. et al. 1997, Dodecamer repeat expansion in cystatin B gene in progressive myoclonus epilepsy, Nature, 386, 847-850.

33. Pugliese, A., Zeller, M., Fernandez, A. et al. 1997, The insulin gene is transcribed in the human thymus and transcription levels correlate with allelic variation at the INS VNTR-IDDM2 susceptibility locus for type 1 diabetes, Nature Genet., 15, 293-296.

34. Vafiadis, P., Bennett, S. T., Todd, J. A. et al. 1997, Insulin expression in human thymus is modulated by INS VNTR alleles at the IDDM locus, Nature Genet., 15, 289-293.

35. Nakamura, Y., Koyama, K., and Matsushima, M. 1998, VNTR (variable number of tandem repeat) sequences as transcriptional, translational, or functional regulators, J. Hum. Genet., 43, 149-152.

36. Lesch, K-P., Bengel, D., Heils, A. et al. 1996, Association of anxiety-related traits with a polymorphism in the serotonin transporter gene regulatory region, Science, 274, 1527-1531.

37. Kenneth, A. L., Barbara, H. L., and Dan, A. L. 1990, Nucleotide sequence and expression of a cDNA encoding MyD88, a novel myeloid differentiation primary response gene induced by IL-6, Oncogene, 5, 1095-1097.

38. Hardiman, G., Rock, F. L., Balasubramanian, S., Kastelein, R. A., and Bazan, J. F. 1996, Molecular characterization and modular analysis of human MyD88, Oncogene, 13, 2467-2475.

39. Burge, C. and Karlin, S. 1997, Prediction of complete gene structures in human genomic DNA, J. Mol. Biol., 268, 78-94.
