Proc. Natl. Acad. Sci. USA
Vol. 84, pp. 349-353, January 1987
Biochemistry

Cloning and characterization of human liver cDNA encoding a protein S precursor

(vitamin K-dependent protein/protein processing/protein C cofactor/blood coagulation/epidermal growth factor domain)

JOANN HOSKINS, DOUGLAS K. NORMAN, ROBERT J. BECKMANN, AND GEORGE L. LONG*†

Division of Molecular and Cell Biology, Lilly Research Laboratories, Indianapolis, IN 46285

Communicated by Clarence A. Ryan, September 19, 1986 (received for review July 21, 1986)

Abbreviations: C4bp, complement factor C4b-binding protein; Gla, γ-carboxyglutamic acid.
*Present address: Department of Biochemistry, College of Medicine, University of Vermont, Burlington, VT 05405.
†To whom reprint requests should be addressed.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

ABSTRACT Human liver cDNA encoding a protein S precursor was isolated from two cDNA libraries by two different techniques. Based upon the frequency of positive clones, the abundance of mRNA for protein S is ≈0.01%. Blot hybridization of electrophoretically fractionated poly(A)⁺ RNA revealed a major mRNA ≈4 kilobases long and two minor forms of ≈3.1 and ≈2.6 kilobases. One of the cDNA clones contains a segment encoding a 676 amino acid protein S precursor, as well as 108 and 1132 nucleotides of 5' and 3' noncoding sequence, respectively, plus a poly(A) region at the 3' end. The cDNAs are adenosine plus thymidine-rich (60%) except for the 5' noncoding region, where 78% of the nucleotides are guanosine or cytosine. The protein precursor consists of a 41 amino acid "leader" peptide followed by 635 amino acids corresponding to mature protein S. Comparison of the mature protein region with homologous vitamin K-dependent plasma proteins shows that it is composed of the following domains: an amino-terminal γ-carboxyglutamic acid-rich region of 37 amino acids; a 36 amino acid linker region rich in hydroxy amino acids; four epidermal growth factor-like segments, each ≈45 amino acids long; and a 387 amino acid carboxyl-terminal domain of unrecognized structure and unknown function.

Human protein S is a vitamin K-dependent plasma protein, first isolated and characterized by DiScipio and Davie (1). The predominant form of free human protein S has a molecular weight of ≈70,000 (1, 2). Mature human protein S contains approximately 10 γ-carboxyglutamic acid (Gla) residues in the amino-terminal region of the molecule and ≈7.8% carbohydrate (1). The protein also contains three β-hydroxyaspartic acid residues (3). It has been reported (2) that about 20% of isolated free protein S exists in a modified form, indistinguishable on nonreducing NaDodSO4/polyacrylamide gels from intact protein S but with Mr 64,000 on reducing gels. Limited cleavage by thrombin converts all of protein S to the modified form (2, 4). The other product generated by the cleavage is a small peptide (Mr 6000-8000), representing the amino-terminal Gla domain of the intact molecule (4). The Gla domain in vitamin K-dependent proteins is believed to direct binding to phospholipid membrane via Ca²⁺ bridging (5). Under nonreducing conditions, the cleaved Gla-domain peptide remains covalently attached to the remainder of the protein because of disulfide bonding (2, 4). Under normal conditions, ≈60% of total circulating protein S is complexed with complement factor C4b-binding protein (C4bp) (6). The binding of protein S to C4bp has no apparent effect in vitro on the binding to or degradation of C4b (7). However, binding of C4bp to protein S significantly decreases the ability of protein S to potentiate the action of activated protein C (8). Dahlbäck (9) has speculated that protein S may mediate C4bp binding and action on the altered cell surface via interaction of the Gla domain with negatively charged membrane phospholipid.

A physiological role for protein S as a cofactor for activated protein C was suggested by Walker (10, 11). Activated protein C is an important feedback down-regulator of blood coagulation (12). In vitro studies showed an enhancement by bovine protein S of protein C inactivation of coagulation cofactor Va (10). The effect was shown to be due to stoichiometric (1:1) complex formation of activated protein C and protein S and subsequent binding of the complex to the phospholipid vesicles where Va inactivation occurs (11). In further studies, Walker (13) demonstrated that the species specificity of bovine activated protein C in plasma is overcome by the presence of bovine protein S.

In vivo evidence for the role of protein S in regulating blood coagulation comes from studies of patients with hereditary protein S deficiency who are at higher risk of venous thrombotic disease (14-18). The clinical manifestations reported for protein S-deficient individuals in these studies are very similar to those associated with hereditary protein C deficiency (19). Recently, Comp et al. (20) reported that acquired protein S deficiency, resulting from increased binding of protein S to C4bp, occurs in patients with nephrotic syndrome or acute systemic lupus erythematosus and in pregnant women. They noted that in each of these situations, there is an increase in thromboembolic complications.

Little is known about the structure of protein S beyond its size. A 13-residue amino-terminal sequence for human and bovine protein S has been reported (1). We report here the isolation and characterization of liver cDNA clones encoding human protein S. One of the clones encodes the entire 676 amino acid sequence of the protein S precursor. The precursor consists of a 41 amino acid "leader" peptide followed by 635 amino acids corresponding to the mature protein.

MATERIALS AND METHODS

Antibody Screening of a λgt11 Library. A human liver cDNA library in phage λgt11 and the host cells (Escherichia coli strain Y1090) were purchased from Clontech (Palo Alto, CA). Two million plaques (four library equivalents) at a density of ≈200,000 per 150-mm plate were screened by the method of Young and Davis (22), as described by the supplier. Primary goat polyclonal antibody to human protein S was supplied by P. C. Comp (Oklahoma Medical Research Foundation, Oklahoma City, OK). Secondary biotinylated rabbit antibody to goat IgG and avidin-conjugated horseradish peroxidase were purchased from Vector Laboratories



(Burlingame, CA). Positive plaques were replated at ≈5000 plaques per 100-mm plate, rescreened, and finally replated at ≈100 plaques per 100-mm plate. Isolated positive plaques from the third round of screening were used to purify phage DNA by the plate lysate technique and CsCl step-gradient fractionation (23). Subfragments representing inserted protein S cDNA in λgt11 were generated by digestion with restriction endonuclease EcoRI and ligated into EcoRI-digested, phosphatase-treated plasmid pBR322 (New England Biolabs). The resulting chimeric plasmid was used to transform CaCl2-treated E. coli strain RR1, as described (24). Subfragments containing the λgt11 insertion junction regions were subcloned by Kpn I and Sac I (one site within the insert) digestion, ligation into either Kpn I/Sac I- or Sac I-cleaved pUC19 (Bethesda Research Laboratories), and transformation of E. coli JM109 cells. Colonies containing the desired DNA subfragments were determined by restriction endonuclease mapping (digestion according to the supplier's specifications) of plasmid DNA prepared from 1-ml overnight cultures by the method of Birnboim and Doly (25).

Large-Scale Plasmid Preparation. Large preparations of plasmid suitable for preparation of hybridization probes and for DNA sequencing were grown in M9 medium (26) in the presence of chloramphenicol and were isolated by centrifugation in CsCl/ethidium bromide gradients after cell lysis (26). DNA sequencing was by the chemical modification method of Maxam and Gilbert (27). All DNA sequences were stored and analyzed with the aid of the Lilly Research Laboratories computing environment.

cDNA Probe Screening of a Plasmid Library. A human liver cDNA library (28) in plasmid pBR322 was screened with protein S-encoding subfragments (radiolabeled by nick-translation) from λgt11 clones, as previously outlined (28). Approximately 50,000 individual colonies were screened in duplicate. Plasmid DNA from isolated colonies was prepared and analyzed as described above.

Determination of Protein S mRNA Size by Blot Hybridization. Polyadenylylated RNA isolated from human and bovine liver was fractionated by electrophoresis in an agarose gel in the presence of formaldehyde, transferred to nitrocellulose, and hybridized with nick-translated probe as described (29).

RESULTS

Human Protein S cDNA Clones in λgt11. Twenty faint "positives" were observed upon screening ≈2 × 10⁶ plaques. Seven of the 10 most intense plaques remained positive upon rescreening at the lower density. Of the three clones containing the largest inserts (≈2.6 kilobases), two (λHS-4-1 and λHS-6-3) were identical and the third (λHS-1-1) differed only at the 5' end of the coding strand. The entire cDNA insert of clone λHS-6-3 was sequenced and found to be identical with a corresponding region (nucleotides 380-2938, Fig. 2) of plasmid clone pHHS-IIa, described below. Clones λHS-6-3 and λHS-4-1 contain an unidentified nucleotide segment of 51 bases at the 5' end following the EcoRI cloning site sequence (Fig. 1). This region is followed by coding sequence beginning with amino acid 38 of the mature protein. Clone λHS-1-1 begins at nucleotide 351 in Fig. 2. All three of the characterized clones end at an identical EcoRI restriction site (nucleotides 2933-2938 in Fig. 2). Analysis of the λgt11 clones revealed that none of them contained the region encoding the amino terminus of protein S.

Human Protein S cDNA Clones in Plasmid pBR322. Nick-translated cDNA fragments, derived from a λgt11 clone, representing the amino- and carboxyl-terminal regions of protein S (HincII-HindIII, bases 452-564; HinfI-EcoRI, bases 2799-2938; shown in Fig. 2) were used to screen a second human liver cDNA library (28). Ten "positives" were observed upon high-density screening of ≈50,000 colonies. One of the isolated clones (pHHS-IIa) hybridized with both probes and has been completely characterized. Another clone (pHHS-VIb) was also characterized and found to correspond to ≈2.0 kilobases of the 3' end of clone pHHS-IIa (begins at base 1401 in Fig. 2). Clone pHHS-IIa contains a 3349-nucleotide insert (Fig. 2). The insert is comprised of a 32-base oligo(dG) tail, a 108-base 5' noncoding region, 2028 bases encoding a protein S precursor protein (676 amino acids), a TAA stop codon, a 1132-base 3' noncoding region, a 26-base poly(A) segment, and a 14-base oligo(dC) tail. As shown in Fig. 2, the Pst I cloning restriction site (CTGCAG) at the 3' end of clone pHHS-IIa was not properly reconstructed during the initial library construction.
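The segment lengths quoted for the pHHS-IIa insert can be checked with simple arithmetic. The sketch below is illustrative only: the dictionary keys are hypothetical labels, and the numbers are the lengths quoted in the text. It confirms that 676 codons account for the 2028-base coding region, and it shows that the listed segments total 3343 bases, 6 fewer than the stated 3349. That difference equals the length of one 6-base Pst I site (CTGCAG); this is offered as an observation, not as an explanation given by the authors.

```python
# Segment lengths (in bases) as quoted in the text for clone pHHS-IIa.
# The key names are illustrative labels, not identifiers from the paper.
segments = {
    "oligo_dG_tail": 32,
    "five_prime_noncoding": 108,
    "coding": 2028,
    "stop_codon": 3,
    "three_prime_noncoding": 1132,
    "poly_A": 26,
    "oligo_dC_tail": 14,
}

precursor_length = 676                      # amino acids in the precursor
coding_from_protein = precursor_length * 3  # one codon (3 bases) per residue

listed_total = sum(segments.values())
stated_insert = 3349                        # insert length stated in the text

print("coding bases implied by 676 residues:", coding_from_protein)
print("sum of listed segments:", listed_total)
print("stated insert minus listed segments:", stated_insert - listed_total)
```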

Structure of a Human Protein S Precursor Protein. Translation of the coding region of clone pHHS-IIa (Fig. 2) reveals that nascent protein S is produced as a 676 amino acid precursor with the following composition: Phe22 Leu65 Ile36 Val45 Ala37 Gly44 Met10 Cys36 Pro31 Ser55 Thr28 Tyr23 Trp6 His11 Lys48 Arg24 Glu52 Gln25 Asp34 Asn44. The precursor consists of an amino-terminal 41 amino acid leader peptide followed by a 635 amino acid polypeptide chain that corresponds to the mature protein. The molecular weight and amino acid composition of purified plasma protein S reported earlier (1) are in agreement with those predicted from the structure shown in Fig. 2.

Comparison of the Amino-Terminal Region of Protein S with Other Vitamin K-Dependent Proteins. Alignment of the leader peptide and Gla domains of human protein S with other vitamin K-dependent plasma proteins (Fig. 3) shows that the amino-terminal region of protein S is highly homologous with those of other vitamin K-dependent proteins.
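As a consistency check on the composition listed above, the residue counts should add up to the 676 residues of the precursor (41 leader + 635 mature). The following minimal sketch does that sum. The counts are those given in the text; the proline count of 31 is an assumption here, chosen as the value consistent with the 676-residue total and with the deduced sequence as read from Fig. 2.

```python
# Amino acid composition of the precursor as listed in the text.
# Pro = 31 is taken as the value consistent with the 676-residue total.
composition = {
    "Phe": 22, "Leu": 65, "Ile": 36, "Val": 45, "Ala": 37,
    "Gly": 44, "Met": 10, "Cys": 36, "Pro": 31, "Ser": 55,
    "Thr": 28, "Tyr": 23, "Trp": 6, "His": 11, "Lys": 48,
    "Arg": 24, "Glu": 52, "Gln": 25, "Asp": 34, "Asn": 44,
}

leader_length = 41   # leader peptide residues
mature_length = 635  # mature protein residues

total_residues = sum(composition.values())
print("residues in composition:", total_residues)
print("leader + mature:", leader_length + mature_length)
```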

Protein S mRNA Size. Blot hybridization of electrophoretically fractionated poly(A)⁺ RNA (Fig. 4) suggests that there are multiple forms of human protein S mRNA. The major form is ≈4 kilobases long. Two minor forms (3.1 and 2.6 kilobases) are also present. Only one bovine protein S mRNA (≈2.8 kilobases) was observed.

DISCUSSION

Human protein S cDNA clones were isolated from two different libraries by two different screening techniques. Assuming that only one-sixth of the λgt11 clones are in the proper orientation and reading frame for antigen expression, that all of the high-density faint positives observed in the two libraries are indeed true positives, and that the number of positives reflects the relative abundance of mRNA, both screenings suggest that the abundance of protein S mRNA in liver is ≈0.01%.
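The abundance estimate follows directly from the screening counts given in RESULTS. The sketch below is one way to reproduce the arithmetic; the function name and its `expressible_fraction` parameter are illustrative, not from the paper. It uses 20 positives in 2 × 10⁶ λgt11 plaques (with the stated one-sixth correction) and 10 positives in 50,000 plasmid colonies. The two estimates, about 0.006% and 0.02%, bracket the ≈0.01% figure.

```python
def abundance_percent(positives, screened, expressible_fraction=1.0):
    """Estimate mRNA abundance (%) from library screening counts.

    expressible_fraction is the share of clones able to give a signal.
    For the antibody screen the text assumes one-sixth of the clones
    (proper orientation and reading frame); for the DNA-probe screen
    every clone can hybridize, so the default of 1.0 applies.
    """
    return 100.0 * positives / (screened * expressible_fraction)


# Antibody screen of the lambda-gt11 library: 20 positives in 2 million plaques.
lambda_estimate = abundance_percent(20, 2_000_000, expressible_fraction=1 / 6)

# cDNA-probe screen of the plasmid library: 10 positives in 50,000 colonies.
plasmid_estimate = abundance_percent(10, 50_000)

print(f"lambda-gt11 estimate: {lambda_estimate:.3f}%")
print(f"plasmid estimate:     {plasmid_estimate:.3f}%")
```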

  1  GAATTCCTCC AAATTGACTG ATTTACAGAC AAACTTTTGG TACACAACCA ATTTATAG   58

 59  GAT TAT TTT TAT CCA AAA TAC TTA GTT TGT CTT CGC   94
     Asp Tyr Phe Tyr Pro Lys Tyr Leu Val Cys Leu Arg   (residues 38-49)

 95  TCT TTT CAA ACT GGG TTA TTC ACT GCT GCA CGT CAG  130
     Ser Phe Gln Thr Gly Leu Phe Thr Ala Ala Arg Gln   (residues 50-61)

FIG. 1. DNA sequence of the 5' end of λgt11 clone λHS-6-3 human protein S cDNA insert. Nucleotide numbering is from the beginning of the EcoRI restriction endonuclease recognition site (GAATTC) at the end of the cDNA insert. Corresponding amino acid numbering (below the sequence) is for mature protein S. Arrow points to a possible intron splice site, described in the text.
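The reading of Fig. 1 can be checked by translating the sequence from base 59 onward with the standard genetic code. The sketch below is a minimal illustration, not the authors' software: `FIG1` is the sequence as transcribed from Fig. 1 (an assumption wherever the scan is unclear), and the codon table is built from the conventional TCAG ordering. It also extracts the dinucleotide immediately upstream of base 59, which the Discussion identifies as ApG.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Fig. 1 sequence as transcribed here (130 bases, starting at the EcoRI site).
FIG1 = (
    "GAATTCCTCCAAATTGACTGATTTACAGACAAACTTTTGGTACACAACCAAT"
    "TTATAGGATTAT"
    "TTTTATCCAAAATACTTAGTTTGTCTTCGCTCTTTTCAAACTGGGTTATTCACTGCTGCACGTCAG"
)


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


coding_start = 59  # 1-based position where protein S coding begins (Fig. 1)

# The two bases immediately upstream of the coding start (bases 57-58).
upstream_dinucleotide = FIG1[coding_start - 3:coding_start - 1]

# Peptide encoded from base 59 to the end of the displayed sequence.
peptide = translate(FIG1[coding_start - 1:])

print("length:", len(FIG1))
print("upstream dinucleotide:", upstream_dinucleotide)
print("peptide (residues 38-61):", peptide)
```

The translated peptide begins with Asp-Tyr-Phe-Tyr, matching residues 38 onward of the mature protein as shown beneath the sequence in the figure.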


[Fig. 2, sequence panel: nucleotides 1-3355 of the pHHS-IIa insert, with the deduced amino acid sequence in one-letter code beneath the coding region. The deduced precursor sequence is listed here, numbered from position +1 of the mature protein.]

Leader peptide (residues -41 to -1):

 -41  MRVLGGRCGA PLACLLLVLP VSEANLLSKQ QASQVLVRKR R               -1

Mature protein (residues 1 to 635):

   1  ANSLLEETKQ GNLERECIEE LCNKEEAREV FENDPETDYF YPKYLVCLRS   50
  51  FQTGLFTAAR QSTNAYPDLR SCVNAIPDQC SPLPCNEDGY MSCKDGKASF  100
 101  TCTCKPGWQG EKCEFDINEC KDPSNINGGC SQICDNTPGS YHCSCKNGFV  150
 151  MLSNKKDCKD VDECSLKPSI CGTAVCKNIP GDFECECPEG YRYNLKSKSC  200
 201  EDIDECSENM CAQLCVNYPG GYTCYCDGKK GFKLAQDQKS CEVVSVCLPL  250
 251  NLDTKYELLY LAEQFAGVVL YLKFRLPEIS RFSAEFDFRT YDSEGVILYA  300
 301  ESIDHSAWLL IALRGGKIEV QLKNEHTSKI TTGGDVINNG LWNMVSVEEL  350
 351  EHSISIKIAK EAVMDINKPG PLFKPENGLL ETKVYFAGFP RKVESELIKP  400
 401  INPRLDGCIR SWNLMKQGAS GIKEIIQEKQ NKHCLVTVEK GSYYPGSGIA  450
 451  QFHIDYNNVS SAEGWHVNVT LNIRPSTGTG VMLALVSGNN TVPFAVSLVD  500
 501  STSEKSQDIL LSVENTVIYR IQALSLCSDQ QSHLEFRVNR NNLELSTPLK  550
 551  IETISHEDLQ RQLAVLDKAM KAKVATYLGG LPDVPFSATP VNAFYNGCME  600
 601  VNINGVQLDL DEAISKHNDI RAHSCPSVWK KTKNS                   635

FIG. 2. DNA sequence of a human liver cDNA encoding the precursor of human protein S. The nucleotide sequence begins with the Pst I restriction endonuclease cloning site (CTGCAG) and ends with a non-reconstructed Pst I cloning site (CCGCAG). The polyadenylylation recognition sequence (AATAAA) 21 nucleotides upstream from the poly(A) segment is underlined. An alternative recognition site is dash-underlined. Corresponding amino acids are shown in one-letter code: F, Phe; L, Leu; I, Ile; V, Val; A, Ala; G, Gly; M, Met; C, Cys; P, Pro; S, Ser; T, Thr; Y, Tyr; W, Trp; H, His; K, Lys; R, Arg; E, Glu; Q, Gln; D, Asp; N, Asn. Amino acid numbering (below the sequence) is based on the amino-terminal residue of the mature protein (position +1). The termination codon (TAA) is shown with an asterisk. Solid triangles are shown below asparagine residues that are possible sites of N-linked glycosylation.


Leader peptides:

                 -40       -30       -20       -10       -1
Protein S         MRVLGGRCGAPLACLLLVLPVSEANLLSKQQASQVLVRKRR
Protein C        MWQLTSLLLFVATWGISGTPAPLDSVFSSSERAHQVLRIRKR
Factor IX    MQRVNMIMAESPGLITICLLGYLLSAECTVFLDHENANKILNRPKR
Factor X                    ..?SLAGLLLLGESLFIRREQANNILARVTR
Prothrombin         ..?QLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRR
Factor VII           MVSQALRLLCLLLGLQGCLAAVFVTQEEAHGVLHRRRR

Amino termini of the mature proteins:

             +1       10        20        30
Protein S    ANS-LLEETKQGNLERECIEELCNKEEAREVFENDPET
Protein C    ANS-FLEELRHSSLERECIEEICDFEEAKEIFQNVDDT
Factor IX    YNSGKLEEFVQGNLERECMEEKCSFEEAREVFENTERT
Factor X     ANS-FLEEMKKGHLERECMEETCSYEEAREVFEDSDKT
Prothrombin  ANT-FLEEVRKGNLERECVEETCSYEEAFEALESSTAT
Factor VII   ANA-FLEELRPGSLERECKEEQCSFEEAREIFKDAERT

FIG. 3. Comparison of amino acid sequences (one-letter abbreviations) for the amino termini of human vitamin K-dependent plasma protein precursors. Position +1 marks the amino-terminal residue of the mature proteins. Positions with identical residues are marked with an asterisk. Amino acid sequences are from references as follows: protein C (28), factor IX (30), factor X (31), prothrombin (32), and a 38 amino acid leader peptide form of factor VII (33). Dashes represent gaps introduced to maximize homology.
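The positions of identity among the six mature amino termini can be recomputed from the aligned sequences. The sketch below is illustrative: the sequences are transcribed from Fig. 3 (a transcription that is an assumption wherever the scan is ambiguous), and `identical_columns` is a hypothetical helper name. It reports the alignment columns where all six proteins carry the same residue and builds an asterisk marker line of the kind the caption describes.

```python
# Aligned amino termini of the mature proteins, transcribed from Fig. 3.
# "-" marks a gap introduced to maximize homology.
aligned = {
    "Protein S":   "ANS-LLEETKQGNLERECIEELCNKEEAREVFENDPET",
    "Protein C":   "ANS-FLEELRHSSLERECIEEICDFEEAKEIFQNVDDT",
    "Factor IX":   "YNSGKLEEFVQGNLERECMEEKCSFEEAREVFENTERT",
    "Factor X":    "ANS-FLEEMKKGHLERECMEETCSYEEAREVFEDSDKT",
    "Prothrombin": "ANT-FLEEVRKGNLERECVEETCSYEEAFEALESSTAT",
    "Factor VII":  "ANA-FLEELRPGSLERECKEEQCSFEEAREIFKDAERT",
}


def identical_columns(sequences):
    """Return 0-based alignment columns where every sequence has the same residue."""
    return [
        index
        for index, column in enumerate(zip(*sequences))
        if len(set(column)) == 1 and column[0] != "-"
    ]


columns = identical_columns(list(aligned.values()))
width = len(aligned["Protein S"])
marker_line = "".join("*" if i in columns else " " for i in range(width))

for name, sequence in aligned.items():
    print(f"{name:<12} {sequence}")
print(f"{'':<12} {marker_line}")
print("identical columns:", len(columns), "of", width)
```

Most of the invariant positions fall on Glu and Cys residues of the Gla domain, which fits the statement that the amino-terminal region is highly homologous across these proteins.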

One of the plasmid clones (pHHS-IIa) has an ≈3.5-kilobase insert containing the entire coding region for protein S as well as 5' and 3' noncoding regions and a poly(A) segment. This insert may represent a nearly full-length cDNA. Blot hybridization (Fig. 4) suggests that, in the human, multiple molecular forms of protein S mRNA exist, in contrast to only one bovine mRNA species, which is smaller than its human counterparts. The possibility that the minor forms represent messages for other structurally related proteins has not been excluded.

[Fig. 4: autoradiogram with lanes H and B; the marked RNA size standards include 9.5, 7.5, 4.4, 1.4, and 0.3 kilobases.]

FIG. 4. Blot hybridization of human (H) and bovine (B) liver mRNA. Poly(A)⁺ RNA was purified by oligo(dT)-cellulose column chromatography, and 5 µg was electrophoresed in an agarose gel and transferred to a nitrocellulose filter. A nick-translated human protein S cDNA EcoRI-HindIII fragment (nucleotides 559-2173, Fig. 2) was used to probe the filter. Shown adjacent to the autoradiogram are the migration positions and sizes (in kilobases) of RNA standards (Bethesda Research Laboratories).

One of the λgt11 clones contains at its 5' end an unrecognized 51-base segment preceding the protein S coding sequence. One possible explanation is that this segment represents the 3' end of a protein S intervening sequence (intron) that had not been properly spliced out of the mRNA precursor. Two observations are consistent with this hypothesis. Immediately upstream from the position where the sequence begins to code for protein S (base 59 in Fig. 1), an ApG dinucleotide exists, which has been observed at the 3' end of all natural introns (34). Also, as discussed below, this region corresponds to the proposed junction of the Gla and linker protein structural domains. According to the proposal of Gilbert (35) that introns often separate protein structural domains, this would be a probable site in the gene for an intron. Characterization of the human gene for protein S should clearly resolve this point.

The DNA sequence and base composition of clone pHHS-IIa have several interesting features. Excluding the oligo(dG) and oligo(dC) tails resulting from the cDNA synthesis procedure and the poly(A) segment, the DNA is considerably A+T-rich (30.4% A, 30.0% T, 21.5% G, 18.0% C). However, the 108-base 5' noncoding subregion is G+C-rich (78%). No other segment of a similar size within the cDNA has a high G+C content. The G+C-rich 5' noncoding region may play an important role in regulating gene expression either at the transcriptional or at the translational level. Another interesting feature of the cDNA is the presence of a guanine deoxynucleotide (position 3330, Fig. 2) within the poly(A) segment. An independent clone (pHHS-VIb) for protein S from the same library contains a poly(A) segment at the same position as for pHHS-IIa but is composed only of 76 adenine deoxynucleotides (data not shown). Several cDNAs for other proteins have been isolated from the library, and none have been found to contain any base other than adenine in the poly(A) region (data not shown). It is unclear whether the guanine base is the result of a cDNA-synthesis error or a polyadenylylation error. Both of these would be expected to be relatively low-frequency events. In this regard, no base difference was seen between the common region (2385 nucleotides) of λgt11 and plasmid clone sequences for human protein S.
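The base-composition figures above can be reproduced for any sequence with a few lines of code. The sketch below is a generic illustration rather than the authors' method: `base_composition` is a hypothetical helper, and the 20-base `demo_sequence` is just the first 20 bases of Fig. 1, used only to exercise the function. The reported percentages for the cDNA are then combined to show the A+T content of about 60%.

```python
from collections import Counter


def base_composition(sequence):
    """Return the percentage of A, C, G and T in a DNA string."""
    sequence = sequence.upper()
    counts = Counter(sequence)
    return {base: 100.0 * counts[base] / len(sequence) for base in "ACGT"}


# Percentages reported in the text for the cDNA (tails and poly(A) excluded).
reported = {"A": 30.4, "T": 30.0, "G": 21.5, "C": 18.0}
at_content = reported["A"] + reported["T"]
gc_content = reported["G"] + reported["C"]

# Arbitrary short example: the first 20 bases of the Fig. 1 sequence.
demo_sequence = "GAATTCCTCCAAATTGACTG"
demo = base_composition(demo_sequence)

print(f"reported A+T: {at_content:.1f}%  reported G+C: {gc_content:.1f}%")
print("demo composition:", demo)
```

The reported values sum to 99.9% rather than 100% because each percentage is rounded to one decimal place.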

Translation of the coding portion of clone pHHS-IIa into amino acid sequence results in a structure consisting of a 41 amino acid leader peptide followed by a 635 amino acid peptide corresponding to the mature protein. Residues 1-13 agree with the amino terminus of plasma protein S reported by DiScipio and Davie (1). Fig. 3 clearly shows that protein S is homologous with other vitamin K-dependent plasma proteins not only in the Gla domain (residues 1-37) but also in the propeptide region. The plasma vitamin K-dependent protein leader peptides are thought to consist of a classical "signal" peptide necessary for membrane translocation (36), followed by a propeptide (28). From considerations of eukaryotic signal-peptide sequences and consensus recognition sites for signal peptidase (37), we tentatively assign the putative signal peptidase cleavage site to the carboxyl-terminal side of residue -18. This cleavage would result in a 24 amino acid signal peptide and a 17 amino acid propeptide. The sequence homology and high frequency of basic amino acids in the propeptide region of these proteins suggest that it may play an important role, prior to its removal, in nascent protein folding or posttranslational modification, such as γ-carboxylation or β-hydroxylation. Consistent with this hypothesis is the fact that the propeptide region of rat bone osteocalcin (bone Gla protein) is highly homologous with the plasma vitamin K-dependent proteins (38), even though mature osteocalcin (containing Gla residues) is not homologous to the vitamin K-dependent plasma proteins. It is presumed that the 11 Glu residues in the Gla domain of human protein S precursor all exist as Gla residues in the mature protein, in good agreement with the chemical-analysis value of 10.3 (1).

The Gla domain is followed by a "linker" region (residues 38-73) that is rich in hydroxy amino acids (three Thr and three Ser). The function of this region is unknown, but it may serve as a site of O-linked glycosylation and activation via limited proteolysis as reported for coagulation factor XII (39). The cleavage of protein S by thrombin to generate a Mr ≈6000 amino-terminal peptide (4) probably occurs within this region containing three Arg residues.

Following the linker segment are four domains (residues 74-121, 122-165, 166-207, 208-248) homologous with epidermal growth factor (EGF). EGF domains have also been found in other proteins (40-42). By homology with protein C (43), Asp-95, -135, and -182 within EGF domains 1-3 are tentatively assigned as the three sites of β-hydroxyaspartic acid (3) in the mature protein. The fourth EGF domain does not contain an aspartate residue in the corresponding location. The role of the EGF domains in protein S or in other plasma proteins is unknown.

The remainder of human protein S (amino acids 249-635) is unidentified as to function or similarity to other proteins. A search of the Protein Identification Resource protein sequence data base of February 1986 (National Biomedical Research Foundation, Washington, DC; formerly called the "Dayhoff" data base) with the computer program FASTP (44) revealed no extensive regions of homology. Protein S has a 116 amino acid overlap (amino acids 273-388) containing 17.2% identical residues (requiring one amino acid deletion in protein S) with an internal portion of the β subunit of DNA-directed RNA polymerase (45). There are only five cysteine residues within the 387 amino acid unknown region, in sharp contrast to the amino-terminal Gla and EGF domains. Near the center of the unknown region are three clustered potential N-linked glycosylation sites (Asn-Xaa-Ser/Thr), at residues 458, 468, and 489. These may serve as the sites of reported carbohydrate attachment (1).
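The domain boundaries and the Asn-Xaa-Ser/Thr motif described above lend themselves to a quick consistency check. The sketch below, in Python, encodes the residue ranges quoted in the text, confirms that they tile the 635-residue mature protein, and shows one way to scan a sequence for the glycosylation motif. The helper names (`check_contiguous`, `domain_of`, `find_sequons`) and the toy peptide are hypothetical and serve only as illustration; the protein S sequence itself is not reproduced here. The motif is matched as written in the text, although many analyses additionally exclude proline at the Xaa position.

```python
import re

# Domain boundaries of mature protein S as quoted in the text
# (1-based, inclusive residue numbers).
DOMAINS = [
    ("Gla", 1, 37),
    ("linker", 38, 73),
    ("EGF-1", 74, 121),
    ("EGF-2", 122, 165),
    ("EGF-3", 166, 207),
    ("EGF-4", 208, 248),
    ("C-terminal", 249, 635),
]


def check_contiguous(domains, length):
    """Return True if the domains tile residues 1..length without gaps or overlaps."""
    expected_start = 1
    for _name, start, end in domains:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == length + 1


def domain_of(position, domains=DOMAINS):
    """Return the name of the domain containing a 1-based residue position."""
    for name, start, end in domains:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} lies outside the mature protein")


def find_sequons(protein):
    """Return 1-based positions of Asn in Asn-Xaa-Ser/Thr motifs.

    Any residue is accepted at Xaa, following the motif as written in the
    text. A lookahead is used so that overlapping motifs are all reported.
    """
    return [m.start() + 1 for m in re.finditer(r"N(?=.[ST])", protein)]


# The quoted ranges should account for all 635 residues.
print(check_contiguous(DOMAINS, 635))

# The three potential N-linked sites named in the text.
print([domain_of(p) for p in (458, 468, 489)])

# Hypothetical toy peptide in one-letter code (not the protein S sequence).
print(find_sequons("MANGTKNASNNTS"))
```

Run as written, the check confirms that the seven ranges cover residues 1-635 and that positions 458, 468, and 489 all fall in the carboxyl-terminal region of 387 residues.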

Note. After communication of this work, a report on the partial cDNA sequence of human protein S appeared (21).

We thank Phil Comp for goat polyclonal antibodies to human protein S and Sherry Pike for excellent secretarial assistance.

1. DiScipio, R. G. & Davie, E. W. (1979) Biochemistry 18, 899-904.
2. Suzuki, K., Nishioka, J. & Hashimoto, S. (1983) J. Biochem. 94, 699-705.
3. Fernlund, P. & Stenflo, J. (1983) J. Biol. Chem. 258, 12509-12512.
4. Dahlback, B. (1983) Biochem. J. 209, 837-846.
5. Nelsestuen, G. L., Kisiel, W. & DiScipio, R. G. (1978) Biochemistry 17, 2134-2138.
6. Dahlback, B. (1983) Biochem. J. 209, 847-856.
7. Dahlback, B. & Hildebrand, B. (1983) Biochem. J. 209, 857-863.
8. Bertina, R. M., Wijngaarden, A. V., Reinalda-Root, J., Poort, S. R. & Bom, V. J. J. (1985) Thromb. Haemostasis 53, 268-272.
9. Dahlback, B. (1984) Semin. Thromb. Hemostasis 10, 139-148.
10. Walker, F. J. (1980) J. Biol. Chem. 255, 5521-5524.
11. Walker, F. J. (1981) J. Biol. Chem. 256, 11128-11131.
12. Stenflo, J. (1984) Semin. Thromb. Hemostasis 10, 109-121.
13. Walker, F. J. (1981) Thromb. Res. 22, 321-327.
14. Schwarz, H. P., Fisher, M., Hopmeier, P., Batard, M. A. & Griffin, J. H. (1984) Blood 64, 1297-1300.
15. Comp, P. C., Nixon, R. R., Cooper, M. R. & Esmon, C. T. (1984) J. Clin. Invest. 74, 2082-2088.
16. Comp, P. C. & Esmon, C. T. (1984) New Engl. J. Med. 311, 1525-1528.
17. Broekmans, A. W., Bertina, R. M., Reinalda-Root, J., Engesser, L., Muller, H. P., Leeuw, J. A., Michiels, J. J., Brommer, E. J. P. & Briet, E. (1985) Thromb. Haemostasis 53, 273-277.
18. Bertina, R. M. (1985) Haemostasis 15, 241-246.
19. Griffin, J. H. (1984) Semin. Thromb. Hemostasis 10, 162-166.
20. Comp, P. C., Vigano, S., D'Angelo, A., Thurneau, G., Kaufman, C. & Esmon, C. T. (1985) Blood 66, 348 (abstr.).
21. Lundwall, A., Dackowski, W., Cohen, E., Shaffer, M., Mahr, A., Dahlback, B., Stenflo, J. & Wydro, R. (1986) Proc. Natl. Acad. Sci. USA 83, 6716-6720.
22. Young, R. A. & Davis, R. W. (1983) Proc. Natl. Acad. Sci. USA 80, 1194-1198.
23. Davis, R. W., Botstein, D. & Roth, R. R. (1980) Advanced Bacterial Genetics (Cold Spring Harbor Laboratory, Cold Spring Harbor, NY).
24. Dagert, M. & Ehrlich, S. D. (1979) Gene 6, 23-28.
25. Birnboim, H. C. & Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523.
26. Katz, L., Williams, P. H., Sato, S., Leavill, R. W. & Helinski, D. R. (1973) J. Bacteriol. 114, 577-591.
27. Maxam, A. M. & Gilbert, W. (1980) Methods Enzymol. 65, 499-560.
28. Beckmann, R. J., Schmidt, R. J., Santerre, R. F., Plutzky, J., Crabtree, G. R. & Long, G. L. (1985) Nucleic Acids Res. 13, 5233-5247.
29. Lai, E. C., Riser, M. E. & O'Malley, B. W. (1983) J. Biol. Chem. 258, 12693-12701.
30. Anson, D. S., Choo, K. H., Rees, D. J. G., Giannelli, F., Gould, F., Huddleston, J. A. & Brownlee, G. G. (1984) EMBO J. 3, 1053-1060.
31. Fung, M. R., Hay, C. W. & MacGillivray, R. T. A. (1985) Proc. Natl. Acad. Sci. USA 82, 3591-3595.
32. Degen, S. J. F., MacGillivray, R. T. A. & Davie, E. W. (1983) Biochemistry 22, 2087-2097.
33. Hagen, F. S., Gray, C. L., O'Hara, P., Grant, F. J., Saari, G. C., Woodbury, R. G., Hart, C. E., Insley, M., Kisiel, W., Kurachi, K. & Davie, E. W. (1986) Proc. Natl. Acad. Sci. USA 83, 2412-2416.
34. Breathnach, R., Benoist, O., O'Hare, K., Gannon, F. & Chambon, P. (1978) Proc. Natl. Acad. Sci. USA 75, 4853-4857.
35. Gilbert, W. (1978) Nature (London) 271, 501.
36. Jackson, R. C. & Blobel, G. (1980) Ann. N.Y. Acad. Sci. 343, 391-403.
37. Perlman, D. & Halvorson, H. O. (1983) J. Mol. Biol. 167, 391-409.
38. Pan, L. C. & Price, P. A. (1985) Proc. Natl. Acad. Sci. USA 82, 6109-6113.
39. McMullen, B. A. & Fujikawa, K. (1985) J. Biol. Chem. 260, 5328-5341.
40. Ebina, Y., Ellis, L., Jarnagin, K., Edery, M., Graf, L., Clauser, E., Ou, J., Masiarz, F., Kan, Y. W., Goldfine, I. D., Roth, R. A. & Rutter, W. J. (1985) Cell 40, 747-758.
41. Ullrich, A., Bell, J. R., Chen, E. Y., Herrera, R., Petruzzelli, L. M., Dull, T. J., Gray, A., Coussens, L., Liao, Y.-C., Tsubokawa, M., Mason, A., Seeburg, P. H., Grunfeld, C., Rosen, O. M. & Ramachandran, J. (1985) Nature (London) 313, 756-761.
42. Banyai, L., Varadi, A. & Patthy, L. (1983) FEBS Lett. 163, 37-41.
43. Drakenberg, T., Fernlund, P., Roepstorff, P. & Stenflo, J. (1983) Proc. Natl. Acad. Sci. USA 80, 1802-1806.
44. Lipman, D. J. & Pearson, W. R. (1985) Science 227, 1435-1441.
45. Ovchinnikov, Y. A., Monastyrskaya, G. S., Gubanov, V. V., Guryev, S. O., Chertov, O. Y., Modyanov, N. N., Grinkevich, V. A., Makarova, I. A., Marchenko, T. V., Polovnikova, I. N., Lipkin, V. M. & Sverdlov, E. D. (1981) Eur. J. Biochem. 116, 621-629.
