12
Molecular cloning of a multidomain cysteine protease and protease inhibitor precursor gene from the tobacco hornworm (Manduca sexta) and functional expression of the cathepsin F-like cysteine protease domain Takayuki Miyaji a , Satoshi Murayama a , Yoshiaki Kouzuma a , Nobutada Kimura b , Michael R. Kanost c , Karl J. Kramer d , Masami Yonekura a, * a Laboratory of Food Molecular Functionality, College of Agriculture, Ibaraki University, 3-21-1, Chuo, Ami-machi, Inashiki-gun, Ibaraki 300-0393, Japan b Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki 305-8566, Japan c Department of Biochemistry, Kansas State University, Manhattan, KS 66506, USA d Center for Grain and Animal Health Research, Agricultural Research Service, United States Department of Agriculture,1515 College Avenue, Manhattan, KS 66502, USA article info Article history: Received 3 June 2010 Received in revised form 10 August 2010 Accepted 10 August 2010 Keywords: Cysteine protease inhibitor Cystatin Cathepsin F-like protease Manduca sexta Multifunctional domains abstract A Manduca sexta (tobacco hornworm) cysteine protease inhibitor, MsCPI, puried from larval hemo- lymph has an apparent molecular mass of 11.5 kDa, whereas the size of the mRNA is very large (w9 kilobases). MsCPI cDNA consists of a 9,273 nucleotides that encode a polypeptide of 2,676 amino acids, which includes nine tandemly repeated MsCPI domains, four cystatin-like domains and one procathepsin F-like domain. The procathepsin F-like domain protein was expressed in Escherichia coli and processed to its active mature form by incubation with pepsin. The mature enzyme hydrolyzed Z-LeueArgeMCA, Z- PheeArgeMCA and BoceValeLeueLyseMCA rapidly, whereas hydrolysis of SuceLeueTyreMCA and Z- ArgeArgeMCA was very slow. The protease was strongly inhibited by MsCPI, egg-white cystatin and sunower cystatin with K i values in the nanomolar range. When the MsCPI tandem protein linked to two MsCPI domains was treated with proteases, it was degraded by the cathepsin F-like protease. However, tryptic digestion converted the MsCPI tandem protein to an active inhibitory form. These data support the hypothesis that the mature MsCPI protein is produced from the MsCPI precursor protein by trypsin- like proteases. The resulting mature MsCPI protein probably plays a role in the regulation of the activity of endogenous cysteine proteases. Ó 2010 Elsevier Ltd. All rights reserved. 1. Introduction Proteinaceous cysteine protease inhibitors (CPIs) have been found in various animal and plant tissues, and many of them have been isolated and characterized in terms of their protein structure and inhibitory activity. According to the MEROPS listing (Rawlings et al., 2004), more than 10 families of CPIs have been classied based on the amino acid sequences of inhibitory domains. Cystatins (family I25) are the best characterized CPI group of mammalian and plant origins. On the basis of sequence homology, the cystatin superfamily is divided into three subfamilies, I25A (cystatin A and sarcocystatin), I25B (cystatin C and phytocystatin) and I25C (kini- nogens). Cystatins are believed to be involved in the regulation of endogenous cysteine protease activity and/or in defensive mechanisms against invading organisms such as bacteria, viruses and parasites (Dubin, 2005). Cystatins from animal origin target lysosomal cysteine proteases, generally known as cathepsins. The cysteine proteases are involved both in the physiological protein breakdown (cathepsins B, C, F, H, L, O and X) and specic functions correlated with their tissue distribution (cathepsins K, S and V). The cysteine proteases are optimally active in the slightly acidic, reducing milieu found in lysosomes. They comprise a group of papain-related enzymes, sharing similar amino acid sequences and folds (Turk et al., 2001). Generally, cystatins have three conserved sequence motifs including a Gly in the vicinity of the N-terminal region, QeXeVeXeG in the rst hairpin loop, and PeW in the second hairpin loop. Struc- tural analysis revealed that the amino-terminal segment and two loops of cystatins form a tripartite wedge-shaped edge structure that interacts with the active site cleft of the cognate cysteine proteases, papain and cathepsin H (Rzychon et al., 2004; Jenko et al., 2003). Among the three regions, the QeXeVeXeG motif is critical for * Corresponding author. Tel./fax: þ81 29 888 8683. E-mail address: [email protected] (M. Yonekura). Contents lists available at ScienceDirect Insect Biochemistry and Molecular Biology journal homepage: www.elsevier.com/locate/ibmb 0965-1748/$ e see front matter Ó 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.ibmb.2010.08.003 Insect Biochemistry and Molecular Biology 40 (2010) 835e846

Insect Biochemistry and Molecular Biology...Molecular cloning of a multidomain cysteine protease and protease inhibitor precursor gene from the tobacco hornworm (Manduca sexta) and

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Insect Biochemistry and Molecular Biology...Molecular cloning of a multidomain cysteine protease and protease inhibitor precursor gene from the tobacco hornworm (Manduca sexta) and

lable at ScienceDirect

Insect Biochemistry and Molecular Biology 40 (2010) 835e846

Contents lists avai

Insect Biochemistry and Molecular Biology

journal homepage: www.elsevier .com/locate/ ibmb

Molecular cloning of a multidomain cysteine protease and protease inhibitorprecursor gene from the tobacco hornworm (Manduca sexta) and functionalexpression of the cathepsin F-like cysteine protease domain

Takayuki Miyaji a, Satoshi Murayama a, Yoshiaki Kouzuma a, Nobutada Kimura b, Michael R. Kanost c,Karl J. Kramer d, Masami Yonekura a,*

a Laboratory of Food Molecular Functionality, College of Agriculture, Ibaraki University, 3-21-1, Chuo, Ami-machi, Inashiki-gun, Ibaraki 300-0393, JapanbBioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki 305-8566, JapancDepartment of Biochemistry, Kansas State University, Manhattan, KS 66506, USAdCenter for Grain and Animal Health Research, Agricultural Research Service, United States Department of Agriculture, 1515 College Avenue, Manhattan, KS 66502, USA

a r t i c l e i n f o

Article history:Received 3 June 2010Received in revised form10 August 2010Accepted 10 August 2010

Keywords:Cysteine protease inhibitorCystatinCathepsin F-like proteaseManduca sextaMultifunctional domains

* Corresponding author. Tel./fax: þ81 29 888 8683.E-mail address: [email protected] (M. Yo

0965-1748/$ e see front matter � 2010 Elsevier Ltd.doi:10.1016/j.ibmb.2010.08.003

a b s t r a c t

A Manduca sexta (tobacco hornworm) cysteine protease inhibitor, MsCPI, purified from larval hemo-lymph has an apparent molecular mass of 11.5 kDa, whereas the size of the mRNA is very large (w9kilobases). MsCPI cDNA consists of a 9,273 nucleotides that encode a polypeptide of 2,676 amino acids,which includes nine tandemly repeated MsCPI domains, four cystatin-like domains and one procathepsinF-like domain. The procathepsin F-like domain protein was expressed in Escherichia coli and processed toits active mature form by incubation with pepsin. The mature enzyme hydrolyzed Z-LeueArgeMCA, Z-PheeArgeMCA and BoceValeLeueLyseMCA rapidly, whereas hydrolysis of SuceLeueTyreMCA and Z-ArgeArgeMCA was very slow. The protease was strongly inhibited by MsCPI, egg-white cystatin andsunflower cystatin with Ki values in the nanomolar range. When the MsCPI tandem protein linked to twoMsCPI domains was treated with proteases, it was degraded by the cathepsin F-like protease. However,tryptic digestion converted the MsCPI tandem protein to an active inhibitory form. These data supportthe hypothesis that the mature MsCPI protein is produced from the MsCPI precursor protein by trypsin-like proteases. The resulting mature MsCPI protein probably plays a role in the regulation of the activityof endogenous cysteine proteases.

� 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Proteinaceous cysteine protease inhibitors (CPIs) have beenfound in various animal and plant tissues, and many of them havebeen isolated and characterized in terms of their protein structureand inhibitory activity. According to the MEROPS listing (Rawlingset al., 2004), more than 10 families of CPIs have been classifiedbased on the amino acid sequences of inhibitory domains. Cystatins(family I25) are the best characterized CPI group of mammalian andplant origins. On the basis of sequence homology, the cystatinsuperfamily is divided into three subfamilies, I25A (cystatin A andsarcocystatin), I25B (cystatin C and phytocystatin) and I25C (kini-nogens). Cystatins are believed to be involved in the regulation ofendogenous cysteine protease activity and/or in defensive

nekura).

All rights reserved.

mechanisms against invading organisms such as bacteria, virusesand parasites (Dubin, 2005). Cystatins from animal origin targetlysosomal cysteine proteases, generally known as cathepsins. Thecysteine proteases are involved both in the physiological proteinbreakdown (cathepsins B, C, F, H, L, O and X) and specific functionscorrelatedwith their tissue distribution (cathepsins K, S and V). Thecysteine proteases are optimally active in the slightly acidic,reducing milieu found in lysosomes. They comprise a group ofpapain-related enzymes, sharing similar amino acid sequences andfolds (Turk et al., 2001).

Generally, cystatins have three conserved sequence motifsincluding a Gly in the vicinity of the N-terminal region, QeXeVeXeGin the first hairpin loop, and PeW in the second hairpin loop. Struc-tural analysis revealed that the amino-terminal segment and twoloops of cystatins form a tripartite wedge-shaped edge structure thatinteracts with the active site cleft of the cognate cysteine proteases,papain and cathepsin H (Rzychon et al., 2004; Jenko et al., 2003).Among the three regions, the QeXeVeXeG motif is critical for

Page 2: Insect Biochemistry and Molecular Biology...Molecular cloning of a multidomain cysteine protease and protease inhibitor precursor gene from the tobacco hornworm (Manduca sexta) and

T. Miyaji et al. / Insect Biochemistry and Molecular Biology 40 (2010) 835e846836

inhibition of cysteine proteases as revealed from the mutationalanalysis of egg-white cystatin (Auerswald et al., 1992).

In insects, cystatins include sarcocystatin from Sarcophaga per-egrina and a cystatin-like protein from Drosophila melanogaster(Suzuki and Natori, 1985; Delbridge and Kelly, 1990). Saito et al.(1989) suggested that sarcocystatin A from S. peregrina isinvolved in morphogenesis of larval and adult structures of Sar-cophaga. Besides cystatins, two other types of CPIs, the propeptideregion of a cysteine protease homologous protein from Bombyxmori (family I29) and an inhibitor of apoptosis protein from D.melanogaster (family I32) also are found in insects (Yamamoto et al.,1999; Hay, 2000). However, relatively little information is knownabout CPIs from insects compared with mammals and plants. Toobtain a better understanding about structures and functions ofnovel CPIs from insects, we investigated a CPI from the tobaccohornworm (Manduca sexta).

Previously, a CPI from M. sexta, MsCPI, was purified from larvalhemolymph of the tobacco hornworm and the partial nucleotidesequence was determined (Miyaji et al., 2007). Although the MsCPIprotein purified from the hemolymph had a small molecular massof 11,400e11,600 Da, northern blot analysis indicated that themessage was extremely large (w9 kb). Molecular cloning of theMsCPI cDNA revealed that the encoded protein belonged to thecystatin family and was composed of at least six tandemly repeatedMsCPI segments.

To determine the complete structure of the MsCPI gene, weperformed genomic and cDNA cloning of the full-length gene, andfunctionally expressed the procathepsin F-like domain that is a partof the MsCPI precursor. In addition, we determined the conceptualamino acid sequence of MsCPI precursor protein and the interac-tion that occurs between MsCPI and the cathepsin F-like protease.

2. Materials and methods

2.1. Screening of MsCPI cDNA from M. sexta fat body cDNA library

An M. sexta fat body cDNA library that was previously con-structed (Jiang et al., 2003) was screened by plaque hybridizationusing a partial MsCPI cDNA (324 bp) labeled with digoxigenin asa probe (Miyaji et al., 2007). Six positive clones were isolated andphage DNAs were prepared from them. The DNA with the longestinsert was digested with EcoRI and XhoI, and the DNA fragment wassubcloned into the plasmid vector pUC18 or pBluescript SK�. Thenucleotide sequence was determined by the dideoxy chain termi-nation method using the BigDye Terminator v3.1 cycle sequencingReady Reaction Kit and automated DNA sequencer ABI prism3130xl (Applied Biosystems Inc.).

2.2. Isolation of genomic DNA from an M. sexta larva

Genomic DNA was prepared from a fifth instar M. sexta larvausing the method of Blin and Stafford (1976), as follows. The larvawas frozen in liquid nitrogen and ground to a powder. The powderwas then dissolved in extraction buffer (20 mg/ml RNase A, 0.5%SDS, 100 mM EDTA and 10 mM TriseHCl, pH 8.0). After incubationat 37 �C for 1 h, the solution was gently mixed with proteinase K ata final concentration of 100 mg/ml at 50 �C for 3 h, and then incu-bated for 10 min with an equal volume of phenol equilibrated withTE (1 mM EDTA, 10 mM Tris-HCl, pH 8.0). After three extractionswith phenol, the supernatant was mixed thoroughly with0.1 volume of 3 M sodium acetate, pH 4.6 and 2 volumes of ethanol.The precipitated DNAwas recovered and washed using 70% ethanoltwice. The purity and concentration of the genomic DNA thusobtained were confirmed by agarose gel electrophoresis and bymeasuring the absorbance at 260 nm.

2.3. M. sexta genomic library construction and screening

An M. sexta genomic library was constructed using the Copy-Control� Fosmid Library Production Kit (EPICENTRE) according tothe manufacturer’s instructions. M. sexta genomic DNA was incu-bated with the End-Repair Enzyme Mix at room temperature for45 min. Approximately 40 kb end-repaired DNAs were resolved by1% low melting point agarose gel electrophoresis. The 40 kb frag-mentswere recovered from the gel using GELase (45 �C for 1 h). Thesize-fractionated DNA was ligated with the CopyControl pCC1FOSvector at room temperature for 2 h and then the mixture wasincubated with theMaxPlax Lambda Packaging Extracts at 37 �C for180 min. The packaged CopyControl fosmid clones were incubatedwith Escherichia coli EPI300 cells at 37 �C for 20 min after which theinfected EPI300 cells were spread on LB plate containing 12.5 mg/mlchloramphenicol and incubated at 37 �C overnight. The M. sextafosmid library thus constructed was screened by colony hybrid-ization using a 324-bp MsCPI gene fragment labeled with digox-igenin as a probe. After the second and third screening, one positiveclone was isolated and fosmid DNA was prepared from it. Toconfirm the insert position of MsCPI precursor gene, the fosmidDNA was digested with various restriction enzymes and Southernhybridization was performed. The DNA fragments obtained byEcoRI, HindIII or PstI digestions were subcloned into the plasmidvector pUC18 and the nucleotide sequences were determined.

2.4. Southern hybridization

The fosmid DNA from the positive clone was digested with BglII,EcoRI, HindIII, PstI, SacI, SalI, SphI, XbaI, or XhoI. The DNA fragmentswere resolved by agarose gel electrophoresis and transferred toa Hybond-Nþ nylon membrane (GE Healthcare), after which themembrane was baked at 80 �C for 2 h. A 324-bp MsCPI gene frag-ment was labeled with alkaline phosphatase (AlkPhos DirectLabeling and Detection System with CDP-Star, GE Healthcare) foruse as a probe. Hybridizationwas performed in hybridization buffer(4% blocking reagent/0.5 M NaCl) at 55 �C overnight. After washingfor 10 min twice at 55 �C with 50 mM Na phosphate buffer, pH 7.0containing 0.2% blocking reagent, 0.1% SDS, 1 mM MgCl2, 150 mMNaCl and 2 M urea followed by 5 min twice at room temperaturewith 50 mM TriseHCl buffer, pH 10 containing 2 mM MgCl2 and100 mM NaCl, the membrane was incubated with CDP-Star detec-tion reagent for 5 min and visualized using an LAS 3000 MiniImaging System (Fuji Film).

2.5. 50-RACE and RT-PCR of MsCPI gene

Isolation of total RNA from fat body ofM. sextawas performed aspreviously reported (Miyaji et al., 2007). 50-RACE and RT-PCR wereconducted using the GeneRacer� Kit (Invitrogen) according to themanufacturer’s instructions. Briefly, full-lengthmRNAwith a 50-capstructure was decapped by tobacco acid pyrophosphatase and thenligated with RNA Oligo. First-strand cDNA was synthesized bySuperScript� III Reverse Transcriptase using the mRNA and MsCPIspecific primers. PCR was performed using the first-strand cDNA asa template and gene-specific primers, and the PCR products werepurified by using the Wizard� SV Gel and PCR Clean-Up System(Promega). The purified PCR products were cloned into the pGEMT-Easy vector (Promega) and then sequenced.

2.6. Expression of procathepsin F-like protein in E. coli

MsCPI cDNA prepared from a positive clone from M. sexta fatbody cDNA library was digestedwith BamHI and the DNA fragmentincluding the region encoding the procathepsin F-like domain

Page 3: Insect Biochemistry and Molecular Biology...Molecular cloning of a multidomain cysteine protease and protease inhibitor precursor gene from the tobacco hornworm (Manduca sexta) and

T. Miyaji et al. / Insect Biochemistry and Molecular Biology 40 (2010) 835e846 837

(Msprocat F) was ligated into pUC18 (pUC-MsCPI). Msprocat Fgene-specific primers (F1, 50-CCAAGGAGGCACTGTGTGGCGC-30; F2,50-CATATGGCCGAGGTGTATCACCAT TTG-30; B1, 50-AAGAATACATAATTCACTC TGG-30 and B2, 50-GAATTCTCATACGACAG CACTGCTCGCC-30) were constructed and PCR was performed using a primer set ofF1 and B1 with pUC-MsCPI as the template. A second PCR wasperformed using a primer set of F2 and B2 with the first PCRmixture as the template. The amplifiedMsprocat F genewas ligatedinto pGEM T-Easy vector. The Nde I site in the Msprocat F gene wasmutated without amino acid substitution by the mutation primers,F3, 50-GTACAAAACACATGGGTATGAAAGCTAGTC-30 and B3, 50-GACTAGCTTTCATACCCATGTGTTT TGTAC-30, using the QuikChangeSite-Directed Mutagenesis Kit (Stratagene). After confirmation ofthe nucleotide sequence, the DNA fragment encoding the MsprocatF protein was excised by digestion with Nde I and Sal I, and thenligated into the expression vector pET-22b previously digested withthe same enzymes. The resulting plasmid, pET-Msprocat F, wasintroduced into the E. coli BL21 CodonPlus(DE3) RIL strain and the E.coli was incubated in a Circle Grow medium (Qbiogene). Afterinduction with 1 mM IPTG, the culture was incubated overnight at37 �C, after which the cells were harvested. The cells were lysed bysonication in 30 mM TriseHCl buffer, pH 7.5 containing 30 mMNaCl. The lysate was centrifuged at 8000� g for 10 min and thepellet (inclusion bodies) was collected and washed with 1 Msucrose, followed by 1% Triton X-100/10 mM EDTA.

2.7. Refolding of Msprocat F protein

Refolding of Msprocat F protein was performed using themethod of Brömme et al. (2004). The inclusion bodies were dis-solved in 50 mM TriseHCl buffer, pH 8.0 containing 5 mM EDTA,5 mM cysteine and 8 M urea to the protein concentration of 1 mg/ml. The protein solution was diluted using 50 mM TriseHCl buffer,pH 8.0 containing 5 mM cysteine to a final concentration of 20 mg/ml and stirred at 4 �C overnight. After refolding, the Msprocat Fsolution was dialyzed against 50 mM TriseHCl buffer, pH 8.0 andconcentrated by ultrafiltration using a YM-10 filter (Millipore).

Protein concentration was determined using the BCA proteinassay kit (Thermo Fisher Scientific) with bovine serum albumin asthe standard protein. SDS-PAGE was carried out using a 15% poly-acrylamide gel as described by Laemmli (1970).

2.8. Activation of Msprocat F and purification

Activation of Msprocat F was performed using pepsin, whichwas successfully used previously (Brömme et al., 2004; Wang et al.,1998; Fonovic et al., 2004). The refolded Msprocat F solution wasdialyzed against 20 mM Na-acetate buffer, pH 4.5 containing 1 mMEDTA. After dialysis, the Msprocat F solution was supplementedwith DTT (final concentration 5 mM) and porcine pepsin (Sigma) ata molar ratio of 1/10. The solution was incubated at 37 �C for 5 h,and activation was monitored by measuring peptidase activityusing Z-LeueArgeMCA as the substrate and western blotting usinga polyclonal antibody elicited against Msprocat F.

After activation, the solution was applied to a SP-sepharose FFcolumn (GE Healthcare) equilibrated with 20 mM Na-acetatebuffer, pH 5.5 containing 1 mM DTT and 1 mM EDTA, and theproteins were eluted with a linear gradient of increasing NaClconcentration from 0 to 0.5 M.

2.9. Western blotting

Western blotting was carried out with a murine anti-Msprocat Fserum obtained according to the method of Young and Davis(1983), and a horseradish peroxidase-conjugated rabbit IgG

fraction generated against mouse IgG (Funakoshi). The protein wasvisualized by addition of the peroxidase substrate solution, 0.5 mg/ml 3, 30-diaminobenzidine tetrahydrochloride (Dojindo), 0.15%H2O2, 5 mM imidazole and 100 mM TriseHCl, pH 7.5.

2.10. Assay of Mscat F activity

Peptidase (cathepsin F-like protease, Mscat F) activity wasmeasured using the method of Barrett and Kirschke (1981). Fluoro-genic substrates, t-Butyloxycarbonyl-L-Leucyl-L-Lysyl-L-Arginine 4-Methyl-Coumaryl-7-Amide (BoceLeueLyseArgeMCA), Benzylox-ycarbonyl-L-Arginyl-L-Arginine 4-Methyl-Coumaryl-7-Amide (Z-ArgeArgeMCA), Benzyloxycarbonyl-L-Leucyl-L-Arginine 4-Methyl-Coumaryl-7-Amide (Z-LeueArgeMCA), Benzyloxycarbonyl-L-Phenyl-alanyl-L-Arginine 4-Methyl-Coumaryl-7-Amide (Z-PheeArgeMCA;Peptide Institute, Inc) or N-Succinyl-L-Leucyl-L-Tyrosine 4-methyl-Coumaryl-7-Amide (SuceLeueTyreMCA, Sigma), were dissolved inDMSO to the concentration of 10 mM and diluted 1000-fold in100mM Na-acetate buffer, pH 5.5 containing 1 mM DTT and 1 mMEDTA.Onemilliliter of substrate solutionwaspreincubated at 37 �C for10 min. One hundred microliters of enzyme solutionwere added andincubated at 37 �C for 10 min. After incubation, 200 ml of 100 mMNa-acetate buffer, pH5.0 containing 10mM iodoacetic acid was added tostop the reaction. Amounts of the liberated product, 7-amino-4-methylcoumarin (AMC), were determined by measurement of thefluorescence intensity at 440 nm (excitation at 380 nm) using a Shi-madzu spectrofluorophotometer (model RF-1500). One unit ofenzyme activity was defined as the amount of enzyme liberating1 nmol AMC in 1 min. The concentration of active Mscat F wasdetermined using E-64 (L-trans-epoxysuccinyl-leucylamido (4-gua-nidino) butane) as the active site titrant (Barrett et al., 1982). Thekinetic parameters, Vmax and Km, were determined by using theLineweavereBurk double reciprocal plot.

2.11. Effects of pH and temperature on Mscat F activity

The effect of pH on Mscat F activity was determined using Z-LeueArgeMCA as the substrate at 37 �C for 10 min in 100 mMbuffers containing 1 mM DTT and 1 mM EDTA. Buffers used weresodium formate (pH 3.0e4.0), sodium acetate (pH 4.0e5.5), sodiumphosphate (pH 5.5e7.0).

The effect of temperature on Mscat F activity was determinedusing Z-LeueArgeMCA as the substrate at 20e50 �C for 10 min in100 mM sodium acetate buffer, pH 5.5 containing 1 mM DTT and1 mM EDTA. Mscat F was also treated at 20e50 �C for 30 min inadvance, and then the heat stability was investigated by measuringactivity at 37 �C.

2.12. Inhibition assay

Inhibition of Mscat F was assayed using recombinant MsCPI(Miyaji et al., 2007), egg-white cystatin, (Sigma) and sunflowercystatin, Sca (Kouzuma et al., 2001). Inhibition constants (Ki) towardMscat F were determined by the method of Henderson (1972).

2.13. Action of Mscat F on the MsCPI tandem protein

The expression of MsCPI tandem proteinwhich contains MsCPI-5and 6 domains in E. coli was performed according to previouslyreported methods (Miyaji et al., 2007). Purified MsCPI tandemprotein andMscat F or trypsinwere incubated at an enzyme/inhibitormolar ratio of 1/9 in 1 mMNa-acetate buffer, pH5.5 containing 1 mMDTTand 1 mMEDTA at 37 �C for 1 h. The proteolytic processingof theMsCPI tandem protein was confirmed by SDS-PAGE and measure-ment of the inhibitory activity against papain.

Page 4: Insect Biochemistry and Molecular Biology...Molecular cloning of a multidomain cysteine protease and protease inhibitor precursor gene from the tobacco hornworm (Manduca sexta) and

Fig. 1. Gene organization and nucleotide sequence of MsCPI. A. Gene organization and domains of the MsCPI precursor protein. Introns in MsCPI gene are indicated by horizontallines. 50-UTR (nucleotides 1e45), signal sequence (46e105), four cystatin-like domains (domain 1, 505e903; 2, 1651e1995; 3, 5338e5700; 4, 6769e7119), nine MsCPI domains(domain 1, 2392e2703; 2, 2704e3018; 3, 3019e3333; 4, 3334e3648; 5, 3649e3963; 6, 3964e4278; 7, 4279e4593; 8, 4594e4908; 9, 4909e5223), procathepsin F-like domain (pro-region, 7120e7416; mature enzyme, 7417e8076) and 30-UTR (8077e9273) are present. B. Nucleotide and deduced amino acid sequences of MsCPI cDNA. Putative signal sequence isboxed. MsCys-1e4, MsCPI-1e9 and procathepsin F-like domains are indicated by underlines, dotted lines and broken lines, respectively. Arrowheads indicate boundaries betweenMsCPI domains. Stop codon is indicated by an asterisk.

T. Miyaji et al. / Insect Biochemistry and Molecular Biology 40 (2010) 835e846838

Page 5: Insect Biochemistry and Molecular Biology...Molecular cloning of a multidomain cysteine protease and protease inhibitor precursor gene from the tobacco hornworm (Manduca sexta) and

Fig. 1. (continued).

T. Miyaji et al. / Insect Biochemistry and Molecular Biology 40 (2010) 835e846 839

3. Results

3.1. cDNA cloning and nucleotide sequence of MsCPI gene

MsCPI cDNA clones were screened from 50,000 plaques of an M.sexta fat body cDNA library using a partial MsCPI cDNA labeled withdigoxigenin as a probe. Six positive clones were isolated and thenucleotide sequence of the longest insert DNA was determined. ThecDNA consists of 5665 nucleotides and encodes a protein with atleast 1492 amino acids. Analysis of the deduced amino acid sequence

suggested that the protein is composed of multifunctional domainsincluding at least five MsCPI domains, two cystatin-like domains andone procathepsin F-like domain. However, because the cDNAwas nota full-length cDNA, we determined the nucleotide sequence of theMsCPI gene obtained from genomic DNA.

3.2. Genomic cloning and nucleotide sequence of MsCPI gene

Approximately 5000 colonies from the M. sexta genomic librarywere screened using the MsCPI gene as a probe. One positive clone

Page 6: Insect Biochemistry and Molecular Biology...Molecular cloning of a multidomain cysteine protease and protease inhibitor precursor gene from the tobacco hornworm (Manduca sexta) and

10 20 30 40 50 60

MsCPI-1 S V P - G G I T A Q N P N A P E F K Q L A E E S M Q K Y L Q S - - I G S T K P H K V V R V V K A T T Q V V S G S M T R IMsCPI-2 E T P V G G I Q A Q D P N D P I F Q S L A E E S M Q K Y L Q S - - I G S T K P H K V V R V V K A T T Q V V S G S M T R IMsCPI-3 E T P V G G I Q A Q E P N D P I F Q S L A E E S M Q K Y L Q S - - I G S T K P H K V V R V V K A S T Q V V S G S M T R IMsCPI-4 E T P V G G I Q A Q D P N D P I F Q S L A E E S M Q K Y L Q S - - I G S T K P H K V V R V V K A T T Q V V S G S M T R IMsCPI-5 E T P V G G I Q A Q E P N D P I F Q S L A E E S M Q K Y L Q S - - I G S T K P H K V V R V V K A S T Q V V S G S M T R IMsCPI-6 E T P V G G I Q A Q E P N D P I F Q S L A E E S M Q K Y L Q S - - I G S T K P H K V V R V V K A S T Q V V S G S M T R IMsCPI-7 E T P V G G I Q A Q D P N D P I F Q S L A D E S M Q K Y L Q S - - I G S T K P H K V V R V V K A T T Q V V S G S M T R IMsCPI-8 E T P V G G I Q A Q E P N D P I F Q S L A E E S M Q K Y L Q S - - I G S T K P H K V V R V V K A S T Q V V S G S M T R IMsCPI-9 E T P V G G I Q A Q D P N D P I F Q S L A E E S M Q K Y L Q S - - I G S T K P H K V V R V V K A T T Q V V S G S M T R IBmsarco-type 1 L D F V G G L Q L Q D A H D Q K Y K L L A E E S L R Q F L Q K - - N G T T K P H T V V R L N K V T T Q V V S G T L I R Lsarcocystatin Q C - V G C P - S E V K G D K L K Q - - S E E T L N K S L S K L A A G D G P T Y K L V K I N S A T T Q V V S G S K D V IDmsarco-type Q V - V G G V - S Q L E G D S R K E - - A L E L L D A T L A Q L A T G D G P S Y K A I N V T S V T G Q V V A G S L N T Y

70 80 90 Identity (%)MsCPI-1 E F V I S P S D G N S G D V I S C Y S E V W E Q P WM H K - - K E I T V D C K I N N Q K Y R A K R - - - 9 0MsCPI-2 E F V I S P S D G N S G D V I S C Y S E V W E Q P W R H K - - K E I T V D C K I N N Q K Y R A K R - - - 9 8MsCPI-3 E F V I S P S D R N S G D V I S C Y S E V W E Q P W R H K - - K E I T V D C K I N N Q K Y R A K R - - - 9 9MsCPI-4 E F V I S P S D G N S G D V I S C Y S E V W E Q P W R H K - - K E I T V D C K I N N Q K Y R A K R - - - 9 8MsCPI-5 E F V I S P S D G N S G D V I S C Y S E V W E Q P W R H K - - K E I T V D C K I N N Q K Y R A K R - - - -MsCPI-6 E F V I S P S D G N S G D V I S C Y S E V W E Q P W R H K - - K E I T V D C K I N N Q K Y R A K R - - - 1 0 0MsCPI-7 E F V I S P A D G N S G D V I S C Y S E V W E Q P W R H K - - K E I T V D C K I N N Q K Y R A K R - - - 9 6MsCPI-8 E F V I S P S D G N S G D V I S C Y S E V W E Q P WM H K - - K E I T V D C K I N N Q K Y R A K R - - - 9 9MsCPI-9 E F V I S P S D G N S G D V I S C Y S E V W E Q P WM H K - - K E I T V D C K I N N Q K Y R A K R - - - 9 7Bmsarco-type 1 D F V A A P T G E E S R - - Y Q C H S E I W E R P W L K K - - T D I E V N C K Q S N E - - R S K R - - - 4 9sarcocystatin N A D L K D E N D K T K - - - T C D I T I W S Q P W L - E N G I E V T F N C P G E P K - - V V K K H S A 3 1Dmsarco-type E V E L D N G S D K - K - - - Q C T V K I W T Q P W L K E N G T N I K I K C S G D D G - - E L D R - T W 2 5

100 110

10 20 30 40 50

MsCys-1 E E P G A P - - C I G C A T H I - - N P Q A A G V T - E L A T L A V R H L D R H D P G V R H A L K T V I D V EMsCys-2 E E E G N E L N N V S N I - - - - - P L D D P V L Q - Q L C K E S I K K I E K E S P K L - N A L N I S E M I SMsCys-3 Q N S E K T I - - V G A F R E E Q - D P T K N Q Y K - L L A K D S L R E Y Q R L K K S N - H A H K V I E V K RMsCys-4 E D D E T P L - - - - - - - E A H I E F E S A E M A M Q L A K E A L K H I E A K Y P H P - R K Q K V V R I F SBmCys-6 E D D - - P L - - - - - - - E A H V E F E S A E M A L Q L A N E A L K H I E A R Y P N P - R K Q K I L R I F TTcCys-1 - - - - D S V P C L G C P F D L - - N T N A E G V D - E L I D V A L E H I Q T E R A K K - H A - - L V K V L Rcystatin C S S P G K P P R L V G G P M D A - - S V E E E G V R - R A L D F A V G E Y N K A S N D M - Y H S R A L Q V V Rhuman cathepsin F - A P A Q P R - - A A S F Q A W - - G P P S P E L L - A P T R F A L E M F N R G R A A G - T R A V L G L V R Gstefin A M I P - - - - - - - G G L S E A - - K P A T P E I Q - E I V D K V K P Q L E E K T N E T - Y G - - K L E A V QMsCPI-5 E T P - - - - - - V G G I Q A Q - - E P N D P I F Q - S L A E E S M Q K Y L Q S I G S T - K P H K V V R V V Kkininogen domain 3 - - - - - - - - C V G C P R D I - - P T N S P E L E - E T L T H T I T K L N A E N N A T - F Y F K I D N V K K

60 70 80 90

MsCys-1 R Q V Q V V N G V R Y I L T L L V D Y D S C T E T Q T - - - - E S C V S - - - - - - - - - S N T C K I S I L EMsCys-2 A T T Q K V S G T L T K I A F I L E H L N C T K D T P V G L R R K C E V V D E - - - L G - S N L C E A V I F EMsCys-3 V R T K V V S G L I Y E I H F T A V P T T C P S K I F - - D A S Y C K Q Q Q G - - - S A - M L S C H S K I W SMsCys-4 L E K Q V V A G I H Y R L K I E I G L T D C S A L S A - - - Q T D C K L V K D - - - E G L N K F C R V N V W LBmCys-6 L E K Q V V A G I H Y R M K V E V G L T N C T A L T N - - - R S D C K H I S D - - - E S L N K F C R V N V WMTcCys-1 L Q Q Q V V T G V K Y I L T A E F A P T L C E K S A N - L D A S S C P R D T N - - - A E - T T I C E I T Y L Hcystatin C A R K Q I V A G V N Y F L D V E L G R T T C T K T Q P - - N L D N C P F H D Q P H L K R - K A F C S F Q I Y Ahuman cathepsin F R V R R A G Q G S L Y S L E A T L E E P P C N D P M V - - - - - - C R L P V S - - - K K - T L L C S F Q V L Dstefin A Y K T Q V V A G T N Y Y I K V R A G D - - - - - - - - - - - - - - - - - - - - - - - - - - N K Y M H L K V F KMsCPI-5 A S T Q V V S G S M T R I E F V I S P S D G N S - - - - - - - - - - - - - - - - - - G D - V I S C Y S E V W Ekininogen domain 3 A R V Q V V A G K K Y F I D F V A R E T T C S K E S N E E L T E S C E T K K L - - - G Q - S L D C N A E V Y V

MsCys-1 K P W V K L P R G G K Y R G I L S N N C T S E WQ F - - - - - G D N G E V I D N Y D S N N RMsCys-2 K H T - - - - - - - R K E R Q I T Y N C - - - - - - - - - - - - - - - - - - - - - - - I E RMsCys-3 Q PW - - - - - - - L K R K Q I T V A C N P D T T L - - - - - - - - - - - - - G D N R N K RMsCys-4 R P W T - - - - - - D H P A N Y R V T C D Y H E A A - - - - - - - - - - - - - - - - - T - -BmCys-6 R P W T - - - - - - N H P P N F R V T C D Y Q E S A - - - - - - - - - - - - - - - - - T I DTcCys-1 K P W I - - - - - S K A K H V I K N N C T V S Q E Y V R P D I R K S D I S N E I D V S G K Rcystatin C V P WQ - - - - - - G T M T L S K S T C - - - - - - - - - - - - - - - - - - - - - - - Q D Ahuman cathepsin F E L - - - - - - - - G R H V L L R K D C - - - - - - - - - - - - - - - - - - - - - - - - - -stefin A S L P G - - - - - - Q N E D L V L T G Y Q V D K N K - - - - - - - - - - - - - D D E L T G FMsCPI-5 Q PW - - - - - - - R H K K E I T V D C K I N N Q K - - - - - - - - - - - - - - - Y R A K Rkininogen domain 3 V P W - - - - - - - E K K I Y P T V N C Q P - - - - - - - - - - - - - - - - - - - - - L G M

100 110

120 130 140 150

A

B

Fig. 2. Comparison of the amino acid sequences of MsCPI domains and MsCys-1e4. A. Comparison of the amino acid sequences of MsCPI domains and insect sarcocystatin-typecystatins. The amino acid sequences were aligned using MAFFT software (http://mafft.cbrc.jp/alignment/software/). Identical amino acids with MsCPI domains are enclosed in blackboxes. The three conserved regions of a cystatin (G, QXVXG and PW) are underlined. MsCPI-1e9, MsCPI domains 1e9; Bmsarco-type 1, sarcocystatin-type protein 1 in MsCPIprecursor homologous protein from B. mori (KAIKO base Gene No. BGIBMGA005131); Sarcocystatin, sarcocystatin A from Sarcophaga peregrina (GenBank accession no. J02847) andDmsarco-type, Cystatin-like protein from Drosophila melanogaster (GenBank accession no. AAF55073). B. Comparison of the amino acid sequences of MsCys-1e4 and severalcystatins. Cystatin-like domains were aligned together with several cystatins using MAFFT software. Two or more identical amino acid residues among MsCys-1e4 are enclosed inblack boxes. The three conserved regions of the cystatins are underlined. MsCys-1e4,Manduca sexta cystatin-like domains 1e4; BmCys-6, cystatin-like domain 6 in MsCPI precursorhomologous protein from B. mori; TcCys-1, cystatin-like domain 5 in cathepsin F-like cysteine protease from Tribolium castaneum (GenBank accession no. XP_973607); cystatin C,human cystatin C (GenBank accession no. CAA29096); human cathepsin F, cystatin-like domain from human cathepsin F; stefin A, human stefin A (GenBank accession no.BAA87858); MsCPI-5, MsCPI domain 5; kininogen domain 3, human low molecular weight kininogen domain 3 (GenBank accession no. AAA35497).

T. Miyaji et al. / Insect Biochemistry and Molecular Biology 40 (2010) 835e846840

Page 7: Insect Biochemistry and Molecular Biology...Molecular cloning of a multidomain cysteine protease and protease inhibitor precursor gene from the tobacco hornworm (Manduca sexta) and

T. Miyaji et al. / Insect Biochemistry and Molecular Biology 40 (2010) 835e846 841

was isolated and Southern hybridization using a 324-bp MsCPIgene fragment as a probe suggested that EcoRI, HindIII and PstIdigestion of the fosmid DNA was effective for subcloning of MsCPIprecursor gene. Therefore, the fosmid DNA was digested by EcoRI,HindIII and/or PstI and the nucleotide sequence of the insert DNA(about 38 kbp) was determined (GenBank accession no. AB524031).To clarify the positions of the introns, RT-PCR using gene-specificprimers was performed. Furthermore, the positions of the initiationcodons and 50-UTR were determined by 50-RACE. As shown inFig. 1A, the MsCPI gene (27,515 bp) was organized into eight exonswith seven introns (1.2 kbp, 6.7 kbp, 71 bp, 0.6 kbp, 5.8 kbp, 2.7 kbpand 1.8 kbp). Combining the partial cDNA and genomic DNAsequences, the full-length MsCPI transcript (GenBank accession no.AB242821) consists of 9273 nucleotides that encode a polypeptideof 2676 amino acids (Fig. 1B).

3.3. Primary structure analysis of MsCPI precursor protein

Analysis of the amino acid sequence of the polypeptide of 2676amino acids revealed that the presence of another two cystatin-likedomains and four MsCPI domains. Consequently, the MsCPI proteinis apparently synthesized from a very large protein precursor thathas four cystatin-like domains, nine MsCPI domains and one pro-cathepsin F-like domain (Fig. 1B).

The MsCPI domains (MsCPI-2e9) are composed of 315 nucleo-tides encoding 105 amino acids except for MsCPI-1, which consistsof only 312 nucleotides encoding 104 amino acids. The MsCPIdomains share 88e100% identical amino acid residues. Severalamino acids in the MsCPI domains are different from a previouslyreported RT-PCR product of the MsCPI gene (Miyaji et al., 2007).However, all MsCPI domains share three conservative motifs ofknown cystatins (glycine in the N-terminal region, QeXeVeXeG inthemiddle region, and PeW in the C-terminal region) and have twocysteine residues, suggesting that they belong to the I25A family towhich sarcocystatin from S. peregrina also is a member (Fig. 2A).

Four cystatin-like domains (MsCys-1e4) are composed of 133,115, 125 and 117 amino acids, respectively (Fig. 2B). They share17e21% identity with each other and have at least four cysteineresidues. They exhibit 15e20% identical residues with cystatin Cfrom Homo sapiens. Therefore, it is proposed that MsCys-1e4belong to the I25B family. They share completely or partiallyconserved motifs of cystatins. MsCys-1 has three conserved motifs.On the other hand, MsCys-2 and 4 lack two (glycine and PW) andone motif (glycine), respectively. In MsCys-3, QeXeVeXeG in themiddle region was replaced by KeXeVeXeG.

The procathepsin F-like domain consists of 318 amino acids thatinclude a 99-amino acid putative pro-region and 219-amino acidmature enzyme. The mature region contains the three conservedactive site residues in a cysteine protease, Cys25, His159 andAsn175 (papain numbering) in the sequence. It shared 52% and36e38% identity with human procathepsin F and other cathepsins(K, L and S), respectively (Fig. 3).

When we searched for homologous proteins with the proca-thepsin F-like domain, several procathepsin F-like proteins withone or some cystatin-like domains in the molecule were identified(Fig. 4). Hence, we define a cystatin-like domain as a domain withthe sequence QeXeVeXeG or related sequence because of theimportance of this motif in cysteine protease inhibition. Proteinsfrom B. mori (KAIKObase Gene No. BGIBMGA005131) and Triboliumcastaneum (GenBank accession no. XP_973607) have eight cystatin-like domains although the molecular sizes are smaller than that ofMsCPI precursor. The MsCPI precursor and the B. mori protein havetwo types of cystatin-like domains, the I25A type such as that ofsarcocystatin and MsCPI, and the I25B type such as cystatin C. Incontrast, the T. castaneum protein has only the I25B type of cystatin-

like domains. In other insect species, the procathepsin F-likeprotein from Apis mellifera has only a single cystatin-like domain. InD. melanogaster and human, a cystatin-like domain with theQeXeVeXeG motif is not present in their procathepsin F-likeproteins. However, it has been predicted that the N-terminal regionof human procathepsin F has a cystatin-like fold (Nägler et al.,1999). Cathepsin F from the human liver fluke, Opisthorchis viver-rini, lacks a region corresponding to the cystatin-like domain(Pinlaor et al., 2009).

3.4. Expression and activation of Msprocat F

To investigate the function of procathepsin F-like domain of theMsCPI precursor (Msprocat F), the recombinant protein wasproduced in E. coli. The recombinant Msprocat F (35 kDa) accu-mulated in cells was insoluble (Fig. 5A). Therefore, the recombinantprotein was dissolved in a solution containing 8 M urea and thendilutedwith renaturation buffer to facilitate refolding of the proteinto its biologically active structure.

After dialysis against an acidic buffer (pH 4.5), activation of therefolded Msprocat F protein was achieved by proteolysis withpepsin. When Msprocat F was incubated at 37 �C for 2 h in thepresence of pepsin, peptidase activity (Z-LeueArgeMCA hydro-lyzing activity) was drastically increased, reaching its the highestlevel after 3 h and then gradually decreasing (Fig. 5B). Western blotanalysis indicated that three proteins (bands i, ii, and iii) of differentmasses were derived from the Msprocat F protein. Under the acidiccondition (pH 4.5), protein i apparently converted to protein ii(Fig. 5C, 0 h). During incubation with pepsin at 37 �C, the smallerprotein (band iii) was produced (Fig. 5C, 2 h), and the intensity ofband iii was the highest at 3 h incubation and gradually decreasedconcomitant with the activity. The N-terminal amino acid sequencesof band i was AlaeGlueValeTyreHis, indicating that i is an intactform ofMsprocat F. The N-terminal amino acid sequences of ii and iiiwere LyseAlaeValeIle and AlaeProeAspeSer, respectively. Thesedata suggested that Msprocat F was converted under acidic condi-tions to an intermediate form by autolysis at the C-terminal side ofArg91p (Msprocat F numbering in Fig. 3), and then by cleavage at theC-terminal side of Thr99p by pepsin to an enzymatically activemature form.

Activated Mscat F protein was purified by ion-exchange chro-matography using SP-Sepharose FF column. The purity of the MscatF protein was confirmed by SDS-PAGE (Fig. 5A). The yield of theMscat F protein was 0.3 mg per liter of LB broth.

3.5. Action of Mscat F on synthetic MCA substrates

The substrate specificity of the Mscat F was determined usingseven synthetic substrates, BoceGlueLyseLyseMCA, BoceLeueLyseArgeMCA, BoceValeLeueLyseMCA, SuceLeueTyreMCA, Z-PheeArgeMCA, Z-LeueArgeMCA and Z-ArgeArgeMCA (Table 1).Mscat F was most active toward with Km and kcat values of 1.12 mMand 0.589 s�1, respectively. BoceGlueLyseLyseMCA, BoceLeueLyseArgeMCA and Z-ArgeArgeMCA were hydrolyzed very slowlyby Mscat F, indicating that substitution at the P2 site substantiallyreduced catalytic efficiency (kcat/Km). The kcat/Km toward Z-Phe-ArgeMCA of Mscat F (kcat/Km¼ 215,000 M�1 s�1) was higher thanthat of human cathepsin F expressed in baculovirus system (kcat/Km¼ 106,000 M�1 s�1, Fonovic et al., 2004) and cathepsin F-likecysteine protease from Clonorchis sinensis (kcat/Km¼ 10,100M�1 s�1,Na et al., 2008), and lower than that of human cathepsin F producedin Pichia pastoris (kcat/Km¼ 5,682,000 M�1 s�1,Wang et al., 1998).

Page 8: Insect Biochemistry and Molecular Biology...Molecular cloning of a multidomain cysteine protease and protease inhibitor precursor gene from the tobacco hornworm (Manduca sexta) and

10p 20p 30p 40p 50p

Msprocat F A E V Y H H L Q A E H L F Y E F L S T Y K P E Y I N D R H Q M R Q R F E I F K E N V R - - K M H E L N T H E R G T A T YBmprocat F - - - - - - - - - - H L F Y D F L A T H K P N Y I D D A A E M R R R F E I F K G N V R - - K I H E L N T H E R G T A V Yhcat F S Q D L - P V K M A S I F K N F V I T Y N R T Y - E S K E E A RW R L S V F V N N M V - - R A Q K I Q A L D R G T A Q Yhcat K - - L Y P E E I L D T HW E L W K K T H R K Q Y N N K V D E I S R R L - I W E K N L K Y I S I H N L E - A S L G V H T Yhcat L - T L T F D H S L E A QW T K W K A M H N R L Y G - M N E E GW R R A - V W E K N M K M I E L H N Q E - Y R E G K H S Fhcat S - Q L H K D P T L D H HW H L W K K T Y G K Q Y K E K N E E A V R R L - I W E K N L K F V M L H N L E - H S M G M H S Y

60p 70p 80p 90p 1 10

Msprocat F - - G V T R F A D L T Y E E F S T K H M G M - - K A S L R D P N Q V Q F R K A V I P N V T A P D S F DW R D H G A V T GBmprocat F - - G I T Q F A D L S Y E E F G K K Y L G L - - K P S L R D T N Q I P M R Q A E I P K I E I P D K F DW R D Y D A V T Dhcat F - - G V T K F S D L T E E E F R T I Y L N T - - - - L L R K E P G N K M K Q A K S V G D L A P P E W DW R S K G A V T Khcat K E L A M N H L G D M T S E E V V Q K M T G L - - K V P L S H S R S N D T L Y I P E E G R A P D S V D Y R K K G Y V T Phcat L T M A M N A F G D M T S E E F R Q V M N G F Q N R K P R K G K V F Q E P L F Y - - - - - E A P R S V DW R E K G Y V T Phcat S D L G M N H L G D M T S E E V M S L T S S L - - R V P S QWQ R - - N I T Y K S N P N R I L P D S V DW R E K G C V T E

20 30 40 50 60 70

Msprocat F V K D Q G S C G S CW A F S V T G N I E G QW K M K T G D L V S L S E Q E L V D C D K L - - - D Q G C N G G L P D N A YBmprocat F V K D Q G M C G S CW A F S V T G N V E G Q Y K L K T G Q L L S L S E Q E L V D C D K L - - - D E G C N G G L P D N A Yhcat F V K D Q G M C G S CW A F S V T G N V E G QW F L N Q G T L L S L S E Q E L L D C D K M - - - D K A C M G G L P S N A Yhcat K V K N Q G Q C G S CW A F S S V G A L E G Q L K K K T G K L L N L S P Q N L V D C V S E - - - N D G C G G G Y M T N A Fhcat L V K N Q G Q C G S CW A F S A T G A L E G Q M F R K T G R L I S L S E Q N L V D C S G P Q - G N E G C N G G L M D Y A Fhcat S V K Y Q G S C G A CW A F S A V G A L E A Q L K L K T G K L V T L S A Q N L V D C S T E K Y G N K G C N G G F M T T A F

80 90

Msprocat F R A I E Q L G G L E S E D D Y P Y E G S D D K C S F N K T L A R V Q I S G A V N I T S - N E T D M A K W L V K H G P I SBmprocat F R A I E Q L G G L E L E T D Y P Y E G E D D K C V F N K S I A R V Q I S G A V N I T S - N E T D M A K W L V N H G P I Shcat F S A I K N L G G L E T E D D Y S Y Q G H M Q S C N F S A E K A K V Y I N D S V E L S Q - N E Q K L A A W L A K R G P I Shcat K Q Y V Q K N R G I D S E D A Y P Y V G Q E E S C M Y N P T G K A A K C R G Y R E I P E G N E K A L K R A V A R V G P V Shcat L Q Y V Q D N G G L D S E E S Y P Y E A T E E S C K Y N P K Y S V A N D T G F V D I P K - Q E K A L M K A V A T V G P I Shcat S Q Y I I D N K G I D S D A S Y P Y K A M D Q K C Q Y D S K Y R A A T C S K Y T E L P Y G R E D V L K E A V A N K G P V S

Msprocat F I G I N A N - - A M Q F Y M G G I S H P W R M L C N P S N L D H G V L I V G Y G A K D Y P L F H K H L P Y W I I K N SWBmprocat F I G I N A N - - A M Q F Y M G G I S H P W K M L C N P T N L D H G V L I V G Y G A K D Y P L F H K H L P Y W T I K N SWhcat F V A I N A F - - G M Q F Y R H G I S R P L R P L C S P W L I D H A V L L V G Y G N R S - - - - - - D V P F W A I K N SWhcat K V A I D A S L T S F Q F Y S K G V Y - - Y D E S C N S D N L N H A V L A V G Y G I Q K - - - - - - G N K HW I I K N SWhcat L V A I D A G H E S F L F Y K E G I Y - - F E P D C S S E D M D H G V L V V G Y G F E S T E S - - D N N K Y W L V K N SWhcat S V G V D A R H P S F F L Y R S G V Y - - Y E P S C - T Q N V N H G V L V V G Y G D L N - - - - - - G K E Y W L V K N SW

Identity (%)Msprocat F G T SWG E Q G Y Y R V Y R G D G - T C G V N Q M A S S A V V - -Bmprocat F G K SWG E Q G Y Y R V Y R G D G - T C G V N K M A S S S V V - 8 2hcat F G T DWG E K G Y Y Y L H R G S G - A C G V N T M A S S A V V D 5 2hcat K G E NWG N K G Y I L M A R N K N N A C G I A N L A S F P K M - 3 8hcat L G E E WG M G G Y V K M A K D R R N H C G I A S A A S Y P T V - 3 7hcat S G H N F G E E G Y I R M A R N K G N H C G I A S F P S Y P E I - 3 6

190 200 210

140 150 160 170

100 110 120 130

180

Fig. 3. Comparison of amino acid sequences of procathepsin F-like domain of MsCPI cDNAwith human cathepsins F, K, L and S and also the B. mori procathepsin F-like domain. Theprocathepsin F-like domain was aligned together with various cysteine proteases using MAFFT software. Identical amino acids with procathepsin F-like domain from MsCPIprecursor are enclosed in black boxes. Catalytic residues and residues forming the S2 binding pocket are indicated by asterisks and dots, respectively. The cleavage sites for theactivation of the enzyme are indicated by arrowheads. Bmprocat F, procathepsin F-like domain in MsCPI precursor homologous protein from B. mori; hcat F, Homo sapiens cathepsinF; hcat K, Homo sapiens cathepsin K (GenBank accession no. CAA57649); hcat L, Homo sapiens cathepsin L (GenBank accession no. AAA66974) and hcat S, Homo sapiens cathepsin S(GenBank accession no. AAB22005).

T. Miyaji et al. / Insect Biochemistry and Molecular Biology 40 (2010) 835e846842

3.6. Effects of pH and temperature on Mscat F activity

To characterize the enzymatic properties of Mscat F, the pepti-dase activity was investigated using Z-LeueArgeMCA as thesubstrate under various conditions. The optimum pH of Mscat F waspH 5.5e6.0 (Fig. 6A). More than 40% of maximum activity remainedat pH3.5 and6.5, andnearly disappeared at pH3.0 and7.0.When thereaction temperature was varied, maximum activity occurred at37 �C and there was no activity at 50 �C (Fig. 6B). Mscat F was rela-tively stable up to 35 �C after an incubation time of 30 min (Fig. 6B).

3.7. Inhibition of Mscat F by various cystatins

Inhibition of Mscat F was investigated using three cystatins,MsCPI (insect), egg-white cystatin (animal) and sunflower cystatin

Sca (plant). Although all three cystatins inhibited Mscat F, theydiffered in effectiveness (Table 2). Inhibitory activity of MsCPI(Ki¼ 1.2 nM)wasw20-fold stronger than that of egg-white cystatin(25.3 nM) and w260-fold stronger than that of Sca (310 nM).

3.8. Action of Mscat F on the MsCPI tandem protein

To investigate whether Mscat F can function as a processingprotease of the MsCPI precursor, recombinant MsCPI tandemproteinwas produced in E. coli. After theMsCPI tandem proteinwasincubated with Mscat F at 37 �C for 1 h, both SDS-PAGE and papaininhibition assay were performed (Fig. 7). SDS-PAGE of the digestshowed a ladder of protein bands, suggesting that the tandemprotein was degraded nonspecifically by Mscat F. The inhibitoryactivity of the digest against papain was decreased to only 15%

Page 9: Insect Biochemistry and Molecular Biology...Molecular cloning of a multidomain cysteine protease and protease inhibitor precursor gene from the tobacco hornworm (Manduca sexta) and

M. sexta

B. mori

T.

O.

castaneum

A. mellifera

D. melanogaster

H. sapiens

Cystatin-like domain having partial conserved sequencesUnknown function

MsCPI type domain Procathepsin F-like domain

Signal sequence Cystatin-like domain having complete conserved sequences

500 aa

Fig. 4. Comparison of domain structures of the MsCPI precursor protein and procathepsin F elike proteins. Among three conserved regions of a cystatin (G, QeXeVeXeG, and PW),domains with the motif QeXeVeXeG or a related sequence are defined as cystatin-like domains. M. sexta, MsCPI precursor protein (GenBank accession no. AB242821); B. mori,MsCPI precursor homologous protein from B. mori (KAIKO base Gene No. BGIBMGA005131); T. castaneum, cathepsin F-like cysteine protease from Tribolium castaneum (GenBankaccession no. XP_973607); A. mellifera, CG12163-PA isoform A from Apis mellifera (GenBank accession no. XP_392381); D. melanogaster, CG12163 isoform A from D. melanogaster(GenBank accession no. NP_730901); H. sapiens, Homo sapiens procathepsin F (GenBank accession no. AAF13146) and O. viverrini, cysteine protease from O. viverrini (GenBankaccession no. AAV69023).

T. Miyaji et al. / Insect Biochemistry and Molecular Biology 40 (2010) 835e846 843

relative to the tandem protein. In contrast, when theMsCPI tandemprotein was treated with trypsin, the digest yielded a single bandwith mobility similar to that of the mature MsCPI protein (Fig. 7A).The inhibitory activity against papain of the tryptic digest was fourtimes higher than that of the MsCPI tandem protein (Fig. 7B).

4. Discussion

Proteinaceous cysteine protease inhibitors inactivate cysteineproteases from various organisms. In mammals, the physiologicalroles of CPIs are recognized as regulatory proteins for endogenouscysteine proteases and also as defensive proteins inhibiting exog-enous cysteine proteases that facilitate infections caused bybacteria and viruses. Although it is thought that CPIs from insectshave also similar functions, little information is known about thestructures and functions of insect CPI proteins.

To date the CPI from larval hemolymph of M. sexta is the thirdCPI protein to be purified from insects. MsCPI has an apparentmolecular mass of 11.5 kDa, whereas the size of the mRNA isunexpectedly large (w9 kb) and is inconsistent with the molecularmass of the inhibitor protein. Therefore, we searched for a geneencoding a MsCPI precursor protein, whose mass would beconsistent with size of the mRNA.

cDNA and genomic cloning indicated that there was indeeda MsCPI precursor protein (2676 amino acid residues) that con-tained multifunctional domains, including four cystatin-likedomains, nine tandem-repeated MsCPI domains and one proca-thepsin F-like domain. In insects, genes encoding proteins withmulti-CPI domains and a cysteine protease domain (cathepsin F)such as the MsCPI precursor are found in another lepidopteran (B.mori) and a coleopteran (T. castaneum). In contrast, so far it has notbeen found in a dipteran (D. melanogaster) or an hymenopteran (A.mellifera). It is unclear whether the genewas acquired or lost duringthe evolutionary process. However, that question will be answered

when a larger number of genome sequences have been determinedin the future.

Nine MsCPI domains share 88e100% identity with each otherand contain three conserved motifs. Therefore, it is hypothesizedthat the MsCPI domains play similar physiological roles. The cys-tatin-like domain, MsCys-1, has three conserved motifs, but theother cystatin-like domains are less conserved. We have attemptedto express the MsCys-1e4 proteins in E. coli and to investigate theircysteine protease inhibitory activities.

Although both Z-LeueArgeMCA and Z-PheeArgeMCA areeffective substrates for Mscat F, the kcat/Km toward Z-Phe-eArgeMCA was only 40% of that toward Z-LeueArgeMCA. Thispreference resembles those of human cathepsin K and S, but it wasdifferent from human cathepsin F, which hydrolyzed Leu and Pheresidues in the P2 position of substrates equally (Wang et al., 1998).Based on the x-ray crystallographic studies of papain-like prote-ases, such as cruzain and cathepsins B, L, and K (McGrath et al.,1997; Kamphuis et al., 1985; Musil et al., 1991; McGrath et al.,1995), the S2 subsite, which binds the P2 amino acid residue ofsubstrates, defines the primary substrate specificity of cathepsins.The crystal structure of human cathepsin F indicates that S2 pocketof human cathepsin F is formed by several amino acid residues,such as Leu67, Pro68, Ala133, Ile157, Asp158, Ala160 and Met205(Fig. 3). Among these residues, three of them, Ala133, Ile157 andAla160, are substituted for Gly133, Leu159, and Gly162 in Mscat F,respectively, suggesting that these substitutions affect the substratepreference of Mscat F.

Mscat F was inhibited byMsCPI (Ki¼ 1.2 nM)more strongly thanby egg-white cystatin (Ki¼ 25.3 nM) and Sca (Ki¼ 310 nM). Thus,MsCPI is an effective inhibitor of the cognate cysteine protease,Mscat F. This tendency was also observed with two other cystatins,one from animal origin (egg-white cystatin and human cathepsins)and plant origin (sunflower cystatin and papain), suggesting thatmany cystatins inhibit their cognate cysteine proteases effectively(Table 2).

Page 10: Insect Biochemistry and Molecular Biology...Molecular cloning of a multidomain cysteine protease and protease inhibitor precursor gene from the tobacco hornworm (Manduca sexta) and

Fig. 5. Overexpression and activation of the Msprocat F protein. A. SDS-PAGE ofexpressed proteins in E. coli cells. Lane 1, total cell extracts of E. coli cells withoutinduction; lane 2, total cell extracts after induction by IPTG; lane 3, soluble fraction ofE. coli cells after sonication, lane 4, insoluble fraction after sonication, and lane 5,purified Mscat F protein. M, molecular weight markers, bovine serum albumin(67.0 kDa); ovalbumin (45.0 kDa); a-chymotrypsinogen (25.6 kDa); soybean trypsininhibitor (20.1 kD); and lysozyme (14.3 kDa) B. Mscat F activity measured using Z-LeueArgeMCA as the substrate. The assay was performed at 37 �C for 10 min in100 mM sodium acetate buffer, pH 5.5 after incubation of Msprocat F with porcinepepsin at 37 �C for 0e5 h. C. Western blotting of rMsprocat F using mouse anti-rMsprocat F antiserum. Western blotting was performed after incubation of Msprocat Fwith porcine pepsin at 37 �C for 0e5 h. Three protein bands (i, ii, and iii) derived fromMsprocat F were detected. R indicates Msprocat F after refolded and dialyzed against50 mM TriseHCl buffer, pH 8.0.

Fig. 6. Effects of pH and temperature on Mscat F activity. A. Effect of pH on Mscat Factivity was assayed using Z-LeueArg MCA as the substrate, 37 �C for 10 min in bufferscontaining 1 mM DTT and 1 mM EDTA. Buffers used were 100 mM sodium formatebuffer (pH 3.0e4.0, circles), 100 mM sodium acetate buffer (pH 4.0e5.5, triangles),100 mM sodium phosphate buffer (pH 5.5e7.0, squares). B. Effect of temperature onMscat F activity. Optimum temperature (closed circles) of Mscat F was investigated at20e50 �C for 10 min in 100 mM sodium acetate, pH 5.5 containing 1 mM DTT and1 mM EDTA. Heat stability (open circles) of Mscat F was investigated after pre-incubation at 20e50 �C for 30 min.

Table 2Inhibition constants (Ki) of various cystatins toward Mscat F and five other cysteineproteases.

T. Miyaji et al. / Insect Biochemistry and Molecular Biology 40 (2010) 835e846844

So far the physiological functions of cathepsin F in mammalshave been reported. One of the functions is in antigen processingand presentation (Shi et al., 2000). The major histocompatibilitycomplex (MHC) class II-associated invariant chain (Ii), which bindsto the peptide binding groove of newly synthesized MHC class IIpeptides, regulates intracellular trafficking and peptide loading of

Table 1Hydrolysis of synthetic substrates by Mscat F.

Substrates kcat (s�1) Km (mM) kcat/Km

(�103 s�1M�1)Relativeactivity (%)

Z-LeueArgeMCA 0.589 1.12 527 100Z-PheeArgeMCA 0.310 1.45 215 40.8BoceValeLeueLyseMCA 1.13 8.65 131 24.9SuceLeueTyreMCA 0.284 13.9 20.4 3.87Z-ArgeArgeMCA 0.00853 39.7 0.215 0.0408BoceLeueLyseArgeMCA n.d. n.d. n.d.BoceGlueLyseLyseMCA n.d. n.d. n.d.

The kinetic constants were determined at pH 5.5, 37 �C. Relative activities are shownwith the kcat/Km value for Z-LeueArgeMCA taken as 100%. n.d.¼ no activitydetected.

MHC class II molecules. In general, peptide loading occurs afterdegradation of Ii molecules by a series of endosomal and lysosomalproteases, cathepsin L and S (Nakagawa et al., 1998; Driessen et al.,1999). However, Shi et al. (2000) indicated that cathepsin F isinvolved in Ii peptide processing since macrophages frommice thatwere deficient in both cathepsins L and S were still capable ofprocessing Ii. Thus, in higher organisms with an acquired immunesystem, cathepsin F functions as an enzyme responsible for antigenprocessing and presentation. However, since insects apparentlylack a similar system for acquired immunity, the role(s) of cathepsinF in insects remains unclear. Therefore, we investigated the func-tion of Mscat F. In the boundary sequence of the two MsCPIdomains, there are two basic amino acid motifs, KR and RXXR(Fig. 2A). Generally, prohormones with these motifs undergoproteolytic cleavage at the N- or C-terminal side of those residuesor between the two basic residues by cathepsin L or prohormoneproteases (Hook et al., 2004). Since cathepsin F belongs to thecathepsin L family and Mscat F can cleave at the C-terminal side ofan arginine residue, a possible role of Mscat F is in the processing ofMsCPI precursor protein. To assess the possibility, a recombinant“MsCPI tandem protein” with two MsCPI domains linked by themotif RAKR was constructed. The inhibitory activity of the MsCPItandem domain protein toward papain was lower than that ofa single domain (MsCPI) possibly because of the steric hindrancecaused by the proximity of two domains or for other reasons (datanot shown). Hence, the MsCPI tandem protein was incubated withMscat F at an enzyme/inhibitor molar ratio of 1/9, which

Cysteine proteases M. sexta MsCPI Chickenegg-white cystatin

Sunflower Sca

M. sextaCathepsin F-like protease 1.2 25.3 310

HumanCathepsin B 6.8 1.7Cathepsin F 0.06Cathepsin H 3.0 0.064Cathepsin L 0.87 0.019

PapayaPapain 3.1 4.2

The Ki values (nM) towardMscat F were determined using themethod of Henderson(1972). The Ki values toward cathepsin F and B, H and L by chicken egg-whitecystatin and MsCPI are from references (Barrett and Kirschke, 1981; Fonovic et al.,2004; Miyaji et al., 2007). The Ki value toward papain for sunflower cystatin Sca isrefered to Kouzuma et al. (2001).

Page 11: Insect Biochemistry and Molecular Biology...Molecular cloning of a multidomain cysteine protease and protease inhibitor precursor gene from the tobacco hornworm (Manduca sexta) and

Fig. 7. Action of Mscat F and trypsin on the MsCPI tandem protein. MsCPI tandemprotein and Mscat F or trypsin were incubated at an enzyme/inhibitor molar ratio of 1/9 in 1 mM Na-acetate buffer, pH 5.5 containing 1 mM DTT and 1 mM EDTA at 37 �C for1 h. A. SDS-PAGE of MsCPI tandem protein digested with Mscat F or trypsin. Lane 1,MsCPI tandem protein; lane 2, MsCPI tandem protein digested with Mscat F; lane 3,MsCPI tandem protein digested with trypsin and lane 4, rMsCPI. M, Marker. B. Inhib-itory activity toward papain of MsCPI tandem protein digested with Mscat F or trypsin.1, MsCPI tandem protein; 2, MsCPI tandem protein digested with Mscat F and 3, MsCPItandem protein digested with trypsin.

T. Miyaji et al. / Insect Biochemistry and Molecular Biology 40 (2010) 835e846 845

corresponds to the MsCPI/Msprocat F domain ratio in the MsCPIprecursor molecule. However, the tandem protein was not pro-cessed specifically by Mscat F and the digest did not inhibit papain.On the other hand, trypsin did convert theMsCPI tandem protein toa single domain of MsCPI with inhibitory activity against papain.Thus, the MsCPI tandem protein was inactivated by Mscat Fdigestion, though active protein by trypsin. The processing in vivoof the MsCPI precursor protein is not catalyzed by Mscat F but bysome trypsin-like or prohormone processing protease.

In insects several cysteine proteases are involved in assortedphysiological processes (Reeck et al., 1999). In S. peregrina cathepsinB is involved in dissociation of the fat body during metamorphosis(Takahashi et al., 1993). Cathepsin L is essential for the differenti-ation of imaginal discs in S. peregrina cells (Homma et al., 1994) andis involved in larval molting and metamorphosis in Helicoverpaarmigera (Wang et al., 2010). More recently, several cathepsin-likeproteases including cathepsins B and L were found in M. sexta, andcysteine protease activity increased after parasitism by the para-sitoid wasp, Cotesia congregata (Serbielle et al., 2009). Although thefunctions of Mscat F remain undetermined, Mscat F may play rolessimilar to those cysteine proteases. On the other hand, MsCPI canstrongly inhibit Mscat F and various other cysteine proteasesincluding animal cathepsins (Miyaji et al., 2007). Therefore, it isproposed that MsCPI is involved in the regulation of endogenousprotease activity and/or in the inhibition of cysteine proteasesderived from invading organisms. To further investigate thesespeculations, we need to clarify the processing mechanism of theMsCPI-cat F precursor protein in various tissues and duringdevelopment of M. sexta. Elucidation of these regulatory systemswill provide new insights for future biological research on insectproteolytic enzymes and their inhibitors.

References

Auerswald, E.A., Genenger, G., Assfalg-Machleidt, I., Machleidt, W., Engh, R.A.,Fritz, H., 1992. Recombinant chicken egg white cystatin variants of the QLVSGregion. Eur. J. Biochem. 209, 837e845.

Barrett, A.J., Kirschke, H., 1981. Cathepsin B, cathepsin H, and cathepsin L. MethodsEnzymol. 80, 535e561.

Barrett, A.J., Kembhavi, A.A., Brown, M.A., Kirschke, H., Knight, C.G., Tamai, M.,Hanada, K., 1982. L-trans-Epoxysuccinyl-leucylamido(4-guanidino)butane (E-64) and its analogues as inhibitors of cysteine proteinases including cathepsinsB, H and L. Biochem J. 201, 189e198.

Blin, N., Stafford, D.W., 1976. A method for the preparation of DNA from calf thymusnuclei in maximum purity and yield, analysis of this DNA compared with thatobtained by other methods. Nucleic Acids Res. 3, 2303e2308.

Brömme, D., Nallaseth, F.S., Turk, B., 2004. Production and activation of recombinantpapain-like cysteine proteases. Methods 32, 199e206.

Delbridge, M.L., Kelly, L.E., 1990. Sequence analysis, and chromosomal localizationof a gene encoding a cystatin-like protein from Drosophila melanogaster. FEBSLett. 274, 141e145.

Driessen, C., Bryant, R.A., Lennon-Duménil, A.M., Villadangos, J.A., Bryant, P.W., Shi, G.P.,Chapman, H.A., Ploegh, H.L., 1999. Cathepsin S controls the trafficking and matu-ration of MHC class II molecules in dendritic cells. J. Cell Biol. 147, 775e790.

Dubin, G., 2005. Proteinaceous cysteine protease inhibitors. Cell Mol. Life Sci. 62,653e669.

Fonovic, M., Brömme, D., Turk, V., Turk, B., 2004. Human cathepsin F: expression inbaculovirus system, characterization and inhibition by protein inhibitors. Biol.Chem. 385, 505e509.

Hay, B.A., 2000. Understanding IAP function and regulation: a view from Drosophila.Cell Death Differ. 7, 1045e1056.

Henderson, P.J., 1972. A linear equation that describes the steady-state kinetics ofenzymes and subcellular particles interacting with tightly bound inhibitors.Biochem. J. 127, 321e333.

Homma, K., Kurata, S., Natori, S., 1994. Purification, characterization, and cDNAcloning of procathepsin L from the culture medium of NIH-Sape-4, an embry-onic cell line of Sarcophaga peregrina (flesh fly), and its involvement in thedifferentiation of imaginal discs. J. Biol. Chem. 269, 15258e15264.

Hook, V., Yasothornsrikul, S., Greenbaum, D., Medzihradszky, K.F., Troutner, K., Toneff, T.,Bundey, R., Logrinova, A., Reinheckel, T., Peters, C., Bogyo, M., 2004. Cathepsin L andArg/Lys aminopeptidase: a distinct prohormone processing pathway for thebiosynthesis of peptideneurotransmitters andhormones. Biol. Chem. 385, 473e480.

Jenko, S., Dolenc, I., Guncar, G., Dobersek, A., Podobnik, M., Turk, D., 2003. Crystalstructure of Stefin A in complexwith cathepsin H: N-terminal residues of inhibitorscan adapt to the active sites of endo- and exopeptidases. J. Mol. Biol. 326, 875e885.

Jiang, H., Wang, Y., Yu, X.Q., Kanost, M.R., 2003. Manduca sexta serpin-3 regulatesprophenoloxidase activation in response to infection by inhibiting proph-enoloxidase-activating proteinases. J. Biol. Chem. 278, 3552e3561.

Kamphuis, I.G., Drenth, J., Baker, E.N., 1985. Thiol proteases. Comparative studiesbased on the high-resolution structures of papain and actinidin, and on aminoacid sequence information for cathepsins B and H, and stem bromelain. J. Mol.Biol. 182, 317e329.

Kouzuma, Y., Tsukigata, K., Inanaga, H., Doi-Kawano, K., Yamasaki, N., Kimura, M.,2001. Molecular cloning and functional expression of cDNA encoding thecysteine proteinase inhibitor Sca from sunflower seeds. Biosci. Biotechnol.Biochem. 65, 969e972.

Laemmli, U.K., 1970. Cleavage of structural proteins during the assembly of the headof bacteriophage T4. Nature 227, 680e685.

McGrath, M.E., Eakin, A.E., Engel, J.C., McKerrow, J.H., Craik, C.S., Fletterick, R.J., 1995.The crystal structure of cruzain: a therapeutic target for Chagas’ disease. J. Mol.Biol. 247, 251e259.

McGrath, M.E., Klaus, J.L., Barnes, M.G., Brömme, D., 1997. Crystal structure of humancathepsin K complexed with a potent inhibitor. Nat. Struct. Biol. 4, 105e109.

Miyaji, T., Kouzuma, Y., Yaguchi, J., Matsumoto, R., Kanost, M.R., Kramer, K.J.,Yonekura, M., 2007. Purification of a cysteine protease inhibitor from larvalhemolymph of the tobacco hornworm (Manduca sexta) and functionalexpression of the recombinant protein. Insect Biochem. Mol. Biol. 37, 960e968.

Musil, D., Zucic, D., Turk, D., Engh, R.A., Mayr, I., Huber, R., Popovic, T., Turk, V.,Towatari, T., Katunuma, N., Bode, W., 1991. The refined 2.15�A X-ray crystalstructure of human liver cathepsin B: the structural basis for its specificity.EMBO J. 10, 2321e2330.

Na, B.K., Kang, J.M., Sohn, W.M., 2008. CsCF-6, a novel cathepsin F-like cysteineprotease for nutrient uptake of Clonorchis sinensis. Int. J. Parasitol. 38, 493e502.

Nägler, D.K., Sulea, T., Ménard, R., 1999. Full-length cDNA of human cathepsin Fpredicts the presence of a cystatin domain at the N-terminus of the cysteineprotease zymogen. Biochem. Biophys. Res. Commun. 257, 313e318.

Nakagawa, T., Roth, W., Wong, P., Nelson, A., Farr, A., Deussing, J., Villadangos, J.A.,Ploegh, H., Peters, C., Rudensky, A.Y., 1998. Cathepsin L: critical role in Iidegradation and CD4 T cell selection in the thymus. Science 280, 450e453.

Pinlaor, P., Kaewpitoon, N., Laha, T., Sripa, B., Kaewkes, S., Morales, M.E., Mann, V.H.,Parriott, S.K., Suttiprapa, S., Robinson, M.W., To, J., Dalton, J.P., Loukas, A.,Brindley, P.J., 2009. Cathepsin F cysteine protease of the human liver fluke,Opisthorchis viverrini. PLoS Negl. Trop. Dis. 3, e398.

Rawlings, N.D., Tolle, D.P., Barrett, A.J., 2004. Evolutionary families of peptidaseinhibitors. Biochem. J. 378, 705e716.

Reeck, G., Oppert, B., Denton, M., Kanost, M., Baker, J.E., Kramer, K.J., 1999. Insectproteinases. In: Turk, V. (Ed.), Proteases: New Perspectives. Birkhauser Verlag,Basel, Switzerland, pp. 125e148.

Rzychon, M., Chmiel, D., Stec-Niemczyk, J., 2004. Modes of inhibition of cysteineproteases. Acta Biochim. Pol. 51, 861e873.

Saito, H., Suzuki, T., Ueno, K., Kubo, T., Natori, S., 1989. Molecular cloning of cDNA forsarcocystatin A and analysis of the expression of the sarcocystatin A geneduring development of Sarcophaga peregrina. Biochemistry 28, 1749e1755.

Serbielle, C., Moreau, S., Veillard, F., Voldoire, E., Bézier, A., Mannucci, M.A., Volkoff, A.N.,Drezen, J.M., Lalmanach, G., Huguet, E., 2009. Identification of parasite-responsivecysteine proteases in Manduca sexta. Biol Chem. 390, 493e502.

Shi, G.P., Bryant, R.A., Riese, R., Verhelst, S., Driessen, C., Li, Z., Bromme, D.,Ploegh, H.L., Chapman, H.A., 2000. Role for cathepsin F in invariant chain

Page 12: Insect Biochemistry and Molecular Biology...Molecular cloning of a multidomain cysteine protease and protease inhibitor precursor gene from the tobacco hornworm (Manduca sexta) and

T. Miyaji et al. / Insect Biochemistry and Molecular Biology 40 (2010) 835e846846

processing and major histocompatibility complex class II peptide loading bymacrophages. J. Exp. Med. 191, 1177e1186.

Suzuki, T., Natori, S., 1985. Purification and characterization of an inhibitor of thecysteine protease from the hemolymph of Sarcophaga peregrina larvae. J. Biol.Chem. 260, 5115e5120.

Takahashi, N., Kurata, S., Natori, S., 1993. Molecular cloning of cDNA for the29 kDa proteinase participating in decomposition of the larval fat bodyduring metamorphosis of Sarcophaga peregrina (flesh fly). FEBS Lett. 334,153e157.

Turk, V., Turk, B., Turk, D., 2001. Lysosomal cysteine proteases: facts and opportu-nities. EMBO J. 20, 4629e4633.

Wang, B., Shi, G.P., Yao, P.M., Li, Z., Chapman, H.A., Brömme, D., 1998. Humancathepsin F. Molecular cloning, functional expression, tissue localization, andenzymatic characterization. J. Biol. Chem. 273, 32000e32008.

Wang, L.F., Chai, L.Q., He, H.J., Wang, Q., Wang, J.X., Zhao, X.F., 2010. A cathepsin L-like proteinase is involved in moulting and metamorphosis in Helicoverpaarmigera. Insect Mol. Biol. 19, 99e111.

Yamamoto, Y., Watabe, S., Kageyama, T., Takahashi, S.Y., 1999. A novel inhibitorprotein for Bombyx cysteine proteinase is homologous to propeptide regions ofcysteine proteinases. FEBS Lett. 448, 257e260.

Young, R.A., Davis, R.W., 1983. Efficient isolation of genes by using antibody probes.Proc. Natl. Acad. Sci. U.S.A. 80, 1194e1198.