
Gene Therapy and Molecular Biology Vol 12, page 87


Gene Ther Mol Biol Vol 12, 87-94, 2008

Development of MHC class nonamers from Cowpea mosaic viral protein

Research Article

Virendra S Gomase1,*, Karbhari V Kale2

1Department of Bioinformatics, Padmashree Dr. D.Y. Patil University, Navi Mumbai, 400614, India

2Department of Computer Science and Information Technology, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, 431004 (MS), India

__________________________________________________________________________________

*Correspondence: Virendra S. Gomase, Department of Computer Science and Information Technology, Dr. Babasaheb Ambedkar

Marathwada University, Aurangabad, 431004 (MS), India; mobile: +91-9226960668; e-mail: [email protected]

Key words: Genome polyprotein, Epitopes, MHC, SVM, Crystal Structure, Hydrophilicity, Hydrophobicity

Abbreviations: Cowpea mosaic virus, (CPMV); instability index, (II); support vector machine, (SVM)

Received: 6 June 2008; Revised: 9 June 2008

Accepted: 10 June 2008; electronically published: July 2008

Summary

Cowpea mosaic virus causes one of the most commonly reported virus diseases of cowpea (Vigna unguiculata), in which it produces chlorotic spots with diffuse borders in inoculated primary leaves. Cowpea mosaic viral peptides are most suitable for subunit vaccine development because a single epitope can generate an immune response in a large population. Peptide binders identified through this approach tend to be high-efficiency binders, in which a larger percentage of atoms is directly involved in binding than in larger molecules. A machine learning technique, the support vector machine (SVM), was used to develop the MHC binder prediction method. The SVM was trained on the binary input of single amino acid sequences. We found the SVM-based MHCII-IAb peptide regions 51-PTINHPTFV, 113-PLPKFDSTV, 187-VYSKDDALE, 181-RKYAVLVYS (optimal score 1.034); MHCII-IAd peptide regions 138-AISAMFADG, 170-LSAMRADIG, 25-PSSADANFR, 191-DDALETDEL (optimal score 0.541); MHCII-IAg7 peptide regions 27-SADANFRVL, 151-LVYQYAASG, 159-GVQANNKLL, 158-SGVQANNKL (optimal score 1.692); and MHCII-RT1.B peptide regions 57-TFVGSERCR, 188-YSKDDALET, 68-YTFTSITLK, 44-KTLAAGRPT (optimal score 0.787), which represent predicted binders from genome polyprotein M. These antigenic epitopes are sufficient for eliciting the desired immune response against viral infection. The study focused on a computational approach to deciphering the peptide fragments of genome polyprotein M that are antigenic in nature, for use in synthetic peptide viral vaccines. In this analysis, the predicted antigenic epitopes of genome polyprotein M are predicted to support a successful immunization strategy against various diseases.

I. Introduction: Cowpea mosaic virus

Cowpea mosaic virus (CPMV) is a plant virus that belongs to the genus Comovirus of the family Comoviridae (Goldbach and Wellink, 1996; Pouwels et al, 2002). Cowpea mosaic virus, sometimes referred to as cowpea stunt virus, causes foliage to turn yellowish green with areas of light and dark green tissue. Infected leaves are often stunted and frequently have a puckered appearance. CPMV causes one of the most commonly reported virus diseases of cowpea (Vigna unguiculata), in which it produces chlorotic spots with diffuse borders in inoculated primary leaves. Trifoliate leaves develop a bright yellow or light green mosaic of increasing severity in younger leaves. Susceptible host species are Datura stramonium, Glycine max, Gomphrena globosa, Nicotiana tabacum, Phaseolus vulgaris, Pisum sativum, Spinacia oleracea, Vicia faba, Vigna angularis, Vigna radiata, and Vigna unguiculata.

Capsid proteins are named for their primary function: to encapsidate viral genomic nucleic acids. However, encapsidation is only one feature of an extremely diverse array of structural, functional, and ecological roles played during viral infection and spread (Callaway et al, 2001). The capsid protein is multifunctional: in addition to its role in encapsidation, it affects virus movement in plants (Suzuki et al, 1991; Kaplan et al, 1998), transmission, symptom expression, and host range (Shintaku and Palukaitis, 1990).

Bioinformatics is being increasingly used to support target validation by providing functionally predictive information mined from databases and experimental datasets using a variety of computational tools. The predictive power of these bioinformatics approaches is strongest when information from several techniques is combined, including experimental confirmation of protein antigenicity predictions (Gomase and Changbhale, 2007; Gomase et al, 2007).

II. Methodology

1. Database searching

Protein sequence databases are used to store the vast amount of information issuing from the genome projects (Gracy and Argos, 1998; Bateman et al, 2000). Many different types of databases are available, but for routine protein sequence analysis the primary and secondary databases, such as GenBank and UniProt, are initially the most important (Barker et al, 2000; Benson et al, 2003; Bairoch et al, 2005). We analysed the protein sequence of the viral genome polyprotein M (Wezenbeek et al, 1983; Taylor et al, 1999; Altmann and Lomonossoff, 2000; Canizares et al, 2004; Carvalho et al, 2004).

2. Prediction of antigenicity

The program predicts those segments within viral genome polyprotein M that are likely to be antigenic by eliciting an antibody response. Antigenic epitopes were determined using the Hopp and Woods, Welling, Protrusion Index (Thornton), and Parker antigenicity methods (Welling et al, 1985; Thornton et al, 1986; Parker et al, 1994; IsHak et al, 2003; Gomase, 2006). Predictions are based on a table that reflects the occurrence of amino acid residues in experimentally known segmental epitopes.

III. Results and Interpretations

The genome polyprotein M sequence is 1046 amino acid residues long:

MFSFTEAKSKISLWTRSAAPLNNVYLSYSCRCGLGK

RKLAGGCCSAPYITCYDSADFRRVQYLYFCLTRYC

CLYFFLLLLADWFYKKSSIFFETEFSRGFRTWRKIVK

LLYILPKFEMESIMSRGIPSGILEEKAIQFKRAKEGNK

PLKDEIPKPEDMYVSHTSKWNVLRKMSQKTVDLSK

AAAGMGFINKHMLTGNILAQPTTVLDIPVTKDKTL

AMASDFIRKENLKTSAIHIGAIEIIIQSFASPESDLMG

GFLLVDSLHTDTANAIRSIFVAPMRGGRPVRVVTFP

NTLAPVSCDLNNRFKLICSLPNCDIVQGSQVAEVSV

NVAGCATSIEKSHTPSQLYTEEFEKEGAVVVEYLGR

QTYCAQPSNLPTEEKLRSLKFDFHVEQPSVLKLSNS

CNAHFVKGESLKYSISGKEAENHAVHATVVSREGA

SAAPKQYDPILGRVLDPRNGNVAFPQMEQNLFALS

LDDTSSVRGSLLDTKFAQTRVLLSKAMAGGDVLLD

EYLYDVVNGQDFRATVAFLRTHVITGKIKVTATTNI

SDNSGCCLMLAINSGVRGKYSTDVYTICSQDSMTW

NPGCKKNFSFTFNPNPCGDSWSAEMISRSRVRMTVI

CVSGWTLSPTTDVIAKLDWSIVNEKCEPTIYHLADC

QNWLPLNRWMGKLTFPQGVTSEVRRMPLSIGGGA

GATQAFLANMPNSWISMWRYFRGELHFEVTKMSSP

YIKATVTFLIAFGNLSDAFGFYESFPHRIVQFAEVEE

KCTLVFSQQEFVTAWSTQVNPRTTLEADGCPYLYAI

IHDSTTGTISGDFNLGVKLVGIKDFCGIGSNPGIDGS

RLLGAIAQGPVCAEASDVYSPCMIASTPPAPFSDVT

AVTFDLINGKITPVGDDNWNTHIYNPPIMNVLRTAA

WKSGTIHVQLNVRGAGVKRADWDGQVFVYLRQS

MNPESYDARTFVISQPGSAMLNFSFDIIGPNSGFEFA

ESPWANQTTWYLECVATNPRQIQQFEVNMRFDPNF

RVAGNILMPPFPLSTETPPLLKFRFRDIERSKRSVMV

GHTATAA

Genome polyprotein M is 1046 amino acid residues long. The organism is cowpea mosaic virus and the lineage is Viruses; ssRNA positive-strand viruses, no DNA stage; Comoviridae; Comovirus. Genome polyprotein M has a molecular weight of 116218.2 daltons and a theoretical pI of 8.36. The atomic composition of genome polyprotein M of cowpea mosaic virus is C5207H8091N1389O1517S56, with a total of 16260 atoms. The instability index (II) is computed to be 39.46, which classifies genome polyprotein M as stable (a value below 40 is taken to indicate a stable protein). The aliphatic index of genome polyprotein M is 81.29 and the grand average of hydropathicity (GRAVY) is -0.092 (Gasteiger et al, 2005).
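The reported atom count and molecular weight can be cross-checked from the atomic composition alone. The following is a minimal sketch, not the tool used in the study: the function names are hypothetical, the atomic masses are standard average values entered by hand, and the threshold of 40 for the instability index is the conventional cut-off assumed here.

```python
import re

# Standard average atomic masses (rounded); an assumption of this sketch.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def parse_formula(formula):
    """Return {element: count} for a flat formula such as 'C5207H8091N1389O1517S56'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def total_atoms(counts):
    """Sum the atom counts over all elements."""
    return sum(counts.values())


def formula_mass(counts):
    """Approximate molecular mass in daltons from average atomic masses."""
    return sum(ATOMIC_MASS[element] * n for element, n in counts.items())


def stability_class(instability_index, threshold=40.0):
    """Classify a protein as stable when its instability index is below the threshold."""
    return "stable" if instability_index < threshold else "unstable"


composition = parse_formula("C5207H8091N1389O1517S56")
print(total_atoms(composition))             # atom count from the formula
print(round(formula_mass(composition), 1))  # approximate mass in daltons
print(stability_class(39.46))               # class for the reported II
```

The mass obtained this way agrees with the reported 116218.2 daltons to within rounding of the atomic masses.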

A. Analysis of genome polyprotein M

Percent hydrophilic amino acids: 49.7132

Percent hydrophobic amino acids: 50.2868

Ratio of % hydrophilic to % hydrophobic: 0.988593

Mean β hydrophobic moment: 0.207361

Mean helix hydrophobic moment: 0.168268

Number of basic amino acids: 105

Number of acidic amino acids: 97

Estimated pI for protein: 8.9

Total linear charge density: 0.195029

Polar area of extended chain (Å²): 66160.3

Non-polar area of extended chain (Å²): 117985.0

Total area of extended chain (Å²): 184145.0

Polar ASA of folded protein (Å²): 13198.0

Non-polar ASA of folded protein (Å²): 17611.9

ASA of folded protein (Å²): 30809.8

Ratio of folded protein area to extended area: 0.179129

Buried polar area of folded protein (Å²): 49208.9

Buried non-polar area of folded protein (Å²): 85764.2

Buried charge area of folded protein (Å²): 5632.88

Total buried surface (Å²): 140597.0

Number of buried amino acids: 541

Packing volume (est) (Å³): 141932.0

Packing volume (act) (Å³): 138932.0

Interior volume of protein: 102030.0

Exterior volume of protein: 36901.4

Partial specific volume (ml/g): 0.726493

Fisher volume ratio (act): 0.361671

Protein solubility: 1.38252

Solvent free energy of folding (kcal/mol): -1019.52

Total number of negatively charged residues (Asp + Glu): 97

Total number of positively charged residues (Arg + Lys): 105
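The charged-residue totals above follow from simple counting over the sequence. As an illustrative sketch (the function names are hypothetical and the short peptide is one of the nonamers from Table 1, not the full polyprotein), the counts and the hydrophilic-to-hydrophobic ratio could be computed as follows:

```python
def charged_residue_counts(sequence):
    """Return (negative, positive) counts: Asp + Glu and Arg + Lys."""
    seq = sequence.upper()
    negative = sum(seq.count(aa) for aa in "DE")
    positive = sum(seq.count(aa) for aa in "RK")
    return negative, positive


def percent_ratio(percent_a, percent_b):
    """Ratio of two percentages, e.g. % hydrophilic to % hydrophobic."""
    return percent_a / percent_b


# Example on a short peptide rather than the full 1046-residue sequence.
negative, positive = charged_residue_counts("YSKDDALET")
print(negative, positive)

# Ratio recomputed from the two reported percentages.
print(round(percent_ratio(49.7132, 50.2868), 6))
```

Applied to the full sequence, the same counting should reproduce the totals of 97 negatively and 105 positively charged residues, provided the sequence is transcribed without errors.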

B. Prediction of antigenic peptides

In these methods we located the antigenic determinants by finding the areas of greatest local hydrophilicity. The Hopp-Woods scale was designed to predict the locations of antigenic determinants in a protein, assuming that the antigenic determinants would be exposed on the surface of the protein and thus would be located in hydrophilic regions. Its values are derived from the transfer free energies for amino acid side chains between ethanol and water. The Welling antigenicity plot gives, for each amino acid, the log of the quotient between its percentage in a sample of known antigenic regions and its percentage in average proteins (Figures 1-4).
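The search for the area of greatest local hydrophilicity can be sketched as a sliding-window average over the Hopp-Woods scale. This is a minimal illustration under stated assumptions, not the program used to produce Figure 1: the window of six residues is the size commonly used with this scale, the function names are hypothetical, and the fragment is only the first line of the sequence given above.

```python
# Hopp-Woods hydrophilicity values for the 20 standard amino acids.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "H": -0.5, "A": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hopp_woods_profile(sequence, window=6):
    """Return a list of (1-based start position, mean hydrophilicity) per window."""
    seq = sequence.upper()
    profile = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(HOPP_WOODS[aa] for aa in segment) / window
        profile.append((start + 1, mean))
    return profile


def most_hydrophilic_window(sequence, window=6):
    """Return the (position, score) of the window with the highest mean value."""
    return max(hopp_woods_profile(sequence, window), key=lambda item: item[1])


# First line of the polyprotein M sequence, used only as a short example.
fragment = "MFSFTEAKSKISLWTRSAAPLNNVYLSYSCRCGLGK"
position, score = most_hydrophilic_window(fragment)
print(position, fragment[position - 1:position + 5], round(score, 3))
```

Peaks in such a profile mark candidate antigenic determinants; the other scales (Welling, Parker, Protrusion Index) can be substituted by swapping the lookup table.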

Figure 1. Hopp-Woods antigenicity plot of genome polyprotein M.

Figure 2. Welling antigenicity plot of genome polyprotein M.

Figure 3. Protrusion Index (Thornton) antigenicity plot of genome polyprotein M.

Figure 4. Parker antigenicity plot of genome polyprotein M.



C. Helical wheel

The helical wheel graphically displays the disposition of amino acid side chains about an assumed alpha helix. The view is always along the central axis of the helix, from the N- to the C-terminus. The helical wheel is an effective method for displaying the symmetry of hydrophobic and hydrophilic side chains. It assumes a periodicity of 3.6 residues per helical turn. Individual residues, represented as colored circles, are placed successively at each node of the helix. Multiple turns of the helix are represented by “radiating” the spiral outward from the helix center. The interconnecting bars indicate the residue arrangement along the helix. Residue shading is assigned on the basis of property or degree of hydrophobicity.
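A periodicity of 3.6 residues per turn means that successive residues are offset by about 100 degrees when viewed down the helix axis. The sketch below computes these wheel positions; it is an illustration only, the function name and unit radius are hypothetical, and it does not reproduce the spiral layout or shading of the plotting software.

```python
import math

RESIDUES_PER_TURN = 3.6
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN  # about 100 degrees


def helical_wheel(sequence, radius=1.0):
    """Return (residue, angle in degrees, x, y) for each residue on the wheel."""
    positions = []
    for index, residue in enumerate(sequence):
        angle = (index * DEGREES_PER_RESIDUE) % 360.0
        x = radius * math.cos(math.radians(angle))
        y = radius * math.sin(math.radians(angle))
        positions.append((residue, angle, x, y))
    return positions


# Example on one nonamer from Table 1, treated as if it were helical.
wheel = helical_wheel("RKYAVLVYS")
for residue, angle, x, y in wheel:
    print(f"{residue} {angle:6.1f} {x:6.2f} {y:6.2f}")
```

Residues whose angles cluster on one side of the wheel share a face of the helix, which is how the plot exposes hydrophobic and hydrophilic faces.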

D. β staircase

The β staircase is a graphical display of the disposition of amino acid side chains about an assumed β strand. The view is always along the central axis of the β strand. The β staircase is an effective method for displaying the symmetry of hydrophobic and hydrophilic side chains. It mimics the right-handed super twist found in most β strands.

E. Protein modeling

A purified protein is generated for analysis of the chosen target, and the structure of the target is determined experimentally. The structure is then analyzed in molecular modeling software to evaluate its similarity to known protein structures and to determine possible relationships that are identifiable from protein sequence alone. The target structure also serves as a detailed model for determining the structure of peptides within that protein structure.

F. MHC binding peptides

These MHC binding peptides are sufficient for eliciting the desired immune response. Predicting the binding affinity of peptides toward the TAP transporter is crucial in identifying the MHC class I restricted T cell epitopes. The prediction is based on a cascade support vector machine, using the sequence and properties of the amino acids. A correlation coefficient of 0.88 was obtained using the jack-knife validation test. In this test, we found the MHCI and MHCII binding regions (Table 1). MHC molecules are cell surface glycoproteins that take an active part in host immune reactions, and MHC class I and MHC class II are involved in the response to almost all antigens. In this assay we predicted the binding affinity of genome polyprotein M, which shows different nonamers (Table 1).

A machine learning technique, the support vector machine (SVM), was used to develop the MHC binder prediction method. The SVM was trained on the binary input of single amino acid sequences. We found the SVM-based MHCII-IAb peptide regions 51-PTINHPTFV, 113-PLPKFDSTV, 187-VYSKDDALE, 181-RKYAVLVYS (optimal score 1.034); MHCII-IAd peptide regions 138-AISAMFADG, 170-LSAMRADIG, 25-PSSADANFR, 191-DDALETDEL (optimal score 0.541); MHCII-IAg7 peptide regions 27-SADANFRVL, 151-LVYQYAASG, 159-GVQANNKLL, 158-SGVQANNKL (optimal score 1.692); and MHCII-RT1.B peptide regions 57-TFVGSERCR, 188-YSKDDALET, 68-YTFTSITLK, 44-KTLAAGRPT (optimal score 0.787), which represent predicted binders from genome polyprotein M (Table 1). The predicted binding affinity is normalized by the 1% fractile. The MHC peptide binding is predicted using neural networks trained on the C-termini of known epitopes. In this analysis, the predicted MHC/peptide binding is a log-transformed value related to the IC50 values in nM units.
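To make the idea of nonamers and binary input concrete, the sketch below enumerates every 9-residue window of a sequence and encodes a peptide as a binary vector. The text states only that the SVM was trained on binary input; the specific scheme used here (20 bits per residue, giving 180 bits per nonamer) is a common convention assumed for illustration, and the function names are hypothetical. No SVM is trained in this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def nonamers(sequence, length=9):
    """Return (1-based start position, peptide) for every window of the given length."""
    seq = sequence.upper()
    return [(i + 1, seq[i:i + length]) for i in range(len(seq) - length + 1)]


def binary_encode(peptide):
    """Encode a peptide as a flat list of 0/1 values, 20 bits per residue."""
    vector = []
    for residue in peptide.upper():
        bits = [0] * len(AMINO_ACIDS)
        bits[AMINO_ACIDS.index(residue)] = 1  # raises ValueError on unknown residues
        vector.extend(bits)
    return vector


# Example on a short stretch; a full run would slide over all 1046 residues.
windows = nonamers("PTINHPTFVG")
print(windows)

encoded = binary_encode("PTINHPTFV")
print(len(encoded), sum(encoded))
```

Each encoded vector would then be passed to a trained classifier, which returns the score used to rank the nonamers for a given allele.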

Table 1. SVM-based MHC-peptide binding nonamers in the genome polyprotein M sequence.

Allele   Rank   Sequence    Residue No.   Peptide score
I-Ab     1      PTINHPTFV   51            1.034
I-Ab     2      PLPKFDSTV   113           0.913
I-Ab     3      VYSKDDALE   187           0.890
I-Ab     4      RKYAVLVYS   181           0.874
I-Ad     1      AISAMFADG   138           0.541
I-Ad     2      LSAMRADIG   170           0.529
I-Ad     3      PSSADANFR   25            0.526
I-Ad     4      DDALETDEL   191           0.457
RT1.B    1      TFVGSERCR   57            0.787
RT1.B    2      YSKDDALET   188           0.710
RT1.B    3      YTFTSITLK   68            0.575
RT1.B    4      KTLAAGRPT   44            0.496
I-Ag7    1      SADANFRVL   27            1.692
I-Ag7    2      LVYQYAASG   151           1.625
I-Ag7    3      GVQANNKLL   159           1.583
I-Ag7    4      SGVQANNKL   158           1.523
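The optimal score quoted for each allele is simply the highest peptide score among its ranked nonamers. A small sketch, with the rows of Table 1 entered by hand and a hypothetical helper name, shows how the top binder per allele can be recovered:

```python
# Rows of Table 1: (allele, rank, sequence, residue number, peptide score).
TABLE_1 = [
    ("I-Ab", 1, "PTINHPTFV", 51, 1.034),
    ("I-Ab", 2, "PLPKFDSTV", 113, 0.913),
    ("I-Ab", 3, "VYSKDDALE", 187, 0.890),
    ("I-Ab", 4, "RKYAVLVYS", 181, 0.874),
    ("I-Ad", 1, "AISAMFADG", 138, 0.541),
    ("I-Ad", 2, "LSAMRADIG", 170, 0.529),
    ("I-Ad", 3, "PSSADANFR", 25, 0.526),
    ("I-Ad", 4, "DDALETDEL", 191, 0.457),
    ("RT1.B", 1, "TFVGSERCR", 57, 0.787),
    ("RT1.B", 2, "YSKDDALET", 188, 0.710),
    ("RT1.B", 3, "YTFTSITLK", 68, 0.575),
    ("RT1.B", 4, "KTLAAGRPT", 44, 0.496),
    ("I-Ag7", 1, "SADANFRVL", 27, 1.692),
    ("I-Ag7", 2, "LVYQYAASG", 151, 1.625),
    ("I-Ag7", 3, "GVQANNKLL", 159, 1.583),
    ("I-Ag7", 4, "SGVQANNKL", 158, 1.523),
]


def best_per_allele(rows):
    """Return {allele: (sequence, residue number, score)} for the top-scoring row."""
    best = {}
    for allele, _rank, sequence, residue_no, score in rows:
        if allele not in best or score > best[allele][2]:
            best[allele] = (sequence, residue_no, score)
    return best


top = best_per_allele(TABLE_1)
for allele, (sequence, residue_no, score) in top.items():
    print(f"{allele}: {residue_no}-{sequence} (score {score})")
```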



IV. Discussion

Antigenic epitopes were determined using the Hopp and Woods, Welling, Protrusion Index (Thornton), and Parker antigenicity methods. We found that genome polyprotein M is antigenic in nature (Figures 1-4). The helical wheel shows the symmetry of hydrophobic and hydrophilic side chains (Figure 5). The β staircase gives a graphical display of the disposition of amino acid side chains about an assumed β strand (Figure 6). From the structure determined for the target, peptides lying in β sheet regions are preferably selected, avoiding peptides located in helical regions, in which a number of amino acids are hydrophobic in nature (Figure 7). Analyses of the crystal structures of Ag-Ab complexes show that, in order to be recognized by antibodies, residues must be accessible for interactions and thus be present on the surface of antigens.

We also found the SVM-based MHCII-IAb peptide regions 51-PTINHPTFV, 113-PLPKFDSTV, 187-VYSKDDALE, 181-RKYAVLVYS (optimal score 1.034); MHCII-IAd peptide regions 138-AISAMFADG, 170-LSAMRADIG, 25-PSSADANFR, 191-DDALETDEL (optimal score 0.541); MHCII-IAg7 peptide regions 27-SADANFRVL, 151-LVYQYAASG, 159-GVQANNKLL, 158-SGVQANNKL (optimal score 1.692); and MHCII-RT1.B peptide regions 57-TFVGSERCR, 188-YSKDDALET, 68-YTFTSITLK, 44-KTLAAGRPT (optimal score 0.787), which represent predicted binders from genome polyprotein M (Table 1). The predicted binding affinity is normalized by the 1% fractile.

In favorable cases, just knowing the structure of a particular protein may also provide considerable insight into its possible function. Although the information derived from modeling studies is primarily about molecular function, protein structure data also provide a wealth of information on mechanisms linked to function and on the evolutionary history of, and relationships between, macromolecules, and they facilitate comparative analysis involving three-dimensional structure.

Figure 5. Graphical presentation of helical wheel model of genome polyprotein M.

Figure 6. Graphical presentation of beta staircase model of genome polyprotein M.



Figure 7. Structure of genome polyprotein M predicted using DS Viewer (Accelrys) based protein modelling software. It is a solid ribbon structure with PDB ID 2BFU.

V. Conclusion

Genome polyprotein M is named for its primary function: to encapsidate viral genomic nucleic acids. Antigenic epitopes of genome polyprotein M are important determinants against various plant diseases. As knowledge of the immune responses to a protein antigen progressed, it became clear that the whole protein is not necessary for raising the immune response; small segments of the protein, called antigenic determinants or epitopes, are sufficient for eliciting the desired immune response. The structure analysis method allows potential drug targets to be identified. These regions are antigenic in nature and form antibodies against plant diseases, and they should be tried on gel separation to obtain a pure form for primer prediction. Recent findings show that peptides presented in a particulate form result in enhanced immune responses.

References

Altmann F, Lomonossoff GP (2000) Glycosylation of the capsid proteins of cowpea mosaic virus: a reinvestigation shows the absence of sugar residues. J. Gen. Virology 81, 1111-1114.

Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B,

Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin

MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS (2005)

The Universal Protein Resource (UniProt). Nucleic Acids

Res 33, D154-159.

Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B,

Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin

MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS (2000)

The Protein Information Resource (PIR). Nucleic Acids Res

28, 263-266.

Bateman A, Birney E, Durbin R, Eddy SR, Howe KL,

Sonnhammer EL (2000) The Pfam protein families’

database. Nucleic Acids Res 28, 263-266.

Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler

DL (2003) GenBank. Nucleic Acids Res 31, 23.

Canizares MC, Taylor KM, Lomonossoff GP (2004) Surface-

exposed C-terminal amino acids of the small coat protein of

Cowpea mosaic virus are required for suppression of

silencing. J Gen Virology 85, 3431-3435.

Callaway A, Giesman-Cookmeyer D, Gillock ET, Sit TL,

Lommel SA (2001) The multifunctional capsid proteins of

plant RNA viruses. Annu Rev Phytopathology 39, 419-60.

Carvalho CM, Pouwels J, van Lent JW, Bisseling T, Goldbach

RW and Wellink J (2004) The movement protein of cowpea

mosaic virus binds GTP and single-stranded nucleic acid in

vitro. J. Virology 78, 1591-1594.

Goldbach RW, Wellink J (1996) Comoviruses: molecular

biology and replication. In The Plant Viruses, vol. 5,

Polyhedral Virions and Bipartite RNA Genomes. 35-76.

Edited by B. D. Harrison & A. F. Murant. New York:

Plenum.

Gomase VS (2006) Prediction of Antigenic Epitopes of

Neurotoxin Bmbktx1 from Mesobuthus martensii. Curr

Drug Discov Technol 3, 225-229.

Gomase VS, Changbhale SS (2007) Antigenicity Prediction in

Melittin: Possibilities of in Drug Development from Apis

dorsata. Current Proteomics 4, 107-114.

Gomase VS, Kale KV, Jyothiraj A, Vasanthi R (2007) Automatic

Modeling of Protein 3d Structure Polg_Prsvw Protein.

Medicinal Chemistry Research 15, 213.

Gomase VS, Kale KV, Chikhale NJ, Changbhale SS (2007)

Prediction of MHC Binding Peptides and Epitopes from

Alfalfa mosaic virus. Current Drug Discovery Technology

4, 117-121.

Gracy J, Argos P (1998) Automated protein sequence database

classification. I. Integration of compositional similarity

search, local similarity search, and multiple sequence

alignment. Bioinformatics 14, 164-173.

IsHak JA, Kreuze JF, Johansson A, Mukasa SB, Tairo F, Abo El-

Abbas FM, Valkonen JP (2003) Some molecular

characteristics of three viruses from SPVD-affected sweet

potato plants in Egypt. Arch Virology 148, 2449-60.

Kaplan IB, Zhang L, Palukaitis P (1998) Characterization of

cucumber mosaic virus. V. Cell-to-cell movement requires

capsid protein but not virions. Virology 246, 221-231.

Parker KC, Bednarek MA, Coligan JE (1994) Scheme for

ranking potential HLA-A2 binding peptides based on

independent binding of individual peptide side-chains. J.

Immunology 152, 163-75.



Pouwels J, Carette JE, van Lent J, Wellink J (2002) Cowpea

mosaic virus: effect on host cell processes. Mol Plant

Pathology 3, 411-418.

Shintaku M, Palukaitis P (1990) Genetic mapping of Cucumber

mosaic virus. Viral genes and plant pathogenesis, 156-164.

Suzuki M, Kuwata S, Kataoka J, Masuta C, Nitta N, Takanami Y

(1991) Functional analysis of deletion mutants of cucumber

mosaic virus RNA3 using an in vitro transcription system.

Virology 183, 106-113.

Taylor KM, Spall VE, Butler PJ, Lomonossoff GP (1999) The

cleavable carboxyl-terminus of the small coat protein of

cowpea mosaic virus is involved in RNA encapsidation.

Virology 255, 129-137.

Thornton JM, Edwards MS, Taylor WR, Barlow DJ (1986)

Location of 'continuous' antigenic determinants in the

protruding regions of proteins. EMBO J 5, 409-413.

Welling GW, Weijer WJ, van der Zee R, Welling-Wester S

(1985) Prediction of sequential antigenic regions in proteins.

FEBS Lett 188, 215-218.

Wezenbeek P, Verver J, Harmsen J, Vos P, van Kammen A

(1983) Primary structure and gene organization of the

middle-component RNA of cowpea mosaic virus. EMBO J

2, 941-946.

