Gene Therapy and Molecular Biology Vol 12, page 87
Gene Ther Mol Biol Vol 12, 87-94, 2008
Development of MHC class nonamers from Cowpea mosaic viral protein
Research Article
Virendra S Gomase1,*, Karbhari V Kale2
1Department of Bioinformatics, Padmashree Dr. D.Y. Patil University, Navi Mumbai, 400614, India
2Department of Computer Science and Information Technology, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, 431004 (MS), India
*Correspondence: Virendra S. Gomase, Department of Computer Science and Information Technology, Dr. Babasaheb Ambedkar
Marathwada University, Aurangabad, 431004 (MS), India; mobile: +91-9226960668; e-mail: [email protected]
Key words: Genome polyprotein, Epitopes, MHC, SVM, Crystal Structure, Hydrophilicity, Hydrophobicity
Abbreviations: Cowpea mosaic virus (CPMV); instability index (II); support vector machine (SVM)
Received: 6 June 2008; Revised: 9 June 2008
Accepted: 10 June 2008; electronically published: July 2008
Summary
Cowpea mosaic virus causes one of the most commonly reported virus diseases of cowpea (Vigna unguiculata), in which it produces chlorotic spots with diffuse borders in inoculated primary leaves. Cowpea mosaic viral peptides are well suited to subunit vaccine development because a single epitope can generate an immune response in a large population. Peptide binders identified through this approach tend to be high-efficiency binders, meaning that a larger percentage of their atoms are directly involved in binding than is the case for larger molecules. To develop an MHC binder prediction method, we used a support vector machine (SVM), an elegant machine learning technique, trained on binary input of single amino acid sequences. We found the SVM-based MHCII-IAb peptide regions 51-PTINHPTFV, 113-PLPKFDSTV, 187-VYSKDDALE and 181-RKYAVLVYS (optimal score 1.034); MHCII-IAd peptide regions 138-AISAMFADG, 170-LSAMRADIG, 25-PSSADANFR and 191-DDALETDEL (optimal score 0.541); MHCII-IAg7 peptide regions 27-SADANFRVL, 151-LVYQYAASG, 159-GVQANNKLL and 158-SGVQANNKL (optimal score 1.692); and MHCII-RT1.B peptide regions 57-TFVGSERCR, 188-YSKDDALET, 68-YTFTSITLK and 44-KTLAAGRPT (optimal score 0.787), which represent predicted binders from genome polyprotein M. These antigenic epitopes are sufficient to elicit the desired immune response against viral infection. This study focused on a computational approach to identifying the antigenic peptide fragments of genome polyprotein M for synthetic peptide viral vaccines and to deciphering their function. The predicted antigenic epitopes of genome polyprotein M are predicted to offer a successful immunization strategy against various diseases.
I. Introduction: Cowpea mosaic virus
Cowpea mosaic virus (CPMV) is a plant virus that belongs to the genus Comovirus of the family Comoviridae (Goldbach and Wellink, 1996; Pouwels et al, 2002). Cowpea mosaic virus, sometimes referred to as cowpea stunt virus, causes foliage to turn yellowish green with areas of light and dark green tissue. Infected leaves are often stunted and frequently have a puckered appearance. CPMV causes one of the most commonly reported virus diseases of cowpea (Vigna unguiculata), in which it produces chlorotic spots with diffuse borders in inoculated primary leaves. Trifoliate leaves develop a bright yellow or light green mosaic of increasing severity in younger leaves. Susceptible host species are Datura stramonium, Glycine max, Gomphrena globosa, Nicotiana tabacum, Phaseolus vulgaris, Pisum sativum, Spinacia oleracea, Vicia faba, Vigna angularis, Vigna radiata and Vigna unguiculata. Capsid proteins are named for their primary function: to encapsidate viral genomic nucleic acids. However, encapsidation is only one feature of an extremely diverse array of structural, functional and ecological roles played during viral infection and spread (Callaway et al, 2001). The capsid protein is multifunctional; in addition to its role in encapsidation, it affects virus movement in plants (Suzuki et al, 1991; Kaplan et al, 1998), transmission, symptom expression and host range (Shintaku and Palukaitis, 1990). Bioinformatics is increasingly used to support target validation by providing functionally predictive information mined from databases and experimental datasets using a variety of computational tools. The predictive power of these bioinformatics approaches is strongest when information from several techniques is combined, including experimental confirmation of protein antigenicity predictions (Gomase and Changbhale, 2007; Gomase et al, 2007).
II. Methodology
1. Database searching
Protein sequence databases are used to store the vast amount of information issuing from the genome projects (Gracy and Argos, 1998; Bateman et al, 2000). Many different types of databases are available, but for routine protein sequence analysis the primary and secondary databases, such as GenBank and UniProt, are initially the most important (Barker et al, 2000; Benson, 2003; Bairoch et al, 2005). We analysed the protein sequence of the viral genome polyprotein M (Wezenbeek, 1983; Taylor, 1999; Altmann and Lomonossoff, 2000; Canizares, 2004; Carvalho, 2004).
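Sequences retrieved from these databases are commonly exported in FASTA format, with the sequence wrapped over several lines, as in the polyprotein M listing in the Results. The sketch below, which is illustrative and not the authors' pipeline, joins wrapped lines into one sequence per header and flags any non-standard residue letters before further analysis. The header text in the usage note is hypothetical.

```python
# Standard one-letter codes for the 20 proteinogenic amino acids.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines.

    Sequence lines that appear before the first '>' header are ignored.
    """
    records = {}
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def non_standard_residues(seq):
    """Return the sorted list of characters in seq that are not standard amino acids."""
    return sorted(set(seq) - STANDARD_AA)
```

For example, `read_fasta(">polyprotein_M_fragment\nMFSFTEAKSK\nISLWTRSAAP\n")` yields a single record whose sequence is the two wrapped lines joined together.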
2. Prediction of antigenicity
The antigenicity prediction program identifies the segments of viral genome polyprotein M that are likely to be antigenic, that is, likely to elicit an antibody response. Antigenic epitopes were determined using the Hopp and Woods, Welling, Protrusion Index (Thornton) and Parker antigenicity methods (Welling et al, 1985; Thornton et al, 1986; Parker et al, 1994; IsHak et al, 2003; Gomase, 2006). Predictions are based on a table that reflects the occurrence of amino acid residues in experimentally known segmental epitopes.
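All four antigenicity methods share the same mechanics: each residue is assigned a value from a per-residue scale, the values are averaged over a sliding window, and peaks in the averaged profile mark candidate epitopes. The sketch below shows this with the Hopp-Woods hydrophilicity values (as tabulated, for example, by ExPASy ProtScale). The six-residue window is a commonly used choice with this scale and is an adjustable assumption here, and the code is not the software the authors ran.

```python
# Hopp-Woods hydrophilicity value for each amino acid.
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def window_profile(seq, scale, window=6):
    """Mean scale value of every window; returns (1-based start, mean) pairs."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    values = [scale[aa] for aa in seq]
    profile = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        profile.append((start + 1, mean))
    return profile


def most_hydrophilic(seq, scale=HOPP_WOODS, window=6):
    """Window with the highest local mean hydrophilicity (first one on ties)."""
    return max(window_profile(seq, scale, window), key=lambda item: item[1])
```

Swapping `HOPP_WOODS` for another per-residue table (Welling, Parker or a protrusion-based scale) reuses the same windowing code, which is why the four plots in Figures 1-4 look structurally alike.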
III. Results and Interpretations
The genome polyprotein M sequence is 1046 amino acid residues long:
MFSFTEAKSKISLWTRSAAPLNNVYLSYSCRCGLGK
RKLAGGCCSAPYITCYDSADFRRVQYLYFCLTRYC
CLYFFLLLLADWFYKKSSIFFETEFSRGFRTWRKIVK
LLYILPKFEMESIMSRGIPSGILEEKAIQFKRAKEGNK
PLKDEIPKPEDMYVSHTSKWNVLRKMSQKTVDLSK
AAAGMGFINKHMLTGNILAQPTTVLDIPVTKDKTL
AMASDFIRKENLKTSAIHIGAIEIIIQSFASPESDLMG
GFLLVDSLHTDTANAIRSIFVAPMRGGRPVRVVTFP
NTLAPVSCDLNNRFKLICSLPNCDIVQGSQVAEVSV
NVAGCATSIEKSHTPSQLYTEEFEKEGAVVVEYLGR
QTYCAQPSNLPTEEKLRSLKFDFHVEQPSVLKLSNS
CNAHFVKGESLKYSISGKEAENHAVHATVVSREGA
SAAPKQYDPILGRVLDPRNGNVAFPQMEQNLFALS
LDDTSSVRGSLLDTKFAQTRVLLSKAMAGGDVLLD
EYLYDVVNGQDFRATVAFLRTHVITGKIKVTATTNI
SDNSGCCLMLAINSGVRGKYSTDVYTICSQDSMTW
NPGCKKNFSFTFNPNPCGDSWSAEMISRSRVRMTVI
CVSGWTLSPTTDVIAKLDWSIVNEKCEPTIYHLADC
QNWLPLNRWMGKLTFPQGVTSEVRRMPLSIGGGA
GATQAFLANMPNSWISMWRYFRGELHFEVTKMSSP
YIKATVTFLIAFGNLSDAFGFYESFPHRIVQFAEVEE
KCTLVFSQQEFVTAWSTQVNPRTTLEADGCPYLYAI
IHDSTTGTISGDFNLGVKLVGIKDFCGIGSNPGIDGS
RLLGAIAQGPVCAEASDVYSPCMIASTPPAPFSDVT
AVTFDLINGKITPVGDDNWNTHIYNPPIMNVLRTAA
WKSGTIHVQLNVRGAGVKRADWDGQVFVYLRQS
MNPESYDARTFVISQPGSAMLNFSFDIIGPNSGFEFA
ESPWANQTTWYLECVATNPRQIQQFEVNMRFDPNF
RVAGNILMPPFPLSTETPPLLKFRFRDIERSKRSVMV
GHTATAA
Genome polyprotein M is 1046 amino acid residues long; the organism is cowpea mosaic virus, and the lineage is Viruses; ssRNA positive-strand viruses, no DNA stage; Comoviridae; Comovirus. Genome polyprotein M has a molecular weight of 116218.2 daltons and a theoretical pI of 8.36. The atomic composition of genome polyprotein M of cowpea mosaic virus is C5207H8091N1389O1517S56, with a total of 16260 atoms. The instability index (II) is computed to be 39.46; because this value is below 40, genome polyprotein M is classified as stable. The aliphatic index of genome polyprotein M is 81.29, and its grand average of hydropathicity (GRAVY) is -0.092 (Gasteiger et al, 2005).
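Two of these indices follow from simple formulas: GRAVY is the mean Kyte-Doolittle hydropathy over all residues, and the aliphatic index (Ikai) is X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), with X in mole percent, as in ExPASy ProtParam. The sketch below implements both, plus the ProtParam rule that an instability index below 40 predicts a stable protein. The instability index itself needs a 400-entry dipeptide weight table that is not reproduced here, so only the classification step is shown; this is an illustration, not the ProtParam code.

```python
# Kyte-Doolittle hydropathy values used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index: X(A) + 2.9*X(V) + 3.9*(X(I) + X(L)), X in mole percent."""
    n = len(seq)

    def mole_pct(aa):
        return 100.0 * seq.count(aa) / n

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))


def stability_class(instability_index):
    """ProtParam convention: instability index < 40 means 'stable'."""
    return "stable" if instability_index < 40 else "unstable"
```

Applied to the reported value, `stability_class(39.46)` returns `"stable"`, matching the classification in the text.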
A. Analysis of genome polyprotein M
Percent hydrophilic amino acids: 49.7132
Percent hydrophobic amino acids: 50.2868
Ratio of % hydrophilic to % hydrophobic: 0.988593
Mean β hydrophobic moment: 0.207361
Mean helix hydrophobic moment: 0.168268
Number of basic amino acids: 105
Number of acidic amino acids: 97
Estimated pI for protein: 8.9
Total linear charge density: 0.195029
Polar area of extended chain (Å²): 66160.3
Non-polar area of extended chain (Å²): 117985.0
Total area of extended chain (Å²): 184145.0
Polar ASA of folded protein (Å²): 13198.0
Non-polar ASA of folded protein (Å²): 17611.9
ASA of folded protein (Å²): 30809.8
Ratio of folded protein area to extended area: 0.179129
Buried polar area of folded protein (Å²): 49208.9
Buried non-polar area of folded protein (Å²): 85764.2
Buried charged area of folded protein (Å²): 5632.88
Total buried surface (Å²): 140597.0
Number of buried amino acids: 541
Packing volume, estimated (Å³): 141932.0
Packing volume, actual (Å³): 138932.0
Interior volume of protein: 102030.0
Exterior volume of protein: 36901.4
Partial specific volume (mL/g): 0.726493
Fisher volume ratio (actual): 0.361671
Protein solubility: 1.38252
Solvent free energy of folding (kcal/mol): -1019.52
Total number of negatively charged residues (Asp + Glu): 97
Total number of positively charged residues (Arg + Lys): 105
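The composition figures above are counts and ratios over the sequence; for instance, the reported ratio is simply 49.7132 / 50.2868 ≈ 0.9886. The sketch below computes the same kind of summary. The charged-residue definitions (Asp + Glu, Arg + Lys) come from the list above, but the hydrophobic/hydrophilic split is an assumption of this sketch, so it will not necessarily reproduce the exact percentages of the tool the authors used.

```python
# Assumed hydrophobic set; every other residue is counted as hydrophilic.
HYDROPHOBIC = set("ACFGILMPVWY")


def composition_summary(seq):
    """Hydrophilic/hydrophobic percentages, their ratio, and charged-residue counts."""
    n = len(seq)
    hydrophobic = sum(aa in HYDROPHOBIC for aa in seq)
    hydrophilic = n - hydrophobic
    return {
        "pct_hydrophilic": 100.0 * hydrophilic / n,
        "pct_hydrophobic": 100.0 * hydrophobic / n,
        "ratio_hydrophilic_to_hydrophobic": (
            hydrophilic / hydrophobic if hydrophobic else float("inf")
        ),
        "negative_asp_glu": seq.count("D") + seq.count("E"),
        "positive_arg_lys": seq.count("R") + seq.count("K"),
    }
```

Because the ratio is taken from counts rather than rounded percentages, it stays exact even when the percentages are printed to only a few decimals.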
B. Prediction of antigenic peptides
In these methods, we located the antigenic determinants by finding the areas of greatest local hydrophilicity. The Hopp-Woods scale was designed to predict the locations of antigenic determinants in a protein, assuming that antigenic determinants are exposed on the surface of the protein and are therefore located in hydrophilic regions. Its values are derived from the transfer free energies of amino acid side chains between ethanol and water. The Welling antigenicity plot assigns each amino acid a value equal to the log of the quotient between its percentage in a sample of known antigenic regions and its percentage in average proteins (Figures 1-4).
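The Welling value is a log-ratio: residues over-represented in known antigenic regions score above zero, and under-represented residues score below zero. The sketch below computes such a scale from two frequency tables. The base-10 logarithm is an assumption of this sketch, and the frequencies in any usage would have to come from real epitope and background datasets; none are reproduced here.

```python
import math


def welling_value(pct_in_antigenic, pct_in_average):
    """log10 of (% of a residue in antigenic regions) / (% in average proteins)."""
    if pct_in_antigenic <= 0 or pct_in_average <= 0:
        raise ValueError("percentages must be positive")
    return math.log10(pct_in_antigenic / pct_in_average)


def welling_scale(antigenic_pct, average_pct):
    """Build a per-residue scale from two {residue: percentage} tables."""
    return {aa: welling_value(antigenic_pct[aa], average_pct[aa]) for aa in average_pct}
```

The resulting dictionary can be averaged over a sliding window exactly like a hydrophilicity scale to produce a plot such as Figure 2.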
Figure 1. Hopp-Woods antigenicity plot of genome polyprotein M.
Figure 2. Welling antigenicity plot of genome polyprotein M.
Figure 3. Protrusion Index (Thornton) antigenicity plot of genome polyprotein M.
Figure 4. Parker antigenicity plot of genome polyprotein M.
C. Helical wheel
The helical wheel command graphically displays the disposition of amino acid side chains about an assumed alpha helix. The view is always along the central axis of the helix, from the N- to the C-terminus. The helical wheel is an effective method for displaying the symmetry of hydrophobic and hydrophilic side chains. It assumes a periodicity of 3.6 residues per helical turn, so successive residues are placed 100° apart (360°/3.6). Individual residues, represented as colored circles, are placed successively at each node of the helix. Multiple turns of the helix are represented by "radiating" the spiral outward from the helix center, and the interconnecting bars indicate the residue arrangement along the helix. Residue shading is assigned on the basis of a property or of the degree of hydrophobicity.
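The wheel geometry translates directly into numbers: residue n sits at n × 100° around the axis, and weighting each residue's direction by its hydrophobicity and summing gives the mean helix hydrophobic moment, the quantity reported in the analysis list. The sketch below uses the Kyte-Doolittle scale as an assumption (the scale behind the reported 0.168268 is not stated, and Eisenberg's consensus scale is a common alternative), so its values are not directly comparable with the table.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def wheel_angles(seq, degrees_per_residue=100.0):
    """Angle (degrees, in [0, 360)) of each residue on the helical wheel."""
    return [(i * degrees_per_residue) % 360.0 for i in range(len(seq))]


def hydrophobic_moment(seq, scale, degrees_per_residue=100.0):
    """Mean hydrophobic moment |sum_n h_n * exp(i * n * delta)| / N."""
    if not seq:
        raise ValueError("sequence must not be empty")
    x = 0.0
    y = 0.0
    for n, aa in enumerate(seq):
        angle = math.radians(n * degrees_per_residue)
        x += scale[aa] * math.cos(angle)
        y += scale[aa] * math.sin(angle)
    return math.hypot(x, y) / len(seq)
```

A uniform 18-residue stretch fills exactly five turns (18 × 100° = 1800°), so its directions cancel and the moment is essentially zero; an amphipathic helix, with hydrophobic residues clustered on one face, gives a large moment.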
D. β staircase
The β staircase is a graphical display of the disposition of amino acid side chains about an assumed β strand. The view is always along the central axis of the β strand. The β staircase is an effective method for displaying the symmetry of hydrophobic and hydrophilic side chains, and it mimics the right-handed super twist found in most β strands.
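In an extended β strand, consecutive side chains point to opposite sides (roughly 180° apart), so the two faces are simply the even- and odd-numbered residues. The sketch below splits a strand this way and compares the mean hydropathy of the two faces. It ignores the super twist that the staircase display models and uses the Kyte-Doolittle scale as an assumption.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def strand_faces(seq):
    """Split an idealized (untwisted) beta strand into its two alternating faces."""
    return seq[0::2], seq[1::2]


def face_hydropathy(seq, scale):
    """Mean hydropathy of each face; a large gap suggests an amphipathic strand."""
    if len(seq) < 2:
        raise ValueError("need at least two residues to define two faces")
    face_a, face_b = strand_faces(seq)
    mean_a = sum(scale[aa] for aa in face_a) / len(face_a)
    mean_b = sum(scale[aa] for aa in face_b) / len(face_b)
    return mean_a, mean_b
```

For the alternating strand `VKVKVK`, one face is entirely hydrophobic (Val) and the other entirely charged (Lys), which is the kind of asymmetry a β staircase makes visible.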
E. Protein modeling
A purified protein of the chosen target is generated for analysis, and the structure of the target is determined experimentally. The structure is then analyzed in molecular modeling software to evaluate its similarity to known protein structures and to determine possible relationships identifiable from the protein sequence alone. The target structure also serves as a detailed model for determining the structure of peptides within that protein.
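A standard way to quantify the similarity between two structures is the root-mean-square deviation (RMSD) of matched atoms, such as Cα atoms, after optimal superposition. The sketch below implements the Kabsch algorithm with NumPy. It assumes that the residue-to-residue correspondence is already known and that coordinates have been extracted (it does not parse PDB files), and it is not the method used by the modeling software cited in the text.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    # Rotate P onto Q and measure the remaining deviation.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Because translation and rotation are factored out, a rigidly moved copy of a structure scores an RMSD of zero, and any remaining value reflects genuine differences in shape.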
F. MHC binding peptides
These MHC binding peptides are sufficient to elicit the desired immune response. Predicting the binding affinity of peptides toward the TAP transporter, and hence identifying TAP-binding peptides, is crucial for identifying MHC class I restricted T cell epitopes. The prediction is based on a cascade support vector machine that uses the sequence and properties of the amino acids; a correlation coefficient of 0.88 was obtained in a jack-knife validation test. In this test, we found the MHC I and MHC II binding regions (Table 1). MHC molecules are cell surface glycoproteins that take an active part in host immune reactions, and MHC class I and MHC class II are involved in the response to almost all antigens. In this assay, we predicted the binding affinity of genome polyprotein M, which shows different nonamers (Table 1). To develop the MHC binder prediction method, we used a support vector machine (SVM), an elegant machine learning technique, trained on binary input of single amino acid sequences. We found the SVM-based MHCII-IAb peptide regions 51-PTINHPTFV, 113-PLPKFDSTV, 187-VYSKDDALE and 181-RKYAVLVYS (optimal score 1.034); MHCII-IAd peptide regions 138-AISAMFADG, 170-LSAMRADIG, 25-PSSADANFR and 191-DDALETDEL (optimal score 0.541); MHCII-IAg7 peptide regions 27-SADANFRVL, 151-LVYQYAASG, 159-GVQANNKLL and 158-SGVQANNKL (optimal score 1.692); and MHCII-RT1.B peptide regions 57-TFVGSERCR, 188-YSKDDALET, 68-YTFTSITLK and 44-KTLAAGRPT (optimal score 0.787), which represent predicted binders from genome polyprotein M (Table 1). The predicted binding affinity is normalized by the 1% fractile.
MHC peptide binding is predicted using neural networks trained on the C-termini of known epitopes. In this analysis, the predicted MHC/peptide binding is a log-transformed value related to the IC50 value in nM.
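Neural-network predictors in the NetMHC family commonly map an IC50 (nM) to a score between 0 and 1 via 1 - log(IC50)/log(50000), so that stronger binders (lower IC50) score higher. The sketch below implements this transform and its inverse. Whether the tool used here applies exactly this 50,000 nM cap is an assumption.

```python
import math

MAX_IC50_NM = 50000.0  # assumed cap, as used by NetMHC-style predictors


def ic50_to_score(ic50_nm, max_ic50=MAX_IC50_NM):
    """Log-transform an IC50 (nM) to 1 - log(IC50)/log(max_ic50), clipped to [0, 1]."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    score = 1.0 - math.log(ic50_nm) / math.log(max_ic50)
    return min(1.0, max(0.0, score))


def score_to_ic50(score, max_ic50=MAX_IC50_NM):
    """Inverse transform: IC50 (nM) = max_ic50 ** (1 - score)."""
    return max_ic50 ** (1.0 - score)
```

The logarithm compresses affinities spanning several orders of magnitude into a bounded range, which makes them easier for a neural network to learn.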
Table 1. SVM-based MHC-peptide binding nonamers in the genome polyprotein M sequence.
Allele  Rank  Sequence   Residue No.  Peptide score
I-Ab    1     PTINHPTFV  51           1.034
I-Ab    2     PLPKFDSTV  113          0.913
I-Ab    3     VYSKDDALE  187          0.890
I-Ab    4     RKYAVLVYS  181          0.874
I-Ad    1     AISAMFADG  138          0.541
I-Ad    2     LSAMRADIG  170          0.529
I-Ad    3     PSSADANFR  25           0.526
I-Ad    4     DDALETDEL  191          0.457
RT1.B   1     TFVGSERCR  57           0.787
RT1.B   2     YSKDDALET  188          0.710
RT1.B   3     YTFTSITLK  68           0.575
RT1.B   4     KTLAAGRPT  44           0.496
I-Ag7   1     SADANFRVL  27           1.692
I-Ag7   2     LVYQYAASG  151          1.625
I-Ag7   3     GVQANNKLL  159          1.583
I-Ag7   4     SGVQANNKL  158          1.523
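The "optimal score" quoted for each allele in the text is simply the top-ranked entry of Table 1. The sketch below transcribes the table rows and recovers the best nonamer per allele, which is a quick consistency check between the table and the prose.

```python
# Rows transcribed from Table 1: (allele, rank, sequence, residue no., score).
TABLE_1 = [
    ("I-Ab", 1, "PTINHPTFV", 51, 1.034),
    ("I-Ab", 2, "PLPKFDSTV", 113, 0.913),
    ("I-Ab", 3, "VYSKDDALE", 187, 0.890),
    ("I-Ab", 4, "RKYAVLVYS", 181, 0.874),
    ("I-Ad", 1, "AISAMFADG", 138, 0.541),
    ("I-Ad", 2, "LSAMRADIG", 170, 0.529),
    ("I-Ad", 3, "PSSADANFR", 25, 0.526),
    ("I-Ad", 4, "DDALETDEL", 191, 0.457),
    ("RT1.B", 1, "TFVGSERCR", 57, 0.787),
    ("RT1.B", 2, "YSKDDALET", 188, 0.710),
    ("RT1.B", 3, "YTFTSITLK", 68, 0.575),
    ("RT1.B", 4, "KTLAAGRPT", 44, 0.496),
    ("I-Ag7", 1, "SADANFRVL", 27, 1.692),
    ("I-Ag7", 2, "LVYQYAASG", 151, 1.625),
    ("I-Ag7", 3, "GVQANNKLL", 159, 1.583),
    ("I-Ag7", 4, "SGVQANNKL", 158, 1.523),
]


def best_per_allele(rows):
    """Map each allele to (sequence, residue no., score) of its highest-scoring nonamer."""
    best = {}
    for allele, _rank, peptide, start, score in rows:
        if allele not in best or score > best[allele][2]:
            best[allele] = (peptide, start, score)
    return best
```

Running `best_per_allele(TABLE_1)` returns the rank-1 peptide of every allele, and its scores (1.034, 0.541, 1.692 and 0.787) match the optimal scores stated in the text.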
IV. Discussion
Antigenic epitopes were determined using the Hopp and Woods, Welling, Protrusion Index (Thornton) and Parker antigenicity methods, and we found genome polyprotein M to be highly antigenic (Figures 1-4). The helical wheel shows the symmetry of hydrophobic and hydrophilic side chains (Figure 5), and the β staircase displays the disposition of amino acid side chains about an assumed β strand (Figure 6). Based on the structure of the target, peptides lying in β-sheet regions were preferentially selected, while peptides located in helical regions, in which a number of amino acids are hydrophobic, were avoided (Figure 7). Analyses of the crystal structures of antigen-antibody (Ag-Ab) complexes show that, in order to be recognized by antibodies, residues must be accessible for interaction and thus be present on the surface of the antigen. The SVM-based predictions identified binders from genome polyprotein M for MHCII-IAb (optimal score 1.034, 51-PTINHPTFV), MHCII-IAd (optimal score 0.541, 138-AISAMFADG), MHCII-IAg7 (optimal score 1.692, 27-SADANFRVL) and MHCII-RT1.B (optimal score 0.787, 57-TFVGSERCR) (Table 1). The predicted binding affinity is normalized by the 1% fractile.
In favorable cases, knowing the structure of a particular protein may also provide considerable insight into its possible function. Although the information derived from modeling studies is primarily about molecular function, protein structure data also provide a wealth of information on the mechanisms linked to that function and on the evolutionary history of, and relationships between, macromolecules, and they facilitate comparative analyses involving three-dimensional structure.
Figure 5. Graphical presentation
of helical wheel model of genome
polyprotein M.
Figure 6. Graphical presentation
of beta staircase model of genome
polyprotein M.
Figure 7. Structure of genome polyprotein M predicted using DS Viewer (Accelrys) protein modelling software, shown as a solid ribbon (PDB ID 2BFU).
V. Conclusion
The capsid proteins derived from genome polyprotein M are named for their primary function: to encapsidate viral genomic nucleic acids. Antigenic epitopes of genome polyprotein M are important determinants in the defence against various plant diseases. As knowledge of the immune response to protein antigens progressed, it became clear that the whole protein is not necessary for raising an immune response; small segments of the protein, called antigenic determinants or epitopes, are sufficient to elicit the desired immune response. Structural analysis allows potential drug targets to be identified. These regions are antigenic in nature and may induce antibodies against plant diseases; gel separation should be tried to obtain them in pure form for primer prediction. Recent findings show that peptides presented in a particulate form result in enhanced immune responses.
References
Altmann F, Lomonossoff GP (2000) Glycosylation of the capsid
proteins of cowpea mosaic virus: a reinvestigation shows the
absence of sugar residues. J. Gen. Virology 81, 1111-1114.
Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B,
Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin
MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS (2005)
The Universal Protein Resource (UniProt). Nucleic Acids
Res 33, D154-159.
Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B,
Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin
MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS (2000)
The Protein Information Resource (PIR). Nucleic Acids Res
28, 263-266.
Bateman A, Birney E, Durbin R, Eddy SR, Howe KL,
Sonnhammer EL (2000) The Pfam protein families’
database. Nucleic Acids Res 28, 263-266.
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler
DL (2003) GenBank. Nucleic Acids Res 31, 23.
Canizares MC, Taylor KM, Lomonossoff GP (2004) Surface-
exposed C-terminal amino acids of the small coat protein of
Cowpea mosaic virus are required for suppression of
silencing. J Gen Virology 85, 3431-3435.
Callaway A, Giesman-Cookmeyer D, Gillock ET, Sit TL,
Lommel SA (2001) The multifunctional capsid proteins of
plant RNA viruses. Annu Rev Phytopathology 39, 419-60.
Carvalho CM, Pouwels J, van Lent JW, Bisseling T, Goldbach
RW and Wellink J (2004) The movement protein of cowpea
mosaic virus binds GTP and single-stranded nucleic acid in
vitro. J. Virology 78, 1591-1594.
Goldbach RW, Wellink J (1996) Comoviruses: molecular
biology and replication. In The Plant Viruses, vol. 5,
Polyhedral Virions and Bipartite RNA Genomes. 35-76.
Edited by B. D. Harrison & A. F. Murant. New York:
Plenum.
Gomase VS (2006) Prediction of Antigenic Epitopes of
Neurotoxin Bmbktx1 from Mesobuthus martensii. Curr
Drug Discov Technol 3, 225-229.
Gomase VS, Changbhale SS (2007) Antigenicity Prediction in
Melittin: Possibilities of in Drug Development from Apis
dorsata. Current Proteomics 4, 107-114.
Gomase VS, Kale KV, Jyothiraj A, Vasanthi R (2007) Automatic
Modeling of Protein 3d Structure Polg_Prsvw Protein.
Medicinal Chemistry Research 15, 213.
Gomase VS, Kale KV, Chikhale NJ, Changbhale SS (2007)
Prediction of MHC Binding Peptides and Epitopes from
Alfalfa mosaic virus. Current Drug Discovery Technology
4, 117-121.
Gracy J, Argos P (1998) Automated protein sequence database
classification. I. Integration of compositional similarity
search, local similarity search, and multiple sequence
alignment. Bioinformatics 14, 164-173.
IsHak JA, Kreuze JF, Johansson A, Mukasa SB, Tairo F, Abo El-
Abbas FM, Valkonen JP (2003) Some molecular
characteristics of three viruses from SPVD-affected sweet
potato plants in Egypt. Arch Virology 148, 2449-60.
Kaplan IB, Zhang L, Palukaitis P (1998) Characterization of
cucumber mosaic virus. V. Cell-to-cell movement requires
capsid protein but not virions. Virology 246, 221-231.
Parker KC, Bednarek MA, Coligan JE (1994) Scheme for
ranking potential HLA-A2 binding peptides based on
independent binding of individual peptide side-chains. J.
Immunology 152, 163-75.
Pouwels J, Carette JE, van Lent J, Wellink J (2002) Cowpea
mosaic virus: effect on host cell processes. Mol Plant
Pathology 3, 411-418.
Shintaku M, Palukaitis P (1990) Genetic mapping of Cucumber
mosaic virus. Viral genes and plant pathogenesis, 156-164.
Suzuki M, Kuwata S, Kataoka J, Masuta C, Nitta N, Takanami Y
(1991) Functional analysis of deletion mutants of cucumber
mosaic virus RNA3 using an in vitro transcription system.
Virology 183, 106-113.
Taylor KM, Spall VE, Butler PJ, Lomonossoff GP (1999) The
cleavable carboxyl-terminus of the small coat protein of
cowpea mosaic virus is involved in RNA encapsidation.
Virology 255, 129-137.
Thornton JM, Edwards MS, Taylor WR, Barlow DJ (1986)
Location of 'continuous' antigenic determinants in the
protruding regions of proteins. EMBO J 5, 409-413.
Welling GW, Weijer WJ, van der Zee R, Welling-Wester S
(1985) Prediction of sequential antigenic regions in proteins.
FEBS Lett 188, 215-218.
Wezenbeek P, Verver J, Harmsen J, Vos P, van Kammen A
(1983) Primary structure and gene organization of the
middle-component RNA of cowpea mosaic virus. EMBO J
2, 941-946.