Computational energy-based redesign of robust proteins


ARTICLE IN PRESS

CACE-4034; No. of Pages 10

Computers and Chemical Engineering xxx (2010) xxx–xxx

Contents lists available at ScienceDirect

Computers and Chemical Engineering

journal homepage: www.elsevier.com/locate/compchemeng


Giovanni Stracquadanio ∗, Giuseppe Nicosia
Department of Mathematics and Computer Science, University of Catania, Viale A. Doria n. 6, 95125, Catania, Italy

article info

Article history:
Received 19 October 2009
Received in revised form 11 January 2010
Accepted 5 April 2010
Available online xxx

Keywords:
Protein robustness
Robust-protein-design algorithm
Energetic relative entropy

abstract

The robustness of a system is a property that pervades all aspects of Nature. The ability of a system to adapt itself to perturbations due to internal and external agents, to aging, to wear, and to environmental changes is one of the driving forces of evolution. At the molecular level, understanding the robustness of a protein has a great impact on the in silico design of polypeptide chains and drugs; the possibility of computationally checking the ability of a protein to preserve its structure and function in the native state can lead to the design of new compounds that work more effectively in a living cell. Inspired by the well-known robustness analysis framework used in Electronic Design Automation, we introduced a notion of robustness for proteins and two dimensionless quantities: the energetic yield and the energetic relative entropy. We used the energetic yield to quantify the robustness of a protein and to detect sensitive regions and sensitive residues in the protein, whereas we adopted the energetic relative entropy to measure the discrepancy between two potential energy distributions. Subsequently, we implemented a new robustness-centred protein design algorithm called Robust-Protein-Design (RPD); the aim of the algorithm is to discover new conformations with a specific function and high yield values. We performed an extensive characterization of the robustness property of many peptides, proteins, and drugs. Moreover, we found that robustness and relative entropy are conflicting objectives which constitute a trade-off useful as a design principle for new proteins and drugs.
Finally, we used the RPD algorithm on the Crambin protein (1CRN); the obtained results confirm that the algorithm was able to find a Crambin-like protein that is 23% more robust than the wild type.

1. Introduction

In the last twenty years, many computational approaches have been widely applied in biochemistry (Cutello, Narzisi, & Nicosia, 2006; Floudas, 2007; Tramontano, 2006). Much effort has been put into defining effective and efficient folding algorithms (Klepeis & Floudas, 2003; Klepeis, Pieja, & Floudas, 2003; Nicosia & Stracquadanio, 2008) to computationally design new proteins but, to our knowledge, a generally accepted approach to estimate, in silico, the robustness of wild type and synthetic (computer designed) protein structures is still lacking. The estimation of protein robustness is a key point in a robust protein design flow, since it is crucial to estimate how well the structure remains defined under physical mutations; physical mutations may occur at any step of the synthesis process, and at any stage of many bio-chemical processes occurring in a living cell. Since function follows structure in protein science, the estimation of the yield of a protein is crucial in order to measure, statistically, how well it conserves its function under structural mutations (Estrada, 2002).

Please cite this article in press as: Stracquadanio, G., & Nicosia, G. Computational energy-based redesign of robust proteins. Computers and Chemical Engineering (2010), doi:10.1016/j.compchemeng.2010.04.005

∗ Corresponding author. Tel.: +39 095 738 3079; fax: +39 095 330 094.
E-mail addresses: [email protected] (G. Stracquadanio), [email protected] (G. Nicosia).

0098-1354/$ – see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compchemeng.2010.04.005

In the present article, inspired by the current state of the art in Electronic Design Automation (EDA), we define the concepts of energetic yield and energetic relative entropy for proteins; subsequently, we introduce a new statistical analysis to estimate the robustness of the protein structure by using three types of perturbations (global, local and residue) and by defining two algorithms, the Protein Monte-Carlo Sampling algorithm and the Robust-Protein-Design algorithm. The experimental study follows two directions: first, we experimentally assess the effectiveness of the methodology in finding sensitive regions and sensitive residues in proteins; then, we outline a new protein design algorithm called Robust-Protein-Design. In particular, we show how to make a wild type protein more robust through the systematic mutation of faulty and sensitive residues of the structure, in order to design new proteins with a specific function and a more robust conformation.


To assess the discrepancies between mutant and wild type potential energy distributions, we used three relative entropies: the Kullback–Leibler, Rényi and von Neumann divergences. The obtained results show a strong correlation between high yield and low relative entropy values; in fact, we found that a robust protein structure produces a yield increase jointly with a decreasing relative entropy.

Finally, the Robust-Protein-Design algorithm has been applied to two drugs; we show experimentally that it is possible to systematically maximize, in silico, the robustness of these molecules.

2. The robustness design principle

In our work, a perturbation is defined as a function τ = α(Ω, σ), where α applies a stochastic noise σ to the system Ω in order to obtain a trial sample τ; the function α is called an α-perturbation. Without loss of generality, we assume that the noise is drawn from a random distribution.

In order to simulate a statistically meaningful perturbation phenomenon, we generate an ensemble T of perturbed systems. An element τ of the ensemble T is considered robust to a perturbation (mutation) with stochastic noise σ, for a given property φ, if the following robustness condition is verified:

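$$\rho(\Omega, \tau, \phi, \varepsilon) = \begin{cases} 1 & \text{if } |\phi(\Omega) - \phi(\tau)| \le \varepsilon \\ 0 & \text{otherwise} \end{cases} \tag{1}$$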

where Ω is the reference system, φ is a property of the system, τ is a trial sample of the ensemble T, and ε is a robustness threshold. The definition of the ρ function makes no assumption on the property function φ, which can be any function and is not necessarily related to specific features of the system; it only requires that the property under investigation be quantifiable. The robustness of a system Ω is the number of trials in T that are robust with respect to the property φ, divided by the total number of trials |T|; as is customary in EDA, we call this measure the yield of the system. Formally, we define the yield function Γ as follows:

$$\Gamma(\Omega, T, \phi, \varepsilon) = \frac{\sum_{\tau \in T} \rho(\Omega, \tau, \phi, \varepsilon)}{|T|} \tag{2}$$

The function Γ(Ω, T, φ, ε) is a dimensionless quantity that assesses the yield of a given system in general, and of proteins in this research work.
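The robustness condition (1) and the yield (2) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: it assumes the property is a scalar-valued function (for example, a potential energy in kcal/mol) and that the ensemble is given as a list of trial samples; the names `is_robust` and `yield_value` are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")  # any representation of a system or trial sample


def is_robust(reference: S, trial: S, prop: Callable[[S], float], eps: float) -> int:
    """Robustness condition of Eq. (1): 1 if the property changes by at most eps, else 0."""
    return 1 if abs(prop(reference) - prop(trial)) <= eps else 0


def yield_value(reference: S, ensemble: Sequence[S], prop: Callable[[S], float], eps: float) -> float:
    """Yield of Eq. (2): fraction of trials in the ensemble that satisfy Eq. (1)."""
    if len(ensemble) == 0:
        raise ValueError("the ensemble T must contain at least one trial")
    robust = sum(is_robust(reference, trial, prop, eps) for trial in ensemble)
    return robust / len(ensemble)


if __name__ == "__main__":
    # Toy data: each system is represented directly by its energy (kcal/mol),
    # so the property function is the identity.
    native_energy = -10.0
    trials = [-10.4, -9.2, -12.0, -10.9, -4.0]
    for eps in (1.0, 5.0):
        print(eps, yield_value(native_energy, trials, lambda e: e, eps))
    # prints "1.0 0.6" then "5.0 0.8"
```

Raising the threshold from 1.0 to 5.0 can only turn non-robust trials into robust ones, never the reverse, which is why the yield is non-decreasing in the threshold.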

It is straightforward to note that the yield is a function of ε; in particular, the yield is non-decreasing as ε increases. It follows that choosing a meaningful threshold is crucial and not a trivial task. In terms of the robustness condition, the trials of a robust system should differ as little as possible from it; however, although it is good practice to set a strict threshold value, it is important not to restrict the analysis to a small set of feasible trials (in this work, feasible protein structures), so as not to neglect a large part of the plausible systems (protein conformations).

In the EDA domain, the threshold value is typically set by expert designers, taking into account the manufacturing system, the physical properties of the materials and the adherence to the original design. In the area of robust protein design, setting an ad-hoc ε value is not a trivial task; for instance, using threshold values that depend on the family of the protein is a plausible approach. In this research work, we performed extensive computational experiments in order to detect general and reasonable threshold values: 1.0 kcal/mol for the local analysis (local robustness) and 5.0 kcal/mol for the global analysis (global robustness).

Finally, it is important to remark that the yield of a system is strictly related to the studied perturbation (the type of applied mutation); in these terms, we use α-yield to denote the yield of the system Ω perturbed by the perturbation α, and α-analysis to denote the process of generating the ensemble of systems through the perturbation α and evaluating its robustness.


It is important to remark on the differences between thermodynamic stability and thermodynamic robustness. Thermodynamic stability concerns the folding and unfolding of a protein, where only the folded and unfolded states are taken into account; in this case, stability refers to the difference in Gibbs free energy (ΔG) between the folded (G_f) and unfolded (G_u) states. In these terms, thermodynamic stability is defined as ΔG = G_u − G_f, and the larger ΔG is, the more stable the protein (Wolynes, 2006). Thermodynamic robustness clearly differs from stability: robustness concerns the estimation of the number of conformations belonging to the energy sphere of a predefined radius (the threshold ε) centred on the native structure (the reference system Ω). While stability concerns the determination of the unfolding threshold of a protein, robustness emphasizes the probability of finding folded structures that differ by at most ε from the native conformation under some external or internal perturbations.

2.1. The Protein Monte-Carlo Sampling algorithm

In order to study the robustness of proteins under various molecular deformations, we introduce an ad-hoc Monte-Carlo algorithm called Protein Monte-Carlo Sampling (PMCS). Like classical Monte-Carlo sampling algorithms, given a protein, PMCS generates n trial conformations by randomly perturbing the protein structure in the native state. The protein structure can be described by the angles of internal rotation in the main chain. Internal rotations around the N–Cα and Cα–C bonds are not restricted by the electronic structure of the bond, but only by possible steric collisions in the conformations. The side-chain conformations can also be expressed by angles of internal rotation, denoted by χ1, ..., χn; the conformations of a side-chain corresponding to different combinations of values of the χi angles are called rotamers. In the current work, we use an internal coordinate representation (torsion angles), which is currently the most widely used (Helles, 2008). Each residue type requires a fixed number of torsion angles to determine the 3D coordinates of all its atoms, while bond lengths and bond angles are fixed at their ideal values.

According to our definition of perturbation, we have to define the random noise and the procedures used to perturb the trials. In our experiments, we chose a classical normal distribution, since it is a plausible random noise to apply; however, other statistical distributions could reasonably be applied. We designed three different α-analyses responsible for generating trial conformations: a Global analysis, a Local analysis and a Residue analysis.

The Global analysis procedure refers to deformations applied to the whole structure of the protein: all the angles of the protein are perturbed. This procedure has been introduced to analyze strong and dramatic mutations occurring due to changes in the cellular environment or to errors in the synthesis process.

The Local analysis procedure perturbs one dihedral angle at a time in order to find sensitive points in the structure; this analysis is especially helpful for de novo optimization algorithms based on potential energy functions, since it is able to identify the most sensitive angle of the structure.

Finally, the Residue analysis procedure perturbs all the angles of a residue at once; this analysis is especially suited to identifying key residues in the polypeptide chain. In particular, the identification of sensitive amino acids opens the way to a new class of algorithms focused on robust optimization.
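The three α-analyses can be illustrated with a toy version of PMCS. This is a minimal sketch, not the authors' implementation: it assumes a conformation is a flat NumPy array of torsion angles in degrees, that the Gaussian noise has a hypothetical standard deviation `sigma` (the source does not give one), and that `residue_slices` maps each residue to the positions of its torsion angles. `toy_energy` is a hypothetical stand-in for the ECEPP/3 potential; the default trial counts mirror those reported later in the text (10,000 global trials, 200 per angle or residue).

```python
import numpy as np


def toy_energy(angles: np.ndarray) -> float:
    """Hypothetical stand-in for a torsional potential (kcal/mol), not ECEPP/3."""
    return float(np.sum(1.0 + np.cos(np.radians(3.0 * angles))))


def perturb(angles: np.ndarray, rng: np.random.Generator, sigma: float, indices=None) -> np.ndarray:
    """Add Gaussian noise to all angles (global) or only to the given indices."""
    trial = angles.copy()
    idx = np.arange(angles.size) if indices is None else np.asarray(indices)
    trial[idx] += rng.normal(0.0, sigma, size=idx.size)
    return trial


def pmcs_yield(angles, energy, eps, n_trials, sigma, rng, indices=None) -> float:
    """Fraction of perturbed trials whose energy stays within eps of the native energy."""
    e0 = energy(angles)
    robust = 0
    for _ in range(n_trials):
        trial = perturb(angles, rng, sigma, indices)
        robust += abs(energy(trial) - e0) <= eps
    return robust / n_trials


def global_analysis(angles, energy, eps, n_trials=10_000, sigma=5.0, seed=0) -> float:
    """Perturb every torsion angle of the structure at once."""
    rng = np.random.default_rng(seed)
    return pmcs_yield(angles, energy, eps, n_trials, sigma, rng)


def local_analysis(angles, energy, eps, n_trials=200, sigma=5.0, seed=0) -> list:
    """One yield per torsion angle, perturbing a single angle at a time."""
    rng = np.random.default_rng(seed)
    return [pmcs_yield(angles, energy, eps, n_trials, sigma, rng, [k]) for k in range(angles.size)]


def residue_analysis(angles, residue_slices, energy, eps, n_trials=200, sigma=5.0, seed=0) -> list:
    """One yield per residue, perturbing all angles of that residue together."""
    rng = np.random.default_rng(seed)
    return [
        pmcs_yield(angles, energy, eps, n_trials, sigma, rng, list(range(s.start, s.stop)))
        for s in residue_slices
    ]


if __name__ == "__main__":
    native = np.array([-60.0, -45.0, -60.0, -45.0, 180.0, 60.0])
    residues = [slice(0, 2), slice(2, 4), slice(4, 6)]
    print(global_analysis(native, toy_energy, eps=5.0))
    print(local_analysis(native, toy_energy, eps=1.0))
    print(residue_analysis(native, residues, toy_energy, eps=1.0))
```

In this sketch the residue with the lowest residue yield is the natural candidate for "most sensitive"; that reading is an assumption about how the yields are ranked.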


2.2. The Robust-Protein-Design algorithm

We designed a new algorithm based on the robust protein design principle, called Robust-Protein-Design. The aim of the Robust-Protein-Design algorithm is to discover new proteins with a specific function and improved robustness; at this stage, we are interested in finding mutants of wild type proteins with a known function and an increased robustness to perturbations. From a mathematical point of view, the algorithm tries to discover structures whose potential energy distribution is well centred on the native energy value and has the tightest spread. The basic idea is to find the most sensitive regions of the structure and replace them with other amino acids that maximize the yield; in principle, each amino acid of a sequence can be replaced iteratively to achieve a higher yield value. However, there are two important constraints on this protein redesign process. Firstly, each amino acid mutation must be neutral (Bemporad et al., 2008): since we want to preserve the function, it is important to consider only mutations that preserve it, and this is mandatory especially in drug design, since we do not want to deal with toxic drugs.

Secondly, it is mandatory to verify that the mutant folds correctly; this constraint can be checked by evaluating the potential energy, since a positive value indicates that the protein is not in a feasible state. It is also important to note that, since function depends on structure, a mutant structure should not differ by more than 1 Å from the wild type in terms of RMSD_Cα.
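Checking the 1 Å RMSD_Cα constraint requires superimposing the two structures before comparing coordinates. A common way to do this is the Kabsch algorithm; the sketch below is a minimal NumPy version, assuming both structures are given as N × 3 arrays of matched Cα coordinates in Å. The source does not state which superposition method the authors used, and the names `kabsch_rmsd` and `passes_rmsd_constraint` are hypothetical.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between matched N x 3 coordinate sets after optimal rigid superposition."""
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two N x 3 arrays with matching atoms")
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c  # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # correct for a possible reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # optimal rotation mapping p onto q
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def passes_rmsd_constraint(mutant_ca: np.ndarray, wild_type_ca: np.ndarray, max_rmsd: float = 1.0) -> bool:
    """Structural constraint: the mutant must stay within max_rmsd Angstrom of the wild type."""
    return kabsch_rmsd(mutant_ca, wild_type_ca) <= max_rmsd
```

Because the RMSD is computed after the optimal rotation and translation, a mutant that is merely moved rigidly in space is not penalized; only genuine changes in shape count.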

According to these structural and functional design principles, the RPD algorithm is built from two main procedures. The first procedure is responsible for evaluating the neutrality of the mutant, which, in our research work, is based on the SDM server (http://mordred.bioc.cam.ac.uk/~sdm/sdm.php) (Worth et al., 2007). SDM takes as input a wild type protein, the position to be mutated and the amino acid to put in; the server answers with a Boolean response stating whether or not the mutant is neutral.

The second procedure is based on robustness estimation through the PMCS algorithm; since we are working at the residue level, we choose to perform a residue analysis in order to understand whether the mutation of the residue improves the robustness of the structure. The pseudo-code of the algorithm is shown in Algorithm 1. The algorithm takes as input a wild type protein C and a robustness threshold ε used for the residue analysis.

Firstly, the algorithm performs a residue analysis on the wild type protein in order to discover the most sensitive residue. Subsequently, the algorithm mutates the protein by replacing the most sensitive amino acid with each of the other amino acids, and it queries the SDM server to establish whether each mutant is neutral; if the mutant is neutral, it undergoes a full regularization of the structure and is used for the residue analysis, otherwise it is discarded. From the set of neutral mutants, the mutant with the highest residue yield value is returned.
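Since the pseudo-code of Algorithm 1 is not reproduced in this section, the loop described above can be restated as a Python sketch. All helpers are hypothetical placeholders passed in as callables: `residue_yields` stands for the PMCS residue analysis, `is_neutral` for the SDM neutrality query, `mutate` for building the mutant, `regularize` for the structure regularization, and `sequence_of` for reading the amino acid sequence. Nothing here reproduces the actual SDM interface; two further assumptions are that "most sensitive" means the lowest residue yield and that mutants are compared by the residue yield at the mutated position.

```python
from typing import Callable, Optional, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def robust_protein_design(
    protein: object,
    eps: float,
    residue_yields: Callable[[object, float], Sequence[float]],
    is_neutral: Callable[[object, int, str], bool],
    mutate: Callable[[object, int, str], object],
    regularize: Callable[[object], object],
    sequence_of: Callable[[object], str],
) -> Optional[Tuple[object, float]]:
    """One RPD step: redesign the most sensitive residue of `protein`.

    Returns (mutant, residue_yield) for the best neutral mutant,
    or None if no neutral mutant exists.
    """
    yields = residue_yields(protein, eps)
    # Most sensitive residue: the one with the lowest residue yield (assumption).
    pos = min(range(len(yields)), key=lambda i: yields[i])
    wild_aa = sequence_of(protein)[pos]

    best: Optional[Tuple[object, float]] = None
    for aa in AMINO_ACIDS:
        if aa == wild_aa or not is_neutral(protein, pos, aa):
            continue  # skip the wild type residue and non-neutral mutants
        mutant = regularize(mutate(protein, pos, aa))
        score = residue_yields(mutant, eps)[pos]
        if best is None or score > best[1]:
            best = (mutant, score)
    return best
```

To redesign further residues, as the text suggests is possible, the returned mutant could be fed back in as the new `protein`.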

3. Methods

The interactions of the side-chains and main-chains with each other, with the solvent and with the ligands determine the energy of a given protein conformation. Folding is the process that drives the atoms of a protein to settle into a conformation that is more stable than the others, the so-called native state; the formation of the native state is a global property of a protein, because the stabilizing interactions involve parts of the protein that are distant in the polypeptide chain but near in space. From a thermodynamic point of view, the free energy of a protein depends on the entropy and on the enthalpy of the system. Since we are interested in the folded state of a protein, we consider the structure with the lowest energy as the best candidate (Vendruscolo, 2007). For this task, we use an analytical expression, the so-called potential energy function, that gives information about the thermodynamic state of a protein as a function of the positions of its atoms. Typical potential energy functions have the following form:

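$$E(\vec{R}) = \sum_{\text{bonds}} B(\vec{R}) + \sum_{\text{angles}} A(\vec{R}) \tag{3}$$

$$\qquad + \sum_{\text{torsions}} T(\vec{R}) + \sum_{\text{non-bonded}} N(\vec{R}) \tag{4}$$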

where $\vec{R}$ is the vector representing the conformation of the molecule, typically in Cartesian coordinates or in torsion angles. The first three terms describe the local interactions between atoms that are separated by one, two or three covalent bonds; many proteins contain covalent bonds in addition to those of the polypeptide backbone and of the side-chains. In particular, the first term refers to bond length stretching, the second to angle bending, and the third to torsional twisting. The last term takes into account the non-local interactions between pairs of atoms that are separated along the covalent structure by at least three bonds. One of the main non-bonded factors is the van der Waals interaction; the packing of atoms in a protein contributes to its stability by excluding the non-polar atoms from contact with water and by packing the atoms of the protein together. The literature on proper cost functions is vast (Cornell et al., 1995; Hermans, Berendsen, van Gunsteren, & Postma, 1984; Momany, McGuire, Burgess, & Scheraga, 1975; Roterman, Lambert, Gibson, & Scheraga, 1989). In this work, we use the Empirical Conformational Energy Program for Peptides (ECEPP) potential energy function, version 3 (Nemethy et al., 1992); in this model, the lengths of covalent bonds and the bond angles are taken to be constant at their equilibrium values, and the degrees of freedom become the torsional angles of the system. The potential energy function E_tot is the sum of the electrostatic term E_C, the Lennard–Jones term E_LJ and the hydrogen-bonding term E_HB for all pairs of atoms in the peptide, together with the torsion term E_tor for all torsion angles. The function has the following form:

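$$E_{tot} = E_C + E_{LJ} + E_{HB} + E_{tor} \tag{5}$$

$$E_C = \sum_{(i,j)} \frac{q_i q_j}{r_{ij}} \tag{6}$$

$$E_{LJ} = \sum_{(i,j)} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right) \tag{7}$$

$$E_{HB} = \sum_{(i,j)} \left( \frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}} \right) \tag{8}$$

$$E_{tor} = \sum_{l} U_l \left( 1 \pm \cos(n_l \xi_l) \right) \tag{9}$$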

In this model, r_ij is the distance between atoms i and j, and ξ_l is the torsion angle for chemical bond l. The bond lengths and bond angles (which are hard degrees of freedom) are fixed at experimental values, and the dihedral angles φ, ψ, ω and χ_i are the independent variables. The various parameters (q_i, A_ij, B_ij, C_ij, D_ij, U_l and n_l) were determined by a combination of a priori calculations and minimization of the potential energies of the crystal lattices of single amino acids. As already stated, the free energy of folding of a protein is the sum of the contribution from the energy of its intramolecular interactions and the free energy of interaction of the molecule with the surrounding solvent water; however, the exact computation of the solvent contribution is very complex (Eisenberg, Wesson, & Yamashita, 1989; Wesson & Eisenberg, 1992).
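Once distances and parameters are known, the sums in Eqs. (5)–(9) can be evaluated directly. This is a minimal sketch, assuming precomputed pair and torsion lists with hypothetical parameter values; it follows the simplified forms written above, so the electrostatic term omits the unit-conversion constant and dielectric factor that a production ECEPP/3 implementation such as SMMP includes. The `Pair` and `Torsion` containers are hypothetical, and the ± of Eq. (9) is modelled as a per-torsion sign.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Pair:
    r: float        # distance r_ij
    qi: float       # partial charge q_i
    qj: float       # partial charge q_j
    a: float = 0.0  # Lennard-Jones A_ij (set to 0 for hydrogen-bonding pairs)
    b: float = 0.0  # Lennard-Jones B_ij
    c: float = 0.0  # hydrogen-bond C_ij (non-zero only for hydrogen-bonding pairs)
    d: float = 0.0  # hydrogen-bond D_ij


@dataclass
class Torsion:
    u: float       # barrier height U_l
    n: int         # multiplicity n_l
    angle: float   # torsion angle xi_l, in radians
    sign: int = 1  # +1 or -1, the "±" of Eq. (9)


def e_total(pairs: Iterable[Pair], torsions: Iterable[Torsion]) -> float:
    """Simplified ECEPP-style energy following Eqs. (5)-(9)."""
    pairs = list(pairs)  # iterated several times below
    e_c = sum(p.qi * p.qj / p.r for p in pairs)                   # Eq. (6)
    e_lj = sum(p.a / p.r ** 12 - p.b / p.r ** 6 for p in pairs)   # Eq. (7)
    e_hb = sum(p.c / p.r ** 12 - p.d / p.r ** 10 for p in pairs)  # Eq. (8)
    e_tor = sum(t.u * (1 + t.sign * math.cos(t.n * t.angle)) for t in torsions)  # Eq. (9)
    return e_c + e_lj + e_hb + e_tor                              # Eq. (5)
```

For example, one pair at r = 2 with opposite unit charges, A = 4096 and B = 64 contributes −0.5 (electrostatic) plus 0 (Lennard–Jones), and a torsion with U = 2, n = 3 at angle 0 adds 4, for a total of 3.5.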

In this study, we use the model proposed by Ooi, Oobatake, Nemethy, and Scheraga (1987): they assume that the extent of interaction of any functional group i of a solute with the solvent is proportional to the solvent accessible surface area A_i of group i, because the group can interact directly with the solvent only at this surface. The total free energy of hydration of a solute molecule is given by the following equation:

$$\Delta G_h^0 = \sum_i g_i A_i$$

where the summation extends over all groups of the solute, A_i is the conformation-dependent accessible surface area of group i, and the constant of proportionality g_i represents the contribution to the free energy of hydration of group i per unit accessible area.
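The hydration term is a weighted sum over solute groups. A minimal sketch, assuming the accessible areas A_i (Å²) have already been computed for the current conformation and that the per-group coefficients g_i are supplied by the caller; the function name and the group names and numbers in the check below are hypothetical and are not the Ooi et al. (1987) parameters.

```python
from typing import Mapping


def hydration_free_energy(areas: Mapping[str, float], g: Mapping[str, float]) -> float:
    """Delta G_h^0 = sum_i g_i * A_i over all solute groups (Ooi et al., 1987 form)."""
    missing = set(areas) - set(g)
    if missing:
        raise KeyError(f"no solvation coefficient for groups: {sorted(missing)}")
    return sum(g[group] * area for group, area in areas.items())
```

In a full pipeline this value would be added to the ECEPP energy of the same conformation, with the areas recomputed for every trial.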

The model is adopted because it is specifically designed to supplement the ECEPP algorithm; the free energy of hydration, to be added to the ECEPP energy, must correspond only to the additional interactions of the atoms of the solute with water. All the potential energy calculations have been performed using the Simple Molecular Mechanics for Proteins (SMMP) package (Eisenmenger, Hansmann, Hayryan, & Hu, 2001).

3.1. The sets of peptides, proteins and drugs analyzed


In order to validate our methodology, we performed several experiments on two sets of polypeptides: first, we identified a representative set of proteins and, subsequently, we extended the methodology to a set of well-known drugs (http://www.drugbank.ca/).


The set of proteins can be divided into three subsets. The first is a subset of 35 peptides taken from the PEPstr test bed (Nicosia & Stracquadanio, 2008). The test bed is composed of 77 experimentally determined 3D structures of bio-active peptides; only a few structures are solved by X-ray crystallography, and most of them are NMR-solved structures. From these 77 structures, we excluded 35 peptides stabilized by disulfide bridges. The remaining set of 42 molecules can be grouped according to their regular secondary structure: 32.3% are α-helices, 6.9% are β-sheets and the remaining 34.9% are β-turns. From the set of 42 peptides, we removed 7 redundant peptides. The second subset consists of nine proteins (PDB codes: 1PLW, 1CRN, 1IGD, 1BDD, 1GAB, 1E0L, 1AML, 1BJB, 1BJC); this benchmark is built in order to include a paradigmatic peptide (PDB Id. 1PLW), the three basic classes (α, α + β, β), and a protein with two of its mutants (PDB IDs: 1AML, 1BJB, 1BJC).

The last subset consists of the Acyl Coenzyme A Binding Protein (PDB code 1ST7), a protein from yeast; in particular, the Acyl-CoA binding protein (ACBP) binds thiol esters of long-chain fatty acids and coenzyme A in a one-to-one binding mode with high specificity and affinity. Acyl-CoAs are important intermediates in lipid synthesis and fatty acid degradation, and play a role in the regulation of intermediary metabolism and in gene regulation. The suggested role of ACBP is to act as an intracellular acyl-CoA transporter and pool former. ACBPs are present in a large group of eukaryotic species, and several tissue-specific isoforms have been detected.

The other set of polypeptides focuses on drugs; we consider two drugs: Oxytocin (DrugBank Id: DB00107, PDB Id: 1NPO) and Glucagon (DrugBank Id: DB00040, PDB Id: 1GCN) (Wishart et al., 2006). Oxytocin is a synthetic 9-residue cyclic peptide used to induce labour or to enhance uterine contractions during labour. Glucagon is a 29-residue peptide hormone synthesized in a special non-pathogenic laboratory strain of Escherichia coli bacteria that has been genetically altered by the addition of the gene for glucagon. Glucagon increases blood glucose concentration and is used in the treatment of hypoglycemia and in gastric imaging.

For each protein, we perform a regularization (Meinke, Mohanty, Eisenmenger, & Hansmann, 2008) of the native structure according to the adopted potential energy model and, subsequently, we use the regularized structure to perform the robustness analysis. The regularization of the structure is important for a correct analysis: in structures taken from the Protein Data Bank (PDB), the actual bond lengths and angles differ slightly from the fixed values assumed by our potential, and forcing the molecule into the standard bonding geometry may lead to unphysically high energies. The regularization finds an optimal structure within the standard geometry model, starting from a PDB structure (Meinke et al., 2008). For the three defined perturbations, we generate 10,000 trials for the Global analysis, 200 trials for each torsion angle of a given protein in the Local analysis, and 200 trials for each residue in the Residue analysis.

The key parameter of our robustness analysis is the ε-threshold. In order to set plausible threshold values for the Global, Local and Residue analysis, we conducted preliminary experiments on the peptide subset.

In Fig. 1, we report the results of the robustness analysis conducted on the 35 peptides with the threshold set to 1.0 kcal/mol for all three analyses. The x-axis of Fig. 1 reports the peptides in increasing order of number of residues. These results show that peptides are less robust under global mutations than under local and residue mutations; the local yield shows that single-angle mutations do not heavily affect the structure of the peptides. The most interesting results concern the residue yield, which shows a good correlation between robustness and the length of the protein sequence.



Fig. 1. Energetic robustness of the 35 peptides extracted from the PEPstr test bed (Nicosia & Stracquadanio, 2008). The x-axis reports the peptides in increasing order of number of residues. The plot shows the local, residue, and global yield values.

However, in the rest of the research work, in order to obtain more meaningful results from the global analysis, we set the global analysis threshold to 5.0 kcal/mol, while we keep 1.0 kcal/mol for the local and residue analysis.

For each of the three kinds of analysis, we performed an extensive set of experiments to characterize the yield of each protein structure. We analyzed the energetic distributions due to Global, Local and Residue mutations using energy histograms; each histogram is computed by sampling the energy landscape with bins of 1.0 kcal/mol, considering a radius of 1000 bins around the potential energy value of the native structure for the Global yield, and a radius of 50 bins for the Local and Residue yields.
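The histogram construction can be sketched with NumPy. This is a minimal sketch, assuming a list of trial energies and the native energy: bins of width 1.0 kcal/mol are laid out symmetrically around the native value, with `radius_bins` bins on each side (1000 for Global, 50 for Local and Residue analysis). The source does not say whether the native energy sits at a bin centre or on a bin edge, nor how out-of-range trials are handled, so placing it on an edge and dropping out-of-range trials are assumptions; the function name is hypothetical.

```python
import numpy as np


def energy_histogram(energies, native_energy, bin_width=1.0, radius_bins=50):
    """Normalized histogram of trial energies in a window around the native energy.

    Bin edges run from native - radius_bins * bin_width to
    native + radius_bins * bin_width (2 * radius_bins bins); trials outside
    this window are dropped. Returns (probabilities, bin_edges).
    """
    edges = native_energy + bin_width * np.arange(-radius_bins, radius_bins + 1)
    counts, edges = np.histogram(np.asarray(energies, dtype=float), bins=edges)
    total = counts.sum()
    probs = counts / total if total > 0 else counts.astype(float)
    return probs, edges
```

Histograms built on identical edges for a wild type and a mutant can then be compared bin by bin with a relative entropy.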

Moreover, for the Residue analysis, we estimated the variation of the computed potential energy at residue level. The average potential energy is estimated for each residue, and the spread of each box is set to the corresponding standard deviation; this analysis is useful to highlight the most sensitive residue of a protein.
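The per-residue box statistics reduce to a mean and a standard deviation of the trial energies for each residue. A minimal sketch, assuming `residue_energies[k]` holds the energies of the trials in which residue k was perturbed; ranking residues by the largest standard deviation is one plausible reading of "most sensitive", not a rule stated in the source, and the helper name is hypothetical.

```python
import numpy as np


def residue_energy_stats(residue_energies):
    """Return (means, stds, most_sensitive_index) over per-residue trial energies."""
    means = np.array([np.mean(e) for e in residue_energies])
    stds = np.array([np.std(e) for e in residue_energies])
    # Largest spread taken as the most sensitive residue (assumption).
    return means, stds, int(np.argmax(stds))
```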

3.2. The relative entropy

The relative entropy is used to measure discrepancies between models; in our research work, we measure the discrepancies between mutant and wild type potential energy distributions. Relative entropy is not the only way to measure discrepancies between alternative probability distributions. However, in using relative entropy, we follow a substantial body of work in applied mathematics that benefits from entropy in terms of tractability and interpretability (Hansen & Sargent, 2007). In particular, we refer to the well-known notion of Kullback–Leibler divergence, also known as information gain (Kullback, 1959). The Kullback–Leibler divergence (KLd) is perhaps the most frequently used information-theoretic distance measure; the KLd is a functional of two probability distributions p(x) and r(x), formally defined as follows:

$$KLd(x, p, r) \equiv \int dx \, p(x) \log\left(\frac{p(x)}{r(x)}\right) \tag{10}$$

where, if r(x) is kept constant (uniform), the KLd takes the same form as the Shannon entropy, up to its sign and an additive constant. The Kullback–Leibler divergence satisfies the following condition:


$$KLd(x, p, r) = \begin{cases} 0 & \text{if } p(x) = r(x) \\ > 0 & \text{otherwise} \end{cases} \tag{11}$$


If the set of possible values of x is discrete in nature or can be discretized, the information gain takes the following form:

$$KLd(x, p, r) \equiv \sum_n p(x_n) \log\left(\frac{p(x_n)}{r(x_n)}\right) \tag{12}$$

where n indexes the possible discrete values x_n. From a general learning algorithm perspective, it is important to remark that the relative entropy is the amount of information the algorithm has already learned during its search process. Once the algorithm starts to learn, the relative entropy KLd increases monotonically until it reaches a final steady state. This is consistent with the idea of a maximum entropy principle (Jaynes et al., 2003) of the form:

$$\frac{dKLd}{dt} \ge 0 \tag{13}$$

Since dKLd/dt = 0 when the learning process ends, this quantity can be used as a termination condition. In order to study the divergence of the potential energy distribution of a mutant with respect to the wild type, we measure the information gain between the two distributions; if a mutant has a low relative entropy with respect to the wild type protein, it probably resides in the same, or in a close, region of the folding funnel.
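Eq. (12) can be applied to two energy histograms defined on the same bins, for example a wild type and a mutant. A minimal sketch, assuming both inputs are non-negative arrays on identical bins and using natural logarithms; the small `pseudo_count` added to every bin is a hypothetical choice (not from the source) that avoids log(0) and division by zero when a bin is empty in one distribution.

```python
import numpy as np


def kl_divergence(p, r, pseudo_count=1e-10):
    """Discrete Kullback-Leibler divergence sum_n p_n log(p_n / r_n), Eq. (12)."""
    p = np.asarray(p, dtype=float) + pseudo_count
    r = np.asarray(r, dtype=float) + pseudo_count
    p /= p.sum()  # renormalize after smoothing
    r /= r.sum()
    return float(np.sum(p * np.log(p / r)))
```

Note that the divergence is not symmetric: taking the wild type histogram as r measures how far the mutant's distribution p has drifted from it.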

Moreover, the information gain is useful in our Robust-Protein-Design algorithm: it can be used to estimate how far a neutral mutant is from the wild-type protein funnel, and as a metric for the feasibility or infeasibility of a decoy.

It is important to remark that the validity of the metrics is independent of the specific form adopted. To verify this assumption, we measure the discrepancy between two potential energy distributions using the Rényi generalized divergence Rd (Rényi, 1961) and the von Neumann divergence VNd (Kopp, Jia, & Chakravarty, 2007).

The Rényi divergence of order α, with α > 0 and α ≠ 1, measures the discrepancy between a true distribution p and an approximating distribution q of a random variable x as follows:

$$\mathrm{Rd}(x,p,q,\alpha) = \frac{1}{\alpha-1}\log\left(\sum_{n}\frac{p^{\alpha}(x_n)}{q^{\alpha-1}(x_n)}\right) \tag{14}$$

Like the Kullback–Leibler divergence, the Rényi divergence is always non-negative, and it vanishes only when p = q. Moreover, for α → 1 the Rényi divergence converges to the Kullback–Leibler divergence.
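As a hedged illustration of Eq. (14) and of its α → 1 limit, the sketch below evaluates the discrete Rényi divergence on two small hypothetical distributions and compares it with the Kullback–Leibler divergence of Eq. (12). The function names and the `eps` floor for empty bins of q are our own choices. The example also exhibits the standard property that the Rényi divergence is non-decreasing in α.

```python
import math

def renyi_divergence(p, q, alpha, eps=1e-12):
    """Discrete Renyi divergence of order alpha, Eq. (14):
    1/(alpha - 1) * log(sum_n p_n**alpha / q_n**(alpha - 1)), alpha > 0, alpha != 1.
    eps replaces empty bins of q, where the exact value may be infinite."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    s = sum(pn ** alpha / max(qn, eps) ** (alpha - 1)
            for pn, qn in zip(p, q) if pn > 0)
    return math.log(s) / (alpha - 1)

def kld(p, q):
    """Kullback-Leibler divergence, Eq. (12), used as the alpha -> 1 reference.
    Assumes q_n > 0 wherever p_n > 0."""
    return sum(pn * math.log(pn / qn) for pn, qn in zip(p, q) if pn > 0)

# Two hypothetical discretized energy distributions on the same two bins.
p, q = [0.7, 0.3], [0.4, 0.6]
for alpha in (1.0001, 2, 3, 4):
    print(f"Rd(alpha={alpha}) = {renyi_divergence(p, q, alpha):.4f}")
print(f"KLd = {kld(p, q):.4f}")
```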

Finally, we adopt the von Neumann entropy, which is defined, for a quantum system in a state ρ, as follows:

$$S(\rho) = -\operatorname{tr}(\rho\log\rho) \tag{15}$$

If we consider the eigenvalues {λ_n} of ρ, the von Neumann entropy is given by:

$$S(\rho) = -\sum_{n} \lambda_n \log\lambda_n \tag{16}$$

Since ρ is positive semidefinite and $\operatorname{tr}(\rho) = \sum_n \lambda_n = 1$, the eigenvalues of ρ form a probability distribution. Hence, the von Neumann entropy of ρ is equivalent to the Shannon entropy of its eigenvalues.
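To make the equivalence between Eqs. (15) and (16) concrete, the dependency-free sketch below computes S(ρ) for a real 2 × 2 density matrix. It obtains the eigenvalues of the symmetric matrix in closed form and then applies the Shannon formula. Restricting the example to 2 × 2 real matrices is purely our simplification. For larger or complex density matrices, an eigensolver such as `numpy.linalg.eigvalsh` would replace the closed form.

```python
import math

def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, d]] in closed form."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return [mean + radius, mean - radius]

def shannon_entropy(probs):
    """-sum p log p with the convention 0 log 0 = 0."""
    return sum(-x * math.log(x) for x in probs if x > 0)

def von_neumann_entropy_2x2(a, b, d, tol=1e-9):
    """S(rho) = -tr(rho log rho), Eq. (15), for rho = [[a, b], [b, d]],
    evaluated as the Shannon entropy of the eigenvalues of rho, Eq. (16)."""
    if abs(a + d - 1) > tol:
        raise ValueError("a density matrix must have unit trace")
    lams = eigenvalues_2x2_symmetric(a, b, d)
    if min(lams) < -tol:
        raise ValueError("a density matrix must be positive semidefinite")
    return shannon_entropy([max(lam, 0.0) for lam in lams])

print(von_neumann_entropy_2x2(0.5, 0.0, 0.5))  # maximally mixed state, S = log 2
print(von_neumann_entropy_2x2(0.5, 0.5, 0.5))  # pure state, S = 0
```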

For any two density operators ρ and σ on a given state space, the relative entropy is defined as:

$$S(\rho\,\|\,\sigma) = \operatorname{tr}(\rho\log\rho) - \operatorname{tr}(\rho\log\sigma) \tag{17}$$

This is analogous to the Shannon relative entropy D(p‖r) between two probability distributions p and r on a common set. A basic property of the von Neumann relative entropy is that it is always non-negative, and zero only when ρ = σ. In our research work, we use a discretized form of the von Neumann relative entropy, defined as:

$$\mathrm{VNd}(x,p,q) = \frac{1}{N}\sum_{n} p(x_n)\log p(x_n) - \frac{1}{N}\sum_{n} p(x_n)\log q(x_n) \tag{18}$$

where x is a discretized random variable with N possible values.

Table 1. Proteins taken into account in our experiments. For each protein, we report the PDB code, the number of residues (in parentheses), the number of dihedral angles (Na), the structural class, the potential energy value of the native state (E0), and the robustness values (%) for the Global, Local and Residue analyses.

PDB (residues) | Na | Class | E0 (kcal/mol) | Global | Local | Residue
1PLW (5) | 24 | – | −24.835 | 99.80 | 99.98 | 100.00
1CRN (46) | 235 | α + β | −225.219 | 0.72 | 64.71 | 14.00
1IGD (61) | 356 | α + β | −584.261 | 0.00 | 80.75 | 22.54
1BDD (60) | 357 | α | −659.484 | 0.00 | 88.29 | 46.14
1GAB (53) | 324 | α | −419.262 | 0.00 | 87.34 | 42.76
1E0L (37) | 221 | β | −233.022 | 0.00 | 87.44 | 43.93
1ST7 (86) | 527 | Enzyme | −890.776 | 0.00 | 74.78 | 23.56
1AML (40) | 224 | α | −276.133 | 2.86 | 97.13 | 76.36
1BJB (28) | 161 | α | −235.491 | 17.62 | 98.32 | 85.02
1BJC (28) | 159 | α | −263.719 | 75.31 | 98.10 | 87.38
1NPO (9), DB00107 | 48 | Drug | −65.783 | 95.40 | 99.76 | 98.06
1GCN (29), DB00040 | 175 | Drug | −273.502 | 73.30 | 97.57 | 86.57

Fig. 2. Amyloids potential energy distributions. We report the potential energy distribution for the Global Analysis (a: top plot) and for the Residue Analysis (b: bottom plot). The potential energy histograms of the Global Analysis show that the 1BJC mutant has a narrow energy spread, well centred on the native potential energy value (E0); the 1AML shows a larger spread, and its histogram shows a great variability of the potential energy of the trials. For the Residue Analysis, the energy histogram is well centred on the native state E0 for all three proteins.

4. Results and discussions

The experiments reported in Table 1 show interesting energetic robustness properties. First, the Global analysis shows that the conformations of non-small proteins unfold when every dihedral angle undergoes a normally distributed perturbation. From this point of view, it is important to note that this kind of mutation, occurring for example during synthesis because of a bias in the process, produces a protein that can be misfolded or totally different from the designed one. Although this is evident for small and medium-size proteins, it is not necessarily true for peptides or for proteins fully exposed to the solvent: for Met-enkephalin (PDB code 1PLW) and for the Amyloids of Alzheimer's disease (PDB codes 1AML, 1BJB and 1BJC) (Lundin et al., 2007), the protein robustness is close to the maximum even in the Global analysis (see Table 1). This could be explained by the fact that small proteins maintain a well-defined structure owing to very small coil regions that typically do not connect structural motifs; for fully exposed proteins, instead, it seems clear that the solvent force plays a central role in defining the structure.

The Local analysis outlines a clear experimental result: although some angles are responsible for the complete misfolding of a protein, the yield of the structure remains very high. In other words, a protein that undergoes a perturbation on an individual dihedral angle has a very low probability of misfolding. This insight suggests that a single-angle perturbation is not a viable way for de novo algorithms to find the native structure of a protein; mutations of several angles seem to be a more reasonable and plausible mechanism during the folding process.

Finally, the Residue analysis yields the most interesting results: the protein robustness varies according to the structural class. The α + β class seems to be the most sensitive to residue mutations, with an average yield of ∼18.27%; the α and β classes report average yields of 44.45% and 43.93%, respectively. The box-error plot of the Residue analysis makes it clear that there are residues to which protein robustness is particularly sensitive: a single amino acid mutation can produce an energy variation several orders of magnitude greater than the others, and such a residue can be classified as one of the main actors of the protein folding process.
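The class averages quoted for the Residue analysis can be checked against the Residue column of Table 1 with a few lines of arithmetic. The grouping used here (α + β: 1CRN, 1IGD; α: 1BDD, 1GAB; β: 1E0L; Amyloids listed separately) is our own reading of the table. The text does not state it explicitly. It is chosen because it reproduces the quoted 18.27%, 44.45% and 43.93%, and the 82.92% Amyloid average discussed in Section 4.1.

```python
from statistics import mean

# Residue-analysis yields (%) transcribed from Table 1, grouped as assumed above.
residue_yield = {
    "alpha+beta": {"1CRN": 14.00, "1IGD": 22.54},
    "alpha": {"1BDD": 46.14, "1GAB": 42.76},
    "beta": {"1E0L": 43.93},
    "amyloids": {"1AML": 76.36, "1BJB": 85.02, "1BJC": 87.38},
}

# Average yield per class, rounded to two decimals as in the text.
class_average = {cls: round(mean(list(values.values())), 2)
                 for cls, values in residue_yield.items()}
for cls, avg in class_average.items():
    print(f"{cls:>10}: {avg:.2f}%")
```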




The results are very general, and this property seems to hold for all classes of proteins but not for peptides: the 1PLW and 1NPO peptides have a large core region in the Global and Residue analyses, whereas they follow the general behaviour when they undergo Local mutations. This experimental evidence seems to confirm that peptides are very robust structures with a large core region, with many decoys belonging to the native funnel.

4.1. Analysis of the Amyloid proteins

For the Amyloid A4 (1AML) and its mutants (1BJB and 1BJC), the average residue yield is 82.92%, with the mutants achieving a higher yield than the ancestor.

By looking at the Amyloid sequences, it is interesting to note that the three proteins are identical for the first 28 residues, with the mutants differing in the last residues. Probably the second α helix of the 1AML, with its surrounding coil region, makes the protein less robust. It is important to note that, for this set of proteins, the most sensitive residues have the same order of magnitude as the others, in contrast with the results reported for the α + β class (see Table 1 and Fig. 2). For the mutants of the Amyloid, we compute the relative entropy of the 1BJB and 1BJC potential energy distributions against the 1AML distribution, as reported in Table 2. The results show that the 1BJC, the mutant with the lowest relative entropy value, is the most robust; this seems to confirm that mutants that are not far from the wild-type protein are the most robust.

Table 2. Analysis of the relative entropy of the Amyloids for the Global and Residue analysis: for each protein we report the mean (μ) and the standard deviation (σ) of the potential energy (kcal/mol), the robustness (%), the Kullback–Leibler divergence (KLd), the Rényi divergence (Rd) of order α = 2, 3, 4, and the von Neumann divergence (VNd). In bold face, we remark the correlation between high robustness values and low relative entropy values.

Analysis | PDB Id. | μ | σ | Robustness | KLd | Rd (α=2) | Rd (α=3) | Rd (α=4) | VNd
Global | 1AML | −249.43 | 38.25 | 2.76 | – | – | – | – | –
Global | 1BJB | −222.32 | 18.52 | 17.62 | 3.89 | 4.59 | 4.81 | 4.97 | 0.01
Global | 1BJC | −259.56 | 3.39 | 75.31 | 2.02 | 2.44 | 2.58 | 2.65 | 0.007
Residue | 1AML | −275.31 | 2.89 | 76.36 | – | – | – | – | –
Residue | 1BJB | −235.00 | 3.23 | 85.02 | 17.31 | 16.84 | 17.52 | 17.78 | 0.05
Residue | 1BJC | −263.49 | 0.73 | 87.38 | 10.16 | 10.42 | 10.52 | 10.59 | 0.03

Table 3. Robustness analysis of the mutants of Crambin, the Reference Structure (RS) in this experiment. For each new synthetic Crambin-like protein (Crambin mutant, or mutant) we report the mutated amino acid (AA), the type of mutation (Neutral (N), Disease Associated (D), Unfolded (U)), the potential energy value in the native state (E0), the residue robustness value (%), and the Cα RMSD (Å).

1CRN-I16
AA | Mut. | E0 (kcal/mol) | Robustness | RMSD Cα (Å)
C | RS | −225.219 | 14.00 | –
A | N | −365.964 | 25.58 | 1.015
D | N | −399.786 | 34.99 | 1.113
E | N | −369.268 | 24.52 | 0.646
F | N | −388.038 | 30.34 | 1.337
G | N | −364.802 | 32.17 | 0.956
H | N | −394.265 | 30.42 | 0.911
I | N | −388.118 | 36.04 | 1.184
K | N | −394.239 | 30.46 | 1.239
L | N | −391.632 | 30.18 | 1.143
M | N | −388.424 | 32.11 | 1.282
N | D | – | – | –
P | D | – | – | –
Q | N | −392.754 | 30.79 | 0.606
R | N | −418.925 | 35.04 | 1.006
S | N | −361.138 | 23.84 | 0.624
T | D | – | – | –
V | N | −358.409 | 26.13 | 0.611
W | N | −384.837 | 32.07 | 0.840
Y | N | −397.363 | 27.49 | 1.196

1CRN-Q17
AA | Mut. | E0 (kcal/mol) | Robustness | RMSD Cα (Å)
R | RS | −388.118 | 36.04 | 1.184
A | D | – | – | –
C | D | – | – | –
D | N | −322.001 | 25.96 | 1.374
E | D | – | – | –
F | D | – | – | –
G | N | −350.270 | 32.14 | 1.185
H | U | – | – | –
I | U | – | – | –
K | N | −343.820 | 23.37 | 1.111
L | U | – | – | –
M | D | – | – | –
N | D | – | – | –
P | D | – | – | –
Q | N | −350.818 | 37.61 | 1.121
S | U | – | – | –
T | D | – | – | –
V | N | −314.261 | 23.37 | 1.370
W | N | −344.361 | 30.08 | 1.150
Y | N | −283.419 | 27.95 | 1.215

Fig. 3. Crambin potential energy distributions. Distribution of the potential energy for the Residue Analysis of Crambin (PDB Id. 1CRN) and of the mutants discovered with the RPD algorithm; the mutant distributions are well centred on the E0 value even if they are not close to the Crambin wild type.

4.2. Robust-Protein-Design for the Crambin

In order to assess the effectiveness of the RPD algorithm, we investigate the least robust protein in terms of Residue Yield, Crambin (see Table 1), and run the RPD algorithm on it.
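A hedged sketch of the greedy mutation cascade that RPD applies to Crambin, written under our own assumptions. At each round, the most sensitive position that has not yet been mutated is replaced by each of the 19 alternative amino acids. Non-neutral mutants are discarded, and the neutral mutant with the highest residue yield is kept if it beats the current sequence. `is_neutral` stands in for the SDM-server classification, and `robustness` stands in for the residue-yield computation, returning None for unfolded conformations. Both callables, like `sensitivity_rank`, are hypothetical. Applied to the values in Table 3, this selection rule would pick I16 and then Q17.

```python
from typing import Callable, List, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def rpd_step(sequence: str, position: int,
             is_neutral: Callable[[str, str], bool],
             robustness: Callable[[str], Optional[float]]) -> Tuple[str, float]:
    """Try the 19 substitutions at `position` (0-based); return the neutral, folded
    mutant with the highest yield, or the input sequence if none improves on it.
    The input sequence itself is assumed to be folded (robustness not None)."""
    best_seq, best_yield = sequence, robustness(sequence)
    for aa in AMINO_ACIDS:
        if aa == sequence[position]:
            continue
        mutant = sequence[:position] + aa + sequence[position + 1:]
        if not is_neutral(sequence, mutant):
            continue                           # disruptive mutation: discarded
        y = robustness(mutant)
        if y is not None and y > best_yield:   # None marks an unfolded conformation
            best_seq, best_yield = mutant, y
    return best_seq, best_yield

def rpd(sequence: str,
        sensitivity_rank: Callable[[str], List[int]],
        is_neutral: Callable[[str, str], bool],
        robustness: Callable[[str], Optional[float]],
        rounds: int = 2) -> Tuple[str, List[Tuple[int, str, float]]]:
    """Greedy cascade: mutate the most sensitive position not yet touched, repeat."""
    current, touched, history = sequence, set(), []
    for _ in range(rounds):
        candidates = [p for p in sensitivity_rank(current) if p not in touched]
        if not candidates:
            break
        pos = candidates[0]
        touched.add(pos)
        current, y = rpd_step(current, pos, is_neutral, robustness)
        history.append((pos, current, y))
    return current, history

# Toy stand-ins (hypothetical): the yield rewards I at position 1 and Q at position 2,
# and any substitution that introduces N, P or T counts as disruptive.
def toy_yield(s: str) -> float:
    return 14.0 + 22.0 * (s[1] == "I") + 1.5 * (s[2] == "Q")

def toy_neutral(wild_type: str, mutant: str) -> bool:
    return not any(aa in "NPT" for aa in mutant)

final, steps = rpd("GCR", lambda s: [1, 2, 0], toy_neutral, toy_yield)
print(final, steps)
```

The sensitivity ranking is recomputed on the current sequence at every round, mirroring the choice of the second most sensitive residue of the 1CRN-I16 mutant rather than of the wild type.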




The goal of the RPD algorithm is to reduce the variability around the native state caused by environmental conditions, noise, and deterioration. The aim is to bring the mean to the specific target (the nominal value) of the native state, producing a symmetrical bell-shaped energy distribution centred on the native energy value E0.

The most sensitive residue of the 1CRN is the 16th residue, a cysteine (see Fig. 4). We systematically replace this amino acid with each of the remaining 19 amino acids and, using the SDM server (Worth et al., 2007), we identify three disruptive mutations (Asparagine, Proline, Threonine) and sixteen neutral mutations. As shown in Table 3, among the neutral mutants, the mutant with Isoleucine at position 16, 1CRN-I16, reports a yield of 36.04%, and is therefore more robust than the wild-type 1CRN, which has a yield of 14.00%. The robustness of the whole structure is evident in Fig. 3: the energy histogram of the mutant 1CRN-I16 is well centred around the native state E0 and is smoother than that of the wild-type Crambin. Moreover, the robust optimization provides a structure with a lower potential energy variation.

Next, in Table 3 we report the yield values obtained by mutating the second most sensitive residue, Arginine(17), of the mutant 1CRN-I16 (the reference structure, with a yield of 36.04%). For the second time, it is possible to improve the robustness of the protein by mutating the second most sensitive residue (see Fig. 5): replacing Arginine(17) with a Glutamine reaches a yield of 37.61%. Starting with a yield of 14.00% for the wild-type Crambin, we obtained a yield of 36.04% by mutating the most sensitive residue, Cys(16), into Ile(16). Mutating the second most sensitive residue, Arg(17), into Gln(17) then produces a mutant with a high yield, a similar structure (RMSD Cα = 1.121 Å) and the same function. Using a sort of learning cascade, the algorithm was able to find a Crambin-like protein whose yield is ∼23.61 percentage points higher than that of the wild type.

Moreover, we estimate the relative entropy for the two mutants (see Table 4). The relative entropy values agree closely with the energy distributions shown in Fig. 3. As for the Amyloids, the conformation with the lowest relative entropy appears to be both the most similar to the wild type and the most robust, at the structural and functional level. Finally, we performed a third iteration of the RPD algorithm by replacing the 21st amino acid; all the 19 mutants result in a conformation with a potential energy greater than zero, and these structures were considered unfolded.

Table 4. Analysis of the relative entropy of Crambin and of the RPD mutants for the Residue analysis: for each protein we report the mean (μ) and the standard deviation (σ) of the potential energy (kcal/mol), the robustness (%), the Kullback–Leibler divergence (KLd), the Rényi divergence (Rd) of order α = 2, 3, 4, and the von Neumann divergence (VNd). In bold face, we remark the correlation between high robustness values and low relative entropy values.

Residue analysis
Protein | μ | σ | Robustness | KLd | Rd (α=2) | Rd (α=3) | Rd (α=4) | VNd
1CRN | −217.080 | 62.28 | 14.00 | – | – | – | – | –
1CRN-I16 | −376.464 | 31.263 | 36.04 | 21.20 | 16.935 | 17.385 | 17.604 | 0.038
1CRN-I16-Q17 | −350.818 | 28.426 | 37.61 | 20.294 | 16.718 | 17.118 | 17.310 | 0.037

Fig. 4. Potential energy variation at residue level of the wild-type Crambin.

Fig. 5. Potential energy variation at residue level of the two mutants discovered by RPD: the 1CRN-I16 mutant (top plot) and the 1CRN-I16-Q17 mutant (bottom plot). The spread of the most sensitive residue decreases by several orders of magnitude in both the 1CRN-I16 and 1CRN-I16-Q17 mutants.

4.3. Analysis of drugs

The robustness analysis of the two drugs (see Table 1) confirms that both are robust to all three types of perturbation. The Global yield varies from 73.30% for the 1GCN to 95.40% for the 1NPO, and high yield values are reported for both the Local and the Residue Yield; in particular, for the 1NPO we obtained a Local Yield of 99.76%.




The Residue Yield of the 1NPO is 98.06%, while the Glucagon reports a Local Yield of 97.57% and a Residue Yield of 86.57%. Although the structures are significantly different, the robustness properties are well maintained, in contrast with the results obtained for proteins. Clearly, the robustness of the compound is a target of the drug design process, and our analysis confirms this evidence in silico. In order to improve the robustness of the 1NPO, we run our RPD algorithm, which returns a mutant in which the most sensitive amino acid, a glutamine Q(4), is replaced by a serine (see Table 5). Since Oxytocin is a peptide, we do not limit our investigation to the residue analysis but also take the global one into account; this choice is justified by our preliminary study, where we showed that the global analysis is more meaningful for peptides. The residue yield is improved by ∼1%, whereas the global yield is ∼3% higher (Table 5). Despite the high robustness of the drug, RPD is able to find a mutant with an increased yield value. This is an interesting result, since it shows that the algorithm can find robust structures even when starting from robust proteins.

The relative entropy of the mutant is consistent with the experimental data for the Global and Residue analyses (see Table 6 and Fig. 6). Finally, the RPD improvement is not large, since the drugs taken into account are a small peptide (1NPO) and a well-defined α-helix, which are robust structures.

Table 5. Mutation of the most sensitive residue (Glutamine, residue 4) of 1NPO. For each mutant we report the mutant amino acid (AA), the type of mutation (N = Neutral, D = Disease Associated, RS = Reference sequence), the relative potential energy value (E0), the residue and global robustness (%), and the Cα RMSD (Å).

AA | Mut. | E0 (kcal/mol) | Residue robustness (%) | Global robustness (%) | RMSD Cα (Å)
Q | RS | −65.783 | 98.06 | 95.40 | –
A | N | −44.661 | 98.61 | 96.01 | 0.039
C | N | −45.590 | 96.67 | 97.04 | 2.102
D | N | −67.121 | 98.72 | 96.10 | 0.146
E | N | −62.733 | 97.67 | 97.61 | 2.124
F | N | −48.942 | 99.00 | 96.56 | 0.143
G | D | – | – | – | –
H | N | −53.560 | 97.72 | 97.49 | 2.080
I | N | −49.900 | 96.83 | 97.19 | 2.105
K | N | −52.106 | 97.61 | 97.94 | 2.085
L | N | −44.843 | 96.44 | 98.03 | 2.305
M | N | −45.449 | 97.83 | 97.46 | 2.126
N | N | −69.997 | 96.72 | 95.47 | 0.145
P | N | −53.425 | 95.67 | 97.18 | 0.477
R | N | −81.161 | 97.22 | 96.98 | 0.145
S | N | −46.800 | 94.44 | 98.49 | 1.677
T | N | −53.020 | 97.44 | 95.70 | 0.034
V | N | −44.980 | 98.28 | 97.17 | 2.170
W | N | −52.621 | 97.89 | 94.57 | 0.369
Y | N | −56.144 | 99.00 | 97.05 | 0.148

Table 6. Analysis of the relative entropy of the 1NPO and the RPD mutant for the Global and Residue analyses: for each protein we report the mean (μ) and the standard deviation (σ), the robustness (yield, %), the relative entropy (KLd), the Rényi divergence (Rd), and the von Neumann divergence (VNd) of the trials.

Analysis | PDB Id. | μ | σ | Yield | KLd | Rd (α=2) | Rd (α=3) | Rd (α=4) | VNd
Global | 1NPO | −67.12 | 2.57 | 95.40 | – | – | – | – | –
Global | 1NPO-S4 | −45.56 | 1.25 | 98.49 | 14.78 | 22.78 | 23.62 | 23.93 | 0.25
Residue | 1NPO | −67.12 | 2.57 | 95.40 | – | – | – | – | –
Residue | 1NPO-S4 | −46.690 | 0.486 | 94.44 | 23.048 | 23.346 | 23.458 | 23.516 | 0.921

Fig. 6. Potential energy distributions of Oxytocin (1NPO) and its mutant (1NPO-S4): we report the potential energy distribution for the Global Analysis (a) and for the Residue Analysis (b). The potential energy histograms of the Global Analysis show that the 1NPO-S4 mutant has a narrow energy spread, well centred on the native potential energy value (E0). For the Residue Analysis, the energy histogram is well centred on the native state E0 for both proteins.

4.3.1. The volume of the robustness region

The yield analysis provides statistical information on the propensity of a given protein to remain in the native state. It is interesting to note that the samples of robust protein structures define a region, in the dihedral-angle space, where all the bounded conformations differ from the wild-type protein by at most a fixed tolerance. A large volume is an implicit index of the size of the robust region at the bottom of the funnel. Finding all the structures that belong to the robust region is not possible, so an approximate algorithm must be used. Accordingly, we define a Monte-Carlo volume estimation algorithm whose aim is to find a prefixed number n of sufficiently different decoys belonging to the robust region. The algorithm starts the Monte-Carlo sampling from the native structure and tries to identify m new robust trials, with m ≪ n; from each seed, new trials are sampled until m new robust trials are found. The process is iterated on each new trial until n robust decoys are found. Successively, we compute the minimum bounding ellipsoid enclosing the robust conformations, together with its volume. We choose the ellipsoid as the model since it usually fits more tightly than a sphere. We performed a preliminary test on the 1BJC and 1GCN proteins, since they have a similar number of angles and similar yield values. We run our Monte-Carlo volume fitting algorithm starting from the native structure and requiring 10^4 robust decoys. The volume is 1.314 × 10^21 for the 1BJC and 5.022 × 10^21 for the 1GCN. These results seem to confirm that proteins with a similar number of angles and similar yield values have comparable volumes, and hence similar robust regions at the bottom of the funnel.
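The volume step can be sketched as follows. The text does not say which algorithm computes the minimum bounding ellipsoid. This sketch assumes Khachiyan's algorithm for the minimum-volume enclosing ellipsoid and combines it with a toy version of the Monte-Carlo expansion described above: m robust trials per seed, each robust trial becoming a new seed, until n decoys are collected. All function names are hypothetical, and `is_robust` is a stand-in for the yield-based robustness test. The pure-Python linear algebra is only practical for a few dimensions. The decoys in the paper live in the full dihedral-angle space (e.g., 159 angles for 1BJC), where `numpy.linalg` would be the natural replacement.

```python
import math
import random

def sample_robust_region(native, is_robust, n, m, sigma, rng, max_tries=100_000):
    """Monte-Carlo expansion: from each seed draw Gaussian perturbations until m
    robust trials are found; each robust trial becomes a new seed (m >= 1)."""
    decoys, frontier, tries = [], [list(native)], 0
    while len(decoys) < n:
        seed = frontier.pop(0)
        found = 0
        while found < m and len(decoys) < n:
            tries += 1
            if tries > max_tries:
                raise RuntimeError("no robust trials found; try a smaller sigma")
            trial = [x + rng.gauss(0.0, sigma) for x in seed]
            if is_robust(trial):
                decoys.append(trial)
                frontier.append(trial)
                found += 1
    return decoys

def gauss_jordan(a):
    """Return (inverse, determinant) of a small square matrix given as lists."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-15:
            raise ValueError("matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det
        piv = aug[col][col]
        det *= piv
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det

def mvee(points, tol=1e-6, max_iter=100_000):
    """Khachiyan's algorithm: ellipsoid {x : (x - c)^T A (x - c) <= 1} of
    approximately minimum volume enclosing the given d-dimensional points."""
    n, d = len(points), len(points[0])
    q = [list(p) + [1.0] for p in points]           # lift points to R^(d+1)
    u = [1.0 / n] * n                                # weights on the points
    for _ in range(max_iter):
        x = [[sum(u[k] * q[k][i] * q[k][j] for k in range(n)) for j in range(d + 1)]
             for i in range(d + 1)]
        xinv, _ = gauss_jordan(x)
        dist = [sum(qk[i] * xinv[i][j] * qk[j] for i in range(d + 1) for j in range(d + 1))
                for qk in q]
        jmax = max(range(n), key=dist.__getitem__)  # point farthest "outside"
        step = (dist[jmax] - d - 1) / ((d + 1) * (dist[jmax] - 1))
        new_u = [(1 - step) * uk for uk in u]
        new_u[jmax] += step
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_u, u)))
        u = new_u
        if err < tol:
            break
    c = [sum(u[k] * points[k][i] for k in range(n)) for i in range(d)]
    cov = [[sum(u[k] * points[k][i] * points[k][j] for k in range(n)) - c[i] * c[j]
            for j in range(d)] for i in range(d)]
    cov_inv, _ = gauss_jordan(cov)
    return c, [[v / d for v in row] for row in cov_inv]

def ellipsoid_volume(a):
    """Volume of {x : x^T A x <= 1}: unit-ball volume / sqrt(det A)."""
    d = len(a)
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return unit_ball / math.sqrt(gauss_jordan(a)[1])

# Toy run: the "robust region" is the unit disk around a 2-D native point.
rng = random.Random(1)
decoys = sample_robust_region([0.0, 0.0], lambda x: x[0] ** 2 + x[1] ** 2 <= 1.0,
                              n=12, m=3, sigma=0.3, rng=rng)
centre, shape = mvee(decoys, tol=1e-3)
print(len(decoys), ellipsoid_volume(shape))
```

Khachiyan's iterations shift weight toward the point with the largest normalized distance. At convergence, every point satisfies (x − c)^T A (x − c) ≤ 1 up to the chosen tolerance. The volume of the returned ellipsoid never exceeds that of the exact minimum-volume ellipsoid.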

5. Conclusion

Robustness indicates the state of a system in which a given set of performances (e.g., objective functions, properties, features, metrics) is minimally sensitive to factors causing variability, wear, and aging, at the lowest cost. In this research work, we showed that robustness is a useful design principle for investigating proteins and designing new peptides and drugs. In structural biology, robustness describes a protein structure that maintains its folded state, and hence its function, with limited variability in spite of diverse and changing environmental conditions or residue-to-residue variation. Considering the obtained results, a protein structure is robust when it shows limited or reduced functional variation even in the presence of noise and environmental perturbations. In our research work, we introduced two new algorithms, the Protein Monte-Carlo Sampling and the Robust-Protein-Design algorithms, and two dimensionless quantities, the energetic yield and the energetic relative entropy, for studying the robustness properties of proteins. The extensive studies on a set of proteins and drugs show some well-defined properties: proteins are robust to local mutations, but become more sensitive to residue or global mutations. In particular, under global mutations only small peptides and proteins with a strong secondary structure, such as the α-helix, maintain good robustness. The robustness principle was the starting point for the robustness optimization algorithm called Robust-Protein-Design (RPD). RPD systematically mutates the most sensitive residue of a protein in order to discover new mutants with the same function and improved robustness. The results on the Crambin protein (1CRN) and on the Oxytocin drug (DB00107) confirm that the algorithm discovers new protein sequences with the same function, a similar structure and an improved yield. It is important to remark that the suggested methodology is general and transparent to the problem domain. Its universality follows from the definitions of system, robustness and yield, which make no assumption on the nature of the system or on the properties or features to be analyzed.

The robustness design principle could be applied to any kind of system definable in mathematical terms, from biological pathways to electronic circuits. The general applicability of this approach opens new frontiers towards the automatic in silico design of molecular and synthetic systems.


Supplementary materials

More simulations and source codes are available at: http://www.dmi.unict.it/ stracquadanio/protein-robustness.html.


References

Bemporad, F., Gsponer, J., Hopearuoho, H., Plakoutsi, G., Stati, G., Stefani, M., et al. (2008). Biological function in a non-native partially folded state of a protein. EMBO Journal, 27(10), 1525–1535.

Cornell, W., Cieplak, P., Bayly, C., Gould, I., Merz, K., Ferguson, D., et al. (1995). A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. Journal of the American Chemical Society, 117(19), 5179–5197.

Cutello, V., Narzisi, G., & Nicosia, G. (2006). A multi-objective evolutionary approach to the protein structure prediction problem. Journal of the Royal Society Interface, 3(6), 139–151.

Eisenberg, D., Wesson, M., & Yamashita, M. (1989). Interpretation of protein folding and binding with atomic solvation parameters. Chemica Scripta, 29, 217–221.

Eisenmenger, F., Hansmann, U., Hayryan, S., & Hu, C. (2001). [SMMP] A modern package for simulation of proteins. Computer Physics Communications, 138(2), 192–212.

Estrada, E. (2002). Characterization of the folding degree of proteins. Bioinformatics, 18(5), 697–704. doi:10.1093/bioinformatics/18.5.697

Floudas, C. (2007). Computational methods in protein structure prediction. Biotechnology and Bioengineering, 97(2), 207–213.

Hansen, L., & Sargent, T. (2007). Robustness. Princeton University Press.

Helles, G. (2008). A comparative study of the reported performance of ab initio protein structure prediction algorithms. Journal of the Royal Society Interface, 5(21), 387–396.

Hermans, J., Berendsen, H., van Gunsteren, W., & Postma, J. (1984). A consistent empirical potential for water–protein interactions. Biopolymers, 23(8), 1513–1518.

Jaynes, E. (2003). Probability theory: The logic of science. Cambridge University Press.

Klepeis, J., & Floudas, C. (2003). ASTRO-FOLD: A combinatorial and global optimization framework for ab initio prediction of three-dimensional structures of proteins from the amino acid sequence. Biophysical Journal, 85(4), 2119–2146.

Klepeis, J., Pieja, M., & Floudas, C. (2003). Hybrid global optimization algorithms for protein structure prediction: Alternating hybrids. Biophysical Journal, 84(2), 869–882.

Kopp, A., Jia, X., & Chakravarty, S. (2007). Replacing energy by von Neumann entropy in quantum phase transitions. Annals of Physics, 322(6), 1466–1476.

Kullback, S. (1959). Information theory and statistics. Wiley Publication in Mathematical Statistics.

Lundin, C., Johansson, S., Johnson, A., Näslund, J., von Heijne, G., & Nilsson, I. (2007). Stable insertion of Alzheimer Aβ peptide into the ER membrane strongly correlates with its length. FEBS Letters, 581(20), 3809–3813.

Meinke, J., Mohanty, S., Eisenmenger, F., & Hansmann, U. (2008). SMMP v. 3.0: Simulating proteins and protein interactions in Python and Fortran. Computer Physics Communications, 178(6), 459–470.

Momany, F. A., McGuire, R. F., Burgess, A. W., & Scheraga, H. A. (1975). Energy parameters in polypeptides. VII. Geometric parameters, partial atomic charges, nonbonded interactions, hydrogen bond interactions and intrinsic torsional potentials for the naturally occurring amino acids. Journal of Physical Chemistry, 79(22), 2361–2381.

Nemethy, G., Gibson, K., Palmer, K., Yoon, C., Paterlini, G., Zagari, A., et al. (1992). Energy parameters in polypeptides. 10. Improved geometrical parameters and nonbonded interactions for use in the ECEPP/3 algorithm, with application to proline-containing peptides. The Journal of Physical Chemistry, 96(15), 6472–6484.

Nicosia, G., & Stracquadanio, G. (2008). Generalized pattern search algorithm for peptide structure prediction. Biophysical Journal, 95(10), 4988–4999.

Ooi, T., Oobatake, M., Nemethy, G., & Scheraga, H. (1987). Accessible surface areas as a measure of the thermodynamic parameters of hydration of peptides. Proceedings of the National Academy of Sciences USA, 84(10), 3086–3090.

Rényi, A. (1961). On measures of information and entropy. In Proceedings of the 4th Berkeley symposium on mathematics, statistics and probability (pp. 547–561).

Roterman, I., Lambert, M., Gibson, K., & Scheraga, H. (1989). A comparison of the CHARMM, AMBER and ECEPP potentials for peptides. II. Phi–psi maps for N-acetyl alanine N′-methyl amide: comparisons, contrasts and simple experimental tests. Journal of Biomolecular Structure and Dynamics, 7(3), 421–453.

Tramontano, A. (2006). Protein structure prediction. Weinheim, Germany: Wiley Inc.

Vendruscolo, M. (2007). Determination of conformationally heterogeneous states of proteins. Current Opinion in Structural Biology, 17(1), 15–20.

Wesson, L., & Eisenberg, D. (1992). Atomic solvation parameters applied to molecular dynamics of proteins in solution. Protein Science, 1(2), 227.

Wishart, D. S., Knox, C., Guo, A. C., Shrivastava, S., Hassanali, M., Stothard, P., et al. (2006). Drugbank: A comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Research, 34, D668–D672.

Wolynes, P. (2006). Recent successes of the energy landscape theory of protein folding and function. Quarterly Reviews of Biophysics, 38(04), 405–410.

Worth, C., Bickerton, G., Schreyer, A., Forman, J., Cheng, T., Lee, S., et al. (2007). A structural bioinformatics approach to the analysis of nonsynonymous single nucleotide polymorphisms (nsSNPs) and their relation to disease. Journal of Bioinformatics and Computational Biology, 5(6), 1297–1318.

