
Modeling of the Inhibition Constant (Ki) of Some Cruzain Ketone-Based Inhibitors Using 2D Spatial Autocorrelation Vectors and Data-Diverse Ensembles of Bayesian-Regularized Genetic Neural Networks

Julio Caballero a, Alain Tundidor-Camba b and Michael Fernández a,*
a Molecular Modeling Group, Center for Biotechnological Studies, University of Matanzas, Autopista Varadero km 3 1/2, Matanzas, C.P. 44740, Cuba, E-mail: [email protected], Tel: (53)(45)261251, Fax: (53)(45)253101

b Scientific Prospection Group, National Centre for Scientific Researches (CNIC), P.O. Box 6880, Havana, Cuba

Keywords: Artificial neural networks, Bayesian regularization, Genetic algorithm, QSAR, Trypanosoma cruzi

Received: January 1, 2006; Accepted: February 17, 2006

DOI: 10.1002/qsar.200610001

Abstract
The inhibition constant (Ki) of a set of 46 ketone-based inhibitors of the cysteine protease cruzain was successfully modeled by means of data-diverse ensembles of Bayesian-regularized genetic neural networks. 2D spatial autocorrelation vectors were used for encoding structural information, yielding a nonlinear model describing about 90 and 75% of ensemble training and test set variances, respectively. A ranking analysis of the neural network inputs showed that atomic van der Waals volume distributions at topological lags 3, 5, and 6 in the 2D topological structure of the inhibitors have a strong nonlinear influence on the inhibition constants. Furthermore, the optimum subset of autocorrelation vectors mapped the studied compounds well according to their inhibition constant values in a Kohonen self-organizing map.

1 Introduction

Trypanosoma cruzi, a parasitic protozoan, is the causative agent of Chagas disease, or American trypanosomiasis, one of the most threatening endemic diseases in Central and South America. Approximately 16 – 18 million people are infected, resulting in adverse health events such as heart failure and more than 50 000 deaths each year [1]. It is thought that another 100 million people are at risk of infection [1].


Abbreviations: ANN, Artificial Neural Network; BRANN, Bayesian-Regularized Artificial Neural Network; BRGNN, Bayesian-Regularized Genetic Neural Network; GA, Genetic Algorithm; KCI, Ketone-based Cruzain Inhibitor; LOO, Leave-One-Out; NNE, Neural Network Ensemble; PLS, Partial Least Squares; QSAR, Quantitative Structure-Activity Relationship; SOM, Self-Organizing Map

Symbols:
Ki: Inhibition constant
pk: Weighting atomic property k
l: Spatial lag
p̄k: Average value of property k
N: Number of compounds
L: Number of nonzero elements in the autocorrelation sum
dij: Topological distance or spatial lag between atoms i and j
δ(l, dij): Dirac delta function
MATSlpk: Moran's index at topological lag l weighted by the atomic property pk
GATSlpk: Geary's coefficient at topological lag l weighted by the atomic property pk
ATSlpk: Broto-Moreau's autocorrelation coefficient at spatial lag l weighted by the atomic property pk
Inpj: Input j
Outj: Output j obtained from the input j
f(inpj): Transfer function
F: Network performance function
MSE: Mean of the sum of squares of the network errors
yi: Predicted biological activity of compound i
ti: Experimental activity of compound i
vi: Additive noise process
σv: Zero-mean Gaussian noise standard deviation
P(D|w,β,M): Likelihood
MSW: Mean of the sum of the squares of the network weights
α: Inverse variance of the prior distribution
P(w|D,α,β,M): Posterior probability
P(w|α,M): Prior density
RMSE: Root mean square network error


The infectious trypomastigote form of T. cruzi is transmitted to human hosts through triatomine "kissing bug" vectors.

The primary cysteine protease of T. cruzi, cruzain, is expressed throughout the life cycle and is essential for the survival of the parasite within host cells [2]. The cruzain enzyme is involved in parasite penetration into the host cell and in the digestion of immunoglobulins as a defense mechanism. Accordingly, some cruzain inhibitors have been shown to successfully treat animal models of Chagas disease by blocking the parasitic life cycle [3]. Thus, targeting cruzain has become attractive for the development of potential therapeutics for the treatment of Chagas disease.

Cruzain is inhibited by organomercurial reagents, E-64, Tos-Lys-CH2Cl, leupeptin, a number of peptidyl chloromethane and peptidyl fluoromethane derivatives, vinyl sulfones, thiosemicarbazones, cystatins, stefins, and kininogens [3]. Covalent inhibitors such as peptidyl epoxy ketones [4], aldehydes, and vinyl sulfones [5] have also been reported to inhibit cruzain. However, the design of cruzain inhibitors has focused almost exclusively on irreversible inhibitors such as fluoromethyl ketones and vinyl sulfones [6]. Although animal studies established that vinyl sulfone inhibitors are not toxic at therapeutic doses, the poor selectivity of these irreversible inhibitors for cruzain over human cysteine proteases [6] remains a significant concern.

Ketone-based Cruzain Inhibitors (KCIs) [6, 7] are the most promising cruzain inhibitors: they showed greater than 1000-fold selectivity for cruzain over human cathepsin B and 100-fold selectivity for cruzain over human cathepsin L [6]. Focusing rational drug design studies on these compounds could lead to improved KCIs and to the identification of structural features relevant for binding to the cruzain enzyme.

Since interactions between a chemical and its biological target are often nonlinear, Artificial Neural Network (ANN) methodology has been successfully applied in regression Quantitative Structure-Activity Relationship (QSAR) studies of biological properties [8 – 17]. Besides the nonlinearity between biological activities and the computed molecular descriptors, another major problem arises when the number of calculated variables exceeds the number of compounds in the data set, so that one is dealing with an underdetermined problem in which undesirable overfitting can result [10]. This problem can be handled by implementing a feature selection routine that determines which of the descriptors have a significant influence on the activity of a set of compounds. A Genetic Algorithm (GA), rather than forward or backward elimination procedures, has been successfully applied for feature selection in QSAR studies when the dimensionality of the data set is high and/or the interrelations between variables are convoluted [10, 12 – 17].

In this work we report the neural network modeling of the inhibition constant Ki of some KCIs using a 2D topological approach. Structural information was encoded by computing 2D autocorrelation vectors over the topological representation of 46 KCIs. In this way, a set of descriptors was computed and, by employing a nonlinear modeling technique previously used by our group, Bayesian-Regularized Genetic Neural Networks (BRGNNs) [13 – 17], optimum ANN-based predictive models of the inhibition constants were built. In order to provide robust models, we employed data-diverse ensembles of BRGNNs, rather than a single network, for calculating Ki. In addition to the regression model, we built a Self-Organizing Map (SOM) of inhibition constants using the inputs of the optimum BRGNN predictor for unsupervised training of competitive neurons.

2 Materials and Methods

2.1 2D Spatial Autocorrelation Approach

The binding of a substrate to its receptor depends on the shape of the substrate and on a variety of effects such as the molecular electrostatic potential, polarizability, hydrophobicity, and lipophilicity. Therefore, in a QSAR study the strategy for encoding molecular information must, either explicitly or implicitly, account for these physicochemical effects. Furthermore, data sets usually include molecules of different size with different numbers of atoms, so the structural encoding must allow such molecules to be compared. Information of variable length can be transformed into fixed-length information by autocorrelation [9].

Autocorrelation vectors have several useful properties. First, a substantial reduction in data can be achieved by limiting the topological distance l. Second, the autocorrelation coefficients are independent of the original atom numbering, so they are canonical. Third, the length of the autocorrelation vector is independent of the size of the molecule [9].

For the autocorrelation vectors, the H-depleted molecular structure is represented as a graph G, and the physicochemical properties of the atoms (i.e., atomic masses, atomic van der Waals volumes, atomic Sanderson electronegativities, and atomic polarizabilities) as real values assigned to the vertices of G (Table 1).

These descriptors are obtained by summing the products of a given property of two atoms located at a given topological distance or spatial lag in G. Three spatial autocorrelation vectors were employed for modeling the inhibitory activity.

Moran's index [18]


$$\mathrm{MATS}_l p_k = \frac{N}{2L} \cdot \frac{\sum_i \sum_j \delta(l, d_{ij})\,(p_{ki} - \bar{p}_k)(p_{kj} - \bar{p}_k)}{\sum_i (p_{ki} - \bar{p}_k)^2} \qquad (1)$$

Geary's coefficient [19]

$$\mathrm{GATS}_l p_k = \frac{N-1}{4L} \cdot \frac{\sum_i \sum_j \delta(l, d_{ij})\,(p_{ki} - p_{kj})^2}{\sum_i (p_{ki} - \bar{p}_k)^2} \qquad (2)$$

Broto-Moreau's autocorrelation coefficient [20]

$$\mathrm{ATS}_l p_k = \sum_i \sum_j \delta(l, d_{ij})\, p_{ki}\, p_{kj} \qquad (3)$$

where MATSlpk, GATSlpk, and ATSlpk are Moran's index, Geary's coefficient, and Broto-Moreau's autocorrelation coefficient at spatial lag l, respectively; pki and pkj are the values of property k for atoms i and j, respectively; p̄k is the average value of property k; L is the number of nonzero elements in the sum; and δ(l, dij) is a Dirac delta function defined as

$$\delta(l, d_{ij}) = \begin{cases} 1 & \text{if } d_{ij} = l \\ 0 & \text{if } d_{ij} \neq l \end{cases} \qquad (4)$$

where dij is the topological distance or spatial lag between atoms i and j.

Spatial autocorrelation measures the level of interdependence between properties, as well as the nature and strength of that interdependence. It may be classified as either positive or negative: in the positive case similar values appear together, while under negative spatial autocorrelation dissimilar values appear in close association [18, 19]. In a molecule, Moran's and Geary's spatial autocorrelation analyses test whether the value of an atomic property at one atom in the molecular structure is independent of the values of that property at neighboring atoms. If dependence exists, the property is said to exhibit spatial autocorrelation. Moreau and Broto first applied the autocorrelation function to the topology of molecular structures [20, 21]. The autocorrelation vectors represent the degree of similarity between molecules. In addition, the 2D spatial autocorrelation code has been successfully applied in nonlinear QSAR studies, proving to contain relevant nonlinear information concerning different biological phenomena [8, 9, 14, 17].
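As a concrete illustration of Eqs. (1) – (3), the following Python sketch computes the three autocorrelation vectors for one molecule from its topological distance matrix and a vector of atomic property values. It is a minimal sketch written for this text (not the Dragon implementation used in the paper); it follows the equations above literally, and published implementations differ in conventions such as how atom pairs are counted.

```python
import numpy as np

def autocorrelation_vectors(dist, prop, max_lag=8):
    """2D autocorrelation descriptors for one H-depleted molecular graph.
    dist : (n, n) matrix of topological distances d_ij
    prop : (n,) vector of an atomic property p_k (e.g. van der Waals volumes)
    Returns ATS, MATS and GATS vectors for lags 1..max_lag, following
    Eqs. (1)-(3) literally (sums run over all ordered atom pairs)."""
    n = len(prop)
    dev = prop - prop.mean()                     # p_ki - mean(p_k)
    denom = (dev ** 2).sum()                     # sum_i (p_ki - mean)^2
    ats, mats, gats = [], [], []
    for lag in range(1, max_lag + 1):
        mask = (dist == lag)                     # delta(l, d_ij)
        L = mask.sum()                           # nonzero elements in the sum
        if L == 0:
            ats.append(0.0); mats.append(0.0); gats.append(0.0)
            continue
        ats.append(float((mask * np.outer(prop, prop)).sum()))
        mats.append(float(n / (2 * L) * (mask * np.outer(dev, dev)).sum() / denom))
        diff2 = (prop[:, None] - prop[None, :]) ** 2
        gats.append(float((n - 1) / (4 * L) * (mask * diff2).sum() / denom))
    return np.array(ats), np.array(mats), np.array(gats)
```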

2.2 BRGNN Approach

In the context of ANN-based modeling of biological interactions, we have used BRGNNs as a robust nonlinear modeling technique that combines GA and Bayesian regularization for neural network input selection and supervised network training, respectively. This approach attempts to address the main weaknesses of neural network modeling: the selection of optimum input variables and the adjustment of network weights and biases to optimum values that yield generalizable neural network predictors [10, 22].

By combining the concepts of Bayesian-Regularized Artificial Neural Networks (BRANNs) and GAs, BRGNNs are implemented in such a way that BRANN inputs are selected inside a GA framework. The BRGNN approach is a version of the So and Karplus algorithm [10] incorporating Bayesian regularization, which has been successfully introduced by our group for modeling the inhibitory activity toward several therapeutic target enzymes [13 – 17]. BRGNN was programmed within the MATLAB environment [23] using the GA and Neural Networks Toolboxes. The BRGNN technique leads to neural networks trained with optimum inputs selected from the whole autocorrelation vector data matrix (Figure 1).

2.2.1 BRANN

ANNs are computer-based models in which a number of processing elements, also called neurons, units, or nodes, are interconnected by links in a netlike structure forming "layers" [24, 25]. Every connection between two neurons is associated with a weight, a positive or negative real number that multiplies the signal from the preceding neuron. Neurons are commonly distributed among the input, hidden, and output layers. Neurons in the input layer receive their values from the independent variables; in turn, the hidden neurons collect values from preceding neurons and pass their result on to a successor neuron. Finally, neurons in the output layer take values from other units and correspond to the different dependent variables.

Commonly, ANNs are adjusted, or trained, so that a particular input leads to a specific target output. Accordingly, the output j is obtained from the input j by application of Eq. (5)


Table 1. Representation of different molecular graphs G and topological distances or spatial lags dij. (Example molecular graphs with their dij values; the graph drawings are not reproduced here.)


$$\mathrm{Out}_j = f(\mathrm{inp}_j) \qquad (5)$$

where the function f is called the transfer function. When the ANN is trained, the weights are updated in order to minimize the network error. In contrast to common statistical methods, ANNs are not restricted to linear correlations or linear subspaces [24]. The transfer function employed, commonly the hyperbolic tangent function, allows nonlinear relations to be established. Thus, ANNs can take into account nonlinear structures as well as structures of arbitrarily shaped clusters or curved manifolds.

As more connections take effect, the ANN fits the input-output relation better. However, when the number of parameters increases, the network loses its ability to generalize: the error on the training set is driven to a very small value, but when new data are presented to the network the error is large. In this situation the predictor has memorized the training examples but has not learned to generalize to new situations; that is, the network overfits the data.

Network overfitting has been avoided by using reduced-architecture networks and by implementing regularization [22] and early stopping [25] algorithms. Regularization involves modifying the performance function, which is normally chosen to be the sum of squares of the network errors (MSE) on the training set. As a result, the network parameters take regularized values, with smaller weights and biases, which forces the network response to be smoother and less likely to overfit. In contrast to early stopping, when regularization is used the networks are trained until convergence is reached. Early stopping, on the other hand, attempts to avoid overfitting by eliminating network overtraining; for this purpose the available data are divided into three subsets. The first subset is the training set, which is used for computing the gradient and updating the network weights and biases. The second subset is the validation set, whose error is monitored during the training process. The validation error will normally decrease during the initial phase of training, as does the training set error. However, when the network begins to overfit the data, the error on the validation set will typically begin to rise. When the validation error increases for a specified number of iterations, the training is stopped, and the weights and biases at the minimum of the validation error are returned.

Regularization generally provides better generalization performance than early stopping when training function approximation networks [26]. This is because Bayesian regularization does not require that a validation data set be separated out of the training data set; it uses all of the data. This advantage is especially noticeable when the size of the data set is small. With early stopping, the choice of the validation set is also important: the validation set should be representative of all points in the training set, and this drawback again becomes critical when using small data sets.
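To make Eq. (5) and the layered signal flow described above concrete, the short Python sketch below propagates a descriptor vector through a toy network with a hyperbolic tangent hidden layer and a linear output; all names and the random weights are illustrative only.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """One pass through a three-layer net: linear input, tanh hidden, linear output.
    Every hidden/output neuron applies Eq. (5): out_j = f(inp_j), where inp_j is
    the weighted sum of the signals from the preceding layer plus a bias."""
    hidden_inp = W1 @ x + b1          # weighted sums reaching the hidden neurons
    hidden_out = np.tanh(hidden_inp)  # nonlinear transfer function
    return W2 @ hidden_out + b2       # linear output neuron

# toy 3-3-1 network with random weights, just to show the shapes involved
rng = np.random.default_rng(0)
x = rng.normal(size=3)                          # three descriptor values
W1, b1 = rng.normal(size=(3, 3)), rng.normal(size=3)
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)
print(forward(x, W1, b1, W2, b2))               # a (meaningless) toy prediction
```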


Figure 1. Schematic representation of the BRGNN technique with a prototype back-propagation neural network of 3-3-1 architecture. Autocorrelation vectors chosen by the GA constitute the inputs, and the network is trained against the logarithmic inhibition constants (Ki) of the KCIs.


Typically, neural network training aims to reduce the sum of squared errors

$$F = \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - t_i)^2 \qquad (6)$$

In this equation F is the network performance function, MSE is the mean of the sum of squares of the network errors, N is the number of compounds, yi is the predicted biological activity of compound i, and ti is the experimental activity of compound i.

MacKay's BRANNs have been designed to resist overfitting [26]. To accomplish this, BRANNs include an error term that regularizes the weights by penalizing overly large magnitudes.

Assuming a set of pairs D = {xi, ti}, where i = 1...N is a label running over the pairs, the data set can be modeled as deviating from this mapping under some additive noise process (vi)

$$t_i = y_i + v_i \qquad (7)$$

If v is modeled as zero-mean Gaussian noise with standard deviation σv, then the probability of the data given the parameters w is

$$P(D \mid w, \beta, M) = \frac{1}{Z_D(\beta)} \exp(-\beta \cdot \mathrm{MSE}) \qquad (8)$$

where M is the particular neural network model used, β = 1/σv², and the normalization constant is given by Z_D(β) = (π/β)^{N/2}. P(D|w,β,M) is called the likelihood. The maximum likelihood parameters wML (the w that minimizes MSE) depend sensitively on the details of the noise in the data.

To complete the interpolation model, a prior probability distribution must be defined that embodies our prior knowledge of the sort of mappings that are "reasonable" [27]. Typically this is quite a broad distribution, reflecting the fact that we only have a vague belief in a range of possible parameter values. Once we have observed the data, Bayes' theorem can be used to update our beliefs, and we obtain the posterior probability density. As a result, the posterior distribution is concentrated on a smaller range of values than the prior distribution. Since a neural network with large weights will usually give rise to a mapping with large curvature, we favor small values for the network weights. At this point, a prior is defined that expresses the sort of smoothness the interpolant is expected to have. The model has a prior of the form

$$P(w \mid \alpha, M) = \frac{1}{Z_W(\alpha)} \exp(-\alpha \cdot \mathrm{MSW}) \qquad (9)$$

where α represents the inverse variance of the distribution and the normalization constant is given by Z_W(α) = (π/α)^{n/2}, with n the number of network weights. MSW is the mean of the sum of the squares of the network weights and is commonly referred to as a regularizing function.

Considering the first level of inference, if α and β are known, then the posterior probability of the parameters w is

$$P(w \mid D, \alpha, \beta, M) = \frac{P(D \mid w, \beta, M) \cdot P(w \mid \alpha, M)}{P(D \mid \alpha, \beta, M)} \qquad (10)$$

where P(w|D,α,β,M) is the posterior probability, i.e., the plausibility of a weight distribution given the information of the data set and the model used; P(w|α,M) is the prior density, which represents our knowledge of the weights before any data are collected; P(D|w,β,M) is the likelihood function, which is the probability of the data occurring given the weights; and P(D|α,β,M) is a normalization factor, which guarantees that the total probability is 1.

Considering that the noise in the training set data is Gaussian and that the prior distribution for the weights is Gaussian, the posterior probability fulfills the relation

$$P(w \mid D, \alpha, \beta, M) = \frac{1}{Z_F} \exp(-F) \qquad (11)$$

where Z_F depends on the objective function parameters. Under this framework, minimization of F is identical to finding the (locally) most probable parameters [26].

In short, Bayesian regularization involves modifying the performance function F defined in Eq. (6); generalization is improved by adding an additional term:

$$F = \beta \cdot \mathrm{MSE} + \alpha \cdot \mathrm{MSW} \qquad (12)$$

The relative size of the objective function parameters α and β dictates the emphasis placed on obtaining a smoother network response. MacKay's Bayesian framework automatically adapts the regularization parameters to maximize the evidence of the training data [26].

Bayesian regularization overcomes the remaining deficiencies of neural networks and produces predictors that are robust and well matched to the data; in this sense, BRANNs have been successfully applied in structure-property/activity analysis [13 – 17, 22].

Fully connected, three-layer BRANNs with back-propagation training were implemented in the MATLAB environment [23]. In these nets, the transfer functions of the input and output layers were linear, and the hidden layer had neurons with a hyperbolic tangent transfer function. Inputs and targets took their values from the independent variables selected by the GA and from the -log(Ki) values, respectively; both were normalized prior to network training. BRANN training was carried out according to the Levenberg-Marquardt optimization [28].


The initial value for μ was 0.005, with decrease and increase factors of 0.1 and 10, respectively. The training was stopped when μ became larger than 10^10.
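The actual networks were trained with the MATLAB Neural Networks Toolbox (Levenberg-Marquardt with Bayesian regularization). Purely as an illustration of the regularized objective of Eq. (12), the Python sketch below minimizes F = β·MSE + α·MSW by plain gradient descent with fixed α and β; MacKay's automatic re-estimation of the regularization parameters and the Levenberg-Marquardt steps are deliberately omitted.

```python
import numpy as np

def train_regularized(X, t, n_hidden=2, alpha=0.01, beta=1.0,
                      lr=0.05, epochs=5000, seed=0):
    """Minimize F = beta*MSE + alpha*MSW (Eq. 12) for a tanh-hidden network.
    Simplified sketch: fixed alpha/beta and gradient descent instead of the
    Levenberg-Marquardt/evidence-maximization scheme used in the paper."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(n_hidden, d)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=n_hidden);      b2 = 0.0
    m = W1.size + W2.size                      # number of weights entering MSW
    for _ in range(epochs):
        h = np.tanh(X @ W1.T + b1)             # hidden activations
        y = h @ W2 + b2                        # network outputs
        err = y - t                            # residuals
        # gradients of beta*MSE ...
        gW2 = beta * 2 / n * h.T @ err
        gb2 = beta * 2 / n * err.sum()
        gh = np.outer(err, W2) * (1 - h ** 2)
        gW1 = beta * 2 / n * gh.T @ X
        gb1 = beta * 2 / n * gh.sum(axis=0)
        # ... plus gradients of alpha*MSW (mean of squared weights)
        gW1 += alpha * 2 / m * W1
        gW2 += alpha * 2 / m * W2
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2
```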

2.2.2 GA

GAs are governed by biological evolution rules [29]. They are stochastic optimization methods inspired by evolutionary principles. The distinctive aspect of a GA is that it investigates many possible solutions simultaneously, each of which explores a different region of parameter space [30]. The first step is to create a population of N individuals, each encoding the same number of randomly chosen descriptors, and to determine the fitness of each individual in this generation. In the second step, a fraction of the children of the next generation is produced by crossover (crossover children) and the rest by mutation (mutation children) from parents selected on the basis of their scaled fitness scores. The new offspring contain characteristics from one or both of their parents.

In the BRGNN approach, individuals in the population are BRANN predictors with a fixed architecture, and the MSE of data fitting was used as the individual fitness function (Figure 2). An individual is represented by a string of integers that indexes the rows of the all-descriptor matrix (96 rows × 46 columns) to be tested as BRANN inputs. So and Karplus [10] used a variety of fitness functions proportional to the residual error of the training set, the test set, or even the cross-validation set from the neural network simulations. However, since we implemented regularized networks, we used the MSE of data fitting as the individual fitness function. The first step is to create a gene pool (a population of neural network predictors) of N individuals. Each individual encodes the same number of descriptors; the descriptors are randomly chosen from a common data matrix in such a way that (1) no two individuals can have exactly the same set of descriptors and (2) all descriptors in a given individual must be different. The fitness of each individual in this generation is determined by the MSE of the model and scaled using a scaling function. A top scaling fitness function scales a top fraction of the individuals in a population equally; these individuals have the same probability of being reproduced, while the rest are assigned the value 0.

In the next step, a fraction of the children of the next generation is produced by crossover (crossover children) and the rest by mutation (mutation children) from the parents. Sexual and asexual reproduction take place, so that the new offspring contain characteristics from one or both of their parents. In sexual reproduction, two individuals are selected probabilistically on the basis of their scaled fitness scores and serve as parents. In a crossover, each parent then contributes a random selection of half of its descriptor set, and a child is constructed by combining these two halves of "genetic code". Finally, the rest of the individuals in the new generation are obtained by asexual reproduction, in which randomly selected parents are subjected to a random mutation in one of their genes; i.e., one descriptor is replaced by another.

Similarly to So and Karplus [10], we also included elitism, which protects the fittest individual in any given generation from crossover or mutation during reproduction; the genetic content of this individual simply moves on to the next generation intact. This selection, crossover, and mutation process is repeated until all of the N parents in the population are replaced by their children. The fitness score of each member of this new generation is again evaluated, and the reproductive cycle is continued until 90% of the generations show the same target fitness score [31].

of our algorithm is not to obtain a sole optimum modelbut a reduced population of well-fitted models, with MSEthat lower a threshold MSE value, at which the Bayesian

32 I 2007 WILEY-VCH Verlag GmbH&Co. KGaA, Weinheim www.qcs.wiley-vch.de QSAR Comb. Sci. 26, 2007, No. 1, 27 – 40

Figure 2. Flow diagram describing the strategy for the imple-mented GA.

Full Papers Julio Caballero et al.

Page 7: Modeling of the Inhibition Constant (Ki) of Some Cruzain Ketone-Based Inhibitors Using 2D Spatial Autocorrelation Vectors and Data-Diverse Ensembles of Bayesian-Regularized Genetic

regularization guaranties to posses good generalizationabilities (Figure 2). This is because we used MSE of datatraining fitting instead of cross-validation or test set MSEvalues as cost function and therefore the optimum modelcannot be directly derived from the best-fitted modelyielded by the genetic search. However, from cross-valida-tion experiments over the subpopulation of well-fittedmodels it can derive the best generalizable network withthe highest predictive power. This process also assures toavoid chance correlations. This approach have shown to behighly efficient in comparison with cross-validation-basedGA approach since only optimum models, according tothe Bayesian regularization, are cross validated at the endof the routine and not all the model generated throughoutall the search process.
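A minimal sketch of this descriptor-selection loop is given below (Python, written for illustration rather than taken from the MATLAB GA Toolbox routine actually used). Individuals are fixed-size descriptor subsets, the fitness is the training MSE of a regularized model built on that subset (a cheap ridge-regression stand-in replaces the BRANN here, purely to keep the example self-contained), and elitism, crossover, and mutation follow the scheme outlined above.

```python
import numpy as np

def fitness(X, y, subset, ridge=1e-2):
    """Training MSE of a regularized linear stand-in model on the chosen columns.
    In the BRGNN approach this would be the fitting MSE of a trained BRANN."""
    A = np.hstack([X[:, subset], np.ones((len(X), 1))])   # add intercept column
    w = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ y)
    return float(np.mean((A @ w - y) ** 2))

def ga_select(X, y, k=5, pop_size=30, generations=50, cross_frac=0.8, seed=0):
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]
    pop = [rng.choice(n_desc, size=k, replace=False) for _ in range(pop_size)]
    for _ in range(generations):
        scores = np.array([fitness(X, y, ind) for ind in pop])
        order = np.argsort(scores)                         # lower MSE = fitter
        elite = pop[order[0]]                              # elitism: keep best intact
        parents = [pop[i] for i in order[: pop_size // 2]]
        children = [elite]
        while len(children) < pop_size:
            if rng.random() < cross_frac:                  # crossover child
                p1, p2 = rng.choice(len(parents), 2, replace=False)
                genes = np.unique(np.concatenate([parents[p1], parents[p2]]))
                child = rng.choice(genes, size=k, replace=False)
            else:                                          # mutation child
                child = parents[rng.integers(len(parents))].copy()
                pool = np.setdiff1d(np.arange(n_desc), child)
                child[rng.integers(k)] = rng.choice(pool)  # swap one descriptor
            children.append(child)
        pop = children
    scores = np.array([fitness(X, y, ind) for ind in pop])
    return pop[int(np.argmin(scores))]                     # best descriptor subset
```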

2.3 Data-Diverse Artificial Neural Network Ensembles (NNEs)

An artificial NNE is a learning paradigm in which many ANNs are jointly used to solve a problem [32]. A collection of a finite number of neural networks is trained for the same task, and their outputs are combined to form one unified prediction. As a result, the generalization ability of the neural network system can be significantly improved by reducing overfitting [33].

An effective NNE should consist of a set of ANNs that are not only highly accurate but also make their errors on different parts of the input space. Thus, combining the outputs of several predictors is only useful if they disagree on some inputs. Krogh and Vedelsby [34] proved that the ensemble error can be divided into a term measuring the average generalization error of each individual network and a term, called diversity, that measures the disagreement among the networks. In this way, the MSE of the ensemble estimator is guaranteed to be less than or equal to the averaged MSE of the component estimators.

Model diversity can be introduced by manipulating the input features (feature selection), randomizing the training procedure (overfitting, underfitting, training with different topologies and/or training parameters, etc.), manipulating the response value (adding noise), or manipulating the training set [35]. Since BRANN predictors have been demonstrated to be highly stable to network topology variations [22], the latter method was used for introducing diversity into the BRGNN ensembles.

Data-diverse NNEs were previously used by us for model validation [15]. For generating the NNE constituent predictors, we partitioned the whole data set into several training and test sets. The assembled predictors aggregate their outputs to produce a single prediction. In this way, instead of predicting a single randomly selected external set, we predict the average result over several of them; each inhibitor is predicted several times, as part of training and test sets, and the average of both values is reported. The predictive power was measured by the squared correlation coefficient (R²) and Root MSE (RMSE) values of the averaged test set of BRGNN ensembles having an optimum number of members.
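The protocol can be sketched as follows (illustrative Python; train_model and predict are hypothetical placeholders for any regression learner, here standing in for BRGNN training and prediction): each member is trained on a different random 75%/25% partition, and every compound's training-set and test-set predictions are averaged over the ensemble.

```python
import numpy as np

def data_diverse_ensemble(X, y, train_model, predict,
                          n_members=50, train_frac=0.75, seed=0):
    """Average the predictions of many models trained on different random
    75%/25% partitions of the data (the ensemble scheme used in this work).
    train_model(X, y) -> model and predict(model, X) -> y_hat are placeholders."""
    rng = np.random.default_rng(seed)
    n = len(y)
    train_sum = np.zeros(n); train_cnt = np.zeros(n)
    test_sum = np.zeros(n);  test_cnt = np.zeros(n)
    for _ in range(n_members):
        perm = rng.permutation(n)
        cut = int(round(train_frac * n))
        tr, te = perm[:cut], perm[cut:]
        model = train_model(X[tr], y[tr])
        train_sum[tr] += predict(model, X[tr]); train_cnt[tr] += 1
        test_sum[te] += predict(model, X[te]);  test_cnt[te] += 1
    # per-compound averages over all training-set and test-set appearances
    return train_sum / np.maximum(train_cnt, 1), test_sum / np.maximum(test_cnt, 1)
```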

2.4 SOMs

Although back-propagation neural networks have been preferred extensively for nonlinear QSAR modeling, SOMs have also been reported as useful ANNs, with important merits and widespread applications [36, 37]. In order to establish structural similarities among the KCIs, a Kohonen SOM was built. Kohonen [38] introduced a neural network model that generates SOMs. In such maps, molecules with similar descriptor vectors are projected onto the same or closely adjacent neurons [36, 37]. These networks have been successfully used for addressing structural similarities among bioactive chemical data sets [9, 12 – 17, 39].

In this work, SOMs were implemented in the MATLAB environment [23], and neurons were initially located on a grid topology. The ordering phase was developed in 1000 steps with a 0.9 learning rate, until a tuned neighborhood distance (1.0) was achieved. The tuning-phase learning rate was 0.02. Training was performed for 2000 epochs in an unsupervised manner.
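A bare-bones Kohonen map in the spirit of the one used here can be written in a few lines of Python (a simplified sketch; the work itself used the MATLAB implementation with the ordering/tuning schedule given above):

```python
import numpy as np

def train_som(data, grid=(8, 8), epochs=2000, lr0=0.9, sigma0=1.0, seed=0):
    """Unsupervised training of a Kohonen self-organizing map on a grid topology.
    data: (n_samples, n_features) matrix of (normalized) descriptor vectors."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.random((rows, cols, data.shape[1]))          # neuron weight vectors
    coords = np.dstack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"))
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                      # decaying learning rate
        sigma = max(sigma0 * (1 - t / epochs), 0.5)      # shrinking neighborhood
        x = data[rng.integers(len(data))]
        dist = np.linalg.norm(W - x, axis=2)
        winner = np.unravel_index(np.argmin(dist), dist.shape)
        # Gaussian neighborhood on the grid, centred at the winning neuron
        g = np.exp(-np.linalg.norm(coords - np.array(winner), axis=2) ** 2
                   / (2 * sigma ** 2))
        W += lr * g[..., None] * (x - W)
    return W

def winning_neuron(W, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return np.unravel_index(np.argmin(np.linalg.norm(W - x, axis=2)), W.shape[:2])
```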

2.5 Data Set and Computational Strategies

In an attempt to investigate the inhibition of cruzain, we sought a reliable nonlinear regression model for the inhibition constant Ki of 46 KCIs against cruzain taken from the literature [6, 7] (Table 2). We aimed to identify the relevant chemical structural features required and/or responsible for enzyme inhibition by means of the BRGNN approach. In this connection, 2D autocorrelation vectors were used for encoding structural information [18 – 20].

This chemical code has been successfully employed by our group for modeling enzyme inhibitory activities [12, 17]. The Dragon software [40] was used for calculating the three types of autocorrelation vectors (Section 2.1) at spatial lags ranging from one to eight, weighted by four atomic properties (atomic masses, atomic van der Waals volumes, atomic Sanderson electronegativities, and atomic polarizabilities); thus, a total of 96 (3 × 8 × 4) 2D autocorrelation vectors were computed. Descriptors that stayed constant or almost constant were eliminated; pairs of variables with a squared correlation coefficient (R²) greater than 0.9 were classified as intercorrelated, and only one of each such pair was included for building the model. Finally, a 77-descriptor data matrix was obtained.
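The descriptor pruning just described (dropping near-constant columns and, from each pair with R² > 0.9, keeping only one) can be sketched as follows; the thresholds mirror those given in the text, while the function itself is only an illustration:

```python
import numpy as np

def prune_descriptors(X, names, var_tol=1e-6, r2_max=0.9):
    """Remove (almost) constant descriptors, then one member of every pair
    whose squared Pearson correlation exceeds r2_max."""
    keep = [i for i in range(X.shape[1]) if np.var(X[:, i]) > var_tol]
    X = X[:, keep]; names = [names[i] for i in keep]
    r2 = np.corrcoef(X, rowvar=False) ** 2
    selected = []
    for j in range(X.shape[1]):
        if all(r2[j, s] <= r2_max for s in selected):
            selected.append(j)                  # keep only one of each pair
    return X[:, selected], [names[j] for j in selected]
```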

Since 77 2D autocorrelation descriptors were available for a QSAR analysis on 46 compounds, the 77-dimensional autocorrelation space was explored in search of 3 – 6-dimensional subspaces yielding an optimum nonlinear regression model through the BRGNN technique.


Table 2. Chemical structures of KCIs, logarithmic experimental inhibition constants, and inhibition constants predicted by the optimum model BRGNN 2.

No.b  R1                            R2         R3              R4     -log(Ki) Exp.  Calc.trainc  Calc.testd
1     -CH2-S-CH2(4-Cl Ph)           -CH2Ph     -CH2Ph          Ph     -1.918         -1.775       -1.652
2     -CH2-S-CH2Ph                  -CH2Ph     -CH2Ph          Ph     -1.644         -1.440       -1.306
3     -CH2-S-CH(CH3)CO2Et           -CH2Ph     -CH2Ph          Ph     -1.568         -1.907       -2.000
4     -CH2-S-CH2CH2NHCOCH3          -CH2Ph     -CH2Ph          Ph     -1.535         -1.031       -0.948
5     -CH2-S-(c-(C5H9))             -CH2Ph     -CH2Ph          Ph     -1.535         -1.569       -1.669
6     -CH2-S-CH2(4-OCH3Ph)          -CH2Ph     -CH2Ph          Ph     -1.511         -1.465       -1.424
7     -CH2-S-CH2-S-CH2CO2Et         -CH2Ph     -CH2Ph          Ph     -1.494         -1.461       -1.462
8     -CH2-S-(CH2)2CH(CH3)2         -CH2Ph     -CH2Ph          Ph     -1.479         -1.651       -1.745
9     -CH2-S-CH2(4-C(CH3)3Ph)       -CH2Ph     -CH2Ph          Ph     -1.342         -1.243       -1.058
10    -CH2-S-CH2CONHCH3             -CH2Ph     -CH2Ph          Ph     -1.176         -1.232       -1.277
11    -CH2-S-C(CH3)2CH2C(CH3)3      -CH2Ph     -CH2Ph          Ph     -1.017         -0.994       -0.999
12    -CH2-S-c-(C6H11)              -CH2Ph     -CH2Ph          Ph     -0.875         -0.825       -0.794
13    -CH2-S-CH(CH3)CH2CH3          -CH2Ph     -CH2Ph          Ph     -0.763         -0.178        0.132
14    -CH2-S-(C(CH3)(CH2C(CH3)3)2   -CH2Ph     -CH2Ph          Ph     -0.699         -0.608       -0.113
15    -CH2-S-CH(CH3)2               -CH2Ph     -CH2Ph          Ph     -0.663         -0.721       -0.717
16    -CH2-S-CH2CH2Ph               -CH2Ph     -CH2Ph          Ph     -0.623         -0.690       -0.623
17    -CH2-S-(CH2)2CO2Et            -CH2Ph     -CH2Ph          Ph     -0.462         -0.572       -0.652
18    -CH2-S-C(CH3)3                -CH2Ph     -CH2Ph          Ph     -0.398         -0.398       -0.328
19    -CH2-S-(CH2)3Ph               -CH2Ph     -CH2Ph          Ph     -0.301         -0.636       -0.699
20    -CH2-S-(CH2)2CO2Et            -CH2Ph     -CH2CH(CH3)2    Ph     -0.903         -1.028       -1.183
21    -CH2-S-CH(CH3)2               -CH2Ph     -CH2CH(CH3)2    Ph     -0.892         -0.934       -0.927
22    -CH2-S-C(CH3)3                -CH2Ph     -CH2CH(CH3)2    Ph     -0.114         -0.343       -0.398
23    -CH2-S-(CH2)3Ph               -CH2Ph     -CH2CH(CH3)2    Ph      0.000         -0.295       -0.359
24    -CH2-S-C(CH3)3                -CH2Ph     -CH2CH(CH3)2    Morph  -1.776         -1.887       -2.073
25    -CH2-S-C(CH3)3                -CH2Ph     -CH2Ph          Morph  -1.737         -1.748       -1.853
26    -CH2-S-(CH2)2CO2Et            -CH2Ph     -CH2CH(CH3)2    Morph  -1.843         -1.851       -1.866
27    -CH2-S-(CH2)2CO2Et            -CH2Ph     -CH2Ph          Morph  -1.772         -1.921       -2.032
28    -CH2-S-(CH2)3Ph               -CH2Ph     -CH2CH(CH3)2    Morph  -2.116         -1.841       -1.764
29    -CH2-S-(CH2)3Ph               -CH2Ph     -CH2Ph          Morph  -2.335         -1.607       -1.462
30    -CH2-S-(CH2)2CO2Et            -CH2Ph     -CH2Ph          2-Pyr  -1.483         -1.298       -1.270
31    -CH2-S-(CH2)2CO2Et            -CH2Ph     -CH2Ph          3-Pyr  -1.356         -1.183       -1.108
32    -CH2-S-(CH2)3Ph               -CH2Ph     -CH2Ph          3-Pyr  -0.740         -1.005       -1.035
33    -CH2-S-(CH2)3Ph               -CH2Ph     -CH2-CH-(CH3)2  3-Pyr  -0.732         -0.764       -0.783
34    -CH2-S-(CH2)3Ph               -(CH2)2Ph  -CH2Ph          3-Pyr   0.046         -0.246       -0.288
35    -CH2-S-(CH2)3Ph               -(CH2)2Ph  -CH2-CH-(CH3)2  3-Pyr  -0.041          0.025        0.003
36    -CH2-S-C(CH3)3                -CH2Ph     -CH2-CH-(CH3)2  2-Pyr  -1.140         -1.080       -1.019
37    -CH2-S-C(CH3)3                -CH2Ph     -CH2-CH-(CH3)2  4-Pyr  -1.149         -1.109       -1.065
38    -CH2-S-C(CH3)3                -CH2Ph     -CH2Ph          3-Pyr  -0.663         -0.838       -0.863
39    -CH2-S-C(CH3)3                -CH2Ph     -CH2-CH-(CH3)2  3-Pyr  -0.681         -0.822       -0.840
40    -CH2-S-C(CH3)3                -(CH2)2Ph  -CH2Ph          3-Pyr  -0.146         -0.063        0.068
41    -CH2-S-C(CH3)3                -(CH2)2Ph  -CH2-CH-(CH3)2  3-Pyr  -0.301         -0.079        0.037
42    -CONHCH3                      -H         -CH2Ph          Ph     -1.140         -1.553       -1.798
43    -COOH                         -H         -CH2Ph          Ph      0.081         -0.073       -0.166
44    -CONHCH2Ph                    -H         -CH2Ph          Ph     -0.531         -0.524       -0.549
45a   -COOCH2CH3                    -H         -CH2Ph          Ph     -3.043         -2.760       -2.127
46    -CONHCH2Ph                    -CH(CH3)2  -CH2Ph          Ph      0.057         -0.166       -0.337

Morph, Morpholinyl; Pyr, Pyridinyl.
a Sketch for compound 45.
b Compounds 1 – 41 were taken from [6]; compounds 42 – 46 were taken from [7].
c Calculated as the average over training sets using a 50-member ensemble.
d Calculated as the average over test sets using a 50-member ensemble.


In this sense, inside the GA framework, networks were trained with three, four, five, and six inputs. Afterwards, the optimum subset of autocorrelation vectors was used for unsupervised training of competitive neurons in order to build a SOM of the inhibition constants of the KCIs, which helps elucidate the structural features and autocorrelation vector distributions in the data set that are relevant for cruzain inhibition.

3 Results and Discussion

3.1 BRGNN Simulations

The implemented BRGNN algorithm searches for the best-fitted BRANNs, in such a way that from one generation to another the algorithm tries to minimize the MSE of the networks (fitness function). By employing this approach instead of a more complicated and time-consuming cross-validation-based fitness function, we gain in CPU time and in simplicity of the routine. Furthermore, we can devote our relatively small data set completely to training the networks. However, the use of the MSE fitness function could lead to undesirably well-fitted but poorly generalized networks. We avoided such predictors in the BRGNN approach in two ways: (1) by keeping network architectures as simple as possible (only two hidden nodes) inside the GA framework, and (2) by implementing Bayesian regularization in the network training function (Section 2.2.1). The nonlinear subspaces in the data set were explored by varying the number of network inputs from three to six. The algorithm yielded a small population of well-fitted models. Afterwards, those models were tested in cross-validation experiments, and the model with the best cross-validation statistics was selected as the optimum.

Concerning the possibility of chance correlations, and following the method used by So and Karplus in [10], we performed a randomization test. Randomized values were assigned to the dependent variable [-log(Ki)], and networks were trained using this randomized target and the real set of independent variables (the optimum autocorrelation vectors). On repeating this process 500 times, no correlation was found between the R² values for training and cross-validation, similar to the results of So and Karplus [10].
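The randomization (y-scrambling) test can be sketched like this (illustrative Python; fit_and_score is a hypothetical placeholder that returns the training R² and LOO cross-validated R² of a model built on the supplied response):

```python
import numpy as np

def y_randomization(X, y, fit_and_score, n_trials=500, seed=0):
    """Retrain the model n_trials times on a randomly permuted response and
    collect (R2_train, R2_cv); with real descriptors and scrambled targets,
    no meaningful correlation should survive if the original model is not a
    chance correlation."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_trials):
        y_perm = rng.permutation(y)          # scrambled -log(Ki) values
        results.append(fit_and_score(X, y_perm))
    return np.array(results)                 # shape (n_trials, 2)
```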

Table 3 shows statistics and variables for the optimum BRGNN with five inputs and a varying number of hidden nodes. The variables in the nonlinear models represent: ATS3v, the Broto-Moreau autocorrelation of the topological structure at lag 3 weighted by van der Waals volumes; ATS5v, the Broto-Moreau autocorrelation at lag 5 weighted by van der Waals volumes; MATS7e, the Moran autocorrelation at lag 7 weighted by Sanderson electronegativities; MATS1p, the Moran autocorrelation at lag 1 weighted by polarizabilities; and GATS6v, the Geary autocorrelation at lag 6 weighted by van der Waals volumes. Inspection of Table 3 shows that Bayesian regularization yielded quite stable and reliable networks. The behavior of these networks was asymptotic with respect to the number of hidden nodes, with a maximum number of optimum parameters equal to 25. However, considering the cross-validation statistics among these neural networks, the optimum predictor was BRGNN 2, with two hidden nodes and only 13 optimum parameters, having the highest values of the squared correlation coefficients for data fitting (R²) and leave-one-out (LOO) cross-validation (R²cv), about 0.87 and 0.75, respectively.

The correlation matrix of the optimum subset of five autocorrelation vectors is shown in Table 4. As can be observed, there is no significant intercorrelation among the descriptors. This reflects that different topological information is brought to the model by each autocorrelation vector, resembling a topological pattern crucial for the inhibition of cruzain.


Table 3. Statistics of the optimum BRGNNs with the optimum descriptor subset (ATS3v, ATS5v, MATS7e, MATS1p, GATS6v) but a varying number of neurons in the hidden layer, for the cruzain inhibition constants of the KCIs. The optimum neural network predictor is model 2.

Model  Hidd. nod.  Num. par.  Opt. par.  R2     S      R2cv   Scv
1      1           8          7          0.636  0.422  0.528  0.482
2      2           15         13         0.874  0.249  0.749  0.353
3      3           22         17         0.893  0.229  0.678  0.406
4      4           29         21         0.918  0.201  0.595  0.488
5      5           36         25         0.940  0.173  0.586  0.503
6      6           43         25         0.938  0.175  0.555  0.523

Hidd. nod. is the number of hidden nodes; Num. par. is the number of neural network parameters; Opt. par. is the optimum number of neural network parameters yielded by the Bayesian regularization; R2 and R2cv are the squared correlation coefficients of the data fitting and LOO cross-validation processes, respectively; S and Scv are the standard deviations of the data fitting and LOO cross-validation processes, respectively.

Table 4. Correlation matrix of the inputs of the optimum predictor BRGNN 2.

          ATS3v  ATS5v  MATS7e  MATS1p  GATS6v
ATS3v     1      0.019  0.002   0.392   0.145
ATS5v            1      0.143   0.027   0.072
MATS7e                  1       0.184   0.009
MATS1p                          1       0.224
GATS6v                                  1



3.2 Data-Diverse Ensembles of BRGNNs

In order to build a robust model, we used ensembles of BRGNNs instead of a single network to calculate the Ki values of the KCIs. Baumann [41] recently demonstrated that ensemble averaging significantly improves prediction accuracy by averaging the predictions of several models obtained in parallel with bootstrapped training sets, and that it provides a more realistic picture of the predictive capacity of any regression model.

Here we used a perturbation technique called subagging, although the results are not expected to differ from those of traditional bagging [41]. A bootstrapped training set is generated and the repetitions in the bootstrap sample are then removed (i.e., objects that were drawn twice, thrice, etc. are kept only once). The resulting set constitutes the training set, while the remaining objects that are not part of the training set represent the test set (the set difference between all objects and the training set). Note that removal of the repetitions after the bootstrap sampling is the only difference between subagging and bagging [41]. Data-diverse ensembles, recently applied by us [15], consist of training several BRGNNs with different randomly partitioned training sets of 34 KCIs (75% of the data) and predicting the activity of the remaining 12 inhibitors (25% of the data) in the test sets.

Outputs of the trained networks were combined to form one unified prediction. As a result, the generalization ability of the neural network system is significantly improved, since changing the elements that constitute the test and training sets is a way of introducing diversity into the ensemble [35]. Table 2 reports two calculated Ki values for each KCI: one averaged over the training sets and another averaged over the test sets. The optimum number of members in the ensemble predictor was selected by studying the behavior of the RMSE of the training and test sets versus the number of networks in the ensemble.

Usually the predictive power of a QSAR model is assessed by performing internal validation (calculating R²cv in a cross-validation process) and/or external validation (calculating R²test for an external test-set fit). The use of internal validation has been subjected to serious criticism, and the most widely accepted procedure is external validation [42]. However, this approach has the drawback that the data must be split into training and test sets, so the apparent performance of the model can depend on the particular partition made. This problem becomes critical in nonlinear modeling, where high-dimensional problems are treated and error surfaces are rather complex [41]. Reflecting this, Figure 3 shows plots of RMSE values for NNEs with the number of members varying from 2 to 100. As can be observed, the RMSE distribution for ensembles having a low number of members is extremely broad, with values of about 0.10 – 0.50 and 0.20 – 0.70 for the training and test sets, respectively. The mean values of this statistic decrease and remain stable for ensembles having 50 or more members, with values of about 0.22 – 0.23 and 0.35 – 0.40 for the training and test sets, respectively. This result shows that, even when using Bayesian-regularized networks, robust models that are invariant to data splitting can only be derived by combining an adequate number of data-diverse independent predictors in an ensemble architecture [41]. According to these results, we selected the ensemble having 50 members as a robust predictor for the inhibition constants of the KCIs.

Figure 4 depicts plots of calculated versus experimental -log(Ki) values for each inhibitor, calculated as averages over training and test sets according to the ensemble predictor. The ensemble accuracy for data fitting was about 90 and 75% for inhibitors in the training and test sets, respectively. The BRGNN approach fits the logarithmic inhibition constants well in a nonlinear way, with the structures mathematically encoded by a combination of 2D topological information and atomic properties. The inhibitory profile of the KCIs, which the optimum five vectors encode, was successfully learned by the ensemble of BRGNNs during supervised training.

3.3 Comparison to Linear Regression Modeling by GA-Based Partial Least Squares (GA-PLS)

In order to assess the merit of the BRGNN approach for modeling the present data set, our optimum model BRGNN 2 was compared with linear regression models developed using PLS, a powerful and widespread linear approach. Taking into account the high dimensionality of our data set, a feature selection-based PLS was used in order to obtain valid results.


Figure 3. Plots of RMSE of the training (*) and test (*) sets for the -log(Ki) average values of 20 ensembles vs. the number of neural networks in each ensemble.


A robust GA-PLS routine has been successfully applied to the analysis of spectral data by Leardi [43]; details of the implemented algorithm can be found elsewhere [43]. This procedure uses a GA for descriptor and latent variable selection inside the PLS framework. The approach has been shown to be very efficient when handling high-dimensional data and to yield models with predictive power that is very often higher, and never lower, than that of a model using the whole data set [43].

PLS regression models for the inhibition constants of the KCIs under study were obtained by using the PLS-GA Toolbox for MATLAB developed by Leardi [43], in which the above-mentioned algorithm is implemented.

GA-PLS was run five times in order to test the stability of the best model obtained. Figure 5 depicts the behavior of the explained cross-validation variance versus the number of descriptors included for calculating the latent variables. It is noteworthy that the best linear model underperformed the nonlinear BRGNN approach even though it includes contributions of 13 autocorrelation vectors and 6 latent variables, for a maximum of about 70% of the data variance described in the LOO cross-validation process. Among the 13 autocorrelation vectors, five descriptors were weighted by polarizability, five by Sanderson electronegativities, two by atomic masses, and the other two by van der Waals volumes. The value of applying the GA-PLS algorithm to our QSAR problem is underlined by the fact that the model including the whole descriptor set described only about 40% of the cross-validation variance (Figure 5).
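For reference, the cross-validated variance explained by a PLS model of a given complexity can be estimated along the following lines (Python with scikit-learn, an illustration of the kind of calculation behind Figure 5, not Leardi's GA-PLS toolbox):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut

def q2_pls(X, y, n_components):
    """Leave-one-out Q2 (explained cross-validation variance) of a PLS model."""
    press, loo = 0.0, LeaveOneOut()
    for train, test in loo.split(X):
        pls = PLSRegression(n_components=n_components)
        pls.fit(X[train], y[train])
        press += float((pls.predict(X[test]).ravel()[0] - y[test][0]) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)
```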

The BRGNN technique clearly outperforms the linear approach: it describes 75% of the cross-validation data variance compared with 70% for the PLS model, and whereas the linear approach required linear contributions from 13 autocorrelation vectors, model BRGNN 2 correlated the logarithmic inhibition constant nonlinearly with only five descriptors. In addition, the high complexity of the best GA-PLS model also makes our BRGNN model preferable for investigating the particular effects of descriptors and/or the structural keys encoded in the best subset of autocorrelation vectors.

3.4 Interpretation of the Models

In order to gain deeper insight into the relative effect of each autocorrelation vector in model BRGNN 2, a recently reported weight-based input ranking scheme was applied. The black-box nature of three-layer ANNs was "deciphered" in a recent report by Guha et al. [44]. Their method allows one to understand how an input descriptor is correlated with the output predicted by the network and consists of two parts. First, the nonlinear transform for a given neuron is linearized. Then, the magnitude by which a given neuron affects the downstream output is determined. Next, a ranking scheme for the neurons in the hidden layer is developed, based on the Square Contribution Values (SCVs) of each hidden neuron (see [44] for details). This method of ANN model interpretation is similar to the PLS interpretation method for linear models described by Stanton [45].

The results of the ANN deciphering study are presented in Table 5. The effective weight matrix for the optimum model BRGNN 2 shows that the second hidden neuron makes the major contribution to the model, with an SCV value about four-fold higher than that of the other hidden neuron.
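A simplified version of this analysis can be computed directly from the network weights (a sketch of the kind of calculation reported in Table 5; the exact procedure of Guha et al. [44] additionally linearizes the transfer functions):

```python
import numpy as np

def effective_weights_and_scv(W1, w2):
    """W1: (n_hidden, n_inputs) input-to-hidden weights,
    w2: (n_hidden,) hidden-to-output weights.
    The effective weight of input i through hidden neuron j is taken here as
    w2[j] * W1[j, i]; the Squared Contribution Value (SCV) of a hidden neuron
    is its share of the total squared effective weight."""
    eff = w2[:, None] * W1                   # (n_hidden, n_inputs)
    scv = (eff ** 2).sum(axis=1)
    return eff, scv / scv.sum()
```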


Figure 4. Plots of average calculated vs. experimental -log(Ki) for KCIs in the training (*) and test (*) sets according to the 50-member ensemble of the optimum network BRGNN 2. The dotted line is the ideal fit, with intercept zero and slope one.

Figure 5. Plot of the explained cross-validation variance vs. the number of descriptors in the PLS analysis. The maximum corresponds to 70% for 13 autocorrelation vectors and 6 latent variables.


neuron. Interestingly, on this neuron the atomic van derWaals-weighted autocorrelation vectors: ATS3v, ATS5v,and GATS6v have the highest impacts equal to 2.957,�1.366, and �2.835, respectively, while Sanderson electro-negativity- and polarizability-weighted descriptors haveimpacts <j1 j . From this analysis can be derived thatatomic van der Waals distributions on substructural frag-ments of size 3, 5, and 6 are the most important featuresruling the modeling of KCIs inhibition constant accordingto model BRGNN 2.((Table 5))Our model reflects that hydrophobic interac-

tions among KCIs and cruzain active site are the main fac-tor in the inhibitory process. This fact agrees well with cru-zain substrate/inhibitor binding mechanism. Cruzain is pe-culiar in that this enzyme, as well as cathepsin-B-like pro-teases and related cysteine proteases, has evolved a dualspecificity that allows tight binding and catalysis on bothbasic and aromatic residues [46]. In cathepsin-B-like pro-teases, binding of the substrate or inhibitor P2 residue tothe enzyme-shallow S2 specificity subsite, lined by hydro-phobic residues and faced by a solvent-accessible Glu resi-due (Glu205 in cruzain), is the major determinant for sub-strate and inhibitor selection. In contrast, the chemical na-ture of the substrate or inhibitor P1 and P3 residues is lessrelevant but the inhibitory potency of the inhibitors can bemodulated by modifying such residues. Considering thisand taking into account that KCIs inhibitors under studyhave hydrophobic but not basic substituents at positionR3, representing hydrophobic P2 residues (Table 2), it isexpectable that hydrophobicity-related property, atomicvan der Waals volume, appears as the most relevant in thenonlinear model developed.Finally, we aimed to settle some similarities among

Finally, we explored similarities among the KCIs by building a SOM using the optimum subset of autocorrelation vectors. Figure 6 depicts the 8 × 8 SOM of the −log(Ki) values. Thirty-five of the 64 neurons were occupied, corresponding to about 55% occupancy of the map. As can be observed, inhibitors within a similar activity range were located at neighboring neurons. The most active inhibitors were distributed in the upper zone of the top-left/bottom-right diagonal of the map (Zone 1).

In contrast, the less active inhibitors were placed in the lower zone of the same top-left/bottom-right diagonal of the map (Zone 2). By analyzing the map, some structural similarities among the compounds can be addressed by taking into account their allocation to these two regions of inhibitory activity.
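As a rough illustration of the mapping step, the Python sketch below trains a small Kohonen SOM [38] on a matrix of normalized descriptor values and returns the winning neuron of each compound; it is a generic, assumption-laden rendering of the standard SOM update rule, not the implementation used in this work, and all names and parameter values are hypothetical.

import numpy as np

def train_som(X, grid=(8, 8), epochs=200, lr0=0.5, sigma0=3.0, seed=0):
    # X: (n_compounds, n_descriptors) normalized descriptor matrix
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.random((rows, cols, X.shape[1]))             # codebook vectors
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1)
    for t in range(epochs):
        lr = lr0 * np.exp(-t / epochs)                   # decaying learning rate
        sigma = sigma0 * np.exp(-t / epochs)             # shrinking neighborhood
        for x in X[rng.permutation(len(X))]:
            d = np.linalg.norm(W - x, axis=-1)           # distance to each neuron
            winner = np.unravel_index(np.argmin(d), d.shape)
            # Gaussian neighborhood centred on the winning neuron
            h = np.exp(-np.sum((coords - np.array(winner)) ** 2, axis=-1)
                       / (2.0 * sigma ** 2))
            W += lr * h[..., None] * (x - W)             # pull neurons toward x
    return W

def map_compounds(W, X):
    # winning neuron (row, col) of each compound in the trained map
    d = np.linalg.norm(W[None, ...] - X[:, None, None, :], axis=-1)
    flat = d.reshape(len(X), -1).argmin(axis=1)
    return [np.unravel_index(i, W.shape[:2]) for i in flat]

Coloring each occupied neuron by the mean −log(Ki) of the compounds it receives reproduces the kind of activity landscape shown in Figure 6.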

The majority of the highly active KCIs (10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 44, 46), those having highly hydrophobic P1' and P3 residues, were placed in Zone 1. Surprisingly, compound 43, which bears a carboxylic group at position R1 and a hydrogen at position R2, i.e., an inhibitor with a Glycine P1 residue and lacking a P1' residue but having a high inhibition constant, was placed in this active neighborhood.


Table 5. Effective weight matrix of the optimum model BRGNN 2 for the inhibition constant Ki of the studied KCIs.[a] The most relevant descriptors appear in bold.

Network inputs    Hidden neuron 2    Hidden neuron 1
ATS3v                  2.957             −1.564
ATS5v                 −1.366             −0.315
MATS7e                −0.248              0.652
MATS1p                 0.111              0.362
GATS6v                −2.835              0.946
SCV                    0.796              0.204

[a] The columns are ordered by the SCVs of the hidden neurons, shown in the last row.

Figure 6. Kohonen SOMs of −log(Ki) and of the normalized values of the autocorrelation vectors weighted by van der Waals volumes (ATS3v, ATS5v, and GATS6v). The logarithmic inhibition constant legend is placed at the right-hand side of the map. The least active inhibitor, compound 45, is highlighted in the map.


This fact could be related to the stabilization of this inhibitor by electrostatic interactions between the carboxylic group and polar residues surrounding the cruzain active site. In this sense, Choe et al. [7] reported that such a carbonyl group interacts with residues Glutamine 19 and Histidine 159 in the neighborhood of the cruzain active site. This experimental observation also supports the occurrence in our optimum model of autocorrelation vectors weighted by other atomic properties, such as Sanderson electronegativities and polarizabilities. In turn, the less active compounds, having less hydrophobic P1' and P3 residues, were placed in Zone 2. Interestingly, the least active inhibitor (compound 45) was well located in the less active zone of the map. This compound was predicted by the ensemble with the highest RMSE = 0.916, but correctly as the least active compound. It is noteworthy that, although this compound corresponds to a relatively different chemotype (Table 2) bearing a vinyl moiety, both the regression model and the qualitative approach (SOM) were able to adequately "classify" it.

In addition, Figure 6 depicts the SOMs of the normalized values of the three most relevant descriptors weighted by atomic van der Waals volume (ATS3v, ATS5v, and GATS6v). As can be observed, interpretation of the relation between the autocorrelation vectors and the logarithmic inhibition constants is not straightforward. The distributions of these variables in the maps do not follow an easy-to-elucidate pattern, but it can be inferred that the most active neurons tend to have intermediate values of these variables. This reflects the notion that optimum inhibitory potency corresponds to intermediate values of the descriptors, according to the nonlinear function encompassed by the ensemble predictor.

Model interpretation is also limited by the nature of the encoding employed. The 2D autocorrelation descriptors represent the topological structure of the compounds, but they are more complex in nature than the classical topological descriptors. Their computation involves summation of autocorrelation functions at different structural lags, yielding autocorrelation vectors that correspond to the lengths of substructural fragments. Bearing this in mind, the interpretation of 2D autocorrelation descriptors is not easy.
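To make the summation concrete, the following sketch computes a Broto-Moreau-type autocorrelation sum at a given topological lag from a topological distance matrix and an atomic property vector (e.g. van der Waals volumes); it is a simplified reading of the formalism summarized in the symbol list, and descriptor packages such as Dragon may apply additional scaling or transformations.

import numpy as np

def broto_moreau_ats(dist, p, lag):
    # dist: (n_atoms, n_atoms) topological distance matrix of the
    #       hydrogen-depleted molecular graph
    # p:    (n_atoms,) atomic property vector, e.g. van der Waals volumes
    # lag:  topological lag l (number of bonds separating an atom pair)
    mask = (dist == lag).astype(float)      # plays the role of d(l, d_ij)
    # sum p_i * p_j over atom pairs separated by exactly `lag` bonds;
    # dividing by two counts each unordered pair once
    return float(p @ mask @ p) / 2.0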

However, an easy-to-interpret version of the autocorrelation formalism has been successfully applied to 3D QSAR. Specifically, the autocorrelation of potential values on a CoMFA-like grid around the molecules has been reported by Cruciani and co-workers [47]. They proposed a class of molecular descriptors named Grid-Independent Descriptors (GRINDs), which are derived in such a way as to be highly relevant for describing the biological properties of compounds while being alignment-independent, chemically interpretable, and easy to compute. GRIND descriptors can be used to obtain graphical diagrams called "correlograms" and to perform different chemometric analyses.

Another major advantage, in addition to their alignment-independent nature, is that the original descriptors (molecular interaction fields) can be regenerated from the autocorrelation transform, so that the results of the analysis can be represented graphically, together with the original molecular structures, in 3D plots [47]. Highly predictive and interpretable models have been obtained using this autocorrelation-based 3D QSAR method [48].

In our work, the pool of 2D autocorrelation descriptors essentially defines a wide 2D descriptor space. To broaden its applicability, physicochemical properties (atomic masses, atomic van der Waals volumes, atomic Sanderson electronegativities, and atomic polarizabilities) were introduced as weighting components. As a result, these descriptors address the topology of the structure, or parts thereof, in association with a specific physicochemical property. In practice, we found that a small number of structural keys yields better performance than the complete set. In fact, the BRGNN method selected an optimum descriptor combination, which includes the van der Waals volume-weighted vectors, as the most relevant key features. This result illustrates that a certain distribution of this property is required to typify the inhibitory potency of the studied KCIs.
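For completeness, a toy sketch of genetic-algorithm descriptor selection is given below; it only illustrates the general subset-search idea, not the BRGNN implementation of this work, and the fitness function (for instance a cross-validated RMSE of a regressor trained on the candidate subset) is a user-supplied, hypothetical callable.

import numpy as np

def ga_select_descriptors(X, y, fitness, n_keep=5, pop_size=30, gens=50, seed=0):
    # X: (n_compounds, n_descriptors) descriptor pool; y: activities
    # Each individual is a set of n_keep column indices of X; `fitness`
    # returns an error to be minimized for a candidate subset.
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]
    population = [rng.choice(n_desc, size=n_keep, replace=False)
                  for _ in range(pop_size)]
    for _ in range(gens):
        scores = [fitness(X[:, ind], y) for ind in population]
        order = np.argsort(scores)                       # lower error is better
        survivors = [population[i] for i in order[: pop_size // 2]]
        offspring = []
        while len(offspring) < pop_size - len(survivors):
            a, b = rng.choice(len(survivors), size=2, replace=False)
            gene_pool = np.union1d(survivors[a], survivors[b])   # crossover
            if rng.random() < 0.2:                       # mutation: offer a new gene
                gene_pool = np.union1d(gene_pool, rng.integers(n_desc))
            offspring.append(rng.choice(gene_pool, size=n_keep, replace=False))
        population = survivors + offspring
    scores = [fitness(X[:, ind], y) for ind in population]
    return np.sort(population[int(np.argmin(scores))])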

However, our model should be mainly of interest as a predictive tool rather than as a design approach or an explanation of the structure/property trends encoded within it. The optimum BRGNN predictor should be useful for predictive purposes, as an "in silico" filter, to screen for new potent KCIs.
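A minimal sketch of such a filter, assuming a list of trained ensemble members that each expose a predict method returning −log(Ki) estimates and a purely hypothetical potency cutoff, could look as follows.

import numpy as np

def ensemble_screen(models, X, cutoff=7.0):
    # models: trained regressors with a predict(X) method (assumed interface)
    # X:      (n_candidates, n_descriptors) autocorrelation vectors
    # cutoff: hypothetical -log(Ki) threshold defining a "potent" hit
    preds = np.mean([m.predict(X) for m in models], axis=0)   # ensemble average
    hits = np.where(preds >= cutoff)[0]                       # indices of hits
    return hits, preds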

4 Concluding Remarks

Biological interactions are complex by nature. In this sense, the BRGNN approach was successfully applied to modeling the inhibition constants of KCIs. The 2D spatial autocorrelation vectors were shown to encode relevant nonlinear information regarding cruzain inhibition. An ensemble of 50 BRGNNs explained about 90 and 75% of the ensemble training and test set variances, respectively. From an input ranking analysis it was derived that the autocorrelations of atomic van der Waals volumes at lags 3, 5, and 6 were the main features governing the inhibitory specificity of the studied KCIs according to the optimum regression model, suggesting that the autocorrelation over substructures of these sizes strongly influences the inhibition constants of the studied KCIs.

Additionally, the SOM built with the optimum descriptor subset of the best regression model also pointed to the hydrophobicity of the substituents, specifically the P1' and P3 residues of the inhibitor structure, as the most differentiating property. The mapping of the most relevant descriptors onto the SOM showed that the most active inhibitors correspond to intermediate values of the van der Waals volume autocorrelations, suggesting a certain nonlinear behavior of the activity with respect to the highest-ranked network inputs.

The present work demonstrates the successful application of 2D spatial autocorrelation vectors, in combination with the BRGNN approach, to the modeling of cruzain inhibition.

By using this technique it is possible to identify key features related to the modeled property and, at the same time, to build robust regression models.

Acknowledgements

The authors would like to thank Professor Riccardo Leardi for providing useful information and comments regarding the GA-based PLS algorithm and the PLS-GA toolbox. Last but not least, the anonymous referees are also acknowledged for their useful comments, which helped to improve the quality of the manuscript.

References

[1] Seventeenth Programme Report of the UNICEF/UNDP/World Bank/WHO Special Programme for Research & Training in Tropical Diseases, 2005, http://www.who.int/tdr/publications/publications/pdf/pr17/pr17.pdf
[2] J. C. Engel, P. S. Doyle, J. Palmer, I. Hsieh, D. F. Bainton, J. H. McKerrow, J. Cell Sci. 1998, 111, 597 – 606.
[3] J. C. Engel, P. S. Doyle, I. Hsieh, J. H. McKerrow, J. Exp. Med. 1998, 188, 725.
[4] W. R. Roush, F. V. Gonzalez, J. H. McKerrow, E. Hansell, Bioorg. Med. Chem. Lett. 1998, 8, 2809 – 2812.
[5] K. A. Scheidt, W. R. Roush, J. H. McKerrow, P. M. Selzer, E. Hansell, P. J. Rosenthal, Bioorg. Med. Chem. 1998, 6, 2477 – 2494.
[6] L. Huang, A. Lee, J. A. Ellman, J. Med. Chem. 2002, 45, 676 – 684.
[7] Y. Choe, L. S. Brinen, M. S. Price, J. C. Engel, M. Lange, C. Grisostomi, S. G. Weston, P. V. Pallai, H. Cheng, L. W. Hardy, D. S. Hartsough, M. McMakin, R. F. Tilton, C. M. Baldino, C. S. Craik, Bioorg. Med. Chem. 2005, 13, 2141 – 2156.
[8] M. Wagener, J. Sadowski, J. Gasteiger, J. Am. Chem. Soc. 1995, 117, 7769 – 7775.
[9] H. Bauknecht, A. Zell, H. Bayer, P. Levi, M. Wagener, J. Sadowski, J. Gasteiger, J. Chem. Inf. Comput. Sci. 1996, 36, 1205 – 1213.
[10] S. So, M. Karplus, J. Med. Chem. 1996, 39, 1521 – 1530.
[11] M. Fernandez, J. Caballero, A. H. Morales, E. A. Castro, M. P. Gonzalez, Bioorg. Med. Chem. 2005, 13, 3269 – 3277.
[12] M. Fernandez, A. Tundidor-Camba, J. Caballero, Mol. Simul. 2005, 31, 575 – 584.
[13] J. Caballero, M. Fernandez, J. Mol. Model. 2006, 12, 168 – 181.
[14] J. Caballero, M. Garriga, M. Fernandez, J. Comput.-Aided Mol. Des. 2005, 19, 771 – 789.
[15] M. Fernandez, A. Tundidor-Camba, J. Caballero, J. Chem. Inf. Comput. Sci. 2005, 45, 1884 – 1895.
[16] M. P. Gonzalez, J. Caballero, A. Tundidor-Camba, A. M. Helguera, M. Fernandez, Bioorg. Med. Chem. 2006, 14, 200 – 213.
[17] M. Fernandez, J. Caballero, Bioorg. Med. Chem. 2006, 14, 280 – 294.
[18] P. A. P. Moran, Biometrika 1950, 37, 17 – 23.
[19] R. F. Geary, The Incorporated Statistician 1954, 5, 115 – 145.
[20] G. Moreau, P. Broto, Nouv. J. Chim. 1980, 4, 359 – 360.
[21] G. Moreau, P. Broto, Nouv. J. Chim. 1980, 4, 757 – 764.
[22] a) F. R. Burden, D. A. Winkler, J. Med. Chem. 1999, 42, 3183 – 3187; b) D. A. Winkler, F. R. Burden, Biosilico 2004, 2, 104 – 111.
[23] Matlab 7.0 software, available from The Mathworks Inc., Natick, MA; http://www.mathworks.com.
[24] J. Zupan, J. Gasteiger, Anal. Chim. Acta 1991, 248, 1 – 30.
[25] D. J. Livingstone, D. T. Manallack, I. V. Tetko, J. Comput.-Aided Mol. Des. 1997, 11, 135 – 142.
[26] D. J. C. Mackay, Neural Comput. 1992, 4, 415 – 447; Neural Comput. 1992, 4, 448 – 472.
[27] J. Lampinen, A. Vehtari, Neural Networks 2001, 14, 7 – 24.
[28] F. D. Foresee, M. T. Hagan, Proceedings of the 1997 International Joint Conference on Neural Networks, IEEE, Houston, 1997, pp. 1930 – 1935.
[29] H. Holland, Adaption in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor, MI, 1975.
[30] H. M. Cartwright, Applications of Artificial Intelligence in Chemistry, Oxford University Press, Oxford, 1993.
[31] B. Hemmateenejad, M. A. Safarpour, R. Miri, N. Nesari, J. Chem. Inf. Model. 2005, 45, 190 – 199.
[32] I. Tetko, D. J. Livingstone, A. I. Luik, J. Chem. Inf. Comput. Sci. 1995, 35, 826 – 833.
[33] L. K. Hansen, P. Salamon, IEEE Trans. Pattern Anal. Machine Intell. 1990, 12, 993 – 1001.
[34] A. Krogh, J. Vedelsby, Neural network ensembles, cross-validation and active learning, in: G. Tesauro, D. Touretzky, T. Lean (Eds.), Advances in Neural Information Processing Systems 7, MIT Press, 1995, pp. 231 – 238.
[35] D. K. Agrafiotis, W. Cedeno, V. S. Lobanov, J. Chem. Inf. Comput. Sci. 2002, 42, 903 – 911.
[36] A. Yan, J. Gasteiger, M. Krug, S. Anzali, J. Comput.-Aided Mol. Des. 2004, 18, 75 – 87.
[37] J. A. de-Sousa, J. Gasteiger, J. Chem. Inf. Comput. Sci. 2001, 41, 369 – 375.
[38] T. Kohonen, Biol. Cybern. 1982, 43, 59 – 69.
[39] S. Anzali, J. Gasteiger, U. Holzgrabe, J. Polanski, J. Sadowski, A. Teckentrup, M. Wagener, Persp. Drug Discov. Design 1998, 9 – 11, 273 – 299.
[40] R. Todeschini, V. Consonni, M. Pavan, Dragon Software version 2.1, 2002.
[41] K. Baumann, QSAR Comb. Sci. 2005, 24, 1033 – 1046.
[42] A. Golbraikh, A. Tropsha, J. Mol. Graph. Model. 2002, 20, 269 – 276.
[43] a) R. Leardi, A. Lupianez, Chemometr. Intell. Lab. 1998, 41, 195 – 207; b) R. Leardi, J. Chemometr. 2000, 14, 643 – 655.
[44] R. Guha, D. T. Stanton, P. C. Jurs, J. Chem. Inf. Model. 2005, 45, 1109 – 1121.
[45] D. T. Stanton, J. Chem. Inf. Comput. Sci. 2003, 43, 1423 – 1433.
[46] F. Polticelli, G. Zaini, A. Bolli, G. Antonini, L. Gradoni, P. Ascenzi, Biochemistry 2005, 44, 2781 – 2789.
[47] a) M. Pastor, G. Cruciani, I. McLay, S. Pickett, S. Clementi, J. Med. Chem. 2000, 43, 3233 – 3243; b) G. Cruciani, P. Crivori, P.-A. Carrupt, B. Testa, J. Mol. Struc.-THEOCHEM 2000, 503, 17 – 30.
[48] a) P. Crivori, G. Cruciani, P.-A. Carrupt, B. Testa, J. Med. Chem. 2000, 43, 2204 – 2216; b) R. Budriesi, E. Carosati, A. Chiarini, B. Cosimelli, G. Cruciani, P. Ioan, D. Spinelli, R. Spisani, J. Med. Chem. 2005, 48, 2445.
