J. theor. Biol. (2000) 206, 27–45. doi:10.1006/jtbi.2000.2098, available online at http://www.idealibrary.com

Neural Network Methods for Identification and Optimization of Quantum Mechanical Features Needed for Bioactivity

BENJAMIN B. BRAUNHEIM* AND STEVEN D. SCHWARTZ*†‡

*The Department of Physiology and Biophysics, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, U.S.A. and †Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, U.S.A.

(Received on 3 August 1999, Accepted in revised form on 4 May 2000)

This paper presents a new approach to the discovery and design of bioactive compounds. The focus of this application will be on the analysis of enzymatic inhibitors. At present, the discovery of enzymatic inhibitors for therapeutic use is often accomplished through random searches. The first phase of discovery is a random search through a large pre-fabricated chemical library. Many molecules are tested with refined enzyme for signs of inhibition. Once a group of lead compounds has been discovered, the chemical intuition of biochemists is used to find structurally related compounds that are more effective. This step requires new molecules to be conceived and synthesized, and it is the most time-consuming and expensive step. The development of computational and theoretical methods for prediction of the molecular structure that would bind most tightly, prior to synthesis and testing, would facilitate the design of novel inhibitors. In the past, our work has focused on solving the problem of predicting the bioactivity of a molecule prior to synthesis. We used a neural network trained with the bioactivity of known compounds to predict the bioactivity of unknown compounds. In our current work, we use a separate neural network in conjunction with a trained neural network in an attempt to gain insight into how to modify existing compounds and increase their bioactivity.

© 2000 Academic Press

Introduction

The "rst attempts to develop computationalmethods to predict an inhibitor's potency priorto synthesis have been broadly termed quantitat-ive structure activity relationship (QSAR)studies. These techniques require the user to de-"ne a functional relationship between a molecu-lar property and molecular action. In the QSARapproach, or any approach where a person ischarged with adjusting a mathematical model,

‡Author to whom correspondence should be addressed.

0022–5193/00/170027+19 $35.00/0

the investigator must use variations in the structure of a molecule as the motivation for changing the value of coefficients in the model. For a chemical reaction as complex as an enzymatically mediated transformation of reactants to product, it is not possible to predict a priori all the effects a change to a substrate molecule will have on enzymatic action. For this reason we have developed neural network approaches to the prediction of bioactivity of an enzymatic inhibitor. Because chemical reactivity is controlled by quantum mechanical properties, we have developed a method to train (Braunheim


& Schwartz, 1999; Braunheim et al., 1999)neural networks with ab initio quantum chemicaldata. This method has proven itself to be highlyaccurate in the prediction of binding strengths ofdiverse inhibitors to a variety of enzymatic sys-tems. This paper describes the next step in sucha research program; once a method is foundwhich can recognize quantum features needed forbinding, we construct a new methodology whichoptimizes these quantum features to eventuallyproduce descriptions of substances that couldhave greater bioactivity.

Neural networks have been used previously for the task of simulating biological molecular recognition. Gasteiger et al. (1994) have used Kohonen self-organizing networks to preserve the maximum topological information of a molecule when mapping its three-dimensional surface onto a plane. Wagener et al. have used autocorrelation vectors to describe different molecules. In that work (Wagener et al., 1995), the molecular electrostatic potential at the molecular surface was collapsed onto 12 autocorrelation coefficients. Neural networks were used by Weinstein et al. to predict the mode of action of different chemotherapeutic agents (Weinstein et al., 1992). The effectiveness of the drugs on different malignant tissues served as descriptors, and the output target for the network was the mode of action of the drug (e.g. alkylating agent, topoisomerase I inhibitor, etc.). Tetko et al. (1994) used a similar autocorrelation vector approach. So & Richards (1992) used networks to learn and to predict biological activity from QSAR descriptions of molecular structure. Neural networks were used by Thompson et al. (1995) to predict the amino acid sequence that the HIV-1 protease would bind most tightly, and this information was used to design HIV protease inhibitors. The present work is a departure from all previous work because the quantum mechanical electrostatic potential at the van der Waals surface of a molecule is used as the physicochemical descriptor. The entire surface for each molecule, represented by a discrete collection of points, serves as the input to the neural network.

Neural networks are multi-dimensional nonlinear function approximators. We use neural networks as the decision-making algorithm because they require no assumptions about the function they are learning to approximate. This is important because we assume that the interactions between the inhibitor and the active site are determined by many aspects of the inhibitor, and it would be impossible for a person to predict them all a priori. One can imagine the Schrödinger equation as creating a complex, nonlinear relationship between a specific enzyme active site and a variety of enzymatic inhibitors. This nonlinear relation is what a neural network can discover, and so simulate biological recognition. The neural network learns to approximate a function that is defined by the input/output pairs. For our work, the input is a quantum mechanical description of a molecule and the output is the binding energy of that molecule with the enzyme.

After the neural network is "trained" with the quantum features of an appropriate set of molecules of known bioactivity, we assume this construction has "learned" the rules relating quantum descriptions to enzymatic recognition. This paper will describe the way in which a new neural network can be created which uses these rules to generate features that optimize bioactivity. The structure of the rest of this paper is as follows: first, a general description of backpropagation neural networks is given. This is then specialized to the problem of biorecognition and feature optimization, with special attention given to the derivation of appropriate rules for backpropagation of error. We then describe how quantum molecular features can be cast in a form to serve as input for a neural net. The next section of the paper describes our generalization of the neural network concept to one in which one network is trained to recognize binding features and a second coupled network optimizes these features. The next section of the paper contains application of the concepts to a specific multi-substrate enzyme, IU nucleoside hydrolase. The paper then briefly summarizes the work and concludes.

Neural Networks

A computational neural network is a computer algorithm which, during its training process, can learn features of input patterns and associate these with an output. Neural networks learn to


FIG. 1. Picture of a standard backpropagation neural network, with an input layer i, hidden layer j, and output layer k.


approximate the function defined by the input/output pairs. The function is never specified by the user. After the learning phase, a well-trained network should be able to predict an output for a pattern not in the training set. In the context of the present work, the neural net is trained with a set of molecules which can act as inhibitors for a given enzyme until the neural network can associate with every quantum mechanical description of the molecules in this set a free energy of binding (which is the output). Then the network is used to predict the free energy of binding for unknown molecules (see, for example, Braunheim et al., submitted).

Computational neural networks are composed of many simple units operating in parallel. These units and the aspects of their interaction are inspired by biological nervous systems. The network's function is largely determined by the interactions between units. Networks learn by adjusting the values of the connections between elements (Fausett, 1994). The neural network employed in this study is a feed-forward network with backpropagation of error that learns with momentum. The basic construction of a backpropagation neural network has three layers: an input layer, a hidden layer, and an output layer. The input layer is where the input data are transferred. The link between the layers of the network is one of multiplication by a weight matrix, where every entry in the input vector is multiplied by a weight and sent to every hidden layer neuron, so that the hidden layer weight matrix has the dimensions n × m, where n is the length of the input vector and m is the number of hidden layer neurons. A bias is added to the hidden and output layer neurons; the function of the bias is to add a translational degree of freedom to the nodes, allowing the transfer function to shift to the left or right of the abscissa depending on the sign.

Referring to the schematic in Fig. 1, the input layer is represented by the squares at the top of the diagram. The weights are represented by the lines connecting the layers: w_{ij} is the weight between the i-th neuron of the input layer and the j-th neuron of the hidden layer, and w_{jk} is the weight between the j-th neuron of the hidden layer and the k-th neuron of the output layer. In this diagram, the output layer has only one neuron because the target pattern is a single number, the binding energy. The hidden layer input from pattern number 1 for neuron j, h^I_j(1), is calculated as

    h^I_j(1) = b_j + \sum_{i=1}^{n} x^o_i(1) \times w_{ij},    (1)

where x^o_i(1) is the output from the i-th input neuron, w_{ij} is the element of the weight matrix connecting input neuron i with hidden layer neuron j, and b_j is the bias on hidden layer neuron j. This vector h^I_j is sent through a transfer function, f. This function is nonlinear and usually sigmoidal, taking any value and returning a number between −1 and 1 (Fausett, 1994). A typical example is

    f(h^I_j) = \frac{2}{1 + e^{-h^I_j}} - 1 \equiv h^o_j.    (2)

The hidden layer output, h^o_j, is then sent to the output layer. The output layer input o^I_k is calculated for the k-th output neuron as

    o^I_k = b_k + \sum_{j=1}^{m} h^o_j w_{jk},    (3)


where w_{jk} is the weight matrix element connecting hidden layer neuron j with output layer neuron k. The output layer output, o^o_k, is calculated with a transfer function similar to the one given above,

    g(o^I_k) = \frac{c}{1 + e^{-o^I_k}} - g \equiv o^o_k,    (4)

where c is the range of the binding energies of the molecules used in the study and g is the minimum of all the binding energies. The minimum and maximum values are decreased and increased by 10% to give the neural network the ability to predict numbers slightly larger and smaller than those in the training set (Fausett, 1994):

    min_new = min_old − abs(min_old × 0.1),    (5)

    max_new = max_old + abs(max_old × 0.1).    (6)

The calculation of an output concludes the feed-forward phase of training. The weights and biases are initialized with random numbers, so during the first iterations the output of the network will be random numbers. Backpropagation of error is used in conjunction with learning rules to increase the accuracy of predictions. The difference between o^o_k(1) and the target value for input pattern number 1, t_k(1), determines the sign of the corrections to the weights and biases. The size of the correction is determined by the first derivative of the transfer function. The changes to the weights and biases are proportional to a quantity δ_k:

    δ_k = (t_k − o^o_k) × g'(o^I_k),    (7)

where g' is the first derivative of eqn (4). The corrections to the weights and biases are calculated as

    Δw_{jk} = α δ_k h^o_j,    (8)

    Δb_k = α δ_k.    (9)

The corrections are moderated by α, the learning rate; this number ranges from zero to one, exclusive of the end points. α functions to prevent the network from training to be biased toward the last pattern of the iteration; the network's error should be minimized with respect to all the patterns in the training set. The same learning rule is applied to the hidden layer weight matrix and biases. Learning with momentum allows weight and bias corrections from previous iterations to influence the correction of the present iteration. This causes the network to minimize the error of all the patterns in the training set rather than focusing on minimizing the error of the pattern of a given iteration. The correction to the weights of the output layer at iteration number q is a function of the correction of the previous iteration, q−1, and κ, the momentum constant:

    Δw_{jk}(q) = α δ_k h^o_j + κ Δw_{jk}(q−1),    (10)

    Δb_k(q) = α δ_k + κ Δb_k(q−1).    (11)

The same procedure is applied to the hidden layer weights and biases. The correction terms are added to the weights and biases, concluding the backpropagation phase of the iteration. The network can train for hundreds to millions of iterations depending on the complexity of the function defined by the input/output pairs. This type of backpropagation is a generalization of the Widrow–Hoff learning rule applied to multiple-layer networks and nonlinear differentiable transfer functions (Rumelhart et al., 1986).
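In the same spirit as the sketch above, here is one output-layer update following eqns (7)–(11); the hidden-layer update is analogous. The default α and κ match the construction reported below; everything else is our own assumed shape and naming, not the authors' implementation.

```python
import numpy as np

def output_layer_update(t_k, o_out, o_in, h_out, dW_prev, db_prev,
                        c, alpha=0.1, kappa=0.9):
    """One momentum step for the output layer.
    t_k: target binding energy; o_out: network output; o_in: output-layer
    input; h_out: hidden-layer outputs; dW_prev, db_prev: corrections from
    the previous iteration; alpha: learning rate; kappa: momentum."""
    s = 1.0 / (1.0 + np.exp(-o_in))
    g_prime = c * s * (1.0 - s)                      # derivative of eqn (4)
    delta_k = (t_k - o_out) * g_prime                # eqn (7)
    dW = alpha * delta_k * h_out + kappa * dW_prev   # eqns (8) and (10)
    db = alpha * delta_k + kappa * db_prev           # eqns (9) and (11)
    return dW, db  # add to W_jk and b_k, then recurse to the hidden layer
```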

Training a neural network requires variation of four adjustable parameters: (1) the number of hidden layer neurons, (2) the learning rate, (3) the momentum constant and (4) the number of training iterations. The only way to tell that a network is well trained is to minimize the training set prediction error. This can be calculated by taking the difference between the target value for molecule i (the experimentally determined binding energy) and the number the neural network predicted for that pattern i, and summing the absolute value of this difference over all the molecules in the training set. As training progresses, the training set prediction error will decrease. Minimizing training set error is not without negative consequence; overtraining occurs when the network trains for too many iterations or has too many hidden layer neurons. The only way to tell that a neural network has not been overtrained is


to have it make a prediction for a pattern not in the training set; that is, to see if the network can generalize from the information contained in the input/output pairs of the training set and apply that information to a molecule it has not trained with. In accommodation of this fact, we employ one of the training set molecules as an adjuster molecule. This molecule is left out of the training set during training and used to check if the neural network was overtrained. The procedure is to train the neural network until the prediction set error has decreased into a plateau. We then end training and test the neural network with the adjuster molecule. If the neural network predicts the adjuster molecule's binding energy within 5%, that neural network's construction is saved; if the prediction is more than 5% off, a new selection of the four adjustable parameters is chosen. This procedure is repeated until a construction is found that allows the neural network to predict the adjuster molecule's binding energy within 5%. This is repeated for each of the 32 molecules used in this study. This procedure is called the "leave-one-out" method (Weiss & Kulikowski, 1991). The final neural network construction was chosen when it could predict all the adjuster molecules within 5% error. The final construction for this system is five hidden layer neurons, 10 000 training iterations, a learning rate of 0.1 and a momentum term of 0.9. There might exist other neural network constructions that can predict all the adjuster molecules' binding energies within 5%; the one we chose was simply found first. Training this network construction takes less than 10 min on an IBM R/S 6000 workstation.
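The parameter search just described can be summarized in pseudocode. Here train() and predict() are hypothetical stand-ins for the training and feed-forward routines, and molecules is a list of (description, binding energy) pairs; none of these names come from the paper.

```python
def acceptable(params, molecules, tol=0.05):
    """Leave-one-out check: reject a parameter choice (hidden neurons,
    iterations, learning rate, momentum) unless every held-out 'adjuster'
    molecule is predicted within 5% of its measured binding energy."""
    for adjuster in molecules:
        training_set = [m for m in molecules if m is not adjuster]
        net = train(training_set, **params)   # hypothetical trainer
        description, energy = adjuster
        if abs(predict(net, description) - energy) > tol * abs(energy):
            return False                      # try a new construction
    return True
```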

We now describe how quantum chemical data can be input to train a neural network. We create quantum descriptions of molecules in the following way: first, the molecular structures are energy minimized using semi-empirical methods. Molecules with many degrees of freedom are configured such that they all have their flexible regions in the same relative position. Then the wave function for the molecule is calculated with the program Gaussian 94 (1995). A variety of basis sets are used to ensure converged results. From the wave function, the electrostatic potential is calculated at all points around and within the molecule using the CUBE function. The electron density, the square of the wave function, is also calculated at all points around and within the molecule with the CUBE function. With these two pieces of information the electrostatic potential at the van der Waals surface can be generated. Such information sheds light on the kinds of interactions a given molecule can have with the active site (Horenstein & Schramm, 1993). Regions with electrostatic potentials close to zero are likely to be capable of van der Waals interactions, regions with a partial positive or negative charge can serve as hydrogen bond donor or acceptor sites, and regions with even greater positive or negative potentials may be involved in Coulombic interactions. The electrostatic potential also conveys information concerning the likelihood that a particular region can undergo electrophilic or nucleophilic attack (Evans et al., 1975). We choose the van der Waals surface, within which 95% of the electron density is found, to define the molecular geometry. One can closely approximate the van der Waals surface by finding all points around a molecule where the electron density is close to 0.002 ± δ electrons bohr⁻³, where δ is the acceptance tolerance. When δ is adjusted so that about 15 points per atom are accepted, this creates a fairly uniform molecular surface, as shown previously (Bagdassarian et al., 1996a, b). The information about a given molecular surface is thus described by a matrix with dimensions 4 × n, where n is the number of points for the molecule, and the row vector of length 4 contains the x, y, z coordinates of a given point and the electrostatic potential at that point.
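A sketch of this surface-selection step, under the assumption that the CUBE outputs have already been parsed into arrays of grid coordinates, densities and potentials (the parsing itself is omitted):

```python
import numpy as np

def vdw_surface(grid_xyz, density, esp, delta):
    """Keep grid points whose electron density lies within delta of
    0.002 electrons bohr^-3, i.e. an approximate van der Waals surface.
    grid_xyz: (N, 3); density, esp: length-N arrays from the CUBE files.
    Returns an (n, 4) matrix of x, y, z and electrostatic potential."""
    on_surface = np.abs(density - 0.002) <= delta
    return np.column_stack([grid_xyz[on_surface], esp[on_surface]])

# In practice delta would be tuned until roughly 15 points per atom
# survive, giving the fairly uniform surface described in the text.
```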

To preserve the geometric and electrostatic integrity of the training molecules, a collapse onto a lower-dimensional surface is avoided. The molecules are oriented using common atoms and rotation matrices. Three atomic positions that all the molecules share are chosen and named a, b, c. Atom a is translated to the origin, and this translation is applied to b and c and to all the surface points. The coordinate system is rotated such that b is on the positive x-axis. Then the coordinate system is rotated such that c is in the positive x, z plane. Inputs to a neural network must be in the form of a vector, not a matrix. To accomplish this transformation we map the electrostatic potential of the different molecular surfaces onto a common


surface: a set of random points uniformly distributed on a sphere with a larger radius than the largest molecule in the study. The radius of the sphere used in this study was 15 bohr. The sphere is larger than the molecules so all mappings are outward.

The nearest neighbor of each point on the sphere is found on the surface of the molecule. The electrostatic potential of this molecular point is then assigned the x, y, z coordinates of its nearest neighbor on the sphere. After this translation, the x, y, z coordinates of the surface points from all the molecules will be identical (the coordinates of the sphere). This being the case, the x, y, z information can be discarded, leaving only a vector of the electrostatic potentials. This mapping ensures that similar parts of the molecules occupy similar positions in the input vector. This electrostatic potential information is accompanied by geometric information. The input to the neural network is a vector of these mapped electrostatic potentials and the distances the points were mapped from the molecular surface to the sphere. The entries in the second half of the input vector are scalars giving the distance, in bohr, between each point on the sphere and its nearest-neighbor point on the molecule's surface. This portion of the input is designed to inform the neural network about the relative shapes and sizes of the molecular surfaces. Each input neuron is given information (electrostatic potential or geometry) about the nearest point on the surface of the inhibitor. In this design, each input neuron may be imagined to be at a fixed point on the sphere around the inhibitors, judging the electrostatic potential and geometry of each inhibitor in the same way the enzyme active site would.
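The orientation and mapping steps might look as follows; the paper gives no code, so the routines below are our own sketch, with a, b, c the three shared atomic positions:

```python
import numpy as np

def orient(points, a, b, c):
    """Translate atom a to the origin, rotate b onto the positive x-axis,
    then rotate c into the positive x-z plane. points is (n, 3)."""
    p, b, c = points - a, b - a, c - a
    e1 = b / np.linalg.norm(b)       # new x-axis, along b
    e3 = c - (c @ e1) * e1           # component of c orthogonal to e1
    e3 /= np.linalg.norm(e3)         # new z-axis, so c lies in the x-z plane
    e2 = np.cross(e3, e1)            # new y-axis completes the frame
    return p @ np.stack([e1, e2, e3], axis=1)

def map_to_sphere(sphere_xyz, surf_xyz, surf_esp):
    """For each reference-sphere point, find its nearest neighbour on the
    molecular surface; return the input vector of mapped electrostatic
    potentials followed by the mapping distances (in bohr)."""
    d = np.linalg.norm(sphere_xyz[:, None, :] - surf_xyz[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    dist = d[np.arange(len(sphere_xyz)), nearest]
    return np.concatenate([surf_esp[nearest], dist])
```

With the 400-point reference sphere used in this study, map_to_sphere would return the 800-long descriptor vector (400 potentials + 400 distances) described below.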

In the limit, with an infinite number of points, all mappings are normal to the inhibitor's surface, and the mapping distances will be as small as possible. To approach this limit, a ten-fold excess of points was selected to describe the molecules: the molecules' surfaces are described by 150 points per atom. The reference sphere that the points are mapped onto is described by a smaller number of points, 15 times the average number of atoms in the molecules of the study. As a result of mapping to the reference sphere, all molecules are described by this smaller number of points. The number of points on the reference sphere used in this study is 400. This means there were 400 electrostatic potential and 400 geometry descriptors used to describe each molecule.

Double Neural Network

In previous work, we have shown that neural networks can be used to predict how well molecules will function as inhibitors before experimental tests are done (Braunheim & Schwartz, in press; Braunheim et al., submitted). We now describe how neural networks can be used to design characteristics of inhibitors that are more potent than those in the training set. We have created a double neural network whose function is to optimize the characteristics of molecules needed for bioactivity. A standard neural network as shown in Fig. 1, with function determined by eqns (1)–(11), is used to learn the rules of binding to the enzyme. Once a neural network has been trained to recognize what features of inhibitors are necessary to bind to the enzyme, the weights and biases are fixed and not allowed to vary. This network is integrated into another neural network to form a double neural network. The goal of this construction is to use these learned binding rules to discern how to create a quantum object which binds more strongly than any yet presented to the neural network.

The trained and "xed network is called theinner network and the other part is called theouter network (Fig. 2). The double network has"ve layers but only the weights and biases of theouter network are allowed to vary during train-ing. That is, during the training of the doublenetwork, the outer network's weights and biasesare responsible for minimizing prediction seterror. The inputs to the double network are thesame as the ones used to train the inner network.The outer network's output layer is the inputlayer to the inner network, therefore the outputof the outer network is the input to the innernetwork. The inner network's output targets arethe same as those used before only they have beendecreased 10%. That is they are the binding ener-gies of slightly better inhibitors. To reduce theerror of the inner network the outer networkmust output altered descriptions of the inputmolecule, but altered in a way such that it de-scribes an inhibitor with a greater a$nity for the


FIG. 2. Picture of a double neural network comprising a coupled inner and outer neural network.


enzyme. The outer network becomes an inhibitor improver because the inner network's fixed weights and biases contain the rules for binding to the enzyme. In order to compensate for the altered binding energies, the outer network must output altered versions of the input molecules that would bind to the enzyme with greater affinity.

The inputs to the double neural network contained both electrostatic potential and geometric information. During the training of the double neural network, both of these descriptors were allowed to vary. The range of the output functions of the output layer of the outer network had to be modified in a similar way as eqn (4),

    g_i(x) = \frac{c_i}{1 + e^{-x}} - g_i,    (12)

where c_i is the range of the numbers at position i in the input vectors presented to the outer network and g_i is the minimum number at position i in the input vectors. The maximum and minimum values are increased and decreased by 10% to give the neural network the ability to output numbers slightly larger and smaller than those used in the molecular descriptions [using eqns (5) and (6)].

The double neural network trains for many iterations in the same fashion as mentioned above; the only difference is that there is no established rule which defines the correct way to determine the proper training parameters. The competency of the outer network cannot be tested independently of the inner network. This being the case, the end point for training of the double network was chosen as the minimum number of iterations and hidden layer neurons needed to minimize training error, such that more iterations and hidden layer neurons did not decrease the error significantly. The optimum construction for the double network was five hidden layer neurons (in the outer network) and 1 million training iterations. Training of the double neural network takes around 20 hr on an IBM R/S 6000 workstation. The learning rate and momentum term were the same values used to train the inner network. After the double neural network trained, improved versions of the molecular descriptions were output. These improved versions of input molecules are then transformed back into three-dimensional representations of molecules. With the molecules in this format, it is possible to identify the molecular features that the neural network found to require alteration for improved binding. We report the results of applying this technique to the enzyme system nucleoside hydrolase.
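Schematically, one training step of the double network couples the two networks as below. The object API (forward, backprop_to_input, update) is an assumed notation of ours, not the authors' code; it is meant only to make the data flow explicit.

```python
def double_network_step(x, shifted_target, outer, inner, alpha, kappa):
    """One iteration: x is a molecular description; shifted_target is its
    binding energy decreased by 10% (a slightly better inhibitor).
    inner is trained and frozen; only outer's weights and biases change."""
    improved = outer.forward(x)            # candidate improved description
    prediction = inner.forward(improved)   # frozen inner net scores it
    # The output error is backpropagated through the frozen inner layers
    # down to 'improved' (eqns (13)-(15) below), then on into the outer
    # network, whose weights and biases alone receive corrections.
    grad_at_improved = inner.backprop_to_input(shifted_target - prediction)
    outer.update(x, grad_at_improved, alpha, kappa)
```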

IU-Nucleoside Hydrolase

Protozoan parasites lack de novo purine biosynthetic pathways, and rely on the ability to salvage nucleosides from the blood of their host for RNA and DNA synthesis (Hammond & Gutteridge, 1984). The inosine–uridine preferring nucleoside hydrolase (IU-NH) from Crithidia fasciculata is unique and has not been found in mammals (Degano et al., 1998). This enzyme catalyses the N-ribosyl hydrolysis of all naturally occurring RNA purines and pyrimidines (Degano et al., 1998). The active site of the enzyme has two binding regions: one region binds ribose and the other binds the base. The inosine transition state requires ΔΔG = 17.7 kcal mol⁻¹ activation energy; 13.1 kcal mol⁻¹ are used in activation of the ribosyl group, and only 4.6 kcal mol⁻¹ are used for activation of the hypoxanthine leaving group (Parkin et al., 1997). Analogues that resemble the inosine transition state both geometrically and electronically have proven to be powerful competitive inhibitors of


this enzyme and could be used as anti-trypanosomal drugs (Degano et al., 1998).

The transition state for these reactions features an oxocarbenium ion achieved by the polarization of the C4′ oxygen–C1′ carbon bond of ribose. The C4′ oxygen is in proximity to a negatively charged carboxyl group from Glu 166 during transition state stabilization (Degano et al., 1998). This creates a partial double bond between the C4′ oxygen and the C1′ carbon, causing the oxygen to have a partial positive charge and the carbon to have a partial negative charge. Nucleoside analogues with iminoribitol groups have a secondary amine in place of the C4′ oxygen of ribose and have proven to be effective inhibitors of IU-NH.

IU-NH acts on all naturally occurring nucleosides (with C2′ hydroxyl groups); the lack of specificity for the leaving groups results from the small number of amino acids in this region available to form specific interactions: Tyr 229, His 82 and His 241 (Degano et al., 1998). The only crystal structure data available concerning the configuration of bound inhibitors were generated from a study of the enzyme bound to p-aminophenyliminoribitol (pAPIR) (Fig. 3, no. 3). Tyr 229 relocates during binding and moves above the phenyl ring of pAPIR. The side-chain hydroxyl group of Tyr 229 is directed toward the cavity that would contain the six-membered ring of a purine, were it bound (Degano et al., 1998). His 82 is 3.6 Å from the phenyl ring of pAPIR, and in the proper position for positive charge–π interactions to occur (Degano et al., 1998). His 241 has been shown to be involved in leaving-group activation in the hydrolysis of inosine, presumably as the proton donor in the creation of hypoxanthine (Degano et al., 1998).

The conformations of the molecules used in the study are fixed such that their structures are consistent for all molecules. We assume that the enzyme will bind all molecules in a similar conformation. This approach has been verified a posteriori on even the very flexible linear-chain inhibitors (arginine analogues) for nitric oxide synthase (B. Braunheim & S. Schwartz, unpublished results). The neural network need not know the configuration as long as the conformation of all molecules we present to the neural network is consistent. The known crystal structure of the inhibitor p-aminophenyliminoribitol bound to IU-nucleoside hydrolase is used as the model conformation.

The inosine transition state structure is stabilized by a negatively charged carboxyl group within the active site, 3.6 Å from the C4′ oxygen (Horenstein et al., 1991). In order to simulate this aspect of the active site, we included a negatively charged fluoride ion (at the same relative position as the nearest oxygen of the carboxyl group) in the calculations of the electrostatic potential at the van der Waals surface.

To underscore the complexity of an investigation of this enzyme, we examined the different nature of the transition state structures for the two different kinds of substrates, purines and pyrimidines. Inosine's transition state is the only one for which there is a determined structure, and it is shown in Fig. 4. The transition state structure for inosine is created by polarization of the ribosyl group across the C4′ oxygen–C1′ bond, and protonation of N7 of the purine group. This protonation would be impossible when dealing with the pyrimidine uridine (Fig. 3, no. 27), as there is no place for this group to receive a proton (the electrons of N3 are involved in the ring conjugation). Therefore, it is clear that these two types of substrates have quite different transition state structures, and that the rules of tight-binding pyrimidine analogues are quite different from those of binding purines. For pyrimidine analogues, the binding energy tends to decrease with increasingly electron-withdrawing substitutions. The opposite trend is seen with purine analogues. Any mathematical model of the binding preferences of this enzyme would have to take into account these contradictory trends with the different kinds of substrates. In our previous publications on this enzyme system (Braunheim & Schwartz, 1999; Braunheim et al., 1999), we found that a neural network could make accurate predictions for both purine and pyrimidine analogues when trained with purine and pyrimidine analogues. For our current work with the double neural network, there is an added level of complexity, because the inner neural network must teach the outer network during training. That is, when the outer neural network improves the molecular descriptions, it is necessary for purines to be improved in a different way than pyrimidines.


FIG. 3. Two-dimensional representations of the molecules used in the neural network study. Binding energies published in Braunheim et al. (submitted).



FIG. 4. Two-dimensional representation of the inosine transition state.


Results

Figure 3 shows the 32 molecules used in the neural network study. The inner network was trained with all 32 molecules of the study. Since we knew that the binding rules for purines were different from those of pyrimidines, the outer network trained with either purines only or pyrimidines only. This way the outer network could focus on optimizing members of one of the two separate classes. The purine analogues are molecule numbers 4, 9, 13, 19, 23–26 and 30–32 of Fig. 3. The pyrimidine analogues are molecule numbers 1–3, 5–8, 10–12, 14–18, 20–22, 27 and 28 of Fig. 3. Figure 5 shows coincidentally oriented points on the surfaces of some of the molecules used in the study, before and after their geometries were modified by the double neural network. All the pictures of surface points show the molecules in two orientations (the figure on the left shows the molecules with the face of the base facing out of the page; the figure on the right shows the molecules with the edge of the base facing out of the page). Figure 5(a) shows a purine and a pyrimidine analogue oriented for maximum geometric coincidence; the molecules are nos 2 and 4 in Fig. 3. Figures 5(b) and (c) show optimized versions of the geometric descriptions of these molecules compared to their unaltered input descriptions.

Examination of the idealized molecules, their electrostatic potential and geometry, shows that the double neural network changed purines in different ways than it did pyrimidines. The purine analogues 4, 9 and 13 of Fig. 3 were improved by the double neural network in similar ways. The lower right-hand side of the surface points shown in Fig. 6(a) and (c) shows that the neural network improved molecule 4 by making that region more positive (regions colored by red dots have a partial positive charge, green dots designate a neutral region, and regions colored blue have a partial negative charge). The molecule's entire surface appears to be more positive; this is consistent with the other improved purines (Figs 7 and 8). This is not surprising: there is a trend within the molecules studied whereby the purine analogues with increasingly positive purine groups have increasing binding energy. In addition to this structure–activity relationship, the transition state for inosine has a full positive charge, and it is the tightest-binding purine. Figure 8 shows molecule 13 before and after its description was idealized. Molecule 13 is larger than the other purine analogues; the double neural network improved its description such that it geometrically more closely resembles the other purines. In order for the double neural network to do this, it must have learned that purine analogues that more closely resemble the typical purine form function better. In addition, the double neural network operated on this purine analogue in a different way than it did on any of the other purine analogues: it developed a set of operations, applied exclusively to molecule 13, that minimized part of the molecule's surface.

Figures 9–11 show that the double neural network idealized pyrimidines by making the lower part of the base more negative. To emphasize this feature we show the idealized molecular representations of pyrimidine analogues 1, 16 and 17 of Fig. 3. An aromatic ring can be made more electron rich by a variety of substituents (Br, OH, NH₂); these groups themselves vary greatly in their electrostatic potential. Notice how the neural network consistently made the lower portion of the ring more electron rich, while the upper part of the ring (where the substituent groups were) is comprised of both positive and negative points. That is, the neural network learned from the molecules in the training set that the top part of the phenyl ring can vary greatly in electrostatic potential, but the electron richness of the mid and lower parts of the phenyl ring determines binding strength.


FIG. 5. Surface point comparisons of the geometry of unaltered input molecules and ideal molecules. This figure can be viewed in colour online at http://www.idealibrary.com.


FIG. 6. Surface point comparison of the electrostatic potential and geometry of molecule no. 4. The unaltered molecular description is compared to that altered by the double neural network. This figure can be viewed in colour online at http://www.idealibrary.com.


FIG. 7. Surface point comparison of the electrostatic potential and geometry of molecule no. 9. The unaltered molecular description is compared to that altered by the double neural network. This figure can be viewed in colour online at http://www.idealibrary.com.


FIG. 8. Surface point comparison of the electrostatic potential and geometry of molecule no. 13. The unaltered molecular description is compared to that altered by the double neural network. This figure can be viewed in colour online at http://www.idealibrary.com.


FIG. 9. Surface point comparison of the electrostatic potential and geometry of molecule no. 1. The unaltered molecular description is compared to that altered by the double neural network. This figure can be viewed in colour online at http://www.idealibrary.com.


FIG. 10. Surface point comparison of the electrostatic potential and geometry of molecule no. 16. The unaltered molecular description is compared to that altered by the double neural network. This figure can be viewed in colour online at http://www.idealibrary.com.


FIG. 11. Surface point comparison of the electrostatic potential and geometry of molecule no. 17. The unaltered molecular description is compared to that altered by the double neural network. This figure can be viewed in colour online at http://www.idealibrary.com.


Conclusions

The success of the double neural network method gives us reason to believe that the design of inhibitors could be automated. The final step of this method is going from the electrostatic potential at the van der Waals surface points to a description of the molecule's atomic coordinates and atom types. This is obviously a very difficult problem, but the work reported in this paper can be viewed as a first step in a program to predict efficient chemical inhibitors. To conclude this paper, we will discuss theories about how the neural networks were able to work together.

The inner network was trained and, within its weights and biases, it contains the rules for binding to the enzyme. Molecular descriptions, in our format, can be input to this network and a prediction of binding energy will be output. Our first attempt was to use a random number generator to output numbers, within the ranges of the molecular descriptions, and see if the function contained within the trained neural network could be optimized randomly. This would be analogous to a Monte-Carlo-like search of molecular descriptor space. This approach was unsuccessful, presumably because there are 800 descriptors in our molecular descriptions; adjusting them randomly could take an almost infinite amount of time if an exhaustive search is required. We concluded that a smart search was necessary; that is, the search for every one of the 800 descriptors must be guided toward the optimum value. The problem with this is that there is no way to know the optimum value of any one of the descriptors until the pattern is presented to the trained neural network and an output is generated, and even then, this output cannot show which numbers in the input acted to increase and decrease the output. The only place in a neural network where the values of an input are judged for the degree to which they optimize any function is inside the network. The error term and corrections to the hidden layer are

    δ_j = f'(h^I_j) \sum_{k=1}^{m} δ_k w_{jk},    (13)

    Δw_{ij} = α δ_j x_i,    (14)

    Δb_j = α δ_j.    (15)

Equation (13) shows how the error term for the hidden layer, δ_j, is a function of both the input to the hidden layer, through f'(h^I_j), and the error of the output layer, δ_k (the total error of the network). Equations (14) and (15) show how the error term of the hidden layer is used to calculate the correction terms for the weights and biases of the input layer (Δw_{ij}, Δb_j) so that in the next iteration, the error term will be smaller. This sequence, shown in eqns (13)–(15), shows how the total error of the network is used to adjust the weights and biases of layers distant from the output layer. The neural network is in fact optimizing the input to the hidden layer, in spite of the fact that there is no preset optimum of what the hidden layer input should be. This ability of the learning rules to find a multi-dimensional optimum is exactly what is exploited in the double neural network. The "teaching" of the outer network by the inner network occurs because the input layer's error term of the inner network is optimizing the weights and biases of the output layer of the outer network. Quantum feature optimization occurs because the weights and biases of the inner network are fixed and because the true binding energies have been increased slightly. With these configurations, we forced the training rules to find the multi-dimensional optimum for the output of the outer network's output layer. This is, in turn, based on minimizing the error of the input layer of the inner network, and the only way to do this is to output a molecular description that has a slightly larger binding energy than the one input to the outer network. In satisfying these requirements, the outer network becomes a molecular features optimizer based on the rules contained within the inner network.

REFERENCES

BAGDASSARIAN, C. K., BRAUNHEIM, B. B., SCHRAMM, V. L. & SCHWARTZ, S. D. (1996a). Quantitative measures of molecular similarity: methods to analyze transition-state analogues for enzymatic reactions. Int. J. Quant. Chem., Quant. Biol. Symp. 23, 73–80.

BAGDASSARIAN, C. K., SCHRAMM, V. L. & SCHWARTZ, S. D. (1996b). Molecular electrostatic potential analysis for enzymatic substrates, competitive inhibitors, and transition-state inhibitors. J. Am. Chem. Soc. 118, 8825–8836.

BRAUNHEIM, B. B. & SCHWARTZ, S. D. (1999). Computa-tional methods for transition state and inhibitor recogni-tion. Meth. Enzymol. 308, 398}426.


BRAUNHEIM, B. B., SCHWARTZ, S. D. & SCHRAMM, V. L. (1999). The use of quantum neural networks in a blind prediction of unknown binding free energies of inhibitors to IU-nucleoside hydrolase. Biochemistry 38, 16076–16083.

DEGANO, M., ALMO, S. C., SACCHETTINI, J. C. & SCHRAMM, V. L. (1998). Trypanosomal nucleoside hydrolase, a novel mechanism from the structure of a transition state complex. Biochemistry 37, 6277–6285.

EVANS, B. E., MITCHELL, G. N. & WOLFENDEN, R. (1975). The action of bacterial cytidine deaminase on 5,6-dihydrocytidine. Biochemistry 14, 621–629.

FAUSETT, L. (1994). Fundamentals of Neural Networks. Englewood Cliffs, NJ: Prentice-Hall.

GASTEIGER, J., LI, X., RUDOLPH, C., SADOWSKI, J. & ZUPAN, J. (1994). Representation of molecular electrostatic potentials by topological feature maps. J. Am. Chem. Soc. 116, 4608–4620.

Gaussian 94, Revision C.2 (1995). Pittsburgh, PA: Gaussian,Inc.

HAMMOND, D. J. & GUTTERIDGE, W. E. (1984). Purine and pyrimidine metabolism in the Trypanosomatidae. Mol. Biochem. Parasitol. 13, 243–261.

HORENSTEIN, B. A., PARKIN, D. W., ESTUPINAN, B. & SCHRAMM, V. L. (1991). Transition-state analysis of nucleoside hydrolase from Crithidia fasciculata. Biochemistry 30, 10788–10795.

HORENSTEIN, B. A. & SCHRAMM, V. L. (1993). Electronicnature of the transition state for nucleoside hydrolase.A blueprint for inhibitor design. Biochemistry 32,7089}7097.

PARKIN, D. W., LIMBERG, G., TYLER, P. C., FURNEAUX, R. H., CHEN, X. Y. & SCHRAMM, V. L. (1997). Isozyme-specific transition state inhibitors for the trypanosomal nucleoside hydrolase. Biochemistry 36, 3528–3534.

RUMELHART, D. E., HINTON, G. E. & WILLIAMS, R. J. (1986). Parallel Distributed Processing, Vol. 1. MA: MIT Press.

SO, S.-S. & RICHARDS, W. G. (1992). Application of neuralnetworks: quantitative structure}activity relationship ofderivatives of 2,4-diamino-5-(substituted-benzyl) pyrimi-dines as DHFR inhibitors. J. Med. Chem. 35, 3201}3207.

TETKO, I. V., TANCHUK, V. Y., CHENTSOVA, N. P., ANTONENKO, S. V., PODA, G. I., KUKHAR, V. P. & LUIK, A. I. (1994). HIV-1 reverse transcriptase inhibitor design using artificial neural networks. J. Med. Chem. 37, 2520–2526.

THOMPSON, T. B., CHOU, K.-C. & ZHENG, C. (1995). Neural network predictions of the HIV-1 protease cleavage sites. J. theor. Biol. 177, 369–379.

WAGENER, M., SADOWSKI, J. & GASTEIGER, J. (1995). Autocorrelation of molecular surface properties for modeling corticosteroid binding globulin and cytosolic Ah receptor activity by neural networks. J. Am. Chem. Soc. 117, 7769–7775.

WEINSTEIN, J. N., KOHN, K. W., GREVER, M. R., VISWANADHAN, V. N., RUBINSTEIN, L. V., MONKS, A. P., SCUDIERO, D. A., WELCH, L., KOUTSOUKOS, A. D., CHIAUSA, A. J. & PAULL, K. D. (1992). Neural computing in cancer drug development: predicting mechanism of activity. Science 258, 447–451.

WEISS, S. & KULIKOWSKI, C. (1991). Computer Systems ¹hat¸earn. CA: Morgan Kaufmann Publishers, Inc.