Diverse Models for the Prediction of HIV Integrase Inhibitory Activity of Substituted Quinolone Carboxylic Acids

Full Paper


Monika Gupta¹ and Anil Kumar Madan²

¹ Faculty of Pharmaceutical Sciences, M. D. University, Rohtak, India
² Faculty of Pharmaceutical Sciences, Pt. B. D. Sharma University of Health Sciences, Rohtak, India

In the present study, classification and correlation techniques of diverse nature were successfully employed to develop models for predicting the human immunodeficiency virus (HIV) integrase inhibitory activity of a dataset of 50 quinolone carboxylic acid analogs. The values of various molecular descriptors (MDs) for each analog were computed using the V-life MDS QSAR plus module. MDs that are not part of V-life MDS were computed using an in-house computer program. A decision tree (DT) was constructed for HIV integrase inhibitory activity to determine the importance of the MDs. The DT learned the information from the input data with an accuracy of 98% and correctly predicted the 10-fold cross-validated data with an accuracy of 96%. Three MDs, the E-state contribution descriptor (SsOHE), the molecular connectivity topochemical index (xA), and the eccentric connectivity topochemical index (jCC), were used to develop models by moving average analysis (MAA). The classification accuracy of the single descriptor based MAA models varied from 96% to 98%. The statistical significance of the models was assessed through specificity, sensitivity, overall accuracy, Matthews correlation coefficient, and intercorrelation analysis. The widely used methods multiple linear regression, partial least squares, and principal component regression were employed to develop correlation models on a training set of 36 molecules. The models had a squared correlation coefficient (r2) of 0.86 to 0.92, a significant cross-validated correlation coefficient (q2) of 0.79 to 0.85, F-test values from 63.2 to 93.06, an r2 for the external test set (pred_r2) of 0.69, a coefficient of correlation of the predicted data set (pred_r2se) of 0.77 to 0.78, and 27 to 30 degrees of freedom. The SsOHE index, SaaCHE index, SssCH2 count, x log P, and the alignment-independent descriptor T_T_C_2 were found to be the most important descriptors in the correlation models for predicting HIV integrase inhibitory activity.

Keywords: Eccentric connectivity topochemical index / E-state contribution indices / HIV integrase inhibitors / Molecular connectivity index / Quinolone carboxylic acid

Received: October 24, 2011; Revised: July 4, 2012; Accepted: July 11, 2012

DOI 10.1002/ardp.201100316

Introduction

Human immunodeficiency virus (HIV), a retrovirus, is the primary cause of acquired immunodeficiency syndrome (AIDS) and remains one of the major medical and social problems of our time [1]. An estimated 36 million people worldwide are currently living with HIV, and some 20 million have already died, giving a cumulative total of about 56 million HIV infections. There is still no known cure or vaccine [2].

HIV infection is characterized by a relentless decline in both the number and the function of HIV-specific CD4+ T helper cells, which are preferentially infected [3]. Although highly active antiretroviral therapy (HAART) regimens combining three or more reverse transcriptase (RT) or protease inhibitors effectively suppress uncontrolled viral replication, drug resistance and drug-induced side effects can hinder complete viral suppression. An inhibitor of HIV integrase (IN) would therefore add a valuable third component to the antiviral armamentarium [4]. HIV IN is an attractive target because it has no known homolog in human cells, which potentially minimizes the side effects associated with other antiviral agents [5, 6].

Correspondence: Prof. Anil Kumar Madan, Faculty of Pharmaceutical Sciences, Pt. B. D. Sharma University of Health Sciences, Rohtak 124 001, India; +91-1262-213202

Arch. Pharm. Chem. Life Sci. 2012, 000, 1–12

© 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Integration of retroviral DNA into the genome of the host cell is an essential step in the viral replication cycle [7, 8]. It is a multistep process consisting of three biochemical steps: (i) assembly of a complex with specific DNA sequences at the ends of the HIV-1 long terminal repeat (LTR) regions; (ii) endonucleolytic processing of the viral DNA, which is initially blunt-ended and is processed by the viral integrase to remove the terminal dinucleotide from each 3′ end (this 3′-end processing step exposes the conserved CA dinucleotides that ultimately define the ends of the integrated viral DNA); and (iii) strand transfer, in which the viral DNA 3′ ends are covalently linked to the cellular DNA [9]. Integration occurs by a pair of transesterification reactions, catalyzed by integrase, in which the 3′-hydroxyl groups of the processed viral DNA ends attack a pair of phosphodiester bonds in the target DNA.

In the case of HIV-1, the sites of attack on the two target DNA strands are separated by 5 bp. The 3′ ends of the viral DNA are joined to the target DNA, whereas the 5′ ends of the viral DNA remain unjoined in the resulting integration intermediate. In the "3′-processing" reaction, IN removes the terminal GT dinucleotide from each LTR 3′ end, while in the "strand transfer" reaction the two newly processed 3′ viral DNA ends are joined to the cellular DNA [10]. Repair of the intermediate by cellular enzymes completes the integration process [11]. Among the different classes of integrase inhibitors, two classes, viz. dinucleotide analogs [12] and strand transfer inhibitors, play a major role. The binding of strand transfer inhibitors displaces the reactive viral DNA end from the active site, disarming the viral nucleoprotein complex [13].

Several classes of HIV integrase inhibitors have been reported in the literature, including diketo acids [14], naphthyridine ketones [15], naphthyridine carboxamides [16, 17], and tricyclic pyrroloquinolines [18]. In spite of numerous efforts to develop integrase inhibitors, only the diketo acid class had been developed to an advanced stage. The diketo acid moiety (γ-ketone, enolizable α-ketone, and carboxylic acid) was believed to be essential for the inhibitory activity of this series of integrase inhibitors [14]. In 2007, raltegravir (Isentress; Merck), a small-molecule inhibitor of HIV-1 integrase derived from the diketo acid class, became the first integrase inhibitor approved by the US FDA for the treatment of HIV-1 infection in combination with other antiretroviral agents [19].

In order to identify new IN inhibitors, Sato and co-workers [20] showed that the core structure of the 4-quinolone-3-carboxylic acid antibiotics can serve as an alternative scaffold to the diketo acids. These novel quinolone IN inhibitors were structurally optimized into the highly potent compound GS-9137 (elvitegravir), which potently inhibits IN-catalyzed DNA strand transfer and shows antiviral activity in vitro [20]. Pasquini and co-workers [21] further explored the effect of chemical modifications on the 4-quinolone-3-carboxylic acid scaffold, with the aim of gaining new insight into the fundamental structural requirements for integrase inhibitory activity.

The global research effort to identify drugs that inhibit HIV integrase has spanned approximately 15 years and has recently led to human trials of several candidates, including L-870,810 [22], GS-9137 (elvitegravir) [16, 24], MK-0518 (raltegravir) [23], and GSK364735 [25]. Four of these exploratory drugs have yielded decreases in viral load of 2 log units or more in phase II studies, reflecting a profound clinical effect of blocking this viral enzyme [26]. All of these molecules inhibit the strand transfer (ST) step of HIV-1 integrase and possess submicromolar antiviral activity [27].

Spurred by the significant pitfalls of traditional methods and by rapid advances in molecular structure determination and computational resources, structure-based design is now accepted as a rational approach to the generation of new pharmaceuticals [28]. Quantitative structure–activity/property relationships (QSAR/QSPR) make it possible to screen a large number of chemicals quickly and at low cost. QSAR/QSPR are vital tools in drug design and have gained extensive recognition in the correlation and prediction of properties such as ADME [29], toxicity [30], retention time [31], stability [32], and physicochemical properties [33], in addition to biological activity [34–37]. QSAR/QSPR establishes a statistical relationship between the biological activity of the chemicals of interest and their structural properties [38]. Thousands of descriptors from five classes are used in QSAR/QSPR studies: constitutional, graph theoretical and topological indices, geometrical, electrostatic, and quantum-chemical descriptors. However, descriptors that are simple, easy to compute, and mutually non-correlated have found the most extensive use in QSAR/QSPR studies [39].

In the present study, diverse classification and correlation techniques have been employed to develop models for predicting the HIV-1 integrase inhibitory activity of quinolone carboxylic acid analogs [40].

Results and discussion

In the present study, a decision tree (DT) was built from a set

of 22 molecular descriptors (MDs) listed in Table 1 [41–63].

The descriptor at the root node is the most important, and the importance of descriptors decreases with increasing depth of the tree. The classification of the quinolone carboxylic acid analogs (Fig. 1 and Table 2) as active or inactive by a single tree, based on the molecular connectivity topochemical index A1, is shown in Fig. 2. The DT identified A1 as the most important index and classified the analogs with an accuracy of 98%. The specificity and sensitivity for the training set were 97.4% and 100%, respectively (Table 3). In 10-fold cross-validation, 96% of the quinolone carboxylic acid analogs were correctly classified with regard to biological activity, with a specificity of 97.4% and a sensitivity of 91%. The Matthews correlation coefficient (MCC) was 0.94 for the training set and 0.8 for the cross-validated set (Table 3). MCC takes both sensitivity and specificity into consideration and is generally used as a balanced measure when the classes are imbalanced. MCC values range from −1 to +1, and high values indicate robust models [64, 65].
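The performance measures quoted above all follow from the 2 × 2 confusion matrices in Table 3. The sketch below (the helper name `classification_metrics` is hypothetical) treats active analogs as the positive class and applies the standard Matthews formula. On the Table 3 counts it gives about 0.95 for the DT training set and about 0.82 for the random forest, close to the reported values; the cross-validated counts give about 0.88.

```python
import math


def classification_metrics(tp, fn, fp, tn):
    """Return (accuracy, sensitivity, specificity, MCC) for a 2x2 confusion matrix.

    Active compounds are the positive class:
    tp = actives predicted active, fn = actives predicted inactive,
    fp = inactives predicted active, tn = inactives predicted inactive.
    """
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)  # fraction of actives recovered
    specificity = tn / (tn + fp)  # fraction of inactives recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is taken as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc


# Counts read from Table 3 as (tp, fn, fp, tn).
TABLE_3 = {
    "DT training set": (11, 0, 1, 38),
    "DT cross-validated set": (10, 1, 1, 38),
    "Random forest": (9, 2, 1, 38),
}

for name, counts in TABLE_3.items():
    acc, sens, spec, mcc = classification_metrics(*counts)
    print(f"{name}: accuracy={acc:.1%}, sensitivity={sens:.1%}, "
          f"specificity={spec:.1%}, MCC={mcc:.2f}")
```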

Random forests (RFs) were grown with the 22 MDs listed in Table 1, and the importance of each descriptor was determined by the mean decrease in accuracy. The RF classified the quinolone carboxylic acid analogs as active or inactive with respect to HIV integrase inhibitory activity with an accuracy of 94%. The out-of-bag (OOB) estimate of error was only 3%. The specificity and sensitivity were 97.4% and 81.8%, respectively, and the MCC was 0.8 (Table 3).

Using one descriptor at a time, three independent moving average analysis (MAA) based models were developed from the E-state contribution index (SsOHE), the molecular connectivity topochemical index (xA), and the eccentric connectivity topochemical index (jCC). The proposed models are summarized in Table 4. The overall accuracy of prediction varied from 95.5% for the jCC-based model to 98% for the SsOHE-based model. A transitional range, indicating a gradual change in HIV integrase inhibitory activity, was observed only in the jCC-based model. The average IC50 values (Table 4 and Fig. 3) of the active ranges varied from 6.7 to 7.23 nM across the models, and the same ranges also showed high antiviral activity, with average EC50 values of 9.23 to 10.02 nM. Such low average IC50 values indicate high potency of the active ranges in the proposed models. Moreover, the active ranges had average selectivity index (SI) values from about 970 to 1146 (Table 4 and Fig. 3).

The SI is defined here as the quotient of CC50 and IC50; the SI values of all the analogs were calculated from the reported CC50 and IC50 values. The SI is an indirect measure of the therapeutic index and consequently an indication of safety. The low average IC50 and high average SI of the active ranges indicate not only

Table 1. List of molecular descriptors employed for the study.

Code | Descriptor | References
A1 | Molecular connectivity topochemical index | [41, 42]
A2 | Eccentric adjacency topochemical index | [43]
A3 | Augmented eccentric connectivity topochemical index | [44]
A4 | Eccentric connectivity topochemical index | [45]
A5 | Connective eccentricity topochemical index | [44]
A6 | Zagreb topochemical index, M2^C | [46]
A7 | Wiener's topochemical index | [47]
A8 | Superaugmented eccentric connectivity topochemical index | [48]
A9 | Molecular connectivity index | [49]
A10 | Eccentric adjacency index | [50]
A11 | Augmented eccentric connectivity index | [51]
A12 | Eccentric connectivity index | [52]
A13 | Connective eccentricity index | [53]
A14 | Zagreb index, M2 | [54, 55]
A15 | Wiener's index | [56, 57]
A16 | Balaban mean square distance index | [58]
A17 | Superpendentic index | [59]
A18 | SsOHE-index | [60, 61]
A19 | SaaCHE-index | [60, 61]
A20 | SssCH2E-index | [60, 61]
A21 | T_T_C_2 | [62]
A22 | x log P | [63]

[Figure 1 image: basic structures (A) to (F) of the quinolone carboxylic acid analogs with an arbitrary atom numbering scheme. In (A), X = C for A1 and X = N for all others; in (C), X = Cl for all analogs except C13, for which X = F.]

Figure 1. Basic structures and arbitrary atom numbering scheme for the quinolone carboxylic acid analogs [40].



Table 2. Relationship of the SsOHE index, the molecular connectivity topochemical index (xA), and the eccentric connectivity topochemical index (jcc) with HIV integrase inhibitory activity.

Cpd no. | R/R1 | SsOHE index | xA | jcc | IC50 (nM) [40] | Predicted by MAA, SsOHE index | Predicted by MAA, xA | Predicted by MAA, jcc | Reported [40] (a)
A1 | –COCOOH | 17.79 | 9.783 | 421.312 | 50 | − | − | − | −
A2 | –COCOOH/–CH3 | 8.915 | 11.098 | 500.183 | 100000 | − | − | − | −
A3 | –OH | 9.078 | 9.876 | 407.99 | 1630 | − | − | − | −
A4 | –OH/–CH3 | 9.209 | 10.249 | 428.855 | 2300 | − | − | − | −
A5 | –OEt | 7.482 | 10.988 | 490 | 100000 | − | − | − | −
A6 | NH2 | 7.498 | 9.907 | 406.33 | 100000 | − | − | − | −
B1 | 3-Cl | 9.108 | 9.96 | 542.822 | 100000 | − | − | − | −
B2 | 2-Cl | 9.113 | 9.96 | 506.531 | 800 | − | − | − | −
B3 | 1-Cl | 9.12 | 9.993 | 472.115 | 410 | − | − | − | −
B4 | 1-F | 9.065 | 10.174 | 433.097 | 500 | − | − | − | −
B5 | 1-Me | 9.114 | 10.286 | 427.656 | 1080 | − | − | − | −
B6 | 1-OMe | 9.13 | 10.671 | 465.75 | 1170 | − | − | − | −
B7 | 1-CF3 | 8.43 | 11.201 | 526.662 | 720 | − | − | − | −
B8 | 1,4-Cl2 | 9.162 | 10.078 | 544.471 | 370 | − | − | − | −
B9 | 2,4-Cl2 | 9.155 | 10.044 | 548.429 | 250 | − | − | − | −
B10 | 1,2-Cl2 | 9.155 | 10.144 | 544.471 | 70 | − | − | − | −
C1 | –Me | 9.286 | 10.518 | 569.579 | 83 | − | − | − | −
C2 | –Et | 9.381 | 11.055 | 594.829 | 31 | − | − | − | −
C3 | –Pr | 9.454 | 11.563 | 641.818 | 55 | − | − | − | −
C4 | –iPr | 9.468 | 11.431 | 620.079 | 26 | − | − | − | −
C5 | –Bu | 9.511 | 12.063 | 696.984 | 65 | − | − | − | −
C6 | –CH2CO2H | 18.5 | 11.775 | 686.439 | 32 | − | − | − | −
C7 | –CH2CH2CO2H | 18.42 | 12.27 | 744.271 | 38 | − | − | ± | −
C8 | –CH2CONH2 | 9.404 | 11.808 | 684.26 | 35 | − | − | − | −
C9 | –(CH2)2CONH2 | 9.472 | 12.303 | 741.926 | 116 | − | − | ± | −
C10 | –(CH2)2NH2 | 9.423 | 11.516 | 648.338 | 215 | − | − | − | −
C11 | –(CH2)2OH | 18.67 | 11.475 | 654.819 | 21 | − | − | − | −
C12 | –(CH2)3OH | 18.59 | 11.973 | 710.318 | 77 | − | − | − | −
C13 | –(CH2)2OH | 18.58 | 11.646 | 603.923 | 44 | − | − | − | −
D1 | 8-F | 18.57 | 11.797 | 676.374 | 24 | − | − | − | −
D2 | 5-F | 18.61 | 11.773 | 676.374 | 84 | − | − | − | −
D3 | 9-F | 18.62 | 11.776 | 678.957 | 25 | − | − | − | −
D4 | 5-OMe | 18.61 | 12.27 | 697.06 | 34 | − | − | − | −
D5 | 5-Cl | 18.84 | 11.593 | 687.316 | 12 | − | − | − | −
D6 | 5-Me | 18.81 | 11.886 | 671.735 | 43 | − | − | − | −
D7 | 5-CF3 | 18.79 | 12.8 | 744.15 | 41 | − | − | ± | −
D8 | 5-CN | 8.54 | 12.38 | 692.147 | 674 | − | − | − | −
E1 | –H | 18.78 | 11.635 | 640.381 | 50 | − | − | − | −
E2 | –Me | 18.92 | 12.053 | 665.631 | 14.8 | − | − | − | −
E3 | –Et | 18.92 | 12.595 | 692.881 | 9 | − | − | − | +
E4 | –Pr | 19.18 | 13.095 | 738.5 | 9 | + | + | ± | +
E5 | iPr | 19.37 | 12.97 | 720.131 | 8.2 | + | + | ± | +
E6 | iBu | 19.42 | 13.451 | 767.75 | 8.2 | + | + | + | +
E7 | Cyclohexyl | 19.54 | 14.632 | 882.877 | 6 | + | + | + | +
E8 | Ph | 19.97 | 14.632 | 882.877 | 5.6 | + | + | + | +
F1 | 5-OMe | 19.84 | 12.43 | 682.623 | 9.8 | + | − | − | +
F2 | –Pr | 19.54 | 13.89 | 780.742 | 5.8 | + | + | + | +
F3 | iPr | 19.59 | 13.765 | 762.373 | 7.2 | + | + | + | +
F4 | iBu | 19.71 | 14.246 | 809.992 | 5.8 | + | + | + | +
F5 | Cyclohexyl | 20.15 | 15.427 | 926.094 | 6.7 | + | + | + | +

Note: (a) (+) active compound, (−) inactive compound, and (±) compound in the transitional range.



high potency but also safety. Consequently, these models offer considerable potential for the development of potent HIV integrase inhibitors.
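As a small illustration of the SI definition used above (SI = CC50/IC50), the sketch below computes per-analog SI values and averages them over a range. The text does not state whether the average SI of a range is the mean of per-analog ratios or a ratio of averages; the per-analog mean is an assumption here, and the (CC50, IC50) pairs are hypothetical, not values from ref. [40].

```python
def selectivity_index(cc50, ic50):
    """SI = CC50 / IC50; both values must be in the same concentration unit."""
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return cc50 / ic50


def average_si(pairs):
    """Mean of per-analog SI values for (CC50, IC50) pairs (assumed averaging scheme)."""
    values = [selectivity_index(cc50, ic50) for cc50, ic50 in pairs]
    return sum(values) / len(values)


# Hypothetical (CC50, IC50) pairs in nM for three analogs in an active range.
example_range = [(7000.0, 7.0), (6500.0, 6.5), (9000.0, 6.0)]
print(average_si(example_range))
```

A high SI requires cytotoxicity to appear only at concentrations far above those that inhibit the enzyme, which is why the text reads it as an indirect safety measure.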

The methodology used above aims to develop models that provide lead molecules by exploiting the active ranges of the MD-based models. These models differ widely from conventional QSAR models, and both modeling systems have their own advantages and limitations. Here, the adopted modeling system has the distinct advantage of identifying narrow active ranges, which may be erroneously skipped during routine regression analysis in conventional QSAR modeling [48]. Since the ultimate goal of modeling is to provide lead structures, these active ranges can play a vital role in lead identification and optimization.

The intercorrelation between the MDs used in the study was investigated. The degree of correlation can be appraised by the correlation coefficient r. Pairs of MDs with r ≥ 0.97 are normally considered highly intercorrelated, those with 0.90 ≤ r < 0.97 appreciably correlated, those with 0.50 ≤ r ≤ 0.89 weakly correlated, and pairs with r < 0.50 not intercorrelated [66]. Intercorrelation analysis (Table 5) revealed that SsOHE is weakly correlated with both the eccentric connectivity topochemical index (jCC) and the molecular connectivity topochemical index (xA) (r = 0.69), whereas jCC and xA are appreciably correlated with each other (r = 0.93).
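The categories of ref. [66] translate directly into a lookup on r. In this minimal sketch (hypothetical function name), values between 0.89 and 0.90, which the quoted bands leave uncovered, are assigned to the weakly correlated band. The thresholds are applied to r as stated; applying them to |r| for negative correlations would be an additional assumption not made here.

```python
def intercorrelation_class(r):
    """Map a correlation coefficient to the categories quoted from ref. [66]."""
    if r >= 0.97:
        return "highly intercorrelated"
    if r >= 0.90:
        return "appreciably correlated"
    if r >= 0.50:
        # The text quotes 0.50 <= r <= 0.89; 0.89 < r < 0.90 is also placed here.
        return "weakly correlated"
    return "not intercorrelated"


# Pairwise r values from Table 5.
TABLE_5 = {("SsOHE", "xA"): 0.69, ("SsOHE", "jCC"): 0.69, ("xA", "jCC"): 0.93}

for (a, b), r in TABLE_5.items():
    print(f"{a} vs {b}: r = {r:.2f} -> {intercorrelation_class(r)}")
```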

QSAR analysis by multiple linear regression (MLR) with forward–backward stepwise variable selection gave the following final equation (statistics in Table 6):

$$\mathrm{pIC_{50}} = 0.1362\,(\mathrm{SsOHE\ index}) + 0.6149\,(x\log P) + 1.107\,(\mathrm{T\_T\_C\_2}) - 0.164\,(\mathrm{SssCH_2\ count}) - 0.0722\,(\mathrm{SaaCHE\ index}) + 0.8827$$
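To show how the MLR equation above is applied, the sketch below evaluates it for one set of descriptor values and converts the result back to an IC50. The descriptor values are hypothetical, the dictionary keys are illustrative labels rather than V-life identifiers, and the conversion assumes pIC50 = −log10(IC50 in mol/L), which the text does not state explicitly.

```python
# Coefficients of the MLR equation; keys are illustrative labels.
MLR_COEFFICIENTS = {
    "SsOHE_index": 0.1362,
    "xlogP": 0.6149,
    "T_T_C_2": 1.107,
    "SssCH2_count": -0.164,
    "SaaCHE_index": -0.0722,
}
MLR_INTERCEPT = 0.8827


def predict_pic50(descriptors):
    """Evaluate the linear MLR equation for one analog."""
    return MLR_INTERCEPT + sum(
        coef * descriptors[name] for name, coef in MLR_COEFFICIENTS.items()
    )


def pic50_to_ic50_nm(pic50):
    """Assumes pIC50 = -log10(IC50 in mol/L), so IC50 in nM = 10**(9 - pIC50)."""
    return 10 ** (9 - pic50)


# Hypothetical descriptor values for a single analog (not taken from the paper).
example = {
    "SsOHE_index": 19.5,
    "xlogP": 3.0,
    "T_T_C_2": 2.0,
    "SssCH2_count": 2,
    "SaaCHE_index": 5.0,
}
p = predict_pic50(example)
print(round(p, 4), round(pic50_to_ic50_nm(p), 1))
```

Because the model is linear, each term can also be read on its own: raising the SsOHE index by one unit raises the predicted pIC50 by 0.1362, all other descriptors held fixed.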

The MLR model had a squared correlation coefficient (r2) of 0.92, a cross-validated correlation coefficient (q2) of 0.85, an F-test value of 63.2, an r2 for the external test set (pred_r2) of 0.69, and 27 degrees of freedom (Table 6). The model explains 85% of the cross-validated variance and was validated on an external set of compounds with a predictive correlation coefficient of 0.69. Randomization tests gave a_ran_r2 = 0.00, a_ran_q2 = 0.00, a_ran_pred_r2 = 0.01, best_ran_r2 = 0.55, best_ran_q2 = 0.30, Z-score_ran_r2 = 7.28, and Z-score_ran_q2 = 5.37. The randomization test suggests that there is less than a 1% probability that the model was generated by chance. The predictability of

Figure 2. The decision tree for distinguishing the active analogs (A) from the inactive analogs (B); A1, molecular connectivity topochemical index.

Table 3. Confusion matrix for HIV integrase inhibitory activity using models based on decision tree and random forest.

Model | Description | Actual class | Predicted active | Predicted inactive | Specificity (%) | Sensitivity (%) | MCC | OOB error (%)
Decision tree | Training set | Active | 11 | 0 | 97.4 | 100 | 0.9 | –
Decision tree | Training set | Inactive | 1 | 38 | | | |
Decision tree | Cross-validated set | Active | 10 | 1 | 97.4 | 91 | 0.8 | –
Decision tree | Cross-validated set | Inactive | 1 | 38 | | | |
Random forest | – | Active | 9 | 2 | 97.4 | 81.8 | 0.8 | 3
Random forest | – | Inactive | 1 | 38 | | | |



the model was evaluated with the external test set of compounds. The contribution of each descriptor to the activity is shown in Fig. 4.

The main contributing descriptors were the SsOHE index, the SssCH2 count, the SaaCHE index, x log P, and the alignment-independent descriptor T_T_C_2. These descriptors help explain the effect of substituents on the activity of the quinolone carboxylic acids.

The positive coefficient of the SsOHE index (41.8% contribution) suggests that an –OH group in the side chain of the quinolone carboxylic acid scaffold is highly favourable for biological activity.

The SaaCHE index, the electrotopological state index for CH groups attached through two aromatic bonds, has a negative MLR coefficient (9.32% contribution) and is therefore inversely related to biological activity, suggesting that –CH– groups between the aromatic rings decrease activity.

Table 4. Proposed MAA based models for the prediction of HIV integrase inhibitory activity.

Index | Nature of range | Index value | Total compounds in the range | Compounds predicted correctly | Overall accuracy of prediction (%) | Strand transfer inhibition, average IC50 (nM) (a) | Antiviral activity, average EC50 (nM) | Average SI
SsOHE index | Inactive | < 19.18 | 40 | 39 | 98 | 10279.97 (10543.32) | 7016.44 (7195.83) | 258.74 (250.27)
 | Active | ≥ 19.18 | 10 | 10 | | 7.23 | 10.02 | 1146.19
xA | Inactive | < 12.97 | 41 | 38 | 96 | 10013 (10803) | 6845.73 (7385.20) | 287.65 (256.86)
 | Active | ≥ 12.97 | 9 | 9 | | 7.033 | 9.23 | 1113.16
jcc | Inactive | ≤ 710.32 | 38 | 36 | 95.5 | 10799.47 (11398.93) | 6385.13 (6739.32) | 285.76 (245.17)
 | Transitional | > 710.32 to < 762.38 | 5 | NA | | 168.88 | 7611.46 | 833.108
 | Active | ≥ 762.38 | 7 | 7 | | 6.7 | 9.4 | 969.55

NA, not applicable. (a) Values in brackets are based on the correctly predicted analogs in the particular range.
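The range logic of Table 4 can be expressed as a short sketch. It applies cutoffs already read from Table 4 and recomputes the overall accuracy with transitional compounds excluded (for jCC, 43 of 45 gives 95.5%); it does not reproduce the moving average procedure used to locate the cutoffs. The helper names are hypothetical.

```python
def assign_range(value, inactive_upper, active_lower, inactive_inclusive=True):
    """Place a descriptor value in the inactive, transitional, or active range of an MAA model."""
    if value < inactive_upper or (inactive_inclusive and value == inactive_upper):
        return "inactive"
    if value >= active_lower:
        return "active"
    return "transitional"


def overall_accuracy(ranges):
    """Correct predictions / compounds over the active and inactive ranges only.

    `ranges` maps a range name to (total compounds, correctly predicted).
    Transitional compounds are excluded because no activity is predicted for them.
    """
    counted = [v for k, v in ranges.items() if k != "transitional"]
    return sum(c for _, c in counted) / sum(t for t, _ in counted)


# SsOHE model: inactive < 19.18, active >= 19.18 (no transitional range).
print(assign_range(18.92, 19.18, 19.18, inactive_inclusive=False))  # E3 -> inactive
print(assign_range(19.18, 19.18, 19.18, inactive_inclusive=False))  # E4 -> active
# jCC model: inactive <= 710.32, transitional in between, active >= 762.38.
print(assign_range(738.5, 710.32, 762.38))  # E4 -> transitional

print(overall_accuracy({"inactive": (40, 39), "active": (10, 10)}))
print(overall_accuracy({"inactive": (38, 36), "transitional": (5, 0), "active": (7, 7)}))
```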

Figure 3. (A) Average IC50 (nM) values of the quinolone carboxylic acid analogs for strand transfer inhibition (STI) and average EC50 (nM) values for antiviral activity in the various ranges of the MAA based models. (B) Average selectivity index (SI) values of the quinolone carboxylic acid analogs for HIV integrase inhibitory activity in the various ranges of the MAA based models.

Table 5. Intercorrelation matrix for the MDs used in MAA.

MD | SsOHE index | xA | jcc
SsOHE index | 1 | 0.69 | 0.69
xA | | 1 | 0.93
jcc | | | 1



The negative MLR coefficient of the SssCH2 count (14.11% contribution) indicates that the number of –CH2– groups between the nitrogen and the hydroxyl group should be as small as possible for good biological activity.

The positive coefficient of x log P (23.09% contribution), which represents the octanol/water partition coefficient, indicates that substitution with a more lipophilic group on the aromatic ring has a positive influence on biological activity.

The descriptor T_T_C_2 is the count of heavy atoms (single, double, or triple bonded) separated from a carbon atom (single, double, or triple bonded) by two bonds. The positive coefficient of T_T_C_2 (13.87% contribution) indicates that the carboxy and oxo groups should be present at positions 3 and 4, respectively, for good biological activity.

QSAR analysis by partial least squares (PLS) with forward–backward stepwise variable selection gave the following final equation (statistics in Table 6):


(T_T_C_2) − 0.1637 (SssCH2 count) − 0.0721 (SaaCHE index) + 0.8929.

The PLS model had a squared correlation coefficient (r2) of 0.92, a cross-validated correlation coefficient (q2) of 0.85, an F-test value of 81.91, an r2 for the external test set (pred_r2) of 0.69, and 28 degrees of freedom (Table 6). The model explains 85% of the cross-validated variance and was validated on an external set of compounds with a predictive correlation coefficient of 0.69. Randomization tests gave a_ran_r2 = 0.00, a_ran_q2 = 0.00, a_ran_pred_r2 = 0.01, best_ran_r2 = 0.61, best_ran_q2 = 0.48, Z-score_ran_r2 = 7.04, and Z-score_ran_q2 = 4.47.
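The randomization statistics compare the real model with models refitted to scrambled activities. The sketch below does this for a one-descriptor least-squares model on synthetic data. It assumes the common definitions Z = (r2 of the model − mean r2 of the scrambled models) / their standard deviation, and alpha = fraction of scrambled runs whose r2 reaches the model r2. V-life's exact implementation is not described in the text.

```python
import random
import statistics


def r_squared(x, y):
    """Squared Pearson correlation, i.e. r2 of a one-descriptor least-squares fit."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def z_score(model_stat, random_stats):
    """(model statistic - mean of randomized statistics) / their standard deviation."""
    return (model_stat - statistics.mean(random_stats)) / statistics.stdev(random_stats)


def y_randomization(x, y, n_trials=100, seed=0):
    """Refit after shuffling y; return (model r2, best random r2, Z-score, alpha)."""
    rng = random.Random(seed)
    model_r2 = r_squared(x, y)
    random_r2 = []
    for _ in range(n_trials):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break the structure-activity pairing
        random_r2.append(r_squared(x, shuffled))
    alpha = sum(r >= model_r2 for r in random_r2) / n_trials
    return model_r2, max(random_r2), z_score(model_r2, random_r2), alpha


# Synthetic data: a descriptor and an activity with a clear linear trend.
x = list(range(10))
y = [0.5 * v + (0.3 if v % 2 else -0.3) for v in x]
print(y_randomization(x, y))
```

A large Z-score and a small alpha mean that scrambled activities rarely, if ever, reproduce the fit of the real model.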

The descriptors in the PLS equation are the SsOHE index, the SssCH2 count, the SaaCHE index, x log P, and the alignment-independent descriptor T_T_C_2. The MLR and PLS models therefore share the same descriptors and differ only in their percentage contributions (Fig. 5A).

Figure 4. Plot of percentage contribution of each descriptor in the MLR model.

Figure 5. (A) Plot of percentage contribution of each descriptor in the PLS model. (B) Plot of percentage contribution of each descriptor in the PCR model.

Table 6. Comparative statistical parameters of MLR, PLS, and PCR.

Parameter | MLR | PLS | PCR
n | 33 | 33 | 33
df | 27 | 28 | 30
r2 | 0.92 | 0.92 | 0.86
q2 | 0.85 | 0.85 | 0.79
F-test | 63.2 | 81.91 | 93.06
r2_se | 0.32 | 0.31 | 0.40
q2_se | 0.43 | 0.42 | 0.48
pred_r2 | 0.69 | 0.69 | 0.69
pred_r2se | 0.77 | 0.77 | 0.78
best_ran_r2 | 0.55 | 0.61 | 0.33
best_ran_q2 | 0.30 | 0.48 | 0.2
Z-score_ran_r2 | 7.28 | 7.04 | 13.97
Z-score_ran_q2 | 5.37 | 4.47 | 8.75
a_ran_r2 | 0.00 | 0.00 | 0.00000
a_ran_q2 | 0.00 | 0.000 | 0.00000
a_ran_pred_r2 | 0.01 | 0.01 | 0.001

MLR, multiple linear regression; PLS, partial least squares; PCR, principal component regression; n, number of molecules in the training set; df, degrees of freedom; r2, coefficient of correlation; q2, cross-validated r2; pred_r2, r2 for the external test set; pred_r2se, coefficient of correlation of the predicted data set.
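The two validation statistics in Table 6 that matter most, q2 and pred_r2, can be sketched for a one-descriptor least-squares model. The sketch assumes the common definitions: leave-one-out q2 = 1 − PRESS / Σ(y − ȳ)², and pred_r2 = 1 − Σ(y_test − ŷ_test)² / Σ(y_test − ȳ_train)², measured against the training-set mean. The multi-descriptor MLR, PLS, and PCR fits of the paper are not reproduced, and the function names are hypothetical.

```python
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def loo_q2(x, y):
    """Leave-one-out q2 = 1 - PRESS / sum((y - mean(y))**2)."""
    press = 0.0
    for i in range(len(x)):
        # Refit without molecule i, then predict molecule i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    my = statistics.mean(y)
    return 1.0 - press / sum((b - my) ** 2 for b in y)


def pred_r2(y_test, y_test_pred, train_mean):
    """External-set r2, measured against the training-set mean activity."""
    ss_res = sum((a - b) ** 2 for a, b in zip(y_test, y_test_pred))
    ss_tot = sum((a - train_mean) ** 2 for a in y_test)
    return 1.0 - ss_res / ss_tot


# Synthetic training data with an exact linear relationship.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [2.0 * v + 1.0 for v in x_train]
print(loo_q2(x_train, y_train))
```

Unlike r2, q2 only credits the model for predicting molecules it was not fitted on, which is why q2 is lower than r2 for all three models in Table 6.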



QSAR analysis by principal component regression (PCR) with forward–backward stepwise variable selection gave the following final equation (statistics in Table 6):


(T_T_C_2) − 0.0063.

The PCR model had a squared correlation coefficient (r2) of 0.86, a cross-validated correlation coefficient (q2) of 0.79, an F-test value of 93.06, an r2 for the external test set (pred_r2) of 0.69, and 30 degrees of freedom (Table 6). The model explains 79% of the cross-validated variance and was validated on an external set of compounds with a predictive correlation coefficient of 0.69. Randomization tests gave a_ran_r2 = 0.00, a_ran_q2 = 0.00, a_ran_pred_r2 = 0.001, best_ran_r2 = 0.33, and best_ran_q2 = 0.2.

The descriptors in the PCR equation are the SsOHE index, x log P, and the alignment-independent descriptor T_T_C_2, all of which also appear in the MLR model. The two models differ only in the percentage contributions of these descriptors (Fig. 5B).

This study led to statistically significant QSAR models that help explain the molecular properties and features governing the variation in activity. It also shows how very simple, easy-to-compute descriptors influence biological activity, which could shed light on key factors that aid the design of potent molecules.

Conclusions

In the present study, both classification and correlation approaches were successfully employed to develop models for predicting the HIV integrase inhibitory activity of quinolone carboxylic acids. All the classification models, despite their diverse nature, exhibited a high degree of predictability with regard to HIV integrase inhibitory activity. The classification accuracy of the single descriptor based MAA models varied from 96 to 98%. The active ranges of the proposed models exhibited high potency (low IC50 values) as well as safety (high SI values, an indirect measure of the therapeutic index). High correlation coefficients (r2 and q2) were observed for the MLR, PLS, and PCR based correlation models. The high predictive accuracy of these models offers considerable potential for providing lead structures for the development of potent but safe HIV integrase inhibitors. The simultaneous use of diverse techniques to develop multiple models can help accelerate the drug discovery process.

Methodology

Data set

All 50 quinolone carboxylic acid derivatives reported by Sato et al. [40] as HIV integrase inhibitors were selected as the dataset for the present study. The basic structures of these derivatives are shown in Fig. 1, and their substituents are listed in Table 2.

Molecular descriptors

Molecular descriptors of diverse nature were used in the current study, including physicochemical descriptors, path counts, path clusters, E-state contribution descriptors, polar surface area descriptors, element counts, topological descriptors, and a variety of alignment-independent descriptors. All computational work was performed on an Apple workstation (8-core processor) using V-life MDS QSAR plus, developed by V life sciences technologies Pvt. Ltd, Pune, India. MDs that are not part of V-life MDS QSAR plus were computed using an in-house computer program.

Descriptors with highly degenerate values (the same value for many analogs) were omitted from the large pool of descriptors initially calculated with V-life MDS QSAR plus and the in-house program. For the remaining descriptors, a pairwise correlation analysis was carried out, and one of any two indices with r ≥ 0.97 was excluded to reduce redundant information. This exclusion step reduced collinearity between the descriptors.
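The pairwise exclusion step can be sketched as a greedy filter. The text does not say which member of a correlated pair was dropped; here the later descriptor in input order is dropped, and |r| is used so that strong negative correlations are also caught. Both choices are assumptions, and the descriptor names and values are hypothetical. Constant descriptors would make r undefined, which is one reason degenerate descriptors are removed first.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length value lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_intercorrelated(descriptors, threshold=0.97):
    """Keep a descriptor only if |r| < threshold with every descriptor kept so far."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson_r(values, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor values for four analogs.
descriptors = {
    "D1": [1.0, 2.0, 3.0, 4.0],
    "D2": [2.0, 4.0, 6.0, 8.1],  # nearly collinear with D1, so dropped
    "D3": [4.0, 1.0, 3.0, 2.0],  # r = -0.4 with D1, so kept
}
print(drop_intercorrelated(descriptors))  # ['D1', 'D3']
```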

A total of 22 descriptors [41–63], listed in Table 1, were employed to develop the DT. The MDs shortlisted on the basis of their non-correlating nature and classification ability, and subsequently employed in the present study, are described below.

SsOHE index (E-state contribution)

This is the electrotopological state index for hydroxyl (–OH) groups attached through one single bond [60, 61].

$$\mathrm{SsOHE}_i = I_i + \Delta I_i \qquad (1)$$

where SsOHE_i is the electrotopological state of atom i, I_i is its intrinsic value, and ΔI_i is the influence of the other atoms on atom i [60, 61].
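Eq. (1) can be made concrete with the Kier–Hall E-state definitions, which are assumed here: intrinsic state I_i = [(2/N_i)² δv_i + 1]/δ_i, where N_i is the principal quantum number, δv_i the valence delta, and δ_i the number of heavy-atom neighbours, and perturbation ΔI_i = Σ_j (I_i − I_j)/r_ij², where r_ij is the topological distance plus one. V-life's implementation may differ in detail. For ethanol (C–C–O), the only hydroxyl oxygen has an E-state of about 7.57, which is then the SsOHE value of that molecule.

```python
from collections import deque


def topological_distances(adjacency):
    """All-pairs shortest path lengths (in bonds) by breadth-first search."""
    n = len(adjacency)
    table = []
    for start in range(n):
        dist = [None] * n
        dist[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[v] is None:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        table.append(dist)
    return table


def e_state_values(adjacency, valence_delta, principal_n):
    """Kier-Hall E-state S_i = I_i + dI_i for every heavy atom of a connected molecule."""
    n = len(adjacency)
    dist = topological_distances(adjacency)
    # Intrinsic state: I_i = ((2 / N_i)**2 * delta_v_i + 1) / delta_i
    intrinsic = [((2.0 / principal_n[i]) ** 2 * valence_delta[i] + 1.0) / len(adjacency[i])
                 for i in range(n)]
    states = []
    for i in range(n):
        # Perturbation: sum over j of (I_i - I_j) / r_ij**2, with r_ij = bond distance + 1
        perturbation = sum((intrinsic[i] - intrinsic[j]) / (dist[i][j] + 1) ** 2
                           for j in range(n) if j != i)
        states.append(intrinsic[i] + perturbation)
    return states


# Ethanol, hydrogen-suppressed: CH3 (0) - CH2 (1) - OH (2)
adjacency = [[1], [0, 2], [1]]
valence_delta = [1, 2, 5]  # Z_v - h: CH3 = 4 - 3, CH2 = 4 - 2, OH = 6 - 1
principal_n = [2, 2, 2]
states = e_state_values(adjacency, valence_delta, principal_n)
ss_ohe = states[2]  # only one -OH oxygen, so SsOHE equals its E-state
print([round(s, 3) for s in states])  # -> [1.681, 0.25, 7.569]
```

The perturbation terms cancel pairwise, so the E-states of a molecule always sum to the sum of its intrinsic states (9.5 for ethanol).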

SaaCHE index (E-state contribution)

This is an electrotopological state index for the number of

–CH groups connected with two aromatic bonds [60, 61].

SssCH2 index (E-state contribution)

This is an electrotopological state index for the number of

–CH2 groups connected with two single bonds [60, 61].



Alignment independent descriptors

These versatile alignment-independent structure descriptors for QSAR and QSPR, based on the distribution of molecular features, were defined by Baumann [62]. They include the following.

T_T_C_2.

This is the count of heavy atoms (having single, double, or triple bonds) separated from any carbon atom (single or double bonded) by two bonds in a molecule [62].
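A minimal sketch of this count, under one reading of the definition: it counts unordered pairs of heavy atoms at a topological distance of exactly two bonds in which at least one atom is carbon. Baumann's exact conventions (ordered versus unordered pairs, bond-type handling) are not reproduced, and the function names are hypothetical.

```python
from collections import deque


def distances_from(adjacency, start):
    """Bond distances from one atom to every reachable atom (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def t_t_c_2(elements, adjacency):
    """Count unordered heavy-atom pairs two bonds apart that involve at least one carbon."""
    count = 0
    for i in range(len(elements)):
        for j, d in distances_from(adjacency, i).items():
            if j > i and d == 2 and "C" in (elements[i], elements[j]):
                count += 1
    return count


# 1-Propanol, hydrogen-suppressed: C0 - C1 - C2 - O3
propanol_elements = ["C", "C", "C", "O"]
propanol_adj = [[1], [0, 2], [1, 3], [2]]
print(t_t_c_2(propanol_elements, propanol_adj))  # -> 2
```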

Physicochemical descriptors

x log P.

This descriptor is the logarithm of the ratio of solute concentrations in octanol and water, generally termed the octanol/water partition coefficient (log P). Here, log P is evaluated by an atom-based method [63].

Molecular connectivity topochemical index (xA)

It is defined as the summation of the modified bond values of

adjacent vertices for all edges in the hydrogen-suppressed

molecular graph [41, 42].

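$$x^{A} = \sum_{i=1}^{n} \left(V_i^{c}\,V_j^{c}\right)^{-1/2} \qquad (2)$$

where V_i^c and V_j^c represent the modified valencies of the pair of vertices joined by edge (i, j), and the summation runs over all edges of the graph [41, 42].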
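Eq. (2) reduces to a sum over edges once a value is assigned to each vertex. In the sketch below the vertex values are an input: with plain vertex degrees, as in the example, the sum is the ordinary Randić connectivity index. The heteroatom-weighted modified valencies of refs. [41, 42] are not reproduced and would have to be supplied. The function names are hypothetical.

```python
def connectivity_index(edges, vertex_values):
    """Sum over edges (i, j) of (V_i * V_j) ** -0.5, as in Eq. (2)."""
    return sum((vertex_values[i] * vertex_values[j]) ** -0.5 for i, j in edges)


def vertex_degrees(n_vertices, edges):
    """Plain vertex degrees of a hydrogen-suppressed graph."""
    degrees = [0] * n_vertices
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    return degrees


# n-Butane, hydrogen-suppressed: C0 - C1 - C2 - C3
butane_edges = [(0, 1), (1, 2), (2, 3)]
degrees = vertex_degrees(4, butane_edges)
print(round(connectivity_index(butane_edges, degrees), 4))  # -> 1.9142
```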

Eccentric connectivity topochemical index (jCC)

It is defined as the summation of the products of the chemical eccentricity and the chemical degree of each vertex in a hydrogen-suppressed molecular graph having n vertices [45].

$$j^{cc} = \sum_{i=1}^{n} E_i^{c} \times V_i^{c} \qquad (3)$$

where V_i^c is the chemical degree of vertex i, E_i^c is the chemical eccentricity of vertex i, and n is the number of vertices in graph G [45].
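Eq. (3) can be sketched the same way. By default the code below uses the plain graph degree and the plain eccentricity (largest topological distance from a vertex), which gives the ordinary eccentric connectivity index. The chemical (heteroatom-weighted) degree and eccentricity of ref. [45] would have to be passed in explicitly, and the graph is assumed to be connected. The function names are hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, start):
    """Topological distances from one vertex of a connected graph."""
    dist = [None] * len(adjacency)
    dist[start] = 0
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def eccentric_connectivity(adjacency, degree=None, eccentricity=None):
    """Sum over vertices of eccentricity * degree, as in Eq. (3)."""
    n = len(adjacency)
    if degree is None:
        degree = [len(nbrs) for nbrs in adjacency]
    if eccentricity is None:
        eccentricity = [max(bfs_distances(adjacency, i)) for i in range(n)]
    return sum(eccentricity[i] * degree[i] for i in range(n))


# n-Butane, hydrogen-suppressed: C0 - C1 - C2 - C3
butane = [[1], [0, 2], [1, 3], [2]]
print(eccentric_connectivity(butane))  # -> 14
```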

Classification techniques

Decision tree (DT)

DT provides a simple and useful solution for many classification problems in which the information contained in the dataset is relatively complex [67]. In the present study, a DT was grown to identify the importance of MDs. At each parent node of a DT, the molecules are split into two child nodes on the basis of a descriptor value. The prediction for a molecule reaching a given terminal node is obtained by a majority vote of the training-set molecules that reach the same terminal node [68]. In the present study, R (version 2.1.0) with the rpart library was used to grow the DT. The active compounds were labeled "A" (n = 15) and the inactive compounds "B" (n = 37). The activity class assigned to each analog by the DT was subsequently compared with its reported HIV integrase inhibitory activity.
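The study grew its tree with R's rpart. For illustration only, the pure-Python sketch below finds a single CART-style split (a one-level tree, or stump): it tries midpoints between consecutive descriptor values, keeps the descriptor and threshold with the lowest weighted Gini impurity, and labels each child node by majority vote, as described above. It is not rpart's algorithm (there is no recursion, pruning, or surrogate splitting), and all names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent class label (majority vote)."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y):
    """Best single split over all descriptors by weighted Gini impurity.

    X: list of descriptor vectors; y: class labels such as "A"/"B".
    Returns (descriptor index, threshold, left label, right label).
    """
    best = None
    n = len(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # every descriptor is constant: predict the overall majority
        label = majority(y)
        return 0, float("inf"), label, label
    _, f, t, left_label, right_label = best
    return f, t, left_label, right_label

def predict_stump(stump, row):
    """Send a molecule down the stump and return its terminal-node label."""
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label
```

The descriptor chosen for the root split gives a rough indication of which MD separates actives from inactives best, which mirrors how the DT is used to rank MDs in the text.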

Random forest (RF)

Random forests were grown for HIV integrase inhibitory activity. An RF is an ensemble of n unpruned classification trees, each grown on a bootstrap sample of the training data, with a random subset of variables used to define the best split at each node (tree fork) [69]. The bootstrap sample used during tree growth is a random selection, with replacement, of the molecules in the training set. The performance of each tree is assessed internally from the prediction error on the data left out of its bootstrap sample (out-of-bag, OOB, data) [70]. To classify a new object, its input vector is passed down each tree in the forest; each tree gives a classification, that is, it "votes" for a class. The forest chooses the classification having the maximum number of votes over all the trees in the forest [71]. Besides preserving most of the appealing features of DT, RF performs a type of cross-validation in parallel with the training step by using the so-called OOB data, which are used to estimate prediction accuracy. In the present study, RFs were grown separately for HIV integrase inhibitory activity with R (version 2.1.0) using the randomForest library.
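The bookkeeping that RF relies on can be sketched with the Python standard library: drawing a bootstrap sample with replacement, identifying the OOB molecules that were never drawn, choosing a random descriptor subset for a split, and taking the majority vote over trees. This is not the randomForest package and it does not grow any trees; the function names are hypothetical.

```python
import random
from collections import Counter

def bootstrap_with_oob(n_samples, rng):
    """Draw n_samples indices with replacement (in-bag) and return them
    together with the out-of-bag indices that were never drawn."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob

def random_feature_subset(n_features, m, rng):
    """Random subset of m descriptor indices considered at one split."""
    return rng.sample(range(n_features), m)

def forest_vote(tree_predictions):
    """Class receiving the most votes over all trees in the forest."""
    return Counter(tree_predictions).most_common(1)[0][0]
```

For each tree, roughly a third of the molecules end up out of bag (the expected fraction $(1 - 1/n)^n$ approaches $1/e \approx 36.8\%$ as n grows), which is what makes the internal OOB error estimate possible.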

Moving average analysis (MAA)

Single-MD-based models are developed on the basis of a moving average analysis (MAA) of correctly predicted compounds [50, 72]. For the selection and evaluation of range-specific features, exclusive activity ranges are identified from the frequency distribution of the response level, and the active range is subsequently located by maximizing the moving average of active compounds (<35% = inactive, 35–65% = transitional, >65% = active) [50, 71]. For the MAA-based models, compounds with reported IC50 values ≤10 nM were considered active, while those with IC50 values >10 nM were treated as inactive. The HIV integrase inhibitory activity assigned to each compound was subsequently compared with the reported biological activity [40]. The average IC50 (nM) value for each range was also calculated. The cytotoxic concentration reported for the 50 analogs is the concentration required to reduce the viability of infected cells by 50% (CC50). The selectivity index (SI) is defined as the quotient of CC50 and IC50 (SI = CC50/IC50). SI values for all analogs were calculated from the reported CC50 and IC50 values, and the average SI was calculated for each range of the proposed models. SI is an indirect measure of the therapeutic index and hence an indication of safety.
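A minimal sketch of the range bookkeeping described above, assuming each compound is given as a (descriptor value, IC50, CC50) tuple with IC50 and CC50 in the same units (nM), so that SI is dimensionless. The sliding window over descriptor-sorted compounds is one plausible reading of the moving average; the exact window and range-search procedure of [50, 72] are not reproduced. The 10 nM cut-off and the 35%/65% thresholds come from the text, while the function names and window size are hypothetical.

```python
def is_active(ic50_nm, cutoff_nm=10.0):
    """Active if the reported IC50 is at most 10 nM."""
    return ic50_nm <= cutoff_nm

def label_range(percent_active):
    """Map the percentage of actives in a range to a range label."""
    if percent_active > 65.0:
        return "active"
    if percent_active >= 35.0:
        return "transitional"
    return "inactive"

def moving_active_percentage(compounds, window=5):
    """Percentage of actives in a sliding window of compounds sorted by
    descriptor value; returns (first value, last value, % active) tuples."""
    ordered = sorted(compounds, key=lambda c: c[0])
    flags = [is_active(ic50) for _, ic50, _ in ordered]
    return [
        (ordered[i][0], ordered[i + window - 1][0], 100.0 * sum(flags[i:i + window]) / window)
        for i in range(len(ordered) - window + 1)
    ]

def summarize_range(compounds, low, high):
    """Label a descriptor range [low, high] and report % active,
    mean IC50 and mean SI (SI = CC50 / IC50) of the compounds in it."""
    inside = [c for c in compounds if low <= c[0] <= high]
    if not inside:
        raise ValueError("no compounds in the requested descriptor range")
    n = len(inside)
    percent = 100.0 * sum(is_active(ic50) for _, ic50, _ in inside) / n
    mean_ic50 = sum(ic50 for _, ic50, _ in inside) / n
    mean_si = sum(cc50 / ic50 for _, ic50, cc50 in inside) / n
    return label_range(percent), percent, mean_ic50, mean_si
```

With made-up data such as `[(0.1, 5.0, 5000.0), (0.2, 8.0, 4000.0), (0.3, 2.0, 1000.0)]`, the range 0.0 to 0.5 contains only actives and is labelled "active" with a mean SI of about 667.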

Correlation techniques

Multiple linear regression (MLR)

MLR is also commonly referred to as the linear free-energy relationship (LFER). It extends simple regression analysis to more than one independent variable [73]. MLR generates a QSAR equation through standard multivariable regression calculations, which identify the dependence of a drug property on any or all of the descriptors under investigation [74].
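MLR can be sketched with NumPy's least-squares solver: an intercept column is prepended to the descriptor matrix and the coefficients are obtained with `numpy.linalg.lstsq`. This is a generic sketch rather than the MDS V-life implementation, and the function names are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bk*xk.

    X: (n, k) descriptor matrix; y: (n,) activities such as pIC50.
    Returns the coefficient vector [b0, b1, ..., bk].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predicted activities for descriptor matrix X."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

def r_squared(y, y_fit):
    """Squared correlation coefficient of a fit: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```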

Partial least squares (PLS)

PLS is an iterative regression procedure that derives its solution from a linear transformation of a large number of original MDs into a small number of new orthogonal terms called latent variables. PLS gives a statistically robust solution even when the independent variables are highly intercorrelated or when the number of independent variables exceeds the number of observations [75].
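A compact PLS1 (single-response) sketch using the NIPALS algorithm in NumPy: each latent variable is a weighted combination of the mean-centred descriptors, and X and y are deflated after each component. This is the textbook algorithm, not necessarily the variant used by MDS V-life, and the names are hypothetical. With as many components as linearly independent descriptors, PLS1 reproduces the ordinary least-squares fit, which is a convenient sanity check.

```python
import numpy as np

def fit_pls1(X, y, n_components):
    """PLS1 regression by NIPALS on mean-centred data.

    Returns (x_mean, y_mean, B) such that y_hat = y_mean + (X - x_mean) @ B.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # residual X and y, deflated below
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                        # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # y already fully explained
            break
        w = w / norm
        t = E @ w                          # latent-variable scores
        tt = t @ t
        p = E.T @ t / tt                   # X loadings
        q = (f @ t) / tt                   # y loading
        E = E - np.outer(t, p)             # deflate X
        f = f - q * t                      # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)    # coefficients in descriptor space
    return x_mean, y_mean, B

def predict_pls1(model, X):
    """Predicted activities from a fitted PLS1 model."""
    x_mean, y_mean, B = model
    return y_mean + (np.asarray(X, dtype=float) - x_mean) @ B
```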

Principal component regression (PCR)

Principal component regression (PCR) is an alternative to MLR when the explanatory variables are correlated. It is based on principal component analysis (PCA), another data reduction technique, which generates a new set of orthogonal descriptors, referred to as principal components (PCs), that describe most of the information contained in the independent variables in order of decreasing variance. PCA thus reduces the dimensionality of a multivariate descriptor dataset to a smaller number of components that retain most of its variance. When PCs are used as the independent variables in a linear regression, the method is termed PCR [76].
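PCR can be sketched with a singular value decomposition of the mean-centred descriptor matrix: the leading right singular vectors are the principal components, the response is regressed on the component scores, and the coefficients are mapped back to descriptor space. This is a minimal NumPy sketch with hypothetical names; keeping all components recovers ordinary least squares.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Regress centred y on the first n_components PC scores of centred X.

    Returns (x_mean, y_mean, B, explained), where explained holds the
    fraction of descriptor variance captured by each retained PC.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    scores = Xc @ V                        # PC scores used as regressors
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    B = V @ gamma                          # map back to descriptor space
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return x_mean, y_mean, B, explained

def predict_pcr(model, X):
    """Predicted activities from a fitted PCR model."""
    x_mean, y_mean, B, _ = model
    return y_mean + (np.asarray(X, dtype=float) - x_mean) @ B
```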

In the present study, the dataset was divided into training and test sets by random selection for the MLR, PLS, and PCR methods, with pIC50 as the dependent variable and various descriptors as independent variables. These models were generated using a training set of 36 molecules, and their predictive power was evaluated with a test set of 16 molecules with uniformly distributed biological activities.

Data analysis and validation

For the classification models, sensitivity and specificity were calculated. Sensitivity is defined [64] as the ratio of the number of true positive compounds to the sum of true positive and false negative compounds:

$$\text{Sensitivity} = \frac{P_T}{P_T + N_F} \qquad (4)$$

where $P_T$ is the number of compounds correctly predicted as active and $N_F$ is the number of compounds incorrectly predicted as inactive [64].

Specificity is defined [64] as the ratio of the number of true negative compounds to the sum of true negative and false positive compounds:

$$\text{Specificity} = \frac{N_T}{N_T + P_F} \qquad (5)$$

where $N_T$ is the number of compounds correctly predicted as inactive and $P_F$ is the number of compounds incorrectly predicted as active [64]. Sensitivity and specificity assess the goodness of the models: they represent the classification accuracies for the active and inactive compounds, respectively [64, 65]. The extent to which each model departs from random classification was also assessed by calculating Matthews correlation coefficient (MCC) [64]. The intercorrelation [66] between the E-state contribution index (SssOE), the augmented eccentric connectivity topochemical index (AjCC), the molecular connectivity topochemical index (xA), and the connective eccentricity topochemical index (CjC) was also investigated. Results are summarized in Tables 2–5 and Figs. 2 and 3.
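Equations (4) and (5), together with overall accuracy and MCC, follow directly from the confusion-matrix counts. The sketch below uses the standard MCC formula, $(TP \cdot TN - FP \cdot FN)/\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}$, and assumes the Pearson correlation coefficient as the intercorrelation measure between descriptors, since the text does not state which measure was used. The function names and example counts are hypothetical.

```python
import math

def confusion_counts(observed, predicted, active="A"):
    """Count TP, TN, FP, FN with the active class as the positive class."""
    pairs = list(zip(observed, predicted))
    tp = sum(o == active and p == active for o, p in pairs)
    tn = sum(o != active and p != active for o, p in pairs)
    fp = sum(o != active and p == active for o, p in pairs)
    fn = sum(o == active and p != active for o, p in pairs)
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Sensitivity (Eq. 4), specificity (Eq. 5), overall accuracy and MCC."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc

def pearson_r(a, b):
    """Pearson correlation between two descriptor columns (assumed measure)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

For example, illustrative counts of 14 true positives, 36 true negatives, 1 false positive and 1 false negative (not results from this study) give a sensitivity of 14/15, a specificity of 36/37 and an MCC of 503/555 ≈ 0.906.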

The following statistical measures were used to relate biological activity to the MDs in the correlation models: n, number of molecules; k, number of descriptors in a model; df, degrees of freedom; r², squared correlation coefficient; q², cross-validated r²; pred_r2, r² for the external test set; pred_r2se, coefficient of correlation of the predicted dataset; Z-score, Z-score calculated by the randomization test; best_ran_r2 and best_ran_q2, highest r² and q² values in the randomization test; α, statistical significance parameter obtained by the randomization test.

Validation was done to study the internal stability and predictive ability of the correlation models. Internal validation was carried out using the leave-one-out (LOO) method, whose cross-validated q² describes the internal stability of a model [77]. For external validation, the activity of each molecule in the test set was predicted using the model developed from the training set [78]; the pred_r2 value indicates the predictive power of the model for the external test set. To evaluate the statistical significance of the QSAR model for the actual dataset, one-tailed hypothesis testing was used [79, 80]. The robustness of the models for the training sets was analyzed by comparing them with models derived for random datasets.

Random sets were generated by rearranging the activities of the molecules in the training set. Statistical models were derived with the selected descriptors using these randomly rearranged activities (random sets), and the corresponding q² values were calculated. The significance of the models thus obtained was assessed from a calculated Z-score [79, 80]. If the Z-score is less than 4.0, the probability (α) of significance of the randomization test is calculated by the formula given in the literature [80]; otherwise it is derived by comparing the Z-score with the critical Z-score value reported in ref. [81]. For example, a Z-score greater than 3.10 indicates a probability (α) of less than 0.001 that the QSAR model constructed for the real dataset arose by chance. The randomization test suggests that, for each of the developed models, the probability that it arose by chance is less than 1%.
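The validation statistics can be sketched in NumPy for an MLR model: LOO q² as 1 − PRESS/SS_total, pred_r2 for an external test set, and a Y-randomization Z-score computed as (q²_real − mean q²_random)/SD(q²_random). Assumptions: pred_r2 measures deviations from the training-set mean, the random sets are permutations of the training activities, and the number of permutations and the seed are arbitrary; the α calculation of [80, 81] is not reproduced. The function names are hypothetical.

```python
import numpy as np

def ols_fit_predict(X_train, y_train, X_new):
    """Fit an MLR model with intercept and predict activities for X_new."""
    design = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def q2_loo(X, y):
    """Leave-one-out cross-validated r2: 1 - PRESS / SS_total."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # leave molecule i out
        y_hat = ols_fit_predict(X[keep], y[keep], X[i:i + 1])[0]
        press += (y[i] - y_hat) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def pred_r2(y_test, y_pred, y_train_mean):
    """External-set r2, with deviations taken from the training-set mean."""
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_test - y_pred) ** 2)
    ss_ref = np.sum((y_test - y_train_mean) ** 2)
    return 1.0 - ss_res / ss_ref

def randomization_z_score(X, y, n_random=100, seed=0):
    """Y-randomization: compare the real q2 with q2 of models fitted to
    permuted activities. Returns (q2_real, best_ran_q2, z_score)."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    q2_real = q2_loo(X, y)
    q2_random = np.array([q2_loo(X, rng.permutation(y)) for _ in range(n_random)])
    z = (q2_real - q2_random.mean()) / q2_random.std(ddof=1)
    return q2_real, q2_random.max(), z
```

A Z-score well above 3.10 in such a sketch corresponds to the situation described above, where chance correlation is an unlikely explanation of the real model's q².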

The authors have declared no conflict of interest.

References

[1] E. B. De Melo, M. Miguel, C. Ferreira, Eur. J. Med. Chem. 2009, 44, 3577–3583.

[2] E. De Clercq, Nat. Rev. 2007, 6, 1001–1018.

[3] J. Stebbing, S. Patterson, F. Gotch, Cell Res. 2003, 13, 1–7.

[4] A. Pendri, T. L. Troyer, M. J. Sofia, M. A. Walker, B. N. Naidu, J. Banville, N. A. Meanwell, I. Dicker, Z. Lin, M. Krystal, S. W. Gerritz, J. Comb. Chem. 2010, 12, 84–90.

[5] Y. Pommier, A. A. Johnson, C. Marchand, Nat. Rev. 2005, 4, 236–248.

[6] T. T. Charvat, D. J. Lee, W. E. Robinson, A. R. Chamberlin, Bioorg. Med. Chem. 2006, 14, 4552–4567.

[7] P. O. Brown, in Retroviruses (Eds.: J. M. Coffin, S. H. Hughes, H. E. Varmus), Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 1997, pp. 161–203.

[8] R. Craigie, J. Biol. Chem. 2001, 276, 23213–23216.

[9] P. Pace, M. E. Di Francesco, C. Gardelli, S. Harper, E. Muraglia, E. Nizi, F. Orvieto, A. Petrocchi, M. Poma, M. Rowley, R. Scarpelli, R. Laufer, O. G. Paz, E. Monteagudo, F. Bonelli, D. Hazuda, K. A. Stillmock, V. Summa, J. Med. Chem. 2007, 50, 2225–2239.

[10] M. L. Barreca, L. D. Luca, N. Iraci, A. Chimirri, J. Med. Chem. 2006, 49, 3994–3997.

[11] M. Li, M. Mizuuchi, T. R. Burke, R. Craigie, EMBO J. 2006, 25, 1295–1304.

[12] Y. Aubert, M. Chassignol, V. Roig, G. Mbemba, J. Weiss, H. Meudal, J. F. Mouscadet, U. Asseline, Eur. J. Med. Chem. 2009, 44, 5029–5044.

[13] S. Hare, S. S. Gupta, E. Valkov, A. Engelman, P. Cherepanov, Nature 2010, 464, 232–236.

[14] J. A. Grobler, K. Stillmock, B. Hu, M. Witmer, P. Felock, A. S. Espeseth, A. Wolfe, M. Egbertson, M. Bourgeois, J. Melamed, J. S. Wai, S. Young, J. Vacca, D. J. Hazuda, Proc. Natl. Acad. Sci. USA 2002, 99, 6661–6666.

[15] L. Zhuang, J. S. Wai, M. W. Embrey, T. E. Fisher, M. S. Egbertson, L. S. Payne, J. P. Guare, J. P. Vacca, D. J. Hazuda, P. J. Felock, A. L. Wolfe, K. A. Stillmock, M. V. Witmer, G. Moyer, W. A. Schleif, L. J. Gabryelski, Y. M. Leonard, J. J. Lynch, S. R. Michelson, S. D. Young, J. Med. Chem. 2003, 46, 453–456.

[16] D. J. Hazuda, N. J. Anthony, R. P. Gomez, S. M. Jolly, J. S. Wai, L. Zhuang, T. E. Fisher, M. Embrey, J. P. Guare, M. S. Egbertson, J. P. Vacca, J. R. Huff, P. J. Felock, M. V. Witmer, K. A. Stillmock, R. Danovich, J. Grobler, M. D. Miller, A. S. Espeseth, L. Jin, I.-W. Chen, K. Kassahun, J. D. Ellis, B. K. Wong, W. Xu, P. G. Pearson, W. A. Schleif, R. Cortese, E. Emini, V. Summa, M. K. Holloway, S. D. Young, Proc. Natl. Acad. Sci. USA 2004, 101, 11233–11238.

[17] V. Summa, A. Petrocchi, V. G. Matassa, C. Gardelli, E. Muraglia, M. Rowley, O. G. Paz, R. Laufer, E. Monteagudo, P. Pace, J. Med. Chem. 2006, 49, 6646–6649.

[18] H. Jin, R. Z. Cai, L. Schacherer, S. Jabri, M. Tsiang, M. Fardis, X. Chen, J. M. Chen, C. U. Kim, Bioorg. Med. Chem. Lett. 2006, 16, 3989–3992.

[19] S. G. Deeks, S. Kar, S. I. Gubernick, P. Kirkpatrick, Nat. Rev. 2008, 7, 117–118.

[20] M. Sato, T. Motomura, H. Aramaki, T. Matsuda, M. Yamashita, Y. Ito, H. Kawakami, Y. Matsuzaki, W. Watanabe, K. Yamataka, S. Ikeda, E. Kodama, M. Matsuoka, H. Shinkai, J. Med. Chem. 2006, 49, 1506–1508.

[21] S. Pasquini, C. Mugnaini, C. Tintori, M. Botta, A. Trejos, I. K. Arvela, M. Larhed, M. Witvrouw, M. Michiels, F. Christ, Z. Debyser, F. Corelli, J. Med. Chem. 2008, 51, 5125–5129.

[22] A. Billich, Curr. Opin. Invest. Drugs 2003, 4, 206–209.

[23] M. Markowitz, J. O. Morales-Ramirez, B. Y. Nguyen, C. M. Kovacs, R. T. Steigbigel, D. A. Cooper, R. Liporace, R. Schwartz, R. Isaacs, L. R. Gilde, L. Wenning, J. Zhao, H. Teppler, J. Acquir. Immune Defic. Syndr. 2006, 43, 509–515.

[24] E. DeJesus, D. Berger, M. Markowitz, C. Cohen, T. Hawkins, P. Ruane, R. Elion, C. Farthing, L. Zhong, A. K. Cheng, D. McColl, B. P. Kearney, J. Acquir. Immune Defic. Syndr. 2006, 43, 1–5.

[25] E. P. Garvey, B. A. Johns, M. J. Gartland, S. A. Foster, W. H. Miller, R. G. Ferris, R. J. Hazen, M. R. Underwood, E. E. Boros, J. B. Thompson, J. G. Weatherhead, C. S. Koble, S. H. Allen, L. T. Schaller, R. G. Sherrill, T. Yoshinaga, M. Kobayoshi, C. Wakasa-Morimoto, S. Miki, K. Nakahara, T. Noshi, A. Sato, T. Fujiwara, Antimicrob. Agents Chemother. 2008, 52, 901–908.

[26] E. P. Garvey, B. Schwartz, M. J. Gartland, S. Lang, W. Halsey, G. Sathe, H. L. Carter, K. L. Weaver, Biochemistry 2009, 48, 1644–1653.

[27] E. E. Boros, C. E. Edwards, S. A. Foster, M. Fuji, T. Fujiwara, E. P. Garvey, P. L. Golden, R. J. Hazen, J. L. Jeffrey, B. A. Johns, T. Kawasuji, R. Kiyama, C. S. Koble, N. Kurose, W. H. Miller, A. L. Mote, H. Murai, A. Sato, J. Thompson, M. C. Woodward, T. Yoshinaga, J. Med. Chem. 2009, 52, 2754–2761.

[28] M. Grover, B. Singh, M. Bakshi, S. Singh, Pharm. Sci. Technol. Today 2000, 3, 28–35.

[29] C. Hansch, A. Leo, S. B. Mekapati, A. Kurup, Bioorg. Med. Chem. 2004, 12, 3391–3400.

[30] F. A. Pasha, H. K. Srivastava, A. Srivastava, P. P. Singh, QSAR Comb. Sci. 2007, 26, 69–84, DOI: 10.1002/qsar.200630010.

[31] M. Zhao, Z. Li, Y. Wu, Y. R. Tang, C. Wang, Z. Zhang, S. Peng, Eur. J. Med. Chem. 2007, 42, 955–965.



[32] H. K. Srivastava, M. Chourasia, D. Kumar, G. N. Sastry, J. Chem. Inf. Model. 2011, 51, 558–571.

[33] H. K. Srivastava, Bull. Kor. Chem. Soc. 2009, 30, 67–76.

[34] P. Srivani, G. N. Sastry, J. Mol. Graph. Mod. 2009, 27, 676–688.

[35] P. Srivani, D. Usharani, E. D. Jemmis, G. N. Sastry, Curr. Pharm. Des. 2008, 14, 3854–3872.

[36] H. K. Srivastava, F. A. Pasha, S. K. Mishra, P. P. Singh, Med. Chem. Res. 2009, 18, 455–466.

[37] M. H. Bohari, H. K. Srivastava, G. N. Sastry, Org. Med. Chem. Lett. 2011, 1, 1–12.

[38] C. Hansch, T. Fujita, in Classical and Three-Dimensional QSAR in Agrochemistry (Eds.: C. Hansch, T. Fujita), Symposium Series 606, American Chemical Society, Washington, DC 1995, pp. 1–12.

[39] O. Ivanciuc, S. L. Taraviras, J. Chem. Inf. Comput. Sci. 2000, 40, 126–134.

[40] M. Sato, H. Kawakami, T. Motomura, H. Aramaki, T. Matsuda, M. Yamashita, Y. Ito, Y. Matsuzaki, K. Yamataka, S. Ikeda, H. Shinkai, J. Med. Chem. 2009, 52, 4869–4882.

[41] A. Goel, A. K. Madan, J. Chem. Inf. Comput. Sci. 1995, 35, 510–514.

[42] H. Dureja, A. K. Madan, J. Mol. Mod. 2005, 11, 525–531.

[43] S. Gupta, M. Singh, A. K. Madan, Indian J. Chem. 2003, 42A, 1414–1425.

[44] A. K. Madan, H. Dureja, in Novel Molecular Structure Descriptors: Theory and Applications II (Eds.: I. Gutman, B. Furtula), Eccentricity Based Descriptors for QSAR/QSPR, Mathematical Chemistry Monographs, No. 9, University of Kragujevac, Kragujevac, Serbia 2010, pp. 91–138.

[45] V. Kumar, S. Sardana, A. K. Madan, J. Mol. Mod. 2004, 10, 399–407.

[46] S. Bajaj, S. S. Sambhi, A. K. Madan, Croat. Chem. Acta 2005, 78, 165–174.

[47] S. Bajaj, S. S. Sambhi, A. K. Madan, J. Mol. Struct. (Theochem) 2004, 684, 197–203.

[48] H. Dureja, S. Gupta, A. K. Madan, J. Mol. Graph. Mod. 2008, 26, 1020–1029.

[49] M. Randic, J. Am. Chem. Soc. 1975, 97, 6609–6615.

[50] S. Gupta, M. Singh, A. K. Madan, J. Comput. Aided Mol. Des. 2001, 15, 671–678.

[51] S. Bajaj, S. S. Sambhi, A. K. Madan, QSAR Comb. Sci. 2006, 25, 813–823.

[52] V. Sharma, R. Goswami, A. K. Madan, J. Chem. Inf. Comput. Sci. 1997, 37, 273–282.

[53] S. Gupta, M. Singh, A. K. Madan, J. Mol. Graph. Mod. 2000, 18, 18–25.

[54] I. Gutman, B. Ruscic, N. Trinajstic, C. F. Wilcox, J. Chem. Phys. 1975, 62, 3399–3405.

[55] I. Gutman, M. Randic, Chem. Phys. Lett. 1977, 47, 15–19.

[56] H. Wiener, J. Am. Chem. Soc. 1947, 69, 2636–2638.

[57] H. Wiener, J. Chem. Phys. 1947, 15, 766.

[58] A. T. Balaban, Pure Appl. Chem. 1983, 55, 199–206.

[59] S. Gupta, M. Singh, A. K. Madan, J. Chem. Inf. Comput. Sci. 1999, 39, 272–277.

[60] L. H. Hall, L. B. Kier, J. Chem. Inf. Comput. Sci. 1995, 35, 502–514.

[61] L. H. Hall, B. K. Mohney, L. B. Kier, J. Chem. Inf. Comput. Sci. 1991, 31, 76–82.

[62] K. Baumann, J. Chem. Inf. Comput. Sci. 2002, 42, 26–35.

[63] R. Wang, Y. Fu, L. Lai, J. Chem. Inf. Comput. Sci. 1997, 37, 615–621.

[64] L. Han, Y. Wang, S. H. Bryant, BMC Bioinformatics 2008, 9, 401–408.

[65] C. Lamanna, M. Bellini, A. Padova, G. Westerberg, L. Maccari, J. Med. Chem. 2008, 51, 2891–2897.

[66] S. Nikolic, G. Kovacevic, A. Milicevic, N. Trinajstic, Croat. Chem. Acta 2003, 76, 113–124.

[67] H. Kim, G. J. Koehler, Omega Int. J. Manage. Sci. 1995, 23, 637–652.

[68] A. J. Myles, R. N. Feudale, Y. Liu, N. A. Woody, S. D. Brown, J. Chemomet. 2004, 18, 275–285.

[69] L. Breiman, Mach. Learn. 2001, 45, 5–32.

[70] A. Merzlikine, Y. A. Abramov, S. J. Kowsz, V. H. Thomas, T. Mano, Int. J. Pharm. 2011, 418, 207–216, DOI: 10.1016/j.ijpharm.2011.03.065.

[71] A. Liaw, M. Wiener, 'Classification and regression by randomForest', R News 2002, 2, 18–22.

[72] H. Dureja, S. Gupta, A. K. Madan, Sci. Pharm. 2008, 76, 377–394.

[73] R. A. Berk, in Regression Analysis: A Constructive Critique (Ed.: R. A. Berk), The Formalities of Multiple Regression, SAGE Publications Ltd, London 2003, pp. 103–110.

[74] T. W. Schultz, M. T. D. Cronin, J. D. Walker, A. O. Aptula, J. Mol. Struct. 2003, 622, 1–22.

[75] S. Wold, W. J. Dunn, J. Chem. Inf. Comput. Sci. 1998, 23, 6–23.

[76] G. H. Dunteman, in Principal Component Analysis (Ed.: D. H. Dunteman), Basic Concepts of Principal Component Analysis, SAGE Publications Ltd, London 1989, pp. 15–22.

[77] W. Zheng, A. Tropsha, J. Chem. Inf. Comput. Sci. 2000, 40, 185–194.

[78] P. Gedeck, B. Rohde, C. Bartels, J. Chem. Inf. Comput. Sci. 2006, 46, 1924–1936.

[79] N. Gilbert, in Statistics (Ed.: N. Gilbert), W. B. Saunders Co., Philadelphia, PA 1976.

[80] A. Golbraikh, A. Tropsha, J. Chem. Inf. Comput. Sci. 2003, 43, 144–154.

[81] M. Shen, Y. Xiao, A. Golbraikh, V. K. Gombar, A. Tropsha, J. Med. Chem. 2003, 46, 3013–3020.


