Full Paper
Diverse Models for the Prediction of HIV Integrase Inhibitory Activity of Substituted Quinolone Carboxylic Acids
Monika Gupta1 and Anil Kumar Madan2
1 Faculty of Pharmaceutical Sciences, M. D. University, Rohtak, India
2 Faculty of Pharmaceutical Sciences, Pt. B. D. Sharma University of Health Sciences, Rohtak, India
In the present study both classification and correlation techniques of diverse nature were successfully
employed for the development of models for the prediction of human immunodeficiency virus (HIV)
integrase inhibitory activity using a dataset comprising 50 analogs of quinolone carboxylic acid. The
values of various molecular descriptors (MDs) for each analog in the dataset were computed using
the MDS V-life science QSAR plus module. The values of other MDs which are not part of MDS V-life
science were computed using an in-house computer program. A decision tree (DT) was constructed for
the HIV integrase inhibitory activity to determine the importance of MDs. The DT learned the
information from the input data with an accuracy of 98% and correctly predicted the cross-
validated (10 fold) data with an accuracy of 96%. Three MDs, E-state contribution descriptor
(SssOHE), molecular connectivity topochemical index (xA), and eccentric connectivity
topochemical index (jCC), were used to develop the models using moving average analysis (MAA).
The accuracy of classification of single descriptor based models using MAA was found to vary from a
minimum of 96% to a maximum of 98%. The statistical significance of the models was assessed
through specificity, sensitivity, overall accuracy, Matthews correlation coefficient, and
intercorrelation analysis. Widely used methods such as multiple linear regression, partial least
squares, and principal component regression were employed for the development of correlation
models. The models were generated on a training set of 36 molecules. The models had a
correlation coefficient (r2) of 0.86 to 0.92, a significant cross-validated correlation coefficient (q2) of
0.79 to 0.85, an F-test value from 63.2 to 93.06, r2 for the external test set (pred_r2) of 0.69, a coefficient of
correlation of the predicted dataset (pred_r2Se) of 0.77, and degrees of freedom from 27 to 30. Alignment
independent descriptors, SsOHE-index, SaaCHE index, SssCH2, and x log P were found to be the most
important descriptors for the development of correlation models for the prediction of HIV integrase
inhibitory activity.
Keywords: Eccentric connectivity topochemical index / E-state contribution indices / HIV integrase inhibitors / Molecular
connectivity index / Quinolone carboxylic acid
Received: October 24, 2011; Revised: July 4, 2012; Accepted: July 11, 2012
DOI 10.1002/ardp.201100316
Introduction
Human immunodeficiency virus (HIV), a retrovirus, is the
primary cause of acquired immunodeficiency syndrome
(AIDS), and one of the main medical and social problems
nowadays [1]. An estimated 36 million people worldwide are
currently living with HIV and some 20 million people
have already died, giving a cumulative total
of 56 million HIV infections. However, there is still no
known cure or vaccination against it [2].
Human immunodeficiency virus infection is characterized
by the relentless decline in both the number and function of
HIV-specific CD4+ and T helper cells, which are preferentially
infected [3]. Although highly active antiretroviral therapy
(HAART) regimens combining three or more reverse transcriptase (RT) or protease
inhibitors have proven to effectively suppress uncontrolled
viral replication, drug resistance and drug-induced side
effects can hinder complete viral suppression. It thus follows
that the identification of an inhibitor of HIV integrase (IN) would add a
valuable third component to the antiviral armamentarium
[4]. HIV IN is an attractive target because there is no known
homolog in human cells, potentially minimizing the side
effects associated with other antiviral agents [5, 6].
Correspondence: Prof. Anil Kumar Madan, Faculty of Pharmaceutical Sciences, Pt. B. D. Sharma University of Health Sciences, Rohtak 124 001, India.
E-mail: [email protected]: +91-1262-213202
Arch. Pharm. Chem. Life Sci. 2012, 000, 1–12
© 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
Integration of retroviral DNA into the genome of the host
cell is an essential step in the viral replication cycle [7, 8]. It is
a multistep process consisting of three biochemical steps:
(i) assembly of a complex with specific DNA sequences at the
end of HIV-1 long terminal repeat (LTR) regions; (ii) endo-
nucleolytic processing of the viral DNA which is initially
blunt-ended and is processed by the viral integrase protein
to remove the terminal dinucleotide from each 3′ end. This
3′ end-processing step exposes the conserved CA dinucleotides
that ultimately define the ends of the integrated viral
DNA; (iii) strand transfer, in which the viral DNA 3′ ends
are covalently linked to the cellular DNA [9]. Integration
occurs by a pair of transesterification reactions, catalyzed
by integrase, in which the 3′ hydroxyl groups of the processed
viral DNA ends attack a pair of phosphodiester bonds in
the target DNA.
In the case of HIV-1, the sites of attack on the two target
DNA strands are separated by 5 bp. The 3′ ends of the viral
DNA are joined to target DNA, whereas the 5′ ends of the
viral DNA are unjoined in the resulting integration intermediate.
In the “3′-processing” reaction, IN removes the terminal
GT dinucleotides from each LTR 3′ end, while in the
“strand transfer” reaction the two newly processed 3′-viral
DNA ends are joined to the cellular DNA [10]. Repair of the
intermediate by cellular enzymes completes the integration
process [11]. Among the different classes of integrase inhibitors,
two classes, viz. dinucleotide analogs [12] and strand transfer
inhibitors, play a major role. The binding of strand-transfer
inhibitors displaces the reactive viral DNA end
from the active site, disarming the viral nucleoprotein complex
[13].
Several classes of HIV integrase inhibitors have been
reported in the literature. These include diketo acids [14],
naphthyridine ketones [15], naphthyridine carboxamides
[16, 17], and tri-cyclic pyrroloquinolines [18]. In spite of
numerous efforts for the development of integrase inhibitors,
only the diketo acid class of compounds could be developed
to an advanced stage. The diketo acid moiety (γ-ketone,
enolizable α-ketone, and carboxylic acid) was believed to be
essential for the inhibitory activity of this series of integrase
inhibitors [14]. In 2007, raltegravir (Isentress; Merck), a small-
molecule drug that inhibits HIV-1 integrase, became the first
drug in the diketo acid class to be approved by the US FDA
for the treatment of HIV-1 infection in combination with
other antiretroviral agents [19].
In order to identify new IN inhibitors, Sato and co-workers
[20] reported that the core structure of 4-quinolone-3-carbox-
ylic acid antibiotics can be used as an alternative scaffold to
diketo acids. These novel quinolone IN inhibitors were struc-
turally optimized into the highly potent compound GS-9137
(elvitegravir), exhibiting potent inhibitory activity against
IN-catalyzed DNA strand transfer and antiviral activity
in vitro [20]. Pasquini and co-workers [21] further explored
the effect of chemical modifications on the 4-quinolone-3-
carboxylic acid scaffold, with the aim of gaining new insight
into the fundamental structural requirements for integrase
inhibitory activity.
The global research effort to identify drugs that inhibit
HIV integrase has spanned approximately 1.5 decades and
recently led to human trials with several candidates, including
L-870,810 [22], GS-9137 [16], MK-0518 [23], elvitegravir
[24], and GSK364735 [25]. Four of these exploratory drugs
have yielded 2-log or greater decreases in viral loads in phase
II studies, reflecting a profound clinical effect in blocking
this viral enzyme [26]. All of these molecules inhibit the ST
step of HIV-1 integrase and possess submicromolar antiviral
activity [27].
Structure-based design, spurred by significant pitfalls of
traditional methods and rapid advances in molecular structure
determination and computational resources, has now
been accepted as a rational approach for the generation of
new pharmaceuticals [28]. Quantitative structure–activity/
property relationships (QSAR/QSPR) offer the possibility for
screening a large number of chemicals in a short time and
with low cost. QSAR/QSPR constitute vital tools used in drug
design and have gained extensive recognition in the corre-
lation and prediction of various properties like ADME [29],
toxicity [30], retention time [31], stability [32], and physico-
chemical properties [33] other than the biological activity
[34–37]. QSAR/QSPR establishes a statistical relationship
between biological activity of the chemicals of interest
and their structural properties [38]. Thousands of descriptors
from five classes are used in QSAR/QSPR studies: constitutional,
graph theoretical and topological indices, geometrical, electrostatic,
and quantum descriptors. However, descriptors that
are simple, easy to compute, and non-correlated have found
extensive use in QSAR/QSPR studies [39].
In the present study diverse classification and correlation
techniques have been successfully employed for the development
of models for the prediction of HIV-1 integrase inhibitory
activity of quinolone carboxylic acids [40].
Results and discussion
In the present study, a decision tree (DT) was built from a set
of 22 molecular descriptors (MDs) listed in Table 1 [41–63].
The descriptor at the root node is most important and the
importance of descriptors decreases with increasing depth in the
tree. The classification of various quinolone carboxylic
acid analogs (Fig. 1 and Table 2) as inactive and active using a
single tree, based on the molecular connectivity topochemical index (A1), is
shown in Fig. 2. The DT identified the molecular connectivity
topochemical index (A1) as the most important index. The DT classified the
analogs with an accuracy of 98%. The specificity and sensi-
tivity of the training set was found to be of the order of 97.4
and 100%, respectively (Table 3). In 10-fold cross-validation,
96% of quinolone carboxylic acid analogs were correctly
classified with regard to biological activity. The specificity
and sensitivity of the cross validated set was found to be 97.4
and 91%, respectively, and the value of the Matthews correlation coefficient
(MCC) for the training and cross-validated sets was found
to be 0.94 and 0.8, respectively (Table 3). MCC takes into
consideration both the sensitivity and specificity and is
generally used as a balanced measure when dealing with
imbalanced data. The values of MCC range from −1
to +1 and indicate the potential of the model. High values
of MCC indicate robustness of the proposed models
[64, 65].
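As an illustration of how these statistics follow from a confusion matrix, the minimal sketch below recomputes them for the decision tree training set, assuming the counts are read from Table 3 as 11 true positives, 0 false negatives, 1 false positive, and 38 true negatives. The function name `confusion_metrics` is hypothetical and is not part of the software used in the study.

```python
import math


def confusion_metrics(tp, fn, fp, tn):
    """Summarise a 2x2 confusion matrix (active = positive class)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty; use 0.0 then.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Decision tree, training set (counts assumed from Table 3).
training = confusion_metrics(tp=11, fn=0, fp=1, tn=38)
for name, value in training.items():
    print(f"{name}: {value:.3f}")
```

With these counts the sketch gives a sensitivity of 100%, a specificity of about 97.4%, an accuracy of 98%, and an MCC of about 0.94, in line with the values quoted above.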
The random forests (RFs) were grown with 22 MDs listed in
Table 1. The importance of a node was determined by the
mean decrease in accuracy. The RF classified quinolone car-
boxylic acid analogs as inactive and active with an accuracy
of 94% with respect to the HIV integrase inhibitory activity.
The out-of-bag (OOB) estimate of error was found to be only
3%. The specificity and sensitivity was of the order of 97.4 and
81.8%, respectively, and the value of MCC was found to be 0.8
(Table 3). Using a single descriptor at a time, three independent
moving average analysis (MAA) based models using the
E-state contribution index (SsOHE), the molecular connectivity
topochemical index (xA), and the eccentric connectivity topochemical
index (jCC) were developed. The proposed models are illustrated
in Table 4. The overall accuracy of prediction varied
from 96% for the eccentric connectivity topochemical index
(jCC) to 98% for SsOHE. Transitional ranges were observed in
all the models indicating a gradual change in HIV integrase
inhibitory activity. The average IC50 (Table 4 and Fig. 3) for
the active range in all the models varied from 6.7 to 7.23 nM.
The same ranges also showed potent overall
antiviral activity (Table 4 and Fig. 3). The
extremely low average IC50 values indicate the high
potency of the active ranges in the proposed models.
Moreover, the active ranges possessed average selectivity
index (SI) values from 970 to 1113.6 (Table 4 and Fig. 3).
The SI is defined as the quotient of CC50 and IC50 (CC50/IC50). The
values of SI for all the analogs were calculated from the
reported values of CC50 and IC50. The value of SI is an indirect
measure of therapeutic index and consequently an indica-
tion of safety. The average low value of IC50 and high
average value of SI of the active range not only indicates
Table 1. List of molecular descriptors employed for the study.
Code  Descriptor                                                References
A1    Molecular connectivity topochemical index                 [41, 42]
A2    Eccentric adjacency topochemical index                    [43]
A3    Augmented eccentric connectivity topochemical index       [44]
A4    Eccentric connectivity topochemical index                 [45]
A5    Connective eccentricity topochemical index                [44]
A6    Zagreb topochemical index, M2^c                           [46]
A7    Wiener's topochemical index                               [47]
A8    Superaugmented eccentric connectivity topochemical index  [48]
A9    Molecular connectivity index                              [49]
A10   Eccentric adjacency index                                 [50]
A11   Augmented eccentric connectivity index                    [51]
A12   Eccentric connectivity index                              [52]
A13   Connective eccentricity index                             [53]
A14   Zagreb index, M2                                          [54, 55]
A15   Wiener's index                                            [56, 57]
A16   Balaban mean square distance index                        [58]
A17   Superpendentic index                                      [59]
A18   SsOHE-index                                               [60, 61]
A19   SaaCHE-index                                              [60, 61]
A20   SssCH2E-index                                             [60, 61]
A21   T_T_C_2                                                   [62]
A22   x log P                                                   [63]
[Figure 1: structure drawings of scaffolds (A) to (F) are not reproduced here. Substituent notes from the drawings: scaffold (A), X = C for A1; X = N for all. Scaffold (C), X = Cl for all; X = F for C13.]
Figure 1. Basic structures and arbitrary atom numbering scheme
for the quinolone carboxylic acid [40].
Table 2. Relationship of molecular connectivity topochemical indices, eccentric connectivity topochemical index, and Balaban’s mean
square distance topochemical index with HIV integrase inhibitory activity.
                                                                              HIV integrase inhibitory activity (a)
                                                                              Predicted using MAA models    Reported [40]
Cpd no.  R/R1           SsOHE index  xA      jcc      IC50 (nM) [40]          SsOHE index  xA   jcc
A1       –COCOOH        17.79        9.783   421.312  50                      −            −    −    −
A2       –COCOOH/–CH3   8.915        11.098  500.183  100 000                 −            −    −    −
A3       –OH            9.078        9.876   407.99   1630                    −            −    −    −
A4       –OH/–CH3       9.209        10.249  428.855  2300                    −            −    −    −
A5       –OEt           7.482        10.988  490      100 000                 −            −    −    −
A6       NH2            7.498        9.907   406.33   100 000                 −            −    −    −
B1       3-Cl           9.108        9.96    542.822  100 000                 −            −    −    −
B2       2-Cl           9.113        9.96    506.531  800                     −            −    −    −
B3       1-Cl           9.12         9.993   472.115  410                     −            −    −    −
B4       1-F            9.065        10.174  433.097  500                     −            −    −    −
B5       1-Me           9.114        10.286  427.656  1080                    −            −    −    −
B6       1-OMe          9.13         10.671  465.75   1170                    −            −    −    −
B7       1-CF3          8.43         11.201  526.662  720                     −            −    −    −
B8       1,4-Cl2        9.162        10.078  544.471  370                     −            −    −    −
B9       2,4-Cl2        9.155        10.044  548.429  250                     −            −    −    −
B10      1,2-Cl2        9.155        10.144  544.471  70                      −            −    −    −
C1       –Me            9.286        10.518  569.579  83                      −            −    −    −
C2       –Et            9.381        11.055  594.829  31                      −            −    −    −
C3       –Pr            9.454        11.563  641.818  55                      −            −    −    −
C4       –iPr           9.468        11.431  620.079  26                      −            −    −    −
C5       –Bu            9.511        12.063  696.984  65                      −            −    −    −
C6       –CH2CO2H       18.5         11.775  686.439  32                      −            −    −    −
C7       –CH2CH2CO2H    18.42        12.27   744.271  38                      −            −    ±    −
C8       –CH2CONH2      9.404        11.808  684.26   35                      −            −    −    −
C9       –(CH2)2CONH2   9.472        12.303  741.926  116                     −            −    ±    −
C10      –(CH2)2NH2     9.423        11.516  648.338  215                     −            −    −    −
C11      –(CH2)2OH      18.67        11.475  654.819  21                      −            −    −    −
C12      –(CH2)3OH      18.59        11.973  710.318  77                      −            −    −    −
C13      –(CH2)2OH      18.58        11.646  603.923  44                      −            −    −    −
D1       8-F            18.57        11.797  676.374  24                      −            −    −    −
D2       5-F            18.61        11.773  676.374  84                      −            −    −    −
D3       9-F            18.62        11.776  678.957  25                      −            −    −    −
D4       5-OMe          18.61        12.27   697.06   34                      −            −    −    −
D5       5-Cl           18.84        11.593  687.316  12                      −            −    −    −
D6       5-Me           18.81        11.886  671.735  43                      −            −    −    −
D7       5-CF3          18.79        12.8    744.15   41                      −            −    ±    −
D8       5-CN           8.54         12.38   692.147  674                     −            −    −    −
E1       –H             18.78        11.635  640.381  50                      −            −    −    −
E2       –Me            18.92        12.053  665.631  14.8                    −            −    −    −
E3       –Et            18.92        12.595  692.881  9                       −            −    −    +
E4       –Pr            19.18        13.095  738.5    9                       +            +    ±    +
E5       iPr            19.37        12.97   720.131  8.2                     +            +    ±    +
E6       iBu            19.42        13.451  767.75   8.2                     +            +    +    +
E7       Cyclohexyl     19.54        14.632  882.877  6                       +            +    +    +
E8       Ph             19.97        14.632  882.877  5.6                     +            +    +    +
F1       5-OMe          19.84        12.43   682.623  9.8                     +            −    −    +
F2       –Pr            19.54        13.89   780.742  5.8                     +            +    +    +
F3       iPr            19.59        13.765  762.373  7.2                     +            +    +    +
F4       iBu            19.71        14.246  809.992  5.8                     +            +    +    +
F5       Cyclohexyl     20.15        15.427  926.094  6.7                     +            +    +    +
Note: a (+) active compound, (−) inactive compound, and (±) compound in transitional range.
high potency but also safety. Consequently, these models
offer vast potential for the development of potent HIV inte-
grase inhibitors.
The methodology used above aims at the development
of suitable models for providing lead molecules through
exploitation of the active ranges in the proposed models
based on MDs. The proposed models are unique and differ widely
from conventional QSAR models. Both systems of modeling
have their own advantages and limitations. In the present
case, the modeling system adopted has the distinct advantage
of identifying narrow active ranges, which may be erroneously
skipped during routine regression analysis in conventional
QSAR modeling [48]. Since the ultimate goal of
modeling is to provide lead structures, these active ranges
can naturally play a vital role in lead identification and
optimization.
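The range-based assignment behind the MAA models can be sketched as follows. This is only an illustration under stated assumptions: the range boundaries are those read from Table 4 (with the extraction-damaged comparison signs interpreted so that they agree with the assignments in Table 2), and the names `MAA_RANGES` and `classify` are hypothetical rather than taken from the in-house program.

```python
# (upper limit of the inactive range, lower limit of the active range), assumed from Table 4.
# When both limits coincide, the model has no transitional range.
MAA_RANGES = {
    "SsOHE": (19.18, 19.18),
    "xA": (12.97, 12.97),
    "jcc": (710.32, 762.38),
}


def classify(index_name, value):
    """Assign a compound to the active, transitional, or inactive range of one index."""
    inactive_upper, active_lower = MAA_RANGES[index_name]
    if value >= active_lower:
        return "active"
    if value > inactive_upper:
        return "transitional"
    return "inactive"


# Descriptor values of compound E4 as listed in Table 2.
compound_e4 = {"SsOHE": 19.18, "xA": 13.095, "jcc": 738.5}
for index_name, value in compound_e4.items():
    print(index_name, classify(index_name, value))
```

For compound E4 the sketch reproduces the assignments listed in Table 2: active for SsOHE and xA, and transitional for jcc.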
The intercorrelation between various MDs used in the
study was investigated. The degree of correlation can be
appraised by the correlation coefficient r. Pairs of MDs with
r ≥ 0.97 are normally considered highly intercorrelated,
those with 0.90 ≤ r ≤ 0.97 are appreciably correlated, those
with 0.50 ≤ r ≤ 0.89 are weakly correlated, and finally the
pairs of indices with r < 0.50 are not intercorrelated [66].
Intercorrelation analysis (Table 5) revealed that SsOHE
is weakly correlated with the eccentric connectivity topo-
chemical index (jCC) and the molecular connectivity index
(xA) while the eccentric connectivity topochemical index (jCC)
and the molecular connectivity index (xA) are appreciably
correlated with each other.
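The grading of intercorrelation described above can be written as a small helper. This is a sketch only: the function name `intercorrelation_class` is hypothetical, the absolute value of r is used as an assumption, and values falling between 0.89 and 0.90 are treated here as weakly correlated because the text leaves that gap unspecified.

```python
def intercorrelation_class(r):
    """Grade a pairwise correlation coefficient using the cut-offs quoted from ref. [66]."""
    r = abs(r)  # assumption: only the strength of the correlation matters
    if r >= 0.97:
        return "highly intercorrelated"
    if r >= 0.90:
        return "appreciably correlated"
    if r >= 0.50:
        return "weakly correlated"
    return "not intercorrelated"


# Pairwise coefficients as listed in Table 5.
table5 = {
    ("SssOHE index", "x"): 0.69,
    ("SssOHE index", "jc"): 0.69,
    ("x", "jc"): 0.93,
}
for pair, r in table5.items():
    print(pair, r, intercorrelation_class(r))
```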
After QSAR study by multiple linear regression (MLR) using
the forward-backward stepwise variable selection method,
the final equation developed and the statistical data
(Table 6) observed are illustrated below.
pIC50 = 0.1362 (SssOHE index) + 0.6149 (x log P) + 1.107
(T_T_C_2) − 0.164 (SssCH2 count) − 0.0722 (SaaCHE index) + 0.8827.
The QSAR model had a correlation coefficient (r2) of 0.92, a
significant cross-validated correlation coefficient (q2) of 0.85,
F-test of 63.2, r2 for the external test set (pred_r2) of 0.69, and
27 degrees of freedom (Table 6). The model developed predicts 85% of
variance and is validated by an external set of compounds with a
predictive correlation coefficient of 0.69. The model is validated
by a_ran_r2 = 0.00, a_ran_q2 = 0.00, a_ran_pred_r2 = 0.01,
best_ran_r2 = 0.55, best_ran_q2 = 0.30, Z-score_ran_r2 = 7.28,
Z-score_ran_q2 = 5.37. The randomization test suggests that the
probability that the developed model was generated by chance is
less than 1%. The predictability of
Figure 2. The decision tree for distinguishing the active analog (A)
from the inactive analog (B); A1, molecular connectivity index.
Table 3. Confusion matrix for HIV integrase inhibitory activity using models based on decision tree and random forest.
Model          Description          Ranges    Number of compounds predicted  Specificity (%)  Sensitivity (%)  MCC  OOB error (%)
                                              Active    Inactive
Decision tree  Training set         Active    11        0                    97.4             100              0.9  –
                                    Inactive  1         38
               Cross validated set  Active    10        1                    97.4             91               0.8  –
                                    Inactive  1         38
Random forest                       Active    9         2                    97.4             81.8             0.8  3
                                    Inactive  1         38
the model was evaluated by the test set of compounds. The
descriptors which contribute to the pharmacological action
are shown in Fig. 4.
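To show how the MLR equation reported above is applied, the sketch below evaluates it for one set of descriptor values. The coefficients and intercept are those of the MLR equation; the descriptor values in `example` are purely hypothetical placeholders and do not correspond to any compound in the dataset, and the names `MLR_COEFFICIENTS` and `predict_pic50` are illustrative only.

```python
# Coefficients and intercept taken from the MLR equation in the text.
MLR_COEFFICIENTS = {
    "SssOHE_index": 0.1362,
    "x_log_P": 0.6149,
    "T_T_C_2": 1.107,
    "SssCH2_count": -0.164,
    "SaaCHE_index": -0.0722,
}
MLR_INTERCEPT = 0.8827


def predict_pic50(descriptors):
    """Evaluate the linear model for a mapping of descriptor name to value."""
    return MLR_INTERCEPT + sum(
        coefficient * descriptors[name]
        for name, coefficient in MLR_COEFFICIENTS.items()
    )


# Hypothetical descriptor values, chosen only to demonstrate the calculation.
example = {
    "SssOHE_index": 19.0,
    "x_log_P": 3.0,
    "T_T_C_2": 2.0,
    "SssCH2_count": 1.0,
    "SaaCHE_index": 5.0,
}
print(round(predict_pic50(example), 3))
```

Because the model is linear, each additional unit of the SssCH2 count lowers the predicted pIC50 by 0.164, while each unit of x log P raises it by 0.6149, which mirrors the signs discussed for Fig. 4.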
The major group of contributing descriptors involved
subgroups like SsOHE-index, SssCH2 count, SaaCHE index,
x log P, and alignment independent descriptors. These
descriptors help in understanding the effect of substituents
on the activity of quinolone carboxylic acid.
The direct relationship of the descriptor SsOHE index
(41.8%) suggested that the presence of an –OH group in
the side chain in the basic ring quinolone carboxylic acid
has a very favourable effect on the biological activity.
The SaaCHE index (negative MLR coefficient, 9.32% contribution),
which represents electrotopological
indices for the number of CH groups attached to two aromatic
bonds, is inversely related to the biological
activity. This suggests that the presence of –CH– between the
aromatic rings decreases the biological activity.
Table 4. Proposed MAA based models for the prediction of HIV integrase inhibitory activity.
Index        Nature of range  Index value         Total compounds  Number of compounds  Overall accuracy   Inhibition of strand transfer,  Antiviral activity,  Average SI
                                                  in the range     predicted correctly  of prediction (%)  average IC50 (nM) (a)           average EC50 (nM)
SsOHE index  Inactive         <19.18              40               39                   98                 10279.97 (10543.32)             7016.44 (7195.83)    258.74 (250.27)
             Active           ≥19.18              10               10                                      7.23                            10.02                1146.19
xA           Inactive         <12.97              41               38                   96                 10 013 (10 803)                 6845.73 (7385.20)    287.65 (256.86)
             Active           ≥12.97              9                9                                       7.033                           9.23                 1113.16
jcc          Inactive         ≤710.32             38               36                   95.5               10799.47 (11398.93)             6385.13 (6739.32)    285.76 (245.17)
             Transitional     >710.32 to <762.38  5                NA                                      168.88                          7611.46              833.108
             Active           ≥762.38             7                7                                       6.7                             9.4                  969.55
NA, not applicable.
a Values in brackets are based upon correctly predicted analogs in the particular range.
Figure 3. (A) Average IC50 (nM) values of quinolone carboxylic acid for strand transfer inhibitory (STI) and antiviral activity in various
ranges of MAA based models. (B) Average selectivity index (SI) values of quinolone carboxylic acid for HIV integrase inhibitory activity in
various ranges of MAA based models.
Table 5. Intercorrelation matrix for MDs used in MAA.
              SssOHE index  x     jc
SssOHE index  1             0.69  0.69
x                           1     0.93
jc                                1
The presence of the SssCH2 count (negative MLR
coefficient, 14.11% contribution) in the model revealed that the number of
–CH2– groups between the nitrogen and the hydroxyl group should
be as small as possible for the biological activity.
The directly related descriptor x log P (23.09%) signifies the
octanol/water partition coefficient, indicating that the sub-
stitution with a more lipophilic group at the aromatic ring
has a positive influence on the biological activity.
The descriptor T_T_C_2 is the count of
heavy atoms (single, double, or triple bonded) separated from
a carbon atom (single, double, or triple bonded) by a distance
of two bonds. The direct relationship (13.87%) of T_T_C_2
indicates that the carboxy and oxo groups should always be
present at the 3 and 4 positions for good biological activity.
After QSAR study by partial least squares (PLS) using
the forward-backward stepwise variable selection method,
the final equation was developed, and the statistical data
(Table 6) observed are illustrated below.
pIC50 = 0.1365 (SssOHE index) + 0.6110 (x log P) + 1.1047
(T_T_C_2) − 0.1637 (SssCH2 count) − 0.0721 (SaaCHE index) + 0.8929.
The QSAR model had a correlation coefficient (r2) of 0.92, a
significant cross-validated correlation coefficient (q2) of 0.85,
F-test of 81.91, r2 for the external test set (pred_r2) of 0.69,
and 28 degrees of freedom (Table 6). The model developed predicts
85% of variance and is validated by an external set of
compounds with a predictive correlation coefficient of 0.69.
The model was validated by a_ran_r2 = 0.00, a_ran_q2 = 0.00,
a_ran_pred_r2 = 0.01, best_ran_r2 = 0.61, best_ran_q2 = 0.48,
Z-score_ran_r2 = 7.04, Z-score_ran_q2 = 4.47.
The major groups of descriptors involved in developing the
equation by PLS are subgroups like SsOHE-index, SssCH2
count, SaaCHE index, x log P, and alignment independent
descriptor T_T_C_2. The descriptors are common between
MLR and PLS. These only differ from each other in their
percentage of contribution as shown in Fig. 5.
Figure 4. Plot of percentage contribution of each descriptor in
the MLR model.
Figure 5. (A) Plot of percentage contribution of each descriptor in the PLS model. (B) Plot of percentage contribution of each descriptor
in the PCR model.
Table 6. Comparative statistical parameters of MLR, PLS, and PCR.
Parameters      MLR   PLS    PCR
N               33    33     33
df              27    28     30
r2              0.92  0.92   0.86
q2              0.85  0.85   0.79
F-test          63.2  81.91  93.06
r2 se           0.32  0.31   0.40
q2 se           0.43  0.42   0.48
pred_r2         0.69  0.69   0.69
pred_r2Se       0.77  0.77   0.78
best_ran_r2     0.55  0.61   0.33
best_ran_q2     0.30  0.48   0.2
Z-score_ran_r2  7.28  7.04   13.97
Z-score_ran_q2  5.37  4.47   8.75
a_ran_r2        0.00  0.00   0.00000
a_ran_q2        0.00  0.000  0.00000
a_ran_pred_r2   0.01  0.01   0.001
MLR, multiple linear regression; PLS, partial least squares; PCR, principal component regression; n, number of molecules of the training set; df, degree of freedom; r2, coefficient of correlation; q2, cross validated r2; pred_r2, r2 for the external test set; pred_r2Se, coefficient of correlation of the predicted data set.
After the QSAR study by principal component regression
(PCR) using the forward-backward stepwise variable selection
method, the final equation was developed and the statistical
data (Table 6) observed are illustrated below.
pIC50 = 0.1433 (SssOHE index) + 0.4374 (x log P) + 0.1376
(T_T_C_2) − 0.0063.
The QSAR model had a correlation coefficient (r2) of 0.86, a
significant cross validated correlation coefficient (q2) of 0.79,
F-test of 93.06, r2 for external test set (pred_r2) of 0.69, and degree
of freedom of 30. The model developed predicts 79% of variance
and is validated by an external set of compounds with a pre-
dictive correlation coefficient of 0.69. The model is validated
by a_ran_r2 = 0.00, a_ran_q2 = 0.00, a_ran_pred_r2 = 0.001,
best_ran_r2 = 0.33, best_ran_q2 = 0.2.
The major groups of descriptors involved in developing the
equation by PCR are subgroups SsOHE-index, x log P, and
alignment independent descriptor T_T_C_2. The QSAR model
by PCR also reveals that the descriptors are common in
MLR and PCR. These only differ from each other in their
percentage of contribution as shown in Fig. 5.
The above study leads to the development of statistically
significant QSAR models which allow an understanding
of the molecular properties/features that play an impor-
tant role in governing the variation in the activities.
In addition, this QSAR study allows investigating the influ-
ence of very simple and easy-to-compute descriptors in
determining biological activities, which could shed light
on the key factors that may aid in the design of potent
molecules.
Conclusions
In the present studies both classification and correlation
approaches have been successfully employed for the develop-
ment of models for the prediction of HIV inhibitory activity
of quinolone carboxylic acid. All the classification models
of diverse nature exhibited a high degree of predictability
with regard to HIV integrase inhibitory activity. The accuracy
of classification of single descriptor based models using MAA
varied from 96 to 98%. Active ranges of the proposed models
exhibited high potency (indicated by low value of IC50) as
well as safety through a high value of SI (an indirect measure
of the therapeutic index). High values of correlation coeffi-
cients (r2 and q2) were observed in MLR, PLS, and PCR based
correlation models. High accuracy of prediction of these
models offers vast potential for providing lead structures
for the development of potent but safe therapeutic agents
for HIV integrase inhibition. Simultaneous use of diverse
technologies for the development of numerous models can
be of immense use for accelerating the drug discovery
process.
Methodology
Data set
All the 50 quinolone carboxylic acid derivatives reported by
Sato et al. [40] as HIV integrase inhibitors were selected as a
dataset for the purpose of the present study. The basic structures
for the said derivatives are shown in Fig. 1 and various
substituents are listed in Table 2.
Molecular descriptors
Molecular descriptors of diverse nature were used in the
current study. These included physico-chemical descriptors,
path count, path cluster, estate contribution descriptors,
polar surface area descriptors, element counts, topological
descriptors, and a variety of alignment independent descrip-
tors. All computational work was performed on an Apple
workstation (8-core processor) using V-life MDS QSAR plus
developed by V life sciences technologies Pvt. Ltd, Pune, India.
The values of other MDs which are not part of V-life MDS
QSAR plus were computed using an in-house computer
program.
Molecular descriptors with significant degenerate values
were omitted from a large pool of descriptors initially calcu-
lated both through V-life MDS QSAR plus software and an
in-house computer program. For the remaining descriptors,
a pairwise correlation analysis was carried out (one of any
two indices with r ≥ 0.97 was excluded to reduce redundant
information). This exclusion method was used to reduce
the collinearity and correlation between descriptors.
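A minimal sketch of such a pairwise exclusion step is given below. It is not the procedure implemented in the software used for the study: the greedy order (the first descriptor encountered is kept), the use of the absolute value of r, the helper names `pearson_r` and `drop_intercorrelated`, and the toy descriptor columns are all assumptions made for illustration. Degenerate (constant) descriptors are assumed to have been removed beforehand, since they would make r undefined.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_intercorrelated(descriptors, threshold=0.97):
    """Keep a descriptor only if |r| with every descriptor kept so far is below the threshold."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson_r(values, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy descriptor columns (hypothetical values): d2 is almost a multiple of d1.
toy = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.0, 4.0, 6.0, 8.0, 10.1],
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(drop_intercorrelated(toy))
```

In this toy example d2 is discarded because it is almost perfectly correlated with d1, whereas d3 is retained.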
A total of 22 descriptors [41–63] listed in Table 1 were
employed for developing the DT. The MDs shortlisted on
the basis of their non-correlating nature and classification ability,
and subsequently employed in the present study, are described
below.
SsOHE index (estate contribution)
This simply represents an electrotopological state index for
the number of hydroxyl (–OH) groups connected with one
single bond [60, 61].
\[ \mathrm{SsOHE}_i = I_i + \Delta I_i \qquad (1) \]
where SsOHE_i is the electrotopological state for atom i, I_i
is an intrinsic value, and ΔI_i is the influence of other atoms
on i [60, 61].
SaaCHE index (estate contribution)
This is an electrotopological state index for the number of
–CH groups connected with two aromatic bonds [60, 61].
SssCH2 index (estate contribution)
This is an electrotopological state index for the number of
–CH2 groups connected with two single bonds [60, 61].
Alignment independent descriptors
These alignment-independent, versatile structure descriptors
for QSAR and QSPR, based on the distribution of molecular
features, have been defined by Baumann [62].
These include:
T_T_C_2.
This is the count of heavy atoms (having single,
double, or triple bonds) separated from any carbon atom
(single or double bonded) by a distance of two bonds in a
molecule [62].
Physicochemical descriptors
x log P.
This descriptor signifies the ratio of solute concentrations
in octanol and water and is generally termed as octanol water
partition coefficient. This is the atom based evaluation of
log P [63].
Molecular connectivity topochemical index (xA)
It is defined as the summation of the modified bond values of
adjacent vertices for all edges in the hydrogen-suppressed
molecular graph [41, 42].
\[ x^{A} = \sum_{i=1}^{n} \left( V_i^{c} V_j^{c} \right)^{-1/2} \qquad (2) \]
where V_i^c and V_j^c represent the modified valencies of a pair of
vertices joined by edge (i, j) [41, 42].
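The edge summation in Eq. (2) can be illustrated with a short sketch. Note the assumption: the modified (chemical) valencies of refs. [41, 42] are not reproduced here, so plain vertex degrees of the hydrogen-suppressed graph are used instead, which gives the simple topological variant of the index. The function names and the four-vertex example graph are hypothetical.

```python
def vertex_degrees(edges):
    """Count how many edges touch each vertex of a hydrogen-suppressed graph."""
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return degree


def connectivity_index(edges, valency):
    """Sum (v_i * v_j)^(-1/2) over all edges (i, j)."""
    return sum((valency[i] * valency[j]) ** -0.5 for i, j in edges)


# Carbon skeleton of n-butane: a path of four vertices.
butane_edges = [(0, 1), (1, 2), (2, 3)]
butane_index = connectivity_index(butane_edges, vertex_degrees(butane_edges))
print(round(butane_index, 4))
```

Replacing the `valency` mapping with the modified valencies of each atom would turn the same summation into the topochemical form of Eq. (2).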
Eccentric connectivity topochemical index (ξcc)
It is defined as the summation of the products of the
chemical eccentricity and the chemical degree of each vertex
in a hydrogen-suppressed molecular graph having n vertices
[45].
$$\xi_{c}^{c} = \sum_{i=1}^{n} \left(E_{ic} \times V_{ic}\right) \quad (3)$$
where $V_{ic}$ is the chemical degree of vertex i, $E_{ic}$ is the chemical
eccentricity of vertex i, and n is the number of vertices in
graph G [45].
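Eq. (3) can be illustrated with a short sketch that computes the eccentricity of each vertex by breadth-first search and multiplies it by the vertex degree. It assumes an all-carbon skeleton, for which chemical eccentricity and chemical degree are taken to coincide with their plain graph-theoretical counterparts; the function name and edge-list input are hypothetical.

```python
from collections import deque


def eccentric_connectivity(edges):
    """Sum of eccentricity * degree over all vertices of a connected graph."""
    adjacency = {}
    for i, j in edges:
        adjacency.setdefault(i, set()).add(j)
        adjacency.setdefault(j, set()).add(i)

    total = 0
    for start in adjacency:
        # Breadth-first search gives shortest path lengths from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        eccentricity = max(dist.values())  # distance to the farthest vertex
        total += eccentricity * len(adjacency[start])
    return total


n_butane = [(0, 1), (1, 2), (2, 3)]
cyclohexane = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

print(eccentric_connectivity(n_butane))     # 3*1 + 2*2 + 2*2 + 3*1 = 14
print(eccentric_connectivity(cyclohexane))  # six vertices, each 3*2 = 36
```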
Classification techniques
Decision tree (DT)
DT provides a simple and useful solution for many problems
of classification where the information contained in the
datasets is relatively complex [67]. In the present study, DT
was grown to identify the importance of MDs. In DT, the
molecules at each parent node are categorized or classified,
based upon the descriptor value, into two child nodes. The
prediction for a molecule reaching a given terminal node is
obtained by majority vote of molecules reaching the same
terminal node in a training set [68]. In the present study, the
R program (version 2.1.0) along with the RPART library was
utilized to grow DT. The active compounds were labeled as
“A” (n = 15) and the inactive compounds were similarly
labeled as “B” (n = 37). Each analog was assigned a biological
activity which was subsequently compared with the reported
HIV integrase inhibitory activity.
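The splitting and majority-vote steps described above can be sketched for a single parent node as follows. This is a minimal illustration in Python and not the RPART procedure used in the study: the Gini impurity criterion, the function names, and the descriptor values are all assumptions made for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Threshold on one descriptor giving the lowest weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    best_thr, best_score = None, float("inf")
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no split possible between identical values
        thr = (pairs[k - 1][0] + pairs[k][0]) / 2.0
        left = [lab for v, lab in pairs if v <= thr]
        right = [lab for v, lab in pairs if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr


def fit_node(values, labels):
    """Split one parent node into two child nodes labeled by majority vote."""
    thr = best_threshold(values, labels)
    left = Counter(lab for v, lab in zip(values, labels) if v <= thr)
    right = Counter(lab for v, lab in zip(values, labels) if v > thr)
    return thr, left.most_common(1)[0][0], right.most_common(1)[0][0]


def predict(node, value):
    thr, left_label, right_label = node
    return left_label if value <= thr else right_label


# Hypothetical descriptor values; "A" = active, "B" = inactive.
descriptor = [1.2, 1.5, 1.9, 2.4, 3.1, 3.3, 3.8, 4.0]
activity = ["B", "B", "B", "B", "A", "A", "B", "A"]

node = fit_node(descriptor, activity)
print(node, predict(node, 3.5), predict(node, 2.0))
```

A full tree would apply the same step recursively to each child node until a stopping rule is met.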
Random forest (RF)
Random forests were grown for HIV integrase inhibitory
activity. RF is an ensemble of n unpruned classification DTs,
each created by using a bootstrap sample of the training data
and a random subset of variables to define the best split at
each node (tree fork) [69]. The bootstrap sample used during
tree growth is a random selection with replacement from
the molecules in the training set. Model performance for
each tree is internally assessed with the prediction error
of the data left out in the bootstrap procedure, the
out-of-bag (OOB) data [70]. To classify a new object, its
input vector is passed down each of the trees in the forest.
Each tree gives a classification; that is, the tree “votes” for
that class. The forest chooses the classification having the
maximum number of votes (over all the trees in
the forest) [71]. Besides preserving most of the appealing
features of DT, RF performs a type of cross-validation in
parallel with the training step by using the OOB data,
which are used to calculate prediction accuracy. In the
present study, the RFs were grown separately for HIV
integrase inhibitory activity with the R program (version
2.1.0) using the RF library.
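The bootstrap, voting, and OOB steps can be sketched as below. To keep the example short, each "tree" is a one-split stump grown on a single randomly chosen descriptor, which is a deliberate simplification of the unpruned trees described above; the data, the function names, and the parameter values are hypothetical and do not reproduce the R library used in the study.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """One-split tree on `feature`; each side is labeled by majority vote."""
    best = None
    values = sorted({r[feature] for r in rows})
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [y for r, y in zip(rows, labels) if r[feature] <= thr]
        right = [y for r, y in zip(rows, labels) if r[feature] > thr]
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or errors < best[0]:
            best = (errors, thr, l_lab, r_lab)
    if best is None:  # descriptor is constant within this bootstrap sample
        lab = majority(labels)
        return feature, float("inf"), lab, lab
    return (feature,) + best[1:]


def predict_stump(stump, row):
    feature, thr, l_lab, r_lab = stump
    return l_lab if row[feature] <= thr else r_lab


def grow_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # drawn with replacement
        oob = set(range(n)) - set(sample)              # molecules left out
        feature = rng.randrange(len(rows[0]))          # random descriptor
        stump = fit_stump([rows[i] for i in sample],
                          [labels[i] for i in sample], feature)
        forest.append((stump, oob))
    return forest


def predict_forest(forest, row):
    """Every tree votes; the class with the most votes wins."""
    return majority([predict_stump(stump, row) for stump, _ in forest])


def oob_accuracy(forest, rows, labels):
    """Each molecule is judged only by trees that did not see it in training."""
    hits = total = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        votes = [predict_stump(s, row) for s, oob in forest if i in oob]
        if votes:
            total += 1
            hits += majority(votes) == label
    return hits / total if total else None


# Hypothetical two-descriptor data; "A" = active, "B" = inactive.
rows = [(1.0, 0.2), (1.3, 0.4), (1.7, 0.1), (2.0, 0.5), (2.2, 0.3),
        (3.5, 1.4), (3.8, 1.8), (4.1, 1.5), (4.4, 1.9), (4.9, 1.6)]
labels = ["B"] * 5 + ["A"] * 5

forest = grow_forest(rows, labels)
print(predict_forest(forest, (0.5, 0.0)), predict_forest(forest, (6.0, 2.5)))
print(oob_accuracy(forest, rows, labels))
```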
Moving average analysis (MAA)
MAA of correctly predicted compounds is the basis of development
of a single MD based model [50, 72]. For the selection
and evaluation of range-specific features, exclusive activity
ranges were discovered from the frequency distribution of the
response level, and the active range was subsequently identified
by analyzing the resulting data through maximization of the
moving average with respect to active compounds
(<35% = inactive, 35–65% = transitional, >65% = active)
[50, 71]. For the MAA-based models, compounds
having reported IC50 values of ≤10 nM were considered
active, while those possessing IC50 values >10 nM were
treated as inactive. The HIV integrase inhibitory activity
assigned to each compound was subsequently compared with
the reported biological activity [40]. The average IC50 (nM)
values for each range were also calculated. The cytotoxic
concentration reported for the 50 analogs was defined as the
concentration required to reduce the viability
of infected cells by 50% (CC50). The selectivity index (SI) may be
defined as the quotient of CC50 and IC50. The values of SI for all
the analogs were calculated from the reported values of CC50 and
IC50, and the average SI values for the various ranges in the
proposed models were calculated. The value of SI is an indirect
measure of the therapeutic index and consequently an indication of
safety.
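The range labeling and the per-range averages can be sketched as follows. The thresholds (35% and 65% actives, IC50 ≤ 10 nM) come from the text; the range boundaries, the compound values, and the function names are hypothetical, and the sketch assumes that CC50 and IC50 are expressed in the same unit before the quotient is taken.

```python
def is_active(ic50_nm):
    """Activity cut-off used for the MAA-based models: IC50 <= 10 nM."""
    return ic50_nm <= 10.0


def label_range(n_active, n_total):
    """Label a descriptor range by its percentage of active compounds."""
    pct = 100.0 * n_active / n_total
    if pct > 65.0:
        return "active"
    if pct < 35.0:
        return "inactive"
    return "transitional"


def selectivity_index(cc50, ic50):
    """SI = CC50 / IC50 (both values must be in the same unit)."""
    return cc50 / ic50


def summarize(compounds, bounds):
    """compounds: (descriptor, IC50 in nM, CC50 in nM); bounds: (low, high) ranges."""
    report = []
    for low, high in bounds:
        members = [c for c in compounds if low <= c[0] < high]
        if not members:
            continue
        n_active = sum(is_active(ic50) for _, ic50, _ in members)
        report.append({
            "range": (low, high),
            "label": label_range(n_active, len(members)),
            "mean_ic50_nM": sum(ic50 for _, ic50, _ in members) / len(members),
            "mean_SI": sum(selectivity_index(cc50, ic50)
                           for _, ic50, cc50 in members) / len(members),
        })
    return report


# Hypothetical compounds and descriptor ranges, for illustration only.
compounds = [(1.0, 250.0, 20000.0), (1.4, 90.0, 18000.0), (1.8, 40.0, 12000.0),
             (2.2, 8.0, 16000.0), (2.6, 30.0, 15000.0),
             (3.1, 5.0, 10000.0), (3.4, 2.0, 8000.0), (3.9, 9.0, 9000.0)]
bounds = [(0.0, 2.0), (2.0, 3.0), (3.0, 5.0)]

report = summarize(compounds, bounds)
for entry in report:
    print(entry)
```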
Correlation techniques
Multiple linear regression (MLR)
MLR is also commonly referred to as the linear free-energy
relationship (LFER). This method represents an extension of
the simple regression analysis to more than one dimension
[73]. MLR normally generates a QSAR equation by performing
standard multivariable regression calculations to facilitate
identification of the dependence of a drug property on any or
all of the descriptors under investigation [74].
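A minimal sketch of how such a multivariable regression can be computed is given below. It solves the least-squares normal equations directly in plain Python; the descriptor values are hypothetical and noise-free, and the function names are illustrative rather than taken from the QSAR software used in the study.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(X, y):
    """Return [intercept, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(X, y, coef):
    pred = [coef[0] + sum(b * v for b, v in zip(coef[1:], r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical descriptors; activity generated as 1 + 2*x1 - 0.5*x2.
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0), (6.0, 5.0)]
y = [1.0 + 2.0 * x1 - 0.5 * x2 for x1, x2 in X]

coef = fit_mlr(X, y)
print([round(c, 6) for c in coef], round(r_squared(X, y, coef), 6))
```

With strongly intercorrelated descriptors the normal equations become ill-conditioned, which is the situation that motivates the PLS and PCR methods.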
Partial least squares (PLS)
PLS is an iterative regression procedure that produces its
solutions based on linear transformation of a large number
of original MDs to a small number of new orthogonal terms
called latent variables. PLS gives a statistically robust solution
even if the independent variables are highly interrelated
among themselves or when the independent variables exceed
the number of observations [75].
Principal component regression (PCR)
PCR, which is built on principal component analysis (PCA), is a
substitute for MLR when the explanatory variables are correlated.
PCA is a data reduction technique that generates a new set of
orthogonal descriptors, referred to as principal components (PCs),
which describe most of the information contained in the independent
variables in order of decreasing variance. Consequently,
PCA reduces the dimensionality of a multivariate data set of
descriptors to the actual amount of data available. When PCs
are employed as the independent variables to perform a
linear regression, the method is termed PCR [76].
In the present study, the dataset was divided into training
and test sets by random selection method for MLR, PLS, and
PCR methods using pIC50 as dependent variable and various
descriptors as independent variables. These models were
generated using a training set of 36 molecules. Predictive
power of the resulting models was evaluated by the test set of
16 molecules with uniformly distributed biological activities.
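The data preparation step can be sketched as follows. The conversion assumes the conventional definition pIC50 = -log10(IC50 in mol/L), which for IC50 values in nM becomes 9 - log10(IC50); the text does not state the formula explicitly, so this is an assumption. The molecule identifiers, the seed, and the function names are placeholders.

```python
import math
import random


def pic50_from_nm(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L), with IC50 supplied in nM (assumed convention)."""
    return 9.0 - math.log10(ic50_nm)


def random_split(items, n_train, seed=0):
    """Shuffle a copy of `items` and cut it into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder identifiers: 36 training + 16 test molecules.
molecule_ids = list(range(1, 53))
train, test = random_split(molecule_ids, 36)

print(pic50_from_nm(10.0), len(train), len(test))
```

A purely random split does not by itself guarantee uniformly distributed activities in the test set, so in practice the resulting sets would still need to be checked for that property.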
Data analysis and validation
For classification models the sensitivity and specificity values
were calculated. The sensitivity may be defined [64] as the ratio
of the number of true positive compounds and the sum of
true positive and false negative compounds. Sensitivity can
be expressed as follows:
$$\text{Sensitivity} = \frac{\mathrm{PT}}{\mathrm{PT} + \mathrm{NF}} \quad (4)$$
where PT is the number of compounds correctly predicted as
active and NF is the number of compounds incorrectly predicted
as inactive [64].
The specificity may be defined [64] as the ratio of the number
of true negative compounds and the sum of true negative and
false positive compounds. Specificity can be expressed as
follows:
$$\text{Specificity} = \frac{\mathrm{NT}}{\mathrm{NT} + \mathrm{PF}} \quad (5)$$
where NT is the number of compounds correctly predicted as
inactive and PF is the number of compounds incorrectly predicted
as active [64]. Sensitivity and specificity
assess the goodness of the models; they represent the classification
accuracies for the active and inactive compounds,
respectively [64, 65]. The randomness of the model was also
determined by calculating the Matthews correlation coefficient (MCC) [64].
The intercorrelation [66] between the E-state contribution index (SssOE),
the augmented eccentric connectivity topochemical index (Aξcc), the molecular
connectivity index (χA) and the connective eccentricity
topochemical index (Cξc) was also investigated. Results are
summarized in Tables 2–5 and Figs. 2 and 3.
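Eqs. (4) and (5), together with the overall accuracy and the MCC, can be computed from the four confusion counts as sketched below. The MCC expression used is the standard one, (PT·NT − PF·NF) / sqrt((PT+PF)(PT+NF)(NT+PF)(NT+NF)); the observed and predicted labels are hypothetical.

```python
import math


def confusion_counts(observed, predicted, positive="A"):
    """Return (PT, NF, NT, PF) using the notation of Eqs. (4) and (5)."""
    pt = nf = nt = pf = 0
    for obs, pred in zip(observed, predicted):
        if obs == positive and pred == positive:
            pt += 1          # active, predicted active
        elif obs == positive:
            nf += 1          # active, predicted inactive
        elif pred == positive:
            pf += 1          # inactive, predicted active
        else:
            nt += 1          # inactive, predicted inactive
    return pt, nf, nt, pf


def sensitivity(pt, nf):
    return pt / (pt + nf)


def specificity(nt, pf):
    return nt / (nt + pf)


def overall_accuracy(pt, nf, nt, pf):
    return (pt + nt) / (pt + nf + nt + pf)


def mcc(pt, nf, nt, pf):
    denom = math.sqrt((pt + pf) * (pt + nf) * (nt + pf) * (nt + nf))
    return (pt * nt - pf * nf) / denom if denom else 0.0


# Hypothetical classification of ten compounds ("A" active, "B" inactive).
observed = ["A"] * 5 + ["B"] * 5
predicted = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]

pt, nf, nt, pf = confusion_counts(observed, predicted)
print(sensitivity(pt, nf), specificity(nt, pf),
      overall_accuracy(pt, nf, nt, pf), round(mcc(pt, nf, nt, pf), 4))
```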
The following statistical measures were used to correlate
biological activity and MDs for the correlation models: n, number
of molecules; k, number of descriptors in a model; df, degrees of
freedom; r2, coefficient of determination (squared correlation
coefficient); q2, cross-validated r2;
pred_r2, r2 for the external test set; pred_r2Se, coefficient of
correlation of predicted dataset; Z-score, Z-score calculated by
the randomization test; best_ran_r2, highest r2 value in the
randomization test; best_ran_q2, highest q2
value in the randomization test; α, statistical significance
parameter obtained by the randomization test. Validation
was done to study the internal stability and predictive ability
of the correlation models. Internal validation of the correlation
models was carried out using the leave-one-out (q2, LOO)
method which describes the internal stability of a model
[77]. For external validation, the activity of each molecule in
the test set was predicted using the model developed by the
training set [78]. The pred_r2 value is indicative of the predic-
tive power of the current model for the external test set. To
evaluate the statistical significance of the QSAR model for an
actual dataset, one tail hypothesis testing was used [79, 80]. The
robustness of the models for the training sets was analyzed by
comparing these models to those derived for random datasets.
Random sets were generated by rearranging the activities of the
molecules in the training set. The statistical model was derived
using various randomly rearranged activities (random sets)
with the selected descriptors and the corresponding values
of q2 were calculated. The significance of the models thus
obtained was derived from the calculated Z-score [79, 80]. If
the Z-score value is less than 4.0, the probability (α) of significance
of the randomization test is calculated by the formula
given in the literature [80]; otherwise it is derived by comparing
the Z-score value with the critical Z-score value reported in
ref. [81]. For example, a Z-score value greater than 3.10 indicates
that there is a probability (α) of less than 0.001 that the QSAR
model constructed for the real dataset was generated by chance.
The randomization test suggests that each of the developed models
has a probability of less than 1% of having been generated
by chance.
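The leave-one-out and randomization procedures can be sketched for a single-descriptor linear model as follows. Two assumptions are made that the text does not confirm: the Z-score is taken as (q2 of the real model − mean q2 of the random models) / standard deviation of the random q2 values, and α is taken as the one-tail probability of a standard normal deviate. The data, function names, and number of random sets are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for one descriptor: returns (slope, intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def q2_loo(x, y):
    """Leave-one-out q2 = 1 - PRESS / sum((y_i - mean(y))^2)."""
    press = 0.0
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    mean_y = sum(y) / len(y)
    return 1.0 - press / sum((yi - mean_y) ** 2 for yi in y)


def randomization_test(x, y, n_random=50, seed=0):
    """Compare the real q2 with q2 values obtained after shuffling the activities."""
    rng = random.Random(seed)
    q2_real = q2_loo(x, y)
    q2_rand = []
    for _ in range(n_random):
        shuffled = y[:]
        rng.shuffle(shuffled)
        q2_rand.append(q2_loo(x, shuffled))
    mean = sum(q2_rand) / n_random
    sd = math.sqrt(sum((q - mean) ** 2 for q in q2_rand) / (n_random - 1))
    return q2_real, max(q2_rand), (q2_real - mean) / sd


def one_tail_alpha(z):
    """Upper-tail probability of a standard normal deviate (assumed form of alpha)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical descriptor and activity values with small deterministic noise.
x = [float(i) for i in range(1, 13)]
noise = [0.10, -0.08, 0.05, -0.12, 0.07, -0.03,
         0.09, -0.06, 0.02, -0.10, 0.04, -0.01]
y = [0.5 * xi + 5.0 + e for xi, e in zip(x, noise)]

q2_real, best_ran_q2, z = randomization_test(x, y)
print(round(q2_real, 3), round(best_ran_q2, 3), round(z, 2))
print(one_tail_alpha(3.10) < 0.001)
```

The last line is consistent with the example in the text: under the normal assumption, a Z-score of 3.10 corresponds to a one-tail probability just below 0.001.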
The authors have declared no conflict of interest.
References
[1] E. B. De Melo, M. Miguel, C. Ferreira, Eur. J. Med. Chem. 2009, 44, 3577–3583.
[2] E. De Clercq, Nat. Rev. 2007, 6, 1001–1018.
[3] J. Stebbing, S. Patterson, F. Gotch, Cell Res. 2003, 13, 1–7.
[4] A. Pendri, T. L. Troyer, M. J. Sofia, M. A. Walker, B. N. Naidu, J. Banville, N. A. Meanwell, I. Dicker, Z. Lin, M. Krystal, S. W. Gerritz, J. Comb. Chem. 2010, 12, 84–90.
[5] Y. Pommier, A. A. Johnson, C. Marchand, Nat. Rev. 2005, 4, 236–248.
[6] T. T. Charvat, D. J. Lee, W. E. Robinson, A. R. Chamberlin, Bioorg. Med. Chem. 2006, 14, 4552–4567.
[7] P. O. Brown, in Retroviruses (Eds.: J. M. Coffin, S. H. Hughes, H. E. Varmus), Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 1997, pp. 161–203.
[8] R. Craigie, J. Biol. Chem. 2001, 276, 23213–23216.
[9] P. Pace, M. E. Di Francesco, C. Gardelli, S. Harper, E. Muraglia, E. Nizi, F. Orvieto, A. Petrocchi, M. Poma, M. Rowley, R. Scarpelli, R. Laufer, O. G. Paz, E. Monteagudo, F. Bonelli, D. Hazuda, K. A. Stillmock, V. Summa, J. Med. Chem. 2007, 50, 2225–2239.
[10] M. L. Barreca, L. D. Luca, N. Iraci, A. Chimirri, J. Med. Chem. 2006, 49, 3994–3997.
[11] M. Li, M. Mizuuchi, T. R. Burke, R. Craigie, EMBO J. 2006, 25, 1295–1304.
[12] Y. Aubert, M. Chassignol, V. Roig, G. Mbemba, J. Weiss, H. Meudal, J. F. Mouscadet, U. Asseline, Eur. J. Med. Chem. 2009, 44, 5029–5044.
[13] S. Hare, S. S. Gupta, E. Valkov, A. Engelman, P. Cherepanov, Nature 2010, 464, 232–236.
[14] J. A. Grobler, K. Stillmock, B. Hu, M. Witmer, P. Felock, A. S. Espeseth, A. Wolfe, M. Egbertson, M. Bourgeois, J. Melamed, J. S. Wai, S. Young, J. Vacca, D. J. Hazuda, Proc. Natl. Acad. Sci. USA 2002, 99, 6661–6666.
[15] L. Zhuang, J. S. Wai, M. W. Embrey, T. E. Fisher, M. S. Egbertson, L. S. Payne, J. P. Guare, J. P. Vacca, D. J. Hazuda, P. J. Felock, A. L. Wolfe, K. A. Stillmock, M. V. Witmer, G. Moyer, W. A. Schleif, L. J. Gabryelski, Y. M. Leonard, J. J. Lynch, S. R. Michelson, S. D. Young, J. Med. Chem. 2003, 46, 453–456.
[16] D. J. Hazuda, N. J. Anthony, R. P. Gomez, S. M. Jolly, J. S. Wai, L. Zhuany, T. E. Fisher, M. Embrey, J. P. Guare, M. S. Egbertson, J. P. Vacca, J. R. Huff, P. J. Felock, M. V. Witmer, K. A. Stillmock, R. Danovich, J. Gobler, M. D. Miller, A. S. Espeseth, L. Jin, I.-W. Chen, K. Kassahun, J. D. Ellis, B. K. Wong, W. Xu, P. G. Pearson, W. A. Schleif, R. Cortese, E. Emini, V. Summa, M. K. Holloway, S. D. Young, Proc. Natl. Acad. Sci. 2004, 101, 11233–11238.
[17] V. Summa, A. Petrocchi, V. G. Matassa, C. Gardelli, E. Muraglia, M. Rowley, O. G. Paz, R. Laufer, E. Monteagudo, P. Pace, J. Med. Chem. 2006, 49, 6646–6649.
[18] H. Jin, R. Z. Cai, L. Schacherer, S. Jabri, M. Tsiang, M. Fardis, X. Chen, J. M. Chen, C. U. Kim, Bioorg. Med. Chem. Lett. 2006, 16, 3989–3992.
[19] S. G. Deeks, S. Kar, S. I. Gubernick, P. Kirkpatrick, Nat. Rev. 2008, 7, 117–118.
[20] M. Sato, T. Motomura, H. Aramaki, T. Matsuda, M. Yamashita, Y. Ito, H. Kawakami, Y. Matsuzaki, W. Watanabe, K. Yamataka, S. Ikeda, E. Kodama, M. Matsuoka, H. Shinkai, J. Med. Chem. 2006, 49, 1506–1508.
[21] S. Pasquini, C. Mugnaini, C. Tintori, M. Botta, A. Trejos, I. K. Arvela, M. Larhed, M. Witvrouw, M. Michiels, F. Christ, Z. Debyser, F. Corelli, J. Med. Chem. 2008, 51, 5125–5129.
[22] A. Billich, Curr. Opin. Invest. Drugs 2003, 4, 206–209.
[23] M. Markowitz, J. O. Morales-Ramirez, B. Y. Nguyen, C. M. Kovacs, R. T. Steigbigel, D. A. Cooper, R. Liporace, R. Schwartz, R. Isaacs, L. R. Gilde, L. Wenning, J. Zhao, H. Teppler, J. Acquir. Immune Defic. Syndr. 2006, 43, 509–515.
[24] E. DeJesus, D. Berger, M. Markowitz, C. Cohen, T. Hawkins, P. Ruane, R. Elion, C. Farthing, L. Zhong, A. K. Cheng, D. McColl, B. P. Kearney, J. Acquir. Immune Defic. Syndr. 2006, 43, 1–5.
[25] E. P. Garvey, B. A. Johns, M. J. Gartland, S. A. Foster, W. H. Miller, R. G. Ferris, R. J. Hazen, M. R. Underwood, E. E. Boros, J. B. Thompson, J. G. Weatherhead, C. S. Koble, S. H. Allen, L. T. Schaller, R. G. Sherrill, T. Yoshinaga, M. Kobayoshi, C. Wakasa-Morimoto, S. Miki, K. Nakahara, T. Noshi, A. Sato, T. Fujiwara, Antimicrob. Agents Chemother. 2008, 52, 901–908.
[26] E. P. Garvey, B. Schwartz, M. J. Gartland, S. Lang, W. Halsey, G. Sathe, H. L. Carter, K. L. Weaver, Mutant Biochem. 2009, 48, 1644–1653.
[27] E. E. Boros, C. E. Edwards, S. A. Foster, M. Fuji, T. Fujiwara, E. P. Garvey, P. L. Golden, R. J. Hazen, J. L. Jeffrey, B. A. Johns, T. Kawasuji, R. Kiyama, C. S. Koble, N. Kurose, W. H. Miller, A. L. Mote, H. Murai, A. Sato, J. Thompson, M. C. Woodward, T. Yoshinaga, J. Med. Chem. 2009, 52, 2754–2761.
[28] M. Grover, B. Singh, M. Bakshi, S. Singh, Pharm. Sci. Technol. Today 2000, 3, 28–35.
[29] C. Hansch, A. Leo, S. B. Mekapati, A. Kurup, Bioorg. Med. Chem. 2004, 12, 3391–3400.
[30] F. A. Pasha, H. K. Srivastava, A. Srivastava, P. P. Singh, QSAR Comb. Sci. 2007, 26, 69–84, DOI: 10.1002/qsar.200630010
[31] M. Zhao, Z. Li, Y. Wu, Y. R. Tang, C. Wang, Z. Zhang, S. Peng, Eur. J. Med. Chem. 2007, 42, 955–965.
[32] H. K. Srivastava, M. Chourasia, D. Kumar, G. N. Sastry, J. Chem. Inf. Model. 2011, 51, 558–571.
[33] H. K. Srivastava, Bull. Kor. Chem. Soc. 2009, 30, 67–76.
[34] P. Srivani, G. N. Sastry, J. Mol. Graph. Mod. 2009, 27, 676–688.
[35] P. Srivani, D. Usharani, E. D. Jemmis, G. N. Sastry, Curr. Pharm. Des. 2008, 14, 3854–3872.
[36] H. K. Srivastava, F. A. Pasha, S. K. Mishra, P. P. Singh, Med. Chem. Res. 2009, 18, 455–466.
[37] M. H. Bohari, H. K. Srivastava, G. N. Sastry, Org. Med. Chem. Lett. 2011, 1, 1–12.
[38] C. Hansch, T. Fujita, in Classical and Three-Dimensional QSAR in Agro-chemistry (Eds.: C. Hansch, T. Fujita), Symposium Series 606, American Chemical Society, Washington, DC 1995, pp. 1–12.
[39] O. Ivanciuc, S. L. Taraviras, J. Chem. Inf. Comput. Sci. 2000, 40, 126–134.
[40] M. Sato, H. Kawakami, T. Motomura, H. Aramaki, T. Matsuda, M. Yamashita, Y. Ito, Y. Matsuzaki, K. Yamataka, S. Ikeda, H. Shinkai, J. Med. Chem. 2009, 52, 4869–4882.
[41] A. Goel, A. K. Madan, J. Chem. Inf. Comput. Sci. 1995, 35, 510–514.
[42] H. Dureja, A. K. Madan, J. Mol. Mod. 2005, 11, 525–531.
[43] S. Gupta, M. Singh, A. K. Madan, Indian J. Chem. 2003, 42A, 1414–1425.
[44] A. K. Madan, H. Dureja, in Novel Molecular Structure Descriptors – Theory and Applications II (Eds.: I. Gutman, B. Furtula), Eccentricity Based Descriptors for QSAR/QSPR, Mathematical Chemistry Monographs, No. 9, University of Kragujevac, Kragujevac, Serbia 2010, pp. 91–138.
[45] V. Kumar, S. Sardana, A. K. Madan, J. Mol. Mod. 2004, 10, 399–407.
[46] S. Bajaj, S. S. Sambhi, A. K. Madan, Croat. Chem. Acta 2005, 78, 165–174.
[47] S. Bajaj, S. S. Sambhi, A. K. Madan, J. Mol. Struct. (Theochem) 2004, 684, 197–203.
[48] H. Dureja, S. Gupta, A. K. Madan, J. Mol. Graph. Mod. 2008, 26, 1020–1029.
[49] M. Randic, J. Am. Chem. Soc. 1975, 97, 6609–6615.
[50] S. Gupta, M. Singh, A. K. Madan, J. Comput. Aided Mol. Des. 2001, 15, 671–678.
[51] S. Bajaj, S. S. Sambhi, A. K. Madan, QSAR Comb. Sci. 2006, 25, 813–823.
[52] V. Sharma, R. Goswami, A. K. Madan, J. Chem. Inf. Comput. Sci. 1997, 37, 273–282.
[53] S. Gupta, M. Singh, A. K. Madan, J. Mol. Graph. Mod. 2000, 18, 18–25.
[54] I. Gutman, B. Ruscic, N. Trinajstic, C. F. Wicox, J. Chem. Phys. 1975, 62, 3399–3405.
[55] I. Gutman, M. Randic, Chem. Phys. Lett. 1977, 47, 15–19.
[56] H. Wiener, J. Am. Chem. Soc. 1947, 69, 2636–2638.
[57] H. Wiener, J. Chem. Phys. 1947, 15, 766.
[58] A. T. Balaban, Pure Appl. Chem. 1983, 55, 199–206.
[59] S. Gupta, M. Singh, A. K. Madan, J. Chem. Inf. Comput. Sci. 1999, 39, 272–277.
[60] L. H. Hall, L. B. Kier, J. Chem. Inf. Comput. Sci. 1995, 35, 502–514.
[61] L. H. Hall, B. K. Moheny, L. B. Kier, J. Chem. Inf. Comput. Sci. 1991, 31, 76–82.
[62] K. Baumann, J. Chem. Inf. Comput. Sci. 2002, 42, 26–35.
[63] R. Wang, Y. Fu, L. Lai, J. Chem. Inf. Comput. Sci. 1997, 37, 615–621.
[64] L. Han, Y. Wang, S. H. Bryant, BMC Bioinformatics 2008, 9, 401–408.
[65] C. Lamanna, M. Bellini, A. Padova, G. Westerberg, L. Maccari, J. Med. Chem. 2008, 51, 2891–2897.
[66] S. Nikolic, G. Kavacevic, A. Milicevic, N. Trinanjstic, Croat. Chem. Acta 2003, 76, 113–124.
[67] H. Kim, G. J. Koehler, Omega Int. J. Manage. Sci. 1995, 23, 637–652.
[68] A. J. Myles, R. N. Feudale, Y. Liu, N. A. Woody, S. D. Brown, J. Chemomet. 2004, 18, 275–285.
[69] L. Breiman, Mach. Learn. 2001, 45, 5–32.
[70] A. Merzlikine, Y. A. Abramov, S. J. Kowsz, V. H. Thomas, T. Mano, Int. J. Pharm. 2011, 418, 207–216, DOI: 10.1016/j.ijpharm.2011.03.065
[71] A. Liaw, M. Wiener, ‘Classification and regression by random forest’, R News, 2002, 2, 18–22.
[72] H. Dureja, S. Gupta, A. K. Madan, Sci. Pharm. 2008, 76, 377–394.
[73] R. A. Berk, in Regression Analysis: A constructive Critique (Ed.: R. A. Berk) The formalities of Multiple regression, SAGE Publications Ltd, London 2003, pp. 103–110.
[74] T. W. Schultz, M. T. D. Cronin, J. D. Walker, A. O. Aptula, J. Mol. Struct. 2003, 622, 1–22.
[75] S. Wold, W. J. Dunn, J. Chem. Inf. Comput. Sci. 1998, 23, 6–23.
[76] G. H. Dunteman, in Principal Component Analysis (Ed.: D. H. Dunteman), Basic concepts of principal component analysis. SAGE Publications Ltd, London 1989, pp. 15–22.
[77] W. Zheng, A. Tropsha, J. Chem. Inf. Comput. Sci. 2000, 40, 185–194.
[78] P. Gedeck, B. Rohde, C. Bartels, J. Chem. Inf. Comput. Sci. 2006, 46, 1924–1936.
[79] N. Gilbert, in Statistics (Ed.: N. Gilbert), W. B. Saunders Co, Philadelphia, PA 1976.
[80] A. Golbraikh, A. Tropsha, J. Chem. Inf. Comput. Sci. 2003, 43, 144–154.
[81] M. Shen, Y. Xiao, A. Golbraikh, V. K. Gombar, A. Tropsha, J. Med. Chem. 2003, 46, 3013–3020.