Superaugmented Eccentric Distance Sum Connectivity Indices: Novel Highly Discriminating Topological Descriptors for QSAR/QSPR
Monika Gupta1, Sunil Gupta2, Harish Dureja1 and Anil Kumar Madan3,*
1Faculty of Pharmaceutical Sciences, M.D. University, Rohtak 124001, Haryana, India
2J. C. D. College of Pharmacy, Sirsa 125055, Haryana, India
3Faculty of Pharmaceutical Sciences, Pt. B. D. Sharma University of Health Sciences, Rohtak 124001, Haryana, India
*Corresponding author: Anil Kumar Madan, [email protected]
Four highly discriminating fourth-generation topological indices (TIs), termed superaugmented eccentric distance sum connectivity indices, as well as their topochemical versions (denoted by $^{SED}\xi^{c}_{C1}$, $^{SED}\xi^{c}_{C2}$, $^{SED}\xi^{c}_{C3}$, and $^{SED}\xi^{c}_{C4}$), have been conceptualized in this study. The values of these indices for all possible structures with three, four, and five vertices containing one heteroatom were computed using an in-house computer program. The proposed superaugmented eccentric distance sum connectivity topochemical indices exhibited exceptionally high discriminating power, low degeneracy, and high sensitivity toward both the presence and the relative position of heteroatom(s) for all possible structures with five vertices containing at least one heteroatom. Intercorrelation analysis revealed the absence of correlation of the proposed indices with Zagreb indices and the molecular connectivity index. Subsequently, the proposed TIs were successfully utilized for the development of models for the prediction of checkpoint kinase inhibitory activity of 2-arylbenzimidazoles. A data set comprising 47 differently substituted analogs of 2-arylbenzimidazoles was selected for the study. The values of various TIs for each analog in the data set were computed using an in-house computer program. The resulting data were analyzed, and suitable models were developed through decision tree (DT), random forest (RF), and moving average analysis (MAA). The performance of the models was assessed by calculating the specificity, sensitivity, overall accuracy, and Matthews correlation coefficient. A decision tree was constructed for the checkpoint kinase inhibitory activity to determine the importance of topological indices. The decision tree identified two of the proposed TIs, $^{SED}\xi^{c}_{C3}$ and $^{SED}\xi^{c}_{C4}$, as the most important indices. The decision tree learned the information from the input data with an accuracy of 96% and correctly predicted the cross-validated (10-fold) data with an accuracy of 77%. Random forest correctly predicted the checkpoint kinase inhibitory activity with an accuracy of 83%. Single index-based models were also developed for the prediction of checkpoint kinase inhibitory activity using MAA. The accuracy of prediction of the single index-based models derived through MAA was found to vary from a minimum of 90% to a maximum of 95%. The exceptionally high discriminating power, low degeneracy, and high sensitivity of the proposed indices toward branching and the presence of a heteroatom can be of immense use in drug design, isomer discrimination, similarity/dissimilarity studies, quantitative structure activity/property relationships, lead optimization, and combinatorial library design.
Key words: 2-arylbenzimidazole, checkpoint kinase inhibitors, superaugmented eccentric distance sum connectivity topochemical indices, topological indices
Received 3 February 2011, revised 7 September 2011 and accepted for publication 16 October 2011
The identification and optimization of lead compounds in a rapid and cost-effective way are the most critical steps in drug discovery. The computer-aided drug discovery approach offers an alternative to the real world of synthesis and screening (1,2). Computational techniques have advanced rapidly over the past few decades and have played a major role in the development of a number of drugs now on the market or going through clinical trials (3,4). QSAR/QSPR is the mathematical relationship linking chemical structure and pharmacological activity/property in a quantitative manner for a series of compounds (5). It also reduces the number of compounds to be synthesized and promptly detects the most favorable compounds. Fundamentally, QSAR aims to identify relationships between some aspects of molecular structure and properties such as toxicology, pharmacodynamics, and pharmacokinetics (6).
The 2D approach has a number of advantages compared with the higher dimension QSAR methodologies. First, owing to the variety of molecular descriptors available, optimized coordinates are not always required. In fact, connectivity information (in the form of an adjacency matrix) alone can be used to develop QSAR models. As a result, models using topological descriptors can be built rapidly for very large sets of molecules. Second, this approach avoids the alignment step and thus can be used in the absence of experimental information regarding the binding of a molecule to its target (note a).

Chem Biol Drug Des 2012; 79: 38–52
Research Article
© 2011 John Wiley & Sons A/S
doi: 10.1111/j.1747-0285.2011.01264.x
The 2D QSAR makes use of TIs, which are numerical values associated with the chemical constitution for correlation of chemical structures with various physical properties, chemical reactivity, or biological activity (7). These are derived from the topological representation of molecules and can be considered structure-explicit descriptors (8). TIs are among the most useful descriptors known nowadays, as they can be rapidly computed for a large number of molecules and also offer a simple way of measuring molecular branching, shape, size, cyclicity, symmetry, chirality, complexity, and heterogeneity of atomic environments in the molecule (9–14). Over the past two decades, the use of TIs in QSAR models has enhanced the scope of drug design by producing reliable estimates of the therapeutic and toxic potential of chemicals (15).
The genetic integrity of a cell is constantly challenged by radiation, chemical agents, and replication errors (16). These agents mainly cause double strand breaks (DSB) and single strand breaks (SSB) and cause genomic instability that may lead to tumor development if left unrepaired (17). DNA damage is also used to treat cancer. Many of the conventional anticancer treatments (ionizing radiation, hyperthermia, pyrimidine and purine antimetabolites, alkylating agents, DNA topoisomerase inhibitors, and platinum compounds) at least partly damage the DNA of cells. As these treatments are not specifically selective for cancer cells, patients suffer from serious side effects when taking these drugs (18). Therefore, DNA damage causes the disease, is used to treat the disease, and is responsible for the toxicity of therapies for the disease (19).
In the DNA damage response (DDR), eukaryotic cells activate checkpoint pathways to arrest the cell cycle (20–22). The checkpoints comprise a subroutine integrated into the larger DDR pathway that regulates a multifaceted response. Moreover, several checkpoint genes are essential for cell and organism survival (23–27), implying that these pathways are not only surveyors of occasional damage but are firmly integrated components of cellular physiology (22).
The DNA damage checkpoints are known to comprise signal transduction cascades that link the detection of DNA damage to several other processes, i.e. inhibition of progression through the cell cycle from G1 to S, through S and from G2 into M, activation of DNA repair, and initiation of apoptosis (28). DNA damage is recognized by damage sensor proteins such as Mre11-Rad50-Nbs1 (MRN complex) and the breast and ovarian cancer locus 1 (BRCA1)-associated genome surveillance complex (BASC). These proteins recruit and activate the upstream Ataxia-telangiectasia mutated (ATM) protein and ATM and Rad 3-related (ATR) kinases (17,29). Checkpoint kinases Chk1 and Chk2 are key downstream mediators of DDR through activation of an increasing number of substrates such as p53, NBS1, BRCA1, MDM2, Cdc25A, Cdc25C, and E2F1 (30–32). The relevance of these kinases in the maintenance of genome integrity is clearly indicated by the severe human genetic disorders and the predisposition to cancer associated with defects in these proteins (20,33–35).
Radiation and chemotherapy as therapies for cancer often have serious side effects that limit their efficacy. Modulation of checkpoint-regulated responses to these types of treatment appears to be a potential strategy to sensitize tumor cells to DNA damaging agents (17). Checkpoint kinase 2 acts as a mediator of DNA damage signaling and also acts as a barrier to tumorigenesis (36). There is evidence in favor of the therapeutic value of Chk2 inhibitors (37,38). Checkpoint kinase 2 inhibitors are reported to augment the effect of various cytotoxic drugs, e.g. doxorubicin (39), cisplatin (40), and paclitaxel (41).
The side effects of radiation therapy have been reported as more serious. As these side effects are in part determined by p53-mediated apoptosis, temporary suppression of p53 has been suggested as a therapeutic strategy to prevent damage to normal tissues during treatment of p53-deficient tumors (42,43). The p53 response to DNA breaks induced by radiation and certain chemical agents is controlled by Chk2 (36). Studies showed that Chk2 deficiency conferred radioresistance and that Chk2 plays a critical role in p53 function in response to ionizing radiation (IR) by regulating its transcriptional activity and its stability, indicating the utility of Chk2 inhibitors as radioprotectants for normal cells (44,45). Thus, Chk2 inhibitors may be useful drugs for reducing the side effects of cancer therapy and other types of stress associated with p53 activation (46,47).
Agents that target checkpoint kinases have provided impressive preclinical evidence that this approach will yield tumor-specific potentiating agents and may have broad therapeutic utility. Apart from 2-arylbenzimidazole (48), only a few selective Chk2 inhibitors have been reported: NSC 109555 (49), VRX0466617 (50), isothiazole carboxamides (51), and PV 1019 (52). There are various published inhibitors of Chk1 (staurosporin, Go6976, SB-218078, ICP-1, CEP-3891, and AZD7762) (53) and of both Chk1 and Chk2 (TAT-S216A, UCN-01, and debromohymenialdisine (54,55); CEP-6367 and sulforaphane (18,56,57)).
The past decade has witnessed the development of checkpoint kinase inhibitors for the treatment of cancer. Three checkpoint kinase inhibitors have already entered clinical trials since 2005 (58). The pharmaceutical industry strives to explore novel scaffolds for checkpoint kinase inhibition.
In this study, four topological descriptors termed superaugmented eccentric distance sum connectivity indices and their topochemical versions have been conceptualized and successfully utilized along with existing TIs for the development of models for the prediction of checkpoint kinase (Chk2) inhibitory activity of 2-arylbenzimidazoles.
Methodology

Calculation of topological indices
The values of $^{SED}\xi^{c}_{N}$ were calculated for all possible structures with three, four, and five vertices containing one heteroatom (Figures 1 and 2) using an in-house computer program.
Figure 1: Index values for all possible structures with three, four, and five vertices containing one heteroatom. Cpd no., compound number. Columns C1 to C4 give $^{SED}\xi^{c}_{C1}$ to $^{SED}\xi^{c}_{C4}$. The structure drawings (each containing one N atom) are not reproduced here.

Cpd no.   C1        C2        C3        C4
1         1.974     2.397     2.661     2.803
2         2.484     3.809     5.396     7.16
3         0.363     0.395     0.428     0.461
4         6.585     14.858    32.737    70.8
5         6.244     13.833    29.931    63.636
6         3.872     6.435     9.815     13.806
7         3.186     4.168     4.859     5.261
8         1.992     4.063     8.274     16.834
9         1.027     1.286     1.456     1.551
10        1.217     1.849     2.613     3.473
11        1.348     2.084     2.955     3.906
12        0.294     0.338     0.374     0.404
13        0.364     0.468     0.58      0.701
14        0.089     0.099     0.109     0.119
15        8.642     20.158    45.688    101.092
16        8.755     20.431    46.17     101.674
17        8.381     19.282    42.945    93.253
18        8.027     18.187    39.998    85.988
19        11.374    28.617    69.097    161.698
20        11.814    30.236    74.188    176.075
21        10.299    23.831    52.393    111.239
22        4.898     8.667     14.128    21.096
23        0.774     1.059     1.37      1.702
24        3.811     8.09      17.153    36.323
25        0.759     1.143     1.6       2.107
26        0.688     1.028     1.432     1.883
27        0.816     1.247     1.75      2.293
28        3.387     7.274     15.4      32.248
29        3.508     7.606     16.21     34.098
30        3.708     8.22      17.939    38.678
31        3.475     7.623     16.448    35.052
32        1.362     2.775     5.647     11.482
33        1.466     3.014     6.189     12.692
34        0.319     0.452     0.599     0.757
35        0.294     0.414     0.547     0.692
36        0.301     0.485     0.713     0.966
37        0.26      0.408     0.596     0.814
38        1.085     2.358     5.078     10.859
39        0.099     0.128     0.159     0.192
40        0.106     0.137     0.17      0.206
41        0.135     0.19      0.251     0.317
42        0.039     0.047     0.057     0.067
43        2.719     5.886     12.625    26.888
44        2.45      5.029     10.24     20.729
45        2.488     5.327     11.304    23.824
46        3.814     8.996     20.596    46.037
47        3.677     8.686     19.956    44.831
48        3.156     7.027     15.178    32.057
49        3.378     7.661     16.872    36.289
50        1.034     2.159     4.467     9.176
51        1.114     2.437     5.281     11.367
52        0.93      1.905     3.874     7.834
53        0.539     1.128     2.354     4.904
54        0.501     1.017     2.061     4.175
55        0.492     0.989     1.986     3.988
56        2.085     3.322     4.84      6.533
57        1.909     2.982     4.315     5.838
58        1.246     2.556     5.237     10.716
59        1.331     2.761     5.721     11.833
60        1.303     2.647     5.371     10.89
61        0.253     0.319     0.39      0.467
Superaugmented eccentric distance sum connectivity indices
Superaugmented eccentric distance sum connectivity indices, $^{SED}\xi^{c}_{N}$, proposed in this study can be defined as the inverse of the summation of quotients of the product of adjacent vertex degrees and the product of the squared distance sum and eccentricity of the concerned vertex for all vertices in a hydrogen-suppressed molecular graph. It can be expressed as follows:

$$^{SED}\xi^{c}_{N} = \left[ \sum_{i=1}^{n} \frac{M_i}{E_i^{N} \cdot S_i^{2}} \right]^{-1} \qquad (1)$$

where $M_i$ is the product of degrees of all the vertices ($v_j$) adjacent to vertex $i$ and can be easily obtained by multiplying all the non-zero row elements in the augmentative adjacency matrix, $E_i$ is the eccentricity, $S_i$ is the distance sum of vertex $i$, $n$ is the number of vertices in the graph, and $N$ is equal to 1, 2, 3, 4 for superaugmented eccentric distance sum connectivity indices-1, -2, -3, -4, respectively.

Figure 2: Calculation of values of superaugmented eccentric distance sum connectivity topochemical index-1 ($^{SED}\xi^{c}_{C1}$), index-2 ($^{SED}\xi^{c}_{C2}$), index-3 ($^{SED}\xi^{c}_{C3}$), and index-4 ($^{SED}\xi^{c}_{C4}$) for three isomers of an 11-membered molecule (decylamine). Panels: arbitrary vertex numbering; chemical distance matrices ($D_c$) with $S_{ic}$ and $E_{ic}$; chemical adjacency matrices ($A_c$) with $V_{ic}$; augmentative chemical adjacency matrices with $M_{ic}$. The structure drawings and matrices are not reproduced here; the resulting index values are:

Index                   Isomer 1     Isomer 2     Isomer 3
$^{SED}\xi^{c}_{C1}$    238.801      147.233      20.804
$^{SED}\xi^{c}_{C2}$    1554.158     796.096      52.646
$^{SED}\xi^{c}_{C3}$    9810.431     4155.506     127.118
$^{SED}\xi^{c}_{C4}$    60235.7      21001.523    296.55
Similarly, the topochemical version of superaugmented eccentric distance sum connectivity indices can be defined as the inverse of the summation of quotients of the product of adjacent vertex chemical degrees and the product of the squared chemical distance sum and chemical eccentricity of the concerned vertex for all vertices in a hydrogen-suppressed molecular graph.

It can be expressed as follows:

$$^{SED}\xi^{c}_{CN} = \left[ \sum_{i=1}^{n} \frac{M_{ic}}{E_{ic}^{N} \cdot S_{ic}^{2}} \right]^{-1} \qquad (2)$$

where $M_{ic}$ is the product of chemical degrees of all the vertices ($v_j$) adjacent to vertex $i$ and can be easily obtained by multiplying all the non-zero row elements in the augmentative chemical adjacency matrix, $E_{ic}$ is the chemical eccentricity, $S_{ic}$ is the chemical distance sum of vertex $i$, $n$ is the number of vertices in the graph, and $N$ is equal to 1, 2, 3, 4 for superaugmented eccentric distance sum connectivity topochemical indices-1, -2, -3, -4, respectively (denoted by $^{SED}\xi^{c}_{C1}$, $^{SED}\xi^{c}_{C2}$, $^{SED}\xi^{c}_{C3}$, and $^{SED}\xi^{c}_{C4}$).
Superaugmented eccentric distance sum connectivity topochemical indices can be easily calculated from the chemical distance matrix ($D_c$), the chemical adjacency matrix ($A_c$), and the augmentative chemical adjacency matrix ($A^{\alpha}_{c}$). The calculation of the proposed $^{SED}\xi^{c}_{C1}$, $^{SED}\xi^{c}_{C2}$, $^{SED}\xi^{c}_{C3}$, and $^{SED}\xi^{c}_{C4}$ for three isomers of an 11-membered molecule (decylamine) has been exemplified in Figure 2.
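The calculation described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' in-house program. The function names are hypothetical, the heteroatom weight of 1.167 is the value that appears in the Figure 2 matrices, and the convention that the chemical distance to a vertex j equals the hop count plus the excess weight of j is an assumption read off those matrices.

```python
from collections import deque
from math import prod

HETERO_WEIGHT = 1.167  # relative weight of N versus C, as it appears in Figure 2


def hop_distances(adj, source):
    """Breadth-first search: number of edges from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sed_topochemical_index(adj, hetero, power, weight=HETERO_WEIGHT):
    """Equation 2 for a hydrogen-suppressed graph given as an adjacency dict."""
    w = {v: (weight if v in hetero else 1.0) for v in adj}
    # Chemical degree: row sum of the chemical adjacency matrix.
    chem_degree = {v: sum(w[u] for u in adj[v]) for v in adj}
    total = 0.0
    for i in adj:
        # Assumed convention: hop count plus the excess weight of the target vertex.
        chem_dist = [d + w[j] - 1.0
                     for j, d in hop_distances(adj, i).items() if j != i]
        s_ic = sum(chem_dist)                          # chemical distance sum
        e_ic = max(chem_dist)                          # chemical eccentricity
        m_ic = prod(chem_degree[j] for j in adj[i])    # product of adjacent chemical degrees
        total += m_ic / (e_ic ** power * s_ic ** 2)
    return 1.0 / total


# Unbranched 11-vertex chain with nitrogen at vertex 1 (the linear isomer).
chain = {i: [j for j in (i - 1, i + 1) if 1 <= j <= 11] for i in range(1, 12)}
for power in (1, 2, 3, 4):
    print(power, round(sed_topochemical_index(chain, {1}, power), 3))
```

For the unbranched chain with nitrogen at a terminal vertex, this sketch gives values close to the 238.801 (index-1) and 1554.158 (index-2) reported for the first isomer in Figure 2, which suggests that the assumed conventions match the figure.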
The index values of the proposed topochemical descriptors for all three-, four-, and five-membered isomers containing one heteroatom have been compiled in Figure 1. The discriminating power and degeneracy of the superaugmented eccentric distance sum connectivity topochemical indices were investigated using all possible structures with three, four, and five vertices containing one heteroatom, and the results are given in Table 1. The intercorrelation of the proposed superaugmented eccentric distance sum connectivity indices with Wiener's index, Zagreb indices, the molecular connectivity index, and eccentric connectivity indices was also investigated (Table 2).
Topological indices
Twenty-six descriptors of diverse nature, including the proposed indices (Table 3) (59–75), were used in this study. Though a total of 26 descriptors were employed for the present study, only 14 indices were shortlisted on the basis of non-correlating nature and classification ability. The shortlisted indices used in the present study are defined below.
Wiener's topochemical index ($W_c$)
Wiener's topochemical index (67) is defined as the sum of the chemical distances between all pairs of vertices in a hydrogen-suppressed molecular graph. It is a refined form of the oldest and most widely used distance-based topological index, Wiener's index (76), and this modified index considers the presence and the relative position of heteroatom(s) in a molecular structure. It can be expressed as

$$W_c = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} P_{i_c j_c} \qquad (3)$$

where $P_{i_c j_c}$ is the chemical length of the path that contains the least number of edges between vertices $i$ and $j$ in the graph $G$, and $n$ is the number of vertices in the hydrogen-depleted graph (67).
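Equation 3 can be illustrated with a minimal Python sketch. The helper names are hypothetical, and the chemical path length is assumed to follow the convention of the chemical distance matrices in Figure 2 (hop count plus 0.167 when the target vertex is the nitrogen atom); reference 67 may define it differently.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: number of edges from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wiener_topochemical(adj, hetero, weight=1.167):
    """Half the sum of all entries of the (assumed) chemical distance matrix."""
    total = 0.0
    for i in adj:
        for j, d in hop_distances(adj, i).items():
            if j != i:
                # Excess weight is added only when the target vertex is a heteroatom.
                total += d + (weight - 1.0 if j in hetero else 0.0)
    return total / 2.0


# Unbranched 11-vertex chain; vertex 1 is nitrogen in the topochemical case.
chain = {i: [j for j in (i - 1, i + 1) if 1 <= j <= 11] for i in range(1, 12)}
print(wiener_topochemical(chain, set()))  # plain Wiener index of the chain
print(wiener_topochemical(chain, {1}))    # topochemical variant
```

With an empty heteroatom set the function reduces to the ordinary Wiener index, which makes it easy to check the sketch against known values for simple chains.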
Zagreb indices ($M_1$ and $M_2$)
This pair of indices (74,75), denoted by $M_1$ and $M_2$, was introduced in 1972 and is defined as per Equations 4 and 5.

$$M_1 = \sum_{\text{vertices}} d(i)\,d(i) \qquad (4)$$

$$M_2 = \sum_{\text{edges}} d(i)\,d(j) \qquad (5)$$

where $d(i)$ is the degree of vertex $i$, defined as the number of edges incident on vertex $i$, and $d(i)d(j)$ is the weight of edge $\{i, j\}$.

Similarly, the Zagreb topochemical indices (66) $M_1^c$ and $M_2^c$ are defined as per Equations 6 and 7.

$$M_1^c(G) = \sum_{i=1}^{n} \left(d^c(i)\right)^2 \qquad (6)$$

where $d^c(i)$ is the chemical degree of vertex $i$ and $n$ is the number of vertices.

$$M_2^c(G) = \sum_{ij}^{n} d^c(i)\,d^c(j) \qquad (7)$$

where $d^c(i)d^c(j)$ is the chemical weight of edge $\{i, j\}$ in the hydrogen-suppressed molecular graph and $n$ is the number of edges.
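Equations 4 to 7 can be sketched together in Python, since the topochemical versions differ only in the vertex degrees used. The function name is hypothetical, and the chemical degree is assumed to be the row sum of the chemical adjacency matrix shown in Figure 2 (a neighbouring nitrogen counts 1.167 instead of 1).

```python
def zagreb_indices(adj, hetero=frozenset(), weight=1.167):
    """Return (M1, M2); with no heteroatoms these are the ordinary Zagreb indices."""
    w = {v: (weight if v in hetero else 1.0) for v in adj}
    # Degree of a vertex: sum of the weights of its neighbours.
    degree = {v: sum(w[u] for u in adj[v]) for v in adj}
    m1 = sum(d * d for d in degree.values())
    # Visit every edge once by requiring u < v (vertices are integers here).
    m2 = sum(degree[u] * degree[v] for u in adj for v in adj[u] if u < v)
    return m1, m2


# Unbranched 11-vertex chain; vertex 1 is nitrogen in the topochemical case.
chain = {i: [j for j in (i - 1, i + 1) if 1 <= j <= 11] for i in range(1, 12)}
print(zagreb_indices(chain))                # topostructural M1, M2
print(zagreb_indices(chain, hetero={1}))    # topochemical M1c, M2c
```

Treating the topostructural case as the special case with an empty heteroatom set keeps a single code path for both pairs of indices.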
Connective eccentricity index
The connective eccentricity index (73) can be defined as the summation of the ratios of the degree of a vertex ($V_i$) and its eccentricity ($E_i$) for all vertices in the hydrogen-suppressed molecular structure. It can be expressed by the following equation:

$$C^{\xi} = \sum_{i=1}^{n} \left( \frac{V_i}{E_i} \right) \qquad (8)$$

The eccentricity $E_i$ of a vertex $i$ in a graph $G$ is the path length from vertex $i$ to the vertex $j$ that is farthest from $i$ ($E_i = \max(d_{ij});\ j \in G$).
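As a minimal sketch of Equation 8, the following Python function computes the connective eccentricity index for a graph given as an adjacency dictionary. The function names are hypothetical; only the topostructural form defined above is shown.

```python
from collections import deque


def eccentricity(adj, source):
    """Largest hop distance from source to any other vertex (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def connective_eccentricity(adj):
    """Sum over all vertices of degree divided by eccentricity (Equation 8)."""
    return sum(len(adj[v]) / eccentricity(adj, v) for v in adj)


# Three-vertex chain: degrees 1, 2, 1 and eccentricities 2, 1, 2.
propane_like = {1: [2], 2: [1, 3], 3: [2]}
print(connective_eccentricity(propane_like))
```

Central, highly connected vertices contribute most to the sum because they combine a high degree with a low eccentricity.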
Data set
A data set (48) comprising 47 analogs of 2-arylbenzimidazole was selected for the present investigation. The basic structure for these analogs is depicted in Figure 3, and the various substituents are listed in Figure 4. The values of the 26 descriptors (Table 3) used in this study were calculated for all the analogs in the data set using an in-house computer program. Compounds having reported IC50 values of ≤25 nM were considered to be active, whereas those possessing IC50 values >25 nM were treated as inactive for the purpose of the present study.
Decision tree
A decision tree provides a useful solution for many classification problems where large data sets are used and the information contained is complex. A decision tree (generally defined) is a tree whose internal nodes are tests (on input patterns) and whose leaf nodes are categories (of patterns). A decision tree assigns a class number (or output) to an input pattern by filtering the pattern down through the tests in the tree. Each test has mutually exclusive and exhaustive outcomes.

Decision trees are constructed beginning with the root of the tree and proceeding down to its leaves. Decision trees are a rapid and effective method of classifying data set entries and can provide good decision support capabilities (77,78). In this study, the decision tree was grown to identify the importance of TIs. In a decision tree, the molecules at each parent node are classified, based on the index value, into two child nodes. The prediction for a molecule reaching a given terminal node is obtained by majority vote of the molecules reaching the same terminal node in the training set. In this study, the R program (version 2.1.0; University of Auckland, Auckland, New Zealand) along with the RPART library was used to grow the decision tree. The active compounds were labeled 'A' (n = 18) and the inactive compounds were labeled 'B' (n = 29). Each analog was assigned a biological activity, which was then compared with the reported Chk2 inhibitory activity.
Random forest
A random forest (RF) was grown for checkpoint kinase (Chk2) inhibitory activity. A random forest grows numerous classification trees. To classify a new object from an input vector, the input vector is passed down each of the trees in the forest. Each tree gives a classification; that is, the tree 'votes' for that class. The forest chooses the classification having the most votes (over all the trees in the forest) (79). In this study, the RFs were grown with the R program (version 2.1.0) using the RF library.
Moving average analysis
Moving average analysis (MAA) constitutes the basis for the development of single topological index-based models (70,80). For the selection and evaluation of range-specific features, exclusive activity ranges were discovered from the frequency distribution of the response level, and the active range was subsequently identified by analyzing the resulting data by maximization of the moving average with respect to active compounds (<35% = inactive, 35–65% = transitional, >65% = active). The checkpoint kinase (Chk2) inhibitory activity assigned to each compound was compared with the reported biological activity. The average IC50 (nM) values for each range and activity were also calculated.
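The labeling step of MAA can be sketched in Python as follows. This covers only the final assignment of a range to the inactive, transitional, or active class from its percentage of active compounds; the procedure used to discover the ranges is not reproduced, so the cut points are hypothetical inputs, as are the function name and the toy data.

```python
from bisect import bisect_right


def label_ranges(index_values, is_active, cut_points):
    """Label each index range by its share of active compounds.

    cut_points must be sorted; they split the index axis into len(cut_points) + 1 ranges.
    """
    counts = [[0, 0] for _ in range(len(cut_points) + 1)]  # [n_active, n_total]
    for value, active in zip(index_values, is_active):
        k = bisect_right(cut_points, value)
        counts[k][0] += int(active)
        counts[k][1] += 1
    labels = []
    for n_active, n_total in counts:
        if n_total == 0:
            labels.append((None, "empty"))
            continue
        pct = 100.0 * n_active / n_total
        if pct > 65:
            label = "active"
        elif pct < 35:
            label = "inactive"
        else:
            label = "transitional"
        labels.append((pct, label))
    return labels


# Hypothetical toy data: eight compounds, two cut points.
values = [1, 2, 3, 11, 12, 21, 22, 23]
active = [False, False, False, True, False, True, True, True]
print(label_ranges(values, active, cut_points=[10, 20]))
```

A compound is then predicted to carry the label of the range into which its index value falls, which is what allows a single index to act as a classification model.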
Table 1: Comparison of the discriminating power and degeneracy of $^{SED}\xi^{c}_{C1}$, $^{SED}\xi^{c}_{C2}$, $^{SED}\xi^{c}_{C3}$, and $^{SED}\xi^{c}_{C4}$ (columns C1 to C4) using all possible structures with three, four, and five vertices containing one heteroatom

                     C1           C2           C3           C4
For three vertices
  Minimum value      0.363        0.395        0.428        0.461
  Maximum value      2.484        3.809        5.396        7.16
  Ratio              1:6.843      1:9.643      1:12.61      1:15.54
  Degeneracy(a)      0/3          0/3          0/3          0/3
For four vertices
  Minimum value      0.089        0.099        0.109        0.119
  Maximum value      6.585        14.858       32.737       70.8
  Ratio              1:73.989     1:150.080    1:300.34     1:594.96
  Degeneracy         0/11         0/11         0/11         0/11
For five vertices
  Minimum value      0.039        0.047        0.057        0.067
  Maximum value      11.814       30.236       74.188       176.075
  Ratio              1:302.923    1:643.319    1:1301.54    1:2627.99
  Degeneracy         0/47         0/47         0/47         0/47

(a) Degeneracy: number of compounds having same values/total number of compounds with same number of vertices.

Table 2: Intercorrelation matrix (χA, molecular connectivity topochemical index; ξc, eccentric connectivity topochemical index; M1c and M2c, Zagreb topochemical indices; Wc, Wiener's topochemical index; C1 to C4, $^{SED}\xi^{c}_{C1}$ to $^{SED}\xi^{c}_{C4}$)

        χA     ξc      M1c     M2c     Wc      C1      C2       C3      C4
χA      1      0.939   0.59    0.619   0.743   -0.01   0.061    0.106   0.132
ξc             1       0.599   0.662   0.67    -0.07   -0.567   0.062   0.093
M1c                    1       0.979   0.016   -0.62   -0.567   -0.53   -0.496
M2c                            1       0.045   -0.57   -0.502   -0.46   -0.422
Wc                                     1       0.548   0.58     0.593   0.594
C1                                             1       0.993    0.98    0.965
C2                                                     1        0.996   0.988
C3                                                              1       0.998
C4                                                                      1

Data analysis
The sensitivity and specificity values, which represent the classification accuracies for the active and inactive compounds, respectively, were calculated. The randomness of the model was also assessed by calculating the Matthews correlation coefficient (MCC). MCC values range between -1 and +1 and indicate the potential of the model. The MCC takes both the sensitivity and the specificity into account, and it is generally used as a balanced measure in dealing with data imbalance situations (81).
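The performance measures named above follow directly from the confusion matrix. The sketch below uses the standard definitions of sensitivity, specificity, overall accuracy, and MCC; the function name and the example counts are hypothetical and are not results from this study (they are merely chosen to sum to 18 actives and 29 inactives).

```python
from math import sqrt


def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, overall accuracy, and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)           # accuracy on active compounds
    specificity = tn / (tn + fp)           # accuracy on inactive compounds
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


# Hypothetical counts for illustration only.
example = classification_metrics(tp=15, tn=25, fp=4, fn=3)
print({name: round(value, 3) for name, value in example.items()})
```

Because the MCC uses all four cells of the confusion matrix, it stays informative when the classes are unequal in size, as with the 18 active and 29 inactive analogs here.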
The results are summarized in Tables 4 and 5 and Figures 5 and 6. The validation of the decision tree (DT)-based model and the self-consistency test were performed by the 10-fold cross-validation (CV) method, in which the data set was randomly split into 10 folds. The model was developed using nine randomly selected folds, and the prediction was done on the remaining fold. The goodness of the DT-based model was also assessed by calculating the specificity and sensitivity. The 10-fold cross-validation results are presented in Table 4.
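The fold construction described above can be sketched in Python as follows. The study itself used R; this stand-in only illustrates how 47 compounds can be split into 10 disjoint folds, with each fold held out once. The function name and the fixed seed are assumptions made for reproducibility of the illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


folds = k_fold_indices(47, k=10)
for held_out, test_idx in enumerate(folds):
    # Train on the other nine folds, predict on the held-out fold.
    train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    print(held_out, len(train_idx), len(test_idx))
```

Every compound is predicted exactly once by a model that never saw it during training, which is why the cross-validated accuracy is typically lower than the training accuracy.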
Results and Discussion
The successful application of many topological descriptors is somewhat limited owing to low discriminating power and high degeneracy. There is always a strong need for the development of descriptors and approaches that could provide explicit information on the molecular aspects responsible for drug action (1). Moreover, pharmacogenomics (82), combinatorial chemistry (83,84), and high-throughput screening (85) make it possible to obtain and evaluate thousands of compounds in a short time. These technologies have generated new challenges for computational scientists, as they demand novel approaches to computer-aided lead discovery and optimization in an accelerated way (86).
As the structure of a compound depends on the connectivity of its constituent atoms, TIs based on connectivity can reveal the role of structural and substructural information of molecules in estimating biological activity and evaluating toxicity. Topological indices developed for predicting physicochemical properties and biological activities of chemical substances can be used for drug design (87,88). The applications of TIs in drug design include lead discovery and lead optimization, virtual screening, structure activity/property studies, structure pharmacokinetics studies, and structure toxicity relationships. Recently, these are also being used in similarity/dissimilarity studies, combinatorial chemistry, studies of molecular chirality, isomer discrimination, and molecular complexity (1,3).
As shown in Figure 2, the value of SEDncC1
changes by a factor of11 (from 238.801 to 20.804), the value of SEDnc
C2 changes by a fac-tor of 30 (1554.158–52.646), the value of SEDnc
C3 changes by a
Table 3: Topostructural and topochemical indices
Code Index References
A1 Molecular connectivity topochemical index (59,60)A2 Eccentric adjacency topochemical index (61)A3 Augmented eccentric connectivity
topochemical index(62)
A4 Superadjacency topochemical index (63)A5 Eccentric connectivity topochemical index (64)A6 Connective eccentricity topochemical index (65)A7 Zagreb topochemical index, MC
1 (66)A8 Zagreb topochemical index, MC
2 (66)A9 Wiener's topochemical index (67)A10 Superaugmented eccentric connectivity
topochemical index-1(68)
A11 Superaugmented eccentric distance sumconnectivity topochemical index-1
–
A12 Superaugmented eccentric distance sumconnectivity topochemical index-2
–
A13 Superaugmented eccentric distance sumconnectivity topochemical index-3
–
A14 Superaugmented eccentric distance sumconnectivity topochemical index-4
–
A15 Molecular connectivity index (69)A16 Eccentric adjacency index (70)A17 Augmented eccentric connectivity index (71)A18 Superadjacency index (63)A19 Eccentric connectivity index (72)A20 Connective eccentricity index (73)A21 Zagreb index, M1 (74,75)A22 Zagreb index, M2 (74,75)A23 Superaugmented eccentric distance sum
connectivity index-1–
A24 Superaugmented eccentric distance sumconnectivity index-2
–
A25 Superaugmented eccentric distance sumconnectivity index-3
–
A26 Superaugmented eccentric distance sumconnectivity index-4
–
NH
N
ORR'
A B
C D
NH
N
R
O
NH
N
O
H2NOR
NH
N
O
XR'H2N
H2N
Figure 3: Basic structures of 2-arylbezimidazole analogs (48).
Superaugmented Eccentric Distance Sum Connectivity Indices
Chem Biol Drug Des 2012; 79: 38–52 45
factor of about 77 (9810.431–127.118), and the value of SEDξcC4 changes by a factor of 203 (60235.7–296.55) with a minor change in the branching of an 11-membered molecule containing one heteroatom. These descriptors have high discriminating power, which is defined as the ratio of the highest to the lowest value for all possible structures with the same number of vertices. The discriminating power of SEDξcC1, SEDξcC2, SEDξcC3, and SEDξcC4 is 302.9, 643.31, 1301.54, and 2627.99, respectively, for all possible structures containing only five vertices (Table 1).
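The ratio definition above can be checked against the extreme values quoted in the text. The following is a minimal sketch; the function name `discriminating_power` is a hypothetical helper introduced here for illustration and is not part of the authors' in-house program.

```python
def discriminating_power(values):
    """Ratio of the highest to the lowest index value in a set of structures."""
    lowest, highest = min(values), max(values)
    if lowest <= 0:
        raise ValueError("index values must be positive")
    return highest / lowest


# Extreme values quoted in the text for the 11-membered example.
first_ratio = discriminating_power([127.118, 9810.431])
sed_c4_ratio = discriminating_power([296.55, 60235.7])

print(f"first quoted pair: factor of about {first_ratio:.0f}")
print(f"SEDξcC4 pair: factor of about {sed_c4_ratio:.0f}")
```

Both ratios round to the factors of 77 and 203 stated in the text.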
The high discriminating power of the proposed descriptors renders them extremely sensitive toward any change in molecular structure. Indices having a discriminating power ≥100 for structures containing only five vertices are treated as 'fourth-generation'
Compound No. | R', R, X (as extracted) | SEDξcC3 | SEDξcC4 | M1C | Wc | Predicted (SEDξcC3) | Predicted (SEDξcC4) | Predicted (M1C) | Predicted (Wc) | Reported activity
A1 | 5-CO2H, - | 91586.12 | 896147.4 | 146.833 | 1786.002 | - | - | - | - | -
A2 | 5-CN, - | 79292.74 | 764564.5 | 137.089 | 1591.261 | - | - | - | - | -
A3 | 5-NO2, - | 93857.5 | 925718.4 | 148.586 | 1795.521 | - | - | - | - | -
A4 | 5-NH2, - | 59235.27 | 532383.1 | 133.423 | 1404.508 | - | - | - | - | -
A5 | 5-SO2NH2, - | 119298.8 | 1317554 | 188.871 | 2114.952 | - | ± | ± | ± | -
A6 | 5-CONH2, - | 91449.81 | 894421.7 | 145.643 | 1784.01 | - | - | - | - | -
A7 | 5-CONHMe, - | 113258.4 | 1145389 | 150.005 | 2001.937 | - | - | - | - | -
A8 | 5-CONMe2, - | 127095.1 | 1291854 | 156.367 | 2222.032 | - | - | - | ± | -
A9 | 4-CONH2, - | 72988.76 | 677747 | 145.643 | 1736.01 | - | - | - | - | -
B1 | -, - | 8178.405 | 52418.31 | 103.425 | 634.61 | - | - | - | - | -
B2 | -, C2H5O, - | 32673.71 | 277181.9 | 120.977 | 1049.688 | - | - | - | - | -
B3 | -, O, - | 39180.8 | 334152.1 | 155.643 | 1562.043 | - | - | - | - | -
B4 | -, O, - | 64553.36 | 584108.6 | 145.643 | 1700.01 | - | - | - | - | -
B5 | -, H3C, - | 12639.64 | 88537.02 | 109.425 | 748.53 | - | - | - | - | -
B6 | -, - | 23890.89 | 184135.9 | 129.425 | 1112.29 | - | - | - | - | -
B7 | -, O Cl Cl, - | 137763.8 | 1548364 | 188.807 | 2171.079 | - | ± | ± | ± | -
B8 | -, O Cl, - | 177984.9 | 2066145 | 167.225 | 2038.566 | + | + | ± | ± | +
C1 | -, Cl Cl, - | 197741.5 | 2308991 | 188.807 | 2279.079 | + | + | ± | + | +
C2 | -, Cl CH3, - | 167856.9 | 1859651 | 173.225 | 2253.625 | + | + | + | + | +
Figure 4: Relationship of superaugmented eccentric distance sum connectivity topochemical indices, Zagreb topochemical index, and Wiener's topochemical index with checkpoint kinase (Chk2) inhibitors. (+) active compound, (-) inactive compound, and (±) compound in transitional range.
topological descriptors (68,89). SEDξcC1, SEDξcC2, SEDξcC3, and SEDξcC4 did not exhibit any degeneracy for all possible structures with three, four, and five vertices.
The extremely low degeneracy of the proposed indices ensures enhanced sensitivity toward minor changes in branching, connectivity, and molecular structure. The intercorrelation between the proposed superaugmented eccentric distance sum connectivity topochemical indices and other well-known TIs was also investigated. Pairs of TIs with r ≥ 0.97 are considered highly intercorrelated, those with 0.90 ≤ r ≤ 0.97 appreciably correlated, those with 0.50 ≤ r ≤ 0.89 weakly correlated, and finally the pairs
Compound No. | R', R, X (as extracted) | SEDξcC3 | SEDξcC4 | M1C | Wc | Predicted (SEDξcC3) | Predicted (SEDξcC4) | Predicted (M1C) | Predicted (Wc) | Reported activity
C3 | -, CF3 Cl, - | 249350.8 | 2968322 | 208.276 | 3011.165 | ± | - | + | ± | +
C4 | -, F CF3, - | 207690.4 | 2276202 | 196.532 | 2961.227 | + | + | ± | ± | +
C5 | -, CH3 CH3, - | 140599.3 | 1459086 | 157.643 | 2228.171 | + | ± | ± | + | +
C6 | -, CONH2, - | 208910.9 | 2357104 | 164.893 | 2535.002 | ± | - | ± | ± | -
C7 | -, OH, - | 111166.6 | 1114167 | 153.752 | 1999.253 | - | - | - | - | -
C8 | -, OH Cl, - | 196827.6 | 2289581 | 175.334 | 2257.954 | + | + | + | + | +
C9 | -, - | 209235.1 | 2333511 | 171.643 | 2650.332 | ± | + | + | ± | +
C10 | -, Cl, - | 148340.1 | 1639973 | 167.225 | 2019.566 | + | ± | ± | ± | +
C11 | -, OMe, - | 184453.3 | 2076658 | 158.529 | 2282.825 | + | + | ± | + | +
C12 | -, Me, - | 125457 | 1298135 | 151.643 | 2014.09 | - | ± | - | ± | -
C13 | -, OMe OMe, - | 223391.7 | 2536058 | 171.415 | 2761.638 | ± | - | ± | ± | -
C14 | -, NO2, - | 215479.8 | 2453650 | 167.836 | 2548.014 | ± | - | ± | ± | -
C15 | -, COOH, - | 168234 | 1787241 | 166.083 | 2480.243 | + | ± | ± | ± | -
C16 | -, CONEt2, - | 344574.6 | 4076895 | 184.285 | 3636.027 | - | - | ± | ± | -
C17 | -, NHSO2CH3, - | 412604.4 | 5519888 | 209.816 | 3313.389 | - | - | + | ± | -
D1 | -, -, -SO2NH- | 114792.8 | 1299518 | 202.267 | 2681.661 | - | ± | + | ± | +
D2 | -, -, -CONH- | 130955.8 | 1346909 | 155.705 | 2194.43 | - | ± | - | ± | -
D3 | -, -, -Bond- | 55669.42 | 497660.2 | 137.425 | 1507.13 | - | - | - | - | -
D4 | Cl, -, -SO2NH- | 161985.7 | 1963304 | 177.287 | 2479.373 | + | + | + | ± | +
D5 | Cl, -, -CONH- | 242726.9 | 2943229 | 177.287 | 2479.373 | ± | - | + | ± | -
D6 | Cl, -, -Bond- | 112354.5 | 1213102 | 159.007 | 1731.546 | - | - | ± | - | -
D7 | Cl Cl, -, -SO2NH- | 178305.8 | 2165525 | 245.43 | 3319.478 | + | + | + | ± | +
D8 | Cl Cl, -, -CONH- | 267916.7 | 3267598 | 167.419 | 2719.215 | ± | - | ± | ± | -
D9 | Cl, -, -S- | 228733.3 | 2836739 | 152.507 | 2223.325 | - | - | ± | + | +
D10 | Cl, -, -S(O)- | 178413 | 2157663 | 165.842 | 2431.061 | + | + | + | + | +
D11 | Cl, -, -S(O)2- | 112538.2 | 1330680 | 177.843 | 2642.797 | - | ± | + | ± | +
D12 | -, -OCH2- | 132413.5 | 1367023 | 132.336 | 2052.09 | - | ± | - | ± | -
D13 | OH, -, -S- | 173856.5 | 1968678 | 146.007 | 2203.012 | + | + | + | ± | +
Note: (+) active compound, (-) inactive compound, and (±) compound in transitional range.
Figure 4: Continued.
of TIs with r < 0.50 are not intercorrelated (90). As indicated in Table 2, SEDξcC1, SEDξcC2, SEDξcC3, and SEDξcC4 are not intercorrelated with the well-known χA, ξc, M1, and M2. However, these indices were found to be weakly intercorrelated with Wc and highly intercorrelated with each other, as they are based on similar principles/matrices. The pairs χA and ξc, and M1 and M2, are highly intercorrelated, whereas χA and Wc, ξc and Wc, ξc and M1, and ξc and M2 are weakly intercorrelated; M1 and M2 are not intercorrelated with Wc.
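The intercorrelation bands described above can be sketched as a small classifier applied to a Pearson coefficient. This is an illustrative sketch only: the helper names are hypothetical, the use of the absolute value of r is an assumption, and so is the handling of the edges the stated bands leave open (r = 0.97 is treated as highly intercorrelated, and values between 0.89 and 0.90 as weakly correlated). The five value pairs are the SEDξcC3 and SEDξcC4 entries for compounds A1 to A5 in Figure 4, used purely as a small demonstration subset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / sqrt(sxx * syy)


def correlation_class(r):
    """Assign r to one of the four bands used in the text (edge handling assumed)."""
    magnitude = abs(r)  # assumption: the bands apply to |r|
    if magnitude >= 0.97:
        return "highly intercorrelated"
    if magnitude >= 0.90:
        return "appreciably correlated"
    if magnitude >= 0.50:
        return "weakly correlated"
    return "not intercorrelated"


# SEDξcC3 and SEDξcC4 values for compounds A1-A5 (Figure 4), demonstration only.
sed_c3 = [91586.12, 79292.74, 93857.5, 59235.27, 119298.8]
sed_c4 = [896147.4, 764564.5, 925718.4, 532383.1, 1317554]

r = pearson_r(sed_c3, sed_c4)
print(correlation_class(r))
```

On this subset the two proposed indices fall in the highly intercorrelated band, in line with the statement that they are highly intercorrelated with each other; the coefficients in Table 2 were computed by the authors on their own structure set, not on this subset.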
In this study, decision tree (DT)-, random forest (RF)-, and moving average analysis (MAA)-based models were developed for the prediction of checkpoint kinase (Chk2) inhibitory activity of 2-arylbenzimidazoles. The decision tree was built using 26 TIs of diverse nature. The index at the root node is the most important, and the importance of an index decreases as the length of the tree increases. The classification of 2-arylbenzimidazole analogs as active or inactive using a single tree, based on A13, A14, and A6, is illustrated in Figure 5 (each descriptor is denoted by an alphanumerical abbreviation that refers to Table 3). The decision tree identified A13 (SEDξcC3) as the most important index. The decision tree classified the 2-arylbenzimidazole analogs in the training set with an accuracy of 96% and in 10-fold cross-validation with an accuracy of 76.6%. The specificity and sensitivity of the DT-based model in the training set were of the order of 96.5% and 94.4%, respectively (Table 4). The specificity and sensitivity of the DT-based model in the cross-validated set with respect to inactive analogs were of the order of 82.7% and 66.6%, respectively. The values of the Matthews correlation coefficient (MCC) for the DT-based model in the training set and the cross-validated set are 0.9 and 0.03, respectively, suggesting the randomness and robustness of the model. The values of specificity, sensitivity, and MCC are shown in Table 4.
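As an illustration of how such performance measures follow from a confusion matrix, the sketch below applies the standard definitions of sensitivity, specificity, accuracy, and MCC to the training-set counts of the decision tree in Table 4. The function name `classification_metrics` is hypothetical, and treating "active" as the positive class is an assumption about how the table is read.

```python
from math import sqrt


def classification_metrics(tp, fn, fp, tn):
    """Standard binary-classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # actives correctly recognised
    specificity = tn / (tn + fp)   # inactives correctly recognised
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Decision tree, training set (Table 4): 17 actives and 28 inactives predicted
# correctly, 1 active predicted inactive, 1 inactive predicted active.
training = classification_metrics(tp=17, fn=1, fp=1, tn=28)

for name, value in training.items():
    print(f"{name}: {value:.3f}")
```

With these counts the sketch gives a sensitivity of about 94.4%, a specificity of about 96.6%, an accuracy of about 96%, and an MCC of about 0.9, matching the training-set figures reported above.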
The RFs were grown with the 26 topological descriptors listed in Table 3. The importance of each node was determined by the mean
Table 4: Confusion matrix for checkpoint kinase (Chk2) inhibitory activity and recognition rate of models based on decision tree and random forest (RF)
Model | Description | Ranges | Number of compounds predicted active | Number of compounds predicted inactive | Specificity (%) | Sensitivity (%) | Matthews correlation coefficient
Decision tree | Training set | Active | 17 | 1 | 96.5 | 94.4 | 0.9
Decision tree | Training set | Inactive | 1 | 28 | | |
Decision tree | Cross-validated set | Active | 12 | 6 | 82.7 | 66.6 | 0.03
Decision tree | Cross-validated set | Inactive | 5 | 24 | | |
RF | | Active | 16 | 2 | 82.7 | 88.8 | 0.098
RF | | Inactive | 5 | 24 | | |
Table 5: Proposed model for the prediction of checkpoint kinase inhibitors
Index | Nature of range | Index value | Total compounds in the range | Number of compounds predicted correctly | Overall accuracy of prediction (%) | Average IC50 (nM)
SEDξcC3 | Lower inactive | <140599.3 | 24 | 21 | 90 | 1239.44 (1414.2)
SEDξcC3 | Active | 140599.3–207690.4 | 13 | 12 | | 66.49 (10.95)
SEDξcC3 | Transitional | >207690.4–<267916.7 | 7 | NA | | 123.38
SEDξcC3 | Upper inactive | ≥267916.7 | 3 | 3 | | 3620 (3620)
SEDξcC4 | Lower inactive | <1298135 | 17 | 17 | 94.59 | 1672.7 (1672.7)
SEDξcC4 | Transitional | 1298135–<1859651 | 10 | NA | | 205.46
SEDξcC4 | Active | 1859651–2357104 | 11 | 11 | | 10.4 (10.4)
SEDξcC4 | Upper inactive | >2357104 | 9 | 7 | | 1301.41 (1671.42)
M1C | Inactive | <157.64 | 19 | 18 | 90.32 | 1508 (1590.444)
M1C | Lower transitional | 157.64–<171.64 | 9 | NA | | 132.33
M1C | Lower active | 171.64–184.285 | 6 | 5 | | 120.417 (8.5)
M1C | Upper transitional | >184.285–<202.267 | 7 | NA | | 123.38
M1C | Upper active | 202.267–245.43 | 6 | 5 | | 14.4167 (9.1)
Wc | Lower inactive | <2014.09 | 16 | 16 | >99 | 1152.25 (1152.25)
Wc | Lower transitional | 2014.09–<2223.32 | 9 | NA | | 146.7
Wc | Active | 2223.32–2431.06 | 7 | 7 | | 10.843 (10.843)
Wc | Upper transitional | >2431.06 | 15 | NA | | 165.7
NA, not applicable. Values in brackets are based on correctly predicted analogs in the particular range.
decrease in accuracy. The RF classified the 2-arylbenzimidazole analogs as either active or inactive with an accuracy of 83%. The specificity and sensitivity were of the order of 82.7% and 88.8%, respectively, and the value of MCC was found to be 0.098 (Table 4).
Using a single index at a time, four independent MAA-based models were developed using SEDξcC3, SEDξcC4, M1C, and Wc. The proposed models are shown in Table 5. The methodology used in this study aims at the development of suitable models for providing lead molecules through exploitation of the active ranges in the proposed models. These models are unique and differ widely from conventional QSAR models. Both systems of modeling have their own advantages and limitations. In the present case, the modeling system adopted has the distinct advantage of identifying narrow active ranges, which may be erroneously skipped during routine regression analysis in conventional QSAR modeling (68). As the ultimate goal of modeling is to provide lead structures, these active ranges can play a vital role in lead identification.
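To make the range-based character of these models concrete, the sketch below assigns a compound to a range from its SEDξcC4 value, using the boundaries listed for that index in Table 5. The function name `classify_sed_c4` is hypothetical; the code only restates the tabulated ranges and is not the authors' procedure for deriving them.

```python
def classify_sed_c4(value):
    """Map a SEDξcC4 value to its range in the MAA-based model (Table 5)."""
    if value < 1298135:
        return "lower inactive"
    if value < 1859651:
        return "transitional"
    if value <= 2357104:
        return "active"
    return "upper inactive"


# SEDξcC4 values of three compounds taken from Figure 4.
examples = {"A1": 896147.4, "A5": 1317554, "C2": 1859651}

for compound, value in examples.items():
    print(compound, classify_sed_c4(value))
```

The assignments for A1, A5, and C2 (lower inactive, transitional, and active) agree with the (-), (±), and (+) entries given for these compounds under SEDξcC4 in Figure 4. Unlike a regression equation, a model of this kind can represent a narrow active window bounded by inactive ranges on both sides.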
Retrofit analysis of the data (Figure 4 and Table 5) reveals that the MAA-based models derived from SEDξcC3, SEDξcC4, M1C, and Wc correctly predicted analogs with regard to checkpoint kinase (Chk2) inhibitory activity with accuracies of 90%, 94.5%, 90.32%, and >99%, respectively. Transitional ranges were observed in all four models, indicating a gradual change in checkpoint kinase inhibitory activity. The active ranges of the models based on SEDξcC4 and Wc correctly predicted the checkpoint kinase (Chk2) inhibitory activity of analogs with an accuracy of >99%. As observed from Table 5 and Figure 6, the average IC50 of correctly predicted analogs in the active ranges of all four models varied only from 8.5 to ~11 nM, indicating exceptionally high potency. High accuracy of prediction combined with high potency renders the active ranges of the proposed models extremely beneficial for providing lead structures for the development of potent checkpoint kinase inhibitors.
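The overall accuracies quoted above appear to be consistent with counting only the compounds that fall outside the transitional ranges. The sketch below reproduces them from the counts in Table 5 under that assumption, which is inferred from the tabulated numbers rather than stated as a formula in the text; the helper `overall_accuracy` is hypothetical.

```python
def overall_accuracy(ranges):
    """Percentage of correct predictions over all non-transitional ranges.

    `ranges` is a list of (total, correct) pairs; `correct` is None for
    transitional ranges, which are marked NA in Table 5 and excluded here.
    """
    total = sum(t for t, c in ranges if c is not None)
    correct = sum(c for t, c in ranges if c is not None)
    return 100.0 * correct / total


# (total compounds, correctly predicted) per range, transcribed from Table 5.
table5_counts = {
    "SEDξcC3": [(24, 21), (13, 12), (7, None), (3, 3)],
    "SEDξcC4": [(17, 17), (10, None), (11, 11), (9, 7)],
    "M1C": [(19, 18), (9, None), (6, 5), (7, None), (6, 5)],
    "Wc": [(16, 16), (9, None), (7, 7), (15, None)],
}

accuracies = {name: round(overall_accuracy(r), 2) for name, r in table5_counts.items()}
print(accuracies)
```

Under this reading the four models give 90%, 94.59%, 90.32%, and 100%, in agreement with the values listed in Table 5 (the last reported there as >99%).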
Conclusion
Superaugmented eccentric distance sum connectivity topochemical indices, proposed here as novel molecular descriptors, exhibited exceptionally high discriminating power and sensitivity toward both the presence and the relative position of heteroatoms, together with low degeneracy. Moreover, these indices were found to be non-correlating with important topological descriptors. These qualities ensure their utility in drug design, quantitative structure-activity/property relationships, combinatorial library design, isomer discrimination, and similarity/dissimilarity studies.
Subsequently, the proposed TIs, along with other TIs, were successfully employed for the development of numerous models for Chk2 inhibitory activity of 2-arylbenzimidazoles through decision tree, RF, and MAA. The decision tree revealed that the proposed superaugmented eccentric distance sum connectivity topochemical index-3 (SEDξcC3) and superaugmented eccentric distance sum connectivity topochemical index-4 (SEDξcC4) are the most important indices. The exceptionally high degree of predictability of the resulting models offers vast potential for providing lead structures for the development of specific Chk2 inhibitors that will help in improving the therapeutic window of radiation therapy and chemotherapy by reducing their side effects on normal cells.
Figure 6: Average IC50 (nM) value of correctly predicted analogs of 2-arylbenzimidazole in various ranges of topological models. Plot axes: IC50 (nM) versus nature of range (lower inactive, lower transitional, active, upper transitional, upper inactive, upper active); series: superaugmented eccentric distance sum connectivity topochemical index-3, superaugmented eccentric distance sum connectivity topochemical index-4, Wiener's topochemical index, and Zagreb topochemical index.
Figure 5: A decision tree for distinguishing active analog (A) from inactive analog (B); A13, superaugmented eccentric distance sum connectivity topochemical index-3 (SEDξcC3); A14, superaugmented eccentric distance sum connectivity topochemical index-4 (SEDξcC4); A6, connective eccentricity topochemical index.
References
1. Estrada E., Uriarte E. (2001) Recent advances on the role of topological indices in drug discovery research. Curr Med Chem;8:1573–1588.
2. Hann M., Green R. (1999) Chemoinformatics – a new name for an old problem? Curr Opin Chem Biol;3:379–383.
3. Estrada E., Patlewicz G., Uriarte E. (2003) From molecular graphs to drugs. A review on the use of topological indices in drug design and discovery. Ind J Chem;42A:1315–1329.
4. Venkatesh S., Lipper R.A. (2000) Role of the development scientist in compound lead selection and optimization. J Pharm Sci;89:145–154.
5. Hansch C. (1969) A quantitative approach to biochemical structure-activity relationships. Acc Chem Res;2:232–239.
6. Greener M. (2005) QSAR: prediction beyond the fourth dimension. Drug Disc Dev;8:44–47.
7. Waterbeemd V.D., Carter R.E., Grassy G., Kubinyi H., Martin Y.C., Tute M.S., Willett P. (1997) Glossary of terms used in computational drug design. Pure Appl Chem;69:1137–1152.
8. Randic M. (1997) On characterization of chemical structure. J Chem Inf Comput Sci;37:672–687.
9. Katritzky A.R., Gordeeva E.V. (1993) Traditional topological indices vs electronic, geometrical, and combined molecular descriptors in QSAR/QSPR research. J Chem Inf Comput Sci;33:835–857.
10. Devillers J., Balaban A.T. (1999) Topological Indices and Related Descriptors in QSAR and QSPR. Singapore: Gordon and Breach Science Publishers.
11. Ponce Y.M. (2004) Total and local (atom and atom type) molecular quadratic indices: significance interpretation, comparison to other molecular descriptors, and QSPR/QSAR applications. Bioorg Med Chem;12:6351–6369.
12. Basak S.C., Grunwald G.D. (1994) Molecular similarity and risk assessment: analog selection and property estimation using graph invariants. SAR QSAR Environ Res;2:289–307.
13. Basak S.C., Bertelsen S., Grunwald G.D. (1994) Application of graph theoretical parameters in quantifying molecular similarity and structure-activity relationships. J Chem Inf Comput Sci;34:270–276.
14. Hu Q.N., Liang Y.Z., Fang K.T. (2003) The matrix expression, topological index and atomic attribute of molecular topological structure. J Data Sci;1:361–389.
15. Bagchi M.C., Maiti B.C., Bose S. (2004) QSAR of anti tuberculosis drugs of INH type using graphical invariants. J Mol Str (Theochem);679:179–186.
16. Massague J. (2004) G1 cell-cycle control and cancer. Nature;432:298–306.
17. Perona R., Amor V.M., Pinilla R.M., Injesta C.B. (2008) Role of CHK2 in cancer development. Clin Transl Oncol;10:538–542.
18. Kawabe T. (2004) G2 checkpoint abrogators as anticancer drugs. Mol Cancer Ther;3:513–519.
19. Kastan M.B., Bartek J. (2004) Cell-cycle checkpoints and cancer. Nature;18:316–323.
20. Hoeijmakers J.H. (2001) Genome maintenance mechanisms for preventing cancer. Nature;411:366–374.
21. Bartek J., Lukas J. (2001) Mammalian G1- and S-phase checkpoints in response to DNA damage. Curr Opin Cell Biol;13:738–747.
22. Zhou B.-B.S., Elledge S.J. (2000) The DNA damage response: putting check points in perspective. Nature;408:433–439.
23. Elledge S.J. (1996) Cell cycle checkpoints: preventing an identity crisis. Science;274:1664–1672.
24. Brown E.J., Baltimore D. (2000) ATR disruption leads to chromosomal fragmentation and early embryonic lethality. Genes Dev;14:397–402.
25. De Klein A., Muijtjens M., Os R.V., Vehoeven Y., Smit B., Carr A.M., Lehmann A.R., Hoeijmakers J.H.J. (2000) Targeted disruption of the cell-cycle checkpoint gene ATR leads to early embryonic lethality in mice. Curr Biol;10:479–482.
26. Liu Q., Guntuku S., Cui X.S., Matsuoka S., Cortez D., Tamai K., Luo G., Rivera S.C., Demayo F., Bradley A., Donehower L.A., Elledge S.J. (2000) Chk1 is an essential kinase that is regulated by ATR and required for the G(2)/M DNA damage checkpoint. Genes Dev;14:1448–1459.
27. Takai H., Tominaga K., Motoyama N., Minamishima Y.A., Nagahama H., Tsukiyama T., Ikeda K., Nakayama K., Nakanishi M., Nakayama K.I. (2000) Aberrant cell cycle checkpoint function and early embryonic death in Chk1(−/−) mice. Genes Dev;14:1439–1447.
28. Zhou B.B., Anderson H.J., Roberge M. (2003) Models of anti-cancer therapy targeting DNA checkpoint kinases in cancer therapy. Cancer Biol Ther;2:S16–S22.
29. Skladanowski A., Bozko P., Sabisz M. (2009) DNA structure and integrity checkpoints during the cell cycle and their role in drug targeting and sensitivity of tumor cells to anticancer treatment. Chem Rev;109:2951–2973.
30. Shiloh Y. (2003) ATM and related protein kinases: safeguarding genome integrity. Nat Rev Cancer;3:155–168.
31. Kastan M.B., Lim D. (2000) The many substrates and functions of ATM. Nat Rev Mol Cell Biol;1:179–186.
32. Bartek J., Lukas J. (2003) Chk1 and Chk2 kinases in checkpoint control and cancer. Cancer Cell;3:421–429.
33. Stracker T.H., Usuia T., John Petrini H.J. (2009) Taking the time to make important decisions: the checkpoint effector kinases Chk1 and Chk2 and the DNA damage response. DNA Repair;8:1047–1054.
34. Gent V., Hoeijmakers J.H.J., Kannar R. (2001) Chromosomal stability and the DNA double-stranded break connection. Nat Rev Cancer;2:196–206.
35. Khanna K.K., Jackson S.P. (2001) DNA double-strand breaks: signaling, repair and the cancer connection. Nat Genet;27:247–254.
36. Bartkova J., Horejsi Z., Koed K., Kramer A., Tort F., Zieger K., Guldberg P., Sehested M., Nesland J.M., Lukas C., Orntoft T., Lukas J., Bartek J. (2005) DNA damage response as a candidate anti-cancer barrier in early human tumorigenesis. Nature;434:864–870.
37. Pommier Y., Sordet O., Rao V.A., Zhang H., Kohn K.W. (2005) Targeting Chk2 kinase: molecular interaction maps and therapeutic rationale. Curr Pharm Des;22:2855–2872.
38. Antoni L., Sodha N., Collins I., Garrett M.D. (2007) CHK2 kinase: cancer susceptibility and cancer therapy two sides of the same coin. Nat Rev Cancer;7:925–936.
39. Castedo M., Perfettini J.-L., Roumier T., Andreau K., Yakushijin K., Horne D., Medema R., Kroemer G. (2004) The cell cycle checkpoint kinase Chk2 is a negative regulator of mitotic catastrophe. Oncogene;23:4353–4361.
40. Vakifahmetoglu H., Olsson M., Tamm C., Heidari N., Orrenius S., Zhivotovsky B. (2008) DNA damage induces two distinct modes of cell death in ovarian carcinomas. Cell Death Differ;15:555–566.
41. Chabalier-Taste C., Racca C., Dozier C., Larminat F. (2008) BRCA1 is regulated by Chk2 in response to spindle damage. Biochim Biophys Acta;1783:2223–2233.
42. Komarov P.G., Komarova E.A., Kondratov R.V., Christov-Tselkov K., Coon J.S., Chernov M.V., Gudkov A.V. (1999) A chemical inhibitor of p53 that protects mice from the side effects of cancer therapy. Science;285:1733–1737.
43. Evan G., Vousden K.H. (2001) Proliferation, cell cycle and apoptosis in cancer. Nature;411:342–348.
44. Hirao A., Kong Y.Y., Mastsuoka S., Wakeham A., Ruland J., Yoshida H., Liu D., Elledge S.J., Mak T.W. (2000) DNA damage-induced activation of p53 by the checkpoint kinase Chk2. Science;287:1824–1827.
45. Takai H., Naka K., Okada Y., Watanabe M., Harada N., Saito S., Anderson C.W., Appella E., Nakanishi M., Suzuki H., Nagashima K., Sawa H., Ikeda K., Motoyama N. (2002) Chk2-deficient mice exhibit radioresistance and defective p53-mediated transcription. EMBO J;21:5195–5205.
46. Jack M.T., Woo R.A., Motoyama N., Takai H., Lee P.W.K. (2004) DNA-dependent protein kinase and checkpoint kinase 2 synergistically activate a latent population of p53 upon DNA damage. J Biol Chem;279:15269–15273.
47. Zhou B.B., Bartek J. (2004) Targeting the checkpoint kinases: chemosensitization versus chemoprotection. Nat Rev Cancer;4:216–225.
48. Arienti K.L., Brunmark A., Axe F.U., McClur K., Lee A., Belvitt J., Neff D.F., Haung L., Crawford S., Pandit C.R., Karlsson L., Breitenbcher J.G. (2005) Checkpoint kinase inhibitors: SAR and radioprotective properties of a series of 2-arylbenzimidazoles. J Med Chem;48:1873–1885.
49. Jobson A.G., Cardellina J.H. II, Scudiero D., Kondapaka S., Zhang H., Kim H., Shoemaker R., Pommier Y. (2007) Identification of a bis-guanylhydrazone [4,4′-diacetyldiphenylurea-bis(guanylhydrazone); NSC 109555] as a novel chemotype for inhibition of Chk2 kinase. Mol Pharmacol;72:876–884.
50. Carlessi L., Buscemi G., Larson G., Hong Z., Wu J.Z., Delia D. (2007) Biochemical and cellular characterization of VRX0466617, a novel and selective inhibitor for the checkpoint kinase Chk2. Mol Cancer Ther;6:935–944.
51. Larson G., Yan S., Chen H., Rong F., Hong Z., Wu J.Z. (2007) Identification of novel, selective and potent Chk2 inhibitors. Bioorg Med Chem Lett;17:172–175.
52. Jobson A.G., Lountos G.T., Lorenzi P.L., Llamas J., Connelly J., Cerna D., Tropea J.E. et al. (2009) Cellular inhibition of checkpoint kinase 2 (Chk2) and potentiation of camptothecins radiation by the novel Chk2 inhibitor PV1019 [7-Nitro-1H-indole-2-carboxylic acid {4-[1-(guanidinohydrazone)-ethyl]-phenyl}-amide]. J Pharmacol Exp Ther;331:816–826.
53. Zabludoff S.D., Deng C., Grondine M.R., Sheehy A.M., Ashwell S., Caleb B.L., Green S. et al. (2008) AZD7762, a novel checkpoint kinase inhibitor, drives checkpoint abrogation and potentiates DNA-targeted therapies. Mol Cancer Ther;7:2955–2966.
54. Curman D., Cinel B., Williams D.E., Rundle N., Blockaaron W.D., Goodarzi A., Hutchinsi J.R., Clarkei P.R., Zhou B.-B., Lees-Miller S.P., Andersen R.J., Roberge M. (2001) Inhibition of the G2 DNA damage checkpoint and of protein kinases Chk1 and Chk2 by the marine sponge alkaloid debromohymenialdisine. J Biol Chem;276:17914–17919.
55. Yu Q., Rose J.L., Zhang H., Takemura H., Kohn K.W., Pommier Y. (2002) UCN-01 inhibits p53 up-regulation and abrogates radiation-induced G2-M checkpoint independently of p53 by targeting both of the checkpoint kinases, Chk2 and Chk1. Cancer Res;62:5743–5748.
56. Singh S.V., Antosiewicz A.H., Singh A.V., Lew K.L., Srivastava S.K., Kamath R., Brown K.D., Zhang L., Baskaran R. (2004) Sulforaphane-induced G2/M phase cell cycle arrest involves checkpoint kinase 2-mediated phosphorylation of cell division cycle. J Biol Chem;279:25813–25822.
57. Bucher N., Britten C.D. (2008) G2 checkpoint abrogation and checkpoint kinase-1 targeting in the treatment of cancer. Br J Cancer;98:523–528.
58. Janetka J.W., Ashwell S. (2009) Checkpoint kinase inhibitors: a review of the patent literature. Expert Opin Ther Pat;19:165–197.
59. Goel A., Madan A.K. (1995) Structure-activity study on anti-inflammatory pyrazole carboxylic acid hydrazide analogs using molecular connectivity indices. J Chem Inf Comput Sci;35:510–514.
60. Dureja H., Madan A.K. (2005) Topochemical models for prediction of cyclin-dependent kinase 2 inhibitory activity of indole-2-ones. J Mol Mod;11:525–531.
61. Gupta S., Singh M., Madan A.K. (2003) Novel topochemical descriptors for predicting anti-HIV activity. Indian J Chem;42A:1414–1425.
62. Bajaj S. (2005) Study on topochemical descriptors for the prediction of physicochemical and biological properties of molecules. Ph.D. Thesis, New Delhi, India: Guru Gobind Singh Indraprastha University.
63. Bajaj S., Sambhi S.S., Madan A.K. (2004) Prediction of carbonic anhydrase activation by tri-/tetrasubstituted pyridinium-azole drugs: a computational approach using novel topochemical descriptor. QSAR Comb Sci;23:506–514.
64. Kumar V., Sardana S., Madan A.K. (2004) Predicting anti-HIV activity of 2,3-diaryl-1,3-thiazolidin-4-ones: computational approach using reformed eccentric connectivity index. J Mol Mod;10:399–407.
65. Gupta S. (2002) Application and development of graph invariants of drug design. Ph.D. Thesis, Patiala, India: Punjabi University.
66. Bajaj S., Sambhi S.S., Madan A.K. (2005) Prediction of anti-inflammatory activity of N-arylanthranilic acids: computational approach using refined Zagreb indices. Croat Chem Acta;78:165–174.
67. Bajaj S., Sambhi S.S., Madan A.K. (2004) Predicting anti-HIV activity of phenethylthiazolethiourea (PETT) analogs: computational approach using Wiener's topochemical index. J Mol Str (Theochem);684:197–203.
68. Dureja H., Gupta S., Madan A.K. (2008) Predicting anti-HIV-1 activity of 6-arylbenzonitriles: computational approach using superaugmented eccentric connectivity topochemical indices. J Mol Graph and Mod;26:1020–1029.
69. Randic M. (1975) On characterization of molecular branching. J Am Chem Soc;97:6609–6615.
70. Gupta S., Singh M., Madan A.K. (2001) Predicting anti-HIV activity: computational approach using a novel topological descriptor. J Comput Aided Mol Des;15:671–678.
71. Bajaj S., Sambi S.S., Madan A.K. (2006) Model for prediction of anti-HIV activity of 2-pyridinone derivatives using novel topological descriptor. QSAR Comb Sci;25:813–823.
72. Sharma V., Goswami R., Madan A.K. (1997) Eccentric connectivity index: a novel highly discriminating topological descriptor for structure-property and structure-activity studies. J Chem Inf Comput Sci;37:273–282.
73. Gupta S., Singh M., Madan A.K. (2000) Connective eccentric index: a novel topological descriptor for predicting biological activity. J Mol Graph Mod;18:18–25.
74. Gutman I., Ruscic B., Trinajstic N., Wicox C.F. (1975) Graph theory and molecular orbitals XII acyclic polyenes. J Chem Phys;62:3399–3405.
75. Gutman I., Randic M. (1977) Algebraic characterization of skeletal branching. Chem Phys Lett;47:15–19.
76. Wiener H. (1947) Structural determination of paraffin boiling points. J Am Chem Soc;69:17–20.
77. Kim H., Koehler G.J. (1995) Theory and practice of decision tree induction. Omega Int J Mgmt Sci;23:637–652.
78. Sprogar M., Kokol P., Zorman M., Podgorelec V., Yamamoto R., Masuda G., Sakamoto N. (2001) Supporting medical decisions with vector decision trees. Medinfo;10:552–556.
79. Breiman L. (2001) Random forests. Mach Learn;45:5–32.
80. Dureja H., Gupta S., Madan A.K. (2008) Topological models for the prediction of pharmacokinetic parameters of Cephalosporins using random forest, decision tree and moving average analysis. Sci Pharm;76:401–408.
81. Han L., Wang Y., Bryant S.H. (2008) Developing and validating predictive decision tree models from mining chemical structural fingerprints and high throughput screening data in Pubchem. BMC Bioinformatics;9:401.
82. Bailey D.S., Dean P.M. (1992) Pharmacogenomics and its impact on drug design and optimization. Annu Rev Med Chem;34:339–348.
83. Gallop M.A., Barrett R.W., Dower W.J., Fodor S.P.A., Gordon E.M. (1994) Applications of combinatorial technologies to drug discovery. 1. Background and peptide combinatorial libraries. J Med Chem;37:1233–1251.
84. Gordon E.M., Barrett R., Dower W.J., Fodor S.P.A., Gallop M.A. (1994) Applications of combinatorial technologies to drug discovery. 2. Combinatorial organic synthesis, library screening strategies, and future directions. J Med Chem;37:1386–1401.
85. Devlin J.P. (2000) High Throughput Screening. New York: Marcel Dekker.
86. Estrada E., Molina E. (2001) Novel local (fragment-based) topological molecular descriptors for QSPR/QSAR and molecular design. J Mol Graph Mod;20:54–64.
87. Galvez J., Garcia-Domenech R., Julian-Ortiz J.V., Soler R. (1995) Topological approach to drug design. J Chem Inf Comput Sci;35:272–284.
88. Ivanciuc O., Ivanciuc T., Klein D.J., Seitz W.A., Balaban A.T. (2001) Wiener index extension by counting even/odd graph distances. J Chem Inf Comput Sci;41:536–549.
89. Madan A.K., Dureja H. (2010) Eccentricity based descriptors for QSAR/QSPR, mathematical chemistry monographs, no. 9. In: Gutman I., Furtula B., editors. Novel Molecular Structure Descriptors – Theory and Applications II. Serbia: Croatian Chemical Society; p. 91–138.
90. Nikolic S., Kovacevic G., Milicevic A., Trinajstic N. (2003) The Zagreb indices 30 years after. Croat Chem Acta;76:113–124.
Note
a Available at: http://cheminfo.informatics.indiana.edu/~rguha/writing/pub/thesis/chap1.pdf (accessed 16 June 2008).