
Superaugmented Eccentric Distance Sum Connectivity Indices: Novel Highly Discriminating Topological Descriptors for QSAR/QSPR


Monika Gupta1, Sunil Gupta2, Harish Dureja1 and Anil Kumar Madan3,*

1Faculty of Pharmaceutical Sciences, M.D. University, Rohtak 124001, Haryana, India
2J. C. D. College of Pharmacy, Sirsa 125055, Haryana, India
3Faculty of Pharmaceutical Sciences, Pt. B. D. Sharma University of Health Sciences, Rohtak 124001, Haryana, India
*Corresponding author: Anil Kumar Madan, [email protected]

Four highly discriminating fourth-generation topological indices (TIs), termed superaugmented eccentric distance sum connectivity indices, as well as their topochemical versions (denoted by ${}^{SED}\xi^{c}_{C1}$, ${}^{SED}\xi^{c}_{C2}$, ${}^{SED}\xi^{c}_{C3}$ and ${}^{SED}\xi^{c}_{C4}$), have been conceptualized in this study. The values of these indices for all possible structures with three, four, and five vertices containing one heteroatom were computed using an in-house computer program. The proposed superaugmented eccentric distance sum connectivity topochemical indices exhibited exceptionally high discriminating power, low degeneracy, and high sensitivity toward both the presence and the relative position of heteroatom(s) for all possible structures with five vertices containing at least one heteroatom. Intercorrelation analysis revealed the absence of correlation of the proposed indices with Zagreb indices and the molecular connectivity index. Subsequently, the proposed TIs were successfully utilized for the development of models for the prediction of checkpoint kinase inhibitory activity of 2-arylbenzimidazoles. A data set comprising 47 differently substituted analogs of 2-arylbenzimidazoles was selected for the study. The values of various TIs for each analog in the data set were computed using an in-house computer program. The resulting data were analyzed, and suitable models were developed through decision tree (DT), random forest (RF), and moving average analysis (MAA). The performance of the models was assessed by calculating the specificity, sensitivity, overall accuracy, and Matthews correlation coefficient. A decision tree was constructed for the checkpoint kinase inhibitory activity to determine the importance of topological indices. The decision tree identified the proposed TIs (${}^{SED}\xi^{c}_{C3}$ and ${}^{SED}\xi^{c}_{C4}$) as the most important indices. The decision tree learned the information from the input data with an accuracy of 96% and correctly predicted the cross-validated (10-fold) data with an accuracy of 77%. Random forest correctly predicted the checkpoint kinase inhibitory activity with an accuracy of 83%. Single index-based models were also developed for the prediction of checkpoint kinase inhibitory activity using MAA. The accuracy of prediction of the single index-based models derived through MAA was found to vary from a minimum of 90% to a maximum of 95%. The exceptionally high discriminating power, low degeneracy, and high sensitivity of the proposed indices toward branching and the presence of a heteroatom can be of immense use in drug design, isomer discrimination, similarity/dissimilarity studies, quantitative structure-activity/property relationships, lead optimization, and combinatorial library design.

Key words: 2-arylbenzimidazole, checkpoint kinase inhibitors, superaugmented eccentric distance sum connectivity topochemical indices, topological indices

Received 3 February 2011, revised 7 September 2011 and accepted for publication 16 October 2011

Chem Biol Drug Des 2012; 79: 38–52
Research Article
© 2011 John Wiley & Sons A/S
doi: 10.1111/j.1747-0285.2011.01264.x

The identification and optimization of lead compounds in a rapid and cost-effective way are the most critical steps in drug discovery. The computer-aided drug discovery approach offers an alternative to the real world of synthesis and screening (1,2). Computational techniques have advanced rapidly over the past few decades and have played a major role in the development of a number of drugs now on the market or going through clinical trials (3,4). QSAR/QSPR is the mathematical relationship linking chemical structure and pharmacological activity/property in a quantitative manner for a series of compounds (5). It also reduces the number of compounds to be synthesized and promptly detects the most favorable compounds. Fundamentally, QSAR aims to identify relationships between some aspects of molecular structure and properties such as toxicology, pharmacodynamics, and pharmacokinetics (6).

The 2D approach has a number of advantages compared with the higher dimension QSAR methodologies. First, owing to the variety of molecular descriptors available, optimized coordinates are not always required. In fact, connectivity information (in the form of an adjacency matrix) alone can be used to develop QSAR models. As a result, models using topological descriptors can be built rapidly for very large sets of molecules. Second, this approach avoids the alignment step and thus can be used in the absence of experimental information regarding the binding of a molecule to its target (note a).

The 2D QSAR makes use of TIs, which are numerical values associated with the chemical constitution for correlation of chemical structures with various physical properties, chemical reactivity, or biological activity (7). These are derived from the topological representation of molecules and can be considered structure-explicit descriptors (8). The TIs are among the most useful descriptors known today, as they can be rapidly computed for a large number of molecules and also offer a simple way of measuring molecular branching, shape, size, cyclicity, symmetry, chirality, complexity, and heterogeneity of atomic environments in the molecule (9–14). Over the past two decades, the use of TIs in QSAR models has enhanced the scope of drug design by producing reliable estimates of the therapeutic and toxic potential of chemicals (15).

The genetic integrity of a cell is constantly challenged by radiation, chemical agents, and replication errors (16). These agents mainly cause double strand breaks (DSB) and single strand breaks (SSB) and cause genomic instability that may lead to tumor development if left unrepaired (17). DNA damage is also used to treat cancer. Many of the conventional anticancer treatments (ionizing radiation, hyperthermia, pyrimidine and purine antimetabolites, alkylating agents, DNA topoisomerase inhibitors, and platinum compounds) at least partly damage the DNA of cells. As these treatments are not specifically selective for cancer cells, patients have suffered from serious side effects when taking these drugs (18). Thus, DNA damage causes the disease, is used to treat the disease, and is responsible for the toxicity of therapies for the disease (19).

In the DNA damage response (DDR), eukaryotic cells activate checkpoint pathways to arrest the cell cycle (20–22). The checkpoints comprise a subroutine integrated into the larger DDR pathway that regulates a multifaceted response. Moreover, several checkpoint genes are essential for cell and organism survival (23–27), implying that these pathways are not only surveyors of occasional damage but are firmly integrated components of cellular physiology (22).

The DNA damage checkpoints are known to comprise signal transduction cascades that link the detection of DNA damage to several other processes, i.e. inhibition of progression through the cell cycle from G1 to S, through S and from G2 into M, activation of DNA repair and initiation of apoptosis (28). DNA damage is recognized by damage sensor proteins such as Mre11-Rad50-Nbs1 (MRN complex) and breast and ovarian cancer locus 1 (BRCA1)-associated genome surveillance complex (BASC). These proteins recruit and activate the upstream Ataxia-telangiectasia mutated (ATM) protein and ATM and Rad 3-related (ATR) kinases (17,29). Checkpoint kinases Chk1 and Chk2 are key downstream mediators of the DDR through activation of an increasing number of substrates such as p53, NBS1, BRCA1, MDM2, Cdc25A, Cdc25C, and E2F1 (30–32). The relevance of these kinases in the maintenance of genome integrity is clearly indicated by the severe human genetic disorders and the predisposition to cancer associated with defects in these proteins (20,33–35).

Radiation and chemotherapy as therapies for cancer often have serious side effects that limit their efficacy. Modulation of the checkpoints regulating responses to these types of treatment appears to be a potential strategy to sensitize tumor cells to DNA damaging agents (17). Checkpoint kinase 2 acts as a mediator of DNA damage signaling and also acts as a barrier to tumorigenesis (36). There is evidence in favor of the therapeutic value of Chk2 inhibitors (37,38). Checkpoint kinase 2 inhibitors are reported to augment the effect of various cytotoxic drugs, e.g. doxorubicin (39), cisplatin (40), and paclitaxel (41).

The side effects of radiation therapy have been reported as more serious. As these side effects are in part determined by p53-mediated apoptosis, temporary suppression of p53 has been suggested as a therapeutic strategy to prevent damage to normal tissues during treatment of p53-deficient tumors (42,43). The p53 response to DNA breaks induced by radiation and certain chemical agents is controlled by Chk2 (36). Studies showed that Chk2 deficiency conferred radioresistance and that Chk2 plays a critical role in p53 function in response to ionizing radiation (IR) by regulating its transcriptional activity and its stability, indicating the utility of Chk2 inhibitors as radioprotectants for normal cells (44,45). Thus, Chk2 inhibitors may be useful drugs for reducing the side effects of cancer therapy and other types of stress associated with p53 activation (46,47).

Agents that target checkpoint kinases have provided impressive preclinical evidence that this approach will yield tumor-specific potentiating agents and may have broad therapeutic utility. Apart from 2-arylbenzimidazole (48), NSC 109555 (49), VRX0466617 (50), isothiazole carboxamides (51), and PV 1019 (52), only a few selective Chk2 inhibitors have been reported. Various inhibitors have been published for Chk1 (staurosporine, Go6976, SB-218078, ICP-1, CEP-3891, and AZD7762) (53) and for both Chk1 and Chk2 (TAT-S216A, UCN-01, and debromohymenialdisine (54,55), CEP-6367, and sulforaphane (18,56,57)).

The past decade has witnessed the development of checkpoint kinase inhibitors for the treatment of cancer. Three checkpoint kinase inhibitors have already entered clinical trials since 2005 (58). The pharmaceutical industry strives to explore novel scaffolds for checkpoint kinase inhibition.

In this study, four topological descriptors termed superaugmented eccentric distance sum connectivity indices and their topochemical versions have been conceptualized and successfully utilized along with existing TIs for the development of models for the prediction of checkpoint kinase (Chk2) inhibitory activity of 2-arylbenzimidazoles.

Methodology

Calculation of topological indices

The values of ${}^{SED}\xi^{c}_{N}$ were calculated for all possible structures with three, four, and five vertices containing one heteroatom (Figures 1 and 2) using an in-house computer program.


Figure 1: Index values of ${}^{SED}\xi^{c}_{C1}$, ${}^{SED}\xi^{c}_{C2}$, ${}^{SED}\xi^{c}_{C3}$, and ${}^{SED}\xi^{c}_{C4}$ for all possible structures with three, four, and five vertices containing one heteroatom (N). Cpd no., compound number. Compounds 1–3 have three vertices, 4–14 have four vertices, and 15–61 have five vertices (structure drawings omitted).

| Cpd no. | ${}^{SED}\xi^{c}_{C1}$ | ${}^{SED}\xi^{c}_{C2}$ | ${}^{SED}\xi^{c}_{C3}$ | ${}^{SED}\xi^{c}_{C4}$ |
|---|---|---|---|---|
| 1 | 1.974 | 2.397 | 2.661 | 2.803 |
| 2 | 2.484 | 3.809 | 5.396 | 7.16 |
| 3 | 0.363 | 0.395 | 0.428 | 0.461 |
| 4 | 6.585 | 14.858 | 32.737 | 70.8 |
| 5 | 6.244 | 13.833 | 29.931 | 63.636 |
| 6 | 3.872 | 6.435 | 9.815 | 13.806 |
| 7 | 3.186 | 4.168 | 4.859 | 5.261 |
| 8 | 1.992 | 4.063 | 8.274 | 16.834 |
| 9 | 1.027 | 1.286 | 1.456 | 1.551 |
| 10 | 1.217 | 1.849 | 2.613 | 3.473 |
| 11 | 1.348 | 2.084 | 2.955 | 3.906 |
| 12 | 0.294 | 0.338 | 0.374 | 0.404 |
| 13 | 0.364 | 0.468 | 0.58 | 0.701 |
| 14 | 0.089 | 0.099 | 0.109 | 0.119 |
| 15 | 8.642 | 20.158 | 45.688 | 101.092 |
| 16 | 8.755 | 20.431 | 46.17 | 101.674 |
| 17 | 8.381 | 19.282 | 42.945 | 93.253 |
| 18 | 8.027 | 18.187 | 39.998 | 85.988 |
| 19 | 11.374 | 28.617 | 69.097 | 161.698 |
| 20 | 11.814 | 30.236 | 74.188 | 176.075 |
| 21 | 10.299 | 23.831 | 52.393 | 111.239 |
| 22 | 4.898 | 8.667 | 14.128 | 21.096 |
| 23 | 0.774 | 1.059 | 1.37 | 1.702 |
| 24 | 3.811 | 8.09 | 17.153 | 36.323 |
| 25 | 0.759 | 1.143 | 1.6 | 2.107 |
| 26 | 0.688 | 1.028 | 1.432 | 1.883 |
| 27 | 0.816 | 1.247 | 1.75 | 2.293 |
| 28 | 3.387 | 7.274 | 15.4 | 32.248 |
| 29 | 3.508 | 7.606 | 16.21 | 34.098 |
| 30 | 3.708 | 8.22 | 17.939 | 38.678 |
| 31 | 3.475 | 7.623 | 16.448 | 35.052 |
| 32 | 1.362 | 2.775 | 5.647 | 11.482 |
| 33 | 1.466 | 3.014 | 6.189 | 12.692 |
| 34 | 0.319 | 0.452 | 0.599 | 0.757 |
| 35 | 0.294 | 0.414 | 0.547 | 0.692 |
| 36 | 0.301 | 0.485 | 0.713 | 0.966 |
| 37 | 0.26 | 0.408 | 0.596 | 0.814 |
| 38 | 1.085 | 2.358 | 5.078 | 10.859 |
| 39 | 0.099 | 0.128 | 0.159 | 0.192 |
| 40 | 0.106 | 0.137 | 0.17 | 0.206 |
| 41 | 0.135 | 0.19 | 0.251 | 0.317 |
| 42 | 0.039 | 0.047 | 0.057 | 0.067 |
| 43 | 2.719 | 5.886 | 12.625 | 26.888 |
| 44 | 2.45 | 5.029 | 10.24 | 20.729 |
| 45 | 2.488 | 5.327 | 11.304 | 23.824 |
| 46 | 3.814 | 8.996 | 20.596 | 46.037 |
| 47 | 3.677 | 8.686 | 19.956 | 44.831 |
| 48 | 3.156 | 7.027 | 15.178 | 32.057 |
| 49 | 3.378 | 7.661 | 16.872 | 36.289 |
| 50 | 1.034 | 2.159 | 4.467 | 9.176 |
| 51 | 1.114 | 2.437 | 5.281 | 11.367 |
| 52 | 0.93 | 1.905 | 3.874 | 7.834 |
| 53 | 0.539 | 1.128 | 2.354 | 4.904 |
| 54 | 0.501 | 1.017 | 2.061 | 4.175 |
| 55 | 0.492 | 0.989 | 1.986 | 3.988 |
| 56 | 2.085 | 3.322 | 4.84 | 6.533 |
| 57 | 1.909 | 2.982 | 4.315 | 5.838 |
| 58 | 1.246 | 2.556 | 5.237 | 10.716 |
| 59 | 1.331 | 2.761 | 5.721 | 11.833 |
| 60 | 1.303 | 2.647 | 5.371 | 10.89 |
| 61 | 0.253 | 0.319 | 0.39 | 0.467 |


Figure 2: Calculation of values of superaugmented eccentric distance sum connectivity topochemical index-1 (${}^{SED}\xi^{c}_{C1}$), superaugmented eccentric distance sum connectivity topochemical index-2 (${}^{SED}\xi^{c}_{C2}$), superaugmented eccentric distance sum connectivity topochemical index-3 (${}^{SED}\xi^{c}_{C3}$), and superaugmented eccentric distance sum connectivity topochemical index-4 (${}^{SED}\xi^{c}_{C4}$) for three isomers of an 11-membered molecule (decylamine). For each isomer, the figure gives the arbitrary vertex numbering, the chemical distance matrix ($D_c$) with the chemical distance sum $S_{ic}$ and chemical eccentricity $E_{ic}$ of each vertex, the chemical adjacency matrix ($A_c$) with the chemical degrees $V_i$, and the augmentative chemical adjacency matrix ($A^{\alpha}_{c}$) with the products $M_{ic}$ (structure drawings and matrices omitted). The resulting index values are:

| Index | Isomer 1 | Isomer 2 | Isomer 3 |
|---|---|---|---|
| ${}^{SED}\xi^{c}_{C1}$ | 238.801 | 147.233 | 20.804 |
| ${}^{SED}\xi^{c}_{C2}$ | 1554.158 | 796.096 | 52.646 |
| ${}^{SED}\xi^{c}_{C3}$ | 9810.431 | 4155.506 | 127.118 |
| ${}^{SED}\xi^{c}_{C4}$ | 60235.7 | 21001.523 | 296.55 |

Superaugmented eccentric distance sum connectivity indices

Superaugmented eccentric distance sum connectivity indices, ${}^{SED}\xi^{c}_{N}$, proposed in this study can be defined as the inverse of the summation of quotients of the product of adjacent vertex degrees and the product of the squared distance sum and eccentricity of the concerned vertex for all vertices in a hydrogen-suppressed molecular graph. It can be expressed as follows:

$$ {}^{SED}\xi^{c}_{N} = \left[ \sum_{i=1}^{n} \frac{M_i}{E_i^{N} \cdot S_i^{2}} \right]^{-1} \qquad (1) $$

where $M_i$ is the product of the degrees of all the vertices ($v_j$) adjacent to vertex $i$, which can be easily obtained by multiplying all the non-zero row elements in the augmentative adjacency matrix; $E_i$ is the eccentricity and $S_i$ is the distance sum of vertex $i$; $n$ is the number of vertices in the graph; and $N$ is equal to 1, 2, 3, 4 for superaugmented eccentric distance sum connectivity indices-1, -2, -3, -4, respectively.
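As an illustration of Equation 1, the following is a minimal Python sketch for the purely topological (unweighted) index. It is not the in-house program used in the study. The function names and the adjacency-list input format are assumptions made here, and the graph is assumed to be connected with at least two vertices.

```python
from collections import deque
from math import prod


def bfs_distances(adj, source):
    """Shortest path lengths (edge counts) from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sed_connectivity_index(adj, N):
    """Equation 1: inverse of the sum over vertices of M_i / (E_i**N * S_i**2)."""
    total = 0.0
    for i in adj:
        dist = bfs_distances(adj, i)
        S_i = sum(dist.values())                 # distance sum of vertex i
        E_i = max(dist.values())                 # eccentricity of vertex i
        M_i = prod(len(adj[j]) for j in adj[i])  # product of neighbour degrees
        total += M_i / (E_i ** N * S_i ** 2)
    return 1.0 / total


# Hypothetical example: an unbranched chain of three vertices (1-2-3).
path3 = {1: [2], 2: [1, 3], 3: [2]}
print(sed_connectivity_index(path3, 1))  # equals 36/17, about 2.118
```

For the three-vertex chain, the two terminal vertices each contribute 2/(2 x 9) and the central vertex contributes 1/(1 x 4), so the sum is 17/36 and the index is 36/17.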

Similarly, the topochemical version of superaugmented eccentric distance sum connectivity indices can be defined as the inverse of the summation of quotients of the product of adjacent vertex chemical degrees and the product of the squared chemical distance sum and chemical eccentricity of the concerned vertex for all vertices in a hydrogen-suppressed molecular graph.

It can be expressed as follows:

$$ {}^{SED}\xi^{c}_{CN} = \left[ \sum_{i=1}^{n} \frac{M_{ic}}{E_{ic}^{N} \cdot S_{ic}^{2}} \right]^{-1} \qquad (2) $$

where $M_{ic}$ is the product of the chemical degrees of all the vertices ($v_j$) adjacent to vertex $i$, which can be easily obtained by multiplying all the non-zero row elements in the augmentative chemical adjacency matrix; $E_{ic}$ is the chemical eccentricity and $S_{ic}$ is the chemical distance sum of vertex $i$; $n$ is the number of vertices in the graph; and $N$ is equal to 1, 2, 3, 4 for superaugmented eccentric distance sum connectivity topochemical indices-1, -2, -3, -4, respectively (denoted by ${}^{SED}\xi^{c}_{C1}$, ${}^{SED}\xi^{c}_{C2}$, ${}^{SED}\xi^{c}_{C3}$, and ${}^{SED}\xi^{c}_{C4}$).

Superaugmented eccentric distance sum connectivity topochemical indices can be easily calculated from the chemical distance matrix ($D_c$), chemical adjacency matrix ($A_c$), and augmentative chemical adjacency matrix ($A^{\alpha}_{c}$). The calculation of the proposed ${}^{SED}\xi^{c}_{C1}$, ${}^{SED}\xi^{c}_{C2}$, ${}^{SED}\xi^{c}_{C3}$, and ${}^{SED}\xi^{c}_{C4}$ for three isomers of an 11-membered molecule (decylamine) has been exemplified in Figure 2.

The index values of the proposed topochemical descriptors, which reflect the presence and the relative position of heteroatom(s), have been compiled in Figure 1 for all three-, four-, and five-membered isomers containing one heteroatom. The discriminating power and degeneracy of the superaugmented eccentric distance sum connectivity topochemical indices were investigated using all possible structures with three, four, and five vertices containing one heteroatom (Table 1). The intercorrelation of the proposed superaugmented eccentric distance sum connectivity indices with Wiener's index, Zagreb indices, the molecular connectivity index, and eccentric connectivity indices was also investigated (Table 2).

Topological indices

A total of 26 descriptors of diverse nature, including the proposed indices (Table 3) (59–75), were used in this study. Although 26 descriptors were employed, only 14 indices were shortlisted on the basis of their non-correlating nature and classification ability. The shortlisted indices used in the present study are defined below.

Wiener's topochemical index ($W_c$)

Wiener's topochemical index (67) is defined as the sum of the chemical distances between all pairs of vertices in a hydrogen-suppressed molecular graph. It is a refined form of the oldest and most widely used distance-based topological index, Wiener's index (76), and this modified index considers the presence and the relative position of heteroatom(s) in a molecular structure. It can be expressed as

$$ W_c = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} P_{i_c j_c} \qquad (3) $$

where $P_{i_c j_c}$ is the chemical length of the path that contains the least number of edges between vertices $i$ and $j$ in the graph $G$, and $n$ is the number of vertices in the hydrogen-depleted graph (67).
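The double sum in Equation 3 can be sketched in Python as follows. This is a simplified illustration for the unweighted Wiener's index only: every edge is given length 1, so the chemical (heteroatom-dependent) path lengths of the topochemical version are not modelled. The function name and adjacency-list format are assumptions made for this sketch.

```python
from collections import deque


def wiener_index(adj):
    """Half the sum of shortest path lengths over all ordered vertex pairs."""
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2                     # each pair was counted twice


# Hypothetical examples: carbon skeletons of n-butane and isobutane.
n_butane = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
isobutane = {1: [2, 3, 4], 2: [1], 3: [1], 4: [1]}
print(wiener_index(n_butane), wiener_index(isobutane))  # 10 9
```

The branched skeleton gives the smaller value, which illustrates how a distance-based index responds to branching.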

Zagreb indices ($M_1$ and $M_2$)

This pair of indices (74,75), denoted by $M_1$ and $M_2$, was introduced in 1972 and is defined as per Equations 4 and 5.

$$ M_1 = \sum_{\text{vertices}} d(i)\,d(i) \qquad (4) $$

$$ M_2 = \sum_{\text{edges}} d(i)\,d(j) \qquad (5) $$

where $d(i)$ is the degree of vertex $i$, which can be defined as the number of edges incident on vertex $i$, and $d(i)d(j)$ is the weight of edge $\{i, j\}$.

Similarly, the Zagreb topochemical indices (66) $M_1^c$ and $M_2^c$ are defined as per Equations 6 and 7.

$$ M_1^c(G) = \sum_{i=1}^{n} \left( d^c(i) \right)^2 \qquad (6) $$

where $d^c(i)$ is the chemical degree of vertex $i$ and $n$ is the number of vertices.

$$ M_2^c(G) = \sum_{ij}^{n} d^c(i)\, d^c(j) \qquad (7) $$

where $d^c(i)\,d^c(j)$ is the chemical weight of edge $\{i, j\}$ in the hydrogen-suppressed molecular graph and $n$ is the number of edges.
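Equations 4 and 5 translate almost directly into code. The sketch below covers the topological Zagreb indices only (plain vertex degrees, not chemical degrees). The function name and the adjacency-list input with integer vertex labels are assumptions made for this illustration.

```python
def zagreb_indices(adj):
    """Return (M1, M2) for an undirected graph given as an adjacency list."""
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    # Equation 4: sum of squared vertex degrees.
    m1 = sum(d * d for d in degree.values())
    # Equation 5: sum of degree products over edges; u < v counts each edge once.
    m2 = sum(degree[u] * degree[v] for u in adj for v in adj[u] if u < v)
    return m1, m2


# Hypothetical examples: carbon skeletons of n-butane and isobutane.
n_butane = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
isobutane = {1: [2, 3, 4], 2: [1], 3: [1], 4: [1]}
print(zagreb_indices(n_butane))   # (10, 8)
print(zagreb_indices(isobutane))  # (12, 9)
```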

Connective eccentricity index

The connective eccentricity index (73) can be defined as the summation of the ratios of the degree of a vertex ($V_i$) and its eccentricity ($E_i$) for all vertices in the hydrogen-suppressed molecular structure. It can be expressed by the following equation:

$$ C^{\xi} = \sum_{i=1}^{n} \left( \frac{V_i}{E_i} \right) \qquad (8) $$

The eccentricity $E_i$ of a vertex $i$ in a graph $G$ is the path length from vertex $i$ to the vertex $j$ that is farthest from $i$ ($E_i = \max(d_{ij}),\ j \in G$).
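A minimal Python sketch of Equation 8 is given below, with the eccentricity obtained by breadth-first search as the largest shortest-path distance from each vertex. The function names and graph format are assumptions made here, and the graph is assumed to be connected with at least two vertices.

```python
from collections import deque


def eccentricity(adj, source):
    """Largest shortest-path distance (in edges) from source to any vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def connective_eccentricity_index(adj):
    """Equation 8: sum over vertices of degree divided by eccentricity."""
    return sum(len(adj[i]) / eccentricity(adj, i) for i in adj)


# Hypothetical example: an unbranched chain of three vertices (1-2-3).
path3 = {1: [2], 2: [1, 3], 3: [2]}
print(connective_eccentricity_index(path3))  # 1/2 + 2/1 + 1/2 = 3.0
```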

Data set

A data set (48) comprising 47 analogs of 2-arylbenzimidazole was selected for the present investigation. The basic structure for these analogs is depicted in Figure 3, and the various substituents are listed in Figure 4. The values of the 26 descriptors (Table 3) used in this study were calculated for all the analogs in the data set using an in-house computer program. Compounds having reported IC50 values of ≤25 nM were considered to be active, whereas those possessing IC50 values >25 nM were treated as inactive for the purpose of the present study.

Decision tree

A decision tree provides a useful solution for many classification problems where large data sets are used and the information contained is complex. A decision tree (generally defined) is a tree whose internal nodes are tests (on input patterns) and whose leaf nodes are categories (of patterns). A decision tree assigns a class number (or output) to an input pattern by filtering the pattern down through the tests in the tree. Each test has mutually exclusive and exhaustive outcomes.

Decision trees are constructed beginning with the root of the tree and proceeding down to its leaves. Decision trees are a rapid and effective method of classifying data set entries and can provide good decision support capabilities (77,78). In this study, the decision tree was grown to identify the importance of TIs. In a decision tree, the molecules at each parent node are classified, based on the index value, into two child nodes. The prediction for a molecule reaching a given terminal node is obtained by majority vote of the training set molecules reaching the same terminal node. In this study, the R program (version 2.1.0; University of Auckland, Auckland, New Zealand) along with the RPART library was used to grow the decision tree. The active compounds were labeled 'A' (n = 18) and the inactive compounds were labeled 'B' (n = 29). Each analog was assigned a biological activity, which was then compared with the reported Chk2 inhibitory activity.

Random forest

A random forest (RF) was grown for checkpoint kinase (Chk2) inhibitory activity. A random forest grows numerous classification trees. To classify a new object from an input vector, the input vector is put down each of the trees in the forest. Each tree gives a classification; that is, the tree 'votes' for that class. The forest chooses the classification having the most votes (over all the trees in the forest) (79). In this study, the RFs were grown with the R program (version 2.1.0) using the RF library.

Moving average analysis

Moving average analysis constitutes the basis for the development of single topological index-based models (70,80). For the selection and evaluation of range-specific features, exclusive activity ranges were discovered from the frequency distribution of the response level, and the active range was subsequently identified by analyzing the resulting data by maximization of the moving average with respect to active compounds (<35% = inactive, 35–65% = transitional, >65% = active). The checkpoint kinase (Chk2) inhibitory activity assigned to each compound was compared with the reported biological activity. The average IC50 (nM) values for each range and activity were also calculated.

Data analysis

The sensitivity and specificity values, which represent the classification accuracies for the active and inactive compounds, respectively, were calculated. The randomness of the model was also assessed by calculating the Matthews correlation coefficient (MCC).

Table 1: Comparison of the discriminating power and degeneracy of ${}^{SED}\xi^{c}_{C1}$, ${}^{SED}\xi^{c}_{C2}$, ${}^{SED}\xi^{c}_{C3}$, and ${}^{SED}\xi^{c}_{C4}$ using all possible structures with three, four, and five vertices containing one heteroatom

| | ${}^{SED}\xi^{c}_{C1}$ | ${}^{SED}\xi^{c}_{C2}$ | ${}^{SED}\xi^{c}_{C3}$ | ${}^{SED}\xi^{c}_{C4}$ |
|---|---|---|---|---|
| For three vertices | | | | |
| Minimum value | 0.363 | 0.395 | 0.428 | 0.461 |
| Maximum value | 2.484 | 3.809 | 5.396 | 7.16 |
| Ratio | 1:6.843 | 1:9.643 | 1:12.61 | 1:15.54 |
| Degeneracy(a) | 0/3 | 0/3 | 0/3 | 0/3 |
| For four vertices | | | | |
| Minimum value | 0.089 | 0.099 | 0.109 | 0.119 |
| Maximum value | 6.585 | 14.858 | 32.737 | 70.8 |
| Ratio | 1:73.989 | 1:150.080 | 1:300.34 | 1:594.96 |
| Degeneracy | 0/11 | 0/11 | 0/11 | 0/11 |
| For five vertices | | | | |
| Minimum value | 0.039 | 0.047 | 0.057 | 0.067 |
| Maximum value | 11.814 | 30.236 | 74.188 | 176.075 |
| Ratio | 1:302.923 | 1:643.319 | 1:1301.54 | 1:2627.99 |
| Degeneracy | 0/47 | 0/47 | 0/47 | 0/47 |

(a) Degeneracy: number of compounds having the same value/total number of compounds with the same number of vertices.

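Table 2: Intercorrelation matrix

| | χA | ξc | M1c | M2c | Wc | SEDξcC1 | SEDξcC2 | SEDξcC3 | SEDξcC4 |
|---|---|---|---|---|---|---|---|---|---|
| χA | 1 | 0.939 | 0.59 | 0.619 | 0.743 | −0.01 | 0.061 | 0.106 | 0.132 |
| ξc | | 1 | 0.599 | 0.662 | 0.67 | −0.07 | −0.567 | 0.062 | 0.093 |
| M1c | | | 1 | 0.979 | 0.016 | −0.62 | −0.567 | −0.53 | −0.496 |
| M2c | | | | 1 | 0.045 | −0.57 | −0.502 | −0.46 | −0.422 |
| Wc | | | | | 1 | 0.548 | 0.58 | 0.593 | 0.594 |
| SEDξcC1 | | | | | | 1 | 0.993 | 0.98 | 0.965 |
| SEDξcC2 | | | | | | | 1 | 0.996 | 0.988 |
| SEDξcC3 | | | | | | | | 1 | 0.998 |
| SEDξcC4 | | | | | | | | | 1 |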

Gupta et al.

44 Chem Biol Drug Des 2012; 79: 38–52

The MCC values range between −1 and +1 and indicate the potential of the model. Matthews correlation coefficient takes both the sensitivity and specificity into account, and it is generally used as a balanced measure in dealing with data imbalance situations (81).
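For reference, sensitivity, specificity, overall accuracy, and MCC can all be computed from the four cells of a two-class confusion matrix. The sketch below uses the standard textbook definitions, which are assumed here rather than taken from the authors' program; the function name `classification_metrics` is hypothetical, and the counts are the training-set cells reported for the decision tree in Table 4 (17 actives and 28 inactives classified correctly, one error in each class).

```python
import math

def classification_metrics(tp, fn, fp, tn):
    """Return sensitivity, specificity, accuracy, and MCC for a 2x2 confusion matrix.

    tp/fn: actives predicted active/inactive; fp/tn: inactives predicted active/inactive.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc

sens, spec, acc, mcc = classification_metrics(tp=17, fn=1, fp=1, tn=28)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f} MCC={mcc:.3f}")
```

With these counts the standard formula gives an MCC of about 0.91, in line with the training-set value of 0.9 listed in Table 4.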

The results are summarized in Tables 4 and 5 and Figures 5 and 6. The validation of the decision tree (DT)-based model and the self-consistency test were performed by the 10-fold cross-validation (CV) method, in which the data set was randomly split into 10 folds. The model was developed using nine randomly selected folds, and the prediction was done on the remaining fold. The goodness of the DT-based model was also assessed by calculating the specificity and sensitivity. The 10-fold cross-validation results are presented in Table 4.
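The 10-fold procedure described above can be sketched with the standard library alone. This is an illustration of the splitting scheme, not the authors' R code; the function name `k_fold_splits` and the fixed seed are assumptions made for reproducibility of the example.

```python
import random

def k_fold_splits(n_items, k=10, seed=0):
    """Return a list of (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)        # random assignment of items to folds
    folds = [indices[i::k] for i in range(k)]   # k folds of nearly equal size
    splits = []
    for i, test in enumerate(folds):
        # Train on the other k-1 folds, predict on the held-out fold.
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, test))
    return splits

splits = k_fold_splits(47)  # 47 analogs in the data set
print([len(test) for _, test in splits])
```

With 47 compounds the folds cannot be equal: seven folds hold five compounds and three hold four, and each compound is predicted exactly once.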

Results and Discussion

The successful application of many topological descriptors is somewhat limited owing to low discriminating power and high degeneracy. There is always a strong need for the development of descriptors and approaches that could provide explicit information on the molecular aspects responsible for drug action (1). Moreover, pharmacogenomics (82), combinatorial chemistry (83,84), and high-throughput screening (85) make it possible to obtain and evaluate thousands of compounds in a short time. These technologies have generated new challenges for computational scientists, as they demand novel approaches to computer-aided lead discovery and optimization in an accelerated way (86).

As the structure of a compound depends on the connectivity of its constituent atoms, TIs based on connectivity can reveal the role of structural and substructural information of molecules in estimating biological activity and evaluating toxicity. Topological indices developed for predicting physicochemical properties and biological activities of chemical substances can be used for drug design (87,88). The application of TIs in drug design can be in lead discovery and lead optimization, virtual screening, structure activity/property studies, structure pharmacokinetics studies, and structure toxicity relationships. Recently, these are also being used in similarity/dissimilarity studies, in combinatorial chemistry, in studying the chirality of molecules, and in isomer discrimination and molecular complexity studies (1,3).

As shown in Figure 2, the value of SEDξcC1 changes by a factor of 11 (from 238.801 to 20.804), the value of SEDξcC2 changes by a factor of 30 (1554.158–52.646), the value of SEDξcC3 changes by a

Table 3: Topostructural and topochemical indices

| Code | Index | References |
|---|---|---|
| A1 | Molecular connectivity topochemical index | (59,60) |
| A2 | Eccentric adjacency topochemical index | (61) |
| A3 | Augmented eccentric connectivity topochemical index | (62) |
| A4 | Superadjacency topochemical index | (63) |
| A5 | Eccentric connectivity topochemical index | (64) |
| A6 | Connective eccentricity topochemical index | (65) |
| A7 | Zagreb topochemical index, M1c | (66) |
| A8 | Zagreb topochemical index, M2c | (66) |
| A9 | Wiener's topochemical index | (67) |
| A10 | Superaugmented eccentric connectivity topochemical index-1 | (68) |
| A11 | Superaugmented eccentric distance sum connectivity topochemical index-1 | – |
| A12 | Superaugmented eccentric distance sum connectivity topochemical index-2 | – |
| A13 | Superaugmented eccentric distance sum connectivity topochemical index-3 | – |
| A14 | Superaugmented eccentric distance sum connectivity topochemical index-4 | – |
| A15 | Molecular connectivity index | (69) |
| A16 | Eccentric adjacency index | (70) |
| A17 | Augmented eccentric connectivity index | (71) |
| A18 | Superadjacency index | (63) |
| A19 | Eccentric connectivity index | (72) |
| A20 | Connective eccentricity index | (73) |
| A21 | Zagreb index, M1 | (74,75) |
| A22 | Zagreb index, M2 | (74,75) |
| A23 | Superaugmented eccentric distance sum connectivity index-1 | – |
| A24 | Superaugmented eccentric distance sum connectivity index-2 | – |
| A25 | Superaugmented eccentric distance sum connectivity index-3 | – |
| A26 | Superaugmented eccentric distance sum connectivity index-4 | – |

Figure 3: Basic structures (A–D) of 2-arylbenzimidazole analogs (48).


factor of about 77 (9810.431–127.118), and the value of SEDξcC4 changes by a factor of 203 (60235.7–296.55) with a minor change in the branching of an 11-membered molecule containing one heteroatom. These descriptors have high discriminating power, which is defined as the ratio of the highest to the lowest value for all possible structures with the same number of vertices. The discriminating power of SEDξcC1, SEDξcC2, SEDξcC3, and SEDξcC4 is 302.9, 643.31, 1301.54, and 2627.99, respectively, for all possible structures containing only five vertices (Table 1).
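Both quantities used to compare the indices follow directly from a list of index values for all structures with a given number of vertices. The sketch below is a minimal illustration, assuming that values are compared after rounding to three decimals (the precision shown in Table 1); the function names and the three-value toy list are hypothetical, while 0.039 and 11.814 are the minimum and maximum of SEDξcC1 for five vertices in Table 1.

```python
from collections import Counter

def discriminating_power(values):
    """Ratio of the highest to the lowest index value among the structures."""
    return max(values) / min(values)

def degenerate_count(values, ndigits=3):
    """Number of structures whose (rounded) index value is shared with another structure."""
    counts = Counter(round(v, ndigits) for v in values)
    return sum(c for c in counts.values() if c > 1)

# Extremes reported in Table 1 for five-vertex structures.
print(round(discriminating_power([0.039, 11.814]), 3))

# Hypothetical values: no two structures share a value, so degeneracy is 0/3.
print(degenerate_count([0.363, 1.2, 2.484]), "/", 3)
```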

High discriminating power of the proposed new descriptors renders them extremely sensitive toward any change in molecular structure. The indices having discriminating power ≥100 for structures containing only five vertices are treated as 'fourth-generation'

| Compound No. | Substituent labels (R′, R, X) recovered from structure drawings | SEDξcC3 | SEDξcC4 | M1c | Wc | Predicted (SEDξcC3) | Predicted (SEDξcC4) | Predicted (M1c) | Predicted (Wc) | Reported |
|---|---|---|---|---|---|---|---|---|---|---|
| A1 | 5-CO2H | 91586.12 | 896147.4 | 146.833 | 1786.002 | − | − | − | − | − |
| A2 | 5-CN | 79292.74 | 764564.5 | 137.089 | 1591.261 | − | − | − | − | − |
| A3 | 5-NO2 | 93857.5 | 925718.4 | 148.586 | 1795.521 | − | − | − | − | − |
| A4 | 5-NH2 | 59235.27 | 532383.1 | 133.423 | 1404.508 | − | − | − | − | − |
| A5 | 5-SO2NH2 | 119298.8 | 1317554 | 188.871 | 2114.952 | − | ± | ± | ± | − |
| A6 | 5-CONH2 | 91449.81 | 894421.7 | 145.643 | 1784.01 | − | − | − | − | − |
| A7 | 5-CONHMe | 113258.4 | 1145389 | 150.005 | 2001.937 | − | − | − | − | − |
| A8 | 5-CONMe2 | 127095.1 | 1291854 | 156.367 | 2222.032 | − | − | − | ± | − |
| A9 | 4-CONH2 | 72988.76 | 677747 | 145.643 | 1736.01 | − | − | − | − | − |
| B1 | – | 8178.405 | 52418.31 | 103.425 | 634.61 | − | − | − | − | − |
| B2 | C2H5O | 32673.71 | 277181.9 | 120.977 | 1049.688 | − | − | − | − | − |
| B3 | O | 39180.8 | 334152.1 | 155.643 | 1562.043 | − | − | − | − | − |
| B4 | O | 64553.36 | 584108.6 | 145.643 | 1700.01 | − | − | − | − | − |
| B5 | H3C | 12639.64 | 88537.02 | 109.425 | 748.53 | − | − | − | − | − |
| B6 | – | 23890.89 | 184135.9 | 129.425 | 1112.29 | − | − | − | − | − |
| B7 | O, Cl, Cl | 137763.8 | 1548364 | 188.807 | 2171.079 | − | ± | ± | ± | − |
| B8 | O, Cl | 177984.9 | 2066145 | 167.225 | 2038.566 | + | + | ± | ± | + |
| C1 | Cl, Cl | 197741.5 | 2308991 | 188.807 | 2279.079 | + | + | ± | + | + |
| C2 | Cl, CH3 | 167856.9 | 1859651 | 173.225 | 2253.625 | + | + | + | + | + |

Figure 4: Relationship of superaugmented eccentric distance sum connectivity topochemical indices, Zagreb topochemical index, and Wiener's topochemical index with checkpoint kinase (Chk2) inhibitors. (+) active compound, (−) inactive compound, and (±) compound in transitional range.


topological descriptors (68,89). SEDξcC1, SEDξcC2, SEDξcC3, and SEDξcC4 did not exhibit any degeneracy for all possible structures with three, four, and five vertices.

Extremely low degeneracy of the proposed indices ensures enhanced sensitivity toward minor changes in branching, connectivity, and the molecular structure. The intercorrelation between the proposed superaugmented eccentric distance sum connectivity topochemical indices and other well-known TIs was also investigated. Pairs of TIs with r ≥ 0.97 are considered highly intercorrelated, those with 0.90 ≤ r ≤ 0.97 appreciably correlated, those with 0.50 ≤ r ≤ 0.89 weakly correlated, and finally the pairs

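| Compound No. | Substituent labels (R′, R, X) recovered from structure drawings | SEDξcC3 | SEDξcC4 | M1c | Wc | Predicted (SEDξcC3) | Predicted (SEDξcC4) | Predicted (M1c) | Predicted (Wc) | Reported |
|---|---|---|---|---|---|---|---|---|---|---|
| C3 | CF3, Cl | 249350.8 | 2968322 | 208.276 | 3011.165 | ± | − | + | ± | + |
| C4 | F, CF3 | 207690.4 | 2276202 | 196.532 | 2961.227 | + | + | ± | ± | + |
| C5 | CH3, CH3 | 140599.3 | 1459086 | 157.643 | 2228.171 | + | ± | ± | + | + |
| C6 | CONH2 | 208910.9 | 2357104 | 164.893 | 2535.002 | ± | − | ± | ± | − |
| C7 | OH | 111166.6 | 1114167 | 153.752 | 1999.253 | − | − | − | − | − |
| C8 | OH, Cl | 196827.6 | 2289581 | 175.334 | 2257.954 | + | + | + | + | + |
| C9 | – | 209235.1 | 2333511 | 171.643 | 2650.332 | ± | + | + | ± | + |
| C10 | Cl | 148340.1 | 1639973 | 167.225 | 2019.566 | + | ± | ± | ± | + |
| C11 | OMe | 184453.3 | 2076658 | 158.529 | 2282.825 | + | + | ± | + | + |
| C12 | Me | 125457 | 1298135 | 151.643 | 2014.09 | − | ± | − | ± | − |
| C13 | OMe, OMe | 223391.7 | 2536058 | 171.415 | 2761.638 | ± | − | ± | ± | − |
| C14 | NO2 | 215479.8 | 2453650 | 167.836 | 2548.014 | ± | − | ± | ± | − |
| C15 | COOH | 168234 | 1787241 | 166.083 | 2480.243 | + | ± | ± | ± | − |
| C16 | CONEt2 | 344574.6 | 4076895 | 184.285 | 3636.027 | − | − | ± | ± | − |
| C17 | NHSO2CH3 | 412604.4 | 5519888 | 209.816 | 3313.389 | − | − | + | ± | − |
| D1 | X = -SO2NH- | 114792.8 | 1299518 | 202.267 | 2681.661 | − | ± | + | ± | + |
| D2 | X = -CONH- | 130955.8 | 1346909 | 155.705 | 2194.43 | − | ± | − | ± | − |
| D3 | X = -Bond- | 55669.42 | 497660.2 | 137.425 | 1507.13 | − | − | − | − | − |
| D4 | Cl; X = -SO2NH- | 161985.7 | 1963304 | 177.287 | 2479.373 | + | + | + | ± | + |
| D5 | Cl; X = -CONH- | 242726.9 | 2943229 | 177.287 | 2479.373 | ± | − | + | ± | − |
| D6 | Cl; X = -Bond- | 112354.5 | 1213102 | 159.007 | 1731.546 | − | − | ± | − | − |
| D7 | Cl, Cl; X = -SO2NH- | 178305.8 | 2165525 | 245.43 | 3319.478 | + | + | + | ± | + |
| D8 | Cl, Cl; X = -CONH- | 267916.7 | 3267598 | 167.419 | 2719.215 | ± | − | ± | ± | − |
| D9 | Cl; X = -S- | 228733.3 | 2836739 | 152.507 | 2223.325 | − | − | ± | + | + |
| D10 | Cl; X = -S(O)- | 178413 | 2157663 | 165.842 | 2431.061 | + | + | + | + | + |
| D11 | Cl; X = -S(O)2- | 112538.2 | 1330680 | 177.843 | 2642.797 | − | ± | + | ± | + |
| D12 | X = -OCH2- | 132413.5 | 1367023 | 132.336 | 2052.09 | − | ± | − | ± | − |
| D13 | OH; X = -S- | 173856.5 | 1968678 | 146.007 | 2203.012 | + | + | + | ± | + |

Note: (+) active compound, (−) inactive compound, and (±) compound in transitional range.

Figure 4: Continued.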


of TIs with r < 0.50 are not intercorrelated (90). As indicated in Table 2, SEDξcC1, SEDξcC2, SEDξcC3, and SEDξcC4 are not intercorrelated with the well-known χA, ξc, M1, and M2. However, these indices were found to be weakly intercorrelated with Wc and highly intercorrelated with each other, as these are based on similar principles/matrices. The pairs of indices χA and ξc, and M1 and M2, are highly intercorrelated, whereas χA and Wc, ξc and Wc, ξc and M1, and ξc and M2 are found to be weakly intercorrelated, while M1 and M2 are found not to be intercorrelated with Wc.
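The correlation bands quoted from reference (90) can be expressed as a simple lookup. Two assumptions are made in this sketch and are not stated in the source: the stated bands leave a gap between 0.89 and 0.90, which is assigned here to the "weakly correlated" band, and the signed value of r is used as written, so that negative coefficients fall in the "not intercorrelated" band. The function name `intercorrelation_class` is hypothetical.

```python
def intercorrelation_class(r):
    """Map a correlation coefficient r to the qualitative band used in the text."""
    if r >= 0.97:
        return "highly intercorrelated"
    if r >= 0.90:
        return "appreciably correlated"
    if r >= 0.50:
        return "weakly correlated"
    return "not intercorrelated"

# Coefficients taken from Table 2.
print(intercorrelation_class(0.993))   # SEDξcC1 vs SEDξcC2
print(intercorrelation_class(0.548))   # Wc vs SEDξcC1
print(intercorrelation_class(-0.62))   # M1c vs SEDξcC1
```

If the magnitude of the correlation is what matters for descriptor selection, `abs(r)` would be passed instead, which changes the band for strongly negative coefficients.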

In this study, DT-, random forest (RF)-, and moving average analysis (MAA)-based models were developed for the prediction of checkpoint kinase (Chk2) inhibitory activity of 2-arylbenzimidazoles. The decision tree was built by utilizing 26 TIs of diverse nature. The index at the root node is the most important, and the importance of an index decreases as the length of the tree increases. The classification of 2-arylbenzimidazole analogs as active or inactive using a single tree, based on A13, A14, and A6, is illustrated in Figure 5 (each descriptor is denoted by an alphanumerical abbreviation that refers to Table 3). The decision tree identified A13 (SEDξcC3) as the most important index. The decision tree classified the 2-arylbenzimidazole analogs in the training set with an accuracy of 96% and in 10-fold cross-validation with an accuracy of 76.6%. The specificity and sensitivity of the DT-based model in the training set were of the order of 96.5% and 94.4%, respectively (Table 4). The specificity and sensitivity of the DT-based model in the cross-validated set were of the order of 82.7% and 66.6%, respectively. The values of MCC for the DT-based model in the training set and cross-validated set are 0.9 and 0.03, respectively, suggesting the randomness and robustness of the model. The values of specificity, sensitivity, and MCC are shown in Table 4.

The RFs were grown with the 26 topological descriptors listed in Table 3. The importance of a node was determined by mean

Table 4: Confusion matrix for checkpoint kinase (Chk2) inhibitory activity and recognition rate of models based on decision tree and random forest (RF)

| Model | Description | Ranges | Number of compounds predicted active | Number of compounds predicted inactive | Specificity (%) | Sensitivity (%) | Matthews correlation coefficient |
|---|---|---|---|---|---|---|---|
| Decision tree | Training set | Active | 17 | 1 | 96.5 | 94.4 | 0.9 |
| | | Inactive | 1 | 28 | | | |
| | Cross-validated set | Active | 12 | 6 | 82.7 | 66.6 | 0.03 |
| | | Inactive | 5 | 24 | | | |
| RF | | Active | 16 | 2 | 82.7 | 88.8 | 0.098 |
| | | Inactive | 5 | 24 | | | |

Table 5: Proposed model for the prediction of checkpoint kinase inhibitors

| Index | Nature of range | Index value | Total compounds in the range | Number of compounds predicted correctly | Overall accuracy of prediction (%) | Average IC50 (nM) |
|---|---|---|---|---|---|---|
| SEDξcC3 | Lower inactive | <140599.3 | 24 | 21 | 90 | 1239.44 (1414.2) |
| | Active | 140599.3–207690.4 | 13 | 12 | | 66.49 (10.95) |
| | Transitional | >207690.4–<267916.7 | 7 | NA | | 123.38 |
| | Upper inactive | ≥267916.7 | 3 | 3 | | 3620 (3620) |
| SEDξcC4 | Lower inactive | <1298135 | 17 | 17 | 94.59 | 1672.7 (1672.7) |
| | Transitional | 1298135–<1859651 | 10 | NA | | 205.46 |
| | Active | 1859651–2357104 | 11 | 11 | | 10.4 (10.4) |
| | Upper inactive | >2357104 | 9 | 7 | | 1301.41 (1671.42) |
| M1c | Inactive | <157.64 | 19 | 18 | 90.32 | 1508 (1590.444) |
| | Lower transitional | 157.64–<171.64 | 9 | NA | | 132.33 |
| | Lower active | 171.64–184.285 | 6 | 5 | | 120.417 (8.5) |
| | Upper transitional | >184.285–<202.267 | 7 | NA | | 123.38 |
| | Upper active | 202.267–245.43 | 6 | 5 | | 14.4167 (9.1) |
| Wc | Lower inactive | <2014.09 | 16 | 16 | >99 | 1152.25 (1152.25) |
| | Lower transitional | 2014.09–<2223.32 | 9 | NA | | 146.7 |
| | Active | 2223.32–2431.06 | 7 | 7 | | 10.843 (10.843) |
| | Upper transitional | >2431.06 | 15 | NA | | 165.7 |

NA, not applicable. Values in brackets are based on correctly predicted analogs in the particular range.


decrease in accuracy. The RF classified 2-arylbenzimidazole analogs either as active or as inactive with an accuracy of 83%. The specificity and sensitivity were of the order of 82.7% and 88.8%, respectively, and the value of MCC was found to be 0.098 (Table 4).

Using a single index at a time, four independent MAA-based models using SEDξcC3, SEDξcC4, M1c, and Wc were developed. The proposed models are shown in Table 5. The methodology used in this study aims at the development of suitable models for providing lead molecules through exploitation of the active ranges in the proposed models. These models are unique and differ widely from conventional QSAR models. Both systems of modeling have their own advantages and limitations. In the present case, the modeling system adopted has the distinct advantage of identifying narrow active ranges, which may be erroneously skipped during routine regression analysis in conventional QSAR modeling (68). As the ultimate goal of modeling is to provide lead structures, these active ranges can play a vital role in lead identification.

Retrofit analysis of the data (Figure 4 and Table 5) reveals that the MAA-based models derived from SEDξcC3, SEDξcC4, M1c, and Wc correctly predicted analogs with regard to checkpoint kinase (Chk2) inhibitory activity to the tune of 90%, 94.5%, 90.32%, and >99%, respectively. Transitional ranges were observed in all four models, indicating a gradual change in checkpoint kinase inhibitory activity. The active ranges of the models based on SEDξcC4 and Wc correctly predicted the checkpoint kinase (Chk2) inhibitory activity of analogs with an accuracy of >99%. As observed from Table 5 and Figure 6, the average IC50 of correctly predicted analogs in the active ranges of all four models varied only from 8.5 to ~11 nM, indicating exceptionally high potency. High accuracy of prediction combined with high potency renders the active ranges of the proposed models extremely beneficial for providing lead structures for the development of potent checkpoint kinase inhibitors.
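A single-index MAA model amounts to a lookup of the index value against the ranges in Table 5. The sketch below encodes the SEDξcC4 ranges from that table; the function names are hypothetical, and the way overall accuracy is computed (correct predictions divided by compounds in the non-transitional ranges only) is an inference that reproduces the 94.59% reported for this index, not a formula stated in the text.

```python
def predict_from_sed_c4(value):
    """Assign an activity class from the SEDξcC4 ranges listed in Table 5."""
    if value < 1298135:
        return "inactive"       # lower inactive range
    if value < 1859651:
        return "transitional"
    if value <= 2357104:
        return "active"
    return "inactive"           # upper inactive range

def overall_accuracy(ranges):
    """ranges: (total, correctly predicted) pairs for the non-transitional ranges."""
    total = sum(t for t, _ in ranges)
    correct = sum(c for _, c in ranges)
    return 100.0 * correct / total

# Index values of compounds A1, A5, C2, and C3 from Figure 4.
for v in (896147.4, 1317554, 1859651, 2968322):
    print(v, predict_from_sed_c4(v))

# Lower inactive, active, and upper inactive ranges of SEDξcC4 in Table 5.
print(round(overall_accuracy([(17, 17), (11, 11), (9, 7)]), 2))
```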

Conclusion

Superaugmented eccentric distance sum connectivity topochemical indices, novel molecular descriptors, exhibited exceptionally high discriminating power and sensitivity toward both the presence and the relative position of heteroatoms, combined with low degeneracy. Moreover, these indices were found to be non-correlating with important topological descriptors. These qualities ensure their utility in drug design, quantitative structure activity/property relationships, combinatorial library design, isomer discrimination, and similarity/dissimilarity studies.

Subsequently, the proposed TIs along with other TIs were successfully employed for the development of numerous models for Chk2 inhibitory activity of 2-arylbenzimidazoles through decision tree, RF, and MAA. The decision tree revealed that the proposed superaugmented eccentric distance sum connectivity topochemical index-3 (SEDξcC3) and superaugmented eccentric distance sum connectivity topochemical index-4 (SEDξcC4) are the most important indices. The exceptionally high degree of predictability of the resulting models offers a vast potential for providing lead structures for the development of specific Chk2 inhibitors that will help in improving the therapeutic window of radiation therapy and chemotherapy by reducing their side effects on normal cells.

Figure 6: Average IC50 (nM) value of correctly predicted analogs of 2-arylbenzimidazole in various ranges of topological models. Vertical axis: IC50 (nM), 0–4000. Horizontal axis: nature of ranges (lower inactive, lower transitional, active, upper transitional, upper inactive, upper active). Series: superaugmented eccentric distance sum connectivity topochemical index-3, superaugmented eccentric distance sum connectivity topochemical index-4, Wiener's topochemical index, and Zagreb topochemical index.

Figure 5: A decision tree for distinguishing active analogs (A) from inactive analogs (B); A13, superaugmented eccentric distance sum connectivity topochemical index-3 (SEDξcC3); A14, superaugmented eccentric distance sum connectivity topochemical index-4 (SEDξcC4); A6, connective eccentricity topochemical index.


References

1. Estrada E., Uriarte E. (2001) Recent advances on the role of topological indices in drug discovery research. Curr Med Chem;8:1573–1588.

2. Hann M., Green R. (1999) Chemoinformatics – a new name for an old problem? Curr Opin Chem Biol;3:379–383.

3. Estrada E., Patlewicz G., Uriarte E. (2003) From molecular graphs to drugs. A review on the use of topological indices in drug design and discovery. Ind J Chem;42A:1315–1329.

4. Venkatesh S., Lipper R.A. (2000) Role of the development scientist in compound lead selection and optimization. J Pharm Sci;89:145–154.

5. Hansch C. (1969) A quantitative approach to biochemical structure-activity relationships. Acc Chem Res;2:232–239.

6. Greener M. (2005) QSAR: prediction beyond the fourth dimension. Drug Disc Dev;8:44–47.

7. Waterbeemd V.D., Carter R.E., Grassy G., Kubinyi H., Martin Y.C., Tute M.S., Willett P. (1997) Glossary of terms used in computational drug design. Pure Appl Chem;69:1137–1152.

8. Randic M. (1997) On characterization of chemical structure. J Chem Inf Comput Sci;37:672–687.

9. Katritzky A.R., Gordeeva E.V. (1993) Traditional topological indices vs electronic, geometrical, and combined molecular descriptors in QSAR/QSPR research. J Chem Inf Comput Sci;33:835–857.

10. Devillers J., Balaban A.T. (1999) Topological Indices and Related Descriptors in QSAR and QSPR. Singapore: Gordon and Breach Science Publishers.

11. Ponce Y.M. (2004) Total and local (atom and atom type) molecular quadratic indices: significance interpretation, comparison to other molecular descriptors, and QSPR/QSAR applications. Bioorg Med Chem;12:6351–6369.

12. Basak S.C., Grunwald G.D. (1994) Molecular similarity and risk assessment: analog selection and property estimation using graph invariants. SAR QSAR Environ Res;2:289–307.

13. Basak S.C., Bertelsen S., Grunwald G.D. (1994) Application of graph theoretical parameters in quantifying molecular similarity and structure-activity relationships. J Chem Inf Comput Sci;34:270–276.

14. Hu Q.N., Liang Y.Z., Fang K.T. (2003) The matrix expression, topological index and atomic attribute of molecular topological structure. J Data Sci;1:361–389.

15. Bagchi M.C., Maiti B.C., Bose S. (2004) QSAR of anti tuberculosis drugs of INH type using graphical invariants. J Mol Str (Theochem);679:179–186.

16. Massague J. (2004) G1 cell-cycle control and cancer. Nature;432:298–306.

17. Perona R., Amor V.M., Pinilla R.M., Injesta C.B. (2008) Role of CHK2 in cancer development. Clin Transl Oncol;10:538–542.

18. Kawabe T. (2004) G2 checkpoint abrogators as anticancer drugs. Mol Cancer Ther;3:513–519.

19. Kastan M.B., Bartek J. (2004) Cell-cycle checkpoints and cancer. Nature;18:316–323.

20. Hoeijmakers J.H. (2001) Genome maintenance mechanisms for preventing cancer. Nature;411:366–374.

21. Bartek J., Lukas J. (2001) Mammalian G1- and S-phase checkpoints in response to DNA damage. Curr Opin Cell Biol;13:738–747.

22. Zhou B.-B.S., Elledge S.J. (2000) The DNA damage response: putting checkpoints in perspective. Nature;408:433–439.

23. Elledge S.J. (1996) Cell cycle checkpoints: preventing an identity crisis. Science;274:1664–1672.

24. Brown E.J., Baltimore D. (2000) ATR disruption leads to chromosomal fragmentation and early embryonic lethality. Genes Dev;14:397–402.

25. De Klein A., Muijtjens M., Os R.V., Vehoeven Y., Smit B., Carr A.M., Lehmann A.R., Hoeijmakers J.H.J. (2000) Targeted disruption of the cell-cycle checkpoint gene ATR leads to early embryonic lethality in mice. Curr Biol;10:479–482.

26. Liu Q., Guntuku S., Cui X.S., Matsuoka S., Cortez D., Tamai K., Luo G., Rivera S.C., Demayo F., Bradley A., Donehower L.A., Elledge S.J. (2000) Chk1 is an essential kinase that is regulated by ATR and required for the G(2)/M DNA damage checkpoint. Genes Dev;14:1448–1459.

27. Takai H., Tominaga K., Motoyama N., Minamishima Y.A., Nagahama H., Tsukiyama T., Ikeda K., Nakayama K., Nakanishi M., Nakayama K.I. (2000) Aberrant cell cycle checkpoint function and early embryonic death in Chk1(−/−) mice. Genes Dev;14:1439–1447.

28. Zhou B.B., Anderson H.J., Roberge M. (2003) Models of anti-cancer therapy targeting DNA checkpoint kinases in cancer therapy. Cancer Biol Ther;2:S16–S22.

29. Skladanowski A., Bozko P., Sabisz M. (2009) DNA structure and integrity checkpoints during the cell cycle and their role in drug targeting and sensitivity of tumor cells to anticancer treatment. Chem Rev;109:2951–2973.

30. Shiloh Y. (2003) ATM and related protein kinases: safeguarding genome integrity. Nat Rev Cancer;3:155–168.

31. Kastan M.B., Lim D. (2000) The many substrates and functions of ATM. Nat Rev Mol Cell Biol;1:179–186.

32. Bartek J., Lukas J. (2003) Chk1 and Chk2 kinases in checkpoint control and cancer. Cancer Cell;3:421–429.

33. Stracker T.H., Usuia T., John Petrini H.J. (2009) Taking the time to make important decisions: the checkpoint effector kinases Chk1 and Chk2 and the DNA damage response. DNA Repair;8:1047–1054.

34. Gent V., Hoeijmakers J.H.J., Kannar R. (2001) Chromosomal stability and the DNA double-stranded break connection. Nat Rev Cancer;2:196–206.

35. Khanna K.K., Jackson S.P. (2001) DNA double-strand breaks: signaling, repair and the cancer connection. Nat Genet;27:247–254.

36. Bartkova J., Horejsi Z., Koed K., Kramer A., Tort F., Zieger K., Guldberg P., Sehested M., Nesland J.M., Lukas C., Orntoft T., Lukas J., Bartek J. (2005) DNA damage response as a candidate anti-cancer barrier in early human tumorigenesis. Nature;434:864–870.

37. Pommier Y., Sordet O., Rao V.A., Zhang H., Kohn K.W. (2005) Targeting Chk2 kinase: molecular interaction maps and therapeutic rationale. Curr Pharm Des;22:2855–2872.

38. Antoni L., Sodha N., Collins I., Garrett M.D. (2007) CHK2 kinase: cancer susceptibility and cancer therapy two sides of the same coin. Nat Rev Cancer;7:925–936.


39. Castedo M., Perfettini J.-L., Roumier T., Andreau K., Yakushijin K., Horne D., Medema R., Kroemer G. (2004) The cell cycle checkpoint kinase Chk2 is a negative regulator of mitotic catastrophe. Oncogene;23:4353–4361.

40. Vakifahmetoglu H., Olsson M., Tamm C., Heidari N., Orrenius S., Zhivotovsky B. (2008) DNA damage induces two distinct modes of cell death in ovarian carcinomas. Cell Death Differ;15:555–566.

41. Chabalier-Taste C., Racca C., Dozier C., Larminat F. (2008) BRCA1 is regulated by Chk2 in response to spindle damage. Biochim Biophys Acta;1783:2223–2233.

42. Komarov P.G., Komarova E.A., Kondratov R.V., Christov-Tselkov K., Coon J.S., Chernov M.V., Gudkov A.V. (1999) A chemical inhibitor of p53 that protects mice from the side effects of cancer therapy. Science;285:1733–1737.

43. Evan G., Vousden K.H. (2001) Proliferation, cell cycle and apoptosis in cancer. Nature;411:342–348.

44. Hirao A., Kong Y.Y., Mastsuoka S., Wakeham A., Ruland J., Yoshida H., Liu D., Elledge S.J., Mak T.W. (2000) DNA damage-induced activation of p53 by the checkpoint kinase Chk2. Science;287:1824–1827.

45. Takai H., Naka K., Okada Y., Watanabe M., Harada N., Saito S., Anderson C.W., Appella E., Nakanishi M., Suzuki H., Nagashima K., Sawa H., Ikeda K., Motoyama N. (2002) Chk2-deficient mice exhibit radioresistance and defective p53-mediated transcription. EMBO J;21:5195–5205.

46. Jack M.T., Woo R.A., Motoyama N., Takai H., Lee P.W.K. (2004) DNA-dependent protein kinase and checkpoint kinase 2 synergistically activate a latent population of p53 upon DNA damage. J Biol Chem;279:15269–15273.

47. Zhou B.B., Bartek J. (2004) Targeting the checkpoint kinases: chemosensitization versus chemoprotection. Nat Rev Cancer;4:216–225.

48. Arienti K.L., Brunmark A., Axe F.U., McClur K., Lee A., Belvitt J., Neff D.F., Haung L., Crawford S., Pandit C.R., Karlsson L., Breitenbcher J.G. (2005) Checkpoint kinase inhibitors: SAR and radioprotective properties of a series of 2-arylbenzimidazoles. J Med Chem;48:1873–1885.

49. Jobson A.G., Cardellina J.H. II, Scudiero D., Kondapaka S., Zhang H., Kim H., Shoemaker R., Pommier Y. (2007) Identification of a bis-guanylhydrazone [4,4′-diacetyldiphenylurea-bis(guanylhydrazone); NSC 109555] as a novel chemotype for inhibition of Chk2 kinase. Mol Pharmacol;72:876–884.

50. Carlessi L., Buscemi G., Larson G., Hong Z., Wu J.Z., Delia D. (2007) Biochemical and cellular characterization of VRX0466617, a novel and selective inhibitor for the checkpoint kinase Chk2. Mol Cancer Ther;6:935–944.

51. Larson G., Yan S., Chen H., Rong F., Hong Z., Wu J.Z. (2007) Identification of novel, selective and potent Chk2 inhibitors. Bioorg Med Chem Lett;17:172–175.

52. Jobson A.G., Lountos G.T., Lorenzi P.L., Llamas J., Connelly J., Cerna D., Tropea J.E. et al. (2009) Cellular inhibition of checkpoint kinase 2 (Chk2) and potentiation of camptothecins radiation by the novel Chk2 inhibitor PV1019 [7-Nitro-1H-indole-2-carboxylic acid {4-[1-(guanidinohydrazone)-ethyl]-phenyl}-amide]. J Pharmacol Exp Ther;331:816–826.

53. Zabludoff S.D., Deng C., Grondine M.R., Sheehy A.M., Ashwell S., Caleb B.L., Green S. et al. (2008) AZD7762, a novel checkpoint kinase inhibitor, drives checkpoint abrogation and potentiates DNA-targeted therapies. Mol Cancer Ther;7:2955–2966.

54. Curman D., Cinel B., Williams D.E., Rundle N., Blockaaron W.D., Goodarzi A., Hutchinsi J.R., Clarkei P.R., Zhou B.-B., Lees-Miller S.P., Andersen R.J., Roberge M. (2001) Inhibition of the G2 DNA damage checkpoint and of protein kinases Chk1 and Chk2 by the marine sponge alkaloid debromohymenialdisine. J Biol Chem;276:17914–17919.

55. Yu Q., Rose J.L., Zhang H., Takemura H., Kohn K.W., Pommier Y. (2002) UCN-01 inhibits p53 up-regulation and abrogates radiation-induced G2-M checkpoint independently of p53 by targeting both of the checkpoint kinases, Chk2 and Chk1. Cancer Res;62:5743–5748.

56. Singh S.V., Antosiewicz A.H., Singh A.V., Lew K.L., Srivastava S.K., Kamath R., Brown K.D., Zhang L., Baskaran R. (2004) Sulforaphane-induced G2/M phase cell cycle arrest involves checkpoint kinase 2-mediated phosphorylation of cell division cycle. J Biol Chem;279:25813–25822.

57. Bucher N., Britten C.D. (2008) G2 checkpoint abrogation and checkpoint kinase-1 targeting in the treatment of cancer. Br J Cancer;98:523–528.

58. Janetka J.W., Ashwell S. (2009) Checkpoint kinase inhibitors: a review of the patent literature. Expert Opin Ther Pat;19:165–197.

59. Goel A., Madan A.K. (1995) Structure-activity study on anti-inflammatory pyrazole carboxylic acid hydrazide analogs using molecular connectivity indices. J Chem Inf Comput Sci;35:510–514.

60. Dureja H., Madan A.K. (2005) Topochemical models for prediction of cyclin-dependent kinase 2 inhibitory activity of indole-2-ones. J Mol Mod;11:525–531.

61. Gupta S., Singh M., Madan A.K. (2003) Novel topochemical descriptors for predicting anti-HIV activity. Indian J Chem;42A:1414–1425.

62. Bajaj S. (2005) Study on topochemical descriptors for the prediction of physicochemical and biological properties of molecules. Ph.D. Thesis, New Delhi, India: Guru Gobind Singh Indraprastha University.

63. Bajaj S., Sambhi S.S., Madan A.K. (2004) Prediction of carbonic anhydrase activation by tri-/tetrasubstituted pyridinium-azole drugs: a computational approach using novel topochemical descriptor. QSAR Comb Sci;23:506–514.

64. Kumar V., Sardana S., Madan A.K. (2004) Predicting anti-HIV activity of 2,3-diaryl-1,3-thiazolidin-4-ones: computational approach using reformed eccentric connectivity index. J Mol Mod;10:399–407.

65. Gupta S. (2002) Application and development of graph invariants of drug design. Ph.D. Thesis, Patiala, India: Punjabi University.

66. Bajaj S., Sambhi S.S., Madan A.K. (2005) Prediction of anti-inflammatory activity of N-arylanthranilic acids: computational approach using refined Zagreb indices. Croat Chem Acta;78:165–174.

67. Bajaj S., Sambhi S.S., Madan A.K. (2004) Predicting anti-HIV activity of phenethylthiazolethiourea (PETT) analogs: computational approach using Wiener's topochemical index. J Mol Str (Theochem);684:197–203.


68. Dureja H., Gupta S., Madan A.K. (2008) Predicting anti-HIV-1activity of 6-arylbenzonitriles: computational approach using su-peraugmented eccentric connectivity topochemical indices. JMol Graph and Mod;26:1020–1029.

69. Randic M. (1975) On characterization of molecular branching. JAm Chem Soc;97:6609–6615.

70. Gupta S., Singh M., Madan A.K. (2001) Predicting anti-HIV activ-ity: computational approach using a novel topological descriptor.J Comput Aided Mol Des;15:671–678.

71. Bajaj S., Sambi S.S., Madan A.K. (2006) Model for prediction ofanti-HIV activity of 2-pyridinone derivatives using novel topologi-cal descriptor. QSAR Comb Sci;25:813–823.

72. Sharma V., Goswami R., Madan A.K. (1997) Eccentric connectiv-ity index: a novel highly discriminating topological descriptor forstructure – property and structure – activity studies. J Chem InfComput Sci;37:273–282.

73. Gupta S., Singh M., Madan A.K. (2000) Connective eccentricindex: a novel topological descriptor for predicting biologicalactivity. J Mol Graph Mod;18:18–25.

74. Gutman I., Ruscic B., Trinajstic N., Wicox C.F. (1975) Graph the-ory and molecular orbitals XII acyclc polyenes. J ChemPhys;62:3399–3405.

75. Gutman I., Randic M. (1977) Algebric characterization of skeletalbranching. Chem Phys Lett;47:15–19.

76. Wiener H. (1947) Structural determination of paraffin boilingpoints. J Am Chem Soc;69:17–20.

77. Kim H., Koehler G.J. (1995) Theory and practice of decision treeinduction. Omega Int J Mgmt Sci;23:637–652.

78. Sprogar M., Kokol P., Zorman M., Podgorelec V., Yamamoto R.,Masuda G., Sakamoto N. (2001) Supporting medical decisionswith vector decision trees. Medinfo;10:552–556.

79. Breiman L. (2001) Random forests. Mach Learn;45:5–32.

80. Dureja H., Gupta S., Madan A.K. (2008) Topological models for the prediction of pharmacokinetic parameters of cephalosporins using random forest, decision tree and moving average analysis. Sci Pharm;76:401–408.

81. Han L., Wang Y., Bryant S.H. (2008) Developing and validating predictive decision tree models from mining chemical structural fingerprints and high throughput screening data in PubChem. BMC Bioinformatics;9:401.

82. Bailey D.S., Dean P.M. (1992) Pharmacogenomics and its impact on drug design and optimization. Annu Rev Med Chem;34:339–348.

83. Gallop M.A., Barrett R.W., Dower W.J., Fodor S.P.A., Gordon E.M. (1994) Applications of combinatorial technologies to drug discovery. 1. Background and peptide combinatorial libraries. J Med Chem;37:1233–1251.

84. Gordon E.M., Barrett R., Dower W.J., Fodor S.P.A., Gallop M.A. (1994) Applications of combinatorial technologies to drug discovery. 2. Combinatorial organic synthesis, library screening strategies, and future directions. J Med Chem;37:1386–1401.

85. Devlin J.P. (2000) High Throughput Screening. New York: Marcel Dekker.

86. Estrada E., Molina E. (2001) Novel local (fragment-based) topological molecular descriptors for QSPR/QSAR and molecular design. J Mol Graph Mod;20:54–64.

87. Galvez J., Garcia-Domenech R., Julian-Ortiz J.V., Soler R. (1995) Topological approach to drug design. J Chem Inf Comput Sci;35:272–284.

88. Ivanciuc O., Ivanciuc T., Klein D.J., Seitz W.A., Balaban A.T. (2001) Wiener index extension by counting even/odd graph distances. J Chem Inf Comput Sci;41:536–549.

89. Madan A.K., Dureja H. (2010) Eccentricity based descriptors for QSAR/QSPR, mathematical chemistry monographs, no. 9. In: Gutman I., Furtula B., editors. Novel Molecular Structure Descriptors – Theory and Applications II. Serbia: Croatian Chemical Society; p. 91–138.

90. Nikolic S., Kovacevic G., Milicevic A., Trinajstic N. (2003) The Zagreb indices 30 years after. Croat Chem Acta;76:113–124.

Note

a Available at: http://cheminfo.informatics.indiana.edu/~rguha/writing/pub/thesis/chap1.pdf (accessed 16 June 2008).
