
Modeling of 13C NMR chemical shifts of benzene derivatives using the RC–PC–ANN method: A comparative study of original molecular descriptors and multivariate image analysis descriptors


Zahra Garkani-Nejad and Marziyeh Poshteh-Shirani

Abstract: The primary goal of a quantitative structure–property relationship study is to identify a set of structurally based numerical descriptors that can be mathematically linked to a property of interest. In this work, two main groups of descriptors have been used to predict 13C NMR chemical shifts of the ipso, ortho, meta, and para positions in a series of 113 monosubstituted benzenes. First, two groups of descriptors were calculated: original molecular descriptors (constitutional, topological, electronic, and geometrical) and multivariate image analysis (MIA) descriptors. Then, the calculated descriptors were subjected to principal component analysis and the most significant principal components were extracted. Finally, the principal components most highly correlated with the chemical shifts were used as inputs to artificial neural networks. The results obtained using the rank correlation–principal component–artificial neural network (RC–PC–ANN) modeling method show a high ability to predict 13C NMR chemical shifts. Comparison of the results also indicates that MIA descriptors predict 13C NMR chemical shifts better than the original molecular descriptors.

Key words: multivariate image analysis (MIA), quantitative structure–property relationship (QSPR), principal component–artificial neural network (PC–ANN), 13C NMR chemical shift, monosubstituted benzenes.


Introduction

Methods for studying quantitative structure–property relationships are increasingly being developed in various areas of chemistry, biology, and pharmacy. Quantitative structure–activity/structure–property relationship (QSAR/QSPR) studies, one of the most important areas in chemometrics, give information that is useful for molecular design and medicinal chemistry.1–4 QSAR/QSPR models are mathematical equations relating chemical structure to a wide variety of physical, chemical, biological, and technological properties. In general, development of a QSPR model involves three steps: structural encoding, feature selection, and model building. Structural encoding involves the use of numerical descriptors to encode the structural features of a compound. Feature selection is then employed to determine which subset of the descriptors shows the best relation with the property of interest. Models built from the best subset

Received 20 November 2010. Accepted 12 February 2011. Published at www.nrcresearchpress.com/cjc on 3 May 2011.

Z. Garkani-Nejad and M. Poshteh-Shirani. Chemistry Department, Faculty of Science, Vali-e-Asr University, Rafsanjan, Iran.

Corresponding author: Z. Garkani-Nejad (e-mail: [email protected]).


Can. J. Chem. 89: 598–607 (2011) doi:10.1139/V11-041 Published by NRC Research Press


Table 1. Structure of monosubstituted benzenes and experimental 13C NMR chemical shifts (ppm) for four positions.

| No. | Substituent X | C-1 | C-2 | C-3 | C-4 |
| --- | --- | --- | --- | --- | --- |
| 1 | –H | 128.5 | 128.5 | 128.5 | 128.5 |
| 2 | –CH3 | 137.7 | 129.5 | 128.4 | 125.5 |
| 3 | –CH2CH3 | 140.2 | 127.9 | 128.4 | 125.7 |
| 4 | –CH2CH2CH3 | 138.8 | 128.3 | 128.6 | 125.8 |
| 5 | –CH(CH3)2 | 148.7 | 126.3 | 128.2 | 125.7 |
| 6 | –CH2CH2CH2CH3 | 139.4 | 128.3 | 128.3 | 125.7 |
| 7 | –C(CH3)3 | 147.1 | 125.2 | 128.1 | 125.4 |
| 8 | –cyclopropyl | 143.6 | 125.2 | 127.9 | 124.9 |
| 9 | –cyclopentyl | 146.3 | 127 | 128.1 | 125.6 |
| 10 | –cyclohexyl | 144.8 | 126.7 | 128.2 | 125.7 |
| 11 | –1-adamantyl | 150.7 | 125.6 | 128.0 | 125.4 |
| 12 | –CH2F | 137.0 | 127.8 | 128.9 | 129.0 |
| 13 | –CF3 | 131.0 | 125.3 | 128.8 | 131.8 |
| 14 | –CH2Cl | 137.8 | 128.8 | 128.7 | 128.5 |
| 15 | –CHCl2 | 140.4 | 126.1 | 128.6 | 129.7 |
| 16 | –CCl3 | 144.8 | 126.8 | 128.4 | 130.3 |
| 17 | –CH2Br | 138.0 | 129.2 | 128.8 | 128.7 |
| 18 | –CH2I | 139.0 | 128.5 | 128.5 | 128.5 |
| 19 | –CH2OH | 140.9 | 127.3 | 128.7 | 127.4 |
| 20 | –CH2OCH3 | 137.2 | 127.6 | 128.4 | 127.6 |
| 21 | –CH2NH2 | 143.4 | 127.1 | 128.3 | 126.5 |
| 22 | –CH2NHCH3 | 141.1 | 128.2 | 128.2 | 126.7 |
| 23 | –CH2N(CH3)2 | 136.3 | 129.0 | 128.2 | 127.0 |
| 24 | –CH2NO2 | 130.7 | 130.7 | 130.7 | 129.7 |
| 25 | –CH2CN | 130.1 | 129.0 | 127.7 | 127.8 |
| 26 | –CH2SH | 141.0 | 127.9 | 128.5 | 126.9 |
| 27 | –CH2SCH3 | 138.3 | 128.9 | 128.4 | 126.9 |
| 28 | –CH2S(O)CH3 | 129.3 | 130 | 128.9 | 128.3 |
| 29 | –CH2SO2CH3 | 128.4 | 130.6 | 129.1 | 129.1 |
| 30 | –CH2CHO | 135.9 | 129.8 | 129.0 | 127.4 |
| 31 | –CH2COCH3 | 134.3 | 129.3 | 128.6 | 126.9 |
| 32 | –CH2COOH | 135.0 | 129.9 | 128.9 | 127.3 |
| 33 | –CH=CH2 | 137.4 | 126.2 | 128.4 | 127.7 |
| 34 | –C(CH3)=CH2 | 141.1 | 125.4 | 128.1 | 127.3 |
| 35 | –C≡CH | 122.3 | 132.1 | 128.1 | 128.2 |
| 36 | –phenyl | 136.6 | 127.4 | 129.0 | 127.4 |
| 37 | –2-pyridyl | 139.7 | 127.1 | 129.0 | 127.1 |
| 38 | –4-pyridyl | 138.1 | 126.9 | 129.0 | 129 |
| 39 | –F | 162.1 | 115.5 | 130.1 | 124.1 |
| 40 | –Cl | 133.8 | 128.9 | 129.9 | 126.6 |
| 41 | –Br | 123.1 | 131.8 | 130.7 | 127.5 |
| 42 | –I | 97.3 | 137.4 | 130.1 | 127.4 |
| 43 | –OH | 157.3 | 115.7 | 129.9 | 121.1 |
| 44 | –OCH3 | 162.0 | 114.1 | 129.5 | 125.3 |
| 45 | –OCH=CH2 | 156.7 | 117 | 129.2 | 122.7 |
| 46 | –O-phenyl | 156.1 | 117.3 | 128.2 | 121.6 |
| 47 | –OCOCH3 | 150.9 | 121.4 | 128.9 | 125.3 |
| 48 | –OSi(CH3)3 | 155.3 | 120.1 | 129.4 | 121.4 |
| 49 | –OPO(O-phenyl)2 | 150.4 | 120.1 | 129.7 | 125.5 |
| 50 | –OCN | 153.5 | 115.8 | 131.1 | 127.5 |
| 51 | –NH2 | 146.7 | 115.1 | 129.3 | 118.5 |
| 52 | –NHCH3 | 143.5 | 112.3 | 129.3 | 116.9 |
| 53 | –N(CH3)2 | 144.5 | 113.1 | 129.4 | 118.0 |
| 54 | –NH-phenyl | 143.2 | 117.9 | 129.4 | 118.0 |
| 55 | –N(phenyl)2 | 141.6 | 121.5 | 129.4 | 122.9 |
| 56 | –NH3+ | 128.6 | 122.7 | 130.7 | 130.7 |
| 57 | –NH2+CH(CH3)2 | 134.0 | 124.4 | 129.6 | 129.2 |
| 58 | –N+(CH3)3 | 148.0 | 121.2 | 131.0 | 130.9 |
| 59 | –N(O)(CH3)2 | 154.7 | 120.1 | 129.3 | 129.1 |
| 60 | –NHCOCH3 | 138.2 | 120.4 | 128.7 | 124.1 |
| 61 | –NHOH | 150.0 | 115.4 | 126.3 | 123.2 |
| 62 | –NHNH2 | 151.3 | 112.0 | 129.0 | 118.9 |
| 63 | –N(NO)CH3 | 142.2 | 119.1 | 129.4 | 127.2 |
| 64 | –N=CH-phenyl | 153.2 | 122.0 | 129.8 | 127.0 |
| 65 | –N=NCH3 | 150.7 | 122.3 | 129.0 | 127.0 |
| 66 | –NO | 165.9 | 120.9 | 129.3 | 135.6 |
| 67 | –NO2 | 148.9 | 123.6 | 129.4 | 134.6 |
| 68 | –CN | 112.5 | 132.0 | 129.2 | 132.8 |
| 69 | –NCO | 133.6 | 124.8 | 129.6 | 125.7 |
| 70 | –NCS | 131.5 | 125.8 | 129.8 | 127.5 |
| 71 | –N+≡N | 115.8 | 134.5 | 134.2 | 144.5 |
| 72 | –SH | 132.5 | 129.2 | 128.8 | 125.3 |
| 73 | –SCH3 | 138.5 | 126.6 | 128.7 | 124.9 |
| 74 | –SC(CH3)3 | 133.0 | 137.5 | 128.2 | 128.5 |
| 75 | –S(CH3)2+ | 127.5 | 131.6 | 130.7 | 134.8 |
| 76 | –SCH=CH2 | 134.3 | 130.5 | 128.7 | 126.7 |
| 77 | –S-phenyl | 135.8 | 131.0 | 129.1 | 127.0 |
| 78 | –S-S-phenyl | 136.0 | 127.2 | 129.3 | 127.4 |
| 79 | –S(O)CH3 | 146.1 | 123.5 | 129.6 | 130.9 |
| 80 | –SO2CH3 | 140.8 | 127.1 | 129.3 | 133.6 |
| 81 | –SO2OH | 143.5 | 126.3 | 129.8 | 132.3 |
| 82 | –SO2OCH3 | 134.9 | 127.9 | 130.0 | 134.4 |
| 83 | –SO2F | 133.1 | 128.5 | 130.0 | 136.0 |
| 84 | –SO2Cl | 144.1 | 126.8 | 129.7 | 135.3 |
| 85 | –SO2NH2 | 139.3 | 125.5 | 128.8 | 131.7 |
| 86 | –SCN | 124.8 | 131.0 | 130.7 | 130.7 |
| 87 | –CHO | 136.7 | 129.7 | 129.0 | 134.3 |
| 88 | –COCH3 | 137.4 | 128.6 | 128.4 | 132.9 |
| 89 | –COCF3 | 122.9 | 130.3 | 129.2 | 135.2 |
| 90 | –COC≡CH | 135.9 | 129.5 | 128.5 | 134.4 |
| 91 | –CO-phenyl | 137.8 | 130.1 | 128.2 | 132.2 |
| 92 | –COOH | 130.6 | 130.1 | 128.4 | 133.7 |
| 93 | –COOCH3 | 130.5 | 129.7 | 128.4 | 132.8 |
| 94 | –CONH2 | 133.5 | 127.3 | 128.6 | 131.9 |
| 95 | –CON(CH3)2 | 134.5 | 127 | 128.3 | 129.5 |
| 96 | –COF | 132.7 | 130.1 | 127.8 | 133.8 |
| 97 | –COCl | 133.2 | 131.2 | 128.8 | 135.1 |
| 98 | –COSH | 134.7 | 127.9 | 128.7 | 133.9 |
| 99 | –CH=NCH3 | 137.3 | 129.0 | 128.6 | 130.8 |
| 100 | –CSH2-phenyl | 147.2 | 129.5 | 127.9 | 130.9 |
| 101 | –CSH2-(1-piperidyl) | 143.5 | 125.4 | 128.3 | 128.3 |
| 102 | –SiH3 | 128.0 | 135.8 | 128.1 | 129.8 |
| 103 | –SiH2CH3 | 133.3 | 134.8 | 128.0 | 129.5 |
| 104 | –Si(CH3)3 | 140.1 | 133.4 | 127.8 | 128.9 |
| 105 | –Si(phenyl)3 | 134.3 | 136.4 | 127.9 | 129.6 |
| 106 | –SiCl3 | 131.5 | 133.1 | 128.6 | 132.7 |
| 107 | –P(CH3)2 | 142.1 | 130.1 | 127.9 | 127.5 |
| 108 | –P(phenyl)2 | 137.4 | 133.7 | 128.5 | 128.6 |
| 109 | –PO(CH3)2 | 131.0 | 129.6 | 128.6 | 131.5 |
| 110 | –PO(OH)2 | 126.6 | 132.1 | 130.0 | 134.1 |
| 111 | –PO(OCH2CH3)2 | 130.1 | 132.1 | 128.3 | 131.9 |
| 112 | –PS(CH3)2 | 135.2 | 130.5 | 128.7 | 131.4 |
| 113 | –PS(OCH2CH3)2 | 134.6 | 131.3 | 128.1 | 131.9 |


of descriptors form a direct link between descriptors and the property of interest. Finally, validation determines the level of the model’s predictive capabilities for unknown compounds.

The most frequently applied computational methods for QSPR are based on 3D or multidimensional approaches,5–11 which provide information about the relation between physicochemical parameters and the property of interest. A 2D image-based methodology has recently been developed (MIA-QSPR, multivariate image analysis applied to QSPR),12,13 which avoids conformational screening and 3D alignment steps. MIA-QSPR has also been shown to be nicely predictive with some operational advantages.

Geladi and Esbensen14 have demonstrated that image analysis may provide useful information in chemistry, though the descriptors do not have a direct physicochemical meaning, since they are binaries. In MIA-QSPR studies, images (2D chemical structures) have been shown to contain chemical information,12,13 allowing the correlation of chemical structures and properties.

Goodarzi et al. have reported a QSPR study of the 13C NMR chemical shifts of methoxy flavonol derivatives using the MIA-QSPR method.15 They revealed that the predictive ability of MIA descriptors is comparable or even superior to that of the well-known gauge-included atomic orbital procedure for 13C NMR chemical shift calculation. Garkani-Nejad and Poshteh-Shirani have used the MIA-QSPR method to predict 13C NMR chemical shifts of naphthalene derivatives.16 The obtained results indicated high predictive ability of MIA descriptors in predicting 13C NMR chemical shifts of naphthalene derivatives.

The main aim of the present work was to reveal the capabilities of the rank correlation–principal component–artificial neural network (RC–PC–ANN) modeling method in predicting 13C NMR chemical shifts of a series of 113 monosubstituted benzenes using two kinds of descriptors: original

Fig. 1. 2D images and the unfolding step of the 113 chemical structures to give the X-matrix. The arrow in the structure indicates the coordinates of a pixel in common among the whole series of compounds, used in the 2D alignment step.

Table 2. CR–PCR models for C1–C4 positions of monosubstituted benzenes using PCs extracted from original molecular descriptors.

| Position | Model |
| --- | --- |
| Ipso | 138.627(±0.694) + 0.460(±0.131)C2 + 0.625(±0.203)C3 – 1.463(±0.278)C7 – 1.509(±0.378)C9 + 1.179(±0.462)C12 – 2.868(±0.947)C27 – 3.884(±1.216)C32 + 3.295(±1.299)C34 + 21.199(±7.830)C76 |
| Ortho | 126.691(±0.367) – 0.289(±0.070)C2 – 0.623(±0.108)C3 + 0.657(±0.147)C7 + 1.221(±0.200)C9 |
| Meta | 128.950(±0.064) + 0.147(±0.022)C5 + 0.117(±0.025)C6 + 0.141(±0.026)C7 + 0.116(±0.035)C9 |
| Para | 128.458(±0.275) – 0.531(±0.052)C2 + 0.591(±0.110)C7 |

Table 3. CR–PCR models for C1–C4 positions of monosubstituted benzenes using PCs extracted from MIA descriptors.

| Position | Model |
| --- | --- |
| Ipso | 138.627(±0.718) + 0.298(±0.111)C3 – 0.682(±0.277)C27 – 0.731(±0.322)C36 + 0.893(±0.395)C52 + 0.988(±0.397)C53 – 1.344(±0.459)C64 + 1.281(±0.537)C76 – 1.369(±0.547)C78 – 1.258(±0.570)C81 + 1.297(±0.587)C82 + 1.504(±0.627)C87 + 1.524(±0.672)C91 – 1.912(±0.733)C95 – 2.566(±0.748)C96 – 2.644(±0.966)C108 |
| Ortho | 126.691(±0.386) – 0.170(±0.60)C3 + 0.212(±0.096)C10 + 0.305(±0.135)C21 – 0.384(±0.144)C25 + 0.318(±149)C27 – 0.380(±0.152)C29 + 0.573(±0.171)C35 + 0.569(±0.173)C36 + 0.578(±0.268)C71 – 0.745(±0.293)C77 + 0.772(±0.294)C78 + 0.773(±0.306)C81 – 0.773(±0.315)C82 + 1.005(±0.394)C95 + 1.021(±0.518)C108 |
| Meta | 128.950(±0.069) + 0.034(±0.013)C5 – 0.051(±0.017)C10 – 0.049(±0.020)C14 – 0.044(±0.022)C18 + 0.059(±0.023)C20 – 0.065(±0.026)C26 + 0.067(±0.026)C27 – 0.099(±0.029)C33 + 0.074(±0.032)C39 + 0.069(±0.034)C44 + 0.068(±0.035)C46 + 0.103(±0.042)C61 + 0.098(±0.043)C63 + 0.119(±0.053)C79 – 0.117(±0.054)C80 |
| Para | 128.958(±0.307) – 0.068(±0.027)C1 – 0.157(±0.068)C7 – 0.243(±0.074)C9 – 0.285(±0.094)C17 – 0.268(±0.099)C16 + 0.233(±0.103)C20 + 0.317(±0.118)C27 + 0.442(±0.136)C35 + 0.386(±0.143)C39 – 0.356(±0.149)C40 + 0.422(±0.193)C61 + 0.456(±0.230)C75 – 0.479(±0.236)C76 – 0.500(±0.244)C79 – 0.686(±0.313)C94 |


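Table 4. Calculated values of 13C NMR chemical shifts (ppm) using the RC–PC–ANN method based on original molecular descriptors for all carbon positions: training, validation, and test sets.

| Set | No. | Substituent X | C-1 | C-2 | C-3 | C-4 |
| --- | --- | --- | --- | --- | --- | --- |
| Training | 1 | –H | 130.77 | 128.54 | 128.50 | 128.61 |
| Training | 2 | –CH3 | 144.90 | 129.50 | 128.44 | 122.31 |
| Training | 3 | –CH2CH3 | 134.64 | 128.76 | 128.30 | 124.56 |
| Training | 4 | –CH2CH2CH3 | 140.20 | 129.52 | 128.28 | 125.16 |
| Training | 6 | –CH2CH2CH2CH3 | 139.85 | 129.58 | 128.28 | 126.44 |
| Training | 7 | –C(CH3)3 | 142.36 | 125.32 | 128.37 | 125.16 |
| Training | 8 | –cyclopropyl | 143.79 | 125.31 | 128.43 | 123.18 |
| Training | 10 | –cyclohexyl | 144.79 | 126.70 | 128.37 | 127.42 |
| Training | 11 | –1-adamantyl | 145.38 | 125.60 | 128.55 | 122.90 |
| Training | 12 | –CH2F | 142.51 | 128.35 | 128.52 | 127.33 |
| Training | 13 | –CF3 | 139.14 | 125.28 | 128.87 | 132.63 |
| Training | 15 | –CHCl2 | 141.35 | 125.49 | 128.99 | 132.23 |
| Training | 16 | –CCl3 | 146.07 | 126.74 | 128.40 | 132.60 |
| Training | 17 | –CH2Br | 138.43 | 128.73 | 128.74 | 129.06 |
| Training | 18 | –CH2I | 137.53 | 129.28 | 129.05 | 129.12 |
| Training | 19 | –CH2OH | 140.93 | 128.23 | 128.43 | 127.43 |
| Training | 21 | –CH2NH2 | 136.76 | 127.10 | 128.50 | 126.96 |
| Training | 25 | –CH2CN | 141.68 | 129.27 | 128.61 | 126.80 |
| Training | 26 | –CH2SH | 137.10 | 128.29 | 128.46 | 127.52 |
| Training | 27 | –CH2SCH3 | 138.78 | 129.30 | 128.38 | 126.98 |
| Training | 28 | –CH2S(O)CH3 | 134.01 | 129.36 | 128.70 | 128.21 |
| Training | 31 | –CH2COCH3 | 143.34 | 129.55 | 128.36 | 128.58 |
| Training | 32 | –CH2COOH | 135.63 | 129.51 | 128.50 | 131.38 |
| Training | 35 | –C≡CH | 122.04 | 129.35 | 128.45 | 126.39 |
| Training | 39 | –F | 142.86 | 115.50 | 130.10 | 122.29 |
| Training | 40 | –Cl | 134.78 | 128.09 | 129.65 | 126.82 |
| Training | 41 | –Br | 123.11 | 128.39 | 130.47 | 128.56 |
| Training | 43 | –OH | 153.63 | 115.18 | 129.90 | 121.23 |
| Training | 45 | –OCH=CH2 | 156.87 | 119.54 | 128.86 | 124.65 |
| Training | 46 | –O-phenyl | 151.65 | 117.30 | 129.67 | 120.61 |
| Training | 47 | –OCOCH3 | 146.26 | 119.70 | 128.87 | 127.81 |
| Training | 48 | –OSi(CH3)3 | 151.17 | 119.45 | 129.40 | 124.75 |
| Training | 49 | –OPO(O-phenyl)2 | 147.79 | 119.76 | 129.70 | 125.29 |
| Training | 50 | –OCN | 149.70 | 118.99 | 130.83 | 128.26 |
| Training | 51 | –NH2 | 151.84 | 115.06 | 129.30 | 120.53 |
| Training | 53 | –N(CH3)2 | 149.10 | 113.03 | 129.40 | 120.20 |
| Training | 54 | –NH-phenyl | 147.96 | 117.90 | 129.26 | 118.80 |
| Training | 55 | –N(phenyl)2 | 147.08 | 121.52 | 129.33 | 122.78 |
| Training | 56 | –NH3+ | 134.14 | 122.70 | 130.70 | 129.51 |
| Training | 57 | –NH2+CH(CH3)2 | 133.33 | 129.50 | 129.59 | 129.73 |
| Training | 58 | –N+(CH3)3 | 136.65 | 126.78 | 131.00 | 131.97 |
| Training | 60 | –NHCOCH3 | 138.31 | 119.59 | 129.12 | 126.68 |
| Training | 61 | –NHOH | 152.47 | 115.16 | 126.24 | 124.32 |
| Training | 62 | –NHNH2 | 153.31 | 112.84 | 129.16 | 123.85 |
| Training | 64 | –N=CH-phenyl | 156.51 | 119.71 | 128.76 | 128.33 |
| Training | 65 | –N=NCH3 | 147.51 | 129.32 | 128.72 | 127.46 |
| Training | 66 | –NO | 156.92 | 127.54 | 128.84 | 131.69 |
| Training | 67 | –NO2 | 150.51 | 123.84 | 128.88 | 132.44 |
| Training | 68 | –CN | 127.19 | 128.90 | 128.93 | 128.11 |
| Training | 69 | –NCO | 135.44 | 124.80 | 129.04 | 127.93 |
| Training | 71 | –N+≡N | 119.60 | 134.31 | 134.18 | 143.96 |
| Training | 72 | –SH | 138.54 | 129.29 | 128.73 | 125.94 |
| Training | 74 | –SC(CH3)3 | 133.15 | 128.93 | 128.61 | 127.88 |
| Training | 75 | –S(CH3)2+ | 130.46 | 130.85 | 130.84 | 132.48 |
| Training | 76 | –SCH=CH2 | 133.75 | 128.53 | 128.70 | 127.42 |
| Training | 78 | –S-S-phenyl | 137.85 | 129.24 | 129.06 | 128.29 |
| Training | 81 | –SO2OH | 144.52 | 128.17 | 129.33 | 132.67 |
| Training | 83 | –SO2F | 138.80 | 128.18 | 129.64 | 136.00 |
| Training | 84 | –SO2Cl | 137.25 | 128.21 | 130.01 | 135.30 |
| Training | 85 | –SO2NH2 | 141.86 | 128.19 | 129.57 | 132.67 |
| Training | 86 | –SCN | 128.93 | 126.59 | 130.07 | 129.68 |
| Training | 89 | –COCF3 | 123.48 | 129.18 | 128.86 | 132.66 |
| Training | 90 | –COC≡CH | 130.74 | 129.40 | 128.33 | 132.30 |
| Training | 91 | –CO-phenyl | 137.02 | 129.55 | 128.39 | 126.38 |
| Training | 92 | –COOH | 133.85 | 128.48 | 128.38 | 132.51 |
| Training | 93 | –COOCH3 | 131.50 | 129.37 | 128.30 | 132.41 |
| Training | 94 | –CONH2 | 128.46 | 126.91 | 128.46 | 132.38 |
| Training | 96 | –COF | 135.14 | 127.83 | 128.47 | 132.52 |
| Training | 97 | –COCl | 136.80 | 131.20 | 128.51 | 132.51 |
| Training | 99 | –CH=NCH3 | 141.22 | 129.42 | 128.38 | 127.12 |
| Training | 100 | –CSH2-phenyl | 143.13 | 129.50 | 128.61 | 128.88 |
| Training | 101 | –CSH2-(1-piperidyl) | 145.58 | 125.40 | 128.53 | 128.74 |
| Training | 104 | –Si(CH3)3 | 137.92 | 133.67 | 128.36 | 132.44 |
| Training | 105 | –Si(phenyl)3 | 136.67 | 136.09 | 127.99 | 125.37 |
| Training | 106 | –SiCl3 | 131.81 | 134.78 | 128.60 | 132.67 |
| Training | 108 | –P(phenyl)2 | 139.23 | 133.66 | 128.85 | 129.83 |
| Training | 109 | –PO(CH3)2 | 139.08 | 129.02 | 128.34 | 132.67 |
| Training | 110 | –PO(OH)2 | 131.61 | 128.42 | 128.71 | 132.67 |
| Training | 111 | –PO(OCH2CH3)2 | 131.31 | 132.32 | 129.00 | 132.67 |
| Training | 112 | –PS(CH3)2 | 137.13 | 128.32 | 128.36 | 132.67 |
| Training | 113 | –PS(OCH2CH3)2 | 135.35 | 131.30 | 128.34 | 132.67 |
| Validation | 5 | –CH(CH3)2 | 139.53 | 126.32 | 128.20 | 125.75 |
| Validation | 14 | –CH2Cl | 137.10 | 128.85 | 128.70 | 128.50 |
| Validation | 23 | –CH2N(CH3)2 | 133.69 | 128.96 | 128.20 | 127.00 |
| Validation | 24 | –CH2NO2 | 131.78 | 130.21 | 130.70 | 129.75 |
| Validation | 30 | –CH2CHO | 132.87 | 129.73 | 129.00 | 127.40 |
| Validation | 36 | –phenyl | 133.43 | 127.36 | 129.00 | 127.39 |
| Validation | 37 | –2-pyridyl | 138.17 | 127.10 | 130.70 | 127.10 |
| Validation | 38 | –4-pyridyl | 137.34 | 126.73 | 130.70 | 128.98 |
| Validation | 42 | –I | 96.90 | 136.38 | 130.70 | 127.41 |
| Validation | 63 | –N(NO)CH3 | 140.25 | 119.07 | 130.70 | 127.22 |
| Validation | 70 | –NCS | 128.28 | 125.92 | 129.80 | 127.50 |
| Validation | 73 | –SCH3 | 139.63 | 130.25 | 128.70 | 124.91 |
| Validation | 87 | –CHO | 135.77 | 129.81 | 129.00 | 133.19 |
| Validation | 88 | –COCH3 | 136.32 | 128.67 | 128.40 | 131.37 |
| Validation | 95 | –CON(CH3)2 | 132.84 | 127.10 | 128.30 | 131.36 |
| Validation | 103 | –SiH2CH3 | 137.91 | 131.39 | 128.00 | 129.46 |
| Test | 9 | –cyclopentyl | 143.84 | 128.16 | 128.10 | 125.62 |
| Test | 20 | –CH2OCH3 | 136.52 | 128.15 | 128.19 | 127.11 |
| Test | 22 | –CH2NHCH3 | 137.34 | 128.15 | 128.23 | 126.81 |
| Test | 29 | –CH2SO2CH3 | 124.95 | 128.12 | 129.10 | 129.22 |
| Test | 33 | –CH=CH2 | 134.98 | 128.15 | 128.38 | 127.58 |
| Test | 34 | –C(CH3)=CH2 | 141.08 | 128.15 | 128.26 | 127.53 |
| Test | 44 | –OCH3 | 156.50 | 114.21 | 129.49 | 125.37 |
| Test | 52 | –NHCH3 | 144.57 | 112.02 | 129.30 | 116.91 |
| Test | 59 | –N(O)(CH3)2 | 154.18 | 119.95 | 129.31 | 128.67 |
| Test | 77 | –S-phenyl | 131.75 | 128.15 | 129.10 | 127.04 |
| Test | 79 | –S(O)CH3 | 144.01 | 123.10 | 129.60 | 132.34 |
| Test | 80 | –SO2CH3 | 139.73 | 127.30 | 129.30 | 132.40 |
| Test | 82 | –SO2OCH3 | 134.99 | 127.09 | 130.00 | 134.40 |
| Test | 98 | –COSH | 132.02 | 127.83 | 128.70 | 132.19 |
| Test | 102 | –SiH3 | 128.19 | 135.79 | 128.10 | 129.72 |
| Test | 107 | –P(CH3)2 | 139.46 | 129.02 | 127.93 | 127.64 |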


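Table 5. Calculated values of 13C NMR chemical shifts (ppm) using the RC–PC–ANN method based on MIA descriptors for all carbon positions: training, validation, and test sets.

| Set | No. | Substituent X | C-1 | C-2 | C-3 | C-4 |
| --- | --- | --- | --- | --- | --- | --- |
| Training | 1 | –H | 128.64 | 129.48 | 128.66 | 128.28 |
| Training | 2 | –CH3 | 143.43 | 129.42 | 128.22 | 125.31 |
| Training | 3 | –CH2CH3 | 137.86 | 127.79 | 128.42 | 125.79 |
| Training | 4 | –CH2CH2CH3 | 140.30 | 128.93 | 128.55 | 125.93 |
| Training | 6 | –CH2CH2CH2CH3 | 136.20 | 128.80 | 128.05 | 127.35 |
| Training | 7 | –C(CH3)3 | 150.40 | 123.66 | 128.03 | 124.93 |
| Training | 8 | –cyclopropyl | 141.98 | 126.97 | 127.81 | 125.45 |
| Training | 10 | –cyclohexyl | 147.67 | 127.65 | 128.25 | 125.61 |
| Training | 11 | –1-adamantyl | 148.03 | 125.25 | 128.02 | 126.66 |
| Training | 12 | –CH2F | 132.34 | 127.83 | 129.08 | 128.67 |
| Training | 13 | –CF3 | 134.16 | 125.95 | 128.94 | 130.94 |
| Training | 15 | –CHCl2 | 141.55 | 129.37 | 128.33 | 129.58 |
| Training | 16 | –CCl3 | 138.09 | 126.53 | 128.42 | 129.44 |
| Training | 17 | –CH2Br | 136.52 | 129.62 | 128.62 | 128.96 |
| Training | 18 | –CH2I | 136.10 | 129.04 | 128.39 | 128.57 |
| Training | 19 | –CH2OH | 142.10 | 127.87 | 128.70 | 127.10 |
| Training | 21 | –CH2NH2 | 142.01 | 128.02 | 128.26 | 126.33 |
| Training | 25 | –CH2CN | 129.74 | 128.14 | 127.72 | 127.87 |
| Training | 26 | –CH2SH | 141.08 | 127.82 | 128.52 | 127.15 |
| Training | 27 | –CH2SCH3 | 140.73 | 129.40 | 128.41 | 127.27 |
| Training | 28 | –CH2S(O)CH3 | 134.06 | 128.75 | 128.98 | 128.21 |
| Training | 31 | –CH2COCH3 | 131.35 | 129.81 | 128.44 | 127.02 |
| Training | 32 | –CH2COOH | 128.69 | 128.32 | 129.00 | 127.27 |
| Training | 35 | –C≡CH | 123.75 | 131.40 | 128.18 | 128.73 |
| Training | 39 | –F | 158.59 | 115.21 | 129.71 | 123.33 |
| Training | 40 | –Cl | 135.37 | 128.35 | 129.98 | 126.74 |
| Training | 41 | –Br | 128.02 | 130.36 | 130.19 | 127.86 |
| Training | 43 | –OH | 154.16 | 115.47 | 129.98 | 121.06 |
| Training | 45 | –OCH=CH2 | 145.89 | 116.58 | 129.07 | 122.34 |
| Training | 46 | –O-phenyl | 155.84 | 117.78 | 128.18 | 121.56 |
| Training | 47 | –OCOCH3 | 142.60 | 126.41 | 128.86 | 125.76 |
| Training | 48 | –OSi(CH3)3 | 155.30 | 120.38 | 129.54 | 121.53 |
| Training | 49 | –OPO(O-phenyl)2 | 145.62 | 123.16 | 129.55 | 124.04 |
| Training | 50 | –OCN | 151.82 | 116.70 | 131.12 | 128.03 |
| Training | 51 | –NH2 | 146.77 | 115.10 | 129.30 | 119.17 |
| Training | 53 | –N(CH3)2 | 145.27 | 111.75 | 129.49 | 118.92 |
| Training | 54 | –NH-phenyl | 139.53 | 120.92 | 129.49 | 117.85 |
| Training | 55 | –N(phenyl)2 | 137.70 | 121.68 | 129.41 | 122.61 |
| Training | 56 | –NH3+ | 131.57 | 123.83 | 130.74 | 130.49 |
| Training | 57 | –NH2+CH(CH3)2 | 133.52 | 125.96 | 129.39 | 129.64 |
| Training | 58 | –N+(CH3)3 | 140.98 | 121.20 | 131.00 | 130.08 |
| Training | 60 | –NHCOCH3 | 140.27 | 121.62 | 128.76 | 124.12 |
| Training | 61 | –NHOH | 149.95 | 115.20 | 126.36 | 122.72 |
| Training | 62 | –NHNH2 | 151.35 | 112.83 | 129.03 | 118.77 |
| Training | 64 | –N=CH-phenyl | 158.37 | 121.11 | 129.44 | 127.32 |
| Training | 65 | –N=NCH3 | 143.43 | 126.10 | 129.04 | 124.87 |
| Training | 66 | –NO | 159.06 | 122.15 | 129.21 | 136.28 |
| Training | 67 | –NO2 | 148.15 | 124.59 | 129.39 | 134.66 |
| Training | 68 | –CN | 115.95 | 132.18 | 128.99 | 130.90 |
| Training | 69 | –NCO | 135.16 | 124.88 | 129.54 | 124.50 |
| Training | 71 | –N+≡N | 111.49 | 133.62 | 133.75 | 142.15 |
| Training | 72 | –SH | 129.47 | 128.27 | 128.81 | 125.38 |
| Training | 74 | –SC(CH3)3 | 134.27 | 136.88 | 128.14 | 128.80 |
| Training | 75 | –S(CH3)2+ | 128.94 | 130.72 | 130.67 | 134.74 |
| Training | 76 | –SCH=CH2 | 134.21 | 129.93 | 128.64 | 127.02 |
| Training | 78 | –S-S-phenyl | 136.55 | 127.47 | 129.21 | 128.21 |
| Training | 81 | –SO2OH | 137.19 | 125.63 | 129.85 | 131.68 |
| Training | 83 | –SO2F | 133.74 | 129.72 | 130.14 | 136.44 |
| Training | 84 | –SO2Cl | 142.98 | 126.23 | 129.75 | 135.74 |
| Training | 85 | –SO2NH2 | 139.05 | 127.57 | 128.79 | 132.29 |
| Training | 86 | –SCN | 123.97 | 128.93 | 130.80 | 130.62 |
| Training | 89 | –COCF3 | 129.87 | 129.41 | 129.22 | 135.43 |
| Training | 90 | –COC≡CH | 136.88 | 129.46 | 128.50 | 134.16 |
| Training | 91 | –CO-phenyl | 142.13 | 129.78 | 128.16 | 132.21 |
| Training | 92 | –COOH | 132.37 | 130.13 | 128.51 | 133.30 |
| Training | 93 | –COOCH3 | 137.27 | 129.49 | 128.57 | 132.47 |
| Training | 94 | –CONH2 | 133.92 | 127.81 | 128.54 | 131.40 |
| Training | 96 | –COF | 134.89 | 127.73 | 127.64 | 132.03 |
| Training | 97 | –COCl | 132.94 | 129.20 | 128.80 | 134.90 |
| Training | 99 | –CH=NCH3 | 142.24 | 129.63 | 128.60 | 130.45 |
| Training | 100 | –CSH2-phenyl | 144.82 | 129.05 | 127.86 | 130.95 |
| Training | 101 | –CSH2-(1-piperidyl) | 146.27 | 125.91 | 128.24 | 128.05 |
| Training | 104 | –Si(CH3)3 | 138.54 | 133.34 | 127.83 | 129.26 |
| Training | 105 | –Si(phenyl)3 | 137.75 | 135.16 | 128.27 | 129.97 |
| Training | 106 | –SiCl3 | 133.81 | 134.07 | 128.58 | 132.86 |
| Training | 108 | –P(phenyl)2 | 140.55 | 133.88 | 128.49 | 129.02 |
| Training | 109 | –PO(CH3)2 | 131.81 | 129.86 | 129.03 | 132.29 |
| Training | 110 | –PO(OH)2 | 127.40 | 128.61 | 129.96 | 134.04 |
| Training | 111 | –PO(OCH2CH3)2 | 127.84 | 128.10 | 128.37 | 131.42 |
| Training | 112 | –PS(CH3)2 | 138.03 | 132.24 | 128.72 | 131.68 |
| Training | 113 | –PS(OCH2CH3)2 | 140.34 | 129.72 | 128.71 | 131.54 |
| Validation | 5 | –CH(CH3)2 | 148.39 | 126.10 | 128.20 | 125.71 |
| Validation | 14 | –CH2Cl | 137.52 | 128.94 | 128.69 | 128.03 |
| Validation | 23 | –CH2N(CH3)2 | 135.13 | 126.63 | 128.19 | 126.87 |
| Validation | 24 | –CH2NO2 | 129.76 | 130.83 | 130.60 | 129.65 |
| Validation | 30 | –CH2CHO | 134.66 | 128.42 | 128.98 | 127.54 |
| Validation | 36 | –phenyl | 138.02 | 127.07 | 128.96 | 127.33 |
| Validation | 37 | –2-pyridyl | 138.48 | 125.44 | 128.97 | 127.14 |
| Validation | 38 | –4-pyridyl | 137.40 | 127.48 | 128.97 | 129.12 |
| Validation | 42 | –I | 96.85 | 136.27 | 130.07 | 127.35 |
| Validation | 63 | –N(NO)CH3 | 141.65 | 118.95 | 129.38 | 127.12 |
| Validation | 70 | –NCS | 129.01 | 124.36 | 129.78 | 127.61 |
| Validation | 73 | –SCH3 | 137.91 | 127.67 | 128.67 | 125.39 |
| Validation | 87 | –CHO | 136.84 | 127.26 | 128.99 | 134.11 |
| Validation | 88 | –COCH3 | 136.33 | 127.79 | 128.41 | 132.83 |
| Validation | 95 | –CON(CH3)2 | 133.50 | 126.72 | 128.29 | 129.69 |
| Validation | 103 | –SiH2CH3 | 134.99 | 132.96 | 127.99 | 129.58 |
| Test | 9 | –cyclopentyl | 144.12 | 127.61 | 128.09 | 125.39 |
| Test | 20 | –CH2OCH3 | 134.31 | 127.82 | 128.42 | 127.48 |
| Test | 22 | –CH2NHCH3 | 141.16 | 128.34 | 128.18 | 126.76 |
| Test | 29 | –CH2SO2CH3 | 125.87 | 131.49 | 129.08 | 129.12 |
| Test | 33 | –CH=CH2 | 138.50 | 126.24 | 128.41 | 127.79 |
| Test | 34 | –C(CH3)=CH2 | 139.316 | 125.77 | 128.08 | 127.27 |
| Test | 44 | –OCH3 | 161.33 | 114.06 | 129.49 | 125.34 |
| Test | 52 | –NHCH3 | 142.10 | 112.96 | 129.27 | 116.63 |
| Test | 59 | –N(O)(CH3)2 | 154.10 | 120.21 | 129.29 | 129.14 |
| Test | 77 | –S-phenyl | 131.02 | 131.57 | 129.06 | 126.99 |
| Test | 79 | –S(O)CH3 | 143.04 | 123.42 | 129.58 | 130.84 |
| Test | 80 | –SO2CH3 | 139.51 | 131.33 | 129.29 | 133.59 |
| Test | 82 | –SO2OCH3 | 131.24 | 126.66 | 129.92 | 134.19 |
| Test | 98 | –COSH | 133.20 | 128.11 | 128.71 | 133.78 |
| Test | 102 | –SiH3 | 127.83 | 131.49 | 128.11 | 129.51 |
| Test | 107 | –P(CH3)2 | 138.87 | 130.34 | 127.90 | 127.52 |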


molecular descriptors and MIA descriptors. The results obtained using the two kinds of descriptors were compared.

Experimental

Data set

The 13C NMR chemical shifts of 113 monosubstituted benzenes (in ppm relative to TMS) for ipso, ortho, meta, and para positions were obtained from the literature.17 During this study, attention was paid to collecting 13C NMR chemical shift data under similar experimental conditions. The chemical structures of the different substituted compounds used in this study and their 13C NMR chemical shifts are shown in Table 1.

Descriptors

Original molecular descriptors

All structures were generated with HyperChem software (version 7) and optimized by the AM1 semi-empirical method.18 The selection and calculation of the structural descriptors as numerically encoded parameters is regarded as an essential step in every QSPR study. In this work, some QSPR descriptors were calculated using the HyperChem package. Then the HyperChem files were transferred into Dragon software19 to calculate different kinds of descriptors. Constitutional, topological, and geometrical descriptors calculated with Dragon software have been successfully used in various QSPR studies.20,21 Also, MOPAC software22 was used to calculate electronic and quantum chemical descriptors. Then, descriptors with constant or almost constant values for all molecules were eliminated. In addition, pairs of variables with a correlation coefficient greater than 0.90 were classified as intercorrelated and only one of them was considered in developing the models. A total of 146 descriptors were considered for further investigations.
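The two screening steps described above (dropping near-constant descriptors, then keeping one member of each intercorrelated pair) can be sketched in Python with NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `filter_descriptors`, the tolerance `var_tol` that defines "almost constant", the use of the absolute value of the correlation coefficient, and the rule of keeping the first descriptor of each correlated pair are all assumptions, since the text does not specify them.

```python
import numpy as np


def filter_descriptors(X, var_tol=1e-8, r_max=0.90):
    """Return the column indices kept after the two screening steps.

    X       : (n_molecules, n_descriptors) array of descriptor values.
    var_tol : assumed threshold below which a column counts as constant.
    r_max   : correlation cut-off (0.90 in the text).
    """
    X = np.asarray(X, dtype=float)

    # Step 1: discard constant or almost constant descriptors.
    varying = [j for j in range(X.shape[1]) if X[:, j].std() > var_tol]

    # Step 2: walk through the remaining descriptors in order and keep one
    # only if it is not highly correlated with any descriptor already kept
    # (assumption: the first member of each correlated pair is retained).
    selected = []
    for j in varying:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= r_max
               for k in selected):
            selected.append(j)
    return selected
```

Calling `filter_descriptors(X)` on the full descriptor matrix would return the indices of the retained columns, which could then be used to slice the matrix before PCA.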

Multivariate image analysis (MIA) descriptors

The structures of studied compounds (Table 1) were systematically drawn in ACD/ChemSketch23 and saved in PaintStar24 as bitmap files. The bitmap windows were cut to a 215 × 210 pixels size to minimize memory and the drawn molecules were systematically fixed in a given coordinate by a common point among them. For instance, position 1 (Fig. 1) was manually fixed at the 100 × 100 pixels coordinate. Each 2D image was read and converted into binaries (double array in MATLAB25). Each image of 215 × 210 pixels was unfolded to a 1 × 45 150 pixels row and then the 113 images were grouped to form a 113 × 45 150 pixels matrix (Fig. 1). Columns with zero variance were removed to minimize memory, reducing the size of matrices to 113 × 817 pixels.
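The unfolding step lends itself to a short sketch. The authors worked in MATLAB; the NumPy version below is an assumed equivalent for illustration only, and the function name `unfold_images` is hypothetical. It flattens each aligned 2D image into one row, stacks the rows into the X-matrix, and drops the pixel columns that take the same value in every image.

```python
import numpy as np


def unfold_images(images):
    """Unfold equally sized 2D images into a descriptor matrix.

    images : sequence of 2D arrays, all with the same shape
             (215 x 210 pixels in the paper).
    Returns the reduced matrix and a boolean mask marking which of the
    original pixel columns were kept.
    """
    # Each image becomes one row of length rows * cols (1 x 45150 here).
    X = np.stack([np.asarray(img, dtype=float).ravel() for img in images])

    # A column has zero variance exactly when its maximum equals its minimum.
    varying = X.max(axis=0) != X.min(axis=0)
    return X[:, varying], varying
```

The mask is returned alongside the matrix so that the same pixel columns could be selected for any new image that is aligned in the same way.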

Principal component regressions

In QSAR/QSPR studies, a regression model of the form y = Xb + e may be used to relate a set of predictor variables (X) to a predicted variable (y) by means of a regression vector (b). However, the collinearity that often exists between independent variables creates a severe problem in certain types of mathematical treatments such as matrix inversion.26 A better predictive model can be obtained by orthogonalization of the variables by means of principal component analysis (PCA), and the resulting method is called principal component regression (PCR).27–29 To reduce the dimensionality of the independent variable space, a limited number of principal components (PCs) are used. Selection of significant and informative PCs is the main problem in almost all PCA-based calibration methods.30–32 Different methods have been used to select the significant PCs for calibration purposes. In the simplest and most common one, the factors are ranked in order of decreasing eigenvalue and introduced into the calibration model one after the other. In another method, called correlation ranking, the factors are ranked by their correlation coefficient with the property to be correlated (a dependent variable).32 Better results are often achieved by this method. Therefore, in this study, the correlation ranking method was used to select significant PCs obtained from original molecular descriptors and MIA descriptors.

In the present work, PCA was first carried out on the two data matrices using Minitab.33 After acquiring PCs, PCR analysis including correlation ranking (CR–PCR) was employed. In the CR–PCR procedure, the PCs were entered into the PCR model consecutively based on their decreasing correlation with the 13C NMR chemical shifts. A coefficient of determination of R2 ≥ 0.5 was used as the criterion to select the optimum number of factors in the PCR models. The obtained models are summarized in Tables 2 and 3 for original molecular descriptors and MIA descriptors, respectively. In all CR–PCR equations, the factor with the highest correlation with the 13C NMR chemical shift is considered the most significant and, subsequently, the factors are introduced into the calibration model until R2 ≥ 0.5 is achieved. PCs with higher correlations carry more information about the variation in the 13C NMR chemical shifts.
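The correlation-ranking procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the Minitab workflow used in the study: the function name `cr_pcr`, the synthetic data, and the use of the absolute Pearson correlation for ranking are all assumptions. PCs are obtained from the mean-centred descriptor matrix by singular value decomposition, ranked by correlation with the response, and added one at a time until the fitted R2 reaches the threshold.

```python
import numpy as np

def cr_pcr(X, y, r2_target=0.5):
    """Correlation-ranking PCR: add PCs in order of decreasing |correlation|
    with y until the fitted R^2 reaches r2_target."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = (U * s)[:, s > 1e-10 * s[0]]      # PC scores, numerically null PCs dropped
    yc = y - y.mean()
    corr = np.array([abs(np.corrcoef(T[:, j], y)[0, 1]) for j in range(T.shape[1])])
    order = np.argsort(-corr)             # most correlated PC first
    selected, r2 = [], 0.0
    for j in order:
        selected.append(int(j))
        Ts = T[:, selected]
        coef, *_ = np.linalg.lstsq(Ts, yc, rcond=None)
        resid = yc - Ts @ coef
        r2 = float(1.0 - (resid @ resid) / (yc @ yc))
        if r2 >= r2_target:
            break
    return selected, r2

# Synthetic stand-in data (assumption): 113 samples, 300 descriptors.
rng = np.random.default_rng(1)
X = rng.normal(size=(113, 300))
y = X[:, :10].sum(axis=1) + rng.normal(scale=0.5, size=113)

selected, r2 = cr_pcr(X, y, r2_target=0.5)
print(len(selected), round(r2, 3))
```

Because mean-centring removes one degree of freedom, 113 samples yield at most 112 non-trivial PCs, which matches the number of PCs the source reports for each descriptor group.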

Table 6. Statistical parameters for RC–PC–ANN models based on original molecular descriptors and MIA descriptors.

                                    C-1            C-2            C-3            C-4
Method                 Set          R2     SE      R2     SE      R2     SE      R2     SE
RC–PC–ANN (original)   Training     0.761  3.997   0.853  1.978   0.842  0.376   0.821  1.785
                       Validation   0.931  2.784   0.894  1.196   0.655  0.647   0.924  0.626
                       Test         0.956  1.827   0.946  1.413   0.989  0.071   0.975  0.661
RC–PC–ANN (MIA)        Training     0.871  3.242   0.935  1.332   0.976  0.155   0.982  0.613
                       Validation   0.991  1.043   0.933  1.007   0.999  0.016   0.993  0.196
                       Test         0.973  1.566   0.927  1.625   0.999  0.019   0.999  0.127
ChemDraw               All          0.903  3.239   0.942  1.286   0.328  0.815   0.831  1.540

Table 7. 13C NMR chemical shifts (ppm) calculated using ChemDraw.

No.  Substituent X         C-1    C-2    C-3    C-4
1    –H                    128.5  128.5  128.5  128.5
2    –CH3                  138.4  129.0  128.6  125.7
3    –CH2CH3               144.3  127.7  128.6  125.9
4    –CH2CH2CH3            142.0  128.1  128.8  126.0
5    –CH(CH3)2             148.4  126.1  128.4  125.9
6    –CH2CH2CH2CH3         142.6  128.1  128.5  125.9
7    –C(CH3)3              151.3  125.0  128.3  125.6
8    –cyclopropyl          145.8  125.0  128.1  125.1
9    –cyclopentyl          148.5  126.8  128.3  125.8
10   –cyclohexyl           147.0  126.5  128.4  125.9
11   –1-adamantyl          147.7  126.5  128.4  125.9
12   –CH2F                 137.0  127.6  129.1  129.2
13   –CF3                  131.0  125.1  129.0  132.0
14   –CH2Cl                137.8  128.6  128.9  128.7
15   –CHCl2                140.4  125.9  128.8  129.9
16   –CCl3                 144.0  126.6  128.6  130.5
17   –CH2Br                138.0  129.0  129.0  128.9
18   –CH2I                 139.1  128.3  128.7  127.8
19   –CH2OH                141.2  127.1  128.9  127.6
20   –CH2OCH3              137.5  127.4  128.6  127.8
21   –CH2NH2               143.3  126.9  128.5  126.7
22   –CH2NHCH3             140.2  127.9  128.5  127.0
23   –CH2N(CH3)2           138.6  128.8  128.4  127.2
24   –CH2NO2               130.7  129.4  130.9  129.9
25   –CH2CN                131.3  128.8  127.9  128.0
26   –CH2SH                139.6  127.7  128.7  127.1
27   –CH2SCH3              138.8  128.7  128.6  127.1
28   –CH2S(O)CH3           130.7  131.6  129.1  128.5
29   –CH2SO2CH3            128.5  132.3  129.3  129.3
30   –CH2CHO               131.9  129.6  129.2  127.6
31   –CH2COCH3             134.0  129.1  128.8  127.1
32   –CH2COOH              134.7  129.7  129.1  127.5
33   –CH=CH2               137.9  128.5  128.6  127.9
34   –C(CH3)=CH2           141.1  126.4  128.6  127.9
35   –C≡CH                 122.7  132.3  128.3  128.4
36   –phenyl               140.8  127.9  129.2  127.6
37   –2-pyridyl            139.0  127.6  129.2  127.3
38   –4-pyridyl            142.3  127.4  129.2  129.2
39   –F                    162.9  115.5  130.3  124.3
40   –Cl                   134.3  128.8  130.1  126.8
41   –Br                   123.1  131.6  130.9  127.7
42   –I                     94.3  137.6  130.3  127.6
43   –OH                   158.5  115.9  130.1  121.3
44   –OCH3                 160.6  114.3  129.7  121.0
45   –OCH=CH2              157.6  117.2  129.4  122.9
46   –O-phenyl             157.0  118.9  128.4  121.8
47   –OCOCH3               151.3  121.6  129.1  125.5
48   –OSi(CH3)3            156.4  120.1  129.6  121.6
49   –OPO(O-phenyl)2       150.2  120.3  130.1  121.3
50   –OCN                  153.8  116.0  131.3  127.7
51   –NH2                  148.4  115.0  129.5  122.4
52   –NHCH3                150.0  113.5  129.5  120.8
53   –N(CH3)2              151.1  114.3  129.6  121.9
54   –NH-phenyl            142.4  120.6  129.6  121.9
55   –N(phenyl)2           145.9  125.7  129.6  126.8
56   –NH3+                 134.1  121.2  130.5  128.7
57   –NH2+CH(CH3)2         138.0  121.2  130.5  128.7
58   –N+(CH3)3             146.4  121.2  130.5  128.7
59   –N(O)(CH3)2           150.0  116.6  126.5  127.1
60   –NHCOCH3              138.5  121.6  128.9  128.0
61   –NHOH                 153.5  114.5  130.0  127.1
62   –NHNH2                153.0  113.2  129.2  122.8
63   –N(NO)CH3             142.5  118.9  129.5  131.0
64   –N=CH-phenyl          152.0  122.3  130.0  127.2
65   –N=NCH3               150.9  122.6  129.2  125.7
66   –NO                   168.3  121.1  129.5  135.7
67   –NO2                  147.9  123.9  129.6  134.8
68   –CN                   112.6  132.2  129.4  133.0
69   –NCO                  133.6  125.1  129.8  125.9
70   –NCS                  131.5  126.1  130.0  127.7
71   –N+≡N                 114.0  131.5  132.8  135.2
72   –SH                   130.7  129.4  129.0  125.5
73   –SCH3                 139.4  126.8  128.9  125.1
74   –SC(CH3)3             133.8  140.2  128.4  128.7
75   –S(CH3)2+             143.3  125.4  128.7  128.7
76   –SCH=CH2              135.1  129.4  129.0  125.5
77   –S-phenyl             135.7  131.2  129.3  127.2
78   –S-S-phenyl           136.0  127.2  129.0  125.5
79   –S(O)CH3              145.7  124.1  129.8  131.1
80   –SO2CH3               141.0  128.3  129.7  133.7
81   –SO2OH                145.5  128.1  130.0  132.5
82   –SO2OCH3              147.4  129.7  130.2  134.6
83   –SO2F                 131.9  127.9  130.2  136.2
84   –SO2Cl                144.1  126.8  129.9  135.5
85   –SO2NH2               143.7  127.3  129.0  131.9
86   –SCN                  124.8  131.2  130.9  130.9
87   –CHO                  136.9  129.9  129.2  134.5
88   –COCH3                136.7  128.8  128.6  133.1
89   –COCF3                123.4  130.5  129.4  135.4
90   –COC≡CH               135.1  128.9  129.2  134.5
91   –CO-phenyl            138.4  130.3  128.4  132.4
92   –COOH                 130.2  130.3  128.6  133.9
93   –COOCH3               130.1  129.9  128.6  133.0
94   –CONH2                134.2  127.5  128.8  132.1
95   –CON(CH3)2            135.2  127.2  128.5  129.7
96   –COF                  132.8  129.9  129.2  134.5
97   –COCl                 133.2  131.4  129.0  135.3
98   –COSH                 134.7  128.1  128.9  134.1
99   –CH=NCH3              136.4  129.2  128.8  131.0
100  –CSH2-phenyl          137.1  127.7  128.7  127.1
101  –CSH2-(1-piperidyl)   136.9  127.7  128.7  127.1
102  –SiH3                 128.0  135.8  129.5  130.0
103  –SiH2CH3              133.0  134.8  135.3  130.0
104  –Si(CH3)3             140.1  133.4  129.2  129.1
105  –Si(phenyl)3          134.3  136.4  129.5  130.0
106  –SiCl3                131.9  134.0  129.5  130.0
107  –P(CH3)2              142.1  131.4  128.1  127.7
108  –P(phenyl)2           137.1  133.7  128.7  128.8
109  –PO(CH3)2             135.0  129.6  128.8  134.2
110  –PO(OH)2              139.1  126.9  128.8  134.2
111  –PO(OCH2CH3)2         134.2  129.4  128.5  134.6
112  –PS(CH3)2             128.5  131.2  128.7  131.2
113  –PS(OCH2CH3)2         128.5  131.2  128.7  131.2

Artificial neural network modeling

Because of the complexity of the relationships between the activity and properties of the molecules and the structures, nonlinear modeling methods are often used to model the structure–activity and structure–property relationships. Artificial neural networks (ANNs) as nonparametric nonlinear modeling techniques have attracted increasing interest in recent years.34,35 Multilayer feed-forward neural networks trained with back-propagation learning algorithms have become increasingly popular.34–37 The flexibility of ANN for discovering a more complex relationship has resulted in this method finding wide application in QSAR/QSPR studies (reviewed by Duch et al.38).

The principal component–artificial neural network (PC–ANN) was proposed by Gemperline et al. to improve the training speed and decrease the overall calibration error.39 PC–ANN combines the PCA with ANN and models the nonlinear relationships between the PCs and the dependent variable.

A feed-forward neural network with a back-propagation of error algorithm was constructed to model the structure–property relationship. Our network had an input layer, a hidden layer, and an output layer. The input vectors were the set of PCs selected by correlation ranking (Tables 2 and 3). The number of nodes in the input layer depended on the number of PCs introduced to the network. The number of nodes in the hidden layer was optimized through the learning procedure. There was only one node in the output layer. The data set was separated into three subsets: a training set including 71 compounds, a validation set including 16 compounds, and a test set including 16 compounds. The training, validation, and test data sets were used to optimize the network performance. The training set was used for model generation, the test set was used to control the overtraining, and the validation set was used to evaluate the generated model. The ANN program was coded in MATLAB 7.1 for Windows.25 The network was then trained using the training set by the back-propagation strategy for optimization of the weight values. The proper number of nodes in the hidden layer was determined by training the network with different numbers of nodes in the hidden layer. The standard error (SE) measures how good the outputs are in comparison with the target values. It should be noted that for evaluating the overfitting, the training of the network for the prediction of 13C NMR chemical shifts must stop when the SE of the test set begins to increase while the SE of the training set continues to decrease. Therefore, training of the network was stopped when overtraining began.

The results obtained using the RC–PC–ANN method for PCs extracted from original molecular descriptors and MIA descriptors are shown in Tables 4 and 5.
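The stopping rule described in this section (halt training when the SE of a monitoring set starts to rise while the training SE keeps falling) can be sketched as follows. This is a minimal NumPy illustration, not the authors' MATLAB program: the single tanh hidden layer, the learning rate, the `patience` parameter, the use of the root-mean-square error as SE, and the synthetic data are all assumptions. The set called `X_stop` here plays the role that the source assigns to the set used to control overtraining.

```python
import numpy as np

def predict(X, params):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

def std_error(y, y_hat):
    # Assumed SE definition: root-mean-square error.
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def train_early_stopping(X_tr, y_tr, X_stop, y_stop, n_hidden=4,
                         lr=0.05, max_epochs=2000, patience=20, seed=0):
    """Back-propagation for a one-hidden-layer network with early stopping."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X_tr.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    best, best_se, wait = (W1.copy(), b1.copy(), W2.copy(), b2), np.inf, 0
    n = len(y_tr)
    for _ in range(max_epochs):
        H = np.tanh(X_tr @ W1 + b1)               # forward pass
        err = H @ W2 + b2 - y_tr
        dH = np.outer(err, W2) * (1.0 - H ** 2)   # error back-propagated to hidden layer
        gW2, gb2 = H.T @ err / n, err.mean()
        gW1, gb1 = X_tr.T @ dH / n, dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
        se = std_error(y_stop, predict(X_stop, (W1, b1, W2, b2)))
        if se < best_se:                          # monitoring SE still improving
            best, best_se, wait = (W1.copy(), b1.copy(), W2.copy(), b2), se, 0
        else:
            wait += 1
            if wait >= patience:                  # overtraining has begun
                break
    return best, best_se

# Synthetic stand-in data (assumption): three inputs standing in for selected PCs.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)
X_tr, y_tr = X[:70], y[:70]
X_stop, y_stop = X[70:85], y[70:85]
X_eval, y_eval = X[85:], y[85:]

params, best_se = train_early_stopping(X_tr, y_tr, X_stop, y_stop)
print(round(best_se, 3), round(std_error(y_eval, predict(X_eval, params)), 3))
```

The weights returned are those from the epoch with the lowest monitoring SE, so the third subset (`X_eval`) stays untouched during training and gives an independent estimate of predictive error.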

Results and discussion

Table 1 lists the structures of monosubstituted benzenes used in this study and their corresponding experimental 13C NMR chemical shift values for four different positions: ipso, ortho, meta, and para. After elimination of constant descriptors and one of each pair of collinear descriptors, 146 original molecular descriptors and 817 MIA descriptors remained. After the application of PCA to the two descriptor data matrices, 112 PCs were extracted for each group of descriptors. Then, separate PCR models were obtained for each position (ipso, ortho, meta, and para). The models obtained by the correlation ranking procedure are shown in Tables 2 and 3. The QSPR models for all positions showed high statistical quality. Comparison of these models indicates that the number of PCs is lower for the original molecular descriptors than for the MIA descriptors. This shows that the PCs obtained from the original molecular descriptors provide more information about the variations of 13C NMR chemical shifts.

Fig. 2. Plots of experimental 13C chemical shifts of monosubstituted benzenes against values calculated using RC–PC–ANN models based on original molecular descriptors for C1–C4 (a–d, respectively).

To increase the ability of the obtained models to predict 13C NMR chemical shifts, a nonlinear modeling method was employed. Typically, superior models can be found using ANNs because they implement nonlinear relationships and because they have more adjustable parameters than linear models. Therefore, in this study we used ANN as the nonlinear model. Correlation ranking was used to select the most relevant set of PCs as inputs of the ANN. The results of the RC–PC–ANN method are presented in Tables 4 and 5 for the training, validation, and test sets.

For comparison, R2 and SE values of RC–PC–ANN models for original molecular descriptors and MIA descriptors are summarized in Table 6. As can be seen from this table, the QSPR models obtained from the two main groups of descriptors indicate good performance for different carbon positions and could predict the 13C NMR chemical shifts of the related molecules with low error. Also, comparison of the results indicates that the RC–PC–ANN models based on PCs extracted from the MIA descriptors are more accurate and show better predictive ability than the RC–PC–ANN models based on PCs extracted from the original molecular descriptors.

Finally, 13C NMR chemical shifts of the studied compounds were calculated using ChemDraw;40 these values are shown in Table 7. Also, statistical parameters of these values are indicated in Table 6.

The results obtained in the present work indicate that though MIA descriptors do not have a direct physicochemical meaning, they may provide useful information and can predict the 13C NMR chemical shifts of the studied compounds well.

Plots of experimental 13C NMR chemical shifts versus those calculated using RC–PC–ANN modeling methods for ipso, ortho, meta, and para positions are shown in Figs. 2a–2d for the original molecular descriptors and Figs. 3a–3d for the MIA descriptors.
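The R2 and SE statistics used to compare models in Table 6 can be computed from paired experimental and calculated shifts. The source does not state its formulas, so the definitions below (squared Pearson correlation for R2 and root-mean-square residual for SE) are assumptions, and the numbers are made-up placeholders rather than values from Tables 1 or 7.

```python
import numpy as np

def r_squared(y_exp, y_calc):
    """Squared Pearson correlation between experimental and calculated values."""
    r = np.corrcoef(y_exp, y_calc)[0, 1]
    return float(r ** 2)

def standard_error(y_exp, y_calc):
    """Root-mean-square residual (assumed definition of SE)."""
    resid = np.asarray(y_exp, dtype=float) - np.asarray(y_calc, dtype=float)
    return float(np.sqrt(np.mean(resid ** 2)))

# Placeholder shifts in ppm (assumption, for illustration only).
y_exp = np.array([128.5, 137.8, 144.2, 159.9, 148.3])
y_calc = np.array([128.9, 137.1, 145.0, 158.7, 149.0])

print(round(r_squared(y_exp, y_calc), 3), round(standard_error(y_exp, y_calc), 3))
```

Reporting both quantities is useful because R2 depends on the spread of the shifts at a given position, which may be why a small SE can coexist with a low R2 for a narrow-range position such as C-3.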

Conclusion

The QSPR modeling of 13C NMR chemical shifts of monosubstituted benzenes was studied using the RC–PC–ANN method. A comparison was made between the original molecular descriptors and MIA descriptors. The results showed that both groups of descriptors are capable of predicting 13C NMR chemical shifts of benzene derivatives. Also, it was found that RC–PC–ANN models based on PCs extracted from MIA descriptors are more accurate and show better predictive ability than RC–PC–ANN models based on PCs extracted from original molecular descriptors.

Acknowledgements

Support of this work by Vali-e-Asr University is acknowledged.

Fig. 3. Plots of experimental 13C chemical shifts of monosubstituted benzenes against values calculated using RC–PC–ANN models based on MIA descriptors for C1–C4 (a–d, respectively).

References

(1) Garkani-Nejad, Z.; Ahmadi-Roudi, B. Eur. J. Med. Chem. 2010 45 (2), 719. doi:10.1016/j.ejmech.2009.11.019.
(2) Jalali-Heravi, M.; Garkani-Nejad, Z.; Kyani, A. QSAR Comb. Sci. 2008 27 (2), 137. doi:10.1002/qsar.200510205.
(3) Gedeck, P.; Lewis, R. A. Curr. Opin. Drug Discov. Devel. 2008 11, 569.
(4) Freitas, M. P. Curr. Comput. Aided Drug Des. 2007 3 (4), 235. doi:10.2174/157340907782799408.
(5) Schweitzer, R. C.; Small, G. W. J. Chem. Inf. Comput. Sci. 1997 37, 249.
(6) Cramer, R. D.; Patterson, D. E.; Bunce, J. D. J. Am. Chem. Soc. 1988 110 (18), 5959. doi:10.1021/ja00226a005.
(7) Klebe, G.; Abraham, U.; Mietzner, T. J. Med. Chem. 1994 37 (24), 4130. doi:10.1021/jm00050a010.
(8) Jalali-Heravi, M.; Garkani-Nejad, Z. J. Chromatogr. A 2002 950 (1–2), 183. doi:10.1016/S0021-9673(02)00054-7.
(9) Jalali-Heravi, M.; Garkani-Nejad, Z. J. Chromatogr. A 2002 971 (1–2), 207. doi:10.1016/S0021-9673(02)01043-9.
(10) Vedani, A.; Dobler, M. J. Med. Chem. 2002 45 (11), 2139. doi:10.1021/jm011005p.
(11) Vedani, A.; Dobler, M.; Lill, M. A. J. Med. Chem. 2005 48 (11), 3700. doi:10.1021/jm050185q.
(12) Freitas, M. P.; Brown, S. D.; Martins, J. A. J. Mol. Struct. 2005 738 (1–3), 149. doi:10.1016/j.molstruc.2004.11.065.
(13) Freitas, M. P. Org. Biomol. Chem. 2006 4 (6), 1154. doi:10.1039/b516396j.
(14) Geladi, P.; Esbensen, K. J. Chemom. 1989 3 (2), 419. doi:10.1002/cem.1180030209.
(15) Goodarzi, M.; Freitas, M. P.; Ramalho, T. C. Spectrochim. Acta A Mol. Biomol. Spectrosc. 2009 74 (2), 563. doi:10.1016/j.saa.2009.07.003.
(16) Garkani-Nejad, Z.; Poshteh-Shirani, M. Talanta 2010 83 (1), 225. doi:10.1016/j.talanta.2010.09.012.
(17) Pretsch, E.; Bühlmann, P.; Affolter, C. Structure Determination of Organic Compounds: Tables of Spectral Data; Springer: Berlin, 2000; pp 97–99.
(18) HyperChem, version 7.0; Hypercube, Inc.: Gainesville, Fla., 2002.
(19) Todeschini, R.; Consonni, V.; Mauri, A.; Pavan, M. DRAGON (software for the calculation of molecular descriptors), version 3; Dipartimento di Scienze dell’Ambiente e del Territorio, Università degli Studi di Milano–Bicocca, and Talete srl: Milan, 2003. Available from http://www.disat.unimib.it/chm/Dragon.htm.
(20) Lü, W. J.; Chen, Y. L.; Ma, W. P.; Zhang, X. Y.; Luan, F.; Liu, M. C.; Chen, X. G.; Hu, Z. D. Eur. J. Med. Chem. 2008 43 (3), 569. doi:10.1016/j.ejmech.2007.04.011.
(21) Turabekova, A.; Rasulev, B. F. A.; Levkovich, M. G.; Abdullaev, N. D.; Leszczynski, J. Comput. Biol. Chem. 2008 32 (2), 88. doi:10.1016/j.compbiolchem.2007.10.003.
(22) Stewart, J. J. P. MOPAC: A Semiempirical Molecular Orbital Program, version 6; US Air Force Academy: Colorado Springs, Colo., 1990.
(23) ACD/ChemSketch, version 11.02; Advanced Chemistry Development Inc.: Toronto, Ont., 2008.
(24) PaintStar, version 2.70; 2003. Available from http://paintstar.en.softonic.com.
(25) MATLAB, version 7.1; MathWorks, Inc.: Natick, Mass., 2005. Available from www.mathworks.com.
(26) Montgomery, D. C.; Peck, E. A. Introduction to Linear Regression Analysis; Wiley: New York, 1982.
(27) Jolliffe, I. T. Principal Component Analysis; Springer: New York, 1986.
(28) Kalivas, J. H.; Lang, P. M. Mathematical Analysis of Spectral Orthogonality; Marcel Dekker: New York, 1994.
(29) Puchwein, G. Anal. Chem. 1988 60 (6), 569. doi:10.1021/ac00157a015.
(30) Xie, Y. L.; Kalivas, J. H. Anal. Chim. Acta 1997 348 (1–3), 19. doi:10.1016/S0003-2670(97)00035-4.
(31) Sutter, J. M.; Kalivas, J. H.; Lang, P. M. J. Chemom. 1992 6 (4), 217. doi:10.1002/cem.1180060406.
(32) Sun, J. J. Chemom. 1995 9 (1), 21. doi:10.1002/cem.1180090104.
(33) Minitab, version 15.1; Minitab Inc.: State College, Pa., 2008. Available from www.minitab.com.
(34) Chen, H. F. Anal. Chim. Acta 2008 609 (1), 24. doi:10.1016/j.aca.2008.01.003.
(35) Garkani-Nejad, Z. Chromatographia 2009 70 (5–6), 869. doi:10.1365/s10337-009-1241-6.
(36) Garkani-Nejad, Z. J. Chromatogr. Sci. 2010 48, 317.
(37) Manallack, D. T.; Tehan, B. G.; Gancia, E.; Hudson, B. D.; Ford, M. G.; Livingstone, D. J.; Whitley, D. C.; Pitt, W. R. J. Chem. Inf. Comput. Sci. 2003 43, 674.
(38) Duch, W.; Swaminathan, K.; Meller, J. Curr. Pharm. Des. 2007 13 (14), 1497. doi:10.2174/138161207780765954.
(39) Gemperline, P. J.; Long, J. R.; Gregoriou, V. G. Anal. Chem. 1991 63 (20), 2313. doi:10.1021/ac00020a022.
(40) ChemDraw, version 11; CambridgeSoft: Cambridge, Mass., 2008. Available from www.cambridgesoft.com.
