QSAR AND CHEMOMETRIC APPROACHES TO THE SCREENING OF POPs FOR ENVIRONMENTAL PERSISTENCE AND LONG RANGE TRANSPORT
Paola Gramatica (a), Ester Papa (a) and Stefano Pozzi (b)
a) Department of Structural and Functional Biology, University of Insubria - Varese (Italy)
b) Laboratory of Environmental Studies (SPAA) - Lugano (Switzerland)
e-mail: [email protected] http://fisio.dipbsf.uninsubria.it/qsar/
QSAR Research Unit
D 13
[Figure 2: Principal Component Analysis of the physico-chemical properties. Plot of PC1 (x-axis) vs PC2 (y-axis); points labelled by compound number (1-81); loadings: mp, logKow, logKoa, logVP, logH, logSol, MW]
Global Mobility Index
The inherent tendency of compounds towards global mobility is regulated mainly by volatility, water solubility, Kow and Koa. A Global Mobility Index is obtained by PCA as a linear combination of the physico-chemical properties: the PC1 score (EV% = 74.6%) in Fig. 2. The chemicals on the right side of the plot are those with the greatest tendency to mobility.
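As a rough illustration of how a PC1 score can serve as an index, the sketch below autoscales a small property table and extracts the first principal component by power iteration. It is only a minimal sketch, not the software used for the poster: the function names and the toy values are hypothetical, and the sign of a principal component is arbitrary, so the scores may need to be flipped so that larger values correspond to higher mobility.

```python
import math

def autoscale(rows):
    """Centre each column and scale it to unit variance (assumes no constant column)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(p)]
    return [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]

def first_pc(rows, iters=500):
    """Return (loadings, scores, explained variance fraction) of PC1."""
    z = autoscale(rows)
    n, p = len(z), len(z[0])
    # Correlation matrix of the autoscaled data.
    corr = [[sum(z[k][i] * z[k][j] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]
    # Power iteration from a non-uniform start vector (an arbitrary choice).
    v = [float(j + 1) for j in range(p)]
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p))
                     for i in range(p))
    scores = [sum(zr[j] * v[j] for j in range(p)) for zr in z]
    # For autoscaled data the total variance equals the number of columns.
    return v, scores, eigenvalue / p

# Hypothetical toy table: one row per chemical, one column per property.
toy = [[1.0, 2.0, 1.5], [2.0, 4.1, 2.9], [3.0, 6.2, 4.4],
       [4.0, 7.9, 6.1], [5.0, 10.1, 7.4]]
loadings, scores, explained = first_pc(toy)
print(loadings, scores, explained)
```

Ranking the chemicals by `scores` then gives the ordering along PC1 that the index relies on.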
Introduction
The need for a scientific foundation for the criteria used to evaluate the persistence and long-range transport (LRT) potential of POPs (Persistent Organic Pollutants) in the environment has recently been highlighted [1]. Persistence is a necessary condition for long-range transport; however, persistent chemicals are not necessarily subject to long-range transport: the inherent tendency of compounds towards global mobility must also be taken into account. The half-life of organic pollutants in various compartments is among the most commonly used criteria for studying persistence, but these studies are severely hindered by the limited availability of experimental degradation half-life data; there is thus an incentive to develop reliable procedures, such as QSAR/QSPR, to estimate the missing data. The same is true for the physico-chemical properties particularly relevant for determining mobility potential [2]. As the LRT potential of POPs is due to the simultaneous influence of their persistence in the environment and their inherent tendency to mobility, finding the best combination of chemical properties that minimizes LRT is a multicriteria problem and can be approached through MultiCriteria Decision-Making (MCDM) techniques [3]: procedures for combining the magnitude of several properties into a single quantitative measure of overall quality.
For modelling and predicting half-life we used a data set of 141 organic compounds for which experimental half-life values in different compartments are available from Howard [4], Mackay [5] and Rodan [6]. The molecular structure was represented by a wide set of molecular descriptors [7] calculated with software developed by R. Todeschini [7,8]: constitutional descriptors (56), topological descriptors (69), walk counts (20), BCUT descriptors (64), Galvez indices (21), 2D autocorrelations (96), charge descriptors (7), aromaticity descriptors (4), molecular profiles (40), geometrical descriptors (18), 3D-MoRSE descriptors (160), WHIM descriptors [9] (99), GETAWAY descriptors (196) and empirical descriptors (3).
The best subset of variables for modelling half-life was selected by a Genetic Algorithm (GA-VSS) approach, in which the response is obtained by ordinary least squares (OLS) regression. All calculations were performed using the leave-one-out (LOO) and leave-more-out (LMO) procedures and the scrambling of the responses for the validation of the models (MOBY-DIGS package) [10].
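The leave-one-out validation of an OLS model mentioned above can be sketched as follows. This is a minimal illustration and not the MOBY-DIGS implementation: the function names are hypothetical, the descriptor subset is assumed to have been chosen already, and Q2 is taken here in its common form, 1 - PRESS/TSS, with TSS computed about the mean of all responses.

```python
def ols_fit(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented system [A^T A | A^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * yi for a, yi in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

def predict(coef, row):
    """Predicted response for one compound's descriptor values."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], row))

def q2_loo(X, y):
    """Leave-one-out Q2: refit without each compound, then predict it."""
    press = 0.0
    for i in range(len(y)):
        coef = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    y_mean = sum(y) / len(y)
    tss = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - press / tss

# Hypothetical toy data: one descriptor per compound and a noisy response.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
print(q2_loo(X, y))
```

A leave-more-out variant would follow the same pattern, holding out several compounds at a time instead of one.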
[Figure 1: Principal Component Analysis of half-life data, Cum. E.V.: 74.7% (PC1: 48.6%). Plot of PC1 (x-axis) vs PC2 (y-axis); points labelled by compound number (up to 141); loadings: Soil, Groundwater, Air, Surface water]
General Persistence Index
The Principal Component Analysis (PCA) of the experimental and predicted half-lives of 141 pollutants in various media allows the chemicals to be ranked according to their overall half-life and relative persistence in different media. A General Persistence Index is obtained from the linear combination of half-life data in four environmental media (PC1 in Fig. 1). The chemicals on the right are the most globally persistent in the various compartments.
Figure 1
PERSISTENCE MOBILITY
Screening of Long Range Transport Potential
Finding the best combination of chemical properties that minimizes LRT can be approached by MultiCriteria Decision-Making (MCDM) techniques: procedures for combining the magnitude of several properties into a single quantitative measure of overall quality. The utility function is chosen here as the best combined criteria function and is applied to the properties most relevant in determining LRT, according to the following criteria, f(x), all expressed as the minimum: the general Persistence Index (Fig. 1), derived from the PCA combination of half-life in four environmental compartments; the Mobility Index (Fig. 2), derived from the cited physico-chemical properties; and the Air Half-life, which is considered particularly relevant in determining LRT. The k = 3 properties, equally weighted and added in the utility function according to the reported formula, allow the studied chemicals to be ranked according to their LRT potential, giving an LRT index (F(x)).
The chemicals with the lowest utility (F(x) near 0), highlighted in Fig. 3, will exhibit the highest LRT potential, while those with F(x) near 1 will have the lowest potential for LRT.
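The weighted-sum utility described above can be sketched in a few lines. This is a minimal sketch under stated assumptions rather than the procedure actually used: the poster does not give the transformation applied to each criterion, so a linear min-max scaling is assumed here (the smallest value of a criterion maps to 1, the largest to 0), the equal weights are taken as 1/3 so that F(x) stays between 0 and 1, and all function names and toy values are hypothetical.

```python
def desirability(values):
    """Scale one criterion to [0, 1]; the smallest value maps to 1.

    Assumed linear transformation for a criterion that should be minimized.
    """
    lo, hi = min(values), max(values)
    return [(hi - v) / (hi - lo) for v in values]

def lrt_index(persistence, mobility, air_half_life,
              weights=(1 / 3, 1 / 3, 1 / 3)):
    """Utility F(x): weighted sum of the k = 3 scaled criteria per chemical."""
    criteria = [desirability(c) for c in (persistence, mobility, air_half_life)]
    return [sum(w * f[i] for w, f in zip(weights, criteria))
            for i in range(len(persistence))]

# Hypothetical toy values for three chemicals.
persistence_index = [-2.0, 0.0, 2.0]   # PC1 scores of the half-life PCA
mobility_index = [-1.0, 0.0, 1.0]      # PC1 scores of the property PCA
air_half_life = [1.0, 5.5, 10.0]       # arbitrary units
print(lrt_index(persistence_index, mobility_index, air_half_life))
```

In this toy case the first chemical is lowest on all three criteria and so gets a utility near 1 (lowest LRT potential), while the third gets a utility near 0 (highest LRT potential).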
[Figure 3: Utility (y-axis, 0.1 to 1.0) plotted against chemical ID (x-axis, 0 to 80); points labelled 1-81; classes shown: High LRT potential (Class 1), Medium LRT potential (Class 2), Low LRT potential (Class 3)]
F(x) = \sum_{i=1}^{k} w_i f_i(x)
Figure 3
Figure 2
The QSPR (Quantitative Structure-Property Relationships) approach is applied here in two steps: first, to fill the gaps in the experimental data for the studied properties, and then to model the scores of the MCDM function, the LRT index (Fig. 3). Different kinds of theoretical molecular descriptors have been used to obtain OLS regression models (Fig. 4) and CART classification models (Fig. 5) with good predictive power (Q²LOO = 86.8%, Q²LMO = 86.2% and cross-validated misclassification risk = 6.2%, respectively).
[Figure 4: LRT potential for 81 chemicals. x-axis: LRT index from MCDM (utility function); y-axis: LRT potential predicted; both axes from 0.0 to 1.0; points labelled 1-81]
Conclusions
The ranking of the studied chemicals according to their LRT potential, obtained by the MCDM utility function, can be proposed as an alternative to approaches based on characteristic travel distance (CTD) [11]. An additional advantage of this approach is that applying the QSPR models (both regression and classification) to the scores of the MCDM utility function (defined as the LRT index) allows a fast pre-screening of existing and new chemicals for their inherent tendency to LRT, based simply on the knowledge of their molecular structure.
References
1- Klecka, G.M., Ed. (1999). SETAC Pellston Workshop. Environ. Toxicol. Chem. (Suppl.), 18, 8.
2- Gramatica, P., Pozzi, S., Consonni, V. and Di Guardo, A. (2001). SAR and QSAR in Environ. Res., in press.
3- Hendriks, M.M.W.B., De Boer, J.H., Smilde, A.K. and Doornbos, D.A. (1992). Chemom. Intell. Lab. Syst. 16, 175.
4- Howard, P.H. et al. (1991). Handbook of Environmental Degradation Rates. http://esc.syrres.com/interkow/PhysProp.htm
5- Mackay, Shiu, Ma (2000). Illustrated Handbook of Physical-Chemical Properties and Environmental Fate for Organic Chemicals.
6- Rodan, B.D. et al. (1999). Environ. Sci. Technol. 33(2 3482-3488.
7- Todeschini, R. and Consonni, V. (2000). Handbook of Molecular Descriptors. Wiley.
8- Todeschini, R. (2000). DRAGON ver. 1.0, Milano. Free download from http://www.disat.unimib.it/chm
9- Todeschini, R. and Gramatica, P. (1997). Quant. Struct.-Act. Relat. 16, 113-119.
10- Todeschini, R. (1999). MOBY DIGS - Software for multilinear regression analysis and variable subset selection by Genetic Algorithm, rel. 2.1, Milan (Italy).
11- Beyer, A., Mackay, D., Matthies, M., Wania, F. and Webster, E. (2000). Environ. Sci. Technol. 34, 699-703.
[Figure 5: Classification Tree. Split labels: nC 14.50, nC 7.00, E1u 0.40; terminal nodes give the assigned class (1, 2 or 3)]