Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
Application of Chemometric and QSAR Approaches to Scoring Ligand
Receptor Binding Affinity
Alexander TropshaLaboratory for Molecular Modeling
University of North Carolina at Chapel Hill
Collaborators: C. Breneman, W. Deng, N. Sukumar (RPI);A. Golbraikh, J. Feng, Y.D.Xiao (UNC)
OUTLINE
• Representation and scoring of ligand-receptor interactions in the context of structure-based design
• Chemometric and QSAR Approaches to Scoring– Complementarity between ligands and active sites in
multidimensional descriptor space– QSAR modeling of diverse ligands of different
receptors– Database mining
Scoring function is the crucial component of structure-based drug
designDatabase Compounds
Complexes
Docking methods
Scoring function
Candidate Compounds
Molecular Representations and Scoring Functions for Ligand-Receptor Interaction
Chemical similarity, QSAR (QSBR)
Multiple Chemical Descriptors
Statistical, or knowledge-based (PMF, SMoG, BLEEP)
Chemical atom types
Empirical estimates of intermolecular interaction terms (VALIDATE, SCORE)
Molecular mechanics
Force-Field basedMolecular mechanics
Scoring functionRepresentation
Representation of Ligand-Receptor Interface in Chemometric Space
• Both ligands and active sites are represented independently by multidimensional chemical descriptors
• Chemical similarity measures are used to establish complementarity between ligands and their receptors
• QSAR approaches are used to establish correlations between descriptors of ligands and their binding affinity to diverse receptors
PMF test set: 68 compounds (Muegge & Martin,
J.Med.Chem.1999,42,791-804)
Serine Proteases: 16Metalloproteases: 15L-arabinose binding protein: 9Endothiapepsin: 11Other proteins: 17
Aspartic Proteases: 18 complexesSerine Proteases: 20Metalloproteases: 22Human Carbonic Anhydrase II (HCA, 19 Complexes) Sugar-Binding Proteins (14 Complexes)Endothiapepsin (11 Complexes)Purine Nucleoside Phosphatase(PNP, 5 Complexes) Other Proteins (10 Complexes)
SMoG2001 test set: 111 compounds (Ishchenko & Shakhnovich
J. Med. Chem. 2002, 45, 2770-2780)
Datasets
SCORING…BUT NO DOCKING: QSAR ANALYSIS OF LIGANDS TO DIFFERENT RECEPTORS:
MOLECULAR DIVERSITY
Transferable Atom Equivalents (TAE Descriptors)*
uA library of atomic charge density fragments has been built in a form that allows for the rapid retrieval of the fragments and molecular assembly.
uAtoms in the library correspond to structurally distinct atom types
uAssociated with each atomic charge density fragment in the TAE library are a set of descriptors.
uThe atomic descriptors in the TAE library are calculated ab initio at a high level of theory (HF 6-31+G*)
*e.g., Breneman et al, "Electron Density Modeling of Large Systems Using the Transferable Atom Equivalent Method" Computers & Chemistry, 1995, 19(3), 161.
Mapping of PMF dataset from high-dimensional to 2D TAE Descriptor Space
using SVD/TNPACK (Xie, Tropsha, Schlick. JCICS, 1999, 2000; 167-177)
Projections Based on Ligand-TAE
-4
-3
-2
-1
0
1
2
3
4
0 2 4 6 8
TNPack1
TN
Pack2
Ki < -45.0 -45.0 < Ki < -35.0-35.0 < Ki < -25.0 Ki > -25.0
Projections Based on Protein-TAE
-2.5-2
-1.5-1
-0.50
0.5
11.5
2
2.53
0 2 4 6 8 10TNPack1
TN
Pa
ck
2
Ki < -45.0 -45.0 < Ki < -35.0
-35.0 < Ki < -25.0 Ki > -25.0
Algorithm to establish complementarity between ligands and their receptors in multidimensional TAE descriptor
space using variable selection
Calculate pairwise distances in selected subspace based on protein TAE
Randomly Select a Subset of MMvarvar Descriptors
Calculate pairwise distances in selected subspace based on ligand TAE
Calculate r2 in selected descriptor subspace
SA
Build correlations between corresponding pairwise distances
RR vs. LL Distances in the Entire TAE Space (2016 points)
Ligand-Protein Distance Pairs
y = 1.0771xR2 = 0.5028
0
1
2
3
4
5
6
7
8
0 2 4 6 8
LL-Distances
RR-D
ista
nces
LL vs RR Distances in Selected Descriptors Space of 20 TAE
descriptors (2016 points)Ligand-Protein Distance Pairs After Variable
Selection (Nselected=10)
y = 1.2244xR2 = 0.7059
0
0.5
1
1.5
2
2.5
0 0.5 1 1.5 2 2.5
LL
RR
Most frequently occurring TAE descriptors in 35 best models
05
101520253035
EP2PIP
13
SIEPA
2
SIEPA
3
Del(G)N
A5
Del(K)N
A4 Lapl5SIK
A3PIP
11SIK
A2 EP3Fu
k5La
pl6
Del(K)M
ax EP4
PIP6
SIEPA
4PIP
3
Del(G)N
A3
Del(Rho
)NA3
Fre
qu
ency
Chemical meaning of selected descriptors
• Electrostatic interactions, induced-dipole interactions and hydrogen bonding
• Hydrophobic surface area
QSAR ANALYSIS OF LIGANDS OF DIFFERENT RECEPTORS
OBJECTIVE:
BUILD QSAR MODELS USING INFORMATION ABOUT LIGAND STRUCTURES ONLY
QSAR ANALYSIS OF LIGANDS OF DIFFERENT RECEPTORS: COMPUTATIONAL PROCEDURE
59 COMPOUNDS MOLCONNZ DESCRIPTORS
NORMALIZATION:
Xij and are the non-normalized and normalized j-th descriptor values for compound i, Xj,min and Xj,max are the minimum and maximum values for j-thdescriptor
kNN QSAR:FOR EACH TRAINING SETnvar 10 to 50 with step 4 (11 values)10 models for each nvar value
min,max,
min,
jj
jijnij XX
XXX
−
−=
nijX
SELECTION OF TRAINING AND TEST SETS
MODEL VALIDATION:Prediction of binding energies for the test set compounds. Comparison of predicted and observed binding energies.
2D Descriptors2D DescriptorsSet of descriptors includes topological features of compounds
Hydrogen - depleted molecular graph and vertex degrees ai
1
1
2
13
Molecular connectivity indices
∑=
−=N
iia
1
5.00 )(χ ∑ −=edges All
5.01 )(21 ii aaχ ∑ −− =
subgraphs edge- 1)-(n All
5.01 )...(21 viii
n aaaχ
EXAMPLE: Molecular connectivity indices
Molecular shape indices
Wiener number
Information indices
Platt number
Graph radius and diameter
etc.
The number of descriptors exceeds the number of compounds
Hall, L. H.; Kier, L. B. In Reviews in Computational Chemistry II, VCH, 1991, 367-422.
VARIABLE SELECTION kNN QSAR
Randomly select a subset of descriptors
Select the best QSAR model for nvar and K
SIM
UL
AT
ED
AN
NE
AL
ING
LEAVE-ONE-OUT CROSS-VALIDATION
Exclude a compound
Predict activity y of the excluded compound as the weighted average of activities of 1 to K nearest neighbors
Calculate the predictive ability (q2) of the “model”
Modify descriptor subset
∑∑
−
−=
neighborsnearest
neihborsnearest
)exp(
)exp(ˆ
i
ii
d
dyy
∑∑
−
−−= 2
22
)(
)ˆ(1
i
ii
yy
yyq
Zheng, W. and Tropsha, A. JCICS., 2000; 40; 185-194
y = 0.9383xR 0
2 = -3.3825
y = 0.3154x + 3.4908R2 = 0.9778
0
2
4
6
8
10
0 2 4 6 8 10
Predicted
Ob
serv
ed
y = 1.0023xR 0
2 = 0.5238
y = 3.1007x - 10.715R
2 = 0.9778
0
2
4
6
8
10
0 2 4 6 8 10
Observed
Pre
dic
ted
∑∑∑
−−
−−=
22 )~~()(
)~~)((
yyyy
yyyyR
ii
ii
yky r ~0 =
2
220 )~~(
)~(1
0
yy
yyR
i
rii
−
−−=
∑∑
2
220 )(
)~(1'
0
yy
yyR
i
rii
−−
−=∑∑
''~ byay r +=yky r '~ 0 =
PRACTICING SAFE QSAR
Correlation coefficient
2~~
i
ii
yyy
k∑∑=
2)~~()~~)((
yyyyyy
ai
ii
−−−
=∑
∑
yayb ~−=
y = 1.2458x - 1.8812
R2 = 0.8604
y = 0.9796x
R02 = 0.8209
3
5
7
9
3 5 7 9
Predicted
Ob
serv
ed
2)(
)~~)(('
yy
yyyya
i
ii
−−−
=∑
∑
yayb '~' −=2
~'
i
ii
y
yyk
∑∑=
Coefficients of determinationRegressionRegression through
the origin
CRITERIA
22'0
20
'
22
or ;0.1or
;6.0;5.0
RRRkk
Rq
≈≈
>>
19
QSAR ANALYSIS OF PMF DATASET: RESULTS
59 COMPOUNDS
TRAINING SET: 50TEST SET: 9
TRAINING SET: 46TEST SET: 13
TRAINING SET: 39TEST SET: 20
q2=0.737R2
pred=0.850R0
2=0.847k=1.137
S.E.=0.479F=39.58
q2=0.510R2
pred=0.718R0
2=0.716k=0.850
S.E.=0.653F=45.74
q2=0.584R2
pred=0.757R0
2=0.743k=1.108
S.E.=0.509F=34.22
QSAR ANALYSIS OF LIGANDS OF DIFFERENT RECEPTORS: OBSERVED VS. PREDICTED BINDING ENERGIES
TRAINING SET: 50TEST SET: 9
TRAINING SET: 46TEST SET: 13
y = 1.0183xR2 = 0.7393
y = 0.8551xR2 = 0.841
y = 1.0198x + 0.0637R2 = 0.7393
y = 0.7835x - 2.914R2 = 0.8497
-80
-60
-40
-20
0
-80 -60 -40 -20 0
Predicted
Obs
erve
d
y = 1.0391xR2 = 0.5978
y = 0.8586x - 7.3812R2 = 0.6287
y = 1.1078xR2 = 0.7436
y = 0.9911x - 3.8447R2 = 0.7568
-80
-60
-40
-20
0
-80 -60 -40 -20 0
Predicted
Obs
erve
d
QSAR ANALYSIS OF LIGANDS TO DIFFERENT RECEPTORS
111 RECEPTOR-LIGAND COMPLEXES
111 LIGANDS
ELIMINATION OF IDENTICAL COMPOUNDS
95 COMPOUNDS
TRAINING SET: 88TEST SET: 7
TRAINING SET: 67TEST SET: 28
TRAINING SET: 44TEST SET: 51
QSAR ANALYSIS OF SMoG2001 dataset: BEWARE OF
q2!
0.2
0.4
0.6
0.8
1
0.5 0.6 0.7 0.8
q2
R2
0.1
0.3
0.5
0.7
0.9
0.4 0.5 0.6 0.7 0.8
q2
R2
0.2
0.4
0.6
0.8
0.4 0.5 0.6 0.7
q2
R2
QSAR ANALYSIS OF LIGANDS TO DIFFERENT RECEPTORS: RESULTS
95 COMPOUNDS
64.31.750.960.610.610.5043526
72.12.930.970.660.670.5735605
70.72.040.970.570.570.5451447
61.42.830.950.680.700.6728674
88.31.980.970.780.820.6021743
57.51.520.950.800.840.6313822
122.80.500.950.960.960.677881
Fs2kR02R2q2Test setTraining setN
QSAR ANALYSIS OF LIGANDS TO DIFFERENT RECEPTORS: OBSERVED VS. PREDICTED BINDING ENERGIES
TRAINING SET: 88TEST SET: 7
TRAINING SET: 67TEST SET: 28
y = 1.2418x + 1.9386R2 = 0.6938
y = 0.9979xR2 = 0.6662
y = 1.1454x + 1.5642R2 = 0.7024
y = 0.9467xR0
2 = 0.6794
-16
-14
-12
-10
-8
-6
-4
-2
0
-14 -12 -10 -8 -6 -4 -2 0
Predicted log(Kd)
Ob
se
rve
d l
og
(Kd)
y = 0.9733x - 0.1802R2 = 0.673
y = 1.0051x + 0.3972R2 = 0.9609
y = 0.9528xR0
2 = 0.9577
y = 0.9958xR0
2 = 0.6726
-16
-14
-12
-10
-8
-6
-4
-2
0
-14 -12 -10 -8 -6 -4 -2 0
Predicted log(Kd)
Ob
se
rve
d l
og
(Kd)
TRAINING SET: 44TEST SET: 51
QSAR ANALYSIS OF LIGANDS TO DIFFERENT RECEPTORS: OBSERVED VS. PREDICTED BINDING ENERGIES
y = 1.123x + 0.8693R2 = 0.5464
y = 1.0115xR0
2 = 0.5408
y = 0.8957x - 0.5852R2 = 0.5731
y = 0.9691xR0
2 = 0.5689
-16
-14
-12
-10
-8
-6
-4
-2
0
-14 -12 -10 -8 -6 -4 -2 0
Predicted log(Kd)
Obs
erve
d lo
g(K
d)
Activity randomization to assess model robustness
Struc.1
Struc.2
Struc.n
..
..
..
..
..
..
Pro.1
Struc.3............
Pro.2
Pro.3
Pro.n
Struc.1
Struc.2
Struc.n
..
..
..
..
..
..
Pro.1
Struc.3
..
..
..
..
..
..
Pro.2
Pro.3
Pro.n
None of the random models was acceptable
CONCLUSIONS• Chemometric and QSAR approaches to representation
and scoring of protein ligand interaction provide an important supplement to existing structure based methods
• Based on the description of active site atoms one can search for complementary ligands in chemical databases
• Ligand affinity can be predicted from multi-target QSAR models of ligands in the absence of the receptor structure
• (Optimal) binding free energy is defined by ligand structure alone?
ACKNOWLEDGMENTS• UNC associates
Former:– Weifan ZHENG– Sung Jin CHO– Xin Chen– Stephen CAMMER
• Collaborators:– C. Breneman (RPI)– D. Bonchev (Texas A&M)– R. Hormann (R&H)– C. Reynolds
R&HàOrtho)– V. Gombar (GSK)– E. Gilford (Pfizer)
• Funding– NIH– Millenium– Rohm & Haas– Ortho McNail– GSK– Pfizer
QSAR group:
– Alex GOLBRAIKH
– Yun-De XIAO– Min SHEN– Scott OLOFF– Yuanyuan QIAO
Protein folding group:
–John GRIER
–Jun FENG–Shuxing ZHANG
–David BOSTICK
–Bala KRISHNAMOORTHY
–Ruchir SHAH
–Sagar KHARE