Bioinformatics IV
Quantitative Structure-Activity Relationships (QSAR)
and
Comparative Molecular Field Analysis (CoMFA)
Martin Ott
Outline
• Introduction
• Structures and activities
• Regression techniques: PCA, PLS
• Analysis techniques: Free-Wilson, Hansch
• Comparative Molecular Field Analysis
QSAR: The Setting
Quantitative structure-activity relationships are used when there is little or no receptor information, but there are measured activities of (many) compounds
They are also useful as a supplement to docking studies, which take much more CPU time
From Structure to Property
[Figure: chemical structures of a series of related compounds, with a chart labelled EC50]
From Structure to Property
[Figure: chemical structures of a series of related compounds, labelled LD50]
QSAR: Which Relationship?
Quantitative structure-activity relationships correlate chemical/biological activities with structural features or atomic, group or molecular properties
within a range of structurally similar compounds
Free Energy of Binding and Equilibrium Constants
The free energy of binding is related to the reaction constants of ligand-receptor complex formation:
$$\Delta G_{\text{binding}} = -2.303\,RT \log K = -2.303\,RT \log (k_{\text{on}} / k_{\text{off}})$$
• Equilibrium constant $K$
• Rate constants $k_{\text{on}}$ (association) and $k_{\text{off}}$ (dissociation)
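As a rough illustration of the relation above, the following sketch evaluates the binding free energy from an equilibrium constant. The temperature and the rate constants are hypothetical values chosen only for the example; they do not come from the slides.

```python
import math

R = 8.314462  # gas constant in J/(mol*K)

def equilibrium_constant(k_on, k_off):
    """K = k_on / k_off, as written on the slide."""
    return k_on / k_off

def binding_free_energy(k_eq, temperature=298.15):
    """Delta G in J/mol from -2.303 * R * T * log10(K)."""
    return -2.303 * R * temperature * math.log10(k_eq)

# Hypothetical rate constants, for illustration only.
k = equilibrium_constant(k_on=1.0e6, k_off=1.0e-3)
dg = binding_free_energy(k)
print(f"K = {k:.1e}, Delta G = {dg / 1000:.1f} kJ/mol")
```

Each factor of ten in K changes the free energy by the same amount (2.303 RT), which is why activities are usually handled on a logarithmic scale.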
Concentration as Activity Measure
• A critical molar concentration C that produces the biological effect is related to the equilibrium constant K
• Usually log (1/C) is used (cf. pH)
• For meaningful QSARs, activities need to be spread out over at least 3 log units
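The conversion to log (1/C) and the 3-log-unit rule of thumb can be sketched as follows. The concentrations are hypothetical, and the helper names are invented for this example.

```python
import math

def log_inverse_concentration(c_molar):
    """Convert a molar concentration C into the activity value log(1/C)."""
    return math.log10(1.0 / c_molar)

def spans_enough_log_units(concentrations, min_span=3.0):
    """Check whether the activities cover at least `min_span` log units."""
    values = [log_inverse_concentration(c) for c in concentrations]
    return max(values) - min(values) >= min_span

# Hypothetical concentrations in mol/L.
concs = [1e-4, 1e-6, 1e-8]
activities = [log_inverse_concentration(c) for c in concs]
print(activities, spans_enough_log_units(concs))
```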
Molecules Are Not Numbers!
[Figure: a molecular structure next to a scatter of unrelated numbers, e.g. 9.109*10^-31, 2.99792*10^8, -180.156, 196.967, 149,597,870,691, e, 43, 7]
Where are the numbers? Numerical descriptors
An Example: Capsaicin Analogs
X        EC50 (µM)   log(1/EC50)
H        11.80       4.93
Cl        1.24       5.91
NO2       4.58       5.34
CN       26.50       4.58
C6H5      0.24       6.62
NMe2      4.39       5.36
I         0.35       6.46
NHCHO     ?          ?

[Structure: capsaicin analog scaffold with variable substituent X]
An Example: Capsaicin Analogs
X        log(1/EC50)   MR      π       σ       Es
H        4.93           1.03    0.00    0.00    0.00
Cl       5.91           6.03    0.71    0.23   -0.97
NO2      5.34           7.36   -0.28    0.78   -2.52
CN       4.58           6.33   -0.57    0.66   -0.51
C6H5     6.62          25.36    1.96   -0.01   -3.82
NMe2     5.36          15.55    0.18   -0.83   -2.90
I        6.46          13.94    1.12    0.18   -1.40
NHCHO    ?             10.31   -0.98    0.00   -0.98

MR = molar refractivity (polarizability) parameter; π = hydrophobicity parameter;
σ = electronic sigma constant (para position); Es = Taft size parameter
An Example: Capsaicin Analogs
[Structure: capsaicin analog scaffold with variable substituent X]
$$\log(1/\mathrm{EC}_{50}) = -0.89 + 0.019\,\mathrm{MR} + 0.23\,\pi - 0.31\,\sigma - 0.14\,E_s$$
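A regression of this form can be obtained by ordinary least squares on the descriptor table. The sketch below uses NumPy on the seven compounds with measured activity and then predicts the NHCHO analog. It is only an illustration: with seven compounds and five parameters the fit is heavily parameterized, and the coefficients it returns are not claimed to reproduce the equation on the slide.

```python
import numpy as np

# Descriptor columns: MR, pi, sigma, Es
# Rows: X = H, Cl, NO2, CN, C6H5, NMe2, I (values from the slide's table)
descriptors = np.array([
    [1.03, 0.00, 0.00, 0.00],
    [6.03, 0.71, 0.23, -0.97],
    [7.36, -0.28, 0.78, -2.52],
    [6.33, -0.57, 0.66, -0.51],
    [25.36, 1.96, -0.01, -3.82],
    [15.55, 0.18, -0.83, -2.90],
    [13.94, 1.12, 0.18, -1.40],
])
activity = np.array([4.93, 5.91, 5.34, 4.58, 6.62, 5.36, 6.46])  # log(1/EC50)

# Prepend a column of ones so the first coefficient is the constant term.
design = np.column_stack([np.ones(len(activity)), descriptors])
coef, *_ = np.linalg.lstsq(design, activity, rcond=None)

# Goodness of fit on the training compounds.
fitted = design @ coef
ss_res = np.sum((activity - fitted) ** 2)
ss_tot = np.sum((activity - activity.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# Predict the compound with unknown activity (X = NHCHO).
nhcho = np.array([1.0, 10.31, -0.98, 0.00, -0.98])
prediction = float(nhcho @ coef)
print("coefficients:", np.round(coef, 3))
print("R^2:", round(r2, 3), "predicted log(1/EC50) for NHCHO:", round(prediction, 2))
```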
Basic Assumption in QSAR
The structural properties of a compound contribute in a linearly additive way to its biological activity
provided there are no non-linear dependencies of transport or binding on some properties
Molecular Descriptors
• Simple counts of features, e.g. of atoms, rings, H-bond donors, molecular weight
• Physicochemical properties, e.g. polarisability, hydrophobicity (logP), water-solubility
• Group properties, e.g. Hammett and Taft constants, volume
• 2D Fingerprints based on fragments
• 3D Screens based on fragments
2D Fingerprints
[Figure: structure of the brominated analog]

C N O P S X F Cl Br I Ph CO NH OH Me Et Py CHO SO C=C C≡C C=N Am Im
1 1 1 0 0 1 0 0  1  0 1  1  1  1  1  0  0  0   0  1   0   0   1  0
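A fragment fingerprint of this kind is simply a fixed-order bit vector. The sketch below encodes a hand-supplied list of fragments against the key list from the slide. Detecting which fragments a structure contains would need a substructure search, which is outside this example, and the function name is invented here.

```python
# Fragment keys in the order used on the slide.
FRAGMENT_KEYS = [
    "C", "N", "O", "P", "S", "X", "F", "Cl", "Br", "I", "Ph", "CO",
    "NH", "OH", "Me", "Et", "Py", "CHO", "SO", "C=C", "C≡C", "C=N", "Am", "Im",
]

def fingerprint(fragments_present):
    """Return a 0/1 list with one position per fragment key."""
    present = set(fragments_present)
    unknown = present - set(FRAGMENT_KEYS)
    if unknown:
        raise ValueError(f"unknown fragment keys: {sorted(unknown)}")
    return [1 if key in present else 0 for key in FRAGMENT_KEYS]

# Fragments marked with 1 on the slide for the brominated analog.
bits = fingerprint(["C", "N", "O", "X", "Br", "Ph", "CO", "NH", "OH", "Me", "C=C", "Am"])
print(" ".join(str(b) for b in bits))
```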
Principal Component Analysis (PCA)
• Many (>3) variables to describe objects = high dimensionality of descriptor data
• PCA is used to reduce dimensionality
• PCA extracts the most important factors (principal components or PCs) from the data
• Useful when correlations exist between descriptors
• The result is a new, small set of variables (PCs) which explain most of the data variation
PCA – From 2D to 1D
PCA – From 3D to 3D-
Different Views on PCA
• Statistically, PCA is a multivariate analysis technique closely related to eigenvector analysis
• In matrix terms, PCA is a decomposition of matrix X into two smaller matrices plus a set of residuals: $X = TP^{T} + R$
• Geometrically, PCA is a projection technique in which X is projected onto a subspace of reduced dimensions
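The matrix view can be sketched with a singular value decomposition. The example below mean-centers X first (a common convention, assumed here rather than stated on the slide) and returns scores T, loadings P and residuals R. The data are synthetic.

```python
import numpy as np

def pca(X, n_components):
    """Decompose mean-centered X into scores T, loadings P and residuals R."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores
    P = Vt[:n_components].T                      # loadings (orthonormal columns)
    R = Xc - T @ P.T                             # what the kept PCs do not explain
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return T, P, R, explained

# Synthetic data: three correlated descriptors driven by one hidden factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 1))
X = latent @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(50, 3))

T, P, R, explained = pca(X, n_components=1)
print("fraction of variance explained by PC1:", round(float(explained[0]), 4))
```

Because the three columns are strongly correlated, a single principal component captures almost all of the variation, which is the situation the bullets above describe.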
Partial Least Squares (PLS)
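$$
\begin{aligned}
y_1 &= a_0 + a_1 x_{11} + a_2 x_{12} + a_3 x_{13} + \dots + e_1 && \text{(compound 1)} \\
y_2 &= a_0 + a_1 x_{21} + a_2 x_{22} + a_3 x_{23} + \dots + e_2 && \text{(compound 2)} \\
y_3 &= a_0 + a_1 x_{31} + a_2 x_{32} + a_3 x_{33} + \dots + e_3 && \text{(compound 3)} \\
&\;\;\vdots \\
y_n &= a_0 + a_1 x_{n1} + a_2 x_{n2} + a_3 x_{n3} + \dots + e_n && \text{(compound n)}
\end{aligned}
$$
$$Y = XA + E$$
X = independent variables
Y = dependent variables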
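The slides give only the linear model, not the PLS algorithm itself. As one possible illustration, the sketch below implements the single-response NIPALS variant (often called PLS1) in NumPy. The function name and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pls1(X, y, n_components):
    """Single-response PLS (NIPALS). Returns coefficients a_1..a_m and intercept a_0."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)       # weight vector: direction of max covariance with y
        t = Xk @ w                      # latent score vector
        tt = t @ t
        p = Xk.T @ t / tt               # X loading
        c = (yk @ t) / tt               # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - c * t                 # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic example: 30 compounds, 3 descriptors.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = 3.0 + X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=30)

coef, a0 = pls1(X, y, n_components=2)
y_hat = a0 + X @ coef
print("a0:", round(float(a0), 3), "coefficients:", np.round(coef, 3))
```

When the number of latent variables equals the number of descriptors (and X has full column rank), this procedure gives the same coefficients as ordinary least squares; fewer latent variables give a more constrained model.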
PLS – Cross-validation
• Squared correlation coefficient R²
  • Value between 0 and 1 (> 0.9 desired)
  • Indicates the explanatory power of the regression equation

With cross-validation:
• Squared correlation coefficient Q²
  • Value of at most 1, and it can fall below 0 for a poorly predictive model (> 0.5 desired)
  • Indicates the predictive power of the regression equation
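The difference between the two statistics can be sketched with leave-one-out cross-validation. For brevity the example uses ordinary least squares as a stand-in for PLS, and it computes Q² as 1 - PRESS/SS with the overall mean in the denominator. Both choices are assumptions of this sketch, and the data are synthetic.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients with the constant term first."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r_squared(X, y):
    """R^2 of the model fitted to, and evaluated on, all compounds."""
    residuals = y - predict_ols(fit_ols(X, y), X)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

def q_squared_loo(X, y):
    """Q^2 from leave-one-out: each compound is predicted by a model that never saw it."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = fit_ols(X[keep], y[keep])
        press += (y[i] - predict_ols(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic data: 20 compounds, 2 descriptors.
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=20)

r2, q2 = r_squared(X, y), q_squared_loo(X, y)
print("R^2:", round(float(r2), 3), "Q^2:", round(float(q2), 3))
```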
Free-Wilson Analysis
$$\log(1/C) = \sum_i a_i x_i + \mu$$
$x_i$: presence of group i (0 or 1)
$a_i$: activity contribution of group i
$\mu$: activity value of unsubstituted compound
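A Free-Wilson model reduces to a least-squares fit on a 0/1 indicator matrix. In the sketch below the substituent names, group contributions and activities are invented so that the fit can be checked against known values.

```python
import numpy as np

groups = ["Cl", "Me", "OH"]          # hypothetical substituent groups
x = np.array([                        # rows = compounds, columns = groups (1 = present)
    [0, 0, 0],                        # unsubstituted compound
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
], dtype=float)

# Invented "true" contributions, used only to generate example activities.
true_a = np.array([0.5, -0.2, 0.8])
true_mu = 4.0
log_inv_c = x @ true_a + true_mu

# Fit a_i and mu: the last column of ones carries the unsubstituted-compound term.
design = np.column_stack([x, np.ones(len(x))])
solution, *_ = np.linalg.lstsq(design, log_inv_c, rcond=None)
a, mu = solution[:-1], solution[-1]

def predict(indicators):
    """Predicted log(1/C) for a compound given its group indicators."""
    return float(np.dot(indicators, a) + mu)

print(dict(zip(groups, np.round(a, 3))), "mu:", round(float(mu), 3))
print("Me + OH compound:", round(predict([0, 1, 1]), 3))
```

The Me + OH combination is not in the training set, but both groups are, so a prediction is possible. A group that never appears in the indicator matrix has no coefficient and cannot be predicted.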
Free-Wilson Analysis
+ Computationally straightforward
– Predictions only for substituents already included
– Requires large number of compounds
Hansch Analysis
Drug transport and binding affinity depend nonlinearly on lipophilicity:
$$\log(1/C) = a(\log P)^2 + b \log P + c\,\sigma + k$$
P: n-octanol/water partition coefficient
σ: Hammett electronic parameter
a, b, c: regression coefficients
k: constant term
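The Hansch equation is linear in its coefficients, so it can be fitted by least squares once (log P)² is added as its own column. The sketch below uses invented log P and σ values and invented coefficients. The optimum log P, -b/(2a), follows from setting the derivative of the parabola to zero and is meaningful only when a is negative.

```python
import numpy as np

def fit_hansch(log_p, sigma, activity):
    """Fit log(1/C) = a*(logP)^2 + b*logP + c*sigma + k by least squares."""
    design = np.column_stack([log_p ** 2, log_p, sigma, np.ones_like(log_p)])
    (a, b, c, k), *_ = np.linalg.lstsq(design, activity, rcond=None)
    return a, b, c, k

# Hypothetical descriptor values for seven compounds.
log_p = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
sigma = np.array([0.00, 0.23, -0.17, 0.78, 0.66, -0.27, 0.06])

# Activities generated from invented coefficients so the fit can be verified.
activity = -0.4 * log_p ** 2 + 2.0 * log_p + 0.6 * sigma + 3.0

a, b, c, k = fit_hansch(log_p, sigma, activity)
optimum_log_p = -b / (2.0 * a)
print("a, b, c, k:", np.round([a, b, c, k], 3), "optimum logP:", round(float(optimum_log_p), 2))
```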
Hansch Analysis
+ Fewer regression coefficients needed for correlation
+ Interpretation in physicochemical terms
+ Predictions for other substituents possible
Pharmacophore
• Set of structural features in a drug molecule recognized by a receptor
• Sample features: H-bond donor, charge, hydrophobic center
• Distances, 3D relationship
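The distance part of a pharmacophore definition can be illustrated with a few lines of code. The feature coordinates below are hypothetical (in Å) and are chosen only so that the distances come out as round numbers.

```python
import itertools
import math

# Hypothetical 3D positions of three pharmacophore features, in angstroms.
features = {
    "hydrophobic center": (0.0, 0.0, 0.0),
    "charge": (3.0, 4.0, 0.0),
    "H-bond donor": (0.0, 4.0, 0.0),
}

def pairwise_distances(points):
    """Distance between every pair of named feature points."""
    return {
        (a, b): math.dist(points[a], points[b])
        for a, b in itertools.combinations(points, 2)
    }

distances = pairwise_distances(features)
for (a, b), d in distances.items():
    print(f"{a} to {b}: {d:.1f}")
```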
Pharmacophore Selection
L = lipophilic site; A = H-bond acceptor; D = H-bond donor; PD = protonated H-bond donor
[Figure: the dopamine pharmacophore, with sites L, PD and D separated by distances d1, d2 and d3, shown next to candidate structures]
Comparative Molecular Field Analysis (CoMFA)
• Set of chemically related compounds
• Common pharmacophore or substructure required
• 3D structures needed (e.g., Corina-generated)
• Flexible molecules are “folded” into pharmacophore constraints and aligned
CoMFA Alignment
[Figure: several structures with rings labelled A to D and positions C1 and C7, each mapped onto a common "pharmacophore" of lipophilic sites L separated by distances d1, d2 and d3]
CoMFA Grid and Field Probe
(Only one molecule shown for clarity)
Electrostatic Potential Contour Lines
CoMFA Model Derivation
• Molecules are positioned in a regular grid according to alignment
• Probes are used to determine the molecular field:

Van der Waals field (probe is neutral carbon)
$$E_{\text{vdw}} = \sum_i \left( A_i r_{ij}^{-12} - B_i r_{ij}^{-6} \right)$$

Electrostatic field (probe is charged atom)
$$E_c = \sum_i \frac{q_i q_j}{D\, r_{ij}}$$
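The field calculation can be sketched as a loop over grid points, written here with NumPy broadcasting. The atom coordinates, charges and A/B parameters are hypothetical, units are arbitrary (no Coulomb constant, as in the formula above), and the energy cutoff is a common practical safeguard assumed for this sketch rather than something stated on the slide.

```python
import numpy as np

def regular_grid(lower, upper, spacing):
    """All points of a regular 3D lattice between two corners (inclusive)."""
    axes = [np.arange(lo, hi + spacing / 2, spacing) for lo, hi in zip(lower, upper)]
    return np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)

def comfa_fields(coords, charges, A, B, grid,
                 probe_charge=1.0, dielectric=1.0, cutoff=30.0):
    """Van der Waals and electrostatic probe energies at every grid point."""
    # r[j, i] = distance between grid point j and atom i
    r = np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=2)
    r = np.maximum(r, 1e-6)  # avoid division by zero when a grid point hits an atom
    e_vdw = np.sum(A / r ** 12 - B / r ** 6, axis=1)
    e_c = np.sum(charges * probe_charge / (dielectric * r), axis=1)
    # Truncate extreme values close to atoms (assumed cutoff).
    return np.clip(e_vdw, -cutoff, cutoff), np.clip(e_c, -cutoff, cutoff)

# Hypothetical two-atom "molecule".
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
charges = np.array([0.4, -0.4])
A = np.array([4.0, 4.0])
B = np.array([4.0, 4.0])

grid = regular_grid((-2.0, -2.0, -2.0), (2.0, 2.0, 2.0), 2.0)
e_vdw, e_c = comfa_fields(coords, charges, A, B, grid)
print(grid.shape, e_vdw.shape, e_c.shape)
```

Each aligned molecule yields one such pair of vectors; stacking them row by row gives the descriptor matrix that is then correlated with activity.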
3D Contour Map for Electronegativity
CoMFA Pros and Cons
+ Suitable to describe receptor-ligand interactions
+ 3D visualization of important features
+ Good correlation within related set
+ Predictive power within scanned space
– Alignment is often difficult
– Training required