Mechanistic Free-Wilson SAR Analysis of Potency Data
Existing computational approaches can be very useful for identifying analogs with optimized
ADME and safety profiles. However, potency cannot be taken into consideration without a dedicated
predictor for the particular target affinity endpoint. Since potency measurements are typically performed for
combinatorial sets of compounds with a common scaffold and varying substituents, Free-Wilson-type SAR
analysis [1] is the method of choice for producing such models. Notably, modeling should not focus on pursuing
the best possible statistical performance. In lead optimization projects, straightforward mechanistic
interpretation of the models is a much more powerful feature. It has been shown that physicochemical
trends are evident even for complex protein-ligand interactions, for example in P-gp substrate specificity [2]
or hERG channel inhibition. An interpretable model that captures such tendencies can therefore be highly
valuable even if it is not highly predictive, as it may help direct research efforts towards more promising
candidates. The current approach consists of two stages:
1. Modified Free-Wilson-type fragmentation is employed to produce a data matrix for statistical analysis.
First, a common scaffold is identified among the molecules comprising the data set. The substituents
are represented, however, by their contributions to major physicochemical properties rather than by
particular structural fragments. The matrix is subsequently populated with R-group contributions to
molecular size, logP, and hydrogen bonding potential. This helps minimize the number of variables in
the model and ensures generalized mechanistic interpretation in terms of these properties, rather than
simply pointing out the effects of introducing a specific substituent.
2. A class-specific model is built relating the compounds’ potency to the physicochemical characteristics of
the substituents using the Gradient Boosting Machine (GBM) statistical technique [3]. The GBM methodology
relies on stepwise optimization of an ensemble of weak predictors (decision trees) that can account for
non-linear effects in the property variation. For this reason it is preferred over traditional additive
approaches such as MLR or PLS. To avoid over-fitting, 5-fold cross-validation is performed, and only
those model iterations where the cross-validated R² is within 20% of the training R² are considered.
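The second stage can be illustrated with a minimal, standard-library-only sketch of gradient boosting for squared error. It is not the implementation used in this work: the weak learners are reduced to one-split trees (stumps), the data are synthetic (a response that rises with the first descriptor and then plateaus, plus an irrelevant second descriptor), the names and defaults such as `fit_gbm`, `n_trees`, and `rate` are hypothetical, and `is_acceptable` encodes only one possible reading of the "within 20%" criterion (a relative tolerance).

```python
import random

def fit_stump(X, residuals):
    """Depth-1 regression tree: (feature index, threshold, left mean, right mean)."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), 0, float("inf"), mean, mean)  # fallback if no split exists
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]

def predict(model, row):
    base, rate, stumps = model
    return base + sum(rate * (lm if row[j] <= thr else rm) for j, thr, lm, rm in stumps)

def fit_gbm(X, y, n_trees=100, rate=0.3):
    """Stepwise boosting: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    preds, stumps = [base] * len(y), []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        preds = [p + rate * (lm if row[j] <= thr else rm) for p, row in zip(preds, X)]
    return (base, rate, stumps)

def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((v - mean) ** 2 for v in y_true)
    ss_res = sum((v - p) ** 2 for v, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot

def cross_validated_r2(X, y, k=5):
    """R2 computed from predictions made on held-out folds only."""
    idx = list(range(len(y)))
    random.Random(0).shuffle(idx)
    held_out = [0.0] * len(y)
    for f in range(k):
        test = set(idx[f::k])
        train = [i for i in idx if i not in test]
        model = fit_gbm([X[i] for i in train], [y[i] for i in train])
        for i in test:
            held_out[i] = predict(model, X[i])
    return r_squared(y, held_out)

def is_acceptable(r2_train, r2_cv, tolerance=0.20):
    """Assumed reading: cross-validated R2 may fall at most 20% below training R2."""
    return r2_cv >= (1.0 - tolerance) * r2_train

# Synthetic data: response rises with descriptor 0 and then plateaus.
rng = random.Random(1)
X = [[rng.uniform(0.0, 2.0), rng.uniform(0.0, 1.0)] for _ in range(40)]
y = [5.0 + 2.0 * min(row[0], 1.0) for row in X]

model = fit_gbm(X, y)
r2_train = r_squared(y, [predict(model, row) for row in X])
r2_cv = cross_validated_r2(X, y)
print(f"train R2 = {r2_train:.3f}, CV R2 = {r2_cv:.3f}, "
      f"accepted = {is_acceptable(r2_train, r2_cv)}")
```

A plateau of this kind cannot be represented by a single linear term, which is the motivation given above for preferring a tree ensemble over MLR or PLS.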
The proposed modeling methodology is planned for integration into the Structure Design software built on the
ACD/Percepta platform [5] (interface shown in Figure 4). The software currently offers automated analog
generation coupled with full-featured ADME/Tox profiling and ranking capabilities. Support for Auto-SAR
analysis of user-supplied data would complete the workflow outlined above and would greatly enhance
the user experience by eliminating the need for separate tools to account for potency in computational lead
optimization.
Case Study: Cannabinoid CB2 Receptor Agonists
In this section we present a real-world scenario illustrating how the described mechanistic Free-Wilson
SAR analysis could be applied to model target affinity for a small class-specific data set, and what insight
could be gained from the results obtained.
FIGURE 3. Example compounds illustrating the significance of ‘local’ vs. whole-molecule physicochemical effects.
Pranas Japertas,1,2 Andrius Sazonovas,1,2
Kiril Lanevskij,1,2 Remigijus Didziapetris1,2
1 Advanced Chemistry Development, Inc., 8 King
Street East, Toronto, Ontario, M5C 1B5, Canada
2 VsI “Aukstieji algoritmai”, A.Mickeviciaus g. 29,
LT-08117 Vilnius, Lithuania
Employing Potency Data in Computational
Lead Optimization by Automated Free-Wilson
Analysis
Introduction
Lead optimization efforts are guided by a combination of factors, among which the lead’s potency and its
ADME/Tox properties play major roles. Each drug discovery project aims at optimizing activity against a
specific target; however, computational models for the multitude of target affinity endpoints are not readily
available. Consequently, conventional in silico lead optimization techniques can only be used for ADME/Tox
profiling, while potency is neglected. In this work we present an Auto-SAR approach that overcomes this issue
by incorporating user-defined potency data into analog profiling. The approach is based on automatic
Free-Wilson-type SAR analysis of a series of known compounds with a common scaffold and varying
substituents, which evaluates the influence of substituents in different positions on the considered property.
The substituents are represented by their contributions to major physicochemical properties such as size,
lipophilicity, ionization, and hydrogen bonding. Exploring physicochemical dependencies allows feasible,
mechanistically interpretable class-specific SAR models to be obtained from small data sets (several tens of
compounds with measured potency data). Modeling involves special statistical methods that capture the
nonlinearity in the relationship between the dependent property and the descriptors used. The resulting
class-specific models can be used to gain a better understanding of substituent effects, evaluate target
activities of new compounds of the same class, and guide lead optimization efforts towards the most
promising candidates.
Finally, we present several case studies based on published lead optimization articles, where the structural
analogs suggested by the software are compared to those proposed by authors of the original studies.
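As a rough illustration of the substituent representation described above, the sketch below assembles a descriptor matrix in which each compound is described by per-position contributions to log Po/w and Vx rather than by indicator variables for individual fragments. The numeric values are copied from the first four rows of the illustrative table in Scheme 1; the names `CONTRIBUTIONS`, `COMPOUNDS`, and `descriptor_row` are hypothetical, and ionization and hydrogen bonding descriptors are omitted for brevity.

```python
POSITIONS = ("R1", "R2", "R3")
PROPERTIES = ("logP", "Vx")

# (position, substituent) -> (log Po/w contribution, Vx contribution)
CONTRIBUTIONS = {
    ("R1", "Me"): (0.2, 0.14),
    ("R2", "Cl"): (0.6, 0.11),
    ("R2", "Br"): (0.7, 0.17),
    ("R3", "Cl"): (0.8, 0.11),
    ("R3", "OCH3"): (-0.2, 0.20),
    ("R3", "CF3"): (0.9, 0.18),
    ("R3", "OH"): (-0.4, 0.06),
}

# (measured pEC50, substituent at each position)
COMPOUNDS = [
    (2.82, {"R1": "Me", "R2": "Cl", "R3": "Cl"}),
    (3.20, {"R1": "Me", "R2": "Cl", "R3": "OCH3"}),
    (0.86, {"R1": "Me", "R2": "Br", "R3": "CF3"}),
    (4.13, {"R1": "Me", "R2": "Br", "R3": "OH"}),
]

def descriptor_row(substituents):
    """Concatenate the property contributions of the substituent at each position."""
    row = []
    for pos in POSITIONS:
        row.extend(CONTRIBUTIONS[(pos, substituents[pos])])
    return row

COLUMNS = [f"{prop}({pos})" for pos in POSITIONS for prop in PROPERTIES]
X = [descriptor_row(subs) for _, subs in COMPOUNDS]
y = [potency for potency, _ in COMPOUNDS]

print(COLUMNS)
for row, potency in zip(X, y):
    print(row, "->", potency)
```

The number of columns here is fixed at positions × properties (six), whereas a classical indicator-variable Free-Wilson matrix would need one column per distinct substituent at each position (seven for these four compounds) and would grow with every new R-group.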
FIGURE 4. Prototype of automated Free-Wilson analysis in the Structure Design user interface.
SCHEME 2. A proposed workflow of in silico lead optimization involving ADME/Tox profiling combined with
Auto-SAR utilizing available potency data.
References
1. Kubinyi H. QSAR: Hansch Analysis and Related Approaches. Wiley VCH, 1993, 240 p.
2. Didziapetris R et al. J Drug Target. 2003, 11, 391.
3. Friedman JH. Greedy Function Approximation: A Gradient Boosting Machine. IMS 1999 Reitz Lecture.
4. van der Stelt M et al. J Med Chem. 2011, 54, 7350.
5. Structure Design on the ACD/Percepta platform, www.acdlabs.com/leadop
SCHEME 1. An outline of mechanistic Free-Wilson model development workflow.
Application of Potency Data in a Lead Optimization Workflow
A natural further step building upon the described concept would be to integrate potency data into the
computational lead optimization pipeline and make it available for compound ranking along with ADME/Tox
profiles. A small data set of measured potency values for 20 or more compounds, with substituents varied
in at least two sites, would suffice for automatic derivation of simple Free-Wilson-type QSAR
models describing the substituent physicochemical property contributions to the compound’s overall
potency. The resulting in silico lead optimization workflow would be as shown in Scheme 2. This approach
would address the issue that candidates suggested by the software solely on the basis of their ADME/Tox
profiles may fail potency requirements.
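One simple way such a combined ranking could work is sketched below. The poster does not specify how the two kinds of prediction are combined, so everything here is an assumption made for illustration: the `Analog` record, the single aggregate `adme_score`, the potency threshold, and the example values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Analog:
    name: str
    adme_score: float       # hypothetical aggregate score, higher = better profile
    predicted_pec50: float  # hypothetical output of the class-specific Auto-SAR model

def rank_analogs(analogs, min_pec50):
    """Discard analogs predicted to miss the potency threshold, then rank by ADME score."""
    potent = [a for a in analogs if a.predicted_pec50 >= min_pec50]
    return sorted(potent, key=lambda a: a.adme_score, reverse=True)

# Made-up candidates: "A" has the best ADME score but low predicted potency.
candidates = [
    Analog("A", adme_score=0.9, predicted_pec50=5.1),
    Analog("B", adme_score=0.7, predicted_pec50=6.4),
    Analog("C", adme_score=0.8, predicted_pec50=6.0),
]

ranked = rank_analogs(candidates, min_pec50=6.0)
print([a.name for a in ranked])  # ['C', 'B']
```

Ranking on ADME score alone would place candidate "A" first; the potency filter is what removes it, which is the failure mode the workflow is meant to address.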
[Scheme 1 graphic: common scaffold with substituent positions R1, R2, and R3 [+c]]

pEC50   Substituents             log Po/w             Vx
        R1    R2    R3           R1    R2    R3       R1     R2     R3
2.82    -Me   -Cl   -Cl          +0.2  +0.6  +0.8     +0.14  +0.11  +0.11
3.20    -Me   -Cl   -OCH3        +0.2  +0.6  –0.2     +0.14  +0.11  +0.20
0.86    -Me   -Br   -CF3         +0.2  +0.7  +0.9     +0.14  +0.17  +0.18
4.13    -Me   -Br   -OH          +0.2  +0.7  –0.4     +0.14  +0.17  +0.06
2.19    -Et   -OH   -CH2OH       +0.5  –0.4  –0.5     +0.28  +0.06  +0.20
3.10    -Et   -OH   -Br          +0.5  –0.4  +0.8     +0.28  +0.06  +0.17
1.85    -Et   -Ph   -Ph          +0.5  +0.3  +1.9     +0.28  +0.60  +0.60
2.93    -Et   -Ph   -CH2Ph       +0.5  +0.3  +1.7     +0.28  +0.60  +0.74

[Scheme 1 graphic, continued: (1) LogP and Vx descriptors for each substituent position; (2) plot of predicted vs. observed pEC50]
Workflow steps (from the graphic):
1. Data set with a common scaffold
2. Measure potency values for the compound series
3. Perform automated Free-Wilson SAR analysis
4. Transfer to chemical spreadsheet
5. Generate analogs and rank according to ADME/Tox profile and Auto-SAR model predictions
6. Optimized lead

Exp   Position       log Po/w       Vx
      R1    R2       R1    R2       R1    R2
5.5   -H    -Cl      0     +0.6     0     +0.1
6.0   -Me   -H       +0.2  0        +0.2  0
5.9   -H    -Br      0     +0.7     0     +0.2
6.2   -Me   -OH      +0.5  –0.4     +0.2  +0.1
4.5   -H    -H       0     0        0     0
6.5   -Me   -Ph      +0.5  +0.3     +0.2  +0.6
Head Office: +1 416 368-3435
www.acdlabs.com
[Chemical structure: common scaffold with substituent positions R1, R2, and R3]
FIGURE 1. The common scaffold of the
considered CB2 agonists.
The presented case study leads to several key conclusions:
1) The GBM method used in modeling successfully describes non-linear effects in property variation.
2) Because the explored dependences are position-specific, the employed approach not only
shows which property changes are necessary, but also captures the local effects indicating where
these changes have to be applied in order to achieve the desired effect.
The authors of the original study pay considerable attention to basic affinity-lipophilicity relationships, which
are also a major focus of our analysis. However, apart from lipophilicity (octanol/water log P), we also
investigated the influence of ionization (pKa), molecular size (McGowan volume, Vx), and hydrogen bonding
potential (Abraham’s A and B) of the substituents by including their contributions to the respective properties
of the molecules as descriptors in the GBM model. The key results of our physicochemical Free-Wilson analysis
are briefly described below:
pEC50 for CB2 receptor: the most significant
physicochemical determinant in the GBM model was the
lipophilicity of the substituent at the R1 position. The pEC50
value rises quickly with increasing log Po/w and
ultimately reaches a plateau (Figure 2, A), while no
such trend was evident at the R3 position (not shown).
pIC50 for hERG inhibition:
1. The major determinant of hERG blocking
propensity was the basic pKa of the R3 substituent, with
a steady increase in pIC50 consistent with the
ionized fraction at physiological conditions
(Figure 2, B).
2. The log Po/w dependence at R3 follows a distinct
pattern, with pIC50 increasing up to a certain
“optimum” log Po/w and then rapidly falling with
further increasing lipophilicity (Figure 2, C).
3. A much weaker log Po/w dependence was
observed at R1, indicating that this part of the
molecule probably does not play a major role in
hERG binding (not shown).
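The "ionized fraction at physiological conditions" mentioned in item 1 can be estimated from the Henderson-Hasselbalch relationship. The sketch below assumes a single basic center and a physiological pH of 7.4 (neither value is stated on the poster), and the function name is hypothetical. The pKa values are chosen to span the range shown on the Figure 2 (B) axis.

```python
def ionized_fraction_base(pka, ph=7.4):
    """Protonated (ionized) fraction of a monoprotic base at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

for pka in (3.0, 5.0, 7.4, 9.0):
    print(f"base pKa {pka:>4}: ionized fraction = {ionized_fraction_base(pka):.4f}")
```

Under these assumptions the ionized fraction is negligible for pKa values around 3 to 5, reaches one half at pKa = 7.4, and approaches 1 near pKa 9, which is the kind of steady rise the pKa dependence in item 1 is compared against.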
[Figure 2 plot axes: (A) pEC50 (CB2) vs. log Po/w (R1), 0.0 to 2.0; (B) pIC50 (hERG) vs. base pKa (R3), 3.0 to 9.0; (C) pIC50 (hERG) vs. log Po/w (R3)]
This test case is based on a recent J. Med. Chem. publication
on the discovery of novel CB2 receptor agonists [4]. The core
1-(4-(Pyridin-2-yl)benzyl)-imidazolidine-2,4-dione scaffold
with varying substituents introduced in 3 different positions is
depicted in Figure 1. Here, R2 was varied only between
hydrogen and fluorine atoms in order to modulate the pKa of the
basic amine in R3. Therefore, this position is outside the scope of
the current study, and we focus only on positions R1 and R3.
Whole-molecule vs. ‘local’ property value approach
An important aspect of this work is that the ‘local’ position-specific analysis of physicochemical property
dependences employed here may provide valuable insight that could not be obtained from whole-molecule
property values alone. Consider, for example, the two molecules from the studied data set shown in Figure 3.
They have similar lipophilicity, but compound 44 is significantly more potent than compound 20 and does not
block hERG. In fact, compound 44 was selected as the best of the series. The reason is evident from our results
discussed above: CB2 affinity is more sensitive to changes in substituent R1 (cyclopropyl is better than methyl),
while making R3 more hydrophilic has little effect on pEC50 but helps further attenuate hERG inhibition.
FIGURE 2. Key physicochemical dependences
observed in GBM model; higher affinities colored
green in case of CB2, and red in case of hERG.
Compound 44
log Do/w = 1.0
pEC50 (CB2) = 8.0
hERG Inhibition @ 100 μM ≈ 0%
Compound 20
log Do/w = 1.3
pEC50 (CB2) = 7.4
hERG Inhibition @ 100 μM = 64%