Computational Advances in Structure Based Drug Design with
Applications to HIV-1 Reverse Transcriptase
Robert Christopher Rizzo
YALE UNIVERSITY
2001
copyright 2001
by
Robert Christopher Rizzo
Abstract
Computational Advances in Structure Based Drug Design with Applications to HIV-1
Reverse Transcriptase
Robert Christopher Rizzo
2001
Computational advances in structure based drug design are presented which
emphasize the development of protocols and methodology used in force-field
parameterization, scoring function development, structure prediction and validation, and
docking.
Force-field parameters have been developed for amines primarily by fitting to
experimental data for pure liquids and to hydrogen−bond strengths from gas-phase ab
initio calculations. The parameters were used to compute relative free energies of
hydration using free energy perturbation calculations in Monte Carlo simulations
(MC/FEP). The results are in excellent agreement with experimental data, in contrast to
numerous prior computational reports. MC simulations for the pure liquids of thirteen
additional amines demonstrated the transferability of the force field.
The interactions and energetics associated with the binding of 20 HEPT and 20
nevirapine non-nucleoside inhibitors of HIV-1 reverse transcriptase (RT) have been
explored in an effort to establish simulation protocols and methods that can be used in the
development of more effective anti-HIV drugs. Each inhibitor was modeled in the bound
and unbound states via MC statistical mechanics methods. A viable regression equation
was obtained using only four descriptors to correlate the 40 experimental activities with an
r² of 0.75 and a cross-validated q² of 0.69. The MC results revealed three physically
reasonable parameters that control the binding affinities.
Molecular docking and simulation methods have been used to generate a model of
the FDA-approved inhibitor Sustiva bound to HIVRT. The docking protocol was
validated with known NNRTI complexes. MC/FEP simulations confirmed that the
predicted structures yield correct results for the effects of the Y181C and V106A
mutations on the activity of Sustiva, nevirapine, MKC-442, and 9-Cl TIBO. A
subsequently reported crystallographic complex of Sustiva with HIVRT fully confirmed
the prediction.
Docking studies that include cluster analysis are presented in an effort to reduce
the number of candidate conformers that need to be docked for very flexible ligands.
Despite a limited conformational search, clustering based on an rmsd value of 2.5 Å
dramatically reduced the total number of clusters yet still retained at least one cluster
representative with a conformation similar to the experimental bound-like conformation
for the majority of systems tested.
Computational Advances in Structure Based Drug Design with Applications to HIV-1
Reverse Transcriptase
A Dissertation
Presented to the Faculty of the Graduate School
of
Yale University
in Candidacy for the Degree of
Doctor of Philosophy
by
Robert Christopher Rizzo
Dissertation Director: William L. Jorgensen
May 2001
Acknowledgements
I would like to express my sincere thanks to Professor William L. Jorgensen for
allowing me to pursue graduate studies in his laboratory. I will always be deeply
indebted to him for his encouragement that I think and act as an independent scientist, for
suggesting interesting and challenging projects, and for keeping me focused with timely
and insightful advice.
I would like to thank the members of my thesis committee, Professors Martin
Saunders and Donald Crothers for many helpful suggestions and comments throughout
my entire graduate career. Special thanks go to Dr. Julian Tirado-Rives for his patience
and day-to-day help.
At Villanova University I would like to thank Dr. Joseph W. Bausch, Dr. Morgan
Besson, and especially Dr. José de la Vega for going above and beyond the call of duty in
helping me prepare for graduate school. I am also very grateful to Dr. Juan G. Alvarez at
the Harvard Medical School for his early encouragement that I pursue an undergraduate
degree in Chemistry.
Thanks to Dr. Dongchul Lim for incorporating software suggestions into his
ChemEdit program that facilitated inhibitor Z-matrix generation in Chapter Three.
Thanks also to Dr. Albert C. Pierce for computational assistance in fitting torsion
parameters (Chapter Two), to Dr. Melissa L. Plount Price for help with docking
calculations (Chapter Four), and to Dr. De-Ping Wang who performed free energy
perturbations (Chapter Four) and additions to the MATADOR program (Chapter 5).
Thanks to Matt Repasky for much assistance with PERL programming and to Dennis
Ostrovsky. I would also like to thank Dr. Marilyn B. Kroeger Smith and Professor
Richard H. Smith for their collaborations and helpful discussions, as well as Jayaraman
Chandrasekhar. Thanks to Marina Udier and Mark Wilson for proofreading this
dissertation.
I would like to acknowledge all the members of the Jorgensen lab past and present
with whom I have worked. The acceptance I have felt from this diverse group will be
fondly remembered and it has been a privilege to interact with so many talented people.
A special thanks to Patricia Morales for her day-to-day help.
Thanks to Bob Jordan, Tim Reeder, David Lenat, Hashim Al-Hashimi, and Mark
Wilson for constant love and emotional support. In particular, Mark the Genius who puts
up with me on a daily basis.
I can't put into words the love and encouragement I have received from my
family and from my fiancée and best friend Elizabeth. I dedicate this thesis to my parents
Frank Joseph and Mary Lou Rizzo.
Table of Contents
List of Figures
List of Tables
Preface
Chapter One
Structure Based Methods for Computational Drug Design
    Introduction
    Monte Carlo and Molecular Dynamics Methods
    Potential Energy
    Pure Liquid Properties
    Free Energy Perturbations
    Linear Response and Extended Linear Response Methods
Chapter Two
OPLS All-Atom Model for Amines: Resolution of the Amine Hydration Problem
    Background
    Previous Simulation Studies
    Computational Details
    Results and Discussion
    Conclusion
Chapter 3
Estimation of Binding Affinities for HEPT and Nevirapine Analogs with HIV-1 Reverse
Transcriptase via Monte Carlo Simulations
    Background
    Computational Details
    Results and Discussion
    Conclusion
Chapter 4
Validation of a Model for the Complex of HIV-1 Reverse Transcriptase with Sustiva
through Computation of Resistance Profiles
    Background
    Results and Discussion
    Conclusion
Chapter 5
Docking Aided by Cluster Analysis: Protocol Development and Validation Studies
    Background
    Computational Details
    Results
    Conclusion
Cited References
List of Figures.
Figure 0.1. HIV infection estimates from the World Health Organization and the Joint
United Nations Programme on HIV/AIDS.
Figure 0.2. AIDS and death from AIDS estimates from the Centers for Disease Control
and Prevention.
Figure 1.1. The Metropolis Monte Carlo sampling method. Figure adapted from
reference 7.
Figure 1.2. Thermodynamic cycle used to determine the relative free energy of
hydration (∆∆Ghyd) between two molecules A and B.
Figure 1.3. Thermodynamic cycle used to determine the relative free energy of binding
(∆∆Gb) between two molecules A and B to a protein P.
Figure 1.4. Thermodynamic cycle used to determine the relative fold resistance (∆∆GFR)
between two molecules A and B, a wild-type protein (PWT), and mutant protein
(PMUT).
Figure 2.1. Thermodynamic cycle used to determine the relative free energy of
hydration (∆∆Ghyd) between methylamine and ammonia.
Figure 2.2. Gas-phase interaction energies and enthalpies (kcal/mol) of amines with
potassium ion. Calculated results are from the OPLS-AA force field, and the
experimental enthalpies are from reference 55.
Figure 2.3. N−N radial distribution functions for liquid amines from Monte Carlo
simulations with the OPLS-AA force field. X-ray results for ammonia are at +4 °C
from reference 74. Successive curves are offset 3.0 units along the y-axis.
Figure 2.4. Plots of ∆G (kcal/mol) vs. λ in the gas phase, water, and chloroform from
free energy perturbation calculations with the OPLS-AA force field: methylamine →
ammonia.
Figure 2.5. Plots of ∆G (kcal/mol) vs. λ in the gas phase, water, and chloroform from
free energy perturbation calculations with the OPLS-AA force field: dimethylamine →
methylamine.
Figure 2.6. Plots of ∆G (kcal/mol) vs. λ in the gas phase, water, and chloroform from
free energy perturbation calculations with the OPLS-AA force field: trimethylamine →
dimethylamine.
Figure 2.7. N−HW (amine N−water H) radial distribution functions in TIP4P water from
MC simulations with the OPLS-AA force field.
Figure 2.8. H(N)−OW (amino H−water O) radial distribution functions in TIP4P water
from MC simulations with the OPLS-AA force field.
Figure 2.9. Solute−solvent (amine−water) energy pair distributions from MC
simulations with the OPLS-AA force field. The y-axis records the number of water
molecules per kcal/mol, which interact with the amine solute with the interaction
energy given on the x-axis.
Figure 3.1. Cartoon representation of an HIV particle. Reverse transcriptase (RT)
converts viral RNA to viral DNA for subsequent incorporation into the host cell
genome.
Figure 3.2. Schematic diagram showing the different binding sites for nucleoside
(NRTI) and non-nucleoside (NNRTI) HIV reverse transcriptase (HIVRT) inhibitors.
The apo coordinates in green on the left are from reference 86. The NRTI/HIVRT
complex in cyan (top) showing the NRTI binding site in red and the viral nucleic
acid site in magenta is from reference 87. The NNRTI/HIVRT complex in cyan
(bottom) showing the NNRTI binding site in red is from reference 88.
Figure 3.3. Schematic representation of a binding event showing different environments
for HIVRT inhibitors. Small arrows depict potential interactions of a drug with
water (unbound state) or water and protein (bound state).
Figure 3.4. HIVRT binding site model surrounded by a 22 Å cap of water. Blue
residues sampled in the MC simulations, red residues rigid, green residues not used.
Crystal structure coordinates, pdb entry 1rt1, from reference 88.
Figure 3.5. No steric clash is observed between HIVRT side-chain Tyr181A and the i-Pr
group of MKC−442 in the modeled structure using the “down” conformation, which
is only reported for the parent HEPT.
Figure 3.6. Experimental conformation of Tyr181A for 16 HIVRT non-nucleoside
inhibitor complexes: nevirapine (green), HEPT (magenta), BHAP (grey), α−APA
(red), TIBO (yellow), and carboxanilide (cyan) analogs. The complexes were
aligned by minimizing the rmsd between Cα carbons at residues Leu100A,
Lys103A, Tyr181A, and Val106A. See text for pdb references.
Figure 3.7. Annealing protocol showing heating, equilibration, and averaging portions
used in the MC simulations for the unbound inhibitors.
Figure 3.8. Convergence of the inhibitor-water Coulombic energy for the HEPT data set
after 10 million (1 cycle) and 50 million (5 cycles) configurations of averaging using
the annealing protocol. Each inhibitor was simulated twice starting from one of two
different conformations obtained from a minimization in either the 1rt1 or 1rti
crystal structure.
Figure 3.9. Predicted binding affinities (∆Gcalcd) using eq 3.3 vs. experimental activities
(∆Gexptl) for 20 HEPT and 20 nevirapine analogs with HIVRT.
Figure 3.10. Plot of ∆G (kcal/mol) vs. λ for the perturbation of N,N-dimethylacetamide
to N-methylacetamide. The non-bonded parameters and geometries were scaled
using the coupling coordinate λ.
Figure 3.11. Two water molecules (orange) are displaced by compound H07 (green, Et
analog) that are observed in simulations of compound H08 (magenta, Me analog)
with HIVRT.
Figure 3.12. Top – computed snapshots of Nevirapine (N10) and N-methyl Nevirapine
(N11) with Tyr188A from the MC simulations. Bottom – optimized structures of
model 2° and 3° amides, N-methylacetamide and N,N-dimethylacetamide, with
benzene. The net interaction energy is shown along with the shortest distances to
aromatic carbons.
Figure 3.13. A water-mediated hydrogen bond is consistently observed between N01 (Et
analog) and Lys101A that is not observed in the MC simulations of N13 (t-Bu
analog) with HIVRT.
Figure 4.1. Docking validation results. Crystal (red) vs. docked (green) structure in the
NNRTI binding site. Nevirapine (pdb entry 1vrt), MKC-442 (pdb entry 1rt1), HEPT
(pdb entry 1rti), and 9-Cl TIBO (pdb entry 1rev). Each compound was initially
positioned outside of the binding site.
Figure 4.2. Orientation of the four NNRTIs in the HIVRT binding site. (A) Best docked
structure of Sustiva. (B) Nevirapine from pdb entry 1vrt. (C) MKC-442 from pdb
entry 1rt1. (D) 9-Cl TIBO from pdb entry 1rev.
Figure 4.3. Left − butterfly shapes adopted by Sustiva (red) and nevirapine (green).
Right − the same overlay in CPK colors.
Figure 4.4. Top − overlays of the binding-site positions of nevirapine, MKC-442, and
9-Cl TIBO (red) with Sustiva (green). Bottom − the same overlays in CPK colors.
Figure 4.5. Predicted vs. experimental binding mode for Sustiva (rmsd = 0.73 Å). Cα
carbons aligned at Leu100, Lys101, Val106, Tyr181, and Tyr188. Experimental
structure from reference 135.
Figure 4.6. Thermodynamic cycle used to compute relative fold resistance values. In
this example the wild-type side-chain Tyr (magenta) is perturbed to the mutant side
chain Cys in the presence of Drug A (solid red) and Drug B (checkered red) while
bound to a protein (green). Relative fold resistance (∆∆G) = ∆GB – ∆GA = ∆GMUT –
∆GWT.
Figure 4.7. Principal point mutations that confer resistance to non-nucleoside HIV-1 RT
inhibitors. The protein is shown as a ribbon trace in green, the mutation sites in red,
and the non-nucleoside binding site in blue. Crystal structure coordinates, pdb entry
1rt1, from reference 88.
Figure 5.1. Clustering protocol for reducing the number of conformers generated from
conformational searches using rmsd geometric similarity.
Figure 5.2. Three lowest energy solutions from rigid docking calculations for trypsin
system 1PPH. The experimental binding mode is shown in magenta and three
docking solutions are shown in green.
Figure 5.3. Number of correctly docked structures shown in green from 10 block runs of
1000 Tabu cycles each.
Figure 5.4. Example of a shallow and solvent exposed binding site vs. an enclosed
buried binding site.
Figure 5.5. Predicted (green) vs. experimental (red) binding mode for ligand 1APB
before the ligand was subjected to a conformational search. Rmsd = 3.2 Å.
Figure 5.6. Conformational search results for unbound ligand 1APB. The conformers
are overlaid to emphasize the 11 different hydroxyl group rotamers.
Figure 5.7. Lowest energy complex obtained for system 1APB after docking using the
11 conformers obtained from the conformational search. The heavy atom rmsd is
0.67 Å from the crystal structure shown in green.
Figure 5.8. Crystal structure conformation (spoke representation) overlaid with best
match conformer (ball and stick representation) from the conformational searches
for ligands 1AE8, 1AJV, 1BMM, and 1DWC.
Figure 5.9. Crystal structure conformation (spoke representation) overlaid with best
match conformer (ball and stick representation) from the conformational searches
for ligands 1GNO, 1HDT, 1HPV, and 1HSG.
Figure 5.10. A histogram representation of how similarity values affect the number of
clusters for the 26 most flexible ligands.
Figure 5.11. A visual representation of clustering. The first 4 clusters are shown for
ligand 1HPX and were obtained using a rmsd similarity value of 2.0 Å.
Figure 5.12. Representative cluster survivors (ball and stick representation) overlaid
with crystal structure conformation (spoke representation).
List of Tables.
Table 2.1. Previously Calculated Relative Free Energies of Hydration (kcal/mol) for
Amines.
Table 2.2. OPLS-AA Bond Stretching Parameters.
Table 2.3. OPLS-AA Angle Bending Parameters.
Table 2.4. OPLS-AA Fourier Coefficients (kcal/mol).
Table 2.5. OPLS-AA Non-Bonded Parameters.
Table 2.6. Comparison of Hydrogen-Bond Interaction Energies (kcal/mol) for Amines.
Table 2.7. Computed Densities and Heats of Vaporization from Pure Liquid
Simulations.
Table 2.8. Relative Free Energies (kcal/mol) of Hydration (water), Solvation
(chloroform), and Transfer (water → chloroform), and ∆log P for Amines at 25 °C.
Table 2.9. Linear Response Components (kcal/mol) for Amines in Water.
Table 3.1. Inhibition of HIV-1 RT by HEPT Analogs.
Table 3.2. Inhibition of HIV-1 RT by Nevirapine Analogs.
Table 3.3. Individual Contributions to the Total Computed Free Energies of Binding for
HEPT Analogs with HIV-1 RT.
Table 3.4. Individual Contributions to the Total Computed Free Energies of Binding for
Nevirapine Analogs with HIV-1 RT.
Table 4.1. Relative Free Energies of Binding (∆GFR) Estimated from Fold Resistance
(FR) Values.
Table 4.2. Relative Fold Resistance Energies (∆∆G) in kcal/mol for HIV-1 RT
Mutations Normalized to Sustiva.
Table 5.1. Protein-ligand Complexes Used in this Study.
Table 5.2. The Percent of Structures Correctly Docked using the Ligand Crystal
Structure Conformation.
Table 5.3. Intermolecular Energies and rmsd Results from Rigid Docking Calculations
for Ligands 1AE8 and 1AAQ.
Table 5.4. Average CPU Timings for System 1AJV.
Table 5.5. Energy Difference Between the Bound-like Conformer and the Lowest
Energy Conformer Found in the Conformational Searches for Eight Different
Ligands.
Table 5.6. Cluster Analysis Results. Each Column Tabulates the Number of Rotatable
bonds (Nrot), the Number of Conformers (Nconf) found in the Limited
Conformational Search, and Number of Clusters for 10 different rmsd Similarity
Tolerance Values.
Table 5.7. The Number of Cluster Representatives with an rmsd <= 2.0 Å from the
Ligand Crystal Conformation. Five Cluster Tolerances are Shown.
Preface.
The number of people now infected with the human immunodeficiency virus
(HIV), the etiological agent that causes acquired immunodeficiency syndrome (AIDS), is
50% higher than what was predicted only a decade ago by the Joint United Nations
Programme on HIV/AIDS (UNAIDS) and the World Health Organization (WHO).1 Sub-Saharan
Africa is so disproportionately affected by HIV/AIDS that it is difficult for those
of us in less affected areas to comprehend the magnitude of the epidemic (Figure 0.1).
Although the huge populations of India and China have so far experienced minimal HIV
transmission, recent statistics indicate an exponential growth of HIV infection in the
Russian Federation; complacency towards HIV is a continued risk for all nations.2 It
should be noted that most HIV infections worldwide are transmitted through heterosexual
sex or through intravenous drug use.
Figure 0.1. HIV infection estimates from the World Health Organization and the Joint
United Nations Programme on HIV/AIDS.
Retroviruses like HIV have evolved to exist as a swarm of virions in which some
viral proteins have amino acid sequences that differ slightly (mutations) from those of the
largest population (wild-type) group.3, 4 Because of the variable nature of certain antigenic HIV
coat proteins, the immune response is unable to clear all HIV particles from the
bloodstream (passive evasion).5 HIV can also escape immune surveillance by directly
targeting, infecting, and killing immune response cells (active evasion). AIDS can result
when too many immune cells have been destroyed and opportunistic infections take hold.
Despite these setbacks, substantial progress has been made in reducing the
amount of measurable HIV present in an infected individual. The declining death rates
from HIV/AIDS in the United States (Figure 0.2) and other developed countries can be
attributed in part to aggressive anti-retroviral chemotherapies targeting two proteins
essential for completion of the viral life cycle. HIV reverse transcriptase (HIVRT) is
responsible for copying the viral RNA genome so the virus can replicate, and HIV
protease (HIVPR) processes immature protein strands into complete viral proteins.
Unfortunately, since genetic mutations affect all HIV enzymes, a compound designed to
inhibit wild-type HIVRT, for example, is a less effective inhibitor of mutant HIVRT.
The end result is that the virus is never completely eliminated from the body. To date,
anti-retroviral compounds targeting HIV represent treatment options for postponing
AIDS and are not a cure.
Figure 0.2. AIDS and death from AIDS estimates from the Centers for Disease Control
and Prevention.
In the United States and elsewhere, government and private funding of basic
research on HIV/AIDS has made HIV (and the associated viral
proteins) the most examined disease-causing virus to date. This has resulted in an
abundance of structural information about HIV that can be used as a starting point for
structure based methods towards the design of improved anti-HIV drugs. In this thesis,
computational advances in structure based drug design, with emphasis on the
development of protocols and methodology, are presented with applications to HIV-1
reverse transcriptase.
Chapter One
Structure Based Methods for Computational Drug Design
Introduction.
Structure based methods which include computational chemistry are at the
forefront of modern-day rational drug design. The modeling of biological systems at the
atomic level can yield thermodynamic and structural information that complements
experimental methods. If, for example, a better physical understanding of how drugs
interact with their targets can emerge from such studies, it is hoped that more effective
chemotherapeutics can be designed. The computational chemist and molecular modeler
often wants to understand why a given drug binds better to its target than does another,
and the prediction of binding affinity is of particular importance. Although binding
affinity is only part of the process in drug discovery, strong binding to the therapeutic
target is important for any drug candidate. Increasing the affinity of a compound for its
target may lead to a reduced dose size that may in turn lower toxicity/side effect
problems associated with all drugs.
The Cartesian coordinates used as a starting point for structure
based computer simulations are usually obtained from X-ray crystallography or nuclear
magnetic resonance (NMR) experiments. However, structure prediction methods can be
used to compute the binding mode of a novel compound using a receptor of known
structure (docking), or to generate a model of a target of unknown structure using
information from related proteins (homology modeling). The drug target is often a
protein for which some therapeutic benefit will result if the normal enzymatic function of
the protein can be reduced, i.e., inhibited. Although the targets themselves may be quite
large, many enzyme inhibitors are small organic molecules.
Molecular mechanics is the technique most often used by
computational/theoretical chemists to model biological systems in the condensed phase
(includes solvent). This thesis describes ongoing advancements in molecular mechanics
force-field parameterization (Chapter Two), in protocol and simulation method
development for modeling protein−ligand binding (Chapter Three), in structure
prediction as well as determination of binding affinity differences for inhibitors with
mutant proteins (Chapter Four), and in docking (Chapter Five). The simulation methods and
protocols described in this thesis are completely general and may be applied to any
protein−ligand system, provided that some initial structural information about the drug
target is known and that the binding of the ligand to the protein is non-covalent.
Monte Carlo and Molecular Dynamics Methods.
Monte Carlo. Most of the simulation results in this thesis have been obtained
via molecular mechanics simulations that employ Monte Carlo (MC) statistical
mechanics sampling methods as first introduced by Metropolis and coworkers.6
Metropolis et al. devised a scheme in which thermodynamic averages of desirable
properties could be computed by focusing only on choosing configurations of a system
which will have a Boltzmann weighted distribution of energies rather than trying to
evaluate all possible states of a system. New configurations of a state can be obtained,
for example, by varying the internal degrees of freedom such as bond lengths, bond
angles, or dihedral angles, or through rigid body rotations and translations or volume
changes. During the course of the simulation, if a new configuration is generated that has
an energy lower than that of the previously evaluated configuration (∆E < 0), the MC move is
always accepted (Figure 1.1). If the new configuration has an energy greater than that of the
previous configuration (∆E > 0), the move is accepted with the Boltzmann probability
exp(−∆E/RT). This is achieved by generating a random number ξ (between 0 and 1) and
accepting uphill moves with ξ < exp(−∆E/RT) but rejecting moves with ξ >
exp(−∆E/RT).
Figure 1.1. The Metropolis Monte Carlo sampling method. Figure adapted from
reference 7.
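The acceptance test described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the BOSS implementation: the function names, the toy one-dimensional energy E(x) = k x^2, the step size, and the default temperature are all assumptions introduced here.

```python
import math
import random

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def metropolis_accept(delta_e, temperature, rng=random):
    """Return True if a trial move with energy change delta_e (kcal/mol) is accepted."""
    if delta_e <= 0.0:
        return True  # downhill (or neutral) moves are always accepted
    boltzmann = math.exp(-delta_e / (R_KCAL * temperature))
    return rng.random() < boltzmann  # uphill: accept with probability exp(-dE/RT)

def sample_harmonic(n_steps, k=1.0, temperature=298.15, max_step=1.0, seed=0):
    """Toy 1-D Metropolis walk on E(x) = k x^2 (hypothetical test system)."""
    rng = random.Random(seed)
    x, energy = 0.0, 0.0
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-max_step, max_step)  # trial displacement
        e_new = k * x_new * x_new
        if metropolis_accept(e_new - energy, temperature, rng):
            x, energy = x_new, e_new
        samples.append(x)  # a rejected move recounts the old configuration
    return samples
```

Note that a rejected move still contributes the previous configuration to the averages, which is what keeps the sampled distribution Boltzmann weighted.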
The algorithm forces the chain of configurations that are generated to be low in
energy. The lower energy states of a system are usually the most important since they are
expected to be the most populated and will contribute the most towards any average
thermodynamic property. Statistical uncertainties (±1σ) in the computed properties are
obtained through the batch means procedure (eq 1.1), where m is the number of batches,
θi is the average of property θ in the i-th batch, and θ̄ is the average over all batches.8
$$\sigma^2 = \sum_{i}^{m} \frac{(\theta_i - \bar{\theta})^2}{m(m-1)} \qquad (1.1)$$
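As a minimal sketch of the batch means procedure in eq 1.1, the following Python function takes the per-batch averages and returns the overall mean and its ±1σ uncertainty. The function name and the input format (a plain list of batch averages) are assumptions made for illustration.

```python
import math

def batch_means(batch_averages):
    """Return (mean, sigma) from per-batch averages using eq 1.1."""
    m = len(batch_averages)
    if m < 2:
        raise ValueError("need at least two batches")
    mean = sum(batch_averages) / m
    # sigma^2 = sum_i (theta_i - mean)^2 / [m (m - 1)]
    variance = sum((theta - mean) ** 2 for theta in batch_averages) / (m * (m - 1))
    return mean, math.sqrt(variance)
```

For example, `batch_means([1.0, 2.0, 3.0])` gives a mean of 2.0 and a σ of about 0.577.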
Molecular Dynamics. Molecular dynamics (MD) methods rely on Newton's
equations of motion, which relate how forces influence position and velocity over time
according to eq 1.2.7 Here, Fi is a force acting on a particle i of mass mi with acceleration
ai.
$$F_i = m_i a_i \qquad (1.2)$$
Force is the negative gradient of the potential energy (Etot), acceleration is the second
derivative of the atomic position with respect to time (t), and velocity (v) is the first
derivative of the atomic position with respect to time. From these relationships the key
differential equation to be solved for MD methods is shown for one particle i along one
coordinate x, eq 1.3.
$$-\nabla_i E_{\mathrm{tot}}(\vec{r}) = m_i \frac{d^2 \vec{r}_i(t)}{dt^2} \qquad (1.3)$$
The Verlet algorithm is the most widely used method to compute the coordinates for a
new time step, although other methods exist for integrating eq 1.3.7 The Verlet method
uses the current set of coordinates and accelerations at time t and the previous set of
coordinates at time (t − δt) to compute the new coordinates (r) at a new time (t + δt) as
shown in eq 1.4.9
$$r(t + \delta t) = 2r(t) - r(t - \delta t) + \delta t^2\, a(t) \qquad (1.4)$$
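The Verlet recurrence in eq 1.4 can be illustrated for a single coordinate as follows. This is a sketch under stated assumptions: the function name, the use of a Taylor expansion to generate the missing position at t − δt for the first step, and the caller-supplied acceleration function are all choices made here, not details taken from any particular MD program.

```python
def verlet_trajectory(r0, v0, accel, dt, n_steps):
    """Integrate one coordinate with the Verlet recurrence of eq 1.4."""
    # Eq 1.4 needs r(t - dt); bootstrap it from a Taylor expansion about t = 0.
    r_prev = r0 - v0 * dt + 0.5 * accel(r0) * dt * dt
    r_curr = r0
    positions = [r_curr]
    for _ in range(n_steps):
        # r(t + dt) = 2 r(t) - r(t - dt) + dt^2 a(t)
        r_next = 2.0 * r_curr - r_prev + dt * dt * accel(r_curr)
        r_prev, r_curr = r_curr, r_next
        positions.append(r_curr)
    return positions
```

With a unit-mass harmonic oscillator, `verlet_trajectory(1.0, 0.0, lambda x: -x, 0.01, 628)` returns close to the starting position after roughly one period, which is a quick sanity check on the time step.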
It should be noted that time averaged properties obtained via an MD simulation
should be the same (within standard error) as ensemble averaged properties from a MC
simulation (the ergodic principle), provided each simulation utilized the same potential
energy function and all results have fully converged.7
Potential Energy.
Regardless of the simulation method (MC or MD) classical potential energy
expressions (force fields) are normally used to evaluate the total energy of the system.10
The most standard form of the function consists of harmonic bond-stretching and angle-
bending terms, a truncated Fourier series for torsional energetics, and Coulomb and
Lennard-Jones terms for the nonbonded interactions, eqs 1.5−1.8.11
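$$E_{\mathrm{bonds}} = \sum_i k_{r,i} (r_i - r_{0,i})^2 \qquad (1.5)$$
$$E_{\mathrm{angles}} = \sum_i k_{\vartheta,i} (\vartheta_i - \vartheta_{0,i})^2 \qquad (1.6)$$
$$E_{\mathrm{torsion}} = \sum_i \left[ \tfrac{1}{2} V_{1,i} (1 + \cos\varphi_i) + \tfrac{1}{2} V_{2,i} (1 - \cos 2\varphi_i) + \tfrac{1}{2} V_{3,i} (1 + \cos 3\varphi_i) \right] \qquad (1.7)$$
$$E_{\mathrm{nonbond}} = \sum_i \sum_{j>i} \left\{ \frac{q_i q_j e^2}{r_{ij}} + 4\varepsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right] \right\} \qquad (1.8)$$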
The parameters are the force constants k, the reference values r0 and ϑ0, the Fourier
coefficients V, the partial atomic charges q, and the Lennard-Jones radii and well-depths,
σ and ε. Standard combining rules are used such that σij = (σiiσjj)^(1/2) and εij =
(εiiεjj)^(1/2).11 The non-bonded interactions are evaluated intermolecularly and for
intramolecular atom pairs separated by three or more bonds. The 1,4-intramolecular
interactions are reduced by a factor of 2 in order to use the same parameters for both
intra- and intermolecular interactions.11
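The nonbonded term for a single atom pair, including the geometric combining rules and the 1,4 scaling, can be sketched as below. The function and argument names are illustrative assumptions, and the constant 332.06 is the commonly used conversion factor that expresses e²/r in kcal/mol when charges are in electron units and distances in Å.

```python
import math

COULOMB_CONST = 332.06  # converts q_i q_j / r (e^2/Å) to kcal/mol

def pair_energy(q_i, q_j, sigma_i, sigma_j, eps_i, eps_j, r_ij, is_14=False):
    """Coulomb plus Lennard-Jones energy (kcal/mol) for one atom pair, eq 1.8."""
    sigma_ij = math.sqrt(sigma_i * sigma_j)  # geometric-mean combining rule
    eps_ij = math.sqrt(eps_i * eps_j)
    coulomb = COULOMB_CONST * q_i * q_j / r_ij
    sr6 = (sigma_ij / r_ij) ** 6
    lennard_jones = 4.0 * eps_ij * (sr6 * sr6 - sr6)
    energy = coulomb + lennard_jones
    # 1,4-intramolecular pairs are reduced by a factor of 2
    return 0.5 * energy if is_14 else energy
```

For an uncharged pair, the Lennard-Jones part is zero at r = σ and reaches its minimum of −ε at r = 2^(1/6) σ, which is a convenient check on any implementation.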
The accuracy of molecular modeling results is primarily influenced by the quality
of the force field parameters used to evaluate the total energy of the system. This fact
underlies the philosophy of the parameterization of the OPLS (optimized potentials for
liquid simulations) force fields, which recognizes the necessity of computing condensed-
phase properties in the development of force fields for use in condensed-phase
simulations.11
Pure Liquid Properties.
Frequently, pure liquid simulation results are used to guide force-field
parameterization. The parameters are adjusted to achieve maximal agreement with
experiment. This helps to ensure that the simulation results are accurate and that
interpretations based on the results are more meaningful. The density and heat of
vaporization are the two key thermodynamic properties that can be readily computed
from the simulation results and compared with experiment. The density is computed
from the molecular weight of the compound and the average molecular volume. The
molecular volume is obtained by dividing the average volume of the simulation cell by the
number of molecules used in the pure liquid simulation. The heat of vaporization for
flexible molecules requires a separate gas-phase simulation in addition to the pure liquid
simulation and can be computed from the simulation results using eq 1.9.
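$$\Delta H_{\mathrm{vap}} = \Delta H_{\mathrm{gas}} - \Delta H_{\mathrm{liquid}} = E_{\mathrm{intra}}(\mathrm{gas}) - E_{\mathrm{tot}}(\mathrm{liquid}) + RT \qquad (1.9)$$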
Here, Eintra(gas) is the average intramolecular energy in the gas-phase, and Etot(liquid) is
the total potential energy of the liquid consisting of both the average intramolecular
energy of the liquid Eintra(liquid) and the average intermolecular energy of the liquid
Einter(liquid). The PV-work term in the enthalpy is equal to RT for the ideal gas and it is
negligible for the liquid.
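The two pure liquid properties can be computed from simulation averages roughly as follows. This is a sketch with assumed names and units: energies are taken as kcal/mol per molecule, the cell volume is in Å³, and the factor 0.602214 is Avogadro's number multiplied by the 10⁻²⁴ cm³ in one Å³.

```python
R_KCAL = 1.987204e-3    # gas constant, kcal/(mol K)
AVOGADRO_A3 = 0.602214  # Avogadro's number times 1e-24 cm^3 per Å^3

def liquid_density(mol_weight, avg_cell_volume_a3, n_molecules):
    """Density in g/cm^3 from molecular weight (g/mol) and average cell volume (Å^3)."""
    volume_per_molecule = avg_cell_volume_a3 / n_molecules
    return mol_weight / (AVOGADRO_A3 * volume_per_molecule)

def heat_of_vaporization(e_intra_gas, e_tot_liquid, temperature):
    """Eq 1.9: E_intra(gas) - E_tot(liquid) + RT, energies per molecule in kcal/mol."""
    return e_intra_gas - e_tot_liquid + R_KCAL * temperature
```

At 25 °C the RT term contributes about 0.59 kcal/mol, so it is not negligible next to typical heats of vaporization of a few kcal/mol.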
Pure liquids are usually simulated in the NPT ensemble (constant number of
particles, pressure, and temperature) which most closely approximates normal
experimental conditions, with periodic boundary conditions as introduced by
Metropolis et al.6 Simulation results for 17 amine compounds (aliphatic, cyclic, and
aromatic molecules) using cubic cells of 267 molecules each are presented in Chapter
Two. A recent review detailing MC simulations for pure liquids has been published.8
Free Energy Perturbations.
Free energy perturbation (FEP) methodology as first introduced by Zwanzig12 is
generally regarded as the most accurate method for computing the free energy changes that
underlie a variety of thermodynamic properties. The free energy change between two systems A and
B is computed according to eq 1.10, where kb is Boltzmann's constant, T is the
temperature, E is the total potential energy for the full system with A or B, and the
averaging is performed for system A.12
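$$\Delta G(\mathrm{A} \rightarrow \mathrm{B}) = G_{\mathrm{B}} - G_{\mathrm{A}} = -k_b T \ln \left\langle \exp\left[ -(E_{\mathrm{B}} - E_{\mathrm{A}}) / k_b T \right] \right\rangle_{\mathrm{A}} \qquad (1.10)$$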
Although the Zwanzig equation is exact, in practice the perturbation must be small or
convergence of the free energy is slow. Convergence of eq 1.10 is usually promoted in
two ways: (1) molecules A and B are usually quite similar, i.e., related analogs that differ
minimally in functionality, and (2) a coupling parameter λ is introduced to allow gradual
interconversion of the potential functions and geometries, ζ, of A and B (eq 1.11).
$$\zeta(\lambda) = \lambda \zeta_{\mathrm{B}} + (1 - \lambda) \zeta_{\mathrm{A}} \qquad (1.11)$$
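Eqs 1.10 and 1.11 translate directly into code. The sketch below assumes that the energy differences E_B − E_A have already been collected as a list while sampling system A; the function names are hypothetical, and energies are taken in kcal/mol so that the molar gas constant plays the role of kb.

```python
import math

K_B = 1.987204e-3  # Boltzmann constant on a molar basis, kcal/(mol K)

def zwanzig_delta_g(delta_e_samples, temperature):
    """Eq 1.10: free energy change from samples of E_B - E_A gathered while sampling A."""
    kt = K_B * temperature
    boltz = [math.exp(-de / kt) for de in delta_e_samples]
    return -kt * math.log(sum(boltz) / len(boltz))

def mix(zeta_a, zeta_b, lam):
    """Eq 1.11: linear interpolation of a parameter or geometric variable between A and B."""
    return lam * zeta_b + (1.0 - lam) * zeta_a
```

Because the average is exponential, a few configurations with very favorable E_B − E_A dominate the sum, which is why small perturbations between similar states converge much faster than large ones.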
Several incremental mutations are performed between λ = 0 (A) and λ = 1 (B). A
typical ∆λ is ±0.05, which requires 10 separate simulations (windows) for the full
mutation using double-wide sampling.13 Frequently however, smaller ∆λ values are used
near the end points of the mutations, where the free energy changes are often largest or
noisiest. Plots of ∆G vs. λ can be monitored to assess the convergence of the FEP by
looking for a smooth free energy profile that changes little with increased averaging.
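The window count quoted above follows from double-wide sampling: each simulation at a reference λ is perturbed to both λ − 0.05 and λ + 0.05, so ten reference points cover the interval from 0 to 1. A sketch of this bookkeeping is given below, assuming evenly spaced windows; the function names and the tuple layout are illustrative only.

```python
def double_wide_windows(n_windows=10):
    """Return (lambda_ref, lambda_minus, lambda_plus) for each double-wide window."""
    half_width = 1.0 / (2 * n_windows)  # 0.05 for ten windows
    windows = []
    for i in range(n_windows):
        ref = (2 * i + 1) * half_width  # 0.05, 0.15, ..., 0.95
        windows.append((ref, ref - half_width, ref + half_width))
    return windows

def total_delta_g(per_window):
    """Combine per-window results given as (dG_ref_to_minus, dG_ref_to_plus) pairs."""
    # Going from A to B means increasing lambda, so the ref -> minus
    # perturbation enters the total with a minus sign.
    return sum(dg_plus - dg_minus for dg_minus, dg_plus in per_window)
```

Unevenly spaced windows, such as the smaller ∆λ values sometimes used near the end points, would need an explicit list of reference points instead of the even spacing assumed here.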
In practice, a relative rather than absolute free energy is most often computed due
to convergence and standard state issues. Since free energy is a state function, and by
definition path independent, a thermodynamic cycle can be constructed for many types of
thermodynamic quantities, which allows for a comparison between theory and experiment,
as presented below.
Hydration Free Energy.
The relative free energy of hydration between two molecules A and B can be
determined from the thermodynamic cycle in Figure 1.2.13, 14 In Figure 1.2, ∆Ggas is
evaluated here through a Monte Carlo/Free Energy Perturbation simulation by mutation
of A to B in isolation, and ∆Gwater is obtained by an equivalent mutation in the presence
of explicit water molecules. Note that ∆Ghyd(A) and ∆Ghyd(B) are the experimental free energies
of hydration for molecules A and B, which are related to the theoretically determined values
by eq 1.12.
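$$\Delta\Delta G_{\mathrm{hyd}} = \Delta G_{\mathrm{water}} - \Delta G_{\mathrm{gas}} = \Delta G_{\mathrm{hyd}}(\mathrm{B}) - \Delta G_{\mathrm{hyd}}(\mathrm{A}) \qquad (1.12)$$
Figure 1.2. Thermodynamic cycle used to determine the relative free energy of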
hydration (∆∆Ghyd) between two molecules A and B.
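The relation in eq 1.12 follows from the fact that the free energy changes around a closed cycle sum to zero. As a short worked derivation (assuming the usual layout of the cycle, with the gas-phase mutation and the aqueous mutation as the A to B legs and the two hydration processes as the gas to water legs):

```latex
\begin{align*}
% Traverse A(gas) -> B(gas) -> B(aq) -> A(aq) -> A(gas); the sum must vanish.
\Delta G_{\mathrm{gas}} + \Delta G_{\mathrm{hyd}}(\mathrm{B})
  - \Delta G_{\mathrm{water}} - \Delta G_{\mathrm{hyd}}(\mathrm{A}) &= 0 \\
\Delta G_{\mathrm{hyd}}(\mathrm{B}) - \Delta G_{\mathrm{hyd}}(\mathrm{A})
  &= \Delta G_{\mathrm{water}} - \Delta G_{\mathrm{gas}}
  = \Delta\Delta G_{\mathrm{hyd}}
\end{align*}
```

The same argument, with the unbound and bound mutations in place of the gas-phase and aqueous ones, gives the relative binding free energy.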
Binding Free Energy.
A thermodynamic cycle can be constructed to determine the relative free energy
of binding (∆∆Gb) between two ligands A and B as shown in Figure 1.3. Here, ligand A
is converted to B free in solution (unbound state) to yield ∆Gunbound(A→B) and
complexed with the protein (bound state) to give ∆Gbound(A→B). Each ligand will have
an affinity for the protein P which is reflected in the experimental free energy of binding
∆Gb(A and B) values, corresponding to the two horizontal legs of the thermodynamic
cycle. As before, the quantities are related and a theoretical prediction can be related to
experiment via eq 1.13.
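$$\Delta\Delta G_{b}(\mathrm{A} \rightarrow \mathrm{B}) = \Delta G_{\mathrm{bound}}(\mathrm{A} \rightarrow \mathrm{B}) - \Delta G_{\mathrm{unbound}}(\mathrm{A} \rightarrow \mathrm{B}) = \Delta G_{b}(\mathrm{B}) - \Delta G_{b}(\mathrm{A}) \qquad (1.13)$$
Figure 1.3. Thermodynamic cycle used to determine the relative free energy of binding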
(∆∆Gb) between two molecules A and B to a protein P.
Relative Fold Resistance.
A thermodynamic cycle has been devised and used to compute a relative fold
resistance energy as shown in Figure 1.4 which leads to eq 1.14.
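$$\Delta\Delta G(\mathrm{A} \rightarrow \mathrm{B}) = \Delta G_{\mathrm{MUT}} - \Delta G_{\mathrm{WT}} = \Delta G_{\mathrm{B}} - \Delta G_{\mathrm{A}} \qquad (1.14)$$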
Here, fold resistance (FR) is the ratio of mutant (MUT) activity to wild-type (WT)
activity and quantifies the loss in binding affinity for a compound due to a particular
mutation in the target enzyme. FR can be converted to a free energy via ∆G = RT ln FR.
The quantities are related, with ∆∆G being the experimentally observable difference in the
fold resistance values, given by RT ln FRB − RT ln FRA, which is equivalent to the difference
in the simulation results, ∆GMUT − ∆GWT.
Figure 1.4. Thermodynamic cycle used to determine the relative fold resistance (∆∆GFR)
between two molecules A and B, a wild-type protein (PWT), and mutant protein (PMUT).
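The conversion between fold resistance and free energy is a one-line calculation. The sketch below uses assumed function names and an assumed default temperature of 298.15 K; the temperature appropriate for a given assay may differ.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def fold_resistance_to_dg(fr, temperature=298.15):
    """Convert a fold resistance ratio into a free energy (kcal/mol) via dG = RT ln FR."""
    return R_KCAL * temperature * math.log(fr)

def relative_fold_resistance(fr_a, fr_b, temperature=298.15):
    """Experimental ddG(A -> B) = RT ln FR_B - RT ln FR_A."""
    return fold_resistance_to_dg(fr_b, temperature) - fold_resistance_to_dg(fr_a, temperature)
```

At this temperature each tenfold loss in activity corresponds to roughly 1.4 kcal/mol, which gives a feel for the precision required of the simulations.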
Linear Response and Extended Linear Response Methods.
A more approximate method for the estimation of free energies of binding ∆Gb is
based on linear response (LR) theory, as introduced by Åqvist and coworkers (eq 1.15).15
This approach is considerably faster than standard FEP simulations because no
intermediate transformation process is required to calculate the binding affinity.15 Only
the endpoints (states A and B) of the binding free energy thermodynamic cycle are
simulated which typically results in CPU savings by at least a factor of 10.
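$$\Delta G_{b} = \alpha \langle \Delta E_{\mathrm{vdw}} \rangle + \beta \langle \Delta E_{\mathrm{Coul}} \rangle \qquad (1.15)$$
Here, ⟨ ⟩ signifies an ensemble average of the difference (bound − unbound) in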
interaction energies (∆E) of the inhibitor−solvent plus inhibitor−protein interaction
energies in the bound state and of the inhibitor−solvent interaction energies in the
unbound state.15 The two energy terms represent the differences in average van der
Waals (Lennard-Jones) and electrostatic (Coulombic) contributions, respectively, which
are normally calculated using a molecular mechanics force field and either MD or MC
simulations. The Coulombic energy differences were originally scaled by a factor β =
0.50, while the coefficient α was determined by fitting the simulation results to known
experimental binding affinities.15
Jorgensen et al. introduced an extension of the LR approach for the calculations of
free energies of solvation, which corresponds to eq 1.16 for computing free energies of
binding.16, 17 In this extended linear response (ELR) approach, both coefficients, α and
β, are allowed to vary, and a third term representing the solvent accessible surface area
(SASA) of the solute is included, and scaled by a coefficient γ. The rationale for the
SASA term is that it provides a means to account for possible positive free energies of
hydration caused by the penalty for solute cavity formation in water.16, 17
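$$\Delta G_{b} = \alpha \langle \Delta E_{\mathrm{vdw}} \rangle + \beta \langle \Delta E_{\mathrm{Coul}} \rangle + \gamma \langle \Delta \mathrm{SASA} \rangle \qquad (1.16)$$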
Encouraged by prior MD/LR15, 18-22 and MC/ELR23-25 binding studies, we endeavored
to treat larger data sets to see if good correlations to experiment could still be obtained.
Recently, Duffy and Jorgensen have correlated results from aqueous MC simulations
with solvation properties for more than 200 diverse organic compounds.26 The
descriptors were expanded from those in eq 1.16 to include, for example, hydrogen-bond
counts and the hydrophobic, hydrophilic and aromatic components of the solvent-
accessible surface area. A multivariate fitting approach was used which corresponds to
eq 1.17 for computing binding affinities.
$$\Delta G_{b} = \sum_{n} c_n \xi_n + \mathrm{constant} \qquad (1.17)$$
Here, cn represents an optimizable coefficient for the associated descriptor ξn. In
principle, any physically reasonable quantity could be considered as a descriptor.
Specifically relevant to protein-ligand binding was the success in predictions of log P
(octanol/water) for 200 solutes. Only four descriptors were needed to yield a correlation
with r2 = 0.91 and an rms error of 0.53.26 Given the potential parallel between solute
octanol/water partitioning and ligand protein/water partitioning, we also sought to consider
alternative descriptors for protein-ligand binding using a data set comprising 40
non-nucleoside inhibitors of HIV-1 reverse transcriptase, as presented in Chapter 3.
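Evaluating a regression equation of the form of eq 1.17 and its correlation with experiment can be sketched as follows. The function names are hypothetical, and the descriptor values and coefficients must come from the simulations and the fit; none are supplied here. Eq 1.16 is the special case with the three descriptors ⟨∆Evdw⟩, ⟨∆ECoul⟩, and ⟨∆SASA⟩ and a constant of zero.

```python
def predict_dg(descriptors, coefficients, constant):
    """Eq 1.17: dG_b = sum_n c_n * xi_n + constant for one compound."""
    if len(descriptors) != len(coefficients):
        raise ValueError("one coefficient is needed per descriptor")
    return sum(c * xi for c, xi in zip(coefficients, descriptors)) + constant

def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Keeping the number of descriptors small relative to the number of compounds is what makes the resulting correlation meaningful rather than an artifact of overfitting.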
It should be emphasized that the ELR method relies on using experimental data,
in conjunction with a set of descriptors obtained via computer simulations, to derive a
regression equation. However, once a reasonable, cross-validated regression equation is
derived, no additional experimental data is needed in order to make activity predictions
for novel compounds. Simulations for the bound and unbound states are all that is
needed to make activity predictions for any new compound. Ideally, a universal
regression equation (scoring function) may emerge through additional studies.
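One common way to cross-validate such a regression is the leave-one-out procedure, in which each compound is predicted from an equation fitted to all the others. The sketch below is restricted to a single descriptor so that it stays self-contained; it is an illustration of the general idea under that assumption, not necessarily the exact cross-validation protocol used in this work.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_q_squared(xs, ys):
    """Leave-one-out q^2 = 1 - PRESS / SS_tot."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without compound i, then predict it.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot
```

A q2 that is much lower than r2 signals that the fit depends heavily on individual compounds and is unlikely to transfer to new ones.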
Chapter Two
OPLS All-Atom Model for Amines: Resolution of the Amine Hydration Problem
Background.
One particularly notable area where classical force fields have failed is in the
calculation of free energies of hydration for both amines and amides.27, 28 Specifically,
calculated free energies of hydration (∆Ghyd) have not been in agreement with observed
experimental trends for the amine series,29, 30 ammonia, methylamine, dimethylamine,
and trimethylamine, and for the amide series,31 acetamide (ACT), N-methylacetamide
(NMA), and N,N-dimethylacetamide (DMA). Experimentally, these molecules show
counterintuitive hydration behavior with increasing methyl substitution.27, 28 That is,
one might expect that replacement of an amino hydrogen by a seemingly hydrophobic
methyl group would lead to an unfavorable (positive) contribution to the free energy of
hydration. In fact, the experimental data for ammonia and methylamine reveal the
opposite trend with a ∆∆Ghyd of –0.26 kcal/mol.29, 30 Subsequent methylations do
decrease the hydrophilic character with a ∆∆Ghyd (methylamine → dimethylamine) of
+0.27 kcal/mol and a ∆∆Ghyd (dimethylamine → trimethylamine) of +1.06 kcal/mol.
Furthermore, amides exhibit a similar sequence with a favorable relative free energy of
hydration ∆∆Ghyd (ACT → NMA) of –0.40 kcal/mol for the first methylation, and an
unfavorable ∆∆Ghyd (NMA → DMA) of +1.53 kcal/mol for the second methylation.31 A
general consensus does not exist concerning the physical basis of these anomalous
hydration trends.
Previous Simulation Studies.
Given the biological importance of the amide and amine functional groups,
numerous computer simulations have been performed in an effort to study the anomalous
hydration patterns. Focusing on the amines, computational studies have employed
standard classical potential energy functions and polarizable potential functions in MD
simulations with explicit solvent molecules, and self-consistent reaction field (SCRF)
methods.27, 28, 32-37 However, all of the MD and most of the SCRF calculations have
yielded serious discrepancies with the experimental data (Table 2.1).
Early studies by Rao and Singh32 used MD/FEP calculations with an all-atom
AMBER force field to obtain the results for the amine series in Table 2.1, column A.
Although the computed relative free energies obtained for the first and third methylations
are close to the experimental values, the second methylation yielded a ∆∆G of 1.93
kcal/mol, much higher than the experimental result of 0.27 kcal/mol. This study also
suffered from large hysteresis in the computed van der Waals (Lennard-Jones)
component of the free energy change and short simulation times. Kollman and
coworkers also used MD/FEP methods and found significant disagreement between
calculated and experimental values for the amines using both the pairwise-additive
AMBER 4.0 potentials27 and a fully polarizable model33 (Table 2.1, columns B and C).
The simulations consistently revealed increasingly positive ∆∆Gs with increasing methyl
substitution. Likewise, Ding et al.28 used MD/FEP methods to calculate ∆∆Ghyd for the
amine series without and with polarization (Table 2.1, columns D and E). The errors are
again large; although polarization seems to provide some improvement, the error for the
methylamine to dimethylamine transformation is still greater than 2 kcal/mol.
Subsequently, Marten et al.34 tried SCRF calculations with a polarizable
quantum-mechanical solute and a dielectric continuum representation of the solvent.
Despite the more sophisticated treatment of the solute, the computed relative free
energies of hydration obtained were essentially constant at 1.5−1.8 kcal/mol, once again
in significant disagreement with the experimental data (Table 2.1, column F). These
researchers were able to reproduce the observed hydration results only by including a
hydrogen-bond correction term to fit the experimental data.34 Barone et al. have recently
noted the sensitivity of SCRF results to the choice of atomic radii.36 Notably, Marten et
al.34 also reported hydrogen-bond strengths for the amines with a water molecule as both
donor and acceptor using two force fields (OPLS* and AMBER*) and ab initio
LMP2/cc-pVTZ(-f) calculations. The authors concluded that hydrogen-bonding
interactions are improperly modeled by the force fields. In particular, the amines are too
good as hydrogen-bond donors and the nearly constant acceptor strength is not
reproduced with the force fields. It should be noted that OPLS parameters have only
been reported previously for primary amines.11, 14 The OPLS* parameters used in the
MacroModel program and other "OPLS" parameters28 for secondary and tertiary amines
were not developed in our laboratory.
Table 2.1. Previously Calculated Relative Free Energies of Hydration (kcal/mol) for Amines.

                                  A              B              C               D              E               F
perturbation                      FEP^a          FEP^b          polariz. FEP^c  FEP^d          polariz. FEP^d  SCRF GVB^e  exptl^f
ammonia → methylamine             −0.07 ± 0.13   0.62 ± 0.05    0.38 ± 0.06     1.13 ± 0.19    0.3 ± 0.5       1.8         −0.26
methylamine → dimethylamine       1.93 ± 0.08    1.62 ± 0.01    1.32 ± 0.03     3.16 ± 0.25    2.5 ± 0.6       1.8         0.27
dimethylamine → trimethylamine    1.17 ± 0.06    2.34 ± 0.02    2.90 ± 0.09     2.29 ± 0.32    0.6 ± 0.6       1.5         1.06

^a Reference 32. ^b Reference 27. ^c Reference 33. ^d Reference 28. ^e Reference 34. ^f References 29 and 30.
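To make the size of the discrepancies in Table 2.1 concrete, the mean unsigned error of each prior method relative to experiment can be computed directly from the tabulated values. The snippet below is a small illustrative calculation; the variable and function names are arbitrary, and the statistical uncertainties in the table are ignored.

```python
# Experimental values (kcal/mol) for the three successive methylations.
EXPTL = [-0.26, 0.27, 1.06]

# Calculated values from Table 2.1, keyed by column label.
CALCULATED = {
    "A": [-0.07, 1.93, 1.17],
    "B": [0.62, 1.62, 2.34],
    "C": [0.38, 1.32, 2.90],
    "D": [1.13, 3.16, 2.29],
    "E": [0.3, 2.5, 0.6],
    "F": [1.8, 1.8, 1.5],
}

def mean_unsigned_error(calc, expt):
    """Average absolute deviation between calculated and experimental values."""
    return sum(abs(c - e) for c, e in zip(calc, expt)) / len(expt)

MUE = {col: mean_unsigned_error(vals, EXPTL) for col, vals in CALCULATED.items()}
```

Every column has a mean unsigned error above 0.6 kcal/mol, which is large compared with the experimental differences of roughly 0.3 to 1.1 kcal/mol that the calculations are meant to reproduce.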
Because of the success in reproducing experimental free energies of hydration
using FEP methods for numerous organic molecules,38, 39 the discrepancy between
theory and experiment for the amines is troublesome. In addition, the widespread interest
in structure-based drug design necessitates accurate models for amines since they are
very common components in drugs. In this paper, OPLS-AA (all-atom) parameters are
reported for ammonia and for primary, secondary, and tertiary amines. As usual, the
development has considered molecular structures, conformational energetics, hydrogen
bonding, pure liquid properties, and relative free energies of hydration. The number of
new parameters is kept to a minimum. The parameter set was developed for ammonia,
methylamine, dimethylamine, and trimethylamine. Subsequent testing covered a variety
of additional primary, secondary, and tertiary amines including cyclic and aromatic
amines. Simulations in chloroform were also carried out for the four key amines in order
to test the suitability of the parameters in less polar environments. This permitted
computation of relative free energies of transfer and comparison with experimental
partition coefficients, log P.
Computational Details.
Force Field Parameterization.
The standard form of the classical potential energy function used in this study has
been presented in Chapter 1. Bond-stretching and angle-bending parameters were
initially assigned from the OPLS-AA parameter set,11 which includes many entries from
the AMBER all-atom force field.40 Each atom has an associated AMBER atom type that
is used to designate the parameters for atom pairs (bond stretching) or atom triplets (angle
bending). The AMBER atom types used here are NT (amine nitrogen), H (hydrogen on
nitrogen), CT (aliphatic carbon), HC (hydrogen on aliphatic carbon), CA (aromatic
carbon), and HA (hydrogen on aromatic carbon). The present work then focused on the
development of the Fourier coefficients, partial charges, and Lennard-Jones parameters.
Parameterization is an iterative process. First, a Z-matrix was constructed for
each amine, and initial parameters were assigned on the basis of the published values for
primary amines.11 Replacement of amino hydrogens by OPLS-AA methyl groups
yielded trial partial charges for secondary and tertiary amines, and initial parameters for
ammonia were taken from the work of Gao et al.41 Gas-phase energy minimizations
were then performed with the BOSS program42 with the use of these parameters. The
geometries obtained were compared with those from experiments and from ab initio
optimizations at the RHF/6-31G* level. This provided a basis for adjusting the
parameters for bond stretching and angle bending. The ab initio calculations were
performed with Gaussian 95.43 The procedure for determination of missing Fourier
coefficients has been described.11 Briefly, an energy scan was performed for examples
of the missing torsions with RHF/6-31G* calculations. A full geometry optimization was
done at each point with the exception of the chosen dihedral angle. Similarly, the same
energy scans were carried out using the force field with the BOSS program and with the
Fourier coefficients for the missing torsion set to zero. Then, the relative energies from
the scans are used as input to the Simplex-based fitting program, Fitpar,44 to determine
the Fourier coefficients that minimize the differences between the RHF/6-31G* and
force-field results. The initial Fourier coefficients often require refitting when the atomic
charges and Lennard-Jones parameters are subsequently adjusted.
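The quantity minimized in the torsional fitting can be sketched as an objective function over the three Fourier coefficients. This is a simplified illustration, not the Fitpar code: it assumes a single dihedral per scan and adds the torsional term to the zero-torsion force-field energies without reoptimizing the geometry, and all names are hypothetical. Any general-purpose minimizer (Simplex or otherwise) could then be applied to `fit_error`.

```python
import math

def torsion_energy(phi_deg, v1, v2, v3):
    """Eq 1.7 for one dihedral angle (degrees); V coefficients in kcal/mol."""
    phi = math.radians(phi_deg)
    return 0.5 * (v1 * (1.0 + math.cos(phi))
                  + v2 * (1.0 - math.cos(2.0 * phi))
                  + v3 * (1.0 + math.cos(3.0 * phi)))

def fit_error(coeffs, angles_deg, e_abinitio, e_ff_zeroed):
    """Sum of squared differences between relative ab initio and force-field energies.

    e_ff_zeroed holds force-field scan energies computed with the missing
    torsion set to zero; the trial torsional term is added on top of them.
    """
    v1, v2, v3 = coeffs
    e_ff = [e0 + torsion_energy(a, v1, v2, v3) for a, e0 in zip(angles_deg, e_ff_zeroed)]
    ff_min, qm_min = min(e_ff), min(e_abinitio)  # compare relative energies
    return sum(((ef - ff_min) - (eq - qm_min)) ** 2 for ef, eq in zip(e_ff, e_abinitio))
```

Because the nonbonded terms also contribute to the scan energies, any later change to the charges or Lennard-Jones parameters alters `e_ff_zeroed` and therefore the optimal coefficients.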
The observation of Marten et al.34 concerning the flawed representation of
hydrogen bonding of amines with water guided our early assignments of the partial
charges for amines. The charges for H(N), N, and C were adjusted to reproduce the
LMP2 interaction energies for each complex of the four prototypical amines with a
water molecule.34 For comparison, we also computed the corresponding interaction
energies at the RHF/6-31G* level. In each case, all degrees of freedom were optimized.
However, it was necessary to constrain the hydrogen bonds to be linear for the RHF/6-
31G* calculations in which water was the hydrogen-bond acceptor, to avoid
rearrangements.
When satisfactory agreement with molecular structures, torsional energy scans,
and hydrogen-bond strengths was obtained, MC simulations for the four pure liquids
were performed. Some adjustments of the partial charges and Lennard-Jones parameters
were made so that calculated properties for the pure liquid amines agreed well with
experiment. In general, the computed heats of vaporization are most affected by the
choice of partial charges, while densities are particularly sensitive to the Lennard-Jones
radii. Since our efforts were guided by consideration of multiple types of experimental
and ab initio data, the final parameter set reflects a compromise. If satisfactory results
had not been obtained with the OPLS-AA model, we would have considered
augmentation with an extra interaction site in a lone-pair position on nitrogen. This
turned out not to be necessary. We did not expect that explicit polarization would be
needed in view of the prior successes with so many other organic liquids and water.11, 45
Pure Liquid Simulations.
The Metropolis Monte Carlo simulations6 were performed with the BOSS
program on Silicon Graphics workstations or a multiprocessor Pentium cluster running
Linux. All molecules were fully flexible, which necessitates that MC simulations be
performed for both the ideal gas and liquid in order to compute heats of vaporization,
∆Hvap. The calculations were executed in the NPT ensemble at 1 atm and at either the
normal boiling point of the liquid or at 25 °C. Gas-phase simulations consisted of 3
million configurations of equilibration, followed by 3 million configurations of
averaging. For the pure liquids, periodic boundary conditions were employed with cubic
cells of 267 molecules. The equilibrated box sizes ranged from approximately 22 × 22 ×
22 Å for ammonia to 40 × 40 × 40 Å for triethylamine. Intermolecular non-bonded
interactions were truncated at 11 Å, based roughly on the center-of-mass of each
molecule, and quadratically feathered to zero over the last 0.5 Å. For nonaqueous
solvents, a standard correction is made for Lennard-Jones interactions neglected beyond
the cutoff.8 Each liquid was first equilibrated for 12 million configurations and the
averaging occurred over an additional 12 million configurations, which were run in
batches of 500,000 configurations. Overall, the computed densities, heats of
vaporization, radial distribution functions, energy distributions and conformational
properties are very well converged with MC simulations of this length. By adjusting the
allowed ranges for rigid-body rotations, translations, and dihedral angle movement,
acceptance ratios of about 40% for aliphatic amines and 18−20% for cyclic and
aromatic amines were obtained for new configurations. The ranges for bond stretching
and angle bending are set automatically by the BOSS program on the basis of the force
constants and temperature.
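The cutoff treatment described above, in which interactions are smoothly reduced to zero over the last 0.5 Å before an 11 Å cutoff, can be illustrated with a simple switching function. The specific quadratic form used here is an assumption chosen to match the description; the exact expression in the BOSS program may differ, and the function name is hypothetical.

```python
def feather_scale(r, r_cut=11.0, width=0.5):
    """Scale factor for an intermolecular interaction at separation r (Å)."""
    r_low = r_cut - width
    if r <= r_low:
        return 1.0  # full interaction below the feathering region
    if r >= r_cut:
        return 0.0  # interaction neglected beyond the cutoff
    # Quadratic in r: equals 1 at r_low and falls to 0 at r_cut.
    return (r_cut * r_cut - r * r) / (r_cut * r_cut - r_low * r_low)
```

Feathering avoids the discontinuity in energy that an abrupt truncation would introduce when a molecule crosses the cutoff distance.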
It should be noted that more than one set of non-bonded parameters may yield
calculated densities and heats of vaporization in close agreement with experiment. For
ammonia, 25 pure liquid simulations were run using different non-bonded parameter sets.
Six of these yielded a calculated density and heat of vaporization within 3% of the
experimental values. Parameter sets for ammonia were considered further only if they
also yielded reasonable hydrogen-bond energetics with water and a qualitatively correct
free energy of hydration relative to methylamine. Otherwise, free energies of hydration
were not considered in the parameterization.
Free Energy Perturbations.
As an example, the relative free energies of hydration for methylamine and
ammonia can be determined from the thermodynamic cycle in Figure 2.1, which leads to
eq 2.1.13, 14
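$$\Delta\Delta G_{\mathrm{hyd}} = \Delta G_{\mathrm{water}} - \Delta G_{\mathrm{gas}} = \Delta G_{\mathrm{hyd}}(\mathrm{NH_3}) - \Delta G_{\mathrm{hyd}}(\mathrm{CH_3NH_2}) \qquad (2.1)$$
Figure 2.1. Thermodynamic cycle used to determine the relative free energy of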
hydration (∆∆Ghyd) between methylamine and ammonia.
∆Ggas is evaluated here through MC/FEP simulations by mutation of methylamine to
ammonia in isolation, and ∆Gwater is obtained by an equivalent mutation in the presence
of explicit water molecules. Their difference can then be compared to the difference in
experimental free energies of hydration.
All of the present free energy perturbations consisted of mutating a methyl group
to a hydrogen atom. The three methyl hydrogens are mutated to dummy atoms, which
have zero for q, σ, and ε, and the methyl carbon is mutated to the appropriate secondary,
primary, or ammonia hydrogen, H(NT). For these fully flexible systems, we retain the
CT−HC force constants for the H(NT)-dummy pairs, but reduce the r0 to 0.3 Å. For the
angle bending, we retain only one angle to the dummy atom with nonzero parameters.
This combination keeps the dummy atom in a reasonable position without placing any
constraint on the final structure, that is, the same total energy is obtained from an energy
minimization with or without the dummy atom.46
The use of flexible geometries for the solutes requires computation of ∆Ggas in
Figure 2.1. In this case, the MC simulation for each window consisted of 3 million
configurations of equilibration followed by 3 million configurations of averaging. The
ranges for dihedral-angle changes were adjusted so that ca. 40% acceptance for new
configurations was achieved. Convergence was monitored by plotting the results for
∆Ggas vs. λ, which showed little change after 1 million configurations of averaging.
The FEP calculations in water were performed for a single solute in a periodic
cube with 500 TIP4P water molecules.47 Both solute−solvent and solvent−solvent
cutoffs were at 10.0 Å based roughly on the separations of amine nitrogens and water
oxygens. Each window consisted of 6 million configurations of equilibration, followed
by 8 million configurations of averaging. Negligible differences in the computed free
energy changes occurred after 5 million configurations of averaging. As in the
pure liquid simulations, adjustment of the allowed ranges for rigid body rotations,
translations, and dihedral angle movements yielded acceptance rates of 30−50% for new
configurations. The simulation protocol in chloroform was the same except that the
number of chloroform molecules was 267 and the solvent−solvent and solute−solvent
cutoffs were extended to 12.0 Å. The potential function for chloroform is the OPLS
4-site model.14
Results and Discussion.
Force Field Parameters.
The final OPLS-AA parameters for amines are reported in Tables 2.2−2.5. The
bond-stretching and angle-bending parameters (Tables 2.2 and 2.3) are mostly from prior
work.11 Missing combinations of atom types for aromatic amines, for example, the
CA−NT bond-stretching and CA−NT−H, CA−CA−NT, and CA−NT−CT angle-bending
parameters, were extrapolated from related types and adjusted to yield good accord with
RHF/6-31G* optimized geometries. As before,11 the molecular structures from OPLS-
AA optimizations are essentially identical to RHF/6-31G* results; for bond lengths and
bond angles involving nitrogen, the average deviations are 0.01 Å and 1.5°. Furthermore,
the average differences between the computed results and experimental data are 0.02 Å
for bond lengths and 2° for bond angles.
Table 2.2. OPLS-AA Bond Stretching Parameters.
bond kb (kcal mol-1 Å-2) r0 (Å)
H−NT 434.0 1.010
CA−NT 481.0 1.340
CT−NT 382.0 1.448
CA−HA 367.0 1.080
CT−HC 340.0 1.090
CT−CT 268.0 1.529
CA−CA 469.0 1.400
Table 2.3. OPLS-AA Angle Bending Parameters.
angle kθ (kcal mol-1 rad-2) θ0 (deg)
CT−NT−H 35.00 109.50
H−NT−H 43.60 106.40
CA−NT−H 35.00 111.00
CA−CA−NT 70.00 120.10
CA−NT−CT 50.00 109.50
CA−CA−HA 35.00 120.00
CA−CA−CA 63.00 120.00
CT−CT−HC 37.50 110.70
CT−CT−CT 58.35 112.70
HC−CT−HC 33.00 107.80
CT−CT−NT 56.20 109.47
CT−NT−CT 51.80 107.20
HC−CT−NT 35.00 109.50
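To illustrate how the parameters in Tables 2.2 and 2.3 enter the energy expression, the sketch below evaluates harmonic bond-stretching and angle-bending terms for two of the listed entries. It assumes the published OPLS-AA convention E = k(x − x0)² without a factor of 1/2, which is not restated in this section; the dictionary layout and function names are hypothetical.

```python
import math

# (force constant, equilibrium value) copied from Tables 2.2 and 2.3
BOND_PARAMS = {
    ("H", "NT"): (434.0, 1.010),    # kcal/(mol Å^2), Å
    ("CT", "NT"): (382.0, 1.448),
}
ANGLE_PARAMS = {
    ("H", "NT", "H"): (43.60, 106.40),    # kcal/(mol rad^2), degrees
    ("CT", "NT", "H"): (35.00, 109.50),
}

def bond_energy(atom_types, r):
    """Harmonic bond energy in kcal/mol for a bond length r in Å."""
    kb, r0 = BOND_PARAMS[atom_types]
    return kb * (r - r0) ** 2

def angle_energy(atom_types, theta_deg):
    """Harmonic angle energy in kcal/mol; the deviation is converted to radians."""
    k_theta, theta0 = ANGLE_PARAMS[atom_types]
    return k_theta * math.radians(theta_deg - theta0) ** 2
```

For example, `bond_energy(("H", "NT"), 1.020)` gives the cost of stretching an N−H bond by 0.01 Å from its equilibrium length.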
The torsional parameters are listed in Table 2.4. The parameters for primary
amines and hydrocarbons were reported previously and are provided for completeness.11
Additional torsional parameters were developed in this work for the HC−CT−NT−CT and
CT−NT−CT−CT combinations in aliphatic amines and for the CA−CA−NT−H and
CA−CA−NT−CT torsions in anilines. The OPLS-AA parameters reproduce all tested
RHF/6-31G* torsional-energy profiles with an average difference of less than 0.1
kcal/mol for methylamine (HCNH), ethylamine (CCNH, HCCN), propylamine (CCNH,
CCCN), dimethylamine (HCNC), diethylamine (CCNC), trimethylamine (HCNC), and
triethylamine (CCNC).
It was found that cyclic aliphatic amines required unique CT−CT−NT−H and
CT−NT−CT−CT torsional terms in order to obtain close agreement with ab initio results
for equatorial vs. axial disposition of hydrogens or methyl groups on nitrogen in cyclic
amines.
[Scheme: equatorial and axial positions of the substituent R on nitrogen in a cyclic amine]
With the reported parameters, there is reasonable accord among the computed
results; for example, for piperidine and N-methylpyrrolidine the equatorial conformers
are preferred by 0.82 and 2.61 kcal/mol with the force field, 0.82 and 3.68 kcal/mol with
RHF/6-31G*//RHF/6-31G*, and 0.36 and 3.44 kcal/mol with B3LYP/6-31G*//RHF/6-
31G*. For piperidine, higher-level ab initio calculations give values of 0.6−0.9 kcal/mol
and experimental results are 0.2−0.5 kcal/mol.48-50
Table 2.4. OPLS-AA Fourier Coefficients (kcal/mol).
amine type dihedral angle V1 V2 V3
aliphatic HC−CT−NT−H 0.000 0.000 0.400
aliphatic HC−CT−CT−NT −1.013 −0.709 0.473
aliphatic CT−CT−NT−H −0.190 −0.417 0.418
aliphatic CT−CT−CT−NT 2.392 −0.674 0.550
aliphatic CT−NT−CT−CT 0.416 −0.128 0.695
aliphatic HC−CT−NT−CT 0.000 0.000 0.560
aliphatic HC−CT−CT−HC 0.000 0.000 0.318
aliphatic HC−CT−CT−CT 0.000 0.000 0.366
aliphatic CT−CT−CT−CT 1.740 −0.157 0.279
four-member cyclic CT−CT−NT−H 0.000 4.000 0.000
five-member cyclic CT−CT−NT−H 0.200 −0.417 0.418
six-member cyclic CT−CT−NT−H 0.819 −0.417 0.418
exocyclic methyl group CT−NT−CT−CT 1.536 −0.128 0.695
aromatic CA−CA−NT−H 0.000 2.030 0.000
aromatic CA−CA−NT−CT −7.582 3.431 3.198
aromatic (improper) Z−CA−X−Y 0.000 2.200 0.000
aromatic X−CA−CA−Y 0.000 7.250 0.000
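The Fourier coefficients in Table 2.4 can be turned into a torsional energy profile with a few lines of code. The sketch below assumes the usual OPLS-AA three-term Fourier series with all phase angles set to zero; the function name and the tuple layout are illustrative only. Note that a full rotational barrier in a molecule also includes contributions from every other dihedral about the same bond and from non-bonded interactions, so a single term does not by itself reproduce the barriers quoted in the text.

```python
import math

def torsion_energy(phi_deg, v1, v2, v3):
    """Three-term Fourier torsional energy (kcal/mol), phase angles assumed zero."""
    phi = math.radians(phi_deg)
    return (0.5 * v1 * (1.0 + math.cos(phi))
            + 0.5 * v2 * (1.0 - math.cos(2.0 * phi))
            + 0.5 * v3 * (1.0 + math.cos(3.0 * phi)))

# (V1, V2, V3) copied from Table 2.4
CA_CA_NT_H = (0.000, 2.030, 0.000)   # aromatic amine
HC_CT_NT_H = (0.000, 0.000, 0.400)   # aliphatic amine

# Energy of a single CA-CA-NT-H term every 30 degrees from 0 to 180
profile = [(phi, torsion_energy(phi, *CA_CA_NT_H)) for phi in range(0, 181, 30)]
```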
The torsional parameters, which were developed for monosubstituted functional
groups, are also used for polysubstituted cases. This is generally successful, but
N,N-dimethylaniline initially seemed problematic. Nearly exact agreement was
obtained between the OPLS-AA and RHF/6-31G* dihedral-angle energy profiles for both
aniline and N-methylaniline; however, the RHF/6-31G* torsion scan for the tertiary analog yields a
rotational barrier of 0.6 kcal/mol, while the force field gives a barrier of 2.2 kcal/mol.
These values are lower than the barriers of ca. 3.7 kcal/mol for aniline and N-
methylaniline from both OPLS-AA and RHF/6-31G*. Estimates from experimental
sources have not converged, but are in the 3−6 kcal/mol range for all three anilines.51 To
investigate the possibility that electron correlation may be important, the dimethylaniline
scan was repeated with B3LYP/6-31G* optimizations. This did yield a higher barrier,
3.5 kcal/mol, and the reported CA−CA−NT−CT parameters have been retained for both
secondary and tertiary anilines.
The non-bonded parameters for amines are listed in Table 2.5. The pattern of
partial charges was largely determined by reproduction of the hydrogen-bond strengths
(vide infra). The partial negative charge on nitrogen becomes more positive by
0.12−0.15 e for each added methyl group, and the charge on the amine hydrogen
becomes more positive by 0.02 e on going from ammonia to primary and then secondary
amines. The charge for hydrogens on α-carbons was fixed at 0.06 e and this then
determined from neutrality the required charges on the α-carbons. The same charges are
used for anilines with neutrality determining the charge for ipso carbons. Thus, only the
charges on N and H(N) were effectively varied and the results form simple patterns. The
charge on nitrogen in ammonia, −1.020 e, ended up only slightly different from Gao's
value of −1.026 e,41 which may reflect the change to a flexible geometry.
Table 2.5. OPLS-AA Non-Bonded Parameters.
atom type atom or group q (e−) σ (Å) ε (kcal/mol)
NT ammonia −1.02 3.42 0.170
NT 1º amine −0.90 3.30 0.170
NT 2º amine −0.78 3.30 0.170
NT 3º amine −0.63 3.30 0.170
H(NT) ammonia 0.34 0.00 0.000
H(NT) 1º amine 0.36 0.00 0.000
H(NT) 2º amine 0.38 0.00 0.000
HC(CT) for CT directly bonded to NT 0.06 2.50 0.015
HC alkanes 0.06 2.50 0.030
CT(NT) 1º amine CH3 group 0.00 3.50 0.066
CT(NT) 2º amine CH3 group 0.02 3.50 0.066
CT(NT) 3º amine CH3 group 0.03 3.50 0.066
CT(NT) 1º amine CH2 group 0.06 3.50 0.066
CT(NT) 2º amine CH2 group 0.08 3.50 0.066
CT(NT) 3º amine CH2 group 0.09 3.50 0.066
CA(NT) 1º amine ipso carbon 0.18 3.55 0.070
CA(NT) 2º amine ipso carbon 0.20 3.55 0.070
CA(NT) 3º amine ipso carbon 0.21 3.55 0.070
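The neutrality constraint used to fix the α-carbon charges can be checked directly from the entries in Table 2.5. The short sketch below tallies the site charges for the four prototypical amines; the data layout (charge, number of sites) is simply a convenient assumption for illustration.

```python
# (charge in e, number of such sites) for each molecule, from Table 2.5:
# nitrogen, H on N, alpha carbon (CH3 group), and H on the alpha carbon
AMINE_SITES = {
    "ammonia":        [(-1.02, 1), (0.34, 3)],
    "methylamine":    [(-0.90, 1), (0.36, 2), (0.00, 1), (0.06, 3)],
    "dimethylamine":  [(-0.78, 1), (0.38, 1), (0.02, 2), (0.06, 6)],
    "trimethylamine": [(-0.63, 1), (0.03, 3), (0.06, 9)],
}

def net_charge(sites):
    """Sum of the partial charges over all sites of one molecule (in e)."""
    return sum(q * n for q, n in sites)

for name, sites in AMINE_SITES.items():
    print(f"{name:15s} net charge = {net_charge(sites):.3f} e")
```

Each molecule sums to zero within floating-point rounding, which is how the methyl-carbon charges of 0.00, 0.02, and 0.03 e follow once the N, H(N), and HC charges are chosen.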
The Lennard-Jones parameters in Table 2.5 remained unchanged from the original
OPLS-AA parameter set11 with minor exceptions. For ammonia, the Lennard-Jones σ
needed adjustment to obtain satisfactory agreement with both the experimental density
and heat of vaporization of the pure liquid. Otherwise, the Lennard-Jones parameters for
nitrogens in all amines are the same with σ = 3.30 Å and ε = 0.17 kcal/mol, whereas 3.25
Å and 0.17 kcal/mol had previously been used for primary amines.11 The σ and ε for
amine hydrogens are zero, as always for hydrogens attached to heteroatoms.11 For
hydrogens on α-carbons, a reduced ε of 0.015 kcal/mol has been used vs. 0.030 kcal/mol for
alkanes. The same reduced ε is used for α hydrogens in aldehydes, ketones, esters, and
nitro compounds.11 All parameters for more remote alkyl and aromatic carbons and
hydrogens have the standard OPLS-AA values.11 Thus, it turns out that there is little
new in Table 2.5 beyond the choice of charges for N and H(N) in amines.
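For orientation, the way the charges and Lennard-Jones parameters of Table 2.5 combine into a site−site interaction energy can be sketched as follows. The fragment assumes the standard OPLS form, a Coulomb term plus a 12-6 Lennard-Jones term with geometric-mean combining rules for both σ and ε, and a conversion factor of 332.06 kcal Å mol⁻¹ e⁻²; these conventions are not restated in this section, and the function and variable names are hypothetical.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal Å / (mol e^2); assumed conversion factor

# (q in e, sigma in Å, epsilon in kcal/mol) copied from Table 2.5
N_PRIMARY = (-0.90, 3.30, 0.170)   # NT in a primary amine
H_PRIMARY = (0.36, 0.00, 0.000)    # H on N in a primary amine

def pair_energy(site_i, site_j, r):
    """Coulomb plus Lennard-Jones energy (kcal/mol) for two sites r Å apart."""
    q_i, sigma_i, eps_i = site_i
    q_j, sigma_j, eps_j = site_j
    sigma = math.sqrt(sigma_i * sigma_j)   # geometric combining rule
    eps = math.sqrt(eps_i * eps_j)         # geometric combining rule
    coulomb = COULOMB_CONSTANT * q_i * q_j / r
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * eps * (sr6 * sr6 - sr6)
    return coulomb + lennard_jones
```

Because σ and ε are zero for hydrogens on nitrogen, an N···H(N) pair interacts through the Coulomb term only, which is why the hydrogen-bond strengths are so sensitive to the choice of charges on N and H(N).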
Gas-Phase Interaction Energies.
The hydrogen-bond strengths for the amine−water complexes from the OPLS*,
AMBER*, and ab initio LMP2 calculations of Marten et al.34 are listed in Table 2.6
along with the present RHF/6-31G* and OPLS-AA results. It is expected that the LMP2
results are highly accurate,52, 53 so they provide the target patterns for the force fields.
Qualitatively, the LMP2 and RHF/6-31G* results show the same trends, a nearly constant
interaction energy around −6 kcal/mol for water as the hydrogen-bond donor and a
significantly weaker interaction of −2 to −3 kcal/mol for water as the hydrogen-bond
acceptor. The incorrect orderings from the MacroModel calculations are well remedied
by the OPLS-AA results. The hydrogen bonds are uniformly 20−30% stronger with the
OPLS-AA force field than from the LMP2 calculations. Such enhancement of
intermolecular interactions is needed for reproduction of, for example, heats of
vaporization with the fixed charge models.11, 47 This presumably compensates for the
lack of explicit polarization. As an additional check of the robustness of the force field,
enthalpies of interaction were computed from normal mode calculations for ammonia,
methylamine, dimethylamine, and trimethylamine with potassium ion using Åqvist's K+
parameters.54 Excellent agreement with gas-phase experimental data55 was obtained, as
shown in Figure 2.2.
Table 2.6. Comparison of Hydrogen-Bond Interaction Energies (kcal/mol) for Amines.
previously reporteda this study
molecule OPLS*b AMBER*b LMP2 RHFc OPLS-AA
Water as a H-Bond Donor
ammonia –7.5 –9.7 –5.5 –6.6 –6.5
methylamine –7.0 –7.6 –5.9 –6.5 –7.4
dimethylamine –6.3 –5.4 –6.1 –6.3 –7.8
trimethylamine –5.1 –3.0 –6.1 –5.9 –7.5
Water as a H-Bond Acceptor
ammonia –4.2 –6.1 –2.2 –2.9 –3.1
methylamine –4.4 –7.3 –2.3 –2.7 –3.6
dimethylamine –4.6 –8.4 –2.4 –2.7 –3.8
aReference 34. bAsterisk denotes MacroModel version. cRHF/6-31G*//RHF/6-31G* optimizations with
water fixed: r(OH) = 0.9572 Å and ∠HOH = 104.52°. For water as hydrogen-bond donor, six intermolecular
degrees of freedom were optimized. For water as hydrogen-bond acceptor, the H-bond was constrained to be
linear.
Figure 2.2. Gas-phase interaction energies and enthalpies (kcal/mol) of amines with
potassium ion. Calculated results are from the OPLS-AA force field, and the
experimental enthalpies are from reference 55.
Pure Liquid Results.
The OPLS-AA parameters for ammonia, methylamine, dimethylamine, and
trimethylamine were developed in conjunction with computation of their liquid densities
and heats of vaporization. These are considered to be the key properties since they reflect
both the size of the molecules and the average intermolecular interactions. The
transferability of the parameters was tested through subsequent MC simulations for the
pure liquids of ethylamine, propylamine, diethylamine, triethylamine, aziridine, azetidine,
pyrrolidine, 1-methylpyrrolidine, piperidine, 1-methylpiperidine, aniline, N-
methylaniline, and N,N-dimethylaniline. The results are shown in Table 2.7. In all cases,
excellent agreement with experimental densities was obtained with an average unsigned
error of 1%. The heats of vaporization obtained from the MC simulations for the gases
and liquids are also in good agreement with the experimental data in Table 2.7; the
average unsigned error is less than 3%.
Table 2.7. Computed Densities and Heats of Vaporization from Pure Liquid
Simulations.
density (g/cm3) ∆Hvap (kcal/mol)
liquid T (°C) calcd exptl calcd exptl
ammonia −33.35 0.697 ± 0.001 0.682a 5.42 ± 0.008 5.58a
methylamine −6.30 0.698 ± 0.002 0.694b 6.22 ± 0.018 6.17c
ethylamine 16.50 0.705 ± 0.002 0.687d 6.95 ± 0.023 6.70e
propylamine 25.00 0.717 ± 0.001 0.711f 7.80 ± 0.030 7.47g
dimethylamine 6.88 0.658 ± 0.002 0.671d 6.22 ± 0.024 6.33h
diethylamine 25.00 0.709 ± 0.001 0.699i 7.84 ± 0.021 7.48g
trimethylamine 2.87 0.660 ± 0.001 0.653d 5.32 ± 0.021 5.48j
triethylamine 25.00 0.722 ± 0.001 0.723d 8.61 ± 0.028 8.33g
aziridine 25.00 0.802 ± 0.001 0.831k 8.20 ± 0.020 8.09l
azetidine 25.00 0.820 ± 0.001 0.841m 7.77 ± 0.020 8.17l
pyrrolidine 25.00 0.860 ± 0.001 0.854n 9.33 ± 0.024 8.95l
1-methylpyrrolidine 25.00 0.807 ± 0.001 0.799o 7.95 ± 0.022 7.94l
piperidine (equatorial) 25.00 0.870 ± 0.001 0.857p 10.71 ± 0.036 9.39l
piperidine (axial) 25.00 0.861 ± 0.001 0.857p 8.66 ± 0.028 9.39l
1-methylpiperidine 25.00 0.821 ± 0.001 0.816o 8.81 ± 0.026 8.55l
aniline 25.00 1.036 ± 0.001 1.017q 12.78 ± 0.038 12.60r
N-methylaniline 25.00 0.975 ± 0.001 0.984q 12.66 ± 0.040 12.70r
N,N-dimethylaniline 25.00 0.937 ± 0.001 0.953q 11.68 ± 0.027 11.90r
aReference 56. bReference 57. cReference 58. dExtrapolated from reference 59. eReference 60. fReference 61.
gReference 62. hReference 63. iReference 59. jReference 64. kReference 65. lReference 66. mReference 67.
nReference 68. oReference 69, exptl at 20 °C, simulation at 25 °C. pReference 70. qReference 71. rReference 72.
Piperidine is an interesting case. The pure liquid simulations were normally
started using the lowest-energy conformation for all molecules as determined from the
gas-phase energy minimizations with the new force field. While acyclic aliphatic and
aromatic amines pose no sampling problems with respect to the intramolecular degrees of
freedom, cyclic aliphatic compounds tend to stay in the original ring conformation, since
ring flipping or inversion barriers are ca. 6 kcal/mol. Although, as mentioned above, the
force field favors equatorial piperidine by 0.8 kcal/mol over the axial form in the gas
phase, two pure liquid simulations were run, one with all piperidine molecules started as
the equatorial conformer and one with all started as the axial conformer. At the ends of the runs, no molecules had changed conformation
in the equatorial liquid, while only three of the initially axial molecules were equatorial.
The results in Table 2.7 show that the calculated densities for the axial and equatorial
liquids are nearly the same and both are very close to the experimental value of 0.857
g/cm3. However, the calculated heat of vaporization is higher than the experimental
value by 1.3 kcal/mol for the equatorial liquid, while it is too low by 0.7 kcal/mol for the
axial liquid. In both cases the gas is taken as equatorial. The comparison between theory
and experiment suggests that piperidine in the pure liquid is a mixture of equatorial and
axial. The exact mixture could be pursued with a modified MC sampling procedure that
can achieve the equilibrium, although the acceptance rate may be low. On the
experimental side, the conformational preference for piperidine has been the subject of
lively debate.49, 50 The conclusion from numerous spectroscopic measurements is that
piperidine is equatorial in the gas phase and in non-polar solvents, but that it is mostly
axial in alcohol solvents. The possibility of a mixture for the neat liquid near 25 °C
seems reasonable. It may also be noted that for 1-methylpiperidine, the computed and
experimental results in Table 2.7 show the usual level of accord. In this case, the
evidence is that the equatorial form is dominant in all media.49, 50
Radial distribution functions (rdfs) provide a measure of the local structure in
liquids and coordination numbers can be obtained by the integration of their peaks.8 The
N−N rdfs for the four prototypical amines are presented in Figure 2.3. The loss of
hydrogen bonding for trimethylamine is clearly apparent in the lack of a peak near 3 Å.
Estimates of the numbers of hydrogen bonds per molecule are more readily obtained
from integration of the first peak in the N−H(N) rdfs, which reveal sharper first peaks
with minima near 2.5 Å (not shown). Integration to that point yields average numbers of
hydrogen bonds of 2.56 for ammonia at −33 °C, 1.96 for methylamine at −6 °C, and 1.11
for dimethylamine at 7 °C. The latter figure is consistent with the expected hydrogen-
bonded chains, while more branching is apparent for ammonia and methylamine. The
numerical result for liquid ammonia is similar to the findings from prior simulation,41, 73
while hydrogen-bonding results have not been reported previously for the other amines.
Furthermore, integration of the ammonia N−N rdf from the pure liquid simulations out to
the first minimum at ca. 4.85 Å encompasses 11.5 neighbors. For comparison, the X-ray
results of Narten at +4 °C yield 12.0 neighbors from integration of the N−N rdf to the
minimum at 5.0 Å.74
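The integration of an rdf to obtain a coordination number can be sketched as follows. This is a minimal illustration, assuming the rdf is available as tabulated r and g(r) values and using simple trapezoidal quadrature of N = 4πρ∫g(r)r²dr; the function names and the tabulated-input format are hypothetical and are not taken from the simulation program.

```python
import math

AVOGADRO = 6.02214076e23

def number_density(density_g_cm3, molar_mass):
    """Convert a mass density (g/cm^3) to a number density (molecules per Å^3)."""
    return density_g_cm3 / molar_mass * AVOGADRO * 1.0e-24

def coordination_number(r, g, rho, r_cut):
    """Integrate 4*pi*rho*g(r)*r^2 from the first grid point out to r_cut.

    r, g: equally long lists with the tabulated distances (Å) and rdf values.
    rho:  number density of the neighbor sites in Å^-3.
    """
    total = 0.0
    for k in range(1, len(r)):
        if r[k] > r_cut + 1e-12:
            break
        f_lo = g[k - 1] * r[k - 1] ** 2
        f_hi = g[k] * r[k] ** 2
        total += 0.5 * (f_lo + f_hi) * (r[k] - r[k - 1])  # trapezoid rule
    return 4.0 * math.pi * rho * total
```

In use, `rho` would be obtained from the simulated liquid density (for example via `number_density` with the value for ammonia in Table 2.7) and `r_cut` would be set to the first minimum of the rdf.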
Figure 2.3. N−N radial distribution functions for liquid amines from Monte Carlo
simulations with the OPLS-AA force field. X-ray results for ammonia are at +4 °C from
reference 74. Successive curves are offset 3.0 units along the y-axis.
Free Energies of Hydration.
Results from the MC/FEP simulations for the relative free energies of hydration
of the four prototypical amines are recorded in Table 2.8. The calculated values are in
excellent agreement with experiment. Methylamine is the most hydrophilic and the large
increments upon increasing methylation obtained previously (Table 2.1) have been
appropriately ameliorated. Plots of ∆G vs. λ are shown for the perturbations in the gas
phase, water, and chloroform for the three interconversions in Figures 2.4−2.6. The
smoothness of the free energy profiles, which were obtained using a ∆λ of 0.05 for most
windows, attests to the high precision that can be obtained for such MC/FEP calculations
with the BOSS program.
Table 2.8. Relative Free Energies (kcal/mol) of Hydration (water), Solvation (chloroform), and Transfer (water → chloroform),
and ∆log P for Amines at 25 °C.
∆∆Ghyd (water) ∆∆Gsolv (CHCl3) ∆∆Gtrans ∆log Pa
perturbation (A → B) calcd exptlb calcd exptlc calcdd exptle calcd exptle
methylamine → ammonia 0.11 ± 0.20 0.26 1.10 ± 0.10 0.8 0.99 ± 0.22 0.49 0.73 ± 0.22 0.36
dimethylamine → methylamine −0.10 ± 0.18 −0.27 0.99 ± 0.14 0.5 1.09 ± 0.23 0.79 0.80 ± 0.23 0.58
trimethylamine → dimethylamine −1.53 ± 0.15 −1.06 0.82 ± 0.16 0.2 2.35 ± 0.22 1.27 1.73 ± 0.22 0.93
a∆log P = log PA − log PB. bReferences 29 and 30. cReference 75. d∆∆Gtrans(calcd) = ∆∆Gsolv(calcd) - ∆∆Ghyd(calcd). eFrom
Masterfile Database, Pomona College Medchem Project & BioByte Corp., Claremont, CA, 1994.
Figure 2.4. Plots of ∆G (kcal/mol) vs. λ in the gas phase, water, and chloroform from
free energy perturbation calculations with the OPLS-AA force field: methylamine →
ammonia.
Figure 2.5. Plots of ∆G (kcal/mol) vs. λ in the gas phase, water, and chloroform from
free energy perturbation calculations with the OPLS-AA force field: dimethylamine →
methylamine.
Figure 2.6. Plots of ∆G (kcal/mol) vs. λ in the gas phase, water, and chloroform from
free energy perturbation calculations with the OPLS-AA force field: trimethylamine →
dimethylamine.
Rdfs and energy pair distributions for the four prototypical amines in water were
analyzed to clarify the variations in hydrogen bonding and free energies of hydration.
The first peaks in the N−HW rdfs (amine N−water H) are well resolved in Figure 2.7 and
integration to the minima at 2.5 Å yields estimates of the number of N−HW hydrogen
bonds: 1.23 for ammonia, 1.20 for methylamine, 1.05 for dimethylamine, and 1.09 for
trimethylamine. Thus, not surprisingly, each amine is accepting roughly one hydrogen
bond from a water molecule. Hydrogen-bond donation is characterized by the H(N)−OW
rdfs in Figure 2.8; the first peak can be assigned to hydrogen bonds with the amine
hydrogens, while the larger second peak near 3.5 Å arises from the oxygen of the water
that is donating a hydrogen bond to the nitrogen. The first peaks are not as sharp and
well-defined as in the N−HW rdfs since amines are significantly better hydrogen-bond
acceptors than donors (Table 2.6). Integration to the first minimum near 2.5 Å in the
H(N)−OW rdfs yields estimated numbers of hydrogen bonds of 1.31 for methylamine and
0.82 for dimethylamine. For ammonia, the first peak has become a shoulder, but
integration to the same limit yields an estimate of 1.38 hydrogen bonds. Combining the
results for both types of interactions yields estimates of the total number of hydrogen
bonds with water of 2.61 for ammonia, 2.51 for methylamine, 1.87 for dimethylamine,
and 1.09 for trimethylamine. If the hydrogen bonds had similar strengths, these
decreasing numbers of hydrogen bonds could lead to the erroneous order of increasing
hydrophobicity with increasing methylation.
Figure 2.7. N−HW (amine N−water H) radial distribution functions in TIP4P water from
MC simulations with the OPLS-AA force field.
Figure 2.8. H(N)−OW (amino H−water O) radial distribution functions in TIP4P water
from MC simulations with the OPLS-AA force field.
However, variations in the hydrogen-bond strengths are apparent in the energy
pair distributions in Figure 2.9. Hydrogen bonds are reflected in the low-energy bands in
such plots. Integration to the well-defined minima near −3.5 kcal/mol yields the
following numbers for hydrogen bonds: 1.00 for ammonia, 1.11 for methylamine, 1.18
for dimethylamine, and 1.04 for trimethylamine. Clearly, this is the peak for the
hydrogen-bond donating water molecule. Moreover, the average strength of this
interaction increases with increasing methylation until it levels off for dimethylamine and
trimethylamine in Figure 2.9. The hydrogen-bond accepting waters have weaker
interactions that are in the −2.0 to −3.5 kcal/mol region and their number naturally
declines with replacement of amino hydrogens by methyl groups. Thus, qualitatively two
opposing effects can be inferred: increased contribution from hydrogen-bond acceptance
and diminished contribution from the weaker hydrogen-bond donation with increasing
methylation. Thanks to the availability of the ab initio LMP2 results (Table 2.6), the
proper balance of hydrogen-bond strengths is achieved with the OPLS-AA force field and
leads to the correct order of free energies of hydration. If, for example, the amines are
too good as hydrogen-bond donors, as with AMBER*, then the latter effect dominates
and hydrophobicity will increase incrementally with increasing methylation.
Figure 2.9. Solute−solvent (amine−water) energy pair distributions from MC
simulations with the OPLS-AA force field. The y-axis records the number of water
molecules per kcal/mol, which interact with the amine solute with the interaction energy
given on the x-axis.
The concern over the disagreement between prior computation and experiment for
amine hydration has also been emphasized in a recent paper by Miklavc.76 It was argued
that the errors could be explained by inadequate sampling of the torsional motion for the
amines in the aqueous FEP calculations, which would lead to an R ln 3 underestimate in
∆S for each methyl rotor. That is, the FEP calculations in water would only sample one
of the three equivalent rotameric states for conversion of a hydrogen into a methyl group,
while all three conformational states would be sampled in a gas-phase FEP calculation.
Thus, the gain in the number of states would be missed in water. This is not correct,
since the three equivalent rotational states for each methyl group in amines and other
organic molecules are fully sampled in any MD or MC simulations of normal length.
Actually, the real problem is that FEP calculations would yield the same free energy
change for conversion of a hydrogen to a methyl group whether the methyl group rotated
freely or was locked in one conformational well by poor sampling or by a modified
torsional potential. The ∆G contribution from a change in number of conformational
states, m and n, for two systems A and B only becomes apparent through full
characterization of all available conformational states of A and B and their relative free
energies (eq 2.2), as discussed elsewhere.77, 78
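$$\Delta G(\mathrm{A} \rightarrow \mathrm{B}) = -RT \ln \left[ \sum_{j}^{n} \exp(-G_{j}^{\mathrm{B}}/RT) \Big/ \sum_{i}^{m} \exp(-G_{i}^{\mathrm{A}}/RT) \right] \qquad (2.2)$$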
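The effect of the number of conformational states in eq 2.2 can be made concrete with a short numerical sketch. The function below evaluates the ratio of Boltzmann sums over the states of A and B; the function name and the convention of passing the state free energies as lists are assumptions for illustration. For one state of A and three equivalent, isoenergetic states of B it returns −RT ln 3, the magnitude of the methyl-rotor term discussed above.

```python
import math

R = 0.0019872041  # gas constant in kcal/(mol K)

def delta_g_states(g_a, g_b, temperature=298.15):
    """Free energy change A -> B from the free energies of all conformational states.

    g_a: free energies (kcal/mol) of the m states of A.
    g_b: free energies (kcal/mol) of the n states of B.
    """
    rt = R * temperature
    q_a = sum(math.exp(-g / rt) for g in g_a)
    q_b = sum(math.exp(-g / rt) for g in g_b)
    return -rt * math.log(q_b / q_a)

# One state for A versus three equivalent rotamers for B, all at the same free energy
rotor_term = delta_g_states([0.0], [0.0, 0.0, 0.0])
```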
Another View: Energy Components from Linear Response. We have used linear
response methods (Chapter One) to estimate free energies of hydration from the results
with eq 2.3 where Evdw and ECoul are the van der Waals (Lennard-Jones) and electrostatic
(Coulombic) energy components of the total solute−water interaction energy, SASA is
the solute's solvent-accessible surface area using a probe radius of 1.4 Å for water, and α,
β, and γ are empirical parameters.16, 17
$$\Delta G_{hyd} = \alpha E_{vdw} + \beta E_{Coul} + \gamma\,\mathrm{SASA} \qquad (2.3)$$
The earlier studies have been expanded with results for 44 diverse organic solutes in
water including the four prototypical amines, all modeled using the OPLS-AA force field
in MC simulations with 500 TIP4P water molecules.79 The optimized parameters are α =
0.410, β = 0.463, and γ = 0.0193 kcal/mol-Å2. The fit yields an average unsigned error of
0.74 kcal/mol for the 44 predicted free energies of hydration in comparison to the
experimental data, which cover a 13 kcal/mol range.
Notably, the predicted ∆Ghyd values for the amine series nicely parallel the
experimental data, as summarized in Table 2.9. The results show that the Lennard-Jones
interactions become more favorable and the surface-area term (penalty for cavity
formation) becomes more unfavorable with increasing methylation. In fact, the
variations of these two components are almost exactly compensating, and the pattern in
total free energies of hydration parallels the changes in the Coulombic solute−water
interactions. This again emphasizes the importance of the hydrogen-bond strengths. It
also supports the above analysis that the trend in free energies of hydration can be
attributed to the opposition of better hydrogen-bond acceptance and poorer hydrogen-
bond donation with increasing methylation of the amines. The optimal point happens to
occur for methylamine.
Table 2.9. Linear Response Components (kcal/mol) for Amines in Water.
amine ∆Gvdw ∆GCoul ∆GSASA calcd ∆Ghyd exptla ∆Ghyd
ammonia +0.41 −6.41 2.66 −3.34 −4.31
methylamine −0.46 −6.87 3.49 −3.84 −4.57
dimethylamine −1.26 −6.50 4.23 −3.53 −4.30
trimethylamine −2.32 −5.05 4.96 −2.41 −3.24
amine ∆∆Gvdw ∆∆GCoul ∆∆GSASA calcd ∆∆Ghyd exptl ∆∆Ghyd
ammonia 0.00 0.00 0.00 0.00 0.00
methylamine −0.87 −0.46 0.83 −0.50 −0.26
dimethylamine −1.67 −0.09 1.57 −0.19 +0.01
trimethylamine −2.73 +1.36 2.30 +0.93 +1.07
aReferences 29 and 30.
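The bookkeeping behind Table 2.9 can be reproduced with a few lines of code: the predicted ∆Ghyd is the sum of the three scaled components of eq 2.3, and the relative values are differences from ammonia. The sketch below uses the tabulated components directly; the dictionary layout and function names are illustrative assumptions.

```python
# (dG_vdw, dG_Coul, dG_SASA, tabulated calcd dG_hyd) in kcal/mol, from Table 2.9
TABLE_2_9 = {
    "ammonia":        (0.41, -6.41, 2.66, -3.34),
    "methylamine":    (-0.46, -6.87, 3.49, -3.84),
    "dimethylamine":  (-1.26, -6.50, 4.23, -3.53),
    "trimethylamine": (-2.32, -5.05, 4.96, -2.41),
}

def predicted_hydration(name):
    """Sum of the van der Waals, Coulombic, and surface-area components."""
    vdw, coul, sasa, _ = TABLE_2_9[name]
    return vdw + coul + sasa

def relative_components(name, reference="ammonia"):
    """Component-wise differences (vdw, Coul, SASA) relative to the reference amine."""
    target = TABLE_2_9[name][:3]
    ref = TABLE_2_9[reference][:3]
    return tuple(t - r for t, r in zip(target, ref))

for amine in TABLE_2_9:
    print(amine, round(predicted_hydration(amine), 2), relative_components(amine))
```

The near cancellation of the van der Waals and surface-area differences, which leaves the Coulombic term to set the trend, is visible directly in the printed tuples.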
Free Energies of Transfer and ∆log P Results.
As a further test of the transferability of the OPLS-AA parameters, FEP
calculations were also performed for the amine series in chloroform (Table 2.8). The free
energy of transfer of a solute i between water and chloroform is related to its partition
coefficient (Pi) via eq 2.4.
$$\Delta G_{trans}(i) = -2.3RT \log P_{i} = \Delta G_{solv}(i) - \Delta G_{hyd}(i) \qquad (2.4)$$
Computation of relative free energies of solvation in both solvents then allows direct
comparison with experimentally determined log P values by eq 2.5.14
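$$\Delta\Delta G_{trans}(\mathrm{A} \rightarrow \mathrm{B}) = \Delta\Delta G_{solv}(\mathrm{A} \rightarrow \mathrm{B}) - \Delta\Delta G_{hyd}(\mathrm{A} \rightarrow \mathrm{B}) = 2.3RT(\log P_{\mathrm{A}} - \log P_{\mathrm{B}}) \qquad (2.5)$$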
In Table 2.8, the MC/FEP results are in good accord with the experimental relative free
energies of solvation in chloroform. In this case the free energy of solvation becomes
steadily more favorable, though by a diminishing amount, with increasing methylation.
Combination with the computed results in water then leads to reasonable agreement
between the simulation results and experiment for the relative log P values. Thus, the
present MC simulations with the OPLS-AA force field reproduce the expected order of
free energies of solvation in a non-polar solvent as well as the unusual order in water.
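The conversion from the computed relative free energies to ∆log P in eq 2.5 is simple arithmetic, sketched below for the calculated entries of Table 2.8. The factor 2.3 is taken here as ln 10 and the temperature as 298.15 K; the function name is hypothetical. Small differences in the last digit relative to the tabulated values can arise because the inputs are the rounded table entries.

```python
import math

R = 0.0019872041  # gas constant in kcal/(mol K)

def relative_log_p(ddg_solv, ddg_hyd, temperature=298.15):
    """Return (ddG_trans, dlogP) for A -> B, with dlogP = log P_A - log P_B."""
    ddg_trans = ddg_solv - ddg_hyd
    dlog_p = ddg_trans / (math.log(10.0) * R * temperature)
    return ddg_trans, dlog_p

# (ddG_solv in chloroform, ddG_hyd in water), calculated values from Table 2.8
PERTURBATIONS = {
    "methylamine -> ammonia": (1.10, 0.11),
    "dimethylamine -> methylamine": (0.99, -0.10),
    "trimethylamine -> dimethylamine": (0.82, -1.53),
}

for label, (solv, hyd) in PERTURBATIONS.items():
    ddg_trans, dlog_p = relative_log_p(solv, hyd)
    print(f"{label}: ddG_trans = {ddg_trans:.2f} kcal/mol, dlogP = {dlog_p:.2f}")
```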
Previously, Dunn and Nagy performed MC/FEP simulations for the conversion of
methylamine to dimethylamine in water and chloroform.80 A relative log P of 2.5 was
obtained, which is too large in comparison with the experimental value of 0.6 or the 0.8
obtained here. The problem comes mostly from the methylamine to dimethylamine
perturbation in water, which gave a ∆G of 2.90 kcal/mol vs. the experimental value of 0.3
kcal/mol.80 McDonald et al. computed free energies of solvation in chloroform for
methylamine, dimethylamine, and trimethylamine, using MC/FEP simulations with
OPLS Lennard-Jones parameters, but with RHF/6-31G* CHELPG charges.17 The
computed ∆∆Gsolv values in chloroform were 1.3 and 1.1 kcal/mol for the dimethylamine
to methylamine and trimethylamine to dimethylamine conversions, respectively, which
deviate from the experimental data by ca. 0.3 kcal/mol more than the present results
(Table 2.8).
Conclusion.
Previous computational efforts on amine hydration have employed models with
standard pairwise additive interaction potentials,27, 28, 32 explicit polarization,28, 33 and
quantum mechanical SCRF calculations.34-37 Although all studies with explicit solvent
molecules and most SCRF models failed to mirror the experimental trends in free
energies of hydration, the work presented in this chapter has shown that a simple,
classical force field, which is parameterized to reproduce experimental properties of pure
liquids (Table 2.7) as well as ab initio hydrogen-bond strengths (Table 2.6), can solve the
amine hydration problem (Table 2.8). There is no need for models with more complex
functional forms including explicit polarization. The present parameterization of the
critical non-bonded terms involved few unique parameters and features simple charge
increments upon increasing methylation in the amine series (Table 2.5). The results of
the Monte Carlo simulations also led to the explanation of the observed variation in free
energies of hydration through two competing trends, increased contribution from
hydrogen-bond acceptance and diminished contribution from hydrogen-bond donation
with increasing methylation of the amines.
In further testing, the present force field was shown to yield excellent results for
properties of thirteen additional liquid amines (Table 2.7). The transferability of the
parameters to less polar solvents such as chloroform was also demonstrated by
computation of relative log P values in reasonable agreement with experiment (Table
2.8). In view of the very common occurrence of amines in chemotherapeutics, the
availability of a force field with such broad, documented success for a wide range of
properties in different media is most important for computer-aided drug design. Errors in
partitioning between water and low-dielectric media may be expected to adversely affect
predictions on protein-ligand binding as well as QSAR analyses.
Chapter 3
Estimation of Binding Affinities for HEPT and Nevirapine Analogs with
HIV-1 Reverse Transcriptase via Monte Carlo Simulations
Background.
The human immunodeficiency virus (HIV), which has been identified as the
causative agent of acquired immunodeficiency syndrome (AIDS),81 infected about
14,500 people each day in 2000.1 The World Health Organization and the Joint United
Nations Programme on HIV/AIDS estimate that 21.8 million persons have died from the
disease, 36.1 million people are currently infected with HIV, and over 95% of new
infections are in developing countries (Figure 0.1).1 The need for potent, safe, and
inexpensive chemotherapeutics is clear, and the therapies must also be effective against
mutant strains of HIV which arise from and circumvent existing anti-HIV treatments.82
One of the key enzymes packaged within the HIV virion capsid is a reverse
transcriptase (RT) that plays an essential role in the replication of the virus (Figure
3.1).81, 83, 84 Consequently, HIVRT has emerged as a prime target for the development
of drugs for HIV/AIDS therapy.81, 82 The HIVRT protein has both RNA dependent
DNA polymerase and RNaseH activities that are required for the conversion of genomic
viral RNA to DNA; this viral DNA is subsequently incorporated into the host cell
genome.81, 83, 85 Inhibitors of HIVRT fall into two main classes (Figure 3.2):82, 85 (1)
Nucleoside inhibitors (NRTIs) are compounds that mimic normal nucleoside substrates
but lack the 3′−OH group required for DNA chain elongation. NRTIs compete with
native nucleosides and effectively stall polymerase activity by becoming incorporated
into the growing DNA strand thereby causing premature chain termination.82, 85 (2)
Non-nucleoside inhibitors (NNRTIs) are molecules that bind to a region of HIVRT
located near the polymerase catalytic site.85 The binding event alters the conformation of
critical residues and thereby inhibits the ability of the enzyme to perform normal RT
functions.82
Figure 3.1. Cartoon representation of an HIV particle. Reverse transcriptase (RT)
converts viral RNA to viral DNA for subsequent incorporation into the host cell genome.
Figure 3.2. Schematic diagram showing the different binding sites for nucleoside
(NRTI) and non-nucleoside (NNRTI) HIV reverse transcriptase (HIVRT) inhibitors. The
apo coordinates in green on the left are from reference 86. The NRTI/HIVRT complex in
cyan (top) showing the NRTI binding site in red and the viral nucleic acid site in magenta
is from reference 87. The NNRTI/HIVRT complex in cyan (bottom) showing the NNRTI
binding site in red is from reference 88.
Although both NRTIs and NNRTIs dramatically decrease viral load in most
infected persons on initiation of antiviral therapy, resistance to the chemotherapeutics
invariably develops.85 After the onset of infection, the virus replicates quickly within the
host and a genetically related swarm (quasispecies) of virions is soon established.3, 4
This viral pool of variants arises rapidly mainly due to the low fidelity of HIVRT, which
has been estimated to yield 5−10 errors per HIV genome per round of replication.89,
90 Since as many as 10^9 virions are produced each day,91 resistance to both nucleoside
and non-nucleoside drugs quickly develops.82 Since resistance arises in response to the
chemotherapy, structurally unique inhibitors are needed that can challenge the swarm of
virions in different ways. The use of combinations of NRTIs, NNRTIs, and HIV protease
inhibitors is currently the best method for controlling HIV infection.92, 93 However, it is
also desirable to have multiple inhibitors within a class since their unique modes of
binding can lead to different resistance profiles. The present study has used computer
simulations in an effort to develop protocols and methods that can be used in the design
of improved anti-HIV drugs. In particular, computations have been carried out for the
binding affinities of 40 analogs of the NNRTIs, HEPT and nevirapine (Tables 3.1 and
3.2). Nevirapine was the first FDA-approved NNRTI and the HEPT analog, MKC-442,
is in clinical trials.
[Structures: MKC-442 and nevirapine; general scaffolds of the HEPT analogs and nevirapine analogs showing substituent positions R1, R2, and R3.]
Specific goals of the research are twofold: (1) the estimation of binding affinities
in the context of structure based drug design using available experimental data and (2)
understanding the variations in binding affinities through interpretation of energetic and
structural results from simulations.
Table 3.1. Inhibition of HIV-1 RT by HEPT Analogs.
[Structure: HEPT scaffold showing substituent positions R1, R2, and R3.]
No.   R1     R2              R3              EC50       ca. ∆Gexptl
H01   Me     CH2OCH2CH2OH    SPh             7.0a       −7.32
H02   Me     CH2OCH2CH2CH3   SPh             3.6a       −7.73
H03   Me     CH2OCH2CH3      SPh             0.33a      −9.20
H04   Me     CH2OCH3         SPh             2.1a       −8.06
H05   Me     CH2OCH2Ph       SPh             0.088a     −10.01
H06   i-Pr   CH2OCH2Ph       SPh             0.0027a    −12.16
H07   Me     Et              SPh             2.2a       −8.03
H08   Me     Me              SPh             > 150.0a   > −5.43
H09   Et     CH2OCH2CH3      SPh             0.019a     −10.96
H10   i-Pr   CH2OCH2CH3      SPh             0.012a     −11.24
H11   i-Pr   CH2OCH2CH3      CH2Ph           0.004b     −11.89
H12   c-Pr   CH2OCH2CH3      SPh             0.1a       −9.93
H13   Me     CH2OCH2CH2OH    CH2Ph           23.0c      −6.58
H14   Me     CH2OCH2CH2OH    OPh             85.0c      −5.78
H15   Me     CH2OCH2CH2OH    SPh-3,5 di-Me   0.26d      −9.35
H16   Et     CH2OCH2CH2OH    SPh-3,5 di-Me   0.013d     −11.19
H17   i-Pr   CH2OCH2CH2OH    SPh-3,5 di-Me   0.0027d    −12.16
H18   Et     CH2OCH2Ph       SPh             0.0059a    −11.68
H19   Me     H               SPh             > 250.0a   > −5.11
H20   Me     Bu              SPh             1.2a       −8.40
aReference 94. bReference 95. cReference 96. dReference 97. H01 is parent HEPT, H11 is MKC-442.
EC50 in µM at 37 ºC. Estimated experimental binding energies ∆Gexptl ≈ RT ln (EC50) in kcal/mol.
Table 3.2. Inhibition of HIV-1 RT by Nevirapine Analogs.
[Structure: nevirapine scaffold showing substituent positions R1, R2, and R3.]
No.   R1   R2        R3         IC50a   ca. ∆Gexptl
N01   Me   Et        H          0.125   −9.42
N02   Me   Et        2-Me       0.17    −9.24
N03   Me   Et        2-Cl       0.15    −9.31
N04   Me   Et        3-Me       0.76    −8.35
N05   Me   Et        3-Cl       > 1.0   > −8.19
N06   Me   Et        4-Me       1.9     −7.81
N07   H    Et        H          0.44    −8.67
N08   H    Et        4-Me       0.035   −10.17
N09   H    Et        4-Cl       0.095   −9.58
N10   H    c-Pr      4-Me       0.084   −9.65
N11   Me   c-Pr      4-Me       > 1.0   > −8.19
N12   Me   Pr        H          0.45    −8.66
N13   Me   t-Bu      H          11.0    −6.77
N14   Me   COCH3     H          15.3    −6.57
N15   Me   Et        4-Et       0.11    −9.49
N16   Me   CH2SCH3   H          0.85    −8.28
N17   H    c-Pr      4-CH2OH    3.0     −7.54
N18   H    c-Pr      4-CN       1.25    −8.05
N19   Me   CH2CH2F   H          2.9     −7.56
N20   H    c-Pr      H          0.45    −8.66
aReference 98. N10 is Nevirapine. IC50 in µM at 25 ºC.
Estimated experimental binding energies ∆Gexptl ≈ RT ln (IC50) in kcal/mol.
Computational Details.
Theoretical Method
The most rigorous computational approaches used for the calculation of binding
affinities (∆Gb) are the free energy perturbation (FEP) and thermodynamic integration
(TI) methods.99-102 These methods typically employ molecular dynamics (MD) or
Monte Carlo (MC) simulations and have yielded impressive results for a number of
protein-ligand systems, as reviewed elsewhere.99-102 However, since the FEP and TI
methods are computationally expensive, faster and more approximate methods are
desirable. In the present work, ∆Gb predictions were made based on an extended linear
response (ELR) theory as introduced in Chapter One. The methodology, which
corresponds to eq 3.1 for estimating binding affinities, uses descriptors such as
hydrogen-bond counts and the hydrophobic, hydrophilic, and aromatic components of the
solvent accessible surface area in addition to the standard Lennard-Jones and Coulombic terms
first advocated by Åqvist and coworkers.15
$$\Delta G_b = \sum_n c_n \xi_n + \text{constant} \tag{3.1}$$
∆Gb is obtained using a multivariate fitting approach to experimental data where cn
represents an optimizable coefficient for the associated descriptor ξn. In principle, any
physically reasonable quantity could be considered as a descriptor in eq 3.1.
Configurationally averaged quantities are collected during two separate MC simulations
corresponding to the unique drug environments for the unbound state (drug + water) and
bound state (drug + water + protein), and the difference (bound − unbound) for each
descriptor is computed (Figure 3.3).
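The bookkeeping behind eq 3.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MCPRO workflow: the descriptor names, the sample values, and the coefficients are hypothetical placeholders chosen only to show how bound and unbound averages are differenced and then combined linearly.

```python
from statistics import mean


def descriptor_differences(bound_samples, unbound_samples):
    """Return <xi>_bound - <xi>_unbound for every descriptor.

    Each argument maps a descriptor name to the values collected during
    the averaging phase of one MC simulation.
    """
    return {
        name: mean(bound_samples[name]) - mean(unbound_samples[name])
        for name in bound_samples
    }


def predict_binding(differences, coefficients, constant):
    """Evaluate eq 3.1: dG_b = sum_n c_n * xi_n + constant."""
    return sum(c * differences[name] for name, c in coefficients.items()) + constant


# Hypothetical samples (NOT simulation output), for illustration only.
bound = {"hbonds": [1.0, 2.0, 1.0, 2.0], "phob_area": [100.0, 110.0, 90.0, 100.0]}
unbound = {"hbonds": [4.0, 5.0, 4.0, 5.0], "phob_area": [300.0, 310.0, 290.0, 300.0]}

diffs = descriptor_differences(bound, unbound)

# Hypothetical coefficients; in practice they come from the multivariate fit.
coeffs = {"hbonds": -1.0, "phob_area": 0.01}
dG = predict_binding(diffs, coeffs, constant=5.0)
```

With these made-up numbers the inhibitor loses three hydrogen bonds and 200 Å2 of hydrophobic surface on binding, and the two terms enter the sum with opposite signs.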
Figure 3.3. Schematic representation of a binding event showing different environments
for HIVRT inhibitors. Small arrows depict potential interactions of a drug with water
(unbound state) or water and protein (bound state).
System Setup.
Given the large size of HIVRT, simulations of the entire protein-ligand complex
are currently impractical. Therefore, a model of the NNRTI binding site was constructed
which incorporated only nearby residues (Figure 3.4). Using the initial crystal structure
coordinates for MKC-442 bound to HIVRT (pdb entry 1rt1),88 a representative model
was constructed by including only those residues within ca. 15 Å of atom C6 of the
HEPT uracil core.
Figure 3.4. HIVRT binding site model surrounded by a 22 Å cap of water. Blue
residues sampled in the MC simulations, red residues rigid, green residues not used.
Crystal structure coordinates, pdb entry 1rt1, from reference 88.
To avoid excessive fragmentation of the protein backbone, a few additional amino acids
were included. Hydrogen atoms were added, and clipped residues were then capped with
acetyl and methylamine groups. Residues with all atoms outside a 10 Å sphere from C6
were kept rigid during the MC simulations. The final system size was 123 protein
residues plus the inhibitor. Specifically, the rigid residues are 91-94A, 109-110A, 161-
178A, 184-185A, 192-197A, 199-205A, 222-224A, 230-232A, 240-242A, 316-317A,
320-321A, 343-349A, 381-383A, 134-135B, 137B, and 140B. The flexible residues are
95-108A, 179-183A, 186-191A, 198A, 225-229A, 233-239A, 318-319A, 136B, and
138B. To impose overall charge neutrality for the system,18 all but one of the rigid Asp,
Lys, Glu, and Arg residues were made neutral. The tautomeric states of His residues in
the binding site were assigned by visual inspection. A residue-based cutoff at 9 Å was
used for the solute−solvent and intrasolute non-bonded interactions. The water−water
cutoff was also at 9 Å, based on the O−O separation. The nevirapine analogs were
treated similarly starting from the coordinates of the X-ray structure of nevirapine bound
to HIVRT (pdb entry 1vrt).103
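The distance-based selection described above can be sketched as follows. The sketch assumes that "within ca. 15 Å" means at least one atom of the residue lies inside the 15 Å sphere, which the text does not state explicitly, and it ignores the few extra residues that were added by hand to avoid backbone fragmentation. The function name and the coordinates in the usage note are hypothetical.

```python
import math

INCLUDE_CUTOFF = 15.0   # Å, residues kept in the binding-site model
FLEXIBLE_CUTOFF = 10.0  # Å, residues sampled in the MC simulations


def classify_residue(atom_coords, center):
    """Classify one residue from the distance of its closest atom to `center`.

    atom_coords: list of (x, y, z) tuples for the atoms of the residue.
    center: (x, y, z) of the reference atom (C6 of the HEPT uracil core).
    """
    closest = min(math.dist(atom, center) for atom in atom_coords)
    if closest <= FLEXIBLE_CUTOFF:
        return "flexible"  # at least one atom inside the 10 Å sphere
    if closest <= INCLUDE_CUTOFF:
        return "rigid"     # included, but every atom lies outside 10 Å
    return "excluded"      # not part of the truncated model
```

For example, a residue whose nearest atom sits 11 Å from C6 would be kept but held rigid, whereas one with any atom at 9 Å would be sampled.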
The initial Cartesian coordinates for each HEPT or nevirapine analog were
generated by analogy to the conformations in the crystal structures of HIVRT with MKC-442
(ref 88) and nevirapine (ref 103) using the XChemEdit program.104, 105 The Z-matrix
connectivity was then graphically assigned, and the results saved both as a PEPZ
database,106 and a Gaussian95 input file.43 The OPLS-AA force field11, 107 was used for
the systems except the partial charges for the inhibitors were determined using the
ChelpG procedure at the HF/6-31G* level.43 Any missing OPLS-AA torsional
parameters were assigned by analogy to existing ones with the exception of two new
torsions, which were fit to results of dihedral angle energy scans at the HF/6-31G* level
for the model compounds methyl benzyl ether and thioanisole, as previously described.11
The OPLS-AA parameters have been developed to reproduce accurately molecular
geometries, torsional energetics, free energies of hydration, enthalpies of vaporization,
and liquid densities for a wide range of model compounds.11
Crystal Structure Choice.
H01 (the parent HEPT) and analog H11 (MKC-442) differ in potency by about
4.6 kcal/mol (Table 3.1); a possible explanation has been suggested by Hopkins et al.
based on an interpretation of crystallographic evidence.88 A difference of ca. 100° in the
χ1 dihedral angle for Tyr181A was found between the structures for H01 (pdb 1rti) and
H11 (pdb 1rt1).88 It was suggested that H11 (R1 = i-Pr) is a more potent compound than
H01 (R1 = Me) because the larger group at R1 sterically forces Tyr181A "up".88 A
favorable aromatic π-stacking interaction can then occur between Tyr181A and the
phenyl ring in the R3 substituent (Table 3.1).88 We believe that this interpretation is
flawed for the following two reasons: (1) No steric clashes are visually apparent when
H11 is docked into the H01 crystal structure (Figure 3.5), and conjugate gradient energy
minimizations reveal no energetically unfavorable steric interactions between the i-Pr
group of H11 and Tyr181A when Tyr181A is "down" as in the parent HEPT (H01) structure. This
suggests that unfavorable steric interactions are not responsible for the "up"
conformational preference observed in the MKC-442 crystal structure.
Figure 3.5. No steric clash is observed between HIVRT side-chain Tyr181A and the i-Pr
group of MKC−442 in the modeled structure using the “down” conformation, which is
only reported for the parent HEPT.
(2) More importantly, an overlay of 16 experimental HIVRT/NNRTI crystal structures
shows Tyr181A in the same "up" conformation in every case, with the lone exception of the
structure for the parent HEPT (Figure 3.6). The 16 experimental structures include six
different inhibitor cores. In fact, the NNRTIs based on nevirapine (green; 3hvt, 1rth,
1vrt),103, 108 HEPT (magenta; 1rt1, 1rt2, 1rti),88 α−APA (red; 1hni, 1vru),103, 109 TIBO
(yellow; 1hnv, 1tvr, 1rev),110, 111 BHAP (grey; 1klm),112 and carboxanilide (cyan; 1rt4,
1rt5, 1rt6, 1rt7)113 could potentially allow Tyr181A to adopt the "down" conformation
given that no steric clashes would result, yet this is not reported. It should be noted that a
change in χ1 for Tyr181A of ca. 100° would be a rare event in computer simulations of
the present lengths and was not observed. Therefore, given the consistency with which
Tyr181A is observed to be in the "up" conformation (Figure 3.6), pdb entry 1rt1 (ref 88) was
chosen as the starting point for all simulations of HEPT analogs.
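The χ1 comparison above rests on a standard dihedral-angle calculation, sketched below for four atom positions (for a tyrosine χ1 these would be N, Cα, Cβ, and Cγ). This is a generic geometry routine, not code from the study; the helper names are hypothetical, and the sign of the returned angle depends on the convention chosen, so only the magnitude of an angle difference should be read from it.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def angle_difference(a, b):
    """Smallest absolute difference between two angles given in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)
```

Applying `dihedral` to the Tyr181A atoms of two aligned structures and passing the results to `angle_difference` gives the kind of ca. 100° separation discussed in the text; the periodic wrap matters when the two angles straddle ±180°.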
Figure 3.6. Experimental conformation of Tyr181A for 16 HIVRT non-nucleoside
inhibitor complexes: nevirapine (green), HEPT (magenta), BHAP (grey), α−APA (red),
TIBO (yellow), and carboxanilide (cyan) analogs. The complexes were aligned by
minimizing the rmsd between Cα carbons at residues Leu100A, Lys103A, Tyr181A, and
Val106A. See text for pdb references.
Monte Carlo Simulations.
Each protein−inhibitor complex was subjected to 50 steps of conjugate gradient
energy minimization, using a distance-dependent dielectric constant of 4 (ε = 4r), to relax
the crystal structure with the force field prior to the MC simulations. For the MC
simulations, a 22 Å water cap was used containing 851 (bound) and 1485 (unbound)
TIP4P water molecules.47 All HIVRT side chains with an atom within ca. 10 Å from the
defined center of the water cap were sampled, the protein backbone was fixed, and each
inhibitor was fully flexible. Bond lengths for the protein remained fixed after the initial
energy minimizations. A protein residue−inhibitor list, which was kept constant during
the entire simulation, was determined for each complex during the initial solvent
equilibration stage of the simulation. A MC move for a side chain was attempted every
10 configurations, while a move for the inhibitor was attempted every 56 configurations.
All remaining moves were for solvent molecules. Solvent−solvent neighbor lists were
also used, and the maximum number of internal coordinates to be varied for an attempted
move was limited to 30. All MC simulations and energy minimizations were performed
with the MCPRO program.114 The computations were executed on a PC cluster with ca.
70 processors running Linux. The complete processing of one inhibitor (bound and
unbound) requires 2.5 days using one 800 MHz PentiumIII processor. Thus, ca. 300
inhibitors could be processed in one week on a PC cluster with 100 top-end processors.
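The move schedule described above can be made concrete with a short Python sketch. It is an illustration only and does not reproduce MCPRO internals: in particular, the text does not say which move is attempted when a configuration index is a multiple of both 10 and 56, so the tie-break below (inhibitor first) is an assumption.

```python
from collections import Counter

SIDE_CHAIN_INTERVAL = 10  # a side-chain move every 10 configurations
INHIBITOR_INTERVAL = 56   # an inhibitor move every 56 configurations


def move_type(config_index):
    """Return the kind of move attempted at a given configuration index.

    Assumed tie-break: when both intervals coincide, the inhibitor move wins.
    """
    if config_index % INHIBITOR_INTERVAL == 0:
        return "inhibitor"
    if config_index % SIDE_CHAIN_INTERVAL == 0:
        return "side chain"
    return "solvent"


def count_moves(n_configs):
    """Tally attempted move types over configurations 1..n_configs."""
    return Counter(move_type(i) for i in range(1, n_configs + 1))


# One full period of the combined schedule (lcm of 10 and 56).
counts = count_moves(280)
```

Over one 280-configuration period this gives 5 inhibitor moves, 27 side-chain moves, and 248 solvent moves, which shows that nearly 90% of the attempted moves go to the solvent.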
Bound Simulations.
Each MC simulation for a protein−inhibitor complex consisted of 1 million
configurations of solvent-only equilibration, 10 million configurations of full
equilibration, and 10 million configurations of averaging. In general, convergence of the
results for complexes is less problematic than for the simulations of the inhibitors alone
in water. This probably arises because, in the simulations of the complexes, the
ligands are more conformationally restricted than in pure water and only about one-half as
many water molecules are used as for the unbound inhibitors.
Unbound Simulations Using an Annealing Protocol.
Initial results for the inhibitors alone in water revealed that the solute−water
Coulombic interaction energy showed the slowest convergence among the descriptors
and that it was not well converged with MC simulations of the same length as for the
complexes. Surprisingly, the same average energies were not obtained when the
simulations were initiated from two similar yet distinct geometries. After additional
testing, an annealing protocol (Figure 3.7) was developed to enhance the convergence.
Each unbound MC simulation consisted of 1 million configurations of solvent-only
equilibration at the experimental temperature of 37 °C or 25 °C. Then, 5 million
configurations of equilibration ensued in which only the water and the dihedral angles of
the inhibitor were sampled. The MC acceptance rate for the inhibitor was also increased
through a local heating option in MCPRO with the temperature specified to be 727 °C
(1000 K) for the attempted moves of the inhibitor. This was followed by an additional 5
million configurations of full equilibration at the normal temperature, followed by 10
million configurations of averaging. The latter three processes were then repeated for a
total of five cycles (Figure 3.7). The local heating is applied only to the inhibitor and no
bonds or angles are sampled during this stage. The focus is on increased conformational
sampling for the inhibitor. Since local heating is only specified for the inhibitor, the bulk
water structure is largely unaffected during the heating phase, and the dihedral-only
sampling ensures that bond lengths and angles do not have to be cooled upon
reequilibration.
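The annealing protocol can be laid out as an explicit stage list, which makes the total simulation length easy to check. The sketch below only encodes the stage lengths and temperatures given in the text; the function name and the stage labels are hypothetical, and "normal" stands for the experimental temperature (37 °C or 25 °C).

```python
MILLION = 1_000_000


def annealing_schedule(cycles=5):
    """Return (stage, configurations, temperature) tuples for one unbound run."""
    schedule = [("solvent-only equilibration", 1 * MILLION, "normal")]
    for _ in range(cycles):
        # Water plus inhibitor dihedrals only; inhibitor moves heated locally.
        schedule.append(("heated dihedral-only equilibration", 5 * MILLION,
                         "1000 K for inhibitor moves"))
        schedule.append(("full equilibration", 5 * MILLION, "normal"))
        schedule.append(("averaging", 10 * MILLION, "normal"))
    return schedule


schedule = annealing_schedule()
total_configs = sum(n for _, n, _ in schedule)
averaging_configs = sum(n for stage, n, _ in schedule if stage == "averaging")
```

For five cycles this amounts to 101 million configurations per unbound inhibitor, of which 50 million contribute to the averages.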
Figure 3.7. Annealing protocol showing heating, equilibration, and averaging portions
used in the MC simulations for the unbound inhibitors.
Convergence of the solute−solvent ECoul was greatly improved, as illustrated in
Figure 3.8, using the new protocol. For Figure 3.8, simulations were initiated from two
alternative geometries of all 20 HEPT analogs; one was based on the 1rt1 structure and
the other on the 1rti structure.88 The annealing results for the HEPT compounds clearly
show that in five cycles acceptable convergence is obtained independent of small
differences in the starting geometry of the unbound inhibitors. It may be noted that the
annealing protocol formally corresponds to averaging the MC results from five
independent simulations of 10 million configurations each. The importance of
well-converged results cannot be overstated if the LR or ELR equations are to have good
predictive value. For example, 5 kcal/mol of noise in the unbound ECoul value can easily
translate to 1−3 kcal/mol of noise in the predicted ∆Gb with usual values for β in eqs 1.15
or 1.16.
Figure 3.8. Convergence of the inhibitor-water Coulombic energy for the HEPT data set
after 10 million (1 cycle) and 50 million (5 cycles) configurations of averaging using the
annealing protocol. Each inhibitor was simulated twice starting from one of two different
conformations obtained from a minimization in either the 1rt1 or 1rti crystal structure.
Free Energy Perturbations.
To help in interpreting the results for nevirapine analogs, a FEP calculation was
performed to determine the difference in free energy of hydration (∆∆Ghyd) between
model 3° and 2° amides. Specifically, N,N-dimethylacetamide (DMA) was converted to
N-methylacetamide (NMA) using well-established methods.99 No internal degrees of
freedom were sampled, so ∆∆Ghyd could be computed simply by performing one
mutation in water.13, 99 The FEP calculations were performed for the solute in a periodic
cube containing 500 TIP4P water molecules. Both solute−solvent and solvent−solvent
cutoffs were at 10 Å based on the separations of amide nitrogens and water oxygens.
Each of the 10 windows consisted of 6 million configurations of equilibration, followed
by an additional 5 million configurations of averaging. The potential functions for the
amides were the same as for the HEPT and nevirapine inhibitors, OPLS-AA with HF/6-
31G* ChelpG atomic charges.
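The text does not spell out the FEP working equation. The sketch below assumes the standard Zwanzig exponential-averaging formula, ∆G = −kT ln⟨exp(−∆E/kT)⟩, evaluated in each window and summed over windows. The energy differences and the temperature of 298.15 K are hypothetical inputs used only to exercise the functions; they are not results from the DMA → NMA perturbation.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)


def window_free_energy(delta_energies, temperature):
    """Zwanzig estimate for one window: -kT ln < exp(-dE / kT) >.

    delta_energies: energy differences (kcal/mol) between the perturbed and
    reference states, sampled in the reference ensemble.
    """
    kT = K_B * temperature
    boltzmann = [math.exp(-dE / kT) for dE in delta_energies]
    return -kT * math.log(sum(boltzmann) / len(boltzmann))


def total_free_energy(windows, temperature):
    """Sum the per-window free energy changes along the coupling coordinate."""
    return sum(window_free_energy(w, temperature) for w in windows)


# Hypothetical data: 10 windows, each with a constant dE of 0.1 kcal/mol.
windows = [[0.1, 0.1, 0.1] for _ in range(10)]
dG = total_free_energy(windows, temperature=298.15)
```

When every sample in a window has the same ∆E, the exponential average reduces to that value, so the ten windows here sum to 1.0 kcal/mol; with fluctuating samples the estimate falls below the plain mean of ∆E.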
Experimental Activities.
The experimental EC50 activities at 37 °C reported for the HEPT series94-97 and
the IC50 values at 25 °C for nevirapine analogs98 were converted into approximate free
energies of binding (∆Gexptl) by eq 3.2 as listed in Tables 3.1 and 3.2. Although not
formally equivalent, relative activities should correspond to relative free energies of
binding for closely related series of inhibitors.115
$$\Delta G_{\text{exptl}} \approx RT \ln(\text{Activity}) \tag{3.2}$$
To correlate both data sets simultaneously, an offset might be necessary, though it turned
out not to be needed. Measured activities from the same laboratory116 indicate that
nevirapine (N10) is more potent than the parent HEPT (H01) by ca. 2.8 kcal/mol in
general agreement with the difference of 2.3 kcal/mol from the data in Tables 3.1 and 3.2.
In another study,117 MKC-442 (H11) was reported to be more potent than nevirapine
(N10) by about 1.0 kcal/mol, while the data in Tables 3.1 and 3.2 imply 2.2 kcal/mol.
The experimental HEPT activities span a range of 7.1 kcal/mol, which is twice as large as
the range for the nevirapine analogs (Tables 3.1 and 3.2). Uncertainties were not
reported for the experimental data, but they are typically at least 0.5 kcal/mol.
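The conversion ∆Gexptl ≈ RT ln(Activity) is easy to reproduce. The sketch below assumes that activities are expressed in mol/L before taking the logarithm and uses R = 0.0019872 kcal/(mol K); both choices are assumptions, though they recover the tabulated values for the parent HEPT (H01) and nevirapine (N10). The function name is hypothetical.

```python
import math

R = 0.0019872  # gas constant in kcal/(mol K)


def activity_to_free_energy(activity_uM, temp_celsius):
    """Approximate binding free energy (kcal/mol) from an EC50 or IC50 in µM."""
    temperature = temp_celsius + 273.15
    return R * temperature * math.log(activity_uM * 1e-6)


dG_H01 = activity_to_free_energy(7.0, 37.0)    # parent HEPT, EC50 = 7.0 µM
dG_N10 = activity_to_free_energy(0.084, 25.0)  # nevirapine, IC50 = 0.084 µM
offset = dG_N10 - dG_H01
```

The two values come out near −7.32 and −9.65 kcal/mol, and their difference of about −2.3 kcal/mol is the figure compared with the same-laboratory measurement in the text.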
Results and Discussion.
Regression Equations.
Correlations were derived using the statistical software package JMP.118 Eq 3.3
shows the best four-descriptor equation obtained by fitting the experimental activities of
the 40 compounds using the generic regression, eq 3.1.
$$\Delta G_{\text{calcd}} = -0.94\,\Delta HB_{\text{total}} + 0.30\,E_{\text{XX-LJ}} + 0.0085\,\Delta PHOB_{\text{area}} - 2.8\,(2^{\circ}\text{-amide}) + 4.6 \tag{3.3}$$
∆HBtotal is the change in the total number of hydrogen bonds for the inhibitor; a hydrogen
bond is defined here by a distance of less than 2.5 Å between an N, O, or S atom and a
hydrogen attached to a heteroatom.26 EXX-LJ is the ligand−protein Lennard-Jones
interaction energy, ∆PHOBarea is the change in hydrophobic SASA upon binding, and 2°-
amide is an indicator variable (1 or 0) for compounds with or without a 2° amide
functional group. The contributions for each term are recorded in Tables 3.3 and 3.4.
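The geometric hydrogen-bond criterion lends itself to a simple counting routine. The sketch below is an assumption-laden illustration rather than the MCPRO implementation: it takes the acceptor atoms from one molecule and the polar hydrogens from another, so that covalently bonded N−H or O−H pairs are never tested against each other, and it uses only the 2.5 Å distance test stated in the text.

```python
import math

HBOND_CUTOFF = 2.5                 # Å, distance criterion from the text
ACCEPTOR_ELEMENTS = {"N", "O", "S"}


def count_hydrogen_bonds(acceptors, polar_hydrogens):
    """Count acceptor...H contacts shorter than the cutoff.

    acceptors: list of (element, (x, y, z)) for atoms of one molecule.
    polar_hydrogens: list of (x, y, z) for hydrogens attached to
        heteroatoms in a different molecule.
    """
    count = 0
    for element, acceptor_xyz in acceptors:
        if element not in ACCEPTOR_ELEMENTS:
            continue  # carbon and other atoms are not acceptors here
        for hydrogen_xyz in polar_hydrogens:
            if math.dist(acceptor_xyz, hydrogen_xyz) < HBOND_CUTOFF:
                count += 1
    return count
```

Running the count once with the inhibitor as acceptor and once with the inhibitor as donor, in both the bound and unbound simulations, would give the totals whose difference defines ∆HBtotal.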
Table 3.3. Individual Contributions to the Total Computed Free Energies of Binding
for HEPT Analogs with HIV-1 RT.
No. ∆Gexptl (total) ∆Gcalcd (total) ∆HBtotal EXX-LJ ∆PHOBarea 2°-amide
H01 −7.32a −7.33 3.24 −13.70 −1.47 0.00
H02 −7.73a −10.05 2.24 −14.93 −1.96 0.00
H03 −9.20a −9.62 2.05 −14.39 −1.87 0.00
H04 −8.06a −7.65 2.29 −13.04 −1.48 0.00
H05 −10.01a −10.11 1.67 −15.41 −0.96 0.00
H06 −12.16a −11.89 1.88 −16.79 −1.56 0.00
H07 −8.03a −8.04 1.46 −12.69 −1.39 0.00
H08 −5.43a −6.75 1.64 −11.89 −1.09 0.00
H09 −10.96a −9.92 2.23 −14.59 −2.15 0.00
H10 −11.24a −10.30 2.18 −14.68 −2.39 0.00
H11 −11.89b −10.07 2.34 −14.61 −2.39 0.00
H12 −9.93a −10.47 1.99 −14.87 −2.18 0.00
H13 −6.58c −6.59 3.17 −12.77 −1.58 0.00
H14 −5.78c −6.83 3.48 −13.32 −1.58 0.00
H15 −9.35d −9.82 3.24 −14.79 −2.86 0.00
H16 −11.19d −10.94 2.96 −15.29 −3.19 0.00
H17 −12.16d −11.20 3.10 −15.60 −3.28 0.00
H18 −11.68a −11.30 1.46 −16.05 −1.29 0.00
H19 −5.11a −4.86 1.68 −10.56 −0.58 0.00
H20 −8.40a −10.07 1.62 −14.35 −1.93 0.00
aReference 94. bReference 95. cReference 96. dReference 97. ∆Gexptl ≈ RT ln (Activity). ∆Gcalcd
obtained from eq 3.3. Energies in kcal/mol.
Table 3.4. Individual Contributions to the Total Computed Free Energies of Binding
for Nevirapine Analogs with HIV-1 RT.
No. ∆Gexptla (total) ∆Gcalcd (total) ∆HBtotal EXX-LJ ∆PHOBarea 2°-amide
N01 −9.42 −7.78 2.08 −12.85 −1.60 0.00
N02 −9.24 −9.05 2.15 −13.47 −2.32 0.00
N03 −9.31 −8.16 2.44 −13.62 −1.57 0.00
N04 −8.35 −8.93 2.08 −13.32 −2.28 0.00
N05 −8.19 −7.87 2.54 −13.46 −1.54 0.00
N06 −7.81 −8.53 2.23 −13.28 −2.07 0.00
N07 −8.67 −7.76 3.58 −12.19 −0.91 −2.82
N08 −10.17 −9.32 3.71 −13.26 −1.54 −2.82
N09 −9.58 −8.42 3.92 −13.15 −0.97 −2.82
N10 −9.65 −9.98 3.60 −13.62 −1.73 −2.82
N11 −8.19 −8.72 2.67 −13.87 −2.10 0.00
N12 −8.66 −8.30 2.24 −13.37 −1.77 0.00
N13 −6.77 −7.86 2.77 −13.38 −1.84 0.00
N14 −6.57 −6.41 3.36 −13.21 −1.15 0.00
N15 −9.49 −10.56 3.22 −13.67 −1.88 −2.82
N16 −8.28 −8.09 2.57 −13.74 −1.52 0.00
N17 −7.54 −8.05 5.53 −13.85 −1.50 −2.82
N18 −8.05 −8.99 4.32 −13.81 −1.27 −2.82
N19 −7.55 −7.04 2.85 −13.17 −1.31 0.00
N20 −8.66 −8.75 3.42 −12.95 −0.99 −2.82
aReference 98. ∆Gexptl ≈ RT ln (Activity). ∆Gcalcd obtained from eq 3.3. Energies in kcal/mol.
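The columns of Tables 3.3 and 3.4 can be cross-checked against the total: each ∆Gcalcd should equal the sum of the four tabulated contributions plus the constant of eq 3.3. The sketch below assumes that constant is 4.6 kcal/mol and uses the rows for H01 and N10; the function name is hypothetical.

```python
CONSTANT = 4.6  # kcal/mol, constant term assumed from eq 3.3


def total_from_contributions(hbond, lennard_jones, phob_area, amide,
                             constant=CONSTANT):
    """Sum the per-term free energy contributions (kcal/mol) and the constant."""
    return hbond + lennard_jones + phob_area + amide + constant


# Contributions copied from Table 3.3 (H01) and Table 3.4 (N10).
dG_H01 = total_from_contributions(3.24, -13.70, -1.47, 0.00)
dG_N10 = total_from_contributions(3.60, -13.62, -1.73, -2.82)
```

Both sums agree with the tabulated ∆Gcalcd values (−7.33 and −9.98 kcal/mol) to within the rounding of the individual terms.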
For the 40 compounds, the correlation coefficient r2 of 0.75 reflects good accord
between theory and experiment (Figure 3.9). Cross validation by the leave-one-out
procedure yields a q2 of 0.69 and implies reasonable predictive power for compounds not
included in the original data set. The computed activities show a rmsd of 0.94 kcal/mol
in comparison with experiment and an average unsigned error of only 0.69 kcal/mol. The
uncertainties in the experimental data and in the convergence of the MC results are
estimated to be at this level. All of the descriptors in eq 3.3 are significant. The Prob > F
values, i.e., the probabilities of obtaining a larger F ratio (regression model mean
square/error mean square) by chance, are small: ∆HBtotal (0.0005),
EXX-LJ (<0.0001), ∆PHOBarea (0.0037), and 2°-amide (<0.0001). No systematic deviation in
the predicted ∆Gcalcd values was found; the computed residuals show random scatter.
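The error statistics quoted above can be rechecked from the ∆Gexptl and ∆Gcalcd columns of Tables 3.3 and 3.4. The Python sketch below is an illustration rather than the JMP analysis itself. It assumes that the reported rmsd corresponds to a root-mean-square error with 35 degrees of freedom (40 compounds minus four coefficients and the constant), which is an interpretation not stated in the text, and it uses lower-bound entries such as H08 at their tabulated values.

```python
import math

# (exptl, calcd) pairs in kcal/mol from Tables 3.3 (H01-H20) and 3.4 (N01-N20).
PAIRS = [
    (-7.32, -7.33), (-7.73, -10.05), (-9.20, -9.62), (-8.06, -7.65),
    (-10.01, -10.11), (-12.16, -11.89), (-8.03, -8.04), (-5.43, -6.75),
    (-10.96, -9.92), (-11.24, -10.30), (-11.89, -10.07), (-9.93, -10.47),
    (-6.58, -6.59), (-5.78, -6.83), (-9.35, -9.82), (-11.19, -10.94),
    (-12.16, -11.20), (-11.68, -11.30), (-5.11, -4.86), (-8.40, -10.07),
    (-9.42, -7.78), (-9.24, -9.05), (-9.31, -8.16), (-8.35, -8.93),
    (-8.19, -7.87), (-7.81, -8.53), (-8.67, -7.76), (-10.17, -9.32),
    (-9.58, -8.42), (-9.65, -9.98), (-8.19, -8.72), (-8.66, -8.30),
    (-6.77, -7.86), (-6.57, -6.41), (-9.49, -10.56), (-8.28, -8.09),
    (-7.54, -8.05), (-8.05, -8.99), (-7.55, -7.04), (-8.66, -8.75),
]
N_FITTED = 5  # four descriptor coefficients plus the constant (assumption)

residuals = [calcd - exptl for exptl, calcd in PAIRS]
mean_unsigned_error = sum(abs(r) for r in residuals) / len(residuals)
sum_sq = sum(r * r for r in residuals)
rms_plain = math.sqrt(sum_sq / len(residuals))             # divide by N
rms_dof = math.sqrt(sum_sq / (len(residuals) - N_FITTED))  # divide by N - 5
```

Under these assumptions the average unsigned error comes out close to 0.69 kcal/mol and the degrees-of-freedom-corrected value close to 0.94 kcal/mol, while the plain root-mean-square deviation is somewhat smaller.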
Figure 3.9. Predicted binding affinities (∆Gcalcd) using eq 3.3 vs. experimental activities
(∆Gexptl) for 20 HEPT and 20 nevirapine analogs with HIVRT.
The four descriptors in eq 3.3 make physical sense: (1) ∆HBtotal is always
negative; water is the best hydrogen-bonding medium, so there is an inevitable loss in
number of hydrogen bonds for an inhibitor upon binding. The coefficient implies that the
loss of each hydrogen bond costs 0.94 kcal/mol in free energy of binding. (2) The EXX-
LJ term implies that a good geometrical fit between the ligand and the protein is also
important. Favorable packing contributions to binding are contained in this term as well
as any unfavorable steric interactions. The change in ligand−water Lennard-Jones energy
(∆ESX-LJ) is highly correlated with EXX-LJ (greater loss in ∆ESX-LJ corresponds with
greater gain in EXX-LJ), so its inclusion does not improve the regression. (3) The
∆PHOBarea term is also negative; SASA for a ligand is always lost upon binding. The
associated coefficient is positive so that the removal of hydrophobic surface area upon
binding is favorable for the free energy, which simply reflects the hydrophobic effect. (4)
Finally, as described in the next section, a 2°-amide indicator is needed to account for
deficiencies in the partial charges.
The separate data sets yield modified optimal fits. For the HEPT analogs alone,
an r2 of 0.83 is obtained with eq 3.4.
$$\Delta G_{\text{calcd}} = -1.00\,\Delta HB_{\text{total}} + 0.31\,E_{\text{XX-LJ}} + 0.0112\,\Delta PHOB_{\text{area}} + 5.6 \tag{3.4}$$
All descriptors in eq 3.3 are still significant except no 2° amides are present for HEPT
analogs so this descriptor is eliminated. For the nevirapine data set alone, however, only
the ∆HBtotal and 2°-amide descriptors are significant. A fit with these two descriptors
plus a constant yields an r2 of 0.58 (eq 3.5). In this case, the lower r2 may reflect
challenges associated with the compressed range of the experimental activities in
comparison with the data for the HEPT series.
$$\Delta G_{\text{calcd}} = -1.10\,\Delta HB_{\text{total}} - 2.44\,(2^{\circ}\text{-amide}) - 11.2 \tag{3.5}$$
Binding affinity fits with the traditional ELR approach (eq 1.16) were also made
for comparison. A reasonable r2 of 0.56 and rmsd of 1.24 kcal/mol are obtained with eq
1.16 augmented by the 2°-amide indicator and a constant. Nevertheless, eq 3.3 is clearly
superior with the same number of descriptors. It may be noted that eq 3.3 does not
include a term that obviously reflects differences in flexibility for the inhibitors. A
rotatable-bond count was considered, but was not found to be statistically significant.
For more diverse sets of ligands, it is likely that such a term may be needed to reflect the
entropic penalty for loss of conformational freedom upon binding.119
2°-Amide Indicator.
During the fitting, it was discovered that acceptable correlations could not be
obtained for the nevirapine analogs unless an indicator variable was included for 2°
amides. Suspecting that the use of the 6-31G* ChelpG charges was overestimating
hydration differences in the unbound state, the FEP calculation was performed for the
model 3° → 2° amide conversion of DMA → NMA in water (Figure 3.10). The
computed ∆∆Ghyd of –2.47 ± 0.24 kcal/mol is too negative by ca. 1 kcal/mol in comparison
with the experimental value of –1.53 kcal/mol.31 By analogy, nevirapine analogs with 2°
amides would be expected to be too well hydrated in the unbound state and thus pay an
artificially high desolvation penalty for binding. Thus the indicator coefficient of −2.8 in
eq 3.3 has the correct sign, though the magnitude is larger than from the simple model, as
clarified below.
Figure 3.10. Plot of ∆G (kcal/mol) vs. λ for the perturbation of N,N-dimethylacetamide
to N-methylacetamide. The non-bonded parameters and geometries were scaled using the
coupling coordinate λ.
It should be noted that obtaining correct relative free energies of hydration for
amines and amides has been a long-standing problem in the computational community.27,
28, 33, 34, 120 Successful parameters for 1°, 2°, and 3° aliphatic, cyclic, and aromatic
amines have now been reported,120 and parallel improvements for amides have recently
been achieved.121
Analysis of Binding Trends – HEPT Series.
Eq 3.3 presents a straightforward framework for understanding the trends in the
observed activities. For the HEPT analogs in Table 3.3, the ranges for the free energy
contributions from the hydrogen-bond loss, protein−inhibitor Lennard-Jones energy, and
burial of hydrophobic surface area are 1.9, 6.2, and 2.7 kcal/mol, respectively. There is
variation in the R2 side chain (Table 3.1) and the side chains with no oxygen atoms (H07,
H08, H19, and H20) show smaller desolvation penalties (Table 3.3). In the simulations
of the complexes, R2 is in a channel that contains some water and the terminal hydroxyl
in, for example, H01 is involved in hydrogen bonds with water or the backbone carbonyl
of Leu 234A. So, the range of desolvation penalties is not as great as might have been
expected. The larger analogs then benefit from more favorable Lennard-Jones
interactions (H05, H06, H16, H17, H18), which is the dominant discriminator. The
HEPT derivatives with R3 as 3,5-dimethyl-thiophenyl (H15, H16, H17) or with isopropyl
groups at R1 (H06, H10, H11, H17) get an additional boost for burial of more
hydrophobic surface area than their less substituted analogs. The factors combine such
that H06, H17, and H18 are observed and predicted to be in the most active group. Some
comments can also be made on specific pairs of inhibitors with small structural, but large
activity differences.
H08 vs. H07. The HEPT analogs H08 (R2 = Me) and H07 (R2 = Et) differ only by a Me
group yet have an experimental activity difference, ∆∆Gexptl, of more than 2.6 kcal/mol
(Table 3.1). The computed relative free energy of binding (∆∆Gcalcd) is 1.3 kcal/mol, in
qualitative agreement with experiment. In Table 3.3, the computed free energy penalties
for lost hydrogen bonds (∆HBtotal) are similar, 1.46 kcal/mol for H07 and 1.64 kcal/mol
for H08. However, the larger Et group of H07 improves the hydrocarbon packing in the
binding pocket (Figure 3.11) and yields a more favorable EXX-LJ contribution by about
0.8 kcal/mol over H08.
Figure 3.11. Two water molecules (orange) observed in simulations of compound H08
(magenta, Me analog) with HIVRT are displaced by compound H07 (green, Et analog).
Additional benefit for H07 comes from the burial of more hydrophobic surface area (−1.4
kcal/mol) than for H08 (−1.1 kcal/mol). Given the structure in Figure 3.11, these results
for the descriptors are reasonable. In addition, two water molecules are displaced from
the binding pocket upon expansion of the methyl group in H08 to ethyl in H07 (Figure
3.11). In general, this should be an entropically favorable process since the bound water
molecules likely gain translational and rotational freedom upon transfer into the bulk
solvent.119 The free energy gain for displacing one bound water molecule has been
estimated to be as high as 2 kcal/mol at 300 K.122 In contrast, homologation of H03 to
H02 is reported to diminish activity (Table 3.1) but is predicted here to enhance
activity (Table 3.3). A steric problem is not found for H02 here, which is consistent with
the observed accommodation of the even larger benzyloxy group for H05.
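The pairwise analysis used for H08 vs. H07 amounts to differencing the per-term contributions of two compounds. A minimal Python sketch follows, using the Table 3.3 rows for H07 and H08; the dictionary keys and the function name are hypothetical, and the constant and 2°-amide terms are omitted because they cancel for this pair.

```python
TERMS = ("hbond", "lennard_jones", "phob_area")

# Free energy contributions in kcal/mol, copied from Table 3.3.
H07 = {"hbond": 1.46, "lennard_jones": -12.69, "phob_area": -1.39}
H08 = {"hbond": 1.64, "lennard_jones": -11.89, "phob_area": -1.09}


def decompose(a, b):
    """Return the per-term differences (a - b) and their sum."""
    diffs = {term: a[term] - b[term] for term in TERMS}
    return diffs, sum(diffs.values())


diffs, ddG = decompose(H07, H08)
```

For this pair the Lennard-Jones term supplies about −0.8 kcal/mol of the computed −1.3 kcal/mol preference for H07, with smaller contributions from hydrophobic burial and the hydrogen-bond term. The same function can be applied to any other pair of rows, such as H01 and H14.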
H14 vs. H01. H14 (R3 = OPh) and the parent HEPT, H01 (R3 = SPh) only differ in the
atom linking the phenyl ring to the uracil core, yet H01 is more potent than H14 by 1.5
kcal/mol (Table 3.1). The computed results are again in qualitative accord with the
difference diminished to 0.5 kcal/mol. Examination of the components in Table 3.3
shows that H14 has a computed ∆G(∆HBtotal) of 3.48 kcal/mol compared to 3.24 for H01.
Though an ether oxygen is expected to be better hydrated in the unbound state than a
thioether sulfur, these atoms are hindered in the bisaryl analogs H01 and H14, so there is
only a small differential. However, the more favorable ∆G(EXX-LJ) contribution for H01
(–13.70) compared to H14 (–13.32) makes sense given that sulfur is more polarizable
than oxygen and has a larger Lennard-Jones ε (0.25 vs. 0.14 kcal/mol).11 Finally, the
∆G(∆PHOBarea) values are essentially the same for H01 (–1.47 kcal/mol) and H14 (−1.58
kcal/mol) reflecting similar burial of hydrophobic surface area. Thus, the greater activity
of the sulfur analog H01 is predicted to come primarily from better van der Waals
interactions with some help from a smaller desolvation penalty.
Analysis of Binding Trends – Nevirapine Series.
For the nevirapine series, the energy ranges are 3.5, 1.7, and 1.4 kcal/mol for the
desolvation penalty, protein−inhibitor Lennard-Jones interactions, and burial of
hydrophobic surface area contributions (Table 3.4). The compressed ranges are
consistent with the smaller variation in activities (Figure 3.9). The small ranges for the
latter two effects are also consistent with the diminished differences in total size and
hydrophobic surface area; i.e., the ranges of SASA and FOSA values are 468−530 and
115−275 Å2 for the nevirapines and 448−648 and 66−409 Å2 for the HEPT analogs. The
dominant terms then become desolvation and the 2°-amide indicator. Thus, the
inhibitors with more polar side chains are less active, i.e., N14, N17, and N18 (Table
3.2).
However, the experimental results for the 2° vs. 3° amide analogs are
mixed: N08 and N10 are experimentally more active than their 3°
homologs, N06 and N11, by more than a factor of 10 (Table 3.2), while the 2° N07 is
reported to be less active than its 3° derivative N01 by a factor of 3.5, and another pair
with R2 = Et and R3 = 2,3-dimethyl is reported to have the 3° compound more active
than the 2° by a factor of 2 (ref 98). In the crystal structure for nevirapine (N10) with
HIVRT,103 there are water molecules hydrogen-bonded to both pyridine nitrogens and
the amide carbonyl, though there is no hydrogen bond for the amide NH. The simulation
results typically have one water molecule hydrogen-bonded to a pyridine nitrogen, but
there is no water molecule within hydrogen bonding range of the amide carbonyl. Thus,
the 2° amide fragment is not well-accommodated in any event, and in the absence of
another factor, the 2° amides should not be so competitive with the 3° analogs.
The missing factor appears to be a favorable NH-aryl π-type hydrogen bond for
the 2° amides with the phenyl ring of Tyr188A. Though this has not been specifically
noted in the crystallographic studies, it is illustrated in Figure 3.12.103, 108
Figure 3.12. Top – computed snapshots of nevirapine (N10) and N-methyl nevirapine
(N11) with Tyr188A from the MC simulations. Bottom – optimized structures of model
2° and 3° amides, N-methylacetamide and N,N-dimethylacetamide, with benzene. The
net interaction energy is shown along with the shortest distances to aromatic carbons.
The structures are shown for N10 and N11 with Tyr188 from the last configuration of
both MC runs, which is representative. For comparison, the optimal structures and
interaction energies using the OPLS-AA force field for the complexes of the model
amides, cis-NMA and DMA, with benzene are also shown. The shortest distance
between the amide NH and an aromatic carbon is only 0.26 Å longer for N10 than cis-
NMA and a comparably attractive interaction is indicated. The longer distance is
reasonable since the optimized NMA-benzene structure is effectively at a temperature of
0 K; it may also be noted that the interaction energy for trans-NMA with benzene is
somewhat more attractive, −5.55 kcal/mol. For DMA and benzene, the π-type hydrogen
bond is lost and the attraction drops nearly 3 kcal/mol. The shortest distance between the
N-methyl carbon and a ring carbon of Tyr188 is now 3.4 Å, which is 0.2 Å shorter than in
the optimal DMA-benzene complex. In the crystal structure for N10 with HIVRT,103 the
shortest distance between the amide N (coordinates are not given for the H) and the
Tyr188 ring carbons is 3.54 Å for CD2, while the corresponding distance for the
computed structure in Figure 3.12 is 3.24 Å. Thus, we propose that the binding of the 2°
amides in the nevirapine series benefits significantly from a π-type hydrogen bond with
Tyr188. This factor coupled with the overestimate of the desolvation energy of 2°
amides is responsible for the magnitude and significance of the 2°-amide indicator in eq
3.3. Such strong π-type hydrogen bonds could be included in the hydrogen bond counts
in the future. In support of this analysis, it is known that the Y188C mutant of HIVRT is
100 to 1000-fold less sensitive to nevirapine (N10) than the wild-type protein.123 The
decrease in activity for 3° analogs such as N11 should not be as severe, but this has not
been studied to our knowledge.
One final point for the nevirapines is that N13 (R2 = t-Bu) is observed to have low
activity (Table 3.2). This is the only compound in this series with a tertiary substituent at
R2, and not surprisingly, the hydration of the proximal pyridine nitrogens is affected. The
effect is actually not great for N13 unbound in water; it is computed to accept an average
of 3.0 hydrogen bonds from water, which is just a little less than the 3.2−3.6 for 3°
amides N01−N06. However, in the complex with HIVRT, the bulkier tert-butyl group
displaces the water molecule from the pocket near the pyridine nitrogens. Both the
hydration of a pyridine nitrogen and the backbone of Lys101A are adversely affected.
This is illustrated in Figure 3.13 by contrasting representative configurations from the
MC simulations of the complexes for N13 and N01. The energetic penalty for the net
loss of the hydrogen-bonding with the pyridine nitrogen is about 0.7 kcal/mol in
comparing N13 with N01 in Table 3.4. Eq 3.3 does not obviously reflect the penalty for
the poorer solvation of Lys101A, which may account for N13 being predicted to be too
active by 1 kcal/mol.
Figure 3.13. A water-mediated hydrogen bond is consistently observed between N01 (Et
analog) and Lys101A that is not observed in the MC simulations of N13 (t-Bu analog)
with HIVRT.
Conclusion.
The results of the MC simulations presented in this chapter revealed three
physically reasonable parameters that control binding for two series of inhibitors with
HIVRT: loss of hydrogen bonds with the inhibitor upon binding is unfavorable, burying
hydrophobic surface area of the inhibitor is favorable, and a good geometrical match
between the inhibitor and the protein is important. The best regression equation that was
generated (eq 3.3) reveals a strong correlation with experimental activities (Figure 3.9, r2
= 0.75) and the cross-validated q2 of 0.69 implies reasonable predictive power for
compounds not included in the original data set. Given the comparatively large size of
the data set (40 compounds), the results provide strong support for the utility of the ELR
method. On the technical side, convergence of the results for the unbound inhibitors in
water was carefully investigated and led to the development of an effective annealing
method. Further efforts on improving the efficiency and convergence of both the
unbound and bound simulations are on-going.
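The distinction between the fitted r2 and the cross-validated q2 quoted above can be illustrated with a short sketch. It assumes leave-one-out cross-validation and, for brevity, a single descriptor rather than the four used in eq 3.3; the function names are hypothetical and any data passed in are the user's own, not values from this work.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys):
    """Fitted r2 = 1 - SS_res / SS_tot using all points."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def q_squared_loo(xs, ys):
    """Leave-one-out q2 = 1 - PRESS / SS_tot.

    Each point is predicted from a fit that excludes it, so q2
    measures predictive rather than descriptive quality.
    """
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot
```

For a least-squares model q2 cannot exceed r2, so a q2 close to r2, as found here (0.69 vs. 0.75), indicates that no single compound dominates the fit.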
The structural details from the Monte Carlo simulations are also valuable in
interpreting trends in the binding and activity data. In particular, a key π-type hydrogen
bond between the 2° amide fragment of nevirapine analogs and the aryl ring of Tyr188A
of HIVRT was identified that explains the otherwise surprising activity of the 2° amides
and the poor activity of nevirapine against the Y188C mutant. Detailed knowledge of the
hydration of the inhibitor and the protein by specific water molecules is also repeatedly
found to be relevant in interpreting binding/activity data.
Finally, given the severity of the HIV/AIDS pandemic,2 the development of
improved, low-cost anti-HIV drugs is critical. The present study has been successful in
advancing the potential for computational methods to participate in achieving this goal.
It has been demonstrated that computer simulations can be used to make predictions of
binding affinities for sizeable data sets in a reasonable time frame. And, the examination
of the associated energetic and structural results can provide bases for understanding
activity differences and for rational drug design.
Chapter 4
Validation of a Model for the Complex of HIV-1 Reverse Transcriptase
with Sustiva through Computation of Resistance Profiles
Background.
Drug-design efforts to arrest reverse transcription in HIV have led to the FDA
approval of three non-nucleoside reverse transcriptase inhibitors (NNRTIs), nevirapine,
delavirdine, and efavirenz (Sustiva). Additional compounds, including MKC-442, are in
clinical trials (Table 4.1). Because of the low fidelity of HIVRT, the mutation rate in the
encoded proteins including HIVRT is great.89, 90 As a result, all HIVRT inhibitors incur
resistance problems that adversely affect their clinical value.85, 124 A quantitative
measure of a drug's effectiveness against a mutation is given by the fold resistance (FR),
which is the ratio of mutant to wild type activities. Sustiva has been shown to remain
notably active against several common HIVRT point mutations including Val → Ala at
position 106 (V106A) and Tyr → Cys at position 181 (Y181C) (Table 4.1). When this
work was initiated, no HIVRT structure with Sustiva had been reported that might help
explain its improved resistance profile. To study Sustiva, we (a) computed a structure for
the Sustiva/HIVRT complex, (b) validated the structure through computations of the
effects of the V106A and Y181C mutations on binding affinities for four drugs, and (c)
obtained structural insights on the improved effectiveness of Sustiva.
Table 4.1. Relative Free Energies of Binding (∆GFR) Estimated from Fold Resistance (FR) Values.
[Chemical structures of Sustiva, nevirapine, MKC-442, and 9-Cl TIBO]
Columns, in order: Sustiva: Ki (a), IC90 (b); Nevirapine: IC50 (c), IC90 (b), EC50 (f); MKC-442: EC50 (d), EC50 (e); 9-Cl TIBO: IC50 (c), IC50 (f), EC50 (f).
Fold Resistance (g)   ∆GFR values in column order; rows with fewer than ten entries list only the reported values
Y181C/WT   0.59, 0.11, 3.30, 2.79, 3.49, 2.90, 5.04, 1.64, 1.00, 2.86
V106A/WT   0.54, 0.70, 2.81, 2.88, 3.49, 2.92, 1.20, 1.76, 2.02
L100I/WT   1.09, 1.91, 1.32, 0.98, 1.57, 1.42, 2.96, 2.80
Y188C/WT   0.81, 3.24, 2.29, 1.99
K103N/WT   1.11, 1.79, 1.96, 2.26, 3.99, 2.56
(a) Reference 125. (b) Reference 126. (c) Reference 127. (d) Reference 128. (e) Reference 117. (f) Reference 129.
(g) FR = mutant/wild-type activities, ∆GFR = RT ln FR in kcal/mol. The columns show the structure, compound name, the assay type and reference for the FR values, and ∆GFR for several common HIVRT mutations.
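The conversion between fold resistance and ∆GFR used in Table 4.1 can be sketched as follows. The relation ∆GFR = RT ln FR is taken from the table footnote; the temperature of 298.15 K and the function names are assumptions made for illustration, since the temperature is not stated in the table.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)


def dg_from_fold_resistance(fr, temperature=298.15):
    """Return dG_FR = RT ln(FR) in kcal/mol.

    temperature is an assumed value, not one given in the source.
    """
    return R_KCAL * temperature * math.log(fr)


def fold_resistance_from_dg(dg, temperature=298.15):
    """Invert the relation: FR = exp(dG_FR / RT)."""
    return math.exp(dg / (R_KCAL * temperature))
```

Under this assumption each factor of 10 in fold resistance corresponds to roughly 1.4 kcal/mol, so a 100 to 1000-fold loss of sensitivity spans about 2.7 to 4.1 kcal/mol.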
Computational Details.
System Setup.
A binding site model for the docking calculations was constructed from the 2.55
Å crystal structure of the MKC-442/HIVRT complex88 with MKC-442 removed,
including only those residues within ca. 15 Å of MKC-442. Residues included were 91-
110A, 161-205A, 222-242A, 316-321A, 343-349A, 381-383A, and 134-140B. The final
system contained 123 protein residues with acetyl and methylamine capping groups on
end termini and the inhibitor. Neutralized residues included 110A, 166A, 169A, 172-
173A, 177A, 185A, 194A, 199A, 201A, 203-204A, 223-224A, 320A, 344A and 347A.
Tautomeric states of His residues were assigned by visual inspection. System setups for
the other NNRTI/HIVRT complexes were analogous; however, the coordinates
originated from the X-ray structure of nevirapine (pdb 1vrt),103 HEPT (pdb 1rti),88 or 9-
Cl TIBO (pdb 1rev)111 bound to HIVRT.
Docking.
The MATADOR130 docking program was then used to dock Sustiva into the NNRTI
binding-site model. MATADOR uses a Monte Carlo-based Tabu131 search algorithm.
To keep the Tabu search focused on the known NNRTI binding site during the docking
runs, a 50 kcal/mol-Å2 half-harmonic restraining force was applied if the distance
between the ligand and the binding site center was greater than 5 Å. The defined binding
site was roughly centered on the C6 carbon of MKC-442 in the MKC-442/HIVRT complex. The Tabu
list size was set to 25, and the list was constructed from unique structures considering energetic as well
as geometric criteria. In total, 100 Tabu cycles were performed, with each Tabu search
generating 100 randomly placed ligand positions around the binding site. The decision to
accept a new structure onto the Tabu list is made after an intermolecular energy
minimization in Cartesian space and is based on both energetic and geometric criteria.
The protein and ligand were rigid during the docking. The CM1P augmented OPLS-AA
force field26 provided the initial structure of Sustiva; it was also used to determine the
non-bonded energies, which were stored on a spherical grid in order to increase
computational efficiency. The total intermolecular interactions between the ligand and
protein amount to a measure of both steric and electrostatic complementarity; the lowest
energy structure found during the simulations was taken as the "best" docked system. A
distance-dependent dielectric constant of 4 (ε = 4r) was used for all docking calculations.
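Two of the energy terms described above lend themselves to a brief sketch: the half-harmonic restraint that keeps the ligand near the binding-site center, and the Coulomb interaction with the distance-dependent dielectric ε = 4r. The force constant (50 kcal/mol-Å2), the 5 Å threshold, and ε = 4r come from the text. The functional form k(d − d0)2 without a factor of 1/2, the conversion constant 332.06, and the function names are assumptions; this is not the MATADOR implementation.

```python
COULOMB_CONSTANT = 332.06  # kcal*Angstrom/(mol*e^2), conventional value


def half_harmonic_restraint(distance, d0=5.0, k=50.0):
    """Penalty (kcal/mol) applied only when distance exceeds d0 (Angstrom).

    Assumed form: k * (distance - d0)**2; zero inside the threshold.
    """
    if distance <= d0:
        return 0.0
    return k * (distance - d0) ** 2


def coulomb_distance_dependent(qi, qj, r, scale=4.0):
    """Coulomb energy (kcal/mol) with dielectric eps = scale * r.

    Substituting eps = 4r into q_i*q_j/(eps*r) gives a 1/r**2 decay.
    """
    return COULOMB_CONSTANT * qi * qj / (scale * r * r)
```

The 1/r2 decay that results from ε = 4r damps long-range electrostatics, which is the usual motivation for such a dielectric when explicit solvent is absent.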
Docking Validation.
As simulation controls, MKC-442, nevirapine, 9-Cl TIBO and HEPT88 were also
docked back into their respective binding sites to verify that the docking protocol could
reproduce experimental structures. The lowest-energy structure generated during the
docking runs was taken as the "best" structure and was found in all cases to reproduce
closely the position and orientation observed in the crystal structures; the root-mean-square
deviations (rmsd) for the non-hydrogen atoms of the four ligands between the X-ray
and docked structures were 0.43−0.60 Å (Figure 4.1). These low rmsd values and the
limited flexibility of Sustiva are favorable for the accuracy of the docking calculations.
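The rmsd comparison used for the docking controls can be sketched as below. The sketch assumes that the docked and crystallographic ligands are expressed in the same protein frame with atoms listed in the same order, so that no superposition is applied; the function name and the coordinate format (lists of x, y, z tuples for non-hydrogen atoms) are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as input) between two
    equally ordered coordinate sets, without any superposition."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Omitting the superposition step matters for docking validation: a ligand in the correct conformation but the wrong location would otherwise score a misleadingly low rmsd.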
Figure 4.1. Docking validation results. Crystal (red) vs. docked (green) structure in the
NNRTI binding site. Nevirapine (pdb entry 1vrt), MKC-442 (pdb entry 1rt1), HEPT (pdb
entry 1rti), and 9-Cl TIBO (pdb entry 1rev). Each compound was initially positioned
outside of the binding site.
Molecular Dynamics Simulations.
To minimize unfavorable interactions, molecular dynamics (MD) equilibration
simulations were then performed on the docked Sustiva structure and the equivalent
nevirapine, MKC-442, and 9-Cl TIBO binding-site models, which were based on their
crystal structures. The CM1P augmented OPLS-AA force field26 was used with the
IMPACT program132 for the MD simulations. Ten cycles of gradient-based energy
minimization were performed prior to the MD simulations and the complex was then
restrained in the following manner. Protein residues were allowed to move freely within
ca. 10 Å of the binding site (95-107A, 172A, 177-182A, 188-192A, 198A, 227A, 229A,
234-236A, 318-319A, 321A and 135-139B). Movement was restrained for those residues
in a 10-12 Å shell about the binding site, i.e., for residues 94A, 108A, 175-176A, 183A,
187A, 225A, 237-239A, 317A, 320A, 349A, 382-383A, 134B, 140B with harmonic
potentials. All other residues were restrained to their positions after conjugate-gradient
minimization. The Verlet algorithm was used to integrate Newton's equations of motion
using a time step of 0.001 picoseconds (ps) and constant temperature was maintained
through coupling to a Berendsen temperature bath using a relaxation parameter of 0.2 ps
for the velocity scaling. Bond lengths were fixed by the SHAKE algorithm and a
distance-dependent dielectric constant of 4 (ε = 4r) was used. First, 3 ps of initial
equilibration was performed at 100 K followed by 50 ps of equilibration at 300 K.
Quenching of the structure was performed by reducing the simulation temperature over 6
blocks of 4 ps each starting at 300 K and ending at 50 K. The same MD equilibration
was also performed on the nevirapine, MKC-442, and the 9-Cl TIBO structures, and the
resultant complexes were then used in the MC simulations.
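The temperature control and quenching steps above can be sketched as follows. The time step (0.001 ps), relaxation parameter (0.2 ps), and the six quench blocks from 300 K to 50 K are taken from the text. The scaling expression is the standard Berendsen weak-coupling form, and the evenly spaced block temperatures are an assumption, since the intermediate temperatures are not specified; neither function reproduces the IMPACT code.

```python
import math


def berendsen_scale(t_current, t_target, dt=0.001, tau=0.2):
    """Velocity scaling factor for Berendsen weak coupling.

    lambda = sqrt(1 + (dt/tau) * (T_target/T_current - 1)), times in ps.
    """
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_current - 1.0))


def quench_schedule(t_start=300.0, t_end=50.0, n_blocks=6):
    """Target temperature (K) for each quench block, assuming equal steps."""
    step = (t_end - t_start) / (n_blocks - 1)
    return [t_start + i * step for i in range(n_blocks)]
```

With dt/τ = 0.005 the scaling factor stays very close to 1, so the bath removes excess kinetic energy gradually rather than resetting the velocities at each step.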
Monte Carlo Simulations.
Monte Carlo free energy perturbation (MC/FEP)99 simulations were then
performed with the MCPRO program114 to compute relative fold resistance energies
(next section) on the Sustiva structure and the equivalent nevirapine, MKC-442, and 9-Cl
TIBO binding-site models after the MD equilibration. Each protein−inhibitor complex
was briefly energy-minimized prior to the MC simulations using a distance-dependent
dielectric constant of 4 (ε = 4r). The CM1P augmented OPLS-AA force field26 was
used. For the MC simulations, a water cap with a 22 Å radius containing ca. 850
TIP4P water molecules was used, and the system was partitioned into rigid residues (91-94A, 109-
110A, 161-178A, 184-185A, 192-197A, 199-205A, 222-224A, 230-232A, 240-242A,
316-317A, 320-321A, 343-349A, 381-383A, 134-135B, 137B, 140B) and flexible
residues (95-108A, 179-183A, 186-191A, 198A, 225-229A, 233-239A, 318-319A, 136B,
138B). All HIVRT side chains within ca. 10 Å from the center of the water cap were
sampled, the protein backbone was fixed, and each inhibitor was fully flexible. Bond
lengths for the protein remained fixed after the initial energy minimizations and a 9 Å
solvent-solvent, solute-solvent, and intrasolute non-bonded cutoff was used for all MC
simulations. A fixed protein residue-inhibitor list was specified for each simulation and
determined for each complex during the solvent equilibration stage of the simulation. An
attempted move for protein side-chains was requested every 10 configurations, while an
attempted move for the inhibitor was requested every 56 configurations. All remaining
moves were for water molecules. Solvent−solvent neighbor lists were also used, and the
maximum number of internal variables to be sampled for a given attempted move was 30.
Each solvated complex was subjected to 1 million configurations of solvent-only
equilibration, 10 million configurations of equilibration, and 10 million configurations of averaging per
window during the FEP simulations.
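The move schedule described above can be illustrated with a small sketch. The intervals of 10 and 56 configurations are from the text; the rule that the inhibitor takes precedence when both intervals coincide, and the function name, are assumptions, as the actual MCPRO selection logic is not described here.

```python
def move_type(step, protein_interval=10, inhibitor_interval=56):
    """Return which part of the system is moved at a given configuration.

    Assumed precedence: inhibitor over protein side chain when both
    intervals fall on the same step; everything else is a water move.
    """
    if step % inhibitor_interval == 0:
        return "inhibitor"
    if step % protein_interval == 0:
        return "protein"
    return "water"
```

Under these assumptions, close to 90% of attempted moves are for water molecules, which reflects the much larger number of solvent degrees of freedom in the 22 Å cap.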
Results and Discussion.
Binding Mode.
The docking calculations placed Sustiva in a reasonable position and orientation
in the binding site in comparison with the crystal structures for the complexes of HIVRT
with MKC-442,88 nevirapine,103 and 9-Cl TIBO111 (Figure 4.2).
Figure 4.2. Orientation of the four NNRTIs in the HIVRT binding site. (A) Best docked
structure of Sustiva. (B) Nevirapine from pdb entry 1vrt. (C) MKC-442 from pdb entry
1rt1. (D) 9-Cl TIBO from pdb entry 1rev.
The best docked structure of Sustiva reveals that it makes interactions that are consistent
with those for other NNRTIs and that it overlays well with the “butterfly” shape
associated with nevirapine (Figure 4.3). Unlike nevirapine, hydrogen bonds are present
between Sustiva and the protein backbone at position Lys101 that are similar to those
observed in the crystal structures with 9-Cl TIBO and MKC-442 (Figure 4.2).
Figure 4.3. Left − butterfly shapes adopted by Sustiva (red) and nevirapine (green).
Right − the same overlay in CPK colors.
Nevirapine makes no formal ligand-protein hydrogen bonds, but it does form a π-type
hydrogen bond between the secondary amide hydrogen and Tyr188133 and water-
mediated hydrogen bonds.103, 133 The cyclopropyl ethynyl group of Sustiva is
positioned towards aromatic residues Tyr181 and Tyr188 in the same fashion as the
methylpyridine fragment of nevirapine, the benzyl ring of MKC-442, and the
dimethylallyl group of 9-Cl TIBO (Figure 4.2). Presumably, these aryl-π interactions all
contribute favorably to binding.85, 124 Superposition based on the HIVRT Cα atoms
shows that these π fragments of the inhibitors coincide spatially in the binding site and
that Sustiva’s π fragment is the smallest (Figure 4.4).
Figure 4.4. Top − overlays of the binding-site positions of nevirapine, MKC-442, and
9-Cl TIBO (red) with Sustiva (green). Bottom − the same overlays in CPK colors.
An alternative binding mode suggested by Maga et al. was based on a simple
alignment of nevirapine and Sustiva in which the amide moiety of both drugs was
superimposed.134 The present docking calculations did not find this orientation.
Furthermore, forced placement of Sustiva in this alternative geometry yielded steric and
electrostatic protein-ligand interaction energies ca. 5 and 15 kcal/mol, respectively, less
favorable than for our docked structure. The alternative orientation is unlikely since the
hydrogen bonds to the backbone of Lys101 would be sacrificed.
A subsequent crystallographic study by Ren et al.135 indeed confirms the
correctness of the Sustiva/HIVRT structure predicted here as shown in Figure 4.5. An
overlay of the experimental and predicted binding modes shows the Sustiva molecules in
identical conformations except for a slight change in the rotameric state of the
cyclopropyl ethynyl group, which would be expected to rotate freely at room temperature.
Figure 4.5. Predicted vs. experimental binding mode for Sustiva (rmsd = 0.73 Å). Cα
carbons aligned at Leu100, Lys101, Val106, Tyr181, and Tyr188. Experimental
structure from reference 135.
Relative Fold Resistance.
A computational experiment was then pursued to help validate the Sustiva model
by predicting relative FR values. Our results should yield the observed experimental
trends, given the proposed Sustiva/HIVRT structure is in fact correct. The methodology,
presented in Chapter One, is a general computational approach to determining the impact
of protein mutations on drug candidates and hinges on the thermodynamic cycle in Figure
4.6.
Figure 4.6. Thermodynamic cycle used to compute relative fold resistance values. In
this example the wild-type side-chain Tyr (magenta) is perturbed to the mutant side chain
Cys in the presence of Drug A (solid red) and Drug B (checkered red) while bound to a
protein (green). Relative fold resistance (∆∆G) = ∆GB – ∆GA = ∆GMUT – ∆GWT.
For two inhibitors, A and B, ∆GWT and ∆GMUT are the differences in free energy of
binding for B vs. A with the wild-type and mutant proteins, respectively, while ∆GA and
∆GB are the changes in free energy of binding for A and B with the mutant vs. the wild-
type protein. The FR activity ratios from IC or EC values are expected to parallel
binding constant ratios for similar inhibitors.115 Computationally, one could mutate
either the drug or the protein. However, we have chosen to perform the structurally
simpler mutations of the protein; specifically, Val106 was mutated to Ala, and Tyr181
was mutated to Cys in the presence of the four NNRTIs (Figure 4.6).
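The relations among the four legs of the cycle, and the reason the protein mutation alone suffices, can be written out as a short derivation. The symbols ∆G_bind and ∆G^FEP are introduced here for illustration only, and the last line assumes that the unbound inhibitors are unaffected by the protein mutation, so that the perturbation in the unliganded protein is common to both inhibitors and cancels.

```latex
\begin{align}
\Delta G_{A} &= \Delta G_{\mathrm{bind}}(A,\mathrm{MUT}) - \Delta G_{\mathrm{bind}}(A,\mathrm{WT}) \\
\Delta G_{B} &= \Delta G_{\mathrm{bind}}(B,\mathrm{MUT}) - \Delta G_{\mathrm{bind}}(B,\mathrm{WT}) \\
\Delta G_{\mathrm{WT}} &= \Delta G_{\mathrm{bind}}(B,\mathrm{WT}) - \Delta G_{\mathrm{bind}}(A,\mathrm{WT}) \\
\Delta G_{\mathrm{MUT}} &= \Delta G_{\mathrm{bind}}(B,\mathrm{MUT}) - \Delta G_{\mathrm{bind}}(A,\mathrm{MUT}) \\
\Delta\Delta G &= \Delta G_{B} - \Delta G_{A} = \Delta G_{\mathrm{MUT}} - \Delta G_{\mathrm{WT}} \\
&= \Delta G^{\mathrm{FEP}}_{\mathrm{WT}\rightarrow\mathrm{MUT}}(\text{complex with } B)
 - \Delta G^{\mathrm{FEP}}_{\mathrm{WT}\rightarrow\mathrm{MUT}}(\text{complex with } A)
\end{align}
```

A positive ∆∆G therefore indicates that the mutation is more damaging to the binding of B than to that of A.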
Although there is significant variability in the reported fold resistance data,
presumably due to the use of different assay conditions (Table 4.1), Sustiva
consistently emerges as more tolerant towards the Y181C and V106A mutations than the
other drugs, especially nevirapine and MKC-442. Indeed, the present FEP results do
predict Sustiva to be less affected by both mutations than the other three inhibitors as
shown in Table 4.2. The agreement of the computed free energies with the experimental
results strongly supports the correctness of our docked model and the potential utility of
computing relative FR energies.
Table 4.2. Relative Fold Resistance Energies (∆∆G) in kcal/mol for HIV-1 RT
Mutations Normalized to Sustiva.
             ∆∆G for Y181C                     ∆∆G for V106A
inhibitor    calcd         exptl (a)           calcd         exptl (a)
Sustiva      0.00          0.00                0.00          0.00
nevirapine   3.88 ± 0.3    2.20, 2.71, 2.90    3.33 ± 0.4    2.34, 2.27, 2.95
MKC-442      4.70 ± 0.3    2.31, 4.45          0.72 ± 0.5    2.38
9-Cl TIBO    3.01 ± 0.3    1.05, 0.41, 2.27    1.32 ± 0.5    0.66, 1.22, 1.48
(a) Values derived from Table 4.1.
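The experimental entries in Table 4.2 can be reproduced from Table 4.1 by subtraction, as the following sketch shows. The tabulated values are consistent with using the first Sustiva entry in each row of Table 4.1 (0.59 for Y181C and 0.54 for V106A) as the reference; that assignment is inferred from the arithmetic rather than stated in the text, and the names below are hypothetical.

```python
# First Sustiva entry in each row of Table 4.1 (kcal/mol); inferred reference.
SUSTIVA_REFERENCE = {"Y181C": 0.59, "V106A": 0.54}


def normalize_to_sustiva(dg_fr, mutation):
    """Relative fold resistance energy: inhibitor dG_FR minus Sustiva dG_FR."""
    return round(dg_fr - SUSTIVA_REFERENCE[mutation], 2)


# Nevirapine entries for Y181C in Table 4.1.
nevirapine_y181c = [normalize_to_sustiva(v, "Y181C") for v in (2.79, 3.30, 3.49)]
```

The three nevirapine values obtained this way match the 2.20, 2.71, and 2.90 kcal/mol listed for Y181C in Table 4.2.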
Structural Details.
The structural model suggests some factors that render Sustiva less affected by the
Y181C and V106A mutations in comparison with the other compounds. It is well known
that the NNRTI binding site is capable of accommodating structurally diverse inhibitors
and that different inhibitors give rise to strikingly different patterns of resistance
mutations among ca. 15 residues that line the binding site.85, 124 The most common
point mutation sites are depicted schematically in Figure 4.7.
Figure 4.7. Principal point mutations that confer resistance to non-nucleoside HIV-1 RT
inhibitors. The protein is shown as a ribbon trace in green, the mutation sites in red, and
the non-nucleoside binding site in blue. Crystal structure coordinates, pdb entry 1rt1,
from reference 88.
In general, this variability implies that the effect of mutations on drug binding needs
assessment on a case by case basis. However, the Y181C mutant arises early and confers
resistance for many NNRTIs. This can be attributed to the loss of favorable aryl/π
interactions, e.g., between the tyrosine and the methylpyridyl and benzyl rings of
nevirapine and MKC-442, and the dimethylallyl group of 9-Cl TIBO (Figure 4.2). Loss
of the interaction between Tyr181 and the smaller, less polarizable cyclopropyl ethynyl
group of Sustiva is expected to be less detrimental.
In the case of V106A, the residue is tucked under the benzene ring of Sustiva and
is in van der Waals’ contact with the trifluoromethyl group. Reduction of these
interactions appears to be partly compensated by better alignment of the NH-O hydrogen
bond with Lys101 when the buttressing effect of the valine side chain is reduced by
conversion to alanine. In the MC simulations, the hydrogen bond between the oxazinone
NH of Sustiva and the carbonyl oxygen of Lys101 is on average 0.1 Å shorter (1.77 vs.
1.85 Å) when residue 106 is Ala rather than Val. The interaction of the valine’s
isopropyl group with the weakly polarizable trifluoromethyl group is also likely less
attractive than the corresponding interactions with the cyclopropyl group of nevirapine
and the isopropyl and ethoxymethyl groups of MKC-442 (Figure 4.2). Thus, it is
reasonable to propose, on the basis of the present structure, that Sustiva’s improved
resistance profile benefits from a combination of less favorable initial interactions with
Tyr181 and Val106 and more favorable hydrogen bonding with Lys101 in the V106A
mutant. Consistently, the L100I mutation is more damaging (Table 4.1) because Leu100
forms a snug lid over the ring systems for all four inhibitors (Figure 4.2). Without
adjustment, the branching at Cβ rather than Cγ would direct the methyl group of Ile100
directly into the rings. An alternative strategy for improved resistance profiles is to
enhance interactions with immutable residues such as Trp229.136
Conclusion.
In this chapter, we have presented a molecular model for the important anti-HIV
drug Sustiva bound to HIVRT. The resultant structure reveals that Sustiva overlays well
with the butterfly shape of nevirapine (Figure 4.3) and makes similar contacts with
HIVRT as do other reported NNRTIs including hydrogen bonds with the backbone of
Lys101 (Figure 4.2). The docking protocols and methods have been validated using a
control set of NNRTIs of known orientation in the binding site (Figure 4.1). FEP
methodology for the assessment of relative resistance profiles for drug candidates has
been defined (Figure 4.6). Results from its application to four NNRTIs (Table 4.2) are in
good agreement with the experimental activity trends and provide additional evidence
that the proposed binding mode for Sustiva is correct. Sustiva’s relative insensitivity to
the Y181C and V106A mutants appears to arise from a mix of relatively weaker
interactions with Tyr181 and Val106 and improvement of hydrogen bonding for Ala106.
A comparison between the proposed and experimental135 Sustiva/HIVRT complexes
fully confirms the correctness of the structure predicted here. These findings highlight
the power of molecular modeling for structure and binding affinity predictions and its
potential for structure-based drug design.
Chapter 5
Docking Aided by Cluster Analysis: Protocol Development and
Validation Studies
Background.
The docking of ligands into drug targets in order to study intermolecular
interactions at the atomic level is an important part of structure based drug design. The
determination of the binding mode for a novel ligand, for which no experimental
structure of the protein-ligand complex has been reported, is a frequent goal. Although
the target binding site may be known from crystallographic studies of mechanistically
related inhibitors, the number of possible conformations a novel compound could adopt
in the target may be quite large for flexible ligands that contain many rotatable bonds.
The dimensionality of the problem quickly becomes an issue for docking scenarios that
involve thousands of ligands. A balance between accuracy and efficiency is important if
promising drug leads are to be discovered, in a reasonable time frame, using
computational techniques.
We recently reported a prediction of the binding mode of the potent anti-HIV
non-nucleoside reverse transcriptase inhibitor (NNRTI) Sustiva obtained through rigid
docking calculations.137 Subsequent experimental work reported a crystallographic
Sustiva/HIVRT complex that fully confirmed the predicted binding mode.135 The
docking protocols had been validated using a test set of four additional NNRTIs by
docking each compound into the HIVRT binding site using the conformation found in the
crystal. For Sustiva, only one conformer needed to be docked because the molecule has
only one rotatable bond about the cyclopropyl ethynyl group.
[Chemical structure of Sustiva (efavirenz)]
Given the success of the earlier Sustiva docking study, we endeavored to increase the data
set size and diversity for rigid docking and to devise methods useful for docking
compounds with multiple rotatable bonds. The 44 different protein-ligand complexes
used in this study represent 11 different proteins (Table 5.1). Many of the ligands in this
set are quite flexible: 26 of the 44 have ten or more rotatable bonds, which
presents an enormous challenge for flexible docking computations. In addition, 9 of the
ligands are sugar-like compounds whose binding affinity is expected to be primarily
driven by electrostatic interactions. This requires a proper arrangement of hydroxyl
groups in order for the ligand to interact favorably with the protein.
Table 5.1. Protein-ligand Complexes Used in this Study
protein         pdb (a)   protein              pdb (a)   protein                    pdb (a)
α-thrombin      1AE8      HIV protease         1HPS      thymidylate synthase       1BID
α-thrombin      1BMM      HIV protease         1HPV      trypsin                    1PPC
α-thrombin      1BMN      HIV protease         1HPX      trypsin                    1PPH
α-thrombin      1DWB      HIV protease         1HSG      trypsin                    1TNG
α-thrombin      1DWC      HIV protease         1HTF      trypsin                    1TNH
α-thrombin      1DWD      HIV protease         1HVR      trypsin                    1TNJ
α-thrombin      1HDT      HIV protease         4PHV      trypsin                    1TNK
ε-thrombin      1ETS      L-arabinose BP (b)   1ABE      trypsin                    1TNL
ε-thrombin      1ETT      L-arabinose BP       1ABF      trypsin                    3PTB
HIV protease    1AAQ      L-arabinose BP       1APB      elastase                   1ELC
HIV protease    1AJV      L-arabinose BP       1BAP      histidine BP               1HSL
HIV protease    1AJX      L-arabinose BP       5ABP      retinol BP                 1RBP
HIV protease    1GNO      L-arabinose BP       6ABP      glucose/galactose BP       2GBP
HIV protease    1HBV      L-arabinose BP       7ABP      intestinal fatty acid BP   2IFB
HIV protease    1HIH      L-arabinose BP       8ABP
(a) 1AE8 reference 138, 1BMM reference 139, 1BMN reference 139, 1DWB reference 140, 1DWC reference
140, 1DWD reference 140, 1HDT reference 141, 1ETS reference 142, 1ETT reference 142, 1AAQ reference
143, 1AJV reference 144, 1AJX reference 144, 1GNO reference 145, 1HBV reference 146, 1HIH reference 147, 1HPS
reference 148, 1HPV reference 149, 1HPX reference 150, 1HSG reference 151, 1HTF reference 152, 1HVR
reference 153, 4PHV reference 154, 1ABE reference 155, 1ABF reference 156, 1APB reference 157, 1BAP
reference 157, 5ABP reference 156, 6ABP reference 158, 7ABP reference 158, 8ABP reference 158, 1BID
reference to be published, 1PPC reference 159, 1PPH reference 159, 1TNG reference 160, 1TNH reference
160, 1TNJ reference 160, 1TNK reference 160, 1TNL reference 160, 3PTB reference 161, 1ELC reference
162, 1HSL reference 163, 1RBP reference 164, 2GBP reference 165, 2IFB reference 166.
(b) binding protein (BP)
The present work is a multi-step approach similar to the divide-and-conquer
strategy recently reported by Wang et al.167 and is divided into three types of
calculations. (1) Using a rigid docking protocol we have docked the 44 different ligands
in Table 5.1 back into their respective crystal structure using the conformation of each
ligand as observed in the crystal. This acts as a control data set; the correct placement in
the crystal structure should be obtained if starting from the correct binding mode
conformation. (2) A limited conformational search was performed for each unbound
ligand in order to generate a number of energy minima conformers of which one or more
may closely resemble the binding mode as observed in the crystal. (3) Cluster analysis,
based on rmsd similarity, was then performed for each ligand using the total set of
conformers generated from the unbound conformational searches. For a given ligand, the
lowest energy member found in each cluster is chosen as the "representative" of that
family. In theory this reduces the total number of conformers that may need to be
docked. Finally, to determine if bound-like conformations are retained after clustering,
each cluster representative was then compared with the ligand crystal structure
conformation.
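Step (3) can be sketched with a simple leader-style clustering, in which conformers are visited in order of increasing energy and each one either falls within an rmsd threshold of an existing representative or founds a new cluster. This guarantees that every representative is the lowest-energy member of its family, as described above. The clustering algorithm, the 1.0 Å threshold, and the assumption that all conformers are already expressed in a common frame with identical atom ordering are illustrative choices, not the protocol used in this work.

```python
import math


def rmsd(coords_a, coords_b):
    """rmsd between two equally ordered lists of (x, y, z) tuples."""
    total = sum((p - q) ** 2
                for atom_a, atom_b in zip(coords_a, coords_b)
                for p, q in zip(atom_a, atom_b))
    return math.sqrt(total / len(coords_a))


def cluster_representatives(conformers, threshold=1.0):
    """Return the lowest-energy conformer of each rmsd cluster.

    conformers: list of (energy, coords) pairs.
    threshold: hypothetical rmsd cutoff in Angstrom.
    """
    representatives = []
    for energy, coords in sorted(conformers, key=lambda c: c[0]):
        # A conformer founds a new cluster only if it is far from
        # every representative chosen so far.
        if all(rmsd(coords, rep) > threshold for _, rep in representatives):
            representatives.append((energy, coords))
    return representatives
```

Only the representatives would then be carried forward to docking, so the cost scales with the number of clusters rather than the number of conformers generated.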
The clustering method is illustrated graphically in Figure 5.1. The specific goal is
to reduce the number of candidate structures that would need to be docked for a given
molecule in such a way that bound-like conformations are retained. The cluster members
could then be docked into the target where subsequent energy minimizations, molecular
dynamics (MD), or Monte Carlo (MC) simulations could be used as further refinement.
Figure 5. 1. Clustering protocol for reducing the number of conformers generated from
conformational searches using rmsd geometric similarity.
115
Computational Details.
System Setup.
Binding site models for the 44 protein-ligand complexes were constructed using
the crystal structure coordinates for each system downloaded from the RCSB data bank
(Table 5.1). For most systems, a truncated binding site is necessary in order to make the
simulations practical. This process was accomplished in a semi-automated way using the
recently developed C program CHOP168 to prepare the input files necessary for the
PEPZ106 program to build the protein-ligand Z-matrices used for the docking and Monte
Carlo extended linear response (MC/ELR) studies being pursued concurrently. For each
system, the center of the binding site was defined based on the geometric center of the
ligand as found in the crystal. All residues having any atom farther than a cut-size
parameter from the ligand (13.0−14.2 Å) were deleted. The program will attempt to
replace some previously deleted residues to avoid excessive fragmentation of the protein
backbone based on user supplied min-gap (four residues), and min-chain (three residues)
parameters. Acetyl and methylamine capping groups are then added to the remaining
clipped residues. All charged residues outside of a user defined variable-size (9 Å)
region were neutralized subject to a target-q parameter that dictates the overall charge of
the system. Neutralization of the outermost residues avoids having charged amino acid
groups at the vacuum/water interface for MC/ELR simulations.18 Default protonation
states were used for the side-chains within the 9 Å variable region. The OPLS-AA force
field was used for the protein part of each system while the CM1A augmented OPLS-AA
force field was used for the inhibitors. The CM1A charges scaled by 1.08 (for neutral
molecules) were found to yield the lowest overall errors for computed vs. experimental
free energies of hydration (∆∆Ghyd) for 16 test-case molecules in TIP4P water in
comparison with several other scale factors or with CM1P*1.30 charges.169 Fully flexible
ligand Z-matrices were constructed using the AUTOZMAT program170 based on the crystal
structure conformation of each compound.
Docking Protocols.
The program MATADOR,130 which uses a MC-based Tabu searching algorithm,
was used for all of the docking calculations. Two key additions to MATADOR have
recently been made. (1) New ligand positions are generated about the current lowest
energy solution using a Gaussian rather than a uniform distribution. This process is
continued for each Tabu cycle until no lower energy intermolecular complex is found
within 50 steps, which indicates that a local minimum has been reached. This appears to
direct the ligand into a position in the binding pocket corresponding to a local energy
minimum much faster than an intermolecular energy minimization. Once a local
minimum is attained, a new reasonable trial structure is generated and the process repeats
until the requested number of Tabu cycles has been completed. (2) To ensure that a
starting intermolecular geometry is sterically reasonable, the overlap is computed
between the ligand and the protein. A three-dimensional grid of 40 x 40 x 40 cells is
generated around the binding pocket of the protein, with each cell measuring 0.8 Å
per side. If a protein atom falls within any cell, that cell and its 26 surrounding
neighbor cells (in 3 dimensions) are assigned a value of 1. For each initial structure,
if any heavy atom of the ligand is placed within a cell having a value of 1, the move is
rejected.
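The steric pre-screen described in item (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MATADOR implementation: the function names, the use of NumPy, the centering of the grid on a supplied pocket center, and the choice to treat ligand atoms that fall outside the grid as non-overlapping are all hypothetical. Only the 40 x 40 x 40 grid size, the 0.8 Å cell edge, the marking of 26 neighbors, and the heavy-atom rejection rule come from the text.

```python
import numpy as np

N_CELLS = 40   # cells per axis (from the text)
CELL = 0.8     # cell edge length in Å (from the text)

def build_occupancy_grid(protein_xyz, pocket_center):
    """Mark each cell that holds a protein atom, plus its 26 neighbors, with 1."""
    grid = np.zeros((N_CELLS,) * 3, dtype=np.uint8)
    # Assumption: the grid is centered on the binding pocket center.
    origin = np.asarray(pocket_center, dtype=float) - 0.5 * N_CELLS * CELL
    idx = np.floor((np.asarray(protein_xyz, dtype=float) - origin) / CELL).astype(int)
    for i, j, k in idx:
        # Clip the 3 x 3 x 3 neighborhood to the grid bounds.
        lo = np.maximum([i - 1, j - 1, k - 1], 0)
        hi = np.minimum([i + 2, j + 2, k + 2], N_CELLS)
        if np.all(lo < hi):  # skip atoms whose neighborhood misses the grid
            grid[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = 1
    return grid, origin

def is_sterically_reasonable(ligand_heavy_xyz, grid, origin):
    """Return False if any ligand heavy atom sits in a cell marked 1."""
    idx = np.floor((np.asarray(ligand_heavy_xyz, dtype=float) - origin) / CELL).astype(int)
    # Assumption: atoms outside the grid are treated as non-overlapping.
    inside = np.all((idx >= 0) & (idx < N_CELLS), axis=1)
    idx = idx[inside]
    return not np.any(grid[idx[:, 0], idx[:, 1], idx[:, 2]] == 1)
```

Because each protein atom marks a 3 x 3 x 3 block of 0.8 Å cells, a ligand heavy atom is rejected whenever it lies roughly within one cell width of a protein atom along each axis, which makes the check a cheap table lookup rather than a pairwise distance loop.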
As before, the protein and ligand were kept rigid during the docking, the protein
force field was stored on a spherical grid in order to increase computational efficiency,
and a distance-dependent dielectric constant of 4 (ε = 4r) was used in the calculations.
The Tabu list length was set to 25, and the list was constructed from unique structures that considered
energetic as well as geometric criteria. To keep the search focused on the known binding
site during the docking runs, a 100 kcal/mol·Å² half-harmonic restraining force was
applied if the distance between the ligand and the binding site center was greater than 2.5
Å. The non-bonded intermolecular interaction energy between the ligand and protein
was computed for each trial structure and provides a measure of the steric and
electrostatic complementarity. As in the previous Sustiva study, the lowest energy
structure generated was taken as the "best" docked system.137 For docking calculations
of ligands with multiple conformers, the intramolecular energy for the given conformer is
added to the protein-ligand intermolecular energy as a means to assess the relative strain
energy of the conformer.
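The energy terms in this paragraph can be written out as a short Python sketch. The functional forms are assumptions: the text gives only the force constant (100 kcal/mol·Å²), the 2.5 Å threshold, and ε = 4r, so the quadratic form without a factor of 1/2, the use of the ligand center as the restrained point, the conversion constant 332.06 kcal·Å/(mol·e²), and all function names are illustrative choices rather than MATADOR's actual code.

```python
import math

K_RESTRAINT = 100.0  # kcal/mol·Å² (from the text)
R_FREE = 2.5         # Å; no penalty inside this distance (from the text)
COULOMB = 332.06     # kcal·Å/(mol·e²); assumed unit-conversion constant

def half_harmonic_restraint(ligand_center, site_center):
    """Penalty applied only when the ligand drifts beyond R_FREE."""
    excess = math.dist(ligand_center, site_center) - R_FREE
    # Assumption: E = k * excess**2 (no factor of 1/2).
    return K_RESTRAINT * excess ** 2 if excess > 0.0 else 0.0

def coulomb_distance_dependent(q_i, q_j, r):
    """Coulomb term with eps = 4r, so the energy falls off as 1/r**2."""
    return COULOMB * q_i * q_j / (4.0 * r * r)

def docking_score(e_intermolecular, e_intramolecular=0.0):
    """Score used to rank solutions; the intramolecular (strain) term is
    added only when several conformers of one ligand are compared."""
    return e_intermolecular + e_intramolecular
```

For rigid docking of a single conformer the intramolecular term is constant and can be left at zero; it matters only when solutions from different conformers of the same ligand are ranked against each other.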
Conformational Searching and Clustering Analysis.
Each unbound ligand was subjected to a limited Monte Carlo conformational
search using the BOSS program171 in order to generate local minima geometries for
subsequent docking calculations. Since only 200 starting structures were requested,
which for very flexible molecules would not be sufficient to determine all of the local
minima, the searches are incomplete. To increase the likelihood that bound-like
conformations might be found with these limited searches, a dielectric constant of 20.0
was used. The internal geometry of a ligand, while bound to a protein, is expected to
possess fewer internal hydrogen bonds. Larger dielectrics would be expected to provide
electrostatic screening and prevent too many compacted structures in the conformational
searches. The degrees of freedom to be sampled for each molecule were determined
automatically by the program and ranged from 2 to 28 rotatable bonds.
Even with the limited number of starting structures requested in the
conformational searches, for compounds with many rotatable bonds the number of
conformers generated may be too large for efficient docking. For this reason, cluster
analysis was performed which can group conformers into families that are geometrically
similar. The root-mean-square-deviation (rmsd) was determined for every pairwise
combination of conformers considering only heavy atom coordinates. Starting from the
lowest energy conformer, all conformers with an rmsd less than or equal to some rmsd
tolerance are considered to be part of that cluster and removed from further clustering for
the given rmsd tolerance. In this way, each conformer can only belong to one cluster.
The number of clusters obtained equals the total number of conformers using an rmsd
tolerance of 0.0 Å, while increasing the numerical value of the tolerance generally reduces
the number of clusters for each compound. The lowest energy member from each cluster
is then considered the cluster "representative".
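The clustering procedure described above can be sketched as follows. This is a minimal illustration and not the program used in this work: the function name and data layout are hypothetical, the pairwise heavy-atom rmsd matrix is assumed to be precomputed, and it is assumed that each new cluster is seeded from the lowest energy conformer not yet assigned, so the seed is also the cluster representative.

```python
def cluster_conformers(energies, rmsd, tol):
    """Greedy rmsd clustering seeded from the lowest energy conformer.

    energies : list of conformer energies
    rmsd     : square matrix (list of lists) of pairwise heavy-atom rmsds in Å
    tol      : rmsd similarity tolerance in Å
    Returns a list of (representative_index, member_indices) tuples.
    """
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    unassigned = set(order)
    clusters = []
    for seed in order:
        if seed not in unassigned:
            continue  # already absorbed into an earlier cluster
        # Every remaining conformer within tol of the seed joins its cluster.
        members = [j for j in order if j in unassigned and rmsd[seed][j] <= tol]
        unassigned.difference_update(members)  # each conformer joins one cluster only
        clusters.append((seed, members))
    return clusters
```

With a tolerance of 0.0 Å every distinct conformer forms its own cluster, and with a sufficiently large tolerance all conformers collapse into a single cluster represented by the global minimum, which matches the limiting behavior described in the text.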
Results.
Crystal Structure Docking Validation.
As a first step towards the development of docking protocols each ligand was first
removed from the crystal structure and then docked back in without changing the
conformation of the ligand. Rigid docking calculations were initiated requesting ten MC
blocks of 500 or 1000 Tabu cycles each consisting of 100 ligand trial placements. For
each block a new random seed number was used which leads to a different initial
structure and final results. Each block yields one best solution that can be compared with
the experimental crystal structure. Predicted structures having rmsd values of less than or
equal to 2 Å may be considered in good agreement with the experimental crystal structure
and provide some indication that the structure is geometrically close. Correct, close, and
incorrect docking solutions are illustrated in Figure 5.2 using 3 of the 10 solutions from
the 10 block runs for trypsin system 1PPH. For this particular case, the lowest
intermolecular energy obtained also correlates with the smallest rmsd between the docked
and experimental binding mode.
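The 2 Å success criterion can be made concrete with a small Python sketch. It assumes that the rmsd is computed over heavy atoms only and in the protein frame of reference, that is, without superimposing the docked ligand onto the crystal ligand, since a superposition would hide placement errors. The function names and the element-symbol input are hypothetical.

```python
import math

def heavy_atom_rmsd(docked_xyz, crystal_xyz, elements):
    """Rmsd in Å over non-hydrogen atoms, with no superposition applied."""
    pairs = [(a, b) for a, b, el in zip(docked_xyz, crystal_xyz, elements) if el != "H"]
    sq_sum = sum((a[k] - b[k]) ** 2 for a, b in pairs for k in range(3))
    return math.sqrt(sq_sum / len(pairs))

def is_correctly_docked(docked_xyz, crystal_xyz, elements, cutoff=2.0):
    """Apply the rmsd <= 2.0 Å criterion used in this chapter."""
    return heavy_atom_rmsd(docked_xyz, crystal_xyz, elements) <= cutoff
```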
Figure 5.2. Three lowest energy solutions from rigid docking calculations for trypsin
system 1PPH. The experimental binding mode is shown in magenta and three docking
solutions are shown in green.
Table 5.2 tabulates the percent of structures correctly docked (eq 5.1) and
represents an upper limit for successful docking if one knows a priori the correct bound
conformation of the ligand. Some crystal structure complexes have unfavorable
intermolecular energies before docking; for these cases, it is unlikely that docking will be
successful unless the unfavorable contacts are relieved using energy minimization
techniques or by reducing van der Waals radii for the ligand.
$$\text{percent correct} = \frac{\text{number docked} \le 2.0\ \text{Å from the crystal}}{44} \times 100 \qquad (5.1)$$
Table 5.2. The Percent of Structures Correctly Docked using the Ligand Crystal
Structure Conformation.
Number of      Number of    Total Number of     % correct
Tabu Cycles    Blocks       Trial Structures    (rmsd <= 2.0 Å)
 500            1            5 x 10^4           38.6
 500            5           25 x 10^4           72.7
 500           10           50 x 10^4           81.8
1000            1            1 x 10^5           54.5
1000            5            5 x 10^5           75.0
1000           10           10 x 10^5           81.8
aRoot mean square deviation (rmsd) of <= 2.0 Å between the predicted and experimental
structure for the data set comprising 44 protein-ligand complexes.
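As a worked check of eq 5.1, the short Python sketch below converts counts of correctly docked complexes into the percentages of Table 5.2. The counts themselves (17, 32, 36, 24, 33, 36) are not reported in the text; they are back-calculated here as the only integers out of 44 that reproduce the tabulated percentages, and should be read as an inference.

```python
N_COMPLEXES = 44  # size of the data set (from the text)

def percent_correct(n_docked_within_2A):
    """Eq 5.1: percent of complexes docked to within 2.0 Å of the crystal."""
    return 100.0 * n_docked_within_2A / N_COMPLEXES

# (Tabu cycles, blocks) -> inferred number of correctly docked complexes
inferred_counts = {
    (500, 1): 17, (500, 5): 32, (500, 10): 36,
    (1000, 1): 24, (1000, 5): 33, (1000, 10): 36,
}

for (cycles, blocks), n in inferred_counts.items():
    print(f"{cycles:5d} cycles, {blocks:2d} blocks: {percent_correct(n):.1f}%")
```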
The percent of structures correctly docked improves the longer the simulations are run.
A large improvement of 34 percentage points (38.6 → 72.7) and 21 percentage points (54.5 → 75.0) is obtained after 5
rather than 1 block for the 500 cycle and 1000 cycle Tabu runs, respectively. Increasing
the number of blocks to 10 only improves the success rate by another 7 to 9 percentage points.
The docking results appear to converge to the correct solution more quickly for
ligands that have a clearly defined binding site (Table 5.3, Figures 5.3 and 5.4). For
example, the correct solution (rmsd <= 2.0 Å) is found in only 2 of 10 blocks for ligand
1AE8 which has a shallow binding site on the surface of α-thrombin. In contrast, the
correct solution is found 9 times out of 10 for the inhibitor that is more buried in the HIV
protease binding site from system 1AAQ. Table 5.3 tabulates the intermolecular energies
and the rmsds from the crystal for the solutions obtained from each block. In Table 5.3 the lowest
energy solution for each ligand (block 9 for both ligands) closely resembles the binding mode of the ligand.
Table 5.3. Intermolecular Energies and rmsd Results from Rigid Docking Calculations for Ligands 1AE8 and 1AAQ.
ligand    block number    rmsd     intermolecular energy
1AE8       1               4.12    −33.04
1AE8       2               4.26    −32.95
1AE8       3               4.51    −33.58
1AE8       4               4.84    −32.93
1AE8       5               4.22    −33.57
1AE8       6               4.33    −34.33
1AE8       7               4.28    −33.48
1AE8       8               4.38    −32.62
1AE8       9               0.70    −46.20
1AE8      10               0.99    −34.34
1AAQ       1               1.12    −48.43
1AAQ       2               1.10    −46.08
1AAQ       3               0.75    −49.64
1AAQ       4               0.71    −37.86
1AAQ       5               0.89    −48.11
1AAQ       6              12.19    −39.02
1AAQ       7               0.99    −46.06
1AAQ       8               0.71    −51.05
1AAQ       9               0.75    −51.26
1AAQ      10               1.26    −42.53
Figure 5.3 graphically depicts the solutions in Table 5.3 which were initiated requesting
1000 Tabu cycles. The two types of binding sites, buried vs. solvent-exposed are also
presented in Figure 5.4 for α-thrombin (1AE8, exposed) and HIV protease (1AAQ,
buried).
Figure 5.3. Number of correctly docked structures shown in green from 10 block runs of
1000 Tabu cycles each.
Figure 5.4. Example of a shallow and solvent exposed binding site vs. an enclosed
buried binding site.
CPU Timings.
The CPU timings for the rigid docking calculation are dependent on the size of
the protein-ligand system. Taking complex 1AJV as an example, the truncated HIV
protease protein contains 142 protein residues out of 199 total and the cyclic sulfamide
inhibitor is considered as 1 residue with 75 (41 heavy) atoms. Note that the binding site
model could have been made smaller for the docking calculations, which would have
dramatically decreased the docking times; however, the timing results presented here are
for a typically-sized system suitable for MC/ELR simulations with explicit solvent. The
CPU timings shown in Table 5.4 have been obtained using MATADOR executed on a
733 MHz Pentium III processor running Linux.
Table 5.4. Average CPU Timings for System 1AJV.
System    Number of      Number of    Total Number of     Avg. CPU time
          Tabu Cycles    Blocks       Trial Structures    (minutes)
1AJV       500            1            5 x 10^4            5
1AJV       500            5           25 x 10^4           23
1AJV       500           10           50 x 10^4           45
1AJV      1000            1            1 x 10^5            7
1AJV      1000            5            5 x 10^5           34
1AJV      1000           10           10 x 10^5           67
Conformational Search Results.
Each ligand was subjected to a limited conformational search that requested 200
starting structures. The variables to be sampled were determined automatically by the
BOSS program and consisted only of rotations about torsional angles. Using the defaults,
no aromatic or cyclic ring torsions were varied in the conformational searches.
For sugars, whose binding is primarily determined by hydrogen-bonding and not
van der Waals interactions, docking is especially challenging. Although the rigid
docking tends to place each sugar into the appropriate binding pocket, with random
hydrogen positions the hydroxyl group orientations are not optimal for electrostatic
interactions with the protein. This leads to docking solutions that are incorrect, as
illustrated for L-arabinose binding protein system 1APB (Figure 5.5).
Figure 5.5. Predicted (green) vs. experimental (red) binding mode for ligand 1APB
before the ligand was subjected to a conformational search. Rmsd = 3.2 Å.
Search results for ligand α-D-fucose, from system 1APB, yielded all the correct
rotameric states for each hydroxyl group and correctly predicted the absence of a third
rotamer for the hydroxyl attached to the anomeric carbon (Figure 5.6). In general, for the
L-arabinose systems in the present data set a small number of conformers is obtained
from the searches. Subsequent docking of each conformer separately resulted in one
conformer having the appropriate hydroxyl pattern for interaction with the protein and
yielded the correct solution for 1APB (Figure 5.7). The correct binding mode was also
obtained using multi-conformer docking for L-arabinose binding protein systems 1ABE,
1ABF, 1BAP, and 6ABP.
Figure 5.6. Conformational search results for unbound ligand 1APB. The conformers
are overlaid to emphasize the 11 different hydroxyl group rotamers.
Figure 5.7. Lowest energy complex obtained for system 1APB after docking using the
11 conformers obtained from the conformational search. The heavy atom rmsd is 0.67 Å
from the crystal structure shown in green.
Despite the limited number of starting structures requested in the conformational
searches, for most ligands, the search results yield at least one conformer that is similar to
the bound conformation. This is illustrated in Figures 5.8 and 5.9 for eight of the twenty-
six ligands with 10 or more rotatable bonds.
Figure 5.8. Crystal structure conformation (spoke representation) overlaid with best
match conformer (ball and stick representation) from the conformational searches for
ligands 1AE8, 1AJV, 1BMM, and 1DWC.
Figure 5.9. Crystal structure conformation (spoke representation) overlaid with best
match conformer (ball and stick representation) from the conformational searches for
ligands 1GNO, 1HDT, 1HPV, and 1HSG.
The flexible ligands yielded a large number of conformers although in most cases
the lowest energy structure found (conformer 1) was not the best geometric match with
the crystal. The energy difference between the bound-like conformation from the
conformational search and the lowest energy conformer provides an estimate of relative
strain energy and is tabulated in Table 5.5 for eight representative compounds.
Table 5.5. Energy Difference Between the Bound-like Conformer and the Lowest Energy Conformer Found in the Conformational Searches for Eight Different Ligands.
ligand ∆E ligand ∆E
1AE8 5.7 kcal/mol 1GNO 8.6 kcal/mol
1AJV 3.6 kcal/mol 1HDT 0.0 kcal/mol
1BMM 12.3 kcal/mol 1HPV 9.6 kcal/mol
1DWC 5.9 kcal/mol 1HSG 2.0 kcal/mol
Cluster Analysis Results.
In theory, the number of conformers found in an exhaustive and complete
conformational search should be a function of the number of rotatable bonds in the
molecule. Table 5.6 lists the number of dihedral angles sampled in the limited
conformational searches (Nrot) and the resultant number of local minima (Nconf) found
for each molecule out of 200 starting structures using a dielectric constant of 20.0. The
number of rotatable bonds sampled in the searches is 10 or greater for 26 out of the 44
ligands which yield, on average, 164 conformers each. We used cluster analysis in order
to group the conformers, for each system, into families of like geometries. For ligands
that have 10 or fewer total conformers, or whose conformer geometries differ only
because of hydroxyl group orientations (i.e., sugars), cluster analysis may not be useful
since only heavy atoms are used in the rmsd computations. Table 5.6 shows the number
of clusters obtained for each system for 10 different rmsd similarity cutoff values. Figure
5.10 is a histogram representation of how different rmsd similarity values affect the
clustering results for the 26 most flexible ligands in Table 5.6.
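The summary statistics quoted in this paragraph can be reproduced directly from the Nrot and Nconf columns of Table 5.6. The Python sketch below is only a bookkeeping check; the dictionary layout and variable names are illustrative, while the numbers are transcribed from the table.

```python
# pdb code -> (Nrot, Nconf), transcribed from Table 5.6
table_5_6 = {
    "1AE8": (16, 186), "1BMM": (17, 180), "1BMN": (14, 174), "1DWB": (3, 2),
    "1DWC": (13, 157), "1DWD": (13, 170), "1HDT": (23, 187), "1ETS": (13, 174),
    "1ETT": (10, 78), "1AAQ": (23, 187), "1AJV": (12, 147), "1AJX": (12, 125),
    "1GNO": (28, 189), "1HBV": (18, 158), "1HIH": (20, 178), "1HPS": (21, 193),
    "1HPV": (15, 170), "1HPX": (19, 180), "1HSG": (16, 165), "1HTF": (16, 184),
    "1HVR": (10, 126), "4PHV": (17, 171), "1ABE": (4, 7), "1ABF": (4, 11),
    "1APB": (4, 11), "1BAP": (4, 7), "5ABP": (6, 36), "6ABP": (4, 9),
    "7ABP": (4, 12), "8ABP": (6, 39), "1BID": (5, 48), "1PPC": (13, 165),
    "1PPH": (10, 112), "1TNG": (2, 2), "1TNH": (2, 1), "1TNJ": (3, 2),
    "1TNK": (4, 5), "1TNL": (2, 2), "3PTB": (3, 2), "1ELC": (16, 177),
    "1HSL": (4, 5), "1RBP": (10, 136), "2GBP": (6, 30), "2IFB": (14, 188),
}

# Ligands with 10 or more sampled rotatable bonds
flexible = {pdb: v for pdb, v in table_5_6.items() if v[0] >= 10}
mean_nconf = sum(nconf for _, nconf in flexible.values()) / len(flexible)

print(len(table_5_6), "ligands in total")
print(len(flexible), "ligands with Nrot >= 10")
print(f"average Nconf for the flexible ligands: {mean_nconf:.1f}")
```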
Table 5.6. Cluster Analysis Results. Each Column Tabulates the Number of Rotatable Bonds (Nrot), the Number of Conformers
(Nconf) Found in the Limited Conformational Search, and the Number of Clusters for 10 Different rmsd Similarity Tolerance Values.
                                                   Number of clusters obtained for increasing rmsd (Å) similarity valuesa
protein                    pdb codeb  Nrotc  Nconfd   1.00   1.50   2.00   2.50   2.75   3.00   3.50   4.00   4.50   5.00
α-thrombin                 1AE8         16    186     153     86     30      9      7      4      3      2      2      1
α-thrombin                 1BMM         17    180     160    106     45     17     11      7      4      3      2      1
α-thrombin                 1BMN         14    174     127     82     42     15     10      7      3      2      2      1
α-thrombin                 1DWB          3      2       1      1      1      1      1      1      1      1      1      1
α-thrombin                 1DWC         13    157     120     66     33     12      9      6      3      2      2      1
α-thrombin                 1DWD         13    170     162    116     39     15     11      7      4      3      2      1
α-thrombin                 1HDT         23    187     176    164    108     42     25     16      6      3      3      2
ε-thrombin                 1ETS         13    174     160    119     41     14     11      6      3      2      2      1
ε-thrombin                 1ETT         10     78      58     35     12      4      3      2      2      1      1      1
HIV protease               1AAQ         23    187     182    168    118     48     27     13      6      3      2      2
HIV protease               1AJV         12    147     116     40      9      3      2      2      2      1      1      1
HIV protease               1AJX         12    125      91     38     11      5      2      2      2      1      1      1
HIV protease               1GNO         28    189     177    164    117     49     29     17      9      4      2      2
HIV protease               1HBV         18    158     143    121     69     27     16     11      5      4      3      1
HIV protease               1HIH         20    178     162    142     94     31     18     13      6      3      2      1
HIV protease               1HPS         21    193     190    173    137     62     37     24     11      4      3      2
HIV protease               1HPV         15    170     151    109     50     14      9      6      4      2      2      1
HIV protease               1HPX         19    180     173    160    114     55     34     19      8      4      2      2
HIV protease               1HSG         16    165     152    123     70     25     17     12      6      4      3      2
HIV protease               1HTF         16    184     178    150     73     26     18     12      6      3      2      2
HIV protease               1HVR         10    126      96     53     20      9      6      5      2      1      1      1
HIV protease               4PHV         17    171     162    147     89     36     25     15      8      4      3      2
L-arabinose BP             1ABE          4      7       1      1      1      1      1      1      1      1      1      1
L-arabinose BP             1ABF          4     11       1      1      1      1      1      1      1      1      1      1
L-arabinose BP             1APB          4     11       1      1      1      1      1      1      1      1      1      1
L-arabinose BP             1BAP          4      7       1      1      1      1      1      1      1      1      1      1
L-arabinose BP             5ABP          6     36       1      1      1      1      1      1      1      1      1      1
L-arabinose BP             6ABP          4      9       1      1      1      1      1      1      1      1      1      1
L-arabinose BP             7ABP          4     12       1      1      1      1      1      1      1      1      1      1
L-arabinose BP             8ABP          6     39       1      1      1      1      1      1      1      1      1      1
thymidylate synthase       1BID          5     48      19      5      1      1      1      1      1      1      1      1
trypsin                    1PPC         13    165     152    111     42     17     11      7      4      3      2      1
trypsin                    1PPH         10    112      89     51     15      5      5      4      2      2      1      1
trypsin                    1TNG          2      2       1      1      1      1      1      1      1      1      1      1
trypsin                    1TNH          2      1       1      1      1      1      1      1      1      1      1      1
trypsin                    1TNJ          3      2       1      1      1      1      1      1      1      1      1      1
trypsin                    1TNK          4      5       3      1      1      1      1      1      1      1      1      1
trypsin                    1TNL          2      2       1      1      1      1      1      1      1      1      1      1
trypsin                    3PTB          3      2       1      1      1      1      1      1      1      1      1      1
elastase                   1ELC         16    177     159    115     42     16     10      7      5      2      1      1
histidine BP               1HSL          4      5       5      1      1      1      1      1      1      1      1      1
retinol BP                 1RBP         10    136      40     14      5      2      2      2      1      1      1      1
glucose/galactose BP       2GBP          6     30       1      1      1      1      1      1      1      1      1      1
intestinal fatty acid BP   2IFB         14    188     170     36      8      4      3      2      1      1      1      1
aRmsd similarity values are computed using heavy atoms only.
bSee Table 5.1 for pdb references.
cNumber of rotatable bonds (Nrot) sampled in the conformational searches.
dNumber of conformers (Nconf) obtained from a limited conformational search that requested 200 starting structures.
Figure 5.10. A histogram representation of how similarity values affect the number of
clusters for the 26 most flexible ligands.
The grouping of conformers into clusters of similar geometry is visually represented in
Figure 5.11. Here, the first 4 clusters are shown for ligand 1HPX and were obtained
using a rmsd similarity value of 2.0 Å.
Figure 5.11. A visual representation of clustering. The first 4 clusters are shown for
ligand 1HPX and were obtained using a rmsd similarity value of 2.0 Å.
The grouping of conformers into families does reduce the dimensionality of the
problem; however, in any filtering technique correct solutions will almost certainly be
discarded. To determine how many of the cluster representatives are geometrically
similar to the bound crystal structure conformation, we computed the number of cluster
representatives that have an rmsd <= 2.0 Å from the geometry of the ligand in
the crystal for 5 different rmsd tolerance values. In Table 5.7 only compounds with 10
rotatable bonds or more are included (N=26). Here, a value of 0 corresponds to no
cluster representative having a bound-like conformation. In principle only 1 bound-like
conformer needs to be retained for the technique to be useful. Clustering based on an rmsd
similarity cutoff of 2.50 Å appears to dramatically reduce the number of conformers (Table
5.6) yet still retains at least 1 conformer that is close to the crystal structure geometry of
the bound ligand for the majority of systems in Table 5.7. For each rmsd tolerance cutoff
value in Table 5.7, the total number of ligands for which no cluster representative is <= 2.0 Å
from the crystal conformation is the No. missed value and indicates that the bound-like
conformation was filtered out. Using smaller rmsd tolerances in the clustering does
increase the likelihood that a bound-like conformation will be retained; however, the
number of cluster representatives is also increased.
Table 5.7. The Number of Cluster Representatives with an rmsd <= 2.0 Å from the
Ligand Crystal Conformation. Five Cluster Tolerances are Shown.
ligand 1.50 Å 2.00 Å 2.50 Å (rmsd to crystal) 2.75 Å 3.00 Å
1AE8 6 2 1 (1.7) 0 0
1BMM 7 4 1 (1.9) 1 0
1BMN 4 3 1 (1.9) 1 1
1DWC 4 2 1 (1.3) 1 1
1DWD 0 0 0 0 0
1HDT 3 1 1 (1.3) 1 1
1ETS 5 4 0 1 1
1ETT 5 1 1 (1.1) 1 1
1AAQ 3 2 0 1 1
1AJV 20 3 1 (1.5) 1 1
1AJX 6 2 0 0 0
1GNO 1 1 1 (1.8) 1 0
1HBV 1 1 1 (2.0) 0 0
1HIH 3 3 1 (1.0) 1 1
1HPS 2 1 1 (2.0) 1 1
1HPV 4 1 1 (0.9) 0 0
1HPX 3 3 1 (1.8) 1 1
1HSG 2 0 0 0 0
1HTF 3 1 1 (1.7) 0 0
1HVR 5 1 1 (1.8) 1 1
4PHV 0 0 0 0 0
1PPC 5 2 1 (1.9) 1 0
1PPH 7 2 0 0 0
1ELC 5 2 1 (1.8) 1 0
1RBP 7 2 1 (0.8) 1 1
2IFB 17 3 1 (1.3) 1 1
No. missed No. missed No. missed No. missed No. missed
2 3 7 9 13
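The No. missed row of Table 5.7 is simply a count of the zero entries in each tolerance column. The short Python sketch below recomputes it from the tabulated values; the variable names and data layout are illustrative, and the parenthetical rmsd-to-crystal values of the 2.50 Å column are omitted.

```python
TOLERANCES = (1.50, 2.00, 2.50, 2.75, 3.00)  # rmsd cluster tolerances in Å

# ligand -> number of bound-like cluster representatives at each tolerance
table_5_7 = {
    "1AE8": (6, 2, 1, 0, 0), "1BMM": (7, 4, 1, 1, 0), "1BMN": (4, 3, 1, 1, 1),
    "1DWC": (4, 2, 1, 1, 1), "1DWD": (0, 0, 0, 0, 0), "1HDT": (3, 1, 1, 1, 1),
    "1ETS": (5, 4, 0, 1, 1), "1ETT": (5, 1, 1, 1, 1), "1AAQ": (3, 2, 0, 1, 1),
    "1AJV": (20, 3, 1, 1, 1), "1AJX": (6, 2, 0, 0, 0), "1GNO": (1, 1, 1, 1, 0),
    "1HBV": (1, 1, 1, 0, 0), "1HIH": (3, 3, 1, 1, 1), "1HPS": (2, 1, 1, 1, 1),
    "1HPV": (4, 1, 1, 0, 0), "1HPX": (3, 3, 1, 1, 1), "1HSG": (2, 0, 0, 0, 0),
    "1HTF": (3, 1, 1, 0, 0), "1HVR": (5, 1, 1, 1, 1), "4PHV": (0, 0, 0, 0, 0),
    "1PPC": (5, 2, 1, 1, 0), "1PPH": (7, 2, 0, 0, 0), "1ELC": (5, 2, 1, 1, 0),
    "1RBP": (7, 2, 1, 1, 1), "2IFB": (17, 3, 1, 1, 1),
}

# A ligand is "missed" at a tolerance when no representative is bound-like.
n_missed = [
    sum(1 for counts in table_5_7.values() if counts[col] == 0)
    for col in range(len(TOLERANCES))
]

for tol, missed in zip(TOLERANCES, n_missed):
    retained = len(table_5_7) - missed
    print(f"tolerance {tol:.2f} Å: {missed} missed, {retained} of {len(table_5_7)} retained")
```

At the 2.50 Å tolerance this gives 7 missed ligands, so a bound-like representative survives for 19 of the 26 flexible ligands.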
Figure 5.12 shows the cluster representative that was retained after first pruning
down the total conformer list (Nconf → Nclust) using an rmsd similarity tolerance of 2.5
Å overlaid with the experimental crystal structure for 4 representative compounds in
Table 5.7.
Figure 5.12. Representative cluster survivors (ball and stick representation) overlaid
with crystal structure conformation (spoke representation).
Conclusion.
In this chapter we have presented rigid docking results for 44 protein-ligand
complexes, conformational search results for each unbound ligand, and cluster results
based on geometric similarity. An upper limit of 82% was found for the re-docking of
the ligands back into their respective proteins using the ligand conformation of the
crystal. To determine if bound-like geometries of each ligand could be generated for
cases in which the binding geometry of the ligand was not known, a limited conformational
search which requested 200 starting structures was performed for each ligand. Despite
the limited searches, bound-like geometries were found among the many local minima
generated, even for very flexible ligands. Clustering analysis has been used to group the
conformational search results into families of like geometry as defined by an rmsd
similarity tolerance value. Clustering based on an rmsd value of 2.5 Å dramatically
reduced the total number of clusters yet still retained at least one cluster representative
with a conformation similar to the experimental bound conformation for the majority
of systems. For a given ligand it may be appropriate to vary the rmsd cutoff until the
desired number of clusters is obtained. Although a clustering solution may be
geometrically similar to the bound ligand, it remains to be seen if these structures can
be docked back into the protein binding sites given that a perfect fit is unlikely. Although
reducing the steric penalty for overlap between the ligand and protein should improve the
percent of cluster survivors that can be successfully docked, molecular dynamics or
Monte Carlo simulations should be used to refine the candidate structures prior to any
binding affinity estimations using scoring-based functions.
Cited References.
(1) AIDS epidemic update: December 2000, Joint United Nations Programme on
HIV/AIDS (UNAIDS) and The World Health Organization (WHO).
http://www.unaids.org.
(2) AIDS epidemic update: December 1999, Joint United Nations Programme on
HIV/AIDS (UNAIDS) and The World Health Organization (WHO).
http://www.unaids.org.
(3) Goodenow, M.; Huet, T.; Saurin, W.; Kwok, S.; Sninsky, J.; Wain-Hobson, S. HIV-1
Isolates Are Rapidly Evolving Quasispecies: Evidence For Viral Mixtures and Preferred
Nucleotide Substitutions. J. Acquir. Immune Defic. Syndr. Hum. Retrovirol. 1989, 2,
344-352.
(4) Eigen, M. On the nature of virus quasispecies. Trends Microbiol. 1996, 4, 216-218.
(5) Harper, D. R. Molecular Virology; Bios Scientific Publishers Ltd: Oxford, 1998.
(6) Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H. Equation of
State Calculations by Fast Computing Machines. J. Chem. Phys. 1953, 21, 1087-1092.
(7) Allen, M. P.; Tildesley, D. J. Computer Simulation of Liquids; Clarendon Press:
Oxford, 1987.
(8) Jorgensen, W. L. Monte Carlo Simulations for Liquids. In Encyclopedia of
Computational Chemistry; Schleyer, P. v. R., Ed.; Wiley: New York, 1998; Vol. 3, pp
1754-1763.
(9) Verlet, L. Computer 'Experiments' on Classical Fluids. I. Thermodynamical
Properties of Lennard-Jones Molecules. Phys. Rev. 1967, 159, 98-103.
(10) Allinger, N. L. Force Fields: A Brief Introduction. In Encyclopedia of
Computational Chemistry; Schleyer, P. v. R., Ed.; Wiley: New York, 1998; Vol. 2, pp
1013-1015.
(11) Jorgensen, W. L.; Maxwell, D. S.; Tirado-Rives, J. Development and Testing of the
OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic
Liquids. J. Am. Chem. Soc. 1996, 118, 11225-11236.
(12) Zwanzig, R. W. High-Temperature Equation of State by a Perturbation Method. I.
Nonpolar Gases. J. Chem. Phys. 1954, 22, 1420-1426.
(13) Jorgensen, W. L.; Ravimohan, C. Monte Carlo Simulation of Differences in Free
Energies of Hydration. J. Chem. Phys. 1985, 83, 3050-3054.
(14) Jorgensen, W. L.; Briggs, J. M.; Contreras, M. L. Relative Partition-Coefficients For
Organic Solutes From Fluid Simulations. J. Phys. Chem. 1990, 94, 1683-1686.
(15) Åqvist, J.; Medina, C.; Samuelsson, J.-E. A New Method For Predicting Binding
Affinity in Computer-Aided Drug Design. Protein Eng. 1994, 7, 385-391.
(16) Carlson, H. A.; Jorgensen, W. L. An Extended Linear Response Method For
Determining Free Energies of Hydration. J. Phys. Chem. 1995, 99, 10667-10673.
(17) McDonald, N. A.; Carlson, H. A.; Jorgensen, W. L. Free energies of solvation in
chloroform and water from a linear response approach. J. Phys. Org. Chem. 1997, 10,
563-576.
(18) Hansson, T.; Åqvist, J. Estimation of binding free energies for HIV proteinase
inhibitors by molecular dynamics simulations. Protein Eng. 1995, 8, 1137-1144.
(19) Paulsen, M. D.; Ornstein, R. L. Binding free energy calculations for P450cam-
substrate complexes. Protein Eng. 1996, 9, 567-571.
(20) Hulten, J.; Bonham, N. M.; Nillroth, U.; Hansson, T.; Zuccarello, G.; Bouzide, A.;
Aqvist, J.; Classon, B.; Danielson, U. H.; Karlen, A.; Kvarnstrom, I.; Samuelsson, B.;
Hallberg, A. Cyclic HIV-1 Protease Inhibitors Derived from Mannitol: Synthesis,
Inhibitory Potencies, and Computational Predictions of Binding Affinities. J. Med.
Chem. 1997, 40, 885-897.
(21) Hansson, T.; Marelius, J.; Aqvist, J. Ligand binding affinity prediction by linear
interaction energy methods. J. Comput.-Aided Mol. Des. 1998, 12, 27-35.
(22) Wang, W.; Wang, J.; Kollman, P. A. What Determines the van der Waals
Coefficient beta in the LIE (Linear Interaction Energy) Method to Estimate Binding Free
Energies Using Molecular Dynamics Simulations? Proteins 1999, 34, 395-402.
(23) Jones-Hertzog, D. K.; Jorgensen, W. L. Binding affinities for Sulfonamide Inhibitors
with human Thrombin Using Monte Carlo Simulations with a Linear Response Method.
J. Med. Chem. 1997, 40, 1539-1549.
(24) Smith, R. H.; Jorgensen, W. L.; Tirado-Rives, J.; Lamb, M. L.; Janssen, P. A. J.;
Michejda, C. J.; Smith, M. B. K. Prediction of Binding Affinities for TIBO Inhibitors of
HIV-1 Reverse Transcriptase Using Monte Carlo Simulations in a Linear Response
Method. J. Med. Chem. 1998, 41, 5272-5286.
(25) Lamb, M. L.; Tirado-Rives, J.; Jorgensen, W. L. Estimation of the binding affinities
of FKBP12 inhibitors using a linear response method. Bioorg. Med. Chem. 1999, 7, 851-
860.
(26) Duffy, E. M.; Jorgensen, W. L. Prediction of Properties from Simulations: Free
Energies of Solvation in Hexadecane, Octanol, and Water. J. Am. Chem. Soc. 2000, 122,
2878-2888.
(27) Morgantini, P. Y.; Kollman, P. A. Solvation Free Energies of Amides and Amines:
Disagreement Between Free Energy Calculations and Experiment. J. Am. Chem. Soc.
1995, 117, 6057-6063.
(28) Ding, Y. B.; Bernardo, D. N.; Krogh-Jespersen, K.; Levy, R. M. Solvation Free
Energies of Small Amides and Amines From Molecular-Dynamics Free Energy
Perturbation Simulations Using Pairwise Additive and Many-Body Polarizable
Potentials. J. Phys. Chem. 1995, 99, 11575-11583.
(29) Ben-Naim, A.; Marcus, Y. Solvation Thermodynamics of Nonionic Solutes. J.
Chem. Phys. 1984, 81, 2016-2027.
(30) Jones, F. M., III; Arnett, E. M. Thermodynamics of Ionization and Solution of
Aliphatic Amines in Water. Prog. Phys. Org. Chem. 1974, 11, 263-322.
(31) Wolfenden, R. Interaction of the Peptide Bond With Solvent Water: A Vapor Phase
Analysis. Biochemistry 1978, 17, 201-204.
(32) Rao, B. G.; Singh, U. C. Hydrophobic Hydration - a Free-Energy Perturbation
Study. J. Am. Chem. Soc. 1989, 111, 3125-3133.
(33) Meng, E. C.; Caldwell, J. W.; Kollman, P. A. Investigating the anomalous solvation
free energies of amines with a polarizable potential. J. Phys. Chem. 1996, 100, 2367-
2371.
(34) Marten, B.; Kim, K.; Cortis, C.; Friesner, R. A.; Murphy, R. B.; Ringnalda, M. N.;
Sitkoff, D.; Honig, B. New model for calculation of solvation free energies: Correction of
self-consistent reaction field continuum dielectric theory for short-range hydrogen-
bonding effects. J. Phys. Chem. 1996, 100, 11775-11788.
(35) Cramer, C. J.; Truhlar, D. G. Am1-Sm2 and Pm3-Sm3 Parameterized Scf Solvation
Models For Free-Energies in Aqueous-Solution. J. Comput.-Aided Mol. Des. 1992, 6,
629-666.
(36) Barone, V.; Cossi, M.; Tomasi, J. A new definition of cavities for the computation of
solvation free energies by the polarizable continuum model. J. Chem. Phys. 1997, 107,
3210-3221.
(37) Klamt, A.; Jonas, V.; Burger, T.; Lohrenz, J. C. W. Refinement and parametrization
of COSMO-RS. J. Phys. Chem. A 1998, 102, 5074-5085.
(38) Sun, Y. X.; Spellmeyer, D.; Pearlman, D. A.; Kollman, P. Simulation of the
Solvation Free-Energies For Methane, Ethane, and Propane and Corresponding Amino-
Acid Dipeptides - a Critical Test of the Bond-Pmf Correction, a New Set of Hydrocarbon
Parameters, and the Gas-Phase Water Hydrophobicity Scale. J. Am. Chem. Soc. 1992,
114, 6798-6801.
(39) Jorgensen, W. L.; Tirado-Rives, J. Free energies of hydration for organic molecules
from Monte Carlo simulations. Perspect. Drug Discov. Design 1995, 3, 123-138.
(40) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D.
M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. A 2nd Generation Force
Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules. J. Am.
Chem. Soc. 1995, 117, 5179-5197.
(41) Gao, J. L.; Xia, X.; George, T. F. Importance of Bimolecular Interactions in
Developing Empirical Potential Functions For Liquid-Ammonia. J. Phys. Chem. 1993,
97, 9241-9247.
(42) Jorgensen, W. L. BOSS Version 3.8; Yale University: New Haven, CT, 1997.
(43) Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.;
Cheeseman, J. R.; Strain, M. C.; Burant, J. C.; Stratmann, R. E.; Petersson, G. A.;
Montgomery, J. A.; Zakrzewski, V. G.; Raghavachari, K.; Ayala, P. Y.; Cui, Q.;
Morokuma, K.; Ortiz, J. V.; Foresman, J. B.; Cioslowski, J.; Stefanov, B. B.; Chen, W.;
Wong, M. W.; Andres, J. L.; Replogle, E. S.; Gomperts, R.; Martin, R. L.; Fox, D. J.;
Keith, T.; Al-Laham, M. A.; Nanayakkara, A.; Challacombe, M.; Peng, C. Y.; Stewart, J.
J. P.; Gonzalez, C.; Head-Gordon, M.; Gill, P. M. W.; Johnson, B. G.; Pople, J. A.
Gaussian 95, Development Version (Revision E.1); Gaussian Inc.: Pittsburgh, PA, 1996.
(44) Maxwell, D.; Tirado-Rives, J. Fitpar Version 1.1.1.; Yale University: New Haven,
Connecticut, 1994.
(45) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L.
Comparison of simple potential functions for simulating liquid water. J. Chem. Phys.
1983, 79, 926-935.
(46) Severance, D. L.; Essex, J. W.; Jorgensen, W. L. Generalized Alteration of Structure
and Parameters - a New Method For Free-Energy Perturbations in Systems Containing
Flexible Degrees of Freedom. J. Comput. Chem. 1995, 16, 311-327.
(47) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L.
Comparison of Simple Potential Functions For Simulating Liquid Water. J. Chem. Phys.
1983, 79, 926-935.
(48) St-Amant, A.; Cornell, W. D.; Kollman, P. A. Calculation of Molecular Geometries,
Relative Conformational Energies, Dipole-Moments, and Molecular Electrostatic
Potential Fitted Charges of Small Organic-Molecules of Biochemical Interest By
Density-Functional Theory. J. Comput. Chem. 1995, 16, 1483-1506.
(49) Lambert, J. B.; Featherman, S. I. Conformational Analysis of Pentamethylene
Heterocycles. Chem. Rev. 1975, 75, 611-626.
(50) Blackburne, I. D.; Katritzky, A. R.; Takeuchi, Y. Conformations of Piperidine and of
Derivatives with Additional Ring Heteroatoms. Accounts Chem. Res. 1975, 8, 300-306.
(51) Anet, F. A. L.; Ghiaci, M. On the Question of the Relationship of Nitrogen-15
Chemical Shifts to Barriers to C-N Internal Rotation. Dynamic Nuclear Magnetic
Resonance of Urea and Aniline Derivatives. J. Am. Chem. Soc. 1979, 101, 6857-6860.
(52) Murphy, R. B.; Beachy, M. D.; Friesner, R. A.; Ringnalda, M. N. Pseudospectral
Localized Moller-Plesset Methods - Theory and Calculation of Conformational Energies.
J. Chem. Phys. 1995, 103, 1481-1490.
(53) Kim, K. S.; Friesner, R. A. Hydrogen bonding between amino acid backbone and
side chain analogues: A high-level ab initio study. J. Am. Chem. Soc. 1997, 119, 12952-
12961.
(54) Åqvist, J. Ion-Water Interaction Potentials Derived from Free Energy Perturbation
Simulations. J. Phys. Chem. 1990, 94, 8021-8024.
(55) Davidson, W. R.; Kebarle, P. Binding-Energies and Stabilities of Potassium-Ion
Complexes From Studies of Gas-Phase Ion Equilibria K+ + M = K+M. J. Am. Chem.
Soc. 1976, 98, 6133-6138.
(56) Haar, L.; Gallagher, J. S. Thermodynamic Properties of Ammonia. J. Phys. Chem.
Ref. Data 1978, 7, 635-&.
(57) Felsing, W. A.; Thomas, A. R. Vapor Pressures and Other Physical Constants of
Methylamine and Methylamine Solutions. Ind. Eng. Chem. 1929, 21, 1269-1272.
(58) Aston, J. G.; Siller, C. W.; Messerly, G. H. Heat Capacities and Entropies of Organic
Compounds. III. Methylamine from 11.5 °K. to the Boiling Point. Heat of Vaporization
and Vapor Pressure. The Entropy from Molecular Data. J. Am. Chem. Soc. 1937, 59,
1743-1751.
(59) Swift, E., Jr. The Densities of Some Aliphatic Amines. J. Am. Chem. Soc. 1942, 64,
115-116.
(60) The Properties of Gases and Liquids, 3rd ed.; Reid, R. C., Prausnitz, J. M. and
Sherwood, T. K., Ed.; McGraw-Hill, New York, 1977.
(61) Letcher, T. M. Thermodynamics of aliphatic amine mixtures I. The excess volumes
of mixing for primary, secondary, and tertiary aliphatic amines with benzene and
substituted benzene compounds. J. Chem. Thermodyn. 1972, 5, 159-173.
(62) CRC Handbook of Chemistry and Physics, 72nd ed.; Lide, D. R., Ed.; CRC Press,
Inc., Boca Raton, FL, 1991-1992.
(63) Aston, J. G.; Eidinoff, M. L.; Forster, W. S. The Heat Capacity and Entropy, Heats
of Fusion and Vaporization and the Vapor Pressure of Dimethylamine. J. Am. Chem.
Soc. 1939, 61, 1539-1543.
(64) Aston, J. G.; Sagenkahn, M. L.; Szasz, G. J.; Moessen, G. W.; Zuhr, H. F. The Heat
Capacity and Entropy, Heats of Fusion and Vaporization and the Vapor Pressure of
Trimethylamine. The Entropy from Spectroscopic and Molecular Data. J. Am. Chem.
Soc. 1944, 66, 1171-1177.
(65) Barb, W. G. The Kinetics and Mechanism of the Polymerization of Ethyleneimine.
J. Chem. Soc. 1955, 2564-2577.
(66) Cabani, S.; Conti, G.; Lepori, L. Thermodynamic Study on Aqueous Dilute
Solutions of Organic Compounds Part 1. Cyclic Amines. Trans. Faraday Soc. 1971, 67,
1933-1942.
(67) Ruzicka, L.; Salomon, G.; Meyer, K. E. Overview of Cyclic Amine Properties (In
German). Helv. Chim. Acta 1937, 20, 109-128.
(68) Helm, V. R.; Lanum, W. J.; Cook, G. L.; Ball, J. S. Purification and properties of
pyrrole, pyrrolidine, pyridine, and 2-methylpyridine. J. Phys. Chem. 1958, 62, 858-
861.
(69) Lanum, W. J.; Morris, J. C. Physical Properties of Some Sulfur and Nitrogen
Compounds. J. Chem. Eng. Data 1969, 14, 93-98.
(70) Nakanishi, K.; Wada, H.; Touhara, H. Thermodynamics excess functions of
methanol + piperidine at 298.15 K. J. Chem. Thermodyn. 1975, 7, 1125-1130.
(71) Le Fevre, J. W. A simple relationship between molecular polarisation in solution and
the dielectric constant of the solvent. J. Chem. Soc. 1935, 773-779.
(72) Vriens, G. N.; Hill, A. G. Equilibria of Several Reactions of Aromatic Amines. Ind.
Eng. Chem. 1952, 44, 2732-2735.
(73) Jorgensen, W. L.; Ibrahim, M. Structure and Properties of Liquid Ammonia. J. Am.
Chem. Soc. 1980, 102, 3309-3315.
(74) Narten, A. H. Liquid-Ammonia - Molecular Correlation-Functions From X-Ray-
Diffraction. J. Chem. Phys. 1977, 66, 3117-3120.
(75) Giesen, D. J.; Chambers, C. C.; Cramer, C. J.; Truhlar, D. G. Solvation model for
chloroform based on class IV atomic charges. J. Phys. Chem. B 1997, 101, 2061-2069.
(76) Miklavc, A. Solvation free energies of small amines: An interpretation thereof and
its general significance. J. Chem. Inf. Comput. Sci. 1998, 38, 269-270.
(77) Straatsma, T. P.; McCammon, J. A. Treatment of Rotational Isomers in Free-Energy
Evaluations - Analysis of the Evaluation of Free-Energy Differences By Molecular-
Dynamics Simulations of Systems With Rotational Isomeric States. J. Chem. Phys.
1989, 90, 3300-3304.
(78) Jorgensen, W. L.; Morales de Tirado, P. I.; Severance, D. L. Monte-Carlo Results
For the Effect of Solvation On the Anomeric Equilibrium For 2-Methoxytetrahydropyran.
J. Am. Chem. Soc. 1994, 116, 2199-2200.
(79) Jorgensen, W. L., To be published.
(80) Dunn, W. J., III; Nagy, P. I. Relative Log-P and Solution Structure For Small
Organic Solutes in the Chloroform Water-System Using Monte-Carlo Methods. J.
Comput. Chem. 1992, 13, 468-477.
(81) Mitsuya, H.; Yarchoan, R.; Broder, S. Molecular Targets For AIDS Therapy. Science
1990, 249, 1533-1544.
(82) De Clercq, E. HIV Resistance to Reverse Transcriptase Inhibitors. Biochem.
Pharmacol. 1994, 47, 155-169.
(83) Katz, R. A.; Skalka, A. M. The Retroviral Enzymes. Ann. Rev. Biochem. 1994, 63,
133-173.
(84) Turner, B. G.; Summers, M. F. Structural Biology of HIV. J. Mol. Biol. 1999, 285,
1-32.
(85) Tantillo, C.; Ding, J. P.; Jacobo-Molina, A.; Nanni, R. G.; Boyer, P. L.; Hughes, S.
H.; Pauwels, R.; Andries, K.; Janssen, P. A. J.; Arnold, E. Locations of Anti-AIDS Drug
Binding Sites and Resistance Mutations in the 3-Dimensional Structure of HIV-1 Reverse
Transcriptase: Implications For Mechanisms of Drug Inhibition and Resistance. J. Mol.
Biol. 1994, 243, 369-387.
(86) Rodgers, D. W.; Gamblin, S. J.; Harris, B. A.; Ray, S.; Culp, J. S.; Hellmig, B.;
Woolf, D. J.; Debouck, C.; Harrison, S. C. The Structure of Unliganded Reverse-
Transcriptase From the Human-Immunodeficiency-Virus Type-1. Proc. Natl. Acad. Sci.
U. S. A. 1995, 92, 1222-1226.
(87) Huang, H.; Chopra, R.; Verdine, G. L.; Harrison, S. C. Structure of a Covalently
Trapped Catalytic Complex of HIV-1 Reverse Transcriptase: Implications for Drug
Resistance. Science 1998, 282, 1669-1674.
(88) Hopkins, A. L.; Ren, J. S.; Esnouf, R. M.; Willcox, B. E.; Jones, E. Y.; Ross, C.;
Miyasaka, T.; Walker, R. T.; Tanaka, H.; Stammers, D. K.; Stuart, D. I. Complexes of
HIV-1 reverse transcriptase with inhibitors of the HEPT series reveal conformational
changes relevant to the design of potent non-nucleoside inhibitors. J. Med. Chem. 1996,
39, 1589-1600.
(89) Preston, B. D.; Poiesz, B. J.; Loeb, L. A. Fidelity of HIV-1 Reverse Transcriptase.
Science 1988, 242, 1168-1171.
(90) Roberts, J. D.; Bebenek, K.; Kunkel, T. A. The Accuracy of Reverse Transcriptase
From HIV-1. Science 1988, 242, 1171-1173.
(91) Perelson, A. S.; Neumann, A. U.; Markowitz, M.; Leonard, J. M.; Ho, D. D. HIV-1
Dynamics in Vivo: Virion Clearance Rate, Infected Cell Life-Span, and Viral Generation
Time. Science 1996, 271, 1582-1586.
(92) Wilson, E. K. AIDS Conference Highlights Hope of Drug Cocktails, Chemokine
Research. Chem. Eng. News 1996, 74, 42-46.
(93) Cohen, J. AIDS Therapies: The Daunting Challenge of Keeping HIV Suppressed.
Science 1997, 277, 32-33.
(94) Tanaka, H.; Takashima, H.; Ubasawa, M.; Sekiya, K.; Nitta, I.; Baba, M.; Shigeta,
S.; Walker, R. T.; De Clercq, E.; Miyasaka, T. Synthesis and Antiviral Activity of Deoxy
Analogs of 1-[(2-Hydroxyethoxy)Methyl]-6-(Phenylthio)Thymine (HEPT) As Potent and
Selective Anti-HIV-1 Agents. J. Med. Chem. 1992, 35, 4713-4719.
(95) Tanaka, H.; Takashima, H.; Ubasawa, M.; Sekiya, K.; Inouye, N.; Baba, M.;
Shigeta, S.; Walker, R. T.; De Clercq, E.; Miyasaka, T. Synthesis and Antiviral Activity of
6-Benzyl Analogs of 1-[(2-Hydroxyethoxy)Methyl]-6-(Phenylthio)Thymine (HEPT) As
Potent and Selective Anti-HIV-1 Agents. J. Med. Chem. 1995, 38, 2860-2865.
(96) Tanaka, H.; Baba, M.; Hayakawa, H.; Sakamaki, T.; Miyasaka, T.; Ubasawa, M.;
Takashima, H.; Sekiya, K.; Nitta, I.; Shigeta, S.; Walker, R. T.; Balzarini, J.; De Clercq, E.
A New Class of HIV-1-Specific 6-Substituted Acyclouridine Derivatives: Synthesis and
Anti-HIV-1 Activity of 5-Substituted or 6-Substituted Analogs of 1-[(2-
Hydroxyethoxy)Methyl]-6-(Phenylthio)Thymine (HEPT). J. Med. Chem. 1991, 34, 349-
357.
(97) Tanaka, H.; Takashima, H.; Ubasawa, M.; Sekiya, K.; Nitta, I.; Baba, M.; Shigeta,
S.; Walker, R. T.; De Clercq, E.; Miyasaka, T. Structure-Activity-Relationships of 1-[(2-
Hydroxyethoxy)Methyl]-6-(Phenylthio)Thymine Analogs: Effect of Substitutions At the
C-6 Phenyl Ring and At the C-5 Position On Anti-HIV-1 Activity. J. Med. Chem. 1992,
35, 337-345.
(98) Hargrave, K. D.; Proudfoot, J. R.; Grozinger, K. G.; Cullen, E.; Kapadia, S. R.;
Patel, U. R.; Fuchs, V. U.; Mauldin, S. C.; Vitous, J.; Behnke, M. L.; Klunder, J. M.; Pal,
K.; Skiles, J. W.; McNeil, D. W.; Rose, J. M.; Chow, G. C.; Skoog, M. T.; Wu, J. C.;
Schmidt, G.; Engel, W. W.; Eberlein, W. G.; Saboe, T. D.; Campbell, S. J.; Rosenthal, A.
S.; Adams, J. Novel Nonnucleoside Inhibitors of HIV-1 Reverse-Transcriptase. 1.
Tricyclic Pyridobenzodiazepinones and Dipyridodiazepinones. J. Med. Chem. 1991, 34,
2231-2241.
(99) Jorgensen, W. L. Free Energy Changes in Solution. In Encyclopedia of
Computational Chemistry; Schleyer, P. v. R., Ed.; Wiley: New York, 1998; Vol. 2, pp
1061-1070.
(100) Lamb, M. L.; Jorgensen, W. L. Computational approaches to molecular
recognition. Curr. Opin. Chem. Biol. 1997, 1, 449-457.
(101) Kollman, P. Free Energy Calculations: Applications to Chemical and Biochemical
Phenomena. Chem. Rev. 1993, 93, 2395-2417.
(102) Jorgensen, W. L. Free Energy Calculations: A Breakthrough for Modeling Organic
Chemistry in Solution. Acc. Chem. Res. 1989, 22, 184-189.
(103) Ren, J.; Esnouf, R.; Garman, E.; Somers, D.; Ross, C.; Kirby, I.; Keeling, J.; Darby,
G.; Jones, Y.; Stuart, D.; et al. High resolution structures of HIV-1 RT from four RT-
inhibitor complexes. Nat. Struct. Biol. 1995, 2, 293-302.
(104) Lim, D. Autozmat Version 1.85; Yale University: New Haven, CT, 1999.
(105) Lim, D.; Jorgensen, W. L. ChemEdit. In Encyclopedia of Computational
Chemistry; Schleyer, P. v. R., Ed.; Wiley: New York, 1998; Vol. 5, pp 3295-3302.
(106) Tirado-Rives, J. PEPZ Version 1.0; Yale University: New Haven, CT, 1997.
(107) Jorgensen, W. L. BOSS Version 4.1; Yale University: New Haven, CT, 2000.
(108) Smerdon, S. J.; Jager, J.; Wang, J.; Kohlstaedt, L. A.; Chirino, A. J.; Friedman, J.
M.; Rice, P. A.; Steitz, T. A. Structure of the Binding Site for Nonnucleoside Inhibitors
of the Reverse Transcriptase of Human Immunodeficiency Virus Type 1. Proc. Natl.
Acad. Sci. U. S. A. 1994, 91, 3911-3915.
(109) Ding, J.; Das, K.; Tantillo, C.; Zhang, W.; Clark, A. D., Jr.; Jessen, S.; Lu, X.;
Hsiou, Y.; Jacobo-Molina, A.; Andries, K.; et al. Structure of HIV-1 reverse transcriptase
in a complex with the non-nucleoside inhibitor alpha-APA R 95845 at 2.8 Å resolution.
Structure 1995, 3, 365-79.
(110) Das, K.; Ding, J. P.; Hsiou, Y.; Clark, A. D.; Moereels, H.; Koymans, L.; Andries,
K.; Pauwels, R.; Janssen, P. A. J.; Boyer, P. L.; Clark, P.; Smith, R. H.; Smith, M. B. K.;
Michejda, C. J.; Hughes, S. H.; Arnold, E. Crystal structures of 8-Cl and 9-Cl TIBO
complexed with wild-type HIV-1 RT and 8-Cl TIBO complexed with the Tyr181Cys
HIV-1 RT drug-resistant mutant. J. Mol. Biol. 1996, 264, 1085-1100.
(111) Ren, J.; Esnouf, R.; Hopkins, A.; Ross, C.; Jones, Y.; Stammers, D.; Stuart, D. The
structure of HIV-1 reverse transcriptase complexed with 9-chloro-TIBO: lessons for
inhibitor design. Structure 1995, 3, 915-26.
(112) Esnouf, R. M.; Ren, J. S.; Hopkins, A. L.; Ross, C. K.; Jones, E. Y.; Stammers, D.
K.; Stuart, D. I. Unique features in the structure of the complex between HIV-1 reverse
transcriptase and the bis(heteroaryl)piperazine (BHAP) U-90152 explain resistance
mutations for this nonnucleoside inhibitor. Proc. Natl. Acad. Sci. U. S. A. 1997, 94,
3984-3989.
(113) Ren, J.; Esnouf, R. M.; Hopkins, A. L.; Warren, J.; Balzarini, J.; Stuart, D. I.;
Stammers, D. K. Crystal structures of HIV-1 reverse transcriptase in complex with
carboxanilide derivatives. Biochemistry 1998, 37, 14394-14403.
(114) Jorgensen, W. L. MCPRO Version 1.65; Yale University: New Haven, CT, 2000.
(115) Cheng, Y.; Prusoff, W. H. Relationship Between Inhibition Constant (Ki) and
Concentration of Inhibitor Which Causes 50 Per Cent Inhibition (I50) of an Enzymatic
Reaction. Biochem. Pharmacol. 1973, 22, 3099-3108.
(116) Balzarini, J.; Karlsson, A.; Sardana, V. V.; Emini, E. A.; Camarasa, M. J.;
De Clercq, E. Human Immunodeficiency Virus 1 (HIV-1)-Specific Reverse-Transcriptase
(RT) Inhibitors May Suppress the Replication of Specific Drug-Resistant (E138K)RT
HIV-1 Mutants or Select For Highly Resistant (Y181 → C181I) RT HIV-1 Mutants.
Proc. Natl. Acad. Sci. U. S. A. 1994, 91, 6599-6603.
(117) Baba, M.; Shigeta, S.; Yuasa, S.; Takashima, H.; Sekiya, K.; Ubasawa, M.; Tanaka,
H.; Miyasaka, T.; Walker, R. T.; De Clercq, E. Preclinical Evaluation of MKC-442, a
Highly Potent and Specific Inhibitor of Human-Immunodeficiency Virus Type 1 In Vitro.
Antimicrob. Agents Chemother. 1994, 38, 688-692.
(118) Sall, J. JMP Version 3; SAS Institute Inc.: Cary, NC, 1995.
(119) Böhm, H.-J.; Klebe, G. What Can We Learn from Molecular Recognition in
Protein-Ligand Complexes for the Design of New Drugs? Angew. Chem.-Int. Edit. Engl.
1996, 35, 2588-2614.
(120) Rizzo, R. C.; Jorgensen, W. L. OPLS All-Atom Model for Amines: Resolution of
the Amine Hydration Problem. J. Am. Chem. Soc. 1999, 121, 4827-4836.
(121) Pearlman, S.; Jorgensen, W. L., Submitted for publication.
(122) Dunitz, J. D. The Entropic Cost of Bound Water in Crystals and Biomolecules.
Science 1994, 264, 670.
(123) Buckheit, R. W.; Fliakas-Boltz, V.; Yeagy-Bargo, S.; Weislow, O.; Mayers, D. L.;
Boyer, P. L.; Hughes, S. H.; Pan, B. C.; Chu, S. H.; Bader, J. P. Resistance to 1-[(2-
Hydroxyethoxy)Methyl]-6-(Phenylthio)Thymine Derivatives Is Generated By Mutations
At Multiple Sites in the HIV-1 Reverse-Transcriptase. Virology 1995, 210, 186-193.
(124) De Clercq, E. The role of non-nucleoside reverse transcriptase inhibitors (NNRTIs)
in the therapy of HIV-1 infection. Antiviral Res. 1998, 38, 153-179.
(125) Young, S. D.; Britcher, S. F.; Tran, L. O.; Payne, L. S.; Lumma, W. C.; Lyle, T. A.;
Huff, J. R.; Anderson, P. S.; Olsen, D. B.; Carroll, S. S.; Pettibone, D. J.; O'Brien, J. A.;
Ball, R. G.; Balani, S. K.; Lin, J. H.; Chen, I. W.; Schleif, W. A.; Sardana, V. V.; Long,
W. J.; Byrnes, V. W.; Emini, E. A. L-743,726 (DMP-266) - a Novel, Highly Potent
Nonnucleoside Inhibitor of the Human-Immunodeficiency-Virus Type-1 Reverse-
Transcriptase. Antimicrob. Agents Chemother. 1995, 39, 2602-2605.
(126) Levin, J. NNRTI Update - NNRTI Resistance Report 1998.
http://www.natap.org/reports/NR5-nnrti_update2.resis.htm.
(127) Byrnes, V. W.; Sardana, V. V.; Schleif, W. A.; Condra, J. H.; Waterbury, J. A.;
Wolfgang, J. A.; Long, W. J.; Schneider, C. L.; Schlabach, A. J.; Wolanski, B. S.;
Graham, D. J.; Gotlib, L.; Rhodes, A.; Titus, D. L.; Roth, E.; Blahy, O. M.; Quintero, J.
C.; Staszewski, S.; Emini, E. A. Comprehensive Mutant Enzyme and Viral Variant
Assessment of Human-Immunodeficiency-Virus Type-1 Reverse-Transcriptase
Resistance to Nonnucleoside Inhibitors. Antimicrob. Agents Chemother. 1993, 37, 1576-
1579.
(128) Balzarini, J.; Baba, M.; De Clercq, E. Differential Activities of 1-[(2-
Hydroxyethoxy)Methyl]-6-(Phenylthio)Thymine Derivatives Against Different Human-
Immunodeficiency-Virus Type-1 Mutant Strains. Antimicrob. Agents Chemother. 1995,
39, 998-1002.
(129) Balzarini, J.; Karlsson, A.; Meichsner, C.; Paessens, A.; Riess, G.; De Clercq, E.;
Kleim, J. P. Resistance Pattern of Human-Immunodeficiency-Virus Type-1 Reverse-
Transcriptase to Quinoxaline S-2720. J. Virol. 1994, 68, 7986-7992.
(130) Jorgensen, W. L. MATADOR Version 1.0; Yale University: New Haven, CT, 2000.
(131) Baxter, C. A.; Murray, C. W.; Clark, D. E.; Westhead, D. R.; Eldridge, M. D.
Flexible docking using Tabu search and an empirical estimate of binding affinity.
Proteins 1998, 33, 367-382.
(132) Levy, R. M. IMPACT Version c1.00; Schrödinger, Inc.: Jersey City, NJ, 1999.
(133) Rizzo, R. C.; Tirado-Rives, J.; Jorgensen, W. L. Estimation of Binding Affinities
for HEPT and Nevirapine Analogues with HIV-1 Reverse Transcriptase via Monte Carlo
Simulations. J. Med. Chem. 2001, 44, 145-154.
(134) Maga, G.; Ubiali, D.; Salvetti, R.; Pregnolato, M.; Spadari, S. Selective Interaction
of the Human Immunodeficiency Virus Type 1 Reverse Transcriptase Nonnucleoside
Inhibitor Efavirenz and Its Thio-Substituted Analog with Different Enzyme-Substrate
Complexes. Antimicrob. Agents Chemother. 2000, 44, 1186-1194.
(135) Ren, J.; Milton, J.; Weaver, K. L.; Short, S. A.; Stuart, D. I.; Stammers, D. K.
Structural Basis for the Resilience of Efavirenz (DMP-266) to Drug Resistance Mutations
in HIV-1 Reverse Transcriptase. Structure 2000, 8, 1089-1094.
(136) Hopkins, A. L.; Ren, J. S.; Tanaka, H.; Baba, M.; Okamoto, M.; Stuart, D. I.;
Stammers, D. K. Design of MKC-442 (emivirine) analogues with improved activity
against drug-resistant HIV mutants. J. Med. Chem. 1999, 42, 4500-4505.
(137) Rizzo, R. C.; Wang, D.; Tirado-Rives, J.; Jorgensen, W. L. Validation of a Model
for the Complex of HIV-1 Reverse Transcriptase with Sustiva through Computation of
Resistance Profiles. J. Am. Chem. Soc. 2000, 122, 12898-12900.
(138) De Simone, G.; Balliano, G.; Milla, P.; Gallina, C.; Giordano, C.; Tarricone, C.;
Rizzi, M.; Bolognesi, M.; Ascenzi, P. Human alpha-thrombin inhibition by the highly
selective compounds N-ethoxycarbonyl-D-Phe-Pro-alpha-azaLys p-nitrophenyl ester and
N-carbobenzoxy-Pro-alpha-azaLys p-nitrophenyl ester: a kinetic, thermodynamic and X-
ray crystallographic study. 1997, 269, 558-69.
(139) Malley, M. F.; Tabernero, L.; Chang, C. Y.; Ohringer, S. L.; Roberts, D. G.; Das,
J.; Sack, J. S. Crystallographic determination of the structures of human alpha-thrombin
complexed with BMS-186282 and BMS-189090. 1996, 5, 221-8.
(140) Banner, D. W.; Hadvary, P. Crystallographic analysis at 3.0-A resolution of the
binding to human thrombin of four active site-directed inhibitors. 1991, 266, 20085-93.
(141) Tabernero, L.; Chang, C. Y.; Ohringer, S. L.; Lau, W. F.; Iwanowicz, E. J.; Han,
W. C.; Wang, T. C.; Seiler, S. M.; Roberts, D. G.; Sack, J. S. Structure of a retro-binding
peptide inhibitor complexed with human alpha-thrombin. 1995, 246, 14-20.
(142) Brandstetter, H.; Turk, D.; Hoeffken, H. W.; Grosse, D.; Sturzebecher, J.; Martin,
P. D.; Edwards, B. F.; Bode, W. Refined 2.3 A X-ray crystal structure of bovine thrombin
complexes formed with the benzamidine and arginine-based thrombin inhibitors NAPAP,
4-TAPAP and MQPA. A starting point for improving antithrombotics. 1992, 226, 1085-
99.
(143) Dreyer, G. B.; Lambert, D. M.; Meek, T. D.; Carr, T. J.; Tomaszek, T. A.;
Fernandez, A. V.; Bartus, H.; Cacciavillani, E.; Hassell, A. M.; Minnich, M.; et al.
Hydroxyethylene isostere inhibitors of human immunodeficiency virus-1 protease:
structure-activity analysis using enzyme kinetics, X-ray crystallography, and infected T-
cell assays. 1992, 31, 6646-59.
(144) Backbro, K.; Lowgren, S.; Osterlund, K.; Atepo, J.; Unge, T.; Hulten; Bonham, N.
M.; Schaal, W.; Karlen, A.; Hallberg, A. Unexpected binding mode of a cyclic sulfamide
HIV-1 protease inhibitor. 1997, 40, 898-902.
(145) Hong, L.; Treharne, A.; Hartsuck, J. A.; Foundling, S.; Tang, J. Crystal structures
of complexes of a peptidic inhibitor with wild-type and two mutant HIV-1 proteases.
1996, 35, 10627-33.
(146) Newlander, K. A.; Callahan, J. F.; Moore, M. L.; Tomaszek, T. A.; Huffman, W. F.
A novel constrained reduced-amide inhibitor of HIV-1 protease derived from the
sequential incorporation of gamma-turn mimetics into a model substrate. 1993, 36, 2321-
31.
(147) Priestle, J. P.; Fassler, A.; Rosel, J.; Tintelnot-Blomley, M.; Strop, P.; Grutter, M.
G. Comparative analysis of the X-ray structures of HIV-1 and HIV-2 proteases in
complex with CGP 53820, a novel pseudosymmetric inhibitor. 1995, 3, 381-9.
(148) Thompson, S. K.; Murthy, K. H.; Zhao, B.; Winborne, E.; Green, D. W.; Fisher, S.
M.; DesJarlais, R. L.; Tomaszek, T. A.; Meek, T. D.; Gleason, J. G.; et al. Rational
design, synthesis, and crystallographic analysis of a hydroxyethylene-based HIV-1
protease inhibitor containing a heterocyclic P1'--P2' amide bond isostere. 1994, 37,
3100-7.
(149) Kim, E. E.; Baker, C. T.; Dwyer, M. D.; Murcko, M. A.; Rao, B. G.; Tung, R. D.;
Navia, M. A. Crystal-Structure of HIV-1 Protease in Complex With VX-478, a Potent and
Orally Bioavailable Inhibitor of the Enzyme. J. Am. Chem. Soc. 1995, 117, 1181-1182.
(150) Baldwin, E. T.; Bhat, T. N.; Gulnik, S.; Liu, B.; Topol, I. A.; Kiso, Y.; Mimoto, T.;
Mitsuya, H.; Erickson, J. W. Structure of HIV-1 protease with KNI-272, a tight-binding
transition-state analog containing allophenylnorstatine. 1995, 3, 581-90.
(151) Chen, Z.; Li, Y.; Chen, E.; Hall, D. L.; Darke, P. L.; Culberson, C.; Shafer, J. A.;
Kuo, L. C. Crystal structure at 1.9-A resolution of human immunodeficiency virus (HIV)
II protease complexed with L-735,524, an orally bioavailable inhibitor of the HIV
proteases. 1994, 269, 26344-8.
(152) Jhoti, H.; Singh, O. M.; Weir, M. P.; Cooke, R.; Murray-Rust, P.; Wonacott, A. X-
ray crystallographic studies of a series of penicillin-derived asymmetric inhibitors of
HIV-1 protease. 1994, 33, 8417-27.
(153) Lam, P. Y.; Jadhav, P. K.; Eyermann, C. J.; Hodge, C. N.; Ru, Y.; Bacheler, L. T.;
Meek, J. L.; Otto, M. J.; Rayner, M. M.; Wong, Y. N.; et al. Rational design of potent,
bioavailable, nonpeptide cyclic ureas as HIV protease inhibitors. 1994, 263, 380-4.
(154) Bone, R.; Vacca, J. P.; Anderson, P. S.; Holloway, M. K. X-Ray Crystal-Structure
of the HIV Protease Complex With L-700,417, an Inhibitor With Pseudo C2 Symmetry.
J. Am. Chem. Soc. 1991, 113, 9382-9384.
(155) Quiocho, F. A.; Vyas, N. K. Novel stereospecificity of the L-arabinose-binding
protein. 1984, 310, 381-6.
(156) Quiocho, F. A.; Wilson, D. K.; Vyas, N. K. Substrate specificity and affinity of a
protein modulated by bound water molecules. 1989, 340, 404-7.
(157) Vermersch, P. S.; Tesmer, J. J.; Lemon, D. D.; Quiocho, F. A. A Pro to Gly
mutation in the hinge of the arabinose-binding protein enhances binding and alters
specificity. Sugar-binding and crystallographic studies. 1990, 265, 16592-603.
(158) Vermersch, P. S.; Lemon, D. D.; Tesmer, J. J.; Quiocho, F. A. Sugar-binding and
crystallographic studies of an arabinose-binding protein mutant (Met108Leu) that
exhibits enhanced affinity and altered specificity. 1991, 30, 6861-6.
(159) Bode, W.; Turk, D.; Sturzebecher, J. Geometry of binding of the benzamidine- and
arginine-based inhibitors N alpha-(2-naphthyl-sulphonyl-glycyl)-DL-p-
amidinophenylalanyl-piperidine (NAPAP) and (2R,4R)-4-methyl-1-[N alpha-(3-methyl-
1,2,3,4-tetrahydro-8-quinolinesulphonyl)-L-arginyl]-2-piperidine carboxylic acid
(MQPA) to human alpha-thrombin. X-ray crystallographic determination of the NAPAP-
trypsin complex and modeling of NAPAP-thrombin and MQPA-thrombin. 1990, 193,
175-82.
(160) Kurinov, I. V.; Harrison, R. W. Prediction of new serine proteinase inhibitors.
1994, 1, 735-43.
(161) Marquart, M.; Walter, J.; Deisenhofer, J.; Bode, W.; Huber, R. The Geometry of
the Reactive Site and of the Peptide Groups in Trypsin, Trypsinogen and Its Complexes
With Inhibitors. Acta Crystallogr. Sect. B-Struct. Commun. 1983, 39, 480-490.
(162) Mattos, C.; Rasmussen, B.; Ding, X.; Petsko, G. A.; Ringe, D. Analogous inhibitors
of elastase do not always bind analogously. 1994, 1, 55-8.
(163) Yao, N.; Trakhanov, S.; Quiocho, F. A. Refined 1.89-A structure of the histidine-
binding protein complexed with histidine and its relationship with many other active
transport/chemosensory proteins. 1994, 33, 4769-79.
(164) Cowan, S. W.; Newcomer, M. E.; Jones, T. A. Crystallographic refinement of
human serum retinol binding protein at 2A resolution. 1990, 8, 44-61.
(165) Vyas, N. K.; Vyas, M. N.; Quiocho, F. A. Sugar and signal-transducer binding sites
of the Escherichia coli galactose chemoreceptor protein. 1988, 242, 1290-5.
(166) Sacchettini, J. C.; Gordon, J. I.; Banaszak, L. J. Crystal structure of rat intestinal
fatty-acid-binding protein. Refinement and analysis of the Escherichia coli-derived
protein with bound palmitate. 1989, 208, 327-39.
(167) Wang, J.; Kollman, P. A.; Kuntz, I. D. Flexible ligand docking: a multistep strategy
approach. 1999, 36, 1-19.
(168) Tirado-Rives, J. CHOP Version 1.0; Yale University: New Haven, CT, 2001.
(169) Jorgensen, W. L., Unpublished Data
(170) Lim, D. Autozmat Version 1.85; Yale University: New Haven, CT, 2000.
(171) Jorgensen, W. L. BOSS Version 4.2; Yale University: New Haven, CT, 2001.