MOLECULAR DOCKING
What is Protein-Ligand Docking?
• Definition: computationally predict the structure of a protein-ligand complex by exploring the possible conformations and orientations (poses) of the ligand in the binding site. The pose that maximizes the favorable interaction is taken as the best prediction of the complex's structure.
• Importance of complexes: structure determines function.
What is Docking?
• Given two molecules, find their correct association.
[Figure: two molecules (+) combining (=) into a complex]
3-D Representation of a Protein Binding Site
[Figure: 3-D model of a binding site with the distances between binding groups labelled in Å]
Distances between binding groups (in Ångströms) and the type of interaction are searchable.
General Protein–Ligand Binding
• Ligand
- Molecule that binds to a protein: DNA, drug lead compounds, etc.
• Protein active site(s)
- Allosteric binding
- Competitive binding
• Function of the binding interaction
- Natural and artificial
Issues Involved in Docking
• Protein structure and active site
- Assumed to be known (PDB structures, comparative modeling, etc.)
- PROCAT database: 3-D enzyme active-site templates
• Ligand structure
- Pharmacophore (base fragment) in a potential drug compound
- Well-known groups
• Rigid vs. flexible
- In solution or in vacuo
- Structure fixed, partly fixed, or flexibility modeled
Algorithmic Approaches to Docking
• Qualitative
– Geometric
– Shape complementarity and fitting
• Quantitative
– Energy calculations
– Determine the global minimum energy
– Free-energy measure
• Hybrid
– Geometric and energy complementarity
– Two-phase process: soft and hard docking
Docking involves:
• Finding useful ways of representing the molecules and their properties.
• Exploring the configuration space available for interaction between ligand and receptor.
• Evaluating and ranking configurations with a scoring system, in this case the binding energy.
However, the binding free energy is difficult to evaluate directly, so it is modeled as a sum of contributions (van der Waals, hydrogen bonding, electrostatics, conformational change, torsional freedom and solvation):
$$\Delta G_{\text{bind}} = \Delta G_{\text{vdW}} + \Delta G_{\text{hbond}} + \Delta G_{\text{elec}} + \Delta G_{\text{conform}} + \Delta G_{\text{tor}} + \Delta G_{\text{sol}}$$
Docking uses a “search and score” method
Docking Strategy
PDB files → Surface representation → Patch detection → Matching patches → Scoring & filtering → Candidate complexes
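Adding angles in Cartesian space
A point (x, y) at distance |r| from the origin and at angle α from the x axis is rotated by θ to (x', y'):
$$x = |r|\cos\alpha, \qquad y = |r|\sin\alpha$$
$$x' = |r|\cos(\alpha+\theta) = |r|(\cos\alpha\cos\theta - \sin\alpha\sin\theta) = |r|\cos\alpha\cos\theta - |r|\sin\alpha\sin\theta = x\cos\theta - y\sin\theta$$
$$y' = |r|\sin(\alpha+\theta) = |r|(\sin\alpha\cos\theta + \cos\alpha\sin\theta) = |r|\sin\alpha\cos\theta + |r|\cos\alpha\sin\theta = y\cos\theta + x\sin\theta$$
In matrix notation (converting internal motion to Cartesian motion), using the rotation matrix:
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$
A 3-D rotation matrix is the product of 2-D rotation matrices, e.g. a rotation by φ about the z axis combined with a rotation by θ about the y axis:
$$R_z(\phi)\,R_y(\theta) = \begin{pmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix} = \begin{pmatrix} \cos\phi\cos\theta & -\sin\phi & \cos\phi\sin\theta \\ \sin\phi\cos\theta & \cos\phi & \sin\phi\sin\theta \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}$$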
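The rotation matrices can be checked numerically. The sketch below uses plain Python lists (no external libraries; all helper names are hypothetical) to build the 2-D matrix and the 3-D matrices for a rotation by φ about z and θ about y, then confirms that their product matches the closed form cos φ cos θ, -sin φ, cos φ sin θ / sin φ cos θ, cos φ, sin φ sin θ / -sin θ, 0, cos θ.

```python
import math

def rot2d(theta):
    """2-D counter-clockwise rotation by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def rot_z(phi):
    """3-D rotation about the z axis."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(theta):
    """3-D rotation about the y axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def apply(m, v):
    """Matrix-vector product m·v."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

# x' = x cos θ - y sin θ, y' = x sin θ + y cos θ: rotating (1, 0) by 90° gives (0, 1)
x_new, y_new = apply(rot2d(math.pi / 2), [1.0, 0.0])

phi, theta = 0.3, 1.1
composed = matmul(rot_z(phi), rot_y(theta))
closed_form = [
    [math.cos(phi) * math.cos(theta), -math.sin(phi), math.cos(phi) * math.sin(theta)],
    [math.sin(phi) * math.cos(theta), math.cos(phi), math.sin(phi) * math.sin(theta)],
    [-math.sin(theta), 0.0, math.cos(theta)],
]
print(round(x_new, 6), round(y_new, 6))  # prints: 0.0 1.0
```

Because every factor is a rotation, the composed matrix also preserves bond lengths, which is why rigid-body moves of a ligand can be expressed this way.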
Kinds of Search: SYSTEMATIC
• Exhaustive
• Deterministic
• Dependent on the granularity of sampling
• Feasible only for low-dimensional problems (e.g. rigid-body docking: 6 DOF, a 6-D search)
• Anchor-and-grow (incremental construction) algorithm: trying to explore all the degrees of freedom of a molecule at once ultimately runs into combinatorial explosion, so ligands are often grown incrementally into the active site.
• Programs: DOCK (incremental), FlexX (incremental), Glide (incremental)
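The combinatorial explosion can be made concrete by counting how many poses a naive systematic grid search would have to score. The sketch below assumes a 30 Å cubic box sampled every 0.5 Å, three rotation angles and every torsion sampled every 15° (values borrowed from the AutoDock parameters quoted later in these notes, used here only as illustrative grid spacings); the function name is hypothetical.

```python
def systematic_samples(box_size=30.0, trans_step=0.5, angle_step=15.0, n_torsions=0):
    """Number of poses a naive systematic (grid) search would have to score."""
    per_axis = int(round(box_size / trans_step)) + 1   # grid points along x, y and z
    per_angle = int(round(360.0 / angle_step))          # samples per rotation or torsion angle
    translations = per_axis ** 3
    orientations = per_angle ** 3                        # three rotation angles
    conformations = per_angle ** n_torsions              # one factor per rotatable bond
    return translations * orientations * conformations

for n in (0, 5, 10):
    print(n, f"{systematic_samples(n_torsions=n):.2e}")
```

Each extra rotatable bond multiplies the count by 24 at this spacing, which is why systematic programs grow the ligand incrementally instead of enumerating all torsions at once.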
Kinds of Search: STOCHASTIC
• Random
• Outcome varies
• Repeat runs to improve the chances of success
• Feasible for higher-dimensional problems
• Methods: Simulated Annealing (SA), Evolutionary Algorithms (EA), Genetic Algorithm (GA), Tabu Search (TS), hybrid global-local search / Lamarckian GA (LGA)
• Monte Carlo (MC) methods and evolutionary algorithms work by making random changes to either a single ligand or a population of ligands. Each new ligand is evaluated with a predefined probability function. In Tabu search, deciding whether to accept a new conformation involves calculating the RMSD between the current molecular coordinates and every previously recorded conformation.
• Programs: AutoDock (MC/SA, GA/LGA), GOLD (GA)
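The Tabu acceptance test described above can be sketched as an RMSD check against a memory of recorded conformations. In this sketch the 1.0 Å cutoff, the memory size of 50 and the `TabuList` class are hypothetical choices; RMSD is computed in the fixed receptor frame without superposition, as is usual for docked poses.

```python
import math

def rmsd(a, b):
    """RMSD between two conformations given as equally ordered lists of (x, y, z)."""
    if len(a) != len(b):
        raise ValueError("conformations must have the same number of atoms")
    sq = sum((p - q) ** 2 for atom_a, atom_b in zip(a, b) for p, q in zip(atom_a, atom_b))
    return math.sqrt(sq / len(a))

class TabuList:
    """Memory of recently visited conformations (hypothetical helper)."""

    def __init__(self, cutoff=1.0, max_size=50):
        self.cutoff = cutoff      # Å; closer than this to a recorded conformation counts as tabu
        self.max_size = max_size  # only the most recent conformations are remembered
        self.visited = []

    def is_tabu(self, conf):
        """True if conf lies within the cutoff of any recorded conformation."""
        return any(rmsd(conf, old) < self.cutoff for old in self.visited)

    def record(self, conf):
        """Remember an accepted conformation, forgetting the oldest if memory is full."""
        self.visited.append(conf)
        if len(self.visited) > self.max_size:
            self.visited.pop(0)

tabu = TabuList()
tabu.record([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
print(tabu.is_tabu([(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)]))  # prints: True
```

A stochastic search would skip (or penalize) any trial move for which `is_tabu` returns True, pushing it toward unexplored regions.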
Kinds of Search: DETERMINISTIC
• Energy minimization methods and molecular dynamics simulations
• Molecular dynamics simulations are often unable to cross high-energy barriers within feasible simulation times, so they may leave ligands trapped in local minima of the energy surface; one remedy is to simulate different parts of the protein-ligand system at different temperatures.
• Energy minimization is rarely used as a stand-alone search technique, because only local energy minima can be reached.
• Programs: DOCK, Glide, AutoDock
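Why minimization alone only reaches local minima can be shown on a toy one-dimensional energy surface. The tilted double well, the fixed step size and the starting points below are all hypothetical; steepest descent simply ends in whichever basin it starts in.

```python
def energy(x):
    """Toy tilted double-well energy (hypothetical): global minimum near x = -1.04."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def gradient(x):
    """Analytical derivative dE/dx."""
    return 4.0 * x * (x * x - 1.0) + 0.3

def steepest_descent(x, step=0.01, tol=1e-8, max_iter=100_000):
    """Move downhill along -dE/dx until the gradient is (almost) zero."""
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x

x_right = steepest_descent(2.0)   # trapped in the local minimum near x = 0.96
x_left = steepest_descent(-2.0)   # reaches the global minimum near x = -1.04
print(round(x_right, 2), round(x_left, 2))
```

Started on the right, the minimizer never crosses the barrier near x = 0, even though the left well is lower in energy; this is why minimization is normally combined with a global search.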
Summary of programs by search method:
• Random/stochastic: AutoDock (MC), MOE-Dock (MC, TS), GOLD (GA), PRO_LEADS (TS)
• Systematic: DOCK (incremental), FlexX (incremental), Glide (incremental), Hammerhead (incremental)
• Simulation/deterministic: DOCK, Glide, MOE-Dock, AutoDock, Hammerhead
SCORING FUNCTIONS: FORCE-FIELD-BASED SCORING
• Quantifies the sum of two energies: the receptor-ligand interaction energy and the internal ligand energy.
• Most scoring functions consider a single protein conformation, which makes it possible to omit the internal protein energy and simplifies scoring.
• Force-field scoring functions differ in their force-field parameter sets, e.g. G-Score uses the Tripos force field and AutoDock the AMBER force field.
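A force-field interaction term can be sketched as a sum of pairwise 12-6 Lennard-Jones and Coulomb energies between ligand and protein atoms. This is a generic illustration, not the Tripos or AMBER parameterization used by G-Score or AutoDock: the atom parameters, the Lorentz-Berthelot combining rules, the constant dielectric of 4 and the 8 Å cutoff are all assumptions. Only the intermolecular part is computed; the internal ligand energy would be added separately.

```python
import math

COULOMB = 332.06  # approximate factor converting e^2/Å to kcal/mol

def pair_energy(r, epsilon, sigma, qi, qj, dielectric=4.0):
    """12-6 Lennard-Jones plus Coulomb energy (kcal/mol) for one atom pair at distance r (Å)."""
    sr6 = (sigma / r) ** 6
    lj = 4.0 * epsilon * (sr6 * sr6 - sr6)
    coulomb = COULOMB * qi * qj / (dielectric * r)
    return lj + coulomb

def interaction_energy(ligand, protein, dielectric=4.0, cutoff=8.0):
    """Sum pair energies over ligand-protein atom pairs closer than the cutoff.

    Each atom is a tuple (x, y, z, charge, epsilon, sigma)."""
    total = 0.0
    for *pi, qi, eps_i, sig_i in ligand:
        for *pj, qj, eps_j, sig_j in protein:
            r = math.dist(pi, pj)
            if r > cutoff:
                continue
            eps = math.sqrt(eps_i * eps_j)   # geometric-mean well depth
            sig = 0.5 * (sig_i + sig_j)      # arithmetic-mean contact distance
            total += pair_energy(r, eps, sig, qi, qj, dielectric)
    return total
```

The Lennard-Jones part reaches its minimum of -ε at r = 2^(1/6)σ, which gives a quick sanity check on any implementation.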
SCORING FUNCTIONS: EMPIRICAL SCORING
• Based on binding energies and/or conformations.
• Designed on the idea that binding energies can be approximated by a sum of individual, uncorrelated terms.
• The coefficients are obtained by regression analysis using experimentally determined binding energies and X-ray structural information.
• Disadvantage: the function depends on the molecular data sets used to perform the regression analysis.
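The regression step can be sketched as an ordinary least-squares fit of ΔG ≈ w0 + Σ wk·termk. The descriptors (hydrogen-bond count, buried lipophilic area in units of 100 Å², rotatable bonds), the "true" weights and the noise-free training data are hypothetical; real empirical functions such as LUDI or ChemScore use their own terms and experimental data sets.

```python
def solve(a, b):
    """Solve the small linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_weights(terms, dg):
    """Least-squares weights (intercept first) via the normal equations AᵀA·w = Aᵀy."""
    rows = [[1.0] + list(t) for t in terms]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, dg)) for i in range(k)]
    return solve(ata, aty)

TRUE_WEIGHTS = (-1.0, -0.5, -0.2, 0.3)   # hypothetical: intercept, H-bond, lipophilic, rotor
terms = [(2, 3.0, 4), (5, 1.5, 2), (1, 6.0, 7), (3, 2.5, 0), (4, 4.0, 5), (0, 1.0, 3)]
dg = [TRUE_WEIGHTS[0] + sum(w * t for w, t in zip(TRUE_WEIGHTS[1:], row)) for row in terms]
weights = fit_weights(terms, dg)
```

With real, noisy data the fitted weights change with the training set, which is exactly the disadvantage listed above.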
KNOWLEDGE-BASED SCORING
• Aims to reproduce experimental structures rather than binding energies.
• The protein-ligand complex is modeled using relatively simple atomic interaction-pair potentials.
• A number of atom-type interactions are defined depending on their molecular environment.
• The main attraction is computational simplicity, which permits efficient screening of large compound databases.
• Disadvantage: the derivation is essentially based on information implicitly encoded in limited sets of protein-ligand complexes.
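Knowledge-based pair potentials are commonly derived by inverse Boltzmann statistics, w(r) = -kT ln(g_obs(r) / g_ref(r)), from contact counts observed in a set of complexes. The sketch below is a simplified illustration of that idea, not the exact PMF or DrugScore derivation; the counts, the 0.5 Å bin width and the fixed penalty for empty bins are hypothetical.

```python
import math

KT = 0.593  # k_B*T in kcal/mol at about 298 K

def pmf_from_counts(observed, reference, kt=KT, penalty=3.0):
    """Inverse-Boltzmann potential per distance bin for one protein-ligand atom-type pair."""
    n_obs, n_ref = sum(observed), sum(reference)
    potential = []
    for o, r in zip(observed, reference):
        if o == 0 or r == 0:
            potential.append(penalty)            # empty bin: treat as unfavourable
        else:
            ratio = (o / n_obs) / (r / n_ref)    # normalized observed / reference frequency
            potential.append(-kt * math.log(ratio))
    return potential

def pmf_score(distances, potential, bin_width=0.5):
    """Sum the potential over ligand-protein distances (Å) of that atom-type pair."""
    total = 0.0
    for d in distances:
        b = int(d / bin_width)
        if b < len(potential):                   # pairs beyond the last bin contribute nothing
            total += potential[b]
    return total

pot = pmf_from_counts([0, 10, 30, 20], [10, 20, 20, 10])
```

Scoring a pose then only needs table look-ups, which illustrates the computational simplicity mentioned above.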
Scoring in Ligand-Protein Docking
Potential Energy Description:
Type of Scoring Functions
• Force-field based: D-Score, G-Score, GOLD Score, AutoDock, DOCK
• Knowledge based: PMF, DrugScore
• Consensus: CScore, X-Score
• Empirical: LUDI, F-Score, ChemScore
Scoring functions:
• AutoDock
• Cerius2/LigScore: vdW 6-9 term; C+pol, buried polar surface in attractive protein-ligand contacts; Totpol², square of the buried polar surface in attractive and repulsive protein-ligand contacts
• Cerius2/PLP
• Cerius2/PMF
• Cerius2/LUDI
• SYBYL/F-Score
• SYBYL/G-Score
• SYBYL/D-Score
• SYBYL/ChemScore
• DrugScore
• X-Score
DOCKING SCORE
The docking score is based on the bonded and non-bonded interactions between the ligand and the binding site.
• Bonded interactions
• Non-bonded interactions
BINDING AFFINITY
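The interaction of most ligands with their binding sites can be characterized in terms of a binding affinity.
The free energy of binding (ΔG) is related to the binding affinity by
$$\Delta G = -RT\ln K_a = RT\ln K_i$$
The equilibrium equation is:
$$E + I \rightleftharpoons EI, \qquad K_i = \frac{[E][I]}{[EI]} = \frac{1}{K_a}$$
where ΔG is the Gibbs free energy, R is the gas constant, T is the temperature, K is the equilibrium constant (K_a for association, K_i for dissociation of the enzyme-inhibitor complex), E is the enzyme and I is the inhibitor.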
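Converting between a measured dissociation (or inhibition) constant and a binding free energy uses ΔG = RT ln K_d, equivalently -RT ln K_a. The sketch assumes a standard state of 1 M and a temperature of 298.15 K; the function names are hypothetical.

```python
import math

R = 1.987204e-3  # gas constant in kcal/(mol*K)

def dg_from_kd(kd_molar, temperature=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R * temperature * math.log(kd_molar)

def kd_from_dg(dg, temperature=298.15):
    """Dissociation constant (mol/L) from a binding free energy in kcal/mol."""
    return math.exp(dg / (R * temperature))

print(round(dg_from_kd(1e-9), 2))  # a 1 nM inhibitor: about -12.28 kcal/mol
```

Each tenfold improvement in affinity corresponds to roughly 1.36 kcal/mol at room temperature, a useful scale when judging the error bars of scoring functions.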
• 172 protein-ligand complexes were chosen on the basis of resolution (better than 2.5 Å).
• Of these 172 complexes, 100 passed the selection.
• The 100 protein-ligand complexes include 43 different proteins; ligand molecular weights range from 122 to 913 Da, and ligands have 0-20 rotatable single bonds.
Conformational sampling procedure
• Selection of a suitable sample for study.
• Sampling is done with AutoDock (genetic algorithm).
• Parameters: the translation, rotation and torsion step sizes are set to 0.5 Å, 15° and 15°, respectively.
• The docking box is 30 Å × 30 Å × 30 Å.
• Screening parameters: RMSD 0-15 Å; 30-70 distinct conformational clusters. A docked conformation should be close to the experimental conformation (RMSD ≤ 2.0 Å).
• ga-num-generations determines the quality of the sampling; 50-200 runs per complex.
• Force-field-based scoring: AutoDock, G-Score, D-Score
• Empirical scoring: LigScore, PLP, LUDI, F-Score, ChemScore, X-Score
• Knowledge-based scoring: PMF, DrugScore
[Figure: success rates of the 11 scoring functions under different RMSD criteria]
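AUTODOCK
• Simulated annealing: based on temperature effects; start at a high temperature with a global search, then lower the temperature for a local search.
• Genetic algorithm: based on Charles Darwin's theory of evolution; information flows from genotype to phenotype.
• Lamarckian algorithm (after Jean-Baptiste de Lamarck): information also flows from phenotype back to genotype, i.e. the result of local optimization is inherited.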
Search parameters
• Population size, crossover rate, mutation rate, local search, energy evaluations
• Termination criteria: energy evaluations, generations
Genetic function algorithm
1. Start with a random population (50-200 individuals).
2. Perform crossover (sex: two parents give two children) and mutation (cosmic rays: one individual gives one mutant child).
3. Compute the fitness of each individual; apply proportional selection and elitism.
4. Begin a new generation, repeating until the maximum number of energy evaluations or the maximum number of generations is reached.
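The loop above can be sketched as a small genetic algorithm over a toy gene vector standing in for translation, orientation and torsion genes. The toy energy, the operator choices (one-point crossover, Gaussian mutation, fitness 1/(1 + E)) and the hill-climbing local search are assumptions, not AutoDock's exact operators. With `lamarckian=True`, locally optimized genes are written back into the population, which is the Lamarckian step.

```python
import random

TARGET = [1.0, -2.0, 0.5, 3.0]  # hypothetical "native" gene values

def energy(genes):
    """Toy energy (lower is better): squared distance from TARGET."""
    return sum((g - t) ** 2 for g, t in zip(genes, TARGET))

def local_search(genes, rng, steps=20, step_size=0.1):
    """Random hill-climbing; returns (improved genes, their energy, evaluations used)."""
    best, best_e = list(genes), energy(genes)
    for _ in range(steps):
        trial = [g + rng.gauss(0.0, step_size) for g in best]
        e = energy(trial)
        if e < best_e:
            best, best_e = trial, e
    return best, best_e, steps + 1

def run_ga(pop_size=50, crossover_rate=0.8, mutation_rate=0.02, n_elite=1,
           max_generations=200, max_evals=250_000, lamarckian=False,
           ls_rate=0.06, seed=0):
    rng = random.Random(seed)
    n_genes = len(TARGET)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(n_genes)] for _ in range(pop_size)]
    evals = 0
    for _ in range(max_generations):
        energies = [energy(ind) for ind in pop]
        evals += pop_size
        if lamarckian:
            for i in range(pop_size):
                if rng.random() < ls_rate:
                    # Lamarckian step: the locally optimized phenotype replaces the genotype.
                    pop[i], energies[i], used = local_search(pop[i], rng)
                    evals += used
        if evals >= max_evals:                                   # termination criterion
            break
        order = sorted(range(pop_size), key=lambda i: energies[i])
        new_pop = [list(pop[i]) for i in order[:n_elite]]        # elitism
        fitness = [1.0 / (1.0 + e) for e in energies]            # lower energy -> fitter
        while len(new_pop) < pop_size:
            p1, p2 = rng.choices(pop, weights=fitness, k=2)      # proportional selection
            c1, c2 = list(p1), list(p2)
            if rng.random() < crossover_rate:                    # two parents -> two children
                cut = rng.randrange(1, n_genes)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):
                for k in range(n_genes):
                    if rng.random() < mutation_rate:             # Gaussian mutation
                        child[k] += rng.gauss(0.0, 1.0)
                if len(new_pop) < pop_size:
                    new_pop.append(child)
        pop = new_pop
    best = min(pop, key=energy)
    return best, energy(best)

best_genes, best_energy = run_ga(lamarckian=True)
```

Because the elite individual is always copied unchanged, the best energy in the population can never get worse from one generation to the next.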
Dimensionality of molecular docking
Degrees of freedom (DOF):
• Position or translation (x, y, z) = 3
• Orientation or quaternion (qx, qy, qz, qw) = 4
• Rotatable bonds or torsions (tor1, tor2, … torn) = n
• Total DOF, or dimensionality: D = 3 + 4 + n = 7 + n
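The orientation part of the state can be turned into a rotation matrix with the standard unit-quaternion formula. The sketch (helper names hypothetical) normalizes the quaternion first: a unit quaternion stores four numbers but has only three independent rotational degrees of freedom, so the D = 7 + n count is the number of state variables a search manipulates.

```python
import math

def quat_to_matrix(qx, qy, qz, qw):
    """Rotation matrix for the quaternion (qx, qy, qz, qw), normalized to unit length first."""
    n = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    x, y, z, w = qx / n, qy / n, qz / n, qw / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]

def dimensionality(n_torsions):
    """D = 3 (translation) + 4 (quaternion components) + n (torsions)."""
    return 3 + 4 + n_torsions

def pose_vector(translation, quaternion, torsions):
    """Flatten one ligand state into the D-dimensional vector a search algorithm explores."""
    return list(translation) + list(quaternion) + list(torsions)

half = math.radians(90) / 2
m = quat_to_matrix(0.0, 0.0, math.sin(half), math.cos(half))   # 90° about the z axis
rotated = [sum(m[i][k] * v for k, v in enumerate((1.0, 0.0, 0.0))) for i in range(3)]
```

For a ligand with 12 rotatable bonds, as in the simulated annealing run in these notes, D = 19.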
AutoDock uses grid-based docking: ligand-protein interaction energies are precalculated and then used as a look-up table during the simulation.
Grid maps are constructed for the ligand atom types of interest (here C, A, N, O, S and H).
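A grid look-up is typically combined with trilinear interpolation between the eight grid points surrounding an atom. The sketch below assumes one map per atom type, a hypothetical layout (`values[i][j][k]` at `origin + (i, j, k) * spacing`), a large fixed penalty outside the box and the class name `GridMap`; it is not AutoDock's internal code.

```python
import math

class GridMap:
    """Precomputed energies for one ligand atom type on a regular 3-D grid (hypothetical layout)."""

    def __init__(self, origin, spacing, values):
        self.origin = origin      # Cartesian coordinates (Å) of grid point values[0][0][0]
        self.spacing = spacing    # distance between neighbouring grid points (Å)
        self.values = values      # values[i][j][k] = energy at origin + (i, j, k) * spacing
        self.shape = (len(values), len(values[0]), len(values[0][0]))

    def energy_at(self, x, y, z, outside=1e6):
        """Trilinear interpolation of the eight grid points surrounding (x, y, z)."""
        f = [(c - o) / self.spacing for c, o in zip((x, y, z), self.origin)]
        idx = [int(math.floor(v)) for v in f]
        if any(i < 0 or i >= n - 1 for i, n in zip(idx, self.shape)):
            return outside        # large penalty for atoms outside the grid box
        dx, dy, dz = (v - i for v, i in zip(f, idx))
        i, j, k = idx
        total = 0.0
        for di in (0, 1):
            for dj in (0, 1):
                for dk in (0, 1):
                    weight = ((dx if di else 1 - dx) * (dy if dj else 1 - dy)
                              * (dz if dk else 1 - dz))
                    total += weight * self.values[i + di][j + dj][k + dk]
        return total

def score_ligand(atoms, maps):
    """Sum interpolated energies over ligand atoms given as (atom_type, x, y, z)."""
    return sum(maps[t].energy_at(x, y, z) for t, x, y, z in atoms)
```

Scoring a pose then costs eight look-ups per ligand atom, independent of the number of protein atoms, which is what makes grid-based docking fast.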
Docking Preparation – Grid
[Figure (SYBYL): initial X-ray crystallographic positions of the protein and ligand]
Simulated annealing
• One copy of the ligand (population = 1).
• Starts from a random or specified position/orientation/conformation (the state).
• Each annealing cycle runs at constant temperature (moves are accepted or rejected).
• The temperature is reduced before the next cycle; the run stops at the maximum number of cycles.
Docking – Simulated Annealing parameters
• Runs = 100
• Cycles = 50
• Initial temperature (RT) = 1,000
• Temperature reduction factor = 0.95
• Linear temperature reduction
• Translation reduction factor = 1
• Quaternion reduction factor = 1
• Torsional reduction factor = 1
• Number of rotatable bonds = 12
• Initial coordinates = random
• Initial quaternion = random
• Initial dihedrals = random
• Translation step = 2.0 Å
• Quaternion step = 50°
• Torsion step = 50°
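The annealing procedure can be sketched as a single-copy Metropolis search whose temperature drops after each cycle. The sketch reuses the cycle count, initial RT, reduction factor and 2.0 Å translation step listed above, but moves only the ligand's position on a hypothetical harmonic energy surface; the 100 steps per cycle and the `linear` option (RT reduced by a fixed amount per cycle) are assumptions.

```python
import math
import random

TARGET = (1.0, -2.0, 0.5)  # hypothetical "native" ligand position (Å)

def energy(pos):
    """Toy energy in cal/mol (hypothetical): harmonic well centred on TARGET."""
    return 500.0 * sum((p - t) ** 2 for p, t in zip(pos, TARGET))

def temperatures(rt0=1000.0, cycles=50, factor=0.95, linear=False):
    """RT for each cycle: geometric (rt0 * factor**c) or linear decrease towards zero."""
    if linear:
        return [rt0 * (1.0 - c / cycles) for c in range(cycles)]
    return [rt0 * factor ** c for c in range(cycles)]

def simulated_annealing(start, rt0=1000.0, cycles=50, factor=0.95,
                        steps_per_cycle=100, trans_step=2.0, linear=False, seed=0):
    """Single-copy Metropolis annealing over translations only; returns the best state seen."""
    rng = random.Random(seed)
    pos, e = list(start), energy(start)
    best_pos, best_e = list(pos), e
    for rt in temperatures(rt0, cycles, factor, linear):
        for _ in range(steps_per_cycle):          # constant temperature within a cycle
            trial = [p + rng.uniform(-trans_step, trans_step) for p in pos]
            e_trial = energy(trial)
            delta = e_trial - e
            # Metropolis criterion: accept downhill moves, uphill ones with probability exp(-ΔE/RT)
            if delta <= 0 or rng.random() < math.exp(-delta / rt):
                pos, e = trial, e_trial
                if e < best_e:
                    best_pos, best_e = list(pos), e
    return best_pos, best_e

best_pos, best_e = simulated_annealing([8.0, 8.0, 8.0])
```

Early cycles at high RT accept many uphill moves (global search); as RT falls, almost only downhill moves survive (local search).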
Results: 100 different clusters; energy range -0.63 to +64,000
• Conformation #81 (-0.63, lowest energy): not close to the original position, but similar in conformation to the original ligand.
• Conformation #67 (+20.02): closest to the position and conformation of the original ligand, but higher in energy.
• Conformation #68 (+10.74): close to the original position but not to the conformation of the original ligand; lower in energy than #67.
[Figure (SYBYL): original ligand conformation vs. SA conformation #67]
[Figure (SYBYL): close-up of the previous figure]
[Figure: original ligand conformation vs. the best GA, LGA, SA and LS conformations]
GOLD (CCDC, Cambridge, UK) www.ccdc.cam.ac.uk/products/life_sciences/gold/
• Flexible docking:
– matches protein and ligand hydrogen-bond “fitting points”
– optimizes the poses using a genetic algorithm
– flexible rings handled by flipping ring corners
• Locally flexible protein: polar hydrogens are allowed to move
• Water molecules switched on and off to maximize interactions
• SF: GOLD fitness score
FRED (OpenEye, Santa Fe, CA, USA)
• Rigid-body docking using a shape-based approach:
– random generation of poses within the active site
– Gaussian functions to represent atoms
– Gaussian docking functions (combining the overlap between ligand and protein atoms with area intersection) and a quasi-Newton rigid-body optimization algorithm to place the ligand and select poses
– rigid protein
• SF: ChemScore; empirical FF
FlexX/FlexE (BioSolveIt, Sankt Augustin, Germany)
• Flexible docking:
– incremental construction of the ligand combined with matching of ligand groups to protein interaction types
– multiple conformations for rings
• Rigid or flexible protein:
– all-atom representation
– composite structures assessed (FlexE)
– water considered using the particle concept (waters are placed before docking and kept during the docking run only if they create favourable interactions)
• SF: FlexX SF; empirical FF
FLEX-X: the general schema
[Figure: modeling of receptor-ligand interactions and ligand conformational flexibility, a scoring function, and the algorithm: base selection → base placement → incremental construction]
FLEX-X scoring function
• Estimates the free binding energy of the complex.
• The function is additive in the ligand atoms.
• Components: match score and contact score.
Ligand fragmentation
• Good results are produced if the added fragments are small.
• Every fragment except the base fragment consists of only one component.
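Incremental construction can be sketched as growing the ligand one fragment at a time from a placed base fragment while keeping only the best few partial placements (a beam search), in the spirit of FlexX's greedy strategy. The discrete torsion set, the preferred angle per fragment and the toy additive score with a "clash" penalty are all hypothetical.

```python
import heapq

TORSIONS = (0, 60, 120, 180, 240, 300)   # hypothetical discrete torsion choices (degrees)
PREFERRED = (60, 60, 180, 180)           # hypothetical best angle for each added fragment

def angle_gap(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def fragment_score(step, prev_angle, angle):
    """Toy additive score (lower is better) for adding fragment `step` at torsion `angle`."""
    score = angle_gap(angle, PREFERRED[step]) / 60.0
    if angle == prev_angle:
        score += 2.0                     # toy clash with the previously added fragment
    return score

def incremental_construction(beam_width=3):
    """Grow the ligand fragment by fragment, keeping only the best partial placements."""
    partial = [(0.0, ())]                # base fragment already placed (score 0)
    for step in range(len(PREFERRED)):
        extended = []
        for score, torsions in partial:
            prev = torsions[-1] if torsions else None
            for angle in TORSIONS:
                extended.append((score + fragment_score(step, prev, angle),
                                 torsions + (angle,)))
        partial = heapq.nsmallest(beam_width, extended)
    return partial[0]

best_score, best_torsions = incremental_construction()
```

Because the score is additive over fragments, partial solutions can be ranked and pruned at every step; here only about 70 partial placements are scored instead of 6^4 = 1,296 complete ones.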
DOCK 6.0 (http://dock.compbio.ucsf.edu) (UCSF, CA, USA)
• Rigid-body docking using a clique-matching algorithm
• Flexible ligand using an incremental construction algorithm combined with a simplex minimizer
• Flexible protein:
– negative image of the active site using spheres
– precomputed grids based on AMBER intermolecular energy and GB/SA (Generalized Born/Surface Area) solvation energy
– protein flexibility considered using combined grids
• SF: Dockscore; AMBER force field
DOCK as an Example
DOCK works in 5 steps:
• Step 1: Start with the 3-D coordinates of the target receptor.
• Step 2: Generate a molecular surface for the receptor.
• Step 3: Generate spheres to fill the active site of the receptor; the spheres become potential locations for ligand atoms.
• Step 4: Matching: sphere centers are matched to the ligand atoms to determine possible orientations for the ligand.
• Step 5: Scoring: find the top-scoring orientation.
[Figure: DOCK example, images 4 and 5]
• Three scoring schemes: shape scoring, electrostatic scoring and force-field scoring.
• Image 5 compares the top-scoring orientation of the molecule thioketal with the orientation found in the crystal structure.
The DOCK Algorithm
Two steps in rigid-ligand mode:
1. Orienting the putative ligand in the site: guided by matching the distances between predefined site points on the target to the interatomic distances of the ligand. A rotation-translation (RT) matrix is used to transform the ligand.
2. Scoring the resulting orientation: each orientation is scored for the quality of its fit. The process is repeated for a user-defined number of orientations, up to a maximum.
Site Points Generation in DOCK
• Program SPHGEN identifies the active site, and other sites of interest.
• Each invagination is characterized by a set of overlapping spheres.
• For receptors, a negative image of the surface invaginations is created;
• For a ligand, the program creates a positive image of the entire molecule.
The Matching
Matching can be directed by two additional features:
• Chemical matching - labeling the site points such that only particular atom types are allowed to be matched to them.
• Critical cluster - subsets of interest can be defined as critical clusters, so that at least one member of them will be part of any accepted ligand “match”.
Increase in efficiency and speed due to elimination of potentially less promising orientations!
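The distance test behind matching, together with chemical matching and a critical cluster, can be sketched by brute force: try every assignment of ligand atoms to distinct site spheres and keep those whose internal distances agree within a tolerance. DOCK itself uses clique detection on a distance-compatibility graph; this enumeration (with a hypothetical 0.5 Å tolerance and label scheme) only illustrates the same acceptance rules on tiny inputs.

```python
import itertools
import math

def compatible(lig_label, site_label):
    """Chemical matching: a site point labelled None accepts any ligand atom type."""
    return site_label is None or site_label == lig_label

def match(ligand, spheres, tol=0.5, critical=frozenset()):
    """All assignments of ligand atoms to distinct spheres with consistent distances.

    ligand and spheres are lists of ((x, y, z), label); returns tuples of sphere indices."""
    matches = []
    for combo in itertools.permutations(range(len(spheres)), len(ligand)):
        if critical and not critical.intersection(combo):
            continue  # critical cluster: at least one critical sphere must be used
        if not all(compatible(ligand[i][1], spheres[s][1]) for i, s in enumerate(combo)):
            continue
        consistent = True
        for i, j in itertools.combinations(range(len(ligand)), 2):
            d_lig = math.dist(ligand[i][0], ligand[j][0])
            d_site = math.dist(spheres[combo[i]][0], spheres[combo[j]][0])
            if abs(d_lig - d_site) > tol:
                consistent = False
                break
        if consistent:
            matches.append(combo)
    return matches

ligand = [((0.0, 0.0, 0.0), "A"), ((3.0, 0.0, 0.0), "D"), ((0.0, 4.0, 0.0), "H")]
spheres = [((10.0, 10.0, 10.0), "A"), ((13.0, 10.0, 10.0), "D"),
           ((10.0, 14.0, 10.0), None), ((20.0, 20.0, 20.0), None)]
print(match(ligand, spheres))  # prints: [(0, 1, 2)]
```

Each accepted assignment would then be turned into an RT matrix that superimposes the matched ligand atoms onto the sphere centres. The label and critical-cluster filters reject candidates before any distances are compared, which is where the speed-up comes from.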
DOCK
[Figure: the DOCK procedure illustrated on a small ligand]
1. Define the target binding-site points.
2. Match the distances.
3. Calculate the transformation matrix for the orientation.
4. Dock the molecule.
5. Score the fit.
Pharmacophore-Based Docking
Basic idea:
• An appropriate spatial disposition of a small number of functional groups in a molecule is sufficient to achieve a desired biological effect.
• The ensemble formation will be guided by these functional groups.
Pharmacophore Fingerprint
• Pharmacophore fingerprint: a set of pharmacophore features and their relative positions.
• Typical pharmacophore features:
– Hydrogen-bond donors and acceptors
– Positive and negative ionizable atoms/groups
– Hydrophobes and ring centroids
• Implemented in DOCK 4.0.1:
– Hydrogen-bond donors
– Hydrogen-bond acceptors
– Dual hydrogen-bond donors/acceptors
– 5- or 6-membered ring centroids
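One simple way to encode "features and their relative positions" is a set of feature-pair triplets (type A, type B, binned distance), compared with Tanimoto similarity. This encoding, the feature type names and the 1 Å bin width are assumptions for illustration, not DOCK 4.0.1's internal representation.

```python
import itertools
import math

def pharmacophore_fingerprint(features, bin_width=1.0):
    """Set of (type_a, type_b, distance_bin) for every pair of features.

    features: list of (feature_type, (x, y, z)); pair types are sorted so order does not matter."""
    fp = set()
    for (ta, pa), (tb, pb) in itertools.combinations(features, 2):
        t1, t2 = sorted((ta, tb))
        fp.add((t1, t2, int(math.dist(pa, pb) // bin_width)))
    return fp

def tanimoto(fp1, fp2):
    """Shared pairs divided by all distinct pairs (1.0 for identical fingerprints)."""
    if not fp1 and not fp2:
        return 1.0
    return len(fp1 & fp2) / len(fp1 | fp2)

features = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.2, 0.0, 0.0)), ("ring", (0.0, 5.5, 0.0))]
fp = pharmacophore_fingerprint(features)
print(sorted(fp))
```

Binning the distances makes the fingerprint tolerant to small conformational changes, at the cost of some spatial resolution.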
Pharmacophore DOCK
1. Prepare the target structure.
2. Generate a set of chemically labeled site points.
3. Read a 3-D pharmacophore from the database.
4. Compare distances between pharmacophore points and site points to determine an orientation matrix.
5. Match? If not, retry until the number of orientation tries exceeds MAX.
6. If yes, use the transformation matrix to dock all conformers associated with the pharmacophore.
7. Score all conformers.
8. Save the best-scoring conformer for each molecule.
Advantages of Pharmacophore-based Docking
• Rapid elimination of ligands containing functional groups which would interfere with binding.
• Speed increase over docking of individual molecules.
• More information pertaining to the entire molecule is retained (no rigid portions).
• Chemical matching and critical clusters are encouraged.
Limitations of Pharmacophore-based Searching
• Only a limited subset of key interactions (typically 4-6) can be used, and these must be extracted from a target site with dozens of potential interactions.
• Complex queries are extremely slow.
• The majority of the information contained in the target structure is not considered during the search. There is no scoring function beyond the binary (match/no match), and any steric or electronic constraints imposed by the target but not defined in the pharmacophore are ignored.
Conformational Ensembles Docking
Observations:
1. Generating an orientation of a ligand in a binding site may be separated from calculating a conformation of the ligand in that particular orientation.
2. Multiple conformations of a given ligand usually have some portion in common (internally rigid atoms such as ring systems), and therefore, contain redundancies.
[Figure: overview of the ligand ensemble method]
Disadvantages of Conformational Ensemble Docking
• Loss of information when the orientations are guided only by a subset of the atoms in the molecule. Orientations may be missed because potential distance matches from non-rigid portions of the molecule are not considered.
• The ensemble method will fail for ligands that lack internally rigid atoms.
• The use of chemical matching and critical clusters is limited.
ACTIVE SITE IDENTIFICATION PROGRAMS
CASTP: http://cast.engr.uic.edu/
XSITE: http://www.biochem.ucl.ac.uk/~roman/xsite/manual/man2.html
Voidoo: http://spec.ch.man.ac.uk/prog_man/o-sat/voidoo.html
APROPOS: http://www.csb.yale.edu/userguides/datamanip/apropos/apropos_descrip.html
CANGAROO: http://chem.leeds.ac.uk/ICAMS/eccc/cangaroo.html
Surfnet: http://www.biochem.ucl.ac.uk/~roman/surfnet/surfnet.html
PASS: http://www.delanet.com/~bradygp/pass/
Active site templates for enzymes (PROCAT): http://www.biochem.ucl.ac.uk/bsm/PROCAT/PROCAT.html
Protein – Ligand Docking Programs
AutoDock: http://www.scripps.edu/mb/olson/doc/autodock/
GOLD: http://www.ccdc.cam.ac.uk/products/life_sciences/gold/
FLEXX: http://www.biosolveit.de/FlexX/
GLIDE: http://www.schrodinger.com/
ICM: http://www.molsoft.com/docking.html
Dock: http://www.cmpharm.ucsf.edu/kuntz/dock.html
Protein protein Docking Programs
ZDOCK: http://zlab.bu.edu/zdock/
HEX: http://www.csd.abdn.ac.uk/hex/
GRAMM: http://vakser.bioinformatics.ku.edu/resources/gramm
ICM: http://www.molsoft.com/docking.html
CLUSPRO: http://nrc.bu.edu/cluster/clusdoc.html
KORDO: http://www.bioinfo.de/isb/gcb99/poster/zimmermann/
MOLFIT: http://www.weizmann.ac.il/Chemical_Research_Support//molfit/