23.02.2012 Bernhard Knapp 1
An introduction to “Docking” and “Molecular Dynamics simulations”
Univ. Ass. Dipl.-Ing. (FH) Dr. scient. med. Bernhard Knapp
Center for Medical Statistics, Informatics and Intelligent Systems
Department for Biosimulation and Bioinformatics
Medical University of Vienna / AKH (General Hospital)
TOC
1. Basic biology knowledge
2. Docking
• Docking in general
• Example AutoDock
3. Molecular Dynamics
• Introduction
• Limitations
• Example Gromacs
4. Tutorial on PDB / jmol
Basic biology knowledge
Wikimedia
Amino acids
Build up proteins (German “Eiweiß”)
All have the same basic structure (a “backbone” consisting of an amine group, a carboxylic acid group and a C-alpha atom) but differ in their side chain, the residue (the side chain defines which AA it is)
There are 20 different canonical amino acids (AAs), i.e. 20 different side chains
Wikimedia
Several amino acids are connected via „peptide bonds“
Wikimedia
Then they are called:
monopeptide: 1 AA
dipeptide: 2 AAs
tripeptide: 3 AAs
tetrapeptide: 4 AAs
pentapeptide: 5 AAs
hexapeptide: 6 AAs
heptapeptide: 7 AAs
octapeptide: 8 AAs
nonapeptide: 9 AAs
decapeptide: 10 AAs
undecapeptide: 11 AAs
...
icosapeptide: 20 AAs
triacontapeptide: 30 AAs
tetracontapeptide: 40 AAs
peptide: > 1 AA
oligopeptide: < 10 AAs (other sources state 30)
polypeptide: > 10 AAs
protein: > 50 AAs
macropeptide: > 100 AAs
… however the exact definitions differ (and you do not need to learn them for the examination of this lecture!)
Structure levels
Primary structure: the pure sequence of the AAs
Secondary structure: e.g. beta-sheet, alpha-helix, or turns
Tertiary structure: 3D arrangement of secondary structure elements
Quaternary structure: several proteins together
Wikimedia
How we can illustrate them (also see the tutorial at the end)
And what about the size of proteins and AAs?
[Janeway]
~13x6x5 nm, ~20x20x20 nm
1 nanometer = 10^-9 m = 0.000000001 m
2 more definitions:
Ligand: also known as (small) peptide, epitope, guest, antigenic determinant
Receptor: also known as (big) protein, host, macro molecule
Docking in general
What does docking mean?
Trying to find the „best match“ between 2 molecules
(„induced fit“)
Let us try with this one …
Who could fit to me?
[Kitchen et al., 2004]
Why is docking useful?
Docking (~Virtual Screening) is of paramount interest for drug discovery
For one target millions of different possible drugs can be tested
The best n matches will be tried in experiments
Will save time, resources and money
Usually 3 steps
1) Decide how to search through the spatial space
2) Decide how flexible ligand and receptor can be
3) Decide how to score various parameter sets
Where is the difficulty?
1) 6 degrees of freedom in 3d space (3 translational, 3 rotational)
2) 100+ degrees of freedom if we consider full flexibility of all bonds
3) nearly every atom interacts with every other one
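The 6 rigid-body degrees of freedom from point 1) can be sketched as a pose vector applied to ligand coordinates. This is only an illustration: the names `rotation_matrix` and `apply_pose`, the choice of x/y/z rotation angles and the toy coordinates are assumptions, not part of any docking package.

```python
import numpy as np

def rotation_matrix(rx, ry, rz):
    """Rotation matrix from three angles (radians) about the x, y and z axes."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    rot_x = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    rot_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rot_z = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rot_z @ rot_y @ rot_x

def apply_pose(coords, pose):
    """Apply a pose (tx, ty, tz, rx, ry, rz) to an (N, 3) coordinate array.

    The ligand is rotated about its own centroid and then translated,
    so the 6 numbers fully describe a rigid-body placement.
    """
    shift = np.asarray(pose[:3], dtype=float)
    rot = rotation_matrix(*pose[3:])
    center = coords.mean(axis=0)
    return (coords - center) @ rot.T + center + shift

# Hypothetical 3-atom "ligand" (arbitrary units)
ligand = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]])
moved = apply_pose(ligand, (1.0, 0.0, 0.0, 0.0, 0.0, np.pi / 2))
```

A rigid-body search only has to explore these 6 numbers. Every flexible bond adds one more dimension to the search space.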
Ad 1) Search algorithms used (for spatial space)
Systematic docking
- Brute Force
- Fragmentation
- Database
Heuristic docking
- Monte Carlo
- Genetic algorithms
- Tabu search
Simulation docking
- Molecular Dynamics
- Gradient (Energy) Methods
Ad 2) Deciding about the flexibility
“rigid body” docking
- receptor and ligand are considered as 100% rigid
- very fast (6 degrees of freedom only), but inaccurate
“induced fit” docking
- movable backbone or side chains
“flexible ligand”
- only the ligand is considered as flexible, the receptor remains rigid
“full flexibility”
- computationally very expensive
Ad 3) Scoring functions (1/2)
Force Field based scoring function
- energy of the interaction and internal energy of the ligand
- combination of: van der Waals, Lennard-Jones, electrostatic energy, …
- e.g. D-Score, GoldScore, AutoDock, CHARMM, …
empirical scoring functions
- Trying to reproduce experimentally observed docking behavior by means of formulas
- usually the sum of uncorrelated terms
- e.g. LUDI, F-Score, SCORE, X-SCORE, …
Ad 3) Scoring functions (2/2)
Knowledge based scoring function
- trying to deduce rules from experiments
- e.g. DrugScore, PMF, …
Geometrical scoring function
- based on shape complementarity
- e.g. Connolly surface, Soft Belt Scoring
Consensus scoring function
- hybrid versions
- e.g. various Review Papers: [Trost, 2005]
Difference between pose score and rank score
„The pose score is often a rough measure of the fit of a ligand into the active site. The rank score is generally more complex and might attempt to estimate binding energies.“
"relatively small chemical modifications can lead to significant changes in binding."
[Kitchen et al., 2004]
[Sousa, 2006]
Correct result vs incorrect result
… and what about the correctness and reliability?
Currently correct results are more or less restricted to the area where the tools have been calibrated
e.g. for pMHC the area under the ROC is between 0.5 and 0.75 using different substitution and scoring tools [Knapp, 2008]
But
"We have long known that there is nothing in biology which is fundamentally inconsistent or incommensurable with mathematics, chemistry, and physics. Biology long ago rejected vitalism. The only information needed for life is provided by an organism's chemical constituents. It is unlikely in the extreme that living systems cannot be understood in terms of chemistry and physics.“ [Wan, 2008]
Example Autodock
What is AutoDock?
“AutoDock is a suite of automated docking tools. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure. AutoDock actually consists of two main programs: AutoDock performs the docking of the ligand to a set of grids describing the target protein; AutoGrid pre-calculates these grids. In addition to using them for docking, the atomic affinity grids can be visualised. This can help, for example, to guide organic synthetic chemists design better binders.”
url: http://autodock.scripps.edu/
Autodock: sampling of spatial space (1/4)
Simulated Annealing
[Figure: quality of solution plotted over the different solutions; a random start-up position can get stuck in a local minimum instead of reaching the global minimum]
Autodock: sampling of spatial space (2/4)
Simulated annealing (“annealing” means slow cooling) procedure:
Idea: local neighborhood search, but „sometimes“ accepting worse solutions (with a certain probability)
Similar to annealing of crystals in physics
1. Melt a solid body in a heating pot
2. Atoms are almost randomly distributed
3. Slowly cool down
4. At each temperature a thermal equilibrium is reached
5. Atoms will arrange themselves in an energetically favorable position
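Probability of accepting a worse solution: $p = e^{\frac{E_i - E_j}{k_B T}}$, with $E_i$ the energy of the current solution, $E_j > E_i$ the energy of the candidate, $k_B$ the Boltzmann constant and $T$ the temperature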
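The annealing idea can be sketched in a few lines. This is a generic simulated annealing loop on a toy 1D energy function, not AutoDock's implementation: the cooling schedule, the step size, the toy energy and all function names are assumptions, and the Boltzmann constant is folded into the temperature.

```python
import math
import random

def simulated_annealing(energy, neighbor, start, t_start=10.0, t_end=1e-3,
                        cooling=0.95, steps_per_t=50, seed=0):
    """Minimize `energy` by local moves, sometimes accepting worse solutions."""
    rng = random.Random(seed)
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    temperature = t_start
    while temperature > t_end:
        for _ in range(steps_per_t):
            candidate = neighbor(current, rng)
            e_candidate = energy(candidate)
            delta = e_candidate - e_current
            # Improvements are always accepted; worse solutions are accepted
            # with probability exp(-delta / T), which shrinks as T drops.
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, e_current = candidate, e_candidate
                if e_current < e_best:
                    best, e_best = current, e_current
        temperature *= cooling  # slowly cool down
    return best, e_best

def toy_energy(x):
    """Hypothetical energy landscape with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x)

def toy_neighbor(x, rng):
    """Small random step in the local neighborhood."""
    return x + rng.uniform(-0.5, 0.5)

best_x, best_e = simulated_annealing(toy_energy, toy_neighbor, start=4.0)
```

At high temperature almost every move is accepted, so the search can leave local minima. Near the end only improvements survive.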
Autodock: sampling of spatial space (3/4)
Genetic Algorithms
- A set of values is used to define the ligand, receptor and their current states
- Doing it like nature:
1. Creating a random population of solutions
2. Evaluation of fitness
3. Selection of the fittest n solutions
4. Crossover, mutation, …
5. Go to 2 again
Crossover example: P1 = 244678904339965 × P2 = 2366849092127844 -> C1 = 244678909212544
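The five steps can be sketched as a generic genetic algorithm on a list of real-valued genes. This is a plain textbook variant, not the Lamarckian genetic algorithm of AutoDock: the population size, mutation rate, one-point crossover and the toy fitness function are all assumptions.

```python
import random

def genetic_search(fitness, n_genes, pop_size=40, n_keep=10, generations=60,
                   mutation_rate=0.1, seed=0):
    """Minimize `fitness` over gene lists (a lower value counts as fitter)."""
    rng = random.Random(seed)
    # 1. create a random population of solutions
    population = [[rng.uniform(-5.0, 5.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # 2. evaluate the fitness and 3. select the fittest n solutions
        population.sort(key=fitness)
        parents = population[:n_keep]
        # 4. crossover and mutation fill the population up again
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [gene + rng.gauss(0.0, 0.3) if rng.random() < mutation_rate
                     else gene for gene in child]
            children.append(child)
        population = parents + children  # 5. go to 2 again
    return min(population, key=fitness)

def toy_fitness(genes):
    """Hypothetical score: sum of squares, best possible value is 0."""
    return sum(gene * gene for gene in genes)

best = genetic_search(toy_fitness, n_genes=6)
```

In a docking setting the genes would encode the position, orientation and torsion angles of the ligand, and the fitness would be the scoring function.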
Autodock: Flexibility (1/1)
receptor is held rigid
the ligand's bonds have full flexibility according to a rotamer library
the states of the ligand's bonds are represented as genes in the GA
Autodock: Scoring in 1998 (1/1)
12-6 Lennard-Jones potential
Hydrogen bonds, weighted by angle t
Electrostatic forces
Torsion angles
Solvation effects
Autodock2007: in general zero!
Autodock2007: unbound?
3 approaches for the unbound state
Extended
Compact
Bound
Autodock2007
Autodock2007: the formula
The weighting factors W have been calibrated on a set of 188 receptor/ligand complexes with known experimental binding affinities
Coordinates from the protein data bank (www.pdb.org)
Binding data from ligand-protein database (http://lpdb.scripps.edu/)
Autodock2007: AD3 vs AD4
Autodock2007: success rate against experimental data
75 cases: found but other scored better
67 cases: found and scored best
28 cases: not found
=> 84% of all ligands found
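The 84% can be reproduced from the three counts. The sketch assumes that the three categories together make up the whole test set (170 cases); the variable names are made up for illustration.

```python
found_not_best = 75   # found, but another pose scored better
found_and_best = 67   # found and scored best
not_found = 28

total = found_not_best + found_and_best + not_found
found_fraction = (found_not_best + found_and_best) / total
best_fraction = found_and_best / total

print(f"found: {found_fraction:.1%}, found and scored best: {best_fraction:.1%}")
```

So about 83.5% (rounded to 84% on the slide) of the ligands were found, but only about 39% were also ranked first.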
Video Autodock
[published on Autodock Homepage]
Biologists often have concerns about the success of computational techniques. [Jorgensen, 2004] nicely summarizes such a situation:
“’Is there really a case where a drug that’s on the market was designed by a computer?’ When asked this, I invoke the professorial mantra (’All questions are good questions.’), while sensing that the desired answer is ’no’. Then, the inquisitor could go back to the lab with the reassurance that his or her choice to avoid learning about computational chemistry remains wise.”
So what is the role of computers in drug discovery?
Take home messages for the first part
Computational methods can be used to identify potential drugs
They can help to reduce the number of candidates to test or predict a set of possible candidates. However, they cannot predict the one and only working substance in one step
The methods are diverse
There is still much room for improvement of the methods
"The day is coming when theory and computation will guide biology, as it does physics now.“ [Wan, 2008]
Molecular Dynamics (MD)
Introduction
MD is a type of computer simulation
Atoms interact under given laws of physics for a specified time
MD can be seen as an interface between “wet”-lab experiments and theoretical models
Used to analyze the spatial and energetic dynamics of e.g. bio-molecules, materials, …
Usually very demanding in terms of computational power and memory
Calculate forces between all atoms of the system …
… but what does “forces” mean?
A combination of bonded and non-bonded interactions …
n=6, usually n>1000
Bonded interactions
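bond length: $V_b(r_{ij}) = \frac{1}{2} k^b_{ij} \left(r_{ij} - b_{ij}\right)^2$ with $r_{ij} = |\mathbf{r}_i - \mathbf{r}_j|$
bond angle: $V_a(\theta_{ijk}) = \frac{1}{2} k^\theta_{ijk} \left(\theta_{ijk} - \theta^0_{ijk}\right)^2$
torsion: $\phi_{ijkl} = \arccos\left(\frac{(\mathbf{r}_{ij} \times \mathbf{r}_{jk}) \cdot (\mathbf{r}_{jk} \times \mathbf{r}_{kl})}{|\mathbf{r}_{ij} \times \mathbf{r}_{jk}| \, |\mathbf{r}_{jk} \times \mathbf{r}_{kl}|}\right)$, $V_d(\phi_{ijkl}) = \frac{1}{2} k_\phi \left[1 + \cos(n \phi_{ijkl} - \phi_0)\right]$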
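The three bonded terms can be written down directly. The sketch assumes the common textbook forms (harmonic bond, harmonic angle, periodic torsion with a factor 1/2); real force fields differ in details, and all parameter names and the toy coordinates are placeholders.

```python
import numpy as np

def bond_energy(r_i, r_j, k_b, b0):
    """Harmonic bond: penalizes deviation of the distance from b0."""
    r = np.linalg.norm(r_i - r_j)
    return 0.5 * k_b * (r - b0) ** 2

def angle_energy(r_i, r_j, r_k, k_theta, theta0):
    """Harmonic angle i-j-k (radians): penalizes deviation from theta0."""
    v1, v2 = r_i - r_j, r_k - r_j
    cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
    return 0.5 * k_theta * (theta - theta0) ** 2

def dihedral_angle(r_i, r_j, r_k, r_l):
    """Unsigned angle (0..pi) between the planes i-j-k and j-k-l."""
    b1, b2, b3 = r_j - r_i, r_k - r_j, r_l - r_k
    n1, n2 = np.cross(b1, b2), np.cross(b2, b3)
    cos_phi = np.dot(n1, n2) / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return np.arccos(np.clip(cos_phi, -1.0, 1.0))

def torsion_energy(phi, k_phi, n, phi0):
    """Periodic torsion term with multiplicity n and phase phi0."""
    return 0.5 * k_phi * (1.0 + np.cos(n * phi - phi0))

# Hypothetical four-atom chain in the cis and trans arrangement
a = np.array([1.0, 1.0, 0.0])
b = np.array([1.0, 0.0, 0.0])
c = np.array([0.0, 0.0, 0.0])
phi_cis = dihedral_angle(a, b, c, np.array([0.0, 1.0, 0.0]))
phi_trans = dihedral_angle(a, b, c, np.array([0.0, -1.0, 0.0]))
```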
What does the „bond length“ term really mean?
[Shaw et al.]
perfect
too far away
too close
What does the „bond angle“ term really mean?
[Shaw et al.]
perfect
too big
too small
What does the „torsion“ term really mean?
[Shaw et al.]
perfect
tilted
Non bonded interactions
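Coulomb: $V_C(r_{ij}) = \frac{q_i q_j}{4 \pi \varepsilon_0 \varepsilon_r r_{ij}}$
Lennard-Jones: $V_{LJ}(r_{ij}) = 4 \epsilon_{ij} \left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]$
Total: $V_{nb} = \frac{1}{2} \sum_i \sum_{j \neq i} \left[V_C(r_{ij}) + V_{LJ}(r_{ij})\right]$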
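A minimal sketch of the two non-bonded terms and their sum over all atom pairs. It assumes one shared epsilon and sigma for all atoms and an approximate Coulomb prefactor in kJ mol^-1 nm e^-2; real packages use per-pair parameters, cutoffs and neighbor lists.

```python
import numpy as np

COULOMB_CONST = 138.935  # approx. 1/(4 pi eps0) in kJ mol^-1 nm e^-2 (assumed unit system)

def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones: strong repulsion when too close, weak attraction further away."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

def coulomb(r, q_i, q_j, eps_r=1.0):
    """Electrostatic energy of two point charges at distance r."""
    return COULOMB_CONST * q_i * q_j / (eps_r * r)

def nonbonded_energy(coords, charges, epsilon, sigma):
    """Sum over all pairs i < j, which equals 1/2 of the sum over all i != j."""
    total = 0.0
    n_atoms = len(coords)
    for i in range(n_atoms - 1):
        for j in range(i + 1, n_atoms):
            r = np.linalg.norm(coords[i] - coords[j])
            total += lennard_jones(r, epsilon, sigma)
            total += coulomb(r, charges[i], charges[j])
    return total

# Two hypothetical uncharged atoms at the Lennard-Jones minimum distance
r_min = 2.0 ** (1.0 / 6.0) * 0.3
pair = np.array([[0.0, 0.0, 0.0], [r_min, 0.0, 0.0]])
e_pair = nonbonded_energy(pair, charges=[0.0, 0.0], epsilon=1.0, sigma=0.3)
```

The double loop shows why the cost grows roughly with the square of the number of atoms when no cutoff is used.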
What does the „coulomb“ term really mean?
[Shaw et al.]
What does the „Lennard-Jones“ term really mean?
[Shaw et al.]
perfect
too far away
too close
This all together is called a „force field“
… and of course the real implementations are way more complicated. There are several software packages available (e.g. GROMACS, AMBER, CHARMM, Schroedinger, …)
What can we do with this force field?
We divide time into discrete time steps of e.g. 1 fs (= 10^-15 s)
0 fs … -> t = 10 000 000 fs (= 10 ns)
… and calculate the forces for each time step while adjusting the positions
Iterate … and iterate … and iterate … and iterate … and iterate …
[from wikipedia]
Finally we get something like this:
In reality however more like this:
„The equations are solved simultaneously in small time steps. The system is followed for some time, taking care that the temperature and pressure remain at the required values, and the coordinates are written to an output file at regular intervals. The coordinates as a function of time represent a trajectory of the system.“ [Gromacs Manual]
Flow diagram of a MD simulation:
1. Define initial atom positions
2. Calculate forces
3. Move atoms
4. Increment time
5. Stop criterion reached? If not, go back to 2.
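The flow diagram maps onto a short integration loop. The sketch uses the velocity Verlet scheme on a toy harmonic "force field" in reduced units; the integrator choice, the force function and all parameter values are assumptions made for illustration, not what a particular MD package does.

```python
import numpy as np

def harmonic_force(x, k=1.0):
    """Toy force field: each coordinate is pulled back towards 0."""
    return -k * x

def run_md(x0, v0, mass=1.0, dt=0.01, n_steps=1000, force=harmonic_force):
    """Velocity Verlet integration; returns the trajectory, final velocities and time."""
    x = np.array(x0, dtype=float)      # 1. define initial atom positions
    v = np.array(v0, dtype=float)
    f = force(x)                       # 2. calculate forces
    trajectory = [x.copy()]
    t = 0.0
    for _ in range(n_steps):           # 5. stop criterion: fixed number of steps
        v += 0.5 * dt * f / mass       # half-step velocity update
        x += dt * v                    # 3. move atoms
        f = force(x)                   # 2. recalculate forces at the new positions
        v += 0.5 * dt * f / mass       # second half-step velocity update
        t += dt                        # 4. increment time
        trajectory.append(x.copy())    # store the frame
    return np.array(trajectory), v, t

trajectory, v_end, t_end = run_md(x0=[1.0], v0=[0.0])
```

With 1 fs steps, a 10 ns simulation repeats this loop 10 000 000 times, which is where the long runtimes come from.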
Example for MD simulation using Gromacs [Hess et al., 2008]
1. Obtain atom coordinates for the system to be simulated (e.g. PDB format from www.pdb.org) (takes minutes to days, mostly dependent on the human)
2. Validate the pdb file (takes seconds)
3. Create a virtual simulation box around the system (takes seconds)
4. Fill the box with artificial water (takes seconds)
5. Minimize the energy of the system (takes minutes to hours)
6. Warm the system up to room temperature (takes hours to days)
7. Start the real MD simulation (takes days to months)
8. Evaluate results (takes minutes to years(!), dependent on the human)
Example for MD simulation
0 ns 20 ns
Video MD-Simulation shown via VMD
Example for MD simulation
Limitations of MD simulations (1 of 2) (on the basis of Gromacs)
Newton’s equations of motion describe classical mechanics, not quantum mechanics (=> sometimes problems with e.g. hydrogen atoms)
Electrons are in the ground state: they are supposed to adjust their dynamics instantly when the atomic positions change (Born-Oppenheimer approximation)
Force fields are approximate: balance between computational load and accuracy, their parameters can be user-modified
Force fields are pair additive: omission of polarization
Limitations of MD simulations (2 of 2) (on the basis of Gromacs)
Long-range interactions are cut off: only one image of each particle in the periodic boundary conditions is considered => the cutoff cannot exceed half the box size
Boundary conditions are unnatural: in an isolated system a lot of particles would have vacuum as a neighbor; to avoid that, periodic boundary conditions are used => sometimes the system influences itself
Computational costs and runtime (3 months for 20 ns!)
Cumulative errors in numerical integration and limitation in floating point representation
Evaluations of MD-trajectories
Now we have something like that:
… a huge set of individual configurations over time. But what does this agglomeration of single structures tell us?
RMSD
First idea: difference of the single frames (transparent) from starting structure (solid). Calculate the root mean square deviation:
$RMSD = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(r_i^X - r_i^Y\right)^2}$
Where N is the number of atoms, i is the current atom, rX is the target structure and rY is the reference structure.
Be careful if you compare structures with different positions and rotations in space. You will probably need to superimpose (fit) them first.
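A small sketch of the RMSD with and without prior fitting. The superposition uses the Kabsch algorithm (a standard least-squares fit, chosen here as an assumption; analysis tools may fit differently), and the test structure is made up.

```python
import numpy as np

def rmsd(x, y):
    """RMSD between two (N, 3) coordinate arrays, without any fitting."""
    diff = x - y
    return np.sqrt((diff * diff).sum() / len(x))

def superimpose(x, y):
    """Return x rotated and translated onto y (Kabsch least-squares fit)."""
    x_centered = x - x.mean(axis=0)
    y_centered = y - y.mean(axis=0)
    covariance = x_centered.T @ y_centered
    u, _, vt = np.linalg.svd(covariance)
    # Correct for a possible reflection so that a proper rotation is returned
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return x_centered @ rotation.T + y.mean(axis=0)

# Hypothetical reference structure and a rotated + shifted copy of it
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
start = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
frame = start @ rot_z.T + np.array([1.0, 2.0, 3.0])

raw = rmsd(frame, start)                         # large: dominated by the rigid motion
fitted = rmsd(superimpose(frame, start), start)  # close to zero after fitting
```

Without the fit the RMSD mostly measures how far the whole molecule drifted and rotated, not how much its shape changed.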
RMSD cont
The RMSD over time (in this case rY is the first frame)
All frames:
Frame with highest RMSD:
Radius of Gyration
A similar measurement is the radius of gyration. It measures the distance of a region's parts from its center of mass, or in other words how compact a certain region is.
E.g.
The radius of gyration is an interesting property since it can be determined experimentally using “static light scattering” as well as with “small angle neutron-” or “x-ray scattering”. This allows theoretical scientists to check their models against reality.
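A minimal sketch of the radius of gyration, assuming the usual mass-weighted definition (root of the mass-weighted mean squared distance from the center of mass). Equal masses are used if none are given; the function name and the example are made up.

```python
import numpy as np

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of an (N, 3) coordinate array."""
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        masses = np.ones(len(coords))  # assumption: all atoms weigh the same
    masses = np.asarray(masses, dtype=float)
    center = np.average(coords, axis=0, weights=masses)
    sq_dist = ((coords - center) ** 2).sum(axis=1)
    return np.sqrt(np.average(sq_dist, weights=masses))

# Two atoms at distance 1 on either side of the origin
rg = radius_of_gyration([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
```

Evaluated for every frame of a trajectory, a growing value indicates that the region unfolds or expands.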
RMSF
Next idea: fluctuation of a particular amino acid over time. Calculate the “root mean square fluctuation”:
$RMSF_i = \sqrt{\frac{1}{M} \sum_{k=1}^{M} \left(r_i(t_k) - \tilde{r}_i\right)^2}$
Where M is the number of frames taken into account, r_i(t_k) is the position of particle i of complex r at time t_k, and r with tilde is the reference position. This reference can for example be the average over a given time window.
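The RMSF can be sketched per atom for a trajectory stored as an array of frames. The array layout (frames, atoms, 3) and the default reference (the average position over all frames) are assumptions; a per-residue value would additionally average over the atoms of each amino acid.

```python
import numpy as np

def rmsf(trajectory, reference=None):
    """Per-atom RMSF for a trajectory of shape (M frames, N atoms, 3)."""
    trajectory = np.asarray(trajectory, dtype=float)
    if reference is None:
        reference = trajectory.mean(axis=0)  # average structure as reference
    sq_dev = ((trajectory - reference) ** 2).sum(axis=2)  # shape (M, N)
    return np.sqrt(sq_dev.mean(axis=0))                   # shape (N,)

# Hypothetical 2-frame trajectory: atom 0 is fixed, atom 1 jumps between x = 1 and x = -1
frames = [[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
          [[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]]
fluctuation = rmsf(frames)
```

The frames should already be superimposed, otherwise the overall motion of the molecule shows up as fluctuation.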
RMSF cont
SASA
How much of a certain area (e.g. an amino acid or a region) is exposed to the solvent? Calculate the solvent accessible surface area (SASA)
solvent
protein
(possible) target
SASA
Methodology to calculate the SASA:
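One common numerical approach, not necessarily the one pictured on the original slide, is the Shrake-Rupley method: put test points on a sphere of radius (atom radius + probe radius) around each atom and count the points that are not inside any neighboring sphere. The sketch assumes nm units, a 0.14 nm water probe and made-up atom radii.

```python
import numpy as np

def sphere_points(n):
    """n roughly evenly spaced points on the unit sphere (golden-spiral layout)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    r = np.sqrt(1.0 - z * z)
    phi = np.pi * (1.0 + 5.0 ** 0.5) * i
    return np.column_stack((r * np.cos(phi), r * np.sin(phi), z))

def sasa(coords, radii, probe=0.14, n_points=500):
    """Per-atom solvent accessible surface area (Shrake-Rupley style)."""
    coords = np.asarray(coords, dtype=float)
    extended = np.asarray(radii, dtype=float) + probe  # atom radius + probe radius
    unit = sphere_points(n_points)
    areas = np.zeros(len(coords))
    for i in range(len(coords)):
        points = coords[i] + extended[i] * unit
        exposed = np.ones(n_points, dtype=bool)
        for j in range(len(coords)):
            if j != i:
                # A point is buried if it lies inside the extended sphere of atom j
                exposed &= np.linalg.norm(points - coords[j], axis=1) > extended[j]
        areas[i] = 4.0 * np.pi * extended[i] ** 2 * exposed.mean()
    return areas

single = sasa([[0.0, 0.0, 0.0]], [0.17])                  # isolated atom: full sphere
pair = sasa([[0.0, 0.0, 0.0], [0.3, 0.0, 0.0]], [0.17, 0.17])  # two overlapping atoms
```

More test points give a smoother estimate at higher cost. The double loop over atoms is only suitable for small examples.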
Take home messages for the second part
MD is a computer simulation of “real” atom-atom interactions
MD is very time and resource consuming
The output trajectories are huge and there are various ways to analyze them
There are still certain limitations
Tutorial on PDB / jmol
Introduction TCRpMHC interaction on white board
www.pdb.org => 1mi5
right click => „console“
select *
cartoon off
select *:C
wireframe 100
Opinions, comments and suggestions?
Further literature
Docking:
Sousa SF, Fernandes PA, Ramos MJ. Protein-ligand docking: current status and future challenges. Proteins 2006; 65:15-26.
Huey R, Morris GM, Olson AJ, Goodsell DS. A semiempirical free energy force field with charge-based desolvation. J Comput Chem 2007; 28:1145-1152.
Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, Belew RK, Olson AJ. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J Comput Chem 1998; 19:1639-1662.
Kitchen DB, Decornez H, Furr JR, Bajorath J. Docking and scoring in virtual screening for drug discovery: methods and applications. Nat Rev Drug Discov 2004; 3:935-949.
Molecular Dynamics simulations:
Dodson GG, Lane DP, Verma CS. Molecular simulations of protein dynamics: new windows on mechanisms in biology. EMBO Rep 2008; 9:144-150.
Karplus M, Kuriyan J. Molecular dynamics and protein function. Proc Natl Acad Sci U S A 2005; 102:6679-6685.
Hess B, Kutzner C, van der Spoel D, Lindahl E. GROMACS 4: algorithms for highly efficient, load-balanced, and scalable molecular simulation. J Chem Theory Comput 2008.