Accelerating Virtual High-Throughput Ligand Docking: Screening One Million Compounds Using a Petascale Supercomputer
Sally R. Ellingson, PhD Candidate
Department of Genome Science and Technology, University of Tennessee
Center for Molecular Biophysics, UT/ORNL
Advisor: Dr. Jerome Baudry
2012 Emerging Computational Methods for the Life Sciences Workshop (In Conjunction with HPDC12 Delft, Netherlands)
Outline
• What is virtual molecular docking?
• What is the importance of a virtual high-throughput screening?
• Autodock4 and Autodock4.lga.MPI
▫ Implementation details
▫ Case study: million compound screen
• What is the importance of multi-protein docking?
▫ Limitations with current screening software
▫ Future opportunities using Autodock Vina
What is virtual molecular docking?
• Predicts conformation of a protein-ligand complex
• Predicts binding affinity of the ligand to the protein
Diller, D. J. and Merz, K. M. (2001), High throughput docking for library design and library prioritization. Proteins, 43: 113–124.
(+) Reproduce correct bound conformation
(+) Assign better scores to high-affinity ligands than to decoys (enrichment)
(-) Generate scores that correlate with measured binding affinities
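The enrichment mentioned above can be quantified in several ways. A minimal sketch, assuming the commonly used enrichment factor (fraction of actives in the top-ranked subset divided by the fraction of actives in the whole library) and assuming that lower docking scores are better; the function name and the example data are hypothetical and not taken from the talk:

```python
def enrichment_factor(scores, is_active, top_fraction=0.1):
    """Enrichment factor of a ranked screen.

    scores       : docking scores, one per compound (assumption: lower is better)
    is_active    : booleans marking known high-affinity ligands (the rest are decoys)
    top_fraction : fraction of the ranked library that is inspected
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    n_total = len(ranked)
    n_top = max(1, int(round(n_total * top_fraction)))
    actives_total = sum(1 for _, active in ranked if active)
    if actives_total == 0:
        raise ValueError("at least one active compound is required")
    actives_top = sum(1 for _, active in ranked[:n_top] if active)
    # Ratio of the hit rate in the top subset to the hit rate in the whole library.
    return (actives_top / n_top) / (actives_total / n_total)


# Hypothetical example: 10 compounds, the 2 actives receive the best scores.
example_scores = [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1]
example_actives = [True, True] + [False] * 8
example_ef = enrichment_factor(example_scores, example_actives, top_fraction=0.2)
```

A value above 1 means the scoring ranks actives ahead of decoys more often than random selection would.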
Why is virtual docking important in novel drug discovery?
• Many medications act by binding and inhibiting a specific target
• Early-stage drug discovery consists of identifying ligands that bind to specific proteins with high affinity and retain favorable pharmacological properties.
http://www.chemistry-blog.com/2012/01/04/tedtalk-medicine-for-the-99-hes-about-99-wrong/
What is the importance of a virtual high-throughput screening?
(A) Sally R. Ellingson and Jerome Baudry. High-Throughput Virtual Molecular Docking: Hadoop Implementation of AutoDock4 on a Private Cloud. In Proceedings of the second international workshop on Emerging computational methods for the life sciences (ECMLS '11). ACM, New York, NY, USA, 33-38. DOI=10.1145/1996023.1996028 http://doi.acm.org/10.1145/1996023.1996028.
(B) Sally R. Ellingson, Sivanesan Dakshanamurthy, Milton Brown, Jeremy C. Smith, and Jerome Baudry. Accelerating Virtual High-Throughput Ligand Docking: Screening One Million Compounds Using a Petascale Supercomputer. Proceedings of the third international workshop on Emerging computational methods for the life sciences (ECMLS '12) (accepted)
Why is high-throughput virtual screening important in drug discovery?
http://www.chemistry-blog.com/2012/01/04/tedtalk-medicine-for-the-99-hes-about-99-wrong/
Virtual screening:
• Is faster and more cost-efficient
• Allows a larger search space of chemical compounds
• Creates a wider, shorter funnel
Autodock4 http://autodock.scripps.edu/
Free, open source docking software developed at The Scripps Research Institute
Conformational Search using Lamarckian Genetic Algorithm
Morris, G. M., Goodsell, D. S., Halliday, R. S., Huey, R., Hart, W. E., Belew, R. K. and Olson, A. J. (1998), Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J. Comput. Chem., 19: 1639–1662.
Scoring of generated conformations
Huey, R., Morris, G. M., Olson, A. J. and Goodsell, D. S. (2007), A semiempirical free energy force field with charge-based desolvation. J. Comput. Chem., 28: 1145–1152.
Virtual Docking Process
Inputs: Precalculated Affinity Grids, Receptor PDBQT, Ligand PDBQT, Docking Parameter File
Engine: AutoDock
Output: Docking Log File
This process must be done for every ligand in a high-throughput screening
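The per-ligand repetition can be sketched as a loop that prepares one AutoDock4 invocation per ligand. This is only an illustration, assuming one docking parameter file (.dpf) per ligand already exists and using the standard `autodock4 -p <dpf> -l <dlg>` command line; the directory layout and ligand names are hypothetical, and nothing is executed here:

```python
from pathlib import Path


def docking_commands(ligand_names, work_dir="screen"):
    """Return one autodock4 argument list per ligand (commands are built, not run).

    Assumption: for every ligand name there is a matching <name>.dpf in work_dir
    that references the ligand PDBQT and the precalculated affinity grids.
    """
    work = Path(work_dir)
    commands = []
    for name in ligand_names:
        dpf = work / f"{name}.dpf"   # docking parameter file (input)
        dlg = work / f"{name}.dlg"   # docking log file (output)
        commands.append(["autodock4", "-p", str(dpf), "-l", str(dlg)])
    return commands


# Hypothetical ligand names; a real screen would list its own library.
example_commands = docking_commands(["lig_0001", "lig_0002", "lig_0003"])
```

Each argument list could then be handed to `subprocess.run`, which is where a serial loop becomes the bottleneck for large libraries.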
Autodock4.lga.MPI
Task-parallel message passing interface implementation of Autodock4 for docking of very large databases of compounds using high-performance super-computers. B. Collignon, R. Schulz, J.C. Smith and J. Baudry J. Comput. Chem. (2011) 32 (6): 1202–1209
Main Improvements for Virtual Screening
• Separation of parameters associated with the screening and individual ligands
• Concatenated binary grid files (HDF5)
• Reduced output size
A high-throughput virtual screening tool
Goal: Develop a virtual screening tool that runs on high-performance supercomputers (MPI)
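Task parallelism means that each ligand docking is an independent task and that tasks are spread over MPI ranks. The slides do not say how Autodock4.lga.MPI assigns tasks, so the following is only a sketch of one simple possibility, a static cyclic distribution, written in plain Python without an MPI library; `rank` and `size` stand for the values an MPI program would obtain from its communicator:

```python
def tasks_for_rank(n_tasks, rank, size):
    """Indices of the docking tasks handled by one rank under a cyclic distribution.

    n_tasks : total number of ligands to dock
    rank    : index of this process (0 <= rank < size)
    size    : total number of processes
    """
    if size <= 0 or not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    # Rank r takes tasks r, r + size, r + 2*size, ...
    return list(range(rank, n_tasks, size))


# Hypothetical example: 10 ligands spread over 4 processes.
example_assignment = [tasks_for_rank(10, r, 4) for r in range(4)]
```

A static split like this is easy to reason about, but docking times vary with ligand size, so a dynamic scheme in which idle ranks request the next task may balance load better.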
using 196 CPUs
maps.h5: 19MB-53MB → 9.8MB-28MB
Postdocking (analysis)
TUTORIAL http://www.bio.utk.edu/baudrylab/autodockmpi.htm
Predocking (file preparation)
Million Compound Screening on a petascale supercomputer
Workflow controlled by Python scripts
Runs on Lens (analysis cluster - Jaguar)
65k processors
Figure: Million Compound Library, # of compounds (0 to 180000) by Rotatable Bonds (Degrees of Freedom, 1 to 33)
What is the importance of multi-protein docking?
http://www.chemistry-blog.com/2012/01/04/tedtalk-medicine-for-the-99-hes-about-99-wrong/
Multi-protein docking
Many proteins of important function
Drug Candidate
Also for many conformations of the same protein – to model receptor flexibility
Multi-protein docking:
• Determine toxicity and side effects
• Predict failures earlier in the process
• Increase overall success rate
Multi-protein docking and limitations with current screening software
Autodock4.lga.MPI
• Separate MPI jobs for each receptor
• Binary grid files for each receptor
What is needed?
• A tool that allows an increase in the number of receptors used in a screening with a minimal increase in the amount of I/O per docking task
Receptor PDBs + Ligand PDBs → Multi-protein screening (all combinations)
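Screening all combinations means the number of docking tasks is the product of the receptor count and the ligand count. A minimal sketch of that enumeration, with hypothetical file names chosen only for illustration:

```python
from itertools import product


def screening_tasks(receptors, ligands):
    """Return every (receptor, ligand) pair as one docking task."""
    return list(product(receptors, ligands))


# Hypothetical inputs: 2 receptors and 3 ligands give 2 * 3 = 6 tasks.
example_receptors = ["receptor_A.pdb", "receptor_B.pdb"]
example_ligands = ["lig_1.pdb", "lig_2.pdb", "lig_3.pdb"]
example_tasks = screening_tasks(example_receptors, example_ligands)
```

Because the task count grows multiplicatively, any per-task file access is multiplied as well, which is why the I/O per docking task matters once several receptors are screened.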
Autodock Vina: Potential as docking engine for multi-protein screening
• Scoring function: machine-learning approach
• Conformational search: iterated local search global optimizer; each step consists of a mutation and a local optimization, and is accepted according to the Metropolis criterion
Trott, O. and Olson, A. J. (2010), AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem., 31: 455–461. doi: 10.1002/jcc.21334
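The mutation, local optimization, and Metropolis acceptance loop can be illustrated on a toy problem. This is a sketch of the general iterated local search idea only, not AutoDock Vina's code: the one-dimensional `toy_energy` function, the step-halving local optimizer, and all parameter values are assumptions made for illustration.

```python
import math
import random


def toy_energy(x):
    """Hypothetical 1-D energy landscape with several local minima."""
    return x * x + 3.0 * math.sin(3.0 * x)


def local_optimize(f, x, step=0.1, iters=50):
    """Simple step-halving descent, a stand-in for a real local optimizer."""
    fx = f(x)
    for _ in range(iters):
        improved = False
        for cand in (x - step, x + step):
            fc = f(cand)
            if fc < fx:
                x, fx = cand, fc
                improved = True
        if not improved:
            step *= 0.5  # no better neighbor at this scale: refine the step
    return x, fx


def iterated_local_search(f, x0, n_steps=200, mutation_size=2.0,
                          temperature=1.0, seed=0):
    """Mutate, locally optimize, then accept or reject with the Metropolis rule."""
    rng = random.Random(seed)
    x, fx = local_optimize(f, x0)
    best_x, best_f = x, fx
    for _ in range(n_steps):
        cand = x + rng.uniform(-mutation_size, mutation_size)  # mutation
        cand, fc = local_optimize(f, cand)                     # local optimization
        # Metropolis criterion: always accept improvements, sometimes accept worse.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temperature):
            x, fx = cand, fc
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f
```

Accepting occasional uphill moves lets the search leave a local minimum, while the local optimization keeps each accepted state at the bottom of its basin.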
Figure: Average time in minutes per complex on two quad-core processors, Autodock4 vs. Autodock Vina
• Calculates grid maps efficiently during docking and does not store them on disk
• Result clustering and ranking details hidden (reduced output)
• Limitations removed (i.e. maximum # of rotatable bonds)
• Already multi-threaded (each docking potentially more efficient)
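Because Vina needs no precomputed map files and is multi-threaded, a docking task reduces to one command per receptor-ligand pair. A sketch using Vina's standard command-line options (`--receptor`, `--ligand`, `--center_*`, `--size_*`, `--cpu`, `--out`); the file names and search-box values are hypothetical, and the command is only built, not run:

```python
def vina_command(receptor, ligand, center, size, cpu=8, out=None):
    """Build the argument list for one AutoDock Vina docking.

    receptor, ligand : PDBQT file paths
    center, size     : (x, y, z) tuples describing the search box in Angstroms
    cpu              : number of threads Vina may use
    out              : optional path for the output poses
    """
    cx, cy, cz = center
    sx, sy, sz = size
    cmd = [
        "vina",
        "--receptor", receptor,
        "--ligand", ligand,
        "--center_x", str(cx), "--center_y", str(cy), "--center_z", str(cz),
        "--size_x", str(sx), "--size_y", str(sy), "--size_z", str(sz),
        "--cpu", str(cpu),
    ]
    if out is not None:
        cmd += ["--out", out]
    return cmd


# Hypothetical example: one receptor-ligand pair with an assumed search box.
example_vina = vina_command("receptor_A.pdbqt", "lig_1.pdbqt",
                            center=(0.0, 0.0, 0.0), size=(20, 20, 20),
                            cpu=8, out="receptor_A_lig_1_out.pdbqt")
```

The only files read per task are the receptor and the ligand, which fits the requirement of a minimal I/O increase when receptors are added.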
Summary
• High-throughput molecular docking is an important tool to increase the cost and time efficiency of drug discovery
• Current screening tool, Autodock4.lga.MPI, allows for a million compounds to be screened in less than 24 hours
• Future development will focus on using multiple receptors
Acknowledgements
• Genome Science and Technology, UT
• Center for Molecular Biophysics, UT/ORNL
▫ Jeremy C. Smith
• SCALE-IT, NSF/IGERT
▫ Scalable Computing and Leading Edge Innovative Technologies
• National Center for Computational Sciences
• Georgetown University
• NIH-CTSA
• ECMLS12 workshop organizers