www.egi.eu, EGI-InSPIRE RI-261323
Cheminformatics platform for drug discovery application
Hsi-Kai Wang, Academia Sinica Grid Computing
EGI User Forum, 13 April 2011
• Introduction to drug discovery
• Computing requirements of high-throughput virtual screening
• Cheminformatics case study
Computational chemistry / molecular modeling: useful across the pipeline, but with very different techniques
Aim for success, but if not: fail early, fail cheap
Ref: Makus R. and Ralph W., Nature Rev. Drug Discov. (2003), 2, 123-131
Drug discovery development
Strategy in drug discovery
Receptor (3D structure) unknown
• Ligand unknown: Combichem, HTS
• Ligand known: Pharmacophore, Similarity, QSAR
Receptor (3D structure) known
• Ligand unknown: Receptor-based searching, De novo design
• Ligand known: Structure-based drug design, Receptor-ligand interaction, Docking
Virtual Screening
• What is grid?
  – Many definitions exist in the literature
  – Foster and Kesselman, 1998: “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational facilities.”
• Grid can provide
  – Large-scale and on-demand resources
  – Computing resources (computing grids)
  – Storage resources (data grids)
Drug discovery on Grid (1/2)
chemical compounds
Receptor structures
Molecular docking … takes years
Data challenge on Grid … can be done in weeks
In vitro screening of best ~50 hits
Hits sorting and refining
Problem
– Millions of compounds and drug molecules are presently available for screening
– But developing an efficient laboratory assay for such work is time-consuming and very expensive
Solution
– Grids offer high-speed computing and the capability to manage huge data sets
– Possible variant targets can be studied quickly with present modelling applications
– This will help medicinal chemists respond to major, immediate threats
Drug discovery on Grid (2/2)
GVSS, GAP Virtual Screening Service
GAP Service Architecture
• A lightweight framework for parallel scientific applications in the master-worker model
• The framework takes care of all synchronization, communication, and workflow management details on behalf of the application
User Application Interface
GRID environments
DIANE, DIstributed ANalysis Environment
• Each horizontal line segment = one task = one docking
• Unhealthy workers are removed from the worker list
• Failed tasks are rescheduled to healthy workers
the “bad” worker removed
good load balance
The profile of a DIANE job
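The scheduling behaviour described on this slide can be sketched in a few lines. This is a minimal, single-process simulation under stated assumptions, not DIANE's actual API: `run_master`, `demo_run_task`, and the worker names are hypothetical, and a raised exception stands in for an unhealthy worker.

```python
from collections import deque


def run_master(tasks, workers, run_task):
    """Hand tasks to workers; drop a worker on failure and requeue its task."""
    pending = deque(tasks)
    healthy = list(workers)
    results = {}
    removed = []
    while pending:
        if not healthy:
            raise RuntimeError("no healthy workers left")
        # One scheduling round: each healthy worker pulls at most one task.
        for worker in list(healthy):
            if not pending:
                break
            task = pending.popleft()
            try:
                results[task] = run_task(worker, task)
            except Exception:
                healthy.remove(worker)   # the "bad" worker is removed
                removed.append(worker)
                pending.append(task)     # the failed task is rescheduled
    return results, removed


def demo_run_task(worker, task):
    # Hypothetical stand-in for one docking; worker "w2" always fails.
    if worker == "w2":
        raise RuntimeError("worker unhealthy")
    return f"{task} docked on {worker}"


results, removed = run_master(
    tasks=[f"ligand-{i}" for i in range(6)],
    workers=["w1", "w2", "w3"],
    run_task=demo_run_task,
)
print(removed)
print(results["ligand-1"])
```

Because workers pull tasks one at a time, a task that failed on a bad worker simply rejoins the queue and is completed by a healthy one.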
• 280 DIANE worker agents were submitted as LCG jobs
• 200 jobs (~71%) were healthy
  – ~16% failures related to middleware errors
  – ~12% failures related to application errors
DIANE utilizes ~95% of the healthy resources
stable throughput
Efficiency and throughput of DIANE
GVSS application: dengue virus
Ref: Hsin-Yen C. et al, J Grid Computing (2010), 8, 529-541
Ref: Clark G.G., "Dengue: An emerging arboviral disease", 2006
Worldwide dengue distribution
Areas infested with Aedes aegypti
Areas with Ae. aegypti and dengue epidemics
http://en.wikipedia.org/wiki/Aedes
Kuhn, R.J. et al. Cell 108, 717-725; 2002
Dengue virus
Ref: PDB: 2vbc (2008) J.Virol. 82: 173
H51, D75, S135
Dengue NS3 protease
Dengue Fever Data Challenge / resources & 1st result
Total number of completed docking jobs: 300,000
Estimated needed computing power: 4,167 CPU*days
Duration of the experiment: 60 days
Cumulative computing results: 42.5 GB
Total computing resources in EUAsia VO: 268 cores
Number of used Computing Elements: 6
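The figures in this table can be cross-checked with a little arithmetic. The sketch below assumes that the 4,167 CPU*days estimate covers all 300,000 docking jobs and that all 268 cores could be used at once; the variable names are illustrative only.

```python
dockings = 300_000
cpu_days = 4_167
cores = 268
duration_days = 60

# Average CPU time implied for one docking job.
minutes_per_docking = cpu_days * 24 * 60 / dockings

# Shortest possible wall-clock time if every core were busy all the time.
ideal_wall_days = cpu_days / cores

# Average number of cores kept busy over the actual 60-day experiment.
average_busy_cores = cpu_days / duration_days

print(f"{minutes_per_docking:.1f} CPU-minutes per docking")
print(f"{ideal_wall_days:.1f} days at full use of {cores} cores")
print(f"{average_busy_cores:.1f} cores busy on average")
```

Under these assumptions the estimate corresponds to about 20 CPU-minutes per docking, and the 60-day duration implies that on average roughly a quarter of the 268 cores were busy.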
• Accumulated computing resources in EUAsia VO: 268 CPU cores
  – 100 ASGC (TW), 2 TH, 4 VN, 18 MIMOS (MY), 80 UPM (MY), 64 CESNET (CZ)
  – lcg-infosites --vo euasia ce
• Registered VQS accounts:
  – 6 users (TW)
  – 17 users (PH; 15 in AdMU, 2 in ASTI)
  – 2 users (TH; 1 in NECTEC, 1 in HAII)
  – 1 user (MY, UPM)
  – 1 user (ID, ITB)
  – 2 users (VN, IAMI)
  – 1 user (FR, HealthGrid)
Joint Computing Resources & Users
Integration of service grids (SG) & desktop grids (DG) by EDGeS
Scenario 1: DG to SG via bridge
Scenario 2: SG to DG via bridge
Scenario 3: SG/DG resources but not through EDGeS bridges
Job Manager
Task Manager
Web UI Service Architecture
Prototype Web UI Screenshot
Simulation of drug discovery workflow
Inputs: ligand, protein
1. Preparing ligand & protein
2. Docking: generating conformation
3. Scoring: analyzing & ranking data
Protein Database
Ref: PDB, http://www.rcsb.org/pdb/home/home.do
PDBbind, http://sw16.im.med.umich.edu/databases/pdbbind/index.jsp
• Genetic algorithm
  – A search heuristic that mimics the process of natural evolution. It generates solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover.
  – AutoDock, GOLD…
• Molecular dynamics
  – Finds poses using force fields. Conformation generation usually includes simulated annealing to locate the global optimum in a large search space.
  – AMBER, CHARMM…
• Shape complementarity
  – Describes the molecules by features including solvent-accessible surface area, geometric constraints, H-bonds, and hydrophobic/hydrophilic interactions between all atoms in the complex.
  – DOCK, FRED…
General classes of docking algorithm
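To make the genetic-algorithm class concrete, here is a minimal sketch of selection, crossover, and mutation on a toy problem. It is not the algorithm as implemented in AutoDock or GOLD: `toy_energy` is a hypothetical stand-in for a docking score, a "pose" is just three numbers, and all parameter values are arbitrary assumptions.

```python
import random


def toy_energy(pose):
    # Hypothetical score: lower is better, with its minimum at (1, -2, 0.5).
    target = (1.0, -2.0, 0.5)
    return sum((p - t) ** 2 for p, t in zip(pose, target))


def genetic_search(energy, dims=3, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-5, 5) for _ in range(dims)]
                  for _ in range(pop_size)]
    history = []  # best energy seen at the start of each generation
    for _ in range(generations):
        population.sort(key=energy)
        history.append(energy(population[0]))
        parents = population[: pop_size // 2]  # selection: keep the fitter half
        children = [population[0][:]]          # elitism: best pose carried over
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, dims)        # crossover: splice two parents
            child = mother[:cut] + father[cut:]
            gene = rng.randrange(dims)          # mutation: perturb one value
            child[gene] += rng.gauss(0, 0.3)
            children.append(child)
        population = children
    best = min(population, key=energy)
    return best, energy(best), history


best_pose, best_energy, history = genetic_search(toy_energy)
print(best_pose, best_energy)
```

Keeping the best pose unchanged in each generation means the best score can never get worse, which makes the search easy to reason about even though the result is still only a heuristic optimum.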
• Force field
  – Affinities are estimated from intermolecular van der Waals, electrostatic, and other interactions between all atoms of the two molecules in the complex.
  – AMBER…
• Empirical
  – Counts the interactions and assigns a score based on the number of occurrences, for example H-bond, ionic, and hydrophobic/hydrophilic interactions.
  – LUDI, X-Score…
• Knowledge-based
  – Observes known protein/ligand structures and favors interactions and geometries that are seen often.
  – DrugScore, PMF…
General classes of scoring function
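As an illustration of the force-field class, the sketch below sums a Lennard-Jones 12-6 term (van der Waals) and a Coulomb term (electrostatics) over all ligand-protein atom pairs. It is a simplified assumption, not the AMBER functional form or parameter set: a single `epsilon` and `sigma` are used for every pair, and atoms are hypothetical `(x, y, z, charge)` tuples.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2), a value commonly used
# in molecular mechanics.
COULOMB_CONSTANT = 332.06


def pair_energy(r, epsilon, sigma, q1, q2, dielectric=1.0):
    """Lennard-Jones 12-6 plus Coulomb energy for one atom pair at distance r."""
    lj = 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = COULOMB_CONSTANT * q1 * q2 / (dielectric * r)
    return lj + coulomb


def interaction_energy(ligand_atoms, protein_atoms, epsilon=0.15, sigma=3.4):
    """Sum pair energies over every ligand atom / protein atom combination."""
    total = 0.0
    for lx, ly, lz, lq in ligand_atoms:
        for px, py, pz, pq in protein_atoms:
            r = math.dist((lx, ly, lz), (px, py, pz))
            total += pair_energy(r, epsilon, sigma, lq, pq)
    return total


# Illustrative coordinates (Angstrom) and partial charges, not real data.
ligand = [(0.0, 0.0, 0.0, -0.4), (1.5, 0.0, 0.0, 0.1)]
protein = [(4.0, 0.0, 0.0, 0.3), (4.0, 3.0, 0.0, -0.2)]
print(interaction_energy(ligand, protein))
```

A more negative total indicates a more favourable interaction under this model; real force fields use per-atom-type parameters and additional terms.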
Ref: AutoDock, http://autodock.scripps.edu/
X-SCORE, http://sw16.im.med.umich.edu/software/xtool/
Tools of docking and scoring
Simulated Condition
• Ligand and protein
  – PDBbind database v2010 (3429 complexes)
• Docking
  – software: AutoDock
  – computing time: 30 ~ 50 min per docking
• Rescoring
  – software: X-Score
  – computing time: 1 ~ 2 min per scoring
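The total cost of this simulation can be estimated from the per-task times above. The sketch assumes exactly one docking and one rescoring per complex, which the slide does not state explicitly.

```python
complexes = 3429
docking_minutes = (30, 50)    # per docking, from the slide
rescoring_minutes = (1, 2)    # per scoring, from the slide

minutes_per_day = 24 * 60
low_cpu_days = complexes * (docking_minutes[0] + rescoring_minutes[0]) / minutes_per_day
high_cpu_days = complexes * (docking_minutes[1] + rescoring_minutes[1]) / minutes_per_day

print(f"{low_cpu_days:.0f} to {high_cpu_days:.0f} CPU-days")
```

Under that assumption the redocking run needs on the order of 74 to 124 CPU-days, dominated by the docking step.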
Free energy in AutoDock, X-Score
Free energy R² by ligand molecular weight
Free energy R² by protein enzyme type
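The slides do not say how R² was computed. A common choice, assumed here, is the squared Pearson correlation between predicted and experimental binding free energies, evaluated separately for each class (a molecular-weight bin or an enzyme type). The function names and record layout below are hypothetical.

```python
def r_squared(predicted, observed):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov * cov / (var_p * var_o)


def r_squared_by_group(records):
    """records: iterable of (group, predicted, observed) tuples."""
    groups = {}
    for group, predicted, observed in records:
        pairs = groups.setdefault(group, ([], []))
        pairs[0].append(predicted)
        pairs[1].append(observed)
    return {g: r_squared(p, o) for g, (p, o) in groups.items()}
```

Each group needs at least two complexes with differing values, otherwise the variance in the denominator is zero.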
RMSD in AutoDock, X-Score
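For reference, RMSD between a docked pose and a reference pose (for example the crystal structure) can be computed as below. This minimal sketch assumes both poses list the same atoms in the same order and applies no superposition; the slides do not specify which variant was used.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Illustrative coordinates in Angstrom, not real data.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(x + 1.0, y, z) for x, y, z in reference]
print(rmsd(reference, docked))
```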
Future work
• Finish implementing the web-based Virtual Screening Service on the EDGeS infrastructure.
• Complete the 691 proteins x 691 ligands docking tasks and the data analysis.
• Classify the other proteins by enzyme code.
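The size of the planned cross-docking run follows directly from the numbers above. The sketch assumes one docking task per protein-ligand pair and reuses the 30 ~ 50 min per docking quoted for the simulated condition; whether those timings hold for this run is an assumption.

```python
proteins = 691
ligands = 691
docking_minutes = (30, 50)  # assumed to carry over from the simulated condition

tasks = proteins * ligands
minutes_per_day = 24 * 60
low_cpu_days = tasks * docking_minutes[0] / minutes_per_day
high_cpu_days = tasks * docking_minutes[1] / minutes_per_day

print(f"{tasks:,} docking tasks")
print(f"{low_cpu_days:,.0f} to {high_cpu_days:,.0f} CPU-days")
```

Under these assumptions the run is several times larger than the dengue data challenge, which is the motivation for drawing on additional grid resources.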
Thank you for your attention