Towards a modular Web-based Workflow environment for enabling large scale Virtual Screening in Cancer Chemoprevention Research 19 June 2012 COST Conference Personalised Medicine: Better Healthcare for the Future Christos Kannas Computer Science Dept., University of Cyprus

20120615_Granatum_COST_v2


DESCRIPTION

Presentation about the Virtual Screening Scientific Workflow for Cancer Chemoprevention developed within the GRANATUM project.


Page 1: 20120615_Granatum_COST_v2

Towards a modular Web-based Workflow environment for enabling large scale Virtual Screening in Cancer Chemoprevention Research

19 June 2012

COST Conference
Personalised Medicine: Better Healthcare for the Future

Christos Kannas
Computer Science Dept., University of Cyprus

Page 2: 20120615_Granatum_COST_v2


Outline

• About the Project
• Overview of the Project
• Objectives
• State of the Art Review
• Implementation
  • Virtual Screening Process
  • Predictive Model Preparation
  • In-Silico Tools and Methods
• Early in silico experiments
• Concluding Remarks

June 19, 2012

Page 3: 20120615_Granatum_COST_v2

About the Project

• The vision of the GRANATUM project is to:
  • bridge the information, knowledge and collaboration gap among biomedical researchers in Europe (at least),
  • ensure that the biomedical scientific community has homogenized, integrated access to the globally available information and data resources needed to perform complex cancer chemoprevention experiments and conduct studies on large-scale datasets.

• The GRANATUM project is partially funded by the European Commission under the Seventh Framework Programme in the area of Virtual Physiological Human (ICT-2009.5.3).

• http://www.granatum.org/


Page 4: 20120615_Granatum_COST_v2


Overview of the Project


Page 5: 20120615_Granatum_COST_v2


Objectives

• Design a scientific algorithmic workflow for the development of in silico chemoprevention models.

• Implement workflow(s) for the selection of promising chemopreventive agents.

• Connect the custom in-silico models for compound selection to other datasets and evidence included in the Linked Biomedical Data Space.

• Test the performance of custom in-silico models.


Page 6: 20120615_Granatum_COST_v2

State of the Art

• Significant overlap of chemoprevention and traditional drug discovery process (DDP).
  • Special case with additional constraints, e.g. no toxicity
• In Silico Models and Tools: heavily borrowing from DDP.

SOA Review

• Online resources: Databases (e.g. ChEMBL), journals, reports, …
• Infrastructure tools: Chemoinformatics toolkits (e.g. RDKit and CDK): compound representation, property and descriptor calculation, substructure mining, …
• Advanced comp. chem.: Biological property predictive models, compound 3D conformations, docking tools, …
• Machine learning: Classification and regression methods, available open source libraries
• Scientific workflow systems: KNIME, Taverna, Galaxy, …

Page 7: 20120615_Granatum_COST_v2

Virtual Screening Process Template

• Input
  • Linked Biomedical Space
  • Files
• Preprocessing
  • File format transformations
  • Standardization
  • Descriptor Calculation
  • Fragmentation
• Processing
  • Attribute filter
  • Similarity search
  • Substructure Search
  • Docking
  • Predictive Models
• Postprocessing
  • Cleaning
  • Formatting
• Output
  • Storage
  • Visualization
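The five-stage template can be read as a chain of interchangeable steps. The following is a minimal sketch of that idea in plain Python, not the GRANATUM implementation: the record layout, the stage functions (`preprocess`, `attribute_filter`, `postprocess`) and the compound values are all hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]
Stage = Callable[[List[Record]], List[Record]]


def run_pipeline(records: Iterable[Record], stages: List[Stage]) -> List[Record]:
    """Apply each stage in order; every stage maps a record list to a record list."""
    current = list(records)
    for stage in stages:
        current = stage(current)
    return current


# Hypothetical stand-ins for steps of the template.
def preprocess(records: List[Record]) -> List[Record]:
    # Standardization stand-in: strip whitespace from the structure string.
    return [{**r, "smiles": r["smiles"].strip()} for r in records]


def attribute_filter(records: List[Record]) -> List[Record]:
    # Processing stand-in: keep records at or below a molecular-weight cut-off.
    return [r for r in records if r["mw"] <= 500]


def postprocess(records: List[Record]) -> List[Record]:
    # Cleaning/formatting stand-in: keep only reported fields, sorted by name.
    slim = [{"name": r["name"], "mw": r["mw"]} for r in records]
    return sorted(slim, key=lambda r: r["name"])


# Illustrative input records (values are examples, not project data).
compounds = [
    {"name": "cpd_b", "smiles": " CCO ", "mw": 46.07},
    {"name": "cpd_a", "smiles": "c1ccccc1", "mw": 78.11},
    {"name": "cpd_c", "smiles": "C" * 40, "mw": 563.1},
]
result = run_pipeline(compounds, [preprocess, attribute_filter, postprocess])
```

Because every stage has the same signature, a step such as docking or a predictive model could be added or removed without touching the others, which is the modularity the template suggests.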

Page 8: 20120615_Granatum_COST_v2

Predictive Model Preparation Template

Predictive Model

• Biological data
• Chemical data
• Algorithm
  • Algorithm parameters

Page 9: 20120615_Granatum_COST_v2

Chemopreventive Property Models

• Anti-oxidant
  • Direct Effect
  • Indirect Effect
  • Direct/Indirect Effect
• Anti-inflammatory
  • COX-2 but not COX-1 inhibitor
  • Reduction of TNF-a
  • Reduction of LOX
  • Induction of AP-1
  • Reduction of Interleukins
• Anti-proliferating
  • Cyclin D1 down-regulation
  • Her-2 down-regulation
  • Cyclin E down-regulation
  • EGFR down-regulation
• Apoptotic
  • Anti-apoptotic members of Bcl-2 family down-regulation
  • IAP family down-regulation
  • Caspase up-regulation/activation
• Anti-metastatic / Anti-angiogenic
  • COX-2 down-regulation
  • VEGF down-regulation
  • PDGF down-regulation
• Estrogenic Activity
  • ER-alpha binding affinity
  • ER-beta binding affinity
  • ER-alpha/beta binding affinity
  • No affinity
• Estrogen Antagonists
  • Selective Estrogen Receptor Modulators (SERMs)
  • Estrogen Receptor Modulators

Page 10: 20120615_Granatum_COST_v2

In Silico Tools and Methods

• Generic Chemoinformatics Tool:
  • E-Health Lab and collaborators resources
  • RDKit
• Docking Experiment Tools:
  • AutoDock Vina
  • Chil2 GlamDock
• Data Mining & Statistics Tools:
  • In house tools
  • R
• Scientific Workflow System:
  • Galaxy

Page 11: 20120615_Granatum_COST_v2

Early In Silico Experiments

• In silico tool & models validation
• Steps:
  • Prepare compound dataset
    • Mix of natural products and known inhibitors (4% actives)
  • Implementation/application of predictive models
    • Rule of Five
    • Toxicity model
  • Implementation/application of docking model
    • ER-alpha
  • Compound prioritization
    • Top selections visualization/evaluation
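The Rule of Five step can be sketched with the four thresholds the deck lists for oral drug-like filtering (HBA <= 10, HBD <= 5, Molecular Weight <= 500, logP <= 5). This is a minimal sketch that assumes the descriptors were already computed by a chemoinformatics toolkit; the `Descriptors` container, the function name and the example values are hypothetical. The rule is commonly applied with a tolerance of one violation; the deck states no tolerance, so the sketch requires all four conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    hba: int            # hydrogen-bond acceptors
    hbd: int            # hydrogen-bond donors
    mol_weight: float   # molecular weight
    logp: float         # octanol-water partition coefficient


def passes_oral_druglike_filter(d: Descriptors) -> bool:
    """True when all four thresholds listed in the deck are satisfied."""
    return d.hba <= 10 and d.hbd <= 5 and d.mol_weight <= 500 and d.logp <= 5


# Illustrative descriptor values, not taken from the presentation.
small_polar = Descriptors(hba=4, hbd=2, mol_weight=270.2, logp=2.6)
too_heavy = Descriptors(hba=6, hbd=1, mol_weight=612.8, logp=4.1)

small_polar_ok = passes_oral_druglike_filter(small_polar)
too_heavy_ok = passes_oral_druglike_filter(too_heavy)
```

Here `small_polar` passes and `too_heavy` fails on molecular weight alone.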


Page 12: 20120615_Granatum_COST_v2

Virtual Screening Process Example

• Natural products collection + known ER-alpha inhibitors
• Calculate physicochemical molecular descriptors
• Rule of Five filter
• Toxicity model
• Docking to ER-alpha
• Compound prioritization; Report on top selections
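The deck does not state how the filter, toxicity and docking results are combined into a ranking. One plausible reading, sketched here under stated assumptions, is to keep compounds that pass the Rule of Five filter and are predicted non-toxic, then order them by docking score. The `ScreenedCompound` type, the `prioritize` function, the "lower score is better" convention and all values are hypothetical.

```python
from typing import List, NamedTuple


class ScreenedCompound(NamedTuple):
    name: str
    druglike: bool          # outcome of the Rule of Five filter
    predicted_toxic: bool   # outcome of the toxicity model
    docking_score: float    # assumed convention: lower is better


def prioritize(compounds: List[ScreenedCompound], top_n: int) -> List[str]:
    """Keep drug-like, non-toxic compounds and rank them by docking score."""
    kept = [c for c in compounds if c.druglike and not c.predicted_toxic]
    ranked = sorted(kept, key=lambda c: c.docking_score)
    return [c.name for c in ranked[:top_n]]


# Made-up records for illustration only.
screened = [
    ScreenedCompound("cpd_1", True, False, -7.2),
    ScreenedCompound("cpd_2", True, True, -9.8),    # best score, but predicted toxic
    ScreenedCompound("cpd_3", False, False, -8.9),  # fails the drug-like filter
    ScreenedCompound("cpd_4", True, False, -8.4),
]
top = prioritize(screened, top_n=2)
```

With these records the report would list `cpd_4` first and `cpd_1` second; the two compounds with better raw scores are excluded by the earlier steps.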


Page 13: 20120615_Granatum_COST_v2

Cytotoxicity Predictive Model

• Cytotoxicity Predictive Model
  • Cytotoxicity Bio-Chemical data
  • SVM:
    • Kernel: Linear
    • Stratified K-Fold:
      • 5-folds
      • 10-folds
• Morgan Fingerprints
  • Bit Vector 2048-bits
• Oral Drug-like Filtering
  • HBA <= 10
  • HBD <= 5
  • Molecular Weight <= 500
  • logP <= 5
• Clean Molecules
  • Remove Salts
• Cytotoxicity Dataset
  • Source: The Scripps Research Institute Molecular Screening Center
  • PubChem Bio-Assay: AID 464
  • Tested: 706
  • Active: 331
  • Inactive: 375
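Stratified K-Fold keeps the active/inactive ratio roughly equal in every fold. The sketch below shows the idea in plain Python on the class counts from the slide (331 active, 375 inactive). It is an illustration only: the function `stratified_folds` is hypothetical, it does not shuffle, and the deck does not say which implementation was used (libraries such as scikit-learn provide a `StratifiedKFold` class for this purpose).

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def stratified_folds(labels: Sequence[Hashable], k: int) -> List[List[int]]:
    """Assign sample indices to k folds, dealing each class out round-robin."""
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[(position + offset) % k].append(index)
        offset += len(indices)  # continue dealing where the previous class stopped
    return folds


# Class counts from the slide: 331 active, 375 inactive (706 tested).
labels = ["active"] * 331 + ["inactive"] * 375
folds = stratified_folds(labels, k=5)

fold_sizes = [len(fold) for fold in folds]
active_per_fold = [sum(labels[i] == "active" for i in fold) for fold in folds]
```

With five folds, each fold holds 66 or 67 actives and 75 inactives, so every held-out fold has nearly the same class balance as the full dataset.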


Page 14: 20120615_Granatum_COST_v2

Virtual Screening Process Example

• Ranked order of Cytotoxicity Prediction, Docking and Oral Druglikeness Filtering results
• ER-Alpha Docking (GlamDock)
  • ER-Alpha Protein
  • 2451 molecules for docking experiments
• Cytotoxicity (Predictive Model)
  • SVM Classifier
  • Trained with Bio-Assay 464 dataset
  • Predict: 2451 molecules
• Calculate Morgan Fingerprints
  • Bit Vector 2048-bits
• Oral Druglike Filtering
  • HBA <= 10, HBD <= 5, Molecular Weight <= 500, logP <= 5
  • Result: 2035 pass, 416 not pass
• Clean Molecules
  • Remove Salts
  • 2451 molecules (42 Known, 2409 Indofine)
• Demo Dataset
  • Known ER-Alpha Inhibitors (42)
  • Indofine Dataset (2494)
  • Result: 2451 OK, Remove 85 (valence errors, empty molecule block)

Structures shown: row-20-top-known, row-36-top-known, row-42-top-known, row-729-top-unknown, row-1652-top-unknown, row-1988-top-unknown
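The compound counts on this slide can be checked against each other with simple arithmetic. The short sketch below uses only the numbers from the slide; the variable names are illustrative.

```python
known_inhibitors = 42
indofine_compounds = 2494
removed_on_load = 85          # valence errors, empty molecule block

loaded = known_inhibitors + indofine_compounds   # 2536 input records
usable = loaded - removed_on_load                # molecules kept after loading

indofine_after_cleaning = 2409
druglike_pass, druglike_fail = 2035, 416

checks = {
    "load": usable == 2451,
    "cleaning": known_inhibitors + indofine_after_cleaning == usable,
    "druglike": druglike_pass + druglike_fail == usable,
}
pass_rate = druglike_pass / usable               # fraction passing the filter
```

All three checks hold. Since 2494 - 85 = 2409, the 85 removed records all came from the Indofine set, and about 83% of the 2451 molecules pass the oral drug-like filter.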

Page 15: 20120615_Granatum_COST_v2

Docking results: known ER inhibitors

row-20-top-known


Page 16: 20120615_Granatum_COST_v2

Docking results: known ER inhibitors

row-36-top-known


Page 17: 20120615_Granatum_COST_v2

Docking results: known ER inhibitors

row-42-top-known


Page 18: 20120615_Granatum_COST_v2

Docking results: Indofine compounds

row-729-top-unknown


Page 19: 20120615_Granatum_COST_v2

Docking results: Indofine compounds

row-1652-top-unknown


Page 20: 20120615_Granatum_COST_v2

Docking results: Indofine compounds

row-1988-top-unknown


Page 21: 20120615_Granatum_COST_v2

Concluding Remarks

• Support of chemoprevention-specific predictive models.
• Initial promising results on ERa (based on Indofine dataset).
• Modular architecture and workflow management.
• Integrated with additional tools within the Granatum Project.
  • Linked Biomedical Data Space.
  • Social Collaborative Workspace.
• Product Release:
  • Advanced Prototype Version: October 2012
  • Final Version: April 2013
