Donovan N. Chin & R. Aldrin Denny
Traditional Drug Discovery (insert graph)
In Silico Prediction of ADME (insert graph)
◦ Potency
◦ Absorption
◦ Lead
◦ Drug
◦ Toxicity
◦ Excretion
◦ Metabolism
◦ Distribution
Target → IVY (brute-force virtual screening of very large compound libraries) → Lead Discovery → IVY (predictive models from Biogen data for more efficient virtual screening) → Lead Optimization → Candidate
(insert graph)
◦ Potency
◦ Lead
◦ Drug
◦ Toxicity
◦ Excretion
◦ Metabolism
◦ Distribution
◦ Absorption
Goal: Identify the crystallographic binding mode and rank-order ligands with respect to binding with the protein
(insert graph)
Receptor Docking
Ligand Shape
Generate plausible trial binding modes using a docking function, then re-rank the modes with a scoring function
(insert graph)
341 Active
47 Non-Active
(insert graph)
After filtering by Pharmacophore Feature
(insert graph)
(insert functions for)
◦ F_Score*
◦ D_Score
◦ G_Score
◦ PMF_Score
◦ Chem_Score
◦ ICM_Score*
Cell Adhesion Assay (50% Serum)
◦ (insert graph)
Biochemical Adhesion Assay
◦ (insert graph)
Scoring Functions Are Poor More Often Than Not
Receptor Site View → Library Design → FlexX Score → Consensus Score >= 3
◦ e.g. Contact Map, CLogP, MW, HBOND, Rotatable bonds
◦ Consensus = 5? If yes: substructure exists? If yes: Pharmacophore < 4.2 Å? If yes: Publish Hit Report
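The consensus thresholds in this workflow can be illustrated with a minimal sketch. It assumes one common definition of a consensus score: a compound earns one vote from each scoring function that ranks it in the top fraction of the library. The slides do not state the definition actually used, so the `top_fraction` cutoff, the "higher is better" convention, and all score values below are invented for illustration.

```python
def consensus_scores(scores, top_fraction=0.5):
    """Count, per compound, how many scoring functions rank it in the top fraction.

    scores: dict mapping scoring-function name -> dict of compound -> score.
    Assumption: a higher score is better for every function.
    """
    compounds = set()
    for by_compound in scores.values():
        compounds.update(by_compound)
    votes = {c: 0 for c in compounds}
    for by_compound in scores.values():
        # Rank compounds from best to worst for this scoring function.
        ranked = sorted(by_compound, key=by_compound.get, reverse=True)
        n_top = max(1, int(len(ranked) * top_fraction))
        for c in ranked[:n_top]:
            votes[c] += 1
    return votes


# Invented scores for four hypothetical compounds and three of the functions.
example = {
    "F_Score": {"c1": 9, "c2": 5, "c3": 1, "c4": 3},
    "G_Score": {"c1": 8, "c2": 2, "c3": 7, "c4": 1},
    "PMF_Score": {"c1": 6, "c2": 7, "c3": 2, "c4": 1},
}
votes = consensus_scores(example)
# Keep only compounds that every scoring function placed in its top half.
hits = sorted(c for c, v in votes.items() if v == len(example))
```

With five scoring functions, the same counts would support both cutoffs named in the workflow (at least 3 votes, then exactly 5).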
(insert graph)
Goal: Predict hit/miss class based on presence of features (fingerprints)
Method
◦ Given a set of N samples
◦ Given that some subset A of them are good ("active")
◦ Then we estimate for a new compound: P(good) ~ A/N
◦ Given a set of binary features F, for a given feature F (N and A now count only the samples that contain F):
◦ It appears in N samples
◦ It appears in A good samples
◦ Can we estimate P(good | F) ~ A/N? (Problem: the error gets worse as N becomes small)
◦ P'(good | F) = (A + P(good)·K)/(N + K)
◦ P'(good | F) → P(good) as N → 0
◦ P'(good | F) → A/N as N becomes large
◦ (If K = 1/P(good), this is the Laplacian correction)
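The corrected estimate can be sketched in a few lines. The function name, argument names, and the numbers in the usage lines are hypothetical; only the formula and the choice K = 1/P(good) come from the slide.

```python
def corrected_probability(a_feature, n_feature, p_good, k=None):
    """Shrink the raw estimate A/N toward the baseline P(good).

    a_feature: number of good samples that contain the feature (A)
    n_feature: number of samples that contain the feature (N)
    p_good:    baseline fraction of good samples in the whole set
    k:         weight of the baseline; defaults to 1/P(good),
               the Laplacian correction
    """
    if k is None:
        k = 1.0 / p_good
    return (a_feature + p_good * k) / (n_feature + k)


baseline = 0.1  # illustrative baseline: 10% of samples are active
unseen = corrected_probability(0, 0, baseline)      # feature never observed
rare = corrected_probability(1, 1, baseline)        # seen once, in an active
common = corrected_probability(50, 100, baseline)   # seen often
```

A feature that was never observed falls back to the baseline, a feature seen once in an active moves only modestly above it (the raw A/N would claim 100%), and a frequently seen feature stays close to its raw ratio of 0.5.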
Descriptors (insert)
Advantages
◦ Can describe a huge number of features (up to 4 billion; MDL 1024; Leadscope 27,000)
◦ Contains tertiary and stereochemistry information
◦ Fast
Classification Analysis
◦ Developing Non-Linear Scoring Functions to classify actives and non-actives
◦ (insert graphs)
◦ Cost function to minimize: Gini impurity at node N, i(N) = 1 − Σ_j P(ω_j)², where P(ω_j) is the fraction of samples at the node in class ω_j
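As a minimal sketch of this cost function, the Gini impurity of a node can be computed from the class labels of the samples that reach it. The helper names and the toy labels are illustrative; the second function shows the size-weighted child impurity that a CART-style split search typically minimizes.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum_j P(w_j)^2 for the class labels at one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(left, right):
    """Impurity of a candidate split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


mixed = ["active", "active", "non-active", "non-active"]
parent = gini_impurity(mixed)                      # evenly mixed two-class node
perfect = split_impurity(mixed[:2], mixed[2:])     # split that separates the classes
```

An evenly mixed two-class node has the maximum impurity of 0.5, and a split that separates actives from non-actives completely drives the weighted impurity to 0.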
Training Set Prediction Success
(insert table)
10-fold cross validation
Randomly split training and test sets
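A random 10-fold split can be sketched with the standard library alone. The function name, the fixed seed, and the sample count of 100 are assumptions for illustration, not details taken from the study.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # random assignment to folds
    folds = [indices[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(100, k=10))
```

Each sample lands in exactly one test fold, so every compound is predicted once by a model that never saw it during training.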
Significant Improvement in Separating Actives from Non-Actives
(insert graph)
Significant Improvement in Finding Hits Using New SF
Optimal tree identified (insert graph)
No random effects (insert graph)
(insert cluster)
Able to identify different molecular property criteria that lead to hits
(insert graph)
(insert graph)
Size= magnitude of OBA
OBA values cover range of descriptor space
(insert graph)
Choose 1D and 2D descriptors for ease of interpretation and lower “noise”
Build Model (insert graphs) Apply Model
Features found in high OBA
Features found in low OBA
It would be nice if CART offered a similar view
Improved scoring functions for separating hits from non-hits in structure-based drug design developed with CART and Bayesian models
Identified key differences in molecular physical properties that led to hits
Built a reasonably predictive OBA model (given the complexity of OBA, however, the method cannot be expected to extend to other systems)
Biogen IDEC
Modeling
◦ Rajiah Denny
◦ Claudio Chuaqui
◦ Juswinder Singh
◦ Herman van Vlijmen
◦ Norman Wang
◦ Anuj Patel
◦ Zhan Deng
Chemistry
◦ Kevin Guckian
◦ Dan Scott
◦ Thomas Durand-Reville
◦ Pat Conlon
◦ Charlie Hammond
◦ Chuck Jewell
Pharmacology
◦ Tonika Bonhert