2004 Sheffield Chemoinformatics Conference, April 21-23
Automated Decision Support for the Screening Process
C.A. Nicolaou¹, D.A. Kleier², T.K. Brunck¹, P.A. Bacha¹
¹ Bioreason, Inc., 121 Sandoval St., Suite 220, Santa Fe, NM, USA
² DuPont Agricultural Products, Stine-Haskell Research Center, Newark, DE, USA
Outline
• Decision Support for Lead Optimization
  – Focus on SAR extraction
• Definitions and Goals
• Bioreason Approach
  – Fundamentals
  – RTableGenerator and SARXtractor
• Examples
Decision Support for Lead Optimization
• Lead Optimization:
  – Fine-tuning lead compounds to increase potency and remove undesired biological properties
• Focus on SAR extraction step
  – An effort to understand mechanisms/interactions
  – Suggest optimization paths
  – Enable synthesis/acquisition of compounds with highest probability of generating further crucial knowledge
  – Task performed by highly trained experts
• Wrong decisions can be very costly
  – Time, resources, …
Definitions
• What is SAR/SPR? Ask a chemist…
  – “I know it when I see it”
  – “I definitely don’t want to see a table of numbers – I need a scaffold and R-groups to work with”
• It is usually reported in the form of R-tables on a given scaffold
• Ideally it is reduced to a set of rules
  – Rules can be targeted for use with an expert system engine or on their own
  – Rules can be accumulated into a knowledge base
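Example: SAR Information on 3° Amines
What a chemist would like to see:
[Figure: tertiary amine scaffold (N with R substituents) annotated with SAR notes; drawing not recoverable. Annotations:]
• Variation of R allowed to retain activity
• Dimethyl substitution allowed
• Methyl = inactive
• Propynyl allowed
• Et, i-Pr = inactive
• EWD groups = inactive
• Sensitive to steric bulk and EWD groups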
Goals
• Correctness
• Completeness
• Easily interpretable/usable results
  – As close as possible to the form described by experts
  – Meaningful descriptors
• Usability
  – User friendly
  – Capable of handling screening datasets generated in modern drug discovery environment
Bioreason Fundamentals
• Technical approach
  – Learn what is important directly from the data
    • Assumptions/predefined knowledge kept to a minimum
  – Form structural classes
    • Unsupervised, solely based on molecular graphs
  – Reason with the classes
    • Overlay activities, other biological attributes
    • Characterize & prioritize classes
    • Build models, etc…
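The "reason with the classes" step can be pictured with a small sketch. It assumes the structural classes have already been formed and simply overlays a binary activity flag on each class, then ranks classes by their fraction of actives. The function name, the data layout, and the ranking criterion are all illustrative assumptions, not the Bioreason implementation.

```python
def summarize_classes(assignments, activity):
    """Overlay activity on structural classes and rank the classes.

    assignments: dict mapping class id -> list of compound ids
                 (a compound may belong to several classes).
    activity:    dict mapping compound id -> bool (active or not).
    Both layouts are assumptions made for this sketch.
    """
    summary = []
    for class_id, members in assignments.items():
        if not members:
            continue  # nothing to characterize in an empty class
        n_active = sum(1 for cid in members if activity[cid])
        summary.append({
            "class": class_id,
            "size": len(members),
            "active_fraction": n_active / len(members),
        })
    # Prioritize: highest active fraction first, larger classes break ties.
    summary.sort(key=lambda row: (-row["active_fraction"], -row["size"]))
    return summary


# Invented toy data, for illustration only.
assignments = {"class_A": ["c1", "c2"], "class_B": ["c2", "c3", "c4"]}
activity = {"c1": True, "c2": True, "c3": False, "c4": False}
ranked = summarize_classes(assignments, activity)
```

A class whose active fraction is close to 0 or 1 is already informative on its own; classes in between are the ones that call for an R-table.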
Types of Classes
[Figure: two example class types, "Defined Rings" and "Rings with Variable Closures"; structure drawings not recoverable]
SAR Extraction Approach
• Global SAR models
  – Few rules, sometimes quite hard to interpret
• Local models on Bioreason classes?
  – First, learn good classes
    • The scaffolds are the first part of SAR
  – Generate R-tables
  – Choose an appropriate class of descriptors
    • Characterize R-groups
  – From the R-tables and descriptors…
    • Construct models
    • Extract rules
Generating R-Tables
• For classes without variation in activity…
  – No need for R-table
  – Scaffold is indicative/predictive of activity
• For mixed classes…
  – Using scaffold as a starting point, automatically learn R-groups
  – Relate R-groups to available compound activity/property attributes
    • Not an easy problem!
  – Enabling user interaction to change alignment
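As a rough illustration of the two cases above, the sketch below first checks whether a class is mixed, then lays out an R-table for it. It assumes the hard part (learning and aligning R-groups on the scaffold) is already done and that each compound arrives with a mapping from position to substituent. Filling an absent position with "H" and sorting rows by activity are assumptions of this sketch; all names and data are hypothetical.

```python
def is_mixed(activities):
    """A class needs an R-table only if its activity values vary."""
    return len(set(activities)) > 1


def build_r_table(substituents, activity):
    """Lay out one row per compound and one column per R position.

    substituents: dict compound id -> {position name like "R3": substituent}
    activity:     dict compound id -> numeric activity
    Missing positions are filled with "H" (an assumption of this sketch).
    """
    positions = sorted(
        {p for groups in substituents.values() for p in groups},
        key=lambda p: int(p[1:]),  # numeric order, so R2 sorts before R10
    )
    # Most active compounds first; ties keep their input order.
    ordered = sorted(substituents, key=lambda cid: activity[cid], reverse=True)
    header = ["compound"] + positions + ["activity"]
    rows = [
        [cid] + [substituents[cid].get(p, "H") for p in positions] + [activity[cid]]
        for cid in ordered
    ]
    return header, rows


# Invented toy data, for illustration only.
substituents = {
    "c1": {"R1": "Cl", "R2": "OMe"},
    "c2": {"R1": "Me"},
    "c3": {"R2": "NO2", "R10": "F"},
}
activity = {"c1": 1, "c2": 0, "c3": 1}

if is_mixed(activity.values()):
    header, rows = build_r_table(substituents, activity)
```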
Automatic R-table Example from acute-toxicity data set
Discovering SAR
• Calculate appropriate SAR descriptors
  – Position-Specific-Descriptors (PSD) to characterize R-groups
• Construct models
  – From R-tables and PSD set of each class
• Interpret models
  – Extract concise, meaningful rules
Position-Specific-Descriptors
• Physicochemical
  – Rgroup_Property
  – LogP, molecular weight, number of H-bond donors, H-bond acceptors, charges, rings, and rotatable bonds, polar surface area, basic sites, …
  – Example: R3_logP == 0.75
• Pseudo-3D pharmacophores
  – Rgroup_number of bonds_pharmacophore point
  – H-bond donor, H-bond acceptor, anion, cation, polar, hydrophobe, aromatic ring, and aliphatic ring
  – Example: R4_3_HBA: Yes
• All descriptors learned for each class modeled
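The two naming patterns above (Rgroup_Property and Rgroup_number of bonds_pharmacophore point) can be made concrete with a small sketch that builds and parses such names. The helper functions are hypothetical; only the name formats themselves (for example R3_logP and R4_3_HBA) come from the slide.

```python
def physchem_name(position, prop):
    """Physicochemical PSD name, e.g. position 3 and "logP" -> "R3_logP"."""
    return f"R{position}_{prop}"


def pharmacophore_name(position, bonds, point):
    """Pseudo-3D pharmacophore PSD name, e.g. (4, 3, "HBA") -> "R4_3_HBA"."""
    return f"R{position}_{bonds}_{point}"


def parse_psd(name):
    """Split a PSD name into (position, bonds_from_scaffold, feature).

    bonds_from_scaffold is None for physicochemical descriptors.
    """
    parts = name.split("_")
    position = int(parts[0][1:])  # drop the leading "R"
    if len(parts) == 3:
        return position, int(parts[1]), parts[2]
    return position, None, parts[1]


logp_name = physchem_name(3, "logP")
hba_name = pharmacophore_name(4, 3, "HBA")
```

Read this way, R4_3_HBA: Yes says that the R-group at position 4 has an H-bond acceptor three bonds away from the scaffold.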
SAR Extraction Algorithm
• Multiple Domain
  – compound can appear in more than one child node
  – compound can contribute to more than one rule
• Multiple Splitting
  – parent node can have more than two children
  – extract as much SAR at each level as is statistically meaningful
  – number of splits controlled by statistical means, e.g. parent-child chi2 cutoff (0.7)
[Figure: example SAR tree. Root node: Dataset (Avg. Prop. 4.2). Other node labels: R4_3_HBA: Yes; R10_MW>31; R3_logP in range (0.45-0.75); R11_2_HYD: Yes; …]
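A minimal sketch of one multi-domain, multi-split step follows. Every candidate rule that passes a statistical test becomes a child node, so a parent can have more than two children and a compound can land in several of them. The test used here is a plain Pearson chi-square on the 2x2 table of rule membership versus activity; this is a stand-in, since the slides do not say how the parent-child statistic is computed or what the 0.7 cutoff refers to, so the cutoff is left as a free parameter. All names and data are hypothetical.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no signal
    return n * (a * d - b * c) ** 2 / denom


def split_node(compounds, rules, cutoff):
    """Return {rule name: compounds satisfying it} for every rule that passes.

    compounds: list of dicts with keys "id", "features", "active".
    rules:     dict rule name -> predicate over a features dict.
    cutoff:    minimum chi-square statistic (meaning assumed, see lead-in).
    """
    children = {}
    for name, predicate in rules.items():
        inside = [c for c in compounds if predicate(c["features"])]
        outside = [c for c in compounds if not predicate(c["features"])]
        a = sum(1 for c in inside if c["active"])
        b = len(inside) - a
        c_act = sum(1 for c in outside if c["active"])
        d = len(outside) - c_act
        if chi2_2x2(a, b, c_act, d) >= cutoff:
            children[name] = inside  # not exclusive: overlaps are allowed
    return children


# Invented toy data, for illustration only.
raw = [("c1", True, 40, True), ("c2", True, 45, True), ("c3", True, 20, True),
       ("c4", True, 50, True), ("c5", False, 42, False), ("c6", False, 10, False),
       ("c7", False, 48, False), ("c8", False, 15, False)]
compounds = [{"id": i, "features": {"hba": h, "mw": m}, "active": act}
             for i, h, m, act in raw]
rules = {
    "R4_3_HBA: Yes": lambda f: f["hba"],
    "R10_MW>31": lambda f: f["mw"] > 31,
}
strict = split_node(compounds, rules, cutoff=3.84)  # 5% critical value, 1 d.o.f.
loose = split_node(compounds, rules, cutoff=0.5)
```

With the looser cutoff both rules become children, and the compounds that satisfy both appear in each child node, which is the multiple-domain behaviour described above.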
SAR Tree Interpretation
• Each node in this type of decision tree is a rule or hypothesis that is easy to express in English
• Each hypothesis has
  – Indication of certainty (statistical)
  – Feature name/range (e.g. logP between x and y)
  – Support (number of examples)
  – List of examples
• Rules with multiple elements are possible
  – Aggregate certainty terms
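The parts of a hypothesis listed above map naturally onto a small record type. The sketch below is one possible layout, with a method that renders the rule as an English sentence in the style used on the example slides; the class, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    feature: str        # descriptor name, e.g. "R3_logP"
    condition: str      # range or value, already phrased in words
    probability: float  # fraction of actives among compounds satisfying the rule
    baseline: float     # fraction of actives in the class as a whole
    certainty: float    # statistical indication of certainty
    examples: list = field(default_factory=list)  # supporting compound ids

    @property
    def support(self):
        """Support is the number of examples behind the rule."""
        return len(self.examples)

    def to_english(self):
        return (
            f"If {self.feature} {self.condition}, then probability of activity "
            f"is {self.probability:.0%} (cf. {self.baseline:.0%} for class as a "
            f"whole) with a certainty of {self.certainty:.2f}; "
            f"support: {self.support} examples."
        )


# Invented values, for illustration only.
rule = Rule("R3_logP", "is between 0.45 and 0.75", 0.9, 0.5, 0.95,
            ["c1", "c2", "c3"])
sentence = rule.to_english()
```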
Analysis of Commercial Pesticides
• Source of compounds
  – The Pesticide Manual: A World Compendium
  – Published by the British Crop Protection Council
• Types of activity considered
  – Herbicide, insecticide, fungicide, plant growth regulation
  – Binary indicator variables used for type of activity
• Task:
  – Identify scaffolds associated with herbicidal activity & features that distinguish herbicides from non-herbicides within the same class
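The binary indicator encoding mentioned above might look like the following sketch: each compound gets one 0/1 flag per activity type, and the herbicide flag then serves as the target when looking for distinguishing features. The function name and the example input are hypothetical; only the four activity types come from the slide.

```python
ACTIVITY_TYPES = ("herbicide", "insecticide", "fungicide", "plant growth regulation")


def encode_activities(uses):
    """Turn a set of reported uses into one 0/1 indicator per activity type."""
    return {activity: int(activity in uses) for activity in ACTIVITY_TYPES}


# A hypothetical compound reported as both a herbicide and a growth regulator.
flags = encode_activities({"herbicide", "plant growth regulation"})
is_herbicide = flags["herbicide"]
```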
Diphenylether subclass is evenly distributed between herbicides and non-herbicides. What substituent features distinguish the herbicides?
[Figure: diphenyl ether scaffold with substituent positions R1 to R7; drawing not recoverable]
R-Table sorted by herbicide activity and displayed at
cutoff between herbicides and non-herbicides
If R5 has an HBA located 2 bonds from the scaffold, then probability of activity is 95% (cf. 47% for class as a whole) with a certainty of 1.00.
If R5 PSA is within the range of 33.97 to 65.16, then probability of herbicidal activity is 100% (cf. 47% for class as a whole) with a certainty of 1.00.
If R3 has an AlR center located 5 bonds from the scaffold, probability of herbicidal activity is 0% with a certainty of 1.00.
This rule cleanly differentiates pyrethroid insecticides from diphenyl ether herbicides.
Peptide Deformylase Inhibitors
ClassPharmer™ SAR extraction & pharmacophore perception
Learning SAR Rules for Inhibitors of Peptide Deformylase (PDF)
• Training set of 22 mostly beta-sulfinylhydroxamates
  – Reference: Apfel, et al., J. Med. Chem., 43, 2324 (2000)
• Compounds classified & characterized by MCS using ClassPharmer™ technology
• R-Tables generated for each class
• QSARs learned for each class
Training Set of Hydroxamic Acids
[Figure: structures of training-set compounds; drawings not recoverable. Labels as shown on the slide:]
• 1: 2.22 (0)
• 16: 1.96 (0)
• 14: 1.51 (0)
• 5: 1.46 (0)
• 9: 1.03 (0)
• 7: 0.80 (0)
• 12: 0.72 (0)
• 8: -0.04 (0)
• 2: -1.34 (0)
Classification & R-grouping by ClassPharmer™
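[Figure: class scaffold with substituent positions R1, R2, R3, and X1; drawing not recoverable]
R-table (R-group fragment drawings for R1, R2, R3, and X1 not recoverable):
Cpd ID   pIC50 (displayed)   pIC50 (full precision)
16       1.96                1.95860731484
13       1.64                1.63827216398
8        -0.04               -0.0413926851582
2        -1.34               -1.34242268082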
[Figure: compound structures; drawings not recoverable. Labels as shown on the slide: 8o 342 1.96 (1); 8h 295 1.03 (1); 9a 269 1.00 (1); 8e 299 0.85 (1); 9e 348 0.68 (1)]
ClassPharmer™ Rule for Desirable R3 Groups
If R3(MW) is in the range of 50 to 74, the probability of activity is significantly enhanced
• 92% of compounds satisfying the premise are active
• 67% of compounds in class are active
[Figure: compound structures; drawings not recoverable. Labels as shown on the slide: 4 0.80 (1); 7 0.80 (1); 8 -0.04 (1); 3 -0.48 (1); 2 -1.34 (1)]
Obverse Rule for Undesirable R3 Groups
If R3(MW) is outside the range of 50 to 74, the probability of activity is significantly decreased
• All compounds satisfying the premise are inactive
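A range rule and its obverse partition the class into two groups, and the figures quoted for each are just fractions of actives within a group compared with the class as a whole. The sketch below computes those three fractions for a molecular-weight range on R3. The data are invented and do not reproduce the 92% and 67% figures from the slides; treating the range endpoints as inclusive is an assumption, and the function name is hypothetical.

```python
def range_rule_stats(rows, low, high):
    """Fraction of actives inside the range, outside it, and overall.

    rows: list of (r3_mw, active) pairs. Endpoints are treated as
    inclusive (an assumption of this sketch). A group with no members
    yields None rather than a fraction.
    """
    def active_fraction(subset):
        if not subset:
            return None
        return sum(1 for _, active in subset if active) / len(subset)

    inside = [row for row in rows if low <= row[0] <= high]
    outside = [row for row in rows if not (low <= row[0] <= high)]
    return {
        "rule": active_fraction(inside),      # premise satisfied
        "obverse": active_fraction(outside),  # premise not satisfied
        "class": active_fraction(rows),       # baseline for comparison
    }


# Invented toy data, for illustration only.
rows = [(57, True), (57, True), (71, True), (43, False), (15, False), (91, False)]
stats = range_rule_stats(rows, low=50, high=74)
```

Reporting the class baseline next to each rule, as the slides do, is what makes the enhancement or decrease visible at a glance.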
R3 pocket in active site of (PDF)Ni(II)
[Figure: CGG49 in active site of E. coli Ni-PDF (Roche); R3 = nBu. Structure drawing not recoverable]
Apfel, et al. J. Med. Chem. 2000, 43, 2324-2331
Future Directions
• Expand position-specific descriptor types
  – ADME/Tox analysis
  – Electronic
• Rule Synopsis
• Mine info across screens, libraries, time
Acknowledgements
• Bioreason
  – Terence K. Brunck
  – Pat Bacha
  – Suzanne Sloan
• DuPont
  – Dan A. Kleier
  – A number of forward thinking and very patient scientists