Automated Decision Support for the Screening Process

C.A. Nicolaou¹, D.A. Kleier², T.K. Brunck¹, P.A. Bacha¹

¹ Bioreason, Inc., 121 Sandoval St., Suite 220, Santa Fe, NM, USA
² DuPont Agricultural Products, Stine-Haskell Research Center, Newark, DE, USA

2004 Sheffield Chemoinformatics Conference, April 21-23



Outline

• Decision Support for Lead Optimization
  – Focus on SAR extraction
• Definitions and Goals
• Bioreason Approach
  – Fundamentals
  – RTableGenerator and SARXtractor
• Examples

Decision Support for Lead Optimization

• Lead Optimization:
  – Fine-tuning lead compounds to increase potency and remove undesired biological properties
• Focus on SAR extraction step
  – An effort to understand mechanisms/interactions
  – Suggest optimization paths
  – Enable synthesis/acquisition of compounds with highest probability of generating further crucial knowledge
  – Task performed by highly trained experts
• Wrong decisions can be very costly
  – Time, resources, …

Definitions

• What is SAR/SPR? Ask a chemist…
  – “I know it when I see it”
  – “I definitely don’t want to see a table of numbers – I need a scaffold and R-groups to work with”
• It is usually reported in the form of R-tables on a given scaffold
• Ideally it is reduced to a set of rules
  – Rules can be targeted for use with an expert system engine or on their own
  – Rules can be accumulated into a knowledge base

Example: SAR Information on 3° Amines

What a chemist would like to see:

[Figure: tertiary amine scaffold annotated with the SAR notes listed here]
• Variation of R allowed to retain activity
• Dimethyl substitution allowed
• Methyl = inactive
• Propynyl allowed
• Et, i-Pr = inactive
• EWD groups = inactive
• Sensitive to steric bulk and EWD groups

Goals

• Correctness
• Completeness
• Easily interpretable/usable results
  – As close as possible to the form described by experts
  – Meaningful descriptors
• Usability
  – User friendly
  – Capable of handling screening datasets generated in modern drug discovery environments

Bioreason Fundamentals

• Technical approach
  – Learn what is important directly from the data
    • Assumptions/predefined knowledge kept to a minimum
  – Form structural classes
    • Unsupervised, solely based on molecular graphs
  – Reason with the classes
    • Overlay activities, other biological attributes
    • Characterize & prioritize classes
    • Build models, etc…

Overview of the classes

Types of Classes

[Figure panels: Defined Rings; Rings with Variable Closures]

Subclass

[Figure panels: Parent Class; Subclasses]

Compounds in classes

SAR Extraction Approach

• Global SAR models
  – Few rules, sometimes quite hard to interpret
• Local models on Bioreason classes?
  – First, learn good classes
    • The scaffolds are the first part of SAR
  – Generate R-tables
  – Choose an appropriate class of descriptors
    • Characterize R-groups
  – From the R-tables and descriptors…
    • Construct models
    • Extract rules

Generating R-Tables

• For classes without variation in activity…
  – No need for R-table
  – Scaffold is indicative/predictive of activity
• For mixed classes…
  – Using scaffold as a starting point, automatically learn R-groups
  – Relate R-groups to available compound activity/property attributes
    • Not an easy problem!
  – Enabling user interaction to change alignment
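The layout of an R-table for one class can be sketched as follows. This is a minimal illustration only, assuming the R-groups have already been assigned by aligning each compound to the class scaffold (the hard step, not shown here). The record fields, the function names `build_r_table` and `is_mixed`, and the choice to show a missing substituent as "H" are all hypothetical and are not RTableGenerator's interface.

```python
def build_r_table(records):
    """Pivot per-compound R-group assignments into R-table rows.

    records: list of dicts with keys 'id', 'activity', and 'r_groups'
             (a dict mapping a position name such as 'R1' to a
             substituent label).
    Returns (columns, rows), with rows sorted by decreasing activity.
    Assumption: a position with no recorded substituent is shown as 'H'.
    """
    # Collect every R position seen in the class, ordered R1, R2, ..., R10
    positions = sorted({p for r in records for p in r["r_groups"]},
                       key=lambda p: int(p[1:]))
    columns = ["id", "activity"] + positions
    rows = []
    for r in sorted(records, key=lambda r: r["activity"], reverse=True):
        rows.append([r["id"], r["activity"]]
                    + [r["r_groups"].get(p, "H") for p in positions])
    return columns, rows


def is_mixed(records, threshold):
    """A class is 'mixed' when it contains both actives and inactives.

    Assumption: a compound counts as active when activity >= threshold.
    """
    flags = {r["activity"] >= threshold for r in records}
    return len(flags) == 2
```

Only mixed classes would be passed to `build_r_table`; for a class where `is_mixed` is false, the scaffold alone already indicates the activity.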

Automatic R-table Example from acute-toxicity data set

Discovering SAR

• Calculate appropriate SAR descriptors
  – Position-Specific Descriptors (PSD) to characterize R-groups
• Construct models
  – From R-tables and PSD set of each class
• Interpret models
  – Extract concise, meaningful rules

Position-Specific Descriptors

• Physicochemical
  – Rgroup_Property
  – LogP, molecular weight, and number of H-bond donors, H-bond acceptors, charges, rings, and rotatable bonds, polar surface area, basic sites, …
  – Example: R3_logP == 0.75
• Pseudo-3D pharmacophores
  – Rgroup_number of bonds_pharmacophore point
  – H-bond donor, H-bond acceptor, anion, cation, polar, hydrophobe, aromatic ring, and aliphatic ring
  – Example: R4_3_HBA: Yes
• All descriptors learned for each class modeled
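The two naming patterns above (Rgroup_Property and Rgroup_number of bonds_pharmacophore point) can be made concrete with a small sketch. The helper names are hypothetical. Of the abbreviations used, only HBA, HYD and AlR appear in this talk; the other codes in `PHARMACOPHORE_POINTS` are assumed for illustration.

```python
# Pharmacophore point codes. HBA, HYD and AlR appear in the slides;
# the remaining abbreviations are assumptions made for this sketch.
PHARMACOPHORE_POINTS = {"HBD", "HBA", "ANI", "CAT", "POL", "HYD", "ArR", "AlR"}


def physchem_psd(position, prop):
    """Name of a physicochemical PSD, e.g. (3, 'logP') -> 'R3_logP'."""
    return f"R{position}_{prop}"


def pharmacophore_psd(position, bonds, point):
    """Name of a pseudo-3D pharmacophore PSD, e.g. (4, 3, 'HBA') -> 'R4_3_HBA'."""
    if point not in PHARMACOPHORE_POINTS:
        raise ValueError(f"unknown pharmacophore point: {point}")
    return f"R{position}_{bonds}_{point}"


def parse_psd(name):
    """Split a PSD name back into its parts.

    Two fields mean a physicochemical descriptor; three fields mean a
    pharmacophore point at a given bond distance from the scaffold.
    """
    parts = name.split("_")
    if not parts[0].startswith("R"):
        raise ValueError(f"not a PSD name: {name}")
    position = int(parts[0][1:])
    if len(parts) == 2:
        return {"position": position, "kind": "physchem", "property": parts[1]}
    if len(parts) == 3:
        return {"position": position, "kind": "pharmacophore",
                "bonds": int(parts[1]), "point": parts[2]}
    raise ValueError(f"not a PSD name: {name}")
```

Keeping the R position in the name is what lets a rule point a chemist at a specific substituent site rather than at a whole-molecule property.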

SAR Extraction Algorithm

• Multiple Domain
  – compound can appear in more than one child node
  – compound can contribute to more than one rule
• Multiple Splitting
  – parent node can have more than two children
  – extract as much SAR at each level as is statistically meaningful
  – number of splits controlled by statistical means, e.g. parent-child chi-squared cutoff (0.7)

[Figure: SAR tree. Root node: Dataset, Avg. Prop. 4.2. Child nodes: R4_3_HBA: Yes; R10_MW>31; R3_logP in range (0.45-0.75); R11_2_HYD: Yes; …]
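The multiple-domain, multiple-splitting idea can be sketched as below. This is not the SARXtractor algorithm: it assumes a binary active/inactive label, uses a goodness-of-fit chi-squared of each candidate child against the parent's active rate, and treats the 0.7 cutoff as a threshold on that statistic. The slides do not say how the cutoff is applied, so that reading, and the names `chi2_vs_parent` and `split_node`, are assumptions.

```python
def chi2_vs_parent(child_active, child_total, parent_rate):
    """Chi-squared of a child's active/inactive counts against the counts
    expected if the child had the same active rate as its parent."""
    exp_active = child_total * parent_rate
    exp_inactive = child_total * (1.0 - parent_rate)
    if exp_active == 0 or exp_inactive == 0:
        return 0.0  # parent is pure, so no split can be informative
    obs_inactive = child_total - child_active
    return ((child_active - exp_active) ** 2 / exp_active
            + (obs_inactive - exp_inactive) ** 2 / exp_inactive)


def split_node(compounds, tests, cutoff=0.7):
    """Create one child per descriptor test that passes the cutoff.

    compounds: list of (compound_id, is_active, descriptors) tuples
    tests:     dict mapping a rule label to a predicate on descriptors
    Returns {label: [compound ids]}. Children are not mutually exclusive:
    a compound lands in every child whose test it satisfies, and a parent
    may end up with more than two children.
    """
    parent_rate = sum(active for _, active, _ in compounds) / len(compounds)
    children = {}
    for label, predicate in tests.items():
        members = [c for c in compounds if predicate(c[2])]
        if not members:
            continue
        n_active = sum(active for _, active, _ in members)
        if chi2_vs_parent(n_active, len(members), parent_rate) >= cutoff:
            children[label] = [c[0] for c in members]
    return children
```

Because every test is scored independently against the parent, the number of children is set by the cutoff and not fixed at two as in a conventional binary decision tree.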

SAR Tree Interpretation

• Each node in this type of decision tree is a rule or hypothesis that is easy to express in English
• Each hypothesis has
  – Indication of certainty (statistical)
  – Feature name/range (e.g. logP between x and y)
  – Support (number of examples)
  – List of examples
• Rules with multiple elements are possible
  – Aggregate certainty terms
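One way to hold the parts of a hypothesis listed above and render it as a sentence is sketched here. The `Rule` class and its field names are hypothetical. The sentence template follows the wording of the rules in the pesticide example of this talk. How certainty terms are aggregated for multi-element rules is not specified in the slides and is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    premise: str          # feature name/range, phrased in words
    outcome: str          # e.g. "activity" or "herbicidal activity"
    probability: float    # fraction of matching compounds with the outcome
    baseline: float       # same fraction for the class as a whole
    certainty: float      # statistical certainty term, taken as given
    examples: List[str] = field(default_factory=list)

    @property
    def support(self) -> int:
        """Support is the number of example compounds behind the rule."""
        return len(self.examples)

    def to_english(self) -> str:
        """Render the rule as a single English sentence."""
        return (f"If {self.premise}, then probability of {self.outcome} is "
                f"{self.probability:.0%} (cf. {self.baseline:.0%} for class "
                f"as a whole) with a certainty of {self.certainty:.2f}.")
```

For example, `Rule("R5 has an HBA located 2 bonds from the scaffold", "activity", 0.95, 0.47, 1.0).to_english()` produces the sentence form a chemist would read, while `support` and `examples` stay available for inspection.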

SAR Extraction Example

Analysis of Commercial Pesticides

• Source of compounds
  – The Pesticide Manual: A World Compendium
  – Published by the British Crop Protection Council
• Types of activity considered
  – Herbicide, insecticide, fungicide, plant growth regulation
  – Binary indicator variables used for type of activity
• Task:
  – Identify scaffolds associated with herbicidal activity & features that distinguish herbicides from non-herbicides within the same class
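The binary indicator encoding of activity type, and the per-class herbicide fraction it makes possible, might look like the sketch below. The function names and the key spellings in `ACTIVITY_TYPES` are hypothetical. The four activity types are the ones listed on the slide.

```python
# The four activity types considered in the talk, as indicator columns.
ACTIVITY_TYPES = ("herbicide", "insecticide", "fungicide",
                  "plant_growth_regulator")


def indicator_row(activities):
    """Encode a compound's set of activity types as 0/1 indicators."""
    return {t: int(t in activities) for t in ACTIVITY_TYPES}


def herbicide_fraction(class_members):
    """Fraction of compounds in a class flagged as herbicides.

    class_members: list of activity-type sets, one per compound.
    A fraction near 0 or 1 means the scaffold alone is informative;
    a fraction near 0.5 marks a mixed class worth an R-group analysis.
    """
    rows = [indicator_row(a) for a in class_members]
    return sum(r["herbicide"] for r in rows) / len(rows)
```

A compound with more than one type of activity simply has more than one indicator set to 1.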

Diphenylether subclass is evenly distributed between herbicides and non-herbicides. What substituent features distinguish the herbicides?

[Figure: diphenyl ether scaffold with substituent positions R1 to R7]

R-Table sorted by herbicide activity and displayed at cutoff between herbicides and non-herbicides

If R5 has an HBA located 2 bonds from the scaffold, then probability of activity is 95% (cf. 47% for class as a whole) with a certainty of 1.00.

If R5 PSA is within the range of 33.97 to 65.16, then probability of herbicidal activity is 100% (cf. 47% for class as a whole) with a certainty of 1.00.

If R3 has an AlR (aliphatic ring) center located 5 bonds from the scaffold, probability of herbicidal activity is 0% with a certainty of 1.00.

This rule cleanly differentiates pyrethroid insecticides from diphenyl ether herbicides.

Peptide Deformylase Inhibitors

ClassPharmer™ SAR extraction & pharmacophore perception

Learning SAR Rules for Inhibitors of Peptide Deformylase (PDF)

• Training set of 22 mostly beta-sulfinylhydroxamates
  – Reference: Apfel, et al., J. Med. Chem., 43, 2324 (2000)
• Compounds classified & characterized by MCS using ClassPharmer™ technology
• R-Tables generated for each class
• QSARs learned for each class

Training Set of Hydroxamic Acids

[Figure: structures of training-set compounds; each is labeled with a compound ID, an activity value, and an index in parentheses. Structures not recoverable; labels as shown on the slide:]

Cpd ID   Value (index)
1        2.22 (0)
16       1.96 (0)
14       1.51 (0)
5        1.46 (0)
9        1.03 (0)
7        0.80 (0)
12       0.72 (0)
8        -0.04 (0)
2        -1.34 (0)

Classification & R-grouping by ClassPharmer™

[Figure: class scaffold with substituent positions R1, R2, R3 and variable atom X1]

[Figure: R-table with columns Cpd ID, pIC50, R1, R2, R3, X1. Substituent structures not recoverable; ID and pIC50 columns as shown on the slide:]

Cpd ID   pIC50
16       1.96
13       1.64
8        -0.04
2        -1.34

ClassPharmer™ Rule for Desirable R3 Groups

If R3(MW) is in the range of 50 to 74, the probability of activity is significantly enhanced.

• 92% of compounds satisfying the premise are active
• 67% of compounds in the class are active

[Figure: example structures; structures not recoverable, labels as shown on the slide:]

8o   342   1.96 (1)
8h   295   1.03 (1)
9a   269   1.00 (1)
8e   299   0.85 (1)
9e   348   0.68 (1)

Obverse Rule for Undesirable R3 Groups

If R3(MW) is outside the range of 50 to 74, the probability of activity is significantly decreased.

• All compounds satisfying the premise are inactive

[Figure: example structures; structures not recoverable, labels as shown on the slide:]

4   0.80 (1)
7   0.80 (1)
8   -0.04 (1)
3   -0.48 (1)
2   -1.34 (1)
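The comparison behind a range rule and its obverse (fraction active inside the range, outside the range, and in the class as a whole) can be sketched as follows. The function name `range_rule_stats` and the input layout are hypothetical. Any data passed in a usage example would be made up, not the Apfel et al. training set, so no attempt is made to reproduce the 92% and 67% figures.

```python
def range_rule_stats(compounds, descriptor, low, high):
    """Compare activity inside and outside a descriptor range.

    compounds:  list of (descriptors, is_active) pairs, where descriptors
                is a dict such as {"R3_MW": 57.0}
    descriptor: PSD name to test, e.g. "R3_MW"
    low, high:  inclusive bounds of the range in the rule premise
    Returns the fraction active for the whole class, for compounds
    satisfying the premise ("inside"), and for the obverse ("outside").
    A fraction is None when no compound falls in that group.
    """
    def fraction(flags):
        return sum(flags) / len(flags) if flags else None

    inside = [active for d, active in compounds
              if low <= d[descriptor] <= high]
    outside = [active for d, active in compounds
               if not (low <= d[descriptor] <= high)]
    return {
        "class": fraction([active for _, active in compounds]),
        "inside": fraction(inside),
        "outside": fraction(outside),
    }
```

A rule is interesting when "inside" is well above "class"; its obverse is interesting when "outside" is well below it.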

R3 pocket in active site of (PDF)Ni(II)

CGG49 in active site of E. coli Ni-PDF (Roche). Apfel, et al. J. Med. Chem. 2000, 43, 2324-2331

[Figure: ligand structure with R3 = nBu]

Future Directions

• Expand position-specific descriptor types
  – ADME/Tox analysis
  – Electronic
• Rule Synopsis
• Mine info across screens, libraries, time

Acknowledgements

• Bioreason
  – Terence K. Brunck
  – Pat Bacha
  – Suzanne Sloan
• DuPont
  – Dan A. Kleier
  – A number of forward thinking and very patient scientists

Getting Close…

[Figure: tertiary amine scaffold annotated with the SAR notes listed here]
• Variation of R allowed to retain activity
• Dimethyl substitution allowed
• Methyl = inactive
• Propynyl allowed
• Et, i-Pr = inactive
• EWD groups = inactive
• Sensitive to steric bulk and EWD groups