SLAS2016: Why have one model when you could have thousands?

Why have one model when you could have thousands?

Alex M. Clark, Ph.D.

January 2016

© 2016 Molecular Materials Informatics, Inc. http://molmatinf.com

MOLECULAR MATERIALS INFORMATICS

Cheminformatics

• Generally 2D structures with activities:

• Look for trends: structure-activity relationships

• Leverages quantity rather than detail... but quality is also supremely important


Structure-Activity Models

• Bayesian models very effective

• Tabulate structure fingerprints for actives vs. inactives

• Prediction: ordering, probability

• Low maintenance

• ECFP6 fingerprints

[Figure: example fingerprint bit string 10001001000001101001011101110111; ROC integral 0.8343]
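The tabulation step can be sketched in a few lines. This is a minimal illustration, not the implementation behind the slides: it assumes a Laplacian-corrected naive Bayesian scheme (one common formulation for fingerprint models), and it represents each fingerprint as a plain set of integer bit indices rather than real ECFP6 output. The names `train_bayesian` and `predict` are hypothetical.

```python
import math
from collections import Counter


def train_bayesian(fingerprints, actives):
    """Tabulate fingerprint bits for actives vs. all molecules.

    fingerprints: list of sets of int bit indices (stand-ins for ECFP6 bits).
    actives: list of bools, parallel to fingerprints.
    Returns a dict mapping bit -> log contribution (Laplacian-corrected,
    which is an assumption; the slides do not give the formula).
    """
    total = Counter()  # molecules containing each bit
    hits = Counter()   # active molecules containing each bit
    for bits, is_active in zip(fingerprints, actives):
        for b in bits:
            total[b] += 1
            if is_active:
                hits[b] += 1
    base_rate = sum(actives) / len(actives)  # overall fraction of actives
    return {
        b: math.log((hits[b] + 1) / (total[b] * base_rate + 1))
        for b in total
    }


def predict(model, bits):
    """Sum the contributions of known bits; unseen bits contribute nothing."""
    return sum(model.get(b, 0.0) for b in bits)


# Toy data: bits 1-3 appear only in actives, bits 4-6 only in inactives.
toy_fps = [{1, 2}, {1, 3}, {4, 5}, {4, 6}]
toy_labels = [True, True, False, False]
toy_model = train_bayesian(toy_fps, toy_labels)
```

The raw score is useful for ordering molecules. Turning it into a probability needs a separate calibration step, which is not shown here.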


The Data Problem

• > 10 years ago: quantity the biggest issue

- open structure-activity data rare and small

- paid collections, big pharma registration

• ~5 years ago: quality the biggest issue

- huge databases, e.g. PubChem, ChemSpider, ZINC, vendors, etc.

- generally no provenance: anything goes

• Cheminformatics seemed to be stagnant...

- new methods, same mediocre performance


The Data Solution

• Recently: some excellent developments

- Open Melting Points: models actually work

- PubChem: direct submission by scientists

- CDD: store and share with same platform

- ChEMBL: large, open, high quality, broad

• Can now have quantity and quality, without fees or restrictions

• Evidence suggests that the data was holding us back, not the methods


ChEMBL

• Hierarchy looks like this:

[Diagram: target, assay, activity, molecule]

• What we need it to be:

[Diagram: dataset, assay, activity, molecule, target, merged activity, materials for model]


Slicing & Dicing

• Divide by target, species and type of assay (protein binding, whole cell, ADMET, etc.)

• Measurements: [Ki, Kd] or [IC50, EC50, AC50, GI50]

• Units: [M, mM, μM, nM]

• Relations: [=, <, >, ≤, ≥]

• Total of 8646 groups of structure-activity
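The slicing described above could look roughly like the following sketch. The record field names (`target`, `species`, `assay_type`, and so on), the class labels `"Ki/Kd"` and `"IC50-like"`, and the function names are all hypothetical; only the measurement types, units and relations come from the slide. Concentrations are converted to a negative log molar scale, which flips the direction of any inequality.

```python
import math
from collections import defaultdict

# Units listed on the slide, expressed as factors to molar.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "μM": 1e-6, "nM": 1e-9}

# The two measurement families from the slide (labels are illustrative).
MEASUREMENT_CLASS = {
    "Ki": "Ki/Kd", "Kd": "Ki/Kd",
    "IC50": "IC50-like", "EC50": "IC50-like",
    "AC50": "IC50-like", "GI50": "IC50-like",
}

# A smaller concentration means a larger -log10 value, so relations flip.
FLIP_RELATION = {"=": "=", "<": ">", ">": "<", "≤": "≥", "≥": "≤"}


def to_p_value(value, unit):
    """Convert a concentration to -log10 of its molar value."""
    return -math.log10(value * UNIT_TO_MOLAR[unit])


def group_records(records):
    """Group activity records by target, species, assay type and family."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["target"], rec["species"], rec["assay_type"],
               MEASUREMENT_CLASS[rec["measurement"]])
        groups[key].append((rec["molecule"],
                            FLIP_RELATION[rec["relation"]],
                            to_p_value(rec["value"], rec["unit"])))
    return dict(groups)


# Made-up records, purely to exercise the functions.
demo_records = [
    {"target": "T1", "species": "human", "assay_type": "binding",
     "measurement": "Ki", "relation": "=", "value": 10, "unit": "nM",
     "molecule": "mol-a"},
    {"target": "T1", "species": "human", "assay_type": "binding",
     "measurement": "Kd", "relation": "<", "value": 1, "unit": "μM",
     "molecule": "mol-b"},
    {"target": "T1", "species": "human", "assay_type": "binding",
     "measurement": "IC50", "relation": "=", "value": 1, "unit": "mM",
     "molecule": "mol-a"},
]
demo_groups = group_records(demo_records)
```

Here the Ki and Kd records land in the same group, while the IC50 record forms its own group even though the target, species and assay type match.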


Consolidation

• Strip salts / adducts

• Common organic elements only:

- [H, C, N, O, P, S, F, Cl, Br, I, B, Si, Se, As, Sb, Te]

• Duplicate molecules: merge activities, e.g.

- [1.2, 1.8] ➡ 1.5 ± 0.3

- [> 5, 5.5] ➡ > 5

- [< 1, 3.5] ➡ invalid

• Keep groups with at least 100 molecules remaining

• Now down to 1839 datasets
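One way to reproduce the three merge examples above is sketched below. The rules are inferred from those examples only and are assumptions, not the documented procedure: exact values are averaged with their spread taken as the population standard deviation, an inequality is kept when every exact value is consistent with it, and any contradiction (or a mix of upper and lower bounds) makes the result invalid. The function name `merge_activities` is hypothetical.

```python
from statistics import mean, pstdev


def merge_activities(measurements):
    """Merge duplicate measurements for one molecule.

    measurements: list of (relation, value) with relation in
    '=', '<', '≤', '>', '≥'.
    Returns (relation, value, spread), or None if the set is invalid.
    """
    exact = [v for r, v in measurements if r == "="]
    lower = [v for r, v in measurements if r in (">", "≥")]
    upper = [v for r, v in measurements if r in ("<", "≤")]

    if lower and upper:
        return None  # assumption: mixed bounds are treated as invalid
    if lower:
        bound = max(lower)  # tightest lower bound
        if any(v < bound for v in exact):
            return None     # an exact value contradicts the bound
        return (">", bound, 0.0)
    if upper:
        bound = min(upper)  # tightest upper bound
        if any(v > bound for v in exact):
            return None
        return ("<", bound, 0.0)
    if not exact:
        return None
    return ("=", mean(exact), pstdev(exact))


# The three examples from the slide.
merged_exact = merge_activities([("=", 1.2), ("=", 1.8)])
merged_bound = merge_activities([(">", 5), ("=", 5.5)])
merged_bad = merge_activities([("<", 1), ("=", 3.5)])
```

For two exact values the population standard deviation equals half their difference, which matches the ± 0.3 in the first example.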


Model Building

• Bayesian models need a threshold...

[Figure: pIC50 axis divided by a threshold into inactive and active]

• Suitable values are often known, but large-scale automation must estimate them

• Score: population, balance, trial Bayesian

• See J. Chem. Inf. Model. 55, 1246-1260 (2015)
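As a rough illustration of automated threshold selection, the sketch below scores candidate thresholds by class balance alone. This is a deliberate simplification and an assumption: the scoring mentioned above also involves population and a trial Bayesian model, and the cited paper should be consulted for the actual procedure. The function name and the scoring formula are hypothetical.

```python
def pick_threshold(p_values, candidates):
    """Pick the candidate threshold giving the most balanced split.

    p_values: activities on a -log10 scale (e.g. pIC50), higher = more active.
    candidates: thresholds to try.
    Scoring by balance only is a simplification, not the published method.
    """
    best, best_score = None, -1.0
    n = len(p_values)
    for t in candidates:
        n_active = sum(1 for v in p_values if v >= t)
        fraction = n_active / n
        # 1.0 for a 50/50 split, falling to 0.0 when one class is empty.
        score = 1.0 - 2.0 * abs(fraction - 0.5)
        if score > best_score:
            best, best_score = t, score
    return best


# Made-up activities, purely for illustration.
demo_p_values = [4.0, 4.5, 5.0, 6.5, 7.0, 8.0]
demo_threshold = pick_threshold(demo_p_values, [5, 6, 7])
```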


Model Results

• Metrics generally good for Bayesian models using ECFP6 fingerprints

• Note that not every dataset contains a discernible SAR

[Figure: two plots, AUC (easy) and AUC (hard), each against population]
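For reference, the AUC metric plotted here (the ROC integral) can be computed directly from model scores and known labels. The sketch below uses the pairwise-comparison definition: the fraction of active/inactive pairs in which the active molecule scores higher, with ties counted as half. The function name `roc_auc` is hypothetical, and the quadratic loop is meant for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: model outputs, higher = more likely active.
    labels: truthy for actives, falsy for inactives.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Perfect ranking, then a ranking with one misordered pair.
auc_perfect = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
auc_partial = roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

A value of 0.5 corresponds to random ordering, which is what a dataset without any SAR would be expected to give.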


Deliverable

• Datasets with acceptable models: 1826

- list of unique molecules

- activity (standard molar units)

- threshold (active/inactive)

- target & assay provenance

- Bayesian model (ECFP6)

• Targets are diverse, data is high quality: thanks to the ChEMBL project

• Can apply all models to any molecule...

• Start with a set of discontinued drugs...
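Applying every model to one molecule amounts to a loop over the model collection. The sketch below assumes each model is stored as a mapping from fingerprint bit to a log contribution and that a molecule is a set of bit indices; both representations, the target names and the function name are hypothetical. Raw scores from different models are not on a common scale, so the profile would need calibrating before targets are compared with one another.

```python
def profile_molecule(models, bits):
    """Score one molecule against every model.

    models: dict mapping a target name to a dict of bit -> contribution.
    bits: set of fingerprint bit indices for the molecule.
    Returns a dict mapping target name -> raw (uncalibrated) score.
    """
    return {
        name: sum(contribs.get(b, 0.0) for b in bits)
        for name, contribs in models.items()
    }


# Two made-up models, purely for illustration.
demo_models = {
    "targetA": {1: 0.4, 2: 0.1},
    "targetB": {1: -0.7},
}
demo_profile = profile_molecule(demo_models, {1, 2})
```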


Discontinued Drugs

• ~50 drugs that passed most tests, but never made it to market

• Maybe they cure something else?


Detail & Visualisation

Atom-centric Bayesian

Honeycomb clustering


PolyPharma app

• Proof of concept tools being explored for several drug discovery collaborations

• Interactive functionality demonstrated as a mobile app for iPhone & iPad

• Free to use


http://itunes.apple.com/app/polypharma/id1025327772


Acknowledgments

http://molmatinf.com http://molsync.com http://cheminf20.org

@aclarkxyz

• Collaborative Drug Discovery

• Sean Ekins

• Society for Laboratory Automation & Screening

• Inquiries to info@molmatinf.com
