25
Cheminformatics in Drug Discovery, an Industrial Perspective Hongming Chen, Thierry Kogej, Ola Engkvist Hit Discovery, Discovery Sciences, Astrazeneca R&D Gothenburg

Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

Cheminformatics in Drug Discovery, an Industrial Perspective

Hongming Chen, Thierry Kogej, Ola EngkvistHit Discovery, Discovery Sciences, Astrazeneca R&D Gothenburg

Page 2: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

Drug Discovery Today

An industry perspective of Today’s Challenge

2 IMED Biotech Unit I Discovery Sciences

Source: PhRMA profile 2016

Source: Paul et al., Nat. Rev. Drug Disc. (2010) 9:March

We need to become better,

faster & cheaper

Page 3: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

Cheminformatics @ AstraZeneca

• HTS work-up

• Library design

• Virtual screening

• Machine learning & AI

3

Page 4: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

High Throughput Screening

From Millions to just a few Low cost/compound

High cost/compound

Slide modified from Mark Wigglesworth, AZ, with permission

Page 5: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

HTS Analysis: Clustering analysis

5

• Heavily dependent on computational chemistry

resources

• Linux, scripts, static workflows, data in flat files

• Cutting, pasting and reformatting between

applications

• Difficult to revisit or take over an analysis from a

colleague

• Time-consuming

Early days

iHAT: An Spotfire add-in for HTS Analysis

• Leverage the powerful visualization function of

Spotfire

• Annotation of compounds with in-house

experimental and predicted data

• Data integration from multiple sources

• Clustering of compounds

• Visualization and manipulation of cluster tree

• NN search

Page 6: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

iHAT: Clustering and Reclustering

6

Page 7: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

7

Library design @ AstraZeneca

• Diversity library is generally out of fashion

• Focused library fit for specific project need

• DNA encoded libraries become popular, but analysis is challenging, >60M to 8B library sizes

Currently, use classical library design method to reduce to 50M

preferred AZ library size

Only AZ reagentsVendors reagents

Fsp3

Page 8: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

8

Definition of VS

• Virtual screening refers to any in-silico techniques used to search large

compound databases (available chemicals or virtual libraries) to select a

smaller number for biological testing

• Virtual screening can be used to

− Select compounds for screening from in-house databases

− Choose compounds to purchase from external suppliers

− Select compounds from virtual libraries to be synthesized

• The technique applied depends on the amount of information available

about the particular disease target and the desired outcome

Page 9: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

VS methods

Lots of actives and

inactives knownActives known

QSAR

models

Aligned

conformation

Co-crystallized

ligand conf.

3D Structure of Target

Unknown Known

Ligand-based methods structure-based methods

2D ligand

•Substructure search;

•Fingerprint-based

similarity search

3D ligand

•Shape similarity search

•Pharmacophore mapping

•core replacement

Protein structure

Binding pocket

•Docking and scoring

•Pharmacophore mapping

•shape similarity

•Expand SAR

•Improve affinity

•Scaffold hopping

•filter, improve hit rate

•filter, improve hit rate

Page 10: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

VS example

10

• Identification of sPLA2X inhibitors using ligand and

structure based virtual screening

IC50 (NMR) 20uM

Page 11: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

11 IMED Biotech Unit I Discovery Sciences

Virtual screening platform @ AZ

Virt. NN 106-8

VS chemical

space 1015

Sub-sets 108-11

Primary VS 108-11

Virt. hits 106

Secondary VS

106

20-30 virt. lib.s

1-3 libs./50-100

cpd. ordered

Ligand

Structure,

Ligand,

Properties,

Novelty

Reactions,

reagents

Theoretical chemical space

(1060)

Known existing

compounds

108 AZ

106

AZ-Virtual space 1015

Diverse: > 4000 libraries

Synthesizable: historic

AZ-reactions

Searchable in subsets of

108-11

MJ Vainio J. Chem. Inf. Model., 2012, 52 (7), pp 1777–1786.

Iterative virtual screening workflow

Chemical space

Page 12: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

12 IMED Biotech Unit I Discovery Sciences

105

Computational strategy

Few libraries are enriched

specifically to the query

project3D similarity search

Structure based workup

Representative subset

(as large as possible)

108

Selection of libraries

(3D similarity and

structure based workup)

Page 13: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

AI & Machine Learning Today

Context, Definition & Advances

13 IMED Biotech Unit I Discovery Sciences Source: http://hduongtrong.github.io

Source: http://www.geeksforgeeks.org/artificial-

intelligence-an-introduction/

Page 14: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

The rise of deep learning in drug discovery

14

• Deep learning technologies have

been adopted in drug discovery

• Various forms of NN have been

applied so far

Page 15: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

De novo molecular generation with deep learning has developed very rapidly

15

Page 16: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

Deep learning @ AstraZeneca: Vision

16

• Creating an integrate AI platform to impact drug discovery projects

Augmented design

Autonomousdesign

Automatic design

AI design platform

AI reaction platform

Automatic make and test (iLab)

Integration

Segler M.H.S. et al. Neural‐Symbolic Machine Learning for Retrosynthesis and Reaction Prediction, Chemistry, 2017, 23(25), 5966-5971

Segler M.H.S. et al. Planning chemical syntheses with deep neural networks and symbolic AI, Nature, 2018, 555, 604-610

Formalize chemical intuition

Data foundation

“Learn from everything”

Page 17: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

17

Deep learning @ AZ: De Novo Molecular Augmented Design Platform (REINVENT)

Generation of novel

chemical space

AZ+

Pub

Reinforcement learning to generate

project relevant compounds

Iterations

Iterations of design and compound synthesis

Desirability function

Σ IC50, LogP,

Novelty etc.

Prior model

Agent model

Page 18: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

Deep learning at AstraZeneca: Reaction informatics

• First steps, building:

– World-class Reaction Knowledge Base

– On our work (past collaboration with M. Segler)

Flat filesFlat filesFlat filesReaxys

Flat filesFlat filesFlat filesPatent

data

Predictive chemical

reaction models

DMTA

Make

Test

Analyse

Design

iLab

AZ Reaction Connect

~20mill reactions

MedChem ELN

PharmSci ELN

AZ ChemistryConnect

Page 19: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

Becoming FASTER with AI

Through unsupervised learning for hit identification

19

Deep CNN

autoencoder

Manifold

Learning

(t-SNE)

Deep CNN

classifier

🙂 🙂

Deep CNN

autoencoder

Manifold

Learning

(t-SNE)

-40 -20 0 20 40 60

-40

-20

02

04

0

tsne-x

tsn

e-y

Deep CNN

classifier

Page 20: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

Becoming CHEAPER with ML/AI

By only conducting experiments when needed

20

Extract features

(signature descriptor)

Build/Apply ML model

(Mondrian TCP on SVM)

p-value (class 🙂 )

p-v

alu

e (

cla

ss 🙂

)

Do

n’t

syn

thesiz

e

Don’t test

Page 21: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

21

Becoming FASTER and CHEAPER with AI

AI augmented de novo molecule design

RNN + Reinforcement

learning

Page 22: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

AZ’s first DMTA automation platform

22

• First protype built during 2017

• All DMTA steps fully integrated

• Suited for 100s of uninterrupted DMTA cycles. ML/AI module is integrated.

• Cycle times of ca. 2h

• Successfully applied in ongoing research project

Test

Screening data analysis &

Compound design

Make

Purification & Analysis

Reformatting

Page 23: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

Conclusions

• Cheminformatics is widely applied in Pharmaceutical industry

• Cheminformatics includes various aspects across different disciplines

• Adoption of machine learning and AI technologies will help

Cheminformatics to better fit current and future research needs

23

Page 24: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

Acknowledgement

• Marcus Olivercrona

• Thomas Blaschke

• Isabella Feierberg

• Christophe Grebner

• Erik Malmerberg

• Christian Tyrchan

• Garry Pairaudeau

• Clive Green

• Lars Carlsson

• Peter Varkonyi

• Michael Kossenjans24

• EU Fundings

Page 25: Cheminformatics in Drug Discovery, an Industrial Perspectiveinfochim.u-strasbg.fr/CS3_2018/Presentations/Chen... · Drug Discovery Today An industry perspective of Today’s Challenge

Confidentiality Notice

This file is private and may contain confidential and proprietary information. If you have received this file in error, please notify us and remove

it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the

contents of this file is not permitted and may be unlawful. AstraZeneca PLC, 1 Francis Crick Avenue, Cambridge Biomedical Campus,

Cambridge, CB2 0AA, UK, T: +44(0)203 749 5000, www.astrazeneca.com

25