Chemoinformatics approaches to virtual screening and in silico design Alexandre Varnek Laboratoire d’Infochimie, Université de Strasbourg http://infochim.u-strasbg.fr/

Page 1

Chemoinformatics approaches to virtual screening and in silico design

Alexandre Varnek
Laboratoire d’Infochimie, Université de Strasbourg

http://infochim.u-strasbg.fr/

Page 2

Strasbourg, Paris

Page 3

Laboratory of Chemoinformatics

Master's programme in Chemoinformatics (since 2002)

Page 4

Chemoinformatics:

a new discipline combining several “old” fields

Chemical databases, QSAR, virtual screening, in silico design, …

Page 5

OUTLINE

• Needs for chemoinformatics

• Fundamentals of chemoinformatics

• Some applications

Page 6

Chemoinformatics: why?

Page 7

Chemical Databases

• Amount of information:
  - many millions of compounds and reactions
  - many millions of publications

Storage, organization and search of experimental data

Page 8

[Figure: database record counts in May 2009 and September 2010. Values shown: 54,984,228; 62,105,511; 39,804,330; 281,474; 43,995,234; 831,886. Growth annotations: +7 M, +2 M, +22 M.]

Page 9

Problem: Flood of Information

• > 54 million compounds

• > 5 million new compounds / year

• 800,000 publications / year

[Figure: number of structures (0 to 30,000,000) plotted against year (1965 to 2000)]

=> Can anyone read 4,000 publications / day?

Chemical information should be well organized and searchable

Page 10

Problem: Not Enough Information

• > 54,000,000 chemical compounds

• > 500,000 3D structures in the Cambridge Crystallographic File: about 1 % of all compounds

• 230,000 infrared spectra in the largest database (Bio-Rad): 0.4 % of all compounds

What about physico-chemical and biological properties?

The goal of chemoinformatics is to develop predictive approaches and tools
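The coverage percentages on this slide follow directly from the quoted counts. A minimal sketch of that arithmetic, using the slide's rounded figures (the function name `coverage_percent` is only illustrative):

```python
# Counts quoted on the slide (lower bounds, so the percentages are approximate).
compounds = 54_000_000
crystal_structures = 500_000   # 3D structures, Cambridge Crystallographic File
infrared_spectra = 230_000     # largest infrared database (Bio-Rad)


def coverage_percent(measured: int, total: int) -> float:
    """Share of all known compounds for which a measurement exists, in percent."""
    return 100.0 * measured / total


structure_share = coverage_percent(crystal_structures, compounds)
spectra_share = coverage_percent(infrared_spectra, compounds)

print(f"3D structures: {structure_share:.2f} % of all compounds")
print(f"IR spectra:    {spectra_share:.2f} % of all compounds")
```

For the overwhelming majority of compounds no experimental structure or spectrum is available, which is the motivation for predictive tools.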

Page 11

Chemoinformatics as a modeling discipline

Page 12

Chemoinformatics as a modeling discipline

• What structure do I need for a certain property? → structure-activity relationships

• How do I make this structure? → synthesis design

• What is the product of my reaction? → reaction prediction, structure elucidation

Page 13

Theoretical chemistry

Quantum Chemistry

Force Field Molecular Modelling

Chemoinformatics

- Molecular model
- Basic concepts
- Major applications
- Learning approaches

Page 14

Molecular Model

• Quantum Chemistry: electrons and nuclei

• Force Field Molecular Modelling: atoms and bonds

• Chemoinformatics: molecular graph, descriptor vector

Page 15

Basic mathematical approaches

• Quantum Chemistry: Schrödinger equation, HF, DFT, …

• Force Field Molecular Modelling: classical mechanics, statistical mechanics

• Chemoinformatics: graph theory, statistical learning theory

Page 16

Basic concepts

• Quantum Chemistry: wave/particle dualism

• Force Field Molecular Modelling: classical mechanics

• Chemoinformatics: chemical space

Page 17

Chemical space = objects + metrics

• Objects:
  - molecular graphs
  - descriptor vectors {Di} = f(structure)

• Metrics:
  - graph hierarchy
  - similarity measures

[Figure: example molecular structure (a nitrogen heterocycle with an NH2 group), shown as a molecular graph and as the argument of f]
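The two kinds of objects named on this slide can be illustrated with a toy example. The sketch below stores a hydrogen-suppressed molecular graph as plain Python containers and maps it to a short descriptor vector {Di}. The choice of ethanol and of these four descriptors is an assumption made for illustration and is not taken from the slides.

```python
from collections import Counter

# Hydrogen-suppressed molecular graph of ethanol (CH3-CH2-OH).
atoms = {0: "C", 1: "C", 2: "O"}   # node index -> element symbol
bonds = [(0, 1), (1, 2)]           # edges between node indices


def descriptor_vector(atoms, bonds):
    """Map a molecular graph to a small, illustrative descriptor vector."""
    elements = Counter(atoms.values())
    degree = Counter()
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    return [
        len(atoms),                     # D1: number of heavy atoms
        len(bonds),                     # D2: number of bonds
        elements["N"] + elements["O"],  # D3: number of N and O atoms
        max(degree.values()),           # D4: highest node degree
    ]


ethanol_vector = descriptor_vector(atoms, bonds)
print(ethanol_vector)
```

Once every molecule is a vector of the same length, a metric (for example a similarity measure) can be defined between any two of them.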

Page 18

Navigation in Chemical Space: topological space of chemical structures

Relationships between the objects:

• Hierarchical scaffold-tree approach
• Structural mutation rules
• Network-like Similarity Graphs
• Combinatorial Analog Graphs
• …

• Rational organisation of structural data
• Exploration of the chemical space
• Identification of new objects (e.g., active scaffolds, R-group combinations, etc.)

Page 19

Navigation in Chemical Space: vectorial space defined by molecular descriptors

Relationships between the objects: in this space, each molecule is represented as a vector, whereas the metric is defined by similarity measures.

In properly selected spaces, neighboring molecules possess similar properties. Different databases can be compared. Compound subsets for screening can be rationally selected.
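A similarity measure turns the descriptor space into something that can be searched. The sketch below uses the Tanimoto coefficient, one widely used choice that the slide itself does not name, on hypothetical fragment sets. The molecule names and fragments are invented for illustration.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    if not a and not b:
        return 1.0  # two empty feature sets are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical fragment sets standing in for real fingerprints.
query = {"C-C", "C-O", "C=O", "O-H"}
library = {
    "mol_1": {"C-C", "C-O", "O-H"},
    "mol_2": {"C-C", "C-N", "N-H"},
    "mol_3": {"C-C", "C-O", "C=O", "O-H", "C-N"},
}

# Rank library molecules from most to least similar to the query.
ranked = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
for name in ranked:
    print(name, round(tanimoto(query, library[name]), 2))
```

The nearest neighbours of a known active compound are the natural first candidates when selecting a subset for screening.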

Page 20

Example: Hansch Analysis

Biological Activity = f(Physicochemical parameters) + constant

• Physicochemical parameters can be broadly classified into three general types:
  • Electronic (σ)
  • Steric (Es)
  • Hydrophobic (log P)

log(1/C) = a (log P)² + b log P + ρσ + δEs + constant
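A Hansch-type equation combines a parabolic hydrophobic term in log P with an electronic term (coefficient times σ) and a steric term (coefficient times Es). The sketch below only evaluates such an equation and locates the log P value at which the parabola peaks. All coefficient and substituent values are hypothetical and chosen for easy arithmetic, and `rho` and `delta` are simply the names used here for the electronic and steric coefficients.

```python
def hansch_log_inv_c(log_p, sigma, e_s, a, b, rho, delta, c):
    """Evaluate log(1/C) = a*(log P)^2 + b*log P + rho*sigma + delta*Es + c."""
    return a * log_p ** 2 + b * log_p + rho * sigma + delta * e_s + c


def optimal_log_p(a, b):
    """log P at which the parabolic hydrophobic term is maximal (needs a < 0)."""
    if a >= 0:
        raise ValueError("a must be negative for the parabola to have a maximum")
    return -b / (2 * a)


# Hypothetical regression coefficients, not fitted to any real data set.
coeffs = dict(a=-0.5, b=2.0, rho=1.0, delta=0.5, c=3.0)

activity = hansch_log_inv_c(log_p=2.0, sigma=0.2, e_s=-1.0, **coeffs)
best_log_p = optimal_log_p(coeffs["a"], coeffs["b"])
print(round(activity, 2), best_log_p)
```

In a real Hansch analysis the coefficients are obtained by regression on measured activities. Here they are fixed by hand so that only the functional form is shown.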

Page 21

Molecular Descriptors

• Constitutional (mol. weight, the number of S, N or O atoms, …)

• Topological (Randic index, informational content, …)

• Geometrical (molecular size, distances between functional groups, …)

• Electrostatic (electrostatic potential, charges, …)

• Charged Partial Surface Area

• Quantum-chemical (energies of molecular orbitals, reactivity indices, …)

• Thermodynamical (heat of formation, logP, …)

• Fragments (sequences of atoms and bonds, augmented atoms, …)

More than 4000 types of descriptors are known
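Constitutional descriptors are the simplest class in this list because they need only the molecular formula. The following sketch computes the molecular weight and the S, N and O atom counts for cysteine. The atomic masses are standard rounded values, and the function name and the example molecule are assumptions made for illustration.

```python
# Standard atomic masses (g/mol), rounded; only the elements needed here.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def constitutional_descriptors(formula: dict) -> dict:
    """Compute a few constitutional descriptors from an element-count mapping."""
    mol_weight = sum(ATOMIC_MASS[element] * count for element, count in formula.items())
    return {
        "mol_weight": round(mol_weight, 3),
        "n_S": formula.get("S", 0),
        "n_N": formula.get("N", 0),
        "n_O": formula.get("O", 0),
    }


cysteine = {"C": 3, "H": 7, "N": 1, "O": 2, "S": 1}   # C3H7NO2S
cysteine_descriptors = constitutional_descriptors(cysteine)
print(cysteine_descriptors)
```

The other descriptor classes require progressively more information: the molecular graph for topological indices, 3D coordinates for geometrical ones, and a quantum-chemical calculation for orbital energies.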

Page 22

Learning approach

• Quantum Chemistry: deductive >> inductive

• Force Field Molecular Modelling: deductive ≈ inductive

• Chemoinformatics: deductive << inductive

Page 23

Learning approach

• In chemoinformatics the logic of learning is not based on existing physical theories. Chemoinformatics considers the world too complex to be a priori described by any set of rules. Thus, the rules (models) in chemoinformatics are not explicitly taken from rigorous physical models, but learned inductively from the data.

Page 24

Chemoinformatics: From Data to Knowledge

[Diagram: levels data, information and knowledge; step labels: measurement or calculation, context, generalization; arrows: inductive learning, deductive learning]

Page 25

Models

• In chemoinformatics, a model is an ensemble of rules or a mathematical equation linking a given property (activity) with the molecular structure.

PROPERTY = f(structure)

• Two main types of models:
  - binary classification (SAR)
  - regression (QSAR)
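The relation PROPERTY = f(structure) can be made concrete with a single descriptor. The sketch below fits a one-variable least-squares line as a stand-in for a regression (QSAR) model, then thresholds its prediction to obtain a binary (SAR) label. The training values and the threshold are hypothetical. A real SAR model would normally be trained on class labels directly, so the thresholding step is a simplification.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept (one descriptor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training set: one descriptor value and one activity per molecule.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [2.1, 3.9, 6.1, 7.9]

slope, intercept = fit_line(descriptor, activity)


def qsar(d: float) -> float:
    """Regression model: predicted activity for descriptor value d."""
    return slope * d + intercept


def sar(d: float, threshold: float = 5.0) -> bool:
    """Binary model: True ('active') when the predicted activity reaches the threshold."""
    return qsar(d) >= threshold


print(round(qsar(5.0), 2), sar(2.0), sar(3.0))
```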

Page 26

Organic chemistry: an exercise in « intuitive » chemoinformatics

Page 27

Extraction of rules from the data

The Markovnikov Rule: When a Brønsted acid, HX, adds to an unsymmetrically substituted double bond, the acidic hydrogen of the acid bonds to that carbon of the double bond that has the greater number of hydrogen atoms already attached to it.
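A rule of this kind is already a small model, because it maps a structural feature to an outcome. As an illustration only, the rule as stated above can be written as a function of the hydrogen counts on the two double-bond carbons. The function name and the C1/C2 labels are assumptions, and the sketch ignores every case the rule does not cover.

```python
def markovnikov_addition(h_on_c1: int, h_on_c2: int) -> dict:
    """Apply the Markovnikov rule to HX addition across a C1=C2 double bond.

    Returns which carbon receives H and which receives X, or None for both
    when the two carbons carry the same number of hydrogens (rule undecided).
    """
    if h_on_c1 == h_on_c2:
        return {"H": None, "X": None}
    if h_on_c1 > h_on_c2:
        return {"H": "C1", "X": "C2"}
    return {"H": "C2", "X": "C1"}


# Propene, CH2=CH-CH3: C1 carries two hydrogens, C2 carries one.
propene_result = markovnikov_addition(h_on_c1=2, h_on_c2=1)
print(propene_result)
```

For propene the hydrogen goes to the terminal carbon and X to the central one, which corresponds to the 2-substituted propane.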

Page 28

Major applications

• Chemical Databases
  Algorithms for organising and searching the data: fingerprints, graph theory, similarity measures

• Structure-Activity Models
  Machine-learning approaches: MLR, Decision Trees, Artificial Neural Networks, Support Vector Machines, …

• Virtual screening

• In silico design

Page 29

Chemoinformatics: some applications

Page 30

Dmitri Mendeleev (1834 – 1907)

Discoverer of the Periodic Table, an early “Chemoinformatician”

• Russian chemist who arranged the 63 known elements into a periodic table based on atomic mass, which he published in Principles of Chemistry in 1869. Mendeleev left space for new elements and predicted three yet-to-be-discovered elements: Ga (1875), Sc (1879) and Ge (1886).

Page 31

Periodic Table

Chemical properties of elements gradually vary along the two axes

Page 32

Target Protein

Large libraries of molecules

High-Throughput Screening

Hit

experiment

computations

Virtual Screening

Small library of selected hits

Page 33

Chemical universe:

• > 50 M compounds are currently available
• 10^60 drug-like molecules could be synthesised

Human proteome:

• 84,000 peptides

Virtual screening is indispensable for analysing the huge number of protein-ligand combinations

Virtual screening must be very fast and efficient!
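The scale argument on this slide can be checked with a few lines of arithmetic. The sketch below multiplies the two figures quoted on the slide and converts the result into computing time. The cost of one second per protein-ligand pair is a purely hypothetical assumption introduced to give the number some meaning.

```python
available_compounds = 50_000_000   # "> 50 M compounds" quoted on the slide
proteome_targets = 84_000          # proteome figure quoted on the slide

# Every compound paired with every target.
pairs = available_compounds * proteome_targets

# Hypothetical cost: one second of computing time per protein-ligand pair.
seconds_per_pair = 1.0
seconds_per_year = 3600 * 24 * 365
cpu_years = pairs * seconds_per_pair / seconds_per_year

print(f"{pairs:.1e} protein-ligand pairs")
print(f"about {cpu_years:,.0f} CPU-years at {seconds_per_pair} s per pair")
```

Even at one second per pair, exhaustive evaluation of the compounds that already exist is far beyond a single machine, and the 10^60 hypothetical drug-like molecules are out of reach for any brute-force approach.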

Page 34

Virtual screening “funnel”

CHEMICAL DATABASE: ~10^6 – 10^9 molecules

VIRTUAL SCREENING:

• Similarity search
• Filters
• (Q)SAR
• Docking
• Pharmacophore models

Result: HITS (~10^1 – 10^3 molecules) and INACTIVES
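The funnel can be read as a chain of increasingly selective stages, each passing a smaller set of molecules to the next. The sketch below shows that control flow on a five-molecule toy database. The molecule records, the stage order, and all thresholds are hypothetical, and each stage is reduced to a lookup of a precomputed value instead of a real similarity search or (Q)SAR model.

```python
def run_funnel(molecules, stages):
    """Apply screening stages in order; each stage keeps a subset of molecules."""
    survivors = list(molecules)
    history = [("database", len(survivors))]
    for name, keep in stages:
        survivors = [m for m in survivors if keep(m)]
        history.append((name, len(survivors)))
    return survivors, history


# Hypothetical records with precomputed values standing in for real calculations.
database = [
    {"id": "m1", "mol_weight": 320.0, "similarity": 0.82, "predicted_pIC50": 7.1},
    {"id": "m2", "mol_weight": 610.0, "similarity": 0.90, "predicted_pIC50": 8.0},
    {"id": "m3", "mol_weight": 280.0, "similarity": 0.35, "predicted_pIC50": 6.5},
    {"id": "m4", "mol_weight": 450.0, "similarity": 0.75, "predicted_pIC50": 5.2},
    {"id": "m5", "mol_weight": 390.0, "similarity": 0.71, "predicted_pIC50": 6.8},
]

# Cheap stages first, more expensive ones later (an assumed ordering).
stages = [
    ("filter: mol_weight <= 500", lambda m: m["mol_weight"] <= 500.0),
    ("similarity >= 0.7", lambda m: m["similarity"] >= 0.7),
    ("(Q)SAR: predicted pIC50 >= 6", lambda m: m["predicted_pIC50"] >= 6.0),
]

hits, history = run_funnel(database, stages)
for stage_name, count in history:
    print(f"{stage_name}: {count} molecules")
```

Running cheap filters before expensive methods such as docking is what lets the funnel start from millions of molecules.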

Page 35

REACH regulation

• The European Union adopted the Regulation on the Registration, Evaluation, Authorisation and Restriction of Chemicals (the “REACH Regulation”), which entered into force on June 1, 2007.

• REACH requires information on physico-chemical, toxicological and eco-toxicological parameters for chemicals produced in quantities exceeding 1 tonne per year.

• More than 30,000 compounds must be tested. The total cost estimated by the EU Commission over an 11 to 15 year period is €2.8 to €5.2 bn.

No Data, No Market!

Page 36

Chemoinformatics tools in SciFinder:

predictions of > 20 physico-chemical properties and NMR spectra for each individual compound

Page 37

Drug design

Page 38

Virtual screening: success stories & drugs

• “Virtual screening - what does it give us?” Herbert Koppen (Boehringer, Germany), Current Opinion Drug Discovery & Dev. (2009) 12: 397-407

• “From virtuality to reality” Ulrich Rester (Bayer, Germany), Current Opinion Drug Discovery & Dev. (2008) 11: 559-568

• “What has virtual screening ever done for drug discovery?” David E Clark (Argenta Discovery Ltd, UK), Expert Opinion on Drug Discovery (2008) 8: 841-851

Page 39

In silico screening: success stories & drugs

Market: tirofiban (1999), trade name Aggrastat, from Merck; GP IIb/IIIa antagonist (an anticoagulant used in myocardial infarction)

(2S)-2-(butylsulfonylamino)-3-[4-[4-(4-piperidyl)butoxy]phenyl]propanoic acid (Mol. mass: 440.6 g/mol)

PK data: Bioavailability: IV only (intravenous only); Half-life: 2 hours

Combined with heparin and aspirin, but numerous precautions

http://www.bioscience.ws/encyclopedia/

Page 40

Materials design

Page 41

Ionic Liquids

Ionic liquids are composed of large organic cations:

[Figure: structures of nitrogen-containing organic cations bearing substituents R1 to R4]

and anions: PF6-, Cl-, BF4-, CF3SO3-, [(CF3SO2)2N]-

Page 42

Ionic Liquids

Large organic cations:

[Figure: structures of nitrogen-containing organic cations bearing substituents R1 to R4]

Anions: PF6-, Cl-, BF4-, CF3SO3-, [(CF3SO2)2N]-

There exist 10^18 combinations of ions that could lead to useful ionic liquids

Page 43

Viscosity predictions on 23 new ILs

Solvionics company

None of these ionic liquids was used for model preparation

Page 44

Ionic Liquids viscosity: Experimental validation of the Neural Networks models

• prediction error (~70 cP) is similar to the “noise” in the experimental data used for the training of the model

[Figure: predicted (pred) vs. experimental (exp) viscosity; RMSE = 73 cP]

G. Marcou, I. Billard, A. Ouadi and A. Varnek, submitted
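The figure of merit quoted here is the root-mean-square error between experimental and predicted values. A minimal sketch of how it is computed follows. The three viscosity pairs are hypothetical and are not the 23 ionic liquids of the study.

```python
import math


def rmse(experimental, predicted):
    """Root-mean-square error between two equally long sequences of values."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("both sequences must be non-empty and of equal length")
    squared_errors = [(e - p) ** 2 for e, p in zip(experimental, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical viscosities in cP, for illustration only.
exp_cp = [100.0, 250.0, 400.0]
pred_cp = [130.0, 210.0, 450.0]

error_cp = rmse(exp_cp, pred_cp)
print(f"RMSE = {error_cp:.1f} cP")
```

Because the RMSE has the same unit as the property, it can be compared directly with the experimental uncertainty, as the slide does.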

Page 45

Metabolites prediction

Page 46

Prediction of aromatic hydroxylation sites for human CYP1A2 substrates

[Figure: aromatic hydroxylation of a substrate by CYP1A2; the potential hydroxylation sites are marked with question marks]

Method: SVM + descriptors derived from condensed graphs of reaction

The obtained model correctly predicts the hydroxylation products with a probability of ≈80% (see poster of C. Muller)

Page 47

Reaction conditions

Page 48

Search for optimal reaction conditions

[Figure: reaction query (substrate + H2) and the potential products A, B and C of the reaction. Compound A is the target.]

Page 49

Page 50

Experimental validation

Reaction: Sub + H2 → A

Conditions suggested by the program and their experimental validation:

Run | Catalyst | Solvent | Additive | Yield (exp.)
1 | Pt/C (10%) | THF | None | A: 98 %
2 | Pt/C (10%) | DMF | None | A: 90 %, Sub: 2 %
3 | Ir/CaCO3 (5%) | EtOH | NEt3 (5 %) | A: 100 %
4 | Ir/CaCO3 (5%) | Hexane | None | INSOLUBLE
5 | Ir/CaCO3 (5%) | DMF | None | A: 27 %, Sub: 69 %

A. Varnek, in “Chemoinformatics and Computational Chemical Biology", J. Bajorath, Ed., Springer, 2010
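The validation table can be held as a list of records and queried directly. The sketch below transcribes the five runs reported on this slide and picks the conditions with the highest experimental yield of A. The field names are assumptions, and the insoluble run is stored with no yield.

```python
# The five suggested conditions and experimental yields of A from the slide.
conditions = [
    {"run": 1, "catalyst": "Pt/C (10%)", "solvent": "THF", "additive": None, "yield_A": 98},
    {"run": 2, "catalyst": "Pt/C (10%)", "solvent": "DMF", "additive": None, "yield_A": 90},
    {"run": 3, "catalyst": "Ir/CaCO3 (5%)", "solvent": "EtOH", "additive": "NEt3 (5%)", "yield_A": 100},
    {"run": 4, "catalyst": "Ir/CaCO3 (5%)", "solvent": "Hexane", "additive": None, "yield_A": None},  # insoluble
    {"run": 5, "catalyst": "Ir/CaCO3 (5%)", "solvent": "DMF", "additive": None, "yield_A": 27},
]

# Ignore runs without a measured yield, then take the best one.
usable = [c for c in conditions if c["yield_A"] is not None]
best = max(usable, key=lambda c: c["yield_A"])

print(f"best: run {best['run']}, {best['catalyst']} in {best['solvent']}, "
      f"yield of A = {best['yield_A']} %")
```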

Page 51

Joseph Louis Gay-Lussac, Mémoires de la Société d’Arcueil 2:207 (1808)

« We are perhaps not far removed from the time when we shall be able to submit the bulk of chemical phenomena to calculation »

Page 52

Page 53

Visit our website: http://infochim.u-strasbg.fr