Chemoinformatics approaches to virtual screening and in silico design
Alexandre Varnek, Laboratoire d’Infochimie, Université de Strasbourg
http://infochim.u-strasbg.fr/
Laboratory of Chemoinformatics
Master in Chemoinformatics (since 2002)
Chemoinformatics:
a new discipline combining several “old” fields:
Chemical databases, QSAR, Virtual screening, In silico design, …
OUTLOOK
• Needs for chemoinformatics
• Fundamentals of chemoinformatics
• Some applications
Chemoinformatics: why
• Amount of information: many millions of compounds and reactions, many millions of publications
Chemical Databases
Storage, organization and search of experimental data
[Figure: sizes of chemical databases in May 2009 and September 2010. Values shown: 54,984,228; 62,105,511; 39,804,330; 281,474; 43,995,234; 831,886. Growth labels: +7 M, +2 M, +22 M]
Problem: Flood of Information
• > 54 million compounds
• > 5 million new compounds / year
• 800,000 publications / year
[Figure: number of structures by year, 1965 to 2000; vertical axis from 0 to 30,000,000]
=> Can anyone read 4,000 publications per day?
Chemical information should be well organized and searchable
Problem: Not Enough Information
• > 54,000,000 chemical compounds
• > 500,000 3D structures in the Cambridge Crystallographic File: about 1 % of all compounds
• 230,000 infrared spectra in the largest database (Bio-Rad): 0.4 % of all compounds
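The coverage percentages follow directly from the counts quoted on this slide. A minimal sketch of the arithmetic, treating the slide's lower bounds ("> 54,000,000", "> 500,000") as exact values for illustration; the function and variable names are invented here:

```python
# Counts quoted on the slide (lower bounds, used here as exact values)
total_compounds = 54_000_000
crystal_structures = 500_000   # Cambridge Crystallographic File
infrared_spectra = 230_000     # largest IR database (Bio-Rad)


def coverage_percent(n_records: int, n_compounds: int) -> float:
    """Share of known compounds covered by a data collection, in percent."""
    return 100.0 * n_records / n_compounds


xray_cov = coverage_percent(crystal_structures, total_compounds)
ir_cov = coverage_percent(infrared_spectra, total_compounds)

print(f"3D structures: {xray_cov:.2f} % of all compounds")
print(f"IR spectra:    {ir_cov:.2f} % of all compounds")
```

Both values are below 1 %, which is the point of the slide: measured data exist for only a small fraction of known compounds.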
The goal of chemoinformatics is to develop predictive approaches and tools
What about physico-chemical and biological properties?
Chemoinformatics as a modeling discipline
• What structure do I need for a certain property? → structure-activity relationships
• How do I make this structure? → synthesis design
• What is the product of my reaction? → reaction prediction, structure elucidation
Theoretical chemistry
Quantum Chemistry
Force Field Molecular Modelling
Chemoinformatics
- Molecular model
- Basic concepts
- Major applications
- Learning approaches
Molecular Model
Quantum Chemistry: electrons and nuclei
Force Field Molecular Modelling: atoms and bonds
Chemoinformatics: molecular graph, descriptor vector
Basic mathematical approaches
Quantum Chemistry: Schrödinger equation, HF, DFT, …
Force Field Molecular Modelling: classical mechanics, statistical mechanics
Chemoinformatics: graph theory, statistical learning theory
Basic concepts
Quantum Chemistry: wave/particle dualism
Force Field Molecular Modelling: classical mechanics
Chemoinformatics: chemical space
Chemical space = objects + metrics
• Objects:
  - molecular graphs;
  - descriptor vectors {Di} = f(structure)
• Metrics:
  - graph hierarchy,
  - similarity measures
[Figure: example molecular structure]
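One common similarity measure on fragment-type descriptors is the Tanimoto coefficient. The sketch below is an illustration only: the slide does not say which measure is meant, and the fragment labels and the three toy molecules are invented for the example.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient of two feature sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


# Hypothetical fragment sets for three molecules (illustrative labels only)
mol_1 = {"C-N", "C=N", "N-H", "C-C", "c:n"}
mol_2 = {"C-N", "C=N", "N-H", "C-O"}
mol_3 = {"C-Cl", "C-C"}

sim_12 = tanimoto(mol_1, mol_2)  # 3 shared fragments out of 6 distinct ones
sim_13 = tanimoto(mol_1, mol_3)  # 1 shared fragment out of 6 distinct ones

print(f"sim(1,2) = {sim_12:.2f}, sim(1,3) = {sim_13:.2f}")
```

With such a measure in place, the set of molecules plus the similarity function is one concrete instance of "objects + metrics".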
Navigation in Chemical Space: topological space of chemical structures
Relationships between the objects:
• Hierarchical scaffold-tree approach
• Structural mutation rules
• Network-like Similarity Graphs
• Combinatorial Analog Graphs
• …
Results:
• Rational organisation of structural data
• Exploration of the chemical space
• Identification of new objects (e.g., active scaffolds, R-group combinations, etc.)
Navigation in Chemical Space: vectorial space defined by molecular descriptors
Relationships between the objects: in this space, each molecule is represented as a vector, whereas the metric is defined by similarity measures.
• In properly selected spaces, neighboring molecules possess similar properties.
• Different databases can be compared.
• Compound subsets for screening can be rationally selected.
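One way to select a compound subset rationally in a descriptor space is greedy MaxMin diversity picking. This is a sketch of that general technique, not a method named in the slides; the two-dimensional descriptor vectors are made up.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def maxmin_select(vectors, k, start=0):
    """Greedy MaxMin: repeatedly add the point farthest from those already picked."""
    if not vectors:
        return []
    selected = [start]
    while len(selected) < min(k, len(vectors)):
        best_idx, best_dist = None, -1.0
        for i, v in enumerate(vectors):
            if i in selected:
                continue
            # distance from candidate i to its closest already-selected point
            d = min(euclidean(v, vectors[j]) for j in selected)
            if d > best_dist:
                best_idx, best_dist = i, d
        selected.append(best_idx)
    return selected


# Hypothetical library: two tight clusters plus one isolated compound
library = [
    (0.0, 0.0),
    (0.1, 0.0),
    (5.0, 5.0),
    (5.1, 5.0),
    (0.0, 5.0),
]
subset = maxmin_select(library, k=3)
print(subset)
```

The picked indices cover both clusters and the isolated point instead of sampling near-duplicates, which is the behaviour one wants from a screening subset.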
Example: Hansch Analysis
• Physicochemical parameters can be broadly classified into three general types:
  • Electronic (σ)
  • Steric (Es)
  • Hydrophobic (log P)
Biological Activity = f(Physicochemical parameters) + constant
log(1/C) = a (log P)^2 + b log P + ρσ + δEs + const
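The coefficients of a Hansch equation are usually obtained by multiple linear regression. The sketch below fits the five coefficients (for the (log P)^2, log P, σ and Es terms plus the constant) by ordinary least squares in plain Python. The compound values and the "true" coefficients are synthetic, chosen only to show that the fit recovers them; the names `rho` and `delta` for the electronic and steric coefficients are assumptions.

```python
def solve_linear(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [yi] for row, yi in zip(A, y)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_least_squares(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear(XtX, Xty)


def hansch_row(logp, sigma, es):
    """Regression row: (log P)^2, log P, sigma, Es, constant term."""
    return [logp ** 2, logp, sigma, es, 1.0]


# Synthetic coefficients (a, b, rho, delta, const) and made-up compounds
TRUE_COEFFS = [-0.2, 1.0, 0.8, 0.3, 2.0]
compounds = [  # (log P, sigma, Es), illustrative numbers only
    (0.5, 0.00, 0.00), (1.0, 0.23, -0.97), (1.5, -0.17, -1.24),
    (2.0, 0.78, -2.52), (2.5, 0.06, -0.46), (3.0, -0.27, -0.55),
    (3.5, 0.54, -1.16), (4.0, 0.00, -1.54),
]
X = [hansch_row(*c) for c in compounds]
y = [sum(w * x for w, x in zip(TRUE_COEFFS, row)) for row in X]  # noise-free log(1/C)

fitted = fit_least_squares(X, y)
a, b = fitted[0], fitted[1]
optimal_logp = -b / (2 * a)  # maximum of the parabola in log P (when a < 0)
print([round(w, 3) for w in fitted], round(optimal_logp, 2))
```

With a negative coefficient on (log P)^2 the model has an optimum lipophilicity, here at log P = -b / (2a).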
Molecular Descriptors
• Constitutional (mol. weight, the number of S, N or O atoms, …)
• Topological (Randic index, informational content, …)
• Geometrical (molecular size, distances between functional groups, …)
• Electrostatic (electrostatic potential, charges, …)
• Charged Partial Surface Area
• Quantum-chemical (energies of molecular orbitals, reactivity indices, …)
• Thermodynamical (heat of formation, logP, …)
• Fragments (sequences of atoms and bonds, augmented atoms, …)
More than 4000 types of descriptors are known
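Constitutional descriptors are the simplest to compute, since they need only the molecular formula. A minimal sketch, assuming a flat formula string without brackets or charges; the function names and the small atomic-mass table are introduced here for illustration. The example formula C22H36N2O5S corresponds to tirofiban, whose systematic name and molar mass appear later in the deck.

```python
import re

# Standard atomic masses (g/mol), limited to the elements needed here
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007,
               "O": 15.999, "S": 32.06, "Cl": 35.45}


def parse_formula(formula: str) -> dict:
    """Count atoms in a flat formula such as 'C22H36N2O5S' (no brackets)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def constitutional_descriptors(formula: str) -> dict:
    """Molecular weight, N/O/S atom counts and heavy-atom count."""
    counts = parse_formula(formula)
    return {
        "mol_weight": sum(ATOMIC_MASS[el] * n for el, n in counts.items()),
        "n_N": counts.get("N", 0),
        "n_O": counts.get("O", 0),
        "n_S": counts.get("S", 0),
        "n_heavy": sum(n for el, n in counts.items() if el != "H"),
    }


desc = constitutional_descriptors("C22H36N2O5S")
print(desc)
```

The resulting dictionary is already a small descriptor vector: each molecule maps to the same ordered set of numbers.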
Learning approach
Quantum Chemistry: deductive >> inductive
Force Field Molecular Modelling: deductive ≈ inductive
Chemoinformatics: deductive << inductive
• In chemoinformatics the logic of learning is not based on existing physical theories. Chemoinformatics considers the world too complex to be a priori described by any set of rules. Thus, the rules (models) in chemoinformatics are not explicitly taken from rigorous physical models, but learned inductively from the data.
Chemoinformatics: From Data to Knowledge
[Figure: pyramid of data, information and knowledge; labels: measurement or calculation, context, generalization, deductive learning, inductive learning]
Models
• In chemoinformatics, a model represents an ensemble of rules or a mathematical equation linking a given property (activity) with the molecular structure.
PROPERTY = f(structure)
• Two main types of models:
  - binary classification (SAR)
  - regression (QSAR)
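The difference between the two model types can be shown with one learner used in both modes. The sketch below uses k-nearest neighbours, chosen here only for brevity (it is not a method named in the slides); the descriptor vectors, class labels and activity values are invented.

```python
import math


def knn_predict(train_X, train_y, x, k=3, task="regression"):
    """Predict from the k nearest training vectors.

    task="regression": mean of the neighbours' values (QSAR-style output).
    task="classification": majority label of the neighbours (SAR-style output).
    """
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    values = [train_y[i] for i in order[:k]]
    if task == "regression":
        return sum(values) / len(values)
    return max(set(values), key=values.count)


# Hypothetical training set: 2-D descriptor vectors for six compounds
train_X = [(1.0, 0.2), (1.2, 0.1), (0.9, 0.3),
           (3.0, 1.5), (3.2, 1.4), (2.9, 1.6)]
activity_class = [0, 0, 0, 1, 1, 1]              # inactive / active
activity_value = [4.1, 4.3, 4.0, 6.8, 7.0, 6.9]  # e.g. a log(1/C)-type scale

query = (3.1, 1.5)
predicted_class = knn_predict(train_X, activity_class, query, task="classification")
predicted_value = knn_predict(train_X, activity_value, query, task="regression")
print(predicted_class, round(predicted_value, 2))
```

In both cases the model is a function of the structure (through its descriptors); only the type of output differs.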
Organic chemistry: exercise of « intuitive » chemoinformatics
The Markovnikov Rule: When a Brønsted acid, HX, adds to an unsymmetrically substituted double bond, the acidic hydrogen of the acid bonds to that carbon of the double bond that has the greater number of hydrogen atoms already attached to it.
Extraction of rules from the data
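A rule of this kind is easy to state as a function of simple structural counts. A minimal sketch of the rule exactly as worded above, taking only the number of hydrogens on each alkene carbon as input; the function name and the numbering of the two carbons are assumptions made for the example.

```python
def markovnikov_h_position(h_on_c1: int, h_on_c2: int):
    """Return which alkene carbon (1 or 2) receives the acidic H of HX.

    Follows the rule as stated: H goes to the carbon that already carries
    more hydrogens. Returns None when both carbons carry the same number,
    because the rule then gives no preference.
    """
    if h_on_c1 == h_on_c2:
        return None
    return 1 if h_on_c1 > h_on_c2 else 2


# Propene, CH2=CH-CH3: C1 has 2 H, C2 has 1 H, so H adds to C1 and X to C2
propene = markovnikov_h_position(2, 1)
# 2-Methylpropene, CH2=C(CH3)2: C1 has 2 H, C2 has none
isobutene = markovnikov_h_position(2, 0)
# Ethene, CH2=CH2: equal hydrogen counts, no preference
ethene = markovnikov_h_position(2, 2)
print(propene, isobutene, ethene)
```

The rule was originally generalised by chemists from observed products; in chemoinformatics the same kind of rule is extracted from data by a learning algorithm.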
Major applications
• In silico design
• Chemical Databases
• Virtual screening
Structure-Activity Models
Machine-learning approaches:
- MLR,
- Decision Trees,
- Artificial Neural Networks,
- Support Vector Machines,
- …
Algorithms for organisation and search of the data:
- fingerprints,
- graph theory,
- similarity measures
Chemoinformatics: some applications
Dmitry Mendeléev (1834 – 1907)
Discoverer of the Periodic Table, an early “chemoinformatician”
• Russian chemist who arranged the 63 known elements into a periodic table based on atomic mass, which he published in Principles of Chemistry in 1869. Mendeléev left space for new elements and predicted three yet-to-be-discovered elements: Ga (1875), Sc (1879) and Ge (1886).
Periodic Table
Chemical properties of the elements vary gradually along the two axes
Target Protein
Large libraries of molecules
High-Throughput Screening
Hit
experiment
computations
Virtual Screening
Small library of selected hits
Chemical universe:
• > 50 M compounds are currently available
• 10^60 druglike molecules could be synthesised
Virtual screening is indispensable for analysing the huge number of protein-ligand combinations
Virtual screening must be very fast and efficient!
Human proteome:
• 84000 peptides
Virtual screening “funnel”
CHEMICAL DATABASE (~10^6 – 10^9 molecules)
VIRTUAL SCREENING:
• Similarity search
• Filters
• (Q)SAR
• Docking
• Pharmacophore models
HITS (~10^1 – 10^3 molecules)
INACTIVES
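The funnel idea, applying successive selection stages so that each one sees fewer molecules, can be sketched in a few lines. Everything specific below is an assumption made for illustration: the two stages (a property filter and a fragment-based similarity search), the thresholds, the toy library and the reference fragment set.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two fragment sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def property_filter(mol: dict) -> bool:
    """Stage 1: simple property filter (hypothetical thresholds)."""
    return mol["mw"] <= 500 and mol["logp"] <= 5


def make_similarity_filter(reference: set, threshold: float = 0.5):
    """Stage 2: keep molecules similar enough to a known active."""
    def passes(mol: dict) -> bool:
        return tanimoto(mol["frags"], reference) >= threshold
    return passes


def run_funnel(library, stages):
    """Apply the stages in order; return survivors and the count after each stage."""
    survivors = list(library)
    counts = [len(survivors)]
    for stage in stages:
        survivors = [m for m in survivors if stage(m)]
        counts.append(len(survivors))
    return survivors, counts


# Toy library with invented properties and fragment labels
library = [
    {"name": "A", "mw": 320, "logp": 2.1, "frags": {"C-N", "C=O", "c:c", "N-H"}},
    {"name": "B", "mw": 610, "logp": 3.0, "frags": {"C-N", "C=O", "c:c", "N-H"}},
    {"name": "C", "mw": 280, "logp": 6.2, "frags": {"C-N", "c:c"}},
    {"name": "D", "mw": 410, "logp": 4.0, "frags": {"C-Cl", "C-C"}},
    {"name": "E", "mw": 350, "logp": 1.5, "frags": {"C-N", "C=O", "c:c"}},
]
reference_active = {"C-N", "C=O", "c:c"}

hits, counts = run_funnel(
    library, [property_filter, make_similarity_filter(reference_active)]
)
hit_names = [m["name"] for m in hits]
print(counts, hit_names)
```

In a real funnel the later stages (QSAR models, pharmacophore matching, docking) would be added as further callables in the `stages` list.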
REACH regulation
• The European Union adopted the Regulation on the Registration, Evaluation, Authorisation, and Restriction of Chemicals (the “REACH Regulation”), which entered into force on June 1, 2007.
• REACH requires information on physico-chemical, toxicological and eco-toxicological parameters for chemicals produced in quantities exceeding 1 tonne per year.
• More than 30,000 compounds must be tested. The total cost estimated by the EU Commission over an 11-15 year period is €2.8 - €5.2 bn
No Data, No Market!
Chemoinformatics tools in SciFinder:
predictions of > 20 physico-chemical properties and NMR spectra for each individual compound
Drug design
• Virtual screening - what does it give us? Herbert Koppen (Boehringer, Germany), Current Opinion Drug Discovery & Dev. (2009) 12: 397-407
• From virtuality to reality. Ulrich Rester (Bayer, Germany), Current Opinion Drug Discovery & Dev. (2008) 11: 559-568
• What has virtual screening ever done for drug discovery? David E Clark (Argenta Discovery Ltd, UK), Expert Opinion on Drug Discovery (2008) 8: 841-851
Virtual screening: success stories & drugs
Market: tirofiban (1999)
Aggrastat (trade name) from Merck, a GP IIb/IIIa antagonist (used in myocardial infarction; it is an anticoagulant)
(2S)-2-(butylsulfonylamino)-3-[4-[4-(4-piperidyl)butoxy]phenyl]propanoic acid (Mol. Mass: 440.6 g/mol)
PK data: Bioavailability: IV only (intravenous only); Half-life: 2 hours
Combined with heparin and aspirin, but with numerous precautions
http://www.bioscience.ws/encyclopedia/
In silico screening: success stories & drugs
Materials design
Ionic Liquids
Ionic liquids are composed of large organic cations:
[Figure: cation structures carrying substituents R1 to R4]
and anions:
PF6-, Cl-, BF4-, CF3SO3-, [(CF3SO2)2N]-
There exist combinations of ions that could lead to useful ionic liquids
Number of possible cation-anion combinations: ~10^18
Ionic liquid viscosity: experimental validation of the neural network models
Viscosity predictions for 23 new ILs (Solvionics company)
None of these ionic liquids was used for model preparation
• The prediction error (~70 cP) is similar to the “noise” in the experimental data used for training the model
[Figure: predicted vs. experimental viscosity; RMSE = 73 cP]
G. Marcou, I. Billard, A. Ouadi and A. Varnek, submitted
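The RMSE quoted for the external validation is the root-mean-square difference between predicted and experimental values. A minimal sketch of that calculation; the four viscosity values are placeholders, not data from the study.

```python
import math


def rmse(experimental, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(e - p) ** 2 for e, p in zip(experimental, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Placeholder viscosities in cP (illustrative only)
exp_cp = [100.0, 250.0, 400.0, 80.0]
pred_cp = [130.0, 210.0, 450.0, 80.0]

error = rmse(exp_cp, pred_cp)
print(f"RMSE = {error:.1f} cP")
```

Because the error is computed on compounds that were not used for training, it estimates how the model behaves on genuinely new ionic liquids.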
Metabolite prediction
Prediction of aromatic hydroxylation sites for human CYP1A2 substrates
[Figure: aromatic hydroxylation of a substrate by CYP1A2; the potential hydroxylation sites are marked with “?”]
The obtained model correctly predicts the hydroxylation products with a probability of ≈80 % (see the poster of C. Muller)
Method: SVM + descriptors derived from condensed graphs of reaction
Reaction conditions
Search of optimal reaction conditions
[Figure: reaction query (substrate Sub + H2) and the potential products A, B and C; compound A is the target]
Experimental validation of the conditions suggested by the program:
#  Catalyst        Solvent  Additive     Yield (Exp)
1  Pt/C (10%)      THF      None         A: 98 %
2  Pt/C (10%)      DMF      None         A: 90 %, Sub: 2 %
3  Ir/CaCO3 (5%)   EtOH     NEt3 (5 %)   A: 100 %
4  Ir/CaCO3 (5%)   Hexane   None         INSOLUBLE
5  Ir/CaCO3 (5%)   DMF      None         A: 27 %, Sub: 69 %
A. Varnek, in “Chemoinformatics and Computational Chemical Biology", J. Bajorath, Ed., Springer, 2010
« We are perhaps not far removed from the time when we shall be able to submit the bulk of chemical phenomena to calculation »
Joseph Louis Gay-Lussac, Mémoires de la Société d’Arcueil 2:207 (1808)
Visit our website : http://infochim.u-strasbg.fr