Chemical Data and Computer-Aided Drug Discovery
Mike Gilson, School of Pharmacy, mgilson@ucsd.edu, 2-0622

Outline

Overview of drug discovery

Structure-based computational methods: when we know the structure of the targeted protein

Ligand-based computational methods: when we don’t know the protein’s structure

What is a drug?

Small Molecule Drugs

Aspirin

Sildenafil (Viagra)

Glipizide (Glucotrol)

Taxol

Digoxin

Darunavir

Nanoparticles (e.g., packaged small-molecule drugs)

Doxil (liposome package, extended circulation time, milder toxicity)

Abraxane (albumin-packaged Taxol)

http://www.doxil.com/about_doxil.html http://www.abraxane.com/professional/nab-technology.aspx

Biopharmaceuticals

Erythropoietin (EPO): stabilized variant of a natural protein hormone

Etanercept (Enbrel): protein combining a TNF receptor with an antibody (Ab) Fc domain; scavenges TNF, diminishing inflammation

http://www.ganfyd.org/index.php?title=Erythropoietin_beta http://en.wikipedia.org/wiki/File:Enbrel.jpg

How are drugs discovered?

Natural Products

Digoxin: foxglove

Aspirin: willow

Taxol: Pacific yew

How Aspirin Works

[Figure: inflammation and platelet activation pathway; aspirin leads to platelet inactivation]

Biomolecular Pathways and Target Selection: e.g., signaling pathways

http://www.isys.uni-stuttgart.de/forschung/sysbio/insulin/index.html

[Figure: signaling pathway diagram with the target protein marked]

Empirical Path to Ligand Discovery

Compound library (commercial, in-house, synthetic, natural)

High-throughput screening (HTS)

Hit confirmation

Lead compounds (e.g., µM Kd)

Lead optimization (medicinal chemistry)

Potent drug candidates (nM Kd)

Animal and clinical evaluation
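The jump from µM leads to nM candidates can be put on an energy scale with the standard relation ΔG° = RT ln(Kd / 1 M). The sketch below assumes 298.15 K and a 1 M standard state; the function name is illustrative, not from any particular package.

```python
import math

R = 1.987204e-3  # gas constant, kcal/(mol*K)


def binding_free_energy(kd_molar, temperature=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Assumes a 1 M standard state: dG = RT ln(Kd / 1 M) = -RT ln(Ka).
    """
    return R * temperature * math.log(kd_molar)


dg_lead = binding_free_energy(1e-6)  # typical lead, 1 µM
dg_drug = binding_free_energy(1e-9)  # potent candidate, 1 nM
print(f"lead: {dg_lead:.1f} kcal/mol, candidate: {dg_drug:.1f} kcal/mol")
print(f"optimization must gain {dg_lead - dg_drug:.1f} kcal/mol")
```

A 1000-fold tighter Kd thus corresponds to only about 4 kcal/mol of additional binding free energy at room temperature.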

Compound Libraries

Commercial (also in-house pharma)

Government (NIH)

Academia

Computer-Aided Ligand Design

Aims to reduce the number of compounds synthesized and assayed

Lower costs

Less chemical waste

Faster progress

Scenario 1: Structure of Targeted Protein Known (Structure-Based Drug Discovery)

[Figure: HIV protease/KNI-272 complex]

Protein-Ligand Docking: Structure-Based Ligand Design

[Figure: energy terms illustrated: van der Waals (VDW), dihedral, screened Coulombic (+/-)]

Potential function: energy as a function of structure

Docking software: searches for the structure of lowest energy
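The two ingredients on this slide, a potential function and a search for its minimum, can be sketched in a few lines. This is a toy under loudly stated assumptions: hypothetical atom coordinates and charges, a Lennard-Jones term for van der Waals, a Coulomb term screened by a distance-dependent dielectric ε(r) = 4r (one common crude choice), a rigid ligand, and an exhaustive grid search over translations only (no rotations, no dihedral term).

```python
import itertools
import math

# Hypothetical toy system: (x, y, z, partial charge); distances in Å, charges in e.
PROTEIN = [(0.0, 0.0, 0.0, -0.5), (3.8, 0.0, 0.0, -0.5), (1.9, 3.3, 0.0, 0.5)]
LIGAND = [(0.0, 0.0, 0.0, 0.5), (1.5, 0.0, 0.0, -0.2)]
COULOMB_K = 332.06  # kcal*Å/(mol*e^2)


def pair_energy(r, qi, qj, eps=0.2, sigma=3.4):
    """Lennard-Jones (VDW) plus screened Coulomb energy for one atom pair, kcal/mol."""
    lj = 4.0 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = COULOMB_K * qi * qj / (4.0 * r * r)  # dielectric eps(r) = 4r
    return lj + coulomb


def energy(ligand_atoms):
    """Potential function: protein-ligand interaction energy of one ligand pose."""
    return sum(
        pair_energy(math.dist(p[:3], l[:3]), p[3], l[3])
        for p in PROTEIN
        for l in ligand_atoms
    )


def translate(atoms, shift):
    return [(x + shift[0], y + shift[1], z + shift[2], q) for x, y, z, q in atoms]


def grid_dock(step=0.5, extent=6.0, min_contact=1.0):
    """Search: try every translation on a grid and keep the lowest-energy pose."""
    n = int(extent / step)
    offsets = [i * step for i in range(-n, n + 1)]
    best_energy, best_shift = math.inf, None
    for shift in itertools.product(offsets, repeat=3):
        pose = translate(LIGAND, shift)
        # Skip near-overlapping poses, where r -> 0 would blow up both terms.
        if any(math.dist(p[:3], l[:3]) < min_contact for p in PROTEIN for l in pose):
            continue
        e = energy(pose)
        if e < best_energy:
            best_energy, best_shift = e, shift
    return best_energy, best_shift


best_e, best_shift = grid_dock()
print(f"lowest energy {best_e:.2f} kcal/mol at shift {best_shift}")
```

Practical docking programs search rotations and ligand torsions as well, usually with stochastic or systematic strategies rather than a full grid.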

Energy Determines Probability (Stability): Boltzmann Distribution

$p(x) \propto e^{-E(x)/RT}$

[Figure: energy and probability plotted against configuration x]
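A short numerical illustration of the Boltzmann relation: normalized populations p_i = e^{-E_i/RT} / Σ_j e^{-E_j/RT} for a few discrete states. The two state energies are hypothetical, chosen to show the rule of thumb that about 1.36 kcal/mol corresponds to a 10-fold population ratio at 298 K.

```python
import math

R = 1.987204e-3  # gas constant, kcal/(mol*K)


def boltzmann_probabilities(energies, temperature=298.15):
    """Normalized Boltzmann populations p_i = exp(-E_i/RT) / sum_j exp(-E_j/RT)."""
    rt = R * temperature
    e_min = min(energies)  # shift for numerical stability; cancels on normalizing
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


probs = boltzmann_probabilities([0.0, 1.364])  # two hypothetical states, kcal/mol
print(probs, probs[0] / probs[1])
```

The lower-energy state is about ten times more populated, which is why modest energy errors in a docking potential translate into large errors in predicted stability.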

Structure-Based Virtual Screening

Compound database + 3D structure of target (crystallography, NMR, modeling)

Virtual screening(e.g., computational docking)

Candidate ligands

Experimental assay

Ligands

Ligand optimization (med chem, crystallography, modeling)

Drug candidates

Fragmental Structure-Based Screening

“Fragment” library + 3D structure of target (crystallography, NMR, modeling)

Fragment docking

Compound design

http://www.beilstein-institut.de/bozen2002/proceedings/Jhoti/jhoti.html

Experimental assay and ligand optimization (med chem, crystallography, modeling)

Drug candidates

Potential Functions for Structure-Based Design: energy as a function of structure

Physics-based

Knowledge-based

Physics-Based Potentials: energy terms from physical theory

Van der Waals interactions (shape fitting)

Bonded interactions (shape and flexibility)

Coulombic interactions (charge-charge complementarity)

Hydrogen bonding

Common Simplifications Used in Physics-Based Docking

Quantum effects approximated classically

Protein often held rigid

Configurational entropy neglected

Influence of water treated crudely

Proteins and Ligands are Flexible

[Figure: protein + ligand ⇌ complex, with standard binding free energy ΔG°]

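Binding Energy and Entropy

Unbound states (E_Free) and bound states (E_Bound)

$\Delta G^\circ = -RT \ln K \approx \left( \langle E \rangle_{\mathrm{bound}} - \langle E \rangle_{\mathrm{free}} \right) - T\Delta S$

The first term is the energy part; $-T\Delta S$ is the entropy part.

$K \propto \dfrac{\sum_{\mathrm{bound}} e^{-E/RT}}{\sum_{\mathrm{free}} e^{-E/RT}}$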
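For a system with a discrete, enumerable set of states, the split into an energy part and an entropy part can be computed exactly: G = -RT ln Σ e^{-E/RT}, ⟨E⟩ is the Boltzmann-weighted mean energy, and -TS = G - ⟨E⟩. The sketch below uses hypothetical state energies (kcal/mol) and ignores the standard-state concentration factor, so its ΔG is meaningful only up to that constant.

```python
import math

R = 1.987204e-3  # kcal/(mol*K)


def free_energy_and_mean(energies, temperature=298.15):
    """Return (G, <E>) for a set of discrete states; G = -RT ln sum exp(-E/RT)."""
    rt = R * temperature
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)
    g = e_min - rt * math.log(z)
    mean_e = sum(e * w for e, w in zip(energies, weights)) / z
    return g, mean_e


def binding_terms(bound, free, temperature=298.15):
    """Split dG into an energy part d<E> and an entropy part -T dS."""
    g_b, e_b = free_energy_and_mean(bound, temperature)
    g_f, e_f = free_energy_and_mean(free, temperature)
    d_g, d_e = g_b - g_f, e_b - e_f
    return d_g, d_e, d_g - d_e


# Hypothetical: 2 bound states at -10 kcal/mol vs 20 unbound states at 0 kcal/mol.
d_g, d_e, minus_t_ds = binding_terms([-10.0, -10.0], [0.0] * 20)
print(f"dG = {d_g:.2f}, energy part = {d_e:.2f}, entropy part (-T dS) = {minus_t_ds:.2f}")
```

Going from 20 accessible states to 2 costs RT ln 10 ≈ 1.36 kcal/mol, which offsets part of the favorable energy; this is the kind of term that docking potentials often neglect.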

Structure-Based Discovery: Physics-Oriented Approaches

Weaknesses: fully physical detail is computationally intractable; approximations are unavoidable; parameterization is still required

Strengths: interpretable, provides guidance for design; broadly applicable, at least in principle; clear pathways to improving accuracy

Status: useful, far from perfect; multiple groups are working on fewer, better approximations:

Force fields, quantum effects; flexibility, entropy; water effects

Moore’s law: hardware is improving

Knowledge-Based Docking Potentials

[Figure: recurring protein-ligand contacts, e.g., histidine with a ligand carboxylate, and aromatic stacking]

Probability and Energy

Boltzmann: $p(r) \propto e^{-E(r)/RT}$

Inverse Boltzmann: $E(r) = -RT \ln p(r)$ (up to an additive constant)

Example: ligand carboxylate O to protein histidine N

1. Find all protein-ligand structures in the PDB with a ligand carboxylate O.
2. For each structure, histogram the distances from O to every histidine N.
3. Sum the histograms over all structures to obtain p(r_O-N).
4. Compute E(r_O-N) from p(r_O-N).
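Steps 2 to 4 can be sketched as below, assuming step 1 has already produced a flat list of O-N distances pooled over all structures (the sample values are made up). The bin width, cutoff, and pseudocount for empty bins are illustrative choices. The sketch follows the slide literally, E = -RT ln p; published knowledge-based potentials such as PMF also normalize by a reference distribution, which is omitted here.

```python
import math
from collections import Counter

R = 1.987204e-3  # kcal/(mol*K)


def inverse_boltzmann(distances, r_max=8.0, bin_width=0.5, temperature=298.15, pseudocount=1e-3):
    """Histogram pooled distances into p(r) and convert to E(r) = -RT ln p(r)."""
    n_bins = int(r_max / bin_width)
    counts = Counter(int(d / bin_width) for d in distances if d < r_max)  # steps 2-3
    total = sum(counts.values())
    energies = []
    for b in range(n_bins):
        # Pseudocount keeps empty bins finite instead of +infinity.
        p = (counts[b] + pseudocount) / (total + pseudocount * n_bins)
        energies.append(-R * temperature * math.log(p))  # step 4
    return energies


# Hypothetical pooled carboxylate O - histidine N distances (Å).
sample = [2.7, 2.8, 2.75, 2.9, 3.6, 5.1]
E = inverse_boltzmann(sample)
best_bin = E.index(min(E))
print(f"most favorable bin: {best_bin * 0.5:.1f}-{(best_bin + 1) * 0.5:.1f} Å")
```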

Knowledge-Based Docking Potentials: “PMF”, Muegge & Martin, J. Med. Chem. 42:791, 1999

$E_{\mathrm{prot\text{-}lig}} = E_{\mathrm{vdw}} + \sum_{\mathrm{pairs}\ ij} E_{\mathrm{type}(ij)}(r_{ij})$

[Figure: PMF energy versus atom-atom distance (Angstroms) for a few types of atom pairs, out of several hundred total: nitrogen+/oxygen-, aromatic carbons, aliphatic carbons]
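Once each atom-pair type has a tabulated E(r), scoring a pose is a double loop over protein-ligand atom pairs that sums table lookups. The atom-type names, table values, bin width, and cutoff below are hypothetical, and only the pairwise sum is implemented.

```python
import math


def pmf_score(protein_atoms, ligand_atoms, tables, bin_width=0.5, cutoff=8.0):
    """Sum tabulated pair energies E_type(ij)(r_ij) over protein-ligand atom pairs.

    protein_atoms / ligand_atoms: lists of (atom_type, (x, y, z)).
    tables: {(protein_type, ligand_type): [E for bin 0, E for bin 1, ...]}.
    """
    total = 0.0
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            r = math.dist(p_xyz, l_xyz)
            table = tables.get((p_type, l_type))
            if table is None or r >= cutoff:
                continue  # no potential for this pair type, or beyond cutoff
            b = min(int(r / bin_width), len(table) - 1)
            total += table[b]
    return total


# Hypothetical table: favorable well for histidine N - carboxylate O at 2.5-3.0 Å.
tables = {("N.his", "O.co2"): [0.0] * 5 + [-1.5] + [0.0] * 10}
protein = [("N.his", (0.0, 0.0, 0.0))]
ligand = [("O.co2", (2.75, 0.0, 0.0)), ("C.ar", (4.0, 0.0, 0.0))]
print(pmf_score(protein, ligand, tables))
```

Because each pose costs only lookups, this kind of scoring is fast, which matches the "computationally fast" strength noted for knowledge-based potentials.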

Structure-Based Discovery: Knowledge-Based Potentials

Weaknesses: accuracy limited by the availability of data; accuracy may also be limited by the overall approach

Strengths: relatively easy to implement; computationally fast

Status: useful, far from perfect; may be at the point of diminishing returns

Limitations of Knowledge-Based Potentials

1. Statistical limitations (e.g., limited data restrict us to pairwise potentials)

2. Even if we had infinite statistics, would the results be accurate? (Is inverse Boltzmann quite right? Where is entropy?)

[Figure: a histogram of O-N distances r_O-N with bins r1, r2, …, r10 needs 10 bins; a joint histogram of O-N and O-C distances (r_O-N, r_O-C) needs 100 bins]
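The statistical limitation is simple arithmetic: with b bins per distance, a joint histogram over d distances has b^d bins, so a fixed number of observed contacts is spread ever more thinly. The 10,000-contact figure below is a hypothetical round number, not a count from the PDB.

```python
def histogram_bins(bins_per_distance, n_distances):
    """Number of cells in a joint histogram over n_distances distances."""
    return bins_per_distance ** n_distances


def mean_counts_per_bin(n_observations, bins_per_distance, n_distances):
    """Average occupancy if observations were spread evenly over all cells."""
    return n_observations / histogram_bins(bins_per_distance, n_distances)


for d in (1, 2, 3, 4):
    print(d, histogram_bins(10, d), mean_counts_per_bin(10_000, 10, d))
```

By four distances the average bin holds a single observation, far too few to estimate p and hence E reliably.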

Scenario 2: Structure of Targeted Protein Unknown (Ligand-Based Drug Discovery)

Using knowledge of existing inhibitors to discover more, e.g., MAP kinase inhibitors

Why Look for Another Ligand if You Already Have Some?

Experimental screening generated some ligands, but they don’t bind tightly

A company wants to work around another company’s chemical patents

A high-affinity ligand is toxic, is poorly absorbed, etc.

Ligand-Based Virtual Screening

Compound library + known ligands

Molecular similarity, machine learning, etc.

Candidate ligands

Assay

Actives

Optimization (med chem, crystallography, modeling)

Potent drug candidates

Sources of Data on Known Ligands: journals, e.g., J. Med. Chem.

Some Binding and Chemical Activity Databases

PubChem (NIH): pubchem.ncbi.nlm.nih.gov

ChEMBL (EMBL): www.ebi.ac.uk/chembl

BindingDB (UCSD): www.bindingdb.org
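PubChem can also be queried programmatically through its PUG REST web service. The sketch below only builds a property-lookup URL by compound name; the helper name is hypothetical, and the property list and JSON output format are one choice among several. Fetching the URL requires network access, for example with urllib.request.urlopen.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_property_url(name, properties=("MolecularFormula", "MolecularWeight"), fmt="JSON"):
    """Build a PUG REST URL that looks up compound properties by name."""
    return f"{PUG_REST}/compound/name/{quote(name)}/property/{','.join(properties)}/{fmt}"


url = pubchem_property_url("ibuprofen")
print(url)
# To fetch (needs network): urllib.request.urlopen(url).read()
```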

BindingDB: www.bindingdb.org

Finding Protein-Ligand Data in BindingDB

e.g., by Name of Protein “Target”

e.g., by Ligand Draw Search

Sample Query Results: BindingDB to PDB

PDB to BindingDB

Download data in machine-readable format

Sample Query Results

Machine-Readable Chemical Format: Structure-Data File (SDF)

PDB format lacks chemical bonding

SDF format defines chemical bonds
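To see where SDF stores bonding, here is a minimal reader for the V2000 molfile block inside an SDF record: the counts line gives the numbers of atoms and bonds, atom lines carry fixed-column coordinates and element symbols, and bond lines list the two atom indices and the bond order. It is a sketch that ignores charges, the properties block, multi-record files, and V3000; for real work a toolkit such as Open Babel or RDKit is the safer choice. The coordinates are hypothetical.

```python
def parse_molfile(block):
    """Return (atoms, bonds) from a V2000 molfile block.

    atoms: list of (symbol, x, y, z); bonds: list of (atom1, atom2, order), 1-based.
    """
    lines = block.splitlines()
    counts = lines[3]  # lines 0-2 are the header
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


# Formaldehyde heavy atoms (C=O), hypothetical coordinates.
FORMALDEHYDE = "\n".join([
    "formaldehyde",
    "",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.2000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  2  0",
    "M  END",
])
atoms, bonds = parse_molfile(FORMALDEHYDE)
print(atoms, bonds)
```

The bond line "1 2 2" (atom 1 to atom 2, order 2) is exactly the double-bond information that a plain PDB coordinate record does not carry.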

There are Many Other Chemical File Formats: Interconvert with Babel

Chemical Similarity: Ligand-Based Drug Discovery

Compounds (available/synthesizable)

Compare with known ligands

Similar: test experimentally

Different: don’t bother

Chemical Fingerprints: Binary Structure Keys

[Figure: bit strings for Molecule 1 and Molecule 2, one bit per structural key: phenyl, methyl, ketone, carboxylate, amide, aldehyde, chlorine, fluorine, ethyl, naphthyl, S-S bond, alcohol]
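A structure-key fingerprint is one bit per predefined feature. The sketch uses the twelve keys named on this slide, in that order; which keys each molecule sets is hypothetical, and real key sets (for example the 166 public MACCS keys) are defined by substructure patterns rather than by name.

```python
# The twelve structural keys named on the slide, in bit order.
KEYS = ["phenyl", "methyl", "ketone", "carboxylate", "amide", "aldehyde",
        "chlorine", "fluorine", "ethyl", "naphthyl", "S-S bond", "alcohol"]


def encode(features):
    """Set bit i if KEYS[i] is present; return the fingerprint as an int."""
    unknown = set(features) - set(KEYS)
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    return sum(1 << i for i, key in enumerate(KEYS) if key in features)


def to_bitstring(fp):
    """Render bits in key order (bit 0 first), as drawn in the figure."""
    return "".join("1" if fp >> i & 1 else "0" for i in range(len(KEYS)))


# Hypothetical feature sets for the two molecules.
mol1 = encode({"phenyl", "methyl", "ketone", "carboxylate", "chlorine"})
mol2 = encode({"phenyl", "methyl", "amide", "fluorine", "ethyl"})
print(to_bitstring(mol1))
print(to_bitstring(mol2))
```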

Chemical Similarity from Fingerprints: Tanimoto Similarity or Jaccard Index, T

$T = \dfrac{N_I}{N_U} = \dfrac{2}{8} = 0.25$

N_I = 2 (intersection: bits set in both Molecule 1 and Molecule 2); N_U = 8 (union: bits set in either molecule)
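With fingerprints stored as integers, N_I and N_U are the bit counts of AND and OR. The two bit patterns below are hypothetical, chosen to reproduce the slide's N_I = 2 and N_U = 8.

```python
def tanimoto(fp1, fp2):
    """Tanimoto (Jaccard) similarity of two bit fingerprints stored as ints."""
    n_union = bin(fp1 | fp2).count("1")
    if n_union == 0:
        return 0.0  # convention for two empty fingerprints (an assumption)
    n_intersection = bin(fp1 & fp2).count("1")
    return n_intersection / n_union


mol1 = 0b000001001111  # five bits set (hypothetical)
mol2 = 0b000110010011  # five bits set, two shared with mol1
print(tanimoto(mol1, mol2))
```

T ranges from 0 (no shared bits) to 1 (identical fingerprints), so a screen can keep library compounds whose T to a known ligand exceeds a chosen threshold.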

Hashed Chemical Fingerprints: based upon paths in the chemical graph

1-atom paths: C F N H S O
2-atom paths: F-C C-C C-N C-S S-O C-H
3-atom paths: F-C-C C-C-N C-N-H C-S-O
etc.

Each path sets a pseudo-random bit-pattern in a very long molecular fingerprint
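A minimal sketch of the idea, assuming linear paths of up to three atoms, element labels only (no bond orders or hydrogens), a 1024-bit fingerprint, and two bits per path chosen with SHA-256 so results are reproducible across runs. These parameters are illustrative; production path fingerprints (e.g., Daylight-style) differ in path length, labels, and hashing. The four-atom graph F-C-C-N is hypothetical.

```python
import hashlib


def enumerate_paths(elements, bonds, max_atoms=3):
    """All linear paths of 1..max_atoms atoms, as canonical label strings."""
    neighbors = {i: set() for i in range(len(elements))}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    paths = set()

    def extend(path):
        labels = [elements[i] for i in path]
        forward, backward = "-".join(labels), "-".join(reversed(labels))
        paths.add(min(forward, backward))  # a path and its reverse are the same
        if len(path) < max_atoms:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:  # simple paths only: no revisiting atoms
                    extend(path + [nxt])

    for start in range(len(elements)):
        extend([start])
    return paths


def hashed_fingerprint(paths, n_bits=1024, bits_per_path=2):
    """Each path sets bits_per_path pseudo-random bits in an n_bits fingerprint."""
    fp = 0
    for path in paths:
        for k in range(bits_per_path):
            digest = hashlib.sha256(f"{path}|{k}".encode()).digest()
            fp |= 1 << (int.from_bytes(digest[:8], "big") % n_bits)
    return fp


# Hypothetical molecule F-C-C-N: element labels plus bonds between atom indices.
paths = enumerate_paths(["F", "C", "C", "N"], [(0, 1), (1, 2), (2, 3)])
fp = hashed_fingerprint(paths)
print(sorted(paths), bin(fp).count("1"))
```

Unlike structure keys, no feature list is fixed in advance: any path the molecule contains contributes bits, at the cost of occasional collisions between different paths.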

Maximum Common Substructure

N_common = 34

Potential Drawbacks of Plain Chemical Similarity

May miss good ligands by being overly conservative

Too much weight on irrelevant details

Scaffold Hopping

Zhao, Drug Discovery Today 12:149, 2007

Identification of synthetic statins by scaffold hopping

Abstraction and Identification of Relevant Compound Features

Ligand shape

Pharmacophore models

Chemical descriptors

Statistics and machine learning

Pharmacophore Models: Φάρμακο (drug) + Φορά (carry)

A 3-point pharmacophore

[Figure: three features, a +1 charge, a bulky hydrophobe, and an aromatic group, with inter-feature distances 5.0 ± 0.3 Å, 3.2 ± 0.4 Å, and 2.8 ± 0.3 Å]
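Checking whether a ligand conformation matches a 3-point pharmacophore reduces to comparing pairwise feature distances with the model's tolerances. The slide does not say which distance joins which pair of features, so the assignment below (cation to hydrophobe 5.0 Å, cation to aromatic 3.2 Å, hydrophobe to aromatic 2.8 Å) is hypothetical, as are the feature names and coordinates. Feature perception itself (finding charged groups, hydrophobes, and ring centroids) is assumed to be done already.

```python
import math

# Hypothetical 3-point model: (feature A, feature B) -> (target distance Å, tolerance Å).
MODEL = {
    ("cation", "hydrophobe"): (5.0, 0.3),
    ("cation", "aromatic"): (3.2, 0.4),
    ("hydrophobe", "aromatic"): (2.8, 0.3),
}


def matches(features):
    """features: {name: (x, y, z)} for one conformer. True if every model distance fits."""
    for (a, b), (target, tol) in MODEL.items():
        if a not in features or b not in features:
            return False
        if abs(math.dist(features[a], features[b]) - target) > tol:
            return False
    return True


conformer = {"cation": (0.0, 0.0, 0.0), "hydrophobe": (5.0, 0.0, 0.0), "aromatic": (2.74, 1.65, 0.0)}
stretched = {"cation": (0.0, 0.0, 0.0), "hydrophobe": (6.0, 0.0, 0.0), "aromatic": (2.74, 1.65, 0.0)}
print(matches(conformer), matches(stretched))
```

In a screen, each flexible ligand would be tested over many generated conformers, and it counts as a hit if any conformer matches.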

Molecular Descriptors: more abstract than chemical fingerprints

Physical descriptors: molecular weight; charge; dipole moment; number of H-bond donors/acceptors; number of rotatable bonds; hydrophobicity (log P and clogP)

Topological descriptors: branching index; measures of linearity vs. interconnectedness

Etc.

[Figure: rotatable bonds highlighted in an example molecule]
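Some of the physical descriptors listed above can be computed from a bare molecular graph. The sketch assumes heavy atoms given as (element, attached hydrogen count), uses standard average atomic masses, and counts H-bond donors and acceptors as Lipinski's rule of five does (N-H plus O-H bonds; N plus O atoms), a deliberately crude convention. Ethanol is the example.

```python
# Standard average atomic masses (g/mol) for a few common elements.
MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998, "S": 32.06, "Cl": 35.45}


def simple_descriptors(atoms):
    """atoms: list of (element, n_attached_H). Returns a small descriptor dict."""
    mw = sum(MASS[el] + n_h * MASS["H"] for el, n_h in atoms)
    donors = sum(n_h for el, n_h in atoms if el in ("N", "O"))  # N-H and O-H bonds
    acceptors = sum(1 for el, _ in atoms if el in ("N", "O"))   # N and O atoms
    return {"MW": mw, "HBD": donors, "HBA": acceptors}


ethanol = [("C", 3), ("C", 2), ("O", 1)]
print(simple_descriptors(ethanol))
```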

A High-Dimensional “Chemical Space”: each compound is a point in an n-dimensional space

Compounds with similar properties are near each other

[Figure: compounds as points in descriptor space, with axes Descriptor 1, Descriptor 2, Descriptor 3]
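Placing compounds in descriptor space turns "similar" into a distance. A minimal sketch: z-score each descriptor so that its units do not dominate, then rank library compounds by Euclidean distance to the nearest known ligand. The descriptor choice (molecular weight, log P, H-bond donors) and all values are hypothetical, and Euclidean distance on standardized descriptors is one common choice among several.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each descriptor column; constant columns are only centered."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def rank_by_nearest_known(library, known):
    """Library indices sorted by distance to the closest known ligand in scaled space."""
    scaled = standardize(library + known)
    lib_scaled, known_scaled = scaled[:len(library)], scaled[len(library):]

    def nearest_distance(v):
        return min(math.dist(v, k) for k in known_scaled)

    return sorted(range(len(library)), key=lambda i: nearest_distance(lib_scaled[i]))


# Hypothetical descriptors: [molecular weight, log P, H-bond donors].
known = [[300.0, 2.0, 1.0]]
library = [[310.0, 2.1, 1.0], [500.0, 5.0, 4.0], [150.0, -1.0, 0.0]]
print(rank_by_nearest_known(library, known))
```

Without standardization, molecular weight (hundreds of units) would swamp log P and donor counts in the distance.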

Statistics and Machine Learning: some examples

Partial least squares

Support vector machines

Genetic algorithms for descriptor-selection
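As one concrete instance of the methods listed above, here is a linear support vector machine trained with a Pegasos-style stochastic subgradient method on the regularized hinge loss, in plain Python. The two-descriptor data, labels (+1 active, -1 inactive), regularization strength, and epoch count are hypothetical; the solver cycles through the data in a fixed order rather than sampling randomly, and the bias is folded in as a constant feature (so it is regularized too). A real study would use an established library plus cross-validation.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent on lam/2*|w|^2 + mean hinge loss.

    X: list of descriptor vectors; y: labels +1 (active) / -1 (inactive).
    A constant 1.0 feature is appended so the last weight acts as a bias.
    """
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(data, y):  # fixed order instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularizer)
            if margin < 1.0:  # hinge loss active: step toward this example
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score > 0 else -1


# Hypothetical 2-descriptor data: actives cluster at positive values, inactives at negative.
X = [[2.0, 2.0], [3.0, 3.0], [2.5, 3.0], [-2.0, -2.0], [-3.0, -1.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X], predict(w, [2.2, 2.4]), predict(w, [-2.5, -2.0]))
```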

Summary

Overview of drug discovery

Computer-aided methods: structure-based, ligand-based

Interaction potentials: physics-based, knowledge-based (data-driven)

Ligand-protein databases, machine-readable chemical formats

Ligand similarity and beyond

Activities and Discussion Topics

BindingDB: Advil (machine-readable format, binding activities)

PDB/BindingDB: 2ONY at PDB; BindingDB substructure search; related data

Similarity search

Combined computational approaches: (physics + knowledge)-based docking potentials; (ligand + structure)-based computational discovery

Other data-driven methods where it may be hard to get enough statistics

Validation of computational methods

Protein-ligand databases: getting data and assessing data quality

Drug Discovery Pipeline (One Model)

Target identification

Target validation

Assay development

Lead compound (ligand) discovery

Lead optimization

Animal pharmacokinetics, toxicity

Phase I clinical (safety, metabolism, PK)

Phase II clinical (efficacy)

Phase III clinical (comparison with existing therapy)

Updated Knowledge-Based PMF Potential: Muegge, J. Med. Chem. 49:5895, 2006