Informatics in Drug Discovery

  • Published on
    30-Jul-2018


Transcript

  • Informatics in Drug Discovery

    Tudor I. Oprea, Division of Biocomputing

    University of New Mexico School of Medicine, toprea@salud.unm.edu

    The University of New Mexico, Division of BIOCOMPUTING. Copyright Tudor I. Oprea, 2008. All rights reserved.

  • Outline

    Phases of Drug Discovery

    Target Identification: What is Bioinformatics?

    Hit Identification: What is Cheminformatics?

    Lead Identification: The GPR30 agonist; systems chemical biology

    Lead Optimization: Structure-based design against Influenza

    The Importance of Accurate Information

  • Palestrante

    Palestra: the first definition of the word comes from ancient times (ant.): "a place for the practice of exercises and gymnastics in Greece and Rome."

    The "Boxer Vase" from Hagia Triada in Crete

    So a palestrante is a practitioner of exercises; in this case, of speech and mind.

  • Phases of Drug Discovery

    Enabling Science & Technology; Emerging Technologies

    Predictive ADME/Tox, Safety Assessment; Front-loading Risk

    Therapeutic Input; Clinical Insights

    Stringent Criteria

    Milestones: Target, Lead, Clinical Candidate

    Phases: Target Identification, Hit Identification, Lead & Probe Identification, Lead/Probe Optimization, Concept Testing, Development for Launch, Launch

    File for: eIND, IND; Receive: NDA

  • Preclinical Drug Discovery Paradigm

    Targets: identification and validation, drawing on genome, proteome, disease area and biological effects

    Leads: design, make, test, drawing on hit ID, ongoing TV, fast-followers and intellectual property coverage

    Output: candidate drugs

    Supporting disciplines: bioinformatics, cheminformatics, medical informatics

  • Target Identification in Preclinical Discovery

    Stages: Target Identification, Hit Identification, Lead Identification, Lead Optimization, Clinical Candidate

    Identification, validation and production: human genetics, mouse genetics

    The key in target identification is mass production of pure protein for structural studies.

  • What is Bioinformatics?

    Historically, bioinformatics was concerned with sequence analysis of genes and proteins. It sought patterns across multiple sequences and tried to understand how sequence information relates to genes, proteins, cellular substructures (organelles), cells and tissues, as well as whole (micro)organisms.

    Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline.

    Its ultimate goal is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.

  • Human Genome Map

    Use this website to search the human genome. No prior experience required.

    http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi

  • Query: Color Blindness

    Two hits on chromosomes 7 and X (opsin)

  • Continuing the search

    We searched GenBank for human opsin and found >150 hits. We selected NT_025965 (Homo sapiens chromosome X). We then searched for opsin 1 in the hit list and found an mRNA, which was in turn translated into the following protein sequence:

    MAQQWSLQRLAGRHPQDSYEDSTQSSIFTYTNSNSTRGPFEGPNYHIAPRWVYHLTSVWMIFVVTASVFTNGLVLAATMKFKKLRHPLNWILVNLAVADLAETVIASTISIVNQVSGYFVLGHPMCVLEGYTVSLCGITGLWSLAIISWERWLVVCKPFGNVRFDAKLAIVGIAFSWIWSAVWTAPPIFGWSRYWPHGLKTSCGPDVFSGSSYPGVQSYMIVLMVTCCIIPLAIIMLCYLQVWLAIRAVAKQQKESESTQKAEKEVTRMVVVMIFAYCVCWGPYTFFACFAAANPGYAFHPLMAALPAYFAKSATIYNPVIYVFMNRQFRNCILQLFGKKVDDGSELSSASKTEVSSVSSVS

    This sequence was submitted to BLAST.
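    Translating an mRNA into protein, as described above, follows the standard genetic code. Below is a minimal Python sketch that assumes an in-frame coding sequence. The names `translate` and `CODON_TABLE` are chosen here for illustration; they come from neither the original analysis nor any specific toolkit. Ambiguous bases such as N raise a `KeyError`.

```python
# Standard genetic code. Codons are enumerated over "TCAG" for the first,
# second and third base; AMINO_ACIDS lists the residue for each codon in that order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str, stop_at_stop: bool = True) -> str:
    """Translate an in-frame coding sequence into one-letter amino-acid codes.

    RNA input is accepted by mapping U to T. Translation halts at the first
    stop codon ('*') unless stop_at_stop is False; trailing bases that do not
    form a full codon are ignored.
    """
    seq = cds.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGCCCAGCAGUGGUAA"))  # -> MAQQW
```

    The short test sequence encodes MAQQW, the first five residues of the opsin sequence above, followed by a stop codon. Its codons were picked for illustration and are not taken from the actual transcript.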

  • BLAST / PDB Results

    Our query was limited to the PDB (Protein Data Bank). A putative conserved domain (7 TM) was detected.

    Five sequences (three proteins from the opsin family and two fragments) were identified.
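    The 7-TM domain reported by BLAST can also be anticipated from the sequence alone with a hydropathy profile. This sketch substitutes the classic Kyte-Doolittle scale, using a 19-residue window and a 1.6 cut-off (the values Kyte and Doolittle proposed for spotting membrane-spanning segments). It was not part of the analysis on this slide, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (J. Mol. Biol. 1982, 157, 105-132).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each window; entry i averages seq[i:i + window]."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Merge overlapping above-threshold windows into (start, end) spans.

    Spans are 0-based and end-exclusive; each marks a candidate
    membrane-spanning region.
    """
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score > threshold:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend the current span
            else:
                segments.append((start, end))
    return segments


# Toy check: a 25-residue leucine stretch flanked by lysines gives one span.
print(hydrophobic_segments("K" * 30 + "L" * 25 + "K" * 30))  # -> [(25, 60)]
```

    Applied to the opsin sequence above, the spans should roughly match the seven helices. A 19-residue window can still merge or miss neighbouring helices, so treat this as a rough cross-check, not a topology prediction.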

  • 1F88: X-ray structure of bovine rhodopsin

    Front view (left) and transmembrane view (right) of the prototype for 7-TM protein models of GPCRs (G-protein coupled receptors)

    Figure labels: e.c. (extracellular), i.c. (intracellular), t.m. (transmembrane)

    GPCRs are the targets of ~35% of known drugs.

  • Statistical distribution of Affymetrix U133-A gene chip probes produced by BPQs after 18 hr incubation with MCF-10A cells

    This analysis provided Scott Burchiel with a small subset of genes that could be analyzed in more detail. Experiments indicated that 3,6-BPQ induces dioxin-response elements more than 1,6-BPQ.

    Panels: 3,6-BPQ; 1,6-BPQ

    S.W. Burchiel et al., Toxicol. Applied Pharmacol 2007, 221:203-214
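    The slide does not say how the small subset of genes was picked from the chip-wide distribution. One generic approach, shown here purely as an illustration, is to compute each probe's log2 ratio of treated to control signal and keep the probes lying more than a chosen number of standard deviations from the mean. The function name, the 2.0 cut-off and the data are all hypothetical.

```python
import math
import statistics


def select_outlier_probes(treated, control, z_cutoff=2.0):
    """Return probe IDs whose log2(treated/control) ratio is an outlier.

    treated, control: dicts mapping probe ID -> positive signal intensity.
    A probe is kept when its log2 ratio lies more than z_cutoff population
    standard deviations from the mean ratio over all probes.
    """
    ratios = {p: math.log2(treated[p] / control[p]) for p in treated}
    mean = statistics.mean(ratios.values())
    sd = statistics.pstdev(ratios.values())
    if sd == 0:
        return []  # no spread, so nothing stands out
    return sorted(p for p, r in ratios.items() if abs(r - mean) / sd > z_cutoff)


# Hypothetical data: 20 unchanged probes and one probe induced 16-fold.
control = {f"probe_{i}": 100.0 for i in range(20)}
treated = dict(control)
control["probe_induced"] = 100.0
treated["probe_induced"] = 1600.0
print(select_outlier_probes(treated, control))  # -> ['probe_induced']
```

    With thousands of probes, a robust spread estimate such as the median absolute deviation is often preferred, because a few strongly induced genes inflate the standard deviation.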

  • Top Genes Expressed after BPQ Exposure

    Browse Pathways for free at Biocarta.com

    LocusLink | Chromosome | Gene | Description | Probe
    1728 | 16q22.1 | NQO1 | NAD(P)H dehydrogenase, quinone 1 | 210519_s_at
    8644 | 10p15-p14 | AKR1C3 | Aldo-keto reductase family 1, member C3 (3-alpha hydroxysteroid dehydrogenase, type II) | 209160_at
    1545 | 2p21 | CYP1B1 | Cytochrome P450, family 1, subfamily B, polypeptide 1 | 202436_s_at
    1646 | 10p15-p14 | AKR1C2 | Aldo-keto reductase family 1, member C2 (dihydrodiol dehydrogenase 2; bile acid binding protein; 3-alpha hydroxysteroid dehydrogenase, type III) | 211653_x_at
    1646 | 10p15-p14 | AKR1C2 | (same description as above) | 209699_x_at
    1645 | 10p15-p14 | AKR1C1 | Aldo-keto reductase family 1, member C1 (dihydrodiol dehydrogenase 1; 20-alpha (3-alpha)-hydroxysteroid dehydrogenase) | 216594_x_at
    1645 | 10p15-p14 | AKR1C1 | (same description as above) | 204151_x_at

    S.W. Burchiel et al., Toxicol. Applied Pharmacol 2007, 221:203-214

  • Tryptophan Metabolism (KEGG)

    http://www.biocarta.com/pathfiles/tryptophanPathway.asp

    Highlighted in the pathway: CYP1B1

  • CYP1B1 Homology Model

    Built from the X-ray structure of CYP2B4

    Partially optimized with 3,6-BPQ

    Ligand: benzo[a]pyrene-3,6-dione (CAS # 3067-14-9)

  • Modern Technologies in Preclinical Discovery

    High Throughput Synthesis and Screening: synthetic compounds, natural products

    Stages: Target Identification, Hit Identification, Lead Identification, Lead Optimization, Clinical Candidate

  • Compound management

    Handles both liquids & solids

    Robots designed for fast reformatting & subsetting (cherry picking)

    Cheminformatics-based selection for diversity

  • From single compound synthesis to combinatorial chemistry

    Courtesy of Thomas Khler

  • Compound storage is fully automated

  • What is Cheminformatics?

    Molecular models are widely used in the biosciences, ranging from analytical chemistry & biochemistry to immunology & toxicology.

    Cheminformatics integrates data via computer-assisted manipulation of chemical structures.

    Chemical inventory & compound registration are vital to cheminformatics, but it is their combination with other theoretical tools, linked to physical (organic) chemistry, toxicity, etc., that brings unique capabilities to bioactivity discovery.

    Other names that different people use for fields related to cheminformatics and computational chemistry: computer-aided drug design; quantum dynamics; biopolymer modeling; molecular / chemical diversity; virtual screening; etc.

  • Understanding Modeling

    Image sizes: RGB format, 550 Kb; JPG format, 25 Kb

    Data compression rarely leads to understanding

    Complexity reduction (e.g., modeling) leads to interpretation

  • Structure-Property Correlations

    Chemical system and biological system

    Commercial candidates (pharmaceuticals, flavors, additives, etc.)

    Experimental knowledge: Ki, metabolism, toxicity, ...; pH, stability, solubility, pKa, ...

    Modified from G. Cruciani

  • 3D-Descriptors

    Modified from G. Cruciani

    $E_{tot} = E_{el} + E_{LJ} + E_{hb} + E_{ent}$ (electrostatic, Lennard-Jones, hydrogen-bond and entropic terms)
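    To make the energy sum concrete, this sketch evaluates two of its terms for a probe at a single grid point: a 12-6 Lennard-Jones term, and a Coulomb term with a distance-dependent dielectric (epsilon = 4r). This is a generic force-field approximation, not the functional form used by GRID. The hydrogen-bond and entropic terms are left out, and every parameter is a placeholder.

```python
import math
from dataclasses import dataclass

COULOMB_CONSTANT = 332.0636  # kcal*Angstrom/(mol*e^2)


@dataclass
class Atom:
    x: float
    y: float
    z: float
    charge: float   # partial charge, units of e
    epsilon: float  # LJ well depth for the atom-probe pair, kcal/mol
    sigma: float    # LJ distance at which the pair energy is zero, Angstrom


def lennard_jones(r, epsilon, sigma):
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, qi, qj):
    # Distance-dependent dielectric eps(r) = 4r, a common simplification.
    return COULOMB_CONSTANT * qi * qj / (4.0 * r * r)


def probe_energy(point, probe_charge, atoms):
    """E_el + E_LJ for a probe at `point` (x, y, z); E_hb and E_ent are omitted."""
    e_el = e_lj = 0.0
    for atom in atoms:
        r = math.dist(point, (atom.x, atom.y, atom.z))
        e_el += coulomb(r, probe_charge, atom.charge)
        e_lj += lennard_jones(r, atom.epsilon, atom.sigma)
    return e_el + e_lj


# A single neutral atom: at r = 2**(1/6) * sigma the LJ term equals -epsilon.
atom = Atom(0.0, 0.0, 0.0, charge=0.0, epsilon=0.2, sigma=3.5)
r_min = 2 ** (1 / 6) * 3.5
print(round(probe_energy((r_min, 0.0, 0.0), 0.0, [atom]), 6))  # -> -0.2
```

    Evaluating `probe_energy` at every node of a 3D lattice around a protein produces an interaction map. Descriptors of the kind shown on these slides are derived from maps of this type.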


  • Structure-Based Drug Design

    M von Itzstein et al., Nature, 363:418 - 423, 1993

  • The Many Faces of Caffeine

    Two SMILES strings for the same molecule:

    O=C1N(C)C(=O)N(C)C(=C12)N=CN2C

    CN1C(=O)N(C)C(=O)c(c12)n(C)cn2

    Cover art for Chemoinformatics in Drug Discovery, Wiley-VCH 2005
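    Counting heavy atoms is a quick check that the two caffeine SMILES strings above have the same composition. The sketch below is a deliberately small tokenizer for the unbracketed organic subset used here (B, C, N, O, P, S, F, Cl, Br, I and the aromatic lowercase forms). It ignores hydrogens, charges and bracket atoms. Matching formulas do not prove identity, since isomers share formulas; a full toolkit such as RDKit would compare canonical SMILES instead.

```python
import re
from collections import Counter

# Two-letter halogens must be tried before single-letter atoms.
ORGANIC_SUBSET = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_formula(smiles: str) -> str:
    """Hill-ordered heavy-atom formula (C first, then alphabetical) for simple SMILES."""
    if "[" in smiles:
        raise ValueError("bracket atoms are not handled by this sketch")
    # Aromatic lowercase atoms count as their element ('c' -> 'C').
    counts = Counter(tok.capitalize() for tok in ORGANIC_SUBSET.findall(smiles))
    order = ["C"] + sorted(el for el in counts if el != "C")
    return "".join(f"{el}{counts[el] if counts[el] > 1 else ''}"
                   for el in order if el in counts)


for smiles in ("O=C1N(C)C(=O)N(C)C(=C12)N=CN2C", "CN1C(=O)N(C)C(=O)c(c12)n(C)cn2"):
    print(smiles, heavy_atom_formula(smiles))  # both give C8N4O2
```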

  • Modern Technologies in Preclinical Discovery

    Combinatorial & Medicinal Chemistry; Toxicogenomics

    Stages: Target Identification, Hit Identification, Lead Identification, Lead Optimization, Clinical Candidate

  • GPR30: A Novel Estrogen Target

    Prossnitz et al. at UNM identified a fully functional intracellular GPCR (localized to the endoplasmic reticulum).

    It binds estradiol. Tamoxifen is an agonist, which may explain cancer relapses during tamoxifen therapy.

    Image label: 17E2-Alexa 546

    Revankar CM et al., Science 2005, 307:1625

    The University of New Mexico, SCHOOL OF MEDICINE

  • Discovery of a Potent GPR30 Agonist

    Query: estradiol

    Combined similarity score: 40% 2D similarity (MDL, Daylight), 40% ROCS (3D similarity from OpenEye), 20% ALMOND (pharmacophore fingerprint similarity from Molecular Discovery)

    Applied to 10,000 molecules; of these, 100 were tested.

    One got lucky.

    Bologa C & Revankar C et al., Nature Chem. Biol. 2006, 2: 207-212
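    The 40/40/20 weighting above can be read as a weighted sum of three similarity scores per molecule, followed by ranking and selecting the top compounds for testing. The sketch assumes each method already returns a 0-1 score against the estradiol query. Only the 2D part is computed, as a Tanimoto coefficient on fingerprint bit sets. The ROCS and ALMOND scores are passed in as plain numbers, since those programs are not reproduced here. Compound names and scores are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


# Weights from the slide: 40% 2D similarity, 40% ROCS, 20% ALMOND.
WEIGHTS = {"sim2d": 0.4, "rocs": 0.4, "almond": 0.2}


def consensus_rank(scores, top_n):
    """Return the top_n compound IDs by weighted sum of per-method scores (each 0-1)."""
    combined = {cid: sum(WEIGHTS[m] * s[m] for m in WEIGHTS)
                for cid, s in scores.items()}
    return sorted(combined, key=combined.get, reverse=True)[:top_n]


# Hypothetical query fingerprint and library scores.
query_bits = {1, 4, 9, 16}
library = {
    "cpd_a": {"sim2d": tanimoto(query_bits, {1, 4, 9}), "rocs": 0.50, "almond": 0.40},
    "cpd_b": {"sim2d": tanimoto(query_bits, {2, 3}), "rocs": 0.90, "almond": 0.80},
    "cpd_c": {"sim2d": tanimoto(query_bits, {1, 4, 9, 16}), "rocs": 0.30, "almond": 0.10},
}
print(consensus_rank(library, top_n=2))  # -> ['cpd_a', 'cpd_c']
```

    With 10,000 scored molecules, `top_n=100` would match the number tested on the slide. Because the scores are summed, a compound with no 2D resemblance can still rank well on 3D and pharmacophore similarity; cpd_b falls just short here.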

  • High Throughput Cheminformatics

    Computational Chemistry: (Q)SAR

    Combinatorial Chemistry

    Chemical Space Navigation

    Chemical information + biological information + structural information

    T.I. Oprea, Curr Opin Chem Biol, 2002, 6, 384-389

  • Needles in the Haystack

  • Mostly Haystack, But Some Needles

  • Density of Actives in HTS Space

    Plot: actives per target, for targets T1 to T53

  • Spotting Frequent Hitters

    Plot: actives per target, for targets T1 to T53

    Highlighted compound: MW = 817; ClogP = 5.7; PSA = 269; Lipinski score = 3; 2 x -N=N- (azo); 2 x -SO3H (sulfonic acid)
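    Assuming the "Lipinski score" on this slide counts rule-of-five violations (MW > 500, ClogP > 5, more than 5 H-bond donors, more than 10 H-bond acceptors), the count can be computed as follows. The slide gives no donor or acceptor counts, so those example values are hypothetical. MW and ClogP alone account for two of the three violations.

```python
from typing import NamedTuple


class Properties(NamedTuple):
    mw: float     # molecular weight, Da
    clogp: float  # calculated logP
    hbd: int      # H-bond donors (OH + NH count)
    hba: int      # H-bond acceptors (N + O count)


def rule_of_five_violations(p: Properties) -> int:
    """Number of Lipinski rule-of-five criteria the compound fails (0-4)."""
    return sum([
        p.mw > 500,
        p.clogp > 5,
        p.hbd > 5,
        p.hba > 10,
    ])


# MW and ClogP are from the slide; hbd and hba are made-up placeholders.
example = Properties(mw=817, clogp=5.7, hbd=4, hba=14)
print(rule_of_five_violations(example))  # -> 3
```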

  • Modern Technologies in Preclinical Discovery

    Transgenic animals: clinically relevant disease models; study of metabolism & toxicology in humanized conditions

    Structure-based drug design: zanamivir (inhaled; Relenza), the first neuraminidase inhibitor for influenza (1999); oseltamivir (orally available prodrug; Tamiflu), launched in 2001

    Stages: Target Identification, Hit Identification, Lead Identification, Lead Optimization, Clinical Candidate

  • Neuraminidase Inhibitors for Influenza

    X-ray structure guided rational design

    GRID suggested replacing an -OH with a basic functionality

    Physical properties not amenable to oral delivery

    GSK markets this as Relenza, the first neuraminidase inhibitor for influenza

    Lead molecule: IC50 = 8600 nM; zanamivir: IC50 = 5 nM

    Slide modified from Andy Davis (AstraZeneca R&D Charnwood)

    M von Itzstein et al., Nature, 363:418-423, 1993
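    IC50 values spanning several orders of magnitude are easier to compare on a log scale, pIC50 = -log10(IC50 in mol/L). This is also the scale of the predicted-vs-actual pIC50 plots later in the talk. A short sketch using the IC50 values on this slide (8600 nM for the lead molecule, 5 nM for zanamivir):

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_nm * 1e-9)


lead, zanamivir = 8600.0, 5.0  # IC50 values (nM) from the slide
print(f"lead pIC50       = {pic50_from_nm(lead):.2f}")       # 5.07
print(f"zanamivir pIC50  = {pic50_from_nm(zanamivir):.2f}")  # 8.30
print(f"fold improvement = {lead / zanamivir:.0f}")          # 1720
```

    The same function places the Gilead compounds on the next slide (150 nM and 1 nM) at pIC50 values of about 6.8 and 9.0.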

  • Gilead Neuraminidase Inhibitors

    Compounds shown with IC50 = 1 nM and IC50 = 150 nM; Glu 276 labeled in the binding site

    Zwitterion not amenable to oral delivery

    Ethyl ester (oseltamivir): good oral absorption and duration

    Marketed as Tamiflu, the first oral neuraminidase inhibitor for influenza

    J. Am. Chem. Soc. 1997, 119, 681; J. Med. Chem. 1998, 41, 2451

    Slide modified from Andy Davis (AstraZeneca R&D Charnwood)

  • Relenza vs Tamiflu: Inhalation to Overcome Low Bioavailability

    Both are potent neuraminidase inhibitors.

    Relenza: zanamivir is delivered via Diskhaler.

    Tamiflu: simple tablet formulation; de-esterified after absorption, with a long plasma half-life.

    Tamiflu (marketed by Roche) captured 65% of the U.S. market share from Relenza within 7 weeks.

    Q1/Q2 2002 sales, Relenza vs Tamiflu: Relenza's market share had fallen to 10%. Reason quoted by GSK: the slowness of the US to adopt inhalation therapies.

    Slide modified from Andy Davis (AstraZeneca R&D Charnwood)

  • Ligand-Based Virtual Screening

    Binding mode of two agonists (red) and two antagonists (blue)

    B. Edwards et al., Mol Pharmacol 2005 68:1301-1310

  • Target-Based Virtual Screening

    Started with the NMR-solved bound conformation of a known peptide inhibitor and applied it to an existing X-ray structure (docking)

    Used 2D/3D searches in the CDL library, based on the peptide information

    Work by Richard Larson, Cristian Bologa, Tudor Oprea (UNM)

    The University of New Mexico, OFFICE OF BIOCOMPUTING

  • High Throughput Drug Discovery

    Inputs: macromolecules, genes, chemical diversity (genomics, CombiChem)

    High Throughput Screening (HTS): identify actives, optimize ((Q)SAR)

    Animal testing: pharmacokinetics, pharmacodynamics, toxicology

    Clinical testing

  • Informatics in Drug Discovery

    Elements of the flow: disease; gene(s); target(s); screening; pathway analysis; lead(s); ADME/Tox optimization; candidate drug(s); clinical trials; optimal delivery; optimal dose; improved therapy; patient(s); molecular assemblies

    Supporting disciplines: bioinformatics, cheminformatics, medical informatics

  • Research Sites are Often Worlds Apart

    For example, AstraZeneca preclinical discovery: 4500 people working in 11 sites across 4 continents (Södertälje, Lund, Mölndal, Alderley, Reims, Charnwood, Wilmington, Boston, Montreal, Bangalore in India, Brisbane in Australia)

    Truly global organization

    Emphasis on local, existing strengths; difficult to re-balance

    Internal and external collaborations need to be maximized

  • The Task: Drug Hunting

    Large pharma houses mostly cover major disease areas: gastrointestinal (gastro-esophageal reflux / irritable bowel syndrome); cancer (solid tumours); respiratory (asthma / chronic obstructive pulmonary disease); inflammation (autoimmune); cardiovascular (thrombosis); infection (genomics targets); CNS (neurology); pain (analgesia).

    Focused on industrialized (first) world health problems. Not here to cure mankind, but to make a profit. If projected sales are below $1.3 billion, the drug will not be launched. Often need to bow to market pressure. "Me too" has been the most effective strategy so far.

  • Traditional Drug Discovery

    Figure: potency vs. ADME

    T. I. Oprea. Molecules, 7:51-62, 2002

  • The Traditional Approach: Analysis

    Historically, medicinal chemistry efforts started from a lead structure with (typically) poor ADME properties and micromolar affinity.

    Initial synthetic efforts were aimed at improving the binding affinity.

    Once medium- to low-nanomolar affinity was achieved, ADME properties were optimized.

    A similar strategy is pursued in virtual screening, where ADME filters are applied post-docking in an effort to trim down the large number of virtual hits.

    T. I. Oprea. Molecules, 7:51-62, 2002

  • The Problem with the Traditional Approach

    One has to preserve the molecular determinants responsible for potency while modifying the structure to achieve good ADME properties.

    This often results in reduced binding affinity, and the process may require several iterations before optimal structures are found.

    To a large extent, in vivo screening for good ADME properties can be introduced to avoid radical drops in affinity. This strategy is currently used in many pharmaceutical companies.

    However, the in vivo ADME screening strategy cannot cope with increasingly large numbers of compounds.

    T. I. Oprea. Molecules, 7:51-62, 2002

  • A Solution From the Literature

    Figure: potency vs. ADME

    Apply ADME filters during post-HTS analysis or virtual screening: PSA, LogD7.4, solubility, Rule of Five, etc.

    T. I. Oprea. Molecules, 7:51-62, 2002

  • The Literature Approach: Analysis

    This approach yields good ADME properties without considering receptor affinity. It is based on the expectation that, out of thousands (or more) of compounds, some will display good affinity.

    This approach is a more advanced implementation of Chris Lipinski's computational alert procedure.

    One has to preserve the molecular determinants responsible for DMPK properties while modifying the structure to achieve good potency.

    This may often degrade ADME properties, but the process of finding optimal structures is less time-consuming.

    T. I. Oprea. Molecules, 7:51-62, 2002

  • The Problem with the Literature Approach

    To a large extent, this strategy rules out potential problems with passive permeability, but it is unlikely to capture active (transport) mechanisms unless more advanced filters are introduced. This approach is very popular in several pharmaceutical companies.

    However, the ADME-filtering strategy cannot cope with situations where NO HITS are found, because it decreases the probability of detecting hits outside the predefined ADME property space.

    There is a clear need for an approach that allows the simultaneous optimization of both receptor binding affinity and ADME properties.

    T. I. Oprea. Molecules, 7:51-62, 2002

  • The Convergent Approach

    Figure: potency vs. ADME, with regions marked "Ignore" and "Investigate in more detail"

    T. I. Oprea. Molecules, 7:51-62, 2002

  • The Convergent Approach: Analysis

    This strategy requires simultaneous optimisation of both binding affinity and ADME properties. An integrated software framework to address this problem is required.

    The real difficulty lies in addressing relevant properties in a consistent manner. For example, PSA, hydrophobicity and H-bonds are important for both binding and permeability, but in what proportion do they contribute in each case?

    In the convergent strategy, one simultaneously monitors changes that influence binding affinity and ADME properties. This may result in progressive increments in both affinity and ADME properties, making the process less time-consuming.

    T. I. Oprea. Molecules, 7:51-62, 2002
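    One way to monitor affinity and ADME changes at the same time, as the convergent strategy requires, is a desirability score. Each property is mapped onto a 0-1 scale, and the scales are combined with a geometric mean, so a compound that fails badly on any single property scores near zero. This Derringer-style desirability function is a generic multi-objective technique substituted here for illustration, not the framework described in the talk. The property ranges are illustrative assumptions.

```python
import math


def ramp(value, worst, best):
    """Linear desirability: 0 at `worst`, 1 at `best`, clipped to [0, 1].

    Works in either direction: best > worst means 'higher is better',
    best < worst means 'lower is better'.
    """
    d = (value - worst) / (best - worst)
    return min(1.0, max(0.0, d))


def overall_desirability(pic50, psa, clogp):
    """Geometric mean of one potency and two ADME-related desirabilities."""
    scores = [
        ramp(pic50, worst=5.0, best=8.0),   # potency: higher pIC50 is better
        ramp(psa, worst=140.0, best=60.0),  # permeability proxy: lower PSA is better
        ramp(clogp, worst=5.0, best=2.0),   # lipophilicity: lower is better, capped at 2
    ]
    if min(scores) == 0.0:
        return 0.0  # any unacceptable property vetoes the compound
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


print(overall_desirability(pic50=8.0, psa=60.0, clogp=2.0))            # -> 1.0
print(overall_desirability(pic50=9.0, psa=150.0, clogp=3.0))           # -> 0.0
print(round(overall_desirability(pic50=6.5, psa=100.0, clogp=3.5), 3)) # -> 0.5
```

    The geometric mean is chosen over a weighted sum so that very high potency cannot compensate for an unacceptable ADME property. That mirrors the point of tracking both objectives together instead of one after the other.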

  • The Problem with the Convergent Approach

    An integrated software framework that allows interactive user input should allow rapid convergence towards interesting compounds.

    This strategy relies on the accuracy of the experimental data used to derive the QSAR models that predict affinity and ADME (the old problem: how good is your model?).

    However, this framework cannot substitute for creative thinking, serendipity and good research!

    This approach is limited by the unexpected (novel targets, novel chemistry) and by inherent errors (20-30%).

    T. I. Oprea. Molecules, 7:51-62, 2002

  • Reliability of Biological Data (1)

    Plot: predicted vs. actual pIC50, CoMFA model (external prediction)

    HIV-1 protease inhibitors: N = 36; Q² = 0.814; Km = 0.064 mM

    Oprea et al., in Ghose & Viswanadhan, Dekker 2001, 233-266
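    Q² on these slides summarizes how well predicted pIC50 values match the measured ones. A common definition is Q² = 1 - PRESS/SS, where PRESS is the sum of squared prediction errors and SS is the sum of squared deviations of the observed values from a reference mean. This sketch uses the mean of the observed values by default. For external predictions some authors use the training-set mean instead, which gives a different number. The pIC50 values are hypothetical.

```python
def q_squared(observed, predicted, reference_mean=None):
    """Q^2 = 1 - PRESS/SS for paired observed/predicted values.

    reference_mean defaults to the mean of `observed`; pass the
    training-set mean instead if that convention is wanted.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired values")
    mean = sum(observed) / len(observed) if reference_mean is None else reference_mean
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss = sum((o - mean) ** 2 for o in observed)
    return 1.0 - press / ss


actual = [6.1, 7.4, 8.0, 8.9, 9.5]     # hypothetical pIC50 values
predicted = [6.5, 7.1, 8.3, 8.6, 9.2]
print(round(q_squared(actual, predicted), 3))  # -> 0.926
```

    On these slides, the same 36 HIV-1 protease inhibitors give Q² = 0.814 or 0.487 depending on the assay Km. The statistic can only be as good as the biological data behind it.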

  • Reliability of Biological Data (2)

    Plot: predicted vs. actual pIC50, CoMFA model (external prediction)

    HIV-1 protease inhibitors: N = 36; Q² = 0.487; Km = 2.0 mM

    Oprea et al., in Ghose & Viswanadhan, Dekker 2001, 233-266

  • Reliability of Biological Data (3)

    %HIA model (training set): human intestinal absorption

    1: sulfasalazine (%HIA is 12, not 65)

    2: sulfapyridine (%HIA is 93)

    3: 5-aminosalicylic acid

    Bacterial azo bond reduction in the intestine cleaves sulfasalazine (1) into 2 + 3.

    Oprea & Gottfries, J. Mol. Graphics Mod. 1999, 17, 261-274

  • Reliability of Biological Data (4)

    Oral drug absorption: N = 16; Q² = 0.44 (%HIA); Q² = 0.71 (Caco)

    This sigmoidal relationship is frequently described in the literature.

    Plot of u[1] against t[1]; compound labels: Acetyls, Alpreno, Atenolo, Cortico, Dexamet, Felodip, Hydroco, Mannito, Metopro, Olsalaz, Practol, Propran, Salicyl, Sulfasa, Testost, Warfari

    Annotations: azo bond reduction in the large intestine; paracellular transport

    %HIA model (training set): any model left?!

    Oprea & Gottfries, J. Mol. Graphics Mod. 1999, 17, 261-274

  • Reliability of Biological Data (5)

    There is poor agreement in clearance data: over 42% of the compounds differ by more than 30% between sources.

    Difference between sources | % of compared compounds
    >10% | 71.3
    >30% | 42.6
    >100% | 13.9
    >1000% | 3.0

    Sources: Goodman and Gilman 1996 vs. Avery's 1997. Data: clearance (202 drugs).
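    Disagreement statistics like these can be recomputed from paired values drawn from two sources. The slide does not define "% difference". This sketch assumes the absolute difference divided by the smaller of the two values, and the clearance numbers are made up.

```python
def percent_difference(a: float, b: float) -> float:
    """|a - b| as a percentage of the smaller value (an assumed definition)."""
    return abs(a - b) / min(a, b) * 100.0


def disagreement_profile(pairs, thresholds=(10, 30, 100, 1000)):
    """Percentage of compound pairs whose values differ by more than each threshold."""
    diffs = [percent_difference(a, b) for a, b in pairs]
    return {t: 100.0 * sum(d > t for d in diffs) / len(diffs) for t in thresholds}


# Hypothetical clearance values (same units in both sources) for four drugs.
clearance_pairs = [(10.0, 10.5), (4.0, 6.0), (1.0, 2.5), (0.2, 3.0)]
print(disagreement_profile(clearance_pairs))
# -> {10: 75.0, 30: 75.0, 100: 50.0, 1000: 25.0}
```

    Dividing by the smaller value keeps the measure symmetric in the two sources. Dividing by one source's value instead would make the ">1000%" bin depend on which compendium is treated as the reference.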

  • Reliability of Chemical Data (1)

    Published structures compared with corrected structures:

    Reference | Compound | Error in the published structure
    JMC 37-476, chart 1 | rolipram | incorrect N atom position
    JMC 43-2217, chart 1 | A-85380 | incorrect ring size
    (same as above) & JMC 36-2645 | tropisetron | extra methyl group
    (same as above) | DAU-6285 | missing methoxy; N instead of O
    JMC 37-758, chart 1 | Ro-15-4513 | methyl group missing
    JMC 37-787, figure 1 | epalrestat | E/Z configuration: E instead of Z

  • Reliability of Chemical Data (2)

    Structure given as "Carisoprodol" in the Merck Index, 13th ed., #1854, compared with the correct carisoprodol structure

    Disclaimer: the above error has been corrected in the 14th edition of the Merck Index. In general, the Merck Index is a reliable source of information.

  • Reliability of Chemical Data (3)

    Chirality: computers cannot always interpret what chemists can (above/below-the-plane bonds must be strictly enforced). Examples shown: a drawing that is not machine-readable vs. one that is.

    Missing or altered atoms/substituents; overall error rate above 9%:

    incorrectly drawn or written structures (3.4%);

    incorrect molecular formula or molecular weight (3.4%);

    unspecified binding position for substituents or ambiguous numbering scheme for the heterocyclic backbone (0.91%);

    structures with the incorrect backbone (0.71%);

    incorrect generic names or chemical names (0.24%);

    incorrect biological activity (0.34%);

    incorrect references (0.2%).
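    The "above 9%" figure matches the sum of the individual error categories, assuming each erroneous entry is counted in only one category:

$$3.4 + 3.4 + 0.91 + 0.71 + 0.24 + 0.34 + 0.2 = 9.2\,\%$$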

  • Reliability of Structural Data

    Photoactive yellow protein from E. halophila: the first structure, 1PHY, was wrong and was subsequently corrected at higher resolution.

    1PHY (1989): 2.4 Å; 2PHY (1995): 1.4 Å

    A. Davis, S. Teague, G. Kleywegt, Angew. Chem. Int. Ed. 2003

  • Reliability of Structural Data (2)

    Where there is no chicken wire (electron density), there are no electrons, and therefore no atoms.

    1FQH, now withdrawn from the PDB

    A. Davis, S. Teague, G. Kleywegt, Angew. Chem. Int. Ed. 2003

  • Reliability of Structural Data (3)

    NH2 and O can't be distinguished from electron density, as they are isoelectronic (e.g., glutamine, asparagine); PDBREPORT suggests that 15% of such cases in the Protein Data Bank are likely incorrect.

    N and C cannot normally be distinguished from density either (e.g., histidine).

  • Where do I work?

  • The Importance of Accurate Information

    With one source, we have information. With two sources, we can have confirmation or confusion. With several sources, knowledge emerges.

    Accurate information is important; hence we tend to trust certain newspapers, TV stations or scientific journals (the peer-review system regulates the latter).

    If something is really important to you (*), consult multiple sources and verify that your assumptions are correct.

    (*) e.g., who is on my PhD committee; what's my girlfriend's birthday; what are the side-effects of the medicine my parents are taking; is this synthesis route correct? etc.

  • The Attrition Rate in Drug Discovery

    Stage | Count
    HTS | 1,000,000
    HTS hits | 2,000
    HTS actives | 1,200
    Lead series | 50-200
    Drug candidates | 1
    Drug | 0.1

    Along the funnel: increased knowledge and value; increased risk of failure; increased experimental error rate
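    The funnel counts can be turned into stage-to-stage ratios. This sketch uses the counts on this slide and evaluates both ends of the 50-200 range given for lead series. Lead series count series, not individual compounds, so the ratios on either side of that stage are only indicative.

```python
def funnel(lead_series):
    """Stage counts from the slide; lead series is given there as a 50-200 range."""
    return [("HTS", 1_000_000), ("HTS hits", 2_000), ("HTS actives", 1_200),
            ("Lead series", lead_series), ("Drug candidates", 1), ("Drug", 0.1)]


def stage_ratios(stages):
    """Ratio of each stage's count to the count of the stage before it."""
    return [(prev, cur, n_cur / n_prev)
            for (prev, n_prev), (cur, n_cur) in zip(stages, stages[1:])]


for lead_series in (50, 200):
    print(f"lead series = {lead_series}")
    for prev, cur, ratio in stage_ratios(funnel(lead_series)):
        print(f"  {prev} -> {cur}: {ratio:.3g}")

# End to end, 0.1 drugs per 1,000,000 compounds screened is 1 in 10 million.
overall = funnel(50)[-1][1] / funnel(50)[0][1]
print(f"overall: {overall:.1e}")
```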

  • The Optimization Response in Drug Discovery

    Figure: multiple response surface ("Fuji-Yama landscape"), with druggability (0 to 0.9) plotted over chemical space; hits, leads and drugs are marked on the surface

  • Considering a Career in Molecular Sciences

    Science is not a democracy: just because everyone believes something does not make it true.

    Choose topics that open new possibilities, not those that (may) lead to dead-ends; don't get stuck with a single technology.

    Always go to the source (original publication), as information is sometimes selectively presented.

    Make sure you understand the basics: try to explain what you do to a 5-year-old. If you manage, you have grasped the concepts.

    Stay away from fashionable science: just because it might get you funded does not mean it's science.

    Make sure to give credit where credit is due, but do not be afraid to claim what's yours (protect your ideas).

  • Some Take Home Messages

    Nothing is what it seems: verify what you see, doubt what you find, and always get independent confirmation of your observations. That way, once you are sure about your findings, you will be ready to defend your results (e.g., GPR30 is still strongly contested by others).

    Don't be afraid to say "I DO NOT KNOW"; omniscient beings are not of this world (think Buddha, Jesus, and other enlightened beings).

    Although you do not know, be ready to learn.

    Focus on problem-solving skills; they are more important than static learning & memory.

    Always find ways to reward creativity and out-of-the-box thinking.

    People are 100x more important than equipment: as you progress in your career, you will find that people are the most important asset.

    If someone steals your ideas (this happens all the time), remember that this is a subtle form of flattery (so is envy). Focus on generating new ideas, and do not turn the stolen-idea situation into an obsession (this will block your creativity).

    Learn where your fear comes from: deal with it inside, and do not take it out on other people. Fear leads to anger, anger leads to violence.

    Express yourself freely and creatively. The Universe is a friendly place.

  • Garland Marshall's Definition of Luck