Bioinformatics from a drug discovery perspective. EMBRACE Workshop, 22-23 March 2007. Niclas Jareborg, AstraZeneca R&D Södertälje


  • Slide 1
  • Bioinformatics from a drug discovery perspective. EMBRACE Workshop, 22-23 March 2007. Niclas Jareborg, AstraZeneca R&D Södertälje
  • Slide 2
  • AstraZeneca Drug Discovery. Research Areas (RAs): CV/GI (Cardiovascular/Gastrointestinal), RIRA (Respiratory/Inflammation), CNS/Pain, Cancer, Infection. Discovery Sites: UK: Charnwood (RIRA), Alderley Park (Cancer, CV/GI, RIRA); North America: Boston (Cancer, Infection), Wilmington (CNS/Pain), Montreal (CNS/Pain); Sweden: Lund (RIRA), Mölndal (CV/GI), Södertälje (CNS/Pain); India: Bangalore (Infection). Bioinformatics: all RAs have their own bioinformatics teams; infrastructure at Alderley Park (databases, large Linux clusters); IS organisation.
  • Slide 3
  • A target is defined as: a biological target protein on which a chemical entity (e.g. a drug molecule) exerts its action. A drug target must be associated with a disease.
  • Slide 4
  • Drug discovery process: Target identification -> Target validation -> Hit identification (HTS) -> Hit to lead (Lead identification) -> Lead optimisation -> Candidate drug -> Clinical trials. Diagram labels: Genes, Protein, Assay, Compound library, Hit, Effort.
  • Slide 5
  • Target Definition. Alternative splicing: identify pharmacologically relevant target variant(s). Sequence variation: function (target, metabolizing enzyme); binding of substance; identify most common variant (might differ in different populations!).
  • Slide 6
  • Target Definition: Expression. Is the target expressed in a relevant human tissue? Databases; microarrays; immunohistochemistry; in situ hybridization; proteomics; literature.
  • Slide 7
  • Target Definition: Selectivity. How similar are related proteins? Do similar proteins have functions that we do not want to affect? Animal models: orthologous genes (same family size?); splice variants (same as in human?); polymorphisms (differences between inbred strains); tissue expression (overlap with human?); available transgenes or knock-outs.
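A first, rough answer to "How similar are related proteins?" is often a percent-identity figure between the target and a related protein. The following is a minimal sketch, assuming the two sequences have already been aligned by an external tool (gaps written as `-`). The function name `percent_identity` and the toy sequences are hypothetical, and counting matches over non-gap columns is only one of several identity definitions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy, made-up aligned fragments (not real protein sequences).
target = "MKT-LLVAGG"
paralogue = "MKTALLIAG-"
print(f"{percent_identity(target, paralogue):.1f}% identity")
```

A high identity to a paralogue with an unwanted function would flag a selectivity risk; a real analysis would use a proper alignment and look at the binding site rather than the whole sequence.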
  • Slide 8
  • Bioinformatics input to the drug discovery process. Phases: Research, Development, Commercialisation. Stage labels: Target Identification, Hit Identification, Lead Identification, Lead Optimisation, CD Pre-nomination, Development for Launch, Registration, Launch, Sales. Milestones: MS1, MS2, MS3, MS4, MS5. Genetics & Bioinformatics support: support target identification; flag up population variants in target; primary screening: identify polymorphic and splice variants; selectivity screening: identify paralogues; biomarker identification; support choice of model organism(s).
  • Slide 9
  • In-house generated gene centric information resource: DNA and protein sequence; similarity to other species; splice variants; genetic mutations; tissue expression.
  • Slide 10
  • In-house generated gene centric information resource: DNA and protein sequence; similarity to other species; splice variants; genetic mutations; tissue expression; pathways; patents; gene symbol; synonyms; literature; functional motifs.
  • Slide 11
  • Target identification: targets from different experimental approaches as well as validation using different technologies. Approaches: EST sequencing campaigns; microarrays (Affymetrix, glass, etc.); proteomics; genetics/genome information; differential biology; in silico; literature. Target candidates -> validation (in silico, lab bench) -> validation as potential targets; specificity / selectivity.
  • Slide 12
  • Target identification: ~30,000 human genes -> 1 potential target. What? Where? Novel? Link to disease?
  • Slide 13
  • The human genome offers many potential drug targets
  • Slide 14
  • Current Drug Targets: few target classes. Based on 483 drugs in Goodman and Gilman's "The Pharmacological Basis of Therapeutics". (Samuel Svensson, PhD, AstraZeneca R&D Södertälje)
  • Slide 15
  • ~30,000 human genes. Only a subfraction of gene products play a direct role in disease pathophysiology. < 5,000 targets for small molecule drugs. ~2,000-3,000 druggable targets. Druggable genome: ~2,000-3,000 genes; 500 GPCRs, 50 NHRs, >200 ion channels, >1,000 enzymes (e.g. 450 proteases, 500 kinases, >200 others). Pathogens & commensal gut bacteria genes. Number of druggable targets smaller than expected?
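The figures on this slide can be put in proportion with a few lines of arithmetic. This is a small sketch using only the numbers quoted above; the variable names are illustrative, and the per-class values are treated as lower bounds because the slide marks several of them with ">".

```python
HUMAN_GENES = 30_000                 # "~30000 human genes"
DRUGGABLE_RANGE = (2_000, 3_000)     # "~2-3.000 druggable targets"
CLASS_LOWER_BOUNDS = {               # per-class counts quoted on the slide
    "GPCRs": 500,
    "NHRs": 50,
    "ion channels": 200,
    "enzymes": 1_000,
}

# Druggable share of the genome, in percent, for both ends of the range.
low_pct, high_pct = (100 * n / HUMAN_GENES for n in DRUGGABLE_RANGE)

# Sum of the quoted target classes, to compare with the 2,000-3,000 estimate.
class_total = sum(CLASS_LOWER_BOUNDS.values())

print(f"Druggable share of genome: {low_pct:.1f}% to {high_pct:.1f}%")
print(f"Sum of quoted class counts (lower bounds): {class_total}")
```

The quoted classes add up to at least 1,750 genes, which sits just below the lower end of the 2,000-3,000 estimate, so the druggable genome is on the order of a tenth of all human genes or less.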
  • Slide 16
  • Updating the (shrinking?) Targetome. Down to 22K? (see PMID: 15174140). Some of the 120 InterPro domains are unpromising; many potentials still functional orphans; realistically nearer 2,000? OMIM still only at 1,900, and only low numbers of robust genetic association results.
  • Slide 17
  • Current trends: Blue sky genomics -> literature. Finding unknown targets -> prioritizing the lists. Moving from single target focus: comparing and ranking of target candidates; integration of relevant but disparate data sources; better understanding of the target neighbourhood (disease mechanism, biomarkers, toxicology).
  • Slide 18
  • Sources of Contextual Information. Structured (20%, mature technology): internal chemical DBs; internal biological DBs; external commercial DBs (GVK Bio, Ingenuity IPA); external public DBs (EMBL, PDB, SNPdb, etc.). Unstructured (80%, emerging technology): internal docs (tox reports, clinical trial reports); external docs: patents (USPTO, WIPO, EP, etc.); literature (Medline, Embase); press releases (competitor, supplier, collaborator, academic, etc.); government agencies; conference proceedings; news feeds. The current approach to retrieving information from unstructured sources is manual extraction, i.e. finding documents and reading them!
  • Slide 19
  • Dissecting the Decision Making Process. Steps: locating relevant documents and information; retrieving them in a useable format; reading information; locating the facts within documents; understanding what it means; putting the information into context; turning information into knowledge; developing new hypotheses; input into decision making. Stages: Finding -> Extracting -> Integrating -> Creating.
  • Slide 20
  • Issues with the Manual Approach (Finding, Extracting, Integrating, Creating): difficult to capture breadth; chance to miss things; white space in failing to find things; limited time to read things; focus on reviews and summaries; based on individual scientists' own knowledge; narrow; biased; hypotheses are per project; reactive not proactive.
  • Slide 21
  • Text mining. Sources: literature; patents; in-house reports. Information: protein-protein interactions; tissue expression; pharmacological differences; splice variants, polymorphisms; species; toxicology; etc.
  • Slide 22
  • Emerging Systems: Text Mining. Extraction of facts from unstructured data sources: Natural Language Processing, ontologies; Linguamatics I2E; knowledgebase generation.
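To make "extraction of facts from unstructured data sources" concrete, here is a deliberately naive sketch that pulls subject-relation-object facts out of free text with a regular expression. It is not how Linguamatics I2E or any real NLP system works; the pattern, the function name `extract_facts`, and the sample sentences are all assumptions made for illustration.

```python
import re

# Naive pattern: a capitalised entity name, one of three relation verbs,
# then another capitalised entity name. Real systems use linguistic
# analysis and ontologies instead of a fixed pattern like this.
RELATION_PATTERN = re.compile(
    r"\b([A-Z][A-Za-z0-9]+)\s+(activates|inhibits|binds)\s+([A-Z][A-Za-z0-9]+)\b"
)


def extract_facts(text: str) -> list[tuple[str, str, str]]:
    """Return (subject, relation, object) triples found in the text."""
    return [(m.group(1), m.group(2), m.group(3))
            for m in RELATION_PATTERN.finditer(text)]


# Made-up sentences for illustration only.
sample = (
    "In this toy abstract, CASP9 activates CASP3. "
    "The authors also note that Thalidomide inhibits TNF in their model."
)

for fact in extract_facts(sample):
    print(fact)
```

The extracted triples are the kind of structured facts that could then be loaded into a knowledgebase; handling synonyms, negation and passive voice is where ontologies and real NLP become necessary.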
  • Slide 23
  • Biomedical Entity-Relationship Data. Entities: CASP9, PARP, CASP3, CASP8, BCL2, MTPN, TNF, Thalidomide, ADP-ribose, Neoplasia, Hyperplasia. Relationship categories: Co-Published Information; Gene:Gene Semantic Relationships; Gene:Disease Semantic Relationships; Gene:Metabolite Semantic Relationships; Gene:Chemical/Drug Semantic Relationships. Relationship labels: Co-published, Activates, Binds, Inactivates, Inc Expression, Associated with, Increases, Synthesizes, Inhibits.
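Entity-relationship data of this kind is commonly held as subject-relation-object triples. The sketch below is a minimal in-memory version. Because the slide's diagram layout is lost in this transcript, the pairing of entities and relation labels in `TRIPLES` is an assumption chosen for illustration, not a reading of the original figure, and `facts_about` is a hypothetical helper.

```python
# Each fact is a (subject, relation, object) triple.
# ASSUMPTION: these pairings are illustrative; the original diagram's
# edges cannot be recovered from the flattened slide text.
TRIPLES = [
    ("CASP9", "activates", "CASP3"),
    ("CASP8", "activates", "CASP3"),
    ("CASP3", "inactivates", "PARP"),
    ("PARP", "synthesizes", "ADP-ribose"),
    ("Thalidomide", "inhibits", "TNF"),
]


def facts_about(entity: str, triples=TRIPLES) -> list[tuple[str, str, str]]:
    """Return every triple in which the entity is subject or object."""
    return [t for t in triples if entity in (t[0], t[2])]


for subject, relation, obj in facts_about("CASP3"):
    print(f"{subject} --{relation}--> {obj}")
```

Storing facts as triples keeps gene:gene, gene:metabolite and gene:drug relationships in one structure, so a single query can walk across relationship categories.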
  • Slide 24
  • Pilot Systems: Pathway Analysis: Ingenuity IPA (www.ingenuity.com)
  • Slide 25
  • BER System in Action. Inputs: gene expression, proteomic and metabonomic data -> significant biological entity list (gene list, protein list, metabolite list) -> Genetic ER System (Gene/Metabolite Knowledgebase) -> biological environment of the list. Question: what is the underlying biology, pathology, physiology etc. associated with this list of entities? What is it telling me? Outputs: canonical pathways associated with the list; diseases and biological processes associated with the list; literature evidence trail; hypothesis generation.
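The flow described here, from a list of significant entities to the biology associated with it, can be sketched as a lookup followed by a ranking. This is a toy illustration only: `TOY_KNOWLEDGEBASE`, its annotations, and `rank_annotations` are hypothetical stand-ins for a real gene/metabolite knowledgebase, and the ranking is a plain count with no statistical enrichment test.

```python
from collections import Counter

# Hypothetical, hand-made knowledgebase: entity -> associated annotations.
TOY_KNOWLEDGEBASE = {
    "CASP3": {"apoptosis"},
    "CASP8": {"apoptosis"},
    "CASP9": {"apoptosis"},
    "BCL2": {"apoptosis"},
    "TNF": {"inflammation"},
}


def rank_annotations(entity_list, knowledgebase):
    """Count how many list members carry each annotation, most frequent first."""
    counts = Counter()
    for entity in entity_list:
        # Entities missing from the knowledgebase contribute nothing.
        for annotation in knowledgebase.get(entity, ()):
            counts[annotation] += 1
    return counts.most_common()


# A made-up "significant entity list", e.g. from an expression experiment.
significant_genes = ["CASP3", "CASP9", "TNF", "UNKNOWN1"]
ranked = rank_annotations(significant_genes, TOY_KNOWLEDGEBASE)
for annotation, n in ranked:
    print(f"{annotation}: {n} of {len(significant_genes)} entities")
```

A production system would add the literature evidence behind each association and test whether a pathway is over-represented relative to a background set, rather than relying on raw counts.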
  • Slide 26
  • Structuring the Knowledge. Delivers facts as networks of information: Knowledge Bases. Example: GI Tox Knowledge Map. Node types: Species (Human, Rat, Dog, etc.); Clinical Observations (Diarrhoea, Vomiting, Loose Stools, Bloating, Nausea, etc.); Pathology (GI toxicity, GI pathology); Cellular Processes; Compound; Genes. Relationship labels: Affects, Involved in, Is a, Linked with, Observed in.
  • Slide 27
  • Data source integration. Layers: Extraction, Representation, Visualisation. Data sources: Genes, Expression, Targets, Chem, Literature, Patent, CI. Components: automated ETL engines; focused NLP extraction; ontologies; ETL (biz rules, scoring); DataMarts; Disease/Target KB; Disease KB; complex data query; direct project queries; CIRA TSR Interface; CVGI TSR Interface; Disease KB Interface.
  • Slide 28
  • Workflow technology: enables scientists to use, modify and implement solutions that specialist groups help them put in place; removes (in principle) the need to make extensive IS projects for new data types.
  • Slide 29
  • The Knowledge Technology Ziggurat. Decision making process: Find, Extract, Integrate, Create. Builds on: Content Licensing & Access; Document Retrieval and Storage; Fact Extraction (Text Mining); Information/Knowledge Structuring; Modelling. Unstructured information -> developing semantic relationships -> KNOWLEDGE BASES. Labels: Current focus; Systems biology.
  • Slide 30
  • Sequences; patented inhibitors; literature inhibitors and PDB ligands; HTS, focused screens & project SAR data; docking & virtual screening; fingerprint structure search; competitor compounds; library and fragment data; AZ protein and ligand
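"Fingerprint structure search" usually means ranking library compounds by the similarity of their structural fingerprints to a query. The sketch below uses the Tanimoto coefficient on sets of "on" bit positions. The fingerprints and compound names are invented for illustration; a real search would compute fingerprints from chemical structures with a cheminformatics toolkit, which is not shown here.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0
    return len(bits_a & bits_b) / union


# Hypothetical fingerprints, each given as the set of bit positions that are on.
query = {1, 4, 7, 9}
library = {
    "cmpd_A": {1, 4, 7, 9},
    "cmpd_B": {1, 4, 8},
    "cmpd_C": {2, 3},
}

# Rank library compounds by similarity to the query, most similar first.
hits = sorted(
    ((name, tanimoto(query, fingerprint)) for name, fingerprint in library.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

Representing fingerprints as sets keeps the example short; packed bit vectors are the usual choice when searching large compound collections.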