
Systems Biology for Drug Discovery


Page 1: Systems Biology for Drug Discovery

Copyright GeneGo 2000-2003

CONFIDENTIAL

Systems Biology for Drug Discovery

Building and using protein interaction networks: industry perspective

Andrej Bugrim, GeneGo, Inc.

Page 2: Systems Biology for Drug Discovery

Topics

• Annotation process and collecting network content for industrial-type applications

• Biological and disease ontologies – how to improve and use them in functional analysis

• Tools: utilizing network data in pharmaceutical R&D

Page 3: Systems Biology for Drug Discovery

Multi-level understanding of human biology

Level of phenotype

Level of cell process/network

Level of protein

Causative relations

Mechanistic relations

Page 4: Systems Biology for Drug Discovery

Disease-centered knowledge base in MetaMiner (Oncology example)

GG annotation team:
• Disease group: causative disease associations (DNA, RNA, protein levels)
• Network group: protein-protein, protein-DNA, protein-RNA interactions
• Chemistry group: ligand-receptor interactions (drugs, leads, hits)
• Specialty group: biomarkers

Diagram labels: BC-perturbed cell processes; causative BC models; general BC schema; compare with other cancers chosen by Consortium

Page 5: Systems Biology for Drug Discovery


Content

Page 6: Systems Biology for Drug Discovery

Three interaction domains in MetaCore

Diagram labels: ligands (metabolites, peptides, xenobiotics); membrane receptors; signal transduction (G proteins, secondary messengers, kinases, phosphatases); transcription factors; core effect: metabolic pathways; metabolites

• 1,600 drugs w/targets
• 4,100 endogenous metabolites
• >21,000 ligand-receptor interactions
• 850 GPCRs and other membrane receptors
• 110 nuclear hormone receptors

• 172K manually curated physical signaling interactions
• 538 canonical maps
• 42,000 13-step canonical signal transduction pathways

• 924 human transcription factors
• 6,000 target genes

• 11,300 metabolic reactions
• 116 fine metabolic maps
• 4,100 endogenous metabolites

Page 7: Systems Biology for Drug Discovery

MetaBase Content Overview

– Database
• Chemical compounds 580,000
• Drugs 8,590
• Chemical reactions 35,600
• Metabolic networks 251

– Network
• Proteins + genes 13,402
• Transcription factors 924
• Chemical compounds 26,000
• Drugs 2,740
• Endogenous compounds 4,100
• Proteins linked to drugs 2,711
• Reactions 5,330
• Small molecule ligands for human receptors 3,510
• Blockers for ion channels 629
• PubMed journals 3,100
• PubMed articles 81,400
• Total number of interactions 177,000

– Content
• GeneGo regulatory networks 120
• GeneGo disease networks 88
• Maps 538
• Regulatory maps 325
• Metabolic maps 116
• Traditional metabolic maps (EC) 97
• Diseases 4,920

Page 8: Systems Biology for Drug Discovery

MetaBase content by type

Database:
• Genes (human: 38,700), total: 137,500
• Human proteins: 14,570
• Chemical compounds: 580,000
• Metabolic reactions: 35,600

Chemical compounds by type:
• Endogenous compounds: 4,100
• Drugs: 8,590
• Drug metabolites: 3,422
• Compounds in reactions: 15,700
• Compounds in network: 25,662
• Compounds with structures: 27,418
• Reaction substrates with kinetic data: 3,580

Page 9: Systems Biology for Drug Discovery

Network interactions

All interactions taken from articles indexed in PubMed
• PubMed journals: 3,100
• PubMed articles: 81,400

Manually curated interactions (172,787)
• Signalling interactions: 137,297 (79%)
• Metabolic reactions: 35,490 (21%)

Signalling interactions by type
• Protein-protein: 87,675 (51%)
• Small molecule-protein: 42,383 (26%)
• Y2H "Interactome": 2,370 (1%)
• Logical relations: 1,934 (1%)
• With microRNA: 1,620 (1%)
• ChIP-chip: 980 (1%)
• With virus proteins: 335 (0%)

Protein-protein interactions
• Activation/inhibition via binding: 43,079 (52%)
• Regulation of transcription: 15,725 (21%)
• Influence on expression: 10,120 (14%)
• Covalent modification: 5,967 (8%)
• Unspecified regulation: 3,990 (5%)

Small molecule-protein
• Binding to receptors: 14,497 (34%)
• Regulation of enzymes: 8,898 (21%)
• Binding to kinases: 6,984 (16%)
• Regulation of other proteins: 6,218 (15%)
• Regulation of transporters: 5,786 (14%)

Page 10: Systems Biology for Drug Discovery

Type of interactions in network

Effects:
• activation
• inhibition
• unspecified

Mechanism, direct interaction:
• phosphorylation
• dephosphorylation
• other type of covalent modification
• binding
• transport
• cleavage
• transcription regulation
• transformation
• catalysis
• competition

Mechanism, indirect interaction:
• influence on expression
• unspecified

Page 11: Systems Biology for Drug Discovery

Distribution of interactions by mechanism

• binding: 48%
• transcription regulation: 15%
• influence on expression: 12%
• catalysis: 8%
• unspecified: 6.4%
• phosphorylation: 4.1%
• cleavage: 2%
• transport: 2%
• transformation: 1%
• covalent modification: 1%
• dephosphorylation: 0.5%
• competition: 0.1%

Page 12: Systems Biology for Drug Discovery

Network objects

Total number of nodes: 40,229

• Metabolic reactions: 5,353

• Proteins: 13,406
– Enzymes: 2,910
– Kinases: 626
– Phosphatases: 137
– Proteases: 352
– Transcription factors: 924
– Membrane receptors: 764
– Nuclear hormone receptors: 110
– Receptor ligands: 640
– Transporters: 804
– Ion channels: 217
– Other: 5,922

• Chemical compounds: 25,662
– Xenobiotic compounds: 15,955
– Drugs: 2,741
– Drug metabolites: 1,032
– Metabolites of xenobiotics: 1,924
– Endogenous compounds: 4,010

Page 13: Systems Biology for Drug Discovery

Proteins: distribution by tissue & localization

[Bar chart "Proteins: distribution by tissue", axis 0 to 10,000 proteins. Categories: Adrenal Gland, Brain, Colon, Heart, Kidney, Liver, Lung, Mammary Gland, Marrow, Ovary, Pancreas, Placenta, Prostate, Retina, Salivary Gland, Skin, Spinal Cord, Spleen, Testes, Thymus, Thyroid, Tonsil, Trachea, Upper GI Tract, Uteri, and proteins common for all these tissues.]

[Bar chart "Proteins: distribution by cell compartment", logarithmic axis 1 to 100,000 proteins. Categories: lysosome, actin cytoskeleton, cytoskeleton, proteinaceous extracellular matrix, Golgi apparatus, intracellular, endoplasmic reticulum, cytosol, membrane, soluble fraction, mitochondrion, extracellular space, membrane fraction, extracellular region, integral to membrane, plasma membrane, cytoplasm, integral to plasma membrane, nucleus, unspecified.]

Page 14: Systems Biology for Drug Discovery

Molecular functions in Database

• binding: 8,503 (46%)
• catalytic activity: 4,086 (23%)
• signal transducer activity: 2,535 (13%)
• transcription regulator activity: 1,396 (7%)
• transporter activity: 1,078 (6%)
• enzyme regulator activity: 599 (3%)
• structural molecule activity: 459 (2%)
• motor activity: 77 (0%)
• translation regulator activity: 75 (0%)
• antioxidant activity: 51 (0%)
• chaperone regulator activity: 11 (0%)
• chemoattractant activity: 8 (0%)
• chemorepellent activity: 3 (0%)

Page 15: Systems Biology for Drug Discovery

Endogenous compounds (4,100 total)

Endogenous compounds by origin:
• Lipids: 43%
• Other: 19%
• Carbohydrates: 15%
• Peptides: 10%
• Vitamins/co-factors: 6%
• Fatty acids: 5%
• Steroids: 4%
• Nucleotides: 2%

• 3,070 endogenous compounds involved in metabolic reactions: 6,819 reactions with endogenous compounds only
• 751 endogenous ligands for 498 receptors, with 2,455 interactions
• 4,000 (98%) of endogenous compounds in network
• 15,962 network interactions with endogenous metabolites
• 3,600 compounds with structures and molecular (brutto) formulas (the other 700 are “generic”: they contain acyl-, alkyl- and other variable groups)

Page 16: Systems Biology for Drug Discovery

Network and pathway statistics in GeneGo

• >40,000 nodes
• ~177,000 edges
• Average node degree: 3.77
• 241 million shortest paths
• Average shortest path length: 5.3811

• 42,000 13-step canonical signal transduction pathways
• 200 canonical metabolic pathways: major metabolic fluxes like glycolysis or TCA
• 72,000 pathways on metabolic maps: pathways analogous to KEGG (KEGG has 42,500)

[Diagram labels: Enzyme1, Enzyme2, reaction1, reaction2, metabolite]
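The two summary statistics quoted on this slide, average node degree and average shortest path length, can be illustrated on a toy graph. The sketch below is only an illustration under stated assumptions: it treats the graph as undirected and unweighted, the four-node edge list is hypothetical, and it is not the procedure GeneGo used on its (presumably directed) interaction network.

```python
from collections import deque


def average_degree(edges):
    """Average node degree of an undirected graph: 2 * |E| / |V|."""
    nodes = {node for edge in edges for node in edge}
    return 2 * len(edges) / len(nodes)


def average_shortest_path_length(edges):
    """Mean BFS distance over all ordered pairs of connected, distinct nodes."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    total_length, pair_count = 0, 0
    for source in adjacency:
        # Breadth-first search gives shortest hop counts from `source`.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total_length += sum(distance.values())
        pair_count += len(distance) - 1  # exclude the source itself
    return total_length / pair_count


# Hypothetical toy network: a simple chain A - B - C - D.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]
toy_degree = average_degree(toy_edges)
toy_path_length = average_shortest_path_length(toy_edges)
```

On a network with tens of thousands of nodes an all-pairs BFS like this is still feasible but slow in pure Python; a graph library would be the practical choice.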

Page 17: Systems Biology for Drug Discovery

Pathways in regulatory network

[Diagram: example pathways through nodes A, B, C, D and Z, with kinase phosphorylation steps (+P) and transcription regulation steps (Tr)]

Start: TMR (transmembrane receptor), TF (transcription factor)

End: target genes

Page 18: Systems Biology for Drug Discovery


Ontologies

Page 19: Systems Biology for Drug Discovery

Mixed ontologies

Knowledge base (ontologies)

By genre: Drama, Action, Romance, Horror, Foreign

By director: Lynch, Tarantino, Leone, Stone, Antonioni

By actor: Pitt, Nicholson, Depp, Redford, Damon

By year: 2007, 2006, 2005, 2004, 2003

• How do you compare “action” movies vs. Tarantino movies vs. 2003 movies?
• These are incomparable because they belong to different categories

Biological counterparts: molecular pathway, cellular process, disease, metabolic process

Page 20: Systems Biology for Drug Discovery

Multiple ontologies in MetaDiscovery Platform: multi-dimensional knowledge base on human biology

Page 21: Systems Biology for Drug Discovery

Enrichment in GO and GeneGo processes

• 4 samples from 4 patients
• Disease/normal from the same patients
• Affy U133A arrays

GO processes:
• Resolution: list of proteins
• No connections between proteins
• No signaling/effect within process

GeneGo process networks:
• Resolution: interactions between proteins
• Connections between all proteins in folder
• Clear signaling path, effect within process

Page 22: Systems Biology for Drug Discovery

Inflammation

• Genes from GO process “Inflammatory response”: 231 (in networks: 152; not in networks: 79)
• Genes from GO process “Immune response”: 446 (in networks: 247; not in networks: 199)
• Genes from GO processes “Inflammatory response” and “Immune response” combined: 613 (in networks: 345; not in networks: 268)
• Genes in 15 process networks: 1,642 (genes added to networks: 1,297)

Page 23: Systems Biology for Drug Discovery

Diseases

6,318 genes are linked to 1,630 diseases

4,881 diseases, based on MeSH:
• Diseases linked to genes: 1,630 (34%)
• Diseases with no gene links: 3,251 (66%)

38,709 human genes total:
• Human genes linked to diseases: 6,318 (17%)
• Human genes not linked to diseases: 32,391 (83%)

21,264 unique articles, indexed in PubMed

Page 24: Systems Biology for Drug Discovery

Disease tree – Neoplasms by Site

Page 25: Systems Biology for Drug Discovery

Drug toxicity tree

Folders from MeSH

Folders created at GeneGo based on reviews

38 Drug-induced pathological processes

Page 26: Systems Biology for Drug Discovery

Gene-Disease connections in public domain and GeneGo

Public domain does not have structured information about disease connectivity (by clinical classification) and causative relations with genes and proteins:
• GENE: only citation with disease name; low trust
• MeSH: only hierarchical structure (disease tree)
• OMIM: only genetic info (mutations, SNPs); no expression; no protein activity or localization

GeneGo:
• Hierarchically structured disease classification: 4,888 diseases
• Genes associated with diseases: 6,429
• Cited articles: 33,792

Page 27: Systems Biology for Drug Discovery

Content. Cancer maps and networks. Breast Cancer: general scheme

Page 28: Systems Biology for Drug Discovery

Angiogenesis in tumor growth

Page 29: Systems Biology for Drug Discovery

Fine metabolic differences between rodents and human

Unique genes: human vs. mouse, rat

Gene counts shown on the diagram:
• 141 mouse genes, 74 rat genes
• 9 mouse genes, 2 rat genes
• 1 mouse gene, 1 rat gene

Cases illustrated:
• Unique genes catalyze unique reactions
• Unique genes and orthologs catalyze one reaction
• Orthologs catalyze different reactions
• There are no human orthologs for Protein A

Page 30: Systems Biology for Drug Discovery


Tools

Page 31: Systems Biology for Drug Discovery

Data analysis workflow in MetaDiscovery suite

Data and custom content:
• Molecular bio data: HTS, HCS
• Metabolites
• Structures: sdf, MOL; ISIS DB
• Custom interactions data: Y2H, pull-down, co-expression, annotation
• Custom maps, networks, pathways

Customization modules: PathwayEditor, MetaLink, MapEditor

MetaCore/MetaDrug platform:
• P-value scoring
• Ontologies: GO processes, GeneGo processes, canonical pathways, metabolic networks, diseases, toxicities
• Signature networks: diseases, drug response
• Cross-experiment comparison: time series, multi-patient cohorts, multiple logical operations, complete report
• Network alignment: multiple algorithms, sub-network queries

Applications:
• Med. chemistry: indications, toxicities, off-site effects
• Biology: biomarkers, pathway-based targets
• Modeling software (SBML, BioPax): CellDesigner, Virtual Cell

Page 32: Systems Biology for Drug Discovery

MetaCore™ Platform

• Oracle-based database: curated interactions from the literature
• Data: microarrays, SAGE, proteomics, siRNA, metabolites, custom interactions
• Networks building tools
• Visualization tools
• Logical operations module
• Pathway editor
• Statistics for pathways, processes, networks

Page 33: Systems Biology for Drug Discovery


Networks of protein interactions

– Dynamic; built “on-the-fly”

– Exploratory tool

– Build new pathways for genes of interest

Pathways Integration

Interactive, static maps

– 550 maps

– Signaling, regulation, metabolism, diseases

– Backbone of formalized “state of art” in the field

Page 34: Systems Biology for Drug Discovery

Choose direction and checkpoints within network building page

From – histamine
through – histamine H1 receptor
to – Actin
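A "from, through, to" query of this kind can be pictured as a path search constrained to pass a checkpoint. The sketch below is a hypothetical illustration, not MetaCore's network building algorithm: it joins two breadth-first shortest paths on a toy directed graph, and the intermediate nodes `X0`, `X1` and `X2` are placeholders rather than real interactions.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first shortest path in a directed graph; None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def path_via(graph, start, checkpoint, goal):
    """Join start->checkpoint and checkpoint->goal shortest paths."""
    first_leg = shortest_path(graph, start, checkpoint)
    second_leg = shortest_path(graph, checkpoint, goal)
    if first_leg is None or second_leg is None:
        return None
    return first_leg + second_leg[1:]  # do not repeat the checkpoint


# Hypothetical toy graph; X0, X1, X2 are placeholder intermediates.
toy_graph = {
    "histamine": ["histamine H1 receptor", "X0"],
    "histamine H1 receptor": ["X1"],
    "X0": ["Actin"],  # shorter route that bypasses the checkpoint
    "X1": ["X2"],
    "X2": ["Actin"],
}
route = path_via(toy_graph, "histamine", "histamine H1 receptor", "Actin")
```

Without the checkpoint the search would return the shorter route through `X0`; with it, the route is forced through the receptor. Joining two legs this way can revisit a node, which a production implementation would need to handle.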

Page 35: Systems Biology for Drug Discovery

False discovery rate filter

Threshold: 0.01

Non-significant bars become semi-transparent
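The slide does not say which false discovery rate procedure the filter applies. As a minimal sketch, assuming the common Benjamini-Hochberg step-up procedure, the filter could flag significant bars as follows; the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return a list of booleans: True where a p-value passes the BH filter.

    Benjamini-Hochberg: sort the p-values, find the largest rank k such that
    p_(k) <= (k / m) * fdr, and call the k smallest p-values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for index in order[:cutoff_rank]:
        significant[index] = True
    return significant


# Hypothetical p-values, one per bar of an enrichment histogram.
bar_p_values = [0.9, 0.001, 0.041, 0.008, 0.039]
strict = benjamini_hochberg(bar_p_values, fdr=0.01)
relaxed = benjamini_hochberg(bar_p_values, fdr=0.05)
```

Bars whose flag is `False` would be the ones drawn semi-transparent. Tightening the threshold from 0.05 to 0.01 drops one of the two bars that pass in this toy example.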

Page 36: Systems Biology for Drug Discovery

New customization modules

• MapEditor: custom maps synchronized with MC/MD database
– Draw pathway maps from scratch
– Transform gene lists into networks and then into pathway maps
– Edit MetaCore’s canonical maps
– View and score your maps within the context of canonical maps
– Map experimental data on custom maps

• MetaLink: overlaying custom interactions
– Import custom interactions (Y2H, co-expression, pull-down, etc.)
– Visualize using GeneGo network building algorithms
– Score “unknown” proteins (high IP potential) based on relevance to “benchmark” networks built from MetaCore interactions

• PathwayEditor: annotation technology transfer, at the database level
– Custom annotation of interactions, compounds, diseases, metabolism in the framework of the internal annotation system at GeneGo
– Use the annotation forms, workflows and QC system developed at GeneGo
– Novel objects are imported and integrated with pre-existing data in MetaCore

Page 37: Systems Biology for Drug Discovery

Adding Localizations

Additional Localizations can be added

Page 38: Systems Biology for Drug Discovery

Your NEW map is now an interactive part of MetaCore

Users can visualize their experimental data on the new map

Page 39: Systems Biology for Drug Discovery

Mapping interaction sets on networks

Resulting Direct Interactions network

• Pink interactions are from the uploaded links file
• Blue interactions are in both the links file and the MetaCore database
• Mouse over an interaction to see the uploaded weight value

Page 40: Systems Biology for Drug Discovery


Algorithms

Page 41: Systems Biology for Drug Discovery

Old and new ways to analyze data

Current way of analysis: all significance calculations done before mapping onto network
• Full data tables
• Statistical procedures, thresholds of fold change and p-value, either in MC or 3rd party tools
• Sets of genes
• Connect them on network one way or another: too many choices, no clear way to choose

New way of analysis: significance calculations follow the mapping onto network
• Full data tables
• Apply to global network
• Statistical procedures in MC based on concurrent analysis of expression profiles and connectivity
• Sets of network modules

Page 42: Systems Biology for Drug Discovery

Samples are analyzed in pathway’s expression space

          Sample 1   Sample 2   Sample 3   Sample 4
Gene 1       1          4          3          2
Gene 2       4          2          7          6
Gene 3       2          9          3          8
Gene 4       2          5          4          2
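One way to read this table: each sample is a point whose coordinates are the expression values of the pathway's genes, so samples can be compared by their distance in that space. The sketch below uses the values from the slide; the choice of Euclidean distance is an assumption for illustration, since the slide does not name a metric.

```python
import math

genes = ["Gene 1", "Gene 2", "Gene 3", "Gene 4"]
samples = ["Sample 1", "Sample 2", "Sample 3", "Sample 4"]

# Expression values from the slide: one row per gene, one column per sample.
expression = {
    "Gene 1": [1, 4, 3, 2],
    "Gene 2": [4, 2, 7, 6],
    "Gene 3": [2, 9, 3, 8],
    "Gene 4": [2, 5, 4, 2],
}


def sample_vector(sample):
    """Coordinates of one sample in the pathway's gene-expression space."""
    column = samples.index(sample)
    return [expression[gene][column] for gene in genes]


def distance(sample_a, sample_b):
    """Euclidean distance between two samples (metric is an assumption)."""
    a, b = sample_vector(sample_a), sample_vector(sample_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_sample(sample):
    """The other sample closest to `sample` in expression space."""
    others = [s for s in samples if s != sample]
    return min(others, key=lambda other: distance(sample, other))


closest_to_first = nearest_sample("Sample 1")
```

Clustering or classifying samples per pathway then amounts to grouping these points, one expression space per pathway.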

Page 43: Systems Biology for Drug Discovery

Network signatures for compound effects

Mestranol

Tamoxifen

Phenobarbital

Phenobarbital

Page 44: Systems Biology for Drug Discovery

Finding topologically significant nodes

[Diagram: nodes A, B and C with their downstream targets. A and B are labeled topologically significant; C is labeled not topologically significant. Legend: differentially expressed genes, non-differentially expressed genes.]

• 4 out of 6 nodes regulated by B are differentially expressed: more than a random share = significant
• Only 1 out of 6 nodes regulated by C is differentially expressed: could be due to a random event = not significant

In reality, the algorithm also considers nodes beyond first-degree neighbors
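The "more than random share" argument can be made concrete with a hypergeometric tail probability. The sketch below is an assumption-laden illustration, not the GeneGo algorithm: it looks only at first-degree neighbors, and the background of 100 genes with 10 differentially expressed is hypothetical.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` items without replacement
    from `population` items of which `successes` are marked."""
    total = comb(population, draws)
    upper = min(draws, successes)
    favorable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favorable / total


# Hypothetical background: 100 genes on the network, 10 differentially expressed.
BACKGROUND_GENES = 100
DIFFERENTIAL_GENES = 10

# Node B regulates 6 nodes, 4 of them differentially expressed.
p_node_b = hypergeom_tail(BACKGROUND_GENES, DIFFERENTIAL_GENES, 6, 4)
# Node C regulates 6 nodes, only 1 of them differentially expressed.
p_node_c = hypergeom_tail(BACKGROUND_GENES, DIFFERENTIAL_GENES, 6, 1)
```

Under these assumed numbers the probability for B is far below the usual 0.05 cutoff, while seeing at least one hit among C's six targets is close to a coin flip, which matches the slide's verdict for both nodes.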

Page 45: Systems Biology for Drug Discovery

Why is JAK1 significant in this dataset?

Regulation via JAK1

JAK1 provides essential network conduit between PLAUR and many differentially expressed targets of STAT1

Topological significance helps to find important links in pathways that do not come up on HT screens

Feedback loops

Page 46: Systems Biology for Drug Discovery

Regulation of lipid metabolism

Differentially expressed genes identified by microarray and confirmed by proteomic screen

Topologically significant nodes revealed by the new algorithm

Page 47: Systems Biology for Drug Discovery

Putting it all together: network activity inference

– Identifying causal relation between putative input and output signals
– Tracking effects of molecular perturbation through activation/inhibition cascades

Experimental data: start cascade

Experimental data: terminate cascade

Inferred activity

Experimental data

Predicted input

Predicted target

Scoring intermediary nodes

Page 48: Systems Biology for Drug Discovery

Work in progress

• Finding patterns of significance (based on one experiment):
– Significant neighborhoods
– Significant receptors (by underlying cascade)
– Significant transcription factors (by upstream cascade)
– Significant interaction types (by distribution of expression at terminals)

• Finding common and different pathway modules (based on multiple samples):
– Looking for “differential pathways”: modules that distinguish one group of samples from another
– Finding common motifs in a group of pathway modules

• Inferring patterns of network activity
– Identifying causal relation between putative input and output signals
– Tracking effects of molecular perturbation through activation/inhibition cascades

• Looking into mutual gene-process information and Bayesian inference of significance
– If gene G occurs only in process P, its up-/down-regulation is significant evidence with respect to inferring P’s status
– If gene G occurs in many other processes in addition to P, its up-/down-regulation is not significant evidence with respect to inferring P’s status
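The last point can be illustrated with a toy Bayes calculation. The sketch below makes strong simplifying assumptions that are not from the slides: exactly one process is perturbed, all processes are equally likely a priori, and a gene changes only if it belongs to the perturbed process. The process and gene names are hypothetical.

```python
from fractions import Fraction


def posterior_process_given_gene(memberships, gene, process):
    """P(process is the perturbed one | gene is up-/down-regulated).

    Toy model: uniform prior over processes; likelihood of the gene
    changing is 1 if it belongs to the perturbed process, else 0.
    """
    processes = list(memberships)
    prior = Fraction(1, len(processes))
    likelihood = {
        p: Fraction(1 if gene in memberships[p] else 0) for p in processes
    }
    evidence = sum(likelihood[p] * prior for p in processes)
    if evidence == 0:
        return Fraction(0)  # gene belongs to no known process
    return likelihood[process] * prior / evidence


# Hypothetical gene-process memberships.
memberships = {
    "P": {"G_unique", "G_shared"},
    "Q": {"G_shared"},
    "R": {"G_shared"},
    "S": {"G_shared"},
}
specific_evidence = posterior_process_given_gene(memberships, "G_unique", "P")
diluted_evidence = posterior_process_given_gene(memberships, "G_shared", "P")
```

A gene found only in P pins the perturbation on P, while a gene shared by four processes leaves P no more likely than any of the others, which is the intuition behind weighting genes by their process specificity.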

Page 49: Systems Biology for Drug Discovery


Future products

Page 50: Systems Biology for Drug Discovery

MetaMiner Consortiums for 2007

• Oncology (breast cancer, 4 other cancers)

• Metabolic diseases (diabetes II, obesity, metabolic syndrome)

• CNS and neurodegenerative diseases

• Immunological and autoimmune diseases

Page 51: Systems Biology for Drug Discovery

MetaMiner consortiums: Analytical platform for disease areas

• Cancer consortium labs: HTS, HCS
• Experimental data depository: data parsing, normalization; data analysis
• MetaMiner (Oncology) platform:
– Cancer-relevant annotations, databases
– Active compounds analysis, screening
– Maps for disease, processes, drug action
– Custom maps for projects
• Compounds scoring: indications, toxicities, off-site effects
• Drug targets: divergence hubs on networks; “druggability” testing; pathways connectivity
• Biomarkers: combination of different types (expression, secreted proteins, metabolites); convergence hubs (core effectors)

Page 52: Systems Biology for Drug Discovery

MetaTox consortium. Functional descriptors

• Enrichment by category
• Pathways maps
• Toxicity, process maps
• Sub-networks, modules, nodes

Mapping on descriptors

Indexing & scoring by tox. category

Predictive models