
2017-07-19 john ismb2017 poster



Explanation of drug effects using a mechanistic model automatically assembled from natural language, databases, and literature

John A. Bachman¹*, Benjamin M. Gyori¹*, and Peter K. Sorger¹
*These authors contributed equally to this work
¹Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA

The capacity of modern experimental methods to generate data about biological processes has surpassed the ability of existing informatics approaches to derive meaningful mechanistic explanations. Mechanistic systems biology models could address this gap, but model construction remains a labor-intensive process requiring both biological knowledge and modeling expertise. As a result, modeling studies remain small in scope and disconnected from genome-scale research. For mechanistic models to attain the necessary scope, methods for automatically assembling and analyzing large models from available knowledge sources will be required. Here we describe the use of the Integrated Network and Dynamical Reasoning Assembler (INDRA) [1] to assemble mechanistic facts from databases and literature into a rule-based Kappa [2] model that explains observations in a previously published phosphoproteomic dataset [3]. Explanations were generated by identifying paths through the rule influence map between drug targets and measured protein nodes. The model yielded detailed, biochemically plausible explanations for 20 of the 22 largest effects (91%) and for 95 of 135 smaller effects (70%). Performance could be further improved by supplying manually curated mechanistic information in the form of natural language.

INTRODUCTION

RESULTS

Availability: https://github.com/sorgerlab/indra
Funding: DARPA Big Mechanism program, ARO contract W911NF-14-1-0397

Phosphorylation(RAF, MEK)
Phosphorylation(BRAF, MAP2K1)
Phosphorylation(BRAF, MAP2K1, S)   Phosphorylation(BRAF, MAP2K1, 218)   Phosphorylation(BRAF, MAP2K1, 222)
Phosphorylation(BRAF, MAP2K1, S, 218)   Phosphorylation(BRAF, MAP2K1, S, 222)

Mechanisms are normalized into Statements

Correcting systematic errors in entity grounding

Applying mechanistic models to the interpretation of data requires that named entities extracted from text (genes, proteins, small molecules, biological processes, etc.) be correctly “grounded” to identifiers in the relevant databases. This is often challenging because gene names have overlapping synonyms and many acronyms are ambiguous. A further problem is that mechanisms described in the literature often refer to protein families and named complexes, which cannot be mapped directly to the genes and proteins measured in experimental data. To solve this problem, we created Bioentities (http://github.com/sorgerlab/bioentities), a resource that captures the hierarchical relationships among genes/proteins, protein families, and named complexes.

In addition, INDRA includes a curated “grounding map” that maps commonly encountered entities in text to the relevant database identifiers.
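Grounding can be sketched as a dictionary lookup followed by expansion of a family or complex into its member genes, which is what lets a mechanism stated about "JNK" be related to measured MAPK8/9/10. This is a minimal illustration under stated assumptions, not INDRA's code: the map entries, the `BE` namespace label, and the helper names `ground` and `expand_to_genes` are hypothetical, and AMPK membership is flattened to its subunit genes for simplicity.

```python
# Hypothetical grounding map: text synonym -> (namespace, identifier).
GROUNDING_MAP = {
    "JNK": ("BE", "JNK"),
    "JNKs": ("BE", "JNK"),
    "c-Jun N-terminal kinase": ("BE", "JNK"),
    "AMPK": ("BE", "AMPK"),
}

# Hypothetical family/complex membership, flattened to member genes.
MEMBERS = {
    "JNK": ["MAPK8", "MAPK9", "MAPK10"],
    "AMPK": ["PRKAA1", "PRKAA2", "PRKAB1", "PRKAB2",
             "PRKAG1", "PRKAG2", "PRKAG3"],
}


def ground(text):
    """Return (namespace, identifier) for a text mention, or None."""
    return GROUNDING_MAP.get(text)


def expand_to_genes(text):
    """Map a text mention to gene-level names that can be measured."""
    grounding = ground(text)
    if grounding is None:
        return []  # ungrounded mentions cannot be linked to data
    _, ident = grounding
    # Families/complexes expand to members; anything else is its own gene.
    return MEMBERS.get(ident, [ident])


print(expand_to_genes("JNKs"))
```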

Representation of “AMPK” in the Bioentities hierarchy

Synonyms for the JNK protein family in the INDRA grounding map

Identifying relationships between mechanisms

Relations can be organized into a hierarchy based on their specificity

A key challenge in assembling detailed mechanistic networks is that a single mechanism may be described at different levels of specificity across the literature and databases. Reconciling these overlapping descriptions is essential to avoid spuriously distinct edges in the assembled model. Using hierarchical ontologies of protein modification types and activity types, together with the protein family information in Bioentities, INDRA implements duplicate removal, hierarchy-based redundancy resolution, and other forms of error correction and mechanism linking.
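The refinement relation behind hierarchy-based redundancy resolution can be sketched with a toy statement type. Assumptions: the `PARENT` table stands in for Bioentities, `Phos` is a simplified stand-in for INDRA's Phosphorylation Statement, and keeping only the most specific statements is just one illustrative policy (the text above describes linking related mechanisms, not necessarily discarding the general ones).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical family hierarchy (child -> parent), standing in for Bioentities.
PARENT = {"BRAF": "RAF", "RAF1": "RAF", "MAP2K1": "MEK", "MAP2K2": "MEK"}


def isa(child: Optional[str], parent: str) -> bool:
    """True if `child` equals `parent` or is transitively a member of it."""
    while child is not None:
        if child == parent:
            return True
        child = PARENT.get(child)
    return False


@dataclass(frozen=True)
class Phos:
    """Simplified phosphorylation statement; None means 'unspecified'."""
    enzyme: str
    substrate: str
    residue: Optional[str] = None
    position: Optional[str] = None

    def refinement_of(self, other: "Phos") -> bool:
        """True if self is at least as specific as `other` in every field."""
        return (isa(self.enzyme, other.enzyme)
                and isa(self.substrate, other.substrate)
                and other.residue in (None, self.residue)
                and other.position in (None, self.position))


def most_specific(stmts):
    """Remove exact duplicates, then drop statements refined by another one."""
    unique = set(stmts)
    return {s for s in unique
            if not any(o != s and o.refinement_of(s) for o in unique)}


stmts = [
    Phos("RAF", "MEK"),
    Phos("BRAF", "MAP2K1"),
    Phos("BRAF", "MAP2K1", "S"),
    Phos("BRAF", "MAP2K1", position="218"),
    Phos("BRAF", "MAP2K1", "S", "218"),
    Phos("BRAF", "MAP2K1", "S", "218"),  # exact duplicate from another source
    Phos("BRAF", "MAP2K1", "S", "222"),
]
print(sorted(most_specific(stmts), key=lambda s: s.position))
```

With these example statements, the exact duplicate collapses and every general statement is refined by a site-specific one, leaving only the two fully specified MAP2K1 sites.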

The Integrated Network and Dynamical Reasoning Assembler (INDRA) [1] automatically assembles mechanistic models from pathway databases, literature, and expert knowledge expressed in natural language. INDRA draws on three existing natural language processing systems [4, 5, 6] and uses a modular architecture to build different types of models from a variety of sources.

Mechanisms extracted from each source format are normalized into Statements, an SBO-compatible internal representation, in which they are processed to remove errors, identify overlaps, and estimate reliability. Statements are designed to match the specificity and ambiguity with which biochemistry is described in text (e.g., “MEK1 phosphorylates ERK2” rather than a detailed reaction mechanism). The representation currently covers post-translational modifications, chemical conversions, protein expression and degradation, and generic activation/inhibition relationships.
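To make the normalization idea concrete, the sketch below mirrors a few attributes from the Statement and Agent class diagram (Agent `name` and `db_refs`; Evidence `source_api`, `text`, `pmid`) and merges extractions that describe the same mechanism while pooling their evidence. The class layout, the `matches_key` and `merge` helpers, and the reader names are simplified assumptions for illustration, not INDRA's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Agent:
    name: str
    db_refs: Dict[str, str] = field(default_factory=dict)


@dataclass
class Evidence:
    source_api: str
    text: str = ""
    pmid: Optional[str] = None


@dataclass
class Phosphorylation:
    enzyme: Optional[Agent]
    substrate: Agent
    residue: Optional[str] = None
    position: Optional[str] = None
    evidence: List[Evidence] = field(default_factory=list)

    def matches_key(self):
        """Identify the mechanism itself, ignoring supporting evidence."""
        enz = self.enzyme.name if self.enzyme else None
        return ("Phosphorylation", enz, self.substrate.name,
                self.residue, self.position)


def merge(stmts):
    """Collapse statements with equal keys, pooling their evidence."""
    merged = {}
    for s in stmts:
        key = s.matches_key()
        if key in merged:
            merged[key].evidence.extend(s.evidence)
        else:
            # Copy the evidence list so the input statements are not mutated.
            merged[key] = Phosphorylation(s.enzyme, s.substrate, s.residue,
                                          s.position, list(s.evidence))
    return list(merged.values())


a = Phosphorylation(Agent("MAP2K1", {"TEXT": "MEK1"}), Agent("MAPK1", {"TEXT": "ERK2"}),
                    evidence=[Evidence("reader_a", "MEK1 phosphorylates ERK2")])
b = Phosphorylation(Agent("MAP2K1", {"TEXT": "MEK-1"}), Agent("MAPK1", {"TEXT": "ERK-2"}),
                    evidence=[Evidence("reader_b", "ERK2 is phosphorylated by MEK1")])
merged = merge([a, b])
print(len(merged), [e.source_api for e in merged[0].evidence])
```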

Statements, Agent and components (class diagram; links denote "is a" (inheritance) and composition (has one or more, life-cycle dependence))
Statement: evidence : Evidence
Modification: enzyme : Agent; substrate : Agent; residue : string; position : string
AddModification (Modification): Phosphorylation, Hydroxylation, Ubiquitination, Acetylation, Glycosylation, Sumoylation, Ribosylation, Farnesylation, Geranylgeranylation, Palmitoylation, Myristoylation, Methylation
RemoveModification (Modification): Dephosphorylation, Dehydroxylation, Deubiquitination, Deacetylation, Deglycosylation, Desumoylation, Deribosylation, Defarnesylation, Degeranylgeranylation, Depalmitoylation, Demyristoylation, Demethylation
SelfModification: enzyme : Agent; residue : string; position : string. Subclasses: Autophosphorylation, Transphosphorylation
ActiveForm: agent : Agent; activity_type : string; is_active : boolean
Conversion: subj : Agent; obj_from : list[Agent]; obj_to : list[Agent]
Gef: gef : Agent; gtpase : Agent; gef_activity : string
Gap: gap : Agent; gtpase : Agent; gap_activity : string
RegulateActivity: subject : Agent; object : Agent; obj_activity : string. Subclasses: Activation, Inhibition
RegulateAmount: subject : Agent; object : Agent. Subclasses: IncreaseAmount, DecreaseAmount
Complex: members : list[Agent]
Agent: name : string; mods : list[ModCondition]; mutations : list[MutCondition]; bound_conditions : list[BoundCondition]; location : string; activity : ActivityCondition; db_refs : dict
ModCondition: mod_type : string; residue : string; position : string; is_modified : boolean
MutCondition: from_residue : string; to_residue : string; position : string
BoundCondition: agent : Agent; is_bound : boolean
ActivityCondition: activity_type : string; is_active : boolean
Evidence: text : string; source_api : string; source_id : string; pmid : string; annotations : dict; epistemics : dict

Conceptual overview of automated assembly

System architecture and approach

INDRA software architecture

Estimating the reliability of extracted mechanisms

Even state-of-the-art NLP and text-mining algorithms have limited accuracy: roughly 20-30% of extracted relations are misinterpretations of the corresponding sentence (“reader error”). Given empirical estimates of the per-sentence error rate of each reader, INDRA’s BeliefEngine component aggregates results to estimate the overall probability that a relation is the result of reader error. It does this by:

1) aggregating evidence from multiple sentences read by the same reader;

2) aggregating results from different reading algorithms on the same sentence;

3) propagating error estimates through the network of related statements.

Mechanisms can then be filtered with a precision threshold (e.g., 95% confidence).
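A simplified version of this aggregation treats each supporting sentence as an independent chance of error, so a statement is wrong only if all of its evidence is wrong. This is a sketch under that independence assumption: the error rates, reader names, and helper functions are hypothetical, and the BeliefEngine's actual probability model may differ (for example, in how it treats systematic errors). Evidence for more specific statements is pooled into the more general statements they refine, mirroring step 3.

```python
from math import prod

# Hypothetical per-sentence error rates, chosen within the 20-30% range above.
ERROR_RATES = {"reader_a": 0.3, "reader_b": 0.2}


def belief(sources):
    """P(at least one extraction is correct), assuming independent errors.

    `sources` lists the reader behind each supporting sentence; repeated
    entries model several sentences read by the same reader.
    """
    return 1.0 - prod(ERROR_RATES[src] for src in sources)


def propagated_belief(stmt, evidence, refinements):
    """Pool a statement's evidence with that of every statement refining it.

    evidence: stmt -> list of reader names
    refinements: stmt -> list of more specific stmts
    """
    pooled = list(evidence[stmt])
    stack, seen = list(refinements.get(stmt, [])), set()
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        pooled.extend(evidence[s])
        stack.extend(refinements.get(s, []))
    return belief(pooled)


evidence = {"MEK->ERK": ["reader_a"], "MAP2K1->MAPK1": ["reader_a", "reader_b"]}
refinements = {"MEK->ERK": ["MAP2K1->MAPK1"]}

beliefs = {s: propagated_belief(s, evidence, refinements) for s in evidence}
accepted = [s for s, b in beliefs.items() if b >= 0.95]  # precision threshold
print(beliefs, accepted)
```

Here the general statement reaches about 0.982 because it inherits the evidence of its refinement, while the specific statement alone (0.94) falls just below the 95% threshold.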

Reading systems produce partially overlapping extractions

Reliability estimates are propagated through the specificity hierarchy

Use case for explanation: interpreting phosphoproteomic data

REFERENCES
1. B. M. Gyori*, J. A. Bachman*, K. Subramanian, J. L. Muhlich, L. Galescu, and P. K. Sorger. “From word models to executable models of signaling networks using automated assembly.” bioRxiv, 2017.
2. V. Danos, J. Feret, W. Fontana, R. Harmer, and J. Krivine. “Rule-Based Modeling of Cellular Signaling.” Concurrency Theory (CONCUR) 2007, Lecture Notes in Computer Science, 4703:17–41, 2007.
3. E. J. Molinelli, A. Korkut, et al. “Perturbation biology: Inferring signaling networks in cellular systems.” PLoS Computational Biology, 9(12):e1003290, Dec 2013.
4. J. Allen, W. de Beaumont, L. Galescu, and C. M. Teng. “Complex event extraction using DRUM.” 2015.
5. M. A. Valenzuela-Escarcega, G. Hahn-Powell, T. Hicks, and M. Surdeanu. “A domain-independent rule-based framework for event extraction.” In Proc. 53rd Annual Meeting of the ACL-IJCNLP, 2015.
6. D. McDonald et al. “Extending Biology Models with Deep NLP over Scientific Articles.” Workshops at the 30th AAAI Conference on Artificial Intelligence, 2016.
7. C. F. Lopez*, J. L. Muhlich*, J. A. Bachman*, and P. K. Sorger. “Programming biological models in Python using PySB.” Molecular Systems Biology, 9:646, 2013.

Curated mechanisms for MTOR feedback inhibition on AKT


Model representations for statically identifying causal paths

Drug combinations

RPPA measurements

How did this happen?

http://www.sanderlab.org/pertbio/

Directed protein interaction graph

Kappa rule influence map [2]

Chemical reaction network

Mechanistic detail/causal context: more false-positive paths (less stringent context) versus more false-negative paths (more stringent context)

Boolean network

The assembly challenge

MEK phosphorylates ERK

ERK phosphorylates MEK

MEK1 phosphorylates ERK2 at T185

MEK1p218p222 phosphorylates ERK2 at T184

MEK1p218p222 phosphorylates ERK2 at T185

Methyl Ethyl Ketone phosphorylates ERK

“Raw” mechanisms

MEK phosphorylates ERK

MEK phosphorylates ERK

Assembled mechanisms

Generating mechanistic models from assembled Statements

In directed interaction graphs, the limited causal context leads to an explosion of paths between any two proteins. This produces many false-positive paths and makes identifying long causal chains in large networks difficult or even intractable.
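The path explosion is easy to reproduce on a toy graph: if every node between a source and a target branches to `width` alternatives over `depth` layers, the number of simple paths is width^depth. The layered graph and the helper names below are hypothetical illustrations, not the network used in this study.

```python
def count_simple_paths(graph, source, target, visited=frozenset()):
    """Count directed paths from source to target with no repeated node."""
    if source == target:
        return 1
    visited = visited | {source}
    return sum(count_simple_paths(graph, nxt, target, visited)
               for nxt in graph.get(source, []) if nxt not in visited)


def layered_graph(width, depth):
    """Toy graph: S -> `depth` fully connected layers of `width` nodes -> T."""
    layers = ([["S"]]
              + [[f"L{d}_{i}" for i in range(width)] for d in range(depth)]
              + [["T"]])
    graph = {}
    for current, following in zip(layers, layers[1:]):
        for node in current:
            graph[node] = list(following)
    return graph


for depth in (2, 4, 6):
    print(depth, count_simple_paths(layered_graph(3, depth), "S", "T"))
```

Each extra layer multiplies the path count by the branching factor, which is why chains of realistic length quickly become impractical to enumerate without additional causal context.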

Generating explanations from the Kappa [2] rule influence map

- identifying rules whose activity is increased by the abundance of the subject (e.g., a drug)

- searching for a path to an observable representing the object (e.g., a measured protein) with the appropriate overall polarity

- scoring paths by whether the signs of measured intermediate nodes are correctly predicted
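The three steps above can be sketched as a breadth-first search over a signed graph that tracks the cumulative sign of each partial path. Assumptions: the dictionary `edges` stands in for the Kappa rule influence map, edge signs are +1 (activating) or -1 (inhibiting), and the score (+1 per measured intermediate whose predicted sign matches the data, -1 per mismatch) is one possible scoring scheme; none of these names come from Kappa or INDRA.

```python
from collections import deque


def signed_paths(edges, source, target, polarity, max_len=6):
    """Breadth-first search for simple paths whose sign product == polarity.

    edges: node -> list of (neighbor, sign) with sign in {+1, -1}.
    Each path is returned as a list of (node, cumulative_sign) pairs.
    """
    results = []
    queue = deque([[(source, 1)]])
    while queue:
        path = queue.popleft()
        node, sign = path[-1]
        if node == target and len(path) > 1:
            if sign == polarity:
                results.append(path)
            continue
        if len(path) >= max_len:
            continue  # bound path length to keep the search tractable
        visited = {n for n, _ in path}
        for nbr, edge_sign in edges.get(node, []):
            if nbr not in visited:
                queue.append(path + [(nbr, sign * edge_sign)])
    return results


def score(path, measured):
    """+1 per measured intermediate whose sign is predicted, -1 per miss."""
    return sum((1 if measured[n] == s else -1)
               for n, s in path[1:-1] if n in measured)


# Hypothetical signed graph: the drug inhibits A; A acts on T via B or C.
edges = {"drug": [("A", -1)], "A": [("B", 1), ("C", 1)],
         "B": [("T", 1)], "C": [("T", -1)]}
for p in signed_paths(edges, "drug", "T", polarity=-1):
    print([n for n, _ in p], score(p, {"B": -1}))
```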

Causal path for “Pervanadate increases MAPK1 phosphorylation” (influence-map nodes: Pvd_binds_DUSP, Pvd_binds_DUSP_rev, DUSP_binds_MAPK1_phosT185, DUSP_binds_MAPK1_phosT185_rev, DUSP_dephos_MAPK1_at_T185, and the observable MAPK1_pT185)

Extending the model by describing mechanisms in English

“IGF1R phosphorylates IRS1 at tyrosine. Tyrosine-phosphorylated IRS1 binds PI3K. Serine phosphorylated IRS1 is degraded. Active PPP2CA dephosphorylates IRS1 at serine. Active MTOR inhibits PPP2CA.”

To build a mechanistic model, high-level assertions such as “MEK1 phosphorylates ERK1” must be converted into specific reaction mechanisms. INDRA uses user-specified policies that determine how each Statement type is implemented as PySB [7] rules and the corresponding reactions.

Phosphorylation(MEK1, ERK1) can be implemented with different policies:

- one-step (pseudo-first-order)

- one-step (Michaelis-Menten)

- two-step (enzyme-substrate complex formation)

- ATP-dependent (unordered bi-bi reaction)
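For intuition about what these policies change, the sketch below writes out rate laws for three of them. The function names and parameter values are hypothetical; INDRA's PySB assembler emits rules rather than these Python functions, and the ATP-dependent bi-bi policy is omitted.

```python
def one_step_mass_action(k, E, S):
    """One-step, pseudo-first-order policy: v = k * E * S."""
    return k * E * S


def one_step_michaelis_menten(kcat, Km, E, S):
    """One-step Michaelis-Menten policy: v = kcat * E * S / (Km + S)."""
    return kcat * E * S / (Km + S)


def two_step_rhs(kf, kr, kcat, E, S, ES):
    """Two-step policy, E + S <-> ES -> E + P.

    Returns the time derivatives (dE, dS, dES, dP) under mass action.
    """
    binding = kf * E * S - kr * ES   # net complex formation
    catalysis = kcat * ES            # product release
    return (-binding + catalysis, -binding, binding - catalysis, catalysis)


print(one_step_mass_action(2.0, 1.0, 3.0))
print(one_step_michaelis_menten(1.0, 1.0, 2.0, 1.0))
print(two_step_rhs(1.0, 0.5, 0.25, 1.0, 2.0, 1.0))
```

When S is much smaller than Km, the Michaelis-Menten rate approaches (kcat/Km) * E * S, the one-step mass-action form; the two-step policy instead keeps the enzyme-substrate complex as an explicit species, so total enzyme (E + ES) is conserved and sequestration effects can appear.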

Genome assembly

Sequence reads

Assembled sequence

Knowledge assembly

Assembly of a large number of mechanistic facts is analogous to genome assembly: databases and literature yield a large number of redundant, partially overlapping facts that may contain errors. Mechanisms must be corrected and “aligned” in order to produce a set of facts suitable for generating a non-redundant, non-degenerate model.

To evaluate the ability of INDRA to systematically generate explanations of high-throughput data, we assembled an executable rule-based model to explain a previously published dataset describing the phosphoproteomic response of a melanoma cell line to 12 different drugs [3]. The model, containing 221 proteins and 1,451 rules, was assembled from mechanisms extracted from databases and ~95,000 publications (abstracts and full texts). Static analysis of the rule influence map provided by Kappa identified possible mechanistic paths linking drug targets to experimentally observed changes in phosphoprotein abundance.

Drug target   Antibody        Fold-change
MEK           MAPK pT202      0.47
SRC           CHK2 pT68       1.75
SRC           4EBP1 pT37      0.44
AKT           AKT pT308       0.25
AKT           GSK3A/B pS21    0.44
AKT           AKT pS473       0.17
AKT           S6 pS235        0.36
CDK4          4EBP1 pS65      0.44
CDK4          YB1 pS102       2.13
MTOR          AKT pT308       2.19
MTOR          S6 pS240        0.05
MTOR          AKT pS473       3.19
MTOR          p70S6K pT389    0.33
MTOR          S6 pS235        0.06
PKC           GSK3A/B pS21    1.59
PKC           S6 pS240        0.47
PKC           S6 pS235        0.3
PI3K          p70S6K pT389    0.5
PI3K          S6 pS240        0.44
PI3K          AKT pS473       0.2
PI3K          S6 pS235        0.27

Example explanation: How does Src inhibition increase CHK2 pT68?

SRC phosphorylated on Y418 phosphorylates PAK2 on S20. PAK2 phosphorylated on S20 phosphorylates RAF1 on S338. RAF1 phosphorylated on S338, T269 and S471 phosphorylates MAPK1 on T185. MAPK1 phosphorylated on T185 and Y187 phosphorylates TP53 on S15. TP53 phosphorylated on S20 and S15 decreases the amount of PLK1. PLK1 phosphorylates CHEK2 on T68, which is measured by CHK2_pT68.
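The overall polarity of the Src explanation can be checked by multiplying the signs of its steps. Here phosphorylation steps are read as activating (+1) and “decreases the amount of” as inhibiting (-1); that sign assignment is our assumption about how the path is interpreted, and the variable and function names are illustrative.

```python
# Steps of the Src -> CHK2 pT68 explanation, as (upstream, downstream, sign).
# Signs are an interpretation: phosphorylation = +1, "decreases amount" = -1.
path = [
    ("SRC", "PAK2", +1),
    ("PAK2", "RAF1", +1),
    ("RAF1", "MAPK1", +1),
    ("MAPK1", "TP53", +1),
    ("TP53", "PLK1", -1),
    ("PLK1", "CHK2_pT68", +1),
]


def path_sign(steps):
    """Multiply edge signs along a path after checking that it is connected."""
    for (_, down, _), (up, _, _) in zip(steps, steps[1:]):
        assert down == up, f"path breaks between {down} and {up}"
    sign = 1
    for _, _, s in steps:
        sign *= s
    return sign


src_activity_effect = path_sign(path)         # effect of more SRC activity
src_inhibition_effect = -src_activity_effect  # an inhibitor reverses the sign
print(src_activity_effect, src_inhibition_effect)
```

The product is -1 for Src activity, so Src inhibition is predicted to raise CHK2 pT68, consistent with the measured fold-change of 1.75 for SRC and CHK2 pT68.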

Performance: For the largest effects in the data (>50% fold-change), the model generated biochemically plausible explanations for 20 of the 22 effects (91%). For effects at the 20% fold-change level, the model explained 95/135 (70%) of effects. Notably, performance was biased toward drug targets well represented in the literature corpus: the model explained 94/106 (89%) of effects due to PI3K, PKC, SRC, MTOR, MEK, AKT, RAF, and JAK inhibition, but only 1/29 (3%) of effects due to CDK, STAT, or MDM2 inhibition.

Where the model could not identify a causal path between a drug perturbation and an observed effect, we manually described a candidate causal path in simplified English, processed it with NLP, and co-assembled it with the automated model.

Overall, this study shows the potential of automatically assembled models to systematically explain high-throughput data, generating mechanistic hypotheses and identifying genuinely novel phenomena.
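The reported fractions can be cross-checked with a few lines of arithmetic; the two groups of drug targets partition the 135 smaller effects (94 + 1 = 95 explained, 106 + 29 = 135 total). The variable names are illustrative.

```python
# (explained, total) pairs taken from the Performance paragraph.
largest = (20, 22)
overall = (95, 135)
well_studied = (94, 106)   # PI3K, PKC, SRC, MTOR, MEK, AKT, RAF, JAK
less_studied = (1, 29)     # CDK, STAT, MDM2


def pct(pair):
    """Percentage of effects explained, rounded to the nearest integer."""
    explained, total = pair
    return round(100 * explained / total)


# The two target groups together account for all 135 smaller effects.
assert well_studied[0] + less_studied[0] == overall[0]
assert well_studied[1] + less_studied[1] == overall[1]
print(pct(largest), pct(overall), pct(well_studied), pct(less_studied))  # 91 70 89 3
```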

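The Kappa influence map captures detailed causal context while avoiding the combinatorial explosion of chemical species. Paths are obtained through it using the steps listed under “Generating explanations from the Kappa rule influence map”.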