Physical Biology. PAPER • OPEN ACCESS. State–time spectrum of signal transduction logic models. To cite this article: Aidan MacNamara et al 2012 Phys. Biol. 9 045003

OPEN ACCESS
IOP PUBLISHING, PHYSICAL BIOLOGY

Phys. Biol. 9 (2012) 045003 (16pp), doi:10.1088/1478-3975/9/4/045003

State–time spectrum of signal transduction logic models

Aidan MacNamara1, Camille Terfve1, David Henriques1, Beatriz Penalver Bernabe1,2 and Julio Saez-Rodriguez1

1 European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK
2 Department of Chemical and Biological Engineering, Northwestern University, 2145 Sheridan Rd, Tech E-136, Evanston, IL 60208-3120, USA

E-mail: saezrodriguez@ebi.ac.uk

Received 29 February 2012
Accepted for publication 30 May 2012
Published 7 August 2012
Online at stacks.iop.org/PhysBio/9/045003

Abstract

Despite the current wealth of high-throughput data, our understanding of signal transduction is still incomplete. Mathematical modeling can be a tool to gain an insight into such processes. Detailed biochemical modeling provides deep understanding, but does not scale well above relatively few proteins. In contrast, logic modeling can be used where the biochemical knowledge of the system is sparse and, because it is parameter free (or, at most, uses relatively few parameters), it scales well to large networks that can be derived by manual curation or retrieved from public databases. Here, we present an overview of logic modeling formalisms in the context of training logic models to data, and specifically the different approaches to modeling qualitative to quantitative data (state) and dynamics (time) of signal transduction. We use a toy model of signal transduction to illustrate how different logic formalisms (Boolean, fuzzy logic and differential equations) treat state and time. Different formalisms allow for different features of the data to be captured, at the cost of extra requirements in terms of computational power and data quality and quantity. Through this demonstration, the assumptions behind each formalism are discussed, as well as their advantages and disadvantages and possible future developments.

Online supplementary data available from stacks.iop.org/PhysBio/9/045003/mmedia

1. Introduction

The question of how signal transduction networks are able to weigh and integrate a multitude of extra- and intracellular signals into context-specific phenotypic outcomes is complex and difficult to answer. Typically, a signal transduction network links diverse inputs (stimuli) and outputs (gene regulation, motility, etc) through a dense system of proteins assembled in pathways that are connected by crosstalk and embedded in feedback loops (Rangamani and Iyengar 2008, Jørgensen and Linding 2010, Terfve and Saez-Rodriguez 2012). This complexity enhances the robustness and versatility of the network, but makes it difficult to understand in terms of mechanism. This is demonstrated where the complex consequences of mutation and deregulation in diseases such as cancer make identifying potential drug targets difficult, even in the case where the causative mutation is well known (Kreeger and Lauffenburger 2010, Patlak 2010). Often, counter-intuitive therapeutic targets produce the most successful results due to this complexity, and the field of network pharmacology is based around this premise (Aislyn and Boran 2010).

Content from this work may be used under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1478-3975/12/045003+16$33.00 © 2012 IOP Publishing Ltd. Printed in the UK & the USA

Ideally, in order to understand such a complex and dynamic system, the quantities and states of large populations of proteins and their splice variants should be measured in vivo across time and across populations of cells (both tissues and individuals) under a range of different conditions (Liberali et al 2008). In the absence of this quality of data, it is necessary to use more qualitative and less time-resolved information to deduce mechanism. The focus of measurement in such signal transduction networks is the protein or, more specifically, the protein together with post-translational modifications (PTMs), as it is PTMs such as phosphorylation that convey information through a network. Hence, measurement comes from the field of proteomics. However, assumptions are necessary when considering the variety of PTMs that may occur. There are more than 500 different types of PTMs, and measuring the status of each site for all proteins is technically impossible (Khoury et al 2011). Indeed, this problem can be encountered with phosphorylation alone. Considering that the epidermal growth factor receptor (EGFR) has 31 phosphorylation sites, this implies that 2^31 states of EGFR (each site can be phosphorylated or not) would need to be measured to provide full knowledge of the activation of this receptor and how it would change over time. Therefore, the study of signal transduction networks tends to concentrate on a subset of phosphorylation sites where the site interaction partner(s) is known and measurement is technologically feasible (e.g. high-quality antibodies are available). These phosphorylation events are often used as markers of activation and deactivation. The consequence of such an approach is an experimental bias toward such phosphosites, a problem that is only now being addressed through less-biased high-throughput techniques such as mass spectrometry.
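The combinatorial argument for EGFR can be checked with a few lines of arithmetic. The sketch below assumes, as the text does, that each site is an independent binary variable; the function name is ours and purely illustrative.

```python
# Each phosphorylation site is treated as a binary variable
# (phosphorylated or not), so n sites give 2**n combinations.
def phospho_state_count(n_sites: int) -> int:
    """Number of distinct phosphorylation patterns for n binary sites."""
    return 2 ** n_sites


egfr_sites = 31  # number of EGFR phosphorylation sites quoted in the text
egfr_states = phospho_state_count(egfr_sites)
print(egfr_states)  # 2147483648
```

More than two billion patterns for a single receptor makes clear why measurements are restricted to a few marker sites.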

Phosphoproteomics can be divided into antibody- and mass-spectrometry-based methods. A comprehensive summary of these methods can be found in Terfve and Saez-Rodriguez (2012). Broadly speaking, the quality of data can be measured in terms of coverage, time resolution and specificity. Antibody-based methods are generally specific (depending on the quality of the antibody) and can be used to measure time courses of target proteins across many conditions. However, the number of targets that can be measured is limited. In contrast, mass spectrometry techniques allow for systematic identification and quantification of phosphorylated proteins, although this comes with the caveat of requiring expensive equipment and advanced know-how, as reliable quantification can be difficult, protocols (especially for mass spectrometry) can be laborious, and mapping measurements to proteins is not trivial (Ilsley et al 2009).

Whatever method is chosen from the above, the result is a quantitative ‘part list’ consisting of phosphoproteomic measurements from the signaling network of interest, taken under a certain number of conditions (different stimuli, inhibitors, time points, etc) and describing the states of these parts.

1.1. From parts to interactions

In order to deduce a mechanism of action that explains these types of data, the interactions between the parts must be understood. Interactions can be represented as node-edge graphs. The nodes can be biological entities such as proteins, as in this case, or genes or metabolites in the case of transcriptional or metabolic studies. Edges can be described as biological activities such as catalysis, association and modification of the participating nodes. They may be directed (i.e. protein X affects protein Y and not vice versa) or undirected. Furthermore, they can be signed (inhibitory/activating) or unsigned. When these graphs describe protein interactions, they can be characterized in two categories: protein interaction networks (PINs) and protein signaling networks (PSNs) (Pieroni et al 2008).
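The distinction between unsigned, undirected edges and signed, directed edges can be made concrete with a small data structure. The following is a minimal sketch; the class, the helper function and the node names X, Y and Z are hypothetical and do not correspond to any particular repository format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    directed: bool = True
    sign: int = 0  # +1 activating, -1 inhibitory, 0 unsigned


# A PIN-style edge: an interaction without direction or sign.
pin_edge = Edge("X", "Y", directed=False, sign=0)

# PSN-style edges: directed and signed (hypothetical example network).
psn_edges = [
    Edge("X", "Y", directed=True, sign=+1),  # X activates Y
    Edge("Z", "Y", directed=True, sign=-1),  # Z inhibits Y
]


def regulators(edges, target):
    """Return (source, sign) pairs for all directed edges into target."""
    return [(e.source, e.sign) for e in edges if e.directed and e.target == target]


print(regulators(psn_edges, "Y"))
```

Only the PSN-style edges carry enough information to ask which nodes regulate Y and in which direction, which is what a logic model needs.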

PINs can be constructed from a number of sources: high-throughput experiments such as two-hybrid and affinity purification/mass spectrometry, or systematic literature searches (bibliome mining). These methods yield limited functional insight beyond a possible interaction between two proteins. They are represented as a graph with a set of nodes connected by a set of edges without sign or directionality. There are a number of public repositories of PINs, such as IntAct, HPRD and STRING.

PSNs are more detailed representations of protein interactions where, if described as a graph, their edges can have directionality and, when possible, sign. PSNs are generally obtained through expert curation of the literature or text mining. There are multiple public repositories of manually curated networks, including KEGG, WikiPathways, Nature Pathway Interaction Database and Reactome. Each has its strengths and weaknesses in terms of graphical formats, annotation accuracy and curation. When creating a PSN, as many sources as possible should be consulted before deciding on a final network (Bader et al 2006, Bauer-Mehren et al 2009). Pathway Commons is also a useful portal that integrates these and other PSN and PIN repositories.

While PSNs provide an insight into the transfer of signaling information, they do so at an abstract level, without specifying the mechanism of signal transduction. This additional detail is provided by biochemical networks that describe such interactions in a quantitative manner (i.e. phosphorylation, binding, dimerization, etc). There are many examples in the context of metabolism and, recently, an increasing number for signal transduction, such as the reconstruction of the signaling network downstream of the EGFR (Oda et al 2005) and the retinoblastoma/E2 transcription factor (RB/E2F) pathway (Calzone et al 2008).

Independent of the resolution of the network, there is such a high degree of interconnectedness, redundancy and cell/context specificity, even in well-studied networks such as the mitogen-activated protein kinase (MAPK) signaling pathway, that it is difficult to obtain a high degree of accuracy from prior knowledge alone (Kirouac et al 2012). Therefore, prior knowledge networks should be constructed with as much cell-specific knowledge as possible, and high-throughput databases can be advantageous to add annotation and information (e.g. expression data for a particular network in a specific cell type). The growing use of model standards such as Systems Biology Markup Language (SBML) (Hucka et al 2003) can also aid in pooling resources across datasets.

1.2. From interactions to mechanism

Through manual curation of the literature or network databases, there are a number of ways of arriving at a network or map that represents the biological interactions of the system of interest. So what are the requirements to step from this biological representation to a mathematical analysis and understanding of these networks? Graph theory can be used to analyze the topology of a network to understand the principles behind its design (Barabasi and Oltvai 2004). These networks can also be used as a scaffold for overlaying expression data to better understand the activation of sub-networks (Bossi and Lehner 2009). These analyses can provide a useful insight, but are not amenable to explaining how a signaling network responds to a defined set of perturbations (Saez-Rodriguez et al 2009). This aim can only be achieved via mechanistic and predictive computational modeling. By computational modeling we mean, in the context of this paper, the construction of an in silico representation of a system (in this case, a cell signaling network) that can be simulated through a set of programmable commands that mimic the functioning of the system over time (Terfve and Saez-Rodriguez 2012). Simulating over time (or dynamic modeling) consists of using functions to describe how each species’ (or node’s) state in a network changes as a function of its inputs.

There are many approaches to this type of modeling, and the choice of method is highly dependent on the quality and type of data available for the network, together with the accuracy of one’s prior knowledge about its topology and interactions. We will briefly give an overview of some popular methods, but more detail about each approach can be found in a number of excellent reviews (e.g. Aldridge et al 2006, de Jong 2002).

Physicochemical modeling (modeling that includes biochemical and physical features of the system) is an approach that uses equations derived from physical and chemical theory to describe biological processes such as covalent binding, association and diffusion (Aldridge et al 2006). This is a popular and insightful type of model for signaling networks, and many examples can be found in repositories such as biomodels.net. These equations are built through a deep understanding of the underlying biochemistry and hence refer to distinct processes (such as catalysis and complex formation). The family of physicochemical model formalisms includes, among others, ordinary and partial differential equations (ODEs and PDEs), their stochastic variants and rule-based approaches. ODEs are the most common approach and can represent a signaling network through a set of coupled equations that describe the change in concentration of the elements (biomolecular species) of the network. ODEs are based on the assumption of mass action kinetics, a law that defines the rate of a reaction as being proportional to the concentrations of the reacting species (Chen et al 2010). This assumption can break down if there are spatial gradients for species, or if concentrations of species are low enough that random fluctuations become a factor in the behavior of the system. In such cases, PDEs and stochastic formalisms are better suited to capture the biological behavior.
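To make the mass action assumption concrete, the sketch below integrates a single hypothetical binding reaction A + B → C, whose rate is k[A][B], with a simple forward Euler scheme. The reaction, the rate constant and the step size are illustrative assumptions, and a production model would normally use a dedicated ODE solver.

```python
def simulate_mass_action(a0, b0, k, dt, n_steps):
    """Forward Euler integration of A + B -> C under mass action kinetics."""
    a, b, c = a0, b0, 0.0
    trajectory = [(0.0, a, b, c)]
    for step in range(1, n_steps + 1):
        rate = k * a * b  # rate proportional to reactant concentrations
        a -= rate * dt
        b -= rate * dt
        c += rate * dt
        trajectory.append((step * dt, a, b, c))
    return trajectory


# Hypothetical initial concentrations and rate constant (arbitrary units).
traj = simulate_mass_action(a0=1.0, b0=0.5, k=1.0, dt=0.01, n_steps=1000)
t_end, a_end, b_end, c_end = traj[-1]
print(f"t={t_end:.1f}  A={a_end:.3f}  B={b_end:.3f}  C={c_end:.3f}")
```

Even this one-reaction example needs a rate constant and two initial conditions, which illustrates why parameter determination becomes the bottleneck for large networks.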

Another drawback to physicochemical modeling is the difficulty in managing and manipulating large networks, both in terms of the combinatorial complexity that such networks present (for example, the number of phosphorylation states of EGFR) and determination of the parameters of each equation, such as rate constants and initial conditions. Rule-based modeling allows easier manipulation and management of larger systems. Models are specified by a set of rules corresponding to the molecular interactions among protein domains, and these are then automatically converted into a model that describes all possible reactions and molecular configurations (see Hlavacek et al (2006) for an introduction to rule-based modeling).

In summary, physicochemically detailed modeling generally works well with small, detailed biochemical networks. In the absence of such criteria, a coarser-grained approach is necessary, and logic modeling can be viewed in this light.

1.3. Logic modeling

Unlike physicochemical modeling, logic modeling requires only a PSN as a starting point to simulate signaling processes. Although sparse in detail, such graphs are very insightful for understanding how the structure (or topology) determines the flow of information from input through to output (for example, ligand–receptor binding through to transcription factor activation (Ma’ayan et al 2005)). However, before questions can be addressed by simulation, the graph must be made computable by defining how each node state changes as its inputs change, so that input–output relationships can be quantified for the whole system.

Logic modeling uses transfer functions to describe the relationship between nodes in a graph (see section 8.3). Transfer functions are the mathematical representation of the relationship between inputs and outputs in a system. In physicochemical modeling, these are based on mass action kinetics and describe how the input species are transformed into output species by the chemical reaction. In logic modeling, transfer functions consist of logic operators (AND, OR, NOT) that describe how an output node is activated by its inputs. To illustrate this, we can use a simple case from the PI-3-Kinase-Akt signaling network that controls growth and division in mammalian cells. As part of this network, the kinase Akt is activated by the kinase PDK1 and the kinase complex mTOR. This would be represented in a PSN by two directed, positive edges from PDK1 and mTOR to Akt. However, from this representation, it would not be known whether both or either of the kinases are necessary for Akt activation. In logic modeling, this relationship can be represented using an AND operator that specifies the necessity of both kinases for Akt activation. Such an example also illustrates the strength and weakness of logic modeling: the reduction of complexity that enables modeling of large systems with incomplete information and fewer parameters, against less mechanistic detail and biochemical accuracy.
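The Akt example can be written down directly as Boolean transfer functions. The sketch below contrasts the AND gate described in the text with the alternative OR gate that the PSN alone cannot rule out; the function names are ours.

```python
from itertools import product


def akt_and(pdk1: bool, mtor: bool) -> bool:
    """Akt is active only if both PDK1 and mTOR are active."""
    return pdk1 and mtor


def akt_or(pdk1: bool, mtor: bool) -> bool:
    """Alternative reading of the same two edges: either kinase suffices."""
    return pdk1 or mtor


# Truth table over all input combinations.
print("PDK1  mTOR  AND   OR")
for pdk1, mtor in product([False, True], repeat=2):
    print(f"{pdk1!s:5} {mtor!s:5} {akt_and(pdk1, mtor)!s:5} {akt_or(pdk1, mtor)!s:5}")
```

The two gates differ only when exactly one kinase is active, so an experiment that inhibits one of the two inputs is what allows data to discriminate between them.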

The use of logic-based modeling of biological systems goes back more than 40 years, with the first model describing a gene regulatory network (Kauffman 1969). Since then, logic modeling has proved particularly useful in describing the effect of environmental inputs on cell phenotypes through networks of signal transduction. There are multiple studies using this type of modeling as a basis (Helikar et al 2008, Calzone et al 2010, Gonzalez et al 2008, Mendoza and Xenarios 2006, Schlatter et al 2009, Sahin et al 2009) (see also reviews: Morris et al 2010, Watterson et al 2008, Thakar and Albert 2010). The structure of a signal transduction network lends itself to logic modeling, with clearly defined input nodes (ligand–receptor combinations), measurable elements corresponding to activation (phosphorylated proteins downstream of the receptor) and relatively little knowledge of the biochemistry involved.

Having summarized how logic modeling formulates input/output relationships (see above and section 8.3), the next step is to consider the complexity of the logic modeling in terms of how state and time are treated. The state refers here to the value or quantity attached to each node (typically a protein) and reflects the activation of that node at any point during a simulation. Its value is proportional to activation and can vary from 0 to some arbitrary or defined upper limit, depending on the type of logic modeling being undertaken. States can be defined as on/off (Boolean logic), multi-level or continuous, and we will discuss each of these in turn. When training logic models with experiments, these values are often compared to biochemical data. For instance, the phosphorylation of a protein is considered to be a proxy of its activation (e.g. phosphorylation at the phosphosite threonine-202 of extracellular signal-regulated kinase (ERK)).

Similarly, there are several approaches to handling time in a logic context, ranging from the simplest approximation of discrete or steady-state measurement to more biologically realistic continuous updating. These techniques will also be introduced in turn with the examples below.

1.4. Software

As a means to introduce the different methods and how they can be used to model different aspects of signal transduction, we will use the tool CellNOpt (www.cellnopt.org). CellNOpt is a software package that trains the topology of a PSN to experimental data by the criterion of minimizing the error between the data and the logic model created from the PSN. In CellNOpt, the starting network based on prior knowledge is called the prior knowledge network (PKN) (a name we will use in the rest of this paper). This PKN is preprocessed before training by compression and expansion (see materials and methods, section 8.1). The compression step of CellNOpt is a method of reducing the complexity of a logic model by removing nodes that have no effect on the outcome of simulation. The expansion step subsequently includes all possible hyperedges (materials and methods, section 8.1) in the model. The model is trained by minimizing a bipartite function that calculates the mismatch between the logic model and experimental data (mean squared error (MSE)) while penalizing model size. This minimization can be solved using different strategies, from simple enumeration of options for small cases to stochastic optimization algorithms such as genetic algorithms (Saez-Rodriguez et al 2009), or to integer linear programming (Mitsos et al 2009).
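The two-part objective (data mismatch plus a size penalty) and the enumeration strategy for small cases can be sketched as follows. This is an illustration of the idea only, not the CellNOpt implementation: the function names, the weighting factor `size_penalty`, the toy experiments and the measured values are all assumptions made for this example.

```python
from itertools import product


def mean_squared_error(simulated, measured):
    return sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(measured)


def score(bits, simulate, measured, size_penalty=0.01):
    """Mismatch to data plus a penalty proportional to the number of hyperedges kept."""
    return mean_squared_error(simulate(bits), measured) + size_penalty * sum(bits)


def enumerate_best(n_hyperedges, simulate, measured, size_penalty=0.01):
    """Exhaustively score every on/off combination of candidate hyperedges."""
    candidates = product([0, 1], repeat=n_hyperedges)
    return min(candidates, key=lambda bits: score(bits, simulate, measured, size_penalty))


# Toy problem: node Y with candidate hyperedges (A -> Y, B -> Y, A AND B -> Y).
experiments = [(1, 0), (0, 1), (1, 1), (0, 0)]  # (A, B) stimulation patterns
measured_y = [0.1, 0.0, 0.9, 0.05]              # hypothetical noisy readouts


def simulate_y(bits):
    outputs = []
    for a, b in experiments:
        active = (bits[0] and a) or (bits[1] and b) or (bits[2] and a and b)
        outputs.append(1 if active else 0)
    return outputs


best = enumerate_best(3, simulate_y, measured_y)
print(best)  # (0, 0, 1): only the AND hyperedge is retained
```

Enumeration visits 2^n candidate models for n hyperedges, which is why stochastic or integer-programming approaches are needed once the network grows.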

The R version (CellNOptR) is available on Bioconductor and has a number of added features that allow the user to run different variations of logic modeling within the same framework of model calibration. These variations include steady-state to discrete-time Boolean modeling, fuzzy logic and logic ODEs, all of which will be discussed in turn below. We will also refer to other software packages that have contrasting or complementary approaches to CellNOptR.

For the remainder of the tutorial, different logic formalisms will be introduced and explained with the aid of CellNOptR, and the assumptions, strengths and weaknesses of each formalism with regard to training to data will be illustrated with a ‘toy model’ of signal transduction.

1.5. The example model

To illustrate the variety of logic modeling approaches, we will use an imaginary but biologically plausible PKN (figure 1). This network includes a subset of intracellular signaling networks known to be activated downstream of EGF and TNFα stimulation and was derived from a larger network presented in Saez-Rodriguez et al (2009). In brief, the PKN includes three MAPK cascades (ERK, p38 and JNK1), the PI3K/Akt/GSK-3 pro-survival pathway and the IKK/IκB/NFκB pathway. It consists of 30 nodes and 33 edges.

From this PKN, we derived a model (the data-generating model) that was used to simulate experimental data (section 8.1 and table S1/figure S1 (available from stacks.iop.org/PhysBio/9/045003/mmedia)). The in silico data replicate biologically plausible behavior that has been seen in such networks, such as the transient behavior of ERK activation (Sasagawa et al 2005) and the oscillatory dynamics of NFκB translocation from the cytoplasm to the nucleus (Hoffmann et al 2002) (figure 2). These in silico data consist of ten ‘experiments’, which vary according to different combinations of stimulation and inhibition (inhibition is achieved by blocking the activity of two specific kinases (proteins), PI3K and Raf-1, with small-molecule inhibitors), and 16 observations at 2 min intervals from t = 0. Inhibition is used in such experiments to further understand the combinations of upstream events that contribute to the activation of a particular protein. The readouts chosen are well-established downstream events of EGF/TNFα stimulation. The experiment represents an ideal situation with multiple time-point sampling. However, as we will discuss later, with fewer measurements one can capture most (but not necessarily all) of the dynamics of the system. The values are between 0 and 1, and Gaussian noise was also added to the output to imitate inherent biological noise and the measurement error (see materials and methods, section 8.2). The PKN has the following important properties:

(i) It does not specify which input or combination of inputs activates a particular node (for example, both Map3K1 and Map3K7 activate MKK4; the PKN does not specify whether this is an AND or OR relationship, figure 1).

(ii) It includes additional interactions (TNFR → PI3K, PI3K → Rac, Rac → Map3K1) not present in the data-generating model.

(iii) It is missing interactions (TRAF2 → ASK-1, ASK-1 → Map3K7) that are present in the data-generating model and are necessary to fully explain the in silico data.

The purpose of these gaps and errors in our ‘prior knowledge’ is to demonstrate the ability of CellNOptR to train context-specific models from unspecific prior knowledge, and also to demonstrate the limitations of such an approach when information is incomplete. We will also demonstrate how CellNOptR performs when trying to find the true network topology and model parameters by using the different logic model formalisms to simulate the ‘experimental data’, and hence demonstrate the strengths, weaknesses and underlying assumptions of each of the logic model formalisms in turn. The network has been designed such that the features uncovered by the logic formalisms are not confounded by the missing interactions.

Figure 1. The prior knowledge network (PKN) used as the starting point for the toy model, depicted with VANTED (Junker et al 2006) in an SBGN activity flow format (Novere et al 2009). The experimental design is annotated as colored nodes: inputs are shown in green, inhibited nodes in red and readouts in blue (Raf-1 is annotated as both red and blue as it is inhibited and measured). Nodes that were compressed (see materials and methods, section 8.1, and figures S2 and S3 (available from stacks.iop.org/PhysBio/9/045003/mmedia)) have a dashed outline. The data-generating model contained hyperedges from TRAF2 to MKK7 (dashed edges in figure) that were removed for the PKN to demonstrate how incomplete prior knowledge can affect fitting the data. The input to MKK4 is highlighted to demonstrate the concept of a hypergraph. MKK4 has two inputs: Map3K1 and Map3K7. However, both AND and OR can be necessary for MKK4 activation. This is represented by the two inputs expanding to three hyperedges (inset and supplementary figure S4 (available from stacks.iop.org/PhysBio/9/045003/mmedia)). The goal of CellNOptR is to find the subset of all possible hyperedges (logic gates) in the PKN that best explains the data.
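The expansion of the two inputs of MKK4 into three hyperedges, as described in the caption of figure 1, can be sketched by enumerating every non-empty subset of a node's inputs, each subset standing for one AND gate. An OR relationship then corresponds to keeping more than one hyperedge into the same node. The function below is a simplified illustration under our own assumptions: it ignores inhibitory (NOT) inputs and any limit on gate size that the actual software may apply.

```python
from itertools import combinations


def expand_hyperedges(inputs, target):
    """List every AND combination of inputs as a candidate hyperedge into target."""
    hyperedges = []
    for size in range(1, len(inputs) + 1):
        for subset in combinations(inputs, size):
            hyperedges.append((subset, target))
    return hyperedges


for sources, target in expand_hyperedges(["Map3K1", "Map3K7"], "MKK4"):
    print(" AND ".join(sources), "->", target)
# Map3K1 -> MKK4
# Map3K7 -> MKK4
# Map3K1 AND Map3K7 -> MKK4
```

A node with n inputs yields 2^n − 1 candidate hyperedges under this scheme, so the search space grows quickly with the connectivity of the PKN.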

2. Boolean steady state

In arguably the simplest case of data, an experimental design looking at a particular signal transduction network will consist of a set of measurements representing the phosphorylation state of a subset of proteins in the signaling network. These measurements will be taken before the addition of a stimulus or stimuli and at a single time point after stimulation (t = 0 and t = t1). Additionally, the effects of multiple conditions (inhibitions, perturbations) may also be examined with this design. This is a common approach when studying signal transduction, which has classically been used via low-throughput methods and has more recently been scaled up owing to new technological developments (Terfve and Saez-Rodriguez 2012).

[Figure 2 image: one row per experiment; columns Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa) and Inh (pi3ki, raf1i); time axis ticks at 0, 10 and 30; activity axis tick at 0.5.]

Figure 2. The in silico data generated to test each logic modeling formalism. The data were generated by a logic model (the data-generating model). Each row of the figure represents an experiment with a certain combination of stimuli and inhibitors (shown in the final two columns; black is ON, white is OFF). The simulated data are shown as a continuous black line. Gaussian noise was added (section 1.5 and materials and methods, section 8.2) and the data were ‘sampled’ at 16 equally spaced time points between 0 and 30 min to simulate a fine-grained time course experimental design.
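The sampling scheme used for the in silico data (16 time points at 2 min intervals from t = 0, values between 0 and 1, added Gaussian noise) can be sketched as below. The transient test signal, the noise standard deviation, the random seed and the clipping of noisy values to the [0, 1] range are assumptions made for this illustration; the actual procedure is described in materials and methods, section 8.2.

```python
import math
import random


def sample_with_noise(signal, n_points=16, interval_min=2.0, noise_sd=0.05, seed=0):
    """Sample signal(t) at regular intervals and add Gaussian noise."""
    rng = random.Random(seed)
    samples = []
    for i in range(n_points):
        t = i * interval_min
        value = signal(t) + rng.gauss(0.0, noise_sd)
        # Assumption: keep noisy values inside the normalized range [0, 1].
        samples.append((t, min(1.0, max(0.0, value))))
    return samples


def transient(t):
    """Hypothetical transient response peaking at 1 around t = 5 min."""
    return (t / 5.0) * math.exp(1.0 - t / 5.0)


data = sample_with_noise(transient)
print(data[0][0], data[-1][0], len(data))  # 0.0 30.0 16
```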

Choosing a single time point after stimulation leads to a simple design and minimizes the cost per experiment. However, it then becomes critical to choose an appropriate time t1 (see figures S5–7 (available from stacks.iop.org/PhysBio/9/045003/mmedia)). Ideally, one should perform a set of detailed time course experiments that encapsulates the variation in activation in the system, but this is usually not viable in terms of cost and time constraints. It may only be possible to perform a detailed time course experiment for a single phosphoprotein. From this, a time point can be chosen that is characteristic of the activation of the phosphoproteins of interest. Typically, in signal transduction, a fast wave of activation occurs over a short timescale after stimulation. This is followed by slower, later mechanisms that often down-regulate the signals over a longer timescale (e.g. degradation, internalization, etc).

Returning to our example, the measurement of phosphorylated ERK could be viewed as a sensible output with which a time course can be obtained. (Its activation would be representative of the dynamics of the MAPK cascade, and it is technically a good choice because of the quality of ERK phosphosite-specific antibodies.) From this time course, we would see that two different timescales seem to exist: an early activation phase followed by a late phase. Thus, characteristic time points can be chosen (figure 2), and a reasonable early time point would be in the range 4–12 min. For argument’s sake, we will choose t1 = 10 min.

We can see from the data the difficulty with defining a characteristic time point, or how choosing a single time point may affect the ability to capture all dynamic features. For example, it is impossible to understand the oscillatory nature of NFκB translocation with a single time point, and ERK activation dynamics can only be partly representative of other phosphoprotein dynamics (even those closely related in function, such as Raf1). For the oscillations of NFκB, one would need to sample at least every 2.5 min (since the period of the oscillation is 5 min), while to obtain an approximate sense of the transient activation of ERK, two well-chosen time points can be enough. In spite of this, steady-state measurement can give a qualitative overview of the system that allows for robust, albeit coarse-grained, conclusions with relatively few data points (and thus cost).
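The sampling requirement for the NFκB oscillation follows from the general rule that at least two samples per period are needed to detect an oscillation. The sketch below uses a hypothetical sinusoid with a 5 min period (not the model output itself) to show that sampling once per period makes the oscillation invisible, while denser sampling reveals it.

```python
import math

PERIOD_MIN = 5.0  # oscillation period quoted in the text


def oscillation(t):
    """Hypothetical normalized oscillation with a 5 min period."""
    return 0.5 + 0.5 * math.sin(2 * math.pi * t / PERIOD_MIN)


def sample(interval_min, duration_min=30.0):
    n = int(duration_min / interval_min) + 1
    return [oscillation(i * interval_min) for i in range(n)]


def spread(values):
    return max(values) - min(values)


coarse = sample(5.0)  # one sample per period: every sample hits the same phase
fine = sample(1.0)    # five samples per period

print(f"spread at 5 min sampling: {spread(coarse):.3f}")
print(f"spread at 1 min sampling: {spread(fine):.3f}")
```

With a 5 min interval the samples are indistinguishable from a constant signal, which is the single-time-point problem in miniature.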

2.1. Steady-state optimization and simulation

One way to measure a model’s ability to fit experimental data with a single time point, such as that described above, is to make the assumption that the system reaches, at that point, a pseudo-steady state: the fast reactions have already occurred, while the slow reactions have not yet significantly affected the network’s behavior (Klamt et al 2006). This approximation implies that the flux through the system (in our case, the phosphorylation cascade in signal transduction) has stabilized and the quantities of phosphorylated proteins are no longer varying to a significant degree. With this assumption, a model of this system can be simulated until it has also reached a steady state.

[Figure 3 image: same layout as figure 2; columns Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa) and Inh (pi3ki, raf1i); time axis ticks at 0, 10 and 30; activity axis tick at 0.5.]

Figure 3. The fit of the trained model using the Boolean steady-state formalism. The simulated data are shown as two blue circles (t0 and t1) connected by a blue dotted line. The colors represent the goodness of fit between the model and the data at t1 = 10. Heat-map coloration is used to signify the range from high error (red, normalized mean squared error (MSE) = 1) to no error (white, MSE = 0). t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CellNOptR took 180 s.

With the in silico data (figure 2) as our starting point, the PKN (figure 1) was trained using the steady-state model formalism at t1 = 10 min. Details about the node states and transfer functions of this formalism (Boolean steady state) are summarized in section 8.3. Figure 3 shows the steady-state simulation overlaid on the experimental data.
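To make the idea of simulating a Boolean model until it reaches a steady state concrete, the following is a minimal Python sketch. It is not CellNOptR code: the three-node cascade, the node names and the function `simulate_to_steady_state` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict

State = Dict[str, int]
Rules = Dict[str, Callable[[State], int]]


def simulate_to_steady_state(rules: Rules, state: State, max_steps: int = 100) -> State:
    """Synchronously update all rule-driven nodes until the state stops changing."""
    for _ in range(max_steps):
        new_state = dict(state)
        for node, rule in rules.items():
            new_state[node] = rule(state)  # every rule reads the previous state
        if new_state == state:
            return state
        state = new_state
    raise RuntimeError("no steady state reached within max_steps")


# Hypothetical cascade: egf -> raf1 -> erk, with raf1 blocked by an inhibitor.
rules: Rules = {
    "raf1": lambda s: int(s["egf"] and not s["raf1_inhibitor"]),
    "erk": lambda s: s["raf1"],
}
start = {"egf": 1, "raf1_inhibitor": 0, "raf1": 0, "erk": 0}
steady = simulate_to_steady_state(rules, start)
```

Stimuli and inhibitors are held fixed because they have no update rule; only the downstream nodes change until the fixed point is found.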

2.2 Interpretation of steady-state result

The Boolean steady-state formalism used by CellNOptR for optimization recovers most of the underlying ‘true’ network and hence gives a good steady-state approximation of the in silico data (see figures 3 and 8). However, there are some exceptions that highlight the limitations of steady-state measurements. Using this formalism, CellNOptR cannot identify the NFκB oscillations caused by feedback: hyperedges that cause negative feedback are penalized in CellNOptR, as a steady state cannot be reached when they are present. Another limitation is that the state of each element in the model is limited to 0/1 (either switched on or off). Hence, intermediate levels of activation cannot be simulated (such as p38 activation under TNFα and EGF stimulation). Finally, the effect of the missing pathway from TNFα to AP1 is visible where the experimental measurement under TNFα stimulation in the absence of EGF stimulation cannot be explained by the model.
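The reason negative feedback is incompatible with a steady-state search can be seen in a small Python sketch. The two-node loop and the helper `find_attractor` below are hypothetical illustrations, not part of CellNOptR.

```python
def find_attractor(update, state, max_steps=50):
    """Iterate a synchronous update and classify the attractor.

    Returns ("steady", [state]) for a fixed point or ("cycle", states)
    for a repeating sequence of states.
    """
    seen = []
    for _ in range(max_steps):
        if state in seen:
            cycle = seen[seen.index(state):]
            return ("steady" if len(cycle) == 1 else "cycle"), cycle
        seen.append(state)
        state = update(state)
    raise RuntimeError("no attractor found within max_steps")


# Hypothetical negative feedback loop: a activates b, b inhibits a.
def feedback_update(state):
    a, b = state
    return (int(not b), a)


# The same pair of nodes without the feedback: a is constantly on and activates b.
def chain_update(state):
    a, b = state
    return (1, a)


feedback_kind, feedback_states = find_attractor(feedback_update, (0, 0))
chain_kind, chain_states = find_attractor(chain_update, (0, 0))
```

Under synchronous updating the loop visits four states in turn and never settles, whereas the chain without feedback ends in a single fixed point.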

Thus, the strength of steady-state Boolean logic depends strongly on the assumptions underlying the data. If one has enough knowledge of the data and biochemistry that the assumption of steady state is a fair one to make, training a network to data using steady-state Boolean logic modeling can uncover cell-specific behavior, for example differences between cancer and normal cells (Saez-Rodriguez et al 2011a). Another advantage is the scalability of such an approach: because the method is parameter-free, large networks can be trained under a large number of conditions.

3 Two time points (or additional steady state)

As mentioned in section 2, it is quite common in signaling networks to observe a transient behavior, where a species is quickly activated and subsequently deactivated. Such a dynamic obviously cannot be captured with a steady-state approach where only one time point is considered. Therefore, in the above section this issue was avoided by only modeling ‘fast events’, i.e. the activation phase of the signal propagation. However, when information about more than one time point is available and such a fast activation followed by slow deactivation (or indeed any combination of slower and faster processes) is observed, it is possible to also capture these processes while keeping the simplifying assumption of steady states. In essence, it is assumed that multiple pseudo-steady states reflect the mechanisms that are acting at different timescales and that they can be optimized independently. We will illustrate this with the CellNOptR implementation for two timescales, but the approach is extendable to more than two time points.

Figure 4. The fit of the model at two time points, t1 and t2, using the two steady-state approach. Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh; cue labels: egf, tnfa, pi3k i, raf1 i; x-axis ticks at 0, 10 and 30; y-axis tick at 0.5. Again, the colors are representative of the fit, this time at t1 = 10 and t2 = 30. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CellNOptR took 240 s.

Defining suitable time points that adequately represent the process timescales that we want to model is a similar problem to that discussed above for a unique steady state, with the added complexity of having to choose more than one point that is consistent for all modeled species. This can be guided by prior knowledge, e.g. if it is known that a receptor is activated on a fast timescale (e.g. 30 min for full activation) by phosphorylation and then deactivated by slow internalization and degradation (e.g. 2 h for full silencing of the signal). However, in general it is better to develop a detailed time course, as stated above. In our case, again using ERK, we would say that a second measurement at 20–30 min would be adequate; 30 min was used for the sake of argument (see figure 4).

3.1 Multiple steady-state optimization and simulation

In CellNOptR, a model of a system with two steady states at different timescales is simulated by assuming that a subset of the hyperedges (interactions) only become active at a later time point, that is, they operate on a different timescale (Klamt et al 2006). That being the case, the two time points can be optimized separately. In practice, this means that the optimization is done in two steps:

(i) The scaffold model (the model after compression of non-essential nodes and expansion of all possible hyperedges; see figure 1 and materials and methods, section 8.1) derived from the PKN is used to train the model against the data at t1, thereby identifying hyperedges that best reproduce the data at this time point.

(ii) Hyperedges that were not selected as active at t1 are used as the search space for training the model at t2. For simulation (and therefore testing the model fit), candidate models are tested by using the steady state of the t1 model as an initial state, then computing the steady state from there, including candidate t2 hyperedges. There is also the additional constraint that whenever hyperedges at t1 and t2 influence a node in contradicting ways, the t2 hyperedge overrules the t1 hyperedge and the state of the target node is locked to the state defined by the t2 hyperedge.

Besides the additional constraint of the overriding hyperedges described above, the node states and transfer functions are calculated in the same way as in the Boolean steady-state formalism (section 8.3).

3.2 Interpretation

In our example, we can see that the two steady-state optimization finds the feedback from ERK back to SOS-1 (figure 8). Hence, from figure 4, the transient activation of Raf1, ERK and AP1 is captured in the trained model. Using a single characteristic time point, a model that includes the negative feedback from ERK to SOS-1 at t1 would not be selected, as the branch never reaches a stable steady state because of oscillation. However, if we say that the branch is active at t1 but that the negative feedback is only active at t2, and that when active this negative feedback permanently turns SOS-1 off, then the model does reach a steady state at t1 (where SOS-1, Raf-1 and ERK are all ON) and a different steady state at t2 (where SOS-1, Raf-1 and ERK are all OFF as a result of the activated negative feedback).
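The two steady-state simulation of this branch can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the CellNOptR implementation: the rule dictionaries and the `steady_state` helper are hypothetical, and every target of a t2 hyperedge is locked, which is a simplification of the contradiction rule described in section 3.1.

```python
def steady_state(rules, state, locked=(), max_steps=100):
    """Synchronously update all unlocked nodes until nothing changes."""
    for _ in range(max_steps):
        new_state = dict(state)
        for node, rule in rules.items():
            if node not in locked:
                new_state[node] = rule(state)
        if new_state == state:
            return state
        state = new_state
    raise RuntimeError("no steady state reached within max_steps")


# Hyperedges assumed active at t1: egf -> sos1 -> raf1 -> erk.
t1_rules = {
    "sos1": lambda s: s["egf"],
    "raf1": lambda s: s["sos1"],
    "erk": lambda s: s["raf1"],
}
# Hyperedge assumed to become active only at t2: erk inhibits sos1.
t2_rules = {"sos1": lambda s: int(not s["erk"])}

state_t1 = steady_state(t1_rules, {"egf": 1, "sos1": 0, "raf1": 0, "erk": 0})

# Start from the t1 steady state, apply the t2 hyperedges, and lock their targets.
t2_start = dict(state_t1)
for node, rule in t2_rules.items():
    t2_start[node] = rule(state_t1)
state_t2 = steady_state(t1_rules, t2_start, locked=set(t2_rules))
```

Locking SOS-1 after the t2 hyperedge fires is what prevents the oscillation that would otherwise arise from the negative feedback.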

4 Synchronous multiple time-point simulation and multiple timescales

As discussed in section 3, by measuring at two characteristic time points, the trained logic model is capable of finding the slow negative feedback from ERK to SOS-1 and therefore moves a step closer to understanding the ‘true’ network. However, the oscillations of NFκB still cannot be explained with the pseudo-steady-state formalism, as it is necessary to use the full time course data (and not just two time points) to observe this effect. This can be modeled by a discrete time Boolean model that is available as an add-on R package to CellNOptR: CNORdt (discrete time).

4.1 Synchronous and asynchronous updating

CNORdt introduces some variation in how time is handled in the model. Instead of simulating and fitting data at steady states, it is capable of fitting time course data by using an additional model parameter together with a synchronous updating scheme.

Synchronous updating is where all nodes are updated simultaneously during model simulation; hence, each node at time t is a function of its input nodes at t − 1 (see section 8.3). This is the updating scheme used in CellNOptR. An alternative method is asynchronous updating, where nodes are updated in a random or non-synchronous order, depending on the asynchronous method used. This leads to different simulation properties depending on the updating method chosen. Synchronous updates are deterministic, and simulations run under the same conditions (inputs and perturbations) will reach the same steady state (or attractor) each time. In contrast, asynchronous updating introduces stochasticity into the system, such that different steady states can be reached from the same starting conditions. The random updating of node values is one possible application of asynchronicity. This enables sampling over all timescales (any reaction can be deemed to be slowest or fastest), thus avoiding the constraint inherent in synchronous simulations of an equal timescale over all reactions. However, this added complexity can make results difficult to interpret (Garg et al 2008). Mixed synchronous/asynchronous updating is an intermediate approach that can stratify reaction groups according to their known reaction rates, thus taking advantage of a priori knowledge and reducing the complexity of a fully asynchronous approach (Faure et al 2006, Albert et al 2008, Assmann and Albert 2009, Garg et al 2008).
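The difference between the two updating schemes can be illustrated with a small Python sketch. The mutual-inhibition pair below is a hypothetical example, and the asynchronous scheme is shown with fixed update orders (rather than random ones) so that the outcome is reproducible.

```python
from itertools import permutations

# Hypothetical mutual inhibition: a is on when b is off, and vice versa.
rules = {
    "a": lambda s: int(not s["b"]),
    "b": lambda s: int(not s["a"]),
}


def synchronous_step(state):
    """All nodes read the previous state and are updated together."""
    return {node: rule(state) for node, rule in rules.items()}


def asynchronous_sweep(state, order):
    """Nodes are updated one at a time; later nodes see earlier updates."""
    state = dict(state)
    for node in order:
        state[node] = rules[node](state)
    return state


start = {"a": 0, "b": 0}

sync_trajectory = [start]
for _ in range(4):
    sync_trajectory.append(synchronous_step(sync_trajectory[-1]))

async_end_states = []
for order in permutations(rules):
    state = start
    for _ in range(5):
        state = asynchronous_sweep(state, order)
    async_end_states.append(state)
```

From the same starting state, the synchronous run alternates between two states, while the two asynchronous orders settle into two different steady states, which is the order dependence described above.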

CNORdt introduces a scaling parameter that defines the timescale of the Boolean synchronous simulation. Where each ‘tick’ (t) (or simulation step) is the synchronous updating of all nodes in the model according to their inputs at t − 1, the scaling parameter defines the ‘tick’ frequency relative to the timescale of the real data. Although this is a crude approach (i.e. it implies a single rate across all reactions), it allows us to fit a synchronous Boolean simulation to data. Hence, all data points can be fitted to the model, and hyperedges that cause feedback in the model can be included, which allows the model to reveal more complex dynamics such as oscillations. CNORdt still describes the node states as either on or off (1/0), and the transfer functions are calculated as in section 8.3. The scaling parameter is applied to the simulation of the system and hence does not affect the transfer functions themselves.
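One way to picture the role of such a scaling parameter is the Python sketch below. It is an assumption-laden illustration rather than the CNORdt algorithm: the parameter is expressed as `minutes_per_tick`, the data points are invented, and the best value is picked by a simple grid search.

```python
def simulate_ticks(update, state, n_ticks):
    """Return the list of states visited by n_ticks synchronous updates."""
    trajectory = [state]
    for _ in range(n_ticks):
        state = update(state)
        trajectory.append(state)
    return trajectory


def mse_for_scale(trajectory, data, minutes_per_tick, node_index):
    """Compare each measurement with the simulated state at the matching tick."""
    errors = []
    for t, measured in data:
        tick = min(int(t // minutes_per_tick), len(trajectory) - 1)
        errors.append((trajectory[tick][node_index] - measured) ** 2)
    return sum(errors) / len(errors)


# Hypothetical negative feedback loop that oscillates with a period of 4 ticks.
def feedback_update(state):
    a, b = state
    return (int(not b), a)


# Invented time course for node a: (time in minutes, normalized activity).
data = [(0, 0.1), (2, 0.9), (4, 0.8), (6, 0.2), (8, 0.0), (10, 1.0)]

trajectory = simulate_ticks(feedback_update, (0, 0), n_ticks=12)
scores = {
    scale: mse_for_scale(trajectory, data, scale, node_index=0)
    for scale in (1, 2, 4)
}
best_scale = min(scores, key=scores.get)
```

The transfer functions are untouched; only the mapping between simulation ticks and measurement times changes with the scale.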

Figures 5 and 8 show how the NFκB oscillations can be predicted by fitting a dynamic logic model to the full time course and maintaining the two steady-state assumptions from section 3, i.e. simulating ‘fast’ reactions from t = 0 to t = 10 and ‘slow’ reactions from t = 10 to t = 30.

5 Constrained fuzzy logic

One of the main limitations of Boolean logic models is that the assumption of a single level of activation (species can only be on/off) is biochemically unrealistic. Fuzzy logic is another logic modeling formalism that allows for intermediate levels of activation. It was originally developed in the field of control theory for predicting the outputs of complex processes where inputs could only partially be characterized (Morris et al 2011a). Its strength lies in the flexibility it affords when defining relationships between input and output nodes. This flexibility can also be a weakness if a large number of parameters are required to define these functional relationships. Constrained fuzzy logic (cFL) deals with this potential complexity by limiting the repertoire of relationships between nodes. The cFL formalism used in CellNOpt (CNORfuzzy) is fully described in Morris et al (2011a). Briefly, the relationships (or transfer functions) between nodes in cFL are limited to Hill functions. Hence, each transfer function has two free parameters: the Hill coefficient n, which controls the steepness of the function, and the sensitivity parameter k, which determines the midpoint of the function (i.e. the value of the input that produces half the maximal output). By varying these two parameters, linear, sigmoidal and step-like dynamics can be produced that are good approximations to protein–protein interactions and enzymatic reactions. In CNORfuzzy, further constraints are imposed by initially limiting the possible parameter combinations to a subset of discrete values. Details of the transfer functions used can be found in materials and methods, section 8.4.

Figure 5. The fit of the model at multiple time points using fast (t = 0 to t = 10) and slow (from t = 10 to t = 30) timescales. Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh; cue labels: egf, tnfa, pi3k i, raf1 i; x-axis ticks at 0, 10 and 30; y-axis tick at 0.5. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORdt took 300 s.

5.1 Model training and simulation

Model training and simulation in CNORfuzzy are carried out in a similar manner to the Boolean steady-state formalism. After compression and expansion of the logic hypergraph, a genetic algorithm determines transfer functions and a network topology that minimize the MSE between the model and the data at steady state. This is followed by a number of refinement steps that fine-tune the Hill function parameters and reduce the complexity of the network topology. The in silico data and model fit at t1 = 10 are shown in figure 6.

5.2 Interpretation

CNORfuzzy is capable of fitting intermediate values (figure 6). For most cases, the cFL model generates similar fits to the steady-state Boolean model. However, the fit to data is more accurate since the values are continuous and not limited to 0 or 1. More importantly, the cFL model obtains a better fit for p38, as it uncovers a link in the structure that Boolean models are unable to capture. In the ‘true’ network, TNFα and EGF are both required to activate p38 (albeit the activation is low relative to the other signals). In the previous Boolean formalisms, this low activation of p38 cannot be modeled, as the simulation can only take the values 0/1. However, CNORfuzzy is capable of adding the hyperedge ‘Map3K1 AND Map3K7 → MKK4’ (figure 8) to explain this activation and hence moves a step closer to finding the underlying true network.

The CNORfuzzy model fit also illustrates some caveats associated with fuzzy logic. We can see that CNORfuzzy also retains the Map3K7 → p38 hyperedge (figure 8), thus activating p38 with TNFα stimulation alone (i.e. in the absence of EGF stimulation). This occurs as CNORfuzzy attempts to fit the noisy signal of inactive p38, thus adding a hyperedge that is not present in the ‘true’ network. CNORfuzzy also adds hyperedges from TNFα to AP1 that convey a weak activating signal to compensate for the hyperedges (TRAF2 → ASK-1, ASK-1 → Map3K7) missing from the PKN (figure 1). These examples illustrate the sensitivity of the cFL approach to the data quality, and this can make interpretation of the results more subtle and difficult (Morris et al 2011a).

6 Logic ODEs

The Boolean logic formalisms described above can qualitatively fit the network topology and logic gates that best describe the underlying data. cFL can add quantitative information by its ability to fit intermediate values between 0 and 1 at steady state. In terms of time, however, all these formalisms rely on discrete simulations. To obtain a fully continuous model, both in state and time, CNORode adds to these methods by transforming a discrete logic model to a continuous model. It does this by defining a set of ODEs, one for each model species. There are several formalisms to convert discrete logic to continuous models (e.g. SQUAD (Di Cara et al 2007)) or hybrid models (e.g. piecewise linear models (de Jong 2002)). CellNOpt includes the method developed by Wittmann et al (2009) that was implemented in Matlab as Odefy (Krumsiek et al 2010).

Figure 6. The fit of the trained model at t1 = 10 using constrained fuzzy logic. Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh; cue labels: egf, tnfa, pi3k i, raf1 i; x-axis ticks at 0, 10 and 30; y-axis tick at 0.5. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORfuzzy took 1200 s.

6.1 Converting from Boolean to continuous

The approach used to convert Boolean to continuous models is fully explained in Wittmann et al (2009). Briefly, the goal is to simulate the full dynamics of each species in the logic model while retaining consistency with the Boolean representation. This means that where the output of a logic gate is 0 or 1, the ODE replacing a Boolean state should also return 0 or 1. This is achieved in a similar manner to cFL (but with an additional parameter τ) by applying a normalized Hill function on the interval between 0 and 1. Applying these functions to each hyperedge defines a new continuous ODE model to replace the underlying Boolean model. This is more fully explained in section 8.5.
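As a brief check of this consistency requirement, one can evaluate the normalized Hill function of section 8.5 at the Boolean input values, assuming k > 0 and n > 0. The symbol c* for the steady state of the ODE is introduced here only for illustration.

```latex
B(a) = \frac{a^{n} / (k^{n} + a^{n})}{1 / (k^{n} + 1)},
\qquad B(0) = 0, \qquad B(1) = 1,
\qquad
\dot{c} = \frac{1}{\tau}\bigl(B(a) - c\bigr) = 0 \;\Rightarrow\; c^{*} = B(a)
```

A Boolean input of 0 or 1 therefore drives the continuous variable toward the same Boolean value, whatever the values of k, n and τ.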

6.2 Parameter estimation

CNORode currently provides links to two stochastic non-local optimization algorithms: a genetic algorithm (genalg package, http://cran.r-project.org/web/packages/genalg) and an implementation in R of scatter search (Egea and Martí 2010). These are used to fit the Hill function parameters k and n and the ODE parameter τ to each logic gate in a model that has already been topologically optimized by one or more of the other formalisms.

6.3 Compressing an ODE model

Compression of the model before training may lead to the loss of elements important to capture dynamic features and must thus be done with caution. Returning to our example (figure 2), the in silico data were generated through a set of normalized Hill functions. Hence, with the exception of AP1 (where the missing hyperedge prohibits any exact simulation of this signal), CNORode should be capable of simulating exactly the other signals in the system after parameter optimization of the associated logic ODEs. However, this may not be possible when the model is compressed. To give an example, in our toy model (figure 1) the pathway consisting of SOS-1, Ras, Raf-1, MEK 1 and ERK is compressed to SOS-1 → Raf-1 → ERK. The in silico data were generated with ODEs describing the uncompressed interactions. We can see from figure 7 that the compressed model can accurately simulate the in silico data for this pathway (Raf-1 and ERK signals). In this case, the normalized Hill functions have enough dynamic plasticity to summarize four interactions (SOS-1 → Ras → Raf-1 → MEK 1 → ERK) as two (SOS-1 → Raf-1 → ERK). However, this is not the case where we have feedback from ERK through a phosphatase (ph) back to SOS-1 and from NFκB through expression (ex) back to IκB. In these cases, it is necessary not to compress ‘ph’ and ‘ex’ to allow CNORode to model the correct dynamics (transience and oscillations, respectively). The non-compression is required as ‘ph’ and ‘ex’ are integral to the dynamics observed in the in silico data. So figures 7 and 8 show that, with the exception of AP1, CNORode can accurately model the in silico data of the toy model once compression of those key nodes is suppressed.

Figure 7. The fit of the trained model using CNORode. Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh; cue labels: egf, tnfa, pi3k i, raf1 i; x-axis ticks at 0, 10 and 30; y-axis tick at 0.5. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The parameter training in CNORode took 2000 s.

7 Summary and future developments

In this contribution, we have reviewed different logic-based approaches to model signal transduction networks. Recent developments in proteomics techniques, from antibody-based methods (xMAP, protein arrays, high-throughput microscopy, etc) to mass spectrometry methods (Terfve and Saez-Rodriguez 2012), allow us to generate a large amount of phosphoproteomic data. Given the size of the underlying networks, we believe that logic-based models, which do not need extensive biochemical detail and thus lead to tractable models even when dealing with multiple pathways, are a useful approach to analyzing signal transduction on a large scale. Therefore, we have focused our work on how to train logic models to experimental data and implemented various methodologies toward this end in our tool CellNOptR.

Our recent developments presented here expand our previous work by including strategies to deal with the inherent dynamic nature of signaling processes (and hence with time series data). We have discussed how modeling dynamic aspects requires more detailed formalisms (and thus, in general, more data and computational time) and how the general methodology has to be re-evaluated at multiple levels, in particular the compression of the network prior to the optimization; hence, we are currently working to develop a general compression routine for dynamic models. Another area of active development is the implementation of efficient optimization strategies to identify both structure and (if existing) continuous parameters (Banga 2008). Although we have covered here a broad palette of logic-based formalisms, we plan to explore other approaches. Some are combinations of what we have discussed (e.g. a cFL formalism simulated over multiple timescales), others are formalisms related to those used here (e.g. SQUAD), and others could add new features such as a probabilistic framework (Shmulevich et al 2003), stochasticity (Albert et al 2008) or formal methodologies (Fisher and Henzinger 2007).

For the sake of simplicity, we have used a toy model that is itself based on a logic formalism to exemplify the potential dynamic behavior and thereby different modeling variants. We are currently working on more realistic benchmarks based on biochemical models and studying in more detail the role of experimental noise and experimental design in recovering the underlying model structure.

Figure 8. The contribution of each logic modeling formalism to the understanding of the model used to simulate the in silico training data. The time taken for training the model using each formalism is also shown.

As illustrated in our example with the link TRAF2 → ASK-1 → MKK7, databases are comprehensive but not complete, and it is therefore likely that important links are missing from the system of interest (Kirouac et al 2012). To overcome this limitation, we are working on strategies to integrate as many network resources as possible. These include methods that propose novel links that expand the prior knowledge network (Saez-Rodriguez et al 2009, Eduati et al 2010) and the use of information from PINs (Vinayagam et al 2011).

The focus of CellNOptR is the calibration of logic models to data, but a large set of other tools exist that analyze logic models from different angles (Morris et al 2010). For example, the Q2LM toolbox (Morris et al 2011b) uses cFL to understand the effect of perturbations in the context of the whole system under investigation (e.g. under what set of stimuli is a therapeutic perturbation most effective?). CellNetAnalyzer (Klamt et al 2007) has a battery of methods from graph theory as well as specific techniques for logic models. These include minimal intervention sets (the minimum number of perturbations for a desired phenotype) to propose possible therapeutic targets. These tools use the same model format as CellNOptR, so it is easy to pass models for analysis. More generally, we are part of the CoLoMoTo initiative, which aims to facilitate interoperability among these tools; the main goal here is the development of SBML-qual as a language to exchange logic models (sbml.org/Community/Wiki/SBML_Level_3_Proposals/Qualitative_Models) as well as the implementation of the SBGN format for network representation (Novere et al 2009).

In general, efficient integration of data and prior knowledge to model signal transduction requires the use of appropriate standards for data, prior knowledge about the networks and the models themselves (Saez-Rodriguez et al 2011a). We consider that logic models will be an area of development in the future, with increasing application to signal transduction research.

Figure 9. An overview of the graphical representation of logic models. (A) The SOP expression for the activation of C, summarized as an XOR gate. (B) SOP expressions describing the activation of C and D. (C) An example of a hypergraph representation where the nodes are connected by hyperedges.

8 Materials and methods

8.1 CellNOptR

As mentioned in section 1.4, CellNOptR includes some additional steps in pre-processing logic models before simulation and training to data. The details of these steps can be found in Saez-Rodriguez et al (2009). Briefly, the model is compressed by removing non-identifiable elements. These include nodes on terminal branches that are not part of the experimental design (non-observables; figure 1: p90RSK and CREB), nodes that are not affected by the inputs or perturbations (non-controllables) and additional nodes that can be removed without affecting the logic outcome during simulation (figure 1: Ras, MEK 1, etc).

After this compression step, a superstructure of all possible hyperedges is created (figure 1, inset). This superstructure contains ‘the space’ of hyperedges that is optimized (through the removal of redundant hyperedges) by training to the experimental data. The training uses a genetic algorithm to search for logic models that minimize a bipartite function. This function includes the MSE between the simulation of the optimized logic model and the data, and a penalty term for model size. Depending on the formalism used (see the main text), the simulation and data may be at steady state (CellNOptR, CNORfuzzy) or all data points can be used (CNORdt). The resulting logic model is then a subset of the superstructure and contains only the hyperedges that best explain the experimental data (with the additional attribute of parsimony, given the size penalty in the optimization function).
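The structure of such a two-term objective can be sketched in Python. Everything specific below is an assumption made for illustration: the candidate hyperedges, the measurements, the linear form and weight of the size penalty (`size_penalty`), and the use of exhaustive search in place of the genetic algorithm.

```python
from itertools import product

# Hypothetical superstructure: two candidate hyperedges into a single node (erk).
candidate_inputs = ["raf1", "pi3k"]

# Invented steady-state measurements of erk under four input conditions.
conditions = [
    ({"raf1": 1, "pi3k": 0}, 0.9),
    ({"raf1": 0, "pi3k": 1}, 0.1),
    ({"raf1": 1, "pi3k": 1}, 1.0),
    ({"raf1": 0, "pi3k": 0}, 0.0),
]


def objective(bitstring, size_penalty=0.01):
    """MSE between simulation and data plus a penalty on the number of hyperedges."""
    selected = [name for name, keep in zip(candidate_inputs, bitstring) if keep]
    squared_errors = []
    for inputs, measured in conditions:
        simulated = int(any(inputs[name] for name in selected))  # OR of kept edges
        squared_errors.append((simulated - measured) ** 2)
    mse = sum(squared_errors) / len(squared_errors)
    return mse + size_penalty * len(selected)


# Exhaustive search over all subsets of the superstructure (a stand-in for the GA).
best = min(product([0, 1], repeat=len(candidate_inputs)), key=objective)
```

Each bitstring selects a subset of the superstructure, and the size penalty breaks ties in favor of the smaller model when two subsets fit the data equally well.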

8.2 Network and data generation

The toy model was constructed manually and is based on the model from Saez-Rodriguez et al (2011a). The in silico data were generated from the toy model using CNORode. The parameters were manually adjusted to model as closely as possible the known dynamics of ERK and NFκB activation. After simulation, noise was added to each data point according to N(μ, σ²), where μ = 0 and σ² = 0.05. The data were then rescaled to the interval [0, 1]. Two methods of cross validation were also performed to demonstrate the robustness of CellNOptR (steady-state Boolean) to sparseness in the data (figure S8 (available from stacks.iop.org/PhysBio/9/045003/mmedia)).
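The noise and rescaling step can be sketched in Python as follows. The function name is hypothetical, the input values are invented, and min–max rescaling is an assumption, since the text does not state which rescaling was used.

```python
import math
import random


def add_noise_and_rescale(values, variance=0.05, seed=0):
    """Add Gaussian noise N(0, variance) to each value, then rescale to [0, 1]."""
    rng = random.Random(seed)
    sigma = math.sqrt(variance)  # random.gauss expects a standard deviation
    noisy = [v + rng.gauss(0.0, sigma) for v in values]
    lo, hi = min(noisy), max(noisy)
    return [(v - lo) / (hi - lo) for v in noisy]  # assumed min-max rescaling


# Invented noise-free time course of a transiently activated species.
clean = [0.0, 0.6, 1.0, 0.7, 0.3, 0.1]
noisy_rescaled = add_noise_and_rescale(clean)
```

Note that the variance σ² = 0.05 corresponds to a standard deviation of about 0.22, which is large relative to a signal normalized between 0 and 1.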

Model and data files, together with the corresponding R scripts, can currently be found at http://www.ebi.ac.uk/~aidanmac/public/logicModelingTutorial (password: tutorial).

8.3 Boolean logic

A Boolean model can be represented as follows:

(1) $N$ species $X_1, X_2, \ldots, X_N$, each represented by a variable $x_i$ taking values 0 or 1.

(2) For each species $X_i$, there is a subset of species $R_i = \{X_{i1}, X_{i2}, \ldots, X_{iN_i}\} \subset \{X_1, X_2, \ldots, X_N\}$ that influence $x_i$.

(3) For each species $X_i$, an update function $B_i : \{0, 1\}^{N_i} \rightarrow \{0, 1\}$.

From this set of rules, the state of each species at time $t + 1$ is a function of the state of its influencing species at time $t$ (Kauffman 1969).

So how does the function $B_i$ (also called a transfer function) for each species $X_i$ deal with inputs from other nodes? $B_i$ can be represented in a sum-of-product (SOP) formulation (Mendelson 1970), which allows for multiple possible inputs (AND, NOT, OR gates) to be processed into a single output. To illustrate this, consider the following simple example (figure 9).

We know that the element D is activated by a combination of A and B (i.e. both A and B are needed for activation). Hence, both the graphical and written representations of this activation are relatively straightforward:

$$B_1(a, b) = a \land b$$

However, in the case of the activation of C, this occurs when A is active without B or when B is active without A. In this case, one needs some additional rules of representation.

The SOP representation allows the above activation to be written using only AND, NOT and OR operators:

$$B_1(a, b) = (a \land \lnot b) \lor (\lnot a \land b)$$

This is done by calculating the product within brackets and summing between brackets. Essentially, SOP representations are rules of precedence for complex multi-node inputs. In terms of graphically representing the activation of C, its activation cannot be easily represented using standard SBGN AND, NOT or OR operators (figure 9). Hence, this SOP expression can be summarized as an XOR gate.

Figure 10. The construction of gates with cFL: (a) activating, (b) inhibitory, (c) an AND gate and (d) an OR gate.

A logic network where relations are encoded by SOP expressions can be represented as a hypergraph (Klamt et al 2006). A hypergraph is defined as a set of nodes connected by hyperedges, where a hyperedge is a generalization of an edge that can be connected to more than two nodes. This in turn can facilitate a more precise representation of biological knowledge (for example, where two proteins are necessary for the activation of a target).
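The link between SOP expressions and hyperedges can be made concrete with a short Python sketch. The data structure below (a list of signed-input dictionaries paired with a target) is a hypothetical encoding chosen for illustration, not the format used by CellNOptR.

```python
# Each hyperedge is (inputs, target); inputs map a source node to True if it
# must be active, or False if it must be inactive (a NOT on that input).
hyperedges = [
    ({"A": True, "B": False}, "C"),  # A AND NOT B -> C
    ({"A": False, "B": True}, "C"),  # NOT A AND B -> C
    ({"A": True, "B": True}, "D"),   # A AND B -> D
]


def evaluate(target, state):
    """Sum of products: OR over the hyperedges into target, AND within each one."""
    return int(any(
        all(bool(state[source]) == required for source, required in inputs.items())
        for inputs, edge_target in hyperedges
        if edge_target == target
    ))


truth_table = {
    (a, b): (evaluate("C", {"A": a, "B": b}), evaluate("D", {"A": a, "B": b}))
    for a in (0, 1)
    for b in (0, 1)
}
```

Each hyperedge corresponds to one product term, so the two hyperedges into C together reproduce the XOR behavior, while the single hyperedge into D is a plain AND gate.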

8.4 Fuzzy logic

cFL defines the transfer function between nodes as a Hill function. Depending on the type of interaction (or logic gate; figure 10), this function can take different forms (Morris et al 2011a):

(a) If node C depends only on A, a normalized Hill function is used to calculate c, where k and n are the sensitivity coefficient and Hill coefficient, respectively:

$$c = \frac{(k^{n} + 1)\, a^{n}}{k^{n} + a^{n}}$$

(b) An inhibitory relationship is represented as the above expression subtracted from 1:

$$c = 1 - \frac{(k^{n} + 1)\, a^{n}}{k^{n} + a^{n}}$$

(c) For an AND gate, the minimum value of c is used:

$$c = \min\left( \frac{(k_1^{n_2} + 1)\, a^{n_2}}{k_1^{n_2} + a^{n_2}},\ \frac{(k_2^{n_2} + 1)\, b^{n_2}}{k_2^{n_2} + b^{n_2}} \right)$$

(d) For an OR gate, the maximum value is used:

$$c = \max\left( \frac{(k_1^{n_2} + 1)\, a^{n_2}}{k_1^{n_2} + a^{n_2}},\ \frac{(k_2^{n_2} + 1)\, b^{n_2}}{k_2^{n_2} + b^{n_2}} \right)$$
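The four gate types can be written as a short Python sketch. The function names are hypothetical, and passing a separate (k, n) pair for each input of the AND and OR gates is an assumption of this sketch rather than a statement about CNORfuzzy.

```python
def hill(x, k, n):
    """Normalized Hill function: 0 at x = 0 and 1 at x = 1."""
    return (k**n + 1) * x**n / (k**n + x**n)


def activate(a, k, n):
    return hill(a, k, n)


def inhibit(a, k, n):
    return 1 - hill(a, k, n)


def and_gate(a, b, params_a, params_b):
    """AND gate: the minimum of the two transfer function outputs."""
    return min(hill(a, *params_a), hill(b, *params_b))


def or_gate(a, b, params_a, params_b):
    """OR gate: the maximum of the two transfer function outputs."""
    return max(hill(a, *params_a), hill(b, *params_b))


# Example (k, n) pairs, chosen arbitrarily for illustration.
p_a, p_b = (0.5, 3), (0.3, 2)
intermediate = and_gate(0.6, 0.4, p_a, p_b)
```

At Boolean inputs the gates reduce to their Boolean counterparts, while intermediate inputs give intermediate outputs, which is what allows cFL to fit partial activation.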

8.5 Logic ODEs

As in the case of cFL, CNORode uses phenomenological transfer functions (i.e. non-mechanistic, normalized Hill functions) to describe the dynamics of a node’s state as a function of its inputs. Using the examples in figure 10 again, these functions can be described as follows:

(a) For an activating relationship,

$$\dot{c} = \frac{1}{\tau}\bigl(B(a) - c\bigr)$$

where $\dot{c}$ is the development of c over time and B(a) is the normalized Hill function of the continuous variable a. This takes the form

$$B(a) = \frac{a^{n} / (k^{n} + a^{n})}{1^{n} / (k^{n} + 1^{n})}$$

(k and n are again the sensitivity and Hill coefficients, respectively). τ sets the timescale on which c responds to its input (biologically, this could encompass degradation or other limiting factors), and the −c term acts as an additional degradation term proportional to c.

(b) An inhibitory relationship is obtained by subtracting B(a) from 1:

$$\dot{c} = \frac{1}{\tau}\bigl(1 - B(a) - c\bigr)$$

(c) The AND gate takes the form

$$\dot{c} = \frac{1}{\tau}\bigl(B(a)B(b) - c\bigr)$$

(d) The OR gate notation is as follows:

$$\dot{c} = \frac{1}{\tau}\bigl(B(a)B(b) + B(a)[1 - B(b)] + B(b)[1 - B(a)] - c\bigr)$$

In the case of an AND gate, the product of B(a) and B(b) is taken, which maintains consistency in the output with the equivalent Boolean model (i.e. if a = 1 and b = 0, in both ODE and logic formalisms c = 0; similarly, with an OR gate, if a = 1 and b = 0, in both ODE and logic formalisms c = 1). As in the case of cFL, normalized Hill functions can approximate commonly observed biochemical dynamics such as linear, sigmoidal and step-like behavior.
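This consistency with the Boolean model can be checked numerically with a small Python sketch. It uses a plain forward Euler integration with arbitrarily chosen parameter values; it is not the CNORode solver, and the function names are hypothetical.

```python
def normalized_hill(x, k, n):
    """B(x): Hill function divided by its value at x = 1."""
    return (x**n / (k**n + x**n)) / (1.0 / (k**n + 1.0))


def gate_target(gate, a, b, k, n):
    """Right-hand-side target of the logic ODE for an AND or OR gate."""
    Ba, Bb = normalized_hill(a, k, n), normalized_hill(b, k, n)
    if gate == "and":
        return Ba * Bb
    if gate == "or":
        return Ba * Bb + Ba * (1 - Bb) + Bb * (1 - Ba)
    raise ValueError(f"unknown gate: {gate}")


def integrate(gate, a, b, tau=1.0, k=0.5, n=3.0, dt=0.01, n_steps=2000):
    """Forward Euler integration of dc/dt = (target - c) / tau from c = 0."""
    c = 0.0
    for _ in range(n_steps):
        c += dt * (gate_target(gate, a, b, k, n) - c) / tau
    return c


c_and = integrate("and", 1.0, 0.0)
c_or = integrate("or", 1.0, 0.0)
```

With a = 1 and b = 0, the AND gate keeps c at 0 and the OR gate drives c toward 1, matching the Boolean outputs.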

Acknowledgments

The authors thank J Banga, J Egea, Inna Pertsovskaya and Melody Morris for valuable help and discussion. Funding was provided by the EU-7FP-BioPreDyn and EMBL-EIPOD programs.

References

Aislyn D W and Boran R I 2010 Systems approaches to polypharmacology and drug discovery Curr Opin Drug Discov Dev 13 297–309

Albert I et al 2008 Boolean network simulations for life scientists Source Code Biol Med 3 16

Aldridge B B et al 2006 Physicochemical modelling of cell signalling pathways Nature Cell Biol 8 1195–203

Assmann S M and Albert R 2009 Discrete dynamic modeling with asynchronous update, or how to model complex systems in the absence of quantitative information Methods Mol Biol 553 207–25

Bader G D, Cary M P and Sander C 2006 Pathguide: a pathway resource list Nucleic Acids Res 34 D504–6 (Database issue)

Banga J 2008 Optimization in computational systems biology BMC Syst Biol 2 47

Barabasi A-L and Oltvai Z N 2004 Network biology: understanding the cell's functional organization Nature Rev Genet 5 101–13

Bauer-Mehren A, Furlong L I and Sanz F 2009 Pathway databases and tools for their exploitation: benefits, current limitations and challenges Mol Syst Biol 5 290

Bossi A and Lehner B 2009 Tissue specificity and the human protein interaction network Mol Syst Biol 5 260

Calzone L et al 2008 A comprehensive modular map of molecular interactions in RB/E2F pathway Mol Syst Biol 4 173

Calzone L et al 2010 Mathematical modelling of cell-fate decision in response to death receptor engagement PLoS Comput Biol 6 e1000702

Chen W W, Niepel M and Sorger P K 2010 Classic and contemporary approaches to modeling biochemical reactions Genes Dev 24 1861–75


Phys Biol 9 (2012) 045003 A MacNamara et al

de Jong H 2002 Modeling and simulation of genetic regulatory systems: a literature review J Comput Biol 9 67–103

Di Cara A et al 2007 Dynamic simulation of regulatory networks using SQUAD BMC Bioinformatics 8 462

Eduati F et al 2010 A Boolean approach to linear prediction for signaling network modeling PLoS One 5 e12789

Egea J and Martí R 2010 An evolutionary method for complex-process optimization Comput Oper Res 37 315–24

Faure A et al 2006 Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle Bioinformatics 22 e124–31

Fisher J and Henzinger T A 2007 Executable cell biology Nature Biotechnol 25 1239–49

Garg A et al 2008 Synchronous versus asynchronous modeling of gene regulatory networks Bioinformatics 24 1917–25

Gonzalez A, Chaouiya C and Thieffry D 2008 Logical modelling of the role of the Hh pathway in the patterning of the Drosophila wing disc Bioinformatics 24 i234–40

Helikar T et al 2008 Emergent decision-making in biological signal transduction networks Proc Natl Acad Sci USA 105 1913–8

Hlavacek W S et al 2006 Rules for modeling signal-transduction systems Sci STKE 2006 re6

Hoffmann A et al 2002 The IkappaB–NF-kappaB signaling module: temporal control and selective gene activation Science 298 1241–5

Hucka M et al 2003 The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models Bioinformatics 19 524–31

Ilsley G R, Luscombe N M and Apweiler R 2009 Know your limits: assumptions, constraints and interpretation in systems biology Biochim Biophys Acta 1794 1280–7

Jørgensen C and Linding R 2010 Simplistic pathways or complex networks? Curr Opin Genet Dev 20 15–22

Junker B H, Klukas C and Schreiber F 2006 VANTED: a system for advanced data analysis and visualization in the context of biological networks BMC Bioinformatics 7 109

Kauffman S A 1969 Metabolic stability and epigenesis in randomly constructed genetic nets J Theor Biol 22 437–67

Khoury G A, Baliban R C and Floudas C A 2011 Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database Sci Rep 1 90

Kirouac D C et al 2012 Creating and analyzing pathway and protein interaction compendia for modelling signal transduction networks BMC Syst Biol 6 29

Klamt S, Saez-Rodriguez J and Gilles E D 2007 Structural and functional analysis of cellular networks with CellNetAnalyzer BMC Syst Biol 1 2

Klamt S et al 2006 A methodology for the structural and functional analysis of signaling and regulatory networks BMC Bioinformatics 7 56

Kreeger P K and Lauffenburger D A 2010 Cancer systems biology: a network modeling perspective Carcinogenesis 31 2–8

Krumsiek J et al 2010 Odefy - from discrete to continuous models BMC Bioinformatics 11 233

Liberali P, Ramo P and Pelkmans L 2008 Protein kinases: starting a molecular systems view of endocytosis Annu Rev Cell Dev Biol 24 501–23

Ma'ayan A et al 2005 Formation of regulatory patterns during signal propagation in a mammalian cellular network Science 309 1078–83

Mendelson E 1970 Boolean Algebra and Switching Circuits (Schaum's Outline Series) (New York: McGraw-Hill)

Mendoza L and Xenarios I 2006 A method for the generation of standardized qualitative dynamical systems of regulatory networks Theor Biol Med Modelling 3 13

Mitsos A et al 2009 Identifying drug effects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data PLoS Comput Biol 5 e1000591

Morris M K et al 2010 Logic-based models for the analysis of cell signaling networks Biochemistry 49 3216–24

Morris M K et al 2011a Training signaling pathway maps to biochemical data with constrained fuzzy logic: quantitative analysis of liver cell responses to inflammatory stimuli PLoS Comput Biol 7 e1001099

Morris M K et al 2011b Querying quantitative logic models (Q2LM) to study intracellular signaling networks and cell-cytokine interactions Biotechnol J 7 374–86

Novere N L et al 2009 The systems biology graphical notation Nature Biotechnol 27 735–41

Oda K et al 2005 A comprehensive pathway map of epidermal growth factor receptor signaling Mol Syst Biol 1 2005.0010

Patlak M 2010 Competitors try collaboration to speed drug development J Natl Cancer Inst 102 841–3

Pieroni E et al 2008 Protein networking: insights into global functional organization of proteomes Proteomics 8 799–816

Rangamani P and Iyengar R 2008 Modelling cellular signalling systems Essays Biochem 45 83–94

Saez-Rodriguez J et al 2009 Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction Mol Syst Biol 5 331

Saez-Rodriguez J, Alexopoulos L G and Stolovitzky G 2011a Setting the standards for signal transduction research Sci Signal 4 pe10

Saez-Rodriguez J et al 2011b Comparing signaling networks between normal and transformed hepatocytes using discrete logical models Cancer Res 71 5400–11

Sahin O et al 2009 Modeling ERBB receptor-regulated G1/S transition to find novel targets for de novo trastuzumab resistance BMC Syst Biol 3 1

Sasagawa S et al 2005 Prediction and validation of the distinct dynamics of transient and sustained ERK activation Nature Cell Biol 7 365–73

Schlatter R et al 2009 ON/OFF and beyond - a boolean model of apoptosis PLoS Comput Biol 5 e1000595

Shmulevich I et al 2003 Steady-state analysis of genetic regulatory networks modelled by probabilistic boolean networks Comp Funct Genomics 4 601–8

Terfve C and Saez-Rodriguez J 2012 Modeling signaling networks using high-throughput phospho-proteomics Adv Exp Med Biol 736 19–57

Thakar J and Albert R 2010 Boolean models of within-host immune interactions Curr Opin Microbiol 13 377–81

Vinayagam A et al 2011 A directed protein interaction network for investigating intracellular signal transduction Sci Signal 4 rs8

Watterson S, Marshall S and Ghazal P 2008 Logic models of pathway biology Drug Discov Today 13 447–56

Wittmann D M et al 2009 Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling BMC Syst Biol 3 98


  • 1. Introduction
    • 1.1. From parts to interactions
    • 1.2. From interactions to mechanism
    • 1.3. Logic modeling
    • 1.4. Software
    • 1.5. The example model
  • 2. Boolean steady state
    • 2.1. Steady-state optimization and simulation
    • 2.2. Interpretation of steady-state result
  • 3. Two time points (or additional steady state)
    • 3.1. Multiple steady-state optimization and simulation
    • 3.2. Interpretation
  • 4. Synchronous multiple time-point simulation and multiple timescales
    • 4.1. Synchronous and asynchronous updating
  • 5. Constrained fuzzy logic
    • 5.1. Model training and simulation
    • 5.2. Interpretation
  • 6. Logic ODEs
    • 6.1. Converting from Boolean to continuous
    • 6.2. Parameter estimation
    • 6.3. Compressing an ODE model
  • 7. Summary and future developments
  • 8. Materials and methods
    • 8.1. CellNOptR
    • 8.2. Network and data generation
    • 8.3. Boolean logic
    • 8.4. Fuzzy logic
    • 8.5. Logic ODEs
  • Acknowledgments
  • References

OPEN ACCESS
IOP PUBLISHING    PHYSICAL BIOLOGY

Phys. Biol. 9 (2012) 045003 (16pp)    doi:10.1088/1478-3975/9/4/045003

State–time spectrum of signal transduction logic models

Aidan MacNamara¹, Camille Terfve¹, David Henriques¹, Beatriz Penalver Bernabe¹,² and Julio Saez-Rodriguez¹

¹ European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK
² Department of Chemical and Biological Engineering, Northwestern University, 2145 Sheridan Rd, Tech E-136, Evanston, IL 60208-3120, USA

E-mail: saezrodriguez@ebi.ac.uk

Received 29 February 2012
Accepted for publication 30 May 2012
Published 7 August 2012
Online at stacks.iop.org/PhysBio/9/045003

Abstract
Despite the current wealth of high-throughput data, our understanding of signal transduction is still incomplete. Mathematical modeling can be a tool to gain an insight into such processes. Detailed biochemical modeling provides deep understanding, but does not scale well above relatively few proteins. In contrast, logic modeling can be used where the biochemical knowledge of the system is sparse and, because it is parameter free (or, at most, uses relatively few parameters), it scales well to large networks that can be derived by manual curation or retrieved from public databases. Here, we present an overview of logic modeling formalisms in the context of training logic models to data, and specifically the different approaches to modeling qualitative to quantitative data (state) and dynamics (time) of signal transduction. We use a toy model of signal transduction to illustrate how different logic formalisms (Boolean, fuzzy logic and differential equations) treat state and time. Different formalisms allow for different features of the data to be captured, at the cost of extra requirements in terms of computational power and data quality and quantity. Through this demonstration, the assumptions behind each formalism are discussed, as well as their advantages and disadvantages and possible future developments.

Online supplementary data available from stacks.iop.org/PhysBio/9/045003/mmedia

1. Introduction

The question of how signal transduction networks are able to weigh and integrate a multitude of extra- and intracellular signals into context-specific phenotypic outcomes is complex and difficult to answer. Typically, a signal transduction network links diverse inputs (stimuli) and outputs (gene regulation, motility, etc) through a dense system of proteins assembled in pathways that are connected by crosstalk and embedded in feedback loops (Rangamani and Iyengar 2008, Jørgensen and Linding 2010, Terfve and Saez-Rodriguez 2012). This complexity enhances the robustness and versatility of the network, but makes it difficult to understand in terms of mechanism. This is demonstrated where the complex consequences of mutation and deregulation in diseases such as cancer make identifying potential drug targets difficult, even in the case where the causative mutation is well known (Kreeger and Lauffenburger 2010, Patlak 2010). Often, counter-intuitive therapeutic targets produce the most successful results due to this complexity, and the field of network pharmacology is based around this premise (Aislyn and Boran 2010).

Content from this work may be used under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1478-3975/12/045003+16$33.00    © 2012 IOP Publishing Ltd    Printed in the UK & the USA

Ideally, in order to understand such a complex and dynamic system, the quantities and states of large populations of proteins and their splice variants should be measured in vivo across time and across populations of cells (both tissues and individuals) under a range of different conditions (Liberali et al 2008). In the absence of this quality of data, it is necessary to use more qualitative and less time-resolved information to deduce mechanism. The focus of measurement in such signal transduction networks is the protein or, more specifically, the protein together with post-translational modifications (PTMs), as it is PTMs such as phosphorylation that convey information through a network. Hence, measurement comes from the field of proteomics. However, assumptions are necessary when considering the variety of PTMs that may occur. There are more than 500 different types of PTMs and measuring the status of each site for all proteins is technically impossible (Khoury et al 2011). Indeed, this problem can be encountered with just phosphorylation alone. Considering the epidermal growth factor receptor (EGFR) has 31 phosphorylation sites, this implies that $2^{31}$ states of EGFR (each site can be phosphorylated or not) would need to be measured to provide full knowledge of the activation of this receptor and how it would change over time. Therefore, the study of signal transduction networks tends to concentrate on a subset of phosphorylation sites where the site interaction partner(s) is known and measurement is technologically feasible (e.g. high-quality antibodies are available). These phosphorylation events are often used as markers of activation and deactivation. The consequence of such an approach is an experimental bias toward such phosphosites, a problem that is only now being addressed through less-biased high-throughput techniques such as mass spectrometry.
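To give a sense of scale for the EGFR example above, the number of combinatorial phosphorylation states can be computed directly. This is only an arithmetic illustration; the variable names are arbitrary.

```python
# Each of the 31 phosphorylation sites is either phosphorylated or not,
# so the number of distinct site-occupancy states is 2 to the power 31.
n_sites = 31
n_states = 2 ** n_sites
print(n_states)  # 2147483648
```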

Phosphoproteomics can be divided into antibody- and mass-spectrometry-based methods. A comprehensive summary of these methods can be found in Terfve and Saez-Rodriguez (2012). Broadly speaking, the quality of data can be measured in terms of coverage, time resolution and specificity. Antibody-based methods are generally specific (depending on the quality of the antibody) and can be used to measure time courses of target proteins across many conditions. However, the number of targets that can be measured is limited. In contrast, mass spectrometry techniques allow for systematic identification and quantification of phosphorylated proteins. However, this comes with the caveat of requiring expensive equipment and advanced know-how: reliable quantification can be difficult, protocols (especially for mass spectrometry) can be laborious, and mapping measurements to proteins is not trivial (Ilsley et al 2009).

Whatever method is chosen from the above, the result is a quantitative 'part list' consisting of phosphoproteomic measurements from the signaling network of interest, taken under a certain number of conditions (different stimuli, inhibitors, time points, etc) and describing the states of these parts.

1.1. From parts to interactions

In order to deduce a mechanism of action that explains these types of data, the interactions between the parts must be understood. Interactions can be represented as node-edge graphs. The nodes can be biological entities such as proteins, as in this case, or genes or metabolites in the case of transcriptional or metabolic studies. Edges can be described as biological activities such as catalysis, association and modification of the participating nodes. They may be directed (i.e. protein X affects protein Y and not vice versa) or undirected. Furthermore, they can be signed (inhibitory/activating) or unsigned. When these graphs describe protein interactions, they can be characterized in two categories: protein interaction networks (PINs) and protein signaling networks (PSNs) (Pieroni et al 2008).
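The edge attributes described above (directed or undirected, signed or unsigned) map naturally onto a small data structure. The following is an illustrative sketch only; the class, field and function names are hypothetical and do not correspond to any particular database or package format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """One interaction between two nodes (e.g. proteins) in a node-edge graph."""
    source: str
    target: str
    directed: bool = True
    sign: Optional[int] = None  # +1 activating, -1 inhibitory, None if unsigned


def successors(edges, node):
    """Nodes reachable from `node` in one step, respecting edge direction."""
    out = set()
    for e in edges:
        if e.source == node:
            out.add(e.target)
        elif not e.directed and e.target == node:
            out.add(e.source)  # undirected edges can be traversed both ways
    return out


# An undirected, unsigned interaction versus a directed, activating one.
undirected_edge = Edge("X", "Y", directed=False)
activating_edge = Edge("X", "Y", directed=True, sign=+1)

print(successors([undirected_edge], "Y"))  # {'X'}
print(successors([activating_edge], "Y"))  # set()
```

Keeping sign and direction as explicit fields makes it clear which information a given network source actually provides and which is missing.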

PINs can be constructed from a number of sources: high-throughput experiments such as two-hybrid and affinity purification/mass spectrometry, or systematic literature searches (bibliome mining). These methods yield limited functional insight beyond a possible interaction between two proteins. They are represented as a graph with a set of nodes connected by a set of edges without sign or directionality. There are a number of public repositories of PINs, such as IntAct, HPRD and STRING.

PSNs are more detailed representations of protein interactions where, if described as a graph, their edges can have directionality and, when possible, sign. PSNs are generally obtained through expert curation of the literature or text mining. There are multiple public repositories of manually curated networks, including KEGG, WikiPathways, Nature Pathway Interaction Database and Reactome. Each has its strengths and weaknesses in terms of graphical formats, annotation, accuracy and curation. When creating a PSN, as many sources as possible should be consulted before deciding on a final network (Bader et al 2006, Bauer-Mehren et al 2009). Pathway Commons is also a useful portal that integrates these and other PSN and PIN repositories.

While PSNs provide an insight into the transfer of signaling information, they do so at an abstract level without specifying the mechanism of signal transduction. This additional detail is provided by biochemical networks that describe such interactions in a quantitative manner (i.e. phosphorylation, binding, dimerization, etc). There are many examples in the context of metabolism and, recently, an increasing number for signal transduction, such as the reconstruction of the signaling network downstream of the EGFR (Oda et al 2005) and the retinoblastoma/E2 transcription factor (RB/E2F) pathway (Calzone et al 2008).

Independent of the resolution of the network, there is such a high degree of interconnectedness, redundancy and cell/context specificity, even in well-studied networks such as the mitogen-activated protein kinase (MAPK) signaling pathway, that it is difficult to obtain a high degree of accuracy from prior knowledge alone (Kirouac et al 2012). Therefore, prior knowledge networks should be constructed with as much cell-specific knowledge as possible, and high-throughput databases can be advantageous to add annotation and information (e.g. expression data for a particular network in a specific cell type). The growing use of model standards such as Systems Biology Markup Language (SBML) (Hucka et al 2003) can also aid in pooling resources across datasets.

1.2. From interactions to mechanism

Through manual curation of the literature or network databases, there are a number of ways of arriving at a network or map that represents the biological interactions of the system of interest. So what are the requirements to step from this biological representation to a mathematical analysis and understanding of these networks? Graph theory can be used to analyze the topology of a network to understand the principles behind its design (Barabasi and Oltvai 2004). These networks can also be used as a scaffold for overlaying expression data to better understand the activation of sub-networks (Bossi and Lehner 2009). These analyses can provide a useful insight, but are not amenable to explaining how a signaling network responds to a defined set of perturbations (Saez-Rodriguez et al 2009). This aim can only be achieved via mechanistic and predictive computational modeling. By computational modeling we mean, in the context of this paper, the construction of an in silico representation of a system (in this case, a cell signaling network) that can be simulated through a set of programmable commands that mimic the functioning of the system over time (Terfve and Saez-Rodriguez 2012). Simulating over time (or dynamic modeling) consists of using functions to describe how each species' (or node's) state in a network changes as a function of its inputs.

There are many approaches to this type of modeling and the choice of method is highly dependent on the quality and type of data available for the network, together with the accuracy of one's prior knowledge about its topology and interactions. We will briefly give an overview of some popular methods, but more detail about each approach can be found in a number of excellent reviews (e.g. Aldridge et al 2006, de Jong 2002).

Physicochemical modeling (modeling that includes biochemical and physical features of the system) is an approach that uses equations derived from physical and chemical theory to describe biological processes such as covalent binding, association and diffusion (Aldridge et al 2006). This is a popular and insightful type of model for signaling networks and many examples can be found in repositories such as biomodels.net. These equations are built through a deep understanding of the underlying biochemistry and hence refer to distinct processes (such as catalysis and complex formation). The family of physicochemical model formalisms includes, among others, ordinary and partial differential equations (ODEs and PDEs), their stochastic variants and rule-based approaches. ODEs are the most common approach and can represent a signaling network through a set of coupled equations that describe the change in concentration of the elements (biomolecular species) of the network. ODEs are based on the assumption of mass action kinetics, a law that defines the rate of a reaction as being proportional to the concentrations of the reacting species (Chen et al 2010). This assumption can break down if there are spatial gradients for species or if concentrations of species are low enough that random fluctuations become a factor in the behavior of the system. In such cases, PDEs and stochastic formalisms are better suited to capture the biological behavior.
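As a concrete illustration of mass action kinetics, consider a single irreversible binding reaction A + B → C whose rate is proportional to the product of the reactant concentrations. The sketch below is a toy example under stated assumptions: the reaction, the rate constant, the initial concentrations and the forward-Euler integrator are all hypothetical and are not taken from any model discussed in this paper.

```python
def simulate_binding(a0, b0, k_on, dt=0.001, t_end=10.0):
    """Integrate d[C]/dt = k_on * [A] * [B] for A + B -> C with forward Euler."""
    a, b, c = a0, b0, 0.0
    for _ in range(int(round(t_end / dt))):
        rate = k_on * a * b  # mass action: rate proportional to [A] * [B]
        a -= dt * rate
        b -= dt * rate
        c += dt * rate
    return a, b, c


# Hypothetical initial concentrations and rate constant (arbitrary units).
a, b, c = simulate_binding(a0=1.0, b0=0.5, k_on=1.0)
print(a, b, c)
```

Even this one-reaction system needs a rate constant and two initial conditions, which hints at why parameter determination becomes demanding for larger networks.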

Another drawback to physicochemical modeling is the difficulty in managing and manipulating large networks, both in terms of the combinatorial complexity that such networks present (for example, the number of phosphorylation states of EGFR) and determination of the parameters of each equation, such as rate constants and initial conditions. Rule-based modeling allows easier manipulation and management of larger systems. Models are specified by a set of rules corresponding to the molecular interactions among protein domains and these are then automatically converted into a model that describes all possible reactions and molecular configurations (see Hlavacek et al (2006) for an introduction to rule-based modeling).

In summary, physicochemically detailed modeling generally works well with small, detailed biochemical networks. In the absence of such criteria, a coarser-grained approach is necessary and logic modeling can be viewed in this light.

1.3. Logic modeling

Unlike physicochemical modeling, logic modeling requires only a PSN as a starting point to simulate signaling processes. Although sparse in detail, such graphs are very insightful for understanding how the structure (or topology) determines the flow of information from input through to output (for example, ligand–receptor binding through to transcription factor activation (Ma'ayan et al 2005)). However, before questions can be addressed by simulation, the graph must be made computable by defining how each node state changes as its inputs change, so that input–output relationships can be quantified for the whole system.

Logic modeling uses transfer functions to describe the relationship between nodes in a graph (see section 8.3). Transfer functions are the mathematical representation of the relationship between inputs and outputs in a system. In physicochemical modeling, these are based on mass action kinetics and describe how the input species are transformed into output species by the chemical reaction. In logic modeling, transfer functions consist of logic operators (AND, OR, NOT) that describe how an output node is activated by its inputs. To illustrate this, we can use a simple case from the PI-3-Kinase–Akt signaling network that controls growth and division in mammalian cells. As part of this network, the kinase Akt is activated by the kinase PDK1 and the kinase complex mTOR. This would be represented in a PSN by two directed positive edges from PDK1 and mTOR to Akt. However, from this representation, it would not be known whether both kinases or only one of them is necessary for Akt activation. In logic modeling, this relationship can be represented using an AND operator that specifies the necessity of both kinases for Akt activation. Such an example also illustrates the strength and weakness of logic modeling: the reduction of complexity that enables modeling of large systems with incomplete information and fewer parameters, against less mechanistic detail and biochemical accuracy.
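The difference between the two candidate gates for the Akt example can be made explicit with a truth table. This is a minimal Boolean sketch; the function names are hypothetical and the OR variant is included only for contrast with the AND gate described in the text.

```python
from itertools import product


def akt_and(pdk1, mtor):
    """Akt is active only if both PDK1 and mTOR are active."""
    return pdk1 and mtor


def akt_or(pdk1, mtor):
    """Akt is active if at least one of PDK1 or mTOR is active."""
    return pdk1 or mtor


# Enumerate every combination of input states.
table = [
    (pdk1, mtor, akt_and(pdk1, mtor), akt_or(pdk1, mtor))
    for pdk1, mtor in product([False, True], repeat=2)
]

print("PDK1  mTOR  Akt(AND)  Akt(OR)")
for row in table:
    print("  ".join(str(int(value)).ljust(4) for value in row))
```

The two gates disagree only when exactly one kinase is active, so experiments that perturb the kinases individually are the ones that can discriminate between them.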

The use of logic-based modeling of biological systems goes back more than 40 years, with the first model describing a gene regulatory network (Kauffman 1969). Since then, logic modeling has proved particularly useful in describing the effect of environmental inputs on cell phenotypes through networks of signal transduction. There are multiple studies using this type of modeling as a basis (Helikar et al 2008, Calzone et al 2010, Gonzalez et al 2008, Mendoza and Xenarios 2006, Schlatter et al 2009, Sahin et al 2009) (see also reviews: Morris et al 2010, Watterson et al 2008, Thakar and Albert 2010). The structure of a signal transduction network lends itself to logic modeling, with clearly defined input nodes (ligand–receptor combinations), measurable elements corresponding to activation (phosphorylated proteins downstream of the receptor) and relatively little knowledge of the biochemistry involved.

Having summarized how logic modeling formulates input/output relationships (see above and section 8.3), the next step is to consider the complexity of the logic modeling in terms of how state and time are treated. The state refers here to the value or quantity attached to each node (typically a protein) and reflects the activation of that node at any point during a simulation. Its value is proportional to activation and can vary from 0 to some arbitrary or defined upper limit, depending on the type of logic modeling being undertaken. States can be defined as on/off (Boolean logic), multi-level or continuous, and we will discuss each of these in turn. When training logic models with experiments, these values are often compared to biochemical data. For instance, the phosphorylation of a protein is considered to be a proxy of its activation (e.g. phosphorylation at the phosphosite threonine-202 of extracellular signal-regulated kinase (ERK)).

Similarly, there are several approaches to handle time in a logic context, ranging from the simplest approximation of discrete or steady-state measurement to more biologically realistic continuous updating. These techniques will also be introduced in turn with the examples below.

1.4. Software

As a means to introduce the different methods and how they can be used to model different aspects of signal transduction, we will use the tool CellNOpt (www.cellnopt.org). CellNOpt is a software package that trains the topology of a PSN to experimental data by the criterion of minimizing the error between the data and the logic model created from the PSN. In CellNOpt, the starting network based on prior knowledge is called the prior knowledge network (PKN) (a name we will use in the rest of this paper). This PKN is preprocessed before training by compression and expansion (see materials and methods, section 8.1). The compression step of CellNOpt is a method of reducing the complexity of a logic model by removing nodes that have no effect on the outcome of simulation. The expansion step subsequently includes all possible hyperedges (materials and methods, section 8.1) in the model. The model is trained by minimizing a bipartite function that calculates the mismatch between the logic model and experimental data (mean squared error (MSE)) while penalizing model size. This minimization can be solved using different strategies, from simple enumeration of options for small cases to stochastic optimization algorithms such as genetic algorithms (Saez-Rodriguez et al 2009) or integer linear programming (Mitsos et al 2009).
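The two-part objective described above (data mismatch plus a size penalty) can be sketched as follows. This is not the CellNOpt code: the additive form, the penalty weight `alpha`, the use of the hyperedge count as the size measure and the function names are all assumptions made for illustration.

```python
def mse(simulated, measured):
    """Mean squared error between simulated and measured node values."""
    if len(simulated) != len(measured):
        raise ValueError("simulated and measured must have the same length")
    return sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(measured)


def score(simulated, measured, n_hyperedges, alpha=0.01):
    """Fit term plus a size penalty; alpha is a hypothetical penalty weight."""
    return mse(simulated, measured) + alpha * n_hyperedges


# Hypothetical data and three candidate models.
data = [0.0, 1.0, 1.0]
small_exact = score([0.0, 1.0, 1.0], data, n_hyperedges=3)
large_exact = score([0.0, 1.0, 1.0], data, n_hyperedges=5)
small_wrong = score([0.0, 1.0, 0.0], data, n_hyperedges=2)
print(small_exact, large_exact, small_wrong)
```

Under this form, two models that fit equally well are ranked by size, while a smaller model that misses a data point is only preferred if the penalty weight is large relative to the error it introduces.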

The R version (CellNOptR) is available on Bioconductor and has a number of added features that allow the user to run different variations of logic modeling within the same framework of model calibration. These variations include steady state to discrete time Boolean modeling, fuzzy logic and logic ODEs, all of which will be discussed in turn below. We will also refer to other software packages that have contrasting or complementary approaches to CellNOptR.

For the remainder of the tutorial, different logic formalisms will be introduced and explained with the aid of CellNOptR, and the assumptions, strengths and weaknesses of each formalism with regard to training to data will be illustrated with a 'toy model' of signal transduction.

1.5. The example model

To illustrate the variety of logic modeling approaches, we will use an imaginary but biologically plausible PKN (figure 1). This network includes a subset of intracellular signaling networks known to be activated downstream of EGF and TNFα stimulation and was derived from a larger network presented in Saez-Rodriguez et al (2009). In brief, the PKN includes three MAPK cascades (ERK, p38 and JNK1), the PI3K/Akt/GSK-3 pro-survival pathway and the IKK/IκB/NFκB pathway. It consists of 30 nodes and 33 edges.

From this PKN, we derived a model (the data-generating model) that was used to simulate experimental data (section 8.1 and table S1/figure S1 (available from stacks.iop.org/PhysBio/9/045003/mmedia)). The in silico data replicate biologically plausible behavior that has been seen in such networks, such as the transient behavior of ERK activation (Sasagawa et al 2005) and the oscillatory dynamics of NFκB translocation from the cytoplasm to the nucleus (Hoffmann et al 2002) (figure 2). These in silico data consist of ten 'experiments', which vary according to different combinations of stimulation and inhibition (inhibition is achieved by blocking the activity of two specific kinases (proteins), PI3K and Raf-1, with small-molecule inhibitors), and 16 observations at 2 min intervals from t = 0. Inhibition is used in such experiments to further understand the combinations of upstream events that contribute to the activation of a particular protein. The readouts chosen are well-established downstream events of EGF/TNFα stimulation. The experiment represents an ideal situation with multiple time-point sampling. However, as we will discuss later, with fewer measurements one can capture most (but not necessarily all) of the dynamics of the system. The values are between 0 and 1, and Gaussian noise was also added to the output to imitate inherent biological noise and the measurement error (see materials and methods, section 8.2). The PKN has the following important properties:

(i) It does not specify which input or combination of inputsactivate a particular node (for example both Map3K1and Map3K7 activate MKK4 The PKN does not specifywhether this is an AND or OR relationship figure 1)

4

Phys Biol 9 (2012) 045003 A MacNamara et al

Figure 1 The prior knowledge network (PKN) used as the starting point for the toy model depicted with VANTED (Junker et al 2006) in anSBGN activity flow format (Novere et al 2009) The experimental design is annotated as colored nodes inputs are shown in green inhibitednodes in red and readouts in blue (Raf-1 is annotated as both red and blue as it is inhibited and measured) Nodes that were compressed (seematerials and methods section 81 and figures S2 and S3 (available from stacksioporgPhysBio9045003mmedia)) have a dashed outlineThe data-generating model contained hyperedges from TRAF2 to MKK7 (dashed edges in figure) that were removed for the PKN todemonstrate how incomplete prior knowledge can affect fitting the data The input to MKK4 is highlighted to demonstrate the concept of ahypergraph MKK4 has two inputs Map3K1 and Map3K7 However both AND and OR can be necessary for MKK activation This isrepresented by the two inputs expanding to three hyperedges (inset and supplementary figure S4 (available fromstacksioporgPhysBio9045003mmedia)) The goal of CellNOptR is to find the subset of all possible hyperedges (logic gates) in the PKNthat best explains the data

(ii) It includes additional interactions (TNFR → PI3K, PI3K → Rac, Rac → Map3K1) not present in the data-generating model.

(iii) It is missing interactions (TRAF2 → ASK-1, ASK-1 → Map3K7) that are present in the data-generating model and are necessary to fully explain the in silico data.

The purpose of these gaps and errors in our ‘prior knowledge’ is to demonstrate the ability of CellNOptR to train context-specific models from unspecific prior knowledge, and also to demonstrate the limitations of such an approach when information is incomplete. We will also demonstrate how CellNOptR performs when trying to find the true network topology and model parameters by using the different logic model formalisms to simulate the ‘experimental data’, and hence demonstrate the strengths, weaknesses and underlying assumptions of each of the logic model formalisms in turn. The network has been designed such that the features uncovered by the logic formalisms are not confounded by the missing interactions.
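The expansion of a node's inputs into candidate hyperedges, as described for MKK4 in figure 1, can be sketched as follows. This is a minimal illustration, not the CellNOptR implementation: the function name `expand_hyperedges` is hypothetical, negated inputs are ignored, and every non-empty subset of inputs is assumed to form one AND gate (several gates on the same target are combined with OR).

```python
from itertools import combinations

def expand_hyperedges(inputs, target):
    """Enumerate candidate AND gates (hyperedges) feeding one target node.

    Every non-empty subset of the inputs becomes one hyperedge; several
    hyperedges pointing at the same target are combined with OR.
    """
    edges = []
    for size in range(1, len(inputs) + 1):
        for subset in combinations(inputs, size):
            edges.append((subset, target))
    return edges

# Two inputs expand to three hyperedges, as in the inset of figure 1.
candidates = expand_hyperedges(["Map3K1", "Map3K7"], "MKK4")
for sources, target in candidates:
    print(" AND ".join(sources), "->", target)
```

Training then amounts to choosing which of these candidate hyperedges to keep.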

2 Boolean steady state

In arguably the simplest case of data, an experimental design looking at a particular signal transduction network will consist of a set of measurements representing the phosphorylation state of a subset of proteins in the signaling network. These measurements will be taken before the addition of a stimulus or stimuli and at a single time point after stimulation (t = 0 and t = t1). Additionally, the effects of multiple conditions (inhibitions, perturbations) may also be examined with this design. This is a common approach when studying signal transduction, which has classically been used via low-throughput methods and has more recently been scaled up owing to new technological developments (Terfve and Saez-Rodriguez 2012).

[Figure 2 plot: one row per experiment; columns Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3k, raf1); x-axis t from 0 to 30 min; y-axis tick at 0.5.]

Figure 2. The in silico data generated to test each logic modeling formalism. The data were generated by a logic model (the data-generating model). Each row of the figure represents an experiment with a certain combination of stimuli and inhibitors (shown in the final two columns; black is ON, white is OFF). The simulated data are shown as a continuous black line. Gaussian noise was added (section 1.5 and materials and methods, section 8.2) and the data were ‘sampled’ at 16 equally spaced time points between 0 and 30 min to simulate a fine-grained time course experimental design.

Choosing a single time point after stimulation leads to a simple design and minimizes the cost per experiment. However, it then becomes critical to choose an appropriate time t1 (see figures S5–7 (available from stacks.iop.org/PhysBio/9/045003/mmedia)). Ideally, one should perform a set of detailed time course experiments that encapsulates the variation in activation in the system, but this is usually not viable in terms of cost and time constraints. It may be only possible to perform a detailed time course experiment for a single phosphoprotein. From this, a time point can be chosen that is characteristic of the activation of the phosphoproteins of interest. Typically in signal transduction, a fast wave of activation occurs over a short timescale after stimulation. This is followed by slower, later mechanisms that often down-regulate the signals over a longer timescale (e.g. degradation, internalization, etc).

Returning to our example, the measurement of phosphorylated ERK could be viewed as a sensible output with which a time course can be obtained. (Its activation would be representative of the dynamics of the MAPK cascade, and it is technically a good choice because of the quality of ERK phosphosite-specific antibodies.) From this time course, we would see that two different timescales seem to exist: an early activation phase followed by a late phase. Thus, characteristic time points can be chosen (figure 2), and a reasonable early time point would be in the range 4–12 min. For argument’s sake, we will choose t1 = 10 min.

We can see from the data the difficulty with defining a characteristic time point, or how choosing a single time point may affect the ability to capture all dynamic features. For example, it is impossible to understand the oscillatory nature of NFκB translocation with a single time point, and ERK activation dynamics can only be partly representative of other phosphoprotein dynamics (even those closely related in function, such as Raf1). For the oscillations of NFκB, one would need to sample at least every 2.5 min (since the period of the oscillation is 5 min), while to obtain an approximate sense of the transient activation of ERK, two well-chosen time points can be enough. In spite of this, steady-state measurement can give a qualitative overview of the system that allows for robust, albeit coarse-grained, conclusions with relatively few data points (and thus cost).
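The sampling requirement for the NFκB oscillations can be illustrated with a short sketch. It assumes an idealized unit sine wave with the 5 min period quoted above (the real signal is not a pure sine), and the helper `sample` is hypothetical. Sampling once per period sees a flat line, whereas a denser grid recovers the oscillation.

```python
import math

def sample(period_min, interval_min, n_points):
    """Sample a unit sine wave of the given period at a fixed interval."""
    return [math.sin(2 * math.pi * k * interval_min / period_min)
            for k in range(n_points)]

PERIOD = 5.0  # min, oscillation period quoted in the text

max_interval = PERIOD / 2         # at least two samples per period
coarse = sample(PERIOD, 5.0, 7)   # one sample per period: oscillation is lost
fine = sample(PERIOD, 1.25, 7)    # four samples per period: oscillation visible

print(max_interval)
print([round(v, 6) for v in coarse])
print([round(v, 6) for v in fine])
```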

2.1 Steady-state optimization and simulation

One way to measure a model’s ability to fit experimental data with a single time point, such as that described above, is to make the assumption that the system reaches at that point a pseudo-steady state: the fast reactions have already occurred, while the slow reactions have not yet significantly affected the network’s behavior (Klamt et al 2006). This approximation implies that the flux through the system (in our case, the phosphorylation cascade in signal transduction) has stabilized and the quantities of phosphorylated proteins are no longer varying to a significant degree. With this assumption, a model of this system can be simulated until it has also reached a steady state.

[Figure 3 plot: one row per experiment; columns Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3k, raf1); x-axis t from 0 to 30 min; y-axis tick at 0.5.]

Figure 3. The fit of the trained model using the Boolean steady-state formalism. The simulated data are shown as two blue circles (t0 and t1) connected by a blue dotted line. The colors represent the goodness of fit between the model and the data at t1 = 10. Heat-map coloration is used to signify the range from high error (red; normalized mean squared error (MSE) = 1) to no error (white; MSE = 0). t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CellNOptR took 180 s.

With the in silico data (figure 2) as our starting point, the PKN (figure 1) was trained using the steady-state model formalism at t1 = 10 min. Details about the node states and transfer functions of this formalism (Boolean steady state) are summarized in section 8.3. Figure 3 shows the steady-state simulation overlaid on the experimental data.
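The idea of simulating a Boolean model until it reaches a steady state can be sketched as below. This is an illustrative toy, not CellNOptR code: the function `simulate_to_steady_state` and the three-node networks are assumptions made for the example. It also shows why a negative feedback loop is a problem for this formalism, since no steady state is ever reached.

```python
def simulate_to_steady_state(rules, state, max_steps=100):
    """Synchronously update a Boolean network until no node changes.

    rules maps each non-input node to a function of the current state.
    Returns the steady state, or None if none is reached in max_steps
    (for example when a negative feedback loop causes oscillation).
    """
    for _ in range(max_steps):
        new_state = dict(state)
        for node, rule in rules.items():
            new_state[node] = int(rule(state))
        if new_state == state:
            return state
        state = new_state
    return None

# Hypothetical three-node cascade: EGF -> Raf1 -> ERK.
cascade = {
    "Raf1": lambda s: s["EGF"],
    "ERK": lambda s: s["Raf1"],
}
steady = simulate_to_steady_state(cascade, {"EGF": 1, "Raf1": 0, "ERK": 0})
print(steady)

# Adding negative feedback (ERK inhibits Raf1) prevents a steady state.
feedback = {
    "Raf1": lambda s: s["EGF"] and not s["ERK"],
    "ERK": lambda s: s["Raf1"],
}
no_steady = simulate_to_steady_state(feedback, {"EGF": 1, "Raf1": 0, "ERK": 0})
print(no_steady)
```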

2.2 Interpretation of steady-state result

The Boolean steady-state formalism used by CellNOptR for optimization recovers most of the underlying ‘true’ network and hence gives a good steady-state approximation of the in silico data (see figures 3 and 8). However, there are some exceptions that highlight the limitations of steady-state measurements. Using this formalism, CellNOptR cannot identify the NFκB oscillations caused by feedback: hyperedges that cause negative feedback are penalized in CellNOptR, as a steady state cannot be reached when they are present. Another limitation is that the state of each element in the model is limited to {0, 1} (either switched on or off). Hence, intermediate levels of activation cannot be simulated (such as p38 activation under TNFα and EGF stimulation). Finally, the effect of the missing pathway from TNFα to AP1 is observed when the experimental measurement cannot be explained with TNFα stimulation in the absence of EGF stimulation.

Thus, the strength of steady-state Boolean logic is strongly dependent on the assumptions underlying the data. If one has enough knowledge of the data and biochemistry such that the assumption of steady state is a fair one to make, training a network to data using steady-state Boolean logic modeling can uncover cell-specific behavior, for example differences between cancer and normal cells (Saez-Rodriguez et al 2011a). Another advantage is the scalability of such an approach: because the method is parameter-free, large networks can be trained under a large number of conditions.

3 Two time points (or additional steady state)

As mentioned in section 2, it is quite common in signaling networks to observe a transient behavior where a species is quickly activated and subsequently deactivated. Such a dynamic obviously cannot be captured with a steady-state approach where only one time point is considered. Therefore, in the above section, this issue was avoided by only modeling ‘fast events’, i.e. the activation phase of the signal propagation. However, when information about more than one time point is available and such a fast activation followed by slow deactivation (or indeed any combination of slower and faster processes) is observed, then it is possible to also capture these processes while keeping the simplifying assumption of steady states. In essence, it is assumed that multiple pseudo-steady states reflect the mechanisms that are acting at different timescales, and they can be optimized independently. We will illustrate this with the CellNOptR implementation for two timescales, but the approach is extendable to more than two time points.

[Figure 4 plot: one row per experiment; columns Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3k, raf1); x-axis t from 0 to 30 min; y-axis tick at 0.5.]

Figure 4. The fit of the model at two time points, t1 and t2, using the two steady-state approach. Again, the colors are representative of the fit, this time at t1 = 10 and t2 = 30. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CellNOptR took 240 s.

Defining suitable time points that adequately represent the process timescales that we want to model is a similar problem to what was discussed above for a unique steady state, with the added complexity of having to choose more than one point that is consistent for all modeled species. This can be guided by prior knowledge, e.g. if it is known that a receptor is activated on a fast timescale (e.g. 30 min for full activation) by phosphorylation and then deactivated by slow internalization and degradation (e.g. 2 h for full silencing of the signal). However, in general, it is better to develop a detailed time course, as stated above. In our case, again using ERK, we would say that a second measurement at 20–30 min would be adequate; 30 min was used for the sake of argument (see figure 4).

3.1 Multiple steady-state optimization and simulation

In CellNOptR, a model of a system with two steady states at different timescales is simulated by assuming that a subset of the hyperedges (interactions) only become active at a later time point, that is, they operate on a different timescale (Klamt et al 2006). That being the case, the two time points can be optimized separately. In practice, this means that the optimization is done in two steps:

(i) The scaffold model (the model after compression of non-essential nodes and expansion of all possible hyperedges; see figure 1 and materials and methods, section 8.1) derived from the PKN is used to train the model against the data at t1, thereby identifying hyperedges that best reproduce the data at this time point.

(ii) Hyperedges that were not selected as active at t1 are used as the search space for training the model at t2. For simulation (and therefore testing the model fit), candidate models are tested by using the steady state of the t1 model as an initial state, then computing the steady state from there, including candidate t2 hyperedges. There is also the additional constraint that whenever hyperedges at t1 and t2 influence a node in contradicting ways, the t2 hyperedge overrules the t1 hyperedge and the state of the target node is locked to the state defined by the t2 hyperedge.

Besides the additional constraint of the overriding hyperedges described above, the node states and transfer functions are calculated in the same way as in the Boolean steady-state formalism (section 8.3).

3.2 Interpretation

In our example, we can see that the two steady-state optimization finds the feedback from ERK back to SOS-1 (figure 8). Hence, from figure 4, the transient activation of Raf1, ERK and AP1 is captured in the trained model. Using a single characteristic time point, a model that includes the negative feedback from ERK to SOS-1 at t1 would not be selected, as the branch never reaches a stable steady state because of oscillation. However, if we say that the branch is active at t1 but that the negative feedback is only active at t2, and that, when active, this negative feedback permanently turns SOS-1 off, then the model does reach a steady state at t1 (where SOS-1, Raf-1 and ERK are all ON) and a different steady state at t2 (where SOS-1, Raf-1 and ERK are all OFF as a result of the activated negative feedback).
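The two steady states of the SOS-1, Raf-1, ERK branch described above can be reproduced with a small sketch. It is a simplified illustration under stated assumptions, not the CellNOptR algorithm: the helper `steady_state`, the `locked` argument and the reduced four-node network are hypothetical, and the overriding t2 hyperedge is modeled by fixing SOS1 to the value dictated by the feedback.

```python
def steady_state(rules, state, locked=(), max_steps=100):
    """Synchronous Boolean updates until the state stops changing.

    Nodes listed in `locked` keep their current value.
    """
    for _ in range(max_steps):
        new_state = dict(state)
        for node, rule in rules.items():
            if node not in locked:
                new_state[node] = int(rule(state))
        if new_state == state:
            return state
        state = new_state
    raise RuntimeError("no steady state reached")

# Hyperedges assumed active at t1: EGF -> SOS1 -> Raf1 -> ERK.
t1_rules = {
    "SOS1": lambda s: s["EGF"],
    "Raf1": lambda s: s["SOS1"],
    "ERK": lambda s: s["Raf1"],
}
start = {"EGF": 1, "SOS1": 0, "Raf1": 0, "ERK": 0}
state_t1 = steady_state(t1_rules, start)

# At t2 the slow feedback ERK -| SOS1 becomes active. It contradicts the
# t1 edge EGF -> SOS1, so it overrules it and SOS1 is locked to its value.
state_t2 = dict(state_t1)
state_t2["SOS1"] = int(not state_t1["ERK"])
state_t2 = steady_state(t1_rules, state_t2, locked={"SOS1"})

print(state_t1)
print(state_t2)
```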

4 Synchronous multiple time-point simulation and multiple timescales

As discussed in section 3, by measuring at two characteristic time points, the trained logic model is capable of finding the slow negative feedback from ERK to SOS-1 and therefore moves a step closer to understanding the ‘true’ network. However, the oscillations of NFκB still cannot be explained with the pseudo-steady-state formalism, as it is necessary to use the full time course data (and not just two time points) to observe this effect. This can be modeled by a discrete time Boolean model that is available as an add-on R package to CellNOptR: CNORdt (discrete time).

4.1 Synchronous and asynchronous updating

CNORdt introduces some variation in how time is handled in the model. Instead of simulating and fitting data at steady states, it is capable of fitting time course data by using an additional model parameter together with a synchronous updating scheme.

Synchronous updating is where all nodes are updated simultaneously during model simulation; hence, each node at time t is a function of its input nodes at t − 1 (see section 8.3). This is the updating scheme used in CellNOptR. An alternative method is asynchronous updating, where nodes are updated in a random or non-synchronous order, depending on the asynchronous method used. This leads to different simulation properties depending on the updating method chosen. Synchronous updates are deterministic, and simulations run under the same conditions (inputs and perturbations) will reach the same steady state (or attractor) each time. In contrast, asynchronous updating introduces stochasticity into the system such that different steady states can be reached from the same starting conditions. The random updating of node values is one possible application of asynchronicity. This enables sampling over all timescales (any reaction can be deemed to be slowest or fastest), thus avoiding the constraint inherent in synchronous simulations of an equal timescale over all reactions. However, this added complexity can make results difficult to interpret (Garg et al 2008). Mixed synchronous/asynchronous updating is an intermediate approach that can stratify reaction groups according to their known reaction rates, thus taking advantage of a priori knowledge and reducing the complexity of a fully asynchronous approach (Faure et al 2006, Albert et al 2008, Assmann and Albert 2009, Garg et al 2008).

CNORdt introduces a scaling parameter that defines the timescale of the Boolean synchronous simulation. Where each ‘tick’ (t) (or simulation step) is the synchronous updating of all nodes in the model according to their inputs at t − 1, the scaling parameter defines the ‘tick’ frequency relative to the timescale of the real data. Although this is a crude approach (i.e. it implies a single rate across all reactions), it allows us to fit a synchronous Boolean simulation to data. Hence, all data points can be fitted to the model, and hyperedges that cause feedback in the model can be included, which allows the model to reveal more complex dynamics such as oscillations. CNORdt still describes the node states as either on or off (1/0), and the transfer functions are calculated as in section 8.3. The scaling parameter is applied to the simulation of the system and hence does not affect the transfer functions themselves.

Figures 5 and 8 show how the NFκB oscillations can be predicted by fitting a dynamic logic model to the full time course while maintaining the two steady-state assumptions from section 3, i.e. simulating ‘fast’ reactions from t = 0 to t = 10 and ‘slow’ reactions from t = 10 to t = 30.
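How a synchronous Boolean simulation with a time-scaling parameter can produce oscillations may be sketched as follows. This is not the CNORdt interface: the function `simulate_ticks`, the two-node feedback loop and the value of `MINUTES_PER_TICK` are assumptions chosen for illustration (here one tick is taken to be 1.25 min, so that the four-tick cycle spans 5 min).

```python
def simulate_ticks(rules, state, n_ticks):
    """Return the list of states visited under synchronous updating."""
    trajectory = [dict(state)]
    for _ in range(n_ticks):
        # All rules read the previous state, so the update is synchronous.
        state = {**state, **{n: int(r(state)) for n, r in rules.items()}}
        trajectory.append(state)
    return trajectory

# Hypothetical negative feedback loop: TNFa activates NFkB unless the
# inhibitor IkB is present, and NFkB in turn induces IkB.
rules = {
    "NFkB": lambda s: s["TNFa"] and not s["IkB"],
    "IkB": lambda s: s["NFkB"],
}
trajectory = simulate_ticks(rules, {"TNFa": 1, "NFkB": 0, "IkB": 0}, 8)

MINUTES_PER_TICK = 1.25  # assumed scaling: one tick = 1.25 min of real time
for tick, s in enumerate(trajectory):
    print(f"t = {tick * MINUTES_PER_TICK:5.2f} min  NFkB = {s['NFkB']}")
```

Fitting the scaling parameter then corresponds to choosing the tick length that best aligns the simulated trajectory with the measured time points.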

5 Constrained fuzzy logic

[Figure 5 plot: one row per experiment; columns Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3k, raf1); x-axis t from 0 to 30 min; y-axis tick at 0.5.]

Figure 5. The fit of the model at multiple time points using fast (from t = 0 to t = 10) and slow (from t = 10 to t = 30) timescales. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORdt took 300 s.

One of the main limitations of Boolean logic models is that the assumption of a single level of activation (species can only be on/off) is biochemically unrealistic. Fuzzy logic is another logic modeling formalism that allows for intermediate levels of activation. It was originally developed in the field of control theory for predicting the outputs of complex processes where inputs could only partially be characterized (Morris et al 2011a). Its strength lies in the flexibility it affords when defining relationships between input and output nodes. This flexibility can also be a weakness if a large number of parameters are required to define these functional relationships. Constrained fuzzy logic (cFL) deals with this potential complexity by limiting the repertoire of relationships between nodes. The cFL formalism used in CellNOpt (CNORfuzzy) is fully described in Morris et al (2011a). Briefly, the relationships (or transfer functions) between nodes in cFL are limited to Hill functions. Hence, each transfer function has two free parameters: the Hill coefficient n, which controls the steepness of the function, and the sensitivity parameter k, which determines the midpoint of the function (i.e. the value of the input that produces half the maximal output). By varying these two parameters, linear, sigmoidal and step-like dynamics can be produced that are good approximations to protein–protein interactions and enzymatic reactions. In CNORfuzzy, further constraints are imposed by initially limiting the possible parameter combinations to a subset of discrete values. Details of the transfer functions used can be found in materials and methods, section 8.4.
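The role of the two Hill parameters described in this section can be made concrete with a short sketch. The exact parameterization used by CNORfuzzy is given in Morris et al (2011a) and section 8.4; the forms below are a common textbook version assumed for illustration, with hypothetical function names. In the plain Hill function the output is exactly 0.5 at an input of k; the normalized variant is rescaled so that an input of 1 gives an output of 1, which shifts the half-maximal input slightly.

```python
def hill(x, k, n):
    """Hill function: output rises from 0 towards 1, reaching 0.5 at x = k."""
    return x**n / (k**n + x**n)

def normalized_hill(x, k, n):
    """Hill function rescaled so that an input of 1 gives an output of 1."""
    return hill(x, k, n) / hill(1.0, k, n)

# Small n gives a shallow, saturating curve; large n approaches a step at k.
for n in (1, 3, 20):
    row = [round(normalized_hill(x / 4, 0.5, n), 3) for x in range(5)]
    print(f"n = {n:2d}: {row}")
```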

5.1 Model training and simulation

Model training and simulation in CNORfuzzy is carried out in a similar manner to the Boolean steady-state formalism. After compression and expansion of the logic hypergraph, a genetic algorithm determines transfer functions and a network topology that minimize the MSE between the model and the data at steady state. This is followed by a number of refinement steps that fine-tune the Hill function parameters and reduce the complexity of the network topology. The in silico data and model fit at t1 = 10 are shown in figure 6.

5.2 Interpretation

CNORfuzzy is capable of fitting intermediate values (figure 6). For most cases, the cFL model generates similar fits to the steady-state Boolean model. However, the fit to data is more accurate, since the values are continuous and not limited to 0 or 1. More importantly, the cFL model obtains a better fit for p38, as it uncovers a link in the structure that Boolean models are unable to capture. In the ‘true’ network, TNFα and EGF are both required to activate p38 (albeit the activation is low relative to the other signals). In the previous Boolean formalisms, this low activation of p38 cannot be modeled, as the simulation can only take the values {0, 1}. However, CNORfuzzy is capable of adding the hyperedge ‘Map3K1 AND Map3K7 → MKK4’ (figure 8) to explain this activation and hence move a step closer to finding the underlying true network.

The CNORfuzzy model fit also illustrates some caveats associated with fuzzy logic. We can see that CNORfuzzy also retains the Map3K7 → p38 hyperedge (figure 8), thus activating p38 with TNFα stimulation alone (i.e. in the absence of EGF stimulation). This occurs as CNORfuzzy attempts to fit the noisy signal of inactive p38, thus adding a hyperedge that is not present in the ‘true’ network. CNORfuzzy also adds hyperedges from TNFα to AP1 that convey a weak activating signal to compensate for the missing hyperedges (TRAF2 → ASK-1, ASK-1 → Map3K7) from the PKN (figure 1). These examples illustrate the sensitivity of the cFL approach to the data quality, and this can make interpretation of the results more subtle and difficult (Morris et al 2011a).

6 Logic ODEs

The Boolean logic formalisms described above can qualitatively fit the network topology and logic gates that best describe the underlying data. cFL can add quantitative information by its ability to fit intermediate values between 0 and 1 at steady state. In terms of time, however, all these formalisms rely on discrete simulations. To obtain a fully continuous model, both in state and time, CNORode adds to these methods by transforming a discrete logic model to a continuous model. It does this by defining a set of ODEs for each model species. There are several formalisms to convert discrete logic to continuous models (e.g. SQUAD (Di Cara et al 2007)) or hybrid models (e.g. piecewise linear models (de Jong 2002)). CellNOpt includes the method developed by Wittmann et al (2009) that was implemented in Matlab as Odefy (Krumsiek et al 2010).

[Figure 6 plot: one row per experiment; columns Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3k, raf1); x-axis t from 0 to 30 min; y-axis tick at 0.5.]

Figure 6. The fit of the trained model at t1 = 10 using constrained fuzzy logic. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORfuzzy took 1200 s.

6.1 Converting from Boolean to continuous

The approach used to convert Boolean to continuous models is fully explained in Wittmann et al (2009). Briefly, the goal is to simulate the full dynamics of each species in the logic model while retaining consistency with the Boolean representation. What this means is that where the output of a logic gate is 0 or 1, the ODEs replacing a Boolean state should also return to 0 or 1. This is achieved in a similar manner to cFL (but with an additional parameter τ) by applying a normalized Hill function on the interval between 0 and 1. Applying these functions to each hyperedge defines a new continuous ODE model to replace the underlying Boolean model. This is more fully explained in section 8.5.
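As a rough illustration of this conversion, the sketch below integrates a single activating edge as a logic ODE. It assumes the commonly used form dx/dt = (f(u) − x)/τ, where f is a normalized Hill function of the input u; the precise equations are those of Wittmann et al (2009) and section 8.5. The function names, the forward-Euler scheme and all parameter values are assumptions made for the example.

```python
def normalized_hill(x, k, n):
    """Hill function rescaled so that f(0) = 0 and f(1) = 1."""
    return (x**n / (k**n + x**n)) * (1.0 + k**n)

def simulate_edge(u, k, n, tau, t_end, dt=0.01):
    """Forward-Euler integration of dx/dt = (f(u) - x) / tau from x(0) = 0."""
    x, t, trace = 0.0, 0.0, [(0.0, 0.0)]
    while t < t_end - 1e-12:
        x += dt * (normalized_hill(u, k, n) - x) / tau
        t += dt
        trace.append((t, x))
    return trace

# Assumed parameter values, chosen only for illustration.
trace_on = simulate_edge(u=1.0, k=0.5, n=3, tau=2.0, t_end=30.0)
trace_off = simulate_edge(u=0.0, k=0.5, n=3, tau=2.0, t_end=30.0)
print(round(trace_on[-1][1], 4), round(trace_off[-1][1], 4))
```

With a Boolean input of 1 or 0 the continuous state relaxes towards 1 or 0 respectively, which is the consistency property described above; τ only sets how quickly it does so.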

6.2 Parameter estimation

CNORode currently provides links to two stochastic non-local optimization algorithms: a genetic algorithm (genalg package, http://cran.r-project.org/web/packages/genalg) and an implementation in R of scatter search (Egea and Martí 2010). These are used to fit the Hill function parameters k and n and the ODE parameter τ to each logic gate in a model that has been already topologically optimized by one or more of the other formalisms.

6.3 Compressing an ODE model

Compression of the model before training may lead to the loss of elements important to capture dynamic features and must thus be done with caution. Returning to our example (figure 2), the in silico data were generated through a set of normalized Hill functions. Hence, with the exception of AP1 (where the missing hyperedge prohibits any exact simulation of this signal), CNORode should be capable of simulating exactly the other signals in the system after parameter optimization of the associated logic ODEs. However, this may not be possible when the model is compressed. To give an example, in our toy model (figure 1), the pathway consisting of SOS-1, Ras, Raf-1, MEK 1 and ERK is compressed to SOS-1 → Raf-1 → ERK. The in silico data were generated with ODEs describing the uncompressed interactions. We can see from figure 7 that the compressed model can accurately simulate the in silico data for this pathway (Raf-1 and ERK signals). In this case, the normalized Hill functions have enough dynamic plasticity to summarize four interactions (SOS-1 → Ras → Raf-1 → MEK 1 → ERK) as two (SOS-1 → Raf-1 → ERK).


[Figure 7 plot: one row per experiment; columns Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3k, raf1); x-axis t from 0 to 30 min; y-axis tick at 0.5.]

Figure 7. The fit of the trained model using CNORode. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The parameter training in CNORode took 2000 s.

However, this is not the case where we have feedback from ERK through a phosphatase (ph) back to SOS-1, and from NFκB through expression (ex) back to IκB. In these cases, it is necessary to not compress ‘ph’ and ‘ex’ to allow CNORode to model the correct dynamics (transience and oscillations, respectively). The non-compression is required as ‘ph’ and ‘ex’ are integral to the dynamics observed in the in silico data. So figures 7 and 8 show, with the exception of AP1, that CNORode can accurately model the in silico data of the toy model once compression of those key nodes is suppressed.

7 Summary and future developments

In this contribution, we have reviewed different logic-based approaches to model signal transduction networks. Recent developments in proteomics techniques, from antibody-based methods (xMAP, protein arrays, high-throughput microscopy, etc) to mass spectrometry methods (Terfve and Saez-Rodriguez 2012), allow us to generate a large amount of phosphoproteomic data. Given the size of the underlying networks, we believe that logic-based models, which do not need extensive biochemical detail and thus lead to tractable models even when dealing with multiple pathways, are a useful approach to analyzing signal transduction on a large scale. Therefore, we have focused our work on how to train logic models to experimental data and implemented various methodologies toward this end in our tool CellNOptR.

Our recent developments presented here expand our previous work by including strategies to deal with the inherent dynamic nature of signaling processes (and hence with time series data). We have discussed how modeling dynamic aspects requires more detailed formalisms (and thus, in general, more data and computational time), and how the general methodology has to be re-evaluated at multiple levels, in particular the compression of the network prior to the optimization; hence, we are currently working to develop a general compression routine for dynamic models. Another area of active development is the implementation of efficient optimization strategies to identify both structure and (if existing) continuous parameters (Banga 2008). Although we have covered here a broad palette of logic-based formalisms, we plan to explore other approaches. Some are combinations of what we have discussed (e.g. a cFL formalism simulated over multiple timescales), others are formalisms related to those used here (e.g. SQUAD), and others could add new features such as a probabilistic framework (Shmulevich et al 2003), stochasticity (Albert et al 2008) or formal methodologies (Fisher and Henzinger 2007).

For the sake of simplicity, we have used a toy model that is itself based on a logic formalism to exemplify the potential dynamic behavior and thereby different modeling variants. We are currently working on more realistic benchmarks based on biochemical models, and studying in more detail the role of experimental noise and experimental design in recovering the underlying model structure.


Figure 8. The contribution of each logic modeling formalism to the understanding of the model used to simulate the in silico training data. The time taken for training the model using each formalism is also shown.

As illustrated in our example with the link TRAF2 → ASK-1 → MKK7, databases are comprehensive but not complete, and it is therefore likely that important links are missing from the system of interest (Kirouac et al 2012). To overcome this limitation, we are working on strategies to integrate as many network resources as possible. These include methods that propose novel links that expand the prior knowledge network (Saez-Rodriguez et al 2009, Eduati et al 2010) and the use of information from PINs (Vinayagam et al 2011).

The focus of CellNOptR is the calibration of logic models to data, but a large set of other tools exist that analyze logic models from different angles (Morris et al 2010). For example, the Q2LM toolbox (Morris et al 2011b) uses cFL to understand the effect of perturbations in the context of the whole system under investigation (e.g. under what set of stimuli is a therapeutic perturbation most effective?). CellNetAnalyzer (Klamt et al 2007) has a battery of methods from graph theory as well as specific techniques for logic models. These include minimal intervention sets (the minimum number of perturbations for a desired phenotype) to propose possible therapeutic targets. These tools use the same model format as CellNOptR, so it is easy to pass models for analysis. More generally, we are part of the CoLoMoTo initiative, which aims to facilitate interoperability among these tools; the main goal here is the development of SBML-qual as a language to exchange logic models (sbml.org/Community/Wiki/SBML_Level_3_Proposals/Qualitative_Models), as well as the implementation of the SBGN format for network representation (Novere et al 2009).

In general, efficient integration of data and prior knowledge to model signal transduction requires the use of appropriate standards for data, prior knowledge about the networks and the models themselves (Saez-Rodriguez et al 2011a). We consider that logic models will be an area of development in the future, with increasing application to signal transduction research.


Figure 9. An overview of the graphical representation of logic models. (A) The SOP expression for the activation of C summarized as an XOR gate. (B) SOP expressions describing the activation of C and D. (C) An example of a hypergraph representation where the nodes are connected by hyperedges.

8 Materials and methods

8.1 CellNOptR

As mentioned in section 1.4, CellNOptR includes some additional steps in pre-processing logic models before simulation and training to data. The details of these steps can be found in Saez-Rodriguez et al (2009). Briefly, the model is compressed by removing non-identifiable elements. These include nodes on terminal branches that are not part of the experimental design (non-observables; figure 1, p90RSK and CREB), nodes that are not affected by the inputs or perturbations (non-controllables) and additional nodes that can be removed without affecting logic outcome during simulation (figure 1; Ras, MEK 1, etc).

After this compression step, a superstructure of all possible hyperedges is created (figure 1, inset). This superstructure contains ‘the space’ of hyperedges that is optimized (through the removal of redundant hyperedges) by training to the experimental data. The training uses a genetic algorithm to search for logic models that minimize a bipartite (two-term) function. This function includes the MSE between the simulation of the optimized logic model and the data, and a penalty term for model size. Depending on the formalism used (see the main text), the simulation and data may be at steady state (CellNOptR, CNORfuzzy) or all data points can be used (CNORdt). The resulting logic model is then a subset of the superstructure and contains only the hyperedges that best explain the experimental data (with the additional attribute of parsimony, given the size penalty in the optimization function).
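The two-term objective described above can be sketched as a function of the fit error and the model size. The exact weighting and normalization used by CellNOptR are not given here, so the linear penalty, the name `score` and the default `size_penalty` value are assumptions for illustration only.

```python
def score(simulated, measured, n_hyperedges, size_penalty=0.0001):
    """MSE between simulation and data plus a penalty on model size."""
    mse = sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(measured)
    return mse + size_penalty * n_hyperedges

# Two candidate models with identical fits: the smaller one scores better.
data = [0.9, 0.1, 0.8, 0.0]
small_model = score([1, 0, 1, 0], data, n_hyperedges=3)
large_model = score([1, 0, 1, 0], data, n_hyperedges=6)
print(small_model < large_model)
```

A genetic algorithm would evaluate such a score for each candidate subset of hyperedges and favor the subsets with the lowest value.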

8.2 Network and data generation

The toy model was constructed manually and is based on the model from Saez-Rodriguez et al (2011a). The in silico data were generated from the toy model using CNORode. The parameters were manually adjusted to model as closely as possible the known dynamics of ERK and NFκB activation. After simulation, noise was added to each data point according to $N(\mu, \sigma^2)$, where $\mu = 0$ and $\sigma^2 = 0.05$. The data were then rescaled to the interval [0, 1]. Two methods of cross validation were also performed to demonstrate the robustness of CellNOptR (steady-state Boolean) to sparseness in the data (figure S8 (available from stacks.iop.org/PhysBio/9/045003/mmedia)).
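The noise and rescaling step can be sketched as below. The function name `add_noise_and_rescale`, the fixed seed and the use of min-max rescaling are assumptions for illustration; the text states only that Gaussian noise with variance 0.05 was added and that the data were rescaled to [0, 1].

```python
import random


def add_noise_and_rescale(values, variance=0.05, seed=0):
    """Add zero-mean Gaussian noise, then min-max rescale to [0, 1]."""
    rng = random.Random(seed)      # seeded so the example is reproducible
    sigma = variance ** 0.5        # gauss() expects a standard deviation
    noisy = [v + rng.gauss(0.0, sigma) for v in values]
    lo, hi = min(noisy), max(noisy)
    if hi == lo:                   # degenerate case: all values identical
        return [0.0 for _ in noisy]
    return [(v - lo) / (hi - lo) for v in noisy]


clean = [0.0, 0.2, 0.8, 1.0, 0.6, 0.1]   # hypothetical simulated time course
noisy_scaled = add_noise_and_rescale(clean)
```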

Model and data files, together with the corresponding R scripts, can currently be found at http://www.ebi.ac.uk/~aidanmac/public/logicModelingTutorial (password: tutorial).

8.3 Boolean logic

A Boolean model can be represented as follows:

(1) $N$ species $X_1, X_2, \ldots, X_N$, each represented by a variable $x_i$ taking values 0 or 1.

(2) For each species $X_i$, there is a subset of species $R_i = \{X_{i1}, X_{i2}, \ldots, X_{iN_i}\} \subset \{X_1, X_2, \ldots, X_N\}$ that influence $x_i$.

(3) And for each species $X_i$, an update function $B_i : \{0, 1\}^{N_i} \rightarrow \{0, 1\}$.

From this set of rules, the state of each species at time $t + 1$ is a function of the state of its influencing species at time $t$ (Kauffman 1969).
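The three ingredients above (species, influencing subsets and update functions) map directly onto a small data structure. The following sketch assumes a synchronous scheme in which all species are updated at once; the names `synchronous_step`, `influencers` and `update_functions` are hypothetical.

```python
def synchronous_step(state, influencers, update_functions):
    """Return the state at t + 1 computed from the state at t."""
    new_state = {}
    for species, func in update_functions.items():
        # Read the inputs from the state at time t only.
        inputs = [state[name] for name in influencers[species]]
        new_state[species] = func(*inputs)
    # Species without an update function (e.g. stimuli) keep their value.
    for species, value in state.items():
        new_state.setdefault(species, value)
    return new_state


influencers = {"D": ["A", "B"]}                 # R_D = {A, B}
update_functions = {"D": lambda a, b: a & b}    # B_D(a, b) = a AND b
state_t = {"A": 1, "B": 1, "D": 0}
state_t1 = synchronous_step(state_t, influencers, update_functions)
```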

So how does the function $B_i$ (also called a transfer function) for each species $X_i$ deal with inputs from other nodes? $B_i$ can be represented in a sum-of-product (SOP) formulation (Mendelson 1970), which allows for multiple possible inputs (AND, NOT, OR gates) to be processed into a single output. To illustrate this, consider the following simple example (figure 9).

We know that the element D is activated by a combination of A and B (i.e. both A and B are needed for activation). Hence, both the graphical and written representations of this activation are relatively straightforward:

$$B_1(a, b) = a \wedge b$$

However, in the case of the activation of C, this occurs when A is active without B, or when B is active without A. In this case, one needs some additional rules of representation.

The SOP representation allows the above activation to be written using only AND, NOT and OR operators:

$$B_1(a, b) = (a \wedge \neg b) \vee (\neg a \wedge b)$$

Figure 10. The construction of gates with cFL: (a) activating, (b) inhibitory, (c) an AND gate and (d) an OR gate.

This is done by calculating the product within brackets and summing between brackets. Essentially, SOP representations are rules of precedence for complex multi-node inputs. In terms of graphically representing the activation of C, its activation cannot be easily represented using standard SBGN AND, NOT or OR operators (figure 9). Hence, this SOP expression can be summarized as an XOR gate.

A logic network in which relations are encoded by SOP expressions can be represented as a hypergraph (Klamt et al 2006). A hypergraph is defined as a set of nodes connected by hyperedges, where a hyperedge is a generalization of an edge that can connect more than two nodes. This in turn can facilitate a more precise representation of biological knowledge (for example, where two proteins are necessary for the activation of a target).
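As a quick check that the SOP expression for C behaves as an XOR gate, its truth table can be enumerated. The function name `sop_xor` is chosen for this sketch only.

```python
from itertools import product


def sop_xor(a, b):
    # Products (AND) are taken inside the brackets, the sum (OR) between them.
    return (a and not b) or (not a and b)


# Map every input pair (a, b) to the resulting output as 0 or 1.
truth_table = {(a, b): int(sop_xor(a, b)) for a, b in product((0, 1), repeat=2)}
```

The output is 1 only when exactly one of the two inputs is active.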

8.4 Fuzzy logic

cFL defines the transfer function between nodes as a Hill function. Depending on the type of interaction (or logic gate; figure 10), this function can take different forms (Morris et al 2011a):

(a) If node C depends only on A, a normalized Hill function is used to calculate $c$, where $k$ and $n$ are the sensitivity coefficient and Hill coefficient, respectively:

$$c = \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

(b) An inhibitory relationship is represented as the above expression subtracted from 1:

$$c = 1 - \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

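(c) For an AND gate, the minimum of the values computed from the two inputs is used, where the subscripts 1 and 2 denote the parameters of the transfer functions for inputs $a$ and $b$, respectively:

$$c = \min\left(\frac{(k_1^{n_1} + 1)\,a^{n_1}}{k_1^{n_1} + a^{n_1}},\ \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$

(d) And for an OR gate, the maximum value is used:

$$c = \max\left(\frac{(k_1^{n_1} + 1)\,a^{n_1}}{k_1^{n_1} + a^{n_1}},\ \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$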
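The four gate types above can be written as short functions. This is a sketch of the formulas as listed here, not the CNORfuzzy code; the function names and the parameter values in the example are assumptions, and each input is given its own $k$ and $n$.

```python
def hill(x, k, n):
    """Normalized Hill function: returns 0 at x = 0 and 1 at x = 1 (k > 0, n > 0)."""
    return (k ** n + 1) * x ** n / (k ** n + x ** n)


def activate(a, k, n):
    return hill(a, k, n)


def inhibit(a, k, n):
    return 1 - hill(a, k, n)


def and_gate(a, b, k1, n1, k2, n2):
    # AND: the weaker of the two transformed inputs limits the output.
    return min(hill(a, k1, n1), hill(b, k2, n2))


def or_gate(a, b, k1, n1, k2, n2):
    # OR: the stronger of the two transformed inputs sets the output.
    return max(hill(a, k1, n1), hill(b, k2, n2))


params = dict(k1=0.5, n1=3, k2=0.5, n2=3)   # hypothetical parameter values
c_and = and_gate(1.0, 0.0, **params)
c_or = or_gate(1.0, 0.0, **params)
```

At the Boolean corner points (inputs equal to 0 or 1) the gates reproduce the Boolean truth tables, while intermediate inputs give graded outputs.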

8.5 Logic ODEs

As in the case of cFL, CNORode uses phenomenological transfer functions (i.e. non-mechanistic normalized Hill functions) to describe the dynamics of a node’s state as a function of its inputs. Using the examples in figure 10 again, these functions can be described as follows:

(a) An activating relationship takes the form

$$\dot{c} = \frac{1}{\tau}\,(B(a) - c)$$

where $\dot{c}$ is the development of $c$ over time and $B(a)$ is the normalized Hill function of the continuous variable $a$. This takes the form

$$B(a) = \frac{a^n / (k^n + a^n)}{1^n / (k^n + 1^n)}$$

($k$ and $n$ are again the sensitivity and Hill coefficients, respectively). $\tau$ sets the timescale on which $c$ approaches $B(a)$ (biologically this could encompass degradation or other limiting factors), and the $-c$ term acts as an additional degradation term proportional to $c$.

(b) For an inhibitory relationship, the Hill function is subtracted from 1:

$$\dot{c} = \frac{1}{\tau}\,(1 - B(a) - c)$$

(c) The AND gate takes the form

$$\dot{c} = \frac{1}{\tau}\,(B(a)B(b) - c)$$

(d) The OR gate notation is as follows:

$$\dot{c} = \frac{1}{\tau}\,(B(a)B(b) + B(a)[1 - B(b)] + B(b)[1 - B(a)] - c)$$

In the case of an AND gate, the product of $B(a)$ and $B(b)$ is taken, which maintains consistency in the output with the equivalent Boolean model (i.e. if $a = 1$ and $b = 0$, in both ODE and logic formalisms, $c = 0$; similarly, with an OR gate, if $a = 1$ and $b = 0$, in both ODE and logic formalisms, $c = 1$). As in the case of cFL, normalized Hill functions can approximate commonly observed biochemical dynamics such as linear, sigmoidal and step-like behavior.
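The consistency with the Boolean case can be checked numerically by integrating the AND and OR equations to steady state. The sketch below uses a simple forward Euler scheme rather than the solver used by CNORode, and the parameter values ($k = 0.5$, $n = 3$, $\tau = 1$), step size and function names are assumptions.

```python
def hill(x, k=0.5, n=3):
    """B(x): Hill function divided by its value at x = 1, so B(0) = 0 and B(1) = 1."""
    return (x ** n / (k ** n + x ** n)) * (k ** n + 1)


def and_rhs(a, b, c, tau):
    return (hill(a) * hill(b) - c) / tau


def or_rhs(a, b, c, tau):
    ba, bb = hill(a), hill(b)
    return (ba * bb + ba * (1 - bb) + bb * (1 - ba) - c) / tau


def simulate(rhs, a, b, tau=1.0, c0=0.0, dt=0.01, steps=2000):
    """Forward Euler integration of dc/dt = rhs with constant inputs a and b."""
    c = c0
    for _ in range(steps):
        c += dt * rhs(a, b, c, tau)
    return c


c_and = simulate(and_rhs, a=1.0, b=0.0)   # stays at 0, as in the Boolean AND
c_or = simulate(or_rhs, a=1.0, b=0.0)     # approaches 1, as in the Boolean OR
```

Changing `tau` alters how quickly the steady state is reached but not its value.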

Acknowledgments

The authors thank J Banga, J Egea, Inna Pertsovskaya and Melody Morris for valuable help and discussion. Funding was provided by the EU-7FP-BioPreDyn and EMBL-EIPOD programs.

References

Aislyn D W and Boran R I 2010 Systems approaches to polypharmacology and drug discovery Curr Opin Drug Discov Dev 13 297–309

Albert I et al 2008 Boolean network simulations for life scientists Source Code Biol Med 3 16

Aldridge B B et al 2006 Physicochemical modelling of cell signalling pathways Nature Cell Biol 8 1195–203

Assmann S M and Albert R 2009 Discrete dynamic modeling with asynchronous update, or how to model complex systems in the absence of quantitative information Methods Mol Biol 553 207–25

Bader G D, Cary M P and Sander C 2006 Pathguide: a pathway resource list Nucleic Acids Res 34 D504–6 (Database issue)

Banga J 2008 Optimization in computational systems biology BMC Syst Biol 2 47

Barabasi A-L and Oltvai Z N 2004 Network biology: understanding the cell’s functional organization Nature Rev Genet 5 101–13

Bauer-Mehren A, Furlong L I and Sanz F 2009 Pathway databases and tools for their exploitation: benefits, current limitations and challenges Mol Syst Biol 5 290

Bossi A and Lehner B 2009 Tissue specificity and the human protein interaction network Mol Syst Biol 5 260

Calzone L et al 2008 A comprehensive modular map of molecular interactions in RB/E2F pathway Mol Syst Biol 4 173

Calzone L et al 2010 Mathematical modelling of cell-fate decision in response to death receptor engagement PLoS Comput Biol 6 e1000702

Chen W W, Niepel M and Sorger P K 2010 Classic and contemporary approaches to modeling biochemical reactions Genes Dev 24 1861–75


de Jong H 2002 Modeling and simulation of genetic regulatory systems: a literature review J Comput Biol 9 67–103

Di Cara A et al 2007 Dynamic simulation of regulatory networks using SQUAD BMC Bioinformatics 8 462

Eduati F et al 2010 A Boolean approach to linear prediction for signaling network modeling PLoS One 5 e12789

Egea J and Martí R 2010 An evolutionary method for complex-process optimization Comput Oper Res 37 315–24

Faure A et al 2006 Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle Bioinformatics 22 e124–31

Fisher J and Henzinger T A 2007 Executable cell biology Nature Biotechnol 25 1239–49

Garg A et al 2008 Synchronous versus asynchronous modeling of gene regulatory networks Bioinformatics 24 1917–25

Gonzalez A, Chaouiya C and Thieffry D 2008 Logical modelling of the role of the Hh pathway in the patterning of the Drosophila wing disc Bioinformatics 24 i234–40

Helikar T et al 2008 Emergent decision-making in biological signal transduction networks Proc Natl Acad Sci USA 105 1913–8

Hlavacek W S et al 2006 Rules for modeling signal-transduction systems Sci STKE 2006 re6

Hoffmann A et al 2002 The IkappaB–NF–kappaB signaling module: temporal control and selective gene activation Science 298 1241–5

Hucka M et al 2003 The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models Bioinformatics 19 524–31

Ilsley G R, Luscombe N M and Apweiler R 2009 Know your limits: assumptions, constraints and interpretation in systems biology Biochim Biophys Acta 1794 1280–7

Jørgensen C and Linding R 2010 Simplistic pathways or complex networks? Curr Opin Genet Dev 20 15–22

Junker B H, Klukas C and Schreiber F 2006 VANTED: a system for advanced data analysis and visualization in the context of biological networks BMC Bioinformatics 7 109

Kauffman S A 1969 Metabolic stability and epigenesis in randomly constructed genetic nets J Theor Biol 22 437–67

Khoury G A, Baliban R C and Floudas C A 2011 Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database Sci Rep 1 90

Kirouac D C et al 2012 Creating and analyzing pathway and protein interaction compendia for modelling signal transduction networks BMC Syst Biol 6 29

Klamt S, Saez-Rodriguez J and Gilles E D 2007 Structural and functional analysis of cellular networks with CellNetAnalyzer BMC Syst Biol 1 2

Klamt S et al 2006 A methodology for the structural and functional analysis of signaling and regulatory networks BMC Bioinformatics 7 56

Kreeger P K and Lauffenburger D A 2010 Cancer systems biology: a network modeling perspective Carcinogenesis 31 2–8

Krumsiek J et al 2010 Odefy - from discrete to continuous models BMC Bioinformatics 11 233

Liberali P, Ramo P and Pelkmans L 2008 Protein kinases: starting a molecular systems view of endocytosis Annu Rev Cell Dev Biol 24 501–23

Ma’ayan A et al 2005 Formation of regulatory patterns during signal propagation in a mammalian cellular network Science 309 1078–83

Mendelson E 1970 Boolean Algebra and Switching Circuits (Schaum’s Outline Series) (New York: McGraw-Hill)

Mendoza L and Xenarios I 2006 A method for the generation of standardized qualitative dynamical systems of regulatory networks Theor Biol Med Modelling 3 13

Mitsos A et al 2009 Identifying drug effects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data PLoS Comput Biol 5 e1000591

Morris M K et al 2010 Logic-based models for the analysis of cell signaling networks Biochemistry 49 3216–24

Morris M K et al 2011a Training signaling pathway maps to biochemical data with constrained fuzzy logic: quantitative analysis of liver cell responses to inflammatory stimuli PLoS Comput Biol 7 e1001099

Morris M K et al 2011b Querying quantitative logic models (Q2LM) to study intracellular signaling networks and cell-cytokine interactions Biotechnol J 7 374–86

Novere N L et al 2009 The systems biology graphical notation Nature Biotechnol 27 735–41

Oda K et al 2005 A comprehensive pathway map of epidermal growth factor receptor signaling Mol Syst Biol 1 2005.0010

Patlak M 2010 Competitors try collaboration to speed drug development J Natl Cancer Inst 102 841–3

Pieroni E et al 2008 Protein networking: insights into global functional organization of proteomes Proteomics 8 799–816

Rangamani P and Iyengar R 2008 Modelling cellular signalling systems Essays Biochem 45 83–94

Saez-Rodriguez J et al 2009 Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction Mol Syst Biol 5 331

Saez-Rodriguez J, Alexopoulos L G and Stolovitzky G 2011a Setting the standards for signal transduction research Sci Signal 4 pe10

Saez-Rodriguez J et al 2011b Comparing signaling networks between normal and transformed hepatocytes using discrete logical models Cancer Res 71 5400–11

Sahin O et al 2009 Modeling ERBB receptor-regulated G1/S transition to find novel targets for de novo trastuzumab resistance BMC Syst Biol 3 1

Sasagawa S et al 2005 Prediction and validation of the distinct dynamics of transient and sustained ERK activation Nature Cell Biol 7 365–73

Schlatter R et al 2009 ON/OFF and beyond - a boolean model of apoptosis PLoS Comput Biol 5 e1000595

Shmulevich I et al 2003 Steady-state analysis of genetic regulatory networks modelled by probabilistic boolean networks Comp Funct Genomics 4 601–8

Terfve C and Saez-Rodriguez J 2012 Modeling signaling networks using high-throughput phospho-proteomics Adv Exp Med Biol 736 19–57

Thakar J and Albert R 2010 Boolean models of within-host immune interactions Curr Opin Microbiol 13 377–81

Vinayagam A et al 2011 A directed protein interaction network for investigating intracellular signal transduction Sci Signal 4 rs8

Watterson S, Marshall S and Ghazal P 2008 Logic models of pathway biology Drug Discov Today 13 447–56

Wittmann D M et al 2009 Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling BMC Syst Biol 3 98


  • 1 Introduction
    • 1.1 From parts to interactions
    • 1.2 From interactions to mechanism
    • 1.3 Logic modeling
    • 1.4 Software
    • 1.5 The example model
  • 2 Boolean steady state
    • 2.1 Steady-state optimization and simulation
    • 2.2 Interpretation of steady-state result
  • 3 Two time points (or additional steady state)
    • 3.1 Multiple steady-state optimization and simulation
    • 3.2 Interpretation
  • 4 Synchronous multiple time-point simulation and multiple timescales
    • 4.1 Synchronous and asynchronous updating
  • 5 Constrained fuzzy logic
    • 5.1 Model training and simulation
    • 5.2 Interpretation
  • 6 Logic ODEs
    • 6.1 Converting from Boolean to continuous
    • 6.2 Parameter estimation
    • 6.3 Compressing an ODE model
  • 7 Summary and future developments
  • 8 Materials and methods
    • 8.1 CellNOptR
    • 8.2 Network and data generation
    • 8.3 Boolean logic
    • 8.4 Fuzzy logic
    • 8.5 Logic ODEs
  • Acknowledgments
  • References


et al 2008). In the absence of this quality of data, it is necessary to use more qualitative and less time-resolved information to deduce mechanism. The focus of measurement in such signal transduction networks is the protein or, more specifically, the protein together with post-translational modifications (PTMs), as it is PTMs such as phosphorylation that convey information through a network. Hence, measurement comes from the field of proteomics. However, assumptions are necessary when considering the variety of PTMs that may occur. There are more than 500 different types of PTMs, and measuring the status of each site for all proteins is technically impossible (Khoury et al 2011). Indeed, this problem can be encountered with phosphorylation alone. Considering that the epidermal growth factor receptor (EGFR) has 31 phosphorylation sites, this implies that $2^{31}$ states of EGFR (each site can be phosphorylated or not) would need to be measured to provide full knowledge of the activation of this receptor and how it would change over time. Therefore, the study of signal transduction networks tends to concentrate on a subset of phosphorylation sites where the site interaction partner(s) is known and measurement is technologically feasible (e.g. high-quality antibodies are available). These phosphorylation events are often used as markers of activation and deactivation. The consequence of such an approach is an experimental bias toward such phosphosites, a problem that is only now being addressed through less-biased high-throughput techniques such as mass spectrometry.
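To give a sense of the scale involved, the number of phosphorylation states for a protein with 31 independent binary sites can be computed directly. The variable names below are chosen for this illustration only.

```python
n_sites = 31                 # phosphorylation sites on EGFR, as stated above
n_states = 2 ** n_sites      # each site is either phosphorylated or not

# Restricting attention to a handful of well-characterized sites shrinks the space.
n_states_subset = 2 ** 3     # hypothetical subset of three sites
```

Here `n_states` is 2147483648, that is, more than two billion combinations, compared with eight for the three-site subset.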

Phosphoproteomics can be divided into antibody- and mass-spectrometry-based methods. A comprehensive summary of these methods can be found in Terfve and Saez-Rodriguez (2012). Broadly speaking, the quality of data can be measured in terms of coverage, time resolution and specificity. Antibody-based methods are generally specific (depending on the quality of the antibody) and can be used to measure time courses of target proteins across many conditions. However, the number of targets that can be measured is limited. In contrast, mass spectrometry techniques allow for systematic identification and quantification of phosphorylated proteins, although this comes with the caveat of requiring expensive equipment and advanced know-how, as reliable quantification can be difficult, protocols (especially for mass spectrometry) can be laborious and mapping measurements to proteins is not trivial (Ilsley et al 2009).

Whatever method is chosen from the above, the result is a quantitative ‘part list’ consisting of phosphoproteomic measurements from the signaling network of interest, taken under a certain number of conditions (different stimuli, inhibitors, time points, etc) and describing the states of these parts.

1.1 From parts to interactions

In order to deduce a mechanism of action that explains these types of data, the interactions between the parts must be understood. Interactions can be represented as node-edge graphs. The nodes can be biological entities such as proteins, as in this case, or genes or metabolites in the case of transcriptional or metabolic studies. Edges can be described as biological activities such as catalysis, association and modification of the participating nodes. They may be directed (i.e. protein X affects protein Y and not vice versa) or undirected. Furthermore, they can be signed (inhibitory/activating) or unsigned. When these graphs describe protein interactions, they can be characterized in two categories: protein interaction networks (PINs) and protein signaling networks (PSNs) (Pieroni et al 2008).
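A directed, signed graph of this kind needs very little structure to represent. The sketch below uses an edge list; the `Edge` record, the `regulators` helper and the node names X, Y and Z are hypothetical.

```python
from collections import namedtuple

# sign: +1 for an activating edge, -1 for an inhibitory edge.
Edge = namedtuple("Edge", ["source", "target", "sign"])

edges = [
    Edge("X", "Y", +1),   # X activates Y (directed: Y does not affect X)
    Edge("Y", "Z", -1),   # Y inhibits Z
]


def regulators(node, edges):
    """Return (source, sign) pairs for every edge pointing at the given node."""
    return [(e.source, e.sign) for e in edges if e.target == node]


z_inputs = regulators("Z", edges)
```

An undirected, unsigned network such as a PIN would drop the `sign` field and treat each pair of nodes as unordered.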

PINs can be constructed from a number of sources: high-throughput experiments such as two-hybrid and affinity purification/mass spectrometry, or systematic literature searches (bibliome mining). These methods yield limited functional insight beyond a possible interaction between two proteins. They are represented as a graph with a set of nodes connected by a set of edges without sign or directionality. There are a number of public repositories of PINs such as IntAct, HPRD and STRING.

PSNs are more detailed representations of protein interactions where, if described as a graph, their edges can have directionality and, when possible, sign. PSNs are generally obtained through expert curation of the literature or text mining. There are multiple public repositories of manually curated networks, including KEGG, WikiPathways, Nature Pathway Interaction Database and Reactome. Each has its strengths and weaknesses in terms of graphical formats, annotation accuracy and curation. When creating a PSN, as many sources as possible should be consulted before deciding on a final network (Bader et al 2006, Bauer-Mehren et al 2009). Pathway Commons is also a useful portal that integrates these and other PSN and PIN repositories.

While PSNs provide an insight into the transfer of signaling information, they do so at an abstract level without specifying the mechanism of signal transduction. This additional detail is provided by biochemical networks that describe such interactions in a quantitative manner (i.e. phosphorylation, binding, dimerization, etc). There are many examples in the context of metabolism and, recently, an increasing number for signal transduction, such as the reconstruction of the signaling network downstream of the EGFR (Oda et al 2005) and the retinoblastoma/E2 transcription factor (RB/E2F) pathway (Calzone et al 2008).

Independent of the resolution of the network, there is such a high degree of interconnectedness, redundancy and cell/context specificity, even in well-studied networks such as the mitogen-activated protein kinase (MAPK) signaling pathway, that it is difficult to obtain a high degree of accuracy from prior knowledge alone (Kirouac et al 2012). Therefore, prior knowledge networks should be constructed with as much cell-specific knowledge as possible, and high-throughput databases can be advantageous to add annotation and information (e.g. expression data for a particular network in a specific cell type). The growing use of model standards such as Systems Biology Markup Language (SBML) (Hucka et al 2003) can also aid in pooling resources across datasets.

1.2 From interactions to mechanism

Through manual curation of the literature or network databases, there are a number of ways of arriving at a network or map that represents the biological interactions of the system of interest. So what are the requirements to step from this biological representation to a mathematical analysis and understanding of these networks? Graph theory can be used to analyze the topology of a network to understand the principles behind its design (Barabasi and Oltvai 2004). These networks can also be used as a scaffold for overlaying expression data to better understand the activation of sub-networks (Bossi and Lehner 2009). These analyses can provide a useful insight but are not amenable to explaining how a signaling network responds to a defined set of perturbations (Saez-Rodriguez et al 2009). This aim can only be achieved via mechanistic and predictive computational modeling. By computational modeling we mean, in the context of this paper, the construction of an in silico representation of a system (in this case a cell signaling network) that can be simulated through a set of programmable commands that mimic the functioning of the system over time (Terfve and Saez-Rodriguez 2012). Simulating over time (or dynamic modeling) consists of using functions to describe how each species’ (or node’s) state in a network changes as a function of its inputs.

There are many approaches to this type of modeling, and the choice of method is highly dependent on the quality and type of data available for the network, together with the accuracy of one’s prior knowledge about its topology and interactions. We will briefly give an overview of some popular methods, but more detail about each approach can be found in a number of excellent reviews (e.g. Aldridge et al 2006, de Jong 2002).

Physicochemical modeling (modeling that includes biochemical and physical features of the system) is an approach that uses equations derived from physical and chemical theory to describe biological processes such as covalent binding, association and diffusion (Aldridge et al 2006). This is a popular and insightful type of model for signaling networks, and many examples can be found in repositories such as biomodels.net. These equations are built through a deep understanding of the underlying biochemistry and hence refer to distinct processes (such as catalysis and complex formation). The family of physicochemical model formalisms includes, among others, ordinary and partial differential equations (ODEs and PDEs), their stochastic variants and rule-based approaches. ODEs are the most common approach and can represent a signaling network through a set of coupled equations that describe the change in concentration of the elements (biomolecular species) of the network. ODEs are based on the assumption of mass action kinetics, a law that defines the rate of a reaction as being proportional to the concentrations of the reacting species (Chen et al 2010). This assumption can break down if there are spatial gradients for species or if concentrations of species are low enough that random fluctuations become a factor in the behavior of the system. In such cases, PDEs and stochastic formalisms are better suited to capture the biological behavior.

Another drawback to physicochemical modeling is the difficulty in managing and manipulating large networks, both in terms of the combinatorial complexity that such networks present (for example, the number of phosphorylation states of EGFR) and determination of the parameters of each equation, such as rate constants and initial conditions. Rule-based modeling allows easier manipulation and management of larger systems. Models are specified by a set of rules corresponding to the molecular interactions among protein domains, and these are then automatically converted into a model that describes all possible reactions and molecular configurations (see Hlavacek et al (2006) for an introduction to rule-based modeling).

In summary, physicochemically detailed modeling generally works well with small, detailed biochemical networks. In the absence of such criteria, a coarser-grained approach is necessary, and logic modeling can be viewed in this light.

1.3 Logic modeling

Unlike physicochemical modeling, logic modeling requires only a PSN as a starting point to simulate signaling processes. Although sparse in detail, such graphs are very insightful for understanding how the structure (or topology) determines the flow of information from input through to output (for example, ligand–receptor binding through to transcription factor activation (Ma’ayan et al 2005)). However, before questions can be addressed by simulation, the graph must be made computable by defining how each node state changes as its inputs change, so that input–output relationships can be quantified for the whole system.

Logic modeling uses transfer functions to describe the relationship between nodes in a graph (see section 8.3). Transfer functions are the mathematical representation of the relationship between inputs and outputs in a system. In physicochemical modeling, these are based on mass action kinetics and describe how the input species are transformed into output species by the chemical reaction. In logic modeling, transfer functions consist of logic operators (AND, OR, NOT) that describe how an output node is activated by its inputs. To illustrate this, we can use a simple case from the PI-3-Kinase-Akt signaling network that controls growth and division in mammalian cells. As part of this network, the kinase Akt is activated by the kinase PDK1 and the kinase complex mTOR. This would be represented in a PSN by two directed positive edges from PDK1 and mTOR to Akt. However, from this representation, it would not be known whether both or either kinases are necessary for Akt activation. In logic modeling, this relationship can be represented using an AND operator that specifies the necessity of both kinases for Akt activation. Such an example also illustrates the strength and weakness of logic modeling: the reduction of complexity that enables modeling of large systems with incomplete information and fewer parameters, against less mechanistic detail and biochemical accuracy.
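The ambiguity in the Akt example can be made concrete by comparing the two candidate transfer functions side by side. The function names `akt_and` and `akt_or` are hypothetical and serve only to illustrate the two readings of the same pair of PSN edges.

```python
def akt_and(pdk1, mtor):
    # Both kinases are required for Akt activation.
    return pdk1 and mtor


def akt_or(pdk1, mtor):
    # Either kinase alone is sufficient for Akt activation.
    return pdk1 or mtor


# Each row: (PDK1, mTOR, Akt under AND, Akt under OR)
rows = [(p, m, akt_and(p, m), akt_or(p, m)) for p in (0, 1) for m in (0, 1)]
```

The two readings differ only when exactly one kinase is active, so an experiment that inhibits one of the two kinases is what distinguishes them.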

The use of logic-based modeling of biological systems goes back more than 40 years, with the first model describing a gene regulatory network (Kauffman 1969). Since then, logic modeling has proved particularly useful in describing the effect of environmental inputs on cell phenotypes through networks of signal transduction. There are multiple studies using this type of modeling as a basis (Helikar et al 2008, Calzone et al 2010, Gonzalez et al 2008, Mendoza and Xenarios 2006, Schlatter et al 2009, Sahin et al 2009) (see also reviews Morris et al 2010, Watterson et al 2008, Thakar and Albert 2010). The structure of a signal transduction network lends itself to logic modeling, with clearly defined input nodes (ligand–receptor combinations), measurable elements corresponding to activation (phosphorylated proteins downstream of the receptor) and relatively little knowledge of the biochemistry involved.

Having summarized how logic modeling formulates input/output relationships (see above and section 8.3), the next step is to consider the complexity of the logic modeling in terms of how state and time are treated. The state refers here to the value or quantity attached to each node (typically a protein) and reflects the activation of that node at any point during a simulation. Its value is proportional to activation and can vary from 0 to some arbitrary or defined upper limit, depending on the type of logic modeling being undertaken. States can be defined as on/off (Boolean logic), multi-level or continuous, and we will discuss each of these in turn. When training logic models with experiments, these values are often compared to biochemical data. For instance, the phosphorylation of a protein is considered to be a proxy of its activation (e.g. phosphorylation at the phosphosite threonine-202 of extracellular signal-regulated kinase (ERK)).

Similarly, there are several approaches to handle time in a logic context, ranging from the simplest approximation of discrete or steady-state measurement to more biologically realistic continuous updating. These techniques will also be introduced in turn with the examples below.

1.4 Software

As a means to introduce the different methods and how they can be used to model different aspects of signal transduction, we will use the tool CellNOpt (www.cellnopt.org). CellNOpt is a software package that trains the topology of a PSN to experimental data by the criterion of minimizing the error between the data and the logic model created from the PSN. In CellNOpt, the starting network based on prior knowledge is called the prior knowledge network (PKN) (a name we will use in the rest of this paper). This PKN is preprocessed before training by compression and expansion (see materials and methods, section 8.1). The compression step of CellNOpt is a method of reducing the complexity of a logic model by removing nodes that have no effect on the outcome of simulation. The expansion step subsequently includes all possible hyperedges (materials and methods, section 8.1) in the model. The model is trained by minimizing a bipartite function that calculates the mismatch between the logic model and experimental data (mean squared error (MSE)) while penalizing model size. This minimization can be solved using different strategies, from simple enumeration of options for small cases, to stochastic optimization algorithms such as genetic algorithms (Saez-Rodriguez et al 2009), to integer linear programming (Mitsos et al 2009).

The R version (CellNOptR) is available on Bioconductor and has a number of added features that allow the user to run different variations of logic modeling within the same framework of model calibration. These variations include steady-state to discrete-time Boolean modeling, fuzzy logic and logic ODEs, all of which will be discussed in turn below. We will also refer to other software packages that have contrasting or complementary approaches to CellNOptR.

For the remainder of the tutorial, different logic formalisms will be introduced and explained with the aid of CellNOptR, and the assumptions, strengths and weaknesses of each formalism with regard to training to data will be illustrated with a ‘toy model’ of signal transduction.

1.5 The example model

To illustrate the variety of logic modeling approaches, we will use an imaginary but biologically plausible PKN (figure 1). This network includes a subset of intracellular signaling networks known to be activated downstream of EGF and TNFα stimulation and was derived from a larger network presented in Saez-Rodriguez et al (2009). In brief, the PKN includes three MAPK cascades (ERK, p38 and JNK1), the PI3K/Akt/GSK-3 pro-survival pathway and the IKK/IκB/NFκB pathway. It consists of 30 nodes and 33 edges.

From this PKN, we derived a model (the data-generating model) that was used to simulate experimental data (section 8.1 and table S1/figure S1 (available from stacks.iop.org/PhysBio/9/045003/mmedia)). The in silico data replicate biologically plausible behavior that has been seen in such networks, such as the transient behavior of ERK activation (Sasagawa et al 2005) and the oscillatory dynamics of NFκB translocation from the cytoplasm to the nucleus (Hoffmann et al 2002) (figure 2). These in silico data consist of ten ‘experiments’, which vary according to different combinations of stimulation and inhibition (inhibition is achieved by blocking the activity of two specific kinases (proteins), PI3K and Raf-1, with small-molecule inhibitors), and 16 observations at 2 min intervals from t = 0. Inhibition is used in such experiments to further understand the combinations of upstream events that contribute to the activation of a particular protein. The readouts chosen are well-established downstream events of EGF/TNFα stimulation. The experiment represents an ideal situation with multiple time-point sampling. However, as we will discuss later, with fewer measurements one can capture most (but not necessarily all) of the dynamics of the system. The values are between 0 and 1, and Gaussian noise was also added to the output to imitate inherent biological noise and the measurement error (see materials and methods, section 8.2). The PKN has the following important properties:

(i) It does not specify which input or combination of inputsactivate a particular node (for example both Map3K1and Map3K7 activate MKK4 The PKN does not specifywhether this is an AND or OR relationship figure 1)


Figure 1. The prior knowledge network (PKN) used as the starting point for the toy model, depicted with VANTED (Junker et al 2006) in an SBGN activity flow format (Novere et al 2009). The experimental design is annotated as colored nodes: inputs are shown in green, inhibited nodes in red and readouts in blue (Raf-1 is annotated as both red and blue, as it is inhibited and measured). Nodes that were compressed (see materials and methods, section 8.1, and figures S2 and S3 (available from stacks.iop.org/PhysBio/9/045003/mmedia)) have a dashed outline. The data-generating model contained hyperedges from TRAF2 to MKK7 (dashed edges in figure) that were removed for the PKN to demonstrate how incomplete prior knowledge can affect fitting the data. The input to MKK4 is highlighted to demonstrate the concept of a hypergraph: MKK4 has two inputs, Map3K1 and Map3K7. However, both AND and OR can be necessary for MKK4 activation. This is represented by the two inputs expanding to three hyperedges (inset and supplementary figure S4 (available from stacks.iop.org/PhysBio/9/045003/mmedia)). The goal of CellNOptR is to find the subset of all possible hyperedges (logic gates) in the PKN that best explains the data.

(ii) It includes additional interactions (TNFR → PI3K, PI3K → Rac, Rac → Map3K1) not present in the data-generating model.

(iii) It is missing interactions (TRAF2 → ASK-1, ASK-1 → Map3K7) that are present in the data-generating model and are necessary to fully explain the in silico data.

The purpose of these gaps and errors in our ‘prior knowledge’ is to demonstrate the ability of CellNOptR to train context-specific models from unspecific prior knowledge, and also to demonstrate the limitations of such an approach when information is incomplete. We will also demonstrate how CellNOptR performs when trying to find the true network topology and model parameters by using the different logic model formalisms to simulate the ‘experimental data’, and hence demonstrate the strengths, weaknesses and underlying assumptions of each of the logic model formalisms in turn. The network has been designed such that the features uncovered by the logic formalisms are not confounded by the missing interactions.
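The expansion of the two MKK4 inputs into three candidate hyperedges (property (i) and figure 1) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `expand_hyperedges` is hypothetical, negated (inhibitory) inputs are ignored, and the code is not taken from CellNOptR.

```python
from itertools import combinations

def expand_hyperedges(inputs, target):
    """Enumerate candidate hyperedges for one target node.

    Each non-empty subset of the inputs forms one AND gate. An OR
    relationship is expressed by keeping several hyperedges that point
    to the same target.
    """
    edges = []
    for size in range(1, len(inputs) + 1):
        for subset in combinations(inputs, size):
            edges.append((subset, target))
    return edges

mkk4_edges = expand_hyperedges(["Map3K1", "Map3K7"], "MKK4")
for sources, target in mkk4_edges:
    print(" AND ".join(sources), "->", target)
```

Training then amounts to choosing which of these candidates to keep: retaining only the two single-input hyperedges corresponds to an OR gate, whereas retaining only the joint hyperedge corresponds to an AND gate.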

2. Boolean steady state

In arguably the simplest case of data, an experimental design looking at a particular signal transduction network will consist of a set of measurements representing the phosphorylation state of a subset of proteins in the signaling network. These measurements will be taken before the addition of a stimulus or stimuli and at a single time point after stimulation (t = 0 and t = t1). Additionally, the effects of multiple conditions (inhibitions, perturbations) may also be examined with this design. This is a common approach when studying signal transduction, which has classically been used via low-throughput methods and has more recently been scaled up


[Figure 2 panels, left to right: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3ki, raf1i); x-axis ticks at 0, 10 and 30; y-axis tick at 0.5.]

Figure 2. The in silico data generated to test each logic modeling formalism. The data were generated by a logic model (the data-generating model). Each row of the figure represents an experiment with a certain combination of stimuli and inhibitors (shown in the final two columns; black is ON, white is OFF). The simulated data are shown as a continuous black line. Gaussian noise was added (section 1.5 and materials and methods, section 8.2) and the data were ‘sampled’ at 16 equally spaced time points between 0 and 30 min to simulate a fine-grained time course experimental design.

owing to new technological developments (Terfve and Saez-Rodriguez 2012).

Choosing a single time point after stimulation leads to a simple design and minimizes the cost per experiment. However, it then becomes critical to choose an appropriate time t1 (see figures S5–7 (available from stacks.iop.org/PhysBio/9/045003/mmedia)). Ideally, one should perform a set of detailed time course experiments that encapsulates the variation in activation in the system, but this is usually not viable in terms of cost and time constraints. It may be only possible to perform a detailed time course experiment for a single phosphoprotein. From this, a time point can be chosen that is characteristic of the activation of the phosphoproteins of interest. Typically, in signal transduction, a fast wave of activation occurs over a short timescale after stimulation. This is followed by slower, later mechanisms that often down-regulate the signals over a longer timescale (e.g. degradation, internalization, etc).

Returning to our example, the measurement of phosphorylated ERK could be viewed as a sensible output with which a time course can be obtained. (Its activation would be representative of the dynamics of the MAPK cascade, and it is technically a good choice because of the quality of ERK phosphosite-specific antibodies.) From this time course, we would see that two different timescales seem to exist: an early activation phase followed by a late phase. Thus, characteristic time points can be chosen (figure 2), and a reasonable early time point would be in the range 4–12 min. For argument’s sake, we will choose t1 = 10 min.

We can see from the data the difficulty with defining a characteristic time point, and how choosing a single time point may affect the ability to capture all dynamic features. For example, it is impossible to understand the oscillatory nature of NFκB translocation with a single time point, and ERK activation dynamics can only be partly representative of other phosphoprotein dynamics (even those closely related in function, such as Raf-1). For the oscillations of NFκB, one would need to sample at least every 2.5 min (since the wavelength is 5 min), while to obtain an approximate sense of the transient activation of ERK, two well-chosen time points can be enough. In spite of this, steady-state measurement can give a qualitative overview of the system that allows for robust, albeit coarse-grained, conclusions with relatively few data points (and thus cost).
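The sampling requirement for the NFκB oscillations can be checked with a small numerical sketch. It assumes an idealized cosine with the 5 min wavelength quoted above, which is a simplification of the simulated NFκB signal; the helper `sample` is hypothetical.

```python
import math

PERIOD_MIN = 5.0  # oscillation wavelength quoted in the text

def sample(interval_min, end_min=30.0):
    """Sample a unit cosine of period PERIOD_MIN every interval_min minutes."""
    n = int(end_min / interval_min) + 1
    return [round(math.cos(2 * math.pi * i * interval_min / PERIOD_MIN), 6)
            for i in range(n)]

coarse = sample(5.0)  # one sample per period: the signal looks constant
fine = sample(2.5)    # two samples per period: the alternation is visible
```

Sampling once per period returns the same value every time, so the oscillation is invisible, whereas sampling every 2.5 min is the coarsest spacing at which the alternation between high and low values appears.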

2.1. Steady-state optimization and simulation

One way to measure a model’s ability to fit experimental data with a single time point, such as that described above, is to make the assumption that the system has reached, at that point, a


[Figure 3 panels, left to right: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3ki, raf1i); x-axis ticks at 0, 10 and 30; y-axis tick at 0.5.]

Figure 3. The fit of the trained model using the Boolean steady-state formalism. The simulated data are shown as two blue circles (t0 and t1) connected by a blue dotted line. The colors represent the goodness of fit between the model and the data at t1 = 10. Heat-map coloration is used to signify the range from high error (red, normalized mean squared error (MSE) = 1) to no error (white, MSE = 0). t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CellNOptR took 180 s.

pseudo-steady state: the fast reactions have already occurred, while the slow reactions have not yet significantly affected the network’s behavior (Klamt et al 2006). This approximation implies that the flux through the system (in our case, the phosphorylation cascade in signal transduction) has stabilized and the quantities of phosphorylated proteins are no longer varying to a significant degree. With this assumption, a model of this system can be simulated until it has also reached a steady state.

With the in silico data (figure 2) as our starting point, the PKN (figure 1) was trained using the steady-state model formalism at t1 = 10 min. Details about the node states and transfer functions of this formalism (Boolean steady state) are summarized in section 8.3. Figure 3 shows the steady-state simulation overlaid on the experimental data.
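Simulating a Boolean model ‘until it has reached a steady state’ can be illustrated with the following minimal sketch. The three-node cascade, the node names and the function `simulate_to_steady_state` are assumptions made for illustration and do not reproduce the CellNOptR simulator.

```python
def simulate_to_steady_state(update_rules, state, max_steps=50):
    """Synchronously update all nodes until the state stops changing.

    Returns the steady state, or None if none is found within max_steps
    (for example when negative feedback makes the model oscillate).
    """
    for _ in range(max_steps):
        new_state = dict(state)
        for node, rule in update_rules.items():
            new_state[node] = int(rule(state))
        if new_state == state:
            return state
        state = new_state
    return None

# Hypothetical cascade driven by a fixed input "EGF".
cascade = {
    "SOS1": lambda s: s["EGF"],
    "Raf1": lambda s: s["SOS1"],
    "ERK": lambda s: s["Raf1"],
}
start = {"EGF": 1, "SOS1": 0, "Raf1": 0, "ERK": 0}
steady = simulate_to_steady_state(cascade, start)

# Adding negative feedback from ERK to SOS1 prevents a steady state.
with_feedback = dict(cascade, SOS1=lambda s: s["EGF"] and not s["ERK"])
no_steady = simulate_to_steady_state(with_feedback, start)
```

The cascade without feedback settles with every node ON, while the variant with negative feedback cycles indefinitely, so no steady state is returned for it.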

2.2. Interpretation of steady-state result

The Boolean steady-state formalism used by CellNOptR for optimization recovers most of the underlying ‘true’ network and hence gives a good steady-state approximation of the in silico data (see figures 3 and 8). However, there are some exceptions that highlight the limitations of steady-state measurements. Using this formalism, CellNOptR cannot identify the NFκB oscillations caused by feedback: hyperedges that cause negative feedback are penalized in CellNOptR, as a steady state cannot be reached when they are present. Another limitation is that the state of each element in the model is limited to 0/1 (either switched on or off). Hence, intermediate levels of activation cannot be simulated (such as p38 activation under TNFα and EGF stimulation). Finally, the effect of the missing pathway from TNFα to AP1 is observed when the experimental measurement cannot be explained with TNFα stimulation in the absence of EGF stimulation.

Thus, the usefulness of steady-state Boolean logic depends strongly on the assumptions underlying the data. If one has enough knowledge of the data and biochemistry such that the assumption of steady state is a fair one to make, training a network to data using steady-state Boolean logic modeling can uncover cell-specific behavior, for example, differences between cancer and normal cells (Saez-Rodriguez et al 2011a). Another advantage is the scalability of such an approach: because the method is parameter-free, large networks can be trained under a large number of conditions.

3. Two time points (or additional steady state)

As mentioned in section 2, it is quite common in signaling networks to observe a transient behavior, where a species is quickly activated and subsequently deactivated. Such a dynamic obviously cannot be captured with a steady-state approach where only one time point is considered. Therefore,


[Figure 4 panels, left to right: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3ki, raf1i); x-axis ticks at 0, 10 and 30; y-axis tick at 0.5.]

Figure 4. The fit of the model at two time points, t1 and t2, using the two steady-state approach. Again, the colors are representative of the fit, this time at t1 = 10 and t2 = 30. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CellNOptR took 240 s.

in the above section, this issue was avoided by only modeling ‘fast events’, i.e. the activation phase of the signal propagation. However, when information about more than one time point is available and such a fast activation followed by slow deactivation (or indeed any combination of slower and faster processes) is observed, then it is possible to also capture these processes while keeping the simplifying assumption of steady states. In essence, it is assumed that multiple pseudo-steady states reflect the mechanisms that are acting at different timescales, and they can be optimized independently. We will illustrate this with the CellNOptR implementation for two timescales, but the approach is extendable to more than two time points.

Defining suitable time points that adequately represent the process timescales that we want to model is a similar problem to what was discussed above for a unique steady state, with the added complexity of having to choose more than one point that is consistent for all modeled species. This can be guided by prior knowledge, e.g. if it is known that a receptor is activated on a fast timescale (e.g. 30 min for full activation) by phosphorylation and then deactivated by slow internalization and degradation (e.g. 2 h for full silencing of the signal). However, in general, it is better to develop a detailed time course, as stated above. In our case, again using ERK, we would say that a second measurement at 20–30 min would be adequate; 30 min was used for the sake of argument (see figure 4).

3.1. Multiple steady-state optimization and simulation

In CellNOptR, a model of a system with two steady states at different timescales is simulated by assuming that a subset of the hyperedges (interactions) only become active at a later time point, that is, they operate on a different timescale (Klamt et al 2006). That being the case, the two time points can be optimized separately. In practice, this means that the optimization is done in two steps:

(i) The scaffold model (the model after compression of non-essential nodes and expansion of all possible hyperedges; see figure 1 and materials and methods, section 8.1) derived from the PKN is used to train the model against the data at t1, thereby identifying hyperedges that best reproduce the data at this time point.

(ii) Hyperedges that were not selected as active at t1 are used as the search space for training the model at t2. For simulation (and therefore testing the model fit), candidate models are tested by using the steady state of the t1 model as an initial state, then computing the steady state from there, including candidate t2 hyperedges. There is also the additional constraint that whenever hyperedges at t1 and t2 influence a node in contradicting ways, the t2 hyperedge overrules the t1 hyperedge and the state of the target node is locked to the state defined by the t2 hyperedge.

Besides the additional constraint of the overriding hyperedges described above, the node states and transfer


functions are calculated in the same way as the Boolean steady-state formalism (section 8.3).

3.2. Interpretation

In our example, we can see that the two steady-state optimization finds the feedback from ERK back to SOS-1 (figure 8). Hence, from figure 4, the transient activation of Raf-1, ERK and AP1 is captured in the trained model. Using a single characteristic time point, a model that includes the negative feedback from ERK to SOS-1 at t1 would not be selected, as the branch never reaches a stable steady state because of oscillation. However, if we say that the branch is active at t1 but that the negative feedback is only active at t2, and that when active this negative feedback permanently turns SOS-1 off, then the model does reach a steady state at t1 (where SOS-1, Raf-1 and ERK are all ON) and a different steady state at t2 (where SOS-1, Raf-1 and ERK are all OFF as a result of the activated negative feedback).
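The two-step simulation described above, with the t2 hyperedge overruling the t1 hyperedge and locking its target, can be sketched as follows. The rule dictionary, the `locked` argument and the node names are illustrative assumptions and not the CellNOptR implementation.

```python
def steady_state(rules, state, locked=(), max_steps=50):
    """Synchronous updates until a fixed point; locked nodes are never updated."""
    for _ in range(max_steps):
        new_state = dict(state)
        for node, rule in rules.items():
            if node not in locked:
                new_state[node] = int(rule(state))
        if new_state == state:
            return state
        state = new_state
    return None

# Fast hyperedges selected at t1 (hypothetical compressed ERK branch).
t1_rules = {
    "SOS1": lambda s: s["EGF"],
    "Raf1": lambda s: s["SOS1"],
    "ERK": lambda s: s["Raf1"],
}
initial = {"EGF": 1, "SOS1": 0, "Raf1": 0, "ERK": 0}
state_t1 = steady_state(t1_rules, initial)

# At t2 the slow hyperedge "NOT ERK -> SOS1" becomes active. ERK is ON in
# the t1 steady state, so the t2 hyperedge overrules EGF -> SOS1 and SOS1
# is locked OFF before the second steady state is computed.
t2_start = dict(state_t1, SOS1=int(not state_t1["ERK"]))
state_t2 = steady_state(t1_rules, t2_start, locked={"SOS1"})
```

Because SOS-1 is locked, the feedback cannot re-activate the branch once ERK switches off, which is what allows a second steady state to exist instead of an oscillation.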

4. Synchronous multiple time-point simulation and multiple timescales

As discussed in section 3, by measuring at two characteristic time points, the trained logic model is capable of finding the slow negative feedback from ERK to SOS-1 and can therefore move a step closer to understanding the ‘true’ network. However, the oscillations of NFκB still cannot be explained with the pseudo-steady-state formalism, as it is necessary to use the full time course data (and not just two time points) to observe this effect. This can be modeled by a discrete time Boolean model that is available as an add-on R package to CellNOptR: CNORdt (discrete time).

4.1. Synchronous and asynchronous updating

CNORdt introduces some variation in how time is handled in the model. Instead of simulating and fitting data at steady states, it is capable of fitting time course data by using an additional model parameter together with a synchronous updating scheme.

Synchronous updating is where all nodes are updated simultaneously during model simulation; hence, each node at time t is a function of its input nodes at t − 1 (see section 8.3). This is the updating scheme used in CellNOptR. An alternative method is asynchronous updating, where nodes are updated in a random or non-synchronous order, depending on the asynchronous method used. This leads to different simulation properties depending on the updating method chosen. Synchronous updates are deterministic, and simulations run under the same conditions (inputs and perturbations) will reach the same steady state (or attractor) each time. In contrast, asynchronous updating introduces stochasticity into the system such that different steady states can be reached from the same starting conditions. The random updating of node values is one possible application of asynchronicity. This enables sampling over all timescales (any reaction can be deemed to be slowest or fastest), thus avoiding the constraint inherent in synchronous simulations of an equal timescale over all reactions. However, this added complexity can make results difficult to interpret (Garg et al 2008). Mixed synchronous/asynchronous updating is an intermediate approach that can stratify reaction groups according to their known reaction rates, thus taking advantage of a priori knowledge and reducing the complexity of a fully asynchronous approach (Faure et al 2006, Albert et al 2008, Assmann and Albert 2009, Garg et al 2008).
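The difference between the two updating schemes can be seen on a hypothetical mutual-inhibition pair (not part of the toy model). In this sketch, two fixed update orders stand in for the random order of an asynchronous simulation; all function names are illustrative.

```python
def sync_step(rules, state):
    """Update every node at once from the previous state."""
    return {node: int(rule(state)) for node, rule in rules.items()}

def async_run(rules, state, order, sweeps=10):
    """Update nodes one at a time in the given order, reusing fresh values."""
    state = dict(state)
    for _ in range(sweeps):
        for node in order:
            state[node] = int(rules[node](state))
    return state

# A inhibits B and B inhibits A.
rules = {"A": lambda s: not s["B"], "B": lambda s: not s["A"]}
start = {"A": 0, "B": 0}

sync_trace = [start]
for _ in range(4):
    sync_trace.append(sync_step(rules, sync_trace[-1]))

a_first = async_run(rules, start, order=["A", "B"])
b_first = async_run(rules, start, order=["B", "A"])
```

The synchronous trajectory is the same on every run (here a two-state cycle), whereas the asynchronous runs end in different steady states depending only on which node happens to be updated first.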

CNORdt introduces a scaling parameter that defines the timescale of the Boolean synchronous simulation. Where each ‘tick’ (t) (or simulation step) is the synchronous updating of all nodes in the model according to their inputs at t − 1, the scaling parameter defines the ‘tick’ frequency relative to the timescale of the real data. Although this is a crude approach (i.e. it implies a single rate across all reactions), it allows us to fit a synchronous Boolean simulation to data. Hence, all data points can be fitted to the model, and hyperedges that cause feedback in the model can be included, which allows the model to reveal more complex dynamics such as oscillations. CNORdt still describes the node states as either on or off (1/0), and the transfer functions are calculated as in section 8.3. The scaling parameter is applied to the simulation of the system and hence does not affect the transfer functions themselves.

Figures 5 and 8 show how the NFκB oscillations can be predicted by fitting a dynamic logic model to the full time course and maintaining the two steady-state assumptions from section 3, i.e. simulating ‘fast’ reactions from t = 0 to t = 10 and ‘slow’ reactions from t = 10 to t = 30.
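The role of the scaling parameter can be sketched as a mapping from simulation ticks to measurement times. The code below is a guess at the general idea rather than the CNORdt algorithm: `minutes_per_tick` plays the part of the scaling parameter, `value_at` simply reads the last completed tick, and the two-node feedback loop is hypothetical.

```python
def simulate_ticks(rules, state, n_ticks):
    """Return the synchronous trajectory [state_0, ..., state_n_ticks]."""
    trajectory = [dict(state)]
    for _ in range(n_ticks):
        prev = trajectory[-1]
        nxt = dict(prev)
        for node, rule in rules.items():
            nxt[node] = int(rule(prev))
        trajectory.append(nxt)
    return trajectory

def value_at(trajectory, node, t_min, minutes_per_tick):
    """Read the simulated value at real time t_min (last completed tick)."""
    tick = min(int(t_min / minutes_per_tick), len(trajectory) - 1)
    return trajectory[tick][node]

# Hypothetical negative feedback loop that oscillates under synchronous updates.
rules = {"X": lambda s: not s["Y"], "Y": lambda s: s["X"]}
traj = simulate_ticks(rules, {"X": 0, "Y": 0}, n_ticks=12)

times = [0, 2, 4, 6, 8, 10]
fast = [value_at(traj, "X", t, minutes_per_tick=2.0) for t in times]
slow = [value_at(traj, "X", t, minutes_per_tick=4.0) for t in times]
```

The same Boolean trajectory produces a faster or slower oscillation in real time depending on the scaling value, so fitting that single parameter aligns the simulated dynamics with the sampled time course.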

5. Constrained fuzzy logic

One of the main limitations of Boolean logic models is that the assumption of a single level of activation (species can only be on/off) is biochemically unrealistic. Fuzzy logic is another logic modeling formalism that allows for intermediate levels of activation. It was originally developed in the field of control theory for predicting the outputs of complex processes where inputs could only partially be characterized (Morris et al 2011a). Its strength lies in the flexibility it affords when defining relationships between input and output nodes. This flexibility can also be a weakness if a large number of parameters are required to define these functional relationships. Constrained fuzzy logic (cFL) deals with this potential complexity by limiting the repertoire of relationships between nodes. The cFL formalism used in CellNOpt (CNORfuzzy) is fully described in Morris et al (2011a). Briefly, the relationships (or transfer functions) between nodes in cFL are limited to Hill functions. Hence, each transfer function has two free parameters: the Hill coefficient n, which controls the steepness of the function, and the sensitivity parameter k, which determines the midpoint of the function (i.e. the value of the input that produces half the maximal output). By varying these two parameters, linear, sigmoidal and step-like dynamics can be produced that are good approximations to protein–protein interactions and enzymatic reactions. In CNORfuzzy, further constraints are imposed by initially limiting the possible parameter


[Figure 5 panels, left to right: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3ki, raf1i); x-axis ticks at 0, 10 and 30; y-axis tick at 0.5.]

Figure 5. The fit of the model at multiple time points using fast (t = 0 to t = 10) and slow (from t = 10 to t = 30) timescales. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORdt took 300 s.

combinations to a subset of discrete values. Details of the transfer functions used can be found in materials and methods, section 8.4.
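The effect of the two Hill parameters described above can be illustrated with the basic Hill function, for which the input x = k gives exactly half the maximal output. This is a minimal sketch: the exact transfer function used by CNORfuzzy is the one defined in section 8.4 and Morris et al (2011a), which may be normalized differently, and the parameter values below are arbitrary.

```python
def hill(x, n, k):
    """Hill function: n sets the steepness, k the input giving half-maximal output."""
    return x**n / (x**n + k**n)

inputs = [0.0, 0.25, 0.5, 0.75, 1.0]
gentle = [round(hill(x, n=1, k=0.5), 3) for x in inputs]  # shallow response
steep = [round(hill(x, n=8, k=0.5), 3) for x in inputs]   # step-like response
```

With a small n the output rises gradually with the input, while a large n approaches a Boolean switch centered on k, which is how a single functional form covers both graded and step-like behavior.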

5.1. Model training and simulation

Model training and simulation in CNORfuzzy are carried out in a similar manner to the Boolean steady-state formalism. After compression and expansion of the logic hypergraph, a genetic algorithm determines transfer functions and a network topology that minimize the MSE between the model and the data at steady state. This is followed by a number of refinement steps that fine-tune the Hill function parameters and reduce the complexity of the network topology. The in silico data and model fit at t1 = 10 are shown in figure 6.

5.2. Interpretation

CNORfuzzy is capable of fitting intermediate values (figure 6). For most cases, the cFL model generates similar fits to the steady-state Boolean model. However, the fit to data is more accurate since the values are continuous and not limited to 0 or 1. More importantly, the cFL model obtains a better fit for p38, as it uncovers a link in the structure that Boolean models are unable to capture. In the ‘true’ network, TNFα and EGF are both required to activate p38 (albeit the activation is low relative to the other signals). In the previous Boolean formalisms, this low activation of p38 cannot be modeled, as the simulation can only take the values 0/1. However, CNORfuzzy is capable of adding the hyperedge ‘Map3K1 AND Map3K7 → MKK4’ (figure 8) to explain this activation and hence move a step closer to finding the underlying true network.

The CNORfuzzy model fit also illustrates some caveats associated with fuzzy logic. We can see that CNORfuzzy also retains the Map3K7 → p38 hyperedge (figure 8), thus activating p38 with TNFα stimulation alone (i.e. in the absence of EGF stimulation). This occurs as CNORfuzzy attempts to fit the noisy signal of inactive p38, thus adding a hyperedge that is not present. CNORfuzzy also adds hyperedges from TNFα to AP1 that convey a weak activating signal to compensate for the missing hyperedges (TRAF2 → ASK-1, ASK-1 → Map3K7) from the PKN (figure 1). These examples illustrate the sensitivity of the cFL approach to the data quality, and this can make interpretation of the results more subtle and difficult (Morris et al 2011a).

6. Logic ODEs

The Boolean logic formalisms described above can qualitatively fit the network topology and logic gates that best describe the underlying data. cFL can add quantitative information by its ability to fit intermediate values between 0 and 1 at steady state. In terms of time, however, all these formalisms rely on discrete simulations. To obtain a fully continuous model, both in state and time, CNORode adds to


[Figure 6 panels, left to right: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3ki, raf1i); x-axis ticks at 0, 10 and 30; y-axis tick at 0.5.]

Figure 6. The fit of the trained model at t1 = 10 using constrained fuzzy logic. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORfuzzy took 1200 s.

these methods by transforming a discrete logic model to a continuous model. It does this by defining a set of ODEs for each model species. There are several formalisms to convert discrete logic to continuous models (e.g. SQUAD (Di Cara et al 2007)) or hybrid models (e.g. piecewise linear models (de Jong 2002)). CellNOpt includes the method developed by Wittmann et al (2009) that was implemented in Matlab as Odefy (Krumsiek et al 2010).

6.1. Converting from Boolean to continuous

The approach used to convert Boolean to continuous models is fully explained in Wittmann et al (2009). Briefly, the goal is to simulate the full dynamics of each species in the logic model while retaining consistency with the Boolean representation. What this means is that where the output of a logic gate is 0 or 1, the ODEs replacing a Boolean state should also return 0 or 1. This is achieved in a similar manner to cFL (but with an additional parameter τ) by applying a normalized Hill function on the interval between 0 and 1. Applying these functions to each hyperedge defines a new continuous ODE model to replace the underlying Boolean model. This is more fully explained in section 8.5.
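A one-species sketch may help to fix ideas. It assumes the commonly used logic-ODE form dx/dt = (f(u) − x)/τ with a normalized Hill function f satisfying f(0) = 0 and f(1) = 1; the authoritative definition is the one in section 8.5 and Wittmann et al (2009). The forward-Euler integrator, function names and parameter values are choices made for this illustration only.

```python
def normalized_hill(x, n, k):
    """Hill function rescaled so that the output is 0 at x = 0 and 1 at x = 1."""
    return x**n * (1 + k**n) / (x**n + k**n)

def simulate_species(u, tau, n=3.0, k=0.5, t_end=30.0, dt=0.01):
    """Forward-Euler integration of dx/dt = (normalized_hill(u) - x) / tau."""
    x = 0.0
    trace = [x]
    for _ in range(int(round(t_end / dt))):
        x += dt * (normalized_hill(u, n, k) - x) / tau
        trace.append(x)
    return trace

on = simulate_species(u=1.0, tau=2.0)   # input fully ON
off = simulate_species(u=0.0, tau=2.0)  # input fully OFF
```

For Boolean inputs the continuous species relaxes to the Boolean output (1 or 0), which is the consistency requirement stated above, while τ only controls how quickly that value is approached.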

6.2. Parameter estimation

CNORode currently provides links to two stochastic non-local optimization algorithms: a genetic algorithm (genalg package, http://cran.r-project.org/web/packages/genalg) and an implementation in R of scatter search (Egea and Martí 2010). These are used to fit the Hill function parameters k and n and the ODE parameter τ to each logic gate in a model that has been already topologically optimized by one or more of the other formalisms.

6.3. Compressing an ODE model

Compression of the model before training may lead to the loss of elements important to capture dynamic features, and must thus be done with caution. Returning to our example (figure 2), the in silico data were generated through a set of normalized Hill functions. Hence, with the exception of AP1 (where the missing hyperedge prohibits any exact simulation of this signal), CNORode should be capable of simulating exactly the other signals in the system after parameter optimization of the associated logic ODEs. However, this may not be possible when the model is compressed. To give an example, in our toy model (figure 1), the pathway consisting of SOS-1, Ras, Raf-1, MEK 1 and ERK is compressed to SOS-1 → Raf-1 → ERK. The in silico data were generated with ODEs describing the uncompressed interactions. We can see from figure 7 that the compressed model can accurately simulate the in silico data for this pathway (Raf-1 and ERK signals). In this case, the normalized Hill functions have enough dynamic plasticity to summarize four interactions (SOS-1 → Ras → Raf-1 → MEK 1 → ERK) as two (SOS-1 → Raf-1 → ERK).


[Figure 7 panels, left to right: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3ki, raf1i); x-axis ticks at 0, 10 and 30; y-axis tick at 0.5.]

Figure 7. The fit of the trained model using CNORode. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The parameter training in CNORode took 2000 s.

However, this is not the case where we have feedback from ERK through a phosphatase (ph) back to SOS-1, and from NFκB through expression (ex) back to IκB. In these cases, it is necessary to not compress ‘ph’ and ‘ex’ to allow CNORode to model the correct dynamics (transience and oscillations, respectively). The non-compression is required as ‘ph’ and ‘ex’ are integral to the dynamics observed in the in silico data. So, figures 7 and 8 show that, with the exception of AP1, CNORode can accurately model the in silico data of the toy model once compression of those key nodes is suppressed.

7. Summary and future developments

In this contribution, we have reviewed different logic-based approaches to model signal transduction networks. Recent developments in proteomics techniques, from antibody-based methods (xMAP, protein arrays, high-throughput microscopy, etc) to mass spectrometry methods (Terfve and Saez-Rodriguez 2012), allow us to generate a large amount of phosphoproteomic data. Given the size of the underlying networks, we believe that logic-based models, which do not need extensive biochemical detail and thus lead to tractable models even when dealing with multiple pathways, are a useful approach to analyzing signal transduction on a large scale. Therefore, we have focused our work on how to train logic models to experimental data, and implemented various methodologies toward this end in our tool CellNOptR.

Our recent developments presented here expand our previous work by including strategies to deal with the inherent dynamic nature of signaling processes (and hence with time series data). We have discussed how modeling dynamic aspects requires more detailed formalisms (and thus, in general, more data and computational time) and how the general methodology has to be re-evaluated at multiple levels, in particular the compression of the network prior to the optimization; hence, we are currently working to develop a general compression routine for dynamic models. Another area of active development is the implementation of efficient optimization strategies to identify both structure and (if existing) continuous parameters (Banga 2008). Although we have covered here a broad palette of logic-based formalisms, we plan to explore other approaches. Some are combinations of what we have discussed (e.g. a cFL formalism simulated over multiple timescales), others are formalisms related to those used here (e.g. SQUAD), and others could add new features such as a probabilistic framework (Shmulevich et al 2003), stochasticity (Albert et al 2008) or formal methodologies (Fisher and Henzinger 2007).

For the sake of simplicity, we have used a toy model that is itself based on a logic formalism to exemplify the potential dynamic behavior and thereby different modeling variants. We are currently working on more realistic benchmarks based on biochemical models, and studying in more detail the role of experimental noise and experimental design in recovering the underlying model structure.


Figure 8. The contribution of each logic modeling formalism to the understanding of the model used to simulate the in silico training data. The time taken for training the model using each formalism is also shown.

As illustrated in our example with the link TRAF2 → ASK-1 → MKK7, databases are comprehensive but not complete, and it is therefore likely that important links are missing from the system of interest (Kirouac et al 2012). To overcome this limitation, we are working on strategies to integrate as many network resources as possible. These include methods that propose novel links that expand the prior knowledge network (Saez-Rodriguez et al 2009, Eduati et al 2010) and the use of information from PINs (Vinayagam et al 2011).

The focus of CellNOptR is the calibration of logic models to data, but a large set of other tools exist that analyze logic models from different angles (Morris et al 2010). For example, the Q2LM toolbox (Morris et al 2011b) uses cFL to understand the effect of perturbations in the context of the whole system under investigation (e.g. under what set of stimuli is a therapeutic perturbation most effective?). CellNetAnalyzer (Klamt et al 2007) has a battery of methods from graph theory as well as specific techniques for logic models. These include minimal intervention sets (the minimum number of perturbations for a desired phenotype) to propose possible therapeutic targets. These tools use the same model format as CellNOptR, so it is easy to pass models for analysis. More generally, we are part of the CoLoMoTo initiative, which aims to facilitate interoperability among these tools; the main goal here is the development of SBML-qual as a language to exchange logic models (sbml.org/Community/Wiki/SBML_Level_3_Proposals/Qualitative_Models), as well as the implementation of the SBGN format for network representation (Novere et al 2009).

In general, efficient integration of data and prior knowledge to model signal transduction requires the use of appropriate standards for data, prior knowledge about the networks and the models themselves (Saez-Rodriguez et al 2011a). We consider that logic models will be an area of development in the future, with increasing application to signal transduction research.


[Figure 9 panels: (A), (B), (C).]

Figure 9. An overview of the graphical representation of logic models. (A) The SOP expression for the activation of C, summarized as an XOR gate. (B) SOP expressions describing the activation of C and D. (C) An example of a hypergraph representation, where the nodes are connected by hyperedges.

8 Materials and methods

81 CellNOptR

As mentioned in section 14 CellNOptR includes someadditional steps in pre-processing logic models beforesimulation and training to data The details of these stepscan be found in Saez-Rodriguez et al (2009) Briefly themodel is compressed by removing non-identifiable elementsThese include nodes on terminal branches that are not part ofthe experimental design (non-observables figure 1 p90RSKand CREB) nodes that are not affected by the inputs orperturbations (non-controllables) and additional nodes that canbe removed without affecting logic outcome during simulation(figure 1 Ras MEK 1 etc)

After this compression step, a superstructure of all possible hyperedges is created (figure 1, inset). This superstructure contains 'the space' of hyperedges that is optimized (through the removal of redundant hyperedges) by training to the experimental data. The training uses a genetic algorithm to search for logic models that minimize a bipartite function. This function includes the MSE between the simulation of the optimized logic model and the data, and a penalty term for model size. Depending on the formalism used (see the main text), the simulation and data may be at steady state (CellNOptR, CNORfuzzy) or all data points can be used (CNORdt). The resulting logic model is then a subset of the superstructure and contains only the hyperedges that best explain the experimental data (with the additional attribute of parsimony, given the size penalty in the optimization function).
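The two-part objective described above can be sketched in a few lines of Python. This is an illustration only, not the CellNOptR implementation: the function names and the value of `size_penalty` are hypothetical, and the penalty is assumed here to grow linearly with the number of hyperedges retained in the model.

```python
def mse(simulated, measured):
    """Mean squared error between two equal-length sequences of values."""
    return sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(measured)


def score(simulated, measured, n_hyperedges, size_penalty=0.01):
    """Bipartite objective: data mismatch plus a penalty on model size.

    size_penalty is a hypothetical weight chosen for illustration.
    """
    return mse(simulated, measured) + size_penalty * n_hyperedges


# A perfect fit with three hyperedges is still charged for its size.
perfect_fit = score([1, 0], [1, 0], n_hyperedges=3, size_penalty=0.1)
```

Under this form of the objective, an additional hyperedge is only retained if it lowers the MSE by more than the penalty it incurs, which is the source of the parsimony mentioned above.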

8.2. Network and data generation

The toy model was constructed manually and is based on the model from Saez-Rodriguez et al (2011a). The in silico data were generated from the toy model using CNORode. The parameters were manually adjusted to model as closely as possible the known dynamics of ERK and NFκB activation. After simulation, noise was added to each data point according to $N(\mu, \sigma^2)$, where $\mu = 0$ and $\sigma^2 = 0.05$. The data were then rescaled to the interval [0, 1]. Two methods of cross validation were also performed to demonstrate the robustness of CellNOptR (steady-state Boolean) to sparseness in the data (figure S8 (available from stacks.iop.org/PhysBio/9/045003/mmedia)).
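The noise and rescaling step can be sketched as follows. The text does not state how the rescaling was done, so min-max rescaling is an assumption here, as are the function name and the fixed seed; the variance of 0.05 is taken from the description above.

```python
import random


def add_noise_and_rescale(values, variance=0.05, seed=0):
    """Add zero-mean Gaussian noise, then rescale to [0, 1].

    Min-max rescaling is an assumption; the source only states that the
    data were rescaled to the interval [0, 1].
    """
    rng = random.Random(seed)
    sd = variance ** 0.5  # random.gauss expects a standard deviation
    noisy = [v + rng.gauss(0.0, sd) for v in values]
    lo, hi = min(noisy), max(noisy)
    if hi == lo:  # degenerate case: all values identical
        return [0.0 for _ in noisy]
    return [(v - lo) / (hi - lo) for v in noisy]


noisy_course = add_noise_and_rescale([0.0, 0.5, 1.0, 0.2])
```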

Model and data files, together with the corresponding R scripts, can currently be found at http://www.ebi.ac.uk/~aidanmac/public/logicModelingTutorial (password: tutorial).

8.3. Boolean logic

A Boolean model can be represented as follows:

(1) $N$ species $X_1, X_2, \ldots, X_N$, each represented by a variable $x_i$ taking values 0 or 1.

(2) For each species $X_i$, there is a subset of species $R_i = \{X_{i1}, X_{i2}, \ldots, X_{iN_i}\} \subset \{X_1, X_2, \ldots, X_N\}$ that influence $x_i$.

(3) For each species $X_i$, an update function $B_i : \{0, 1\}^{N_i} \to \{0, 1\}$.

From this set of rules, the state of each species at time $t + 1$ is a function of the state of its influencing species at time $t$ (Kauffman 1969).
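As a minimal sketch of this definition, the following Python fragment performs one synchronous update of a three-species network. The species names and the rules are hypothetical and serve only to show that every update function reads the state at time $t$ and writes the state at time $t + 1$.

```python
def step(state, rules):
    """One synchronous update: every species reads the state at time t."""
    return {species: int(rule(state)) for species, rule in rules.items()}


# Hypothetical three-species network: X1 and X2 hold their values,
# X3 is switched on only when X1 is on and X2 is off.
rules = {
    "X1": lambda s: s["X1"],
    "X2": lambda s: s["X2"],
    "X3": lambda s: s["X1"] and not s["X2"],
}

state_t = {"X1": 1, "X2": 0, "X3": 0}
state_t_plus_1 = step(state_t, rules)
```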

So how does the function $B_i$ (also called a transfer function) for each species $X_i$ deal with inputs from other nodes? $B_i$ can be represented in a sum-of-product (SOP) formulation (Mendelson 1970), which allows for multiple possible inputs (AND, NOT, OR gates) to be processed into a single output. To illustrate this, consider the following simple example (figure 9).

We know that the element D is activated by a combination of A and B (i.e. both A and B are needed for activation). Hence, both the graphical and written representations of this activation are relatively straightforward:

$$B_1(a, b) = a \land b$$

However, in the case of the activation of C, this occurs when A is active without B or when B is active without A. In this case, one needs some additional rules of representation.

The SOP representation allows the above activation to be written using only AND, NOT and OR operators:

$$B_1(a, b) = (a \land \lnot b) \lor (\lnot a \land b)$$

Figure 10. The construction of gates with cFL: (a) activating, (b) inhibitory, (c) an AND gate and (d) an OR gate.

This is done by calculating the product within brackets and summing between brackets. Essentially, SOP representations are rules of precedence for complex multi-node inputs. In terms of graphically representing the activation of C, its activation cannot be easily represented using standard SBGN AND, NOT or OR operators (figure 9). Hence, this SOP expression can be summarized as an XOR gate.
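The rule of taking the product within brackets and the sum between brackets can be sketched as a small evaluator. The encoding of a term as a list of (variable, required value) pairs is an assumption made for this illustration and is not the CellNOptR model format.

```python
def sop(terms, values):
    """Evaluate a sum-of-products expression.

    AND (product) within each term, OR (sum) between terms. Each term is a
    list of (variable, required_value) literals; required_value 0 means NOT.
    """
    return int(any(all(values[var] == req for var, req in term) for term in terms))


# Activation of C: (a AND NOT b) OR (NOT a AND b), i.e. an XOR gate.
xor_c = [[("a", 1), ("b", 0)], [("a", 0), ("b", 1)]]
# Activation of D: a AND b, a single product term.
and_d = [[("a", 1), ("b", 1)]]

truth_table_c = {(a, b): sop(xor_c, {"a": a, "b": b}) for a in (0, 1) for b in (0, 1)}
```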

A logic network where relations are encoded by SOP expressions can be represented as a hypergraph (Klamt et al 2006). A hypergraph is defined as a set of nodes connected by hyperedges, where a hyperedge is a generalization of an edge that can connect more than two nodes. This in turn can facilitate a more precise representation of biological knowledge (for example, where two proteins are necessary for the activation of a target).
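To make the hyperedge idea concrete, the sketch below enumerates the candidate hyperedges into a single target, using the MKK4 example from figure 1, where two inputs expand to three hyperedges. Treating every non-empty subset of inputs as one AND hyperedge, with several hyperedges into the same target combined as OR, is an assumption of this sketch; the function name is hypothetical and edge signs are ignored.

```python
from itertools import combinations


def expand_hyperedges(inputs, target):
    """List every non-empty subset of inputs as one candidate AND hyperedge."""
    edges = []
    for size in range(1, len(inputs) + 1):
        for combo in combinations(inputs, size):
            edges.append((combo, target))
    return edges


# Two inputs give three hyperedges: each input alone, and both together (AND).
mkk4_edges = expand_hyperedges(["Map3K1", "Map3K7"], "MKK4")
```

The number of candidates grows as $2^m - 1$ for $m$ inputs, which is one reason a size penalty during training is useful.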

8.4. Fuzzy logic

cFL defines the transfer function between nodes as a Hill function. Depending on the type of interaction (or logic gate; figure 10), this function can take different forms (Morris et al 2011a):

(a) If node C depends only on A, a normalized Hill function is used to calculate $c$, where $k$ and $n$ are the sensitivity coefficient and Hill coefficient, respectively:

$$c = \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

(b) An inhibitory relationship is represented as the above expression subtracted from 1:

$$c = 1 - \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

(c) For an AND gate, the minimum value of $c$ is used:

$$c = \min\left(\frac{(k_1^{n_2} + 1)\,a^{n_2}}{k_1^{n_2} + a^{n_2}},\ \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$

(d) For an OR gate, the maximum value is used:

$$c = \max\left(\frac{(k_1^{n_2} + 1)\,a^{n_2}}{k_1^{n_2} + a^{n_2}},\ \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$
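The four gate types can be sketched directly from the normalized Hill function. The parameter values below are hypothetical, and passing a separate (k, n) pair for each input of a two-input gate is an assumption of this sketch rather than a statement about CNORfuzzy.

```python
def hill(x, k, n):
    """Normalized Hill function: 0 at x = 0 and exactly 1 at x = 1."""
    return (k ** n + 1) * x ** n / (k ** n + x ** n)


def activate(a, k, n):
    return hill(a, k, n)


def inhibit(a, k, n):
    return 1 - hill(a, k, n)


def and_gate(a, b, params_a, params_b):
    """AND gate: the minimum of the two transfer function outputs."""
    return min(hill(a, *params_a), hill(b, *params_b))


def or_gate(a, b, params_a, params_b):
    """OR gate: the maximum of the two transfer function outputs."""
    return max(hill(a, *params_a), hill(b, *params_b))


p = (0.5, 3)  # hypothetical (k, n) used for both inputs
c_and = and_gate(1.0, 0.0, p, p)
c_or = or_gate(1.0, 0.0, p, p)
```

With inputs restricted to 0 and 1, the minimum and maximum reproduce the Boolean AND and OR gates, while intermediate inputs give graded outputs.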

8.5. Logic ODEs

As in the case of cFL, CNORode uses phenomenological transfer functions (i.e. non-mechanistic normalized Hill functions) to describe the dynamics of a node's state as a function of its inputs. Using the examples in figure 10 again, these functions can be described as follows:

(a) $$\dot{c} = \frac{1}{\tau}\,(B(a) - c)$$

where $\dot{c}$ is the development of $c$ over time and $B(a)$ is the normalized Hill function of the continuous variable $a$. This takes the form

$$B(a) = \frac{a^n / (k^n + a^n)}{1^n / (k^n + 1^n)}$$

($k$ and $n$ are again the sensitivity and Hill coefficients, respectively). $\tau$ can be interpreted as the maximum value of species $c$ (biologically, this could encompass degradation or other limiting factors) and there is an additional degradation term proportional to $c$.

(b) An inhibitory relationship is simply the above expression subtracted from 1:

$$\dot{c} = 1 - \frac{1}{\tau}\,(B(a) - c)$$

(c) The AND gates take the form

$$\dot{c} = \frac{1}{\tau}\,(B(a)B(b) - c)$$

(d) The OR gate notation is as follows:

$$\dot{c} = \frac{1}{\tau}\,(B(a)B(b) + B(a)[1 - B(b)] + B(b)[1 - B(a)] - c)$$

In the case of an AND gate, the product of $B(a)$ and $B(b)$ is taken, which maintains consistency in the output with the equivalent Boolean model (i.e. if a = 1 and b = 0, in both ODE and logic formalisms c = 0; similarly with an OR gate, if a = 1 and b = 0, in both ODE and logic formalisms c = 1). As in the case of cFL, normalized Hill functions can approximate commonly observed biochemical dynamics, such as linear, sigmoidal and step-like behavior.
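The consistency with the Boolean case can be checked numerically. The sketch below integrates the AND and OR gate equations with a simple forward Euler scheme; the integrator, the step size and the parameter values are assumptions chosen for illustration and are not those used by CNORode.

```python
def B(x, k=0.5, n=3):
    """Normalized Hill function used in the logic ODEs (B(0) = 0, B(1) = 1)."""
    return (x ** n / (k ** n + x ** n)) / (1 / (k ** n + 1))


def simulate(gate, a, b, tau=1.0, c0=0.0, dt=0.01, steps=2000):
    """Forward Euler integration of dc/dt = (drive - c) / tau for fixed inputs."""
    if gate == "AND":
        drive = B(a) * B(b)
    else:  # OR gate
        drive = B(a) * B(b) + B(a) * (1 - B(b)) + B(b) * (1 - B(a))
    c = c0
    for _ in range(steps):
        c += dt * (drive - c) / tau
    return c


c_and = simulate("AND", a=1.0, b=0.0)  # stays at 0, as in the Boolean AND gate
c_or = simulate("OR", a=1.0, b=0.0)    # relaxes toward 1, as in the Boolean OR gate
```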

Acknowledgments

The authors thank J Banga, J Egea, Inna Pertsovskaya and Melody Morris for valuable help and discussion. Funding was provided by the EU-7FP-BioPreDyn and EMBL-EIPOD programs.

References

Aislyn D W and Boran R I 2010 Systems approaches to polypharmacology and drug discovery Curr Opin Drug Discov Dev 13 297–309

Albert I et al 2008 Boolean network simulations for life scientists Source Code Biol Med 3 16

Aldridge B B et al 2006 Physicochemical modelling of cell signalling pathways Nature Cell Biol 8 1195–203

Assmann S M and Albert R 2009 Discrete dynamic modeling with asynchronous update, or how to model complex systems in the absence of quantitative information Methods Mol Biol 553 207–25

Bader G D, Cary M P and Sander C 2006 Pathguide: a pathway resource list Nucleic Acids Res 34 D504–6 (Database issue)

Banga J 2008 Optimization in computational systems biology BMC Syst Biol 2 47

Barabasi A-L and Oltvai Z N 2004 Network biology: understanding the cell's functional organization Nature Rev Genet 5 101–13

Bauer-Mehren A, Furlong L I and Sanz F 2009 Pathway databases and tools for their exploitation: benefits, current limitations and challenges Mol Syst Biol 5 290

Bossi A and Lehner B 2009 Tissue specificity and the human protein interaction network Mol Syst Biol 5 260

Calzone L et al 2008 A comprehensive modular map of molecular interactions in RB/E2F pathway Mol Syst Biol 4 173

Calzone L et al 2010 Mathematical modelling of cell-fate decision in response to death receptor engagement PLoS Comput Biol 6 e1000702

Chen W W, Niepel M and Sorger P K 2010 Classic and contemporary approaches to modeling biochemical reactions Genes Dev 24 1861–75


de Jong H 2002 Modeling and simulation of genetic regulatory systems: a literature review J Comput Biol 9 67–103

Di Cara A et al 2007 Dynamic simulation of regulatory networks using SQUAD BMC Bioinformatics 8 462

Eduati F et al 2010 A Boolean approach to linear prediction for signaling network modeling PLoS One 5 e12789

Egea J and Martí R 2010 An evolutionary method for complex-process optimization Comput Oper Res 37 315–24

Faure A et al 2006 Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle Bioinformatics 22 e124–31

Fisher J and Henzinger T A 2007 Executable cell biology Nature Biotechnol 25 1239–49

Garg A et al 2008 Synchronous versus asynchronous modeling of gene regulatory networks Bioinformatics 24 1917–25

Gonzalez A, Chaouiya C and Thieffry D 2008 Logical modelling of the role of the Hh pathway in the patterning of the Drosophila wing disc Bioinformatics 24 i234–40

Helikar T et al 2008 Emergent decision-making in biological signal transduction networks Proc Natl Acad Sci USA 105 1913–8

Hlavacek W S et al 2006 Rules for modeling signal-transduction systems Sci STKE 2006 re6

Hoffmann A et al 2002 The IkappaB–NF-kappaB signaling module: temporal control and selective gene activation Science 298 1241–5

Hucka M et al 2003 The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models Bioinformatics 19 524–31

Ilsley G R, Luscombe N M and Apweiler R 2009 Know your limits: assumptions, constraints and interpretation in systems biology Biochim Biophys Acta 1794 1280–7

Jørgensen C and Linding R 2010 Simplistic pathways or complex networks? Curr Opin Genet Dev 20 15–22

Junker B H, Klukas C and Schreiber F 2006 VANTED: a system for advanced data analysis and visualization in the context of biological networks BMC Bioinformatics 7 109

Kauffman S A 1969 Metabolic stability and epigenesis in randomly constructed genetic nets J Theor Biol 22 437–67

Khoury G A, Baliban R C and Floudas C A 2011 Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database Sci Rep 1 90

Kirouac D C et al 2012 Creating and analyzing pathway and protein interaction compendia for modelling signal transduction networks BMC Syst Biol 6 29

Klamt S, Saez-Rodriguez J and Gilles E D 2007 Structural and functional analysis of cellular networks with CellNetAnalyzer BMC Syst Biol 1 2

Klamt S et al 2006 A methodology for the structural and functional analysis of signaling and regulatory networks BMC Bioinformatics 7 56

Kreeger P K and Lauffenburger D A 2010 Cancer systems biology: a network modeling perspective Carcinogenesis 31 2–8

Krumsiek J et al 2010 Odefy - from discrete to continuous models BMC Bioinformatics 11 233

Liberali P, Ramo P and Pelkmans L 2008 Protein kinases: starting a molecular systems view of endocytosis Annu Rev Cell Dev Biol 24 501–23

Ma'ayan A et al 2005 Formation of regulatory patterns during signal propagation in a mammalian cellular network Science 309 1078–83

Mendelson E 1970 Boolean Algebra and Switching Circuits (Schaum's Outline Series) (New York: McGraw-Hill)

Mendoza L and Xenarios I 2006 A method for the generation of standardized qualitative dynamical systems of regulatory networks Theor Biol Med Modelling 3 13

Mitsos A et al 2009 Identifying drug effects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data PLoS Comput Biol 5 e1000591

Morris M K et al 2010 Logic-based models for the analysis of cell signaling networks Biochemistry 49 3216–24

Morris M K et al 2011a Training signaling pathway maps to biochemical data with constrained fuzzy logic: quantitative analysis of liver cell responses to inflammatory stimuli PLoS Comput Biol 7 e1001099

Morris M K et al 2011b Querying quantitative logic models (Q2LM) to study intracellular signaling networks and cell-cytokine interactions Biotechnol J 7 374–86

Novere N L et al 2009 The systems biology graphical notation Nature Biotechnol 27 735–41

Oda K et al 2005 A comprehensive pathway map of epidermal growth factor receptor signaling Mol Syst Biol 1 2005.0010

Patlak M 2010 Competitors try collaboration to speed drug development J Natl Cancer Inst 102 841–3

Pieroni E et al 2008 Protein networking: insights into global functional organization of proteomes Proteomics 8 799–816

Rangamani P and Iyengar R 2008 Modelling cellular signalling systems Essays Biochem 45 83–94

Saez-Rodriguez J et al 2009 Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction Mol Syst Biol 5 331

Saez-Rodriguez J, Alexopoulos L G and Stolovitzky G 2011a Setting the standards for signal transduction research Sci Signal 4 pe10

Saez-Rodriguez J et al 2011b Comparing signaling networks between normal and transformed hepatocytes using discrete logical models Cancer Res 71 5400–11

Sahin O et al 2009 Modeling ERBB receptor-regulated G1/S transition to find novel targets for de novo trastuzumab resistance BMC Syst Biol 3 1

Sasagawa S et al 2005 Prediction and validation of the distinct dynamics of transient and sustained ERK activation Nature Cell Biol 7 365–73

Schlatter R et al 2009 ON/OFF and beyond - a boolean model of apoptosis PLoS Comput Biol 5 e1000595

Shmulevich I et al 2003 Steady-state analysis of genetic regulatory networks modelled by probabilistic boolean networks Comp Funct Genomics 4 601–8

Terfve C and Saez-Rodriguez J 2012 Modeling signaling networks using high-throughput phospho-proteomics Adv Exp Med Biol 736 19–57

Thakar J and Albert R 2010 Boolean models of within-host immune interactions Curr Opin Microbiol 13 377–81

Vinayagam A et al 2011 A directed protein interaction network for investigating intracellular signal transduction Sci Signal 4 rs8

Watterson S, Marshall S and Ghazal P 2008 Logic models of pathway biology Drug Discov Today 13 447–56

Wittmann D M et al 2009 Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling BMC Syst Biol 3 98


  • 1. Introduction
    • 1.1. From parts to interactions
    • 1.2. From interactions to mechanism
    • 1.3. Logic modeling
    • 1.4. Software
    • 1.5. The example model
  • 2. Boolean steady state
    • 2.1. Steady-state optimization and simulation
    • 2.2. Interpretation of steady-state result
  • 3. Two time points (or additional steady state)
    • 3.1. Multiple steady-state optimization and simulation
    • 3.2. Interpretation
  • 4. Synchronous multiple time-point simulation and multiple timescales
    • 4.1. Synchronous and asynchronous updating
  • 5. Constrained fuzzy logic
    • 5.1. Model training and simulation
    • 5.2. Interpretation
  • 6. Logic ODEs
    • 6.1. Converting from Boolean to continuous
    • 6.2. Parameter estimation
    • 6.3. Compressing an ODE model
  • 7. Summary and future developments
  • 8. Materials and methods
    • 8.1. CellNOptR
    • 8.2. Network and data generation
    • 8.3. Boolean logic
    • 8.4. Fuzzy logic
    • 8.5. Logic ODEs
  • Acknowledgments
  • References


network or map that represents the biological interactions of the system of interest. So what are the requirements to step from this biological representation to a mathematical analysis and understanding of these networks? Graph theory can be used to analyze the topology of a network to understand the principles behind its design (Barabasi and Oltvai 2004). These networks can also be used as a scaffold for overlaying expression data to better understand the activation of sub-networks (Bossi and Lehner 2009). These analyses can provide a useful insight, but are not amenable to explaining how a signaling network responds to a defined set of perturbations (Saez-Rodriguez et al 2009). This aim can only be achieved via mechanistic and predictive computational modeling. By computational modeling, we mean, in the context of this paper, the construction of an in silico representation of a system (in this case a cell signaling network) that can be simulated through a set of programmable commands that mimic the functioning of the system over time (Terfve and Saez-Rodriguez 2012). Simulating over time (or dynamic modeling) consists of using functions to describe how each species' (or node's) state in a network changes as a function of its inputs.

There are many approaches to this type of modeling, and the choice of method is highly dependent on the quality and type of data available for the network, together with the accuracy of one's prior knowledge about its topology and interactions. We will briefly give an overview of some popular methods, but more detail about each approach can be found in a number of excellent reviews (e.g. Aldridge et al 2006, de Jong 2002).

Physicochemical modeling (modeling that includes biochemical and physical features of the system) is an approach that uses equations derived from physical and chemical theory to describe biological processes such as covalent binding, association and diffusion (Aldridge et al 2006). This is a popular and insightful type of model for signaling networks, and many examples can be found in repositories such as biomodels.net. These equations are built through a deep understanding of the underlying biochemistry and hence refer to distinct processes (such as catalysis and complex formation). The family of physicochemical model formalisms includes, among others, ordinary and partial differential equations (ODEs and PDEs), their stochastic variants and rule-based approaches. ODEs are the most common approach and can represent a signaling network through a set of coupled equations that describe the change in concentration of the elements (biomolecular species) of the network. ODEs are based on the assumption of mass action kinetics, a law that defines the rate of a reaction as being proportional to the concentrations of the reacting species (Chen et al 2010). This assumption can break down if there are spatial gradients for species or if concentrations of species are low enough that random fluctuations become a factor in the behavior of the system. In such cases, PDEs and stochastic formalisms are better suited to capture the biological behavior.

Another drawback to physicochemical modeling is the difficulty in managing and manipulating large networks, both in terms of the combinatorial complexity that such networks present (for example, the number of phosphorylation states of EGFR) and determination of the parameters of each equation, such as rate constants and initial conditions. Rule-based modeling allows easier manipulation and management of larger systems. Models are specified by a set of rules corresponding to the molecular interactions among protein domains, and these are then automatically converted into a model that describes all possible reactions and molecular configurations (see Hlavacek et al (2006) for an introduction to rule-based modeling).

In summary, physicochemically detailed modeling generally works well with small, detailed biochemical networks. In the absence of such criteria, a coarser-grained approach is necessary, and logic modeling can be viewed in this light.

1.3. Logic modeling

Unlike physicochemical modeling, logic modeling requires only a PSN as a starting point to simulate signaling processes. Although sparse in detail, such graphs are very insightful for understanding how the structure (or topology) determines the flow of information from input through to output (for example, ligand–receptor binding through to transcription factor activation (Ma'ayan et al 2005)). However, before questions can be addressed by simulation, the graph must be made computable by defining how each node state changes as its inputs change, so that input–output relationships can be quantified for the whole system.

Logic modeling uses transfer functions to describe the relationship between nodes in a graph (see section 8.3). Transfer functions are the mathematical representation of the relationship between inputs and outputs in a system. In physicochemical modeling, these are based on mass action kinetics and describe how the input species are transformed into output species by the chemical reaction. In logic modeling, transfer functions consist of logic operators (AND, OR, NOT) that describe how an output node is activated by its inputs. To illustrate this, we can use a simple case from the PI-3-Kinase-Akt signaling network that controls growth and division in mammalian cells. As part of this network, the kinase Akt is activated by the kinase PDK1 and the kinase complex mTOR. This would be represented in a PSN by two directed positive edges from PDK1 and mTOR to Akt. However, from this representation, it would not be known whether both kinases or either kinase alone is necessary for Akt activation. In logic modeling, this relationship can be represented using an AND operator that specifies the necessity of both kinases for Akt activation. Such an example also illustrates the strength and weakness of logic modeling: the reduction of complexity that enables modeling of large systems with incomplete information and fewer parameters, against less mechanistic detail and biochemical accuracy.
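The ambiguity that the AND operator resolves in the Akt example can be seen by tabulating both candidate transfer functions. The function names below are hypothetical; the two functions differ only when exactly one of the two kinases is active, which is therefore the condition an experiment would need to probe.

```python
from itertools import product


def akt_and(pdk1, mtor):
    """Akt requires both PDK1 and mTOR."""
    return int(pdk1 and mtor)


def akt_or(pdk1, mtor):
    """Akt requires either PDK1 or mTOR."""
    return int(pdk1 or mtor)


# Columns: PDK1, mTOR, Akt under the AND model, Akt under the OR model.
rows = [(p, m, akt_and(p, m), akt_or(p, m)) for p, m in product((0, 1), repeat=2)]
```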

The use of logic-based modeling of biological systems goes back more than 40 years, with the first model describing a gene regulatory network (Kauffman 1969). Since then, logic modeling has proved particularly useful in describing the effect of environmental inputs on cell phenotypes through networks of signal transduction. There are multiple studies using this type of modeling as a basis (Helikar et al 2008, Calzone et al 2010, Gonzalez et al 2008, Mendoza and Xenarios 2006, Schlatter et al 2009, Sahin et al 2009) (see also reviews: Morris et al 2010, Watterson et al 2008, Thakar and Albert 2010). The structure of a signal transduction network lends itself to logic modeling, with clearly defined input nodes (ligand–receptor combinations), measurable elements corresponding to activation (phosphorylated proteins downstream of the receptor) and relatively little knowledge of the biochemistry involved.

Having summarized how logic modeling formulates input/output relationships (see above and section 8.3), the next step is to consider the complexity of the logic modeling in terms of how state and time are treated. The state refers here to the value or quantity attached to each node (typically a protein) and reflects the activation of that node at any point during a simulation. State values are proportional to activation and can vary from 0 to some arbitrary or defined upper limit, depending on the type of logic modeling being undertaken. States can be defined as on/off (Boolean logic), multi-level or continuous, and we will discuss each of these in turn. When training logic models with experiments, these values are often compared to biochemical data. For instance, the phosphorylation of a protein is considered to be a proxy of its activation (e.g. phosphorylation at the phosphosite threonine-202 of extracellular signal-regulated kinase (ERK)).

Similarly, there are several approaches to handle time in a logic context, ranging from the simplest approximation of discrete or steady-state measurement to more biologically realistic continuous updating. These techniques will also be introduced in turn with the examples below.

1.4. Software

As a means to introduce the different methods and how they can be used to model different aspects of signal transduction, we will use the tool CellNOpt (www.cellnopt.org). CellNOpt is a software package that trains the topology of a PSN to experimental data by the criterion of minimizing the error between the data and the logic model created from the PSN. In CellNOpt, the starting network based on prior knowledge is called the prior knowledge network (PKN) (a name we will use in the rest of this paper). This PKN is preprocessed before training by compression and expansion (see materials and methods, section 8.1). The compression step of CellNOpt is a method of reducing the complexity of a logic model by removing nodes that have no effect on the outcome of simulation. The expansion step subsequently includes all possible hyperedges (materials and methods, section 8.1) in the model. The model is trained by minimizing a bipartite function that calculates the mismatch between the logic model and experimental data (mean squared error (MSE)) while penalizing model size. This minimization can be solved using different strategies, from simple enumeration of options for small cases to stochastic optimization algorithms such as genetic algorithms (Saez-Rodriguez et al 2009) or integer linear programming (Mitsos et al 2009).

The R version (CellNOptR) is available on Bioconductor and has a number of added features that allow the user to run different variations of logic modeling within the same framework of model calibration. These variations include steady state to discrete time Boolean modeling, fuzzy logic and logic ODEs, all of which will be discussed in turn below. We will also refer to other software packages that have contrasting or complementary approaches to CellNOptR.

For the remainder of the tutorial, different logic formalisms will be introduced and explained with the aid of CellNOptR, and the assumptions, strengths and weaknesses of each formalism with regard to training to data will be illustrated with a 'toy model' of signal transduction.

1.5. The example model

To illustrate the variety of logic modeling approaches, we will use an imaginary but biologically plausible PKN (figure 1). This network includes a subset of intracellular signaling networks known to be activated downstream of EGF and TNFα stimulation and was derived from a larger network presented in Saez-Rodriguez et al (2009). In brief, the PKN includes three MAPK cascades (ERK, p38 and JNK1), the PI3K/Akt/GSK-3 pro-survival pathway and the IKK/IκB/NFκB pathway. It consists of 30 nodes and 33 edges.

From this PKN, we derived a model (the data-generating model) that was used to simulate experimental data (section 8.1 and table S1/figure S1 (available from stacks.iop.org/PhysBio/9/045003/mmedia)). The in silico data replicate biologically plausible behavior that has been seen in such networks, such as the transient behavior of ERK activation (Sasagawa et al 2005) and the oscillatory dynamics of NFκB translocation from the cytoplasm to the nucleus (Hoffmann et al 2002) (figure 2). These in silico data consist of ten 'experiments', which vary according to different combinations of stimulation and inhibition (inhibition is achieved by blocking the activity of two specific kinases (proteins), PI3K and Raf-1, with small-molecule inhibitors), and 16 observations at 2 min intervals from t = 0. Inhibition is used in such experiments to further understand the combinations of upstream events that contribute to the activation of a particular protein. The readouts chosen are well-established downstream events of EGF/TNFα stimulation. The experiment represents an ideal situation with multiple time-point sampling. However, as we will discuss later, with fewer measurements one can capture most (but not necessarily all) of the dynamics of the system. The values are between 0 and 1, and Gaussian noise was also added to the output to imitate inherent biological noise and the measurement error (see materials and methods, section 8.2). The PKN has the following important properties:

(i) It does not specify which input or combination of inputs activates a particular node (for example, both Map3K1 and Map3K7 activate MKK4, but the PKN does not specify whether this is an AND or OR relationship; figure 1).

Figure 1. The prior knowledge network (PKN) used as the starting point for the toy model, depicted with VANTED (Junker et al 2006) in an SBGN activity flow format (Novere et al 2009). The experimental design is annotated as colored nodes: inputs are shown in green, inhibited nodes in red and readouts in blue (Raf-1 is annotated as both red and blue, as it is inhibited and measured). Nodes that were compressed (see materials and methods, section 8.1, and figures S2 and S3 (available from stacks.iop.org/PhysBio/9/045003/mmedia)) have a dashed outline. The data-generating model contained hyperedges from TRAF2 to MKK7 (dashed edges in figure) that were removed for the PKN to demonstrate how incomplete prior knowledge can affect fitting the data. The input to MKK4 is highlighted to demonstrate the concept of a hypergraph: MKK4 has two inputs, Map3K1 and Map3K7. However, both AND and OR can be necessary for MKK activation. This is represented by the two inputs expanding to three hyperedges (inset and supplementary figure S4 (available from stacks.iop.org/PhysBio/9/045003/mmedia)). The goal of CellNOptR is to find the subset of all possible hyperedges (logic gates) in the PKN that best explains the data.

(ii) It includes additional interactions (TNFR → PI3K, PI3K → Rac, Rac → Map3K1) not present in the data-generating model.

(iii) It is missing interactions (TRAF2 → ASK-1, ASK-1 → Map3K7) that are present in the data-generating model and are necessary to fully explain the in silico data.

The purpose of these gaps and errors in our 'prior knowledge' is to demonstrate the ability of CellNOptR to train context-specific models from unspecific prior knowledge and also to demonstrate the limitations of such an approach when information is incomplete. We will also demonstrate how CellNOptR performs when trying to find the true network topology and model parameters by using the different logic model formalisms to simulate the 'experimental data', and hence demonstrate the strengths, weaknesses and underlying assumptions of each of the logic model formalisms in turn. The network has been designed such that the features uncovered by the logic formalisms are not confounded by the missing interactions.

2. Boolean steady state

In arguably the simplest case of data, an experimental design looking at a particular signal transduction network will consist of a set of measurements representing the phosphorylation state of a subset of proteins in the signaling network. These measurements will be taken before the addition of a stimulus or stimuli and at a single time point after stimulation (t = 0 and t = t1). Additionally, the effects of multiple conditions (inhibitions, perturbations) may also be examined with this design. This is a common approach when studying signal transduction, which has classically been used via low-throughput methods and has more recently been scaled up owing to new technological developments (Terfve and Saez-Rodriguez 2012).

[Figure 2 panels: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh; stimuli egf and tnfa; inhibitors pi3ki and raf1i; time axis 0–30 min]

Figure 2. The in silico data generated to test each logic modeling formalism. The data were generated by a logic model (the data-generating model). Each row of the figure represents an experiment with a certain combination of stimuli and inhibitors (shown in the final two columns; black is ON, white is OFF). The simulated data are shown as a continuous black line. Gaussian noise was added (section 1.5 and materials and methods, section 8.2) and the data were 'sampled' at 16 equally spaced time points between 0 and 30 min to simulate a fine-grained time course experimental design.

Choosing a single time point after stimulation leads to a simple design and minimizes the cost per experiment. However, it then becomes critical to choose an appropriate time t1 (see figures S5–7 (available from stacks.iop.org/PhysBio/9/045003/mmedia)). Ideally, one should perform a set of detailed time course experiments that encapsulates the variation in activation in the system, but this is usually not viable in terms of cost and time constraints. It may be only possible to perform a detailed time course experiment for a single phosphoprotein. From this, a time point can be chosen that is characteristic of the activation of the phosphoproteins of interest. Typically in signal transduction, a fast wave of activation occurs over a short timescale after stimulation. This is followed by slower, later mechanisms that often down-regulate the signals over a longer timescale (e.g. degradation, internalization, etc).

Returning to our example, the measurement of phosphorylated ERK could be viewed as a sensible output with which a time course can be obtained. (Its activation would be representative of the dynamics of the MAPK cascade, and it is technically a good choice because of the quality of ERK phosphosite-specific antibodies.) From this time course, we would see that two different timescales seem to exist: an early activation phase followed by a late phase. Thus, characteristic time points can be chosen (figure 2), and a reasonable early time point would be in the range 4–12 min. For argument's sake, we will choose t1 = 10 min.

We can see from the data the difficulty of defining a characteristic time point, and how choosing a single time point may affect the ability to capture all dynamic features. For example, it is impossible to understand the oscillatory nature of NFκB translocation with a single time point, and ERK activation dynamics can only be partly representative of other phosphoprotein dynamics (even those closely related in function, such as Raf1). For the oscillations of NFκB, one would need to sample at least every 2.5 min (since the period of the oscillation is 5 min), while to obtain an approximate sense of the transient activation of ERK, two well-chosen time points can be enough. In spite of this, steady-state measurement can give a qualitative overview of the system that allows for robust, albeit coarse-grained, conclusions with relatively few data points (and thus low cost).
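The sampling argument above can be illustrated with a short sketch. It assumes an idealized unit-amplitude sinusoid with a 5 min period as a stand-in for the NFκB signal; the function name `sample` and the chosen phase are illustrative and not part of the original analysis.

```python
import math

def sample(period_min, step_min, t_end=30.0, phase=math.pi / 2):
    """Sample a unit-amplitude oscillation every step_min minutes up to t_end."""
    n = int(round(t_end / step_min)) + 1
    return [
        round(math.sin(2 * math.pi * (i * step_min) / period_min + phase), 6)
        for i in range(n)
    ]

# Two samples per period: the alternation between peaks and troughs is visible.
fine = sample(period_min=5.0, step_min=2.5)

# One sample every two periods: every sample lands on a peak, so the
# oscillation looks like a constant signal.
coarse = sample(period_min=5.0, step_min=10.0)

print(fine[:4])
print(coarse)
```

The phase is chosen here so that samples fall on peaks and troughs. At exactly two samples per period, an unlucky phase could place every sample on a zero crossing, which is why 2.5 min should be read as an upper bound on the sampling interval rather than a comfortable choice.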

2.1. Steady-state optimization and simulation

One way to measure a model's ability to fit experimental data with a single time point, such as that described above, is to make the assumption that the system reaches, at that point, a

Figure 3. The fit of the trained model using the Boolean steady-state formalism. The simulated data are shown as two blue circles (t0 and t1) connected by a blue dotted line. The colors represent the goodness of fit between the model and the data at t1 = 10. Heat-map coloration is used to signify the range from high error (red, normalized mean squared error (MSE) = 1) to no error (white, MSE = 0). t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CellNOptR took 180 s. (Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh; x-axis ticks at 0, 10 and 30 min.)

pseudo-steady state: the fast reactions have already occurred, while the slow reactions have not yet significantly affected the network's behavior (Klamt et al 2006). This approximation implies that the flux through the system (in our case, the phosphorylation cascade in signal transduction) has stabilized and the quantities of phosphorylated proteins are no longer varying to a significant degree. With this assumption, a model of this system can be simulated until it has also reached a steady state.

With the in silico data (figure 2) as our starting point, the PKN (figure 1) was trained using the steady-state model formalism at t1 = 10 min. Details about the node states and transfer functions of this formalism (Boolean steady state) are summarized in section 8.3. Figure 3 shows the steady-state simulation overlaid on the experimental data.

2.2. Interpretation of steady-state result

The Boolean steady-state formalism used by CellNOptR for optimization recovers most of the underlying 'true' network and hence gives a good steady-state approximation of the in silico data (see figures 3 and 8). However, there are some exceptions that highlight the limitations of steady-state measurements. Using this formalism, CellNOptR cannot identify the NFκB oscillations caused by feedback: hyperedges that cause negative feedback are penalized in CellNOptR, as a steady state cannot be reached when they are present. Another limitation is that the state of each element in the model is limited to 0/1 (either switched on or off). Hence, intermediate levels of activation cannot be simulated (such as p38 activation under TNFα and EGF stimulation). Finally, the effect of the missing pathway from TNFα to AP1 is visible: the experimental measurement under TNFα stimulation in the absence of EGF stimulation cannot be explained by the model.

Thus, the usefulness of steady-state Boolean logic depends strongly on the assumptions underlying the data. If one has enough knowledge of the data and biochemistry such that the assumption of steady state is a fair one to make, training a network to data using steady-state Boolean logic modeling can uncover cell-specific behavior, for example, differences between cancer and normal cells (Saez-Rodriguez et al 2011a). Another advantage is the scalability of such an approach: because the method is parameter-free, large networks can be trained under a large number of conditions.

3. Two time points (or additional steady state)

As mentioned in section 2, it is quite common in signaling networks to observe a transient behavior where a species is quickly activated and subsequently deactivated. Such a dynamic obviously cannot be captured with a steady-state approach where only one time point is considered. Therefore,

Figure 4. The fit of the model at two time points, t1 and t2, using the two steady-state approach. Again, the colors are representative of the fit, this time at t1 = 10 and t2 = 30. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CellNOptR took 240 s. (Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh; x-axis ticks at 0, 10 and 30 min.)

in the above section, this issue was avoided by only modeling 'fast events', i.e. the activation phase of the signal propagation. However, when information about more than one time point is available, and such a fast activation followed by slow deactivation (or indeed any combination of slower and faster processes) is observed, then it is possible to also capture these processes while keeping the simplifying assumption of steady states. In essence, it is assumed that multiple pseudo-steady states reflect the mechanisms that are acting at different timescales, and they can be optimized independently. We will illustrate this with the CellNOptR implementation for two timescales, but the approach is extendable to more than two time points.

Defining suitable time points that adequately represent the process timescales that we want to model is a similar problem to what was discussed above for a unique steady state, with the added complexity of having to choose more than one point that is consistent for all modeled species. This can be guided by prior knowledge, e.g. if it is known that a receptor is activated on a fast timescale (e.g. 30 min for full activation) by phosphorylation and then deactivated by slow internalization and degradation (e.g. 2 h for full silencing of the signal). However, in general, it is better to develop a detailed time course, as stated above. In our case, again using ERK, we would say that a second measurement at 20–30 min would be adequate; 30 min was used for the sake of argument (see figure 4).

3.1. Multiple steady-state optimization and simulation

In CellNOptR, a model of a system with two steady states at different timescales is simulated by assuming that a subset of the hyperedges (interactions) only become active at a later time point, that is, they operate on a different timescale (Klamt et al 2006). That being the case, the two time points can be optimized separately. In practice, this means that the optimization is done in two steps:

(i) The scaffold model (the model after compression of non-essential nodes and expansion of all possible hyperedges; see figure 1 and materials and methods, section 8.1) derived from the PKN is used to train the model against the data at t1, thereby identifying hyperedges that best reproduce the data at this time point.

(ii) Hyperedges that were not selected as active at t1 are used as the search space for training the model at t2. For simulation (and therefore testing the model fit), candidate models are tested by using the steady state of the t1 model as an initial state, then computing the steady state from there, including candidate t2 hyperedges. There is also the additional constraint that whenever hyperedges at t1 and t2 influence a node in contradicting ways, the t2 hyperedge overrules the t1 hyperedge, and the state of the target node is locked to the state defined by the t2 hyperedge.

Besides the additional constraint of the overriding hyperedges described above, the node states and transfer functions are calculated in the same way as in the Boolean steady-state formalism (section 8.3).
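The two-step simulation described above can be sketched on a three-node cascade with a late negative feedback. This is a minimal illustration under stated assumptions, not the CellNOptR implementation: the node names, the rule dictionaries and the helper `steady_state` are hypothetical, and every t2 hyperedge is simply treated as overriding (its target is locked), which is a simplification of the rule given in step (ii).

```python
def steady_state(state, rules, locked=(), max_steps=50):
    """Update all unlocked nodes synchronously until nothing changes.

    Returns the fixed point, or None if none is reached within max_steps.
    """
    state = dict(state)
    for _ in range(max_steps):
        new = dict(state)
        for node, rule in rules.items():
            if node not in locked:
                new[node] = rule(state)
        if new == state:
            return state
        state = new
    return None

# Fast (t1) hyperedges: EGF -> SOS -> Raf -> ERK.
t1_rules = {
    "SOS": lambda s: s["EGF"],
    "Raf": lambda s: s["SOS"],
    "ERK": lambda s: s["Raf"],
}
start = {"EGF": 1, "SOS": 0, "Raf": 0, "ERK": 0}

# Step (i): steady state reached with the fast hyperedges only.
ss_t1 = steady_state(start, t1_rules)

# Putting the negative feedback on the same timescale gives an oscillation,
# so no steady state exists.
naive_rules = dict(t1_rules, SOS=lambda s: int(s["EGF"] and not s["ERK"]))
ss_naive = steady_state(start, naive_rules)

# Step (ii): start from the t1 steady state, apply the slow (t2) hyperedge
# ERK -| SOS, lock its target, then relax the remaining nodes.
t2_edges = {"SOS": lambda s: int(not s["ERK"])}
state = dict(ss_t1)
locked = set()
for node, rule in t2_edges.items():
    state[node] = rule(ss_t1)
    locked.add(node)
ss_t2 = steady_state(state, t1_rules, locked=locked)

print(ss_t1, ss_naive, ss_t2)
```

Locking the target of the t2 hyperedge is what prevents the feedback loop from oscillating and lets a second steady state be computed.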

3.2. Interpretation

In our example, we can see that the two steady-state optimization finds the feedback from ERK back to SOS-1 (figure 8). Hence, from figure 4, the transient activation of Raf1, ERK and AP1 is captured in the trained model. Using a single characteristic time point, a model that includes the negative feedback from ERK to SOS-1 at t1 would not be selected, as the branch never reaches a stable steady state because of oscillation. However, if we say that the branch is active at t1 but that the negative feedback is only active at t2, and that when active this negative feedback permanently turns SOS-1 off, then the model does reach a steady state at t1 (where SOS-1, Raf-1 and ERK are all ON) and a different steady state at t2 (where SOS-1, Raf-1 and ERK are all OFF as a result of the activated negative feedback).

4. Synchronous multiple time-point simulation and multiple timescales

As discussed in section 3, by measuring at two characteristic time points, the trained logic model is capable of finding the slow negative feedback from ERK to SOS-1 and therefore moves a step closer to understanding the 'true' network. However, the oscillations of NFκB still cannot be explained with the pseudo-steady-state formalism, as it is necessary to use the full time course data (and not just two time points) to observe this effect. This can be modeled by a discrete time Boolean model that is available as an add-on R package to CellNOptR: CNORdt (discrete time).

4.1. Synchronous and asynchronous updating

CNORdt introduces some variation in how time is handled in the model. Instead of simulating and fitting data at steady states, it is capable of fitting time course data by using an additional model parameter together with a synchronous updating scheme.

Synchronous updating is where all nodes are updated simultaneously during model simulation; hence each node at time t is a function of its input nodes at t − 1 (see section 8.3). This is the updating scheme used in CellNOptR. An alternative method is asynchronous updating, where nodes are updated in a random or non-synchronous order, depending on the asynchronous method used. This leads to different simulation properties depending on the updating method chosen. Synchronous updates are deterministic, and simulations run under the same conditions (inputs and perturbations) will reach the same steady state (or attractor) each time. In contrast, asynchronous updating introduces stochasticity into the system, such that different steady states can be reached from the same starting conditions. The random updating of node values is one possible application of asynchronicity. This enables sampling over all timescales (any reaction can be deemed to be slowest or fastest), thus avoiding the constraint inherent in synchronous simulations of an equal timescale over all reactions. However, this added complexity can make results difficult to interpret (Garg et al 2008). Mixed synchronous/asynchronous updating is an intermediate approach that can stratify reaction groups according to their known reaction rates, thus taking advantage of a priori knowledge and reducing the complexity of a fully asynchronous approach (Faure et al 2006, Albert et al 2008, Assmann and Albert 2009, Garg et al 2008).
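The contrast between the two updating schemes can be seen on a two-node toy network in which A and B inhibit each other. The sketch below is illustrative only: the network, `sync_step` and `async_run` are hypothetical names, and the asynchronous scheme shown (one randomly chosen node per step) is just one of several possible asynchronous methods.

```python
import random

# Mutual inhibition: each node is ON when the other is OFF.
RULES = {
    "A": lambda s: int(not s["B"]),
    "B": lambda s: int(not s["A"]),
}

def sync_step(state):
    """Update every node at once from the previous state."""
    return {node: rule(state) for node, rule in RULES.items()}

def async_run(state, rng, steps=20):
    """Update one randomly chosen node per step."""
    state = dict(state)
    for _ in range(steps):
        node = rng.choice(sorted(RULES))
        state[node] = RULES[node](state)
    return state

start = {"A": 0, "B": 0}

# Synchronous: deterministic, here a two-state cycle (0,0) -> (1,1) -> (0,0).
trajectory = [start]
for _ in range(4):
    trajectory.append(sync_step(trajectory[-1]))

# Asynchronous: the first node to be updated wins, so the end state depends
# on the random order even though the start state is identical.
outcomes = {
    tuple(sorted(async_run(start, random.Random(seed)).items()))
    for seed in range(20)
}

print(trajectory)
print(outcomes)
```

From the same start state, the synchronous run always follows the same cycle, whereas the asynchronous runs settle into one of two fixed points, (A ON, B OFF) or (A OFF, B ON).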

CNORdt introduces a scaling parameter that defines the timescale of the Boolean synchronous simulation. Where each 'tick' (t) (or simulation step) is the synchronous updating of all nodes in the model according to their inputs at t − 1, the scaling parameter defines the 'tick' frequency relative to the timescale of the real data. Although this is a crude approach (i.e. it implies a single rate across all reactions), it allows us to fit a synchronous Boolean simulation to data. Hence, all data points can be fitted to the model, and hyperedges that cause feedback in the model can be included, which allows the model to reveal more complex dynamics, such as oscillations. CNORdt still describes the node states as either on or off (1/0), and the transfer functions are calculated as in section 8.3. The scaling parameter is applied to the simulation of the system and hence does not affect the transfer functions themselves.
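The role of a scaling parameter can be sketched as a mapping from real time to tick index. The following is an assumption-laden illustration and not CNORdt's actual code: the lookup rule (take the most recent tick), the toy trajectory and all function names are hypothetical.

```python
def toggle_trajectory(n_ticks, start=1):
    """State of a node with negative self-feedback: it flips at every tick."""
    return [(start + i) % 2 for i in range(n_ticks + 1)]

def model_at(times_min, ticks_per_min, trajectory):
    """Read the Boolean trajectory at real times (minutes) via tick indices."""
    last = len(trajectory) - 1
    return [
        trajectory[min(int(t * ticks_per_min + 1e-9), last)]
        for t in times_min
    ]

def mse(pred, obs):
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)

# Hypothetical measurements of an oscillating species (period 5 min).
times = [0.0, 2.5, 5.0, 7.5, 10.0]
data = [1, 0, 1, 0, 1]

trajectory = toggle_trajectory(20)

# Try a few candidate tick frequencies and keep the one with the lowest error.
scores = {s: mse(model_at(times, s, trajectory), data) for s in (0.2, 0.4, 0.8)}
best = min(scores, key=scores.get)

print(scores, best)
```

Only one tick frequency aligns the Boolean flips with the measured oscillation; too slow or too fast a clock yields a model output that disagrees with the data even though the network logic is unchanged.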

Figures 5 and 8 show how the NFκB oscillations can be predicted by fitting a dynamic logic model to the full time course and maintaining the two steady-state assumptions from section 3, i.e. simulating 'fast' reactions from t = 0 to t = 10 and 'slow' reactions from t = 10 to t = 30.

5. Constrained fuzzy logic

One of the main limitations of Boolean logic models is that the assumption of a single level of activation (species can only be on/off) is biochemically unrealistic. Fuzzy logic is another logic modeling formalism that allows for intermediate levels of activation. It was originally developed in the field of control theory for predicting the outputs of complex processes where inputs could only partially be characterized (Morris et al 2011a). Its strength lies in the flexibility it affords when defining relationships between input and output nodes. This flexibility can also be a weakness if a large number of parameters are required to define these functional relationships. Constrained fuzzy logic (cFL) deals with this potential complexity by limiting the repertoire of relationships between nodes. The cFL formalism used in CellNOpt (CNORfuzzy) is fully described in Morris et al (2011a). Briefly, the relationships (or transfer functions) between nodes in cFL are limited to Hill functions. Hence, each transfer function has two free parameters: the Hill coefficient n, which controls the steepness of the function, and the sensitivity parameter k, which determines the midpoint of the function (i.e. the value of the input that produces half the maximal output). By varying these two parameters, linear, sigmoidal and step-like dynamics can be produced that are good approximations to protein–protein interactions and enzymatic reactions. In CNORfuzzy, further constraints are imposed by initially limiting the possible parameter

Figure 5. The fit of the model at multiple time points using fast (t = 0 to t = 10) and slow (from t = 10 to t = 30) timescales. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORdt took 300 s. (Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh; x-axis ticks at 0, 10 and 30 min.)

combinations to a subset of discrete values. Details of the transfer functions used can be found in materials and methods, section 8.4.

5.1. Model training and simulation

Model training and simulation in CNORfuzzy are carried out in a similar manner to the Boolean steady-state formalism. After compression and expansion of the logic hypergraph, a genetic algorithm determines transfer functions and a network topology that minimize the MSE between the model and the data at steady state. This is followed by a number of refinement steps that fine-tune the Hill function parameters and reduce the complexity of the network topology. The in silico data and model fit at t1 = 10 are shown in figure 6.

5.2. Interpretation

CNORfuzzy is capable of fitting intermediate values (figure 6). For most cases, the cFL model generates similar fits to the steady-state Boolean model. However, the fit to data is more accurate, since the values are continuous and not limited to 0 or 1. More importantly, the cFL model obtains a better fit for p38, as it uncovers a link in the structure that Boolean models are unable to capture. In the 'true' network, TNFα and EGF are both required to activate p38 (albeit the activation is low relative to the other signals). In the previous Boolean formalisms, this low activation of p38 cannot be modeled, as the simulation can only take the values 0/1. However, CNORfuzzy is capable of adding the hyperedge 'Map3K1 AND Map3K7 → MKK4' (figure 8) to explain this activation, and hence moves a step closer to finding the underlying true network.

The CNORfuzzy model fit also illustrates some caveats associated with fuzzy logic. We can see that CNORfuzzy also retains the Map3K7 → p38 hyperedge (figure 8), thus activating p38 with TNFα stimulation alone (i.e. in the absence of EGF stimulation). This occurs as CNORfuzzy attempts to fit the noisy signal of inactive p38, thus adding a hyperedge that is not present. CNORfuzzy also adds hyperedges from TNFα to AP1 that convey a weak activating signal to compensate for the missing hyperedges (TRAF2 → ASK-1, ASK-1 → Map3K7) from the PKN (figure 1). These examples illustrate the sensitivity of the cFL approach to the data quality, and this can make interpretation of the results more subtle and difficult (Morris et al 2011a).

6. Logic ODEs

The Boolean logic formalisms described above can qualitatively fit the network topology and logic gates that best describe the underlying data. cFL can add quantitative information by its ability to fit intermediate values between 0 and 1 at steady state. In terms of time, however, all these formalisms rely on discrete simulations. To obtain a fully continuous model, both in state and time, CNORode adds to

Figure 6. The fit of the trained model at t1 = 10 using constrained fuzzy logic. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORfuzzy took 1200 s. (Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh; x-axis ticks at 0, 10 and 30 min.)

these methods by transforming a discrete logic model to a continuous model. It does this by defining a set of ODEs for each model species. There are several formalisms to convert discrete logic to continuous models (e.g. SQUAD (Di Cara et al 2007)) or hybrid models (e.g. piecewise linear models (de Jong 2002)). CellNOpt includes the method developed by Wittmann et al (2009) that was implemented in Matlab as Odefy (Krumsiek et al 2010).

6.1. Converting from Boolean to continuous

The approach used to convert Boolean to continuous models is fully explained in Wittmann et al (2009). Briefly, the goal is to simulate the full dynamics of each species in the logic model while retaining consistency with the Boolean representation. What this means is that where the output of a logic gate is 0 or 1, the ODEs replacing a Boolean state should also return to 0 or 1. This is achieved in a similar manner to cFL (but with an additional parameter τ) by applying a normalized Hill function between the intervals 0 and 1. Applying these functions to each hyperedge defines a new continuous ODE model to replace the underlying Boolean model. This is more fully explained in section 8.5.

6.2. Parameter estimation

CNORode currently provides links to two stochastic non-local optimization algorithms: a genetic algorithm (genalg package, http://cran.r-project.org/web/packages/genalg) and an implementation in R of scatter search (Egea and Martí 2010). These are used to fit the Hill function parameters k and n and the ODE parameter τ to each logic gate in a model that has been already topologically optimized by one or more of the other formalisms.

6.3. Compressing an ODE model

Compression of the model before training may lead to the loss of elements important to capture dynamic features and must thus be done with caution. Returning to our example (figure 2), the in silico data were generated through a set of normalized Hill functions. Hence, with the exception of AP1 (where the missing hyperedge prohibits any exact simulation of this signal), CNORode should be capable of simulating exactly the other signals in the system after parameter optimization of the associated logic ODEs. However, this may not be possible when the model is compressed. To give an example, in our toy model (figure 1), the pathway consisting of SOS-1, Ras, Raf-1, MEK 1 and ERK is compressed to SOS-1 → Raf-1 → ERK. The in silico data were generated with ODEs describing the uncompressed interactions. We can see from figure 7 that the compressed model can accurately simulate the in silico data for this pathway (Raf-1 and ERK signals). In this case, the normalized Hill functions have enough dynamic plasticity to summarize four interactions (SOS-1 → Ras → Raf-1 → MEK 1 → ERK) as two (SOS-1 → Raf-1 → ERK).

Figure 7. The fit of the trained model using CNORode. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The parameter training in CNORode took 2000 s. (Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh; x-axis ticks at 0, 10 and 30 min.)

However, this is not the case where we have feedback from ERK through a phosphatase (ph) back to SOS-1, and from NFκB through expression (ex) back to IκB. In these cases, it is necessary to not compress 'ph' and 'ex' to allow CNORode to model the correct dynamics (transience and oscillations, respectively). The non-compression is required as 'ph' and 'ex' are integral to the dynamics observed in the in silico data. So, figures 7 and 8 show that, with the exception of AP1, CNORode can accurately model the in silico data of the toy model once compression of those key nodes is suppressed.

7. Summary and future developments

In this contribution, we have reviewed different logic-based approaches to model signal transduction networks. Recent developments in proteomics techniques, from antibody-based methods (xMAP, protein arrays, high-throughput microscopy, etc.) to mass spectrometry methods (Terfve and Saez-Rodriguez 2012), allow us to generate a large amount of phosphoproteomic data. Given the size of the underlying networks, we believe that logic-based models, which do not need extensive biochemical detail and thus lead to tractable models even when dealing with multiple pathways, are a useful approach to analyzing signal transduction on a large scale. Therefore, we have focused our work on how to train logic models to experimental data and implemented various methodologies toward this end in our tool CellNOptR.

Our recent developments presented here expand our previous work by including strategies to deal with the inherent dynamic nature of signaling processes (and hence with time series data). We have discussed how modeling dynamic aspects requires more detailed formalisms (and thus, in general, more data and computational time) and how the general methodology has to be re-evaluated at multiple levels, in particular the compression of the network prior to the optimization; hence, we are currently working to develop a general compression routine for dynamic models. Another area of active development is the implementation of efficient optimization strategies to identify both structure and (if existing) continuous parameters (Banga 2008). Although we have covered here a broad palette of logic-based formalisms, we plan to explore other approaches. Some are combinations of what we have discussed (e.g. a cFL formalism simulated over multiple timescales), others are formalisms related to those used here (e.g. SQUAD), and others could add new features such as a probabilistic framework (Shmulevich et al 2003), stochasticity (Albert et al 2008) or formal methodologies (Fisher and Henzinger 2007).

For the sake of simplicity, we have used a toy model that is itself based on a logic formalism to exemplify the potential dynamic behavior and thereby different modeling variants. We are currently working on more realistic benchmarks based on biochemical models and studying in more detail the role of experimental noise and experimental design in recovering the underlying model structure.

Figure 8. The contribution of each logic modeling formalism to the understanding of the model used to simulate the in silico training data. The time taken for training the model using each formalism is also shown.

As illustrated in our example with the link TRAF2 → ASK-1 → MKK7, databases are comprehensive but not complete, and it is therefore likely that important links are missing from the system of interest (Kirouac et al 2012). To overcome this limitation, we are working on strategies to integrate as many network resources as possible. These include methods that propose novel links that expand the prior knowledge network (Saez-Rodriguez et al 2009, Eduati et al 2010) and the use of information from PINs (Vinayagam et al 2011).

The focus of CellNOptR is the calibration of logic models to data, but a large set of other tools exist that analyze logic models from different angles (Morris et al 2010). For example, the Q2LM toolbox (Morris et al 2011b) uses cFL to understand the effect of perturbations in the context of the whole system under investigation (e.g. under what set of stimuli is a therapeutic perturbation most effective?). CellNetAnalyzer (Klamt et al 2007) has a battery of methods from graph theory as well as specific techniques for logic models. These include minimal intervention sets (the minimum number of perturbations for a desired phenotype) to propose possible therapeutic targets. These tools use the same model format as CellNOptR, so it is easy to pass models for analysis. More generally, we are part of the CoLoMoTo initiative, which aims to facilitate interoperability among these tools; the main goal here is the development of SBML-qual as a language to exchange logic models (sbml.org/Community/Wiki/SBML_Level_3_Proposals/Qualitative_Models), as well as the implementation of the SBGN format for network representation (Novere et al 2009).

In general, efficient integration of data and prior knowledge to model signal transduction requires the use of appropriate standards for data, prior knowledge about the networks, and the models themselves (Saez-Rodriguez et al 2011a). We consider that logic models will be an area of development in the future, with increasing application to signal transduction research.

Figure 9. An overview of the graphical representation of logic models. (A) The SOP expression for the activation of C, summarized as an XOR gate. (B) SOP expressions describing the activation of C and D. (C) An example of a hypergraph representation, where the nodes are connected by hyperedges.

8. Materials and methods

8.1. CellNOptR

As mentioned in section 1.4, CellNOptR includes some additional steps in pre-processing logic models before simulation and training to data. The details of these steps can be found in Saez-Rodriguez et al (2009). Briefly, the model is compressed by removing non-identifiable elements. These include nodes on terminal branches that are not part of the experimental design (non-observables; figure 1: p90RSK and CREB), nodes that are not affected by the inputs or perturbations (non-controllables), and additional nodes that can be removed without affecting logic outcome during simulation (figure 1: Ras, MEK 1, etc.).

After this compression step, a superstructure of all possible hyperedges is created (figure 1, inset). This superstructure contains 'the space' of hyperedges that is optimized (through the removal of redundant hyperedges) by training to the experimental data. The training uses a genetic algorithm to search for logic models that minimize a bipartite function. This function includes the MSE between the simulation of the optimized logic model and the data, and a penalty term for model size. Depending on the formalism used (see the main text), the simulation and data may be at steady state (CellNOptR, CNORfuzzy) or all data points can be used (CNORdt). The resulting logic model is then a subset of the superstructure and contains only the hyperedges that best explain the experimental data (with the additional attribute of parsimony, given the size penalty in the optimization function).
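The two-part objective described above can be sketched as follows. This is a hedged illustration: the exact form and weight of the size penalty are assumptions (the text only states that a penalty term for model size is included), the three candidate hyperedges and the data are invented, and an exhaustive search stands in for the genetic algorithm because the toy search space has only eight models.

```python
from itertools import product

# Four hypothetical experimental conditions (values of inputs A and B) and
# the measured steady-state value of the output C in each.
CONDITIONS = [(1, 0), (0, 1), (0, 0), (1, 1)]
DATA = [1, 0, 0, 1]

# Superstructure: three candidate hyperedges targeting C.
HYPEREDGES = [
    lambda a, b: a,        # A -> C
    lambda a, b: b,        # B -> C
    lambda a, b: a and b,  # A AND B -> C
]

def simulate(bits):
    """C is ON if any selected hyperedge fires (OR over selected hyperedges)."""
    return [
        int(any(edge(a, b) for edge, keep in zip(HYPEREDGES, bits) if keep))
        for a, b in CONDITIONS
    ]

def score(bits, size_penalty=0.01):
    """MSE between model and data plus a penalty proportional to model size."""
    pred = simulate(bits)
    mse = sum((p - d) ** 2 for p, d in zip(pred, DATA)) / len(DATA)
    return mse + size_penalty * sum(bits) / len(bits)

# Exhaustive search over all subsets of the superstructure.
best = min(product((0, 1), repeat=len(HYPEREDGES)), key=score)
print(best, score(best))
```

Both (A → C) alone and (A → C) together with (A AND B → C) reproduce the data exactly; the size penalty is what makes the smaller model win, which is the parsimony property mentioned in the text.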

8.2. Network and data generation

The toy model was constructed manually and is based on the model from Saez-Rodriguez et al (2011a). The in silico data were generated from the toy model using CNORode. The parameters were manually adjusted to model as closely as possible the known dynamics of ERK and NFκB activation. After simulation, noise was added to each data point according to $N(\mu, \sigma^2)$, where $\mu = 0$ and $\sigma^2 = 0.05$. The data were then rescaled to the interval [0, 1]. Two methods of cross validation were also performed to demonstrate the robustness of CellNOptR (steady-state Boolean) to sparseness in the data (figure S8 (available from stacks.iop.org/PhysBio/9/045003/mmedia)).
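The noise and rescaling step described above can be sketched in a few lines. The clean signal used here (a saturating exponential), the random seed and the min-max form of the rescaling are assumptions made for illustration; only the noise distribution and the 16 equally spaced time points between 0 and 30 min are taken from the text.

```python
import math
import random

def noisy_rescaled(signal, sigma2=0.05, seed=1):
    """Add N(0, sigma2) noise to each point, then min-max rescale to [0, 1]."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)  # random.gauss expects a standard deviation
    noisy = [v + rng.gauss(0.0, sigma) for v in signal]
    lo, hi = min(noisy), max(noisy)
    return [(v - lo) / (hi - lo) for v in noisy]

# 16 equally spaced time points between 0 and 30 min.
times = [30.0 * i / 15 for i in range(16)]

# Hypothetical clean activation curve standing in for a simulated species.
clean = [1.0 - math.exp(-t / 5.0) for t in times]

observed = noisy_rescaled(clean)
print(list(zip(times, observed))[:3])
```

Note that σ² = 0.05 corresponds to a standard deviation of about 0.22, which is large relative to a signal bounded between 0 and 1.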

Model and data files, together with the corresponding R scripts, can currently be found at http://www.ebi.ac.uk/~aidanmac/public/logicModelingTutorial (password: tutorial).

8.3. Boolean logic

A Boolean model can be represented as follows:

(1) N species $X_1, X_2, \ldots, X_N$, each represented by a variable $x_i$ taking values 0 or 1.

(2) For each species $X_i$, there is a subset of species $R_i = \{X_{i1}, X_{i2}, \ldots, X_{iN_i}\} \subset \{X_1, X_2, \ldots, X_N\}$ that influence $x_i$.

(3) And for each species $X_i$, an update function $B_i: \{0, 1\}^{N_i} \rightarrow \{0, 1\}$.

From this set of rules, the state of each species at time t + 1 is a function of the state of its influencing species at time t (Kauffman 1969).

So how does the function $B_i$ (also called a transfer function) for each species $X_i$ deal with inputs from other nodes? $B_i$ can be represented in a sum-of-product (SOP) formulation (Mendelson 1970), which allows for multiple possible inputs (AND, NOT, OR gates) to be processed into a single output. To illustrate this, consider the following simple example (figure 9).

We know that the element D is activated by a combination of A and B (i.e. both A and B are needed for activation). Hence, both the graphical and written representation of this activation is relatively straightforward:

$B_1(a, b) = a \land b$

However, in the case of the activation of C, this occurs when A is active without B, or when B is active without A. In this case, one needs some additional rules of representation.

The SOP representation allows the above activation to be written using only AND, NOT and OR operators:

$B_1(a, b) = (a \land \lnot b) \lor (\lnot a \land b)$

Figure 10. The construction of gates with cFL: (a) activating, (b) inhibitory, (c) an AND gate and (d) an OR gate.

This is done by calculating the product within brackets and summing between brackets. Essentially, SOP representations are rules of precedence for complex multi-node inputs. In terms of graphically representing the activation of C, its activation cannot be easily represented using standard SBGN AND, NOT or OR operators (figure 9). Hence, this SOP expression can be summarized as an XOR gate.
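The evaluation rule for SOP expressions (product within brackets, sum between brackets) can be written out directly. In this sketch, the data structure (a list of product terms, each mapping an input name to its required value) and the function name `eval_sop` are illustrative choices rather than any package's format.

```python
from itertools import product

# Each product term maps an input name to the value it must take:
# 1 means the input itself, 0 means NOT the input.
XOR_SOP = [{"a": 1, "b": 0}, {"a": 0, "b": 1}]  # (a AND NOT b) OR (NOT a AND b)
AND_SOP = [{"a": 1, "b": 1}]                    # a AND b

def eval_sop(sop, inputs):
    """AND within each term (product), OR across terms (sum)."""
    return int(any(
        all(inputs[name] == wanted for name, wanted in term.items())
        for term in sop
    ))

xor_table = {
    (a, b): eval_sop(XOR_SOP, {"a": a, "b": b})
    for a, b in product((0, 1), repeat=2)
}
and_table = {
    (a, b): eval_sop(AND_SOP, {"a": a, "b": b})
    for a, b in product((0, 1), repeat=2)
}
print(xor_table)
print(and_table)
```

The first table reproduces the XOR behavior described for C, and the second the AND behavior described for D.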

A logic network where relations are encoded by SOP expressions can be represented as a hypergraph (Klamt et al 2006). A hypergraph is defined as a set of nodes connected by hyperedges, where a hyperedge is a generalization of an edge that can be connected to more than two nodes. This, in turn, can facilitate a more precise representation of biological knowledge (for example, where two proteins are necessary for the activation of a target).

8.4. Fuzzy logic

cFL defines the transfer function between nodes as a Hill function. Depending on the type of interaction (or logic gate; figure 10), this function can take different forms (Morris et al 2011a).

(a) If node C depends only on A a normalized Hill functionis used to calculate C where k and n are the sensitivitycoefficient and Hill coefficient respectively

c = (kn + 1)an

kn + an

(b) An inhibitory relationship is represented as the above expression subtracted from 1:

$$c = 1 - \frac{(k^{n} + 1)\,a^{n}}{k^{n} + a^{n}}$$

(c) For an AND gate, the minimum value of c is used:

$$c = \min\left(\frac{(k_{1}^{n_2} + 1)\,a^{n_2}}{k_{1}^{n_2} + a^{n_2}},\; \frac{(k_{2}^{n_2} + 1)\,b^{n_2}}{k_{2}^{n_2} + b^{n_2}}\right)$$

(d) For an OR gate, the maximum value is used:

$$c = \max\left(\frac{(k_{1}^{n_2} + 1)\,a^{n_2}}{k_{1}^{n_2} + a^{n_2}},\; \frac{(k_{2}^{n_2} + 1)\,b^{n_2}}{k_{2}^{n_2} + b^{n_2}}\right)$$
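The four gate types (a) to (d) can be sketched numerically as follows. This is an illustrative sketch only: the parameter values are arbitrary, each input is given its own sensitivity k with a shared Hill coefficient n for the two-input gates, and the function names are hypothetical rather than taken from any cFL implementation.

```python
def normalized_hill(x, k, n):
    """Normalized Hill function: equals 0 at x = 0 and 1 at x = 1."""
    return (k**n + 1) * x**n / (k**n + x**n)


def activate(a, k, n):
    # (a) single activating input
    return normalized_hill(a, k, n)


def inhibit(a, k, n):
    # (b) inhibition: activating expression subtracted from 1
    return 1 - normalized_hill(a, k, n)


def and_gate(a, b, k1, k2, n):
    # (c) AND gate: minimum over the two transfer functions
    return min(normalized_hill(a, k1, n), normalized_hill(b, k2, n))


def or_gate(a, b, k1, k2, n):
    # (d) OR gate: maximum over the two transfer functions
    return max(normalized_hill(a, k1, n), normalized_hill(b, k2, n))


# Arbitrary example parameters (assumptions for illustration).
K1, K2, N = 0.5, 0.3, 3
print(activate(0.5, K1, N), inhibit(0.5, K1, N))
print(and_gate(1.0, 0.0, K1, K2, N), or_gate(1.0, 0.0, K1, K2, N))
```

With inputs restricted to 0 and 1 the gates reduce to their Boolean counterparts, while intermediate inputs give intermediate outputs.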

8.5. Logic ODEs

As in the case of cFL, CNORode uses phenomenological transfer functions (i.e. non-mechanistic, normalized Hill functions) to describe the dynamics of a node's state as a function of its inputs. Using the examples in figure 10 again, these functions can be described as follows.

(a) $$\dot{c} = \frac{1}{\tau}\left(B(a) - c\right),$$

where $\dot{c}$ is the development of c over time and B(a) is the normalized Hill function of the continuous variable a. This takes the form

$$B(a) = \frac{a^{n} / (k^{n} + a^{n})}{1^{n} / (k^{n} + 1^{n})}$$

(k and n are again the sensitivity and Hill coefficients, respectively). τ can be interpreted as the maximum value of species c (biologically, this could encompass degradation or other limiting factors), and there is an additional degradation term proportional to c.

(b) An inhibitory relationship is simply the above expression subtracted from 1:

$$\dot{c} = 1 - \frac{1}{\tau}\left(B(a) - c\right)$$

(c) The AND gate takes the form

$$\dot{c} = \frac{1}{\tau}\left(B(a)B(b) - c\right)$$

(d) The OR gate notation is as follows:

$$\dot{c} = \frac{1}{\tau}\left(B(a)B(b) + B(a)[1 - B(b)] + B(b)[1 - B(a)] - c\right)$$

In the case of an AND gate, the product of B(a) and B(b) is taken, which maintains consistency in the output with the equivalent Boolean model (i.e. if a = 1 and b = 0, in both ODE and logic formalisms c = 0; similarly, with an OR gate, if a = 1 and b = 0, in both ODE and logic formalisms c = 1). As in the case of cFL, normalized Hill functions can approximate commonly observed biochemical dynamics such as linear, sigmoidal and step-like behavior.
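The consistency with the Boolean model can be checked numerically by integrating the AND and OR equations with constant inputs until c settles. The sketch below uses a simple forward-Euler scheme rather than the solver used by CNORode; the parameter values (k, n, τ, step size) are arbitrary assumptions chosen for illustration.

```python
def hill(x, k=0.5, n=3.0):
    """Normalized Hill function B(x), with B(0) = 0 and B(1) = 1."""
    return (x**n / (k**n + x**n)) / (1.0 / (k**n + 1.0))


def and_target(a, b):
    # Input term of the AND gate: B(a) B(b)
    return hill(a) * hill(b)


def or_target(a, b):
    # Input term of the OR gate
    ba, bb = hill(a), hill(b)
    return ba * bb + ba * (1 - bb) + bb * (1 - ba)


def integrate(target, tau=1.0, c0=0.0, dt=0.01, steps=2000):
    """Forward-Euler integration of dc/dt = (target - c) / tau."""
    c = c0
    for _ in range(steps):
        c += dt * (target - c) / tau
    return c


# a = 1, b = 0: the AND gate settles at 0, the OR gate at 1.
c_and = integrate(and_target(1.0, 0.0))
c_or = integrate(or_target(1.0, 0.0))
print(round(c_and, 6), round(c_or, 6))
```

In this sketch τ only sets how quickly c approaches its target; it does not change the value that is reached.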

Acknowledgments

The authors thank J Banga, J Egea, Inna Pertsovskaya and Melody Morris for valuable help and discussion. Funding was provided by the EU-7FP-BioPreDyn and EMBL-EIPOD programs.

References

Aislyn D W and Boran R I 2010 Systems approaches to polypharmacology and drug discovery Curr Opin Drug Discov Dev 13 297–309

Albert I et al 2008 Boolean network simulations for life scientists Source Code Biol Med 3 16

Aldridge B B et al 2006 Physicochemical modelling of cell signalling pathways Nature Cell Biol 8 1195–203

Assmann S M and Albert R 2009 Discrete dynamic modeling with asynchronous update, or how to model complex systems in the absence of quantitative information Methods Mol Biol 553 207–25

Bader G D, Cary M P and Sander C 2006 Pathguide: a pathway resource list Nucleic Acids Res 34 D504–6 (Database issue)

Banga J 2008 Optimization in computational systems biology BMC Syst Biol 2 47

Barabasi A-L and Oltvai Z N 2004 Network biology: understanding the cell's functional organization Nature Rev Genet 5 101–13

Bauer-Mehren A, Furlong L I and Sanz F 2009 Pathway databases and tools for their exploitation: benefits, current limitations and challenges Mol Syst Biol 5 290

Bossi A and Lehner B 2009 Tissue specificity and the human protein interaction network Mol Syst Biol 5 260

Calzone L et al 2008 A comprehensive modular map of molecular interactions in RB/E2F pathway Mol Syst Biol 4 173

Calzone L et al 2010 Mathematical modelling of cell-fate decision in response to death receptor engagement PLoS Comput Biol 6 e1000702

Chen W W, Niepel M and Sorger P K 2010 Classic and contemporary approaches to modeling biochemical reactions Genes Dev 24 1861–75


de Jong H 2002 Modeling and simulation of genetic regulatory systems: a literature review J Comput Biol 9 67–103

Di Cara A et al 2007 Dynamic simulation of regulatory networks using SQUAD BMC Bioinformatics 8 462

Eduati F et al 2010 A Boolean approach to linear prediction for signaling network modeling PLoS One 5 e12789

Egea J and Martí R 2010 An evolutionary method for complex-process optimization Comput Oper Res 37 315–24

Faure A et al 2006 Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle Bioinformatics 22 e124–31

Fisher J and Henzinger T A 2007 Executable cell biology Nature Biotechnol 25 1239–49

Garg A et al 2008 Synchronous versus asynchronous modeling of gene regulatory networks Bioinformatics 24 1917–25

Gonzalez A, Chaouiya C and Thieffry D 2008 Logical modelling of the role of the Hh pathway in the patterning of the Drosophila wing disc Bioinformatics 24 i234–40

Helikar T et al 2008 Emergent decision-making in biological signal transduction networks Proc Natl Acad Sci USA 105 1913–8

Hlavacek W S et al 2006 Rules for modeling signal-transduction systems Sci STKE 2006 re6

Hoffmann A et al 2002 The IkappaB–NF–kappaB signaling module: temporal control and selective gene activation Science 298 1241–5

Hucka M et al 2003 The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models Bioinformatics 19 524–31

Ilsley G R, Luscombe N M and Apweiler R 2009 Know your limits: assumptions, constraints and interpretation in systems biology Biochim Biophys Acta 1794 1280–7

Jørgensen C and Linding R 2010 Simplistic pathways or complex networks? Curr Opin Genet Dev 20 15–22

Junker B H, Klukas C and Schreiber F 2006 VANTED: a system for advanced data analysis and visualization in the context of biological networks BMC Bioinformatics 7 109

Kauffman S A 1969 Metabolic stability and epigenesis in randomly constructed genetic nets J Theor Biol 22 437–67

Khoury G A, Baliban R C and Floudas C A 2011 Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database Sci Rep 1 90

Kirouac D C et al 2012 Creating and analyzing pathway and protein interaction compendia for modelling signal transduction networks BMC Syst Biol 6 29

Klamt S, Saez-Rodriguez J and Gilles E D 2007 Structural and functional analysis of cellular networks with CellNetAnalyzer BMC Syst Biol 1 2

Klamt S et al 2006 A methodology for the structural and functional analysis of signaling and regulatory networks BMC Bioinformatics 7 56

Kreeger P K and Lauffenburger D A 2010 Cancer systems biology: a network modeling perspective Carcinogenesis 31 2–8

Krumsiek J et al 2010 Odefy - from discrete to continuous models BMC Bioinformatics 11 233

Liberali P, Ramo P and Pelkmans L 2008 Protein kinases: starting a molecular systems view of endocytosis Annu Rev Cell Dev Biol 24 501–23

Ma'ayan A et al 2005 Formation of regulatory patterns during signal propagation in a mammalian cellular network Science 309 1078–83

Mendelson E 1970 Boolean Algebra and Switching Circuits (Schaum's Outline Series) (New York: McGraw-Hill)

Mendoza L and Xenarios I 2006 A method for the generation of standardized qualitative dynamical systems of regulatory networks Theor Biol Med Modelling 3 13

Mitsos A et al 2009 Identifying drug effects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data PLoS Comput Biol 5 e1000591

Morris M K et al 2010 Logic-based models for the analysis of cell signaling networks Biochemistry 49 3216–24

Morris M K et al 2011a Training signaling pathway maps to biochemical data with constrained fuzzy logic: quantitative analysis of liver cell responses to inflammatory stimuli PLoS Comput Biol 7 e1001099

Morris M K et al 2011b Querying quantitative logic models (Q2LM) to study intracellular signaling networks and cell–cytokine interactions Biotechnol J 7 374–86

Novere N L et al 2009 The systems biology graphical notation Nature Biotechnol 27 735–41

Oda K et al 2005 A comprehensive pathway map of epidermal growth factor receptor signaling Mol Syst Biol 1 2005.0010

Patlak M 2010 Competitors try collaboration to speed drug development J Natl Cancer Inst 102 841–3

Pieroni E et al 2008 Protein networking: insights into global functional organization of proteomes Proteomics 8 799–816

Rangamani P and Iyengar R 2008 Modelling cellular signalling systems Essays Biochem 45 83–94

Saez-Rodriguez J et al 2009 Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction Mol Syst Biol 5 331

Saez-Rodriguez J, Alexopoulos L G and Stolovitzky G 2011a Setting the standards for signal transduction research Sci Signal 4 pe10

Saez-Rodriguez J et al 2011b Comparing signaling networks between normal and transformed hepatocytes using discrete logical models Cancer Res 71 5400–11

Sahin O et al 2009 Modeling ERBB receptor-regulated G1/S transition to find novel targets for de novo trastuzumab resistance BMC Syst Biol 3 1

Sasagawa S et al 2005 Prediction and validation of the distinct dynamics of transient and sustained ERK activation Nature Cell Biol 7 365–73

Schlatter R et al 2009 ON/OFF and beyond - a boolean model of apoptosis PLoS Comput Biol 5 e1000595

Shmulevich I et al 2003 Steady-state analysis of genetic regulatory networks modelled by probabilistic boolean networks Comp Funct Genomics 4 601–8

Terfve C and Saez-Rodriguez J 2012 Modeling signaling networks using high-throughput phospho-proteomics Adv Exp Med Biol 736 19–57

Thakar J and Albert R 2010 Boolean models of within-host immune interactions Curr Opin Microbiol 13 377–81

Vinayagam A et al 2011 A directed protein interaction network for investigating intracellular signal transduction Sci Signal 4 rs8

Watterson S, Marshall S and Ghazal P 2008 Logic models of pathway biology Drug Discov Today 13 447–56

Wittmann D M et al 2009 Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling BMC Syst Biol 3 98


  • 1. Introduction
    • 1.1. From parts to interactions
    • 1.2. From interactions to mechanism
    • 1.3. Logic modeling
    • 1.4. Software
    • 1.5. The example model
  • 2. Boolean steady state
    • 2.1. Steady-state optimization and simulation
    • 2.2. Interpretation of steady-state result
  • 3. Two time points (or additional steady state)
    • 3.1. Multiple steady-state optimization and simulation
    • 3.2. Interpretation
  • 4. Synchronous multiple time-point simulation and multiple timescales
    • 4.1. Synchronous and asynchronous updating
  • 5. Constrained fuzzy logic
    • 5.1. Model training and simulation
    • 5.2. Interpretation
  • 6. Logic ODEs
    • 6.1. Converting from Boolean to continuous
    • 6.2. Parameter estimation
    • 6.3. Compressing an ODE model
  • 7. Summary and future developments
  • 8. Materials and methods
    • 8.1. CellNOptR
    • 8.2. Network and data generation
    • 8.3. Boolean logic
    • 8.4. Fuzzy logic
    • 8.5. Logic ODEs
  • Acknowledgments
  • References


modeling has proved particularly useful in describing the effect of environmental inputs on cell phenotypes through networks of signal transduction. There are multiple studies using this type of modeling as a basis (Helikar et al 2008, Calzone et al 2010, Gonzalez et al 2008, Mendoza and Xenarios 2006, Schlatter et al 2009, Sahin et al 2009) (see also reviews Morris et al 2010, Watterson et al 2008, Thakar and Albert 2010). The structure of a signal transduction network lends itself to logic modeling, with clearly defined input nodes (ligand–receptor combinations), measurable elements corresponding to activation (phosphorylated proteins downstream of the receptor) and relatively little knowledge of the biochemistry involved.

Having summarized how logic modeling formulates input/output relationships (see above and section 8.3), the next step is to consider the complexity of the logic modeling in terms of how state and time are treated. The state refers here to the value or quantity attached to each node (typically a protein) and reflects the activation of that node at any point during a simulation. Its value is proportional to activation and can vary from 0 to some arbitrary or defined upper limit, depending on the type of logic modeling being undertaken. States can be defined as on/off (Boolean logic), multi-level or continuous, and we will discuss each of these in turn. When training logic models with experiments, these values are often compared to biochemical data. For instance, the phosphorylation of a protein is considered to be a proxy of its activation (e.g. phosphorylation at the phosphosite threonine-202 of extracellular signal-regulated kinase (ERK)).

Similarly, there are several approaches to handle time in a logic context, ranging from the simplest approximation of discrete or steady-state measurement to more biologically realistic continuous updating. These techniques will also be introduced in turn with the examples below.

1.4. Software

As a means to introduce the different methods and how they can be used to model different aspects of signal transduction, we will use the tool CellNOpt (www.cellnopt.org). CellNOpt is a software package that trains the topology of a PSN to experimental data by the criterion of minimizing the error between the data and the logic model created from the PSN. In CellNOpt, the starting network based on prior knowledge is called the prior knowledge network (PKN) (a name we will use in the rest of this paper). This PKN is preprocessed before training by compression and expansion (see materials and methods, section 8.1). The compression step of CellNOpt is a method of reducing the complexity of a logic model by removing nodes that have no effect on the outcome of simulation. The expansion step subsequently includes all possible hyperedges (materials and methods, section 8.1) in the model. The model is trained by minimizing a bipartite function that calculates the mismatch between the logic model and experimental data (mean squared error (MSE)) while penalizing model size. This minimization can be solved using different strategies, from simple enumeration of options for small cases to stochastic optimization algorithms such as genetic algorithms (Saez-Rodriguez et al 2009) or integer linear programming (Mitsos et al 2009).
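The idea of a two-part objective (mismatch plus a size penalty) solved by simple enumeration can be sketched for a single target node with three candidate hyperedges. This is a toy illustration, not the CellNOpt objective itself: the data, the penalty weight `size_penalty` and the exact way the two terms are combined are assumptions made for the example.

```python
from itertools import product

# Candidate hyperedges into one target; each lists the inputs that must ALL be on.
CANDIDATES = [("A",), ("B",), ("A", "B")]

# Hypothetical measurements: the target behaves like A OR B.
DATA = [
    ({"A": 0, "B": 0}, 0.0),
    ({"A": 1, "B": 0}, 1.0),
    ({"A": 0, "B": 1}, 1.0),
    ({"A": 1, "B": 1}, 1.0),
]


def simulate(bits, inputs):
    """Target is on if any selected hyperedge has all of its inputs on."""
    return float(any(bit and all(inputs[s] for s in edge)
                     for bit, edge in zip(bits, CANDIDATES)))


def score(bits, size_penalty=0.01):
    """Mean squared error plus a penalty proportional to the number of hyperedges."""
    mse = sum((simulate(bits, x) - y) ** 2 for x, y in DATA) / len(DATA)
    return mse + size_penalty * sum(bits)


# Enumerate all 2^3 subsets of hyperedges and keep the best-scoring one.
best = min(product([0, 1], repeat=len(CANDIDATES)), key=score)
print(best, score(best))
```

Both (1, 1, 0) and (1, 1, 1) reproduce the data exactly, so the size penalty is what selects the smaller model. Enumeration grows as 2^N in the number of hyperedges, which is why stochastic or integer programming approaches are needed for larger networks.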

The R version (CellNOptR) is available on Bioconductor and has a number of added features that allow the user to run different variations of logic modeling within the same framework of model calibration. These variations include steady state to discrete time Boolean modeling, fuzzy logic and logic ODEs, all of which will be discussed in turn below. We will also refer to other software packages that have contrasting or complementary approaches to CellNOptR.

For the remainder of the tutorial, different logic formalisms will be introduced and explained with the aid of CellNOptR, and the assumptions, strengths and weaknesses of each formalism with regard to training to data will be illustrated with a 'toy model' of signal transduction.

1.5. The example model

To illustrate the variety of logic modeling approaches, we will use an imaginary but biologically plausible PKN (figure 1). This network includes a subset of intracellular signaling networks known to be activated downstream of EGF and TNFα stimulation and was derived from a larger network presented in Saez-Rodriguez et al (2009). In brief, the PKN includes three MAPK cascades (ERK, p38 and JNK1), the PI3K/Akt/GSK-3 pro-survival pathway and the IKK/IκB/NFκB pathway. It consists of 30 nodes and 33 edges.

From this PKN, we derived a model (the data-generating model) that was used to simulate experimental data (section 8.1 and table S1/figure S1 (available from stacks.iop.org/PhysBio/9/045003/mmedia)). The in silico data replicate biologically plausible behavior that has been seen in such networks, such as the transient behavior of ERK activation (Sasagawa et al 2005) and the oscillatory dynamics of NFκB translocation from the cytoplasm to the nucleus (Hoffmann et al 2002) (figure 2). These in silico data consist of ten 'experiments', which vary according to different combinations of stimulation and inhibition (inhibition is achieved by blocking the activity of two specific kinases (proteins), PI3K and Raf-1, with small-molecule inhibitors), and 16 observations at 2 min intervals from t = 0. Inhibition is used in such experiments to further understand the combinations of upstream events that contribute to the activation of a particular protein. The readouts chosen are well-established downstream events of EGF/TNFα stimulation. The experiment represents an ideal situation with multiple time-point sampling. However, as we will discuss later, with fewer measurements one can capture most (but not necessarily all) of the dynamics of the system. The values are between 0 and 1, and Gaussian noise was also added to the output to imitate inherent biological noise and the measurement error (see materials and methods, section 8.2). The PKN has the following important properties.

(i) It does not specify which input or combination of inputs activates a particular node (for example, both Map3K1 and Map3K7 activate MKK4, but the PKN does not specify whether this is an AND or OR relationship; figure 1).


Figure 1. The prior knowledge network (PKN) used as the starting point for the toy model, depicted with VANTED (Junker et al 2006) in an SBGN activity flow format (Novere et al 2009). The experimental design is annotated as colored nodes: inputs are shown in green, inhibited nodes in red and readouts in blue (Raf-1 is annotated as both red and blue as it is inhibited and measured). Nodes that were compressed (see materials and methods, section 8.1 and figures S2 and S3 (available from stacks.iop.org/PhysBio/9/045003/mmedia)) have a dashed outline. The data-generating model contained hyperedges from TRAF2 to MKK7 (dashed edges in figure) that were removed for the PKN to demonstrate how incomplete prior knowledge can affect fitting the data. The input to MKK4 is highlighted to demonstrate the concept of a hypergraph: MKK4 has two inputs, Map3K1 and Map3K7. However, both AND and OR can be necessary for MKK4 activation. This is represented by the two inputs expanding to three hyperedges (inset and supplementary figure S4 (available from stacks.iop.org/PhysBio/9/045003/mmedia)). The goal of CellNOptR is to find the subset of all possible hyperedges (logic gates) in the PKN that best explains the data.

(ii) It includes additional interactions (TNFR → PI3K, PI3K → Rac, Rac → Map3K1) not present in the data-generating model.

(iii) It is missing interactions (TRAF2 → ASK-1, ASK-1 → Map3K7) that are present in the data-generating model and are necessary to fully explain the in silico data.

The purpose of these gaps and errors in our 'prior knowledge' is to demonstrate the ability of CellNOptR to train context-specific models from unspecific prior knowledge and also to demonstrate the limitations of such an approach when information is incomplete. We will also demonstrate how CellNOptR performs when trying to find the true network topology and model parameters by using the different logic model formalisms to simulate the 'experimental data', and hence demonstrate the strengths, weaknesses and underlying assumptions of each of the logic model formalisms in turn. The network has been designed such that the features uncovered by the logic formalisms are not confounded by the missing interactions.

2. Boolean steady state

In arguably the simplest case of data, an experimental design looking at a particular signal transduction network will consist of a set of measurements representing the phosphorylation state of a subset of proteins in the signaling network. These measurements will be taken before the addition of a stimulus or stimuli and at a single time point after stimulation (t = 0 and t = t1). Additionally, the effects of multiple conditions (inhibitions, perturbations) may also be examined with this design. This is a common approach when studying signal transduction, which has classically been used via low-throughput methods and has more recently been scaled up owing to new technological developments (Terfve and Saez-Rodriguez 2012).

[Figure 2: panels Raf-1, ERK, AP1, GSK-3, p38 and NFκB; Stim columns egf, tnfa; Inh columns pi3ki, raf1i; time axis 0 to 30 min.]

Figure 2. The in silico data generated to test each logic modeling formalism. The data were generated by a logic model (the data-generating model). Each row of the figure represents an experiment with a certain combination of stimuli and inhibitors (shown in the final two columns; black is ON, white is OFF). The simulated data are shown as a continuous black line. Gaussian noise was added (section 1.5 and materials and methods, section 8.2) and the data were 'sampled' at 16 equally spaced time points between 0 and 30 min to simulate a fine-grained time course experimental design.

Choosing a single time point after stimulation leads to a simple design and minimizes the cost per experiment. However, it then becomes critical to choose an appropriate time t1 (see figures S5–7 (available from stacks.iop.org/PhysBio/9/045003/mmedia)). Ideally, one should perform a set of detailed time course experiments that encapsulates the variation in activation in the system, but this is usually not viable in terms of cost and time constraints. It may be only possible to perform a detailed time course experiment for a single phosphoprotein. From this, a time point can be chosen that is characteristic of the activation of the phosphoproteins of interest. Typically, in signal transduction, a fast wave of activation occurs over a short timescale after stimulation. This is followed by slower, later mechanisms that often down-regulate the signals over a longer timescale (e.g. degradation, internalization, etc).

Returning to our example, the measurement of phosphorylated ERK could be viewed as a sensible output with which a time course can be obtained. (Its activation would be representative of the dynamics of the MAPK cascade, and it is technically a good choice because of the quality of ERK phosphosite-specific antibodies.) From this time course, we would see that two different timescales seem to exist: an early activation phase followed by a late phase. Thus, characteristic time points can be chosen (figure 2), and a reasonable early time point would be in the range 4–12 min. For argument's sake, we will choose t1 = 10 min.

We can see from the data the difficulty with defining a characteristic time point, or how choosing a single time point may affect the ability to capture all dynamic features. For example, it is impossible to understand the oscillatory nature of NFκB translocation with a single time point, and ERK activation dynamics can only be partly representative of other phosphoprotein dynamics (even those closely related in function, such as Raf1). For the oscillations of NFκB, one would need to sample with a density of at least every 2.5 min (since the period of the oscillation is 5 min), while to obtain an approximate sense of the transient activation of ERK, two well-chosen time points can be enough. In spite of this, steady-state measurement can give a qualitative overview of the system that allows for robust, albeit coarse-grained, conclusions with relatively few data points (and thus cost).
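The sampling-density argument can be illustrated with an idealized oscillation. The sketch below assumes a pure sinusoid with a 5 min period as a stand-in for the NFκB signal (the real in silico data are not sinusoidal) and compares sampling once per period with sampling every 2 min, as in the toy experimental design.

```python
import math

PERIOD_MIN = 5.0  # oscillation period quoted in the text


def signal(t):
    """Idealized oscillation between 0 and 1 (an assumption for illustration)."""
    return 0.5 + 0.5 * math.sin(2 * math.pi * t / PERIOD_MIN)


def sample(interval, t_end=30.0):
    """Sample the signal from t = 0 to t_end at a fixed interval (minutes)."""
    n_points = int(t_end / interval) + 1
    return [signal(i * interval) for i in range(n_points)]


def spread(values):
    """Range of the sampled values; near zero means the oscillation is invisible."""
    return max(values) - min(values)


coarse = sample(5.0)  # one sample per period: every sample hits the same phase
fine = sample(2.0)    # 16 samples between 0 and 30 min

print(len(coarse), round(spread(coarse), 6))
print(len(fine), round(spread(fine), 6))
```

When the sampling interval equals the period, the samples all fall at the same phase and the signal appears constant; the 2 min design recovers most of the oscillation amplitude.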

2.1. Steady-state optimization and simulation

One way to measure a model's ability to fit experimental data with a single time point, such as that described above, is to make the assumption that the system reaches at that point a pseudo-steady state: the fast reactions have already occurred, while the slow reactions have not yet significantly affected the network's behavior (Klamt et al 2006). This approximation implies that the flux through the system (in our case, the phosphorylation cascade in signal transduction) has stabilized and the quantities of phosphorylated proteins are no longer varying to a significant degree. With this assumption, a model of this system can be simulated until it has also reached a steady state.

[Figure 3: panels Raf-1, ERK, AP1, GSK-3, p38 and NFκB; Stim columns egf, tnfa; Inh columns pi3ki, raf1i; time axis 0 to 30 min.]

Figure 3. The fit of the trained model using the Boolean steady-state formalism. The simulated data are shown as two blue circles (t0 and t1) connected by a blue dotted line. The colors represent the goodness of fit between the model and the data at t1 = 10. Heat-map coloration is used to signify the range from high error (red, normalized mean squared error (MSE) = 1) to no error (white, MSE = 0). t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CellNOptR took 180 s.

With the in silico data (figure 2) as our starting point, the PKN (figure 1) was trained using the steady-state model formalism at t1 = 10 min. Details about the node states and transfer functions of this formalism (Boolean steady state) are summarized in section 8.3. Figure 3 shows the steady-state simulation overlaid on the experimental data.

2.2. Interpretation of steady-state result

The Boolean steady-state formalism used by CellNOptR for optimization recovers most of the underlying 'true' network and hence gives a good steady-state approximation of the in silico data (see figures 3 and 8). However, there are some exceptions that highlight the limitations of steady-state measurements. Using this formalism, CellNOptR cannot identify the NFκB oscillations caused by feedback: hyperedges that cause negative feedback are penalized in CellNOptR, as a steady state cannot be reached when they are present. Another limitation is that the state of each element in the model is limited to 0/1 (either switched on or off). Hence, intermediate levels of activation cannot be simulated (such as p38 activation under TNFα and EGF stimulation). Finally, the effect of the missing pathway from TNFα to AP1 is observed when the experimental measurement cannot be explained with TNFα stimulation in the absence of EGF stimulation.
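Why negative feedback prevents a Boolean steady state can be seen in a small synchronous simulation. The sketch below uses a toy linear cascade (EGF, SOS, Raf, ERK), which is a simplification and not the PKN of figure 1; `simulate_to_steady_state` is a hypothetical helper, not a CellNOptR function.

```python
def simulate_to_steady_state(rules, state, max_steps=100):
    """Synchronously update all nodes until the state stops changing.

    Returns the steady state, or None if none is reached within max_steps.
    """
    for _ in range(max_steps):
        new_state = {node: rule(state) for node, rule in rules.items()}
        if new_state == state:
            return state
        state = new_state
    return None


START = {"EGF": True, "SOS": False, "Raf": False, "ERK": False}

# Feed-forward cascade: each node copies its upstream node.
CASCADE = {
    "EGF": lambda s: s["EGF"],
    "SOS": lambda s: s["EGF"],
    "Raf": lambda s: s["SOS"],
    "ERK": lambda s: s["Raf"],
}

# Same cascade with negative feedback from ERK onto SOS.
WITH_FEEDBACK = dict(CASCADE, SOS=lambda s: s["EGF"] and not s["ERK"])

print(simulate_to_steady_state(CASCADE, START))
print(simulate_to_steady_state(WITH_FEEDBACK, START))
```

The feed-forward cascade settles with all nodes on, whereas the version with feedback cycles indefinitely and the function returns None.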

Thus, the strength of steady-state Boolean logic is strongly dependent on the assumptions underlying the data. If one has enough knowledge of the data and biochemistry such that the assumption of steady state is a fair one to make, training a network to data using steady-state Boolean logic modeling can uncover cell-specific behavior, for example differences between cancer and normal cells (Saez-Rodriguez et al 2011a). Another advantage is the scalability of such an approach: because the method is parameter-free, large networks can be trained under a large number of conditions.

3. Two time points (or additional steady state)

As mentioned in section 2, it is quite common in signaling networks to observe a transient behavior, where a species is quickly activated and subsequently deactivated. Such a dynamic obviously cannot be captured with a steady-state approach where only one time point is considered. Therefore, in the above section, this issue was avoided by only modeling 'fast events', i.e. the activation phase of the signal propagation. However, when information about more than one time point is available and such a fast activation followed by slow deactivation (or indeed any combination of slower and faster processes) is observed, then it is possible to also capture these processes while keeping the simplifying assumption of steady states. In essence, it is assumed that multiple pseudo-steady states reflect the mechanisms that are acting at different timescales and that they can be optimized independently. We will illustrate this with the CellNOptR implementation for two timescales, but the approach is extendable to more than two time points.

[Figure 4: panels Raf-1, ERK, AP1, GSK-3, p38 and NFκB; Stim columns egf, tnfa; Inh columns pi3ki, raf1i; time axis 0 to 30 min.]

Figure 4. The fit of the model at two time points, t1 and t2, using the two steady-state approach. Again, the colors are representative of the fit, this time at t1 = 10 and t2 = 30. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CellNOptR took 240 s.

Defining suitable time points that adequately represent the process timescales that we want to model is a similar problem to what was discussed above for a unique steady state, with the added complexity of having to choose more than one point that is consistent for all modeled species. This can be guided by prior knowledge, e.g. if it is known that a receptor is activated on a fast timescale (e.g. 30 min for full activation) by phosphorylation and then deactivated by slow internalization and degradation (e.g. 2 h for full silencing of the signal). However, in general, it is better to develop a detailed time course, as stated above. In our case, again using ERK, we would say that a second measurement at 20–30 min would be adequate; 30 min was used for the sake of argument (see figure 4).

3.1. Multiple steady-state optimization and simulation

In CellNOptR, a model of a system with two steady states at different timescales is simulated by assuming that a subset of the hyperedges (interactions) only become active at a later time point, that is, they operate on a different timescale (Klamt et al 2006). That being the case, the two time points can therefore be optimized separately. In practice, this means that the optimization is done in two steps.

(i) The scaffold model (the model after compression of non-essential nodes and expansion of all possible hyperedges; see figure 1 and materials and methods, section 8.1) derived from the PKN is used to train the model against the data at t1, thereby identifying hyperedges that best reproduce the data at this time point.

(ii) Hyperedges that were not selected as active at t1 are used as the search space for training the model at t2. For simulation (and therefore testing the model fit), candidate models are tested by using the steady state of the t1 model as an initial state, then computing the steady state from there, including candidate t2 hyperedges. There is also the additional constraint that whenever hyperedges at t1 and t2 influence a node in contradicting ways, the t2 hyperedge overrules the t1 hyperedge and the state of the target node is locked to the state defined by the t2 hyperedge.

Besides the additional constraint of the overriding hyperedges described above, the node states and transfer functions are calculated in the same way as the Boolean steady-state formalism (section 8.3).

3.2. Interpretation

In our example, we can see that the two steady-state optimization finds the feedback from ERK back to SOS-1 (figure 8). Hence, from figure 4, the transient activation of Raf1, ERK and AP1 is captured in the trained model. Using a single characteristic time point, a model that includes the negative feedback from ERK to SOS-1 at t1 would not be selected, as the branch never reaches a stable steady state because of oscillation. However, if we say that the branch is active at t1 but that the negative feedback is only active at t2, and that when active this negative feedback permanently turns SOS-1 off, then the model does reach a steady state at t1 (where SOS-1, Raf-1 and ERK are all ON) and a different steady state at t2 (where SOS-1, Raf-1 and ERK are all OFF as a result of the activated negative feedback).
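The two-timescale reasoning can be sketched with a toy cascade. The code below is a simplified reading of the locking rule, not the CellNOptR implementation: the cascade (EGF, SOS, Raf, ERK) stands in for the SOS-1/Raf-1/ERK branch, the t2 feedback hyperedge is evaluated once on the t1 steady state, and SOS is then held at that value while the rest of the network relaxes.

```python
def steady_state(rules, state, locked=frozenset(), max_steps=100):
    """Synchronous updates until a fixed point; locked nodes keep their value."""
    for _ in range(max_steps):
        new_state = {
            node: (state[node] if node in locked else rule(state))
            for node, rule in rules.items()
        }
        if new_state == state:
            return state
        state = new_state
    return None


# Hyperedges active at t1: a feed-forward cascade without feedback.
T1_RULES = {
    "EGF": lambda s: s["EGF"],
    "SOS": lambda s: s["EGF"],
    "Raf": lambda s: s["SOS"],
    "ERK": lambda s: s["Raf"],
}

start = {"EGF": True, "SOS": False, "Raf": False, "ERK": False}
state_t1 = steady_state(T1_RULES, start)

# t2: the feedback hyperedge (ERK inhibits SOS) overrules the t1 hyperedge.
# It is evaluated on the t1 steady state and SOS is locked at that value.
t2_start = dict(state_t1)
t2_start["SOS"] = not state_t1["ERK"]
state_t2 = steady_state(T1_RULES, t2_start, locked=frozenset({"SOS"}))

print(state_t1)
print(state_t2)
```

At t1 the branch is fully on; at t2 the locked SOS node propagates OFF down the cascade, so a second, different steady state is reached instead of an oscillation.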

4. Synchronous multiple time-point simulation and multiple timescales

As discussed in section 3, by measuring at two characteristic time points, the trained logic model is capable of finding the slow negative feedback from ERK to SOS-1 and therefore moves a step closer to understanding the 'true' network. However, the oscillations of NFκB still cannot be explained with the pseudo-steady-state formalism, as it is necessary to use the full time course data (and not just two time points) to observe this effect. This can be modeled by a discrete time Boolean model that is available as an add-on R package to CellNOptR: CNORdt (discrete time).

4.1 Synchronous and asynchronous updating

CNORdt introduces some variation in how time is handled in the model. Instead of simulating and fitting data at steady states, it is capable of fitting time course data by using an additional model parameter together with a synchronous updating scheme.

Synchronous updating is where all nodes are updated simultaneously during model simulation; hence, each node at time t is a function of its input nodes at t − 1 (see section 8.3). This is the updating scheme used in CellNOptR. An alternative method is asynchronous updating, where nodes are updated in a random or non-synchronous order, depending on the asynchronous method used. This leads to different simulation properties depending on the updating method chosen. Synchronous updates are deterministic, and simulations run under the same conditions (inputs and perturbations) will reach the same steady state (or attractor) each time. In contrast, asynchronous updating introduces stochasticity into the system, such that different steady states can be reached from the same starting conditions. The random updating of node values is one possible application of asynchronicity. This enables sampling over all timescales (any reaction can be deemed to be slowest or fastest), thus avoiding the constraint inherent in synchronous simulations of an equal timescale over all reactions. However, this added complexity can make results difficult to interpret (Garg et al 2008). Mixed synchronous/asynchronous updating is an intermediate approach that can stratify reaction groups according to their known reaction rates, thus taking advantage of a priori knowledge and reducing the complexity of a fully asynchronous approach (Faure et al 2006, Albert et al 2008, Assmann and Albert 2009, Garg et al 2008).
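The difference between the two updating schemes can be made concrete with a small Python sketch. The three-node network below (a and b inhibit each other, c copies a) is hypothetical and is not part of the toy model; the asynchronous variant shown, in which one randomly chosen node is updated per step, is only one of several possible asynchronous methods.

```python
import random

# Illustrative 3-node network (names are hypothetical): a and b inhibit each
# other, c copies a. Each rule reads the current state and returns 0 or 1.
RULES = {
    "a": lambda s: 1 - s["b"],
    "b": lambda s: 1 - s["a"],
    "c": lambda s: s["a"],
}

def synchronous_step(state):
    """All nodes read the state at t - 1 and are updated together."""
    return {node: rule(state) for node, rule in RULES.items()}

def asynchronous_step(state, rng):
    """One randomly chosen node is updated; the others keep their value."""
    node = rng.choice(sorted(RULES))
    new = dict(state)
    new[node] = RULES[node](state)
    return new

start = {"a": 0, "b": 0, "c": 0}

# Synchronous: deterministic; from this start a and b flip together forever.
sync_trace = [start]
for _ in range(4):
    sync_trace.append(synchronous_step(sync_trace[-1]))

# Asynchronous: the end point depends on the random update order.
end_states = set()
for seed in range(50):
    rng = random.Random(seed)
    state = start
    for _ in range(100):
        state = asynchronous_step(state, rng)
    end_states.add((state["a"], state["b"], state["c"]))
```

Under synchronous updating every run from this start follows the same cyclic attractor, whereas the asynchronous runs settle in one of two different steady states depending on whether a or b happens to be updated first.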

CNORdt introduces a scaling parameter that defines the timescale of the Boolean synchronous simulation. Where each 'tick' (t) (or simulation step) is the synchronous updating of all nodes in the model according to their inputs at t − 1, the scaling parameter defines the 'tick' frequency relative to the timescale of the real data. Although this is a crude approach (i.e. it implies a single rate across all reactions), it allows us to fit a synchronous Boolean simulation to data. Hence, all data points can be fitted to the model, and hyperedges that cause feedback in the model can be included, which allows the model to reveal more complex dynamics such as oscillations. CNORdt still describes the node states as either on or off (1/0), and the transfer functions are calculated as in section 8.3. The scaling parameter is applied to the simulation of the system and hence does not affect the transfer functions themselves.
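The role of such a scaling parameter can be sketched in Python as follows. This is an illustration of the idea only and should not be read as the CNORdt algorithm: the two-node feedback loop, the made-up time course and the name ticks_per_minute are all assumptions, and the scaling parameter is simply scanned over a few candidate values.

```python
# Hypothetical negative-feedback loop: x activates y, y inhibits x.
def simulate_ticks(n_ticks):
    """Synchronous Boolean simulation; returns one state per tick (tick 0 first)."""
    state = {"x": 0, "y": 0}
    trace = [state]
    for _ in range(n_ticks):
        state = {"x": 1 - state["y"], "y": state["x"]}
        trace.append(state)
    return trace

def value_at(trace, node, t, ticks_per_minute):
    """Read the simulated value of `node` at real time t (minutes)."""
    tick = min(int(round(t * ticks_per_minute)), len(trace) - 1)
    return trace[tick][node]

def mse(trace, data, node, ticks_per_minute):
    """Mean squared error between the rescaled simulation and the data."""
    errors = [(value_at(trace, node, t, ticks_per_minute) - y) ** 2
              for t, y in data]
    return sum(errors) / len(errors)

# Made-up time course for x: (minutes, normalized activity).
data_x = [(0, 0.0), (5, 1.0), (10, 1.0), (15, 0.0),
          (20, 0.0), (25, 1.0), (30, 1.0)]

trace = simulate_ticks(40)
scores = {scale: mse(trace, data_x, "x", scale)
          for scale in (0.05, 0.1, 0.2, 0.5)}
best_scale = min(scores, key=scores.get)
```

The Boolean trajectory itself never changes; only the mapping from ticks to minutes does, which is why the scaling leaves the transfer functions untouched.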

Figures 5 and 8 show how the NFκB oscillations can be predicted by fitting a dynamic logic model to the full time course and maintaining the two steady-state assumptions from section 3, i.e. simulating 'fast' reactions from t = 0 to t = 10 and 'slow' reactions from t = 10 to t = 30.

5 Constrained fuzzy logic

One of the main limitations of Boolean logic models is that the assumption of a single level of activation (species can only be on/off) is biochemically unrealistic. Fuzzy logic is another logic modeling formalism that allows for intermediate levels of activation. It was originally developed in the field of control theory for predicting the outputs of complex processes where inputs could only partially be characterized (Morris et al 2011a). Its strength lies in the flexibility it affords when defining relationships between input and output nodes. This flexibility can also be a weakness if a large number of parameters are required to define these functional relationships. Constrained fuzzy logic (cFL) deals with this potential complexity by limiting the repertoire of relationships between nodes. The cFL formalism used in CellNOpt (CNORfuzzy) is fully described in Morris et al (2011a). Briefly, the relationships (or transfer functions) between nodes in cFL are limited to Hill functions. Hence, each transfer function has two free parameters: the Hill coefficient n, which controls the steepness of the function, and the sensitivity parameter k, which determines the midpoint of the function (i.e. the value of the input that produces half the maximal output). By varying these two parameters, linear, sigmoidal and step-like dynamics can be produced that are good approximations to protein–protein interactions and enzymatic reactions. In CNORfuzzy, further constraints are imposed by initially limiting the possible parameter combinations to a subset of discrete values. Details of the transfer functions used can be found in materials and methods, section 8.4.

Figure 5. The fit of the model at multiple time points using fast (t = 0 to t = 10) and slow (from t = 10 to t = 30) timescales. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORdt took 300 s. [Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh; x-axis ticks at 0, 10 and 30; row labels: egf, tnfa, pi3k i, raf1 i.]

5.1 Model training and simulation

Model training and simulation in CNORfuzzy are carried out in a similar manner to the Boolean steady-state formalism. After compression and expansion of the logic hypergraph, a genetic algorithm determines transfer functions and a network topology that minimize the MSE between the model and the data at steady state. This is followed by a number of refinement steps that fine-tune the Hill function parameters and reduce the complexity of the network topology. The in silico data and model fit at t1 = 10 are shown in figure 6.

5.2 Interpretation

CNORfuzzy is capable of fitting intermediate values (figure 6). For most cases, the cFL model generates similar fits to the steady-state Boolean model. However, the fit to data is more accurate since the values are continuous and not limited to 0 or 1. More importantly, the cFL model obtains a better fit for p38, as it uncovers a link in the structure that Boolean models are unable to capture. In the 'true' network, TNFα and EGF are both required to activate p38 (albeit the activation is low relative to the other signals). In the previous Boolean formalisms, this low activation of p38 cannot be modeled, as the simulation can only take the values 0/1. However, CNORfuzzy is capable of adding the hyperedge 'Map3K1 AND Map3K7 → MKK4' (figure 8) to explain this activation and hence move a step closer to finding the underlying true network.

The CNORfuzzy model fit also illustrates some caveats associated with fuzzy logic. We can see that CNORfuzzy also retains the Map3K7 → p38 hyperedge (figure 8), thus activating p38 with TNFα stimulation alone (i.e. in the absence of EGF stimulation). This occurs as CNORfuzzy attempts to fit the noisy signal of inactive p38, thus adding a hyperedge that is not present. CNORfuzzy also adds hyperedges from TNFα to AP1 that convey a weak activating signal to compensate for the missing hyperedges (TRAF2 → ASK-1, ASK-1 → Map3K7) from the PKN (figure 1). These examples illustrate the sensitivity of the cFL approach to the data quality, and this can make interpretation of the results more subtle and difficult (Morris et al 2011a).

6 Logic ODEs

The Boolean logic formalisms described above can qualitatively fit the network topology and logic gates that best describe the underlying data. cFL can add quantitative information by its ability to fit intermediate values between 0 and 1 at steady state. In terms of time, however, all these formalisms rely on discrete simulations. To obtain a fully continuous model, both in state and time, CNORode adds to these methods by transforming a discrete logic model to a continuous model. It does this by defining a set of ODEs for each model species. There are several formalisms to convert discrete logic to continuous models (e.g. SQUAD (Di Cara et al 2007)) or hybrid models (e.g. piecewise linear models (de Jong 2002)). CellNOpt includes the method developed by Wittmann et al (2009) that was implemented in Matlab as Odefy (Krumsiek et al 2010).

Figure 6. The fit of the trained model at t1 = 10 using constrained fuzzy logic. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORfuzzy took 1200 s. [Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh; x-axis ticks at 0, 10 and 30; row labels: egf, tnfa, pi3k i, raf1 i.]

6.1 Converting from Boolean to continuous

The approach used to convert Boolean to continuous models is fully explained in Wittmann et al (2009). Briefly, the goal is to simulate the full dynamics of each species in the logic model while retaining consistency with the Boolean representation. What this means is that where the output of a logic gate is 0 or 1, the ODEs replacing a Boolean state should also return to 0 or 1. This is achieved in a similar manner to cFL (but with an additional parameter τ) by applying a normalized Hill function between the intervals 0 and 1. Applying these functions to each hyperedge defines a new continuous ODE model to replace the underlying Boolean model. This is more fully explained in section 8.5.

6.2 Parameter estimation

CNORode currently provides links to two stochastic non-local optimization algorithms: a genetic algorithm (genalg package, http://cran.r-project.org/web/packages/genalg) and an implementation in R of scatter search (Egea and Martí 2010). These are used to fit the Hill function parameters k and n and the ODE parameter τ to each logic gate in a model that has been already topologically optimized by one or more of the other formalisms.
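To give a feel for what fitting k, n and τ involves, the following Python sketch fits a single activating gate with constant input to synthetic data. Plain random search is used here as a deliberately simple stand-in for the genetic algorithm and scatter search mentioned above; the parameter bounds, the input level a = 0.6 and the sampling times are arbitrary assumptions.

```python
import math
import random

def simulate(k, n, tau, a=0.6, times=(0, 2, 5, 10, 20, 30)):
    """Closed-form solution of dc/dt = (B(a) - c) / tau, c(0) = 0, constant a."""
    b = (k ** n + 1) * a ** n / (k ** n + a ** n)   # normalized Hill term B(a)
    return [b * (1 - math.exp(-t / tau)) for t in times]

def cost(params, data):
    """Sum of squared errors between simulation and data."""
    return sum((s - d) ** 2 for s, d in zip(simulate(*params), data))

def random_search(data, n_samples=2000, seed=1):
    """Sample (k, n, tau) uniformly within bounds and keep the best candidate."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(n_samples):
        params = (rng.uniform(0.1, 0.9), rng.uniform(1, 5), rng.uniform(0.5, 20))
        c = cost(params, data)
        if c < best_cost:
            best, best_cost = params, c
    return best, best_cost

data = simulate(k=0.4, n=3, tau=4.0)      # synthetic "measurements"
best_params, best_cost = random_search(data)
poor_guess_cost = cost((0.9, 1.0, 20.0), data)
```

With a single constant input, k and n are only identifiable through the combined value B(a), so the recovered parameters need not match the generating ones even when the fit is good; informative perturbation data are needed to separate them.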

6.3 Compressing an ODE model

Compression of the model before training may lead to the loss of elements important to capture dynamic features and must thus be done with caution. Returning to our example (figure 2), the in silico data were generated through a set of normalized Hill functions. Hence, with the exception of AP1 (where the missing hyperedge prohibits any exact simulation of this signal), CNORode should be capable of simulating exactly the other signals in the system after parameter optimization of the associated logic ODEs. However, this may not be possible when the model is compressed. To give an example, in our toy model (figure 1) the pathway consisting of SOS-1, Ras, Raf-1, MEK 1 and ERK is compressed to SOS-1 → Raf-1 → ERK. The in silico data were generated with ODEs describing the uncompressed interactions. We can see from figure 7 that the compressed model can accurately simulate the in silico data for this pathway (Raf-1 and ERK signals). In this case, the normalized Hill functions have enough dynamic plasticity to summarize four interactions (SOS-1 → Ras → Raf-1 → MEK 1 → ERK) as two (SOS-1 → Raf-1 → ERK). However, this is not the case where we have feedback from ERK through a phosphatase (ph) back to SOS-1, and from NFκB through expression (ex) back to IκB. In these cases, it is necessary to not compress 'ph' and 'ex' to allow CNORode to model the correct dynamics (transience and oscillations, respectively). The non-compression is required as 'ph' and 'ex' are integral to the dynamics observed in the in silico data. So figures 7 and 8 show, with the exception of AP1, that CNORode can accurately model the in silico data of the toy model once compression of those key nodes is suppressed.

Figure 7. The fit of the trained model using CNORode. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The parameter training in CNORode took 2000 s. [Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh; x-axis ticks at 0, 10 and 30; row labels: egf, tnfa, pi3k i, raf1 i.]

7 Summary and future developments

In this contribution, we have reviewed different logic-based approaches to model signal transduction networks. Recent developments in proteomics techniques, from antibody-based (xMAP, protein arrays, high-throughput microscopy, etc) to mass spectrometry methods (Terfve and Saez-Rodriguez 2012), allow us to generate a large amount of phosphoproteomic data. Given the size of the underlying networks, we believe that logic-based models, which do not need extensive biochemical detail and thus lead to tractable models even when dealing with multiple pathways, are a useful approach to analyzing signal transduction on a large scale. Therefore, we have focused our work on how to train logic models to experimental data and implemented various methodologies toward this end in our tool CellNOptR.

Our recent developments presented here expand our previous work by including strategies to deal with the inherent dynamic nature of signaling processes (and hence with time series data). We have discussed how modeling dynamic aspects requires more detailed formalisms (and thus, in general, more data and computational time) and how the general methodology has to be re-evaluated at multiple levels, in particular the compression of the network prior to the optimization; hence, we are currently working to develop a general compression routine for dynamic models. Another area of active development is the implementation of efficient optimization strategies to identify both structure and (if existing) continuous parameters (Banga 2008). Although we have covered here a broad palette of logic-based formalisms, we plan to explore other approaches. Some are combinations of what we have discussed (e.g. a cFL formalism simulated over multiple timescales), others are formalisms related to those used here (e.g. SQUAD), and others could add new features such as a probabilistic framework (Shmulevich et al 2003), stochasticity (Albert et al 2008) or formal methodologies (Fisher and Henzinger 2007).

For the sake of simplicity, we have used a toy model that is itself based on a logic formalism to exemplify the potential dynamic behavior and thereby different modeling variants. We are currently working on more realistic benchmarks based on biochemical models and studying in more detail the role of experimental noise and experimental design in recovering the underlying model structure.


Figure 8. The contribution of each logic modeling formalism to the understanding of the model used to simulate the in silico training data. The time taken for training the model using each formalism is also shown.

As illustrated in our example with the link TRAF2 → ASK-1 → MKK7, databases are comprehensive but not complete, and it is therefore likely that important links are missing from the system of interest (Kirouac et al 2012). To overcome this limitation, we are working on strategies to integrate as many network resources as possible. These include methods that propose novel links that expand the prior knowledge network (Saez-Rodriguez et al 2009, Eduati et al 2010) and the use of information from PINs (Vinayagam et al 2011).

The focus of CellNOptR is the calibration of logic models to data, but a large set of other tools exist that analyze logic models from different angles (Morris et al 2010). For example, the Q2LM toolbox (Morris et al 2011b) uses cFL to understand the effect of perturbations in the context of the whole system under investigation (e.g. under what set of stimuli is a therapeutic perturbation most effective). CellNetAnalyzer (Klamt et al 2007) has a battery of methods from graph theory as well as specific techniques for logic models. These include minimal intervention sets (the minimum number of perturbations for a desired phenotype) to propose possible therapeutic targets. These tools use the same model format as CellNOptR, so it is easy to pass models for analysis. More generally, we are part of the CoLoMoTo initiative, which aims to facilitate interoperability among these tools; the main goal here is the development of SBML-qual as a language to exchange logic models (sbml.org/Community/Wiki/SBML_Level_3_Proposals/Qualitative_Models), as well as the implementation of the SBGN format for network representation (Novere et al 2009).

In general, efficient integration of data and prior knowledge to model signal transduction requires the use of appropriate standards for data, prior knowledge about the networks and the models themselves (Saez-Rodriguez et al 2011a). We consider that logic models will be an area of development in the future, with increasing application to signal transduction research.

Figure 9. An overview of the graphical representation of logic models. (A) The SOP expression for the activation of C, summarized as an XOR gate. (B) SOP expressions describing the activation of C and D. (C) An example of a hypergraph representation, where the nodes are connected by hyperedges.

8 Materials and methods

8.1 CellNOptR

As mentioned in section 1.4, CellNOptR includes some additional steps in pre-processing logic models before simulation and training to data. The details of these steps can be found in Saez-Rodriguez et al (2009). Briefly, the model is compressed by removing non-identifiable elements. These include nodes on terminal branches that are not part of the experimental design (non-observables; figure 1: p90RSK and CREB), nodes that are not affected by the inputs or perturbations (non-controllables) and additional nodes that can be removed without affecting logic outcome during simulation (figure 1: Ras, MEK 1, etc).

After this compression step, a superstructure of all possible hyperedges is created (figure 1, inset). This superstructure contains 'the space' of hyperedges that is optimized (through the removal of redundant hyperedges) by training to the experimental data. The training uses a genetic algorithm to search for logic models that minimize a bipartite function. This function includes the MSE between the simulation of the optimized logic model and the data, and a penalty term for model size. Depending on the formalism used (see the main text), the simulation and data may be at steady state (CellNOptR, CNORfuzzy) or all data points can be used (CNORdt). The resulting logic model is then a subset of the superstructure and contains only the hyperedges that best explain the experimental data (with the additional attribute of parsimony, given the size penalty in the optimization function).
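The structure of such a two-part objective can be written down in a few lines of Python. The sketch below is only an illustration of the idea: the linear form of the size term, the weight size_penalty and its value are assumptions, and the exact normalization used in CellNOptR may differ.

```python
def objective(simulated, measured, n_hyperedges, size_penalty=0.01):
    """MSE between simulation and data plus a penalty on model size.

    simulated, measured : equal-length lists of values in [0, 1]
    n_hyperedges        : number of hyperedges kept in the candidate model
    size_penalty        : weight of the size term (hypothetical value)
    """
    mse = sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(measured)
    return mse + size_penalty * n_hyperedges

measured = [1.0, 0.9, 0.1, 0.0]

# Two candidate models with identical Boolean output but different size.
small_model = objective([1, 1, 0, 0], measured, n_hyperedges=3)
large_model = objective([1, 1, 0, 0], measured, n_hyperedges=6)
```

Because both candidates reproduce the data equally well, the size term alone decides between them and the smaller model scores better, which is the parsimony property described above.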

8.2 Network and data generation

The toy model was constructed manually and is based on the model from Saez-Rodriguez et al (2011a). The in silico data were generated from the toy model using CNORode. The parameters were manually adjusted to model as closely as possible the known dynamics of ERK and NFκB activation. After simulation, noise was added to each data point according to N(μ, σ²), where μ = 0 and σ² = 0.05. The data were then rescaled between the intervals [0, 1]. Two methods of cross validation were also performed to demonstrate the robustness of CellNOptR (steady-state Boolean) to sparseness in the data (figure S8 (available from stacks.iop.org/PhysBio/9/045003/mmedia)).
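The noise and rescaling step can be sketched in Python as follows. The text does not state how the rescaling was done, so the min-max rescaling used here is an assumption, as are the function name and the example values.

```python
import random

def add_noise_and_rescale(values, sigma2=0.05, seed=0):
    """Add N(0, sigma2) noise to each point, then min-max rescale to [0, 1]."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5          # random.gauss expects the standard deviation
    noisy = [v + rng.gauss(0.0, sigma) for v in values]
    lo, hi = min(noisy), max(noisy)
    return [(v - lo) / (hi - lo) for v in noisy]

clean = [0.0, 0.2, 0.8, 1.0, 0.6, 0.1]   # made-up noise-free signal
noisy = add_noise_and_rescale(clean)
```

Note that the variance σ² = 0.05 corresponds to a standard deviation of about 0.22, which is substantial relative to signals confined to the unit interval.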

Model and data files, together with the corresponding R scripts, can currently be found at http://www.ebi.ac.uk/~aidanmac/public/logicModelingTutorial (password: tutorial).

8.3 Boolean logic

A Boolean model can be represented as follows:

(1) N species $X_1, X_2, \ldots, X_N$, each represented by a variable $x_i$ taking values 0 or 1.

(2) For each species $X_i$, there is a subset of species $R_i = \{X_{i1}, X_{i2}, \ldots, X_{iN_i}\} \subset \{X_1, X_2, \ldots, X_N\}$ that influence $x_i$.

(3) And for each species $X_i$, an update function $B_i : \{0, 1\}^{N_i} \rightarrow \{0, 1\}$.

From this set of rules, the state of each species at time t + 1 is a function of the state of its influencing species at time t (Kauffman 1969).

So how does the function $B_i$ (also called a transfer function) for each species $X_i$ deal with inputs from other nodes? $B_i$ can be represented in a sum-of-product (SOP) formulation (Mendelson 1970), which allows for multiple possible inputs (AND, NOT, OR gates) to be processed into a single output. To illustrate this, consider the following simple example (figure 9).

We know that the element D is activated by a combination of A and B (i.e. both A and B are needed for activation). Hence, both the graphical and written representation of this activation is relatively straightforward:

$$B_1(a, b) = a \wedge b$$

However, in the case of the activation of C, this occurs when A is active without B, or when B is active without A. In this case, one needs some additional rules of representation.

The SOP representation allows the above activation to be written using only AND, NOT and OR operators:

$$B_1(a, b) = (a \wedge \neg b) \vee (\neg a \wedge b)$$

Figure 10. The construction of gates with cFL: (a) activating, (b) inhibitory, (c) an AND gate and (d) an OR gate.

This is done by calculating the product within brackets and summing between brackets. Essentially, SOP representations are rules of precedence for complex multi-node inputs. In terms of graphically representing the activation of C, its activation cannot be easily represented using standard SBGN AND, NOT or OR operators (figure 9). Hence, this SOP expression can be summarized as an XOR gate.
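The equivalence between the SOP expression for C and an XOR gate can be checked by enumerating the truth table, as in this short Python sketch (the function names b_d and b_c are illustrative labels for the update functions of D and C).

```python
from itertools import product

def b_d(a, b):
    """D is activated only when A and B are both active (AND gate)."""
    return a and b

def b_c(a, b):
    """C in sum-of-products form: (a AND NOT b) OR (NOT a AND b)."""
    return (a and not b) or (not a and b)

# Map every input combination (a, b) to the outputs (D, C).
truth_table = {(a, b): (int(b_d(a, b)), int(b_c(a, b)))
               for a, b in product((0, 1), repeat=2)}

# The SOP expression for C agrees with XOR on every input combination.
sop_matches_xor = all(c == (a ^ b) for (a, b), (_, c) in truth_table.items())
```

Each bracketed product term corresponds to one row of the truth table in which C is active, and the sum (OR) collects those rows.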

A logic network where relations are encoded by SOP expressions can be represented as a hypergraph (Klamt et al 2006). A hypergraph is defined as a set of nodes connected by hyperedges, where a hyperedge is a generalization of an edge that can be connected to more than two nodes. This in turn can facilitate a more precise representation of biological knowledge (for example, where two proteins are necessary for the activation of a target).
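One simple way to hold such a hypergraph in code is as a list of (sources, target) pairs, as in the Python sketch below. This representation is an assumption made for illustration and is not the CellNOptR data format; negated inputs are left out, and the expansion enumerates every non-empty subset of the inputs without any cap on the size of AND gates.

```python
from itertools import combinations

def expand_gates(inputs, target):
    """Enumerate candidate hyperedges for `target`.

    Each hyperedge is (frozenset_of_sources, target); a hyperedge with several
    sources is an AND gate, and several hyperedges sharing a target act as OR.
    """
    hyperedges = []
    for size in range(1, len(inputs) + 1):
        for sources in combinations(inputs, size):
            hyperedges.append((frozenset(sources), target))
    return hyperedges

def evaluate(target, hyperedges, state):
    """OR over hyperedges into `target`, AND over the sources within each one."""
    return int(any(all(state[s] for s in sources)
                   for sources, tgt in hyperedges if tgt == target))

candidates = expand_gates(["Map3K1", "Map3K7"], "MKK4")
and_only = [h for h in candidates if len(h[0]) == 2]
or_only = [h for h in candidates if len(h[0]) == 1]

state = {"Map3K1": 1, "Map3K7": 0}
mkk4_and = evaluate("MKK4", and_only, state)
mkk4_or = evaluate("MKK4", or_only, state)
```

For the two inputs of MKK4 this yields three candidate hyperedges (each input alone, and both together), and training then amounts to choosing which of them to keep.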

8.4 Fuzzy logic

cFL defines the transfer function between nodes as a Hill function. Depending on the type of interaction (or logic gate, figure 10), this function can take different forms (Morris et al 2011a).

(a) If node C depends only on A, a normalized Hill function is used to calculate C, where k and n are the sensitivity coefficient and Hill coefficient, respectively:

$$c = \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

(b) An inhibitory relationship is represented as the above expression subtracted from 1:

$$c = 1 - \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

(c) An AND gate: the minimum value of c is used:

$$c = \min\left(\frac{(k_1^{n_2} + 1)\,a^{n_2}}{k_1^{n_2} + a^{n_2}},\; \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$

(d) And for an OR gate, the maximum value is used:

$$c = \max\left(\frac{(k_1^{n_2} + 1)\,a^{n_2}}{k_1^{n_2} + a^{n_2}},\; \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$
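The four gate types can be written as small Python functions. This is a sketch rather than the CNORfuzzy code: the function names and parameter values are hypothetical, and each input is given its own (k, n) pair as a simplifying assumption.

```python
def hill(x, k, n):
    """Normalized Hill function: 0 at x = 0 and exactly 1 at x = 1."""
    return (k ** n + 1) * x ** n / (k ** n + x ** n)

def activate(a, k, n):
    """(a) Single activating input."""
    return hill(a, k, n)

def inhibit(a, k, n):
    """(b) Single inhibitory input: the activating response subtracted from 1."""
    return 1 - hill(a, k, n)

def and_gate(a, b, params_a, params_b):
    """(c) AND gate: the weaker of the two Hill responses limits the output."""
    return min(hill(a, *params_a), hill(b, *params_b))

def or_gate(a, b, params_a, params_b):
    """(d) OR gate: the stronger of the two Hill responses sets the output."""
    return max(hill(a, *params_a), hill(b, *params_b))

p = (0.5, 3)              # (k, n), hypothetical values
weak, strong = 0.2, 0.9   # two input activity levels

c_and = and_gate(weak, strong, p, p)
c_or = or_gate(weak, strong, p, p)
```

With one weak and one strong input, the AND gate returns the low response and the OR gate the high one, which is how an AND gate can reproduce a low but non-zero activation of the kind discussed for p38 in section 5.2.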

8.5 Logic ODEs

As in the case of cFL, CNORode uses phenomenological transfer functions (i.e. non-mechanistic normalized Hill functions) to describe the dynamics of a node's state as a function of its inputs. Using the examples in figure 10 again, these functions can be described as follows.

(a) An activating relationship takes the form

$$\dot{c} = \frac{1}{\tau}\left(B(a) - c\right),$$

where $\dot{c}$ is the development of c over time. B(a) is the normalized Hill function of the continuous variable a. This takes the form

$$B(a) = \frac{a^n / (k^n + a^n)}{1^n / (k^n + 1^n)}$$

(k and n are again the sensitivity and Hill coefficients, respectively). τ can be interpreted as the maximum value of species c (biologically, this could encompass degradation or other limiting factors), and there is an additional degradation term proportional to c.

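(b) An inhibitory relationship uses the Hill term subtracted from 1, i.e. 1 − B(a) takes the place of B(a), so that c relaxes toward 0 when a is fully active:

$$\dot{c} = \frac{1}{\tau}\left([1 - B(a)] - c\right)$$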

(c) The AND gates take the form

$$\dot{c} = \frac{1}{\tau}\left(B(a)B(b) - c\right)$$

(d) The OR gate notation is as follows:

$$\dot{c} = \frac{1}{\tau}\left(B(a)B(b) + B(a)[1 - B(b)] + B(b)[1 - B(a)] - c\right)$$

In the case of an AND gate, the product of B(a) and B(b) is taken, which maintains consistency in the output with the equivalent Boolean model (i.e. if a = 1 and b = 0, in both ODE and logic formalisms c = 0; similarly with an OR gate, if a = 1 and b = 0, in both ODE and logic formalisms c = 1). As in the case of cFL, normalized Hill functions can approximate commonly observed biochemical dynamics such as linear, sigmoidal and step-like behavior.
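The Boolean consistency of the AND and OR gates can be checked numerically with a short Python sketch. Forward Euler integration with constant inputs is used purely for simplicity and is not how CNORode integrates its ODEs; the parameter values (k = 0.5, n = 3, τ = 1) and the function names are arbitrary assumptions. Note that the OR expression in (d) simplifies algebraically to B(a) + B(b) − B(a)B(b).

```python
def hill(x, k, n):
    """Normalized Hill function B(x): B(0) = 0 and B(1) = 1."""
    return (x ** n / (k ** n + x ** n)) / (1.0 / (k ** n + 1.0))

def and_rhs(a, b, c, tau, k=0.5, n=3):
    """Right-hand side of the AND-gate ODE: (B(a)B(b) - c) / tau."""
    return (hill(a, k, n) * hill(b, k, n) - c) / tau

def or_rhs(a, b, c, tau, k=0.5, n=3):
    """Right-hand side of the OR-gate ODE, written as in item (d)."""
    ba, bb = hill(a, k, n), hill(b, k, n)
    return (ba * bb + ba * (1 - bb) + bb * (1 - ba) - c) / tau

def integrate(rhs, a, b, tau=1.0, c0=0.0, dt=0.01, t_end=20.0):
    """Forward Euler with constant inputs a and b; returns c at t_end."""
    c = c0
    for _ in range(int(t_end / dt)):
        c += dt * rhs(a, b, c, tau)
    return c

# Boolean corner case discussed in the text: a = 1, b = 0.
c_and = integrate(and_rhs, a=1.0, b=0.0)
c_or = integrate(or_rhs, a=1.0, b=0.0)
```

For a = 1 and b = 0 the AND gate stays at 0 and the OR gate converges to 1, matching the Boolean outputs; intermediate input values give intermediate steady states.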

Acknowledgments

The authors thank J Banga, J Egea, Inna Pertsovskaya and Melody Morris for valuable help and discussion. Funding was provided by the EU-7FP-BioPreDyn and EMBL-EIPOD programs.

References

Aislyn D W and Boran R I 2010 Systems approaches to polypharmacology and drug discovery Curr. Opin. Drug Discov. Dev. 13 297–309

Albert I et al 2008 Boolean network simulations for life scientists Source Code Biol. Med. 3 16

Aldridge B B et al 2006 Physicochemical modelling of cell signalling pathways Nature Cell Biol. 8 1195–203

Assmann S M and Albert R 2009 Discrete dynamic modeling with asynchronous update, or how to model complex systems in the absence of quantitative information Methods Mol. Biol. 553 207–25

Bader G D, Cary M P and Sander C 2006 Pathguide: a pathway resource list Nucleic Acids Res. 34 D504–6 (Database issue)

Banga J 2008 Optimization in computational systems biology BMC Syst. Biol. 2 47

Barabasi A-L and Oltvai Z N 2004 Network biology: understanding the cell's functional organization Nature Rev. Genet. 5 101–13

Bauer-Mehren A, Furlong L I and Sanz F 2009 Pathway databases and tools for their exploitation: benefits, current limitations and challenges Mol. Syst. Biol. 5 290

Bossi A and Lehner B 2009 Tissue specificity and the human protein interaction network Mol. Syst. Biol. 5 260

Calzone L et al 2008 A comprehensive modular map of molecular interactions in RB/E2F pathway Mol. Syst. Biol. 4 173

Calzone L et al 2010 Mathematical modelling of cell-fate decision in response to death receptor engagement PLoS Comput. Biol. 6 e1000702

Chen W W, Niepel M and Sorger P K 2010 Classic and contemporary approaches to modeling biochemical reactions Genes Dev. 24 1861–75


de Jong H 2002 Modeling and simulation of genetic regulatory systems: a literature review J. Comput. Biol. 9 67–103

Di Cara A et al 2007 Dynamic simulation of regulatory networks using SQUAD BMC Bioinformatics 8 462

Eduati F et al 2010 A Boolean approach to linear prediction for signaling network modeling PLoS One 5 e12789

Egea J and Martí R 2010 An evolutionary method for complex-process optimization Comput. Oper. Res. 37 315–24

Faure A et al 2006 Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle Bioinformatics 22 e124–31

Fisher J and Henzinger T A 2007 Executable cell biology Nature Biotechnol. 25 1239–49

Garg A et al 2008 Synchronous versus asynchronous modeling of gene regulatory networks Bioinformatics 24 1917–25

Gonzalez A, Chaouiya C and Thieffry D 2008 Logical modelling of the role of the Hh pathway in the patterning of the Drosophila wing disc Bioinformatics 24 i234–40

Helikar T et al 2008 Emergent decision-making in biological signal transduction networks Proc. Natl Acad. Sci. USA 105 1913–8

Hlavacek W S et al 2006 Rules for modeling signal-transduction systems Sci. STKE 2006 re6

Hoffmann A et al 2002 The IkappaB–NF–kappaB signaling module: temporal control and selective gene activation Science 298 1241–5

Hucka M et al 2003 The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models Bioinformatics 19 524–31

Ilsley G R, Luscombe N M and Apweiler R 2009 Know your limits: assumptions, constraints and interpretation in systems biology Biochim. Biophys. Acta 1794 1280–7

Jørgensen C and Linding R 2010 Simplistic pathways or complex networks? Curr. Opin. Genet. Dev. 20 15–22

Junker B H, Klukas C and Schreiber F 2006 VANTED: a system for advanced data analysis and visualization in the context of biological networks BMC Bioinformatics 7 109

Kauffman S A 1969 Metabolic stability and epigenesis in randomly constructed genetic nets J. Theor. Biol. 22 437–67

Khoury G A, Baliban R C and Floudas C A 2011 Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database Sci. Rep. 1 90

Kirouac D C et al 2012 Creating and analyzing pathway and protein interaction compendia for modelling signal transduction networks BMC Syst. Biol. 6 29

Klamt S, Saez-Rodriguez J and Gilles E D 2007 Structural and functional analysis of cellular networks with CellNetAnalyzer BMC Syst. Biol. 1 2

Klamt S et al 2006 A methodology for the structural and functional analysis of signaling and regulatory networks BMC Bioinformatics 7 56

Kreeger P K and Lauffenburger D A 2010 Cancer systems biology: a network modeling perspective Carcinogenesis 31 2–8

Krumsiek J et al 2010 Odefy - from discrete to continuous models BMC Bioinformatics 11 233

Liberali P, Ramo P and Pelkmans L 2008 Protein kinases: starting a molecular systems view of endocytosis Annu. Rev. Cell Dev. Biol. 24 501–23

Ma'ayan A et al 2005 Formation of regulatory patterns during signal propagation in a mammalian cellular network Science 309 1078–83

Mendelson E 1970 Boolean Algebra and Switching Circuits (Schaum's Outline Series) (New York: McGraw-Hill)

Mendoza L and Xenarios I 2006 A method for the generation of standardized qualitative dynamical systems of regulatory networks Theor. Biol. Med. Modelling 3 13

Mitsos A et al 2009 Identifying drug effects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data PLoS Comput. Biol. 5 e1000591

Morris M K et al 2010 Logic-based models for the analysis of cell signaling networks Biochemistry 49 3216–24

Morris M K et al 2011a Training signaling pathway maps to biochemical data with constrained fuzzy logic: quantitative analysis of liver cell responses to inflammatory stimuli PLoS Comput. Biol. 7 e1001099

Morris M K et al 2011b Querying quantitative logic models (Q2LM) to study intracellular signaling networks and cell-cytokine interactions Biotechnol. J. 7 374–86

Novere N L et al 2009 The systems biology graphical notation Nature Biotechnol. 27 735–41

Oda K et al 2005 A comprehensive pathway map of epidermal growth factor receptor signaling Mol. Syst. Biol. 1 2005.0010

Patlak M 2010 Competitors try collaboration to speed drug development J. Natl Cancer Inst. 102 841–3

Pieroni E et al 2008 Protein networking: insights into global functional organization of proteomes Proteomics 8 799–816

Rangamani P and Iyengar R 2008 Modelling cellular signalling systems Essays Biochem. 45 83–94

Saez-Rodriguez J et al 2009 Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction Mol. Syst. Biol. 5 331

Saez-Rodriguez J, Alexopoulos L G and Stolovitzky G 2011a Setting the standards for signal transduction research Sci. Signal. 4 pe10

Saez-Rodriguez J et al 2011b Comparing signaling networks between normal and transformed hepatocytes using discrete logical models Cancer Res. 71 5400–11

Sahin O et al 2009 Modeling ERBB receptor-regulated G1/S transition to find novel targets for de novo trastuzumab resistance BMC Syst. Biol. 3 1

Sasagawa S et al 2005 Prediction and validation of the distinct dynamics of transient and sustained ERK activation Nature Cell Biol. 7 365–73

Schlatter R et al 2009 ON/OFF and beyond - a boolean model of apoptosis PLoS Comput. Biol. 5 e1000595

Shmulevich I et al 2003 Steady-state analysis of genetic regulatory networks modelled by probabilistic boolean networks Comp. Funct. Genomics 4 601–8

Terfve C and Saez-Rodriguez J 2012 Modeling signaling networks using high-throughput phospho-proteomics Adv. Exp. Med. Biol. 736 19–57

Thakar J and Albert R 2010 Boolean models of within-host immune interactions Curr. Opin. Microbiol. 13 377–81

Vinayagam A et al 2011 A directed protein interaction network for investigating intracellular signal transduction Sci. Signal. 4 rs8

Watterson S, Marshall S and Ghazal P 2008 Logic models of pathway biology Drug Discov. Today 13 447–56

Wittmann D M et al 2009 Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling BMC Syst. Biol. 3 98



Phys Biol 9 (2012) 045003 A MacNamara et al

Figure 1. The prior knowledge network (PKN) used as the starting point for the toy model, depicted with VANTED (Junker et al 2006) in an SBGN activity flow format (Novere et al 2009). The experimental design is annotated as colored nodes: inputs are shown in green, inhibited nodes in red and readouts in blue (Raf-1 is annotated as both red and blue as it is inhibited and measured). Nodes that were compressed (see materials and methods section 8.1 and figures S2 and S3 (available from stacks.iop.org/PhysBio/9/045003/mmedia)) have a dashed outline. The data-generating model contained hyperedges from TRAF2 to MKK7 (dashed edges in figure) that were removed for the PKN to demonstrate how incomplete prior knowledge can affect fitting the data. The input to MKK4 is highlighted to demonstrate the concept of a hypergraph: MKK4 has two inputs, Map3K1 and Map3K7. However, both AND and OR can be necessary for MKK activation. This is represented by the two inputs expanding to three hyperedges (inset and supplementary figure S4 (available from stacks.iop.org/PhysBio/9/045003/mmedia)). The goal of CellNOptR is to find the subset of all possible hyperedges (logic gates) in the PKN that best explains the data.

(ii) It includes additional interactions (TNFR → PI3K, PI3K → Rac, Rac → Map3K1) not present in the data-generating model.

(iii) It is missing interactions (TRAF2 → ASK-1, ASK-1 → Map3K7) that are present in the data-generating model and are necessary to fully explain the in silico data.

The purpose of these gaps and errors in our ‘prior knowledge’ is to demonstrate the ability of CellNOptR to train context-specific models from unspecific prior knowledge, and also to demonstrate the limitations of such an approach when information is incomplete. We will also demonstrate how CellNOptR performs when trying to find the true network topology and model parameters by using the different logic model formalisms to simulate the ‘experimental data’, and hence demonstrate the strengths, weaknesses and underlying assumptions of each of the logic model formalisms in turn. The network has been designed such that the features uncovered by the logic formalisms are not confounded by the missing interactions.

2 Boolean steady state

In arguably the simplest case of data, an experimental design looking at a particular signal transduction network will consist of a set of measurements representing the phosphorylation state of a subset of proteins in the signaling network. These measurements will be taken before the addition of a stimulus or stimuli and at a single time point after stimulation (t = 0 and t = t1). Additionally, the effects of multiple conditions (inhibitions, perturbations) may also be examined with this design. This is a common approach when studying signal transduction, which has classically been used via low-throughput methods and has more recently been scaled up owing to new technological developments (Terfve and Saez-Rodriguez 2012).

[Figure 2 panels: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3k, raf1); time axis 0, 10, 30 min]

Figure 2. The in silico data generated to test each logic modeling formalism. The data were generated by a logic model (the data-generating model). Each row of the figure represents an experiment with a certain combination of stimuli and inhibitors (shown in the final two columns; black is ON, white is OFF). The simulated data are shown as a continuous black line. Gaussian noise was added (section 1.5 and materials and methods section 8.2) and the data were ‘sampled’ at 16 equally spaced time points between 0 and 30 min to simulate a fine-grained time course experimental design.

Choosing a single time point after stimulation leads to a simple design and minimizes the cost per experiment. However, it then becomes critical to choose an appropriate time t1 (see figures S5–7 (available from stacks.iop.org/PhysBio/9/045003/mmedia)). Ideally, one should perform a set of detailed time course experiments that encapsulates the variation in activation in the system, but this is usually not viable in terms of cost and time constraints. It may only be possible to perform a detailed time course experiment for a single phosphoprotein. From this, a time point can be chosen that is characteristic of the activation of the phosphoproteins of interest. Typically in signal transduction, a fast wave of activation occurs over a short timescale after stimulation. This is followed by slower, later mechanisms that often down-regulate the signals over a longer timescale (e.g. degradation, internalization, etc).

Returning to our example, the measurement of phosphorylated ERK could be viewed as a sensible output with which a time course can be obtained. (Its activation would be representative of the dynamics of the MAPK cascade, and it is technically a good choice because of the quality of ERK phosphosite-specific antibodies.) From this time course, we would see that two different timescales seem to exist: an early activation phase followed by a late phase. Thus, characteristic time points can be chosen (figure 2), and a reasonable early time point would be in the range 4–12 min. For argument’s sake, we will choose t1 = 10 min.

We can see from the data the difficulty with defining a characteristic time point, or how choosing a single time point may affect the ability to capture all dynamic features. For example, it is impossible to understand the oscillatory nature of NFκB translocation with a single time point, and ERK activation dynamics can only be partly representative of other phosphoprotein dynamics (even those closely related in function, such as Raf-1). For the oscillations of NFκB, one would need to sample at least every 2.5 min (since the period is 5 min), while to obtain an approximate sense of the transient activation of ERK, two well-chosen time points can be enough. In spite of this, steady-state measurement can give a qualitative overview of the system that allows for robust, albeit coarse-grained, conclusions with relatively few data points (and thus cost).
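As a rough check of the sampling argument above, the following sketch compares the spacing of a sampling design with the largest interval that still resolves an oscillation of a given period. It assumes the usual rule of at least two samples per period; the function names are hypothetical and are not part of CellNOptR.

```python
def max_sampling_interval(period_min: float) -> float:
    """Largest spacing (min) that still gives two samples per oscillation period."""
    return period_min / 2.0


def design_interval(t_start: float, t_end: float, n_points: int) -> float:
    """Spacing (min) of n_points equally spaced samples on [t_start, t_end]."""
    return (t_end - t_start) / (n_points - 1)


# NFkB oscillation period quoted in the text: 5 min.
limit = max_sampling_interval(5.0)
# In silico design described for figure 2: 16 equally spaced points from 0 to 30 min.
spacing = design_interval(0.0, 30.0, 16)
print(f"limit = {limit} min, design spacing = {spacing} min, resolves: {spacing <= limit}")
```

Under these assumptions the 16-point design samples every 2 min, which sits inside the 2.5 min limit, whereas a single time point at 10 min cannot reveal the oscillation at all.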

2.1 Steady-state optimization and simulation

One way to measure a model’s ability to fit experimental data with a single time point, such as that described above, is to assume that the system has reached a pseudo-steady state at that point: the fast reactions have already occurred, while the slow reactions have not yet significantly affected the network’s behavior (Klamt et al 2006). This approximation implies that the flux through the system (in our case, the phosphorylation cascade in signal transduction) has stabilized and the quantities of phosphorylated proteins are no longer varying to a significant degree. With this assumption, a model of this system can be simulated until it has also reached a steady state.

[Figure 3 panels: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3k, raf1); time axis 0, 10, 30 min]

Figure 3. The fit of the trained model using the Boolean steady-state formalism. The simulated data are shown as two blue circles (t0 and t1) connected by a blue dotted line. The colors represent the goodness of fit between the model and the data at t1 = 10. Heat-map coloration is used to signify the range from high error (red, normalized mean squared error (MSE) = 1) to no error (white, MSE = 0). t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CellNOptR took 180 s.

With the in silico data (figure 2) as our starting point, the PKN (figure 1) was trained using the steady-state model formalism at t1 = 10 min. Details about the node states and transfer functions of this formalism (Boolean steady state) are summarized in section 8.3. Figure 3 shows the steady-state simulation overlaid on the experimental data.
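To make the idea of simulating a Boolean model until it reaches a steady state concrete, the following is a minimal sketch under stated assumptions. It is not the CellNOptR simulator: the node names, the rule dictionaries and the function `simulate_to_steady_state` are hypothetical, and the network is a compressed fragment of the toy model (EGF → SOS-1 → Raf-1 → ERK).

```python
from typing import Callable, Dict, Optional

State = Dict[str, bool]
Rules = Dict[str, Callable[[State], bool]]


def simulate_to_steady_state(rules: Rules, initial: State,
                             max_steps: int = 100) -> Optional[State]:
    """Synchronously update all ruled nodes until the state stops changing.

    Nodes without a rule (the stimuli) keep their initial value.
    Returns None if no steady state is found within max_steps.
    """
    state = dict(initial)
    for _ in range(max_steps):
        nxt = dict(state)
        for node, rule in rules.items():
            nxt[node] = rule(state)  # every node reads the previous state
        if nxt == state:
            return state
        state = nxt
    return None


start = {"egf": True, "sos1": False, "raf1": False, "erk": False}

# Feed-forward cascade (compressed): EGF -> SOS-1 -> Raf-1 -> ERK.
cascade: Rules = {
    "sos1": lambda s: s["egf"],
    "raf1": lambda s: s["sos1"],
    "erk": lambda s: s["raf1"],
}
steady = simulate_to_steady_state(cascade, start)

# Same cascade with negative feedback ERK -| SOS-1: the state keeps cycling.
with_feedback: Rules = dict(cascade, sos1=lambda s: s["egf"] and not s["erk"])
oscillating = simulate_to_steady_state(with_feedback, start)

print(steady)
print(oscillating)
```

In this sketch the feed-forward cascade settles with every node ON, while the variant with negative feedback never settles and returns `None`. This is one way to see why a purely steady-state fit has no use for feedback hyperedges.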

2.2 Interpretation of steady-state result

The Boolean steady-state formalism used by CellNOptR for optimization recovers most of the underlying ‘true’ network and hence gives a good steady-state approximation of the in silico data (see figures 3 and 8). However, there are some exceptions that highlight the limitations of steady-state measurements. Using this formalism, CellNOptR cannot identify the NFκB oscillations caused by feedback: hyperedges that cause negative feedback are penalized in CellNOptR, as a steady state cannot be reached when they are present. Another limitation is that the state of each element in the model is limited to 0/1 (either switched on or off). Hence, intermediate levels of activation cannot be simulated (such as p38 activation under TNFα and EGF stimulation). Finally, the effect of the missing pathway from TNFα to AP1 is observed when the experimental measurement cannot be explained with TNFα stimulation in the absence of EGF stimulation.

Thus, the strength of steady-state Boolean logic depends heavily on the assumptions underlying the data. If one has enough knowledge of the data and biochemistry such that the assumption of steady state is a fair one to make, training a network to data using steady-state Boolean logic modeling can uncover cell-specific behavior, for example differences between cancer and normal cells (Saez-Rodriguez et al 2011b). Another advantage is the scalability of such an approach: because the method is parameter-free, large networks can be trained under a large number of conditions.

3 Two time points (or additional steady state)

As mentioned in section 2, it is quite common in signaling networks to observe a transient behavior where a species is quickly activated and subsequently deactivated. Such a dynamic obviously cannot be captured with a steady-state approach where only one time point is considered. Therefore, in the above section this issue was avoided by only modeling ‘fast events’, i.e. the activation phase of the signal propagation. However, when information about more than one time point is available and such a fast activation followed by slow deactivation (or indeed any combination of slower and faster processes) is observed, then it is possible to also capture these processes while keeping the simplifying assumption of steady states. In essence, it is assumed that multiple pseudo-steady states reflect the mechanisms that are acting at different timescales, and they can be optimized independently. We will illustrate this with the CellNOptR implementation for two timescales, but the approach is extendable to more than two time points.

[Figure 4 panels: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3k, raf1); time axis 0, 10, 30 min]

Figure 4. The fit of the model at two time points, t1 and t2, using the two steady-state approach. Again, the colors are representative of the fit, this time at t1 = 10 and t2 = 30. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CellNOptR took 240 s.

Defining suitable time points that adequately represent the process timescales that we want to model is a similar problem to what was discussed above for a unique steady state, with the added complexity of having to choose more than one point that is consistent for all modeled species. This can be guided by prior knowledge, e.g. if it is known that a receptor is activated on a fast timescale (e.g. 30 min for full activation) by phosphorylation and then deactivated by slow internalization and degradation (e.g. 2 h for full silencing of the signal). However, in general it is better to develop a detailed time course as stated above. In our case, again using ERK, we would say that a second measurement at 20–30 min would be adequate; 30 min was used for the sake of argument (see figure 4).

3.1 Multiple steady-state optimization and simulation

In CellNOptR, a model of a system with two steady states at different timescales is simulated by assuming that a subset of the hyperedges (interactions) only become active at a later time point, that is, they operate on a different timescale (Klamt et al 2006). That being the case, the two time points can be optimized separately. In practice, this means that the optimization is done in two steps:

(i) The scaffold model (the model after compression of non-essential nodes and expansion of all possible hyperedges; see figure 1 and materials and methods section 8.1) derived from the PKN is used to train the model against the data at t1, thereby identifying hyperedges that best reproduce the data at this time point.

(ii) Hyperedges that were not selected as active at t1 are used as the search space for training the model at t2. For simulation (and therefore testing the model fit), candidate models are tested by using the steady state of the t1 model as an initial state, then computing the steady state from there, including candidate t2 hyperedges. There is also the additional constraint that whenever hyperedges at t1 and t2 influence a node in contradicting ways, the t2 hyperedge overrules the t1 hyperedge and the state of the target node is locked to the state defined by the t2 hyperedge.

Besides the additional constraint of the overriding hyperedges described above, the node states and transfer functions are calculated in the same way as in the Boolean steady-state formalism (section 8.3).

3.2 Interpretation

In our example, we can see that the two steady-state optimization finds the feedback from ERK back to SOS-1 (figure 8). Hence, from figure 4, the transient activation of Raf-1, ERK and AP1 is captured in the trained model. Using a single characteristic time point, a model that includes the negative feedback from ERK to SOS-1 at t1 would not be selected, as the branch never reaches a stable steady state because of oscillation. However, if we say that the branch is active at t1 but that the negative feedback is only active at t2, and that when active this negative feedback permanently turns SOS-1 off, then the model does reach a steady state at t1 (where SOS-1, Raf-1 and ERK are all ON) and a different steady state at t2 (where SOS-1, Raf-1 and ERK are all OFF as a result of the activated negative feedback).
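The ERK to SOS-1 example above can be sketched in a few lines. This is a simplified illustration rather than the CellNOptR implementation: the t2 hyperedges are represented as rules evaluated once on the t1 steady state, their targets are then locked, and the remaining nodes are iterated to a new steady state. All names (`steady_state`, `two_steady_states`, the node labels) are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

State = Dict[str, bool]
Rules = Dict[str, Callable[[State], bool]]


def steady_state(rules: Rules, initial: State, locked: Optional[State] = None,
                 max_steps: int = 100) -> Optional[State]:
    """Synchronous updates until nothing changes; locked nodes are never updated."""
    locked = locked or {}
    state = {**initial, **locked}
    for _ in range(max_steps):
        nxt = dict(state)
        for node, rule in rules.items():
            if node not in locked:
                nxt[node] = rule(state)
        if nxt == state:
            return state
        state = nxt
    return None


def two_steady_states(t1_rules: Rules, t2_rules: Rules,
                      initial: State) -> Tuple[Optional[State], Optional[State]]:
    """Steady state with fast (t1) rules, then with slow (t2) rules overriding."""
    s1 = steady_state(t1_rules, initial)
    if s1 is None:
        return None, None
    # A t2 hyperedge overrules the t1 hyperedge and locks its target node.
    locked = {node: rule(s1) for node, rule in t2_rules.items()}
    s2 = steady_state(t1_rules, s1, locked=locked)
    return s1, s2


t1_rules: Rules = {
    "sos1": lambda s: s["egf"],
    "raf1": lambda s: s["sos1"],
    "erk": lambda s: s["raf1"],
}
# Slow negative feedback: active ERK switches SOS-1 off at t2.
t2_rules: Rules = {"sos1": lambda s: not s["erk"]}

s1, s2 = two_steady_states(
    t1_rules, t2_rules,
    {"egf": True, "sos1": False, "raf1": False, "erk": False},
)
print("t1:", s1)
print("t2:", s2)
```

With these assumptions, SOS-1, Raf-1 and ERK are all ON at t1 and all OFF at t2, while the EGF stimulus stays ON throughout, which is the transient behavior described in the text.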

4 Synchronous multiple time-point simulation and multiple timescales

As discussed in section 3, by measuring at two characteristic time points, the trained logic model is capable of finding the slow negative feedback from ERK to SOS-1 and therefore moves a step closer to understanding the ‘true’ network. However, the oscillations of NFκB still cannot be explained with the pseudo-steady-state formalism, as it is necessary to use the full time course data (and not just two time points) to observe this effect. This can be modeled by a discrete time Boolean model that is available as an add-on R package to CellNOptR: CNORdt (discrete time).

4.1 Synchronous and asynchronous updating

CNORdt introduces some variation in how time is handled in the model. Instead of simulating and fitting data at steady states, it is capable of fitting time course data by using an additional model parameter together with a synchronous updating scheme.

Synchronous updating is where all nodes are updated simultaneously during model simulation; hence each node at time t is a function of its input nodes at t − 1 (see section 8.3). This is the updating scheme used in CellNOptR. An alternative method is asynchronous updating, where nodes are updated in a random or non-synchronous order, depending on the asynchronous method used. This leads to different simulation properties depending on the updating method chosen. Synchronous updates are deterministic, and simulations run under the same conditions (inputs and perturbations) will reach the same steady state (or attractor) each time. In contrast, asynchronous updating introduces stochasticity into the system, such that different steady states can be reached from the same starting conditions. The random updating of node values is one possible application of asynchronicity. This enables sampling over all timescales (any reaction can be deemed to be slowest or fastest), thus avoiding the constraint inherent in synchronous simulations of an equal timescale over all reactions. However, this added complexity can make results difficult to interpret (Garg et al 2008). Mixed synchronous/asynchronous updating is an intermediate approach that can stratify reaction groups according to their known reaction rates, thus taking advantage of a priori knowledge and reducing the complexity of a fully asynchronous approach (Faure et al 2006, Albert et al 2008, Assmann and Albert 2009, Garg et al 2008).
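The difference between the two updating schemes can be illustrated with a small sketch. It is an assumption-laden toy, not taken from CellNOptR or any of the cited tools: two hypothetical nodes `a` and `b` inhibit each other, and `async_step` takes the update order explicitly so that the effect of the order is easy to see (a stochastic scheme would draw this order at random).

```python
from typing import Callable, Dict, Sequence

State = Dict[str, bool]
Rules = Dict[str, Callable[[State], bool]]


def sync_step(rules: Rules, state: State) -> State:
    """All nodes are updated at once from the state at t - 1."""
    return {**state, **{node: rule(state) for node, rule in rules.items()}}


def async_step(rules: Rules, state: State, order: Sequence[str]) -> State:
    """Nodes are updated one at a time; later nodes see earlier updates."""
    nxt = dict(state)
    for node in order:
        nxt[node] = rules[node](nxt)
    return nxt


# Toy mutual inhibition: a -| b and b -| a.
rules: Rules = {"a": lambda s: not s["b"], "b": lambda s: not s["a"]}
start = {"a": False, "b": False}

sync_1 = sync_step(rules, start)          # both nodes switch ON together
sync_2 = sync_step(rules, sync_1)         # and OFF together: a two-state cycle
a_first = async_step(rules, start, ["a", "b"])
b_first = async_step(rules, start, ["b", "a"])

print("synchronous:", sync_1, sync_2)
print("asynchronous, a first:", a_first)
print("asynchronous, b first:", b_first)
```

In this toy network the synchronous scheme always follows the same cycle, whereas the asynchronous scheme reaches one of two different steady states depending only on which node happens to be updated first.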

CNORdt introduces a scaling parameter that defines the timescale of the Boolean synchronous simulation. Where each ‘tick’ (t) (or simulation step) is the synchronous updating of all nodes in the model according to their inputs at t − 1, the scaling parameter defines the ‘tick’ frequency relative to the timescale of the real data. Although this is a crude approach (i.e. it implies a single rate across all reactions), it allows us to fit a synchronous Boolean simulation to data. Hence, all data points can be fitted to the model, and hyperedges that cause feedback in the model can be included, which allows the model to reveal more complex dynamics such as oscillations. CNORdt still describes the node states as either on or off (1/0), and the transfer functions are calculated as in section 8.3. The scaling parameter is applied to the simulation of the system and hence does not affect the transfer functions themselves.

Figures 5 and 8 show how the NFκB oscillations can be predicted by fitting a dynamic logic model to the full time course and maintaining the two steady-state assumptions from section 3, i.e. simulating ‘fast’ reactions from t = 0 to t = 10 and ‘slow’ reactions from t = 10 to t = 30.
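The role of a scaling parameter can be sketched as follows. This is only an illustration of the idea, not the CNORdt algorithm: the mapping from measurement time to simulation tick (`int(t * ticks_per_min)`), the toy NFκB/IκB feedback loop, the noise-free data and the candidate values are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence

State = Dict[str, bool]
Rules = Dict[str, Callable[[State], bool]]


def boolean_trajectory(rules: Rules, initial: State, n_ticks: int) -> List[State]:
    """Synchronous Boolean simulation for a fixed number of ticks."""
    states = [dict(initial)]
    for _ in range(n_ticks):
        prev = states[-1]
        nxt = dict(prev)
        for node, rule in rules.items():
            nxt[node] = rule(prev)
        states.append(nxt)
    return states


def scaled_mse(rules: Rules, initial: State, node: str, times: Sequence[float],
               data: Sequence[float], ticks_per_min: float) -> float:
    """MSE between one node's Boolean trajectory and time course data."""
    n_ticks = int(max(times) * ticks_per_min)
    traj = boolean_trajectory(rules, initial, n_ticks)
    total = 0.0
    for t, y in zip(times, data):
        tick = int(t * ticks_per_min)  # assumed mapping from minutes to ticks
        total += (float(traj[tick][node]) - y) ** 2
    return total / len(times)


# Toy negative feedback loop: NFkB induces IkB, IkB switches NFkB off.
rules: Rules = {"nfkb": lambda s: not s["ikb"], "ikb": lambda s: s["nfkb"]}
initial = {"nfkb": False, "ikb": False}

times = [0, 1, 2, 3, 4, 5, 6, 7, 8]   # minutes (hypothetical design)
data = [0, 1, 1, 0, 0, 1, 1, 0, 0]    # hypothetical noise-free NFkB readout

candidates = [0.25, 0.5, 1.0]         # ticks per minute to try
errors = {c: scaled_mse(rules, initial, "nfkb", times, data, c) for c in candidates}
best = min(errors, key=errors.get)
print(errors, "best:", best)
```

Because the scaling is a single number shared by every reaction, choosing it amounts to a one-dimensional search on top of the topology search, which is what makes the approach crude but cheap.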

5 Constrained fuzzy logic

One of the main limitations of Boolean logic models is that the assumption of a single level of activation (species can only be on/off) is biochemically unrealistic. Fuzzy logic is another logic modeling formalism that allows for intermediate levels of activation. It was originally developed in the field of control theory for predicting the outputs of complex processes where inputs could only partially be characterized (Morris et al 2011a). Its strength lies in the flexibility it affords when defining relationships between input and output nodes. This flexibility can also be a weakness if a large number of parameters are required to define these functional relationships. Constrained fuzzy logic (cFL) deals with this potential complexity by limiting the repertoire of relationships between nodes. The cFL formalism used in CellNOpt (CNORfuzzy) is fully described in Morris et al (2011a). Briefly, the relationships (or transfer functions) between nodes in cFL are limited to Hill functions. Hence, each transfer function has two free parameters: the Hill coefficient n, which controls the steepness of the function, and the sensitivity parameter k, which determines the midpoint of the function (i.e. the value of the input that produces half the maximal output). By varying these two parameters, linear, sigmoidal and step-like dynamics can be produced that are good approximations to protein–protein interactions and enzymatic reactions. In CNORfuzzy, further constraints are imposed by initially limiting the possible parameter combinations to a subset of discrete values. Details of the transfer functions used can be found in materials and methods section 8.4.

[Figure 5 panels: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3k, raf1); time axis 0, 10, 30 min]

Figure 5. The fit of the model at multiple time points using fast (t = 0 to t = 10) and slow (from t = 10 to t = 30) timescales. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORdt took 300 s.

5.1 Model training and simulation

Model training and simulation in CNORfuzzy are carried out in a similar manner to the Boolean steady-state formalism. After compression and expansion of the logic hypergraph, a genetic algorithm determines transfer functions and a network topology that minimize the MSE between the model and the data at steady state. This is followed by a number of refinement steps that fine-tune the Hill function parameters and reduce the complexity of the network topology. The in silico data and model fit at t1 = 10 are shown in figure 6.
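The two Hill parameters that the refinement steps tune can be illustrated with a short sketch. It uses the plain Hill form x^n / (k^n + x^n), chosen here because it gives exactly half-maximal output at x = k; the exact (possibly normalized) form used by CNORfuzzy is given in Morris et al (2011a) and may differ. The function name and parameter values are hypothetical.

```python
def hill(x: float, n: float, k: float) -> float:
    """Plain Hill function: 0 at x = 0, one half at x = k, approaching 1 for x >> k."""
    return x ** n / (k ** n + x ** n)


inputs = [0.0, 0.25, 0.5, 0.75, 1.0]

# Small n gives a shallow, near-linear response; large n approaches a step at k.
shallow = [hill(x, n=1.0, k=0.5) for x in inputs]
steep = [hill(x, n=10.0, k=0.5) for x in inputs]

for x, lo, hi in zip(inputs, shallow, steep):
    print(f"x = {x:.2f}  n = 1: {lo:.3f}  n = 10: {hi:.3f}")
```

Both curves pass through one half at x = k = 0.5; only the steepness around that midpoint changes with n. This is the sense in which two parameters per edge can cover linear, sigmoidal and step-like responses.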

5.2 Interpretation

CNORfuzzy is capable of fitting intermediate values (figure 6). For most cases, the cFL model generates similar fits to the steady-state Boolean model. However, the fit to data is more accurate since the values are continuous and not limited to 0 or 1. More importantly, the cFL model obtains a better fit for p38, as it uncovers a link in the structure that Boolean models are unable to capture. In the ‘true’ network, TNFα and EGF are both required to activate p38 (albeit the activation is low relative to the other signals). In the previous Boolean formalisms, this low activation of p38 cannot be modeled, as the simulation can only take the values 0/1. However, CNORfuzzy is capable of adding the hyperedge ‘Map3K1 AND Map3K7 → MKK4’ (figure 8) to explain this activation and hence move a step closer to finding the underlying true network.

The CNORfuzzy model fit also illustrates some caveats associated with fuzzy logic. We can see that CNORfuzzy also retains the Map3K7 → p38 hyperedge (figure 8), thus activating p38 with TNFα stimulation alone (i.e. in the absence of EGF stimulation). This occurs as CNORfuzzy attempts to fit the noisy signal of inactive p38, thus adding a hyperedge that is not present in the data-generating model. CNORfuzzy also adds hyperedges from TNFα to AP1 that convey a weak activating signal to compensate for the missing hyperedges (TRAF2 → ASK-1, ASK-1 → Map3K7) from the PKN (figure 1). These examples illustrate the sensitivity of the cFL approach to the data quality, and this can make interpretation of the results more subtle and difficult (Morris et al 2011a).

6 Logic ODEs

The Boolean logic formalisms described above can qualitatively fit the network topology and logic gates that best describe the underlying data. cFL can add quantitative information by its ability to fit intermediate values between 0 and 1 at steady state. In terms of time, however, all these formalisms rely on discrete simulations. To obtain a fully continuous model, both in state and time, CNORode adds to these methods by transforming a discrete logic model to a continuous model. It does this by defining a set of ODEs for each model species. There are several formalisms to convert discrete logic to continuous models (e.g. SQUAD (Di Cara et al 2007)) or hybrid models (e.g. piecewise linear models (de Jong 2002)). CellNOpt includes the method developed by Wittmann et al (2009) that was implemented in Matlab as Odefy (Krumsiek et al 2010).

[Figure 6 panels: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3k, raf1); time axis 0, 10, 30 min]

Figure 6. The fit of the trained model at t1 = 10 using constrained fuzzy logic. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORfuzzy took 1200 s.

6.1 Converting from Boolean to continuous

The approach used to convert Boolean to continuous models is fully explained in Wittmann et al (2009). Briefly, the goal is to simulate the full dynamics of each species in the logic model while retaining consistency with the Boolean representation. What this means is that where the output of a logic gate is 0 or 1, the ODEs replacing a Boolean state should also return to 0 or 1. This is achieved in a similar manner to cFL (but with an additional parameter τ) by applying a normalized Hill function on the interval between 0 and 1. Applying these functions to each hyperedge defines a new continuous ODE model to replace the underlying Boolean model. This is more fully explained in section 8.5.
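For a single activating edge, the conversion described above can be sketched as the ODE dx/dt = (f(u) − x)/τ, where u is the input level and f is a Hill function rescaled so that f(0) = 0 and f(1) = 1. The sketch below is an illustration under these assumptions, integrated with a simple forward Euler scheme; it is not the CNORode or Odefy code, multi-input gates are not covered, and the parameter values are hypothetical.

```python
from typing import List


def normalized_hill(x: float, n: float, k: float) -> float:
    """Hill function rescaled so that f(0) = 0 and f(1) = 1."""
    return (x ** n / (x ** n + k ** n)) * (1.0 + k ** n)


def simulate_edge(u: float, n: float, k: float, tau: float, t_end: float,
                  dt: float = 0.01, x0: float = 0.0) -> List[float]:
    """Forward Euler integration of dx/dt = (f(u) - x) / tau for a constant input u."""
    steps = int(round(t_end / dt))
    target = normalized_hill(u, n, k)
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x += dt * (target - x) / tau
        trajectory.append(x)
    return trajectory


# Input fully ON or fully OFF for 30 min; tau sets how fast the target responds.
on = simulate_edge(u=1.0, n=3.0, k=0.5, tau=2.0, t_end=30.0)
off = simulate_edge(u=0.0, n=3.0, k=0.5, tau=2.0, t_end=30.0)
print(f"ON input:  x(30) = {on[-1]:.4f}")
print(f"OFF input: x(30) = {off[-1]:.4f}")
```

With a Boolean input of 0 or 1 the continuous state relaxes to the same Boolean value, which is the consistency requirement stated in the text. Intermediate inputs give intermediate steady states, and τ only changes how quickly they are approached.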

6.2 Parameter estimation

CNORode currently provides links to two stochastic non-local optimization algorithms: a genetic algorithm (genalg package, http://cran.r-project.org/web/packages/genalg) and an implementation in R of scatter search (Egea and Martí 2010). These are used to fit the Hill function parameters k and n and the ODE parameter τ to each logic gate in a model that has been already topologically optimized by one or more of the other formalisms.

6.3 Compressing an ODE model

Compression of the model before training may lead to the loss of elements important to capture dynamic features and must thus be done with caution. Returning to our example (figure 2), the in silico data were generated through a set of normalized Hill functions. Hence, with the exception of AP1 (where the missing hyperedge prohibits any exact simulation of this signal), CNORode should be capable of simulating exactly the other signals in the system after parameter optimization of the associated logic ODEs. However, this may not be possible when the model is compressed. To give an example, in our toy model (figure 1) the pathway consisting of SOS-1, Ras, Raf-1, MEK 1 and ERK is compressed to SOS-1 → Raf-1 → ERK. The in silico data were generated with ODEs describing the uncompressed interactions. We can see from figure 7 that the compressed model can accurately simulate the in silico data for this pathway (Raf-1 and ERK signals). In this case, the normalized Hill functions have enough dynamic plasticity to summarize four interactions (SOS-1 → Ras → Raf-1 → MEK 1 → ERK) as two (SOS-1 → Raf-1 → ERK).

[Figure 7 panels: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3k, raf1); time axis 0, 10, 30 min]

Figure 7. The fit of the trained model using CNORode. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The parameter training in CNORode took 2000 s.

However, this is not the case where we have feedback from ERK through a phosphatase (ph) back to SOS-1 and from NFκB through expression (ex) back to IκB. In these cases, it is necessary to not compress ‘ph’ and ‘ex’ to allow CNORode to model the correct dynamics (transience and oscillations, respectively). The non-compression is required as ‘ph’ and ‘ex’ are integral to the dynamics observed in the in silico data. So, figures 7 and 8 show, with the exception of AP1, that CNORode can accurately model the in silico data of the toy model once compression of those key nodes is suppressed.

7 Summary and future developments

In this contribution, we have reviewed different logic-based approaches to model signal transduction networks. Recent developments in proteomics techniques, from antibody-based methods (xMAP, protein arrays, high-throughput microscopy, etc) to mass spectrometry methods (Terfve and Saez-Rodriguez 2012), allow us to generate a large amount of phosphoproteomic data. Given the size of the underlying networks, we believe that logic-based models, which do not need extensive biochemical detail and thus lead to tractable models even when dealing with multiple pathways, are a useful approach to analyzing signal transduction on a large scale. Therefore, we have focused our work on how to train logic models to experimental data and implemented various methodologies toward this end in our tool CellNOptR.

Our recent developments presented here expand our previous work by including strategies to deal with the inherent dynamic nature of signaling processes (and hence with time series data). We have discussed how modeling dynamic aspects requires more detailed formalisms (and thus, in general, more data and computational time) and how the general methodology has to be re-evaluated at multiple levels, in particular the compression of the network prior to the optimization; hence we are currently working to develop a general compression routine for dynamic models. Another area of active development is the implementation of efficient optimization strategies to identify both structure and (if existing) continuous parameters (Banga 2008). Although we have covered here a broad palette of logic-based formalisms, we plan to explore other approaches. Some are combinations of what we have discussed (e.g. a cFL formalism simulated over multiple timescales), others are formalisms related to those used here (e.g. SQUAD), and others could add new features such as a probabilistic framework (Shmulevich et al 2003), stochasticity (Albert et al 2008) or formal methodologies (Fisher and Henzinger 2007).

For the sake of simplicity, we have used a toy model that is itself based on a logic formalism to exemplify the potential dynamic behavior and thereby different modeling variants. We are currently working on more realistic benchmarks based on biochemical models and studying in more detail the role of experimental noise and experimental design in recovering the underlying model structure.


Figure 8. The contribution of each logic modeling formalism to the understanding of the model used to simulate the in silico training data. The time taken for training the model using each formalism is also shown.

As illustrated in our example with the link TRAF2 → ASK-1 → MKK7, databases are comprehensive but not complete, and it is therefore likely that important links are missing from the system of interest (Kirouac et al 2012). To overcome this limitation, we are working on strategies to integrate as many network resources as possible. These include methods that propose novel links that expand the prior knowledge network (Saez-Rodriguez et al 2009, Eduati et al 2010) and the use of information from protein interaction networks (PINs) (Vinayagam et al 2011).

The focus of CellNOptR is the calibration of logic models to data, but a large set of other tools exist that analyze logic models from different angles (Morris et al 2010). For example, the Q2LM toolbox (Morris et al 2011b) uses cFL to understand the effect of perturbations in the context of the whole system under investigation (e.g. under what set of stimuli is a therapeutic perturbation most effective?). CellNetAnalyzer (Klamt et al 2007) has a battery of methods from graph theory as well as specific techniques for logic models. These include minimal intervention sets (the minimum number of perturbations for a desired phenotype) to propose possible therapeutic targets. These tools use the same model format as CellNOptR, so it is easy to pass models for analysis. More generally, we are part of the CoLoMoTo initiative, which aims to facilitate interoperability among these tools; the main goal here is the development of SBML-qual as a language to exchange logic models (sbml.org/Community/Wiki/SBML_Level_3_Proposals/Qualitative_Models), as well as the implementation of the SBGN format for network representation (Novere et al 2009).

In general, efficient integration of data and prior knowledge to model signal transduction requires the use of appropriate standards for data, prior knowledge about the networks, and the models themselves (Saez-Rodriguez et al 2011a). We consider that logic models will be an area of development in the future, with increasing application to signal transduction research.


Phys Biol 9 (2012) 045003 A MacNamara et al

Figure 9. An overview of the graphical representation of logic models. (A) The SOP expression for the activation of C, summarized as an XOR gate. (B) SOP expressions describing the activation of C and D. (C) An example of a hypergraph representation, where the nodes are connected by hyperedges.

8. Materials and methods

8.1. CellNOptR

As mentioned in section 1.4, CellNOptR includes some additional steps in pre-processing logic models before simulation and training to data. The details of these steps can be found in Saez-Rodriguez et al (2009). Briefly, the model is compressed by removing non-identifiable elements. These include nodes on terminal branches that are not part of the experimental design (non-observables; figure 1: p90RSK and CREB), nodes that are not affected by the inputs or perturbations (non-controllables) and additional nodes that can be removed without affecting logic outcome during simulation (figure 1: Ras, MEK 1, etc).

After this compression step, a superstructure of all possible hyperedges is created (figure 1, inset). This superstructure contains ‘the space’ of hyperedges that is optimized (through the removal of redundant hyperedges) by training to the experimental data. The training uses a genetic algorithm to search for logic models that minimize a two-part (bipartite) objective function. This function includes the MSE between the simulation of the optimized logic model and the data, and a penalty term for model size. Depending on the formalism used (see the main text), the simulation and data may be at steady state (CellNOptR, CNORfuzzy) or all data points can be used (CNORdt). The resulting logic model is then a subset of the superstructure and contains only the hyperedges that best explain the experimental data (with the additional attribute of parsimony, given the size penalty in the optimization function).
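The objective described above can be illustrated with a minimal Python sketch. It is not the CellNOptR implementation: the function name, the normalization of the size term by the size of the superstructure and the default penalty weight are assumptions made for illustration only.

```python
def objective(simulated, measured, n_selected, n_total, size_penalty=1e-4):
    """Two-part score: mean squared error plus a penalty on model size.

    simulated, measured: equal-length sequences of values in [0, 1].
    n_selected: number of hyperedges kept in the candidate model.
    n_total: number of hyperedges in the superstructure.
    size_penalty: weight of the size term (hypothetical default).
    """
    if len(simulated) != len(measured) or not measured:
        raise ValueError("simulated and measured must be non-empty and aligned")
    mse = sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(measured)
    return mse + size_penalty * n_selected / n_total


# Two candidate models with the same fit: the smaller one scores better.
small = objective([1.0, 0.0], [1.0, 0.0], n_selected=2, n_total=4, size_penalty=0.1)
large = objective([1.0, 0.0], [1.0, 0.0], n_selected=4, n_total=4, size_penalty=0.1)
```

Because the penalty only separates models with similar fits, a small weight is enough to favor parsimonious models without trading away accuracy.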

8.2. Network and data generation

The toy model was constructed manually and is based on the model from Saez-Rodriguez et al (2011a). The in silico data were generated from the toy model using CNORode. The parameters were manually adjusted to model as closely as possible the known dynamics of ERK and NFκB activation. After simulation, noise was added to each data point according to $N(\mu, \sigma^2)$, where $\mu = 0$ and $\sigma^2 = 0.05$. The data were then rescaled to the interval [0, 1]. Two methods of cross validation were also performed to demonstrate the robustness of CellNOptR (steady-state Boolean) to sparseness in the data (figure S8 (available from stacks.iop.org/PhysBio/9/045003/mmedia)).
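The noise and rescaling step can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the function name is hypothetical, the variance is read as 0.05, and min-max rescaling is assumed because the text does not say how the data were mapped to [0, 1].

```python
import random


def add_noise_and_rescale(values, variance=0.05, seed=0):
    """Add zero-mean Gaussian noise, then min-max rescale to [0, 1]."""
    rng = random.Random(seed)           # seeded for reproducibility
    sigma = variance ** 0.5             # random.gauss expects a standard deviation
    noisy = [v + rng.gauss(0.0, sigma) for v in values]
    lo, hi = min(noisy), max(noisy)
    if hi == lo:                        # degenerate case: constant signal
        return [0.0 for _ in noisy]
    return [(v - lo) / (hi - lo) for v in noisy]


noisy_course = add_noise_and_rescale([0.0, 0.5, 1.0, 0.2])
```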

Model and data files, together with the corresponding R scripts, can currently be found at http://www.ebi.ac.uk/~aidanmac/public/logicModelingTutorial (password: tutorial).

8.3. Boolean logic

A Boolean model can be represented as follows.

(1) $N$ species $X_1, X_2, \ldots, X_N$, each represented by a variable $x_i$ taking values 0 or 1.

(2) For each species $X_i$, there is a subset of species $R_i = \{X_{i1}, X_{i2}, \ldots, X_{iN_i}\} \subset \{X_1, X_2, \ldots, X_N\}$ that influence $x_i$.

(3) For each species $X_i$, an update function $B_i : \{0, 1\}^{N_i} \to \{0, 1\}$.

From this set of rules, the state of each species at time $t + 1$ is a function of the state of its influencing species at time $t$ (Kauffman 1969).

So how does the function $B_i$ (also called a transfer function) for each species $X_i$ deal with inputs from other nodes? $B_i$ can be represented in a sum-of-product (SOP) formulation (Mendelson 1970), which allows for multiple possible inputs (AND, NOT, OR gates) to be processed into a single output. To illustrate this, consider the following simple example (figure 9).

We know that the element D is activated by a combination of A and B (i.e. both A and B are needed for activation). Hence, both the graphical and written representations of this activation are relatively straightforward:

$$B_1(a, b) = a \land b.$$

However, the activation of C occurs when A is active without B or when B is active without A. In this case, one needs some additional rules of representation.

The SOP representation allows the above activation to be written using only AND, NOT and OR operators:

$$B_1(a, b) = (a \land \lnot b) \lor (\lnot a \land b).$$

Figure 10. The construction of gates with cFL: (a) activating, (b) inhibitory, (c) an AND gate and (d) an OR gate.

This is done by calculating the product within brackets and summing between brackets. Essentially, SOP representations are rules of precedence for complex multi-node inputs. In terms of graphical representation, the activation of C cannot be easily represented using standard SBGN AND, NOT or OR operators (figure 9). Hence, this SOP expression can be summarized as an XOR gate.

A logic network in which relations are encoded by SOP expressions can be represented as a hypergraph (Klamt et al 2006). A hypergraph is defined as a set of nodes connected by hyperedges, where a hyperedge is a generalization of an edge that can connect more than two nodes. This in turn can facilitate a more precise representation of biological knowledge (for example, where two proteins are necessary for the activation of a target).
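The SOP and hypergraph ideas above can be made concrete with a small Python sketch. The data layout (a list of hyperedges, each with signed inputs and one target) and the names `HYPEREDGES` and `evaluate` are assumptions chosen for illustration; they are not the model format used by CellNOptR or CellNetAnalyzer.

```python
# Each hyperedge is (inputs, target). `inputs` maps a source node to True
# (the source must be ON) or False (the source must be OFF, i.e. negated).
HYPEREDGES = [
    ({"A": True, "B": True}, "D"),    # D = A AND B
    ({"A": True, "B": False}, "C"),   # C = (A AND NOT B) ...
    ({"A": False, "B": True}, "C"),   # ... OR (NOT A AND B), i.e. XOR
]


def evaluate(target, state, hyperedges=HYPEREDGES):
    """Sum-of-products: AND within a hyperedge, OR across hyperedges."""
    return any(
        all(state[node] == required for node, required in inputs.items())
        for inputs, edge_target in hyperedges
        if edge_target == target
    )


# Truth table for C over all input combinations (A, B).
xor_table = {
    (a, b): evaluate("C", {"A": a, "B": b})
    for a in (False, True)
    for b in (False, True)
}
```

Each AND clause corresponds to one hyperedge, so removing a hyperedge during training removes exactly one product term from the SOP expression of its target.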

8.4. Fuzzy logic

cFL defines the transfer function between nodes as a Hill function. Depending on the type of interaction (or logic gate, figure 10), this function can take different forms (Morris et al 2011a).

(a) If node C depends only on A, a normalized Hill function is used to calculate $c$, where $k$ and $n$ are the sensitivity coefficient and Hill coefficient, respectively:

$$c = \frac{(k^n + 1)\,a^n}{k^n + a^n}.$$

(b) An inhibitory relationship is represented as the above expression subtracted from 1:

$$c = 1 - \frac{(k^n + 1)\,a^n}{k^n + a^n}.$$

(c) For an AND gate, the minimum of the input transfer functions is used, where each input has its own parameters $(k_1, n_1)$ and $(k_2, n_2)$:

$$c = \min\left( \frac{(k_1^{n_1} + 1)\,a^{n_1}}{k_1^{n_1} + a^{n_1}},\; \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}} \right).$$

(d) And for an OR gate, the maximum value is used:

$$c = \max\left( \frac{(k_1^{n_1} + 1)\,a^{n_1}}{k_1^{n_1} + a^{n_1}},\; \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}} \right).$$
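The four gate types above translate directly into code. The following Python sketch is illustrative only: the function names are hypothetical and the parameter values in the example are arbitrary, not values fitted by CNORfuzzy.

```python
def hill(x, k, n):
    """Normalized Hill function: 0 at x = 0 and 1 at x = 1 for k, n > 0."""
    return (k ** n + 1.0) * x ** n / (k ** n + x ** n)


def activate(a, k, n):
    """(a) Single activating input."""
    return hill(a, k, n)


def inhibit(a, k, n):
    """(b) Single inhibitory input: the activating form subtracted from 1."""
    return 1.0 - hill(a, k, n)


def and_gate(a, b, k1, n1, k2, n2):
    """(c) AND gate: minimum of the two input transfer functions."""
    return min(hill(a, k1, n1), hill(b, k2, n2))


def or_gate(a, b, k1, n1, k2, n2):
    """(d) OR gate: maximum of the two input transfer functions."""
    return max(hill(a, k1, n1), hill(b, k2, n2))


# Arbitrary example parameters: both inputs use k = 0.5 and n = 3.
partial_and = and_gate(0.5, 1.0, 0.5, 3, 0.5, 3)
partial_or = or_gate(0.5, 1.0, 0.5, 3, 0.5, 3)
```

At the Boolean corner points (inputs equal to 0 or 1) these gates reproduce the corresponding Boolean truth tables, which is what allows cFL models to be compared with the Boolean formalisms.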

8.5. Logic ODEs

As in the case of cFL, CNORode uses phenomenological transfer functions (i.e. non-mechanistic normalized Hill functions) to describe the dynamics of a node's state as a function of its inputs. Using the examples in figure 10 again, these functions can be described as follows.

(a) For an activating relationship,

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a) - c\bigr),$$

where $\dot{c}$ is the development of $c$ over time and $B(a)$ is the normalized Hill function of the continuous variable $a$. This takes the form

$$B(a) = \frac{a^n / (k^n + a^n)}{1^n / (k^n + 1^n)}$$

($k$ and $n$ are again the sensitivity and Hill coefficients, respectively). $\tau$ sets the timescale on which $c$ approaches $B(a)$ (biologically, this could encompass degradation or other limiting factors), and the term $-c/\tau$ acts as a degradation term proportional to $c$.

(b) An inhibitory relationship uses the transfer function above subtracted from 1:

$$\dot{c} = \frac{1}{\tau}\,\bigl(1 - B(a) - c\bigr).$$

(c) The AND gates take the form

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a)B(b) - c\bigr).$$

(d) The OR gate notation is as follows:

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a)B(b) + B(a)[1 - B(b)] + B(b)[1 - B(a)] - c\bigr).$$

In the case of an AND gate, the product of $B(a)$ and $B(b)$ is taken, which maintains consistency in the output with the equivalent Boolean model (i.e. if $a = 1$ and $b = 0$, in both ODE and logic formalisms $c = 0$; similarly with an OR gate, if $a = 1$ and $b = 0$, in both ODE and logic formalisms $c = 1$). As in the case of cFL, normalized Hill functions can approximate commonly observed biochemical dynamics such as linear, sigmoidal and step-like behavior.
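The consistency between the ODE gates and their Boolean counterparts can be checked numerically. The sketch below integrates the AND and OR equations with a plain forward Euler scheme, which is chosen here for brevity and is an assumption of this example; a production tool would use a proper ODE solver. Function names, parameter values and step sizes are likewise illustrative.

```python
def hill(x, k, n):
    """Normalized Hill function B(x), with B(0) = 0 and B(1) = 1."""
    return (x ** n / (k ** n + x ** n)) / (1.0 / (k ** n + 1.0))


def dcdt_and(a, b, c, tau, k, n):
    """Right-hand side of the AND-gate ODE."""
    return (hill(a, k, n) * hill(b, k, n) - c) / tau


def dcdt_or(a, b, c, tau, k, n):
    """Right-hand side of the OR-gate ODE."""
    ba, bb = hill(a, k, n), hill(b, k, n)
    return (ba * bb + ba * (1.0 - bb) + bb * (1.0 - ba) - c) / tau


def integrate(rhs, a, b, c0=0.0, tau=1.0, k=0.5, n=3.0, dt=0.01, steps=2000):
    """Forward Euler integration with constant inputs a and b."""
    c = c0
    for _ in range(steps):
        c += dt * rhs(a, b, c, tau, k, n)
    return c


# With a = 1 and b = 0 the AND gate stays OFF and the OR gate switches ON.
c_and = integrate(dcdt_and, a=1.0, b=0.0)
c_or = integrate(dcdt_or, a=1.0, b=0.0)
```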

Acknowledgments

The authors thank J Banga, J Egea, Inna Pertsovskaya and Melody Morris for valuable help and discussion. Funding was provided by the EU-7FP-BioPreDyn and EMBL-EIPOD programs.

References

Aislyn D W and Boran R I 2010 Systems approaches to polypharmacology and drug discovery Curr Opin Drug Discov Dev 13 297–309

Albert I et al 2008 Boolean network simulations for life scientists Source Code Biol Med 3 16

Aldridge B B et al 2006 Physicochemical modelling of cell signalling pathways Nature Cell Biol 8 1195–203

Assmann S M and Albert R 2009 Discrete dynamic modeling with asynchronous update or how to model complex systems in the absence of quantitative information Methods Mol Biol 553 207–25

Bader G D, Cary M P and Sander C 2006 Pathguide: a pathway resource list Nucleic Acids Res 34 D504–6 (Database issue)

Banga J 2008 Optimization in computational systems biology BMC Syst Biol 2 47

Barabasi A-L and Oltvai Z N 2004 Network biology: understanding the cell’s functional organization Nature Rev Genet 5 101–13

Bauer-Mehren A, Furlong L I and Sanz F 2009 Pathway databases and tools for their exploitation: benefits, current limitations and challenges Mol Syst Biol 5 290

Bossi A and Lehner B 2009 Tissue specificity and the human protein interaction network Mol Syst Biol 5 260

Calzone L et al 2008 A comprehensive modular map of molecular interactions in RB/E2F pathway Mol Syst Biol 4 173

Calzone L et al 2010 Mathematical modelling of cell-fate decision in response to death receptor engagement PLoS Comput Biol 6 e1000702

Chen W W, Niepel M and Sorger P K 2010 Classic and contemporary approaches to modeling biochemical reactions Genes Dev 24 1861–75


de Jong H 2002 Modeling and simulation of genetic regulatory systems: a literature review J Comput Biol 9 67–103

Di Cara A et al 2007 Dynamic simulation of regulatory networks using SQUAD BMC Bioinformatics 8 462

Eduati F et al 2010 A Boolean approach to linear prediction for signaling network modeling PLoS One 5 e12789

Egea J and Martí R 2010 An evolutionary method for complex-process optimization Comput Oper Res 37 315–24

Faure A et al 2006 Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle Bioinformatics 22 e124–31

Fisher J and Henzinger T A 2007 Executable cell biology Nature Biotechnol 25 1239–49

Garg A et al 2008 Synchronous versus asynchronous modeling of gene regulatory networks Bioinformatics 24 1917–25

Gonzalez A, Chaouiya C and Thieffry D 2008 Logical modelling of the role of the Hh pathway in the patterning of the Drosophila wing disc Bioinformatics 24 i234–40

Helikar T et al 2008 Emergent decision-making in biological signal transduction networks Proc Natl Acad Sci USA 105 1913–8

Hlavacek W S et al 2006 Rules for modeling signal-transduction systems Sci STKE 2006 re6

Hoffmann A et al 2002 The IkappaB–NF–kappaB signaling module: temporal control and selective gene activation Science 298 1241–5

Hucka M et al 2003 The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models Bioinformatics 19 524–31

Ilsley G R, Luscombe N M and Apweiler R 2009 Know your limits: assumptions, constraints and interpretation in systems biology Biochim Biophys Acta 1794 1280–7

Jørgensen C and Linding R 2010 Simplistic pathways or complex networks? Curr Opin Genet Dev 20 15–22

Junker B H, Klukas C and Schreiber F 2006 VANTED: a system for advanced data analysis and visualization in the context of biological networks BMC Bioinformatics 7 109

Kauffman S A 1969 Metabolic stability and epigenesis in randomly constructed genetic nets J Theor Biol 22 437–67

Khoury G A, Baliban R C and Floudas C A 2011 Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database Sci Rep 1 90

Kirouac D C et al 2012 Creating and analyzing pathway and protein interaction compendia for modelling signal transduction networks BMC Syst Biol 6 29

Klamt S, Saez-Rodriguez J and Gilles E D 2007 Structural and functional analysis of cellular networks with CellNetAnalyzer BMC Syst Biol 1 2

Klamt S et al 2006 A methodology for the structural and functional analysis of signaling and regulatory networks BMC Bioinformatics 7 56

Kreeger P K and Lauffenburger D A 2010 Cancer systems biology: a network modeling perspective Carcinogenesis 31 2–8

Krumsiek J et al 2010 Odefy - from discrete to continuous models BMC Bioinformatics 11 233

Liberali P, Ramo P and Pelkmans L 2008 Protein kinases: starting a molecular systems view of endocytosis Annu Rev Cell Dev Biol 24 501–23

Ma’ayan A et al 2005 Formation of regulatory patterns during signal propagation in a mammalian cellular network Science 309 1078–83

Mendelson E 1970 Boolean Algebra and Switching Circuits (Schaum’s Outline Series) (New York: McGraw-Hill)

Mendoza L and Xenarios I 2006 A method for the generation of standardized qualitative dynamical systems of regulatory networks Theor Biol Med Modelling 3 13

Mitsos A et al 2009 Identifying drug effects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data PLoS Comput Biol 5 e1000591

Morris M K et al 2010 Logic-based models for the analysis of cell signaling networks Biochemistry 49 3216–24

Morris M K et al 2011a Training signaling pathway maps to biochemical data with constrained fuzzy logic: quantitative analysis of liver cell responses to inflammatory stimuli PLoS Comput Biol 7 e1001099

Morris M K et al 2011b Querying quantitative logic models (Q2LM) to study intracellular signaling networks and cell–cytokine interactions Biotechnol J 7 374–86

Novere N L et al 2009 The systems biology graphical notation Nature Biotechnol 27 735–41

Oda K et al 2005 A comprehensive pathway map of epidermal growth factor receptor signaling Mol Syst Biol 1 2005.0010

Patlak M 2010 Competitors try collaboration to speed drug development J Natl Cancer Inst 102 841–3

Pieroni E et al 2008 Protein networking: insights into global functional organization of proteomes Proteomics 8 799–816

Rangamani P and Iyengar R 2008 Modelling cellular signalling systems Essays Biochem 45 83–94

Saez-Rodriguez J et al 2009 Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction Mol Syst Biol 5 331

Saez-Rodriguez J, Alexopoulos L G and Stolovitzky G 2011a Setting the standards for signal transduction research Sci Signal 4 pe10

Saez-Rodriguez J et al 2011b Comparing signaling networks between normal and transformed hepatocytes using discrete logical models Cancer Res 71 5400–11

Sahin O et al 2009 Modeling ERBB receptor-regulated G1/S transition to find novel targets for de novo trastuzumab resistance BMC Syst Biol 3 1

Sasagawa S et al 2005 Prediction and validation of the distinct dynamics of transient and sustained ERK activation Nature Cell Biol 7 365–73

Schlatter R et al 2009 ON/OFF and beyond - a boolean model of apoptosis PLoS Comput Biol 5 e1000595

Shmulevich I et al 2003 Steady-state analysis of genetic regulatory networks modelled by probabilistic boolean networks Comp Funct Genomics 4 601–8

Terfve C and Saez-Rodriguez J 2012 Modeling signaling networks using high-throughput phospho-proteomics Adv Exp Med Biol 736 19–57

Thakar J and Albert R 2010 Boolean models of within-host immune interactions Curr Opin Microbiol 13 377–81

Vinayagam A et al 2011 A directed protein interaction network for investigating intracellular signal transduction Sci Signal 4 rs8

Watterson S, Marshall S and Ghazal P 2008 Logic models of pathway biology Drug Discov Today 13 447–56

Wittmann D M et al 2009 Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling BMC Syst Biol 3 98



Figure 2. The in silico data generated to test each logic modeling formalism. The data were generated by a logic model (the data-generating model). Each row of the figure represents an experiment with a certain combination of stimuli and inhibitors (shown in the final two columns; black is ON, white is OFF). The simulated data are shown as a continuous black line. Gaussian noise was added (section 1.5 and materials and methods, section 8.2) and the data were ‘sampled’ at 16 equally spaced time points between 0 and 30 min to simulate a fine-grained time course experimental design. (Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3ki, raf1i); time axis 0 to 30 min.)

owing to new technological developments (Terfve and Saez-Rodriguez 2012)

Choosing a single time point after stimulation leads to a simple design and minimizes the cost per experiment. However, it then becomes critical to choose an appropriate time t1 (see figures S5–7 (available from stacks.iop.org/PhysBio/9/045003/mmedia)). Ideally, one should perform a set of detailed time course experiments that encapsulates the variation in activation in the system, but this is usually not viable in terms of cost and time constraints. It may only be possible to perform a detailed time course experiment for a single phosphoprotein. From this, a time point can be chosen that is characteristic of the activation of the phosphoproteins of interest. Typically in signal transduction, a fast wave of activation occurs over a short timescale after stimulation. This is followed by slower, later mechanisms that often down-regulate the signals over a longer timescale (e.g. degradation, internalization, etc).

Returning to our example, the measurement of phosphorylated ERK could be viewed as a sensible output with which a time course can be obtained. (Its activation would be representative of the dynamics of the MAPK cascade, and it is technically a good choice because of the quality of ERK phosphosite-specific antibodies.) From this time course, we would see that two different timescales seem to exist: an early activation phase followed by a late phase. Thus, characteristic time points can be chosen (figure 2), and a reasonable early time point would be in the range 4–12 min. For argument’s sake, we will choose t1 = 10 min.

We can see from the data the difficulty of defining a characteristic time point, and how choosing a single time point may affect the ability to capture all dynamic features. For example, it is impossible to understand the oscillatory nature of NFκB translocation with a single time point, and ERK activation dynamics can only be partly representative of other phosphoprotein dynamics (even those closely related in function, such as Raf1). For the oscillations of NFκB, one would need to sample at least every 2.5 min (since the period of the oscillation is 5 min), while to obtain an approximate sense of the transient activation of ERK, two well-chosen time points can be enough. In spite of this, steady-state measurement can give a qualitative overview of the system that allows for robust, albeit coarse-grained, conclusions with relatively few data points (and thus cost).

2.1. Steady-state optimization and simulation

One way to measure a model’s ability to fit experimental data with a single time point, such as that described above, is to make the assumption that the system reaches a pseudo-steady state at that point: the fast reactions have already occurred, while the slow reactions have not yet significantly affected the network’s behavior (Klamt et al 2006). This approximation implies that the flux through the system (in our case, the phosphorylation cascade in signal transduction) has stabilized and the quantities of phosphorylated proteins are no longer varying to a significant degree. With this assumption, a model of this system can be simulated until it has also reached a steady state.

Figure 3. The fit of the trained model using the Boolean steady-state formalism. The simulated data are shown as two blue circles (t0 and t1) connected by a blue dotted line. The colors represent the goodness of fit between the model and the data at t1 = 10. Heat-map coloration is used to signify the range from high error (red, normalized mean squared error (MSE) = 1) to no error (white, MSE = 0). t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CellNOptR took 180 s. (Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3ki, raf1i); time axis 0 to 30 min.)

With the in silico data (figure 2) as our starting point, the PKN (figure 1) was trained using the steady-state model formalism at t1 = 10 min. Details about the node states and transfer functions of this formalism (Boolean steady state) are summarized in section 8.3. Figure 3 shows the steady-state simulation overlaid on the experimental data.
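Simulating a Boolean model until it reaches a steady state can be sketched in a few lines of Python. The three-node cascade and the self-inhibiting node below are hypothetical toy examples, not the network of figure 1, and the function name and iteration cap are assumptions of this sketch.

```python
def simulate_to_steady_state(update_rules, state, max_steps=100):
    """Synchronously update all nodes until the state stops changing.

    update_rules maps a node name to a function of the current state.
    Nodes without a rule (e.g. stimuli) keep their initial value.
    Returns the steady state, or None if none is reached in max_steps.
    """
    state = dict(state)
    for _ in range(max_steps):
        new_state = dict(state)
        for node, rule in update_rules.items():
            new_state[node] = rule(state)   # every rule reads the old state
        if new_state == state:
            return state
        state = new_state
    return None


# Toy cascade: egf -> raf -> erk (node names are illustrative).
cascade = {"raf": lambda s: s["egf"], "erk": lambda s: s["raf"]}
steady = simulate_to_steady_state(cascade, {"egf": 1, "raf": 0, "erk": 0})

# A node that inhibits itself oscillates and never reaches a steady state.
oscillator = {"x": lambda s: 1 - s["x"]}
no_steady = simulate_to_steady_state(oscillator, {"x": 0})
```

The second example shows why negative feedback is problematic for a steady-state formalism: the simulation has no fixed point to compare against the data.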

2.2. Interpretation of steady-state result

The Boolean steady-state formalism used by CellNOptR for optimization recovers most of the underlying ‘true’ network and hence gives a good steady-state approximation of the in silico data (see figures 3 and 8). However, there are some exceptions that highlight the limitations of steady-state measurements. Using this formalism, CellNOptR cannot identify the NFκB oscillations caused by feedback: hyperedges that cause negative feedback are penalized in CellNOptR, as a steady state cannot be reached when they are present. Another limitation is that the state of each element in the model is limited to 0/1 (either switched on or off). Hence, intermediate levels of activation cannot be simulated (such as p38 activation under TNFα and EGF stimulation). Finally, the effect of the missing pathway from TNFα to AP1 is observed when the experimental measurement cannot be explained with TNFα stimulation in the absence of EGF stimulation.

Thus, the strength of steady-state Boolean logic is strongly dependent on the assumptions underlying the data. If one has enough knowledge of the data and biochemistry such that the assumption of steady state is a fair one to make, training a network to data using steady-state Boolean logic modeling can uncover cell-specific behavior, for example differences between cancer and normal cells (Saez-Rodriguez et al 2011a). Another advantage is the scalability of such an approach: because the method is parameter-free, large networks can be trained under a large number of conditions.

3. Two time points (or additional steady state)

As mentioned in section 2, it is quite common in signaling networks to observe a transient behavior, where a species is quickly activated and subsequently deactivated. Such a dynamic obviously cannot be captured with a steady-state approach where only one time point is considered. Therefore, in the above section this issue was avoided by only modeling ‘fast events’, i.e. the activation phase of the signal propagation. However, when information about more than one time point is available and such a fast activation followed by slow deactivation (or indeed any combination of slower and faster processes) is observed, then it is possible to also capture these processes while keeping the simplifying assumption of steady states. In essence, it is assumed that multiple pseudo-steady states reflect the mechanisms that are acting at different timescales, and they can be optimized independently. We will illustrate this with the CellNOptR implementation for two timescales, but the approach is extendable to more than two time points.

Figure 4. The fit of the model at two time points, t1 and t2, using the two steady-state approach. Again, the colors are representative of the fit, this time at t1 = 10 and t2 = 30. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CellNOptR took 240 s. (Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3ki, raf1i); time axis 0 to 30 min.)

Defining suitable time points that adequately represent the process timescales that we want to model is a similar problem to what was discussed above for a unique steady state, with the added complexity of having to choose more than one point that is consistent for all modeled species. This can be guided by prior knowledge, e.g. if it is known that a receptor is activated on a fast timescale (e.g. 30 min for full activation) by phosphorylation and then deactivated by slow internalization and degradation (e.g. 2 h for full silencing of the signal). However, in general it is better to develop a detailed time course, as stated above. In our case, again using ERK, we would say that a second measurement at 20–30 min would be adequate; 30 min was used for the sake of argument (see figure 4).

3.1. Multiple steady-state optimization and simulation

In CellNOptR, a model of a system with two steady states at different timescales is simulated by assuming that a subset of the hyperedges (interactions) only become active at a later time point; that is, they operate on a different timescale (Klamt et al 2006). That being the case, the two time points can be optimized separately. In practice, this means that the optimization is done in two steps.

(i) The scaffold model (the model after compression of non-essential nodes and expansion of all possible hyperedges; see figure 1 and materials and methods, section 8.1) derived from the PKN is used to train the model against the data at t1, thereby identifying hyperedges that best reproduce the data at this time point.

(ii) Hyperedges that were not selected as active at t1 are used as the search space for training the model at t2. For simulation (and therefore testing the model fit), candidate models are tested by using the steady state of the t1 model as an initial state, then computing the steady state from there, including candidate t2 hyperedges. There is also the additional constraint that whenever hyperedges at t1 and t2 influence a node in contradicting ways, the t2 hyperedge overrules the t1 hyperedge and the state of the target node is locked to the state defined by the t2 hyperedge.

Besides the additional constraint of the overriding hyperedges described above, the node states and transfer functions are calculated in the same way as in the Boolean steady-state formalism (section 8.3).

3.2. Interpretation

In our example, we can see that the two steady-state optimization finds the feedback from ERK back to SOS-1 (figure 8). Hence, from figure 4, the transient activation of Raf1, ERK and AP1 is captured in the trained model. Using a single characteristic time point, a model that includes the negative feedback from ERK to SOS-1 at t1 would not be selected, as the branch never reaches a stable steady state because of oscillation. However, if we say that the branch is active at t1 but that the negative feedback is only active at t2, and that when active this negative feedback permanently turns SOS-1 off, then the model does reach a steady state at t1 (where SOS-1, Raf-1 and ERK are all ON) and a different steady state at t2 (where SOS-1, Raf-1 and ERK are all OFF as a result of the activated negative feedback).
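The ERK to SOS-1 example can be reproduced with a short Python sketch of the two-timescale simulation. This is a simplified reading of the override rule, in which the t2 feedback locks its target OFF once its source is ON at t1; the node names, rule encoding and helper functions are assumptions for illustration and do not mirror the CellNOptR code.

```python
def steady_state(rules, state, locked=(), max_steps=50):
    """Synchronous simulation to a fixed point; locked nodes keep their value."""
    state = dict(state)
    for _ in range(max_steps):
        new_state = dict(state)
        for node, rule in rules.items():
            if node not in locked:
                new_state[node] = rule(state)
        if new_state == state:
            return state
        state = new_state
    return None


# Fast (t1) hyperedges of the branch: egf -> sos -> raf1 -> erk.
T1_RULES = {
    "sos": lambda s: s["egf"],
    "raf1": lambda s: s["sos"],
    "erk": lambda s: s["raf1"],
}


def simulate_two_timescales(egf):
    """Return the pseudo-steady states at t1 and t2 for a given stimulus."""
    start = {"egf": egf, "sos": 0, "raf1": 0, "erk": 0}
    t1 = steady_state(T1_RULES, start)
    # Slow (t2) hyperedge: erk -| sos. It contradicts egf -> sos, so it
    # overrules the t1 hyperedge and locks sos to the state it defines.
    t2_start = dict(t1)
    locked = set()
    if t1["erk"] == 1:
        t2_start["sos"] = 0
        locked.add("sos")
    t2 = steady_state(T1_RULES, t2_start, locked)
    return t1, t2


state_t1, state_t2 = simulate_two_timescales(egf=1)
```

Locking the target is what removes the oscillation: without the lock, SOS-1 would be switched back ON by EGF as soon as ERK turned OFF.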

4. Synchronous multiple time-point simulation and multiple timescales

As discussed in section 3, by measuring at two characteristic time points, the trained logic model is capable of finding the slow negative feedback from ERK to SOS-1 and therefore moves a step closer to understanding the ‘true’ network. However, the oscillations of NFκB still cannot be explained with the pseudo-steady-state formalism, as it is necessary to use the full time course data (and not just two time points) to observe this effect. This can be modeled by a discrete time Boolean model that is available as an add-on R package to CellNOptR: CNORdt (discrete time).

4.1. Synchronous and asynchronous updating

CNORdt introduces some variation in how time is handled in the model. Instead of simulating and fitting data at steady states, it is capable of fitting time course data by using an additional model parameter together with a synchronous updating scheme.

Synchronous updating is where all nodes are updated simultaneously during model simulation; hence each node at time t is a function of its input nodes at t − 1 (see section 8.3). This is the updating scheme used in CellNOptR. An alternative method is asynchronous updating, where nodes are updated in a random or non-synchronous order, depending on the asynchronous method used. This leads to different simulation properties depending on the updating method chosen. Synchronous updates are deterministic, and simulations run under the same conditions (inputs and perturbations) will reach the same steady state (or attractor) each time. In contrast, asynchronous updating introduces stochasticity into the system, such that different steady states can be reached from the same starting conditions. The random updating of node values is one possible application of asynchronicity. This enables sampling over all timescales (any reaction can be deemed to be slowest or fastest), thus avoiding the constraint inherent in synchronous simulations of an equal timescale over all reactions. However, this added complexity can make results difficult to interpret (Garg et al 2008). Mixed synchronous/asynchronous updating is an intermediate approach that can stratify reaction groups according to their known reaction rates, thus taking advantage of a priori knowledge and reducing the complexity of a fully asynchronous approach (Faure et al 2006, Albert et al 2008, Assmann and Albert 2009, Garg et al 2008).
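The difference between the two updating schemes can be seen in a two-node toy network in which each node inhibits the other. The Python sketch below is illustrative only: the network, the function names and the choice of updating one randomly selected node per step are assumptions, and other asynchronous schemes exist.

```python
import random

# Mutual inhibition: x = NOT y and y = NOT x (hypothetical toy network).
RULES = {"x": lambda s: 1 - s["y"], "y": lambda s: 1 - s["x"]}


def synchronous_step(state):
    """Update every node at once from the previous state."""
    return {node: rule(state) for node, rule in RULES.items()}


def asynchronous_step(state, rng):
    """Update one randomly chosen node, leaving the others unchanged."""
    node = rng.choice(sorted(RULES))
    new_state = dict(state)
    new_state[node] = RULES[node](state)
    return new_state


def run_async(state, seed, steps=20):
    """Run a seeded asynchronous simulation for a fixed number of steps."""
    rng = random.Random(seed)
    for _ in range(steps):
        state = asynchronous_step(state, rng)
    return state


start = {"x": 0, "y": 0}
after_two_sync = synchronous_step(synchronous_step(start))   # back at start
async_result = run_async(start, seed=1)
```

Started from both nodes OFF, the synchronous scheme alternates deterministically between (0, 0) and (1, 1), whereas the asynchronous scheme settles into one of two steady states, depending on which node happens to be updated first.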

CNORdt introduces a scaling parameter that defines the timescale of the Boolean synchronous simulation. Where each ‘tick’ (t) (or simulation step) is the synchronous updating of all nodes in the model according to their inputs at t − 1, the scaling parameter defines the ‘tick’ frequency relative to the timescale of the real data. Although this is a crude approach (i.e. it implies a single rate across all reactions), it allows us to fit a synchronous Boolean simulation to data. Hence, all data points can be fitted to the model, and hyperedges that cause feedback in the model can be included, which allows the model to reveal more complex dynamics such as oscillations. CNORdt still describes the node states as either on or off (1/0), and the transfer functions are calculated as in section 8.3. The scaling parameter is applied to the simulation of the system and hence does not affect the transfer functions themselves.
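The role of the scaling parameter can be sketched as a mapping from simulation ticks to measurement times. In the Python example below, the parameter is expressed as minutes per tick, which is one possible convention and an assumption of this sketch; the function names and the self-inhibiting toy node are likewise hypothetical and do not reflect the CNORdt interface.

```python
def simulate_ticks(rules, state, n_ticks):
    """Synchronous Boolean simulation; returns the state at every tick."""
    trajectory = [dict(state)]
    for _ in range(n_ticks):
        updates = {node: rule(state) for node, rule in rules.items()}
        state = {**state, **updates}
        trajectory.append(state)
    return trajectory


def sample_at(trajectory, times, minutes_per_tick):
    """Read the simulated state at each measurement time (in minutes)."""
    last = len(trajectory) - 1
    return [trajectory[min(int(t // minutes_per_tick), last)] for t in times]


def mse(simulated, measured):
    """Mean squared error between two equal-length sequences."""
    return sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(measured)


# Toy oscillator with a period of two ticks.
rules = {"nfkb": lambda s: 1 - s["nfkb"]}
trajectory = simulate_ticks(rules, {"nfkb": 0}, n_ticks=4)

times = [0.0, 2.5, 5.0, 7.5]      # measurement times in minutes
data = [0, 1, 0, 1]               # an oscillation with a 5 min period

fit_fast = [s["nfkb"] for s in sample_at(trajectory, times, minutes_per_tick=2.5)]
fit_slow = [s["nfkb"] for s in sample_at(trajectory, times, minutes_per_tick=5.0)]
error_fast = mse(fit_fast, data)
error_slow = mse(fit_slow, data)
```

Scanning the scaling parameter and keeping the value with the lowest error is one simple way to align the simulated ticks with real time; here a tick of 2.5 min reproduces the 5 min oscillation exactly, while a tick of 5 min does not.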

Figures 5 and 8 show how the NFκB oscillations can be predicted by fitting a dynamic logic model to the full time course and maintaining the two steady-state assumptions from section 3, i.e. simulating ‘fast’ reactions from t = 0 to t = 10 and ‘slow’ reactions from t = 10 to t = 30.

5. Constrained fuzzy logic

One of the main limitations of Boolean logic models is that the assumption of a single level of activation (species can only be on/off) is biochemically unrealistic. Fuzzy logic is another logic modeling formalism that allows for intermediate levels of activation. It was originally developed in the field of control theory for predicting the outputs of complex processes where inputs could only partially be characterized (Morris et al 2011a). Its strength lies in the flexibility it affords when defining relationships between input and output nodes. This flexibility can also be a weakness if a large number of parameters are required to define these functional relationships. Constrained fuzzy logic (cFL) deals with this potential complexity by limiting the repertoire of relationships between nodes. The cFL formalism used in CellNOpt (CNORfuzzy) is fully described in Morris et al (2011a). Briefly, the relationships (or transfer functions) between nodes in cFL are limited to Hill functions. Hence, each transfer function has two free parameters: the Hill coefficient n, which controls the steepness of the function, and the sensitivity parameter k, which determines the midpoint of the function (i.e. the value of the input that produces half the maximal output). By varying these two parameters, linear, sigmoidal and step-like dynamics can be produced that are good approximations to protein–protein interactions and enzymatic reactions. In CNORfuzzy, further constraints are imposed by initially limiting the possible parameter combinations to a subset of discrete values. Details of the transfer functions used can be found in materials and methods, section 8.4.

Figure 5. The fit of the model at multiple time points using fast (t = 0 to t = 10) and slow (from t = 10 to t = 30) timescales. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORdt took 300 s. (Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim (egf, tnfa), Inh (pi3ki, raf1i); time axis 0 to 30 min.)

5.1 Model training and simulation

Model training and simulation in CNORfuzzy are carried out in a similar manner to the Boolean steady-state formalism. After compression and expansion of the logic hypergraph, a genetic algorithm determines transfer functions and a network topology that minimize the MSE between the model and the data at steady state. This is followed by a number of refinement steps that fine-tune the Hill function parameters and reduce the complexity of the network topology. The in silico data and model fit at t1 = 10 are shown in figure 6.

5.2 Interpretation

CNORfuzzy is capable of fitting intermediate values (figure 6). For most cases, the cFL model generates similar fits to the steady-state Boolean model. However, the fit to data is more accurate since the values are continuous and not limited to 0 or 1. More importantly, the cFL model obtains a better fit for p38, as it uncovers a link in the structure that Boolean models are unable to capture. In the ‘true’ network, TNFα and EGF are both required to activate p38 (albeit the activation is low relative to the other signals). In the previous Boolean formalisms, this low activation of p38 cannot be modeled, as the simulation can only take the values 0/1. However, CNORfuzzy is capable of adding the hyperedge ‘Map3K1 AND Map3K7 → MKK4’ (figure 8) to explain this activation and hence move a step closer to finding the underlying true network.

The CNORfuzzy model fit also illustrates some caveats associated with fuzzy logic. We can see that CNORfuzzy also retains the Map3K7 → p38 hyperedge (figure 8), thus activating p38 with TNFα stimulation alone (i.e. in the absence of EGF stimulation). This occurs as CNORfuzzy attempts to fit the noisy signal of inactive p38, thus adding a hyperedge that is not present in the true network. CNORfuzzy also adds hyperedges from TNFα to AP1 that convey a weak activating signal to compensate for the missing hyperedges (TRAF2 → ASK-1, ASK-1 → Map3K7) from the PKN (figure 1). These examples illustrate the sensitivity of the cFL approach to the data quality, and this can make interpretation of the results more subtle and difficult (Morris et al 2011a).

6 Logic ODEs

The Boolean logic formalisms described above can qualitatively fit the network topology and logic gates that best describe the underlying data. cFL can add quantitative information by its ability to fit intermediate values between 0 and 1 at steady state. In terms of time, however, all these formalisms rely on discrete simulations. To obtain a fully continuous model, both in state and time, CNORode adds to these methods by transforming a discrete logic model to a continuous model. It does this by defining a set of ODEs for each model species. There are several formalisms to convert discrete logic to continuous models (e.g. SQUAD (Di Cara et al 2007)) or hybrid models (e.g. piecewise linear models (de Jong 2002)). CellNOpt includes the method developed by Wittmann et al (2009) that was implemented in Matlab as Odefy (Krumsiek et al 2010).

[Figure 6 plot: panels for Raf-1, ERK, AP1, GSK-3, p38 and NFκB, with Stim (egf, tnfa) and Inh (pi3ki, raf1i) columns; x-axis t = 0, 10, 30; y-axis tick at 0.5.]

Figure 6. The fit of the trained model at t1 = 10 using constrained fuzzy logic. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORfuzzy took 1200 s.

6.1 Converting from Boolean to continuous

The approach used to convert Boolean to continuous models is fully explained in Wittmann et al (2009). Briefly, the goal is to simulate the full dynamics of each species in the logic model while retaining consistency with the Boolean representation. This means that, where the output of a logic gate is 0 or 1, the ODEs replacing a Boolean state should also return 0 or 1. This is achieved in a similar manner to cFL (but with an additional parameter τ) by applying a normalized Hill function on the interval between 0 and 1. Applying these functions to each hyperedge defines a new continuous ODE model to replace the underlying Boolean model. This is more fully explained in section 8.5.

6.2 Parameter estimation

CNORode currently provides links to two stochastic non-local optimization algorithms: a genetic algorithm (genalg package, http://cran.r-project.org/web/packages/genalg) and an implementation in R of scatter search (Egea and Martí 2010). These are used to fit the Hill function parameters k and n and the ODE parameter τ to each logic gate in a model that has already been topologically optimized by one or more of the other formalisms.

6.3 Compressing an ODE model

Compression of the model before training may lead to the loss of elements that are important for capturing dynamic features and must thus be done with caution. Returning to our example (figure 2), the in silico data were generated through a set of normalized Hill functions. Hence, with the exception of AP1 (where the missing hyperedge prohibits any exact simulation of this signal), CNORode should be capable of simulating the other signals in the system exactly after parameter optimization of the associated logic ODEs. However, this may not be possible when the model is compressed. To give an example, in our toy model (figure 1) the pathway consisting of SOS-1, Ras, Raf-1, MEK 1 and ERK is compressed to SOS-1 → Raf-1 → ERK. The in silico data were generated with ODEs describing the uncompressed interactions. We can see from figure 7 that the compressed model can accurately simulate the in silico data for this pathway (Raf-1 and ERK signals). In this case, the normalized Hill functions have enough dynamic plasticity to summarize four interactions (SOS-1 → Ras → Raf-1 → MEK 1 → ERK) as two (SOS-1 → Raf-1 → ERK).

[Figure 7 plot: panels for Raf-1, ERK, AP1, GSK-3, p38 and NFκB, with Stim (egf, tnfa) and Inh (pi3ki, raf1i) columns; x-axis t = 0, 10, 30; y-axis tick at 0.5.]

Figure 7. The fit of the trained model using CNORode. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The parameter training in CNORode took 2000 s.

However, this is not the case where we have feedback: from ERK through a phosphatase (ph) back to SOS-1, and from NFκB through expression (ex) back to IκB. In these cases, it is necessary not to compress ‘ph’ and ‘ex’, to allow CNORode to model the correct dynamics (transience and oscillations, respectively). The non-compression is required as ‘ph’ and ‘ex’ are integral to the dynamics observed in the in silico data. Figures 7 and 8 therefore show that, with the exception of AP1, CNORode can accurately model the in silico data of the toy model once compression of those key nodes is suppressed.
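The idea of compressing a linear chain while protecting selected nodes can be sketched in a few lines of Python. This is a simplified illustration, not the CellNOptR compression routine: it ignores interaction signs and logic gates, only removes nodes with exactly one incoming and one outgoing edge, and the function name `compress` and the `keep` argument are hypothetical.

```python
def compress(edges, keep):
    """Remove pass-through nodes from a directed graph.

    edges: iterable of (source, target) pairs.
    keep:  nodes that must never be removed (e.g. measured or perturbed
           species, or feedback intermediates such as 'ph').
    A node outside `keep` with exactly one incoming and one outgoing edge is
    replaced by a direct edge from its source to its target.
    """
    edges = set(edges)
    keep = set(keep)
    changed = True
    while changed:
        changed = False
        nodes = {n for edge in edges for n in edge}
        for node in sorted(nodes - keep):
            ins = [e for e in edges if e[1] == node]
            outs = [e for e in edges if e[0] == node]
            # Skip the bypass if it would create a self-loop.
            if len(ins) == 1 and len(outs) == 1 and ins[0][0] != outs[0][1]:
                edges -= {ins[0], outs[0]}
                edges.add((ins[0][0], outs[0][1]))
                changed = True
                break
    return edges


chain = [("SOS-1", "Ras"), ("Ras", "Raf-1"), ("Raf-1", "MEK 1"),
         ("MEK 1", "ERK"), ("ERK", "ph"), ("ph", "SOS-1")]

# Protecting 'ph' keeps the feedback intermediate in the compressed model.
print(compress(chain, keep={"SOS-1", "Raf-1", "ERK", "ph"}))
```

If 'ph' is left out of `keep`, the feedback collapses to a direct ERK → SOS-1 edge, which is the situation the text warns against for dynamic models.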

7 Summary and future developments

In this contribution, we have reviewed different logic-based approaches to model signal transduction networks. Recent developments in proteomics techniques, from antibody-based methods (xMAP, protein arrays, high-throughput microscopy, etc) to mass spectrometry methods (Terfve and Saez-Rodriguez 2012), allow us to generate a large amount of phosphoproteomic data. Given the size of the underlying networks, we believe that logic-based models, which do not need extensive biochemical detail and thus lead to tractable models even when dealing with multiple pathways, are a useful approach to analyzing signal transduction on a large scale. Therefore, we have focused our work on how to train logic models to experimental data and implemented various methodologies toward this end in our tool CellNOptR.

Our recent developments presented here expand our previous work by including strategies to deal with the inherent dynamic nature of signaling processes (and hence with time series data). We have discussed how modeling dynamic aspects requires more detailed formalisms (and thus, in general, more data and computational time) and how the general methodology has to be re-evaluated at multiple levels, in particular the compression of the network prior to the optimization; hence, we are currently working to develop a general compression routine for dynamic models. Another area of active development is the implementation of efficient optimization strategies to identify both structure and (if existing) continuous parameters (Banga 2008). Although we have covered here a broad palette of logic-based formalisms, we plan to explore other approaches. Some are combinations of what we have discussed (e.g. a cFL formalism simulated over multiple timescales), others are formalisms related to those used here (e.g. SQUAD), and others could add new features such as a probabilistic framework (Shmulevich et al 2003), stochasticity (Albert et al 2008) or formal methodologies (Fisher and Henzinger 2007).

For the sake of simplicity, we have used a toy model that is itself based on a logic formalism to exemplify the potential dynamic behavior and thereby different modeling variants. We are currently working on more realistic benchmarks based on biochemical models and studying in more detail the role of experimental noise and experimental design in recovering the underlying model structure.


Figure 8. The contribution of each logic modeling formalism to the understanding of the model used to simulate the in silico training data. The time taken for training the model using each formalism is also shown.

As illustrated in our example with the link TRAF2 → ASK-1 → MKK7, databases are comprehensive but not complete, and it is therefore likely that important links are missing from the system of interest (Kirouac et al 2012). To overcome this limitation, we are working on strategies to integrate as many network resources as possible. These include methods that propose novel links that expand the prior knowledge network (Saez-Rodriguez et al 2009, Eduati et al 2010) and the use of information from PINs (Vinayagam et al 2011).

The focus of CellNOptR is the calibration of logic models to data, but a large set of other tools exists that analyze logic models from different angles (Morris et al 2010). For example, the Q2LM toolbox (Morris et al 2011b) uses cFL to understand the effect of perturbations in the context of the whole system under investigation (e.g. under what set of stimuli is a therapeutic perturbation most effective?). CellNetAnalyzer (Klamt et al 2007) has a battery of methods from graph theory as well as specific techniques for logic models. These include minimal intervention sets (the minimum number of perturbations for a desired phenotype) to propose possible therapeutic targets. These tools use the same model format as CellNOptR, so it is easy to pass models for analysis. More generally, we are part of the CoLoMoTo initiative, which aims to facilitate interoperability among these tools; the main goal here is the development of SBML-qual as a language to exchange logic models (sbml.org/Community/Wiki/SBML_Level_3_Proposals/Qualitative_Models), as well as the implementation of the SBGN format for network representation (Novere et al 2009).

In general, efficient integration of data and prior knowledge to model signal transduction requires the use of appropriate standards for data, prior knowledge about the networks, and the models themselves (Saez-Rodriguez et al 2011a). We consider that logic models will be an area of development in the future, with increasing application to signal transduction research.

Figure 9. An overview of the graphical representation of logic models. (A) The SOP expression for the activation of C, summarized as an XOR gate. (B) SOP expressions describing the activation of C and D. (C) An example of a hypergraph representation, where the nodes are connected by hyperedges.

8 Materials and methods

8.1 CellNOptR

As mentioned in section 1.4, CellNOptR includes some additional steps in pre-processing logic models before simulation and training to data. The details of these steps can be found in Saez-Rodriguez et al (2009). Briefly, the model is compressed by removing non-identifiable elements. These include nodes on terminal branches that are not part of the experimental design (non-observables; figure 1: p90RSK and CREB), nodes that are not affected by the inputs or perturbations (non-controllables), and additional nodes that can be removed without affecting the logic outcome during simulation (figure 1: Ras, MEK 1, etc).

After this compression step, a superstructure of all possible hyperedges is created (figure 1, inset). This superstructure contains ‘the space’ of hyperedges that is optimized (through the removal of redundant hyperedges) by training to the experimental data. The training uses a genetic algorithm to search for logic models that minimize a bipartite function. This function includes the MSE between the simulation of the optimized logic model and the data, and a penalty term for model size. Depending on the formalism used (see the main text), the simulation and data may be at steady state (CellNOptR, CNORfuzzy) or all data points can be used (CNORdt). The resulting logic model is then a subset of the superstructure and contains only the hyperedges that best explain the experimental data (with the additional attribute of parsimony, given the size penalty in the optimization function).
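The two-part objective described above can be written as a short Python sketch. The exact form and weighting of the size penalty used by CellNOptR are not given in this section, so the linear penalty and the `size_penalty` weight below are assumptions chosen only to show the structure of the function.

```python
def objective(simulated, measured, n_hyperedges, size_penalty=1e-4):
    """Score a candidate logic model: fit error plus a penalty for size.

    simulated, measured: equal-length sequences of normalized values in [0, 1].
    n_hyperedges:        number of hyperedges retained in the candidate model.
    size_penalty:        hypothetical weight of the size term (an assumption).
    """
    if len(simulated) != len(measured):
        raise ValueError("simulated and measured must have the same length")
    mse = sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(measured)
    return mse + size_penalty * n_hyperedges


# Two candidates with the same fit: the smaller model gets the lower score.
data = [0.9, 0.1, 0.8]
print(objective([1, 0, 1], data, n_hyperedges=5))
print(objective([1, 0, 1], data, n_hyperedges=8))
```

A search procedure such as a genetic algorithm would evaluate this score for each candidate subset of the superstructure and prefer, among equally well-fitting models, the one with fewer hyperedges.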

8.2 Network and data generation

The toy model was constructed manually and is based on the model from Saez-Rodriguez et al (2011a). The in silico data were generated from the toy model using CNORode. The parameters were manually adjusted to model as closely as possible the known dynamics of ERK and NFκB activation. After simulation, noise was added to each data point according to $N(\mu, \sigma^2)$, where $\mu = 0$ and $\sigma^2 = 0.05$. The data were then rescaled to the interval [0, 1]. Two methods of cross validation were also performed to demonstrate the robustness of CellNOptR (steady-state Boolean) to sparseness in the data (figure S8 (available from stacks.iop.org/PhysBio/9/045003/mmedia)).
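The noise and rescaling steps can be sketched as follows. The text does not state how the rescaling was done, so min-max rescaling is an assumption of this sketch, as are the function name and the fixed random seed.

```python
import random


def add_noise_and_rescale(values, variance=0.05, seed=0):
    """Add zero-mean Gaussian noise to each value, then rescale to [0, 1].

    variance: noise variance (sigma squared); random.gauss expects the
              standard deviation, so the square root is taken.
    seed:     fixed seed so the sketch is reproducible (an assumption).
    """
    rng = random.Random(seed)
    sigma = variance ** 0.5
    noisy = [v + rng.gauss(0.0, sigma) for v in values]
    lo, hi = min(noisy), max(noisy)
    if hi == lo:
        # Degenerate case: all values identical after adding noise.
        return [0.0 for _ in noisy]
    # Min-max rescaling (assumed): smallest value maps to 0, largest to 1.
    return [(v - lo) / (hi - lo) for v in noisy]


clean = [0.0, 0.2, 0.9, 1.0, 0.4]
print(add_noise_and_rescale(clean))
```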

Model and data files, together with the corresponding R scripts, can currently be found at http://www.ebi.ac.uk/~aidanmac/public/logicModelingTutorial (password: tutorial).

8.3 Boolean logic

A Boolean model can be represented as follows:

(1) N species $X_1, X_2, \ldots, X_N$, each represented by a variable $x_i$ taking values 0 or 1.

(2) For each species $X_i$, there is a subset of species $R_i = \{X_{i1}, X_{i2}, \ldots, X_{iN_i}\} \subset \{X_1, X_2, \ldots, X_N\}$ that influence $x_i$.

(3) And for each species $X_i$, an update function $B_i : \{0, 1\}^{N_i} \rightarrow \{0, 1\}$.

From this set of rules, the state of each species at time t + 1 is a function of the state of its influencing species at time t (Kauffman 1969).
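The three rules above translate directly into a synchronous update step. The following Python sketch is illustrative only: the dictionary-based representation and the function names are assumptions, and the example rule is the AND activation of D by A and B discussed in this section.

```python
def synchronous_step(state, rules):
    """Return the state at t + 1 given the state at t.

    state: dict mapping species name to 0 or 1.
    rules: dict mapping species name to its update function B_i, which takes
           the full state at time t and returns a truth value.
    Species without a rule (e.g. stimuli) keep their current value.
    """
    return {
        name: int(bool(rules[name](state))) if name in rules else value
        for name, value in state.items()
    }


def run_to_steady_state(state, rules, max_steps=100):
    """Iterate synchronous updates until the state stops changing.

    Returns the steady state, or None if none is reached within max_steps
    (for example when negative feedback causes oscillations).
    """
    for _ in range(max_steps):
        new_state = synchronous_step(state, rules)
        if new_state == state:
            return state
        state = new_state
    return None


rules = {"D": lambda s: s["A"] and s["B"]}
print(run_to_steady_state({"A": 1, "B": 1, "D": 0}, rules))
```

Every update function reads the state at time t, so the order in which species are evaluated does not matter, which is what distinguishes synchronous from asynchronous updating.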

So how does the function $B_i$ (also called a transfer function) for each species $X_i$ deal with inputs from other nodes? $B_i$ can be represented in a sum-of-product (SOP) formulation (Mendelson 1970), which allows for multiple possible inputs (AND, NOT, OR gates) to be processed into a single output. To illustrate this, consider the following simple example (figure 9).

We know that the element D is activated by a combination of A and B (i.e. both A and B are needed for activation). Hence, both the graphical and written representations of this activation are relatively straightforward:

$$B_1(a, b) = a \land b$$

However, in the case of the activation of C, this occurs when A is active without B or when B is active without A. In this case, one needs some additional rules of representation.

The SOP representation allows the above activation to be written using only AND, NOT and OR operators:

$$B_1(a, b) = (a \land \lnot b) \lor (\lnot a \land b)$$

Figure 10. The construction of gates with cFL: (a) activating, (b) inhibitory, (c) an AND gate and (d) an OR gate.

This is done by calculating the product within brackets and summing between brackets. Essentially, SOP representations are rules of precedence for complex multi-node inputs. In terms of graphical representation, the activation of C cannot easily be drawn using standard SBGN AND, NOT or OR operators (figure 9). Hence, this SOP expression can be summarized as an XOR gate.
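One possible way to encode an SOP expression in code is as a list of product terms, each term listing the value required of every input it involves. The encoding below is a hypothetical data structure chosen for illustration, not the format used by CellNOptR.

```python
def evaluate_sop(terms, inputs):
    """Evaluate a sum-of-products expression.

    terms:  list of product terms; each term is a dict mapping an input name
            to the value it must have (1 for a plain input, 0 for a negated
            one). All conditions within a term are combined with AND.
    inputs: dict mapping input names to 0 or 1.
    The terms themselves are combined with OR.
    """
    return int(any(
        all(inputs[name] == required for name, required in term.items())
        for term in terms
    ))


# Activation of D: a AND b (a single product term).
d_terms = [{"a": 1, "b": 1}]
# Activation of C: (a AND NOT b) OR (NOT a AND b), i.e. an XOR gate.
c_terms = [{"a": 1, "b": 0}, {"a": 0, "b": 1}]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, evaluate_sop(d_terms, {"a": a, "b": b}),
              evaluate_sop(c_terms, {"a": a, "b": b}))
```

Each product term corresponds to one hyperedge in the hypergraph view: the AND within a term joins several source nodes into a single edge, and the OR across terms corresponds to several hyperedges arriving at the same target.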

A logic network in which relations are encoded by SOP expressions can be represented as a hypergraph (Klamt et al 2006). A hypergraph is defined as a set of nodes connected by hyperedges, where a hyperedge is a generalization of an edge that can connect more than two nodes. This in turn can facilitate a more precise representation of biological knowledge (for example, where two proteins are necessary for the activation of a target).

8.4 Fuzzy logic

cFL defines the transfer function between nodes as a Hill function. Depending on the type of interaction (or logic gate, figure 10), this function can take different forms (Morris et al 2011a):

(a) If node C depends only on A, a normalized Hill function is used to calculate C, where k and n are the sensitivity coefficient and Hill coefficient, respectively:

$$c = \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

(b) An inhibitory relationship is represented as the above expression subtracted from 1:

$$c = 1 - \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

(c) For an AND gate, the minimum value of c is used:

$$c = \min\left(\frac{(k_1^{n_2} + 1)\,a^{n_2}}{k_1^{n_2} + a^{n_2}},\; \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$

(d) And for an OR gate, the maximum value is used:

$$c = \max\left(\frac{(k_1^{n_2} + 1)\,a^{n_2}}{k_1^{n_2} + a^{n_2}},\; \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$
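The four gate types can be sketched in Python as follows. This is a minimal illustration rather than the CNORfuzzy code: the function names are hypothetical, and the sketch assumes that each input of an AND or OR gate carries its own (k, n) pair.

```python
def hill(x, k, n):
    """Normalized Hill function: hill(0) = 0 and hill(1) = 1 for any k, n > 0."""
    return (k ** n + 1) * x ** n / (k ** n + x ** n)


def activate(a, k, n):
    """(a) Single activating input."""
    return hill(a, k, n)


def inhibit(a, k, n):
    """(b) Single inhibitory input: the activating form subtracted from 1."""
    return 1 - hill(a, k, n)


def and_gate(a, b, params_a, params_b):
    """(c) AND gate: minimum of the two transformed inputs.

    params_a and params_b are (k, n) tuples, one per input (an assumption).
    """
    return min(hill(a, *params_a), hill(b, *params_b))


def or_gate(a, b, params_a, params_b):
    """(d) OR gate: maximum of the two transformed inputs."""
    return max(hill(a, *params_a), hill(b, *params_b))


# Intermediate inputs give intermediate outputs, unlike a Boolean gate.
print(and_gate(0.8, 0.3, (0.5, 3), (0.5, 3)))
print(or_gate(0.8, 0.3, (0.5, 3), (0.5, 3)))
```

At the Boolean extremes (inputs of exactly 0 or 1) the gates reproduce the Boolean truth tables, because the normalization factor forces the Hill function through the points (0, 0) and (1, 1).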

8.5 Logic ODEs

As in the case of cFL, CNORode uses phenomenological transfer functions (i.e. non-mechanistic normalized Hill functions) to describe the dynamics of a node’s state as a function of its inputs. Using the examples in figure 10 again, these functions can be described as follows:

(a)

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a) - c\bigr)$$

where $\dot{c}$ is the development of c over time and B(a) is the normalized Hill function of the continuous variable a. This takes the form

$$B(a) = \frac{a^n / (k^n + a^n)}{1^n / (k^n + 1^n)}$$

(k and n are again the sensitivity and Hill coefficients, respectively). τ can be interpreted as the maximum value of species c (biologically, this could encompass degradation or other limiting factors), and there is an additional degradation term proportional to c.

(b) An inhibitory relationship is simply the above expression subtracted from 1:

$$\dot{c} = 1 - \frac{1}{\tau}\,\bigl(B(a) - c\bigr)$$

(c) The AND gates take the form

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a)\,B(b) - c\bigr)$$

(d) The OR gate notation is as follows:

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a)\,B(b) + B(a)\,[1 - B(b)] + B(b)\,[1 - B(a)] - c\bigr)$$

In the case of an AND gate, the product of B(a) and B(b) is taken, which maintains consistency in the output with the equivalent Boolean model (i.e. if a = 1 and b = 0, in both ODE and logic formalisms c = 0; similarly, with an OR gate, if a = 1 and b = 0, in both ODE and logic formalisms c = 1). As in the case of cFL, normalized Hill functions can approximate commonly observed biochemical dynamics such as linear, sigmoidal and step-like behavior.
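The consistency with the Boolean model can be checked numerically with a small Python sketch of the AND and OR gate equations. It is not the CNORode or Odefy implementation: the explicit Euler integration, the fixed inputs, and the default values of k, n and τ are assumptions made for illustration.

```python
def norm_hill(x, k, n):
    """B(x): Hill function divided by its value at x = 1, so B(0) = 0, B(1) = 1."""
    return (x ** n / (k ** n + x ** n)) / (1.0 / (k ** n + 1.0))


def dc_dt(gate, a, b, c, tau, k=0.5, n=3):
    """Right-hand side of the logic ODE for a two-input AND or OR gate."""
    ba, bb = norm_hill(a, k, n), norm_hill(b, k, n)
    if gate == "AND":
        target = ba * bb
    elif gate == "OR":
        target = ba * bb + ba * (1 - bb) + bb * (1 - ba)
    else:
        raise ValueError("gate must be 'AND' or 'OR'")
    return (target - c) / tau


def simulate(gate, a, b, tau=2.0, c0=0.0, dt=0.01, t_end=30.0):
    """Integrate c with explicit Euler steps for constant inputs a and b."""
    c = c0
    for _ in range(int(round(t_end / dt))):
        c += dt * dc_dt(gate, a, b, c, tau)
    return c


# With a = 1 and b = 0 the AND gate stays off and the OR gate switches on.
print(simulate("AND", 1.0, 0.0))
print(simulate("OR", 1.0, 0.0))
```

In this sketch τ only sets how quickly c relaxes toward the gate output, so changing it alters the transient but not the value that c approaches for constant inputs.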

Acknowledgments

The authors thank J Banga, J Egea, Inna Pertsovskaya and Melody Morris for valuable help and discussion. Funding was provided by the EU-7FP-BioPreDyn and EMBL-EIPOD programs.

References

Aislyn D W and Boran R I 2010 Systems approaches to polypharmacology and drug discovery Curr Opin Drug Discov Dev 13 297–309

Albert I et al 2008 Boolean network simulations for life scientists Source Code Biol Med 3 16

Aldridge B B et al 2006 Physicochemical modelling of cell signalling pathways Nature Cell Biol 8 1195–203

Assmann S M and Albert R 2009 Discrete dynamic modeling with asynchronous update, or how to model complex systems in the absence of quantitative information Methods Mol Biol 553 207–25

Bader G D, Cary M P and Sander C 2006 Pathguide: a pathway resource list Nucleic Acids Res 34 D504–6 (Database issue)

Banga J 2008 Optimization in computational systems biology BMC Syst Biol 2 47

Barabasi A-L and Oltvai Z N 2004 Network biology: understanding the cell’s functional organization Nature Rev Genet 5 101–13

Bauer-Mehren A, Furlong L I and Sanz F 2009 Pathway databases and tools for their exploitation: benefits, current limitations and challenges Mol Syst Biol 5 290

Bossi A and Lehner B 2009 Tissue specificity and the human protein interaction network Mol Syst Biol 5 260

Calzone L et al 2008 A comprehensive modular map of molecular interactions in RB/E2F pathway Mol Syst Biol 4 173

Calzone L et al 2010 Mathematical modelling of cell-fate decision in response to death receptor engagement PLoS Comput Biol 6 e1000702

Chen W W, Niepel M and Sorger P K 2010 Classic and contemporary approaches to modeling biochemical reactions Genes Dev 24 1861–75

de Jong H 2002 Modeling and simulation of genetic regulatory systems: a literature review J Comput Biol 9 67–103

Di Cara A et al 2007 Dynamic simulation of regulatory networks using SQUAD BMC Bioinformatics 8 462

Eduati F et al 2010 A Boolean approach to linear prediction for signaling network modeling PLoS One 5 e12789

Egea J and Martí R 2010 An evolutionary method for complex-process optimization Comput Oper Res 37 315–24

Faure A et al 2006 Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle Bioinformatics 22 e124–31

Fisher J and Henzinger T A 2007 Executable cell biology Nature Biotechnol 25 1239–49

Garg A et al 2008 Synchronous versus asynchronous modeling of gene regulatory networks Bioinformatics 24 1917–25

Gonzalez A, Chaouiya C and Thieffry D 2008 Logical modelling of the role of the Hh pathway in the patterning of the Drosophila wing disc Bioinformatics 24 i234–40

Helikar T et al 2008 Emergent decision-making in biological signal transduction networks Proc Natl Acad Sci USA 105 1913–8

Hlavacek W S et al 2006 Rules for modeling signal-transduction systems Sci STKE 2006 re6

Hoffmann A et al 2002 The IkappaB–NF-kappaB signaling module: temporal control and selective gene activation Science 298 1241–5

Hucka M et al 2003 The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models Bioinformatics 19 524–31

Ilsley G R, Luscombe N M and Apweiler R 2009 Know your limits: assumptions, constraints and interpretation in systems biology Biochim Biophys Acta 1794 1280–7

Jørgensen C and Linding R 2010 Simplistic pathways or complex networks? Curr Opin Genet Dev 20 15–22

Junker B H, Klukas C and Schreiber F 2006 VANTED: a system for advanced data analysis and visualization in the context of biological networks BMC Bioinformatics 7 109

Kauffman S A 1969 Metabolic stability and epigenesis in randomly constructed genetic nets J Theor Biol 22 437–67

Khoury G A, Baliban R C and Floudas C A 2011 Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database Sci Rep 1 90

Kirouac D C et al 2012 Creating and analyzing pathway and protein interaction compendia for modelling signal transduction networks BMC Syst Biol 6 29

Klamt S, Saez-Rodriguez J and Gilles E D 2007 Structural and functional analysis of cellular networks with CellNetAnalyzer BMC Syst Biol 1 2

Klamt S et al 2006 A methodology for the structural and functional analysis of signaling and regulatory networks BMC Bioinformatics 7 56

Kreeger P K and Lauffenburger D A 2010 Cancer systems biology: a network modeling perspective Carcinogenesis 31 2–8

Krumsiek J et al 2010 Odefy - from discrete to continuous models BMC Bioinformatics 11 233

Liberali P, Ramo P and Pelkmans L 2008 Protein kinases: starting a molecular systems view of endocytosis Annu Rev Cell Dev Biol 24 501–23

Ma’ayan A et al 2005 Formation of regulatory patterns during signal propagation in a mammalian cellular network Science 309 1078–83

Mendelson E 1970 Boolean Algebra and Switching Circuits (Schaum’s Outline Series) (New York: McGraw-Hill)

Mendoza L and Xenarios I 2006 A method for the generation of standardized qualitative dynamical systems of regulatory networks Theor Biol Med Modelling 3 13

Mitsos A et al 2009 Identifying drug effects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data PLoS Comput Biol 5 e1000591

Morris M K et al 2010 Logic-based models for the analysis of cell signaling networks Biochemistry 49 3216–24

Morris M K et al 2011a Training signaling pathway maps to biochemical data with constrained fuzzy logic: quantitative analysis of liver cell responses to inflammatory stimuli PLoS Comput Biol 7 e1001099

Morris M K et al 2011b Querying quantitative logic models (Q2LM) to study intracellular signaling networks and cell-cytokine interactions Biotechnol J 7 374–86

Novere N L et al 2009 The systems biology graphical notation Nature Biotechnol 27 735–41

Oda K et al 2005 A comprehensive pathway map of epidermal growth factor receptor signaling Mol Syst Biol 1 2005.0010

Patlak M 2010 Competitors try collaboration to speed drug development J Natl Cancer Inst 102 841–3

Pieroni E et al 2008 Protein networking: insights into global functional organization of proteomes Proteomics 8 799–816

Rangamani P and Iyengar R 2008 Modelling cellular signalling systems Essays Biochem 45 83–94

Saez-Rodriguez J et al 2009 Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction Mol Syst Biol 5 331

Saez-Rodriguez J, Alexopoulos L G and Stolovitzky G 2011a Setting the standards for signal transduction research Sci Signal 4 pe10

Saez-Rodriguez J et al 2011b Comparing signaling networks between normal and transformed hepatocytes using discrete logical models Cancer Res 71 5400–11

Sahin O et al 2009 Modeling ERBB receptor-regulated G1/S transition to find novel targets for de novo trastuzumab resistance BMC Syst Biol 3 1

Sasagawa S et al 2005 Prediction and validation of the distinct dynamics of transient and sustained ERK activation Nature Cell Biol 7 365–73

Schlatter R et al 2009 ON/OFF and beyond - a boolean model of apoptosis PLoS Comput Biol 5 e1000595

Shmulevich I et al 2003 Steady-state analysis of genetic regulatory networks modelled by probabilistic boolean networks Comp Funct Genomics 4 601–8

Terfve C and Saez-Rodriguez J 2012 Modeling signaling networks using high-throughput phospho-proteomics Adv Exp Med Biol 736 19–57

Thakar J and Albert R 2010 Boolean models of within-host immune interactions Curr Opin Microbiol 13 377–81

Vinayagam A et al 2011 A directed protein interaction network for investigating intracellular signal transduction Sci Signal 4 rs8

Watterson S, Marshall S and Ghazal P 2008 Logic models of pathway biology Drug Discov Today 13 447–56

Wittmann D M et al 2009 Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling BMC Syst Biol 3 98


  • 1 Introduction
    • 1.1 From parts to interactions
    • 1.2 From interactions to mechanism
    • 1.3 Logic modeling
    • 1.4 Software
    • 1.5 The example model
  • 2 Boolean steady state
    • 2.1 Steady-state optimization and simulation
    • 2.2 Interpretation of steady-state result
  • 3 Two time points (or additional steady state)
    • 3.1 Multiple steady-state optimization and simulation
    • 3.2 Interpretation
  • 4 Synchronous multiple time-point simulation and multiple timescales
    • 4.1 Synchronous and asynchronous updating
  • 5 Constrained fuzzy logic
    • 5.1 Model training and simulation
    • 5.2 Interpretation
  • 6 Logic ODEs
    • 6.1 Converting from Boolean to continuous
    • 6.2 Parameter estimation
    • 6.3 Compressing an ODE model
  • 7 Summary and future developments
  • 8 Materials and methods
    • 8.1 CellNOptR
    • 8.2 Network and data generation
    • 8.3 Boolean logic
    • 8.4 Fuzzy logic
    • 8.5 Logic ODEs
  • Acknowledgments
  • References

[Figure 3 plot: panels for Raf-1, ERK, AP1, GSK-3, p38 and NFκB, with Stim (egf, tnfa) and Inh (pi3ki, raf1i) columns; x-axis t = 0, 10, 30; y-axis tick at 0.5.]

Figure 3. The fit of the trained model using the Boolean steady-state formalism. The simulated data are shown as two blue circles (t0 and t1) connected by a blue dotted line. The colors represent the goodness of fit between the model and the data at t1 = 10. Heat-map coloration is used to signify the range from high error (red, normalized mean squared error (MSE) = 1) to no error (white, MSE = 0). t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CellNOptR took 180 s.

pseudo-steady state, the fast reactions have already occurred, while the slow reactions have not yet significantly affected the network’s behavior (Klamt et al 2006). This approximation implies that the flux through the system (in our case, the phosphorylation cascade in signal transduction) has stabilized and the quantities of phosphorylated proteins are no longer varying to a significant degree. With this assumption, a model of this system can be simulated until it has also reached a steady state.

With the in silico data (figure 2) as our starting point, the PKN (figure 1) was trained using the steady-state model formalism at t1 = 10 min. Details about the node states and transfer functions of this formalism (Boolean steady state) are summarized in section 8.3. Figure 3 shows the steady-state simulation overlaid on the experimental data.

2.2 Interpretation of steady-state result

The Boolean steady-state formalism used by CellNOptR for optimization recovers most of the underlying ‘true’ network and hence gives a good steady-state approximation of the in silico data (see figures 3 and 8). However, there are some exceptions that highlight the limitations of steady-state measurements. Using this formalism, CellNOptR cannot identify the NFκB oscillations caused by feedback: hyperedges that cause negative feedback are penalized in CellNOptR, as a steady state cannot be reached when they are present. Another limitation is that the state of each element in the model is limited to 0/1 (either switched on or off). Hence, intermediate levels of activation cannot be simulated (such as p38 activation under TNFα and EGF stimulation). Finally, the effect of the missing pathway from TNFα to AP1 is observed when the experimental measurement cannot be explained with TNFα stimulation in the absence of EGF stimulation.

Thus, the usefulness of steady-state Boolean logic depends strongly on the assumptions underlying the data. If one has enough knowledge of the data and biochemistry such that the assumption of steady state is a fair one to make, training a network to data using steady-state Boolean logic modeling can uncover cell-specific behavior, for example differences between cancer and normal cells (Saez-Rodriguez et al 2011b). Another advantage is the scalability of such an approach: because the method is parameter-free, large networks can be trained under a large number of conditions.

3 Two time points (or additional steady state)

As mentioned in section 2, it is quite common in signaling networks to observe a transient behavior, where a species is quickly activated and subsequently deactivated. Such a dynamic obviously cannot be captured with a steady-state approach where only one time point is considered. Therefore, in the above section, this issue was avoided by only modeling ‘fast events’, i.e. the activation phase of the signal propagation. However, when information about more than one time point is available, and such a fast activation followed by slow deactivation (or indeed any combination of slower and faster processes) is observed, then it is possible to also capture these processes while keeping the simplifying assumption of steady states. In essence, it is assumed that multiple pseudo-steady states reflect the mechanisms that are acting at different timescales and that they can be optimized independently. We will illustrate this with the CellNOptR implementation for two timescales, but the approach is extendable to more than two time points.

[Figure 4 plot: panels for Raf-1, ERK, AP1, GSK-3, p38 and NFκB, with Stim (egf, tnfa) and Inh (pi3ki, raf1i) columns; x-axis t = 0, 10, 30; y-axis tick at 0.5.]

Figure 4. The fit of the model at two time points, t1 and t2, using the two steady-state approach. Again, the colors are representative of the fit, this time at t1 = 10 and t2 = 30. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CellNOptR took 240 s.

Defining suitable time points that adequately represent the process timescales that we want to model is a similar problem to what was discussed above for a unique steady state, with the added complexity of having to choose more than one point that is consistent for all modeled species. This can be guided by prior knowledge, e.g. if it is known that a receptor is activated on a fast timescale (e.g. 30 min for full activation) by phosphorylation and then deactivated by slow internalization and degradation (e.g. 2 h for full silencing of the signal). However, in general, it is better to develop a detailed time course, as stated above. In our case, again using ERK, we would say that a second measurement at 20–30 min would be adequate; 30 min was used for the sake of argument (see figure 4).

3.1 Multiple steady-state optimization and simulation

In CellNOptR, a model of a system with two steady states at different timescales is simulated by assuming that a subset of the hyperedges (interactions) only become active at a later time point, that is, they operate on a different timescale (Klamt et al 2006). That being the case, the two time points can be optimized separately. In practice, this means that the optimization is done in two steps:

(i) The scaffold model (the model after compression of non-essential nodes and expansion of all possible hyperedges; see figure 1 and materials and methods, section 8.1) derived from the PKN is used to train the model against the data at t1, thereby identifying hyperedges that best reproduce the data at this time point.

(ii) Hyperedges that were not selected as active at t1 are used as the search space for training the model at t2. For simulation (and therefore testing the model fit), candidate models are tested by using the steady state of the t1 model as an initial state, then computing the steady state from there, including candidate t2 hyperedges. There is also the additional constraint that, whenever hyperedges at t1 and t2 influence a node in contradicting ways, the t2 hyperedge overrules the t1 hyperedge and the state of the target node is locked to the state defined by the t2 hyperedge.

Besides the additional constraint of the overriding hyperedges described above, the node states and transfer functions are calculated in the same way as in the Boolean steady-state formalism (section 8.3).

3.2 Interpretation

In our example, we can see that the two steady-state optimization finds the feedback from ERK back to SOS-1 (figure 8). Hence, from figure 4, the transient activation of Raf-1, ERK and AP1 is captured in the trained model. Using a single characteristic time point, a model that includes the negative feedback from ERK to SOS-1 at t1 would not be selected, as the branch never reaches a stable steady state because of oscillation. However, if we say that the branch is active at t1 but that the negative feedback is only active at t2, and that when active this negative feedback permanently turns SOS-1 off, then the model does reach a steady state at t1 (where SOS-1, Raf-1 and ERK are all ON) and a different steady state at t2 (where SOS-1, Raf-1 and ERK are all OFF as a result of the activated negative feedback).
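The two-step simulation described in this section can be illustrated with a minimal Python sketch. It is not the CellNOptR implementation: the three-node chain, the helper `steady_state` and the representation of the t2 feedback as a dictionary of locked node values are assumptions made purely for illustration.

```python
def steady_state(rules, state, locked=None, max_steps=50):
    """Synchronously update all nodes until the state stops changing."""
    locked = locked or {}
    state = {**state, **locked}
    for _ in range(max_steps):
        new = dict(state)
        for node, rule in rules.items():
            if node not in locked:          # locked nodes keep the value set at t2
                new[node] = rule(state)
        if new == state:
            return state
        state = new
    raise RuntimeError("no steady state reached")

# Hyperedges active at t1 (hypothetical chain): EGF -> SOS1 -> Raf1 -> ERK
chain = {
    "SOS1": lambda s: s["EGF"],
    "Raf1": lambda s: s["SOS1"],
    "ERK": lambda s: s["Raf1"],
}
initial = {"EGF": 1, "SOS1": 0, "Raf1": 0, "ERK": 0}

# Step 1: steady state of the t1 model.
state_t1 = steady_state(chain, initial)

# Step 2: the feedback ERK -| SOS1 is only active at t2. It contradicts the
# t1 hyperedge EGF -> SOS1, so it overrules it and locks SOS1 OFF.
locked = {"SOS1": 0} if state_t1["ERK"] else {}
state_t2 = steady_state(chain, state_t1, locked=locked)

# For comparison: with the feedback active from the start, the branch
# oscillates and no steady state is found.
with_feedback = dict(chain, SOS1=lambda s: int(s["EGF"] and not s["ERK"]))
try:
    steady_state(with_feedback, initial)
    oscillates = False
except RuntimeError:
    oscillates = True
```

In this sketch, SOS1, Raf1 and ERK are all ON in `state_t1` and all OFF in `state_t2`, while the single-timescale model with feedback fails to settle.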

4 Synchronous multiple time-point simulation and multiple timescales

As discussed in section 3, by measuring at two characteristic time points, the trained logic model is capable of finding the slow negative feedback from ERK to SOS-1 and therefore moves a step closer to understanding the 'true' network. However, the oscillations of NFκB still cannot be explained with the pseudo-steady-state formalism, as it is necessary to use the full time course data (and not just two time points) to observe this effect. This can be modeled by a discrete time Boolean model that is available as an add-on R package to CellNOptR: CNORdt (discrete time).

4.1 Synchronous and asynchronous updating

CNORdt introduces some variation in how time is handled in the model. Instead of simulating and fitting data at steady states, it is capable of fitting time course data by using an additional model parameter together with a synchronous updating scheme.

Synchronous updating is where all nodes are updated simultaneously during model simulation; hence, each node at time t is a function of its input nodes at t − 1 (see section 8.3). This is the updating scheme used in CellNOptR. An alternative method is asynchronous updating, where nodes are updated in a random or non-synchronous order, depending on the asynchronous method used. This leads to different simulation properties depending on the updating method chosen. Synchronous updates are deterministic, and simulations run under the same conditions (inputs and perturbations) will reach the same steady state (or attractor) each time. In contrast, asynchronous updating introduces stochasticity into the system such that different steady states can be reached from the same starting conditions. The random updating of node values is one possible application of asynchronicity. This enables sampling over all timescales (any reaction can be deemed to be slowest or fastest), thus avoiding the constraint inherent in synchronous simulations of an equal timescale over all reactions. However, this added complexity can make results difficult to interpret (Garg et al 2008). Mixed synchronous/asynchronous updating is an intermediate approach that can stratify reaction groups according to their known reaction rates, thus taking advantage of a priori knowledge and reducing the complexity of a fully asynchronous approach (Faure et al 2006, Albert et al 2008, Assmann and Albert 2009, Garg et al 2008).
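The difference between the two updating schemes can be made concrete with a short Python sketch. The two-node circuit with mutual inhibition and the function names are hypothetical and are not taken from CellNOptR; the asynchronous variant shown (one randomly chosen node per step) is only one of several possible asynchronous methods.

```python
import random

def sync_step(rules, state):
    """All nodes are updated at once from the state at t - 1."""
    return {**state, **{node: rule(state) for node, rule in rules.items()}}

def async_step(rules, state, rng):
    """One randomly chosen node is updated; the others keep their value."""
    node = rng.choice(sorted(rules))
    return {**state, node: rules[node](state)}

# Hypothetical circuit: A and B inhibit each other.
rules = {
    "A": lambda s: int(not s["B"]),
    "B": lambda s: int(not s["A"]),
}
start = {"A": 0, "B": 0}

# Synchronous: deterministic; this circuit alternates between (1, 1) and (0, 0).
s = start
sync_traj = []
for _ in range(4):
    s = sync_step(rules, s)
    sync_traj.append((s["A"], s["B"]))

# Asynchronous: the random update order decides whether the steady state
# (1, 0) or (0, 1) is reached from the same starting condition.
def run_async(seed, steps=20):
    rng = random.Random(seed)
    s = dict(start)
    for _ in range(steps):
        s = async_step(rules, s, rng)
    return (s["A"], s["B"])

async_ends = {run_async(seed) for seed in range(20)}
```

The synchronous run always produces the same trajectory, whereas the asynchronous runs can only end in one of the two steady states, depending on which node happens to be updated first.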

CNORdt introduces a scaling parameter that defines the timescale of the Boolean synchronous simulation. Where each 'tick' (t) (or simulation step) is the synchronous updating of all nodes in the model according to their inputs at t − 1, the scaling parameter defines the 'tick' frequency relative to the timescale of the real data. Although this is a crude approach (i.e. it implies a single rate across all reactions), it allows us to fit a synchronous Boolean simulation to data. Hence, all data points can be fitted to the model, and hyperedges that cause feedback in the model can be included, which allows the model to reveal more complex dynamics such as oscillations. CNORdt still describes the node states as either on or off (1/0), and the transfer functions are calculated as in section 8.3. The scaling parameter is applied to the simulation of the system and hence does not affect the transfer functions themselves.
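As a rough illustration of how a scaling parameter can relate simulation ticks to measurement times, the following Python sketch fits the number of minutes per tick for a hypothetical negative feedback loop. The network, the made-up readout and the helper names are assumptions; the way CNORdt itself applies its scaling parameter may differ.

```python
def simulate_ticks(rules, state, n_ticks):
    """Synchronous Boolean simulation; returns the state after each tick."""
    traj = [dict(state)]
    for _ in range(n_ticks):
        state = {**state, **{node: rule(state) for node, rule in rules.items()}}
        traj.append(state)
    return traj

def sample(traj, node, times, minutes_per_tick):
    """Read the simulated value of `node` at real time points (in minutes)."""
    last = len(traj) - 1
    return [traj[min(int(t // minutes_per_tick), last)][node] for t in times]

def mse(sim, data):
    return sum((s - d) ** 2 for s, d in zip(sim, data)) / len(data)

# Hypothetical negative feedback loop: X activates Y, Y inhibits X.
rules = {
    "X": lambda s: int(s["stim"] and not s["Y"]),
    "Y": lambda s: s["X"],
}
start = {"stim": 1, "X": 0, "Y": 0}
traj = simulate_ticks(rules, start, n_ticks=12)

times = [0, 5, 10, 15, 20, 25, 30]   # measurement times in minutes
data = [0, 1, 1, 0, 0, 1, 1]         # made-up oscillating readout of X

# Try a few candidate scalings and keep the one with the lowest error.
errors = {scale: mse(sample(traj, "X", times, scale), data)
          for scale in (2.5, 5, 10)}
best_scale = min(errors, key=errors.get)
```

Here a single scaling value is shared by every reaction, which mirrors the single-rate simplification noted above.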

Figures 5 and 8 show how the NFκB oscillations can be predicted by fitting a dynamic logic model to the full time course and maintaining the two steady-state assumptions from section 3, i.e. simulating 'fast' reactions from t = 0 to t = 10 and 'slow' reactions from t = 10 to t = 30.

5 Constrained fuzzy logic

One of the main limitations of Boolean logic models is that the assumption of a single level of activation (species can only be on/off) is biochemically unrealistic. Fuzzy logic is another logic modeling formalism that allows for intermediate levels of activation. It was originally developed in the field of control theory for predicting the outputs of complex processes where inputs could only partially be characterized (Morris et al 2011a). Its strength lies in the flexibility it affords when defining relationships between input and output nodes. This flexibility can also be a weakness if a large number of parameters are required to define these functional relationships. Constrained fuzzy logic (cFL) deals with this potential complexity by limiting the repertoire of relationships between nodes. The cFL formalism used in CellNOpt (CNORfuzzy) is fully described in Morris et al (2011a). Briefly, the relationships (or transfer functions) between nodes in cFL are limited to Hill functions. Hence, each transfer function has two free parameters: the Hill coefficient n, which controls the steepness of the function, and the sensitivity parameter k, which determines the midpoint of the function (i.e. the value of the input that produces half the maximal output). By varying these two parameters, linear, sigmoidal and step-like dynamics can be produced that are good approximations to protein–protein interactions and enzymatic reactions. In CNORfuzzy, further constraints are imposed by initially limiting the possible parameter combinations to a subset of discrete values. Details of the transfer functions used can be found in materials and methods, section 8.4.

[Figure 5 plot: columns Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh; cue labels egf, tnfa, pi3ki, raf1i; x-axis ticks 0, 10, 30; y-axis tick 0.5]

Figure 5. The fit of the model at multiple time points using fast (t = 0 to t = 10) and slow (from t = 10 to t = 30) timescales. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORdt took 300 s.

5.1 Model training and simulation

Model training and simulation in CNORfuzzy are carried out in a similar manner to the Boolean steady-state formalism. After compression and expansion of the logic hypergraph, a genetic algorithm determines transfer functions and a network topology that minimize the MSE between the model and the data at steady state. This is followed by a number of refinement steps that fine-tune the Hill function parameters and reduce the complexity of the network topology. The in silico data and model fit at t1 = 10 are shown in figure 6.

5.2 Interpretation

CNORfuzzy is capable of fitting intermediate values (figure 6). For most cases, the cFL model generates similar fits to the steady-state Boolean model. However, the fit to data is more accurate since the values are continuous and not limited to 0 or 1. More importantly, the cFL model obtains a better fit for p38, as it uncovers a link in the structure that Boolean models are unable to capture. In the 'true' network, TNFα and EGF are both required to activate p38 (albeit the activation is low relative to the other signals). In the previous Boolean formalisms, this low activation of p38 cannot be modeled, as the simulation can only take the values 0/1. However, CNORfuzzy is capable of adding the hyperedge 'Map3K1 AND Map3K7 → MKK4' (figure 8) to explain this activation and hence move a step closer to finding the underlying true network.

The CNORfuzzy model fit also illustrates some caveats associated with fuzzy logic. We can see that CNORfuzzy also retains the Map3K7 → p38 hyperedge (figure 8), thus activating p38 with TNFα stimulation alone (i.e. in the absence of EGF stimulation). This occurs as CNORfuzzy attempts to fit the noisy signal of inactive p38, thus adding a hyperedge that is not present. CNORfuzzy also adds hyperedges from TNFα to AP1 that convey a weak activating signal to compensate for the missing hyperedges (TRAF2 → ASK-1, ASK-1 → Map3K7) from the PKN (figure 1). These examples illustrate the sensitivity of the cFL approach to the data quality, and this can make interpretation of the results more subtle and difficult (Morris et al 2011a).

6 Logic ODEs

The Boolean logic formalisms described above can qualitatively fit the network topology and logic gates that best describe the underlying data. cFL can add quantitative information by its ability to fit intermediate values between 0 and 1 at steady state. In terms of time, however, all these formalisms rely on discrete simulations. To obtain a fully continuous model, both in state and time, CNORode adds to these methods by transforming a discrete logic model to a continuous model. It does this by defining a set of ODEs for each model species. There are several formalisms to convert discrete logic to continuous models (e.g. SQUAD (Di Cara et al 2007)) or hybrid models (e.g. piecewise linear models (de Jong 2002)). CellNOpt includes the method developed by Wittmann et al (2009) that was implemented in Matlab as Odefy (Krumsiek et al 2010).

[Figure 6 plot: columns Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh; cue labels egf, tnfa, pi3ki, raf1i; x-axis ticks 0, 10, 30; y-axis tick 0.5]

Figure 6. The fit of the trained model at t1 = 10 using constrained fuzzy logic. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORfuzzy took 1200 s.

6.1 Converting from Boolean to continuous

The approach used to convert Boolean to continuous models is fully explained in Wittmann et al (2009). Briefly, the goal is to simulate the full dynamics of each species in the logic model while retaining consistency with the Boolean representation. This means that, where the output of a logic gate is 0 or 1, the ODEs replacing a Boolean state should also return to 0 or 1. This is achieved in a similar manner to cFL (but with an additional parameter τ) by applying a normalized Hill function on the interval between 0 and 1. Applying these functions to each hyperedge defines a new continuous ODE model to replace the underlying Boolean model. This is more fully explained in section 8.5.

6.2 Parameter estimation

CNORode currently provides links to two stochastic non-local optimization algorithms: a genetic algorithm (genalg package, http://cran.r-project.org/web/packages/genalg) and an implementation in R of scatter search (Egea and Martí 2010). These are used to fit the Hill function parameters k and n and the ODE parameter τ to each logic gate in a model that has already been topologically optimized by one or more of the other formalisms.

6.3 Compressing an ODE model

Compression of the model before training may lead to the loss of elements important to capture dynamic features and must thus be done with caution. Returning to our example (figure 2), the in silico data were generated through a set of normalized Hill functions. Hence, with the exception of AP1 (where the missing hyperedge prohibits any exact simulation of this signal), CNORode should be capable of simulating exactly the other signals in the system after parameter optimization of the associated logic ODEs. However, this may not be possible when the model is compressed. To give an example, in our toy model (figure 1), the pathway consisting of SOS-1, Ras, Raf-1, MEK 1 and ERK is compressed to SOS-1 → Raf-1 → ERK. The in silico data were generated with ODEs describing the uncompressed interactions. We can see from figure 7 that the compressed model can accurately simulate the in silico data for this pathway (Raf-1 and ERK signals). In this case, the normalized Hill functions have enough dynamic plasticity to summarize four interactions (SOS-1 → Ras → Raf-1 → MEK 1 → ERK) as two (SOS-1 → Raf-1 → ERK).

[Figure 7 plot: columns Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh; cue labels egf, tnfa, pi3ki, raf1i; x-axis ticks 0, 10, 30; y-axis tick 0.5]

Figure 7. The fit of the trained model using CNORode. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The parameter training in CNORode took 2000 s.

However, this is not the case where we have feedback from ERK through a phosphatase (ph) back to SOS-1, and from NFκB through expression (ex) back to IκB. In these cases, it is necessary to not compress 'ph' and 'ex' to allow CNORode to model the correct dynamics (transience and oscillations, respectively). The non-compression is required as 'ph' and 'ex' are integral to the dynamics observed in the in silico data. So, figures 7 and 8 show, with the exception of AP1, that CNORode can accurately model the in silico data of the toy model once compression of those key nodes is suppressed.

7 Summary and future developments

In this contribution, we have reviewed different logic-based approaches to model signal transduction networks. Recent developments in proteomics techniques, from antibody-based methods (xMAP, protein arrays, high-throughput microscopy, etc) to mass spectrometry methods (Terfve and Saez-Rodriguez 2012), allow us to generate a large amount of phosphoproteomic data. Given the size of the underlying networks, we believe that logic-based models, which do not need extensive biochemical detail and thus lead to tractable models even when dealing with multiple pathways, are a useful approach to analyzing signal transduction on a large scale. Therefore, we have focused our work on how to train logic models to experimental data and implemented various methodologies toward this end in our tool CellNOptR.

Our recent developments presented here expand our previous work by including strategies to deal with the inherent dynamic nature of signaling processes (and hence with time series data). We have discussed how modeling dynamic aspects requires more detailed formalisms (and thus, in general, more data and computational time) and how the general methodology has to be re-evaluated at multiple levels, in particular the compression of the network prior to the optimization; hence, we are currently working to develop a general compression routine for dynamic models. Another area of active development is the implementation of efficient optimization strategies to identify both structure and (if existing) continuous parameters (Banga 2008). Although we have covered here a broad palette of logic-based formalisms, we plan to explore other approaches. Some are combinations of what we have discussed (e.g. a cFL formalism simulated over multiple timescales), others are formalisms related to those used here (e.g. SQUAD), and others could add new features such as a probabilistic framework (Shmulevich et al 2003), stochasticity (Albert et al 2008) or formal methodologies (Fisher and Henzinger 2007).

For the sake of simplicity, we have used a toy model that is itself based on a logic formalism to exemplify the potential dynamic behavior and thereby different modeling variants. We are currently working on more realistic benchmarks based on biochemical models and studying in more detail the role of experimental noise and experimental design in recovering the underlying model structure.

Figure 8. The contribution of each logic modeling formalism to the understanding of the model used to simulate the in silico training data. The time taken for training the model using each formalism is also shown.

As illustrated in our example with the link TRAF2 → ASK-1 → MKK7, databases are comprehensive but not complete, and it is therefore likely that important links are missing from the system of interest (Kirouac et al 2012). To overcome this limitation, we are working on strategies to integrate as many network resources as possible. These include methods that propose novel links that expand the prior knowledge network (Saez-Rodriguez et al 2009, Eduati et al 2010) and the use of information from PINs (Vinayagam et al 2011).

The focus of CellNOptR is the calibration of logic models to data, but a large set of other tools exist that analyze logic models from different angles (Morris et al 2010). For example, the Q2LM toolbox (Morris et al 2011b) uses cFL to understand the effect of perturbations in the context of the whole system under investigation (e.g. under what set of stimuli is a therapeutic perturbation most effective?). CellNetAnalyzer (Klamt et al 2007) has a battery of methods from graph theory as well as specific techniques for logic models. These include minimal intervention sets (the minimum number of perturbations for a desired phenotype) to propose possible therapeutic targets. These tools use the same model format as CellNOptR, so it is easy to pass models for analysis. More generally, we are part of the CoLoMoTo initiative, which aims to facilitate interoperability among these tools; the main goal here is the development of SBML-qual as a language to exchange logic models (sbml.org/Community/Wiki/SBML_Level_3_Proposals/Qualitative_Models) as well as the implementation of the SBGN format for network representation (Novere et al 2009).

In general, efficient integration of data and prior knowledge to model signal transduction requires the use of appropriate standards for data, prior knowledge about the networks and the models themselves (Saez-Rodriguez et al 2011a). We consider that logic models will be an area of development in the future, with increasing application to signal transduction research.

Figure 9. An overview of the graphical representation of logic models. (A) The SOP expression for the activation of C, summarized as an XOR gate. (B) SOP expressions describing the activation of C and D. (C) An example of a hypergraph representation, where the nodes are connected by hyperedges.

8 Materials and methods

8.1 CellNOptR

As mentioned in section 1.4, CellNOptR includes some additional steps in pre-processing logic models before simulation and training to data. The details of these steps can be found in Saez-Rodriguez et al (2009). Briefly, the model is compressed by removing non-identifiable elements. These include nodes on terminal branches that are not part of the experimental design (non-observables; figure 1, p90RSK and CREB), nodes that are not affected by the inputs or perturbations (non-controllables) and additional nodes that can be removed without affecting logic outcome during simulation (figure 1, Ras, MEK 1, etc).

After this compression step, a superstructure of all possible hyperedges is created (figure 1, inset). This superstructure contains 'the space' of hyperedges that is optimized (through the removal of redundant hyperedges) by training to the experimental data. The training uses a genetic algorithm to search for logic models that minimize a bipartite function. This function includes the MSE between the simulation of the optimized logic model and the data, and a penalty term for model size. Depending on the formalism used (see the main text), the simulation and data may be at steady state (CellNOptR, CNORfuzzy) or all data points can be used (CNORdt). The resulting logic model is then a subset of the superstructure and contains only the hyperedges that best explain the experimental data (with the additional attribute of parsimony, given the size penalty in the optimization function).
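A minimal Python sketch of such a two-part objective is given below. It is an illustration under stated assumptions rather than the CellNOptR code: the three-hyperedge superstructure, the data values and the penalty weight `alpha` are made up, and the genetic algorithm is replaced by exhaustive enumeration, which is only feasible because the example is tiny.

```python
from itertools import product

# Hypothetical superstructure for one target node C with inputs A and B.
HYPEREDGES = [
    ("A",),        # A -> C
    ("B",),        # B -> C
    ("A", "B"),    # A AND B -> C
]

def simulate(model, inputs):
    """C is the OR over the selected hyperedges; each hyperedge is an AND."""
    return int(any(selected and all(inputs[n] for n in edge)
                   for selected, edge in zip(model, HYPEREDGES)))

def score(model, data, alpha=0.01):
    """MSE between simulation and data plus a penalty on model size."""
    mse = sum((simulate(model, x) - y) ** 2 for x, y in data) / len(data)
    return mse + alpha * sum(model)

# Made-up normalized measurements of C under four input conditions.
data = [
    ({"A": 0, "B": 0}, 0.0),
    ({"A": 1, "B": 0}, 0.1),
    ({"A": 0, "B": 1}, 0.0),
    ({"A": 1, "B": 1}, 0.9),
]

# Each candidate model is a bit string selecting a subset of hyperedges.
candidates = product((0, 1), repeat=len(HYPEREDGES))
best = min(candidates, key=lambda m: score(m, data))
```

With these data, the search keeps only the AND hyperedge: models that add a redundant hyperedge fit no better and pay a larger size penalty.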

8.2 Network and data generation

The toy model was constructed manually and is based on the model from Saez-Rodriguez et al (2011a). The in silico data were generated from the toy model using CNORode. The parameters were manually adjusted to model as closely as possible the known dynamics of ERK and NFκB activation. After simulation, noise was added to each data point according to $N(\mu, \sigma^2)$, where $\mu = 0$ and $\sigma^2 = 0.05$. The data were then rescaled to the interval [0, 1]. Two methods of cross validation were also performed to demonstrate the robustness of CellNOptR (steady-state Boolean) to sparseness in the data (figure S8, available from stacks.iop.org/PhysBio/9/045003/mmedia).
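The noise and rescaling step can be sketched in Python as follows. The input values and function name are hypothetical, and min-max scaling is an assumption, since the text does not state how the rescaling to [0, 1] was carried out.

```python
import math
import random

def add_noise_and_rescale(values, variance=0.05, seed=0):
    """Add zero-mean Gaussian noise, then min-max rescale to [0, 1]."""
    rng = random.Random(seed)
    sigma = math.sqrt(variance)      # random.gauss expects a standard deviation
    noisy = [v + rng.gauss(0.0, sigma) for v in values]
    lo, hi = min(noisy), max(noisy)
    return [(v - lo) / (hi - lo) for v in noisy]

clean = [0.0, 0.2, 0.8, 1.0, 0.6, 0.1]   # made-up simulated activities
rescaled = add_noise_and_rescale(clean)
```

Note that the variance of 0.05 corresponds to a standard deviation of about 0.22, which is large relative to a signal confined to [0, 1].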

Model and data files, together with the corresponding R scripts, can currently be found at http://www.ebi.ac.uk/~aidanmac/public/logicModelingTutorial (password: tutorial).

8.3 Boolean logic

A Boolean model can be represented as follows:

(1) N species $X_1, X_2, \ldots, X_N$, each represented by a variable $x_i$ taking values 0 or 1.

(2) For each species $X_i$, there is a subset of species $R_i = \{X_{i1}, X_{i2}, \ldots, X_{iN_i}\} \subset \{X_1, X_2, \ldots, X_N\}$ that influence $x_i$.

(3) For each species $X_i$, there is an update function $B_i: \{0, 1\}^{N_i} \rightarrow \{0, 1\}$.

From this set of rules, the state of each species at time t + 1 is a function of the state of its influencing species at time t (Kauffman 1969).

So how does the function $B_i$ (also called a transfer function) for each species $X_i$ deal with inputs from other nodes? $B_i$ can be represented in a sum-of-product (SOP) formulation (Mendelson 1970), which allows for multiple possible inputs (AND, NOT, OR gates) to be processed into a single output. To illustrate this, consider the following simple example (figure 9).

We know that the element D is activated by a combination of A and B (i.e. both A and B are needed for activation). Hence, both the graphical and written representation of this activation is relatively straightforward:

$$B_1(a, b) = a \land b$$

However, in the case of the activation of C, this occurs when A is active without B or when B is active without A. In this case, one needs some additional rules of representation.

The SOP representation allows the above activation to be written using only AND, NOT and OR operators:

$$B_1(a, b) = (a \land \lnot b) \lor (\lnot a \land b)$$

Figure 10. The construction of gates with cFL: (a) activating, (b) inhibitory, (c) an AND gate and (d) an OR gate.

This is done by calculating the product within brackets and summing between brackets. Essentially, SOP representations are rules of precedence for complex multi-node inputs. In terms of graphically representing the activation of C, its activation cannot be easily represented using standard SBGN AND, NOT or OR operators (figure 9). Hence, this SOP expression can be summarized as an XOR gate.

A logic network where relations are encoded by SOP expressions can be represented as a hypergraph (Klamt et al 2006). A hypergraph is defined as a set of nodes connected by hyperedges, where a hyperedge is a generalization of an edge that can be connected to more than two nodes. This in turn can facilitate a more precise representation of biological knowledge (for example, where two proteins are necessary for the activation of a target).
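The SOP formulation and its hypergraph encoding can be sketched in a few lines of Python. The data structure below (a hyperedge as a tuple of possibly negated inputs plus a target) is a hypothetical encoding chosen for illustration, not the CellNOptR model format.

```python
from itertools import product

# Each hyperedge: (inputs, target), where inputs is a tuple of
# (node, negated) pairs. Inputs within a hyperedge are combined with AND
# (the product); hyperedges sharing a target are combined with OR (the sum).
hyperedges = [
    ((("A", False), ("B", True)), "C"),    # A AND NOT B -> C
    ((("A", True), ("B", False)), "C"),    # NOT A AND B -> C
    ((("A", False), ("B", False)), "D"),   # A AND B -> D
]

def evaluate(target, state):
    """Sum-of-products update function B_i for one target node."""
    products = [
        all(bool(state[node]) != negated for node, negated in inputs)
        for inputs, tgt in hyperedges
        if tgt == target
    ]
    return int(any(products))

# Truth tables over all input combinations (A, B) = 00, 01, 10, 11.
table_c = [evaluate("C", {"A": a, "B": b}) for a, b in product((0, 1), repeat=2)]
table_d = [evaluate("D", {"A": a, "B": b}) for a, b in product((0, 1), repeat=2)]
```

The truth table of C is that of an XOR gate and the table of D that of an AND gate, matching the two SOP expressions in this section.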

8.4 Fuzzy logic

cFL defines the transfer function between nodes as a Hill function. Depending on the type of interaction (or logic gate, figure 10), this function can take different forms (Morris et al 2011a):

(a) If node C depends only on A, a normalized Hill function is used to calculate C, where k and n are the sensitivity coefficient and Hill coefficient, respectively:

$$c = \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

(b) An inhibitory relationship is represented as the above expression subtracted from 1:

$$c = 1 - \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

(c) For an AND gate, the minimum value of c is used:

$$c = \min\left(\frac{(k_1^{n_2} + 1)\,a^{n_2}}{k_1^{n_2} + a^{n_2}},\; \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$

(d) For an OR gate, the maximum value is used:

$$c = \max\left(\frac{(k_1^{n_2} + 1)\,a^{n_2}}{k_1^{n_2} + a^{n_2}},\; \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$
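The four gate types can be written down directly in Python. This is a sketch of the formulas in this section, not the CNORfuzzy code; the function names and the parameter values in the example calls are assumptions.

```python
def hill(x, k, n):
    """Normalized Hill function: 0 at x = 0 and 1 at x = 1."""
    return (k**n + 1) * x**n / (k**n + x**n)

def inhibit(x, k, n):
    """Inhibitory relationship: the activating function subtracted from 1."""
    return 1 - hill(x, k, n)

def and_gate(a, b, k1, k2, n):
    """AND gate: minimum of the two transfer functions."""
    return min(hill(a, k1, n), hill(b, k2, n))

def or_gate(a, b, k1, k2, n):
    """OR gate: maximum of the two transfer functions."""
    return max(hill(a, k1, n), hill(b, k2, n))

# Example values (hypothetical): intermediate input, then Boolean corner cases.
partial = hill(0.3, k=0.5, n=3)
and_out = and_gate(1.0, 0.0, k1=0.5, k2=0.5, n=3)
or_out = or_gate(1.0, 0.0, k1=0.5, k2=0.5, n=3)
```

At the Boolean corner values 0 and 1, the gates reproduce the corresponding Boolean outputs, while intermediate inputs yield intermediate outputs.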

8.5 Logic ODEs

As in the case of cFL, CNORode uses phenomenological transfer functions (i.e. non-mechanistic normalized Hill functions) to describe the dynamics of a node's state as a function of its inputs. Using the examples in figure 10 again, these functions can be described as follows:

(a) An activating relationship takes the form

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a) - c\bigr)$$

where $\dot{c}$ is the development of c over time and B(a) is the normalized Hill function of the continuous variable a. This takes the form

$$B(a) = \frac{a^n / (k^n + a^n)}{1^n / (k^n + 1^n)}$$

(k and n are again the sensitivity and Hill coefficients, respectively). τ can be interpreted as the maximum value of species c (biologically, this could encompass degradation or other limiting factors), and there is an additional degradation term proportional to c.

(b) An inhibitory relationship is simply the above expression subtracted from 1:

$$\dot{c} = 1 - \frac{1}{\tau}\,\bigl(B(a) - c\bigr)$$

(c) The AND gates take the form

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a)B(b) - c\bigr)$$

(d) The OR gate notation is as follows:

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a)B(b) + B(a)[1 - B(b)] + B(b)[1 - B(a)] - c\bigr)$$

In the case of an AND gate, the product of B(a) and B(b) is taken, which maintains consistency in the output with the equivalent Boolean model (i.e. if a = 1 and b = 0, in both ODE and logic formalisms, c = 0; similarly, with an OR gate, if a = 1 and b = 0, in both ODE and logic formalisms, c = 1). As in the case of cFL, normalized Hill functions can approximate commonly observed biochemical dynamics such as linear, sigmoidal and step-like behavior.
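The consistency with the Boolean model can be checked numerically with a small Python sketch. It uses a simple forward Euler integration instead of the ODE solver used by CNORode, and the parameter values (k, n, τ), step size and function names are assumptions chosen for illustration.

```python
def B(x, k=0.5, n=3.0):
    """Normalized Hill function: B(0) = 0 and B(1) = 1."""
    return (x**n / (k**n + x**n)) / (1.0 / (k**n + 1.0))

def euler(rhs, c0=0.0, dt=0.01, t_end=50.0):
    """Integrate dc/dt = rhs(c) with forward Euler and return c(t_end)."""
    c = c0
    for _ in range(int(round(t_end / dt))):
        c += dt * rhs(c)
    return c

TAU = 2.0

def activation(a):
    return lambda c: (B(a) - c) / TAU

def and_gate(a, b):
    return lambda c: (B(a) * B(b) - c) / TAU

def or_gate(a, b):
    return lambda c: (B(a) * B(b) + B(a) * (1 - B(b)) + B(b) * (1 - B(a)) - c) / TAU

# With a = 1 and b = 0, the long-time value of c matches the Boolean gate output.
c_act = euler(activation(1.0))
c_and = euler(and_gate(1.0, 0.0))
c_or = euler(or_gate(1.0, 0.0))
```

In this sketch, `c_act` and `c_or` approach 1 and `c_and` stays at 0, in agreement with the Boolean outputs for the same inputs.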

Acknowledgments

The authors thank J Banga, J Egea, Inna Pertsovskaya and Melody Morris for valuable help and discussion. Funding was provided by the EU-7FP-BioPreDyn and EMBL-EIPOD programs.

References

Aislyn D W and Boran R I 2010 Systems approaches to polypharmacology and drug discovery Curr Opin Drug Discov Dev 13 297–309

Albert I et al 2008 Boolean network simulations for life scientists Source Code Biol Med 3 16

Aldridge B B et al 2006 Physicochemical modelling of cell signalling pathways Nature Cell Biol 8 1195–203

Assmann S M and Albert R 2009 Discrete dynamic modeling with asynchronous update, or how to model complex systems in the absence of quantitative information Methods Mol Biol 553 207–25

Bader G D, Cary M P and Sander C 2006 Pathguide: a pathway resource list Nucleic Acids Res 34 D504–6 (Database issue)

Banga J 2008 Optimization in computational systems biology BMC Syst Biol 2 47

Barabasi A-L and Oltvai Z N 2004 Network biology: understanding the cell's functional organization Nature Rev Genet 5 101–13

Bauer-Mehren A, Furlong L I and Sanz F 2009 Pathway databases and tools for their exploitation: benefits, current limitations and challenges Mol Syst Biol 5 290

Bossi A and Lehner B 2009 Tissue specificity and the human protein interaction network Mol Syst Biol 5 260

Calzone L et al 2008 A comprehensive modular map of molecular interactions in RB/E2F pathway Mol Syst Biol 4 173

Calzone L et al 2010 Mathematical modelling of cell-fate decision in response to death receptor engagement PLoS Comput Biol 6 e1000702

Chen W W, Niepel M and Sorger P K 2010 Classic and contemporary approaches to modeling biochemical reactions Genes Dev 24 1861–75


de Jong H 2002 Modeling and simulation of genetic regulatorysystems a literature review J Comput Biol 9 67ndash103

Di Cara A et al 2007 Dynamic simulation of regulatory networksusing SQUAD BMC Bioinformatics 8 462

Eduati F et al 2010 A Boolean approach to linear prediction forsignaling network modeling PLoS One 5 e12789

Egea J and Martı R 2010 An evolutionary method forcomplex-process optimization Comput Oper Res 37 315ndash24

Faure A et al 2006 Dynamical analysis of a generic Boolean modelfor the control of the mammalian cell cycle Bioinformatics22 e124ndash31

Fisher J and Henzinger T A 2007 Executable cell biology NatureBiotechnol 25 1239ndash49

Garg A et al 2008 Synchronous versus asynchronous modeling ofgene regulatory networks Bioinformatics 24 1917ndash25

Gonzalez A Chaouiya C and Thieffry D 2008 Logical modelling ofthe role of the Hh pathway in the patterning of the Drosophilawing disc Bioinformatics 24 i234ndash40

Helikar T et al 2008 Emergent decision-making in biological signaltransduction networks Proc Natl Acad Sci USA105 1913ndash8

Hlavacek W S et al 2006 Rules for modeling signal-transductionsystems Sci STKE 2006 re6

Hoffmann A et al 2002 The IkappaBndashNFndashkappaB signalingmodule temporal control and selective gene activation Science298 1241ndash5

Hucka M et al 2003 The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models Bioinformatics 19 524–31

Ilsley G R, Luscombe N M and Apweiler R 2009 Know your limits: assumptions, constraints and interpretation in systems biology Biochim Biophys Acta 1794 1280–7

Jørgensen C and Linding R 2010 Simplistic pathways or complex networks? Curr Opin Genet Dev 20 15–22

Junker B H, Klukas C and Schreiber F 2006 VANTED: a system for advanced data analysis and visualization in the context of biological networks BMC Bioinformatics 7 109

Kauffman S A 1969 Metabolic stability and epigenesis in randomly constructed genetic nets J Theor Biol 22 437–67

Khoury G A, Baliban R C and Floudas C A 2011 Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database Sci Rep 1 90

Kirouac D C et al 2012 Creating and analyzing pathway and protein interaction compendia for modelling signal transduction networks BMC Syst Biol 6 29

Klamt S, Saez-Rodriguez J and Gilles E D 2007 Structural and functional analysis of cellular networks with CellNetAnalyzer BMC Syst Biol 1 2

Klamt S et al 2006 A methodology for the structural and functional analysis of signaling and regulatory networks BMC Bioinformatics 7 56

Kreeger P K and Lauffenburger D A 2010 Cancer systems biology: a network modeling perspective Carcinogenesis 31 2–8

Krumsiek J et al 2010 Odefy - from discrete to continuous models BMC Bioinformatics 11 233

Liberali P, Ramo P and Pelkmans L 2008 Protein kinases: starting a molecular systems view of endocytosis Annu Rev Cell Dev Biol 24 501–23

Ma'ayan A et al 2005 Formation of regulatory patterns during signal propagation in a mammalian cellular network Science 309 1078–83

Mendelson E 1970 Boolean Algebra and Switching Circuits (Schaum's Outline Series) (New York: McGraw-Hill)

Mendoza L and Xenarios I 2006 A method for the generation of standardized qualitative dynamical systems of regulatory networks Theor Biol Med Modelling 3 13

Mitsos A et al 2009 Identifying drug effects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data PLoS Comput Biol 5 e1000591

Morris M K et al 2010 Logic-based models for the analysis of cell signaling networks Biochemistry 49 3216–24

Morris M K et al 2011a Training signaling pathway maps to biochemical data with constrained fuzzy logic: quantitative analysis of liver cell responses to inflammatory stimuli PLoS Comput Biol 7 e1001099

Morris M K et al 2011b Querying quantitative logic models (Q2LM) to study intracellular signaling networks and cell-cytokine interactions Biotechnol J 7 374–86

Novere N L et al 2009 The systems biology graphical notation Nature Biotechnol 27 735–41

Oda K et al 2005 A comprehensive pathway map of epidermal growth factor receptor signaling Mol Syst Biol 1 2005.0010

Patlak M 2010 Competitors try collaboration to speed drug development J Natl Cancer Inst 102 841–3

Pieroni E et al 2008 Protein networking: insights into global functional organization of proteomes Proteomics 8 799–816

Rangamani P and Iyengar R 2008 Modelling cellular signalling systems Essays Biochem 45 83–94

Saez-Rodriguez J et al 2009 Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction Mol Syst Biol 5 331

Saez-Rodriguez J, Alexopoulos L G and Stolovitzky G 2011a Setting the standards for signal transduction research Sci Signal 4 pe10

Saez-Rodriguez J et al 2011b Comparing signaling networks between normal and transformed hepatocytes using discrete logical models Cancer Res 71 5400–11

Sahin O et al 2009 Modeling ERBB receptor-regulated G1/S transition to find novel targets for de novo trastuzumab resistance BMC Syst Biol 3 1

Sasagawa S et al 2005 Prediction and validation of the distinct dynamics of transient and sustained ERK activation Nature Cell Biol 7 365–73

Schlatter R et al 2009 ON/OFF and beyond - a boolean model of apoptosis PLoS Comput Biol 5 e1000595

Shmulevich I et al 2003 Steady-state analysis of genetic regulatory networks modelled by probabilistic boolean networks Comp Funct Genomics 4 601–8

Terfve C and Saez-Rodriguez J 2012 Modeling signaling networks using high-throughput phospho-proteomics Adv Exp Med Biol 736 19–57

Thakar J and Albert R 2010 Boolean models of within-host immune interactions Curr Opin Microbiol 13 377–81

Vinayagam A et al 2011 A directed protein interaction network for investigating intracellular signal transduction Sci Signal 4 rs8

Watterson S, Marshall S and Ghazal P 2008 Logic models of pathway biology Drug Discov Today 13 447–56

Wittmann D M et al 2009 Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling BMC Syst Biol 3 98


  • 1 Introduction
    • 1.1 From parts to interactions
    • 1.2 From interactions to mechanism
    • 1.3 Logic modeling
    • 1.4 Software
    • 1.5 The example model
  • 2 Boolean steady state
    • 2.1 Steady-state optimization and simulation
    • 2.2 Interpretation of steady-state result
  • 3 Two time points (or additional steady state)
    • 3.1 Multiple steady-state optimization and simulation
    • 3.2 Interpretation
  • 4 Synchronous multiple time-point simulation and multiple timescales
    • 4.1 Synchronous and asynchronous updating
  • 5 Constrained fuzzy logic
    • 5.1 Model training and simulation
    • 5.2 Interpretation
  • 6 Logic ODEs
    • 6.1 Converting from Boolean to continuous
    • 6.2 Parameter estimation
    • 6.3 Compressing an ODE model
  • 7 Summary and future developments
  • 8 Materials and methods
    • 8.1 CellNOptR
    • 8.2 Network and data generation
    • 8.3 Boolean logic
    • 8.4 Fuzzy logic
    • 8.5 Logic ODEs
  • Acknowledgments
  • References


[Figure 4 panels: columns Raf-1, ERK, AP1, GSK-3, p38 and NFκB, plus Stim (egf, tnfa) and Inh (pi3k i, raf1 i) cue columns; x-axis time points 0, 10 and 30; y-axis tick at 0.5.]

Figure 4. The fit of the model at two time points, t1 and t2, using the two steady-state approach. Again, the colors are representative of the fit, this time at t1 = 10 and t2 = 30. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CellNOptR took 240 s.

in the above section, this issue was avoided by only modeling 'fast events', i.e. the activation phase of the signal propagation. However, when information about more than one time point is available, and such a fast activation followed by slow deactivation (or indeed any combination of slower and faster processes) is observed, then it is possible to also capture these processes while keeping the simplifying assumption of steady states. In essence, it is assumed that multiple pseudo-steady states reflect the mechanisms that are acting at different timescales, and that they can be optimized independently. We will illustrate this with the CellNOptR implementation for two timescales, but the approach is extendable to more than two time points.

Defining suitable time points that adequately represent the process timescales that we want to model is a similar problem to what was discussed above for a unique steady state, with the added complexity of having to choose more than one point that is consistent for all modeled species. This can be guided by prior knowledge, e.g. if it is known that a receptor is activated on a fast timescale (e.g. 30 min for full activation) by phosphorylation and then deactivated by slow internalization and degradation (e.g. 2 h for full silencing of the signal). However, in general, it is better to develop a detailed time course, as stated above. In our case, again using ERK, we would say that a second measurement at 20–30 min would be adequate; 30 min was used for the sake of argument (see figure 4).

3.1 Multiple steady-state optimization and simulation

In CellNOptR, a model of a system with two steady states at different timescales is simulated by assuming that a subset of the hyperedges (interactions) only becomes active at a later time point, that is, they operate on a different timescale (Klamt et al 2006). That being the case, the two time points can be optimized separately. In practice, this means that the optimization is done in two steps:

(i) The scaffold model (the model after compression of non-essential nodes and expansion of all possible hyperedges; see figure 1 and materials and methods, section 8.1) derived from the PKN is used to train the model against the data at t1, thereby identifying hyperedges that best reproduce the data at this time point.

(ii) Hyperedges that were not selected as active at t1 are used as the search space for training the model at t2. For simulation (and therefore testing the model fit), candidate models are tested by using the steady state of the t1 model as an initial state, then computing the steady state from there, including candidate t2 hyperedges. There is also the additional constraint that whenever hyperedges at t1 and t2 influence a node in contradicting ways, the t2 hyperedge overrules the t1 hyperedge, and the state of the target node is locked to the state defined by the t2 hyperedge.

Besides the additional constraint of the overriding hyperedges described above, the node states and transfer functions are calculated in the same way as in the Boolean steady-state formalism (section 8.3).

3.2 Interpretation

In our example, we can see that the two steady-state optimization finds the feedback from ERK back to SOS-1 (figure 8). Hence, from figure 4, the transient activation of Raf-1, ERK and AP1 is captured in the trained model. Using a single characteristic time point, a model that includes the negative feedback from ERK to SOS-1 at t1 would not be selected, as the branch never reaches a stable steady state because of oscillation. However, if we say that the branch is active at t1 but that the negative feedback is only active at t2, and that, when active, this negative feedback permanently turns SOS-1 off, then the model does reach a steady state at t1 (where SOS-1, Raf-1 and ERK are all ON) and a different steady state at t2 (where SOS-1, Raf-1 and ERK are all OFF as a result of the activated negative feedback).
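The behavior described in this section can be mimicked in a few lines of Python. The following is a minimal sketch under stated assumptions, not the CellNOptR implementation: the four-node branch, the node names and the helper `steady_state` are hypothetical, and the 'lock' is implemented simply by excluding the overruled node from further updates.

```python
# Hypothetical branch EGF -> SOS1 -> RAF1 -> ERK with a slow feedback ERK -| SOS1.

def steady_state(state, rules, locked=(), max_steps=50):
    """Update all unlocked nodes synchronously until the state stops changing.

    Returns the fixed point, or None if none is reached within max_steps.
    """
    for _ in range(max_steps):
        new = dict(state)
        for node, rule in rules.items():
            if node not in locked:
                new[node] = rule(state)
        if new == state:
            return state
        state = new
    return None

fast_rules = {
    "SOS1": lambda s: s["EGF"],
    "RAF1": lambda s: s["SOS1"],
    "ERK": lambda s: s["RAF1"],
}
start = {"EGF": 1, "SOS1": 0, "RAF1": 0, "ERK": 0}

# Single timescale: feedback active from the start, so the branch oscillates.
naive_rules = dict(fast_rules, SOS1=lambda s: int(s["EGF"] and not s["ERK"]))
naive = steady_state(start, naive_rules)

# Step (i): only the fast hyperedges are active at t1.
t1_state = steady_state(start, fast_rules)

# Step (ii): start from the t1 steady state; the t2 hyperedge ERK -| SOS1
# overrules EGF -> SOS1, and SOS1 is locked to the value it defines.
t2_start = dict(t1_state, SOS1=int(not t1_state["ERK"]))
t2_state = steady_state(t2_start, fast_rules, locked={"SOS1"})

print(naive, t1_state, t2_state)
```

In this sketch the single-timescale model never settles (the helper returns `None`), whereas the two-step version gives all three proteins ON at t1 and all OFF at t2.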

4 Synchronous multiple time-point simulation and multiple timescales

As discussed in section 3, by measuring at two characteristic time points, the trained logic model is capable of finding the slow negative feedback from ERK to SOS-1 and therefore moves a step closer to understanding the 'true' network. However, the oscillations of NFκB still cannot be explained with the pseudo-steady-state formalism, as it is necessary to use the full time-course data (and not just two time points) to observe this effect. This can be modeled by a discrete-time Boolean model that is available as an add-on R package to CellNOptR: CNORdt (discrete time).

4.1 Synchronous and asynchronous updating

CNORdt introduces some variation in how time is handled in the model. Instead of simulating and fitting data at steady states, it is capable of fitting time-course data by using an additional model parameter together with a synchronous updating scheme.

Synchronous updating is where all nodes are updated simultaneously during model simulation; hence, each node at time t is a function of its input nodes at t − 1 (see section 8.3). This is the updating scheme used in CellNOptR. An alternative method is asynchronous updating, where nodes are updated in a random or non-synchronous order, depending on the asynchronous method used. This leads to different simulation properties depending on the updating method chosen. Synchronous updates are deterministic, and simulations run under the same conditions (inputs and perturbations) will reach the same steady state (or attractor) each time. In contrast, asynchronous updating introduces stochasticity into the system, such that different steady states can be reached from the same starting conditions. The random updating of node values is one possible application of asynchronicity. This enables sampling over all timescales (any reaction can be deemed to be slowest or fastest), thus avoiding the constraint inherent in synchronous simulations of an equal timescale over all reactions. However, this added complexity can make results difficult to interpret (Garg et al 2008). Mixed synchronous/asynchronous updating is an intermediate approach that can stratify reaction groups according to their known reaction rates, thus taking advantage of a priori knowledge and reducing the complexity of a fully asynchronous approach (Faure et al 2006, Albert et al 2008, Assmann and Albert 2009, Garg et al 2008).
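The difference between the two updating schemes can be seen on a very small example. The sketch below is illustrative only: the two-node network of mutual inhibition and the function names are hypothetical and are not taken from CellNOptR or from the toy model.

```python
import random

# Hypothetical two-node network in which A and B inhibit each other.
rules = {
    "A": lambda s: int(not s["B"]),
    "B": lambda s: int(not s["A"]),
}

def sync_step(state):
    """Update every node at once from the state at the previous tick."""
    return {node: rule(state) for node, rule in rules.items()}

def async_run(state, order):
    """Update one node at a time, in the given order."""
    state = dict(state)
    for node in order:
        state[node] = rules[node](state)
    return state

start = {"A": 0, "B": 0}

# Synchronous: deterministic, here a two-state cycle (0,0) -> (1,1) -> (0,0).
sync_traj = [start]
for _ in range(4):
    sync_traj.append(sync_step(sync_traj[-1]))

# Asynchronous: the outcome depends on which node happens to be updated first.
rng = random.Random(1)
random_order = [rng.choice(["A", "B"]) for _ in range(10)]
a_first = async_run(start, ["A", "B"])
b_first = async_run(start, ["B", "A"])
random_result = async_run(start, random_order)
print(sync_traj, a_first, b_first, random_result)
```

From the same starting state, the synchronous run always follows the same cycle, while the asynchronous run ends in one of two different steady states depending on the update order.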

CNORdt introduces a scaling parameter that defines the timescale of the Boolean synchronous simulation. Where each 'tick' (t) (or simulation step) is the synchronous updating of all nodes in the model according to their inputs at t − 1, the scaling parameter defines the 'tick' frequency relative to the timescale of the real data. Although this is a crude approach (i.e. it implies a single rate across all reactions), it allows us to fit a synchronous Boolean simulation to data. Hence, all data points can be fitted to the model, and hyperedges that cause feedback in the model can be included, which allows the model to reveal more complex dynamics such as oscillations. CNORdt still describes the node states as either on or off (1/0), and the transfer functions are calculated as in section 8.3. The scaling parameter is applied to the simulation of the system and hence does not affect the transfer functions themselves.
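One way to picture the role of the scaling parameter is sketched below. This is an assumption-laden illustration rather than the CNORdt algorithm: the two-node negative feedback loop, the parameter name `minutes_per_tick`, the integer-division mapping from measurement time to tick, and the data values are all hypothetical.

```python
def simulate(n_ticks):
    """Synchronous Boolean trajectory of a two-node negative feedback loop."""
    n, i = 0, 0  # an NFkB-like node and its inhibitor, both OFF at tick 0
    traj = [n]
    for _ in range(n_ticks):
        n, i = int(not i), n  # both nodes read the state at the previous tick
        traj.append(n)
    return traj

def mse_for_scale(minutes_per_tick, times, data):
    """Map each measurement time to a tick and compare simulation with data."""
    ticks = [t // minutes_per_tick for t in times]
    traj = simulate(max(ticks))
    return sum((traj[k] - y) ** 2 for k, y in zip(ticks, data)) / len(data)

# Hypothetical time course (minutes) of an oscillating readout.
times = [0, 2, 4, 6, 8, 10, 12]
data = [0, 1, 1, 0, 0, 1, 1]

scores = {s: mse_for_scale(s, times, data) for s in (1, 2, 4)}
best = min(scores, key=scores.get)
print(scores, best)
```

Here only the time axis of the simulation is stretched or compressed; the Boolean update rules themselves are the same for every candidate value of the scaling parameter.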

Figures 5 and 8 show how the NFκB oscillations can be predicted by fitting a dynamic logic model to the full time course and maintaining the two steady-state assumptions from section 3, i.e. simulating 'fast' reactions from t = 0 to t = 10 and 'slow' reactions from t = 10 to t = 30.

5 Constrained fuzzy logic

One of the main limitations of Boolean logic models is that the assumption of a single level of activation (species can only be on/off) is biochemically unrealistic. Fuzzy logic is another logic modeling formalism that allows for intermediate levels of activation. It was originally developed in the field of control theory for predicting the outputs of complex processes where inputs could only partially be characterized (Morris et al 2011a). Its strength lies in the flexibility it affords when defining relationships between input and output nodes. This flexibility can also be a weakness if a large number of parameters are required to define these functional relationships. Constrained fuzzy logic (cFL) deals with this potential complexity by limiting the repertoire of relationships between nodes. The cFL formalism used in CellNOpt (CNORfuzzy) is fully described in Morris et al (2011a). Briefly, the relationships (or transfer functions) between nodes in cFL are limited to Hill functions. Hence, each transfer function has two free parameters: the Hill coefficient n, which controls the steepness of the function, and the sensitivity parameter k, which determines the midpoint of the function (i.e. the value of the input that produces half the maximal output). By varying these two parameters, linear, sigmoidal and step-like dynamics can be produced that are good approximations to protein–protein interactions and enzymatic reactions. In CNORfuzzy, further constraints are imposed by initially limiting the possible parameter combinations to a subset of discrete values. Details of the transfer functions used can be found in materials and methods, section 8.4.

[Figure 5 panels: columns Raf-1, ERK, AP1, GSK-3, p38 and NFκB, plus Stim (egf, tnfa) and Inh (pi3k i, raf1 i) cue columns; x-axis time points 0, 10 and 30; y-axis tick at 0.5.]

Figure 5. The fit of the model at multiple time points using fast (t = 0 to t = 10) and slow (from t = 10 to t = 30) timescales. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORdt took 300 s.

5.1 Model training and simulation

Model training and simulation in CNORfuzzy are carried out in a similar manner to the Boolean steady-state formalism. After compression and expansion of the logic hypergraph, a genetic algorithm determines transfer functions and a network topology that minimize the MSE between the model and the data at steady state. This is followed by a number of refinement steps that fine-tune the Hill function parameters and reduce the complexity of the network topology. The in silico data and model fit at t1 = 10 are shown in figure 6.

5.2 Interpretation

CNORfuzzy is capable of fitting intermediate values (figure 6). For most cases, the cFL model generates similar fits to the steady-state Boolean model. However, the fit to data is more accurate, since the values are continuous and not limited to 0 or 1. More importantly, the cFL model obtains a better fit for p38, as it uncovers a link in the structure that Boolean models are unable to capture. In the 'true' network, TNFα and EGF are both required to activate p38 (albeit the activation is low relative to the other signals). In the previous Boolean formalisms, this low activation of p38 cannot be modeled, as the simulation can only take the values 0/1. However, CNORfuzzy is capable of adding the hyperedge 'Map3K1 AND Map3K7 → MKK4' (figure 8) to explain this activation and hence move a step closer to finding the underlying true network.

The CNORfuzzy model fit also illustrates some caveats associated with fuzzy logic. We can see that CNORfuzzy also retains the Map3K7 → p38 hyperedge (figure 8), thus activating p38 with TNFα stimulation alone (i.e. in the absence of EGF stimulation). This occurs as CNORfuzzy attempts to fit the noisy signal of inactive p38, thus adding a hyperedge that is not present. CNORfuzzy also adds hyperedges from TNFα to AP1 that convey a weak activating signal to compensate for the missing hyperedges (TRAF2 → ASK-1, ASK-1 → Map3K7) from the PKN (figure 1). These examples illustrate the sensitivity of the cFL approach to the data quality, and this can make interpretation of the results more subtle and difficult (Morris et al 2011a).

6 Logic ODEs

The Boolean logic formalisms described above can qualitatively fit the network topology and logic gates that best describe the underlying data. cFL can add quantitative information by its ability to fit intermediate values between 0 and 1 at steady state. In terms of time, however, all these formalisms rely on discrete simulations. To obtain a fully continuous model, both in state and time, CNORode adds to these methods by transforming a discrete logic model to a continuous model. It does this by defining a set of ODEs for each model species. There are several formalisms to convert discrete logic to continuous models (e.g. SQUAD (Di Cara et al 2007)) or hybrid models (e.g. piecewise linear models (de Jong 2002)). CellNOpt includes the method developed by Wittmann et al (2009) that was implemented in Matlab as Odefy (Krumsiek et al 2010).

[Figure 6 panels: columns Raf-1, ERK, AP1, GSK-3, p38 and NFκB, plus Stim (egf, tnfa) and Inh (pi3k i, raf1 i) cue columns; x-axis time points 0, 10 and 30; y-axis tick at 0.5.]

Figure 6. The fit of the trained model at t1 = 10 using constrained fuzzy logic. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORfuzzy took 1200 s.

6.1 Converting from Boolean to continuous

The approach used to convert Boolean to continuous models is fully explained in Wittmann et al (2009). Briefly, the goal is to simulate the full dynamics of each species in the logic model while retaining consistency with the Boolean representation. What this means is that where the output of a logic gate is 0 or 1, the ODEs replacing a Boolean state should also return to 0 or 1. This is achieved in a similar manner to cFL (but with an additional parameter τ) by applying a normalized Hill function on the interval between 0 and 1. Applying these functions to each hyperedge defines a new continuous ODE model to replace the underlying Boolean model. This is more fully explained in section 8.5.

6.2 Parameter estimation

CNORode currently provides links to two stochastic non-local optimization algorithms: a genetic algorithm (genalg package, http://cran.r-project.org/web/packages/genalg) and an implementation in R of scatter search (Egea and Martí 2010). These are used to fit the Hill function parameters k and n and the ODE parameter τ to each logic gate in a model that has already been topologically optimized by one or more of the other formalisms.

6.3 Compressing an ODE model

Compression of the model before training may lead to the loss of elements important to capture dynamic features and must thus be done with caution. Returning to our example (figure 2), the in silico data were generated through a set of normalized Hill functions. Hence, with the exception of AP1 (where the missing hyperedge prohibits any exact simulation of this signal), CNORode should be capable of simulating exactly the other signals in the system after parameter optimization of the associated logic ODEs. However, this may not be possible when the model is compressed. To give an example, in our toy model (figure 1) the pathway consisting of SOS-1, Ras, Raf-1, MEK 1 and ERK is compressed to SOS-1 → Raf-1 → ERK. The in silico data were generated with ODEs describing the uncompressed interactions. We can see from figure 7 that the compressed model can accurately simulate the in silico data for this pathway (Raf-1 and ERK signals). In this case, the normalized Hill functions have enough dynamic plasticity to summarize four interactions (SOS-1 → Ras → Raf-1 → MEK 1 → ERK) as two (SOS-1 → Raf-1 → ERK).


[Figure 7 panels: columns Raf-1, ERK, AP1, GSK-3, p38 and NFκB, plus Stim (egf, tnfa) and Inh (pi3k i, raf1 i) cue columns; x-axis time points 0, 10 and 30; y-axis tick at 0.5.]

Figure 7. The fit of the trained model using CNORode. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The parameter training in CNORode took 2000 s.

However, this is not the case where we have feedback from ERK through a phosphatase (ph) back to SOS-1, and from NFκB through expression (ex) back to IκB. In these cases, it is necessary not to compress 'ph' and 'ex', to allow CNORode to model the correct dynamics (transience and oscillations, respectively). The non-compression is required, as 'ph' and 'ex' are integral to the dynamics observed in the in silico data. So, figures 7 and 8 show, with the exception of AP1, that CNORode can accurately model the in silico data of the toy model once compression of those key nodes is suppressed.

7 Summary and future developments

In this contribution, we have reviewed different logic-based approaches to model signal transduction networks. Recent developments in proteomics techniques, from antibody-based methods (xMAP, protein arrays, high-throughput microscopy, etc) to mass spectrometry methods (Terfve and Saez-Rodriguez 2012), allow us to generate a large amount of phosphoproteomic data. Given the size of the underlying networks, we believe that logic-based models, which do not need extensive biochemical detail and thus lead to tractable models even when dealing with multiple pathways, are a useful approach to analyzing signal transduction on a large scale. Therefore, we have focused our work on how to train logic models to experimental data, and implemented various methodologies toward this end in our tool CellNOptR.

Our recent developments presented here expand our previous work by including strategies to deal with the inherent dynamic nature of signaling processes (and hence with time series data). We have discussed how modeling dynamic aspects requires more detailed formalisms (and thus, in general, more data and computational time), and how the general methodology has to be re-evaluated at multiple levels, in particular the compression of the network prior to the optimization; hence, we are currently working to develop a general compression routine for dynamic models. Another area of active development is the implementation of efficient optimization strategies to identify both structure and (if existing) continuous parameters (Banga 2008). Although we have covered here a broad palette of logic-based formalisms, we plan to explore other approaches. Some are combinations of what we have discussed (e.g. a cFL formalism simulated over multiple timescales), others are formalisms related to those used here (e.g. SQUAD), and others could add new features such as a probabilistic framework (Shmulevich et al 2003), stochasticity (Albert et al 2008) or formal methodologies (Fisher and Henzinger 2007).

For the sake of simplicity, we have used a toy model that is itself based on a logic formalism to exemplify the potential dynamic behavior and thereby different modeling variants. We are currently working on more realistic benchmarks based on biochemical models, and studying in more detail the role of experimental noise and experimental design in recovering the underlying model structure.


Figure 8. The contribution of each logic modeling formalism to the understanding of the model used to simulate the in silico training data. The time taken for training the model using each formalism is also shown.

As illustrated in our example with the link TRAF2 → ASK-1 → MKK7, databases are comprehensive but not complete, and it is therefore likely that important links are missing from the system of interest (Kirouac et al 2012). To overcome this limitation, we are working on strategies to integrate as many network resources as possible. These include methods that propose novel links that expand the prior knowledge network (Saez-Rodriguez et al 2009, Eduati et al 2010) and the use of information from PINs (Vinayagam et al 2011).

The focus of CellNOptR is the calibration of logic models to data, but a large set of other tools exist that analyze logic models from different angles (Morris et al 2010). For example, the Q2LM toolbox (Morris et al 2011b) uses cFL to understand the effect of perturbations in the context of the whole system under investigation (e.g. under what set of stimuli is a therapeutic perturbation most effective?). CellNetAnalyzer (Klamt et al 2007) has a battery of methods from graph theory as well as specific techniques for logic models. These include minimal intervention sets (the minimum number of perturbations for a desired phenotype) to propose possible therapeutic targets. These tools use the same model format as CellNOptR, so it is easy to pass models for analysis. More generally, we are part of the CoLoMoTo initiative, which aims to facilitate interoperability among these tools; the main goal here is the development of SBML-qual as a language to exchange logic models (sbml.org/Community/Wiki/SBML_Level_3_Proposals/Qualitative_Models), as well as the implementation of the SBGN format for network representation (Novere et al 2009).

In general, efficient integration of data and prior knowledge to model signal transduction requires the use of appropriate standards for data, prior knowledge about the networks and the models themselves (Saez-Rodriguez et al 2011a). We consider that logic models will be an area of development in the future, with increasing application to signal transduction research.


Figure 9. An overview of the graphical representation of logic models. (A) The SOP expression for the activation of C, summarized as an XOR gate. (B) SOP expressions describing the activation of C and D. (C) An example of a hypergraph representation where the nodes are connected by hyperedges.

8 Materials and methods

8.1 CellNOptR

As mentioned in section 1.4, CellNOptR includes some additional steps in pre-processing logic models before simulation and training to data. The details of these steps can be found in Saez-Rodriguez et al (2009). Briefly, the model is compressed by removing non-identifiable elements. These include nodes on terminal branches that are not part of the experimental design (non-observables; figure 1: p90RSK and CREB), nodes that are not affected by the inputs or perturbations (non-controllables) and additional nodes that can be removed without affecting logic outcome during simulation (figure 1: Ras, MEK 1, etc).
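The first two categories can be identified with a simple graph search. The sketch below is a hypothetical illustration and not the CellNOptR routine: the wiring is invented for the example (only some node names are borrowed from the toy model), the node U is made up, and the third step (removing intermediate nodes such as Ras or MEK 1) is not covered.

```python
from collections import deque

def reachable(sources, adjacency):
    """Breadth-first search: all nodes reachable from `sources`."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical wiring, loosely named after the toy model.
edges = [("EGF", "SOS1"), ("SOS1", "Ras"), ("Ras", "Raf1"), ("Raf1", "MEK"),
         ("MEK", "ERK"), ("ERK", "p90RSK"), ("p90RSK", "CREB"), ("U", "ERK")]
stimuli, measured = {"EGF"}, {"Raf1", "ERK"}

forward, backward = {}, {}
for src, dst in edges:
    forward.setdefault(src, []).append(dst)
    backward.setdefault(dst, []).append(src)

controllable = reachable(stimuli, forward)   # downstream of an input
observable = reachable(measured, backward)   # upstream of a readout
kept = controllable & observable
removed = {n for edge in edges for n in edge} - kept
print(sorted(kept), sorted(removed))
```

In this example p90RSK and CREB are dropped as non-observable and U as non-controllable, leaving the chain from EGF to ERK.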

After this compression step, a superstructure of all possible hyperedges is created (figure 1, inset). This superstructure contains 'the space' of hyperedges that is optimized (through the removal of redundant hyperedges) by training to the experimental data. The training uses a genetic algorithm to search for logic models that minimize a bipartite function. This function includes the MSE between the simulation of the optimized logic model and the data, and a penalty term for model size. Depending on the formalism used (see the main text), the simulation and data may be at steady state (CellNOptR, CNORfuzzy) or all data points can be used (CNORdt). The resulting logic model is then a subset of the superstructure and contains only the hyperedges that best explain the experimental data (with the additional attribute of parsimony, given the size penalty in the optimization function).
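The shape of such a two-part objective can be sketched as follows. The exact weighting and normalization used by CellNOptR are not given here, so the function name, the `size_weight` value and the example data are assumptions made purely for illustration.

```python
def objective(simulated, measured, n_hyperedges, size_weight=0.01):
    """Mean squared error plus a penalty proportional to model size."""
    mse = sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(measured)
    return mse + size_weight * n_hyperedges

# Hypothetical normalized measurements and three candidate models.
measured = [0.9, 0.1, 0.8, 0.0]
small = objective([1, 0, 1, 0], measured, n_hyperedges=3)
large = objective([1, 0, 1, 0], measured, n_hyperedges=5)  # same fit, more edges
poor = objective([0, 0, 1, 0], measured, n_hyperedges=2)   # fewer edges, worse fit
print(small, large, poor)
```

With equal fit the smaller model scores better, which is the parsimony effect mentioned above, while a smaller model that fits clearly worse is still rejected as long as the size weight is modest.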

8.2 Network and data generation

The toy model was constructed manually and is based on the model from Saez-Rodriguez et al (2011a). The in silico data were generated from the toy model using CNORode. The parameters were manually adjusted to model as closely as possible the known dynamics of ERK and NFκB activation. After simulation, noise was added to each data point according to N(μ, σ²), where μ = 0 and σ² = 0.05. The data were then rescaled to the interval [0, 1]. Two methods of cross validation were also performed to demonstrate the robustness of CellNOptR (steady-state Boolean) to sparseness in the data (figure S8 (available from stacks.iop.org/PhysBio/9/045003/mmedia)).
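The noise and rescaling step can be sketched as below. The text does not say how the rescaling was done, so min-max rescaling is an assumption here, as are the function name, the seed and the example values.

```python
import random

def add_noise_and_rescale(values, variance=0.05, seed=0):
    """Add N(0, variance) noise to each point, then min-max rescale to [0, 1]."""
    rng = random.Random(seed)
    sigma = variance ** 0.5  # random.gauss expects a standard deviation
    noisy = [v + rng.gauss(0.0, sigma) for v in values]
    lo, hi = min(noisy), max(noisy)
    return [(v - lo) / (hi - lo) for v in noisy]

# Hypothetical noise-free simulation output for one species.
clean = [0.0, 0.2, 0.9, 1.0, 0.5]
noisy_scaled = add_noise_and_rescale(clean)
print(noisy_scaled)
```

Note that σ² = 0.05 corresponds to a standard deviation of about 0.22, which is large relative to signals confined to [0, 1].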

Model and data files, together with the corresponding R scripts, can currently be found at http://www.ebi.ac.uk/~aidanmac/public/logicModelingTutorial (password: tutorial).

8.3 Boolean logic

A Boolean model can be represented as follows:

(1) N species $X_1, X_2, \ldots, X_N$, each represented by a variable $x_i$ taking values 0 or 1.

(2) For each species $X_i$, there is a subset of species $R_i = \{X_{i1}, X_{i2}, \ldots, X_{iN_i}\} \subset \{X_1, X_2, \ldots, X_N\}$ that influence $x_i$.

(3) And for each species $X_i$, an update function $B_i : \{0, 1\}^{N_i} \rightarrow \{0, 1\}$.

From this set of rules, the state of each species at time t + 1 is a function of the state of its influencing species at time t (Kauffman 1969).

So how does the function $B_i$ (also called a transfer function) for each species $X_i$ deal with inputs from other nodes? $B_i$ can be represented in a sum-of-product (SOP) formulation (Mendelson 1970), which allows for multiple possible inputs (AND, NOT, OR gates) to be processed into a single output. To illustrate this, consider the following simple example (figure 9).

We know that the element D is activated by a combination of A and B (i.e. both A and B are needed for activation). Hence, both the graphical and written representation of this activation is relatively straightforward:

$$B_1(a, b) = a \land b$$

However, in the case of the activation of C, this occurs when A is active without B or when B is active without A. In this case, one needs some additional rules of representation.

The SOP representation allows the above activation to be written using only AND, NOT and OR operators:

$$B_1(a, b) = (a \land \lnot b) \lor (\lnot a \land b)$$


Figure 10. The construction of gates with cFL: (a) activating, (b) inhibitory, (c) an AND gate and (d) an OR gate.

This is done by calculating the product within brackets and summing between brackets. Essentially, SOP representations are rules of precedence for complex multi-node inputs. In terms of graphically representing the activation of C, its activation cannot be easily represented using standard SBGN AND, NOT or OR operators (figure 9). Hence, this SOP expression can be summarized as an XOR gate.

A logic network where relations are encoded by SOP expressions can be represented as a hypergraph (Klamt et al 2006). A hypergraph is defined as a set of nodes connected by hyperedges, where a hyperedge is a generalization of an edge that can connect more than two nodes. This, in turn, can facilitate a more precise representation of biological knowledge (for example, where two proteins are necessary for the activation of a target).
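A compact way to see the SOP rule of precedence at work is to store each expression as a list of product terms and evaluate it directly. The data layout and the function name below are hypothetical choices for illustration; reading each product term as one hyperedge pointing at the target is one possible mapping onto the hypergraph view, stated here as an assumption.

```python
# An SOP expression as a list of product terms; each term maps an input
# name to the value it must take (1 = plain input, 0 = negated input).
XOR_C = [{"a": 1, "b": 0}, {"a": 0, "b": 1}]  # (a AND NOT b) OR (NOT a AND b)
AND_D = [{"a": 1, "b": 1}]                     # a AND b

def evaluate_sop(terms, state):
    """Product (AND) within each term, sum (OR) between terms."""
    return int(any(all(state[var] == val for var, val in term.items())
                   for term in terms))

for a in (0, 1):
    for b in (0, 1):
        state = {"a": a, "b": b}
        print(a, b, evaluate_sop(XOR_C, state), evaluate_sop(AND_D, state))
```

The loop prints the truth tables of both gates: C is active only when exactly one of A and B is active, and D only when both are.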

8.4 Fuzzy logic

cFL defines the transfer function between nodes as a Hill function. Depending on the type of interaction (or logic gate, figure 10), this function can take different forms (Morris et al 2011a).

(a) If node C depends only on A, a normalized Hill function is used to calculate C, where k and n are the sensitivity coefficient and Hill coefficient, respectively:

$$c = \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

(b) An inhibitory relationship is represented as the above expression subtracted from 1:

$$c = 1 - \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

(c) For an AND gate, the minimum value of c is used:

$$c = \min\left(\frac{(k_1^{n_2} + 1)\,a^{n_2}}{k_1^{n_2} + a^{n_2}},\ \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$

(d) For an OR gate, the maximum value is used:

$$c = \max\left(\frac{(k_1^{n_2} + 1)\,a^{n_2}}{k_1^{n_2} + a^{n_2}},\ \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$
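The four cFL gate types can be sketched in a few lines of Python. This is a minimal illustration rather than the CNORfuzzy implementation: the function names are hypothetical, and passing a separate (k, n) pair for each input of a two-input gate is an assumption of this sketch.

```python
def hill(x: float, k: float, n: float) -> float:
    """Normalized Hill function (k^n + 1) x^n / (k^n + x^n), so hill(1) = 1."""
    return (k**n + 1.0) * x**n / (k**n + x**n)


def activating(a: float, k: float, n: float) -> float:
    """Gate (a): C depends only on A."""
    return hill(a, k, n)


def inhibitory(a: float, k: float, n: float) -> float:
    """Gate (b): the activating expression subtracted from 1."""
    return 1.0 - hill(a, k, n)


def and_gate(a: float, b: float, pa: tuple, pb: tuple) -> float:
    """Gate (c): minimum of the two transfer functions; pa and pb are (k, n)."""
    return min(hill(a, *pa), hill(b, *pb))


def or_gate(a: float, b: float, pa: tuple, pb: tuple) -> float:
    """Gate (d): maximum of the two transfer functions; pa and pb are (k, n)."""
    return max(hill(a, *pa), hill(b, *pb))
```

For inputs restricted to 0 and 1, these functions reproduce the Boolean gates, while intermediate inputs give intermediate outputs.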

8.5 Logic ODEs

As in the case of cFL, CNORode uses phenomenological transfer functions (i.e. non-mechanistic normalized Hill functions) to describe the dynamics of a node's state as a function of its inputs. Using the examples in figure 10 again, these functions can be described as follows.

(a)

$$\dot{c} = \frac{1}{\tau}\left(B(a) - c\right)$$

where $\dot{c}$ is the development of c over time and $B(a)$ is the normalized Hill function of the continuous variable a. This takes the form

$$\frac{a^n / (k^n + a^n)}{1^n / (k^n + 1^n)}$$

(k and n are again the sensitivity and Hill coefficients, respectively). τ can be interpreted as the maximum value of species c (biologically, this could encompass degradation or other limiting factors), and there is an additional degradation term proportional to c.

(b) An inhibitory relationship is simply the above expression subtracted from 1:

$$\dot{c} = 1 - \frac{1}{\tau}\left(B(a) - a\right)$$

(c) The AND gates take the form

$$\dot{c} = \frac{1}{\tau}\left(B(a)B(b) - c\right)$$

(d) The OR gate notation is as follows:

$$\dot{c} = \frac{1}{\tau}\left(B(a)B(b) + B(a)[1 - B(b)] + B(b)[1 - B(a)] - c\right)$$

In the case of an AND gate, the product of B(a) and B(b) is taken, which maintains consistency in the output with the equivalent Boolean model (i.e. if a = 1 and b = 0, then c = 0 in both the ODE and logic formalisms; similarly, with an OR gate, if a = 1 and b = 0, then c = 1 in both formalisms). As in the case of cFL, normalized Hill functions can approximate commonly observed biochemical dynamics such as linear, sigmoidal and step-like behavior.
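The consistency between the logic ODEs and the Boolean gates can be checked numerically. The sketch below integrates the AND and OR gate equations with a simple forward Euler scheme; the integrator, the parameter values and all function names are assumptions made for illustration and do not reflect the solver used by CNORode.

```python
def hill(x: float, k: float, n: float) -> float:
    """Normalized Hill function: [x^n / (k^n + x^n)] / [1 / (k^n + 1)]."""
    return (x**n / (k**n + x**n)) / (1.0 / (k**n + 1.0))


def and_rhs(a, b, c, tau, k, n):
    """Right-hand side of the AND gate ODE: (B(a) B(b) - c) / tau."""
    return (hill(a, k, n) * hill(b, k, n) - c) / tau


def or_rhs(a, b, c, tau, k, n):
    """Right-hand side of the OR gate ODE."""
    ba, bb = hill(a, k, n), hill(b, k, n)
    return (ba * bb + ba * (1.0 - bb) + bb * (1.0 - ba) - c) / tau


def integrate(rhs, a, b, tau=1.0, k=0.5, n=3.0, dt=0.01, steps=2000):
    """Forward Euler integration of c from c(0) = 0 with constant inputs."""
    c = 0.0
    for _ in range(steps):
        c += dt * rhs(a, b, c, tau, k, n)
    return c


c_and = integrate(and_rhs, 1.0, 0.0)  # approaches the Boolean AND output
c_or = integrate(or_rhs, 1.0, 0.0)    # approaches the Boolean OR output
```

With a = 1 and b = 0, the AND gate stays at 0 and the OR gate relaxes toward 1, in agreement with the Boolean model.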

Acknowledgments

The authors thank J Banga, J Egea, Inna Pertsovskaya and Melody Morris for valuable help and discussion. Funding was provided by the EU-7FP-BioPreDyn and EMBL-EIPOD programs.

References

Aislyn D W and Boran R I 2010 Systems approaches to polypharmacology and drug discovery Curr Opin Drug Discov Dev 13 297–309

Albert I et al 2008 Boolean network simulations for life scientists Source Code Biol Med 3 16

Aldridge B B et al 2006 Physicochemical modelling of cell signalling pathways Nature Cell Biol 8 1195–203

Assmann S M and Albert R 2009 Discrete dynamic modeling with asynchronous update, or how to model complex systems in the absence of quantitative information Methods Mol Biol 553 207–25

Bader G D, Cary M P and Sander C 2006 Pathguide: a pathway resource list Nucleic Acids Res 34 D504–6 (Database issue)

Banga J 2008 Optimization in computational systems biology BMC Syst Biol 2 47

Barabasi A-L and Oltvai Z N 2004 Network biology: understanding the cell's functional organization Nature Rev Genet 5 101–13

Bauer-Mehren A, Furlong L I and Sanz F 2009 Pathway databases and tools for their exploitation: benefits, current limitations and challenges Mol Syst Biol 5 290

Bossi A and Lehner B 2009 Tissue specificity and the human protein interaction network Mol Syst Biol 5 260

Calzone L et al 2008 A comprehensive modular map of molecular interactions in RB/E2F pathway Mol Syst Biol 4 173

Calzone L et al 2010 Mathematical modelling of cell-fate decision in response to death receptor engagement PLoS Comput Biol 6 e1000702

Chen W W, Niepel M and Sorger P K 2010 Classic and contemporary approaches to modeling biochemical reactions Genes Dev 24 1861–75


de Jong H 2002 Modeling and simulation of genetic regulatory systems: a literature review J Comput Biol 9 67–103

Di Cara A et al 2007 Dynamic simulation of regulatory networks using SQUAD BMC Bioinformatics 8 462

Eduati F et al 2010 A Boolean approach to linear prediction for signaling network modeling PLoS One 5 e12789

Egea J and Martí R 2010 An evolutionary method for complex-process optimization Comput Oper Res 37 315–24

Faure A et al 2006 Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle Bioinformatics 22 e124–31

Fisher J and Henzinger T A 2007 Executable cell biology Nature Biotechnol 25 1239–49

Garg A et al 2008 Synchronous versus asynchronous modeling of gene regulatory networks Bioinformatics 24 1917–25

Gonzalez A, Chaouiya C and Thieffry D 2008 Logical modelling of the role of the Hh pathway in the patterning of the Drosophila wing disc Bioinformatics 24 i234–40

Helikar T et al 2008 Emergent decision-making in biological signal transduction networks Proc Natl Acad Sci USA 105 1913–8

Hlavacek W S et al 2006 Rules for modeling signal-transduction systems Sci STKE 2006 re6

Hoffmann A et al 2002 The IkappaB–NF-kappaB signaling module: temporal control and selective gene activation Science 298 1241–5

Hucka M et al 2003 The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models Bioinformatics 19 524–31

Ilsley G R, Luscombe N M and Apweiler R 2009 Know your limits: assumptions, constraints and interpretation in systems biology Biochim Biophys Acta 1794 1280–7

Jørgensen C and Linding R 2010 Simplistic pathways or complex networks? Curr Opin Genet Dev 20 15–22

Junker B H, Klukas C and Schreiber F 2006 VANTED: a system for advanced data analysis and visualization in the context of biological networks BMC Bioinformatics 7 109

Kauffman S A 1969 Metabolic stability and epigenesis in randomly constructed genetic nets J Theor Biol 22 437–67

Khoury G A, Baliban R C and Floudas C A 2011 Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database Sci Rep 1 90

Kirouac D C et al 2012 Creating and analyzing pathway and protein interaction compendia for modelling signal transduction networks BMC Syst Biol 6 29

Klamt S, Saez-Rodriguez J and Gilles E D 2007 Structural and functional analysis of cellular networks with CellNetAnalyzer BMC Syst Biol 1 2

Klamt S et al 2006 A methodology for the structural and functional analysis of signaling and regulatory networks BMC Bioinformatics 7 56

Kreeger P K and Lauffenburger D A 2010 Cancer systems biology: a network modeling perspective Carcinogenesis 31 2–8

Krumsiek J et al 2010 Odefy - from discrete to continuous models BMC Bioinformatics 11 233

Liberali P, Ramo P and Pelkmans L 2008 Protein kinases: starting a molecular systems view of endocytosis Annu Rev Cell Dev Biol 24 501–23

Ma'ayan A et al 2005 Formation of regulatory patterns during signal propagation in a mammalian cellular network Science 309 1078–83

Mendelson E 1970 Boolean Algebra and Switching Circuits (Schaum's Outline Series) (New York: McGraw-Hill)

Mendoza L and Xenarios I 2006 A method for the generation of standardized qualitative dynamical systems of regulatory networks Theor Biol Med Modelling 3 13

Mitsos A et al 2009 Identifying drug effects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data PLoS Comput Biol 5 e1000591

Morris M K et al 2010 Logic-based models for the analysis of cell signaling networks Biochemistry 49 3216–24

Morris M K et al 2011a Training signaling pathway maps to biochemical data with constrained fuzzy logic: quantitative analysis of liver cell responses to inflammatory stimuli PLoS Comput Biol 7 e1001099

Morris M K et al 2011b Querying quantitative logic models (Q2LM) to study intracellular signaling networks and cell-cytokine interactions Biotechnol J 7 374–86

Novere N L et al 2009 The systems biology graphical notation Nature Biotechnol 27 735–41

Oda K et al 2005 A comprehensive pathway map of epidermal growth factor receptor signaling Mol Syst Biol 1 2005.0010

Patlak M 2010 Competitors try collaboration to speed drug development J Natl Cancer Inst 102 841–3

Pieroni E et al 2008 Protein networking: insights into global functional organization of proteomes Proteomics 8 799–816

Rangamani P and Iyengar R 2008 Modelling cellular signalling systems Essays Biochem 45 83–94

Saez-Rodriguez J et al 2009 Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction Mol Syst Biol 5 331

Saez-Rodriguez J, Alexopoulos L G and Stolovitzky G 2011a Setting the standards for signal transduction research Sci Signal 4 pe10

Saez-Rodriguez J et al 2011b Comparing signaling networks between normal and transformed hepatocytes using discrete logical models Cancer Res 71 5400–11

Sahin O et al 2009 Modeling ERBB receptor-regulated G1/S transition to find novel targets for de novo trastuzumab resistance BMC Syst Biol 3 1

Sasagawa S et al 2005 Prediction and validation of the distinct dynamics of transient and sustained ERK activation Nature Cell Biol 7 365–73

Schlatter R et al 2009 ON/OFF and beyond - a boolean model of apoptosis PLoS Comput Biol 5 e1000595

Shmulevich I et al 2003 Steady-state analysis of genetic regulatory networks modelled by probabilistic boolean networks Comp Funct Genomics 4 601–8

Terfve C and Saez-Rodriguez J 2012 Modeling signaling networks using high-throughput phospho-proteomics Adv Exp Med Biol 736 19–57

Thakar J and Albert R 2010 Boolean models of within-host immune interactions Curr Opin Microbiol 13 377–81

Vinayagam A et al 2011 A directed protein interaction network for investigating intracellular signal transduction Sci Signal 4 rs8

Watterson S, Marshall S and Ghazal P 2008 Logic models of pathway biology Drug Discov Today 13 447–56

Wittmann D M et al 2009 Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling BMC Syst Biol 3 98


  • 1 Introduction
    • 1.1 From parts to interactions
    • 1.2 From interactions to mechanism
    • 1.3 Logic modeling
    • 1.4 Software
    • 1.5 The example model
  • 2 Boolean steady state
    • 2.1 Steady-state optimization and simulation
    • 2.2 Interpretation of steady-state result
  • 3 Two time points (or additional steady state)
    • 3.1 Multiple steady-state optimization and simulation
    • 3.2 Interpretation
  • 4 Synchronous multiple time-point simulation and multiple timescales
    • 4.1 Synchronous and asynchronous updating
  • 5 Constrained fuzzy logic
    • 5.1 Model training and simulation
    • 5.2 Interpretation
  • 6 Logic ODEs
    • 6.1 Converting from Boolean to continuous
    • 6.2 Parameter estimation
    • 6.3 Compressing an ODE model
  • 7 Summary and future developments
  • 8 Materials and methods
    • 8.1 CellNOptR
    • 8.2 Network and data generation
    • 8.3 Boolean logic
    • 8.4 Fuzzy logic
    • 8.5 Logic ODEs
  • Acknowledgments
  • References


functions are calculated in the same way as the Boolean steady-state formalism (section 8.3).

3.2 Interpretation

In our example, the two steady-state optimization finds the feedback from ERK back to SOS-1 (figure 8). Hence, the transient activation of Raf1, ERK and AP1 seen in figure 4 is captured in the trained model. With a single characteristic time point, a model that includes the negative feedback from ERK to SOS-1 at $t_1$ would not be selected, because oscillation prevents the branch from ever reaching a stable steady state. However, if we say that the branch is active at $t_1$ but that the negative feedback is only active at $t_2$, and that when active this negative feedback permanently turns SOS-1 off, then the model does reach a steady state at $t_1$ (where SOS-1, Raf-1 and ERK are all ON) and a different steady state at $t_2$ (where SOS-1, Raf-1 and ERK are all OFF as a result of the activated negative feedback).

4 Synchronous multiple time-point simulation and multiple timescales

As discussed in section 3, by measuring at two characteristic time points the trained logic model is capable of finding the slow negative feedback from ERK to SOS-1, and therefore moves a step closer to the 'true' network. However, the oscillations of NFκB still cannot be explained with the pseudo-steady-state formalism, as the full time course data (and not just two time points) are needed to observe this effect. This can be modeled by a discrete time Boolean model that is available as an add-on R package to CellNOptR: CNORdt (discrete time).

4.1 Synchronous and asynchronous updating

CNORdt introduces some variation in how time is handled in the model. Instead of simulating and fitting data at steady states, it is capable of fitting time course data by using an additional model parameter together with a synchronous updating scheme.

Synchronous updating is where all nodes are updated simultaneously during model simulation; hence, each node at time t is a function of its input nodes at t − 1 (see section 8.3). This is the updating scheme used in CellNOptR. An alternative method is asynchronous updating, where nodes are updated in a random or non-synchronous order, depending on the asynchronous method used. This leads to different simulation properties depending on the updating method chosen. Synchronous updates are deterministic, and simulations run under the same conditions (inputs and perturbations) will reach the same steady state (or attractor) each time. In contrast, asynchronous updating introduces stochasticity into the system, such that different steady states can be reached from the same starting conditions. The random updating of node values is one possible application of asynchronicity. This enables sampling over all timescales (any reaction can be deemed to be slowest or fastest), thus avoiding the constraint inherent in synchronous simulations of an equal timescale over all reactions. However, this added complexity can make results difficult to interpret (Garg et al 2008). Mixed synchronous/asynchronous updating is an intermediate approach that can stratify reaction groups according to their known reaction rates, thus taking advantage of a priori knowledge and reducing the complexity of a fully asynchronous approach (Faure et al 2006, Albert et al 2008, Assmann and Albert 2009, Garg et al 2008).
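The difference between the two updating schemes can be seen on a small hypothetical network of two mutually inhibiting nodes (X = NOT Y, Y = NOT X). This network is not part of the toy model used in this paper, and the function names are assumptions made for this sketch.

```python
import random

# Hypothetical two-node network with mutual inhibition.
RULES = {
    "X": lambda s: 1 - s["Y"],
    "Y": lambda s: 1 - s["X"],
}


def synchronous_step(state: dict) -> dict:
    """Update all nodes at once from the states at the previous step."""
    return {node: rule(state) for node, rule in RULES.items()}


def asynchronous_sweep(state: dict, order: list) -> dict:
    """Update nodes one at a time; each update sees the latest states."""
    state = dict(state)
    for node in order:
        state[node] = RULES[node](state)
    return state


def random_order(rng: random.Random) -> list:
    """Draw a random update order, as in random asynchronous updating."""
    order = list(RULES)
    rng.shuffle(order)
    return order


start = {"X": 0, "Y": 0}
sync_next = synchronous_step(start)                 # deterministic
x_first = asynchronous_sweep(start, ["X", "Y"])     # X wins
y_first = asynchronous_sweep(start, ["Y", "X"])     # Y wins
```

From the same starting state, the synchronous scheme always moves to the same next state, whereas the asynchronous scheme settles into a different steady state depending on which node is updated first.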

CNORdt introduces a scaling parameter that defines the timescale of the Boolean synchronous simulation. Where each 'tick' (t) (or simulation step) is the synchronous updating of all nodes in the model according to their inputs at t − 1, the scaling parameter defines the 'tick' frequency relative to the timescale of the real data. Although this is a crude approach (i.e. it implies a single rate across all reactions), it allows us to fit a synchronous Boolean simulation to data. Hence, all data points can be fitted to the model, and hyperedges that cause feedback in the model can be included, which allows the model to reveal more complex dynamics such as oscillations. CNORdt still describes the node states as either on or off (1/0), and the transfer functions are calculated as in section 8.3. The scaling parameter is applied to the simulation of the system and hence does not affect the transfer functions themselves.
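One possible reading of the scaling parameter is sketched below in Python. The mapping from measurement time to simulation tick (rounding time multiplied by a tick rate), the single toggling node and all names are assumptions made for illustration; the actual CNORdt implementation may differ.

```python
def simulate_ticks(step, state, n_ticks):
    """Run a synchronous simulation and keep the state after every tick."""
    trajectory = [state]
    for _ in range(n_ticks):
        state = step(state)
        trajectory.append(state)
    return trajectory


def sample_at(trajectory, times, ticks_per_minute):
    """Read the simulated state at each measurement time (assumed mapping)."""
    last = len(trajectory) - 1
    return [trajectory[min(int(round(t * ticks_per_minute)), last)] for t in times]


def mse(sim, data):
    """Mean squared error between simulated and measured values."""
    return sum((s - d) ** 2 for s, d in zip(sim, data)) / len(data)


# Hypothetical oscillating node: it switches state on every tick.
trajectory = simulate_ticks(lambda s: 1 - s, 0, 6)

times = [0, 10, 20, 30]   # measurement times in minutes
data = [0, 1, 0, 1]       # hypothetical measurements of the node
candidates = [0.1, 0.2]   # candidate tick rates (ticks per minute)
best_scale = min(candidates, key=lambda r: mse(sample_at(trajectory, times, r), data))
```

Only the mapping between ticks and real time changes with the scaling parameter; the transfer function of the node is untouched, as stated above.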

Figures 5 and 8 show how the NFκB oscillations can be predicted by fitting a dynamic logic model to the full time course while maintaining the two steady-state assumptions from section 3, i.e. simulating 'fast' reactions from t = 0 to t = 10 and 'slow' reactions from t = 10 to t = 30.

5 Constrained fuzzy logic

One of the main limitations of Boolean logic models is that the assumption of a single level of activation (species can only be on/off) is biochemically unrealistic. Fuzzy logic is another logic modeling formalism that allows for intermediate levels of activation. It was originally developed in the field of control theory for predicting the outputs of complex processes where inputs could only partially be characterized (Morris et al 2011a). Its strength lies in the flexibility it affords when defining relationships between input and output nodes. This flexibility can also be a weakness if a large number of parameters are required to define these functional relationships. Constrained fuzzy logic (cFL) deals with this potential complexity by limiting the repertoire of relationships between nodes. The cFL formalism used in CellNOpt (CNORfuzzy) is fully described in Morris et al (2011a). Briefly, the relationships (or transfer functions) between nodes in cFL are limited to Hill functions. Hence, each transfer function has two free parameters: the Hill coefficient n, which controls the steepness of the function, and the sensitivity parameter k, which determines the midpoint of the function (i.e. the value of the input that produces half the maximal output). By varying these two parameters, linear, sigmoidal and step-like dynamics can be produced that are good approximations to protein–protein interactions and enzymatic reactions. In CNORfuzzy, further constraints are imposed by initially limiting the possible parameter combinations to a subset of discrete values. Details of the transfer functions used can be found in materials and methods, section 8.4.

Figure 5. The fit of the model at multiple time points using fast (t = 0 to t = 10) and slow (t = 10 to t = 30) timescales. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORdt took 300 s. (Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim and Inh, with cue labels egf, tnfa, pi3ki and raf1i; time axis ticks at 0, 10 and 30.)

5.1 Model training and simulation

Model training and simulation in CNORfuzzy are carried out in a similar manner to the Boolean steady-state formalism. After compression and expansion of the logic hypergraph, a genetic algorithm determines transfer functions and a network topology that minimize the MSE between the model and the data at steady state. This is followed by a number of refinement steps that fine-tune the Hill function parameters and reduce the complexity of the network topology. The in silico data and model fit at $t_1 = 10$ are shown in figure 6.

5.2 Interpretation

CNORfuzzy is capable of fitting intermediate values (figure 6). For most cases, the cFL model generates similar fits to the steady-state Boolean model. However, the fit to data is more accurate since the values are continuous and not limited to 0 or 1. More importantly, the cFL model obtains a better fit for p38, as it uncovers a link in the structure that Boolean models are unable to capture. In the 'true' network, TNFα and EGF are both required to activate p38 (albeit the activation is low relative to the other signals). In the previous Boolean formalisms, this low activation of p38 cannot be modeled, as the simulation can only take the values 0/1. However, CNORfuzzy is capable of adding the hyperedge 'Map3K1 AND Map3K7 → MKK4' (figure 8) to explain this activation, and hence moves a step closer to finding the underlying true network.

The CNORfuzzy model fit also illustrates some caveats associated with fuzzy logic. We can see that CNORfuzzy also retains the Map3K7 → p38 hyperedge (figure 8), thus activating p38 with TNFα stimulation alone (i.e. in the absence of EGF stimulation). This occurs as CNORfuzzy attempts to fit the noisy signal of inactive p38, thus adding a hyperedge that is not present in the 'true' network. CNORfuzzy also adds hyperedges from TNFα to AP1 that convey a weak activating signal to compensate for the missing hyperedges (TRAF2 → ASK-1, ASK-1 → Map3K7) from the PKN (figure 1). These examples illustrate the sensitivity of the cFL approach to data quality, which can make interpretation of the results more subtle and difficult (Morris et al 2011a).

6 Logic ODEs

The Boolean logic formalisms described above can qualitatively fit the network topology and logic gates that best describe the underlying data. cFL can add quantitative information through its ability to fit intermediate values between 0 and 1 at steady state. In terms of time, however, all these formalisms rely on discrete simulations. To obtain a model that is fully continuous in both state and time, CNORode adds to these methods by transforming a discrete logic model into a continuous model. It does this by defining a set of ODEs for each model species. There are several formalisms to convert discrete logic to continuous models (e.g. SQUAD (Di Cara et al 2007)) or hybrid models (e.g. piecewise linear models (de Jong 2002)). CellNOpt includes the method developed by Wittmann et al (2009) that was implemented in Matlab as Odefy (Krumsiek et al 2010).

Figure 6. The fit of the trained model at $t_1 = 10$ using constrained fuzzy logic. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORfuzzy took 1200 s. (Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim and Inh, with cue labels egf, tnfa, pi3ki and raf1i; time axis ticks at 0, 10 and 30.)

6.1 Converting from Boolean to continuous

The approach used to convert Boolean to continuous models is fully explained in Wittmann et al (2009). Briefly, the goal is to simulate the full dynamics of each species in the logic model while retaining consistency with the Boolean representation. This means that where the output of a logic gate is 0 or 1, the ODEs replacing a Boolean state should also return 0 or 1. This is achieved in a similar manner to cFL (but with an additional parameter τ) by applying a normalized Hill function on the interval between 0 and 1. Applying these functions to each hyperedge defines a new continuous ODE model to replace the underlying Boolean model. This is more fully explained in section 8.5.

6.2 Parameter estimation

CNORode currently provides links to two stochastic non-local optimization algorithms: a genetic algorithm (genalg package, http://cran.r-project.org/web/packages/genalg) and an implementation in R of scatter search (Egea and Martí 2010). These are used to fit the Hill function parameters k and n and the ODE parameter τ to each logic gate in a model that has already been topologically optimized by one or more of the other formalisms.

6.3 Compressing an ODE model

Compression of the model before training may lead to the loss of elements important to capture dynamic features and must thus be done with caution. Returning to our example (figure 2), the in silico data were generated through a set of normalized Hill functions. Hence, with the exception of AP1 (where the missing hyperedge prohibits any exact simulation of this signal), CNORode should be capable of simulating the other signals in the system exactly after parameter optimization of the associated logic ODEs. However, this may not be possible when the model is compressed. To give an example, in our toy model (figure 1) the pathway consisting of SOS-1, Ras, Raf-1, MEK 1 and ERK is compressed to SOS-1 → Raf-1 → ERK. The in silico data were generated with ODEs describing the uncompressed interactions. We can see from figure 7 that the compressed model can accurately simulate the in silico data for this pathway (Raf-1 and ERK signals). In this case, the normalized Hill functions have enough dynamic plasticity to summarize four interactions (SOS-1 → Ras → Raf-1 → MEK 1 → ERK) as two (SOS-1 → Raf-1 → ERK).


Figure 7. The fit of the trained model using CNORode. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The parameter training in CNORode took 2000 s. (Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim and Inh, with cue labels egf, tnfa, pi3ki and raf1i; time axis ticks at 0, 10 and 30.)

However, this is not the case where we have feedback from ERK through a phosphatase (ph) back to SOS-1, and from NFκB through expression (ex) back to IκB. In these cases, it is necessary not to compress 'ph' and 'ex' to allow CNORode to model the correct dynamics (transience and oscillations, respectively). The non-compression is required as 'ph' and 'ex' are integral to the dynamics observed in the in silico data. Figures 7 and 8 therefore show that, with the exception of AP1, CNORode can accurately model the in silico data of the toy model once compression of those key nodes is suppressed.

7 Summary and future developments

In this contribution, we have reviewed different logic-based approaches to model signal transduction networks. Recent developments in proteomics techniques, from antibody-based methods (xMAP, protein arrays, high-throughput microscopy, etc) to mass spectrometry methods (Terfve and Saez-Rodriguez 2012), allow us to generate a large amount of phosphoproteomic data. Given the size of the underlying networks, we believe that logic-based models, which do not need extensive biochemical detail and thus lead to tractable models even when dealing with multiple pathways, are a useful approach to analyzing signal transduction on a large scale. Therefore, we have focused our work on how to train logic models to experimental data, and have implemented various methodologies toward this end in our tool CellNOptR.

Our recent developments presented here expand our previous work by including strategies to deal with the inherent dynamic nature of signaling processes (and hence with time series data). We have discussed how modeling dynamic aspects requires more detailed formalisms (and thus, in general, more data and computational time) and how the general methodology has to be re-evaluated at multiple levels, in particular the compression of the network prior to the optimization; hence, we are currently working to develop a general compression routine for dynamic models. Another area of active development is the implementation of efficient optimization strategies to identify both structure and (if existing) continuous parameters (Banga 2008). Although we have covered here a broad palette of logic-based formalisms, we plan to explore other approaches. Some are combinations of what we have discussed (e.g. a cFL formalism simulated over multiple timescales), others are formalisms related to those used here (e.g. SQUAD), and others could add new features such as a probabilistic framework (Shmulevich et al 2003), stochasticity (Albert et al 2008) or formal methodologies (Fisher and Henzinger 2007).

For the sake of simplicity, we have used a toy model that is itself based on a logic formalism to exemplify the potential dynamic behavior and thereby different modeling variants. We are currently working on more realistic benchmarks based on biochemical models, and studying in more detail the role of experimental noise and experimental design in recovering the underlying model structure.


Figure 8. The contribution of each logic modeling formalism to the understanding of the model used to simulate the in silico training data. The time taken for training the model using each formalism is also shown.

As illustrated in our example with the link TRAF2 → ASK-1 → MKK7, databases are comprehensive but not complete, and it is therefore likely that important links are missing from the system of interest (Kirouac et al 2012). To overcome this limitation, we are working on strategies to integrate as many network resources as possible. These include methods that propose novel links that expand the prior knowledge network (Saez-Rodriguez et al 2009, Eduati et al 2010) and the use of information from PINs (Vinayagam et al 2011).

The focus of CellNOptR is the calibration of logic models to data, but a large set of other tools exists that analyze logic models from different angles (Morris et al 2010). For example, the Q2LM toolbox (Morris et al 2011b) uses cFL to understand the effect of perturbations in the context of the whole system under investigation (e.g. under what set of stimuli is a therapeutic perturbation most effective?). CellNetAnalyzer (Klamt et al 2007) has a battery of methods from graph theory as well as specific techniques for logic models. These include minimal intervention sets (the minimum number of perturbations for a desired phenotype) to propose possible therapeutic targets. These tools use the same model format as CellNOptR, so it is easy to pass models between them for analysis. More generally, we are part of the CoLoMoTo initiative, which aims to facilitate interoperability among these tools; the main goal here is the development of SBML-qual as a language to exchange logic models (sbml.org/Community/Wiki/SBML_Level_3_Proposals/Qualitative_Models), as well as the implementation of the SBGN format for network representation (Novere et al 2009).

In general, efficient integration of data and prior knowledge to model signal transduction requires the use of appropriate standards for data, prior knowledge about the networks and the models themselves (Saez-Rodriguez et al 2011a). We consider that logic models will be an area of development in the future, with increasing application to signal transduction research.


Figure 9. An overview of the graphical representation of logic models. (A) The SOP expression for the activation of C summarized as an XOR gate. (B) SOP expressions describing the activation of C and D. (C) An example of a hypergraph representation where the nodes are connected by hyperedges.

8 Materials and methods

8.1 CellNOptR

As mentioned in section 1.4, CellNOptR includes some additional steps in pre-processing logic models before simulation and training to data. The details of these steps can be found in Saez-Rodriguez et al (2009). Briefly, the model is compressed by removing non-identifiable elements. These include nodes on terminal branches that are not part of the experimental design (non-observables; figure 1, p90RSK and CREB), nodes that are not affected by the inputs or perturbations (non-controllables) and additional nodes that can be removed without affecting the logic outcome during simulation (figure 1, Ras, MEK 1, etc).

After this compression step, a superstructure of all possible hyperedges is created (figure 1, inset). This superstructure contains 'the space' of hyperedges that is optimized (through the removal of redundant hyperedges) by training to the experimental data. The training uses a genetic algorithm to search for logic models that minimize a two-part objective function. This function includes the MSE between the simulation of the optimized logic model and the data, and a penalty term for model size. Depending on the formalism used (see the main text), the simulation and data may be at steady state (CellNOptR, CNORfuzzy) or all data points can be used (CNORdt). The resulting logic model is then a subset of the superstructure and contains only the hyperedges that best explain the experimental data (with the additional attribute of parsimony, given the size penalty in the optimization function).
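The structure of this objective can be sketched as follows. The linear form of the size penalty, the default weight and the function names are assumptions made for illustration; the exact penalty used by CellNOptR is described in Saez-Rodriguez et al (2009).

```python
def mse(sim, data):
    """Mean squared error between simulated and measured values."""
    return sum((s - d) ** 2 for s, d in zip(sim, data)) / len(data)


def objective(sim, data, n_hyperedges, size_penalty=0.01):
    """Fit term plus a penalty that grows with model size (assumed linear)."""
    return mse(sim, data) + size_penalty * n_hyperedges


# Two hypothetical candidate models that fit the data equally well.
data = [1, 0, 1]
score_small = objective([1, 0, 1], data, n_hyperedges=3)
score_large = objective([1, 0, 1], data, n_hyperedges=5)
```

When two candidate models fit the data equally well, the one with fewer hyperedges scores lower, which is the parsimony property mentioned above.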

8.2 Network and data generation

The toy model was constructed manually and is based on the model from Saez-Rodriguez et al (2011a). The in silico data were generated from the toy model using CNORode. The parameters were manually adjusted to model as closely as possible the known dynamics of ERK and NFκB activation. After simulation, noise was added to each data point according to $N(\mu, \sigma^2)$, where $\mu = 0$ and $\sigma^2 = 0.05$. The data were then rescaled to the interval [0, 1]. Two methods of cross validation were also performed to demonstrate the robustness of CellNOptR (steady-state Boolean) to sparseness in the data (figure S8, available from stacks.iop.org/PhysBio/9/045003/mmedia).
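The noise and rescaling steps can be sketched in Python as below. The use of min-max scaling for the rescaling step, the example values and the function names are assumptions of this sketch; the text only states that the data were rescaled to [0, 1].

```python
import math
import random


def add_noise(values, variance=0.05, rng=None):
    """Add zero-mean Gaussian noise with the given variance to each point."""
    rng = rng or random.Random()
    sigma = math.sqrt(variance)  # random.gauss expects a standard deviation
    return [v + rng.gauss(0.0, sigma) for v in values]


def rescale(values):
    """Min-max rescaling to [0, 1] (assumed form of the rescaling step)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


clean = [0.0, 0.2, 0.9, 1.0, 0.4]  # hypothetical simulated time course
noisy = rescale(add_noise(clean, rng=random.Random(1)))
```

Passing a seeded `random.Random` instance keeps the generated data reproducible between runs.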

Model and data files, together with the corresponding R scripts, can currently be found at http://www.ebi.ac.uk/~aidanmac/public/logicModelingTutorial (password: tutorial).

8.3 Boolean logic

A Boolean model can be represented as follows:

(1) $N$ species $X_1, X_2, \ldots, X_N$, each represented by a variable $x_i$ taking values 0 or 1.

(2) For each species $X_i$, there is a subset of species $R_i = \{X_{i1}, X_{i2}, \ldots, X_{iN_i}\} \subset \{X_1, X_2, \ldots, X_N\}$ that influence $x_i$.

(3) And for each species $X_i$, an update function $B_i: \{0, 1\}^{N_i} \to \{0, 1\}$.

From this set of rules, the state of each species at time $t + 1$ is a function of the state of its influencing species at time $t$ (Kauffman 1969).
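The three rules above translate directly into a synchronous update step. The sketch below is illustrative only; the function `synchronous_step`, the dictionary layout and the three-node example are assumptions, not part of CellNOptR.

```python
def synchronous_step(state, influencers, update_functions):
    """Compute the state at time t + 1 from the state at time t.

    state: dict mapping species name -> 0 or 1
    influencers: dict mapping species name -> tuple of influencing species (R_i)
    update_functions: dict mapping species name -> function B_i of those inputs
    Species without an update function (e.g. stimuli) keep their value.
    """
    new_state = dict(state)
    for species, b_i in update_functions.items():
        # Every B_i reads the state at time t, never a partially updated one.
        inputs = [state[name] for name in influencers[species]]
        new_state[species] = int(b_i(*inputs))
    return new_state


# Hypothetical three-node example: D requires both A and B.
influencers = {"D": ("A", "B")}
update_functions = {"D": lambda a, b: a and b}
state_t = {"A": 1, "B": 1, "D": 0}
state_t1 = synchronous_step(state_t, influencers, update_functions)
```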

So how does the function $B_i$ (also called a transfer function) for each species $X_i$ deal with inputs from other nodes? $B_i$ can be represented in a sum-of-product (SOP) formulation (Mendelson 1970), which allows for multiple possible inputs (AND, NOT, OR gates) to be processed into a single output. To illustrate this, consider the following simple example (figure 9).

We know that the element D is activated by a combination of A and B (i.e. both A and B are needed for activation). Hence, both the graphical and written representation of this activation is relatively straightforward:

$$B_1(a, b) = a \wedge b$$

However, in the case of the activation of C, this occurs when A is active without B, or when B is active without A. In this case, one needs some additional rules of representation.

The SOP representation allows the above activation to be written using only AND, NOT and OR operators:

$$B_1(a, b) = (a \wedge \neg b) \vee (\neg a \wedge b)$$


Figure 10. The construction of gates with cFL: (a) activating, (b) inhibitory, (c) an AND gate and (d) an OR gate.

This is done by calculating the product within brackets and summing between brackets. Essentially, SOP representations are rules of precedence for complex multi-node inputs. In terms of graphically representing the activation of C, its activation cannot be easily represented using standard SBGN AND, NOT or OR operators (figure 9). Hence, this SOP expression can be summarized as an XOR gate.
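That the SOP expression for C behaves as an XOR gate can be checked by enumerating its truth table. The function name `activation_of_c` is chosen for this illustration only.

```python
from itertools import product


def activation_of_c(a, b):
    """SOP form: (a AND NOT b) OR (NOT a AND b)."""
    return int((a and not b) or (not a and b))


# Evaluate the expression on all four input combinations.
truth_table = {(a, b): activation_of_c(a, b) for a, b in product((0, 1), repeat=2)}

# Compare against exclusive OR on every input combination.
matches_xor = all(value == (a ^ b) for (a, b), value in truth_table.items())
```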

A logic network where relations are encoded by SOP expressions can be represented as a hypergraph (Klamt et al 2006). A hypergraph is defined as a set of nodes connected by hyperedges, where a hyperedge is a generalization of an edge that can be connected to more than two nodes. This in turn can facilitate a more precise representation of biological knowledge (for example, where two proteins are necessary for the activation of a target).
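One possible way to hold such a hypergraph in code is sketched below. The class `Hyperedge`, its fields and the helper `node_value` are hypothetical names; the convention assumed is that each hyperedge is one AND term (one 'product') and that the hyperedges sharing a target are combined with OR (the 'sum').

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    """A directed hyperedge: all sources are combined with AND to act on the target."""
    sources: tuple       # names of input nodes
    target: str          # name of the output node
    negated: tuple = ()  # sources that enter the AND gate negated (NOT)

    def evaluate(self, state):
        return int(all(
            (not state[s]) if s in self.negated else state[s]
            for s in self.sources
        ))


def node_value(target, hyperedges, state):
    """OR over all hyperedges that point at the target (the 'sum' of the SOP form)."""
    return int(any(h.evaluate(state) for h in hyperedges if h.target == target))


# The example of figure 9: D = A AND B, C = (A AND NOT B) OR (NOT A AND B).
hyperedges = [
    Hyperedge(sources=("A", "B"), target="D"),
    Hyperedge(sources=("A", "B"), target="C", negated=("B",)),
    Hyperedge(sources=("A", "B"), target="C", negated=("A",)),
]
state = {"A": 1, "B": 0}
c_value = node_value("C", hyperedges, state)
d_value = node_value("D", hyperedges, state)
```

An ordinary edge list could not express that A and B are jointly required for D; the multi-source hyperedge makes that requirement explicit.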

8.4 Fuzzy logic

cFL defines the transfer function between nodes as a Hill function. Depending on the type of interaction (or logic gate; figure 10), this function can take different forms (Morris et al 2011a):

(a) If node C depends only on A, a normalized Hill function is used to calculate C, where $k$ and $n$ are the sensitivity coefficient and Hill coefficient, respectively:

$$c = \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

(b) An inhibitory relationship is represented as the above expression subtracted from 1:

$$c = 1 - \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

(c) For an AND gate, the minimum value of c is used:

$$c = \min\left(\frac{(k_1^{n_2} + 1)\,a^{n_2}}{k_1^{n_2} + a^{n_2}},\ \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$

(d) And for an OR gate, the maximum value is used:

$$c = \max\left(\frac{(k_1^{n_2} + 1)\,a^{n_2}}{k_1^{n_2} + a^{n_2}},\ \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$
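The four gate types can be written as small functions. This is a sketch rather than the CNORfuzzy code: the function names and the parameter values used in the example are assumptions, and the two-input gates take one sensitivity per input with a shared Hill coefficient, following the formulas as printed above.

```python
def normalized_hill(x, k, n):
    """Normalized Hill function: 0 at x = 0 and exactly 1 at x = 1."""
    return (k ** n + 1) * x ** n / (k ** n + x ** n)


def activate(a, k, n):
    return normalized_hill(a, k, n)


def inhibit(a, k, n):
    return 1 - normalized_hill(a, k, n)


def and_gate(a, b, k1, k2, n):
    # AND: the weaker of the two transformed inputs limits the output.
    return min(normalized_hill(a, k1, n), normalized_hill(b, k2, n))


def or_gate(a, b, k1, k2, n):
    # OR: the stronger of the two transformed inputs sets the output.
    return max(normalized_hill(a, k1, n), normalized_hill(b, k2, n))


# Hypothetical parameters: k = 0.5 for both inputs, n = 3.
c_and = and_gate(1.0, 0.2, 0.5, 0.5, 3)
c_or = or_gate(1.0, 0.2, 0.5, 0.5, 3)
```

With one input fully active and the other weakly active, the AND gate returns a value close to 0 and the OR gate returns 1, which mirrors the Boolean behaviour while still allowing intermediate values.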

8.5 Logic ODEs

As in the case of cFL, CNORode uses phenomenological transfer functions (i.e. non-mechanistic normalized Hill functions) to describe the dynamics of a node's state as a function of its inputs. Using the examples in figure 10 again, these functions can be described as follows:

(a) For an activating relationship,

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a) - c\bigr),$$

where $\dot{c}$ is the development of $c$ over time and $B(a)$ is the normalized Hill function of the continuous variable $a$. This takes the form

$$B(a) = \frac{a^n}{k^n + a^n} \bigg/ \frac{1^n}{k^n + 1^n}$$

($k$ and $n$ are again the sensitivity and Hill coefficients, respectively). $\tau$ can be interpreted as the maximum value of species c (biologically, this could encompass degradation or other limiting factors) and there is an additional degradation term proportional to $c$.

(b) An inhibitory relationship is simply the above expression subtracted from 1:

$$\dot{c} = 1 - \frac{1}{\tau}\,\bigl(B(a) - c\bigr)$$

(c) The AND gates take the form

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a)B(b) - c\bigr)$$

(d) The OR gate notation is as follows:

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a)B(b) + B(a)[1 - B(b)] + B(b)[1 - B(a)] - c\bigr)$$

In the case of an AND gate, the product of $B(a)$ and $B(b)$ is taken, which maintains consistency in the output with the equivalent Boolean model (i.e. if a = 1 and b = 0, in both ODE and logic formalisms c = 0; similarly, with an OR gate, if a = 1 and b = 0, in both ODE and logic formalisms c = 1). As in the case of cFL, normalized Hill functions can approximate commonly observed biochemical dynamics such as linear, sigmoidal and step-like behavior.
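The consistency with the Boolean model can be checked numerically. The sketch below integrates the AND and OR gate equations with a simple forward-Euler scheme for constant inputs a = 1 and b = 0. It is not how CNORode integrates its ODEs; the step size, the parameter values (k = 0.5, n = 3, τ = 1) and all function names are assumptions made for this example.

```python
def normalized_hill(x, k=0.5, n=3.0):
    """B(x): Hill function divided by its value at x = 1, so that B(1) = 1."""
    return (x ** n / (k ** n + x ** n)) / (1.0 / (k ** n + 1.0))


def and_rate(a, b, c, tau):
    return (normalized_hill(a) * normalized_hill(b) - c) / tau


def or_rate(a, b, c, tau):
    ba, bb = normalized_hill(a), normalized_hill(b)
    return (ba * bb + ba * (1 - bb) + bb * (1 - ba) - c) / tau


def integrate(rate, a, b, tau=1.0, c0=0.5, dt=0.01, steps=2000):
    """Forward-Euler integration of dc/dt = rate(a, b, c, tau) with constant inputs."""
    c = c0
    for _ in range(steps):
        c += dt * rate(a, b, c, tau)
    return c


# Start both gates from c = 0.5 and let them relax for 20 time units.
c_and = integrate(and_rate, a=1.0, b=0.0)
c_or = integrate(or_rate, a=1.0, b=0.0)
```

Starting from an intermediate value, the AND gate output decays towards 0 and the OR gate output rises towards 1, matching the Boolean outputs quoted in the text.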

Acknowledgments

The authors thank J Banga, J Egea, Inna Pertsovskaya and Melody Morris for valuable help and discussion. Funding was provided by the EU-7FP-BioPreDyn and EMBL-EIPOD programs.

References

Aislyn D W and Boran R I 2010 Systems approaches to polypharmacology and drug discovery Curr. Opin. Drug Discov. Dev. 13 297–309

Albert I et al 2008 Boolean network simulations for life scientists Source Code Biol. Med. 3 16

Aldridge B B et al 2006 Physicochemical modelling of cell signalling pathways Nature Cell Biol. 8 1195–203

Assmann S M and Albert R 2009 Discrete dynamic modeling with asynchronous update, or how to model complex systems in the absence of quantitative information Methods Mol. Biol. 553 207–25

Bader G D, Cary M P and Sander C 2006 Pathguide: a pathway resource list Nucleic Acids Res. 34 D504–6 (Database issue)

Banga J 2008 Optimization in computational systems biology BMC Syst. Biol. 2 47

Barabasi A-L and Oltvai Z N 2004 Network biology: understanding the cell's functional organization Nature Rev. Genet. 5 101–13

Bauer-Mehren A, Furlong L I and Sanz F 2009 Pathway databases and tools for their exploitation: benefits, current limitations and challenges Mol. Syst. Biol. 5 290

Bossi A and Lehner B 2009 Tissue specificity and the human protein interaction network Mol. Syst. Biol. 5 260

Calzone L et al 2008 A comprehensive modular map of molecular interactions in RB/E2F pathway Mol. Syst. Biol. 4 173

Calzone L et al 2010 Mathematical modelling of cell-fate decision in response to death receptor engagement PLoS Comput. Biol. 6 e1000702

Chen W W, Niepel M and Sorger P K 2010 Classic and contemporary approaches to modeling biochemical reactions Genes Dev. 24 1861–75


de Jong H 2002 Modeling and simulation of genetic regulatory systems: a literature review J. Comput. Biol. 9 67–103

Di Cara A et al 2007 Dynamic simulation of regulatory networks using SQUAD BMC Bioinformatics 8 462

Eduati F et al 2010 A Boolean approach to linear prediction for signaling network modeling PLoS One 5 e12789

Egea J and Martí R 2010 An evolutionary method for complex-process optimization Comput. Oper. Res. 37 315–24

Faure A et al 2006 Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle Bioinformatics 22 e124–31

Fisher J and Henzinger T A 2007 Executable cell biology Nature Biotechnol. 25 1239–49

Garg A et al 2008 Synchronous versus asynchronous modeling of gene regulatory networks Bioinformatics 24 1917–25

Gonzalez A, Chaouiya C and Thieffry D 2008 Logical modelling of the role of the Hh pathway in the patterning of the Drosophila wing disc Bioinformatics 24 i234–40

Helikar T et al 2008 Emergent decision-making in biological signal transduction networks Proc. Natl Acad. Sci. USA 105 1913–8

Hlavacek W S et al 2006 Rules for modeling signal-transduction systems Sci. STKE 2006 re6

Hoffmann A et al 2002 The IkappaB–NF–kappaB signaling module: temporal control and selective gene activation Science 298 1241–5

Hucka M et al 2003 The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models Bioinformatics 19 524–31

Ilsley G R, Luscombe N M and Apweiler R 2009 Know your limits: assumptions, constraints and interpretation in systems biology Biochim. Biophys. Acta 1794 1280–7

Jørgensen C and Linding R 2010 Simplistic pathways or complex networks? Curr. Opin. Genet. Dev. 20 15–22

Junker B H, Klukas C and Schreiber F 2006 VANTED: a system for advanced data analysis and visualization in the context of biological networks BMC Bioinformatics 7 109

Kauffman S A 1969 Metabolic stability and epigenesis in randomly constructed genetic nets J. Theor. Biol. 22 437–67

Khoury G A, Baliban R C and Floudas C A 2011 Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database Sci. Rep. 1 90

Kirouac D C et al 2012 Creating and analyzing pathway and protein interaction compendia for modelling signal transduction networks BMC Syst. Biol. 6 29

Klamt S, Saez-Rodriguez J and Gilles E D 2007 Structural and functional analysis of cellular networks with CellNetAnalyzer BMC Syst. Biol. 1 2

Klamt S et al 2006 A methodology for the structural and functional analysis of signaling and regulatory networks BMC Bioinformatics 7 56

Kreeger P K and Lauffenburger D A 2010 Cancer systems biology: a network modeling perspective Carcinogenesis 31 2–8

Krumsiek J et al 2010 Odefy - from discrete to continuous models BMC Bioinformatics 11 233

Liberali P, Ramo P and Pelkmans L 2008 Protein kinases: starting a molecular systems view of endocytosis Annu. Rev. Cell Dev. Biol. 24 501–23

Ma'ayan A et al 2005 Formation of regulatory patterns during signal propagation in a mammalian cellular network Science 309 1078–83

Mendelson E 1970 Boolean Algebra and Switching Circuits (Schaum's Outline Series) (New York: McGraw-Hill)

Mendoza L and Xenarios I 2006 A method for the generation of standardized qualitative dynamical systems of regulatory networks Theor. Biol. Med. Modelling 3 13

Mitsos A et al 2009 Identifying drug effects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data PLoS Comput. Biol. 5 e1000591

Morris M K et al 2010 Logic-based models for the analysis of cell signaling networks Biochemistry 49 3216–24

Morris M K et al 2011a Training signaling pathway maps to biochemical data with constrained fuzzy logic: quantitative analysis of liver cell responses to inflammatory stimuli PLoS Comput. Biol. 7 e1001099

Morris M K et al 2011b Querying quantitative logic models (Q2LM) to study intracellular signaling networks and cell-cytokine interactions Biotechnol. J. 7 374–86

Novere N L et al 2009 The systems biology graphical notation Nature Biotechnol. 27 735–41

Oda K et al 2005 A comprehensive pathway map of epidermal growth factor receptor signaling Mol. Syst. Biol. 1 2005.0010

Patlak M 2010 Competitors try collaboration to speed drug development J. Natl Cancer Inst. 102 841–3

Pieroni E et al 2008 Protein networking: insights into global functional organization of proteomes Proteomics 8 799–816

Rangamani P and Iyengar R 2008 Modelling cellular signalling systems Essays Biochem. 45 83–94

Saez-Rodriguez J et al 2009 Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction Mol. Syst. Biol. 5 331

Saez-Rodriguez J, Alexopoulos L G and Stolovitzky G 2011a Setting the standards for signal transduction research Sci. Signal. 4 pe10

Saez-Rodriguez J et al 2011b Comparing signaling networks between normal and transformed hepatocytes using discrete logical models Cancer Res. 71 5400–11

Sahin O et al 2009 Modeling ERBB receptor-regulated G1/S transition to find novel targets for de novo trastuzumab resistance BMC Syst. Biol. 3 1

Sasagawa S et al 2005 Prediction and validation of the distinct dynamics of transient and sustained ERK activation Nature Cell Biol. 7 365–73

Schlatter R et al 2009 ON/OFF and beyond - a boolean model of apoptosis PLoS Comput. Biol. 5 e1000595

Shmulevich I et al 2003 Steady-state analysis of genetic regulatory networks modelled by probabilistic boolean networks Comp. Funct. Genomics 4 601–8

Terfve C and Saez-Rodriguez J 2012 Modeling signaling networks using high-throughput phospho-proteomics Adv. Exp. Med. Biol. 736 19–57

Thakar J and Albert R 2010 Boolean models of within-host immune interactions Curr. Opin. Microbiol. 13 377–81

Vinayagam A et al 2011 A directed protein interaction network for investigating intracellular signal transduction Sci. Signal. 4 rs8

Watterson S, Marshall S and Ghazal P 2008 Logic models of pathway biology Drug Discov. Today 13 447–56

Wittmann D M et al 2009 Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling BMC Syst. Biol. 3 98


  • 1 Introduction
    • 1.1 From parts to interactions
    • 1.2 From interactions to mechanism
    • 1.3 Logic modeling
    • 1.4 Software
    • 1.5 The example model
  • 2 Boolean steady state
    • 2.1 Steady-state optimization and simulation
    • 2.2 Interpretation of steady-state result
  • 3 Two time points (or additional steady state)
    • 3.1 Multiple steady-state optimization and simulation
    • 3.2 Interpretation
  • 4 Synchronous multiple time-point simulation and multiple timescales
    • 4.1 Synchronous and asynchronous updating
  • 5 Constrained fuzzy logic
    • 5.1 Model training and simulation
    • 5.2 Interpretation
  • 6 Logic ODEs
    • 6.1 Converting from Boolean to continuous
    • 6.2 Parameter estimation
    • 6.3 Compressing an ODE model
  • 7 Summary and future developments
  • 8 Materials and methods
    • 8.1 CellNOptR
    • 8.2 Network and data generation
    • 8.3 Boolean logic
    • 8.4 Fuzzy logic
    • 8.5 Logic ODEs
  • Acknowledgments
  • References


[Figure 5 plot: one column per measured protein (Raf-1, ERK, AP1, GSK-3, p38, NFκB) plus Stim (egf, tnfa) and Inh (pi3ki, raf1i) columns; x-axis ticks at t = 0, 10 and 30; y-axis tick at 0.5.]

Figure 5. The fit of the model at multiple time points using fast (t = 0 to t = 10) and slow (from t = 10 to t = 30) timescales. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORdt took 300 s.

combinations to a subset of discrete values. Details of the transfer functions used can be found in materials and methods, section 8.4.

5.1 Model training and simulation

Model training and simulation in CNORfuzzy are carried out in a similar manner to the Boolean steady-state formalism. After compression and expansion of the logic hypergraph, a genetic algorithm determines transfer functions and a network topology that minimize the MSE between the model and the data at steady state. This is followed by a number of refinement steps that fine-tune the Hill function parameters and reduce the complexity of the network topology. The in silico data and model fit at t1 = 10 are shown in figure 6.

5.2 Interpretation

CNORfuzzy is capable of fitting intermediate values (figure 6). For most cases, the cFL model generates similar fits to the steady-state Boolean model. However, the fit to data is more accurate since the values are continuous and not limited to 0 or 1. More importantly, the cFL model obtains a better fit for p38, as it uncovers a link in the structure that Boolean models are unable to capture. In the 'true' network, TNFα and EGF are both required to activate p38 (albeit the activation is low relative to the other signals). In the previous Boolean formalisms, this low activation of p38 cannot be modeled, as the simulation can only take the values 0 or 1. However, CNORfuzzy is capable of adding the hyperedge 'Map3K1 AND Map3K7 → MKK4' (figure 8) to explain this activation and hence move a step closer to finding the underlying true network.

The CNORfuzzy model fit also illustrates some caveats associated with fuzzy logic. We can see that CNORfuzzy also retains the Map3K7 → p38 hyperedge (figure 8), thus activating p38 with TNFα stimulation alone (i.e. in the absence of EGF stimulation). This occurs as CNORfuzzy attempts to fit the noisy signal of inactive p38, thus adding a hyperedge that is not present. CNORfuzzy also adds hyperedges from TNFα to AP1 that convey a weak activating signal to compensate for the missing hyperedges (TRAF2 → ASK-1, ASK-1 → Map3K7) from the PKN (figure 1). These examples illustrate the sensitivity of the cFL approach to the data quality, and this can make interpretation of the results more subtle and difficult (Morris et al 2011a).

6 Logic ODEs

The Boolean logic formalisms described above can qualitatively fit the network topology and logic gates that best describe the underlying data. cFL can add quantitative information by its ability to fit intermediate values between 0 and 1 at steady state. In terms of time, however, all these formalisms rely on discrete simulations. To obtain a fully continuous model, both in state and time, CNORode adds to these methods by transforming a discrete logic model to a continuous model. It does this by defining a set of ODEs for each model species. There are several formalisms to convert discrete logic to continuous models (e.g. SQUAD (Di Cara et al 2007)) or hybrid models (e.g. piecewise linear models (de Jong 2002)). CellNOpt includes the method developed by Wittmann et al (2009) that was implemented in Matlab as Odefy (Krumsiek et al 2010).

[Figure 6 plot: one column per measured protein (Raf-1, ERK, AP1, GSK-3, p38, NFκB) plus Stim (egf, tnfa) and Inh (pi3ki, raf1i) columns; x-axis ticks at t = 0, 10 and 30; y-axis tick at 0.5.]

Figure 6. The fit of the trained model at t1 = 10 using constrained fuzzy logic. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORfuzzy took 1200 s.

6.1 Converting from Boolean to continuous

The approach used to convert Boolean to continuous models is fully explained in Wittmann et al (2009). Briefly, the goal is to simulate the full dynamics of each species in the logic model while retaining consistency with the Boolean representation. What this means is that where the output of a logic gate is 0 or 1, the ODEs replacing a Boolean state should also return to 0 or 1. This is achieved in a similar manner to cFL (but with an additional parameter τ) by applying a normalized Hill function between the intervals 0 and 1. Applying these functions to each hyperedge defines a new continuous ODE model to replace the underlying Boolean model. This is more fully explained in section 8.5.

6.2 Parameter estimation

CNORode currently provides links to two stochastic non-local optimization algorithms: a genetic algorithm (genalg package, http://cran.r-project.org/web/packages/genalg) and an implementation in R of scatter search (Egea and Martí 2010). These are used to fit the Hill function parameters k and n and the ODE parameter τ to each logic gate in a model that has been already topologically optimized by one or more of the other formalisms.
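The idea of fitting k, n and τ for a single gate can be illustrated with a toy example. A plain seeded random search stands in here for the genetic algorithm and scatter search named above; it is a much cruder method, used only to keep the sketch short. The synthetic data, the parameter bounds and all function names are assumptions, and the closed-form trajectory holds only for a constant input a and c(0) = 0.

```python
import math
import random


def normalized_hill(x, k, n):
    return (k ** n + 1) * x ** n / (k ** n + x ** n)


def trajectory(a, k, n, tau, times):
    """Closed-form solution of dc/dt = (B(a) - c) / tau for constant a and c(0) = 0."""
    plateau = normalized_hill(a, k, n)
    return [plateau * (1 - math.exp(-t / tau)) for t in times]


def mse(params, data, times):
    """Mean squared error between simulated and observed time courses."""
    k, n, tau = params
    errors = [
        (sim - obs) ** 2
        for a, observed in data.items()
        for sim, obs in zip(trajectory(a, k, n, tau, times), observed)
    ]
    return sum(errors) / len(errors)


def random_search(data, times, bounds, iterations=5000, seed=1):
    """Keep the best of `iterations` uniformly sampled parameter sets."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(iterations):
        candidate = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        loss = mse(candidate, data, times)
        if loss < best_loss:
            best_params, best_loss = candidate, loss
    return best_params, best_loss


times = [0, 2, 5, 10, 20, 30]
true_params = (0.4, 2.0, 5.0)  # hypothetical k, n, tau used to make synthetic data
data = {a: trajectory(a, *true_params, times) for a in (0.25, 0.5, 1.0)}
bounds = [(0.1, 1.0), (1.0, 4.0), (1.0, 10.0)]  # assumed search ranges for k, n, tau
best_params, best_loss = random_search(data, times, bounds)
```

Several input levels are used because a single constant input fixes only the plateau B(a) and the timescale τ, which is not enough to separate k from n.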

6.3 Compressing an ODE model

Compression of the model before training may lead to the loss of elements important to capture dynamic features and must thus be done with caution. Returning to our example (figure 2), the in silico data were generated through a set of normalized Hill functions. Hence, with the exception of AP1 (where the missing hyperedge prohibits any exact simulation of this signal), CNORode should be capable of simulating exactly the other signals in the system after parameter optimization of the associated logic ODEs. However, this may not be possible when the model is compressed. To give an example, in our toy model (figure 1), the pathway consisting of SOS-1, Ras, Raf-1, MEK 1 and ERK is compressed to SOS-1 → Raf-1 → ERK. The in silico data were generated with ODEs describing the uncompressed interactions. We can see from figure 7 that the compressed model can accurately simulate the in silico data for this pathway (Raf-1 and ERK signals). In this case, the normalized Hill functions have enough dynamic plasticity to summarize four interactions (SOS-1 → Ras → Raf-1 → MEK 1 → ERK) as two (SOS-1 → Raf-1 → ERK).


[Figure 7 plot: one column per measured protein (Raf-1, ERK, AP1, GSK-3, p38, NFκB) plus Stim (egf, tnfa) and Inh (pi3ki, raf1i) columns; x-axis ticks at t = 0, 10 and 30; y-axis tick at 0.5.]

Figure 7. The fit of the trained model using CNORode. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The parameter training in CNORode took 2000 s.

However, this is not the case where we have feedback from ERK through a phosphatase (ph) back to SOS-1, and from NFκB through expression (ex) back to IκB. In these cases, it is necessary to not compress 'ph' and 'ex' to allow CNORode to model the correct dynamics (transience and oscillations, respectively). The non-compression is required as 'ph' and 'ex' are integral to the dynamics observed in the in silico data. So figures 7 and 8 show, with the exception of AP1, that CNORode can accurately model the in silico data of the toy model once compression of those key nodes is suppressed.

7 Summary and future developments

In this contribution, we have reviewed different logic-based approaches to model signal transduction networks. Recent developments in proteomics techniques, from antibody-based methods (xMAP, protein arrays, high-throughput microscopy, etc) to mass spectrometry methods (Terfve and Saez-Rodriguez 2012), allow us to generate a large amount of phosphoproteomic data. Given the size of the underlying networks, we believe that logic-based models, which do not need extensive biochemical detail and thus lead to tractable models even when dealing with multiple pathways, are a useful approach to analyzing signal transduction on a large scale. Therefore, we have focused our work on how to train logic models to experimental data and implemented various methodologies toward this end in our tool CellNOptR.

Our recent developments presented here expand our previous work by including strategies to deal with the inherent dynamic nature of signaling processes (and hence with time series data). We have discussed how modeling dynamic aspects requires more detailed formalisms (and thus, in general, more data and computational time) and how the general methodology has to be re-evaluated at multiple levels, in particular the compression of the network prior to the optimization; hence, we are currently working to develop a general compression routine for dynamic models. Another area of active development is the implementation of efficient optimization strategies to identify both structure and (if existing) continuous parameters (Banga 2008). Although we have covered here a broad palette of logic-based formalisms, we plan to explore other approaches. Some are combinations of what we have discussed (e.g. a cFL formalism simulated over multiple timescales), others are formalisms related to those used here (e.g. SQUAD), and others could add new features such as a probabilistic framework (Shmulevich et al 2003), stochasticity (Albert et al 2008) or formal methodologies (Fisher and Henzinger 2007).

For the sake of simplicity, we have used a toy model that is itself based on a logic formalism to exemplify the potential dynamic behavior and thereby different modeling variants. We are currently working on more realistic benchmarks based on biochemical models and studying in more detail the role of experimental noise and experimental design in recovering the underlying model structure.


Figure 8. The contribution of each logic modeling formalism to the understanding of the model used to simulate the in silico training data. The time taken for training the model using each formalism is also shown.

As illustrated in our example with the link TRAF2 → ASK-1 → MKK7, databases are comprehensive but not complete, and it is therefore likely that important links are missing from the system of interest (Kirouac et al 2012). To overcome this limitation, we are working on strategies to integrate as many network resources as possible. These include methods that propose novel links that expand the prior knowledge network (Saez-Rodriguez et al 2009, Eduati et al 2010) and the use of information from PINs (Vinayagam et al 2011).

The focus of CellNOptR is the calibration of logic models to data, but a large set of other tools exist that analyze logic models from different angles (Morris et al 2010). For example, the Q2LM toolbox (Morris et al 2011b) uses cFL to understand the effect of perturbations in the context of the whole system under investigation (e.g. under what set of stimuli is a therapeutic perturbation most effective?). CellNetAnalyzer (Klamt et al 2007) has a battery of methods from graph theory as well as specific techniques for logic models. These include minimal intervention sets (the minimum number of perturbations for a desired phenotype) to propose possible therapeutic targets. These tools use the same model format as CellNOptR, so it is easy to pass models for analysis. More generally, we are part of the CoLoMoTo initiative, which aims to facilitate interoperability among these tools; the main goal here is the development of SBML-qual as a language to exchange logic models (sbml.org/Community/Wiki/SBML_Level_3_Proposals/Qualitative_Models), as well as the implementation of the SBGN format for network representation (Novere et al 2009).

In general, efficient integration of data and prior knowledge to model signal transduction requires the use of appropriate standards for data, prior knowledge about the networks and the models themselves (Saez-Rodriguez et al 2011a). We consider that logic models will be an area of development in the future, with increasing application to signal transduction research.


Figure 9. An overview of the graphical representation of logic models. (A) The SOP expression for the activation of C, summarized as an XOR gate. (B) SOP expressions describing the activation of C and D. (C) An example of a hypergraph representation, where the nodes are connected by hyperedges.

8 Materials and methods

81 CellNOptR

As mentioned in section 14 CellNOptR includes someadditional steps in pre-processing logic models beforesimulation and training to data The details of these stepscan be found in Saez-Rodriguez et al (2009) Briefly themodel is compressed by removing non-identifiable elementsThese include nodes on terminal branches that are not part ofthe experimental design (non-observables figure 1 p90RSKand CREB) nodes that are not affected by the inputs orperturbations (non-controllables) and additional nodes that canbe removed without affecting logic outcome during simulation(figure 1 Ras MEK 1 etc)

After this compression step a superstructure of allpossible hyperedges is created (figure 1 inset) Thissuperstructure contains lsquothe spacersquo of hyperedges that isoptimized (through the removal of redundant hyperedges)by training to the experimental data The training uses agenetic algorithm to search for logic models that minimizea bipartite function This function includes the MSE betweenthe simulation of the optimized logic model and the data anda penalty term for model size Depending on the formalismused (see the main text) the simulation and data may be atsteady state (CellNOptR CNORfuzzy) or all data points canbe used (CNORdt) The resulting logic model is then a subsetof the superstructure and contains only the hyperedges that bestexplain the experimental data (with the additional attribute ofparsimony given the size penalty in the optimization function)

82 Network and data generation

The toy model was constructed manually and is basedon the model from Saez-Rodriguez et al (2011a) Thein silico data were generated from the toy model usingCNORode The parameters were manually adjusted to modelas closely as possible the known dynamics of ERK andNFκB activation After simulation noise was added to eachdata point according to N(μ σ 2) where μ = 0 and σ 2 =005 The data were then rescaled between the intervals[0 1] Two methods of cross validation were also performed

to demonstrate the robustness of CellNOptR (steady-stateBoolean) to sparseness in the data (figure S8 (available fromstacksioporgPhysBio9045003mmedia))

Model and data files together with the correspondingR scripts can currently be found at httpwwwebiacuksimaidanmacpubliclogicModelingTutorial (passwordtutorial)

83 Boolean logic

A Boolean model can be represented as follows

(1) N species X1 X2 XN each represented by a variablexi taking values 0 or 1

(2) For each species Xi there are a subset of species Ri =Xi1 Xi2 XiNi sub X1 X2 XN that influencexi

(3) And for each species Xi an update function Bi 0 1Ni rarr 0 1From these set of rules the state of each species at time

t + 1 is a function of the state of its influencing species at timet (Kauffman 1969)

So how does the function Bi (also called a transferfunction) for each species Xi deal with inputs from othernodes Bi can be represented in a sum-of-product (SOP)formulation (Mendelson 1970) which allows for multiplepossible inputs (AND NOT OR gates) to be processed into asingle output To illustrate this consider the following simpleexample (figure 9)

We know that the element D is activated by a combinationof A and B (ie both A and B are needed for activation) Henceboth the graphical and written representation of this activationis relatively straightforward

B1 (a b) = a and b

However in the case of the activation of C this occurs whenA is active without B or when B is active without A In thiscase one needs some additional rules of representation

The SOP representation allows the above activation to bewritten using only AND NOT and OR operators

B1 (a b) = (a and notb) or (nota and b)

14

Phys Biol 9 (2012) 045003 A MacNamara et al

(a) (b) (c) (d)

Figure 10 The construction of gates with cFL (a) activating (b)inhibitory (c) an AND gate and (d) an OR gate

This is done by calculating the product within brackets andsumming between brackets Essentially SOP representationsare rules of precedence for complex multi-node inputs In termsof graphically representing the activation of C its activationcannot be easily represented using standard SBGN AND NOTor OR operators (figure 9) Hence this SOP expression can besummarized as an XOR gate

A logic network where relations are encoded by SOPexpressions that can be represented as a hypergraph (Klamtet al 2006) A hypergraph is defined as a set of nodes connectedby hyperedges where a hyperedge is a generalization of anedge that can be connected to more than two nodes This inturn can facilitate a more precise representation of biologicalknowledge (for example where two proteins are necessary forthe activation of a target)

84 Fuzzy logic

cFL defines the transfer function between nodes as a Hillfunction Depending on the type of interaction (or logic gatefigure 10) this function can take different forms (Morris et al2011a)

(a) If node C depends only on A a normalized Hill functionis used to calculate C where k and n are the sensitivitycoefficient and Hill coefficient respectively

c = (kn + 1)an

kn + an

(b) An inhibitory relationship is represented as the aboveexpression subtracted from 1

c = 1 minus (kn + 1)an

kn + an

(c) An AND gate the minimum value of c is used

c = min

((kn2

1 + 1) an2

kn21 + an2

(kn2

2 + 1) bn2

kn22 + bn2

)

(d) And for an OR gate the maximum value is used

c = max

((kn2

1 + 1) an2

kn21 + an2

(kn2

2 + 1) bn2

kn22 + bn2

)

85 Logic ODEs

As in the case of cFL CNORode uses phenomenologicaltransfer functions (ie non-mechanistic normalized Hillfunctions) to describe the dynamics of a nodersquos state as afunction of its inputs Using the examples in figure 10 againthese functions can be described as follows

(a) $\dot{c} = \frac{1}{\tau}\,(B(a) - c)$, where $\dot{c}$ is the development of c over time and B(a) is the normalized Hill function of the continuous variable a. This takes the form

$$B(a) = \frac{a^n / (k^n + a^n)}{1^n / (k^n + 1^n)}$$

(k and n are again the sensitivity and Hill coefficients, respectively). τ is a time constant that sets how quickly c approaches B(a), whose maximum value is 1 (biologically this could encompass degradation or other limiting factors), and the $-c/\tau$ contribution acts as an additional degradation term proportional to c.

(b) An inhibitory relationship is obtained by subtracting the Hill term in the above expression from 1:

$$\dot{c} = \frac{1}{\tau}\,\bigl([1 - B(a)] - c\bigr)$$

(c) The AND gate takes the form

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a)B(b) - c\bigr)$$

(d) The OR gate notation is as follows:

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a)B(b) + B(a)[1 - B(b)] + B(b)[1 - B(a)] - c\bigr)$$

In the case of an AND gate, the product of B(a) and B(b) is taken, which maintains consistency in the output with the equivalent Boolean model (ie if a = 1 and b = 0, in both ODE and logic formalisms c = 0; similarly with an OR gate, if a = 1 and b = 0, in both ODE and logic formalisms c = 1). As in the case of cFL, normalized Hill functions can approximate commonly observed biochemical dynamics such as linear, sigmoidal and step-like behavior.
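The Boolean consistency of the AND and OR gates can be checked numerically. The sketch below is an illustration under stated assumptions and not the CNORode implementation: it uses a simple forward-Euler scheme, constant inputs a and b, a shared (k, n) pair for both inputs, and hypothetical function names.

```python
def hill_B(x, k, n):
    """Normalized Hill function B(x), equal to 0 at x = 0 and 1 at x = 1."""
    return (k**n + 1.0) * x**n / (k**n + x**n)


def and_rhs(c, a, b, k, n, tau):
    """Right-hand side of the AND-gate ODE."""
    return (hill_B(a, k, n) * hill_B(b, k, n) - c) / tau


def or_rhs(c, a, b, k, n, tau):
    """Right-hand side of the OR-gate ODE."""
    Ba, Bb = hill_B(a, k, n), hill_B(b, k, n)
    return (Ba * Bb + Ba * (1.0 - Bb) + Bb * (1.0 - Ba) - c) / tau


def integrate(rhs, a, b, k=0.5, n=3.0, tau=1.0, dt=0.01, t_end=20.0):
    """Forward-Euler integration from c(0) = 0 with constant inputs a and b."""
    c = 0.0
    for _ in range(int(round(t_end / dt))):
        c += dt * rhs(c, a, b, k, n, tau)
    return c


if __name__ == "__main__":
    print("AND, a=1, b=0:", integrate(and_rhs, 1.0, 0.0))
    print("OR,  a=1, b=0:", integrate(or_rhs, 1.0, 0.0))
```

For a = 1 and b = 0, the AND-gate state stays at 0 and the OR-gate state relaxes toward 1, in agreement with the Boolean outputs; τ only changes how fast the steady state is approached.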

Acknowledgments

The authors thank J Banga, J Egea, Inna Pertsovskaya and Melody Morris for valuable help and discussion. Funding was provided by the EU-7FP-BioPreDyn and EMBL-EIPOD programs.

References

Aislyn D W and Boran R I 2010 Systems approaches to polypharmacology and drug discovery Curr. Opin. Drug Discov. Dev. 13 297–309

Albert I et al 2008 Boolean network simulations for life scientists Source Code Biol. Med. 3 16

Aldridge B B et al 2006 Physicochemical modelling of cell signalling pathways Nature Cell Biol. 8 1195–203

Assmann S M and Albert R 2009 Discrete dynamic modeling with asynchronous update, or how to model complex systems in the absence of quantitative information Methods Mol. Biol. 553 207–25

Bader G D, Cary M P and Sander C 2006 Pathguide: a pathway resource list Nucleic Acids Res. 34 D504–6 (Database issue)

Banga J 2008 Optimization in computational systems biology BMC Syst. Biol. 2 47

Barabasi A-L and Oltvai Z N 2004 Network biology: understanding the cell's functional organization Nature Rev. Genet. 5 101–13

Bauer-Mehren A, Furlong L I and Sanz F 2009 Pathway databases and tools for their exploitation: benefits, current limitations and challenges Mol. Syst. Biol. 5 290

Bossi A and Lehner B 2009 Tissue specificity and the human protein interaction network Mol. Syst. Biol. 5 260

Calzone L et al 2008 A comprehensive modular map of molecular interactions in RB/E2F pathway Mol. Syst. Biol. 4 173

Calzone L et al 2010 Mathematical modelling of cell-fate decision in response to death receptor engagement PLoS Comput. Biol. 6 e1000702

Chen W W, Niepel M and Sorger P K 2010 Classic and contemporary approaches to modeling biochemical reactions Genes Dev. 24 1861–75


Phys Biol 9 (2012) 045003 A MacNamara et al

de Jong H 2002 Modeling and simulation of genetic regulatory systems: a literature review J. Comput. Biol. 9 67–103

Di Cara A et al 2007 Dynamic simulation of regulatory networks using SQUAD BMC Bioinformatics 8 462

Eduati F et al 2010 A Boolean approach to linear prediction for signaling network modeling PLoS One 5 e12789

Egea J and Martí R 2010 An evolutionary method for complex-process optimization Comput. Oper. Res. 37 315–24

Faure A et al 2006 Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle Bioinformatics 22 e124–31

Fisher J and Henzinger T A 2007 Executable cell biology Nature Biotechnol. 25 1239–49

Garg A et al 2008 Synchronous versus asynchronous modeling of gene regulatory networks Bioinformatics 24 1917–25

Gonzalez A, Chaouiya C and Thieffry D 2008 Logical modelling of the role of the Hh pathway in the patterning of the Drosophila wing disc Bioinformatics 24 i234–40

Helikar T et al 2008 Emergent decision-making in biological signal transduction networks Proc. Natl Acad. Sci. USA 105 1913–8

Hlavacek W S et al 2006 Rules for modeling signal-transduction systems Sci. STKE 2006 re6

Hoffmann A et al 2002 The IkappaB–NF–kappaB signaling module: temporal control and selective gene activation Science 298 1241–5

Hucka M et al 2003 The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models Bioinformatics 19 524–31

Ilsley G R, Luscombe N M and Apweiler R 2009 Know your limits: assumptions, constraints and interpretation in systems biology Biochim. Biophys. Acta 1794 1280–7

Jørgensen C and Linding R 2010 Simplistic pathways or complex networks? Curr. Opin. Genet. Dev. 20 15–22

Junker B H, Klukas C and Schreiber F 2006 VANTED: a system for advanced data analysis and visualization in the context of biological networks BMC Bioinformatics 7 109

Kauffman S A 1969 Metabolic stability and epigenesis in randomly constructed genetic nets J. Theor. Biol. 22 437–67

Khoury G A, Baliban R C and Floudas C A 2011 Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database Sci. Rep. 1 90

Kirouac D C et al 2012 Creating and analyzing pathway and protein interaction compendia for modelling signal transduction networks BMC Syst. Biol. 6 29

Klamt S, Saez-Rodriguez J and Gilles E D 2007 Structural and functional analysis of cellular networks with CellNetAnalyzer BMC Syst. Biol. 1 2

Klamt S et al 2006 A methodology for the structural and functional analysis of signaling and regulatory networks BMC Bioinformatics 7 56

Kreeger P K and Lauffenburger D A 2010 Cancer systems biology: a network modeling perspective Carcinogenesis 31 2–8

Krumsiek J et al 2010 Odefy - from discrete to continuous models BMC Bioinformatics 11 233

Liberali P, Ramo P and Pelkmans L 2008 Protein kinases: starting a molecular systems view of endocytosis Annu. Rev. Cell Dev. Biol. 24 501–23

Ma'ayan A et al 2005 Formation of regulatory patterns during signal propagation in a mammalian cellular network Science 309 1078–83

Mendelson E 1970 Boolean Algebra and Switching Circuits (Schaum's Outline Series) (New York: McGraw-Hill)

Mendoza L and Xenarios I 2006 A method for the generation of standardized qualitative dynamical systems of regulatory networks Theor. Biol. Med. Modelling 3 13

Mitsos A et al 2009 Identifying drug effects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data PLoS Comput. Biol. 5 e1000591

Morris M K et al 2010 Logic-based models for the analysis of cell signaling networks Biochemistry 49 3216–24

Morris M K et al 2011a Training signaling pathway maps to biochemical data with constrained fuzzy logic: quantitative analysis of liver cell responses to inflammatory stimuli PLoS Comput. Biol. 7 e1001099

Morris M K et al 2011b Querying quantitative logic models (Q2LM) to study intracellular signaling networks and cell-cytokine interactions Biotechnol. J. 7 374–86

Novere N L et al 2009 The systems biology graphical notation Nature Biotechnol. 27 735–41

Oda K et al 2005 A comprehensive pathway map of epidermal growth factor receptor signaling Mol. Syst. Biol. 1 2005.0010

Patlak M 2010 Competitors try collaboration to speed drug development J. Natl Cancer Inst. 102 841–3

Pieroni E et al 2008 Protein networking: insights into global functional organization of proteomes Proteomics 8 799–816

Rangamani P and Iyengar R 2008 Modelling cellular signalling systems Essays Biochem. 45 83–94

Saez-Rodriguez J et al 2009 Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction Mol. Syst. Biol. 5 331

Saez-Rodriguez J, Alexopoulos L G and Stolovitzky G 2011a Setting the standards for signal transduction research Sci. Signal. 4 pe10

Saez-Rodriguez J et al 2011b Comparing signaling networks between normal and transformed hepatocytes using discrete logical models Cancer Res. 71 5400–11

Sahin O et al 2009 Modeling ERBB receptor-regulated G1/S transition to find novel targets for de novo trastuzumab resistance BMC Syst. Biol. 3 1

Sasagawa S et al 2005 Prediction and validation of the distinct dynamics of transient and sustained ERK activation Nature Cell Biol. 7 365–73

Schlatter R et al 2009 ON/OFF and beyond - a boolean model of apoptosis PLoS Comput. Biol. 5 e1000595

Shmulevich I et al 2003 Steady-state analysis of genetic regulatory networks modelled by probabilistic boolean networks Comp. Funct. Genomics 4 601–8

Terfve C and Saez-Rodriguez J 2012 Modeling signaling networks using high-throughput phospho-proteomics Adv. Exp. Med. Biol. 736 19–57

Thakar J and Albert R 2010 Boolean models of within-host immune interactions Curr. Opin. Microbiol. 13 377–81

Vinayagam A et al 2011 A directed protein interaction network for investigating intracellular signal transduction Sci. Signal. 4 rs8

Watterson S, Marshall S and Ghazal P 2008 Logic models of pathway biology Drug Discov. Today 13 447–56

Wittmann D M et al 2009 Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling BMC Syst. Biol. 3 98


  • 1. Introduction
    • 1.1. From parts to interactions
    • 1.2. From interactions to mechanism
    • 1.3. Logic modeling
    • 1.4. Software
    • 1.5. The example model
  • 2. Boolean steady state
    • 2.1. Steady-state optimization and simulation
    • 2.2. Interpretation of steady-state result
  • 3. Two time points (or additional steady state)
    • 3.1. Multiple steady-state optimization and simulation
    • 3.2. Interpretation
  • 4. Synchronous multiple time-point simulation and multiple timescales
    • 4.1. Synchronous and asynchronous updating
  • 5. Constrained fuzzy logic
    • 5.1. Model training and simulation
    • 5.2. Interpretation
  • 6. Logic ODEs
    • 6.1. Converting from Boolean to continuous
    • 6.2. Parameter estimation
    • 6.3. Compressing an ODE model
  • 7. Summary and future developments
  • 8. Materials and methods
    • 8.1. CellNOptR
    • 8.2. Network and data generation
    • 8.3. Boolean logic
    • 8.4. Fuzzy logic
    • 8.5. Logic ODEs
  • Acknowledgments
  • References


[Figure 6 plot. Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh. Cue labels: egf, tnfa, pi3k i, raf1 i. x-axis ticks: 0, 10, 30; y-axis tick: 0.5.]

Figure 6. The fit of the trained model at t1 = 10 using constrained fuzzy logic. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The training in CNORfuzzy took 1200 s.

these methods by transforming a discrete logic model to a continuous model. It does this by defining a set of ODEs for each model species. There are several formalisms to convert discrete logic to continuous models (eg SQUAD (Di Cara et al 2007)) or hybrid models (eg piecewise linear models (de Jong 2002)). CellNOpt includes the method developed by Wittmann et al (2009) that was implemented in Matlab as Odefy (Krumsiek et al 2010).

6.1. Converting from Boolean to continuous

The approach used to convert Boolean to continuous models is fully explained in Wittmann et al (2009). Briefly, the goal is to simulate the full dynamics of each species in the logic model while retaining consistency with the Boolean representation. This means that where the output of a logic gate is 0 or 1, the ODEs replacing a Boolean state should also return to 0 or 1. This is achieved in a similar manner to cFL (but with an additional parameter τ) by applying a normalized Hill function on the interval between 0 and 1. Applying these functions to each hyperedge defines a new continuous ODE model to replace the underlying Boolean model. This is more fully explained in section 8.5.
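As a short worked check of this consistency requirement, and assuming the normalized Hill function B(a) and the single-input ODE take the forms given in section 8.5, the endpoints of the interval are mapped onto themselves and the steady state of the ODE equals B(a):

```latex
B(a) = \frac{a^n / (k^n + a^n)}{1^n / (k^n + 1^n)} = \frac{(k^n + 1)\,a^n}{k^n + a^n},
\qquad
B(0) = 0,
\qquad
B(1) = \frac{k^n + 1}{k^n + 1} = 1,

\dot{c} = \frac{1}{\tau}\,\bigl(B(a) - c\bigr) = 0
\;\Longrightarrow\;
c^{*} = B(a).
```

A Boolean input (a = 0 or a = 1) therefore gives a Boolean steady state (c* = 0 or c* = 1) for any k > 0, n > 0 and τ > 0; the parameters only shape the response at intermediate input values and the speed of the approach.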

6.2. Parameter estimation

CNORode currently provides links to two stochastic non-local optimization algorithms: a genetic algorithm (genalg package, http://cran.r-project.org/web/packages/genalg) and an implementation in R of scatter search (Egea and Martí 2010). These are used to fit the Hill function parameters k and n and the ODE parameter τ for each logic gate in a model that has already been topologically optimized by one or more of the other formalisms.
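The idea of fitting k, n and τ with a stochastic search can be sketched as follows. This is a deliberately simplified stand-in: plain random search replaces the genetic algorithm and scatter search named above, a single activating gate with a constant input a = 0.5 replaces a full model, and the parameter bounds and function names are assumptions made for illustration.

```python
import random


def simulate(k, n, tau, a=0.5, dt=0.05, steps=200):
    """Forward-Euler solution of dc/dt = (B(a) - c) / tau, starting from c = 0."""
    target = (k**n + 1.0) * a**n / (k**n + a**n)  # normalized Hill function B(a)
    c, trace = 0.0, []
    for _ in range(steps):
        c += dt * (target - c) / tau
        trace.append(c)
    return trace


def mse(x, y):
    """Mean squared error between two equally long traces."""
    return sum((p - q) ** 2 for p, q in zip(x, y)) / len(x)


def random_search(data, n_trials=2000, seed=1):
    """Draw (k, n, tau) uniformly within assumed bounds and keep the best fit."""
    rng = random.Random(seed)
    best, best_err = None, float("inf")
    for _ in range(n_trials):
        cand = (rng.uniform(0.1, 1.0), rng.uniform(1.0, 5.0), rng.uniform(0.5, 10.0))
        err = mse(simulate(*cand), data)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err


if __name__ == "__main__":
    data = simulate(0.5, 3.0, 2.0)  # synthetic "measurements" with known parameters
    print(random_search(data))
```

With a single constant input level, k and n are not separately identifiable (only the value of B(a) and τ affect the trace), which is one reason several stimulation conditions and time points are needed in practice.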

6.3. Compressing an ODE model

Compression of the model before training may lead to the loss of elements important for capturing dynamic features and must thus be done with caution. Returning to our example (figure 2), the in silico data were generated through a set of normalized Hill functions. Hence, with the exception of AP1 (where the missing hyperedge prohibits any exact simulation of this signal), CNORode should be capable of simulating exactly the other signals in the system after parameter optimization of the associated logic ODEs. However, this may not be possible when the model is compressed. To give an example, in our toy model (figure 1) the pathway consisting of SOS-1, Ras, Raf-1, MEK 1 and ERK is compressed to SOS-1 → Raf-1 → ERK. The in silico data were generated with ODEs describing the uncompressed interactions. We can see from figure 7 that the compressed model can accurately simulate the in silico data for this pathway (Raf-1 and ERK signals). In this case, the normalized Hill functions have enough dynamic plasticity to summarize four interactions (SOS-1 → Ras → Raf-1 → MEK 1 → ERK) as two (SOS-1 → Raf-1 → ERK).


[Figure 7 plot. Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB, Stim, Inh. Cue labels: egf, tnfa, pi3k i, raf1 i. x-axis ticks: 0, 10, 30; y-axis tick: 0.5.]

Figure 7. The fit of the trained model using CNORode. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The parameter training in CNORode took 2000 s.

However, this is not the case where we have feedback from ERK through a phosphatase (ph) back to SOS-1, and from NFκB through expression (ex) back to IκB. In these cases, it is necessary not to compress 'ph' and 'ex' to allow CNORode to model the correct dynamics (transience and oscillations, respectively). The non-compression is required as 'ph' and 'ex' are integral to the dynamics observed in the in silico data. So figures 7 and 8 show, with the exception of AP1, that CNORode can accurately model the in silico data of the toy model once compression of those key nodes is suppressed.

7. Summary and future developments

In this contribution, we have reviewed different logic-based approaches to model signal transduction networks. Recent developments in proteomics techniques, from antibody-based methods (xMAP, protein arrays, high-throughput microscopy, etc) to mass spectrometry methods (Terfve and Saez-Rodriguez 2012), allow us to generate a large amount of phosphoproteomic data. Given the size of the underlying networks, we believe that logic-based models, which do not need extensive biochemical detail and thus lead to tractable models even when dealing with multiple pathways, are a useful approach to analyzing signal transduction on a large scale. Therefore, we have focused our work on how to train logic models to experimental data and implemented various methodologies toward this end in our tool CellNOptR.

Our recent developments presented here expand our previous work by including strategies to deal with the inherent dynamic nature of signaling processes (and hence with time series data). We have discussed how modeling dynamic aspects requires more detailed formalisms (and thus, in general, more data and computational time) and how the general methodology has to be re-evaluated at multiple levels, in particular the compression of the network prior to the optimization; hence, we are currently working to develop a general compression routine for dynamic models. Another area of active development is the implementation of efficient optimization strategies to identify both structure and (if existing) continuous parameters (Banga 2008). Although we have covered here a broad palette of logic-based formalisms, we plan to explore other approaches. Some are combinations of what we have discussed (eg a cFL formalism simulated over multiple timescales), others are formalisms related to those used here (eg SQUAD), and others could add new features such as a probabilistic framework (Shmulevich et al 2003), stochasticity (Albert et al 2008) or formal methodologies (Fisher and Henzinger 2007).

For the sake of simplicity, we have used a toy model that is itself based on a logic formalism to exemplify the potential dynamic behavior and thereby different modeling variants. We are currently working on more realistic benchmarks based on biochemical models and studying in more detail the role of experimental noise and experimental design in recovering the underlying model structure.


Figure 8. The contribution of each logic modeling formalism to the understanding of the model used to simulate the in silico training data. The time taken for training the model using each formalism is also shown.

As illustrated in our example with the link TRAF2 → ASK-1 → MKK7, databases are comprehensive but not complete, and it is therefore likely that important links are missing from the system of interest (Kirouac et al 2012). To overcome this limitation, we are working on strategies to integrate as many network resources as possible. These include methods that propose novel links that expand the prior knowledge network (Saez-Rodriguez et al 2009, Eduati et al 2010) and the use of information from PINs (Vinayagam et al 2011).

The focus of CellNOptR is the calibration of logic models to data, but a large set of other tools exist that analyze logic models from different angles (Morris et al 2010). For example, the Q2LM toolbox (Morris et al 2011b) uses cFL to understand the effect of perturbations in the context of the whole system under investigation (eg under what set of stimuli is a therapeutic perturbation most effective?). CellNetAnalyzer (Klamt et al 2007) has a battery of methods from graph theory as well as specific techniques for logic models. These include minimal intervention sets (the minimum number of perturbations for a desired phenotype) to propose possible therapeutic targets. These tools use the same model format as CellNOptR, so it is easy to pass models for analysis. More generally, we are part of the CoLoMoTo initiative, which aims to facilitate interoperability among these tools; the main goal here is the development of SBML-qual as a language to exchange logic models (sbml.org/Community/Wiki/SBML_Level_3_Proposals/Qualitative_Models), as well as the implementation of the SBGN format for network representation (Novere et al 2009).

In general, efficient integration of data and prior knowledge to model signal transduction requires the use of appropriate standards for data, prior knowledge about the networks and the models themselves (Saez-Rodriguez et al 2011a). We consider that logic models will be an area of development in the future, with increasing application to signal transduction research.


Figure 9. An overview of the graphical representation of logic models. (A) The SOP expression for the activation of C, summarized as an XOR gate. (B) SOP expressions describing the activation of C and D. (C) An example of a hypergraph representation, where the nodes are connected by hyperedges.

8. Materials and methods

8.1. CellNOptR

As mentioned in section 1.4, CellNOptR includes some additional steps in pre-processing logic models before simulation and training to data. The details of these steps can be found in Saez-Rodriguez et al (2009). Briefly, the model is compressed by removing non-identifiable elements. These include nodes on terminal branches that are not part of the experimental design (non-observables; figure 1: p90RSK and CREB), nodes that are not affected by the inputs or perturbations (non-controllables) and additional nodes that can be removed without affecting logic outcome during simulation (figure 1: Ras, MEK 1, etc).
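The first two pruning steps can be sketched as two graph searches. The code below is an illustration on a plain directed graph, not the CellNOptR routine: the function names and the small example network are hypothetical, and the third step (removing intermediate nodes such as Ras) is not covered.

```python
from collections import deque


def reachable(start_nodes, adjacency):
    """Breadth-first search: every node reachable from `start_nodes`."""
    seen = set(start_nodes)
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def prune(edges, inputs, measured):
    """Keep only edges whose two nodes are both controllable and observable."""
    forward, backward = {}, {}
    for src, tgt in edges:
        forward.setdefault(src, []).append(tgt)
        backward.setdefault(tgt, []).append(src)
    controllable = reachable(inputs, forward)    # downstream of an input
    observable = reachable(measured, backward)   # upstream of a readout
    keep = controllable & observable
    return [(s, t) for s, t in edges if s in keep and t in keep]


# Hypothetical example network: "x" is not downstream of any input, and
# p90rsk and creb lie on an unmeasured terminal branch.
edges = [
    ("egf", "sos1"), ("sos1", "ras"), ("ras", "raf1"), ("raf1", "mek"),
    ("mek", "erk"), ("erk", "p90rsk"), ("p90rsk", "creb"), ("x", "erk"),
]
pruned = prune(edges, inputs=["egf"], measured=["raf1", "erk"])
```

In this example, `pruned` retains the chain from egf to erk; the branch through p90rsk and creb is dropped as non-observable and the edge from x is dropped as non-controllable.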

After this compression step, a superstructure of all possible hyperedges is created (figure 1, inset). This superstructure contains 'the space' of hyperedges that is optimized (through the removal of redundant hyperedges) by training to the experimental data. The training uses a genetic algorithm to search for logic models that minimize a bipartite function. This function includes the MSE between the simulation of the optimized logic model and the data, and a penalty term for model size. Depending on the formalism used (see the main text), the simulation and data may be at steady state (CellNOptR, CNORfuzzy) or all data points can be used (CNORdt). The resulting logic model is then a subset of the superstructure and contains only the hyperedges that best explain the experimental data (with the additional attribute of parsimony, given the size penalty in the optimization function).
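A two-term score of this kind can be sketched as below. The exact form of the size penalty used by CellNOptR is not given here, so the linear penalty on the fraction of selected hyperedges, the `size_penalty` weight and the function name are all assumptions made for illustration.

```python
def objective(simulated, measured, n_selected, n_total, size_penalty=1e-4):
    """Mean squared error plus a penalty that grows with model size.

    n_selected / n_total is the fraction of superstructure hyperedges kept
    by the candidate model (assumed penalty form).
    """
    if len(simulated) != len(measured):
        raise ValueError("simulated and measured must have the same length")
    mse = sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(measured)
    return mse + size_penalty * n_selected / n_total


if __name__ == "__main__":
    data = [1.0, 0.0, 1.0]
    # Two candidate models with identical fit but different sizes.
    print(objective([1.0, 0.0, 1.0], data, n_selected=3, n_total=10))
    print(objective([1.0, 0.0, 1.0], data, n_selected=6, n_total=10))
```

When two candidate models fit the data equally well, the one with fewer hyperedges receives the lower score, which is how the penalty term favors parsimonious models.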

8.2. Network and data generation

The toy model was constructed manually and is based on the model from Saez-Rodriguez et al (2011a). The in silico data were generated from the toy model using CNORode. The parameters were manually adjusted to model as closely as possible the known dynamics of ERK and NFκB activation. After simulation, noise was added to each data point according to N(μ, σ²), where μ = 0 and σ² = 0.05. The data were then rescaled to the interval [0, 1]. Two methods of cross validation were also performed to demonstrate the robustness of CellNOptR (steady-state Boolean) to sparseness in the data (figure S8 (available from stacks.iop.org/PhysBio/9/045003/mmedia)).
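The noise and rescaling steps can be sketched as follows. The rescaling method is not specified in the text, so min-max rescaling over the whole list is an assumption, as are the function name and the fixed random seed.

```python
import random


def add_noise_and_rescale(values, variance=0.05, seed=0):
    """Add zero-mean Gaussian noise, then min-max rescale the result to [0, 1]."""
    rng = random.Random(seed)
    sigma = variance ** 0.5           # random.gauss expects a standard deviation
    noisy = [v + rng.gauss(0.0, sigma) for v in values]
    lo, hi = min(noisy), max(noisy)
    if hi == lo:
        return [0.0 for _ in noisy]   # degenerate case: no spread to rescale
    return [(v - lo) / (hi - lo) for v in noisy]


if __name__ == "__main__":
    print(add_noise_and_rescale([0.0, 0.2, 0.9, 1.0]))
```

Note that σ² = 0.05 corresponds to a standard deviation of about 0.22, which is large relative to signals confined to [0, 1].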

Model and data files, together with the corresponding R scripts, can currently be found at http://www.ebi.ac.uk/~aidanmac/public/logicModelingTutorial (password: tutorial).

8.3. Boolean logic

A Boolean model can be represented as follows.

(1) N species $X_1, X_2, \ldots, X_N$, each represented by a variable $x_i$ taking values 0 or 1.

(2) For each species $X_i$ there is a subset of species $R_i = \{X_{i1}, X_{i2}, \ldots, X_{iN_i}\} \subset \{X_1, X_2, \ldots, X_N\}$ that influence $x_i$.

(3) And for each species $X_i$ an update function $B_i: \{0, 1\}^{N_i} \rightarrow \{0, 1\}$.

From this set of rules, the state of each species at time t + 1 is a function of the state of its influencing species at time t (Kauffman 1969).
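The three rules above can be sketched as a synchronous update step. The three-species network and all names below are hypothetical and serve only to illustrate the definition.

```python
def synchronous_step(state, influencers, update):
    """Compute x(t + 1) for every species from the states x(t) of its influencers."""
    return {
        name: update[name](*(state[src] for src in influencers[name]))
        for name in state
    }


# Hypothetical example: C follows A AND B, while A and B are held constant
# (each depends only on itself).
influencers = {"A": ("A",), "B": ("B",), "C": ("A", "B")}
update = {
    "A": lambda a: a,
    "B": lambda b: b,
    "C": lambda a, b: int(a and b),
}

state = {"A": 1, "B": 1, "C": 0}
next_state = synchronous_step(state, influencers, update)
```

Here `influencers` plays the role of the subsets R_i and `update` holds the functions B_i; all species are updated at once from the state at time t.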

So how does the function $B_i$ (also called a transfer function) for each species $X_i$ deal with inputs from other nodes? $B_i$ can be represented in a sum-of-product (SOP) formulation (Mendelson 1970), which allows for multiple possible inputs (AND, NOT, OR gates) to be processed into a single output. To illustrate this, consider the following simple example (figure 9).

We know that the element D is activated by a combination of A and B (ie both A and B are needed for activation). Hence, both the graphical and written representation of this activation is relatively straightforward:

$$B_1(a, b) = a \wedge b$$

However, in the case of the activation of C, this occurs when A is active without B or when B is active without A. In this case, one needs some additional rules of representation.

The SOP representation allows the above activation to be written using only AND, NOT and OR operators:

$$B_1(a, b) = (a \wedge \neg b) \vee (\neg a \wedge b)$$
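The SOP expression for the activation of C can be checked row by row. The truth table below is a worked illustration in which each product term is evaluated first and the terms are then combined with OR:

```latex
\begin{array}{cc|cc|c}
a & b & a \wedge \neg b & \neg a \wedge b & B_1(a, b) \\ \hline
0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 & 0
\end{array}
```

The output is 1 exactly when one input is active and the other is not, which is the XOR behavior.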


Figure 10. The construction of gates with cFL: (a) activating, (b) inhibitory, (c) an AND gate and (d) an OR gate.


Wittmann D M et al 2009 Transforming Boolean models tocontinuous models methodology and application to T-cellreceptor signaling BMC Syst Biol 3 98

16

  • 1. Introduction
    • 1.1. From parts to interactions
    • 1.2. From interactions to mechanism
    • 1.3. Logic modeling
    • 1.4. Software
    • 1.5. The example model
  • 2. Boolean steady state
    • 2.1. Steady-state optimization and simulation
    • 2.2. Interpretation of steady-state result
  • 3. Two time points (or additional steady state)
    • 3.1. Multiple steady-state optimization and simulation
    • 3.2. Interpretation
  • 4. Synchronous multiple time-point simulation and multiple timescales
    • 4.1. Synchronous and asynchronous updating
  • 5. Constrained fuzzy logic
    • 5.1. Model training and simulation
    • 5.2. Interpretation
  • 6. Logic ODEs
    • 6.1. Converting from Boolean to continuous
    • 6.2. Parameter estimation
    • 6.3. Compressing an ODE model
  • 7. Summary and future developments
  • 8. Materials and methods
    • 8.1. CellNOptR
    • 8.2. Network and data generation
    • 8.3. Boolean logic
    • 8.4. Fuzzy logic
    • 8.5. Logic ODEs
  • Acknowledgments
  • References

Phys Biol 9 (2012) 045003 A MacNamara et al

[Figure 7 plot. Panel columns: Raf-1, ERK, AP1, GSK-3, p38, NFκB. Condition labels: Stim (egf, tnfa) and Inh (pi3ki, raf1i). x-axis ticks: 0, 10, 30; y-axis tick: 0.5.]

Figure 7. The fit of the trained model using CNORode. t is measured in minutes and the y-axis is the normalized activity of the measured proteins. The parameter training in CNORode took 2000 s.

However, this is not the case where we have feedback from ERK through a phosphatase (ph) back to SOS-1, and from NFκB through expression (ex) back to IκB. In these cases, it is necessary not to compress ‘ph’ and ‘ex’, so that CNORode can model the correct dynamics (transience and oscillations, respectively). The non-compression is required because ‘ph’ and ‘ex’ are integral to the dynamics observed in the in silico data. Figures 7 and 8 therefore show that, with the exception of AP1, CNORode can accurately model the in silico data of the toy model once compression of those key nodes is suppressed.

7. Summary and future developments

In this contribution, we have reviewed different logic-based approaches to model signal transduction networks. Recent developments in proteomics techniques, from antibody-based methods (xMAP, protein arrays, high-throughput microscopy, etc) to mass spectrometry methods (Terfve and Saez-Rodriguez 2012), allow us to generate a large amount of phosphoproteomic data. Given the size of the underlying networks, we believe that logic-based models, which do not need extensive biochemical detail and thus lead to tractable models even when dealing with multiple pathways, are a useful approach to analyzing signal transduction on a large scale. Therefore, we have focused our work on how to train logic models to experimental data, and implemented various methodologies toward this end in our tool CellNOptR.

Our recent developments presented here expand our previous work by including strategies to deal with the inherent dynamic nature of signaling processes (and hence with time series data). We have discussed how modeling dynamic aspects requires more detailed formalisms (and thus, in general, more data and computational time), and how the general methodology has to be re-evaluated at multiple levels, in particular the compression of the network prior to the optimization; hence, we are currently working to develop a general compression routine for dynamic models. Another area of active development is the implementation of efficient optimization strategies to identify both structure and (if existing) continuous parameters (Banga 2008). Although we have covered here a broad palette of logic-based formalisms, we plan to explore other approaches. Some are combinations of what we have discussed (e.g. a cFL formalism simulated over multiple timescales); others are formalisms related to those used here (e.g. SQUAD); and others could add new features, such as a probabilistic framework (Shmulevich et al 2003), stochasticity (Albert et al 2008) or formal methodologies (Fisher and Henzinger 2007).

For the sake of simplicity, we have used a toy model that is itself based on a logic formalism to exemplify the potential dynamic behavior and thereby different modeling variants. We are currently working on more realistic benchmarks based on biochemical models, and studying in more detail the role of experimental noise and experimental design in recovering the underlying model structure.


Figure 8. The contribution of each logic modeling formalism to the understanding of the model used to simulate the in silico training data. The time taken for training the model using each formalism is also shown.

As illustrated in our example with the link TRAF2 → ASK-1 → MKK7, databases are comprehensive but not complete, and it is therefore likely that important links are missing from the system of interest (Kirouac et al 2012). To overcome this limitation, we are working on strategies to integrate as many network resources as possible. These include methods that propose novel links that expand the prior knowledge network (Saez-Rodriguez et al 2009, Eduati et al 2010) and the use of information from PINs (Vinayagam et al 2011).

The focus of CellNOptR is the calibration of logic models to data, but a large set of other tools exists that analyze logic models from different angles (Morris et al 2010). For example, the Q2LM toolbox (Morris et al 2011b) uses cFL to understand the effect of perturbations in the context of the whole system under investigation (e.g. under what set of stimuli is a therapeutic perturbation most effective?). CellNetAnalyzer (Klamt et al 2007) has a battery of methods from graph theory as well as specific techniques for logic models. These include minimal intervention sets (the minimum number of perturbations for a desired phenotype) to propose possible therapeutic targets. These tools use the same model format as CellNOptR, so it is easy to pass models between them for analysis. More generally, we are part of the CoLoMoTo initiative, which aims to facilitate interoperability among these tools; the main goal here is the development of SBML-qual as a language to exchange logic models (sbml.org/Community/Wiki/SBML_Level_3_Proposals/Qualitative_Models), as well as the implementation of the SBGN format for network representation (Novere et al 2009).

In general, efficient integration of data and prior knowledge to model signal transduction requires the use of appropriate standards for data, prior knowledge about the networks and the models themselves (Saez-Rodriguez et al 2011a). We consider that logic models will be an area of development in the future, with increasing application to signal transduction research.


Figure 9. An overview of the graphical representation of logic models. (A) The SOP expression for the activation of C, summarized as an XOR gate. (B) SOP expressions describing the activation of C and D. (C) An example of a hypergraph representation, where the nodes are connected by hyperedges.

8. Materials and methods

8.1. CellNOptR

As mentioned in section 1.4, CellNOptR includes some additional steps in pre-processing logic models before simulation and training to data. The details of these steps can be found in Saez-Rodriguez et al (2009). Briefly, the model is compressed by removing non-identifiable elements. These include nodes on terminal branches that are not part of the experimental design (non-observables; figure 1: p90RSK and CREB), nodes that are not affected by the inputs or perturbations (non-controllables) and additional nodes that can be removed without affecting the logic outcome during simulation (figure 1: Ras, MEK 1, etc).

After this compression step, a superstructure of all possible hyperedges is created (figure 1, inset). This superstructure contains ‘the space’ of hyperedges that is optimized (through the removal of redundant hyperedges) by training to the experimental data. The training uses a genetic algorithm to search for logic models that minimize a bipartite (two-part) function. This function includes the mean squared error (MSE) between the simulation of the optimized logic model and the data, and a penalty term for model size. Depending on the formalism used (see the main text), the simulation and data may be at steady state (CellNOptR, CNORfuzzy) or all data points can be used (CNORdt). The resulting logic model is then a subset of the superstructure and contains only the hyperedges that best explain the experimental data (with the additional attribute of parsimony, given the size penalty in the optimization function).
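The two-part score described above can be sketched in a few lines of Python. This is only an illustration and not the CellNOptR implementation: the function name `two_part_score`, the weight `size_weight`, the choice of penalty (the fraction of candidate hyperedges kept) and the stand-in simulator `toy_simulate` are all assumptions made for the example.

```python
from typing import Callable, Sequence


def two_part_score(
    bitstring: Sequence[int],
    simulate: Callable[[Sequence[int]], Sequence[float]],
    measured: Sequence[float],
    size_weight: float = 1e-4,
) -> float:
    """Return MSE(simulation, data) plus a weighted model-size penalty."""
    predicted = simulate(bitstring)
    if len(predicted) != len(measured):
        raise ValueError("simulation and data must have the same length")
    mse = sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured)
    # Assumed penalty: fraction of candidate hyperedges that the model keeps.
    size = sum(bitstring) / len(bitstring)
    return mse + size_weight * size


def toy_simulate(bitstring: Sequence[int]) -> Sequence[float]:
    # Hypothetical stand-in for a logic simulation: the first readout is
    # active only if the first hyperedge is kept; the second is always off.
    return [float(bitstring[0]), 0.0]


measured = [1.0, 0.0]
small = two_part_score([1, 0, 0], toy_simulate, measured)  # fits, 1 hyperedge
large = two_part_score([1, 1, 1], toy_simulate, measured)  # fits, 3 hyperedges
wrong = two_part_score([0, 0, 0], toy_simulate, measured)  # does not fit
```

With a small weight, two models that fit the data equally well are separated by size alone, so the more parsimonious one scores lower, while a model that fails to fit is penalized far more heavily through the MSE term.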

8.2. Network and data generation

The toy model was constructed manually and is based on the model from Saez-Rodriguez et al (2011a). The in silico data were generated from the toy model using CNORode. The parameters were manually adjusted to model, as closely as possible, the known dynamics of ERK and NFκB activation. After simulation, noise was added to each data point according to $N(\mu, \sigma^2)$, where $\mu = 0$ and $\sigma^2 = 0.05$. The data were then rescaled to the interval [0, 1]. Two methods of cross validation were also performed to demonstrate the robustness of CellNOptR (steady-state Boolean) to sparseness in the data (figure S8, available from stacks.iop.org/PhysBio/9/045003/mmedia).

Model and data files, together with the corresponding R scripts, can currently be found at http://www.ebi.ac.uk/~aidanmac/public/logicModelingTutorial (password: tutorial).
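The noise and rescaling step can be illustrated as follows. This is a minimal sketch rather than the authors' script: the text does not state how the rescaling was done, so min-max rescaling is assumed here, and the function name and the fixed seed are illustrative.

```python
import math
import random
from typing import List, Sequence


def add_noise_and_rescale(
    values: Sequence[float], variance: float = 0.05, seed: int = 0
) -> List[float]:
    """Add zero-mean Gaussian noise, then rescale the result to [0, 1]."""
    rng = random.Random(seed)
    sigma = math.sqrt(variance)  # random.gauss expects a standard deviation
    noisy = [v + rng.gauss(0.0, sigma) for v in values]
    lo, hi = min(noisy), max(noisy)
    if hi == lo:
        # Degenerate case: all points identical, nothing to rescale.
        return [0.0 for _ in noisy]
    # Assumed rescaling: min-max normalization over the whole series.
    return [(v - lo) / (hi - lo) for v in noisy]


simulated = [0.0, 0.2, 0.8, 1.0, 0.6]
data = add_noise_and_rescale(simulated)
```

Note that the variance $\sigma^2 = 0.05$ corresponds to a standard deviation of about 0.22, which is a substantial perturbation for signals that live in [0, 1].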

8.3. Boolean logic

A Boolean model can be represented as follows.

(1) $N$ species $X_1, X_2, \ldots, X_N$, each represented by a variable $x_i$ taking values 0 or 1.

(2) For each species $X_i$, there is a subset of species $R_i = \{X_{i1}, X_{i2}, \ldots, X_{iN_i}\} \subset \{X_1, X_2, \ldots, X_N\}$ that influence $x_i$.

(3) And for each species $X_i$, an update function $B_i: \{0, 1\}^{N_i} \rightarrow \{0, 1\}$.

From this set of rules, the state of each species at time $t + 1$ is a function of the state of its influencing species at time $t$ (Kauffman 1969).
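A synchronous update of this kind can be sketched in Python. The three-node chain and the names `synchronous_step` and `rules` are hypothetical and serve only to show that every update function reads the state at time $t$ and writes the state at time $t + 1$.

```python
from typing import Callable, Dict

State = Dict[str, int]


def synchronous_step(state: State, rules: Dict[str, Callable[[State], object]]) -> State:
    """Apply every update function to the state at time t to get time t + 1."""
    new_state = dict(state)  # species without a rule (inputs) keep their value
    for species, rule in rules.items():
        # Each rule reads the old state, so the update order does not matter.
        new_state[species] = int(bool(rule(state)))
    return new_state


# Hypothetical chain X1 -> X2 -> X3, with X1 acting as a fixed input.
rules = {
    "X2": lambda s: s["X1"],
    "X3": lambda s: s["X2"],
}

t0 = {"X1": 1, "X2": 0, "X3": 0}
t1 = synchronous_step(t0, rules)
t2 = synchronous_step(t1, rules)
```

The signal needs two steps to reach X3, because at the first step X3 still sees the old, inactive value of X2.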

So how does the function $B_i$ (also called a transfer function) for each species $X_i$ deal with inputs from other nodes? $B_i$ can be represented in a sum-of-product (SOP) formulation (Mendelson 1970), which allows multiple possible inputs (AND, NOT, OR gates) to be processed into a single output. To illustrate this, consider the following simple example (figure 9).

We know that the element D is activated by a combination of A and B (i.e. both A and B are needed for activation). Hence, both the graphical and written representations of this activation are relatively straightforward:

$$B_1(a, b) = a \wedge b$$

However, C is activated when A is active without B, or when B is active without A. In this case, one needs some additional rules of representation.

The SOP representation allows the above activation to be written using only AND, NOT and OR operators:

$$B_1(a, b) = (a \wedge \neg b) \vee (\neg a \wedge b)$$


Figure 10. The construction of gates with cFL: (a) activating, (b) inhibitory, (c) an AND gate and (d) an OR gate.

This is done by calculating the product (AND) within brackets and summing (OR) between brackets. Essentially, SOP representations are rules of precedence for complex multi-node inputs. The activation of C cannot be easily represented graphically using the standard SBGN AND, NOT or OR operators (figure 9). Hence, this SOP expression can be summarized as an XOR gate.
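As a small illustration of the sum-of-product rule, the sketch below stores an SOP expression as a list of product terms, each mapping a variable to the value it must take. This encoding and the name `eval_sop` are assumptions chosen for the example, not a format used by any of the tools discussed.

```python
from typing import Dict, List

Term = Dict[str, int]  # variable name -> required value (0 encodes NOT)


def eval_sop(terms: List[Term], values: Dict[str, int]) -> int:
    """Evaluate an SOP expression: AND within each term, OR between terms."""
    return int(
        any(all(values[var] == required for var, required in term.items())
            for term in terms)
    )


# D = a AND b: a single product term.
sop_d = [{"a": 1, "b": 1}]
# C = (a AND NOT b) OR (NOT a AND b): two product terms, i.e. XOR.
sop_c = [{"a": 1, "b": 0}, {"a": 0, "b": 1}]

truth_table_c = {
    (a, b): eval_sop(sop_c, {"a": a, "b": b}) for a in (0, 1) for b in (0, 1)
}
truth_table_d = {
    (a, b): eval_sop(sop_d, {"a": a, "b": b}) for a in (0, 1) for b in (0, 1)
}
```

Each product term in this encoding corresponds to one combination of inputs that is sufficient on its own to activate the target.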

A logic network in which relations are encoded by SOP expressions can be represented as a hypergraph (Klamt et al 2006). A hypergraph is defined as a set of nodes connected by hyperedges, where a hyperedge is a generalization of an edge that can connect more than two nodes. This, in turn, can facilitate a more precise representation of biological knowledge (for example, where two proteins are necessary for the activation of a target).

8.4. Fuzzy logic

cFL defines the transfer function between nodes as a Hill function. Depending on the type of interaction (or logic gate; figure 10), this function can take different forms (Morris et al 2011a).

(a) If node C depends only on A, a normalized Hill function is used to calculate C, where k and n are the sensitivity coefficient and Hill coefficient, respectively:

$$c = \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

(b) An inhibitory relationship is represented as the above expression subtracted from 1:

$$c = 1 - \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

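(c) For an AND gate, the minimum value of c is used:

$$c = \min\left(\frac{(k_1^{n_2} + 1)\,a^{n_2}}{k_1^{n_2} + a^{n_2}},\; \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$

(d) And for an OR gate, the maximum value is used:

$$c = \max\left(\frac{(k_1^{n_2} + 1)\,a^{n_2}}{k_1^{n_2} + a^{n_2}},\; \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$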
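The four gate types can be written down directly from the normalized Hill function. The sketch below is illustrative only and is not the CNORfuzzy code; the function names are hypothetical, and giving each input its own (k, n) pair is an assumption of the example.

```python
from typing import Tuple

Params = Tuple[float, float]  # (k, n): sensitivity and Hill coefficient, k > 0


def hill(x: float, k: float, n: float) -> float:
    """Normalized Hill function: 0 at x = 0 and exactly 1 at x = 1."""
    return (k ** n + 1.0) * x ** n / (k ** n + x ** n)


def activate(a: float, k: float, n: float) -> float:
    return hill(a, k, n)


def inhibit(a: float, k: float, n: float) -> float:
    # Inhibition: the activating response subtracted from 1.
    return 1.0 - hill(a, k, n)


def and_gate(a: float, b: float, pa: Params, pb: Params) -> float:
    # AND: the weaker of the two transformed inputs limits the output.
    return min(hill(a, *pa), hill(b, *pb))


def or_gate(a: float, b: float, pa: Params, pb: Params) -> float:
    # OR: the stronger of the two transformed inputs sets the output.
    return max(hill(a, *pa), hill(b, *pb))


params = (0.5, 3.0)  # illustrative values, not fitted parameters
c_and = and_gate(1.0, 0.0, params, params)
c_or = or_gate(1.0, 0.0, params, params)
```

At the Boolean corner points (inputs equal to 0 or 1) these functions reproduce the Boolean truth tables, while intermediate inputs give graded outputs.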

8.5. Logic ODEs

As in the case of cFL, CNORode uses phenomenological transfer functions (i.e. non-mechanistic normalized Hill functions) to describe the dynamics of a node's state as a function of its inputs. Using the examples in figure 10 again, these functions can be described as follows.

(a) An activating relationship takes the form

$$\dot{c} = \frac{1}{\tau}\,(B(a) - c)$$

where $\dot{c}$ is the development of c over time and $B(a)$ is the normalized Hill function of the continuous variable a. This takes the form

$$B(a) = \frac{a^n}{k^n + a^n} \Big/ \frac{1^n}{k^n + 1^n}$$

(k and n are again the sensitivity and Hill coefficients, respectively). τ can be interpreted as the maximum value of species c (biologically, this could encompass degradation or other limiting factors), and there is an additional degradation term proportional to c.

(b) An inhibitory relationship is simply the above expression subtracted from 1:

$$\dot{c} = 1 - \frac{1}{\tau}\,(B(a) - c)$$

(c) The AND gates take the form

$$\dot{c} = \frac{1}{\tau}\,(B(a)B(b) - c)$$

(d) The OR gate notation is as follows:

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a)B(b) + B(a)[1 - B(b)] + B(b)[1 - B(a)] - c\bigr)$$

In the case of an AND gate, the product of $B(a)$ and $B(b)$ is taken, which maintains consistency in the output with the equivalent Boolean model (i.e. if a = 1 and b = 0, in both ODE and logic formalisms, c = 0; similarly with an OR gate, if a = 1 and b = 0, in both ODE and logic formalisms, c = 1). As in the case of cFL, normalized Hill functions can approximate commonly observed biochemical dynamics such as linear, sigmoidal and step-like behavior.
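The consistency with the Boolean model can be checked numerically. The sketch below integrates the AND and OR equations with a simple forward Euler scheme; it is not CNORode, and the function names, the step size and the parameter values (tau, k, n) are illustrative assumptions.

```python
def norm_hill(x: float, k: float, n: float) -> float:
    """Normalized Hill function B(x); requires k > 0."""
    return (x ** n / (k ** n + x ** n)) / (1.0 / (k ** n + 1.0))


def simulate_gate(
    a: float,
    b: float,
    gate: str,
    tau: float = 1.0,
    k: float = 0.5,
    n: float = 3.0,
    dt: float = 0.01,
    t_end: float = 20.0,
) -> float:
    """Integrate dc/dt = (target - c) / tau from c(0) = 0 with constant inputs."""
    ba, bb = norm_hill(a, k, n), norm_hill(b, k, n)
    if gate == "AND":
        target = ba * bb
    elif gate == "OR":
        target = ba * bb + ba * (1.0 - bb) + bb * (1.0 - ba)
    else:
        raise ValueError("gate must be 'AND' or 'OR'")
    c = 0.0
    for _ in range(int(round(t_end / dt))):
        c += dt * (target - c) / tau  # forward Euler step
    return c


c_and = simulate_gate(1.0, 0.0, "AND")
c_or = simulate_gate(1.0, 0.0, "OR")
```

With a = 1 and b = 0, the AND gate stays at 0 and the OR gate relaxes toward 1, which matches the Boolean outputs quoted above. Forward Euler is used only to keep the example dependency-free; a stiff or adaptive solver would be the more usual choice for larger models.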

Acknowledgments

The authors thank J Banga, J Egea, Inna Pertsovskaya and Melody Morris for valuable help and discussion. Funding was provided by the EU-7FP-BioPreDyn and EMBL-EIPOD programs.

References

Aislyn D W and Boran R I 2010 Systems approaches to polypharmacology and drug discovery Curr Opin Drug Discov Dev 13 297–309

Albert I et al 2008 Boolean network simulations for life scientists Source Code Biol Med 3 16

Aldridge B B et al 2006 Physicochemical modelling of cell signalling pathways Nature Cell Biol 8 1195–203

Assmann S M and Albert R 2009 Discrete dynamic modeling with asynchronous update, or how to model complex systems in the absence of quantitative information Methods Mol Biol 553 207–25

Bader G D, Cary M P and Sander C 2006 Pathguide: a pathway resource list Nucleic Acids Res 34 D504–6 (Database issue)

Banga J 2008 Optimization in computational systems biology BMC Syst Biol 2 47

Barabasi A-L and Oltvai Z N 2004 Network biology: understanding the cell's functional organization Nature Rev Genet 5 101–13

Bauer-Mehren A, Furlong L I and Sanz F 2009 Pathway databases and tools for their exploitation: benefits, current limitations and challenges Mol Syst Biol 5 290

Bossi A and Lehner B 2009 Tissue specificity and the human protein interaction network Mol Syst Biol 5 260

Calzone L et al 2008 A comprehensive modular map of molecular interactions in RB/E2F pathway Mol Syst Biol 4 173

Calzone L et al 2010 Mathematical modelling of cell-fate decision in response to death receptor engagement PLoS Comput Biol 6 e1000702

Chen W W, Niepel M and Sorger P K 2010 Classic and contemporary approaches to modeling biochemical reactions Genes Dev 24 1861–75


de Jong H 2002 Modeling and simulation of genetic regulatory systems: a literature review J Comput Biol 9 67–103

Di Cara A et al 2007 Dynamic simulation of regulatory networks using SQUAD BMC Bioinformatics 8 462

Eduati F et al 2010 A Boolean approach to linear prediction for signaling network modeling PLoS One 5 e12789

Egea J and Martí R 2010 An evolutionary method for complex-process optimization Comput Oper Res 37 315–24

Faure A et al 2006 Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle Bioinformatics 22 e124–31

Fisher J and Henzinger T A 2007 Executable cell biology Nature Biotechnol 25 1239–49

Garg A et al 2008 Synchronous versus asynchronous modeling of gene regulatory networks Bioinformatics 24 1917–25

Gonzalez A, Chaouiya C and Thieffry D 2008 Logical modelling of the role of the Hh pathway in the patterning of the Drosophila wing disc Bioinformatics 24 i234–40

Helikar T et al 2008 Emergent decision-making in biological signal transduction networks Proc Natl Acad Sci USA 105 1913–8

Hlavacek W S et al 2006 Rules for modeling signal-transduction systems Sci STKE 2006 re6

Hoffmann A et al 2002 The IkappaB–NF–kappaB signaling module: temporal control and selective gene activation Science 298 1241–5

Hucka M et al 2003 The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models Bioinformatics 19 524–31

Ilsley G R, Luscombe N M and Apweiler R 2009 Know your limits: assumptions, constraints and interpretation in systems biology Biochim Biophys Acta 1794 1280–7

Jørgensen C and Linding R 2010 Simplistic pathways or complex networks? Curr Opin Genet Dev 20 15–22

Junker B H, Klukas C and Schreiber F 2006 VANTED: a system for advanced data analysis and visualization in the context of biological networks BMC Bioinformatics 7 109

Kauffman S A 1969 Metabolic stability and epigenesis in randomly constructed genetic nets J Theor Biol 22 437–67

Khoury G A, Baliban R C and Floudas C A 2011 Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database Sci Rep 1 90

Kirouac D C et al 2012 Creating and analyzing pathway and protein interaction compendia for modelling signal transduction networks BMC Syst Biol 6 29

Klamt S, Saez-Rodriguez J and Gilles E D 2007 Structural and functional analysis of cellular networks with CellNetAnalyzer BMC Syst Biol 1 2

Klamt S et al 2006 A methodology for the structural and functional analysis of signaling and regulatory networks BMC Bioinformatics 7 56

Kreeger P K and Lauffenburger D A 2010 Cancer systems biology: a network modeling perspective Carcinogenesis 31 2–8

Krumsiek J et al 2010 Odefy - from discrete to continuous models BMC Bioinformatics 11 233

Liberali P, Ramo P and Pelkmans L 2008 Protein kinases: starting a molecular systems view of endocytosis Annu Rev Cell Dev Biol 24 501–23

Ma'ayan A et al 2005 Formation of regulatory patterns during signal propagation in a mammalian cellular network Science 309 1078–83

Mendelson E 1970 Boolean Algebra and Switching Circuits (Schaum's Outline Series) (New York: McGraw-Hill)

Mendoza L and Xenarios I 2006 A method for the generation of standardized qualitative dynamical systems of regulatory networks Theor Biol Med Modelling 3 13

Mitsos A et al 2009 Identifying drug effects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data PLoS Comput Biol 5 e1000591

Morris M K et al 2010 Logic-based models for the analysis of cell signaling networks Biochemistry 49 3216–24

Morris M K et al 2011a Training signaling pathway maps to biochemical data with constrained fuzzy logic: quantitative analysis of liver cell responses to inflammatory stimuli PLoS Comput Biol 7 e1001099

Morris M K et al 2011b Querying quantitative logic models (Q2LM) to study intracellular signaling networks and cell-cytokine interactions Biotechnol J 7 374–86

Novere N L et al 2009 The systems biology graphical notation Nature Biotechnol 27 735–41

Oda K et al 2005 A comprehensive pathway map of epidermal growth factor receptor signaling Mol Syst Biol 1 2005.0010

Patlak M 2010 Competitors try collaboration to speed drug development J Natl Cancer Inst 102 841–3

Pieroni E et al 2008 Protein networking: insights into global functional organization of proteomes Proteomics 8 799–816

Rangamani P and Iyengar R 2008 Modelling cellular signalling systems Essays Biochem 45 83–94

Saez-Rodriguez J et al 2009 Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction Mol Syst Biol 5 331

Saez-Rodriguez J, Alexopoulos L G and Stolovitzky G 2011a Setting the standards for signal transduction research Sci Signal 4 pe10

Saez-Rodriguez J et al 2011b Comparing signaling networks between normal and transformed hepatocytes using discrete logical models Cancer Res 71 5400–11

Sahin O et al 2009 Modeling ERBB receptor-regulated G1/S transition to find novel targets for de novo trastuzumab resistance BMC Syst Biol 3 1

Sasagawa S et al 2005 Prediction and validation of the distinct dynamics of transient and sustained ERK activation Nature Cell Biol 7 365–73

Schlatter R et al 2009 ON/OFF and beyond - a boolean model of apoptosis PLoS Comput Biol 5 e1000595

Shmulevich I et al 2003 Steady-state analysis of genetic regulatory networks modelled by probabilistic boolean networks Comp Funct Genomics 4 601–8

Terfve C and Saez-Rodriguez J 2012 Modeling signaling networks using high-throughput phospho-proteomics Adv Exp Med Biol 736 19–57

Thakar J and Albert R 2010 Boolean models of within-host immune interactions Curr Opin Microbiol 13 377–81

Vinayagam A et al 2011 A directed protein interaction network for investigating intracellular signal transduction Sci Signal 4 rs8

Watterson S, Marshall S and Ghazal P 2008 Logic models of pathway biology Drug Discov Today 13 447–56

Wittmann D M et al 2009 Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling BMC Syst Biol 3 98


c is the development of cover time B(a) is the normalized Hill function of thecontinuous variable a This takes the form an

kn+an

1n

kn+1n

(k and n are again the sensitivity and Hill coefficientsrespectively) τ can be interpreted as the maximumvalue of species c (biologically this could encompassdegradation or other limiting factors) and there is anadditional degradation term proportional to c

(b) An inhibitory relationship is simply the above expressionsubtracted from 1

c = 1 minus 1τ(B(a) minus a)

(c) The AND gates take the form

c = 1τ(B(a)B(b) minus c)

(d) The OR gate notation is as follows

c =1τ(B(a)B(b) + B(a)[1 minus B(b)] + B(b)[1 minus B(a)] minus c)

In the case of an AND gate the product of B (a) andB(b) is taken which maintains consistency in the output withthe equivalent Boolean model (ie if a = 1 and b = 0 inboth ODE and logic formalisms c = 0 similarly with an ORgate if a = 1 and b = 0 in both ODE and logic formalismsc = 1) As in the case of cFL normalized Hill functions canapproximate commonly observed biochemical dynamics suchas linear sigmoidal and step-like behavior

Acknowledgments

The authors thank J Banga J Egea Inna Pertsovskaya andMelody Morris for valuable help and discussion Fundingwas provided by the EU-7FP-BioPreDyn and EMBL-EIPODprograms

References

Aislyn D W and Boran R I 2010 Systems approaches topolypharmacology and drug discovery Curr Opin DrugDiscov Dev 13 297ndash309

Albert I et al 2008 Boolean network simulations for life scientistsSource Code Biol Med 3 16

Aldridge B B et al 2006 Physicochemical modelling of cellsignalling pathways Nature Cell Biol 8 1195ndash203

Assmann S M and Albert R 2009 Discrete dynamic modeling withasynchronous update or how to model complex systems in theabsence of quantitative information Methods Mol Biol553 207ndash25

Bader G D Cary M P and Sander C 2006 Pathguide a pathwayresource list Nucleic Acids Res 34 D504ndash6 (Database issue)

Banga J 2008 Optimization in computational systems biology BMCSyst Biol 2 47

Barabasi A-L and Oltvai Z N 2004 Network biology understandingthe cellrsquos functional organization Nature Rev Genet 5 101ndash13

Bauer-Mehren A Furlong L I and Sanz F 2009 Pathway databasesand tools for their exploitation benefits current limitations andchallenges Mol Syst Biol 5 290

Bossi A and Lehner B 2009 Tissue specificity and the humanprotein interaction network Mol Syst Biol 5 260

Calzone L et al 2008 A comprehensive modular map of molecularinteractions in RBE2F pathway Mol Syst Biol 4 173

Calzone L et al 2010 Mathematical modelling of cell-fate decisionin response to death receptor engagement PLoS Comput Biol6 e1000702

Chen W W Niepel M and Sorger P K 2010 Classic andcontemporary approaches to modeling biochemical reactionsGenes Dev 24 1861ndash75

15

Phys Biol 9 (2012) 045003 A MacNamara et al

de Jong H 2002 Modeling and simulation of genetic regulatorysystems a literature review J Comput Biol 9 67ndash103

Di Cara A et al 2007 Dynamic simulation of regulatory networksusing SQUAD BMC Bioinformatics 8 462

Eduati F et al 2010 A Boolean approach to linear prediction forsignaling network modeling PLoS One 5 e12789

Egea J and Martı R 2010 An evolutionary method forcomplex-process optimization Comput Oper Res 37 315ndash24

Faure A et al 2006 Dynamical analysis of a generic Boolean modelfor the control of the mammalian cell cycle Bioinformatics22 e124ndash31

Fisher J and Henzinger T A 2007 Executable cell biology NatureBiotechnol 25 1239ndash49

Garg A et al 2008 Synchronous versus asynchronous modeling ofgene regulatory networks Bioinformatics 24 1917ndash25

Gonzalez A Chaouiya C and Thieffry D 2008 Logical modelling ofthe role of the Hh pathway in the patterning of the Drosophilawing disc Bioinformatics 24 i234ndash40

Helikar T et al 2008 Emergent decision-making in biological signaltransduction networks Proc Natl Acad Sci USA105 1913ndash8

Hlavacek W S et al 2006 Rules for modeling signal-transductionsystems Sci STKE 2006 re6

Hoffmann A et al 2002 The IkappaBndashNFndashkappaB signalingmodule temporal control and selective gene activation Science298 1241ndash5

Hucka M et al 2003 The systems biology markup language(SBML) a medium for representation and exchange ofbiochemical network models Bioinformatics 19 524ndash31

Ilsley G R Luscombe N M and Apweiler R 2009 Know your limitsassumptions constraints and interpretation in systems biologyBiochim Biophys Acta 1794 1280ndash7

Joslashrgensen C and Linding R 2010 Simplistic pathways or complexnetworks Curr Opin Genet Dev 20 15ndash22

Junker B H Klukas C and Schreiber F 2006 VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 7 109

Kauffman S A 1969 Metabolic stability and epigenesis in randomlyconstructed genetic nets J Theor Biol 22 437ndash67

Khoury G A Baliban R C and Floudas C A 2011 Proteome-widepost-translational modification statistics frequency analysisand curation of the swiss-prot database Sci Rep 1 90

Kirouac D C et al 2012 Creating and analyzing pathway and proteininteraction compendia for modelling signal transductionnetworks BMC Syst Biol 6 29

Klamt S Saez-Rodriguez J and Gilles E D 2007 Structural andfunctional analysis of cellular networks with CellNetAnalyzerBMC Syst Biol 1 2

Klamt S et al 2006 A methodology for the structural and functionalanalysis of signaling and regulatory networks BMCBioinformatics 7 56

Kreeger P K and Lauffenburger D A 2010 Cancer systems biologya network modeling perspective Carcinogenesis 31 2ndash8

Krumsiek J et al 2010 Odefymdashfrom discrete to continuous modelsBMC Bioinformatics 11 233

Liberali P Ramo P and Pelkmans L 2008 Protein kinases starting amolecular systems view of endocytosis Annu Rev Cell DevBiol 24 501ndash23

Marsquoayan A et al 2005 Formation of regulatory patterns duringsignal propagation in a mammalian cellular network Science309 1078ndash83

Mendelson E 1970 Boolean Algebra and Switching Circuits(Schaumrsquos Outline Series) (New York McGraw-Hill)

Mendoza L and Xenarios I 2006 A method for the generation ofstandardized qualitative dynamical systems of regulatorynetworks Theor Biol Med Modelling 3 13

Mitsos A et al 2009 Identifying drug effects via pathwayalterations using an integer linear programming optimizationformulation on phosphoproteomic data PLoS Comput Biol5 e1000591

Morris M K et al 2010 Logic-based models for the analysis of cellsignaling networks Biochemistry 49 3216ndash24

Morris M K et al 2011a Training signaling pathway maps tobiochemical data with constrained fuzzy logic quantitativeanalysis of liver cell responses to inflammatory stimuli PLoSComput Biol 7 e1001099

Morris M K et al 2011b Querying quantitative logic models(Q2LM) to study intracellular signaling networks andcellcytokine interactions Biotechnol J 7 374ndash86

Novere N L et al 2009 The systems biology graphical notationNature Biotechnol 27 735ndash41

Oda K et al 2005 A comprehensive pathway map of epidermalgrowth factor receptor signaling Mol Syst Biol 1 20050010

Patlak M 2010 Competitors try collaboration to speed drugdevelopment J Natl Cancer Inst 102 841ndash3

Pieroni E et al 2008 Protein networking insights into globalfunctional organization of proteomes Proteomics 8 799ndash816

Rangamani P and Iyengar R 2008 Modelling cellular signallingsystems Essays Biochem 45 83ndash94

Saez-Rodriguez J et al 2009 Discrete logic modelling as a means tolink protein signalling networks with functional analysis ofmammalian signal transduction Mol Syst Biol 5 331

Saez-Rodriguez J Alexopoulos L G and Stolovitzky G 2011aSetting the standards for signal transduction research SciSignal 4 pe10

Saez-Rodriguez J et al 2011b Comparing signaling networksbetween normal and transformed hepatocytes using discretelogical models Cancer Res 71 5400ndash11

Sahin O et al 2009 Modeling ERBB receptor-regulated G1Stransition to find novel targets for de novo trastuzumabresistance BMC Syst Biol 3 1

Sasagawa S et al 2005 Prediction and validation of the distinctdynamics of transient and sustained ERK activation NatureCell Biol 7 365ndash73

Schlatter R et al 2009 ONOFF and beyondmdasha boolean model ofapoptosis PLoS Comput Biol 5 e1000595

Shmulevich I et al 2003 Steady-state analysis of genetic regulatorynetworks modelled by probabilistic boolean networks CompFunct Genomics 4 601ndash8

Terfve C and Saez-Rodriguez J 2012 Modeling signaling networksusing high-throughput phospho-proteomics Adv Exp MedBiol 736 19ndash57

Thakar J and Albert R 2010 Boolean models of within-host immuneinteractions Curr Opin Microbiol 13 377ndash81

Vinayagam A et al 2011 A directed protein interaction networkfor investigating intracellular signal transduction Sci Signal4 rs8

Watterson S Marshall S and Ghazal P 2008 Logic models ofpathway biology Drug Discov Today 13 447ndash56

Wittmann D M et al 2009 Transforming Boolean models tocontinuous models methodology and application to T-cellreceptor signaling BMC Syst Biol 3 98

16


Phys. Biol. 9 (2012) 045003, A MacNamara et al

Figure 9. An overview of the graphical representation of logic models. (A) The SOP expression for the activation of C, summarized as an XOR gate. (B) SOP expressions describing the activation of C and D. (C) An example of a hypergraph representation, where the nodes are connected by hyperedges.

8. Materials and methods

8.1. CellNOptR

As mentioned in section 1.4, CellNOptR includes some additional steps for pre-processing logic models before simulation and training to data. The details of these steps can be found in Saez-Rodriguez et al (2009). Briefly, the model is compressed by removing non-identifiable elements. These include nodes on terminal branches that are not part of the experimental design (non-observables; figure 1: p90RSK and CREB), nodes that are not affected by the inputs or perturbations (non-controllables) and additional nodes that can be removed without affecting the logic outcome during simulation (figure 1: Ras, MEK 1, etc).

After this compression step, a superstructure of all possible hyperedges is created (figure 1, inset). This superstructure contains 'the space' of hyperedges that is optimized (through the removal of redundant hyperedges) by training to the experimental data. The training uses a genetic algorithm to search for logic models that minimize a bipartite (two-term) function. This function includes the mean squared error (MSE) between the simulation of the optimized logic model and the data, and a penalty term for model size. Depending on the formalism used (see the main text), the simulation and data may be at steady state (CellNOptR, CNORfuzzy) or all data points can be used (CNORdt). The resulting logic model is then a subset of the superstructure and contains only the hyperedges that best explain the experimental data (with the additional attribute of parsimony, given the size penalty in the optimization function).
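The two-term objective described above can be illustrated with a minimal Python sketch. It is not the CellNOptR implementation: the function name `model_score`, the weight `size_penalty` and its value are hypothetical, and the model size is assumed here to be counted as the number of retained hyperedges.

```python
def model_score(simulated, measured, n_hyperedges, size_penalty=1e-4):
    """Two-term objective: MSE between simulation and data plus a size penalty.

    size_penalty is a hypothetical weight chosen for illustration only.
    """
    if len(simulated) != len(measured):
        raise ValueError("simulated and measured must have the same length")
    mse = sum((s - m) ** 2 for s, m in zip(simulated, measured)) / len(measured)
    return mse + size_penalty * n_hyperedges


# Two models that fit the data equally well: the smaller one scores lower.
measured = [0.0, 1.0, 1.0, 0.0]
small_model = model_score([0.0, 1.0, 1.0, 0.0], measured, n_hyperedges=3)
large_model = model_score([0.0, 1.0, 1.0, 0.0], measured, n_hyperedges=9)
```

With an equal fit, the size term alone decides the ranking, which is the parsimony effect mentioned in the text.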

8.2. Network and data generation

The toy model was constructed manually and is based on the model from Saez-Rodriguez et al (2011a). The in silico data were generated from the toy model using CNORode. The parameters were manually adjusted to model as closely as possible the known dynamics of ERK and NFκB activation. After simulation, noise was added to each data point according to $N(\mu, \sigma^2)$, where $\mu = 0$ and $\sigma^2 = 0.05$. The data were then rescaled to the interval [0, 1]. Two methods of cross validation were also performed to demonstrate the robustness of CellNOptR (steady-state Boolean) to sparseness in the data (figure S8, available from stacks.iop.org/PhysBio/9/045003/mmedia).
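As a rough illustration of the noise and rescaling step, the following Python sketch adds zero-mean Gaussian noise to simulated values and rescales the result. The function name, the fixed seed and the choice of min-max rescaling over a single vector are assumptions made for the example; the text does not specify how the rescaling was carried out.

```python
import random


def add_noise_and_rescale(values, variance=0.05, seed=0):
    """Add zero-mean Gaussian noise to each point, then min-max rescale to [0, 1]."""
    rng = random.Random(seed)
    sigma = variance ** 0.5  # random.gauss expects a standard deviation
    noisy = [v + rng.gauss(0.0, sigma) for v in values]
    lo, hi = min(noisy), max(noisy)
    if hi == lo:  # degenerate case: all points identical
        return [0.0 for _ in noisy]
    return [(v - lo) / (hi - lo) for v in noisy]


# Hypothetical noise-free simulation output for one species.
clean = [0.0, 0.2, 0.8, 1.0, 0.6]
rescaled = add_noise_and_rescale(clean)
```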

Model and data files, together with the corresponding R scripts, can currently be found at http://www.ebi.ac.uk/~aidanmac/public/logicModelingTutorial (password: tutorial).

8.3. Boolean logic

A Boolean model can be represented as follows.

(1) $N$ species $X_1, X_2, \ldots, X_N$, each represented by a variable $x_i$ taking values 0 or 1.

(2) For each species $X_i$, there is a subset of species $R_i = \{X_{i1}, X_{i2}, \ldots, X_{iN_i}\} \subset \{X_1, X_2, \ldots, X_N\}$ that influence $x_i$.

(3) For each species $X_i$, an update function $B_i : \{0, 1\}^{N_i} \to \{0, 1\}$.

From this set of rules, the state of each species at time $t + 1$ is a function of the state of its influencing species at time $t$ (Kauffman 1969).
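The update rule above can be sketched in a few lines of Python. This is an illustrative synchronous update under stated assumptions, not the CellNOptR simulator: the function name `synchronous_step`, the dictionary-based state and the three-node example are hypothetical.

```python
def synchronous_step(state, update_functions):
    """Compute the state at time t + 1 from the state at time t.

    state: dict mapping species name -> 0 or 1
    update_functions: dict mapping species name -> function of the full state
    Species without an update function (inputs) keep their value.
    """
    return {
        name: int(update_functions[name](state)) if name in update_functions else value
        for name, value in state.items()
    }


# Hypothetical three-node example: D requires both A and B.
rules = {"D": lambda s: s["A"] and s["B"]}
state_t0 = {"A": 1, "B": 1, "D": 0}
state_t1 = synchronous_step(state_t0, rules)
```

Every update function reads the state at time $t$ only, so the order in which species are evaluated does not matter.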

So how does the function $B_i$ (also called a transfer function) for each species $X_i$ deal with inputs from other nodes? $B_i$ can be represented in a sum-of-product (SOP) formulation (Mendelson 1970), which allows for multiple possible inputs (AND, NOT and OR gates) to be processed into a single output. To illustrate this, consider the following simple example (figure 9).

We know that the element D is activated by a combination of A and B (i.e. both A and B are needed for activation). Hence, both the graphical and written representation of this activation is relatively straightforward:

$$B_1(a, b) = a \land b$$

However, in the case of the activation of C, this occurs when A is active without B, or when B is active without A. In this case, one needs some additional rules of representation.

The SOP representation allows the above activation to be written using only AND, NOT and OR operators:

$$B_1(a, b) = (a \land \lnot b) \lor (\lnot a \land b)$$

Figure 10. The construction of gates with cFL: (a) activating, (b) inhibitory, (c) an AND gate and (d) an OR gate.

This is done by calculating the product within brackets and summing between brackets. Essentially, SOP representations are rules of precedence for complex multi-node inputs. In terms of graphical representation, the activation of C cannot be easily represented using the standard SBGN AND, NOT or OR operators (figure 9). Hence, this SOP expression can be summarized as an XOR gate.
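The "product within brackets, sum between brackets" rule can be checked with a small Python sketch. The list-of-literals encoding and the name `evaluate_sop` are assumptions made for this example; the sketch evaluates the SOP expression for C over all four input combinations and recovers the XOR truth table.

```python
from itertools import product


def evaluate_sop(terms, values):
    """Evaluate a sum-of-products expression.

    terms: list of products; each product is a list of (name, negated) literals.
    values: dict mapping name -> 0 or 1.
    A product is the AND of its literals; the sum is the OR of the products.
    """
    return int(any(
        all((1 - values[name]) if negated else values[name] for name, negated in term)
        for term in terms
    ))


# (a AND NOT b) OR (NOT a AND b)
xor_sop = [[("a", False), ("b", True)], [("a", True), ("b", False)]]
truth_table = {
    (a, b): evaluate_sop(xor_sop, {"a": a, "b": b}) for a, b in product((0, 1), repeat=2)
}
```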

A logic network in which relations are encoded by SOP expressions can be represented as a hypergraph (Klamt et al 2006). A hypergraph is defined as a set of nodes connected by hyperedges, where a hyperedge is a generalization of an edge that can connect more than two nodes. This in turn can facilitate a more precise representation of biological knowledge (for example, where two proteins are necessary for the activation of a target).
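One possible way to hold such a hypergraph in memory is sketched below in Python. The `Hyperedge` class and the `inputs_of` helper are hypothetical names for illustration, not part of CellNOptR; each hyperedge stores one product term (an AND of possibly negated sources), and several hyperedges pointing at the same target are read as an OR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    """A directed hyperedge: several source nodes jointly act on one target."""
    sources: tuple  # (name, negated) pairs, combined with AND
    target: str


# Hypothetical encoding of the gates for D (AND) and C (XOR as two products).
hyperedges = [
    Hyperedge(sources=(("A", False), ("B", False)), target="D"),  # A AND B -> D
    Hyperedge(sources=(("A", False), ("B", True)), target="C"),   # A AND NOT B -> C
    Hyperedge(sources=(("A", True), ("B", False)), target="C"),   # NOT A AND B -> C
]


def inputs_of(target, edges):
    """Return the hyperedges that point at a given node (combined with OR)."""
    return [edge for edge in edges if edge.target == target]
```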

8.4. Fuzzy logic

cFL defines the transfer function between nodes as a Hill function. Depending on the type of interaction (or logic gate; figure 10), this function can take different forms (Morris et al 2011a).

(a) If node C depends only on A, a normalized Hill function is used to calculate $c$, where $k$ and $n$ are the sensitivity coefficient and Hill coefficient, respectively:

$$c = \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

(b) An inhibitory relationship is represented as the above expression subtracted from 1:

$$c = 1 - \frac{(k^n + 1)\,a^n}{k^n + a^n}$$

(c) For an AND gate, the minimum of the input transfer functions is used:

$$c = \min\left(\frac{(k_1^{n_1} + 1)\,a^{n_1}}{k_1^{n_1} + a^{n_1}},\; \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$

(d) For an OR gate, the maximum is used:

$$c = \max\left(\frac{(k_1^{n_1} + 1)\,a^{n_1}}{k_1^{n_1} + a^{n_1}},\; \frac{(k_2^{n_2} + 1)\,b^{n_2}}{k_2^{n_2} + b^{n_2}}\right)$$
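The four gate types can be written as a short Python sketch. This is not the CNORfuzzy code: the function names and the parameter values used in the example are assumptions, and each input is assumed to carry its own $(k, n)$ pair.

```python
def hill(x, k, n):
    """Normalized Hill function: hill(0) = 0 and hill(1) = 1 for k > 0."""
    return (k ** n + 1) * x ** n / (k ** n + x ** n)


def activate(a, k, n):
    """Gate (a): C depends only on A."""
    return hill(a, k, n)


def inhibit(a, k, n):
    """Gate (b): the activating expression subtracted from 1."""
    return 1 - hill(a, k, n)


def and_gate(a, b, params_a, params_b):
    """Gate (c): minimum of the two input transfer functions."""
    return min(hill(a, *params_a), hill(b, *params_b))


def or_gate(a, b, params_a, params_b):
    """Gate (d): maximum of the two input transfer functions."""
    return max(hill(a, *params_a), hill(b, *params_b))


# Hypothetical parameters (k, n) for both inputs.
p = (0.5, 3)
half_activation = activate(0.5, *p)
```

Because the Hill function is normalized, inputs of exactly 0 or 1 reproduce the Boolean AND and OR outputs.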

8.5. Logic ODEs

As in the case of cFL, CNORode uses phenomenological transfer functions (i.e. non-mechanistic, normalized Hill functions) to describe the dynamics of a node's state as a function of its inputs. Using the examples in figure 10 again, these functions can be described as follows.

(a) For an activating relationship,

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a) - c\bigr),$$

where $\dot{c}$ is the development of $c$ over time and $B(a)$ is the normalized Hill function of the continuous variable $a$. This takes the form

$$B(a) = \frac{a^n / (k^n + a^n)}{1^n / (k^n + 1^n)}$$

($k$ and $n$ are again the sensitivity and Hill coefficients, respectively). $\tau$ can be interpreted as the maximum value of species $c$ (biologically, this could encompass degradation or other limiting factors), and there is an additional degradation term proportional to $c$.

(b) An inhibitory relationship is obtained by subtracting the Hill term from 1:

$$\dot{c} = \frac{1}{\tau}\,\bigl([1 - B(a)] - c\bigr)$$

(c) The AND gate takes the form

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a)B(b) - c\bigr)$$

(d) The OR gate notation is as follows:

$$\dot{c} = \frac{1}{\tau}\,\bigl(B(a)B(b) + B(a)[1 - B(b)] + B(b)[1 - B(a)] - c\bigr)$$

In the case of an AND gate, the product of $B(a)$ and $B(b)$ is taken, which maintains consistency in the output with the equivalent Boolean model (i.e. if $a = 1$ and $b = 0$, then $c = 0$ at steady state in both the ODE and logic formalisms; similarly, with an OR gate, if $a = 1$ and $b = 0$, then $c = 1$ in both formalisms). As in the case of cFL, normalized Hill functions can approximate commonly observed biochemical dynamics such as linear, sigmoidal and step-like behavior.
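The consistency with the Boolean model can be checked numerically with a minimal Python sketch. It uses a plain forward-Euler loop rather than the CNORode solver, and the function names, step size, initial value and parameter values are assumptions chosen for illustration.

```python
def hill(x, k, n):
    """Normalized Hill function B(x), with B(0) = 0 and B(1) = 1."""
    return (x ** n / (k ** n + x ** n)) / (1 / (k ** n + 1))


def and_rhs(c, a, b, tau, k, n):
    """Right-hand side of the AND-gate ODE."""
    return (hill(a, k, n) * hill(b, k, n) - c) / tau


def or_rhs(c, a, b, tau, k, n):
    """Right-hand side of the OR-gate ODE."""
    ba, bb = hill(a, k, n), hill(b, k, n)
    return (ba * bb + ba * (1 - bb) + bb * (1 - ba) - c) / tau


def integrate(rhs, c0, a, b, tau=1.0, k=0.5, n=3, dt=0.01, steps=2000):
    """Forward-Euler integration with constant inputs a and b."""
    c = c0
    for _ in range(steps):
        c += dt * rhs(c, a, b, tau, k, n)
    return c


# Inputs a = 1, b = 0: the AND gate relaxes towards 0, the OR gate towards 1.
c_and = integrate(and_rhs, c0=0.5, a=1.0, b=0.0)
c_or = integrate(or_rhs, c0=0.5, a=1.0, b=0.0)
```

The three-term OR expression is algebraically equal to $B(a) + B(b) - B(a)B(b)$, which makes the agreement with the Boolean OR at inputs of 0 and 1 easy to see.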

Acknowledgments

The authors thank J Banga, J Egea, Inna Pertsovskaya and Melody Morris for valuable help and discussion. Funding was provided by the EU-7FP-BioPreDyn and EMBL-EIPOD programs.

References

Aislyn D W and Boran R I 2010 Systems approaches to polypharmacology and drug discovery Curr Opin Drug Discov Dev 13 297–309

Albert I et al 2008 Boolean network simulations for life scientists Source Code Biol Med 3 16

Aldridge B B et al 2006 Physicochemical modelling of cell signalling pathways Nature Cell Biol 8 1195–203

Assmann S M and Albert R 2009 Discrete dynamic modeling with asynchronous update, or how to model complex systems in the absence of quantitative information Methods Mol Biol 553 207–25

Bader G D, Cary M P and Sander C 2006 Pathguide: a pathway resource list Nucleic Acids Res 34 D504–6 (Database issue)

Banga J 2008 Optimization in computational systems biology BMC Syst Biol 2 47

Barabasi A-L and Oltvai Z N 2004 Network biology: understanding the cell's functional organization Nature Rev Genet 5 101–13

Bauer-Mehren A, Furlong L I and Sanz F 2009 Pathway databases and tools for their exploitation: benefits, current limitations and challenges Mol Syst Biol 5 290

Bossi A and Lehner B 2009 Tissue specificity and the human protein interaction network Mol Syst Biol 5 260

Calzone L et al 2008 A comprehensive modular map of molecular interactions in RB/E2F pathway Mol Syst Biol 4 173

Calzone L et al 2010 Mathematical modelling of cell-fate decision in response to death receptor engagement PLoS Comput Biol 6 e1000702

Chen W W, Niepel M and Sorger P K 2010 Classic and contemporary approaches to modeling biochemical reactions Genes Dev 24 1861–75


de Jong H 2002 Modeling and simulation of genetic regulatory systems: a literature review J Comput Biol 9 67–103

Di Cara A et al 2007 Dynamic simulation of regulatory networks using SQUAD BMC Bioinformatics 8 462

Eduati F et al 2010 A Boolean approach to linear prediction for signaling network modeling PLoS One 5 e12789

Egea J and Martí R 2010 An evolutionary method for complex-process optimization Comput Oper Res 37 315–24

Faure A et al 2006 Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle Bioinformatics 22 e124–31

Fisher J and Henzinger T A 2007 Executable cell biology Nature Biotechnol 25 1239–49

Garg A et al 2008 Synchronous versus asynchronous modeling of gene regulatory networks Bioinformatics 24 1917–25

Gonzalez A, Chaouiya C and Thieffry D 2008 Logical modelling of the role of the Hh pathway in the patterning of the Drosophila wing disc Bioinformatics 24 i234–40

Helikar T et al 2008 Emergent decision-making in biological signal transduction networks Proc Natl Acad Sci USA 105 1913–8

Hlavacek W S et al 2006 Rules for modeling signal-transduction systems Sci STKE 2006 re6

Hoffmann A et al 2002 The IkappaB–NF–kappaB signaling module: temporal control and selective gene activation Science 298 1241–5

Hucka M et al 2003 The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models Bioinformatics 19 524–31

Ilsley G R, Luscombe N M and Apweiler R 2009 Know your limits: assumptions, constraints and interpretation in systems biology Biochim Biophys Acta 1794 1280–7

Jørgensen C and Linding R 2010 Simplistic pathways or complex networks? Curr Opin Genet Dev 20 15–22

Junker B H, Klukas C and Schreiber F 2006 VANTED: a system for advanced data analysis and visualization in the context of biological networks BMC Bioinformatics 7 109

Kauffman S A 1969 Metabolic stability and epigenesis in randomly constructed genetic nets J Theor Biol 22 437–67

Khoury G A, Baliban R C and Floudas C A 2011 Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database Sci Rep 1 90

Kirouac D C et al 2012 Creating and analyzing pathway and protein interaction compendia for modelling signal transduction networks BMC Syst Biol 6 29

Klamt S, Saez-Rodriguez J and Gilles E D 2007 Structural and functional analysis of cellular networks with CellNetAnalyzer BMC Syst Biol 1 2

Klamt S et al 2006 A methodology for the structural and functional analysis of signaling and regulatory networks BMC Bioinformatics 7 56

Kreeger P K and Lauffenburger D A 2010 Cancer systems biology: a network modeling perspective Carcinogenesis 31 2–8

Krumsiek J et al 2010 Odefy - from discrete to continuous models BMC Bioinformatics 11 233

Liberali P, Ramo P and Pelkmans L 2008 Protein kinases: starting a molecular systems view of endocytosis Annu Rev Cell Dev Biol 24 501–23

Ma'ayan A et al 2005 Formation of regulatory patterns during signal propagation in a mammalian cellular network Science 309 1078–83

Mendelson E 1970 Boolean Algebra and Switching Circuits (Schaum's Outline Series) (New York: McGraw-Hill)

Mendoza L and Xenarios I 2006 A method for the generation of standardized qualitative dynamical systems of regulatory networks Theor Biol Med Modelling 3 13

Mitsos A et al 2009 Identifying drug effects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data PLoS Comput Biol 5 e1000591

Morris M K et al 2010 Logic-based models for the analysis of cell signaling networks Biochemistry 49 3216–24

Morris M K et al 2011a Training signaling pathway maps to biochemical data with constrained fuzzy logic: quantitative analysis of liver cell responses to inflammatory stimuli PLoS Comput Biol 7 e1001099

Morris M K et al 2011b Querying quantitative logic models (Q2LM) to study intracellular signaling networks and cell-cytokine interactions Biotechnol J 7 374–86

Novere N L et al 2009 The systems biology graphical notation Nature Biotechnol 27 735–41

Oda K et al 2005 A comprehensive pathway map of epidermal growth factor receptor signaling Mol Syst Biol 1 2005.0010

Patlak M 2010 Competitors try collaboration to speed drug development J Natl Cancer Inst 102 841–3

Pieroni E et al 2008 Protein networking: insights into global functional organization of proteomes Proteomics 8 799–816

Rangamani P and Iyengar R 2008 Modelling cellular signalling systems Essays Biochem 45 83–94

Saez-Rodriguez J et al 2009 Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction Mol Syst Biol 5 331

Saez-Rodriguez J, Alexopoulos L G and Stolovitzky G 2011a Setting the standards for signal transduction research Sci Signal 4 pe10

Saez-Rodriguez J et al 2011b Comparing signaling networks between normal and transformed hepatocytes using discrete logical models Cancer Res 71 5400–11

Sahin O et al 2009 Modeling ERBB receptor-regulated G1/S transition to find novel targets for de novo trastuzumab resistance BMC Syst Biol 3 1

Sasagawa S et al 2005 Prediction and validation of the distinct dynamics of transient and sustained ERK activation Nature Cell Biol 7 365–73

Schlatter R et al 2009 ON/OFF and beyond - a boolean model of apoptosis PLoS Comput Biol 5 e1000595

Shmulevich I et al 2003 Steady-state analysis of genetic regulatory networks modelled by probabilistic boolean networks Comp Funct Genomics 4 601–8

Terfve C and Saez-Rodriguez J 2012 Modeling signaling networks using high-throughput phospho-proteomics Adv Exp Med Biol 736 19–57

Thakar J and Albert R 2010 Boolean models of within-host immune interactions Curr Opin Microbiol 13 377–81

Vinayagam A et al 2011 A directed protein interaction network for investigating intracellular signal transduction Sci Signal 4 rs8

Watterson S, Marshall S and Ghazal P 2008 Logic models of pathway biology Drug Discov Today 13 447–56

Wittmann D M et al 2009 Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling BMC Syst Biol 3 98

16

  • 1 Introduction
    • 11 From parts to interactions
    • 12 From interactions to mechanism
    • 13 Logic modeling
    • 14 Software
    • 15 The example model
      • 2 Boolean steady state
        • 21 Steady-state optimization and simulation
        • 22 Interpretation of steady-state result
          • 3 Two time points (or additional steady state)
            • 31 Multiple steady-state optimization and simulation
            • 32 Interpretation
              • 4 Synchronous multiple time-point simulation and multiple timescales
                • 41 Synchronous and asynchronous updating
                  • 5 Constrained fuzzy logic
                    • 51 Model training and simulation
                    • 52 Interpretation
                      • 6 Logic ODEs
                        • 61 Converting from Boolean to continuous
                        • 62 Parameter estimation
                        • 63 Compressing an ODE model
                          • 7 Summary and future developments
                          • 8 Materials and methods
                            • 81 CellNOptR
                            • 82 Network and data generation
                            • 83 Boolean logic
                            • 84 Fuzzy logic
                            • 85 Logic ODEs
                              • Acknowledgments
                              • References

Phys Biol 9 (2012) 045003 A MacNamara et al

(a) (b) (c) (d)

Figure 10 The construction of gates with cFL (a) activating (b)inhibitory (c) an AND gate and (d) an OR gate

This is done by calculating the product within brackets andsumming between brackets Essentially SOP representationsare rules of precedence for complex multi-node inputs In termsof graphically representing the activation of C its activationcannot be easily represented using standard SBGN AND NOTor OR operators (figure 9) Hence this SOP expression can besummarized as an XOR gate

A logic network where relations are encoded by SOPexpressions that can be represented as a hypergraph (Klamtet al 2006) A hypergraph is defined as a set of nodes connectedby hyperedges where a hyperedge is a generalization of anedge that can be connected to more than two nodes This inturn can facilitate a more precise representation of biologicalknowledge (for example where two proteins are necessary forthe activation of a target)

84 Fuzzy logic

cFL defines the transfer function between nodes as a Hillfunction Depending on the type of interaction (or logic gatefigure 10) this function can take different forms (Morris et al2011a)

(a) If node C depends only on A a normalized Hill functionis used to calculate C where k and n are the sensitivitycoefficient and Hill coefficient respectively

c = (kn + 1)an

kn + an

(b) An inhibitory relationship is represented as the aboveexpression subtracted from 1

c = 1 minus (kn + 1)an

kn + an

(c) An AND gate the minimum value of c is used

c = min

((kn2

1 + 1) an2

kn21 + an2

(kn2

2 + 1) bn2

kn22 + bn2

)

(d) And for an OR gate the maximum value is used

c = max

((kn2

1 + 1) an2

kn21 + an2

(kn2

2 + 1) bn2

kn22 + bn2

)

85 Logic ODEs

As in the case of cFL CNORode uses phenomenologicaltransfer functions (ie non-mechanistic normalized Hillfunctions) to describe the dynamics of a nodersquos state as afunction of its inputs Using the examples in figure 10 againthese functions can be described as follows

(a)

c = 1τ(B(a) minus c) where

c is the development of cover time B(a) is the normalized Hill function of thecontinuous variable a This takes the form an

kn+an

1n

kn+1n

(k and n are again the sensitivity and Hill coefficientsrespectively) τ can be interpreted as the maximumvalue of species c (biologically this could encompassdegradation or other limiting factors) and there is anadditional degradation term proportional to c

(b) An inhibitory relationship is simply the above expressionsubtracted from 1

c = 1 minus 1τ(B(a) minus a)

(c) The AND gates take the form

c = 1τ(B(a)B(b) minus c)

(d) The OR gate notation is as follows

c =1τ(B(a)B(b) + B(a)[1 minus B(b)] + B(b)[1 minus B(a)] minus c)

In the case of an AND gate the product of B (a) andB(b) is taken which maintains consistency in the output withthe equivalent Boolean model (ie if a = 1 and b = 0 inboth ODE and logic formalisms c = 0 similarly with an ORgate if a = 1 and b = 0 in both ODE and logic formalismsc = 1) As in the case of cFL normalized Hill functions canapproximate commonly observed biochemical dynamics suchas linear sigmoidal and step-like behavior

Acknowledgments

The authors thank J Banga, J Egea, Inna Pertsovskaya and Melody Morris for valuable help and discussion. Funding was provided by the EU-7FP-BioPreDyn and EMBL-EIPOD programs.

References

Aislyn D W and Boran R I 2010 Systems approaches to polypharmacology and drug discovery Curr Opin Drug Discov Dev 13 297–309

Albert I et al 2008 Boolean network simulations for life scientists Source Code Biol Med 3 16

Aldridge B B et al 2006 Physicochemical modelling of cell signalling pathways Nature Cell Biol 8 1195–203

Assmann S M and Albert R 2009 Discrete dynamic modeling with asynchronous update, or how to model complex systems in the absence of quantitative information Methods Mol Biol 553 207–25

Bader G D, Cary M P and Sander C 2006 Pathguide: a pathway resource list Nucleic Acids Res 34 D504–6 (Database issue)

Banga J 2008 Optimization in computational systems biology BMC Syst Biol 2 47

Barabasi A-L and Oltvai Z N 2004 Network biology: understanding the cell's functional organization Nature Rev Genet 5 101–13

Bauer-Mehren A, Furlong L I and Sanz F 2009 Pathway databases and tools for their exploitation: benefits, current limitations and challenges Mol Syst Biol 5 290

Bossi A and Lehner B 2009 Tissue specificity and the human protein interaction network Mol Syst Biol 5 260

Calzone L et al 2008 A comprehensive modular map of molecular interactions in RB/E2F pathway Mol Syst Biol 4 173

Calzone L et al 2010 Mathematical modelling of cell-fate decision in response to death receptor engagement PLoS Comput Biol 6 e1000702

Chen W W, Niepel M and Sorger P K 2010 Classic and contemporary approaches to modeling biochemical reactions Genes Dev 24 1861–75


de Jong H 2002 Modeling and simulation of genetic regulatory systems: a literature review J Comput Biol 9 67–103

Di Cara A et al 2007 Dynamic simulation of regulatory networks using SQUAD BMC Bioinformatics 8 462

Eduati F et al 2010 A Boolean approach to linear prediction for signaling network modeling PLoS One 5 e12789

Egea J and Martí R 2010 An evolutionary method for complex-process optimization Comput Oper Res 37 315–24

Faure A et al 2006 Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle Bioinformatics 22 e124–31

Fisher J and Henzinger T A 2007 Executable cell biology Nature Biotechnol 25 1239–49

Garg A et al 2008 Synchronous versus asynchronous modeling of gene regulatory networks Bioinformatics 24 1917–25

Gonzalez A, Chaouiya C and Thieffry D 2008 Logical modelling of the role of the Hh pathway in the patterning of the Drosophila wing disc Bioinformatics 24 i234–40

Helikar T et al 2008 Emergent decision-making in biological signal transduction networks Proc Natl Acad Sci USA 105 1913–8

Hlavacek W S et al 2006 Rules for modeling signal-transduction systems Sci STKE 2006 re6

Hoffmann A et al 2002 The IkappaB–NF–kappaB signaling module: temporal control and selective gene activation Science 298 1241–5

Hucka M et al 2003 The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models Bioinformatics 19 524–31

Ilsley G R, Luscombe N M and Apweiler R 2009 Know your limits: assumptions, constraints and interpretation in systems biology Biochim Biophys Acta 1794 1280–7

Jørgensen C and Linding R 2010 Simplistic pathways or complex networks? Curr Opin Genet Dev 20 15–22

Junker B H, Klukas C and Schreiber F 2006 VANTED: a system for advanced data analysis and visualization in the context of biological networks BMC Bioinformatics 7 109

Kauffman S A 1969 Metabolic stability and epigenesis in randomly constructed genetic nets J Theor Biol 22 437–67

Khoury G A, Baliban R C and Floudas C A 2011 Proteome-wide post-translational modification statistics: frequency analysis and curation of the swiss-prot database Sci Rep 1 90

Kirouac D C et al 2012 Creating and analyzing pathway and protein interaction compendia for modelling signal transduction networks BMC Syst Biol 6 29

Klamt S, Saez-Rodriguez J and Gilles E D 2007 Structural and functional analysis of cellular networks with CellNetAnalyzer BMC Syst Biol 1 2

Klamt S et al 2006 A methodology for the structural and functional analysis of signaling and regulatory networks BMC Bioinformatics 7 56

Kreeger P K and Lauffenburger D A 2010 Cancer systems biology: a network modeling perspective Carcinogenesis 31 2–8

Krumsiek J et al 2010 Odefy - from discrete to continuous models BMC Bioinformatics 11 233

Liberali P, Ramo P and Pelkmans L 2008 Protein kinases: starting a molecular systems view of endocytosis Annu Rev Cell Dev Biol 24 501–23

Ma'ayan A et al 2005 Formation of regulatory patterns during signal propagation in a mammalian cellular network Science 309 1078–83

Mendelson E 1970 Boolean Algebra and Switching Circuits (Schaum's Outline Series) (New York: McGraw-Hill)

Mendoza L and Xenarios I 2006 A method for the generation of standardized qualitative dynamical systems of regulatory networks Theor Biol Med Modelling 3 13

Mitsos A et al 2009 Identifying drug effects via pathway alterations using an integer linear programming optimization formulation on phosphoproteomic data PLoS Comput Biol 5 e1000591

Morris M K et al 2010 Logic-based models for the analysis of cell signaling networks Biochemistry 49 3216–24

Morris M K et al 2011a Training signaling pathway maps to biochemical data with constrained fuzzy logic: quantitative analysis of liver cell responses to inflammatory stimuli PLoS Comput Biol 7 e1001099

Morris M K et al 2011b Querying quantitative logic models (Q2LM) to study intracellular signaling networks and cell-cytokine interactions Biotechnol J 7 374–86

Novere N L et al 2009 The systems biology graphical notation Nature Biotechnol 27 735–41

Oda K et al 2005 A comprehensive pathway map of epidermal growth factor receptor signaling Mol Syst Biol 1 2005.0010

Patlak M 2010 Competitors try collaboration to speed drug development J Natl Cancer Inst 102 841–3

Pieroni E et al 2008 Protein networking: insights into global functional organization of proteomes Proteomics 8 799–816

Rangamani P and Iyengar R 2008 Modelling cellular signalling systems Essays Biochem 45 83–94

Saez-Rodriguez J et al 2009 Discrete logic modelling as a means to link protein signalling networks with functional analysis of mammalian signal transduction Mol Syst Biol 5 331

Saez-Rodriguez J, Alexopoulos L G and Stolovitzky G 2011a Setting the standards for signal transduction research Sci Signal 4 pe10

Saez-Rodriguez J et al 2011b Comparing signaling networks between normal and transformed hepatocytes using discrete logical models Cancer Res 71 5400–11

Sahin O et al 2009 Modeling ERBB receptor-regulated G1/S transition to find novel targets for de novo trastuzumab resistance BMC Syst Biol 3 1

Sasagawa S et al 2005 Prediction and validation of the distinct dynamics of transient and sustained ERK activation Nature Cell Biol 7 365–73

Schlatter R et al 2009 ON/OFF and beyond - a boolean model of apoptosis PLoS Comput Biol 5 e1000595

Shmulevich I et al 2003 Steady-state analysis of genetic regulatory networks modelled by probabilistic boolean networks Comp Funct Genomics 4 601–8

Terfve C and Saez-Rodriguez J 2012 Modeling signaling networks using high-throughput phospho-proteomics Adv Exp Med Biol 736 19–57

Thakar J and Albert R 2010 Boolean models of within-host immune interactions Curr Opin Microbiol 13 377–81

Vinayagam A et al 2011 A directed protein interaction network for investigating intracellular signal transduction Sci Signal 4 rs8

Watterson S, Marshall S and Ghazal P 2008 Logic models of pathway biology Drug Discov Today 13 447–56

Wittmann D M et al 2009 Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling BMC Syst Biol 3 98

