Inferring Quantitative Models of Regulatory Networks From Expression Data


Iftach Nachman, Hebrew University

Aviv Regev, Harvard

Nir Friedman, Hebrew University

Goal: Reconstruct Cellular Networks

Biocarta. http://www.biocarta.com/

Structure, Function, Dynamics

[Figure: data matrix; axes: Genes, Conditions]

Common approach: Interaction Networks

Different semantics for networks: Boolean, probabilistic, differential equations, …

A Major Assumption…

[Figure: TF and G nodes alongside the regulation pathway; labels: activation signal, active protein, mRNA transcription rate, mRNA, mRNA degradation, protein; legend: Hidden, Observed]

Realistic Regulation Modeling

Model the closest connection

Active protein levels are not measured

Transcription rates are computed from expression data and mRNA decay rates

Realistic biochemical model of transcription rates
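One way to read "transcription rates are computed from expression data and mRNA decay rates" is sketched below. It assumes first-order mRNA degradation, d[mRNA]/dt = rate - decay * [mRNA], so that rate = d[mRNA]/dt + decay * [mRNA]. The function name `transcription_rates`, the finite-difference slope, and the toy numbers are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def transcription_rates(mrna, times, decay):
    """Estimate rate(t) = d[mRNA]/dt + decay * [mRNA] (assumed first-order decay).

    mrna  : array (genes, timepoints) of mRNA levels
    times : array (timepoints,) of sampling times
    decay : array (genes,) of per-gene decay constants (1/time)
    """
    mrna = np.asarray(mrna, dtype=float)
    decay = np.asarray(decay, dtype=float)
    dm_dt = np.gradient(mrna, times, axis=1)  # finite-difference slope along time
    return dm_dt + decay[:, None] * mrna

# Toy data (hypothetical): one gene at a constant level, one rising linearly.
times = np.array([0.0, 1.0, 2.0, 3.0])
mrna = np.array([[2.0, 2.0, 2.0, 2.0],
                 [0.0, 1.0, 2.0, 3.0]])
decay = np.array([0.5, 0.0])
rates = transcription_rates(mrna, times, decay)
```

In the toy data, the first gene needs a steady rate to balance its decay, while the second has no decay and its rate equals its slope.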

[Figure: TF and G nodes alongside the regulation pathway; labels: activation signal, active protein, mRNA transcription rate, mRNA, mRNA degradation, protein, On, Off; legend: Hidden, Observed]

Modeling Transcription Rate

Simplest case: one activator

[Figure: gene G and its activator TF; labels: On, mRNA transcripts]

[McAdams & Arkin, 1997; Ronen et al, 2002]

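Avg rate $= P(\text{unbound}) \cdot 0 + P(\text{bound}) \cdot \beta$

Steady state equations:

$[S] + [S \cdot tf] = 1$

$\kappa_b [S][tf] = \kappa_d [S \cdot tf]$

$[S]$: concentration of free promoters

$[S \cdot tf]$: concentration of bound promoters

$[tf]$: concentration of TF

With $\gamma = \kappa_b / \kappa_d$:

$P(\text{bound} \mid [tf]) = \dfrac{\gamma [tf]}{1 + \gamma [tf]}$

$P(\text{unbound} \mid [tf]) = \dfrac{1}{1 + \gamma [tf]}$

$\text{Rate}([tf]) = \beta \, \dfrac{\gamma [tf]}{1 + \gamma [tf]}$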
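The single-activator rate can be sketched numerically as follows. The sketch assumes the saturating form rate = beta * gamma*[tf] / (1 + gamma*[tf]), where `beta` (rate when the promoter is bound) and `gamma` (binding affinity, the ratio of the binding and dissociation constants) are names chosen here for illustration. The function name `activator_rate` and the sample TF levels are likewise assumptions.

```python
import numpy as np

def activator_rate(tf, beta, gamma):
    """Average rate beta * P(bound), with P(bound) = gamma*tf / (1 + gamma*tf)."""
    tf = np.asarray(tf, dtype=float)
    p_bound = gamma * tf / (1.0 + gamma * tf)
    return beta * p_bound

# Larger affinity values saturate at lower TF activity.
tf_levels = np.array([0.0, 0.25, 1.0, 4.0])
for gamma in (1, 4, 20, 250):
    print(gamma, np.round(activator_rate(tf_levels, beta=1.0, gamma=gamma), 3))
```

With a small affinity the response is close to linear in TF activity; with a large affinity it behaves almost like an on/off switch.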


[Figure: transcription rate vs. TF activity for γ = 1, 4, 20, 250; TF activity over time and the resulting transcription rate over time for the same γ values]

[Buchler et al, 2003; Setty et al, 2002]

General Two Regulator Function

[Figure: gene G with regulators TF1 and TF2; four promoter states a, b, c, d; bar chart of P(State) for states a to d, axis 0 to 0.6]

Similar models for other modes of binding: competitive binding, cooperative binding

[Figure: P(State) bar charts for states a to d under the other binding modes, axis 0 to 0.6]

[Figure: four promoter states a, b, c, d, each with a rate weight; Average Rate = sum over states of weight × P(State)]

“AND” gate: a = 0, b = 0, c = 0, d = 1

“OR” gate: b = 1, c = 1

[Buchler et al, 2003; Setty et al, 2002]

Avg rate = function of TF concentrations

Few parameters: affinity parameters, rate parameters
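A minimal sketch of a two-regulator rate function follows. It rests on several assumptions not stated in the slides: state a has neither factor bound, b has only TF1, c has only TF2, and d has both; the two factors bind independently; and the OR gate keeps a = 0 and d = 1. The names `state_probs`, `average_rate`, `gamma1`, `gamma2`, and `beta` are illustrative.

```python
import numpy as np

def state_probs(tf1, tf2, gamma1, gamma2):
    """P(State) for a: none bound, b: TF1 only, c: TF2 only, d: both bound.

    Assumes independent binding (no competition or cooperativity).
    """
    w = np.array([1.0,
                  gamma1 * tf1,
                  gamma2 * tf2,
                  gamma1 * tf1 * gamma2 * tf2])
    return w / w.sum()

def average_rate(tf1, tf2, gamma1, gamma2, weights, beta=1.0):
    """Sum over states of weight * P(State), scaled by an assumed maximal rate beta."""
    return beta * float(np.dot(weights, state_probs(tf1, tf2, gamma1, gamma2)))

AND_GATE = np.array([0.0, 0.0, 0.0, 1.0])  # a=0, b=0, c=0, d=1
OR_GATE = np.array([0.0, 1.0, 1.0, 1.0])   # b=1, c=1; a=0 and d=1 assumed

and_rate = average_rate(1.0, 1.0, 1.0, 1.0, AND_GATE)
or_rate = average_rate(1.0, 1.0, 1.0, 1.0, OR_GATE)
```

Competitive or cooperative binding would change only `state_probs`; the weighted sum in `average_rate` stays the same.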

Models of Regulatory Networks

[Figure: regulators TF1 to TF3 (activity) with edges to target genes G1 to G7 (transcription rate); predicted rates compared with observed rates under noise; panels: TF activity vs. time, transcription rate vs. time]

Learning From Data

[Figure: expression data + mRNA decay rates give transcription rates; transcription rates + network (TF1, TF2; G1 to G4) are fed to gradient ascent, which yields kinetic parameters and activity profiles for TF1 and TF2]
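The learning step can be illustrated with a small gradient-ascent sketch that jointly fits kinetic parameters and a hidden activity profile. This is a simplification under stated assumptions, not the authors' procedure: a single regulator shared by all genes, the one-activator rate function, Gaussian noise with unit variance, and parameters kept positive by optimizing their logarithms. All names (`predict`, `fit`, `log_beta`, `log_gamma`, `log_tf`) and the synthetic data are hypothetical.

```python
import numpy as np

def predict(log_beta, log_gamma, log_tf):
    """Predicted rates beta_g * u / (1 + u), with u = gamma_g * tf_t."""
    u = np.exp(log_gamma)[:, None] * np.exp(log_tf)[None, :]
    return np.exp(log_beta)[:, None] * u / (1.0 + u), u

def fit(rates, steps=500, lr=0.005):
    """Gradient ascent on the log-likelihood -0.5 * sum((rates - pred)^2)."""
    n_genes, n_times = rates.shape
    log_beta = np.log(np.maximum(rates.max(axis=1), 1e-3))
    log_gamma = np.zeros(n_genes)
    log_tf = np.zeros(n_times)          # hidden regulator activity, one value per time
    history = []
    for _ in range(steps):
        pred, u = predict(log_beta, log_gamma, log_tf)
        err = rates - pred
        history.append(-0.5 * np.sum(err ** 2))
        # Chain rule: d pred / d log(gamma) = d pred / d log(tf) = beta * u / (1 + u)^2
        d = err * np.exp(log_beta)[:, None] * u / (1.0 + u) ** 2
        log_beta += lr * (err * pred).sum(axis=1)   # d pred / d log(beta) = pred
        log_gamma += lr * d.sum(axis=1)
        log_tf += lr * d.sum(axis=0)
    return log_beta, log_gamma, log_tf, history

# Synthetic data (hypothetical): 5 genes, 17 time points, one hidden regulator.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0 * np.pi, 17)
true_tf = 1.0 + 0.8 * np.sin(t)
true_beta = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
true_gamma = np.array([0.5, 1.0, 1.5, 2.0, 1.0])
u_true = true_gamma[:, None] * true_tf[None, :]
rates = true_beta[:, None] * u_true / (1.0 + u_true) + rng.normal(0.0, 0.02, u_true.shape)

log_beta, log_gamma, log_tf, history = fit(rates)
```

Note that the scale of the activity profile and the affinities is not identifiable on its own (multiplying one and dividing the other leaves the predictions unchanged), so only the shape of the recovered profile is meaningful in this sketch.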

Cell Cycle Experiment

Biological databases [YPD] + ChIP location [Lee et al.]: 7 regulators & 141 target genes

Cell cycle gene expression [Spellman et al.] + mRNA decay rates [Wang et al.]: transcription rates

[Figure: heat maps of input rates and predictions, genes grouped by cell cycle phase (M/G1, G1, S, S/G2, G2/M), color scale 0 to 2; learned parameters]

17 x 141 = 2397 data points

466 parameters

17 x 7 = 119 regulator activity values

Regulator Activity Profiles

When are they active?

Known biology:

SWI4 & MBP1: mid-late G1

FKH1: S/G2

FKH2: G2/M

SWI5: M/G1

[Figure: reconstructed activity profiles for FKH1, FKH2, SWI5, ACE2, MBP1, SWI4; time axis marked G1, G2, G1, G2]

Reconstructed activity profiles match direct experimental knowledge

Regulator Activity Profiles

When are they active?

Could we reconstruct these from mRNA profiles?

Known biology:

SWI5 is transcriptionally regulated

MCM1 is not

Regulator’s own mRNA is not sufficient to reconstruct activity levels

[Figure: activity profile vs. mRNA profile for SWI5 and for MCM1]

How well are we doing?

[Figure: input and predictions heat maps, genes grouped by cell cycle phase (M/G1, G1, S, S/G2, G2/M); residue r computed from the predicted and input rates]

Ab initio Learning

[Figure: expression data + mRNA decay rates give transcription rates; transcription rates + network (TF1, TF2; G1 to G4) are fed to learning, which yields kinetic parameters and activity profiles for TF1 and TF2]

Big assumption: Network topology is given

Unrealistic, even for well understood systems

Challenge: Reconstruct network topology?

Number of regulators

Their joint effect on target genes

How Do We Learn Structure?

Standard approach: hill climbing search

[Figure: candidate network structures over TF1 to TF3 and G1 to G4 produced by local changes; scores shown: -17.23, -23.13, -19.19]

Problem:

Scoring structures is costly

Requires non-linear parameter optimization

Impractical on real data
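The standard hill-climbing search can be sketched generically as below. A structure is represented as a set of (regulator, gene) edges and each move toggles one edge. The `score` argument stands in for the expensive step: in the real setting each call would require a full non-linear parameter optimization, which is exactly the cost named above. Here a cheap toy score (distance to a hypothetical target edge set) is swapped in purely so the example runs; all names are illustrative.

```python
from itertools import product

def hill_climb(tfs, genes, score, start=frozenset(), max_iters=100):
    """Greedy search: repeatedly apply the single edge toggle that most improves score."""
    current, best = start, score(start)
    for _ in range(max_iters):
        # Every neighbor differs from the current structure by one edge.
        neighbors = [current ^ {edge} for edge in product(tfs, genes)]
        top = max(neighbors, key=score)
        top_score = score(top)
        if top_score <= best:       # local optimum reached
            break
        current, best = top, top_score
    return current, best

# Toy stand-in for the costly likelihood-based score (hypothetical target network).
TFS = ["TF1", "TF2", "TF3"]
GENES = ["G1", "G2", "G3", "G4"]
TARGET = frozenset({("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3"), ("TF2", "G4")})

def toy_score(edges):
    return -len(edges ^ TARGET)

learned, best = hill_climb(TFS, GENES, toy_score)
```

Each iteration scores every neighbor, so with 7 regulators and 141 targets a single step would already mean on the order of a thousand parameter optimizations.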

Ideal Regulator Method

Goal: Consider adding edges

Idea: Score only promising candidates

Step 1: Compute optimal hypothetical regulator

Step 2: Search for “similar” regulator

[Figure: target profile of G over time with Pred(G|TF) and Pred(G|TF,Y); ideal regulator Y shown as an additional parent of G next to TF; activity levels over time of candidate regulators TF1 to TF4]

Step 3: Add new parent and optimize parameters

[Figure: Predicted(G|TF,TF2) against the target profile over time; parent(s) activity; TF2 shown as the new parent of G next to TF]

Crucial point: Choice of similarity measure

Principled approach: see [Nachman et al UAI04]

Provides approximation to Δlikelihood
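Steps 1 and 2 can be illustrated with a deliberately simplified sketch. It assumes the target gene has no existing parent and follows the one-activator rate function, so the ideal regulator is obtained by inverting rate = beta * gamma*y / (1 + gamma*y). For the similarity measure, plain Pearson correlation is swapped in here; it is not the principled measure from [Nachman et al UAI04]. The function names and toy profiles are hypothetical.

```python
import numpy as np

def ideal_regulator(target_rates, beta, gamma):
    """Step 1: activity y_t that would reproduce the target rates exactly."""
    r = np.clip(np.asarray(target_rates, dtype=float), 0.0, 0.99 * beta)
    return r / (gamma * (beta - r))     # inverse of beta * gamma*y / (1 + gamma*y)

def rank_candidates(ideal, candidates):
    """Step 2: order candidate regulators by correlation with the ideal profile."""
    scores = {name: float(np.corrcoef(ideal, profile)[0, 1])
              for name, profile in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

# Toy data (hypothetical): the target gene is driven by an activity shaped like TF1.
t = np.linspace(0.0, 2.0 * np.pi, 17)
activity = 1.0 + 0.8 * np.sin(t)
beta, gamma = 2.0, 1.5
target = beta * gamma * activity / (1.0 + gamma * activity)

ideal = ideal_regulator(target, beta, gamma)
candidates = {
    "TF1": 1.0 + 0.8 * np.sin(t),
    "TF2": 1.0 + 0.8 * np.cos(t),
    "TF3": np.linspace(0.5, 1.5, 17),
}
ranking, scores = rank_candidates(ideal, candidates)
```

Only the top-ranked candidates would then go through Step 3, the full parameter optimization, which is what keeps the search affordable.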

Adding New Regulator

Idea: Introduce hidden regulator for genes with similar ideal regulator

New regulator: “centroid” of selected ideal regulators

[Figure: ideal regulators Y1 to Y5 over time for genes G1 to G5; new hidden regulator TFnew shown as parent of G1, G2, G4]
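A minimal sketch of proposing a hidden regulator from ideal regulator profiles follows. The grouping rule (Pearson correlation with a seed gene above a threshold) and the use of the arithmetic mean as the "centroid" are assumptions made for illustration; the slides do not specify either. The function name, the threshold, and the toy profiles are hypothetical.

```python
import numpy as np

def propose_hidden_regulator(ideal_profiles, seed_gene, min_corr=0.8):
    """Group genes whose ideal regulators resemble the seed's; return their centroid."""
    seed = ideal_profiles[seed_gene]
    members = [gene for gene, profile in ideal_profiles.items()
               if np.corrcoef(seed, profile)[0, 1] >= min_corr]
    centroid = np.mean([ideal_profiles[gene] for gene in members], axis=0)
    return members, centroid

# Toy ideal regulators Y1..Y5 for genes G1..G5 (hypothetical shapes).
t = np.linspace(0.0, 2.0 * np.pi, 17)
ideal_profiles = {
    "G1": 1.0 + 0.8 * np.sin(t),
    "G2": 1.1 + 0.7 * np.sin(t),
    "G3": 1.0 + 0.8 * np.cos(t),
    "G4": 0.9 + 0.9 * np.sin(t),
    "G5": 1.0 - 0.8 * np.sin(t),
}
members, tf_new = propose_hidden_regulator(ideal_profiles, "G1")
```

The centroid profile would serve as the initial activity of the new regulator, with its edges to the selected genes added before parameters are re-optimized.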

Ab initio Structure Learning

[Figure: heat maps with genes grouped by cell cycle phase (M/G1, G1, S, S/G2, G2/M), color scale 0 to 2: input rates; curated, prior knowledge (466 params); ab initio, from scratch (461 params)]

[Figure: bar chart of log likelihood and BIC for the Curated and Ab Initio models, axis 0 to 1600]


Regulators: ab initio vs. curated

[Figure: regulators by target genes (axis 0 to 120) for the curated regulators SWI4, MBP1, ACE2, FKH1, SWI5, MCM1, FKH2 and the ab initio regulators H1 to H7; highlighted pairs: H2 and SWI5, H4 and SWI4, H1 and MBP1, H3 and FKH2]

Significant target overlap & correlated activity

Significant target overlap & weak correlation

Significant agreement with “known” topology

Both in structure & dynamics

Improved predictions

Conclusions

[Figure: transcription rates + network (prior knowledge) are fed to learning, which yields kinetic parameters, activity profiles for TF1 and TF2, and a network over TF1, TF2 and G1 to G4]

Realistic model, based on first principles

Learning procedure:

Reconstruct unobserved activity profiles

Reconstruct network topology

Insights into Structure & Dynamics Function

Future Directions

Prior knowledge:

ChIP location

Cis-regulatory elements

External perturbations

Internal feedback

[Figure: network with regulators TF1 to TF3 and target genes G1 to G7]
