Background · Constraint Programming in Community Networks · Experiments and Results · Conclusions
Constraint Programming in Community-based Gene Regulatory Network Inference
Ferdinando Fioretto, Enrico Pontelli
Dept. Computer Science, New Mexico State University
Sept. 24, 2013
Talk Outline
1 Background
2 Constraint Programming in Community Networks
3 Experiments and Results
4 Conclusions
Gene Regulatory Networks
A cell contains different entities (including proteins and RNA) which interact and perform specific functions.
DNA transcription
mRNA translation
Gene Regulatory Networks
Some proteins, called Transcription Factors (TFs), can regulate the production of other proteins.
This is done by enhancing or inhibiting DNA transcription or mRNA translation.
The units of encapsulation of these interactions are the coding regions of the DNA: the genes.
A Gene Regulatory Network (GRN) is the set of the interactions among genes.
Gene Regulatory Networks: Modeling
A GRN is described by a weighted directed graph G = (V,E).
V is the set of genes of the network.
E ⊆ V × V × [0, 1] is the set of regulatory interactions.
Each regulatory interaction s → t is associated with a confidence value ω_{s→t} ∈ [0, 1].
Example
G1 regulates G2.
G2 regulates G5.
G3 is regulated by G4.
G4 regulates G2 and is regulated by G5.
Gene Regulatory Network Inference: GRN inference from high-throughput data
Motivation:
Key to understanding important genetic diseases, such as cancer.
Crucial to devise effective medical interventions.
Gene Regulatory Network Inference: Current Methods and Challenges
Methods proposed:
Correlation-based.
Information-theoretic.
Boolean Networks.
Bayesian Networks.
Regression-based.
Stochastic.
These methods are based on different assumptions, and each exhibits peculiar limitations.
Solutions proposed:
Integrating heterogeneous data into the inference model.
Meta-approaches using multiple inference models (Community Networks (CN)).
Gene Regulatory Network Inference: Community Networks
[Figure: the predictions G1, G2, ..., GJ of the individual methods are combined by edge ranking into a community network.]
Borda voting score:
ω_{s→t} = (1/|G|) Σ_{j=1}^{|G|} ω^j_{s→t}
where ω^j_{s→t} is the rank-based confidence of the interaction s → t assigned by the j-th method in G.
D. Marbach et al. "Wisdom of crowds for robust gene network inference". Nature Methods, 9(8):796–804, Aug. 2012.
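A minimal Python sketch (not the authors' implementation) of this averaging step: each method's scores are kept in a dictionary keyed by (source, target), and the function and variable names are ours.

```python
# Sketch: combine per-method edge scores into a community score by
# Borda-style averaging, as in the formula above.

def community_scores(method_scores):
    """method_scores: list of dicts {(s, t): score in [0, 1]}, one per method.
    Returns the averaged community score for every edge seen by any method."""
    edges = set()
    for scores in method_scores:
        edges.update(scores)
    J = len(method_scores)
    # Edges a method does not report contribute 0 (a simplifying assumption).
    return {e: sum(m.get(e, 0.0) for m in method_scores) / J for e in edges}

g1 = {("G1", "G2"): 0.9, ("G2", "G5"): 0.4}
g2 = {("G1", "G2"): 0.7, ("G4", "G2"): 0.6}
cn = community_scores([g1, g2])
# cn[("G1", "G2")] is (0.9 + 0.7) / 2 = 0.8
```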
Gene Regulatory Network Inference: Our Approach
CN approach for an "initial analysis" of the GRN: community predictions capture collective agreements.
Integrate additional biological knowledge (when available) and leverage specific GRN properties.
Why CP?
Constraint Programming: Constraint Satisfaction Problem (CSP)
Example: the n-queens problem.
Variables X: x_i = position of the queen in the i-th column.
Domains D: D_{x_i} = {1, ..., n}.
Constraints C: ∀i, ∀j with i < j:
x_i ≠ x_j
x_i + i ≠ x_j + j
x_i − i ≠ x_j − j
Search = Labeling + Constraint Propagation.
Solution = an assignment for X satisfying all c ∈ C.
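The n-queens CSP above can be solved by plain depth-first labeling; a minimal illustrative sketch (eager constraint checking only, without the propagation a real CP solver would add):

```python
# Depth-first labeling for n-queens: assignment[i] is the row of the
# queen in column i; the three constraints above are checked eagerly.

def solve_queens(n):
    def consistent(assignment, col, row):
        # No shared row, diagonal, or anti-diagonal with any placed queen.
        return all(row != r and row + col != r + c and row - col != r - c
                   for c, r in enumerate(assignment))

    def search(assignment):
        if len(assignment) == n:
            return assignment
        for row in range(n):
            if consistent(assignment, len(assignment), row):
                result = search(assignment + [row])
                if result:
                    return result
        return None  # dead end: backtrack

    return search([])

# solve_queens(4) → [1, 3, 0, 2]; solve_queens(3) has no solution.
```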
Gene Regulatory Network Inference: Our Approach
Why CP?
Separation between prediction methods and model.
Declarativeness.
Constraint expressions allow incremental model refinement.
Constrained Community Networks: CSP Modeling
GRN inference (GRNi) problem:
Given a set of n genes, a GRNi is a CSP ⟨X, D, C⟩ where:
X = ⟨x_1, ..., x_{n²−n}⟩ (regulatory relations, excluding self-regulations).
D = ⟨D_1, ..., D_{n²−n}⟩, with each D_k = {0, ..., 100} (possible confidence values).
C is a list of constraints expressing properties of GRNs.
Notation:
x_{s→t}: "s regulates t", with domain D_{s→t}.
d(x_{s→t}): the value assigned to x_{s→t}.
Constrained Community Networks: CSP Modeling
A solution to the GRNi defines a GRN prediction G = (V, E) with:
V = {1, ..., n}
E = {⟨s, t, w⟩ | d(x_{s→t}) > 0}, where w = d(x_{s→t})/100.
Constrained Community Networks: E. coli 2, size 10 (from DREAM3)
[Figure: the gold-standard network over genes G1–G10.]
Constrained Community Networks: E. coli 2, size 10, CN prediction [figure]
Analysis and Domain Reduction: The pre-resolution phase
Leverage the collection of GRN predictions G by:
(i) reducing the size of the solution search space;
(ii) integrating the G_j ∈ G while taking into account their discrepancies.
Set up the domain of each variable x_{s→t} ∈ X such that:
D_{s→t} = D_{s→t} ∩ B_{s→t}
where:
B_{s→t} = {ω_{s→t}}, if σ_{s→t} < θ_d
B_{s→t} = {ω_{s→t} − σ_{s→t}/2, ..., ω_{s→t} + σ_{s→t}/2}, if σ_{s→t} ≥ θ_d ∧ 0.1 < ω_{s→t} < 0.9
σ_{s→t} = (1 / (|G| choose 2)) Σ_{j=1}^{|G|} Σ_{i=j+1}^{|G|} |ω^j_{s→t} − ω^i_{s→t}|
θ_d ∈ [0, 1] is a "disagreement threshold".
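As a sketch (helper names are ours; method scores stay in [0, 1] and the resulting domain is mapped onto {0, ..., 100} as in the model), the pairwise disagreement σ and the reduced domain B for a single edge can be computed as:

```python
from itertools import combinations

def disagreement(scores):
    """scores: per-method confidences ω^j in [0, 1] for one edge s→t."""
    pairs = list(combinations(scores, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def reduced_domain(scores, theta_d):
    omega = sum(scores) / len(scores)        # community (Borda) score
    sigma = disagreement(scores)
    to_unit = lambda v: int(round(100 * v))  # domains are {0, ..., 100}
    if sigma < theta_d:
        return {to_unit(omega)}              # methods agree: a single value
    if 0.1 < omega < 0.9:
        lo, hi = to_unit(omega - sigma / 2), to_unit(omega + sigma / 2)
        return set(range(lo, hi + 1))        # methods disagree: an interval
    return set(range(0, 101))                # no reduction otherwise

# Example: three methods with moderate disagreement around ω = 0.5.
dom = reduced_domain([0.6, 0.7, 0.2], theta_d=0.2)
# σ ≈ 0.33, so dom is the interval {33, ..., 67}
```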
Constraints: Sparseness
Elements of a GRN are considered to be controlled by a small number of genes: GRNs are sparse.
Combining predictions in a CN does not guarantee sparseness.
Enforce a sparseness constraint by:
atleast_k_ge(k_l, X, θ_l) : |{x_i ∈ X | d(x_i) > θ_l}| ≥ k_l
and
atmost_k_ge(k_m, X, θ_m) : |{x_i ∈ X | d(x_i) > θ_m}| ≤ k_m
with k_l, k_m > 0 and 0 ≤ θ_l, θ_m ≤ 100, where d(x_i) indicates the value of an assignment for x_i.
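Checked against a complete assignment, the two constraints amount to simple counting; a sketch with illustrative names:

```python
# Sparseness constraints as checks over assigned values in {0, ..., 100}.

def atleast_k_ge(k, values, theta):
    """At least k values strictly above threshold theta."""
    return sum(1 for v in values if v > theta) >= k

def atmost_k_ge(k, values, theta):
    """At most k values strictly above threshold theta."""
    return sum(1 for v in values if v > theta) <= k

assignment = [90, 70, 66, 40, 10, 0]
ok = atleast_k_ge(2, assignment, 65) and atmost_k_ge(3, assignment, 65)
# Values above 65 are 90, 70, 66 (exactly 3), so both constraints hold.
```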
Constraints: Sparseness (example)
atleast_k_ge(10, X, 65) ∧ atmost_k_ge(25, X, 65)
Constraints: Redundant edge
Several state-of-the-art inference methods rely on techniques that cannot discriminate causality (e.g., mutual information, correlation).
Given a collection of predictions G = {G1, ..., GJ} for a GRN G = (V, E) and a non-empty set of non-causality-based methods H ⊆ G, an edge t → s is redundant if:
∀ G_i ∈ G \ H : ω^i_{s→t} > ω^i_{t→s} + β
If an edge t → s is redundant, we call the edge s → t required.
Let X_R be the set of all the required and redundant variables:
red_edge(x_{s→t}, x_{t→s}, θ_R, θ_r) : x_{s→t} > θ_R ∧ x_{t→s} < θ_r
with θ_R, θ_r ∈ N and 0 ≤ θ_r, θ_R ≤ 100.
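A sketch of the redundancy test (names ours; `causal` holds the per-edge scores of the causality-aware methods in G \ H):

```python
# Flag redundant edges from the causality-aware methods' scores.

def redundant(causal, s, t, beta):
    """True if t→s is redundant, i.e. every causality-aware method scores
    s→t above t→s by more than beta."""
    return all(m.get((s, t), 0.0) > m.get((t, s), 0.0) + beta
               for m in causal)

m1 = {("G1", "G2"): 0.8, ("G2", "G1"): 0.3}
m2 = {("G1", "G2"): 0.7, ("G2", "G1"): 0.4}
# With beta = 0.2, G2→G1 is redundant (and G1→G2 is therefore required):
flag = redundant([m1, m2], "G1", "G2", 0.2)
```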
Constraints: Redundant edge (example)
∀ x_{s→t}, x_{t→s} ∈ X_R : red_edge(x_{s→t}, x_{t→s}, 75, 50)
Constraints: Sparseness + Redundant edge [figure]
Constraints: Transcription Factor
Information about DNA-binding motifs is often available from public sources (e.g., BDB, Gene Ontology).
Existing methods often do not allow integration of such information (it is treated in post-processing).
A gene s ∈ V is a transcription factor (TF) if it regulates the production of other genes.
We express this property on the out-degree of s:
tf(s) : atleast_k_ge(k_s, X_s, θ_s)
where X_s = {x_{s→t} ∈ X | t ∈ V} and k_s is the co-expressing degree (the number of genes targeted by the TF).
Constraints: Transcription Factor (example)
atleast_k_ge(2, N_i, 85), with N_i = {x_{i→s} | (∀ G_j ∈ G) ω^j_{i→s} > 0.10}, for i = 1, 5, 9
Constraints: Co-transcription Factors
Multiple TFs may cooperate to regulate a specific gene (co-regulators).
Let s′, s″ ∈ V be two TFs which are co-regulators; over all x_{s′→t′}, x_{s″→t″} ∈ X:
coregulator(k, X, θ) : |{(s′, s″, t′) | s′ ≠ s″ ∧ t′ = t″ ∧ d(x_{s′→t′}) > θ ∧ d(x_{s″→t″}) > θ}| ≥ k
with k ∈ N and 0 < θ < 100.
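As a sketch, the co-regulator count can be checked against a complete assignment d over edge variables (names ours):

```python
# Count shared targets that both TFs regulate above theta.

def coregulator_holds(d, s1, s2, theta, k):
    """d: dict {(source, target): assigned value in {0, ..., 100}}.
    True if s1 and s2 co-regulate at least k common targets above theta."""
    targets = {t for (s, t), v in d.items() if s == s1 and v > theta}
    shared = {t for (s, t), v in d.items()
              if s == s2 and v > theta and t in targets}
    return s1 != s2 and len(shared) >= k

d = {("G1", "G2"): 80, ("G5", "G2"): 77, ("G5", "G4"): 90}
# G1 and G5 both regulate G2 above 75, so coregulator with k = 1 holds:
ok = coregulator_holds(d, "G1", "G5", 75, 1)
```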
Constraints: Co-transcription Factors (example)
coregulator(1, V, 75), with s′ = 1, s″ = 5
GRN Consensus
We implement two solution strategies: prop-labeling (DFS) and a Monte Carlo (MC) based prop-labeling tree exploration.
There is no consensus on an objective function to drive the solution search.
We propose 3 metrics to generate a GRN consensus, the Constrained Community Network (CCN). Given a set S of m solutions, the consensus value a*_k associated with the variable x_k is computed by:
Max Frequency: a*_k = argmax_{a ∈ S|x_k} freq(a, k)
Average: a*_k = (1/m) Σ_{i=1}^{m} a^i_k
Weighted Average: a*_k = (1 / Σ_{a ∈ S|x_k} freq(a, k)²) Σ_{a ∈ S|x_k} freq(a, k)² a
where S|x_k is the set of values assigned to x_k across S and freq(a, k) is the frequency of value a for x_k.
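The three metrics, sketched in Python over the values a single variable x_k takes across the sampled solutions (helper names ours):

```python
from collections import Counter

def max_frequency(values):
    """Most frequent value of x_k across the solutions."""
    return Counter(values).most_common(1)[0][0]

def average(values):
    """Plain mean of the values of x_k."""
    return sum(values) / len(values)

def weighted_average(values):
    """Mean weighted by squared frequency freq(a, k)^2."""
    freq = Counter(values)
    w = {a: freq[a] ** 2 for a in freq}
    total = sum(w.values())
    return sum(w[a] * a for a in w) / total

samples = [70, 70, 70, 40, 100]  # x_k across 5 sampled solutions
# max_frequency(samples) → 70; average(samples) → 70.0
```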
Experiments: Community Networks
The CN was built from 4 top-ranking methods of the last DREAM competitions:
1 TIGRESS (regression model)
2 GENIE3 (random forest approach)
3 Inferelator (MCZ + tlCLR + linear ODE)
4 CLR (mutual information model)
Experiments: Datasets and validation
Benchmarks: DREAM{3,4} (110 GRNs of various sizes); subnetworks from GRNs of E. coli and S. cerevisiae.
Datasets:
steady-state expressions for wild types;
steady-state expressions measured after gene knockouts;
time-series data.
Validation: AUROC score.
CCNs generated via MC search with 1,000 samplings.
Experiments: Settings
Domains setup:
θ_d = (1/|E_CN|) Σ_{(s,t,w) ∈ E_CN} σ_{s→t}
Experiments: Settings
Sparseness constraint: atleast_k_ge(k_l, X, θ_l) ∧ atmost_k_ge(k_m, X, θ_m)
Ordered E_CN:
1: g1 → g3, 0.998
2: g1 → g8, 0.981
...
n: g4 → g6, 0.856
...
n log(n): g7 → g3, 0.633
...
k_l ≤ |{x_i | x_i ∈ X ∧ max(D_{x_i}) > θ_l}|
k_m ≥ |{x_i | x_i ∈ X ∧ min(D_{x_i}) > θ_m}|
Experiments: Settings
Redundant edge constraint: ∀ G_i ∈ G \ H : ω^i_{s→t} > ω^i_{t→s} + β
β = (1/(|G| |E_RR|)) Σ_{G_i ∈ G\H} (ω^i_{s→t} − ω^i_{t→s})
red_edge(x_{s→t}, x_{t→s}, θ_R, θ_r), with:
θ_R = (1/(|G \ H| |E_REQ|)) Σ_{G_i ∈ G\H} ω^i_{s→t}
θ_r = (1/(|G \ H| |E_RED|)) Σ_{G_i ∈ G\H} ω^i_{t→s}
Results: CCN with sparsity and redundant edge constraints
[Bar plot: AUROC % improvement (0–15) for the consensus metrics b, f, a, w on DREAM3 10, DREAM4 10, DREAM3 50, DREAM3 100, DREAM4 100, with constraint sets {s,r} and {s,r,t}.]
Average AUC score improvements (in percentage) w.r.t. the CN rank.
Experiments: Integrating GRN knowledge: TFs
Transcription Factor constraint: atleast_k_ge(⌊log(n)⌋, X, θ)
Ordered E_CN:
1: g1 → g3, 0.998
2: g1 → g8, 0.981
...
n: g4 → g6, 0.856
...
Results: CCN with additional GRN knowledge integration
[Bar plot: AUROC % improvement (0–15) for the consensus metrics b, f, a, w on DREAM3 10, DREAM4 10, DREAM3 50, DREAM3 100, DREAM4 100, with constraint sets {s,r} and {s,r,t}.]
Average AUC score improvements (in percentage) w.r.t. the CN rank.
Conclusions
A CP-based approach to infer GRNs by integrating several methods in a CN.
It introduces a set of constraints able to:
1 enforce the satisfaction of GRN-specific properties;
2 take into account the community predictions' agreements and the methods' limitations.
No assumptions on the datasets nor on the type of inference methods.
Take-home message:
GRN knowledge integration offers improvements in prediction accuracy.
Constraints are a powerful tool to model and integrate GRN properties.
Thank you!