Inferring gene regulatory networks with non-stationary dynamic Bayesian networks
Dirk Husmeier, Frank Dondelinger, Sophie Lèbre
Biomathematics & Statistics Scotland
Overview
• Introduction
• Non-homogeneous dynamic Bayesian network for non-stationary processes
• Flexible network structure
• Open problems
Can we learn signalling pathways from postgenomic data?
From Sachs et al., Science 2005
Network reconstruction from postgenomic data
Friedman et al. (2000), J. Comp. Biol. 7, 601-620
Marriage between graph theory and probability theory
Bayes net
ODE model
[Figure: example graph with nodes A to F connected by directed edges]
Graph theory
• Directed acyclic graph (DAG) representing conditional independence relations.
Probability theory
• It is possible to score a network in light of the data: P(D|M), D: data, M: network structure.
• We can infer how well a particular network explains the observed data.
$$P(A,B,C,D,E,F) = P(A)\,P(B \mid A)\,P(C \mid A)\,P(D \mid B,C)\,P(E \mid D)\,P(F \mid C,D)$$
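The factorization of the example graph can be checked numerically. The following is a minimal sketch, assuming binary variables and randomly drawn conditional probability tables; the `parents` dictionary, the `cpt` tables and the function name `joint` are illustrative and not taken from the talk.

```python
import itertools
import random

# Parent sets of the example DAG: P(A)P(B|A)P(C|A)P(D|B,C)P(E|D)P(F|C,D)
parents = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
    "E": ["D"],
    "F": ["C", "D"],
}

random.seed(0)
# Hypothetical CPTs: cpt[node][parent_values] = P(node = 1 | parents)
cpt = {
    node: {
        pa_vals: random.random()
        for pa_vals in itertools.product([0, 1], repeat=len(pa))
    }
    for node, pa in parents.items()
}

def joint(assignment):
    """Joint probability of a full assignment via the DAG factorization."""
    p = 1.0
    for node, pa in parents.items():
        p1 = cpt[node][tuple(assignment[q] for q in pa)]
        p *= p1 if assignment[node] == 1 else 1.0 - p1
    return p

# Summing the factorized joint over all 2**6 assignments must give 1.
nodes = list(parents)
total = sum(
    joint(dict(zip(nodes, values)))
    for values in itertools.product([0, 1], repeat=len(nodes))
)
```

Because every factor is a proper conditional distribution, `total` equals 1 up to rounding error, whatever values the tables hold.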
BGe (Linear model)
[A] = w1[P1] + w2[P2] + w3[P3] + w4[P4] + noise
[Figure: node A with parents P1 to P4 and edge weights w1 to w4]
BDe (Nonlinear discretized model)
[Figure: activator P1 and repressor P2 acting on a target; activation and inhibition]
Allow for noise: probabilities
Conditional multinomial distribution
Model Parameters q
Integral analytically tractable!
BDe: UAI 1994
BGe: UAI 1995
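For the multinomial model, "analytically tractable" means that the score P(D|M), the likelihood integrated over the model parameters, has a closed form built from Gamma functions. The sketch below computes the per-node term of that score under the uniform BDeu prior. BDeu is an assumption made here for concreteness (the slides only name BDe), and the function name, argument layout and the equivalent sample size `ess` are illustrative.

```python
from collections import Counter
from math import lgamma

def bdeu_local_score(child, parent_cols, r_child, r_parents, ess=1.0):
    """Log marginal likelihood of one node given its parents (BDeu variant).

    child       : list of child states in 0..r_child-1
    parent_cols : list of lists, one per parent, aligned with `child`
    r_parents   : number of states of each parent
    ess         : equivalent sample size of the Dirichlet prior (assumed)
    """
    q = 1
    for r in r_parents:
        q *= r                      # number of parent configurations
    a_j = ess / q                   # prior count per parent configuration
    a_jk = ess / (q * r_child)      # prior count per (configuration, state)

    n_jk = Counter()
    n_j = Counter()
    for i, k in enumerate(child):
        j = tuple(col[i] for col in parent_cols)
        n_jk[(j, k)] += 1
        n_j[j] += 1

    # Unobserved configurations contribute zero, so only observed ones are summed.
    score = 0.0
    for j, n in n_j.items():
        score += lgamma(a_j) - lgamma(a_j + n)
    for (j, k), n in n_jk.items():
        score += lgamma(a_jk + n) - lgamma(a_jk)
    return score

# Toy data: the child copies its parent exactly.
x = [0, 1] * 20
y = list(x)
with_parent = bdeu_local_score(y, [x], 2, [2])
without_parent = bdeu_local_score(y, [], 2, [])
```

On this toy data the structure with the edge scores higher than the empty structure, which is the behaviour a structure score should show when the dependence is strong.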
Dynamic Bayesian network
Example: 2 genes → 16 different network structures
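The count of 16 follows from treating each of the 2 × 2 possible edges from time t to time t+1 (self-loops included) as present or absent. A small sketch, with the hypothetical helper `dbn_structures`, enumerates them and shows how quickly the count grows with the number of genes:

```python
import itertools

def dbn_structures(n_genes):
    """Yield every DBN structure as a tuple of (parent, child) edges.

    Edges point from time t to time t+1, so self-loops are allowed and
    no acyclicity check is needed: 2 ** (n_genes ** 2) structures in total.
    """
    candidate_edges = list(itertools.product(range(n_genes), repeat=2))
    for mask in itertools.product([False, True], repeat=len(candidate_edges)):
        yield tuple(e for e, keep in zip(candidate_edges, mask) if keep)

# Closed-form count for 1 to 5 genes, and explicit enumeration for 2 genes.
counts = {n: 2 ** (n * n) for n in range(1, 6)}
n_two_genes = sum(1 for _ in dbn_structures(2))
```

Explicit enumeration is only feasible for very small networks; with 5 genes there are already more than 33 million structures.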
Best network: maximum score
Identify the best network structure
Ideal scenario: Large data sets, low noise
Uncertainty about the best network structure
Limited number of experimental replications, high noise
Sample of high-scoring networks
Feature extraction, e.g. marginal posterior probabilities of the edges
High-confidence edge
High-confidence non-edge
Uncertainty about edges
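Feature extraction from a sample of networks amounts to averaging: the marginal posterior probability of an edge is estimated by the fraction of sampled networks that contain it. A minimal sketch, using a made-up sample of four two-gene networks and the illustrative function name `edge_posteriors`:

```python
def edge_posteriors(sampled_networks, n_genes):
    """Fraction of sampled networks containing each edge (parent, child)."""
    counts = [[0] * n_genes for _ in range(n_genes)]
    for edges in sampled_networks:
        for parent, child in edges:
            counts[parent][child] += 1
    n = len(sampled_networks)
    return [[c / n for c in row] for row in counts]

# Hypothetical sample of four networks over two genes
sample = [
    {(0, 1)},
    {(0, 1), (1, 1)},
    {(0, 1)},
    {(0, 1), (1, 1)},
]
post = edge_posteriors(sample, 2)
```

Here the edge 0 → 1 appears in every sampled network (high confidence), the edge 1 → 0 in none (high-confidence non-edge), and the self-loop on gene 1 in half of them (uncertain).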
Can we generalize this scheme to more than 2 genes?
In principle yes.
However …
[Figure: number of structures versus number of nodes]
Configuration space of network structures
Find the high-scoring structures
Sampling from the posterior distribution
Taken from the MSc thesis by Ben Calderhead
Madigan & York (1995), Giudici & Castelo (2003)
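MCMC: local change
If the new structure has a higher posterior probability: accept
If it has a lower posterior probability: accept with probability equal to the ratio of the two posterior probabilities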
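The local-change scheme can be sketched as a Metropolis sampler over DBN structures. This is an illustration under stated assumptions, not the sampler used in the talk: the proposal flips one uniformly chosen edge (a symmetric move, so no Hastings correction is needed), and `log_score` is a user-supplied, hypothetical function returning an unnormalised log posterior.

```python
import math
import random

def structure_mcmc(log_score, n_genes, n_steps, seed=0):
    """Metropolis sampler over DBN structures with single-edge flips.

    log_score : function mapping a frozenset of (parent, child) edges to
                an unnormalised log posterior (user supplied, hypothetical)
    """
    rng = random.Random(seed)
    current = frozenset()
    current_score = log_score(current)
    samples = []
    for _ in range(n_steps):
        # Local change: add or delete one randomly chosen edge.
        edge = (rng.randrange(n_genes), rng.randrange(n_genes))
        proposal = current ^ {edge}
        proposal_score = log_score(proposal)
        # Symmetric proposal: accept with probability min(1, ratio).
        log_ratio = proposal_score - current_score
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_score = proposal, proposal_score
        samples.append(current)
    return samples

def toy_log_score(graph):
    """Toy target that favours edge 0 -> 1 and penalises edge 1 -> 0."""
    return 3.0 * ((0, 1) in graph) - 3.0 * ((1, 0) in graph)

samples = structure_mcmc(toy_log_score, n_genes=2, n_steps=5000)
freq_01 = sum((0, 1) in g for g in samples) / len(samples)
freq_10 = sum((1, 0) in g for g in samples) / len(samples)
```

With the toy target, the favoured edge is present in most samples and the penalised edge in few; in a real application `log_score` would be a BGe or BDe score plus a log prior over structures.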
Non-homogeneous dynamic Bayesian networks for non-stationary processes
Dynamic Bayesian network
Example: 4 genes, 10 time points
t1 t2 t3 t4 t5 t6 t7 t8 t9 t10
X(1) X1,1 X1,2 X1,3 X1,4 X1,5 X1,6 X1,7 X1,8 X1,9 X1,10
X(2) X2,1 X2,2 X2,3 X2,4 X2,5 X2,6 X2,7 X2,8 X2,9 X2,10
X(3) X3,1 X3,2 X3,3 X3,4 X3,5 X3,6 X3,7 X3,8 X3,9 X3,10
X(4) X4,1 X4,2 X4,3 X4,4 X4,5 X4,6 X4,7 X4,8 X4,9 X4,10
Standard dynamic Bayesian network: homogeneous model
Limitations of the homogeneity assumption
Our new model: heterogeneous dynamic Bayesian network. Here: 2 components
Our new model: heterogeneous dynamic Bayesian network. Here: 3 components
Learning with MCMC
[Figure: model with parameters q, number of components k (here: 3) and allocation vector h]
Non-homogeneous model
Non-linear model
BGe: Linear model
[A] = w1[P1] + w2[P2] + w3[P3] + w4[P4] + noise
[Figure: node A with parents P1 to P4 and edge weights w1 to w4]
BDe: Nonlinear discretized model
[Figure: activator P1 and repressor P2 acting on a target; activation and inhibition]
Allow for noise: probabilities
Conditional multinomial distribution
Pros and cons of the two models
Linear Gaussian model
• Restriction to linear processes
• Uses the original data: no information loss
Multinomial model
• Nonlinear model
• Requires discretization: information loss
Can we get an approximate nonlinear model without data discretization?
Idea: piecewise linear model
[Figure: nonlinear relationship between x and y, approximated by linear pieces]
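The piecewise linear idea can be illustrated with a one-breakpoint least-squares fit. This is only a sketch: the breakpoint is placed along x for simplicity, the search is exhaustive rather than Bayesian, and the function names `fit_line` and `best_two_segment_fit` are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, rss)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss

def best_two_segment_fit(xs, ys, min_points=3):
    """Try every changepoint and keep the split with the lowest total RSS."""
    best = None
    for cut in range(min_points, len(xs) - min_points + 1):
        left = fit_line(xs[:cut], ys[:cut])
        right = fit_line(xs[cut:], ys[cut:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, cut, left, right)
    return best

# Nonlinear toy relationship: y = |x - 5|
xs = [float(i) for i in range(11)]
ys = [abs(x - 5.0) for x in xs]
_, _, rss_single = fit_line(xs, ys)
rss_two, cut, left, right = best_two_segment_fit(xs, ys)
```

A single line fits the V-shaped data poorly, while two linear pieces reproduce it exactly, without discretizing the data.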
Inhomogeneous dynamic Bayesian network with common changepoints
Inhomogeneous dynamic Bayesian network with node-specific changepoints
NIPS 2009
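The difference between common and node-specific changepoints can be made concrete by turning changepoints into a segment label for each of the 10 time points. The sketch below assumes that a changepoint at t starts a new segment at time point t; the changepoint values and the function name `allocation_from_changepoints` are hypothetical.

```python
def allocation_from_changepoints(changepoints, n_timepoints):
    """Map each time point 1..n to a segment index, given the changepoints.

    A changepoint at t means that a new segment starts at time point t.
    """
    alloc = []
    segment = 0
    cps = sorted(changepoints)
    for t in range(1, n_timepoints + 1):
        while segment < len(cps) and t >= cps[segment]:
            segment += 1
        alloc.append(segment)
    return alloc

# Common changepoints: one segmentation shared by all genes
common = allocation_from_changepoints([4, 8], 10)

# Node-specific changepoints (hypothetical values, one list per gene)
node_specific = {
    gene: allocation_from_changepoints(cps, 10)
    for gene, cps in {1: [4, 8], 2: [6], 3: [], 4: [3, 5, 9]}.items()
}
```

With common changepoints every gene shares the segmentation in `common`; with node-specific changepoints each gene has its own, so a gene without changepoints (gene 3 here) falls back to the homogeneous model.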
Flexible network structure
Non-stationarity in the regulatory process
Non-stationarity in the network structure
ICML 2010
Flexible network structure with regularization
Morphogenesis in Drosophila melanogaster
• Gene expression measurements over 66 time steps of 4028 genes (Arbeitman et al., Science, 2002).
• Selection of 11 genes involved in muscle development.
Zhao et al. (2006), Bioinformatics 22
Transition probabilities: flexible structure with regularization
Morphogenetic transitions: embryo → larva, larva → pupa, pupa → adult
Comparison with Ahmed & Xing
[Figure panels: Dondelinger, Lèbre & Husmeier; Ahmed & Xing]
Collaboration with Frank Dondelinger and Sophie Lèbre
NIPS 2010
Method based on homogeneous DBNs
Method based on differential equations
Sample of high-scoring networks
Feature extraction, e.g. marginal posterior probabilities of the edges
Open problems
Exponential versus binomial prior distribution
Exploration of various information sharing options
How to deal with static data?
Change-point process
Free allocation
Allocation sampler versus change-point process
Allocation sampler
• More flexibility, unrestricted mixture model
• Not restricted to time series
• Higher computational costs
Change-point process
• Incorporates plausible prior knowledge for time series
• Reduced complexity
• Less universal, not applicable to static data
Acknowledgements
Marco Grzegorczyk, University of Dortmund, Germany
Frank Dondelinger, Biomathematics & Statistics Scotland, United Kingdom
Sophie Lèbre, Université de Strasbourg, France
Further details for discussion during question time
Details on exponential prior
Hierarchical Bayesian model
MCMC scheme (for symmetric proposal distributions)
Details on other priors
where
Partition function
Ignoring the fan-in restriction:
Number of genes
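The formulas on these slides did not survive extraction. As a hedged reconstruction of the kind of expression involved, and not the slides' own notation, assume an exponential prior that penalises the Hamming distance |M_h - M_{h-1}| between the structures of adjacent segments, with hyperparameter β and N genes. Without a fan-in restriction, each of the N² possible edges contributes independently to the partition function, which gives a closed form:

```latex
% Assumed form of the exponential prior (Hamming distance between adjacent structures)
P(M_h \mid M_{h-1}, \beta) = \frac{\exp\bigl(-\beta\, |M_h - M_{h-1}|\bigr)}{Z(\beta)}

% Partition function, ignoring the fan-in restriction; N is the number of genes
Z(\beta) = \sum_{M_h} \exp\bigl(-\beta\, |M_h - M_{h-1}|\bigr)
         = \prod_{i=1}^{N}\prod_{j=1}^{N} \bigl(1 + e^{-\beta}\bigr)
         = \bigl(1 + e^{-\beta}\bigr)^{N^2}
```

The product form holds because each potential edge either agrees with the previous segment (factor 1) or differs from it (factor e^{-β}), independently of all other edges.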
Simulation study
• We randomly generated 10 networks with 10 nodes each.
• Number of regulators for each node drawn from a Poisson distribution with mean = 3.
• 5 time series segments.
• Network changes: number of changes drawn from a Poisson distribution.
• For each segment: time series of length 50 generated from a linear regression model, interaction parameters drawn from N(0,1), iid Gaussian noise from N(0,1).
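The data-generating procedure listed above can be sketched as follows for a single network. Several details are assumptions, because the slides do not state them: the mean of the Poisson distribution for the number of changes (set to 1 here), the way a change is applied (one edge is flipped), the initial state, and the absence of any stabilisation of the linear dynamics, so simulated values may grow large. All function names are illustrative.

```python
import math
import random

rng = random.Random(1)

def poisson(mean):
    """Draw from a Poisson distribution (Knuth's multiplication method)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def random_network(n_nodes, mean_regulators=3):
    """Regulator set and N(0,1) interaction weight for every node."""
    net = {}
    for child in range(n_nodes):
        k = min(poisson(mean_regulators), n_nodes)
        regs = rng.sample(range(n_nodes), k)
        net[child] = {r: rng.gauss(0, 1) for r in regs}
    return net

def perturb(net, n_nodes, mean_changes=1):
    """Copy a network and flip a Poisson number of edges (mean is assumed)."""
    new = {child: dict(regs) for child, regs in net.items()}
    for _ in range(poisson(mean_changes)):
        parent, child = rng.randrange(n_nodes), rng.randrange(n_nodes)
        if parent in new[child]:
            del new[child][parent]
        else:
            new[child][parent] = rng.gauss(0, 1)
    return new

def simulate_segment(net, n_nodes, length, start):
    """Linear regression dynamics: x[t+1] = W x[t] + N(0,1) noise."""
    series, x = [], start
    for _ in range(length):
        x = [
            sum(w * x[r] for r, w in net[child].items()) + rng.gauss(0, 1)
            for child in range(n_nodes)
        ]
        series.append(x)
    return series

n_nodes, n_segments, seg_len = 10, 5, 50
net = random_network(n_nodes)
networks, data = [], []
state = [rng.gauss(0, 1) for _ in range(n_nodes)]
for _ in range(n_segments):
    networks.append(net)
    segment = simulate_segment(net, n_nodes, seg_len, state)
    data.extend(segment)
    state = segment[-1]
    net = perturb(net, n_nodes)
```

Repeating the whole procedure 10 times with different seeds would give the 10 networks of the study; `networks` holds the true structure of each segment for comparison with the inferred one.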
Synthetic simulation study
No information sharing between adjacent segments
Information sharing between adjacent segments
Frank Dondelinger, Sophie Lèbre, Dirk Husmeier: ICML 2010