A comparative approach for gene network inference using time-series gene expression data
Guillaume Bourque* and David Sankoff
*Centre de Recherches Mathématiques, Université de Montréal
October 2003

Page 1: A comparative approach for gene network inference using time-series gene expression data

A comparative approach for gene network inference using time-series gene expression data

Guillaume Bourque* and David Sankoff

*Centre de Recherches Mathématiques, Université de Montréal

October 2003

Page 2

DNA Microarrays

http://www.sri.com/pharmdisc/cancer_biology/laderoute.html

• Experiment design

• Noise reduction

• Normalization

• …

• Data analysis

Page 3

Gene Expression Data

Page 4

Beyond Clustering…

[Figure: time series → ? → gene network (nodes x1, x2, x3, x4 joined by edges marked + or -)]

Page 5

Comparative Framework

[Figure panels: Species A, Species B, Species C]

Page 6

Harder Problem?

• This new problem seems more ambitious and harder to solve.

• BUT, we will show that, for closely related species (samples), the comparative framework can actually improve the quality of the solutions recovered.

• The repetitive nature of the data can be used to sort through some of the noise and some of the ambiguity.

Page 7

Outline

• Gene network model
• Single network inference
    – Algorithm
    – Simulations
• Multiple networks inference
    – Algorithm
    – Simulations
• Conclusions

Page 8

Gene Network Model

• We use linear differential equations to model the gene trajectories (Chen et al. ‘99, D’haeseleer et al. ‘99):

$$\frac{dx_i(t)}{dt} = a_{i,0} + a_{i,1}\,x_1(t) + a_{i,2}\,x_2(t) + \dots + a_{i,m}\,x_m(t), \qquad i = 1, \dots, m$$

• Several reasons for that choice:
    – Takes advantage of the continuous aspect of the data.
    – Allows for feed-back loops.
    – Low number of parameters implies that we are less likely to overfit the data.
    – Sufficient to model complex interactions between genes.
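As a minimal sketch of how trajectories follow from this model, the code below integrates dx/dt = a0 + A x with a fixed-step fourth-order Runge-Kutta scheme. The function name `simulate_linear_network`, the step size, and the two-gene coefficients are illustrative assumptions and do not come from the slides.

```python
import numpy as np

def simulate_linear_network(a0, A, x0, t_end, dt=0.01):
    """Integrate dx/dt = a0 + A @ x from t = 0 to t_end with classical RK4.

    a0 : (m,) constant terms
    A  : (m, m) interaction coefficients, A[i, j] multiplies x_j in dx_i/dt
    x0 : (m,) initial expression levels
    Returns (times, trajectory); trajectory has shape (steps + 1, m).
    """
    a0 = np.asarray(a0, float)
    A = np.asarray(A, float)
    x = np.asarray(x0, float)
    f = lambda v: a0 + A @ v
    steps = int(round(t_end / dt))
    traj = [x.copy()]
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(x.copy())
    return np.linspace(0.0, steps * dt, steps + 1), np.array(traj)

# Hypothetical two-gene network: gene 0 has a constant input and decays,
# gene 1 is activated by gene 0 and decays.
a0 = np.array([0.5, 0.0])
A = np.array([[-0.25, 0.0],
              [0.40, -0.30]])
times, traj = simulate_linear_network(a0, A, x0=[0.0, 0.0], t_end=100.0)
steady_state = np.linalg.solve(-A, a0)   # the point where a0 + A x = 0
```

Because both self-effects are negative in this made-up example, the trajectories settle at the steady state, which is one way to see why bounded trajectories need suitable coefficients.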

Page 9

Small Network Example

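$$
\begin{aligned}
dx_1(t)/dt &= 0.491 - 0.248\,x_1(t) \\
dx_2(t)/dt &= -0.473\,x_3(t) + 0.374\,x_4(t) \\
dx_3(t)/dt &= -0.427 + 0.376\,x_1(t) - 0.241\,x_3(t) \\
dx_4(t)/dt &= 0.435\,x_1(t) - 0.315\,x_3(t) - 0.437\,x_4(t)
\end{aligned}
$$

[Figure: network diagram over x1, x2, x3, x4 with edges marked + or -]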
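The link between the four equations on this slide and the signed diagram can be sketched as follows: every non-zero coefficient of x_j in the equation for x_i becomes an edge from x_j to x_i, marked with the sign of the coefficient. The array layout and the helper name `signed_edges` are assumptions made for illustration.

```python
import numpy as np

# Rows are the target genes x1..x4. Column 0 is the constant term,
# columns 1..4 are the coefficients of x1..x4 (values from this slide).
coeffs = np.array([
    [ 0.491, -0.248, 0.0,  0.0,    0.0  ],   # dx1/dt
    [ 0.0,    0.0,   0.0, -0.473,  0.374],   # dx2/dt
    [-0.427,  0.376, 0.0, -0.241,  0.0  ],   # dx3/dt
    [ 0.0,    0.435, 0.0, -0.315, -0.437],   # dx4/dt
])

def signed_edges(coeffs):
    """List (regulator, target, sign) for every non-zero interaction coefficient."""
    edges = []
    for target, row in enumerate(coeffs, start=1):
        for regulator, a in enumerate(row[1:], start=1):   # skip the constant
            if a != 0.0:
                sign = "+" if a > 0 else "-"
                edges.append((f"x{regulator}", f"x{target}", sign))
    return edges

edges = signed_edges(coeffs)
for regulator, target, sign in edges:
    print(f"{regulator} -> {target} ({sign})")
```

This yields three positive and five negative edges, matching the count of + and - marks in the diagram; self-effects such as x1 -> x1 appear as loops.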

Page 10

Small Network Example

[Same equations and network diagram as the previous slide, with a callout labeled "interaction coefficient"]

Page 11

Small Network Example

[Same equations and network diagram as the previous slide, with a callout labeled "constant coefficient"]

Page 12

Problem Revisited

gene   a_{i,0}   a_{i,1}   a_{i,2}   a_{i,3}   a_{i,4}
x1      .431     -.248      0         0         0
x2      0         0         0        -.473      .374
x3     -.427      .376      0        -.241      0
x4      0         .435      0        -.315     -.437

Given the time-series data, can we find the interaction coefficients?

Page 13

Linear Differential Equations

• Even under the simplest linear model, there are m(m+1) unknown parameters to estimate:
    – m(m-1) directional effects
    – m self effects
    – m constant effects
• Number of data points is mn, and we typically have n << m (few time-points).
• To avoid overfitting, extra constraints must be incorporated into the model, such as:
    – Smoothness of the equations (D’haeseleer et al. ‘99)
    – Sparseness of the network, i.e. few non-null interaction coefficients (Yeung et al. ‘02, De Hoon et al. ‘02)
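The parameter count can be checked with a short calculation. The per-gene view below is a restatement of the slide's totals, and the numbers m = 1000 and n = 20 in the closing remark are hypothetical values chosen only to show the scale of the gap.

```latex
\begin{align*}
\underbrace{m(m-1)}_{\text{directional}}
  + \underbrace{m}_{\text{self}}
  + \underbrace{m}_{\text{constant}}
  &= m^2 + m = m(m+1) \\
\text{unknowns per gene} &= \frac{m(m+1)}{m} = m + 1 \\
\text{data points per gene} &= \frac{mn}{m} = n
\end{align*}
```

Each gene's equation is therefore underdetermined whenever n < m + 1. With a hypothetical m = 1000 genes and n = 20 time-points, that is 1,001,000 unknowns against 20,000 data points, which is why constraints such as sparseness are needed.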

Page 14

Algorithm for Network Inference

• To recover the interaction coefficients, we use stepwise multiple linear regression.

• Why?
    – This procedure finds coefficients that significantly improve the fit in the regression. It limits the number of non-zero coefficients (i.e. it finds sparse networks), a feature we were seeking.
    – It is highly flexible and provides p-value scores which can be interpreted easily.
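To make the regression concrete, here is a minimal sketch of how one gene's equation could be turned into a least-squares problem. The slides do not say how the derivative is estimated; the forward difference used here is an assumption, and the names `regression_problem` and `fit_subset` are hypothetical.

```python
import numpy as np

def regression_problem(X, times, gene):
    """Build (design, response) for one gene from an (n, m) expression matrix X.

    Assumption: dx_i/dt at time t_k is approximated by a forward difference.
    The design matrix has a column of ones (constant term) followed by x_1..x_m.
    """
    X = np.asarray(X, float)
    times = np.asarray(times, float)
    dt = np.diff(times)                                    # shape (n-1,)
    response = (X[1:, gene] - X[:-1, gene]) / dt           # estimated derivatives
    design = np.column_stack([np.ones(len(dt)), X[:-1]])   # shape (n-1, m+1)
    return design, response

def fit_subset(design, response, columns):
    """Least-squares fit restricted to the given columns; returns (coef, rss)."""
    sub = design[:, columns]
    coef, *_ = np.linalg.lstsq(sub, response, rcond=None)
    resid = response - sub @ coef
    return coef, float(resid @ resid)

# Hypothetical noise-free data generated by an Euler recursion, so that the
# forward differences agree with the linear model exactly.
a0_true = np.array([0.5, 0.0])
A_true = np.array([[-0.25, 0.0],
                   [0.40, -0.30]])
times = np.arange(30) * 0.5
X = np.zeros((30, 2))
for k in range(29):
    X[k + 1] = X[k] + 0.5 * (a0_true + A_true @ X[k])

design, response = regression_problem(X, times, gene=1)
coef, rss = fit_subset(design, response, [0, 1, 2])
```

In this toy setting `coef` comes out close to (0, 0.40, -0.30), the true constant and interaction coefficients for the second gene. Stepwise regression then amounts to choosing which columns to pass to `fit_subset`.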

Page 15

Partial F Test

• The procedure finds the interaction coefficients iteratively for each gene xi.

• A partial F test compares the total squared error of the predicted gene trajectory with and without a specific subset of coefficients.

• If the p-value obtained from the test falls below a chosen cutoff, the change in fit is significant: a candidate subset is added, whereas a subset already in the model is removed once it no longer passes the test.

• The procedure iterates until no more subsets of coefficients are either added or removed.
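The loop above can be sketched as a forward-only selection driven by the partial F statistic. This is a simplification under stated assumptions: it thresholds the F statistic directly (an "F-to-enter" cutoff) instead of converting it to a p-value, it adds one coefficient at a time, it always keeps the constant term, and it omits the removal step. All function names and the cutoff value are hypothetical.

```python
import numpy as np

def rss(design, response, columns):
    """Residual sum of squares of the least-squares fit on the chosen columns."""
    sub = design[:, columns]
    coef, *_ = np.linalg.lstsq(sub, response, rcond=None)
    resid = response - sub @ coef
    return float(resid @ resid)

def partial_f(design, response, current, candidate):
    """Partial F statistic for adding the `candidate` columns to `current`."""
    full = list(current) + list(candidate)
    rss_reduced = rss(design, response, list(current))
    rss_full = max(rss(design, response, full), np.finfo(float).tiny)
    df_resid = len(response) - len(full)
    return ((rss_reduced - rss_full) / len(candidate)) / (rss_full / df_resid)

def forward_stepwise(design, response, f_to_enter=4.0):
    """Greedy forward selection: add the best column while its F exceeds the cutoff."""
    selected = [0]                                  # column 0 is the constant term
    remaining = list(range(1, design.shape[1]))
    # Stop before the model would leave no residual degrees of freedom.
    while remaining and len(selected) + 1 < len(response):
        scores = [partial_f(design, response, selected, [c]) for c in remaining]
        best = int(np.argmax(scores))
        if scores[best] <= f_to_enter:
            break
        selected.append(remaining.pop(best))
    return sorted(selected)

# Hypothetical data: 8 candidate regulators, only columns 3 and 5 matter.
rng = np.random.default_rng(0)
n = 40
design = np.column_stack([np.ones(n), rng.normal(size=(n, 8))])
response = 1.0 + 2.0 * design[:, 3] - 1.5 * design[:, 5] + rng.normal(0.0, 0.1, n)
selected = forward_stepwise(design, response)
```

With a signal this strong the two true regulators are picked up first. A fuller version would also test each selected column for removal after every addition, as the slide describes.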

Page 16

Simulations

• Difficult to find coefficients that will produce realistic gene trajectories.

• We select coefficients such that the resulting trajectories satisfy 3 conditions:
    – They are bounded
    – The correlation of any pair is not too high
    – They are not too stable

• We added Gaussian noise to model errors.
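One possible reading of the three conditions is sketched below. The slides give no thresholds, so `bound`, `max_corr`, and `min_std` are invented values, and interpreting "not too stable" as "every gene varies by at least some minimum standard deviation" is an assumption. The check expects at least two genes.

```python
import numpy as np

def acceptable(traj, bound=10.0, max_corr=0.95, min_std=0.05):
    """Check the three conditions on an (n, m) array of simulated trajectories."""
    traj = np.asarray(traj, float)
    bounded = np.all(np.abs(traj) <= bound)
    not_too_stable = np.all(traj.std(axis=0) >= min_std)   # every gene must vary
    if not (bounded and not_too_stable):
        return False
    corr = np.corrcoef(traj, rowvar=False)                 # (m, m) correlations
    off_diag = np.abs(corr[~np.eye(traj.shape[1], dtype=bool)])
    return bool(np.all(off_diag <= max_corr))

def add_gaussian_noise(traj, sigma, rng):
    """Return a noisy copy with independent N(0, sigma^2) error on each value."""
    return traj + rng.normal(0.0, sigma, size=np.shape(traj))

# Hypothetical trajectories used only to exercise the checks.
t = np.linspace(0.0, 2.0 * np.pi, 50)
oscillating = np.column_stack([np.sin(t), np.cos(t)])      # passes all three
flat = np.column_stack([np.sin(t), np.ones_like(t)])       # second gene too stable
redundant = np.column_stack([np.sin(t), 2.0 * np.sin(t)])  # perfectly correlated
noisy = add_gaussian_noise(oscillating, sigma=0.05, rng=np.random.default_rng(1))
```

In a simulation loop, coefficient sets would be drawn at random, integrated, and kept only when `acceptable` returns True, with noise added afterwards.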

Page 17

Gaussian Noise

Page 18

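Network Inference

[Figure: time-series data → regression procedure → coefficient table and network diagram]

gene   a_{i,0}   a_{i,1}   a_{i,2}   a_{i,3}   a_{i,4}
x1      .431     -.248      0         0         0
x2      0         0         0        -.473      .374
x3     -.427      .376      0        -.241      0
x4      0         .435      0        -.315     -.437

The procedure perfectly recovers this network with 4 genes and 10 interaction coefficients.

[Figure: recovered network diagram over x1, x2, x3, x4 with edges marked + or -]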

Page 19

10 Genes

The procedure also perfectly recovers this network with 10 genes and 22 interaction coefficients.

Page 20

Multiple Networks

[Figure panels: Species A, Species B, Species C]

Page 21

Types of Problems

• Multiple networks related by a graph or a tree can arise from various situations:
    – Different species
    – Different developmental stages
    – Different tissues

• The goal is now not only to maximize the fit (with as few interactions as possible) but also to minimize an evolutionary cost on the graph of the networks.

Page 22

Evolutionary Cost

[Figure: tree of networks whose vertices carry the sets {1, 2}, {1, 2, 3}, {1}, {1, 3} and {1, 2, 3}; labels: "sets of predicted regulators", "evolutionary event"]

Evolutionary cost = 3
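A minimal sketch of one way to compute such a cost: count, on every edge of the tree, the regulators gained or lost between the two endpoints. Both the cost definition (size of the symmetric difference) and the tree topology below are assumptions; the slide lists the five sets but its drawing is not recoverable here. Under these assumptions the result agrees with the stated cost of 3.

```python
def evolutionary_cost(tree_edges, regulators):
    """Sum over tree edges of the number of regulators gained or lost."""
    return sum(len(regulators[parent] ^ regulators[child])
               for parent, child in tree_edges)

# Hypothetical topology: the vertex names and edges are assumed, the sets
# are the ones shown on the slide.
regulators = {
    "root":     {1, 2},
    "internal": {1, 2, 3},
    "leaf_a":   {1},
    "leaf_b":   {1, 3},
    "leaf_c":   {1, 2, 3},
}
tree_edges = [
    ("root", "leaf_a"),        # regulator 2 lost
    ("root", "internal"),      # regulator 3 gained
    ("internal", "leaf_b"),    # regulator 2 lost
    ("internal", "leaf_c"),    # no change
]
cost = evolutionary_cost(tree_edges, regulators)
```

Other event models, for example charging one unit per changed edge regardless of how many regulators differ, would plug into the same function by replacing the `len(...)` term.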

Page 23

Multiple Network Inference

• The stepwise regression algorithm is modified to add/remove subsets of regulators directly on the edges of the graph.

• Partial F tests are computed on the vertices affected by this change to evaluate the change in fit.

• The p-values obtained are then modified based on the change in evolutionary cost.

• The p-values are finally combined into a scoring function using a Kolmogorov-Smirnov Test.

• The algorithm iteratively adds/removes the best scoring move when above/below a certain threshold.
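The scoring step can be sketched under heavy assumptions, since the slides give neither the adjustment formula nor the exact form of the Kolmogorov-Smirnov test. Here the p-values from the affected vertices are multiplied by a hypothetical penalty factor per unit of added evolutionary cost, and the score is the one-sided KS statistic of the adjusted p-values against a Uniform(0, 1) distribution, which is large when many p-values are small.

```python
import numpy as np

def adjust_for_cost(p_values, delta_cost, penalty=2.0):
    """Hypothetical adjustment: inflate p-values when a move raises the cost."""
    p = np.asarray(p_values, float) * penalty ** delta_cost
    return np.clip(p, 0.0, 1.0)

def ks_score(p_values):
    """One-sided KS statistic D+ of the p-values against Uniform(0, 1).

    D+ is large when the p-values pile up near zero, i.e. when the move
    improves the fit at many of the affected vertices.
    """
    p = np.sort(np.asarray(p_values, float))
    k = len(p)
    ecdf = np.arange(1, k + 1) / k        # empirical CDF just after each point
    return float(np.max(ecdf - p))

# Hypothetical p-values from three affected vertices.
strong = [0.001, 0.004, 0.010]
weak = [0.30, 0.60, 0.90]
score_strong = ks_score(strong)
score_weak = ks_score(weak)
score_penalized = ks_score(adjust_for_cost(strong, delta_cost=1))
```

A move supported at every affected vertex scores near 1, a move with unremarkable p-values scores near 0, and the same move scores lower once it is charged for an extra evolutionary event.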

Page 24

Simulation Example

Page 25

Simulation Example

Page 26

Simulation Results

Page 27

Conclusions

• The comparative framework actually simplifies the inference process, especially for instances of the problem with more genes, more noise, or fewer time-points.

• The procedure could also be used for the revision of gene networks.

• Possibility of exploring different evolutionary models.

• We need to try the procedure on real data.