45
Modelling gene expression dynamics with Gaussian processes Regulatory Genomics and Epigenomics March 10 th 2016 Magnus Rattray Faculty of Life Sciences University of Manchester

Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Modelling gene expression dynamicswith Gaussian processes

Regulatory Genomics and EpigenomicsMarch 10th 2016

Magnus RattrayFaculty of Life Sciences

University of Manchester

Page 2: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Talk Outline

Introduction to Gaussian process regression

Example 1. Hierarchical models: batches and clusters

Example 2. Branching models: perturbations and bifurcations

Example 3. Differential equations: Pol-II to mRNA dynamics

Page 3: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Gaussian processes: flexible non-parametric models

Probability distributions over functions

f (t) ∼ GP(mean(t), cov(t, t ′)

)Covariance function k = cov(t, t ′) defines typical properties,

I Static . . . Dynamic

I Smooth . . . Rough

I Stationary. . . non-Stationary

I Periodic. . . Chaotic

The covariance function has parameters tuning these properties

Bayesian Machine Learning perspective: Rasmussen & Williams“Gaussian Processes for Machine Learning” (MIT Press, 2006)

Page 4: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Gaussian processes

Samples from a 25-dimensional multivariate Gaussian distribution:

n

fn

5 10 15 20 25

−1

−0.5

0.5

1

1.5

2

n

m

5 10 15 20 25

5

10

15

20

25

0.2

0.4

0.6

0.8

[f1, f2, . . . , f25] ∼ N (0,C )

Learning and Inference in Computational Systems Biology, MIT Press

Page 5: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Gaussian processes

Take dimension →∞

−3 −2 −1 1 2 3

−2.5

−2

−1.5

−1

−0.5

0.5

1

1.5

2

2.5

−3 −2 −1 1 2 3

−3

−2

−1

1

2

3

f ∼ GP(0, k) k(t, t ′)

= exp

(−(t − t ′)2

l2

)

Learning and Inference in Computational Systems Biology, MIT Press

Page 6: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Gaussian processes for inference: Bayesian Regression

2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0

2

1

0

1

2

Figures from James Hensman

Page 7: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Regression example

2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0

2

1

0

1

2

Figures from James Hensman

Page 8: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Regression example

2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0

2

1

0

1

2

Figures from James Hensman

Page 9: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Regression example

2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0

2

1

0

1

2

Figures from James Hensman

Page 10: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Ex1. Hierarchical models: batches and clusters

0 5 10 15 2021012

0 5 10 15 20 0 5 10 15 20 0 5 10 15 20

0 5 10 15 2021012

0 5 10 15 20 0 5 10 15 20 0 5 10 15 20

Data from Kalinka et al. ”Gene expression divergence recapitulatesthe developmental hourglass model” Nature 2010

Joint work with James Hensman and Neil Lawrence

Page 11: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Usual processing options for time course batches

Lumped

0 5 10 15 20

2

1

0

1

2

Averaged

0 5 10 15 20

2

1

0

1

2

Page 12: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Hierarchical Gaussian process

gene:

g(t) ∼ GP(0, kg (t, t ′)

)replicate:

fi (t) ∼ GP(g(t), kf (t, t ′)

)

Page 13: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Hierarchical Gaussian process

3

2

1

0

1

2

J. Hensman, N.D. Lawrence, M.Rattray ”Hierarchical Bayesianmodelling of gene expression time series across irregularly sampledreplicates and clusters” BMC Bioinformatics 2013

Page 14: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Hierarchical Gaussian process for clustering

An extended hierarchy

h(t) ∼ GP(

0, kh(t, t ′))

cluster

gi (t) ∼ GP(h(t), kg (t, t ′)

)gene

fir (t) ∼ GP(gi (t), kf (t, t ′)

)replicate

Page 15: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Hierarchical Gaussian process for clustering

2

1

0

1

2

CG11786

2

1

0

1

2

CG12723

2

1

0

1

2

CG13196

2

1

0

1

2

CG13627

2

1

0

1

2

CG4386

2

1

0

1

2

Osi15

0 5 10 15 202

1

0

1

2

0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20

Time (hours)

Norm

alis

ed

gen

e e

xp

ress

ion

Page 16: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Hierarchical Gaussian process for clustering

Modifying an existing algorithm to include this model of replicateand cluster structure leads to more meaningful clustering

MF BP CC L N. clust.

agglomerative HGP 0.46 0.16 0.50 7360.8 50agglomerative GP 0.39 0.13 0.36 6203.7 128Mclust (concat.) 0.39 0.07 0.25 1324.0 26

Mclust (averaged) 0.40 0.08 0.24 -736.2 20

Variational Bayes algorithm is more efficient, allowing Bayesianclustering of >10K profiles with a Dirichlet Process prior

J. Hensman, M.Rattray, N.D. Lawrence “Fast non-parametricclustering of time-series data” IEEE TPAMI 2015

Page 17: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Clustering with a periodic covariance

Page 18: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Ex2. Branching models: perturbations and bifurcations

0 5 10 15Time (hrs)

−4

−3

−2

−1

0

1

2

3

4

5

Sim

ulated gene expression

Simulated control data

Simulated perturbed data

Joint work with Jing Yang, Chris Penfold and Murray Grant

Page 19: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Samples from a branching model

Sample 1

Page 20: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Samples from a branching model

Sample 2

Page 21: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Samples from a branching model

Sample 3

Page 22: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Samples from a branching model

Sample 4

Page 23: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Samples from a branching model

Sample 5

Page 24: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Joint distribution to two functions crossing at tp

f ∼ GP(0,K ) , g ∼ GP(0,K ) , g(tp) = f (tp)

Σ =

(Kff Kfg

Kgf Kgg

)=

(K (T,T)

K(T,tp)K(tp ,T)k(tp ,tp)

K(T,tp)K(tp ,T)k(tp ,tp)

K (T,T)

)(1)

Page 25: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Joint distribution of two datasets diverging at tp

y c(tn) ∼ N (f (tn), σ2)

yp(tn) ∼ N (f (tn), σ2) for tn ≤ tp

yp(tn) ∼ N (g(tn), σ2) for tn > tp

Page 26: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Posterior probability of the perturbation time tp

p(tp|y c(T), yp(T)) ' p(y c(T), yp(T)|tp)∑t=tmaxt=tmin

p(y c (T), yp(T)|t)

0 5 10 15Time (hrs)

−4

−3

−2

−1

0

1

2

3

4

5

Sim

ulated gene expression

Simulated control data

Simulated perturbed data

Page 27: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Comparison to DE thresholding approach

Alternative: use first point where some DE threshold is passedWe tested several DE metrics in the nsgp package

0 5 10 15 20

0

5

10

15

20

Esti

mate

d p

ert

urb

ati

on

tim

es

DEtime

Mean

Median

MAP

0 5 10 15 20

nsgp

EMLL0.5

EMLL1

Testing perturbation times

Page 28: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Investigating a plant’s response to bacterial challenge

Infection with virulent Pseudomonas syringage pv. tomato DC3000vs. disarmed strain DC3000hrpA

0 10 17

Times (hrs)

8

9

10

11

12

13

Gen

e e

xp

ressio

n

CATMA1c72124

Condition 1

Condition 2

0 10 17

Times (hrs)

CATMA1a09335

Condition 1

Condition 2

Yang et al. “Inferring the perturbation time from biological timecourse data” Bioinformatics (accepted) preprint: arxiv 1602.01743

Page 29: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Current work: modelling branching in single-cell data

12

48

1632

64start

5

10

15

20

25

30

35

40

1

2

3

4

5

6

7

with A. Boukouvalas, M. Zweissele, J. Hensman and N. Lawrence

Page 30: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Ex 3. Linking Pol-II activity to mRNA profiles

040

80160

320640

1280

t (min)

mRNA

a

b

c

d

e

f

040

80160

320640

1280

t (min)

Pol−II

0

2

4

6

Pol II

ENSG00000163659 (TIPARP)

0 40 80 160 320 64012800

50

100

mR

NA

(F

PK

M)

t (min)0 50 100

0

0.2

0.4

0.6

0.8

1

Delay (min)

Joint work with Antti Honkela,Jaakko Peltonen, Neil Lawrence

Page 31: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Linking Pol-II activity to mRNA profiles

dm(t)

dt= βp(t −∆)− αm(t)

I m(t) is mRNA concentration (RNA-Seq data)

I p(t) is mRNA production rate (3’ pol-II ChIP-Seq data)

I α is degradation rate (mRNA half-life t1/2 = 2/α)

I ∆ is processing delay

We model p(t) ∼ GP(0, kp) as a Gaussian process (GP)

Likelihood can be worked out exactly

Bayesian MCMC used to estimate parameters α, β, ∆ and GPcovariance (2) and noise variance (1) parameters

Page 32: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Linking Pol-II activity to mRNA profiles

dm(t)

dt= βp(t −∆)− αm(t)

I m(t) is mRNA concentration (RNA-Seq data)

I p(t) is mRNA production rate (3’ pol-II ChIP-Seq data)

I α is degradation rate (mRNA half-life t1/2 = 2/α)

I ∆ is processing delay

We model p(t) ∼ GP(0, kp) as a Gaussian process (GP)

Likelihood can be worked out exactly

Bayesian MCMC used to estimate parameters α, β, ∆ and GPcovariance (2) and noise variance (1) parameters

Page 33: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Example fits

040

80160

320640

1280

t (min)

mRNA

a

b

c

d

e

f

040

80160

320640

1280

t (min)

Pol−II a: Early pol-II, no production delay

0

2

4

Pol II

ENSG00000166483 (WEE1)

0 40 80 160 320 64012800

20

40

mR

NA

(F

PK

M)

t (min)0 50 100

0

0.2

0.4

0.6

0.8

1

Delay (min)

Page 34: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Example fits

040

80160

320640

1280

t (min)

mRNA

a

b

c

d

e

f

040

80160

320640

1280

t (min)

Pol−II b: Early pol-II, delayed production

0

1

2

3

Pol II

ENSG00000019549 (SNAI2)

0 40 80 160 320 64012800

2

4

6

8

mR

NA

(F

PK

M)

t (min)0 50 100

0

0.2

0.4

0.6

0.8

1

Delay (min)

Page 35: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Example fits

040

80160

320640

1280

t (min)

mRNA

a

b

c

d

e

f

040

80160

320640

1280

t (min)

Pol−II c: Later pol-II, delayed production

0

2

4

6

Pol II

ENSG00000163659 (TIPARP)

0 40 80 160 320 64012800

50

100

mR

NA

(F

PK

M)

t (min)0 50 100

0

0.2

0.4

0.6

0.8

1

Delay (min)

Page 36: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Example fits

040

80160

320640

1280

t (min)

mRNA

a

b

c

d

e

f

040

80160

320640

1280

t (min)

Pol−II d: Late pol-II, no delay

0

0.5

1

Pol II

ENSG00000162949 (CAPN13)

0 40 80 160 320 64012800

5

10

15

mR

NA

(F

PK

M)

t (min)0 50 100

0

0.2

0.4

0.6

0.8

1

Delay (min)

Page 37: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Example fits

040

80160

320640

1280

t (min)

mRNA

a

b

c

d

e

f

040

80160

320640

1280

t (min)

Pol−II e: Late pol-II, no delay

0

0.5

1

Pol II

ENSG00000117298 (ECE1)

0 40 80 160 320 64012800

20

40

mR

NA

(F

PK

M)

t (min)0 50 100

0

0.2

0.4

0.6

0.8

1

Delay (min)

Page 38: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Example fits

040

80160

320640

1280

t (min)

mRNA

a

b

c

d

e

f

040

80160

320640

1280

t (min)

Pol−II f: Late pol-II, delayed production

0

1

2

3

Pol II

ENSG00000181026 (AEN)

0 40 80 160 320 64012800

10

20

30

mR

NA

(F

PK

M)

t (min)0 50 100

0

0.2

0.4

0.6

0.8

1

Delay (min)

Page 39: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Large processing delays observed in 11% of genes

Delay (min)

# of

gen

es

0 40 80 >120

050

100

1500

1550

Page 40: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Delay linked with gene length and intron structure

0 20 40 60 80t (min)

0.00

0.05

0.10

0.15

0.20

0.25

0.30

Fra

ctio

n of

gen

es w

ith ∆

>t

02

46

810

12−

log 1

0(p−

valu

e)

m<10000(N=282)m>10000(N=1504)

p−value

0 20 40 60 80t (min)

0.00

0.05

0.10

0.15

0.20

0.25

Fra

ctio

n of

gen

es w

ith ∆

>t

02

46

810

−lo

g 10(p

−va

lue)

f<0.20(N=1443)f>0.20(N=343)

p−value

∆: delay m: gene length f: final intron length / gene length

Page 41: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Delayed genes show evidence of late-intron retention

DLX

3

0246

10 m

in

0

5

10

20 m

in

0102030

40 m

in

05

101520

80 m

in

02468

10

160

min

02468

320

min

10 20 30 40 50 60t (min)

−0.

010.

000.

010.

020.

030.

04M

ean

pre−

mR

NA

end

acc

umul

atio

n in

dex

01

23

45

67

−lo

g 10(p

−va

lue)

∆ > t∆ < tp−value

Left: density of RNA-Seq reads uniquely mapping to the introns inthe DLX3 gene

Right: Differences in the mean pre-mRNA accumulation index inlong delay genes (blue) and short delay genes (red)

Page 42: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Comparison of processing and transcription times

RNA processing delay ∆ (min)

Tran

scrip

tiona

l del

ay (

min

)

1 3 10 30 100

13

1030

100

Transcription time = length/velocityestimate from Danko Mol Cell 2013

Transcription time > processing delayin 87% of genes

Honkela et al. “Genome-wide modeling of transcription kineticsreveals patterns of RNA production delays” PNAS 2015

Page 43: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Extension - inferring time-varying degradation rates

dm(t)

dt= βp(t)− α(t)m(t)

0 2 4 6 8 10Times

0.5

0.0

0.5

1.0

1.5

2.0

2.5

Degra

dati

on r

ate

D

0 2 4 6 8 10Times

5

0

5

10

15

20

25

30

35

Gene e

xpre

ssio

n

mRNA measurementspre-mRNA measurements

Page 44: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Conclusion

I Gaussian process models provide a good mix of flexibility andtractability in temporal and spatial modelling:

I Hierarchical modelling, e.g. replicates, clusters, speciesI Periodic models without strong sinuisoidal assumptionsI Tractable under branching and bifurcationsI Tractable under linear operations on functions

I Easy addition and multiplication of kernels

I Good approximate inference algorithms for non-Gaussian data

I Codebase is growing making methods increasingly flexible andcomputationally efficient (e.g. GPy, GPflow and some R)

Funding: BBSRC (EraSysBio+ SYNERGY), EU FP7 (RADIANT)

Collaborators: Neil Lawrence, James Hensman, Antti Honkela,Jaakko Peltonen, Jing Yang, Alexis Boukouvalas, Max Zweissele

Page 45: Modelling gene expression dynamics with Gaussian processes · Di erential equations: Pol-II to mRNA dynamics. Gaussian processes: ... M.Rattray, N.D. Lawrence \Fast non-parametric

Our existential crisis