Gaussian Process Regression Networks

Andrew Gordon Wilson
[email protected]
mlg.eng.cam.ac.uk/andrew
University of Cambridge

Joint work with David A. Knowles and Zoubin Ghahramani

June 27, 2012, ICML, Edinburgh




Multiple responses with input dependent covariances

- Two response variables:
  y1(x): concentration of cadmium at a spatial location x.
  y2(x): concentration of zinc at a spatial location x.

- The values of these responses, at a given spatial location x*, are correlated.

- We can account for these correlations (rather than assuming y1(x) and y2(x) are independent) to enhance predictions.

- We can further enhance predictions by accounting for how these correlations vary with geographical location x. Accounting for input dependent correlations is a distinctive feature of the Gaussian process regression network.

[Figure: map over longitude and latitude showing how the correlation between the two responses varies with spatial location.]


Motivation for modelling dependent covariances

Promise

- Many problems in fact have input dependent uncertainties and correlations.

- Accounting for dependent covariances (uncertainties and correlations) can greatly improve statistical inferences.

Uncharted Territory

- For convenience, response variables are typically seen as independent, or as having fixed covariances (e.g. the multi-task literature).

- The few existing models of dependent covariances are typically not expressive (e.g. Brownian motion covariance structure) or scalable (e.g. < 5 response variables).

Goal

- We want to develop expressive and scalable models (> 1000 response variables) for dependent uncertainties and correlations.


Outline

- Gaussian process review
- Gaussian process regression networks
- Applications


Gaussian processes

Definition
A Gaussian process (GP) is a collection of random variables, any finite number of which have a joint Gaussian distribution.

Nonparametric Regression Model

- Prior: f(x) ∼ GP(m(x), k(x, x′)), meaning (f(x1), ..., f(xN)) ∼ N(µ, K), with µi = m(xi) and Kij = cov(f(xi), f(xj)) = k(xi, xj).

- Posterior: p(f(x) | D) ∝ p(D | f(x)) p(f(x)), i.e. GP posterior ∝ likelihood × GP prior.

[Figure: (a) sample functions drawn from a Gaussian process prior; (b) sample functions drawn from the corresponding posterior, plotted as output f(t) against input t.]
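The prior and posterior above can be illustrated numerically. The following sketch implements standard zero-mean GP regression with an RBF kernel; the kernel choice, data, and noise level are assumptions for illustration, not details from the slides.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    # k(x, x') = s^2 exp(-(x - x')^2 / (2 l^2))  (assumed kernel for the sketch)
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    # Zero-mean GP regression:
    #   mean = K* (K + noise I)^{-1} y,  cov = K** - K* (K + noise I)^{-1} K*^T
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_test, x_train)
    Kss = rbf_kernel(x_test, x_test)
    mean = Ks @ np.linalg.solve(K, y_train)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov

x_train = np.array([-4.0, -2.0, 0.0, 1.0, 3.0])
y_train = np.sin(x_train)
x_test = np.linspace(-5, 5, 50)
mean, cov = gp_posterior(x_train, y_train, x_test)
# The posterior mean passes near the observations, and the posterior
# variance (diagonal of cov) shrinks near the training inputs.
```

Sampling posterior functions, as in panel (b), then amounts to drawing from N(mean, cov).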


Gaussian processes

“How can Gaussian processes possibly replace neural networks? Did we throw the baby out with the bathwater?”

David MacKay, 1998.


Semiparametric Latent Factor Model

The semiparametric latent factor model (SLFM) (Teh, Seeger, and Jordan, 2005) is a popular multi-output (multi-task) GP model for fixed signal correlations between outputs (response variables):

y(x) = W f(x) + σy z(x),

where y(x) is p × 1, W is p × q, f(x) is q × 1, and z(x) ∼ N(0, Ip).

- x: input variable (e.g. geographical location).

- y(x): p × 1 vector of output variables (responses) evaluated at x.

- W: p × q matrix of mixing weights.

- f(x): q × 1 vector of Gaussian process functions.

- σy: hyperparameter controlling noise variance.

- z(x): i.i.d. Gaussian white noise with p × p identity covariance Ip.

[Figure: SLFM network diagram — the input x feeds node functions f1(x), ..., fq(x), which are mixed by constant weights W11, ..., Wpq to produce outputs y1(x), ..., yp(x).]
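The generative model above can be sketched as a single draw from the SLFM prior; the kernel, dimensions, and value of σy are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, N = 3, 2, 40                      # outputs, latent functions, inputs
x = np.linspace(0, 5, N)

# Draw q latent GP functions with an (assumed) RBF kernel.
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2) + 1e-6 * np.eye(N)
f = np.linalg.cholesky(K) @ rng.standard_normal((N, q))   # N x q

W = rng.standard_normal((p, q))         # fixed p x q mixing weights
sigma_y = 0.1
y = f @ W.T + sigma_y * rng.standard_normal((N, p))       # N x p outputs

# Because W is constant, the signal covariance between outputs,
# W W^T, is the same at every input x -- this is what the GPRN relaxes.
signal_cov = W @ W.T
```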


Deriving the GPRN

[Figure: network structure at x = x1 and at x = x2 — at x1, outputs y1 and y2 both connect to the basis function f1; at x2, they connect to disjoint basis functions.]

At x = x1 the two outputs (responses) y1 and y2 are correlated since they share the basis function f1. At x = x2 the outputs are independent.



From the SLFM to the GPRN

[Figure: side-by-side network diagrams — the SLFM, with constant weights Wij mixing node functions fi(x), and the GPRN, in which both the node functions f̂i(x) and the weights Wij(x) depend on the input x.]

SLFM: y(x) = W f(x) + σy z(x), with z(x) ∼ N(0, Ip).

GPRN: y(x) = W(x) [ f(x) + σf ε(x) ] + σy z(x), with ε(x) ∼ N(0, Iq) and f̂(x) = f(x) + σf ε(x); y(x) is p × 1, W(x) is p × q, and f(x) is q × 1.


Gaussian process regression networks

y(x) = W(x) [ f(x) + σf ε(x) ] + σy z(x), with f̂(x) = f(x) + σf ε(x), ε(x) ∼ N(0, Iq), z(x) ∼ N(0, Ip),

or, equivalently,

y(x) = W(x) f(x) + σf W(x) ε(x) + σy z(x),

where W(x) f(x) is the signal and σf W(x) ε(x) + σy z(x) is the noise.

- y(x): p × 1 vector of output variables (responses) evaluated at x.

- W(x): p × q matrix of weight functions, with Wij(x) ∼ GP(0, kw).

- f(x): q × 1 vector of Gaussian process node functions, with fi(x) ∼ GP(0, kf).

- σf, σy: hyperparameters controlling noise variance.

- ε(x), z(x): Gaussian white noise.

[Figure: GPRN network diagram — the input x feeds node functions f̂1(x), ..., f̂q(x), which are mixed by input-dependent weights W11(x), ..., Wpq(x) to produce outputs y1(x), ..., yp(x).]
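A draw from the GPRN prior can be sketched by also giving the weights GP priors; the kernels, lengthscales, and hyperparameter values below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, N = 3, 2, 40
x = np.linspace(0, 5, N)

def sample_gp(n_funcs, lengthscale):
    # Draw n_funcs independent GP functions with an (assumed) RBF kernel.
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / lengthscale) ** 2)
    return np.linalg.cholesky(K + 1e-6 * np.eye(N)) @ rng.standard_normal((N, n_funcs))

f = sample_gp(q, lengthscale=0.5)                        # node functions, N x q
W = sample_gp(p * q, lengthscale=2.0).reshape(N, p, q)   # weight functions, N x p x q
sigma_f, sigma_y = 0.1, 0.05
f_hat = f + sigma_f * rng.standard_normal((N, q))        # noisy node functions
y = np.einsum('npq,nq->np', W, f_hat) + sigma_y * rng.standard_normal((N, p))

# Induced noise covariance at each input:
#   Sigma(x) = sigma_f^2 W(x) W(x)^T + sigma_y^2 I,
# which now varies with x because W does.
Sigma = sigma_f**2 * np.einsum('npq,nrq->npr', W, W) + sigma_y**2 * np.eye(p)
```

Using slowly varying weights (longer lengthscale) and faster node functions mirrors the intuition that correlations between outputs change more smoothly than the signals themselves.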


GPRN Inference

- We sample from the posterior over Gaussian processes in the weight and node functions using elliptical slice sampling (ESS) (Murray, Adams, and MacKay, 2010). ESS is especially good for sampling from posteriors with correlated Gaussian priors.

- We also approximate this posterior using a message passing implementation of variational Bayes (VB).

- The computational complexity is cubic in the number of data points and linear in the number of response variables, per iteration of ESS or VB.

- Details are in the paper.
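A single elliptical slice sampling transition, as cited above, can be sketched for a generic latent vector u with prior N(0, C). This follows Murray, Adams, and MacKay (2010), but the function name and interface are illustrative, not the paper's code.

```python
import numpy as np

def ess_step(u, chol_C, log_lik, rng):
    # One elliptical slice sampling update for u ~ N(0, C) a priori,
    # where chol_C is a Cholesky factor of C and log_lik(u) is the log-likelihood.
    nu = chol_C @ rng.standard_normal(len(u))    # auxiliary draw from the prior
    log_y = log_lik(u) + np.log(rng.uniform())   # slice threshold
    theta = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = theta - 2.0 * np.pi, theta
    while True:
        # Points on the ellipse through u and nu leave the Gaussian prior invariant.
        u_new = u * np.cos(theta) + nu * np.sin(theta)
        if log_lik(u_new) > log_y:
            return u_new
        # Shrink the bracket towards theta = 0 (which recovers u) and retry.
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)
```

No step size needs tuning, and each rejected proposal shrinks the bracket, so the update always terminates; this is what makes ESS convenient for the strongly correlated GP priors in the GPRN.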


GPRN Results, Jura Heavy Metal Dataset

[Figure: left, map of the Jura region (longitude vs. latitude) coloured by predicted input-dependent correlations; right, the fitted GPRN with node functions f1(x), f2(x), weight functions W11(x), ..., W32(x), and outputs y1(x), y2(x), y3(x).]

y(x) = W(x) f(x) + σf W(x) ε(x) + σy z(x)   (signal + noise).


GPRN Results, Gene Expression 50D

[Figure: bar chart of SMSE on the GENE dataset (p = 50) for GPRN (ESS), GPRN (VB), CMOGP, LMC, and SLFM.]


GPRN Results, Gene Expression 1000D

[Figure: bar chart of SMSE on the GENE dataset (p = 1000) for GPRN (ESS), GPRN (VB), CMOFITC, CMOPITC, and CMODTC.]


Training Times on GENE

Training time       GENE (50D)    GENE (1000D)
GPRN (VB)           12 s          330 s
GPRN (ESS)          40 s          9000 s
LMC, CMOGP, SLFM    minutes       days


Multivariate Volatility Results

[Figure: bar chart of MSE on the EQUITY dataset for GPRN (VB), GPRN (ESS), GWP, WP, and MGARCH.]


Summary

- A Gaussian process regression network is used for multi-task regression and multivariate volatility, and can account for input dependent signal and noise covariances.

- Can scale to thousands of dimensions.

- Outperforms multi-task Gaussian process models and multivariate volatility models.


Generalised Wishart Processes

Recall that the GPRN model can be written as

y(x) = W(x) f(x) + σf W(x) ε(x) + σy z(x),

where W(x) f(x) is the signal and the remaining terms are the noise. The induced noise process,

Σnoise(x) = σf^2 W(x) W(x)^T + σy^2 I,

is an example of a Generalised Wishart Process (Wilson and Ghahramani, 2010). At every x, Σ(x) is marginally Wishart, and the dynamics of Σ(x) are governed by the GP covariance kernel used for the weight functions in W(x).
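The marginally Wishart property can be checked numerically at a single input; the dimensions and hyperparameter values are assumptions for the sketch. With Gaussian weights at a fixed x, W(x) W(x)^T is a Wishart draw, so Σ(x) is symmetric positive definite.

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 3, 2
sigma_f, sigma_y = 1.0, 0.1

# Weights at one fixed input x: i.i.d. standard normal entries, so
# W @ W.T is a draw from a Wishart distribution with q degrees of freedom.
W = rng.standard_normal((p, q))
Sigma = sigma_f**2 * W @ W.T + sigma_y**2 * np.eye(p)

# Symmetric positive definite: since W W^T has rank q < p, the smallest
# eigenvalue of Sigma equals sigma_y^2.
eigvals = np.linalg.eigvalsh(Sigma)
```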


GPRN Inference New

y(x) = W(x) f(x) + σf W(x) ε(x) + σy z(x),

where W(x) f(x) is the signal and σf W(x) ε(x) + σy z(x) is the noise.

The prior is induced through the GP priors on the nodes and weights:

p(u | σf, γ) = N(0, CB).

Likelihood:

p(D | u, σf, σy) = ∏_{i=1}^{N} N(y(xi); W(xi) f̂(xi), σy^2 Ip).

Posterior:

p(u | D, σf, σy, γ) ∝ p(D | u, σf, σy) p(u | σf, γ).

We sample from the posterior using elliptical slice sampling (Murray, Adams, and MacKay, 2010) or approximate it using a message passing implementation of variational Bayes.