Data Analysis and Probabilistic Inference

Gaussian Processes

Recommended reading:
Rasmussen/Williams: Chapters 1, 2, 4, 5
Deisenroth & Ng (2015) [3]

Marc Deisenroth
Department of Computing
Imperial College London

February 22, 2017


http://www.gaussianprocess.org/

Gaussian Processes Marc Deisenroth February 22, 2017 2


Problem Setting

[Figure: plot of f(x) against x]

Objective

For a set of observations $y_i = f(x_i) + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma_\varepsilon^2)$, find a distribution over functions $p(f)$ that explains the data

Probabilistic regression problem


Recap from CO-496: Bayesian Linear Regression

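• Linear regression model:

$$f(x) = \phi(x)^\top w, \quad w \sim \mathcal{N}(0, \Sigma_p)$$
$$y = f(x) + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma_n^2)$$

• Integrating out the parameters when predicting leads to a distribution over functions:

$$p(f(x_*) \mid x_*, X, y) = \int p(f(x_*) \mid x_*, w)\, p(w \mid X, y)\, dw = \mathcal{N}\big(\mu(x_*), \sigma^2(x_*)\big)$$
$$\mu(x_*) = \phi_*^\top \Sigma_p \Phi (K + \sigma_n^2 I)^{-1} y$$
$$\sigma^2(x_*) = \phi_*^\top \Sigma_p \phi_* - \phi_*^\top \Sigma_p \Phi (K + \sigma_n^2 I)^{-1} \Phi^\top \Sigma_p \phi_*$$
$$K = \Phi^\top \Sigma_p \Phi$$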
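The predictive equations can be checked numerically. The following is a minimal sketch, assuming a polynomial feature map $\phi(x) = [1, x, x^2]^\top$, $\Sigma_p = I$, and that $\Phi$ stacks the training feature vectors as columns (so that $K = \Phi^\top \Sigma_p \Phi$ is $N \times N$). The feature map, the data, and the names `features` and `blr_predict` are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def features(x):
    """Hypothetical feature map phi(x) = [1, x, x^2]; returns a (D, N) matrix."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.vstack([np.ones_like(x), x, x**2])

def blr_predict(x_star, X, y, Sigma_p, sigma_n):
    """Predictive mean and variance of f(x_star) in the kernel form shown above."""
    Phi = features(X)                      # (D, N), columns are phi(x_i)
    phi_star = features(x_star)[:, 0]      # (D,)
    K = Phi.T @ Sigma_p @ Phi              # (N, N)
    A = K + sigma_n**2 * np.eye(len(X))
    v = Phi.T @ Sigma_p @ phi_star         # Phi^T Sigma_p phi_*, shape (N,)
    mu = v @ np.linalg.solve(A, y)
    var = phi_star @ Sigma_p @ phi_star - v @ np.linalg.solve(A, v)
    return mu, var

# Made-up training data from a noisy quadratic.
rng = np.random.default_rng(0)
X = np.linspace(-2.0, 2.0, 8)
y = 0.5 - X + 0.3 * X**2 + 0.1 * rng.standard_normal(X.shape)
mu, var = blr_predict(1.0, X, y, np.eye(3), sigma_n=0.1)
```

Note that the training inputs enter only through inner products of feature vectors, which is the observation that motivates replacing $\phi(x)^\top \Sigma_p \phi(x')$ by a kernel later on.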


Sampling from the Prior over Functions

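Consider a linear regression setting

$$y = a + bx + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma_n^2)$$
$$p(a, b) = \mathcal{N}(0, I)$$

[Figure: prior over the parameters in the (a, b) plane; corresponding functions plotted as y against x]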
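Sampling from this prior can be sketched in a few lines. The grid of inputs, the number of samples, and the variable names below are illustrative assumptions; each draw of $(a, b)$ corresponds to one straight line.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples = 5
x = np.linspace(-10.0, 10.0, 50)

# Each row is one draw (a, b) from the prior p(a, b) = N(0, I).
ab = rng.multivariate_normal(mean=np.zeros(2), cov=np.eye(2), size=num_samples)

# Each parameter sample induces one function (a straight line) f(x) = a + b x.
lines = ab[:, [0]] + ab[:, [1]] * x   # shape (num_samples, len(x))
```

Plotting the rows of `lines` against `x` gives a picture of the induced prior over functions.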


Demo: sampling from prior, sampling from posterior


Sampling from the Posterior over Functions

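Consider a linear regression setting

$$y = a + bx + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma_n^2)$$
$$p(a, b) = \mathcal{N}(0, I)$$

[Figure: posterior over the parameters in the (a, b) plane; corresponding functions plotted as y against x]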
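The posterior over $(a, b)$ is again Gaussian. The sketch below uses the standard conjugate update for a Gaussian prior $\mathcal{N}(0, I)$ and Gaussian noise, which is not written out on this slide; the training data and the noise level are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_n = 0.5                                  # assumed noise standard deviation
X = np.array([-3.0, -1.0, 0.5, 2.0, 4.0])      # made-up training inputs
y = 1.0 + 0.5 * X + sigma_n * rng.standard_normal(X.shape)

Phi = np.column_stack([np.ones_like(X), X])    # rows are [1, x_i]

# Conjugate Gaussian update for the prior p(a, b) = N(0, I):
#   S_N = (I + Phi^T Phi / sigma_n^2)^{-1},  m_N = S_N Phi^T y / sigma_n^2
S_N = np.linalg.inv(np.eye(2) + Phi.T @ Phi / sigma_n**2)
m_N = S_N @ Phi.T @ y / sigma_n**2

# Posterior samples of (a, b) and the lines they induce.
ab_post = rng.multivariate_normal(m_N, S_N, size=5)
x = np.linspace(-10.0, 10.0, 50)
lines_post = ab_post[:, [0]] + ab_post[:, [1]] * x
```

Compared with prior samples, the posterior lines concentrate around the data because `S_N` is smaller than the prior covariance.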


Fitting Nonlinear Functions

• Fit nonlinear functions using (Bayesian) linear regression: linear combination of nonlinear features

• Example: Radial-basis-function (RBF) network

$$f(x) = \sum_{i=1}^{n} w_i \phi_i(x), \quad w_i \sim \mathcal{N}(0, \sigma_p^2)$$

where

$$\phi_i(x) = \exp\big(-\tfrac{1}{2}(x - \mu_i)^\top (x - \mu_i)\big)$$

for given "centers" $\mu_i$


Illustration: Fitting a Radial Basis Function Network

$$\phi_i(x) = \exp\big(-\tfrac{1}{2}(x - \mu_i)^\top (x - \mu_i)\big)$$

[Figure: f(x) against x]

• Place Gaussian-shaped basis functions $\phi_i$ at 25 input locations $\mu_i$, linearly spaced in the interval $[-5, 3]$
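The feature construction described above can be sketched as follows for scalar inputs. The function name `rbf_features` and the evaluation grid are assumptions made for illustration; the 25 centers in $[-5, 3]$ follow the slide.

```python
import numpy as np

def rbf_features(x, centers):
    """phi_i(x) = exp(-0.5 * (x - mu_i)^2) for scalar inputs; returns shape (N, n)."""
    diff = np.asarray(x, dtype=float)[:, None] - centers[None, :]
    return np.exp(-0.5 * diff**2)

centers = np.linspace(-5.0, 3.0, 25)    # 25 centers mu_i, linearly spaced in [-5, 3]
x = np.linspace(-5.0, 5.0, 200)
Phi = rbf_features(x, centers)          # one row per input, one column per basis function
```

Each row of `Phi` is the feature vector of one input, so a linear model in these features is nonlinear in `x`.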


Samples from the RBF Prior

$$f(x) = \sum_{i=1}^{n} w_i \phi_i(x), \quad p(w) = \mathcal{N}(0, I)$$

[Figure: sampled functions f(x) against x]
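Drawing functions from this prior amounts to drawing weight vectors and pushing them through the features. The sketch below assumes scalar inputs and the 25 centers in $[-5, 3]$ used in the illustration; the grid and the number of samples are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
centers = np.linspace(-5.0, 3.0, 25)
x = np.linspace(-5.0, 5.0, 200)
Phi = np.exp(-0.5 * (x[:, None] - centers[None, :])**2)   # (200, 25)

W = rng.standard_normal((25, 3))     # three weight samples w ~ N(0, I), one per column
F = Phi @ W                          # each column is one sampled function on the grid

# Implied prior variance of f(x): Var[f(x)] = phi(x)^T phi(x) under p(w) = N(0, I).
prior_var = np.sum(Phi**2, axis=1)
```

The array `prior_var` is large where basis functions overlap and drops towards zero to the right of the last center at $x = 3$.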


Samples from the RBF Posterior

$$f(x) = \sum_{i=1}^{n} w_i \phi_i(x), \quad p(w \mid X, y) = \mathcal{N}(m_N, S_N)$$

[Figure: sampled functions f(x) against x]


RBF Posterior

[Figure: RBF posterior, f(x) against x]


Limitations

[Figure: f(x) against x]

• Feature engineering
• Finite number of features:
  • Above: Without basis functions on the right, we cannot express any variability of the function
  • Ideally: Add more (infinitely many) basis functions
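The limitation can be made concrete with a small experiment. This is a sketch under stated assumptions: made-up data from a sine function, noise level 0.1, prior $p(w) = \mathcal{N}(0, I)$, and the standard conjugate posterior $\mathcal{N}(m_N, S_N)$. Far to the right of the last center every feature is almost zero, so the model reports almost no uncertainty there even though it has no data.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_n = 0.1                                        # assumed noise level
centers = np.linspace(-5.0, 3.0, 25)

def phi(x):
    """RBF features for scalar inputs; returns shape (N, 25)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.exp(-0.5 * (x[:, None] - centers[None, :])**2)

X = rng.uniform(-5.0, 5.0, size=30)                  # made-up training inputs
y = np.sin(X) + sigma_n * rng.standard_normal(30)    # made-up targets

Phi = phi(X)
# Posterior p(w | X, y) = N(m_N, S_N) for the prior p(w) = N(0, I).
S_N = np.linalg.inv(np.eye(25) + Phi.T @ Phi / sigma_n**2)
m_N = S_N @ Phi.T @ y / sigma_n**2

def predictive_std(x):
    """Standard deviation of f(x) under the weight posterior."""
    P = phi(x)
    return np.sqrt(np.maximum(np.sum((P @ S_N) * P, axis=1), 0.0))

std_far = predictive_std(8.0)[0]      # far to the right of the last center at x = 3
```

Here `std_far` is close to zero, which is overconfidence caused by the finite set of features rather than knowledge about the function.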


Approach

• Instead of sampling parameters, which induce a distribution over functions, sample functions directly
  Make assumptions on the distribution of functions

• Intuition: function = infinitely long vector of function values
  Make assumptions on the distribution of function values


Gaussian Process

• We will place a distribution $p(f)$ on functions $f$
• Informally, a function can be considered an infinitely long vector of function values $f = [f_1, f_2, f_3, \dots]$
• A Gaussian process is a generalization of a multivariate Gaussian distribution to infinitely many variables.

Definition
A Gaussian process (GP) is a collection of random variables $f_1, f_2, \dots$, any finite number of which is jointly Gaussian distributed.

• A Gaussian distribution is specified by a mean vector $\mu$ and a covariance matrix $\Sigma$
• A Gaussian process is specified by a mean function $m(\cdot)$ and a covariance function (kernel) $k(\cdot, \cdot)$


Covariance Function

• The covariance function (kernel) is symmetric and positive semi-definite
• It allows us to compute covariances between (unknown) function values by just looking at the corresponding inputs:

$$\mathrm{Cov}[f(x_i), f(x_j)] = k(x_i, x_j)$$
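A short numerical sketch of these two properties follows. No specific kernel is given in this part of the lecture, so the squared-exponential kernel $k(x, x') = \sigma_f^2 \exp\big(-(x - x')^2 / (2\ell^2)\big)$ and its parameter values are assumptions made for illustration, as are the names `se_kernel`, `signal_var`, and `lengthscale`.

```python
import numpy as np

def se_kernel(xa, xb, signal_var=1.0, lengthscale=1.0):
    """Assumed squared-exponential kernel for scalar inputs; returns (len(xa), len(xb))."""
    d = np.asarray(xa, dtype=float)[:, None] - np.asarray(xb, dtype=float)[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale)**2)

x = np.linspace(-5.0, 5.0, 20)
K = se_kernel(x, x)                       # K[i, j] = Cov[f(x_i), f(x_j)] = k(x_i, x_j)

is_symmetric = np.allclose(K, K.T)
min_eig = np.linalg.eigvalsh(K).min()     # non-negative up to round-off for a valid kernel

# Draw three functions from the zero-mean GP prior restricted to the grid x.
rng = np.random.default_rng(0)
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(x)))   # small jitter for numerical stability
samples = L @ rng.standard_normal((len(x), 3))
```

The covariance matrix is built from the inputs alone; no function values are needed to construct it.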


GP Regression as a Bayesian Inference Problem

Objective

For a set of observations $y_i = f(x_i) + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$, find a (posterior) distribution over functions $p(f \mid X, y)$ that explains the data

Training data: $X, y$. Bayes' theorem yields

$$p(f \mid X, y) = \frac{p(y \mid f, X)\, p(f)}{p(y \mid X)}$$

Prior: $p(f) = GP(m, k)$. Specify mean function $m$ and kernel $k$.

Likelihood (noise model): $p(y \mid f, X) = \mathcal{N}(f(X), \sigma_n^2 I)$

Marginal likelihood (evidence): $p(y \mid X) = \int p(y \mid f(X))\, p(f \mid X)\, df$

Posterior: $p(f \mid y, X) = GP(m_{\text{post}}, k_{\text{post}})$


Prior over Functions

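• Treat a function as a long vector of function values:

$$f = [f_1, f_2, \dots]$$

  Look at a distribution over function values $f_i = f(x_i)$

• Consider a finite number of $N$ function values $f$ and all other (infinitely many) function values $\tilde{f}$. Informally:

$$p(f, \tilde{f}) = \mathcal{N}\left(\begin{bmatrix} \mu_f \\ \mu_{\tilde{f}} \end{bmatrix}, \begin{bmatrix} \Sigma_{ff} & \Sigma_{f\tilde{f}} \\ \Sigma_{\tilde{f}f} & \Sigma_{\tilde{f}\tilde{f}} \end{bmatrix}\right)$$

  where $\Sigma_{\tilde{f}\tilde{f}} \in \mathbb{R}^{m \times m}$ and $\Sigma_{f\tilde{f}} \in \mathbb{R}^{N \times m}$, $m \to \infty$.

• $\Sigma_{ff}^{(i,j)} = \mathrm{Cov}[f(x_i), f(x_j)] = k(x_i, x_j)$

• Key property: The marginal remains finite

$$p(f) = \int p(f, \tilde{f})\, d\tilde{f} = \mathcal{N}(\mu_f, \Sigma_{ff})$$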
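The marginalization property can be illustrated with a finite stand-in for the infinite vector. In the sketch below, 200 grid points play the role of "all" function values and a squared-exponential kernel is assumed (neither choice comes from the lecture). Marginalizing a Gaussian over the unwanted variables simply selects the corresponding sub-block of the covariance, so the result is the kernel evaluated at the $N$ inputs of interest.

```python
import numpy as np

def se_kernel(xa, xb, lengthscale=1.0):
    """Assumed squared-exponential kernel for scalar inputs."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / lengthscale)**2)

x_all = np.linspace(-5.0, 5.0, 200)      # a large, finite stand-in for "all" inputs
idx = np.array([10, 80, 150])            # the N = 3 inputs we actually care about

Sigma_all = se_kernel(x_all, x_all)      # joint covariance over all 200 function values
Sigma_ff_from_joint = Sigma_all[np.ix_(idx, idx)]      # marginalize = pick the sub-block
Sigma_ff_direct = se_kernel(x_all[idx], x_all[idx])    # kernel evaluated at the N inputs only
```

Both routes give the same $3 \times 3$ matrix, which is why computations never need to touch the remaining function values.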


Training and Test Marginal

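• In practice, we always have finite training and test inputs $x_{\text{train}}, x_{\text{test}}$.

• Define $f_* := f_{\text{test}}$, $f := f_{\text{train}}$.

• Then, we obtain the finite marginal

$$p(f, f_*) = \int p(f, f_*, f_{\text{other}})\, df_{\text{other}} = \mathcal{N}\left(\begin{bmatrix} \mu_f \\ \mu_* \end{bmatrix}, \begin{bmatrix} \Sigma_{ff} & \Sigma_{f*} \\ \Sigma_{*f} & \Sigma_{**} \end{bmatrix}\right)$$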


GP Regression as a Bayesian Inference Problem (ctd.)

Posterior over functions (with training data $X, y$):

$$p(f \mid X, y) = \frac{p(y \mid f, X)\, p(f \mid X)}{p(y \mid X)}$$

Using the properties of Gaussians, we obtain

$$p(y \mid f, X)\, p(f \mid X) = \mathcal{N}\big(y \mid f(X), \sigma_n^2 I\big)\, \mathcal{N}\big(f(X) \mid m(X), K\big)$$
$$= Z\, \mathcal{N}\big(f(X) \mid \underbrace{m(X) + K(K + \sigma_n^2 I)^{-1}(y - m(X))}_{\text{posterior mean}},\ \underbrace{K - K(K + \sigma_n^2 I)^{-1} K}_{\text{posterior covariance}}\big)$$
$$K = k(X, X)$$

Marginal likelihood:

$$Z = p(y \mid X) = \int p(y \mid f, X)\, p(f \mid X)\, df = \mathcal{N}\big(y \mid m(X), K + \sigma_n^2 I\big)$$
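Because the marginal likelihood is a Gaussian density in $y$, its logarithm can be evaluated directly. The sketch below assumes a squared-exponential kernel, a constant prior mean, and made-up data; it uses a Cholesky factorization of $K + \sigma_n^2 I$, which is a common numerical choice rather than something prescribed by the slides.

```python
import numpy as np

def se_kernel(xa, xb, lengthscale=1.0):
    """Assumed squared-exponential kernel for scalar inputs."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / lengthscale)**2)

def log_marginal_likelihood(X, y, sigma_n, mean=0.0):
    """log N(y | m(X), K + sigma_n^2 I) with a constant prior mean (an assumption)."""
    n = len(X)
    Ky = se_kernel(X, X) + sigma_n**2 * np.eye(n)
    L = np.linalg.cholesky(Ky)                              # Ky = L L^T
    r = y - mean
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))     # Ky^{-1} (y - m(X))
    # -0.5 * log det(Ky) equals -sum(log(diag(L))).
    return -0.5 * r @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2.0 * np.pi)

X = np.array([-4.0, -2.5, -1.0, 0.0, 1.5, 3.0])   # made-up training inputs
y = np.sin(X)                                      # made-up targets
lml = log_marginal_likelihood(X, y, sigma_n=0.1)
```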


GP Predictions (1)

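$$y = f(x) + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma_n^2)$$

• Objective: Find $p(f(X_*) \mid X, y)$ for training data $X, y$ and test inputs $X_*$.

• GP prior: $p(f \mid X) = \mathcal{N}\big(m(X), K\big)$

• Gaussian likelihood: $p(y \mid f(X)) = \mathcal{N}\big(f(X), \sigma_n^2 I\big)$

• With $f \sim GP$ it follows that $f, f_*$ are jointly Gaussian distributed:

$$p(f, f_* \mid X, X_*) = \mathcal{N}\left(\begin{bmatrix} m(X) \\ m(X_*) \end{bmatrix}, \begin{bmatrix} K & k(X, X_*) \\ k(X_*, X) & k(X_*, X_*) \end{bmatrix}\right)$$

• Due to the Gaussian likelihood, we also get ($f$ is unobserved)

$$p(y, f_* \mid X, X_*) = \mathcal{N}\left(\begin{bmatrix} m(X) \\ m(X_*) \end{bmatrix}, \begin{bmatrix} K + \sigma_n^2 I & k(X, X_*) \\ k(X_*, X) & k(X_*, X_*) \end{bmatrix}\right)$$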
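The step from the joint over $(f, f_*)$ to the joint over $(y, f_*)$ can be spelled out as follows. This is a short sketch that assumes, as in the noise model above, that $\varepsilon$ has zero mean and is independent of $f$ and $f_*$.

```latex
\begin{align*}
\mathbb{E}[y] &= \mathbb{E}[f(X)] + \mathbb{E}[\varepsilon] = m(X) \\
\mathrm{Cov}[y, y] &= \mathrm{Cov}[f(X) + \varepsilon,\; f(X) + \varepsilon]
                    = \mathrm{Cov}[f(X), f(X)] + \mathrm{Cov}[\varepsilon, \varepsilon]
                    = K + \sigma_n^2 I \\
\mathrm{Cov}[y, f_*] &= \mathrm{Cov}[f(X) + \varepsilon,\; f_*]
                      = \mathrm{Cov}[f(X), f_*]
                      = k(X, X_*)
\end{align*}
```

Only the training block of the covariance picks up the noise term, because the test function values $f_*$ are noise-free.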


GP Predictions (2)

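Prior:

$$p(y, f_* \mid X, X_*) = \mathcal{N}\left(\begin{bmatrix} m(X) \\ m(X_*) \end{bmatrix}, \begin{bmatrix} K + \sigma_n^2 I & k(X, X_*) \\ k(X_*, X) & k(X_*, X_*) \end{bmatrix}\right)$$

Posterior predictive distribution $p(f_* \mid X, y, X_*)$ at test inputs $X_*$ obtained by Gaussian conditioning:

$$p(f_* \mid X, y, X_*) = \mathcal{N}\big(\mathbb{E}[f_* \mid X, y, X_*],\ \mathbb{V}[f_* \mid X, y, X_*]\big)$$
$$\mathbb{E}[f_* \mid X, y, X_*] = m_{\text{post}}(X_*) = \underbrace{m(X_*)}_{\text{prior mean}} + k(X_*, X)(K + \sigma_n^2 I)^{-1}(y - m(X))$$
$$\mathbb{V}[f_* \mid X, y, X_*] = k_{\text{post}}(X_*, X_*) = \underbrace{k(X_*, X_*)}_{\text{prior variance}} - k(X_*, X)(K + \sigma_n^2 I)^{-1} k(X, X_*)$$

From now on: Set prior mean function $m \equiv 0$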
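The conditioning formulas translate almost line by line into code. The following is a minimal sketch for a zero prior mean, assuming a squared-exponential kernel and made-up training data; the function name `gp_predict` and the use of a Cholesky factorization instead of an explicit inverse are choices made here, not taken from the slides.

```python
import numpy as np

def se_kernel(xa, xb, lengthscale=1.0):
    """Assumed squared-exponential kernel for scalar inputs."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / lengthscale)**2)

def gp_predict(X, y, X_star, sigma_n):
    """Posterior mean and covariance of f(X_star) for a zero prior mean (m = 0)."""
    K = se_kernel(X, X)
    K_s = se_kernel(X, X_star)                    # k(X, X_*), shape (N, N_*)
    K_ss = se_kernel(X_star, X_star)              # k(X_*, X_*)
    L = np.linalg.cholesky(K + sigma_n**2 * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # (K + sigma_n^2 I)^{-1} y
    V = np.linalg.solve(L, K_s)                   # L^{-1} k(X, X_*)
    mean = K_s.T @ alpha                          # k(X_*, X) (K + sigma_n^2 I)^{-1} y
    cov = K_ss - V.T @ V                          # prior covariance minus explained part
    return mean, cov

X = np.array([-4.0, -2.0, 0.0, 1.0, 3.0])         # made-up training inputs
y = np.sin(X)                                      # made-up targets
X_star = np.linspace(-5.0, 8.0, 50)
mean, cov = gp_predict(X, y, X_star, sigma_n=0.1)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

Unlike the RBF network with a finite set of centers, the predictive standard deviation returns to the prior value far away from the data (here near $x = 8$).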


Illustration: Inference with Gaussian Processes

[Figure: f(x) against x]

Prior belief about the function

Predictive (marginal) mean and variance:

$$\mathbb{E}[f(x_*) \mid x_*, \emptyset] = m(x_*) = 0$$
$$\mathbb{V}[f(x_*) \mid x_*, \emptyset] = \sigma^2(x_*) = k(x_*, x_*)$$


[Figure: f(x) against x]

Posterior belief about the function

Predictive (marginal) mean and variance:

$$\mathbb{E}[f(x_*) \mid x_*, X, y] = m(x_*) = k(X, x_*)^\top (K + \sigma_\varepsilon^2 I)^{-1} y$$
$$\mathbb{V}[f(x_*) \mid x_*, X, y] = \sigma^2(x_*) = k(x_*, x_*) - k(X, x_*)^\top (K + \sigma_\varepsilon^2 I)^{-1} k(X, x_*)$$
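The transition from prior to posterior belief can be traced by adding observations one at a time. The sketch below assumes a squared-exponential kernel with $k(x, x) = 1$, a fixed test input $x_* = 1$, and made-up observations; with no data it returns the prior moments, and each additional observation can only reduce the predictive variance.

```python
import numpy as np

def se_kernel(xa, xb, lengthscale=1.0):
    """Assumed squared-exponential kernel for scalar inputs."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / lengthscale)**2)

def predictive_moments(x_star, X, y, sigma_eps):
    """Mean and variance of f(x_star) given data (X, y); empty data returns the prior."""
    xs = np.array([x_star])
    k_ss = se_kernel(xs, xs)[0, 0]
    if len(X) == 0:
        return 0.0, k_ss                       # prior: m(x_*) = 0, variance k(x_*, x_*)
    k_s = se_kernel(X, xs)[:, 0]               # k(X, x_*)
    A = se_kernel(X, X) + sigma_eps**2 * np.eye(len(X))
    mean = k_s @ np.linalg.solve(A, y)
    var = k_ss - k_s @ np.linalg.solve(A, k_s)
    return mean, var

X = np.array([-3.0, 4.0, 0.5, 1.5])            # made-up observations, added one at a time
y = np.cos(X)
variances = [predictive_moments(1.0, X[:n], y[:n], sigma_eps=0.1)[1]
             for n in range(len(X) + 1)]
```

The first entry of `variances` is the prior variance $k(x_*, x_*)$; observations far from $x_*$ barely change it, while the nearby ones at 0.5 and 1.5 reduce it sharply.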


Covariance Function

§ A Gaussian process is fully specified by a mean function m and a kernel/covariance function k

§ The covariance function (kernel) is symmetric and positive semi-definite

§ Covariance function encodes high-level structural assumptions about the latent function f (e.g., smoothness, differentiability, periodicity)


Gaussian Covariance Function

$$k_{\text{Gauss}}(x_i, x_j) = \sigma_f^2 \exp\!\left(-\frac{(x_i - x_j)^\top (x_i - x_j)}{\ell^2}\right)$$

§ $\sigma_f$: Amplitude of the latent function

§ $\ell$: Length scale. How far do we have to move in input space before the function value changes significantly? (Smoothness parameter)

§ Assumption on latent function: Smooth (infinitely differentiable)
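
A short sketch of the Gaussian covariance function, assuming the exp(−r²/ℓ²) form used on this slide. The function name `k_gauss` and the test points are illustrative. It evaluates the covariance between two inputs one unit apart for several length scales.

```python
import numpy as np

def k_gauss(xi, xj, sigma_f=1.0, ell=1.0):
    """Gaussian kernel sigma_f^2 * exp(-(xi - xj)^T (xi - xj) / ell^2) for two input vectors."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return sigma_f**2 * np.exp(-(diff @ diff) / ell**2)

x_i, x_j = np.array([0.0]), np.array([1.0])
for ell in (0.1, 1.0, 10.0):
    print(f"ell = {ell:5.1f}   k(x_i, x_j) = {k_gauss(x_i, x_j, ell=ell):.4f}")
```

With σ_f = 1 the covariance is roughly 0.00, 0.37 and 0.99 for ℓ = 0.1, 1 and 10: the longer the length scale, the more strongly function values one unit apart are tied together.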


Length-Scales

Length scales determine how wiggly the function is and how much information we can transfer to other function values.

[Figure: data points; x-axis: x from −10 to 10, y-axis: f(x) from −0.05 to 0.3]
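
The "information transfer" reading of the length scale can be illustrated with a single observation. The sketch below is an assumption-laden toy: it uses a Gaussian kernel of the form exp(−r²/ℓ²), one noisy observation, and a made-up helper name `posterior_var_one_obs`. It reports the posterior variance of the function value at distance 2 from that observation.

```python
import numpy as np

def posterior_var_one_obs(dist, ell, sigma_f=1.0, sigma_n=0.1):
    """Posterior variance of f at distance `dist` from one noisy observation."""
    k_cross = sigma_f**2 * np.exp(-dist**2 / ell**2)        # covariance with the observed point
    return sigma_f**2 - k_cross**2 / (sigma_f**2 + sigma_n**2)

for ell in (0.5, 2.0, 8.0):
    print(f"ell = {ell:4.1f}   posterior variance at distance 2: "
          f"{posterior_var_one_obs(dist=2.0, ell=ell):.3f}")
```

The variance is about 1.00, 0.87 and 0.13 for ℓ = 0.5, 2 and 8: with a short length scale the observation says almost nothing about a point two units away, with a long one it pins the function down there as well.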


Matern Covariance Function

$$k_{\text{Mat},3/2}(x_i, x_j) = \sigma_f^2 \left(1 + \frac{\sqrt{3}\,\|x_i - x_j\|}{\ell}\right) \exp\!\left(-\frac{\sqrt{3}\,\|x_i - x_j\|}{\ell}\right)$$

§ $\sigma_f$: Amplitude of the latent function

§ $\ell$: Length scale. How far do we have to move in input space before the function value changes significantly?

§ Assumption on latent function: 1-times differentiable
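
A minimal sketch of the Matern 3/2 covariance function as written on this slide. The function name `k_matern32` and the evaluation points are illustrative.

```python
import numpy as np

def k_matern32(xi, xj, sigma_f=1.0, ell=1.0):
    """Matern 3/2 kernel: sigma_f^2 * (1 + sqrt(3) r / ell) * exp(-sqrt(3) r / ell)."""
    r = np.linalg.norm(np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float))
    s = np.sqrt(3.0) * r / ell
    return sigma_f**2 * (1.0 + s) * np.exp(-s)

x0 = np.array([0.0, 0.0])
for r in (0.0, 0.5, 1.0, 2.0):
    print(f"r = {r:3.1f}   k = {k_matern32(x0, np.array([r, 0.0])):.4f}")
```

At r = 0 the kernel returns the signal variance σ_f², and it decays monotonically with the distance between the inputs.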


Periodic Covariance Function

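$$k_{\text{per}}(x_i, x_j) = \sigma_f^2 \exp\!\left(-\frac{2 \sin^2\!\big(\kappa (x_i - x_j)\big)}{\ell^2}\right) = k_{\text{Gauss}}(u(x_i), u(x_j)), \qquad u(x) = \begin{bmatrix} \cos(\kappa x) \\ \sin(\kappa x) \end{bmatrix}$$

$\kappa$: Periodicity parameter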


Meta-Parameters of a GP

The GP possesses a set of hyper-parameters:

§ Parameters of the mean function

§ Hyper-parameters of the covariance function (e.g., length-scales and signal variance)

§ Likelihood parameters (e.g., noise variance $\sigma_n^2$)

Train a GP to find a good set of hyper-parameters

Model selection to find good mean and covariance functions (can also be automated: Automatic Statistician (Lloyd et al., 2014))


Gaussian Process Training: Hyper-Parameters

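GP Training
Find good GP hyper-parameters θ (kernel and mean function parameters)

[Figure: graphical model relating the hyper-parameters θ, the latent function f, the noise level σ_n, and the N input/observation pairs (x_i, y_i)]

§ Place a prior p(θ) on hyper-parameters

§ Posterior over hyper-parameters:

$$p(\theta \mid X, y) = \frac{p(\theta)\, p(y \mid X, \theta)}{p(y \mid X)}, \qquad p(y \mid X, \theta) = \int p(y \mid f(X))\, p(f \mid X, \theta)\, df$$

§ Choose hyper-parameters θ*, such that

$$\theta^* \in \arg\max_\theta \; \log p(\theta) + \log p(y \mid X, \theta)$$

Maximize marginal likelihood if p(θ) = U (uniform prior)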


Training via Marginal Likelihood Maximization

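GP Training
Maximize the evidence/marginal likelihood (probability of the data given the hyper-parameters, where the unwieldy f has been integrated out). Also called Maximum Likelihood Type-II.

Marginal likelihood:

$$p(y \mid X, \theta) = \int p(y \mid f(X))\, p(f \mid X, \theta)\, df = \int \mathcal{N}\big(y \mid f(X), \sigma_n^2 I\big)\, \mathcal{N}\big(f(X) \mid 0, K\big)\, df = \mathcal{N}\big(y \mid 0, K + \sigma_n^2 I\big)$$

Learning the GP hyper-parameters:

$$\theta^* \in \arg\max_\theta \; \log p(y \mid X, \theta)$$

$$\log p(y \mid X, \theta) = -\tfrac{1}{2}\, y^\top K_\theta^{-1} y - \tfrac{1}{2} \log |K_\theta| + \text{const}, \qquad K_\theta := K + \sigma_n^2 I$$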
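
The log-marginal likelihood can be evaluated in a few lines. The sketch below assumes a zero mean function and a Gaussian kernel of the form exp(−r²/ℓ²); the function name, the toy data and the hyper-parameter values are illustrative. It uses a Cholesky factor of K_θ so that log|K_θ| is twice the sum of the log-diagonal of the factor, and it writes out the constant as −(N/2) log 2π.

```python
import numpy as np

def log_marginal_likelihood(X, y, ell, sigma_f, sigma_n):
    """log N(y | 0, K + sigma_n^2 I) for a Gaussian kernel with parameters (ell, sigma_f)."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K_theta = sigma_f**2 * np.exp(-sq_dist / ell**2) + sigma_n**2 * np.eye(len(X))
    L = np.linalg.cholesky(K_theta)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K_theta^{-1} y
    data_fit = -0.5 * y @ alpha                           # -1/2 y^T K_theta^{-1} y
    complexity = -np.log(np.diag(L)).sum()                # -1/2 log|K_theta|
    const = -0.5 * len(X) * np.log(2.0 * np.pi)
    return data_fit + complexity + const

# Toy data (made up for illustration)
X = np.linspace(-3.0, 3.0, 6)[:, None]
y = np.sin(X).ravel()
print(log_marginal_likelihood(X, y, ell=1.0, sigma_f=1.0, sigma_n=0.2))
```

Keeping the data-fit and complexity terms separate makes it easy to inspect how each one reacts when a hyper-parameter changes.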


Training via Marginal Likelihood Maximization

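Log-marginal likelihood:

$$\log p(y \mid X, \theta) = -\tfrac{1}{2}\, y^\top K_\theta^{-1} y - \tfrac{1}{2} \log |K_\theta| + \text{const}, \qquad K_\theta := K + \sigma_n^2 I$$

§ Automatic trade-off between data fit and model complexity

§ Gradient-based optimization of hyper-parameters θ:

$$\frac{\partial \log p(y \mid X, \theta)}{\partial \theta_i} = \tfrac{1}{2}\, y^\top K_\theta^{-1} \frac{\partial K_\theta}{\partial \theta_i} K_\theta^{-1} y - \tfrac{1}{2} \operatorname{tr}\!\Big(K_\theta^{-1} \frac{\partial K_\theta}{\partial \theta_i}\Big) = \tfrac{1}{2} \operatorname{tr}\!\Big(\big(\alpha \alpha^\top - K_\theta^{-1}\big) \frac{\partial K_\theta}{\partial \theta_i}\Big), \qquad \alpha := K_\theta^{-1} y$$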
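
The trace form of the gradient can be checked against finite differences. The sketch below assumes a Gaussian kernel of the form σ_f² exp(−r²/ℓ²), for which ∂K_θ/∂ℓ = K · 2r²/ℓ³ and ∂K_θ/∂σ_n = 2σ_n I. All function names, the toy data and the hyper-parameter values are illustrative.

```python
import numpy as np

def kernel_parts(X, ell, sigma_f, sigma_n):
    """Squared distances, noise-free kernel matrix K, and K_theta = K + sigma_n^2 I."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = sigma_f**2 * np.exp(-sq_dist / ell**2)
    return sq_dist, K, K + sigma_n**2 * np.eye(len(X))

def lml(X, y, ell, sigma_f, sigma_n):
    """Log-marginal likelihood including the -(N/2) log(2 pi) constant."""
    _, _, K_theta = kernel_parts(X, ell, sigma_f, sigma_n)
    _, logdet = np.linalg.slogdet(K_theta)
    quad = y @ np.linalg.solve(K_theta, y)
    return -0.5 * quad - 0.5 * logdet - 0.5 * len(X) * np.log(2.0 * np.pi)

def lml_grad(X, y, ell, sigma_f, sigma_n):
    """Gradient w.r.t. (ell, sigma_n) via 1/2 tr((alpha alpha^T - K_theta^{-1}) dK_theta/dtheta_i)."""
    sq_dist, K, K_theta = kernel_parts(X, ell, sigma_f, sigma_n)
    K_inv = np.linalg.inv(K_theta)            # explicit inverse is fine for a small demo
    alpha = K_inv @ y
    W = np.outer(alpha, alpha) - K_inv
    dK_dell = K * 2.0 * sq_dist / ell**3
    dK_dsigma_n = 2.0 * sigma_n * np.eye(len(X))
    return 0.5 * np.trace(W @ dK_dell), 0.5 * np.trace(W @ dK_dsigma_n)

# Toy data and hyper-parameters (made up for illustration)
X = np.linspace(-3.0, 3.0, 6)[:, None]
y = np.sin(X).ravel()
ell, sigma_f, sigma_n, eps = 1.2, 1.0, 0.3, 1e-6

g_ell, g_sn = lml_grad(X, y, ell, sigma_f, sigma_n)
fd_ell = (lml(X, y, ell + eps, sigma_f, sigma_n) - lml(X, y, ell - eps, sigma_f, sigma_n)) / (2 * eps)
fd_sn = (lml(X, y, ell, sigma_f, sigma_n + eps) - lml(X, y, ell, sigma_f, sigma_n - eps)) / (2 * eps)
print("analytic:", g_ell, g_sn)
print("finite differences:", fd_ell, fd_sn)
```

A finite-difference comparison of this kind is a cheap safeguard before handing analytic gradients to an optimizer.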


Example: Training Data

[Figure: training data; x-axis: x from -10 to 10, y-axis: y from -3 to 3]


Example: Marginal Likelihood Contour

[Figure: contour plot of the log-marginal likelihood, N=20; x-axis: log-noise from -6 to 0, y-axis: log-length-scales from -1 to 5, colour scale from -4 to -1.5]


Example: Exploring the Modes (1)

[Figure: GP fit at the first mode of the marginal likelihood; x-axis: x from -10 to 10, y-axis: y from -3 to 3]


Example: Exploring the Modes (2)

[Figure: GP fit at the second mode of the marginal likelihood; x-axis: x from -10 to 10, y-axis: y from -3 to 3]


Marginal Likelihood (1)

[Figure: contour plot of the log-marginal likelihood, N=2; x-axis: log-noise from -6 to 0, y-axis: log-length-scales from -1 to 5]


Marginal Likelihood (2)

[Figure: contour plot of the log-marginal likelihood, N=3; x-axis: log-noise from -6 to 0, y-axis: log-length-scales from -1 to 5]


Marginal Likelihood (3)

[Figure: contour plot of the log-marginal likelihood, N=5; x-axis: log-noise from -6 to 0, y-axis: log-length-scales from -1 to 5]


Marginal Likelihood (4)

[Figure: contour plot of the log-marginal likelihood, N=10; x-axis: log-noise from -6 to 0, y-axis: log-length-scales from -1 to 5]


Marginal Likelihood (5)

[Figure: contour plot of the log-marginal likelihood, N=15; x-axis: log-noise from -6 to 0, y-axis: log-length-scales from -1 to 5]


Marginal Likelihood (6)

[Figure: contour plot of the log-marginal likelihood, N=20; x-axis: log-noise from -6 to 0, y-axis: log-length-scales from -1 to 5]


Marginal Likelihood (7)

[Figure: contour plot of the log-marginal likelihood, N=50; x-axis: log-noise from -6 to 0, y-axis: log-length-scales from -1 to 5]


Marginal Likelihood (8)

[Figure: contour plot of the log-marginal likelihood, N=100; x-axis: log-noise from -6 to 0, y-axis: log-length-scales from -1 to 5]


Marginal Likelihood (9)

[Figure: contour plot of the log-marginal likelihood, N=200; x-axis: log-noise from -6 to 0, y-axis: log-length-scales from -1 to 5]


Marginal Likelihood and Parameter Learning

§ The marginal likelihood is non-convex

§ In particular in the very-small-data regime, a GP can end up in three different modes when optimizing the hyper-parameters:
  § Overfitting (unlikely, but possible)
  § Underfitting (everything is considered noise)
  § Good fit

§ Re-start hyper-parameter optimization from random initialization to mitigate the problem

§ With increasing data set size the GP typically ends up in the "good-fit" mode. Overfitting (indicator: small length-scales and small noise variance) is very unlikely.

§ Ideally, we would integrate the hyper-parameters out. Why can we not do this easily?
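
Random restarts can be sketched as follows. Everything here is an illustrative assumption rather than the lecture's procedure: the Gaussian kernel form exp(−r²/ℓ²), the log-scale parametrization, the box constraint, the toy data, and above all the crude derivative-free `local_search`, which merely stands in for a proper gradient-based optimizer.

```python
import numpy as np

def neg_lml(log_theta, X, y, jitter=1e-8):
    """Negative log-marginal likelihood; log_theta = log of (ell, sigma_f, sigma_n)."""
    if np.any(np.abs(log_theta) > 6.0):        # keep the search inside a numerically safe box
        return np.inf
    ell, sigma_f, sigma_n = np.exp(log_theta)
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K_theta = sigma_f**2 * np.exp(-sq_dist / ell**2) + (sigma_n**2 + jitter) * np.eye(len(X))
    sign, logdet = np.linalg.slogdet(K_theta)
    if sign <= 0:
        return np.inf
    quad = y @ np.linalg.solve(K_theta, y)
    return 0.5 * quad + 0.5 * logdet + 0.5 * len(X) * np.log(2.0 * np.pi)

def local_search(f, theta0, steps=200, step_size=0.5):
    """Coordinate-wise pattern search: accept a move only if it lowers f, else shrink the step."""
    theta, best = theta0.copy(), f(theta0)
    for _ in range(steps):
        improved = False
        for i in range(len(theta)):
            for delta in (step_size, -step_size):
                cand = theta.copy()
                cand[i] += delta
                val = f(cand)
                if val < best:
                    theta, best, improved = cand, val, True
        if not improved:
            step_size *= 0.5
            if step_size < 1e-3:
                break
    return theta, best

# Small toy data set (made up), i.e. the regime where several modes are most likely
rng = np.random.default_rng(0)
X = rng.uniform(-5.0, 5.0, size=(8, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(8)

starts = rng.uniform(-3.0, 3.0, size=(5, 3))          # random initial log hyper-parameters
results = [local_search(lambda t: neg_lml(t, X, y), s) for s in starts]
best_theta, best_val = min(results, key=lambda r: r[1])

for theta, val in results:
    print("ell, sigma_f, sigma_n =", np.round(np.exp(theta), 3), "  LML =", round(-val, 3))
print("best:", np.round(np.exp(best_theta), 3))
```

Printing every local solution, not only the best one, shows whether different restarts landed in different modes, for example one with a large noise level that treats everything as noise.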


Model Selection—Mean Function and Kernel

§ Assume we have a finite set of models M_i, each one specifying a mean function m_i and a kernel k_i. How do we find the best one?

§ Some options:
  § BIC, AIC (see CO-496)
  § Compare marginal likelihood values (assuming a uniform prior on the set of models)
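
Comparing marginal likelihood values under a uniform prior over models amounts to normalizing the evidences: p(M_i | X, y) is proportional to p(y | X, M_i). A small sketch, in which the log-evidence numbers are hypothetical placeholders and not results from the lecture:

```python
import numpy as np

# Hypothetical log-marginal likelihoods log p(y | X, M_i), one per candidate model
log_evidence = {"constant": -12.4, "linear": -10.9, "matern32": -8.1, "gaussian": -7.6}

names = list(log_evidence)
lml = np.array([log_evidence[n] for n in names])

# Uniform prior over models: normalize in log space (log-sum-exp) for numerical stability
log_norm = lml.max() + np.log(np.exp(lml - lml.max()).sum())
posterior = dict(zip(names, np.exp(lml - log_norm)))

best = max(posterior, key=posterior.get)
for name, prob in posterior.items():
    print(f"{name:10s} p(M | X, y) = {prob:.3f}")
print("selected model:", best)
```

With a uniform prior the ranking is the same as the ranking by log-marginal likelihood; the normalization only adds a sense of how decisive the difference is.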


Example

[Figure: GP regression fits to the same data under different kernels; x-axis: x from -4 to 4, y-axis: f(x) from -2 to 3]

Log-marginal likelihood (LML) per kernel:
§ Constant kernel: LML = -1.1073
§ Linear kernel: LML = -1.0065
§ Matern kernel: LML = -0.8625
§ Gaussian kernel: LML = -0.69308

§ Four different kernels (mean function fixed to m ≡ 0)
§ MAP hyper-parameters for each kernel
§ Log-marginal likelihood values for each (optimized) model


Application Areas

[Figure: heat map over angle in rad (−2 to 2) and angular velocity in rad/s (−5 to 5); colour scale from −2 to 8]

§ Reinforcement learning and robotics: model value functions and/or dynamics with GPs

§ Bayesian optimization (experimental design): model unknown utility functions with GPs

§ Geostatistics: spatial modeling (e.g., landscapes, resources)

§ Sensor networks

§ Time-series modeling and forecasting


Limitations of Gaussian Processes

Computational and memory complexity (training set size: $N$)

§ Training scales in $\mathcal{O}(N^3)$

§ Prediction (variances) scales in $\mathcal{O}(N^2)$

§ Memory requirement: $\mathcal{O}(ND + N^2)$

Practical limit: $N \approx 10{,}000$
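To get a feel for the quadratic memory term, the following minimal sketch estimates the storage needed for the $N \times N$ kernel matrix alone. It assumes dense double-precision (8-byte) entries; the function name is illustrative and does not come from the slides.

```python
def kernel_matrix_bytes(n, bytes_per_entry=8):
    """Bytes needed to store a dense n x n kernel matrix (assumes float64 entries)."""
    return n * n * bytes_per_entry


# Print the storage requirement for a few training set sizes.
for n in (1_000, 10_000, 100_000):
    gib = kernel_matrix_bytes(n) / 2**30
    print(f"N = {n:>7,d}: {gib:8.2f} GiB")
```

Under these assumptions the matrix takes roughly 0.75 GiB at N = 10,000 and roughly 75 GiB at N = 100,000, before counting the cubic cost of factorizing it.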


Tips and Tricks for Practitioners

§ To set initial hyper-parameters, use domain knowledge if possible.

§ Standardize input data and set initial length-scales $\ell$ to $\approx 0.5$.

§ Standardize targets $y$ and set the initial signal amplitude to $\sigma_f \approx 1$ (so the signal variance $\sigma_f^2 \approx 1$).

§ Often useful: set the initial noise level relatively high (e.g., $\sigma_n \approx 0.5 \times \sigma_f$, i.e., half the signal amplitude), even if you think your data have low noise. The optimization surface for your other parameters will be easier to move in.

§ When optimizing hyper-parameters, try random restarts or other tricks to avoid local optima.

§ Mitigate the problem of numerical instability (Cholesky decomposition of $K + \sigma_n^2 I$) by penalizing high signal-to-noise ratios $\sigma_f / \sigma_n$.
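The standardization and initialization tips can be sketched as follows. This is a minimal illustration for one-dimensional data: the data values, the `standardize` helper, and the dictionary keys are hypothetical, and only the initial values themselves come from the tips above.

```python
import statistics


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical 1-D training data, for illustration only.
x_raw = [0.5, 1.5, 2.0, 3.5, 5.0]
y_raw = [10.0, 12.0, 11.0, 15.0, 18.0]

x = standardize(x_raw)
y = standardize(y_raw)

# Initial hyper-parameters following the tips; an optimizer would refine them.
sigma_f = 1.0
init_hypers = {
    "length_scale": 0.5,       # inputs are standardized
    "sigma_f": sigma_f,        # targets are standardized
    "sigma_n": 0.5 * sigma_f,  # deliberately high initial noise level
}
```

For multi-dimensional inputs the same standardization would be applied to each input dimension separately.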


Appendix


The Gaussian Distribution

$$p(x \mid \mu, \Sigma) = (2\pi)^{-\frac{D}{2}}\, |\Sigma|^{-\frac{1}{2}} \exp\left(-\tfrac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right)$$

§ Mean vector $\mu$: average of the data

§ Covariance matrix $\Sigma$: spread of the data

[Figure: 1D density p(x) with data, mean, and 95% confidence interval (axes: x, p(x)); 2D joint density p(x, y) shown as a surface; 2D Gaussian over (x1, x2) with data, mean, and 95% confidence bound]
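As a concrete reading of the density formula, here is a minimal sketch that evaluates it for the special case $D = 2$, using the closed-form inverse and determinant of a 2x2 matrix. The function name and the nested-list representation are illustrative choices, not part of the slides.

```python
import math


def gaussian_pdf_2d(x, mu, sigma):
    """Density of N(mu, sigma) at x for D = 2, with sigma given as a 2x2 nested list."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    # (2 pi)^{-D/2} |Sigma|^{-1/2} exp(-quad / 2) with D = 2
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


# Density of a standard 2D Gaussian at its mean.
peak = gaussian_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

For general $D$ one would normally work with a Cholesky factor of $\Sigma$ and the log-density rather than an explicit inverse.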


Sampling from a Multivariate Gaussian

Objective

Generate a random sample $y \sim \mathcal{N}(\mu, \Sigma)$ from a $D$-dimensional joint Gaussian with covariance matrix $\Sigma$ and mean vector $\mu$.

However, we only have access to a random number generator that can sample $x$ from $\mathcal{N}(0, I)$ ...

Exploit that affine transformations $y = Ax + b$ of a Gaussian random variable $x$ remain Gaussian:

§ Mean: $\mathbb{E}_x[Ax + b] = A\,\mathbb{E}_x[x] + b$

§ Covariance: $\mathbb{V}_x[Ax + b] = A\,\mathbb{V}_x[x]\,A^\top$

1. Find conditions for $A$, $b$ to match the mean of $y$

2. Find conditions for $A$, $b$ to match the covariance of $y$
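Both steps are short once the moments of $x \sim \mathcal{N}(0, I)$, namely $\mathbb{E}_x[x] = 0$ and $\mathbb{V}_x[x] = I$, are substituted. The following is a sketch of that calculation using only the two identities above:

```latex
\begin{align}
\mathbb{E}_x[Ax + b] &= A\,\mathbb{E}_x[x] + b = b
  && \Rightarrow\quad b = \mu \\
\mathbb{V}_x[Ax + b] &= A\,\mathbb{V}_x[x]\,A^\top = A A^\top
  && \Rightarrow\quad A A^\top = \Sigma
\end{align}
```

Any matrix $A$ with $A A^\top = \Sigma$ satisfies the second condition; a Cholesky factor of $\Sigma$ is one common choice.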


Sampling from a Multivariate Gaussian (2)

Objective

Generate a random sample $y \sim \mathcal{N}(\mu, \Sigma)$ from a $D$-dimensional joint Gaussian with covariance matrix $\Sigma$ and mean vector $\mu$.

    x = randn(D,1);            % sample x ~ N(0, I)
    y = chol(Sigma)'*x + mu;   % scale x and add offset

Here chol(Sigma) is the Cholesky factor $L$, such that $L^\top L = \Sigma$.

Therefore, the mean and covariance of $y$ are

$$\mathbb{E}[y] = \bar{y} = \mathbb{E}[L^\top x + \mu] = L^\top \mathbb{E}[x] + \mu = \mu$$

$$\mathrm{Cov}[y] = \mathbb{E}[(y - \bar{y})(y - \bar{y})^\top] = \mathbb{E}[L^\top x x^\top L] = L^\top \mathbb{E}[x x^\top] L = L^\top L = \Sigma$$
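The same recipe can be sketched in Python using only the standard library. Note the convention change: MATLAB's `chol` returns an upper-triangular factor, whereas this sketch computes the lower-triangular factor directly, so the sample is formed as $y = Lx + \mu$ with $L L^\top = \Sigma$. The helper names are illustrative, and the code assumes $\Sigma$ is symmetric positive definite.

```python
import math
import random


def cholesky_lower(sigma):
    """Return lower-triangular L with L L^T = sigma (sigma symmetric positive definite)."""
    d = len(sigma)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def sample_mvn(mu, sigma, rng=random):
    """Draw one sample y = L x + mu, where x ~ N(0, I)."""
    d = len(mu)
    L = cholesky_lower(sigma)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # x ~ N(0, I)
    # Scale by the Cholesky factor and add the offset.
    return [mu[i] + sum(L[i][k] * x[k] for k in range(i + 1)) for i in range(d)]


# Example with an illustrative 2D mean and covariance.
y = sample_mvn([1.0, -1.0], [[4.0, 2.0], [2.0, 3.0]], random.Random(0))
```

When many samples from the same distribution are needed, the factor would be computed once and reused instead of being recomputed on every call.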


Conditional

[Figure: joint density p(x, y) over x and y, with the observation y and the resulting conditional p(x|y)]

$$p(x, y) = \mathcal{N}\left(\begin{bmatrix}\mu_x \\ \mu_y\end{bmatrix}, \begin{bmatrix}\Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy}\end{bmatrix}\right)$$

$$p(x \mid y) = \mathcal{N}\left(\mu_{x|y}, \Sigma_{x|y}\right)$$

$$\mu_{x|y} = \mu_x + \Sigma_{xy}\,\Sigma_{yy}^{-1}\,(y - \mu_y)$$

$$\Sigma_{x|y} = \Sigma_{xx} - \Sigma_{xy}\,\Sigma_{yy}^{-1}\,\Sigma_{yx}$$

The conditional $p(x \mid y)$ is also Gaussian, which is computationally convenient.
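For scalar $x$ and $y$ the conditioning formulas reduce to a few arithmetic operations. The sketch below illustrates this special case; the function name and the example numbers are made up for illustration.

```python
def condition_gaussian(mu_x, mu_y, s_xx, s_xy, s_yy, y_obs):
    """Mean and variance of p(x | y = y_obs) for scalar x and y."""
    gain = s_xy / s_yy                    # Sigma_xy Sigma_yy^{-1}
    mean = mu_x + gain * (y_obs - mu_y)   # mu_{x|y}
    var = s_xx - gain * s_xy              # Sigma_{x|y}; Sigma_yx = Sigma_xy for scalars
    return mean, var


# Illustrative joint: zero means, Var[x] = 1, Var[y] = 2, Cov[x, y] = 0.5.
mean, var = condition_gaussian(0.0, 0.0, 1.0, 0.5, 2.0, y_obs=2.0)
```

With these numbers the conditional mean is 0.5 and the conditional variance is 0.875, which is smaller than the prior variance of 1: observing $y$ reduces the uncertainty about $x$.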


Marginal

[Figure: joint density p(x, y) over x and y, with the marginal p(x)]

$$p(x, y) = \mathcal{N}\left(\begin{bmatrix}\mu_x \\ \mu_y\end{bmatrix}, \begin{bmatrix}\Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy}\end{bmatrix}\right)$$

Marginal distribution:

$$p(x) = \int p(x, y)\, dy = \mathcal{N}\left(\mu_x, \Sigma_{xx}\right)$$

§ The marginal of a joint Gaussian distribution is Gaussian

§ Intuitively: ignore (integrate out) everything you are not interested in
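In code, "integrating out" the unwanted variables of a joint Gaussian amounts to selecting the relevant entries of the mean vector and the relevant rows and columns of the covariance matrix. A minimal sketch, with an illustrative function name and made-up numbers:

```python
def marginal(mu, sigma, keep):
    """Mean and covariance of the marginal of N(mu, sigma) over the indices in `keep`."""
    mu_keep = [mu[i] for i in keep]
    sigma_keep = [[sigma[i][j] for j in keep] for i in keep]
    return mu_keep, sigma_keep


# Illustrative 3D joint Gaussian; keep variables 0 and 2, integrate out variable 1.
mu = [1.0, 2.0, 3.0]
sigma = [
    [2.0, 0.3, 0.1],
    [0.3, 1.0, 0.4],
    [0.1, 0.4, 3.0],
]
mu_m, sigma_m = marginal(mu, sigma, keep=[0, 2])
```

No integral is ever computed numerically, which is why the marginal stays cheap no matter how many variables are dropped.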


The Gaussian Distribution in the Limit

Consider the joint Gaussian distribution $p(x, \tilde{x})$, where $x \in \mathbb{R}^D$ and $\tilde{x} \in \mathbb{R}^k$, $k \to \infty$, are random variables.

Then

$$p(x, \tilde{x}) = \mathcal{N}\left(\begin{bmatrix}\mu_x \\ \mu_{\tilde{x}}\end{bmatrix}, \begin{bmatrix}\Sigma_{xx} & \Sigma_{x\tilde{x}} \\ \Sigma_{\tilde{x}x} & \Sigma_{\tilde{x}\tilde{x}}\end{bmatrix}\right)$$

where $\Sigma_{\tilde{x}\tilde{x}} \in \mathbb{R}^{k \times k}$ and $\Sigma_{x\tilde{x}} \in \mathbb{R}^{D \times k}$, $k \to \infty$.

However, the marginal remains finite:

$$p(x) = \int p(x, \tilde{x})\, d\tilde{x} = \mathcal{N}\left(\mu_x, \Sigma_{xx}\right)$$

where we integrate out an infinite number of random variables $\tilde{x}_i$.


Marginal and Conditional in the Limit

§ In practice, we consider finite training and test data $x_{\text{train}}$, $x_{\text{test}}$

§ Then, $x = \{x_{\text{train}}, x_{\text{test}}, x_{\text{other}}\}$ ($x_{\text{other}}$ plays the role of the integrated-out variable from the previous slide)

$$p(x) = \mathcal{N}\left(\begin{bmatrix}\mu_{\text{train}} \\ \mu_{\text{test}} \\ \mu_{\text{other}}\end{bmatrix}, \begin{bmatrix}\Sigma_{\text{train}} & \Sigma_{\text{train,test}} & \Sigma_{\text{train,other}} \\ \Sigma_{\text{test,train}} & \Sigma_{\text{test}} & \Sigma_{\text{test,other}} \\ \Sigma_{\text{other,train}} & \Sigma_{\text{other,test}} & \Sigma_{\text{other}}\end{bmatrix}\right)$$

$$p(x_{\text{train}}, x_{\text{test}}) = \int p(x_{\text{train}}, x_{\text{test}}, x_{\text{other}})\, dx_{\text{other}}$$

$$p(x_{\text{test}} \mid x_{\text{train}}) = \mathcal{N}\left(\mu_*, \Sigma_*\right)$$

$$\mu_* = \mu_{\text{test}} + \Sigma_{\text{test,train}}\,\Sigma_{\text{train}}^{-1}\,(x_{\text{train}} - \mu_{\text{train}})$$

$$\Sigma_* = \Sigma_{\text{test}} - \Sigma_{\text{test,train}}\,\Sigma_{\text{train}}^{-1}\,\Sigma_{\text{train,test}}$$
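The expressions for $\mu_*$ and $\Sigma_*$ can be traced through a tiny example. The sketch below makes several assumptions that are not on this slide: the random variables are function values at scalar input locations, the prior mean is zero, the covariance blocks come from a Gaussian (squared-exponential) kernel with unit amplitude, there is no observation noise, and there are exactly two training points so that $\Sigma_{\text{train}}^{-1}$ has a closed form. All names are illustrative.

```python
import math


def kernel(a, b, length_scale=1.0):
    """Gaussian (squared-exponential) covariance; an assumed choice for this sketch."""
    return math.exp(-0.5 * (a - b) ** 2 / length_scale ** 2)


def predict(x_train, f_train, x_test):
    """Mean and variance of the test value given exactly two training values (zero prior mean)."""
    (x1, x2), (f1, f2) = x_train, f_train
    # Entries of the symmetric 2x2 matrix Sigma_train.
    a, b, d = kernel(x1, x1), kernel(x1, x2), kernel(x2, x2)
    det = a * d - b * b
    # Sigma_test,train as a row vector (k1, k2).
    k1, k2 = kernel(x_test, x1), kernel(x_test, x2)
    # Sigma_train^{-1} f_train via the explicit 2x2 inverse.
    alpha1 = (d * f1 - b * f2) / det
    alpha2 = (-b * f1 + a * f2) / det
    mean = k1 * alpha1 + k2 * alpha2
    # Sigma_train^{-1} Sigma_train,test via the same inverse.
    v1 = (d * k1 - b * k2) / det
    v2 = (-b * k1 + a * k2) / det
    var = kernel(x_test, x_test) - (k1 * v1 + k2 * v2)
    return mean, var


# Predict at a training input: the mean reproduces the training value, the variance vanishes.
mean_at_train, var_at_train = predict((0.0, 1.0), (0.3, -0.7), x_test=0.0)
```

Far from both training inputs the cross-covariances vanish, so the prediction falls back to the prior (mean 0, variance 1 under the assumed kernel).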


Gaussian Process Training: Hierarchical Inference

§ Level-1 inference (posterior on $f$):

$$p(f \mid X, y, \theta) = \frac{p(y \mid X, f)\, p(f \mid X, \theta)}{p(y \mid X, \theta)}$$

$$p(y \mid X, \theta) = \int p(y \mid f, X)\, p(f \mid X, \theta)\, df$$

§ Level-2 inference (posterior on $\theta$):

$$p(\theta \mid X, y) = \frac{p(y \mid X, \theta)\, p(\theta)}{p(y \mid X)}$$

[Figure: graphical model with nodes θ, σ_n, x_i, y_i, and f, with a plate over the N data points]


GP as the Limit of an Infinite RBF Network

Consider the universal function approximator

$$f(x) = \sum_{i \in \mathbb{Z}} \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \gamma_n \exp\left(-\frac{\left(x - \left(i + \frac{n}{N}\right)\right)^2}{\lambda^2}\right), \quad x \in \mathbb{R},\ \lambda \in \mathbb{R}^+$$

with $\gamma_n \sim \mathcal{N}(0, 1)$ (random weights).

Gaussian-shaped basis functions (with variance $\lambda^2/2$) everywhere on the real axis:

$$f(x) = \sum_{i \in \mathbb{Z}} \int_{i}^{i+1} \gamma(s) \exp\left(-\frac{(x - s)^2}{\lambda^2}\right) ds = \int_{-\infty}^{\infty} \gamma(s) \exp\left(-\frac{(x - s)^2}{\lambda^2}\right) ds$$

§ Mean: $\mathbb{E}[f(x)] = 0$

§ Covariance: $\mathrm{Cov}[f(x), f(x')] = \theta_1^2 \exp\left(-\frac{(x - x')^2}{2\lambda^2}\right)$ for suitable $\theta_1^2$

GP with mean 0 and Gaussian covariance function
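The covariance can be checked with a short calculation. This is a sketch that assumes $\gamma(s)$ is white noise with $\mathbb{E}[\gamma(s)\gamma(s')] = \delta(s - s')$, which the slide does not state explicitly:

```latex
\begin{align}
\mathrm{Cov}[f(x), f(x')]
  &= \int_{-\infty}^{\infty}
     \exp\left(-\frac{(x - s)^2}{\lambda^2}\right)
     \exp\left(-\frac{(x' - s)^2}{\lambda^2}\right) ds \\
  &= \exp\left(-\frac{(x - x')^2}{2\lambda^2}\right)
     \int_{-\infty}^{\infty}
     \exp\left(-\frac{2\left(s - \frac{x + x'}{2}\right)^2}{\lambda^2}\right) ds \\
  &= \lambda \sqrt{\frac{\pi}{2}}\,
     \exp\left(-\frac{(x - x')^2}{2\lambda^2}\right)
\end{align}
```

The second line completes the square using $(x - s)^2 + (x' - s)^2 = 2\left(s - \frac{x + x'}{2}\right)^2 + \frac{(x - x')^2}{2}$. Under the stated assumption, the amplitude works out to $\theta_1^2 = \lambda\sqrt{\pi/2}$, and the factor $2\lambda^2$ in the exponent matches the covariance given above.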


References I

[1] N. A. C. Cressie. Statistics for Spatial Data. Wiley-Interscience, 1993.

[2] M. P. Deisenroth and S. Mohamed. Expectation Propagation in Gaussian Process Dynamical Systems. In Advances in Neural Information Processing Systems, pages 2618–2626, 2012.

[3] M. P. Deisenroth and J. W. Ng. Distributed Gaussian Processes. In Proceedings of the International Conference on Machine Learning, 2015.

[4] M. P. Deisenroth, C. E. Rasmussen, and J. Peters. Gaussian Process Dynamic Programming. Neurocomputing, 72(7–9):1508–1524, March 2009.

[5] M. P. Deisenroth, R. Turner, M. Huber, U. D. Hanebeck, and C. E. Rasmussen. Robust Filtering and Smoothing with Gaussian Processes. IEEE Transactions on Automatic Control, 57(7):1865–1871, 2012.

[6] R. Frigola, F. Lindsten, T. B. Schön, and C. E. Rasmussen. Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, pages 3156–3164. Curran Associates, Inc., 2013.

[7] J. Kocijan, R. Murray-Smith, C. E. Rasmussen, and A. Girard. Gaussian Process Model Based Predictive Control. In Proceedings of the 2004 American Control Conference (ACC 2004), pages 2214–2219, Boston, MA, USA, June–July 2004.

[8] A. Krause, A. Singh, and C. Guestrin. Near-Optimal Sensor Placements in Gaussian Processes: Theory, Efficient Algorithms and Empirical Studies. Journal of Machine Learning Research, 9:235–284, February 2008.

[9] J. R. Lloyd, D. Duvenaud, R. Grosse, J. B. Tenenbaum, and Z. Ghahramani. Automatic Construction and Natural-Language Description of Nonparametric Regression Models. In AAAI Conference on Artificial Intelligence, pages 1–11, 2014.

[10] M. A. Osborne, S. J. Roberts, A. Rogers, S. D. Ramchurn, and N. R. Jennings. Towards Real-Time Information Processing of Sensor Network Data Using Computationally Efficient Multi-output Gaussian Processes. In Proceedings of the International Conference on Information Processing in Sensor Networks, pages 109–120. IEEE Computer Society, 2008.

[11] J. Quiñonero-Candela and C. E. Rasmussen. A Unifying View of Sparse Approximate Gaussian Process Regression. Journal of Machine Learning Research, 6(2):1939–1960, 2005.

[12] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. The MIT Press, Cambridge, MA, USA, 2006.

[13] S. Roberts, M. A. Osborne, M. Ebden, S. Reece, N. Gibson, and S. Aigrain. Gaussian Processes for Time Series Modelling. Philosophical Transactions of the Royal Society (Part A), 371(1984), February 2013.
