Session 1: Gaussian Processes

Neil D. Lawrence and Raquel Urtasun

CVPR, 16th June 2012



Outline

1 The Gaussian Density

2 Covariance from Basis Functions

3 Basis Function Representations

4 Constructing Covariance

5 GP Limitations

6 Conclusions


The Gaussian Density

Perhaps the most common probability density.

$$p(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right) = \mathcal{N}\left(y \mid \mu, \sigma^2\right)$$

The Gaussian density.
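As a small numerical sketch (ours, not from the slides), the density can be evaluated directly; the example reuses the height distribution introduced on the next slide:

```python
import numpy as np

def gaussian_density(y, mu, sigma2):
    """Univariate Gaussian density p(y | mu, sigma^2)."""
    return np.exp(-(y - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Density of a height of 1.8 m under mu = 1.7, sigma^2 = 0.0225.
print(gaussian_density(1.8, mu=1.7, sigma2=0.0225))
```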

Gaussian Density

[Figure: the Gaussian PDF p(h|μ, σ²) plotted against h, height/m.]

The Gaussian PDF with μ = 1.7 and variance σ² = 0.0225, mean shown as a red line. It could represent the heights of a population of students.


Two Important Gaussian Properties

1. The sum of (independent) Gaussian variables is also Gaussian:

$$y_i \sim \mathcal{N}(\mu_i, \sigma_i^2) \quad\Rightarrow\quad \sum_{i=1}^{n} y_i \sim \mathcal{N}\left(\sum_{i=1}^{n} \mu_i,\; \sum_{i=1}^{n} \sigma_i^2\right)$$

(Aside: as the number of terms increases, the sum of non-Gaussian, finite-variance variables also becomes Gaussian [central limit theorem].)

2. Scaling a Gaussian leads to a Gaussian:

$$y \sim \mathcal{N}(\mu, \sigma^2) \quad\Rightarrow\quad wy \sim \mathcal{N}(w\mu, w^2\sigma^2)$$
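Both properties are easy to sanity-check by Monte Carlo; a minimal sketch with illustrative parameter values (our choice, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])        # means mu_i (illustrative)
sigma2 = np.array([0.5, 1.5, 2.0])     # variances sigma_i^2

# Property 1: sum of independent Gaussians.
samples = rng.normal(mu, np.sqrt(sigma2), size=(100_000, 3)).sum(axis=1)
print(samples.mean(), samples.var())   # approx mu.sum() = -0.5, sigma2.sum() = 4.0

# Property 2: scaling by w.
w = 3.0
y = w * rng.normal(1.0, np.sqrt(0.5), size=100_000)
print(y.mean(), y.var())               # approx w*mu = 3.0, w^2*sigma^2 = 4.5
```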


Two Simultaneous Equations

A system of two simultaneous equations with two unknowns:

$$y_1 = m x_1 + c$$
$$y_2 = m x_2 + c$$

Subtracting the second from the first,

$$y_1 - y_2 = m(x_1 - x_2) \quad\Rightarrow\quad \frac{y_1 - y_2}{x_1 - x_2} = m$$

so

$$m = \frac{y_2 - y_1}{x_2 - x_1}, \qquad c = y_1 - m x_1.$$

[Figure: the straight line through the two observations; the slope is m = (y2 − y1)/(x2 − x1) and c is the intercept.]

Two Simultaneous Equations

How do we deal with three simultaneous equations with only two unknowns?

$$y_1 = m x_1 + c, \qquad y_2 = m x_2 + c, \qquad y_3 = m x_3 + c$$

[Figure: the line y = mx + c and the observations.]

Overdetermined System

With two unknowns and two observations:

$$y_1 = m x_1 + c, \qquad y_2 = m x_2 + c$$

An additional observation leads to an overdetermined system:

$$y_3 = m x_3 + c$$

This problem is solved through a noise model ε ~ N(0, σ²):

$$y_1 = m x_1 + c + \varepsilon_1, \qquad y_2 = m x_2 + c + \varepsilon_2, \qquad y_3 = m x_3 + c + \varepsilon_3$$


Noise Models

We aren't modelling the entire system.

The noise model accounts for the mismatch between model and data.

The Gaussian model is justified by appeal to the central limit theorem.

Other models are also possible (Student-t for heavy tails).

Maximum likelihood with Gaussian noise leads to least squares (see the sketch below).
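The last bullet is worth making concrete: with Gaussian noise, maximising the likelihood of y = mx + c + ε is exactly least squares. A minimal sketch with made-up data (the slides give no numbers here):

```python
import numpy as np

x = np.array([0.5, 1.0, 2.5])                # three observations (illustrative)
y = np.array([1.9, 2.6, 5.2])
A = np.stack([x, np.ones_like(x)], axis=1)   # design matrix [x, 1]
(m, c), *_ = np.linalg.lstsq(A, y, rcond=None)
print(m, c)                                  # maximum likelihood slope and intercept
```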

Underdetermined System

What about two unknowns and one observation?

$$y_1 = m x_1 + c$$

[Figure: a single observation in the (x, y) plane.]

Underdetermined System

Can compute m given c:

$$m = \frac{y_1 - c}{x_1}$$

Sampling c repeatedly gives a different solution each time:

c = 1.75 ⇒ m = 1.25
c = −0.777 ⇒ m = 3.78
c = −4.01 ⇒ m = 7.01
c = −0.718 ⇒ m = 3.72
c = 2.45 ⇒ m = 0.545
c = −0.657 ⇒ m = 3.66
c = −3.13 ⇒ m = 6.13
c = −1.47 ⇒ m = 4.47

Assuming

$$c \sim \mathcal{N}(0, 4),$$

we find a distribution of solutions.

[Figure: the single observation with one line per sampled (c, m) pair.]
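A sketch of this sampling procedure; the observation (x1, y1) is made up, since the slides do not state it:

```python
import numpy as np

rng = np.random.default_rng(1)
x1, y1 = 1.0, 3.0                        # a single illustrative observation
c = rng.normal(0.0, np.sqrt(4.0), size=8)  # c ~ N(0, 4)
m = (y1 - c) / x1                        # solve for m given each sampled c
for ci, mi in zip(c, m):
    print(f"c = {ci:6.3f}  =>  m = {mi:6.3f}")
```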

Probability for Under- and Overdetermined

To deal with the overdetermined system we introduced a probability distribution for the 'variable', ε_i.

For the underdetermined system we introduced a probability distribution for the 'parameter', c.

This is known as a Bayesian treatment.

For general Bayesian inference we need multivariate priors.

E.g. for multivariate linear regression:

$$y_i = \sum_j w_j x_{i,j} + \varepsilon_i = \mathbf{w}^\top \mathbf{x}_{i,:} + \varepsilon_i$$

(where we've dropped c for convenience), we need a prior over w.

This motivates a multivariate Gaussian density.

We will use the multivariate Gaussian to put a prior directly on the function (a Gaussian process).

Multivariate Regression Likelihood

Recall the multivariate regression likelihood:

$$p(\mathbf{y}|\mathbf{X},\mathbf{w}) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \mathbf{w}^\top \mathbf{x}_{i,:}\right)^2\right)$$

Now use a multivariate Gaussian prior:

$$p(\mathbf{w}) = \frac{1}{(2\pi\alpha)^{p/2}} \exp\left(-\frac{1}{2\alpha}\mathbf{w}^\top\mathbf{w}\right)$$


Posterior Density

Once again we want to know the posterior:

$$p(\mathbf{w}|\mathbf{y},\mathbf{X}) \propto p(\mathbf{y}|\mathbf{X},\mathbf{w})\, p(\mathbf{w})$$

And we can compute it by completing the square:

$$\log p(\mathbf{w}|\mathbf{y},\mathbf{X}) = -\frac{1}{2\sigma^2}\sum_{i=1}^{n} y_i^2 + \frac{1}{\sigma^2}\sum_{i=1}^{n} y_i \mathbf{x}_{i,:}^\top\mathbf{w} - \frac{1}{2\sigma^2}\sum_{i=1}^{n} \mathbf{w}^\top\mathbf{x}_{i,:}\mathbf{x}_{i,:}^\top\mathbf{w} - \frac{1}{2\alpha}\mathbf{w}^\top\mathbf{w} + \text{const.}$$

$$p(\mathbf{w}|\mathbf{y},\mathbf{X}) = \mathcal{N}(\mathbf{w}|\boldsymbol{\mu}_w, \mathbf{C}_w)$$

$$\mathbf{C}_w = \left(\sigma^{-2}\mathbf{X}^\top\mathbf{X} + \alpha^{-1}\mathbf{I}\right)^{-1} \qquad \boldsymbol{\mu}_w = \mathbf{C}_w \sigma^{-2}\mathbf{X}^\top\mathbf{y}$$
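These two formulae translate directly into a few lines of linear algebra; a sketch on synthetic data (dimensions and hyperparameters are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma2, alpha = 50, 3, 0.1, 1.0
X = rng.normal(size=(n, p))
w_true = rng.normal(size=p)
y = X @ w_true + rng.normal(0.0, np.sqrt(sigma2), size=n)

# Cw = (sigma^-2 X'X + alpha^-1 I)^-1  and  mu_w = Cw sigma^-2 X'y
Cw = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / alpha)
mu_w = Cw @ X.T @ y / sigma2
print(mu_w, w_true)   # posterior mean should be close to the generating weights
```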


Bayesian vs Maximum Likelihood

Note the similarity between the posterior mean

$$\boldsymbol{\mu}_w = \left(\sigma^{-2}\mathbf{X}^\top\mathbf{X} + \alpha^{-1}\mathbf{I}\right)^{-1}\sigma^{-2}\mathbf{X}^\top\mathbf{y}$$

and the maximum likelihood solution

$$\hat{\mathbf{w}} = \left(\mathbf{X}^\top\mathbf{X}\right)^{-1}\mathbf{X}^\top\mathbf{y}$$

Marginal Likelihood is Computed as Normalizer

$$p(\mathbf{w}|\mathbf{y},\mathbf{X})\, p(\mathbf{y}|\mathbf{X}) = p(\mathbf{y}|\mathbf{w},\mathbf{X})\, p(\mathbf{w})$$

Marginal Likelihood

Can compute the marginal likelihood as:

$$p(\mathbf{y}|\mathbf{X}, \alpha, \sigma) = \mathcal{N}\left(\mathbf{y}\,|\,\mathbf{0},\; \alpha\mathbf{X}\mathbf{X}^\top + \sigma^2\mathbf{I}\right)$$
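A sketch of evaluating this quantity numerically, assuming the Gaussian above (the data and hyperparameters below are illustrative):

```python
import numpy as np

def log_marginal_likelihood(X, y, alpha, sigma2):
    """log N(y | 0, alpha * X X^T + sigma^2 I)."""
    n = len(y)
    K = alpha * X @ X.T + sigma2 * np.eye(n)
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(K, y))

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 2))
y = X @ np.array([0.5, -1.0]) + rng.normal(0.0, 0.3, size=20)
print(log_marginal_likelihood(X, y, alpha=1.0, sigma2=0.09))
```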

Two Dimensional Gaussian

Consider height, h/m, and weight, w/kg.

Could sample height from a distribution:

$$h \sim \mathcal{N}(1.7, 0.0225)$$

And similarly weight:

$$w \sim \mathcal{N}(75, 36)$$

Height and Weight Models

[Figure: the marginal distributions p(h) against h/m and p(w) against w/kg.]

Gaussian distributions for height and weight.

Sampling Two Dimensional Variables

[Figure: the joint distribution over (h, w) alongside the marginal distributions p(h) and p(w), with samples accumulating.]

Sample height and weight one after the other and plot against each other.
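A sketch of that sampling procedure, using the height and weight distributions above:

```python
import numpy as np

rng = np.random.default_rng(4)
h = rng.normal(1.7, np.sqrt(0.0225), size=1000)   # height / m
w = rng.normal(75.0, np.sqrt(36.0), size=1000)    # weight / kg
print(np.corrcoef(h, w)[0, 1])   # near zero: the samples are drawn independently
```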


Independence Assumption

This assumes height and weight are independent:

$$p(h, w) = p(h)\, p(w)$$

In reality they are dependent (body mass index = w/h²).

Sampling Two Dimensional Variables

[Figure: samples from a correlated joint distribution over (h, w), again shown alongside the marginal distributions p(h) and p(w).]


Independent Gaussians

$$p(w, h) = p(w)\, p(h)$$

$$p(w, h) = \frac{1}{\sqrt{2\pi\sigma_1^2}\sqrt{2\pi\sigma_2^2}} \exp\left(-\frac{1}{2}\left(\frac{(w-\mu_1)^2}{\sigma_1^2} + \frac{(h-\mu_2)^2}{\sigma_2^2}\right)\right)$$

$$p(w, h) = \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2}} \exp\left(-\frac{1}{2}\left(\begin{bmatrix}w\\h\end{bmatrix} - \begin{bmatrix}\mu_1\\\mu_2\end{bmatrix}\right)^\top \begin{bmatrix}\sigma_1^2 & 0\\0 & \sigma_2^2\end{bmatrix}^{-1} \left(\begin{bmatrix}w\\h\end{bmatrix} - \begin{bmatrix}\mu_1\\\mu_2\end{bmatrix}\right)\right)$$

In vector form, with y = (w, h)ᵀ and diagonal covariance D:

$$p(\mathbf{y}) = \frac{1}{2\pi|\mathbf{D}|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{y}-\boldsymbol{\mu})^\top \mathbf{D}^{-1}(\mathbf{y}-\boldsymbol{\mu})\right)$$

Correlated Gaussian

Form a correlated density from the original by rotating the data space using a matrix R:

$$p(\mathbf{y}) = \frac{1}{2\pi|\mathbf{D}|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{R}^\top\mathbf{y} - \mathbf{R}^\top\boldsymbol{\mu})^\top \mathbf{D}^{-1}(\mathbf{R}^\top\mathbf{y} - \mathbf{R}^\top\boldsymbol{\mu})\right)$$

$$= \frac{1}{2\pi|\mathbf{D}|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})^\top \mathbf{R}\mathbf{D}^{-1}\mathbf{R}^\top(\mathbf{y} - \boldsymbol{\mu})\right)$$

This gives a covariance matrix:

$$\mathbf{C}^{-1} = \mathbf{R}\mathbf{D}^{-1}\mathbf{R}^\top, \qquad \text{i.e.}\quad \mathbf{C} = \mathbf{R}\mathbf{D}\mathbf{R}^\top,$$

so that

$$p(\mathbf{y}) = \frac{1}{2\pi|\mathbf{C}|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{y}-\boldsymbol{\mu})^\top \mathbf{C}^{-1}(\mathbf{y}-\boldsymbol{\mu})\right)$$
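A sketch of the construction C = R D Rᵀ, with an illustrative rotation angle and axis-aligned variances of our choosing:

```python
import numpy as np

theta = np.pi / 4                                   # illustrative rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
D = np.diag([4.0, 0.25])                            # axis-aligned variances
C = R @ D @ R.T                                     # correlated covariance

rng = np.random.default_rng(5)
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=C, size=1000)
print(np.cov(samples.T))                            # empirical covariance, approx C
```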

Recall Univariate Gaussian Properties

1. The sum of (independent) Gaussian variables is also Gaussian:

$$y_i \sim \mathcal{N}(\mu_i, \sigma_i^2) \quad\Rightarrow\quad \sum_{i=1}^{n} y_i \sim \mathcal{N}\left(\sum_{i=1}^{n} \mu_i,\; \sum_{i=1}^{n} \sigma_i^2\right)$$

2. Scaling a Gaussian leads to a Gaussian:

$$y \sim \mathcal{N}(\mu, \sigma^2) \quad\Rightarrow\quad wy \sim \mathcal{N}(w\mu, w^2\sigma^2)$$


Multivariate Consequence

If

$$\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$

and

$$\mathbf{y} = \mathbf{W}\mathbf{x},$$

then

$$\mathbf{y} \sim \mathcal{N}\left(\mathbf{W}\boldsymbol{\mu},\, \mathbf{W}\boldsymbol{\Sigma}\mathbf{W}^\top\right)$$
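An empirical check of this affine property (all matrices below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
W = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, -1.0]])  # maps 2-d x to 3-d y

x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ W.T
print(y.mean(axis=0))   # approx W @ mu
print(np.cov(y.T))      # approx W @ Sigma @ W.T
```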


Sampling a Function

Multivariate Gaussians

We will consider a Gaussian with a particular structure of covariance matrix.

Generate a single sample from this 25-dimensional Gaussian distribution, f = [f1, f2, ..., f25].

We will plot these points against their index.
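A sketch of generating such a sample. The covariance here is built with the exponentiated quadratic kernel defined later in the session; the lengthscale and input range are our illustrative choices:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 25)[:, None]        # inputs indexing the 25 dimensions
sq_dist = (x - x.T) ** 2
K = 1.0 * np.exp(-sq_dist / (2 * 0.2 ** 2))    # alpha = 1, lengthscale 0.2 (illustrative)
K += 1e-8 * np.eye(25)                         # jitter for numerical stability

rng = np.random.default_rng(7)
f = rng.multivariate_normal(np.zeros(25), K)   # one sample f = [f1, ..., f25]
print(f)                                       # plot against index to see a smooth curve
```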

Gaussian Distribution Sample

[Figure: (a) a 25-dimensional correlated random variable, values fi plotted against index i; (b) a colormap showing the correlations between dimensions, on a scale from 0 to 1.]

Figure: A sample from a 25-dimensional Gaussian distribution.


Gaussian Distribution Sample

[Figure: (a) the same 25-dimensional sample; (b) zooming in on the first two dimensions, the correlation between f1 and f2:]

$$\begin{bmatrix} 1 & 0.96587 \\ 0.96587 & 1 \end{bmatrix}$$

Figure: A sample from a 25-dimensional Gaussian distribution.

Prediction of f2 from f1

[Figure: a single contour of the joint density over (f1, f2), with covariance]

$$\begin{bmatrix} 1 & 0.96587 \\ 0.96587 & 1 \end{bmatrix}$$

The single contour of the Gaussian density represents the joint distribution, p(f1, f2).

We observe that f1 = −0.313.

Conditional density: p(f2|f1 = −0.313).


Prediction with Correlated Gaussians

Prediction of f2 from f1 requires the conditional density.

The conditional density is also Gaussian:

$$p(f_2|f_1) = \mathcal{N}\left(f_2 \,\middle|\, \frac{k_{1,2}}{k_{1,1}} f_1,\; k_{2,2} - \frac{k_{1,2}^2}{k_{1,1}}\right)$$

where the covariance of the joint density is given by

$$\mathbf{K} = \begin{bmatrix} k_{1,1} & k_{1,2} \\ k_{2,1} & k_{2,2} \end{bmatrix}$$
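Plugging in the joint covariance and observation from the slides gives the conditional mean and variance directly:

```python
import numpy as np

K = np.array([[1.0, 0.96587],
              [0.96587, 1.0]])   # joint covariance of (f1, f2) from the slides
f1 = -0.313                      # observed value

cond_mean = K[0, 1] / K[0, 0] * f1              # (k12 / k11) * f1
cond_var = K[1, 1] - K[0, 1] ** 2 / K[0, 0]     # k22 - k12^2 / k11
print(cond_mean, cond_var)
```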

Prediction of f5 from f1

[Figure: a single contour of the joint density over (f1, f5), with covariance]

$$\begin{bmatrix} 1 & 0.57375 \\ 0.57375 & 1 \end{bmatrix}$$

The single contour of the Gaussian density represents the joint distribution, p(f1, f5).

We observe that f1 = −0.313.

Conditional density: p(f5|f1 = −0.313).


Prediction with Correlated Gaussians

Prediction of f∗ from f requires multivariate conditional density.

Multivariate conditional density is also Gaussian.

p(f∗|f) = N(f∗|K∗,fK−1

f,f f,K∗,∗ −K∗,fK−1f,f Kf,∗

)

Here covariance of joint density is given by

K =

[Kf,f K∗,fKf,∗ K∗,∗

]

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 34 / 74

Page 128: Session 1: Gaussian Processesurtasun/tutorials/gp_cvpr12_session1.pdf · Session 1: Gaussian Processes Neil D. Lawrence and Raquel Urtasun CVPR 16th June 2012 Urtasun and Lawrence

Prediction with Correlated Gaussians

Prediction of f∗ from f requires multivariate conditional density.

Multivariate conditional density is also Gaussian.

p(f∗|f) = N (f∗|µ,Σ)

µ = K∗,fK−1f,f f

Σ = K∗,∗ −K∗,fK−1f,f Kf,∗

Here covariance of joint density is given by

K =

[Kf,f K∗,fKf,∗ K∗,∗

]

Urtasun and Lawrence () Session 1: GP and Regression CVPR Tutorial 34 / 74
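A sketch of this conditioning step in NumPy, assuming a zero-mean joint Gaussian (gp_conditional is an illustrative helper, not part of any library; solving rather than explicitly inverting Kf,f is the standard numerically stable choice):

import numpy as np

def gp_conditional(K_ff, K_sf, K_ss, f):
    # mu = K_{*,f} K_{f,f}^{-1} f
    mu = K_sf @ np.linalg.solve(K_ff, f)
    # Sigma = K_{*,*} - K_{*,f} K_{f,f}^{-1} K_{f,*}
    Sigma = K_ss - K_sf @ np.linalg.solve(K_ff, K_sf.T)
    return mu, Sigma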

Covariance Functions
Where did this covariance matrix come from?

Exponentiated Quadratic Kernel Function (RBF, Squared Exponential, Gaussian)

k(x, x′) = α exp( −‖x − x′‖²₂ / (2ℓ²) )

The covariance matrix is built using the inputs to the function, x.

For the example above it was based on Euclidean distance.

The covariance function is also known as a kernel.

[Figure: illustration accompanying the covariance function; horizontal axis from −1 to 1, vertical axis from −3 to 3.]

Covariance Functions
Where did this covariance matrix come from?

k(xi, xj) = α exp( −‖xi − xj‖² / (2ℓ²) )

x1 = −3.0, x2 = 1.20, and x3 = 1.40 with ℓ = 2.00 and α = 1.00.

Computing each entry in turn (the matrix is symmetric, so k1,2 = k2,1 and so on):

k1,1 = 1.00 × exp( −(−3.0 − (−3.0))² / (2 × 2.00²) ) = 1.00
k2,1 = 1.00 × exp( −(1.20 − (−3.0))² / (2 × 2.00²) ) = 0.110
k2,2 = 1.00 × exp( −(1.20 − 1.20)² / (2 × 2.00²) ) = 1.00
k3,1 = 1.00 × exp( −(1.40 − (−3.0))² / (2 × 2.00²) ) = 0.0889
k3,2 = 1.00 × exp( −(1.40 − 1.20)² / (2 × 2.00²) ) = 0.995
k3,3 = 1.00 × exp( −(1.40 − 1.40)² / (2 × 2.00²) ) = 1.00

giving the covariance matrix

K = [ 1.00    0.110   0.0889
      0.110   1.00    0.995
      0.0889  0.995   1.00   ].
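The same numbers can be reproduced with a few lines of NumPy (eq_cov is an illustrative helper name; only the kernel formula above is assumed):

import numpy as np

def eq_cov(x, alpha=1.0, lengthscale=2.0):
    # Exponentiated quadratic covariance matrix for 1-D inputs x.
    x = np.asarray(x, dtype=float)
    sqdist = (x[:, None] - x[None, :]) ** 2
    return alpha * np.exp(-sqdist / (2.0 * lengthscale ** 2))

K = eq_cov([-3.0, 1.2, 1.4])
print(np.round(K, 3))
# [[1.    0.11  0.089]
#  [0.11  1.    0.995]
#  [0.089 0.995 1.   ]]

The two examples that follow are the same call with the input list [-3.0, 1.2, 1.4, 2.0], and with alpha=4.0, lengthscale=5.0, respectively.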

Covariance Functions
Where did this covariance matrix come from?

k(xi, xj) = α exp( −‖xi − xj‖² / (2ℓ²) )

Adding a fourth point: x1 = −3, x2 = 1.2, x3 = 1.4, and x4 = 2.0 with ℓ = 2.0 and α = 1.0.

The new entries, for example

k4,1 = 1.0 × exp( −(2.0 − (−3))² / (2 × 2.0²) ) = 0.044
k4,2 = 1.0 × exp( −(2.0 − 1.2)² / (2 × 2.0²) ) = 0.92
k4,3 = 1.0 × exp( −(2.0 − 1.4)² / (2 × 2.0²) ) = 0.96

extend the matrix to

K = [ 1.0    0.11   0.089  0.044
      0.11   1.0    1.0    0.92
      0.089  1.0    1.0    0.96
      0.044  0.92   0.96   1.0  ]

(the off-diagonal 1.0 entries are 0.995 rounded to two significant figures).

Covariance Functions
Where did this covariance matrix come from?

k(xi, xj) = α exp( −‖xi − xj‖² / (2ℓ²) )

The same three inputs with a larger variance and lengthscale: x1 = −3.0, x2 = 1.20, and x3 = 1.40 with ℓ = 5.00 and α = 4.00.

k1,1 = 4.00 × exp( −(−3.0 − (−3.0))² / (2 × 5.00²) ) = 4.00
k2,1 = 4.00 × exp( −(1.20 − (−3.0))² / (2 × 5.00²) ) = 2.81
k3,1 = 4.00 × exp( −(1.40 − (−3.0))² / (2 × 5.00²) ) = 2.72
k3,2 = 4.00 × exp( −(1.40 − 1.20)² / (2 × 5.00²) ) = 4.00

giving

K = [ 4.00  2.81  2.72
      2.81  4.00  4.00
      2.72  4.00  4.00 ]

(k3,2 is 3.997, shown rounded to three significant figures; the diagonal is exactly α = 4.00).

Outline

1 The Gaussian Density

2 Covariance from Basis Functions

3 Basis Function Representations

4 Constructing Covariance

5 GP Limitations

6 Conclusions

Basis Function Form

Radial basis functions commonly have the form

φk(xi) = exp( −|xi − µk|² / (2ℓ²) ).

The basis function maps data into a "feature space" in which a linear sum is a non-linear function.

Figure: A set of radial basis functions with width ℓ = 2 and location parameters µ = [−4, 0, 4]⊤.

Basis Function Representations

Represent a function by a linear sum over a basis,

f(xi,:; w) = ∑_{k=1}^{m} wk φk(xi,:),    (1)

Here there are m basis functions, φk(·) is the kth basis function, and

w = [w1, . . . , wm]⊤.

For the standard linear model: φk(xi,:) = xi,k.

Random Functions

Functions derived using

f(x) = ∑_{k=1}^{m} wk φk(x),

where w is sampled from a Gaussian density,

wk ∼ N(0, α).

Figure: Functions sampled using the basis set from the previous figure. Each line is a separate sample, generated by a weighted sum of the basis set. The weights, w, are sampled from a Gaussian density with variance α = 1.
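A short NumPy sketch of this sampling procedure (the grid, seed, and number of samples are arbitrary choices; the basis parameters are those of the figure above):

import numpy as np

rng = np.random.default_rng(0)
mu = np.array([-4.0, 0.0, 4.0])   # basis centres
ell = 2.0                          # basis width
alpha = 1.0                        # weight variance

x = np.linspace(-8.0, 8.0, 200)
Phi = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2.0 * ell ** 2))

# Each column of F is one random function f(x) = sum_k w_k phi_k(x).
W = rng.normal(0.0, np.sqrt(alpha), size=(mu.size, 5))
F = Phi @ W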

Direct Construction of Covariance Matrix

Use matrix notation to write the function,

f(xi; w) = ∑_{k=1}^{m} wk φk(xi),

which, computed at the training data, gives a vector

f = Φw.

w and f are related only by an inner product.

Φ is fixed and non-stochastic for a given training set.

f is Gaussian distributed.

It is straightforward to compute the distribution for f.

Expectations

We use ⟨·⟩ to denote expectations under prior distributions.

We have

⟨f⟩ = Φ⟨w⟩.

The prior mean of w was zero, giving

⟨f⟩ = 0.

The prior covariance of f is

K = ⟨ff⊤⟩ − ⟨f⟩⟨f⟩⊤

⟨ff⊤⟩ = Φ⟨ww⊤⟩Φ⊤,

giving

K = γ′ΦΦ⊤.
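This identity is easy to check by Monte Carlo (a sketch; here alpha plays the role of γ′ as the prior variance of the weights, and the inputs and seed are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
mu = np.array([-4.0, 0.0, 4.0])
ell, alpha = 2.0, 1.0
x = np.array([-2.0, 0.0, 1.0])
Phi = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2.0 * ell ** 2))

K = alpha * Phi @ Phi.T  # analytic prior covariance of f

# Empirical covariance of f = Phi w over many weight draws.
W = rng.normal(0.0, np.sqrt(alpha), size=(mu.size, 100_000))
F = Phi @ W
print(np.allclose(np.cov(F), K, atol=0.05))  # True, up to sampling noise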

Covariance between Two Points

The prior covariance between two points xi and xj is

k(xi, xj) = γ′ ∑_{ℓ=1}^{m} φℓ(xi) φℓ(xj),

or in vector form

k(xi, xj) = γ′ φ:(xi)⊤ φ:(xj).

For the radial basis used this gives

k(xi, xj) = γ′ ∑_{k=1}^{m} exp( −( |xi − µk|² + |xj − µk|² ) / (2ℓ²) ).

Selecting Number and Location of Basis

Need to choose
1 location of centers
2 number of basis functions

Consider uniform spacing over a region:

k(xi, xj) = γ∆µ ∑_{k=1}^{m} exp( −( xi² + xj² − 2µk(xi + xj) + 2µk² ) / (2ℓ²) ),

Uniform Basis Functions

Set each center location to

µk = a + ∆µ·(k − 1).

Specify the bases in terms of their indices,

k(xi, xj) = γ∆µ ∑_{k=1}^{m} exp( −( xi² + xj² − 2(a + ∆µ·k)(xi + xj) + 2(a + ∆µ·k)² ) / (2ℓ²) ).

Infinite Basis Functions

Take µ1 = a and µm = b, so b = a + ∆µ·(m − 1).

Take the limit as ∆µ → 0, so m → ∞:

k(xi, xj) = γ ∫_a^b exp( −( xi² + xj² )/(2ℓ²) − ( 2(µ − ½(xi + xj))² − ½(xi + xj)² )/(2ℓ²) ) dµ,

where we have used k·∆µ → µ.

Result

Performing the integration leads to

k(xi, xj) = γ (√(πℓ²)/2) exp( −(xi − xj)² / (4ℓ²) ) × [ erf( (b − ½(xi + xj))/ℓ ) − erf( (a − ½(xi + xj))/ℓ ) ],

Now take the limit as a → −∞ and b → ∞:

k(xi, xj) = α exp( −(xi − xj)² / (4ℓ²) ),

where α = γ√(πℓ²).
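A quick numerical check that the finite-interval (erf) expression approaches the limit as [a, b] grows (a sketch using only the two formulas above; the test inputs and helper names are arbitrary):

import math

def k_band(xi, xj, a, b, gamma=1.0, ell=1.0):
    # Finite-interval covariance: the erf expression above.
    c = 0.5 * (xi + xj)
    pref = gamma * math.sqrt(math.pi * ell**2) / 2.0
    pref *= math.exp(-((xi - xj) ** 2) / (4.0 * ell**2))
    return pref * (math.erf((b - c) / ell) - math.erf((a - c) / ell))

def k_limit(xi, xj, gamma=1.0, ell=1.0):
    # Limit a -> -inf, b -> inf: alpha * exp(-(xi - xj)^2 / (4 l^2)).
    alpha = gamma * math.sqrt(math.pi * ell**2)
    return alpha * math.exp(-((xi - xj) ** 2) / (4.0 * ell**2))

print(k_band(0.3, -0.2, a=-100.0, b=100.0))  # approx 1.665
print(k_limit(0.3, -0.2))                    # approx 1.665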

Infinite Feature Space

An RBF model with infinitely many basis functions is a Gaussian process.

The covariance function is the exponentiated quadratic.

Note: the functional forms of the covariance function and the basis functions are similar.
- this is a special case,
- in general they are very different.

Similar results can be obtained for multi-dimensional inputs.

Nonparametric Gaussian Processes

This work takes us from the parametric to the non-parametric.

The limit implies an infinite-dimensional w.

Gaussian processes are generally non-parametric: they combine the data with the covariance function to get the model.

This representation cannot be summarized by a parameter vector of a fixed size.

The Parametric Bottleneck

Parametric models have a representation that does not respond to increasing training set size.

Bayesian posterior distributions over parameters contain the information about the training data.
- Use Bayes' rule on the training data, p(w|y, X),
- Make predictions on test data,

p(y∗|X∗, y, X) = ∫ p(y∗|w, X∗) p(w|y, X) dw.

w becomes a bottleneck through which information about the training set must pass to the test set.

Solution: increase m so that the bottleneck is so large that it no longer presents a problem.

How big is big enough for m? Non-parametrics says m → ∞.

The Parametric Bottleneck

It is now no longer possible to manipulate the model through the standard parametric form given in (1).

However, it is possible to express parametric models as GPs:

k(xi, xj) = φ:(xi)⊤ φ:(xj).

These are known as degenerate covariance matrices.

Their rank is at most m; non-parametric models have full-rank covariance matrices.

The most well known is the "linear kernel", k(xi, xj) = xi⊤ xj.


Making Predictions

For non-parametrics, prediction at new points f∗ is made by conditioning on f in the joint distribution.

In GPs this involves combining the training data with the covariance function and the mean function.

Parametric is a special case when conditional prediction can be summarized in a fixed number of parameters.

Complexity of a parametric model remains fixed regardless of the size of our training data set.

For a non-parametric model the required number of parameters grows with the size of the training data.
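The conditioning step can be sketched as follows (a zero-mean, noise-free illustration under our own naming; kern is any covariance function such as exp_quad above):

import numpy as np

def gp_predict(X, f, Xstar, kern, jitter=1e-8):
    """Condition f* on observed f under a zero-mean GP prior."""
    K = kern(X, X) + jitter * np.eye(len(X))   # training covariance
    Ks = kern(X, Xstar)                        # cross-covariance
    Kss = kern(Xstar, Xstar)                   # test covariance
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f))
    mean = Ks.T @ alpha                        # posterior mean
    v = np.linalg.solve(L, Ks)
    cov = Kss - v.T @ v                        # posterior covariance
    return mean, cov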




Covariance Functions

RBF Basis Functions

k(x, x′) = αφ(x)>φ(x′)

φi(x) = exp( −‖x − µi‖² / (2ℓ²) )

µ = (−1, 0, 1)>
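A sketch of this finite basis construction (our naming; one-dimensional inputs, centres at −1, 0, 1):

import numpy as np

def rbf_basis(x, mu=np.array([-1.0, 0.0, 1.0]), l=1.0):
    """phi_i(x) = exp(-(x - mu_i)^2 / (2 l^2)) for scalar x."""
    return np.exp(-(x - mu)**2 / (2.0 * l**2))

def k_basis(x, xp, alpha=1.0):
    """k(x, x') = alpha phi(x)^T phi(x')."""
    return alpha * rbf_basis(x) @ rbf_basis(xp)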




Covariance Functions and Mercer Kernels

Mercer Kernels and Covariance Functions are similar.

The kernel perspective does not give a probabilistic interpretation of the covariance function.

Algorithms can be simpler, but the probabilistic interpretation is crucial for kernel parameter optimization.




Outline

1 The Gaussian Density

2 Covariance from Basis Functions

3 Basis Function Representations

4 Constructing Covariance

5 GP Limitations

6 Conclusions



Constructing Covariance Functions

Sum of two covariances is also a covariance function.

k(x, x′) = k1(x, x′) + k2(x, x′)



Constructing Covariance Functions

Product of two covariances is also a covariance function.

k(x, x′) = k1(x, x′)k2(x, x′)
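Both closure properties are one-liners in code (a sketch; k1 and k2 are any functions returning covariance matrices, such as exp_quad above):

def k_sum(k1, k2):
    """Sum of two covariance functions is a covariance function."""
    return lambda X, X2: k1(X, X2) + k2(X, X2)

def k_prod(k1, k2):
    """Elementwise product of two covariance functions is too."""
    return lambda X, X2: k1(X, X2) * k2(X, X2)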



Multiply by Deterministic Function

If f(x) is a Gaussian process and g(x) is a deterministic function, define

h(x) = f(x)g(x).

Then kh(x, x′) = g(x)kf(x, x′)g(x′)

where kh is the covariance for h(·) and kf is the covariance for f(·).
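A sketch of this scaling rule (our naming; g is assumed vectorized over a one-dimensional array of inputs):

import numpy as np

def k_scaled(kf, g):
    """k_h(x, x') = g(x) k_f(x, x') g(x') for deterministic g."""
    def k(X, X2):
        return g(X)[:, None] * kf(X, X2) * g(X2)[None, :]
    return k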



Covariance Functions

MLP Covariance Function

k(x, x′) = α asin( (wx>x′ + b) / ( √(wx>x + b + 1) · √(wx′>x′ + b + 1) ) )

Based on the infinite neural network model (Williams, 1998).

w = 40

b = 4
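A scalar-input sketch of this covariance with the slide's settings w = 40, b = 4 (the function name is ours):

import numpy as np

def k_mlp(x, xp, alpha=1.0, w=40.0, b=4.0):
    """MLP (arcsin) covariance for scalar inputs x, x'."""
    num = w * x * xp + b
    den = np.sqrt(w * x * x + b + 1.0) * np.sqrt(w * xp * xp + b + 1.0)
    return alpha * np.arcsin(num / den)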




Covariance Functions

Linear Covariance Function

k(x, x′) = αx>x′

Bayesian linear regression.

α = 1




Gaussian Process Interpolation


Figure: Real example: BACCO (see e.g. Oakley and O’Hagan, 2002). Interpolation through outputs from slow computer simulations (e.g. atmospheric carbon levels).




Noise Models

Graph of a GP

Relates input variables, X, to the vector, y, through f, given kernel parameters θ.

Plate notation indicates independence of yi|fi. The noise model, p(yi|fi), can take several forms.

Simplest is Gaussian noise.


Figure: The Gaussian process depicted graphically.



Gaussian Noise

Gaussian noise model,

p(yi|fi) = N(yi|fi, σ²)

where σ² is the variance of the noise.

Equivalent to a covariance function of the form

k(xi, xj) = δi,j σ²

where δi,j is the Kronecker delta function.

Additive nature of Gaussians means we can simply add this term to existing covariance matrices.
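In code this addition is a single line (a sketch, our naming):

import numpy as np

def add_noise(K, sigma2):
    """Gaussian noise adds sigma^2 I to the covariance matrix."""
    return K + sigma2 * np.eye(K.shape[0])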



Gaussian Process Regression


Figure: Examples include WiFi localization, C14 calibration curve.




Learning Covariance Parameters

Can we determine length scales and noise levels from the data?

N(y|0, K) = 1 / ((2π)^{n/2} |K|^{1/2}) · exp( −y>K⁻¹y / 2 )

The parameters are inside the covariance function (matrix).

ki,j = k(xi, xj; θ)




Learning Covariance Parameters

Can we determine length scales and noise levels from the data?

log N(y|0, K) = −(n/2) log 2π − (1/2) log |K| − (1/2) y>K⁻¹y

The parameters are inside the covariance function (matrix).

ki,j = k(xi, xj; θ)



Learning Covariance Parameters

Can we determine length scales and noise levels from the data?

E(θ) = (1/2) log |K| + (1/2) y>K⁻¹y

The parameters are inside the covariance function (matrix).

ki,j = k(xi, xj; θ)
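A sketch of E(θ) (our naming; the constant (n/2) log 2π is dropped, and a Cholesky factorization supplies both log |K| and K⁻¹y):

import numpy as np

def neg_log_marginal(y, K):
    """E(theta) = 0.5 log|K| + 0.5 y^T K^{-1} y, up to a constant."""
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))  # log|K| from Cholesky
    return 0.5 * log_det + 0.5 * y @ alpha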



Eigendecomposition of Covariance

K = RΛ²R>

where Λ is a diagonal matrix and R>R = I.

Useful representation since |K| = |Λ²| = |Λ|².
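A quick numerical check of this identity (illustrative values only):

import numpy as np

K = np.array([[2.0, 0.5],
              [0.5, 1.0]])
lam, R = np.linalg.eigh(K)  # K = R diag(lam) R^T, with lam the entries of Lambda^2
assert np.isclose(np.linalg.det(K), np.prod(lam))  # |K| = |Lambda|^2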


Capacity control: log |K|

Since Λ is diagonal, its determinant is the product of the eigenvalues: for Λ = diag(λ1, λ2), |Λ| = λ1λ2, the area of the ellipse with axes λ1 and λ2; in three dimensions, Λ = diag(λ1, λ2, λ3) gives |Λ| = λ1λ2λ3. A rotation leaves this volume unchanged, so |RΛ| = λ1λ2.


Data Fit: y>K⁻¹y/2

[Figure: contours of the data-fit term in the (y1, y2) plane, with the eigenvalue axes λ1 and λ2 of the covariance indicated.]




Learning Covariance Parameters

Can we determine length scales and noise levels from the data?

[Figure: data y(x) against x (left) and the objective E(θ) against the length scale ℓ (right).]

E(θ) = (1/2) log |K| + (1/2) y>K⁻¹y




Gene Expression Example

Global expression estimation with l = 30

Global expression estimation with l = 15.6

Data from Della Gatta et al. (2008). Figure from Kalaitzis and Lawrence (2011).



Outline

1 The Gaussian Density

2 Covariance from Basis Functions

3 Basis Function Representations

4 Constructing Covariance

5 GP Limitations

6 Conclusions



Limitations of Gaussian Processes

Inference is O(n³) due to the matrix inverse (in practice use a Cholesky decomposition).

Gaussian processes don’t deal well with discontinuities (financial crises, phosphorylation, collisions, edges in images).

Widely used exponentiated quadratic covariance (RBF) can be too smooth in practice (but there are many alternatives!!).



Summary

Broad introduction to Gaussian processes.
- Started with the Gaussian distribution.
- Motivated Gaussian processes through the multivariate density.

Emphasized the role of the covariance (not the mean).

Performs nonlinear regression with error bars.

Parameters of the covariance function (kernel) are easily optimized with maximum likelihood.



References I

G. Della Gatta, M. Bansal, A. Ambesi-Impiombato, D. Antonini, C. Missero, and D. di Bernardo. Direct targets of the trp63 transcription factor revealed by a combination of gene expression profiling and reverse engineering. Genome Research, 18(6):939–948, Jun 2008.

A. A. Kalaitzis and N. D. Lawrence. A simple approach to ranking differentially expressed gene expression time courses through Gaussian process regression. BMC Bioinformatics, 12(180), 2011.

R. M. Neal. Bayesian Learning for Neural Networks. Springer, 1996. Lecture Notes in Statistics 118.

J. Oakley and A. O’Hagan. Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika, 89(4):769–784, 2002.

C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.

C. K. I. Williams. Computation with infinite neural networks. Neural Computation, 10(5):1203–1216, 1998.
