Session 1: Gaussian Processes

Neil D. Lawrence and Raquel Urtasun

CVPR Tutorial, 16th June 2012



Outline

1. The Gaussian Density
2. Covariance from Basis Functions
3. Basis Function Representations
4. Constructing Covariance
5. GP Limitations
6. Conclusions


The Gaussian Density

Perhaps the most common probability density:

\[
p(y\,|\,\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right) = \mathcal{N}\!\left(y\,|\,\mu, \sigma^2\right)
\]

Gaussian Density

[Figure: the Gaussian PDF p(h | µ, σ²) plotted against height h/m, with µ = 1.7 and variance σ² = 0.0225; the mean is shown as a red line. It could represent the heights of a population of students.]


Two Important Gaussian Properties

1. The sum of independent Gaussian variables is also Gaussian:

\[
y_i \sim \mathcal{N}(\mu_i, \sigma_i^2) \implies \sum_{i=1}^{n} y_i \sim \mathcal{N}\left(\sum_{i=1}^{n}\mu_i,\; \sum_{i=1}^{n}\sigma_i^2\right)
\]

(Aside: as the number of terms increases, the sum of independent non-Gaussian, finite-variance variables also tends to a Gaussian [central limit theorem].)

2. Scaling a Gaussian leads to a Gaussian:

\[
y \sim \mathcal{N}(\mu, \sigma^2) \implies wy \sim \mathcal{N}(w\mu, w^2\sigma^2)
\]
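These two properties are easy to check numerically. Below is a minimal NumPy sketch; the means, variances and scale factor are illustrative assumptions, not values from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000

# Property 1: the sum of independent Gaussians is Gaussian,
# with mean sum(mu_i) and variance sum(sigma_i^2).
mus = np.array([1.0, -2.0, 0.5])          # assumed means
sigma2s = np.array([0.5, 1.0, 2.0])       # assumed variances
samples = rng.normal(mus, np.sqrt(sigma2s), size=(n_samples, 3))
total = samples.sum(axis=1)
print(total.mean(), mus.sum())            # both approximately -0.5
print(total.var(), sigma2s.sum())         # both approximately 3.5

# Property 2: scaling by w gives mean w*mu and variance w^2 * sigma^2.
w = 3.0
y = rng.normal(1.0, np.sqrt(0.5), size=n_samples)
print((w * y).mean(), w * 1.0)            # approximately 3.0
print((w * y).var(), w**2 * 0.5)          # approximately 4.5
```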


Two Simultaneous Equations

A system of two linear equations with two unknowns:

\[
y_1 = m x_1 + c, \qquad y_2 = m x_2 + c
\]

Subtracting one equation from the other eliminates c:

\[
y_1 - y_2 = m(x_1 - x_2) \implies m = \frac{y_2 - y_1}{x_2 - x_1}, \qquad c = y_1 - m x_1
\]

[Figure: the fitted line y = mx + c, with the intercept c, the rise y_2 − y_1, the run x_2 − x_1 and the slope m = (y_2 − y_1)/(x_2 − x_1) marked.]

Two Simultaneous Equations

How do we deal with three simultaneous equations and only two unknowns?

\[
y_1 = m x_1 + c, \qquad y_2 = m x_2 + c, \qquad y_3 = m x_3 + c
\]

Overdetermined System

With two unknowns and two observations:

\[
y_1 = m x_1 + c, \qquad y_2 = m x_2 + c
\]

An additional observation leads to an overdetermined system:

\[
y_3 = m x_3 + c
\]

This problem is resolved through a noise model \(\epsilon \sim \mathcal{N}(0, \sigma^2)\):

\[
y_1 = m x_1 + c + \epsilon_1, \qquad y_2 = m x_2 + c + \epsilon_2, \qquad y_3 = m x_3 + c + \epsilon_3
\]


Noise Models

We aren't modelling the entire system.

The noise model accounts for the mismatch between model and data.

The Gaussian model is justified by appeal to the central limit theorem.

Other models are also possible (e.g. Student-t for heavy tails).

Maximum likelihood with Gaussian noise leads to least squares (see the sketch below).
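To make that last point concrete, here is a minimal sketch of fitting m and c by least squares; the three data points are made up for illustration.

```python
import numpy as np

# Three observations: an overdetermined system for (m, c).
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.9, 5.1, 6.9])

# Design matrix with a column of ones for the offset c.
X = np.column_stack([x, np.ones_like(x)])

# Least squares solution, equivalent to maximum likelihood under
# Gaussian noise: (m, c) minimises sum_i (y_i - m x_i - c)^2.
m, c = np.linalg.lstsq(X, y, rcond=None)[0]
print(m, c)
```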

Underdetermined System

What about two unknowns and one observation?

\[
y_1 = m x_1 + c
\]

Given c we can compute m:

\[
m = \frac{y_1 - c}{x_1}
\]

Assume

\[
c \sim \mathcal{N}(0, 4);
\]

each draw of c then gives a different solution, for example

c = 1.75 ⟹ m = 1.25
c = −0.777 ⟹ m = 3.78
c = −4.01 ⟹ m = 7.01
c = −0.718 ⟹ m = 3.72
c = 2.45 ⟹ m = 0.545
c = −0.657 ⟹ m = 3.66
c = −3.13 ⟹ m = 6.13
c = −1.47 ⟹ m = 4.47

so we find a distribution over solutions (a sampling sketch follows below).
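A small sketch of this sampling, assuming (x_1, y_1) = (1, 3), which is consistent with the c, m pairs printed above (the draws themselves are random, so the numbers will differ).

```python
import numpy as np

rng = np.random.default_rng(1)
x1, y1 = 1.0, 3.0            # assumed single observation

# Draw c ~ N(0, 4) and solve m = (y1 - c) / x1 for each draw.
c = rng.normal(0.0, np.sqrt(4.0), size=5)
m = (y1 - c) / x1
for ci, mi in zip(c, m):
    print(f"c = {ci:.3f}  =>  m = {mi:.3f}")
```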

Probability for Under- and Overdetermined

To deal with the overdetermined system we introduced a probability distribution for the 'variable', ε_i.

For the underdetermined system we introduced a probability distribution for the 'parameter', c.

This is known as a Bayesian treatment.

For general Bayesian inference we need multivariate priors.

E.g. for multivariate linear regression:

\[
y_i = \sum_j w_j x_{i,j} + \epsilon_i = \mathbf{w}^\top \mathbf{x}_{i,:} + \epsilon_i
\]

(where we've dropped c for convenience), we need a prior over \(\mathbf{w}\).

This motivates a multivariate Gaussian density.

We will use the multivariate Gaussian to put a prior directly on the function (a Gaussian process).

Multivariate Regression Likelihood

Recall the multivariate regression likelihood:

\[
p(\mathbf{y}\,|\,\mathbf{X}, \mathbf{w}) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \mathbf{w}^\top \mathbf{x}_{i,:}\right)^2\right)
\]

Now use a multivariate Gaussian prior:

\[
p(\mathbf{w}) = \frac{1}{(2\pi\alpha)^{p/2}} \exp\left(-\frac{1}{2\alpha}\mathbf{w}^\top\mathbf{w}\right)
\]


Posterior Density

Once again we want to know the posterior:

\[
p(\mathbf{w}\,|\,\mathbf{y}, \mathbf{X}) \propto p(\mathbf{y}\,|\,\mathbf{X}, \mathbf{w})\,p(\mathbf{w})
\]

which we can compute by completing the square:

\[
\log p(\mathbf{w}\,|\,\mathbf{y},\mathbf{X}) = -\frac{1}{2\sigma^2}\sum_{i=1}^{n} y_i^2 + \frac{1}{\sigma^2}\sum_{i=1}^{n} y_i\mathbf{x}_{i,:}^\top\mathbf{w} - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\mathbf{w}^\top\mathbf{x}_{i,:}\mathbf{x}_{i,:}^\top\mathbf{w} - \frac{1}{2\alpha}\mathbf{w}^\top\mathbf{w} + \text{const.}
\]

giving

\[
p(\mathbf{w}\,|\,\mathbf{y},\mathbf{X}) = \mathcal{N}(\mathbf{w}\,|\,\boldsymbol{\mu}_w, \mathbf{C}_w), \qquad \mathbf{C}_w = \left(\sigma^{-2}\mathbf{X}^\top\mathbf{X} + \alpha^{-1}\mathbf{I}\right)^{-1}, \qquad \boldsymbol{\mu}_w = \mathbf{C}_w\sigma^{-2}\mathbf{X}^\top\mathbf{y}
\]
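A minimal sketch of these posterior formulas on synthetic data; the design matrix, true weights, noise variance and prior variance are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 2
X = rng.normal(size=(n, p))               # assumed inputs
w_true = np.array([1.5, -0.5])            # assumed true weights
sigma2, alpha = 0.1, 1.0                  # assumed noise and prior variances
y = X @ w_true + rng.normal(0.0, np.sqrt(sigma2), size=n)

# C_w = (sigma^-2 X^T X + alpha^-1 I)^-1,  mu_w = C_w sigma^-2 X^T y
C_w = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / alpha)
mu_w = C_w @ X.T @ y / sigma2
print(mu_w)                               # close to w_true
```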


Bayesian vs Maximum Likelihood

Note the similarity between the posterior mean

\[
\boldsymbol{\mu}_w = \left(\sigma^{-2}\mathbf{X}^\top\mathbf{X} + \alpha^{-1}\mathbf{I}\right)^{-1}\sigma^{-2}\mathbf{X}^\top\mathbf{y}
\]

and the maximum likelihood solution

\[
\mathbf{w} = \left(\mathbf{X}^\top\mathbf{X}\right)^{-1}\mathbf{X}^\top\mathbf{y}
\]

Marginal Likelihood is Computed as Normalizer

\[
p(\mathbf{w}\,|\,\mathbf{y},\mathbf{X})\,p(\mathbf{y}\,|\,\mathbf{X}) = p(\mathbf{y}\,|\,\mathbf{w},\mathbf{X})\,p(\mathbf{w})
\]

Marginal Likelihood

The marginal likelihood can be computed in closed form:

\[
p(\mathbf{y}\,|\,\mathbf{X}, \alpha, \sigma^2) = \mathcal{N}\!\left(\mathbf{y}\,|\,\mathbf{0},\; \alpha\mathbf{X}\mathbf{X}^\top + \sigma^2\mathbf{I}\right)
\]
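A sketch of evaluating this marginal likelihood numerically on synthetic data; the values and the use of SciPy are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
n, p = 20, 2
X = rng.normal(size=(n, p))
alpha, sigma2 = 1.0, 0.1                  # assumed hyperparameters
y = X @ rng.normal(size=p) + rng.normal(0.0, np.sqrt(sigma2), size=n)

# Marginal covariance: alpha X X^T + sigma^2 I.
K = alpha * X @ X.T + sigma2 * np.eye(n)
print(multivariate_normal(mean=np.zeros(n), cov=K).logpdf(y))
```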

Two Dimensional Gaussian

Consider height, h/m, and weight, w/kg.

We could sample height from a distribution:

\[
h \sim \mathcal{N}(1.7, 0.0225)
\]

and similarly weight:

\[
w \sim \mathcal{N}(75, 36)
\]

Height and Weight Models

[Figure: the marginal Gaussian distributions p(h) and p(w) for height and weight.]

Sampling Two Dimensional Variables

Sample height and weight one after the other and plot against each other.

[Figure: samples from the joint distribution over h/m and w/kg, shown alongside the marginal distributions p(h) and p(w).]


Independence Assumption

This assumes height and weight are independent:

\[
p(h, w) = p(h)\,p(w)
\]

In reality they are dependent (e.g. body mass index = w/h²).

Sampling Two Dimensional Variables

[Figure: samples from the joint distribution over h/m and w/kg, shown alongside the marginal distributions p(h) and p(w).]


Independent Gaussians

\[
p(w, h) = p(w)\,p(h)
\]

\[
p(w, h) = \frac{1}{\sqrt{2\pi\sigma_1^2}\sqrt{2\pi\sigma_2^2}} \exp\left(-\frac{1}{2}\left(\frac{(w-\mu_1)^2}{\sigma_1^2} + \frac{(h-\mu_2)^2}{\sigma_2^2}\right)\right)
\]

In vector form:

\[
p(w, h) = \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2}} \exp\left(-\frac{1}{2}\left(\begin{bmatrix}w\\ h\end{bmatrix}-\begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix}\right)^{\!\top} \begin{bmatrix}\sigma_1^2 & 0\\ 0 & \sigma_2^2\end{bmatrix}^{-1} \left(\begin{bmatrix}w\\ h\end{bmatrix}-\begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix}\right)\right)
\]

Writing \(\mathbf{y} = [w, h]^\top\) and \(\mathbf{D}\) for the diagonal covariance matrix:

\[
p(\mathbf{y}) = \frac{1}{2\pi|\mathbf{D}|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{y}-\boldsymbol{\mu})^\top\mathbf{D}^{-1}(\mathbf{y}-\boldsymbol{\mu})\right)
\]

Correlated Gaussian

Form a correlated density from the original by rotating the data space using a rotation matrix R:

\[
p(\mathbf{y}) = \frac{1}{2\pi|\mathbf{D}|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{R}^\top\mathbf{y}-\mathbf{R}^\top\boldsymbol{\mu})^\top\mathbf{D}^{-1}(\mathbf{R}^\top\mathbf{y}-\mathbf{R}^\top\boldsymbol{\mu})\right)
\]

Equivalently,

\[
p(\mathbf{y}) = \frac{1}{2\pi|\mathbf{D}|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{y}-\boldsymbol{\mu})^\top\mathbf{R}\mathbf{D}^{-1}\mathbf{R}^\top(\mathbf{y}-\boldsymbol{\mu})\right)
\]

This gives an inverse covariance matrix \(\mathbf{C}^{-1} = \mathbf{R}\mathbf{D}^{-1}\mathbf{R}^\top\), so that

\[
p(\mathbf{y}) = \frac{1}{2\pi|\mathbf{C}|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{y}-\boldsymbol{\mu})^\top\mathbf{C}^{-1}(\mathbf{y}-\boldsymbol{\mu})\right), \qquad \mathbf{C} = \mathbf{R}\mathbf{D}\mathbf{R}^\top
\]
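A small sketch of this construction; the rotation angle and diagonal variances are assumptions chosen for illustration.

```python
import numpy as np

theta = np.pi / 4.0                        # assumed rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
D = np.diag([2.0, 0.2])                    # assumed axis-aligned variances

C = R @ D @ R.T                            # correlated covariance
print(C)                                   # off-diagonals are now non-zero
print(np.linalg.inv(C))                    # equals R D^{-1} R^T
```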

Recall Univariate Gaussian Properties

1. The sum of independent Gaussian variables is also Gaussian:

\[
y_i \sim \mathcal{N}(\mu_i, \sigma_i^2) \implies \sum_{i=1}^{n} y_i \sim \mathcal{N}\left(\sum_{i=1}^{n}\mu_i,\; \sum_{i=1}^{n}\sigma_i^2\right)
\]

2. Scaling a Gaussian leads to a Gaussian:

\[
y \sim \mathcal{N}(\mu, \sigma^2) \implies wy \sim \mathcal{N}(w\mu, w^2\sigma^2)
\]


Multivariate Consequence

If

\[
\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})
\]

and

\[
\mathbf{y} = \mathbf{W}\mathbf{x}
\]

then

\[
\mathbf{y} \sim \mathcal{N}\!\left(\mathbf{W}\boldsymbol{\mu}, \mathbf{W}\boldsymbol{\Sigma}\mathbf{W}^\top\right)
\]


Sampling a Function

Multivariate Gaussians:

We will consider a Gaussian with a particular structure of covariance matrix.

Generate a single sample from this 25-dimensional Gaussian distribution, \(\mathbf{f} = [f_1, f_2, \ldots, f_{25}]\).

We will plot these points against their index (see the sketch below).
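A sketch of generating such a sample. The covariance structure here is the exponentiated quadratic function introduced later in the session; the lengthscale and jitter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-1.0, 1.0, 25)            # index the 25 dimensions by inputs

# k(x, x') = alpha * exp(-(x - x')^2 / (2 l^2))
alpha, lengthscale = 1.0, 0.3             # assumed hyperparameters
K = alpha * np.exp(-(x[:, None] - x[None, :])**2 / (2 * lengthscale**2))
K += 1e-8 * np.eye(25)                    # jitter for numerical stability

f = rng.multivariate_normal(np.zeros(25), K)
print(f)  # plotting f against its index gives a smooth curve as in the figure
```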

Gaussian Distribution Sample

[Figure: a sample from a 25-dimensional Gaussian distribution. (a) The 25-dimensional correlated random variable, values f_i plotted against index i. (b) Colormap showing the correlations between dimensions (colour scale 0 to 1).]


Gaussian Distribution Sample

[Figure: (a) as above; (b) the correlation between f_1 and f_2:]

\[
\begin{bmatrix} 1 & 0.96587 \\ 0.96587 & 1 \end{bmatrix}
\]

Prediction of f_2 from f_1

[Figure: a single contour of the joint Gaussian density over (f_1, f_2), with covariance]

\[
\begin{bmatrix} 1 & 0.96587 \\ 0.96587 & 1 \end{bmatrix}
\]

The single contour of the Gaussian density represents the joint distribution, p(f_1, f_2).

We observe that f_1 = −0.313, giving the conditional density p(f_2 | f_1 = −0.313).


Prediction with Correlated Gaussians

Prediction of f_2 from f_1 requires the conditional density.

The conditional density is also Gaussian:

\[
p(f_2\,|\,f_1) = \mathcal{N}\!\left(f_2\,\Big|\,\frac{k_{1,2}}{k_{1,1}}f_1,\; k_{2,2} - \frac{k_{1,2}^2}{k_{1,1}}\right)
\]

where the covariance of the joint density is given by

\[
\mathbf{K} = \begin{bmatrix} k_{1,1} & k_{1,2} \\ k_{2,1} & k_{2,2} \end{bmatrix}
\]
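A numerical sketch of this conditional, plugging in the covariance values and the observation shown above.

```python
# Covariance entries and observation from the slide.
k11, k12, k22 = 1.0, 0.96587, 1.0
f1 = -0.313

# Conditional mean and variance of f2 given f1.
mean_f2 = (k12 / k11) * f1
var_f2 = k22 - k12**2 / k11
print(mean_f2, var_f2)   # mean approx -0.302, variance approx 0.067
```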

Prediction of f_5 from f_1

[Figure: a single contour of the joint Gaussian density over (f_1, f_5), with covariance]

\[
\begin{bmatrix} 1 & 0.57375 \\ 0.57375 & 1 \end{bmatrix}
\]

The single contour of the Gaussian density represents the joint distribution, p(f_1, f_5).

We observe that f_1 = −0.313, giving the conditional density p(f_5 | f_1 = −0.313).


Page 127: Session 1: Gaussian Processes - ttic.uchicago.edururtasun/tutorials/gp_cvpr12_session1.pdf · 1 x 2) Urtasun and Lawrence Session 1: GP and Regression CVPR Tutorial 9 / 74. Two Simultaneous

Prediction with Correlated Gaussians

Prediction of f∗ from f requires the multivariate conditional density.

The multivariate conditional density is also Gaussian,

p(f∗|f) = N(f∗ | µ, Σ),

with

µ = K∗,f Kf,f⁻¹ f,
Σ = K∗,∗ − K∗,f Kf,f⁻¹ Kf,∗.

Here the covariance of the joint density is given by

K = [ Kf,f  Kf,∗
      K∗,f  K∗,∗ ]
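A sketch of this computation for general training and test blocks, assuming a zero prior mean (the function name gp_conditional is ours; solving linear systems avoids forming Kf,f⁻¹ explicitly):

```python
import numpy as np

def gp_conditional(K_ff, K_sf, K_ss, f):
    """Mean and covariance of p(f* | f) for a zero-mean joint Gaussian.

    K_ff: covariance among the observed points,
    K_sf: cross-covariance between test and observed points,
    K_ss: covariance among the test points.
    """
    # Solve linear systems rather than inverting K_ff for numerical stability.
    mu = K_sf @ np.linalg.solve(K_ff, f)
    Sigma = K_ss - K_sf @ np.linalg.solve(K_ff, K_sf.T)
    return mu, Sigma
```

With K_ff = [[1.0]], K_sf = [[0.96587]], K_ss = [[1.0]] and f = [−0.313], this reproduces the bivariate example above.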

Covariance Functions

Where did this covariance matrix come from?

Exponentiated Quadratic Kernel Function (also known as the RBF, Squared Exponential, or Gaussian kernel):

k(x, x′) = α exp( −‖x − x′‖₂² / (2ℓ²) )

The covariance matrix is built using the inputs to the function, x.

For the example above it was based on Euclidean distance.

The covariance function is also known as a kernel.

[Figure: illustration associated with this covariance; the horizontal axis runs from −1 to 1 and the vertical axis from −3 to 3.]
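A direct transcription of this kernel into NumPy (a sketch; the name exp_quad is ours):

```python
import numpy as np

def exp_quad(x, x_prime, alpha=1.0, lengthscale=2.0):
    """Exponentiated quadratic covariance between two inputs."""
    r2 = np.sum((np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)) ** 2)
    return alpha * np.exp(-r2 / (2.0 * lengthscale ** 2))
```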

Covariance Functions

Where did this covariance matrix come from?

k(x_i, x_j) = α exp( −‖x_i − x_j‖² / (2ℓ²) )

x1 = −3.0, x2 = 1.20, and x3 = 1.40 with ℓ = 2.00 and α = 1.00.

The matrix is filled in entry by entry (it is symmetric, so only the lower triangle need be computed):

k1,1 = 1.00 × exp( −(−3.0 − (−3.0))² / (2 × 2.00²) ) = 1.00
k2,1 = 1.00 × exp( −(1.20 − (−3.0))² / (2 × 2.00²) ) = 0.110
k3,1 = 1.00 × exp( −(1.40 − (−3.0))² / (2 × 2.00²) ) = 0.0889
k3,2 = 1.00 × exp( −(1.40 − 1.20)² / (2 × 2.00²) ) = 0.995

with k2,2 = k3,3 = α = 1.00 on the diagonal, giving

K = [ 1.00    0.110   0.0889
      0.110   1.00    0.995
      0.0889  0.995   1.00  ]

Covariance Functions

Where did this covariance matrix come from?

k(x_i, x_j) = α exp( −‖x_i − x_j‖² / (2ℓ²) )

x1 = −3, x2 = 1.2, x3 = 1.4, and x4 = 2.0 with ℓ = 2.0 and α = 1.0.

Adding a fourth input x4 = 2.0 only appends a row and column to the previous matrix:

k4,1 = 1.0 × exp( −(2.0 − (−3))² / (2 × 2.0²) ) = 0.044
k4,2 = 1.0 × exp( −(2.0 − 1.2)² / (2 × 2.0²) ) = 0.92
k4,3 = 1.0 × exp( −(2.0 − 1.4)² / (2 × 2.0²) ) = 0.96

K = [ 1.0    0.11   0.089  0.044
      0.11   1.0    1.0    0.92
      0.089  1.0    1.0    0.96
      0.044  0.92   0.96   1.0  ]

(Entries are shown to two significant figures; k3,2 = 0.995 appears as 1.0.)

Covariance Functions

Where did this covariance matrix come from?

k(x_i, x_j) = α exp( −‖x_i − x_j‖² / (2ℓ²) )

The same inputs x1 = −3.0, x2 = 1.20, and x3 = 1.40, now with ℓ = 5.00 and α = 4.00.

k1,1 = 4.00 × exp( −(−3.0 − (−3.0))² / (2 × 5.00²) ) = 4.00
k2,1 = 4.00 × exp( −(1.20 − (−3.0))² / (2 × 5.00²) ) = 2.81
k3,1 = 4.00 × exp( −(1.40 − (−3.0))² / (2 × 5.00²) ) = 2.72
k3,2 = 4.00 × exp( −(1.40 − 1.20)² / (2 × 5.00²) ) = 4.00 (to three significant figures)

K = [ 4.00  2.81  2.72
      2.81  4.00  4.00
      2.72  4.00  4.00 ]
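Both worked matrices can be rebuilt in a few vectorised lines; a sketch (the function name exp_quad_K is ours):

```python
import numpy as np

def exp_quad_K(x, alpha, lengthscale):
    """Full exponentiated quadratic covariance matrix for 1-D inputs x."""
    x = np.asarray(x, dtype=float)
    sq_dist = (x[:, None] - x[None, :]) ** 2  # all pairwise squared distances
    return alpha * np.exp(-sq_dist / (2.0 * lengthscale ** 2))

print(np.round(exp_quad_K([-3.0, 1.2, 1.4], alpha=1.0, lengthscale=2.0), 3))
print(np.round(exp_quad_K([-3.0, 1.2, 1.4], alpha=4.0, lengthscale=5.0), 2))
```

The second call shows how the hyperparameters change the matrix: α = 4 scales the whole matrix, while the longer lengthscale ℓ = 5 makes all the inputs look "close", pushing the off-diagonal entries towards α.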


Outline

1 The Gaussian Density

2 Covariance from Basis Functions

3 Basis Function Representations

4 Constructing Covariance

5 GP Limitations

6 Conclusions


Basis Function Form

Radial basis functions commonly have the form

φ_k(x_i) = exp( −|x_i − µ_k|² / (2ℓ²) ).

The basis functions map the data into a "feature space" in which a linear sum is a nonlinear function.

[Figure: A set of radial basis functions with width ℓ = 2 and location parameters µ = [−4, 0, 4]⊤; φ(x) is plotted against x for x between −8 and 8.]
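A sketch of this basis as code, assuming the centres and width from the figure (the name rbf_basis is ours):

```python
import numpy as np

def rbf_basis(x, centres, lengthscale=2.0):
    """Matrix of RBF activations, Phi[i, k] = phi_k(x_i)."""
    x = np.asarray(x, dtype=float)[:, None]
    mu = np.asarray(centres, dtype=float)[None, :]
    return np.exp(-(x - mu) ** 2 / (2.0 * lengthscale ** 2))

Phi = rbf_basis(np.linspace(-8, 8, 100), centres=[-4.0, 0.0, 4.0])
```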

Basis Function Representations

Represent a function by a linear sum over a basis,

f(x_{i,:}; w) = Σ_{k=1}^{m} w_k φ_k(x_{i,:}),        (1)

where there are m basis functions, φ_k(·) is the kth basis function, and

w = [w_1, . . . , w_m]⊤.

For the standard linear model: φ_k(x_{i,:}) = x_{i,k}.

Random Functions

Functions derived using

f(x) = Σ_{k=1}^{m} w_k φ_k(x),

where the weights are sampled from a Gaussian density,

w_k ∼ N(0, α).

[Figure: Functions sampled using the basis set from the previous figure. Each line is a separate sample, generated by a weighted sum of the basis set. The weights w are sampled from a Gaussian density with variance α = 1; f(x) is plotted for x between −8 and 8.]
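A sketch of the sampling procedure behind the figure, assuming the same three-centre basis (all names ours):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-8, 8, 200)
mu = np.array([-4.0, 0.0, 4.0])            # basis centres, as in the figure
Phi = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2.0 * 2.0 ** 2))  # ell = 2

alpha = 1.0
W = rng.normal(0.0, np.sqrt(alpha), size=(3, 5))  # w_k ~ N(0, alpha), 5 draws
samples = Phi @ W
# samples[:, j] is the j-th random function evaluated on the grid.
```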

Direct Construction of Covariance Matrix

Use matrix notation to write the function,

f(x_i; w) = Σ_{k=1}^{m} w_k φ_k(x_i),

which, computed at the training data, gives a vector

f = Φw.

w and f are related only by an inner product.

Φ is fixed and non-stochastic for a given training set.

f is Gaussian distributed.

It is straightforward to compute the distribution for f.

Expectations

We use ⟨·⟩ to denote expectations under prior distributions.

We have

⟨f⟩ = Φ⟨w⟩.

The prior mean of w was zero, giving

⟨f⟩ = 0.

The prior covariance of f is

K = ⟨ff⊤⟩ − ⟨f⟩⟨f⟩⊤

with

⟨ff⊤⟩ = Φ⟨ww⊤⟩Φ⊤.

Since the weights are independent with prior variance γ′, ⟨ww⊤⟩ = γ′I, giving

K = γ′ΦΦ⊤.
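The identity K = γ′ΦΦ⊤ can be checked by Monte Carlo; a sketch assuming the RBF basis from earlier (names ours):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-8, 8, 50)
mu = np.array([-4.0, 0.0, 4.0])
Phi = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2.0 * 2.0 ** 2))

gamma_p = 1.0                          # gamma' : prior variance of each weight
W = rng.normal(0.0, np.sqrt(gamma_p), size=(3, 100000))
F = Phi @ W                            # each column is one sample of f

K_mc = (F @ F.T) / W.shape[1]          # Monte Carlo estimate of <f f^T>
K_analytic = gamma_p * Phi @ Phi.T
print(np.abs(K_mc - K_analytic).max())  # small, and shrinks with more samples
```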

Covariance between Two Points

The prior covariance between two points x_i and x_j is

k(x_i, x_j) = γ′ Σ_{k=1}^{m} φ_k(x_i) φ_k(x_j),

or in vector form,

k(x_i, x_j) = γ′ φ:(x_i)⊤ φ:(x_j).

For the radial basis used, this gives

k(x_i, x_j) = γ′ Σ_{k=1}^{m} exp( −( |x_i − µ_k|² + |x_j − µ_k|² ) / (2ℓ²) ).
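A sketch of this finite-basis covariance (the name k_basis is ours); note that the product of the two basis exponentials is exactly the summand above:

```python
import numpy as np

def k_basis(x_i, x_j, centres, lengthscale=2.0, gamma_p=1.0):
    """Covariance implied by a finite RBF basis: gamma' * phi(x_i) . phi(x_j)."""
    mu = np.asarray(centres, dtype=float)
    phi_i = np.exp(-(x_i - mu) ** 2 / (2.0 * lengthscale ** 2))
    phi_j = np.exp(-(x_j - mu) ** 2 / (2.0 * lengthscale ** 2))
    return gamma_p * phi_i @ phi_j  # inner product over the m basis functions
```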

Selecting Number and Location of Basis

Need to choose
1 the location of the centers,
2 the number of basis functions.

Consider uniform spacing over a region, taking γ′ = γ∆µ so that the sum behaves like an integral as the spacing ∆µ shrinks:

k(x_i, x_j) = γ∆µ Σ_{k=1}^{m} exp( −( x_i² + x_j² − 2µ_k(x_i + x_j) + 2µ_k² ) / (2ℓ²) ).

Uniform Basis Functions

Set each center location to

µ_k = a + ∆µ·(k − 1).

Specify the bases in terms of their indices,

k(x_i, x_j) = γ∆µ Σ_{k=1}^{m} exp( −(x_i² + x_j²)/(2ℓ²) − ( 2(a + ∆µ·k)² − 2(a + ∆µ·k)(x_i + x_j) )/(2ℓ²) ).

Infinite Basis Functions

Take µ_1 = a and µ_m = b, so b = a + ∆µ·(m − 1).

Take the limit as ∆µ → 0, so m → ∞:

k(x_i, x_j) = γ ∫_a^b exp( −(x_i² + x_j²)/(2ℓ²) − ( 2(µ − ½(x_i + x_j))² − ½(x_i + x_j)² )/(2ℓ²) ) dµ,

where we have used k·∆µ → µ.

Result

Performing the integration leads to

k(x_i, x_j) = γ (√(πℓ²)/2) exp( −(x_i − x_j)²/(4ℓ²) )
              × [ erf( (b − ½(x_i + x_j))/ℓ ) − erf( (a − ½(x_i + x_j))/ℓ ) ].

Now take the limit as a → −∞ and b → ∞,

k(x_i, x_j) = α exp( −(x_i − x_j)²/(4ℓ²) ),

where α = γ√(πℓ²).
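The limit can be checked numerically: a dense uniform basis scaled by γ∆µ should approach the closed form. A sketch (names ours; a and b are chosen wide relative to the inputs):

```python
import numpy as np

def k_finite(x_i, x_j, a=-10.0, b=10.0, m=2000, lengthscale=1.0, gamma=1.0):
    """Uniformly spaced RBF basis approximation, gamma * dmu * sum phi_i phi_j."""
    mu = np.linspace(a, b, m)
    dmu = mu[1] - mu[0]
    phi_i = np.exp(-(x_i - mu) ** 2 / (2 * lengthscale ** 2))
    phi_j = np.exp(-(x_j - mu) ** 2 / (2 * lengthscale ** 2))
    return gamma * dmu * np.sum(phi_i * phi_j)

def k_limit(x_i, x_j, lengthscale=1.0, gamma=1.0):
    """Closed-form limit: alpha * exp(-(x_i - x_j)^2 / (4 ell^2))."""
    alpha = gamma * np.sqrt(np.pi * lengthscale ** 2)
    return alpha * np.exp(-(x_i - x_j) ** 2 / (4 * lengthscale ** 2))

print(k_finite(0.3, -0.5), k_limit(0.3, -0.5))  # should agree closely
```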

Infinite Feature Space

An RBF model with infinitely many basis functions is a Gaussian process.

The covariance function is the exponentiated quadratic.

Note: the functional forms of the covariance function and the basis functions are similar here; this is a special case, and in general they are very different.

Similar results can be obtained for networks with multi-dimensional inputs.

Nonparametric Gaussian Processes

This work takes us from parametric to non-parametric models.

The limit implies an infinite-dimensional w.

Gaussian processes are generally non-parametric: they combine the data with the covariance function to obtain the model.

This representation cannot be summarized by a parameter vector of a fixed size.

Page 228: Session 1: Gaussian Processes - ttic.uchicago.edururtasun/tutorials/gp_cvpr12_session1.pdf · 1 x 2) Urtasun and Lawrence Session 1: GP and Regression CVPR Tutorial 9 / 74. Two Simultaneous

The Parametric Bottleneck

Parametric models have a representation that does not respond to increasing training set size.

Bayesian posterior distributions over parameters contain the information about the training data.
- Use Bayes' rule from training data, $p(\mathbf{w}|\mathbf{y}, \mathbf{X})$.
- Make predictions on test data:

$$p(\mathbf{y}_*|\mathbf{X}_*, \mathbf{y}, \mathbf{X}) = \int p(\mathbf{y}_*|\mathbf{w}, \mathbf{X}_*)\, p(\mathbf{w}|\mathbf{y}, \mathbf{X})\, \mathrm{d}\mathbf{w}.$$

$\mathbf{w}$ becomes a bottleneck for information about the training set to pass to the test set.

Solution: increase $m$ so that the bottleneck is so large that it no longer presents a problem.

How big is big enough for $m$? Non-parametrics says $m \to \infty$.
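The bottleneck is easy to see in code. Here is a minimal sketch (assuming NumPy; the cubic basis, prior variance $\alpha$ and noise variance $\sigma^2$ are hypothetical illustrative choices): once the posterior $p(\mathbf{w}|\mathbf{y}, \mathbf{X})$ is computed, predictions touch the training data only through the $m$-dimensional summary $(\boldsymbol{\mu}_w, \mathbf{C}_w)$.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, sigma2, m = 1.0, 0.01, 3   # prior variance, noise variance, basis size

def Phi(X):
    """A hypothetical fixed basis, [1, x, x^2]; any m basis functions work."""
    return np.column_stack([X ** p for p in range(m)])

X = rng.uniform(-1.0, 1.0, size=20)
y = np.sin(3.0 * X) + np.sqrt(sigma2) * rng.standard_normal(20)

# Posterior p(w | y, X): after this, the training data can be discarded.
P = Phi(X)
C_w = np.linalg.inv(P.T @ P / sigma2 + np.eye(m) / alpha)
mu_w = C_w @ P.T @ y / sigma2

# Predictions pass through the m-dimensional bottleneck (mu_w, C_w) only.
P_star = Phi(np.linspace(-1.0, 1.0, 5))
mean_star = P_star @ mu_w
var_star = np.sum(P_star @ C_w * P_star, axis=1) + sigma2
print(mean_star)
print(var_star)
```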


The Parametric Bottleneck

Now no longer possible to manipulate the model through the standard parametric form given in (1).

However, it is possible to express parametric models as GPs:

$$k(x_i, x_j) = \boldsymbol{\phi}(x_i)^\top\boldsymbol{\phi}(x_j).$$

These are known as degenerate covariance matrices.

Their rank is at most $m$; non-parametric models have full rank covariance matrices.

Most well known is the “linear kernel”, $k(x_i, x_j) = \mathbf{x}_i^\top\mathbf{x}_j$.
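A minimal numerical sketch of the rank statement (assuming NumPy; the inputs, basis centres and length scales are illustrative):

```python
import numpy as np

n = 50  # data points
X = np.linspace(-2.0, 2.0, n)

# Degenerate covariance from m = 3 RBF basis functions: rank at most 3.
Phi = np.column_stack([np.exp(-(X - c) ** 2 / 2.0) for c in (-1.0, 0.0, 1.0)])
K_degenerate = Phi @ Phi.T  # k(x_i, x_j) = phi(x_i)^T phi(x_j)
print(np.linalg.matrix_rank(K_degenerate))  # 3

# The linear kernel on 1-D inputs is the extreme case: rank 1.
K_linear = np.outer(X, X)
print(np.linalg.matrix_rank(K_linear))      # 1

# A non-degenerate covariance (exponentiated quadratic with a short
# length scale) has full rank on distinct inputs.
K_full = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * 0.1 ** 2))
print(np.linalg.matrix_rank(K_full))        # 50 here
```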


Making Predictions

For non-parametrics, prediction at new points $\mathbf{f}_*$ is made by conditioning on $\mathbf{f}$ in the joint distribution.

In GPs this involves combining the training data with the covariance function and the mean function.

Parametric is a special case when conditional prediction can be summarized in a fixed number of parameters.

Complexity of a parametric model remains fixed regardless of the size of our training data set.

For a non-parametric model the required number of parameters grows with the size of the training data.
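The conditioning step is the standard Gaussian identity. A minimal sketch (assuming NumPy; zero mean function, the exponentiated quadratic covariance in the $\exp(-(x - x')^2/(4\ell^2))$ form from the derivation above, and illustrative data and parameters):

```python
import numpy as np

def kern(A, B, alpha=1.0, ell=0.3):
    """Exponentiated quadratic, alpha * exp(-(x - x')^2 / (4 ell^2))."""
    return alpha * np.exp(-(A[:, None] - B[None, :]) ** 2 / (4 * ell ** 2))

rng = np.random.default_rng(2)
X = rng.uniform(-2.0, 2.0, size=15)            # training inputs
y = np.sin(2.0 * X) + 0.1 * rng.standard_normal(15)
X_star = np.linspace(-2.0, 2.0, 7)             # test inputs
sigma2 = 0.01                                  # observation noise variance

# Condition f_* on y in the joint Gaussian.
K = kern(X, X) + sigma2 * np.eye(len(X))
K_star = kern(X, X_star)

mean = K_star.T @ np.linalg.solve(K, y)
cov = kern(X_star, X_star) - K_star.T @ np.linalg.solve(K, K_star)
print(mean)
print(np.sqrt(np.clip(np.diag(cov), 0.0, None)))  # predictive error bars
```

Note that the training data $(X, \mathbf{y})$ enter the prediction directly; there is no fixed-size parameter summary in between.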


Covariance Functions

RBF Basis Functions

$$k(\mathbf{x}, \mathbf{x}') = \alpha\, \boldsymbol{\phi}(\mathbf{x})^\top\boldsymbol{\phi}(\mathbf{x}')$$

$$\phi_i(\mathbf{x}) = \exp\left(-\frac{\|\mathbf{x} - \boldsymbol{\mu}_i\|^2}{2\ell^2}\right)$$

$$\boldsymbol{\mu} = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$$
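A minimal sketch of this covariance (assuming NumPy; $\alpha = 1$ and $\ell = 1$ are illustrative, the centres $\boldsymbol{\mu} = (-1, 0, 1)$ are as on the slide). Every draw from the process is a weighted sum of the three bumps, so the covariance matrix has rank 3.

```python
import numpy as np

mu = np.array([-1.0, 0.0, 1.0])   # basis centres from the slide
alpha, ell = 1.0, 1.0             # illustrative choices
x = np.linspace(-3.0, 3.0, 200)

Phi = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * ell ** 2))
K = alpha * Phi @ Phi.T           # k(x, x') = alpha * phi(x)^T phi(x')

# Sampling w ~ N(0, alpha I) and forming Phi @ w draws functions from
# the process with covariance K.
rng = np.random.default_rng(3)
w = np.sqrt(alpha) * rng.standard_normal((3, 5))
draws = Phi @ w
print(draws.shape, np.linalg.matrix_rank(K))  # (200, 5) 3
```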


Covariance Functions and Mercer Kernels

Mercer kernels and covariance functions are similar.

The kernel perspective does not make a probabilistic interpretation of the covariance function.

Algorithms can be simpler, but the probabilistic interpretation is crucial for kernel parameter optimization.


Outline

1 The Gaussian Density

2 Covariance from Basis Functions

3 Basis Function Representations

4 Constructing Covariance

5 GP Limitations

6 Conclusions


Constructing Covariance Functions

Sum of two covariances is also a covariance function:

$$k(\mathbf{x}, \mathbf{x}') = k_1(\mathbf{x}, \mathbf{x}') + k_2(\mathbf{x}, \mathbf{x}')$$


Constructing Covariance Functions

Product of two covariances is also a covariance function:

$$k(\mathbf{x}, \mathbf{x}') = k_1(\mathbf{x}, \mathbf{x}')\, k_2(\mathbf{x}, \mathbf{x}')$$


Multiply by Deterministic Function

If $f(x)$ is a Gaussian process and $g(x)$ is a deterministic function, let

$$h(x) = f(x)g(x).$$

Then

$$k_h(x, x') = g(x)\, k_f(x, x')\, g(x'),$$

where $k_h$ is the covariance for $h(\cdot)$ and $k_f$ is the covariance for $f(\cdot)$.
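All three constructions can be verified numerically: each produces a positive semi-definite matrix, i.e. a valid covariance. A minimal sketch (assuming NumPy; the grid, the two base kernels and $g(x) = \cos x$ are illustrative choices):

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 60)
D2 = (x[:, None] - x[None, :]) ** 2

K1 = np.exp(-D2 / 0.5)   # exponentiated quadratic
K2 = np.outer(x, x)      # linear kernel
g = np.cos(x)            # a deterministic function

# Sum, (elementwise) product, and g(x) k(x, x') g(x') are all still
# positive semi-definite: smallest eigenvalue is non-negative up to round-off.
for K in (K1 + K2, K1 * K2, g[:, None] * K1 * g[None, :]):
    print(np.linalg.eigvalsh(K).min() >= -1e-9)  # True
```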


Covariance Functions

MLP Covariance Function

$$k(\mathbf{x}, \mathbf{x}') = \alpha \arcsin\left(\frac{w\,\mathbf{x}^\top\mathbf{x}' + b}{\sqrt{w\,\mathbf{x}^\top\mathbf{x} + b + 1}\,\sqrt{w\,\mathbf{x}'^\top\mathbf{x}' + b + 1}}\right)$$

Based on the infinite neural network model (Williams, 1998).

$w = 40$, $b = 4$.
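A minimal sketch of this covariance for 1-D inputs (assuming NumPy; $\alpha = 1$ is an illustrative choice, $w = 40$ and $b = 4$ are the slide's values; the small jitter is a standard numerical device, not part of the model):

```python
import numpy as np

def mlp_kern(x, xp, alpha=1.0, w=40.0, b=4.0):
    """MLP (arcsin) covariance for 1-D inputs."""
    num = w * np.outer(x, xp) + b
    d1 = np.sqrt(w * x ** 2 + b + 1.0)
    d2 = np.sqrt(w * xp ** 2 + b + 1.0)
    return alpha * np.arcsin(num / np.outer(d1, d2))

x = np.linspace(-1.0, 1.0, 100)
K = mlp_kern(x, x)

# Draws from N(0, K) resemble saturating neural-network outputs.
rng = np.random.default_rng(4)
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(x)))  # jitter for stability
draws = L @ rng.standard_normal((len(x), 3))
print(draws.shape)
```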


Covariance Functions

Linear Covariance Function

$$k(\mathbf{x}, \mathbf{x}') = \alpha\, \mathbf{x}^\top\mathbf{x}'$$

Bayesian linear regression.

$\alpha = 1$.
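A short sketch of why this is Bayesian linear regression (assuming NumPy; $\alpha = 1$ as on the slide): a draw from a GP with the linear covariance is exactly a straight line through the origin, $f(x) = wx$ with $w \sim \mathcal{N}(0, \alpha)$.

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(-1.0, 1.0, 50)

# Sample the slope directly: f(x) = w x with w ~ N(0, 1).
w = rng.standard_normal(3)
draws = np.outer(x, w)          # each column is a line through the origin

# The implied covariance matrix K = x x^T has rank 1:
K = np.outer(x, x)
print(np.linalg.matrix_rank(K))  # 1: a one-parameter family of functions
```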


Gaussian Process Interpolation

Figure: $f(x)$ against $x$. Real example: BACCO (see e.g. Oakley and O'Hagan, 2002). Interpolation through outputs from slow computer simulations (e.g. atmospheric carbon levels).


Noise Models

Graph of a GP

Relates input variables, $\mathbf{X}$, to vector, $\mathbf{y}$, through $\mathbf{f}$, given kernel parameters $\boldsymbol{\theta}$.

Plate notation indicates independence of $y_i | f_i$. The noise model, $p(y_i | f_i)$, can take several forms.

Simplest is Gaussian noise.

Figure: The Gaussian process depicted graphically (nodes $\mathbf{X}$, $f_i$, $y_i$, $\boldsymbol{\theta}$; plate over $i = 1 \ldots n$).


Gaussian Noise

Gaussian noise model,

$$p(y_i | f_i) = \mathcal{N}(y_i | f_i, \sigma^2),$$

where $\sigma^2$ is the variance of the noise.

Equivalent to a covariance function of the form

$$k(x_i, x_j) = \delta_{i,j}\sigma^2,$$

where $\delta_{i,j}$ is the Kronecker delta function.

The additive nature of Gaussians means we can simply add this term to existing covariance matrices.
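In code the addition is one line. A minimal sketch (assuming NumPy; the grid, kernel and $\sigma^2 = 0.01$ are illustrative); as a side effect, the noise term also regularizes the covariance matrix, which is often numerically near-singular on its own:

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 40)
K_f = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.5)  # covariance of f
sigma2 = 0.01

# Gaussian noise adds the Kronecker delta term sigma2 * I.
K_y = K_f + sigma2 * np.eye(len(x))                  # covariance of y

# K_f alone is close to singular; K_y is comfortably invertible.
print(np.linalg.cond(K_f), np.linalg.cond(K_y))
```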


Gaussian Process Regression

Figure: Gaussian process regression, $y(x)$ against $x$. Examples include WiFi localization and the C14 calibration curve.


Learning Covariance Parameters

Can we determine length scales and noise levels from the data?

$$\mathcal{N}(\mathbf{y}|\mathbf{0}, \mathbf{K}) = \frac{1}{(2\pi)^{\frac{n}{2}}|\mathbf{K}|^{\frac{1}{2}}}\exp\left(-\frac{\mathbf{y}^\top\mathbf{K}^{-1}\mathbf{y}}{2}\right)$$

The parameters are inside the covariance function (matrix):

$$k_{i,j} = k(x_i, x_j; \boldsymbol{\theta})$$


Taking the log,

$$\log\mathcal{N}(\mathbf{y}|\mathbf{0}, \mathbf{K}) = -\frac{n}{2}\log 2\pi - \frac{1}{2}\log|\mathbf{K}| - \frac{\mathbf{y}^\top\mathbf{K}^{-1}\mathbf{y}}{2}.$$


Dropping the constant and negating gives an objective to minimize,

$$E(\boldsymbol{\theta}) = \frac{1}{2}\log|\mathbf{K}| + \frac{\mathbf{y}^\top\mathbf{K}^{-1}\mathbf{y}}{2}.$$
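As a sketch of how $E(\boldsymbol{\theta})$ is used in practice (assuming NumPy and SciPy; the data, the fixed noise variance and the kernel are illustrative choices): the objective is evaluated stably via a Cholesky factorization and minimized over the length scale.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(6)
X = rng.uniform(-2.0, 2.0, size=30)
y = np.sin(2.0 * X) + 0.1 * rng.standard_normal(30)

def E(log_ell):
    """E(theta) = 0.5 log|K| + 0.5 y^T K^{-1} y for an EQ kernel."""
    ell = np.exp(log_ell)
    K = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * ell ** 2))
    K += 0.01 * np.eye(len(X))  # fixed noise variance (illustrative)
    # Cholesky gives both log|K| and K^{-1} y stably.
    L = np.linalg.cholesky(K)
    half_logdet = np.log(np.diag(L)).sum()       # = 0.5 * log|K|
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))  # = K^{-1} y
    return half_logdet + 0.5 * y @ a

res = minimize_scalar(E, bounds=(np.log(0.01), np.log(10.0)), method="bounded")
print(np.exp(res.x))  # length scale that minimizes E(theta)
```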


Eigendecomposition of Covariance

$$\mathbf{K} = \mathbf{R}\boldsymbol{\Lambda}^2\mathbf{R}^\top,$$

where $\boldsymbol{\Lambda}$ is a diagonal matrix and $\mathbf{R}^\top\mathbf{R} = \mathbf{I}$.

Useful representation, since $|\mathbf{K}| = |\boldsymbol{\Lambda}^2| = |\boldsymbol{\Lambda}|^2$.
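A quick numerical check of this identity (assuming NumPy; the small matrix is illustrative):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 6)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.5) + 0.1 * np.eye(6)

eigvals, R = np.linalg.eigh(K)   # K = R diag(eigvals) R^T, with R^T R = I
Lam = np.sqrt(eigvals)           # Lambda is diagonal, so eigvals = Lambda^2

print(np.linalg.det(K), np.prod(Lam) ** 2)               # equal
print(np.log(np.linalg.det(K)), 2 * np.log(Lam).sum())   # log|K| = 2 log|Lambda|
```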


Capacity control: $\log|\mathbf{K}|$

With $\boldsymbol{\Lambda} = \begin{bmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{bmatrix}$, the determinant is $|\boldsymbol{\Lambda}| = \lambda_1\lambda_2$: the volume (area) of the ellipse with axes $\lambda_1$ and $\lambda_2$.

In three dimensions, $\boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3)$ gives $|\boldsymbol{\Lambda}| = \lambda_1\lambda_2\lambda_3$, the volume of the ellipsoid.

Rotation leaves the volume unchanged: for orthonormal $\mathbf{R}$, $|\mathbf{R}\boldsymbol{\Lambda}| = \lambda_1\lambda_2$.

So the $\log|\mathbf{K}| = 2\log|\boldsymbol{\Lambda}|$ term in $E(\boldsymbol{\theta})$ measures (log) volume: it controls the capacity of the model.

Data Fit: $\frac{\mathbf{y}^\top\mathbf{K}^{-1}\mathbf{y}}{2}$

Figure: The data fit term visualized in the $(y_1, y_2)$ plane; the Gaussian's contours have eigenvalue axes $\lambda_1$ and $\lambda_2$.

Learning Covariance Parameters

Can we determine length scales and noise levels from the data?

Figure: Left, the fit $y(x)$ against $x$; right, the objective $E(\boldsymbol{\theta})$ against the length scale $\ell$ (log axis), traced as the length scale is optimized.

$$E(\boldsymbol{\theta}) = \frac{1}{2}\log|\mathbf{K}| + \frac{\mathbf{y}^\top\mathbf{K}^{-1}\mathbf{y}}{2}$$

Gene Expression Example

Global expression estimation with $\ell = 30$.

Global expression estimation with $\ell = 15.6$.

Data from Della Gatta et al. (2008). Figure from Kalaitzis and Lawrence (2011).


Outline

1 The Gaussian Density

2 Covariance from Basis Functions

3 Basis Function Representations

4 Constructing Covariance

5 GP Limitations

6 Conclusions


Limitations of Gaussian Processes

Inference is $O(n^3)$ due to the matrix inverse (in practice, use Cholesky).

Gaussian processes don't deal well with discontinuities (financial crises, phosphorylation, collisions, edges in images).

The widely used exponentiated quadratic covariance (RBF) can be too smooth in practice (but there are many alternatives!).


Summary

Broad introduction to Gaussian processes.
- Started with the Gaussian distribution.
- Motivated Gaussian processes through the multivariate density.

Emphasized the role of the covariance (not the mean).

Performs nonlinear regression with error bars.

Parameters of the covariance function (kernel) are easily optimized with maximum likelihood.


References

G. Della Gatta, M. Bansal, A. Ambesi-Impiombato, D. Antonini, C. Missero, and D. di Bernardo. Direct targets of the trp63 transcription factor revealed by a combination of gene expression profiling and reverse engineering. Genome Research, 18(6):939–948, Jun 2008.

A. A. Kalaitzis and N. D. Lawrence. A simple approach to ranking differentially expressed gene expression time courses through Gaussian process regression. BMC Bioinformatics, 12(180), 2011.

R. M. Neal. Bayesian Learning for Neural Networks. Springer, 1996. Lecture Notes in Statistics 118.

J. Oakley and A. O'Hagan. Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika, 89(4):769–784, 2002.

C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.

C. K. I. Williams. Computation with infinite neural networks. Neural Computation, 10(5):1203–1216, 1998.
