Introduction
• Goal: Learn input-output systems; given an input, predict the output.
• Gaussian Process Regression (GPR): a powerful nonparametric regression technique
• Kriging and spline fits are both instances of GPR
Outline for today:
• Gaussian random vectors, marginals, and conditionals
• Gaussian processes
• Covariance functions
• GPR prediction
How You Get Your Grade¹
Now that I have your undivided attention…
• Students are sometimes ‘graded on a curve’
• Originally, this was a bell-curve : the normal (Gaussian) distribution
• If a random variable X is normally distributed with mean µ and variance σ², its density is

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
¹ Not really.
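A quick numerical check of this density (a minimal sketch in NumPy; the function name is ours):

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# A density must integrate to 1; approximate the integral on a fine grid.
x = np.linspace(-8.0, 8.0, 10001)
print(np.trapz(normal_pdf(x), x))  # ~1.0
```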
Multiple Gaussian Random Variables
What can we say about multiple normally-distributed random variables?
x1 and x2 are Gaussian, µ1 = 1, µ2 = 3, σ1 = σ2 = 1.
If x₁ and x₂ are independent,

$$
\begin{aligned}
p(x_1, x_2) &= p(x_1)\,p(x_2) \\
&= \frac{1}{\sigma_1\sqrt{2\pi}} \cdot \frac{1}{\sigma_2\sqrt{2\pi}}\, \exp\!\left(-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}\right) \exp\!\left(-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}\right) \\
&= \frac{1}{2\pi\,\sigma_1\sigma_2}\, \exp\!\left(-\frac{(x_1-\mu_1)^2}{2\sigma_1^2} - \frac{(x_2-\mu_2)^2}{2\sigma_2^2}\right) \\
&= \frac{1}{2\pi\,\sigma_1\sigma_2}\, \exp\!\left(-\frac{1}{2}\begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix}^T \begin{bmatrix} 1/\sigma_1^2 & 0 \\ 0 & 1/\sigma_2^2 \end{bmatrix} \begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix}\right) \\
&= \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\, \exp\!\left(-\frac{(x-\mu)^T \Sigma^{-1} (x-\mu)}{2}\right),
\end{aligned}
$$

with n = 2, µ = (µ₁, µ₂), and Σ = diag(σ₁², σ₂²).
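The last step of the derivation can be verified numerically. A minimal sketch using SciPy (the test point and parameters are arbitrary choices of ours):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

mu1, mu2, s1, s2 = 1.0, 3.0, 1.0, 1.0
x = np.array([0.5, 2.0])

# Product of independent 1-D densities...
lhs = norm.pdf(x[0], mu1, s1) * norm.pdf(x[1], mu2, s2)
# ...equals the 2-D density with a diagonal covariance matrix.
rhs = multivariate_normal.pdf(x, mean=[mu1, mu2], cov=np.diag([s1**2, s2**2]))
print(lhs, rhs)  # equal up to floating-point error
```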
Correlated Gaussian Random Variables
$$p(x) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\, \exp\!\left(-\frac{(x-\mu)^T \Sigma^{-1} (x-\mu)}{2}\right)$$

• What about cross-terms in Σ?
• They denote covariances: components of x don’t vary independently
• Plots below show samples generated when σ12 = 0, 0.7, 0.9
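The sample plots can be reproduced with a few lines of NumPy (a sketch; the means and sample count are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 3.0])
for s12 in (0.0, 0.7, 0.9):
    # Covariance with unit variances and cross-term (covariance) s12.
    Sigma = np.array([[1.0, s12],
                      [s12, 1.0]])
    samples = rng.multivariate_normal(mu, Sigma, size=500)
    print(f"sigma12 = {s12}: empirical covariance =\n{np.cov(samples.T).round(2)}")
```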
Gaussian Random Vector
• A vector X whose components Xi are Gaussian random variables
• Density function analogous to 1-D case, but note covariances!
$$p(x) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\, \exp\!\left(-\frac{(x-\mu)^T \Sigma^{-1} (x-\mu)}{2}\right)$$
Probability density for a 2-D Gaussian random vector.
Marginal Distribution
• Idea: ignore some components; “marginalize over” or “integrate out”
• Picture: project joint distribution onto appropriate ‘wall’, normalize
• Math: $p(x_i) = \int_{\mathcal{X}_{-i}} p(x_i, x_{-i})\, dx_{-i}$
• Marginals of Gaussians are Gaussian!
• Mean: omit components of µ
E[X1] = µ1
• Covariance: omit rows and columns of Σ
$\mathrm{cov}(X_1) = \Sigma_{11}$

Marginal density for a 2-D Gaussian (note: not normalized).
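In code, marginalizing a Gaussian really is just dropping entries (a trivial sketch; the numbers are ours):

```python
import numpy as np

mu = np.array([1.0, 3.0])
Sigma = np.array([[1.0, 0.7],
                  [0.7, 1.0]])

# Marginal of X1: keep its component of mu and its row/column of Sigma.
mu_1, var_1 = mu[0], Sigma[0, 0]
print(mu_1, var_1)  # mean 1.0, variance 1.0
```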
Conditional Distribution
• Idea: fix the “given” value of one or more components; X1 | X2 = a
• Picture: slice of joint distribution (suitably normalized)
• Math: Bayes' rule, $p(x_1 \mid x_2) = \dfrac{p(x_1, x_2)}{p(x_2)}$
• Conditionals of Gaussians are Gaussian!
• Mean: Depends on “given” a via covariance
$$E(X_1 \mid X_2 = a) = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(a - \mu_2)$$
• Covariance: independent of a!
$$\mathrm{cov}(X_1 \mid X_2 = a) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$
Conditional pdfs for a 2-D Gaussian (note: not normalized).
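These two formulas are the entire computation. A minimal sketch for the 2-D case (the partitioning and function name are ours):

```python
import numpy as np

def condition_gaussian(mu1, mu2, S11, S12, S22, a):
    """Mean and covariance of X1 given X2 = a, for jointly Gaussian (X1, X2)."""
    K = S12 @ np.linalg.inv(S22)   # Sigma_12 Sigma_22^{-1}
    cond_mean = mu1 + K @ (a - mu2)
    cond_cov = S11 - K @ S12.T     # Sigma_21 = Sigma_12^T by symmetry
    return cond_mean, cond_cov

mu = np.array([1.0, 3.0])
Sigma = np.array([[1.0, 0.7],
                  [0.7, 1.0]])
m, C = condition_gaussian(mu[:1], mu[1:], Sigma[:1, :1], Sigma[:1, 1:],
                          Sigma[1:, 1:], a=np.array([2.5]))
print(m, C)
```

Note that cond_cov never touches a, matching the bullet above: the conditional covariance is independent of the observed value.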
Motivation for Gaussian Process Regression
• Suppose we want to model a system x → G → y
• What if we consider the outputs for each x as a random variable?
• Outputs corresponding to ‘nearby’ inputs are positively correlated
• Outputs corresponding to ‘distant’ inputs are uncorrelated
Gaussian Processes
Stochastic process: possibly infinite set of random variables.
Model a system x → G → y using a stochastic process.
• The random variables themselves represent the outputs of the system
• The indices into the set represent the inputs of the system
$$G = \{Y_x,\ x \in \mathcal{X}\}$$

e.g., $Y_{(4.5,\,-6.7)}$ is the output at the input $(x_1 = 4.5,\ x_2 = -6.7)$.
GP: any finite subset of outputs forms a Gaussian random vector.
Specify the marginal mean and variance: regardless of the other outputs, the output at any single location has mean µ₀ and variance σ₀².
What about the covariance?
Covariance Function
Covariance between outputs depends on distance between inputs.
• Specify a correlation function instead, then scale by marginal variance
• As d(x1, x2)→ 0, corr(Y1, Y2)→ 1
• As d(x1, x2)→∞, corr(Y1, Y2)→ 0
• Monotonic decrease in between may not be a bad idea!
• Negative exponential, squared exponential, and linear decrease to 0 are all possible
• For smooth functions, use $\mathrm{corr}(Y_i, Y_j) = \exp\!\left(-\dfrac{(x_i - x_j)^2}{2\tau^2}\right)$ (a code sketch follows after this list)
• For other covariance functions, see Rasmussen and Williams, 2006
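A minimal sketch of the squared-exponential correlation over 1-D inputs (the function name is ours):

```python
import numpy as np

def sq_exp_corr(x, tau=1.0):
    """Correlation matrix R with R_ij = exp(-(x_i - x_j)^2 / (2 tau^2))."""
    d = x[:, None] - x[None, :]        # all pairwise differences
    return np.exp(-(d**2) / (2 * tau**2))

x = np.array([0.0, 0.1, 2.0])
print(sq_exp_corr(x).round(3))
# Nearby inputs (0.0 and 0.1) give a correlation near 1;
# distant inputs (0.0 and 2.0) give a much smaller one (~0.14).
```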
GPR Prediction is Straightforward
• Consider a finite subset of outputs $\{Y_{x^{(i)}},\ i = 1, \dots, m\}$
• Concatenate the outputs to form a Gaussian random vector Y
• Correlation function specifies correlation matrix R, with Rij = corr(Yi, Yj)
• Multiply by the marginal variance σ₀² to get the covariance matrix Σ
• Suppose a subset of these outputs Y1 is known (“given”)
• What is the conditional distribution of (Y2 | Y1 = y)?
• We have seen this before. . .
• We've just made a prediction using a Kriging model!
GPR Prediction: A Summary
• Begin with the given data $\mathcal{D} = \{(x^{(i)}, y^{(i)}),\ i = 1, \dots, m_1\}$
• Append desired prediction locations and corresponding unknown outputs
• Write it all together as one large set $\{Y_{x^{(i)}},\ i = 1, \dots, m\}$
• Concatenate outputs to form a Gaussian random vector Y
• Use the correlation function to compute the correlation of every output Yᵢ with every other output Yⱼ, giving the correlation matrix R. Scale R by the marginal variance σ₀² to get the covariance matrix Σ. The mean vector is just µ₀1, where 1 is the all-ones vector
• Use the formulae for Gaussian conditional distributions (k = known, u = unknown):

$$E(Y_u \mid Y_k = y) = \mu_0 \mathbf{1} + \Sigma_{uk}\Sigma_{kk}^{-1}(y - \mu_0 \mathbf{1}),$$

$$\mathrm{cov}(Y_u \mid Y_k = y) = \Sigma_{uu} - \Sigma_{uk}\Sigma_{kk}^{-1}\Sigma_{ku}.$$
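Putting the recipe together, a minimal GPR predictor under the assumptions above (squared-exponential correlation, constant mean µ₀; all names and test data are ours):

```python
import numpy as np

def gpr_predict(x_known, y_known, x_new, mu0=0.0, sigma0=1.0, tau=1.0):
    """Conditional mean and covariance of the outputs at x_new, given the data."""
    x = np.concatenate([x_known, x_new])
    d = x[:, None] - x[None, :]
    Sigma = sigma0**2 * np.exp(-(d**2) / (2 * tau**2))  # covariance of all outputs

    k = len(x_known)                         # the first k outputs are known
    S_kk, S_uk, S_uu = Sigma[:k, :k], Sigma[k:, :k], Sigma[k:, k:]

    W = S_uk @ np.linalg.inv(S_kk)           # Sigma_uk Sigma_kk^{-1}
    mean = mu0 + W @ (y_known - mu0)
    cov = S_uu - W @ S_uk.T                  # Sigma_ku = Sigma_uk^T by symmetry
    return mean, cov

x_data = np.array([0.0, 1.0, 2.0])
y_data = np.sin(x_data)
m, C = gpr_predict(x_data, y_data, np.array([1.5]))
print(m, np.sqrt(np.diag(C)))                # prediction and its standard error
```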
Calibrating: Tuning Hyperparameters
How do we set µ0, σ0, τ?
Cross-validation: Out of m known data points,
• Leave one out
• Construct fit using all the others
• See how it performed on the left-out datum
• Repeat for all points, compute the average² ‘leave-one-out error’
Find the values of µ₀, σ₀, τ that minimize the average held-out error (a naive sketch follows below).
Alternative statistical approach: maximum marginal likelihood (ML-II); see Rasmussen and Williams, 2006.
² Not as difficult as it seems: algebra tricks allow us to simplify the math.
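A naive leave-one-out loop, reusing gpr_predict, x_data, and y_data from the prediction sketch above (the algebra tricks in the footnote avoid these repeated refits; only τ is searched here, for brevity):

```python
import numpy as np

def loo_error(x, y, mu0, sigma0, tau):
    """Mean squared leave-one-out prediction error for one hyperparameter setting."""
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i        # hold out point i
        m, _ = gpr_predict(x[mask], y[mask], x[i:i + 1], mu0, sigma0, tau)
        errs.append((m[0] - y[i]) ** 2)
    return np.mean(errs)

# Simple grid search over tau, with mu0 and sigma0 fixed.
taus = [0.3, 1.0, 3.0]
best = min(taus, key=lambda t: loo_error(x_data, y_data, 0.0, 1.0, t))
print("best tau:", best)
```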
Questions?