Pattern Recognition and Machine Learning: Introduction
Libao Jin
November 17, 2016
Example: Handwritten Digit Recognition
Training Set: x, used to tune the parameters of an adaptive model
Target Vector: t, used to express the category of a digit
Note that there is one such target vector t for each digit image x
The Result of Running the Machine Learning Algorithm
y = y(x), which is encoded in the same way as the target vectors
Once the model is trained, it can determine the identity of new digit images, which are said to comprise a test set
In practical applications, the training data can comprise only a tiny fraction of all possible input vectors, so generalization is a central goal in pattern recognition
Polynomial Curve Fitting
Training Set (blue circles): $\mathbf{x} \equiv (x_1, \dots, x_N)^T$
Target Vector (green line): $\mathbf{t} \equiv (t_1, \dots, t_N)^T$

$$y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$
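A minimal sketch of evaluating this model in Python (numpy assumed; the coefficient values below are invented purely for illustration):

```python
import numpy as np

def poly(x, w):
    # y(x, w) = sum over j of w[j] * x**j, for scalar or array x
    return sum(w_j * x**j for j, w_j in enumerate(w))

w = np.array([0.5, -1.0, 2.0, 0.3])  # hypothetical coefficients, M = 3
x = np.linspace(0.0, 1.0, 5)
print(poly(x, w))
```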
Sum-of-Squares Error Function
$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{y(x_n, \mathbf{w}) - t_n\}^2$$
Minimize Sum-of-Squares Error Function
$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{y(x_n, \mathbf{w}) - t_n\}^2 = \frac{1}{2} \sum_{n=1}^{N} \left( \sum_{i=0}^{M} w_i x_n^i - t_n \right)^2$$

Setting each partial derivative to zero,

$$\frac{\partial E(\mathbf{w})}{\partial w_j} = \sum_{n=1}^{N} \left( \sum_{i=0}^{M} w_i x_n^i - t_n \right) x_n^j = \begin{bmatrix} x_1^j & \cdots & x_N^j \end{bmatrix} (X\mathbf{w} - \mathbf{t}) = 0,$$

where

$$X = \begin{bmatrix} x_1^0 & x_1 & \cdots & x_1^M \\ x_2^0 & x_2 & \cdots & x_2^M \\ \vdots & \vdots & \ddots & \vdots \\ x_N^0 & x_N & \cdots & x_N^M \end{bmatrix}, \qquad \mathbf{w} = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_M \end{bmatrix}, \qquad \mathbf{t} = \begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_N \end{bmatrix}.$$

Stacking these $M + 1$ equations gives $X^T (X\mathbf{w} - \mathbf{t}) = 0 \Rightarrow \mathbf{w} = (X^T X)^{-1} X^T \mathbf{t}$.
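A minimal sketch of this least-squares fit (numpy assumed; the training data are synthetic, invented here for illustration, and `np.linalg.lstsq` stands in for the explicit inverse $(X^T X)^{-1} X^T \mathbf{t}$, which it solves more stably):

```python
import numpy as np

def fit_polynomial(x, t, M):
    # Design matrix with X[n, j] = x_n ** j, matching the derivation above
    X = np.vander(x, M + 1, increasing=True)
    # lstsq solves min_w ||X w - t||^2, the stable counterpart
    # of w = (X^T X)^{-1} X^T t
    w, *_ = np.linalg.lstsq(X, t, rcond=None)
    return w

# Synthetic training data, invented for illustration
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)
print(fit_polynomial(x, t, M=3))
```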
[Figures: polynomial fits of order M = 0, 1, 3, and 9 to the same training data]
Over-fitting
Root-Mean-Square (RMS) Error: $E_{\mathrm{RMS}} = \sqrt{2E(\mathbf{w}^*)/N}$
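A small sketch of computing this quantity (numpy assumed; `rms_error` is a hypothetical helper name). Evaluating it on both the training set and a held-out test set is what exposes over-fitting as $M$ grows:

```python
import numpy as np

def rms_error(w, x, t):
    # E_RMS = sqrt(2 * E(w) / N), with E the sum-of-squares error above
    y = np.vander(x, len(w), increasing=True) @ w
    E = 0.5 * np.sum((y - t) ** 2)
    return np.sqrt(2.0 * E / len(x))
```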
Polynomial Coefficients
[Table: coefficients $\mathbf{w}^*$ of the fitted polynomials for increasing order $M$]

Data Set Size: N = 15 vs. N = 100
[Figures: 9th order polynomial fits for data set sizes N = 15 and N = 100]
Probability Theory
Suppose two random variables X and Y are observed over N trials, with $n_{ij}$ the number of trials in which $X = x_i$ and $Y = y_j$, and $c_i = \sum_j n_{ij}$ the number of trials in which $X = x_i$.

Marginal Probability: $p(X = x_i) = \dfrac{c_i}{N}$

Joint Probability: $p(X = x_i, Y = y_j) = \dfrac{n_{ij}}{N}$

Conditional Probability: $p(Y = y_j \mid X = x_i) = \dfrac{n_{ij}}{c_i}$
Probability Theory
Sum Rule: $p(X = x_i) = \dfrac{c_i}{N} = \dfrac{1}{N} \sum_{j=1}^{L} n_{ij} = \sum_{j=1}^{L} p(X = x_i, Y = y_j)$

Product Rule: $p(X = x_i, Y = y_j) = \dfrac{n_{ij}}{N} = \dfrac{n_{ij}}{c_i} \cdot \dfrac{c_i}{N} = p(Y = y_j \mid X = x_i)\, p(X = x_i)$
The Rules of Probability
Sum Rule: $p(X) = \sum_Y p(X, Y)$

Product Rule: $p(X, Y) = p(Y \mid X)\, p(X)$
Bayes’ Theorem
By the product rule and the symmetry $p(X, Y) = p(Y, X)$, we have $p(Y \mid X)\, p(X) = p(X \mid Y)\, p(Y)$, and therefore

$$p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}, \qquad \text{where} \quad p(X) = \sum_Y p(X \mid Y)\, p(Y).$$
posterior ∝ likelihood × prior
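As a quick numerical check of these rules, a sketch with a made-up 2×2 joint distribution (numpy assumed):

```python
import numpy as np

# A made-up joint distribution p(X, Y) over a 2x2 grid (rows: X, cols: Y)
p_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])

p_x = p_xy.sum(axis=1)             # sum rule: p(X) = sum_Y p(X, Y)
p_y = p_xy.sum(axis=0)
p_y_given_x = p_xy / p_x[:, None]  # product rule rearranged: p(Y|X) = p(X,Y)/p(X)

# Bayes' theorem: p(X|Y) = p(Y|X) p(X) / p(Y)
p_x_given_y = p_y_given_x * p_x[:, None] / p_y[None, :]
print(np.allclose(p_x_given_y, p_xy / p_y[None, :]))  # True
```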
Probability Density
The probability that $x$ lies in $(-\infty, z)$ is given by the cumulative distribution function

$$P(z) = \int_{-\infty}^{z} p(x)\, dx,$$

where the density $p(x)$ satisfies

$$p(x) \geq 0, \qquad \int_{-\infty}^{\infty} p(x)\, dx = 1.$$
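A small numerical sanity check of these properties, using a standard Gaussian density as an example (numpy assumed):

```python
import numpy as np

# Check that a standard Gaussian density is non-negative and
# integrates to one; P approximates the cumulative distribution P(z)
x = np.linspace(-8.0, 8.0, 10001)
p = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)
P = np.cumsum(p) * (x[1] - x[0])   # Riemann-sum approximation of P(z)
print(p.min() >= 0, P[-1])         # True, ~1.0
```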
Expectations
$E[f] = \sum_x p(x) f(x)$ (discrete) $\qquad E[f] = \int p(x) f(x)\, dx$ (continuous)

Conditional Expectation: $E[f \mid y] = \sum_x p(x \mid y) f(x)$

Approximate Expectation: $E[f] \approx \dfrac{1}{N} \sum_{n=1}^{N} f(x_n)$
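A sketch of the approximate (Monte Carlo) expectation, using a standard Gaussian for $p(x)$ and $f(x) = x^2$ so the exact answer, $E[f] = 1$, is known:

```python
import numpy as np

# Approximate E[f] by a sample average with x_n drawn from p(x)
rng = np.random.default_rng(0)
x_n = rng.normal(size=100_000)   # samples from a standard Gaussian
print(np.mean(x_n**2))           # ~1.0, since E[x^2] = 1 here
```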
Variances and Covariances
$\operatorname{var}[f] = E[(f(x) - E[f(x)])^2] = E[f(x)^2] - E[f(x)]^2$

$\operatorname{cov}[x, y] = E_{x,y}[\{x - E[x]\}\{y - E[y]\}] = E_{x,y}[xy] - E[x]\,E[y]$

For vector-valued $\mathbf{x}$ and $\mathbf{y}$: $\operatorname{cov}[\mathbf{x}, \mathbf{y}] = E_{\mathbf{x},\mathbf{y}}[\{\mathbf{x} - E[\mathbf{x}]\}\{\mathbf{y}^T - E[\mathbf{y}^T]\}] = E_{\mathbf{x},\mathbf{y}}[\mathbf{x}\mathbf{y}^T] - E[\mathbf{x}]\,E[\mathbf{y}^T]$
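A quick numerical check of the scalar identities on made-up correlated samples (numpy assumed):

```python
import numpy as np

# Sanity-check var[x] = E[x^2] - E[x]^2 and cov[x, y] = E[xy] - E[x]E[y]
rng = np.random.default_rng(1)
x = rng.normal(size=50_000)
y = 0.5 * x + rng.normal(size=50_000)   # made-up correlated variable

print(np.var(x), np.mean(x**2) - np.mean(x)**2)    # should agree
print(np.cov(x, y, bias=True)[0, 1],
      np.mean(x * y) - np.mean(x) * np.mean(y))    # should agree
```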