Introduction to Radial Basis Function Networks
Content
- Overview
- The Models of Function Approximator
- The Radial Basis Function Networks
- RBFNs for Function Approximation
- The Projection Matrix
- Learning the Kernels
- Bias-Variance Dilemma
- The Effective Number of Parameters
- Model Selection
- Incremental Operations
Introduction to Radial Basis Function Networks
Overview
Typical Applications of NN

- Pattern classification: $l = f(\mathbf{x})$, where the label $l \in C \subset \mathbb{N}$ ranges over a finite set of classes.
- Function approximation: $y = f(\mathbf{x})$, with $\mathbf{x} \in X \subseteq \mathbb{R}^n$ and $y \in Y \subseteq \mathbb{R}^m$.
- Time-series forecasting: $\mathbf{x}_t = f(\mathbf{x}_{t-1}, \mathbf{x}_{t-2}, \mathbf{x}_{t-3}, \ldots)$.
Function Approximation

Unknown function: $f: X \to Y$, with $X \subseteq \mathbb{R}^n$ and $Y \subseteq \mathbb{R}^m$.
Approximator: $\hat{f}: X \to Y$, built to mimic the unknown $f$.
Supervised Learning

An unknown function generates target outputs $y_i$ for inputs $\mathbf{x}_i$, $i = 1, 2, \ldots$ The neural network produces estimates $\hat{y}_i$; the error $e_i = y_i - \hat{y}_i$ is fed back to adjust the network.
Neural Networks as Universal Approximators

Feedforward neural networks with a single hidden layer of sigmoidal units are capable of approximating uniformly any continuous multivariate function, to any desired degree of accuracy.
- Hornik, K., Stinchcombe, M., and White, H. (1989). "Multilayer Feedforward Networks are Universal Approximators," Neural Networks, 2(5), 359-366.

Like feedforward neural networks with a single hidden layer of sigmoidal units, RBF networks can be shown to be universal approximators.
- Park, J. and Sandberg, I. W. (1991). "Universal Approximation Using Radial-Basis-Function Networks," Neural Computation, 3(2), 246-257.
- Park, J. and Sandberg, I. W. (1993). "Approximation and Radial-Basis-Function Networks," Neural Computation, 5(2), 305-316.
Statistics vs. Neural Networks

Statistics               Neural Networks
model                    network
estimation               learning
regression               supervised learning
interpolation            generalization
observations             training set
parameters               (synaptic) weights
independent variables    inputs
dependent variables      outputs
ridge regression         weight decay
Introduction to Radial Basis Function Networks
The Model of Function Approximator
Linear Models

$$f(\mathbf{x}) = \sum_{i=1}^{m} w_i\, \phi_i(\mathbf{x})$$

The basis functions $\phi_i$ are fixed; only the weights $w_i$ are learned.
Linear Models

$$y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i\, \phi_i(\mathbf{x})$$

[Network diagram: the inputs $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ feed the hidden units $\phi_1, \phi_2, \ldots, \phi_m$, which compute feature vectors (decomposition, feature extraction, transformation); their outputs are linearly weighted by $w_1, w_2, \ldots, w_m$ and summed into the output $y$.]
Example Linear Models

With $f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$:

- Polynomial: $f(x) = \sum_i w_i x^i$, with $\phi_i(x) = x^i$, $i = 0, 1, 2, \ldots$
- Fourier series: $f(x) = \sum_k w_k \exp(j 2\pi k \omega_0 x)$, with $\phi_k(x) = \exp(j 2\pi k \omega_0 x)$, $k = 0, 1, 2, \ldots$
Single-Layer Perceptrons as Universal Approximators

The same architecture, $f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$, with sigmoidal hidden units: with a sufficient number of sigmoidal units, it can be a universal approximator.
Radial Basis Function Networks as Universal Approximators

The same architecture with radial-basis-function hidden units: with a sufficient number of RBF units, it can also be a universal approximator.
Non-Linear Models

$$f(\mathbf{x}) = \sum_{i=1}^{m} w_i\, \phi_i(\mathbf{x})$$

Here the basis functions themselves are adjusted by the learning process, in addition to the weights.
Introduction to Radial Basis Function Networks
The Radial Basis Function Networks
Radial Basis Functions

$$f(\mathbf{x}) = \sum_{i=1}^{m} w_i\, \phi_i(\mathbf{x}), \qquad \phi_i(\mathbf{x}) = \phi(\|\mathbf{x} - \mathbf{x}_i\|)$$

Three parameters define a radial function:
- Center $\mathbf{x}_i$
- Distance measure $r = \|\mathbf{x} - \mathbf{x}_i\|$
- Shape $\phi$
Typical Radial Functions

- Gaussian: $\phi(r) = e^{-r^2 / 2\sigma^2}$, with $\sigma > 0$ and $r \in \mathbb{R}$
- Hardy multiquadratic: $\phi(r) = \sqrt{r^2 + c^2}$, with $c > 0$ and $r \in \mathbb{R}$
- Inverse multiquadratic: $\phi(r) = c / \sqrt{r^2 + c^2}$, with $c > 0$ and $r \in \mathbb{R}$
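The three shapes are easy to evaluate directly; below is a minimal NumPy sketch of the formulas above (the default values of the shape parameters are illustrative, not from the lecture):

```python
import numpy as np

# Minimal sketch of the three radial functions; sigma and c are the shape
# parameters, r is an array of distances r = ||x - x_i||.
def gaussian(r, sigma=1.0):
    return np.exp(-r**2 / (2.0 * sigma**2))

def multiquadric(r, c=1.0):           # Hardy multiquadratic
    return np.sqrt(r**2 + c**2)

def inverse_multiquadric(r, c=1.0):   # equals 1 at r = 0 for every c
    return c / np.sqrt(r**2 + c**2)

r = np.linspace(-10.0, 10.0, 101)
print(gaussian(r).max(), multiquadric(r).min(), inverse_multiquadric(r, c=3.0)[50])
```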
Gaussian Basis Function ($\sigma = 0.5, 1.0, 1.5$)

[Plot: $\phi(r) = e^{-r^2/2\sigma^2}$ for $\sigma = 0.5$, $1.0$, $1.5$; larger $\sigma$ gives a wider bell.]
Inverse Multiquadratic

[Plot: $\phi(r) = c/\sqrt{r^2 + c^2}$ over $r \in [-10, 10]$ for $c = 1, 2, 3, 4, 5$; every curve peaks at $\phi(0) = 1$, and larger $c$ gives a flatter curve.]
Most General RBF

$$\phi_i(\mathbf{x}) = \phi\big((\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\big)$$

The basis $\{\phi_i : i = 1, 2, \ldots\}$ is "nearly" orthogonal.
Properties of RBFs

- On-center, off-surround.
- Analogies with localized receptive fields found in several biological structures, e.g., the visual cortex and ganglion cells.
The Topology of RBF

[Network diagram: inputs $x_1, \ldots, x_n$ carry the feature vectors; the hidden units perform a projection; the output units $y_1, \ldots, y_m$ perform an interpolation.]

As a function approximator: $\mathbf{y} = f(\mathbf{x})$.
The Topology of RBF

[Network diagram: the hidden units respond to subclasses, and the output units combine them into classes.]

As a pattern classifier.
Introduction to Radial Basis Function Networks
RBFNs for Function Approximation
The Idea

Given training data sampled from an unknown function of $x$:
- Place basis functions (kernels) along the input axis.
- Learn the weighted sum $y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$ so that the learned function follows the training data.
- The learned function then predicts $y$ for non-training samples by interpolating between the kernels.
Radial Basis Function Networks as Universal Approximators

Model: $y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$.
Training set: $T = \{(\mathbf{x}^{(k)}, y^{(k)})\}_{k=1}^{p}$.
Goal: $y^{(k)} \approx f(\mathbf{x}^{(k)})$ for all $k$, i.e.,

$$\min \ \mathrm{SSE} = \sum_{k=1}^{p} \big(f(\mathbf{x}^{(k)}) - y^{(k)}\big)^2 = \sum_{k=1}^{p} \Big(\sum_{i=1}^{m} w_i \phi_i(\mathbf{x}^{(k)}) - y^{(k)}\Big)^2$$
Learn the Optimal Weight Vector

With the kernels fixed, minimizing the SSE above over $w_1, \ldots, w_m$ is a linear least-squares problem; the following slides solve it in closed form.
Regularization

Training set $T = \{(\mathbf{x}^{(k)}, y^{(k)})\}_{k=1}^{p}$; goal $y^{(k)} \approx f(\mathbf{x}^{(k)})$ for all $k$. Add a weight penalty to the cost:

$$C = \sum_{k=1}^{p} \big(y^{(k)} - f(\mathbf{x}^{(k)})\big)^2 + \sum_{i=1}^{m} \lambda_i w_i^2, \qquad \lambda_i \ge 0$$

If regularization is unneeded, set $\lambda_i = 0$.
Learn the Optimal Weight Vector

Minimize $C = \sum_{k=1}^{p} \big(y^{(k)} - f(\mathbf{x}^{(k)})\big)^2 + \sum_{i=1}^{m} \lambda_i w_i^2$ with $f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$. Setting each partial derivative to zero,

$$\frac{\partial C}{\partial w_j} = 2 \sum_{k=1}^{p} \big(f(\mathbf{x}^{(k)}) - y^{(k)}\big)\, \phi_j(\mathbf{x}^{(k)}) + 2 \lambda_j w_j = 0$$

At the optimum $f^*(\mathbf{x}) = \sum_{i=1}^{m} w_i^* \phi_i(\mathbf{x})$, this gives

$$\sum_{k=1}^{p} f^*(\mathbf{x}^{(k)})\, \phi_j(\mathbf{x}^{(k)}) + \lambda_j w_j^* = \sum_{k=1}^{p} y^{(k)}\, \phi_j(\mathbf{x}^{(k)})$$
Learn the Optimal Weight Vector

Define $\boldsymbol{\phi}_j = \big(\phi_j(\mathbf{x}^{(1)}), \ldots, \phi_j(\mathbf{x}^{(p)})\big)^T$, $\mathbf{f}^* = \big(f^*(\mathbf{x}^{(1)}), \ldots, f^*(\mathbf{x}^{(p)})\big)^T$, and $\mathbf{y} = \big(y^{(1)}, \ldots, y^{(p)}\big)^T$. The optimality conditions become

$$\boldsymbol{\phi}_j^T \mathbf{f}^* + \lambda_j w_j^* = \boldsymbol{\phi}_j^T \mathbf{y}, \qquad j = 1, \ldots, m$$
Learn the Optimal Weight Vector

Stacking the $m$ conditions, define $\boldsymbol{\Phi} = [\boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \ldots, \boldsymbol{\phi}_m]$, $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$, and $\mathbf{w}^* = (w_1^*, w_2^*, \ldots, w_m^*)^T$. Then

$$\boldsymbol{\Phi}^T \mathbf{f}^* + \boldsymbol{\Lambda} \mathbf{w}^* = \boldsymbol{\Phi}^T \mathbf{y}$$
Learn the Optimal Weight Vector

Since $f^*(\mathbf{x}^{(k)}) = \sum_{i=1}^{m} w_i^* \phi_i(\mathbf{x}^{(k)})$, the fitted values satisfy $\mathbf{f}^* = \boldsymbol{\Phi} \mathbf{w}^*$, where $\boldsymbol{\Phi}$ is the $p \times m$ matrix with entries $\Phi_{ki} = \phi_i(\mathbf{x}^{(k)})$. Substituting,

$$\boldsymbol{\Phi}^T \boldsymbol{\Phi} \mathbf{w}^* + \boldsymbol{\Lambda} \mathbf{w}^* = \boldsymbol{\Phi}^T \mathbf{y}$$
Learn the Optimal Weight Vector

$$\big(\boldsymbol{\Phi}^T \boldsymbol{\Phi} + \boldsymbol{\Lambda}\big) \mathbf{w}^* = \boldsymbol{\Phi}^T \mathbf{y}
\;\;\Longrightarrow\;\;
\mathbf{w}^* = \big(\boldsymbol{\Phi}^T \boldsymbol{\Phi} + \boldsymbol{\Lambda}\big)^{-1} \boldsymbol{\Phi}^T \mathbf{y} = \mathbf{A}^{-1} \boldsymbol{\Phi}^T \mathbf{y}$$

where $\mathbf{A} = \boldsymbol{\Phi}^T \boldsymbol{\Phi} + \boldsymbol{\Lambda}$ is the variance matrix and $\boldsymbol{\Phi}$ is the design matrix.
Summary

$$\mathbf{w}^* = \big(\boldsymbol{\Phi}^T \boldsymbol{\Phi} + \boldsymbol{\Lambda}\big)^{-1} \boldsymbol{\Phi}^T \mathbf{y} = \mathbf{A}^{-1} \boldsymbol{\Phi}^T \mathbf{y}$$

with model $y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$, training set $T = \{(\mathbf{x}^{(k)}, y^{(k)})\}_{k=1}^{p}$, columns $\boldsymbol{\phi}_j = \big(\phi_j(\mathbf{x}^{(1)}), \ldots, \phi_j(\mathbf{x}^{(p)})\big)^T$, targets $\mathbf{y} = \big(y^{(1)}, \ldots, y^{(p)}\big)^T$, $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$, and $\mathbf{w}^* = (w_1^*, \ldots, w_m^*)^T$.
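As a concrete illustration, here is a NumPy sketch of this closed-form solution on a synthetic 1-D problem; the target function, kernel placement, widths, and $\lambda$ values are illustrative assumptions, not taken from the lecture:

```python
import numpy as np

# Sketch of the closed-form solution w* = (Phi^T Phi + Lambda)^{-1} Phi^T y
# for a 1-D Gaussian RBFN; the target, centers, and widths are illustrative.
rng = np.random.default_rng(0)
p, m = 50, 10                                   # patterns, kernels
x = np.linspace(0.0, 1.0, p)
y = np.sin(12 * x) + rng.normal(0.0, 0.1, p)    # noisy training targets
centers = np.linspace(0.0, 1.0, m)
sigma = 0.1

Phi = np.exp(-(x[:, None] - centers[None, :])**2 / (2 * sigma**2))  # p x m design matrix
lam = np.full(m, 1e-3)                          # regularization parameters lambda_i
A = Phi.T @ Phi + np.diag(lam)                  # variance matrix A
w_star = np.linalg.solve(A, Phi.T @ y)          # optimal weights w*
f_star = Phi @ w_star                           # fitted values f* = Phi w*
print("SSE:", np.sum((y - f_star)**2))
```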
Introduction to Radial Basis Function Networks
The Projection Matrix
The Empirical-Error Vector

[Diagram: each training input $\mathbf{x}^{(k)} = (x_1^{(k)}, \ldots, x_n^{(k)})$ is passed both through the unknown function, giving $y^{(k)}$, and through the kernels $\phi_1, \ldots, \phi_m$ with weights $w_1^*, \ldots, w_m^*$, giving the fitted value. In vector form $\mathbf{f}^* = \boldsymbol{\Phi} \mathbf{w}^*$ with $\mathbf{w}^* = \mathbf{A}^{-1} \boldsymbol{\Phi}^T \mathbf{y}$.]
The Empirical-Error Vector

$$\mathbf{e} = \mathbf{y} - \mathbf{f}^* = \mathbf{y} - \boldsymbol{\Phi} \mathbf{w}^* = \mathbf{y} - \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T \mathbf{y} = \big(\mathbf{I}_p - \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T\big) \mathbf{y} = \mathbf{P} \mathbf{y}$$

Sum-Squared-Error

With $\mathbf{P} = \mathbf{I}_p - \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T$ and $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$:

$$\mathrm{SSE} = \sum_{k=1}^{p} \big(y^{(k)} - f^*(\mathbf{x}^{(k)})\big)^2 = (\mathbf{P}\mathbf{y})^T (\mathbf{P}\mathbf{y}) = \mathbf{y}^T \mathbf{P}^2 \mathbf{y}$$

If $\boldsymbol{\Lambda} = 0$, the RBFN's learning algorithm minimizes SSE (equivalently MSE).
The Projection Matrix

When $\boldsymbol{\Lambda} = 0$, $\mathbf{P}$ is a true projection matrix: the error vector $\mathbf{e} = \mathbf{P}\mathbf{y}$ is orthogonal to $\mathrm{span}(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_m)$ (i.e., $\boldsymbol{\Phi}^T \mathbf{e} = \mathbf{0}$), and $\mathbf{P}(\mathbf{P}\mathbf{y}) = \mathbf{P}\mathbf{e} = \mathbf{e} = \mathbf{P}\mathbf{y}$, so $\mathbf{P}^2 = \mathbf{P}$ and

$$\mathrm{SSE} = \mathbf{y}^T \mathbf{P} \mathbf{y}$$
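These properties are easy to verify numerically; a minimal sketch (data and kernel layout are illustrative):

```python
import numpy as np

# Sketch: build P = I_p - Phi A^{-1} Phi^T and check the projection
# properties that hold when Lambda = 0.
rng = np.random.default_rng(1)
p, m = 30, 6
x = np.linspace(0.0, 1.0, p)
y = np.sin(12 * x) + rng.normal(0.0, 0.1, p)
centers = np.linspace(0.0, 1.0, m)
Phi = np.exp(-(x[:, None] - centers[None, :])**2 / (2 * 0.1**2))

A = Phi.T @ Phi                                  # Lambda = 0
P = np.eye(p) - Phi @ np.linalg.solve(A, Phi.T)  # projection matrix
e = P @ y                                        # empirical-error vector

print(np.allclose(P @ P, P))          # True: P is idempotent
print(np.allclose(y @ P @ y, e @ e))  # True: SSE = y^T P y = y^T P^2 y
print(np.allclose(Phi.T @ e, 0.0))    # True: e is orthogonal to span{phi_i}
```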
Introduction to Radial Basis Function Networks
Learning the Kernels
RBFNs as Universal Approximators

[Network diagram: inputs $x_1, \ldots, x_n$, kernels $\phi_1, \ldots, \phi_m$, outputs $y_1, \ldots, y_l$, weights $w_{11}, \ldots, w_{lm}$.]

Training set $T = \{(\mathbf{x}^{(k)}, \mathbf{y}^{(k)})\}_{k=1}^{p}$, with Gaussian kernels

$$\phi_j(\mathbf{x}) = \exp\left(-\frac{\|\mathbf{x} - \boldsymbol{\mu}_j\|^2}{2\sigma_j^2}\right)$$
What to Learn?

- Weights $w_{ij}$
- Centers $\boldsymbol{\mu}_j$ of the $\phi_j$
- Widths $\sigma_j$ of the $\phi_j$
- Number of $\phi_j$: model selection
One-Stage Learning

With $f_i(\mathbf{x}^{(k)}) = \sum_j w_{ij}\, \phi_j(\mathbf{x}^{(k)})$, all parameters follow gradient descent on the squared error:

$$\Delta w_{ij} = \eta_1 \big(y_i^{(k)} - f_i(\mathbf{x}^{(k)})\big)\, \phi_j(\mathbf{x}^{(k)})$$

$$\Delta \boldsymbol{\mu}_j = \eta_2 \sum_{i=1}^{l} w_{ij} \big(y_i^{(k)} - f_i(\mathbf{x}^{(k)})\big)\, \phi_j(\mathbf{x}^{(k)})\, \frac{\mathbf{x}^{(k)} - \boldsymbol{\mu}_j}{\sigma_j^2}$$

$$\Delta \sigma_j = \eta_3 \sum_{i=1}^{l} w_{ij} \big(y_i^{(k)} - f_i(\mathbf{x}^{(k)})\big)\, \phi_j(\mathbf{x}^{(k)})\, \frac{\|\mathbf{x}^{(k)} - \boldsymbol{\mu}_j\|^2}{\sigma_j^3}$$
One-Stage Learning

The simultaneous update of all three parameter sets may be suitable for non-stationary environments or on-line settings.
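A minimal sketch of these updates for a single-output, 1-D Gaussian RBFN; the target function, initialization, and learning rates are illustrative choices:

```python
import numpy as np

# One-stage learning sketch: per-pattern gradient updates of weights, centers
# and widths (the gradients match the update rules above for Gaussian kernels).
rng = np.random.default_rng(2)
m = 8
eta1, eta2, eta3 = 0.1, 0.02, 0.02
mu = rng.uniform(0.0, 1.0, m)        # centers mu_j
sigma = np.full(m, 0.2)              # widths sigma_j
w = np.zeros(m)                      # weights w_j

for epoch in range(200):
    for x in rng.uniform(0.0, 1.0, 50):        # online pattern presentation
        t = np.sin(12 * x)                     # target y^(k)
        h = np.exp(-(x - mu)**2 / (2 * sigma**2))
        err = t - w @ h                        # y^(k) - f(x^(k))
        dw  = eta1 * err * h
        dmu = eta2 * err * w * h * (x - mu) / sigma**2
        dsg = eta3 * err * w * h * (x - mu)**2 / sigma**3
        w, mu, sigma = w + dw, mu + dmu, sigma + dsg
```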
Two-Stage Training

- Step 1 determines the kernels: centers $\boldsymbol{\mu}_j$, widths $\sigma_j$, and the number of $\phi_j$.
- Step 2 determines the weights $w_{ij}$, e.g., using batch learning; see the sketch below.
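A sketch of the two stages on synthetic data, using a simple k-means pass for Step 1 and batch least squares for Step 2; the data, the number of kernels, and the width heuristic are all illustrative assumptions:

```python
import numpy as np

# Two-stage training sketch: Step 1 places centers with a few k-means passes
# (a heuristic common width is taken from the center spacing); Step 2 solves
# the batch least-squares problem for the weights.
rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, 200)
Y = np.sin(12 * X) + rng.normal(0.0, 0.1, 200)
m = 10

# Step 1: k-means on the inputs determines the centers.
centers = rng.choice(X, m, replace=False)
for _ in range(20):
    assign = np.argmin(np.abs(X[:, None] - centers[None, :]), axis=1)
    for j in range(m):
        if np.any(assign == j):
            centers[j] = X[assign == j].mean()
sigma = np.mean(np.diff(np.sort(centers)))   # heuristic common width

# Step 2: batch least squares determines the weights.
Phi = np.exp(-(X[:, None] - centers[None, :])**2 / (2 * sigma**2))
w, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
print("training SSE:", np.sum((Y - Phi @ w)**2))
```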
Train the Kernels

Unsupervised Training
Methods

- Subset selection: random subset selection, forward selection, backward elimination
- Clustering algorithms: k-means, LVQ
- Mixture models: GMM

Subset Selection

Random Subset Selection

- Randomly choose a subset of points from the training set as centers.
- Sensitive to the initially chosen points.
- Use adaptive techniques to tune the centers, widths, and number of points.
Clustering Algorithms

[Figure sequence across four slides: cluster centers are placed on the training inputs and iteratively refined, steps 1-4.]
Introduction to Radial Basis Function Networks
Bias-Variance Dilemma
Goal Revisit

- Ultimate goal (generalization): minimize the prediction error.
- Goal of our learning procedure: minimize the empirical error.
Badness of Fit

- Underfitting: a model (e.g., a network) that is not sufficiently complex can fail to detect fully the signal in a complicated data set. It produces excessive bias in the outputs.
- Overfitting: a model that is too complex may fit the noise, not just the signal. It produces excessive variance in the outputs.
Underfitting/Overfitting Avoidance

- Model selection
- Jittering
- Early stopping
- Weight decay (regularization, ridge regression)
- Bayesian learning
- Combining networks
Best Way to Avoid Overfitting

- Use lots of training data, e.g., 30 times as many training cases as there are weights in the network; for noise-free data, 5 times as many training cases as weights may be sufficient.
- Don't arbitrarily reduce the number of weights for fear of underfitting.
Badness of Fit

[Figure: underfit vs. overfit curves on the same data.]

Bias-Variance Dilemma

- Underfit: large bias, small variance.
- Overfit: small bias, large variance.

However, it's not really a dilemma.
Bias-Variance Dilemma: More on Overfitting

Overfitting can easily lead to predictions far beyond the range of the training data, and it produces wild predictions in multilayer perceptrons even with noise-free data. It's not really a dilemma.
Bias-Variance Dilemma

[Figure: moving from underfitting toward overfitting trades bias against variance.]
Bias-Variance Dilemma

[Figure: a set of realizable functions, e.g., determined by the number of hidden nodes used, shown around the true model; noise scatters the observed targets. Solutions obtained with training sets 1, 2, and 3 differ: the distance from their mean to the true model is the bias, and their spread is the variance. Reducing the effective number of parameters shrinks the set, lowering variance.]
Model Selection

[Figure: reducing the number of hidden nodes shrinks the set of realizable functions, increasing bias but decreasing variance.]

Goal: $\min\, E\big[(y - f(\mathbf{x}))^2 \mid \mathbf{x}\big]$
Bias-Variance Dilemma

The data are generated by $y = g(\mathbf{x}) + \varepsilon$, where $g$ is the true model and $\varepsilon$ is noise; $f(\mathbf{x})$ is our approximator.

Goal: $\min\, E\big[(y - f(\mathbf{x}))^2 \mid \mathbf{x}\big]$
Bias-Variance Dilemma

Since $y = g(\mathbf{x}) + \varepsilon$:

$$E\big[(y - f(\mathbf{x}))^2 \mid \mathbf{x}\big] = E\big[(g(\mathbf{x}) + \varepsilon - f(\mathbf{x}))^2 \mid \mathbf{x}\big] = E[\varepsilon^2] + E\big[(g(\mathbf{x}) - f(\mathbf{x}))^2 \mid \mathbf{x}\big] + 2\,E\big[\varepsilon\,(g(\mathbf{x}) - f(\mathbf{x})) \mid \mathbf{x}\big]$$

The last term is zero ($\varepsilon$ has zero mean and is independent of the rest) and $E[\varepsilon^2]$ is a constant, so minimizing $E[(y - f(\mathbf{x}))^2 \mid \mathbf{x}]$ is equivalent to minimizing $E[(g(\mathbf{x}) - f(\mathbf{x}))^2 \mid \mathbf{x}]$, and

$$E\big[(y - f(\mathbf{x}))^2 \mid \mathbf{x}\big] = E[\varepsilon^2] + E\big[(g(\mathbf{x}) - f(\mathbf{x}))^2 \mid \mathbf{x}\big]$$
Bias-Variance Dilemma

Adding and subtracting $E[f(\mathbf{x})]$:

$$E\big[(g(\mathbf{x}) - f(\mathbf{x}))^2\big] = E\Big[\big(g(\mathbf{x}) - E[f(\mathbf{x})] + E[f(\mathbf{x})] - f(\mathbf{x})\big)^2\Big] = \big(g(\mathbf{x}) - E[f(\mathbf{x})]\big)^2 + E\Big[\big(f(\mathbf{x}) - E[f(\mathbf{x})]\big)^2\Big]$$

The cross term $2\,E\big[(g(\mathbf{x}) - E[f(\mathbf{x})])(E[f(\mathbf{x})] - f(\mathbf{x}))\big]$ vanishes because $g(\mathbf{x}) - E[f(\mathbf{x})]$ is a constant and $E\big[E[f(\mathbf{x})] - f(\mathbf{x})\big] = 0$.
Bias-Variance Dilemma

$$E\big[(y - f(\mathbf{x}))^2 \mid \mathbf{x}\big] = \underbrace{E[\varepsilon^2]}_{\text{noise}} + \underbrace{\big(E[f(\mathbf{x})] - g(\mathbf{x})\big)^2}_{\text{bias}^2} + \underbrace{E\big[(f(\mathbf{x}) - E[f(\mathbf{x})])^2\big]}_{\text{variance}}$$

The noise term cannot be minimized; we must minimize both bias$^2$ and variance.
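The decomposition can be checked numerically. The following Monte-Carlo sketch estimates noise, bias$^2$, and variance at a single test point using polynomial fits as the model class; the true model and all settings are illustrative assumptions:

```python
import numpy as np

# Monte-Carlo sketch of E[(y - f)^2] = noise + bias^2 + variance at one test
# point x0; the model class is a fixed-degree polynomial fit.
rng = np.random.default_rng(4)
g = lambda x: np.sin(2 * np.pi * x)          # true model g(x)
noise_sd, x0, degree, runs = 0.2, 0.5, 3, 2000

preds = np.empty(runs)
for i in range(runs):                        # many independent training sets
    x = rng.uniform(0.0, 1.0, 30)
    y = g(x) + rng.normal(0.0, noise_sd, 30)
    preds[i] = np.polyval(np.polyfit(x, y, degree), x0)

bias2 = (preds.mean() - g(x0))**2
variance = preds.var()
y_test = g(x0) + rng.normal(0.0, noise_sd, runs)    # fresh noisy targets at x0
lhs = np.mean((y_test - preds)**2)                  # E[(y - f(x0))^2]
print(lhs, noise_sd**2 + bias2 + variance)          # approximately equal
```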
Model Complexity vs. Bias-Variance

$$E\big[(y - f(\mathbf{x}))^2 \mid \mathbf{x}\big] = \text{noise} + \text{bias}^2 + \text{variance}$$

[Figure: as model complexity (capacity) increases, bias$^2$ decreases while variance increases; the prediction error is smallest at an intermediate complexity.]
Example (Polynomial Fits)

Target function: $y = x^2 e^{-16 x^2}$.

[Figure: noisy samples of the target, and polynomial fits of degree 1, 5, 10, and 15. Low degrees underfit; high degrees overfit.]
Introduction to Radial Basis Function Networks
The Effective Number of Parameters
Variance Estimation

Known mean $\mu$:

$$\hat{\sigma}^2 = \frac{1}{p} \sum_{i=1}^{p} (x_i - \mu)^2$$

In general, $\mu$ is not available.
Variance Estimation

Unknown mean, estimated by $\bar{x} = \frac{1}{p} \sum_{i=1}^{p} x_i$:

$$s^2 = \frac{1}{p-1} \sum_{i=1}^{p} (x_i - \bar{x})^2$$

Estimating the mean loses one degree of freedom.
Simple Linear Regression

Model: $y = \beta_0 + \beta_1 x + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$. Fit $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ by minimizing

$$\mathrm{SSE} = \sum_{i=1}^{p} (y_i - \hat{y}_i)^2$$
Mean Squared Error (MSE)

The noise variance is estimated by

$$\hat{\sigma}^2 = \mathrm{MSE} = \frac{\mathrm{SSE}}{p - 2}$$

Estimating the two coefficients loses two degrees of freedom.
Variance Estimation

$$\hat{\sigma}^2 = \mathrm{MSE} = \frac{\mathrm{SSE}}{p - m}$$

where $m$ is the number of parameters of the model: $m$ degrees of freedom are lost.
The Number of Parameters

For the linear RBFN $y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i \phi_i(\mathbf{x})$, the number of degrees of freedom is $m$.
The Effective Number of Parameters ($\gamma$)

For the RBFN with projection matrix $\mathbf{P} = \mathbf{I}_p - \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T$ and $\boldsymbol{\Lambda} = 0$ (so $\mathbf{A} = \boldsymbol{\Phi}^T \boldsymbol{\Phi}$):

$$p - \mathrm{trace}(\mathbf{P}) = m$$

Facts: $\mathrm{trace}(\mathbf{A} + \mathbf{B}) = \mathrm{trace}(\mathbf{A}) + \mathrm{trace}(\mathbf{B})$ and $\mathrm{trace}(\mathbf{A}\mathbf{B}) = \mathrm{trace}(\mathbf{B}\mathbf{A})$.
The Effective Number of Parameters ($\gamma$)

$$\mathrm{trace}(\mathbf{P}) = \mathrm{trace}\big(\mathbf{I}_p - \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T\big) = p - \mathrm{trace}\big(\mathbf{A}^{-1} \boldsymbol{\Phi}^T \boldsymbol{\Phi}\big) = p - \mathrm{trace}(\mathbf{I}_m) = p - m$$

using $\mathrm{trace}(\mathbf{A}\mathbf{B}) = \mathrm{trace}(\mathbf{B}\mathbf{A})$ and $\boldsymbol{\Lambda} = 0$, so that $\mathbf{A}^{-1} \boldsymbol{\Phi}^T \boldsymbol{\Phi} = \mathbf{I}_m$. In general, the effective number of parameters is defined as

$$\gamma = p - \mathrm{trace}(\mathbf{P})$$
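A short numerical check of this identity (the kernel layout is illustrative):

```python
import numpy as np

# Sketch: the effective number of parameters gamma = p - trace(P).
# With Lambda = 0 it equals m exactly; with lambda > 0 it drops below m.
p, m = 40, 8
x = np.linspace(0.0, 1.0, p)
centers = np.linspace(0.0, 1.0, m)
Phi = np.exp(-(x[:, None] - centers[None, :])**2 / (2 * 0.1**2))

def gamma(lam):
    A = Phi.T @ Phi + lam * np.eye(m)
    P = np.eye(p) - Phi @ np.linalg.solve(A, Phi.T)
    return p - np.trace(P)

print(gamma(0.0))    # 8.0: equals m when Lambda = 0
print(gamma(1e-2))   # < 8: regularization reduces the effective parameters
```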
Regularization

$$C = \underbrace{\sum_{k=1}^{p} \big(y^{(k)} - f(\mathbf{x}^{(k)})\big)^2}_{\text{empirical error (SSE)}} + \underbrace{\sum_{i=1}^{m} \lambda_i w_i^2}_{\text{model's penalty cost}}$$

The penalty term penalizes models with large weights. The effective number of parameters is $\gamma = p - \mathrm{trace}(\mathbf{P})$.
Regularization

Without penalty ($\lambda_i = 0$), there are $m$ degrees of freedom available to minimize the SSE (the cost), and the effective number of parameters is $\gamma = m$.
Regularization

With penalty ($\lambda_i > 0$), the liberty to minimize the SSE is reduced, and the effective number of parameters becomes $\gamma < m$.
Variance Estimation

$$\hat{\sigma}^2 = \mathrm{MSE} = \frac{\mathrm{SSE}}{p - \gamma}, \qquad \gamma = p - \mathrm{trace}(\mathbf{P})$$

Fitting the model loses $\gamma$ degrees of freedom.
Variance Estimation

Since $p - \gamma = \mathrm{trace}(\mathbf{P})$,

$$\hat{\sigma}^2 = \mathrm{MSE} = \frac{\mathrm{SSE}}{\mathrm{trace}(\mathbf{P})}$$
Introduction to Radial Basis Function Networks
Model Selection
Model Selection

- Goal: choose the fittest model.
- Criterion: least prediction error.
- Main tools (to estimate model fitness): cross validation, the projection matrix.
- Methods: weight decay (ridge regression), pruning and growing RBFNs.
Empirical Error vs. Model Fitness

Ultimate goal (generalization): minimize the prediction error. Our learning procedure minimizes the empirical error (MSE); for model selection we must instead estimate and minimize the prediction error.
Estimating Prediction Error

- When you have plenty of data, use independent test sets: e.g., train different models on the same training set and choose the best model by comparing on the test set.
- When data is scarce, use cross-validation or the bootstrap.
Cross Validation

The simplest and most widely used method for estimating prediction error. Partition the original set in several different ways and compute an average score over the different partitions, e.g.:
- K-fold cross-validation
- Leave-one-out cross-validation
- Generalized cross-validation
K-Fold CV

- Split the set $D$ of available input-output patterns into $k$ mutually exclusive subsets $D_1, D_2, \ldots, D_k$.
- Train and test the learning algorithm $k$ times; each time it is trained on $D \setminus D_i$ and tested on $D_i$, and the test-set errors are averaged to estimate $\sigma^2$.

[Figure: the data are split into $D_1, \ldots, D_k$; each fold in turn serves as the test set while the remaining folds form the training set.]
Leave-One-Out CV

A special case of k-fold CV with $k = p$:
- Split the $p$ available input-output patterns into a training set of size $p-1$ and a test set of size 1.
- Average the squared error on the left-out pattern over the $p$ possible ways of partitioning. A cross-validation sketch follows below.
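A sketch of K-fold CV used to pick the number of kernels, following the fold scheme described above; the synthetic data, the fit routine, and the candidate range of $m$ are illustrative assumptions:

```python
import numpy as np

# K-fold CV sketch: each fold trains the batch least-squares RBFN of the
# earlier slides on D \ D_i and tests on D_i.
rng = np.random.default_rng(6)
X = rng.uniform(0.0, 1.0, 60)
Y = np.sin(12 * X) + rng.normal(0.0, 0.1, 60)

def fit_predict(xtr, ytr, xte, m, sigma=0.1):
    c = np.linspace(0.0, 1.0, m)
    design = lambda x: np.exp(-(x[:, None] - c[None, :])**2 / (2 * sigma**2))
    w, *_ = np.linalg.lstsq(design(xtr), ytr, rcond=None)
    return design(xte) @ w

def kfold_mse(m, k=5):
    folds = np.array_split(rng.permutation(len(X)), k)
    errs = []
    for te in folds:                               # test on D_i ...
        tr = np.setdiff1d(np.arange(len(X)), te)   # ... train on D \ D_i
        errs.append(np.mean((Y[te] - fit_predict(X[tr], Y[tr], X[te], m))**2))
    return np.mean(errs)

best_m = min(range(2, 21), key=kfold_mse)
print("selected number of kernels:", best_m)
```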
Error Variance Predicted by LOO

Let $D = \{(\mathbf{x}_k, y_k) : k = 1, 2, \ldots, p\}$ be the available input-output patterns, $D_{-i} = D \setminus \{(\mathbf{x}_i, y_i)\}$, $i = 1, \ldots, p$, the LOO training sets, and $f_{-i}$ the function learned (for a given model, the function with least empirical error) using $D_{-i}$ as training set. The LOO estimate of the prediction-error variance is

$$\hat{\sigma}_{\mathrm{LOO}}^2 = \frac{1}{p} \sum_{i=1}^{p} \big(y_i - f_{-i}(\mathbf{x}_i)\big)^2$$

where each term is the squared error on the left-out element. This serves as an index of a model's fitness: we want to find a model that also minimizes it.
Error Variance Predicted by LOO

For the linear RBFN, the LOO estimate can be computed in closed form, without retraining:

$$\hat{\sigma}_{\mathrm{LOO}}^2 = \frac{1}{p}\, \mathbf{y}^T \mathbf{P}\, \big[\mathrm{diag}(\mathbf{P})\big]^{-2}\, \mathbf{P}\, \mathbf{y}$$
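A sketch of this shortcut (the setup is illustrative); it reproduces the LOO errors without refitting $p$ times:

```python
import numpy as np

# Closed-form LOO sketch: only the projection matrix P is needed
# (diag(P) denotes the diagonal matrix built from P's diagonal).
rng = np.random.default_rng(7)
p, m = 40, 8
x = np.linspace(0.0, 1.0, p)
y = np.sin(12 * x) + rng.normal(0.0, 0.1, p)
Phi = np.exp(-(x[:, None] - np.linspace(0.0, 1.0, m)[None, :])**2 / (2 * 0.1**2))
A = Phi.T @ Phi + 1e-3 * np.eye(m)
P = np.eye(p) - Phi @ np.linalg.solve(A, Phi.T)

loo_residuals = (P @ y) / np.diag(P)        # LOO error on each left-out point
sigma2_loo = np.mean(loo_residuals**2)      # = y^T P diag(P)^{-2} P y / p
print(sigma2_loo)
```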
Generalized Cross-Validation

Replacing $\mathrm{diag}(\mathbf{P})$ by its average, $\mathrm{diag}(\mathbf{P}) \approx \frac{\mathrm{trace}(\mathbf{P})}{p}\, \mathbf{I}_p$, gives

$$\hat{\sigma}_{\mathrm{GCV}}^2 = \frac{p\, \mathbf{y}^T \mathbf{P}^2 \mathbf{y}}{\big(\mathrm{trace}(\mathbf{P})\big)^2}$$
More Criteria Based on CV

- GCV (generalized CV): $\hat{\sigma}_{\mathrm{GCV}}^2 = \dfrac{p\, \mathbf{y}^T \mathbf{P}^2 \mathbf{y}}{(\mathrm{trace}(\mathbf{P}))^2}$
- UEV (unbiased estimate of variance): $\hat{\sigma}_{\mathrm{UEV}}^2 = \dfrac{\mathbf{y}^T \mathbf{P}^2 \mathbf{y}}{p - \gamma}$
- FPE (final prediction error; Akaike's information criterion): $\hat{\sigma}_{\mathrm{FPE}}^2 = \dfrac{p + \gamma}{p\,(p - \gamma)}\, \mathbf{y}^T \mathbf{P}^2 \mathbf{y}$
- BIC (Bayesian information criterion): $\hat{\sigma}_{\mathrm{BIC}}^2 = \dfrac{p + (\ln p - 1)\gamma}{p\,(p - \gamma)}\, \mathbf{y}^T \mathbf{P}^2 \mathbf{y}$

Ordering: $\hat{\sigma}_{\mathrm{UEV}}^2 \le \hat{\sigma}_{\mathrm{FPE}}^2 \le \hat{\sigma}_{\mathrm{GCV}}^2 \le \hat{\sigma}_{\mathrm{BIC}}^2$
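The four criteria are mechanical to compute once $\mathbf{P}$ is known; a small helper sketch (formulas as reconstructed above, with $\mathbf{P}$ and $\mathbf{y}$ built as in the previous sketches):

```python
import numpy as np

# Sketch: the four variance-based selection criteria computed from P and y.
def selection_criteria(P, y):
    p = len(y)
    gamma = p - np.trace(P)            # effective number of parameters
    yPPy = y @ P @ P @ y               # y^T P^2 y
    uev = yPPy / (p - gamma)
    fpe = (p + gamma) / (p * (p - gamma)) * yPPy
    gcv = p * yPPy / np.trace(P)**2
    bic = (p + (np.log(p) - 1.0) * gamma) / (p * (p - gamma)) * yPPy
    return uev, fpe, gcv, bic          # ordered: uev <= fpe <= gcv <= bic
```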
More Criteria Based on CV

All four can be written in terms of UEV:

$$\hat{\sigma}_{\mathrm{FPE}}^2 = \frac{p + \gamma}{p}\, \hat{\sigma}_{\mathrm{UEV}}^2, \qquad
\hat{\sigma}_{\mathrm{GCV}}^2 = \frac{p}{p - \gamma}\, \hat{\sigma}_{\mathrm{UEV}}^2, \qquad
\hat{\sigma}_{\mathrm{BIC}}^2 = \frac{p + (\ln p - 1)\gamma}{p}\, \hat{\sigma}_{\mathrm{UEV}}^2$$
More Criteria Based on CV

All of these criteria depend on $\mathbf{P}$, and hence on the regularization parameters in $\mathbf{A} = \boldsymbol{\Phi}^T \boldsymbol{\Phi} + \boldsymbol{\Lambda}$. The next slides specialize to standard ridge regression, where $\lambda_i = \lambda$ for all $i$.
Regularization (Standard Ridge Regression)

$$C = \underbrace{\sum_{k=1}^{p} \big(y^{(k)} - f(\mathbf{x}^{(k)})\big)^2}_{\text{empirical error (SSE)}} + \underbrace{\sum_{i=1}^{m} \lambda_i w_i^2}_{\text{model's penalty cost}}, \qquad \lambda_i = \lambda, \ \forall i$$
Regularization (Standard Ridge Regression)

$$\min_{\mathbf{w}} \ C = \sum_{k=1}^{p} \Big(y^{(k)} - \sum_{i=1}^{m} w_i \phi_i(\mathbf{x}^{(k)})\Big)^2 + \lambda \sum_{i=1}^{m} w_i^2$$
Solution Review

$$\mathbf{w}^* = \mathbf{A}^{-1} \boldsymbol{\Phi}^T \mathbf{y}, \qquad \mathbf{A} = \boldsymbol{\Phi}^T \boldsymbol{\Phi} + \lambda \mathbf{I}_m$$

$$\mathbf{P} = \mathbf{I}_p - \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T \quad \text{(used to compute the model selection criteria)}$$
Example

Target: $y(x) = (1 - 2x)\, x^2\, e^{-x^2}$ with noise $\varepsilon \sim N(0, 0.2^2)$, $p = 50$ samples, RBF width $r = 0.5$.

[Figure: the four criteria $\hat{\sigma}_{\mathrm{UEV}}^2, \hat{\sigma}_{\mathrm{FPE}}^2, \hat{\sigma}_{\mathrm{GCV}}^2, \hat{\sigma}_{\mathrm{BIC}}^2$ plotted as functions of the regularization parameter $\lambda$.]
Optimizing the Regularization Parameter

Re-estimation formula for the global $\lambda$ of standard ridge regression:

$$\lambda \leftarrow \frac{\mathbf{y}^T \mathbf{P}^2 \mathbf{y} \cdot \mathrm{trace}\big(\mathbf{A}^{-1} - \lambda \mathbf{A}^{-2}\big)}{\mathbf{w}^{*T} \mathbf{A}^{-1} \mathbf{w}^* \cdot \mathrm{trace}(\mathbf{P})}$$
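A sketch that iterates this re-estimation to a fixed point, using the formula as reconstructed above (after Orr, 1996); the data and kernel layout are illustrative assumptions:

```python
import numpy as np

# Global-lambda re-estimation loop, iterated to a fixed point.
rng = np.random.default_rng(8)
p = 50
x = np.sort(rng.uniform(0.0, 1.0, p))
y = np.sin(12 * x) + rng.normal(0.0, 0.1, p)
Phi = np.exp(-(x[:, None] - x[None, :])**2 / (2 * 0.05**2))  # one kernel per point

lam = 0.01
for _ in range(100):
    A_inv = np.linalg.inv(Phi.T @ Phi + lam * np.eye(p))
    P = np.eye(p) - Phi @ A_inv @ Phi.T
    w = A_inv @ Phi.T @ y
    num = (y @ P @ P @ y) * np.trace(A_inv - lam * A_inv @ A_inv)
    den = (w @ A_inv @ w) * np.trace(P)
    lam = num / den
print("converged lambda:", lam)
```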
Local Ridge Regression

The same re-estimation idea extends to one $\lambda_j$ per kernel; this is developed below.
Example

Target: $y = \sin(12x)$, noise $\varepsilon \sim N(0, 0.1^2)$, $m = p = 50$, RBF width $r = 0.05$.

[Figure: $\gamma = p - \mathrm{trace}(\mathbf{P})$ and the GCV criterion as functions of $\lambda$.]
There are two local minima of GCV. Using the above re-estimation formula, the iteration gets stuck at the nearest local minimum; that is, the solution depends on the initial setting. For instance,

$$\lambda(0) = 0.01 \ \Rightarrow\ \lim_{t \to \infty} \lambda(t) \approx 0.1$$
Starting instead from $\lambda(0) = 10^{-5}$, the iteration converges to the other local minimum:

$$\lambda(0) = 10^{-5} \ \Rightarrow\ \lim_{t \to \infty} \lambda(t) \approx 2.1 \times 10^{-4}$$

RMSE (root mean squared error against the true target) would identify the better minimum, but in real cases it is not available.
Local Ridge Regression

Standard ridge regression:

$$\min \ C = \sum_{k=1}^{p} \Big(y^{(k)} - \sum_{i=1}^{m} w_i \phi_i(\mathbf{x}^{(k)})\Big)^2 + \lambda \sum_{i=1}^{m} w_i^2$$

Local ridge regression:

$$\min \ C = \sum_{k=1}^{p} \Big(y^{(k)} - \sum_{i=1}^{m} w_i \phi_i(\mathbf{x}^{(k)})\Big)^2 + \sum_{i=1}^{m} \lambda_i w_i^2$$
Local Ridge Regression

$\lambda_j \to \infty$ implies that $\phi_j(\cdot)$ can be removed: its weight is forced to zero, so local ridge regression doubles as a pruning mechanism.
The Solutions

$$\mathbf{w}^* = \mathbf{A}^{-1} \boldsymbol{\Phi}^T \mathbf{y}, \qquad \mathbf{P} = \mathbf{I}_p - \boldsymbol{\Phi} \mathbf{A}^{-1} \boldsymbol{\Phi}^T \ \text{(used to compute the model selection criteria)}$$

- Linear regression: $\mathbf{A} = \boldsymbol{\Phi}^T \boldsymbol{\Phi}$
- Standard ridge regression: $\mathbf{A} = \boldsymbol{\Phi}^T \boldsymbol{\Phi} + \lambda \mathbf{I}_m$
- Local ridge regression: $\mathbf{A} = \boldsymbol{\Phi}^T \boldsymbol{\Phi} + \boldsymbol{\Lambda}$
Optimizing the Regularization Parameters

$$\hat{\sigma}_{\mathrm{GCV}}^2 = \frac{p\, \mathbf{y}^T \mathbf{P}^2 \mathbf{y}}{(\mathrm{trace}(\mathbf{P}))^2}$$

Incremental operation: let $\mathbf{P}$ be the current projection matrix, and $\mathbf{P}_j$ the projection matrix obtained by removing $\phi_j(\cdot)$:

$$\mathbf{P}_j = \mathbf{P} + \frac{\mathbf{P}\, \boldsymbol{\phi}_j \boldsymbol{\phi}_j^T\, \mathbf{P}}{\lambda_j - \boldsymbol{\phi}_j^T \mathbf{P}\, \boldsymbol{\phi}_j}$$
Optimizing the Regularization Parameters

Solve $\dfrac{\partial \hat{\sigma}_{\mathrm{GCV}}^2}{\partial \lambda_j} = 0$ subject to $\lambda_j \ge 0$, using the incremental $\mathbf{P}_j$ update above.
Optimizing the Regularization Parameters

The optimum has a closed form built from scalar quantities such as $a = \mathbf{y}^T \mathbf{P}_j^2 \mathbf{y}$, $b = \mathbf{y}^T \mathbf{P}_j \boldsymbol{\phi}_j\, \boldsymbol{\phi}_j^T \mathbf{P}_j \mathbf{y}$, and $\mathrm{trace}(\mathbf{P}_j)$, and falls into three cases: a finite positive $\lambda_j$, $\lambda_j = 0$, or $\lambda_j = \infty$.
Optimizing the Regularization Parameters

When the solution is $\lambda_j = \infty$, remove $\phi_j(\cdot)$ from the network.
The Algorithm

- Initialize the $\lambda_i$'s, e.g., by performing standard ridge regression.
- Repeat the following until GCV converges:
  - Randomly select $j$ and compute the optimal $\hat{\lambda}_j$.
  - Perform local ridge regression with the new $\hat{\lambda}_j$.
  - If GCV is reduced, keep the update; if $\hat{\lambda}_j = \infty$, remove $\phi_j(\cdot)$.

A sketch of this loop follows below.
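Because the closed-form case analysis is only partially recoverable above, the sketch below substitutes a plain grid search on GCV for each $\lambda_j$ (a hedged stand-in, not the lecture's closed form); the data and kernel layout are illustrative:

```python
import numpy as np

# Pruning-loop sketch: cycle over the kernels, re-optimizing each lambda_j by
# grid search on GCV; lambda_j = inf prunes kernel j.
rng = np.random.default_rng(9)
p = 40
x = np.sort(rng.uniform(0.0, 1.0, p))
y = np.sin(12 * x) + rng.normal(0.0, 0.1, p)
Phi = np.exp(-(x[:, None] - x[None, :])**2 / (2 * 0.05**2))
lam = np.full(p, 0.01)                     # initialize, e.g., via standard ridge
grid = np.concatenate([np.logspace(-8, 4, 25), [np.inf]])

def gcv(lam_vec):
    keep = np.isfinite(lam_vec)            # lambda_j = inf removes phi_j
    F = Phi[:, keep]
    A = F.T @ F + np.diag(lam_vec[keep])
    P = np.eye(p) - F @ np.linalg.solve(A, F.T)
    return p * (y @ P @ P @ y) / np.trace(P)**2

for _ in range(3):                         # a few sweeps until GCV settles
    for j in rng.permutation(p):
        trials = [gcv(np.where(np.arange(p) == j, g, lam)) for g in grid]
        lam[j] = grid[int(np.argmin(trials))]
print("kernels kept:", int(np.isfinite(lam).sum()), "GCV:", gcv(lam))
```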
References

Kohavi, R. (1995). "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection," International Joint Conference on Artificial Intelligence (IJCAI).

Orr, M. J. L. (April 1996). Introduction to Radial Basis Function Networks, http://www.anc.ed.ac.uk/~mjo/intro/intro.html
Introduction to Radial Basis Function Networks
Incremental Operations