Radial Basis Function Networks
Radial Basis Function Networks
- A special type of ANN with three layers: input layer, hidden layer, output layer
- The mapping from the input to the hidden layer is nonlinear
- The mapping from the hidden to the output layer is linear
PR , ANN, & ML 2
Comparison

    Multi-layer perceptron        RBF Networks
    ----------------------        ------------
    Multiple hidden layers        Single hidden layer
    Nonlinear mapping             Nonlinear + linear
    W: inner product              W: distance
    Global mapping                Local mapping
    Warp classifiers              Warp data
    Stochastic approximation      Curve fitting
Another View: Curve Fitting
- We try to estimate a mapping from patterns into classes: f(patterns) -> classes, f(X) -> d
- Patterns are represented as feature vectors X
- Classes are decisions d
- Training samples: f(Xi) = di, i = 1, ..., n
- Interpolation of f based on the samples
[Figure: training samples d_i plotted over the (x1, x2) feature plane, with an interpolated surface through them]
Yet Another View: Warping Data
- If the problem is not linearly separable, an MLP will use multiple neurons to define complicated decision boundaries (warp classifiers)
- An alternative is to warp the data into a higher-dimensional space where they are much more likely to be linearly separable (a single perceptron will do)
- This is very similar to the idea of the Support Vector Machine
Example: XOR
Warp the XOR inputs with two Gaussian basis functions centered at (1,1) and (0,0):

    phi1(x) = e^(-||x - [1,1]^t||^2)
    phi2(x) = e^(-||x - [0,0]^t||^2)

In the (phi1, phi2) plane, the four points (0,0), (1,1), (0,1), (1,0) become linearly separable.
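The warping above can be checked numerically. This is a minimal sketch (not from the slides) that applies the two Gaussian basis functions to the four XOR points; in the warped coordinates the class-0 points (0,0) and (1,1) land above the line phi1 + phi2 = 1 and the class-1 points below it, so a single linear unit suffices.

```python
import numpy as np

# The four XOR inputs and their labels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
labels = np.array([0, 1, 1, 0])  # XOR

# phi1(x) = exp(-||x - [1,1]||^2), phi2(x) = exp(-||x - [0,0]||^2)
t1, t2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])
phi1 = np.exp(-np.sum((X - t1) ** 2, axis=1))
phi2 = np.exp(-np.sum((X - t2) ** 2, axis=1))

# In the (phi1, phi2) plane, phi1 + phi2 separates the two classes linearly.
for x, p1, p2, y in zip(X, phi1, phi2, labels):
    print(x, round(p1, 3), round(p2, 3), y)
```

Evaluating the sums shows phi1 + phi2 ≈ 1.135 for the class-0 points and ≈ 0.736 for the class-1 points, so the line phi1 + phi2 = 1 separates them.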
More Examples
A Pure Interpolation Approach
- Given: (Xi, di), i = 1, ..., n
- Desired: f(Xi) = di
- Solution: f(X), with f(Xi) = di
- Radial basis function solution:

    f(X) = sum_i wi phi(X, Xi)

- phi(X, Xi) is the general form; shift invariance requires phi(X - Xi); rotation invariance requires phi(||X - Xi||)
- Examples:

    Multiquadrics:          phi(r) = (r^2 + c^2)^(1/2)
    Inverse multiquadrics:  phi(r) = (r^2 + c^2)^(-1/2)
    Gaussian:               phi(r) = e^(-r^2 / (2 sigma^2))
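The three basis functions listed above can be written directly. A small sketch, with illustrative values for the shape parameters c and sigma (the slides leave them free):

```python
import numpy as np

# The three radial basis functions from the slide; c and sigma are free
# shape parameters, set to 1.0 here purely for illustration.
def multiquadric(r, c=1.0):
    return np.sqrt(r ** 2 + c ** 2)

def inverse_multiquadric(r, c=1.0):
    return 1.0 / np.sqrt(r ** 2 + c ** 2)

def gaussian(r, sigma=1.0):
    return np.exp(-r ** 2 / (2 * sigma ** 2))

r = np.linspace(0, 3, 7)
print(gaussian(r))       # decays with distance: a local response
print(multiquadric(r))   # grows with distance: a global response
```

The contrast in the printed values illustrates the local-vs-global distinction drawn in the MLP/RBF comparison earlier.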
Graphical Interpretation
- Each neuron responds based on the distance to the center of its receptive field
- The bottom level is a nonlinear mapping
- The top level is a linear weighted sum:

    f(X) = sum_i wi phi(X - Xi)

[Figure: network with inputs x1, x2, ..., xm, hidden units phi(X - X1), phi(X - X2), ..., phi(X - Xn), and output weights w1, ..., wn]
Other Alternatives: Global
- Lagrange polynomials:

    y = f(x) = sum_{k=0}^{n} y_k L_{n,k}(x)

    L_{n,k}(x) = [(x - x_0) ... (x - x_{k-1})(x - x_{k+1}) ... (x - x_n)] /
                 [(x_k - x_0) ... (x_k - x_{k-1})(x_k - x_{k+1}) ... (x_k - x_n)]
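The Lagrange formula above translates almost literally into code. A minimal sketch (the nodes and values are made up for illustration):

```python
import numpy as np

def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for k, (xk, yk) in enumerate(zip(xs, ys)):
        # L_{n,k}(x): product over all nodes except x_k, normalized so that
        # L_{n,k}(x_k) = 1 and L_{n,k}(x_j) = 0 for j != k
        num = np.prod([x - xj for j, xj in enumerate(xs) if j != k])
        den = np.prod([xk - xj for j, xj in enumerate(xs) if j != k])
        total += yk * num / den
    return total

xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 7.0]               # samples of y = x^2 + x + 1 at the nodes
print(lagrange_eval(xs, ys, 1.5))  # reproduces the quadratic: 4.75
```

With n+1 nodes the interpolant is the unique degree-n polynomial through the samples, which is why it recovers the quadratic exactly here.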
Other Alternatives: Local
- Bezier basis
- B-spline basis
B-Spline Interpolation
- A big subject in mathematics
- Used in many disciplines: approximation, pattern recognition, computer graphics
- As far as pattern recognition is concerned:
  - Determine the order of the spline (DOFs)
  - Knot vectors (partition into intervals)
  - Fitting in each interval
Interpolation Solution

    f(X) = sum_i wi phi(X, Xi)

In matrix form, Phi W = D with Phi_ij = phi(Xi, Xj):

    | phi_11  phi_12  ...  phi_1n | | w1 |   | d1 |
    | phi_21  phi_22  ...  phi_2n | | w2 | = | d2 |
    |  ...                        | | .. |   | .. |
    | phi_n1  phi_n2  ...  phi_nn | | wn |   | dn |

    W = Phi^(-1) D

- Phi is symmetric
- Phi is invertible (if all Xi's are distinct)
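The exact interpolation solve is a few lines in practice. A sketch, assuming a Gaussian basis and a made-up target function for the d_i:

```python
import numpy as np

# Pure interpolation: build Phi_ij = phi(||X_i - X_j||) with a Gaussian basis
# and solve Phi W = D exactly.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))           # 20 distinct sample points
d = np.sin(X[:, 0]) + np.cos(X[:, 1])      # target values d_i (illustrative)

# Pairwise squared distances, then the symmetric Gaussian kernel matrix
r2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
Phi = np.exp(-r2 / 2.0)                    # invertible for distinct X_i
w = np.linalg.solve(Phi, d)

# The interpolant passes through every training sample.
f_train = Phi @ w
print(np.max(np.abs(f_train - d)))         # ~0 up to numerical error
```

Note that `solve` is preferred over explicitly forming Phi^(-1); it is cheaper and numerically safer, which matters because Gaussian kernel matrices can be ill-conditioned.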
Practical Issue: Accuracy (cont.)
- The phi function represents the Green's function for a certain differential operator
- When it is shift and rotation invariant, we can write phi(X, Xi) as G(||X - Xi||); again, the Gaussian kernel is a popular choice here
Practical Issues
- Accuracy: what if the data are noisy?
- Speed: what if there are many sample points?
- Training: what is the training procedure?
Practical Issue: Accuracy
- When data are noisy, pure interpolation represents a form of "overfitting"
- Need a stabilizing (or smoothing, regularization) term
- The solution should achieve two things: good fitting and smoothness
Practical Issue: Accuracy (cont.)
Minimize a regularized error:

    E(f) = (1/2) sum_{i=1}^{n} (d_i - f(X_i))^2 + (lambda/2) ||Df||^2

With f(X) = sum_{j=1}^{m} w_j G(X, T_j):

    E(W) = (1/2) sum_{i=1}^{n} (d_i - sum_{j=1}^{m} w_j G(X_i, T_j))^2 + (lambda/2) ||Df||^2

    (G^T G + lambda G_0) W = G^T D
    W = (G^T G + lambda G_0)^(-1) G^T D

- The solution is rooted in regularization theory, which is way beyond the scope of this course (read the papers on the class Web site for more details)
- Try to minimize the error as a weighted sum of two terms which impose the fitting and the smoothness constraints
Sidebar I
- It can be proven that the MAP estimator (Bayes' rule) gives the same results as the regularized RBF solution
- The un-regularized (fitting) solution assumes the same prior for every f

    E(f) = (1/2) sum_i (d_i - f(X_i))^2 + (lambda/2) ||Df||^2

    P(f|X) = P(X|f) P(f) / P(X)
    -log P(f|X) = -log P(X|f) - log P(f) + c

- The data term -log P(X|f) corresponds to the fitting error, and the prior term -log P(f) to the smoothness penalty
Sidebar II
- Regularization is also similar to (or called) ridge regression in statistics
- The problem here is to fit a model to data without overfitting
- In the linear case, we have

    w_ridge = argmin_w sum_i (y_i - w_0 - sum_j x_ij w_j)^2 + lambda sum_j w_j^2

  or, equivalently,

    w_ridge = argmin_w sum_i (y_i - w_0 - sum_j x_ij w_j)^2
    subject to sum_j w_j^2 <= s
Intuition
- When variables x_i are highly correlated, their coefficients become poorly determined, with high variance
- E.g., a wildly large positive coefficient on one can be canceled by a similarly large negative coefficient on its correlated cousin
- A size constraint is helpful
- Caveat: the constraint is problem dependent
Solution to Ridge Regression
Similar to regularization:

    w_ridge = argmin_W (Y - XW)^T (Y - XW) + lambda W^T W

    d/dW [ (Y - XW)^T (Y - XW) + lambda W^T W ] = 0
    -2 X^T (Y - XW) + 2 lambda W = 0
    (X^T X + lambda I) W = X^T Y
    w_ridge = (X^T X + lambda I)^(-1) X^T Y
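The closed form above is a one-liner with a linear solver. A sketch on synthetic data (the design matrix, true weights, and noise level are all made up for illustration):

```python
import numpy as np

# Ridge regression: w_ridge = (X^T X + lambda I)^(-1) X^T Y
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
w_true = np.array([1.0, -2.0, 0.5])
Y = X @ w_true + 0.01 * rng.standard_normal(50)

lam = 0.1
# Solve the normal equations rather than inverting explicitly
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ Y)
print(w_ridge)   # close to w_true, slightly shrunk toward zero
```

The lambda term shrinks the coefficients toward zero relative to the ordinary least-squares fit, which is exactly the size constraint motivated on the Intuition slide.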
Ugly Math
With the SVD X = U Sigma V^T:

    Y_hat = X w_ridge = X (X^T X + lambda I)^(-1) X^T Y
          = U Sigma V^T (V Sigma^T Sigma V^T + lambda V V^T)^(-1) V Sigma^T U^T Y
          = U Sigma (Sigma^T Sigma + lambda I)^(-1) Sigma^T U^T Y
          = sum_i u_i [ sigma_i^2 / (sigma_i^2 + lambda) ] u_i^T Y
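The shrinkage identity above can be verified numerically: the direct ridge fit and the SVD form with per-mode factors sigma_i^2 / (sigma_i^2 + lambda) must agree. A sketch on random data:

```python
import numpy as np

# Check: X w_ridge == sum_i u_i [sigma_i^2 / (sigma_i^2 + lambda)] u_i^T Y
rng = np.random.default_rng(2)
X = rng.standard_normal((30, 4))
Y = rng.standard_normal(30)
lam = 0.5

# Direct closed-form ridge fit
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ Y)
fit_direct = X @ w_ridge

# SVD form: each left singular direction is shrunk by sigma^2/(sigma^2+lambda)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
shrink = s ** 2 / (s ** 2 + lam)
fit_svd = U @ (shrink * (U.T @ Y))

print(np.max(np.abs(fit_direct - fit_svd)))   # ~0: the two forms agree
```

The `shrink` vector makes the physical interpretation on the next slide concrete: directions with small singular values (less spread-out dimensions) contribute proportionally less to the fit.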
Physical Interpretation
- The singular values of X represent the spread of the data along different body-fitting dimensions
- To estimate Y (= X w_ridge), regularization minimizes the contribution from less spread-out dimensions
- Less spread-out dimensions usually have much larger variance (high-dimension eigen modes)
- Trace X(X^T X + lambda I)^(-1) X^T is called the effective degrees of freedom
More Details
- Trace X(X^T X + lambda I)^(-1) X^T is called the effective degrees of freedom
- It controls how many eigen modes are actually used or active
- Different methods are possible:
  - Shrinking smoother: contributions are scaled
  - Projection smoother: contributions are used (1) or not used (0)
Practical Issue: Speed
- When there are many training samples, the G and Phi matrices are of size n by n
- Inverting such a matrix is O(n^3)
- Remedy: reduce the number of bases used
Practical Issue: Speed (cont.)
Use m < n centers T_j:

    f(X) = sum_{j=1}^{m} w_j G(X, T_j)

    E = (1/2) sum_{i=1}^{n} (d_i - sum_{j=1}^{m} w_j G(X_i, T_j))^2 + (lambda/2) ||Df||^2

    (G^T G + lambda G_0) W = G^T D

where G is n x m and G_0 is m x m:

    G   = | G(X_1, T_1)  G(X_1, T_2)  ...  G(X_1, T_m) |
          | G(X_2, T_1)  G(X_2, T_2)  ...  G(X_2, T_m) |
          |  ...                                       |
          | G(X_n, T_1)  G(X_n, T_2)  ...  G(X_n, T_m) |

    G_0 = | G(T_1, T_1)  G(T_1, T_2)  ...  G(T_1, T_m) |
          |  ...                                       |
          | G(T_m, T_1)  G(T_m, T_2)  ...  G(T_m, T_m) |
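The reduced-basis system is cheap to assemble and solve. A sketch, assuming a Gaussian G, randomly chosen centers, and an illustrative 1-D target with noise:

```python
import numpy as np

# Reduced-basis regularized RBF fit: m < n centers T_j,
# G is n x m, G_0 is m x m, solve (G^T G + lambda G_0) W = G^T D.
rng = np.random.default_rng(3)
n, m, lam = 200, 20, 1e-3
X = rng.uniform(-3, 3, (n, 1))
D = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(n)   # noisy samples

T = X[rng.choice(n, m, replace=False)]   # centers chosen randomly from data

def gauss(A, B, sigma=1.0):
    """Gaussian kernel matrix G(A_i, B_j) for two point sets."""
    r2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-r2 / (2 * sigma ** 2))

G = gauss(X, T)        # n x m
G0 = gauss(T, T)       # m x m
W = np.linalg.solve(G.T @ G + lam * G0, G.T @ D)

print(np.sqrt(np.mean((G @ W - D) ** 2)))   # small training RMSE
```

The m x m solve is O(m^3) rather than O(n^3), which is the whole point of the reduced basis; the lambda G_0 term keeps the fit from chasing the noise.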
Practical Issue: Training
- How can the centers of the radial basis functions for the reduced basis set be determined?
- Chosen randomly
- Training involves finding the wi, using SVD
Training with K-means
- Use unsupervised clustering
- Find where the data are clustered: that is where the radial basis functions should be placed
- Done with k-means
K-Means Algorithm (fixed # of clusters)
1. Arbitrarily pick N cluster centers; assign samples to the nearest center
2. Compute the sample mean of each cluster
3. Reassign samples to the cluster with the nearest mean (for all samples)
4. Repeat if there are changes; otherwise stop
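The four steps above can be sketched in a few lines. A minimal implementation on made-up two-cluster data (initialization and data are illustrative):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means following the slide: assign, re-mean, repeat."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # arbitrary initial centers
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # assign each sample to the nearest center
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=-1), axis=1)
        # compute the sample mean of each cluster
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centers):                  # no changes: stop
            break
        centers = new
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal((0, 0), 0.1, (30, 2)),
               rng.normal((5, 5), 0.1, (30, 2))])
centers, labels = kmeans(X, 2)
print(np.round(centers, 1))
```

For RBF training, the returned `centers` would serve directly as the basis-function centers T_j.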
Training with Gradient Descent
Error expression:

    E(n) = (1/2) sum_{i=1}^{n} (d_i - sum_{j=1}^{m} w_j(n) G(||X_i - T_j(n)||))^2

The free variables in the error expression are:
- Weights
- Center locations
- Basis spreads
Effect of Weights

    E(n) = (1/2) sum_{i=1}^{n} (d_i - sum_{j=1}^{m} w_j(n) G(||X_i - T_j(n)||))^2

    dE(n)/dw_j(n) = -sum_{i=1}^{n} e_i(n) G(||X_i - T_j(n)||)

    where e_i(n) = d_i - sum_{j=1}^{m} w_j(n) G(||X_i - T_j(n)||)

    w_j(n+1) = w_j(n) - eta_1 dE(n)/dw_j(n)
Effect of Center Positions

    E(n) = (1/2) sum_{i=1}^{n} (d_i - sum_{j=1}^{m} w_j(n) G(||X_i - T_j(n)||))^2

    dE(n)/dT_j(n) = 2 w_j(n) sum_{i=1}^{n} e_i(n) G'(||X_i - T_j(n)||) Sigma_j^(-1) (X_i - T_j(n))

    T_j(n+1) = T_j(n) - eta_2 dE(n)/dT_j(n)
Effect of Basis Spread

    E(n) = (1/2) sum_{i=1}^{n} (d_i - sum_{j=1}^{m} w_j(n) G(||X_i - T_j(n)||))^2

    dE(n)/dSigma_j^(-1)(n) = -w_j(n) sum_{i=1}^{n} e_i(n) G'(||X_i - T_j(n)||) Q_ij(n)

    Q_ij(n) = (X_i - T_j(n)) (X_i - T_j(n))^T

    Sigma_j^(-1)(n+1) = Sigma_j^(-1)(n) - eta_3 dE(n)/dSigma_j^(-1)(n)
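The three updates (weights, centers, spreads) can be sketched together. To keep the code short this sketch uses a scalar precision beta_j = 1/(2 sigma_j^2) per basis instead of the full covariance Sigma_j from the slides (an assumption made here for illustration), and the data, learning rates, and basis count are all made up:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(-2, 2, (100, 1))
d = np.sin(2 * X[:, 0])                   # illustrative target
n = len(X)

m = 8
w = 0.1 * rng.standard_normal(m)          # weights
T = rng.uniform(-2, 2, (m, 1))            # center locations
beta = np.ones(m)                         # scalar spread parameters

eta_w, eta_t, eta_b = 0.2, 0.02, 0.01
for _ in range(2000):
    diff = X[:, None, :] - T[None, :, :]  # n x m x 1
    r2 = np.sum(diff ** 2, axis=-1)       # n x m squared distances
    G = np.exp(-beta * r2)
    e = d - G @ w                         # errors e_i
    # effect of weights: dE/dw_j = -sum_i e_i G_ij
    w += eta_w * (G.T @ e) / n
    # effect of center positions (chain rule: dG_ij/dT_j = 2 beta_j G_ij diff)
    coef = e[:, None] * (w * beta * G)    # n x m
    T += eta_t * 2 * np.einsum('ij,ijk->jk', coef, diff) / n
    # effect of basis spread: dG_ij/dbeta_j = -r2_ij G_ij
    beta -= eta_b * np.sum(e[:, None] * (w * G) * r2, axis=0) / n
    beta = np.maximum(beta, 1e-3)         # keep spreads valid

print(np.sqrt(np.mean(e ** 2)))           # training RMSE after descent
```

Different learning rates per parameter group mirror the eta_1, eta_2, eta_3 of the slides; in practice the weights are usually the easiest to fit, while centers and spreads move more slowly.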
Details
- A lot of theoretical development results are omitted here
  - E.g., the relation to kernel regression and SVM
- A lot of tuning considerations are not covered here
  - E.g., how to determine lambda?
- This is an active research area
Examples

544,000 data points with 80,000 centers; accuracy of 1.4x10^-6 for all data points
Problem Definition
- Given a point cloud of data
  - From a laser range scanner, or CT, MR, etc.
- Find a single analytical surface approximation
  - Or an inside-outside function
- Range data are s(X) = 0
  - Outside is s(X) > 0
  - Inside is s(X) < 0
- Just the sample data s(X) = 0 is not enough
  - s can be a trivial zero function
  - Need off-surface data generation
Procedures
1. Off-surface data generation
2. Choose a subset from the interpolation nodes x_i and fit an RBF only to these
3. Evaluate the residual e_i = f_i - s(x_i)
4. If max(e_i) < accuracy, then stop
5. Else append new centers where e_i is large
6. Re-fit the RBF and go back to step 3
More Results

[Figure: reconstructions with less smoothing vs. more smoothing]