My Dhsch6 Rbf



    RBF Neural Networks

We consider the other major class of neural network model, the Radial Basis Function (RBF) neural network, in which the activation of a hidden unit is determined by the distance between the input vector and a prototype vector. A function is approximated as a linear combination of a set of basis functions.

$$y_k(\mathbf{x}) = \sum_{j=1}^{M} w_{kj}\,\phi_j(\mathbf{x}) + w_{k0} = \sum_{j=0}^{M} w_{kj}\,\phi_j(\mathbf{x})$$

[Figure: Architecture of the RBF neural network]
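To make the formula concrete, here is a minimal NumPy sketch (my own code and naming, not from the slides) that evaluates $y_k(\mathbf{x})$ for a Gaussian-basis network; the bias weight $w_{k0}$ is handled by prepending a constant activation $\phi_0 = 1$.

```python
import numpy as np

def rbf_forward(x, centers, sigmas, W):
    """Evaluate y_k(x) = w_k0 + sum_j w_kj * phi_j(x) for one input vector.

    x       : (d,)     input vector
    centers : (M, d)   prototype vectors mu_j
    sigmas  : (M,)     width parameters sigma_j
    W       : (c, M+1) second-layer weights, column 0 holds the biases w_k0
    """
    # Gaussian basis activations phi_j(x) = exp(-||x - mu_j||^2 / (2 sigma_j^2))
    dist2 = np.sum((centers - x) ** 2, axis=1)
    phi = np.exp(-dist2 / (2.0 * sigmas ** 2))
    # Prepend the constant "basis function" phi_0 = 1 for the bias weight
    phi = np.concatenate(([1.0], phi))
    return W @ phi
```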


    Basis Functions

Basis functions normally take the form $\phi_j(\mathbf{x}) = \phi(\|\mathbf{x} - \boldsymbol{\mu}_j\|)$. The function depends on the distance (usually taken to be Euclidean) between the input vector $\mathbf{x}$ and a vector $\boldsymbol{\mu}_j$.

The most common form of basis function used is the Gaussian function

$$\phi_j(\mathbf{x}) = \exp\left(-\frac{\|\mathbf{x} - \boldsymbol{\mu}_j\|^2}{2\sigma_j^2}\right)$$

$\boldsymbol{\mu}_j$ determines the center of basis function $j$; $\sigma_j$ is a width parameter that controls how spread out the curve is.

The Gaussian function is not normalized, since any overall factors can be absorbed into the weights.


    Basis Functions

The Gaussian function is a localized basis function with the property that $\phi(r) \to 0$ as $r \to \infty$.

Another choice of localized basis function is

$$\phi(r) = \left(r^2 + \sigma^2\right)^{-\alpha}, \quad \alpha > 0, \qquad r = \|\mathbf{x} - \boldsymbol{\mu}_j\|$$

A hidden neuron is more sensitive to data points near its center. This sensitivity may be tuned by adjusting the width. For a given input vector, typically only a few hidden units will have significant activations.
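A small, self-contained check of this locality claim (illustrative 1-D numbers chosen by me, not from the slides):

```python
import numpy as np

# Ten centers spread over [0, 9]; one input near x = 2.3.
centers = np.arange(0.0, 10.0, 1.0).reshape(-1, 1)
x = np.array([2.3])

for sigma in (0.5, 2.0):
    act = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2 * sigma ** 2))
    n_sig = np.sum(act > 0.1)  # units with "significant" activation
    print(f"sigma={sigma}: {n_sig} of {len(centers)} units above 0.1")
# A small sigma leaves only the one or two nearest units active;
# a larger sigma spreads significant activation over more units.
```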



    Expressiveness

The hidden layer applies a nonlinear transformation from the input space to the hidden space.

RBF neural networks are capable of universal function approximation, with only mild restrictions on the form of the basis functions.

More general topologies of RBF network (more than one hidden layer) are not normally considered.



    Example: Implementing XOR

$$z_1 = e^{-\|\mathbf{x} - \boldsymbol{\mu}_1\|^2}, \quad \boldsymbol{\mu}_1 = (0,\,0)^t
\qquad\qquad
z_2 = e^{-\|\mathbf{x} - \boldsymbol{\mu}_2\|^2}, \quad \boldsymbol{\mu}_2 = (1,\,1)^t$$

x1   x2   z1      z2
0    0    1       0.135
0    1    0.37    0.37
1    0    0.37    0.37
1    1    0.135   1



    Example: Implementing XOR

[Figure: XOR patterns plotted against inputs x1 and x2]

When mapped into the feature space (z1, z2), the two classes become linearly separable.
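The table above can be reproduced, and the separability checked, with a few lines of NumPy (my own sketch; the separating line z1 + z2 = 1 is just one convenient choice):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 1, 1, 0])                # XOR targets
mu1, mu2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])

z1 = np.exp(-np.sum((X - mu1) ** 2, axis=1))
z2 = np.exp(-np.sum((X - mu2) ** 2, axis=1))
print(np.round(np.column_stack([z1, z2]), 3))
# [[1.    0.135]
#  [0.368 0.368]
#  [0.368 0.368]
#  [0.135 1.   ]]

# The line z1 + z2 = 1 separates the two classes in feature space:
pred = (z1 + z2 < 1.0).astype(int)
assert np.array_equal(pred, t)
```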


    Generalized Gaussian RBFs

The Gaussian RBFs can be generalized to allow for arbitrary covariance matrices $\Sigma_j$:

$$\phi_j(\mathbf{x}) = \exp\left(-\frac{1}{2}\,(\mathbf{x} - \boldsymbol{\mu}_j)^t\, \Sigma_j^{-1}\, (\mathbf{x} - \boldsymbol{\mu}_j)\right)$$

Since the matrices $\Sigma_j$ are symmetric, each basis function has $d(d+3)/2$ independent adjustable parameters, as compared with the $d+1$ independent parameters for the regular Gaussian basis functions.

In practice there is a trade-off to be considered between using a smaller number of basis functions with many adjustable parameters and a larger number of less flexible functions.
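A sketch of the generalized basis function (my own code, not from the slides; it solves a linear system rather than inverting $\Sigma_j$, purely for numerical convenience):

```python
import numpy as np

def generalized_gaussian_rbf(x, mu, Sigma):
    """phi_j(x) = exp(-1/2 (x - mu)^t Sigma^{-1} (x - mu))."""
    d = x - mu
    # Solve Sigma v = d instead of forming Sigma^{-1} explicitly.
    return np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))

# Parameter-count check for d = 3: d values for the center plus d(d+1)/2 for a
# symmetric covariance give d(d+3)/2 = 9 parameters, versus d + 1 = 4 for a
# spherical Gaussian (center plus a single width).
```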



    Training RBF networks

Sum-of-squared-error function:

$$E = \sum_p E^p, \qquad E^p = \frac{1}{2}\sum_{k=1}^{c}\left(t_k^p - y_k^p\right)^2$$

where $t_k^p$ is the target value for output unit $k$ when the network is presented with input vector $\mathbf{x}^p$.

$$z_j^p = \exp\left(-\frac{\|\mathbf{x}^p - \boldsymbol{\mu}_j\|^2}{2\sigma_j^2}\right), \qquad y_k^p = \sum_{j=0}^{M} w_{kj}\, z_j^p$$

Gradient descent algorithm: compute the gradients of $E^p$ with respect to the weights, centers, and widths.
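In code, the hidden activations and the error function might look like the following NumPy sketch (names such as `rbf_design` and `sse` are mine, not from the slides; the bias unit $z_0 = 1$ is included as a first column):

```python
import numpy as np

def rbf_design(X, centers, sigmas):
    """Hidden activations z_j^p for a whole batch, with a bias column z_0 = 1.

    X : (N, d), centers : (M, d), sigmas : (M,) -> Z : (N, M+1)
    """
    dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)   # (N, M)
    Z = np.exp(-dist2 / (2.0 * sigmas ** 2))
    return np.hstack([np.ones((X.shape[0], 1)), Z])

def sse(X, T, centers, sigmas, W):
    """Sum-of-squared-error E = sum_p 1/2 sum_k (t_k^p - y_k^p)^2."""
    Y = rbf_design(X, centers, sigmas) @ W.T          # (N, c) network outputs
    return 0.5 * np.sum((T - Y) ** 2)
```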


    Training RBF networks

$$\frac{\partial E^p}{\partial w_{kj}} = \frac{\partial E^p}{\partial y_k^p}\,\frac{\partial y_k^p}{\partial w_{kj}}, \qquad
\frac{\partial E^p}{\partial y_k^p} = -\left(t_k^p - y_k^p\right)$$

Thus

$$\frac{\partial E^p}{\partial w_{kj}} = -\left(t_k^p - y_k^p\right) z_j^p$$



    Training RBF networks

$$\frac{\partial E^p}{\partial \mu_{ji}} = \sum_k \frac{\partial E^p}{\partial y_k^p}\,\frac{\partial y_k^p}{\partial z_j^p}\,\frac{\partial z_j^p}{\partial \mu_{ji}}$$

$$\frac{\partial z_j^p}{\partial \mu_{ji}} = z_j^p\,\frac{\partial}{\partial \mu_{ji}}\left(-\frac{\|\mathbf{x}^p - \boldsymbol{\mu}_j\|^2}{2\sigma_j^2}\right)
= \frac{z_j^p}{2\sigma_j^2}\,\frac{\partial}{\partial \mu_{ji}}\left(-\sum_i \left(x_i^p - \mu_{ji}\right)^2\right)
= \frac{z_j^p}{2\sigma_j^2}\,2\left(x_i^p - \mu_{ji}\right)
= \frac{z_j^p}{\sigma_j^2}\left(x_i^p - \mu_{ji}\right)$$

$$\frac{\partial E^p}{\partial \mu_{ji}} = -\frac{z_j^p}{\sigma_j^2}\left(x_i^p - \mu_{ji}\right)\sum_k \left(t_k^p - y_k^p\right) w_{kj}$$



    Training RBF networks

$$\frac{\partial E^p}{\partial \sigma_j} = \sum_k \frac{\partial E^p}{\partial y_k^p}\,\frac{\partial y_k^p}{\partial z_j^p}\,\frac{\partial z_j^p}{\partial \sigma_j}$$

$$\frac{\partial z_j^p}{\partial \sigma_j} = z_j^p\,\frac{\partial}{\partial \sigma_j}\left(-\frac{\|\mathbf{x}^p - \boldsymbol{\mu}_j\|^2}{2\sigma_j^2}\right)
= -\frac{2\,z_j^p}{\sigma_j}\,\ln z_j^p$$

$$\frac{\partial E^p}{\partial \sigma_j} = \frac{2\,z_j^p}{\sigma_j}\,\ln z_j^p \sum_k \left(t_k^p - y_k^p\right) w_{kj}$$

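Putting the three per-pattern gradients together, one stochastic-gradient update could be sketched as follows (my own code, not the slides' implementation; `eta` is a hypothetical learning rate, and the log is guarded against underflow of $z_j^p$):

```python
import numpy as np

def gradient_step(xp, tp, centers, sigmas, W, eta=0.05):
    """One per-pattern gradient-descent update for a Gaussian-basis RBF network.

    xp : (d,), tp : (c,), centers : (M, d), sigmas : (M,), W : (c, M+1)
    """
    # Forward pass: z_j^p and y_k^p (z[0] = 1 is the bias unit).
    dist2 = np.sum((centers - xp) ** 2, axis=1)
    z = np.concatenate(([1.0], np.exp(-dist2 / (2 * sigmas ** 2))))
    y = W @ z
    err = tp - y                                       # (t_k^p - y_k^p)

    # dE^p/dw_kj = -(t_k^p - y_k^p) z_j^p
    dW = -np.outer(err, z)
    # back-projected error sum_k (t_k^p - y_k^p) w_kj, for hidden units j = 1..M
    delta = err @ W[:, 1:]                             # (M,)
    zj = z[1:]
    # dE^p/dmu_ji = -(z_j^p / sigma_j^2) (x_i^p - mu_ji) sum_k (t_k^p - y_k^p) w_kj
    dmu = -((zj * delta) / sigmas ** 2)[:, None] * (xp - centers)
    # dE^p/dsigma_j = (2 z_j^p / sigma_j) ln z_j^p sum_k (t_k^p - y_k^p) w_kj
    dsigma = (2 * zj / sigmas) * np.log(np.maximum(zj, 1e-300)) * delta

    return W - eta * dW, centers - eta * dmu, sigmas - eta * dsigma
```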


    Two-stage training of RBF nets

In the first stage the input data set $\{\mathbf{x}^p\}$ alone is used to determine the parameters of the basis functions (e.g. $\boldsymbol{\mu}_j$ and $\sigma_j$ for the Gaussian basis functions). For example:

- randomly select a set of training data points as the basis function centers
- identify clusters of training data and put a basis function centered at each cluster
- choose $\sigma_j$ to be some multiple of the average distance between the basis function centers
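As one concrete reading of these options (a minimal sketch of my own, not the slides' prescription), the code below picks random training points as centers and sets a common width from the average inter-center distance; a clustering method such as k-means could replace the random selection. The name `width_scale` and its default are assumptions.

```python
import numpy as np

def init_basis_functions(X, M, width_scale=2.0, rng=np.random.default_rng(0)):
    """Stage 1 (unsupervised): pick centers from the data and set a shared width.

    X : (N, d) training inputs; M : number of basis functions.
    """
    # Randomly select M training points as the basis function centers.
    centers = X[rng.choice(len(X), size=M, replace=False)]

    # sigma_j: a multiple of the average distance between the centers.
    diffs = centers[:, None, :] - centers[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    avg_dist = dists[np.triu_indices(M, k=1)].mean()
    sigmas = np.full(M, width_scale * avg_dist)
    return centers, sigmas
```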



    Two-stage training of RBF nets

In the second stage, the basis functions are kept fixed and the second-layer weights are optimized.

$$y_k(\mathbf{x}) = \sum_{j=0}^{M} w_{kj}\,\phi_j(\mathbf{x})$$

Since the basis functions are considered fixed, the network is equivalent to a single-layer neural network.



    Two-stage training of RBF nets

Consider a sum-of-squared-error function

$$E = \frac{1}{2}\sum_p\sum_k\left(t_k^p - y_k(\mathbf{x}^p)\right)^2$$

Since the error function is a quadratic function of the weights, its minimum can be found in terms of the solution of a set of linear equations.

Setting the gradient of $E$ with respect to $w_{kj}$ to zero gives

$$\Phi^t\Phi\, W^t = \Phi^t T$$

where $(W)_{kj} = w_{kj}$, $(\Phi)_{pj} = \phi_j(\mathbf{x}^p)$, and $(T)_{pk} = t_k^p$.



    Two-stage training of RBF nets

Given the pseudoinverse $\Phi^{\dagger} \equiv (\Phi^t\Phi)^{-1}\Phi^t$, the formal solution for the weights is given by

$$W^t = \Phi^{\dagger}\, T$$

In practice, the equations are solved using singular value decomposition, to avoid problems due to the possibility of $\Phi^t\Phi$ being singular or nearly singular.

The second-layer weights can be found by fast, linear algebra techniques!
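A minimal sketch of this second stage (my own code; `numpy.linalg.lstsq` uses an SVD-based solver, which is in the spirit of the remark above about avoiding an explicit inverse):

```python
import numpy as np

def fit_output_weights(X, T, centers, sigmas):
    """Stage 2: solve Phi W^t ~= T for the second-layer weights by least squares.

    X : (N, d) inputs, T : (N, c) targets, centers : (M, d), sigmas : (M,).
    """
    dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.hstack([np.ones((len(X), 1)),                 # bias column phi_0 = 1
                     np.exp(-dist2 / (2.0 * sigmas ** 2))])
    # lstsq solves min ||Phi W^t - T|| via SVD, avoiding (Phi^t Phi)^{-1}.
    Wt, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return Wt.T                                            # W : (c, M+1)
```

Chained with the center-selection sketch from the previous slide, this gives a complete two-stage trainer.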



    Two-stage training of RBF nets

The possibility of choosing suitable parameters for the hidden units without having to perform a full non-linear optimization of the network is one of the major advantages of RBF networks, as compared with multi-layer neural networks.

The use of unsupervised techniques to determine the basis function parameters need not be optimal for the supervised learning problem, since it does not take account of the target labels associated with the data.

Unsupervised techniques can be used to initialize the basis function parameters before running the gradient descent algorithm.