My Dhsch6 Rbf



    RBF Neural Networks

We consider the other major class of neural network model, the Radial Basis Function (RBF) neural network, in which the activation of a hidden unit is determined by the distance between the input vector and a prototype vector. A function is approximated as a linear combination of a set of basis functions.

$$y_k(\mathbf{x}) = \sum_{j=1}^{M} w_{kj}\,\phi_j(\mathbf{x}) + w_{k0} = \sum_{j=0}^{M} w_{kj}\,\phi_j(\mathbf{x})$$

[Figure: Architecture of the RBF neural network]
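To make the formula concrete, here is a minimal NumPy sketch (my own code and naming, not from the slides) that evaluates $y_k(\mathbf{x})$ for a Gaussian-basis network; the bias weight $w_{k0}$ is handled by prepending a constant activation $\phi_0 = 1$.

```python
import numpy as np

def rbf_forward(x, centers, sigmas, W):
    """Evaluate y_k(x) = w_k0 + sum_j w_kj * phi_j(x) for one input vector.

    x       : (d,)     input vector
    centers : (M, d)   prototype vectors mu_j
    sigmas  : (M,)     width parameters sigma_j
    W       : (c, M+1) second-layer weights, column 0 holds the biases w_k0
    """
    # Gaussian basis activations phi_j(x) = exp(-||x - mu_j||^2 / (2 sigma_j^2))
    dist2 = np.sum((centers - x) ** 2, axis=1)
    phi = np.exp(-dist2 / (2.0 * sigmas ** 2))
    # Prepend the constant "basis function" phi_0 = 1 for the bias weight
    phi = np.concatenate(([1.0], phi))
    return W @ phi
```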


    Basis Functions

Basis functions normally take the form $\phi_j(\mathbf{x}) = \phi(\|\mathbf{x} - \boldsymbol{\mu}_j\|)$. The function depends on the distance (usually taken to be Euclidean) between the input vector $\mathbf{x}$ and a vector $\boldsymbol{\mu}_j$.

The most common form of basis function used is the Gaussian function

$$\phi_j(\mathbf{x}) = \exp\left(-\frac{\|\mathbf{x} - \boldsymbol{\mu}_j\|^2}{2\sigma_j^2}\right)$$

$\boldsymbol{\mu}_j$ determines the center of basis function $j$; $\sigma_j$ is a width parameter that controls how spread out the curve is.

The Gaussian function is not normalized, since any overall factors can be absorbed into the weights.


    Basis Functions

The Gaussian function is a localized basis function with the property that $\phi(r) \to 0$ as $r \to \infty$.

Another choice of localized basis function is

$$\phi(r) = \left(r^2 + \sigma^2\right)^{-\alpha}, \quad \alpha > 0, \qquad r = \|\mathbf{x} - \boldsymbol{\mu}_j\|$$

A hidden neuron is more sensitive to data points near its center. This sensitivity may be tuned by adjusting the width. For a given input vector, typically only a few hidden units will have significant activations.
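A small, self-contained check of this locality claim (illustrative 1-D numbers chosen by me, not from the slides):

```python
import numpy as np

# Ten centers spread over [0, 9]; one input near x = 2.3.
centers = np.arange(0.0, 10.0, 1.0).reshape(-1, 1)
x = np.array([2.3])

for sigma in (0.5, 2.0):
    act = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2 * sigma ** 2))
    n_sig = np.sum(act > 0.1)  # units with "significant" activation
    print(f"sigma={sigma}: {n_sig} of {len(centers)} units above 0.1")
# A small sigma leaves only the one or two nearest units active;
# a larger sigma spreads significant activation over more units.
```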



    Expressiveness

The hidden layer applies a nonlinear transformation from the input space to the hidden space.

RBF neural networks are capable of universal function approximation, with only mild restrictions on the form of the basis functions.

More general topologies of RBF network (more than one hidden layer) are not normally considered.



    Example: Implementing XOR

$$z_1 = e^{-\|\mathbf{x} - \boldsymbol{\mu}_1\|^2}, \quad \boldsymbol{\mu}_1 = (0,\,0)^t
\qquad\qquad
z_2 = e^{-\|\mathbf{x} - \boldsymbol{\mu}_2\|^2}, \quad \boldsymbol{\mu}_2 = (1,\,1)^t$$

x1   x2   z1      z2
0    0    1       0.135
0    1    0.37    0.37
1    0    0.37    0.37
1    1    0.135   1



    Example: Implementing XOR

[Figure: XOR patterns plotted against inputs x1 and x2]

When mapped into the feature space (z1, z2), the two classes become linearly separable.
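The table above can be reproduced, and the separability checked, with a few lines of NumPy (my own sketch; the separating line z1 + z2 = 1 is just one convenient choice):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 1, 1, 0])                # XOR targets
mu1, mu2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])

z1 = np.exp(-np.sum((X - mu1) ** 2, axis=1))
z2 = np.exp(-np.sum((X - mu2) ** 2, axis=1))
print(np.round(np.column_stack([z1, z2]), 3))
# [[1.    0.135]
#  [0.368 0.368]
#  [0.368 0.368]
#  [0.135 1.   ]]

# The line z1 + z2 = 1 separates the two classes in feature space:
pred = (z1 + z2 < 1.0).astype(int)
assert np.array_equal(pred, t)
```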


    Generalized Gaussian RBFs

The Gaussian RBFs can be generalized to allow for arbitrary covariance matrices $\Sigma_j$:

$$\phi_j(\mathbf{x}) = \exp\left(-\frac{1}{2}\,(\mathbf{x} - \boldsymbol{\mu}_j)^t\, \Sigma_j^{-1}\, (\mathbf{x} - \boldsymbol{\mu}_j)\right)$$

Since the matrices $\Sigma_j$ are symmetric, each basis function has $d(d+3)/2$ independent adjustable parameters, as compared with the $d+1$ independent parameters for the regular Gaussian basis functions.

In practice there is a trade-off to be considered between using a smaller number of basis functions with many adjustable parameters and a larger number of less flexible functions.
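A sketch of the generalized basis function (my own code, not from the slides; it solves a linear system rather than inverting $\Sigma_j$, purely for numerical convenience):

```python
import numpy as np

def generalized_gaussian_rbf(x, mu, Sigma):
    """phi_j(x) = exp(-1/2 (x - mu)^t Sigma^{-1} (x - mu))."""
    d = x - mu
    # Solve Sigma v = d instead of forming Sigma^{-1} explicitly.
    return np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))

# Parameter-count check for d = 3: d values for the center plus d(d+1)/2 for a
# symmetric covariance give d(d+3)/2 = 9 parameters, versus d + 1 = 4 for a
# spherical Gaussian (center plus a single width).
```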



    Training RBF networks

Sum-of-squared-error function:

$$E = \sum_p E^p, \qquad E^p = \frac{1}{2}\sum_{k=1}^{c}\left(t_k^p - y_k^p\right)^2$$

where $t_k^p$ is the target value for output unit $k$ when the network is presented with input vector $\mathbf{x}^p$.

$$z_j^p = \exp\left(-\frac{\|\mathbf{x}^p - \boldsymbol{\mu}_j\|^2}{2\sigma_j^2}\right), \qquad y_k^p = \sum_{j=0}^{M} w_{kj}\, z_j^p$$

Gradient descent algorithm: compute the gradients of $E^p$ with respect to the weights, centers, and widths.
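In code, the hidden activations and the error function might look like the following NumPy sketch (names such as `rbf_design` and `sse` are mine, not from the slides; the bias unit $z_0 = 1$ is included as a first column):

```python
import numpy as np

def rbf_design(X, centers, sigmas):
    """Hidden activations z_j^p for a whole batch, with a bias column z_0 = 1.

    X : (N, d), centers : (M, d), sigmas : (M,) -> Z : (N, M+1)
    """
    dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)   # (N, M)
    Z = np.exp(-dist2 / (2.0 * sigmas ** 2))
    return np.hstack([np.ones((X.shape[0], 1)), Z])

def sse(X, T, centers, sigmas, W):
    """Sum-of-squared-error E = sum_p 1/2 sum_k (t_k^p - y_k^p)^2."""
    Y = rbf_design(X, centers, sigmas) @ W.T          # (N, c) network outputs
    return 0.5 * np.sum((T - Y) ** 2)
```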


    Training RBF networks

$$\frac{\partial E^p}{\partial w_{kj}} = \frac{\partial E^p}{\partial y_k^p}\,\frac{\partial y_k^p}{\partial w_{kj}}, \qquad
\frac{\partial E^p}{\partial y_k^p} = -\left(t_k^p - y_k^p\right)$$

Thus

$$\frac{\partial E^p}{\partial w_{kj}} = -\left(t_k^p - y_k^p\right) z_j^p$$



    Training RBF networks

$$\frac{\partial E^p}{\partial \mu_{ji}} = \sum_k \frac{\partial E^p}{\partial y_k^p}\,\frac{\partial y_k^p}{\partial z_j^p}\,\frac{\partial z_j^p}{\partial \mu_{ji}}$$

$$\frac{\partial z_j^p}{\partial \mu_{ji}} = z_j^p\,\frac{\partial}{\partial \mu_{ji}}\left(-\frac{\|\mathbf{x}^p - \boldsymbol{\mu}_j\|^2}{2\sigma_j^2}\right)
= \frac{z_j^p}{2\sigma_j^2}\,\frac{\partial}{\partial \mu_{ji}}\left(-\sum_i \left(x_i^p - \mu_{ji}\right)^2\right)
= \frac{z_j^p}{2\sigma_j^2}\,2\left(x_i^p - \mu_{ji}\right)
= \frac{z_j^p}{\sigma_j^2}\left(x_i^p - \mu_{ji}\right)$$

$$\frac{\partial E^p}{\partial \mu_{ji}} = -\frac{z_j^p}{\sigma_j^2}\left(x_i^p - \mu_{ji}\right)\sum_k \left(t_k^p - y_k^p\right) w_{kj}$$



    Training RBF networks

$$\frac{\partial E^p}{\partial \sigma_j} = \sum_k \frac{\partial E^p}{\partial y_k^p}\,\frac{\partial y_k^p}{\partial z_j^p}\,\frac{\partial z_j^p}{\partial \sigma_j}$$

$$\frac{\partial z_j^p}{\partial \sigma_j} = z_j^p\,\frac{\partial}{\partial \sigma_j}\left(-\frac{\|\mathbf{x}^p - \boldsymbol{\mu}_j\|^2}{2\sigma_j^2}\right)
= -\frac{2\,z_j^p}{\sigma_j}\,\ln z_j^p$$

$$\frac{\partial E^p}{\partial \sigma_j} = \frac{2\,z_j^p}{\sigma_j}\,\ln z_j^p \sum_k \left(t_k^p - y_k^p\right) w_{kj}$$

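Putting the three per-pattern gradients together, one stochastic-gradient update could be sketched as follows (my own code, not the slides' implementation; `eta` is a hypothetical learning rate, and the log is guarded against underflow of $z_j^p$):

```python
import numpy as np

def gradient_step(xp, tp, centers, sigmas, W, eta=0.05):
    """One per-pattern gradient-descent update for a Gaussian-basis RBF network.

    xp : (d,), tp : (c,), centers : (M, d), sigmas : (M,), W : (c, M+1)
    """
    # Forward pass: z_j^p and y_k^p (z[0] = 1 is the bias unit).
    dist2 = np.sum((centers - xp) ** 2, axis=1)
    z = np.concatenate(([1.0], np.exp(-dist2 / (2 * sigmas ** 2))))
    y = W @ z
    err = tp - y                                       # (t_k^p - y_k^p)

    # dE^p/dw_kj = -(t_k^p - y_k^p) z_j^p
    dW = -np.outer(err, z)
    # back-projected error sum_k (t_k^p - y_k^p) w_kj, for hidden units j = 1..M
    delta = err @ W[:, 1:]                             # (M,)
    zj = z[1:]
    # dE^p/dmu_ji = -(z_j^p / sigma_j^2) (x_i^p - mu_ji) sum_k (t_k^p - y_k^p) w_kj
    dmu = -((zj * delta) / sigmas ** 2)[:, None] * (xp - centers)
    # dE^p/dsigma_j = (2 z_j^p / sigma_j) ln z_j^p sum_k (t_k^p - y_k^p) w_kj
    dsigma = (2 * zj / sigmas) * np.log(np.maximum(zj, 1e-300)) * delta

    return W - eta * dW, centers - eta * dmu, sigmas - eta * dsigma
```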


    Two-stage training of RBF nets

In the first stage the input data set $\{\mathbf{x}^p\}$ alone is used to determine the parameters of the basis functions (e.g. $\boldsymbol{\mu}_j$ and $\sigma_j$ for the Gaussian basis functions). For example:

- randomly select a set of training data points as the basis function centers
- identify clusters of training data and put a basis function centered at each cluster
- choose $\sigma_j$ to be some multiple of the average distance between the basis function centers
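As one concrete reading of these options (a minimal sketch of my own, not the slides' prescription), the code below picks random training points as centers and sets a common width from the average inter-center distance; a clustering method such as k-means could replace the random selection. The name `width_scale` and its default are assumptions.

```python
import numpy as np

def init_basis_functions(X, M, width_scale=2.0, rng=np.random.default_rng(0)):
    """Stage 1 (unsupervised): pick centers from the data and set a shared width.

    X : (N, d) training inputs; M : number of basis functions.
    """
    # Randomly select M training points as the basis function centers.
    centers = X[rng.choice(len(X), size=M, replace=False)]

    # sigma_j: a multiple of the average distance between the centers.
    diffs = centers[:, None, :] - centers[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    avg_dist = dists[np.triu_indices(M, k=1)].mean()
    sigmas = np.full(M, width_scale * avg_dist)
    return centers, sigmas
```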



    Two-stage training of RBF nets

In the second stage, the basis functions are kept fixed and the second-layer weights are optimized.

$$y_k(\mathbf{x}) = \sum_{j=0}^{M} w_{kj}\,\phi_j(\mathbf{x})$$

Since the basis functions are considered fixed, the network is equivalent to a single-layer neural network.



    Two-stage training of RBF nets

Consider a sum-of-squared-error function

$$E = \frac{1}{2}\sum_p\sum_k\left(t_k^p - y_k(\mathbf{x}^p)\right)^2$$

Since the error function is a quadratic function of the weights, its minimum can be found in terms of the solution of a set of linear equations.

Setting the gradient of $E$ with respect to $w_{kj}$ to zero gives

$$\Phi^t\Phi\, W^t = \Phi^t T$$

where $(W)_{kj} = w_{kj}$, $(\Phi)_{pj} = \phi_j(\mathbf{x}^p)$, and $(T)_{pk} = t_k^p$.



    Two-stage training of RBF nets

Given the pseudoinverse $\Phi^{\dagger} \equiv (\Phi^t\Phi)^{-1}\Phi^t$, the formal solution for the weights is given by

$$W^t = \Phi^{\dagger}\, T$$

In practice, the equations are solved using singular value decomposition, to avoid problems due to the possibility of $\Phi^t\Phi$ being singular or nearly singular.

The second-layer weights can be found by fast, linear algebra techniques!
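A minimal sketch of this second stage (my own code; `numpy.linalg.lstsq` uses an SVD-based solver, which is in the spirit of the remark above about avoiding an explicit inverse):

```python
import numpy as np

def fit_output_weights(X, T, centers, sigmas):
    """Stage 2: solve Phi W^t ~= T for the second-layer weights by least squares.

    X : (N, d) inputs, T : (N, c) targets, centers : (M, d), sigmas : (M,).
    """
    dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.hstack([np.ones((len(X), 1)),                 # bias column phi_0 = 1
                     np.exp(-dist2 / (2.0 * sigmas ** 2))])
    # lstsq solves min ||Phi W^t - T|| via SVD, avoiding (Phi^t Phi)^{-1}.
    Wt, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return Wt.T                                            # W : (c, M+1)
```

Chained with the center-selection sketch from the previous slide, this gives a complete two-stage trainer.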



    Two-stage training of RBF nets

The possibility of choosing suitable parameters for the hidden units without having to perform a full non-linear optimization of the network is one of the major advantages of RBF networks, as compared with multi-layer neural networks.

The use of unsupervised techniques to determine the basis function parameters need not be optimal for the supervised learning problem, since it does not take account of the target labels associated with the data.

Unsupervised techniques can be used to initialize the basis function parameters before running the gradient descent algorithm.