LMS Variants


    Lecture 5: Variants of the LMS algorithm

    Standard LMS Algorithm

    FIR filters:

$$
y(n) = w_0(n)u(n) + w_1(n)u(n-1) + \dots + w_{M-1}(n)u(n-M+1)
= \sum_{k=0}^{M-1} w_k(n)\,u(n-k) = \mathbf{w}(n)^T\mathbf{u}(n), \qquad n = 0, 1, 2, \dots
$$

Error between the filter output y(n) and a desired signal d(n):

$$
e(n) = d(n) - y(n) = d(n) - \mathbf{w}(n)^T\mathbf{u}(n)
$$

    Change the filter parameters according to

$$
\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,\mathbf{u}(n)\,e(n)
$$
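As a minimal sketch of this adaptation loop (the function name, the filter order M, the step size mu, and the test signals below are illustrative choices, not prescribed by the lecture):

```python
import numpy as np

def lms(u, d, M, mu):
    """Standard LMS for an FIR filter: w(n+1) = w(n) + mu * u(n) * e(n)."""
    N = len(u)
    w = np.zeros(M)                        # coefficients w_0(n), ..., w_{M-1}(n)
    e = np.zeros(N)                        # error signal e(n)
    for n in range(M - 1, N):
        u_vec = u[n - M + 1:n + 1][::-1]   # regressor [u(n), u(n-1), ..., u(n-M+1)]
        y = w @ u_vec                      # filter output y(n) = w(n)^T u(n)
        e[n] = d[n] - y                    # error e(n) = d(n) - y(n)
        w = w + mu * u_vec * e[n]          # LMS coefficient update
    return w, e

# Illustrative use: identify a short FIR channel from noisy observations.
rng = np.random.default_rng(0)
u = rng.standard_normal(2000)
h = np.array([1.0, 0.5, -0.3])             # hypothetical "true" channel
d = np.convolve(u, h)[:len(u)] + 0.01 * rng.standard_normal(len(u))
w_hat, e = lms(u, d, M=3, mu=0.01)         # w_hat should approach h
```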

    1. Normalized LMS Algorithm

Modify at time n the parameter vector from w(n) to w(n + 1), fulfilling the constraint

$$
\mathbf{w}^T(n+1)\,\mathbf{u}(n) = d(n)
$$

with the least modification of w(n), i.e. with the least Euclidean norm of the difference

$$
\delta\mathbf{w}(n+1) = \mathbf{w}(n+1) - \mathbf{w}(n)
$$


Thus we have to minimize

$$
\|\delta\mathbf{w}(n+1)\|^2 = \sum_{k=0}^{M-1}\big(w_k(n+1) - w_k(n)\big)^2
$$

under the constraint

$$
\mathbf{w}^T(n+1)\,\mathbf{u}(n) = d(n)
$$

The solution can be obtained by the method of Lagrange multipliers:

$$
J(\mathbf{w}(n+1), \lambda) = \|\delta\mathbf{w}(n+1)\|^2 + \lambda\big(d(n) - \mathbf{w}^T(n+1)\,\mathbf{u}(n)\big)
= \sum_{k=0}^{M-1}\big(w_k(n+1) - w_k(n)\big)^2 + \lambda\Big(d(n) - \sum_{i=0}^{M-1} w_i(n+1)\,u(n-i)\Big)
$$

To obtain the minimum of J(w(n + 1), λ) we check the zeros of the criterion's partial derivatives:

$$
\frac{\partial J(\mathbf{w}(n+1), \lambda)}{\partial w_j(n+1)} = 0
$$

$$
\frac{\partial}{\partial w_j(n+1)}\left[\sum_{k=0}^{M-1}\big(w_k(n+1) - w_k(n)\big)^2 + \lambda\Big(d(n) - \sum_{i=0}^{M-1} w_i(n+1)\,u(n-i)\Big)\right] = 0
$$

$$
2\big(w_j(n+1) - w_j(n)\big) - \lambda\,u(n-j) = 0
$$

We have thus

$$
w_j(n+1) = w_j(n) + \frac{\lambda}{2}\,u(n-j)
$$

where λ will result from the constraint

$$
d(n) = \sum_{i=0}^{M-1} w_i(n+1)\,u(n-i)
$$


$$
d(n) = \sum_{i=0}^{M-1}\Big(w_i(n) + \frac{\lambda}{2}\,u(n-i)\Big)u(n-i)
$$

$$
d(n) = \sum_{i=0}^{M-1} w_i(n)\,u(n-i) + \frac{\lambda}{2}\sum_{i=0}^{M-1}\big(u(n-i)\big)^2
$$

$$
\lambda = \frac{2\Big(d(n) - \sum_{i=0}^{M-1} w_i(n)\,u(n-i)\Big)}{\sum_{i=0}^{M-1}\big(u(n-i)\big)^2}
= \frac{2\,e(n)}{\sum_{i=0}^{M-1}\big(u(n-i)\big)^2}
$$

Thus, the minimum of the criterion J(w(n + 1), λ) will be obtained using the adaptation equation

$$
w_j(n+1) = w_j(n) + \frac{e(n)}{\sum_{i=0}^{M-1}\big(u(n-i)\big)^2}\,u(n-j)
$$

In order to add an extra degree of freedom to the adaptation strategy, a constant μ̃ controlling the step size will be introduced:

$$
w_j(n+1) = w_j(n) + \tilde{\mu}\,\frac{1}{\sum_{i=0}^{M-1}\big(u(n-i)\big)^2}\,e(n)\,u(n-j)
= w_j(n) + \frac{\tilde{\mu}}{\|\mathbf{u}(n)\|^2}\,e(n)\,u(n-j)
$$

To overcome possible numerical difficulties when ‖u(n)‖² is very close to zero, a constant a > 0 is used:

$$
w_j(n+1) = w_j(n) + \frac{\tilde{\mu}}{a + \|\mathbf{u}(n)\|^2}\,e(n)\,u(n-j)
$$

    This is the updating equation used in the Normalized LMS algorithm.
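A minimal sketch of this NLMS update, reusing the loop structure of the standard LMS sketch above (the names nlms, mu_tilde and the default value of the regularization constant a are illustrative):

```python
import numpy as np

def nlms(u, d, M, mu_tilde, a=1e-6):
    """Normalized LMS: w(n+1) = w(n) + mu_tilde / (a + ||u(n)||^2) * e(n) * u(n)."""
    N = len(u)
    w = np.zeros(M)
    e = np.zeros(N)
    for n in range(M - 1, N):
        u_vec = u[n - M + 1:n + 1][::-1]       # regressor u(n)
        e[n] = d[n] - w @ u_vec                # a-priori error e(n)
        norm_sq = a + u_vec @ u_vec            # a + ||u(n)||^2 guards against division by ~0
        w = w + (mu_tilde / norm_sq) * e[n] * u_vec
    return w, e
```

The step size that effectively multiplies e(n)u(n) is now data dependent, which is exactly the interpretation given below.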


    The principal characteristics of the Normalized LMS algorithm are the following:

The adaptation constant μ̃ is dimensionless, whereas in LMS the adaptation constant μ has the dimension of an inverse power.

Setting

$$
\mu(n) = \frac{\tilde{\mu}}{a + \|\mathbf{u}(n)\|^2}
$$

we may view the Normalized LMS algorithm as an LMS algorithm with a data-dependent adaptation step size.

Considering the approximate expression

$$
\mu(n) = \frac{\tilde{\mu}}{a + M\,E\big[u(n)^2\big]}
$$

the normalization is such that:

* the effect of large fluctuations in the power level of the input signal is compensated at the adaptation level;

* the effect of a large input vector length is compensated by reducing the step size of the algorithm.

This algorithm was derived based on an intuitive principle: in the light of new input data, the parameters of an adaptive system should only be disturbed in a minimal fashion.

The Normalized LMS algorithm is convergent in the mean-square sense if

$$
0 < \tilde{\mu} < 2
$$


    Comparison of LMS and NLMS within the example from Lecture 4 (channel equalization)

[Figure: learning curves E[e²(n)] versus time step n. Left panel: LMS with μ = 0.075, 0.025, 0.0075. Right panel: Normalized LMS with μ̃ = 1.5, 1.0, 0.5, 0.1.]


    Comparison of LMS and NLMS within the example from Lecture 4 (channel equalization):

The LMS was run with three different step sizes: μ = [0.075; 0.025; 0.0075].

The NLMS was run with four different step sizes: μ̃ = [1.5; 1.0; 0.5; 0.1].

With the step size μ̃ = 1.5, NLMS behaved definitely worse than with step size μ̃ = 1.0 (slower, and with a higher steady-state square error), so μ̃ = 1.5 is ruled out from further consideration.

Each of the three step sizes μ̃ = [1.0; 0.5; 0.1] was interesting: on one hand, the larger the step size, the faster the convergence; on the other hand, the smaller the step size, the better the steady-state square error. So each step size may be a useful tradeoff between convergence speed and stationary MSE (not both can be very good simultaneously).

LMS with μ = 0.0075 and NLMS with μ̃ = 0.1 achieved a similar (very good) average steady-state square error. However, NLMS was faster.

LMS with μ = 0.075 and NLMS with μ̃ = 1.0 had a similar (very good) convergence speed. However, NLMS achieved a lower steady-state average square error.

    To conclude: NLMS offers better trade-offs than LMS.

    The computational complexity of NLMS is slightly higher than that of LMS.


    2. LMS Algorithm with Time Variable Adaptation Step

Heuristics of the method: we combine the benefits of two different situations:

The convergence time constant is small for large μ.

The mean-square error in steady state is low for small μ.

Therefore, in the initial adaptation stages μ is kept large, then it is monotonically reduced, such that in the final adaptation stage it is very small.

There are many recipes for cooling down an adaptation process.

Monotonically decreasing the step size:

$$
\mu(n) = \frac{1}{n + c}
$$

Disadvantage for non-stationary data: for large values of n, the algorithm will no longer react to changes in the optimum solution.
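A sketch of LMS with this cooling schedule (the constant c and the loop conventions are illustrative):

```python
import numpy as np

def decreasing_step_lms(u, d, M, c=10.0):
    """LMS with the monotonically decreasing step size mu(n) = 1 / (n + c)."""
    N = len(u)
    w = np.zeros(M)
    e = np.zeros(N)
    for n in range(M - 1, N):
        u_vec = u[n - M + 1:n + 1][::-1]
        e[n] = d[n] - w @ u_vec
        mu_n = 1.0 / (n + c)            # large at the start, very small for large n
        w = w + mu_n * e[n] * u_vec
    return w, e
```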

Variable Step algorithm:

$$
\mathbf{w}(n+1) = \mathbf{w}(n) + M(n)\,\mathbf{u}(n)\,e(n)
$$

where

$$
M(n) = \begin{bmatrix}
\mu_0(n) & 0 & \cdots & 0\\
0 & \mu_1(n) & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \mu_{M-1}(n)
\end{bmatrix}
$$


or, componentwise,

$$
w_i(n+1) = w_i(n) + \mu_i(n)\,u(n-i)\,e(n), \qquad i = 0, 1, \dots, M-1
$$

* each filter parameter w_i(n) is updated using an independent adaptation step μ_i(n);

* the time variation of μ_i(n) is selected ad hoc (one possible implementation is sketched after this list):

if m1 successive identical signs of the gradient estimate e(n)u(n − i) are observed, then μ_i(n) is increased, μ_i(n) = c1 μ_i(n − m1) with c1 > 1 (the algorithm is still far from the optimum, so it is better to accelerate);

if m2 successive changes in the sign of the gradient estimate e(n)u(n − i) are observed, then μ_i(n) is decreased, μ_i(n) = μ_i(n − m2)/c2 with c2 > 1 (the algorithm is near the optimum, so it is better to decelerate by decreasing the step size, so that the steady-state error will finally decrease).
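The sketch below is one possible reading of this rule; the sign-counting bookkeeping, the thresholds m1, m2, the factors c1, c2, and the step-size bounds are illustrative choices, since the lecture fixes only the qualitative behavior:

```python
import numpy as np

def variable_step_lms(u, d, M, mu0=0.01, m1=3, m2=3, c1=2.0, c2=2.0,
                      mu_min=1e-5, mu_max=0.1):
    """LMS with an individual, sign-driven step size mu_i(n) per coefficient."""
    N = len(u)
    w = np.zeros(M)
    mu = np.full(M, mu0)                       # current mu_i(n)
    same_count = np.zeros(M, dtype=int)        # successive identical gradient signs
    change_count = np.zeros(M, dtype=int)      # successive sign changes
    prev_sign = np.zeros(M)
    e = np.zeros(N)
    for n in range(M - 1, N):
        u_vec = u[n - M + 1:n + 1][::-1]
        e[n] = d[n] - w @ u_vec
        grad = e[n] * u_vec                    # gradient estimate e(n)u(n-i)
        s = np.sign(grad)
        same = (s == prev_sign) & (s != 0)
        change = (s == -prev_sign) & (s != 0)
        same_count = np.where(same, same_count + 1, 0)
        change_count = np.where(change, change_count + 1, 0)
        # accelerate after m1 identical signs, decelerate after m2 sign changes
        mu = np.where(same_count >= m1, np.minimum(mu * c1, mu_max), mu)
        mu = np.where(change_count >= m2, np.maximum(mu / c2, mu_min), mu)
        same_count = np.where(same_count >= m1, 0, same_count)
        change_count = np.where(change_count >= m2, 0, change_count)
        prev_sign = s
        w = w + mu * grad                      # componentwise update
    return w, e
```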


Comparison of LMS and variable-step LMS (μ(n) = 0.01/(n + c)) within the example from Lecture 4 (channel equalization), with c = [10; 20; 50]

[Figure: learning curves E[e²(n)] versus time step n. Left panel: LMS with μ = 0.075, 0.025, 0.0075. Right panel: variable-step LMS with c = 10, 20, 50, together with LMS at μ = 0.0075 for reference.]


    3. Sign algorithms

In high-speed communication, time is critical; thus faster adaptation processes are needed.

$$
\operatorname{sgn}(a) = \begin{cases} 1, & a > 0\\ 0, & a = 0\\ -1, & a < 0 \end{cases}
$$

The Sign algorithm (other names: pilot LMS, or Sign Error):

$$
\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,\mathbf{u}(n)\operatorname{sgn}(e(n))
$$

The Clipped LMS (or Signed Regressor):

$$
\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\operatorname{sgn}(\mathbf{u}(n))\,e(n)
$$

The Zero-Forcing LMS (or Sign Sign):

$$
\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\operatorname{sgn}(\mathbf{u}(n))\operatorname{sgn}(e(n))
$$
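A sketch covering the three variants in one routine (the variant labels 'sign-error', 'sign-data', 'sign-sign' are names chosen here for convenience):

```python
import numpy as np

def sign_lms(u, d, M, mu, variant="sign-error"):
    """Sign LMS variants: 'sign-error' (Sign), 'sign-data' (Clipped LMS), 'sign-sign'."""
    N = len(u)
    w = np.zeros(M)
    e = np.zeros(N)
    for n in range(M - 1, N):
        u_vec = u[n - M + 1:n + 1][::-1]
        e[n] = d[n] - w @ u_vec
        if variant == "sign-error":        # w += mu * u(n) * sgn(e(n))
            w = w + mu * u_vec * np.sign(e[n])
        elif variant == "sign-data":       # clipped LMS: w += mu * sgn(u(n)) * e(n)
            w = w + mu * np.sign(u_vec) * e[n]
        else:                              # sign-sign: w += mu * sgn(u(n)) * sgn(e(n))
            w = w + mu * np.sign(u_vec) * np.sign(e[n])
    return w, e
```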

The Sign algorithm can be derived as an LMS-type algorithm for minimizing the mean absolute error (MAE) criterion

$$
J(\mathbf{w}) = E\big[|e(n)|\big] = E\big[|d(n) - \mathbf{w}^T\mathbf{u}(n)|\big]
$$

(I propose this as an exercise for you.)

    Properties of sign algorithms:


Very fast computation: if μ is constrained to be a power of two, $\mu = 2^{-m}$, only shifting and addition operations are required.

Drawback: the update mechanism is degraded, compared to the LMS algorithm, by the crude quantization of the gradient estimates.

* The steady-state error will increase.

* The convergence rate decreases.

The fastest of them, Sign-Sign, is used in the CCITT ADPCM standard for the 32000 bps system.


    Comparison of LMS and Sign LMS within the example from Lecture 4 (channel equalization)

[Figure: learning curves E[e²(n)] versus time step n. Left panel: LMS with μ = 0.075, 0.025, 0.0075. Right panel: Sign LMS with μ = 0.075, 0.025, 0.0075, 0.0025.]

The Sign LMS algorithm should be operated at smaller step sizes to get a behavior similar to that of the standard LMS algorithm.


    4. Linear smoothing of LMS gradient estimates

    Lowpass filtering the noisy gradient

Let us rename the noisy gradient estimate:

$$
\mathbf{g}(n) = \hat{\nabla}_{\mathbf{w}} J = -2\,\mathbf{u}(n)\,e(n), \qquad g_i(n) = -2\,e(n)\,u(n-i)
$$

Passing the signals g_i(n) through low-pass filters will prevent large fluctuations of the update direction during the adaptation process.

$$
b_i(n) = \mathrm{LPF}\big(g_i(n)\big)
$$

where LPF denotes a low-pass filtering operation. The updating process will use the filtered noisy gradient:

$$
\mathbf{w}(n+1) = \mathbf{w}(n) - \mu\,\mathbf{b}(n)
$$

    The following versions are well known:

Averaged LMS algorithm

When LPF is the filter with impulse response h(0) = 1/N, ..., h(N − 1) = 1/N, h(N) = h(N + 1) = ... = 0, we obtain simply the average of the gradient components:

$$
\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\mu}{N}\sum_{j=n-N+1}^{n} e(j)\,\mathbf{u}(j)
$$

(the constant factor 2 of the gradient estimate is absorbed into μ).
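A sketch of this averaged update over a sliding window (the window length N_avg is illustrative, and the factor 2 of the gradient is absorbed into mu as noted above):

```python
import numpy as np

def averaged_lms(u, d, M, mu, N_avg=5):
    """Averaged LMS: update with the mean of the last N_avg gradient estimates e(j)u(j)."""
    N = len(u)
    w = np.zeros(M)
    e = np.zeros(N)
    history = []                               # recent gradient estimates e(j)u(j)
    for n in range(M - 1, N):
        u_vec = u[n - M + 1:n + 1][::-1]
        e[n] = d[n] - w @ u_vec
        history.append(e[n] * u_vec)
        if len(history) > N_avg:
            history.pop(0)                     # keep only the last N_avg estimates
        w = w + (mu / N_avg) * np.sum(history, axis=0)   # smoothed update
    return w, e
```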


    Momentum LMS algorithm

When LPF is a first-order IIR filter with impulse response h(0) = 1 − α, h(1) = α h(0), h(2) = α² h(0), ..., then

$$
b_i(n) = \mathrm{LPF}\big(g_i(n)\big) = \alpha\,b_i(n-1) + (1-\alpha)\,g_i(n)
$$

$$
\mathbf{b}(n) = \alpha\,\mathbf{b}(n-1) + (1-\alpha)\,\mathbf{g}(n)
$$

The resulting algorithm can be written as a second-order recursion:

$$
\mathbf{w}(n+1) = \mathbf{w}(n) - \mu\,\mathbf{b}(n)
$$

$$
\mathbf{w}(n) = \mathbf{w}(n-1) - \mu\,\mathbf{b}(n-1)
$$

Subtracting the two equations,

$$
\mathbf{w}(n+1) - \mathbf{w}(n) = \mathbf{w}(n) - \mathbf{w}(n-1) - \mu\,\mathbf{b}(n) + \mu\,\mathbf{b}(n-1)
$$

$$
\mathbf{w}(n+1) = \mathbf{w}(n) + \big(\mathbf{w}(n) - \mathbf{w}(n-1)\big) - \mu\big(\mathbf{b}(n) - \mathbf{b}(n-1)\big)
$$

and, using b(n) = α b(n − 1) + (1 − α) g(n),

$$
\mathbf{w}(n+1) = \mathbf{w}(n) + \alpha\big(\mathbf{w}(n) - \mathbf{w}(n-1)\big) - \mu(1-\alpha)\,\mathbf{g}(n)
$$

$$
\mathbf{w}(n+1) = \mathbf{w}(n) + \alpha\big(\mathbf{w}(n) - \mathbf{w}(n-1)\big) + 2\mu(1-\alpha)\,e(n)\,\mathbf{u}(n)
$$

or, absorbing the constant factor 2 into μ,

$$
\mathbf{w}(n+1) - \mathbf{w}(n) = \alpha\big(\mathbf{w}(n) - \mathbf{w}(n-1)\big) + \mu(1-\alpha)\,e(n)\,\mathbf{u}(n)
$$
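A sketch of this recursion, keeping the increment w(n) − w(n − 1) as state (the value of alpha and the rest of the setup are illustrative):

```python
import numpy as np

def momentum_lms(u, d, M, mu, alpha=0.9):
    """Momentum LMS: w(n+1) - w(n) = alpha*(w(n) - w(n-1)) + mu*(1 - alpha)*e(n)*u(n)."""
    N = len(u)
    w = np.zeros(M)
    delta = np.zeros(M)                  # momentum term w(n) - w(n-1)
    e = np.zeros(N)
    for n in range(M - 1, N):
        u_vec = u[n - M + 1:n + 1][::-1]
        e[n] = d[n] - w @ u_vec
        delta = alpha * delta + mu * (1 - alpha) * e[n] * u_vec   # second-order recursion
        w = w + delta
    return w, e
```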

Drawback: the convergence rate may decrease.

Advantages: the momentum term keeps the algorithm active even in the regions close to the minimum. For nonlinear criterion surfaces this helps in avoiding local minima (as in neural network learning by backpropagation).


    5. Nonlinear smoothing of LMS gradient estimates

If there is impulsive interference in either d(n) or u(n), the performance of the LMS algorithm will drastically degrade (sometimes even leading to instability).

    Smoothing the noisy gradient components using a nonlinear filter provides a potential solution.

    The Median LMS Algorithm

Computing the median over a window of size N + 1, for each component of the gradient vector, will smooth out the effect of impulsive noise. The adaptation equation can be implemented as

$$
w_i(n+1) = w_i(n) + \mu\operatorname{med}\big(e(n)u(n-i),\; e(n-1)u(n-1-i),\; \dots,\; e(n-N)u(n-N-i)\big)
$$
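A sketch of this componentwise median smoothing over a sliding window (the window length N_med + 1 mirrors the window of size N + 1 above; the parameter names are illustrative):

```python
import numpy as np

def median_lms(u, d, M, mu, N_med=5):
    """Median LMS: replace the instantaneous gradient by a componentwise sliding median."""
    N = len(u)
    w = np.zeros(M)
    e = np.zeros(N)
    history = []                                  # recent e(j)u(j) vectors, window N_med + 1
    for n in range(M - 1, N):
        u_vec = u[n - M + 1:n + 1][::-1]
        e[n] = d[n] - w @ u_vec
        history.append(e[n] * u_vec)
        if len(history) > N_med + 1:
            history.pop(0)
        w = w + mu * np.median(history, axis=0)   # componentwise median of the window
    return w, e
```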

* Experimental evidence shows that the smoothing effect in an impulsive noise environment is very strong.

* If the environment is not impulsive, the performance of Median LMS is comparable with that of LMS, so the extra computational cost of Median LMS is not worthwhile.

* However, the convergence rate is necessarily slower than in LMS, and there are reports of instability occurring with Median LMS.


    6. Double updating algorithm

As usual, we first compute the linear combination of the inputs

$$
y(n) = \mathbf{w}(n)^T\mathbf{u}(n) \qquad (1)
$$

then the error

$$
e(n) = d(n) - y(n)
$$

The parameter adaptation is done as in Normalized LMS:

$$
\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\tilde{\mu}}{\|\mathbf{u}(n)\|^2}\,e(n)\,\mathbf{u}(n)
$$

Now the new feature of the double updating algorithm is to recompute the output estimate as

$$
y(n) = \mathbf{w}(n+1)^T\mathbf{u}(n) \qquad (2)
$$

Since we use, at each time instant n, two different computations of the filter output ((1) and (2)), the name of the method is double updating.
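A sketch of the double-updating computation, returning both outputs (1) and (2) (the names and the regularization constant a are illustrative):

```python
import numpy as np

def double_updating(u, d, M, mu_tilde, a=1e-6):
    """Double updating: NLMS coefficient update plus an a-posteriori output estimate."""
    N = len(u)
    w = np.zeros(M)
    y_post = np.zeros(N)                       # a-posteriori output w(n+1)^T u(n)
    e = np.zeros(N)
    for n in range(M - 1, N):
        u_vec = u[n - M + 1:n + 1][::-1]
        y_prior = w @ u_vec                    # (1) a-priori output y(n)
        e[n] = d[n] - y_prior
        w = w + (mu_tilde / (a + u_vec @ u_vec)) * e[n] * u_vec   # NLMS update
        y_post[n] = w @ u_vec                  # (2) output recomputed with w(n+1)
    return w, e, y_post
```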

One interesting situation is obtained when μ̃ = 1. Then

$$
y(n) = \mathbf{w}(n+1)^T\mathbf{u}(n)
= \Big(\mathbf{w}(n)^T + \frac{1}{\mathbf{u}(n)^T\mathbf{u}(n)}\,e(n)\,\mathbf{u}(n)^T\Big)\mathbf{u}(n)
= \mathbf{w}(n)^T\mathbf{u}(n) + e(n)
= \mathbf{w}(n)^T\mathbf{u}(n) + d(n) - \mathbf{w}(n)^T\mathbf{u}(n) = d(n)
$$

Finally, the output of the filter is equal to the desired signal, y(n) = d(n), and the new error is zero, e(n) = 0.

    This situation is not acceptable (sometimes too good is worse than good). Why?


    7. LMS Volterra algorithm

We will generalize the LMS algorithm from linear combinations of the inputs (FIR filters) to quadratic combinations of the inputs.

$$
y(n) = \mathbf{w}(n)^T\mathbf{u}(n) = \sum_{k=0}^{M-1} w_k(n)\,u(n-k) \qquad \text{(linear combiner)}
$$

$$
y(n) = \sum_{k=0}^{M-1} w^{[1]}_k(n)\,u(n-k) + \sum_{i=0}^{M-1}\sum_{j=i}^{M-1} w^{[2]}_{i,j}(n)\,u(n-i)\,u(n-j) \qquad (3) \quad \text{(quadratic combiner)}
$$

We will introduce the input vector of dimension M + M(M + 1)/2:

$$
\boldsymbol{\varphi}(n) = \begin{bmatrix} u(n) & u(n-1) & \dots & u(n-M+1) & u^2(n) & u(n)u(n-1) & \dots & u^2(n-M+1) \end{bmatrix}^T
$$

and the parameter vector

$$
\boldsymbol{\theta}(n) = \begin{bmatrix} w^{[1]}_0(n) & w^{[1]}_1(n) & \dots & w^{[1]}_{M-1}(n) & w^{[2]}_{0,0}(n) & w^{[2]}_{0,1}(n) & \dots & w^{[2]}_{M-1,M-1}(n) \end{bmatrix}^T
$$

Now the output of the quadratic filter (3) can be written

$$
y(n) = \boldsymbol{\theta}(n)^T\boldsymbol{\varphi}(n)
$$

and therefore the error

$$
e(n) = d(n) - y(n) = d(n) - \boldsymbol{\theta}(n)^T\boldsymbol{\varphi}(n)
$$

is a linear function of the filter parameters (i.e. the entries of θ(n)).


Minimization of the mean square error criterion

$$
J(\boldsymbol{\theta}) = E\big[e(n)^2\big] = E\Big[\big(d(n) - \boldsymbol{\theta}(n)^T\boldsymbol{\varphi}(n)\big)^2\Big]
$$

will proceed as in the linear case, resulting in the LMS adaptation equation

$$
\boldsymbol{\theta}(n+1) = \boldsymbol{\theta}(n) + \mu\,\boldsymbol{\varphi}(n)\,e(n)
$$
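A sketch of the second-order Volterra LMS: build the regressor φ(n) with the M linear terms and the M(M + 1)/2 quadratic terms, then run the same LMS recursion on θ(n) (function names are illustrative):

```python
import numpy as np

def volterra_regressor(u_vec):
    """Build phi(n): linear terms u(n-k) plus quadratic terms u(n-i)u(n-j), i <= j."""
    M = len(u_vec)
    quad = [u_vec[i] * u_vec[j] for i in range(M) for j in range(i, M)]
    return np.concatenate([u_vec, quad])       # dimension M + M*(M+1)/2

def lms_volterra(u, d, M, mu):
    """LMS for the second-order Volterra filter: theta(n+1) = theta(n) + mu*phi(n)*e(n)."""
    N = len(u)
    P = M + M * (M + 1) // 2
    theta = np.zeros(P)
    e = np.zeros(N)
    for n in range(M - 1, N):
        u_vec = u[n - M + 1:n + 1][::-1]       # u(n), ..., u(n-M+1)
        phi = volterra_regressor(u_vec)
        e[n] = d[n] - theta @ phi
        theta = theta + mu * phi * e[n]
    return theta, e
```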

Some properties of LMS for Volterra filters are the following:

Convergence of the algorithm in the mean is obtained when

$$
0 < \mu < \frac{2}{\lambda_{\max}}
$$

where λ_max is the largest eigenvalue of the input correlation matrix E[φ(n)φ(n)^T].