Adaptive Fuzzy Filtering in a Deterministic Setting


    IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 17, NO. 4, AUGUST 2009 763

Adaptive Fuzzy Filtering in a Deterministic Setting

Mohit Kumar, Member, IEEE, Norbert Stoll, and Regina Stoll

Abstract—Many real-world applications involve the filtering and estimation of process variables. This study considers the use of interpretable Sugeno-type fuzzy models for adaptive filtering. Our aim in this study is to provide different adaptive fuzzy filtering algorithms in a deterministic setting. The algorithms are derived and studied in a unified way without making any assumptions on the nature of signals (i.e., process variables). The study extends, in a common framework, the adaptive filtering algorithms (usually studied in signal processing literature) and p-norm algorithms (usually studied in machine learning literature) to semilinear fuzzy models. A mathematical framework is provided that allows the development and an analysis of the adaptive fuzzy filtering algorithms. We study a class of nonlinear LMS-like algorithms for the online estimation of fuzzy model parameters. A generalization of the algorithms to the p-norm is provided using Bregman divergences (a standard tool for online machine learning algorithms).

Index Terms—Adaptive filtering algorithms, Bregman divergences, p-norm, robustness, Sugeno fuzzy models.

    I. INTRODUCTION

A REAL-WORLD complex process is typically characterized by a number of variables whose interrelations are uncertain and not completely known. Our concern is to apply, in an online scenario, the fuzzy techniques for such processes, aiming at the filtering of uncertainties and the estimation of variables. The applications of adaptive filtering algorithms are not limited to engineering problems; they extend, e.g., to medicinal chemistry, where it is required to predict the biological activity of a chemical compound before its synthesis in the laboratory [1]. Once a compound is synthesized and tested experimentally for its activity, the experimental data can be used for an improvement of the prediction performance (i.e., online learning of the adaptive system). Adaptive filtering of uncertainties may be desired, e.g., for an intelligent interpretation of medical data that are contaminated by the uncertainties arising from individual variations due to differences in age, gender, and body conditions [2].

We focus on a process model with $n$ inputs (represented by the vector $x \in \mathbb{R}^n$) and a single output (represented by the scalar $y$). Adaptive filtering algorithms seek to identify the unknown

Manuscript received March 9, 2007; revised August 10, 2007; accepted October 30, 2007. First published April 30, 2008; current version published July 29, 2009. This work was supported by the Center for Life Science Automation, Rostock, Germany.

M. Kumar is with the Center for Life Science Automation, D-18119 Rostock, Germany (e-mail: [email protected]).

N. Stoll is with the Institute of Automation, College of Computer Science and Electrical Engineering, University of Rostock, D-18119 Rostock, Germany (e-mail: [email protected]).

R. Stoll is with the Institute of Preventive Medicine, Faculty of Medicine, University of Rostock, D-18055 Rostock, Germany (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TFUZZ.2008.924331

TABLE I: EXAMPLES OF $\phi(e)$ FROM [6]–[8]

parameters of a model (being characterized by a vector $w$) using input–output data pairs $\{x(j), y(j)\}$ related via
$$ y(j) = M(x(j), w) + n_j $$
where $M(x(j), w)$ is the model output for an input $x(j)$, and $n_j$ is the underlying uncertainty. If the chosen model $M$ is nonlinear in the parameter vector $w$ (as is the case with neural and fuzzy models), the standard gradient-descent algorithm is mostly used for an online estimation of $w$ via performing the following recursions:
$$ w_j = w_{j-1} - \mu \left.\frac{\partial E_r(w, j)}{\partial w}\right|_{w_{j-1}}, \qquad E_r(w, j) = \frac{1}{2}\,[y(j) - M(x(j), w)]^2 \qquad (1) $$
where $\mu$ is the step size (i.e., the learning rate). If we relax our model to be linear in $w$, i.e., input–output data are related via
$$ y(j) = G_j^T w + n_j, \quad \text{where } G_j \text{ is the regressor vector} $$

then a variety of algorithms are available in the literature for an adaptive estimation of linear parameters [3]. The most popular algorithm is the LMS because of its simplicity and robustness [4], [5]. Many LMS-like algorithms have been studied for linear models [3], [4] while addressing the robustness, convergence, and steady-state error issues. A particular class of algorithms takes the update form
$$ w_j = w_{j-1} - \mu\, \phi\bigl(G_j^T w_{j-1} - y(j)\bigr)\, G_j $$
where $\phi$ is a nonlinear scalar function such that different choices of the functional form lead to the different algorithms, as stated in Table I. The generalization of the LMS algorithm to the $p$-norm ($2 \le p < \infty$) is given by the update rule [9], [10]
$$ w_j = f^{-1}\bigl(f(w_{j-1}) - \mu\, [G_j^T w_{j-1} - y(j)]\, G_j\bigr). \qquad (2) $$
Here, $f$ (a $p$ indexing for $f$ is understood), as defined in [10], is the bijective mapping $f: \mathbb{R}^K \to \mathbb{R}^K$ such that
$$ f = [f_1 \,\cdots\, f_K]^T, \qquad f_i(w) = \frac{\mathrm{sign}(w_i)\,|w_i|^{q-1}}{\|w\|_q^{q-2}} \qquad (3) $$
where $w = [w_1 \,\cdots\, w_K]^T \in \mathbb{R}^K$, $q$ is dual to $p$ (i.e., $1/p + 1/q = 1$), and $\|\cdot\|_q$ denotes the $q$-norm defined as
$$ \|w\|_q = \Bigl(\sum_i |w_i|^q\Bigr)^{1/q}. $$

1063-6706/$26.00 © 2009 IEEE


The inverse $f^{-1}: \mathbb{R}^K \to \mathbb{R}^K$ is given as
$$ f^{-1} = [f_1^{-1} \,\cdots\, f_K^{-1}]^T, \qquad f_i^{-1}(v) = \frac{\mathrm{sign}(v_i)\,|v_i|^{p-1}}{\|v\|_p^{p-2}} \qquad (4) $$
where $v = [v_1 \,\cdots\, v_K]^T \in \mathbb{R}^K$.
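As a concrete illustration, the pair of mappings (3) and (4) can be sketched in a few lines of pure Python (the helper names `q_norm`, `link`, and `link_inv` are ours, not from the paper). The sketch also checks numerically that (4) inverts (3) and that $\|f(w)\|_p = \|w\|_q$:

```python
import math

def q_norm(w, q):
    """q-norm of a vector: (sum_i |w_i|^q)^(1/q)."""
    return sum(abs(wi) ** q for wi in w) ** (1.0 / q)

def link(w, p):
    """Bijective mapping f of Eq. (3): f_i(w) = sign(w_i)|w_i|^(q-1) / ||w||_q^(q-2),
    with q dual to p (1/p + 1/q = 1)."""
    q = p / (p - 1.0)
    nq = q_norm(w, q)
    return [math.copysign(abs(wi) ** (q - 1), wi) / nq ** (q - 2) for wi in w]

def link_inv(v, p):
    """Inverse mapping of Eq. (4): f_i^{-1}(v) = sign(v_i)|v_i|^(p-1) / ||v||_p^(p-2)."""
    npn = q_norm(v, p)
    return [math.copysign(abs(vi) ** (p - 1), vi) / npn ** (p - 2) for vi in v]

w, p = [0.5, -1.2, 0.3], 3.0
roundtrip = link_inv(link(w, p), p)
print(all(abs(a - b) < 1e-9 for a, b in zip(roundtrip, w)))  # True
```

For $p = q = 2$ both mappings reduce to the identity, which is why (2) collapses to the ordinary LMS recursion in that case.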

Sugeno-type fuzzy models are linear in consequents and nonlinear in antecedents (i.e., membership function parameters). When it comes to the online estimation of fuzzy model parameters, the following two approaches, in general, are used.

1) The antecedent parameters are adapted using gradient descent and the consequent parameters by the recursive least-squares algorithm [11], [12].

2) A combination of data clustering and the recursive least-squares algorithm is applied [13], [14].

The wide use of the gradient-descent algorithm for the adaptation of nonlinear fuzzy model parameters (e.g., in [15]) is due to its simplicity and low computational cost. However, gradient-descent-based algorithms for nonlinear systems are not justified by rigorous theoretical arguments [16]. Only a few papers dealing with the mathematical analysis of adaptive fuzzy algorithms have appeared till now. The issue of algorithm stability has been addressed in [17]. The authors in [18] introduce an energy-gain-bounding approach for the estimation of parameters via minimizing the maximum possible value of the energy gain from disturbances to the estimation errors, along the line of $H^\infty$-optimal estimation. Algorithms for the adaptive estimation of fuzzy model parameters based on least-squares and $H^\infty$-optimization criteria are provided in [19].

To the knowledge of the authors, the fuzzy literature still lacks:

1) the development and deterministic mathematical analysis (in terms of filtering performance) of methods that extend the Table I type algorithms (i.e., LMF, LMMN, sign error, etc.) to interpretable fuzzy models;

2) the generalization of the algorithms with error nonlinearities (i.e., Table I type algorithms) to $p$-norms, which is missing even for linear-in-parameters models;

3) the development and deterministic mathematical analysis (in terms of filtering performance) of the $p$-norm algorithms [e.g., of type (2)] for an adaptive estimation of the parameters of an interpretable fuzzy model.

This paper is intended to provide the aforementioned studies in a unified manner. This is done via solving a constrained regularized nonlinear optimization problem in Section II. Section III provides the deterministic analysis of the algorithms with emphasis on filtering errors. Simulation studies are provided in Section IV, followed by some remarks and, finally, the conclusion.

    II. ADAPTIVE FUZZY ALGORITHMS

Sugeno-type fuzzy models are characterized by two types of parameters: consequents and antecedents. If we characterize the antecedents using a vector $\alpha$ and the consequents using a vector $\theta$, then the output of a zero-order Takagi–Sugeno fuzzy model could be characterized as
$$ F_s(x) = G^T(x, \alpha)\,\theta, \qquad c\,\alpha \le h \qquad (5) $$
where $G(\cdot)$ is a nonlinear function (which is defined by the shape of the membership functions), and $c\,\alpha \le h$ is a matrix inequality that characterizes the interpretability of the model. The details of (5) can be found in, e.g., [19] as well as in the Appendix. A straightforward approach to the design of an adaptive fuzzy filter algorithm is to update, at time $j$, the model parameters $(\alpha_{j-1}, \theta_{j-1})$ based on the current data pair $(x(j), y(j))$, where we seek to decrease the loss term $|y(j) - G^T(x(j), \alpha_j)\theta_j|^2$; however, we do not want to make big changes in the initial parameters $(\alpha_{j-1}, \theta_{j-1})$. That is,
$$ (\alpha_j, \theta_j) = \arg\min_{(\theta,\,\alpha:\, c\alpha \le h)} \left[ \frac{1}{2}\bigl|y(j) - G^T(x(j), \alpha)\,\theta\bigr|^2 + \frac{1}{2\mu_j}\|\theta - \theta_{j-1}\|^2 + \frac{1}{2\mu_{\alpha,j}}\|\alpha - \alpha_{j-1}\|^2 \right] \qquad (6) $$

where $\mu_j > 0$ and $\mu_{\alpha,j} > 0$ are the learning rates for the consequents and antecedents, respectively, and $\|\cdot\|$ denotes the Euclidean norm (i.e., we write the 2-norm of a vector as $\|\cdot\|$ instead of $\|\cdot\|_2$). The terms $\|\theta - \theta_{j-1}\|^2$ and $\|\alpha - \alpha_{j-1}\|^2$ provide regularization to the adaptive estimation problem. To study the different adaptive algorithms in a unified framework, the following generalizations can be provided to the loss as well as regularization terms.

1) The loss term is generalized using a function $L_j(\theta, \alpha)$. Some examples of $L_j(\theta, \alpha)$ include
$$ L_j(\theta, \alpha) = \begin{cases} \bigl|y(j) - G^T(x(j), \alpha)\,\theta\bigr| \\[2pt] \frac{1}{2}\bigl|y(j) - G^T(x(j), \alpha)\,\theta\bigr|^2 \\[2pt] \frac{1}{4}\bigl|y(j) - G^T(x(j), \alpha)\,\theta\bigr|^4 \\[2pt] \frac{a}{2}\bigl|y(j) - G^T(x(j), \alpha)\,\theta\bigr|^2 + \frac{b}{4}\bigl|y(j) - G^T(x(j), \alpha)\,\theta\bigr|^4. \end{cases} \qquad (7) $$

2) The regularization terms are generalized using Bregman divergences [9], [20]. The Bregman divergence $d_F(u, w)$ [21], which is associated with a strictly convex, twice differentiable function $F$ from a subset of $\mathbb{R}^K$ to $\mathbb{R}$, is defined for $u, w \in \mathbb{R}^K$ as follows:
$$ d_F(u, w) = F(u) - F(w) - (u - w)^T f(w) $$
where $f = \nabla F$ denotes the gradient of $F$. Note that $d_F(u, w) \ge 0$, is equal to zero only for $u = w$, and is strictly convex in $u$. Some examples of Bregman divergences are as follows.

a) Bregman divergence associated with the squared $q$-norm: If we define $F(w) = (1/2)\|w\|_q^2$, then the corresponding Bregman divergence $d_q(u, w)$ is defined as
$$ d_q(u, w) = \frac{1}{2}\|u\|_q^2 - \frac{1}{2}\|w\|_q^2 - (u - w)^T f(w) $$
where $f$ is given by (3). It is easy to see that for $q = 2$, we have $d_2(u, w) = (1/2)\|u - w\|^2$.


b) Relative entropy: For a vector $w = [w_1 \,\cdots\, w_K]^T \in \mathbb{R}^K$ (with $w_i \ge 0$), if we define $F(w) = \sum_{i=1}^{K} (w_i \ln w_i - w_i)$, then the Bregman divergence is the unnormalized relative entropy:
$$ d_{RE}(u, w) = \sum_{i=1}^{K} \Bigl( u_i \ln\frac{u_i}{w_i} - u_i + w_i \Bigr). $$
Bregman divergences have been widely studied in learning and information theory (see, e.g., [22]–[28]). However, our idea is to replace the regularization terms $(1/2\mu_j)\|\theta - \theta_{j-1}\|^2$ and $(1/2\mu_{\alpha,j})\|\alpha - \alpha_{j-1}\|^2$ in (6) by the generalized terms $\mu_j^{-1} d_F(\theta, \theta_{j-1})$ and $\mu_{\alpha,j}^{-1} d_F(\alpha, \alpha_{j-1})$, respectively. This approach, in the context of linear models, was introduced for deriving predictive algorithms in [29] and filtering algorithms in [9]. It is obvious that different choices of the function $F$ result in different filtering algorithms. Our particular concern in this text is to provide a $p$-norm generalization to the filtering algorithms. For the $p$-norm ($2 \le p < \infty$) generalization of the algorithms, we consider the Bregman divergence associated with the squared $q$-norm [i.e., $F(w) = (1/2)\|w\|_q^2$], where $q$ is dual to $p$ (i.e., $1/p + 1/q = 1$).
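The divergence used above is easy to compute directly from its definition. The following pure-Python sketch (helper names are ours) evaluates $d_q(u, w)$ with $f = \nabla\bigl(\tfrac{1}{2}\|\cdot\|_q^2\bigr)$ and confirms the two properties quoted in the text: for $q = 2$ it collapses to half the squared Euclidean distance, and it is nonnegative, vanishing at $u = w$:

```python
import math

def q_norm(w, q):
    return sum(abs(wi) ** q for wi in w) ** (1.0 / q)

def grad_half_sq_qnorm(w, q):
    """Gradient f of F(w) = (1/2)||w||_q^2, i.e., the mapping of Eq. (3) in terms of q."""
    nq = q_norm(w, q)
    return [math.copysign(abs(wi) ** (q - 1), wi) / nq ** (q - 2) for wi in w]

def d_q(u, w, q):
    """Bregman divergence of F(w) = (1/2)||w||_q^2: F(u) - F(w) - (u - w)^T f(w)."""
    f_w = grad_half_sq_qnorm(w, q)
    return (0.5 * q_norm(u, q) ** 2 - 0.5 * q_norm(w, q) ** 2
            - sum((ui - wi) * fi for ui, wi, fi in zip(u, w, f_w)))

u, w = [1.0, -0.5, 2.0], [0.4, 0.1, -0.3]
# q = 2: the divergence equals half the squared Euclidean distance.
print(abs(d_q(u, w, 2.0) - 0.5 * sum((a - b) ** 2 for a, b in zip(u, w))) < 1e-12)
# 1 < q <= 2: nonnegative, and exactly zero at u = w.
print(d_q(u, w, 1.5) >= 0.0, abs(d_q(w, w, 1.5)) < 1e-12)
```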

In view of these generalizations, the adaptive fuzzy algorithms take the form
$$ (\alpha_j, \theta_j) = \arg\min_{(\theta,\,\alpha:\, c\alpha \le h)} \bigl[ L_j(\theta, \alpha) + \mu_j^{-1}\, d_q(\theta, \theta_{j-1}) + \mu_{\alpha,j}^{-1}\, d_q(\alpha, \alpha_{j-1}) \bigr]. \qquad (8) $$

For the particular choice $L_j(\theta, \alpha) = (1/2)|y(j) - G^T(x(j), \alpha)\theta|^2$ and $q = 2$, problem (8) reduces to (6). For a given value of $\alpha$, we define
$$ \theta(\alpha) = \arg\min_{\theta} E_j(\theta, \alpha), \qquad E_j(\theta, \alpha) = L_j(\theta, \alpha) + \mu_j^{-1}\, d_q(\theta, \theta_{j-1}) $$
so that the estimation problem (8) can be formulated as
$$ \alpha_j = \arg\min_{\alpha:\, c\alpha \le h} \bigl[ E_j(\theta(\alpha), \alpha) + \mu_{\alpha,j}^{-1}\, d_q(\alpha, \alpha_{j-1}) \bigr] \qquad (9) $$
$$ \theta_j = \theta(\alpha_j). \qquad (10) $$

Expressions (9) and (10) represent a generalized update rule for adaptive fuzzy filtering algorithms that can be particularized for a choice of $L_j(\theta, \alpha)$ and $q$. For any choice of $L_j(\theta, \alpha)$ listed in (7), $E_j(\theta, \alpha)$ is convex in $\theta$ and, thus, could be minimized w.r.t. $\theta$ by setting its gradient equal to zero. This results in
$$ \mu_j^{-1} f(\theta) - \mu_j^{-1} f(\theta_{j-1}) - \phi\bigl(y(j) - G^T(x(j), \alpha)\,\theta\bigr)\, G(x(j), \alpha) = 0 \qquad (11) $$
where the function $\phi$ is given as
$$ \phi(e) = \begin{cases} \mathrm{sign}(e), & \text{for sign error} \\ e, & \text{for LMS} \\ e^3, & \text{for LMF} \\ ae + be^3, & \text{for LMMN.} \end{cases} \qquad (12) $$

The minimizing solution must satisfy (11). Thus,
$$ \theta = f^{-1}\Bigl( f(\theta_{j-1}) + \mu_j\, \phi\bigl(y(j) - G^T(x(j), \alpha)\,\theta\bigr)\, G(x(j), \alpha) \Bigr). \qquad (13) $$
For a given $\alpha$, (13) is implicit in $\theta$ and could be solved numerically. It follows from (13) that, for a sufficiently small value of $\mu_j$, it is reasonable to approximate the term $G^T(x(j), \alpha)\,\theta$ on the right-hand side of (13) with the term $G^T(x(j), \alpha)\,\theta_{j-1}$, as has been done in [9], to obtain an explicit update. Thus, an approximate but closed-form solution of (13) is given as
$$ \theta(\alpha) = f^{-1}\Bigl( f(\theta_{j-1}) + \mu_j\, \phi\bigl(y(j) - G^T(x(j), \alpha)\,\theta_{j-1}\bigr)\, G(x(j), \alpha) \Bigr). \qquad (14) $$
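One explicit consequent update of the form (14) can be sketched as follows. This is a pure-Python illustration under our own assumptions: a fixed vector `G` stands in for $G(x(j), \alpha)$, `phi_lmmn` is the LMMN choice from (12) with illustrative coefficients $a = b = 0.5$, and $f(0) = 0$ is taken by convention:

```python
import math

def q_norm(w, q):
    return sum(abs(wi) ** q for wi in w) ** (1.0 / q)

def link(w, p):
    """Mapping f of Eq. (3); by convention f(0) = 0."""
    q = p / (p - 1.0)
    nq = q_norm(w, q)
    if nq == 0.0:
        return [0.0] * len(w)
    return [math.copysign(abs(wi) ** (q - 1), wi) / nq ** (q - 2) for wi in w]

def link_inv(v, p):
    """Inverse mapping of Eq. (4); by convention f^{-1}(0) = 0."""
    npn = q_norm(v, p)
    if npn == 0.0:
        return [0.0] * len(v)
    return [math.copysign(abs(vi) ** (p - 1), vi) / npn ** (p - 2) for vi in v]

def phi_lmmn(e, a=0.5, b=0.5):
    """LMMN error nonlinearity of Eq. (12): phi(e) = a*e + b*e^3."""
    return a * e + b * e ** 3

def theta_step(theta_prev, G, y, mu, p, phi):
    """Explicit update (14): theta = f^{-1}( f(theta_prev) + mu * phi(e) * G )."""
    e = y - sum(gi * ti for gi, ti in zip(G, theta_prev))
    v = [fi + mu * phi(e) * gi for fi, gi in zip(link(theta_prev, p), G)]
    return link_inv(v, p)

theta0 = [0.0, 0.0, 0.0]
G, y = [1.0, 0.5, -0.2], 0.8
theta1 = theta_step(theta0, G, y, mu=0.1, p=3.0, phi=phi_lmmn)
pred0 = sum(g * t for g, t in zip(G, theta0))
pred1 = sum(g * t for g, t in zip(G, theta1))
print(abs(y - pred1) < abs(y - pred0))  # True: the a priori error shrinks
```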

Here, $\theta(\alpha)$ has been written to indicate the dependence of the solution on $\alpha$. Since $d_q(\alpha, \alpha_{j-1}) = (1/2)\|\alpha\|_q^2 - (1/2)\|\alpha_{j-1}\|_q^2 - (\alpha - \alpha_{j-1})^T f(\alpha_{j-1})$, (9) is equivalent to
$$ \alpha_j = \arg\min_{\alpha:\, c\alpha \le h} \Bigl[ E_j(\theta(\alpha), \alpha) + \frac{\mu_{\alpha,j}^{-1}}{2}\|\alpha\|_q^2 - \mu_{\alpha,j}^{-1}\, \alpha^T f(\alpha_{j-1}) \Bigr] \qquad (15) $$
as the remaining terms are independent of $\alpha$. There is no harm in adding an $\alpha$-independent term in (15):
$$ \alpha_j = \arg\min_{\alpha:\, c\alpha \le h} \Bigl[ E_j(\theta(\alpha), \alpha) + \frac{\mu_{\alpha,j}^{-1}}{2}\|\alpha\|_q^2 - \mu_{\alpha,j}^{-1}\, \alpha^T f(\alpha_{j-1}) + \frac{\mu_{\alpha,j}^{-1}}{2}\|f(\alpha_{j-1})\|^2 \Bigr]. \qquad (16) $$

For any $2 \le p < \infty$, we have $1 < q \le 2$ and, thus, $\|\cdot\|_q \ge \|\cdot\|$. This makes
$$ \frac{\mu_{\alpha,j}^{-1}}{2}\|\alpha\|_q^2 - \mu_{\alpha,j}^{-1}\, \alpha^T f(\alpha_{j-1}) + \frac{\mu_{\alpha,j}^{-1}}{2}\|f(\alpha_{j-1})\|^2 \qquad (17) $$
$$ \ge \frac{\mu_{\alpha,j}^{-1}}{2}\bigl[ \|\alpha\|^2 - 2\,\alpha^T f(\alpha_{j-1}) + \|f(\alpha_{j-1})\|^2 \bigr] = \frac{\mu_{\alpha,j}^{-1}}{2}\,\|\alpha - f(\alpha_{j-1})\|^2. \qquad (18) $$
Solving the constrained nonlinear optimization problem (16), as we will see, becomes relatively easy by slightly modifying (decreasing) the level of regularization being provided in the estimation of $\alpha_j$. The expression (17), i.e., the last three terms of the optimization problem (16), accounts for the regularization in the estimation of $\alpha_j$. For a given value of $\mu_{\alpha,j}$, a decrease in the level of regularization would occur via replacing the expression (17) in the optimization problem (16) by expression (18):
$$ \alpha_j = \arg\min_{\alpha:\, c\alpha \le h} \Bigl[ E_j(\theta(\alpha), \alpha) + \frac{\mu_{\alpha,j}^{-1}}{2}\,\|\alpha - f(\alpha_{j-1})\|^2 \Bigr]. \qquad (19) $$

From the viewpoint of an adaptive estimation of the vector $\alpha$, nothing goes against considering (19) instead of (16), since any desired level of regularization could still be achieved via adjusting, in (19), the value of the free parameter $\mu_{\alpha,j}$. The motivation for considering (19) is derived from the fact that it is possible to reformulate the estimation problem as a least-squares problem. To do so, define a vector
$$ r(\alpha) = \begin{bmatrix} \sqrt{E_j(\theta(\alpha), \alpha)} \\ (\mu_{\alpha,j}^{-1}/2)^{1/2}\, \bigl(\alpha - f(\alpha_{j-1})\bigr) \end{bmatrix} $$
where $E_j(\theta(\alpha), \alpha) \ge 0$ and $\mu_{\alpha,j} > 0$. Now, it is possible to rewrite (19) as
$$ \alpha_j = \arg\min_{\alpha:\, c\alpha \le h} \|r(\alpha)\|^2. \qquad (20) $$

To compute $\alpha_j$ recursively based on (20), we suggest the following Gauss–Newton-like algorithm:
$$ \alpha_j = \alpha_{j-1} + s(\alpha_{j-1}) \qquad (21) $$
$$ s(\alpha) = \arg\min_{s:\, cs \le h - c\alpha} \|r(\alpha) + \nabla r(\alpha)\, s\|^2 \qquad (22) $$
where $\nabla r(\alpha)$ is the Jacobian matrix of the vector $r$ w.r.t. $\alpha$, computed by the method of finite differences. Fortunately, $\nabla r(\alpha)$ is a full-rank matrix, since $\mu_{\alpha,j} > 0$. The constrained linear least-squares problem (22) can be solved by transforming it first to a least distance programming problem (see [30] for details). Finally, (10) in view of (14) becomes
$$ \theta_j = f^{-1}\Bigl( f(\theta_{j-1}) + \mu_j\, \phi\bigl(y(j) - G^T(x(j), \alpha_j)\,\theta_{j-1}\bigr)\, G(x(j), \alpha_j) \Bigr). \qquad (23) $$
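The Gauss–Newton iteration (21)–(22) can be sketched as below. For brevity, this pure-Python sketch solves the unconstrained step from the normal equations and ignores the interpretability constraint $cs \le h - c\alpha$ (which the text handles via least distance programming [30]); the residual `r` is a hypothetical two-parameter stand-in for $r(\alpha)$:

```python
def jacobian_fd(r, alpha, eps=1e-6):
    """Finite-difference Jacobian of the residual vector r(alpha), as in (22)."""
    r0 = r(alpha)
    J = [[0.0] * len(alpha) for _ in r0]
    for k in range(len(alpha)):
        bumped = list(alpha)
        bumped[k] += eps
        rk = r(bumped)
        for i in range(len(r0)):
            J[i][k] = (rk[i] - r0[i]) / eps
    return J

def gauss_newton_step(r, alpha):
    """Unconstrained version of (21)-(22): s = argmin ||r(alpha) + J s||^2,
    solved here from the 2x2 normal equations J^T J s = -J^T r."""
    r0, J = r(alpha), jacobian_fd(r, alpha)
    JtJ = [[sum(J[i][a] * J[i][b] for i in range(len(r0))) for b in range(2)]
           for a in range(2)]
    Jtr = [sum(J[i][a] * r0[i] for i in range(len(r0))) for a in range(2)]
    det = JtJ[0][0] * JtJ[1][1] - JtJ[0][1] * JtJ[1][0]
    s = [(-Jtr[0] * JtJ[1][1] + Jtr[1] * JtJ[0][1]) / det,
         (-Jtr[1] * JtJ[0][0] + Jtr[0] * JtJ[1][0]) / det]
    return [a + si for a, si in zip(alpha, s)]

# Toy residual with a mild nonlinearity; its sum of squares is minimized near (1, -2).
def r(alpha):
    return [alpha[0] - 1.0, alpha[1] + 2.0, 0.1 * alpha[0] * alpha[1]]

alpha = [0.0, 0.0]
for _ in range(20):
    alpha = gauss_newton_step(r, alpha)
print(sum(ri ** 2 for ri in r(alpha)) < sum(ri ** 2 for ri in r([0.0, 0.0])))
```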

    III. DETERMINISTIC ANALYSIS

We provide in this section a deterministic analysis of the adaptive fuzzy filtering algorithms (21)–(23) in terms of filtering performance. For this, consider a fuzzy model that fits given input–output data $\{x(j), y(j)\}_{j=0}^{k}$ according to
$$ y(j) = G^T(x(j), \alpha_j)\,\theta^* + v_j \qquad (24) $$
where $\theta^*$ is some true parameter vector (that is to be estimated), $\alpha_j$ is given by (21), and $v_j$ accommodates any disturbance due to measurement noise, modeling errors, the mismatch between $\alpha_j$ and the global minimum of (20), and so on. We are interested in the analysis of estimating $\theta^*$ using (23) in the presence of the disturbance signal $v_j$. That is, we take $\theta_j$ as an estimate of $\theta^*$ at the $j$th time instant and try to calculate an upper bound on the filtering errors. In a filtering setting, it is desired to estimate the quantity $G^T(x(j), \alpha_j)\,\theta^*$ using an adaptive model. The $\theta_{j-1}$ is the a priori estimate of $\theta^*$ at the $j$th time index and, thus, the a priori filtering error can be expressed as
$$ e_{f,j} = G^T(x(j), \alpha_j)\,\theta^* - G^T(x(j), \alpha_j)\,\theta_{j-1}. $$

One would normally expect $|G^T(x(j), \alpha_j)\theta^* - G^T(x(j), \alpha_j)\theta_{j-1}|^2$ to be the performance measure of an algorithm. However, the squared error as a performance measure does not seem to be suitable for a uniform analysis of all the algorithms. We introduce a generalized performance measure $P_\phi(a, b)$, defined for scalars $a$ and $b$ as
$$ P_\phi(a, b) = \int_a^b \bigl(\phi(r) - \phi(a)\bigr)\, dr \qquad (25) $$

Fig. 1. The value $P_\phi(a, b)$ is equal to the area of the shaded region.

where $\phi$ is a continuous, strictly increasing function with $\phi(0) = 0$. It can be easily seen that for $\phi(e) = e$, we have the normal squared error, i.e., $P_\phi(a, b) = (b - a)^2/2$. A different but integral-based loss function, called the matching loss for a continuous, increasing transfer function $\sigma$, was considered in [31] and [32] for a single neuron model. The matching loss for $\sigma$ was defined in [32] as
$$ M_\sigma(\hat y, y) = \int_{\sigma^{-1}(y)}^{\sigma^{-1}(\hat y)} \bigl(\sigma(r) - y\bigr)\, dr. $$
If we let $\Phi(r) = \int \phi(r)\, dr$, then
$$ P_\phi(a, b) = \Phi(b) - \Phi(a) - (b - a)\,\phi(a). \qquad (26) $$
Note that in definition (25), the continuous function $\phi$ is not an arbitrary function; its integral $\Phi(r) = \int \phi(r)\, dr$ must be a strictly convex function. The strictly increasing nature of $\phi(r)$ [i.e., the strict convexity of $\Phi(r)$] enables us to assess the mismatch between $a$ and $b$ using $P_\phi(a, b)$. Fig. 1 illustrates the physical meaning of $P_\phi(a, b)$: the value $P_\phi(a, b)$ is equal to the area of the shaded region in the figure. One could infer from Fig. 1 that a mismatch between $a$ and $b$ could be assessed via calculating the area of the shaded region [i.e., $P_\phi(a, b)$], provided the given function $\phi$ is strictly increasing. Usually, $P_\phi$ is not symmetric, i.e., $P_\phi(a, b) \ne P_\phi(b, a)$. One such type of performance measure [i.e., the matching loss $M_\sigma$] was considered previously in [22].
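Once $\Phi$ is known in closed form, the measure (25)–(26) is a one-liner to evaluate. A pure-Python sketch for two choices of $\phi$ ($\phi(e) = e$ with $\Phi(e) = e^2/2$, and $\phi(e) = \tanh(e)$ with $\Phi(e) = \ln\cosh e$, the choice mentioned later in inference 5 of Theorem 1):

```python
import math

def P(phi_name, a, b):
    """Performance measure (26): P(a, b) = Phi(b) - Phi(a) - (b - a) * phi(a)."""
    if phi_name == "identity":
        phi, Phi = (lambda e: e), (lambda e: 0.5 * e * e)
    elif phi_name == "tanh":
        phi, Phi = math.tanh, (lambda e: math.log(math.cosh(e)))
    else:
        raise ValueError(phi_name)
    return Phi(b) - Phi(a) - (b - a) * phi(a)

a, b = 0.3, 1.1
print(abs(P("identity", a, b) - 0.5 * (b - a) ** 2) < 1e-12)  # squared-error special case
print(P("tanh", a, b) > 0.0 and P("tanh", b, a) > 0.0)        # nonnegative for a != b
print(abs(P("tanh", a, b) - P("tanh", b, a)) > 0.0)           # but generally asymmetric
```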

In our analysis, we assess the instantaneous filtering error [i.e., the mismatch between $G^T(x(j), \alpha_j)\theta_{j-1}$ and $G^T(x(j), \alpha_j)\theta^*$] by calculating $P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr)$. For the case $\phi(e) = e$, we have $P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) = |e_{f,j}|^2/2$. The filtering performance of an algorithm, which is run from $j = 0$ to $j = k$, can be evaluated by calculating the sum
$$ \sum_{j=0}^{k} P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr). $$
Similarly, the magnitudes of the disturbances $v_j = y(j) - G^T(x(j), \alpha_j)\theta^*$, $j = 0, \ldots, k$, will be assessed by calculating


the mismatch between $y(j)$ and $G^T(x(j), \alpha_j)\theta^*$ as follows:
$$ \sum_{j=0}^{k} P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta^*\bigr). $$
Finally, the robustness of an algorithm (i.e., the sensitivity of the filtering errors toward disturbances) can be assessed by calculating an upper bound on a ratio such as (29). The term $d_q(\theta^*, \theta_{-1})$ in the denominator of (29) assesses the disturbance due to a mismatch between the initial guess $\theta_{-1}$ and the true vector $\theta^*$.

Lemma 1: Let $m$ be a scalar and $G_j \in \mathbb{R}^K$ such that $\theta_j = f^{-1}(f(\theta_{j-1}) + m\, G_j)$; then
$$ d_q(\theta_{j-1}, \theta_j) \le \frac{m^2}{2}\,(p - 1)\,\|G_j\|_p^2. $$

Proof: See [10, Lemma 4].

Lemma 2: If $\theta_j$ and $\theta_{j-1}$ are related via (23), then
$$ d_q(\theta^*, \theta_{j-1}) - d_q(\theta^*, \theta_j) + d_q(\theta_{j-1}, \theta_j) = \mu_j\, \phi\bigl(y(j) - G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\, (\theta^* - \theta_{j-1})^T G(x(j), \alpha_j). $$
Proof: The proof follows simply by using the definitions of $d_q(\theta^*, \theta_{j-1})$, $d_q(\theta^*, \theta_j)$, and $d_q(\theta_{j-1}, \theta_j)$.

In view of Lemma 1 and (23), we have
$$ d_q(\theta_{j-1}, \theta_j) \le \frac{\mu_j^2\, \bigl|\phi\bigl(y(j) - G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\bigr|^2}{2}\,(p - 1)\,\|G(x(j), \alpha_j)\|_p^2. \qquad (27) $$

Theorem 1: The estimation algorithm (21)–(23), with $\phi$ being a continuous strictly increasing function and $\phi(0) = 0$, for any $2 \le p < \infty$, with a learning rate
$$ 0 < \mu_j \le \frac{2\, P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta_{j-1}\bigr)}{\mathrm{den}} \qquad (28) $$
where
$$ \mathrm{den} = \phi\bigl(y(j) - G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\,(p - 1)\,\bigl[\phi(y(j)) - \phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\bigr]\,\|G(x(j), \alpha_j)\|_p^2 $$
achieves an upper bound on the filtering errors such that
$$ \frac{\sum_{j=0}^{k} a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr)}{\sum_{j=0}^{k} a_j\, P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta^*\bigr) + d_q(\theta^*, \theta_{-1})} \le 1 \qquad (29) $$
where $q$ is dual to $p$, and $a_j > 0$ is given as
$$ a_j = \frac{\mu_j\, \phi\bigl(y(j) - G^T(x(j), \alpha_j)\theta_{j-1}\bigr)}{\phi(y(j)) - \phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1}\bigr)}. \qquad (30) $$
Here, we assume that $y(j) \ne G^T(x(j), \alpha_j)\theta_{j-1}$, since there is no update of the parameters (i.e., $\theta_j = \theta_{j-1}$) if $y(j) = G^T(x(j), \alpha_j)\theta_{j-1}$.

Proof: Define $\Delta_j = d_q(\theta^*, \theta_{j-1}) - d_q(\theta^*, \theta_j)$; using Lemma 2, we have
$$ \Delta_j = \mu_j\, \phi\bigl(y(j) - G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\, (\theta^* - \theta_{j-1})^T G(x(j), \alpha_j) - d_q(\theta_{j-1}, \theta_j). \qquad (31) $$
Using (27), we have
$$ \Delta_j \ge \mu_j\, \phi\bigl(y(j) - G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\, (\theta^* - \theta_{j-1})^T G(x(j), \alpha_j) - \frac{p - 1}{2}\,\mu_j^2\, \bigl|\phi\bigl(y(j) - G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\bigr|^2\, \|G(x(j), \alpha_j)\|_p^2. $$
That is,
$$ \Delta_j \ge a_j\,\bigl[\phi(y(j)) - \phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\bigr]\,(\theta^* - \theta_{j-1})^T G(x(j), \alpha_j) - \frac{p - 1}{2}\,\mu_j\, \phi\bigl(y(j) - G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\, a_j\,\bigl[\phi(y(j)) - \phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\bigr]\,\|G(x(j), \alpha_j)\|_p^2. $$
Since $\mu_j$ satisfies (28), the previous inequality reduces to
$$ \Delta_j \ge a_j\,\bigl[\phi(y(j)) - \phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\bigr]\,(\theta^* - \theta_{j-1})^T G(x(j), \alpha_j) - a_j\, P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta_{j-1}\bigr). \qquad (32) $$
It can be verified using definition (26) that
$$ \bigl[\phi(y(j)) - \phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\bigr]\,(\theta^* - \theta_{j-1})^T G(x(j), \alpha_j) = P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) - P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta^*\bigr) + P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta_{j-1}\bigr) $$
and thus, inequality (32) is further reduced to
$$ \Delta_j \ge a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) - a_j\, P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta^*\bigr). $$
In addition,
$$ \sum_{j=0}^{k} \Delta_j = d_q(\theta^*, \theta_{-1}) - d_q(\theta^*, \theta_k) \le d_q(\theta^*, \theta_{-1}), \quad \text{since } d_q(\theta^*, \theta_k) \ge 0 $$
resulting in
$$ d_q(\theta^*, \theta_{-1}) \ge \sum_{j=0}^{k} a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) - \sum_{j=0}^{k} a_j\, P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta^*\bigr) $$
from which the inequality (29) follows.

The following inferences could be immediately made from Theorem 1.

1) For the special case $\phi(e) = e$ and taking $\theta_{-1} = 0$, the results of Theorem 1 are modified as follows:
$$ \sum_{j=0}^{k} \mu_j\, \bigl|G^T(x(j), \alpha_j)\theta_{j-1} - G^T(x(j), \alpha_j)\theta^*\bigr|^2 \le \sum_{j=0}^{k} \mu_j\, \bigl|y(j) - G^T(x(j), \alpha_j)\theta^*\bigr|^2 + \|\theta^*\|_q^2 $$
where
$$ \mu_j \le \frac{1}{(p - 1)\,\|G(x(j), \alpha_j)\|_p^2}. $$


Choosing
$$ \mu_j = \frac{1}{(p - 1)\, U_p^2}, \quad \text{where } U_p \ge \|G(x(j), \alpha_j)\|_p $$
we get
$$ \sum_{j=0}^{k} \bigl|G^T(x(j), \alpha_j)\theta_{j-1} - G^T(x(j), \alpha_j)\theta^*\bigr|^2 \le \sum_{j=0}^{k} \bigl|y(j) - G^T(x(j), \alpha_j)\theta^*\bigr|^2 + (p - 1)\, U_p^2\, \|\theta^*\|_q^2 $$
which is formally equivalent to [9, Th. 2].

2) Inequality (29) illustrates the robustness property in the sense that if, for the given positive values $\{a_j\}_{j=0}^{k}$, the disturbances $\{v_j\}_{j=0}^{k}$ are small, i.e., $\sum_{j=0}^{k} P_\phi\bigl(y(j), G^T(x(j), \alpha_j)\theta^*\bigr)$ is small, then the filtering errors
$$ \sum_{j=0}^{k} P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) $$
remain small. The positive value $a_j$ represents a weight given to the $j$th data pair in the summation.

3) If we define an upper bound on the disturbance signal as
$$ v_{\max} = \max_j P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta^*\bigr) $$
then it follows from (29) that
$$ \sum_{j=0}^{k} a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) \le (k + 1)\, a_{\max}\, v_{\max} + d_q(\theta^*, \theta_{-1}) $$
where $a_{\max} = \max_j a_j$. That is,
$$ \frac{1}{k + 1} \sum_{j=0}^{k} a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) \le a_{\max}\, v_{\max} + \frac{d_q(\theta^*, \theta_{-1})}{k + 1}. \qquad (33) $$
As $d_q(\theta^*, \theta_{-1})$ is finite, we have
$$ \frac{1}{k + 1} \sum_{j=0}^{k} a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) \le a_{\max}\, v_{\max}, \quad \text{as } k \to \infty. $$

Inequality (33) shows the stability of the algorithm against the disturbance $v_j$ in the sense that if the disturbance signal $P_\phi\bigl(y(j), G^T(x(j), \alpha_j)\theta^*\bigr)$ is bounded (i.e., $v_{\max}$ is finite), then the average value of the filtering errors, assessed as
$$ \frac{1}{k + 1} \sum_{j=0}^{k} a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) $$
remains bounded.

4) In the ideal case of zero disturbances (i.e., $v_j = 0$), we have $P_\phi\bigl(y(j), G^T(x(j), \alpha_j)\theta^*\bigr) = 0$, and
$$ \sum_{j=0}^{k} a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) \le d_q(\theta^*, \theta_{-1}). $$
Since $d_q(\theta^*, \theta_{-1})$ is finite and $a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1}, G^T(x(j), \alpha_j)\theta^*\bigr) \ge 0$, there must exist a sufficiently large index $T$ such that
$$ a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) = 0, \qquad j \ge T. $$
In other words,
$$ G^T(x(j), \alpha_j)\theta_{j-1} = G^T(x(j), \alpha_j)\theta^*, \qquad j \ge T $$
since $a_j > 0$. This shows the convergence of the algorithm at time index $T$ toward the true parameters.

5) Theorem 1 cannot be applied for the case of $\phi(e) = \mathrm{sign}(e)$, since in this case $\phi$ is not a continuous strictly increasing function. However, one could instead choose $\phi(e) = \tanh(e)$, which has a shape similar to that of the sign function, and Theorem 1 still remains applicable. Note that in this case, the loss term in (8) will be
$$ L_j(\theta, \alpha) = \ln\Bigl(\cosh\bigl(y(j) - G^T(x(j), \alpha)\,\theta\bigr)\Bigr). $$

Theorem 1 is an important result due to the generality of the function $\phi$ and, thus, offers a possibility of studying, in a unified framework, different fuzzy adaptive algorithms corresponding to different choices of the continuous strictly increasing function $\phi$. Table II provides a few examples of $\phi$ (plotted in Fig. 2), leading to the different $p$-norm algorithms listed in the table as $A_{1,p}$, $A_{2,p}$, and so on.

    IV. SIMULATION STUDIES

This section provides simulation studies to illustrate the following.

1) The adaptive fuzzy filtering algorithms $A_{1,p}, A_{2,p}, \ldots$, $p \ge 2$, perform better than the commonly used gradient-descent algorithm.

2) For the estimation of linear parameters (keeping membership functions fixed), LMS is the standard algorithm known for its simplicity and robustness. This study provides a family of algorithms characterized by $\phi$ and $p$. It will be shown that the $p$-norm generalization of the algorithms corresponding to the different choices of $\phi$ may achieve better performance than the standard LMS algorithm.

3) We will show the robustness and convergence of the filtering algorithms.

4) For a given value of $p$, the algorithms corresponding to the different choices of $\phi$ may prove to be better than the standard choice $\phi(e) = e$ (i.e., the squared error loss term).

    A. First Example

Consider the problem of filtering noise from a chaotic time series. The time series is generated by simulating the


TABLE II: A FEW EXAMPLES OF ADAPTIVE FUZZY FILTERING ALGORITHMS

Fig. 2. Different examples of the function $\phi$.

Mackey–Glass differential delay equation
$$ \frac{dx}{dt} = \frac{0.2\, x(t - 17)}{1 + x^{10}(t - 17)} - 0.1\, x(t), \qquad y(t) = x(t) + n(t) $$
where $x(0) = 1.2$, $x(t) = 0$ for $t < 0$, and $n(t)$ is a random noise chosen from a uniform distribution on the interval $[-0.2, 0.2]$. The aim is to filter the noise $n(t)$ from $y(t)$ to estimate $x(t)$ by using a set of past values, i.e., $[x(t - 24), x(t - 18), x(t - 12), x(t - 6)]$.
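A noisy series of this kind can be reproduced with a standard fourth-order Runge–Kutta sketch. The step size `H = 0.1` and the freezing of the delayed term over one step are our own simplifying assumptions (the paper does not state its integration details):

```python
import random

TAU, H = 17.0, 0.1              # delay, and integration step (our assumption)
STEPS = int(round(700 / H))
LAG = int(round(TAU / H))

def mg_rhs(x_t, x_lag):
    """Right-hand side of the Mackey-Glass equation."""
    return 0.2 * x_lag / (1.0 + x_lag ** 10) - 0.1 * x_t

x = [1.2]                       # x(0) = 1.2, and x(t) = 0 for t < 0
for n in range(STEPS):
    lag = x[n - LAG] if n >= LAG else 0.0
    # Classical RK4; the delayed term is held fixed over the step.
    k1 = mg_rhs(x[n], lag)
    k2 = mg_rhs(x[n] + 0.5 * H * k1, lag)
    k3 = mg_rhs(x[n] + 0.5 * H * k2, lag)
    k4 = mg_rhs(x[n] + H * k3, lag)
    x.append(x[n] + (H / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4))

def x_at(t):
    """Sample the simulated trajectory at integer time t."""
    return x[int(round(t / H))]

random.seed(0)
y = {t: x_at(t) + random.uniform(-0.2, 0.2) for t in range(124, 624)}
# Regressor for time t: [x(t-24), x(t-18), x(t-12), x(t-6)], built from the samples.
```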

Assume that there exists an ideal fuzzy model characterized by $(\alpha^*, \theta^*)$ that models the relationship between the input vector $[x(t - 24)\; x(t - 18)\; x(t - 12)\; x(t - 6)]^T$ and the output $x(t)$. That is,
$$ x(t) = G^T\bigl([x(t - 24)\; x(t - 18)\; x(t - 12)\; x(t - 6)]^T, \alpha^*\bigr)\,\theta^* $$
$$ y(t) = G^T\bigl([x(t - 24)\; x(t - 18)\; x(t - 12)\; x(t - 6)]^T, \alpha^*\bigr)\,\theta^* + n(t). $$
The fourth-order Runge–Kutta method was used for the simulation of the time series, and 500 input–output data pairs, from $t = 124$ to $t = 623$, were extracted for an adaptive estimation of the parameters $(\alpha^*, \theta^*)$ using different algorithms. If $(\alpha_{t-1}, \theta_{t-1})$ denotes the a priori estimate at time $t$, then the filtering error of an algorithm is defined as
$$ e_{f,t} = x(t) - \hat y(t) $$
where
$$ \hat y(t) = G^T\bigl([x(t - 24)\; x(t - 18)\; x(t - 12)\; x(t - 6)]^T, \alpha_{t-1}\bigr)\,\theta_{t-1}. $$
We consider a total of 30 different algorithms, i.e., $A_{1,p}, \ldots, A_{5,p}$ for $p = 2, 2.2, 2.4, 2.6, 2.8, 3$, running from $t = 124$ to $t = 623$, for the prediction of the desired value $x(t)$. We choose, for example's sake, the trapezoidal type of membership functions [defined by (36)] such that the number of membership functions assigned to each of the four inputs [i.e., $x(t - 24)$, $x(t - 18)$, $x(t - 12)$, $x(t - 6)$] is equal to 3. The initial guess for the model parameters was taken as
$$ \alpha_{-1} = \begin{bmatrix} [0\;\, 0.3\;\, 0.6\;\, 0.9\;\, 1.2\;\, 1.5]^T \\ [0\;\, 0.3\;\, 0.6\;\, 0.9\;\, 1.2\;\, 1.5]^T \\ [0\;\, 0.3\;\, 0.6\;\, 0.9\;\, 1.2\;\, 1.5]^T \\ [0\;\, 0.3\;\, 0.6\;\, 0.9\;\, 1.2\;\, 1.5]^T \end{bmatrix}, \qquad \theta_{-1} = [0]_{81 \times 1}. $$
The matrix $c$ and the vector $h$ are chosen in such a way that two consecutive knots must remain separated by a distance of at least 0.001 during the recursions of the algorithms.

The gradient-descent algorithm (1), if intuitively applied to the fuzzy model (5), needs the following considerations.

1) The antecedent parameters of the fuzzy model should be estimated with the learning rate $\mu_\alpha$, while the consequent


parameters with the learning rate $\mu_\theta$. That is,
$$ \begin{bmatrix} \alpha_j \\ \theta_j \end{bmatrix} = \begin{bmatrix} \alpha_{j-1} \\ \theta_{j-1} \end{bmatrix} - \begin{bmatrix} \mu_\alpha I & 0 \\ 0 & \mu_\theta I \end{bmatrix} \left.\frac{\partial E_r(\alpha, \theta, j)}{\partial (\alpha, \theta)}\right|_{(\alpha_{j-1},\, \theta_{j-1})}, \qquad E_r(\alpha, \theta, j) = \frac{1}{2}\bigl[y(j) - G^T(x(j), \alpha)\,\theta\bigr]^2. $$
Introducing the notation $w_j = [\alpha_j^T\;\, \theta_j^T]^T$, the gradient-descent update takes the form
$$ w_j = w_{j-1} - \begin{bmatrix} \mu_\alpha I & 0 \\ 0 & \mu_\theta I \end{bmatrix} \left.\frac{\partial E_r(w, j)}{\partial w}\right|_{w_{j-1}}. $$
Here, $\mu_\alpha$ may or may not be equal to $\mu_\theta$. In general, $\mu_\alpha$ should be smaller than $\mu_\theta$ to avoid oscillations of the estimated parameters.

2) During the gradient-descent estimation of the membership function parameters, in the presence of disturbances, the knots (elements of the vector $\alpha$) may attempt to come close to (or even cross) one another. That is, inequalities (38) and (39) do not hold good, thus resulting in a loss of interpretability and of estimation performance. For a better performance of gradient descent, the knots must be prevented from crossing one another by modifying the estimation scheme as
$$ w_j = \begin{cases} w_{j-1} - \begin{bmatrix} \mu_\alpha I & 0 \\ 0 & \mu_\theta I \end{bmatrix} \left.\dfrac{\partial E_r(w, j)}{\partial w}\right|_{w_{j-1}}, & \text{if } c\,\alpha_j \le h \\[6pt] w_{j-1}, & \text{otherwise.} \end{cases} \qquad (34) $$
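The guard in (34) is simple to implement: take the joint gradient step, and keep it only if the updated knots still satisfy the admissibility check. The following is a generic pure-Python sketch under our own assumptions: a single sorted knot vector, a `knots_ok` predicate standing in for $c\,\alpha_j \le h$, and the 0.001 minimum separation mentioned in the text:

```python
def knots_ok(alpha, min_gap=0.001):
    """Interpretability check standing in for c*alpha <= h:
    consecutive knots must stay separated by at least min_gap."""
    return all(b - a >= min_gap for a, b in zip(alpha, alpha[1:]))

def guarded_gd_step(alpha, theta, grad_alpha, grad_theta, mu_a, mu_t):
    """Update (34): apply the gradient step only if the new knot vector is admissible."""
    new_alpha = [a - mu_a * g for a, g in zip(alpha, grad_alpha)]
    new_theta = [t - mu_t * g for t, g in zip(theta, grad_theta)]
    if knots_ok(new_alpha):
        return new_alpha, new_theta
    return alpha, theta          # otherwise keep the previous parameters

# A step that would make knots cross is rejected; a gentle one is accepted.
alpha, theta = [0.0, 0.3, 0.6], [0.1, 0.2]
a1, _ = guarded_gd_step(alpha, theta, [0.0, 0.4, 0.0], [0.0, 0.0], 1.0, 1.0)
a2, _ = guarded_gd_step(alpha, theta, [0.0, 0.01, 0.0], [0.0, 0.0], 1.0, 1.0)
print(a1 == alpha, a2 != alpha)  # True True
```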

Each of the aforementioned algorithms and the gradient-descent algorithm (34) is run at $\mu_\alpha = \mu_\theta = 0.9$. The filtering performance of each algorithm is assessed via computing the energy of the filtering error signal $e_{f,t}$. The energy of a signal is equal to its squared $L_2$-norm; the energy of the filtering error signal $e_{f,t}$, from $t = 124$ to $t = 623$, is defined as
$$ \sum_{t=124}^{623} |e_{f,t}|^2. $$
A higher energy of the filtering errors means higher magnitudes of the filtering errors and, thus, a poorer performance of the filtering algorithm.

    Fig. 3 compares the different algorithms by plotting their filtering-error energies at different values of $p$. As seen from Fig. 3, all the algorithms $A_{1,p}, \ldots, A_{5,p}$ for $p = 2, 2.2, 2.4, 2.6, 2.8, 3$ perform better than gradient descent, since the gradient-descent method is associated with a higher energy of the filtering errors. As an illustration, Fig. 4 shows the time plot of the absolute filtering error $|e_{f,t}|$ for gradient descent and the algorithm $A_{4,p}$ at $p = 2.4$.

    Fig. 3. Filtering performance of different adaptive algorithms.

    Fig. 4. Time plot of the absolute filtering error $|e_{f,t}|$ from $t = 124$ to $t = 623$ for gradient descent and the algorithm $A_{4,2.4}$.

    B. Second Example

    Now, we consider a linear fuzzy model (membership functions being fixed)

    $$y(j) = \left[\mu_{A_{11}}(x(j)) \;\; \mu_{A_{21}}(x(j)) \;\; \mu_{A_{31}}(x(j)) \;\; \mu_{A_{41}}(x(j))\right] \alpha^* + v_j$$

    where $\alpha^* = [\,0.25 \;\; 0.5 \;\; 1 \;\; 0.3\,]^T$, $x(j)$ takes random values from a uniform distribution on $[-1, 1]$, $v_j$ is a random noise chosen from a uniform distribution on the interval $[-0.2, 0.2]$, and the membership functions are defined by (37) taking $\tau = (-1, -0.3, 0.3, 1)$.

    Algorithm (23) was employed to estimate $\alpha^*$ for different choices of $\phi$ (listed in Table II) at $p = 3$ (as an example of $p > 2$). The initial guess is taken as $\alpha_{-1} = 0$. For comparison, the LMS algorithm is also simulated. Note that the LMS algorithm is just a particularization of (23) for $p = 2$ and $\phi(e) = e$. That is,


    Fig. 5. Comparison of the algorithms ($A_{1,3}, A_{2,3}, A_{3,3}, A_{4,3}, A_{5,3}$) with the LMS.

    $A_{2,2}$ in Table II, in the context of linear estimation, is the LMS algorithm. The performance of an algorithm was evaluated by calculating the instantaneous a priori error in the estimation of $\alpha^*$ as $\|\alpha^* - \alpha_{j-1}\|^2$. The learning rate $\eta_j$ for each algorithm, including LMS, is chosen according to (28) as follows:

    $$\eta_j = \frac{2P\left(y(j), G_j^T \alpha_{j-1}\right)\left(y(j) - G_j^T \alpha_{j-1}\right)}{\left[\phi(y(j)) - \phi(G_j^T \alpha_{j-1})\right](p-1)\,\|G_j\|_p^2}$$

    where

    $$G_j = \left[\mu_{A_{11}}(x(j)) \;\; \mu_{A_{21}}(x(j)) \;\; \mu_{A_{31}}(x(j)) \;\; \mu_{A_{41}}(x(j))\right]^T.$$

    For an assessment of the expected error values, i.e., $E[\|\alpha^* - \alpha_{j-1}\|^2]$, 500 independent experiments have been performed. The 500 independent time plots of the values $\|\alpha^* - \alpha_{j-1}\|^2$ have been averaged to obtain the time plot of $E[\|\alpha^* - \alpha_{j-1}\|^2]$, shown in Fig. 5. The better performance of the algorithms ($A_{1,3}, A_{2,3}, A_{3,3}, A_{4,3}, A_{5,3}$) than the LMS can be seen in Fig. 5.

    This example indicates that the $p$-norm generalization of the algorithms makes sense, since the $A_{2,p}$ algorithm for $p = 3$ (a value of $p > 2$) proved better than the $p = 2$ case (i.e., LMS).
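    For reference, the plain p-norm LMS of [9] (i.e., the $\phi(e) = e$ special case that the algorithms above generalize) can be sketched as follows; the helper `link` is the gradient map of $\frac{1}{2}\|w\|_r^2$, and the function names are ours:

    ```python
    import numpy as np

    def link(w, r):
        """Gradient of (1/2)||w||_r^2; maps between primal and dual weights."""
        nr = np.linalg.norm(w, ord=r)
        if nr == 0.0:
            return np.zeros_like(w)
        return np.sign(w) * np.abs(w) ** (r - 1) / nr ** (r - 2)

    def pnorm_lms_step(w, G, y, eta, p):
        """One p-norm LMS update: step in the dual space given by the
        conjugate exponent q, then map back (link(., p) inverts link(., q))."""
        q = p / (p - 1.0)        # conjugate exponent: 1/p + 1/q = 1
        e = y - G @ w            # a priori error (phi(e) = e case)
        return link(link(w, q) + eta * e * G, p)
    ```

    For $p = 2$ the two link maps are identities and the update reduces to the standard LMS step $w + \eta\, e\, G$.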

    C. Robustness and Convergence

    To study the robustness properties of the algorithms, we investigate the sensitivity of the filtering errors toward disturbances. To make this more precise, we plot the curve between the disturbance energy and the filtering-error energy and analyze the curve. In the aforementioned example (i.e., the second example), the energy of the filtering errors will be defined as

    $$E_f(j) = \sum_{i=0}^{j} \left|G_i^T \alpha^* - G_i^T \alpha_{i-1}\right|^2$$

    Fig. 6. Filtering-error energy as a function of the energy of disturbances. The curves are averaged over 100 independent experiments.

    and the total energy of disturbances as

    $$E_d(j) = \sum_{i=0}^{j} |v_i|^2 + \|\alpha^* - \alpha_{-1}\|^2$$

    where the term $\|\alpha^* - \alpha_{-1}\|^2$ accounts for the disturbance due to a mismatch between the initial guess and the true parameters. Fig. 6 shows the plot between the values $\{E_d(j)\}_{j=0}^{1000}$ and $\{E_f(j)\}_{j=0}^{1000}$ for each of the algorithms ($A_{1,3}, A_{2,3}, A_{3,3}, A_{4,3}, A_{5,3}$). The curves for ($A_{2,3}, A_{4,3}, A_{5,3}$) are not distinguishable. The curves in Fig. 6 have been averaged over 100 independent experiments. The initial guess $\alpha_{-1}$ is chosen in each experiment randomly from a uniform distribution on $[-1, 1]$. The curves in Fig. 6 show the robustness of the algorithms in the sense that a small energy of disturbances does not lead to a large energy of filtering errors. Thus, if the disturbances are bounded, then the filtering errors also remain bounded [i.e., bounded-input bounded-output (BIBO) stability].

    To verify the convergence properties of the algorithms, the plots of Fig. 5 are redrawn by running the algorithms ($A_{1,3}, A_{2,3}, A_{3,3}, A_{4,3}, A_{5,3}$) in an ideal case of zero disturbances (i.e., $v_j = 0$). Fig. 7 shows, in this case, the convergence of the algorithms toward the true parameters.

    D. Third Example

    Our third example has been taken from [9], where, at time $t$, a signal $y_t$ is transmitted over a noisy channel. The recipient is required to estimate the sent signal $y_t$ out of the actually observed signal

    $$r_t = \sum_{i=0}^{k-1} u_{i+1}\, y_{t-i} + v_t$$

    where $v_t$ is zero-mean Gaussian noise with a signal-to-noise ratio of 10 dB. The signal is estimated using an adaptive filter

    $$\hat{y}_t = G_t^T w_{t-1}, \qquad G_t = [\,r_{t-m} \;\cdots\; r_t \;\cdots\; r_{t+m}\,] \in \mathbb{R}^{2m+1}$$


    Fig. 7. Convergence of the algorithms in the ideal case toward the true parameters.

    where $w_t \in \mathbb{R}^{2m+1}$ are the estimated filter parameters at time $t$. We take the values $k = 10$ and $m = 15$ as in [9]. The transmitted signal $y_t$ is chosen to be zero-mean Gaussian with unit variance; however, $y_t$ was a binary signal in [9] (which is obviously not quite the same as our framework). The vector $u \in \mathbb{R}^k$, describing the channel, was chosen in [9] in two different manners.

    1) In the first case, $u$ is chosen from a Gaussian distribution with unit variance and then normalized to make $\|u\| = 1$.

    2) In the second case, $u_i = s_i e^{z_i}$, where $s_i \in \{-1, 1\}$ and $z_i \in [-10, 10]$ are distributed uniformly, and $u$ is then normalized to make $\|u\| = 1$.
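    The two channel constructions above can be reproduced as follows (a sketch with our own naming; `sparse=True` gives the second, nearly sparse case):

    ```python
    import numpy as np

    def make_channel(k, sparse, rng):
        """Channel vector u of length k, normalized to unit Euclidean norm."""
        if not sparse:
            u = rng.standard_normal(k)               # dense Gaussian taps
        else:
            s = rng.choice([-1.0, 1.0], size=k)      # random signs s_i
            z = rng.uniform(-10.0, 10.0, size=k)     # uniform exponents z_i
            u = s * np.exp(z)                        # u_i = s_i * e^{z_i}
        return u / np.linalg.norm(u)
    ```

    Because the exponents $z_i$ range over $[-10, 10]$, a few taps dominate after normalization, which is what makes the second channel effectively sparse.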

    Algorithm (23) was used to estimate the filter parameters taking different choices of $\phi$ at $p = 2.5$ in the first case. However, the second case (with $u$ being sparse), as discussed in [9], favors a fairly large value of $p$ [i.e., $p = 2\ln(2m+1)$]. The learning rate is chosen to be equal to (28) as in the previous example. The instantaneous filtering error is defined as

    $$e_{f,t} = y_t - G_t^T w_{t-1}.$$

    The performance of the different filtering algorithms is assessed by calculating the root mean square of the filtering errors:

    $$\mathrm{RMSFE}(t) = \left(\frac{1}{t+1} \sum_{i=0}^{t} |e_{f,i}|^2\right)^{1/2}.$$
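    The running RMSFE can be computed in vectorized form; this is a small utility sketch, not code from the paper:

    ```python
    import numpy as np

    def rmsfe(errors):
        """RMSFE(t) = sqrt( (1/(t+1)) * sum_{i<=t} |e_{f,i}|^2 ) for all t."""
        e = np.asarray(errors, dtype=float)
        cum = np.cumsum(e ** 2)          # running sum of squared errors
        t = np.arange(1, e.size + 1)     # t + 1 = 1, 2, ..., len(e)
        return np.sqrt(cum / t)
    ```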

    The time plots of the root mean square of filtering errors, averaged over 100 independent experiments, are shown in Fig. 8.

    Fig. 8(a) shows the faster convergence of algorithm $A_{1,p}$ than $A_{2,p}$, while Fig. 8(b) shows the faster convergence of $A_{3,p}$ than $A_{2,p}$. This indicates that the algorithms corresponding to the different choices of $\phi$ (i.e., $A_{1,p}$, $A_{3,p}$, etc.) may prove better in some sense than the standard choice $\phi(e) = e$ (i.e., algorithm $A_{2,p}$). Hence, the proposed framework, which offers the possibility of developing filtering algorithms corresponding to the different choices of $\phi$, is a useful tool.

    The provided simulation studies clearly indicate the potential of our approach in adaptive filtering. The first example

    shows the better filtering performance of our approach than the most commonly used gradient-descent algorithm for estimating the parameters of nonlinear neural/fuzzy models. Some hybrid methods, e.g., clustering for the membership functions and the RLS algorithm for the consequents, have been suggested in the literature for an online identification of the fuzzy models. It is a well-known fact that RLS optimizes the average (expected) performance under some statistical assumptions, while LMS optimizes the worst-case performance. Since our algorithms are the generalized versions of LMS and possess LMS-like robustness properties (as indicated in Theorem 1), we do not compare the RLS algorithm with our algorithms. Moreover, for a fair comparison, RLS must be generalized as we generalized LMS in our analysis. More will be said on the generalization of the RLS algorithm in Section V.

    V. SOME REMARKS

    This text outlines an approach to adaptive fuzzy filtering in a

    broad sense, and thus, several related studies could be made. In

    particular, we would like to mention the following.

    1) We studied the adaptive filtering problem using a zero-

    order Takagi–Sugeno fuzzy model due to its simplicity.

    However, the approach can be applied to any semilinear

    model with linear inequality constraints, e.g., first-order

    Takagi–Sugeno fuzzy models, radial basis function (RBF)

    neural networks, B-spline models, etc. The approach is

    valid for any model characterized by a parameter set $\theta = (\theta_l, \theta_n)$ such that

    $$y = G^T(x, \theta_n)\,\theta_l, \qquad c\,\theta_n \le h.$$

    2) For the $p$-norm generalization of the algorithms, we have considered the Bregman divergence associated to the squared $q$-norm. Another important example of Bregman divergences is the relative entropy, which is defined between vectors $u = [u_1 \cdots u_K]^T$ and $w = [w_1 \cdots w_K]^T$ (assuming $u_i, w_i \ge 0$ and $\sum_{i=1}^{K} u_i = \sum_{i=1}^{K} w_i = 1$) as follows:

    $$d_{RE}(u, w) = \sum_{i=1}^{K} u_i \ln\frac{u_i}{w_i}.$$

    The relative entropy is the Bregman divergence $d_F(u, w)$ for $F(u) = \sum_{i=1}^{K} (u_i \ln u_i - u_i)$. It is possible to derive and analyze different exponentiated-gradient [29] type fuzzy filtering algorithms by using the relative entropy as the regularizer and following the same approach. However, some additional efforts are required to handle the unity sum constraint on the vectors.
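    The claim that the relative entropy is the Bregman divergence of $F(u) = \sum_i (u_i \ln u_i - u_i)$ is easy to check numerically; the sketch below uses the generic definition $d_F(u, w) = F(u) - F(w) - \langle \nabla F(w), u - w \rangle$ (the function names are ours):

    ```python
    import numpy as np

    def bregman(F, gradF, u, w):
        """Generic Bregman divergence d_F(u, w)."""
        return F(u) - F(w) - gradF(w) @ (u - w)

    # F(u) = sum_i (u_i ln u_i - u_i), so grad F(u)_i = ln u_i.
    F = lambda u: np.sum(u * np.log(u) - u)
    gradF = lambda u: np.log(u)

    def relative_entropy(u, w):
        return np.sum(u * np.log(u / w))
    ```

    For probability vectors ($\sum_i u_i = \sum_i w_i = 1$) the linear terms cancel and `bregman(F, gradF, u, w)` coincides with `relative_entropy(u, w)`.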

    3) We arrived at the explicit update form (14) by making an approximation. A natural question that arises is whether any improvement in the filtering performance (assessed in Theorem 1) could be made by solving (13) numerically instead of approximating. An upper bound on the filtering errors (as in Theorem 1) could be calculated in this case too; however, we would then be considering the a posteriori filtering errors $\sum_{j=0}^{k} P\left(G^T(x(j), \tau_j)\,\alpha_j, \; G^T(x(j), \tau^*)\,\alpha^*\right)$.


    Fig. 8. RMSFE($t$) as a function of time. The curves are averaged over 100 independent experiments. (a) $p = 2.5$ with $u$ being dense. (b) $p = 2\ln(2m+1)$ with $u$ being sparse.

    4) Our emphasis in this study was on filtering. In the machine learning literature, one is normally interested in the prediction performance of such algorithms [10], [33], [34]. The presented algorithms could be evaluated, in a similar manner as in Theorem 1, by calculating an upper bound on the prediction errors $\sum_{j=0}^{k} P\left(G^T(x(j), \tau_{j-1})\,\alpha_{j-1}, \; y(j)\right)$.

    5) This study offers the possibility of developing and analyzing new fuzzy filtering algorithms by defining a continuous strictly increasing function $\phi$. An interesting research direction is to optimize the function $\phi$ for the problem at hand. For example, in the context of linear estimation, an expression for the optimum function that minimizes the steady-state mean-square error is derived in [8].

    6) Other than the LMS, the recursive least squares (RLS) is a well-known algorithm that optimizes the average performance under some stochastic assumptions on the signals. The deterministic interpretation of the RLS algorithm is that it solves a regularized least-squares problem

    $$w_k = \arg\min_w \sum_{j=0}^{k} \left|y(j) - G_j^T w\right|^2 + \eta^{-1}\|w\|^2.$$

    Our future study is concerned with the generalization of the RLS algorithm, in the context of interpretable fuzzy models, based on the solution of the following regularized least-squares problem:

    $$(\tau_k, \alpha_k) = \arg\min_{(\tau, \alpha,\; c\tau \le h)} J_k$$

    $$J_k = \sum_{j=0}^{k} L_j(\tau, \alpha) + \eta^{-1} d_q(\alpha, \alpha_{-1}) + \eta^{-1} d_q(\tau, \tau_{-1})$$

    where some examples of the loss term $L_j(\tau, \alpha)$ are provided in Table II.
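    The deterministic RLS objective can be illustrated with a batch (non-recursive) solver via the normal equations; this is a sketch under the assumption that the regularization weight is $\eta^{-1}$, and the function name is ours:

    ```python
    import numpy as np

    def regularized_ls(G, y, eta):
        """Solve min_w sum_j |y(j) - G_j^T w|^2 + (1/eta)||w||^2.
        Rows of G are the regressors G_j^T; closed form via normal equations."""
        n = G.shape[1]
        return np.linalg.solve(G.T @ G + np.eye(n) / eta, G.T @ y)
    ```

    RLS computes the same sequence of minimizers recursively, one data pair at a time, instead of refactoring the normal equations at every step.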

    VI. CONCLUSION

    Much work has been done on applying fuzzy models in function approximation and classification tasks. We feel that many

    real-world applications (e.g., in chemistry [35], biomedical engineering [2], etc.) require the filtering of uncertainties from

    the experimental data. The nonlinear fuzzy models by virtue

    of membership functions are more promising than the classical

    linear models. Therefore, it is essential to study the adaptive

    fuzzy filtering algorithms. Adaptive filtering theory for linear

    models has been well developed in the literature; however, its extension to the fuzzy models is complicated by the nonlinearity of membership functions and the interpretability constraints. The contribution of the manuscript (summarized in

    Theorem 1, its inferences, and Table II) is to provide a mathematical framework that allows the development and an analysis

    of the adaptive fuzzy filtering algorithms. The power of our

    approach is the flexibility of designing the algorithms based on the choice of the function $\phi$ and the parameter $p$. The derived filtering algorithms have the desired properties of robustness,

    stability, and convergence. This paper is an attempt to provide

    a deterministic approach to study the adaptive fuzzy filtering

    algorithms, and the study opens many research directions, as

    discussed in Section V. Future work involves the study of fuzzy

    filtering algorithms derived using the relative entropy as a regularizer, optimizing the function $\phi$ for the problem at hand, and a generalization of the RLS algorithm to the $p$-norm in the context of interpretable fuzzy models.

    APPENDIX

    TAKAGI–SUGENO FUZZY MODEL

    Let us consider an explicit mathematical formulation of a

    Sugeno-type fuzzy inference system that assigns to each crisp

    value (vector) in input space a crisp value in output space.

    Consider a Sugeno fuzzy inference system ($F_s : X \to Y$), mapping the $n$-dimensional input space ($X = X_1 \times X_2 \times \cdots \times X_n$) to


    the one-dimensional real line, consisting of $K$ different rules. The $i$th rule is of the form

    If $x_1$ is $A_{i1}$ and $x_2$ is $A_{i2}$ and $\cdots$ and $x_n$ is $A_{in}$, then $y = c_i$

    for all $i = 1, 2, \ldots, K$, where $A_{i1}, A_{i2}, \ldots, A_{in}$ are nonempty fuzzy subsets of $X_1, X_2, \ldots, X_n$, respectively, such that the membership functions $\mu_{A_{ij}} : X_j \to [0, 1]$ fulfill $\sum_{i=1}^{K} \prod_{j=1}^{n} \mu_{A_{ij}}(x_j) > 0$ for all $x_j \in X_j$, and the values $c_1, \ldots, c_K$ are real numbers. The different rules, by using product as the conjunction operator, can be aggregated as

    $$F_s(x_1, x_2, \ldots, x_n) = \frac{\sum_{i=1}^{K} c_i \prod_{j=1}^{n} \mu_{A_{ij}}(x_j)}{\sum_{i=1}^{K} \prod_{j=1}^{n} \mu_{A_{ij}}(x_j)}. \qquad (35)$$
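    The aggregation (35) is a firing-strength-weighted average of the rule consequents; a minimal sketch (our own naming), with the membership grades $\mu_{A_{ij}}(x_j)$ precomputed into a $K \times n$ array:

    ```python
    import numpy as np

    def sugeno_output(memberships, c):
        """Zero-order Sugeno inference (35).
        memberships[i, j] = mu_{A_ij}(x_j); c[i] = consequent of rule i."""
        w = np.prod(memberships, axis=1)   # rule firing strengths (product t-norm)
        return float(w @ c / np.sum(w))    # weighted average of consequents
    ```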

    Let us define a real vector $\tau$ such that the membership functions can be constructed from the elements of the vector $\tau$. To illustrate the construction of membership functions based on the knot vector ($\tau$), consider the following examples.

    1) Trapezoidal membership functions: Let

    $$\tau = \left[a_1, t_1^1, \ldots, t_1^{2P_1-2}, b_1, \ldots, a_n, t_n^1, \ldots, t_n^{2P_n-2}, b_n\right]$$

    such that, for the $i$th input ($x_i \in [a_i, b_i]$), there exists some $\delta_i > 0$ for all $i = 1, \ldots, n$ such that, for trapezoidal membership functions,

    $$t_i^1 - a_i \ge \delta_i$$
    $$t_i^{j+1} - t_i^j \ge \delta_i \quad \text{for all } j = 1, 2, \ldots, (2P_i - 3)$$
    $$b_i - t_i^{2P_i-2} \ge \delta_i.$$

    These inequalities can be written in terms of a matrix inequality $c\tau \le h$ [18], [37]–[42]. Hence, the output of a Sugeno-type fuzzy model

    $$F_s(x) = G^T(x, \tau)\,\alpha, \qquad c\tau \le h$$

    is linear in the consequents (i.e., $\alpha$) but nonlinear in the antecedents (i.e., $\tau$).
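    For a single input on $[a, b]$, the knot-separation inequalities can be packed into the matrix form $c\tau \le h$ as follows (an illustrative construction; the row ordering is our own choice):

    ```python
    import numpy as np

    def knot_constraints(n_knots, a, b, delta):
        """Build (c, h) with c @ tau <= h encoding:
        tau_1 - a >= delta, tau_{j+1} - tau_j >= delta, b - tau_last >= delta."""
        rows = n_knots + 1
        c = np.zeros((rows, n_knots))
        h = np.full(rows, -delta)
        c[0, 0] = -1.0
        h[0] = -a - delta                 # -tau_1 <= -a - delta
        for j in range(n_knots - 1):
            c[j + 1, j] = 1.0             # tau_j - tau_{j+1} <= -delta
            c[j + 1, j + 1] = -1.0
        c[-1, -1] = 1.0
        h[-1] = b - delta                 # tau_last <= b - delta
        return c, h
    ```

    The knots then stay inside $[a, b]$ and keep a minimum separation of `delta` (e.g., the 0.001 used in the first example).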

    REFERENCES

    [1] M. Kumar, K. Thurow, N. Stoll, and R. Stoll, Robust fuzzy mappings for QSAR studies, Eur. J. Med. Chem., vol. 42, no. 5, pp. 675–685, 2007.

    [2] M. Kumar, M. Weippert, R. Vilbrandt, S. Kreuzfeld, and R. Stoll, Fuzzy evaluation of heart rate signals for mental stress assessment, IEEE Trans. Fuzzy Syst., vol. 15, no. 5, pp. 791–808, Oct. 2007.

    [3] A. H. Sayed, Fundamentals of Adaptive Filtering. New York: Wiley, 2003.

    [4] S. Haykin, Adaptive Filter Theory, 3rd ed. New York: Prentice–Hall, 1996.

    [5] B. Hassibi, A. H. Sayed, and T. Kailath, H∞ optimality of the LMS algorithm, IEEE Trans. Signal Process., vol. 44, no. 2, pp. 267–280, Feb. 1996.

    [6] N. R. Yousef and A. H. Sayed, A unified approach to the steady-state and tracking analyses of adaptive filters, IEEE Trans. Signal Process., vol. 49, no. 2, pp. 314–324, Feb. 2001.

    [7] T. Y. Al-Naffouri and A. H. Sayed, Transient analysis of adaptive filters with error nonlinearities, IEEE Trans. Signal Process., vol. 51, no. 3, pp. 653–663, Mar. 2003.

    [8] T. Y. Al-Naffouri and A. H. Sayed, Adaptive filters with error nonlinearities: Mean-square analysis and optimum design, EURASIP J. Appl. Signal Process., vol. 2001, no. 4, pp. 192–205, 2001.

    [9] J. Kivinen, M. K. Warmuth, and B. Hassibi, The p-norm generalization of the LMS algorithm for adaptive filtering, IEEE Trans. Signal Process., vol. 54, no. 5, pp. 1782–1793, May 2006.

    [10] C. Gentile, The robustness of the p-norm algorithms, Mach. Learning, vol. 53, no. 3, pp. 265–299, 2003.

    [11] J. S. R. Jang, C. T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Upper Saddle River, NJ: Prentice–Hall, 1997.

    [12] J. Abonyi, L. Nagy, and F. Szeifert, Adaptive fuzzy inference system and its application in modelling and model based control, Chem. Eng. Res. Des., vol. 77, pp. 281–290, Jun. 1999.

    [13] P. P. Angelov and D. P. Filev, An approach to online identification of Takagi–Sugeno fuzzy models, IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 1, pp. 484–498, Feb. 2004.

    [14] M. J. Er, Z. Li, H. Cai, and Q. Chen, Adaptive noise cancellation using enhanced dynamic fuzzy neural networks, IEEE Trans. Fuzzy Syst., vol. 13, no. 3, pp. 331–342, Jun. 2005.

    [15] C.-F. Juang and C.-T. Lin, Noisy speech processing by recurrently adaptive fuzzy filters, IEEE Trans. Fuzzy Syst., vol. 9, no. 1, pp. 139–152, Feb. 2001.

    [16] B. Hassibi, A. H. Sayed, and T. Kailath, H∞ optimality criteria for LMS and backpropagation, in Advances in Neural Information Processing Systems, vol. 6, J. D. Cowan, G. Tesauro, and J. Alspector, Eds. San Mateo, CA: Morgan Kaufmann, Apr. 1994, pp. 351–359.

    [17] W. Yu and X. Li, Fuzzy identification using fuzzy neural networks with stable learning algorithms, IEEE Trans. Fuzzy Syst., vol. 12, no. 3, pp. 411–420, Jun. 2004.

    [18] M. Kumar, N. Stoll, and R. Stoll, An energy gain bounding approach to robust fuzzy identification, Automatica, vol. 42, no. 5, pp. 711–721, May 2006.

    [19] M. Kumar, R. Stoll, and N. Stoll, Deterministic approach to robust adaptive learning of fuzzy models, IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 36, no. 4, pp. 767–780, Aug. 2006.

    [20] K. S. Azoury and M. K. Warmuth, Relative loss bounds for on-line density estimation with the exponential family of distributions, Mach. Learning, vol. 43, no. 3, pp. 211–246, Jun. 2001.

    [21] L. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Comput. Math. Phys., vol. 7, pp. 200–217, 1967.

    [22] J. Kivinen and M. K. Warmuth, Relative loss bounds for multidimensional regression problems, Mach. Learning, vol. 45, no. 3, pp. 301–329, 2001.

    [23] M. Collins, R. E. Schapire, and Y. Singer, Logistic regression, AdaBoost and Bregman distances, Mach. Learning, vol. 48, pp. 253–285, 2002.

    [24] N. Murata, T. Takenouchi, T. Kanamori, and S. Eguchi, Information geometry of U-boost and Bregman divergence, Neural Comput., vol. 16, no. 7, pp. 1437–1481, 2004.

    [25] B. Taskar, S. Lacoste-Julien, and M. I. Jordan, Structured prediction, dual extragradient and Bregman projections, J. Mach. Learning Res., vol. 7, pp. 1627–1653, 2006.

    [26] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, Clustering with Bregman divergences, J. Mach. Learning Res., vol. 6, pp. 1705–1749, 2005.

    [27] R. Nock and F. Nielsen, On weighting clustering, IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 8, pp. 1223–1235, Aug. 2006.

    [28] A. Banerjee, X. Guo, and H. Wang, On the optimality of conditional expectation as a Bregman predictor, IEEE Trans. Inf. Theory, vol. 51, no. 7, pp. 2664–2669, Jul. 2005.

    [29] J. Kivinen and M. K. Warmuth, Additive versus exponentiated gradient updates for linear prediction, Inf. Comput., vol. 132, no. 1, pp. 1–64, Jan. 1997.

    [30] C. L. Lawson and R. J. Hanson, Solving Least Squares Problems. Philadelphia, PA: SIAM, 1995.

    [31] D. P. Helmbold, J. Kivinen, and M. K. Warmuth, Relative loss bounds for single neurons, IEEE Trans. Neural Netw., vol. 10, no. 6, pp. 1291–1304, Nov. 1999.

    [32] P. Auer, M. Herbster, and M. K. Warmuth, Exponentially many local minima for single neurons, in Advances in Neural Information Processing Systems, vol. 8, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. Cambridge, MA: MIT Press, Jun. 1996, pp. 316–322.

    [33] N. Cesa-Bianchi, P. M. Long, and M. K. Warmuth, Worst-case quadratic loss bounds for prediction using linear functions and gradient descent, IEEE Trans. Neural Netw., vol. 7, no. 3, pp. 604–619, May 1996.

    [34] P. Auer, N. Cesa-Bianchi, and C. Gentile, Adaptive and self-confident on-line learning algorithms, J. Comput. Syst. Sci., vol. 64, no. 1, pp. 48–75, Feb. 2002.

    [35] S. Kumar, M. Kumar, R. Stoll, and U. Kragl, Handling uncertainties in toxicity modelling using a fuzzy filter, SAR QSAR Environ. Res., vol. 18, no. 7/8, pp. 645–662, Dec. 2007.

    [36] P. Lindskog, Fuzzy identification from a grey box modeling point of view, in Fuzzy Model Identification: Selected Approaches, H. Hellendoorn and D. Driankov, Eds. Berlin, Germany: Springer-Verlag, 1997.

    [37] M. Burger, H. Engl, J. Haslinger, and U. Bodenhofer, Regularized data-driven construction of fuzzy controllers, J. Inverse Ill-Posed Problems, vol. 10, pp. 319–344, 2002.

    [38] M. Kumar, R. Stoll, and N. Stoll, Robust adaptive fuzzy identification of time-varying processes with uncertain data. Handling uncertainties in the physical fitness fuzzy approximation with real world medical data: An application, Fuzzy Optim. Decis. Making, vol. 2, pp. 243–259, Sep. 2003.

    [39] M. Kumar, R. Stoll, and N. Stoll, Regularized adaptation of fuzzy inference systems. Modelling the opinion of a medical expert about physical fitness: An application, Fuzzy Optim. Decis. Making, vol. 2, pp. 317–336, Dec. 2003.

    [40] M. Kumar, R. Stoll, and N. Stoll, Robust solution to fuzzy identification problem with uncertain data by regularization. Fuzzy approximation to physical fitness with real world medical data: An application, Fuzzy Optim. Decis. Making, vol. 3, no. 1, pp. 63–82, Mar. 2004.

    [41] M. Kumar, R. Stoll, and N. Stoll, Robust adaptive identification of fuzzy systems with uncertain data, Fuzzy Optim. Decis. Making, vol. 3, no. 3, pp. 195–216, Sep. 2004.

    [42] M. Kumar, R. Stoll, and N. Stoll, A robust design criterion for interpretable fuzzy models with uncertain data, IEEE Trans. Fuzzy Syst., vol. 14, no. 2, pp. 314–328, Apr. 2006.


    Mohit Kumar (M'08) received the B.Tech. degree in electrical engineering from the National Institute of Technology, Hamirpur, India, in 1999, the M.Tech. degree in control engineering from the Indian Institute of Technology, Delhi, India, in 2001, and the Ph.D. degree (summa cum laude) in electrical engineering from Rostock University, Rostock, Germany, in 2004.

    From 2001 to 2004, he was a Research Scientist with the Institute of Occupational and Social Medicine, Rostock. He is currently with the Center for Life Science Automation, Rostock. His current research interests include robust adaptive fuzzy identification, fuzzy logic in medicine, and robust adaptive control.

    Norbert Stoll received the Dipl.-Ing. degree in automation engineering and the Ph.D. degree in measurement technology from Rostock University, Rostock, Germany, in 1979 and 1985, respectively.

    He was the Head of the analytical chemistry section with the Academy of Sciences of GDR, Central Institute for Organic Chemistry, until 1991. From 1992 to 1994, he was an Associate Director of the Institute of Organic Catalysis, Rostock. Since 1994, he has been a Professor of measurement technology with the Engineering Faculty, University of Rostock. From 1994 to 2000, he was the Director of the Institute of Automation at Rostock University. Since 2003, he has been the Vice President of the Center for Life Science Automation, Rostock. His current research interests include medical process measurement, lab automation, and smart systems and devices.

    Regina Stoll received the Dipl.-Med. degree, the Dr. med. degree in occupational medicine, and the Dr. med. habil. degree in occupational and sports medicine from Rostock University, Rostock, Germany, in 1980, 1984, and 2002, respectively.

    She is the Head of the Institute of Preventive Medicine, Rostock. She is a faculty member with the Medicine Faculty and a Faculty Associate with the College of Computer Science and Electrical Engineering, Rostock University. She also holds an adjunct faculty member position with the Industrial Engineering Department, North Carolina State University, Raleigh. Her current research interests include occupational physiology, preventive medicine, and cardiopulmonary diagnostics.