Adaptive Fuzzy Filtering in a Deterministic Setting


    IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 17, NO. 4, AUGUST 2009 763

Adaptive Fuzzy Filtering in a Deterministic Setting

Mohit Kumar, Member, IEEE, Norbert Stoll, and Regina Stoll

Abstract—Many real-world applications involve the filtering and estimation of process variables. This study considers the use of interpretable Sugeno-type fuzzy models for adaptive filtering. Our aim in this study is to provide different adaptive fuzzy filtering algorithms in a deterministic setting. The algorithms are derived and studied in a unified way without making any assumptions on the nature of signals (i.e., process variables). The study extends, in a common framework, the adaptive filtering algorithms (usually studied in signal processing literature) and p-norm algorithms (usually studied in machine learning literature) to semilinear fuzzy models. A mathematical framework is provided that allows the development and an analysis of the adaptive fuzzy filtering algorithms. We study a class of nonlinear LMS-like algorithms for the online estimation of fuzzy model parameters. A generalization of the algorithms to the p-norm is provided using Bregman divergences (a standard tool for online machine learning algorithms).

Index Terms—Adaptive filtering algorithms, Bregman divergences, p-norm, robustness, Sugeno fuzzy models.

    I. INTRODUCTION

A REAL-WORLD complex process is typically characterized by a number of variables whose interrelations are uncertain and not completely known. Our concern is to apply, in an online scenario, the fuzzy techniques for such processes, aiming at the filtering of uncertainties and the estimation of variables. The applications of adaptive filtering algorithms are not limited to engineering problems; they extend, e.g., to medicinal chemistry, where it is required to predict the biological activity of a chemical compound before its synthesis in the laboratory [1]. Once a compound is synthesized and tested experimentally for its activity, the experimental data can be used for an improvement of the prediction performance (i.e., online learning of the adaptive system). Adaptive filtering of uncertainties may be desired, e.g., for an intelligent interpretation of medical data that are contaminated by the uncertainties arising from individual variations due to differences in age, gender, and body conditions [2].

We focus on a process model with $n$ inputs (represented by the vector $x \in \mathbb{R}^n$) and a single output (represented by the scalar $y$). Adaptive filtering algorithms seek to identify the unknown

Manuscript received March 9, 2007; revised August 10, 2007; accepted October 30, 2007. First published April 30, 2008; current version published July 29, 2009. This work was supported by the Center for Life Science Automation, Rostock, Germany.

M. Kumar is with the Center for Life Science Automation, D-18119 Rostock, Germany (e-mail: [email protected]).

N. Stoll is with the Institute of Automation, College of Computer Science and Electrical Engineering, University of Rostock, D-18119 Rostock, Germany (e-mail: [email protected]).

R. Stoll is with the Institute of Preventive Medicine, Faculty of Medicine, University of Rostock, D-18055 Rostock, Germany (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TFUZZ.2008.924331

TABLE I: EXAMPLES OF $\phi(e)$ FROM [6]–[8]

parameters of a model (being characterized by a vector $w$) using input–output data pairs $\{x(j), y(j)\}$ related via
$$ y(j) = M(x(j), w) + n_j $$
where $M(x(j), w)$ is the model output for an input $x(j)$, and $n_j$ is the underlying uncertainty. If the chosen model $M$ is nonlinear in the parameter vector $w$ (as is the case with neural and fuzzy models), the standard gradient-descent algorithm is mostly used for an online estimation of $w$ via performing the following recursions:
$$ w_j = w_{j-1} - \mu \left.\frac{\partial E_r(w, j)}{\partial w}\right|_{w_{j-1}}, \qquad E_r(w, j) = \frac{1}{2}\,[y(j) - M(x(j), w)]^2 \qquad (1) $$
where $\mu$ is the step size (i.e., the learning rate). If we relax our model to be linear in $w$, i.e., input–output data are related via
$$ y(j) = G_j^T w + n_j, \quad \text{where } G_j \text{ is the regressor vector} $$

then a variety of algorithms are available in the literature for an adaptive estimation of linear parameters [3]. The most popular algorithm is the LMS because of its simplicity and robustness [4], [5]. Many LMS-like algorithms have been studied for linear models [3], [4] while addressing the robustness, convergence, and steady-state error issues. A particular class of algorithms takes the update form
$$ w_j = w_{j-1} - \mu\, \phi\bigl(G_j^T w_{j-1} - y(j)\bigr)\, G_j $$
where $\phi$ is a nonlinear scalar function such that different choices of the functional form lead to the different algorithms, as stated in Table I. The generalization of the LMS algorithm to the $p$-norm ($2 \le p < \infty$) is given by the update rule [9], [10]
$$ w_j = f^{-1}\bigl(f(w_{j-1}) - \mu\, [G_j^T w_{j-1} - y(j)]\, G_j\bigr). \qquad (2) $$
Here, $f$ (a $p$ indexing for $f$ is understood), as defined in [10], is the bijective mapping $f: \mathbb{R}^K \to \mathbb{R}^K$ such that
$$ f = [f_1 \,\cdots\, f_K]^T, \qquad f_i(w) = \frac{\mathrm{sign}(w_i)\,|w_i|^{q-1}}{\|w\|_q^{q-2}} \qquad (3) $$
where $w = [w_1 \,\cdots\, w_K]^T \in \mathbb{R}^K$, $q$ is dual to $p$ (i.e., $1/p + 1/q = 1$), and $\|\cdot\|_q$ denotes the $q$-norm defined as
$$ \|w\|_q = \Bigl(\sum_i |w_i|^q\Bigr)^{1/q}. $$

1063-6706/$26.00 © 2009 IEEE


The inverse $f^{-1}: \mathbb{R}^K \to \mathbb{R}^K$ is given as
$$ f^{-1} = [f_1^{-1} \,\cdots\, f_K^{-1}]^T, \qquad f_i^{-1}(v) = \frac{\mathrm{sign}(v_i)\,|v_i|^{p-1}}{\|v\|_p^{p-2}} \qquad (4) $$
where $v = [v_1 \,\cdots\, v_K]^T \in \mathbb{R}^K$.
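As a concrete illustration, the pair of mappings (3) and (4) can be sketched in a few lines of pure Python (the helper names `q_norm`, `link`, and `link_inv` are ours, not from the paper). The sketch also checks numerically that (4) inverts (3) and that $\|f(w)\|_p = \|w\|_q$:

```python
import math

def q_norm(w, q):
    """q-norm of a vector: (sum_i |w_i|^q)^(1/q)."""
    return sum(abs(wi) ** q for wi in w) ** (1.0 / q)

def link(w, p):
    """Bijective mapping f of Eq. (3): f_i(w) = sign(w_i)|w_i|^(q-1) / ||w||_q^(q-2),
    with q dual to p (1/p + 1/q = 1)."""
    q = p / (p - 1.0)
    nq = q_norm(w, q)
    return [math.copysign(abs(wi) ** (q - 1), wi) / nq ** (q - 2) for wi in w]

def link_inv(v, p):
    """Inverse mapping of Eq. (4): f_i^{-1}(v) = sign(v_i)|v_i|^(p-1) / ||v||_p^(p-2)."""
    npn = q_norm(v, p)
    return [math.copysign(abs(vi) ** (p - 1), vi) / npn ** (p - 2) for vi in v]

w, p = [0.5, -1.2, 0.3], 3.0
roundtrip = link_inv(link(w, p), p)
print(all(abs(a - b) < 1e-9 for a, b in zip(roundtrip, w)))  # True
```

For $p = q = 2$ both mappings reduce to the identity, which is why (2) collapses to the ordinary LMS recursion in that case.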

Sugeno-type fuzzy models are linear in consequents and nonlinear in antecedents (i.e., membership function parameters). When it comes to the online estimation of fuzzy model parameters, the following two approaches, in general, are used.

1) The antecedent parameters are adapted using gradient descent and the consequent parameters by the recursive least-squares algorithm [11], [12].

2) A combination of data clustering and the recursive least-squares algorithm is applied [13], [14].

The wide use of the gradient-descent algorithm for the adaptation of nonlinear fuzzy model parameters (e.g., in [15]) is due to its simplicity and low computational cost. However, gradient-descent-based algorithms for nonlinear systems are not justified by rigorous theoretical arguments [16]. Only a few papers dealing with the mathematical analysis of adaptive fuzzy algorithms have appeared till now. The issue of algorithm stability has been addressed in [17]. The authors in [18] introduce an energy-gain-bounding approach for the estimation of parameters via minimizing the maximum possible value of the energy gain from disturbances to the estimation errors, along the line of $H^\infty$-optimal estimation. Algorithms for the adaptive estimation of fuzzy model parameters based on least-squares and $H^\infty$-optimization criteria are provided in [19].

To the knowledge of the authors, the fuzzy literature still lacks:

1) the development and deterministic mathematical analysis (in terms of filtering performance) of methods that extend the Table I type algorithms (i.e., LMF, LMMN, sign error, etc.) to interpretable fuzzy models;

2) the generalization of the algorithms with error nonlinearities (i.e., Table I type algorithms) to $p$-norms, which is missing even for linear-in-parameters models;

3) the development and deterministic mathematical analysis (in terms of filtering performance) of the $p$-norm algorithms [e.g., of type (2)] for an adaptive estimation of the parameters of an interpretable fuzzy model.

This paper is intended to provide the aforementioned studies in a unified manner. This is done via solving a constrained regularized nonlinear optimization problem in Section II. Section III provides the deterministic analysis of the algorithms with emphasis on filtering errors. Simulation studies are provided in Section IV, followed by some remarks and, finally, the conclusion.

    II. ADAPTIVE FUZZY ALGORITHMS

Sugeno-type fuzzy models are characterized by two types of parameters: consequents and antecedents. If we characterize the antecedents using a vector $\alpha$ and the consequents using a vector $\theta$, then the output of a zero-order Takagi–Sugeno fuzzy model could be characterized as
$$ F_s(x) = G^T(x, \alpha)\,\theta, \qquad c\,\alpha \le h \qquad (5) $$
where $G(\cdot)$ is a nonlinear function (which is defined by the shape of the membership functions), and $c\,\alpha \le h$ is a matrix inequality that characterizes the interpretability of the model. The details of (5) can be found in, e.g., [19] as well as in the Appendix. A straightforward approach to the design of an adaptive fuzzy filter algorithm is to update, at time $j$, the model parameters $(\alpha_{j-1}, \theta_{j-1})$ based on the current data pair $(x(j), y(j))$, where we seek to decrease the loss term $|y(j) - G^T(x(j), \alpha_j)\theta_j|^2$; however, we do not want to make big changes in the initial parameters $(\alpha_{j-1}, \theta_{j-1})$. That is,
$$ (\alpha_j, \theta_j) = \arg\min_{(\theta,\,\alpha:\, c\alpha \le h)} \left[ \frac{1}{2}\bigl|y(j) - G^T(x(j), \alpha)\,\theta\bigr|^2 + \frac{1}{2\mu_j}\|\theta - \theta_{j-1}\|^2 + \frac{1}{2\mu_{\alpha,j}}\|\alpha - \alpha_{j-1}\|^2 \right] \qquad (6) $$

where $\mu_j > 0$ and $\mu_{\alpha,j} > 0$ are the learning rates for the consequents and antecedents, respectively, and $\|\cdot\|$ denotes the Euclidean norm (i.e., we write the 2-norm of a vector as $\|\cdot\|$ instead of $\|\cdot\|_2$). The terms $\|\theta - \theta_{j-1}\|^2$ and $\|\alpha - \alpha_{j-1}\|^2$ provide regularization to the adaptive estimation problem. To study the different adaptive algorithms in a unified framework, the following generalizations can be provided to the loss as well as regularization terms.

1) The loss term is generalized using a function $L_j(\theta, \alpha)$. Some examples of $L_j(\theta, \alpha)$ include
$$ L_j(\theta, \alpha) = \begin{cases} \bigl|y(j) - G^T(x(j), \alpha)\,\theta\bigr| \\[2pt] \frac{1}{2}\bigl|y(j) - G^T(x(j), \alpha)\,\theta\bigr|^2 \\[2pt] \frac{1}{4}\bigl|y(j) - G^T(x(j), \alpha)\,\theta\bigr|^4 \\[2pt] \frac{a}{2}\bigl|y(j) - G^T(x(j), \alpha)\,\theta\bigr|^2 + \frac{b}{4}\bigl|y(j) - G^T(x(j), \alpha)\,\theta\bigr|^4. \end{cases} \qquad (7) $$

2) The regularization terms are generalized using Bregman divergences [9], [20]. The Bregman divergence $d_F(u, w)$ [21], which is associated with a strictly convex, twice differentiable function $F$ from a subset of $\mathbb{R}^K$ to $\mathbb{R}$, is defined for $u, w \in \mathbb{R}^K$ as follows:
$$ d_F(u, w) = F(u) - F(w) - (u - w)^T f(w) $$
where $f = \nabla F$ denotes the gradient of $F$. Note that $d_F(u, w) \ge 0$, is equal to zero only for $u = w$, and is strictly convex in $u$. Some examples of Bregman divergences are as follows.

a) Bregman divergence associated with the squared $q$-norm: If we define $F(w) = (1/2)\|w\|_q^2$, then the corresponding Bregman divergence $d_q(u, w)$ is defined as
$$ d_q(u, w) = \frac{1}{2}\|u\|_q^2 - \frac{1}{2}\|w\|_q^2 - (u - w)^T f(w) $$
where $f$ is given by (3). It is easy to see that for $q = 2$, we have $d_2(u, w) = (1/2)\|u - w\|^2$.


b) Relative entropy: For a vector $w = [w_1 \,\cdots\, w_K]^T \in \mathbb{R}^K$ (with $w_i \ge 0$), if we define $F(w) = \sum_{i=1}^{K} (w_i \ln w_i - w_i)$, then the Bregman divergence is the unnormalized relative entropy:
$$ d_{RE}(u, w) = \sum_{i=1}^{K} \Bigl( u_i \ln\frac{u_i}{w_i} - u_i + w_i \Bigr). $$
Bregman divergences have been widely studied in learning and information theory (see, e.g., [22]–[28]). However, our idea is to replace the regularization terms $(1/2\mu_j)\|\theta - \theta_{j-1}\|^2$ and $(1/2\mu_{\alpha,j})\|\alpha - \alpha_{j-1}\|^2$ in (6) by the generalized terms $\mu_j^{-1} d_F(\theta, \theta_{j-1})$ and $\mu_{\alpha,j}^{-1} d_F(\alpha, \alpha_{j-1})$, respectively. This approach, in the context of linear models, was introduced for deriving predictive algorithms in [29] and filtering algorithms in [9]. It is obvious that different choices of the function $F$ result in different filtering algorithms. Our particular concern in this text is to provide a $p$-norm generalization to the filtering algorithms. For the $p$-norm ($2 \le p < \infty$) generalization of the algorithms, we consider the Bregman divergence associated with the squared $q$-norm [i.e., $F(w) = (1/2)\|w\|_q^2$], where $q$ is dual to $p$ (i.e., $1/p + 1/q = 1$).
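The divergence used above is easy to compute directly from its definition. The following pure-Python sketch (helper names are ours) evaluates $d_q(u, w)$ with $f = \nabla\bigl(\tfrac{1}{2}\|\cdot\|_q^2\bigr)$ and confirms the two properties quoted in the text: for $q = 2$ it collapses to half the squared Euclidean distance, and it is nonnegative, vanishing at $u = w$:

```python
import math

def q_norm(w, q):
    return sum(abs(wi) ** q for wi in w) ** (1.0 / q)

def grad_half_sq_qnorm(w, q):
    """Gradient f of F(w) = (1/2)||w||_q^2, i.e., the mapping of Eq. (3) in terms of q."""
    nq = q_norm(w, q)
    return [math.copysign(abs(wi) ** (q - 1), wi) / nq ** (q - 2) for wi in w]

def d_q(u, w, q):
    """Bregman divergence of F(w) = (1/2)||w||_q^2: F(u) - F(w) - (u - w)^T f(w)."""
    f_w = grad_half_sq_qnorm(w, q)
    return (0.5 * q_norm(u, q) ** 2 - 0.5 * q_norm(w, q) ** 2
            - sum((ui - wi) * fi for ui, wi, fi in zip(u, w, f_w)))

u, w = [1.0, -0.5, 2.0], [0.4, 0.1, -0.3]
# q = 2: the divergence equals half the squared Euclidean distance.
print(abs(d_q(u, w, 2.0) - 0.5 * sum((a - b) ** 2 for a, b in zip(u, w))) < 1e-12)
# 1 < q <= 2: nonnegative, and exactly zero at u = w.
print(d_q(u, w, 1.5) >= 0.0, abs(d_q(w, w, 1.5)) < 1e-12)
```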

In view of these generalizations, the adaptive fuzzy algorithms take the form
$$ (\alpha_j, \theta_j) = \arg\min_{(\theta,\,\alpha:\, c\alpha \le h)} \bigl[ L_j(\theta, \alpha) + \mu_j^{-1}\, d_q(\theta, \theta_{j-1}) + \mu_{\alpha,j}^{-1}\, d_q(\alpha, \alpha_{j-1}) \bigr]. \qquad (8) $$

For the particular choice $L_j(\theta, \alpha) = (1/2)|y(j) - G^T(x(j), \alpha)\theta|^2$ and $q = 2$, problem (8) reduces to (6). For a given value of $\alpha$, we define
$$ \theta(\alpha) = \arg\min_{\theta} E_j(\theta, \alpha), \qquad E_j(\theta, \alpha) = L_j(\theta, \alpha) + \mu_j^{-1}\, d_q(\theta, \theta_{j-1}) $$
so that the estimation problem (8) can be formulated as
$$ \alpha_j = \arg\min_{\alpha:\, c\alpha \le h} \bigl[ E_j(\theta(\alpha), \alpha) + \mu_{\alpha,j}^{-1}\, d_q(\alpha, \alpha_{j-1}) \bigr] \qquad (9) $$
$$ \theta_j = \theta(\alpha_j). \qquad (10) $$

Expressions (9) and (10) represent a generalized update rule for adaptive fuzzy filtering algorithms that can be particularized for a choice of $L_j(\theta, \alpha)$ and $q$. For any choice of $L_j(\theta, \alpha)$ listed in (7), $E_j(\theta, \alpha)$ is convex in $\theta$ and, thus, could be minimized w.r.t. $\theta$ by setting its gradient equal to zero. This results in
$$ \mu_j^{-1} f(\theta) - \mu_j^{-1} f(\theta_{j-1}) - \phi\bigl(y(j) - G^T(x(j), \alpha)\,\theta\bigr)\, G(x(j), \alpha) = 0 \qquad (11) $$
where the function $\phi$ is given as
$$ \phi(e) = \begin{cases} \mathrm{sign}(e), & \text{for sign error} \\ e, & \text{for LMS} \\ e^3, & \text{for LMF} \\ ae + be^3, & \text{for LMMN.} \end{cases} \qquad (12) $$

The minimizing solution must satisfy (11). Thus,
$$ \theta = f^{-1}\Bigl( f(\theta_{j-1}) + \mu_j\, \phi\bigl(y(j) - G^T(x(j), \alpha)\,\theta\bigr)\, G(x(j), \alpha) \Bigr). \qquad (13) $$
For a given $\alpha$, (13) is implicit in $\theta$ and could be solved numerically. It follows from (13) that, for a sufficiently small value of $\mu_j$, it is reasonable to approximate the term $G^T(x(j), \alpha)\,\theta$ on the right-hand side of (13) with the term $G^T(x(j), \alpha)\,\theta_{j-1}$, as has been done in [9], to obtain an explicit update. Thus, an approximate but closed-form solution of (13) is given as
$$ \theta(\alpha) = f^{-1}\Bigl( f(\theta_{j-1}) + \mu_j\, \phi\bigl(y(j) - G^T(x(j), \alpha)\,\theta_{j-1}\bigr)\, G(x(j), \alpha) \Bigr). \qquad (14) $$
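One explicit consequent update of the form (14) can be sketched as follows. This is a pure-Python illustration under our own assumptions: a fixed vector `G` stands in for $G(x(j), \alpha)$, `phi_lmmn` is the LMMN choice from (12) with illustrative coefficients $a = b = 0.5$, and $f(0) = 0$ is taken by convention:

```python
import math

def q_norm(w, q):
    return sum(abs(wi) ** q for wi in w) ** (1.0 / q)

def link(w, p):
    """Mapping f of Eq. (3); by convention f(0) = 0."""
    q = p / (p - 1.0)
    nq = q_norm(w, q)
    if nq == 0.0:
        return [0.0] * len(w)
    return [math.copysign(abs(wi) ** (q - 1), wi) / nq ** (q - 2) for wi in w]

def link_inv(v, p):
    """Inverse mapping of Eq. (4); by convention f^{-1}(0) = 0."""
    npn = q_norm(v, p)
    if npn == 0.0:
        return [0.0] * len(v)
    return [math.copysign(abs(vi) ** (p - 1), vi) / npn ** (p - 2) for vi in v]

def phi_lmmn(e, a=0.5, b=0.5):
    """LMMN error nonlinearity of Eq. (12): phi(e) = a*e + b*e^3."""
    return a * e + b * e ** 3

def theta_step(theta_prev, G, y, mu, p, phi):
    """Explicit update (14): theta = f^{-1}( f(theta_prev) + mu * phi(e) * G )."""
    e = y - sum(gi * ti for gi, ti in zip(G, theta_prev))
    v = [fi + mu * phi(e) * gi for fi, gi in zip(link(theta_prev, p), G)]
    return link_inv(v, p)

theta0 = [0.0, 0.0, 0.0]
G, y = [1.0, 0.5, -0.2], 0.8
theta1 = theta_step(theta0, G, y, mu=0.1, p=3.0, phi=phi_lmmn)
pred0 = sum(g * t for g, t in zip(G, theta0))
pred1 = sum(g * t for g, t in zip(G, theta1))
print(abs(y - pred1) < abs(y - pred0))  # True: the a priori error shrinks
```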

Here, $\theta(\alpha)$ has been written to indicate the dependence of the solution on $\alpha$. Since $d_q(\alpha, \alpha_{j-1}) = (1/2)\|\alpha\|_q^2 - (1/2)\|\alpha_{j-1}\|_q^2 - (\alpha - \alpha_{j-1})^T f(\alpha_{j-1})$, (9) is equivalent to
$$ \alpha_j = \arg\min_{\alpha:\, c\alpha \le h} \Bigl[ E_j(\theta(\alpha), \alpha) + \frac{\mu_{\alpha,j}^{-1}}{2}\|\alpha\|_q^2 - \mu_{\alpha,j}^{-1}\, \alpha^T f(\alpha_{j-1}) \Bigr] \qquad (15) $$
as the remaining terms are independent of $\alpha$. There is no harm in adding an $\alpha$-independent term in (15):
$$ \alpha_j = \arg\min_{\alpha:\, c\alpha \le h} \Bigl[ E_j(\theta(\alpha), \alpha) + \frac{\mu_{\alpha,j}^{-1}}{2}\|\alpha\|_q^2 - \mu_{\alpha,j}^{-1}\, \alpha^T f(\alpha_{j-1}) + \frac{\mu_{\alpha,j}^{-1}}{2}\|f(\alpha_{j-1})\|^2 \Bigr]. \qquad (16) $$

For any $2 \le p < \infty$, we have $1 < q \le 2$ and, thus, $\|\cdot\|_q \ge \|\cdot\|$. This makes
$$ \frac{\mu_{\alpha,j}^{-1}}{2}\|\alpha\|_q^2 - \mu_{\alpha,j}^{-1}\, \alpha^T f(\alpha_{j-1}) + \frac{\mu_{\alpha,j}^{-1}}{2}\|f(\alpha_{j-1})\|^2 \qquad (17) $$
$$ \ge \frac{\mu_{\alpha,j}^{-1}}{2}\bigl[ \|\alpha\|^2 - 2\,\alpha^T f(\alpha_{j-1}) + \|f(\alpha_{j-1})\|^2 \bigr] = \frac{\mu_{\alpha,j}^{-1}}{2}\,\|\alpha - f(\alpha_{j-1})\|^2. \qquad (18) $$
Solving the constrained nonlinear optimization problem (16), as we will see, becomes relatively easy by slightly modifying (decreasing) the level of regularization being provided in the estimation of $\alpha_j$. The expression (17), i.e., the last three terms of the optimization problem (16), accounts for the regularization in the estimation of $\alpha_j$. For a given value of $\mu_{\alpha,j}$, a decrease in the level of regularization would occur via replacing the expression (17) in the optimization problem (16) by expression (18):
$$ \alpha_j = \arg\min_{\alpha:\, c\alpha \le h} \Bigl[ E_j(\theta(\alpha), \alpha) + \frac{\mu_{\alpha,j}^{-1}}{2}\,\|\alpha - f(\alpha_{j-1})\|^2 \Bigr]. \qquad (19) $$

From the viewpoint of an adaptive estimation of the vector $\alpha$, nothing goes against considering (19) instead of (16), since any desired level of regularization could still be achieved via adjusting, in (19), the value of the free parameter $\mu_{\alpha,j}$. The motivation for considering (19) is derived from the fact that it is possible to reformulate the estimation problem as a least-squares problem. To do so, define a vector
$$ r(\alpha) = \begin{bmatrix} \sqrt{E_j(\theta(\alpha), \alpha)} \\ (\mu_{\alpha,j}^{-1}/2)^{1/2}\, \bigl(\alpha - f(\alpha_{j-1})\bigr) \end{bmatrix} $$
where $E_j(\theta(\alpha), \alpha) \ge 0$ and $\mu_{\alpha,j} > 0$. Now, it is possible to rewrite (19) as
$$ \alpha_j = \arg\min_{\alpha:\, c\alpha \le h} \|r(\alpha)\|^2. \qquad (20) $$

To compute $\alpha_j$ recursively based on (20), we suggest the following Gauss–Newton-like algorithm:
$$ \alpha_j = \alpha_{j-1} + s(\alpha_{j-1}) \qquad (21) $$
$$ s(\alpha) = \arg\min_{s:\, cs \le h - c\alpha} \|r(\alpha) + \nabla r(\alpha)\, s\|^2 \qquad (22) $$
where $\nabla r(\alpha)$ is the Jacobian matrix of the vector $r$ w.r.t. $\alpha$, computed by the method of finite differences. Fortunately, $\nabla r(\alpha)$ is a full-rank matrix, since $\mu_{\alpha,j} > 0$. The constrained linear least-squares problem (22) can be solved by transforming it first to a least distance programming problem (see [30] for details). Finally, (10) in view of (14) becomes
$$ \theta_j = f^{-1}\Bigl( f(\theta_{j-1}) + \mu_j\, \phi\bigl(y(j) - G^T(x(j), \alpha_j)\,\theta_{j-1}\bigr)\, G(x(j), \alpha_j) \Bigr). \qquad (23) $$
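The Gauss–Newton iteration (21)–(22) can be sketched as below. For brevity, this pure-Python sketch solves the unconstrained step from the normal equations and ignores the interpretability constraint $cs \le h - c\alpha$ (which the text handles via least distance programming [30]); the residual `r` is a hypothetical two-parameter stand-in for $r(\alpha)$:

```python
def jacobian_fd(r, alpha, eps=1e-6):
    """Finite-difference Jacobian of the residual vector r(alpha), as in (22)."""
    r0 = r(alpha)
    J = [[0.0] * len(alpha) for _ in r0]
    for k in range(len(alpha)):
        bumped = list(alpha)
        bumped[k] += eps
        rk = r(bumped)
        for i in range(len(r0)):
            J[i][k] = (rk[i] - r0[i]) / eps
    return J

def gauss_newton_step(r, alpha):
    """Unconstrained version of (21)-(22): s = argmin ||r(alpha) + J s||^2,
    solved here from the 2x2 normal equations J^T J s = -J^T r."""
    r0, J = r(alpha), jacobian_fd(r, alpha)
    JtJ = [[sum(J[i][a] * J[i][b] for i in range(len(r0))) for b in range(2)]
           for a in range(2)]
    Jtr = [sum(J[i][a] * r0[i] for i in range(len(r0))) for a in range(2)]
    det = JtJ[0][0] * JtJ[1][1] - JtJ[0][1] * JtJ[1][0]
    s = [(-Jtr[0] * JtJ[1][1] + Jtr[1] * JtJ[0][1]) / det,
         (-Jtr[1] * JtJ[0][0] + Jtr[0] * JtJ[1][0]) / det]
    return [a + si for a, si in zip(alpha, s)]

# Toy residual with a mild nonlinearity; its sum of squares is minimized near (1, -2).
def r(alpha):
    return [alpha[0] - 1.0, alpha[1] + 2.0, 0.1 * alpha[0] * alpha[1]]

alpha = [0.0, 0.0]
for _ in range(20):
    alpha = gauss_newton_step(r, alpha)
print(sum(ri ** 2 for ri in r(alpha)) < sum(ri ** 2 for ri in r([0.0, 0.0])))
```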

    III. DETERMINISTIC ANALYSIS

We provide in this section a deterministic analysis of the adaptive fuzzy filtering algorithms (21)–(23) in terms of filtering performance. For this, consider a fuzzy model that fits given input–output data $\{x(j), y(j)\}_{j=0}^{k}$ according to
$$ y(j) = G^T(x(j), \alpha_j)\,\theta^* + v_j \qquad (24) $$
where $\theta^*$ is some true parameter vector (that is to be estimated), $\alpha_j$ is given by (21), and $v_j$ accommodates any disturbance due to measurement noise, modeling errors, the mismatch between $\alpha_j$ and the global minimum of (20), and so on. We are interested in the analysis of estimating $\theta^*$ using (23) in the presence of the disturbance signal $v_j$. That is, we take $\theta_j$ as an estimate of $\theta^*$ at the $j$th time instant and try to calculate an upper bound on the filtering errors. In a filtering setting, it is desired to estimate the quantity $G^T(x(j), \alpha_j)\,\theta^*$ using an adaptive model. The $\theta_{j-1}$ is the a priori estimate of $\theta^*$ at the $j$th time index and, thus, the a priori filtering error can be expressed as
$$ e_{f,j} = G^T(x(j), \alpha_j)\,\theta^* - G^T(x(j), \alpha_j)\,\theta_{j-1}. $$

One would normally expect $|G^T(x(j), \alpha_j)\theta^* - G^T(x(j), \alpha_j)\theta_{j-1}|^2$ to be the performance measure of an algorithm. However, the squared error as a performance measure does not seem to be suitable for a uniform analysis of all the algorithms. We introduce a generalized performance measure $P_\phi(a, b)$, defined for scalars $a$ and $b$ as
$$ P_\phi(a, b) = \int_a^b \bigl(\phi(r) - \phi(a)\bigr)\, dr \qquad (25) $$

Fig. 1. The value $P_\phi(a, b)$ is equal to the area of the shaded region.

where $\phi$ is a continuous, strictly increasing function with $\phi(0) = 0$. It can be easily seen that for $\phi(e) = e$, we have the normal squared error, i.e., $P_\phi(a, b) = (b - a)^2/2$. A different but integral-based loss function, called the matching loss for a continuous, increasing transfer function $\sigma$, was considered in [31] and [32] for a single neuron model. The matching loss for $\sigma$ was defined in [32] as
$$ M_\sigma(\hat y, y) = \int_{\sigma^{-1}(y)}^{\sigma^{-1}(\hat y)} \bigl(\sigma(r) - y\bigr)\, dr. $$
If we let $\Phi(r) = \int \phi(r)\, dr$, then
$$ P_\phi(a, b) = \Phi(b) - \Phi(a) - (b - a)\,\phi(a). \qquad (26) $$
Note that in definition (25), the continuous function $\phi$ is not an arbitrary function; its integral $\Phi(r) = \int \phi(r)\, dr$ must be a strictly convex function. The strictly increasing nature of $\phi(r)$ [i.e., the strict convexity of $\Phi(r)$] enables us to assess the mismatch between $a$ and $b$ using $P_\phi(a, b)$. Fig. 1 illustrates the physical meaning of $P_\phi(a, b)$: the value $P_\phi(a, b)$ is equal to the area of the shaded region in the figure. One could infer from Fig. 1 that a mismatch between $a$ and $b$ could be assessed via calculating the area of the shaded region [i.e., $P_\phi(a, b)$], provided the given function $\phi$ is strictly increasing. Usually, $P_\phi$ is not symmetric, i.e., $P_\phi(a, b) \ne P_\phi(b, a)$. One such type of performance measure [i.e., the matching loss $M_\sigma$] was considered previously in [22].
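Once $\Phi$ is known in closed form, the measure (25)–(26) is a one-liner to evaluate. A pure-Python sketch for two choices of $\phi$ ($\phi(e) = e$ with $\Phi(e) = e^2/2$, and $\phi(e) = \tanh(e)$ with $\Phi(e) = \ln\cosh e$, the choice mentioned later in inference 5 of Theorem 1):

```python
import math

def P(phi_name, a, b):
    """Performance measure (26): P(a, b) = Phi(b) - Phi(a) - (b - a) * phi(a)."""
    if phi_name == "identity":
        phi, Phi = (lambda e: e), (lambda e: 0.5 * e * e)
    elif phi_name == "tanh":
        phi, Phi = math.tanh, (lambda e: math.log(math.cosh(e)))
    else:
        raise ValueError(phi_name)
    return Phi(b) - Phi(a) - (b - a) * phi(a)

a, b = 0.3, 1.1
print(abs(P("identity", a, b) - 0.5 * (b - a) ** 2) < 1e-12)  # squared-error special case
print(P("tanh", a, b) > 0.0 and P("tanh", b, a) > 0.0)        # nonnegative for a != b
print(abs(P("tanh", a, b) - P("tanh", b, a)) > 0.0)           # but generally asymmetric
```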

In our analysis, we assess the instantaneous filtering error [i.e., the mismatch between $G^T(x(j), \alpha_j)\theta_{j-1}$ and $G^T(x(j), \alpha_j)\theta^*$] by calculating $P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr)$. For the case $\phi(e) = e$, we have $P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) = |e_{f,j}|^2/2$. The filtering performance of an algorithm, which is run from $j = 0$ to $j = k$, can be evaluated by calculating the sum
$$ \sum_{j=0}^{k} P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr). $$
Similarly, the magnitudes of the disturbances $v_j = y(j) - G^T(x(j), \alpha_j)\theta^*$, $j = 0, \ldots, k$, will be assessed by calculating


the mismatch between $y(j)$ and $G^T(x(j), \alpha_j)\theta^*$ as follows:
$$ \sum_{j=0}^{k} P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta^*\bigr). $$
Finally, the robustness of an algorithm (i.e., the sensitivity of the filtering errors toward disturbances) can be assessed by calculating an upper bound on a ratio such as (29). The term $d_q(\theta^*, \theta_{-1})$ in the denominator of (29) assesses the disturbance due to a mismatch between the initial guess $\theta_{-1}$ and the true vector $\theta^*$.

Lemma 1: Let $m$ be a scalar and $G_j \in \mathbb{R}^K$ such that $\theta_j = f^{-1}(f(\theta_{j-1}) + m\, G_j)$; then
$$ d_q(\theta_{j-1}, \theta_j) \le \frac{m^2}{2}\,(p - 1)\,\|G_j\|_p^2. $$

Proof: See [10, Lemma 4].

Lemma 2: If $\theta_j$ and $\theta_{j-1}$ are related via (23), then
$$ d_q(\theta^*, \theta_{j-1}) - d_q(\theta^*, \theta_j) + d_q(\theta_{j-1}, \theta_j) = \mu_j\, \phi\bigl(y(j) - G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\, (\theta^* - \theta_{j-1})^T G(x(j), \alpha_j). $$
Proof: The proof follows simply by using the definitions of $d_q(\theta^*, \theta_{j-1})$, $d_q(\theta^*, \theta_j)$, and $d_q(\theta_{j-1}, \theta_j)$.

In view of Lemma 1 and (23), we have
$$ d_q(\theta_{j-1}, \theta_j) \le \frac{\mu_j^2\, \bigl|\phi\bigl(y(j) - G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\bigr|^2}{2}\,(p - 1)\,\|G(x(j), \alpha_j)\|_p^2. \qquad (27) $$

Theorem 1: The estimation algorithm (21)–(23), with $\phi$ being a continuous strictly increasing function and $\phi(0) = 0$, for any $2 \le p < \infty$, with a learning rate
$$ 0 < \mu_j \le \frac{2\, P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta_{j-1}\bigr)}{\mathrm{den}} \qquad (28) $$
where
$$ \mathrm{den} = \phi\bigl(y(j) - G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\,(p - 1)\,\bigl[\phi(y(j)) - \phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\bigr]\,\|G(x(j), \alpha_j)\|_p^2 $$
achieves an upper bound on the filtering errors such that
$$ \frac{\sum_{j=0}^{k} a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr)}{\sum_{j=0}^{k} a_j\, P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta^*\bigr) + d_q(\theta^*, \theta_{-1})} \le 1 \qquad (29) $$
where $q$ is dual to $p$, and $a_j > 0$ is given as
$$ a_j = \frac{\mu_j\, \phi\bigl(y(j) - G^T(x(j), \alpha_j)\theta_{j-1}\bigr)}{\phi(y(j)) - \phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1}\bigr)}. \qquad (30) $$
Here, we assume that $y(j) \ne G^T(x(j), \alpha_j)\theta_{j-1}$, since there is no update of the parameters (i.e., $\theta_j = \theta_{j-1}$) if $y(j) = G^T(x(j), \alpha_j)\theta_{j-1}$.

Proof: Define $\Delta_j = d_q(\theta^*, \theta_{j-1}) - d_q(\theta^*, \theta_j)$; using Lemma 2, we have
$$ \Delta_j = \mu_j\, \phi\bigl(y(j) - G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\, (\theta^* - \theta_{j-1})^T G(x(j), \alpha_j) - d_q(\theta_{j-1}, \theta_j). \qquad (31) $$
Using (27), we have
$$ \Delta_j \ge \mu_j\, \phi\bigl(y(j) - G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\, (\theta^* - \theta_{j-1})^T G(x(j), \alpha_j) - \frac{p - 1}{2}\,\mu_j^2\, \bigl|\phi\bigl(y(j) - G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\bigr|^2\, \|G(x(j), \alpha_j)\|_p^2. $$
That is,
$$ \Delta_j \ge a_j\,\bigl[\phi(y(j)) - \phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\bigr]\,(\theta^* - \theta_{j-1})^T G(x(j), \alpha_j) - \frac{p - 1}{2}\,\mu_j\, \phi\bigl(y(j) - G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\, a_j\,\bigl[\phi(y(j)) - \phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\bigr]\,\|G(x(j), \alpha_j)\|_p^2. $$
Since $\mu_j$ satisfies (28), the previous inequality reduces to
$$ \Delta_j \ge a_j\,\bigl[\phi(y(j)) - \phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\bigr]\,(\theta^* - \theta_{j-1})^T G(x(j), \alpha_j) - a_j\, P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta_{j-1}\bigr). \qquad (32) $$
It can be verified using definition (26) that
$$ \bigl[\phi(y(j)) - \phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1}\bigr)\bigr]\,(\theta^* - \theta_{j-1})^T G(x(j), \alpha_j) = P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) - P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta^*\bigr) + P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta_{j-1}\bigr) $$
and thus, inequality (32) is further reduced to
$$ \Delta_j \ge a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) - a_j\, P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta^*\bigr). $$
In addition,
$$ \sum_{j=0}^{k} \Delta_j = d_q(\theta^*, \theta_{-1}) - d_q(\theta^*, \theta_k) \le d_q(\theta^*, \theta_{-1}), \quad \text{since } d_q(\theta^*, \theta_k) \ge 0 $$
resulting in
$$ d_q(\theta^*, \theta_{-1}) \ge \sum_{j=0}^{k} a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) - \sum_{j=0}^{k} a_j\, P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta^*\bigr) $$
from which the inequality (29) follows.

The following inferences could be immediately made from Theorem 1.

1) For the special case $\phi(e) = e$ and taking $\theta_{-1} = 0$, the results of Theorem 1 are modified as follows:
$$ \sum_{j=0}^{k} \mu_j\, \bigl|G^T(x(j), \alpha_j)\theta_{j-1} - G^T(x(j), \alpha_j)\theta^*\bigr|^2 \le \sum_{j=0}^{k} \mu_j\, \bigl|y(j) - G^T(x(j), \alpha_j)\theta^*\bigr|^2 + \|\theta^*\|_q^2 $$
where
$$ \mu_j \le \frac{1}{(p - 1)\,\|G(x(j), \alpha_j)\|_p^2}. $$


Choosing
$$ \mu_j = \frac{1}{(p - 1)\, U_p^2}, \quad \text{where } U_p \ge \|G(x(j), \alpha_j)\|_p $$
we get
$$ \sum_{j=0}^{k} \bigl|G^T(x(j), \alpha_j)\theta_{j-1} - G^T(x(j), \alpha_j)\theta^*\bigr|^2 \le \sum_{j=0}^{k} \bigl|y(j) - G^T(x(j), \alpha_j)\theta^*\bigr|^2 + (p - 1)\, U_p^2\, \|\theta^*\|_q^2 $$
which is formally equivalent to [9, Th. 2].

2) Inequality (29) illustrates the robustness property in the sense that if, for the given positive values $\{a_j\}_{j=0}^{k}$, the disturbances $\{v_j\}_{j=0}^{k}$ are small, i.e., $\sum_{j=0}^{k} P_\phi\bigl(y(j), G^T(x(j), \alpha_j)\theta^*\bigr)$ is small, then the filtering errors
$$ \sum_{j=0}^{k} P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) $$
remain small. The positive value $a_j$ represents a weight given to the $j$th data pair in the summation.

3) If we define an upper bound on the disturbance signal as
$$ v_{\max} = \max_j P_\phi\bigl(y(j),\, G^T(x(j), \alpha_j)\theta^*\bigr) $$
then it follows from (29) that
$$ \sum_{j=0}^{k} a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) \le (k + 1)\, a_{\max}\, v_{\max} + d_q(\theta^*, \theta_{-1}) $$
where $a_{\max} = \max_j a_j$. That is,
$$ \frac{1}{k + 1} \sum_{j=0}^{k} a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) \le a_{\max}\, v_{\max} + \frac{d_q(\theta^*, \theta_{-1})}{k + 1}. \qquad (33) $$
As $d_q(\theta^*, \theta_{-1})$ is finite, we have
$$ \frac{1}{k + 1} \sum_{j=0}^{k} a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) \le a_{\max}\, v_{\max}, \quad \text{as } k \to \infty. $$

Inequality (33) shows the stability of the algorithm against the disturbance $v_j$ in the sense that if the disturbance signal $P_\phi\bigl(y(j), G^T(x(j), \alpha_j)\theta^*\bigr)$ is bounded (i.e., $v_{\max}$ is finite), then the average value of the filtering errors, assessed as
$$ \frac{1}{k + 1} \sum_{j=0}^{k} a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) $$
remains bounded.

4) In the ideal case of zero disturbances (i.e., $v_j = 0$), we have $P_\phi\bigl(y(j), G^T(x(j), \alpha_j)\theta^*\bigr) = 0$, and
$$ \sum_{j=0}^{k} a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) \le d_q(\theta^*, \theta_{-1}). $$
Since $d_q(\theta^*, \theta_{-1})$ is finite and $a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1}, G^T(x(j), \alpha_j)\theta^*\bigr) \ge 0$, there must exist a sufficiently large index $T$ such that
$$ a_j\, P_\phi\bigl(G^T(x(j), \alpha_j)\theta_{j-1},\, G^T(x(j), \alpha_j)\theta^*\bigr) = 0, \qquad j \ge T. $$
In other words,
$$ G^T(x(j), \alpha_j)\theta_{j-1} = G^T(x(j), \alpha_j)\theta^*, \qquad j \ge T $$
since $a_j > 0$. This shows the convergence of the algorithm at time index $T$ toward the true parameters.

5) Theorem 1 cannot be applied for the case of $\phi(e) = \mathrm{sign}(e)$, since in this case $\phi$ is not a continuous strictly increasing function. However, one could instead choose $\phi(e) = \tanh(e)$, which has a shape similar to that of the sign function, and Theorem 1 still remains applicable. Note that in this case, the loss term in (8) will be
$$ L_j(\theta, \alpha) = \ln\Bigl(\cosh\bigl(y(j) - G^T(x(j), \alpha)\,\theta\bigr)\Bigr). $$

Theorem 1 is an important result due to the generality of the function $\phi$ and, thus, offers a possibility of studying, in a unified framework, different fuzzy adaptive algorithms corresponding to different choices of the continuous strictly increasing function $\phi$. Table II provides a few examples of $\phi$ (plotted in Fig. 2), leading to the different $p$-norm algorithms listed in the table as $A_{1,p}$, $A_{2,p}$, and so on.

    IV. SIMULATION STUDIES

This section provides simulation studies to illustrate the following.

1) The adaptive fuzzy filtering algorithms $A_{1,p}, A_{2,p}, \ldots$, $p \ge 2$, perform better than the commonly used gradient-descent algorithm.

2) For the estimation of linear parameters (keeping membership functions fixed), LMS is the standard algorithm known for its simplicity and robustness. This study provides a family of algorithms characterized by $\phi$ and $p$. It will be shown that the $p$-norm generalization of the algorithms corresponding to the different choices of $\phi$ may achieve better performance than the standard LMS algorithm.

3) We will show the robustness and convergence of the filtering algorithms.

4) For a given value of $p$, the algorithms corresponding to the different choices of $\phi$ may prove to be better than the standard choice $\phi(e) = e$ (i.e., the squared error loss term).

    A. First Example

Consider the problem of filtering noise from a chaotic time series. The time series is generated by simulating the


TABLE II: A FEW EXAMPLES OF ADAPTIVE FUZZY FILTERING ALGORITHMS

Fig. 2. Different examples of the function $\phi$.

Mackey–Glass differential delay equation
$$ \frac{dx}{dt} = \frac{0.2\, x(t - 17)}{1 + x^{10}(t - 17)} - 0.1\, x(t), \qquad y(t) = x(t) + n(t) $$
where $x(0) = 1.2$, $x(t) = 0$ for $t < 0$, and $n(t)$ is a random noise chosen from a uniform distribution on the interval $[-0.2, 0.2]$. The aim is to filter the noise $n(t)$ from $y(t)$ to estimate $x(t)$ by using a set of past values, i.e., $[x(t - 24), x(t - 18), x(t - 12), x(t - 6)]$.
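A noisy series of this kind can be reproduced with a standard fourth-order Runge–Kutta sketch. The step size `H = 0.1` and the freezing of the delayed term over one step are our own simplifying assumptions (the paper does not state its integration details):

```python
import random

TAU, H = 17.0, 0.1              # delay, and integration step (our assumption)
STEPS = int(round(700 / H))
LAG = int(round(TAU / H))

def mg_rhs(x_t, x_lag):
    """Right-hand side of the Mackey-Glass equation."""
    return 0.2 * x_lag / (1.0 + x_lag ** 10) - 0.1 * x_t

x = [1.2]                       # x(0) = 1.2, and x(t) = 0 for t < 0
for n in range(STEPS):
    lag = x[n - LAG] if n >= LAG else 0.0
    # Classical RK4; the delayed term is held fixed over the step.
    k1 = mg_rhs(x[n], lag)
    k2 = mg_rhs(x[n] + 0.5 * H * k1, lag)
    k3 = mg_rhs(x[n] + 0.5 * H * k2, lag)
    k4 = mg_rhs(x[n] + H * k3, lag)
    x.append(x[n] + (H / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4))

def x_at(t):
    """Sample the simulated trajectory at integer time t."""
    return x[int(round(t / H))]

random.seed(0)
y = {t: x_at(t) + random.uniform(-0.2, 0.2) for t in range(124, 624)}
# Regressor for time t: [x(t-24), x(t-18), x(t-12), x(t-6)], built from the samples.
```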

Assume that there exists an ideal fuzzy model characterized by $(\alpha^*, \theta^*)$ that models the relationship between the input vector $[x(t - 24)\; x(t - 18)\; x(t - 12)\; x(t - 6)]^T$ and the output $x(t)$. That is,
$$ x(t) = G^T\bigl([x(t - 24)\; x(t - 18)\; x(t - 12)\; x(t - 6)]^T, \alpha^*\bigr)\,\theta^* $$
$$ y(t) = G^T\bigl([x(t - 24)\; x(t - 18)\; x(t - 12)\; x(t - 6)]^T, \alpha^*\bigr)\,\theta^* + n(t). $$
The fourth-order Runge–Kutta method was used for the simulation of the time series, and 500 input–output data pairs, from $t = 124$ to $t = 623$, were extracted for an adaptive estimation of the parameters $(\alpha^*, \theta^*)$ using different algorithms. If $(\alpha_{t-1}, \theta_{t-1})$ denotes the a priori estimate at time $t$, then the filtering error of an algorithm is defined as
$$ e_{f,t} = x(t) - \hat y(t) $$
where
$$ \hat y(t) = G^T\bigl([x(t - 24)\; x(t - 18)\; x(t - 12)\; x(t - 6)]^T, \alpha_{t-1}\bigr)\,\theta_{t-1}. $$
We consider a total of 30 different algorithms, i.e., $A_{1,p}, \ldots, A_{5,p}$ for $p = 2, 2.2, 2.4, 2.6, 2.8, 3$, running from $t = 124$ to $t = 623$, for the prediction of the desired value $x(t)$. We choose, for example's sake, the trapezoidal type of membership functions [defined by (36)] such that the number of membership functions assigned to each of the four inputs [i.e., $x(t - 24)$, $x(t - 18)$, $x(t - 12)$, $x(t - 6)$] is equal to 3. The initial guess for the model parameters was taken as
$$ \alpha_{-1} = \begin{bmatrix} [0\;\, 0.3\;\, 0.6\;\, 0.9\;\, 1.2\;\, 1.5]^T \\ [0\;\, 0.3\;\, 0.6\;\, 0.9\;\, 1.2\;\, 1.5]^T \\ [0\;\, 0.3\;\, 0.6\;\, 0.9\;\, 1.2\;\, 1.5]^T \\ [0\;\, 0.3\;\, 0.6\;\, 0.9\;\, 1.2\;\, 1.5]^T \end{bmatrix}, \qquad \theta_{-1} = [0]_{81 \times 1}. $$
The matrix $c$ and the vector $h$ are chosen in such a way that two consecutive knots must remain separated by a distance of at least 0.001 during the recursions of the algorithms.

The gradient-descent algorithm (1), if intuitively applied to the fuzzy model (5), needs the following considerations.

1) The antecedent parameters of the fuzzy model should be estimated with the learning rate $\mu_\alpha$, while the consequent


parameters with the learning rate $\mu_\theta$. That is,
$$ \begin{bmatrix} \alpha_j \\ \theta_j \end{bmatrix} = \begin{bmatrix} \alpha_{j-1} \\ \theta_{j-1} \end{bmatrix} - \begin{bmatrix} \mu_\alpha I & 0 \\ 0 & \mu_\theta I \end{bmatrix} \left.\frac{\partial E_r(\alpha, \theta, j)}{\partial (\alpha, \theta)}\right|_{(\alpha_{j-1},\, \theta_{j-1})}, \qquad E_r(\alpha, \theta, j) = \frac{1}{2}\bigl[y(j) - G^T(x(j), \alpha)\,\theta\bigr]^2. $$
Introducing the notation $w_j = [\alpha_j^T\;\, \theta_j^T]^T$, the gradient-descent update takes the form
$$ w_j = w_{j-1} - \begin{bmatrix} \mu_\alpha I & 0 \\ 0 & \mu_\theta I \end{bmatrix} \left.\frac{\partial E_r(w, j)}{\partial w}\right|_{w_{j-1}}. $$
Here, $\mu_\alpha$ may or may not be equal to $\mu_\theta$. In general, $\mu_\alpha$ should be smaller than $\mu_\theta$ to avoid oscillations of the estimated parameters.

2) During the gradient-descent estimation of the membership function parameters, in the presence of disturbances, the knots (elements of the vector $\alpha$) may attempt to come close to (or even cross) one another. That is, inequalities (38) and (39) do not hold good, thus resulting in a loss of interpretability and of estimation performance. For a better performance of gradient descent, the knots must be prevented from crossing one another by modifying the estimation scheme as
$$ w_j = \begin{cases} w_{j-1} - \begin{bmatrix} \mu_\alpha I & 0 \\ 0 & \mu_\theta I \end{bmatrix} \left.\dfrac{\partial E_r(w, j)}{\partial w}\right|_{w_{j-1}}, & \text{if } c\,\alpha_j \le h \\[6pt] w_{j-1}, & \text{otherwise.} \end{cases} \qquad (34) $$
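The guard in (34) is simple to implement: take the joint gradient step, and keep it only if the updated knots still satisfy the admissibility check. The following is a generic pure-Python sketch under our own assumptions: a single sorted knot vector, a `knots_ok` predicate standing in for $c\,\alpha_j \le h$, and the 0.001 minimum separation mentioned in the text:

```python
def knots_ok(alpha, min_gap=0.001):
    """Interpretability check standing in for c*alpha <= h:
    consecutive knots must stay separated by at least min_gap."""
    return all(b - a >= min_gap for a, b in zip(alpha, alpha[1:]))

def guarded_gd_step(alpha, theta, grad_alpha, grad_theta, mu_a, mu_t):
    """Update (34): apply the gradient step only if the new knot vector is admissible."""
    new_alpha = [a - mu_a * g for a, g in zip(alpha, grad_alpha)]
    new_theta = [t - mu_t * g for t, g in zip(theta, grad_theta)]
    if knots_ok(new_alpha):
        return new_alpha, new_theta
    return alpha, theta          # otherwise keep the previous parameters

# A step that would make knots cross is rejected; a gentle one is accepted.
alpha, theta = [0.0, 0.3, 0.6], [0.1, 0.2]
a1, _ = guarded_gd_step(alpha, theta, [0.0, 0.4, 0.0], [0.0, 0.0], 1.0, 1.0)
a2, _ = guarded_gd_step(alpha, theta, [0.0, 0.01, 0.0], [0.0, 0.0], 1.0, 1.0)
print(a1 == alpha, a2 != alpha)  # True True
```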

Each of the aforementioned algorithms and the gradient-descent algorithm (34) is run at $\mu_\alpha = \mu_\theta = 0.9$. The filtering performance of each algorithm is assessed via computing the energy of the filtering error signal $e_{f,t}$. The energy of a signal is equal to its squared $L_2$-norm; the energy of the filtering error signal $e_{f,t}$, from $t = 124$ to $t = 623$, is defined as
$$ \sum_{t=124}^{623} |e_{f,t}|^2. $$
A higher energy of the filtering errors means higher magnitudes of the filtering errors and, thus, a poorer performance of the filtering algorithm.

    Fig. 3 compares the different algorithms by plotting their filtering-error energies at different values of $p$. As seen from Fig. 3, all the algorithms $A_{1,p}, \ldots, A_{5,p}$ for $p = 2, 2.2, 2.4, 2.6, 2.8, 3$ perform better than gradient descent, since the gradient-descent method is associated with a higher energy of the filtering errors. As an illustration, Fig. 4 shows the time plot of the absolute filtering error $|e_{f,t}|$ for gradient descent and the algorithm $A_{4,p}$ at $p = 2.4$.

    Fig. 3. Filtering performance of different adaptive algorithms.

    Fig. 4. Time plot of the absolute filtering error $|e_{f,t}|$ from $t = 124$ to $t = 623$ for gradient descent and the algorithm $A_{4,2.4}$.

    B. Second Example

    Now, we consider a linear fuzzy model (membership functions being fixed)

    $$y(j) = \left[\mu_{A_{11}}(x(j)) \;\; \mu_{A_{21}}(x(j)) \;\; \mu_{A_{31}}(x(j)) \;\; \mu_{A_{41}}(x(j))\right] \alpha^* + v_j$$

    where $\alpha^* = [\,0.25 \;\; 0.5 \;\; 1 \;\; 0.3\,]^T$, $x(j)$ takes random values from a uniform distribution on $[-1, 1]$, $v_j$ is a random noise chosen from a uniform distribution on the interval $[-0.2, 0.2]$, and the membership functions are defined by (37) taking $\tau = (-1, -0.3, 0.3, 1)$.

    Algorithm (23) was employed to estimate $\alpha^*$ for different choices of $\phi$ (listed in Table II) at $p = 3$ (as an example of $p > 2$). The initial guess is taken as $\alpha_{-1} = 0$. For comparison, the LMS algorithm is also simulated. Note that the LMS algorithm is just a particularization of (23) for $p = 2$ and $\phi(e) = e$. That is,


    Fig. 5. Comparison of the algorithms ($A_{1,3}, A_{2,3}, A_{3,3}, A_{4,3}, A_{5,3}$) with the LMS.

    $A_{2,2}$ in Table II, in the context of linear estimation, is the LMS algorithm. The performance of an algorithm was evaluated by calculating the instantaneous a priori error in the estimation of $\alpha^*$ as $\|\alpha^* - \alpha_{j-1}\|^2$. The learning rate $\eta_j$ for each algorithm, including LMS, is chosen according to (28) as follows:

    $$\eta_j = \frac{2P\left(y(j), G_j^T \alpha_{j-1}\right)\left(y(j) - G_j^T \alpha_{j-1}\right)}{\left[\phi(y(j)) - \phi(G_j^T \alpha_{j-1})\right](p-1)\,\|G_j\|_p^2}$$

    where

    $$G_j = \left[\mu_{A_{11}}(x(j)) \;\; \mu_{A_{21}}(x(j)) \;\; \mu_{A_{31}}(x(j)) \;\; \mu_{A_{41}}(x(j))\right]^T.$$

    For an assessment of the expected error values, i.e., $E[\|\alpha^* - \alpha_{j-1}\|^2]$, 500 independent experiments have been performed. The 500 independent time plots of the values $\|\alpha^* - \alpha_{j-1}\|^2$ have been averaged to obtain the time plot of $E[\|\alpha^* - \alpha_{j-1}\|^2]$, shown in Fig. 5. The better performance of the algorithms ($A_{1,3}, A_{2,3}, A_{3,3}, A_{4,3}, A_{5,3}$) than the LMS can be seen in Fig. 5.

    This example indicates that the $p$-norm generalization of the algorithms makes sense, since the $A_{2,p}$ algorithm for $p = 3$ (a value of $p > 2$) proved better than the $p = 2$ case (i.e., LMS).
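    For reference, the plain p-norm LMS of [9] (i.e., the $\phi(e) = e$ special case that the algorithms above generalize) can be sketched as follows; the helper `link` is the gradient map of $\frac{1}{2}\|w\|_r^2$, and the function names are ours:

    ```python
    import numpy as np

    def link(w, r):
        """Gradient of (1/2)||w||_r^2; maps between primal and dual weights."""
        nr = np.linalg.norm(w, ord=r)
        if nr == 0.0:
            return np.zeros_like(w)
        return np.sign(w) * np.abs(w) ** (r - 1) / nr ** (r - 2)

    def pnorm_lms_step(w, G, y, eta, p):
        """One p-norm LMS update: step in the dual space given by the
        conjugate exponent q, then map back (link(., p) inverts link(., q))."""
        q = p / (p - 1.0)        # conjugate exponent: 1/p + 1/q = 1
        e = y - G @ w            # a priori error (phi(e) = e case)
        return link(link(w, q) + eta * e * G, p)
    ```

    For $p = 2$ the two link maps are identities and the update reduces to the standard LMS step $w + \eta\, e\, G$.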

    C. Robustness and Convergence

    To study the robustness properties of the algorithms, we investigate the sensitivity of the filtering errors toward disturbances. To make this more precise, we plot the curve between the disturbance energy and the filtering-error energy and analyze the curve. In the aforementioned example (i.e., the second example), the energy of the filtering errors will be defined as

    $$E_f(j) = \sum_{i=0}^{j} \left|G_i^T \alpha^* - G_i^T \alpha_{i-1}\right|^2$$

    Fig. 6. Filtering-error energy as a function of the energy of disturbances. The curves are averaged over 100 independent experiments.

    and the total energy of disturbances as

    $$E_d(j) = \sum_{i=0}^{j} |v_i|^2 + \|\alpha^* - \alpha_{-1}\|^2$$

    where the term $\|\alpha^* - \alpha_{-1}\|^2$ accounts for the disturbance due to a mismatch between the initial guess and the true parameters. Fig. 6 shows the plot between the values $\{E_d(j)\}_{j=0}^{1000}$ and $\{E_f(j)\}_{j=0}^{1000}$ for each of the algorithms ($A_{1,3}, A_{2,3}, A_{3,3}, A_{4,3}, A_{5,3}$). The curves for ($A_{2,3}, A_{4,3}, A_{5,3}$) are not distinguishable. The curves in Fig. 6 have been averaged over 100 independent experiments. The initial guess $\alpha_{-1}$ is chosen in each experiment randomly from a uniform distribution on $[-1, 1]$. The curves in Fig. 6 show the robustness of the algorithms in the sense that a small energy of disturbances does not lead to a large energy of filtering errors. Thus, if the disturbances are bounded, then the filtering errors also remain bounded [i.e., bounded-input bounded-output (BIBO) stability].

    To verify the convergence properties of the algorithms, the plots of Fig. 5 are redrawn by running the algorithms ($A_{1,3}, A_{2,3}, A_{3,3}, A_{4,3}, A_{5,3}$) in an ideal case of zero disturbances (i.e., $v_j = 0$). Fig. 7 shows, in this case, the convergence of the algorithms toward the true parameters.

    D. Third Example

    Our third example has been taken from [9], where, at time $t$, a signal $y_t$ is transmitted over a noisy channel. The recipient is required to estimate the sent signal $y_t$ out of the actually observed signal

    $$r_t = \sum_{i=0}^{k-1} u_{i+1}\, y_{t-i} + v_t$$

    where $v_t$ is zero-mean Gaussian noise with a signal-to-noise ratio of 10 dB. The signal is estimated using an adaptive filter

    $$\hat{y}_t = G_t^T w_{t-1}, \qquad G_t = [\,r_{t-m} \;\cdots\; r_t \;\cdots\; r_{t+m}\,] \in \mathbb{R}^{2m+1}$$


    Fig. 7. Convergence of the algorithms in the ideal case toward the true parameters.

    where $w_t \in \mathbb{R}^{2m+1}$ are the estimated filter parameters at time $t$. We take the values $k = 10$ and $m = 15$ as in [9]. The transmitted signal $y_t$ is chosen to be zero-mean Gaussian with unit variance; however, $y_t$ was a binary signal in [9] (which is obviously not quite the same as our framework). The vector $u \in \mathbb{R}^k$, describing the channel, was chosen in [9] in two different manners.

    1) In the first case, $u$ is chosen from a Gaussian distribution with unit variance and then normalized to make $\|u\| = 1$.

    2) In the second case, $u_i = s_i e^{z_i}$, where $s_i \in \{-1, 1\}$ and $z_i \in [-10, 10]$ are distributed uniformly, and $u$ is then normalized to make $\|u\| = 1$.
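    The two channel constructions above can be reproduced as follows (a sketch with our own naming; `sparse=True` gives the second, nearly sparse case):

    ```python
    import numpy as np

    def make_channel(k, sparse, rng):
        """Channel vector u of length k, normalized to unit Euclidean norm."""
        if not sparse:
            u = rng.standard_normal(k)               # dense Gaussian taps
        else:
            s = rng.choice([-1.0, 1.0], size=k)      # random signs s_i
            z = rng.uniform(-10.0, 10.0, size=k)     # uniform exponents z_i
            u = s * np.exp(z)                        # u_i = s_i * e^{z_i}
        return u / np.linalg.norm(u)
    ```

    Because the exponents $z_i$ range over $[-10, 10]$, a few taps dominate after normalization, which is what makes the second channel effectively sparse.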

    Algorithm (23) was used to estimate the filter parameters taking different choices of $\phi$ at $p = 2.5$ in the first case. However, the second case (with $u$ being sparse), as discussed in [9], favors a fairly large value of $p$ [i.e., $p = 2\ln(2m+1)$]. The learning rate is chosen to be equal to (28) as in the previous example. The instantaneous filtering error is defined as

    $$e_{f,t} = y_t - G_t^T w_{t-1}.$$

    The performance of the different filtering algorithms is assessed by calculating the root mean square of the filtering errors:

    $$\mathrm{RMSFE}(t) = \left(\frac{1}{t+1} \sum_{i=0}^{t} |e_{f,i}|^2\right)^{1/2}.$$
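    The running RMSFE can be computed in vectorized form; this is a small utility sketch, not code from the paper:

    ```python
    import numpy as np

    def rmsfe(errors):
        """RMSFE(t) = sqrt( (1/(t+1)) * sum_{i<=t} |e_{f,i}|^2 ) for all t."""
        e = np.asarray(errors, dtype=float)
        cum = np.cumsum(e ** 2)          # running sum of squared errors
        t = np.arange(1, e.size + 1)     # t + 1 = 1, 2, ..., len(e)
        return np.sqrt(cum / t)
    ```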

    The time plots of the root mean square of filtering errors, averaged over 100 independent experiments, are shown in Fig. 8.

    Fig. 8(a) shows the faster convergence of algorithm $A_{1,p}$ than $A_{2,p}$, while Fig. 8(b) shows the faster convergence of $A_{3,p}$ than $A_{2,p}$. This indicates that the algorithms corresponding to the different choices of $\phi$ (i.e., $A_{1,p}$, $A_{3,p}$, etc.) may prove better in some sense than the standard choice $\phi(e) = e$ (i.e., algorithm $A_{2,p}$). Hence, the proposed framework, which offers the possibility of developing filtering algorithms corresponding to the different choices of $\phi$, is a useful tool.

    The provided simulation studies clearly indicate the potential of our approach in adaptive filtering. The first example

    shows the better filtering performance of our approach than the most commonly used gradient-descent algorithm for estimating the parameters of nonlinear neural/fuzzy models. Some hybrid methods, e.g., clustering for the membership functions and the RLS algorithm for the consequents, have been suggested in the literature for an online identification of the fuzzy models. It is a well-known fact that RLS optimizes the average (expected) performance under some statistical assumptions, while LMS optimizes the worst-case performance. Since our algorithms are the generalized versions of LMS and possess LMS-like robustness properties (as indicated in Theorem 1), we do not compare the RLS algorithm with our algorithms. Moreover, for a fair comparison, RLS must be generalized as we generalized LMS in our analysis. More will be said on the generalization of the RLS algorithm in Section V.

    V. SOME REMARKS

    This text outlines an approach to adaptive fuzzy filtering in a

    broad sense, and thus, several related studies could be made. In

    particular, we would like to mention the following.

    1) We studied the adaptive filtering problem using a zero-

    order Takagi–Sugeno fuzzy model due to its simplicity.

    However, the approach can be applied to any semilinear

    model with linear inequality constraints, e.g., first-order

    Takagi–Sugeno fuzzy models, radial basis function (RBF)

    neural networks, B-spline models, etc. The approach is

    valid for any model characterized by a parameter set $\theta = (\theta_l, \theta_n)$ such that

    $$y = G^T(x, \theta_n)\,\theta_l, \qquad c\,\theta_n \le h.$$

    2) For the $p$-norm generalization of the algorithms, we have considered the Bregman divergence associated to the squared $q$-norm. Another important example of Bregman divergences is the relative entropy, which is defined between vectors $u = [u_1 \cdots u_K]^T$ and $w = [w_1 \cdots w_K]^T$ (assuming $u_i, w_i \ge 0$ and $\sum_{i=1}^{K} u_i = \sum_{i=1}^{K} w_i = 1$) as follows:

    $$d_{RE}(u, w) = \sum_{i=1}^{K} u_i \ln\frac{u_i}{w_i}.$$

    The relative entropy is the Bregman divergence $d_F(u, w)$ for $F(u) = \sum_{i=1}^{K} (u_i \ln u_i - u_i)$. It is possible to derive and analyze different exponentiated-gradient [29] type fuzzy filtering algorithms by using the relative entropy as the regularizer and following the same approach. However, some additional efforts are required to handle the unity sum constraint on the vectors.
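    The claim that the relative entropy is the Bregman divergence of $F(u) = \sum_i (u_i \ln u_i - u_i)$ is easy to check numerically; the sketch below uses the generic definition $d_F(u, w) = F(u) - F(w) - \langle \nabla F(w), u - w \rangle$ (the function names are ours):

    ```python
    import numpy as np

    def bregman(F, gradF, u, w):
        """Generic Bregman divergence d_F(u, w)."""
        return F(u) - F(w) - gradF(w) @ (u - w)

    # F(u) = sum_i (u_i ln u_i - u_i), so grad F(u)_i = ln u_i.
    F = lambda u: np.sum(u * np.log(u) - u)
    gradF = lambda u: np.log(u)

    def relative_entropy(u, w):
        return np.sum(u * np.log(u / w))
    ```

    For probability vectors ($\sum_i u_i = \sum_i w_i = 1$) the linear terms cancel and `bregman(F, gradF, u, w)` coincides with `relative_entropy(u, w)`.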

    3) We arrived at the explicit update form (14) by making an approximation. A natural question that arises is whether any improvement in the filtering performance (assessed in Theorem 1) could be made by solving (13) numerically instead of approximating. An upper bound on the filtering errors (as in Theorem 1) could be calculated in this case too; however, we would then be considering the a posteriori filtering errors $\sum_{j=0}^{k} P\left(G^T(x(j), \tau_j)\,\alpha_j, \; G^T(x(j), \tau^*)\,\alpha^*\right)$.


    Fig. 8. RMSFE($t$) as a function of time. The curves are averaged over 100 independent experiments. (a) $p = 2.5$ with $u$ being dense. (b) $p = 2\ln(2m+1)$ with $u$ being sparse.

    4) Our emphasis in this study was on filtering. In the machine learning literature, one is normally interested in the prediction performance of such algorithms [10], [33], [34]. The presented algorithms could be evaluated, in a similar manner as in Theorem 1, by calculating an upper bound on the prediction errors $\sum_{j=0}^{k} P\left(G^T(x(j), \tau_{j-1})\,\alpha_{j-1}, \; y(j)\right)$.

    5) This study offers the possibility of developing and analyzing new fuzzy filtering algorithms by defining a continuous strictly increasing function $\phi$. An interesting research direction is to optimize the function $\phi$ for the problem at hand. For example, in the context of linear estimation, an expression for the optimum function that minimizes the steady-state mean-square error is derived in [8].

    6) Other than the LMS, the recursive least squares (RLS) is a well-known algorithm that optimizes the average performance under some stochastic assumptions on the signals. The deterministic interpretation of the RLS algorithm is that it solves a regularized least-squares problem

    $$w_k = \arg\min_w \sum_{j=0}^{k} \left|y(j) - G_j^T w\right|^2 + \eta^{-1}\|w\|^2.$$

    Our future study is concerned with the generalization of the RLS algorithm, in the context of interpretable fuzzy models, based on the solution of the following regularized least-squares problem:

    $$(\tau_k, \alpha_k) = \arg\min_{(\tau, \alpha,\; c\tau \le h)} J_k$$

    $$J_k = \sum_{j=0}^{k} L_j(\tau, \alpha) + \eta^{-1} d_q(\alpha, \alpha_{-1}) + \eta^{-1} d_q(\tau, \tau_{-1})$$

    where some examples of the loss term $L_j(\tau, \alpha)$ are provided in Table II.
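    The deterministic RLS objective can be illustrated with a batch (non-recursive) solver via the normal equations; this is a sketch under the assumption that the regularization weight is $\eta^{-1}$, and the function name is ours:

    ```python
    import numpy as np

    def regularized_ls(G, y, eta):
        """Solve min_w sum_j |y(j) - G_j^T w|^2 + (1/eta)||w||^2.
        Rows of G are the regressors G_j^T; closed form via normal equations."""
        n = G.shape[1]
        return np.linalg.solve(G.T @ G + np.eye(n) / eta, G.T @ y)
    ```

    RLS computes the same sequence of minimizers recursively, one data pair at a time, instead of refactoring the normal equations at every step.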

    VI. CONCLUSION

    Much work has been done on applying fuzzy models in function approximation and classification tasks. We feel that many

    real-world applications (e.g., in chemistry [35], biomedical engineering [2], etc.) require the filtering of uncertainties from

    the experimental data. The nonlinear fuzzy models by virtue

    of membership functions are more promising than the classical

    linear models. Therefore, it is essential to study the adaptive

    fuzzy filtering algorithms. Adaptive filtering theory for linear

    models has been well developed in the literature; however, its extension to the fuzzy models is complicated by the nonlinearity of membership functions and the interpretability constraints. The contribution of the manuscript (summarized in

    Theorem 1, its inferences, and Table II) is to provide a mathematical framework that allows the development and an analysis

    of the adaptive fuzzy filtering algorithms. The power of our

    approach is the flexibility of designing the algorithms based on the choice of the function $\phi$ and the parameter $p$. The derived filtering algorithms have the desired properties of robustness,

    stability, and convergence. This paper is an attempt to provide

    a deterministic approach to study the adaptive fuzzy filtering

    algorithms, and the study opens many research directions, as

    discussed in Section V. Future work involves the study of fuzzy

    filtering algorithms derived using the relative entropy as a regularizer, optimizing the function $\phi$ for the problem at hand, and a generalization of the RLS algorithm to the $p$-norm in the context of interpretable fuzzy models.

    APPENDIX

    TAKAGI–SUGENO FUZZY MODEL

    Let us consider an explicit mathematical formulation of a

    Sugeno-type fuzzy inference system that assigns to each crisp

    value (vector) in input space a crisp value in output space.

    Consider a Sugeno fuzzy inference system ($F_s : X \to Y$), mapping the $n$-dimensional input space ($X = X_1 \times X_2 \times \cdots \times X_n$) to


    the one-dimensional real line, consisting of $K$ different rules. The $i$th rule is of the form

    If $x_1$ is $A_{i1}$ and $x_2$ is $A_{i2}$ and $\cdots$ and $x_n$ is $A_{in}$, then $y = c_i$

    for all $i = 1, 2, \ldots, K$, where $A_{i1}, A_{i2}, \ldots, A_{in}$ are nonempty fuzzy subsets of $X_1, X_2, \ldots, X_n$, respectively, such that the membership functions $\mu_{A_{ij}} : X_j \to [0, 1]$ fulfill $\sum_{i=1}^{K} \prod_{j=1}^{n} \mu_{A_{ij}}(x_j) > 0$ for all $x_j \in X_j$, and the values $c_1, \ldots, c_K$ are real numbers. The different rules, by using product as the conjunction operator, can be aggregated as

    $$F_s(x_1, x_2, \ldots, x_n) = \frac{\sum_{i=1}^{K} c_i \prod_{j=1}^{n} \mu_{A_{ij}}(x_j)}{\sum_{i=1}^{K} \prod_{j=1}^{n} \mu_{A_{ij}}(x_j)}. \qquad (35)$$
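    The aggregation (35) is a firing-strength-weighted average of the rule consequents; a minimal sketch (our own naming), with the membership grades $\mu_{A_{ij}}(x_j)$ precomputed into a $K \times n$ array:

    ```python
    import numpy as np

    def sugeno_output(memberships, c):
        """Zero-order Sugeno inference (35).
        memberships[i, j] = mu_{A_ij}(x_j); c[i] = consequent of rule i."""
        w = np.prod(memberships, axis=1)   # rule firing strengths (product t-norm)
        return float(w @ c / np.sum(w))    # weighted average of consequents
    ```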

    Let us define a real vector $\tau$ such that the membership functions can be constructed from the elements of the vector $\tau$. To illustrate the construction of membership functions based on the knot vector ($\tau$), consider the following examples.

    1) Trapezoidal membership functions: Let

    $$\tau = \left[a_1, t_1^1, \ldots, t_1^{2P_1-2}, b_1, \ldots, a_n, t_n^1, \ldots, t_n^{2P_n-2}, b_n\right]$$

    such that, for the $i$th input ($x_i \in [a_i, b_i]$), there exists some $\delta_i > 0$ for all $i = 1, \ldots, n$ such that, for trapezoidal membership functions,

    $$t_i^1 - a_i \ge \delta_i$$
    $$t_i^{j+1} - t_i^j \ge \delta_i \quad \text{for all } j = 1, 2, \ldots, (2P_i - 3)$$
    $$b_i - t_i^{2P_i-2} \ge \delta_i.$$

    These inequalities can be written in terms of a matrix inequality $c\tau \le h$ [18], [37]–[42]. Hence, the output of a Sugeno-type fuzzy model

    $$F_s(x) = G^T(x, \tau)\,\alpha, \qquad c\tau \le h$$

    is linear in the consequents (i.e., $\alpha$) but nonlinear in the antecedents (i.e., $\tau$).
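    For a single input on $[a, b]$, the knot-separation inequalities can be packed into the matrix form $c\tau \le h$ as follows (an illustrative construction; the row ordering is our own choice):

    ```python
    import numpy as np

    def knot_constraints(n_knots, a, b, delta):
        """Build (c, h) with c @ tau <= h encoding:
        tau_1 - a >= delta, tau_{j+1} - tau_j >= delta, b - tau_last >= delta."""
        rows = n_knots + 1
        c = np.zeros((rows, n_knots))
        h = np.full(rows, -delta)
        c[0, 0] = -1.0
        h[0] = -a - delta                 # -tau_1 <= -a - delta
        for j in range(n_knots - 1):
            c[j + 1, j] = 1.0             # tau_j - tau_{j+1} <= -delta
            c[j + 1, j + 1] = -1.0
        c[-1, -1] = 1.0
        h[-1] = b - delta                 # tau_last <= b - delta
        return c, h
    ```

    The knots then stay inside $[a, b]$ and keep a minimum separation of `delta` (e.g., the 0.001 used in the first example).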

    REFERENCES

    [1] M. Kumar, K. Thurow, N. Stoll, and R. Stoll, Robust fuzzy mappings for QSAR studies, Eur. J. Med. Chem., vol. 42, no. 5, pp. 675–685, 2007.

    [2] M. Kumar, M. Weippert, R. Vilbrandt, S. Kreuzfeld, and R. Stoll, Fuzzy evaluation of heart rate signals for mental stress assessment, IEEE Trans. Fuzzy Syst., vol. 15, no. 5, pp. 791–808, Oct. 2007.

    [3] A. H. Sayed, Fundamentals of Adaptive Filtering. New York: Wiley, 2003.

    [4] S. Haykin, Adaptive Filter Theory, 3rd ed. New York: Prentice–Hall, 1996.

    [5] B. Hassibi, A. H. Sayed, and T. Kailath, H∞ optimality of the LMS algorithm, IEEE Trans. Signal Process., vol. 44, no. 2, pp. 267–280, Feb. 1996.

    [6] N. R. Yousef and A. H. Sayed, A unified approach to the steady-state and tracking analyses of adaptive filters, IEEE Trans. Signal Process., vol. 49, no. 2, pp. 314–324, Feb. 2001.

    [7] T. Y. Al-Naffouri and A. H. Sayed, Transient analysis of adaptive filters with error nonlinearities, IEEE Trans. Signal Process., vol. 51, no. 3, pp. 653–663, Mar. 2003.

    [8] T. Y. Al-Naffouri and A. H. Sayed, Adaptive filters with error nonlinearities: Mean-square analysis and optimum design, EURASIP J. Appl. Signal Process., vol. 2001, no. 4, pp. 192–205, 2001.

    [9] J. Kivinen, M. K. Warmuth, and B. Hassibi, The p-norm generalization of the LMS algorithm for adaptive filtering, IEEE Trans. Signal Process., vol. 54, no. 5, pp. 1782–1793, May 2006.

    [10] C. Gentile, The robustness of the p-norm algorithms, Mach. Learning, vol. 53, no. 3, pp. 265–299, 2003.

    [11] J. S. R. Jang, C. T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Upper Saddle River, NJ: Prentice–Hall, 1997.

    [12] J. Abonyi, L. Nagy, and F. Szeifert, Adaptive fuzzy inference system and its application in modelling and model based control, Chem. Eng. Res. Des., vol. 77, pp. 281–290, Jun. 1999.

    [13] P. P. Angelov and D. P. Filev, An approach to online identification of Takagi–Sugeno fuzzy models, IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 1, pp. 484–498, Feb. 2004.

    [14] M. J. Er, Z. Li, H. Cai, and Q. Chen, Adaptive noise cancellation using enhanced dynamic fuzzy neural networks, IEEE Trans. Fuzzy Syst., vol. 13, no. 3, pp. 331–342, Jun. 2005.

    [15] C.-F. Juang and C.-T. Lin, Noisy speech processing by recurrently adaptive fuzzy filters, IEEE Trans. Fuzzy Syst., vol. 9, no. 1, pp. 139–152, Feb. 2001.

    [16] B. Hassibi, A. H. Sayed, and T. Kailath, H∞ optimality criteria for LMS and backpropagation, in Advances in Neural Information Processing Systems, vol. 6, J. D. Cowan, G. Tesauro, and J. Alspector, Eds. San Mateo, CA: Morgan Kaufmann, Apr. 1994, pp. 351–359.

    [17] W. Yu and X. Li, Fuzzy identification using fuzzy neural networks with stable learning algorithms, IEEE Trans. Fuzzy Syst., vol. 12, no. 3, pp. 411–420, Jun. 2004.

    [18] M. Kumar, N. Stoll, and R. Stoll, An energy gain bounding approach to robust fuzzy identification, Automatica, vol. 42, no. 5, pp. 711–721, May 2006.

    [19] M. Kumar, R. Stoll, and N. Stoll, Deterministic approach to robust adaptive learning of fuzzy models, IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 36, no. 4, pp. 767–780, Aug. 2006.

    [20] K. S. Azoury and M. K. Warmuth, Relative loss bounds for on-line density estimation with the exponential family of distributions, Mach. Learning, vol. 43, no. 3, pp. 211–246, Jun. 2001.

    [21] L. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Comput. Math. Phys., vol. 7, pp. 200–217, 1967.

    [22] J. Kivinen and M. K. Warmuth, Relative loss bounds for multidimensional regression problems, Mach. Learning, vol. 45, no. 3, pp. 301–329, 2001.

    [23] M. Collins, R. E. Schapire, and Y. Singer, Logistic regression, AdaBoost and Bregman distances, Mach. Learning, vol. 48, pp. 253–285, 2002.

    [24] N. Murata, T. Takenouchi, T. Kanamori, and S. Eguchi, Information geometry of U-boost and Bregman divergence, Neural Comput., vol. 16, no. 7, pp. 1437–1481, 2004.

    [25] B. Taskar, S. Lacoste-Julien, and M. I. Jordan, Structured prediction, dual extragradient and Bregman projections, J. Mach. Learning Res., vol. 7, pp. 1627–1653, 2006.

    [26] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, Clustering with Bregman divergences, J. Mach. Learning Res., vol. 6, pp. 1705–1749, 2005.

    [27] R. Nock and F. Nielsen, On weighting clustering, IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 8, pp. 1223–1235, Aug. 2006.

    [28] A. Banerjee, X. Guo, and H. Wang, On the optimality of conditional expectation as a Bregman predictor, IEEE Trans. Inf. Theory, vol. 51, no. 7, pp. 2664–2669, Jul. 2005.

    [29] J. Kivinen and M. K. Warmuth, Additive versus exponentiated gradient updates for linear prediction, Inf. Comput., vol. 132, no. 1, pp. 1–64, Jan. 1997.

    [30] C. L. Lawson and R. J. Hanson, Solving Least Squares Problems. Philadelphia, PA: SIAM, 1995.

    [31] D. P. Helmbold, J. Kivinen, and M. K. Warmuth, Relative loss bounds for single neurons, IEEE Trans. Neural Netw., vol. 10, no. 6, pp. 1291–1304, Nov. 1999.

    [32] P. Auer, M. Herbster, and M. K. Warmuth, Exponentially many local minima for single neurons, in Advances in Neural Information Processing Systems, vol. 8, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. Cambridge, MA: MIT Press, Jun. 1996, pp. 316–322.

    [33] N. Cesa-Bianchi, P. M. Long, and M. K. Warmuth, Worst-case quadratic loss bounds for prediction using linear functions and gradient descent, IEEE Trans. Neural Netw., vol. 7, no. 3, pp. 604–619, May 1996.

    [34] P. Auer, N. Cesa-Bianchi, and C. Gentile, Adaptive and self-confident on-line learning algorithms, J. Comput. Syst. Sci., vol. 64, no. 1, pp. 48–75, Feb. 2002.

    [35] S. Kumar, M. Kumar, R. Stoll, and U. Kragl, Handling uncertainties in toxicity modelling using a fuzzy filter, SAR QSAR Environ. Res., vol. 18, no. 7/8, pp. 645–662, Dec. 2007.

    [36] P. Lindskog, Fuzzy identification from a grey box modeling point of view, in Fuzzy Model Identification: Selected Approaches, H. Hellendoorn and D. Driankov, Eds. Berlin, Germany: Springer-Verlag, 1997.

    [37] M. Burger, H. Engl, J. Haslinger, and U. Bodenhofer, Regularized data-driven construction of fuzzy controllers, J. Inverse Ill-Posed Problems, vol. 10, pp. 319–344, 2002.

    [38] M. Kumar, R. Stoll, and N. Stoll, Robust adaptive fuzzy identification of time-varying processes with uncertain data. Handling uncertainties in the physical fitness fuzzy approximation with real world medical data: An application, Fuzzy Optim. Decis. Making, vol. 2, pp. 243–259, Sep. 2003.

    [39] M. Kumar, R. Stoll, and N. Stoll, Regularized adaptation of fuzzy inference systems. Modelling the opinion of a medical expert about physical fitness: An application, Fuzzy Optim. Decis. Making, vol. 2, pp. 317–336, Dec. 2003.

    [40] M. Kumar, R. Stoll, and N. Stoll, Robust solution to fuzzy identification problem with uncertain data by regularization. Fuzzy approximation to physical fitness with real world medical data: An application, Fuzzy Optim. Decis. Making, vol. 3, no. 1, pp. 63–82, Mar. 2004.

    [41] M. Kumar, R. Stoll, and N. Stoll, Robust adaptive identification of fuzzy systems with uncertain data, Fuzzy Optim. Decis. Making, vol. 3, no. 3, pp. 195–216, Sep. 2004.

    [42] M. Kumar, R. Stoll, and N. Stoll, A robust design criterion for interpretable fuzzy models with uncertain data, IEEE Trans. Fuzzy Syst., vol. 14, no. 2, pp. 314–328, Apr. 2006.


    Mohit Kumar (M'08) received the B.Tech. degree in electrical engineering from the National Institute of Technology, Hamirpur, India, in 1999, the M.Tech. degree in control engineering from the Indian Institute of Technology, Delhi, India, in 2001, and the Ph.D. degree (summa cum laude) in electrical engineering from Rostock University, Rostock, Germany, in 2004.

    From 2001 to 2004, he was a Research Scientist with the Institute of Occupational and Social Medicine, Rostock. He is currently with the Center for Life Science Automation, Rostock. His current research interests include robust adaptive fuzzy identification, fuzzy logic in medicine, and robust adaptive control.

    Norbert Stoll received the Dipl.-Ing. degree in automation engineering and the Ph.D. degree in measurement technology from Rostock University, Rostock, Germany, in 1979 and 1985, respectively.

    He was the Head of the analytical chemistry section with the Academy of Sciences of GDR, Central Institute for Organic Chemistry, until 1991. From 1992 to 1994, he was an Associate Director of the Institute of Organic Catalysis, Rostock. Since 1994, he has been a Professor of measurement technology with the Engineering Faculty, University of Rostock. From 1994 to 2000, he was the Director of the Institute of Automation at Rostock University. Since 2003, he has been the Vice President of the Center for Life Science Automation, Rostock. His current research interests include medical process measurement, lab automation, and smart systems and devices.

    Regina Stoll received the Dipl.-Med. degree, the Dr. med. degree in occupational medicine, and the Dr. med. habil. degree in occupational and sports medicine from Rostock University, Rostock, Germany, in 1980, 1984, and 2002, respectively.

    She is the Head of the Institute of Preventive Medicine, Rostock. She is a faculty member with the Medicine Faculty and a Faculty Associate with the College of Computer Science and Electrical Engineering, Rostock University. She also holds an adjunct faculty member position with the Industrial Engineering Department, North Carolina State University, Raleigh. Her current research interests include occupational physiology, preventive medicine, and cardiopulmonary diagnostics.