
Fuzzy Sets and Systems 150 (2005) 331–350
www.elsevier.com/locate/fss

NARMAX time series model prediction: feedforward and recurrent fuzzy neural network approaches

Yang Gao*, Meng Joo Er
School of Electrical and Electronic Engineering, Nanyang Technological University, S1-B2a-01, Nanyang Avenue, Singapore 639798, Singapore

Received 30 October 2003; received in revised form 13 September 2004; accepted 17 September 2004
Available online 27 August 2004

Abstract

The nonlinear autoregressive moving average with exogenous inputs (NARMAX) model provides a powerful representation for time series analysis, modeling and prediction due to its capability of accommodating the dynamic, complex and nonlinear nature of real-world time series prediction problems. This paper focuses on the modeling and prediction of NARMAX-model-based time series using the fuzzy neural network (FNN) methodology. Both feedforward and recurrent FNN approaches are proposed. Furthermore, an efficient algorithm for model structure determination and parameter identification, with the aim of producing improved predictive performance for NARMAX time series models, is developed. Experiments and comparative studies demonstrate that the proposed FNN approaches can effectively learn complex temporal sequences in an adaptive way and that they outperform some well-known existing methods.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Time series prediction; Fuzzy neural networks; Takagi–Sugeno–Kang fuzzy inference systems; NARMAX models

1. Introduction

Time series prediction is an important practical problem with a variety of applications in business and economic planning, inventory and production control, weather forecasting, signal processing, and many other fields. In the last decade, neural networks (NNs) have been extensively applied to complex time series processing tasks [3,5,16,17]. This is due to their capability of handling nonlinear functional dependencies between past time series values and estimates of the values to be forecast.

* Corresponding author.
E-mail addresses: [email protected] (Y. Gao), [email protected] (M.J. Er).

0165-0114/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.fss.2004.09.015


More recently, fuzzy logic has been incorporated into neural models for time series prediction [2,4,8–14,19]. These approaches are generally known as fuzzy neural networks (FNNs) or NN-based fuzzy inference system (FIS) approaches. FNNs possess both the advantages of FISs, such as human-like reasoning and ease of incorporating expert knowledge, and those of NNs, such as learning abilities, optimization abilities and connectionist structures. By virtue of this, the low-level learning and computational power of NNs can be brought to FISs on one hand, and the high-level human-like reasoning of FISs can be brought to NNs on the other.

Whether NNs or FNNs are used to model time series, the design of the learning algorithm is a key issue to address.

This normally involves parameter learning and structure learning. Many NNs/FNNs offer only parameter learning, using techniques such as backpropagation [8] and least squares [5,13]. However, adjusting parameters alone is not sufficient in many cases, because it reduces the flexibility and numerical processing capability of the networks and may result in redundant or inefficient computation. Many NN/FNN learning algorithms now also offer structure learning, using techniques such as genetic algorithms [9,14], orthogonal techniques [3,17], similarity/degree measures [2,12], and various integrated one-pass methods [4,10,11,16,19]. Another issue of concern in NN/FNN modeling is the choice of model representation. There are two major choices in this case, namely feedforward [2–4,8–11,14,16,17,19] and recurrent [5,12,13].

In this paper, time series models are investigated using FNN approaches in both feedforward and recurrent model representations. The proposed FNN predictors have the following salient features: (1) the capability of modeling complex time series with on-line adjustment; (2) dynamic structure and parameter learning; (3) fast learning speed; and (4) good generalization and computational efficiency. A sequential and hybrid (supervised/unsupervised) learning algorithm, namely the generalized fuzzy neural network (G-FNN) learning algorithm, is employed to form the FNN prediction model.¹ Various comparative studies show that the proposed approaches are superior to many existing methods.

¹ The algorithm has been shown to be effective on functional modeling and nonlinear control (refer to our latest work on intelligent control problems [6,7]).

The rest of the paper is organized as follows. Section 2 reviews the NARMAX models in the use of feedforward and recurrent G-FNNs. The concept of optimal predictors is also introduced in this section. Section 3 provides a description of the G-FNN architecture and learning algorithm. In Section 4, one feedforward and two recurrent G-FNN predictors are proposed for nonlinear time series modeling and prediction. Three experiments are presented in Section 5 to demonstrate the predictive ability of the proposed methods. The results are compared with some popular existing methods. Finally, conclusions and directions for future research are given in Section 6.

2. NARMAX model and optimal predictors

2.1. General NARMAX($n_y$, $n_e$, $n_x$) model

The statistical approach to forecasting involves the construction of stochastic models to predict the value of an observation $y(t)$ using previous observations. A very general class of such models used for forecasting purposes is the nonlinear autoregressive moving average with exogenous inputs (NARMAX) model, given by

$$y(t) = F[y(t-1), \ldots, y(t-n_y), e(t-1), \ldots, e(t-n_e), x(t-1), \ldots, x(t-n_x)] + e(t), \tag{1}$$

where $y$, $e$ and $x$ are the output, noise and external input of the system model, respectively; $n_y$, $n_e$ and $n_x$ are the maximum lags in the output, noise and input, respectively; and $F$ is an unknown smooth function. It is assumed that $e(t)$ has zero mean, is independent and identically distributed, is independent of past $y$ and $x$, and has a finite variance $\sigma^2$.

Several frequently used special cases of the general NARMAX($n_y$, $n_e$, $n_x$) model are summarized here.

NAR($n_y$) model:

$$y(t) = F[y(t-1), \ldots, y(t-n_y)] + e(t), \tag{2}$$

NARMA($n_y$, $n_e$) model:

$$y(t) = F[y(t-1), \ldots, y(t-n_y), e(t-1), \ldots, e(t-n_e)] + e(t), \tag{3}$$

NARX($n_y$, $n_x$) model:

$$y(t) = F[y(t-1), \ldots, y(t-n_y), x(t-1), \ldots, x(t-n_x)] + e(t). \tag{4}$$

2.2. Optimal predictors

Optimum prediction theory revolves around minimizing the mean square error (MSE). Given the finite past, and provided the conditional mean exists, the minimum-MSE predictor is the conditional mean given by [5]

$$\hat{y}(t) = E[y(t) \mid y(t-1), y(t-2), \ldots, y(1)]. \tag{5}$$

2.2.1. Approximately optimal NAR(X) predictors
Assuming that $e(t)$ has zero mean and a finite variance $\sigma^2$ in (1), the optimal predictor given by (5) for the NAR($n_y$) or NARX($n_y$, $n_x$) model in (2) or (4) is approximately given by

$$\hat{y}(t) = E[y(t) \mid y(t-1), \ldots, y(t-n_y)] = F[y(t-1), \ldots, y(t-n_y)] \quad \text{(NAR model)} \tag{6}$$

or

$$\hat{y}(t) = F[y(t-1), \ldots, y(t-n_y), x(t-1), \ldots, x(t-n_x)] \quad \text{(NARX model)}, \tag{7}$$

respectively, where the optimal predictors (6) and (7) have MSE $\sigma^2$.

2.2.2. Approximately optimal NARMA(X) predictors
For the NARMA($n_y$, $n_e$) or NARMAX($n_y$, $n_e$, $n_x$) model in (3) or (1), the optimal predictor (5) is approximately given by

$$\hat{y}(t) = E[y(t) \mid y(t-1), \ldots, y(t-n_y)] = F[y(t-1), \ldots, y(t-n_y), e(t-1), \ldots, e(t-n_e)] \quad \text{(NARMA model)} \tag{8}$$

or

$$\hat{y}(t) = F[y(t-1), \ldots, y(t-n_y), e(t-1), \ldots, e(t-n_e), x(t-1), \ldots, x(t-n_x)] \quad \text{(NARMAX model)}, \tag{9}$$

where $e = y - \hat{y}$. In this case, the following initial conditions are often used:

$$\hat{y}(t) = e(t) = 0, \quad t \le 0. \tag{10}$$
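Eqs. (8)–(10) define a recursion: each residual $e(t) = y(t) - \hat{y}(t)$ becomes an input for later predictions, starting from the zero initial conditions of (10). As a minimal illustration (ours, not from the paper), the following Python sketch runs this recursion for a NARMA model, assuming a hypothetical fitted approximation `F_hat` of $F$:

```python
import numpy as np

def narma_predict(y, F_hat, n_y, n_e):
    """One-step-ahead NARMA prediction following Eqs. (8) and (10)."""
    n = len(y)
    y_hat = np.zeros(n)
    e = np.zeros(n)  # Eq. (10): y_hat(t) = e(t) = 0 for t <= 0
    for t in range(n):
        past_y = [y[t - k] if t - k >= 0 else 0.0 for k in range(1, n_y + 1)]
        past_e = [e[t - k] if t - k >= 0 else 0.0 for k in range(1, n_e + 1)]
        y_hat[t] = F_hat(np.array(past_y + past_e))  # fitted approximation of F
        e[t] = y[t] - y_hat[t]                       # residual fed back at later steps
    return y_hat, e
```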

3. Basic G-FNN = Takagi–Sugeno–Kang-type FIS

3.1. G-FNN architecture

Fig. 1. G-FNN architecture.

The G-FNN is functionally equivalent to a TSK-type FIS whose architecture is depicted in Fig. 1. The output of the G-FNN is given by

$$\hat{y}(t) = F_{\text{G-FNN}}[z(t)] = \sum_{j=1}^{n_r} \phi_j(z)\, w_j = \sum_{j=1}^{n_r} \exp[-(z - c_j)^T \Sigma_j (z - c_j)]\, w_j, \tag{11}$$


where the input vector $z = [z_1 \ldots z_{n_i}]^T$, the center vector $c_j = [c_{1j} \ldots c_{n_i j}]^T$ and the width matrix $\Sigma_j = \mathrm{diag}(1/\sigma_{1j}^2, \ldots, 1/\sigma_{n_i j}^2)$ define the Gaussian membership function of the $j$th rule; $w_j = k_{0j} + k_{1j} z_1 + \cdots + k_{n_i j} z_{n_i}$ is the TSK-type weight of the $j$th rule, and the $k_{ij}$ are real-valued parameters.

Eq. (11) can be represented in matrix form as follows:

$$\hat{y}(t) = \phi^T(t)\, w, \tag{12}$$

where the regression vector $\phi = [\phi_1\ \phi_1 z_1 \ldots \phi_1 z_{n_i}\ \ldots\ \phi_{n_r}\ \phi_{n_r} z_1 \ldots \phi_{n_r} z_{n_i}]^T$ and the weight vector $w = [k_{01}\ k_{11} \ldots k_{n_i 1}\ \ldots\ k_{0 n_r}\ k_{1 n_r} \ldots k_{n_i n_r}]^T$.

3.2. G-FNN learning algorithm
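To make Eqs. (11) and (12) concrete, here is a small numerical sketch (our illustration, not code from the paper) that evaluates the G-FNN output both rule by rule and through the regression-vector form; the two agree by construction:

```python
import numpy as np

def gfnn_output(z, centers, widths, K):
    """G-FNN output, Eq. (11).
    z: input (n_i,); centers, widths: (n_r, n_i);
    K: TSK parameters (n_r, n_i + 1), row j = [k_0j, k_1j, ..., k_nij]."""
    phi = np.exp(-np.sum(((z - centers) / widths) ** 2, axis=1))  # phi_j(z)
    w = K @ np.concatenate(([1.0], z))                            # TSK weights w_j
    return phi @ w

def regression_vector(z, centers, widths):
    """phi(t) of Eq. (12): [phi_1, phi_1 z_1, ..., phi_nr z_ni]."""
    phi = np.exp(-np.sum(((z - centers) / widths) ** 2, axis=1))
    return np.concatenate([p * np.concatenate(([1.0], z)) for p in phi])

rng = np.random.default_rng(0)
z = rng.normal(size=3)                                      # n_i = 3 inputs
c, s, K = rng.normal(size=(4, 3)), np.ones((4, 3)), rng.normal(size=(4, 4))
# Eq. (11) equals Eq. (12) with w = K flattened row-wise
assert np.isclose(gfnn_output(z, c, s, K), regression_vector(z, c, s) @ K.ravel())
```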

In this paper, the structure and parameters of the time series model are formed automatically and dynamically using the G-FNN learning algorithm. The G-FNN learning algorithm is a sequential and hybrid (supervised/unsupervised) learning algorithm, which provides an efficient way of constructing the G-FNN model online while combining structure and parameter learning simultaneously. Structure learning includes determining the proper number of membership functions $n_r$, i.e. the coarseness of the input fuzzy partitions, and finding the correct fuzzy logic rules. The learning of network parameters corresponds to the learning of the premise and consequent parameters of the G-FNN. Premise parameters include the membership function parameters $c_j$ and $\Sigma_j$, and the consequent parameter refers to the weight $w_j$ of the G-FNN.

For structure learning, two criteria are proposed for fuzzy rule generation and an error reduction ratio (ERR) concept is proposed for rule pruning. Parameter learning is performed by combining the semi-closed fuzzy set concept for membership learning and the linear least squares (LLS) method for weight learning. Given the supervised training data, the proposed learning algorithm first decides whether or not to generate a fuzzy rule based on the two proposed criteria. If structure learning is necessary, the premise parameters of a new fuzzy rule are obtained using the semi-closed fuzzy set concept. The G-FNN further decides whether there are redundant rules to be deleted based on the ERR, and it also adjusts the consequents of all the fuzzy rules accordingly. If no structure learning is necessary, parameter learning is performed to adjust the current premise and consequent parameters. This structure/parameter learning is repeated for each online incoming training input–output pair. The flow chart of this learning algorithm is shown in Fig. 2.

Fig. 2. Flow chart of the G-FNN learning algorithm.

3.2.1. Two criteria of fuzzy rule generation
The output error of the G-FNN with respect to the teaching signal is an important factor in determining whether a new rule should be added. For each training data pair $[z(t), y(t)]$, $t = 1 \ldots n_d$, where $y(t)$ is the desired output or supervised teaching signal and $n_d$ is the total number of training data, we first compute the G-FNN output $\hat{y}(t)$ for the existing structure using (11). The system error is defined as $e(t) = \|y(t) - \hat{y}(t)\|$. If $e(t)$ is bigger than a designed threshold $K_e$, a new fuzzy rule should be considered.

A fuzzy rule is a local representation over a region of the input space. If a new pattern does not fall within the accommodation boundary of the existing rules, the G-FNN considers generating a new rule, and vice versa. For each observation $[z(t), y(t)]$ at sample time $t$, we calculate the regularized Mahalanobis distance $md_j(t) = \sqrt{[z(t) - c_j]^T \Sigma_j [z(t) - c_j]}$, $j = 1 \ldots n_r$. The accommodation factor is defined as $d(t) = \min_j md_j(t)$. If $d(t)$ is bigger than $K_d = \sqrt{\ln(1/\varepsilon)}$, a new rule should be considered because the existing fuzzy system does not satisfy $\varepsilon$-completeness [18]. Otherwise, the new input data can be represented by the nearest existing rule.
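As a small illustration (ours, under the notation above), the two criteria can be checked as follows; `predict` stands for the current network output of Eq. (11):

```python
import numpy as np

def rule_generation_check(z, y, centers, inv_sq_widths, predict, K_e, K_d):
    """Evaluate the two rule-generation criteria of Section 3.2.1.
    z, y: current input vector and teaching signal;
    centers: c_j, shape (n_r, n_i);
    inv_sq_widths: diagonals of Sigma_j, i.e. 1/sigma_ij^2, shape (n_r, n_i);
    predict: callable returning the current G-FNN output of Eq. (11)."""
    e_t = abs(y - predict(z))                                         # system error e(t)
    md = np.sqrt((((z - centers) ** 2) * inv_sq_widths).sum(axis=1))  # md_j(t)
    d_t = md.min()                                                    # accommodation factor d(t)
    return e_t > K_e, d_t > K_d

# a new rule is considered when both flags are True; if only the error flag is
# True, the nearest rule's widths are reduced (case 2 of Section 3.2.3)
```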

3.2.2. Pruning of fuzzy rules
In this learning algorithm, the ERR concept proposed in [3] is utilized for rule and parameter sensitivity measurement. It is further adopted for rule pruning. At sample time $t$, the total training data of the G-FNN are $(Z, y)$, where $Z = [z(1)\ z(2) \ldots z(t)]^T$ and $y = [y(1)\ y(2) \ldots y(t)]^T$. From (12), we have

$$y = \Phi w + e, \tag{13}$$

where $y \in \mathbb{R}^t$ is the teaching signal, $w \in \mathbb{R}^v$ is the real-valued weight vector, $\Phi = [\phi(1) \ldots \phi(t)]^T \in \mathbb{R}^{t \times v}$ is known as the regressor, $e = [e(1)\ e(2) \ldots e(t)]^T \in \mathbb{R}^t$ is the system error vector that is assumed to be uncorrelated with the regressor $\Phi$, and $v = n_r(n_i + 1)$.

For any matrix $\Phi$, if its row number is larger than or equal to its column number, i.e. $t \ge v$, it can be transformed into a set of orthogonal basis vectors by QR decomposition [15]. Thus

$$\Phi = QR, \tag{14}$$


where $Q = [q_1\ q_2 \ldots q_v] \in \mathbb{R}^{t \times v}$ has orthogonal columns and $R \in \mathbb{R}^{v \times v}$ is an upper triangular matrix. Therefore, (13) can be written as

$$y = QRw + e. \tag{15}$$

Thus, an ERR due to $q_\ell$, $\ell = 1 \ldots v$, can be defined as follows [3]:

$$\mathrm{err}_\ell = \frac{(q_\ell^T y)^2}{q_\ell^T q_\ell\, y^T y}. \tag{16}$$

The ERR offers a simple and effective means of seeking a subset of significant regressors. The term $\mathrm{err}_\ell$ in (16) reflects the similarity of $q_\ell$ and $y$, i.e. the normalized inner product of $q_\ell$ and $y$. A larger $\mathrm{err}_\ell$ implies that $q_\ell$ is more significant to the output $y$.

The ERR matrix of the G-FNN is defined as follows:

$$\mathrm{ERR} = \begin{bmatrix} \mathrm{err}_1 & \mathrm{err}_2 & \cdots & \mathrm{err}_{n_r} \\ \mathrm{err}_{n_r+1} & \mathrm{err}_{n_r+2} & \cdots & \mathrm{err}_{n_r+n_r} \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{err}_{n_i \times n_r+1} & \mathrm{err}_{n_i \times n_r+2} & \cdots & \mathrm{err}_{n_i \times n_r+n_r} \end{bmatrix} \tag{17}$$

$$= [\mathbf{err}_1\ \mathbf{err}_2\ \ldots\ \mathbf{err}_{n_r}]. \tag{18}$$

We can further define the total ERR $T\mathrm{err}_j$, $j = 1 \ldots n_r$, corresponding to the $j$th rule as

$$T\mathrm{err}_j = \sqrt{\frac{\mathbf{err}_j^T\, \mathbf{err}_j}{n_i + 1}}. \tag{19}$$

If $T\mathrm{err}_j$ is smaller than a designed threshold $0 < K_{err} < 1$, the $j$th fuzzy rule should be deleted, and vice versa.
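A compact way to compute Eqs. (16)–(19) numerically is sketched below (our illustration; it assumes the columns of $\Phi$ are ordered so that the ERR entries fill Eq. (17) row by row):

```python
import numpy as np

def rule_significance(Phi, y, n_r, n_i):
    """ERR per regressor (Eq. (16)) and total ERR per rule (Eq. (19)).
    Phi: regressor matrix, shape (t, v), v = n_r * (n_i + 1); y: teaching signal (t,).
    Assumes Phi's columns are ordered so that err index l = i*n_r + j matches Eq. (17)."""
    Q, _ = np.linalg.qr(Phi)                                   # Eq. (14); orthonormal columns
    err = (Q.T @ y) ** 2 / ((Q ** 2).sum(axis=0) * (y @ y))    # Eq. (16)
    err_by_rule = err.reshape(n_i + 1, n_r)                    # columns are err_j of Eq. (18)
    T_err = np.sqrt((err_by_rule ** 2).sum(axis=0) / (n_i + 1))  # Eq. (19)
    return err, T_err

# prune rule j when T_err[j] < K_err, e.g. K_err = 0.001 as in Example 1
```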

3.2.3. Determination of premise parameters
Premise parameters, i.e. the fuzzy Gaussian membership functions of the G-FNN, are allocated to be semi-closed fuzzy sets and therefore satisfy the $\varepsilon$-completeness of fuzzy rules. The following three cases are considered while determining the premise parameters:

1. $e(t) > K_e$ and $d(t) > K_d$

First, we compute the Euclidean distance $ed_{ij_n}(t) = \|z_i(t) - b_{ij_n}\|$ between $z_i(t)$ and the boundary point $b_{ij_n} \in \{c_{i1}, c_{i2}, \ldots, c_{i n_r}, z_{i,\min}, z_{i,\max}\}$. Next, we find

$$j_n = \arg\min ed_{ij_n}(t). \tag{20}$$

If $ed_{ij_n}(t)$ is less than a threshold $K_{mf}$, the dissimilarity ratio of neighboring membership functions, we choose

$$c_{i(n_r+1)} = b_{ij_n}, \quad \sigma_{i(n_r+1)} = \sigma_{ij_n}. \tag{21}$$


Otherwise, we choose

$$c_{i(n_r+1)} = z_i(t), \tag{22}$$

$$\sigma_{i(n_r+1)} = \frac{\max\left(|c_{i(n_r+1)} - c_{i(n_r+1)a}|,\ |c_{i(n_r+1)} - c_{i(n_r+1)b}|\right)}{\sqrt{\ln(1/\varepsilon)}}. \tag{23}$$

2. $e(t) > K_e$ but $d(t) \le K_d$

$z(t)$ can be clustered by the adjacent fuzzy rule; however, that rule is not significant enough to accommodate all the patterns covered by its ellipsoidal field. Therefore, the ellipsoidal field needs to be decreased to obtain a better local approximation. A simple method to reduce the Gaussian width is as follows:

$$\sigma_{ij}^{\mathrm{new}} = K_s \times \sigma_{ij}^{\mathrm{old}}, \tag{24}$$

where $K_s$ is a reduction factor which depends on the sensitivity of the input variables.

3. $e(t) \le K_e$ but $d(t) > K_d$, or $e(t) \le K_e$ and $d(t) \le K_d$

The system has good generalization, and nothing needs to be done except adjusting the weights.

3.2.4. Determination of consequent parameters
As foreshadowed earlier, the TSK-type consequent parameters are determined using the LLS method. The LLS method is employed to find the weight vector $w$ such that the error energy $e^T e$ in (13) is minimized [15]. Furthermore, the LLS method provides a computationally simple but efficient procedure for determining the weights, so they can be computed very quickly and used for real-time control. The weight vector is calculated as follows:

$$w = \Phi^{\dagger} y, \tag{25}$$

where $\Phi^{\dagger}$ is the pseudoinverse of $\Phi$:

$$\Phi^{\dagger} = (\Phi^T \Phi)^{-1} \Phi^T. \tag{26}$$
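In code, Eqs. (25)–(26) amount to one call; a least-squares solver (or `np.linalg.pinv`) is usually preferred to forming $(\Phi^T \Phi)^{-1}$ explicitly, since $\Phi^T \Phi$ can be ill-conditioned. A minimal sketch:

```python
import numpy as np

def lls_weights(Phi, y):
    """Consequent weights w = pinv(Phi) @ y, Eqs. (25)-(26)."""
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # SVD-based, numerically stable
    return w
```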

4. Feedforward and recurrent G-FNN predictors

4.1. Feedforward G-FNN NAR(X) predictors

With a proper choice of inputs, feedforward G-FNNs can be used to emulate NAR($n_y$) or NARX($n_y$, $n_x$) models for time series prediction (see Fig. 3). A feedforward G-FNN is a nonlinear approximation $\hat{F}$ to the function $F$ that is equivalent to the optimal predictors in (6) and (7) as follows:

$$\hat{y}(t) = \hat{F}[y(t-1), \ldots, y(t-n_y)] = F_{\text{G-FNN}}\{[y(t-1) \ldots y(t-n_y)]^T\} \quad \text{(NAR model)} \tag{27}$$

or

$$\hat{y}(t) = \hat{F}[y(t-1), \ldots, y(t-n_y), x(t-1), \ldots, x(t-n_x)] = F_{\text{G-FNN}}\{[y(t-1) \ldots y(t-n_y)\ x(t-1) \ldots x(t-n_x)]^T\} \quad \text{(NARX model)}. \tag{28}$$


Fig. 3. Feedforward G-FNN NARX.

Fig. 4. Recurrent G-FNN I NARX.

As mentioned in Section 3.2, the structure and parameters, i.e. $n_r$, $c_j$, $\Sigma_j$ and $w_j$, of the feedforward G-FNN are estimated from $n_d$ input–output training pairs using the G-FNN learning algorithm, thereby arriving at $\hat{F}$, an estimate of $F$. Estimates are obtained by minimizing the sum of squared residuals $\sum_{t=1}^{n_d}[y(t) - \hat{y}(t)]^2$, and this is accomplished by the LLS method.

4.2. Recurrent G-FNN I NAR(X) predictors

Unlike the feedforward G-FNN, the recurrent G-FNN I feeds its own outputs back while approximating NAR(X) models (see Fig. 4). To emulate NAR(X) models, the output of the recurrent G-FNN I is represented as follows:

$$\hat{y}(t) = \hat{F}[\hat{y}(t-1), \ldots, \hat{y}(t-n_y)] = F_{\text{G-FNN}}\{[\hat{y}(t-1) \ldots \hat{y}(t-n_y)]^T\} \quad \text{(NAR model)} \tag{29}$$

or

$$\hat{y}(t) = \hat{F}[\hat{y}(t-1), \ldots, \hat{y}(t-n_y), x(t-1), \ldots, x(t-n_x)] = F_{\text{G-FNN}}\{[\hat{y}(t-1) \ldots \hat{y}(t-n_y)\ x(t-1) \ldots x(t-n_x)]^T\} \quad \text{(NARX model)}. \tag{30}$$

A recurrent G-FNN I NAR model is very close to a relaxation network; e.g., a recurrent G-FNN I NAR(1) model is functionally equivalent to a SISO Hopfield network (a typical relaxation network). Relaxation networks start from a given state and settle to a fixed point, typically denoting a class [5]. Such behavior is generally considered more desirable for classification than for modeling. In order to use the recurrent G-FNN I to model and predict a trajectory rather than to reach a fixed point, it is advisable to use NARX models when applying recurrent G-FNN I predictors.


Fig. 5. Recurrent G-FNN II NARMAX.

4.3. Recurrent G-FNN II NARMA(X) predictors

The recurrent G-FNN II presented in this section is a G-FNN with feedback signals (see Fig. 5) that can be used to estimate the optimal NARMA(X) predictors in (8) and (9) as follows:

$$\hat{y}(t) = \hat{F}[y(t-1), \ldots, y(t-n_y), e(t-1), \ldots, e(t-n_e)] = F_{\text{G-FNN}}\{[y(t-1) \ldots y(t-n_y)\ e(t-1) \ldots e(t-n_e)]^T\} \quad \text{(NARMA model)} \tag{31}$$

or

$$\hat{y}(t) = \hat{F}[y(t-1), \ldots, y(t-n_y), e(t-1), \ldots, e(t-n_e), x(t-1), \ldots, x(t-n_x)] = F_{\text{G-FNN}}\{[y(t-1) \ldots y(t-n_y)\ e(t-1) \ldots e(t-n_e)\ x(t-1) \ldots x(t-n_x)]^T\} \quad \text{(NARMAX model)}. \tag{32}$$

Similarly, the topology of the recurrent G-FNN II is formed based on the G-FNN learning algorithm so as to minimize $\sum_t [y(t) - \hat{y}(t)]^2$.

5. Experiments and comparative studies

To demonstrate the validity of the proposed predictors, the feedforward G-FNN and recurrent G-FNNs I and II are trained and tested on a NARX(2,1) dynamic process, the Box–Jenkins gas furnace data, and the chaotic Mackey–Glass time series. In addition, the performance of the proposed methodology is compared with some recently developed methods, including FNN methods such as the adaptive network-based fuzzy inference system (ANFIS) [8], the RBF-based adaptive fuzzy system (RBF-AFS) [4], the fuzzy neural network system using similarity analysis (FNNS) [2], the hybrid neural fuzzy inference system (HyFIS) [11], the dynamic fuzzy neural network (D-FNN) [19] and the dynamic evolving neural-fuzzy inference system (DENFIS) [10], and NN methods such as the orthogonal-least-squares-based RBF network (OLS-RBFN) [3], the pseudo-Gaussian basis function network (PG-BF) [16] and the resource-allocating network using orthogonal techniques (RANO) [17].


Table 1
G-FNN performance on NARX(2,1) time series (noise-free)

Model               n_y  n_e  n_x  n_r  n_p(a)  Training MSE(b)  Testing MSE
Feedforward G-FNN    1    0    0    2     8      0.2060           0.2059
                     2    0    0    7    45      0.00040          0.00039
                     1    0    1    6    37      0.0028           0.0028
                     2    0    1    5    45      0.00012          0.00011
Recurrent G-FNN I    1    0    0    2     8      2.38             2.53
                     2    0    0   10    53      1.35             1.95
                     1    0    1    7    42      0.0079           0.0080
                     2    0    1    5    44      0.00015          0.00032
Recurrent G-FNN II   1    1    0    5    32      0.1202           0.1501
                     2    1    0   10    75      0.0060           0.0017
                     1    1    1    7    54      0.0023           0.0064
                     2    1    1    4    47      0.00022          0.00045

(a) $n_p$ denotes the number of parameters of the G-FNN and is calculated as $\sum_{i=1}^{n_i}(n_{c_i} + n_{\sigma_i}) + n_r(n_i + 1)$, where $n_i = n_y + n_e + n_x$ is the total number of input variables of the G-FNN (both feedforward and recurrent types), and $n_{c_i}$ and $n_{\sigma_i}$ are the numbers of center and width parameters, respectively, of the Gaussian membership functions for the $i$th input variable.
(b) MSE as a performance index of prediction models is defined as $\frac{1}{n_d}\sum_{t=1}^{n_d}[y(t) - \hat{y}(t)]^2$.

5.1. Example 1: NARX(2,1) dynamic process

The dynamic NARX(2,1) process is generated as follows:

$$y(t) = \frac{y(t-1)\, y(t-2)\, [y(t-1) + 2.5]}{1 + y^2(t-1) + y^2(t-2)} + x(t-1) + e(t) = F[y(t-1), y(t-2), x(t-1)] + e(t), \tag{33}$$

where the input has the form $x(t) = \sin(2\pi t / 25)$. The topologies of all feedforward and recurrent G-FNNs are estimated using the first 200 observations of a time series generated from the NARX(2,1) model in (33). The feedforward and recurrent G-FNN predictors are then tested on the following 200 observations from the model. Thresholds of the G-FNN learning algorithm are designed in a systematic manner based on the desired accuracy level of the system (refer to the Appendix), i.e. $e_{\max} = 0.1$, $e_{\min} = 0.02$, $d_{\max} = \sqrt{\ln(1/0.5)}$, $d_{\min} = \sqrt{\ln(1/0.8)}$, $K_{s,\min} = 0.9$, $K_{mf} = 0.5$ (25%) and $K_{err} = 0.001$.
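For reproducibility, a simulation of (33) might look as follows (our sketch; the paper lists no code, and the random seed is arbitrary):

```python
import numpy as np

def generate_narx21(n=400, noise_var=0.0, seed=0):
    """Simulate the NARX(2,1) process of Eq. (33) with x(t) = sin(2*pi*t/25)."""
    rng = np.random.default_rng(seed)
    x = np.sin(2 * np.pi * np.arange(n) / 25)
    e = rng.normal(0.0, np.sqrt(noise_var), n) if noise_var > 0 else np.zeros(n)
    y = np.zeros(n)
    for t in range(2, n):
        y[t] = (y[t - 1] * y[t - 2] * (y[t - 1] + 2.5)
                / (1 + y[t - 1] ** 2 + y[t - 2] ** 2)
                + x[t - 1] + e[t])
    return y, x

y, x = generate_narx21(n=400, noise_var=0.04)  # N(0, 4e-2) noise for the noisy case
y_train, y_test = y[:200], y[200:]             # 200 observations each, as in the text
```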

noise-free and noisy situations. The disturbance signal is set to beN(0,4e − 2) white noise in noisyconditions. The results are shown in Tables1 and2, respectively. The testing set error is the basis ofmodel comparison while the training set error is indicative of whether the model has overfitted the data.A few observations and comments can be made based on the above two tables:

• As closeness to the real time series model, G-FNN prediction models withny = 2 provides betterfitting results than those withny = 1. Similarly, NARX prediction models perform better than NARprediction models.


Table 2
G-FNN performance on NARX(2,1) time series (noisy)

Model               n_y  n_e  n_x  n_r  n_p  Training MSE  Testing MSE
Feedforward G-FNN    1    0    1    7   42    0.070         0.089
                     2    0    1    5   43    0.047         0.057
Recurrent G-FNN I    1    0    1    6   40    0.071         0.100
                     2    0    1    6   52    0.059         0.094
Recurrent G-FNN II   1    1    1    6   49    0.080         0.110
                     2    1    1    5   60    0.052         0.092

• Using the G-FNN learning algorithm, the structure and parameters of the NARMAX time series models are formed automatically and systematically. In particular, fuzzy rules of the G-FNN predictors are recruited or pruned dynamically according to their significance to the system performance and the complexity of the mapped system. It should be highlighted that the computational complexity of the G-FNN predictors does not increase exponentially with an increase in system dimensionality. In other words, the G-FNN predictors are able to generate a very parsimonious rule base.

• The recurrent G-FNN I NAR models perform poorly. This verifies our analysis in Section 4.2: the recurrent NAR models aim to reach a fixed point rather than predict a continuous range of values.

• As mentioned in Section 2.2.1, the training MSE of the G-FNN NARX predictors in noisy conditions should at best equal $4\mathrm{e}{-2}$, i.e. the variance of the white noise. This is verified because the G-FNNs are not able to model the noise, which is good from a prediction perspective. Hence, in practice, it is highly recommended to use preprocessing techniques, such as low-pass filters, to filter the noise from the signal before using G-FNN predictors.

• Concerning the modeling and prediction of NARX($n_y$, $n_x$) time series, the performance of the proposed prediction models can be summarized as follows: feedforward NARX > recurrent II NARMAX ≈ recurrent I NARX > feedforward NAR ≈ recurrent II NARMA > recurrent I NAR, where > denotes "performs better than" and ≈ denotes "is comparable with". Therefore, the recurrent G-FNNs I and II do not have much advantage over the feedforward G-FNN in either noise-free or noisy conditions.

To further demonstrate the superior performance of the G-FNN predictors, comparisons with four methods, i.e. RBF-AFS, FNNS, OLS-RBFN and D-FNN, are shown in Table 3. In this comparison, all of the compared methods provide simultaneous structure and parameter learning within the FNN prediction models. Noise-free signals and fitting models with $n_y = 2$ and $n_x = 1$ are used for all the methods. It can be seen that the G-FNN provides better generalization as well as greater parsimony. The proposed G-FNN predictors are fast in learning speed because no iterative learning loops are needed. Generating fewer rules and parameters leads to better computational efficiency using the G-FNN learning algorithm.

5.2. Example 2: Box–Jenkins gas furnace data

In this example, the proposed predictors are applied to the Box–Jenkins gas furnace data [1], a benchmark problem for testing identification algorithms. This data set was recorded from a combustion process of a methane–air mixture. During the process, the portion of methane was randomly changed while keeping a constant gas flow rate.


Table 3
Performance comparisons with RBF-AFS, FNNS, OLS-RBFN and D-FNN

Model      n_r  n_p  Testing MSE  Learning method
RBF-AFS    35   280  0.019        structure (one pass) ×(a) parameter (iterative loops)
FNNS       22    84  0.005        structure (one pass) × parameter (300 loops)
OLS-RBFN   65   326  0.00083      structure × parameter (one pass)
D-FNN       6    60  0.00080      structure × parameter (one pass)
G-FNN II    4    47  0.00045      structure × parameter (one pass)
G-FNN I     5    44  0.00032      structure × parameter (one pass)
G-FNN       5    45  0.00011      structure × parameter (one pass)

(a) × implies that structure and parameter learning are performed simultaneously.

Table 4
Effect of $K_{err}$ on modeling and prediction results

K_err    n_r  Testing MSE
0.005     4   0.1830
0.001     5   0.0963
0.0005    5   0.0963
0.0001    6   0.0862
0.00005   7   0.0744

The data set consists of 296 pairs of input–output measurements. The input $u(t)$ is the methane gas flow into the furnace and the output $y(t)$ is the CO2 concentration in the outlet gas. The sampling interval is 9 s.

To facilitate comparison, the following fitting model is chosen:

$$y(t) = F[y(t-1), u(t-4)]. \tag{34}$$

As a result, there are in total 292 input/output data pairs, with $y(t-1)$ and $u(t-4)$ as input variables and $y(t)$ as the output variable. The data were partitioned into 200 data pairs as a training set, with the remaining 92 pairs as a test set for validation. Since the feedforward G-FNN has advantages over the recurrent G-FNNs, as shown in Example 1, only the feedforward G-FNN is used in this experiment. Thresholds of the G-FNN learning algorithm were set as $e_{\max} = 2$, $e_{\min} = 0.1$, $d_{\max} = \sqrt{\ln(1/0.5)}$, $d_{\min} = \sqrt{\ln(1/0.8)}$, $K_{s,\min} = 0.9$ and $K_{mf} = 1.66$ (30%) (refer to the Appendix).
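As a small illustration of the data arrangement (ours; the arrays `u` and `y` are hypothetical and the gas furnace data set itself must be obtained separately), the 292 pairs of Eq. (34) can be built as:

```python
import numpy as np

def box_jenkins_pairs(u, y):
    """Arrange the gas-furnace series into the fitting pairs of Eq. (34):
    inputs [y(t-1), u(t-4)], target y(t); the largest lag is 4, so 296
    measurements yield 296 - 4 = 292 pairs."""
    t = np.arange(4, len(y))                  # zero-based; first usable index is 4
    Z = np.column_stack([y[t - 1], u[t - 4]])
    return Z, y[t]

# hypothetical arrays u, y of length 296 loaded from the Box-Jenkins data set:
# Z, target = box_jenkins_pairs(u, y)
# Z_train, y_train = Z[:200], target[:200]   # remaining 92 pairs for testing
```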

The decrement design method for $K_{err}$ is also demonstrated in this experiment, as shown in Table 4. It is verified that $K_{err}$ directly affects rule generation and thus the modeling error. As the value of $K_{err}$ is decreased, more rules are generated, which leads to smaller modeling errors. Training and testing results of the feedforward G-FNN with $K_{err} = 0.00005$ are also illustrated in figure form. A total of 7 fuzzy rules are generated for the G-FNN during training, as shown in Fig. 6(a). The corresponding Gaussian membership functions with respect to the input variables are shown in Fig. 7. It can be seen that the membership functions are evenly distributed over the training interval. This is in line with the aspiration of "local representation" in fuzzy logic. The root mean square error (RMSE) converges very quickly during training, as illustrated in Fig. 6(b).


Fig. 6. Training performances: (a) Fuzzy rule generation; (b) RMSE during training process.

As the G-FNN algorithm uses a one-pass learning method, avoiding iterative learning loops, fast learning speed is achieved. Fig. 8 shows the actual and predicted values for both training and testing data.

In this experiment, a performance comparison of the G-FNN predictor is carried out with three learning models, i.e. ANFIS, HyFIS and PG-BF. The results are summarized in Table 5. It is shown that the G-FNN provides better generalization and a more compact rule base.

5.3. Example 3: chaotic Mackey–Glass time series

The chaotic Mackey–Glass time series is again a benchmark problem that has been considered by a number of researchers [3,4,8]. The time series is generated from the following equation:

$$y(t) = (1 - a)\, y(t-1) + \frac{b\, y(t - \tau)}{1 + y^{10}(t - \tau)}, \tag{35}$$

where $\tau > 17$ gives chaotic behavior; a higher value of $\tau$ yields higher-dimensional chaos. For ease of comparison, the parameters are selected as follows: $a = 0.1$, $b = 0.2$ and $\tau = 17$.

The fitting model of (35) is chosen to be

$$y(t) = F[y(t-p), y(t-p-\Delta t), y(t-p-2\Delta t), y(t-p-3\Delta t)]. \tag{36}$$

For simulation purposes, it is assumed that $y(t) = 0$ for all $t < 0$, $y(0) = 1.2$, $p = \Delta t = 6$ and $119 \le t \le 1140$. As in Example 2, only the feedforward G-FNN is used in this example.


Fig. 7. Gaussian membership functions w.r.t input variables.

Fig. 8. Prediction results: (a) Box–Jenkins gas furnace data and one-step ahead prediction; (b) prediction error.

The first 500 input/output data pairs of the Mackey–Glass time series generated from (35) are used for training the feedforward G-FNN, while the following 500 data pairs are used for validating the identified model. Thresholds of the G-FNN learning algorithm are predefined as $e_{\max} = 0.1$, $e_{\min} = 0.01$, $d_{\max} = \sqrt{\ln(1/0.5)}$, $d_{\min} = \sqrt{\ln(1/0.8)}$, $K_{s,\min} = 0.9$, $K_{mf} = 0.25$ (25%) and $K_{err} = 0.0001$.
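The series and the fitting pairs of Eqs. (35)–(36) can be generated with a short script (our sketch; indexing is zero-based, whereas the paper takes $119 \le t \le 1140$):

```python
import numpy as np

def mackey_glass(n=1200, a=0.1, b=0.2, tau=17, y0=1.2):
    """Iterate the discrete Mackey-Glass map of Eq. (35), with y(t) = 0 for t < 0."""
    y = np.zeros(n)
    y[0] = y0
    for t in range(1, n):
        y_tau = y[t - tau] if t - tau >= 0 else 0.0
        y[t] = (1 - a) * y[t - 1] + b * y_tau / (1 + y_tau ** 10)
    return y

def fitting_pairs(y, p=6, dt=6):
    """Inputs of Eq. (36): [y(t-p), y(t-p-dt), y(t-p-2dt), y(t-p-3dt)] -> target y(t)."""
    t = np.arange(p + 3 * dt, len(y))
    Z = np.column_stack([y[t - p - k * dt] for k in range(4)])
    return Z, y[t]

series = mackey_glass()
Z, target = fitting_pairs(series)  # e.g. first 500 pairs for training, next 500 for testing
```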


Table 5
Performance comparisons with ANFIS, HyFIS and PG-BF

Model   n_r  Testing MSE  Learning method
ANFIS   25   0.1643       parameter (iterative loops)
HyFIS   15   0.0945       structure (one pass) +(a) parameter (200 loops)
PG-BF   10   0.157        structure × parameter (one pass)
G-FNN    7   0.0744       structure × parameter (one pass)

(a) + implies that structure learning and parameter learning are performed offline separately.

Fig. 9. Training performances: (a) Fuzzy rule generation; (b) RMSE during training process.

Using the G-FNN learning algorithm, a total of 10 fuzzy rules are generated for the G-FNN predictor during training, as shown in Fig. 9(a). The corresponding Gaussian membership functions and the RMSE during training are shown in Figs. 10 and 9(b), respectively. Fig. 11 shows that the actual and predicted values for both training and testing data are essentially the same, and their differences can only be seen on a finer scale.

Performance comparisons of the feedforward G-FNN with RBF-AFS, OLS-RBFN, HyFIS and ANFIS are shown in Table 6. All the methods in these comparisons provide structure learning except for ANFIS, which only provides parameter learning. Similar to Example 1, the feedforward G-FNN is shown to perform better than RBF-AFS and OLS-RBFN. The G-FNN predictor has a bigger training RMSE compared with ANFIS and HyFIS. This is due to the fact that ANFIS and HyFIS both employ iterative learning loops that may lead to globally optimal solutions, whereas the G-FNN, in this case, only acquires a sub-optimal solution.


Fig. 10. Gaussian membership functions w.r.t input variables.

Fig. 11. Prediction results: (a) Mackey–Glass time series from t = 118–1140 and six-step ahead prediction; (b) prediction error.

Another comparison is performed between the feedforward G-FNN and some existing one-pass (also known as on-line) learning models, i.e. RANO, D-FNN and DENFIS, applied to the same task (see Table 7). The feedforward G-FNN predictor outperforms these methods in terms of predictive accuracy as well as computational efficiency.


Table 6
Performance comparisons with RBF-AFS, OLS-RBFN, HyFIS and ANFIS

Model     n_r  n_p  Training RMSE  Testing RMSE(a)
RBF-AFS   21   210  0.0107         0.0128
OLS-RBFN  35   211  0.0087         0.0089
HyFIS     16   104  0.0021         0.0100
ANFIS     16   104  0.0016         0.0030
G-FNN     10    90  0.0063         0.0056

(a) Testing RMSEs are calculated based on the identified models obtained from the first 500 training pairs. No further tuning of parameters is applied while testing the models.

Table 7
Performance comparisons with RANO, D-FNN and DENFIS

Model   n_r  Testing NDEI(a)
RANO    18   0.1492
D-FNN   10   0.0360
DENFIS  883  0.0420
G-FNN   10   0.0243

(a) The nondimensional error index (NDEI), also known as the normalized RMSE, is defined as the RMSE divided by the standard deviation of the target time series.

6. Conclusions

In this paper, one feedforward and two recurrent G-FNN predictors are proposed, tested and compared. The proposed G-FNN predictors provide a sequential and hybrid learning method for model structure determination and parameter identification, which greatly improves predictive performance. Experiments and comparative studies demonstrate the superior performance of the proposed approaches over some well-known existing methods. Further study is necessary to investigate the robustness of the proposed methods. One possible way to test robustness is to train the prediction models using various training data groups generated from the system at different sampling rates. If the prediction models are formed similarly despite using different training data groups, the proposed FNN methods are deemed robust in this sense.

Appendix A. Threshold design in G-FNN algorithm

Remark A.1. The thresholds $K_e$ and $K_d$ are designed following the idea of hierarchical learning, to decay gradually during learning as follows:

$$K_e = \begin{cases} e_{\max}, & 1 < t < \frac{n_d}{3}, \\ \max\left[e_{\max}\left(\frac{e_{\min}}{e_{\max}}\right)^{(3t/n_d)-1},\ e_{\min}\right], & \frac{n_d}{3} \le t \le \frac{2n_d}{3}, \\ e_{\min}, & \frac{2n_d}{3} \le t \le n_d, \end{cases} \tag{A.1}$$


$$K_d = \begin{cases} d_{\max} = \sqrt{\ln\frac{1}{\varepsilon_{\min}}}, & 1 < t < \frac{n_d}{3}, \\ \max\left[d_{\max}\left(\frac{d_{\min}}{d_{\max}}\right)^{(3t/n_d)-1},\ d_{\min}\right], & \frac{n_d}{3} \le t \le \frac{2n_d}{3}, \\ d_{\min} = \sqrt{\ln\frac{1}{\varepsilon_{\max}}}, & \frac{2n_d}{3} \le t \le n_d. \end{cases} \tag{A.2}$$

The key idea of hierarchical learning is that $K_e$ and $K_d$ are initially set to the larger values $e_{\max}$ and $d_{\max}$, which is known as coarse learning. They gradually reach $e_{\min}$ and $d_{\min}$, respectively, to achieve fine learning. It is apparent that the values of $K_e$ and $K_d$ directly affect rule generation. Generally speaking, the bigger the values of $K_e$ and $K_d$, the lower the chance of generating a new rule, and vice versa.

In practical design, $e_{\max}$ and $e_{\min}$ are selected based on the desired accuracy of the G-FNN: $e_{\max}$ should be set to the maximum acceptable error level of the system, and $e_{\min}$ should be set to the desired error level of the system. $d_{\max}$ and $d_{\min}$ can be designed based on the minimum and maximum completeness levels of the fuzzy rules (i.e. $\varepsilon_{\min}$ and $\varepsilon_{\max}$) that the designers are trying to achieve. It is recommended to have $\varepsilon_{\min} = 0.5$ and $\varepsilon_{\max} = 0.8$ [18].
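A direct transcription of the decay schedule is shown below (our sketch); the same function serves both $K_e$ and $K_d$, since (A.1) and (A.2) share the same form:

```python
import numpy as np

def decayed_threshold(t, n_d, k_max, k_min):
    """Hierarchical decay of Eqs. (A.1)-(A.2): coarse (k_max) to fine (k_min)."""
    if t < n_d / 3:
        return k_max
    if t <= 2 * n_d / 3:
        # geometric interpolation; equals k_max at t = n_d/3 and k_min at t = 2n_d/3
        return max(k_max * (k_min / k_max) ** (3 * t / n_d - 1), k_min)
    return k_min

# Example 1 settings over n_d = 200 training samples:
K_e = [decayed_threshold(t, 200, 0.1, 0.02) for t in range(1, 201)]
K_d = [decayed_threshold(t, 200, np.sqrt(np.log(1 / 0.5)), np.sqrt(np.log(1 / 0.8)))
       for t in range(1, 201)]
```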

Remark A.2. The threshold $K_{mf}$ implies a minimum partitioning of the input space. It is therefore recommended to be 20–40% of the minimum range value of all input variables.

Remark A.3. The threshold $K_s$ is designed based on the sensitivity measure of the $i$th input variable in the $j$th fuzzy rule, denoted as $s_{ij}$. It is calculated using the ERR matrix in (17) as follows:

$$s_{ij} = \frac{\mathrm{err}_{i \times n_r + j}}{\sum_{i=1}^{n_i} \mathrm{err}_{i \times n_r + j}}. \tag{A.3}$$

$K_s$ is designed to take a minimum value of $K_{s,\min}$ when $s_{ij} = 0$, and a maximum value of 1 when $s_{ij}$ is greater than or equal to the average sensitivity level of the input variables in the $j$th rule, i.e.

$$K_s = \begin{cases} \dfrac{1}{1 + \dfrac{(1 - K_{s,\min})\, n_i^2}{K_{s,\min}} \left(s_{ij} - \dfrac{1}{n_i}\right)^2}, & s_{ij} < \dfrac{1}{n_i}, \\ 1, & \dfrac{1}{n_i} \le s_{ij}. \end{cases} \tag{A.4}$$

Therefore, $K_{s,\min} \le K_s \le 1$ is used for width adjustment to preserve the significance of the corresponding Gaussian membership function. In order not to diminish the major information carried by the Gaussian function, it is recommended to choose $K_{s,\min} = 0.9$.
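The following sketch (ours) computes the sensitivities of Eq. (A.3) and the reduction factors of Eq. (A.4); the $n_i^2$ scaling in the denominator is our reconstruction, chosen so that $K_s(0) = K_{s,\min}$ and $K_s(1/n_i) = 1$, matching the boundary behavior stated above:

```python
import numpy as np

def width_reduction_factors(err, n_r, n_i, K_s_min=0.9):
    """Sensitivities s_ij (Eq. (A.3)) and reduction factors K_s (Eq. (A.4)).
    err: ERR vector ordered as in Eq. (17), length (n_i + 1) * n_r."""
    ERR = err.reshape(n_i + 1, n_r)       # rows i = 0..n_i, columns j = 1..n_r
    E = ERR[1:, :]                        # only the n_i input-linked rows enter (A.3)
    s = E / E.sum(axis=0, keepdims=True)  # Eq. (A.3): per-rule normalization
    # Eq. (A.4), reconstructed: K_s(0) = K_s_min, K_s(1/n_i) = 1
    K_s = 1.0 / (1.0 + (1 - K_s_min) * n_i ** 2 / K_s_min * (s - 1.0 / n_i) ** 2)
    K_s[s >= 1.0 / n_i] = 1.0             # upper branch of Eq. (A.4)
    return s, K_s
```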

Remark A.4. The threshold $K_{err}$ plays a crucial role in system performance; it directly affects rule generation and thus the modeling error. Generally speaking, the bigger the value of $K_{err}$, the higher the chance of pruning a rule, and vice versa. If $K_{err}$ is set too big, rule generation will be insufficient. Since the least squares method is used to determine the weight vectors of the G-FNN, insufficient rule generation will lead to large modeling errors. $K_{err}$ should be designed according to (19), i.e. the average level of significance of each rule parameter to each output. Analytical design of $K_{err}$ is difficult if not impossible, so it is highly recommended to use a trial-and-error method, which is more straightforward. From our experience, $K_{err}$ is first tried at a value slightly smaller than 0.005 and gradually decreased until acceptable performance is achieved. This decrement design method for $K_{err}$ can guarantee that sufficient rules are generated, resulting in small modeling errors. $K_{err}$ is therefore the last threshold parameter to be designed for the G-FNN learning algorithm.

References

[1] G.E.P. Box, G.M. Jenkins, Time Series Analysis, Forecasting and Control, Holden Day, San Francisco, CA, 1970.
[2] C.T. Chao, Y.J. Chen, C.C. Teng, Simplification of fuzzy-neural systems using similarity analysis, IEEE Trans. Systems, Man Cybernet. 26 (1996) 344–354.
[3] S. Chen, C.F.N. Cowan, P.M. Grant, Orthogonal least squares learning algorithm for radial basis function networks, IEEE Trans. Neural Networks 2 (1991) 302–309.
[4] K.B. Cho, B.H. Wang, Radial basis function based adaptive fuzzy systems and their applications to system identification and prediction, Fuzzy Sets and Systems 83 (1996) 325–339.
[5] J.T. Connor, R.D. Martin, L.E. Atlas, Recurrent neural networks and robust time series prediction, IEEE Trans. Neural Networks 5 (1994) 240–254.
[6] Y. Gao, M.J. Er, Online adaptive fuzzy neural identification and control of a class of MIMO nonlinear systems, IEEE Trans. Fuzzy Systems 11 (2003) 462–477.
[7] Y. Gao, M.J. Er, An intelligent adaptive control scheme for postsurgical blood pressure regulation, IEEE Trans. Neural Networks 16 (2004).
[8] J.S.R. Jang, ANFIS: adaptive-network-based fuzzy inference system, IEEE Trans. Systems Man Cybernet. 23 (1993) 665–684.
[9] C.F. Juang, Temporal problems solved by dynamic fuzzy network based on genetic algorithm with variable-length chromosomes, Fuzzy Sets and Systems 142 (2004) 199–219.
[10] N.K. Kasabov, Q. Song, DENFIS: dynamic evolving neural-fuzzy inference system and its application for time-series prediction, IEEE Trans. Fuzzy Systems 10 (2002) 144–154.
[11] J. Kim, N. Kasabov, HyFIS: adaptive neuro-fuzzy inference systems and their application to nonlinear dynamical systems, Neural Networks 12 (1999) 1301–1319.
[12] C.J. Lin, C.H. Chen, Identification and prediction using recurrent compensatory neuro-fuzzy systems, Fuzzy Sets and Systems, available online 11 August 2004, in press.
[13] P.A. Mastorocostas, J.B. Theocharis, An orthogonal least-squares method for recurrent fuzzy-neural modeling, Fuzzy Sets and Systems 140 (2003) 285–300.
[14] S.E. Papadakis, J.B. Theocharis, A.G. Bakirtzis, A load curve based fuzzy modeling technique for short-term load forecasting, Fuzzy Sets and Systems 135 (2003) 279–303.
[15] W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, Cambridge, 1992.
[16] I. Rojas, J. Gonzalez, A. Canas, A.F. Diaz, F.J. Rojas, M. Rodriguez, Short-term prediction of chaotic time series by using RBF network with regression weights, Internat. J. Neural Systems 10 (2000) 353–364.
[17] M. Salmeron, J. Ortega, C.G. Puntonet, A. Prieto, Improved RAN sequential prediction using orthogonal techniques, Neurocomputing 41 (2001) 153–172.
[18] L.X. Wang, A Course in Fuzzy Systems and Control, Prentice-Hall, Englewood Cliffs, NJ, 1997.
[19] S. Wu, M.J. Er, Dynamic fuzzy neural networks—a novel approach to function approximation, IEEE Trans. Systems Man Cybernet. Part B 30 (2000) 358–364.