Weighted Support Vector Echo State Machine for Multivariate Dynamic System Modeling*
Min Han and Xinying Wang
Abstract— Support vector echo state machine (SVESM) is a promising dynamic system modeling tool that performs linear support vector regression (SVR) in the high-dimensional "reservoir" state space. A variant of SVESM, the weighted support vector echo state machine (WSVESM), is proposed in this paper to deal with the multivariate dynamic system modeling problem. The historical observed data of the dynamic system are treated as multivariate time series, and the proposed WSVESM model is used to predict the time series. Different weights are allocated to the training data, and a multi-parameter solution path algorithm is introduced to determine the solution of WSVESM. Simulation results based on artificial and real-world examples show the effectiveness of the proposed method.
I. INTRODUCTION
Nonlinear dynamic system modeling has become an important research direction and has attracted attention not only in the scientific research field but also in the engineering field [1]. Nonlinear dynamic systems commonly exhibit strong nonlinearity and complex evolution laws, so it is difficult to establish exact mathematical models for them. However, a large amount of historical evolution data can be observed as a function of time. Time series prediction techniques can utilize these historical data to predict the future evolution dynamics and have become a promising nonlinear system modeling approach.
Considering the essence of nonlinear dynamic systems, the predictor should have a nonlinearity processing ability, which rules out traditional linear models. Consequently, neural networks (NNs), which can effectively deal with nonlinearity, have become a popular nonlinear time series modeling methodology [2]. The theoretical basis of most neural network prediction models is Takens' theorem [3], which indicates that, in principle, a nonlinear time series x(t) observed from a dynamic system can be forecast from a few previous support points. This means the next value x(t + 1) is determined by a few historical values x(t), x(t − τ), x(t − 2τ), ..., x(t − (m − 1)τ), where τ is the delay time and m is the embedding dimension.
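For concreteness, a minimal sketch of such a delay-coordinate embedding is shown below (Python/NumPy; the function name and the toy series are illustrative, not part of the original paper):

```python
import numpy as np

def delay_embed(x, m, tau):
    """Build the delay-coordinate matrix of a scalar series x.

    Row k holds [x(k), x(k - tau), ..., x(k - (m - 1) * tau)];
    the matching one-step-ahead target is x(k + 1).
    """
    start = (m - 1) * tau                      # first index with a full history
    rows = [[x[k - j * tau] for j in range(m)]
            for k in range(start, len(x) - 1)]
    X = np.asarray(rows)                       # (N, m) embedded states
    y = x[start + 1:]                          # one-step-ahead targets
    return X, y

# Example: embed a toy sine series with m = 3, tau = 2.
x = np.sin(0.1 * np.arange(500))
X, y = delay_embed(x, m=3, tau=2)
```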
Essentially, by Takens' theorem [3], if there are enough delayed coordinates, a scalar time series is generally sufficient to reconstruct the dynamics of a nonlinear dynamic system. In practice, however, a given scalar time series may not be sufficient to reconstruct the dynamics.
*This work was supported by the National Natural Science Foundation of China under Projects 61374154 and 61074096.
Min Han and Xinying Wang are with the Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, 116023, China. [email protected]
Furthermore, multivariate time series data of the dynamic system can often be observed, and there are substantial advantages to using these multivariate data [4], [5].
Recently, the echo state network (ESN) was proposed to model the Mackey-Glass chaotic system [6]. A sparsely connected reservoir is used as the hidden layer of an ESN; the input weights and the internal connection weights are randomly generated and kept fixed during the learning process, and only the connection weights between the reservoir and the output layer are tunable [7]. Owing to their simple structure and efficient training method, ESNs have been successfully applied in many different fields, e.g. signal processing [8], [9], time series prediction [10], [11], pattern recognition [12], [13], and others [14], [15]. Research on ESNs is in the ascendant. Conceptually, the training process of an ESN can be seen as supervised non-temporal linear regression. Many algorithms have been extended or developed for ESNs [16], [17]; among them, the echo state Gaussian process [18], Evolino [19], recurrent kernel machines [20], and the support vector echo state machine [10] combine ESNs with kernel methods (KMs). These methods, combining the advantages of both ESNs and KMs, achieve superior modeling accuracy and have become an interesting research direction [21].
In this paper, a weighted support vector echo state machine (WSVESM) is proposed for multivariate dynamic system modeling. Different weights are assigned to the multivariate time series data, and a solution path algorithm is developed to optimize the weights. Utilizing the weights assigned to each sample, the proposed model can capture more of the underlying dynamics and has a more robust modeling capability [22]. The rest of this paper is structured as follows. Section II gives a brief review of the preliminary work. Section III presents the weighted support vector echo state machine and its solution path algorithm. Section IV gives the simulation results of two examples: the first models the Lorenz chaotic multivariate dynamic system, and the second models the annual runoff and sunspot bivariate dynamic system. Finally, Section V gives discussions and conclusions.
II. PRELIMINARY
In this paper, WSVESM is developed based on SVESM.
Before the introduction of WSVESM, it is therefore neces-
sary to briefly recall ESN [6] and SVESM [10].
A. Echo State Network
The equations of the ESN are as follows:

$$x(k+1) = \mathrm{tansig}\left(W_x x(k) + W_{in} u(k) + W_{back} y(k)\right) \quad (1)$$

$$y(k+1) = W_{out}^T x(k+1) \quad (2)$$
where tansig denotes the hyperbolic tangent sigmoid activation function, which is applied elementwise; x(k) is the state vector in the "reservoir"; and u(k) and y(k) denote the input and output vectors of the ESN, respectively. $W_{in}$, $W_x$, $W_{back}$ and $W_{out}$ are the input weights, internal connection weights, feedback weights and output weights of the ESN, respectively. $W_{back}$ is discarded in this paper.
The essence of ESNs is the large, sparsely connected and fixed "reservoir", from which the desired output is obtained by learning suitable output weights. Determination of the optimal output weights is a linear regression task

$$\min_{W_{out}} \left\| X W_{out} - y_d \right\| \quad (3)$$

where $X = \left[ x^T(Init), x^T(Init+1), \ldots, x^T(Trn) \right]$ is the matrix of collected reservoir states and $y_d = \left[ y_d(Init), y_d(Init+1), \ldots, y_d(Trn) \right]$. $Init$ and $Trn$ are the beginning and ending indices of the training examples, respectively, and the size of the training set is $N_t = Trn - Init + 1$. $Init$ is usually set to a suitable value to discard the influence of the reservoir's initial transient.
The prediction function for new network inputs is given by (without loss of generality, a bias term b is added to the regression function)

$$f\left(x^{test}\right) = w^T x^{test} + b \quad (4)$$

where $x^{test}$ is the new reservoir state vector excited by the new network inputs; without loss of generality, hereafter we use $w$ instead of $W_{out}$. The reservoir used for new predictions is the same as the one used in the training phase. The output weights can be calculated by solving a standard least-squares problem.
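As a rough illustration of (1)-(3), the following sketch builds a random reservoir, drives it with an input sequence, discards the initial transient, and fits the output weights by least squares. The reservoir size, sparseness, washout length, and toy task are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, n_in, washout = 100, 1, 50   # illustrative sizes, not from the paper

# Random fixed weights: W_in uniform in [-1, 1]; W_x sparse, rescaled so
# its spectral radius is below 1 (W_back is discarded, as in the text).
W_in = rng.uniform(-1, 1, (n_res, n_in))
W_x = rng.uniform(-1, 1, (n_res, n_res)) * (rng.random((n_res, n_res)) < 0.02)
W_x *= 0.9 / max(abs(np.linalg.eigvals(W_x)))

def run_reservoir(u):
    """Iterate x(k+1) = tanh(W_x x(k) + W_in u(k)) over an input sequence."""
    x = np.zeros(n_res)
    states = []
    for u_k in u:
        x = np.tanh(W_x @ x + W_in @ np.atleast_1d(u_k))
        states.append(x.copy())
    return np.array(states)

# One-step-ahead task on a toy series: u(k) -> y_d(k) = u(k + 1).
u = np.sin(0.2 * np.arange(1000))
X = run_reservoir(u[:-1])[washout:]     # discard the initial transient (Init)
y_d = u[1:][washout:]
W_out, *_ = np.linalg.lstsq(X, y_d, rcond=None)   # least-squares readout (3)
```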
B. Support Vector Echo State Machine
SVESM was proposed as a kind of RNN for modeling nonlinear dynamic systems, combining the advantages of SVMs and ESNs. The essence is to perform linear SVR in the high-dimensional "reservoir" state space, which is called the 'reservoir trick' in contrast to the 'kernel trick'. According to the mechanism of SVESM, the learning procedure can be translated into a standard quadratic programming problem; that is, the optimal solution is unique and the optimization problem is convex. A remarkable feature of SVESM is the flexibility to use different loss functions: a sparse solution representation can be obtained if the ε-insensitive loss function is applied, and robust loss functions such as the Huber loss function can be utilized to create a model which is less sensitive to outliers. These additional features seldom appear in traditional RNNs.
The ε-insensitive loss function is defined as

$$L_\varepsilon\left(f(x) - y\right) = \begin{cases} 0, & |f(x) - y| < \varepsilon \\ |f(x) - y| - \varepsilon, & \text{otherwise} \end{cases} \quad (5)$$
Then the primal optimization problem of SVESM with the ε-insensitive loss function can be formulated as

$$\begin{aligned} \min \quad & \|w\| + C \sum_{j=1}^{N_t} \left(\xi_j + \xi_j^*\right) \\ \text{s.t.} \quad & \left(w^T x_j + b\right) - y_j \le \varepsilon + \xi_j, \quad j = 1, \ldots, N_t \\ & y_j - \left(w^T x_j + b\right) \le \varepsilon + \xi_j^*, \quad j = 1, \ldots, N_t \\ & \xi_j, \xi_j^* \ge 0, \quad j = 1, \ldots, N_t \end{aligned} \quad (6)$$
The corresponding dual optimization problem can be written as

$$\min_{\alpha, \alpha^*} \; \frac{1}{2} \sum_{i=1}^{N_t} \sum_{j=1}^{N_t} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \, x_i^T x_j - \sum_{i=1}^{N_t} (\alpha_i - \alpha_i^*) y_i + \sum_{i=1}^{N_t} (\alpha_i + \alpha_i^*) \varepsilon \quad (7)$$

with constraints

$$0 \le \alpha_i, \alpha_i^* \le C, \quad i = 1, \ldots, N_t, \qquad \sum_{i=1}^{N_t} (\alpha_i - \alpha_i^*) = 0 \quad (8)$$

where α and α* are the corresponding Lagrange multipliers, which can be determined by solving the minimization problem (7) with constraints (8).
The solution is given by a set of Lagrange multipliers, and the prediction function for a new reservoir test state vector $x^{test}$ is generally written as

$$f\left(x^{test}\right) = \sum_{i=1}^{s} \alpha_s(i) \, x(i)^T x^{test} \quad (9)$$

where $\alpha_s(i)$ is the Lagrange multiplier corresponding to the reservoir state vector $x(i)$, and $s$ is the number of support vectors, which is controlled by the size of the insensitive tube when the ε-insensitive loss function is applied.
When the ε-insensitive loss function is used, the solution can be sparse, and the sparseness of the solution can be controlled by the insensitivity parameter ε. A big reservoir and a large C mean a complex model; if the reservoir or C is larger than necessary, SVESM tends to overfit. For a fixed reservoir, the model complexity can be controlled by tuning the regularization parameter C. Techniques such as cross validation, the bootstrap, or Bayesian methods can be applied to determine proper values for these parameters.
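To make the 'reservoir trick' concrete, one way to emulate SVESM with off-the-shelf tools is to run linear ε-insensitive SVR on the collected reservoir states. The sketch below uses scikit-learn's LinearSVR and reuses the X and y_d arrays from the ESN sketch in Section II-A; the values of C and ε are arbitrary placeholders, not tuned settings.

```python
from sklearn.svm import LinearSVR

# Linear epsilon-insensitive SVR in the reservoir state space.
# C and epsilon are illustrative; in practice they would be chosen by
# cross validation, bootstrap, or Bayesian methods as noted above.
svr = LinearSVR(C=1.0, epsilon=0.01, loss="epsilon_insensitive", max_iter=10000)
svr.fit(X, y_d)                   # X: reservoir states, y_d: targets
y_pred = svr.predict(X[-10:])     # f(x) = w^T x + b for new reservoir states
```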
III. WEIGHTED SUPPORT VECTOR ECHO STATE MACHINES
Compared with traditional RNNs, SVESM does not need to deal with the problems of weight estimation, outlier suppression, and generalization capability control [10]. However, the proper regularization parameter of SVESM is difficult to determine, and the regularization effect is applied equally to each sample. It is clear that assigning different weights (regularization parameters) to different samples will lead to a more robust and precise prediction model [22]. Furthermore, there is no doubt that different weights will play an important role in dealing with multivariate time series data, which may provide different descriptions of the underlying dynamic system.
A. WSVESM
The primal optimization problem of WSVESM can be written as

$$\begin{aligned} \min_{w, b, \{\xi_i, \xi_i^*\}_{i=1}^{N_t}} \quad & \frac{1}{2} \|w\|_2^2 + \sum_{i=1}^{N_t} C_i \left(\xi_i + \xi_i^*\right) \\ \text{s.t.} \quad & y_i - f(x_i) \le \varepsilon + \xi_i, \\ & f(x_i) - y_i \le \varepsilon + \xi_i^*, \\ & \xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, N_t \end{aligned} \quad (10)$$
where ε is the insensitive width. WSVESM degenerates to
SVESM when Ci = C, so that SVESM can be viewed as a
special case of WSVESM.
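A quick way to see the effect of per-sample regularization parameters $C_i$ in practice is scikit-learn's sample_weight argument, which rescales C individually for each training sample. This only approximates the WSVESM formulation (it does not use the solution path algorithm of Section III-B), and the data and weights below are illustrative.

```python
import numpy as np
from sklearn.svm import SVR

# Per-sample weights w_i act as C_i = C * w_i, so heavily weighted samples
# are penalized more strongly for leaving the epsilon-tube.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))             # stand-in for reservoir states
y = X[:, 0] + 0.1 * rng.normal(size=200)
weights = np.linspace(0.5, 2.0, 200)      # e.g. emphasize more recent samples

wsvr = SVR(kernel="linear", C=1.0, epsilon=0.01)
wsvr.fit(X, y, sample_weight=weights)     # weighted epsilon-SVR fit
```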
The corresponding dual optimization problem of WSVESM is as follows

$$\begin{aligned} \max_{\{\alpha_i\}_{i=1}^{N_t}} \quad & -\frac{1}{2} \sum_{i=1}^{N_t} \sum_{j=1}^{N_t} \alpha_i \alpha_j K(x_i, x_j) - \varepsilon \sum_{i=1}^{N_t} |\alpha_i| + \sum_{i=1}^{N_t} y_i \alpha_i \\ \text{s.t.} \quad & \sum_{i=1}^{N_t} \alpha_i = 0, \quad -C_i \le \alpha_i \le C_i, \quad i = 1, \ldots, N_t \end{aligned} \quad (11)$$
The regression function has the following expression

$$f(x) = \sum_{i=1}^{N_t} \alpha_i K(x_i, x) + b \quad (12)$$
The corresponding KKT conditions are

$$\begin{aligned} |y_i - f(x_i)| \le \varepsilon, \quad & \alpha_i = 0 \\ |y_i - f(x_i)| = \varepsilon, \quad & 0 < |\alpha_i| < C_i \\ |y_i - f(x_i)| \ge \varepsilon, \quad & |\alpha_i| = C_i \\ \sum_{i=1}^{N_t} \alpha_i = 0 \end{aligned} \quad (13)$$
As a result, the training set can be divided into the following three index sets

$$\begin{aligned} O &= \{i \mid |y_i - f(x_i)| \ge \varepsilon, \; |\alpha_i| = C_i\} \\ E &= \{i \mid |y_i - f(x_i)| = \varepsilon, \; 0 < |\alpha_i| < C_i\} \\ I &= \{i \mid |y_i - f(x_i)| \le \varepsilon, \; \alpha_i = 0\} \end{aligned} \quad (14)$$
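The partition (14) is straightforward to compute once a candidate solution is available. A minimal sketch follows; the numerical tolerance is an implementation detail not in the paper.

```python
import numpy as np

def partition_index_sets(residual, alpha, c, eps, tol=1e-8):
    """Split samples into the sets O, E, I of (14).

    residual[i] = y_i - f(x_i); alpha and c hold the multipliers and the
    per-sample regularization parameters; tol absorbs round-off error.
    """
    r, a = np.abs(residual), np.abs(alpha)
    O = np.where((r >= eps - tol) & np.isclose(a, c, atol=tol))[0]
    E = np.where((np.abs(r - eps) <= tol) & (a > tol) & (a < c - tol))[0]
    I = np.where((r <= eps + tol) & (a <= tol))[0]
    return O, E, I
```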
Let

$$K_\varepsilon = \begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & K_E \end{bmatrix}, \quad s = \begin{bmatrix} \mathrm{sign}(y_1 - f(x_1)) \\ \vdots \\ \mathrm{sign}(y_{N_t} - f(x_{N_t})) \end{bmatrix}, \quad c = \begin{bmatrix} C_1 \\ \vdots \\ C_{N_t} \end{bmatrix} \quad (15)$$

Then

$$\begin{bmatrix} b \\ \alpha_E \end{bmatrix} = -(K_\varepsilon)^{-1} \begin{bmatrix} \mathbf{1}_O^T \\ K_{E,O} \end{bmatrix} \mathrm{diag}(s_O) \, c_O + (K_\varepsilon)^{-1} \begin{bmatrix} 0 \\ y_E - \varepsilon s_E \end{bmatrix} \quad (16)$$

$$\alpha_O = \mathrm{diag}(s_O) \, c_O, \quad \alpha_I = 0 \quad (17)$$

where $\mathrm{diag}(s_O)$ indicates the diagonal matrix with diagonal elements given by $s_O$.
B. Multi-parameter solution path algorithm
Similar to the path-following algorithm for the ordinary WSVM [23], which can compute the entire solution path for the regularization parameters $c = [C_1, \ldots, C_{N_t}]^T$, in this section we introduce a path-following algorithm to determine the optimal regularization parameters of WSVESM. The algorithm keeps track of the optimal $\alpha_i$ and $b$ as the vector $c$ changes.
A change of the index sets O, E and I is called an event.
As long as no event occurs, the WSVESM solution for all c
can be computed by (15)-(17) since all the KKT conditions
(13) are still satisfied. However, when an event occurs, it
is necessary to check the violation of the KKT conditions.
When c is changed, event detection is handled as follows. Before the next event occurs, the resulting changes of the solution can be expressed as

$$\begin{bmatrix} \Delta b \\ \Delta \alpha_E \end{bmatrix} = \Delta\theta \, \phi \quad (18)$$

where

$$\phi = -(K_\varepsilon)^{-1} \begin{bmatrix} \mathbf{1}_O^T \\ K_{E,O} \end{bmatrix} \mathrm{diag}(s_O) \left( c_O^{(new)} - c_O^{(old)} \right) \quad (19)$$
Meanwhile, $\Delta f(x_i)$ can be expressed as

$$\Delta f(x_i) = \begin{bmatrix} 1 & K_{i,E} \end{bmatrix} \begin{bmatrix} \Delta b \\ \Delta \alpha_E \end{bmatrix} + K_{i,O} \, \mathrm{diag}(s_O) \, \Delta c_O = \Delta\theta \, \psi_i \quad (20)$$

where

$$\psi_i = \begin{bmatrix} 1 & K_{i,E} \end{bmatrix} \phi + K_{i,O} \, \mathrm{diag}(s_O) \left( c_O^{(new)} - c_O^{(old)} \right) \quad (21)$$
Writing the elements of the index set E as $E = \{e_1, \ldots, e_{|E|}\}$, the largest step for which no event occurs is

$$\Delta\theta = \min_{i \in \{1, \ldots, |E|\}, \; j \in I \cup O} \left\{ \frac{-\alpha_{e_i}}{\phi_{i+1}}, \; \frac{C_{e_i} - \alpha_{e_i}}{\phi_{i+1} - d_{e_i}}, \; \frac{|\varepsilon - f(x_j)|}{\psi_j} \right\}_+ \quad (22)$$

where $\phi_{i+1}$ is the component of $\phi$ associated with $\alpha_{e_i}$ (the first component corresponds to b), $d_{e_i}$ denotes the rate of change of the bound $C_{e_i}$ along the path, and $\{\cdot\}_+$ keeps only the positive ratios.
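Numerically, (22) is a minimum over positive ratios; a small helper in that spirit is sketched below. All inputs are assumed to be consistent with (15)-(21), the argument names are ours, and no event bookkeeping is included, so this is a sketch of the step computation rather than a full implementation of the algorithm in [23].

```python
import numpy as np

def largest_safe_step(alpha_E, phi_E, c_E, d_E, margins, psi):
    """Largest step Delta-theta before any event, in the spirit of (22).

    alpha_E, phi_E: multipliers on E and their velocities; c_E, d_E: the
    bounds C_e and their velocities; margins, psi: |eps - f(x_j)| and its
    velocity for j in I or O. Only positive ratios can trigger an event.
    """
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = np.concatenate([
            -alpha_E / phi_E,                 # alpha_e reaches zero
            (c_E - alpha_E) / (phi_E - d_E),  # alpha_e reaches its bound C_e
            margins / psi,                    # a point in I or O hits the tube
        ])
    ratios = ratios[np.isfinite(ratios) & (ratios > 0)]   # the {.}_+ operation
    return ratios.min() if ratios.size else np.inf
```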
IV. SIMULATION EXAMPLES
In order to demonstrate the modeling capability of the proposed WSVESM, two simulation examples are conducted in this section. The simulations are run in a Matlab 2011b environment on the Windows 7 operating system with a Pentium(R) 2.6 GHz CPU and 4 GB RAM. To further demonstrate the effectiveness, performance comparisons of WSVESM with other state-of-the-art models are also listed.
A. Lorenz Multivariate Dynamic System
In this example, the proposed WSVESM is used to model the benchmark Lorenz multivariate dynamic system. The equations of the Lorenz three-variable dynamic system are as follows:

$$\begin{aligned} \frac{dx}{dt} &= a(y - x) \\ \frac{dy}{dt} &= (c - z)x - y \\ \frac{dz}{dt} &= xy - bz \end{aligned} \quad (23)$$
When a = 10, b = 8/3, c = 28 and x(0) = y(0) = z(0) = 1.0, chaotic dynamics of (23) can be observed. The fourth-order Runge-Kutta method is used to generate the chaotic time series, with the sampling time selected as 0.02. From (23), we can see that the derivative of x depends on x and y, while the derivatives of y and z each depend on x, y and z. As a result, we use x(t), y(t) and z(t) together to predict x(t + η), where η is the prediction horizon.
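A minimal way to reproduce this data set, assuming the stated parameters, initial condition, and sampling time (the function names are ours), is:

```python
import numpy as np

def lorenz_rhs(s, a=10.0, b=8.0 / 3.0, c=28.0):
    """Right-hand side of (23) for the state s = (x, y, z)."""
    x, y, z = s
    return np.array([a * (y - x), (c - z) * x - y, x * y - b * z])

def rk4_series(n_steps, h=0.02, s0=(1.0, 1.0, 1.0)):
    """Generate the chaotic series with the classical 4th-order Runge-Kutta."""
    s = np.array(s0, dtype=float)
    out = np.empty((n_steps, 3))
    for k in range(n_steps):
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs(s + 0.5 * h * k1)
        k3 = lorenz_rhs(s + 0.5 * h * k2)
        k4 = lorenz_rhs(s + h * k3)
        s = s + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        out[k] = s
    return out  # columns: x(t), y(t), z(t)

series = rk4_series(2491)   # 2000 training + 491 testing samples
```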
The size of the reservoir of WSVESM is set to 200, and the sparseness and spectral radius of $W_x$ are set to 2% and 0.98, respectively. The input weights are randomly generated in [-1, +1]; the first 2000 samples are used as the training set and the remaining 491 samples are used to test the prediction performance.
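A reservoir construction matching these stated settings (200 units, 2% sparseness, spectral radius 0.98, input weights in [-1, +1]) might look like the sketch below; the random seed is arbitrary, and the state-update loop from the Section II-A sketch can be reused.

```python
import numpy as np

rng = np.random.default_rng(42)
n_res, n_in = 200, 3        # three inputs: x(t), y(t), z(t)

# Sparse internal weights: keep ~2% of entries, rescale to spectral radius 0.98.
W_x = rng.uniform(-1, 1, (n_res, n_res))
W_x *= rng.random((n_res, n_res)) < 0.02
W_x *= 0.98 / max(abs(np.linalg.eigvals(W_x)))

# Input weights uniform in [-1, +1].
W_in = rng.uniform(-1, 1, (n_res, n_in))
```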
The 1-step-ahead prediction results based on WSVESM are shown in Fig. 1. From the results, we can see that the prediction curve fits the actual curve well, and the prediction errors are reasonably small.
The comparison of WSVESM and other commonly used methods is summarized in TABLE I. The compared methods are ESN [6], the extreme learning machine (ELM) [24], and SVESM [10], where ESN is a chaotic time series modeling method superior to traditional RNNs, ELM is also a promising modeling method with better performance than traditional feed-forward NNs, and SVESM is the prototype of WSVESM. As shown in TABLE I, the prediction accuracy of the proposed WSVESM is close to that of SVESM, and WSVESM achieves the best prediction result among the four compared methods.
TABLE I
COMPARISON OF PREDICTION ACCURACY
Methods    ESN [6]    ELM [24]    SVESM [10]    WSVESM
RMSE       0.0025     0.0022      0.0018        0.0017
Fig. 1. Lorenz x(t) time series 1-step-ahead prediction curves of the WSVESM method: (a) prediction curves (actual vs. predicted) and (b) prediction error.
In order to further demonstrate the performance of WSVESM, iterative predictions and errors over 100 points are shown in Fig. 2. The predicted curve fits the actual curve well, and the prediction errors grow very slowly with the iterations.
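Iterative (multi-step) prediction feeds each one-step prediction back as the next input. A schematic loop is sketched below; it assumes a fitted one-step readout model and a state-update function like run_reservoir from Section II-A, and, as a simplification not taken from the paper, it feeds back only the predicted x-component while holding the other inputs at their last observed values.

```python
import numpy as np

def iterate_prediction(model, update_state, x_state, u_last, n_steps=100):
    """Closed-loop forecasting: each prediction becomes the next input.

    update_state(x_state, u) advances the reservoir state for input u;
    model.predict maps a reservoir state to the next x-value. Both are
    assumed to be fitted beforehand.
    """
    u = np.array(u_last, dtype=float)
    preds = []
    for _ in range(n_steps):
        x_state = update_state(x_state, u)
        x_next = model.predict(x_state[None, :])[0]
        preds.append(x_next)
        u[0] = x_next            # feed the prediction back as the x-input
    return np.array(preds)
```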
Fig. 2. Lorenz x(t) time series 100-step iterative prediction curves of the WSVESM method: (a) prediction curves (actual vs. predicted) and (b) prediction error.
B. The Annual Runoff of Yellow River and Sunspots
In this simulation, WSVESM is tested on a real-world example: the complex dynamic system consisting of the annual mean sunspot number and the natural annual runoff. Fig. 3 shows the annual mean sunspot number (Fig. 3(a)) and the annual runoff of the Yellow River (Fig. 3(b)) measured at the Sanmenxia gauge station from 1700 to 2003, about 304 years. The parameters of WSVESM in this example are set as follows: the size of the reservoir is set to 200, and the sparseness and spectral radius of $W_x$ are set to 2% and 0.98, respectively. The input weights are randomly generated uniformly over [-1, +1]; the first 250 samples are used for training the model and the remaining 48 samples are used as the testing set.
The simulation results are shown in TABLE II. Through the comparison of the different models, we can see that the proposed WSVESM is able to preserve a better dynamic characterization of the sunspots and the Yellow River runoff and has a higher prediction accuracy. Fig. 4 shows that the predicted curve approximates the trend of the actual curve well, and the errors are at a promisingly low level.
TABLE II
COMPARISON OF PREDICTION ACCURACY

Methods    MrESN [25]    ESN [6]    SVESM [10]    WSVESM
Runoff     43.7030       50.7200    48.5690       25.0733
Fig. 3. The sunspot number and annual runoff of the Yellow River bivariate time series, 1700-2003: (a) the sunspot number series (Num.) and (b) the annual runoff series of the Yellow River (×10^8 m^3).
Fig. 4. Predicted annual runoff time series of the Yellow River based on WSVESM: (a) prediction curve (actual vs. predicted) and (b) prediction error.
V. CONCLUSIONS
In this paper, a weighted support vector echo state machine prediction model is proposed. Each of the training samples is assigned a weight, which is computed by a solution path algorithm. The WSVESM model combines the advantages of the echo state network and the support vector machine and, through instance weighting, can capture more dynamic features of the multivariate dynamic system. Its performance has been tested on two multivariate dynamic system modeling examples: the Lorenz chaotic time series and the annual runoff of the Yellow River and sunspots time series. The simulation results show that the model based on the method presented herein is more accurate and effective.
REFERENCES
[1] H. Zhang and Y. Quan, "Modeling, identification, and control of a class of nonlinear systems," IEEE Transactions on Fuzzy Systems, vol. 9, no. 2, pp. 349-354, 2001.
[2] D. Liu, T.-S. Chang, and Y. Zhang, "A constructive algorithm for feedforward neural networks with incremental training," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 49, no. 12, pp. 1876-1879, 2002.
[3] F. Takens, "Detecting strange attractors in turbulence," Dynamical Systems and Turbulence, Warwick 1980, pp. 366-381, 1981.
[4] L. Cao, A. Mees, and K. Judd, "Dynamics from multivariate time series," Physica D: Nonlinear Phenomena, vol. 121, no. 1-2, pp. 75-88, 1998.
[5] A. A. Jamshidi and M. J. Kirby, "Modeling multivariate time series on manifolds with skew radial basis functions," Neural Computation, vol. 23, no. 1, pp. 97-123, Jan. 2011.
[6] H. Jaeger and H. Haas, "Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication," Science, vol. 304, no. 5667, p. 78, 2004.
[7] M. Lukosevicius and H. Jaeger, "Reservoir computing approaches to recurrent neural network training," Computer Science Review, vol. 3, no. 3, pp. 127-149, 2009.
[8] M. Ozturk, D. Xu, and J. Principe, "Analysis and design of echo state networks," Neural Computation, vol. 19, no. 1, pp. 111-138, 2007.
[9] Y. Xia, B. Jelfs, M. Van Hulle, J. Principe, and D. Mandic, "An augmented echo state network for nonlinear adaptive filtering of complex noncircular signals," IEEE Transactions on Neural Networks, no. 99, pp. 1-10, 2010.
[10] Z. Shi and M. Han, "Support vector echo-state machine for chaotic time-series prediction," IEEE Transactions on Neural Networks, vol. 18, no. 2, pp. 359-372, 2007.
[11] D. Li, M. Han, and J. Wang, "Chaotic time series prediction based on a novel robust echo state network," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 5, pp. 787-799, 2012.
[12] R. F. Reinhart and J. J. Steil, "Regularization and stability in reservoir networks with output feedback," Neurocomputing, vol. 90, pp. 96-105, Mar. 2012.
[13] A. Rodan and P. Tino, "Simple deterministically constructed cycle reservoirs with regular jumps," Neural Computation, vol. 24, no. 7, pp. 1822-1852, Mar. 2012.
[14] D. Prokhorov, "Echo state networks: Appeal and challenges," in International Joint Conference on Neural Networks, vol. 3. IEEE, 2005, pp. 1463-1466.
[15] Y. Xue, L. Yang, and S. Haykin, "Decoupled echo state networks with lateral inhibition," Neural Networks, vol. 20, no. 3, pp. 365-376, Apr. 2007.
[16] L. Busing, B. Schrauwen, and R. Legenstein, "Connectivity, dynamics, and memory in reservoir computing with binary and analog neurons," Neural Computation, vol. 22, no. 5, pp. 1272-1311, May 2010.
[17] D. Shutin, C. Zechner, S. Kulkarni, and H. Poor, "Regularized variational Bayesian learning of echo state networks with delay sum readout," Neural Computation, pp. 1-29, 2011.
[18] S. Chatzis and Y. Demiris, "Echo state Gaussian process," IEEE Transactions on Neural Networks, vol. 22, no. 9, pp. 1435-1445, 2011.
[19] J. Schmidhuber, D. Wierstra, M. Gagliolo, and F. Gomez, "Training recurrent networks by Evolino," Neural Computation, vol. 19, no. 3, pp. 757-779, Mar. 2007.
[20] M. Hermans and B. Schrauwen, "Recurrent kernel machines: Computing with infinite echo state networks," Neural Computation, vol. 24, no. 1, pp. 104-133, 2012.
[21] X. Liu, C. Gao, and P. Li, "A comparative analysis of support vector machines and extreme learning machines," Neural Networks, Apr. 2012.
[22] J. Suykens, J. De Brabanter, L. Lukas, and J. Vandewalle, "Weighted least squares support vector machines: robustness and sparse approximation," Neurocomputing, vol. 48, no. 1, pp. 85-105, 2002.
[23] M. Karasuyama, N. Harada, M. Sugiyama, and I. Takeuchi, "Multi-parametric solution-path algorithm for instance-weighted support vector machines," Machine Learning, vol. 88, no. 3, pp. 297-330, Apr. 2012.
[24] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: Theory and applications," Neurocomputing, vol. 70, no. 1-3, pp. 489-501, 2006.
[25] M. Han and D. Mu, "Multi-reservoir echo state network with sparse Bayesian learning," in Advances in Neural Networks - ISNN 2010. Springer, 2010, pp. 450-456.