Weighted Support Vector Echo State Machine for Multivariate Dynamic System Modeling*
Min Han and Xinying Wang
Abstract— Support vector echo state machine (SVESM) is a promising dynamic system modeling tool that performs linear support vector regression (SVR) in the high-dimensional "reservoir" state space. A variant of SVESM, the weighted support vector echo state machine (WSVESM), is proposed in this paper to deal with the multivariate dynamic system modeling problem. The historical observed data of the dynamic system are treated as multivariate time series, and the proposed WSVESM model is used to predict the time series. Different weights are allocated to the training data, and a multi-parameter solution path algorithm is introduced to determine the solution of WSVESM. Simulation results based on artificial and real-world examples show the effectiveness of the proposed method.
I. INTRODUCTION
Nonlinear dynamic system modeling has become an important research direction and has attracted attention not only in the scientific research field but also in the engineering field [1]. Nonlinear dynamic systems commonly exhibit strong nonlinearity and complex evolution laws, so it is difficult to establish exact mathematical models for them. However, a large amount of historical evolution data can be observed as a function of time. Time series prediction techniques can utilize these historical data to predict the future evolution dynamics and have become a promising nonlinear system modeling approach.
Considering the essence of nonlinear dynamic systems, the predictor should have a nonlinearity processing ability, which rules out traditional linear models. Consequently, neural networks (NNs), which can effectively deal with nonlinearity, have become a popular nonlinear time series modeling methodology [2]. The theoretical basis of most neural network prediction models is Takens' theorem [3], which indicates that, in principle, a nonlinear time series x(t) observed from a dynamic system can be forecast from a few previous support points. This means the next value x(t + 1) is determined by a few historical values x(t), x(t − τ), x(t − 2τ), ..., x(t − (m − 1)τ), where τ is the delay time and m is the embedding dimension.
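For concreteness, a minimal sketch of such a delay-coordinate embedding is shown below (Python/NumPy; the function name and the toy series are illustrative, not part of the original paper):

```python
import numpy as np

def delay_embed(x, m, tau):
    """Build the delay-coordinate matrix of a scalar series x.

    Row k holds [x(k), x(k - tau), ..., x(k - (m - 1) * tau)];
    the matching one-step-ahead target is x(k + 1).
    """
    start = (m - 1) * tau                      # first index with a full history
    rows = [[x[k - j * tau] for j in range(m)]
            for k in range(start, len(x) - 1)]
    X = np.asarray(rows)                       # (N, m) embedded states
    y = x[start + 1:]                          # one-step-ahead targets
    return X, y

# Example: embed a toy sine series with m = 3, tau = 2.
x = np.sin(0.1 * np.arange(500))
X, y = delay_embed(x, m=3, tau=2)
```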
Essentially, by Takens' theorem [3], if there are enough delayed coordinates, a scalar time series is generally sufficient to reconstruct the dynamics of a nonlinear dynamic system. In practice, however, a given scalar time series may not be sufficient to reconstruct the dynamics.
*This work was supported by the National Natural Science Foundation of China under Projects 61374154 and 61074096.
Min Han and Xinying Wang are with the Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning, 116023, China. [email protected]
Furthermore, multivariate time series data of the dynamic system can often be observed, and there are substantial advantages to using these multivariate data [4], [5].
Recently, the echo state network (ESN) was proposed to model the Mackey-Glass chaotic system [6]. A sparsely connected reservoir is used as the hidden layer of an ESN; the input weights and the internal connection weights are randomly generated and kept fixed during the learning process, and only the connection weights between the reservoir and the output layer are tunable [7]. Owing to their simple structure and efficient training method, ESNs have been successfully applied in many different fields, e.g. signal processing [8], [9], time series prediction [10], [11], pattern recognition [12], [13], and others [14], [15]. Research on ESNs is in the ascendant. Conceptually, the training process of an ESN can be seen as supervised non-temporal linear regression. Many algorithms have been extended or developed for ESNs [16], [17]; among them, the echo state Gaussian process [18], Evolino [19], recurrent kernel machines [20], and the support vector echo state machine [10] combine ESNs with kernel methods (KMs). These methods, combining the advantages of both ESNs and KMs, achieve superior modeling accuracy and have become an interesting research direction [21].
In this paper, a weighted support vector echo state machine (WSVESM) is proposed for multivariate dynamic system modeling. Different weights are assigned to the multivariate time series data, and a solution path algorithm is developed to optimize the weights. Utilizing the weights assigned to each sample, the proposed model can capture more of the underlying dynamics and has a more robust modeling capability [22]. The rest of this paper is structured as follows. Section II gives a brief review of the preliminary work. Section III presents the weighted support vector echo state machine and its solution path algorithm. Section IV gives the simulation results of two examples: the first models the Lorenz chaotic multivariate dynamic system, and the second models the annual runoff and sunspot bivariate dynamic system. Finally, Section V gives discussions and conclusions.
II. PRELIMINARY
In this paper, WSVESM is developed based on SVESM.
Before the introduction of WSVESM, it is therefore neces-
sary to briefly recall ESN [6] and SVESM [10].
A. Echo State Network
The equations of the ESN are as follows:

$$x(k+1) = \mathrm{tansig}\left(W_x x(k) + W_{in} u(k) + W_{back} y(k)\right) \quad (1)$$

$$y(k+1) = W_{out}^T x(k+1) \quad (2)$$
where tansig denotes the hyperbolic tangent sigmoid activation function, which is applied elementwise; x(k) is the state vector in the "reservoir"; and u(k) and y(k) denote the input and output vectors of the ESN, respectively. $W_{in}$, $W_x$, $W_{back}$ and $W_{out}$ are the input weights, internal connection weights, feedback weights and output weights of the ESN, respectively. $W_{back}$ is discarded in this paper.
The essence of ESNs is the large, sparsely connected and fixed "reservoir", from which the desired output is obtained by learning suitable output weights. Determination of the optimal output weights is a linear regression task

$$\min_{W_{out}} \left\| X W_{out} - y_d \right\| \quad (3)$$

where $X = \left[ x^T(Init), x^T(Init+1), \ldots, x^T(Trn) \right]$ is the matrix of collected reservoir states and $y_d = \left[ y_d(Init), y_d(Init+1), \ldots, y_d(Trn) \right]$. $Init$ and $Trn$ are the beginning and ending indices of the training examples, respectively, and the size of the training set is $N_t = Trn - Init + 1$. $Init$ is usually set to a suitable value to discard the influence of the reservoir's initial transient.
The prediction function for new network inputs is given by (without loss of generality, a bias term b is added to the regression function)

$$f\left(x^{test}\right) = w^T x^{test} + b \quad (4)$$

where $x^{test}$ is the new reservoir state vector excited by the new network inputs; without loss of generality, hereafter we use $w$ instead of $W_{out}$. The reservoir used for new predictions is the same as the one used in the training phase. The output weights can be calculated by solving a standard least-squares problem.
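As a rough illustration of (1)-(3), the following sketch builds a random reservoir, drives it with an input sequence, discards the initial transient, and fits the output weights by least squares. The reservoir size, sparseness, washout length, and toy task are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, n_in, washout = 100, 1, 50   # illustrative sizes, not from the paper

# Random fixed weights: W_in uniform in [-1, 1]; W_x sparse, rescaled so
# its spectral radius is below 1 (W_back is discarded, as in the text).
W_in = rng.uniform(-1, 1, (n_res, n_in))
W_x = rng.uniform(-1, 1, (n_res, n_res)) * (rng.random((n_res, n_res)) < 0.02)
W_x *= 0.9 / max(abs(np.linalg.eigvals(W_x)))

def run_reservoir(u):
    """Iterate x(k+1) = tanh(W_x x(k) + W_in u(k)) over an input sequence."""
    x = np.zeros(n_res)
    states = []
    for u_k in u:
        x = np.tanh(W_x @ x + W_in @ np.atleast_1d(u_k))
        states.append(x.copy())
    return np.array(states)

# One-step-ahead task on a toy series: u(k) -> y_d(k) = u(k + 1).
u = np.sin(0.2 * np.arange(1000))
X = run_reservoir(u[:-1])[washout:]     # discard the initial transient (Init)
y_d = u[1:][washout:]
W_out, *_ = np.linalg.lstsq(X, y_d, rcond=None)   # least-squares readout (3)
```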
B. Support Vector Echo State Machine
SVESM was proposed as a kind of RNN for modeling nonlinear dynamic systems, combining the advantages of SVMs and ESNs. The essence is to perform linear SVR in the high-dimensional "reservoir" state space, which is called the 'reservoir trick' in contrast to the 'kernel trick'. According to the mechanism of SVESM, the learning procedure can be translated into a standard quadratic programming problem; that is, the optimal solution is unique and the optimization problem is convex. A remarkable feature of SVESM is the flexibility to use different loss functions: a sparse solution representation can be obtained if the ε-insensitive loss function is applied, and robust loss functions such as the Huber loss function can be utilized to create a model which is less sensitive to outliers. These additional features seldom appear in traditional RNNs.
The ε-insensitive loss function is defined as

$$L_\varepsilon\left(f(x) - y\right) = \begin{cases} 0, & |f(x) - y| < \varepsilon \\ |f(x) - y| - \varepsilon, & \text{otherwise} \end{cases} \quad (5)$$
Then the primal optimization problem of SVESM with the ε-insensitive loss function can be formulated as

$$\begin{aligned} \min \quad & \|w\| + C \sum_{j=1}^{N_t} \left(\xi_j + \xi_j^*\right) \\ \text{s.t.} \quad & \left(w^T x_j + b\right) - y_j \le \varepsilon + \xi_j, \quad j = 1, \ldots, N_t \\ & y_j - \left(w^T x_j + b\right) \le \varepsilon + \xi_j^*, \quad j = 1, \ldots, N_t \\ & \xi_j, \xi_j^* \ge 0, \quad j = 1, \ldots, N_t \end{aligned} \quad (6)$$
The corresponding dual optimization problem can be written as

$$\min_{\alpha, \alpha^*} \; \frac{1}{2} \sum_{i=1}^{N_t} \sum_{j=1}^{N_t} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \, x_i^T x_j - \sum_{i=1}^{N_t} (\alpha_i - \alpha_i^*) y_i + \sum_{i=1}^{N_t} (\alpha_i + \alpha_i^*) \varepsilon \quad (7)$$

with constraints

$$0 \le \alpha_i, \alpha_i^* \le C, \quad i = 1, \ldots, N_t, \qquad \sum_{i=1}^{N_t} (\alpha_i - \alpha_i^*) = 0 \quad (8)$$

where α and α* are the corresponding Lagrange multipliers, which can be determined by solving the minimization problem (7) with constraints (8).
The solution is given by a set of Lagrange multipliers, and the prediction function for a new reservoir test state vector $x^{test}$ is generally written as

$$f\left(x^{test}\right) = \sum_{i=1}^{s} \alpha_s(i) \, x(i)^T x^{test} \quad (9)$$

where $\alpha_s(i)$ is the Lagrange multiplier corresponding to the reservoir state vector $x(i)$, and $s$ is the number of support vectors, which is controlled by the size of the insensitive tube when the ε-insensitive loss function is applied.
When the ε-insensitive loss function is used, the solution can be sparse, and the sparseness of the solution can be controlled by the insensitivity parameter ε. A big reservoir and a large C mean a complex model; if the reservoir or C is larger than necessary, SVESM tends to overfit. For a fixed reservoir, the model complexity can be controlled by tuning the regularization parameter C. Techniques such as cross validation, the bootstrap, or Bayesian methods can be applied to determine proper values for these parameters.
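To make the 'reservoir trick' concrete, one way to emulate SVESM with off-the-shelf tools is to run linear ε-insensitive SVR on the collected reservoir states. The sketch below uses scikit-learn's LinearSVR and reuses the X and y_d arrays from the ESN sketch in Section II-A; the values of C and ε are arbitrary placeholders, not tuned settings.

```python
from sklearn.svm import LinearSVR

# Linear epsilon-insensitive SVR in the reservoir state space.
# C and epsilon are illustrative; in practice they would be chosen by
# cross validation, bootstrap, or Bayesian methods as noted above.
svr = LinearSVR(C=1.0, epsilon=0.01, loss="epsilon_insensitive", max_iter=10000)
svr.fit(X, y_d)                   # X: reservoir states, y_d: targets
y_pred = svr.predict(X[-10:])     # f(x) = w^T x + b for new reservoir states
```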
III. WEIGHTED SUPPORT VECTOR ECHO STATE MACHINES
Compared with traditional RNNs, SVESM does not need to deal with the problems of weight estimation, outlier suppression, and generalization capability control [10]. However, the proper regularization parameter of SVESM is difficult to determine, and the regularization effect is applied equally to each sample. It is clear that assigning different weights (regularization parameters) to different samples will lead to a more robust and precise prediction model [22]. Furthermore, there is no doubt that different weights will play an important role in dealing with multivariate time series data, which may provide different descriptions of the underlying dynamic system.
A. WSVESM
The primal optimization problem of WSVESM can be written as

$$\begin{aligned} \min_{w, b, \{\xi_i, \xi_i^*\}_{i=1}^{N_t}} \quad & \frac{1}{2} \|w\|_2^2 + \sum_{i=1}^{N_t} C_i \left(\xi_i + \xi_i^*\right) \\ \text{s.t.} \quad & y_i - f(x_i) \le \varepsilon + \xi_i, \\ & f(x_i) - y_i \le \varepsilon + \xi_i^*, \\ & \xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, N_t \end{aligned} \quad (10)$$
where ε is the insensitive width. WSVESM degenerates to
SVESM when Ci = C, so that SVESM can be viewed as a
special case of WSVESM.
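A quick way to see the effect of per-sample regularization parameters $C_i$ in practice is scikit-learn's sample_weight argument, which rescales C individually for each training sample. This only approximates the WSVESM formulation (it does not use the solution path algorithm of Section III-B), and the data and weights below are illustrative.

```python
import numpy as np
from sklearn.svm import SVR

# Per-sample weights w_i act as C_i = C * w_i, so heavily weighted samples
# are penalized more strongly for leaving the epsilon-tube.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))             # stand-in for reservoir states
y = X[:, 0] + 0.1 * rng.normal(size=200)
weights = np.linspace(0.5, 2.0, 200)      # e.g. emphasize more recent samples

wsvr = SVR(kernel="linear", C=1.0, epsilon=0.01)
wsvr.fit(X, y, sample_weight=weights)     # weighted epsilon-SVR fit
```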
The corresponding dual optimization problem of WSVESM is as follows

$$\begin{aligned} \max_{\{\alpha_i\}_{i=1}^{N_t}} \quad & -\frac{1}{2} \sum_{i=1}^{N_t} \sum_{j=1}^{N_t} \alpha_i \alpha_j K(x_i, x_j) - \varepsilon \sum_{i=1}^{N_t} |\alpha_i| + \sum_{i=1}^{N_t} y_i \alpha_i \\ \text{s.t.} \quad & \sum_{i=1}^{N_t} \alpha_i = 0, \quad -C_i \le \alpha_i \le C_i, \quad i = 1, \ldots, N_t \end{aligned} \quad (11)$$
The regression function has the following expression

$$f(x) = \sum_{i=1}^{N_t} \alpha_i K(x_i, x) + b \quad (12)$$
The corresponding KKT conditions are

$$\begin{aligned} |y_i - f(x_i)| \le \varepsilon, \quad & \alpha_i = 0 \\ |y_i - f(x_i)| = \varepsilon, \quad & 0 < |\alpha_i| < C_i \\ |y_i - f(x_i)| \ge \varepsilon, \quad & |\alpha_i| = C_i \\ \sum_{i=1}^{N_t} \alpha_i = 0 \end{aligned} \quad (13)$$
As a result, the training set can be divided into the following three index sets

$$\begin{aligned} O &= \{i \mid |y_i - f(x_i)| \ge \varepsilon, \; |\alpha_i| = C_i\} \\ E &= \{i \mid |y_i - f(x_i)| = \varepsilon, \; 0 < |\alpha_i| < C_i\} \\ I &= \{i \mid |y_i - f(x_i)| \le \varepsilon, \; \alpha_i = 0\} \end{aligned} \quad (14)$$
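The partition (14) is straightforward to compute once a candidate solution is available. A minimal sketch follows; the numerical tolerance is an implementation detail not in the paper.

```python
import numpy as np

def partition_index_sets(residual, alpha, c, eps, tol=1e-8):
    """Split samples into the sets O, E, I of (14).

    residual[i] = y_i - f(x_i); alpha and c hold the multipliers and the
    per-sample regularization parameters; tol absorbs round-off error.
    """
    r, a = np.abs(residual), np.abs(alpha)
    O = np.where((r >= eps - tol) & np.isclose(a, c, atol=tol))[0]
    E = np.where((np.abs(r - eps) <= tol) & (a > tol) & (a < c - tol))[0]
    I = np.where((r <= eps + tol) & (a <= tol))[0]
    return O, E, I
```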
Let

$$K_\varepsilon = \begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & K_E \end{bmatrix}, \quad s = \begin{bmatrix} \mathrm{sign}(y_1 - f(x_1)) \\ \vdots \\ \mathrm{sign}(y_{N_t} - f(x_{N_t})) \end{bmatrix}, \quad c = \begin{bmatrix} C_1 \\ \vdots \\ C_{N_t} \end{bmatrix} \quad (15)$$

Then

$$\begin{bmatrix} b \\ \alpha_E \end{bmatrix} = -(K_\varepsilon)^{-1} \begin{bmatrix} \mathbf{1}_O^T \\ K_{E,O} \end{bmatrix} \mathrm{diag}(s_O) \, c_O + (K_\varepsilon)^{-1} \begin{bmatrix} 0 \\ y_E - \varepsilon s_E \end{bmatrix} \quad (16)$$

$$\alpha_O = \mathrm{diag}(s_O) \, c_O, \quad \alpha_I = 0 \quad (17)$$

where $\mathrm{diag}(s_O)$ indicates the diagonal matrix with diagonal elements given by $s_O$.
B. Multi-parameter solution path algorithm
Similar to the path-following algorithm for the ordinary WSVM [23], which can compute the entire solution path for the regularization parameters $c = [C_1, \ldots, C_{N_t}]^T$, in this section we introduce a path-following algorithm to determine the optimal regularization parameters of WSVESM. The algorithm keeps track of the optimal $\alpha_i$ and $b$ as the vector $c$ changes.
A change of the index sets O, E and I is called an event.
As long as no event occurs, the WSVESM solution for all c
can be computed by (15)-(17) since all the KKT conditions
(13) are still satisfied. However, when an event occurs, it
is necessary to check the violation of the KKT conditions.
When c is changed, event detection is handled as follows. Before the next event occurs, the resulting changes of the solution can be expressed as

$$\begin{bmatrix} \Delta b \\ \Delta \alpha_E \end{bmatrix} = \Delta\theta \, \phi \quad (18)$$

where

$$\phi = -(K_\varepsilon)^{-1} \begin{bmatrix} \mathbf{1}_O^T \\ K_{E,O} \end{bmatrix} \mathrm{diag}(s_O) \left( c_O^{(new)} - c_O^{(old)} \right) \quad (19)$$
Meanwhile, $\Delta f(x_i)$ can be expressed as

$$\Delta f(x_i) = \begin{bmatrix} 1 & K_{i,E} \end{bmatrix} \begin{bmatrix} \Delta b \\ \Delta \alpha_E \end{bmatrix} + K_{i,O} \, \mathrm{diag}(s_O) \, \Delta c_O = \Delta\theta \, \psi_i \quad (20)$$

where

$$\psi_i = \begin{bmatrix} 1 & K_{i,E} \end{bmatrix} \phi + K_{i,O} \, \mathrm{diag}(s_O) \left( c_O^{(new)} - c_O^{(old)} \right) \quad (21)$$
Writing the elements of the index set E as $E = \{e_1, \ldots, e_{|E|}\}$, the largest step for which no event occurs is

$$\Delta\theta = \min_{i \in \{1, \ldots, |E|\}, \; j \in I \cup O} \left\{ \frac{-\alpha_{e_i}}{\phi_{i+1}}, \; \frac{C_{e_i} - \alpha_{e_i}}{\phi_{i+1} - d_{e_i}}, \; \frac{|\varepsilon - f(x_j)|}{\psi_j} \right\}_+ \quad (22)$$

where $\phi_{i+1}$ is the component of $\phi$ associated with $\alpha_{e_i}$ (the first component corresponds to b), $d_{e_i}$ denotes the rate of change of the bound $C_{e_i}$ along the path, and $\{\cdot\}_+$ keeps only the positive ratios.
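Numerically, (22) is a minimum over positive ratios; a small helper in that spirit is sketched below. All inputs are assumed to be consistent with (15)-(21), the argument names are ours, and no event bookkeeping is included, so this is a sketch of the step computation rather than a full implementation of the algorithm in [23].

```python
import numpy as np

def largest_safe_step(alpha_E, phi_E, c_E, d_E, margins, psi):
    """Largest step Delta-theta before any event, in the spirit of (22).

    alpha_E, phi_E: multipliers on E and their velocities; c_E, d_E: the
    bounds C_e and their velocities; margins, psi: |eps - f(x_j)| and its
    velocity for j in I or O. Only positive ratios can trigger an event.
    """
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = np.concatenate([
            -alpha_E / phi_E,                 # alpha_e reaches zero
            (c_E - alpha_E) / (phi_E - d_E),  # alpha_e reaches its bound C_e
            margins / psi,                    # a point in I or O hits the tube
        ])
    ratios = ratios[np.isfinite(ratios) & (ratios > 0)]   # the {.}_+ operation
    return ratios.min() if ratios.size else np.inf
```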
IV. SIMULATION EXAMPLES
In order to demonstrate the modeling capability of the proposed WSVESM, two simulation examples are conducted in this section. The simulations are run in a Matlab 2011b environment on the Windows 7 operating system with a Pentium(R) 2.6 GHz CPU and 4 GB RAM. To further demonstrate the effectiveness, performance comparisons of WSVESM with other state-of-the-art models are also listed.
A. Lorenz Multivariate Dynamic System
In this example, the proposed WSVESM is used to model the benchmark Lorenz multivariate dynamic system. The equations of the Lorenz three-variable dynamic system are as follows:

$$\begin{aligned} \frac{dx}{dt} &= a(y - x) \\ \frac{dy}{dt} &= (c - z)x - y \\ \frac{dz}{dt} &= xy - bz \end{aligned} \quad (23)$$
When a = 10, b = 8/3, c = 28 and x(0) = y(0) = z(0) = 1.0, chaotic dynamics of (23) can be observed. The fourth-order Runge-Kutta method is used to generate the chaotic time series, with the sampling time selected as 0.02. From (23), we can see that the derivative of x depends on x and y, while the derivatives of y and z each depend on x, y and z. As a result, we use x(t), y(t) and z(t) together to predict x(t + η), where η is the prediction horizon.
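A minimal way to reproduce this data set, assuming the stated parameters, initial condition, and sampling time (the function names are ours), is:

```python
import numpy as np

def lorenz_rhs(s, a=10.0, b=8.0 / 3.0, c=28.0):
    """Right-hand side of (23) for the state s = (x, y, z)."""
    x, y, z = s
    return np.array([a * (y - x), (c - z) * x - y, x * y - b * z])

def rk4_series(n_steps, h=0.02, s0=(1.0, 1.0, 1.0)):
    """Generate the chaotic series with the classical 4th-order Runge-Kutta."""
    s = np.array(s0, dtype=float)
    out = np.empty((n_steps, 3))
    for k in range(n_steps):
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs(s + 0.5 * h * k1)
        k3 = lorenz_rhs(s + 0.5 * h * k2)
        k4 = lorenz_rhs(s + h * k3)
        s = s + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        out[k] = s
    return out  # columns: x(t), y(t), z(t)

series = rk4_series(2491)   # 2000 training + 491 testing samples
```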
The size of the reservoir of WSVESM is set to 200, and the sparseness and spectral radius of $W_x$ are set to 2% and 0.98, respectively. The input weights are randomly generated in [-1, +1]; the first 2000 samples are used as the training set and the remaining 491 samples are used to test the prediction performance.
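A reservoir construction matching these stated settings (200 units, 2% sparseness, spectral radius 0.98, input weights in [-1, +1]) might look like the sketch below; the random seed is arbitrary, and the state-update loop from the Section II-A sketch can be reused.

```python
import numpy as np

rng = np.random.default_rng(42)
n_res, n_in = 200, 3        # three inputs: x(t), y(t), z(t)

# Sparse internal weights: keep ~2% of entries, rescale to spectral radius 0.98.
W_x = rng.uniform(-1, 1, (n_res, n_res))
W_x *= rng.random((n_res, n_res)) < 0.02
W_x *= 0.98 / max(abs(np.linalg.eigvals(W_x)))

# Input weights uniform in [-1, +1].
W_in = rng.uniform(-1, 1, (n_res, n_in))
```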
The 1-step-ahead prediction results based on WSVESM are shown in Fig. 1. From the results, we can see that the prediction curve fits the actual curve well, and the prediction errors are reasonably small.
The comparison of WSVESM and other commonly used methods is summarized in TABLE I. The compared methods are ESN [6], the extreme learning machine (ELM) [24], and SVESM [10], where ESN is a chaotic time series modeling method superior to traditional RNNs, ELM is also a promising modeling method with better performance than traditional feed-forward NNs, and SVESM is the prototype of WSVESM. As shown in TABLE I, the prediction accuracy of the proposed WSVESM is close to that of SVESM, and WSVESM achieves the best prediction result among the four compared methods.
TABLE I
COMPARISON OF PREDICTION ACCURACY
Methods    ESN [6]    ELM [24]    SVESM [10]    WSVESM
RMSE       0.0025     0.0022      0.0018        0.0017
Fig. 1. Lorenz x(t) time series 1-step-ahead prediction curves of the WSVESM method: (a) prediction curves (actual vs. predicted) and (b) prediction error.
In order to further demonstrate the performance of WSVESM, iterative predictions and errors over 100 points are shown in Fig. 2. The predicted curve fits the actual curve well, and the prediction errors grow very slowly with the iterations.
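Iterative (multi-step) prediction feeds each one-step prediction back as the next input. A schematic loop is sketched below; it assumes a fitted one-step readout model and a state-update function like run_reservoir from Section II-A, and, as a simplification not taken from the paper, it feeds back only the predicted x-component while holding the other inputs at their last observed values.

```python
import numpy as np

def iterate_prediction(model, update_state, x_state, u_last, n_steps=100):
    """Closed-loop forecasting: each prediction becomes the next input.

    update_state(x_state, u) advances the reservoir state for input u;
    model.predict maps a reservoir state to the next x-value. Both are
    assumed to be fitted beforehand.
    """
    u = np.array(u_last, dtype=float)
    preds = []
    for _ in range(n_steps):
        x_state = update_state(x_state, u)
        x_next = model.predict(x_state[None, :])[0]
        preds.append(x_next)
        u[0] = x_next            # feed the prediction back as the x-input
    return np.array(preds)
```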
Fig. 2. Lorenz x(t) time series 100-step iterative prediction curves of the WSVESM method: (a) prediction curves (actual vs. predicted) and (b) prediction error.
B. The Annual Runoff of Yellow River and Sunspots
In this simulation, WSVESM is tested on a real-world example: the complex dynamic system consisting of the annual mean sunspot number and the natural annual runoff. Fig. 3 shows the annual mean sunspot number (Fig. 3(a)) and the annual runoff of the Yellow River (Fig. 3(b)) measured at the Sanmenxia gauge station from 1700 to 2003, about 304 years. The parameters of WSVESM in this example are set as follows: the size of the reservoir is set to 200, and the sparseness and spectral radius of $W_x$ are set to 2% and 0.98, respectively. The input weights are randomly generated uniformly over [-1, +1]; the first 250 samples are used for training the model and the remaining 48 samples are used as the testing set.
The simulation results are shown in TABLE II. Through the comparison of the different models, we can see that the proposed WSVESM is able to preserve a better dynamic characterization of the sunspots and the Yellow River runoff and has a higher prediction accuracy. Fig. 4 shows that the predicted curve approximates the trend of the actual curve well, and the errors are at a promisingly low level.
TABLE II
COMPARISON OF PREDICTION ACCURACY

Methods    MrESN [25]    ESN [6]    SVESM [10]    WSVESM
Runoff     43.7030       50.7200    48.5690       25.0733
Fig. 3. The sunspot number and annual runoff of the Yellow River bivariate time series, 1700-2003: (a) the sunspot number series (Num.) and (b) the annual runoff series of the Yellow River (×10^8 m^3).
Fig. 4. Predicted annual runoff time series of the Yellow River based on WSVESM: (a) prediction curve (actual vs. predicted) and (b) prediction error.
V. CONCLUSIONS
In this paper, a weighted support vector echo state machine prediction model is proposed. Each of the training samples is assigned a weight, which is computed by a solution path algorithm. The WSVESM model combines the advantages of the echo state network and the support vector machine and, through instance weighting, can capture more dynamic features of the multivariate dynamic system. Its performance has been tested on two multivariate dynamic system modeling examples: the Lorenz chaotic time series and the annual runoff of the Yellow River and sunspots time series. The simulation results show that the model based on the method presented herein is more accurate and effective.
REFERENCES
[1] H. Zhang and Y. Quan, "Modeling, identification, and control of a class of nonlinear systems," IEEE Transactions on Fuzzy Systems, vol. 9, no. 2, pp. 349-354, 2001.
[2] D. Liu, T.-S. Chang, and Y. Zhang, "A constructive algorithm for feedforward neural networks with incremental training," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 49, no. 12, pp. 1876-1879, 2002.
[3] F. Takens, "Detecting strange attractors in turbulence," Dynamical Systems and Turbulence, Warwick 1980, pp. 366-381, 1981.
[4] L. Cao, A. Mees, and K. Judd, "Dynamics from multivariate time series," Physica D: Nonlinear Phenomena, vol. 121, no. 1-2, pp. 75-88, 1998.
[5] A. A. Jamshidi and M. J. Kirby, "Modeling multivariate time series on manifolds with skew radial basis functions," Neural Computation, vol. 23, no. 1, pp. 97-123, Jan. 2011.
[6] H. Jaeger and H. Haas, "Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication," Science, vol. 304, no. 5667, p. 78, 2004.
[7] M. Lukosevicius and H. Jaeger, "Reservoir computing approaches to recurrent neural network training," Computer Science Review, vol. 3, no. 3, pp. 127-149, 2009.
[8] M. Ozturk, D. Xu, and J. Principe, "Analysis and design of echo state networks," Neural Computation, vol. 19, no. 1, pp. 111-138, 2007.
[9] Y. Xia, B. Jelfs, M. Van Hulle, J. Principe, and D. Mandic, "An augmented echo state network for nonlinear adaptive filtering of complex noncircular signals," IEEE Transactions on Neural Networks, no. 99, pp. 1-10, 2010.
[10] Z. Shi and M. Han, "Support vector echo-state machine for chaotic time-series prediction," IEEE Transactions on Neural Networks, vol. 18, no. 2, pp. 359-372, 2007.
[11] D. Li, M. Han, and J. Wang, "Chaotic time series prediction based on a novel robust echo state network," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 5, pp. 787-799, 2012.
[12] R. F. Reinhart and J. J. Steil, "Regularization and stability in reservoir networks with output feedback," Neurocomputing, vol. 90, pp. 96-105, Mar. 2012.
[13] A. Rodan and P. Tino, "Simple deterministically constructed cycle reservoirs with regular jumps," Neural Computation, vol. 24, no. 7, pp. 1822-1852, Mar. 2012.
[14] D. Prokhorov, "Echo state networks: Appeal and challenges," in International Joint Conference on Neural Networks, vol. 3. IEEE, 2005, pp. 1463-1466.
[15] Y. Xue, L. Yang, and S. Haykin, "Decoupled echo state networks with lateral inhibition," Neural Networks, vol. 20, no. 3, pp. 365-376, Apr. 2007.
[16] L. Busing, B. Schrauwen, and R. Legenstein, "Connectivity, dynamics, and memory in reservoir computing with binary and analog neurons," Neural Computation, vol. 22, no. 5, pp. 1272-1311, May 2010.
[17] D. Shutin, C. Zechner, S. Kulkarni, and H. Poor, "Regularized variational Bayesian learning of echo state networks with delay sum readout," Neural Computation, pp. 1-29, 2011.
[18] S. Chatzis and Y. Demiris, "Echo state Gaussian process," IEEE Transactions on Neural Networks, vol. 22, no. 9, pp. 1435-1445, 2011.
[19] J. Schmidhuber, D. Wierstra, M. Gagliolo, and F. Gomez, "Training recurrent networks by Evolino," Neural Computation, vol. 19, no. 3, pp. 757-779, Mar. 2007.
[20] M. Hermans and B. Schrauwen, "Recurrent kernel machines: Computing with infinite echo state networks," Neural Computation, vol. 24, no. 1, pp. 104-133, 2012.
[21] X. Liu, C. Gao, and P. Li, "A comparative analysis of support vector machines and extreme learning machines," Neural Networks, Apr. 2012.
[22] J. Suykens, J. De Brabanter, L. Lukas, and J. Vandewalle, "Weighted least squares support vector machines: robustness and sparse approximation," Neurocomputing, vol. 48, no. 1, pp. 85-105, 2002.
[23] M. Karasuyama, N. Harada, M. Sugiyama, and I. Takeuchi, "Multi-parametric solution-path algorithm for instance-weighted support vector machines," Machine Learning, vol. 88, no. 3, pp. 297-330, Apr. 2012.
[24] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: Theory and applications," Neurocomputing, vol. 70, no. 1-3, pp. 489-501, 2006.
[25] M. Han and D. Mu, "Multi-reservoir echo state network with sparse Bayesian learning," in Advances in Neural Networks - ISNN 2010. Springer, 2010, pp. 450-456.