
International Journal of Systems Science, 2014
http://dx.doi.org/10.1080/00207721.2014.943822

Multi-output parameter estimation of dynamic systems by output shapes

Jeffrey C. Simmons and Kourosh Danai∗

Department of Mechanical and Industrial Engineering, University of Massachusetts, Amherst, USA

(Received 22 April 2014; accepted 1 July 2014)

A multi-output method of parameter estimation is introduced for dynamic systems that relies on the shape attributes of model outputs. The shapes of outputs in this method are represented by the surfaces that are generated by continuous wavelet transforms (CWTs) of the outputs in the time-scale domain. Since the CWTs also enhance the delineation of outputs and their sensitivities to model parameters in the time-scale domain, regions in the time-scale plane can be identified wherein the sensitivity of the output with respect to one model parameter dominates all the others. This allows approximation of the prediction error in terms of individual model parameters in isolated regions of the time-scale domain, thus enabling parameter estimation based on a small set of wavelet coefficients. These isolated regions of the time-scale plane also reveal numerous transparencies to be exploited for parameter estimation. It is shown that by taking advantage of these transparencies, the robustness of parameter estimation can be improved. The results also indicate the potential for improved precision and faster convergence of the parameter estimates when shape attributes are used in place of the magnitude.

Keywords: sensitivity analysis; partial differential equations; parameter estimation; wavelet transforms

1. Introduction

Nonlinear dynamic models are the essential components of virtual environments that drive today's design, optimisation, control, and automation practice. They provide the framework for characterising the behaviour of biological, ecological, social, and economic systems, as well as artifacts like aircraft and manufacturing systems. However, to be effective, models must have a high degree of fidelity to reliably represent the process. This entails having the correct form as well as accurate model parameters (coefficients and exponents). Effective parameter estimation, therefore, is essential to model development.

Dynamic models generally comprise a set of ordinary or partial differential equations and are often constructed according to first-principles or empirical knowledge of the system. Parameter estimation entails adjusting the model parameters, $\Theta \in \Re^Q$, so as to minimise the sampled prediction error, $\varepsilon(t_k, u(t), \Theta)$, between the measured outputs $y(t_k, u(t)) = [y_1(t_k), \ldots, y_R(t_k)]^T \in \Re^R$ and the modeled outputs, $\hat{y}(t_k, u(t), \Theta) = [\hat{y}_1(t_k), \ldots, \hat{y}_R(t_k)]^T \in \Re^R$, obtained with the same input $u(t)$, as in Ljung (1999):

$$\varepsilon(t_k, u(t), \hat\Theta) = y(t_k, u(t)) - \hat{y}(t_k, u(t), \hat\Theta), \quad t_k = t_1, \ldots, t_N \quad (1)$$

The parameter estimation problem can be viewed as a nonlinear optimisation problem (Astrom & Eykhoff, 1971) wherein the solution is sought to minimise a cost function, V, in terms of the prediction error, as


$$\hat\Theta(u(t), \varepsilon^N) = \arg\min_{\Theta} V(\Theta, u(t), \varepsilon^N) \quad (2)$$

where $\varepsilon^N := \left(\varepsilon(t_k, u(t), \hat\Theta)\right)_{k=1,\ldots,N}$ denotes the vector of sampled prediction error, with all dependencies hidden by the notation.

The solution to the above optimisation problem can be sought by gradient-based methods such as nonlinear least-squares (NLS) (Seber & Wild, 1989), by NARMAX methods (Billings, 2013), or by genetic algorithms (Goldberg, 1989), convex programming (Fletcher, 1987), Monte Carlo optimisation (Rubinstein, 1986), and adaptive estimation techniques (Narendra & Annaswamy, 1989; Sastry & Bodson, 1989). Regardless of the solution method used, the cost function V in Equation (2) is generally formulated based on the magnitude of the prediction error, as

$$V(\varepsilon^N) = \sum_{k,j}^{N,R} L\left(\varepsilon_j(t_k)\right) \quad (3)$$

where $\varepsilon_j(t_k) = y_j(t_k) - \hat{y}_j(t_k)$ is the prediction error associated with the jth output and L is a scalar-valued (typically positive) function such as the square function used in NLS.


Among the above optimisation methods, NLS is the method of choice for dynamic systems due to its efficient use of gradients (Strang, 2006). It adjusts the model parameters iteratively as

$$\hat\Theta(q+1) = \hat\Theta(q) + \widehat{\Delta\Theta}(q) \quad (4)$$

where q is the iteration number and $\Delta\Theta = \Theta^* - \hat\Theta = [\Delta\theta_1, \ldots, \Delta\theta_Q]^T$ denotes the vector of parameter errors between the true parameter values $\Theta^* = [\theta^*_1, \ldots, \theta^*_Q]^T \in \Re^Q$ and their current estimate $\hat\Theta = [\hat\theta_1, \ldots, \hat\theta_Q]^T$. The vector of parameter errors is estimated at each iteration by NLS as

$$\widehat{\Delta\Theta} = \left(\Phi^T \Phi\right)^{-1} \Phi^T \varepsilon^N \quad (5)$$

where $\Phi \in \Re^{NR \times Q}$ denotes the matrix of output sensitivities (i.e., the Jacobian), having the form

$$\Phi = \begin{bmatrix}
\partial\hat{y}_1(t_1,\hat\Theta)/\partial\theta_1 & \cdots & \partial\hat{y}_1(t_1,\hat\Theta)/\partial\theta_Q \\
\vdots & \ddots & \vdots \\
\partial\hat{y}_1(t_N,\hat\Theta)/\partial\theta_1 & \cdots & \partial\hat{y}_1(t_N,\hat\Theta)/\partial\theta_Q \\
\vdots & \ddots & \vdots \\
\partial\hat{y}_R(t_1,\hat\Theta)/\partial\theta_1 & \cdots & \partial\hat{y}_R(t_1,\hat\Theta)/\partial\theta_Q \\
\vdots & \ddots & \vdots \\
\partial\hat{y}_R(t_N,\hat\Theta)/\partial\theta_1 & \cdots & \partial\hat{y}_R(t_N,\hat\Theta)/\partial\theta_Q
\end{bmatrix} \quad (6)$$

with each column characterising the sensitivity of all outputs to an individual model parameter at $t_k = t_1, \ldots, t_N$.
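
To make the above update concrete, the following minimal sketch implements the iterative NLS adjustment of Equations (4)-(6) with a finite-difference approximation of the Jacobian. It assumes a user-supplied simulator; the function name `simulate_outputs` and its signature are illustrative placeholders, not part of the paper.

```python
# Minimal sketch of the NLS update in Equations (4)-(6), assuming a user-supplied
# simulator; `simulate_outputs(theta) -> (N, R) array` is a hypothetical placeholder.
import numpy as np

def finite_difference_jacobian(simulate_outputs, theta, rel_step=1e-6):
    """Approximate the output sensitivities (Equation (6)) by finite differences.

    Returns Phi with shape (N*R, Q), columns ordered by parameter.
    """
    y0 = simulate_outputs(theta).ravel(order="F")   # stack outputs column-wise
    Q = len(theta)
    Phi = np.empty((y0.size, Q))
    for i in range(Q):
        dtheta = np.zeros(Q)
        dtheta[i] = rel_step * max(abs(theta[i]), 1.0)
        yi = simulate_outputs(theta + dtheta).ravel(order="F")
        Phi[:, i] = (yi - y0) / dtheta[i]
    return Phi

def nls_step(simulate_outputs, y_measured, theta):
    """One Gauss-Newton iteration: Equations (1), (5), and (4)."""
    eps = y_measured.ravel(order="F") - simulate_outputs(theta).ravel(order="F")
    Phi = finite_difference_jacobian(simulate_outputs, theta)
    delta_theta, *_ = np.linalg.lstsq(Phi, eps, rcond=None)  # solves (5) robustly
    return theta + delta_theta                               # update (4)
```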

The NLS solution in Equation (5) is based on the magnitude of the prediction error, as defined by the cost function in Equation (3). An appealing alternative to the error magnitude is the error shape (e.g., slope and/or rate of slope change), as represented by continuous wavelet transforms (CWTs) (Mallat, 1998) of the prediction error. However, the CWTs of time series include both times and scales. They convert an NR-dimensional time series into R surfaces spanning time-scale planes of N × M dimension; hence, they expand M-fold the size of the data to be used for parameter estimation. This increased dimensionality, in turn, impedes the direct insertion of the data contained in these surfaces in Equation (5) for parameter estimation, because the resulting $\Phi$ and $\varepsilon^N$, which represent the cascaded elements of these surfaces, would be too bloated to be implemented in Equation (5). This paper offers a methodical solution to this data expansion by adopting a selective approach to data inclusion.

The salient feature of the proposed method is its size mitigation of the CWT surfaces by focusing on isolated information-rich regions of these surfaces. These regions of the time-scale plane, called 'parameter signatures', can be isolated because of the enhanced delineation of time series in the time-scale domain. Each region is sought wherein the sensitivity of an output to a model parameter dominates its sensitivities to all the other model parameters, hence making it possible to attribute the prediction error to the error of individual model parameters in their corresponding parameter signatures (Danai & McCusker, 2009). These single-parameter approximations of the prediction error enable, in turn, consideration of isolated segments of the surfaces of the output sensitivities and prediction error for parameter estimation, thus making it tractable to include isolated portions of the CWTs in Equation (5) for parameter estimation. A potential pitfall of the above scheme is the absence of parameter signatures due to parameter non-identifiability. For such cases, an alternative integration routine is considered wherein separate estimates of individual model parameters are obtained for their iterative adaptation.

2. Overview of PARSIM

The single-output, single-wavelet-transform, separate parameter estimation solution of the Parameter Signature Isolation Method (PARSIM) is presented in Danai and McCusker (2009). Here we provide its generalised solutions that can accommodate multiple outputs and multiple WTs. Briefly, PARSIM capitalises on the enhanced delineation of output sensitivities to isolate regions of the time-scale plane wherein one output sensitivity dominates all the others. Justified by this dominance, the prediction error is attributed to the error of the corresponding parameter in each region, hence allowing inclusion of select portions of wavelet coefficients for least-squares estimation or separate estimation of individual parameter errors.

PARSIM, like NLS, assumes an accurate and identifiable model, $M_\Theta$. This implies that the true model parameter values $\Theta^*$ can be found if the model outputs $\hat{y}(t)$, obtained under the same input $u(t)$ applied to the process, match the observations $y(t)$ in the mean square sense; i.e.,

$$y(t, u(t)) \equiv \hat{y}(t, u(t), \Theta) \;\Longrightarrow\; \Theta = \Theta^* \quad (7)$$

PARSIM, like NLS, also relies on a first-order approximation of the model, as

$$y(t, u(t)) \approx \hat{y}(t, u(t), \hat\Theta) + \sum_{i=1}^{Q} \Delta\theta_i \left.\frac{\partial \hat{y}(t, u(t), \Theta)}{\partial \theta_i}\right|_{\Theta = \hat\Theta} + \nu \quad (8)$$

to yield the approximation of the prediction error, as

$$\varepsilon(t, u(t), \Theta^*, \hat\Theta) \approx \Phi\,\Delta\Theta + \nu \quad (9)$$

where $\nu$ denotes measurement noise.

2.1. Shape attributes

The capacity to represent the shape attributes of time series is rooted in the multi-scale differential feature of CWTs (Mallat, 1998).


Figure 1. Gauss wavelet (left) and Sombrero wavelet (right), which are the first and second derivatives of the Gaussian function, respectively.

Consider the wavelet $\psi(t)$ to be the nth-order derivative of the smoothing function $\beta(t)$; i.e.,

$$\psi(t) = (-1)^n \frac{d^n \beta(t)}{dt^n} \quad (10)$$

and the wavelet transform (WT) of the time function f at time t defined as

$$W\{f\}(t, s) = f * \psi_s(t) = \int_{-\infty}^{\infty} f(\tau)\,\frac{1}{\sqrt{s}}\,\psi^*\!\left(\frac{\tau - t}{s}\right) d\tau \quad (11)$$

where $W\{f\}$ denotes the WT of the time function f(t), $*$ denotes convolution, $\psi^*$ is the complex conjugate of $\psi$, $\psi_s(t) = \frac{1}{\sqrt{s}}\psi\!\left(\frac{t}{s}\right)$, and t and s denote the time (translation) and scale (dilation or constriction) parameters, respectively. Then, according to Mallat and Hwang (1992), this wavelet transform is a multi-scale differential operator of the smoothed function $f * \beta_s(t)$ in the time-scale domain; i.e.,

$$W\{f\}(t, s) = s^n \frac{d^n}{dt^n}\left(f * \beta_s(t)\right) \quad (12)$$

Using this feature, one can utilise a CWT to represent a certain shape attribute of a time series. For instance, one may consider the smoothing function β(t) to be the Gaussian function. In this case, the Gauss wavelet, which is the first derivative of the Gaussian function, as shown in the left plot of Figure 1, produces a WT that represents the slope of the signal f(t) smoothed by the Gaussian function, and orthogonal to it. Similarly, the Sombrero wavelet, which is the second derivative of the Gaussian function, as shown in the right plot of Figure 1, produces a WT that denotes the rate of slope change of this smoothed signal in the time-scale domain.

The shape representation capacity of CWTs is illustrated via a simple example. The time series having piecewise constant slopes is shown in the top plot of Figure 2. Shown in the second row are, respectively, the slice of its Gaussian-smoothed surface at the first scale (left), the first-scale slice of its Gauss WT (middle), and the first-scale slice of its Sombrero WT (right). As can be observed from the time series, depicted in the top plot of Figure 2, there are slope changes at approximately 30 and 80 seconds, after which the time series becomes flat. The slice of the Gaussian-smoothed surface (left bottom) is almost identical to the original signal (top) because Gaussian smoothing at low scales does not change the time series, due to the very narrow span of the smoothing function. The slice of the Gauss WT (middle bottom) is piecewise constant, with the magnitude of each segment being proportional to the time series slope. The slice of the Sombrero WT contains two spikes at the points of slope change, representing the rate of change of slope of the time series.
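
As a rough illustration of Equations (10)-(12) and of the example above, the following sketch computes Gauss and Sombrero wavelet coefficients of a piecewise-linear signal by direct convolution. The test signal, scales, and wavelet support are illustrative choices, not taken from the paper.

```python
# Minimal sketch of Equation (11): CWT of a signal with the Gauss (first derivative
# of Gaussian) and Sombrero (second derivative) wavelets, by direct convolution.
import numpy as np

def gaussian_derivative_wavelet(t, order):
    g = np.exp(-t**2 / 2.0)
    if order == 1:                      # Gauss wavelet ~ slope of the smoothed signal
        return -t * g
    if order == 2:                      # Sombrero wavelet ~ rate of slope change
        return (t**2 - 1.0) * g
    raise ValueError("order must be 1 or 2")

def cwt(f, dt, scales, order):
    """Return an (len(scales), len(f)) array of wavelet coefficients."""
    coeffs = np.empty((len(scales), len(f)))
    for k, s in enumerate(scales):
        # sample the dilated wavelet; cap the support so the kernel stays shorter
        # than the signal (np.convolve 'same' would otherwise change the length)
        half = min(int(np.ceil(5 * s / dt)), len(f) // 2 - 1)
        tau = np.arange(-half, half + 1) * dt
        psi = gaussian_derivative_wavelet(tau / s, order) / np.sqrt(s)
        coeffs[k] = np.convolve(f, psi[::-1], mode="same") * dt
    return coeffs

# piecewise-linear test signal with slope changes near 30 s and 80 s, then flat
t = np.arange(0.0, 120.0, 0.5)
f = np.piecewise(t, [t < 30, (t >= 30) & (t < 80), t >= 80],
                 [lambda t: 0.5 * t, lambda t: 15 + 0.1 * (t - 30), 20.0])
scales = np.arange(1.0, 17.0)
W_gauss = cwt(f, 0.5, scales, order=1)      # proportional to the local slope
W_sombrero = cwt(f, 0.5, scales, order=2)   # spikes where the slope changes
```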

Another important feature of CWTs relevant to PARSIM is their enhanced delineation of time series in the time-scale domain (Addison, 2002). To illustrate this point, let us consider the hypothetical output sensitivities $\zeta_1$ and $\zeta_2$ in the left plot of Figure 3, associated with the hypothetical parameters $\theta_1$ and $\theta_2$. The two output sensitivities are nearly collinear, with a correlation coefficient of ρ = 0.9997.


Figure 2. Example of features extracted from a time-series by different WTs.

Figure 3. Two highly correlated output sensitivities and the difference between the absolute normalised values of their Gauss wavelet coefficients.


Figure 4. Gauss wavelet coefficients of the prediction error shown as a surface together with the parameter signatures of two model parameters shown by grey regions in the time-scale plane.

Yet if we consider the difference between their absolute normalised Gauss wavelet coefficients, $(|W\{\zeta_1\}|/\max|W\{\zeta_1\}|) - (|W\{\zeta_2\}|/\max|W\{\zeta_2\}|)$, shown in the right plot of Figure 3, one observes that it consists of both positive and negative values. This indicates that for each output sensitivity, there are regions of the time-scale plane wherein the absolute value of one output sensitivity's normalised wavelet coefficient exceeds the other's, albeit by a small margin. One can extrapolate these results to multiple output sensitivities, with the expectation that the regions associated with individual parameter signatures will become smaller with the overlap from the other output sensitivities' wavelet coefficients. However, given independent output sensitivities (i.e., a full-rank Jacobian $\Phi$) and adequate resolution in the time-scale plane (i.e., number of pixels), there will always be at least one pixel wherein the wavelet coefficient of each output sensitivity exceeds all the others.

2.2. Notion of parameter signature

The enhanced delineation of output sensitivities enables isolation of regions of the time-scale domain wherein a single output sensitivity dominates the others (Danai & McCusker, 2009). We refer to each such region as a parameter signature, as formally defined below.

Definition: The parameter signature $\Gamma^r_{i,j}$ of the parameter $\theta_i$ is the region consisting of all pixels $(t_k, s_l)$ in the time-scale plane wherein the normalised wavelet coefficient obtained from the rth WT of the corresponding output sensitivity, $|\bar{W}^r\{\partial\hat{y}_j/\partial\theta_i\}(t_k, s_l)|$, exceeds the normalised wavelet coefficients of all the other output sensitivities by a dominance factor $\eta^r_j$, expressed mathematically as

$$\text{If } \exists\, (t_k, s_l) \;\ni\; |\bar{W}^r\{\partial\hat{y}_j/\partial\theta_i\}(t_k, s_l)| > \eta^r_j\, |\bar{W}^r\{\partial\hat{y}_j/\partial\theta_m\}(t_k, s_l)| \;\;\forall\, m = 1, \ldots, Q,\; m \neq i \;\Longrightarrow\; (t_k, s_l) = (t^i_k, s^i_l) \in \Gamma^r_{i,j} \quad (13)$$

where j is associated with the output, the subscript or superscript r denotes the type of WT (i.e., Gauss, Sombrero, etc.), and

$$\bar{W}^r\{\partial\hat{y}_j/\partial\theta_i\} = \frac{W^r\{\partial\hat{y}_j/\partial\theta_i\}}{\max_{(t,s)}\left|W^r\{\partial\hat{y}_j/\partial\theta_i\}\right|} \quad (14)$$

Now if the dominance factor $\eta^r_j$ is selected to be large, i.e., $\eta^r_j \gg 1$, then one can assume the wavelet coefficients of the corresponding output sensitivity, $\partial\hat{y}_j/\partial\theta_i$, to be dominant among the wavelet coefficients of the output sensitivities associated with output $y_j$. This then enables us to redefine the wavelet coefficient of the prediction error in Equation (9) in terms of a single parameter at the corresponding parameter signature $\Gamma^r_{i,j}$, as

$$W^r\{\varepsilon_j\}(t^i_k, s^i_l) \approx \Delta\theta_i\, W^r\{\partial\hat{y}_j/\partial\theta_i\}(t^i_k, s^i_l) + W^r\{\nu\} \quad \forall\,(t^i_k, s^i_l) \in \Gamma^r_{i,j} \quad (15)$$
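
A minimal sketch of the signature extraction implied by Equations (13) and (14) is given below, assuming the wavelet-coefficient surfaces of the output sensitivities for one WT/output combination have already been computed; the array names and the dominance factor value are illustrative.

```python
# Sketch of parameter-signature extraction per Equations (13)-(14): a pixel of the
# time-scale plane belongs to parameter i's signature if its normalised wavelet
# coefficient dominates all others by the factor eta. Inputs are assumed to be
# precomputed surfaces W_sens[i] of shape (M, N), one per model parameter.
import numpy as np

def extract_signatures(W_sens, eta=1.5):
    """Return a list of boolean masks, one per parameter (True = signature pixel)."""
    # normalise each surface by its maximum absolute coefficient (Equation (14))
    A = np.array([np.abs(W) / np.max(np.abs(W)) for W in W_sens])
    Q = A.shape[0]
    signatures = []
    for i in range(Q):
        others = np.delete(A, i, axis=0)
        # dominance test of Equation (13) applied at every pixel
        mask = np.all(A[i] > eta * others, axis=0)
        signatures.append(mask)
    return signatures
```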

For illustration purposes, the parameter signatures of two different parameters are shown by the grey regions of the time-scale plane in Figure 4 against the WT (surface) of the prediction error. According to Equation (15), the prediction error at each grey pixel (i.e., each pixel of a parameter signature) can be attributed to the error of the corresponding parameter, $\Delta\theta_i$, and used for its estimation. Ideally, each of the parameter error estimates $\widehat{\Delta\theta}_i$ at each pixel $(t^i_k, s^i_l) \in \Gamma^r_{i,j}$ would be identical with a perfect parameter signature. However, as is shown in Danai and McCusker (2009) and in subsequent sections, the parameter signatures are far from perfect and, as such, the parameter error estimates at different pixels are not identical. We addressed this in Danai and McCusker (2009) for a single WT by obtaining the mean estimate of the parameter error over all pixels of the corresponding parameter signature.


However, here we are facing multiple parameter signatures obtained from different WTs of various outputs. As such, a more elaborate solution needs to be formulated wherein the transparencies afforded by the parameter signatures can be used to advantage. We have developed two solutions to consolidate the parameter estimates obtained from the above single-parameter approximation of the prediction error in Equation (15). The first solution adopts a least-squares approach to the integration of estimates. The second solution is an alternative used when the absence of parameter signatures, due to low parameter identifiability, impedes the application of the preferred least-squares solution.

2.3. The least-squares solution

PARSIM uses a format akin to NLS for integration of the parameter error estimates, with the form

$$\widehat{\Delta\Theta}_s = \left(\Phi_s^T \Phi_s\right)^{-1} \Phi_s^T \varepsilon_s \quad (16)$$

where $\Phi_s$ is the matrix of wavelet coefficients of output sensitivities at the parameter signatures, and $\varepsilon_s$ is the matrix of wavelet coefficients of the prediction error at the parameter signatures. Therefore, the noted feature of PARSIM is its selective incorporation of wavelet coefficients in the least-squares solution, in accordance with the decoupled prediction error formulation in Equation (15). Using this format, the Jacobian matrix, $\Phi_s$, finds the form

$$\Phi_s = \begin{pmatrix}
W^1\{\partial\hat{y}_1/\partial\theta_1\}\big|_{\Gamma^{1,1}_{1,1}} & W^1\{\partial\hat{y}_1/\partial\theta_2\}\big|_{\Gamma^{1,1}_{2,1}} & \cdots & W^1\{\partial\hat{y}_1/\partial\theta_Q\}\big|_{\Gamma^{1,1}_{Q,1}} \\
\vdots & \vdots & \ddots & \vdots \\
W^1\{\partial\hat{y}_1/\partial\theta_1\}\big|_{\Gamma^{1,\max(\#_{1,1})}_{1,1}} & W^1\{\partial\hat{y}_1/\partial\theta_2\}\big|_{\Gamma^{1,\max(\#_{1,1})}_{2,1}} & \cdots & W^1\{\partial\hat{y}_1/\partial\theta_Q\}\big|_{\Gamma^{1,\max(\#_{1,1})}_{Q,1}} \\
W^1\{\partial\hat{y}_2/\partial\theta_1\}\big|_{\Gamma^{1,1}_{1,2}} & W^1\{\partial\hat{y}_2/\partial\theta_2\}\big|_{\Gamma^{1,1}_{2,2}} & \cdots & W^1\{\partial\hat{y}_2/\partial\theta_Q\}\big|_{\Gamma^{1,1}_{Q,2}} \\
\vdots & \vdots & \ddots & \vdots \\
W^1\{\partial\hat{y}_2/\partial\theta_1\}\big|_{\Gamma^{1,\max(\#_{1,2})}_{1,2}} & W^1\{\partial\hat{y}_2/\partial\theta_2\}\big|_{\Gamma^{1,\max(\#_{1,2})}_{2,2}} & \cdots & W^1\{\partial\hat{y}_2/\partial\theta_Q\}\big|_{\Gamma^{1,\max(\#_{1,2})}_{Q,2}} \\
\vdots & \vdots & \ddots & \vdots \\
W^P\{\partial\hat{y}_R/\partial\theta_1\}\big|_{\Gamma^{P,1}_{1,R}} & W^P\{\partial\hat{y}_R/\partial\theta_2\}\big|_{\Gamma^{P,1}_{2,R}} & \cdots & W^P\{\partial\hat{y}_R/\partial\theta_Q\}\big|_{\Gamma^{P,1}_{Q,R}} \\
\vdots & \vdots & \ddots & \vdots \\
W^P\{\partial\hat{y}_R/\partial\theta_1\}\big|_{\Gamma^{P,\max(\#_{P,R})}_{1,R}} & W^P\{\partial\hat{y}_R/\partial\theta_2\}\big|_{\Gamma^{P,\max(\#_{P,R})}_{2,R}} & \cdots & W^P\{\partial\hat{y}_R/\partial\theta_Q\}\big|_{\Gamma^{P,\max(\#_{P,R})}_{Q,R}}
\end{pmatrix} \quad (17)$$

where each column comprises the cascaded wavelet coefficients, by various WTs, of the output sensitivities of a parameter at its parameter signatures. By confining the wavelet coefficients to those at the parameter signatures, PARSIM reduces the size of the Jacobian from $\Re^{NMR \times Q}$ to $\Re^{\max(\#_{r,j})R \times Q}$, where the dimension $\max(\#_{r,j})$ denotes the maximum cardinal number of the parameter signatures for all $\theta_i$ at the rth WT and jth output. It is worth noting here that we uniformise the size of the parameter signatures by padding the wavelet coefficients of size-deficient parameter signatures, relative to $\max(\#_{r,j})$, with zeros.

The least-squares formulation of Equation (16) also deviates from the ordinary least-squares formulation of Equation (5) in that it requires the prediction error to be defined as a matrix, of the form

$$\varepsilon_s = \begin{pmatrix}
W^1\{\varepsilon_1\}\big|_{\Gamma^{1,1}_{1,1}} & W^1\{\varepsilon_1\}\big|_{\Gamma^{1,1}_{2,1}} & \cdots & W^1\{\varepsilon_1\}\big|_{\Gamma^{1,1}_{Q,1}} \\
\vdots & \vdots & \ddots & \vdots \\
W^1\{\varepsilon_1\}\big|_{\Gamma^{1,\max(\#_{1,1})}_{1,1}} & W^1\{\varepsilon_1\}\big|_{\Gamma^{1,\max(\#_{1,1})}_{2,1}} & \cdots & W^1\{\varepsilon_1\}\big|_{\Gamma^{1,\max(\#_{1,1})}_{Q,1}} \\
W^1\{\varepsilon_2\}\big|_{\Gamma^{1,1}_{1,2}} & W^1\{\varepsilon_2\}\big|_{\Gamma^{1,1}_{2,2}} & \cdots & W^1\{\varepsilon_2\}\big|_{\Gamma^{1,1}_{Q,2}} \\
\vdots & \vdots & \ddots & \vdots \\
W^1\{\varepsilon_2\}\big|_{\Gamma^{1,\max(\#_{1,2})}_{1,2}} & W^1\{\varepsilon_2\}\big|_{\Gamma^{1,\max(\#_{1,2})}_{2,2}} & \cdots & W^1\{\varepsilon_2\}\big|_{\Gamma^{1,\max(\#_{1,2})}_{Q,2}} \\
\vdots & \vdots & \ddots & \vdots \\
W^P\{\varepsilon_R\}\big|_{\Gamma^{P,1}_{1,R}} & W^P\{\varepsilon_R\}\big|_{\Gamma^{P,1}_{2,R}} & \cdots & W^P\{\varepsilon_R\}\big|_{\Gamma^{P,1}_{Q,R}} \\
\vdots & \vdots & \ddots & \vdots \\
W^P\{\varepsilon_R\}\big|_{\Gamma^{P,\max(\#_{P,R})}_{1,R}} & W^P\{\varepsilon_R\}\big|_{\Gamma^{P,\max(\#_{P,R})}_{2,R}} & \cdots & W^P\{\varepsilon_R\}\big|_{\Gamma^{P,\max(\#_{P,R})}_{Q,R}}
\end{pmatrix} \quad (18)$$

where the ith column of the error matrix, $\varepsilon_s$, comprises the cascaded wavelet coefficients, by various WTs, of the prediction errors of different outputs at the corresponding parameter signatures $\Gamma^r_{i,j}$ of parameter $\theta_i$.


Based on the above formulation, the parameter error estimate, $\widehat{\Delta\Theta}_s$, according to Equation (16) will be a Q × Q matrix of the form

$$\widehat{\Delta\Theta}_s = \begin{pmatrix}
\widehat{\Delta\theta}_1 & \zeta_{(1,2)} & \cdots & \zeta_{(1,Q)} \\
\zeta_{(1,2)} & \widehat{\Delta\theta}_2 & \cdots & \zeta_{(2,Q)} \\
\vdots & \vdots & \ddots & \vdots \\
\zeta_{(1,R)} & \cdots & \zeta_{(R,Q-1)} & \widehat{\Delta\theta}_Q
\end{pmatrix} \quad (19)$$

where each column of $\widehat{\Delta\Theta}_s$ is obtained by least-squares estimation using the corresponding column in $\varepsilon_s$. Among the components of $\widehat{\Delta\Theta}_s$, we only consider the diagonal terms, $\widehat{\Delta\theta}_i$, which are obtained using the components of $\Phi_s(\cdot, i)$ and $\varepsilon_s(\cdot, i)$ associated with the same parameter signatures, hence representing a true adaptation of the least-squares method. The off-diagonal terms, $\zeta_{(a,b)}$, by the same analogy, are meaningless in that they are obtained from components of $\Phi_s(\cdot, a)$ and $\varepsilon_s(\cdot, b)$ that relate to different parameter signatures. The parameter error estimates $\widehat{\Delta\theta}_i$ are then updated as in NLS,

$$\hat\theta_i(q+1) = \hat\theta_i(q) + \mu_i(q)\,\widehat{\Delta\theta}_i(q) \quad (20)$$

except that here each parameter is adapted separately with its separate adaptation step size $\mu_i(q)$.
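
The following sketch assembles the selective least-squares solution of Equations (16)-(20) from precomputed coefficient surfaces and signature masks. The zero-padding of size-deficient signatures mirrors the uniformisation described for Equation (17); collapsing the per-parameter step sizes $\mu_i(q)$ into a single value is a simplification of this sketch, not the paper's scheme.

```python
# Sketch of PARSIM's least-squares solution (Equations (16)-(20)). For each
# WT/output combination, wavelet coefficients are sampled only at the parameter
# signatures; size-deficient signatures are zero-padded, and only the diagonal of
# the resulting Q x Q estimate (19) is used in the update (20). Names are illustrative.
import numpy as np

def build_signature_blocks(W_sens, W_err, signatures):
    """One WT/output combination: W_sens[i] and W_err are (M, N) surfaces and
    signatures[i] are boolean masks. Returns (Phi_block, eps_block)."""
    Q = len(W_sens)
    n_max = max(int(sig.sum()) for sig in signatures)
    Phi_block = np.zeros((n_max, Q))
    eps_block = np.zeros((n_max, Q))
    for i, sig in enumerate(signatures):
        n = int(sig.sum())
        Phi_block[:n, i] = W_sens[i][sig]   # column i at its own signature pixels
        eps_block[:n, i] = W_err[sig]       # error coefficients at the same pixels
    return Phi_block, eps_block

def parsim_ls_update(theta, blocks, mu=0.5):
    """blocks: list of (Phi_block, eps_block) over all WT/output combinations."""
    Phi_s = np.vstack([b[0] for b in blocks])
    eps_s = np.vstack([b[1] for b in blocks])
    # Equation (16); each column of eps_s yields one column of the Q x Q estimate
    dTheta_s, *_ = np.linalg.lstsq(Phi_s, eps_s, rcond=None)
    dtheta = np.diag(dTheta_s)              # keep only the diagonal terms of (19)
    return theta + mu * dtheta              # single step size: a simplification here
```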

As will be shown later, the above solution is effective except in cases where no parameter signature can be obtained for a parameter across the different WTs of the various outputs. In such cases, the Jacobian $\Phi_s$ would be rank deficient, precluding the implementation of the least-squares solution. Lack of a signature can be due to a variety of causes. One is when the parameter is not observable through any of the outputs. Another is when an output sensitivity is closely correlated with another output sensitivity, such that no wavelet coefficient from a WT dominates those of the others. A scenario wherein parameter signatures cannot be extracted is illustrated in the context of Chua's circuit (Platform 1), described below.

Platform 1: Chua's circuit is described by the ordinary differential equations (Kennedy, 1994):

$$\begin{aligned}
\frac{dI_3}{dt} &= -\frac{R_0}{L} I_3 - \frac{1}{L} V_2 \\
\frac{dV_2}{dt} &= \frac{1}{C_2} I_3 - \frac{G}{C_2}\,(V_2 - V_1) \\
\frac{dV_1}{dt} &= \frac{G}{C_1}\,(V_2 - V_1) - \frac{1}{C_1} f(V_1) \\
y &= \begin{bmatrix} I_3 & V_2 & V_1 \end{bmatrix}^T
\end{aligned} \quad (21)$$

where

$$f(V_1) = G_b V_1 - (G_a - G_b)\left(|V_1 + E| - |V_1 - E|\right)$$

and

$$\Theta^* = \begin{bmatrix} L^* & R_0^* & C_2^* & G^* & C_1^* \end{bmatrix}^T = \begin{bmatrix} -9.7136 & 4.75 & -1.0837 & 33.932813 & 1 \end{bmatrix}^T$$

Table 1. Number of pixels of parameter signatures from three different transformations (Gaussian smoothing, Gauss, and Sombrero WTs) of the first two outputs (I3 and V2) of the Chua's circuit.

              Parameter signature size
WT         ΔL      ΔR0     ΔC2     ΔG      ΔC1
CWT^1_1    93      759     0       0       0
CWT^1_2    789     57      0       2       0
CWT^2_1    0       970     9       176     0
CWT^2_2    906     0       20      3       0
CWT^3_1    1487    1272    0       0       0
CWT^3_2    199     0       0       19      0

To avoid the chaotic aspect of Chua's circuit, a short time window of 5 seconds is considered for simulation. Simulated outputs associated with longer time windows of this system would preclude its first-order approximation according to Equation (8), which is essential for the applicability of PARSIM and NLS. Synchronisation methods (Parlitz, 1996; Parlitz, Junge, & Kocarev, 1996), among others, have been proposed for parameter estimation of these systems when their outputs exhibit chaotic behaviour (Abarbanel, 1993). The error surfaces of Chua's circuit from different outputs in this time window are devoid of local minima but have shallow gradients, which pose challenging parameter identifiability conditions. The parameters $G^*_a$, $G^*_b$ and $E^*$ are non-identifiable by any of the outputs, so they are held constant at their true values. For this illustration, and throughout the paper for this model, the nominal parameter values, which are also used as the initial values of the parameters for their estimation, are

$$\hat\Theta = \begin{bmatrix} 0.95L^* & 1.05R_0^* & 0.95C_2^* & 1.05G^* & 0.95C_1^* \end{bmatrix}^T$$

The prediction error $\varepsilon(t) = y(t, \Theta^*) - \hat{y}(t, \hat\Theta)$ then reflects the mismatch between the true and nominal parameter values.
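
For reference, a sketch of simulating Platform 1 over the 5-second window with a standard ODE solver is given below. The values of $G_a$, $G_b$, and $E$ and the initial state are not reported in this excerpt, so the placeholders used here are assumptions.

```python
# Sketch of simulating Platform 1 (Chua's circuit, Equation (21)) over the
# 5-second window used in the paper. The values of Ga, Gb, E and the initial
# state are not given in this excerpt; the ones below are placeholders.
import numpy as np
from scipy.integrate import solve_ivp

THETA_TRUE = dict(L=-9.7136, R0=4.75, C2=-1.0837, G=33.932813, C1=1.0)
GA, GB, E = -0.7, -0.5, 1.0          # assumed values of the held-constant parameters

def chua_rhs(t, x, L, R0, C2, G, C1):
    I3, V2, V1 = x
    f_V1 = GB * V1 - (GA - GB) * (abs(V1 + E) - abs(V1 - E))   # diode term as printed
    dI3 = -(R0 / L) * I3 - V2 / L
    dV2 = I3 / C2 - (G / C2) * (V2 - V1)
    dV1 = (G / C1) * (V2 - V1) - f_V1 / C1
    return [dI3, dV2, dV1]

def simulate_chua(theta, t_eval, x0=(0.1, 0.1, 0.1)):   # x0 is an assumption
    sol = solve_ivp(chua_rhs, (t_eval[0], t_eval[-1]), x0, t_eval=t_eval,
                    args=tuple(theta), rtol=1e-8, atol=1e-10)
    return sol.y.T                                       # columns: I3, V2, V1

t = np.linspace(0.0, 5.0, 501)
y_true = simulate_chua(list(THETA_TRUE.values()), t)
```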

The lack of parameter signatures due to parameter non-identifiability is illustrated for an episode of Chua's circuit in Table 1. Listed in this table are the numbers of pixels (cardinal numbers) of the parameter signatures from three different transformations (Gaussian smoothing, Gauss, and Sombrero WTs) of the first two outputs. The absence of parameter signatures is clear from the zeros in the last column.


Table 2. Sample of parameter error estimates obtained from two CWTs (Gauss and Sombrero) of the three Chua's circuit outputs, shown with their true values.

                         Error estimates
Parameter   True error   CWT^1_1   CWT^1_2   CWT^1_3   CWT^2_1   CWT^2_2   CWT^2_3
ΔL          −0.48        −0.32     −0.58     −0.88     −0.49     −0.37     −0.38
ΔR0         −0.23        0.079     0.071     0.069     0.0       0.002     0.052
ΔC2         −0.05        0.0       −0.009    −0.0108   −0.0118   0.0       −0.0115
ΔG          −1.69        0.67      −2.17     1.16      −1.13     −0.915    −4.65
ΔC1         0.05         0.0       0.0       0.0       0.005     0.0       0.0

This inability to extract parameter signatures at this episode of parameter estimation results in a rank-deficient Jacobian $\Phi_s$, with a null column, which is unfit for parameter estimation. To remedy the disruptions to parameter estimation by rank-deficient Jacobians, a second estimation procedure is developed for PARSIM, as discussed next.

2.4. The separate parameter estimate solution

A direct ramification of the single-parameter approximation of the prediction error in Equation (15) is that the error of the corresponding parameter, $\Delta\theta_i$, can be estimated at the pixels $(t^i_k, s^i_l)$ of the associated parameter signature $\Gamma^r_{i,j}$. Ideally, the estimate of $\Delta\theta_i$ at one pixel of $\Gamma^r_{i,j}$ should be identical to that at another pixel. However, the parameter signatures are not perfect and, as such, the parameter error estimates are not identical. In the single-WT single-output implementation of PARSIM (Danai & McCusker, 2009), the mean of the parameter error estimates obtained at the individual pixels of the parameter signature was used in Equation (20), as

$$\widehat{\Delta\theta}^r_{i,j} = \overline{\Delta\theta}^r_{i,j} = \frac{1}{N^r_{i,j}} \sum_{k,l}^{N,M} \frac{W^r\{\varepsilon_j\}(t^i_k, s^i_l)}{W^r\{\partial\hat{y}_j/\partial\theta_i\}(t^i_k, s^i_l)} \quad \forall\,(t^i_k, s^i_l) \in \Gamma^r_{i,j} \quad (22)$$

where $N^r_{i,j}$ denotes the number of pixels $(t^i_k, s^i_l)$ included in $\Gamma^r_{i,j}$ (i.e., its cardinal number). In the multi-WT multi-output implementation of PARSIM, however, several ($P \times R$) estimates are obtained, each different from the other, as shown in Table 2 for one estimation iteration of Chua's circuit parameters. As expected, the estimates vary widely, some even being in the opposite direction (e.g., those of $\Delta G$). One could again use the averaging strategy adopted in Danai and McCusker (2009), but such a strategy would not only render too crude an estimate but also disregard the quality of the parameter signatures yielding the individual estimates. A more prudent approach, therefore, is to consider a weighted strategy for the integration of the estimates, whereby the accuracy of each estimate is inferred from the quality of the associated parameter signature. Such a strategy will have the form

$$\widehat{\Delta\theta}_i = \sum_{r,j}^{P,R} w^r_{i,j}\,\widehat{\Delta\theta}^r_{i,j} \quad (23)$$

where $w^r_{i,j}$ denotes the weight assigned to each estimate $\widehat{\Delta\theta}^r_{i,j}$ according to the confidence in its accuracy. It is shown below that the weights $w^r_{i,j}$ can be defined in terms of the quality of the parameter signatures, which can be obtained according to the transparencies afforded in the time-scale domain.
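
A sketch of the separate parameter estimate solution is given below: Equation (22) is the per-signature mean of the pixel-wise estimates, and Equation (23) is the weighted combination across WT/output combinations. The guard against near-zero sensitivity coefficients and the normalisation of the weights to unit sum are assumptions of this sketch.

```python
# Sketch of the separate parameter estimate solution: Equation (22) averages the
# pixel-wise error estimates over one parameter signature, and Equation (23)
# combines the estimates from all WT/output combinations with weights w (whose
# entropy-based definition appears later, in Equation (28)).
import numpy as np

def estimate_from_signature(W_err, W_sens_i, sig, floor=1e-12):
    """Equation (22): mean of W{eps}/W{dy/dtheta_i} over the signature pixels."""
    num = W_err[sig]
    den = W_sens_i[sig]
    ratios = num / np.where(np.abs(den) < floor, np.nan, den)  # guard tiny coefficients
    return np.nanmean(ratios)

def combine_estimates(estimates, weights):
    """Equation (23): weighted sum of the per-signature estimates of one parameter.
    Normalising the weights to unit sum is an assumption, not stated in the paper."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(np.dot(w, estimates))
```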

As implied above, there are measures available to PARSIM to assess the quality of the parameter signatures. At the same time, PARSIM can benefit from several degrees of freedom to affect the quality of parameter signatures for improved estimation. One such degree of freedom is that several wavelet transforms can be obtained for each output to characterise its different shape attributes. A second degree of freedom is the dominance factor $\eta^r_j$ in Equation (13) used for parameter signature extraction. A third degree of freedom is the adaptation step size, $\mu_i(q)$ in Equation (20), which can be adjusted at each iteration based on the quality of the corresponding parameter error estimate. PARSIM uses the transparencies afforded in the time-scale domain to assess the quality of parameter signatures. It then uses its degrees of freedom to affect the quality of the parameter signatures and the parameter estimates they provide. The transparencies used for assessing parameter signature quality are discussed next, followed by a description of the degrees of freedom PARSIM uses to affect the quality of parameter signatures.

3. Degrees of freedom of PARSIM

A salient feature of PARSIM is its capacity to assess the quality of parameter signatures according to the uniformity/non-uniformity of the parameter error estimates across the pixels of individual parameter signatures.


Figure 5. Parameter signature pixels shown together with the modulus maxima of the wavelet coefficients of the corresponding output sensitivity. The right plot is the full parameter signature, whereas the left plot shows the parameter signature reduced to 50 pixels closest to the modulus maxima of the output sensitivity.

Non-uniformity of the parameter error estimates arises from the first-order approximation of the prediction error in Equation (8) as well as from the approximate nature of the extracted parameter signatures. The latter corresponds to the fact that parameter signatures are obtained at finite dominance factors $\eta^r_j$ in Equation (13), which do not necessarily satisfy the requisite condition $\eta^r_j \gg 1$ underlying the notion of dominance in Equation (15). Since the adaptation mechanism in PARSIM benefits from several degrees of freedom, namely the dominance factor $\eta^r_j$ in Equation (13), the weight $w^r_{i,j}$ in Equation (23), and the adaptation step size $\mu_i(q)$ in Equation (20), it can take advantage of different characterisations of this non-uniformity for improving its performance. The mentioned transparencies will be used as feedback to adjust the several degrees of freedom available to PARSIM for improving its parameter estimation performance.

3.1. Parameter signature alignment with the edges

It has been reported widely that 'edges' represent the most distinguishable aspect of images and are used extensively for data condensation (Mallat, 1998). Edges are detected in the time-scale domain by the modulus maxima of the CWTs (Mallat, 1998), as indicators of the decay of the CWT amplitudes across scales. Following the definition by Mallat (1998), a modulus maximum at any point $(t_0, s_0)$ on the time-scale plane is a local maximum of $|W\{f\}(t, s_0)|$. This implies that at a modulus maximum (Mallat, 1998)

$$\frac{\partial W\{f\}(t_0, s_0)}{\partial t} = 0 \quad (24)$$

where this maximum is a strict maximum and the maxima lines are the connected curves s(t) in the time-scale plane along which all points are modulus maxima.

The notion that the CWT modulus maxima capture a significant part of the image motivates giving them preference in the parameter signatures. To this end, only those parameter signature pixels can be selected that correspond to the largest wavelet coefficients of the associated output sensitivity. This selection strategy is illustrated in Figure 5, where on the left is the full parameter signature superimposed on the modulus maxima of the wavelet coefficients of the corresponding output sensitivity, and on the right is the refined parameter signature with its pixels selected to correspond to the largest wavelet coefficients of the output sensitivity. As is illustrated by the two parameter signatures, this strategy achieves the objective of mostly selecting pixels intersecting or in close proximity to the modulus maxima of the output sensitivity.

The above strategy of refining the parameter signatures to a reduced size provides the following benefits: (1) it pre-filters the signature pixels such that only those pixels containing the most essential information are used in the parameter error estimate, and (2) it restricts and unifies the number of pixels of every parameter signature, thereby resulting in a compact and uniformly formed Jacobian, $\Phi_s$, in Equation (17).


Figure 6. The parameter error estimates from the pixels of the parameter signatures in Figure 5. The left plot shows the parameter error estimates of the original parameter signature, whereas the right plot shows the estimates at the select (fifteen) pixels.

To underline this point, shown respectively in Figure 6 are the parameter error estimates at the pixels of the parameter signatures in Figure 5. Although the parameter error estimates from the refined parameter signature on the right are not uniform in sign, they are far more condensed and, therefore, of lower entropy than those on the left. Higher entropies represent more evenly distributed parameter error estimates across the pixels of the corresponding parameter signature. By the same token, the more concentrated the parameter error estimates are, the lower is the entropy of their population (Addison, 2002; Coifman & Wickerhauser, 1992). In accordance with this concept, minimum entropy corresponds to the bulk of the information content being centred at as few pixels as possible in the parameter signature, leading to enhanced separation of the columns of the Jacobian $\Phi_s$ in Equation (17).
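
A minimal sketch of this refinement step is shown below: it keeps only the signature pixels carrying the largest absolute wavelet coefficients of the corresponding output sensitivity, which by the argument above tend to lie on or near its modulus maxima. The pixel budget n_keep is an illustrative parameter.

```python
# Sketch of the signature refinement of Section 3.1: retain only the signature
# pixels with the largest absolute wavelet coefficients of the corresponding
# output sensitivity.
import numpy as np

def refine_signature(sig, W_sens_i, n_keep=50):
    """sig: boolean mask of the full signature; W_sens_i: coefficient surface.
    Returns a boolean mask with at most n_keep pixels."""
    idx = np.flatnonzero(sig.ravel())
    if idx.size <= n_keep:
        return sig.copy()
    mags = np.abs(W_sens_i.ravel()[idx])
    keep = idx[np.argsort(mags)[::-1][:n_keep]]     # largest |W| first
    refined = np.zeros(sig.size, dtype=bool)
    refined[keep] = True
    return refined.reshape(sig.shape)
```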

The ramification in parameter estimation of the above parameter signature refinement strategy is evaluated in the application to the van der Pol oscillator, introduced below as the second platform of this paper. Parameter estimation of the van der Pol oscillator faces non-convex error surfaces and local minima, which are challenging to navigate by gradient-based parameter estimation methods such as NLS and PARSIM.

Platform 2: The van der Pol oscillator is an example of a self-excited nonlinear oscillator, having the form (Wang, Dayawansa, & Martin, 1999)

$$m\ddot{x} - c(1 - x^2)\dot{x} + kx = 0, \qquad y = [\,x \;\; \dot{x}\,]^T \quad (25)$$

with its true parameters defined as $\Theta^* = [m^*\; c^*\; k^*]^T = [375\;\; 10000\;\; 75000]^T$. As in Chua's circuit, the prediction error of this system reflects the difference between the true and nominal parameters, as $\varepsilon(t) = y(t, \Theta^*) - \hat{y}(t, \hat\Theta)$, with the nominal parameters set as $\hat\Theta = [\hat{m}, \hat{c}, \hat{k}]^T = [0.8\,\theta^*_1, 1.25\,\theta^*_2, 0.8\,\theta^*_3]^T = [300, 12500, 60000]^T$. The system was simulated in response to the initial condition $x(0) = 0.02$, $\dot{x}(0) = 0$ with a short time window of 1.27 seconds to avoid chaotic behaviour and accommodate its first-order approximation.
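
A sketch of simulating Platform 2 with the true parameters, the stated initial conditions, and the 1.27-second window follows; the sampling rate is an illustrative choice.

```python
# Sketch of simulating Platform 2 (the van der Pol oscillator, Equation (25)).
import numpy as np
from scipy.integrate import solve_ivp

def vdp_rhs(t, state, m, c, k):
    x, xdot = state
    xddot = (c * (1.0 - x**2) * xdot - k * x) / m    # from m*xddot - c(1-x^2)*xdot + k*x = 0
    return [xdot, xddot]

theta_true = (375.0, 10000.0, 75000.0)               # [m*, c*, k*]
t = np.linspace(0.0, 1.27, 1271)                     # sampling rate is an assumption
sol = solve_ivp(vdp_rhs, (0.0, 1.27), [0.02, 0.0], t_eval=t, args=theta_true,
                rtol=1e-8, atol=1e-10)
y = sol.y.T                                           # columns: x, xdot (the two outputs)
```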

The effect of parameter signature refinement on parameter estimation by PARSIM is shown in Figure 7, which shows the estimates of the van der Pol oscillator parameters by the full and refined parameter signatures, respectively. The results clearly indicate the higher accuracy achieved by the refined parameter signatures, hence giving credence to the parameter signature refinement strategy adopted.

3.2. Dominance factor adaptation

PARSIM relies on the dominance factors $\eta^r_j$ in Equation (13) to extract the parameter signatures. These parameter signatures then determine the regions of the time-scale domain to be included in PARSIM's parameter estimation solution.


Figure 7. Estimates of the van der Pol oscillator parameters by the full and refined parameter signatures.

Therefore, the dominance factors play a critical role in affecting not only the size of the parameter signatures and their pixel locations but also the magnitude of the corresponding parameter error estimates and their entropies. As a rule, a higher dominance factor results in a smaller parameter signature, as is clear from the smaller parameter signature extracted by a higher dominance factor on the right. Smaller parameter signatures also tend to yield lower entropies for their corresponding parameter signatures (Bonachela et al., 2008). Therefore, it appears that a suitable strategy for dominance factor selection should favour higher dominance factors. However, too high a dominance factor can lead to a null parameter signature. It is, therefore, essential to the operation of PARSIM to have a strategy whereby a suitable dominance factor is continually selected such that the highest overall quality parameter signatures are extracted for each WT of an output at each iteration of parameter estimation.

The strategy adopted for selection of each dominance factor $\eta^r_j$ extracts parameter signatures at a pre-specified set of dominance factors and then selects the dominance factor yielding parameter error estimates with the least overall entropy. This selection strategy has the form

$$\eta^{r*}_j = \arg\min_{\eta^r_j} \sqrt{\sum_{i=1}^{Q} \left(S^r_{i,j}\!\left(\eta^r_j, \widehat{\Delta\theta}^r_{i,j}\right)\right)^2} \quad (26)$$

where $\eta^{r*}_j$ denotes the selected dominance factor for the rth WT of the jth output, yielding the least L2 entropy norm of the parameter error estimates among the dominance factors considered. Another benefit of the devised dominance factor selection strategy is its assurance of maximal parameter signature extraction by avoidance of null parameter signatures, which are represented by infinite entropy values. Null parameter signatures, as shown in Table 1, not only hinder the 'least-squares solution' of PARSIM but also lead to zero parameter error estimates (Table 2) that denigrate the effectiveness of its 'separate parameter estimate solution.'
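
The selection rule of Equation (26) can be sketched as below. The Shannon entropy of the normalised absolute pixel-wise estimates is used here as the entropy measure S; this particular formula is an assumption of the sketch (the paper cites Coifman and Wickerhauser (1992) for the entropy concept), and the callbacks extract and pixel_estimates are illustrative placeholders.

```python
# Sketch of the dominance-factor selection of Equation (26): extract signatures at
# a pre-specified set of factors and keep the one whose parameter error estimates
# have the smallest L2 norm of entropies.
import numpy as np

def shannon_entropy(values, eps=1e-15):
    p = np.abs(np.asarray(values, dtype=float))
    if p.size == 0 or p.sum() < eps:
        return np.inf                      # null signature -> infinite entropy
    p = p / p.sum()
    return float(-np.sum(p * np.log(p + eps)))

def select_dominance_factor(candidates, extract, pixel_estimates):
    """candidates: iterable of eta values.
    extract(eta) -> list of signature masks (one per parameter).
    pixel_estimates(i, sig) -> array of pixel-wise estimates of parameter i."""
    best_eta, best_norm = None, np.inf
    for eta in candidates:
        sigs = extract(eta)
        entropies = [shannon_entropy(pixel_estimates(i, s)) for i, s in enumerate(sigs)]
        norm = np.sqrt(np.sum(np.square(entropies)))     # L2 norm in Equation (26)
        if norm < best_norm:                             # also avoids null signatures
            best_eta, best_norm = eta, norm
    return best_eta
```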

For illustration purposes, shown in Figure 8 are the selected dominance factors for three different WT/output combinations of the van der Pol oscillator at different iteration points of parameter estimation. From the dynamics of the selection process, it is clear that different dominance factor levels are selected for each WT/output combination among the five choices available. For instance, the dominance factor selected in the left plot is mostly at the lower end, whereas the ones in the middle and right plots gravitate toward the higher end.

The ultimate test of the adopted selection strategy, however, lies in its improvement of parameter estimation by PARSIM.

Figure 8. Illustration of dominance factor selection during estimation of the van der Pol oscillator parameters.


Figure 9. The condition number of $\Phi_s$ changes with the dominance factor.

One measure of parameter estimation quality is the condition number of the Jacobian $\Phi_s$ (Equation (17)), which is examined in Figure 9 for the dominance factor selected for the van der Pol oscillator, among the condition numbers of the Jacobian matrices of the dominance factors considered. The results clearly indicate the effectiveness of the devised dominance factor selection strategy in yielding the best-conditioned Jacobian matrix for parameter estimation, even though the selection is performed independently of the condition number.

Next examined is the effectiveness of the adopted dominance factor selection strategy on parameter estimation. For this, the effect of the selection strategy is evaluated in estimation of the van der Pol oscillator parameters. Shown in Figure 10 are the prediction and precision errors of the van der Pol oscillator parameter estimates at different iteration points (q in Equation (20)) obtained with dominance factor adaptation (noted as variable DF). Also shown in Figure 10 are the errors from two other estimation runs at two fixed dominance factors. The prediction error in Figure 10 represents the error between the true and modeled outputs, as $\sum_k |\varepsilon(t_k)|$, and the precision error, $\varepsilon_\theta$, represents the squared sum of the parameter errors at each iteration q, formulated as

$$\varepsilon_\theta(q) = \sum_{i=1}^{Q} \left(\frac{\theta^*_i - \hat\theta_i(q)}{\hat\theta_i(q)}\right)^2 \quad (27)$$

to denote the accuracy of the estimated parameters. It is noted that in practice the true parameter values $\theta^*_i$ are not known, so the precision error cannot be computed. However, in simulation-based studies such as this, it provides an effective means of evaluating the validity of the parameter estimates, beyond the prediction error minimisation capacity of the method. Both the prediction and precision errors in Figure 10 show the effectiveness of the devised dominance factor adaptation strategy in improving parameter estimation, as compared to using a fixed dominance factor throughout the parameter estimation run. The above results, therefore, show that the adopted dominance factor selection strategy not only leads to a Jacobian that has the best condition number among the choices considered, but also improves the accuracy of parameter estimation.

3.3. Weight assignment

The quality measure of each estimate $\widehat{\Delta\theta}^r_{i,j}$ in PARSIM is its Shannon entropy. Therefore, in the separate parameter estimate solution, the weights $w^r_{i,j}$ in Equation (23) are assigned such that higher weights go to parameter estimates of lower entropy.

Figure 10. Illustration of how the optimal dominance factor affects convergence.


Accordingly, the weights are defined as

$$w^r_{\cdot,j} = \max\!\left(S^r_{\cdot,j}\right) - S^r_{\cdot,j} \quad (28)$$

where $\max(S^r_{\cdot,j})$ denotes the maximum entropy of all the parameter error estimates at the current iteration. The above weight assignment strategy, which is consistent with associating lower entropies with higher quality estimates, will assign the highest weight to the parameter error estimate with the lowest entropy.
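
A short sketch of this weight assignment is given below; normalising the resulting weights to sum to one before they are used in Equation (23) is an assumption of the sketch, not stated in the paper.

```python
# Sketch of the weight assignment of Equation (28): weights decrease linearly with
# the Shannon entropy of each estimate, so the lowest-entropy estimate receives the
# largest weight.
import numpy as np

def assign_weights(entropies):
    S = np.asarray(entropies, dtype=float)
    w = S.max() - S                      # Equation (28)
    total = w.sum()
    # normalisation to unit sum is an assumption of this sketch
    return w / total if total > 0 else np.full_like(w, 1.0 / w.size)
```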

parameter error estimates at the current iteration. The aboveweight assignment strategy, which is consistent with asso-ciating lower entropies with higher quality estimates, willassign the highest weight to the parameter error estimatewith the lowest entropy.

4. Performance evaluation

The contribution of PARSIM is to allow the incorporation of shape attributes in parameter estimation. It achieves this by including selected portions of the output sensitivities' and prediction error's wavelet coefficients in parameter estimation, so as to manage the enormous volume of data generated by the various WTs of the model outputs. There are two aspects of PARSIM's performance. One aspect concerns the internal workings of PARSIM, as it pertains to its different solutions (i.e., least-squares versus separate parameter estimates) and different WT combinations. The other aspect concerns the potential advantages of PARSIM in comparison to time-based parameter estimation.

The performance of gradient-based parameter estimation methods is dependent upon the global convexity of the error surface; therefore, the general performance of PARSIM cannot be comprehensively evaluated independently of its application platform. In lieu of such generalisation capacity, we have confined ourselves to presenting results from both of its solutions and with different WTs, to provide a qualitative view of their influence on the parameter estimation cases considered.

Although PARSIM benefits from several degrees of freedom that are unavailable to time-based parameter estimation, its ultimate advantage is realised when the shape attributes provide a competitive advantage over the magnitude of the output sensitivities and prediction error. Again, in the absence of a generalised framework, results are provided to illustrate behavioural aspects that can be gained by the inclusion of shape attributes. The results presented here are, therefore, meant only to highlight those aspects of PARSIM's performance that seem to offer an advantage over magnitude-based parameter estimation. In that light, we have presented the results of PARSIM together with those of NLS, solely to provide a basis for evaluating its performance and not as evidence of its overall superiority to time-based parameter estimation.

4.1. The two PARSIM solutions

The two solutions of PARSIM are formulated in Equations (16) and (22), respectively. The fundamental difference between the two solutions is the use of the covariance matrix $(\Phi_s^T \Phi_s)$ in the least-squares (LS) solution (Equation (16)), which incorporates the cross-correlation between the output sensitivities (i.e., the off-diagonal components of the covariance matrix) in the estimation of the parameter errors, $\Delta\theta_i$. In this light, the separate parameter estimate solution in Equation (22) can be viewed as associated solely with the diagonal elements of the covariance matrix. To provide a comparison between the two solutions, and hence of the significance of the off-diagonal elements of the covariance matrix, the parameters of both the Chua's circuit and the van der Pol oscillator were estimated by each of these solutions. The prediction and precision errors of the estimates during the estimation run are shown in Figure 11. The results from the van der Pol oscillator indicate that the separate parameter estimate solution becomes entrapped in a local minimum, even though the prediction error reaches zero. Those from the Chua's circuit indicate that the LS solution provides a much smoother convergence profile, although both solutions converge to the correct parameter values. Overall, the results in Figure 11 clearly indicate the more effective performance of the LS solution. Therefore, the LS solution is designated as the solution of choice by PARSIM, unless the Jacobian $\Phi_s$ becomes rank deficient due to the absence of signatures for a parameter by any of the WT/output combinations (e.g., see Table 1). In such a case, and for that iteration, the separate parameter estimate solution is used as the backup solution.

4.2. Combination of shape attributes

The salient feature of PARSIM is its incorporation of shape attributes in parameter estimation. Therefore, its effectiveness vis-a-vis time-based (magnitude-based) parameter estimation is contingent upon the added information content provided by the shape attributes over and beyond the magnitude. As demonstrated later, one condition of such contingency that we have identified so far is the smoothness of the output sensitivities. Yet, given smooth output sensitivities and the preference for the shape attributes, the question still remains as to which shape attributes to consider. To illustrate the significance of shape attributes in parameter estimation, shown in Figure 12 are the parameter estimates of the van der Pol oscillator and Chua's circuit by different transform combinations. Even though the results are case-specific, the precision errors on the left illustrate that only one combination of the WTs, i.e., Gaussian smoothing together with the Gauss and Sombrero WTs, is effective in finding the global minimum for the van der Pol oscillator. On the other hand, the precision errors on the right indicate that the parameter estimates of the Chua's circuit are more forgiving, in that of the six transform combinations, only two fail to provide adequate information content and four of them lead the estimates to their true values.


Figure 11. Prediction and precision errors of parameter estimates of the van der Pol oscillator (left) and Chua's circuit (right) by each of the two PARSIM solutions.

Figure 12. Parameter estimates of the van der Pol oscillator (left) and Chua’s circuit (right) obtained with different transform combinations.


Figure 13. Prediction and precision errors of fifty estimation runs of Chua's circuit parameters obtained by PARSIM or NLS with different initial parameter values.

It is interesting to note that unlike the corresponding precision errors, the prediction errors on the left all reach zero, underlining the presence of local minima for the van der Pol oscillator. Another point of interest in Figure 12 is the observation that the best estimation results correspond to the largest combination of transforms. Even though anecdotal and not of value in and of itself, this observation highlights PARSIM's effective integration capacity, which enables it to benefit from the added information of various transforms for improved parameter estimation.

4.3. Convergence characteristics

Defining the convergence characteristics of PARSIM is not an easy task, because of the case-specificity of the parameter estimation problem. Furthermore, our experience with PARSIM is limited to a few platform applications, corresponding to a limited range of error surface characteristics. Therefore, the results presented are not meant to provide a definitive and comprehensive view of PARSIM's performance; rather, they demonstrate only those aspects of PARSIM that are noteworthy in a parameter estimation solution. One such aspect is the potential for faster convergence. Another aspect is PARSIM's tendency to be less vulnerable to entrapment in local minima. Yet another aspect of PARSIM is its sensitivity to the smoothness of output sensitivities. The results provided next illustrate these aspects of PARSIM.

4.3.1. Speed of convergence

In general, PARSIM is a computationally intensive method, due to the added computation associated with wavelet transformation, signature extraction, and the selection of its various degrees of freedom. As such, any potentially faster convergence rates attained by PARSIM would be overshadowed by its longer computation times, except in cases where simulation times dominate the times used for parameter estimation. While Chua's circuit does not represent such a case, it can be used to evaluate the speed of convergence of PARSIM. To this end, fifty estimation runs of Chua's circuit's parameters were performed by PARSIM and NLS with random initial parameter values within 2% of the nominal parameter values $\hat\Theta = [0.95\,\theta^*_1, 1.05\,\theta^*_2, 0.95\,\theta^*_3, 1.05\,\theta^*_4, 0.95\,\theta^*_5]$. Shown in Figure 13 are the prediction and precision errors of these estimation runs. They indicate faster convergence rates achieved by PARSIM than NLS. However, these results are not to be generalised as a characteristic of PARSIM, as we expect numerous scenarios to exist wherein PARSIM's performance would be inferior to magnitude-based parameter estimation.

4.3.2. Evasion of local minima

A potential hazard of gradient-based parameter estimation is entrapment in local minima with non-convex error surfaces. In these cases, the prediction error could reach a minimum of zero but not the precision error. A case in point is the van der Pol oscillator, as represented by the results in Figure 12. To facilitate visualisation, the error surface of the van der Pol oscillator is plotted in Figure 14 in terms of only two of its parameters, c and k. Also shown in Figure 14 are the trajectories of parameter estimates by both PARSIM and NLS from two starting points in the non-convex regions of the error surface.


Figure 14. Two cases where PARSIM finds the global minimum when NLS gets entrapped in a local minimum.

The results clearly indicate the marked difference between the two trajectories. The NLS solutions end in local minima, whereas the solutions from PARSIM converge to the global minimum (bottom of the convex region). However, the results in Figure 14 could be due to the initial conditions used, or may be specific to the application platform. To provide a slightly more diversified study of this aspect of PARSIM, a third platform is described below which also poses the local minima challenge.

Platform 3: The parameter estimates of the nonlinear mass–spring–damper (MSD) model

$$m\ddot{x} + c\dot{x}|\dot{x}| + kx^3 = 0 \quad (29)$$

can also be entrapped in local minima when the output represents the free response of the displacement x to initial conditions. In this model, m denotes the system mass, c is its damping coefficient, and k is its spring constant. The true model parameter values were set as $\Theta^* = [m^*, c^*, k^*] = [375, 9800, 130000]$ and the nominal parameter values as $\hat\Theta = [0.8\,\theta^*_1, 1.25\,\theta^*_2, 0.8\,\theta^*_3]$. The output comprised the free response of the MSD to the initial conditions $x(0) = 0.2$ and $\dot{x}(0) = 0$.
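
A sketch of simulating Platform 3 as a free response to the stated initial conditions follows; the simulation window length and sampling are assumptions, since they are not given in this excerpt.

```python
# Sketch of simulating Platform 3 (the nonlinear mass-spring-damper of Equation (29))
# as a free response. The 2-second window and sampling rate are assumptions.
import numpy as np
from scipy.integrate import solve_ivp

def msd_rhs(t, state, m, c, k):
    x, xdot = state
    xddot = -(c * xdot * abs(xdot) + k * x**3) / m   # from m*xddot + c*xdot|xdot| + k*x^3 = 0
    return [xdot, xddot]

theta_true = (375.0, 9800.0, 130000.0)               # [m*, c*, k*]
t = np.linspace(0.0, 2.0, 2001)
sol = solve_ivp(msd_rhs, (0.0, 2.0), [0.2, 0.0], t_eval=t, args=theta_true,
                rtol=1e-8, atol=1e-10)
x_free = sol.y[0]                                     # the measured free response
```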

The capacity of PARSIM in evading local minima was tested in application to both the van der Pol oscillator and the MSD. For this purpose, fifty estimation runs of the van der Pol oscillator and MSD parameters were performed with random initial values by both PARSIM and NLS. The random initial values for the van der Pol oscillator were within ±10% of the nominal parameter values $\hat\Theta = [0.8\,\theta^*_1, 1.25\,\theta^*_2, 0.8\,\theta^*_3]$ and those of the MSD were within ±1% of the nominal parameter values $\hat\Theta = [0.8\,\theta^*_1, 1.25\,\theta^*_2, 0.8\,\theta^*_3]$. The prediction and precision errors of the estimation runs are shown in Figure 15, where the left plots are associated with the van der Pol oscillator and the right plots with the MSD. The prediction errors in the top plots indicate the success of NLS in zeroing the prediction error far more rapidly than PARSIM, albeit at erroneous parameter values, as indicated by the generally non-zero values of the precision error by NLS in the bottom plots. PARSIM, on the other hand, takes considerably more iterations to minimise the prediction error, but it is far more successful at reaching the correct parameter values, as represented by the generally smaller values of the precision errors obtained by PARSIM in the bottom plots. We attribute the better precision of PARSIM to the separate manner in which it estimates the model parameters.

4.3.3. Smoothness of output sensitivities

PARSIM relies on the shape attributes of output sensitivities; therefore, its performance depends on their reliable characterisation. Accordingly, superfluous spikes in the output sensitivities that are caused by idiosyncrasies of numerical simulation would adversely distort the parameter signatures and hamper the parameter estimation.

Figure 15. Prediction and precision errors of fifty estimation runs of the van der Pol oscillator parameters (left) and MSD parameters (right) obtained with random initial parameter values.

This aspect of PARSIM was investigated by smoothing the outputs of the van der Pol oscillator to remove non-analytic spikes. Smoothing improved the parameter signatures by eliminating superfluous regions that would otherwise be included at instances of large slopes (slope changes), lowered the overall entropy values of the parameter signatures (i.e., improved their information content), and increased the speed of convergence of the prediction error, albeit not at a higher precision.
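The smoothing step is not specified in detail here; one plausible choice, shown below, is a short Savitzky-Golay filter applied to each output (or output sensitivity) before the wavelet transformation, which removes isolated numerical spikes while largely preserving the signal's shape. The window length and polynomial order are assumptions.

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth_series(y, window_length=9, polyorder=3):
    # Remove non-analytic spikes while preserving the overall shape
    return savgol_filter(np.asarray(y, dtype=float), window_length, polyorder)
```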

5. Conclusion

PARSIM is a method of parameter estimation for dynamic systems that relies on the shape attributes of system outputs. The shape attributes are represented by continuous wavelet transformation of the outputs into the time-scale domain, but the surfaces that characterise the shape attributes present too vast a data content to be readily utilised in NLS. PARSIM overcomes this impediment by considering a selected subset of data points located in isolated regions of the time-scale domain and close to the modulus maxima of the corresponding wavelet coefficient surfaces.
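A simplified illustration of this selection, written with the PyWavelets package rather than the Wavelab routines used for the paper's results, is given below: a Gaussian-derivative CWT is computed and the pixels lying on local modulus maxima across time are marked. PARSIM's actual signature extraction involves additional steps (dominance factors, edge-point limits) that are omitted here.

```python
import numpy as np
import pywt

def modulus_maxima_mask(signal, scales, wavelet='gaus1'):
    """Compute a Gaussian-derivative CWT and flag pixels that are local
    maxima of the coefficient modulus along the time axis at each scale."""
    coeffs, _ = pywt.cwt(np.asarray(signal, dtype=float), scales, wavelet)
    mod = np.abs(coeffs)
    mask = np.zeros(mod.shape, dtype=bool)
    mask[:, 1:-1] = (mod[:, 1:-1] >= mod[:, :-2]) & (mod[:, 1:-1] >= mod[:, 2:])
    return coeffs, mask

# e.g. coeffs, mask = modulus_maxima_mask(x_free, np.arange(1, 73))  # 72 scales
```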

Two different solutions are devised for parameter estimation. The first and preferred solution uses the LS formulation. The second solution, used as recourse, is only relied upon when parameter signatures cannot be extracted for a model parameter from any of the wavelet transforms of the outputs.

Another feature of PARSIM is its capacity to use several measures of parameter signature quality as the basis for adjusting its degrees of freedom. This enables PARSIM to select the best dominance factor among a set of dominance factors considered, as well as to adjust the adaptation step size associated with each parameter.

The preliminary results presented in the paper validate the robustness of PARSIM's solutions and highlight some advantages of including the shape attributes in lieu of the magnitudes of the time series involved (i.e., output sensitivities and prediction errors). Also presented are results that indicate the benefit of smoothness of the output sensitivities and PARSIM's potential vulnerability to superfluous spikes caused by numerical simulation. The application of PARSIM, however, is not without cost. Its additional degrees of freedom make it more onerous to implement, and its longer computation time prolongs the parameter estimation effort. As such, PARSIM should be considered when its potentially improved precision outweighs its cost of implementation.

• Computation time: The added computational cost of PARSIM, compared to a time-based method like NLS, is associated with (1) transformation of the involved time series (output sensitivities and prediction errors) to the time-scale domain via various CWTs, (2) extraction of the parameter signatures, (3) selection of the dominance factors, (4) computation of the adaptation step sizes, and (5) estimation of the parameter errors. In application to both Chua's circuit and the van der Pol oscillator, 50 iterations of PARSIM on a current state-of-the-art personal computer require 20–30 minutes, depending on the number of CWTs and dominance factors considered. In comparison, the application of NLS to the same platforms requires approximately 15 seconds. Running an estimation of Chua's circuit with a single dominance factor, for instance, requires approximately 3 seconds per iteration; the addition of a second dominance factor doubles that time to 6 seconds. Therefore, the bulk of PARSIM's computation time is attributed to wavelet transformation and to extracting the parameter signatures at the various dominance factors. This added computation time is a drawback of PARSIM in applications involving quick simulation run-times. However, in applications with long simulation run-times, the time required to generate parameter signatures becomes less of a concern; in these cases, the bulk of the run-time shifts from extracting the parameter signatures to generating the Jacobian matrix.

• Compensation for noise: The effect of noise on the parameter estimates and the provisions for its mitigation are not discussed in the present paper, since noise compensation by PARSIM has been addressed in McCusker, Currier, and Danai (2011). One of the common utilities of the time-scale domain is in filtering noise, because of the prominence of high-frequency noise in the lower scales of the wavelet transforms. The filters that have been developed threshold the wavelet coefficients in the lower scales and then reconstruct the signal in the time domain (Donoho & Johnstone, 1994; Donoho, Johnstone, Kerkyacharian, & Picard, 1995); a minimal sketch of this conventional approach is given after this list. However, the reconstructed signals do not necessarily render better parameter estimates because of the disconnect between denoising and parameter estimation (McCusker et al., 2011). PARSIM, in contrast, performs parameter estimation in the time-scale domain; therefore, it obviates the need for reconstructing the denoised signal in the time domain. It compensates for noise, instead, by estimating the distortion by noise of the wavelet coefficients of the prediction error and then using this estimate to assign confidence to the parameter error estimates at individual pixels of the parameter signature. The application and benefits of noise compensation by PARSIM are shown only for single wavelets in McCusker et al. (2011), but their extension to multiple wavelets is straightforward.

• Number of parameters: If one considers the parameter signatures as sets, then the union of these sets is bounded by the number of pixels included in the time-scale plane. As such, the larger the number of parameters (parameter signatures), the higher the competition for pixels. In this paper, we restricted each time series to 128 data points and the transformation to 72 scales, resulting in a time-scale plane of dimension 128 × 72. This number of pixels can be distributed among a large number of parameter signatures, certainly more than the number of parameters ordinarily considered for dynamic systems. Nevertheless, further analysis is warranted to verify this assertion and to establish the range of parameters that is manageable by PARSIM.

• Selection of the best suite of CWTs: The results presented in this paper are based on the Gaussian family of CWTs. Depending on the application platform, different combinations of Gauss and Sombrero WTs and Gaussian smoothing were selected. However, these combinations were selected by trial and error, and not autonomously according to the information content each transformation contributed to the parameter estimation solution. Therefore, devising a measure for identifying the optimal set of CWTs for each application platform will be an interesting problem to solve. Another interesting problem would involve the inclusion of other WTs in the operation of PARSIM.

• Range of dominance factors: Among the various degrees of freedom of PARSIM, the dominance factor is the most critical. In its current form, the dominance factor is selected from a set of candidate dominance factors. A different strategy would entail determining the maximum dominance factor by minimising the number of zero-valued entropies; a sketch of such a selection rule follows the wavelet-shrinkage sketch after this list.

• Number of edge-point pixels: The results produced in this paper are based on an arbitrary maximum number of edge-point pixels. Given that the number of pixels included in each parameter signature can have a significant impact on the quality of parameter estimates, identifying a measure whereby the optimum number of edge points can be determined is a potentially worthy topic of study.
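Two brief sketches follow, referenced in the list above. The first illustrates the conventional wavelet-shrinkage denoising (threshold, then reconstruct) that the noise-compensation item contrasts with PARSIM's approach; it is written in Python with PyWavelets, and the choice of wavelet, decomposition level, and universal threshold are assumptions rather than the settings of McCusker et al. (2011).

```python
import numpy as np
import pywt

def wavelet_shrinkage_denoise(y, wavelet='db4', level=4):
    """Donoho-Johnstone-style shrinkage: threshold the detail (low-scale)
    coefficients, then reconstruct the signal in the time domain."""
    y = np.asarray(y, dtype=float)
    coeffs = pywt.wavedec(y, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # noise estimate
    thresh = sigma * np.sqrt(2.0 * np.log(y.size))        # universal threshold
    coeffs[1:] = [pywt.threshold(c, thresh, mode='soft') for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: y.size]
```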
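The second sketch states the alternative dominance-factor strategy as a selection rule. It is purely illustrative: entropies_for is a hypothetical callback that returns the signature entropies obtained at a given dominance factor, and the rule keeps the largest candidate for which no entropy has collapsed to zero.

```python
def max_feasible_dominance(candidates, entropies_for):
    """Hypothetical rule: largest dominance factor for which every
    parameter-signature entropy remains non-zero."""
    feasible = [d for d in candidates
                if all(e > 0.0 for e in entropies_for(d))]
    return max(feasible) if feasible else min(candidates)
```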

Acknowledgements
We are grateful to the authors of Wavelab from the Department of Statistics at Stanford University, who have provided a valuable platform for producing some of the results in this paper.

Notes on contributors
Jeffrey Simmons has over 27 years of experience working in the aerospace industry at Pratt & Whitney, a United Technologies Corporation company, where his primary activity has focused on computer simulations of propulsion systems as well as the numerical methods employed in the simulation solvers. Jeffrey recently completed a Doctor of Philosophy degree in the field of system identification through the Department of Mechanical and Industrial Engineering at the University of Massachusetts Amherst, in February 2014. His previous degrees include a Master of Arts degree (1998) in mathematics and a Bachelor of Arts degree (1995) in mathematics with a minor in physics; both degrees are from Central Connecticut State University.

Kourosh Danai received all his degrees from the University of Michigan, the last one a PhD in 1986. He joined the Mechanical Engineering Department of the University of Massachusetts Amherst in 1987, where he is now a professor. Dr Danai's research has focused on automation solutions that are inspired by artificial intelligence. With his students, he has developed solutions for fault diagnosis of helicopter gearboxes as well as optimisation and control of manufacturing processes. His latest research has focused on system identification in the time-scale domain. He has been the recipient of three innovation awards from NASA. He spent the summer of 1990 at Sikorsky Aircraft Company (working on helicopter track and balance), and the fall of 1994 at the United Technologies Research Center (working on sensor location selection in helicopter gearboxes). Dr Danai is a fellow of ASME.

References
Abarbanel, H.D.I. (1993). The analysis of observed chaotic data in physical systems. Reviews of Modern Physics, 65(4), 1331–1392.
Addison, P.S. (2002). The illustrated wavelet transform handbook. Bristol: Institute of Physics Publishing.
Astrom, K., & Eykhoff, P. (1971). System identification - A survey. Automatica, 7, 123–162.
Billings, S.A. (2003). Nonlinear system identification. Singapore: Wiley.
Bonachela, J., Hinrichsen, H., & Munoz, M. (2008). Entropy estimates of small data sets. Journal of Physics A: Mathematical and Theoretical, 41, 202001:1–9.
Coifman, R.R., & Wickerhauser, M.V. (1992). Entropy-based algorithms for best basis selection. IEEE Transactions on Information Theory, 38(2), 713–718.
Danai, K., & McCusker, J.R. (2009). Parameter estimation by parameter signature isolation in the time-scale domain. ASME Journal of Dynamic Systems, Measurement and Control, 131(4), 041008.
Donoho, D.L., & Johnstone, I.M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3), 425–455.
Donoho, D.L., Johnstone, I.M., Kerkyacharian, G., & Picard, D. (1995). Wavelet shrinkage: Asymptopia? Journal of the Royal Statistical Society, Series B (Methodological), 57(2), 301–369.
Fletcher, R. (1987). Practical methods of optimization (2nd ed.). Chichester: Wiley.
Goldberg, D.E. (1989). Genetic algorithms. Reading, MA: Addison-Wesley.
Kennedy, M.P. (1994). ABC - Adventures in bifurcation & chaos: A program for studying chaos. Journal of the Franklin Institute, 331B(6), 631–658.
Ljung, L. (1999). System identification: Theory for the user (2nd ed.). Saddle River, NJ: Prentice Hall.
Mallat, S. (1998). A wavelet tour of signal processing (2nd ed.). San Diego, CA: Academic Press.
Mallat, S., & Hwang, W.L. (1992). Singularity detection and processing with wavelets. IEEE Transactions on Information Theory, 38(2), 617–643.
McCusker, J., Currier, T., & Danai, K. (2011). Improved parameter estimation by noise compensation in the time-scale domain. Signal Processing, 91(1), 72–84.
Narendra, K.S., & Annaswamy, A.M. (1989). Stable adaptive systems. Englewood Cliffs, NJ: Prentice Hall.
Parlitz, U. (1996). Estimating model parameters from time series by auto-synchronization. Physical Review Letters, 76(8), 1232–1235.
Parlitz, U., Junge, L., & Kocarev, L. (1996). Synchronization-based parameter estimation from time series. Physical Review E, 54(6), 6253–6259.
Rubinstein, R.Y. (1986). Monte Carlo optimization, simulation, and sensitivity of queueing networks. New York, NY: Wiley.
Sastry, S., & Bodson, M. (1989). Adaptive control. Englewood Cliffs, NJ: Prentice Hall.
Seber, G.A.F., & Wild, C.J. (1989). Nonlinear regression. New York, NY: John Wiley and Sons.
Strang, G. (2006). Linear algebra and its applications (4th ed.). Thomson Brooks/Cole.
Wang, N., Dayawansa, W.P., & Martin, C.F. (1999). Van der Pol oscillator networks. In Proceedings of the 38th Conference on Decision and Control. Phoenix, AZ, USA.
