
June 23, 2004 17:5 01034

Tutorials and Reviews

International Journal of Bifurcation and Chaos, Vol. 14, No. 6 (2004) 1905–1933
© World Scientific Publishing Company

NONLINEAR DYNAMICAL SYSTEM IDENTIFICATION FROM UNCERTAIN AND INDIRECT MEASUREMENTS

HENNING U. VOSS and JENS TIMMER
Freiburg Center for Data Analysis and Modeling, University of Freiburg, 79104 Freiburg, Germany

JÜRGEN KURTHS
Center for Dynamics of Complex Systems, University of Potsdam, 14469 Potsdam, Germany

Received July 10, 2002; Revised December 1, 2002

We review the problem of estimating parameters and unobserved trajectory components from noisy time series measurements of continuous nonlinear dynamical systems. It is first shown that in parameter estimation techniques that do not take the measurement errors explicitly into account, like regression approaches, noisy measurements can produce inaccurate parameter estimates. Another problem is that for chaotic systems the cost functions that have to be minimized to estimate states and parameters are so complex that common optimization routines may fail. We show that the inclusion of information about the time-continuous nature of the underlying trajectories can improve parameter estimation considerably. Two approaches, which take into account both the errors-in-variables problem and the problem of complex cost functions, are described in detail: shooting approaches and recursive estimation techniques. Both are demonstrated on numerical examples.

Keywords: System identification; multiple shooting algorithm; unscented Kalman filter; maximum likelihood.

1. Introduction

For a quantitative understanding and control of time-varying phenomena from nature and technology, it is necessary to relate the observed dynamical behavior to mathematical models. These models usually depend on a number of parameters whose values are unknown or only known imprecisely. Furthermore, often only a part of the system's dynamics can be measured. For example, in an electrophysiological experiment the voltage across a nerve cell membrane may be measured but not the ionic currents, which are not accessible but would also be needed for modeling the observed dynamics. As part of a network, the neuron may also depend on concealed and thus unobserved inputs which need to be identified from the measured neuron dynamics. To duly apply the powerful principles of the theory of nonlinear dynamics in modeling such systems, a preceding system identification is indispensable. In the nerve cell example, not only unknown parameters have to be identified but also time-varying unobserved states. Similar problems arise in a variety of low-dimensional nonlinear dynamical systems, be they electronic circuits, mechanical devices, physiological rhythm generators, competing biological populations, nonlinear optical devices, chemical reactors, or even macroeconomic systems, among others.

We consider the problem of estimating parameters and unobserved trajectory components from noisy time series measurements of continuous nonlinear dynamical systems. Rather than emphasize mathematical rigor, this tutorial aims at providing a comprehensible introduction to this field for the applied scientist, whether from physics, biology, chemistry, or the engineering or economic sciences.

There is a vast literature on parameter estimation and system identification in general, so we restrict ourselves to a class of problems that are often encountered in applications: The models considered are assumed to be deterministic but in general nonlinear with continuous time dependence, such that their dynamics are described by systems of first-order ordinary differential equations. These models are assumed to be known; they result in general from first-principle equations, but some or all of the parameters involved are assumed to be unknown. The data are assumed to be measurements on experimental setups or from the "real world", whereby it is assumed that the dynamical behavior of the system is correctly described by the model if only the correct parameter values are used. The measurements may be disturbed by a certain amount of measurement noise.

It is not assumed, however, that all system components are indeed measured; if the model consists of, say, a D-dimensional system of first-order ordinary differential equations, then fewer than D trajectory components may be experimentally accessible and measured.

Besides our interest in parameter estimation, we also want to estimate the unobserved trajectories at all time points of interest. To make the problem even more complicated, it is not assumed that the observed components are measured directly; rather, we allow for smooth nonlinear measurement functions. In terms of the previous example from electrophysiology, the measurement could be distorted by a nonlinearity in the signal amplifier.

This paper aims at solving such problems using mainly two different approaches:

We will not only describe the "standard" approach of parameter estimation in ordinary differential equations based on shooting methods, but also a less known approach based on recursive estimation, where parameters are estimated for each time point recursively, proceeding through the time series. These two approaches are somewhat complementary; the former focuses on parameter estimation, whereby also the trajectories can be estimated. The latter focuses on the estimation of trajectories, whereby also parameters can be estimated. We will give a worked-out example to compare these two approaches and will find that in this example the recursive approach can compete well with the standard approach, while being computationally much simpler.

The outline of this paper is as follows: In the next section the problem of parameter and state estimation is made more precise, and problems related to methods that are not based on the state space concept are highlighted. In Sec. 3 the errors-in-variables problem and its implications for parameter estimation are illustrated. The other main problem, complex cost functions, is described in Sec. 4. Then, two different approaches to solve these problems are worked out on examples: the multiple shooting approach (Sec. 5) and recursive estimation (Sec. 6). Further nonlinear filtering approaches are mentioned in Sec. 7. Finally, nonparametric modeling and recursive estimation in spatiotemporal systems are considered in Sec. 8.

Citations of methods are given in the text where they are mentioned first. Although we tried to provide an up-to-date bibliography, it is by no means complete. Nevertheless, we hope that it will provide a useful starting point for further reading.

2. The Problem of Parameter and Trajectory Estimation

Throughout the paper, only autonomous dynamical systems are considered. The parameter estimation methods described below also apply straightforwardly to nonautonomous systems, but to keep the notation simple, we suppress explicit time dependence. Further, if not stated otherwise, we focus on deterministic dynamics, and stochasticity enters only through the measurement errors.

We start with the description of the model system. The model is given by an autonomous system of Dx first-order differential equations with Dλ parameters,

ẋ1(t) = f1(x1(t), x2(t), . . . , xDx(t), λ1, . . . , λDλ)
ẋ2(t) = f2(x1(t), x2(t), . . . , xDx(t), λ1, . . . , λDλ)
⋮
ẋDx(t) = fDx(x1(t), x2(t), . . . , xDx(t), λ1, . . . , λDλ) .   (1)

In vector notation, system (1) can be written as

ẋ(t) = f(x(t), λ) .   (2)


The parameter vector λ contains all parameters of both the system and the measurement function which will be introduced later.

For the rest of this paper the following situation is considered: The states x(t) cannot be measured directly; what is observed is the quantity y(t), which is related to x(t) by some transformation due to a smooth measurement function plus some independent random errors η(t). Furthermore, the state is only sampled at discrete instants of time. For notational simplicity, only uniform sampling is considered, that is, the measurements are performed with a fixed sampling interval ∆t at times t, t + ∆t, . . . , t + (N − 1)∆t.

These assumptions cover a wide range of situations; they can be expressed in a denser form by introducing a measurement equation. It consists of a measurement function G: R^Dx → R^Dy, which maps the state vector x(t) to a quantity of equal or smaller dimension, and the measurement errors ηt, which are assumed to be Gaussian with zero mean. They are independent for different times but need not have a constant covariance matrix over time. The measurement equation then becomes

yt = G(xt,λ) + ηt , (3)

yielding a multivariate time series yti (i = 1, . . . , N). From here on, subscripts denote discrete values, like sampling times.

The problem that will be encountered can be formulated as follows: Given a Dy-dimensional time series yti (i = 1, . . . , N) and the model functions in Eqs. (2) and (3), we have to estimate the states x(t) for any time t of interest and the parameters λ. This problem is referred to as a dynamical system identification problem.

There is one approach to solve this estimation problem which could instantly come to mind; it is based on embedding and least squares: Write system (1) as a scalar differential equation of higher order, if possible, and estimate the derivatives from the data sample. Since this is equivalent to a differential embedding of the time series [Packard et al., 1980; Takens, 1981], it gives access to the unobserved system components, or at least some transformations of them, which can be used in a least squares (LS) fit to estimate the parameters. There are several variants of the LS approach [Crutchfield & McNamara, 1987; Cremers & Hubler, 1987; Breeden & Hubler, 1990; Gouesbet, 1991b; Kadtke et al., 1993; Aguirre & Billings, 1995; Hegger et al., 1998], which have been shown to work for small amounts of noise and if the system of Eqs. (1) can be transformed into a single equation of higher order. Besides the fact that this transformation may be quite involved and not easy to find [Gouesbet, 1991a, 1992], there are two main drawbacks:

(i) Measurement noise prevents the accurate and even appropriate estimation of derivatives, especially of higher order. Filtering is not always a cure, since filtering introduces temporal dependencies into the data and may disturb the data distribution. For example, think of a moving average window, which also involves neighboring data points to smooth the data samples. These dependencies may then suggest dynamics which are not inherent in the data, leading to wrong results for the parameters [Timmer et al., 2000].

(ii) The errors-in-variables problem. This problem stems from regression analysis and occurs if not only the dependent variables have errors but also the independent variables. If not appropriately taken into account, it leads to biased, i.e. wrong, parameter estimates. Biased parameters usually occur in connection with the naive application of a least squares estimation which is not suited to this problem. The errors-in-variables problem is unavoidable for the problem considered here, the estimation of parameters in dynamical systems with measurement noise.

3. State Space Modeling and the Errors-in-Variables Problem

The errors-in-variables problem is explained in more detail in the following, and the failure of the least squares approach is demonstrated. Also, the sometimes encountered cure by means of a total LS approach is discussed.

The errors-in-variables problem has been known in statistics for a long time [Madansky, 1959]. Its significance in the context of nonlinear time series analysis was pointed out by Kostelich [1992]. To explain the problem by means of our system identification problem, we rewrite Eqs. (2) and (3) in discrete time [Gelb, 1974] to get the discrete nonlinear deterministic state space model

xt+∆t = F(xt,λ) , (4)

yt+∆t = G(xt+∆t,λ) + ηt+∆t . (5)


The time step ∆t corresponds to the sampling time, and the function F is obtained from the function f in Eq. (2) via

F(xt, λ) = xt + ∫_t^{t+∆t} f(x(T), λ) dT .   (6)

The states described by Eqs. (4) and (6) obey the Markov condition, i.e. each state follows uniquely from its predecessor.
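In practice the flow map F of Eq. (6) is rarely available in closed form and is obtained by numerically integrating f over one sampling interval. A minimal sketch (the Lorenz vector field, its parameter values, and the fourth-order Runge–Kutta substeps are illustrative choices, not taken from the text):

```python
import numpy as np

def f(x, lam):
    # Illustrative vector field: the Lorenz system, lam = (sigma, rho, beta).
    sigma, rho, beta = lam
    return np.array([sigma * (x[1] - x[0]),
                     x[0] * (rho - x[2]) - x[1],
                     x[0] * x[1] - beta * x[2]])

def F(x_t, lam, dt, n_steps=10):
    """Flow map of Eq. (6): integrate f over one sampling interval dt,
    here with n_steps classical fourth-order Runge-Kutta substeps."""
    h = dt / n_steps
    x = np.asarray(x_t, dtype=float)
    for _ in range(n_steps):
        k1 = f(x, lam)
        k2 = f(x + 0.5 * h * k1, lam)
        k3 = f(x + 0.5 * h * k2, lam)
        k4 = f(x + h * k3, lam)
        x = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

lam = (10.0, 28.0, 8.0 / 3.0)
x_next = F([1.0, 1.0, 1.0], lam, dt=0.01)  # one step of the map (4)
```

Composing two dt-steps of F reproduces one 2∆t-step, which is exactly the Markov property of the discretized system.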

In this section we consider the linear state space model, but the errors-in-variables problem carries over to nonlinear models as well [Carroll et al., 1995; McSharry & Smith, 1999]. A linear state space model has the form

xt+∆t = F (λ)xt + εt , (7)

yt+∆t = G(λ)xt+∆t + ηt+∆t . (8)

The system functions F and G are now described by Dx × Dx and Dy × Dx-dimensional matrices F and G, respectively, multiplied with the states. Linear deterministic models generically have only trivial dynamics asymptotically, i.e. for long times; the state components relax to the origin or diverge to infinity. Therefore, to equip this linear model with nontrivial albeit asymptotically finite dynamics, the process noise εt is added in this example. It is defined as being Gaussian with zero mean and independent over time. Its covariance matrix need not be constant over time. The covariances of the process and measurement noise are denoted by Qt and Rt, respectively, i.e.

Qt := E[εt ε′t]  and  Rt := E[ηt η′t] ,   (9)

where E[·] is the expectation value. Therefore, εt ∼ N(0, Qt) and ηt ∼ N(0, Rt). The expression N(µ, B) denotes a vectorial Gaussian random variable with mean µ and covariance matrix B.

Now a particularly simple case is considered [Timmer, 1998]: the state space model (7), (8) in one dimension with the trivial measurement function Gxt = xt and constant noise variances Q and R:

xt+∆t = Fxt + εt (|F| < 1) (10)

yt+∆t = xt+∆t + ηt+∆t . (11)

First assume that there is no measurement noise. Then the constant F can be estimated by multiplying both sides of Eq. (10) with xt. Averaging yields

F̂ = Σt xt+∆t xt / Σt xt² = cov(xt+∆t, xt) / var(xt) ,   (12)

with "var" and "cov" being the variance and the covariance, respectively.

This is also the least squares estimate of F which would arise from minimizing the cost function χ²(F):

F̂ = arg min_F χ²(F)   (13)
  = arg min_F Σt (xt+∆t − F xt)² / Q .   (14)

It can be shown that the least squares estimator here is a maximum likelihood estimator (ML estimator). Maximum likelihood estimators of parameters are those which maximize the probability of the observed data given the model parameters. They are generally unbiased, i.e. they yield asymptotically the correct values for the parameters.

Now assume that there is a finite measurement noise with constant variance R, but the least squares estimator is used in a naive way. Proceeding as above, this results in

F̂naive = cov(yt+∆t, yt) / var(yt) .   (15)

Since var(yt) = var(xt) + R and cov(yt+∆t, yt) = F var(xt), one gets

F̂naive = F var(xt) / (var(xt) + R) ,   (16)

from which it follows that

|F̂naive| < |F| .   (17)

The larger the measurement variance R, the more the naive estimator is biased. This bias is independent of the size of the data sample and hence cannot be compensated by measuring more data. Independent of the true value of F, for large noise the naive estimate asymptotically vanishes.
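The bias (16), (17) is easy to reproduce in simulation. The following sketch (with illustrative values F = 0.9 and Q = R = 1) compares the naive estimator (15) with the prediction of Eq. (16), and then inverts Eq. (16) to recover F, which is possible here because var(xt) = Q/(1 − F²) is known in this model:

```python
import numpy as np

rng = np.random.default_rng(0)
F, Q, R, N = 0.9, 1.0, 1.0, 100_000  # illustrative values

# Simulate the linear state space model (10), (11).
x = np.zeros(N)
for t in range(N - 1):
    x[t + 1] = F * x[t] + rng.normal(0.0, np.sqrt(Q))
y = x + rng.normal(0.0, np.sqrt(R), size=N)

# Naive least squares estimate (15): biased towards zero.
F_naive = np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)

# Bias predicted by Eq. (16), with var(x) = Q / (1 - F^2).
var_x = Q / (1.0 - F ** 2)
F_predicted = F * var_x / (var_x + R)

# Known-variance correction: invert Eq. (16).
F_corrected = F_naive * (var_x + R) / var_x
```

Because the bias does not shrink with N, the naive estimate stays near the predicted value of Eq. (16) no matter how long the time series is; only the correction recovers F.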

How can one understand this? The parameter F was estimated using a regression approach, which solves a problem of the form: Given a dependent variable which can be written as a linear function of an independent variable plus some noise, estimate the parameters of the linear function. In this setting it is assumed that only the dependent variable is uncertain but not the independent variable. In time series analysis problems, like Eqs. (10) and (11), the data points are usually measurements of both dependent variables, e.g. at time t + ∆t, and independent variables, e.g. at time t.

Fig. 1. The LS approach estimates the regression line by minimizing the mean squared distance between the measurements yt+∆t and yt. The latter are wrongly taken to be given exactly (a). For uncertain yt but certain yt+∆t the LS method works only if the meaning of yt and yt+∆t is exchanged (b), whereas for uncertain yt and yt+∆t the orthogonal distance method treats both variables in the same way (c).

Therefore, in time series analysis the independent variables become uncertain as well, giving rise to the errors-in-variables problem. In other words, the reason for the bias is that the naive estimator is not a ML estimate any more; for a proper ML estimate the probability of the data given the model should also depend on the uncertainty of the independent variables, here xt, rather than only on the uncertainty of the dependent variables, here xt+∆t.

In this particular example, the naive estimator can still be used to correctly yield F in a ML sense; the known bias can be used to correct the wrong estimate by using Eq. (16) and the fact that in this model the variance of xt is known to be Q/(1 − F²). However, it stands to reason that the search for such correction formulas becomes rather cumbersome for more general models [Fuller, 1987; Seber & Wild, 1989; Jaeger & Kantz, 1996], and in our nonlinear model (4), (5) it should hardly be possible in general to estimate the system functions by correcting a least squares fit to the measurements. Note that only the observational noise causes trouble, but not the system noise in Eq. (7). Therefore, the errors-in-variables problem is also significant in the deterministic state space model (4), (5).

To summarize this result, we see that all problems have arisen because the values of xt are not given but are uncertain themselves; a least squares approach does not take the distribution of xt into account but takes the xt to be given exactly. In other words, in the LS approach the regression coefficient F is determined such as to minimize the mean squared distance between the measurements yt+∆t and yt [Fig. 1(a)].

A tutorial on the errors-in-variables problem with a worked-out solution for the Henon map is provided by Jaeger and Kantz [1996]. They also show how this problem affects other kinds of nonlinear estimation problems, like the estimation of invariant quantities of dynamical systems.

A more general approach is the method of total least squares (TLS) [Jeffreys, 1980, 1988; Boggs et al., 1987; van Huffel & Vandewalle, 1991]. This method treats both yt+∆t and yt in the same way by estimating F such as to minimize the orthogonal distances between the regression line and the data [Fig. 1(c)]. This seems to be sensible, since this is the intermediate case between certain yt and uncertain yt+∆t on the one hand and certain yt+∆t and uncertain yt on the other hand [Fig. 1(b)]. One could also say that the LS method estimates the true values of yt+∆t via xt+∆t = Fyt, and the orthogonal distance method estimates the true values of yt+∆t and yt. In this sense the orthogonal distance method copes with the errors-in-variables problem, but it is still not optimal and, therefore, not a ML estimate: In its simplest form, it completely ignores information about the distributions of yt+∆t and yt. Furthermore, the TLS approach leads to significantly larger estimation errors. Consequently, it has been shown that the TLS method is of only rather limited use for nonlinear time-discrete estimation problems [Kostelich, 2001; McSharry & Smith, 1999]. For these reasons we do not adhere to the orthogonal distance method here.

What we should keep in mind from this lesson is the following: The TLS method works by estimating not only the unobserved values of xt+∆t from the measurements, but also the values of xt. Albeit not optimal, this is the right direction to proceed: The errors-in-variables problem can only be solved if the values of the independent variables xt are also estimated from the data, in addition to the values of the dependent variables xt+∆t. This approach will be revived later on, when in addition to the estimation of parameters of a dynamical system the unobserved trajectories will also be estimated (Secs. 5 and 6).

4. The Problem of Complex Cost Functions

The errors-in-variables problem is not the only fact that prevents an easy nonlinear dynamical system identification. As will be shown now, another problem arises, especially for chaotic dynamics: The cost function may become so complex that a numerical minimization by standard optimization methods fails. Whereas the former problem is of a genuinely statistical nature, the latter seems to be more of a technical nature. However, as it will turn out, this is only part of the story, because the cost functions for data from chaotic dynamical systems show anomalous scaling behavior.

Fig. 2. The cost function of the shift map (18) with observational noise for different numbers of measurements N. The true initial value of the sequence is marked by an arrow.

Fig. 3. The same as Fig. 2 but with a different noise realization.

The problem of complex cost functions is demonstrated on a simple example. Consider the nonlinear state space model (4), (5) with the shift map

xt+1 = f(xt) ,  f(xt) = 2xt mod 1 ,   (18)

which maps the unit interval onto itself. The shift map is chaotic and serves as a theoretical model for understanding dynamical aspects of the more general cases of two-dimensional maps and autonomous vector fields of the form of Eq. (2) [Wiggins, 1990].

Fig. 4. Clip of the cost function of the estimate of an initial state vector of the Henon map.

Rather than estimating parameters, to keep the problem simple we treat the initial value x1 as the quantity that should be estimated. The initial value x1 can also be seen as a parameter controlling the deterministic sequence of xt for t > 1. If the states xt, including the initial state, are disturbed by Gaussian noise ηt with constant variance R and N measurements

yt = xt + ηt (19)

are made, the cost function for the estimation of the initial state at time t = 1 is

χ²(x1) = Σ_{t=1}^{N} (yt − xt)² / R .   (20)

Two cost functions for different noise realizations with x1 = 0.501 and R = 0.2 are shown in Figs. 2 and 3. One observes the following:

(i) The cost functions become rather complex rapidly. Optimization methods could fail already for as small a number of measurements as 4, since there are many local minima with a similar value of the cost function. Similar observations have been made by [Miller et al., 1994] for the Lorenz oscillator.

(ii) For the two noise realizations the cost functions are structurally rather different for as few as six data points.

(iii) Still for eight measurements the cost functions point to a wrong estimate in both cases.

(iv) One can guess from the figures that the estimation error does not decrease according to a 1/√N scheme, as it holds for nonlinear regression. The reason is the strong dependency in the data caused by the deterministic origin of the dynamics. Similar anomalous scaling behavior has also been reported for dimension estimates [Theiler, 1990] and for Lyapunov exponent estimates [Theiler & Smith, 1995].
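The complexity of the cost function (20) can be checked directly: simulate the shift map, add noise, and evaluate χ²(x1) on a fine grid. A sketch (the grid resolution and the random seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x1_true, R, N = 0.501, 0.2, 8

# Noisy observations (19) of a shift-map sequence (18).
x = np.empty(N)
x[0] = x1_true
for t in range(N - 1):
    x[t + 1] = (2.0 * x[t]) % 1.0
y = x + rng.normal(0.0, np.sqrt(R), size=N)

def chi2(x1):
    """Cost function (20) as a function of the trial initial value x1."""
    xt, c = x1, 0.0
    for t in range(N):
        c += (y[t] - xt) ** 2 / R
        xt = (2.0 * xt) % 1.0
    return c

grid = np.linspace(0.0, 1.0, 4001, endpoint=False)
cost = np.array([chi2(g) for g in grid])

# Count interior local minima of the sampled cost function.
is_min = (cost[1:-1] < cost[:-2]) & (cost[1:-1] < cost[2:])
n_minima = int(np.sum(is_min))
```

Already for N = 8 the sampled cost function typically shows many local minima, in line with Figs. 2 and 3; the number grows with N because x_t depends on x1 through t − 1 applications of the expanding map.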

Matters become more complicated for a cost function that depends on more than one variable. One could think of the estimation of a parameter in addition to the initial state. Rather than considering this case, the problem should be illustrated by computing the cost function for the initial vector of the two-dimensional Henon map. This map is given by

x1,t+1 = 1 − 1.4 x1,t² + x2,t
x2,t+1 = 0.3 x1,t .   (21)

With the measurements

y1,t = x1,t + ε1,t
y2,t = x2,t + ε2,t ,   (22)

the cost function is defined to be

χ²(x1,1, x2,1) = Σ_{t=1}^{N} [ (y1,t − x1,t)² / R1 + (y2,t − x2,t)² / R2 ] .   (23)

The true initial value is (x1,1, x2,1) = (0.51, 0.18), and N = 20 measurements are taken with a constant noise variance of R1 = R2 = 0.2. From this example another observation can be made (Fig. 4):

(v) There are long separated "valleys" where the cost function locally takes on similar minimal values. This points to the fact that the two estimates are highly dependent on each other. Therefore, it seems that the true trajectory cannot be estimated only by searching for its initial value, but in principle each of the (x1,t, x2,t) (t = 1, . . . , N) should be estimated.
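The valley structure of Fig. 4 can be reproduced by evaluating the cost function (23) on a grid of trial initial vectors. A sketch (the grid ranges and resolution are illustrative; orbits that escape the Henon attractor's basin are clamped so the arithmetic stays finite):

```python
import numpy as np

rng = np.random.default_rng(3)
N, R1, R2 = 20, 0.2, 0.2

def henon_step(x1, x2):
    """One step of the Henon map (21)."""
    return 1.0 - 1.4 * x1 * x1 + x2, 0.3 * x1

# Noisy measurements (22) of the orbit started at the true
# initial vector (0.51, 0.18); R1 = R2, so one noise level suffices.
x1, x2 = 0.51, 0.18
truth = np.empty((N, 2))
for t in range(N):
    truth[t] = (x1, x2)
    x1, x2 = henon_step(x1, x2)
y = truth + rng.normal(0.0, np.sqrt(R1), size=(N, 2))

def chi2(x1, x2):
    """Cost function (23) for a trial initial vector (x1, x2)."""
    c = 0.0
    for t in range(N):
        c += (y[t, 0] - x1) ** 2 / R1 + (y[t, 1] - x2) ** 2 / R2
        x1, x2 = henon_step(x1, x2)
        x1 = min(max(x1, -1e6), 1e6)  # clamp escaped orbits, avoids overflow
    return c

# Cost surface on a grid around the true initial vector, as in Fig. 4.
g1 = np.linspace(0.2, 0.7, 101)
g2 = np.linspace(0.1, 0.3, 101)
surface = np.array([[chi2(a, b) for a in g1] for b in g2])
```

Plotting `surface` (e.g. as a contour plot) exhibits the narrow, elongated valleys described in observation (v).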

5. The Multiple Shooting Approach

The problem of local minima in the cost function will be treated by means of two different approaches: multiple shooting and recursive estimation. Multiple shooting is an ML approach and will, therefore, also solve the errors-in-variables problem. For weakly nonlinear systems, recursive estimation is an ML approach as well.

If a model like Eqs. (2) and (3) is known, however, the data must not be treated as independent over time, and for a ML estimation one should also estimate x(t) in addition to the parameters λ. This will be done below by utilizing the temporal dependencies as induced by the model (2). In other words, the information that the system produces a continuous trajectory is incorporated.

5.1. The initial value approach

The system (2) together with an initial value xt1 and a known parameter vector λ constitutes an initial value problem: Find a trajectory x(t) that for given λ solves this system for all times of interest t. This is the direct problem, in contrast to the inverse problem we are interested in; we try to calculate those trajectories and parameters that are the most likely ones, given measurements at discrete times. To keep things simple, in this chapter we restrict ourselves to scalar measurements, i.e. the measurement function (3) becomes

yt = G(xt, λ) + ηt . (24)

In the initial value approach [Schittkowski, 1994] initial guesses for xt1 and λ are chosen and the dynamical equations are solved numerically. The likelihood is given as the probability distribution ρ of the data given the states and parameters, and the ML estimator becomes

(x̂t1, λ̂) = arg max_{xt1, λ} ρ[{yti} | xt1, λ] .   (25)

It is asymptotically, i.e. for an infinite number of data samples, unbiased. In the case of independent Gaussian measurement noise, maximizing the likelihood amounts to the minimization of the cost function

χ²(xt1, λ) = Σ_{i=1}^{N} (yti − G(xti(xt1, λ), λ))² / Rti .   (26)

It is the sum of squared residuals between the data and the model trajectory, weighted with the inverse variances of the measurement noise. The parameters are identified as those minimizing χ²(xt1, λ).
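The initial value approach can be sketched in a few lines using `scipy`. The one-dimensional logistic system, its parameter values, and the noise level below are illustrative choices, not from the text; the pair (xt1, λ) is fitted jointly by minimizing the weighted residuals of Eq. (26):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

rng = np.random.default_rng(4)

def f(t, x, lam):
    # Illustrative system: logistic growth, dx/dt = lam * x * (1 - x).
    return lam * x * (1.0 - x)

t_obs = np.linspace(0.0, 5.0, 50)
lam_true, x0_true, R = 1.5, 0.1, 0.01

x_true = solve_ivp(f, (0.0, 5.0), [x0_true], args=(lam_true,),
                   t_eval=t_obs, rtol=1e-8).y[0]
y = x_true + rng.normal(0.0, np.sqrt(R), size=t_obs.size)

def residuals(theta):
    """Weighted residuals of cost function (26); theta = (x_{t1}, lambda)."""
    x0, lam = theta
    x = solve_ivp(f, (0.0, 5.0), [x0], args=(lam,),
                  t_eval=t_obs, rtol=1e-8).y[0]
    return (y - x) / np.sqrt(R)

# Initial value approach: iterate from a rough initial guess.
fit = least_squares(residuals, x0=[0.3, 1.0])
x0_hat, lam_hat = fit.x
```

For this well-behaved, nonchaotic system the single-shooting fit converges; the failure modes discussed next appear for longer, chaotic trajectories.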

Often prior knowledge about the parameters can be formulated as equality or inequality constraints. For instance, "rate constants" are usually non-negative. In the context of the multiple shooting approach described later on, the continuity of the true trajectory is ensured by means of equality constraints.

The initial value approach is an iterative procedure: an initial guess is updated again and again until some convergence criterion is met. The update step is usually computed from the gradient or the Hessian, leading to different optimization methods: the steepest descent method [Bryson & Ho, 1969], the conjugate gradient method [Fletcher & Reeves, 1964], the Levenberg–Marquardt algorithm [Stortelder, 1996], and the Gauss–Newton algorithm [Schittkowski, 1995], among others. Overviews of approaches for iterative solutions of nonlinear optimization problems are given in several textbooks [Gill et al., 1981; Stoer & Bulirsch, 1993; Press et al., 1997]. To make

June 23, 2004 17:5 01034

1914 H. U. Voss et al.

the optimization task effective, at least first derivatives of the system equations with respect to the parameters and trajectory initial conditions, the sensitivities, should be provided.

Simulation studies have shown that for many types of dynamics this approach is numerically unstable, yielding a diverging trajectory or stopping in a local rather than the global minimum [Horbelt, 2001]. The reason is that for slightly wrong parameters the estimated trajectory may deviate from the true trajectory after a short time. This is most evident in the case of chaotic dynamics, where due to the sensitivity with respect to initial conditions the estimated trajectory is expected to follow the true trajectory of the system only for a limited amount of time [Lorenz, 1963; Ott et al., 1994; Wiggins, 1990; Nicolis, 1995; Alligood et al., 1997]. This is related to the well-known shadowing problem of chaotic dynamics [Hammel et al., 1988; Nusse & Yorke, 1989; Farmer & Sidorowich, 1990]. The divergence of the numerical and measured trajectories introduces many local minima in the cost function (26), as described above in Sec. 4.

5.2. Multiple shooting

A possible solution to the above-mentioned problems is the multiple shooting algorithm, introduced by van Domselaar and Hemker [1975] in the context of parameter estimation and further developed in [Bock, 1983; Baake et al., 1992]. It has been applied, for example, to plant physiology [Baake & Schloder, 1992], chemical reaction kinetics [Bock, 1981], nonlinear optics [Horbelt et al., 2001], and electronic circuits [Timmer et al., 2000; Horbelt et al., 2002].

The motivation to use multiple shooting rather than the initial-value approach, which is a single-shooting method, is a more efficient minimization of the cost function without getting lost too easily in local minima. In multiple shooting, initial conditions are estimated at several points along the time series, the shooting nodes, and thus the estimated trajectory can be forced to stay closer to the true one for a longer time. Mathematically, the task is considered as a multi-point boundary value problem [Stoer & Bulirsch, 1993]. It is based on a shooting method to solve two-point boundary value problems [Bellman & Kalaba, 1965; Bryson & Ho, 1969]. The fitting interval [t_1, t_N] is partitioned into m subintervals,

t_1 = τ_1 < τ_2 < · · · < τ_{m+1} = t_N . (27)

For each subinterval [τ_j, τ_{j+1}], local initial values x_j = x(τ_j) are introduced as additional parameters. The dynamical equation is integrated piecewise (Fig. 5), and the cost function χ²(x_1, . . . , x_m, λ) is evaluated and minimized as in the initial value approach. Whereas the dynamical parameters λ are constant over the entire interval, the local initial values are optimized separately in each subinterval.

This approach leads to an initially discontinuous trajectory, which is, however, close to the true one. The final trajectory should be continuous, i.e. the computed solution at the end of one subinterval must match the local initial values of the next one. These continuity conditions are linearized in each iteration of the procedure and are then taken into account as equality constraints [Stoer, 1971]. The particular structure of their linearized form permits the continuity constraints and the extra variables x_j (j = 1, . . . , m) to be eliminated easily from the resulting large linear system. In this way the dimension of the system of equations to be solved in each iteration is no larger than with the initial-value approach. This procedure, called condensation [Bock, 1983], can be applied to many kinds of optimization strategies. Since only the linearized continuity constraints are imposed on the update step, the iteration is allowed to proceed to the final continuous solution through "forbidden ground": the iterates will generally be discontinuous trajectories. This freedom allows the method to stay close to the observed data, prevents divergence of the numerical solution, and reduces the problem of local minima.
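The piecewise integration and the continuity conditions can be sketched as follows. This is a simplified sketch: the Euler solver and the toy decay system are placeholder assumptions, and a real implementation would impose the linearized continuity conditions as equality constraints inside a Gauss–Newton update rather than merely evaluating the defects.

```python
import numpy as np

def euler(F, x0, lam, t0, t1, dt=1e-3):
    """Integrate dx/dt = F(x, lam) from t0 to t1 with fixed-step Euler."""
    x, t = np.array(x0, float), t0
    while t < t1 - 1e-12:
        x = x + dt * F(x, lam)
        t += dt
    return x

def continuity_defects(F, node_states, taus, lam):
    """Defect d_j = x(tau_{j+1}; x_j) - x_{j+1} at the interior shooting nodes.

    A continuous solution has all defects equal to zero."""
    return np.array([
        euler(F, node_states[j], lam, taus[j], taus[j + 1]) - node_states[j + 1]
        for j in range(len(node_states) - 1)
    ])

# Toy system dx/dt = -lam*x with node values on the exact solution:
# the defects vanish up to the discretization error of the solver.
F = lambda x, lam: -lam * x
taus = np.array([0.0, 0.5, 1.0, 1.5])
nodes = [np.array([np.exp(-t)]) for t in taus[:-1]]
d = continuity_defects(F, nodes, taus, 1.0)
print(np.abs(d).max())  # small: Euler discretization error only
```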

Here a generalized Gauss–Newton algorithm is used [Hanson & Haskell, 1982], and both the model nonlinearities and the continuity constraints are linearized in each iteration. Their first-order Taylor expansions make up a large linear optimization problem with linear equality constraints. For the following reasons, in some cases it may become necessary to omit the continuity constraints at some shooting nodes: for time series of chaotic systems it will in general not be possible to find a trajectory that follows the true trajectory for arbitrarily long times. Furthermore, if the model is not correct, the Gauss–Newton method may not converge if the whole time interval [t_1, t_N] of measurement is taken as input data [Horbelt, 2001].

After convergence, confidence intervals for the parameters can be calculated from the curvature of the logarithm of the likelihood [Shumway & Stoffer, 2000]: If the model has been specified correctly,



Fig. 5. Schematic illustration of the multiple shooting approach. Black: true trajectory; blue dots: measurements; red: multiple shooting trajectories.

the global minimum of χ²(λ) is found, and its quadratic approximation holds in a large enough region around the solution point, then the covariance matrix of the parameter estimate is given by the inverse Hessian matrix of the objective function:

(Σ)_{ij} = [ (1/2) ∂²χ²(λ) / (∂λ_i ∂λ_j) ]^{−1} . (28)

An eigenvalue analysis of Σ reveals parameters or linear combinations of parameters that are not identifiable from the given data. This happens, for instance, when parameters or states cannot be estimated unambiguously, for example if the model is invariant under a transformation that alters the state vector and the parameters, but not the measurements.
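Equation (28) can be checked numerically with a finite-difference Hessian; the quadratic toy cost below is a made-up example for which Σ = A^{−1} is known in closed form.

```python
import numpy as np

def hessian_fd(f, lam, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at lam."""
    n = len(lam)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.eye(n)[i] * h, np.eye(n)[j] * h
            H[i, j] = (f(lam + e_i + e_j) - f(lam + e_i - e_j)
                       - f(lam - e_i + e_j) + f(lam - e_i - e_j)) / (4 * h**2)
    return H

# Toy quadratic cost chi2(lam) = lam' A lam, so (28) gives Sigma = inv(A).
A = np.array([[2.0, 0.3], [0.3, 1.0]])
chi2 = lambda lam: lam @ A @ lam
Sigma = np.linalg.inv(0.5 * hessian_fd(chi2, np.zeros(2)))
print(np.round(Sigma, 3))  # close to inv(A)
# An eigenvalue analysis of Sigma would flag near-unidentifiable directions.
print(np.linalg.eigvalsh(Sigma))
```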

A good correspondence between the estimated trajectory and the true trajectory has little significance if it has been achieved by adjusting a large number of free parameters. Overfitting, i.e. when the number of degrees of freedom is not in a reasonable relation to the amount of information


Fig. 6. Identification of the Lorenz model (29) with the multiple shooting approach. The states and parameters are estimated by observing noisy data from the first component of x, x1. Measured data set (blue dots), true trajectory (black), and multiple shooting trajectories (red). For better visibility, only the first 200 data points of the 400-data-point time series are shown. (a) After three iterations, (b) after ten iterations, (c) after 20 iterations; this is the final trajectory estimate.


contained in the data, can be detected by disproportionately large confidence intervals.

The problem of finding useful initial guesses for the parameters is not fully solved yet. Recently, a hybrid approach between parametric and nonparametric estimation has been proposed to solve this task for a rather large class of continuous-time models [Timmer et al., 2000]. See also Sec. 8.

5.3. An example

The multiple shooting method is applied to a noisy time series of the chaotic Lorenz model [Lorenz, 1963]

ẋ1 = −λ1 x1 + λ1 x2 ,

ẋ2 = λ2 x1 − x2 − x1 x3 ,

ẋ3 = −λ3 x3 + x1 x2 , (29)

with the parameter vector λ = (10, 46, 2.67). The measurements are constructed by corrupting the component x1(t) with Gaussian measurement noise η_t ∼ N(0, R) at 400 equidistant (∆t = 0.02) sample times. The standard deviation of η, √R, is 25% of the standard deviation of the x1-component of the system. We choose the initial parameter vector λ/2 and the initial values of the 200 multiple shooting intervals equal to 0.9 times the true values.
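The simulated measurement setup of this example can be sketched as follows. This is a sketch only: the RK4 integrator, the initial condition, and the random seed are arbitrary choices, not taken from the original experiment.

```python
import numpy as np

def lorenz(x, lam):
    """Right-hand side of the Lorenz system (29)."""
    return np.array([
        -lam[0] * x[0] + lam[0] * x[1],
        lam[1] * x[0] - x[1] - x[0] * x[2],
        -lam[2] * x[2] + x[0] * x[1],
    ])

def rk4_step(f, x, lam, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(x, lam)
    k2 = f(x + 0.5 * dt * k1, lam)
    k3 = f(x + 0.5 * dt * k2, lam)
    k4 = f(x + dt * k3, lam)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

lam_true = np.array([10.0, 46.0, 2.67])
dt, n = 0.02, 400
x = np.array([1.0, 1.0, 1.0])
traj = []
for _ in range(n):
    for _ in range(10):  # several RK4 substeps per sampling interval
        x = rk4_step(lorenz, x, lam_true, dt / 10)
    traj.append(x.copy())
traj = np.array(traj)

# Noisy scalar measurements of x1 with noise std = 25% of the x1 std.
rng = np.random.default_rng(1)
sigma = 0.25 * traj[:, 0].std()
y = traj[:, 0] + sigma * rng.standard_normal(n)
print(traj.shape, y.shape)
```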

The convergence of the algorithm is shown in Figs. 6(a)–6(c). It results in χ² = 353 and a parameter vector estimate of λ = (9.75 ± 0.26, 47.2 ± 1.0, 2.58 ± 0.07) (1σ confidence intervals).

A recent overview of other related nonrecursive system identification approaches has been given in [Edsberg & Wedin, 1995], citing most of the relevant literature and providing a Matlab toolbox.

6. Recursive Parameter Estimation

In the previous chapter the information contained in the temporal order of the data samples was exploited by putting a global constraint onto the estimation procedure for the trajectories. The reason


was to force the estimated trajectories to be continuous in time, since only then are they in compliance with the model. An alternative method for estimating underlying trajectories is to proceed recursively through the data sample and to obey the continuity constraint at each time step. The system trajectory is then estimated recursively by a prediction-correction method: predict the value of the trajectory at time t + ∆t from its value at time t, and correct this estimate taking the new measurement at time t + ∆t into account. This way, one can proceed through the data sample until the last data point is reached. There are a number of recursive estimation methods, which are, however, all based on this prediction-correction scheme; in the case of Gaussian system and measurement noise they all amount to a recursive least squares algorithm [Ljung & Soderstrom, 1983]. In this setup, parameters can be estimated simultaneously by augmenting the system's state by the parameter vector.

We first describe the well-known Kalman and extended Kalman filters for stochastic systems and will see that the covariances of the state components are essential for the calculation of the prediction-correction scheme. Therefore, after deriving the Kalman filter in its familiar form for linear stochastic systems, we will also write it using fully nonlinear and deterministic system functions. In applications to nonlinear deterministic systems, the covariance matrices have to be evaluated numerically in general, as opposed to the linear case, where they can be explicitly computed from the matrices in the state space model.

The application of Kalman filtering to nonlinear deterministic systems will be worked out for the same system as in the previous chapter. We will concentrate on the particular approach of "unscented Kalman filtering", which is nowadays often preferred due to its superiority over the conventional approach of extended Kalman filtering. See also [Sitz et al., 2002].

A Matlab routine for parameter estimation with the unscented Kalman filter is available from the authors or via www.fdm.uni-freiburg.de/∼hv.

6.1. The Kalman filter

The Kalman filter (KF) [Kalman, 1960; Kalman & Bucy, 1960] is one of the most powerful signal analysis tools. The reason is that for linear systems given in a state space formulation the KF is an ML estimator and thus an optimal estimator for unobserved system trajectories. To be more specific, "optimal" means that the KF yields the most probable, but not necessarily a typical, trajectory, since it has some low-pass properties. For weakly nonlinear systems the extended Kalman filter (EKF) [Anderson & Moore, 1979] can be used. It is based on a Taylor-series expansion of the system nonlinearities. For higher-order accuracy it may consist of rather complicated expressions. Therefore, in most cases the EKF of first order is chosen, which is based on a local linearization of the system functions. In this case the Gaussianity of the state variables is maintained, which is a prerequisite for the optimal functioning of the KF. Tutorials and textbooks about Kalman filtering, including applications, are [Jazwinski, 1970; Gelb, 1974; Bar-Shalom & Fortmann, 1988; Aoki, 1990; Hamilton, 1994; Mendel, 1995; Gershenfeld, 1999; Grewal & Andrews, 2001; Honerkamp, 2002; Walker, 2002].

Some novel approaches of nonlinear Kalman filtering are not based on a Taylor expansion of the system functions any more but approximate the density of the state by Gaussians, thereby retaining the whole system nonlinearities. Since these novel approaches are easier to implement and more accurate than the EKF of first order, we will use them in the following. However, to better understand how nonlinear Kalman filtering works, it is necessary to focus on the linear case first. Already for the linear case, the KF equations will be written in a more flexible way than usual, using nonapproximated system functions. This is the basis for the straightforward application of the novel methods to our particular system identification problem in the subsequent section.

The KF is a recursive estimator with a prediction-correction structure: We start with the nonlinear state space model (4), (5) and focus our attention on the time point t + ∆t, the reference time. The state estimate at time t + ∆t, given all state estimates . . . , x̂_{t−∆t}, x̂_t and all measurements . . . , y_t, y_{t+∆t} until then, is denoted by x̂_{t+∆t|t+∆t}. This is the a posteriori, or complete, estimate, which utilizes all information, taking the new measurement into account. To save indices, the a posteriori estimate at the reference time point t + ∆t will in the following be written as

x̂ := x̂_{t+∆t|t+∆t} . (30)



Fig. 7. Timing diagram of the Kalman filter. The indices at the reference time t + ∆t are suppressed, as in the text.

Equivalently, the a priori, or preliminary, state estimate at time t + ∆t again utilizes all previous state estimates but only measurements . . . , y_{t−∆t}, y_t up to one time step before the reference time t + ∆t. It will be denoted by x̂_{t+∆t|t}, or simply

x̄ := x̂_{t+∆t|t} . (31)

These quantities are illustrated in Fig. 7. The KF estimation procedure of the state at time t + ∆t consists of two parts:

(i) an a priori estimate x̄ of the state prior to observation of the new measurement y_{t+∆t}, and

(ii) a correction after observation of the new measurement y_{t+∆t}. It is proportional to the difference between the new measurement y_{t+∆t} and the predicted value of the new measurement, ŷ_{t+∆t|t}:

x̂_{t+∆t|t+∆t} = x̂_{t+∆t|t} + K_{t+∆t} (y_{t+∆t} − ŷ_{t+∆t|t}) . (32)

The Dx × Dy-dimensional Kalman gain matrix K is defined below. Dropping the time index of the reference time t + ∆t and using the definitions above yields simply

x̂ = x̄ + K(y − ȳ) . (33)

Therefore, the Kalman estimate of the state is recursively given by a prediction-correction structure. The prediction uncertainty is given by the covariance matrix of the state estimate,

P̂xx = P̄xx − K P̄′xy . (34)

The apostrophe denotes the transpose of a matrix. The Kalman gain matrix is given by

K = P̄xy P̄yy^{−1} . (35)

The covariance matrices are defined as the expectations of the outer (dyadic) products of the deviations of the states from the a priori estimates:

P̄xx = E[(x − x̄)(x − x̄)′] , (36)

P̄xy = E[(x − x̄)(y − ȳ)′] , (37)

and

P̄yy = E[(y − ȳ)(y − ȳ)′] . (38)

The vectors are understood to be column vectors, and the apostrophe denotes their transpose, i.e. it makes them row vectors.

In these equations it is assumed implicitly, without indexing, that the expectation value E[·] is conditioned on all measurements made up to the time indicated by the covariance on the l.h.s. of the equation. For example, Eq. (36) exactly means P̄xx,t+∆t|t = E[(x_{t+∆t} − x̂_{t+∆t|t})(x_{t+∆t} − x̂_{t+∆t|t})′ | . . . , x̂_{t−∆t}, x̂_t]. The a priori state estimate x̄ is

x̄ = E[F(x̂_{t|t})] . (39)

Similarly, the a priori measurement estimate is

ȳ = G(x̄) . (40)

Equations (33) to (40) will be derived in Appendix A. They yield the optimal filter for the case of normally distributed states in a linear state space model and constitute approximations for the nonlinear case.


If a state space model is explicitly given in the form of Eqs. (7) and (8), the KF can also be expressed using the system matrices F and G:

K = P̄xx G′ (G P̄xx G′ + R)^{−1} , (41)

P̂xx = (1 − KG) P̄xx , (42)

P̄xx = F P̂xx,t|t F′ + Q_t , (43)

x̄ = F x̂_{t|t} , (44)

ȳ = G x̄ . (45)

These equations will also be derived in Appendix A. The first three equations can be computed offline.

Note that the process noise ε may vanish without causing problems in the expressions for the KF [Gelb, 1974], but not the measurement noise η. Therefore, the matrix Q may be omitted for deterministic dynamics, but the matrix R should always be retained; otherwise the KF does not make much sense.
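One recursion of the linear KF, Eqs. (41) to (45) combined with the update (33) and (34), can be sketched as follows; the scalar random-walk example is a made-up illustration, not taken from the text.

```python
import numpy as np

def kf_step(x_post, P_post, y_new, F, G, Q, R):
    """One Kalman filter recursion using Eqs. (41)-(45) and (33)-(34)."""
    # Prediction (a priori quantities).
    x_prior = F @ x_post                      # (44)
    P_prior = F @ P_post @ F.T + Q            # (43)
    y_prior = G @ x_prior                     # (45)
    # Correction with the new measurement.
    S = G @ P_prior @ G.T + R                 # innovation covariance
    K = P_prior @ G.T @ np.linalg.inv(S)      # (41)
    x_post_new = x_prior + K @ (y_new - y_prior)          # (33)
    P_post_new = (np.eye(len(x_post)) - K @ G) @ P_prior  # (42)
    return x_post_new, P_post_new

# Toy scalar random walk observed with noise.
F = np.array([[1.0]]); G = np.array([[1.0]])
Q = np.array([[0.01]]); R = np.array([[0.25]])
x, P = np.zeros(1), np.eye(1)
rng = np.random.default_rng(2)
truth = np.cumsum(0.1 * rng.standard_normal(100))
for y in truth + 0.5 * rng.standard_normal(100):
    x, P = kf_step(x, P, np.array([y]), F, G, Q, R)
print(float(P[0, 0]))  # the covariance settles to a small steady-state value
```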

In the limit of infinitesimally small sampling time steps, the time discretization cannot be performed as straightforwardly as done here; rather, one should start with the Riccati equations, which constitute the KF equations for the time-continuous case [Sinha & Rao, 1991]. Since we are dealing with data with a finite sampling time step, using the model (4), (5), we adhere to this formulation for the rest of the paper.

6.2. Nonlinear Kalman filtering

The KF (41) to (45) yields the optimal recursive ML estimates for the states in the linear state space model (7), (8). The reason is that Gaussian random variables remain Gaussian for all times, and due to the Markov property the ML estimator amounts to a recursive least squares estimator. To return to our main problem, the estimation of states and parameters in the nonlinear system (4), (5), we need to relax the assumption of model linearity. We discuss two different approaches to achieve an approximation for weakly nonlinear functions: the method of extended Kalman filtering approximates the system function by locally linear functions, and the method of unscented Kalman filtering approximates the density of the states by a Gaussian.

In both cases, the problem of recursive least squares estimation boils down to the following: given a nonlinear relationship

xt+∆t = F(xt) (46)

and

x̄_t := E[x_t] , P_{xx,t} = E[(x_t − x̄_t)(x_t − x̄_t)′] , (47)

how can the mean and covariance be best approximated in the next time step? That is, find x̄_{t+∆t} and P_{xx,t+∆t}.

(i) The extended Kalman filter: In the EKF, the nonlinearity in Eq. (46) is Taylor expanded up to a given order, and then expectations are taken to compute the updated mean and covariance; for the EKF of first order, the mean and covariance so obtained are accurate up to first and second order, respectively:

x̄_{t+∆t} ≈ F(x̄_t) , (48)

P_{xx,t+∆t} ≈ ∇F(x̄_t) P_{xx,t} (∇F(x̄_t))′ . (49)

This approximation is equivalent to a state space model with a time-varying system matrix F_t; this is allowed because the derivation of the Kalman equations (41) to (45) would go through unchanged with a time-dependent matrix F [Honerkamp, 2002].

(ii) The unscented Kalman filter: There are alternative approaches to the EKF, of which some have significant advantages with respect to implementation or accuracy. The unscented Kalman filter (UKF) retains the exact nonlinearity F but approximates the a posteriori probability density of the state x_{t+∆t} by a Gaussian. It is motivated by the fact that it is easier to approximate a distribution by a Gaussian than to approximate an arbitrary nonlinear function by linearization [Julier et al., 1995; Julier et al., 2000]. The approximation of the a posteriori density is accomplished by using a set of characteristic points, the so-called sigma points. They will be used to adjust the probability density of the state after transformation by the system function F to a Gaussian again:

A normally distributed random variable x of dimension Dx is completely described by its mean and covariance matrix. This information can be stored, with some redundancy, in 2Dx sigma points X_i of dimension Dx each:

X_i := x̄ + (√(Dx P))_i (i = 1, . . . , Dx) , (50)

X_{i+Dx} := x̄ − (√(Dx P))_i (i = 1, . . . , Dx) . (51)


The square root is any matrix square root of choice [Strang, 1988], and (√·)_i denotes its ith row or column. Considered as a data sample, this set of sigma points has the same mean and covariance as x.

A simple example may shed some light on this: if the random variable x has zero mean and its covariance matrix is

P_{xx} = diag(R1, R2) , (52)

then the set of sigma points is

√2 { (√R1, 0)′ , −(√R1, 0)′ , (0, √R2)′ , −(0, √R2)′ } , (53)

and it follows that

P_{XX} = diag( (2R1 + 2R1)/4 , (2R2 + 2R2)/4 ) = P_{xx} . (54)
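The sigma-point construction (50), (51) and the reproduction of the covariance, as in (54), can be verified numerically; using a Cholesky factor as the matrix square root is one admissible choice.

```python
import numpy as np

def sigma_points(mean, P):
    """2*Dx sigma points, Eqs. (50)-(51), via a Cholesky square root of Dx*P."""
    Dx = len(mean)
    S = np.linalg.cholesky(Dx * P)   # columns of S play the role of (sqrt(Dx*P))_i
    return np.array([mean + S[:, i] for i in range(Dx)]
                    + [mean - S[:, i] for i in range(Dx)])

mean = np.zeros(2)
P = np.diag([0.3, 1.7])              # R1, R2 on the diagonal, as in (52)
X = sigma_points(mean, P)
# The sample mean and covariance of the sigma points reproduce mean and P.
print(X.mean(axis=0))
emp_cov = (X - mean).T @ (X - mean) / len(X)
print(np.round(emp_cov, 6))
```

Any factorization S with S S′ = Dx P works here, since the empirical covariance of the points is exactly S S′ / Dx regardless of the factor chosen.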

It can be shown that the transformation of sigma points,

X_{i,t+∆t} = F(X_{i,t}) , (55)

x̄_{t+∆t} = (1/(2Dx)) Σ_{i=1}^{2Dx} X_{i,t+∆t} , (56)

P_{xx,t+∆t} = (1/(2Dx)) Σ_{i=1}^{2Dx} (X_{i,t+∆t} − x̄_{t+∆t})(X_{i,t+∆t} − x̄_{t+∆t})′ , (57)

preserves statistics up to second order in a Taylor series expansion [Julier & Uhlmann, 1996]:

x̄_{t+∆t} ≈ F(x̄_t) + (∇′ P_{xx,t} ∇ / 2) F(x̄_t) , (58)

P_{xx,t+∆t} ≈ ∇F(x̄_t) P_{xx,t} (∇F(x̄_t))′ . (59)

Compared with the EKF of first order, Eqs. (48) and (49), it is more accurate for nonlinear systems and much simpler to implement computationally than the extended KF. In the case of linear state space models, both methods yield exactly the same estimates.

The application of the unscented transformation (55) to (59) to Kalman filtering in the nonlinear deterministic state space model (4), (5) is straightforward: use the KF, Eqs. (33) to (40), with the following mean values and covariances:

x̄ = (1/(2Dx)) Σ_{i=1}^{2Dx} X_i (X_i = F(X_{i,t})) , (60)

ȳ = (1/(2Dx)) Σ_{i=1}^{2Dx} Y_i (Y_i = G(X_i)) , (61)

P̄xx = (1/(2Dx)) Σ_{i=1}^{2Dx} (X_i − x̄)(X_i − x̄)′ , (62)

P̄xy = (1/(2Dx)) Σ_{i=1}^{2Dx} (X_i − x̄)(Y_i − ȳ)′ , (63)

P̄yy = (1/(2Dx)) Σ_{i=1}^{2Dx} (Y_i − ȳ)(Y_i − ȳ)′ + R_{t+∆t} . (64)

Equations (33) to (35), together with (60) to (64), provide the complete UKF. Its computational implementation is rather easy, since the system function F is used as it is; for example, if the numerical solution of the system requires a particular integration scheme, this routine can be used unmodified for estimation also. The only difference is that rather than computing a single trajectory, now 2Dx trajectories have to be integrated, one for each sigma point.
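A complete UKF recursion, Eqs. (60) to (64) together with the update (33) to (35), can be sketched as follows for a deterministic model (no process noise term Q). The mildly nonlinear toy map, the noise level, and the Cholesky square root are placeholder assumptions, not the authors' implementation.

```python
import numpy as np

def sigma_points(mean, P):
    """Sigma points of Eqs. (50)-(51) with a Cholesky matrix square root."""
    Dx = len(mean)
    S = np.linalg.cholesky(Dx * P)
    return np.array([mean + S[:, i] for i in range(Dx)]
                    + [mean - S[:, i] for i in range(Dx)])

def ukf_step(x_post, P_post, y_new, F, G, R):
    """One UKF recursion: Eqs. (60)-(64) plus the update (33)-(35)."""
    X = np.array([F(s) for s in sigma_points(x_post, P_post)])  # propagate, (60)
    Y = np.array([G(s) for s in X])                             # (61)
    x_prior, y_prior = X.mean(axis=0), Y.mean(axis=0)
    dX, dY = X - x_prior, Y - y_prior
    Pxx = dX.T @ dX / len(X)                                    # (62)
    Pxy = dX.T @ dY / len(X)                                    # (63)
    Pyy = dY.T @ dY / len(X) + R                                # (64)
    K = Pxy @ np.linalg.inv(Pyy)                                # (35)
    x_new = x_prior + K @ (y_new - y_prior)                     # (33)
    P_new = Pxx - K @ Pxy.T                                     # (34)
    return x_new, P_new

# Toy nonlinear map with a scalar measurement of the first component.
F = lambda x: np.array([0.9 * x[0] + 0.1 * np.sin(x[1]), 0.95 * x[1]])
G = lambda x: x[:1]
R = np.array([[0.04]])
x, P = np.ones(2), np.eye(2)
rng = np.random.default_rng(3)
xt = np.ones(2)
for _ in range(50):
    xt = F(xt)
    y = G(xt) + 0.2 * rng.standard_normal(1)
    x, P = ukf_step(x, P, y, F, G, R)
print(np.abs(x - xt))  # the estimate tracks the (decaying) true state
```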

The approximations imposed by the UKF are to neglect cumulants of order higher than two in the densities of the states. If higher-order cumulants become significant, e.g. for heavy-tailed or multimodal distributions, the estimates can become biased. However, it often turns out that the UKF gives excellent results even for higher-order nonlinearities. The reason is that the nonlinearities that arise from the temporal discretization of the dynamics of a time-continuous system are only weakly nonlinear, unless the time steps are too large: each numerical integration scheme is based on a decomposition of the dynamics into an identity mapping to which a comparably small correction is added.

One of the main advantages of this approach is that there is no need for a computation of derivatives with respect to the state. This allows the straightforward use of state space models that contain nondifferentiable terms, or models where the Jacobian cannot be computed easily. This is often the case for high-dimensional systems occurring with partial differential equations.

If it is not necessary to work recursively or in real time, the estimates can be improved by using


the Kalman smoother rather than the Kalman filter. In the Kalman smoother, after a forward sweep over the time series, a backward sweep is performed, such that the estimate at time t depends not only on past but also on future times. The combination of forward and backward filters yields, at least in the linear case, the optimal smoother [Fraser & Potter, 1969]. Equivalent concepts for the UKF are, to our knowledge, not known so far.

6.3. Parameter estimation with the nonlinear Kalman filter

There are different possibilities to estimate parameters using the nonlinear Kalman filter based on Eqs. (33) to (35) and (60) to (64). We will concentrate on the so-called augmented state vector approach, since it can be implemented with hardly any modification of the filtering equations as they stand now. Other approaches are mentioned below.

In the augmented state vector approach, the dynamical system (4), (5) is rewritten in the following way:

λt+∆t = λt ,

xt+∆t = F(xt,λ) ,

yt+∆t = G(xt+∆t,λ) + ηt+∆t ,

(65)

i.e. the state vector is augmented by the constant parameter vector. Even without a stochastic term in the parameter equation, the estimate of the parameter can still vary in time due to the Kalman update (33), which also influences the estimate of λ_{t+∆t}. But if the dynamics is stationary, the covariance of λ_t should decrease steadily while recursively proceeding through the time series, and the estimate should approach the true value. This behavior is illustrated in the following example.
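In code, the augmentation in (65) simply concatenates λ to the state and maps the parameter block to itself; the Euler step below is only a placeholder for the integration of the Lorenz right-hand side of (29).

```python
import numpy as np

def lorenz_rhs(x, lam):
    """Right-hand side of the Lorenz system (29)."""
    return np.array([
        -lam[0] * x[0] + lam[0] * x[1],
        lam[1] * x[0] - x[1] - x[0] * x[2],
        -lam[2] * x[2] + x[0] * x[1],
    ])

def F_augmented(z, dt=0.002):
    """Map the augmented state z = (lam, x): lam stays constant, x advances."""
    lam, x = z[:3], z[3:]
    return np.concatenate([lam, x + dt * lorenz_rhs(x, lam)])

z = np.array([10.0, 46.0, 2.67, 1.0, 1.0, 1.0])
z_next = F_augmented(z)
print(z_next[:3])  # the parameter block is unchanged: [10. 46. 2.67]
```

Feeding F_augmented into a nonlinear Kalman filter then estimates states and parameters jointly.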

For a further discussion of this way of parameter estimation, including the problem of estimation biases due to the approximate nature of recursive estimation and possible solutions, see e.g. [Bar-Shalom et al., 2001].

6.4. An example

We again consider the Lorenz system (29), where only disturbed values of the x1 component are measured and all three parameters are unknown. The data used are the same as in Sec. 5.3.

For parameter estimation, the augmented state vector approach is used, i.e. the three parameters of the Lorenz system are appended to the state of the system, yielding Dx = 6 dimensions. The complete nonlinear state space model is

λ_{i,t+∆t} = λ_{i,t} (i = 1, 2, 3) ,

x_{i,t+∆t} = x_{i,t} + I_i (i = 1, 2, 3) ,

y_t = x_{1,t} + η_t (66)

with

(I1, I2, I3)′ = ∫_t^{t+∆t} ( −λ_{1,T} x_{1,T} + λ_{1,T} x_{2,T} , λ_{2,T} x_{1,T} − x_{2,T} − x_{1,T} x_{3,T} , −λ_{3,T} x_{3,T} + x_{1,T} x_{2,T} )′ dT . (67)

The application of the UKF is shown in Figs. 8(a) to 8(c). The matrix square root used for calculating the sigma points is taken as one factor of the Cholesky decomposition of the covariance matrix [Press et al., 1997]. The whole procedure is used in an iterative way, i.e. in addition to the recursive structure of the algorithm, it is applied iteratively to obtain several "sweeps" over the data set. This method can significantly reduce the part of the estimation error which is caused by the nonlinearities of the system [Gelb, 1974; Ljung & Soderstrom, 1983]. In the first sweep over the data [Fig. 8(a)] the parameter estimates start with rather large uncertainties, which were specified initially to be large. In this case the trajectory adjusts to the true trajectory quickly and thus has low bias, but at the price of a large variance. The initial guesses for the second sweep are set in the following way: the initial unobserved state components are retained from the first sweep, and the initial parameter estimates are set to their final values at t = 400 from the first sweep. The covariance matrix Pxx is taken as the final value of the first sweep, but with entries related to cross-covariances between states and parameters removed.

This procedure results in χ² = 321 and a parameter vector estimate of λ = (9.97 ± 0.11, 46.47 ± 0.31, 2.68 ± 0.02) (1σ confidence intervals). The parameter vector estimate is obtained by averaging the estimates over all times; its error is obtained by taking the standard deviation of these parameter estimates.

The comparison of the recursive estimation result [Fig. 8(c)] with the multiple shooting result [Fig. 6(c)] reveals the following:

(i) The recursively estimated trajectory is not as smooth as the one estimated with multiple shooting. On the other hand, its χ² value is


smaller, namely 321 versus 353, respectively. The reason may be that there are no continuity constraints in recursive estimation, and the trajectory can follow the noisy measurements more closely due to the prediction-correction structure of the estimator. Both χ² values are statistically compatible, and one cannot say that one method outperforms the other.

(ii) In this example, the parameter estimates are less biased and more accurate for recursive estimation. Although we believe that this result should not be generalized to other cases prematurely, the reason may be that the recursive parameter estimate is taken as the mean value over all estimates over time. The price one has to pay for the better parameter estimate is the diminished accuracy of the state estimate.

Which of these two approaches should be preferred in applications depends on many factors and cannot be decided across the board. Up to now there are no studies comparing the performance of recursive and shooting approaches for deterministic nonlinear systems.

Some other recent numerical examples and applications of these approaches are given in e.g. [Sitz et al., 2002; Sitz et al., 2003; Horbelt et al., 2000; Horbelt et al., 2001; Timmer et al., 2000; Horbelt et al., 2002; Muller & Timmer, 2002].

6.5. Parameter tracking with the nonlinear Kalman filter

If the parameter vector is not constant but slowly varying in time, instead of a parameter estimation problem one has a parameter tracking problem. The difference from the state estimation problem is that no model is needed to describe the variation of the parameter over time; rather, as long as the parameter varies slowly compared to the state dynamics, it can behave in an arbitrary way. This problem can be tackled using Kalman filters in the following way:

In the augmented state vector approach, the dynamical system (4), (5) is rewritten as

λt+∆t = λt + εt ,

xt+∆t = F(xt, λ) , (68)

yt+∆t = G(xt+∆t, λ) + ηt+∆t ,

i.e. the state vector is augmented by a stochastically varying parameter vector. There are now different ways to incorporate the information about this variability, given by the covariance matrix Qt of εt, into the estimation procedure [Gelb, 1974; Bar-Shalom et al., 2001]. Note that the estimation procedure still yields a unique result; the term εt is just an auxiliary term, and only Qt, not εt itself, enters the estimation procedure. This will become clear in the following example.
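A minimal sketch of the augmentation in (68): the parameter is appended to the state vector and propagated by the identity map. The oscillator dynamics below is a stand-in for F of our own choosing, not the systems treated in the text:

```python
import numpy as np

def augmented_step(z, dt=0.01):
    """One deterministic Euler step of an augmented system in the sense of (68).

    z = (x1, x2, lam): two state components stacked with one parameter.
    The parameter block is propagated by the identity map; its random-walk
    term eps_t enters the filter only through its covariance Q_t.
    """
    x, lam = z[:2], z[2]
    # stand-in dynamics: a damped linear oscillator with stiffness lam
    dx = np.array([x[1], -lam * x[0] - 0.1 * x[1]])
    return np.concatenate([x + dt * dx, [lam]])

z = np.array([1.0, 0.0, 2.0])
z_next = augmented_step(z)  # the parameter component is left unchanged
```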

6.6. An example

As an example, we consider the FitzHugh–Nagumo system [FitzHugh, 1961], which is in widespread use as a simplified model of the Hodgkin–Huxley system [Hodgkin & Huxley, 1952] for the excitable dynamics in certain nerve cells. It is given by the two-dimensional system

ẋ1 = c (x2 + x1 − x1³/3 + z(t)) ,

ẋ2 = −(x1 − a + b x2)/c .    (69)

The parameters are a = 0.7, b = 0.8, and c = 3. The variable x1 is a membrane voltage which can be measured, and x2 is a usually unobserved lumped variable that describes the combined effect of different kinds of ionic currents. The external voltage z(t) influences the dynamics. Here we assume that z(t) is varying slowly in comparison with the other two variables; it is treated as the parameter λt to be tracked. Similar setups have been used to describe bursting in this system [Honerkamp et al., 1985].
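A simple Euler integration of (69) with a slowly varying input can be sketched as follows. The parameter values a, b, c are those given above, while the amplitude and offset of z(t) are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

def fhn_rhs(x, z, a=0.7, b=0.8, c=3.0):
    """Right-hand side of the FitzHugh-Nagumo system (69)."""
    x1, x2 = x
    dx1 = c * (x2 + x1 - x1**3 / 3.0 + z)
    dx2 = -(x1 - a + b * x2) / c
    return np.array([dx1, dx2])

# Euler integration with a slowly varying external input z(t);
# the shape of z is a hypothetical slow drive for illustration only.
dt, n = 0.02, 8000
x = np.array([0.0, 0.0])
traj = np.empty((n, 2))
for k in range(n):
    z = -0.4 + 0.8 * np.cos(2 * np.pi * k / n)
    x = x + dt * fhn_rhs(x, z)
    traj[k] = x
```

Because z varies on a much longer time scale than the spiking dynamics, it plays the role of the slowly drifting parameter λt discussed above.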

Our aim is to track the variation of z(t) over time from the measurement of only x1, disturbed by a significant amount of measurement noise, as is usually the case in experiments of this kind. In this example no parameters are estimated.

The measurements are constructed by corrupting the component x1,t with Gaussian noise ηt ∼ N(0, R) at 800 equidistant (∆t = 0.2) sample times. The standard deviation of η, √R, is 20% of the standard deviation of the x1-component of the system. The external variable z(t) is constructed as a cosine half-wave, up to an additive constant, as shown in Fig. 9(b). The initial guesses of z and x2 are set to zero; the initial guess of x1 is set to the measured value. For parameter tracking, the augmented state vector approach (68) is used with a constant covariance matrix Q = 0.015. In each estimation step, the component of the covariance matrix Pxx which corresponds to the variance of z


Fig. 8. Recursive parameter estimation in the Lorenz model (29) with the UKF approach. Like in Fig. 6, the states and parameters are estimated by observing noisy data from the first component of x. Measured data set (blue dots), true trajectory (black), estimated trajectory (red), and estimated parameters with confidence intervals (pink). (a) The result after the first iteration of the process. (b) After the second iteration of the process. (c) The first 200 data points, and the true and estimated trajectories of (b) to facilitate a comparison with the results of multiple shooting parameter estimation in Fig. 6(c).


Fig. 9. Recursive parameter tracking in the FitzHugh–Nagumo model (69) with the UKF approach. Like in the previous examples, the states and the external parameter are estimated by observing noisy data from the first component of x. (a) Measured data set (blue dots) and the true trajectory of x1 (black). (b) The unobserved external variable z (black), its tracked estimate (pink, shown with 1σ confidence intervals), the true trajectory of the unobserved component x2 (black), and its estimate (red).

is set to the constant value Q to mimic the uncertainty of the variable to be tracked.

The application of the UKF is shown in Fig. 9. Since the parameters of the model were given, the variable z, corresponding to λ in Eq. (68), is tracked quite accurately, along with a very precise estimate of the unobserved component of the state vector. Note that this example constitutes an application which cannot be treated with nonrecursive approaches like multiple shooting.

A Matlab routine reproducing this example is available via www.fdm.uni-freiburg.de/∼hv.


7. Further Nonlinear Recursive Approaches

We have seen that the basis of all recursive nonlinear filters is to reproduce as accurately as possible the temporal evolution of the probability densities in order to maximize the likelihood. Here only the first two moments, mean and covariance, were taken into account. Recently, several other methods related to this task have evolved, some of them more general. The following is not an exhaustive overview of all these methods, but it can serve as a starting point for further reading, reflecting many of the current main developments:

Methods to treat estimation in deterministic state space models recursively, rather than with direct nonlinear optimization as in the initial value approach or multiple shooting, go back to [Pearson, 1967]. In these methods, sometimes denoted nonlinear least squares methods [Gelb, 1974], the probability density of the state x is unknown, as in our case.

The ideas of unscented Kalman filtering go back to the more general idea of stochastic approximation [Robbins & Munroe, 1951; Phaneuf, 1968; Sunahara, 1969; Mahalanabis & Farooq, 1971] and are put forward in [Norgaard et al., 2000].

An alternative approach to the augmented state vector for parameter estimation with the KF is dual estimation. In dual estimation, the parameter system is treated as a system of its own, and the estimation procedure alternates between the parameter state and the system state [Wan et al., 2000].

Related to dual estimation again is the Expectation Maximization (EM) algorithm [Dempster et al., 1977; Shumway & Stoffer, 1982; Honerkamp, 2002] applied to nonlinear dynamical system identification [Ghahramani & Roweis, 1999]. In the EM algorithm, the parameter vector is not treated as evolving in time but is estimated by the following iterative algorithm: In each iteration, the expectation of the state given the current parameter estimate and the data is calculated (E-step), then the parameter is updated by maximizing the conditional likelihood (M-step). The E-step is again computed with a KF, and the M-step usually with a gradient based algorithm, unless an analytic form as in linear models can be found. For hidden Markov models, which are based on a discrete state space, there also exists a recursive estimator based on the EM algorithm [Radons & Stiller, 1999].
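The E/M alternation can be illustrated on a deliberately simple latent-variable toy problem (i.i.d. Gaussian states, no dynamics, known noise variances) of our own construction; in the state-space setting described above, the E-step would instead be a Kalman filter/smoother:

```python
import numpy as np

rng = np.random.default_rng(1)
R = 0.5                                  # known measurement noise variance
x_true = 2.0 + rng.standard_normal(500)  # latent states with unknown mean mu = 2
y = x_true + np.sqrt(R) * rng.standard_normal(500)

mu = 0.0                                 # initial parameter guess
for _ in range(100):
    # E-step: posterior mean of each latent x_t given y_t and the current mu
    x_post = mu + (y - mu) / (1.0 + R)
    # M-step: maximize the expected log-likelihood over mu
    mu = x_post.mean()
```

The iteration contracts geometrically toward the sample mean of the data, which is the maximum likelihood estimate in this toy problem.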

Closely related to the estimation of unobserved states are model based nonlinear noise reduction methods which take the errors-in-variables problem into account [Farmer & Sidorowich, 1990; Hammel, 1990; Davies, 1992]. The problem of convergence of constrained minimization approaches for nonlinear noise reduction, which was not covered in this tutorial, has been investigated recently by [Brocker & Parlitz, 2001].

The case of varying parameters in recursive estimation, i.e. nonstationary problems, has been investigated for a long time by Young [Young, 1970, 1981, 1984]. Newer results allow even for recursive nonparametric model estimation [Young, 2000].

Filtering in uncertain models, where the explicit form of the state space model is known only up to some disturbances, has recently been studied in [Sayed & Subramanian, 2002]. This is related to the framework of adaptive filtering [Xie et al., 1994; Petersen & Savkin, 1999].

The prediction-correction structure of the KF is reminiscent of coupled synchronizing dynamical systems and the observer problem in engineering science [Nijmeijer, 2001]. Indeed, methods have recently appeared which use the principle of synchronization in coupled dynamical systems to estimate parameters [Stojanovski et al., 1996; Maybhate & Amritkar, 1999; Sakaguchi, 2002].

In the UKF approach, only the mean and covariance of the posterior probability are calculated exactly; hence it is valid only approximately if the Gaussian assumption is violated. For strong nonlinearities, i.e. of more than second order if given by a polynomial, this approximation may cause a bias in the estimates. A filter that can cope with arbitrary nonlinearities in an exact way, however, has to be infinite dimensional, because it cannot be given in closed form [Hazewinkel et al., 1983; Arulampalam et al., 2002]. This is true at least up to some exceptions especially designed to yield a finite dimensional filter [Benes, 1981; Daum, 1986; Schmidt, 1993]. The UKF uses an optimally chosen set of sampling points, the sigma points, to represent the posterior density under the assumption of Gaussianity.

There are other novel methods based on Monte Carlo sampling [Muller, 1991] which allow one to relax this assumption by using more general densities and (sequential) importance sampling [Tanizaki, 1993], bootstrapping [Gordon et al., 1993; Kitagawa, 1996],


particle filters [Arulampalam et al., 2002], or condensation (from “conditional density estimation”) [MacCormick & Blake, 1999]. The general idea of Monte Carlo sampling is to represent the densities by finite samples and to use them in the Kalman recursion. The samples are drawn in a random manner, unlike the sigma points, which are computed uniquely. This is especially useful if the probability densities deviate strongly from a Gaussian, e.g. by being multimodal. This may frequently be the case for chaotic systems, but we are not aware of an application of particle filters to chaotic systems.
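A minimal bootstrap particle filter on a one-dimensional toy model illustrates the propagate-weight-resample cycle. The transition function and noise levels below are arbitrary assumptions of ours, not taken from any of the cited works:

```python
import numpy as np

rng = np.random.default_rng(2)

def step(x):
    """Toy nonlinear state transition."""
    return 0.9 * np.sin(x)

Q, R, N = 0.1, 0.1, 500  # process variance, measurement variance, particles

# simulate a short true trajectory and noisy observations
x, xs, ys = 0.5, [], []
for _ in range(50):
    x = step(x) + np.sqrt(Q) * rng.standard_normal()
    xs.append(x)
    ys.append(x + np.sqrt(R) * rng.standard_normal())

# bootstrap particle filter: propagate, weight by likelihood, resample
particles = rng.standard_normal(N)
est = []
for y in ys:
    particles = step(particles) + np.sqrt(Q) * rng.standard_normal(N)
    w = np.exp(-0.5 * (y - particles) ** 2 / R)   # Gaussian likelihood
    w /= w.sum()
    est.append(np.dot(w, particles))              # posterior mean estimate
    idx = rng.choice(N, size=N, p=w)              # multinomial resampling
    particles = particles[idx]
```

Because the density is represented by the particle cloud itself, no Gaussian assumption enters; multimodal posteriors are handled automatically at the cost of sampling noise.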

Importance sampling belongs to the independent sampling family. If the sample at time t + ∆t is configured so as to depend on the sample at time t, the samples are no longer independent. This can be exploited to design highly efficient sampling strategies, known as Markov chain Monte Carlo estimations [Carlin et al., 1992; Davies, 1998; Bremer & Kaplan, 2001], at the cost of a much higher computational load and difficult problems of slow convergence [Fruhwirth-Schnatter, 1994; Shephard, 1994] and convergence diagnosis. However, these approaches are promising for distinguishing between process and measurement noise, a problem not encountered here at all, because in nonlinear system estimation this problem is in its general form still unsolved.

The latter approaches are all rather new, and research is proceeding rapidly.

8. Nonparametric Estimation and Spatiotemporal Systems

Nonparametric model estimation is a suitable tool for modeling in the case that explicit model equations are unknown; for sufficiently clean data, often a model or at least an initial guess of a model can be derived nonparametrically from time series of dynamical systems [Voss & Kurths, 1997; Timmer et al., 2000; Peifer et al., 2003] or from spatiotemporal data [Voss et al., 1998; Voss et al., 1999a; Andersen et al., 2002]. The statistical measure of maximal correlation [Hirschfeld, 1935; Gebelein, 1941; Renyi, 1959; Csaki & Fischer, 1963] serves as a quantitative criterion for the goodness of the nonparametric model fits [Voss et al., 1999b]. The often used ACE-algorithm [Breiman & Friedman, 1985; Frank & Lanteri, 1988; Buja, 1990; Hardle, 1990; Schimek, 2000] is accessible via www.fdm.uni-freiburg.de/∼hv. For nonparametric modeling, see also [Abel, 2004; Wessel et al., 2004].

At an even more fundamental level, e.g. if one is not interested primarily in the model equations but in the dimension of the dynamical system, methods based on time-delay embeddings have been developed and applied to many physical and physiological systems [Kantz & Schreiber, 1995, 1997; Hegger et al., 1999]. A method that gives more reliable results for the phase space dimension of noisy data than a precise estimation of correlation dimensions [Grassberger & Procaccia, 1983a, 1983b; Grassberger et al., 1991; Albano et al., 1988] is based on the false nearest-neighbor approach [Kennel et al., 1992; Abarbanel et al., 1993; Abarbanel, 1996; Hegger & Kantz, 1999]. This may be used as a tool to explore whether one has to cope with a low-dimensional system or a higher dimensional one.

The latter case is true for spatially extended systems, whose phase space dimension is infinite. Fortunately, many systems have only a limited range of scales for the spatial components of the dynamics. In these systems numerical simulations can still be performed in an approximate sense. Therefore, it should be possible also to perform some kind of parameter estimation for such systems. The generalization of the methods mentioned in Secs. 5 and 6 to these systems is not straightforward though, because of the continued high dimensionality, which causes computational difficulties that cannot be compensated by large computer power alone; as an example, consider a two-dimensional system with measurements on a 100 × 100 spatial grid, and assume that the number of state components is 5. The dynamics of this system can then be imagined as a 5 · 10^4-dimensional state vector evolving in time, leading to covariance matrices with 25 · 10^8 entries in the UKF. This problem is only tractable if approximated somehow [Sitz et al., 2003; Muller & Timmer, 2002]. See also [Busch & Hutt, 2004; Mandelj et al., 2004].
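The dimension count above can be checked directly:

```python
# 5 state components on a 100 x 100 spatial grid
n_state = 5 * 100 * 100         # 5 * 10^4 state components
n_cov_entries = n_state ** 2    # 25 * 10^8 covariance matrix entries
```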

9. Summary and Discussion

The problem of identifying parameters and unobserved state components from samples of continuous-time nonlinear dynamical systems faces two main difficulties: the errors-in-variables problem and the problem of complex cost functions. Whereas the former can be overcome by taking the smooth dependence of the states properly into


account, the latter can be managed by splitting the optimization procedure to minimize the cost function into several parts. This methodology results in two somewhat orthogonal approaches: multiple shooting methods and recursive filtering methods. Both turn out to be useful especially for the analysis of chaotic processes. Although the recursive methods can only yield approximate results, it has been shown that they may compete well with the multiple shooting approach. An advantage is that their implementation is much simpler than for methods based on global, nonrecursive optimization.

Some words are in order about why we consider only deterministic systems, although e.g. recursive estimation was historically developed for stochastic systems. The reason is twofold: Parameter estimation in nonlinear stochastic differential equations, as opposed to time-discrete models, suffers from the fact that the integration time steps must usually be orders of magnitude smaller than the sampling time steps, unless the data are highly sampled [Dacunha-Castelle & Florens-Zmirou, 1986; Florens-Zmirou, 1989], and there must be no measurement noise. These conditions hardly ever occur in real applications. Whether the proposed quasi-maximum-likelihood approaches of [Bibby & Sørensen, 1995; Timmer, 2000] can be extended to data with measurement noise still needs to be investigated. The other reason is that estimation approaches based on the smoothness of trajectories can no longer rely on this assumption, because on small scales the trajectories cannot be approximated by smooth ones [Kloeden & Platen, 1995]. The problem of identification of stochastic dynamical systems provides much room for future research. See also [Lotosky & Rozovskii, 1998; Siegert et al., 1998] and [Siefert & Peinke, 2004].

Before being applied to experimental data, both multiple shooting and recursive estimation should first be validated on numerical simulations, for the following reason: We did not address the problem of identifiability of system parameters and trajectories. For example, even if the state space formulation of the system is linear, it may happen that a solution cannot be determined in an unambiguous way. In other words, it may be impossible to uniquely estimate the complete state from the indirect measurements. This is not a question of the amount of noise but of the system’s structure. For linear systems, criteria can be formulated that allow one to test for so-called observability [Gelb, 1974; Conti, 1976; Barnett, 1983; Anderson & Deistler, 1984; Chui & Chen, 1989; Soderstrom et al., 2002], but up to now there is no concise theory for nonlinear systems.

In recent years nonlinear system identification has made much progress, but it has developed along two different branches. Both branches have already found a lot of applications: the recursive estimation approach predominantly in the engineering sciences, and shooting approaches in the natural sciences. We expect that both could profit from each other if combined, and many novel solutions for the still open challenges could be found.

Acknowledgments

We thank G. Radons and T. Muller for useful hints about the manuscript, W. Horbelt for calculating the multiple shooting example, and A. Sitz for fruitful discussions about nonlinear filtering.

Our financial sponsors were: the DFG-Graduiertenkolleg Nichtlineare Differentialgleichungen: Modellierung, Theorie, Numerik, Visualisierung, and the DFG-Schwerpunktprogramm 1114: Mathematical Methods for Time Series Analysis and Digital Image Processing.

References

Abarbanel, H. D. I. et al. [1993] “The analysis of observed chaotic data in physical systems,” Rev. Mod. Phys. 65, 1331–1392.

Abarbanel, H. D. I. [1996] Analysis of Observed Chaotic Data (Springer, NY).

Aguirre, L. A. & Billings, S. A. [1995] “Retrieving dynamical invariants from chaotic data using NARMAX models,” Int. J. Bifurcation and Chaos 5, 449–474.

Albano, A. M. et al. [1988] “Singular-value decomposition and the Grassberger–Procaccia algorithm,” Phys. Rev. A38, 3017–3026.

Alligood, K., Sauer, T. D. & Yorke, J. A. [1997] Chaos: An Introduction to Dynamical Systems (Springer, Heidelberg).

Anderson, B. D. O. & Moore, J. B. [1979] Optimal Filtering (Prentice-Hall).

Anderson, B. D. O. & Deistler, M. [1984] “Identifiability of dynamic errors-in-variables models,” J. Time Series Anal. 5, 1–13.

Andersen, K. H. et al. [2002] “Pattern dynamics of vortex ripples in sand: Nonlinear modeling and experimental validation,” Phys. Rev. Lett. 88, 234302.

Aoki, M. [1990] State Space Modeling of Time Series (Springer, NY).

Arulampalam, S., Maskell, S. R., Gordon, N. J. & Clapp, T. [2002] “A tutorial on particle filters for on-line nonlinear/non-Gaussian Bayesian tracking,” IEEE Trans. Sign. Process, to appear.

Baake, E., Baake, M., Bock, H. G. & Briggs, K. M. [1992] “Fitting ordinary differential equations to chaotic data,” Phys. Rev. A45, 5524–5529.

Baake, E. & Schloder, J. P. [1992] “Modeling the fast fluorescence rise of photosynthesis,” Bull. Math. Biol. 54, 999–1021.

Bar-Shalom, Y. & Fortmann, T. E. [1988] Tracking and Data Association (Academic Press, Orlando, Florida).

Bar-Shalom, Y., Li, X. R. & Kirubarajan, T. [2001] Estimation with Applications to Tracking and Navigation (Wiley, NY).

Barnett, S. [1983] Polynomials and Linear Control Systems (Marcel Dekker, NY).

Bellman, R. E. & Kalaba, R. [1965] Quasilinearization and Boundary Value Problems (American Elsevier, NY).

Benes, V. E. [1981] “Exact finite-dimensional filters for certain diffusions with nonlinear drift,” Stochastics 5, 65–92.

Bibby, B. M. & Sørensen, M. [1995] “Martingale estimation functions for discretely observed diffusion processes,” Bernoulli 1, 17–39.

Bock, H. G. [1981] “Numerical treatment of inverse problems in chemical reaction kinetics,” Modeling of Chemical Reaction Systems, eds. Ebert, K. H., Deuflhard, P. & Jager, W., Vol. 18 (Springer, NY), pp. 102–125.

Bock, H. G. [1983] “Recent advances in parameter identification for ordinary differential equations,” Progress in Scientific Computing, Vol. 2, eds. Deuflhard, P. & Hairer, E. (Birkhauser, Boston), pp. 95–121.

Boggs, P. T., Byrd, R. H. & Schnabel, R. B. [1987] “A stable and efficient algorithm for nonlinear orthogonal distance regression,” SIAM J. Scient. Stat. Comput. 8, 1052–1078.

Breeden, J. L. & Hubler, A. [1990] “Reconstructing equations of motion from experimental data with unobserved variables,” Phys. Rev. A42, 5817–5826.

Breiman, L. & Friedman, J. H. [1985] “Estimating optimal transformations for multiple regression and correlation,” J. Am. Stat. Assoc. 80, 580–598.

Bremer, C. L. & Kaplan, D. T. [2001] “Markov chain Monte Carlo estimation of nonlinear dynamics from time series,” Physica D160, 116–126.

Brocker, J. & Parlitz, U. [2001] “Efficient noncausal noise reduction for deterministic time series,” Chaos 11, 319–326.

Bryson, A. E. & Ho, Y. C. [1969] Applied Optimal Control (Waltham, Ginn).

Buja, A. [1990] “Remarks on functional canonical variates, alternating least squares methods and ACE,” Ann. Stat. 18, 1032–1069.

Carlin, B. P., Polson, N. G. & Stoffer, D. S. [1992] “A Monte Carlo approach to nonnormal and nonlinear state space modeling,” J. Amer. Stat. Assoc. 87, 493–500.

Carroll, R. J., Ruppert, D. & Stefanski, L. A. [1995] Measurement Error in Nonlinear Models (Chapman and Hall, London).

Chui, C. K. & Chen, G. [1989] Linear Systems and Optimal Control (Springer, Berlin).

Conti, R. [1976] Linear Differential Equations and Control (Academic Press, London).

Cremers, J. & Hubler, A. [1987] “Construction of differential equations from experimental data,” Z. Naturforsch. 42a, 797–802.

Crutchfield, J. P. & McNamara, B. S. [1987] “Equations of motion from data series,” Compl. Syst. 1, 417–451.

Csaki, P. & Fischer, J. [1963] “On the general notion of maximal correlation,” Magyar Tud. Akad. Mat. Kutato Int. Kozl. 8, 27–51.

Dacunha-Castelle, D. & Florens-Zmirou, D. [1986] “Estimation of the coefficients of a diffusion from discrete observations,” Stochastics 19, 263–284.

Daum, F. E. [1986] “Exact finite-dimensional nonlinear filters,” IEEE Trans. Automat. Contr. 31, 616–622.

Davies, M. [1992] “Noise reduction by gradient descent,” Int. J. Bifurcation and Chaos 3, 113–118.

Davies, M. E. [1998] “Nonlinear noise reduction through Monte Carlo sampling,” Chaos 8, 775–781.

Dempster, A. P., Laird, N. M. & Rubin, D. B. [1977] “Maximum likelihood from incomplete data via EM algorithm,” J. Roy. Stat. Soc. B39, 1–38.

Edsberg, L. & Wedin, P. [1995] “Numerical tools for parameter estimation in ODE-systems,” Opt. Meth. Softw. 6, 193–217.

Farmer, J. D. & Sidorowich, J. J. [1990] “Optimal shadowing and noise reduction,” Physica D47, 373–392.

FitzHugh, R. [1961] “Impulses and physiological states in theoretical models of nerve membrane,” Biophys. J. 1, 445–466.

Fletcher, R. & Reeves, C. M. [1964] “Function minimization by conjugate gradients,” Computer J. 7, 149–154.

Florens-Zmirou, D. [1989] “Approximate discrete-time schemes for statistics of diffusion processes,” Statistics 4, 547–557.

Frank, I. E. & Lanteri, S. [1988] “ACE: Nonlinear regression model,” Chemom. Intell. Lab. Syst. 3, 301–313.

Fraser, D. C. & Potter, J. E. [1969] “The optimum linear smoother as a combination of two optimum linear filters,” IEEE Trans. Automat. Contr. 7, 387–390.

Fruhwirth-Schnatter, S. [1994] “Data augmentation and dynamic linear models,” J. Time Series Anal. 15, 183–202.

Fuller, W. A. [1987] Measurement Error Models (John Wiley, NY).

Gebelein, H. [1941] “Das statistische Problem der Korrelation als Variations- und Eigenwertproblem und sein Zusammenhang mit der Ausgleichsrechnung,” Zeitschrift fur angew. Math. und Mech. 21, 364–379.

Gelb, A. [1974] Applied Optimal Estimation (The M.I.T.Press, Cambridge, MA).

Gershenfeld, N. [1999] The Nature of Mathematical Mod-eling (Cambridge University Press, Cambridge).

Ghahramani, Z. & Roweis, S. T. [1999] “Learning non-linear dynamical systems using an EM algorithm,”Advances in Neural Information Processing Systems,Vol. 11, eds. Kearns, M. J., Solla, S. A. & Cohn, D.A. (MIT Press).

Gill, P. E., Murray, W. & Wright, M. H. [1981] PracticalOptimization (Academic Press, London).

Gordon, N. J., Salmond, D. J. & Smith, A. F. M. [1993]“Novel approach to nonlinear/non-Gaussian Bayesianstate estimation,” Comm. Radar Sign. Process. 140,107–113.

Gouesbet, G. [1991a] “Reconstruction of standard andinverse vector fields equivalent to a Rossler system,”Phys. Rev. A44, 6264–6280.

Gouesbet, G. [1991b] “Reconstruction of the vector fieldsof continuous dynamical systems from scalar timeseries,” Phys. Rev. A43, 5321–5331.

Gouesbet, G. [1992] “Reconstruction of vector fields:The case of the Lorenz system,” Phys. Rev. A46,1784–1796.

Grassberger, P. & Procaccia, I. [1983a] “Characteri-zation of strange attractors,” Phys. Rev. Lett. 50,346–349.

Grassberger, P. & Procaccia, I. [1983b] “Measuringthe strangeness of strange attractors,” Physica D9,189–208.

Grassberger, P., Schreiber, T. & Schaffrath, C. [1991]“Nonlinear time sequence analysis,” Int. J. Bifurca-tion and Chaos 1, 521–547.

Grewal, M. S. & Andrews, A. P. [2001] Kalman Filtering:Theory and Practice Using MATLAB (John Wiley,NY).

Hamilton, J. D. [1994] Time Series Analysis (PrincetonUniversity Press, Princeton).

Hammel, S., Grebogi, C. & Yorke, J. A. [1988] “Numer-ical orbits of chaotic processes represent true orbits,”Bull. Amer. Math. Soc. 19, 465–470.

Hammel, S. M. [1990] “A noise reduction method forchaotic systems,” Phys. Lett. A148, 421–428.

Hanson, R. J. & Haskell, K. H. [1982] “Two algorithmsfor the linearly constrained least squares problem,”ACM Trans. Math. Softw. 8, 323–333.

Hardle, W. [1990] Applied Nonparametric Regression(Cambridge University Press, Cambridge).

Hazewinkel, M., Marcus, S. I. & Sussmann, H. J. [1983]“Nonexistence of finite dimensional nonlinear filtersfor conditional statistics of the cubic sensor problem,”Syst. Cont. Lett. 3, 331–340.

Hegger, R. et al. [1998] “Dynamical properties of aferroelectric capacitor observed through nonlineartime series analysis,” Chaos 8, 727–736.

Hegger, R. & Kantz, H. [1999] “Improved false near-est neighbor method to detect determinism in timeseries,” Phys. Rev. E60, 4970–4973.

Hegger, R., Kantz, H. & Schreiber, T. [1999] “Practicalimplementation of nonlinear time series methods: TheTISEAN package,” Chaos 9, 413–435.

Hirschfeld, H. O. [1935] “A connection between corre-lation and contingency,” Proc. Camb. Phil. Soc. 31,520–524.

Hodgkin, A. L. & Huxley, A. F. [1952] “A quantitativedescription of membrane current and its applicationto conduction and excitation in nerve,” J. Physiol(London) 117, 500–544.

Honerkamp, J., Mutschler, G. & Seitz, R. [1985] "Coupling of a slow and a fast oscillator can generate bursting," Bull. Math. Biol. 47, 1–21.

Honerkamp, J. [2002] Statistical Physics (Springer, Berlin).

Horbelt, W., Muller, T., Timmer, J., Melzer, W. & Winkler, K. [2000] "Analysis of nonlinear differential equations: parameter estimation and model selection," Medical Data Analysis, Lecture Notes in Mathematics, Vol. 1933, eds. Brause, R. & Hanisch, E. (Springer, Berlin), pp. 152–158.

Horbelt, W. [2001] Maximum Likelihood Estimation in Dynamical Systems, PhD thesis, University of Freiburg.

Horbelt, W., Timmer, J., Bunner, M., Meucci, R. & Ciofini, M. [2001] "Identifying physical properties of a CO2 laser by dynamical modeling of measured time series," Phys. Rev. E64, 016222.

Horbelt, W., Timmer, J. & Voss, H. U. [2002] "Parameter estimation in nonlinear delayed feedback systems from noisy data," Phys. Lett. A299, 513–521.

Jaeger, L. & Kantz, H. [1996] "Unbiased reconstruction of the dynamics underlying a noisy chaotic time series," Chaos 6, 440–450.

Jazwinski, A. H. [1970] Stochastic Processes and Filtering Theory (Academic Press, NY).

Jeffreys, W. H. [1980] "On the method of least squares," Astron. J. 85, 177–181.

Jeffreys, W. H. [1988] "Erratum: On the method of least squares," Astron. J. 95, 1299.

Julier, S. J., Uhlmann, J. K. & Durrant-Whyte, H. F. [1995] "A new approach for filtering nonlinear systems," in Proc. 1995 American Control Conf., Seattle, pp. 1628–1632.

Julier, S. J. & Uhlmann, J. K. [1996] "A general method for approximating nonlinear transformations of probability distributions," Technical Report, Dept. of Engineering Science, University of Oxford.

Julier, S., Uhlmann, J. & Durrant-Whyte, H. F. [2000] "A new method for the nonlinear transformation of means and covariances in filters and estimators," IEEE Trans. Automat. Contr. 45, 477–482.

Kadtke, J., Brush, J. & Holzfuss, J. [1993] "Global dynamical equations and Lyapunov exponents from noisy chaotic time series," Int. J. Bifurcation and Chaos 3, 607–616.

Kalman, R. [1960] "A new approach to linear filtering and prediction problems," Trans. ASME J. Basic Eng. Series 82/D, 35–46.

Kalman, R. E. & Bucy, R. S. [1960] "New results in linear filtering and prediction theory," Trans. ASME J. Basic Eng. Series 83/D, 95–108.

Kantz, H. & Schreiber, T. [1995] "Dimension estimates and physiological data," Chaos 5, 143–153.

Kantz, H. & Schreiber, T. [1997] Nonlinear Time Series Analysis (Cambridge University Press, Cambridge).

Kennel, M. B., Brown, R. & Abarbanel, H. D. I. [1992] "Determining embedding dimension for phase-space reconstruction using a geometrical construction," Phys. Rev. A45, 3403–3411.

Kitagawa, G. [1996] "Monte Carlo filter and smoother for non-Gaussian nonlinear state space models," J. Comput. Graph. Stat. 5, 1–25.

Kloeden, P. & Platen, E. [1995] Numerical Solution of Stochastic Differential Equations (Springer, Berlin).

Kostelich, E. J. [1992] "Problems in estimating dynamics from data," Physica D58, 138–152.

Kostelich, E. J. [2001] "Bootstrap estimates of chaotic dynamics," Phys. Rev. E64, 016213.

Ljung, L. & Soderstrom, T. [1983] Theory and Practice of Recursive Identification (MIT Press, Cambridge, MA).

Lorenz, E. N. [1963] "Deterministic non-periodic flow," J. Atmos. Sci. 20, 130–141.

Lotosky, S. V. & Rozovskii, B. L. [1998] "Recursive nonlinear filter for a continuous-discrete time model: Separation of parameters and observables," IEEE Trans. Autom. Contr. 43, 1154–1158.

MacCormick, J. & Blake, A. [1999] A Probabilistic Exclusion Principle for Tracking Multiple Objects (IEEE Press).

Madansky, A. [1959] "The fitting of straight lines when both variables are subject to error," J. Amer. Stat. Assoc. 54, 173–205.

Mahalanabis, A. K. & Farooq, M. [1971] "A second-order method for state estimation of nonlinear dynamical systems," Int. J. Contr. 14, 631–639.

Maybhate, A. & Amritkar, R. [1999] "Use of synchronization and adaptive control in parameter estimation from time series," Phys. Rev. E59, 284–293.

McSharry, P. E. & Smith, L. A. [1999] "Better nonlinear models from noisy data: Attractors with maximum likelihood," Phys. Rev. Lett. 83, 4285–4288.

Meinhold, R. J. & Singpuwalla, N. D. [1983] "Understanding the Kalman filter," Amer. Statist. 37, 123–127.

Mendel, J. M. [1995] Lessons in Estimation Theory for Signal Processing, Communications and Control (Prentice Hall, NJ).

Miller, R. N., Ghil, M. & Gauthiez, F. [1994] "Advanced data assimilation in strongly nonlinear dynamical systems," J. Atm. Sci. 51, 1037–1056.

Muller, P. [1991] "Monte Carlo integration in general dynamic models," Statistical Multiple Integration, eds. Flournoy, N. & Tsutakawa, R. K. (American Mathematical Society, Providence), pp. 145–163.

Muller, T. G. & Timmer, J. [2002] "Fitting parameters in partial differential equations from partially observed noisy data," Physica D171, 1–7.

Nicolis, G. [1995] Introduction to Nonlinear Science (Cambridge University Press, Cambridge).

Nijmeijer, H. [2001] "A dynamical control view on synchronization," Physica D154, 219–228.

Norgaard, M., Poulsen, N. K. & Ravn, O. [2000] "New developments in state estimation for nonlinear systems," Automatica 36, 1627–1638.

Nusse, H. E. & Yorke, J. A. [1989] "Is every approximate trajectory of some process near an exact trajectory of a nearby process?" Commun. Math. Phys. 114, 363–379.

Ott, E., Sauer, T. & Yorke, J. A. [1994] Coping with Chaos, Wiley Series in Nonlinear Science (John Wiley, NY).

Packard, N. H., Crutchfield, J. P., Farmer, J. D. & Shaw, R. S. [1980] "Geometry from a time series," Phys. Rev. Lett. 45, 712–716.

Pearson, J. B. [1967] "On nonlinear least squares filtering," Automatica 4, 97–105.

Peifer, M., Timmer, J. & Voss, H. U. [2003] "Nonparametric identification of nonlinear oscillating systems," J. Sound Vibr., to appear.

Petersen, I. R. & Savkin, A. V. [1999] Robust Kalman Filtering for Signals and Systems with Large Uncertainties (Birkhauser, Boston).

Phaneuf, R. J. [1968] Approximate Nonlinear Estimation, PhD thesis, MIT.

Press, W. H. et al. [1997] Numerical Recipes in C (Cambridge University Press, Cambridge).

Radons, G. & Stiller, J. C. [1999] "Online estimation of hidden Markov models," IEEE Sign. Process. Lett. 6, 213–215.

Renyi, A. [1959] "On measures of dependence," Acta Math. Acad. Sci. Hungar. 10, 441–451.

Robbins, H. & Munroe, S. [1951] "A stochastic approximation method," Ann. Math. Statist. 22, 400–407.

Sakaguchi, H. [2002] "Parameter evaluation from time sequences using chaos synchronization," Phys. Rev. E65, 1–4.

Sayed, A. H. & Subramanian, A. [2002] "State-space estimation with uncertain models," Total Least Squares and Errors-in-Variables Modeling: Analysis, Algorithms and Applications, eds. van Huffel, S. & Lemmerling, P. (Kluwer, NY).

Schimek, M. G. (ed.) [2000] Smoothing and Regression: Approaches, Computation and Application (Wiley & Sons, NY).

Schittkowski, K. [1994] "Parameter estimation in systems of nonlinear equations," Numer. Math. 68, 129–142.

Schittkowski, K. [1995] "Parameter estimation in differential equations," Recent Trends in Optimization Theory and Applications, ed. Agarwal, R. (World Scientific, Singapore), pp. 353–370.

Schmidt, G. C. [1993] "Designing nonlinear filters based on Daum's theory," J. Guidance Contr. Dyn. 16, 371–376.

Seber, G. A. F. & Wild, C. J. [1989] Nonlinear Regression (John Wiley, NY).

Shephard, N. [1994] "Partial non-Gaussian state space," Biometrika 81, 115–131.

Shumway, R. H. & Stoffer, D. S. [1982] "An approach to time series smoothing and forecasting using the EM algorithm," J. Time Ser. Anal. 3, 253–264.

Shumway, R. H. & Stoffer, D. S. [2000] Time Series Analysis and its Applications (Springer, NY).

Siegert, S., Friedrich, R. & Peinke, J. [1998] "Analysis of data sets of stochastic systems," Phys. Lett. A243, 275–280.

Sinha, N. K. & Rao, G. P. (eds.) [1991] Identification of Continuous-Time Systems (Kluwer, Dordrecht).

Sitz, A., Schwarz, U., Kurths, J. & Voss, H. U. [2002] "Estimation of parameters and unobserved components for chaotic systems from noisy time series," Phys. Rev. E66, 016210.

Sitz, A., Kurths, J. & Voss, H. U. [2003] "Identification of nonlinear spatiotemporal systems via partitioned filtering," Phys. Rev. E68, 016202.

Soderstrom, T., Soverini, U. & Mahata, K. [2002] "Perspectives on errors-in-variables estimation for dynamic systems," Sign. Process. 82, 1139–1154.

Stoer, J. [1971] "On the numerical solution of constrained least squares problems," SIAM J. Numer. Anal. 8, 382–411.

Stoer, J. & Bulirsch, R. [1993] Introduction to Numerical Analysis (Springer, Heidelberg).

Stojanovski, T., Kocarev, L. & Parlitz, U. [1996] "A simple method to reveal the parameters of the Lorenz system," Int. J. Bifurcation and Chaos 6, 2645–2652.

Stortelder, W. J. H. [1996] "Parameter estimation in chemical engineering, a case study for resin production," Scientific Computing in Chemical Engineering, eds. Keil, F., Mackens, W., Voß, H. & Werther, J. (Springer, Heidelberg), pp. 226–238.

Strang, G. [1988] Linear Algebra and Its Applications (Harcourt Brace Jovanovich, San Diego).

Sunahara, Y. [1969] "An approximate method of state estimation for nonlinear dynamical systems," in Joint Automatic Control Conf., University of Colorado.

Takens, F. [1981] "Detecting strange attractors in turbulence," Dynamical Systems and Turbulence, Lecture Notes in Mathematics, Vol. 898, eds. Rand, D. & Young, L. (Springer, Berlin), pp. 366–381.

Tanizaki, H. (ed.) [1993] Nonlinear Filters: Estimation and Applications (Springer, Berlin).

Theiler, J. [1990] "Statistical precision of dimension estimators," Phys. Rev. A41, 3038–3051.

Theiler, J. & Smith, L. A. [1995] "Anomalous convergence of Lyapunov exponent estimates," Phys. Rev. E51, 3738–3741.

Timmer, J. [1998] "Modeling noisy time series: Physiological tremor," Int. J. Bifurcation and Chaos 8, 1505–1516.

Timmer, J. [2000] "Estimating parameters in nonlinear stochastic differential equations," Chaos Solit. Fract. 11, 2571–2578.

Timmer, J., Rust, H., Horbelt, W. & Voss, H. U. [2000] "Parametric, nonparametric and parametric modelling of a chaotic circuit time series," Phys. Lett. A274, 123–134.

van Domselaar, B. & Hemker, P. [1975] "Nonlinear parameter estimation in initial value problems," Technical Report NW 18/75, Mathematical Centre Amsterdam.

van Huffel, S. & Vandewalle, J. [1991] The Total Least Squares Problem: Computational Aspects and Analysis (SIAM, Philadelphia).

Voss, H. & Kurths, J. [1997] "Reconstruction of nonlinear time delay models from data by the use of optimal transformations," Phys. Lett. A234, 336–344.

Voss, H., Bunner, M. J. & Abel, M. [1998] "Identification of continuous, spatiotemporal systems," Phys. Rev. E57, 2820–2823.

Voss, H. U., Kolodner, P., Abel, M. & Kurths, J. [1999a] "Amplitude equations from spatiotemporal binary-fluid convection data," Phys. Rev. Lett. 83, 3422–3425.

Voss, H. U., Schwache, A., Kurths, J. & Mitschke, F. [1999b] "Equations of motion from chaotic data: A driven optical fiber ring resonator," Phys. Lett. A256, 47–54.

Walker, D. M. [2002] "Kalman filtering of time series data," Modelling and Forecasting Financial Data: Techniques of Nonlinear Dynamics, eds. Soofi, A. & Cao, L. (Kluwer Academic Publishers, Boston), pp. 137–157.

Wan, E. A., Merwe, R. & Nelson, A. T. [2000] "Dual estimation and the unscented transformation," Advances in Neural Information Processing Systems, Vol. 12, eds. Solla, S., Leen, T. & Muller, K.-R. (The MIT Press), pp. 666–672.

Wiggins, S. [1990] Introduction to Applied Nonlinear Dynamical Systems and Chaos (Springer, NY).


Xie, L., Soh, Y. C. & de Souza, C. E. [1994] "Robust Kalman filtering for uncertain discrete-time systems," IEEE Trans. Autom. Contr. 39, 1310–1340.

Young, P. [1981] "Parameter estimation for continuous-time models: A survey," Automatica 17, 23–39.

Young, P. [2000] "Stochastic, dynamic modelling and signal processing: Time variable and state dependent parameter estimation," Nonlinear and Non-stationary Signal Processing, eds. Fitzgerald, W. J. et al. (Cambridge University Press, Cambridge), pp. 74–114.

Young, P. C. [1970] "An instrumental variable method for real-time identification of a noisy process," Automatica 6, 271–287.

Young, P. C. [1984] Recursive Estimation and Time Series Analysis (Springer, Berlin).

Appendix A. Derivation of the Filtering Equations (33) to (40) and (41) to (45)

We derive the KF with emphasis on a heuristic understanding rather than on a more fundamental derivation based on Bayes' theorem. For the latter approach, see for example [Honerkamp, 2002; Gershenfeld, 1999]. Another introduction to Kalman filtering from a more statistical viewpoint is given in [Meinhold & Singpuwalla, 1983].

First consider the linear state space model (7), (8) with the parameter dependence neglected. To estimate the state x_t from a time series of measurements y_t recursively, we first note that

x̄ = x̂_{t+∆t|t} = E[x_{t+∆t|t}] = E[F x_{t|t} + ε_t] = F x̂_{t|t} .   (A.1)

Here it was used that E[ε_t] = 0 and that the previous state x_t itself has to be estimated from the states up to time t − ∆t and the measurements up to time t. Similarly,

ȳ = ŷ_{t+∆t|t} = G x̂_{t+∆t|t} = G x̄ .   (A.2)
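In code, the prediction step (A.1) and (A.2) amounts to two matrix products. A minimal NumPy sketch; the matrices F and G and the numbers below are illustrative placeholders, not values from the paper:

```python
import numpy as np

def predict(x_filt, F, G):
    """Prediction step: x_bar = F x_filt (A.1) and y_bar = G x_bar (A.2)."""
    x_bar = F @ x_filt
    y_bar = G @ x_bar
    return x_bar, y_bar

# Toy two-state, one-observation example (placeholder matrices).
F = np.array([[1.0, 0.1],
              [0.0, 1.0]])
G = np.array([[1.0, 0.0]])
x_bar, y_bar = predict(np.array([2.0, -1.0]), F, G)
# x_bar = [1.9, -1.0], y_bar = [1.9]
```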

The Kalman gain is now derived as follows: We try to estimate x̂ = x̂_{t+∆t|t+∆t} in Eq. (32) as accurately as possible, i.e. with the lowest estimation error. Any random variable that results from a linear transformation of a Gaussian random variable is again a Gaussian random variable. Therefore, x_{t+∆t} is, like ε_t, normally distributed and completely described by its mean and covariance. The estimation error is thus given by the covariance matrix

P_{xx,t+∆t|t+∆t} = E[(x_{t+∆t} − x̂_{t+∆t|t+∆t})(x_{t+∆t} − x̂_{t+∆t|t+∆t})′] ,   (A.3)

or in our shorthand notation by

P̂_{xx} = E[(x − x̂)(x − x̂)′] .   (A.4)

The Kalman gain K should be chosen such that the trace of this matrix is minimized, because the trace is the sum of the squared errors of the components of x. Here it is assumed that variances are additive. It is not necessary to consider a minimization of the off-diagonal elements of P̂_{xx}, since P̂_{xx} is symmetric and can thus be diagonalized; the trace is invariant under this transformation. To perform the minimization explicitly, we rewrite Eq. (A.4): From

x − x̂ = x − x̄ − K(y − ȳ)   (A.5)

and

y − ȳ = G x + η − G x̄   (A.6)

one gets

x − x̂ = x − x̄ − KG(x − x̄) − Kη = (1 − KG)(x − x̄) − Kη .   (A.7)

Therefore, using definition (9), we obtain

P̂_{xx} = (1 − KG) E[(x − x̄)(x − x̄)′] (1 − KG)′ + K R K′
       = (1 − KG) P̄_{xx} (1 − KG)′ + K R K′
       = P̄_{xx} − K G P̄_{xx} − P̄_{xx} G′ K′ + K (G P̄_{xx} G′ + R) K′ .   (A.8)

Now choose K such that

tr P̂_{xx} → min ,   (A.9)

using the relations

d tr(AB)/dA = B′   (AB square) ,   (A.10)

d tr(ACA′)/dA = 2AC   (C symmetric) ,   (A.11)

(df/dA)_{ij} := df/dA_{ij} ,   (A.12)

and

tr A = tr A′ :   (A.13)

d tr P̂_{xx} / dK = −2 P̄_{xx} G′ + 2 K (G P̄_{xx} G′ + R) = 0 .   (A.14)
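The matrix-derivative identities (A.10) and (A.11) used in this step can be verified numerically through the componentwise definition (A.12). A small sketch with random matrices that have no connection to the filter itself, checking d tr(ACA′)/dA = 2AC by central differences:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
C0 = rng.standard_normal((3, 3))
C = C0 + C0.T  # (A.11) requires a symmetric C

analytic = 2 * A @ C  # claimed gradient of tr(A C A') with respect to A

# Componentwise finite differences, following definition (A.12).
numeric = np.zeros_like(A)
h = 1e-6
for i in range(3):
    for j in range(3):
        dA = np.zeros_like(A)
        dA[i, j] = h
        f_plus = np.trace((A + dA) @ C @ (A + dA).T)
        f_minus = np.trace((A - dA) @ C @ (A - dA).T)
        numeric[i, j] = (f_plus - f_minus) / (2 * h)
```

Since tr(ACA′) is quadratic in A, the central difference agrees with the analytic gradient up to rounding error.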


Solving this for the Kalman gain yields a minimum of (A.9) at

K = P̄_{xx} G′ (G P̄_{xx} G′ + R)⁻¹ .   (A.15)

It can be checked that this is indeed a minimum. Inserting this into Eq. (A.8), one obtains

P̂_{xx} = P̄_{xx} − P̄_{xx} G′ (G P̄_{xx} G′ + R)⁻¹ G P̄_{xx} = (1 − KG) P̄_{xx} .   (A.16)
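In code, the update step combines the gain (A.15), the state update (32), and the covariance update (A.16). A minimal sketch with placeholder matrices; a linear solve replaces the explicit matrix inverse, which is the numerically preferable route:

```python
import numpy as np

def update(x_bar, P_bar, y, G, R):
    """Update step: K from (A.15), x_hat from Eq. (32), P_hat from (A.16)."""
    S = G @ P_bar @ G.T + R                       # innovation covariance
    K = np.linalg.solve(S.T, (P_bar @ G.T).T).T   # K = P_bar G' S^{-1}, via a solve
    x_hat = x_bar + K @ (y - G @ x_bar)           # Eq. (32)
    P_hat = (np.eye(len(x_bar)) - K @ G) @ P_bar  # (A.16)
    return x_hat, P_hat

# Placeholder example: two states, one observation.
G = np.array([[1.0, 0.0]])
R = np.array([[0.5]])
x_hat, P_hat = update(np.array([1.9, -1.0]), np.eye(2),
                      np.array([2.2]), G, R)
```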

It remains to compute P̄_{xx}:

P̄_{xx} = E[(x_{t+∆t} − x̄)(x_{t+∆t} − x̄)′]
       = E[(F x_t + ε_t − F x̂_{t|t})(F x_t + ε_t − F x̂_{t|t})′]
       = E[F (x_t − x̂_{t|t})(x_t − x̂_{t|t})′ F′] + Q_t
       = F P_{xx,t|t} F′ + Q_t .   (A.17)

Therefore, P̄_{xx} can be computed recursively from P̂_{xx} one time step before. This proves Eqs. (41) to (45).
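The covariance prediction (A.17), which closes the recursion, is a single further line of code. Continuing the placeholder example:

```python
import numpy as np

def predict_cov(P_filt, F, Q):
    """Covariance prediction: P_bar = F P_filt F' + Q  (A.17)."""
    return F @ P_filt @ F.T + Q

F = np.array([[1.0, 0.1],
              [0.0, 1.0]])
Q = 0.01 * np.eye(2)  # placeholder dynamical noise covariance
P_bar = predict_cov(np.eye(2), F, Q)
```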

Finally, we come to the proof of the filtering equations (33) to (40), which hold also for deterministic but nonlinear state space models, our main goal: A Gaussian random variable x ∼ N(x̄, P̄_{xx}) transforms under y = Gx + η into the Gaussian

y ∼ N(G x̄, G P̄_{xx} G′ + R)   (R = E[ηη′]) .   (A.18)

Therefore,

P̄_{yy} = G P̄_{xx} G′ + R   (A.19)

and

P̄_{xy} = P̄_{xx} G′ .   (A.20)

Inserting Eqs. (A.19) and (A.20) into Eq. (A.15) proves Eq. (35). Equation (34) follows immediately.
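The relations (A.18) to (A.20) are easy to check by Monte Carlo sampling: draw x from N(x̄, P̄_{xx}), form y = Gx + η, and compare the sample covariances with G P̄_{xx} G′ + R and P̄_{xx} G′. A sketch with arbitrary placeholder values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder model: two states, one observation.
x_bar = np.array([0.5, -0.5])
P_bar = np.array([[1.0, 0.3],
                  [0.3, 0.7]])
G = np.array([[1.0, 2.0]])
R = np.array([[0.2]])

n = 200_000
x = rng.multivariate_normal(x_bar, P_bar, size=n)   # x ~ N(x_bar, P_bar)
eta = rng.multivariate_normal([0.0], R, size=n)     # eta ~ N(0, R)
y = x @ G.T + eta                                   # y = G x + eta

Pyy = (G @ P_bar @ G.T + R).item()                  # (A.19)
Pxy = P_bar @ G.T                                   # (A.20)
Pyy_mc = np.cov(y.T)                                # sample estimate of Pyy
Pxy_mc = (x - x_bar).T @ (y - y.mean(0)) / (n - 1)  # sample estimate of Pxy
```

With 200 000 samples the Monte Carlo estimates agree with (A.19) and (A.20) to within a few percent.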