This material is protected by copyright and is intended for students' use only. Sale and distribution are strictly forbidden. The final exam requires integrating this material with teacher explanations and textbooks.

Model Identification and Data Analysis (MIDA) - MODEL IDENTIFICATION


Page 1: MIDA Identification


MODEL IDENTIFICATION

Page 2: MIDA Identification


Up to now, we considered (ARMA/ARMAX) models and studied their properties: covariance and spectrum computation, prediction.

Basic question: who gives the model equations?

"Simple phenomena": the model equations are obtained by combining and interconnecting simple physical laws.

In many cases, however, the underlying physical phenomenon is too complicated to proceed this way (simple physical laws are not available, e.g. atmospheric pressure, stock exchange, etc.).

Moreover, by combining many simple laws one may eventually obtain models which are too complicated for any purpose (e.g. a ship model made of 10000 difference equations: who can use it?).

[Block diagram of the ARMAX model: $u(t-d)$ enters the block $B(z)/A(z)$, $e(t)$ enters the block $C(z)/A(z)$, and the two outputs are summed to give $y(t)$.]

Page 3: MIDA Identification


Model identification: retrieve a suitable model from experiments performed on the real system.

Identification problem: define an automatic procedure to find a model for S based on the available (input/output or time series) data.

[Figure: experiment on the true system S, fed by $u(t)$ and producing $y(t)$; the collected data are $\{u(1), u(2), \dots, u(N)\}$ and $\{y(1), y(2), \dots, y(N)\}$. Below, the ARMAX block diagram: $u(t-d)$ through $B(z)/A(z)$ and $e(t)$ through $C(z)/A(z)$, summed to give $y(t)$.]

Page 4: MIDA Identification


Observation: ARMA/ARMAX models are characterized by the coefficients (parameters) of their numerator and denominator polynomials.

We will talk of parametric identification.

There are also NON-parametric methods, where one directly estimates the probabilistic properties of the process (mean, covariance function, spectrum) from the available data.

We will talk about NON-parametric methods later.

Page 5: MIDA Identification


Parametric model identification at a glance

Five steps in identification:

1. Experiment design and data collection;

2. Selection of a parametric model class $\mathcal{M} = \{ M(\vartheta),\ \vartheta \in \Theta \}$
($\vartheta$ = parameter vector; each different $\vartheta$ corresponds to a different model);

3. Choice of the identification criterion $J_N(\vartheta) \ge 0$
(it measures the performance of the model corresponding to $\vartheta$ in describing the available data)

$\hat{\vartheta}_N = \arg\min_{\vartheta} J_N(\vartheta)$,

i.e. the "best" model is the one minimizing the identification criterion;

4. Minimization of $J_N(\vartheta)$ with respect to $\vartheta$
(this minimization process will lead us to $\hat{\vartheta}_N$);

5. [Model validation]
Once the optimal model $M(\hat{\vartheta}_N)$ has been obtained, we verify whether this model is actually a good one. If it is not, the identification process must be repeated.

Page 6: MIDA Identification


1. Experiment design and data collection

Basically already discussed... Issues when performing data collection:

• Choice of the data length N [depending on the uncertainty]

• Design of the input u(t) [for I/O systems]

2. Choice of the parametric model class $\mathcal{M} = \{ M(\vartheta),\ \vartheta \in \Theta \}$

Many options in general:

Discrete time vs. continuous time
Linear vs. nonlinear
Time invariant vs. time variant
Static vs. dynamic

We will focus on ARMA/ARMAX models:

$M(\vartheta):\quad y(t) = \dfrac{b_1 + b_2 z^{-1} + \dots + b_p z^{-p+1}}{1 - a_1 z^{-1} - \dots - a_m z^{-m}}\, u(t-d) + \dfrac{1 + c_1 z^{-1} + \dots + c_n z^{-n}}{1 - a_1 z^{-1} - \dots - a_m z^{-m}}\, e(t)$

where $e(t) \sim WN(0, \lambda^2)$.

We will consider the identification of zero-mean processes first. The general case will be a trivial extension.
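As an illustration only (not part of the original notes), the following minimal Python sketch simulates data from one model of this class using scipy.signal.lfilter; the orders (m = 2, p = 2, n = 1), the delay d = 1 and all coefficient values are arbitrary choices made for the example.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)

# Illustrative ARMAX model (all orders and coefficients are arbitrary choices):
# A(z) = 1 - 0.7 z^-1 + 0.1 z^-2,  B(z) = 0.5 + 0.2 z^-1,  C(z) = 1 + 0.3 z^-1,  d = 1
A = np.array([1.0, -0.7, 0.1])
B = np.array([0.5, 0.2])
C = np.array([1.0, 0.3])
d = 1

N = 500
u = rng.standard_normal(N)                      # input signal (here just white noise)
e = rng.standard_normal(N)                      # e(t) ~ WN(0, 1)
u_del = np.concatenate([np.zeros(d), u[:-d]])   # u(t - d)

# y(t) = B(z)/A(z) u(t-d) + C(z)/A(z) e(t)
y = lfilter(B, A, u_del) + lfilter(C, A, e)
```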

Page 7: MIDA Identification


What is the parameter vector?

$\vartheta = [\, a_1 \dots a_m \;\; b_1 \dots b_p \;\; c_1 \dots c_n \,]^T$

It is the $n_\vartheta = m + p + n$-dimensional vector of the coefficients of the numerator and denominator polynomials.

Observation: $\lambda^2$ is a parameter too, which needs to be identified. As we will see, however, $\lambda^2$ is much less important than the other parameters. So, we will indicate by $\vartheta$ the vector of "important" parameters and keep $\lambda^2$ aside.

We will also write for short

$M(\vartheta):\quad y(t) = \dfrac{B(z,\vartheta)}{A(z,\vartheta)}\, u(t-d) + \dfrac{C(z,\vartheta)}{A(z,\vartheta)}\, e(t), \qquad e(t) \sim WN(0, \lambda^2)$

$\Theta$ is the set of admissible values for the parameter vector $\vartheta$. It incorporates a-priori information on the possible values of the parameters, e.g. $\Theta = \{ \vartheta : a_1 < 1 \ \wedge\ b_1 > 0 \}$.

Page 8: MIDA Identification


As we will see, to perform identification we will rely on the theory of prediction. Hence, we will assume the following.

ASSUMPTION

For every $\vartheta \in \Theta$, the stochastic part of $M(\vartheta)$ (i.e. the part depending on the white noise $e(t) \sim WN(0, \lambda^2)$) is canonical and has no zeros on the unit circle.

In other words, we want to identify models in canonical representation (this is not an issue, since canonical and non-canonical representations are all equivalent).

The requirement that there are no zeros on the unit circle instead poses some limitations on the systems we can identify. However:

− zeros on the unit circle are not usually required to model the behavior of a given system;

− the behavior of models with zeros on the unit circle can be approximated by means of models with zeros close to the unit circle.

Page 9: MIDA Identification


Observation: d is a fixed time delay. m, p, n are the model orders and are fixed for the moment.

Note that m, p, n can be equal to 0 too. E.g. p = 0 corresponds to ARMA models.

Importantly enough, for n = 0 we obtain the important class of ARX models (AR models if p = 0).

Observation: sometimes it may be useful to consider ARMA and ARMAX models with some fixed structure.

Example

$M(\vartheta):\quad y(t) = \dfrac{b_1 + b_2 z^{-1}}{1 - a z^{-1}}\, u(t-d) + \dfrac{1 + a z^{-1}}{1 - a z^{-1}}\, e(t)$

Here, the parameter vector is given by $\vartheta = [\, a \;\; b_1 \;\; b_2 \,]^T$ only (note that the same parameter $a$ appears in both $A(z)$ and $C(z)$).

This type of model is useful when the structure of the system to be identified is partially known (grey-box identification).

We will talk of black-box identification, instead, when no knowledge of the system is available and the model structure must be found from data only.

Page 10: MIDA Identification


3. Choice of the identification criterion $J_N(\vartheta) \ge 0$

$J_N(\vartheta)$ must measure the capability of the model $M(\vartheta)$ of describing the collected data $\{u(1), \dots, u(N), y(1), \dots, y(N)\}$.

Observation

After the measurement process, $\{u(1), \dots, u(N), y(1), \dots, y(N)\}$ is a numerical sequence (it is a sequence of 2N real numbers).

$M(\vartheta):\ y(t) = \dfrac{B(z,\vartheta)}{A(z,\vartheta)}\, u(t-d) + \dfrac{C(z,\vartheta)}{A(z,\vartheta)}\, e(t)$ is instead a stochastic model (there are infinitely many possible realizations of the output).

How can we compare a numerical sequence and a stochastic model?

IDEA (predictive approach): generate predictors from models and evaluate the capability of each model of predicting the system behavior.

The predictor must be fed with past inputs and outputs, so we can feed it with the available data record and evaluate its performance on it.

The best model is the one with the best predictive performance.

Page 11: MIDA Identification


$M(\vartheta):\quad y(t) = \dfrac{B(z,\vartheta)}{A(z,\vartheta)}\, u(t-d) + \dfrac{C(z,\vartheta)}{A(z,\vartheta)}\, e(t), \qquad e(t) \sim WN(0, \lambda^2)$

The model is stochastic, and its output is stochastic too.

From stochastic models to predictor models:

$\hat{M}(\vartheta):\quad \hat{y}(t|t-1, \vartheta) = \dfrac{B(z,\vartheta)\, E(z,\vartheta)}{C(z,\vartheta)}\, u(t-d) + \dfrac{F(z,\vartheta)}{C(z,\vartheta)}\, y(t-1)$

The predictor model returns a deterministic output once it is fed with numerical data, and the returned $\hat{y}(t|t-1)$ can be compared against the real output of the system.

[Block diagrams: $M(\vartheta)$ is fed by $u(t)$ and $e(t)$ and returns $y(t)$; $\hat{M}(\vartheta)$ is fed by $u(t)$ and $y(t)$ and returns $\hat{y}(t|t-1)$.]

Note that predictor models do not depend on $\lambda^2$. That is why $\lambda^2$ is not an "important" parameter: it is not needed to compute predictions.

Page 12: MIDA Identification


The PEM (Prediction Error Minimization) identification scheme

More precisely:

t = 1: data u(1)/y(1), prediction $\hat{y}(1|0, \vartheta)$, prediction error $\varepsilon(1, \vartheta) = y(1) - \hat{y}(1|0, \vartheta)$
t = 2: data u(2)/y(2), prediction $\hat{y}(2|1, \vartheta)$, prediction error $\varepsilon(2, \vartheta) = y(2) - \hat{y}(2|1, \vartheta)$
...
t = i: data u(i)/y(i), prediction $\hat{y}(i|i-1, \vartheta)$, prediction error $\varepsilon(i, \vartheta) = y(i) - \hat{y}(i|i-1, \vartheta)$
...
t = N: data u(N)/y(N), prediction $\hat{y}(N|N-1, \vartheta)$, prediction error $\varepsilon(N, \vartheta) = y(N) - \hat{y}(N|N-1, \vartheta)$

The predicted values are returned by the model $\hat{M}(\vartheta)$ (they depend on $\vartheta$).

[Scheme: the true system S and the predictor $\hat{M}(\vartheta)$ are fed with the same data; the difference between $y(t)$ and $\hat{y}(t|t-1, \vartheta)$ is the prediction error $\varepsilon(t, \vartheta)$, which is to be minimized.]

Page 13: MIDA Identification


PEM identification criterion

$J_N(\vartheta) = \dfrac{1}{N} \sum_{i=1}^{N} \big( y(i) - \hat{y}(i|i-1, \vartheta) \big)^2 = \dfrac{1}{N} \sum_{i=1}^{N} \varepsilon(i, \vartheta)^2$

i.e. it is the empirical variance of the prediction error (a global performance index with respect to all the available data).

PEM best model

$\hat{\vartheta}_N = \arg\min_{\vartheta \in \Theta} J_N(\vartheta) = \arg\min_{\vartheta \in \Theta} \dfrac{1}{N} \sum_{i=1}^{N} \varepsilon(i, \vartheta)^2$

i.e. the best model is the one minimizing the empirical prediction error variance.

Identification of the noise variance

$\hat{\lambda}_N^2 = J_N(\hat{\vartheta}_N) = \dfrac{1}{N} \sum_{i=1}^{N} \varepsilon(i, \hat{\vartheta}_N)^2$

Underlying idea: if $M(\hat{\vartheta}_N) = S$ (i.e. the true system was perfectly identified), we would have $\varepsilon(t, \hat{\vartheta}_N) = e(t)$ and $\lambda^2 = E[\varepsilon(t, \hat{\vartheta}_N)^2]$. To compute $\hat{\lambda}_N^2$ from data, $E[\cdot]$ is approximated with its empirical counterpart $\frac{1}{N}\sum_{i=1}^{N}$.
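A minimal numerical sketch of these two formulas (assuming the measured outputs y and the one-step predictions y_hat are already available as arrays; the numbers below are illustrative):

```python
import numpy as np

def pem_cost(y, y_hat):
    """J_N = empirical variance of the prediction error, (1/N) * sum eps(i)^2."""
    eps = y - y_hat                  # prediction errors eps(i, theta)
    return np.mean(eps ** 2)

# Illustrative data: once y_hat is produced by the optimal model M(theta_hat_N),
# the same number is also the noise variance estimate lambda_hat_N^2 = J_N(theta_hat_N).
y     = np.array([1.0, 0.5, -0.2, 0.1])
y_hat = np.array([0.8, 0.4, -0.1, 0.0])
J_N = pem_cost(y, y_hat)
lambda2_hat = J_N
```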

Page 14: MIDA Identification


4. Minimization of $J_N(\vartheta)$ with respect to $\vartheta$

$J_N(\cdot): \mathbb{R}^{n_\vartheta} \to \mathbb{R}^+$

Example: $n_\vartheta = 1$. [Figure: plot of $J_N(\vartheta)$ versus $\vartheta$, with the minimum at $\hat{\vartheta}_N$.]

The computational complexity of the problem of minimizing $J_N(\vartheta)$ depends on the form of the function $J_N(\cdot): \mathbb{R}^{n_\vartheta} \to \mathbb{R}^+$.

There are two main relevant cases:

AR / ARX → $J_N(\vartheta)$ quadratic
ARMA, MA / ARMAX → $J_N(\vartheta)$ not quadratic

Page 15: MIDA Identification


1. $J_N(\vartheta)$ is a quadratic function of $\vartheta$

[Figure: quadratic $J_N(\vartheta)$ with the minimum at $\hat{\vartheta}_N$.]

The minimum can be explicitly computed.

2. $J_N(\vartheta)$ is not quadratic

[Figure: non-quadratic $J_N(\vartheta)$ with several minima; $\hat{\vartheta}_N$ is the global one.]

The minimum must be sought by means of iterative numerical methods:

• Gradient methods
• Newton methods
• Quasi-Newton methods

Iterative methods guarantee the convergence towards a minimum of $J_N(\vartheta)$. Yet, there could be local minima!!

Page 16: MIDA Identification


IDENTIFICATION OF AR / ARX MODELS
(Least Squares (LS) method)

Generic ARX model:

$M(\vartheta):\quad y(t) = \dfrac{B(z)}{A(z)}\, u(t-d) + \dfrac{1}{A(z)}\, e(t)$, where $e(t) \sim WN(0, \lambda^2)$

$A(z) = 1 - a_1 z^{-1} - a_2 z^{-2} - \dots - a_m z^{-m}$
$B(z) = b_1 + b_2 z^{-1} + b_3 z^{-2} + \dots + b_p z^{-p+1}$

$\vartheta = [\, a_1 \dots a_m \;\; b_1 \dots b_p \,]^T$ (column vector, dimension $n_\vartheta = m + p$)

$M(\vartheta):\quad y(t) = \big(1 - A(z)\big)\, y(t) + B(z)\, u(t-d) + e(t)$

$y(t) = \big(a_1 z^{-1} + \dots + a_m z^{-m}\big)\, y(t) + \big(b_1 + b_2 z^{-1} + \dots + b_p z^{-p+1}\big)\, u(t-d) + e(t)$

$y(t) = \underbrace{a_1 y(t-1) + \dots + a_m y(t-m) + b_1 u(t-d) + \dots + b_p u(t-q)}_{\text{predictable at time } t-1} + \underbrace{e(t)}_{\text{unpredictable at time } t-1}$

N.B. $q = d + p - 1$

Page 17: MIDA Identification


Models in prediction form

$\hat{M}(\vartheta):\quad \hat{y}(t|t-1) = a_1 y(t-1) + a_2 y(t-2) + \dots + a_m y(t-m) + b_1 u(t-d) + b_2 u(t-d-1) + \dots + b_p u(t-q)$

Compact notation:

$\vartheta = [\, a_1 \dots a_m \;\; b_1 \dots b_p \,]^T$ (parameter vector; it is a column vector with dimension $n_\vartheta = m + p$)

$\varphi(t) = [\, y(t-1) \dots y(t-m) \;\; u(t-d) \dots u(t-q) \,]^T$

$\varphi(t)$ is called the regression vector (or regressor) and is a column vector. Its dimension is $n_\vartheta = m + p$ as well.

Then,

$\hat{M}(\vartheta):\quad \hat{y}(t|t-1, \vartheta) = \varphi(t)^T \vartheta = \vartheta^T \varphi(t)$ (scalar product)

Observation

$\hat{y}(t|t-1, \vartheta)$ depends linearly on $\vartheta$.
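A minimal sketch of this compact notation (the data arrays, the orders m, p and the delay d below are illustrative assumptions): the function stacks the rows $\varphi(t)^T$ into a matrix, so that all one-step predictions are obtained at once as $\Phi\vartheta$.

```python
import numpy as np

def build_regressors(y, u, m, p, d):
    """Stack phi(t)^T = [y(t-1) ... y(t-m)  u(t-d) ... u(t-q)], q = d + p - 1, row by row."""
    N = len(y)
    t0 = max(m, d + p - 1)                    # first t (0-based) with all past samples available
    Phi = np.zeros((N - t0, m + p))
    for row, t in enumerate(range(t0, N)):
        past_y = [y[t - k] for k in range(1, m + 1)]
        past_u = [u[t - d - k] for k in range(0, p)]
        Phi[row] = past_y + past_u
    return Phi, np.arange(t0, N)

# Usage with illustrative data: all predictions y_hat(t|t-1, theta) at once as Phi @ theta
y = np.random.randn(100)
u = np.random.randn(100)
Phi, t_idx = build_regressors(y, u, m=2, p=2, d=1)
theta = np.array([0.5, -0.1, 1.0, 0.3])       # [a1 a2 b1 b2]
y_hat = Phi @ theta
```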

Page 18: MIDA Identification


Identification criterion

$J_N(\vartheta) = \dfrac{1}{N} \sum_{t=1}^{N} \big( y(t) - \hat{y}(t|t-1; \vartheta) \big)^2$

$J_N(\vartheta) = \dfrac{1}{N} \sum_{t=1}^{N} \big( y(t) - \varphi(t)^T \vartheta \big)^2$

Since $\hat{y}(t|t-1) = \varphi(t)^T \vartheta$ is linear in $\vartheta$, $J_N(\vartheta)$ turns out to be a quadratic function of $\vartheta$ ⇒ the minimum can be explicitly computed.

Optimization theory gives us the conditions to find the minimum:

$\left. \dfrac{\mathrm{d} J_N(\vartheta)}{\mathrm{d}\vartheta} \right|_{\vartheta = \hat{\vartheta}_N} = 0$
(the derivative vector must be null: condition to find stationary points)

$\left. \dfrac{\mathrm{d}^2 J_N(\vartheta)}{\mathrm{d}\vartheta^2} \right|_{\vartheta = \hat{\vartheta}_N} \ge 0$
(the Hessian matrix must be positive semi-definite: condition for spotting minimum points)

Page 19: MIDA Identification


Observation: by definition, the derivative vector is

$\dfrac{\mathrm{d} J_N(\vartheta)}{\mathrm{d}\vartheta} = \left[ \dfrac{\partial J_N(\vartheta)}{\partial a_1} \;\; \dots \;\; \dfrac{\partial J_N(\vartheta)}{\partial a_m} \;\; \dfrac{\partial J_N(\vartheta)}{\partial b_1} \;\; \dots \;\; \dfrac{\partial J_N(\vartheta)}{\partial b_p} \right]^T$

(it is a column vector)

Let us compute the derivative vector:

$\dfrac{\mathrm{d} J_N(\vartheta)}{\mathrm{d}\vartheta} = \dfrac{\mathrm{d}}{\mathrm{d}\vartheta} \dfrac{1}{N} \sum_{t=1}^{N} \big( y(t) - \varphi(t)^T \vartheta \big)^2$

$= \dfrac{1}{N} \sum_{t=1}^{N} \dfrac{\mathrm{d}}{\mathrm{d}\vartheta} \big( y(t) - \varphi(t)^T \vartheta \big)^2$ (the derivative is a linear operator)

$= \dfrac{1}{N} \sum_{t=1}^{N} 2 \big( y(t) - \varphi(t)^T \vartheta \big) \dfrac{\mathrm{d}}{\mathrm{d}\vartheta} \big( y(t) - \varphi(t)^T \vartheta \big)$ (basic rule of derivation)

$= \dfrac{1}{N} \sum_{t=1}^{N} 2 \big( y(t) - \varphi(t)^T \vartheta \big) \big( -\varphi(t) \big)$

$= -\dfrac{2}{N} \sum_{t=1}^{N} \varphi(t) \big( y(t) - \varphi(t)^T \vartheta \big)$

This expression is linear in $\vartheta$; recall that we are considering the derivative vector as a column vector.

Page 20: MIDA Identification


By letting $\dfrac{\mathrm{d} J_N(\vartheta)}{\mathrm{d}\vartheta} = 0$ we get

$-\dfrac{2}{N} \sum_{t=1}^{N} \varphi(t) \big( y(t) - \varphi(t)^T \vartheta \big) = 0$

$-\dfrac{2}{N} \sum_{t=1}^{N} \varphi(t)\, y(t) + \dfrac{2}{N} \sum_{t=1}^{N} \varphi(t)\, \varphi(t)^T \vartheta = 0$

Least squares (LS) normal equations

$\left[ \sum_{t=1}^{N} \varphi(t)\, \varphi(t)^T \right] \vartheta = \sum_{t=1}^{N} \varphi(t)\, y(t)$

We have a linear system of $n_\vartheta$ equations in $n_\vartheta$ unknowns. The solutions correspond to the stationary points of our identification criterion.

[Dimensional scheme: the left-hand side is an $n_\vartheta \times n_\vartheta$ matrix times an $n_\vartheta \times 1$ vector; the right-hand side is an $n_\vartheta \times 1$ vector.]

Page 21: MIDA Identification


If $\sum_{t=1}^{N} \varphi(t)\, \varphi(t)^T$ is NOT singular, and hence invertible:

Least squares (LS) formula

$\hat{\vartheta}_N = \left[ \sum_{t=1}^{N} \varphi(t)\, \varphi(t)^T \right]^{-1} \sum_{t=1}^{N} \varphi(t)\, y(t)$

The solution is unique and is explicitly computed.
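A minimal sketch of the LS formula on synthetic data (the ARX system, its orders and all numerical values are illustrative): the normal equations are solved directly; np.linalg.lstsq would return the same estimate and also copes with the (nearly) singular case discussed below.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 400
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)

# Illustrative ARX system (m = 2, p = 1, d = 1):
# y(t) = 0.6 y(t-1) - 0.2 y(t-2) + 0.8 u(t-1) + e(t)
y = np.zeros(N)
for t in range(2, N):
    y[t] = 0.6 * y[t - 1] - 0.2 * y[t - 2] + 0.8 * u[t - 1] + e[t]

# Regressors phi(t) = [y(t-1) y(t-2) u(t-1)]^T for t = 2, ..., N-1
Phi = np.column_stack([y[1:N - 1], y[0:N - 2], u[1:N - 1]])
Y = y[2:N]

# LS formula: theta_hat = [sum phi phi^T]^{-1} sum phi y
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)
print(theta_hat)                              # close to [0.6, -0.2, 0.8]
```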

Are the solutions of the normal equations minimum points? Yes:

$\dfrac{\mathrm{d} J_N(\vartheta)}{\mathrm{d}\vartheta} = -\dfrac{2}{N} \sum_{t=1}^{N} \varphi(t) \big( y(t) - \varphi(t)^T \vartheta \big)$

$\dfrac{\mathrm{d}^2 J_N(\vartheta)}{\mathrm{d}\vartheta^2} = \dfrac{\mathrm{d}}{\mathrm{d}\vartheta} \dfrac{\mathrm{d} J_N(\vartheta)}{\mathrm{d}\vartheta} = \dfrac{2}{N} \sum_{t=1}^{N} \varphi(t)\, \varphi(t)^T$ (it does not depend on $\vartheta$)

Is it positive semi-definite?

Recall: a square matrix M is positive semi-definite if

$x^T M\, x \ge 0, \quad \forall x \ne 0$

Page 22: MIDA Identification


In our case:

$x^T \dfrac{\mathrm{d}^2 J_N(\vartheta)}{\mathrm{d}\vartheta^2}\, x = x^T \left[ \dfrac{2}{N} \sum_{t=1}^{N} \varphi(t)\, \varphi(t)^T \right] x = \dfrac{2}{N} \sum_{t=1}^{N} x^T \varphi(t)\, \varphi(t)^T x$

[since $\varphi(t)^T x = x^T \varphi(t)$]

$= \dfrac{2}{N} \sum_{t=1}^{N} \big( \varphi(t)^T x \big)^2 \ge 0$

The Hessian is always positive semi-definite, so the solutions of the normal equations are always minimum points.

There are two possible cases.

Case 1. $\dfrac{\mathrm{d}^2 J_N(\vartheta)}{\mathrm{d}\vartheta^2} = \dfrac{2}{N} \sum_{t=1}^{N} \varphi(t)\, \varphi(t)^T$ is non-singular, i.e. invertible.

$J_N(\vartheta)$ is a paraboloid with a unique minimum point $\hat{\vartheta}_N$, as given by the LS formula.

[Figure: paraboloid $J_N(\vartheta)$ over $(\vartheta_1, \vartheta_2)$ with a single minimum at $\hat{\vartheta}_N$.]

Page 23: MIDA Identification


Case 2. $\dfrac{\mathrm{d}^2 J_N(\vartheta)}{\mathrm{d}\vartheta^2} = \dfrac{2}{N} \sum_{t=1}^{N} \varphi(t)\, \varphi(t)^T$ is singular, i.e. not invertible.

$J_N(\vartheta)$ is a paraboloid, but a degenerate one, with an infinite number of minimum points, which are the solutions of the normal equations.

In this case, all the solutions of the normal equations are equivalent for prediction purposes and the "best" model can be chosen at will among them.

Warning: the presence of multiple global minima means that

1. the data record was not representative enough of the underlying physical phenomenon, or

2. the chosen model class was too complex and there are equivalent models for describing the same phenomenon.

[Figure: degenerate paraboloid $J_N(\vartheta)$ over $(\vartheta_1, \vartheta_2)$ with a whole valley of minimum points.]

Page 24: MIDA Identification


IDENTIFICATION OF ARMA / ARMAX MODELS
(Maximum Likelihood (ML) method)

Generic ARMAX model:

$M(\vartheta):\quad y(t) = \dfrac{B(z)}{A(z)}\, u(t-d) + \dfrac{C(z)}{A(z)}\, e(t)$, where $e(t) \sim WN(0, \lambda^2)$

$A(z) = 1 - a_1 z^{-1} - a_2 z^{-2} - \dots - a_m z^{-m}$
$B(z) = b_1 + b_2 z^{-1} + b_3 z^{-2} + \dots + b_p z^{-p+1}$
$C(z) = 1 + c_1 z^{-1} + c_2 z^{-2} + \dots + c_n z^{-n}$

$\vartheta = [\, a_1 \dots a_m \;\; b_1 \dots b_p \;\; c_1 \dots c_n \,]^T$ (dimension $n_\vartheta = m + p + n$)

$M(\vartheta)$ is canonical $\forall\, \vartheta \in \Theta$.

1-step division between $C(z)$ and $A(z)$ (they are monic):

$\dfrac{C(z)}{A(z)} = E(z) + \dfrac{z^{-1} F(z)}{A(z)}, \qquad E(z) = 1, \qquad z^{-1} F(z) = C(z) - A(z)$

Page 25: MIDA Identification


Model in prediction form

$\hat{M}(\vartheta):\quad \hat{y}(t|t-1, \vartheta) = \dfrac{C(z) - A(z)}{C(z)}\, y(t) + \dfrac{B(z)}{C(z)}\, u(t-d)$

Prediction error

$\varepsilon(t, \vartheta) = y(t) - \hat{y}(t|t-1, \vartheta) = \left[ 1 - \dfrac{C(z) - A(z)}{C(z)} \right] y(t) - \dfrac{B(z)}{C(z)}\, u(t-d)$

$\varepsilon(t, \vartheta) = \dfrac{A(z)}{C(z)}\, y(t) - \dfrac{B(z)}{C(z)}\, u(t-d)$

Identification criterion: $J_N(\vartheta) = \dfrac{1}{N} \sum_{t=1}^{N} \varepsilon(t; \vartheta)^2$

Problem: due to $C(z)$ at the denominator, $\varepsilon(t, \vartheta)$ is not linear with respect to $\vartheta$, and the identification criterion $J_N(\vartheta)$ is not a quadratic function of $\vartheta$.

In general, $J_N(\vartheta)$ may present local minima.
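A minimal sketch (under illustrative orders and data) of how $\varepsilon(t, \vartheta)$ can be evaluated by filtering the data, e.g. with scipy.signal.lfilter, following $\varepsilon = (A/C)\, y - (B/C)\, u(t-d)$:

```python
import numpy as np
from scipy.signal import lfilter

def prediction_error(theta, y, u, m, p, n, d):
    """eps(t, theta) = A(z)/C(z) y(t) - B(z)/C(z) u(t-d) for an ARMAX(m, p, n) model."""
    a, b, c = theta[:m], theta[m:m + p], theta[m + p:]
    A = np.concatenate([[1.0], -a])               # A(z) = 1 - a1 z^-1 - ... - am z^-m
    B = b                                         # B(z) = b1 + ... + bp z^-(p-1)
    C = np.concatenate([[1.0], c])                # C(z) = 1 + c1 z^-1 + ... + cn z^-n
    u_del = np.concatenate([np.zeros(d), u[:-d]]) if d > 0 else u
    return lfilter(A, C, y) - lfilter(B, C, u_del)

# Usage with illustrative data and theta = [a1 b1 c1]:
y = np.random.randn(200)
u = np.random.randn(200)
eps = prediction_error(np.array([0.5, 1.0, 0.3]), y, u, m=1, p=1, n=1, d=1)
J_N = np.mean(eps ** 2)                           # identification criterion J_N(theta)
```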

Page 26: MIDA Identification


Computing $\hat{\vartheta}_N = \arg\min_{\vartheta \in \Theta} J_N(\vartheta)$ (i.e. minimizing $J_N(\vartheta)$) requires iterative methods:

• The algorithm is initialized with an initial estimate (typically, randomly chosen) of the optimal parameter vector: $\vartheta_1$

• Update rule: $\vartheta_{i+1} = f(\vartheta_i)$ (the estimate is refined step by step)

• The sequence of estimates should converge to $\hat{\vartheta}_N$:

$\vartheta_1 \to \vartheta_2 \to \vartheta_3 \to \dots \to \vartheta_i \to \vartheta_{i+1} \to \dots \to \hat{\vartheta}_N$

[Figure: non-quadratic $J_N(\vartheta)$ with the iterates $\vartheta_1, \vartheta_2, \vartheta_3, \dots$ approaching $\hat{\vartheta}_N$.]

Page 27: MIDA Identification


Problem: local minima

Typically, iterative algorithms are guaranteed to converge to a minimum, which however could be a local one.

There are no analytical solutions, just empirical approaches:

• The iterative algorithm is applied M times, each time using a different (randomly chosen) initialization:

1) $\vartheta_1^{(1)} \to \vartheta_2^{(1)} \to \dots \to \hat{\vartheta}_N^{(1)}$
2) $\vartheta_1^{(2)} \to \vartheta_2^{(2)} \to \dots \to \hat{\vartheta}_N^{(2)}$
3) $\vartheta_1^{(3)} \to \vartheta_2^{(3)} \to \dots \to \hat{\vartheta}_N^{(3)}$
...
M) $\vartheta_1^{(M)} \to \vartheta_2^{(M)} \to \dots \to \hat{\vartheta}_N^{(M)}$

This way, we obtain M different solutions corresponding to minima of $J_N(\vartheta)$.

• Among the M different solutions $\hat{\vartheta}_N^{(i)}$, choose the one which corresponds to the minimum value of $J_N(\hat{\vartheta}_N^{(i)})$.
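A minimal sketch of this multi-start strategy (it assumes some callable J_N, for instance built from the prediction-error sketch above, and uses scipy.optimize.minimize as a generic local iterative minimizer in place of the Newton-type update discussed later):

```python
import numpy as np
from scipy.optimize import minimize

def multistart(J_N, n_theta, M, rng=None):
    """Run a local minimizer from M random initializations and keep the best result."""
    rng = np.random.default_rng(0) if rng is None else rng
    best = None
    for _ in range(M):
        theta_1 = rng.uniform(-0.9, 0.9, size=n_theta)   # random initial estimate
        res = minimize(J_N, theta_1)                     # iterative local minimization
        if best is None or res.fun < best.fun:
            best = res
    return best.x, best.fun                              # best estimate and its J_N value

# Usage (with the prediction_error sketch above):
# theta_hat, J_min = multistart(lambda th: np.mean(prediction_error(th, y, u, 1, 1, 1, 1)**2), 3, M=10)
```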

Page 28: MIDA Identification


[Figure: $J_N(\vartheta)$ with several local minima and the global minimum at $\hat{\vartheta}_N$.]

This is an empirical method only: it may happen that the global minimum is not found.

Clearly, the bigger M, the greater the probability of finding the global minimum,

BUT

the bigger M, the higher the computational complexity.

Page 29: MIDA Identification


We are now ready to discuss the update rule $\vartheta_{i+1} = f(\vartheta_i)$ of an iterative method.

[Figure: $J_N(\vartheta)$ with the iterates $\vartheta_1, \vartheta_2, \vartheta_3, \dots$]

We will present the so-called Newton method, which guarantees that the obtained sequence of estimates converges to a local minimum of $J_N(\vartheta)$.

Page 30: MIDA Identification


Newton Method

Fundamental problem: how do I obtain $\vartheta_{i+1}$ based on $\vartheta_i$?

Idea: let $V_i(\vartheta)$ be the 2nd-order Taylor approximation of $J_N(\vartheta)$ in the neighborhood of $\vartheta_i$:

$V_i(\vartheta) = J_N(\vartheta_i) + \left. \dfrac{\mathrm{d} J_N(\vartheta)}{\mathrm{d}\vartheta} \right|_{\vartheta = \vartheta_i}^{T} \big( \vartheta - \vartheta_i \big) + \dfrac{1}{2} \big( \vartheta - \vartheta_i \big)^T \left. \dfrac{\mathrm{d}^2 J_N(\vartheta)}{\mathrm{d}\vartheta^2} \right|_{\vartheta = \vartheta_i} \big( \vartheta - \vartheta_i \big)$

Then, $\vartheta_{i+1}$ is obtained as the minimum of $V_i(\vartheta)$, i.e. it is the minimum of the 2nd-order Taylor approximation of $J_N(\vartheta)$ about $\vartheta_i$.

[Figure: $J_N(\vartheta)$ and its quadratic approximation $V_i(\vartheta)$ around $\vartheta_i$; $\vartheta_{i+1}$ is the minimum of $V_i(\vartheta)$.]

Page 31: MIDA Identification


Let us compute an explicit expression for $\vartheta_{i+1}$ by letting $\dfrac{\mathrm{d} V_i(\vartheta)}{\mathrm{d}\vartheta} = 0$:

$\left. \dfrac{\mathrm{d} V_i(\vartheta)}{\mathrm{d}\vartheta} \right|_{\vartheta = \vartheta_{i+1}} = \left. \dfrac{\mathrm{d} J_N(\vartheta)}{\mathrm{d}\vartheta} \right|_{\vartheta = \vartheta_i} + \left. \dfrac{\mathrm{d}^2 J_N(\vartheta)}{\mathrm{d}\vartheta^2} \right|_{\vartheta = \vartheta_i} \big( \vartheta_{i+1} - \vartheta_i \big) = 0$

Update rule of the Newton method

$\vartheta_{i+1} = \vartheta_i - \left[ \left. \dfrac{\mathrm{d}^2 J_N(\vartheta)}{\mathrm{d}\vartheta^2} \right|_{\vartheta = \vartheta_i} \right]^{-1} \left. \dfrac{\mathrm{d} J_N(\vartheta)}{\mathrm{d}\vartheta} \right|_{\vartheta = \vartheta_i}$

It remains to compute:

$\left. \dfrac{\mathrm{d} J_N(\vartheta)}{\mathrm{d}\vartheta} \right|_{\vartheta = \vartheta_i}$ : derivative vector of $J_N(\vartheta)$ (1st derivative)

$\left. \dfrac{\mathrm{d}^2 J_N(\vartheta)}{\mathrm{d}\vartheta^2} \right|_{\vartheta = \vartheta_i}$ : Hessian matrix of $J_N(\vartheta)$ (2nd derivative)

[Dimensional scheme: the update combines an $n_\vartheta \times 1$ vector, the inverse of an $n_\vartheta \times n_\vartheta$ matrix, and an $n_\vartheta \times 1$ vector.]

Page 32: MIDA Identification


• Let us compute $\dfrac{\mathrm{d} J_N(\vartheta)}{\mathrm{d}\vartheta}$:

$\dfrac{\mathrm{d} J_N(\vartheta)}{\mathrm{d}\vartheta} = \dfrac{\mathrm{d}}{\mathrm{d}\vartheta} \dfrac{1}{N} \sum_{t=1}^{N} \varepsilon(t, \vartheta)^2 = \dfrac{1}{N} \sum_{t=1}^{N} \dfrac{\mathrm{d}}{\mathrm{d}\vartheta}\, \varepsilon(t, \vartheta)^2$

$\dfrac{\mathrm{d} J_N(\vartheta)}{\mathrm{d}\vartheta} = \dfrac{2}{N} \sum_{t=1}^{N} \varepsilon(t, \vartheta) \cdot \dfrac{\mathrm{d}\, \varepsilon(t, \vartheta)}{\mathrm{d}\vartheta}$

• Let us compute $\dfrac{\mathrm{d}^2 J_N(\vartheta)}{\mathrm{d}\vartheta^2}$:

$\dfrac{\mathrm{d}^2 J_N(\vartheta)}{\mathrm{d}\vartheta^2} = \dfrac{\mathrm{d}}{\mathrm{d}\vartheta} \dfrac{\mathrm{d} J_N(\vartheta)}{\mathrm{d}\vartheta} = \dfrac{\mathrm{d}}{\mathrm{d}\vartheta} \dfrac{2}{N} \sum_{t=1}^{N} \varepsilon(t, \vartheta) \cdot \dfrac{\mathrm{d}\, \varepsilon(t, \vartheta)}{\mathrm{d}\vartheta}$

$\dfrac{\mathrm{d}^2 J_N(\vartheta)}{\mathrm{d}\vartheta^2} = \dfrac{2}{N} \sum_{t=1}^{N} \dfrac{\mathrm{d}\, \varepsilon(t, \vartheta)}{\mathrm{d}\vartheta} \cdot \dfrac{\mathrm{d}\, \varepsilon(t, \vartheta)}{\mathrm{d}\vartheta}^{T} + \dfrac{2}{N} \sum_{t=1}^{N} \varepsilon(t, \vartheta) \cdot \dfrac{\mathrm{d}^2 \varepsilon(t, \vartheta)}{\mathrm{d}\vartheta^2}$

Warning: typically, the second term is neglected and the following approximation is adopted:

$\dfrac{\mathrm{d}^2 J_N(\vartheta)}{\mathrm{d}\vartheta^2} \approx \dfrac{2}{N} \sum_{t=1}^{N} \dfrac{\mathrm{d}\, \varepsilon(t, \vartheta)}{\mathrm{d}\vartheta} \cdot \dfrac{\mathrm{d}\, \varepsilon(t, \vartheta)}{\mathrm{d}\vartheta}^{T}$

Page 33: MIDA Identification


Observation (Hessian matrix approximation)

Why do we make the approximation

$\dfrac{\mathrm{d}^2 J_N(\vartheta)}{\mathrm{d}\vartheta^2} \approx \dfrac{2}{N} \sum_{t=1}^{N} \dfrac{\mathrm{d}\, \varepsilon(t, \vartheta)}{\mathrm{d}\vartheta} \cdot \dfrac{\mathrm{d}\, \varepsilon(t, \vartheta)}{\mathrm{d}\vartheta}^{T}$ ...???

Because this way the (approximate) Hessian is always positive semi-definite ⇒ $\vartheta_i$ is forced to descend towards a minimum.

Positive definite Hessian ($\vartheta_i$ descends towards the minimum):

[Figure: $J_N(\vartheta)$ with a convex quadratic approximation $V_i(\vartheta)$ around $\vartheta_i$; the step moves $\vartheta_{i+1}$ towards the minimum.]

Page 34: MIDA Identification


Negative definite Hessian ($\vartheta_i$ would converge towards a maximum):

[Figure: $J_N(\vartheta)$ with a concave quadratic approximation around $\vartheta_i$; the exact Newton step would move $\vartheta_{i+1}$ towards a maximum.]

When instead we take $\dfrac{\mathrm{d}^2 J_N(\vartheta)}{\mathrm{d}\vartheta^2} \approx \dfrac{2}{N} \sum_{t=1}^{N} \dfrac{\mathrm{d}\, \varepsilon(t, \vartheta)}{\mathrm{d}\vartheta} \cdot \dfrac{\mathrm{d}\, \varepsilon(t, \vartheta)}{\mathrm{d}\vartheta}^{T}$, we have:

[Figure: with the approximate Hessian, the quadratic approximation $V_i(\vartheta)$ is convex and $\vartheta_{i+1}$ moves towards the minimum.]

Page 35: MIDA Identification


After introducing the approximation of the Hessian matrix, the update rule of the Newton method becomes as follows:

$\vartheta_{i+1} = \vartheta_i - \left[ \dfrac{2}{N} \sum_{t=1}^{N} \dfrac{\mathrm{d}\, \varepsilon(t, \vartheta)}{\mathrm{d}\vartheta} \cdot \dfrac{\mathrm{d}\, \varepsilon(t, \vartheta)}{\mathrm{d}\vartheta}^{T} \right]^{-1} \dfrac{2}{N} \sum_{t=1}^{N} \varepsilon(t, \vartheta) \cdot \dfrac{\mathrm{d}\, \varepsilon(t, \vartheta)}{\mathrm{d}\vartheta}$

N.B.: all quantities on the right-hand side are computed for $\vartheta = \vartheta_i$.

Final step: how can we compute $\dfrac{\mathrm{d}\, \varepsilon(t, \vartheta)}{\mathrm{d}\vartheta}$?

Recall that $\varepsilon(t) = \dfrac{A(z)}{C(z)}\, y(t) - \dfrac{B(z)}{C(z)}\, u(t-d)$, i.e.

$\varepsilon(t) = \dfrac{1 - a_1 z^{-1} - \dots - a_m z^{-m}}{1 + c_1 z^{-1} + \dots + c_n z^{-n}}\, y(t) - \dfrac{b_1 + b_2 z^{-1} + \dots + b_p z^{-p+1}}{1 + c_1 z^{-1} + \dots + c_n z^{-n}}\, u(t-d)$

$\vartheta = [\, a_1\ a_2 \dots a_m \;\; b_1\ b_2 \dots b_p \;\; c_1\ c_2 \dots c_n \,]^T$

$\dfrac{\mathrm{d}\, \varepsilon(t, \vartheta)}{\mathrm{d}\vartheta} = \left[ \dfrac{\partial \varepsilon(t, \vartheta)}{\partial a_1} \dots \;\; \dfrac{\partial \varepsilon(t, \vartheta)}{\partial b_1} \dots \;\; \dfrac{\partial \varepsilon(t, \vartheta)}{\partial c_1} \dots \right]^T$

Page 36: MIDA Identification


Partial derivatives of $\varepsilon(t, \vartheta)$ with respect to $a_1, \dots, a_m$

$\dfrac{\partial \varepsilon(t)}{\partial a_i} = \dfrac{\partial}{\partial a_i} \left[ \dfrac{1 - a_1 z^{-1} - \dots - a_m z^{-m}}{1 + c_1 z^{-1} + \dots + c_n z^{-n}} \right] y(t) - \dfrac{\partial}{\partial a_i} \left[ \dfrac{b_1 + \dots + b_p z^{-p+1}}{1 + c_1 z^{-1} + \dots + c_n z^{-n}} \right] u(t-d)$

Hence,

$\dfrac{\partial \varepsilon(t)}{\partial a_1} = -\dfrac{z^{-1}}{C(z)}\, y(t) = -\dfrac{1}{C(z)}\, y(t-1) = -\alpha(t-1)$

$\dfrac{\partial \varepsilon(t)}{\partial a_2} = -\dfrac{z^{-2}}{C(z)}\, y(t) = -\dfrac{1}{C(z)}\, y(t-2) = -\alpha(t-2)$

...

$\dfrac{\partial \varepsilon(t)}{\partial a_m} = -\dfrac{z^{-m}}{C(z)}\, y(t) = -\dfrac{1}{C(z)}\, y(t-m) = -\alpha(t-m)$

where $\alpha(t) := \dfrac{1}{C(z)}\, y(t)$

Partial derivatives of $\varepsilon(t, \vartheta)$ with respect to $b_1, \dots, b_p$

$\dfrac{\partial \varepsilon(t)}{\partial b_i} = \dfrac{\partial}{\partial b_i} \left[ \dfrac{1 - a_1 z^{-1} - \dots - a_m z^{-m}}{1 + c_1 z^{-1} + \dots + c_n z^{-n}} \right] y(t) - \dfrac{\partial}{\partial b_i} \left[ \dfrac{b_1 + b_2 z^{-1} + \dots + b_p z^{-p+1}}{1 + c_1 z^{-1} + \dots + c_n z^{-n}} \right] u(t-d)$

Hence,

$\dfrac{\partial \varepsilon(t)}{\partial b_1} = -\dfrac{1}{C(z)}\, u(t-d) = \beta(t-d)$

$\dfrac{\partial \varepsilon(t)}{\partial b_2} = -\dfrac{z^{-1}}{C(z)}\, u(t-d) = \beta(t-d-1)$

...

$\dfrac{\partial \varepsilon(t)}{\partial b_p} = -\dfrac{z^{-p+1}}{C(z)}\, u(t-d) = \beta(t-d-p+1)$

where $\beta(t) := -\dfrac{1}{C(z)}\, u(t)$

Page 37: MIDA Identification


Partial derivatives of $\varepsilon(t, \vartheta)$ with respect to $c_1, \dots, c_n$

$\varepsilon(t) = \dfrac{A(z)}{C(z)}\, y(t) - \dfrac{B(z)}{C(z)}\, u(t-d)$

$\big( 1 + c_1 z^{-1} + \dots + c_n z^{-n} \big)\, \varepsilon(t) = A(z)\, y(t) - B(z)\, u(t-d)$

$\dfrac{\partial}{\partial c_i} \Big[ \big( 1 + c_1 z^{-1} + \dots + c_n z^{-n} \big)\, \varepsilon(t) \Big] = \dfrac{\partial}{\partial c_i} \Big[ A(z)\, y(t) - B(z)\, u(t-d) \Big]$

$z^{-i}\, \varepsilon(t) + C(z)\, \dfrac{\partial \varepsilon(t)}{\partial c_i} = 0$

Hence,

$\dfrac{\partial \varepsilon(t)}{\partial c_1} = -\dfrac{1}{C(z)}\, \varepsilon(t-1) = \gamma(t-1)$

$\dfrac{\partial \varepsilon(t)}{\partial c_2} = -\dfrac{1}{C(z)}\, \varepsilon(t-2) = \gamma(t-2)$

...

$\dfrac{\partial \varepsilon(t)}{\partial c_n} = -\dfrac{1}{C(z)}\, \varepsilon(t-n) = \gamma(t-n)$

where $\gamma(t) := -\dfrac{1}{C(z)}\, \varepsilon(t)$

Page 38: MIDA Identification


Hence,

$\dfrac{\mathrm{d}\, \varepsilon(t, \vartheta)}{\mathrm{d}\vartheta} = \big[\, -\alpha(t-1)\ \dots\ -\alpha(t-m) \;\;\; \beta(t-d)\ \dots\ \beta(t-d-p+1) \;\;\; \gamma(t-1)\ \dots\ \gamma(t-n) \,\big]^T$

It is composed of $m + p + n$ signals, defined for $t = 1, 2, \dots, N$.

The signals $\alpha(t), \beta(t), \gamma(t)$ are obtained according to the following scheme:

[Filtering scheme: $\varepsilon(t)$ is obtained by filtering $y(t)$ through $A(z)$ and $u(t)$ through $z^{-d} B(z)$, taking the difference and filtering it through $1/C(z)$; moreover $\alpha(t) = \frac{1}{C(z)}\, y(t)$, $\beta(t) = -\frac{1}{C(z)}\, u(t)$, $\gamma(t) = -\frac{1}{C(z)}\, \varepsilon(t)$.]

Page 39: MIDA Identification


A brief summary of the update rule in the Newton method
(how $\vartheta_{i+1}$ is computed based on $\vartheta_i$)

• compute the polynomials $A(z, \vartheta_i),\ B(z, \vartheta_i),\ C(z, \vartheta_i)$ at step i;

• compute the signals $\varepsilon(t, \vartheta_i),\ \alpha(t, \vartheta_i),\ \beta(t, \vartheta_i),\ \gamma(t, \vartheta_i)$ by filtering the available data according to the previous scheme;

• compute $\dfrac{\mathrm{d}\, \varepsilon(t, \vartheta_i)}{\mathrm{d}\vartheta}$;

• update the parameter estimate:

$\vartheta_{i+1} = \vartheta_i - \left[ \sum_{t=1}^{N} \dfrac{\mathrm{d}\, \varepsilon(t, \vartheta_i)}{\mathrm{d}\vartheta} \cdot \dfrac{\mathrm{d}\, \varepsilon(t, \vartheta_i)}{\mathrm{d}\vartheta}^{T} \right]^{-1} \sum_{t=1}^{N} \dfrac{\mathrm{d}\, \varepsilon(t, \vartheta_i)}{\mathrm{d}\vartheta} \cdot \varepsilon(t, \vartheta_i)$

Observation

Before doing the filtering, we need to check each time whether $C(z, \vartheta_i)$ has all its roots inside the unit circle; if not, $C(z, \vartheta_i)$ is made stable by taking the reciprocal roots (Bauer algorithm).
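A minimal sketch of one such update step for an ARMAX model, following the summary above (orders and data are illustrative; the simple root-reflection used to stabilize C(z) is a stand-in for the Bauer algorithm mentioned in the notes):

```python
import numpy as np
from scipy.signal import lfilter

def shift(x, k):
    """x(t - k): delay x by k samples, zero-padding the beginning."""
    return np.concatenate([np.zeros(k), x[:-k]]) if k > 0 else x.copy()

def stabilize(C):
    """Reflect roots of C(z) lying outside the unit circle (stand-in for the Bauer algorithm)."""
    r = np.roots(C)
    r = np.where(np.abs(r) > 1, 1 / np.conj(r), r)
    return np.atleast_1d(np.real(np.poly(r)))

def armax_newton_update(theta_i, y, u, m, p, n, d, delta=1e-8):
    """One (quasi-)Newton update theta_{i+1} = f(theta_i) for an ARMAX(m, p, n) model."""
    a, b, c = theta_i[:m], theta_i[m:m + p], theta_i[m + p:]
    A = np.concatenate([[1.0], -a])
    B = b
    C = stabilize(np.concatenate([[1.0], c]))

    u_d = shift(u, d)
    eps   = lfilter(A, C, y) - lfilter(B, C, u_d)   # eps(t)   =  (A/C) y(t) - (B/C) u(t-d)
    alpha = lfilter([1.0], C, y)                    # alpha(t) =  (1/C) y(t)
    beta  = -lfilter([1.0], C, u)                   # beta(t)  = -(1/C) u(t)
    gamma = -lfilter([1.0], C, eps)                 # gamma(t) = -(1/C) eps(t)

    # psi(t) = d eps(t, theta)/d theta, one row per time instant:
    cols  = [-shift(alpha, k) for k in range(1, m + 1)]       # -alpha(t-1) ... -alpha(t-m)
    cols += [shift(beta, d + k) for k in range(0, p)]         #  beta(t-d)  ...  beta(t-d-p+1)
    cols += [shift(gamma, k) for k in range(1, n + 1)]        #  gamma(t-1) ...  gamma(t-n)
    Psi = np.column_stack(cols)

    H = Psi.T @ Psi + delta * np.eye(m + p + n)     # approximate Hessian (delta*I as in the appendix)
    g = Psi.T @ eps                                 # gradient direction (the factor 2/N cancels out)
    return theta_i - np.linalg.solve(H, g)
```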

Page 40: MIDA Identification


Appendix

In numerical optimization, the update from $\vartheta_i$ to $\vartheta_{i+1}$ can be performed according to three types of methods.

Gradient method

$\vartheta_{i+1} = \vartheta_i - \mu \left. \dfrac{\partial J_N(\vartheta)}{\partial \vartheta} \right|_{\vartheta = \vartheta_i}$

$\mu$ is a fixed parameter which is called the "step" of the gradient method.

⇒ simple & robust ($\vartheta_i$ always descends towards the minimum);
⇒ it can be very slow in reaching the minimum (when $\vartheta_i$ is close to the minimum, the gradient tends to 0).

Newton method

$\vartheta_{i+1} = \vartheta_i - \left[ \left. \dfrac{\partial^2 J_N(\vartheta)}{\partial \vartheta^2} \right|_{\vartheta = \vartheta_i} \right]^{-1} \left. \dfrac{\partial J_N(\vartheta)}{\partial \vartheta} \right|_{\vartheta = \vartheta_i}$

The step of the gradient method is modulated through the Hessian:
⇒ very fast convergence;
⇒ computationally more demanding;
⇒ it could fail to converge if the Hessian is negative definite.

Page 41: MIDA Identification


Quasi-Newton method

$\vartheta_{i+1} = \vartheta_i - M^{-1} \left. \dfrac{\partial J_N(\vartheta)}{\partial \vartheta} \right|_{\vartheta = \vartheta_i}$

$M$ is a positive definite approximation of the Hessian matrix:
⇒ it always converges to a minimum;
⇒ less computationally demanding than the Newton method;
⇒ faster convergence than the gradient method, although slower than the Newton method.

Quasi-Newton methods are often adopted in practice.

Remark

Since we introduced the approximation

$\dfrac{\partial^2 J_N(\vartheta)}{\partial \vartheta^2} \approx \dfrac{2}{N} \sum_{t=1}^{N} \dfrac{\partial \varepsilon(t, \vartheta)}{\partial \vartheta} \cdot \dfrac{\partial \varepsilon(t, \vartheta)}{\partial \vartheta}^{T}$,

the method we used to minimize $J_N(\vartheta)$ can be more properly classified as a quasi-Newton method.

In order to guarantee that $\dfrac{\partial^2 J_N(\vartheta)}{\partial \vartheta^2}$ is invertible, one usually takes

$\dfrac{\partial^2 J_N(\vartheta)}{\partial \vartheta^2} \approx \dfrac{2}{N} \sum_{t=1}^{N} \dfrac{\partial \varepsilon(t, \vartheta)}{\partial \vartheta} \cdot \dfrac{\partial \varepsilon(t, \vartheta)}{\partial \vartheta}^{T} + \delta I$,

where $I$ is the identity matrix and $\delta$ is a small positive number.
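For comparison, a minimal sketch of the three update rules of this appendix, written for a generic gradient vector and (approximate) Hessian evaluated at $\vartheta_i$; the values of mu and delta are illustrative choices:

```python
import numpy as np

def gradient_step(theta_i, grad, mu=0.01):
    # theta_{i+1} = theta_i - mu * dJ_N/dtheta
    return theta_i - mu * grad

def newton_step(theta_i, grad, hessian):
    # theta_{i+1} = theta_i - [d^2 J_N/dtheta^2]^{-1} dJ_N/dtheta
    return theta_i - np.linalg.solve(hessian, grad)

def quasi_newton_step(theta_i, grad, hessian_approx, delta=1e-8):
    # hessian_approx = (2/N) sum psi psi^T is only positive semi-definite;
    # adding delta*I makes it positive definite and hence invertible
    M = hessian_approx + delta * np.eye(len(theta_i))
    return theta_i - np.linalg.solve(M, grad)
```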