Model Identification Notes
This material is protected by copyright and is intended for students’ use only. Sale and distribution are strictly forbidden. The final
exam requires integrating this material with teacher explanations and textbooks.
Model Identification and Data Analysis (MIDA) - MODEL IDENTIFICATION
MODEL
IDENTIFICATION
Up to now, we considered (ARMA/ARMAX) models
and studied their properties: covariance & spectrum computation,
prediction.
Basic question: where do the model equations come from?
“Simple phenomena”: model equations are obtained by combining
and interconnecting simple physical laws.
In many cases, however, the underlying physical phenomenon is too
complicated to proceed this way (simple physical laws are not
available, e.g. atmospheric pressure, the stock exchange, etc.).
Moreover, by combining many simple laws one can end up with
models that are too complicated for any practical purpose (e.g. a
model of a ship with 10000 difference equations: who can use it?).
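[Block diagram: $u(t-d)$ filtered by $B(z)/A(z)$ and $e(t)$ filtered by $C(z)/A(z)$ are summed to give $y(t)$, i.e. $y(t) = \frac{B(z)}{A(z)}\,u(t-d) + \frac{C(z)}{A(z)}\,e(t)$.]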
Model identification: retrieve a suitable model from experiments on
the real system.
Identification problem: define an automatic procedure to find a
model for $\mathcal{S}$ based on the available (input/output or time series) data.
[Diagram: an experiment on the true system $\mathcal{S}$, with input $u(t)$ and output $y(t)$, yields the data record $\{u(1), u(2), \ldots, u(N)\}$, $\{y(1), y(2), \ldots, y(N)\}$. From this record one identifies the model in which $u(t-d)$ filtered by $B(z)/A(z)$ and $e(t)$ filtered by $C(z)/A(z)$ are summed to give $y(t)$.]
Observation: ARMA/ARMAX models are characterized by the coefficients
(parameters) of their numerator and denominator polynomials.
We will therefore talk of parametric identification.
There are also nonparametric methods, where one directly
estimates the probabilistic properties of the model (mean, covariance
function, spectrum) from the available data.
We will talk about nonparametric methods later.
Parametric model identification at a glance
Five steps in identification:
1. Experiment design and data collection;
2. Selection of a parametric model class $\mathcal{M} = \{\mathcal{M}(\vartheta),\ \vartheta \in \Theta\}$
($\vartheta$ = parameter vector; each different $\vartheta$ corresponds to a
different model);
3. Choice of the identification criterion: $J_N(\vartheta) \ge 0$
(it measures the performance of the model corresponding to $\vartheta$ in
describing the available data)
$$\hat{\vartheta}_N = \arg\min_{\vartheta} J_N(\vartheta),$$
the “best” model is the one minimizing the identification criterion;
4. Minimization of $J_N(\vartheta)$ with respect to $\vartheta$
(this minimization process will lead us to $\hat{\vartheta}_N$);
5. [Model validation]
Once the optimal model $\mathcal{M}(\hat{\vartheta}_N)$ has been obtained, we verify
whether this model is actually a good one. If it is not, the
identification process must be repeated.
1. Experiment design and data collection
Basically already discussed... Issues when performing data collection:
• Choice of the data length N [depending on the uncertainty]
• Design of the input $u(t)$ [for I/O systems]
2. Choice of the parametric model class $\mathcal{M} = \{\mathcal{M}(\vartheta),\ \vartheta \in \Theta\}$
Many options in general:
Discrete time vs. continuous time
Linear vs. nonlinear
Time invariant vs. time variant
Static vs. dynamic
We will focus on ARMA/ARMAX models
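$$\mathcal{M}(\vartheta):\quad y(t) = \frac{b_1 + b_2 z^{-1} + \cdots + b_p z^{-p+1}}{1 - a_1 z^{-1} - \cdots - a_m z^{-m}}\,u(t-d) + \frac{1 + c_1 z^{-1} + \cdots + c_n z^{-n}}{1 - a_1 z^{-1} - \cdots - a_m z^{-m}}\,e(t)$$
where $e(t) \sim WN(0, \lambda^2)$.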
We will first consider the identification of zero-mean
processes. The general case
will be a trivial extension.
What is the parameter vector?
$$\vartheta = [\,a_1 \ \cdots \ a_m \quad b_1 \ \cdots \ b_p \quad c_1 \ \cdots \ c_n\,]^T$$
It is the vector, of dimension $n_\vartheta = m + p + n$, of the coefficients of the
numerator and denominator polynomials.
Observation: $\lambda^2$ is a parameter too, and it needs to be identified.
As we will see, however, $\lambda^2$ is much less important than the other
parameters. So we will denote by $\vartheta$ the vector of “important”
parameters and keep $\lambda^2$ aside.
We will also write for short
$$\mathcal{M}(\vartheta):\quad y(t) = \frac{B(z,\vartheta)}{A(z,\vartheta)}\,u(t-d) + \frac{C(z,\vartheta)}{A(z,\vartheta)}\,e(t), \qquad e(t) \sim WN(0, \lambda^2)$$
$\Theta$ is the set of admissible values for the parameter vector $\vartheta$.
It incorporates a-priori information on the possible values of the
parameters, e.g. $\Theta = \{\vartheta : a_1 < 1 \,\wedge\, b_1 > 0\}$
As we will see, to perform identification we will rely on the theory of
prediction. Hence, we will assume the following.
ASSUMPTION
For every $\vartheta \in \Theta$, the stochastic part of $\mathcal{M}(\vartheta)$ (i.e. the part depending
on the white noise $e(t) \sim WN(0, \lambda^2)$) is canonical and has no zeroes on
the unit circle.
In other words, we want to identify models in canonical representation
(this is not an issue, since canonical and non-canonical representations are
all equivalent).
The requirement that there are no zeroes on the unit circle, instead,
poses some limitations on the systems we can identify. However:
− zeroes on the unit circle are not usually required to model the
behavior of a given system
− the behavior of models with zeroes on the unit circle can be
approximated by means of models with zeroes close to the unit
circle
Observation: $d$ is a fixed time delay. $m, p, n$ are the model orders and
are fixed for the moment.
Note that $m, p, n$ can be equal to 0 too. E.g. $p = 0$ corresponds to
ARMA models.
Notably, for $n = 0$ we obtain the important class
of ARX models (AR models if also $p = 0$).
Observation: sometimes it may be useful to consider ARMA and
ARMAX models with some fixed structure
Example
$$\mathcal{M}(\vartheta):\quad y(t) = \frac{b + b^2 z^{-1}}{1 - a z^{-1}}\,u(t-d) + \frac{1 + a z^{-1}}{1 - a z^{-1}}\,e(t)$$
Here, the parameter vector is given by $\vartheta = [\,a \ \ b\,]^T$ only.
These types of models are useful when the structure of the system to be
identified is partially known (grey-box identification).
We will talk of black-box identification, instead, when no knowledge
of the system is available and the model structure must be found from
data only.
3. Choice of the identification criterion $J_N(\vartheta) \ge 0$
$J_N(\vartheta)$ must measure the capability of model $\mathcal{M}(\vartheta)$ in describing the
collected data $\{u(1), \ldots, u(N), y(1), \ldots, y(N)\}$
Observation
After the measurement process, $\{u(1), \ldots, u(N), y(1), \ldots, y(N)\}$ is a
numerical sequence (it is a sequence of $2N$ real numbers).
$$\mathcal{M}(\vartheta):\quad y(t) = \frac{B(z,\vartheta)}{A(z,\vartheta)}\,u(t-d) + \frac{C(z,\vartheta)}{A(z,\vartheta)}\,e(t)$$
is instead a stochastic
model (there are infinitely many possible realizations of the output).
How can we compare a numerical sequence and a stochastic model?
IDEA (predictive approach): generate predictors from models and
evaluate the model capability of predicting the system behavior
A predictor must be fed with past inputs and outputs, so we can feed it
with the available data record and evaluate its performance on it
The best model is the one with the best predictive performance
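$$\mathcal{M}(\vartheta):\quad y(t) = \frac{B(z,\vartheta)}{A(z,\vartheta)}\,u(t-d) + \frac{C(z,\vartheta)}{A(z,\vartheta)}\,e(t), \qquad e(t) \sim WN(0, \lambda^2)$$
The model is stochastic, and so is its output.
From stochastic models to predictor models:
$$\hat{\mathcal{M}}(\vartheta):\quad \hat{y}(t|t-1) = \frac{B(z,\vartheta)\,E(z,\vartheta)}{C(z,\vartheta)}\,u(t-d) + \frac{F(z,\vartheta)}{C(z,\vartheta)}\,y(t-1)$$
A predictor model returns a deterministic output once it is fed with
numerical data, and the returned $\hat{y}(t|t-1)$ can be compared against
the real output of the system.
[Diagram: the stochastic model $\mathcal{M}(\vartheta)$ has inputs $u(t)$ and $e(t)$ and output $y(t)$; the predictor $\hat{\mathcal{M}}(\vartheta)$ has inputs $u(t)$ and $y(t)$ and output $\hat{y}(t|t-1)$.]
Note that predictor models do not depend on $\lambda^2$. That is why $\lambda^2$ is
not an “important” parameter: it is not needed to compute the
prediction.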
The PEM (Prediction Error Minimization) Identification scheme
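More precisely,
$$\begin{array}{c|c|c|c}
i & u(i)\,/\,y(i) & \hat{y}(i|i-1,\vartheta) & \varepsilon(i,\vartheta) \\ \hline
1 & u(1)\,/\,y(1) & \hat{y}(1|0,\vartheta) & y(1) - \hat{y}(1|0,\vartheta) \\
2 & u(2)\,/\,y(2) & \hat{y}(2|1,\vartheta) & y(2) - \hat{y}(2|1,\vartheta) \\
\vdots & \vdots & \vdots & \vdots \\
N & u(N)\,/\,y(N) & \hat{y}(N|N-1,\vartheta) & y(N) - \hat{y}(N|N-1,\vartheta)
\end{array}$$
Third column: predicted values returned by the model $\mathcal{M}(\vartheta)$ (they depend on $\vartheta$).
Fourth column: prediction errors.
[Diagram: the input $u(t)$ drives the system $\mathcal{S}$, which returns $y(t)$; the predictor $\hat{\mathcal{M}}(\vartheta)$ returns $\hat{y}(t|t-1,\vartheta)$; their difference is the prediction error $\varepsilon(t,\vartheta)$, which is to be minimized.]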
PEM identification criterion
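$$J_N(\vartheta) = \frac{1}{N}\sum_{i=1}^{N}\bigl(y(i) - \hat{y}(i|i-1,\vartheta)\bigr)^2 = \frac{1}{N}\sum_{i=1}^{N}\varepsilon(i,\vartheta)^2$$
i.e. it is the empirical variance of the prediction error (a global
performance index with respect to all available data)
PEM best model
$$\hat{\vartheta}_N = \arg\min_{\vartheta\in\Theta} J_N(\vartheta) = \arg\min_{\vartheta\in\Theta} \frac{1}{N}\sum_{i=1}^{N}\varepsilon(i,\vartheta)^2$$
i.e. the best model is the one minimizing the empirical prediction error
variance
Identification of the noise variance
$$\hat{\lambda}_N^2 = J_N(\hat{\vartheta}_N) = \frac{1}{N}\sum_{i=1}^{N}\varepsilon(i,\hat{\vartheta}_N)^2$$
Underlying idea: if $\mathcal{M}(\hat{\vartheta}_N) = \mathcal{S}$ (i.e. the true system was perfectly
identified), we would have $\varepsilon(t,\hat{\vartheta}_N) = e(t)$ and $\lambda^2 = \mathrm{E}[\varepsilon(t,\hat{\vartheta}_N)^2]$.
To compute $\hat{\lambda}_N^2$ from data, $\mathrm{E}[\cdot]$ is approximated with its empirical
counterpart $\frac{1}{N}\sum_{i=1}^{N}$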
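The PEM recipe above can be sketched on the simplest possible case. The following is only an illustrative sketch, assuming an AR(1) predictor $\hat{y}(t|t-1,\vartheta) = \vartheta\,y(t-1)$, simulated data with true parameter 0.8 and unit noise variance, and a brute-force grid search over $\Theta = (-1, 1)$; the function name `pem_cost` and all numerical values are hypothetical choices, not part of the course material.

```python
import random

def pem_cost(theta, y):
    """Empirical prediction error variance J_N(theta) for the AR(1)
    predictor y_hat(t|t-1) = theta * y(t-1)."""
    # The first sample has no past output, so it is skipped.
    errors = [y[t] - theta * y[t - 1] for t in range(1, len(y))]
    return sum(e * e for e in errors) / len(errors)

# Simulate y(t) = 0.8 y(t-1) + e(t), e ~ WN(0, 1) (illustrative values).
rng = random.Random(0)
y = [0.0]
for _ in range(2000):
    y.append(0.8 * y[-1] + rng.gauss(0.0, 1.0))

# Brute-force minimization over a grid of admissible parameters.
grid = [k / 100 for k in range(-99, 100)]
theta_hat = min(grid, key=lambda th: pem_cost(th, y))

# Noise variance estimate: the criterion evaluated at the best parameter.
lambda2_hat = pem_cost(theta_hat, y)
```

Grid search is used here only because it makes the definition of $\hat{\vartheta}_N$ as an arg min explicit; it does not scale beyond one or two parameters.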
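4. Minimization of $J_N(\vartheta)$ with respect to $\vartheta$
$$J_N(\vartheta): \mathbb{R}^{n_\vartheta} \to \mathbb{R}^+$$
Example: $n_\vartheta = 1$
[Figure: $J_N(\vartheta)$ versus $\vartheta$, with the minimizer $\hat{\vartheta}_N$ marked.]
The computational complexity of the problem of minimizing $J_N(\vartheta)$
depends on the form of the function $J_N(\vartheta): \mathbb{R}^{n_\vartheta} \to \mathbb{R}^+$
There are two main relevant cases:
AR / ARX → $J_N(\vartheta)$ quadratic
ARMA, MA / ARMAX → $J_N(\vartheta)$ not quadratic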
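1. $J_N(\vartheta)$ is a quadratic function of $\vartheta$
[Figure: parabolic $J_N(\vartheta)$ versus $\vartheta$, with the minimizer $\hat{\vartheta}_N$ marked.]
the minimum can be explicitly computed
2. $J_N(\vartheta)$ is not quadratic
[Figure: non-quadratic $J_N(\vartheta)$ versus $\vartheta$, with the minimizer $\hat{\vartheta}_N$ marked.]
the minimum must be sought by means of iterative numerical methods
• Gradient methods
• Newton methods
• Quasi-Newton methods
Iterative methods typically converge towards a minimum of
$J_N(\vartheta)$. Yet that minimum may be only a local one!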
IDENTIFICATION OF AR / ARX MODELS
(Least Squares (LS) method)
Generic ARX model:
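$$\mathcal{M}(\vartheta):\quad y(t) = \frac{B(z)}{A(z)}\,u(t-d) + \frac{1}{A(z)}\,e(t), \qquad e(t) \sim WN(0, \lambda^2)$$
$$A(z) = 1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_m z^{-m}$$
$$B(z) = b_1 + b_2 z^{-1} + b_3 z^{-2} + \cdots + b_p z^{-p+1}$$
$$\vartheta = [\,a_1 \ \cdots \ a_m \quad b_1 \ \cdots \ b_p\,]^T \quad \text{(column vector, dimension } n_\vartheta = m + p\text{)}$$
$$\mathcal{M}(\vartheta):\quad y(t) = \bigl(1 - A(z)\bigr)\,y(t) + B(z)\,u(t-d) + e(t)$$
$$y(t) = \bigl(a_1 z^{-1} + \cdots + a_m z^{-m}\bigr)\,y(t) + \bigl(b_1 + b_2 z^{-1} + \cdots + b_p z^{-p+1}\bigr)\,u(t-d) + e(t)$$
$$y(t) = \underbrace{a_1 y(t-1) + \cdots + a_m y(t-m) + b_1 u(t-d) + \cdots + b_p u(t-q)}_{\text{predictable at time } t-1} + \underbrace{e(t)}_{\text{unpredictable at time } t-1}$$
N.B. $q = d + p - 1$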
Models in prediction form
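$$\hat{\mathcal{M}}(\vartheta):\quad \hat{y}(t|t-1) = a_1 y(t-1) + a_2 y(t-2) + \cdots + a_m y(t-m) + b_1 u(t-d) + b_2 u(t-d-1) + \cdots + b_p u(t-q)$$
Compact notation:
$\vartheta = [\,a_1 \ \cdots \ a_m \quad b_1 \ \cdots \ b_p\,]^T$ (parameter vector; it is a column vector
of dimension $n_\vartheta = m + p$)
$$\varphi(t) = [\,y(t-1) \ \cdots \ y(t-m) \quad u(t-d) \ \cdots \ u(t-q)\,]^T$$
$\varphi(t)$ is called the regression vector (or regressor) and is a column vector.
Its dimension is $n_\vartheta = m + p$ as well.
Then,
$$\hat{\mathcal{M}}(\vartheta):\quad \hat{y}(t|t-1,\vartheta) = \varphi(t)^T \vartheta = \vartheta^T \varphi(t) \quad \text{(scalar product)}$$
Observation
$\hat{y}(t|t-1,\vartheta)$ depends linearly on $\vartheta$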
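The regression form can be made concrete with a few lines of code. This is a minimal sketch, assuming 0-based time indices and plain Python lists; the helper names `regressor` and `predict` and the numbers in the usage lines are hypothetical.

```python
def regressor(y, u, t, m, p, d):
    """phi(t) = [y(t-1) ... y(t-m)  u(t-d) ... u(t-d-p+1)].

    Assumes 0-based time indices and t >= max(m, d + p - 1),
    so that every index below is non-negative.
    """
    past_outputs = [y[t - i] for i in range(1, m + 1)]
    past_inputs = [u[t - d - j] for j in range(p)]
    return past_outputs + past_inputs

def predict(theta, phi):
    """One-step prediction y_hat(t|t-1) = phi(t)^T theta (scalar product)."""
    return sum(th * ph for th, ph in zip(theta, phi))

# Illustrative data and parameters (m = 2, p = 1, d = 1).
y = [1.0, 2.0, 3.0, 4.0]
u = [10.0, 20.0, 30.0, 40.0]
phi = regressor(y, u, t=3, m=2, p=1, d=1)    # [y(2), y(1), u(2)]
y_hat = predict([0.5, 0.1, 0.01], phi)       # 0.5*3 + 0.1*2 + 0.01*30
```

Because the prediction is a scalar product, doubling $\vartheta$ doubles $\hat{y}(t|t-1,\vartheta)$, which is exactly the linearity exploited by the least squares method.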
Identification criterion
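$$J_N(\vartheta) = \frac{1}{N}\sum_{t=1}^{N}\bigl(y(t) - \hat{y}(t|t-1;\vartheta)\bigr)^2$$
$$J_N(\vartheta) = \frac{1}{N}\sum_{t=1}^{N}\bigl(y(t) - \varphi(t)^T\vartheta\bigr)^2$$
Since $\hat{y}(t|t-1) = \varphi(t)^T\vartheta$ is linear in $\vartheta$, $J_N(\vartheta)$ turns out to be a
quadratic function of $\vartheta$ ⇒ the minimum can be explicitly computed.
Optimization theory gives us the conditions to find the minimum:
$$\left.\frac{\mathrm{d}J_N(\vartheta)}{\mathrm{d}\vartheta}\right|_{\vartheta=\hat{\vartheta}_N} = 0$$
The derivative vector must be null: this is the condition for finding stationary points.
$$\left.\frac{\mathrm{d}^2J_N(\vartheta)}{\mathrm{d}\vartheta^2}\right|_{\vartheta=\hat{\vartheta}_N} \ge 0$$
The Hessian matrix must be positive semidefinite: this is the condition for singling out minimum points.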
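Observation: by definition the derivative vector is
$$\frac{\mathrm{d}J_N(\vartheta)}{\mathrm{d}\vartheta} = \begin{bmatrix} \partial J_N(\vartheta)/\partial\vartheta_1 \\ \partial J_N(\vartheta)/\partial\vartheta_2 \\ \vdots \\ \partial J_N(\vartheta)/\partial\vartheta_{n_\vartheta} \end{bmatrix} = \begin{bmatrix} \partial J_N(\vartheta)/\partial a_1 \\ \partial J_N(\vartheta)/\partial a_2 \\ \vdots \\ \partial J_N(\vartheta)/\partial b_p \end{bmatrix}$$
(it’s a column vector)
Let us compute the derivative vector:
$$\begin{aligned}
\frac{\mathrm{d}J_N(\vartheta)}{\mathrm{d}\vartheta}
&= \frac{\mathrm{d}}{\mathrm{d}\vartheta}\left[\frac{1}{N}\sum_{t=1}^{N}\bigl(y(t) - \varphi(t)^T\vartheta\bigr)^2\right] \\
&= \frac{1}{N}\sum_{t=1}^{N}\frac{\mathrm{d}}{\mathrm{d}\vartheta}\bigl(y(t) - \varphi(t)^T\vartheta\bigr)^2 && \text{(the derivative is linear)} \\
&= \frac{1}{N}\sum_{t=1}^{N} 2\bigl(y(t) - \varphi(t)^T\vartheta\bigr)\,\frac{\mathrm{d}}{\mathrm{d}\vartheta}\bigl(y(t) - \varphi(t)^T\vartheta\bigr) && \text{(chain rule)} \\
&= \frac{1}{N}\sum_{t=1}^{N} 2\bigl(y(t) - \varphi(t)^T\vartheta\bigr)\bigl(-\varphi(t)\bigr) \\
&= -\frac{2}{N}\sum_{t=1}^{N}\varphi(t)\bigl(y(t) - \varphi(t)^T\vartheta\bigr)
\end{aligned}$$
In the fourth line, the term $y(t) - \varphi(t)^T\vartheta$ is linear in $\vartheta$, and its derivative is $-\varphi(t)$ because
we are considering the derivative vector as a column vector.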
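By letting $\frac{\mathrm{d}J_N(\vartheta)}{\mathrm{d}\vartheta} = 0$ we get
$$-\frac{2}{N}\sum_{t=1}^{N}\varphi(t)\bigl(y(t) - \varphi(t)^T\vartheta\bigr) = 0$$
$$-\frac{2}{N}\sum_{t=1}^{N}\varphi(t)\,y(t) + \frac{2}{N}\sum_{t=1}^{N}\varphi(t)\varphi(t)^T\vartheta = 0$$
Least squares (LS) normal equations
$$\left[\sum_{t=1}^{N}\varphi(t)\varphi(t)^T\right]\vartheta = \sum_{t=1}^{N}\varphi(t)\,y(t)$$
We have a linear system of $n_\vartheta$ equations in $n_\vartheta$ unknowns.
The solutions correspond to stationary points of our identification
criterion.
Dimensions: $(n_\vartheta \times n_\vartheta)\cdot(n_\vartheta \times 1) = (n_\vartheta \times 1)$, since each $\varphi(t)\varphi(t)^T$ is $n_\vartheta \times n_\vartheta$ and each $\varphi(t)\,y(t)$ is $n_\vartheta \times 1$.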
If $\sum_{t=1}^{N}\varphi(t)\varphi(t)^T$ is NOT singular, and hence invertible:
Least squares (LS) formula
$$\hat{\vartheta}_N = \left[\sum_{t=1}^{N}\varphi(t)\varphi(t)^T\right]^{-1}\sum_{t=1}^{N}\varphi(t)\,y(t)$$
The solution is unique and is explicitly computed
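As a hedged illustration of the LS formula, the sketch below builds the regressors, forms $\sum\varphi\varphi^T$ and $\sum\varphi y$, and solves the normal equations with NumPy. The function name `ls_arx`, the simulated ARX(1,1) system (true parameters 0.5 and 2.0, noise standard deviation 0.1) and the white-noise input are assumptions made only for this example.

```python
import numpy as np

def ls_arx(y, u, m, p, d):
    """Least squares estimate of theta = [a_1..a_m, b_1..b_p] for an ARX model.

    Returns the estimate and the empirical variance of the residuals.
    Assumes sum(phi phi^T) is invertible.
    """
    q = d + p - 1
    t0 = max(m, q)                          # first t with a complete regressor
    rows = []
    for t in range(t0, len(y)):
        past_y = [y[t - i] for i in range(1, m + 1)]
        past_u = [u[t - d - j] for j in range(p)]
        rows.append(past_y + past_u)
    Phi = np.array(rows)                    # each row is phi(t)^T
    target = np.asarray(y[t0:])
    R = Phi.T @ Phi                         # sum of phi(t) phi(t)^T
    f = Phi.T @ target                      # sum of phi(t) y(t)
    theta_hat = np.linalg.solve(R, f)       # normal equations
    residuals = target - Phi @ theta_hat
    return theta_hat, np.mean(residuals ** 2)

# Simulated data: y(t) = 0.5 y(t-1) + 2 u(t-1) + e(t), e ~ WN(0, 0.01).
rng = np.random.default_rng(0)
N = 2000
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.5 * y[t - 1] + 2.0 * u[t - 1] + e[t]

theta_hat, lambda2_hat = ls_arx(y, u, m=1, p=1, d=1)
```

Solving the linear system with `np.linalg.solve` rather than inverting the matrix explicitly is a numerical design choice of this sketch; both correspond to the same formula.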
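Are the solutions of the normal equations minimum points? Yes.
$$\frac{\mathrm{d}J_N(\vartheta)}{\mathrm{d}\vartheta} = -\frac{2}{N}\sum_{t=1}^{N}\varphi(t)\bigl(y(t) - \varphi(t)^T\vartheta\bigr)$$
$$\frac{\mathrm{d}^2J_N(\vartheta)}{\mathrm{d}\vartheta^2} = \frac{\mathrm{d}}{\mathrm{d}\vartheta}\left(\frac{\mathrm{d}J_N(\vartheta)}{\mathrm{d}\vartheta}\right)^T = \frac{2}{N}\sum_{t=1}^{N}\varphi(t)\varphi(t)^T$$
The Hessian does not depend on $\vartheta$.
Is it positive semidefinite?
Recall that a square matrix $M$ is called positive semidefinite if
$$x^T M x \ge 0 \quad \forall x \ne 0$$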
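In our case
$$x^T\,\frac{\mathrm{d}^2J_N(\vartheta)}{\mathrm{d}\vartheta^2}\,x = x^T\left[\frac{2}{N}\sum_{t=1}^{N}\varphi(t)\varphi(t)^T\right]x = \frac{2}{N}\sum_{t=1}^{N}x^T\varphi(t)\,\varphi(t)^Tx = \frac{2}{N}\sum_{t=1}^{N}\bigl(\varphi(t)^Tx\bigr)^2 \ge 0$$
[since $x^T\varphi(t) = \varphi(t)^Tx$], so the Hessian is always positive semidefinite.
The solutions of the normal equations are always minimum points.
There are two possible cases.
Case 1. $\frac{\mathrm{d}^2J_N(\vartheta)}{\mathrm{d}\vartheta^2} = \frac{2}{N}\sum_{t=1}^{N}\varphi(t)\varphi(t)^T$ is non singular, i.e. invertible.
$J_N(\vartheta)$ is a paraboloid with a unique minimum point, which is $\hat{\vartheta}_N$
as given by the LS formula.
[Figure: paraboloid $J_N(\vartheta)$ over the plane $(\vartheta_1, \vartheta_2)$, with the unique minimizer $\hat{\vartheta}_N$ marked.]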
Case 2. $\frac{\mathrm{d}^2J_N(\vartheta)}{\mathrm{d}\vartheta^2} = \frac{2}{N}\sum_{t=1}^{N}\varphi(t)\varphi(t)^T$ is singular, i.e. not invertible.
$J_N(\vartheta)$ is a degenerate paraboloid, with an infinite number of
minimum points, which are the solutions of the normal equations.
In this case, all solutions of the normal equations are equivalent for
prediction purposes and the “best” model can be chosen at will among
these
Warning: the presence of multiple global minima means that
1. the data record was not representative enough of the underlying
physical phenomenon
2. the chosen model class was too complex and there are equivalent
models for describing the same phenomenon
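[Figure: degenerate paraboloid $J_N(\vartheta)$ over the plane $(\vartheta_1, \vartheta_2)$, with a whole line of minimum points.]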
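A small numerical sketch of the singular case, under the assumption (made up for this example) that the data were collected with the input switched off: an ARX(1,1) model is then over-parameterized for the data, the matrix $\sum\varphi\varphi^T$ loses rank, and every value of $b_1$ predicts equally well.

```python
import numpy as np

# Data from y(t) = 0.5 y(t-1) with the input switched off (u = 0):
# the record carries no information about b_1.
N = 20
u = np.zeros(N)
y = 0.5 ** np.arange(N)

# ARX(1,1) regressors phi(t) = [y(t-1), u(t-1)], stacked as rows.
Phi = np.column_stack([y[:-1], u[:-1]])
target = y[1:]
R = Phi.T @ Phi                              # sum of phi(t) phi(t)^T

def cost(theta):
    """Empirical prediction error variance for the parameter vector theta."""
    return np.mean((target - Phi @ np.asarray(theta)) ** 2)

rank = np.linalg.matrix_rank(R)              # 1, smaller than n_theta = 2

# lstsq picks, among all the equivalent minimizers, the minimum-norm one.
theta_min_norm = np.linalg.lstsq(Phi, target, rcond=None)[0]
```

Here `cost([0.5, 0.0])` and `cost([0.5, 7.0])` coincide, which is the "all solutions are equivalent for prediction purposes" statement in numerical form. Picking the minimum-norm solution is just one possible convention.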
IDENTIFICATION OF ARMA / ARMAX MODELS
(Maximum Likelihood (ML) method)
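Generic ARMAX model:
$$\mathcal{M}(\vartheta):\quad y(t) = \frac{B(z)}{A(z)}\,u(t-d) + \frac{C(z)}{A(z)}\,e(t), \qquad e(t) \sim WN(0, \lambda^2)$$
$$A(z) = 1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_m z^{-m}$$
$$B(z) = b_1 + b_2 z^{-1} + b_3 z^{-2} + \cdots + b_p z^{-p+1}$$
$$C(z) = 1 + c_1 z^{-1} + c_2 z^{-2} + \cdots + c_n z^{-n}$$
$$\vartheta = [\,a_1 \ \cdots \ a_m \quad b_1 \ \cdots \ b_p \quad c_1 \ \cdots \ c_n\,]^T \quad \text{(dimension } n_\vartheta = m + p + n\text{)}$$
$\mathcal{M}(\vartheta)$ is canonical $\forall \vartheta \in \Theta$
1-step long division between $C(z)$ and $A(z)$ (both are monic): the quotient is 1 and the remainder is $C(z) - A(z)$, i.e.
$$E(z) = 1, \qquad F(z)\,z^{-1} = C(z) - A(z)$$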
Model in prediction form
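$$\hat{\mathcal{M}}(\vartheta):\quad \hat{y}(t|t-1,\vartheta) = \frac{C(z) - A(z)}{C(z)}\,y(t) + \frac{B(z)}{C(z)}\,u(t-d)$$
Prediction error
$$\varepsilon(t,\vartheta) = y(t) - \hat{y}(t|t-1,\vartheta) = \left[1 - \frac{C(z) - A(z)}{C(z)}\right]y(t) - \frac{B(z)}{C(z)}\,u(t-d)$$
$$\varepsilon(t,\vartheta) = \frac{A(z)}{C(z)}\,y(t) - \frac{B(z)}{C(z)}\,u(t-d)$$
Identification criterion: $J_N(\vartheta) = \frac{1}{N}\sum_{t=1}^{N}\varepsilon(t;\vartheta)^2$
Problem: due to $C(z)$ at the denominator, $\varepsilon(t,\vartheta)$ is not linear with
respect to $\vartheta$ and the identification criterion $J_N(\vartheta)$ is not a quadratic
function of $\vartheta$.
In general, $J_N(\vartheta)$ may present local minima.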
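In practice the prediction error is computed recursively from $C(z)\,\varepsilon(t) = A(z)\,y(t) - B(z)\,u(t-d)$. The sketch below is one possible way to do it, assuming zero initial conditions, 0-based time indices, and the sign conventions of $A(z)$, $B(z)$, $C(z)$ given above; the function name `prediction_error` and the simulated ARMAX(1,1,1) data are hypothetical.

```python
import numpy as np

def prediction_error(a, b, c, y, u, d=1):
    """eps(t) from C(z) eps(t) = A(z) y(t) - B(z) u(t-d).

    a, b, c are the coefficient lists [a_1..a_m], [b_1..b_p], [c_1..c_n].
    Samples before t = 0 are taken as zero (an assumption of this sketch).
    """
    N = len(y)
    eps = np.zeros(N)
    for t in range(N):
        acc = y[t]
        for i, a_i in enumerate(a, start=1):     # - a_i y(t-i)
            if t - i >= 0:
                acc -= a_i * y[t - i]
        for j, b_j in enumerate(b):              # - b_{j+1} u(t-d-j)
            if t - d - j >= 0:
                acc -= b_j * u[t - d - j]
        for k, c_k in enumerate(c, start=1):     # - c_k eps(t-k)
            if t - k >= 0:
                acc -= c_k * eps[t - k]
        eps[t] = acc
    return eps

# Simulated ARMAX(1,1,1) data with d = 1 (illustrative parameter values).
rng = np.random.default_rng(1)
N = 300
u = rng.standard_normal(N)
e = rng.standard_normal(N)
y = np.zeros(N)
y[0] = e[0]
for t in range(1, N):
    y[t] = 0.7 * y[t - 1] + 1.5 * u[t - 1] + e[t] + 0.4 * e[t - 1]

eps_true = prediction_error([0.7], [1.5], [0.4], y, u)   # reproduces e(t)
J_true = np.mean(eps_true ** 2)
J_wrong = np.mean(prediction_error([0.7], [1.5], [-0.4], y, u) ** 2)
```

The dependence of `eps[t]` on its own past values through `c` is what makes $\varepsilon(t,\vartheta)$ nonlinear in $\vartheta$.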
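Computing $\hat{\vartheta}_N = \arg\min_{\vartheta\in\Theta} J_N(\vartheta)$ (i.e. minimizing $J_N(\vartheta)$) requires
iterative methods:
• The algorithm is initialized with an initial estimate (typically,
randomly chosen) of the optimal parameter vector: $\vartheta^{(1)}$
• Update rule: $\vartheta^{(i+1)} = f(\vartheta^{(i)})$ (the estimate is refined through steps)
• The sequence of estimates should converge to $\hat{\vartheta}_N$
$$\vartheta^{(1)} \to \vartheta^{(2)} \to \vartheta^{(3)} \to \cdots \to \vartheta^{(i)} \to \vartheta^{(i+1)} \to \cdots \to \hat{\vartheta}_N$$
[Figure: $J_N(\vartheta)$ versus $\vartheta$, with the iterates $\vartheta^{(1)}, \vartheta^{(2)}, \vartheta^{(3)}, \ldots$ approaching the minimizer $\hat{\vartheta}_N$.]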
Problem: local minima
Typically, iterative algorithms are guaranteed to converge to a
minimum which however could be a local one.
No analytical solutions, just empirical approaches:
• The iterative algorithm is applied M times, each time using a
different (randomly chosen) initialization:
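$$\begin{array}{rl}
1) & \vartheta^{(1),1} \to \vartheta^{(2),1} \to \cdots \to \hat{\vartheta}_N^{1} \\
2) & \vartheta^{(1),2} \to \vartheta^{(2),2} \to \cdots \to \hat{\vartheta}_N^{2} \\
3) & \vartheta^{(1),3} \to \vartheta^{(2),3} \to \cdots \to \hat{\vartheta}_N^{3} \\
 & \cdots \\
M) & \vartheta^{(1),M} \to \vartheta^{(2),M} \to \cdots \to \hat{\vartheta}_N^{M}
\end{array}$$
This way, we obtain $M$ different solutions corresponding to the
minima of $J_N(\vartheta)$
• Among the $M$ different solutions $\hat{\vartheta}_N^{k}$, $k = 1, \ldots, M$, choose the one which
corresponds to the minimum value of $J_N(\hat{\vartheta}_N^{k})$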
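The multi-start strategy can be sketched independently of any particular model. In the example below a scalar toy cost with two minima stands in for $J_N(\vartheta)$, and a plain gradient iteration stands in for the iterative algorithm; the cost function, the step size, the interval for the random initializations and $M = 10$ are all assumptions made for illustration.

```python
import random

def cost(x):
    """Toy non-quadratic cost with two minima (a stand-in for J_N)."""
    return x ** 4 - 3 * x ** 2 + x

def gradient(x):
    """Derivative of the toy cost."""
    return 4 * x ** 3 - 6 * x + 1

def descend(x, step=0.01, n_iter=5000):
    """Plain gradient iteration started from x; converges to a nearby minimum."""
    for _ in range(n_iter):
        x -= step * gradient(x)
    return x

def multi_start(starts):
    """Run the iteration from every start and keep the lowest-cost result."""
    solutions = [descend(x0) for x0 in starts]
    return min(solutions, key=cost)

# M = 10 random initializations drawn uniformly in [-2, 2].
rng = random.Random(0)
starts = [rng.uniform(-2.0, 2.0) for _ in range(10)]
best = multi_start(starts)
```

Started from 2.0 alone, the iteration stops at the local minimum near 1.13; adding a start on the other side of the hump lets `multi_start` return the global minimum near -1.30. Increasing the number of starts raises the chance of landing in the right basin at a proportional computational cost.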
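[Figure: $J_N(\vartheta)$ versus $\vartheta$ with several local minima and the global minimizer $\hat{\vartheta}_N$ marked.]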
This is an empirical method only. It may happen that the global
minimum is not found
Clearly, the bigger M , the greater the probability of finding the
global minimum
BUT
the bigger M , the higher the computational complexity
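We are now ready to discuss the update rule $\vartheta^{(i+1)} = f(\vartheta^{(i)})$ of an
iterative method.
[Figure: $J_N(\vartheta)$ versus $\vartheta$, with the iterates $\vartheta^{(1)}, \vartheta^{(2)}, \vartheta^{(3)}, \ldots$]
We will present the so-called Newton method, which guarantees that
the obtained sequence of estimates converges to a local minimum of
$J_N(\vartheta)$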
Newton Method
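Fundamental problem: how do we obtain $\vartheta^{(i+1)}$ based on $\vartheta^{(i)}$?
Idea: let $V^{(i)}(\vartheta)$ be the 2nd order Taylor approximant of $J_N(\vartheta)$ in the
neighborhood of $\vartheta^{(i)}$:
$$V^{(i)}(\vartheta) = J_N(\vartheta^{(i)}) + \bigl(\vartheta - \vartheta^{(i)}\bigr)^T \left.\frac{\mathrm{d}J_N(\vartheta)}{\mathrm{d}\vartheta}\right|_{\vartheta=\vartheta^{(i)}} + \frac{1}{2}\bigl(\vartheta - \vartheta^{(i)}\bigr)^T \left.\frac{\mathrm{d}^2J_N(\vartheta)}{\mathrm{d}\vartheta^2}\right|_{\vartheta=\vartheta^{(i)}} \bigl(\vartheta - \vartheta^{(i)}\bigr)$$
Then, $\vartheta^{(i+1)}$ is obtained as the minimum of $V^{(i)}(\vartheta)$, i.e. it is the minimum
of the 2nd order Taylor approximant of $J_N(\vartheta)$ about $\vartheta^{(i)}$.
[Figure: $J_N(\vartheta)$ and its quadratic approximant $V^{(i)}(\vartheta)$ versus $\vartheta$; $\vartheta^{(i+1)}$ is the minimizer of $V^{(i)}(\vartheta)$.]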
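Let us compute an explicit expression for $\vartheta^{(i+1)}$ by letting $\frac{\mathrm{d}V^{(i)}(\vartheta)}{\mathrm{d}\vartheta} = 0$:
$$\frac{\mathrm{d}V^{(i)}(\vartheta)}{\mathrm{d}\vartheta} = \left.\frac{\mathrm{d}J_N(\vartheta)}{\mathrm{d}\vartheta}\right|_{\vartheta=\vartheta^{(i)}} + \left.\frac{\mathrm{d}^2J_N(\vartheta)}{\mathrm{d}\vartheta^2}\right|_{\vartheta=\vartheta^{(i)}} \bigl(\vartheta - \vartheta^{(i)}\bigr) = 0$$
Update rule of the Newton method
$$\vartheta^{(i+1)} = \vartheta^{(i)} - \left[\left.\frac{\mathrm{d}^2J_N(\vartheta)}{\mathrm{d}\vartheta^2}\right|_{\vartheta=\vartheta^{(i)}}\right]^{-1} \left.\frac{\mathrm{d}J_N(\vartheta)}{\mathrm{d}\vartheta}\right|_{\vartheta=\vartheta^{(i)}}$$
It remains to compute:
$\left.\frac{\mathrm{d}J_N(\vartheta)}{\mathrm{d}\vartheta}\right|_{\vartheta=\vartheta^{(i)}}$: derivative vector of $J_N(\vartheta)$ (1st derivative)
$\left.\frac{\mathrm{d}^2J_N(\vartheta)}{\mathrm{d}\vartheta^2}\right|_{\vartheta=\vartheta^{(i)}}$: Hessian matrix of $J_N(\vartheta)$ (2nd derivative)
Dimensions: $(n_\vartheta \times 1) = (n_\vartheta \times 1) - (n_\vartheta \times n_\vartheta)\cdot(n_\vartheta \times 1)$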
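A minimal sketch of one Newton update, assuming the gradient and the Hessian are available as functions. The quadratic cost used to exercise it is a made-up example, chosen because on a quadratic function the 2nd order Taylor approximant is exact, so a single step lands on the minimum.

```python
import numpy as np

def newton_step(theta, grad, hess):
    """theta_next = theta - H(theta)^{-1} g(theta).

    np.linalg.solve is used instead of forming the inverse explicitly.
    """
    return theta - np.linalg.solve(hess(theta), grad(theta))

# Illustrative quadratic cost J(theta) = (theta_1 - 1)^2 + 2 (theta_2 + 3)^2.
def grad(th):
    return np.array([2.0 * (th[0] - 1.0), 4.0 * (th[1] + 3.0)])

def hess(th):
    return np.array([[2.0, 0.0], [0.0, 4.0]])

theta_next = newton_step(np.array([10.0, 10.0]), grad, hess)   # lands on [1, -3]
```

For a non-quadratic $J_N(\vartheta)$ the same step is simply repeated, re-evaluating gradient and Hessian at each new iterate.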
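• Let us compute $\frac{\mathrm{d}J_N(\vartheta)}{\mathrm{d}\vartheta}$
$$\frac{\mathrm{d}J_N(\vartheta)}{\mathrm{d}\vartheta} = \frac{\mathrm{d}}{\mathrm{d}\vartheta}\left[\frac{1}{N}\sum_{t=1}^{N}\varepsilon(t,\vartheta)^2\right] = \frac{1}{N}\sum_{t=1}^{N}\frac{\mathrm{d}}{\mathrm{d}\vartheta}\,\varepsilon(t,\vartheta)^2$$
$$\frac{\mathrm{d}J_N(\vartheta)}{\mathrm{d}\vartheta} = \frac{2}{N}\sum_{t=1}^{N}\varepsilon(t,\vartheta)\,\frac{\mathrm{d}\varepsilon(t,\vartheta)}{\mathrm{d}\vartheta}$$
• Let us compute $\frac{\mathrm{d}^2J_N(\vartheta)}{\mathrm{d}\vartheta^2}$
$$\frac{\mathrm{d}^2J_N(\vartheta)}{\mathrm{d}\vartheta^2} = \frac{\mathrm{d}}{\mathrm{d}\vartheta}\left(\frac{\mathrm{d}J_N(\vartheta)}{\mathrm{d}\vartheta}\right)^T = \frac{\mathrm{d}}{\mathrm{d}\vartheta}\left[\frac{2}{N}\sum_{t=1}^{N}\varepsilon(t,\vartheta)\,\frac{\mathrm{d}\varepsilon(t,\vartheta)}{\mathrm{d}\vartheta}\right]^T$$
$$\frac{\mathrm{d}^2J_N(\vartheta)}{\mathrm{d}\vartheta^2} = \frac{2}{N}\sum_{t=1}^{N}\frac{\mathrm{d}\varepsilon(t,\vartheta)}{\mathrm{d}\vartheta}\left(\frac{\mathrm{d}\varepsilon(t,\vartheta)}{\mathrm{d}\vartheta}\right)^T + \frac{2}{N}\sum_{t=1}^{N}\varepsilon(t,\vartheta)\,\frac{\mathrm{d}^2\varepsilon(t,\vartheta)}{\mathrm{d}\vartheta^2}$$
Warning: typically, the second term is neglected and the following
approximation is adopted:
$$\frac{\mathrm{d}^2J_N(\vartheta)}{\mathrm{d}\vartheta^2} \approx \frac{2}{N}\sum_{t=1}^{N}\frac{\mathrm{d}\varepsilon(t,\vartheta)}{\mathrm{d}\vartheta}\left(\frac{\mathrm{d}\varepsilon(t,\vartheta)}{\mathrm{d}\vartheta}\right)^T$$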
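Observation (Hessian matrix approximation)
Why do we make the approximation
$$\frac{\mathrm{d}^2J_N(\vartheta)}{\mathrm{d}\vartheta^2} \approx \frac{2}{N}\sum_{t=1}^{N}\frac{\mathrm{d}\varepsilon(t,\vartheta)}{\mathrm{d}\vartheta}\left(\frac{\mathrm{d}\varepsilon(t,\vartheta)}{\mathrm{d}\vartheta}\right)^T \ ?$$
Because this way the approximate Hessian is always positive semidefinite (and positive definite
whenever it is invertible) ⇒ $\vartheta^{(i)}$ is forced to descend towards a minimum.
Positive definite Hessian ($\vartheta^{(i)}$ descends towards the minimum):
[Figure: $J_N(\vartheta)$ and a convex approximant $V^{(i)}(\vartheta)$ versus $\vartheta$, with the iterates $\vartheta^{(i)}$ and $\vartheta^{(i+1)}$.]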
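Negative definite Hessian ($\vartheta^{(i)}$ would move towards a maximum):
[Figure: $J_N(\vartheta)$ versus $\vartheta$, with the iterates $\vartheta^{(i)}$ and $\vartheta^{(i+1)}$ climbing towards a maximum.]
When instead we take
$$\frac{\mathrm{d}^2J_N(\vartheta)}{\mathrm{d}\vartheta^2} \approx \frac{2}{N}\sum_{t=1}^{N}\frac{\mathrm{d}\varepsilon(t,\vartheta)}{\mathrm{d}\vartheta}\left(\frac{\mathrm{d}\varepsilon(t,\vartheta)}{\mathrm{d}\vartheta}\right)^T$$
we have:
[Figure: $J_N(\vartheta)$ and the convex approximant $V^{(i)}(\vartheta)$ versus $\vartheta$, with the iterates $\vartheta^{(i)}$ and $\vartheta^{(i+1)}$ descending towards a minimum.]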
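After introducing the approximation on the Hessian matrix, the update
rule for the Newton method becomes:
$$\vartheta^{(i+1)} = \vartheta^{(i)} - \left[\frac{2}{N}\sum_{t=1}^{N}\frac{\mathrm{d}\varepsilon(t,\vartheta)}{\mathrm{d}\vartheta}\left(\frac{\mathrm{d}\varepsilon(t,\vartheta)}{\mathrm{d}\vartheta}\right)^T\right]^{-1}\left[\frac{2}{N}\sum_{t=1}^{N}\varepsilon(t,\vartheta)\,\frac{\mathrm{d}\varepsilon(t,\vartheta)}{\mathrm{d}\vartheta}\right]$$
N.B.: all quantities on the right-hand side are computed for $\vartheta = \vartheta^{(i)}$
Final step: how can we compute $\frac{\mathrm{d}\varepsilon(t,\vartheta)}{\mathrm{d}\vartheta}$?
Recall that $\varepsilon(t) = \frac{A(z)}{C(z)}\,y(t) - \frac{B(z)}{C(z)}\,u(t-d)$, i.e.
$$\varepsilon(t) = \frac{1 - a_1 z^{-1} - \cdots - a_m z^{-m}}{1 + c_1 z^{-1} + \cdots + c_n z^{-n}}\,y(t) - \frac{b_1 + b_2 z^{-1} + \cdots + b_p z^{-p+1}}{1 + c_1 z^{-1} + \cdots + c_n z^{-n}}\,u(t-d)$$
$$\vartheta = [\,a_1 \ a_2 \ \cdots \ a_m \quad b_1 \ b_2 \ \cdots \ b_p \quad c_1 \ c_2 \ \cdots \ c_n\,]^T$$
$$\frac{\mathrm{d}\varepsilon(t,\vartheta)}{\mathrm{d}\vartheta} = \left[\,\frac{\partial\varepsilon(t,\vartheta)}{\partial a_1} \ \cdots \ \frac{\partial\varepsilon(t,\vartheta)}{\partial b_1} \ \cdots \ \frac{\partial\varepsilon(t,\vartheta)}{\partial c_1} \ \cdots \ \frac{\partial\varepsilon(t,\vartheta)}{\partial c_n}\,\right]^T$$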
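Partial derivatives of $\varepsilon(t,\vartheta)$ with respect to $a_1, \ldots, a_m$
$$\frac{\partial\varepsilon(t)}{\partial a_i} = \frac{\partial}{\partial a_i}\left[\frac{1 - a_1 z^{-1} - \cdots - a_m z^{-m}}{1 + c_1 z^{-1} + \cdots + c_n z^{-n}}\right]y(t) - \frac{\partial}{\partial a_i}\left[\frac{b_1 + \cdots + b_p z^{-p+1}}{1 + c_1 z^{-1} + \cdots + c_n z^{-n}}\right]u(t-d)$$
The second term does not depend on $a_i$, and the minus sign comes from $A(z) = 1 - a_1 z^{-1} - \cdots - a_m z^{-m}$. Hence,
$$\begin{aligned}
\frac{\partial\varepsilon(t)}{\partial a_1} &= -\frac{z^{-1}}{C(z)}\,y(t) = -\frac{1}{C(z)}\,y(t-1) = \alpha(t-1) \\
\frac{\partial\varepsilon(t)}{\partial a_2} &= -\frac{z^{-2}}{C(z)}\,y(t) = -\frac{1}{C(z)}\,y(t-2) = \alpha(t-2) \\
&\ \ \vdots \\
\frac{\partial\varepsilon(t)}{\partial a_m} &= -\frac{z^{-m}}{C(z)}\,y(t) = -\frac{1}{C(z)}\,y(t-m) = \alpha(t-m)
\end{aligned}$$
$$\alpha(t) := -\frac{1}{C(z)}\,y(t)$$
Partial derivatives of $\varepsilon(t,\vartheta)$ with respect to $b_1, \ldots, b_p$
$$\frac{\partial\varepsilon(t)}{\partial b_i} = \frac{\partial}{\partial b_i}\left[\frac{1 - a_1 z^{-1} - \cdots - a_m z^{-m}}{1 + c_1 z^{-1} + \cdots + c_n z^{-n}}\right]y(t) - \frac{\partial}{\partial b_i}\left[\frac{b_1 + b_2 z^{-1} + \cdots + b_p z^{-p+1}}{1 + c_1 z^{-1} + \cdots + c_n z^{-n}}\right]u(t-d)$$
The first term does not depend on $b_i$. Hence,
$$\begin{aligned}
\frac{\partial\varepsilon(t)}{\partial b_1} &= -\frac{1}{C(z)}\,u(t-d) = \beta(t-d) \\
\frac{\partial\varepsilon(t)}{\partial b_2} &= -\frac{z^{-1}}{C(z)}\,u(t-d) = -\frac{1}{C(z)}\,u(t-d-1) = \beta(t-d-1) \\
&\ \ \vdots \\
\frac{\partial\varepsilon(t)}{\partial b_p} &= -\frac{z^{-p+1}}{C(z)}\,u(t-d) = -\frac{1}{C(z)}\,u(t-d-p+1) = \beta(t-d-p+1)
\end{aligned}$$
$$\beta(t) := -\frac{1}{C(z)}\,u(t)$$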
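Partial derivatives of $\varepsilon(t,\vartheta)$ with respect to $c_1, \ldots, c_n$
$$\varepsilon(t) = \frac{A(z)}{C(z)}\,y(t) - \frac{B(z)}{C(z)}\,u(t-d)$$
$$\bigl(1 + c_1 z^{-1} + \cdots + c_n z^{-n}\bigr)\,\varepsilon(t) = A(z)\,y(t) - B(z)\,u(t-d)$$
$$\frac{\partial}{\partial c_i}\Bigl[\bigl(1 + c_1 z^{-1} + \cdots + c_n z^{-n}\bigr)\,\varepsilon(t)\Bigr] = \frac{\partial}{\partial c_i}\Bigl[A(z)\,y(t) - B(z)\,u(t-d)\Bigr]$$
The right-hand side does not depend on $c_i$, so
$$z^{-i}\,\varepsilon(t) + C(z)\,\frac{\partial\varepsilon(t)}{\partial c_i} = 0$$
Hence,
$$\begin{aligned}
\frac{\partial\varepsilon(t)}{\partial c_1} &= -\frac{1}{C(z)}\,\varepsilon(t-1) = \gamma(t-1) \\
\frac{\partial\varepsilon(t)}{\partial c_2} &= -\frac{1}{C(z)}\,\varepsilon(t-2) = \gamma(t-2) \\
&\ \ \vdots \\
\frac{\partial\varepsilon(t)}{\partial c_n} &= -\frac{1}{C(z)}\,\varepsilon(t-n) = \gamma(t-n)
\end{aligned}$$
$$\gamma(t) := -\frac{1}{C(z)}\,\varepsilon(t)$$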
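Hence,
$$\frac{\partial\varepsilon(t)}{\partial\vartheta} = \bigl[\,\alpha(t-1) \ \cdots \ \alpha(t-m) \quad \beta(t-d) \ \cdots \ \beta(t-d-p+1) \quad \gamma(t-1) \ \cdots \ \gamma(t-n)\,\bigr]^T$$
It is composed of $m + p + n$ signals,
defined for $t = 1, 2, \ldots, N$.
The signals $\alpha(t), \beta(t), \gamma(t)$ are obtained according to the following scheme:
[Block diagram: $y(t)$ filtered by $A(z)$ minus $u(t)$ filtered by $z^{-d}B(z)$ is passed through $\frac{1}{C(z)}$ to give $\varepsilon(t)$; $\varepsilon(t)$ filtered by $-\frac{1}{C(z)}$ gives $\gamma(t)$; $y(t)$ filtered by $-\frac{1}{C(z)}$ gives $\alpha(t)$; $u(t)$ filtered by $-\frac{1}{C(z)}$ gives $\beta(t)$.]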
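A brief summary of the update rule in the Newton method
(how $\vartheta^{(i+1)}$ is computed based on $\vartheta^{(i)}$)
• compute the polynomials $A(z,\vartheta^{(i)}), B(z,\vartheta^{(i)}), C(z,\vartheta^{(i)})$ at step $i$
• compute the signals $\varepsilon(t,\vartheta^{(i)}), \alpha(t,\vartheta^{(i)}), \beta(t,\vartheta^{(i)}), \gamma(t,\vartheta^{(i)})$ by filtering
the available data according to the previous scheme
• compute $\frac{\mathrm{d}\varepsilon(t,\vartheta^{(i)})}{\mathrm{d}\vartheta}$
• update the parameter estimate (the factors $\frac{2}{N}$ cancel out):
$$\vartheta^{(i+1)} = \vartheta^{(i)} - \left[\sum_{t=1}^{N}\frac{\mathrm{d}\varepsilon(t,\vartheta^{(i)})}{\mathrm{d}\vartheta}\left(\frac{\mathrm{d}\varepsilon(t,\vartheta^{(i)})}{\mathrm{d}\vartheta}\right)^T\right]^{-1}\left[\sum_{t=1}^{N}\varepsilon(t,\vartheta^{(i)})\,\frac{\mathrm{d}\varepsilon(t,\vartheta^{(i)})}{\mathrm{d}\vartheta}\right]$$
Observation
Before filtering, we need to check each time whether $C(z,\vartheta^{(i)})$
has all its roots inside the unit circle; if not, make $C(z,\vartheta^{(i)})$ stable by replacing
the roots outside the unit circle with their reciprocals (Bauer algorithm)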
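The summary can be sketched end to end for the smallest ARMAX case. The code below is an illustrative sketch only, assuming $m = p = n = 1$ and $d = 1$, zero initial conditions, and the convention $A(z) = 1 - a z^{-1}$, $C(z) = 1 + c z^{-1}$, so that $\alpha$, $\beta$, $\gamma$ are all obtained by filtering with $-1/C(z)$. The step halving, the small term `delta` added to the approximate Hessian, the first-order root reflection $c \to 1/c$, the function names and the simulated data are choices of this sketch rather than part of the method as stated above.

```python
import numpy as np

def signals(theta, y, u):
    """eps(t) and d eps(t)/d theta for ARMAX(1,1,1) with d = 1."""
    a, b, c = theta
    N = len(y)
    eps, alpha, beta, gamma = (np.zeros(N) for _ in range(4))
    eps[0] = y[0]
    alpha[0], beta[0], gamma[0] = -y[0], -u[0], -eps[0]
    for t in range(1, N):
        eps[t] = y[t] - a * y[t - 1] - b * u[t - 1] - c * eps[t - 1]
        alpha[t] = -y[t] - c * alpha[t - 1]       # C(z) alpha(t) = -y(t)
        beta[t] = -u[t] - c * beta[t - 1]         # C(z) beta(t)  = -u(t)
        gamma[t] = -eps[t] - c * gamma[t - 1]     # C(z) gamma(t) = -eps(t)
    psi = np.zeros((N, 3))                        # row t: [alpha, beta, gamma](t-1)
    psi[1:] = np.column_stack([alpha[:-1], beta[:-1], gamma[:-1]])
    return eps, psi

def cost(theta, y, u):
    return np.mean(signals(theta, y, u)[0] ** 2)

def gauss_newton(y, u, theta0, n_iter=50, delta=1e-8):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        eps, psi = signals(theta, y, u)
        H = psi.T @ psi + delta * np.eye(3)       # approximate Hessian (times N/2)
        g = psi.T @ eps                           # gradient (times N/2)
        step = np.linalg.solve(H, g)
        mu = 1.0
        while mu > 1e-6:                          # halve the step until J decreases
            cand = theta - mu * step
            if abs(cand[2]) > 1.0:                # root of C(z) outside the unit circle
                cand[2] = 1.0 / cand[2]
            if cost(cand, y, u) < np.mean(eps ** 2):
                theta = cand
                break
            mu /= 2
        else:
            break                                 # no improving step: stop iterating
    return theta

def simulate(theta, N, lam, seed=0):
    a, b, c = theta
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(N)
    e = lam * rng.standard_normal(N)
    y = np.zeros(N)
    y[0] = e[0]
    for t in range(1, N):
        y[t] = a * y[t - 1] + b * u[t - 1] + e[t] + c * e[t - 1]
    return y, u, e

y, u, e = simulate([0.7, 1.0, 0.5], N=2000, lam=0.5)
theta_hat = gauss_newton(y, u, theta0=[0.3, 0.5, 0.0])
```

The initialization deliberately avoids $a = c = 0$: there $\alpha(t-1)$ and $\gamma(t-1)$ coincide and the approximate Hessian is singular.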
Appendix
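In numerical optimization, the update from $\vartheta^{(i)}$ to $\vartheta^{(i+1)}$ can be
performed according to three types of methods:
Gradient method
$$\vartheta^{(i+1)} = \vartheta^{(i)} - \mu \left.\frac{\partial J_N(\vartheta)}{\partial\vartheta}\right|_{\vartheta=\vartheta^{(i)}}$$
$\mu$ is a fixed parameter which is called the “step” of the gradient method
⇒ simple & robust (for a sufficiently small step, $\vartheta^{(i)}$ always descends towards a minimum).
⇒ it can be very slow to reach the minimum (when $\vartheta^{(i)}$ is close to
the minimum the gradient tends to 0).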
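The slow final approach of the gradient method can be observed on a toy problem. The sketch below is illustrative only: the quadratic cost, the step $\mu = 0.1$ and the number of iterations are assumptions, and `gradient_method` is a hypothetical helper name.

```python
import numpy as np

def gradient_method(grad, theta0, mu, n_iter):
    """Fixed-step gradient iteration; returns the whole sequence of iterates."""
    theta = np.asarray(theta0, dtype=float)
    path = [theta.copy()]
    for _ in range(n_iter):
        theta = theta - mu * grad(theta)
        path.append(theta.copy())
    return np.array(path)

# Illustrative quadratic cost J(theta) = (theta_1 - 1)^2 + 2 (theta_2 + 3)^2.
def grad(th):
    return np.array([2.0 * (th[0] - 1.0), 4.0 * (th[1] + 3.0)])

path = gradient_method(grad, [10.0, 10.0], mu=0.1, n_iter=100)

# Length of each update: it shrinks as the gradient vanishes near the minimum.
step_sizes = np.linalg.norm(np.diff(path, axis=0), axis=1)
```

On this cost the error along $\vartheta_1$ shrinks by a factor 0.8 per iteration, so many iterations are needed, whereas a Newton step on the same quadratic reaches the minimum at once.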
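Newton method
$$\vartheta^{(i+1)} = \vartheta^{(i)} - \left[\left.\frac{\partial^2 J_N(\vartheta)}{\partial\vartheta^2}\right|_{\vartheta=\vartheta^{(i)}}\right]^{-1} \left.\frac{\partial J_N(\vartheta)}{\partial\vartheta}\right|_{\vartheta=\vartheta^{(i)}}$$
The step of the gradient method is modulated through the Hessian
⇒ very fast convergence
⇒ computationally more demanding
⇒ it may fail to converge to a minimum if the Hessian is negative definite.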
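Quasi-Newton method
$$\vartheta^{(i+1)} = \vartheta^{(i)} - M^{-1} \left.\frac{\partial J_N(\vartheta)}{\partial\vartheta}\right|_{\vartheta=\vartheta^{(i)}}$$
$M$ is a positive definite approximation of the Hessian matrix
⇒ it always converges to a minimum
⇒ less computationally demanding than the Newton method
⇒ faster convergence than the gradient method, although slower
than the Newton method
Quasi-Newton methods are often adopted in practice.
Remark
Since we introduced the approximation
$$\frac{\partial^2 J_N(\vartheta)}{\partial\vartheta^2} \approx \frac{2}{N}\sum_{t=1}^{N}\frac{\partial\varepsilon(t)}{\partial\vartheta}\left(\frac{\partial\varepsilon(t)}{\partial\vartheta}\right)^T,$$
the method we used to minimize $J_N(\vartheta)$ can be more properly
classified as a quasi-Newton method.
In order to guarantee that the approximation of $\frac{\partial^2 J_N(\vartheta)}{\partial\vartheta^2}$ is invertible, one usually takes
$$\frac{\partial^2 J_N(\vartheta)}{\partial\vartheta^2} \approx \frac{2}{N}\sum_{t=1}^{N}\frac{\partial\varepsilon(t)}{\partial\vartheta}\left(\frac{\partial\varepsilon(t)}{\partial\vartheta}\right)^T + \delta I,$$
where $I$ is the identity matrix and $\delta$ is a small positive number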
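The effect of the $\delta I$ term can be checked numerically. In the sketch below the gradient signals are made-up numbers with two identical columns, so that the approximate Hessian is singular; the value of `delta` is an arbitrary illustrative choice.

```python
import numpy as np

# Hypothetical gradient signals d eps(t)/d theta with two identical columns.
psi = np.array([[1.0, 1.0],
                [2.0, 2.0],
                [3.0, 3.0]])
eps = np.array([1.0, 1.0, 1.0])
N = len(eps)

H = (2.0 / N) * psi.T @ psi          # approximate Hessian: singular (rank 1)
g = (2.0 / N) * psi.T @ eps          # gradient of J_N

delta = 1e-6
step = np.linalg.solve(H + delta * np.eye(2), g)   # well defined thanks to delta I
```

Without the regularization the linear system has infinitely many solutions; with it the update is unique, and both components of `step` are equal, as the symmetry of the example suggests.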