System Identification and Optimization Methods Based on Derivatives
Chapters 5 & 6 from Jang
Neuro-Fuzzy and Soft Computing: Fuzzy Sets
Neuro-Fuzzy and Soft Computing

[Figure: soft computing spans a model space (fuzzy inference systems, neural networks, adaptive networks) and an approach space (derivative-based and derivative-free optimization).]
Outline

• System Identification
• Least-Squares Optimization
• Optimization Based on Differentiation
  – First-order differentiation: steepest descent
  – Second-order differentiation
System Identification: Introduction
Goal:
• Determine a mathematical model for an unknown system (or target system) by observing its input-output data pairs
System Identification: Introduction

Purpose:
• To predict a system's behavior, as in time-series prediction and weather forecasting
• To explain the interactions and relationships between the inputs and outputs of a system
• To design a controller based on the model of a system, as in aircraft or ship control
• To simulate the system under control once the model is known
System Identification: Introduction

System identification involves two main steps:

• Structure identification

• Parameter identification
System Identification: Introduction

• Structure identification

Apply a priori knowledge about the target system to determine a class of models within which the search for the most suitable model is conducted. This class of models is denoted by a parameterized function y = f(u, θ), where:

y is the model output
u is the input vector
θ is the parameter vector

The form of f depends on the problem at hand, on the designer's experience, and on the laws of nature governing the target system.
System Identification: Introduction

• Parameter identification

The structure of the model is known; however, we need to apply optimization techniques to determine a parameter vector θ = θ̂ such that the resulting model ŷ = f(u, θ̂) describes the system appropriately:

ŷi − yi → 0, where yi is the desired output assigned to input ui
System Identification: Introduction

The data set composed of m desired input-output pairs (ui, yi) (i = 1, …, m) is called the training data.

System identification repeats both structure and parameter identification until a satisfactory model is found, as follows:

1) Specify and parameterize a class of mathematical models representing the system to be identified
2) Perform parameter identification to choose the parameters that best fit the training data set
3) Conduct validation tests to see whether the identified model responds correctly to an unseen data set
4) Terminate the procedure once the results of the validation test are satisfactory; otherwise, select another class of models and repeat steps 2 to 4
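The four steps above can be sketched in code. This is a minimal illustration, not from the slides: the model classes are polynomials of increasing degree, the data are synthetic, and all names (`identify`, `tol`, etc.) are made up for the example.

```python
import numpy as np

def identify(u_train, y_train, u_val, y_val, max_degree=5, tol=0.01):
    """Sketch of the identification loop: try polynomial model classes of
    increasing degree (structure identification), fit each one by least
    squares (parameter identification), and stop when the validation
    error on unseen data is satisfactory."""
    for degree in range(1, max_degree + 1):           # step 1: pick a model class
        theta = np.polyfit(u_train, y_train, degree)  # step 2: fit the parameters
        val_err = np.mean((np.polyval(theta, u_val) - y_val) ** 2)
        if val_err < tol:                             # steps 3-4: validate, terminate
            return degree, theta
    return None  # no candidate class was satisfactory

# Toy data from a quadratic system with mild noise; even-indexed points
# are used for training, odd-indexed points for validation.
rng = np.random.default_rng(0)
u = np.linspace(0.0, 4.0, 40)
y = 1.0 + 0.5 * u + 0.25 * u**2 + rng.normal(0.0, 0.05, u.size)
degree, theta = identify(u[::2], y[::2], u[1::2], y[1::2])
print(degree)
```

A linear model class fails the validation test here, so the loop moves on to the quadratic class, which passes.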
Least-Squares Estimators

General form:

y = θ1 f1(u) + θ2 f2(u) + … + θn fn(u)   (*)

where:
• u = (u1, …, up)T is the model input vector
• f1, …, fn are known functions of u
• θ1, …, θn are unknown parameters to be estimated
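The key point of form (*) is that the model is linear in the parameters θ even when the basis functions are nonlinear in u. A minimal sketch (the basis functions chosen here are illustrative, not from the slides):

```python
import numpy as np

# Illustrative known basis functions f1, f2, f3; any known functions of u
# qualify, and the model stays linear in theta.
basis = [lambda u: np.ones_like(u),  # f1(u) = 1 (constant term)
         lambda u: u,                # f2(u) = u
         lambda u: np.sin(u)]        # f3(u) = sin(u)

def model(u, theta):
    """Evaluate y = theta1*f1(u) + theta2*f2(u) + theta3*f3(u)."""
    return sum(t * f(u) for t, f in zip(theta, basis))

u = np.array([0.0, 1.0, 2.0])
theta = np.array([1.0, 2.0, 0.5])
y = model(u, theta)
print(y)  # y[0] = 1*1 + 2*0 + 0.5*sin(0) = 1.0
```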
Least-Squares Estimators

The task of fitting data using a linear model is referred to as linear regression.

We collect a training data set {(ui, yi), i = 1, …, m}.

Equation (*) becomes:

f1(u1)θ1 + f2(u1)θ2 + … + fn(u1)θn = y1
f1(u2)θ1 + f2(u2)θ2 + … + fn(u2)θn = y2
  ⋮
f1(um)θ1 + f2(um)θ2 + … + fn(um)θn = ym

which is equivalent to the matrix equation: A θ = y
Least-Squares Estimators

where A is an m×n matrix:

A = [ f1(u1) … fn(u1) ]
    [   ⋮          ⋮   ]
    [ f1(um) … fn(um) ]

θ is an n×1 unknown parameter vector:

θ = [θ1, …, θn]T

and y is an m×1 output vector:

y = [y1, …, ym]T

so that the i-th row of A is aiT = [f1(ui), …, fn(ui)].

When A is square (m = n) and nonsingular, the system has the exact solution:

A θ = y ⇔ θ = A⁻¹y
Least-Squares Estimators

We have m outputs and n fitting parameters to find (i.e., m equations in n unknown variables).

Usually m is greater than n: the model is only an approximation of the target system, and the observed data may be corrupted by noise. Therefore, an exact solution is not always possible.

To overcome this inherent conceptual problem, an error vector e is added to compensate:

A θ + e = y
Least-Squares Estimators

The goal now consists of finding the θ that minimizes the error between each yi and its model prediction ŷi = aiTθ:

minimize over θ:  Σ_{i=1}^{m} (yi − aiTθ)²
Least-Squares Estimators

If e = y − Aθ, then:

E(θ) = Σ_{i=1}^{m} (yi − aiTθ)² = eTe = (y − Aθ)T(y − Aθ)

We need to compute:

min over θ of (y − Aθ)T(y − Aθ)
Least-Squares Estimators

Theorem [least-squares estimator]

The squared error is minimized when θ = θ̂ (called the least-squares estimator, LSE) satisfies the normal equation ATA θ̂ = ATy. If ATA is nonsingular, θ̂ is unique and is given by:

θ̂ = (ATA)⁻¹ATy
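As a sketch of the theorem in code (the data values here are made up for illustration), the normal-equation solution can be checked against NumPy's own least-squares routine:

```python
import numpy as np

# Illustrative overdetermined system: m = 5 observations, n = 2 parameters.
u = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix A for the model y = theta1 * 1 + theta2 * u
A = np.column_stack([np.ones_like(u), u])

# LSE via the normal equation: solve (A^T A) theta = A^T y
theta_ne = np.linalg.solve(A.T @ A, A.T @ y)

# Reference answer from NumPy's built-in least-squares solver
theta_ls, *_ = np.linalg.lstsq(A, y, rcond=None)

print(theta_ne)  # both solutions agree
```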
Least-Squares Estimators: Example

Example: the relationship between the spring length and the applied force, L = k1 f + k0 (linear model)

• Goal: find θ = (k0, k1)T that best fits the data, so that for a given force f0 we can determine the corresponding spring length L0

– Solution 1: take 2 pairs (f0, L0) and (f1, L1) and solve a linear system of 2 equations in the 2 variables k0 and k1 ⇒ k0 & k1. However, because of noisy data, this solution is not reliable.
– Solution 2: use a larger training set (fi, Li)
Least-Squares Estimators: Example

Since y = Aθ + e, the seven observed (force, length) pairs give:

y = [1.5, 2.1, 2.5, 3.3, 4.1, 4.6, 5.0]T

A = [ 1  1.1 ]
    [ 1  1.9 ]
    [ 1  3.2 ]
    [ 1  4.4 ]
    [ 1  5.9 ]
    [ 1  7.4 ]
    [ 1  9.2 ]

θ = [k0, k1]T,  e = [e1, …, e7]T
Least-Squares Estimators: Example

Therefore, the LSE of [k0, k1]T, which minimizes eTe = Σ_{i=1}^{7} ei², is equal to:

[k0, k1]T = (ATA)⁻¹ATy ≈ [1.20, 0.44]T
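The slide's numbers can be reproduced directly; a minimal sketch, assuming the seven data pairs shown above:

```python
import numpy as np

# Spring data from the example: applied force f and observed length L
f = np.array([1.1, 1.9, 3.2, 4.4, 5.9, 7.4, 9.2])
L = np.array([1.5, 2.1, 2.5, 3.3, 4.1, 4.6, 5.0])

A = np.column_stack([np.ones_like(f), f])   # rows [1, f_i], theta = [k0, k1]
k0, k1 = np.linalg.solve(A.T @ A, A.T @ L)  # normal-equation LSE

print(k0, k1)  # k0 ≈ 1.20, k1 ≈ 0.44
```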
Least-Squares Estimators: Example

We rely on this estimate because we have more data.

If we are not happy with the linear LSE fit, we can increase the model's degrees of freedom:

L = k0 + k1 f + k2 f² + … + kn fⁿ

(least-squares polynomial)
Least-Squares Estimators: Example

Higher-order models fit the data better, but they do not always reflect the underlying law that governs the system.

For example, under a high-order polynomial fit, the predicted length decreases as f increases toward 10 N, which contradicts the physics of a spring.
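This effect can be sketched with the spring data (the degree choices are illustrative): a degree-6 polynomial interpolates all seven points, so its training error drops essentially to zero, yet its behavior beyond the data is untrustworthy.

```python
import numpy as np

f = np.array([1.1, 1.9, 3.2, 4.4, 5.9, 7.4, 9.2])
L = np.array([1.5, 2.1, 2.5, 3.3, 4.1, 4.6, 5.0])

# Fit a linear model and a degree-6 polynomial (7 points -> interpolation)
lin = np.polyfit(f, L, 1)
poly = np.polyfit(f, L, 6)

# Sum of squared errors on the training data
sse_lin = np.sum((np.polyval(lin, f) - L) ** 2)
sse_poly = np.sum((np.polyval(poly, f) - L) ** 2)

print(sse_poly < sse_lin)      # the high-order model fits the training data better
print(np.polyval(poly, 10.0))  # ...but extrapolating it to 10 N is unreliable
```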
Derivative-Based Optimization

Methods based on first derivatives:
• Steepest descent
• Conjugate gradient method
• Gauss-Newton method
• Levenberg-Marquardt method
• and many others

Methods based on second derivatives:
• Newton's method
• and many others
Steepest Descent

θ = [θ1, θ2, …, θn]T defines an n-dimensional search space.

We want to find θ = θ* such that:

E(θ*) = min over θ of E(θ)

Investigate the search space via the iteration:

θ_{k+1} = θ_k + η_k d_k,   k = 1, 2, 3, …

such that:

E(θ_{k+1}) = E(θ_k + η_k d_k) < E(θ_k)
Steepest Descent

If the direction d is determined from the gradient of E(θ), i.e.,

g ≜ ∇E(θ) = [∂E/∂θ1, ∂E/∂θ2, …, ∂E/∂θn]T

such that:

θ_{k+1} = θ_k − η g

then we talk about the method of steepest descent.
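The update rule can be sketched on a simple objective (the objective, step size, and starting point below are illustrative, not from the slides):

```python
import numpy as np

def E(theta):
    """Illustrative quadratic objective with minimum at theta* = (3, -1)."""
    return (theta[0] - 3.0) ** 2 + 2.0 * (theta[1] + 1.0) ** 2

def grad_E(theta):
    """Gradient g = [dE/dtheta1, dE/dtheta2]^T of E."""
    return np.array([2.0 * (theta[0] - 3.0), 4.0 * (theta[1] + 1.0)])

eta = 0.1                      # fixed step size eta
theta = np.array([0.0, 0.0])   # starting point theta_1
for _ in range(200):
    theta = theta - eta * grad_E(theta)  # theta_{k+1} = theta_k - eta * g

print(theta)  # converges toward the minimizer (3, -1)
```

Each step moves opposite the gradient, so with a small enough η every iteration satisfies E(θ_{k+1}) < E(θ_k).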