Medical Image Processing Lab., Graduate School of Computer Science & Information Engineering
Chapter 3: Single-Layer Perceptrons
Instructor: Dr. Chuan-Yu Chang (張傳育), Ph.D.
E-mail: [email protected]
Tel: (05)5342601 ext. 4337
Office: ES709
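Adaptive Filter Problem
Consider a dynamic system whose mathematical characterization is unknown. All we know about the system is a set of labeled input-output data that it generates. That is, when an m-dimensional input x(i) is applied to the system, the system produces a corresponding output d(i).
The external behavior of the system can therefore be described by
$$\mathcal{T}: \{\mathbf{x}(i),\, d(i);\ i = 1, 2, \ldots, n, \ldots\} \tag{3.1}$$
where
$$\mathbf{x}(i) = [x_1(i), x_2(i), \ldots, x_m(i)]^T$$
[Figure: unknown dynamic system with inputs x1(i), x2(i), x3(i) and output d(i)]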
Adaptive Filter Problem (cont.)
The problem is how to design a multiple-input, single-output model of the unknown system.
The neural model operates under the influence of an algorithm that controls the necessary adjustments to the synaptic weights of the neuron.
• The algorithm starts from an arbitrary setting of the neuron's synaptic weights.
• Adjustments to the synaptic weights, in response to statistical variations in the system's behavior, are made on a continuous basis.
• Computations of adjustments to the synaptic weights are completed inside a time interval that is one sampling period long.
The adaptive model consists of two continuous processes:
• Filtering process, which involves the computation of two signals: an output y(i) and an error signal e(i).
• Adaptive process, which involves the automatic adjustment of the synaptic weights of the neuron in accordance with the error signal e(i).
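Adaptive Filter Problem (cont.)
The output y(i) is the same as the induced local field v(i):
$$y(i) = v(i) = \sum_{k=1}^{m} w_k(i)\, x_k(i) \tag{3.2}$$
Eq. (3.2) can be written as an inner product of vectors:
$$y(i) = \mathbf{x}^T(i)\,\mathbf{w}(i)$$
where
$$\mathbf{w}(i) = [w_1(i), w_2(i), \ldots, w_m(i)]^T$$
The neuron's output y(i) is compared with the corresponding desired output d(i) to produce the error signal
$$e(i) = d(i) - y(i) \tag{3.4}$$
[Figure: signal-flow graph of the adaptive model. Inputs x1(i), x2(i), x3(i) are weighted by w1(i), w2(i), w3(i) and summed to give v(i) = y(i), which is multiplied by -1 and added to d(i) to give e(i)]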
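Unconstrained Optimization Techniques
Let the cost function E(w) be a continuously differentiable function of the weight vector w. The goal of an adaptive filtering algorithm is to choose the weight vector w with the minimum cost.
If w* is the optimal weight vector, it must satisfy
$$\mathcal{E}(\mathbf{w}^*) \le \mathcal{E}(\mathbf{w}) \tag{3.5}$$
That is, minimize the cost function E(w) with respect to the weight vector w.
The necessary condition for optimality is
$$\nabla \mathcal{E}(\mathbf{w}^*) = \mathbf{0} \tag{3.7}$$
where the gradient operator is
$$\nabla = \left[\frac{\partial}{\partial w_1}, \frac{\partial}{\partial w_2}, \ldots, \frac{\partial}{\partial w_m}\right]^T \tag{3.8}$$
and the gradient vector of the cost function is
$$\nabla \mathcal{E}(\mathbf{w}) = \left[\frac{\partial \mathcal{E}}{\partial w_1}, \frac{\partial \mathcal{E}}{\partial w_2}, \ldots, \frac{\partial \mathcal{E}}{\partial w_m}\right]^T \tag{3.9}$$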
Unconstrained Optimization Techniques
Local iterative descent
Starting with an initial guess denoted by w(0), generate a sequence of weight vectors w(1), w(2), …, such that the cost function E(w) is reduced at each iteration of the algorithm:
$$\mathcal{E}(\mathbf{w}(n+1)) < \mathcal{E}(\mathbf{w}(n)) \tag{3.10}$$
where w(n) is the old value of the weight vector and w(n+1) is its updated value.
We hope that the algorithm will eventually converge onto the optimal solution w*.
Method of Steepest Descent
The successive adjustments applied to the weight vector w are in the direction of steepest descent, that is, in a direction opposite to the gradient vector
$$\mathbf{g} = \nabla \mathcal{E}(\mathbf{w}) \tag{3.11}$$
The steepest descent algorithm is formally described by
$$\mathbf{w}(n+1) = \mathbf{w}(n) - \eta\,\mathbf{g}(n) \tag{3.12}$$
where η is the step size, or learning-rate parameter.
The correction applied to the weight vector at each iteration is
$$\Delta\mathbf{w}(n) = \mathbf{w}(n+1) - \mathbf{w}(n) = -\eta\,\mathbf{g}(n) \tag{3.13}$$
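Method of Steepest Descent (cont.)
To show that the steepest descent algorithm satisfies the condition of Eq. (3.10), use a first-order Taylor series expansion around w(n) to approximate E(w(n+1)):
$$\mathcal{E}(\mathbf{w}(n+1)) \approx \mathcal{E}(\mathbf{w}(n)) + \mathbf{g}^T(n)\,\Delta\mathbf{w}(n)$$
Substituting Eq. (3.13) into this expression gives
$$\mathcal{E}(\mathbf{w}(n+1)) \approx \mathcal{E}(\mathbf{w}(n)) - \eta\,\mathbf{g}^T(n)\,\mathbf{g}(n) = \mathcal{E}(\mathbf{w}(n)) - \eta\,\|\mathbf{g}(n)\|^2$$
For a small positive learning rate η, the cost therefore decreases from one iteration to the next.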
Method of Steepest Descent (cont.)
The method of steepest descent converges to the optimal solution w* slowly.
The learning-rate parameter η has a strong influence on its convergence behavior:
• When η is small, the transient response of the algorithm is overdamped: the trajectory traced by w(n) follows a smooth path in the W-plane.
• When η is large, the transient response of the algorithm is underdamped: the trajectory of w(n) follows a zigzagging (oscillatory) path.
• When η exceeds a certain critical value, the algorithm becomes unstable.
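The three regimes can be illustrated with a minimal sketch. It assumes a hypothetical two-weight quadratic cost E(w) = ½(w1² + 10·w2²), which is not taken from the slides; for this particular cost the critical learning rate is 2/10 = 0.2. The function names `gradient` and `steepest_descent` are illustrative only.

```python
def gradient(w, curvatures=(1.0, 10.0)):
    """Gradient of the assumed toy cost E(w) = 0.5 * sum(c_k * w_k**2)."""
    return [c * wk for c, wk in zip(curvatures, w)]


def steepest_descent(w0, eta, steps):
    """Iterate w(n+1) = w(n) - eta * g(n), Eq. (3.12), and return the trajectory."""
    w = list(w0)
    trajectory = [w]
    for _ in range(steps):
        g = gradient(w)
        w = [wk - eta * gk for wk, gk in zip(w, g)]
        trajectory.append(w)
    return trajectory


# Small eta: smooth (overdamped) path; w2 shrinks by a factor 0.9 per step.
small = steepest_descent([1.0, 1.0], eta=0.01, steps=50)
# Large eta: zigzag (underdamped) path; w2 is multiplied by -0.5 per step.
large = steepest_descent([1.0, 1.0], eta=0.15, steps=50)
# eta above the critical value 0.2: w2 is multiplied by -1.5 per step and diverges.
unstable = steepest_descent([1.0, 1.0], eta=0.25, steps=50)
```

Inspecting the second coordinate of each trajectory shows the monotone decay, the sign-alternating decay, and the divergence described above.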
Method of Steepest Descent (cont.)
Here F is assumed to be defined on the plane, and its graph is assumed to have a bowl shape.
The blue curves are the contour lines, that is, the curves along which the value of F is constant.
A red arrow originating at a point shows the direction of the negative gradient at that point. Note that the (negative) gradient at a point is perpendicular to the contour line going through that point.
We see that gradient descent leads us to the bottom of the bowl, that is, to the point where the value of the function F is minimal.
Method of Steepest Descent (cont.)
Newton's method
The idea is to minimize the quadratic approximation of the cost function E(w) around the current point w(n). This minimization is performed at each iteration of the algorithm.
Using a second-order Taylor series expansion of the cost function around the point w(n), we get
$$\Delta\mathcal{E}(\mathbf{w}(n)) = \mathcal{E}(\mathbf{w}(n+1)) - \mathcal{E}(\mathbf{w}(n)) \approx \mathbf{g}^T(n)\,\Delta\mathbf{w}(n) + \frac{1}{2}\,\Delta\mathbf{w}^T(n)\,\mathbf{H}(n)\,\Delta\mathbf{w}(n) \tag{3.14}$$
g(n) is the m-by-1 gradient vector of the cost function E(w) evaluated at the point w(n).
The matrix H(n) is the m-by-m Hessian matrix of E(w), also evaluated at w(n).
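Method of Steepest Descent (cont.)
The Hessian of E(w) is defined by
$$\mathbf{H} = \nabla^2 \mathcal{E}(\mathbf{w}) = \begin{bmatrix} \dfrac{\partial^2 \mathcal{E}}{\partial w_1^2} & \dfrac{\partial^2 \mathcal{E}}{\partial w_1 \partial w_2} & \cdots & \dfrac{\partial^2 \mathcal{E}}{\partial w_1 \partial w_m} \\ \dfrac{\partial^2 \mathcal{E}}{\partial w_2 \partial w_1} & \dfrac{\partial^2 \mathcal{E}}{\partial w_2^2} & \cdots & \dfrac{\partial^2 \mathcal{E}}{\partial w_2 \partial w_m} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial^2 \mathcal{E}}{\partial w_m \partial w_1} & \dfrac{\partial^2 \mathcal{E}}{\partial w_m \partial w_2} & \cdots & \dfrac{\partial^2 \mathcal{E}}{\partial w_m^2} \end{bmatrix} \tag{3.15}$$
Eq. (3.15) requires the cost function E(w) to be twice differentiable with respect to w.
Differentiating Eq. (3.14) with respect to Δw, the change ΔE(w) is minimized when
$$\mathbf{g}(n) + \mathbf{H}(n)\,\Delta\mathbf{w}(n) = \mathbf{0}$$
Solving this equation gives
$$\Delta\mathbf{w}(n) = -\mathbf{H}^{-1}(n)\,\mathbf{g}(n)$$
that is,
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \Delta\mathbf{w}(n) = \mathbf{w}(n) - \mathbf{H}^{-1}(n)\,\mathbf{g}(n) \tag{3.16}$$
For Newton's method to work, the Hessian H(n) has to be a positive definite matrix for all n. However, there is no guarantee that H(n) is positive definite at every iteration of the algorithm.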
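A minimal sketch of the Newton update of Eq. (3.16) follows. It assumes a hypothetical two-weight quadratic cost E(w) = ½·wᵀAw − bᵀw with a positive definite matrix A, so that g = Aw − b and H = A; the values of `A` and `b` and the helper names are illustrative, not from the slides. For an exactly quadratic cost a single Newton step lands on the minimum.

```python
def solve_2x2(H, g):
    """Solve H * x = g for a 2-by-2 matrix H using Cramer's rule."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    if det == 0:
        raise ValueError("Hessian is singular")
    x0 = (g[0] * H[1][1] - H[0][1] * g[1]) / det
    x1 = (H[0][0] * g[1] - g[0] * H[1][0]) / det
    return [x0, x1]


# Assumed toy problem: E(w) = 0.5 * w^T A w - b^T w.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]


def grad(w):
    """Gradient g = A w - b of the assumed quadratic cost."""
    return [A[0][0] * w[0] + A[0][1] * w[1] - b[0],
            A[1][0] * w[0] + A[1][1] * w[1] - b[1]]


def newton_step(w):
    """One update w(n+1) = w(n) - H^{-1} g(n); here the Hessian H equals A."""
    delta = solve_2x2(A, grad(w))
    return [w[0] - delta[0], w[1] - delta[1]]


w1 = newton_step([5.0, -3.0])  # reaches the minimizer [0.2, 0.6] in one step
```

Solving the linear system H·Δ = g instead of forming H⁻¹ explicitly is a common design choice, and the singularity check mirrors the requirement that H(n) be invertible.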
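Method of Steepest Descent (cont.)
Gauss-Newton method
Let the cost function be the sum of error squares:
$$\mathcal{E}(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^{n} e^2(i) \tag{3.17}$$
The error signal e(i) is a function of the adjustable weight vector w. Given an operating point w(n), the dependence of e(i) on w can be linearized as
$$e'(i, \mathbf{w}) = e(i) + \left[\frac{\partial e(i)}{\partial \mathbf{w}}\right]^T_{\mathbf{w}=\mathbf{w}(n)} (\mathbf{w} - \mathbf{w}(n)), \quad i = 1, 2, \ldots, n \tag{3.18}$$
In matrix notation this is
$$\mathbf{e}'(n, \mathbf{w}) = \mathbf{e}(n) + \mathbf{J}(n)\,(\mathbf{w} - \mathbf{w}(n)) \tag{3.19}$$
where the error vector is
$$\mathbf{e}(n) = [e(1), e(2), \ldots, e(n)]^T$$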
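Method of Steepest Descent (cont.)
J(n) is the n-by-m Jacobian matrix of e(n):
$$\mathbf{J}(n) = \begin{bmatrix} \dfrac{\partial e(1)}{\partial w_1} & \dfrac{\partial e(1)}{\partial w_2} & \cdots & \dfrac{\partial e(1)}{\partial w_m} \\ \dfrac{\partial e(2)}{\partial w_1} & \dfrac{\partial e(2)}{\partial w_2} & \cdots & \dfrac{\partial e(2)}{\partial w_m} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial e(n)}{\partial w_1} & \dfrac{\partial e(n)}{\partial w_2} & \cdots & \dfrac{\partial e(n)}{\partial w_m} \end{bmatrix}_{\mathbf{w}=\mathbf{w}(n)} \tag{3.20}$$
The Jacobian J(n) is the transpose of the m-by-n gradient matrix ∇e(n), where
$$\nabla\mathbf{e}(n) = [\nabla e(1), \nabla e(2), \ldots, \nabla e(n)]$$
The updated weight vector w(n+1) is then defined by
$$\mathbf{w}(n+1) = \arg\min_{\mathbf{w}} \left\{ \frac{1}{2}\,\|\mathbf{e}'(n, \mathbf{w})\|^2 \right\} \tag{3.21}$$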
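Method of Steepest Descent (cont.)
Using Eq. (3.19) to evaluate the squared Euclidean norm of e'(n, w), we get
$$\frac{1}{2}\|\mathbf{e}'(n, \mathbf{w})\|^2 = \frac{1}{2}\|\mathbf{e}(n)\|^2 + \mathbf{e}^T(n)\,\mathbf{J}(n)\,(\mathbf{w} - \mathbf{w}(n)) + \frac{1}{2}(\mathbf{w} - \mathbf{w}(n))^T \mathbf{J}^T(n)\,\mathbf{J}(n)\,(\mathbf{w} - \mathbf{w}(n))$$
Differentiating this expression with respect to w and setting the result to zero gives
$$\mathbf{J}^T(n)\,\mathbf{e}(n) + \mathbf{J}^T(n)\,\mathbf{J}(n)\,(\mathbf{w} - \mathbf{w}(n)) = \mathbf{0}$$
Solving for w gives
$$\mathbf{w}(n+1) = \mathbf{w}(n) - \left(\mathbf{J}^T(n)\,\mathbf{J}(n)\right)^{-1} \mathbf{J}^T(n)\,\mathbf{e}(n) \tag{3.22}$$
The Gauss-Newton method requires only the Jacobian matrix of the error vector e(n). However, the matrix J^T(n)J(n) must be nonsingular.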
Method of Steepest Descent (cont.)
There is no guarantee that J^T(n)J(n) will always be nonsingular. The usual remedy is:
• Add the diagonal matrix δI to the matrix J^T(n)J(n).
• The parameter δ is a small positive constant.
On this basis, the Gauss-Newton method is implemented in the slightly modified form
$$\mathbf{w}(n+1) = \mathbf{w}(n) - \left(\mathbf{J}^T(n)\,\mathbf{J}(n) + \delta\mathbf{I}\right)^{-1} \mathbf{J}^T(n)\,\mathbf{e}(n) \tag{3.23}$$
The effect of this modification is progressively reduced as the number of iterations, n, is increased.
Eq. (3.23) is the solution of the following modified cost function:
$$\mathcal{E}(\mathbf{w}) = \frac{1}{2}\left\{ \delta\,\|\mathbf{w} - \mathbf{w}(0)\|^2 + \sum_{i=1}^{n} e^2(i) \right\} \tag{3.24}$$
where w(0) is the initial value of the weight vector w(i).
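The modified update of Eq. (3.23) can be sketched for the simplest case of a single weight (m = 1), where J^T J is a scalar and no matrix inversion is needed. The model y = exp(w·x), the data, the value of `delta`, and the function name are all assumptions made for illustration; they do not come from the slides.

```python
import math

# Hypothetical one-parameter model y = exp(w * x); the data are generated with w = 0.5.
xs = [0.0, 1.0, 2.0]
ds = [math.exp(0.5 * x) for x in xs]


def gauss_newton_step(w, delta=1e-6):
    """One update of Eq. (3.23) for a scalar weight w."""
    e = [d - math.exp(w * x) for x, d in zip(xs, ds)]  # error vector e(n)
    J = [-x * math.exp(w * x) for x in xs]             # Jacobian entries de(i)/dw
    JtJ = sum(j * j for j in J)                        # J^T J (a scalar here)
    Jte = sum(j * ei for j, ei in zip(J, e))           # J^T e
    return w - Jte / (JtJ + delta)                     # delta keeps the divisor positive


w = 0.0
for _ in range(20):
    w = gauss_newton_step(w)
# w is now very close to the generating value 0.5
```

Only first derivatives of the error appear in the update, which is the practical appeal of Gauss-Newton over Newton's method.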
Linear Least-Squares Filter
The linear least-squares filter has two distinctive characteristics:
• The single neuron around which it is built is linear.
• The cost function E(w) used to design the filter consists of the sum of error squares.
Using Eqs. (3.3) and (3.4), the error vector can therefore be expressed as
$$\mathbf{e}(n) = \mathbf{d}(n) - [\mathbf{x}(1), \mathbf{x}(2), \ldots, \mathbf{x}(n)]^T \mathbf{w}(n) = \mathbf{d}(n) - \mathbf{X}(n)\,\mathbf{w}(n) \tag{3.25}$$
where d(n) is the n-by-1 desired response vector:
$$\mathbf{d}(n) = [d(1), d(2), \ldots, d(n)]^T$$
and X(n) is the n-by-m data matrix:
$$\mathbf{X}(n) = [\mathbf{x}(1), \mathbf{x}(2), \ldots, \mathbf{x}(n)]^T$$
[Figure: signal-flow graph of the adaptive model with inputs x1(i), x2(i), x3(i), weights w1(i), w2(i), w3(i), induced local field v(i), output y(i), desired response d(i), and error e(i)]
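Linear Least-Squares Filter (cont.)
Differentiating Eq. (3.25) with respect to w(n) gives the gradient matrix
$$\nabla\mathbf{e}(n) = -\mathbf{X}^T(n)$$
The Jacobian of e(n) is therefore
$$\mathbf{J}(n) = -\mathbf{X}(n) \tag{3.26}$$
Substituting Eqs. (3.25) and (3.26) into Eq. (3.22) gives
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \left(\mathbf{X}^T(n)\mathbf{X}(n)\right)^{-1}\mathbf{X}^T(n)\left(\mathbf{d}(n) - \mathbf{X}(n)\mathbf{w}(n)\right) = \left(\mathbf{X}^T(n)\mathbf{X}(n)\right)^{-1}\mathbf{X}^T(n)\,\mathbf{d}(n) \tag{3.27}$$
Eq. (3.27) can therefore be rewritten as
$$\mathbf{w}(n+1) = \mathbf{X}^{+}(n)\,\mathbf{d}(n) \tag{3.28}$$
where
$$\mathbf{X}^{+}(n) = \left(\mathbf{X}^T(n)\mathbf{X}(n)\right)^{-1}\mathbf{X}^T(n) \tag{3.29}$$
is the pseudoinverse of the data matrix X(n).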
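As a small worked sketch of Eq. (3.27), the code below computes w = (XᵀX)⁻¹Xᵀd for a data matrix with m = 2 columns, so that XᵀX can be inverted in closed form. The data set, in which d = 2 + 3x with a constant first input, and the function name `least_squares_2d` are assumptions made for illustration.

```python
def least_squares_2d(X, d):
    """Return w = (X^T X)^{-1} X^T d for an n-by-2 data matrix X."""
    # Entries of the symmetric 2-by-2 matrix X^T X = [[a, b], [b, c]].
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    c = sum(r[1] * r[1] for r in X)
    # Entries of the 2-by-1 vector X^T d = [p, q].
    p = sum(r[0] * di for r, di in zip(X, d))
    q = sum(r[1] * di for r, di in zip(X, d))
    det = a * c - b * b
    if det == 0:
        raise ValueError("X^T X is singular")
    return [(c * p - b * q) / det, (a * q - b * p) / det]


# Assumed data: each row is x(i)^T = [1, x]; the desired response is d = 2 + 3x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
d = [2.0, 5.0, 8.0, 11.0]
w = least_squares_2d(X, d)  # recovers the weights [2.0, 3.0]
```

Because the solution does not depend on w(n), the linear least-squares filter is obtained in a single step rather than iteratively.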
Linear Least-Squares Filter (cont.)
Wiener filter
The input vector x(i) and desired response d(i) are drawn from an ergodic environment.
We may then substitute long-term sample averages (time averages) for expectations (ensemble averages).
An ergodic environment can be characterized by its second-order statistics:
• The correlation matrix of the input vector x(i), denoted R_x:
$$\mathbf{R}_x = E[\mathbf{x}(i)\,\mathbf{x}^T(i)] = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}(i)\,\mathbf{x}^T(i) = \lim_{n\to\infty} \frac{1}{n}\,\mathbf{X}^T(n)\,\mathbf{X}(n) \tag{3.30}$$
• The cross-correlation vector between the input vector x(i) and the desired response d(i), denoted r_xd:
$$\mathbf{r}_{xd} = E[\mathbf{x}(i)\,d(i)] = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}(i)\,d(i) = \lim_{n\to\infty} \frac{1}{n}\,\mathbf{X}^T(n)\,\mathbf{d}(n) \tag{3.31}$$
where E denotes the statistical expectation operator.
Linear Least-Squares Filter (cont.)
Accordingly, we may reformulate the linear least-squares solution of Eq. (3.27) as
$$\mathbf{w}_o = \lim_{n\to\infty} \mathbf{w}(n+1) = \lim_{n\to\infty} \left(\mathbf{X}^T(n)\mathbf{X}(n)\right)^{-1}\mathbf{X}^T(n)\,\mathbf{d}(n) = \lim_{n\to\infty} \left(\frac{1}{n}\mathbf{X}^T(n)\mathbf{X}(n)\right)^{-1} \lim_{n\to\infty} \frac{1}{n}\mathbf{X}^T(n)\,\mathbf{d}(n) = \mathbf{R}_x^{-1}\,\mathbf{r}_{xd} \tag{3.32}$$
The weight vector w_o is called the Wiener solution to the linear optimum filtering problem.
For an ergodic process, the linear least-squares filter asymptotically approaches the Wiener filter as the number of observations approaches infinity.
However, the second-order statistics are not available in many important situations encountered in practice.
Least-Mean-Square Algorithm
The LMS algorithm is based on the use of instantaneous values for the cost function:
$$\mathcal{E}(\mathbf{w}) = \frac{1}{2}\,e^2(n) \tag{3.33}$$
where e(n) is the error signal measured at time n.
Differentiating E(w) with respect to the weight vector w gives
$$\frac{\partial \mathcal{E}(\mathbf{w})}{\partial \mathbf{w}} = e(n)\,\frac{\partial e(n)}{\partial \mathbf{w}} \tag{3.34}$$
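Least-Mean-Square Algorithm (cont.)
Because
$$e(n) = d(n) - \mathbf{x}^T(n)\,\mathbf{w}(n) \tag{3.35}$$
we have
$$\frac{\partial e(n)}{\partial \mathbf{w}(n)} = -\mathbf{x}(n)$$
so Eq. (3.34) can be rewritten as
$$\frac{\partial \mathcal{E}(\mathbf{w})}{\partial \mathbf{w}(n)} = -\mathbf{x}(n)\,e(n)$$
This expression serves as an estimate of the gradient vector:
$$\hat{\mathbf{g}}(n) = -\mathbf{x}(n)\,e(n) \tag{3.36}$$
Using this estimate in the steepest descent update of Eq. (3.12), the LMS algorithm can be written as
$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \eta\,\mathbf{x}(n)\,e(n) \tag{3.37}$$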
Least-Mean-Square Algorithm (cont.)
Summary of the LMS algorithm
Training sample:
• Input signal vector: x(n)
• Desired response: d(n)
User-selected parameter: η
Initialization: set
$$\hat{\mathbf{w}}(0) = \mathbf{0}$$
Computation: for n = 1, 2, …, compute
$$e(n) = d(n) - \hat{\mathbf{w}}^T(n)\,\mathbf{x}(n)$$
$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \eta\,\mathbf{x}(n)\,e(n)$$
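The summary above can be sketched in a few lines of Python. The sketch assumes a hypothetical noise-free linear system d(n) = 2·x1(n) − x2(n) with inputs drawn uniformly from [−1, 1]; the function name `lms`, the seed, the learning rate 0.1, and the sample count are illustrative choices, not values from the slides.

```python
import random


def lms(samples, eta, m):
    """Run the LMS algorithm over (x, d) pairs, starting from w(0) = 0."""
    w = [0.0] * m
    for x, d in samples:
        y = sum(wk * xk for wk, xk in zip(w, x))         # filter output w^T(n) x(n)
        e = d - y                                        # error signal e(n)
        w = [wk + eta * xk * e for wk, xk in zip(w, x)]  # w(n+1) = w(n) + eta x(n) e(n)
    return w


# Assumed unknown system: d(n) = 2 * x1(n) - 1 * x2(n), with no measurement noise.
rng = random.Random(0)
true_w = [2.0, -1.0]
samples = []
for _ in range(2000):
    x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    d = sum(tk * xk for tk, xk in zip(true_w, x))
    samples.append((x, d))

w_hat = lms(samples, eta=0.1, m=2)  # approaches true_w as samples are processed
```

Each update uses only the current sample, so the algorithm needs no correlation matrix and no matrix inversion, which is the simplicity the LMS algorithm is known for.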
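Least-Mean-Square Algorithm (cont.)
Signal-flow graph representation of the LMS algorithm
Combining Eqs. (3.35) and (3.37), the evolution of the weight vector in the LMS algorithm can be expressed as
$$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \eta\,\mathbf{x}(n)\left[d(n) - \mathbf{x}^T(n)\,\hat{\mathbf{w}}(n)\right] = \left[\mathbf{I} - \eta\,\mathbf{x}(n)\,\mathbf{x}^T(n)\right]\hat{\mathbf{w}}(n) + \eta\,\mathbf{x}(n)\,d(n) \tag{3.38}$$
where I is the identity matrix. We may also write
$$\hat{\mathbf{w}}(n) = z^{-1}\left[\hat{\mathbf{w}}(n+1)\right]$$
where z^{-1} is the unit-delay operator.
[Figure: signal-flow graph of the LMS algorithm. The term η x(n)d(n) and the feedback term −η x(n)x^T(n) ŵ(n) are summed with ŵ(n) to give ŵ(n+1), which passes through the unit delay z^{-1}I to produce ŵ(n)]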
Convergence Considerations of the LMS Algorithm
From control theory we know that the stability of a feedback system is determined by the parameters of its feedback loop.
From Fig. 3.3, the feedback loop of the LMS algorithm has two such parameters: the learning rate η and the input vector x(n).
Convergence criterion for the LMS algorithm: convergence in the mean square,
$$E[e^2(n)] \to \text{constant} \quad \text{as } n \to \infty \tag{3.41}$$
Assumptions:
• The successive input vectors x(1), x(2), … are statistically independent of each other.
• At time n, the input vector x(n) is statistically independent of all previous samples of the desired response, d(1), d(2), …, d(n-1).
• At time n, the desired response d(n) is dependent on x(n).
• x(n) and d(n) are drawn from Gaussian-distributed populations.
Convergence Considerations of the LMS Algorithm
By invoking the elements of independence theory and assuming that the learning-rate parameter η is sufficiently small, the LMS algorithm is convergent in the mean square provided that η satisfies the condition
$$0 < \eta < \frac{2}{\lambda_{\max}} \tag{3.42}$$
where λmax is the largest eigenvalue of the correlation matrix R_x.
In practical applications of the LMS algorithm, however, λmax is usually not known. To overcome this difficulty, the trace of R_x may be used as a conservative estimate of λmax, so that Eq. (3.42) can be rewritten as
$$0 < \eta < \frac{2}{\mathrm{tr}[\mathbf{R}_x]} \tag{3.43}$$
Convergence Considerations of the LMS Algorithm
By definition, the trace of a square matrix is equal to the sum of its diagonal elements.
Each diagonal element of the correlation matrix R_x equals the mean-square value of the corresponding sensor input.
Eq. (3.43) can therefore be rewritten as
$$0 < \eta < \frac{2}{\text{sum of mean-square values of the sensor inputs}} \tag{3.44}$$
Provided the learning rate satisfies this condition, the LMS algorithm is guaranteed to converge in the mean square (which implies convergence in the mean).
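The bound of Eq. (3.44) is easy to estimate from data, as in the following sketch. It replaces the true mean-square values with sample averages over a hypothetical list of input vectors; the function name and the four sample vectors are assumptions for illustration.

```python
def lms_step_size_bound(inputs):
    """Return 2 / (sum of sample mean-square values of the sensor inputs), Eq. (3.44)."""
    n = len(inputs)
    m = len(inputs[0])
    # Sample estimate of each diagonal element of R_x.
    mean_squares = [sum(x[k] ** 2 for x in inputs) / n for k in range(m)]
    return 2.0 / sum(mean_squares)


# Assumed input vectors: the mean squares are 1.0 and 2.0, so the bound is 2/3.
inputs = [[1.0, 2.0], [-1.0, 0.0], [1.0, -2.0], [-1.0, 0.0]]
bound = lms_step_size_bound(inputs)
```

Since the trace is at least as large as λmax, any learning rate below this bound also satisfies Eq. (3.42).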
Virtues and Limitations of the LMS Algorithm
Virtues of the LMS algorithm
• Simplicity
• Robustness: small model uncertainty and small disturbances can only result in small estimation errors.
Limitations of the LMS algorithm
• Slow rate of convergence: the algorithm typically requires a number of iterations equal to about 10 times the dimensionality of the input space to reach a steady-state condition.
• Sensitivity to variations in the eigenstructure of the input: the LMS algorithm is sensitive to variations in the condition number, or eigenvalue spread, of the correlation matrix R_x, defined by
$$\chi(\mathbf{R}_x) = \frac{\lambda_{\max}}{\lambda_{\min}} \tag{3.45}$$
When the condition number χ(R_x) is high, the sensitivity of the LMS algorithm becomes acute.
Learning Curves
• Learning curve: a plot of the mean-square value of the estimation error, E_av(n), versus the number of iterations, n.
• Rate of convergence: defined as the number of iterations, n, required for E_av(n) to decrease to some arbitrarily chosen value, such as 10 percent of the initial value E_av(0).
• Misadjustment: a measure of how close the adaptive filter is to optimality in the mean-square error sense.
Learning Curves (cont.)
Misadjustment is defined as
$$M = \frac{\mathcal{E}(\infty) - \mathcal{E}_{\min}}{\mathcal{E}_{\min}} = \frac{\mathcal{E}(\infty)}{\mathcal{E}_{\min}} - 1 \tag{3.46}$$
where E_min denotes the minimum mean-square error produced by the Wiener filter, designed on the basis of known values of the correlation matrix R_x and cross-correlation vector r_xd.
The misadjustment M of the LMS algorithm is directly proportional to the learning-rate parameter η.
The average time constant τ_av is inversely proportional to the learning-rate parameter η.
If the learning-rate parameter is reduced so as to reduce the misadjustment, then the settling time of the LMS algorithm is increased.
Careful attention must therefore be given to the choice of the learning-rate parameter η in the design of the LMS algorithm in order to produce a satisfactory overall performance.
Learning-rate Annealing Schedules
During the computation, the learning rate of the LMS algorithm can be set in several ways:
• Constant:
$$\eta(n) = \eta_0 \quad \text{for all } n$$
• Time-varying learning rate (by Robbins, 1951):
$$\eta(n) = \frac{c}{n}$$
where c is a constant. When c is large, there is a danger of parameter blowup for small n.
• Search-then-converge schedule (by Darken and Moody, 1992):
$$\eta(n) = \frac{\eta_0}{1 + (n/\tau)}$$
where η0 and τ are user-selected constants.
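The three schedules can be written as small functions, sketched below. The function names and the sample values η0 = 0.1, τ = 100 and c = 10 are assumptions chosen for illustration.

```python
def constant_rate(n, eta0):
    """eta(n) = eta0 for all n."""
    return eta0


def robbins_rate(n, c):
    """eta(n) = c / n, defined for n >= 1."""
    return c / n


def search_then_converge_rate(n, eta0, tau):
    """eta(n) = eta0 / (1 + n / tau)."""
    return eta0 / (1.0 + n / tau)


# Early on (n << tau) the search-then-converge rate stays close to eta0;
# for n >> tau it behaves like c / n with c = tau * eta0.
early = search_then_converge_rate(0, eta0=0.1, tau=100.0)
half = search_then_converge_rate(100, eta0=0.1, tau=100.0)
late = search_then_converge_rate(10**6, eta0=0.1, tau=100.0)
late_robbins = robbins_rate(10**6, c=10.0)
```

The search-then-converge schedule thus combines the fast initial progress of a constant rate with the decaying rate of the 1/n schedule, without the large early steps that c/n produces for small n.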
Learning-rate Annealing Schedules
[Figure: learning-rate annealing schedules]
Perceptron
McCulloch-Pitts model
The perceptron consists of a linear combiner followed by a hard limiter (signum function).
The summing node of the neuronal model computes a linear combination of the inputs applied to its synapses, and also incorporates an externally applied bias.
The resulting sum is applied to a hard limiter.
The neuron produces an output equal to +1 if the hard limiter input is positive, and -1 if it is negative.
Perceptron (cont.)
The synaptic weights of the perceptron are denoted by w1, w2, …, wm.
The inputs applied to the perceptron are denoted by x1, x2, …, xm.
The bias is denoted by b.
The induced local field of the neuron is
$$v = \sum_{i=1}^{m} w_i x_i + b \tag{3.50}$$
[Figure: signal-flow graph of the perceptron. Inputs x1, x2, …, xm are weighted by w1, w2, …, wm and summed with the bias b to give v, which passes through the hard limiter φ(v) to give the output y]
Perceptron (cont.)
The goal of the perceptron is to correctly classify the set of externally applied stimuli (x1, x2, …, xm) into one of two classes, C1 or C2.
The decision rule for the classification is to assign the point represented by the inputs (x1, x2, …, xm) to class C1 if the perceptron output y is +1 and to class C2 if it is -1.
In its simplest form, the perceptron has two decision regions separated by a hyperplane defined by
$$\sum_{i=1}^{m} w_i x_i + b = 0 \tag{3.51}$$
The synaptic weights (w1, w2, …, wm) of the perceptron can be adapted on an iteration-by-iteration basis.
[Figure: decision boundary in the (x1, x2) plane separating class C1 from class C2]
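The decision rule can be sketched as follows. The weights w = [1, 1] and bias b = −1.5 are hypothetical values chosen so that the boundary x1 + x2 − 1.5 = 0 is easy to check by hand; assigning an induced local field of exactly zero to −1 is also an assumption, since the slides only state the positive and negative cases.

```python
def perceptron_output(x, w, b):
    """Hard-limited output: +1 if v = w^T x + b is positive, otherwise -1."""
    v = sum(wi * xi for wi, xi in zip(w, x)) + b  # induced local field, Eq. (3.50)
    return 1 if v > 0 else -1                     # v == 0 is mapped to -1 (assumption)


def classify(x, w, b):
    """Assign x to class C1 if the perceptron output is +1, otherwise to C2."""
    return "C1" if perceptron_output(x, w, b) == 1 else "C2"


# Assumed weights and bias: the decision boundary is the line x1 + x2 = 1.5.
w = [1.0, 1.0]
b = -1.5
label_a = classify([1.0, 1.0], w, b)  # v = 0.5, so the point falls in C1
label_b = classify([0.0, 1.0], w, b)  # v = -0.5, so the point falls in C2
```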
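Perceptron Convergence Theorem
Following Fig. 3.8 (in which the bias of Fig. 3.6 is treated as a synaptic weight driven by a fixed input of +1), the (m+1)-by-1 input vector and weight vector can be written as
$$\mathbf{x}(n) = [+1, x_1(n), x_2(n), \ldots, x_m(n)]^T$$
$$\mathbf{w}(n) = [b(n), w_1(n), w_2(n), \ldots, w_m(n)]^T$$
The induced local field of the neuron is therefore defined as
$$v(n) = \sum_{i=0}^{m} w_i(n)\,x_i(n) = \mathbf{w}^T(n)\,\mathbf{x}(n) \tag{3.52}$$
where w0(n) represents the bias b(n).
The equation w^T x = 0, plotted in the space with coordinates x1, x2, …, xm, defines a hyperplane that separates the inputs into two classes.
[Figure: equivalent signal-flow graph of the perceptron with fixed input x0 = +1 and weight w0 = b]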
Perceptron Convergence Theorem (cont.)
[Figure: two panels, each showing patterns from class C1 and class C2; a decision boundary is drawn between the two classes in one panel]
The patterns to be classified must be sufficiently separated from each other to ensure that a separating hyperplane exists.
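Perceptron Convergence Theorem (cont.)
Suppose that the input variables of the perceptron originate from two linearly separable classes. Let the subset X1 = {x1(1), x1(2), …} and the subset X2 = {x2(1), x2(2), …}; the union of X1 and X2 is the complete training set X.
Training the classifier with X1 and X2 adjusts the weight vector w so that the two classes C1 and C2 are separated by a hyperplane. That is, there exists a weight vector w such that
$$\mathbf{w}^T\mathbf{x} > 0 \quad \text{for every input vector } \mathbf{x} \text{ belonging to class } C_1$$
$$\mathbf{w}^T\mathbf{x} \le 0 \quad \text{for every input vector } \mathbf{x} \text{ belonging to class } C_2 \tag{3.53}$$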
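Perceptron Convergence Theorem (cont.)
The algorithm for adapting the weight vector of the elementary perceptron is formulated as follows:
If the nth member of the training set, x(n), is correctly classified by the weight vector w(n), no correction is made to the weight vector of the perceptron:
$$\mathbf{w}(n+1) = \mathbf{w}(n) \quad \text{if } \mathbf{w}^T(n)\mathbf{x}(n) > 0 \text{ and } \mathbf{x}(n) \text{ belongs to class } C_1$$
$$\mathbf{w}(n+1) = \mathbf{w}(n) \quad \text{if } \mathbf{w}^T(n)\mathbf{x}(n) \le 0 \text{ and } \mathbf{x}(n) \text{ belongs to class } C_2 \tag{3.54}$$
Otherwise, the weight vector of the perceptron is updated in accordance with the rule
$$\mathbf{w}(n+1) = \mathbf{w}(n) - \eta(n)\,\mathbf{x}(n) \quad \text{if } \mathbf{w}^T(n)\mathbf{x}(n) > 0 \text{ and } \mathbf{x}(n) \text{ belongs to class } C_2$$
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \eta(n)\,\mathbf{x}(n) \quad \text{if } \mathbf{w}^T(n)\mathbf{x}(n) \le 0 \text{ and } \mathbf{x}(n) \text{ belongs to class } C_1 \tag{3.55}$$
The first line of Eq. (3.55) applies when x(n) should be assigned to C2 but is misclassified as C1; the second line applies when x(n) should be assigned to C1 but is misclassified as C2.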
Perceptron Convergence Theorem (cont.)
We prove the convergence of the fixed-increment adaptation rule for η = 1.
Assume the initial condition w(0) = 0, and suppose that w^T(n)x(n) < 0 for n = 1, 2, …, while the input vector x(n) belongs to the subset X1 (that is, the perceptron incorrectly assigns x(1), x(2), … to the second class).
With η(n) = 1, the second line of Eq. (3.55) gives
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mathbf{x}(n) \quad \text{for } \mathbf{x}(n) \text{ belonging to class } C_1 \tag{3.56}$$
Given the initial condition w(0) = 0, w(n+1) is obtained by successively accumulating the x(n):
$$\mathbf{w}(n+1) = \mathbf{x}(1) + \mathbf{x}(2) + \cdots + \mathbf{x}(n) \tag{3.57}$$
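Perceptron Convergence Theorem (cont.)
Because classes C1 and C2 are assumed to be linearly separable, there exists a solution w0 such that w0^T x(n) > 0 for all input vectors x(1), …, x(n) belonging to the subset X1. We may therefore define a positive number
$$\alpha = \min_{\mathbf{x}(n) \in X_1} \mathbf{w}_0^T\,\mathbf{x}(n) \tag{3.58}$$
Multiplying both sides of Eq. (3.57) by w0^T gives a sum of n terms:
$$\mathbf{w}_0^T\,\mathbf{w}(n+1) = \mathbf{w}_0^T\mathbf{x}(1) + \mathbf{w}_0^T\mathbf{x}(2) + \cdots + \mathbf{w}_0^T\mathbf{x}(n)$$
Hence, by the definition in Eq. (3.58),
$$\mathbf{w}_0^T\,\mathbf{w}(n+1) \ge n\alpha \tag{3.59}$$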
Perceptron Convergence Theorem (cont.)
By the Cauchy-Schwarz inequality,
$$\|\mathbf{w}_0\|^2\,\|\mathbf{w}(n+1)\|^2 \ge \left[\mathbf{w}_0^T\,\mathbf{w}(n+1)\right]^2 \tag{3.60}$$
Substituting Eq. (3.59) into Eq. (3.60) gives
$$\|\mathbf{w}_0\|^2\,\|\mathbf{w}(n+1)\|^2 \ge n^2\alpha^2$$
or
$$\|\mathbf{w}(n+1)\|^2 \ge \frac{n^2\alpha^2}{\|\mathbf{w}_0\|^2} \tag{3.61}$$
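Perceptron Convergence Theorem (cont.)
Eq. (3.56) can be rewritten with k in place of n:
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mathbf{x}(k) \quad \text{for } k = 1, \ldots, n \text{ and } \mathbf{x}(k) \in X_1 \tag{3.62}$$
Taking the squared Euclidean norm of both sides of Eq. (3.62) and expanding gives
$$\|\mathbf{w}(k+1)\|^2 = \|\mathbf{w}(k)\|^2 + \|\mathbf{x}(k)\|^2 + 2\,\mathbf{w}^T(k)\,\mathbf{x}(k) \tag{3.63}$$
Because we assumed at the outset that the perceptron incorrectly assigns the vector x(k), which belongs to C1, to C2, we have w^T(k)x(k) < 0, so Eq. (3.63) implies
$$\|\mathbf{w}(k+1)\|^2 \le \|\mathbf{w}(k)\|^2 + \|\mathbf{x}(k)\|^2$$
Rearranging terms gives
$$\|\mathbf{w}(k+1)\|^2 - \|\mathbf{w}(k)\|^2 \le \|\mathbf{x}(k)\|^2, \quad k = 1, \ldots, n \tag{3.64}$$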
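Perceptron Convergence Theorem (cont.)
Using the initial condition w(0) = 0 and adding the inequalities for k = 1, …, n gives
$$\|\mathbf{w}(n+1)\|^2 \le \sum_{k=1}^{n} \|\mathbf{x}(k)\|^2 \le n\beta \tag{3.65}$$
where
$$\beta = \max_{\mathbf{x}(k) \in X_1} \|\mathbf{x}(k)\|^2 \tag{3.66}$$
Eq. (3.61) bounds ||w(n+1)||² from below by a quantity that grows as n², whereas Eq. (3.65) bounds it from above by a quantity that grows only as n. The value of n therefore cannot exceed some value n_max for which both Eq. (3.65) and Eq. (3.61) are satisfied with the equality sign:
$$\frac{n_{\max}^2\,\alpha^2}{\|\mathbf{w}_0\|^2} = n_{\max}\,\beta$$
Rearranging terms gives
$$n_{\max} = \frac{\beta\,\|\mathbf{w}_0\|^2}{\alpha^2} \tag{3.67}$$
The perceptron must stop adjusting its synaptic weights after at most n_max iterations.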
Perceptron Convergence Theorem (cont.)
Thus, for η(n) = 1 for all n and w(0) = 0, the perceptron needs at most n_max iterations to adjust its synaptic weights.
Fixed-increment convergence theorem of the perceptron
Let the subsets of training vectors X1 and X2 be linearly separable.
Let the inputs presented to the perceptron originate from these two subsets.
The perceptron converges after some n0 iterations, in the sense that
$$\mathbf{w}(n_0) = \mathbf{w}(n_0+1) = \mathbf{w}(n_0+2) = \cdots$$
is a solution vector for n0 ≤ n_max.
Perceptron Convergence Theorem (cont.)
Absolute error-correction procedure for adaptation of a single-layer perceptron
The learning-rate parameter η(n) is variable and is chosen to satisfy
$$\eta(n)\,\mathbf{x}^T(n)\,\mathbf{x}(n) > \left|\mathbf{w}^T(n)\,\mathbf{x}(n)\right|$$
Each pattern is presented repeatedly to the perceptron until that pattern is classified correctly.
The use of an initial value w(0) other than zero merely results in a decrease or increase in the number of iterations required to converge, depending on how w(0) relates to the solution w0.
Perceptron Convergence Theorem (cont.)
Summary of the Perceptron Convergence Algorithm
1. Initialization. Set w(0) = 0. Then perform the following computations for time step n = 1, 2, …
2. Activation. Activate the perceptron by applying the continuous-valued input vector x(n) and desired response d(n).
3. Computation of Actual Response. Compute the actual response of the perceptron:
$$y(n) = \mathrm{sgn}\left[\mathbf{w}^T(n)\,\mathbf{x}(n)\right]$$
4. Adaptation of Weight Vector. Update the weight vector of the perceptron:
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \eta\left[d(n) - y(n)\right]\mathbf{x}(n)$$
where
$$d(n) = \begin{cases} +1 & \text{if } \mathbf{x}(n) \text{ belongs to class } C_1 \\ -1 & \text{if } \mathbf{x}(n) \text{ belongs to class } C_2 \end{cases}$$
5. Continuation. Increment time step n by one and go back to step 2.
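The steps above can be sketched as a short training loop. The four training vectors (each with a leading +1 for the bias), the choice η = 1, the epoch limit, and the convention sgn(0) = −1 are assumptions made for illustration; the epoch-based stopping test is likewise a convenience that the summary does not prescribe.

```python
def sgn(v):
    """Signum function; v == 0 is mapped to -1 here (an assumption)."""
    return 1 if v > 0 else -1


def train_perceptron(samples, eta=1.0, max_epochs=100):
    """Apply w(n+1) = w(n) + eta * [d(n) - y(n)] * x(n) until an epoch has no errors."""
    w = [0.0] * len(samples[0][0])                    # step 1: w(0) = 0
    for _ in range(max_epochs):
        errors = 0
        for x, d in samples:                          # step 2: present x(n), d(n)
            y = sgn(sum(wk * xk for wk, xk in zip(w, x)))  # step 3: actual response
            if y != d:                                # step 4: adapt only on errors
                errors += 1
                w = [wk + eta * (d - y) * xk for wk, xk in zip(w, x)]
        if errors == 0:
            break
    return w


# Assumed linearly separable data: x = [+1, x1, x2], d = +1 for C1 and -1 for C2.
samples = [
    ([1.0, 2.0, 2.0], 1),
    ([1.0, 0.0, 0.0], -1),
    ([1.0, 3.0, 1.0], 1),
    ([1.0, -1.0, 1.0], -1),
]
w = train_perceptron(samples)
```

For this data set the loop makes two corrections in the first pass and none in the second, ending with w = [0, 4, 4].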
Relation between the Perceptron and Bayes Classifier for a Gaussian Environment
Bayes classifier
The Bayes classifier minimizes the average risk R. For a two-class problem, the average risk is defined as
$$\mathcal{R} = c_{11}\,p_1 \int_{X_1} f_{\mathbf{x}}(\mathbf{x}|C_1)\,d\mathbf{x} + c_{22}\,p_2 \int_{X_2} f_{\mathbf{x}}(\mathbf{x}|C_2)\,d\mathbf{x} + c_{21}\,p_1 \int_{X_2} f_{\mathbf{x}}(\mathbf{x}|C_1)\,d\mathbf{x} + c_{12}\,p_2 \int_{X_1} f_{\mathbf{x}}(\mathbf{x}|C_2)\,d\mathbf{x} \tag{3.72}$$
The first two terms correspond to correct decisions and the last two terms to incorrect decisions.
• p_i: a priori probability that the observation vector x is drawn from subspace X_i
• c_ij: cost of deciding in favor of class C_i, represented by subspace X_i, when class C_j is true
• f_x(x|C_i): conditional probability density function of the random vector X, given that the observation vector x is drawn from subspace X_i
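Relation between the Perceptron and Bayes Classifier for a Gaussian Environment (cont.)
Because every observation vector x must be assigned to either C1 or C2,
$$X = X_1 + X_2 \tag{3.73}$$
Eq. (3.72) can therefore be rewritten as
$$\mathcal{R} = c_{11}\,p_1 \int_{X_1} f_{\mathbf{x}}(\mathbf{x}|C_1)\,d\mathbf{x} + c_{22}\,p_2 \int_{X - X_1} f_{\mathbf{x}}(\mathbf{x}|C_2)\,d\mathbf{x} + c_{21}\,p_1 \int_{X - X_1} f_{\mathbf{x}}(\mathbf{x}|C_1)\,d\mathbf{x} + c_{12}\,p_2 \int_{X_1} f_{\mathbf{x}}(\mathbf{x}|C_2)\,d\mathbf{x} \tag{3.74}$$
where c11 < c21 and c22 < c12. We also observe the fact that
$$\int_{X} f_{\mathbf{x}}(\mathbf{x}|C_1)\,d\mathbf{x} = \int_{X} f_{\mathbf{x}}(\mathbf{x}|C_2)\,d\mathbf{x} = 1 \tag{3.75}$$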
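Relation between the Perceptron and Bayes Classifier for a Gaussian Environment (cont.)
Expanding Eq. (3.74) and simplifying gives
$$\mathcal{R} = c_{21}\,p_1 + c_{22}\,p_2 + \int_{X_1} \left[ p_2\,(c_{12} - c_{22})\,f_{\mathbf{x}}(\mathbf{x}|C_2) - p_1\,(c_{21} - c_{11})\,f_{\mathbf{x}}(\mathbf{x}|C_1) \right] d\mathbf{x} \tag{3.76}$$
The first two terms of Eq. (3.76) represent a fixed cost.
Because the average risk R is to be minimized, Eq. (3.76) leads to the following strategy:
• If the integrand is negative for an observation vector x, then x should be assigned to X1 (class C1).
• If the integrand is positive for an observation vector x, then x should be assigned to X2 (class C2).
• If the integrand is zero for an observation vector x, then x has no effect on the average risk R and may be assigned to either class; here it is assigned to X2 (class C2).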
Relation between the Perceptron and Bayes Classifier for a Gaussian Environment (cont.)
Based on the preceding discussion, the Bayes classifier can be defined as follows. If the condition
$$p_1\,(c_{21} - c_{11})\,f_{\mathbf{x}}(\mathbf{x}|C_1) > p_2\,(c_{12} - c_{22})\,f_{\mathbf{x}}(\mathbf{x}|C_2)$$
holds, assign the observation vector x to subspace X1 (class C1). Otherwise, assign x to X2 (class C2).
For convenience, rearrange this condition and define the likelihood ratio
$$\Lambda(\mathbf{x}) = \frac{f_{\mathbf{x}}(\mathbf{x}|C_1)}{f_{\mathbf{x}}(\mathbf{x}|C_2)} \tag{3.77}$$
and the threshold
$$\xi = \frac{p_2\,(c_{12} - c_{22})}{p_1\,(c_{21} - c_{11})} \tag{3.78}$$
Relation between the Perceptron and Bayes Classifier for a Gaussian Environment (cont.)
The Bayes classifier can be restated as follows:
If, for an observation vector x, the likelihood ratio Λ(x) is greater than the threshold ξ, assign x to class C1. Otherwise, assign it to class C2.
[Figure: two equivalent implementations of the Bayes classifier. (a) Input x feeds a likelihood ratio computer that outputs Λ(x), followed by a comparator: assign x to class C1 if Λ(x) > ξ, otherwise assign it to class C2. (b) Input x feeds a log-likelihood ratio computer that outputs log Λ(x), followed by a comparator: assign x to class C1 if log Λ(x) > log ξ, otherwise assign it to class C2]
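The likelihood ratio test can be sketched in code as follows. The sketch assumes one-dimensional Gaussian class densities with means +1 and −1 and unit variance, which are illustrative choices rather than values from the slides; the function names and the prior and cost values are likewise hypothetical.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """One-dimensional Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def bayes_classify(x, f1, f2, p1, p2, c11, c12, c21, c22):
    """Assign x to C1 if the likelihood ratio exceeds the threshold, otherwise to C2."""
    likelihood_ratio = f1(x) / f2(x)                    # Eq. (3.77)
    threshold = (p2 * (c12 - c22)) / (p1 * (c21 - c11))  # Eq. (3.78)
    return "C1" if likelihood_ratio > threshold else "C2"


def f1(x):
    return gaussian_pdf(x, 1.0, 1.0)   # assumed density for class C1


def f2(x):
    return gaussian_pdf(x, -1.0, 1.0)  # assumed density for class C2


# Equal priors and symmetric costs give a threshold of 1.
equal = bayes_classify(-0.5, f1, f2, p1=0.5, p2=0.5, c11=0.0, c12=1.0, c21=1.0, c22=0.0)
# A strong prior for C1 lowers the threshold to 1/9 and flips the decision.
skewed = bayes_classify(-0.5, f1, f2, p1=0.9, p2=0.1, c11=0.0, c12=1.0, c21=1.0, c22=0.0)
```

The second call shows how the priors and costs enter only through the threshold, while the observation enters only through the likelihood ratio.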
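Relation between the Perceptron and Bayes Classifier for a Gaussian Environment (cont.)
Bayes classifier for a Gaussian distribution
Assume that
$$\text{Class } C_1:\quad E[\mathbf{X}] = \boldsymbol{\mu}_1, \quad E[(\mathbf{X} - \boldsymbol{\mu}_1)(\mathbf{X} - \boldsymbol{\mu}_1)^T] = \mathbf{C}$$
$$\text{Class } C_2:\quad E[\mathbf{X}] = \boldsymbol{\mu}_2, \quad E[(\mathbf{X} - \boldsymbol{\mu}_2)(\mathbf{X} - \boldsymbol{\mu}_2)^T] = \mathbf{C}$$
The conditional probability density function of X can then be expressed as
$$f_{\mathbf{x}}(\mathbf{x}|C_i) = \frac{1}{(2\pi)^{m/2}\,(\det(\mathbf{C}))^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^T \mathbf{C}^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right), \quad i = 1, 2 \tag{3.79}$$
Assume that the two classes are equiprobable:
$$p_1 = p_2 = \frac{1}{2} \tag{3.80}$$
Assume that misclassifications carry the same cost and that correct classifications carry zero cost:
$$c_{21} = c_{12} \quad \text{and} \quad c_{11} = c_{22} = 0 \tag{3.81}$$
The covariance matrix C is nondiagonal, which means that the samples drawn from classes C1 and C2 are correlated. C is assumed to be nonsingular, so its inverse C^{-1} exists.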
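Relation between the Perceptron and Bayes Classifier for a Gaussian Environment (cont.)
Substituting Eq. (3.79) into Eq. (3.77) and taking the logarithm gives
$$\log\Lambda(\mathbf{x}) = -\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_1)^T\mathbf{C}^{-1}(\mathbf{x} - \boldsymbol{\mu}_1) + \frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_2)^T\mathbf{C}^{-1}(\mathbf{x} - \boldsymbol{\mu}_2) = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^T\mathbf{C}^{-1}\mathbf{x} + \frac{1}{2}\left(\boldsymbol{\mu}_2^T\mathbf{C}^{-1}\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1^T\mathbf{C}^{-1}\boldsymbol{\mu}_1\right) \tag{3.82}$$
Substituting Eqs. (3.80) and (3.81) into Eq. (3.78) gives the threshold ξ = 1, so taking the logarithm gives
$$\log\xi = 0 \tag{3.83}$$
The Bayes classifier expressed by Eqs. (3.82) and (3.83) can be described as the following linear classifier:
$$y = \mathbf{w}^T\mathbf{x} + b \tag{3.84}$$
where
$$y = \log\Lambda(\mathbf{x}) \tag{3.85}$$
$$\mathbf{w} = \mathbf{C}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) \tag{3.86}$$
$$b = \frac{1}{2}\left(\boldsymbol{\mu}_2^T\mathbf{C}^{-1}\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1^T\mathbf{C}^{-1}\boldsymbol{\mu}_1\right) \tag{3.87}$$
Comparing Eq. (3.51) with Eq. (3.84) shows that the Bayes classifier for a Gaussian environment is a linear classifier analogous to the perceptron.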
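Eqs. (3.86) and (3.87) can be evaluated directly for a two-dimensional case, as in the sketch below. The mean vectors, the covariance matrix, and the helper names are assumptions chosen so that the arithmetic can be checked by hand; the closed-form 2-by-2 inverse keeps the sketch free of external libraries.

```python
def inverse_2x2(C):
    """Closed-form inverse of a nonsingular 2-by-2 matrix."""
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[C[1][1] / det, -C[0][1] / det],
            [-C[1][0] / det, C[0][0] / det]]


def matvec(A, v):
    """Product of a 2-by-2 matrix and a 2-by-1 vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def gaussian_bayes_weights(mu1, mu2, C):
    """Return (w, b) with w = C^{-1}(mu1 - mu2) and b from Eq. (3.87)."""
    C_inv = inverse_2x2(C)
    w = matvec(C_inv, [mu1[0] - mu2[0], mu1[1] - mu2[1]])
    b = 0.5 * (dot(mu2, matvec(C_inv, mu2)) - dot(mu1, matvec(C_inv, mu1)))
    return w, b


# Assumed class means and shared (nondiagonal) covariance matrix.
mu1, mu2 = [2.0, 0.0], [0.0, 0.0]
C = [[2.0, 1.0], [1.0, 2.0]]
w, b = gaussian_bayes_weights(mu1, mu2, C)  # w = [4/3, -2/3], b = -4/3

# The midpoint between the two means lies on the decision boundary y = 0.
y_mid = dot(w, [1.0, 0.0]) + b
```

With equal priors and symmetric costs, the boundary passing through the midpoint of the two means is what one would expect from Eq. (3.84).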
Relation between the Perceptron and Bayes Classifier for a Gaussian Environment (cont.)
The classifier consists of a linear combiner with weight vector w and bias b.
According to Eq. (3.84), the log-likelihood test for the two-class problem can be described as follows:
If the output y of the linear combiner is positive, assign the observation vector x to class C1. Otherwise, assign it to class C2.
[Figure: signal-flow graph of the Gaussian classifier. Inputs x1, x2, …, xm are weighted by w1, w2, …, wm and summed with the bias b to give the output y]
Relation between the Perceptron and Bayes Classifier for a Gaussian Environment (cont.)
Perceptron vs. Bayes classifier for a Gaussian environment
• The perceptron operates on the premise that the patterns to be classified are linearly separable. The Gaussian distributions of the two patterns assumed in the derivation of the Bayes classifier certainly do overlap each other and are therefore not separable.
• The Bayes classifier minimizes the probability of classification error. It always positions the decision boundary at the point where the Gaussian distributions for the two classes C1 and C2 cross each other.
• Nonparametric vs. parametric: the perceptron convergence algorithm is nonparametric, whereas the Bayes classifier is parametric.
• The perceptron convergence algorithm is both adaptive and simple to implement, whereas the Bayes classifier is computationally more complex and requires more memory.
Two overlapping, one-dimensional Gaussian distributions
Ergodic process (From Wikipedia)
In probability theory, a stationary ergodic process is a stochastic process which exhibits both stationarity and ergodicity. In essence this implies that the random process will not change its statistical properties with time.
Stationarity is the property of a random process which guarantees that its statistical properties, such as the mean value, its moments and variance, will not change over time. A stationary process is one whose probability distribution is the same at all times.
Several sub-types of stationarity are defined: first-order, second-order, nth-order, wide-sense and strict-sense.
An ergodic process is one which conforms to the ergodic theorem. The theorem allows the time average of a conforming process to equal the ensemble average. In practice this means that statistical sampling can be performed at one instant across a group of identical processes or sampled over time on a single process with no change in the measured result.
Taylor series (From Wikipedia)
A Taylor series is a representation of a function as an infinite sum of terms calculated from the values of its derivatives at a single point. It may be regarded as the limit of the Taylor polynomials.
The Taylor series of a real or complex function f that is infinitely differentiable in a neighborhood of a real or complex number a is the power series
$$f(a) + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \frac{f^{(3)}(a)}{3!}(x - a)^3 + \cdots$$
which in a more compact form can be written as
$$\sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(x - a)^n$$