Some useful Contraction Mappings: Results for a particular choice of norms

Prop. 1.12

Define a norm on $\mathbb{R}^{n_i}$ by $\|x_i\|_i = (x_i' G_i x_i)^{1/2}$ and consequently a block-maximum norm by $\|x\| = \max_i \|x_i\|_i$.

Suppose that each $G_i$ is symmetric positive definite and let the norms $\|\cdot\|_i$ and $\|\cdot\|$ be as above. Suppose that there exist positive constants $A_1, A_2, A_3$, with $A_3 < A_1$, such that for each $x, y \in X$ and for each $i$, we have
$\|\nabla_i f(x) - \nabla_i f(y)\|_i \le A_2 \|x - y\|$
and
$(\nabla_i f(x) - \nabla_i f(y))'(x_i - y_i) \ge A_1 \|x_i - y_i\|_i^2 - A_3 \|x - y\| \, \|x_i - y_i\|_i$.

Then, provided that $\gamma$ is positive and small enough, the mapping $T : X \to \mathbb{R}^n$ defined by $T(x) = x - \gamma G^{-1} \nabla f(x)$ (where $G$ is the block-diagonal matrix with diagonal blocks $G_i$) is a contraction w.r.t. the block-maximum norm $\|\cdot\|$.
Some useful Contraction Mappings

If $G$ is symmetric positive definite, then $\|x\|_G = (x'Gx)^{1/2} = \|G^{1/2}x\|_2$, where $G^{1/2}$ is a symmetric square root of $G$ (i.e., $G^{1/2}G^{1/2} = G$) and $\|\cdot\|_2$ is the Euclidean norm (by Props. A.27 and A.28).

Prop. 1.13  Assume the following:
(a) The set $X$ is convex and $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable.
(b) $G_i$ is symmetric and positive definite for each $i$.
(c) There exists a constant $K_0$ such that $\|\nabla^2 f(x)\|_2 \le K_0$ for $x \in X$.
(d) There exist $\delta > 0$ and $\epsilon > 0$ such that $\nabla^2_{ii} f(x) - \delta G_i$ is nonnegative definite and
$\sum_{j \ne i} \|G_i^{-1/2} \nabla^2_{ij} f(x) G_j^{-1/2}\|_2 \le \delta(1 - \epsilon)$, for all $i$ and $x \in X$.

Then, provided that $\gamma$ is positive and small enough, $T : X \to \mathbb{R}^n$, defined by $T(x) = x - \gamma G^{-1} \nabla f(x)$, is a contraction w.r.t. the block-maximum norm $\|x\| = \max_i \|x_i\|_i$, where $\|x_i\|_i = (x_i' G_i x_i)^{1/2}$.
Unconstrained Optimization

Jacobi algorithm (generalization of the JOR for linear eq.s):
$x(t+1) = x(t) - \gamma [D(x(t))]^{-1} \nabla F(x(t))$
where $\gamma$ is a positive step size, and $D(x)$ is a diagonal matrix whose $i$-th diagonal entry is $\nabla^2_{ii} F(x)$.

C.f.) In the linear eq. case with $\nabla F(x) = Ax - b$ and $A = D - B$:
$x(t+1) = x(t) - \gamma D^{-1}\{(D - B)x(t) - b\} = (1 - \gamma)x(t) + \gamma D^{-1}(Bx(t) + b)$ : JOR

Gauss-Seidel algorithm (generalization of the SOR for linear eq.s):
$x_i(t+1) = x_i(t) - \gamma \dfrac{\nabla_i F(z(i,t))}{\nabla^2_{ii} F(z(i,t))}, \quad i = 1, \dots, n$
where $z(i,t) = (x_1(t+1), \dots, x_{i-1}(t+1), x_i(t), \dots, x_n(t))$.
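As a concrete illustration, here is a minimal NumPy sketch of the two iterations above on an assumed quadratic test cost $F(x) = \frac{1}{2}x'Ax - b'x$ (so $\nabla F(x) = Ax - b$ and the Hessian diagonal is constant); the matrix $A$, vector $b$, and function names are hypothetical choices for the example.

```python
import numpy as np

# Assumed quadratic test cost: F(x) = 0.5 x'Ax - b'x, so grad F(x) = Ax - b
# and D(x) = diag of the Hessian = diag(A) (constant here). A is strictly
# diagonally dominant, hence both iterations converge with gamma = 1.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])
b = np.array([1.0, 2.0, 3.0])

grad = lambda x: A @ x - b          # gradient of F
hess_diag = lambda x: np.diag(A)    # i-th entry: second partial w.r.t. x_i

def jacobi_step(x, gamma):
    # x(t+1) = x(t) - gamma * D(x(t))^{-1} grad F(x(t)): all components
    # are updated simultaneously from the old iterate.
    return x - gamma * grad(x) / hess_diag(x)

def gauss_seidel_step(x, gamma):
    # x_i(t+1) = x_i(t) - gamma * grad_i F(z(i,t)) / D_ii(z(i,t)):
    # z(i,t) already contains the freshly updated components 1..i-1.
    x = x.copy()
    for i in range(len(x)):
        x[i] -= gamma * grad(x)[i] / hess_diag(x)[i]
    return x

x_j, x_g = np.zeros(3), np.zeros(3)
for _ in range(60):
    x_j = jacobi_step(x_j, gamma=1.0)
    x_g = gauss_seidel_step(x_g, gamma=1.0)
x_star = np.linalg.solve(A, b)      # unconstrained minimizer
print(np.allclose(x_j, x_star), np.allclose(x_g, x_star))  # True True
```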
Gradient algorithm (generalization of Richardson's method for linear eq.s):
$x(t+1) = x(t) - \gamma \nabla F(x(t))$

Gauss-Seidel variant of the Gradient algorithm:
$x_i(t+1) = x_i(t) - \gamma \nabla_i F(z(i,t)), \quad i = 1, \dots, n$

The above four algorithms are called descent algorithms; in fact, the gradient algorithm is called the steepest descent algorithm.
Descent Direction

The Jacobi algorithm can be viewed as a scaled version of the gradient algorithm, whereby the $i$-th component of the update is scaled by a factor of $1/\nabla^2_{ii} F(x(t))$, assuming $\nabla^2_{ii} F(x(t)) > 0$.

(Figure: level sets of $F$, a point $x$, the step $\gamma s$ to $x + \gamma s$, and the comparison of $F(x + \gamma s)$ vs. $F(x)$.)

Directional derivative of $F$ along the direction $s$:
$s' \nabla F(x) = \|s\|_2 \, \|\nabla F(x)\|_2 \cos\theta$

Any vector $s \in \mathbb{R}^n$ satisfying $s' \nabla F(x) < 0$ is called a descent direction.
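A quick numerical check of the directional-derivative identity and the descent-direction test (illustrative only; the quadratic $F$ and the test point are assumptions):

```python
import numpy as np

# Assumed test cost F(x) = 0.5 x'Ax with gradient Ax.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
F = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x = np.array([1.0, -2.0])
s = -grad(x)               # steepest-descent direction (theta = 180 deg)

# Directional derivative of F along s: lim_h (F(x+hs) - F(x))/h = s'grad F(x)
h = 1e-7
print((F(x + h * s) - F(x)) / h, s @ grad(x))   # approximately equal

print(s @ grad(x) < 0)              # True: s is a descent direction
print(F(x + 1e-3 * s) < F(x))       # True: F decreases for a small step
```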
Scaled Gradient algorithm

$x(t+1) = x(t) - \gamma (D(t))^{-1} \nabla F(x(t))$

where $D(t)$ is a scaling matrix; often, $D(t)$ is chosen diagonal, which simplifies the task of inverting it.

(Figure: level sets of $F$, with the unscaled step $x - \gamma\nabla F(x)$ pointing toward $A$ and a scaled step pointing toward $B$.) With proper scaling, the direction of $B$ is preferable to that of $A$.
Newton and Approximate Newton Methods

Assume that $F$ is twice continuously differentiable. Even in the nonquadratic case, Newton's algorithm converges much faster (under certain assumptions) than the previously introduced algorithms, particularly in the neighborhood of the optimal solution [OrR 70].

$x(t+1) = x(t) - \gamma (\nabla^2 F(x(t)))^{-1} \nabla F(x(t))$

In the linear (quadratic) case with $F(x) = \frac{1}{2} x'Ax - x'b$, i.e., $\nabla F(x(t)) = Ax(t) - b$ and $\nabla^2 F(x(t)) = A$:

$x(t+1) = x(t) - \gamma A^{-1}(Ax(t) - b) = (1 - \gamma)x(t) + \gamma A^{-1}b$

If $\gamma = 1$, then $x(t+1) = A^{-1}b$: convergence in a single step!
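The single-step claim is easy to verify numerically; a minimal sketch on an assumed quadratic ($A$, $b$ are arbitrary symmetric-positive-definite test data):

```python
import numpy as np

# Quadratic case: F(x) = 0.5 x'Ax - x'b, grad F(x) = Ax - b, Hessian = A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

def newton_step(x, gamma):
    # x(t+1) = x(t) - gamma * (Hessian)^{-1} grad F(x(t))
    return x - gamma * np.linalg.solve(A, A @ x - b)

x = np.array([10.0, -7.0])                    # arbitrary starting point
x = newton_step(x, gamma=1.0)
print(np.allclose(x, np.linalg.solve(A, b)))  # True: one step suffices
```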
Approximate Newton Method

The Jacobi algorithm can be viewed as an approximation of Newton's algorithm in which the off-diagonal entries of $\nabla^2 F$ are ignored.

$x(t+1) = x(t) + \gamma \hat{s}(t)$

where $\hat{s}(t)$ is the approximate solution of $H(t)s(t) = -g(t)$, and $g(t) = \nabla F(x(t))$, $H(t) = \nabla^2 F(x(t))$. (Employ an iterative algorithm to solve $H(t)s(t) = -g(t)$ and terminate after only a few iterations.)
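One common way to realize this, assumed here purely for illustration (the slide leaves the inner solver open), is to run a few conjugate-gradient iterations on $H(t)s = -g(t)$ and stop early; the quadratic test problem below is hypothetical.

```python
import numpy as np

def truncated_cg(H, g, num_iters):
    """Approximately solve H s = -g with a few conjugate-gradient steps."""
    s = np.zeros_like(g)
    r = -g - H @ s                    # residual of H s = -g
    p = r.copy()
    for _ in range(num_iters):
        Hp = H @ p
        alpha = (r @ r) / (p @ Hp)
        s += alpha * p
        r_new = r - alpha * Hp
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return s

# Hypothetical quadratic test problem: H constant SPD, g(x) = Hx - b.
H = np.diag([1.0, 4.0, 9.0, 16.0])
b = np.ones(4)
x = np.zeros(4)
for _ in range(10):
    s_hat = truncated_cg(H, H @ x - b, num_iters=2)  # terminate early
    x = x + s_hat                                    # gamma = 1
print(np.linalg.norm(H @ x - b))   # gradient norm shrinks toward 0
```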
Convergence Analysis using the descent approach

Assumption 2.1
The function $F : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and:
a. $F(x) \ge 0$ for every $x \in \mathbb{R}^n$.
b. (Lipschitz continuity of $\nabla F$) There exists a constant $K$ such that
$\|\nabla F(x) - \nabla F(y)\|_2 \le K \|x - y\|_2, \quad \forall x, y \in \mathbb{R}^n$.

Lemma 2.1 (Descent Lemma)
If $F$ satisfies the Lipschitz condition of Assumption 2.1(b), then
$F(x + y) \le F(x) + y'\nabla F(x) + \frac{K}{2}\|y\|_2^2, \quad \forall x, y \in \mathbb{R}^n$.
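The inequality is easy to spot-check numerically; a small sketch, assuming the quadratic $F(x) = \frac{1}{2}x'Ax$, for which $\nabla F$ is Lipschitz with $K = \lambda_{\max}(A)$:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 6.0]])
F = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
K = np.linalg.eigvalsh(A).max()     # Lipschitz constant of grad F

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    lhs = F(x + y)
    rhs = F(x) + y @ grad(x) + 0.5 * K * (y @ y)
    assert lhs <= rhs + 1e-12       # descent lemma holds at every sample
print("descent lemma verified on 1000 random samples")
```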
Prop. 2.1 (Convergence of Descent Algorithms)

Suppose that Assumption 2.1 holds and let $K_1$ and $K_2$ be positive constants. Consider the seq. $\{x(t)\}$ generated by
$x(t+1) = x(t) + \gamma s(t)$
where $s(t)$ satisfies
$\|s(t)\|_2 \ge K_2 \|\nabla F(x(t))\|_2, \quad \forall t \quad (*)$
and
$s(t)'\nabla F(x(t)) \le -K_1 \|s(t)\|_2^2, \quad \forall t \quad (**)$

If $0 < \gamma < \frac{2K_1}{K}$, then $\lim_{t\to\infty} \nabla F(x(t)) = 0$.
Proof) Using the descent lemma and the assumption (**), we have
$F(x(t+1)) \le F(x(t)) + \gamma s(t)'\nabla F(x(t)) + \frac{K}{2}\gamma^2 \|s(t)\|_2^2 \le F(x(t)) - \gamma\left(K_1 - \frac{K\gamma}{2}\right)\|s(t)\|_2^2$

Let $\beta = \gamma\left(K_1 - \frac{K\gamma}{2}\right)$. Then by the assumption on $\gamma$, $\beta > 0$.

Adding these inequalities,
$F(x(t+1)) \le F(x(0)) - \beta \sum_{\tau=0}^{t} \|s(\tau)\|_2^2$
so (since $F \ge 0$)
$\sum_{\tau=0}^{t} \|s(\tau)\|_2^2 \le \frac{F(x(0))}{\beta}$

Since this is true for all $t$, $\lim_{t\to\infty} s(t) = 0$. This implies that $\lim_{t\to\infty} \|s(t)\|_2 = 0$, and (*) shows that $\lim_{t\to\infty} \nabla F(x(t)) = 0$.  Q.E.D.
Proof of Descent Lemma)
Let $t$ be a scalar parameter and let $g(t) = F(x + ty)$. The chain rule yields
$\frac{dg(t)}{dt} = y'\nabla F(x + ty)$

$F(x + y) - F(x) = g(1) - g(0) = \int_0^1 \frac{dg(t)}{dt}\,dt = \int_0^1 y'\nabla F(x + ty)\,dt$
$= \int_0^1 y'\nabla F(x)\,dt + \int_0^1 y'(\nabla F(x + ty) - \nabla F(x))\,dt$
$\le y'\nabla F(x) + \int_0^1 \|y\|_2 \, \|\nabla F(x + ty) - \nabla F(x)\|_2\,dt$
$\le y'\nabla F(x) + \|y\|_2 \int_0^1 K t \|y\|_2\,dt \quad (\because \text{Lipschitz})$
$= y'\nabla F(x) + \frac{K}{2}\|y\|_2^2$.  Q.E.D.
Show that the Jacobi, Gradient, scaled Gradient, Newton, and Approximate Newton algorithms satisfy the conditions of Prop. 2.1 (under certain assumptions), which implies that $\lim_{t\to\infty} \nabla F(x(t)) = 0$ for these algorithms.
Gradient Algorithm
$s(t) = -\nabla F(x(t))$, so
$s(t)'\nabla F(x(t)) = -s(t)'s(t) = -\|s(t)\|_2^2$ and $\|s(t)\|_2 = \|\nabla F(x(t))\|_2$.
Hence the conditions of Prop. 2.1 hold with $K_1 = K_2 = 1$, provided $0 < \gamma < \frac{2}{K}$.

Scaled Gradient Algorithm
$x(t+1) = x(t) + \gamma s(t)$, where $s(t) = -(D(t))^{-1}\nabla F(x(t))$ ------ (*)
Assume that $D(t) - K_1 I$ is nonnegative definite for each $t$. Then
$s(t)'(D(t) - K_1 I)s(t) \ge 0 \;\Rightarrow\; s(t)'D(t)s(t) \ge K_1 \|s(t)\|_2^2$
$\Rightarrow\; s(t)'\nabla F(x(t)) = -s(t)'D(t)s(t) \le -K_1 \|s(t)\|_2^2 \quad (\because (*))$

Assume that $\sup_t \|D(t)\|_2 \le \frac{1}{K_2}$. Then, since $D(t)s(t) = -\nabla F(x(t))$,
$\|D(t)\|_2 \|s(t)\|_2 \ge \|\nabla F(x(t))\|_2 \;\Rightarrow\; \|s(t)\|_2 \ge K_2 \|\nabla F(x(t))\|_2$

Jacobi: $D(t)$ is diagonal with $i$-th entry $\nabla^2_{ii} F(x(t))$; the above assumptions hold if $K_1 \le \nabla^2_{ii} F(x) \le \frac{1}{K_2}$ for all $i$ and $x$.
Prop. 2.2 (Convergence of the Gauss-Seidel Algorithm)

Suppose that Assumption 2.1 holds and that $F$ is twice differentiable. Assume that there exist constants $d_i > 0$ and $D_i > 0$ such that
$d_i \le \nabla^2_{ii} F(x) \le D_i, \quad \forall x \in \mathbb{R}^n, \; i = 1, \dots, n$.

If $0 < \gamma < \frac{2 d_i}{D_i}$ for all $i$, and if the seq. $\{x(t)\}$ is generated by
$x_i(t+1) = x_i(t) - \gamma \dfrac{\nabla_i F(z(i,t))}{\nabla^2_{ii} F(z(i,t))}, \quad i = 1, \dots, n$
where $z(i,t) = (x_1(t+1), \dots, x_{i-1}(t+1), x_i(t), \dots, x_n(t))$,
then $\lim_{t\to\infty} \nabla F(x(t)) = 0$.

Proof) Let
$s^i(t) = \left(0, \dots, 0, \, -\dfrac{\nabla_i F(z(i,t))}{\nabla^2_{ii} F(z(i,t))}, \, 0, \dots, 0\right)$
Then $z(i+1,t) = z(i,t) + \gamma s^i(t)$ for $1 \le i < n$, and $x(t+1) = z(n,t) + \gamma s^n(t)$.

Let us view $F$ as a function of the single variable $x_i$. By the mean value theorem (Prop. A.30), for $x$ and $y$ differing only in the $i$-th coordinate there exists some $\tilde{z} \in [x, y]$ such that
$\nabla_i F(x) - \nabla_i F(y) = \nabla^2_{ii} F(\tilde{z})(x_i - y_i)$
$\Rightarrow \|\nabla_i F(x) - \nabla_i F(y)\|_2 \le D_i \|x_i - y_i\|_2$

Using the Descent Lemma (applied along the $i$-th coordinate, with constant $D_i$):
$F(z(i,t) + \gamma s^i(t)) \le F(z(i,t)) + \gamma s^i(t)'\nabla F(z(i,t)) + \frac{D_i}{2}\gamma^2 \|s^i(t)\|_2^2$
$\le F(z(i,t)) - \gamma d_i \|s^i(t)\|_2^2 + \frac{D_i}{2}\gamma^2 \|s^i(t)\|_2^2 \quad$ (by def. of $s^i(t)$)
$= F(z(i,t)) - \gamma\left(d_i - \frac{D_i \gamma}{2}\right)\|s^i(t)\|_2^2$
The case of a convex cost function

Prop. 2.3 (Convergence of Descent Methods in Convex Optim.)
Suppose that $F$ is convex and satisfies Assumption 2.1, and the seq. $\{x(t)\}$ is as in Prop. 2.1 or 2.2. If $x^*$ is a limit point of $\{x(t)\}$, then $x^*$ minimizes $F$.

Prop. 2.4 (Geometric Convergence for Strictly Convex Optim.)
Suppose, in addition to Assumption 2.1, that there exists some $\alpha > 0$ such that
$(\nabla F(x) - \nabla F(y))'(x - y) \ge \alpha \|x - y\|_2^2, \quad \forall x, y \in \mathbb{R}^n \quad (*)$
Then, there exists a unique $x^* \in \mathbb{R}^n$ that minimizes $F$. Furthermore, provided that $\gamma$ is chosen positive and small enough, $\{x(t)\}$ generated by the gradient algorithm converges to $x^*$ geometrically.
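A minimal sketch of the geometric (linear) rate promised by Prop. 2.4, on an assumed strongly convex quadratic for which (*) holds with $\alpha = \lambda_{\min}(A)$; the step size and data are illustrative:

```python
import numpy as np

A = np.diag([1.0, 10.0])            # strongly convex: alpha = 1, K = 10
b = np.array([3.0, -1.0])
grad = lambda x: A @ x - b
x_star = np.linalg.solve(A, b)      # the unique minimizer

gamma = 0.18                        # positive and small enough (< 2/K)
x = np.array([5.0, 5.0])
errs = []
for _ in range(30):
    x = x - gamma * grad(x)
    errs.append(np.linalg.norm(x - x_star))
ratios = [errs[t + 1] / errs[t] for t in range(len(errs) - 1)]
print(max(ratios))                  # < 1: the error contracts geometrically
```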
Proof)
(*) implies that the mapping $T : \mathbb{R}^n \to \mathbb{R}^n$ defined by $T(x) = x - \gamma \nabla F(x)$ is a contraction w.r.t. the Euclidean norm, provided that $\gamma$ is positive and sufficiently small.
Use Prop. 1.12 to prove it!
Convexity

Definition A.13
Let $C$ be a subset of $\mathbb{R}^n$. We say that $C$ is convex if
$\alpha x + (1 - \alpha)y \in C, \quad \forall x, y \in C, \; \forall \alpha \in [0, 1]$.

(Figure: a convex set vs. a non-convex set; functions that are strictly convex, convex but not strictly convex, and non-convex.)

Let $C$ be a convex subset of $\mathbb{R}^n$. A function $f : C \to \mathbb{R}$ is called convex if
$f(\alpha x + (1 - \alpha)y) \le \alpha f(x) + (1 - \alpha)f(y), \quad \forall x, y \in C, \; \forall \alpha \in [0, 1]$.
The function $f$ is called concave if $-f$ is convex.
The function $f$ is strictly convex if for every $x, y \in C$ with $x \ne y$,
$f(\alpha x + (1 - \alpha)y) < \alpha f(x) + (1 - \alpha)f(y), \quad \forall \alpha \in (0, 1)$.
Convexity (Cont'd)

Proposition A.35
A linear function is convex. The weighted sum of convex functions with positive weights is convex. Any vector norm is convex. Moreover, $h(x) = \sup_{i \in I} f_i(x)$ is convex if $f_i$ is convex for each $i \in I$.

Proposition A.36
If $f : \mathbb{R}^n \to \mathbb{R}$ is convex, then it is continuous. More generally, if $C \subset \mathbb{R}^n$ is convex and $f : C \to \mathbb{R}$ is convex, then $f$ is continuous in the interior of $C$.

Proposition A.39
Let $C \subset \mathbb{R}^n$ be a convex set and let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable.
$f$ is convex on the set $C$ iff
$f(z) \ge f(x) + (z - x)'\nabla f(x), \quad \forall x, z \in C \quad (*)$
If the inequality (*) is strict whenever $x \ne z$, then $f$ is strictly convex on $C$.
Convexity (Cont'd)

Proposition A.40
Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice differentiable. $f$ is convex iff $\nabla^2 f(x)$ is non-negative definite for all $x$. If $\nabla^2 f(x)$ is positive definite for every $x$, then $f$ is strictly convex.
E.g., let $A$ be a real symmetric $n \times n$ matrix: $f(x) = x'Ax$ is convex iff $A$ is non-negative definite, and strictly convex iff $A$ is positive definite.

Proposition A.41 (Strong Convexity)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable, and let $\alpha$ be a positive constant. If $f$ satisfies
$(\nabla f(x) - \nabla f(y))'(x - y) \ge \alpha \|x - y\|_2^2, \quad \forall x, y \in \mathbb{R}^n$,
then $f$ is strictly convex.
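Prop. A.40 suggests a simple numerical convexity test for quadratics: check the eigenvalues of the (symmetric) matrix $A$. A small sketch; the classify helper and its tolerance are assumptions of the example:

```python
import numpy as np

def classify(A, tol=1e-12):
    """Classify f(x) = x'Ax via the eigenvalues of the symmetric matrix A."""
    eigs = np.linalg.eigvalsh(A)
    if eigs.min() > tol:
        return "strictly convex"    # A positive definite
    if eigs.min() >= -tol:
        return "convex"             # A non-negative definite
    return "not convex"

print(classify(np.array([[2.0, 0.0], [0.0, 1.0]])))    # strictly convex
print(classify(np.array([[1.0, 1.0], [1.0, 1.0]])))    # convex (eigs 0, 2)
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))   # not convex
```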
Constrained Optimization

Minimize a cost function $F : \mathbb{R}^n \to \mathbb{R}$ over a set $X \subset \mathbb{R}^n$, assuming that $F$ is continuously differentiable and $X$ is non-empty, closed, and convex.

Proposition 3.1 (Optimality Condition)
a. If a vector $x \in X$ minimizes $F$ over $X$, then $(y - x)'\nabla F(x) \ge 0$ for every $y \in X$.
b. Let $F$ be convex on the set $X$. A vector $x \in X$ minimizes $F$ over $X$ iff $(y - x)'\nabla F(x) \ge 0$ for every $y \in X$.

Proof)
a. Suppose that $(y - x)'\nabla F(x) < 0$ for some $y \in X$. Then, there exists some $\epsilon \in (0, 1)$ such that $F(x + \epsilon(y - x)) < F(x)$. Then $x + \epsilon(y - x) \in X$, because $X$ is convex, which proves that $x$ does not minimize $F$ over the set $X$.
b. Suppose that $(y - x)'\nabla F(x) \ge 0$ holds for every $y \in X$. Then, using the convexity of $F$, for every $y \in X$ we have
$F(y) \ge F(x) + (y - x)'\nabla F(x) \ge F(x) \quad (\because (y - x)'\nabla F(x) \ge 0)$.
Therefore, $x$ minimizes $F$ over $X$.
Constrained Optimization (Cont'd)

Proposition 3.2 (Projection Theorem)
Let $[x]^+$ denote the orthogonal projection (w.r.t. the Euclidean norm) of a vector $x$ onto the convex set $X$, defined by
$[x]^+ = \arg\min_{z \in X} \|z - x\|_2$.

a. For every $x \in \mathbb{R}^n$, there exists a unique $z \in X$ that minimizes $\|z - x\|_2$ over all $z \in X$.
b. Given some $x \in \mathbb{R}^n$, a vector $z \in X$ is equal to $[x]^+$ iff $(y - z)'(x - z) \le 0$ for all $y \in X$.
c. $f : \mathbb{R}^n \to X$ defined by $f(x) = [x]^+$ is continuous and non-expansive, that is,
$\|[x]^+ - [y]^+\|_2 \le \|x - y\|_2$ for all $x, y \in \mathbb{R}^n$.

Proof)
a. Let $x$ be fixed and let $w$ be some element of $X$. Minimizing $\|x - z\|_2$ over all $z \in X$ is equivalent to minimizing it over all $z \in X$ satisfying $\|x - z\|_2 \le \|x - w\|_2$, which is a compact set. Furthermore, the function $g$ defined by $g(z) = \|z - x\|_2^2$ is continuous. Existence follows because a continuous function on a compact set always attains its minimum (Weierstrass Thm., Prop. A.8).
Proof of Prop. 3.2 (Cont'd)

b. $z^*$ is the minimizer of $g(z) = \|z - x\|_2^2$ over all $z \in X$
iff $(y - z^*)'\nabla g(z^*) \ge 0$, for every $y \in X$ (by Prop. 3.1, since $g$ is convex)
iff $(y - z^*)'(z^* - x) \ge 0$, for every $y \in X$ $(\because \nabla g(z) = 2(z - x))$
iff $(y - z^*)'(x - z^*) \le 0$, for all $y \in X$.

c. Let $x$ and $y$ be elements of $\mathbb{R}^n$. From (b), we have $(v - [x]^+)'(x - [x]^+) \le 0$ for all $v \in X$. Since $[y]^+ \in X$, we obtain
$([y]^+ - [x]^+)'(x - [x]^+) \le 0$.
Similarly,
$([x]^+ - [y]^+)'(y - [y]^+) \le 0$.
Adding these two inequalities,
$([y]^+ - [x]^+)'(x - [x]^+ - y + [y]^+) \le 0$
$\Rightarrow \|[y]^+ - [x]^+\|_2^2 \le ([y]^+ - [x]^+)'(y - x) \le \|[y]^+ - [x]^+\|_2 \, \|y - x\|_2$
$\Rightarrow \|[x]^+ - [y]^+\|_2 \le \|x - y\|_2$
i.e., non-expansive $\Rightarrow$ continuous.
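Both parts of the projection theorem can be checked numerically for a set with a closed-form projection; a sketch, assuming $X$ is the Euclidean unit ball (the helper name is hypothetical):

```python
import numpy as np

def project_ball(x, radius=1.0):
    # Closed-form Euclidean projection onto X = {z : ||z||_2 <= radius}.
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

rng = np.random.default_rng(1)
x = rng.normal(size=3) * 5.0
z = project_ball(x)                 # z = [x]+

# Prop. 3.2(b): (y - z)'(x - z) <= 0 for all y in X (random samples of X).
ys = [project_ball(y) for y in rng.normal(size=(1000, 3))]
print(all((y - z) @ (x - z) <= 1e-10 for y in ys))        # True

# Prop. 3.2(c): non-expansiveness ||[a]+ - [b]+|| <= ||a - b||.
a, b = rng.normal(size=3) * 3.0, rng.normal(size=3) * 3.0
print(np.linalg.norm(project_ball(a) - project_ball(b))
      <= np.linalg.norm(a - b))                           # True
```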
Gradient Projection Algorithm

$x(t+1) = [x(t) - \gamma \nabla F(x(t))]^+$

(Figure: iterates $x(0), x(1), x(2), x(3)$ on the set $X$, with the unprojected step $x(0) - \gamma\nabla F(x(0))$ projected back onto $X$.)

Let $T : X \to X$ be the mapping defined by $T(x) = [x - \gamma \nabla F(x)]^+$ (gradient projection mapping).
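A minimal sketch of the gradient projection iteration on a box constraint, where the projection is just componentwise clipping; the quadratic cost and bounds are assumptions of the example:

```python
import numpy as np

# Minimize F(x) = 0.5 x'Ax - b'x over the box X = [0, 1]^2 using
# x(t+1) = [x(t) - gamma * grad F(x(t))]+ .
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([4.0, -3.0])
grad = lambda x: A @ x - b
project = lambda x: np.clip(x, 0.0, 1.0)   # [.]+ for a box is clipping

gamma = 0.2                                # 0 < gamma < 2/K, K = lmax(A)
x = np.array([0.5, 0.5])
for _ in range(200):
    x = project(x - gamma * grad(x))       # T(x) = [x - gamma grad F(x)]+
print(x)                                   # converges to (1, 0) here

# Optimality at the limit: (y - x)'grad F(x) >= 0 for all y in X, which
# for this limit means g[0] <= 0 (upper bound active) and g[1] >= 0
# (lower bound active).
g = grad(x)
print(g[0] <= 1e-8 and g[1] >= -1e-8)      # True
```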
Proposition 3.3

Assumption 3.1
Same as Assumption 2.1 (as in unconstrained optimization).

Prop. 3.3 (Properties of the gradient projection mapping)
If $F$ satisfies the Lipschitz condition of Assumption 3.1(b) and $\gamma$ is positive, then:
(a) $F(T(x)) \le F(x) - \left(\frac{1}{\gamma} - \frac{K}{2}\right)\|T(x) - x\|_2^2, \quad \forall x \in X$
(b) We have $T(x) = x$ iff $(y - x)'\nabla F(x) \ge 0$ for all $y \in X$. In particular, if $F$ is convex on the set $X$, we have $T(x) = x$ iff $x$ minimizes $F$ over the set $X$.
(c) The mapping $T$ is continuous.
Proof of Proposition 3.3 (a)

(Figure: the set $X$, a point $x$, the step $x - \gamma\nabla F(x)$, its projection $T(x)$, and a point $y \in X$.)

By the Descent Lemma,
$F(T(x)) = F(x + (T(x) - x)) \le F(x) + (T(x) - x)'\nabla F(x) + \frac{K}{2}\|T(x) - x\|_2^2$ ---- (*)

By Projection Theorem (b),
$(y - T(x))'(x - \gamma \nabla F(x) - T(x)) \le 0, \quad \forall y \in X$ ---- (***)

In particular, letting $y = x$,
$(x - T(x))'(x - \gamma \nabla F(x) - T(x)) \le 0$
$\Rightarrow (T(x) - x)'\nabla F(x) \le -\frac{1}{\gamma}\|T(x) - x\|_2^2$ ---- (**)

By combining (*) and (**),
$F(T(x)) \le F(x) - \left(\frac{1}{\gamma} - \frac{K}{2}\right)\|T(x) - x\|_2^2$
Proof of Proposition 3.3 (b)

Using (***), if $T(x) = x$, then $(y - x)'(x - \gamma \nabla F(x) - x) \le 0$ for all $y \in X$, i.e., $(y - x)'\nabla F(x) \ge 0$ for all $y \in X$.
Conversely, if $(y - x)'\nabla F(x) \ge 0$ for all $y \in X$, then
$(y - x)'(x - \gamma \nabla F(x) - x) \le 0$, for all $y \in X$
so by Projection Theorem (b), $x = [x - \gamma \nabla F(x)]^+ = T(x)$.
In the convex case, this $x$ is the minimizer of $F$ over the set $X$ (Prop. 3.1(b)).

Proof of Proposition 3.3 (c)

Since $F$ is continuously differentiable, the mapping $x \mapsto x - \gamma \nabla F(x)$ is continuous.
$\Rightarrow T$ is continuous ($[\cdot]^+$ is continuous by Prop. 3.2(c)).
Proposition 3.4 (Convergence of the Gradient Projection Algorithm)

Suppose that $F$ satisfies Assumption 3.1. If $0 < \gamma < \frac{2}{K}$ and if $x^*$ is a limit point of the sequence $\{x(t)\}$ generated by the gradient projection algorithm, then
$(y - x^*)'\nabla F(x^*) \ge 0$ for $y \in X$.
In particular, if $F$ is convex on the set $X$, then $x^*$ minimizes $F$ over the set $X$.

Proof) Refer to Proposition 3.3.
Proposition 3.5 (Geometric Convergence for Strongly Convex Problems)

Suppose, in addition to Assumption 3.1, that there exists some $\alpha > 0$ such that
$(\nabla F(x) - \nabla F(y))'(x - y) \ge \alpha \|x - y\|_2^2, \quad \forall x, y \in X$
(e.g., $F(x) = a\|x\|_2^2$, $\nabla F(x) = 2ax$, with $a > 0$).
Then, there exists a unique vector $x^*$ that minimizes $F$ over the set $X$. Furthermore, provided that $\gamma$ is chosen positive and small enough, the sequence $\{x(t)\}$ generated by the gradient projection algorithm converges to $x^*$ geometrically.

Remark: strong convexity of $F$ is equivalent to strong monotonicity of $\nabla F$.
Scaled Gradient Projection Algorithms

$x(t+1) = [x(t) - \gamma (M(t))^{-1} \nabla F(x(t))]^+$

where $M(t)$ is an invertible scaling matrix.

This type of algorithm fails, in general, to converge to a minimizing point: the scaled gradient projection algorithm does not have $x^*$ as a fixed point. (Figure: $x^* = [x^* - \gamma \nabla F(x^*)]^+$, but in general $[x^* - \gamma M^{-1} \nabla F(x^*)]^+ \ne x^*$.)

The condition for convergence of the scaled gradient projection algorithm:
$M(t)$ is symmetric and $(x - y)'M(t)(x - y) \ge \alpha \|x - y\|_2^2, \quad \forall x, y \in X$.
Proposition 3.7

Define a norm $\|x\|_{M(t)} = (x'M(t)x)^{1/2}$, and assume $M(t)$ is symmetric and
$(x - y)'M(t)(x - y) \ge \alpha \|x - y\|_2^2, \quad \forall x, y \in X, \; \alpha > 0$.
The scaled iteration is equivalent to a quadratic programming subproblem:
$x(t+1) = [x(t) - \gamma M(t)^{-1} \nabla F(x(t))]^+_{M(t)}$
$= \arg\min_{y \in X} \|y - x(t) + \gamma M(t)^{-1} \nabla F(x(t))\|_{M(t)}$
$= \arg\min_{y \in X} \left[\frac{1}{2\gamma}(y - x(t))'M(t)(y - x(t)) + (y - x(t))'\nabla F(x(t))\right]$

(a) There is a unique $y \in X$ that minimizes $(x - y)'M(t)(x - y)$ for each $x \in \mathbb{R}^n$; write $y = [x]^+_{M(t)}$.
(b) $(y - [x]^+_{M(t)})'M(t)(x - [x]^+_{M(t)}) \le 0, \quad \forall y \in X$
(d) $([x]^+_{M(t)} - [y]^+_{M(t)})'M(t)([x]^+_{M(t)} - [y]^+_{M(t)}) \le (x - y)'M(t)(x - y)$; i.e., the projection is non-expansive w.r.t. $\|\cdot\|_{M(t)}$.
(e) $T_t(x) = x$ iff $(y - x)'\nabla F(x) \ge 0, \; \forall y \in X$
(f) $\|T_t(x) - T_t(y)\|_{M(t)} \le \alpha_2 \|x - y\|_{M(t)}$ (for some constant $\alpha_2$)
(g) If $\gamma$ is small enough, $F(T_t(x)) \le F(x) - A_3\|T_t(x) - x\|_2^2$ (for some constant $A_3 > 0$)
(h) If $\gamma$ is small enough, $\lim_{t\to\infty} x(t) = x^*$ and $(y - x^*)'\nabla F(x^*) \ge 0$
The case of a product constraint set: parallel implementations

(Figure: the box $X = [a_1, b_1] \times [a_2, b_2]$, a point $x$, and its projection $y$ onto $X$.)

The computation of $x - \gamma \nabla F(x)$ can be parallelized in the obvious manner. However, the projection $[x - \gamma \nabla F(x)]^+$ is not, in general, amenable to parallel implementation.

If the set $X$ is a box (i.e., $X = \prod_{i=1}^n [a_i, b_i]$), the projection of $x$ on $X$ is obtained by projecting the $i$-th component of $x$ on the interval $[a_i, b_i]$.
The assumption that $X$ is a Cartesian product opens up the possibility for a Gauss-Seidel version of the gradient projection algorithm.

More generally, suppose that $\mathbb{R}^n$ is represented as the Cartesian product of spaces $\mathbb{R}^{n_i}$, where $n_1 + \cdots + n_m = n$, and that the constraint set $X$ is a Cartesian product of sets $X_i$, where $X_i$ is a closed convex subset of $\mathbb{R}^{n_i}$. Then, the projection of $x$ on $X$ is equal to the vector $([x_1]^+, \dots, [x_m]^+)$, where $[x_i]^+$ is the projection of $x_i$ onto $X_i$.

Gauss-Seidel version of the gradient projection algorithm:
$x_i(t+1) = [x_i(t) - \gamma \nabla_i F(z(i,t))]^+, \quad i = 1, \dots, m$
where $z(i,t) = (x_1(t+1), \dots, x_{i-1}(t+1), x_i(t), \dots, x_m(t))$.
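A sketch of the block Gauss-Seidel gradient projection update for a product of boxes; the block sizes, cost, and step size are assumptions of the example:

```python
import numpy as np

# X = X_1 x X_2 with X_i = [0, 1]^2; F(x) = 0.5 x'Ax - b'x (assumed data).
A = np.diag([2.0, 3.0, 4.0, 5.0]) + 0.1    # symmetric positive definite
b = np.array([1.0, 5.0, -2.0, 3.0])
grad = lambda x: A @ x - b

blocks = [slice(0, 2), slice(2, 4)]        # R^4 = R^2 x R^2
gamma = 0.2

x = np.zeros(4)
for _ in range(100):
    for blk in blocks:
        # x_i(t+1) = [x_i(t) - gamma grad_i F(z(i,t))]+ : updating x in
        # place makes z(i,t) contain the already-updated earlier blocks.
        x[blk] = np.clip(x[blk] - gamma * grad(x)[blk], 0.0, 1.0)
print(x)   # a limit point satisfying (y - x)'grad F(x) >= 0 for y in X
```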
Proposition 3.8 (Convergence of the Gauss-Seidel Gradient Projection Algorithm)

If $F : \mathbb{R}^n \to \mathbb{R}$ satisfies Assumption 3.1 and if $\gamma$ is chosen positive and small enough, then any limit point $x^*$ of the seq. $\{x(t)\}$ generated by the Gauss-Seidel algorithm satisfies
$(y - x^*)'\nabla F(x^*) \ge 0$, for all $y \in X$.