Some useful Contraction Mappings: Results for a particular choice of norms

Prop. 1.12

Define a norm on $\mathbb{R}^{n_i}$ by $\|x_i\|_i = (x_i' G_i x_i)^{1/2}$ and consequently a block-maximum norm by $\|x\| = \max_i \|x_i\|_i$.

Suppose that each $G_i$ is symmetric positive definite and let the norms $\|\cdot\|_i$ and $\|\cdot\|$ be as above. Suppose that there exist positive constants $A_1, A_2, A_3$, with $A_3 < A_1$, such that for each $x, y \in X$ and for each $i$, we have
$\|\nabla_i f(x) - \nabla_i f(y)\|_i \le A_2 \|x - y\|$
and
$(\nabla_i f(x) - \nabla_i f(y))'(x_i - y_i) \ge A_1 \|x_i - y_i\|_i^2 - A_3 \|x - y\| \, \|x_i - y_i\|_i$.

Then, provided that $\gamma$ is positive and small enough, the mapping $T : X \to \mathbb{R}^n$ defined by $T(x) = x - \gamma G^{-1} \nabla f(x)$ (where $G$ is the block-diagonal matrix with diagonal blocks $G_i$) is a contraction w.r.t. the block-maximum norm $\|\cdot\|$.
Some useful Contraction Mappings

If $G$ is symmetric positive definite, then $\|x\|_G = (x'Gx)^{1/2} = \|G^{1/2}x\|_2$, where $G^{1/2}$ is a symmetric square root of $G$ (i.e., $G^{1/2}G^{1/2} = G$) and $\|\cdot\|_2$ is the Euclidean norm (by Props. A.27 and A.28).

Prop. 1.13  Assume the following:
(a) The set $X$ is convex and $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable.
(b) $G_i$ is symmetric and positive definite for each $i$.
(c) There exists a constant $K_0$ such that $\|\nabla^2 f(x)\|_2 \le K_0$ for $x \in X$.
(d) There exist $\delta > 0$ and $\epsilon > 0$ such that $\nabla^2_{ii} f(x) - \delta G_i$ is nonnegative definite and
$\sum_{j \ne i} \|G_i^{-1/2} \nabla^2_{ij} f(x) G_j^{-1/2}\|_2 \le \delta(1 - \epsilon)$, for all $i$ and $x \in X$.

Then, provided that $\gamma$ is positive and small enough, $T : X \to \mathbb{R}^n$, defined by $T(x) = x - \gamma G^{-1} \nabla f(x)$, is a contraction w.r.t. the block-maximum norm $\|x\| = \max_i \|x_i\|_i$, where $\|x_i\|_i = (x_i' G_i x_i)^{1/2}$.
Unconstrained Optimization

Jacobi algorithm (generalization of the JOR for linear eq.s):
$x(t+1) = x(t) - \gamma [D(x(t))]^{-1} \nabla F(x(t))$
where $\gamma$ is a positive step size, and $D(x)$ is a diagonal matrix whose $i$-th diagonal entry is $\nabla^2_{ii} F(x)$.

C.f.) In the linear eq. case with $\nabla F(x) = Ax - b$ and $A = D - B$:
$x(t+1) = x(t) - \gamma D^{-1}\{(D - B)x(t) - b\} = (1 - \gamma)x(t) + \gamma D^{-1}(Bx(t) + b)$ : JOR

Gauss-Seidel algorithm (generalization of the SOR for linear eq.s):
$x_i(t+1) = x_i(t) - \gamma \dfrac{\nabla_i F(z(i,t))}{\nabla^2_{ii} F(z(i,t))}, \quad i = 1, \dots, n$
where $z(i,t) = (x_1(t+1), \dots, x_{i-1}(t+1), x_i(t), \dots, x_n(t))$.
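As a concrete illustration, here is a minimal NumPy sketch of the two iterations above on an assumed quadratic test cost $F(x) = \frac{1}{2}x'Ax - b'x$ (so $\nabla F(x) = Ax - b$ and the Hessian diagonal is constant); the matrix $A$, vector $b$, and function names are hypothetical choices for the example.

```python
import numpy as np

# Assumed quadratic test cost: F(x) = 0.5 x'Ax - b'x, so grad F(x) = Ax - b
# and D(x) = diag of the Hessian = diag(A) (constant here). A is strictly
# diagonally dominant, hence both iterations converge with gamma = 1.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])
b = np.array([1.0, 2.0, 3.0])

grad = lambda x: A @ x - b          # gradient of F
hess_diag = lambda x: np.diag(A)    # i-th entry: second partial w.r.t. x_i

def jacobi_step(x, gamma):
    # x(t+1) = x(t) - gamma * D(x(t))^{-1} grad F(x(t)): all components
    # are updated simultaneously from the old iterate.
    return x - gamma * grad(x) / hess_diag(x)

def gauss_seidel_step(x, gamma):
    # x_i(t+1) = x_i(t) - gamma * grad_i F(z(i,t)) / D_ii(z(i,t)):
    # z(i,t) already contains the freshly updated components 1..i-1.
    x = x.copy()
    for i in range(len(x)):
        x[i] -= gamma * grad(x)[i] / hess_diag(x)[i]
    return x

x_j, x_g = np.zeros(3), np.zeros(3)
for _ in range(60):
    x_j = jacobi_step(x_j, gamma=1.0)
    x_g = gauss_seidel_step(x_g, gamma=1.0)
x_star = np.linalg.solve(A, b)      # unconstrained minimizer
print(np.allclose(x_j, x_star), np.allclose(x_g, x_star))  # True True
```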
Gradient algorithm (generalization of Richardson's method for linear eq.s):
$x(t+1) = x(t) - \gamma \nabla F(x(t))$

Gauss-Seidel variant of the Gradient algorithm:
$x_i(t+1) = x_i(t) - \gamma \nabla_i F(z(i,t)), \quad i = 1, \dots, n$

The above four algorithms are called descent algorithms; in fact, the gradient algorithm is called the steepest descent algorithm.
Descent Direction

The Jacobi algorithm can be viewed as a scaled version of the gradient algorithm, whereby the $i$-th component of the update is scaled by a factor of $1/\nabla^2_{ii} F(x(t))$, assuming $\nabla^2_{ii} F(x(t)) > 0$.

(Figure: level sets of $F$, a point $x$, the step $\gamma s$ to $x + \gamma s$, and the comparison of $F(x + \gamma s)$ vs. $F(x)$.)

Directional derivative of $F$ along the direction $s$:
$s' \nabla F(x) = \|s\|_2 \, \|\nabla F(x)\|_2 \cos\theta$

Any vector $s \in \mathbb{R}^n$ satisfying $s' \nabla F(x) < 0$ is called a descent direction.
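A quick numerical check of the directional-derivative identity and the descent-direction test (illustrative only; the quadratic $F$ and the test point are assumptions):

```python
import numpy as np

# Assumed test cost F(x) = 0.5 x'Ax with gradient Ax.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
F = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x = np.array([1.0, -2.0])
s = -grad(x)               # steepest-descent direction (theta = 180 deg)

# Directional derivative of F along s: lim_h (F(x+hs) - F(x))/h = s'grad F(x)
h = 1e-7
print((F(x + h * s) - F(x)) / h, s @ grad(x))   # approximately equal

print(s @ grad(x) < 0)              # True: s is a descent direction
print(F(x + 1e-3 * s) < F(x))       # True: F decreases for a small step
```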
Scaled Gradient algorithm

$x(t+1) = x(t) - \gamma (D(t))^{-1} \nabla F(x(t))$

where $D(t)$ is a scaling matrix; often, $D(t)$ is chosen diagonal, which simplifies the task of inverting it.

(Figure: level sets of $F$, with the unscaled step $x - \gamma\nabla F(x)$ pointing toward $A$ and a scaled step pointing toward $B$.) With proper scaling, the direction of $B$ is preferable to that of $A$.
Newton and Approximate Newton Methods

Assume that $F$ is twice continuously differentiable. Even in the nonquadratic case, Newton's algorithm converges much faster (under certain assumptions) than the previously introduced algorithms, particularly in the neighborhood of the optimal solution [OrR 70].

$x(t+1) = x(t) - \gamma (\nabla^2 F(x(t)))^{-1} \nabla F(x(t))$

In the linear (quadratic) case with $F(x) = \frac{1}{2} x'Ax - x'b$, i.e., $\nabla F(x(t)) = Ax(t) - b$ and $\nabla^2 F(x(t)) = A$:

$x(t+1) = x(t) - \gamma A^{-1}(Ax(t) - b) = (1 - \gamma)x(t) + \gamma A^{-1}b$

If $\gamma = 1$, then $x(t+1) = A^{-1}b$: convergence in a single step!
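The single-step claim is easy to verify numerically; a minimal sketch on an assumed quadratic ($A$, $b$ are arbitrary symmetric-positive-definite test data):

```python
import numpy as np

# Quadratic case: F(x) = 0.5 x'Ax - x'b, grad F(x) = Ax - b, Hessian = A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

def newton_step(x, gamma):
    # x(t+1) = x(t) - gamma * (Hessian)^{-1} grad F(x(t))
    return x - gamma * np.linalg.solve(A, A @ x - b)

x = np.array([10.0, -7.0])                    # arbitrary starting point
x = newton_step(x, gamma=1.0)
print(np.allclose(x, np.linalg.solve(A, b)))  # True: one step suffices
```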
Approximate Newton Method

The Jacobi algorithm can be viewed as an approximation of Newton's algorithm in which the off-diagonal entries of $\nabla^2 F$ are ignored.

$x(t+1) = x(t) + \gamma \hat{s}(t)$

where $\hat{s}(t)$ is the approximate solution of $H(t)s(t) = -g(t)$, and $g(t) = \nabla F(x(t))$, $H(t) = \nabla^2 F(x(t))$. (Employ an iterative algorithm to solve $H(t)s(t) = -g(t)$ and terminate after only a few iterations.)
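One common way to realize this, assumed here purely for illustration (the slide leaves the inner solver open), is to run a few conjugate-gradient iterations on $H(t)s = -g(t)$ and stop early; the quadratic test problem below is hypothetical.

```python
import numpy as np

def truncated_cg(H, g, num_iters):
    """Approximately solve H s = -g with a few conjugate-gradient steps."""
    s = np.zeros_like(g)
    r = -g - H @ s                    # residual of H s = -g
    p = r.copy()
    for _ in range(num_iters):
        Hp = H @ p
        alpha = (r @ r) / (p @ Hp)
        s += alpha * p
        r_new = r - alpha * Hp
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return s

# Hypothetical quadratic test problem: H constant SPD, g(x) = Hx - b.
H = np.diag([1.0, 4.0, 9.0, 16.0])
b = np.ones(4)
x = np.zeros(4)
for _ in range(10):
    s_hat = truncated_cg(H, H @ x - b, num_iters=2)  # terminate early
    x = x + s_hat                                    # gamma = 1
print(np.linalg.norm(H @ x - b))   # gradient norm shrinks toward 0
```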
Convergence Analysis using the descent approach

Assumption 2.1
The function $F : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and:
a. $F(x) \ge 0$ for every $x \in \mathbb{R}^n$.
b. (Lipschitz continuity of $\nabla F$) There exists a constant $K$ such that
$\|\nabla F(x) - \nabla F(y)\|_2 \le K \|x - y\|_2, \quad \forall x, y \in \mathbb{R}^n$.

Lemma 2.1 (Descent Lemma)
If $F$ satisfies the Lipschitz condition of Assumption 2.1(b), then
$F(x + y) \le F(x) + y'\nabla F(x) + \frac{K}{2}\|y\|_2^2, \quad \forall x, y \in \mathbb{R}^n$.
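The inequality is easy to spot-check numerically; a small sketch, assuming the quadratic $F(x) = \frac{1}{2}x'Ax$, for which $\nabla F$ is Lipschitz with $K = \lambda_{\max}(A)$:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 6.0]])
F = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
K = np.linalg.eigvalsh(A).max()     # Lipschitz constant of grad F

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    lhs = F(x + y)
    rhs = F(x) + y @ grad(x) + 0.5 * K * (y @ y)
    assert lhs <= rhs + 1e-12       # descent lemma holds at every sample
print("descent lemma verified on 1000 random samples")
```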
Prop. 2.1 (Convergence of Descent Algorithms)

Suppose that Assumption 2.1 holds and let $K_1$ and $K_2$ be positive constants. Consider the seq. $\{x(t)\}$ generated by
$x(t+1) = x(t) + \gamma s(t)$
where $s(t)$ satisfies
$\|s(t)\|_2 \ge K_2 \|\nabla F(x(t))\|_2, \quad \forall t \quad (*)$
and
$s(t)'\nabla F(x(t)) \le -K_1 \|s(t)\|_2^2, \quad \forall t \quad (**)$

If $0 < \gamma < \frac{2K_1}{K}$, then $\lim_{t\to\infty} \nabla F(x(t)) = 0$.
Proof) Using the descent lemma and the assumption (**), we have
$F(x(t+1)) \le F(x(t)) + \gamma s(t)'\nabla F(x(t)) + \frac{K}{2}\gamma^2 \|s(t)\|_2^2 \le F(x(t)) - \gamma\left(K_1 - \frac{K\gamma}{2}\right)\|s(t)\|_2^2$

Let $\beta = \gamma\left(K_1 - \frac{K\gamma}{2}\right)$. Then by the assumption on $\gamma$, $\beta > 0$.

Adding these inequalities,
$F(x(t+1)) \le F(x(0)) - \beta \sum_{\tau=0}^{t} \|s(\tau)\|_2^2$
so (since $F \ge 0$)
$\sum_{\tau=0}^{t} \|s(\tau)\|_2^2 \le \frac{F(x(0))}{\beta}$

Since this is true for all $t$, $\lim_{t\to\infty} s(t) = 0$. This implies that $\lim_{t\to\infty} \|s(t)\|_2 = 0$, and (*) shows that $\lim_{t\to\infty} \nabla F(x(t)) = 0$.  Q.E.D.
Proof of Descent Lemma)
Let $t$ be a scalar parameter and let $g(t) = F(x + ty)$. The chain rule yields
$\frac{dg(t)}{dt} = y'\nabla F(x + ty)$

$F(x + y) - F(x) = g(1) - g(0) = \int_0^1 \frac{dg(t)}{dt}\,dt = \int_0^1 y'\nabla F(x + ty)\,dt$
$= \int_0^1 y'\nabla F(x)\,dt + \int_0^1 y'(\nabla F(x + ty) - \nabla F(x))\,dt$
$\le y'\nabla F(x) + \int_0^1 \|y\|_2 \, \|\nabla F(x + ty) - \nabla F(x)\|_2\,dt$
$\le y'\nabla F(x) + \|y\|_2 \int_0^1 K t \|y\|_2\,dt \quad (\because \text{Lipschitz})$
$= y'\nabla F(x) + \frac{K}{2}\|y\|_2^2$.  Q.E.D.
Show that the Jacobi, Gradient, scaled Gradient, Newton, and Approximate Newton algorithms satisfy the conditions of Prop. 2.1 (under certain assumptions), which implies that $\lim_{t\to\infty} \nabla F(x(t)) = 0$ for these algorithms.
Gradient Algorithm
$s(t) = -\nabla F(x(t))$, so
$s(t)'\nabla F(x(t)) = -s(t)'s(t) = -\|s(t)\|_2^2$ and $\|s(t)\|_2 = \|\nabla F(x(t))\|_2$.
Hence the conditions of Prop. 2.1 hold with $K_1 = K_2 = 1$, provided $0 < \gamma < \frac{2}{K}$.

Scaled Gradient Algorithm
$x(t+1) = x(t) + \gamma s(t)$, where $s(t) = -(D(t))^{-1}\nabla F(x(t))$ ------ (*)
Assume that $D(t) - K_1 I$ is nonnegative definite for each $t$. Then
$s(t)'(D(t) - K_1 I)s(t) \ge 0 \;\Rightarrow\; s(t)'D(t)s(t) \ge K_1 \|s(t)\|_2^2$
$\Rightarrow\; s(t)'\nabla F(x(t)) = -s(t)'D(t)s(t) \le -K_1 \|s(t)\|_2^2 \quad (\because (*))$

Assume that $\sup_t \|D(t)\|_2 \le \frac{1}{K_2}$. Then, since $D(t)s(t) = -\nabla F(x(t))$,
$\|D(t)\|_2 \|s(t)\|_2 \ge \|\nabla F(x(t))\|_2 \;\Rightarrow\; \|s(t)\|_2 \ge K_2 \|\nabla F(x(t))\|_2$

Jacobi: $D(t)$ is diagonal with $i$-th entry $\nabla^2_{ii} F(x(t))$; the above assumptions hold if $K_1 \le \nabla^2_{ii} F(x) \le \frac{1}{K_2}$ for all $i$ and $x$.
Prop. 2.2 (Convergence of the Gauss-Seidel Algorithm)

Suppose that Assumption 2.1 holds and that $F$ is twice differentiable. Assume that there exist constants $d_i > 0$ and $D_i > 0$ such that
$d_i \le \nabla^2_{ii} F(x) \le D_i, \quad \forall x \in \mathbb{R}^n, \; i = 1, \dots, n$.

If $0 < \gamma < \frac{2 d_i}{D_i}$ for all $i$, and if the seq. $\{x(t)\}$ is generated by
$x_i(t+1) = x_i(t) - \gamma \dfrac{\nabla_i F(z(i,t))}{\nabla^2_{ii} F(z(i,t))}, \quad i = 1, \dots, n$
where $z(i,t) = (x_1(t+1), \dots, x_{i-1}(t+1), x_i(t), \dots, x_n(t))$,
then $\lim_{t\to\infty} \nabla F(x(t)) = 0$.

Proof) Let
$s^i(t) = \left(0, \dots, 0, \, -\dfrac{\nabla_i F(z(i,t))}{\nabla^2_{ii} F(z(i,t))}, \, 0, \dots, 0\right)$
Then $z(i+1,t) = z(i,t) + \gamma s^i(t)$ for $1 \le i < n$, and $x(t+1) = z(n,t) + \gamma s^n(t)$.

Let us view $F$ as a function of the single variable $x_i$. By the mean value theorem (Prop. A.30), for $x$ and $y$ differing only in the $i$-th coordinate there exists some $\tilde{z} \in [x, y]$ such that
$\nabla_i F(x) - \nabla_i F(y) = \nabla^2_{ii} F(\tilde{z})(x_i - y_i)$
$\Rightarrow \|\nabla_i F(x) - \nabla_i F(y)\|_2 \le D_i \|x_i - y_i\|_2$

Using the Descent Lemma (applied along the $i$-th coordinate, with constant $D_i$):
$F(z(i,t) + \gamma s^i(t)) \le F(z(i,t)) + \gamma s^i(t)'\nabla F(z(i,t)) + \frac{D_i}{2}\gamma^2 \|s^i(t)\|_2^2$
$\le F(z(i,t)) - \gamma d_i \|s^i(t)\|_2^2 + \frac{D_i}{2}\gamma^2 \|s^i(t)\|_2^2 \quad$ (by def. of $s^i(t)$)
$= F(z(i,t)) - \gamma\left(d_i - \frac{D_i \gamma}{2}\right)\|s^i(t)\|_2^2$
The case of a convex cost function

Prop. 2.3 (Convergence of Descent Methods in Convex Optim.)
Suppose that $F$ is convex and satisfies Assumption 2.1, and the seq. $\{x(t)\}$ is as in Prop. 2.1 or 2.2. If $x^*$ is a limit point of $\{x(t)\}$, then $x^*$ minimizes $F$.

Prop. 2.4 (Geometric Convergence for Strictly Convex Optim.)
Suppose, in addition to Assumption 2.1, that there exists some $\alpha > 0$ such that
$(\nabla F(x) - \nabla F(y))'(x - y) \ge \alpha \|x - y\|_2^2, \quad \forall x, y \in \mathbb{R}^n \quad (*)$
Then, there exists a unique $x^* \in \mathbb{R}^n$ that minimizes $F$. Furthermore, provided that $\gamma$ is chosen positive and small enough, $\{x(t)\}$ generated by the gradient algorithm converges to $x^*$ geometrically.
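A minimal sketch of the geometric (linear) rate promised by Prop. 2.4, on an assumed strongly convex quadratic for which (*) holds with $\alpha = \lambda_{\min}(A)$; the step size and data are illustrative:

```python
import numpy as np

A = np.diag([1.0, 10.0])            # strongly convex: alpha = 1, K = 10
b = np.array([3.0, -1.0])
grad = lambda x: A @ x - b
x_star = np.linalg.solve(A, b)      # the unique minimizer

gamma = 0.18                        # positive and small enough (< 2/K)
x = np.array([5.0, 5.0])
errs = []
for _ in range(30):
    x = x - gamma * grad(x)
    errs.append(np.linalg.norm(x - x_star))
ratios = [errs[t + 1] / errs[t] for t in range(len(errs) - 1)]
print(max(ratios))                  # < 1: the error contracts geometrically
```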
Proof)
(*) implies that the mapping $T : \mathbb{R}^n \to \mathbb{R}^n$ defined by $T(x) = x - \gamma \nabla F(x)$ is a contraction w.r.t. the Euclidean norm, provided that $\gamma$ is positive and sufficiently small.
Use Prop. 1.12 to prove it!
Convexity

Definition A.13
Let $C$ be a subset of $\mathbb{R}^n$. We say that $C$ is convex if
$\alpha x + (1 - \alpha)y \in C, \quad \forall x, y \in C, \; \forall \alpha \in [0, 1]$.

(Figure: a convex set vs. a non-convex set; functions that are strictly convex, convex but not strictly convex, and non-convex.)

Let $C$ be a convex subset of $\mathbb{R}^n$. A function $f : C \to \mathbb{R}$ is called convex if
$f(\alpha x + (1 - \alpha)y) \le \alpha f(x) + (1 - \alpha)f(y), \quad \forall x, y \in C, \; \forall \alpha \in [0, 1]$.
The function $f$ is called concave if $-f$ is convex.
The function $f$ is strictly convex if for every $x, y \in C$ with $x \ne y$,
$f(\alpha x + (1 - \alpha)y) < \alpha f(x) + (1 - \alpha)f(y), \quad \forall \alpha \in (0, 1)$.
Convexity (Cont'd)

Proposition A.35
A linear function is convex. The weighted sum of convex functions with positive weights is convex. Any vector norm is convex. Moreover, $h(x) = \sup_{i \in I} f_i(x)$ is convex if $f_i$ is convex for each $i \in I$.

Proposition A.36
If $f : \mathbb{R}^n \to \mathbb{R}$ is convex, then it is continuous. More generally, if $C \subset \mathbb{R}^n$ is convex and $f : C \to \mathbb{R}$ is convex, then $f$ is continuous in the interior of $C$.

Proposition A.39
Let $C \subset \mathbb{R}^n$ be a convex set and let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable.
$f$ is convex on the set $C$ iff
$f(z) \ge f(x) + (z - x)'\nabla f(x), \quad \forall x, z \in C \quad (*)$
If the inequality (*) is strict whenever $x \ne z$, then $f$ is strictly convex on $C$.
Convexity (Cont'd)

Proposition A.40
Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice differentiable. $f$ is convex iff $\nabla^2 f(x)$ is non-negative definite for all $x$. If $\nabla^2 f(x)$ is positive definite for every $x$, then $f$ is strictly convex.
E.g., let $A$ be a real symmetric $n \times n$ matrix: $f(x) = x'Ax$ is convex iff $A$ is non-negative definite, and strictly convex iff $A$ is positive definite.

Proposition A.41 (Strong Convexity)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable, and let $\alpha$ be a positive constant. If $f$ satisfies
$(\nabla f(x) - \nabla f(y))'(x - y) \ge \alpha \|x - y\|_2^2, \quad \forall x, y \in \mathbb{R}^n$,
then $f$ is strictly convex.
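Prop. A.40 suggests a simple numerical convexity test for quadratics: check the eigenvalues of the (symmetric) matrix $A$. A small sketch; the classify helper and its tolerance are assumptions of the example:

```python
import numpy as np

def classify(A, tol=1e-12):
    """Classify f(x) = x'Ax via the eigenvalues of the symmetric matrix A."""
    eigs = np.linalg.eigvalsh(A)
    if eigs.min() > tol:
        return "strictly convex"    # A positive definite
    if eigs.min() >= -tol:
        return "convex"             # A non-negative definite
    return "not convex"

print(classify(np.array([[2.0, 0.0], [0.0, 1.0]])))    # strictly convex
print(classify(np.array([[1.0, 1.0], [1.0, 1.0]])))    # convex (eigs 0, 2)
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))   # not convex
```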
Constrained Optimization

Minimize a cost function $F : \mathbb{R}^n \to \mathbb{R}$ over a set $X \subset \mathbb{R}^n$, assuming that $F$ is continuously differentiable and $X$ is non-empty, closed, and convex.

Proposition 3.1 (Optimality Condition)
a. If a vector $x \in X$ minimizes $F$ over $X$, then $(y - x)'\nabla F(x) \ge 0$ for every $y \in X$.
b. Let $F$ be convex on the set $X$. A vector $x \in X$ minimizes $F$ over $X$ iff $(y - x)'\nabla F(x) \ge 0$ for every $y \in X$.

Proof)
a. Suppose that $(y - x)'\nabla F(x) < 0$ for some $y \in X$. Then, there exists some $\epsilon \in (0, 1)$ such that $F(x + \epsilon(y - x)) < F(x)$. Then $x + \epsilon(y - x) \in X$, because $X$ is convex, which proves that $x$ does not minimize $F$ over the set $X$.
b. Suppose that $(y - x)'\nabla F(x) \ge 0$ holds for every $y \in X$. Then, using the convexity of $F$, for every $y \in X$ we have
$F(y) \ge F(x) + (y - x)'\nabla F(x) \ge F(x) \quad (\because (y - x)'\nabla F(x) \ge 0)$.
Therefore, $x$ minimizes $F$ over $X$.
Constrained Optimization (Cont'd)

Proposition 3.2 (Projection Theorem)
Let $[x]^+$ denote the orthogonal projection (w.r.t. the Euclidean norm) of a vector $x$ onto the convex set $X$, defined by
$[x]^+ = \arg\min_{z \in X} \|z - x\|_2$.

a. For every $x \in \mathbb{R}^n$, there exists a unique $z \in X$ that minimizes $\|z - x\|_2$ over all $z \in X$.
b. Given some $x \in \mathbb{R}^n$, a vector $z \in X$ is equal to $[x]^+$ iff $(y - z)'(x - z) \le 0$ for all $y \in X$.
c. $f : \mathbb{R}^n \to X$ defined by $f(x) = [x]^+$ is continuous and non-expansive, that is,
$\|[x]^+ - [y]^+\|_2 \le \|x - y\|_2$ for all $x, y \in \mathbb{R}^n$.

Proof)
a. Let $x$ be fixed and let $w$ be some element of $X$. Minimizing $\|x - z\|_2$ over all $z \in X$ is equivalent to minimizing it over all $z \in X$ satisfying $\|x - z\|_2 \le \|x - w\|_2$, which is a compact set. Furthermore, the function $g$ defined by $g(z) = \|z - x\|_2^2$ is continuous. Existence follows because a continuous function on a compact set always attains its minimum (Weierstrass Thm., Prop. A.8).
Proof of Prop. 3.2 (Cont'd)

b. $z^*$ is the minimizer of $g(z) = \|z - x\|_2^2$ over all $z \in X$
iff $(y - z^*)'\nabla g(z^*) \ge 0$, for every $y \in X$ (by Prop. 3.1, since $g$ is convex)
iff $(y - z^*)'(z^* - x) \ge 0$, for every $y \in X$ $(\because \nabla g(z) = 2(z - x))$
iff $(y - z^*)'(x - z^*) \le 0$, for all $y \in X$.

c. Let $x$ and $y$ be elements of $\mathbb{R}^n$. From (b), we have $(v - [x]^+)'(x - [x]^+) \le 0$ for all $v \in X$. Since $[y]^+ \in X$, we obtain
$([y]^+ - [x]^+)'(x - [x]^+) \le 0$.
Similarly,
$([x]^+ - [y]^+)'(y - [y]^+) \le 0$.
Adding these two inequalities,
$([y]^+ - [x]^+)'(x - [x]^+ - y + [y]^+) \le 0$
$\Rightarrow \|[y]^+ - [x]^+\|_2^2 \le ([y]^+ - [x]^+)'(y - x) \le \|[y]^+ - [x]^+\|_2 \, \|y - x\|_2$
$\Rightarrow \|[x]^+ - [y]^+\|_2 \le \|x - y\|_2$
i.e., non-expansive $\Rightarrow$ continuous.
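Both parts of the projection theorem can be checked numerically for a set with a closed-form projection; a sketch, assuming $X$ is the Euclidean unit ball (the helper name is hypothetical):

```python
import numpy as np

def project_ball(x, radius=1.0):
    # Closed-form Euclidean projection onto X = {z : ||z||_2 <= radius}.
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

rng = np.random.default_rng(1)
x = rng.normal(size=3) * 5.0
z = project_ball(x)                 # z = [x]+

# Prop. 3.2(b): (y - z)'(x - z) <= 0 for all y in X (random samples of X).
ys = [project_ball(y) for y in rng.normal(size=(1000, 3))]
print(all((y - z) @ (x - z) <= 1e-10 for y in ys))        # True

# Prop. 3.2(c): non-expansiveness ||[a]+ - [b]+|| <= ||a - b||.
a, b = rng.normal(size=3) * 3.0, rng.normal(size=3) * 3.0
print(np.linalg.norm(project_ball(a) - project_ball(b))
      <= np.linalg.norm(a - b))                           # True
```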
Gradient Projection Algorithm

$x(t+1) = [x(t) - \gamma \nabla F(x(t))]^+$

(Figure: iterates $x(0), x(1), x(2), x(3)$ on the set $X$, with the unprojected step $x(0) - \gamma\nabla F(x(0))$ projected back onto $X$.)

Let $T : X \to X$ be the mapping defined by $T(x) = [x - \gamma \nabla F(x)]^+$ (gradient projection mapping).
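A minimal sketch of the gradient projection iteration on a box constraint, where the projection is just componentwise clipping; the quadratic cost and bounds are assumptions of the example:

```python
import numpy as np

# Minimize F(x) = 0.5 x'Ax - b'x over the box X = [0, 1]^2 using
# x(t+1) = [x(t) - gamma * grad F(x(t))]+ .
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([4.0, -3.0])
grad = lambda x: A @ x - b
project = lambda x: np.clip(x, 0.0, 1.0)   # [.]+ for a box is clipping

gamma = 0.2                                # 0 < gamma < 2/K, K = lmax(A)
x = np.array([0.5, 0.5])
for _ in range(200):
    x = project(x - gamma * grad(x))       # T(x) = [x - gamma grad F(x)]+
print(x)                                   # converges to (1, 0) here

# Optimality at the limit: (y - x)'grad F(x) >= 0 for all y in X, which
# for this limit means g[0] <= 0 (upper bound active) and g[1] >= 0
# (lower bound active).
g = grad(x)
print(g[0] <= 1e-8 and g[1] >= -1e-8)      # True
```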
Proposition 3.3

Assumption 3.1
Same as Assumption 2.1 (as in unconstrained optimization).

Prop. 3.3 (Properties of the gradient projection mapping)
If $F$ satisfies the Lipschitz condition of Assumption 3.1(b) and $\gamma$ is positive, then:
(a) $F(T(x)) \le F(x) - \left(\frac{1}{\gamma} - \frac{K}{2}\right)\|T(x) - x\|_2^2, \quad \forall x \in X$
(b) We have $T(x) = x$ iff $(y - x)'\nabla F(x) \ge 0$ for all $y \in X$. In particular, if $F$ is convex on the set $X$, we have $T(x) = x$ iff $x$ minimizes $F$ over the set $X$.
(c) The mapping $T$ is continuous.
Proof of Proposition 3.3 (a)

(Figure: the set $X$, a point $x$, the step $x - \gamma\nabla F(x)$, its projection $T(x)$, and a point $y \in X$.)

By the Descent Lemma,
$F(T(x)) = F(x + (T(x) - x)) \le F(x) + (T(x) - x)'\nabla F(x) + \frac{K}{2}\|T(x) - x\|_2^2$ ---- (*)

By Projection Theorem (b),
$(y - T(x))'(x - \gamma \nabla F(x) - T(x)) \le 0, \quad \forall y \in X$ ---- (***)

In particular, letting $y = x$,
$(x - T(x))'(x - \gamma \nabla F(x) - T(x)) \le 0$
$\Rightarrow (T(x) - x)'\nabla F(x) \le -\frac{1}{\gamma}\|T(x) - x\|_2^2$ ---- (**)

By combining (*) and (**),
$F(T(x)) \le F(x) - \left(\frac{1}{\gamma} - \frac{K}{2}\right)\|T(x) - x\|_2^2$
Proof of Proposition 3.3 (b)

Using (***), if $T(x) = x$, then $(y - x)'(x - \gamma \nabla F(x) - x) \le 0$ for all $y \in X$, i.e., $(y - x)'\nabla F(x) \ge 0$ for all $y \in X$.
Conversely, if $(y - x)'\nabla F(x) \ge 0$ for all $y \in X$, then
$(y - x)'(x - \gamma \nabla F(x) - x) \le 0$, for all $y \in X$
so by Projection Theorem (b), $x = [x - \gamma \nabla F(x)]^+ = T(x)$.
In the convex case, this $x$ is the minimizer of $F$ over the set $X$ (Prop. 3.1(b)).

Proof of Proposition 3.3 (c)

Since $F$ is continuously differentiable, the mapping $x \mapsto x - \gamma \nabla F(x)$ is continuous.
$\Rightarrow T$ is continuous ($[\cdot]^+$ is continuous by Prop. 3.2(c)).
Proposition 3.4 (Convergence of the Gradient Projection Algorithm)

Suppose that $F$ satisfies Assumption 3.1. If $0 < \gamma < \frac{2}{K}$ and if $x^*$ is a limit point of the sequence $\{x(t)\}$ generated by the gradient projection algorithm, then
$(y - x^*)'\nabla F(x^*) \ge 0$ for $y \in X$.
In particular, if $F$ is convex on the set $X$, then $x^*$ minimizes $F$ over the set $X$.

Proof) Refer to Proposition 3.3.
Proposition 3.5 (Geometric Convergence for Strongly Convex Problems)

Suppose, in addition to Assumption 3.1, that there exists some $\alpha > 0$ such that
$(\nabla F(x) - \nabla F(y))'(x - y) \ge \alpha \|x - y\|_2^2, \quad \forall x, y \in X$
(e.g., $F(x) = a\|x\|_2^2$, $\nabla F(x) = 2ax$, with $a > 0$).
Then, there exists a unique vector $x^*$ that minimizes $F$ over the set $X$. Furthermore, provided that $\gamma$ is chosen positive and small enough, the sequence $\{x(t)\}$ generated by the gradient projection algorithm converges to $x^*$ geometrically.

Remark: strong convexity of $F$ is equivalent to strong monotonicity of $\nabla F$.
Scaled Gradient Projection Algorithms

$x(t+1) = [x(t) - \gamma (M(t))^{-1} \nabla F(x(t))]^+$

where $M(t)$ is an invertible scaling matrix.

This type of algorithm fails, in general, to converge to a minimizing point: the scaled gradient projection algorithm does not have $x^*$ as a fixed point. (Figure: $x^* = [x^* - \gamma \nabla F(x^*)]^+$, but in general $[x^* - \gamma M^{-1} \nabla F(x^*)]^+ \ne x^*$.)

The condition for convergence of the scaled gradient projection algorithm:
$M(t)$ is symmetric and $(x - y)'M(t)(x - y) \ge \alpha \|x - y\|_2^2, \quad \forall x, y \in X$.
Proposition 3.7

Define a norm $\|x\|_{M(t)} = (x'M(t)x)^{1/2}$, and assume $M(t)$ is symmetric and
$(x - y)'M(t)(x - y) \ge \alpha \|x - y\|_2^2, \quad \forall x, y \in X, \; \alpha > 0$.
The scaled iteration is equivalent to a quadratic programming subproblem:
$x(t+1) = [x(t) - \gamma M(t)^{-1} \nabla F(x(t))]^+_{M(t)}$
$= \arg\min_{y \in X} \|y - x(t) + \gamma M(t)^{-1} \nabla F(x(t))\|_{M(t)}$
$= \arg\min_{y \in X} \left[\frac{1}{2\gamma}(y - x(t))'M(t)(y - x(t)) + (y - x(t))'\nabla F(x(t))\right]$

(a) There is a unique $y \in X$ that minimizes $(x - y)'M(t)(x - y)$ for each $x \in \mathbb{R}^n$; write $y = [x]^+_{M(t)}$.
(b) $(y - [x]^+_{M(t)})'M(t)(x - [x]^+_{M(t)}) \le 0, \quad \forall y \in X$
(d) $([x]^+_{M(t)} - [y]^+_{M(t)})'M(t)([x]^+_{M(t)} - [y]^+_{M(t)}) \le (x - y)'M(t)(x - y)$; i.e., the projection is non-expansive w.r.t. $\|\cdot\|_{M(t)}$.
(e) $T_t(x) = x$ iff $(y - x)'\nabla F(x) \ge 0, \; \forall y \in X$
(f) $\|T_t(x) - T_t(y)\|_{M(t)} \le \alpha_2 \|x - y\|_{M(t)}$ (for some constant $\alpha_2$)
(g) If $\gamma$ is small enough, $F(T_t(x)) \le F(x) - A_3\|T_t(x) - x\|_2^2$ (for some constant $A_3 > 0$)
(h) If $\gamma$ is small enough, $\lim_{t\to\infty} x(t) = x^*$ and $(y - x^*)'\nabla F(x^*) \ge 0$
The case of a product constraint set: parallel implementations

(Figure: the box $X = [a_1, b_1] \times [a_2, b_2]$, a point $x$, and its projection $y$ onto $X$.)

The computation of $x - \gamma \nabla F(x)$ can be parallelized in the obvious manner. However, the projection $[x - \gamma \nabla F(x)]^+$ is not, in general, amenable to parallel implementation.

If the set $X$ is a box (i.e., $X = \prod_{i=1}^n [a_i, b_i]$), the projection of $x$ on $X$ is obtained by projecting the $i$-th component of $x$ on the interval $[a_i, b_i]$.
The assumption that $X$ is a Cartesian product opens up the possibility for a Gauss-Seidel version of the gradient projection algorithm.

More generally, suppose that $\mathbb{R}^n$ is represented as the Cartesian product of spaces $\mathbb{R}^{n_i}$, where $n_1 + \cdots + n_m = n$, and that the constraint set $X$ is a Cartesian product of sets $X_i$, where $X_i$ is a closed convex subset of $\mathbb{R}^{n_i}$. Then, the projection of $x$ on $X$ is equal to the vector $([x_1]^+, \dots, [x_m]^+)$, where $[x_i]^+$ is the projection of $x_i$ onto $X_i$.

Gauss-Seidel version of the gradient projection algorithm:
$x_i(t+1) = [x_i(t) - \gamma \nabla_i F(z(i,t))]^+, \quad i = 1, \dots, m$
where $z(i,t) = (x_1(t+1), \dots, x_{i-1}(t+1), x_i(t), \dots, x_m(t))$.
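A sketch of the block Gauss-Seidel gradient projection update for a product of boxes; the block sizes, cost, and step size are assumptions of the example:

```python
import numpy as np

# X = X_1 x X_2 with X_i = [0, 1]^2; F(x) = 0.5 x'Ax - b'x (assumed data).
A = np.diag([2.0, 3.0, 4.0, 5.0]) + 0.1    # symmetric positive definite
b = np.array([1.0, 5.0, -2.0, 3.0])
grad = lambda x: A @ x - b

blocks = [slice(0, 2), slice(2, 4)]        # R^4 = R^2 x R^2
gamma = 0.2

x = np.zeros(4)
for _ in range(100):
    for blk in blocks:
        # x_i(t+1) = [x_i(t) - gamma grad_i F(z(i,t))]+ : updating x in
        # place makes z(i,t) contain the already-updated earlier blocks.
        x[blk] = np.clip(x[blk] - gamma * grad(x)[blk], 0.0, 1.0)
print(x)   # a limit point satisfying (y - x)'grad F(x) >= 0 for y in X
```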
Proposition 3.8 (Convergence of the Gauss-Seidel Gradient Projection Algorithm)

If $F : \mathbb{R}^n \to \mathbb{R}$ satisfies Assumption 3.1 and if $\gamma$ is chosen positive and small enough, then any limit point $x^*$ of the seq. $\{x(t)\}$ generated by the Gauss-Seidel algorithm satisfies
$(y - x^*)'\nabla F(x^*) \ge 0$, for all $y \in X$.