An Infeasible Interior-Point Algorithm
with Full Nesterov-Todd Step for
Semidefinite Programming
Zhongyi Liu ∗
Abstract
This paper proposes an infeasible interior-point algorithm with full Nesterov-Todd step for semidefinite programming, which is an extension of the work of Roos (SIAM J. Optim., 16(4):1110–1136, 2006). The polynomial bound coincides with that of infeasible interior-point methods for linear programming, namely, O(n log(n/ε)).

Keywords: semidefinite programming, full Nesterov-Todd step, infeasible interior-point methods, primal-dual, polynomial complexity
AMS subject classification: 65K05, 90C51
1 Introduction
For a comprehensive treatment of interior-point methods (IPMs), we refer to Klerk [1] and Roos et al. [7]. In Roos [8], a full-Newton step infeasible interior-point algorithm for linear programming (LP) was presented, and some extensions, still in the setting of (LP), were carried out by Mansouri and Roos [3] and Liu and Sun [2]. In this paper we extend this approach to semidefinite programming (SDP).
We consider the (SDP) problem given in the following standard form:

(SDP)   min{Tr(CX) : Tr(A_iX) = b_i, i = 1, ..., m, X ⪰ 0}

∗College of Science, Hohai University, Nanjing 210098, China. Email: [email protected]
and its associated dual problem:

(SDD)   max{b^T y : Σ_{i=1}^m y_i A_i + S = C, S ⪰ 0}.

Here C and A_i, i = 1, ..., m, are symmetric n × n matrices, i.e., C, A_i ∈ S^n, and b, y ∈ R^m. Furthermore, X ⪰ 0 (X ≻ 0) means that X is symmetric and positive semidefinite (symmetric and positive definite). The matrices A_i, i = 1, ..., m, are assumed to be linearly independent. (SDP) is a generalization of (LP), in which all the matrices A_i and C are diagonal; this implies that S is automatically diagonal, and so X may also be assumed to be diagonal.
Note that the previous forms of (SDP) and (SDD) can be expressed in the following equivalent form:

(SDP)   min{Tr(CX) : AX = b, X ⪰ 0}

and its associated dual problem:

(SDD)   max{b^T y : A^*y + S = C, S ⪰ 0},

where AX = (Tr(A_1X), Tr(A_2X), ..., Tr(A_mX))^T and A^*y = Σ_{i=1}^m y_i A_i. Throughout the paper, we use this standard form.
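As an illustration, the operator notation can be coded directly. The following minimal sketch uses randomly generated symmetric matrices A_i as hypothetical stand-ins for problem data, and checks the defining adjoint identity ⟨AX, y⟩ = Tr(X · A^*y):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3

# Hypothetical data: m random symmetric matrices A_i (assumed linearly independent).
A = [(M + M.T) / 2 for M in rng.standard_normal((m, n, n))]

def calA(X):
    """The operator A: AX = (Tr(A_1 X), ..., Tr(A_m X))^T."""
    return np.array([np.trace(Ai @ X) for Ai in A])

def calA_adj(y):
    """The adjoint A*: A*y = sum_i y_i A_i."""
    return sum(yi * Ai for yi, Ai in zip(y, A))

# The adjoint identity <AX, y> = <X, A*y> = Tr(X A*y).
X = rng.standard_normal((n, n)); X = (X + X.T) / 2
y = rng.standard_normal(m)
print(abs(calA(X) @ y - np.trace(X @ calA_adj(y))) < 1e-10)
```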
Monteiro and Zhang [4] give a unified analysis of feasible IPMs for semidefinite programming that use the so-called commutative class of search directions. These include popular directions such as the NT (Nesterov-Todd), XS, and SX directions. In this paper we use the NT direction to extend to (SDP) the infeasible interior-point algorithm first developed for (LP) by Roos [8].
Now we introduce some notation. For matrices X, Y ∈ S^n we use the inner product

⟨X, Y⟩ := X • Y := Tr(XY).

Two norms will appear in the text. For X ∈ S^n,

‖X‖_2 := √(λ_max(X²)) = max_i |λ_i(X)|,

and

‖X‖ := ‖X‖_F = ‖λ(X)‖ = √(Σ_{i=1}^n λ_i²(X)),

where λ(X) is the eigenvalue vector of X. Note that for convenience we use the same norm symbol for the Frobenius norm of matrices and the l_2-norm of vectors.
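For concreteness, both norms can be evaluated from the eigenvalue vector; a minimal numpy sketch:

```python
import numpy as np

def spectral_norm(X):
    """||X||_2 = max_i |lambda_i(X)| for symmetric X."""
    return np.max(np.abs(np.linalg.eigvalsh(X)))

def frobenius_norm(X):
    """||X|| = ||lambda(X)|| = sqrt(sum_i lambda_i(X)^2)."""
    return np.sqrt(np.sum(np.linalg.eigvalsh(X) ** 2))

X = np.diag([3.0, -4.0, 1.0])  # eigenvalues 3, -4, 1
print(spectral_norm(X))   # 4.0
print(frobenius_norm(X))  # sqrt(26) ≈ 5.099
```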
2 Full NT step infeasible IPMs
We assume both (SDP) and (SDD) are strictly feasible. The central path for (SDP) is defined by the solutions {(X(µ), y(µ), S(µ)) : µ > 0} of the following system:

AX = b, X ⪰ 0,
A^*y + S = C, S ⪰ 0,    (2.1)
XS = µI,

where I denotes the n × n identity matrix and µ > 0. Suppose that the point (X, y, S) is strictly feasible, so X ≻ 0 and S ≻ 0. Newton's method amounts to linearizing the system (2.1), yielding the following system of equations:

A∆X = b − AX,
A^*∆y + ∆S = C − A^*y − S,
∆XS + X∆S = µI − XS.
Since A_i, i = 1, ..., m, are linearly independent and X, S ≻ 0, one may easily verify that this system is nonsingular. Hence it uniquely defines the search directions ∆X, ∆y, and ∆S.

If X is primal feasible and (y, S) dual feasible, then b − AX = 0 and C − A^*y − S = 0, whence the above system reduces to

A∆X = 0,
A^*∆y + ∆S = 0,    (2.2)
∆XS + X∆S = µI − XS,

which gives the usual search directions for feasible primal-dual IPMs.
A crucial observation for (SDP) is that the system (2.2) might have no symmetric solution ∆X. Among the ways of symmetrizing the third equation in the Newton system, we consider the symmetrization scheme that yields the NT direction. Let us define the matrix

P = X^{1/2}(X^{1/2}SX^{1/2})^{-1/2}X^{1/2} = S^{-1/2}(S^{1/2}XS^{1/2})^{1/2}S^{-1/2},    (2.3)

and D = P^{1/2}. The matrix D can be used to rescale X and S to the same matrix V, defined by

V := (1/√µ) D^{-1}XD^{-1} = (1/√µ) DSD.    (2.4)
Obviously the matrices D and V are symmetric and positive definite. After defining

D_X := (1/√µ) D^{-1}∆XD^{-1},   D_S := (1/√µ) D∆SD,    (2.5)

the complementarity condition in (2.2) reduces to

D_X + D_S = V^{-1} − V.    (2.6)

For a detailed derivation of this equation, see Peng et al. [6].
For any ν with 0 < ν ≤ 1 we consider the perturbed problem (SDP_ν), defined by

(SDP_ν)   min{Tr((C − νR^0_C)X) : AX = b − νr^0_b, X ⪰ 0},

and its dual problem (SDD_ν), which is given by

(SDD_ν)   max{(b − νr^0_b)^T y : A^*y + S = C − νR^0_C, S ⪰ 0}.
Then, after pre- and post-multiplying (2.6) by D, the NT search directions can be written as the solution of the following system:

A∆X = νr^0_b,
A^*∆y + ∆S = νR^0_C,    (2.7)
D^{-1}∆XSD + D∆SXD^{-1} = µI − D^{-1}XSD,

where r^0_b = b − AX^0 and R^0_C = C − A^*y^0 − S^0.

Note that the perturbed problems satisfy Slater's regularity condition when ν = 1.
Lemma 2.1. The original problems (SDP) and (SDD) are feasible if and only if, for each ν satisfying 0 < ν ≤ 1, the perturbed problems (SDP_ν) and (SDD_ν) satisfy Slater's regularity condition.

Proof. Suppose that (SDP) and (SDD) are feasible. Let X̄ be a feasible solution of (SDP) and (ȳ, S̄) a feasible solution of (SDD). Then AX̄ = b and A^*ȳ + S̄ = C, with X̄ ⪰ 0 and S̄ ⪰ 0. Now let 0 < ν ≤ 1, and consider

X = (1 − ν)X̄ + νX^0,   y = (1 − ν)ȳ + νy^0,   S = (1 − ν)S̄ + νS^0.
One has

AX = A((1 − ν)X̄ + νX^0) = (1 − ν)AX̄ + νAX^0 = (1 − ν)b + νAX^0 = b − ν(b − AX^0),

showing that X is feasible for (SDP_ν). Similarly,

A^*y + S = (1 − ν)(A^*ȳ + S̄) + ν(A^*y^0 + S^0) = (1 − ν)C + ν(A^*y^0 + S^0) = C − ν(C − A^*y^0 − S^0),

showing that (y, S) is feasible for (SDD_ν). Since ν > 0 and X^0 and S^0 are symmetric positive definite, X and S are positive definite as well, proving that (SDP_ν) and (SDD_ν) satisfy Slater's regularity condition.

To prove the converse implication, suppose that (SDP_ν) and (SDD_ν) satisfy Slater's regularity condition for each ν satisfying 0 < ν ≤ 1. Obviously, (SDP_ν) and (SDD_ν) are then feasible for these values of ν. Letting ν go to zero, it follows that (SDP) and (SDD) are feasible.
Assuming that (SDP) and (SDD) are feasible, it follows from Lemma 2.1 that the problems (SDP_ν) and (SDD_ν) satisfy Slater's regularity condition for each ν ∈ (0, 1], and hence their central paths exist. This means that the system

b − AX = νr^0_b, X ⪰ 0,
C − A^*y − S = νR^0_C, S ⪰ 0,
XS = µI

has a unique solution for every µ > 0 and ν ∈ (0, 1]. If ν ∈ (0, 1] and µ = νζ², we denote this unique solution as (X(ν), y(ν), S(ν)). As a consequence, X(ν) is the µ-center of (SDP_ν) and (y(ν), S(ν)) the µ-center of (SDD_ν). Thus, taking ν = 1, we have (X(1), y(1), S(1)) = (X^0, y^0, S^0) = (ζI, 0, ζI).
We measure the proximity of the iterates (X, y, S) to the µ-center of the perturbed problems (SDP_ν) and (SDD_ν) by the quantity

δ(X, S; µ) := δ(V) := (1/2)‖V − V^{-1}‖.    (2.8)
Initially we have X = S = ζI and µ = ζ², whence V = I and δ(X, S; µ) = 0. In the sequel we assume that at the start of each iteration, δ(X, S; µ) is smaller than or equal to a (small) threshold value τ > 0. Of course this is true at the start of the first iteration.
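The proximity measure (2.8) is straightforward to evaluate; a sketch reusing the eigendecomposition-based square root for the NT scaling also confirms that δ = 0 at the initial point X = S = ζI with µ = ζ²:

```python
import numpy as np

def sym_sqrt(M):
    """Symmetric square root via eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return U @ np.diag(np.sqrt(w)) @ U.T

def delta(X, S, mu):
    """Proximity measure (2.8): 0.5 * || V - V^{-1} ||_F."""
    Xh = sym_sqrt(X)
    P = Xh @ np.linalg.inv(sym_sqrt(Xh @ S @ Xh)) @ Xh  # NT scaling matrix (2.3)
    D = sym_sqrt(P)
    Dinv = np.linalg.inv(D)
    V = Dinv @ X @ Dinv / np.sqrt(mu)                   # scaled point (2.4)
    return 0.5 * np.linalg.norm(V - np.linalg.inv(V))

n, zeta = 3, 2.0
print(delta(zeta * np.eye(n), zeta * np.eye(n), zeta**2))  # 0.0 at the initial point
```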
Now we use the following system, called the feasibility step, to define ∆^fX, ∆^fy, and ∆^fS:

A∆^fX = θνr^0_b,
A^*∆^fy + ∆^fS = θνR^0_C,    (2.9)
D^{-1}∆^fXSD + D∆^fSXD^{-1} = µI − D^{-1}XSD.

The algorithm begins with a point (X, y, S), strictly feasible for the perturbed problems, such that:

A1: (X, y, S) satisfies the feasibility conditions of the perturbed problems;
A2: Tr(XS) = nµ and δ(X, S; µ) ≤ τ, with µ = νζ².

First we find a new point (X^f, y^f, S^f) such that A1 is satisfied with ν^+ := (1 − θ)ν. Then µ is reduced to µ^+ := (1 − θ)µ. After these two stages we would like to have δ(X^f, S^f; µ^+) ≤ τ, but in general this cannot be guaranteed. So a small number of centering steps are applied to produce new points (X^+, y^+, S^+) such that Tr(X^+S^+) = nµ^+ until δ(X^+, S^+; µ^+) ≤ τ. Then the process can be repeated.
Note that after a feasibility step, the new iterates

X^f = X + ∆^fX,   y^f = y + ∆^fy,   S^f = S + ∆^fS

are strictly feasible (i.e., X^f ≻ 0 and S^f ≻ 0) provided θ is small enough. A more formal description of the algorithm is given in Figure 1.

In the centering steps, starting at the iterates (X, y, S) = (X^f, y^f, S^f) and targeting the µ-centers, the search directions ∆X, ∆y, ∆S are the usual primal-dual NT directions, (uniquely) defined by

A∆X = 0,
A^*∆y + ∆S = 0,
D^{-1}∆XSD + D∆SXD^{-1} = µI − D^{-1}XSD.
Primal-Dual Infeasible IPM for (SDP)

Input:
  accuracy parameter ε > 0;
  barrier update parameter θ, 0 < θ < 1;
  threshold parameter τ > 0.
begin
  X := ζI; y := 0; S := ζI; µ := ζ²; ν := 1;
  while max{Tr(XS), ‖b − AX‖, ‖C − A^*y − S‖} ≥ ε do
  begin
    feasibility step: (X, y, S) := (X, y, S) + (∆^fX, ∆^fy, ∆^fS);
    µ-update: µ := (1 − θ)µ; ν := (1 − θ)ν;
    centering steps:
    while δ(X, S; µ) ≥ τ do
      (X, y, S) := (X, y, S) + (∆X, ∆y, ∆S);
    end while
  end
end

Figure 1: Algorithm
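The control flow of Figure 1 can be sketched generically. The step routines below are hypothetical stand-ins (a real implementation would solve the NT systems for the feasibility and centering steps); they are chosen only so the loop structure is runnable: the mock feasibility step shrinks the residuals and duality gap by 1 − θ, and the mock centering step reflects the quadratic decrease of δ from Lemma 2.5.

```python
import math

def solve_sdp_loop(eps=1e-6, theta=0.1, tau=0.125):
    # Mock state: duality gap Tr(XS), residual norms, and proximity delta.
    gap, res_p, res_d, delta = 9.0, 1.0, 1.0, 0.0  # initial point has delta = 0
    iters = 0
    while max(gap, res_p, res_d) >= eps:
        # Feasibility step + mu-update: all target quantities shrink by 1 - theta,
        # and afterwards delta <= 1/sqrt(2) (Section 4).
        gap, res_p, res_d = (1 - theta) * gap, (1 - theta) * res_p, (1 - theta) * res_d
        delta = 1 / math.sqrt(2)
        # Centering steps: quadratic convergence, delta_+ <= delta^2 (Lemma 2.5).
        while delta >= tau:
            delta = delta ** 2
        iters += 1
    return iters

print(solve_sdp_loop())
```

A smaller θ means more main iterations, mirroring the 1/θ factor in the iteration bound of Section 5.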
Denoting the iterates after a centering step by X^+, y^+, and S^+, we recall the following results from Chapter 7 of Klerk [1].

Lemma 2.2. Let X, S satisfy Slater's regularity condition and µ > 0. If δ := δ(X, S; µ) < 1, then the full NT step is strictly feasible.

Corollary 2.3. Let X, S satisfy Slater's regularity condition and µ > 0 such that δ(X, S; µ) < 1. Then Tr(X^+S^+) = nµ.

Lemma 2.4. After a feasible full NT step, the proximity measure satisfies

δ^+ := δ(X^+, S^+; µ) ≤ δ² / √(2(1 − δ²)).

From this lemma, one easily obtains the following quadratic convergence result.

Lemma 2.5. If δ := δ(X, S; µ) < 1/√2, then δ(X^+, S^+; µ) < δ².
The centering steps serve to obtain iterates that satisfy Tr(XS) = nµ^+ and δ(X, S; µ^+) ≤ τ, where τ is (much) smaller than 1/√2. By Lemma 2.5, the required number of centering steps is easily obtained: after the µ-update we have δ(X^f, S^f; µ^+) ≤ 1/√2, and hence after k centering steps the iterates (X, y, S) satisfy

δ(X, S; µ^+) ≤ (1/√2)^{2^k}.

From this one easily deduces that no more than

log₂(log₂(1/τ²))    (2.10)

centering steps are needed.
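For the later choice τ = 1/8, the bound (2.10) is easy to evaluate; a quick numeric check iterates δ ↦ δ² from δ = 1/√2, the worst case allowed by Lemma 2.5:

```python
import math

tau = 1 / 8
bound = math.log2(math.log2(1 / tau**2))  # (2.10): log2(log2 64) ≈ 2.585

# Simulate the quadratic decrease delta_+ <= delta^2 starting from 1/sqrt(2).
delta, steps = 1 / math.sqrt(2), 0
while delta > tau:
    delta, steps = delta**2, steps + 1

print(bound, steps)  # the observed count never exceeds ceil(bound)
assert steps <= math.ceil(bound)
```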
3 Technical results
Given a strictly feasible solution X of (SDP), a strictly feasible solution (y, S) of (SDD), and µ > 0, let

Φ(XS; µ) := Ψ(V) := Σ_{i=1}^n ψ(λ_i(V)),

where V is defined in (2.4), and

ψ(t) := (1/2)(t² − 1 − log t²).

It is well known that ψ(t) is the kernel function of the primal-dual logarithmic barrier function, which, up to some constant, is the function Φ(XS; µ). Note that

V² = (1/µ) D^{-1}XSD = (1/µ) DSXD^{-1},

and hence V and (1/√µ) D^{-1}X^{1/2}S^{1/2}D share the same system of eigenvalues.
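A sketch of the kernel function ψ and the barrier Ψ(V) = Σψ(λ_i(V)), evaluated on the eigenvalues of V; note that ψ(1) = 0, so Ψ(I) = 0:

```python
import numpy as np

def psi(t):
    """Kernel function psi(t) = 0.5 * (t^2 - 1 - log t^2), t > 0."""
    return 0.5 * (t**2 - 1 - np.log(t**2))

def Psi(V):
    """Barrier Psi(V) = sum_i psi(lambda_i(V)) for symmetric positive definite V."""
    return np.sum(psi(np.linalg.eigvalsh(V)))

print(psi(1.0))                           # 0.0: the kernel vanishes at its minimizer t = 1
print(Psi(np.eye(3)))                     # 0.0 at V = I
print(Psi(np.diag([0.5, 1.0, 2.0])) > 0)  # True away from I
```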
Lemma 3.1. One has

Φ(XS; µ) = Φ(XS(µ); µ) + Φ(X(µ)S; µ).
Proof. The equality in the lemma is equivalent to

Σ_{i=1}^n (λ_i((1/µ)D^{-1}XSD) − 1 − log λ_i((1/µ)D^{-1}XSD))
= Σ_{i=1}^n (λ_i((1/µ)D^{-1}X(µ)SD) − 1 − log λ_i((1/µ)D^{-1}X(µ)SD))
+ Σ_{i=1}^n (λ_i((1/µ)D^{-1}XS(µ)D) − 1 − log λ_i((1/µ)D^{-1}XS(µ)D)).

Since

Σ_{i=1}^n (λ_i((1/µ)D^{-1}XSD) − 1) = Tr((1/µ)D^{-1}XSD) − n,    (3.1)

and

Σ_{i=1}^n (λ_i((1/µ)D^{-1}X(µ)SD) − 1) = Tr((1/µ)D^{-1}X(µ)SD) − n,    (3.2)

Σ_{i=1}^n (λ_i((1/µ)D^{-1}XS(µ)D) − 1) = Tr((1/µ)D^{-1}XS(µ)D) − n,    (3.3)

the sum of (3.2) and (3.3) is equal to (3.1) if

Tr((1/µ)D^{-1}XSD) − n = Tr((1/µ)D^{-1}X(µ)SD) − n + Tr((1/µ)D^{-1}XS(µ)D) − n.

Using X(µ)S(µ) = µI, whence Tr(X(µ)S(µ)) = nµ, this can be written as Tr((X − X(µ))(S − S(µ))) = 0; that is, X − X(µ) and S − S(µ) are orthogonal. This is indeed true, since X − X(µ) belongs to L^⊥ and S − S(µ) to L, where L is defined as

L := span{A_1, ..., A_m}.
In addition,

Σ_{i=1}^n log λ_i((1/µ)D^{-1}XSD) = Σ_{i=1}^n log λ_i((1/µ)D^{-1}X(µ)SD) + Σ_{i=1}^n log λ_i((1/µ)D^{-1}XS(µ)D)

holds true, since the left-hand side is log det((1/µ)XS) and the right-hand side is

log det((1/µ)X(µ)S) + log det((1/µ)XS(µ)) = log det((1/µ²)XSX(µ)S(µ)) = log det((1/µ)XS).

Hence the lemma is proved.
Theorem 3.2. Let δ(V) be as defined in (2.8) and ρ(δ) as defined later in (4.4). Then

Ψ(V) ≤ ψ(ρ(δ(V))).

Proof. The theorem is obvious if V = I, since then δ(V) = Ψ(V) = 0, ρ(0) = 1, and ψ(1) = 0. Now consider δ(V) > 0 and Ψ(V) > 0. For this nontrivial case we consider, for τ > 0, the following problem:

z_τ = max_V {Ψ(V) = Σ_{i=1}^n ψ(λ_i(V)) : δ(V)² = (1/4) Σ_{i=1}^n ψ′(λ_i(V))² = τ²}.
The first-order optimality conditions are

ψ′(λ_i(V)) = η ψ′(λ_i(V)) ψ″(λ_i(V)),   i = 1, ..., n,

where η ∈ R is a Lagrange multiplier. From these equations we have, for each i, either ψ′(λ_i(V)) = 0 or η ψ″(λ_i(V)) = 1. The first case implies λ_i(V) = 1. For the second case, note that ψ″(t) is monotonically decreasing; this implies that all λ_i(V) with η ψ″(λ_i(V)) = 1 have the same value. Denote this value by t, and denote by k the number of indices for which the second case holds; since τ > 0, we have k ≥ 1. Thus, after reordering the coordinates, λ(V) has the form

λ(V) = (t, ..., t, 1, ..., 1)^T,

with k entries equal to t and n − k entries equal to 1. Since ψ′(1) = 0, the sum in the constraint has k nonzero components, so kψ′(t)² = 4τ². Since ψ′(t) = t − 1/t, it follows that

t − 1/t = ±2τ/√k,

which shows that t = ρ(τ/√k) or t = 1/ρ(τ/√k). In the first case, t > 1, which produces the larger value of ψ(t); what we need to show is

ψ(ρ(τ/√k)) ≥ ψ(1/ρ(τ/√k)).

In fact, for t ≥ 1,

ψ(t) − ψ(1/t) = (1/2)(t² − 1/t² − 2 log t²) ≥ 0,

since the left-hand side vanishes at t = 1 and its derivative (t² − 1)²/t³ is nonnegative. Since we want to maximize Ψ(V), t = ρ(τ/√k) is the relevant case, and then

Ψ(V) = kψ(ρ(τ/√k)).
What remains is to determine which value of k maximizes Ψ(V). For this purpose we consider the derivative of Ψ(V) with respect to k. To simplify notation, write

Ψ(V) = kψ(t),   t = ρ(s),   s = τ/√k.

From the definition of t we have (t − s)² = 1 + s², or t² − 1 = 2st, whence

2s = t − 1/t = ψ′(t).

Some computations yield

dΨ(V)/dk = f(τ) := ψ(t) − s²ρ(s)/√(1 + s²)

and

f′(τ) = −(1/√k) · s² / ((1 + s²)√(1 + s²)).

One may easily verify that f(0) = 0 and f′(τ) ≤ 0. This implies f(τ) ≤ 0 for each τ ≥ 0. Hence we conclude that Ψ(V) is monotonically decreasing in k, so Ψ(V) is maximal for k = 1, and the theorem follows.
Corollary 3.3. Let τ ≥ 0 and δ(V) ≤ τ. Then Ψ(V) ≤ τ′ := ψ(ρ(τ)).

Proof. Since ρ(s) is monotonically increasing in s, with ρ(s) ≥ 1 for all s ≥ 0, and since ψ(t) is monotonically increasing for t ≥ 1, the function ψ(ρ(δ)) is increasing in δ for δ ≥ 0. Thus the result is immediate from Theorem 3.2.
Lemma 3.4. Let δ(V) ≤ τ and let τ′ be as defined in Corollary 3.3. Then

ψ(λ_i(X(µ)^{-1/2}X^{1/2})) ≤ τ′,   ψ(λ_i(S(µ)^{-1/2}S^{1/2})) ≤ τ′,   i = 1, ..., n.

Proof. By Lemma 3.1 we have Φ(XS; µ) = Φ(XS(µ); µ) + Φ(X(µ)S; µ). The nonnegativity of Φ(XS; µ), Φ(XS(µ); µ), and Φ(X(µ)S; µ), together with Corollary 3.3, implies Φ(XS(µ); µ) ≤ τ′ and Φ(X(µ)S; µ) ≤ τ′. The first of these two inequalities gives

Φ(XS(µ); µ) = Σ_{i=1}^n ψ(λ_i((1/√µ)X^{1/2}S(µ)^{1/2})) ≤ τ′.

Since ψ(t) ≥ 0 for every t > 0, it follows that

ψ(λ_i((1/√µ)X^{1/2}S(µ)^{1/2})) ≤ τ′,   i = 1, ..., n.

Since X(µ)S(µ) = µI, the vector λ((1/√µ)X^{1/2}S(µ)^{1/2}) coincides, up to the order of its entries, with λ(X(µ)^{-1/2}S(µ)^{-1/2}X^{1/2}S(µ)^{1/2}) and with λ(X(µ)^{-1/2}X^{1/2}). Thus we obtain the first inequality in the lemma. The second inequality follows in the same way.
In the sequel we use the inverse function of ψ(t) for 0 < t ≤ 1, denoted χ(s). So χ : [0, ∞) → (0, 1] and

χ(s) = t ⇔ s = ψ(t),   s ≥ 0, 0 < t ≤ 1.    (3.4)

Lemma 3.5. For each t > 0 one has χ(ψ(t)) ≤ t ≤ 1 + √(2ψ(t)).

Proof. See the proof of Lemma A.5 in Roos [8].

Corollary 3.6. If δ(V) ≤ τ, then

χ(τ′) ≤ λ_i(X(µ)^{-1/2}X^{1/2}) ≤ 1 + √(2τ′),   χ(τ′) ≤ λ_i(S(µ)^{-1/2}S^{1/2}) ≤ 1 + √(2τ′).
Proof. This is immediate from Lemma 3.4 and Lemma 3.5.
Theorem 3.7. If δ(V) ≤ τ, then

‖S^{-1/2}X^{1/2}‖ ≤ ((1 + √(2τ′)) / (√µ χ(τ′))) ‖X(µ)‖,
‖X^{-1/2}S^{1/2}‖ ≤ ((1 + √(2τ′)) / (√µ χ(τ′))) ‖S(µ)‖.

Proof. It is known that S^{-1/2}X^{1/2} shares a common system of eigenvalues with

(S(µ)^{-1/2}S^{1/2})^{-1} X(µ)^{-1/2}X^{1/2} X(µ)^{1/2}S(µ)^{-1/2}.

Using X(µ)S(µ) = µI, Corollary 3.3, and Corollary 3.6, we get

Σ_{i=1}^n λ_i²(S^{-1/2}X^{1/2}) ≤ ((1 + √(2τ′))² / χ(τ′)²) Σ_{i=1}^n λ_i²(X(µ)^{1/2}S(µ)^{-1/2})
= ((1 + √(2τ′)) / (√µ χ(τ′)))² Σ_{i=1}^n λ_i²(X(µ)).

The second inequality is obtained in the same way.
4 Analysis of the feasibility step
4.1 The feasibility step
Let X, y, and S denote the iterates at the start of an iteration, and assume δ(X, S; µ) ≤ τ. Recall that at the start of the first iteration this is certainly true, because then δ(X, S; µ) = 0.

Define D^f_X and D^f_S as in (2.5), and V as in (2.4). Now we may write

D^{-1}X^fS^fD = D^{-1}(X + ∆^fX)(S + ∆^fS)D
= D^{-1}(XS + ∆^fXS + X∆^fS + ∆^fX∆^fS)D
= D^{-1}XSD + D^{-1}∆^fXSD + D^{-1}X∆^fSD + D^{-1}∆^fX∆^fSD.

Since D^{-1}X∆^fSD ∼ D∆^fSXD^{-1}, the third equation of (2.9) yields

X^fS^f ∼ µI + D^{-1}∆^fX∆^fSD.    (4.1)

Using (2.4) and (2.5), we may also write

X^f = X + ∆^fX = √µ D(V + D^f_X)D,    (4.2)
S^f = S + ∆^fS = √µ D^{-1}(V + D^f_S)D^{-1}.    (4.3)
Recall that we assume δ ≤ τ before the feasibility step.

Lemma 4.1. The iterates (X^f, y^f, S^f) are certainly strictly feasible if

‖λ(D^f_X)‖ < 1/ρ(δ)   and   ‖λ(D^f_S)‖ < 1/ρ(δ),

where

ρ(δ) := δ + √(1 + δ²).    (4.4)

Proof. It is clear from (4.2) that X^f is strictly feasible if and only if V + D^f_X ≻ 0, since X^f = √µ D^T(V + D^f_X)D. This certainly holds if ‖λ(D^f_X)‖ < min_i λ_i(V). Since

2δ = ‖V − V^{-1}‖ = ‖λ(V) − λ(V)^{-1}‖,

the minimal value t that an entry of λ(V) can attain satisfies t ≤ 1 and 1/t − t = 2δ. The last equation implies t² + 2δt − 1 = 0, which gives

t = −δ + √(1 + δ²) = 1/ρ(δ).

This proves the first inequality in the lemma. The second inequality is obtained in the same way.
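A quick numeric check of the identity used in the proof: the positive root t of t² + 2δt − 1 = 0 equals 1/ρ(δ) with ρ(δ) = δ + √(1 + δ²):

```python
import math

def rho(delta):
    """rho(delta) = delta + sqrt(1 + delta^2), as in (4.4)."""
    return delta + math.sqrt(1 + delta**2)

for d in (0.0, 0.125, 0.5, 1.0):
    t = -d + math.sqrt(1 + d**2)      # positive root of t^2 + 2*delta*t - 1 = 0
    print(abs(t - 1 / rho(d)) < 1e-12, abs(t**2 + 2*d*t - 1) < 1e-12)
```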
In the sequel we denote

ω := ω(V) := (1/2)√(‖D^f_X‖² + ‖D^f_S‖²),
D^f_{XS} := (1/2)(D^f_X D^f_S + D^f_S D^f_X).

This implies

Tr(D^f_{XS}) = Tr(D^f_X D^f_S) ≤ ‖D^f_X‖ · ‖D^f_S‖ ≤ (1/2)(‖D^f_X‖² + ‖D^f_S‖²) = 2ω²,    (4.5)

|λ_i(D^f_{XS})| = |λ_i((1/2)(D^f_X D^f_S + D^f_S D^f_X))| ≤ ‖(1/2)(D^f_X D^f_S + D^f_S D^f_X)‖ ≤ ‖D^f_X‖ · ‖D^f_S‖ ≤ 2ω².    (4.6)
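Both bounds are easy to confirm numerically for arbitrary symmetric D^f_X and D^f_S, since (4.5)-(4.6) use no structure beyond symmetry; a sketch with random matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n)); DfX = (A + A.T) / 2
A = rng.standard_normal((n, n)); DfS = (A + A.T) / 2

fro = np.linalg.norm                       # Frobenius norm
omega = 0.5 * np.sqrt(fro(DfX)**2 + fro(DfS)**2)
DfXS = 0.5 * (DfX @ DfS + DfS @ DfX)       # symmetrized product

# (4.5): Tr(DfXS) <= 2*omega^2,  (4.6): |lambda_i(DfXS)| <= 2*omega^2
print(np.trace(DfXS) <= 2 * omega**2 + 1e-12)
print(np.max(np.abs(np.linalg.eigvalsh(DfXS))) <= 2 * omega**2 + 1e-12)
```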
Lemma 4.2. One has

4δ(V^f)² ≤ θ²n/(1 − θ) + 2ω²/(1 − θ) + (1 − θ) · 2ω²/(1 − 2ω²).

Proof. We have

δ(X^f, S^f; µ^+) = δ(V^f) = (1/2)‖V^f − (V^f)^{-1}‖,   where (V^f)² ∼ (1/µ^+) D^{-1}X^fS^fD.

Using (4.1), we obtain

D^{-1}X^fS^fD ∼ µI + D^{-1}∆^fX∆^fSD.    (4.7)

Since µV² = D^{-1}XSD and D^{-1}∆^fX∆^fSD = µD^f_X D^f_S, after division of both sides in (4.7) by µ^+ we get

(V^f)² ∼ (µ/µ^+)I + (µ/µ^+)D^f_X D^f_S
= (I + D^f_X D^f_S)/(1 − θ)
= (I + D^f_{XS})/(1 − θ) + (D^f_X D^f_S − D^f_S D^f_X)/(2(1 − θ)).

Set u = (λ_1^{1/2}(I + D^f_X D^f_S), ..., λ_n^{1/2}(I + D^f_X D^f_S))^T. Then

2δ(V^f) = ‖u/√(1 − θ) − √(1 − θ)u^{-1}‖ = ‖θu/√(1 − θ) + √(1 − θ)(u − u^{-1})‖.

Therefore,

4δ(V^f)² = (θ²/(1 − θ))‖u‖² + (1 − θ)‖u − u^{-1}‖² + 2θu^T(u − u^{-1})
= (θ²/(1 − θ) + 2θ)‖u‖² + (1 − θ)‖u − u^{-1}‖² − 2θu^Tu^{-1}
= (θ²/(1 − θ) + 2θ)(n + Tr(D^f_X D^f_S)) + (1 − θ)‖u − u^{-1}‖² − 2θn
= θ²n/(1 − θ) + (θ²/(1 − θ) + 2θ)Tr(D^f_X D^f_S) + (1 − θ)‖u − u^{-1}‖².

Since

‖u^{-1} − u‖² = Tr(I + D^f_X D^f_S) + Tr((I + D^f_X D^f_S)^{-1}) − 2n
= Tr(D^f_{XS}) + Tr((I + D^f_X D^f_S)^{-1}) − n
≤ Tr(D^f_{XS}) + Tr((I + D^f_{XS})^{-1}) − n
= Tr(D^f_{XS}) + Σ_{i=1}^n (1/(1 + λ_i(D^f_{XS})) − 1)
= Tr(D^f_{XS}) − Σ_{i=1}^n λ_i(D^f_{XS})/(1 + λ_i(D^f_{XS})),

where the inequality is due to Lemma 2.1 in Peng et al. [5] and the last two equalities use λ_i(I + B) = 1 + λ_i(B). (Note that λ_i(A + B) ≠ λ_i(A) + λ_i(B) in general; see Theorem 2.3.5 in Wolkowicz et al. [9].) Substitution gives

4δ(V^f)² ≤ θ²n/(1 − θ) + (θ²/(1 − θ) + 2θ)Tr(D^f_X D^f_S) + (1 − θ)(Tr(D^f_{XS}) − Σ_{i=1}^n λ_i(D^f_{XS})/(1 + λ_i(D^f_{XS})))
= θ²n/(1 − θ) + (1/(1 − θ))Tr(D^f_{XS}) − (1 − θ) Σ_{i=1}^n λ_i(D^f_{XS})/(1 + λ_i(D^f_{XS})).

Hence, using (4.5) and (4.6), we arrive at

4δ(V^f)² ≤ θ²n/(1 − θ) + 2ω²/(1 − θ) + (1 − θ) · 2ω²/(1 − 2ω²),

which completes the proof.
Because we need δ(V^f) ≤ 1/√2, it follows from Lemma 4.2 that it suffices to have

θ²n/(1 − θ) + 2ω²/(1 − θ) + (1 − θ) · 2ω²/(1 − 2ω²) ≤ 2.

We choose

θ = α/√(2n),   α ≤ 1/√2.    (4.8)

Then, for n ≥ 2, one may easily verify that

ω ≤ 1/2  ⇒  δ(V^f) ≤ 1/√2.
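A numeric spot check of this implication: with θ = α/√(2n) and α = 1/√2, the left-hand side of the sufficient condition stays below 2 at ω = 1/2 over a range of n ≥ 2:

```python
import math

def lhs(theta, omega, n):
    """Left-hand side of the sufficient condition from Lemma 4.2."""
    return (theta**2 * n / (1 - theta)
            + 2 * omega**2 / (1 - theta)
            + (1 - theta) * 2 * omega**2 / (1 - 2 * omega**2))

alpha, omega = 1 / math.sqrt(2), 0.5
for n in (2, 10, 100, 10**6):
    theta = alpha / math.sqrt(2 * n)
    print(n, lhs(theta, omega, n) <= 2)
```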
Note that the system (2.9) can be expressed in terms of the scaled search directions D^f_X and D^f_S as follows:

ĀD^f_X = θνr^0_b,
Ā^*(∆^fy/µ) + D^f_S = (1/√µ)θνDR^0_CD,    (4.9)
D^f_X + D^f_S = V^{-1} − V,

where

Ā(·) = (Tr(Ā_1(·)), Tr(Ā_2(·)), ..., Tr(Ā_m(·)))^T   and   Ā_i = √µ DA_iD.

In the feasible case, the scaled search directions D^f_X and D^f_S form an orthogonal decomposition of the matrix V^{-1} − V. We then have the upper bounds ‖D^f_X‖ ≤ 2δ(V) and ‖D^f_S‖ ≤ 2δ(V), and moreover ω = δ(V). In the infeasible case the situation is quite different, since the orthogonality of D^f_X and D^f_S is in general lost, and it is harder to obtain upper bounds for ‖D^f_X‖ and ‖D^f_S‖. First we derive an upper bound for ω.
4.2 Upper bound for ω(V)

Now let

L := span{Ā_1, ..., Ā_m}.

Given a strictly feasible point X, we call ∆X a feasible direction at X if ∆X ∈ L^⊥; similarly, ∆S is a feasible direction at a strictly feasible point S if ∆S ∈ L. Here

L^⊥ := {ξ ∈ R^{n×n} : Āξ = 0}.

Obviously, the affine space {ξ ∈ R^{n×n} : Āξ = θνr^0_b} equals D^f_X + L^⊥, and

D^f_S ∈ (1/√µ)θνDR^0_CD + L.
Lemma 4.3. Let Q be the (unique) point in the intersection of the affine spaces D^f_X + L^⊥ and D^f_S + L. Then

2ω ≤ √(‖Q‖² + (‖Q‖ + 2δ(V))²).

Proof. Let us denote R = V^{-1} − V. Since L + L^⊥ = R^{n×n}, there exist Q_1, R_1 ∈ L^⊥ and Q_2, R_2 ∈ L such that

Q = Q_1 + Q_2,   R = R_1 + R_2.

On the other hand, since D^f_X − Q ∈ L^⊥ and D^f_S − Q ∈ L, there exist L_1 ∈ L^⊥ and L_2 ∈ L such that

D^f_X = Q + L_1,   D^f_S = Q + L_2.

Due to the third equation of (4.9), it follows that R = 2Q + L_1 + L_2, which implies

(2Q_1 + L_1) + (2Q_2 + L_2) = R_1 + R_2,

from which we conclude that

L_1 = R_1 − 2Q_1,   L_2 = R_2 − 2Q_2.

Hence we obtain

D^f_X = Q + R_1 − 2Q_1 = (R_1 − Q_1) + Q_2,
D^f_S = Q + R_2 − 2Q_2 = Q_1 + (R_2 − Q_2).

Since the spaces L^⊥ and L are orthogonal, we conclude that

4ω² = ‖D^f_X‖² + ‖D^f_S‖² = ‖R_1 − Q_1‖² + ‖Q_2‖² + ‖Q_1‖² + ‖R_2 − Q_2‖² = ‖Q − R‖² + ‖Q‖².

Assuming ‖Q‖ ≠ 0, since ‖R‖ = 2δ(V), the right-hand side is maximal when R = −2δ(V)Q/‖Q‖; thus

4ω² ≤ ‖Q‖² + (‖Q‖ + 2δ(V))².

The inequality also holds when ‖Q‖ = 0, since Q = 0 implies that D^f_X is orthogonal to D^f_S, whence ω = δ(V). Hence the lemma follows.
In the sequel we denote δ(V) simply by δ. Recall from Subsection 4.1 that, in order to guarantee δ(V^f) ≤ 1/√2, we need ω ≤ 1/2. Due to Lemma 4.3, this certainly holds if ‖Q‖ satisfies

‖Q‖² + (‖Q‖ + 2δ)² ≤ 1.    (4.10)
4.3 Upper bound for ‖Q‖
Recall from Lemma 4.3 that Q is the (unique) solution of the system

ĀQ = θνr^0_b,
Ā^*ξ + Q = (1/√µ)θνDR^0_CD.

We proceed by deriving an upper bound for ‖Q‖. For the moment, let us write

r_b = θνr^0_b,   R_C = θνR^0_C.

Defining the m × m matrix ĀĀ^* by

ĀĀ^* := [ Tr(Ā_1Ā_1)  Tr(Ā_1Ā_2)  ···  Tr(Ā_1Ā_m) ]
        [ Tr(Ā_2Ā_1)  Tr(Ā_2Ā_2)  ···  Tr(Ā_2Ā_m) ]
        [     ···          ···     ···      ···     ]
        [ Tr(Ā_mĀ_1)  Tr(Ā_mĀ_2)  ···  Tr(Ā_mĀ_m) ],

one easily verifies that this matrix is symmetric, and positive definite whenever Ā_1, Ā_2, ..., Ā_m are linearly independent. Thus

ξ = (ĀĀ^*)^{-1}((1/√µ)Ā(DR_CD) − r_b).

Substitution gives

Q = (1/√µ)(DR_CD − Ā^*(ĀĀ^*)^{-1}Ā(DR_CD) + Ā^*(ĀĀ^*)^{-1}r_b)
= (1/√µ)((I − Ā^*(ĀĀ^*)^{-1}Ā)(DR_CD) + Ā^*(ĀĀ^*)^{-1}r_b)
:= (1/√µ)(Q_1 + Q_2).

Here one easily verifies that Ā^*(ĀĀ^*)^{-1}Ā is an orthogonal projection operator (onto L).

Let (ȳ, S̄) be such that A^*ȳ + S̄ = C. Then

DR_CD = θνDR^0_CD = θνD(C − A^*y^0 − S^0)D = θν(DA^*(ȳ − y^0)D + D(S̄ − S^0)D).

Since DA^*(ȳ − y^0)D ∈ L, it is annihilated by the projection I − Ā^*(ĀĀ^*)^{-1}Ā, and we obtain

‖Q_1‖ ≤ θν‖D(S̄ − S^0)D‖.

On the other hand, let X̄ be such that AX̄ = b. Then

r_b = θνr^0_b = θν(b − AX^0) = θνA(X̄ − X^0) = θνĀ(D^{-1}(X̄ − X^0)D^{-1}),

and

Q_2 = θνĀ^*(ĀĀ^*)^{-1}Ā(D^{-1}(X̄ − X^0)D^{-1}).

Hence it follows that

‖Q_2‖ ≤ θν‖D^{-1}(X̄ − X^0)D^{-1}‖.

Since √µQ = Q_1 + Q_2 with Q_1 and Q_2 orthogonal, we may conclude that

√µ‖Q‖ = √(‖Q_1‖² + ‖Q_2‖²) ≤ θν√(‖D(S̄ − S^0)D‖² + ‖D^{-1}(X̄ − X^0)D^{-1}‖²).    (4.11)
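The Gram matrix ĀĀ^* is easy to form explicitly. The sketch below, with random symmetric Ā_i as hypothetical stand-ins, checks that it is symmetric positive definite when the Ā_i are linearly independent, which is what makes ξ, and hence Q, uniquely determined:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3
Abar = [(M + M.T) / 2 for M in rng.standard_normal((m, n, n))]  # almost surely linearly independent

# Gram matrix (A A*)_{ij} = Tr(Abar_i Abar_j)
G = np.array([[np.trace(Ai @ Aj) for Aj in Abar] for Ai in Abar])

print(np.allclose(G, G.T))                # symmetric
print(np.all(np.linalg.eigvalsh(G) > 0))  # positive definite
```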
Note that we are still free to choose X̄ and S̄ such that AX̄ = b and A^*ȳ + S̄ = C. Let X̄ be an optimal solution of (SDP) and (ȳ, S̄) of (SDD); then Tr(X̄S̄) = 0. Let ζ be such that ‖X̄ + S̄‖_2 ≤ ζ, and let the initial choices be

X^0 = S^0 = ζI,   y^0 = 0,   µ^0 = ζ².

The matrices X^0 − X̄ and S^0 − S̄ then satisfy

0 ⪯ X^0 − X̄ ⪯ ζI,   0 ⪯ S^0 − S̄ ⪯ ζI.

Thus it follows that

√(‖D(S̄ − S^0)D‖² + ‖D^{-1}(X̄ − X^0)D^{-1}‖²) ≤ ζ√(‖D²‖² + ‖D^{-2}‖²) = ζ√(‖P‖² + ‖P^{-1}‖²) = ζ√(‖S^{-1/2}X^{1/2}‖² + ‖X^{-1/2}S^{1/2}‖²).

Substitution into (4.11) gives

√µ‖Q‖ ≤ ζθν√(‖S^{-1/2}X^{1/2}‖² + ‖X^{-1/2}S^{1/2}‖²).    (4.12)

To proceed we need upper bounds for ‖S^{-1/2}X^{1/2}‖ and ‖X^{-1/2}S^{1/2}‖.
4.4 Bounds for ‖S^{-1/2}X^{1/2}‖ and ‖X^{-1/2}S^{1/2}‖

From Theorem 3.7,

‖S^{-1/2}X^{1/2}‖ ≤ ((1 + √(2τ′)) / (√µ χ(τ′))) ‖X(µ, ν)‖,
‖X^{-1/2}S^{1/2}‖ ≤ ((1 + √(2τ′)) / (√µ χ(τ′))) ‖S(µ, ν)‖,

where

τ′ := ψ(ρ(τ)),   ψ(t) = (1/2)(t² − 1 − log t²),

and where χ : [0, ∞) → (0, 1] is the inverse function of ψ(t) for 0 < t ≤ 1, as defined in (3.4).

We choose

τ = 1/8.    (4.13)

Then τ′ = 0.016921, 1 + √(2τ′) = 1.18396, and χ(τ′) = 0.872865, whence

(1 + √(2τ′)) / χ(τ′) = 1.35641 < √2.

It follows that

‖S^{-1/2}X^{1/2}‖ ≤ (√2/√µ)‖X(µ, ν)‖,   ‖X^{-1/2}S^{1/2}‖ ≤ (√2/√µ)‖S(µ, ν)‖.

Substitution into (4.12) gives

µ‖Q‖ ≤ √2 θνζ√(‖X(µ, ν)‖² + ‖S(µ, ν)‖²).

Therefore, using µ = µ^0ν = ζ²ν and θ = α/√(2n), we obtain the following upper bound for ‖Q‖:

‖Q‖ ≤ (α/(ζ√n))√(‖X(µ, ν)‖² + ‖S(µ, ν)‖²).

We define

κ(ζ, ν) = √(‖X(µ, ν)‖² + ‖S(µ, ν)‖²) / (ζ√(2n)),   0 < ν ≤ 1,   µ = µ^0ν.
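These constants can be reproduced numerically; χ is evaluated here by bisection on ψ over (0, 1], a simple stand-in for the inverse function of (3.4):

```python
import math

def psi(t):
    return 0.5 * (t**2 - 1 - math.log(t**2))

def rho(d):
    return d + math.sqrt(1 + d**2)

def chi(s, lo=1e-9, hi=1.0):
    """Inverse of psi on (0, 1] by bisection: psi is decreasing there."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if psi(mid) > s:
            lo = mid     # psi(mid) too large -> move right, toward 1
        else:
            hi = mid
    return (lo + hi) / 2

tau = 1 / 8
tau_p = psi(rho(tau))
ratio = (1 + math.sqrt(2 * tau_p)) / chi(tau_p)
print(tau_p, chi(tau_p), ratio, ratio < math.sqrt(2))
```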
Since X(ζ², 1) = S(ζ², 1) = ζI, we have κ(ζ, 1) = 1. Now we may write

‖Q‖ ≤ √2 α κ(ζ),   where κ(ζ) = max_{0<ν≤1} κ(ζ, ν).

We found in (4.10) that, in order to have δ(V^f) ≤ 1/√2, we should have ‖Q‖² + (‖Q‖ + 2δ(V))² ≤ 1. Since δ(V) ≤ τ = 1/8, it suffices if ‖Q‖ ≤ 0.570971. Since √2/3 = 0.471405, we conclude that if we take

α = 1/(3κ(ζ)),    (4.14)

then we certainly have δ(V^f) ≤ 1/√2.
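A quick check of the numeric threshold: 0.570971 is (up to rounding) the largest q with q² + (q + 1/4)² ≤ 1, and the choice (4.14) keeps ‖Q‖ ≤ √2/3 safely below it:

```python
import math

tau = 1 / 8
# Largest q with q^2 + (q + 2*tau)^2 = 1: positive root of 2q^2 + 4*tau*q + 4*tau^2 - 1 = 0.
a, b, c = 2.0, 4 * tau, 4 * tau**2 - 1
q_max = (-b + math.sqrt(b**2 - 4 * a * c)) / (2 * a)
print(q_max)                     # ≈ 0.570971
print(math.sqrt(2) / 3 < q_max)  # the bound resulting from (4.14) fits
```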
4.5 Bound for κ(ζ)

Due to the choice of X̄, ȳ, S̄ and the number ζ, we have

AX̄ = b,   0 ⪯ X̄ ⪯ ζI,
A^*ȳ + S̄ = C,   0 ⪯ S̄ ⪯ ζI,
Tr(X̄S̄) = 0.

To simplify notation, denote X = X(µ, ν), y = y(µ, ν), and S = S(µ, ν). Then X, y, and S are uniquely determined by the following system:

b − AX = ν(b − A(ζI)),   X ⪰ 0,
C − A^*y − S = ν(C − ζI),   S ⪰ 0,
XS = νζ²I.

Hence we have

A(X − (1 − ν)X̄ − νζI) = 0,   X ⪰ 0,
A^*(y − (1 − ν)ȳ) = (1 − ν)S̄ + νζI − S,   S ⪰ 0,
XS = νζ²I.

Note that Tr((X − (1 − ν)X̄ − νζI)(S − (1 − ν)S̄ − νζI)) = 0, since the first factor lies in L^⊥ and the second in L. Defining

X̃ := (1 − ν)X̄ + νζI,   S̃ := (1 − ν)S̄ + νζI,

we have Tr((X − X̃)(S − S̃)) = 0. This gives

Tr(XS) + Tr(X̃S̃) = Tr(XS̃) + Tr(SX̃).

Since Tr(X̄S̄) = 0, X̄ + S̄ ⪯ ζI, and XS = νζ²I, we may write

Tr(X̃S̃) + Tr(XS) = Tr(((1 − ν)X̄ + νζI)((1 − ν)S̄ + νζI)) + νζ²n
= ν(1 − ν)ζ Tr(X̄ + S̄) + ν²ζ²n + νζ²n
≤ ν(1 − ν)ζ²n + ν²ζ²n + νζ²n
= 2νζ²n.

Moreover, since X̃ ⪰ νζI and S̃ ⪰ νζI,

Tr(XS̃) + Tr(SX̃) = (1 − ν)(Tr(XS̄) + Tr(SX̄)) + νζ Tr(X + S) ≥ νζ Tr(X + S).

Hence we obtain Tr(X) + Tr(S) ≤ 2ζn. Since ‖X‖² + ‖S‖² ≤ (Tr(X) + Tr(S))², it follows that

√(‖X‖² + ‖S‖²) / (ζ√(2n)) ≤ (Tr(X) + Tr(S)) / (ζ√(2n)) ≤ 2ζn / (ζ√(2n)) = √(2n),

which proves

κ(ζ) ≤ √(2n).
5 Iteration bound
In the previous sections we have found that if at the start of an iteration the iterates satisfy δ(X, S; µ) ≤ τ, with τ as defined in (4.13), then after the feasibility step, with θ as in (4.8) and α as in (4.14), the iterates satisfy δ(X, S; µ^+) ≤ 1/√2. According to (2.10), at most

log₂(log₂(1/τ²)) = log₂(log₂ 64) ≈ 2.585

centering steps then suffice to get iterates that satisfy δ(X, S; µ^+) ≤ τ. So each iteration consists of at most 4 so-called 'inner' iterations, in each of which we need to compute a new search direction. In each main iteration both the duality gap and the norms of the residuals are reduced by the factor 1 − θ. Hence, using Tr(X^0S^0) = ζ²n, the total number of main iterations is bounded above by

(1/θ) log(max{nζ², ‖r^0_b‖, ‖R^0_C‖}/ε).

Since

θ = α/√(2n) = 1/(3√(2n)κ(ζ))

and κ(ζ) ≤ √(2n), the total number of inner iterations is therefore bounded above by

24n log(max{nζ², ‖r^0_b‖, ‖R^0_C‖}/ε).
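The resulting bound is easy to tabulate; a sketch with illustrative values of n, ζ, and the initial residual norms (all hypothetical):

```python
import math

def iteration_bound(n, zeta, rb0, RC0, eps):
    """Upper bound 24*n*log(max{n*zeta^2, ||r_b^0||, ||R_C^0||} / eps) on inner iterations."""
    return 24 * n * math.log(max(n * zeta**2, rb0, RC0) / eps)

for n in (10, 50, 100):
    print(n, math.ceil(iteration_bound(n, zeta=10.0, rb0=1.0, RC0=1.0, eps=1e-6)))
```

The bound is linear in n up to a logarithmic factor, matching the O(n log(n/ε)) complexity stated in the abstract.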
6 Concluding remarks

In this paper we extend the work of Roos on infeasible interior-point methods to semidefinite programming. As mentioned in Roos [8], extensions to other cases, e.g., second-order cone programming and linear complementarity problems, seem to be within reach; these will be our future work.
References
[1] E. de Klerk. Aspects of Semidefinite Programming: Interior Point Meth-
ods and Selected Applications. Kluwer Academic Publishers, Dordrecht, The
Netherlands, 2002.
[2] Z. Liu and W. Sun. An infeasible interior-point algorithm with full-Newton
step for linear optimization. Numerical Algorithms, 46(2):173–188, 2007.
[3] H. Mansouri and C. Roos. Simplified O(nL) infeasible interior-point algo-
rithm for linear optimization using full-Newton step. Optimization Methods
and Software, 22(3):519–530, 2007.
[4] R.D.C. Monteiro and Y. Zhang. A unified analysis for a class of long-step
primal-dual path-following interior-point algorithms for semidefinite pro-
gramming. Mathematical Programming, 81(3):281–299, 1998.
[5] J. Peng, C. Roos, and T. Terlaky. New complexity analysis of the primal-dual method for semidefinite optimization based on NT-direction. Journal of Optimization Theory and Applications, 109(2):327–343, 2001.

[6] J. Peng, C. Roos, and T. Terlaky. Self-regular functions and new search directions for linear and semidefinite optimization. Mathematical Programming, 93(1):129–171, 2002.
[7] C. Roos, T. Terlaky, and J.-Ph. Vial. Theory and Algorithms for Linear Optimization. An Interior Point Approach. John Wiley and Sons, Chichester, UK, 1997.
[8] C. Roos. A full-Newton step O(n) infeasible interior-point algorithm for lin-
ear optimization. SIAM Journal on Optimization, 16(4):1110–1136, 2006.
[9] H. Wolkowicz, R. Saigal, and L. Vandenberghe, editors. Handbook of Semidefinite Programming: Theory, Algorithms and Applications. Kluwer Academic Publishers, Norwell, MA, 1999.