An Infeasible Interior-Point Algorithm
with Full Nesterov-Todd Step for
Semidefinite Programming
Zhongyi Liu ∗
Abstract
This paper proposes an infeasible interior-point algorithm with full Nesterov-Todd step for semidefinite programming, which is an extension of the work of Roos (SIAM J. Optim., 16(4):1110–1136, 2006). The polynomial bound coincides with that of infeasible interior-point methods for linear programming, namely, O(n log(n/ε)).

Keywords: semidefinite programming, full Nesterov-Todd step, infeasible interior-point methods, primal-dual, polynomial complexity
AMS subject classification: 65K05, 90C51
1 Introduction
For a comprehensive treatment of interior-point methods (IPMs), we refer to Klerk [1] and Roos et al. [7]. In Roos [8], a full-Newton step infeasible interior-point algorithm for linear programming (LP) was presented, and some extensions, still in the setting of (LP), were carried out by Mansouri and Roos [3] and Liu and Sun [2]. In this paper we extend this approach to semidefinite programming (SDP).
We consider the (SDP) problem given in the following standard form:

(SDP)   min{Tr(CX) : Tr(A_iX) = b_i, i = 1, ..., m, X ⪰ 0}

∗College of Science, Hohai University, Nanjing 210098, China. Email: [email protected]
and its associated dual problem:

(SDD)   max{b^T y : Σ_{i=1}^m y_i A_i + S = C, S ⪰ 0}.

Here C and A_i, i = 1, ..., m, are symmetric n × n matrices, i.e., C, A_i ∈ S^n, and b, y ∈ R^m. Furthermore, X ⪰ 0 (X ≻ 0) means that X is symmetric and positive semidefinite (symmetric and positive definite). The matrices A_i, i = 1, ..., m, are assumed to be linearly independent. (SDP) is a generalization of (LP), in which all the matrices A_i and C are diagonal; this implies that S is automatically diagonal, and so X may also be assumed to be diagonal.
Note that the previous forms of (SDP) and (SDD) can be expressed in the following equivalent form:

(SDP)   min{Tr(CX) : AX = b, X ⪰ 0}

and its associated dual problem:

(SDD)   max{b^T y : A^*y + S = C, S ⪰ 0},

where AX = (Tr(A_1X), Tr(A_2X), ..., Tr(A_mX))^T and A^*y = Σ_{i=1}^m y_i A_i. Throughout the paper, we use this standard form.
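As an illustration, the operator notation can be coded directly. The following minimal sketch uses randomly generated symmetric matrices A_i as hypothetical stand-ins for problem data, and checks the defining adjoint identity ⟨AX, y⟩ = Tr(X · A^*y):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3

# Hypothetical data: m random symmetric matrices A_i (assumed linearly independent).
A = [(M + M.T) / 2 for M in rng.standard_normal((m, n, n))]

def calA(X):
    """The operator A: AX = (Tr(A_1 X), ..., Tr(A_m X))^T."""
    return np.array([np.trace(Ai @ X) for Ai in A])

def calA_adj(y):
    """The adjoint A*: A*y = sum_i y_i A_i."""
    return sum(yi * Ai for yi, Ai in zip(y, A))

# The adjoint identity <AX, y> = <X, A*y> = Tr(X A*y).
X = rng.standard_normal((n, n)); X = (X + X.T) / 2
y = rng.standard_normal(m)
print(abs(calA(X) @ y - np.trace(X @ calA_adj(y))) < 1e-10)
```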
Monteiro and Zhang [4] give a unified analysis of feasible IPMs for semidefinite programming that use the so-called commutative class of search directions. These include popular directions such as the NT (Nesterov-Todd), XS, and SX directions. In this paper we use the NT direction to extend to (SDP) the infeasible interior-point algorithm first developed for (LP) by Roos [8].
Now we introduce some notation. For matrices X, Y ∈ S^n we use the inner product

⟨X, Y⟩ := X • Y := Tr(XY).

Two norms will appear in the text. For X ∈ S^n,

‖X‖_2 := √(λ_max(X²)) = max_i |λ_i(X)|,

and

‖X‖ := ‖X‖_F = ‖λ(X)‖ = √(Σ_{i=1}^n λ_i²(X)),

where λ(X) is the eigenvalue vector of X. Note that for convenience we use the same norm symbol for the Frobenius norm of matrices and the l_2-norm of vectors.
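For concreteness, both norms can be evaluated from the eigenvalue vector; a minimal numpy sketch:

```python
import numpy as np

def spectral_norm(X):
    """||X||_2 = max_i |lambda_i(X)| for symmetric X."""
    return np.max(np.abs(np.linalg.eigvalsh(X)))

def frobenius_norm(X):
    """||X|| = ||lambda(X)|| = sqrt(sum_i lambda_i(X)^2)."""
    return np.sqrt(np.sum(np.linalg.eigvalsh(X) ** 2))

X = np.diag([3.0, -4.0, 1.0])  # eigenvalues 3, -4, 1
print(spectral_norm(X))   # 4.0
print(frobenius_norm(X))  # sqrt(26) ≈ 5.099
```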
2 Full NT step infeasible IPMs
We assume both (SDP) and (SDD) are strictly feasible. The central path for (SDP) is defined by the solutions {(X(µ), y(µ), S(µ)) : µ > 0} of the following system:

AX = b, X ⪰ 0,
A^*y + S = C, S ⪰ 0,    (2.1)
XS = µI,

where I denotes the n × n identity matrix and µ > 0. Suppose that the point (X, y, S) is strictly feasible, so X ≻ 0 and S ≻ 0. Newton's method amounts to linearizing the system (2.1), yielding the following system of equations:

A∆X = b − AX,
A^*∆y + ∆S = C − A^*y − S,
∆XS + X∆S = µI − XS.
Since A_i, i = 1, ..., m, are linearly independent and X, S ≻ 0, one may easily verify that this system is nonsingular. Hence it uniquely defines the search directions ∆X, ∆y, and ∆S.

If X is primal feasible and (y, S) dual feasible, then b − AX = 0 and C − A^*y − S = 0, whence the above system reduces to

A∆X = 0,
A^*∆y + ∆S = 0,    (2.2)
∆XS + X∆S = µI − XS,

which gives the usual search directions for feasible primal-dual IPMs.
A crucial observation for (SDP) is that the system (2.2) might have no symmetric solution ∆X. Among the ways of symmetrizing the third equation in the Newton system, we consider the symmetrization scheme that yields the NT direction. Let us define the matrix

P = X^{1/2}(X^{1/2}SX^{1/2})^{-1/2}X^{1/2} = S^{-1/2}(S^{1/2}XS^{1/2})^{1/2}S^{-1/2},    (2.3)

and D = P^{1/2}. The matrix D can be used to rescale X and S to the same matrix V, defined by

V := (1/√µ) D^{-1}XD^{-1} = (1/√µ) DSD.    (2.4)
Obviously the matrices D and V are symmetric and positive definite. After defining

D_X := (1/√µ) D^{-1}∆XD^{-1},   D_S := (1/√µ) D∆SD,    (2.5)

the complementarity condition in (2.2) reduces to

D_X + D_S = V^{-1} − V.    (2.6)

For a detailed derivation of this equation, see Peng et al. [6].
For any ν with 0 < ν ≤ 1 we consider the perturbed problem (SDP_ν), defined by

(SDP_ν)   min{Tr((C − νR^0_C)X) : AX = b − νr^0_b, X ⪰ 0},

and its dual problem (SDD_ν), which is given by

(SDD_ν)   max{(b − νr^0_b)^T y : A^*y + S = C − νR^0_C, S ⪰ 0}.
Then, after pre- and post-multiplying (2.6) by D, the NT search directions can be written as the solution of the following system:

A∆X = νr^0_b,
A^*∆y + ∆S = νR^0_C,    (2.7)
D^{-1}∆XSD + D∆SXD^{-1} = µI − D^{-1}XSD,

where r^0_b = b − AX^0 and R^0_C = C − A^*y^0 − S^0.

Note that the perturbed problems satisfy Slater's regularity condition when ν = 1.
Lemma 2.1. The original problems (SDP) and (SDD) are feasible if and only if, for each ν satisfying 0 < ν ≤ 1, the perturbed problems (SDP_ν) and (SDD_ν) satisfy Slater's regularity condition.

Proof. Suppose that (SDP) and (SDD) are feasible. Let X̄ be a feasible solution of (SDP) and (ȳ, S̄) a feasible solution of (SDD). Then AX̄ = b and A^*ȳ + S̄ = C, with X̄ ⪰ 0 and S̄ ⪰ 0. Now let 0 < ν ≤ 1, and consider

X = (1 − ν)X̄ + νX^0,   y = (1 − ν)ȳ + νy^0,   S = (1 − ν)S̄ + νS^0.
One has

AX = A((1 − ν)X̄ + νX^0) = (1 − ν)AX̄ + νAX^0 = (1 − ν)b + νAX^0 = b − ν(b − AX^0),

showing that X is feasible for (SDP_ν). Similarly,

A^*y + S = (1 − ν)(A^*ȳ + S̄) + ν(A^*y^0 + S^0) = (1 − ν)C + ν(A^*y^0 + S^0) = C − ν(C − A^*y^0 − S^0),

showing that (y, S) is feasible for (SDD_ν). Since ν > 0 and X^0 and S^0 are symmetric positive definite, X and S are positive definite as well, proving that (SDP_ν) and (SDD_ν) satisfy Slater's regularity condition.

To prove the converse implication, suppose that (SDP_ν) and (SDD_ν) satisfy Slater's regularity condition for each ν satisfying 0 < ν ≤ 1. Obviously, (SDP_ν) and (SDD_ν) are then feasible for these values of ν. Letting ν go to zero, it follows that (SDP) and (SDD) are feasible.
Assuming that (SDP) and (SDD) are feasible, it follows from Lemma 2.1 that the problems (SDP_ν) and (SDD_ν) satisfy Slater's regularity condition for each ν ∈ (0, 1], and hence their central paths exist. This means that the system

b − AX = νr^0_b, X ⪰ 0,
C − A^*y − S = νR^0_C, S ⪰ 0,
XS = µI

has a unique solution for every µ > 0 and ν ∈ (0, 1]. If ν ∈ (0, 1] and µ = νζ², we denote this unique solution as (X(ν), y(ν), S(ν)). As a consequence, X(ν) is the µ-center of (SDP_ν) and (y(ν), S(ν)) the µ-center of (SDD_ν). Thus, taking ν = 1, we have (X(1), y(1), S(1)) = (X^0, y^0, S^0) = (ζI, 0, ζI).
We measure the proximity of the iterates (X, y, S) to the µ-center of the perturbed problems (SDP_ν) and (SDD_ν) by the quantity

δ(X, S; µ) := δ(V) := (1/2)‖V − V^{-1}‖.    (2.8)
Initially we have X = S = ζI and µ = ζ², whence V = I and δ(X, S; µ) = 0. In the sequel we assume that at the start of each iteration, δ(X, S; µ) is smaller than or equal to a (small) threshold value τ > 0. Of course this is true at the start of the first iteration.
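The proximity measure (2.8) is straightforward to evaluate; a sketch reusing the eigendecomposition-based square root for the NT scaling also confirms that δ = 0 at the initial point X = S = ζI with µ = ζ²:

```python
import numpy as np

def sym_sqrt(M):
    """Symmetric square root via eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return U @ np.diag(np.sqrt(w)) @ U.T

def delta(X, S, mu):
    """Proximity measure (2.8): 0.5 * || V - V^{-1} ||_F."""
    Xh = sym_sqrt(X)
    P = Xh @ np.linalg.inv(sym_sqrt(Xh @ S @ Xh)) @ Xh  # NT scaling matrix (2.3)
    D = sym_sqrt(P)
    Dinv = np.linalg.inv(D)
    V = Dinv @ X @ Dinv / np.sqrt(mu)                   # scaled point (2.4)
    return 0.5 * np.linalg.norm(V - np.linalg.inv(V))

n, zeta = 3, 2.0
print(delta(zeta * np.eye(n), zeta * np.eye(n), zeta**2))  # 0.0 at the initial point
```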
Now we use the following system, called the feasibility step, to define ∆^fX, ∆^fy, and ∆^fS:

A∆^fX = θνr^0_b,
A^*∆^fy + ∆^fS = θνR^0_C,    (2.9)
D^{-1}∆^fXSD + D∆^fSXD^{-1} = µI − D^{-1}XSD.

The algorithm begins with a point (X, y, S), strictly feasible for the perturbed problems, such that:

A1: (X, y, S) satisfies the feasibility conditions of the perturbed problems;
A2: Tr(XS) = nµ and δ(X, S; µ) ≤ τ, with µ = νζ².

First we find a new point (X^f, y^f, S^f) such that A1 is satisfied with ν^+ := (1 − θ)ν. Then µ is reduced to µ^+ := (1 − θ)µ. After these two stages we would like to have δ(X^f, S^f; µ^+) ≤ τ, but in general this cannot be guaranteed. So a small number of centering steps are applied to produce new points (X^+, y^+, S^+) such that Tr(X^+S^+) = nµ^+ until δ(X^+, S^+; µ^+) ≤ τ. Then the process can be repeated.
Note that after a feasibility step, the new iterates

X^f = X + ∆^fX,   y^f = y + ∆^fy,   S^f = S + ∆^fS

are strictly feasible (i.e., X^f ≻ 0 and S^f ≻ 0) provided θ is small enough. A more formal description of the algorithm is given in Figure 1.

In the centering steps, starting at the iterates (X, y, S) = (X^f, y^f, S^f) and targeting the µ-centers, the search directions ∆X, ∆y, ∆S are the usual primal-dual NT directions, (uniquely) defined by

A∆X = 0,
A^*∆y + ∆S = 0,
D^{-1}∆XSD + D∆SXD^{-1} = µI − D^{-1}XSD.
Primal-Dual Infeasible IPM for (SDP)

Input:
  accuracy parameter ε > 0;
  barrier update parameter θ, 0 < θ < 1;
  threshold parameter τ > 0.
begin
  X := ζI; y := 0; S := ζI; µ := ζ²; ν := 1;
  while max{Tr(XS), ‖b − AX‖, ‖C − A^*y − S‖} ≥ ε do
  begin
    feasibility step: (X, y, S) := (X, y, S) + (∆^fX, ∆^fy, ∆^fS);
    µ-update: µ := (1 − θ)µ; ν := (1 − θ)ν;
    centering steps:
    while δ(X, S; µ) ≥ τ do
      (X, y, S) := (X, y, S) + (∆X, ∆y, ∆S);
    end while
  end
end

Figure 1: Algorithm
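The control flow of Figure 1 can be sketched generically. The step routines below are hypothetical stand-ins (a real implementation would solve the NT systems for the feasibility and centering steps); they are chosen only so the loop structure is runnable: the mock feasibility step shrinks the residuals and duality gap by 1 − θ, and the mock centering step reflects the quadratic decrease of δ from Lemma 2.5.

```python
import math

def solve_sdp_loop(eps=1e-6, theta=0.1, tau=0.125):
    # Mock state: duality gap Tr(XS), residual norms, and proximity delta.
    gap, res_p, res_d, delta = 9.0, 1.0, 1.0, 0.0  # initial point has delta = 0
    iters = 0
    while max(gap, res_p, res_d) >= eps:
        # Feasibility step + mu-update: all target quantities shrink by 1 - theta,
        # and afterwards delta <= 1/sqrt(2) (Section 4).
        gap, res_p, res_d = (1 - theta) * gap, (1 - theta) * res_p, (1 - theta) * res_d
        delta = 1 / math.sqrt(2)
        # Centering steps: quadratic convergence, delta_+ <= delta^2 (Lemma 2.5).
        while delta >= tau:
            delta = delta ** 2
        iters += 1
    return iters

print(solve_sdp_loop())
```

A smaller θ means more main iterations, mirroring the 1/θ factor in the iteration bound of Section 5.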
Denoting the iterates after a centering step by X^+, y^+, and S^+, we recall the following results from Chapter 7 of Klerk [1].

Lemma 2.2. Let X, S satisfy Slater's regularity condition and µ > 0. If δ := δ(X, S; µ) < 1, then the full NT step is strictly feasible.

Corollary 2.3. Let X, S satisfy Slater's regularity condition and µ > 0 such that δ(X, S; µ) < 1. Then Tr(X^+S^+) = nµ.

Lemma 2.4. After a feasible full NT step, the proximity measure satisfies

δ^+ := δ(X^+, S^+; µ) ≤ δ² / √(2(1 − δ²)).

From this lemma, one easily obtains the following quadratic convergence result.

Lemma 2.5. If δ := δ(X, S; µ) < 1/√2, then δ(X^+, S^+; µ) < δ².
The centering steps serve to obtain iterates that satisfy Tr(XS) = nµ^+ and δ(X, S; µ^+) ≤ τ, where τ is (much) smaller than 1/√2. By Lemma 2.5, the required number of centering steps is easily obtained: after the µ-update we have δ(X^f, S^f; µ^+) ≤ 1/√2, and hence after k centering steps the iterates (X, y, S) satisfy

δ(X, S; µ^+) ≤ (1/√2)^{2^k}.

From this one easily deduces that no more than

log₂(log₂(1/τ²))    (2.10)

centering steps are needed.
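For the later choice τ = 1/8, the bound (2.10) is easy to evaluate; a quick numeric check iterates δ ↦ δ² from δ = 1/√2, the worst case allowed by Lemma 2.5:

```python
import math

tau = 1 / 8
bound = math.log2(math.log2(1 / tau**2))  # (2.10): log2(log2 64) ≈ 2.585

# Simulate the quadratic decrease delta_+ <= delta^2 starting from 1/sqrt(2).
delta, steps = 1 / math.sqrt(2), 0
while delta > tau:
    delta, steps = delta**2, steps + 1

print(bound, steps)  # the observed count never exceeds ceil(bound)
assert steps <= math.ceil(bound)
```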
3 Technical results
Given a strictly feasible solution X of (SDP), a strictly feasible solution (y, S) of (SDD), and µ > 0, let

Φ(XS; µ) := Ψ(V) := Σ_{i=1}^n ψ(λ_i(V)),

where V is defined in (2.4), and

ψ(t) := (1/2)(t² − 1 − log t²).

It is well known that ψ(t) is the kernel function of the primal-dual logarithmic barrier function, which, up to some constant, is the function Φ(XS; µ). Note that

V² = (1/µ) D^{-1}XSD = (1/µ) DSXD^{-1},

and hence V and (1/√µ) D^{-1}X^{1/2}S^{1/2}D share the same system of eigenvalues.
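A sketch of the kernel function ψ and the barrier Ψ(V) = Σψ(λ_i(V)), evaluated on the eigenvalues of V; note that ψ(1) = 0, so Ψ(I) = 0:

```python
import numpy as np

def psi(t):
    """Kernel function psi(t) = 0.5 * (t^2 - 1 - log t^2), t > 0."""
    return 0.5 * (t**2 - 1 - np.log(t**2))

def Psi(V):
    """Barrier Psi(V) = sum_i psi(lambda_i(V)) for symmetric positive definite V."""
    return np.sum(psi(np.linalg.eigvalsh(V)))

print(psi(1.0))                           # 0.0: the kernel vanishes at its minimizer t = 1
print(Psi(np.eye(3)))                     # 0.0 at V = I
print(Psi(np.diag([0.5, 1.0, 2.0])) > 0)  # True away from I
```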
Lemma 3.1. One has

Φ(XS; µ) = Φ(XS(µ); µ) + Φ(X(µ)S; µ).
Proof. The equality in the lemma is equivalent to

Σ_{i=1}^n (λ_i((1/µ)D^{-1}XSD) − 1 − log λ_i((1/µ)D^{-1}XSD))
= Σ_{i=1}^n (λ_i((1/µ)D^{-1}X(µ)SD) − 1 − log λ_i((1/µ)D^{-1}X(µ)SD))
+ Σ_{i=1}^n (λ_i((1/µ)D^{-1}XS(µ)D) − 1 − log λ_i((1/µ)D^{-1}XS(µ)D)).

Since

Σ_{i=1}^n (λ_i((1/µ)D^{-1}XSD) − 1) = Tr((1/µ)D^{-1}XSD) − n,    (3.1)

and

Σ_{i=1}^n (λ_i((1/µ)D^{-1}X(µ)SD) − 1) = Tr((1/µ)D^{-1}X(µ)SD) − n,    (3.2)

Σ_{i=1}^n (λ_i((1/µ)D^{-1}XS(µ)D) − 1) = Tr((1/µ)D^{-1}XS(µ)D) − n,    (3.3)

the sum of (3.2) and (3.3) is equal to (3.1) if

Tr((1/µ)D^{-1}XSD) − n = Tr((1/µ)D^{-1}X(µ)SD) − n + Tr((1/µ)D^{-1}XS(µ)D) − n.

Using X(µ)S(µ) = µI, whence Tr(X(µ)S(µ)) = nµ, this can be written as Tr((X − X(µ))(S − S(µ))) = 0; that is, X − X(µ) and S − S(µ) are orthogonal. This is indeed true, since X − X(µ) belongs to L^⊥ and S − S(µ) to L, where L is defined as

L := span{A_1, ..., A_m}.
In addition,

Σ_{i=1}^n log λ_i((1/µ)D^{-1}XSD) = Σ_{i=1}^n log λ_i((1/µ)D^{-1}X(µ)SD) + Σ_{i=1}^n log λ_i((1/µ)D^{-1}XS(µ)D)

holds true, since the left-hand side is log det((1/µ)XS) and the right-hand side is

log det((1/µ)X(µ)S) + log det((1/µ)XS(µ)) = log det((1/µ²)XSX(µ)S(µ)) = log det((1/µ)XS).

Hence the lemma is proved.
Theorem 3.2. Let δ(V) be as defined in (2.8) and ρ(δ) as defined later in (4.4). Then

Ψ(V) ≤ ψ(ρ(δ(V))).

Proof. The theorem is obvious if V = I, since then δ(V) = Ψ(V) = 0, ρ(0) = 1, and ψ(1) = 0. Now consider δ(V) > 0 and Ψ(V) > 0. For this nontrivial case we consider, for τ > 0, the following problem:

z_τ = max_V {Ψ(V) = Σ_{i=1}^n ψ(λ_i(V)) : δ(V)² = (1/4) Σ_{i=1}^n ψ′(λ_i(V))² = τ²}.
The first-order optimality conditions are

ψ′(λ_i(V)) = η ψ′(λ_i(V)) ψ″(λ_i(V)),   i = 1, ..., n,

where η ∈ R is a Lagrange multiplier. From these equations we have, for each i, either ψ′(λ_i(V)) = 0 or η ψ″(λ_i(V)) = 1. The first case implies λ_i(V) = 1. For the second case, note that ψ″(t) is monotonically decreasing; this implies that all λ_i(V) with η ψ″(λ_i(V)) = 1 have the same value. Denote this value by t, and denote by k the number of indices for which the second case holds; since τ > 0, we have k ≥ 1. Thus, after reordering the coordinates, λ(V) has the form

λ(V) = (t, ..., t, 1, ..., 1)^T,

with k entries equal to t and n − k entries equal to 1. Since ψ′(1) = 0, the sum in the constraint has k nonzero components, so kψ′(t)² = 4τ². Since ψ′(t) = t − 1/t, it follows that

t − 1/t = ±2τ/√k,

which shows that t = ρ(τ/√k) or t = 1/ρ(τ/√k). In the first case, t > 1, which produces the larger value of ψ(t); what we need to show is

ψ(ρ(τ/√k)) ≥ ψ(1/ρ(τ/√k)).

In fact, for t ≥ 1,

ψ(t) − ψ(1/t) = (1/2)(t² − 1/t² − 2 log t²) ≥ 0,

since the left-hand side vanishes at t = 1 and its derivative (t² − 1)²/t³ is nonnegative. Since we want to maximize Ψ(V), t = ρ(τ/√k) is the relevant case, and then

Ψ(V) = kψ(ρ(τ/√k)).
What remains is to determine which value of k maximizes Ψ(V). For this purpose we consider the derivative of Ψ(V) with respect to k. To simplify notation, write

Ψ(V) = kψ(t),   t = ρ(s),   s = τ/√k.

From the definition of t we have (t − s)² = 1 + s², or t² − 1 = 2st, whence

2s = t − 1/t = ψ′(t).

Some computations yield

dΨ(V)/dk = f(τ) := ψ(t) − s²ρ(s)/√(1 + s²)

and

f′(τ) = −(1/√k) · s² / ((1 + s²)√(1 + s²)).

One may easily verify that f(0) = 0 and f′(τ) ≤ 0. This implies f(τ) ≤ 0 for each τ ≥ 0. Hence we conclude that Ψ(V) is monotonically decreasing in k, so Ψ(V) is maximal for k = 1, and the theorem follows.
Corollary 3.3. Let τ ≥ 0 and δ(V) ≤ τ. Then Ψ(V) ≤ τ′ := ψ(ρ(τ)).

Proof. Since ρ(s) is monotonically increasing in s, with ρ(s) ≥ 1 for all s ≥ 0, and since ψ(t) is monotonically increasing for t ≥ 1, the function ψ(ρ(δ)) is increasing in δ for δ ≥ 0. Thus the result is immediate from Theorem 3.2.
Lemma 3.4. Let δ(V) ≤ τ and let τ′ be as defined in Corollary 3.3. Then

ψ(λ_i(X(µ)^{-1/2}X^{1/2})) ≤ τ′,   ψ(λ_i(S(µ)^{-1/2}S^{1/2})) ≤ τ′,   i = 1, ..., n.

Proof. By Lemma 3.1 we have Φ(XS; µ) = Φ(XS(µ); µ) + Φ(X(µ)S; µ). The nonnegativity of Φ(XS; µ), Φ(XS(µ); µ), and Φ(X(µ)S; µ), together with Corollary 3.3, implies Φ(XS(µ); µ) ≤ τ′ and Φ(X(µ)S; µ) ≤ τ′. The first of these two inequalities gives

Φ(XS(µ); µ) = Σ_{i=1}^n ψ(λ_i((1/√µ)X^{1/2}S(µ)^{1/2})) ≤ τ′.

Since ψ(t) ≥ 0 for every t > 0, it follows that

ψ(λ_i((1/√µ)X^{1/2}S(µ)^{1/2})) ≤ τ′,   i = 1, ..., n.

Since X(µ)S(µ) = µI, the vector λ((1/√µ)X^{1/2}S(µ)^{1/2}) coincides, up to the order of its entries, with λ(X(µ)^{-1/2}S(µ)^{-1/2}X^{1/2}S(µ)^{1/2}) and with λ(X(µ)^{-1/2}X^{1/2}). Thus we obtain the first inequality in the lemma. The second inequality follows in the same way.
In the sequel we use the inverse function of ψ(t) for 0 < t ≤ 1, denoted χ(s). So χ : [0, ∞) → (0, 1] and

χ(s) = t ⇔ s = ψ(t),   s ≥ 0, 0 < t ≤ 1.    (3.4)

Lemma 3.5. For each t > 0 one has χ(ψ(t)) ≤ t ≤ 1 + √(2ψ(t)).

Proof. See the proof of Lemma A.5 in Roos [8].

Corollary 3.6. If δ(V) ≤ τ, then

χ(τ′) ≤ λ_i(X(µ)^{-1/2}X^{1/2}) ≤ 1 + √(2τ′),   χ(τ′) ≤ λ_i(S(µ)^{-1/2}S^{1/2}) ≤ 1 + √(2τ′).
Proof. This is immediate from Lemma 3.4 and Lemma 3.5.
Theorem 3.7. If δ(V) ≤ τ, then

‖S^{-1/2}X^{1/2}‖ ≤ ((1 + √(2τ′)) / (√µ χ(τ′))) ‖X(µ)‖,
‖X^{-1/2}S^{1/2}‖ ≤ ((1 + √(2τ′)) / (√µ χ(τ′))) ‖S(µ)‖.

Proof. It is known that S^{-1/2}X^{1/2} shares a common system of eigenvalues with

(S(µ)^{-1/2}S^{1/2})^{-1} X(µ)^{-1/2}X^{1/2} X(µ)^{1/2}S(µ)^{-1/2}.

Using X(µ)S(µ) = µI, Corollary 3.3, and Corollary 3.6, we get

Σ_{i=1}^n λ_i²(S^{-1/2}X^{1/2}) ≤ ((1 + √(2τ′))² / χ(τ′)²) Σ_{i=1}^n λ_i²(X(µ)^{1/2}S(µ)^{-1/2})
= ((1 + √(2τ′)) / (√µ χ(τ′)))² Σ_{i=1}^n λ_i²(X(µ)).

The second inequality is obtained in the same way.
4 Analysis of the feasibility step
4.1 The feasibility step
Let X, y, and S denote the iterates at the start of an iteration, and assume δ(X, S; µ) ≤ τ. Recall that at the start of the first iteration this is certainly true, because then δ(X, S; µ) = 0.

Define D^f_X and D^f_S as in (2.5), and V as in (2.4). Now we may write

D^{-1}X^fS^fD = D^{-1}(X + ∆^fX)(S + ∆^fS)D
= D^{-1}(XS + ∆^fXS + X∆^fS + ∆^fX∆^fS)D
= D^{-1}XSD + D^{-1}∆^fXSD + D^{-1}X∆^fSD + D^{-1}∆^fX∆^fSD.

Since D^{-1}X∆^fSD ∼ D∆^fSXD^{-1}, the third equation of (2.9) yields

X^fS^f ∼ µI + D^{-1}∆^fX∆^fSD.    (4.1)

Using (2.4) and (2.5), we may also write

X^f = X + ∆^fX = √µ D(V + D^f_X)D,    (4.2)
S^f = S + ∆^fS = √µ D^{-1}(V + D^f_S)D^{-1}.    (4.3)
Recall that we assume δ ≤ τ before the feasibility step.

Lemma 4.1. The iterates (X^f, y^f, S^f) are certainly strictly feasible if

‖λ(D^f_X)‖ < 1/ρ(δ)   and   ‖λ(D^f_S)‖ < 1/ρ(δ),

where

ρ(δ) := δ + √(1 + δ²).    (4.4)

Proof. It is clear from (4.2) that X^f is strictly feasible if and only if V + D^f_X ≻ 0, since X^f = √µ D^T(V + D^f_X)D. This certainly holds if ‖λ(D^f_X)‖ < min_i λ_i(V). Since

2δ = ‖V − V^{-1}‖ = ‖λ(V) − λ(V)^{-1}‖,

the minimal value t that an entry of λ(V) can attain satisfies t ≤ 1 and 1/t − t = 2δ. The last equation implies t² + 2δt − 1 = 0, which gives

t = −δ + √(1 + δ²) = 1/ρ(δ).

This proves the first inequality in the lemma. The second inequality is obtained in the same way.
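A quick numeric check of the identity used in the proof: the positive root t of t² + 2δt − 1 = 0 equals 1/ρ(δ) with ρ(δ) = δ + √(1 + δ²):

```python
import math

def rho(delta):
    """rho(delta) = delta + sqrt(1 + delta^2), as in (4.4)."""
    return delta + math.sqrt(1 + delta**2)

for d in (0.0, 0.125, 0.5, 1.0):
    t = -d + math.sqrt(1 + d**2)      # positive root of t^2 + 2*delta*t - 1 = 0
    print(abs(t - 1 / rho(d)) < 1e-12, abs(t**2 + 2*d*t - 1) < 1e-12)
```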
In the sequel we denote

ω := ω(V) := (1/2)√(‖D^f_X‖² + ‖D^f_S‖²),
D^f_{XS} := (1/2)(D^f_X D^f_S + D^f_S D^f_X).

This implies

Tr(D^f_{XS}) = Tr(D^f_X D^f_S) ≤ ‖D^f_X‖ · ‖D^f_S‖ ≤ (1/2)(‖D^f_X‖² + ‖D^f_S‖²) = 2ω²,    (4.5)

|λ_i(D^f_{XS})| = |λ_i((1/2)(D^f_X D^f_S + D^f_S D^f_X))| ≤ ‖(1/2)(D^f_X D^f_S + D^f_S D^f_X)‖ ≤ ‖D^f_X‖ · ‖D^f_S‖ ≤ 2ω².    (4.6)
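Both bounds are easy to confirm numerically for arbitrary symmetric D^f_X and D^f_S, since (4.5)-(4.6) use no structure beyond symmetry; a sketch with random matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n)); DfX = (A + A.T) / 2
A = rng.standard_normal((n, n)); DfS = (A + A.T) / 2

fro = np.linalg.norm                       # Frobenius norm
omega = 0.5 * np.sqrt(fro(DfX)**2 + fro(DfS)**2)
DfXS = 0.5 * (DfX @ DfS + DfS @ DfX)       # symmetrized product

# (4.5): Tr(DfXS) <= 2*omega^2,  (4.6): |lambda_i(DfXS)| <= 2*omega^2
print(np.trace(DfXS) <= 2 * omega**2 + 1e-12)
print(np.max(np.abs(np.linalg.eigvalsh(DfXS))) <= 2 * omega**2 + 1e-12)
```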
Lemma 4.2. One has

4δ(V^f)² ≤ θ²n/(1 − θ) + 2ω²/(1 − θ) + (1 − θ) · 2ω²/(1 − 2ω²).

Proof. We have

δ(X^f, S^f; µ^+) = δ(V^f) = (1/2)‖V^f − (V^f)^{-1}‖,   where (V^f)² ∼ (1/µ^+) D^{-1}X^fS^fD.

Using (4.1), we obtain

D^{-1}X^fS^fD ∼ µI + D^{-1}∆^fX∆^fSD.    (4.7)

Since µV² = D^{-1}XSD and D^{-1}∆^fX∆^fSD = µD^f_X D^f_S, after division of both sides in (4.7) by µ^+ we get

(V^f)² ∼ (µ/µ^+)I + (µ/µ^+)D^f_X D^f_S
= (I + D^f_X D^f_S)/(1 − θ)
= (I + D^f_{XS})/(1 − θ) + (D^f_X D^f_S − D^f_S D^f_X)/(2(1 − θ)).

Set u = (λ_1^{1/2}(I + D^f_X D^f_S), ..., λ_n^{1/2}(I + D^f_X D^f_S))^T. Then

2δ(V^f) = ‖u/√(1 − θ) − √(1 − θ)u^{-1}‖ = ‖θu/√(1 − θ) + √(1 − θ)(u − u^{-1})‖.

Therefore,

4δ(V^f)² = (θ²/(1 − θ))‖u‖² + (1 − θ)‖u − u^{-1}‖² + 2θu^T(u − u^{-1})
= (θ²/(1 − θ) + 2θ)‖u‖² + (1 − θ)‖u − u^{-1}‖² − 2θu^Tu^{-1}
= (θ²/(1 − θ) + 2θ)(n + Tr(D^f_X D^f_S)) + (1 − θ)‖u − u^{-1}‖² − 2θn
= θ²n/(1 − θ) + (θ²/(1 − θ) + 2θ)Tr(D^f_X D^f_S) + (1 − θ)‖u − u^{-1}‖².

Since

‖u^{-1} − u‖² = Tr(I + D^f_X D^f_S) + Tr((I + D^f_X D^f_S)^{-1}) − 2n
= Tr(D^f_{XS}) + Tr((I + D^f_X D^f_S)^{-1}) − n
≤ Tr(D^f_{XS}) + Tr((I + D^f_{XS})^{-1}) − n
= Tr(D^f_{XS}) + Σ_{i=1}^n (1/(1 + λ_i(D^f_{XS})) − 1)
= Tr(D^f_{XS}) − Σ_{i=1}^n λ_i(D^f_{XS})/(1 + λ_i(D^f_{XS})),

where the inequality is due to Lemma 2.1 in Peng et al. [5] and the last two equalities use λ_i(I + B) = 1 + λ_i(B). (Note that λ_i(A + B) ≠ λ_i(A) + λ_i(B) in general; see Theorem 2.3.5 in Wolkowicz et al. [9].) Substitution gives

4δ(V^f)² ≤ θ²n/(1 − θ) + (θ²/(1 − θ) + 2θ)Tr(D^f_X D^f_S) + (1 − θ)(Tr(D^f_{XS}) − Σ_{i=1}^n λ_i(D^f_{XS})/(1 + λ_i(D^f_{XS})))
= θ²n/(1 − θ) + (1/(1 − θ))Tr(D^f_{XS}) − (1 − θ) Σ_{i=1}^n λ_i(D^f_{XS})/(1 + λ_i(D^f_{XS})).

Hence, using (4.5) and (4.6), we arrive at

4δ(V^f)² ≤ θ²n/(1 − θ) + 2ω²/(1 − θ) + (1 − θ) · 2ω²/(1 − 2ω²),

which completes the proof.
Because we need δ(V^f) ≤ 1/√2, it follows from Lemma 4.2 that it suffices to have

θ²n/(1 − θ) + 2ω²/(1 − θ) + (1 − θ) · 2ω²/(1 − 2ω²) ≤ 2.

We choose

θ = α/√(2n),   α ≤ 1/√2.    (4.8)

Then, for n ≥ 2, one may easily verify that

ω ≤ 1/2  ⇒  δ(V^f) ≤ 1/√2.
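A numeric spot check of this implication: with θ = α/√(2n) and α = 1/√2, the left-hand side of the sufficient condition stays below 2 at ω = 1/2 over a range of n ≥ 2:

```python
import math

def lhs(theta, omega, n):
    """Left-hand side of the sufficient condition from Lemma 4.2."""
    return (theta**2 * n / (1 - theta)
            + 2 * omega**2 / (1 - theta)
            + (1 - theta) * 2 * omega**2 / (1 - 2 * omega**2))

alpha, omega = 1 / math.sqrt(2), 0.5
for n in (2, 10, 100, 10**6):
    theta = alpha / math.sqrt(2 * n)
    print(n, lhs(theta, omega, n) <= 2)
```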
Note that the system (2.9) can be expressed in terms of the scaled search directions D^f_X and D^f_S as follows:

ĀD^f_X = θνr^0_b,
Ā^*(∆^fy/µ) + D^f_S = (1/√µ)θνDR^0_CD,    (4.9)
D^f_X + D^f_S = V^{-1} − V,

where

Ā(·) = (Tr(Ā_1(·)), Tr(Ā_2(·)), ..., Tr(Ā_m(·)))^T   and   Ā_i = √µ DA_iD.

In the feasible case, the scaled search directions D^f_X and D^f_S form an orthogonal decomposition of the matrix V^{-1} − V. We then have the upper bounds ‖D^f_X‖ ≤ 2δ(V) and ‖D^f_S‖ ≤ 2δ(V), and moreover ω = δ(V). In the infeasible case the situation is quite different, since the orthogonality of D^f_X and D^f_S is in general lost, and it is harder to obtain upper bounds for ‖D^f_X‖ and ‖D^f_S‖. First we derive an upper bound for ω.
4.2 Upper bound for ω(V)

Now let

L := span{Ā_1, ..., Ā_m}.

Given a strictly feasible point X, we call ∆X a feasible direction at X if ∆X ∈ L^⊥; similarly, ∆S is a feasible direction at a strictly feasible point S if ∆S ∈ L. Here

L^⊥ := {ξ ∈ R^{n×n} : Āξ = 0}.

Obviously, the affine space {ξ ∈ R^{n×n} : Āξ = θνr^0_b} equals D^f_X + L^⊥, and

D^f_S ∈ (1/√µ)θνDR^0_CD + L.
Lemma 4.3. Let Q be the (unique) point in the intersection of the affine spaces D^f_X + L^⊥ and D^f_S + L. Then

2ω ≤ √(‖Q‖² + (‖Q‖ + 2δ(V))²).

Proof. Let us denote R = V^{-1} − V. Since L + L^⊥ = R^{n×n}, there exist Q_1, R_1 ∈ L^⊥ and Q_2, R_2 ∈ L such that

Q = Q_1 + Q_2,   R = R_1 + R_2.

On the other hand, since D^f_X − Q ∈ L^⊥ and D^f_S − Q ∈ L, there exist L_1 ∈ L^⊥ and L_2 ∈ L such that

D^f_X = Q + L_1,   D^f_S = Q + L_2.

Due to the third equation of (4.9), it follows that R = 2Q + L_1 + L_2, which implies

(2Q_1 + L_1) + (2Q_2 + L_2) = R_1 + R_2,

from which we conclude that

L_1 = R_1 − 2Q_1,   L_2 = R_2 − 2Q_2.

Hence we obtain

D^f_X = Q + R_1 − 2Q_1 = (R_1 − Q_1) + Q_2,
D^f_S = Q + R_2 − 2Q_2 = Q_1 + (R_2 − Q_2).

Since the spaces L^⊥ and L are orthogonal, we conclude that

4ω² = ‖D^f_X‖² + ‖D^f_S‖² = ‖R_1 − Q_1‖² + ‖Q_2‖² + ‖Q_1‖² + ‖R_2 − Q_2‖² = ‖Q − R‖² + ‖Q‖².

Assuming ‖Q‖ ≠ 0, since ‖R‖ = 2δ(V), the right-hand side is maximal when R = −2δ(V)Q/‖Q‖; thus

4ω² ≤ ‖Q‖² + (‖Q‖ + 2δ(V))².

The inequality also holds when ‖Q‖ = 0, since Q = 0 implies that D^f_X is orthogonal to D^f_S, whence ω = δ(V). Hence the lemma follows.
In the sequel we denote δ(V) simply by δ. Recall from Subsection 4.1 that, in order to guarantee δ(V^f) ≤ 1/√2, we need ω ≤ 1/2. Due to Lemma 4.3, this certainly holds if ‖Q‖ satisfies

‖Q‖² + (‖Q‖ + 2δ)² ≤ 1.    (4.10)
4.3 Upper bound for ‖Q‖
Recall from Lemma 4.3 that Q is the (unique) solution of the system

ĀQ = θνr^0_b,
Ā^*ξ + Q = (1/√µ)θνDR^0_CD.

We proceed by deriving an upper bound for ‖Q‖. For the moment, let us write

r_b = θνr^0_b,   R_C = θνR^0_C.

Defining the m × m matrix ĀĀ^* by

ĀĀ^* := [ Tr(Ā_1Ā_1)  Tr(Ā_1Ā_2)  ···  Tr(Ā_1Ā_m) ]
        [ Tr(Ā_2Ā_1)  Tr(Ā_2Ā_2)  ···  Tr(Ā_2Ā_m) ]
        [     ···          ···     ···      ···     ]
        [ Tr(Ā_mĀ_1)  Tr(Ā_mĀ_2)  ···  Tr(Ā_mĀ_m) ],

one easily verifies that this matrix is symmetric, and positive definite whenever Ā_1, Ā_2, ..., Ā_m are linearly independent. Thus

ξ = (ĀĀ^*)^{-1}((1/√µ)Ā(DR_CD) − r_b).

Substitution gives

Q = (1/√µ)(DR_CD − Ā^*(ĀĀ^*)^{-1}Ā(DR_CD) + Ā^*(ĀĀ^*)^{-1}r_b)
= (1/√µ)((I − Ā^*(ĀĀ^*)^{-1}Ā)(DR_CD) + Ā^*(ĀĀ^*)^{-1}r_b)
:= (1/√µ)(Q_1 + Q_2).

Here one easily verifies that Ā^*(ĀĀ^*)^{-1}Ā is an orthogonal projection operator (onto L).

Let (ȳ, S̄) be such that A^*ȳ + S̄ = C. Then

DR_CD = θνDR^0_CD = θνD(C − A^*y^0 − S^0)D = θν(DA^*(ȳ − y^0)D + D(S̄ − S^0)D).

Since DA^*(ȳ − y^0)D ∈ L, it is annihilated by the projection I − Ā^*(ĀĀ^*)^{-1}Ā, and we obtain

‖Q_1‖ ≤ θν‖D(S̄ − S^0)D‖.

On the other hand, let X̄ be such that AX̄ = b. Then

r_b = θνr^0_b = θν(b − AX^0) = θνA(X̄ − X^0) = θνĀ(D^{-1}(X̄ − X^0)D^{-1}),

and

Q_2 = θνĀ^*(ĀĀ^*)^{-1}Ā(D^{-1}(X̄ − X^0)D^{-1}).

Hence it follows that

‖Q_2‖ ≤ θν‖D^{-1}(X̄ − X^0)D^{-1}‖.

Since √µQ = Q_1 + Q_2 with Q_1 and Q_2 orthogonal, we may conclude that

√µ‖Q‖ = √(‖Q_1‖² + ‖Q_2‖²) ≤ θν√(‖D(S̄ − S^0)D‖² + ‖D^{-1}(X̄ − X^0)D^{-1}‖²).    (4.11)
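The Gram matrix ĀĀ^* is easy to form explicitly. The sketch below, with random symmetric Ā_i as hypothetical stand-ins, checks that it is symmetric positive definite when the Ā_i are linearly independent, which is what makes ξ, and hence Q, uniquely determined:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3
Abar = [(M + M.T) / 2 for M in rng.standard_normal((m, n, n))]  # almost surely linearly independent

# Gram matrix (A A*)_{ij} = Tr(Abar_i Abar_j)
G = np.array([[np.trace(Ai @ Aj) for Aj in Abar] for Ai in Abar])

print(np.allclose(G, G.T))                # symmetric
print(np.all(np.linalg.eigvalsh(G) > 0))  # positive definite
```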
Note that we are still free to choose X̄ and S̄ such that AX̄ = b and A^*ȳ + S̄ = C. Let X̄ be an optimal solution of (SDP) and (ȳ, S̄) of (SDD); then Tr(X̄S̄) = 0. Let ζ be such that ‖X̄ + S̄‖_2 ≤ ζ, and let the initial choices be

X^0 = S^0 = ζI,   y^0 = 0,   µ^0 = ζ².

The matrices X^0 − X̄ and S^0 − S̄ then satisfy

0 ⪯ X^0 − X̄ ⪯ ζI,   0 ⪯ S^0 − S̄ ⪯ ζI.

Thus it follows that

√(‖D(S̄ − S^0)D‖² + ‖D^{-1}(X̄ − X^0)D^{-1}‖²) ≤ ζ√(‖D²‖² + ‖D^{-2}‖²) = ζ√(‖P‖² + ‖P^{-1}‖²) = ζ√(‖S^{-1/2}X^{1/2}‖² + ‖X^{-1/2}S^{1/2}‖²).

Substitution into (4.11) gives

√µ‖Q‖ ≤ ζθν√(‖S^{-1/2}X^{1/2}‖² + ‖X^{-1/2}S^{1/2}‖²).    (4.12)

To proceed we need upper bounds for ‖S^{-1/2}X^{1/2}‖ and ‖X^{-1/2}S^{1/2}‖.
4.4 Bounds for ‖S^{-1/2}X^{1/2}‖ and ‖X^{-1/2}S^{1/2}‖

From Theorem 3.7,

‖S^{-1/2}X^{1/2}‖ ≤ ((1 + √(2τ′)) / (√µ χ(τ′))) ‖X(µ, ν)‖,
‖X^{-1/2}S^{1/2}‖ ≤ ((1 + √(2τ′)) / (√µ χ(τ′))) ‖S(µ, ν)‖,

where

τ′ := ψ(ρ(τ)),   ψ(t) = (1/2)(t² − 1 − log t²),

and where χ : [0, ∞) → (0, 1] is the inverse function of ψ(t) for 0 < t ≤ 1, as defined in (3.4).

We choose

τ = 1/8.    (4.13)

Then τ′ = 0.016921, 1 + √(2τ′) = 1.18396, and χ(τ′) = 0.872865, whence

(1 + √(2τ′)) / χ(τ′) = 1.35641 < √2.

It follows that

‖S^{-1/2}X^{1/2}‖ ≤ (√2/√µ)‖X(µ, ν)‖,   ‖X^{-1/2}S^{1/2}‖ ≤ (√2/√µ)‖S(µ, ν)‖.

Substitution into (4.12) gives

µ‖Q‖ ≤ √2 θνζ√(‖X(µ, ν)‖² + ‖S(µ, ν)‖²).

Therefore, using µ = µ^0ν = ζ²ν and θ = α/√(2n), we obtain the following upper bound for ‖Q‖:

‖Q‖ ≤ (α/(ζ√n))√(‖X(µ, ν)‖² + ‖S(µ, ν)‖²).

We define

κ(ζ, ν) = √(‖X(µ, ν)‖² + ‖S(µ, ν)‖²) / (ζ√(2n)),   0 < ν ≤ 1,   µ = µ^0ν.
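These constants can be reproduced numerically; χ is evaluated here by bisection on ψ over (0, 1], a simple stand-in for the inverse function of (3.4):

```python
import math

def psi(t):
    return 0.5 * (t**2 - 1 - math.log(t**2))

def rho(d):
    return d + math.sqrt(1 + d**2)

def chi(s, lo=1e-9, hi=1.0):
    """Inverse of psi on (0, 1] by bisection: psi is decreasing there."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if psi(mid) > s:
            lo = mid     # psi(mid) too large -> move right, toward 1
        else:
            hi = mid
    return (lo + hi) / 2

tau = 1 / 8
tau_p = psi(rho(tau))
ratio = (1 + math.sqrt(2 * tau_p)) / chi(tau_p)
print(tau_p, chi(tau_p), ratio, ratio < math.sqrt(2))
```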
Since X(ζ², 1) = S(ζ², 1) = ζI, we have κ(ζ, 1) = 1. Now we may write

‖Q‖ ≤ √2 α κ(ζ),   where κ(ζ) = max_{0<ν≤1} κ(ζ, ν).

We found in (4.10) that, in order to have δ(V^f) ≤ 1/√2, we should have ‖Q‖² + (‖Q‖ + 2δ(V))² ≤ 1. Since δ(V) ≤ τ = 1/8, it suffices if ‖Q‖ ≤ 0.570971. Since √2/3 = 0.471405, we conclude that if we take

α = 1/(3κ(ζ)),    (4.14)

then we certainly have δ(V^f) ≤ 1/√2.
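A quick check of the numeric threshold: 0.570971 is (up to rounding) the largest q with q² + (q + 1/4)² ≤ 1, and the choice (4.14) keeps ‖Q‖ ≤ √2/3 safely below it:

```python
import math

tau = 1 / 8
# Largest q with q^2 + (q + 2*tau)^2 = 1: positive root of 2q^2 + 4*tau*q + 4*tau^2 - 1 = 0.
a, b, c = 2.0, 4 * tau, 4 * tau**2 - 1
q_max = (-b + math.sqrt(b**2 - 4 * a * c)) / (2 * a)
print(q_max)                     # ≈ 0.570971
print(math.sqrt(2) / 3 < q_max)  # the bound resulting from (4.14) fits
```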
4.5 Bound for κ(ζ)

Due to the choice of X̄, ȳ, S̄ and the number ζ, we have

AX̄ = b,   0 ⪯ X̄ ⪯ ζI,
A^*ȳ + S̄ = C,   0 ⪯ S̄ ⪯ ζI,
Tr(X̄S̄) = 0.

To simplify notation, denote X = X(µ, ν), y = y(µ, ν), and S = S(µ, ν). Then X, y, and S are uniquely determined by the following system:

b − AX = ν(b − A(ζI)),   X ⪰ 0,
C − A^*y − S = ν(C − ζI),   S ⪰ 0,
XS = νζ²I.

Hence we have

A(X − (1 − ν)X̄ − νζI) = 0,   X ⪰ 0,
A^*(y − (1 − ν)ȳ) = (1 − ν)S̄ + νζI − S,   S ⪰ 0,
XS = νζ²I.

Note that Tr((X − (1 − ν)X̄ − νζI)(S − (1 − ν)S̄ − νζI)) = 0, since the first factor lies in L^⊥ and the second in L. Defining

X̃ := (1 − ν)X̄ + νζI,   S̃ := (1 − ν)S̄ + νζI,

we have Tr((X − X̃)(S − S̃)) = 0. This gives

Tr(XS) + Tr(X̃S̃) = Tr(XS̃) + Tr(SX̃).

Since Tr(X̄S̄) = 0, X̄ + S̄ ⪯ ζI, and XS = νζ²I, we may write

Tr(X̃S̃) + Tr(XS) = Tr(((1 − ν)X̄ + νζI)((1 − ν)S̄ + νζI)) + νζ²n
= ν(1 − ν)ζ Tr(X̄ + S̄) + ν²ζ²n + νζ²n
≤ ν(1 − ν)ζ²n + ν²ζ²n + νζ²n
= 2νζ²n.

Moreover, since X̃ ⪰ νζI and S̃ ⪰ νζI,

Tr(XS̃) + Tr(SX̃) = (1 − ν)(Tr(XS̄) + Tr(SX̄)) + νζ Tr(X + S) ≥ νζ Tr(X + S).

Hence we obtain Tr(X) + Tr(S) ≤ 2ζn. Since ‖X‖² + ‖S‖² ≤ (Tr(X) + Tr(S))², it follows that

√(‖X‖² + ‖S‖²) / (ζ√(2n)) ≤ (Tr(X) + Tr(S)) / (ζ√(2n)) ≤ 2ζn / (ζ√(2n)) = √(2n),

which proves

κ(ζ) ≤ √(2n).
5 Iteration bound
In the previous sections we have found that if at the start of an iteration the iterates satisfy δ(X, S; µ) ≤ τ, with τ as defined in (4.13), then after the feasibility step, with θ as in (4.8) and α as in (4.14), the iterates satisfy δ(X, S; µ^+) ≤ 1/√2. According to (2.10), at most

log₂(log₂(1/τ²)) = log₂(log₂ 64) ≈ 2.585

centering steps then suffice to get iterates that satisfy δ(X, S; µ^+) ≤ τ. So each iteration consists of at most 4 so-called 'inner' iterations, in each of which we need to compute a new search direction. In each main iteration both the duality gap and the norms of the residuals are reduced by the factor 1 − θ. Hence, using Tr(X^0S^0) = ζ²n, the total number of main iterations is bounded above by

(1/θ) log(max{nζ², ‖r^0_b‖, ‖R^0_C‖}/ε).

Since

θ = α/√(2n) = 1/(3√(2n)κ(ζ))

and κ(ζ) ≤ √(2n), the total number of inner iterations is therefore bounded above by

24n log(max{nζ², ‖r^0_b‖, ‖R^0_C‖}/ε).
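The resulting bound is easy to tabulate; a sketch with illustrative values of n, ζ, and the initial residual norms (all hypothetical):

```python
import math

def iteration_bound(n, zeta, rb0, RC0, eps):
    """Upper bound 24*n*log(max{n*zeta^2, ||r_b^0||, ||R_C^0||} / eps) on inner iterations."""
    return 24 * n * math.log(max(n * zeta**2, rb0, RC0) / eps)

for n in (10, 50, 100):
    print(n, math.ceil(iteration_bound(n, zeta=10.0, rb0=1.0, RC0=1.0, eps=1e-6)))
```

The bound is linear in n up to a logarithmic factor, matching the O(n log(n/ε)) complexity stated in the abstract.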
6 Concluding remarks

In this paper we extend the work of Roos on infeasible interior-point methods to semidefinite programming. As mentioned in Roos [8], extensions to other cases, e.g., second-order cone programming and linear complementarity problems, seem to be within reach; these will be our future work.
References
[1] E. de Klerk. Aspects of Semidefinite Programming: Interior Point Meth-
ods and Selected Applications. Kluwer Academic Publishers, Dordrecht, The
Netherlands, 2002.
[2] Z. Liu and W. Sun. An infeasible interior-point algorithm with full-Newton
step for linear optimization. Numerical Algorithms, 46(2):173–188, 2007.
[3] H. Mansouri and C. Roos. Simplified O(nL) infeasible interior-point algo-
rithm for linear optimization using full-Newton step. Optimization Methods
and Software, 22(3):519–530, 2007.
[4] R.D.C. Monteiro and Y. Zhang. A unified analysis for a class of long-step
primal-dual path-following interior-point algorithms for semidefinite pro-
gramming. Mathematical Programming, 81(3):281–299, 1998.
[5] J. Peng, C. Roos, and T. Terlaky. New complexity analysis of the primal-dual method for semidefinite optimization based on NT-direction. Journal of Optimization Theory and Applications, 109(2):327–343, 2001.

[6] J. Peng, C. Roos, and T. Terlaky. Self-regular functions and new search directions for linear and semidefinite optimization. Mathematical Programming, 93(1):129–171, 2002.
[7] C. Roos, T. Terlaky, and J.-Ph. Vial. Theory and Algorithms for Linear Optimization. An Interior Point Approach. John Wiley and Sons, Chichester, UK, 1997.
[8] C. Roos. A full-Newton step O(n) infeasible interior-point algorithm for lin-
ear optimization. SIAM Journal on Optimization, 16(4):1110–1136, 2006.
[9] H. Wolkowicz, R. Saigal, and L. Vandenberghe, editors. Handbook of Semidefinite Programming: Theory, Algorithms and Applications. Kluwer Academic Publishers, Norwell, MA, 1999.