Interior-Point Theory for Convex Optimization
Robert M. Freund
May, 2014
© 2014 Massachusetts Institute of Technology. All rights reserved.
1 Background
The material presented herein is based on the following two research texts:
Interior-Point Polynomial Algorithms in Convex Programming by Yurii Nesterov and Arkadii Nemirovskii, SIAM, 1994, and

A Mathematical View of Interior-Point Methods in Convex Optimization by James Renegar, SIAM, 2001.
2 Barrier Scheme for Solving Convex Optimization
Our problem of interest is
P : minimize_x c^T x
s.t. x ∈ S ,
where S is some closed convex set; denote the optimal objective value by V*. Let f(·) be a barrier function for S, namely f(·) satisfies:
(a) f(·) is strictly convex on its domain Df := intS , and
(b) f(x)→∞ as x→ ∂S .
The idea of the barrier method is to dissuade the algorithm from computing points too close to ∂S, effectively eliminating the complicating factors of dealing with ∂S. For every value of µ > 0 we create the barrier problem:
Pµ : minimize_x µc^T x + f(x)
s.t. x ∈ Df .
Note that Pµ is effectively unconstrained, since the boundary of the feasible region will never be encountered. The solution of Pµ is denoted z(µ):
z(µ) := arg min_x {µc^T x + f(x) : x ∈ Df} .
Intuitively, as µ → ∞, the impact of the barrier function on the solution of Pµ should become less and less, so we should have c^T z(µ) → V* as µ → ∞. Presuming this is the case, the barrier scheme tries to use Newton's method to solve for approximate solutions xi of Pµi for an increasing sequence of values µi → ∞.
In order to be more specific about how the barrier scheme might work, let us assume that at each iteration we have some value x ∈ Df that is an approximate solution of Pµ for a given value µ > 0. (We will, of course, need a way to define "is an approximate solution of Pµ"; this will be developed later.) We then increase the barrier parameter µ by a multiplicative factor α > 1:
µ← αµ .
Then we take a Newton step at x for the problem Pµ to obtain a new point x that we would like to be an approximate solution of Pµ. If so, we can continue the scheme inductively.
We typically use g(·) and H(·) to denote the gradient and Hessian of f(·). Note that the Newton iterate for Pµ has the formula:
x ← x − H(x)^{−1}(µc + g(x)) .
The general algorithmic scheme is presented in Algorithm 1.
3 Some Plain Facts
Let f(·) : Rn → R be a twice-differentiable function. We typically use g(·) and H(·) to denote the gradient and Hessian of f(·).
Here are four facts about integrals and derivatives:
Fact 3.1 g(y) = g(x) + ∫_0^1 H(x + t(y − x))(y − x) dt
Algorithm 1 General Barrier Scheme
Initialize.
Initialize with µ0 > 0 and x0 ∈ Df that is "an approximate solution of Pµ0." i ← 0.
Define α > 1.
At iteration i :
1. Current values.
µ← µi
x← xi
2. Increase µ and take Newton step.
µ← αµ
x ← x − H(x)^{−1}(µc + g(x))
3. Update values.
µi+1 ← µ
xi+1 ← x
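As a concrete illustration of the scheme (a hypothetical toy instance constructed here, not part of the text), consider minimizing cx over {x ≥ 0} with the barrier f(x) = − ln(x), so that g(x) = −1/x, H(x) = 1/x², z(µ) = 1/(µc), and V* = 0. A minimal Python sketch of Algorithm 1:

```python
# Toy instance: minimize c*x over {x >= 0}, barrier f(x) = -ln(x).
# Here z(mu) = 1/(mu*c) and V* = 0; these are features of this example only.
c = 2.0

def newton_step(x, mu):
    g = -1.0 / x              # gradient of f at x
    H = 1.0 / x ** 2          # Hessian of f at x
    return x - (mu * c + g) / H

mu = 1.0
x = 1.0 / (mu * c)            # start at the exact center z(mu0)
alpha = 1.2                   # modest increase keeps Newton in its basin
for i in range(60):
    mu *= alpha
    x = newton_step(x, mu)
gap = c * x                   # optimality gap c^T x - V*, of order 1/mu
```

With α = 1.2 the single Newton step per iteration tracks z(µ) closely; an overly aggressive α can throw the iterate out of the region where Newton's method converges, which is exactly what the analysis later in these notes quantifies.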
Fact 3.2 Let h(t) := f(x+ tv). Then
(i) h′(t) = g(x+ tv)T v , and
(ii) h′′(t) = vTH(x+ tv)v .
Fact 3.3

f(y) = f(x) + g(x)^T (y − x) + (1/2)(y − x)^T H(x)(y − x) + ∫_0^1 ∫_0^t (y − x)^T [H(x + s(y − x)) − H(x)](y − x) ds dt
Fact 3.4

∫_0^r [1/(1 − at)² − 1] dt = ar²/(1 − ar)

This follows by observing that ∫ 1/(1 − at)² dt = 1/(a(1 − at)).
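Fact 3.4 is easy to sanity-check numerically; the following sketch (values of a and r chosen arbitrarily with ar < 1) compares a midpoint Riemann sum against the closed form:

```python
# Midpoint-rule check of Fact 3.4 with arbitrary a = 0.5, r = 0.9 (a*r < 1).
a, r, N = 0.5, 0.9, 200000
h = r / N
s = sum(1.0 / (1.0 - a * (k + 0.5) * h) ** 2 - 1.0 for k in range(N)) * h
closed_form = a * r * r / (1.0 - a * r)   # = a r^2 / (1 - a r)
```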
We also present five additional facts that we will need in our analyses.
Fact 3.5 Suppose f(·) is a convex function on Rn, S ⊂ Rn is a compact convex set, and x ∈ int S satisfies f(x) ≤ f(y) for all y ∈ ∂S. Then f(·) attains its global minimizer on S.
Fact 3.6 Let ‖v‖ := √(v^T v) be the Euclidean norm. Let λ1 ≤ … ≤ λn be the ordered eigenvalues of the symmetric matrix M, and define ‖M‖ := max{‖Mx‖ : ‖x‖ ≤ 1}. Then ‖M‖ = max_i{|λi|} = max{|λn|, |λ1|}.
Fact 3.7 Suppose A, B are symmetric and A + B = θI for some θ ∈ R. Then AB = BA. Furthermore, if A ⪰ 0, B ⪰ 0, then A^α B^β = B^β A^α for all α, β ≥ 0.

To see why this is true, decompose A = PDP^T where P is orthonormal (P^T = P^{−1}) and D is diagonal. Then B = P(θI − D)P^T, whereby A^α B^β = PD^α P^T P(θI − D)^β P^T = PD^α (θI − D)^β P^T = P(θI − D)^β D^α P^T = P(θI − D)^β P^T PD^α P^T = B^β A^α.
Fact 3.8 Suppose λn ≥ … ≥ λ1 > 0. Then

max_i{|λi − 1|} ≤ max{λn − 1, 1/λ1 − 1} .
Fact 3.9 Suppose a, b, c, d > 0. Then

min{a/b, c/d} ≤ (a + c)/(b + d) ≤ max{a/b, c/d} .
4 Self-Concordant Functions and Properties
Let f(·) be a strictly convex twice-differentiable function defined on the open set Df := domain f(·), and let D̄f := cl Df denote its closure. Consider x ∈ Df. We will often abbreviate Hx := H(x) for the Hessian at x. Here we assume that Hx ≻ 0, whereby Hx can be used to define the norm

‖v‖_x := √(v^T Hx v)

which is the "local norm" at x. Notice that

‖v‖_x = √(v^T Hx v) = ‖Hx^{1/2} v‖ ,

where ‖w‖ = √(w^T w) is the standard Euclidean (L2) norm. Let
Bx(x, 1) := {y : ‖y − x‖x < 1} .
This is called the open Dikin ball at x, after the Russian mathematician I. I. Dikin.
Definition 4.1 f(·) is said to be (strongly nondegenerate) self-concordant if for all x ∈ Df we have Bx(x, 1) ⊂ Df, and for all y ∈ Bx(x, 1) and all v ≠ 0 we have:

1 − ‖y − x‖_x ≤ ‖v‖_y / ‖v‖_x ≤ 1/(1 − ‖y − x‖_x) .
Let SC denote the class of all such functions.
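For intuition, Definition 4.1 can be spot-checked numerically for the one-dimensional barrier f(x) = − ln(x) (an illustrative check added here, not part of the text): there H(x) = 1/x², so ‖v‖_x = |v|/x and ‖v‖_y/‖v‖_x = x/y for every v ≠ 0.

```python
# Spot-check of Definition 4.1 for f(x) = -ln(x) at x = 1:
# the Dikin ball is {y : |y - x|/x < 1} and the norm ratio is x/y.
x = 1.0
worst = 0.0
for k in range(1, 100):
    y = x + (k / 100.0 - 0.5) * 1.9      # points with r := ||y - x||_x < 1
    r = abs(y - x) / x
    ratio = x / y                        # = ||v||_y / ||v||_x for any v != 0
    worst = max(worst, (1.0 - r) - ratio, ratio - 1.0 / (1.0 - r))
```

Both bounds hold with equality as y approaches an endpoint of the Dikin ball, so for this barrier the definition is tight.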
Remark 1 The following are the most-used self-concordant functions:
(i) f(x) = − ln(x) for x ∈ Df = {x ∈ R : x > 0} ,

(ii) f(X) = − ln det(X) for X ∈ Df = {X ∈ S^{k×k} : X ≻ 0} , and

(iii) f(x) = − ln(x1² − Σ_{j=2}^n xj²) for x ∈ Df := {x : ‖(x2, . . . , xn)‖ < x1} .
Before showing that these functions are self-concordant, let us see how we can combine self-concordant functions to obtain other self-concordant functions.
Proposition 4.1 (self-concordance under addition/intersection) Suppose that fi(·) ∈ SC with domain Di := Dfi for i = 1, 2, and suppose that D := D1 ∩ D2 ≠ ∅. Define f(·) = f1(·) + f2(·). Then Df = D and f(·) ∈ SC.
Proof: Consider x ∈ D = D1 ∩ D2. Let B^i_x(c, r) denote the Dikin ball centered at c with radius r defined by fi(·), and let ‖·‖_{x,i} denote the norm induced at x by the Hessian Hi(x) of fi(·), for i = 1, 2. Then since x ∈ Di we have B^i_x(x, 1) ⊂ Di, and ‖v‖_x² = ‖v‖_{x,1}² + ‖v‖_{x,2}² because H(x) = H1(x) + H2(x). Therefore if ‖y − x‖_x < 1 it follows that ‖y − x‖_{x,1} < 1 and ‖y − x‖_{x,2} < 1, whereby y ∈ B^i_x(x, 1) ⊂ Di for i = 1, 2, and hence y ∈ D1 ∩ D2 = D. Also, for any v ≠ 0, using Fact 3.9 we have

‖v‖_y²/‖v‖_x² = (‖v‖_{y,1}² + ‖v‖_{y,2}²)/(‖v‖_{x,1}² + ‖v‖_{x,2}²) ≤ max{‖v‖_{y,1}²/‖v‖_{x,1}², ‖v‖_{y,2}²/‖v‖_{x,2}²} ≤ max{(1/(1 − ‖y − x‖_{x,1}))², (1/(1 − ‖y − x‖_{x,2}))²} ≤ (1/(1 − ‖y − x‖_x))² .
A virtually identical argument proves the "≥" inequality of the definition of self-concordance: replace "max" by "min" above and apply the other inequality of Fact 3.9.
Proposition 4.2 (self-concordance under affine transformation) Let A ∈ R^{m×n} satisfy rank A = n ≤ m. Suppose that f̄(·) ∈ SC with domain D_f̄ ⊂ R^m, and define f(·) by f(x) = f̄(Ax − b). Then f(·) ∈ SC with domain D := {x : Ax − b ∈ D_f̄}.

Proof: Consider x ∈ D and s = Ax − b. Letting ḡ(s) and H̄(s) denote the gradient and Hessian of f̄(·) at s, and g(x) and H(x) the gradient and Hessian of f(·) at x, we have g(x) = A^T ḡ(s) and H(x) = A^T H̄(s)A. Suppose that ‖y − x‖_x < 1. Then defining t := Ay − b we have 1 > ‖y − x‖_x = √((y − x)^T A^T H̄(s) A(y − x)) = ‖t − s‖_s, whereby t ∈ D_f̄ and so y ∈ D. Therefore Bx(x, 1) ⊂ D. Also, for any v ≠ 0, we have

‖v‖_y/‖v‖_x = √(v^T A^T H̄(t) A v) / √(v^T A^T H̄(s) A v) = ‖Av‖_t/‖Av‖_s ≤ 1/(1 − ‖s − t‖_s) = 1/(1 − ‖y − x‖_x) .
The exact same argument can also be applied to prove the “≥” inequalityof the definition of self-concordance.
Proposition 4.3 The three functions defined in Remark 1 are self-concordant.
Proof: We will prove that f(X) := − ln det(X) is self-concordant on its domain {X ∈ S^{k×k} : X ≻ 0}. When k = 1, this is the logarithmic barrier function. Although it is true, we will not prove that f(x) = − ln(x1² − Σ_{j=2}^n xj²) is a self-concordant barrier for the interior of the second-order cone Qn := {x : ‖(x2, . . . , xn)‖ ≤ x1}, as this proof is arithmetically uninspiring.
Before we get started with the proof, we have to expand and amend our notation a bit. If we consider two vectors x, y in the n-dimensional space V (which we typically identify with Rn), we use the standard inner product x^T y to define the standard norm ‖v‖ := √(v^T v). When we consider the vector space V to be S^{k×k} (the set of symmetric matrices of order k), we need to define the standard inner product of two symmetric k × k matrices X and Y and also to define the standard norm on V = S^{k×k}. The standard inner product in this case is the trace inner product:

X • Y := Tr(XY) = Σ_{i=1}^k (XY)_{ii} = Σ_{i=1}^k Σ_{j=1}^k X_{ij} Y_{ij} .
The trace inner product has the following properties which are elementaryto establish:
1. A •BC = AB • C
2. Tr(AB) = Tr(BA), whereby A •B = B •A
The trace inner product is used to define the standard norm:
‖X‖ :=√X •X .
This norm is also called the Frobenius norm of a matrix X. Notice that the Frobenius norm of X is not the operator norm of X. For the remainder of this proof we use the trace inner product and the Frobenius norm, and the reader should ignore any mental mention of operator norms in this proof.
To prove f(X) := − ln det(X) is self-concordant, let X ≻ 0 be given, and let Y ∈ B_X(X, 1) and V ∈ S^{k×k} be given. We need to verify three statements:

1. Y ≻ 0,

2. ‖V‖_Y/‖V‖_X ≤ 1/(1 − ‖Y − X‖_X) , and

3. ‖V‖_Y/‖V‖_X ≥ 1 − ‖Y − X‖_X .
To get started, direct expansion yields the following second-order expansion of f(X):

f(X + ∆X) ≈ f(X) − X^{−1} • ∆X + (1/2) ∆X • X^{−1}∆X X^{−1}

and indeed it is easy to derive:

• g(X) = −X^{−1} and

• H(X)∆X = X^{−1}∆X X^{−1}
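The formula g(X) = −X^{−1} can be verified by finite differences; the sketch below (a hypothetical 2×2 example, not from the text) checks that the directional derivative of − ln det(X) along a symmetric direction V equals −X^{−1} • V under the trace inner product:

```python
import math

def det2(X):                     # determinant of a 2x2 matrix
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def f(X):                        # f(X) = -ln det(X)
    return -math.log(det2(X))

X = [[2.0, 0.3], [0.3, 1.0]]     # an arbitrary X with X > 0
d = det2(X)
Xinv = [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

V = [[0.5, -0.2], [-0.2, 0.1]]   # symmetric direction
eps = 1e-6
Xp = [[X[i][j] + eps * V[i][j] for j in range(2)] for i in range(2)]
Xm = [[X[i][j] - eps * V[i][j] for j in range(2)] for i in range(2)]
fd = (f(Xp) - f(Xm)) / (2 * eps)                                     # central difference
ip = -sum(Xinv[i][j] * V[i][j] for i in range(2) for j in range(2))  # -X^{-1} • V
```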
It therefore follows that

‖∆X‖_X = √Tr(∆X X^{−1} ∆X X^{−1}) = √Tr(X^{−1/2}∆X X^{−1/2} X^{−1/2}∆X X^{−1/2}) = √Tr([X^{−1/2}∆X X^{−1/2}]²) .   (1)

Now define two auxiliary matrices:

F := X^{−1/2} Y X^{−1/2} and S := X^{−1/2} V X^{−1/2} .

Note that

‖S‖ = √Tr(X^{−1/2} V X^{−1/2} X^{−1/2} V X^{−1/2}) = ‖V‖_X .   (2)

Furthermore let us write F = QDQ^T where Q is orthonormal (Q^T = Q^{−1}) and the diagonal matrix D is comprised of the eigenvalues of F, and let λ denote the vector of eigenvalues, with minimum λmin and maximum λmax. To prove item (1.) above, we observe:

1 > ‖Y − X‖_X² = Tr(X^{−1/2}(Y − X)X^{−1/2} X^{−1/2}(Y − X)X^{−1/2})
= Tr((F − I)(F − I))
= Tr(Q(D − I)Q^T Q(D − I)Q^T)
= Tr((D − I)(D − I))
= Σ_{j=1}^k (λj − 1)²
= ‖λ − e‖_2²   (3)

where e = (1, . . . , 1). Since the last quantity above is less than 1, it follows that λ > 0 and hence F ≻ 0 and therefore Y ≻ 0, establishing (1.). In order to establish (2.) and (3.) we will need the following:

‖F^{−1/2} S F^{−1/2}‖ ≤ (1/λmin)‖S‖ and ‖F^{−1/2} S F^{−1/2}‖ ≥ (1/λmax)‖S‖ .   (4)
To prove (4), we proceed as follows:

‖F^{−1/2} S F^{−1/2}‖ = √Tr(QD^{−1/2}Q^T S QD^{−1/2}Q^T QD^{−1/2}Q^T S QD^{−1/2}Q^T)
= √Tr(D^{−1}Q^T S Q D^{−1}Q^T S Q)
≤ (1/√λmin) √Tr(Q^T S Q D^{−1}Q^T S Q)
= (1/√λmin) √Tr(D^{−1}Q^T S S Q)
≤ (1/λmin) √Tr(Q^T S S Q)
= (1/λmin) √Tr(SS) = (1/λmin)‖S‖ .
The other inequality of (4) follows by substituting λmax for λmin and switch-ing ≥ for ≤ in the above chain of equalities and inequalities. We now have:
‖V‖_Y² = Tr(V Y^{−1} V Y^{−1})
= Tr(X^{−1/2}V X^{−1/2} X^{1/2}Y^{−1}X^{1/2} X^{−1/2}V X^{−1/2} X^{1/2}Y^{−1}X^{1/2})
= Tr(S F^{−1} S F^{−1})
= Tr(F^{−1/2}S F^{−1/2} F^{−1/2}S F^{−1/2})
= ‖F^{−1/2}S F^{−1/2}‖² ≤ (1/λmin²)‖S‖² = (1/λmin²)‖V‖_X²

where the last inequality follows from (4) and the last equality from (2). Therefore

‖V‖_Y/‖V‖_X ≤ 1/λmin ≤ 1/(1 − |1 − λmin|) ≤ 1/(1 − ‖e − λ‖_2) = 1/(1 − ‖Y − X‖_X)

where the last equality is from (3). This proves (2.). To prove (3.), use the same equalities as above and the second inequality of (4) to obtain:

‖V‖_Y² = ‖F^{−1/2}S F^{−1/2}‖² ≥ (1/λmax²)‖V‖_X²

and therefore ‖V‖_Y/‖V‖_X ≥ 1/λmax. If λmax ≤ 1 it follows directly that ‖V‖_Y/‖V‖_X ≥ 1/λmax ≥ 1 ≥ 1 − ‖Y − X‖_X, while if λmax > 1 we have:

‖Y − X‖_X = ‖λ − e‖_2 ≥ λmax − 1 ,

from which it follows that

λmax‖Y − X‖_X ≥ ‖Y − X‖_X ≥ λmax − 1

and so

λmax(1 − ‖Y − X‖_X) ≤ 1 .

From this it then follows that ‖V‖_Y/‖V‖_X ≥ 1/λmax ≥ 1 − ‖Y − X‖_X, thus completing the proof of (3.).
Our next result is rather technical, as it shows further properties of changesin Hessian matrices under self-concordance. Recall from Fact 3.6 that theoperator norm of a matrix M is defined using the Euclidean norm, namely
‖M‖ := max{‖Mx‖ : ‖x‖ ≤ 1} .
Lemma 4.1 Suppose that f(·) ∈ SC and x ∈ Df . If ‖y − x‖x < 1, then
(i) ‖H_x^{−1/2} H_y H_x^{−1/2}‖ ≤ (1/(1 − ‖y − x‖_x))² ,

(ii) ‖H_x^{1/2} H_y^{−1} H_x^{1/2}‖ ≤ (1/(1 − ‖y − x‖_x))² ,

(iii) ‖I − H_x^{−1/2} H_y H_x^{−1/2}‖ ≤ (1/(1 − ‖y − x‖_x))² − 1 , and

(iv) ‖I − H_x^{1/2} H_y^{−1} H_x^{1/2}‖ ≤ (1/(1 − ‖y − x‖_x))² − 1 .
Proof: Let Q := H_x^{−1/2} H_y H_x^{−1/2}, and observe that Q ≻ 0 with eigenvalues λn ≥ … ≥ λ1 > 0. From Fact 3.6 we have

√‖Q‖ = √λn = max_w √(w^T Q w / w^T w) = max_v √(v^T H_y v / v^T H_x v) = max_v ‖v‖_y/‖v‖_x ≤ 1/(1 − ‖y − x‖_x)

(where the third equality uses the substitution v = H_x^{−1/2} w) and squaring yields the first assertion. Similarly, we have
1/√‖Q^{−1}‖ = √λ1 = min_w √(w^T Q w / w^T w) = min_v √(v^T H_y v / v^T H_x v) = min_v ‖v‖_y/‖v‖_x ≥ 1 − ‖y − x‖_x

(where the third equality again uses the substitution v = H_x^{−1/2} w) and squaring and rearranging yields the second assertion. Next observe
‖I − Q‖ = max_i{|λi − 1|} ≤ max{λn − 1, 1/λ1 − 1} ≤ (1/(1 − ‖y − x‖_x))² − 1
where the first inequality is from Fact 3.8 and the second inequality followsfrom the two equation streams above, thus showing the third assertion ofthe lemma. Finally, we have
‖I − Q^{−1}‖ = max_i{|1/λi − 1|} ≤ max{1/λ1 − 1, λn − 1} ≤ (1/(1 − ‖y − x‖_x))² − 1
where the first inequality is from Fact 3.8 and the second inequality followsfrom the two equation streams above, thus showing the fourth assertion ofthe lemma.
The next result states that we can bound the error in the quadratic approximation of f(·) inside the Dikin ball Bx(x, 1).
Proposition 4.4 Suppose that f(·) ∈ SC and x ∈ Df. If ‖y − x‖_x < 1, then

|f(y) − [f(x) + g(x)^T (y − x) + (1/2)(y − x)^T H_x (y − x)]| ≤ ‖y − x‖_x³ / (3(1 − ‖y − x‖_x)) .
Proof: Let L denote the left-hand side of the inequality to be proved. From Fact 3.3 we have

L = |∫_0^1 ∫_0^t (y − x)^T [H(x + s(y − x)) − H(x)](y − x) ds dt|
= |∫_0^1 ∫_0^t (y − x)^T H_x^{1/2} [H_x^{−1/2} H(x + s(y − x)) H_x^{−1/2} − I] H_x^{1/2} (y − x) ds dt|
≤ ‖y − x‖_x² |∫_0^1 ∫_0^t ‖H_x^{−1/2} H(x + s(y − x)) H_x^{−1/2} − I‖ ds dt|
≤ ‖y − x‖_x² |∫_0^1 ∫_0^t [(1/(1 − s‖y − x‖_x))² − 1] ds dt|   (from Lemma 4.1)
= ‖y − x‖_x² ∫_0^1 [‖y − x‖_x t² / (1 − t‖y − x‖_x)] dt   (from Fact 3.4)
≤ [‖y − x‖_x³ / (1 − ‖y − x‖_x)] ∫_0^1 t² dt = ‖y − x‖_x³ / (3(1 − ‖y − x‖_x)) .
Recall Newton's method to minimize f(·). At x ∈ Df we compute the Newton step:
n(x) := −H(x)−1g(x)
and compute the Newton iterate:
x+ := x+ n(x) = x−H(x)−1g(x) .
When f(·) ∈ SC, Newton’s method has some very wonderful properties aswe now show.
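The quadratic-convergence bound of the next theorem can be checked empirically on a simple self-concordant function, f(x) = x − ln(x) on x > 0 (an illustrative instance chosen here, not from the text), for which g(x) = 1 − 1/x, H(x) = 1/x², hence n(x) = x − x² and ‖n(x)‖_x = |1 − x|:

```python
# Empirical check of ||n(x+)||_{x+} <= (||n(x)||_x / (1 - ||n(x)||_x))^2
# for f(x) = x - ln(x): n(x) = x - x^2 and ||n(x)||_x = |1 - x|.
def step_norm(x):
    return abs(x - x * x) / x

worst = 0.0
for k in range(1, 50):
    x = 0.55 + 0.018 * k          # grid where ||n(x)||_x < 1
    nx = step_norm(x)
    xp = x + (x - x * x)          # Newton iterate x+ = x + n(x)
    worst = max(worst, step_norm(xp) - (nx / (1.0 - nx)) ** 2)
```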
Theorem 4.1 Suppose that f(·) ∈ SC and x ∈ Df. If ‖n(x)‖_x < 1, then

‖n(x+)‖_{x+} ≤ (‖n(x)‖_x / (1 − ‖n(x)‖_x))² .
Proof: We will prove this by proving the following two results, which together establish the theorem:

(a) ‖n(x+)‖_{x+} ≤ ‖H_x^{−1/2} g(x+)‖ / (1 − ‖n(x)‖_x) , and

(b) ‖H_x^{−1/2} g(x+)‖ ≤ ‖n(x)‖_x² / (1 − ‖n(x)‖_x) .
First we prove (a):

‖n(x+)‖_{x+}² = g(x+)^T H(x+)^{−1} H(x+) H(x+)^{−1} g(x+)
= g(x+)^T H_x^{−1/2} H_x^{1/2} H(x+)^{−1} H_x^{1/2} H_x^{−1/2} g(x+)
≤ ‖H_x^{1/2} H(x+)^{−1} H_x^{1/2}‖ ‖H_x^{−1/2} g(x+)‖²
≤ (1/(1 − ‖x+ − x‖_x))² ‖H_x^{−1/2} g(x+)‖²   (from Lemma 4.1)
= (1/(1 − ‖n(x)‖_x))² ‖H_x^{−1/2} g(x+)‖² ,

which proves (a). To prove (b), observe first that

g(x+) = g(x+) − g(x) + g(x)
= g(x+) − g(x) − H_x n(x)
= ∫_0^1 H(x + t(x+ − x))(x+ − x) dt − H_x n(x)   (from Fact 3.1)
= ∫_0^1 [H(x + t n(x)) − H_x] n(x) dt
= ∫_0^1 [H(x + t n(x)) − H_x] H_x^{−1/2} H_x^{1/2} n(x) dt .

Therefore

H_x^{−1/2} g(x+) = ∫_0^1 [H_x^{−1/2} H(x + t n(x)) H_x^{−1/2} − I] H_x^{1/2} n(x) dt

which then implies

‖H_x^{−1/2} g(x+)‖ ≤ ∫_0^1 ‖H_x^{−1/2} H(x + t n(x)) H_x^{−1/2} − I‖ ‖H_x^{1/2} n(x)‖ dt
≤ ‖H_x^{1/2} n(x)‖ ∫_0^1 [(1/(1 − t‖n(x)‖_x))² − 1] dt   (from Lemma 4.1)
= ‖n(x)‖_x · ‖n(x)‖_x/(1 − ‖n(x)‖_x)   (from Fact 3.4)

which proves (b).
Theorem 4.2 Suppose that f(·) ∈ SC and x ∈ Df. If ‖n(x)‖_x ≤ 1/4, then f(·) has a minimizer z, and

‖z − x+‖_x ≤ 3‖n(x)‖_x² / (1 − ‖n(x)‖_x)³ .
Proof: First suppose that ‖n(x)‖_x ≤ 1/9, and define Qx(y) := f(x) + g(x)^T (y − x) + (1/2)(y − x)^T H_x (y − x). Let y satisfy ‖y − x‖_x ≤ 1/3. Then from Proposition 4.4 we have

|f(y) − Qx(y)| ≤ ‖y − x‖_x³ / (3(1 − 1/3)) ≤ ‖y − x‖_x² / (9(2/3)) = ‖y − x‖_x²/6 ,

and therefore

f(y) ≥ f(x) + g(x)^T H_x^{−1} H_x^{1/2} H_x^{1/2} (y − x) + (1/2)‖y − x‖_x² − (1/6)‖y − x‖_x²
≥ f(x) − ‖n(x)‖_x ‖y − x‖_x + (1/3)‖y − x‖_x²
= f(x) + (1/3)‖y − x‖_x (−3‖n(x)‖_x + ‖y − x‖_x) .

Now if y ∈ ∂S := ∂{y : ‖y − x‖_x ≤ 3‖n(x)‖_x}, it follows that f(y) ≥ f(x). So, by Fact 3.5, f(·) has a global minimizer z ∈ S, and so ‖z − x‖_x ≤ 3‖n(x)‖_x.

Now suppose that ‖n(x)‖_x ≤ 1/4. From Theorem 4.1 we have

‖n(x+)‖_{x+} ≤ ((1/4)/(1 − 1/4))² = 1/9 ,
so f(·) has a global minimizer z and ‖z − x+‖x+ ≤ 3‖n(x+)‖x+ . Therefore
‖z − x+‖_x ≤ ‖z − x+‖_{x+} / (1 − ‖x − x+‖_x)   (from Definition 4.1)
= ‖z − x+‖_{x+} / (1 − ‖n(x)‖_x)
≤ 3‖n(x+)‖_{x+} / (1 − ‖n(x)‖_x)
≤ 3‖n(x)‖_x² / (1 − ‖n(x)‖_x)³ .   (from Theorem 4.1)
Last of all in this section, we present a more traditional result about the convergence of Newton's method for a self-concordant function. This result is not used elsewhere in our development, and is included only to relate the results herein to the more traditional theory of Newton's method. The proof of this result is left as a somewhat-challenging exercise.
Theorem 4.3 Suppose that f(·) ∈ SC and x ∈ Df, and that f(·) has a minimizer z. If ‖x − z‖_z < 1/4, then

‖x+ − z‖_z < 4‖x − z‖_z² .
5 Self-Concordant Barriers
We begin with another definition.
Definition 5.1 f(·) is a ϑ-(strongly nondegenerate self-concordant)-barrier if f(·) ∈ SC and

ϑ = ϑf := max_{x∈Df} ‖n(x)‖_x² < ∞ .

Note that ‖n(x)‖_x² = (−g(x))^T H(x)^{−1} H(x) H(x)^{−1} (−g(x)) = g(x)^T H(x)^{−1} g(x), so we can equivalently define

ϑf := max_{x∈Df} g(x)^T H(x)^{−1} g(x)   or   ϑf := max_{x∈Df} n(x)^T H(x) n(x) .

The quantity ϑf is called the complexity value of the barrier f(·).
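For example, for f(x) = − ln(x) the quantity g(x)^T H(x)^{−1} g(x) is identically 1, consistent with ϑf = 1 (see Remark 2 below). A quick numerical confirmation (an illustrative sketch, not part of the text):

```python
# For f(x) = -ln(x): g(x) = -1/x, H(x) = 1/x^2, so g(x)^2 / H(x) = 1
# for every x > 0 -- the max over the domain is therefore 1.
vals = []
for k in range(1, 200):
    x = 0.05 * k
    g, H = -1.0 / x, 1.0 / x ** 2
    vals.append(g * g / H)
```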
Let SCB denote the class of all such functions. The following property is very important.
Theorem 5.1 Suppose that f(·) ∈ SCB and x, y ∈ Df . Then
g(x)T (y − x) < ϑf .
Proof: Define φ(t) := f(x + t(y − x)), whereby φ′(t) = g(x + t(y − x))^T (y − x) and φ′′(t) = (y − x)^T H(x + t(y − x))(y − x). We want to prove that φ′(0) < ϑf. If φ′(0) ≤ 0 there is nothing further to prove, so we can assume that φ′(0) > 0, whereby from convexity it also follows that φ′(t) > 0 for all t ∈ [0, 1]. (Notice that φ(1) = f(y), and so t = 1 is in the domain of φ(·).) Let t ∈ [0, 1] be given and let v = x + t(y − x). Then

φ′(t) = g(v)^T (y − x)
= g(v)^T H_v^{−1} H_v^{1/2} H_v^{1/2} (y − x)
= −n(v)^T H_v^{1/2} H_v^{1/2} (y − x)
≤ ‖H_v^{1/2} n(v)‖ ‖H_v^{1/2} (y − x)‖
= ‖n(v)‖_v ‖y − x‖_v ≤ √ϑf ‖y − x‖_v .

Also φ′′(t) = (y − x)^T H_v (y − x) = ‖y − x‖_v², whereby

φ′′(t)/φ′(t)² ≥ ‖y − x‖_v² / (ϑf ‖y − x‖_v²) = 1/ϑf .
It follows that

1/ϑf ≤ ∫_0^1 [φ′′(t)/φ′(t)²] dt = −1/φ′(t) |_0^1 = 1/φ′(0) − 1/φ′(1) ,

and we have

1/φ′(0) ≥ 1/φ′(1) + 1/ϑf > 1/ϑf ,

which proves the result.
The next two results show how the complexity value ϑ behaves under addition/intersection and affine transformation.

Theorem 5.2 (self-concordant barriers under addition/intersection) Suppose that fi(·) ∈ SCB with domain Di := Dfi and complexity value ϑi := ϑfi for i = 1, 2, and suppose that D := D1 ∩ D2 ≠ ∅. Define f(·) = f1(·) + f2(·). Then f(·) ∈ SCB with domain D, and ϑf ≤ ϑ1 + ϑ2.
Proof: Fix x ∈ D and let g, g1, g2 and H, H1, H2 denote the gradients and Hessians of f(·), f1(·), f2(·) at x, whereby g = g1 + g2 and H = H1 + H2. Define Ai = H^{−1/2} Hi H^{−1/2} for i = 1, 2. Then Ai ⪰ 0 and A1 + A2 = I, so A1, A2 commute and A1^{1/2}, A2^{1/2} commute, from Fact 3.7. Also define ui = Ai^{−1/2} H^{−1/2} gi for i = 1, 2. We have
g^T H^{−1} g = g1^T H^{−1} g1 + g2^T H^{−1} g2 + 2 g1^T H^{−1} g2
= u1^T A1 u1 + u2^T A2 u2 + 2 u1^T A1^{1/2} A2^{1/2} u2
= u1^T [I − A2] u1 + u2^T [I − A1] u2 + 2 u1^T A2^{1/2} A1^{1/2} u2   (from Fact 3.7)
= u1^T u1 + u2^T u2 − [u1^T A2 u1 + u2^T A1 u2 − 2 u1^T A2^{1/2} A1^{1/2} u2]
= g1^T H^{−1/2} A1^{−1} H^{−1/2} g1 + g2^T H^{−1/2} A2^{−1} H^{−1/2} g2 − ‖A2^{1/2} u1 − A1^{1/2} u2‖²
≤ g1^T H^{−1/2} H^{1/2} H1^{−1} H^{1/2} H^{−1/2} g1 + g2^T H^{−1/2} H^{1/2} H2^{−1} H^{1/2} H^{−1/2} g2
= g1^T H1^{−1} g1 + g2^T H2^{−1} g2
≤ ϑ1 + ϑ2 ,

thereby showing that ϑf ≤ ϑ1 + ϑ2.
Theorem 5.3 (self-concordant barriers under affine transformation) Let A ∈ R^{m×n} satisfy rank A = n ≤ m. Suppose that f̄(·) ∈ SCB with complexity value ϑ_f̄ and domain D_f̄ ⊂ R^m, and define f(·) by f(x) = f̄(Ax − b). Then f(·) ∈ SCB and ϑf ≤ ϑ_f̄.

Proof: Fix x ∈ D and define s = Ax − b. Letting ḡ and H̄ denote the gradient and Hessian of f̄(·) at s, and g and H the gradient and Hessian of f(·) at x, we have g = A^T ḡ and H = A^T H̄ A. Then

g^T H^{−1} g = ḡ^T A (A^T H̄ A)^{−1} A^T ḡ = ḡ^T H̄^{−1/2} H̄^{1/2} A (A^T H̄^{1/2} H̄^{1/2} A)^{−1} A^T H̄^{1/2} H̄^{−1/2} ḡ
≤ ḡ^T H̄^{−1/2} H̄^{−1/2} ḡ = ḡ^T H̄^{−1} ḡ ≤ ϑ_f̄

since the matrix H̄^{1/2} A (A^T H̄^{1/2} H̄^{1/2} A)^{−1} A^T H̄^{1/2} is a projection matrix.
Remark 2 The complexity values of the three most-used barriers are asfollows:
1. ϑf = 1 for the barrier f(x) = − ln(x) defined on Df = {x : x > 0}
2. ϑf = k for the barrier f(X) = − ln det(X) defined on Df = {X ∈ S^{k×k} : X ≻ 0}
3. ϑf = 2 for the barrier f(x) = − ln(x1² − Σ_{j=2}^n xj²) defined on Df = {x : ‖(x2, . . . , xn)‖ < x1}
Proof: Item (1.) follows from item (2.), so we first prove (2.). Recall the trace inner product X • Y = Tr(XY) and the Frobenius norm ‖X‖ := √(X • X), as discussed early in the proof of Proposition 4.3. Also recall that for f(X) = − ln det(X) we have g(X) = −X^{−1} and H(X)∆X = X^{−1}∆X X^{−1}. Therefore the Newton step at X, denoted by n(X), is the solution of the following equation:

X^{−1}[n(X)]X^{−1} = X^{−1}

and it follows that n(X) = X. Therefore, using (1) we have

‖n(X)‖_X² = n(X) • X^{−1} n(X) X^{−1} = X • X^{−1} X X^{−1} = X • X^{−1} = Tr(XX^{−1}) = Tr(I) = k

and therefore ϑf = max_{X≻0} ‖n(X)‖_X² = k, which proves (2.) and hence (1.).
In order to prove (3.) we amend our notation a bit, letting Qn = {(t, x) ∈ R¹ × R^{n−1} : ‖x‖ ≤ t}. For (t, x) ∈ int Qn we have ‖x‖ < t, and mechanically we can derive:

g(t, x) = (1/(t² − x^T x)) (−2t, 2x)

H(t, x) = (2/(t² − x^T x)²) [ t² + x^T x   −2t x^T ; −2t x   (t² − x^T x)I + 2xx^T ]

and the Hessian inverse is given by

H(t, x)^{−1} = [ (t² + x^T x)/2   t x^T ; t x   ((t² − x^T x)/2)I + xx^T ] .

Directly plugging in yields

g(t, x)^T H(t, x)^{−1} g(t, x) = 2

from which it follows that ϑf = max_{(t,x)∈int Qn} g(t, x)^T H(t, x)^{−1} g(t, x) = 2.
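The identity g(t, x)^T H(t, x)^{−1} g(t, x) = 2 can be spot-checked numerically; the sketch below uses the gradient and inverse-Hessian formulas above at an arbitrarily chosen interior point of Q³:

```python
# Spot-check of g^T H^{-1} g = 2 for -ln(t^2 - x^T x) with (t, x) in R x R^2.
t, x1, x2 = 2.0, 0.5, -0.3
xx = x1 * x1 + x2 * x2
d = t * t - xx                      # t^2 - x^T x > 0 in the interior
g = [-2.0 * t / d, 2.0 * x1 / d, 2.0 * x2 / d]
# H^{-1} = [[(t^2 + x^T x)/2, t x^T], [t x, ((t^2 - x^T x)/2) I + x x^T]]
Hinv = [[(t * t + xx) / 2, t * x1,          t * x2],
        [t * x1,           d / 2 + x1 * x1, x1 * x2],
        [t * x2,           x1 * x2,         d / 2 + x2 * x2]]
q = sum(g[i] * Hinv[i][j] * g[j] for i in range(3) for j in range(3))
```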
Finally, we present a result showing that "one half of the Dikin ball" approximates the shape of a relevant portion of Df to within a factor of ϑf.
Proposition 5.1 Suppose that f(·) ∈ SCB with complexity value ϑf , letx ∈ Df , and define the following three sets:
(a) S1 := {y : ‖y − x‖x < 1, g(x)T (y − x) ≥ 0} ,
(b) S2 := {y : y ∈ Df , g(x)^T (y − x) ≥ 0} , and

(c) S3 := {y : ‖y − x‖_x < 4ϑf + 1, g(x)^T (y − x) ≥ 0} .
Then S1 ⊂ S2 ⊂ S3.
The proof of Proposition 5.1 is not more involved than other results herein; it is omitted only because it is not necessary for subsequent results. The proposition is stated here to show the power and beauty of self-concordant barriers.
Corollary 5.1 If z is the minimizer of f(·), then
Bz(z, 1) ⊂ Df ⊂ Bz(z, 4ϑf + 1) .
6 The Barrier Method and its Analysis
Our original problem of interest is

P : minimize_x c^T x
s.t. x ∈ S ,

whose optimal objective value we denote by V*. Let f(·) be a self-concordant barrier on Df = int S. For every µ > 0 we create the barrier problem:
Pµ : minimize_x µc^T x + f(x)
s.t. x ∈ Df .
The solution of this problem for each µ is denoted z(µ):
z(µ) := arg minx{µcTx+ f(x) : x ∈ Df} .
Intuitively, as µ → ∞, the impact of the barrier function on the solution of Pµ should become less and less, so we should have c^T z(µ) → V* as µ → ∞. Presuming this is the case, the barrier scheme will use Newton's method to solve for approximate solutions xi of Pµi for an increasing sequence of values µi → ∞.
In order to be more specific about how the barrier scheme might work, let us assume that at each iteration we have some value x ∈ Df that is an approximate solution of Pµ for a given value µ > 0. (We will define "an approximate solution of Pµ" shortly.) We then increase the barrier parameter µ by a multiplicative factor α > 1:
µ← αµ .
Then we take a Newton step at x for the problem Pµ to obtain a new point x that we would like to be an approximate solution of Pµ. If so, we can continue the scheme inductively.

Let x ∈ Df be given, and let us compute the Newton step for Pµ at x. The objective function of Pµ is
hµ(x) := µcTx+ f(x) ,
whereby we have:
(a) ∇hµ(x) = µc+ g(x) and
(b) ∇2hµ(x) = H(x) = Hx .
Therefore the Newton step for hµ(·) at x is:

nµ(x) := −H_x^{−1}(µc + g(x)) = n(x) − µH_x^{−1}c

and the new iterate is:

x := x + nµ(x) = x − H_x^{−1}(µc + g(x)) .
Remark 3 Notice that hµ(·) ∈ SC, since membership in SC has only to do with Hessians, and hµ(·) and f(·) have the same Hessian. However, membership in SCB depends also on gradients, and hµ(·) and f(·) have different gradients, whereby it will typically be true that hµ(·) ∉ SCB (unless c = 0).
We now define what we mean for y to be an “approximate solution”of Pµ.
Definition 6.1 Let γ ∈ [0, 1) be given. We say that y ∈ Df is a γ-approximate solution of Pµ if

‖nµ(y)‖_y ≤ γ .

Notice that this definition is equivalent to

‖n(y) − µH_y^{−1}c‖_y ≤ γ .
Essentially, the definition states that y is a γ-approximate solution of Pµ if the Newton step for Pµ at y is small (measured using the local norm at y). The following theorem gives an explicit optimality gap bound for y when y is a γ = 1/4-approximate solution of Pµ.
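For the one-dimensional barrier f(y) = − ln(y) on S = {y ≥ 0} (a toy instance used here only for illustration), the local norm of the Newton step for Pµ has the closed form ‖nµ(y)‖_y = |µcy − 1|, which makes γ-approximateness easy to inspect:

```python
# gamma-approximateness for P_mu with f(y) = -ln(y): H(y) = 1/y^2,
# n_mu(y) = -(mu*c - 1/y)*y^2, and ||n_mu(y)||_y = |n_mu(y)|/y = |mu*c*y - 1|.
def nmu_norm(y, mu, c):
    step = -(mu * c - 1.0 / y) * y * y
    return abs(step) / y

c, mu = 2.0, 10.0
z = 1.0 / (mu * c)                 # exact minimizer z(mu) of mu*c*y - ln(y)
exact = nmu_norm(z, mu, c)         # zero at the exact center
near = nmu_norm(1.1 * z, mu, c)    # = 0.1 < 1/9, so gamma-approximate for gamma = 1/9
```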
Theorem 6.1 Suppose γ ≤ 1/4, and y ∈ Df is a γ-approximate solution of Pµ. Then

c^T y ≤ V* + (ϑf/µ)(1/(1 − δ)) ,   where δ = γ + 3γ²/(1 − γ)³ .
Proof: From Theorem 4.2 we know that z(µ) exists, and furthermore

‖y − z(µ)‖_y = ‖y + nµ(y) − z(µ) − nµ(y)‖_y ≤ ‖y+ − z(µ)‖_y + ‖nµ(y)‖_y ≤ 3γ²/(1 − γ)³ + γ = δ .

From basic first-order optimality conditions we know that z(µ) satisfies

µc + g(z(µ)) = 0

and from Theorem 5.1 we have

−µc^T (w − z(µ)) = g(z(µ))^T (w − z(µ)) < ϑf for all w ∈ Df .

Rearranging we have

c^T w + ϑf/µ > c^T z(µ) for all w ∈ Df ,

whereby V* + ϑf/µ ≥ c^T z(µ). Now for notational convenience denote z := z(µ)
and observe

c^T y = c^T z + c^T (y − z)
≤ V* + ϑf/µ + c^T H_z^{−1/2} H_z^{1/2} (y − z)
≤ V* + ϑf/µ + ‖H_z^{−1/2} c‖ ‖y − z‖_z
≤ V* + ϑf/µ + √(c^T H_z^{−1} c) · ‖y − z‖_y/(1 − ‖y − z‖_y)
≤ V* + ϑf/µ + √((g(z)/µ)^T H_z^{−1} (g(z)/µ)) · δ/(1 − δ)
= V* + ϑf/µ + (√(g(z)^T H_z^{−1} g(z)) / µ) · δ/(1 − δ)
≤ V* + ϑf/µ + (√ϑf/µ) · δ/(1 − δ)
≤ V* + (ϑf/µ)(1 + δ/(1 − δ)) = V* + ϑf/(µ(1 − δ)) .

The last inequality above follows from the fact (which we will not prove) that ϑf ≥ 1 for any f(·) ∈ SCB.
Note that with γ = 1/9 we have 1/(1 − δ) ≤ 6/5, whereby c^T y ≤ V* + 1.2ϑf/µ.
Theorem 6.2 Let β := 1/4, γ := 1/9, and α := (√ϑ + β)/(√ϑ + γ). Suppose x is a γ-approximate solution of Pµ. Define µ̄ := αµ, and let x̄ be the Newton iterate for Pµ̄ at x, namely

x̄ := x − H(x)^{−1}(µ̄c + g(x)) .

Then

1. x is a β-approximate solution of Pµ̄, and

2. x̄ is a γ-approximate solution of Pµ̄.

Algorithm 2 Barrier Method

Initialize.

Define γ := 1/9, β := 1/4, α := (√ϑ + β)/(√ϑ + γ).

Initialize with µ0 > 0 and x0 ∈ Df that is a γ-approximate solution of Pµ0. i ← 0.

At iteration i :

1. Current values.

µ ← µi

x ← xi

2. Increase µ and take Newton step.

µ ← αµ

x ← x − H(x)^{−1}(µc + g(x))

3. Update values.

µi+1 ← µ

xi+1 ← x
Proof: To prove (1.) we have:

‖n_µ̄(x)‖_x = ‖n_{αµ}(x)‖_x
= ‖H(x)^{−1}(αµc + g(x))‖_x
= ‖α[H(x)^{−1}(µc + g(x))] + (1 − α)H(x)^{−1}g(x)‖_x
≤ α‖H(x)^{−1}(µc + g(x))‖_x + (α − 1)‖H(x)^{−1}g(x)‖_x
≤ αγ + (α − 1)‖n(x)‖_x
≤ αγ + (α − 1)√ϑf = β .

To prove (2.), we invoke Theorem 4.1 (applied to h_µ̄(·), recalling Remark 3) at x, where x̄ = x + n_µ̄(x) is the Newton iterate:

‖n_µ̄(x̄)‖_x̄ ≤ ‖n_µ̄(x)‖_x² / (1 − ‖n_µ̄(x)‖_x)² ≤ β²/(1 − β)² = 1/9 = γ .
Applying Theorem 6.2 inductively, we obtain the basic barrier method for self-concordant barriers as presented in Algorithm 2. The complexity of this scheme is presented below.
Theorem 6.3 Let ε > 0 be the desired optimality tolerance, and define

J := ⌈9√ϑ ln(6ϑ/(5µ0 ε))⌉ .

Then by iteration J of the barrier method the current iterate x satisfies c^T x ≤ V* + ε.
Proof: With the given values of γ, β, α, the quantity δ := γ + 3γ²/(1 − γ)³ satisfies 1/(1 − δ) ≤ 6/5, and

1 − 1/α = 1/(7.2√ϑ + 1.8) ≥ 1/(9√ϑ) .

After J iterations the current iterate x is a γ-approximate solution of Pµ where µ = α^J µ0. Therefore

ln µ0 − ln µ = J ln(1/α) ≤ J(1/α − 1) .

(Note that the inequality above follows from the fact that ln t ≤ t − 1 for t > 0, which itself is a consequence of the concavity of ln(·).) Therefore

ln µ ≥ ln µ0 + J(1 − 1/α) ≥ ln µ0 + J/(9√ϑ) ≥ ln µ0 + ln(6ϑ/(5µ0 ε)) ≥ ln(ϑ/((1 − δ)ε)) .

Therefore µ ≥ ϑ/((1 − δ)ε), and from Theorem 6.1 we have

c^T x ≤ V* + ϑ/(µ(1 − δ)) ≤ V* + ε .
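The whole of Algorithm 2 fits in a few lines for a separable toy instance. The sketch below (min{c^T x : x ≥ 0} in R² with f(x) = − ln(x1) − ln(x2), ϑ = 2, V* = 0 — an example constructed here, not taken from the text) starts on the central path and follows the prescribed α:

```python
import math

c = [1.0, 3.0]
theta = 2.0                         # theta = 2 for -ln(x1) - ln(x2)
gamma, beta = 1.0 / 9.0, 0.25
alpha = (math.sqrt(theta) + beta) / (math.sqrt(theta) + gamma)

mu = 1.0
x = [1.0 / (mu * cj) for cj in c]   # x0 = z(mu0): a 0- (hence gamma-) approximate solution

for i in range(400):
    mu *= alpha
    # one Newton step for h_mu: g(x)_j = -1/x_j, H(x) = diag(1/x_j^2)
    x = [xj - (mu * cj - 1.0 / xj) * xj * xj for xj, cj in zip(x, c)]

gap = sum(cj * xj for cj, xj in zip(c, x))   # = c^T x - V*, bounded by 1.2*theta/mu
```

One Newton step per increase of µ suffices, exactly as Theorem 6.2 predicts; the gap decreases geometrically at rate 1/α per iteration.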
7 Remarks and other Matters
7.1 Nesterov-Nemirovskii definition of self-concordance
The original definition of self-concordance, due to Nesterov and Nemirovskii, is different from the one presented here in Definition 4.1. Let f(·) be a function defined on an open set Df ⊂ Rn. For every x ∈ Df and every vector h ∈ Rn define the univariate function
φx,h(α) := f(x+ αh) .
Then Nesterov and Nemirovskii’s definition of self-concordance is as follows.
Definition 7.1 The function f(·) defined on the domain Df ⊂ Rn is (strongly nondegenerate) self-concordant if:
(a) H(x) � 0 for all x ∈ Df ,
(b) f(x)→∞ as x→ ∂Df , and
(c) for all x ∈ Df and h ∈ Rn, φx,h(·) satisfies |φ′′′_{x,h}(0)| ≤ 2[φ′′_{x,h}(0)]^{3/2} .
Definition 7.1 roughly states that the third derivative of f(·) is bounded by an appropriate function of the second derivative. This definition assumes f(·) is three-times differentiable, unlike Definition 4.1. It turns out that when f(·) is three-times differentiable, Definition 7.1 implies the properties of Definition 4.1 and vice versa; thus the two definitions are equivalent. We prefer Definition 4.1 because proofs of later properties are far easier to establish. The key advantage of Definition 7.1 is that it leads to an easier way to validate the self-concordance of most self-concordant functions, especially − ln det(X) and − ln(t² − x^T x) for the semidefinite and second-order cones, respectively.
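For f(x) = − ln(x), condition (c) in fact holds with equality: φx,h(α) = − ln(x + αh) gives φ′′(0) = h²/x² and φ′′′(0) = −2h³/x³, so |φ′′′(0)| = 2[φ′′(0)]^{3/2}. A quick check (illustrative, not from the text):

```python
# Nesterov-Nemirovskii condition for f(x) = -ln(x):
# phi''(0) = h^2/x^2 and phi'''(0) = -2 h^3/x^3, so (c) holds with equality.
checks = []
for x in (0.3, 1.0, 4.0):
    for h in (-2.0, 0.5, 1.7):
        d2 = h * h / (x * x)
        d3 = -2.0 * h ** 3 / x ** 3
        checks.append(abs(d3) <= 2.0 * d2 ** 1.5 + 1e-9)
```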
7.2 Universal Barrier
One natural question is: given a convex set S ⊂ Rn with nonempty interior, does there exist a self-concordant barrier fS(·) whose domain is Df = int S? Furthermore, if the answer is yes, does there exist a barrier whose complexity value is, say, O(n)? It turns out that the answers to these two questions are both "yes," but the proofs are incredibly difficult.
Let S ⊂ Rn be a closed convex set with nonempty interior. For eachpoint x ∈ intS, define the “local polar of S relative to x”:
PS(x) := {w ∈ Rn : wT (y − x) ≤ 1 for all y ∈ S} .
Basic geometry suggests that as x → ∂S, the volume of PS(x) should approach ∞. Now define the function:
fS(x) := ln (volume (PS(x))) .
It turns out that fS(·) is a self-concordant function, but this is quite hard to prove. Furthermore, there is a universal constant C such that for any dimension n and any closed convex set S with nonempty interior, it is true that
ϑfS(·) ≤ C · n .
This is a remarkable and very deep result. However, it is computationally useless: not only are the function values of fS(·) extremely difficult to compute, but gradient and Hessian information is also extremely difficult to compute.
7.3 Other Matters
(i) getting started,
(ii) other formats for convex optimization,
(iii) ϑ-logarithmic homogeneous barriers for cones,
(iv) primal-dual methods,
(v) computational practice, and
(vi) other self-concordant functions and self-concordant calculus.
7.4 Exercises
1. Prove that if ‖n(x)‖_x > 1/4, then upon setting

y = x + (1/(5‖n(x)‖_x)) n(x) ,

we have f(y) ≤ f(x) − 1/37.5 .
2. Prove Theorem 4.3. To get started, observe that x+ − z = x − z − H_x^{−1}(g(x) − g(z)) (since z is a minimizer and hence g(z) = 0), and then use Fact 3.1 to show that

x+ − z = H_x^{−1/2} ∫_0^1 H_x^{−1/2}[H_x − H(x + t(z − x))]H_x^{−1/2} H_x^{1/2}(x − z) dt .

Next multiply each side by H_x^{1/2} and take norms, and then invoke Lemma 4.1. This should help get you started.
3. Use Theorem 4.3 to show that under the hypothesis of the theorem, the sequence of Newton iterates starting with x, x¹ = x+, x² = (x¹)+, . . ., satisfies

‖x^i − z‖_z < (1/4)(4‖x − z‖_z)^{2^i} .
4. Complete the proof of Proposition 4.1 by proving the "≥" inequality in the definition of self-concordance, using the suggestions at the end of the proof earlier in the text.

5. Complete the proof of Proposition 4.2 by proving the "≥" inequality in the definition of self-concordance, using the suggestions at the end of the proof earlier in the text.
6. The proof of Proposition 4.3 uses the inequalities presented in (4), butonly the left-most inequality is proved in the text herein. Prove theright-most inequality by substituting λmax for λmin and switching ≥for ≤ in the chain of equalities and inequalities in the text.