Convex Analysis - mat.univie.ac.at

Lecture Notes “Convex Analysis”

Summer Semester 2021

Univ.-Prof. Dr. Radu Ioan Bot

Preface

“... in fact, the great watershed in optimization isn’t between linearity and non-linearity, but convexity and nonconvexity.” (R. Tyrrell Rockafellar, 1993)


Contents

I Convex sets
  1 Preliminary notions and results
  2 The relative interior of a convex set in finite-dimensional spaces

II Convex functions
  3 Algebraic properties of convex functions
  4 Topological properties of convex functions

III Conjugate functions and subdifferentiability
  5 Conjugate functions
  6 Convex subdifferential
  7 Directional derivative and differentiability

IV Duality theory
  8 A general perturbation approach
  9 Closedness-type regularity conditions
  10 Fenchel duality
  11 Lagrange duality

V Maximal monotone operators
  12 Monotone set-valued operators
  13 The maximal monotonicity of the convex subdifferential
  14 A representation of monotone operators by convex functions
  15 The maximal monotonicity of the sum of two maximal monotone operators

Bibliography


Chapter I

Convex sets

In this chapter we will discuss some algebraic and topological properties of convex sets.

Throughout this lecture X will denote a nontrivial real normed space with norm ‖ · ‖. For C, D ⊆ X, we denote by C + D := {x + y | x ∈ C, y ∈ D} the Minkowski sum of C and D. For λ ∈ R, let λC := {λx | x ∈ C}.
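The two set operations just introduced can be tried out on finite subsets of R^2. The following is only a minimal sketch: the helper names `minkowski_sum` and `scale` and the sample sets are illustrative assumptions, not part of the lecture. The example also shows that for a nonconvex set C the sum C + C can be strictly larger than 2C.

```python
from itertools import product

def minkowski_sum(C, D):
    """Return C + D = {x + y | x in C, y in D} for finite sets of tuples."""
    return {tuple(a + b for a, b in zip(x, y)) for x, y in product(C, D)}

def scale(lam, C):
    """Return lam * C = {lam * x | x in C} for a finite set of tuples."""
    return {tuple(lam * a for a in x) for x in C}

# Two-point sets in R^2 (hypothetical sample data).
C = {(0, 0), (1, 0)}
D = {(0, 0), (0, 1)}

S = minkowski_sum(C, D)          # the four corners of the unit square
C_plus_C = minkowski_sum(C, C)   # contains (1, 0), which is not in 2C
two_C = scale(2, C)
```

Here `C_plus_C` contains the point (1, 0) while `two_C` does not, which is consistent with the fact that the identity λC + µC = (λ + µ)C is stated in these notes only for convex sets.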

1 Preliminary notions and results

Definition 1.1 (affine set and convex set) A set C ⊆ X is said to be

(a) affine, if λx+ (1− λ)y ∈ C for all x, y ∈ C and all λ ∈ R;

(b) convex, if λx+ (1− λ)y ∈ C for all x, y ∈ C and all λ ∈ [0, 1];

(c) a cone, if λx ∈ C for all x ∈ C and all λ ≥ 0.

Every affine set is convex. The empty set is affine and, therefore, convex, and it is a cone. Every linear subspace of X is an affine set. Every affine set which contains the origin is a linear subspace. A cone C ⊆ X is convex if and only if C + C ⊆ C.

Let C ⊆ X be a nonempty set. We have

C is affine ⇔ ∀x ∈ C the set C − x is a linear subspace

⇔ ∃x ∈ C such that C − x is a linear subspace.

For C an affine set and x ∈ C, one has C − x = C − y for all y ∈ C. The set C − x is called the linear subspace parallel to C and the dimension of the affine set C is defined as being equal to the dimension of C − x.

By induction one can prove that a set C ⊆ X is

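(a) affine if and only if for all $k \in \mathbb{N}$, $x_i \in C$ and $\lambda_i \in \mathbb{R}$, $i = 1, ..., k$, with $\sum_{i=1}^k \lambda_i = 1$, it holds $\sum_{i=1}^k \lambda_i x_i \in C$;

(b) convex if and only if for all $k \in \mathbb{N}$, $x_i \in C$ and $\lambda_i \ge 0$, $i = 1, ..., k$, with $\sum_{i=1}^k \lambda_i = 1$, it holds $\sum_{i=1}^k \lambda_i x_i \in C$.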

The intersection of an arbitrary family of convex sets (cones, affine sets, linear subspaces, respectively) is a convex set (cone, affine set, linear subspace, respectively).

Definition 1.2 (linear hull, affine hull, convex hull, conical hull) Let C ⊆ X. The set

(a) lin C := ∩{D ⊆ X | C ⊆ D, D is a linear subspace} is called the linear hull of C.

(b) aff C := ∩{D ⊆ X | C ⊆ D, D is an affine set} is called the affine hull of C.

(c) co C := ∩{D ⊆ X | C ⊆ D, D is a convex set} is called the convex hull of C.

(d) cone C := ∩{D ⊆ X | C ⊆ D, D is a cone} is called the conical hull of C.

The following result provides a useful characterization of these sets.

Proposition 1.3 Let C ⊆ X. It holds:

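(a) $\operatorname{lin} C = \left\{ \sum_{i=1}^k \lambda_i x_i \mid k \in \mathbb{N},\ x_i \in C,\ \lambda_i \in \mathbb{R},\ i = 1, ..., k \right\}$.

(b) $\operatorname{aff} C = \left\{ \sum_{i=1}^k \lambda_i x_i \mid k \in \mathbb{N},\ x_i \in C,\ \lambda_i \in \mathbb{R},\ i = 1, ..., k,\ \sum_{i=1}^k \lambda_i = 1 \right\}$.

(c) $\operatorname{co} C = \left\{ \sum_{i=1}^k \lambda_i x_i \mid k \in \mathbb{N},\ x_i \in C,\ \lambda_i \ge 0,\ i = 1, ..., k,\ \sum_{i=1}^k \lambda_i = 1 \right\}$.

(d) $\operatorname{cone} C = \{ \lambda x \mid x \in C,\ \lambda \ge 0 \}$.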

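Proof. We will prove only statement (c). The other statements can be proven in an analogous way.

Let $D := \left\{ \sum_{i=1}^k \lambda_i x_i \mid k \in \mathbb{N},\ x_i \in C,\ \lambda_i \ge 0,\ i = 1, ..., k,\ \sum_{i=1}^k \lambda_i = 1 \right\}$. It is obvious that $C \subseteq D$. An arbitrary element $x \in D$ can be represented as $x = \sum_{i=1}^k \lambda_i x_i$, for $k \in \mathbb{N}$, $x_i \in C$ and $\lambda_i \ge 0$, $i = 1, ..., k$, such that $\sum_{i=1}^k \lambda_i = 1$. Since $x_i \in \operatorname{co} C$, $i = 1, ..., k$, and $\operatorname{co} C$ is convex, it holds that $x = \sum_{i=1}^k \lambda_i x_i \in \operatorname{co} C$. This proves that $C \subseteq D \subseteq \operatorname{co} C$.

The conclusion will follow by proving that $D$ is a convex set. Let $x, y \in D$ and $t \in [0, 1]$. Then $x = \sum_{i=1}^k \lambda_i x_i$, for $k \in \mathbb{N}$, $x_i \in C$ and $\lambda_i \ge 0$, $i = 1, ..., k$, such that $\sum_{i=1}^k \lambda_i = 1$, and $y = \sum_{j=1}^l \mu_j y_j$, for $l \in \mathbb{N}$, $y_j \in C$ and $\mu_j \ge 0$, $j = 1, ..., l$, such that $\sum_{j=1}^l \mu_j = 1$. Then it holds

$$t x + (1 - t) y = \sum_{i=1}^k t \lambda_i x_i + \sum_{j=1}^l (1 - t) \mu_j y_j \in D,$$

since $\sum_{i=1}^k t \lambda_i + \sum_{j=1}^l (1 - t) \mu_j = t + 1 - t = 1$. □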

If C ⊆ X is a convex set, then λC + µC = (λ + µ)C for all λ, µ ≥ 0. From here it follows that, if C is a convex set, then cone C is also a convex set.

An operator T : X → Y, where Y is another nontrivial real normed space, is called

(a) linear, if T(λx + µy) = λT(x) + µT(y) for all x, y ∈ X and all λ, µ ∈ R;

(b) affine, if T(λx + (1 − λ)y) = λT(x) + (1 − λ)T(y) for all x, y ∈ X and all λ ∈ R.

It is easy to see that T is affine if and only if x ↦ T(x) − T(0) is linear. If T : X → Y is an affine operator and C ⊆ X and D ⊆ Y are convex sets, then the sets $T^{-1}(D) = \{x \in X \mid T(x) \in D\} \subseteq X$ and T(C) := {T(x) | x ∈ C} ⊆ Y are convex, too.

If $C_i \subseteq X$ are convex sets and $\lambda_i \in \mathbb{R}$, $i = 1, ..., k$, then $T : X \times ... \times X \to X$, $T(x_1, ..., x_k) = \sum_{i=1}^k \lambda_i x_i$, is a linear operator. The set $C_1 \times ... \times C_k$ is a convex subset of the product space $X \times ... \times X$, which implies that $T(C_1 \times ... \times C_k) = \sum_{i=1}^k \lambda_i C_i$ is a convex set.

Proposition 1.4 Let C ⊆ X be a given set and x ∈ aff C. It holds

aff C = x+ lin(C − x).

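Proof. “⊇” We have that C − x ⊆ aff C − x and aff C − x is a linear subspace, therefore lin(C − x) ⊆ aff C − x.

“⊆” Let $y \in \operatorname{aff} C$. Then $y = \sum_{i=1}^k \lambda_i x_i$, for $k \in \mathbb{N}$, $x_i \in C$, $\lambda_i \in \mathbb{R}$, $i = 1, ..., k$, such that $\sum_{i=1}^k \lambda_i = 1$. It holds $y - x = \sum_{i=1}^k \lambda_i x_i - x = \sum_{i=1}^k \lambda_i (x_i - x) \in \operatorname{lin}(C - x)$, which proves that $\operatorname{aff} C - x \subseteq \operatorname{lin}(C - x)$. □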

The dimension of a given set C ⊆ X is defined as being the dimension of its affine hull. The following theorem provides a more precise representation for the convex hull of a set of finite dimension. Before formulating it, we want to recall that the vectors $x_i \in X$, $i = 0, 1, ..., m$, are said to be affinely independent if the dimension of the set $\operatorname{aff}(\{x_0, x_1, ..., x_m\})$ is equal to $m$. Otherwise they are said to be affinely dependent.

According to Proposition 1.4 it holds

$$\operatorname{aff}(\{x_0, x_1, ..., x_m\}) = x_0 + \operatorname{lin}(\{x_1 - x_0, ..., x_m - x_0\}),$$

consequently, $x_0, x_1, ..., x_m$ are affinely independent if and only if $x_1 - x_0, ..., x_m - x_0$ are linearly independent. This is further equivalent to

$$\sum_{i=0}^m \lambda_i x_i = 0,\ \sum_{i=0}^m \lambda_i = 0 \ \Rightarrow\ (\lambda_0, \lambda_1, ..., \lambda_m) = 0. \tag{1.1}$$
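In R^n the characterization of affine independence via linear independence of the differences can be checked mechanically. The following is a minimal sketch, assuming points with integer or rational coordinates given as tuples; the function names `rank` and `affinely_independent` are illustrative and not part of the lecture.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0  # number of pivot rows found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def affinely_independent(points):
    """x0, ..., xm are affinely independent iff x1 - x0, ..., xm - x0
    are linearly independent, i.e. the matrix of differences has rank m."""
    x0, rest = points[0], points[1:]
    diffs = [[a - b for a, b in zip(x, x0)] for x in rest]
    return rank(diffs) == len(diffs)

# A triangle in R^2 is affinely independent, three collinear points are not.
triangle_ok = affinely_independent([(0, 0), (1, 0), (0, 1)])
collinear_ok = affinely_independent([(0, 0), (1, 1), (2, 2)])
```

Exact rational arithmetic is used here on purpose, so that the rank decision does not depend on a floating-point tolerance.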

Theorem 1.5 (Theorem of Carathéodory) Let C ⊆ X be a nonempty set with dim C = k. Then x ∈ co C if and only if x can be represented as a convex combination of k + 1 elements of C, namely, there exist $x_i \in C$ and $\lambda_i \ge 0$, $i = 1, ..., k + 1$, with $\sum_{i=1}^{k+1} \lambda_i = 1$, such that $x = \sum_{i=1}^{k+1} \lambda_i x_i$.

Proof. “⇐” It follows by Proposition 1.3 (c).

“⇒” Let $x \in \operatorname{co} C$. Then there exist $m \in \mathbb{N}$, $x_i \in C$, $\lambda_i \ge 0$, $i = 1, ..., m$, with $\sum_{i=1}^m \lambda_i = 1$, such that $x = \sum_{i=1}^m \lambda_i x_i$. If $m \le k + 1$, then there is nothing to be proven.

Assume that $m > k + 1$. As $\dim \operatorname{aff} C = \dim C = k$ and $x_i \in \operatorname{aff} C$, $i = 1, ..., m$, it follows that $x_1, ..., x_m$ are affinely dependent, which means that there exists $(\mu_1, ..., \mu_m) \ne 0$ such that $\sum_{i=1}^m \mu_i x_i = 0$ and $\sum_{i=1}^m \mu_i = 0$. We define

$$s := \min\left\{ \frac{\lambda_i}{\mu_i} \,\middle|\, \mu_i > 0,\ i = 1, ..., m \right\} \quad \text{and} \quad \bar\lambda_i := \lambda_i - s \mu_i,\ i = 1, ..., m.$$

Since there exists at least one positive $\mu_i$, $i = 1, ..., m$, $s$ is well-defined. Then $\bar\lambda_i \ge 0$ for all $i = 1, ..., m$, $\sum_{i=1}^m \bar\lambda_i = \sum_{i=1}^m \lambda_i - s \sum_{i=1}^m \mu_i = 1$ and $\sum_{i=1}^m \bar\lambda_i x_i = \sum_{i=1}^m \lambda_i x_i - s \sum_{i=1}^m \mu_i x_i = \sum_{i=1}^m \lambda_i x_i = x$. Since at least one of the $\bar\lambda_i$, $i = 1, ..., m$, is equal to zero, this means that $x$ can be represented as a convex combination of $m - 1$ elements in $C$.

If $m - 1 = k + 1$, then the proof is done. Otherwise, we can repeat the procedure until we obtain a representation for $x$ as a convex combination of at most $k + 1$ elements in $C$. □
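The reduction step used in this proof is constructive and can be sketched in code for points in R^n. The following is only an illustration under stated assumptions: coordinates and weights are integers or rationals, and the helper names `null_vector` and `caratheodory_reduce` are hypothetical. The sketch repeats the step until the remaining points are affinely independent, which yields at most dim C + 1 points.

```python
from fractions import Fraction

def null_vector(A):
    """Return a nonzero v with A v = 0 (exact arithmetic), or None if only v = 0 exists."""
    rows = [[Fraction(a) for a in row] for row in A]
    ncols = len(rows[0])
    pivots = []  # pivots[i] is the pivot column of reduced row i
    r = 0
    for c in range(ncols):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][c]
        rows[r] = [a / piv for a in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    free = [c for c in range(ncols) if c not in pivots]
    if not free:
        return None
    v = [Fraction(0)] * ncols
    v[free[0]] = Fraction(1)            # set one free variable to 1
    for i, c in enumerate(pivots):
        v[c] = -rows[i][free[0]]        # solve for the pivot variables
    return v

def caratheodory_reduce(points, weights):
    """Drop points from a convex combination until they are affinely independent."""
    pts = [tuple(Fraction(a) for a in p) for p in points]
    lam = [Fraction(w) for w in weights]
    keep = [i for i, w in enumerate(lam) if w > 0]
    pts, lam = [pts[i] for i in keep], [lam[i] for i in keep]
    while True:
        n = len(pts[0])
        # A mu = 0 encodes sum mu_i x_i = 0 (first n rows) and sum mu_i = 0 (last row).
        A = [[p[j] for p in pts] for j in range(n)] + [[1] * len(pts)]
        mu = null_vector(A)
        if mu is None:                   # points are affinely independent
            return pts, lam
        s = min(l / m for l, m in zip(lam, mu) if m > 0)
        new = [l - s * m for l, m in zip(lam, mu)]
        keep = [i for i, w in enumerate(new) if w > 0]
        pts, lam = [pts[i] for i in keep], [new[i] for i in keep]

# The centre (1, 1) of a square, written with all four corners (sample data).
corners = [(0, 0), (2, 0), (0, 2), (2, 2)]
pts, lam = caratheodory_reduce(corners, [Fraction(1, 4)] * 4)
point = tuple(sum(w * p[j] for p, w in zip(pts, lam)) for j in range(2))
```

Since the sum of the entries of `mu` is zero and `mu` is nonzero, at least one entry is positive, so the minimum defining `s` is taken over a nonempty set, exactly as in the proof.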

The next theorem shows that the interior and the closure of a convex set are also convex sets.

Theorem 1.6 Let C ⊆ X be a convex set. The following statements are true:

(a) clC is a convex set;

(b) for all x ∈ intC, y ∈ clC and λ ∈ (0, 1] it holds λx+ (1− λ)y ∈ intC;

(c) intC is a convex set;


(d) if int C ≠ ∅, then cl(int C) = cl C and int(cl C) = int C.

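Proof. (a) Let $x, y \in \operatorname{cl} C$ and $\lambda \in [0, 1]$. Let $(x_k)_{k \in \mathbb{N}}, (y_k)_{k \in \mathbb{N}} \subseteq C$ such that $x_k \to x$ and $y_k \to y$ as $k \to +\infty$. Then $\lambda x_k + (1 - \lambda) y_k \in C$ for all $k \in \mathbb{N}$, consequently, $\lambda x + (1 - \lambda) y \in \operatorname{cl} C$.

(b) Let $x \in \operatorname{int} C$, $y \in \operatorname{cl} C$ and $\lambda \in (0, 1]$. Then there exists $\varepsilon > 0$ such that $x + B\left(0, \frac{2 - \lambda}{\lambda} \varepsilon\right) = B\left(x, \frac{2 - \lambda}{\lambda} \varepsilon\right) = \left\{ u \in X \mid \|u - x\| < \frac{2 - \lambda}{\lambda} \varepsilon \right\} \subseteq C$. On the other hand, $B(y, \varepsilon) \cap C \ne \emptyset$ or, equivalently, $y \in B(0, \varepsilon) + C$. From here we obtain

$$\begin{aligned}
B(\lambda x + (1 - \lambda) y, \varepsilon) &= \lambda x + (1 - \lambda) y + B(0, \varepsilon) \\
&\subseteq \lambda x + (1 - \lambda) B(0, \varepsilon) + (1 - \lambda) C + B(0, \varepsilon) \\
&= \lambda x + (2 - \lambda) B(0, \varepsilon) + (1 - \lambda) C \\
&= \lambda \left( x + \frac{2 - \lambda}{\lambda} B(0, \varepsilon) \right) + (1 - \lambda) C \subseteq \lambda C + (1 - \lambda) C = C.
\end{aligned}$$

Consequently, $\lambda x + (1 - \lambda) y \in \operatorname{int} C$.

(c) This is a direct consequence of (b).

(d) Let $x \in \operatorname{int} C$. It holds $\operatorname{cl}(\operatorname{int} C) \subseteq \operatorname{cl} C$. Let $y \in \operatorname{cl} C$. According to (b), $\frac{1}{k} x + \left(1 - \frac{1}{k}\right) y \in \operatorname{int} C$ for all $k \in \mathbb{N}$. We let $k \to +\infty$ and obtain $y \in \operatorname{cl}(\operatorname{int} C)$, thus $\operatorname{cl}(\operatorname{int} C) = \operatorname{cl} C$.

It holds $\operatorname{int} C \subseteq \operatorname{int}(\operatorname{cl} C)$. Let $y \in \operatorname{int}(\operatorname{cl} C)$. Then there exists $\varepsilon > 0$ such that $B(y, \varepsilon) \subseteq \operatorname{cl} C$. Let $\lambda < 0$ such that $u := y + \lambda (x - y) \in B(y, \varepsilon) \subseteq \operatorname{cl} C$. It follows from (b) that $y = \frac{-\lambda}{1 - \lambda} x + \frac{1}{1 - \lambda} u \in \operatorname{int} C$, which proves that $\operatorname{int}(\operatorname{cl} C) \subseteq \operatorname{int} C$, thus $\operatorname{int}(\operatorname{cl} C) = \operatorname{int} C$. □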

Remark 1.7 If C ⊆ X is a given set, then cl(coC) is a convex set and it holds

coC ⊆ co(clC) ⊆ cl(coC).

The second inclusion is in general strict, as can be seen for the set C = (R × {0}) ∪ {(0, 1)} ⊆ R^2.
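For the set in Remark 1.7 the three hulls can be written down explicitly. The following short computation is added as an illustration; it uses only the description of the convex hull from Proposition 1.3 (c).

```latex
% C = (\mathbb{R} \times \{0\}) \cup \{(0,1)\} is closed, hence \operatorname{cl} C = C.
% Convex combinations t(0,1) + (1-t)(s,0) = ((1-t)s,\, t), with t \in [0,1] and s \in \mathbb{R}, give
\operatorname{co} C = \operatorname{co}(\operatorname{cl} C)
  = \bigl(\mathbb{R} \times [0,1)\bigr) \cup \{(0,1)\},
\qquad
\operatorname{cl}(\operatorname{co} C) = \mathbb{R} \times [0,1].
% For instance, (1,1) \in \operatorname{cl}(\operatorname{co} C) \setminus \operatorname{co}(\operatorname{cl} C),
% so the inclusion \operatorname{co}(\operatorname{cl} C) \subseteq \operatorname{cl}(\operatorname{co} C) is strict.
```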

In the last part of this section we will propose some refinements of the concept of the interior.

Definition 1.8 (algebraic interior/core) Let C ⊆ X. The algebraic interior orcore of the set C is defined as

coreC := {x ∈ X | ∀u ∈ X ∃δ > 0 such that ∀λ ∈ [0, δ] it holds x+ λu ∈ C}.

One always has that int C ⊆ core C ⊆ C. Indeed, let x ∈ int C and ε > 0 such that B(x, ε) ⊆ C. Let u ∈ X. Then there exists δ > 0 such that x + δu ∈ B(x, ε) ⊆ C. Consequently, for all λ ∈ [0, δ], it holds x + λu ∈ B(x, ε) ⊆ C, thus, x ∈ core C.

Proposition 1.9 Let C ⊆ X be a nonempty convex set. For every x ∈ C it holds:

x ∈ coreC ⇔ ∀u ∈ X ∃δ > 0 such that x+ δu ∈ C ⇔ cone(C − x) = X.

Proof. It is easy to see that x ∈ core C implies ∀u ∈ X ∃δ > 0 such that x + δu ∈ C and also that this statement implies cone(C − x) = X.

Assume that cone(C − x) = X and let u ∈ X. Then there exists δ > 0 such that $u \in \frac{1}{\delta}(C - x)$ or, equivalently, x + δu ∈ C. Let λ ∈ [0, δ]. Then

$$x + \lambda u = \left(1 - \frac{\lambda}{\delta}\right) x + \frac{\lambda}{\delta}(x + \delta u) \in C,$$

which proves that x ∈ core C. □

The following example introduces a convex set with nonempty algebraic interior and empty interior.

Example 1.10 Let X be an infinite-dimensional normed space and $x^\sharp : X \to \mathbb{R}$ a non-continuous linear functional (one can use Hamel bases to construct such a functional). Let $C := \{x \in X \mid |x^\sharp(x)| < 1\}$, which is obviously a convex set.

We prove that core C = C. Let $x \in C$ and $\alpha := x^\sharp(x) \in \mathbb{R}$ with $|\alpha| < 1$. Let $u \in X$. Then

$$\left| x^\sharp\left( x + \frac{1 - |\alpha|}{2(|x^\sharp(u)| + 1)}\, u \right) \right| \le |x^\sharp(x)| + \frac{1 - |\alpha|}{2} < 1,$$

thus $u \in \frac{2(|x^\sharp(u)| + 1)}{1 - |\alpha|}(C - x) \subseteq \operatorname{cone}(C - x)$. This proves that x ∈ core C.

On the other hand, int C = ∅. Indeed, assume that there exists x ∈ int C. Then there exists δ > 0 such that B(x, δ) ⊆ C. This means that for every u ∈ X, ‖u‖ ≤ 1, it holds $x + \frac{\delta}{2} u \in C$, thus $\left| x^\sharp(x) + \frac{\delta}{2} x^\sharp(u) \right| < 1$ and, from here, $|x^\sharp(u)| < \frac{2}{\delta}(1 + |x^\sharp(x)|)$. Since this actually means that $x^\sharp$ is continuous, we get a contradiction.

The following theorem provides conditions for the algebraic interior of a convex set to be equal to the interior of the set.

Theorem 1.11 Let C ⊆ X be a convex set. If one of the following conditions is fulfilled

(a) intC is nonempty;

(b) X is finite-dimensional;

(c) X is a Banach space and C is closed;


then intC = coreC.

Proof. (a) Assume that int C is nonempty and choose y ∈ int C. Let x ∈ core C. Then there exists δ > 0 such that z := x + δ(x − y) = (1 + δ)x − δy ∈ C. Let $t := \frac{\delta}{\delta + 1} \in (0, 1)$. According to Theorem 1.6 (b) we have ty + (1 − t)z ∈ int C. However, $ty + (1 - t)z = \frac{\delta}{\delta + 1} y + \frac{1}{\delta + 1}((1 + \delta)x - \delta y) = x$, which proves that x ∈ int C.

(b) Assume that dim X = n. Let x ∈ core C. Then there exists a basis $\{e_1, ..., e_n\}$ of X such that $x \pm e_i \in C$ for all $i = 1, ..., n$. The function $\|\cdot\|_1 : X \to \mathbb{R}$, $\left\| \sum_{i=1}^n \alpha_i e_i \right\|_1 = \sum_{i=1}^n |\alpha_i|$, defines a norm on X. Since any two norms on finite-dimensional spaces are equivalent, the set $B_{\|\cdot\|_1}(0, 1)$ is a neighbourhood of zero in the norm topology of X.

For every $y \in x + B_{\|\cdot\|_1}(0, 1)$ we have that $y - x = \sum_{i=1}^n \alpha_i e_i$, where $\sum_{i=1}^n |\alpha_i| < 1$. The convexity of C gives

$$y = x + \sum_{i=1}^n \alpha_i e_i = \sum_{i : \alpha_i > 0} \alpha_i (x + e_i) + \sum_{i : \alpha_i < 0} (-\alpha_i)(x - e_i) + \left( 1 - \sum_{i=1}^n |\alpha_i| \right) x \in C,$$

thus, x ∈ int C.

(c) Assume that X is a Banach space and C is closed. Let x ∈ core C. By the definition of the algebraic interior we have that $X = \bigcup_{n \in \mathbb{N}} n(C - x)$. By the Baire Category Theorem (see, for instance, [4, Corollary 10.4]), we have that X is a set of the second category, which means that there exists $n_0 \in \mathbb{N}$ such that $\emptyset \ne \operatorname{int}(\operatorname{cl}(n_0(C - x))) = \operatorname{int}(n_0(C - x))$, the equality holding because C is closed. Consequently, ∅ ≠ int(C − x), which means that there exist u ∈ C − x and ε > 0 such that x + u + B(0, ε) ⊆ C. On the other hand, since x ∈ core C, there exists t > 0 such that x − tu ∈ C. The convexity of C yields

$$x + B\left(0, \frac{\varepsilon t}{t + 1}\right) = \frac{t}{t + 1}\bigl(x + u + B(0, \varepsilon)\bigr) + \frac{1}{t + 1}(x - tu) \subseteq C,$$

thus x ∈ int C. □

Definition 1.12 (algebraic relative interior/intrinsic core) Let C ⊆ X. The algebraic relative interior or intrinsic core of the set C is defined as

icr C := {x ∈ X | ∀y ∈ aff C ∃δ > 0 such that ∀λ ∈ [−δ, δ] it holds x + λ(y − x) ∈ C}.

Proposition 1.13 Let C ⊆ X. Then x ∈ core C if and only if aff C = X and x ∈ icr C.

Proof. “⇐” It follows from the definitions of the intrinsic core and of the core.

“⇒” Let x ∈ core C. It is clear that x ∈ icr C. We will prove that aff C = X. Let u ∈ X. Then there exists λ > 0 such that x + λ(u − x) ∈ C. From here, we get $u \in \frac{1}{\lambda} C + \left(1 - \frac{1}{\lambda}\right) x \subseteq \frac{1}{\lambda} \operatorname{aff} C + \left(1 - \frac{1}{\lambda}\right) \operatorname{aff} C \subseteq \operatorname{aff} C$. This proves that aff C = X. □

As a direct consequence of Proposition 1.13 it follows that, if C ⊆ X is a set with nonempty interior, then aff C = X.

The following proposition provides a useful geometric characterization for the intrinsic core of a convex set.

Proposition 1.14 Let C ⊆ X be a nonempty convex set. For every x ∈ C it holds:

(a) lin(C − x) = cone(C − C), thus, cone(C − C) is the linear subspace parallel to aff C;

(b) x ∈ icr C ⇔ ∀y ∈ C ∃δ > 0 such that (1 + δ)x − δy ∈ C

⇔ cone(C − x) = cone(C − C)

⇔ cone(C − x) is a linear subspace.

Proof. (a) We have C − C = C − x + x − C ⊆ lin(C − x), thus C − x ⊆ C − C ⊆ cone(C − C) ⊆ lin(C − x). The conclusion will follow after proving that cone(C − C) is a linear subspace.

Let u ∈ cone(C − C). Then u = λ(x − y) for λ ≥ 0 and x, y ∈ C. Then αu ∈ cone(C − C) for all α ∈ R.

Let u, v ∈ cone(C − C). Then $u = \lambda_u (x_u - y_u)$ for $\lambda_u \ge 0$ and $x_u, y_u \in C$, and $v = \lambda_v (x_v - y_v)$ for $\lambda_v \ge 0$ and $x_v, y_v \in C$. Then

$$u + v = \lambda_u x_u + \lambda_v x_v - \lambda_u y_u - \lambda_v y_v \in (\lambda_u + \lambda_v)(C - C) \subseteq \operatorname{cone}(C - C).$$

(b) Let x ∈ icr C. From the definition of the intrinsic core it follows that ∀y ∈ C ∃δ > 0 such that (1 + δ)x − δy ∈ C.

Assuming that this property holds, we have that x − C ⊆ cone(C − x). Consequently, C − C = C − x + x − C ⊆ cone(C − x) + cone(C − x) ⊆ cone(C − x) and, further, cone(C − C) ⊆ cone(C − x). Since the opposite inclusion is always true, it holds cone(C − x) = cone(C − C).

Assuming that cone(C − x) = cone(C − C), according to (a), we have that cone(C − x) is a linear subspace.

Assume now that cone(C − x) is a linear subspace. We will prove that x ∈ icr C. Since C − x ⊆ cone(C − x) ⊆ cone(C − C) ⊆ lin(C − x), we have that cone(C − x) = cone(C − C) = lin(C − x) = aff C − x. Let y ∈ aff C, y ≠ x. Then there exist α, β > 0 such that y − x ∈ α(C − x) and x − y ∈ β(C − x). Let δ > 0 such that $-\frac{1}{\beta} < -\delta < \delta < \frac{1}{\alpha}$ and λ ∈ [−δ, δ]. If λ ∈ [0, δ], then x + λ(y − x) ∈ λαC + (1 − λα)x ⊆ C. If λ ∈ [−δ, 0), then x + λ(y − x) ∈ (−λβ)C + (1 + λβ)x ⊆ C. This proves that x ∈ icr C. □


The following notion will be used in the formulation of regularity conditions for strong duality. We will address it in detail in the next section when X is a finite-dimensional space.

Definition 1.15 (strong quasi-relative interior) Let C ⊆ X be a nonempty convex set. The strong quasi-relative interior of the set C is defined as

sqriC := {x ∈ C | cone(C − x) is a closed linear subspace}.

If C is convex, then it holds

intC ⊆ coreC ⊆ sqriC ⊆ icrC ⊆ C,

all inclusions being in general strict.

Remark 1.16 (quasi interior and quasi-relative interior) In the literature one can find two further notions which generalize the interior of a convex set C ⊆ X. These are its quasi interior

qiC := {x ∈ C | cl(cone(C − x)) = X}

and its quasi-relative interior

qriC := {x ∈ C | cl(cone(C − x)) is a linear subspace}.

One has

int C ⊆ core C ⊆ sqri C ⊆ icr C ⊆ qri C ⊆ C and core C ⊆ qi C ⊆ qri C,

all the inclusions being in general strict. If int C ≠ ∅, then all interior notions in the above scheme become equal to int C.

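For $p \in [1, +\infty)$, let $\ell^p$ be the real Banach space of real sequences $(t_n)_{n \in \mathbb{N}}$ such that $\sum_{n=1}^{+\infty} |t_n|^p < +\infty$, equipped with the norm $\|\cdot\| : \ell^p \to \mathbb{R}$, $\|x\| = \left( \sum_{n=1}^{+\infty} |t_n|^p \right)^{1/p}$ for all $x = (t_n)_{n \in \mathbb{N}} \in \ell^p$. For $\ell^p_+ = \{(t_n)_{n \in \mathbb{N}} \in \ell^p : t_n \ge 0\ \forall n \in \mathbb{N}\}$, the positive cone of $\ell^p$, it holds

$$\operatorname{int}(\ell^p_+) = \operatorname{core}(\ell^p_+) = \operatorname{sqri}(\ell^p_+) = \operatorname{icr}(\ell^p_+) = \emptyset,$$

while

$$\operatorname{qi}(\ell^p_+) = \operatorname{qri}(\ell^p_+) = \{(t_n)_{n \in \mathbb{N}} \in \ell^p : t_n > 0\ \forall n \in \mathbb{N}\}.$$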

2 The relative interior of a convex set in finite-dimensional spaces

Throughout this section we assume that X = R^n and that it is endowed with the Euclidean topology.

Definition 2.1 (relative interior) Let C ⊆ R^n be a nonempty convex set. The relative interior of the set C is defined as

riC := {x ∈ C | ∃ε > 0 such that B(x, ε) ∩ aff C ⊆ C}

and it represents the interior of the set C relative to its affine hull. If int C ≠ ∅, then aff C = R^n and ri C = int C.

Example 2.2 For C = [0, 1] × {0} ⊆ R^2 we have int C = ∅, while ri C = (0, 1) × {0}.

Remark 2.3 (a) Let C ⊆ R^n. Since aff C is closed, it holds cl C ⊆ cl(aff C) = aff C. This shows that it makes no sense to consider the closure of the set C relative to its affine hull. We have

aff C ⊆ aff(clC) ⊆ aff(aff C) = aff C,

thus aff C = aff(cl C).

(b) For C1 ⊆ C2 ⊆ R^n, it holds aff C1 ⊆ aff C2, int C1 ⊆ int C2 and cl C1 ⊆ cl C2, however, not necessarily ri C1 ⊆ ri C2. Take, for instance, C1 = [0, 1] × {0} and C2 = [0, 1] × [0, 1].

If, in addition, aff C1 = aff C2, then, obviously, ri C1 ⊆ ri C2.

Proposition 2.4 Let C ⊆ Rn. The following statements are true:

(a) x ∈ riC ⇔ x ∈ C and there exists ε > 0 such that B(x, ε) ∩ aff C ⊆ riC;

(b) if ri C ≠ ∅, then aff(ri C) = aff C = aff(cl C);

(c) ri(riC) = riC.

Proof. (a) “⇐” It follows from the definition of the relative interior.

“⇒” As x ∈ ri C, it holds x ∈ C and there exists ε > 0 such that B(x, ε) ∩ aff C ⊆ C. We will prove that $B\left(x, \frac{\varepsilon}{2}\right) \cap \operatorname{aff} C \subseteq \operatorname{ri} C$. Let $y \in B\left(x, \frac{\varepsilon}{2}\right) \cap \operatorname{aff} C$. Then y ∈ C and, since $B\left(y, \frac{\varepsilon}{2}\right) \subseteq B(x, \varepsilon)$, it holds $B\left(y, \frac{\varepsilon}{2}\right) \cap \operatorname{aff} C \subseteq C$. This proves that y belongs to ri C.

(b) The second equality follows according to Remark 2.3. Let x ∈ C and y ∈ ri C ⊆ C. According to (a), there exists ε > 0 such that B(y, ε) ∩ aff C ⊆ ri C. Since [y, x] := {(1 − λ)y + λx | λ ∈ [0, 1]} ⊆ aff C, it holds B(y, ε) ∩ [y, x] ⊆ ri C. Consequently, there exists λ ∈ (0, 1) such that z := (1 − λ)y + λx ∈ ri C. Therefore $x = \frac{1}{\lambda} z + \left(1 - \frac{1}{\lambda}\right) y \in \operatorname{aff}(\operatorname{ri} C)$, which proves that C ⊆ aff(ri C). From here it follows that aff C ⊆ aff(ri C) ⊆ aff C.

(c) The statement is obviously true for ri C = ∅. Assume that ri C ≠ ∅. According to (a) and (b) we have

x ∈ ri C ⇔ x ∈ C and ∃ε > 0 such that B(x, ε) ∩ aff C ⊆ ri C

⇔ x ∈ ri C and ∃ε > 0 such that B(x, ε) ∩ aff(ri C) ⊆ ri C ⇔ x ∈ ri(ri C).

□

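The following theorem explains the prominent role the concept of relative interior plays for convex sets in finite-dimensional spaces.

Theorem 2.5 Let C ⊆ R^n be a nonempty convex set. Then it holds ri C ≠ ∅.

Proof. Let $m := \dim C$ and $x_0 \in C$. Then $\dim(\operatorname{lin}(C - x_0)) = \dim(\operatorname{aff} C) = m$ and there exist $x_1, ..., x_m \in C$ such that $x_1 - x_0, ..., x_m - x_0$ form a basis for $\operatorname{lin}(C - x_0)$.

We consider the m-dimensional simplex $S := \operatorname{co}\{x_0, x_1, ..., x_m\}$. Since C is a convex set, $S \subseteq C$ and $\operatorname{aff} S \subseteq \operatorname{aff} C$. On the other hand, according to Proposition 1.4, for $z \in \operatorname{aff} C$, it holds $z - x_0 \in \operatorname{lin}(C - x_0)$ and, therefore, there exist $\lambda_1, ..., \lambda_m \in \mathbb{R}$ such that

$$z - x_0 = \sum_{i=1}^m \lambda_i (x_i - x_0) \ \Leftrightarrow\ z = \left( 1 - \sum_{i=1}^m \lambda_i \right) x_0 + \sum_{i=1}^m \lambda_i x_i \in \operatorname{aff} S.$$

This shows that $\operatorname{aff} C = \operatorname{aff} S$ and, consequently, $\operatorname{ri} S \subseteq \operatorname{ri} C$.

Next we will prove that $x := \frac{1}{m + 1} \sum_{i=0}^m x_i \in \operatorname{ri} S$, which will provide the desired conclusion.

Obviously, $x \in \operatorname{aff} S$. For every $u \in \operatorname{aff} S - x$ there exist $\mu_i(u)$, $i = 0, ..., m$, such that $u = \sum_{i=0}^m \mu_i(u) x_i$ and $\sum_{i=0}^m \mu_i(u) = 0$. Since $x_0, x_1, ..., x_m$ are affinely independent, the coefficients $\mu_i(u)$, $i = 0, ..., m$, are uniquely defined. The mapping $u \mapsto (\mu_0(u), \mu_1(u), ..., \mu_m(u))$ from $\operatorname{aff} S - x$ to $\mathbb{R}^{m+1}$ is linear and, therefore, also continuous. This implies that there exists $\varepsilon > 0$ such that for all $u \in \operatorname{aff} S - x$ with $\|u\| < \varepsilon$ it holds $|\mu_i(u)| \le \frac{1}{m + 1}$ for all $i = 0, ..., m$.

We will prove that

$$B(x, \varepsilon) \cap \operatorname{aff} S \subseteq S. \tag{2.1}$$

Let $y \in B(x, \varepsilon) \cap \operatorname{aff} S$. Then $y - x \in \operatorname{aff} S - x$ and $\|y - x\| < \varepsilon$. This means that $|\mu_i(y - x)| \le \frac{1}{m + 1}$ for all $i = 0, ..., m$. On the other hand, it holds

$$y = x + (y - x) = x + \sum_{i=0}^m \mu_i(y - x) x_i = \sum_{i=0}^m \left( \frac{1}{m + 1} + \mu_i(y - x) \right) x_i,$$

$\frac{1}{m + 1} + \mu_i(y - x) \ge 0$ for all $i = 0, ..., m$, and $\sum_{i=0}^m \left( \frac{1}{m + 1} + \mu_i(y - x) \right) = 1$, which implies that $y \in \operatorname{co}\{x_0, x_1, ..., x_m\} = S$. This proves (2.1), consequently, $x \in \operatorname{ri} S$. □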

We will see in the following that in finite-dimensional spaces the intrinsic core, the strong quasi-relative interior and the relative interior of a convex set coincide.

Theorem 2.6 Let C ⊆ Rn be a convex set. It holds

icrC = sqriC = riC.

Proof. The first equality follows from Proposition 1.14 (b), since every linear subspace of R^n is closed. Therefore, we will only prove that icr C = ri C.

Let x ∈ ri C. Then x ∈ C and there exists ε > 0 such that B(x, ε) ∩ aff C ⊆ C. Let y ∈ C. Then there exists δ > 0 such that x + δ(x − y) ∈ B(x, ε) and, since x + δ(x − y) ∈ aff C, it holds x + δ(x − y) ∈ C. According to Proposition 1.14 (b), we have x ∈ icr C.

Let now x ∈ icr C. Then x ∈ C and for every u ∈ aff C − x = lin(C − x) there exists δ > 0 such that for all λ ∈ [−δ, δ] it holds x + λu ∈ C. Let m := dim(lin(C − x)) and {e_1, ..., e_m} a basis of lin(C − x) such that x ± e_i ∈ C for all i = 1, ..., m. As in the proof of Theorem 1.11 (b), one can find a neighbourhood of zero in the topology induced by the norm topology on lin(C − x), which can be represented as B(0, ε) ∩ lin(C − x), for ε > 0, such that x + B(0, ε) ∩ lin(C − x) ⊆ C or, equivalently, B(x, ε) ∩ aff C ⊆ C. This proves that x ∈ ri C. □

In the following we will discuss some properties of the relative interior of a convex set.

Theorem 2.7 Let C ⊆ R^n be a nonempty convex set. The following statements are true:

(a) for all x ∈ riC, y ∈ clC and λ ∈ (0, 1] it holds λx+ (1− λ)y ∈ riC;

(b) riC is a convex set;

(c) cl(riC) = clC and ri(clC) = riC.

Proof. The statements can be proved along the lines of Theorem 1.6 (b)-(d) by taking into account that ri C ≠ ∅ and aff(ri C) = aff C = aff(cl C). □

The following corollary is a direct consequence of Theorem 2.7.

Corollary 2.8 Let C1, C2 ⊆ R^n be given convex sets. The following statements are equivalent:

(i) clC1 = clC2;


(ii) riC1 = riC2;

(iii) riC1 ⊆ C2 ⊆ clC1.

In the following we will give a useful characterization of the relative interior of a convex set.

Theorem 2.9 Let C ⊆ Rn be a nonempty convex set. It holds

x ∈ riC ⇔ ∀y ∈ C ∃λ > 1 such that (1− λ)y + λx ∈ C. (2.2)

Proof. “⇒” Let x ∈ ri C and ε > 0 such that B(x, ε) ∩ aff C ⊆ C. Let y ∈ C. Then (1 − λ)y + λx ∈ aff C for all λ ∈ R and there exists λ ∈ R, λ > 1, such that (1 − λ)y + λx = x + (λ − 1)(x − y) ∈ B(x, ε). This implies (1 − λ)y + λx ∈ C.

“⇐” Since ri C ≠ ∅, we can choose an element y in this set. Consequently, there exists λ > 1 such that z := (1 − λ)y + λx ∈ C. Then $\frac{1}{\lambda} \in (0, 1)$ and, according to Theorem 2.7 (a), $x = \frac{1}{\lambda} z + \left(1 - \frac{1}{\lambda}\right) y \in \operatorname{ri} C$. □
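For a very simple class of sets, the criterion (2.2) can be evaluated exactly. The sketch below assumes that C is a (possibly degenerate) axis-parallel box in R^n with bounds `lower` and `upper` and that x belongs to C; the function names are hypothetical. For such a box, moving beyond x away from y stays in C for some λ > 1 exactly when no coordinate that actually changes already sits on the bound it moves towards, and it is enough to test the finitely many vertices y of the box.

```python
from itertools import product

def can_extend(x, y, lower, upper):
    """For x, y in the box prod_j [lower_j, upper_j]: is there lam > 1
    with (1 - lam) * y + lam * x still in the box?"""
    for xj, yj, lo, hi in zip(x, y, lower, upper):
        d = xj - yj                 # direction of travel in coordinate j
        if d > 0 and not xj < hi:   # moving up, but already at the upper bound
            return False
        if d < 0 and not xj > lo:   # moving down, but already at the lower bound
            return False
        # d == 0: the coordinate does not change
    return True

def in_relative_interior_of_box(x, lower, upper):
    """Check criterion (2.2) against all vertices y of the box (x is assumed to lie in the box)."""
    vertices = product(*zip(lower, upper))
    return all(can_extend(x, v, lower, upper) for v in vertices)

# The segment C = [0, 1] x {0} from Example 2.2, as a degenerate box.
lower, upper = (0, 0), (1, 0)
midpoint_in_ri = in_relative_interior_of_box((0.5, 0), lower, upper)
endpoint_in_ri = in_relative_interior_of_box((0, 0), lower, upper)
```

The midpoint of the segment passes the test while the endpoint (0, 0) fails it (the segment from (1, 0) to (0, 0) cannot be extended beyond (0, 0)), in agreement with ri C = (0, 1) × {0}.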

Remark 2.10 (a) Theorem 2.9 is actually saying that, for all x ∈ ri C and all y ∈ C, the segment [x, y] can be extended beyond its endpoint x without leaving the set C. This means not only that there exists λ > 1 with (1 − λ)y + λx ∈ C, but also that (1 − µ)y + µx ∈ C holds for all µ ∈ [1, λ].

(b) If C ⊆ R^n is a nonempty convex set, then one also has

x ∈ ri C ⇔ ∀y ∈ cl C ∃λ > 1 such that (1 − λ)y + λx ∈ C

⇔ ∀y ∈ C ∃λ > 1 such that (1 − λ)y + λx ∈ ri C.

Let x ∈ ri C and y ∈ cl C. Let ε > 0 such that B(x, ε) ∩ aff C ⊆ C and λ ∈ R, λ > 1, such that (1 − λ)y + λx = x + (λ − 1)(x − y) ∈ B(x, ε). Since (1 − λ)y + λx ∈ aff(cl C) = aff C, it follows (1 − λ)y + λx ∈ C. This proves the first equivalence.

The second equivalence follows from

x ∈ riC ⇔ x ∈ ri(riC)

⇔ ∀y ∈ cl(riC) = clC ∃λ > 1 such that (1− λ)y + λx ∈ riC.

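The next theorem provides formulas for the closure and the relative interior of the intersection of a family of convex sets.

Theorem 2.11 Let $(C_i)_{i \in I} \subseteq \mathbb{R}^n$ be a family of convex sets with $\bigcap_{i \in I} \operatorname{ri} C_i \ne \emptyset$. It holds

$$\operatorname{cl}\left( \bigcap_{i \in I} C_i \right) = \bigcap_{i \in I} \operatorname{cl} C_i. \tag{2.3}$$

In addition, if I is a finite index set, it holds

$$\operatorname{ri}\left( \bigcap_{i \in I} C_i \right) = \bigcap_{i \in I} \operatorname{ri} C_i. \tag{2.4}$$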

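Proof. Let $x \in \bigcap_{i \in I} \operatorname{ri} C_i$ and $y \in \bigcap_{i \in I} \operatorname{cl} C_i$. According to Theorem 2.7 (a), for all $i \in I$ and all $\lambda \in (0, 1]$ it holds $\lambda x + (1 - \lambda) y \in \operatorname{ri} C_i \subseteq C_i$. Consequently, for all $\lambda \in (0, 1]$, $y + \lambda (x - y) \in \bigcap_{i \in I} \operatorname{ri} C_i \subseteq \bigcap_{i \in I} C_i$. We let $\lambda$ converge to zero and obtain $y \in \operatorname{cl}\left( \bigcap_{i \in I} \operatorname{ri} C_i \right) \subseteq \operatorname{cl}\left( \bigcap_{i \in I} C_i \right)$. This proves that $\bigcap_{i \in I} \operatorname{cl} C_i \subseteq \operatorname{cl}\left( \bigcap_{i \in I} C_i \right)$. Since the opposite inclusion is always true, relation (2.3) is proved.

We have shown above that

$$\operatorname{cl}\left( \bigcap_{i \in I} C_i \right) = \bigcap_{i \in I} \operatorname{cl} C_i \subseteq \operatorname{cl}\left( \bigcap_{i \in I} \operatorname{ri} C_i \right) \subseteq \operatorname{cl}\left( \bigcap_{i \in I} C_i \right),$$

consequently, $\operatorname{cl}\left( \bigcap_{i \in I} \operatorname{ri} C_i \right) = \operatorname{cl}\left( \bigcap_{i \in I} C_i \right)$. The sets $\bigcap_{i \in I} \operatorname{ri} C_i$ and $\bigcap_{i \in I} C_i$ are convex, therefore, according to Corollary 2.8, $\operatorname{ri}\left( \bigcap_{i \in I} \operatorname{ri} C_i \right) = \operatorname{ri}\left( \bigcap_{i \in I} C_i \right)$, which implies that $\operatorname{ri}\left( \bigcap_{i \in I} C_i \right) \subseteq \bigcap_{i \in I} \operatorname{ri} C_i$.

We assume now that I is a finite set and choose an arbitrary element $x \in \bigcap_{i \in I} \operatorname{ri} C_i$. Let $y \in \bigcap_{i \in I} C_i$. According to Theorem 2.9, for all $i \in I$, there exists $\lambda_i > 1$ such that $(1 - \lambda_i) y + \lambda_i x \in C_i$. Moreover, as seen in Remark 2.10 (a), for all $i \in I$ and all $\mu_i \in [1, \lambda_i]$ it holds $(1 - \mu_i) y + \mu_i x \in C_i$. We define $\lambda := \min\{\lambda_i : i \in I\} > 1$. For all $i \in I$ it holds $(1 - \lambda) y + \lambda x \in C_i$ or, equivalently, $(1 - \lambda) y + \lambda x \in \bigcap_{i \in I} C_i$. By applying again Theorem 2.9, we obtain $x \in \operatorname{ri}\left( \bigcap_{i \in I} C_i \right)$, which proves that (2.4) holds. □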

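Example 2.12 (a) The following example shows that the assumption $\bigcap_{i \in I} \operatorname{ri} C_i \ne \emptyset$ cannot be omitted to obtain (2.3) and (2.4). Let $C_1 = ((0, +\infty) \times (0, +\infty)) \cup \{(0, 0)\}$ and $C_2 = \mathbb{R} \times \{0\}$. Then $\operatorname{ri} C_1 = (0, +\infty) \times (0, +\infty)$, $\operatorname{ri} C_2 = C_2$, $\operatorname{ri} C_1 \cap \operatorname{ri} C_2 = \emptyset$, while $\operatorname{ri}(C_1 \cap C_2) = \{(0, 0)\}$.

Moreover, $\operatorname{cl}(C_1 \cap C_2) = \{(0, 0)\}$ and $\operatorname{cl} C_1 \cap \operatorname{cl} C_2 = \mathbb{R}_+ \times \{0\}$.

(b) The following example shows that, even if $\bigcap_{i \in I} \operatorname{ri} C_i \ne \emptyset$, (2.4) might not be fulfilled if I is not finite. Let $I = \mathbb{N}$ and $C_n = \left[0, 1 + \frac{1}{n}\right]$, thus $\operatorname{ri} C_n = \left(0, 1 + \frac{1}{n}\right)$, for all $n \in \mathbb{N}$. It holds $\bigcap_{n \in \mathbb{N}} \operatorname{ri} C_n = (0, 1]$, however, $\operatorname{ri}\left( \bigcap_{n \in \mathbb{N}} C_n \right) = \operatorname{ri}([0, 1]) = (0, 1)$.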

In the following we characterize the relative interior and the closure of the image and of the preimage of a convex set under an affine operator.

Theorem 2.13 Let C ⊆ R^n be a convex set and T : R^n → R^m an affine operator. It holds T(cl C) ⊆ cl T(C) and T(ri C) = ri T(C).

Proof. The first statement follows by the continuity of T and does not require C to be convex. The second statement is obviously true if C is empty. We assume that C ≠ ∅. We have

T(ri C) ⊆ T(C) ⊆ T(cl C) = T(cl(ri C)) ⊆ cl T(ri C),

which implies that cl T(ri C) = cl T(C). According to Corollary 2.8, ri T(C) = ri T(ri C) ⊆ T(ri C).

In order to prove the opposite inclusion we choose x ∈ T(ri C) and an arbitrary element y in T(C). Then there exist x′ ∈ ri C and y′ ∈ C such that x = T(x′) and y = T(y′). According to Theorem 2.9, there exists λ > 1 such that (1 − λ)y′ + λx′ ∈ C. This implies that (1 − λ)y + λx = T((1 − λ)y′ + λx′) ∈ T(C) and, since y was chosen arbitrarily in T(C), it holds x ∈ ri T(C). □

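Example 2.14 For the convex closed set $C = \left\{ (x_1, x_2) \in \mathbb{R}^2 \mid x_1 > 0,\ x_2 \ge \frac{1}{x_1} \right\}$ and the affine operator $T : \mathbb{R}^2 \to \mathbb{R}$, $T(x_1, x_2) = x_1$, one can notice that

$$T(\operatorname{cl} C) = T(C) = (0, +\infty) \subseteq \operatorname{cl} T(C) = [0, +\infty),$$

and that the inclusion is strict.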

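Remark 2.15 For a finite family of sets $C_i \subseteq \mathbb{R}^{n_i}$, $i = 1, ..., m$, one has $\operatorname{cl}(C_1 \times ... \times C_m) = \operatorname{cl} C_1 \times ... \times \operatorname{cl} C_m$ and $\operatorname{aff}(C_1 \times ... \times C_m) = \operatorname{aff} C_1 \times ... \times \operatorname{aff} C_m$, which, according to the definition of the relative interior, leads to

$$\operatorname{ri}(C_1 \times ... \times C_m) = \operatorname{ri} C_1 \times ... \times \operatorname{ri} C_m.$$

Considering the convex sets $C_i \subseteq \mathbb{R}^n$, $i = 1, ..., m$, and the affine operator $T : \mathbb{R}^n \times ... \times \mathbb{R}^n \to \mathbb{R}^n$, $T(x^1, ..., x^m) = \sum_{i=1}^m \alpha_i x^i$, where $\alpha_i \in \mathbb{R}$, $i = 1, ..., m$, it holds $T(C_1 \times ... \times C_m) = \sum_{i=1}^m \alpha_i C_i$ and, according to Theorem 2.13,

$$\sum_{i=1}^m \alpha_i \operatorname{cl} C_i \subseteq \operatorname{cl}\left( \sum_{i=1}^m \alpha_i C_i \right) \quad \text{and} \quad \sum_{i=1}^m \alpha_i \operatorname{ri} C_i = \operatorname{ri}\left( \sum_{i=1}^m \alpha_i C_i \right).$$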

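Theorem 2.16 Let $D \subseteq \mathbb{R}^m$ be a convex set and $T : \mathbb{R}^n \to \mathbb{R}^m$ an affine operator such that $T^{-1}(\operatorname{ri} D) \ne \emptyset$. It holds $T^{-1}(\operatorname{cl} D) = \operatorname{cl} T^{-1}(D)$ and $T^{-1}(\operatorname{ri} D) = \operatorname{ri} T^{-1}(D)$.

Proof. Let $U := \mathbb{R}^n \times D \subseteq \mathbb{R}^n \times \mathbb{R}^m$ and $V := \{(x, T(x)) : x \in \mathbb{R}^n\} \subseteq \mathbb{R}^n \times \mathbb{R}^m$. V is an affine set, therefore, $\operatorname{ri} V = V = \operatorname{cl} V$. Since $T^{-1}(\operatorname{ri} D) \ne \emptyset$, there exists $x \in \mathbb{R}^n$ such that $T(x) \in \operatorname{ri} D$. This implies that $\operatorname{ri} U \cap \operatorname{ri} V = (\mathbb{R}^n \times \operatorname{ri} D) \cap V \ne \emptyset$. According to Theorem 2.11 it holds

$$\operatorname{cl}(U \cap V) = \operatorname{cl} U \cap \operatorname{cl} V = \operatorname{cl} U \cap V$$

and

$$\operatorname{ri}(U \cap V) = \operatorname{ri} U \cap \operatorname{ri} V = \operatorname{ri} U \cap V.$$

Let $\operatorname{Pr}_{\mathbb{R}^n} : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$, $\operatorname{Pr}_{\mathbb{R}^n}(x, y) = x$, be the projection operator of $\mathbb{R}^n \times \mathbb{R}^m$ onto $\mathbb{R}^n$. $\operatorname{Pr}_{\mathbb{R}^n}$ is a linear operator which fulfils $\operatorname{Pr}_{\mathbb{R}^n}(U \cap V) = \{x \in \mathbb{R}^n : T(x) \in D\}$. According to Theorem 2.13 it holds

$$T^{-1}(\operatorname{ri} D) = \operatorname{Pr}_{\mathbb{R}^n}(\operatorname{ri} U \cap V) = \operatorname{Pr}_{\mathbb{R}^n}(\operatorname{ri}(U \cap V)) = \operatorname{ri} \operatorname{Pr}_{\mathbb{R}^n}(U \cap V) = \operatorname{ri} T^{-1}(D)$$

and

$$T^{-1}(\operatorname{cl} D) = \operatorname{Pr}_{\mathbb{R}^n}(\operatorname{cl} U \cap V) = \operatorname{Pr}_{\mathbb{R}^n}(\operatorname{cl}(U \cap V)) \subseteq \operatorname{cl} \operatorname{Pr}_{\mathbb{R}^n}(U \cap V) = \operatorname{cl} T^{-1}(D).$$

The inclusion $\operatorname{cl} T^{-1}(D) \subseteq T^{-1}(\operatorname{cl} D)$ follows from the continuity of T and it is fulfilled for arbitrary sets $D \subseteq \mathbb{R}^m$. □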

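Example 2.17 Let $T : \mathbb{R} \to \mathbb{R}$, $T(x) = 0$ for all $x \in \mathbb{R}$. It holds $\operatorname{ri}(0, 1] = \operatorname{ri}[0, 1) = (0, 1)$ and $T^{-1}(\operatorname{ri}(0, 1]) = T^{-1}(\operatorname{ri}[0, 1)) = \emptyset$. Moreover, $\operatorname{cl} T^{-1}((0, 1]) = \emptyset \ne \mathbb{R} = T^{-1}(\operatorname{cl}(0, 1])$ and $T^{-1}(\operatorname{ri}[0, 1)) = \emptyset \ne \mathbb{R} = \operatorname{ri} T^{-1}([0, 1))$. This shows that the condition $T^{-1}(\operatorname{ri} D) \ne \emptyset$ in Theorem 2.16 cannot be omitted.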

The following theorem is another important consequence of Theorem 2.13.

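Theorem 2.18 Let $E \subseteq \mathbb{R}^n \times \mathbb{R}^m$ be a convex set and define $E(x) := \{y \in \mathbb{R}^m : (x, y) \in E\}$ for all $x \in \mathbb{R}^n$. Further, let $C := \{x \in \mathbb{R}^n : E(x) \ne \emptyset\}$. It holds

$$(x, y) \in \operatorname{ri} E \ \Leftrightarrow\ x \in \operatorname{ri} C \text{ and } y \in \operatorname{ri} E(x).$$

Proof. Let $\operatorname{Pr}_{\mathbb{R}^n} : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$, $\operatorname{Pr}_{\mathbb{R}^n}(x, y) = x$, and $\operatorname{Pr}_{\mathbb{R}^m} : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$, $\operatorname{Pr}_{\mathbb{R}^m}(x, y) = y$, be the projection operators of $\mathbb{R}^n \times \mathbb{R}^m$ onto $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. Obviously,

$$(x, y) \in \operatorname{ri} E \ \Leftrightarrow\ x \in \operatorname{Pr}_{\mathbb{R}^n}(\operatorname{ri} E) \text{ and } y \in \operatorname{Pr}_{\mathbb{R}^m}((\{x\} \times \mathbb{R}^m) \cap \operatorname{ri} E).$$

According to Theorem 2.13, we have $\operatorname{Pr}_{\mathbb{R}^n}(\operatorname{ri} E) = \operatorname{ri} \operatorname{Pr}_{\mathbb{R}^n}(E) = \operatorname{ri} C$. On the other hand, Theorem 2.11 and Theorem 2.13 give

$$\operatorname{Pr}_{\mathbb{R}^m}((\{x\} \times \mathbb{R}^m) \cap \operatorname{ri} E) = \operatorname{Pr}_{\mathbb{R}^m}(\operatorname{ri}((\{x\} \times \mathbb{R}^m) \cap E)) = \operatorname{ri} \operatorname{Pr}_{\mathbb{R}^m}((\{x\} \times \mathbb{R}^m) \cap E) = \operatorname{ri} E(x).$$

□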


In the last part of this section we will formulate some separation theorems in finite-dimensional spaces which refine the classical Hahn-Banach Separation Theorems in real normed spaces, that we recall below.

Throughout this lecture $X^*$ will denote the dual space of X. We will use the notation $\langle x^*, x \rangle := x^*(x)$ for the evaluation of $x^* \in X^*$ at $x \in X$.

Theorem 2.19 (Hahn-Banach Separation Theorem; see [4, Theorem 7.6]) Let X be a nontrivial real normed space and C, D ⊆ X two convex sets such that C is open and C ∩ D = ∅. Then there exists $x^* \in X^*$, $x^* \ne 0$, such that

$$\langle x^*, c \rangle > \langle x^*, d \rangle \quad \forall c \in C\ \forall d \in D.$$

Theorem 2.20 (Hahn-Banach Strong Separation Theorem; see [4, Theorem 7.7]) Let X be a nontrivial real normed space and C, D ⊆ X two convex sets such that C is compact, D is closed and C ∩ D = ∅. Then there exists $x^* \in X^*$, $x^* \ne 0$, such that

$$\inf_{c \in C} \langle x^*, c \rangle > \sup_{d \in D} \langle x^*, d \rangle.$$

We start with the following strong separation result, which makes the choice of the separating hyperplane more precise.

Theorem 2.21 Let C ⊆ R^n be a nonempty convex set and d ∉ cl C. Then there exists a ∈ R^n, a ≠ 0, such that

inf{a^T x | x ∈ C} = inf{a^T x | x ∈ cl C} > a^T d.

Moreover, one can choose a in the set cone(cl C − d) such that ‖a‖ = 1.

Proof. The set cl C is convex and closed. Let

f := argmin_{e ∈ cl C} ‖d − e‖

be the projection of d onto cl C (see, for instance, [4, Theorem 16.2]). Since d ∉ cl C, it holds δ := ‖d − f‖² > 0.

For every x ∈ cl C it holds (see, for instance, [4, Lemma 16.3])

(f − d)^T (x − f) ≥ 0 ⇔ (f − d)^T x ≥ (f − d)^T f

⇔ (f − d)^T x ≥ (f − d)^T (f − d) + (f − d)^T d = ‖f − d‖² + (f − d)^T d

⇔ (f − d)^T x ≥ δ + (f − d)^T d.

Therefore, for b := f − d ≠ 0, it holds

inf{b^T x | x ∈ cl C} ≥ δ + b^T d > b^T d. (2.5)

Let a := (1/‖b‖) b ∈ R^n. Then ‖a‖ = 1, a = (1/‖b‖)(f − d) ∈ (1/‖b‖)(cl C − d) ⊆ cone(cl C − d) and

inf{a^T x | x ∈ C} = inf{a^T x | x ∈ cl C} > a^T d.

□
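The construction in the proof can be traced numerically. The following Python sketch is an illustration only: it assumes that cl C is the closed Euclidean unit ball of R^2 (so that the projection has a closed form) and that d = (3, 4); the function names are hypothetical and not part of these notes.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(t * t for t in v))

def project_onto_unit_ball(d):
    """Projection f of d onto the closed unit ball (assumed choice of cl C)."""
    n = norm(d)
    return list(d) if n <= 1.0 else [t / n for t in d]

def separating_direction(d):
    """Return a = (f - d)/||f - d|| as in the proof; requires d outside the ball."""
    f = project_onto_unit_ball(d)
    b = [fi - di for fi, di in zip(f, d)]
    nb = norm(b)
    if nb == 0.0:
        raise ValueError("d lies in cl C, so the theorem does not apply")
    return [t / nb for t in b]

d = [3.0, 4.0]
a = separating_direction(d)
a_dot_d = sum(ai * di for ai, di in zip(a, d))
# By the Cauchy-Schwarz inequality, inf{a^T x : ||x|| <= 1} = -||a||.
inf_over_ball = -norm(a)
print(a, inf_over_ball, a_dot_d)
```

For these data a is approximately (−0.6, −0.8), and the infimum of a^T x over the ball, which equals −1, is strictly larger than a^T d, which is approximately −5.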


Theorem 2.22 and its consequence, Theorem 2.23, which we will formulate below, extend the separation result in Theorem 2.19 to convex sets which are not assumed to have nonempty interior.

Theorem 2.22 Let C ⊆ R^n be a nonempty convex set and d ∉ ri C. Then there exists a ∈ R^n, a ≠ 0, such that

inf{a^T x | x ∈ C} ≥ a^T d

and

sup{a^T x | x ∈ C} > a^T d.

Proof. If d ∉ cl C, then the statement follows from Theorem 2.21. Assume that d ∈ cl C \ ri C. According to Theorem 2.7 it holds d ∉ ri(cl C). This means that for all k ∈ N we must have

B(d, 1/k) ∩ aff(cl C) ⊄ cl C.

Therefore, for all k ∈ N there exists an element d^k ∈ B(d, 1/k) ∩ aff(cl C) = B(d, 1/k) ∩ aff C such that d^k ∉ cl C. According to Theorem 2.21, for all k ∈ N there exists a^k ∈ R^n fulfilling ‖a^k‖ = 1, a^k ∈ cone(cl C − d^k) and

(a^k)^T x > (a^k)^T d^k ∀x ∈ cl C. (2.6)

Since d^k ∈ aff(cl C) = aff C, it holds, according to Proposition 1.4,

cl C − d^k ⊆ aff C − d^k = lin(C − d^k) =: U(aff C) ∀k ∈ N,

where U(aff C) denotes the linear subspace parallel to aff C. Consequently, a^k ∈ cone(cl C − d^k) ⊆ U(aff C) for all k ∈ N.

On the other hand, as ‖a^k‖ = 1 for all k ∈ N, there exist a subsequence (a^{k_j})_{j∈N} ⊆ R^n and an element a ∈ R^n such that lim_{j→+∞} a^{k_j} = a. Therefore, a ∈ U(aff C) and ‖a‖ = 1. From (2.6) it follows that

a^T x = lim_{j→+∞} (a^{k_j})^T x ≥ lim_{j→+∞} (a^{k_j})^T d^{k_j} = a^T d ∀x ∈ cl C,

consequently, inf{a^T x | x ∈ cl C} = inf{a^T x | x ∈ C} ≥ a^T d.

It remains to show that there exists z ∈ C such that a^T z > a^T d. Assume to the contrary that

a^T x = a^T d ∀x ∈ C. (2.7)

Let u ∈ U(aff C) be arbitrarily chosen. Since d ∈ aff C, we have u + d ∈ aff C, consequently, there exist m ≥ 1, x_i ∈ C, λ_i ∈ R, i = 1, ..., m, such that Σ_{i=1}^m λ_i = 1 and u + d = Σ_{i=1}^m λ_i x_i. According to (2.7) it holds

a^T u = a^T (Σ_{i=1}^m λ_i x_i − d) = Σ_{i=1}^m λ_i a^T (x_i − d) = 0.

Since u ∈ U(aff C) was arbitrary and a is itself an element of the linear subspace U(aff C), choosing u := a gives ‖a‖² = a^T a = 0, which contradicts ‖a‖ = 1. Consequently, sup{a^T x | x ∈ C} > a^T d. □

Theorem 2.23 (Proper Separation Theorem) Let C_1, C_2 ⊆ R^n be two nonempty convex sets. Then ri C_1 ∩ ri C_2 = ∅ if and only if C_1 and C_2 are properly separable, which means that there exists a ∈ R^n, a ≠ 0, such that

inf{a^T c | c ∈ C_1} ≥ sup{a^T d | d ∈ C_2}

and

sup{a^T c | c ∈ C_1} > inf{a^T d | d ∈ C_2}.

Proof. "⇒" The set C_1 − C_2 is convex and, since ri(C_1 − C_2) = ri C_1 − ri C_2, it holds 0 ∉ ri(C_1 − C_2). According to Theorem 2.22 there exists a ∈ R^n, a ≠ 0, such that

inf{a^T (c − d) | c ∈ C_1, d ∈ C_2} ≥ 0 ⇔ inf{a^T c | c ∈ C_1} ≥ sup{a^T d | d ∈ C_2}

and

sup{a^T (c − d) | c ∈ C_1, d ∈ C_2} > 0 ⇔ sup{a^T c | c ∈ C_1} > inf{a^T d | d ∈ C_2}.

"⇐" Let C := C_1 − C_2. It holds

inf{a^T x | x ∈ C} ≥ 0 and sup{a^T x | x ∈ C} > 0.

For Z := {x ∈ R^n | a^T x ≥ 0}, we have C ⊆ Z.

Assuming that a^T x = 0 for all x ∈ ri C, we would have that a^T x = 0 for all x ∈ cl(ri C) = cl C, which would contradict sup{a^T x | x ∈ C} > 0. Consequently, the set ri C ∩ ri Z = ri C ∩ {x ∈ R^n | a^T x > 0} is nonempty and, according to Theorem 2.11, ri C ∩ ri Z = ri(C ∩ Z) = ri C, which implies that ri C ⊆ ri Z.

Assuming that ri C_1 ∩ ri C_2 ≠ ∅ ⇔ 0 ∈ ri C, one would have 0 ∈ ri Z = int Z = {x ∈ R^n | a^T x > 0}. Contradiction! □


Chapter II

Convex functions

In this chapter we will discuss algebraic and topological properties of convex functions defined on a nontrivial real normed space X and taking values in the extended real line R̄ := R ∪ {±∞}. The addition and the multiplication are extended in a natural way from R to R̄. Additionally, we make the following conventions:

(+∞) + (−∞) = (−∞) + (+∞) = +∞,
0(+∞) = (+∞)0 = +∞, 0(−∞) = (−∞)0 = 0.

3 Algebraic properties of convex functions

Definition 3.1 (convex function) A function f : X → R̄ is called convex if for all x, y ∈ X and all λ ∈ [0, 1] it holds

f(λx+ (1− λ)y) ≤ λf(x) + (1− λ)f(y). (3.1)

Remark 3.2 (a) Given a nonempty convex set C ⊆ X and f : C → R, we say that f is convex on C if

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) ∀x, y ∈ C ∀λ ∈ [0, 1].

Then C is convex and f is convex on C if and only if the function

f̃ : X → R̄, f̃(x) = f(x), if x ∈ C, and f̃(x) = +∞, otherwise,

is convex (in the sense of Definition 3.1).

(b) It can be proved by induction that a function f : X → R̄ is convex if and only if for all k ∈ N and all x_i ∈ X and λ_i ≥ 0, i = 1, ..., k, with Σ_{i=1}^k λ_i = 1 it holds

f(Σ_{i=1}^k λ_i x_i) ≤ Σ_{i=1}^k λ_i f(x_i).


Example 3.3 (a) The indicator function of a set C ⊆ X is defined as

δ_C : X → R̄, δ_C(x) = 0, if x ∈ C, and δ_C(x) = +∞, otherwise.

Then C is a convex set if and only if δ_C is a convex function.

(b) The function x ↦ ‖x‖ is convex.

(c) Let (H, 〈·, ·〉) be a real Hilbert space and A : H → H a positive (or, positive semidefinite) operator, that is, 〈x, Ax〉 ≥ 0 for all x ∈ H. Then x ↦ (1/2)〈x, Ax〉 is a convex function.

The following sets play a central role in convex analysis.

Definition 3.4 For a given function f : X → R we denote by

(a) dom f = {x ∈ X | f(x) < +∞} its effective domain;

(b) epi f = {(x, r) ∈ X × R | f(x) ≤ r} its epigraph;

(c) epiS f = {(x, r) ∈ X × R | f(x) < r} its strict epigraph.

The following proposition relates the convexity of a function to the convexity of its epigraph.

Proposition 3.5 Let f : X → R̄ be a given function. The following statements are equivalent:

(a) f is convex;

(b) epi f is a convex set;

(c) epiS f is a convex set.

Proof. "(a) ⇒ (c)" Let (x, r), (y, s) ∈ epiS f and λ ∈ [0, 1]. Then f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) < λr + (1 − λ)s, thus

λ(x, r) + (1 − λ)(y, s) = (λx + (1 − λ)y, λr + (1 − λ)s) ∈ epiS f,

which proves that epiS f is a convex set.

"(c) ⇒ (b)" Let (x, r), (y, s) ∈ epi f and λ ∈ [0, 1]. Then, for all k ∈ N, (x, r + 1/k), (y, s + 1/k) ∈ epiS f, and, according to (c), (λx + (1 − λ)y, λr + (1 − λ)s + 1/k) ∈ epiS f. In other words,

f(λx + (1 − λ)y) < λr + (1 − λ)s + 1/k ∀k ∈ N,

which implies f(λx + (1 − λ)y) ≤ λr + (1 − λ)s or, equivalently, λ(x, r) + (1 − λ)(y, s) ∈ epi f.

"(b) ⇒ (a)" Let x, y ∈ X and λ ∈ [0, 1]. If f(x) = +∞ or f(y) = +∞, then (3.1) is obviously fulfilled. Assume that f(x) < +∞ and f(y) < +∞ and choose arbitrary r, s ∈ R such that f(x) < r and f(y) < s. Then (x, r), (y, s) ∈ epi f and, according to (b), λ(x, r) + (1 − λ)(y, s) = (λx + (1 − λ)y, λr + (1 − λ)s) ∈ epi f or, equivalently, f(λx + (1 − λ)y) ≤ λr + (1 − λ)s. We let r and s converge to f(x) and f(y), respectively, and obtain, also in this case, (3.1). □

Remark 3.6 (a) If f : X → R̄ is a convex function, then, since Pr_X : X × R → X, Pr_X(x, r) = x, is a linear operator, dom f = Pr_X(epi f) is a convex set.

(b) If f : X → R̄ is a convex function, then all its lower level sets

{x ∈ X | f(x) ≤ λ}, where λ ∈ R,

are convex. The converse is not true, as can be seen for f : R → R, f(x) = x³.
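The counterexample can be checked numerically. The Python sketch below is only an illustration: the test points x = −2, y = 0, λ = 1/2 and the sample grid are assumptions chosen for convenience, and the grid check is merely consistent with the level sets being intervals; it does not prove it.

```python
def f(x):
    """The cubic from Remark 3.6 (b)."""
    return x ** 3

# Convexity inequality (3.1) fails at the (hypothetical) test points below.
x, y, lam = -2.0, 0.0, 0.5
lhs = f(lam * x + (1 - lam) * y)     # f(-1)
rhs = lam * f(x) + (1 - lam) * f(y)  # average of f(-2) and f(0)
violates_convexity = lhs > rhs

# On a sample grid, each lower level set {f <= level} is an initial segment
# of the sorted grid, which is consistent with the level sets being intervals.
grid = [k / 4 for k in range(-12, 13)]

def level_set_is_initial_segment(level):
    flags = [f(t) <= level for t in grid]
    # Once a grid point leaves the level set, no later point may re-enter it.
    return all(not (later and not earlier) for earlier, later in zip(flags, flags[1:]))

print(lhs, rhs, violates_convexity)
```

Here f(−1) = −1 exceeds the average (f(−2) + f(0))/2 = −4, so inequality (3.1) is violated although every lower level set is an interval.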

The next proposition shows that a convex function which takes the value −∞ at some point is identically −∞ on the intrinsic core of its effective domain.

Proposition 3.7 Let f : X → R̄ be a convex function and x ∈ X such that f(x) = −∞. Then f(z) = −∞ for all z ∈ icr(dom f).

Proof. Let z ∈ icr(dom f). Since dom f is a convex set and x ∈ dom f, according to Proposition 1.14 (b), there exists δ > 0 such that y := (1 + δ)z − δx ∈ dom f. It holds z = (1/(1+δ)) y + (δ/(1+δ)) x, consequently, since f(y) < +∞,

f(z) = f((1/(1+δ)) y + (δ/(1+δ)) x) ≤ (1/(1+δ)) f(y) + (δ/(1+δ)) f(x) = −∞.

□

The following notion will allow us to avoid some pathological classes of extended real-valued functions.

Definition 3.8 (proper function) A function f : X → R̄ is said to be proper if dom f ≠ ∅ and f(x) > −∞ for all x ∈ X.

Sublinear functions form an important subclass of convex functions.

Definition 3.9 (sublinear function) A function f : X → R̄ is called sublinear if it is

(a) subadditive, namely, f(x + y) ≤ f(x) + f(y) for all x, y ∈ X;

(b) positively homogeneous, namely, f(0) = 0 and f(λx) = λf(x) for all λ > 0 and all x ∈ X.


Example 3.10 (a) The support function of a set C ⊆ X, defined as

σ_C : X∗ → R̄, σ_C(x∗) = sup_{x∈C} 〈x∗, x〉,

is sublinear.

(b) A convex function is sublinear if and only if it is positively homogeneous.

(c) Let C ⊆ X be a nonempty convex set. The Minkowski functional of C

p_C : X → R̄, p_C(x) = inf{λ ≥ 0 | x ∈ λC},

is sublinear. It holds p_{B(0,1)} = p_{B̄(0,1)} = ‖·‖, where B̄(0, 1) := cl B(0, 1) = {x ∈ X | ‖x‖ ≤ 1}.

In the following we will present some operations that allow us to construct convex functions from others.

We start by noticing that, if f : X → R̄ is a convex function and λ ≥ 0, then λf is also convex. If f, g : X → R̄ are two convex functions, then f + g is also convex.

Proposition 3.11 Let f_i : X → R̄, i ∈ I, be a family of convex functions. Then its pointwise supremum f : X → R̄, f(x) = sup_{i∈I} f_i(x), is also a convex function.

Proof. The result follows from epi f = ∩_{i∈I} epi f_i and Proposition 3.5. □

In the following we will assume that Y is another nontrivial real normed space.

Proposition 3.12 (infimal value function) Let Φ : X × Y → R̄ be a convex function. Then the infimal value function of Φ

h : Y → R̄, h(y) = inf_{x∈X} Φ(x, y),

is convex.

Proof. Let Pr_{Y×R} : X × Y × R → Y × R, Pr_{Y×R}(x, y, r) = (y, r). One has

(y, s) ∈ epiS h ⇔ h(y) < s ⇔ ∃x ∈ X such that Φ(x, y) < s

⇔ ∃x ∈ X such that (x, y, s) ∈ epiS Φ ⇔ (y, s) ∈ Pr_{Y×R}(epiS Φ),

which shows that epiS h = Pr_{Y×R}(epiS Φ). Since Pr_{Y×R} is a linear operator and epiS Φ is convex, epiS h is convex, and the convexity of h follows via Proposition 3.5. □

If f : Y → R̄ is a convex function and T : X → Y is an affine operator, then, obviously, f ◦ T : X → R̄ is also convex.

Proposition 3.13 (infimal function through an operator) Let f : X → R̄ be a convex function and T : X → Y an affine operator. The infimal function of f through T

Tf : Y → R̄, (Tf)(y) = inf{f(x) | Tx = y},

is convex and it fulfills dom Tf = T(dom f).

Proof. The graph of the operator T, gr T := {(x, y) ∈ X × Y | Tx = y}, is a convex set, thus Φ : X × Y → R̄, Φ(x, y) = f(x) + δ_{gr T}(x, y), is a convex function. By Proposition 3.12 it follows that (Tf)(y) = inf_{x∈X} Φ(x, y) is also convex. Moreover,

y ∈ dom Tf ⇔ (Tf)(y) < +∞ ⇔ ∃x ∈ X with Tx = y and f(x) < +∞

⇔ ∃x ∈ dom f with Tx = y ⇔ y ∈ T(dom f).

□

Proposition 3.14 (infimal convolution) Let f_i : X → R̄, i = 1, ..., m, be convex functions. The infimal convolution of the functions f_1, ..., f_m

f_1 □ ... □ f_m : X → R̄, (f_1 □ ... □ f_m)(x) = inf{Σ_{i=1}^m f_i(x_i) | Σ_{i=1}^m x_i = x},

is convex and it fulfills dom(f_1 □ ... □ f_m) = Σ_{i=1}^m dom f_i.

Proof. Let f : X × ... × X → R̄, f(x_1, ..., x_m) = Σ_{i=1}^m f_i(x_i), and A : X × ... × X → X, A(x_1, ..., x_m) = Σ_{i=1}^m x_i. Then f is a convex function and A is a linear operator. According to the previous proposition, the infimal function of f through A, defined for all x ∈ X by

Af(x) = inf{f(x_1, ..., x_m) | A(x_1, ..., x_m) = x}

= inf{Σ_{i=1}^m f_i(x_i) | Σ_{i=1}^m x_i = x} = (f_1 □ ... □ f_m)(x),

is also a convex function, and

dom(f_1 □ ... □ f_m) = dom Af = A(dom f) = A(dom f_1 × ... × dom f_m) = Σ_{i=1}^m dom f_i.

□

Example 3.15 (distance function to a convex set) Let C ⊆ X be a convex set. Then δ_C is a convex function, thus δ_C □ ‖·‖ is a convex function, too. Since for all x ∈ X

(δ_C □ ‖·‖)(x) = inf{‖x − y‖ + δ_C(y) | y ∈ X} = inf{‖x − y‖ | y ∈ C} = d_C(x),

the distance function to the convex set C, d_C : X → R̄, d_C(x) = inf_{y∈C} ‖x − y‖, is convex.
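The identity between the infimal convolution and the distance function can be observed on a small example. The following Python sketch is an illustration under stated assumptions: X = R, C = [−1, 1], and the infimum over y ∈ X is replaced by a minimum over a finite grid; all names are hypothetical.

```python
import math

def indicator(y, lo=-1.0, hi=1.0):
    """Indicator function of the interval C = [lo, hi] (an assumed example set)."""
    return 0.0 if lo <= y <= hi else math.inf

def inf_convolution_on_grid(x, grid):
    """Grid approximation of (delta_C box |.|)(x) = inf_y { delta_C(y) + |x - y| }."""
    return min(indicator(y) + abs(x - y) for y in grid)

def distance_to_interval(x, lo=-1.0, hi=1.0):
    """Closed-form distance from x to [lo, hi]."""
    return max(lo - x, 0.0, x - hi)

# Grid of quarters in [-5, 5]; quarters are exactly representable as floats.
grid = [k / 4 for k in range(-20, 21)]
values = [(x, inf_convolution_on_grid(x, grid), distance_to_interval(x)) for x in grid]
print(values[0], values[-1])
```

On this grid both quantities agree at every grid point, because the nearest point of C to a grid point is itself a grid point; for a general set C a grid only yields an upper bound for the infimum.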


4 Topological properties of convex functions

This section is dedicated to the study of the most important topological properties of convex functions.

Definition 4.1 (lower and upper semicontinuity) Let f : X → R̄ and x ∈ X. We say that f is:

(a) lower semicontinuous at x if

lim inf_{y→x} f(y) = sup_{δ>0} inf_{y∈B(x,δ)} f(y) ≥ f(x); (4.1)

(b) upper semicontinuous at x if (−f) is lower semicontinuous at x;

(c) continuous at x if f(x) ∈ R and f is lower semicontinuous and upper semicontinuous at x;

(d) lower (upper) semicontinuous if it is lower (upper) semicontinuous at every x ∈ X.

Remark 4.2 Let f : X → R̄ and x ∈ X.

(a) It is easy to see that the function f is lower semicontinuous at x ∈ X if and only if actually

lim inf_{y→x} f(y) = f(x).

(b) The function f is lower semicontinuous at x ∈ X if and only if for every sequence (x_k)_{k∈N} ⊆ X with x_k → x (k → +∞) it holds

lim inf_{k→+∞} f(x_k) = sup_{n≥1} inf_{k≥n} f(x_k) ≥ f(x).

(c) If f(x) ∈ R, then f is lower semicontinuous at x ∈ X if and only if

∀ε > 0 ∃δ > 0 such that f(y) ≥ f(x) − ε ∀y ∈ B(x, δ).

The following proposition characterizes the lower semicontinuity of a function in terms of the closedness of its epigraph and of the closedness of its lower level sets (see also Proposition 3.5 and Remark 3.6 (b)).

Proposition 4.3 Let f : X → R be a given function. The following statementsare equivalent:

(a) f is lower semicontinuous;

(b) epi f is a closed set in X × R;

(c) for every λ ∈ R the lower level set {x ∈ X | f(x) ≤ λ} is closed.


Proof. "(a) ⇒ (b)" Assume that there exists (x, r) ∈ cl(epi f) \ epi f. In other words, r < f(x). Let ε > 0 be such that r + ε < f(x). Since f is lower semicontinuous at x, there exists δ > 0 such that inf_{y∈B(x,δ)} f(y) > r + ε, which implies that (B(x, δ) × (r − ε, r + ε)) ∩ epi f = ∅. This contradicts (x, r) ∈ cl(epi f).

"(b) ⇒ (c)" Let λ ∈ R be fixed and assume that there exists an element x ∈ cl({z ∈ X | f(z) ≤ λ}) such that f(x) > λ. This means that (x, λ) ∉ epi f, consequently, since epi f is closed, there exist ε > 0 and δ > 0 such that (B(x, δ) × (λ − ε, λ + ε)) ∩ epi f = ∅. This implies that f(z) > λ for all z ∈ B(x, δ), which contradicts the fact that x ∈ cl({z ∈ X | f(z) ≤ λ}).

”(c) ⇒ (a)” Let x ∈ X. We will prove that (4.1) is fulfilled. If f(x) = −∞,then there is nothing to be proved.

If f(x) = +∞, then, for all n ∈ N, x ∈ {z ∈ X | f(z) > n}, which is an open set. Consequently, for all n ∈ N, there exists δ_n > 0 such that B(x, δ_n) ⊆ {z ∈ X | f(z) > n}. Thus, inf_{y∈B(x,δ_n)} f(y) ≥ n. This yields sup_{δ>0} inf_{y∈B(x,δ)} f(y) = +∞ = f(x).

Finally, assume that f(x) ∈ R and choose an arbitrary ε > 0. The set {z ∈ X | f(z) > f(x) − ε} is open and it contains x. Consequently, there exists δ_ε > 0 such that B(x, δ_ε) ⊆ {z ∈ X | f(z) > f(x) − ε}, which means that inf_{y∈B(x,δ_ε)} f(y) ≥ f(x) − ε. Consequently,

sup_{δ>0} inf_{y∈B(x,δ)} f(y) ≥ inf_{y∈B(x,δ_ε)} f(y) ≥ f(x) − ε.

Taking into account that ε > 0 was chosen arbitrarily, (4.1) follows. □

Example 4.4 Let C ⊆ X be a given set. Since epi δ_C = C × R_+, δ_C is a lower semicontinuous function if and only if C is a closed set.

The following proposition is the topological counterpart of Proposition 3.11 and it follows from epi f = ∩_{i∈I} epi f_i and Proposition 4.3.

Proposition 4.5 Let f_i : X → R̄, i ∈ I, be a family of lower semicontinuous functions. Then its pointwise supremum f : X → R̄, f(x) = sup_{i∈I} f_i(x), is also a lower semicontinuous function.

The notion introduced next is the largest lower semicontinuous function majorized by a given function.

Definition 4.6 (lower semicontinuous hull) Let f : X → R̄ be a given function. The function

f̄ : X → R̄, f̄(x) = inf{t | (x, t) ∈ cl(epi f)},

is called the lower semicontinuous hull of f.


Theorem 4.7 Let f : X → R̄ be a given function and f̄ its lower semicontinuous hull. The following statements are true:

(a) epi f̄ = cl(epi f), consequently, f̄ is lower semicontinuous;

(b) dom f ⊆ dom f̄ ⊆ cl(dom f);

(c) f̄(x) = lim inf_{y→x} f(y) for all x ∈ X.

Proof. We have epi f ⊆ cl(epi f) ⊆ epi f̄ and

f̄(x) = inf{t | (x, t) ∈ cl(epi f)} ≤ inf{t | (x, t) ∈ epi f} = f(x) ∀x ∈ X.

(a) Let (x, r) ∈ epi f̄. Then f̄(x) = inf{t | (x, t) ∈ cl(epi f)} ≤ r and for all n ∈ N there exists t_n ∈ R such that (x, t_n) ∈ cl(epi f) and t_n < r + 1/n. Consequently, (x, r + 1/n) ∈ cl(epi f) for all n ∈ N, which implies that (x, r) ∈ cl(epi f).

(b) The fact that dom f ⊆ dom f̄ follows from f̄ ≤ f. In order to prove the second inclusion we choose an arbitrary element x ∈ dom f̄. Then there exists r ∈ R such that f̄(x) ≤ r or, equivalently, (x, r) ∈ epi f̄ = cl(epi f). Let δ > 0. Since (B(x, δ) × (r − 1, r + 1)) ∩ epi f ≠ ∅, we have that B(x, δ) ∩ dom f ≠ ∅. This proves that x ∈ cl(dom f).

(c) Let x ∈ X. Since f̄ ≤ f and f̄ is lower semicontinuous, it holds

f̄(x) = lim inf_{y→x} f̄(y) ≤ lim inf_{y→x} f(y).

Assume that f̄(x) < lim inf_{y→x} f(y). Then there exists ε ∈ R such that

f̄(x) < ε < lim inf_{y→x} f(y) = sup_{δ>0} inf_{y∈B(x,δ)} f(y).

From here it follows that there exists δ_ε > 0 such that inf_{y∈B(x,δ_ε)} f(y) > ε. In other words, for all y ∈ B(x, δ_ε), f(y) > ε or, equivalently, (B(x, δ_ε) × (−∞, ε)) ∩ epi f = ∅. Let r ∈ R be such that f̄(x) < r < ε. Then (x, r) ∈ epi f̄ = cl(epi f) and, since (x, r) belongs to the open set B(x, δ_ε) × (−∞, ε), it follows that (B(x, δ_ε) × (−∞, ε)) ∩ epi f ≠ ∅. Contradiction! □

Remark 4.8 The function sup{g | g ≤ f and g is lower semicontinuous} is, according to Proposition 4.5, lower semicontinuous. It coincides with the lower semicontinuous hull of f, that is,

f̄ = sup{g | g ≤ f and g is lower semicontinuous}. (4.2)

Indeed, since f̄ ≤ f and f̄ is lower semicontinuous, we have

f̄ ≤ sup{g | g ≤ f and g is lower semicontinuous}.

Let h : X → R̄ be an arbitrary lower semicontinuous function with h ≤ f. It holds epi f̄ = cl(epi f) ⊆ cl(epi h) = epi h, which yields h ≤ f̄. This shows that (4.2) is true.


Next we will recall some basic facts related to the weak topology on X induced by X∗. The collection of sets

V_w(0) := {V_{x∗_1,...,x∗_n,ε}(0) | n ∈ N, x∗_i ∈ X∗, i = 1, ..., n, ε > 0},

where

V_{x∗_1,...,x∗_n,ε}(0) := {y ∈ X | |〈x∗_i, y〉| < ε ∀i = 1, ..., n},

forms a basis of neighbourhoods of 0 for the weak topology (every neighbourhood of 0 contains an element V ∈ V_w(0)). Consequently, for x ∈ X, the collection of sets

V_w(x) := {V_{x∗_1,...,x∗_n,ε}(x) | n ∈ N, x∗_i ∈ X∗, i = 1, ..., n, ε > 0},

where

V_{x∗_1,...,x∗_n,ε}(x) := x + V_{x∗_1,...,x∗_n,ε}(0) = {y ∈ X | |〈x∗_i, y − x〉| < ε ∀i = 1, ..., n},

forms a basis of neighbourhoods of x (every neighbourhood of x contains an element V ∈ V_w(x)).

The weak topology on X is the weakest topology on X making all mappings 〈x∗, ·〉 : X → R, x∗ ∈ X∗, continuous. A sequence (x_n)_{n∈N} ⊆ X converges in the weak topology to x ∈ X (we write x_n →^{w} x) as n → +∞ if and only if for all x∗ ∈ X∗ the sequence (〈x∗, x_n〉)_{n∈N} converges to 〈x∗, x〉 as n → +∞. If a sequence (x_n)_{n∈N} ⊆ X converges in the norm topology to x ∈ X (we write, when necessary, x_n →^{‖·‖} x) as n → +∞, then it also converges in the weak topology to x ∈ X as n → +∞. The converse is in general not true, as can be seen for the canonical basis sequence (e_n)_{n∈N} in ℓ^p, 1 < p < ∞, or in c_0, which converges weakly to 0, while ‖e_n‖ = 1 for all n ∈ N.

For an arbitrary set C ⊆ X we have

C ⊆ cl C ⊆ cl_w C,

where cl_w C denotes the weak closure of C, which is the closure of C in the weak topology. Indeed, let x ∈ cl C and n ∈ N, x∗_i ∈ X∗, i = 1, ..., n, and ε > 0. Let δ := ε / (max{‖x∗_i‖ | i = 1, ..., n} + 1). Then there exists y ∈ C such that y ∈ B(x, δ). From here we get that for every i = 1, ..., n, it holds |〈x∗_i, y − x〉| ≤ ‖x∗_i‖‖y − x‖ < ε, thus y ∈ V_{x∗_1,...,x∗_n,ε}(x). This proves that x ∈ cl_w C.

The following theorem shows that the (norm) closure and the weak closure of a convex set coincide.

Theorem 4.9 Let C ⊆ X be a convex set. Then it holds

cl C = cl_w C.

Proof. In view of the inclusion cl C ⊆ cl_w C shown above, it suffices to prove that cl_w C ⊆ cl C. Let x ∈ cl_w C and assume that x ∉ cl C. By the Hahn-Banach Strong Separation Theorem (see Theorem 2.20), applied to the sets {x} and cl C, there exists x∗ ∈ X∗, x∗ ≠ 0, such that

inf_{c∈C} 〈x∗, c〉 > 〈x∗, x〉,

therefore, there exists ε > 0 such that inf_{c∈C} 〈x∗, c − x〉 > ε. This shows that V_{x∗,ε}(x) ∩ C = ∅, which contradicts x ∈ cl_w C. □

Definition 4.10 (weak lower semicontinuity) Let f : X → R̄ and x ∈ X. We say that f is weakly lower semicontinuous at x if

lim inf_{y →^{w} x} f(y) = sup_{V∈V_w(x)} inf_{y∈V} f(y) ≥ f(x). (4.3)

We say that f is weakly lower semicontinuous if it is weakly lower semicontinuous at every x ∈ X.

Weakly lower semicontinuous functions can be characterized in terms of the weak closedness of their epigraphs and also of the weak closedness of all their lower level sets, as in Proposition 4.3. This fact, together with Theorem 4.9, leads to the following result.

Theorem 4.11 Let f : X → R̄ be a convex function. Then f is lower semicontinuous if and only if it is weakly lower semicontinuous.

Let us now introduce the weak∗ topology on X∗ induced by X. The collection of sets

V_{w∗}(0) := {V_{x_1,...,x_n,ε}(0) | n ∈ N, x_i ∈ X, i = 1, ..., n, ε > 0},

where

V_{x_1,...,x_n,ε}(0) := {y∗ ∈ X∗ | |〈y∗, x_i〉| < ε ∀i = 1, ..., n},

forms a basis of neighbourhoods of 0 for the weak∗ topology (every neighbourhood of 0 contains an element V ∈ V_{w∗}(0)). Consequently, for x∗ ∈ X∗, the collection of sets

V_{w∗}(x∗) := {V_{x_1,...,x_n,ε}(x∗) | n ∈ N, x_i ∈ X, i = 1, ..., n, ε > 0},

where

V_{x_1,...,x_n,ε}(x∗) := x∗ + V_{x_1,...,x_n,ε}(0) = {y∗ ∈ X∗ | |〈y∗ − x∗, x_i〉| < ε ∀i = 1, ..., n},

forms a basis of neighbourhoods of x∗ (every neighbourhood of x∗ contains an element V ∈ V_{w∗}(x∗)).

The weak∗ topology on X∗ is the weakest topology on X∗ making all mappings 〈·, x〉 : X∗ → R, x ∈ X, continuous. A sequence (x∗_n)_{n∈N} ⊆ X∗ converges in the weak∗ topology to x∗ ∈ X∗ (we write x∗_n →^{w∗} x∗) as n → +∞ if and only if for all x ∈ X the sequence (〈x∗_n, x〉)_{n∈N} converges to 〈x∗, x〉 as n → +∞.


Definition 4.12 (weak∗ lower semicontinuity) Let f : X∗ → R̄ and x∗ ∈ X∗. We say that f is weakly∗ lower semicontinuous at x∗ if

lim inf_{y∗ →^{w∗} x∗} f(y∗) = sup_{V∈V_{w∗}(x∗)} inf_{y∗∈V} f(y∗) ≥ f(x∗). (4.4)

We say that f is weakly∗ lower semicontinuous if it is weakly∗ lower semicontinuous at every x∗ ∈ X∗.

Example 4.13 Let C ⊆ X be a given set. According to Proposition 4.5 (applied in the weak∗ topology), the support function of C, σ_C : X∗ → R̄, σ_C(x∗) = sup_{x∈C} 〈x∗, x〉, is weakly∗ lower semicontinuous.

The following proposition shows that convex and lower semicontinuous functions which take the value −∞ cannot take finite values.

Proposition 4.14 Let f : X → R̄ be convex and lower semicontinuous. If there exists x ∈ X with f(x) = −∞, then f(z) = −∞ for all z ∈ dom f. Consequently, if f is sublinear and lower semicontinuous, then f is proper.

Proof. Assume that there exists z ∈ X such that f(z) = t ∈ R. We have (z, t) ∈ epi f and (x, t − n) ∈ epi f for all n ∈ N. Consequently, for all n ∈ N

(1/n)(x, t − n) + (1 − 1/n)(z, t) = ((1/n)x + (1 − 1/n)z, t − 1) ∈ epi f,

thus, letting n → +∞, (z, t − 1) ∈ cl(epi f) = epi f. This implies that f(z) ≤ f(z) − 1. Contradiction! □

Next we provide an important characterization of functions which are both convex and lower semicontinuous. To this end we first introduce the following notion.

Definition 4.15 (affine-continuous minorant) Let f : X → R̄ be a given function. We say that x ↦ 〈x∗, x〉 + c, where (x∗, c) ∈ X∗ × R, is an affine-continuous minorant of f if 〈x∗, x〉 + c ≤ f(x) for all x ∈ X.

Theorem 4.16 Let f : X → R be a given function. Then f is convex andlower semicontinuous with f(x) > −∞ for all x ∈ X if and only if the set of itsaffine-continuous minorants is nonempty and

f(x) = sup{〈x∗, x〉+ c | (x∗, c) ∈ X∗ × R, 〈x∗, y〉+ c ≤ f(y) ∀y ∈ X} ∀x ∈ X.


Proof. "⇐" The statement follows from Proposition 3.11 and Proposition 4.5.

"⇒" First we prove that the set

{(x∗, c) ∈ X∗ × R | 〈x∗, y〉 + c ≤ f(y) ∀y ∈ X}

is nonempty.

If f is identically +∞, then this is obviously the case. Let now z ∈ X be such that f(z) ∈ R. Since (z, f(z) − 1) ∉ epi f, according to Theorem 2.20, there exists (x∗, α) ∈ X∗ × R, (x∗, α) ≠ (0, 0), such that

〈x∗, x〉 + αr > 〈x∗, z〉 + α(f(z) − 1) ∀(x, r) ∈ epi f. (4.5)

Since (z, f(z)) ∈ epi f, it holds α > 0, thus, (4.5) can be equivalently written as

r > (1/α)〈x∗, z − x〉 + f(z) − 1 ∀(x, r) ∈ epi f.

For all x ∈ dom f we have (x, f(x)) ∈ epi f, consequently,

f(x) > (1/α)〈x∗, z − x〉 + f(z) − 1.

As this inequality holds also for x ∉ dom f,

x ↦ 〈−(1/α)x∗, x〉 + (1/α)〈x∗, z〉 + f(z) − 1

is an affine-continuous minorant of f.

It obviously holds

f(x) ≥ sup{〈x∗, x〉 + c | (x∗, c) ∈ X∗ × R, 〈x∗, y〉 + c ≤ f(y) ∀y ∈ X} ∀x ∈ X. (4.6)

We will prove that (4.6) is fulfilled as equality. If f is identically +∞, then this is obviously the case. Assume that f is not identically +∞ and that there exist x̄ ∈ X and r̄ ∈ R such that

f(x̄) > r̄ > sup{〈x∗, x̄〉 + c | (x∗, c) ∈ X∗ × R, 〈x∗, y〉 + c ≤ f(y) ∀y ∈ X}. (4.7)

It holds (x̄, r̄) ∉ epi f and so, applying again Theorem 2.20, we obtain an element (x̄∗, ᾱ) ∈ X∗ × R, (x̄∗, ᾱ) ≠ (0, 0), and ε > 0 such that

〈x̄∗, x〉 + ᾱr > 〈x̄∗, x̄〉 + ᾱr̄ + ε ∀(x, r) ∈ epi f. (4.8)

Since f is not identically +∞, there exists (y, s) ∈ epi f. This means that (y, s + t) ∈ epi f for all t > 0, which yields ᾱ ≥ 0.

Assume that f(x̄) < +∞. The fact that (x̄, f(x̄)) ∈ epi f and (4.8) imply ᾱ(f(x̄) − r̄) > ε, therefore ᾱ > 0. After dividing (4.8) by ᾱ, we obtain

f(x) > (1/ᾱ)〈x̄∗, x̄ − x〉 + r̄ + ε/ᾱ > (1/ᾱ)〈x̄∗, x̄ − x〉 + r̄ ∀x ∈ dom f.

This means that the mapping x ↦ 〈−(1/ᾱ)x̄∗, x〉 + (1/ᾱ)〈x̄∗, x̄〉 + r̄ is an affine-continuous minorant of f which takes the value r̄ at x̄. This contradicts (4.7).

Assume that f(x̄) = +∞. If ᾱ > 0, then, by arguing as above, we get a contradiction to (4.7). Assuming that ᾱ = 0, from (4.8) we obtain

〈−x̄∗, x − x̄〉 + ε < 0 ∀x ∈ dom f.

Using that the set of affine-continuous minorants of f is nonempty, there exists (x∗, β) ∈ X∗ × R such that

〈x∗, x〉 + β ≤ f(x) ∀x ∈ X.

Set γ := (r̄ − (〈x∗, x̄〉 + β))/ε, which is positive by (4.7). The mapping

x ↦ 〈x∗ − γx̄∗, x〉 + γ〈x̄∗, x̄〉 + β + γε

is affine-continuous and it fulfills for all x ∈ dom f

〈x∗ − γx̄∗, x〉 + γ〈x̄∗, x̄〉 + β + γε = 〈x∗, x〉 + β + γ(〈−x̄∗, x − x̄〉 + ε)

< 〈x∗, x〉 + β ≤ f(x).

Consequently, it is an affine-continuous minorant of f which takes at x := x̄ the value

〈x∗ − γx̄∗, x̄〉 + γ〈x̄∗, x̄〉 + β + γε = 〈x∗, x̄〉 + β + γε = r̄.

We obtain also in this case a contradiction to (4.7). This concludes the proof. □

In the remainder of this section we will address the continuity of convex functions. The following proposition will play an important role in these investigations.

Proposition 4.17 Let f : X → R̄ be a proper and convex function and x ∈ dom f. Assume that there exist δ > 0 and M ≥ 0 such that

f(y) ≤ f(x) + M ∀y ∈ B(x, δ).

Then it holds

|f(y) − f(x)| ≤ (M/δ)‖y − x‖ ∀y ∈ B(x, δ),

which means that f is continuous at x.

Proof. Let y ∈ B(x, δ), y ≠ x. Then x + (δ/‖y − x‖)(y − x) ∈ B(x, δ) and, since f is convex and 0 < ‖y − x‖/δ ≤ 1,

f(y) − f(x) = f((‖y − x‖/δ)(x + (δ/‖y − x‖)(y − x)) + (1 − ‖y − x‖/δ)x) − f(x)

≤ (‖y − x‖/δ)(f(x + (δ/‖y − x‖)(y − x)) − f(x)) ≤ (M/δ)‖y − x‖.

This proves that

f(y) − f(x) ≤ (M/δ)‖y − x‖ ∀y ∈ B(x, δ).

On the other hand, for every y ∈ B(x, δ) we have 2x − y ∈ B(x, δ), thus, by using again the convexity of f,

f(x) − f(y) ≤ f(2x − y) − f(x) ≤ (M/δ)‖y − x‖.

This provides the conclusion. □

Now we can prove the following result, which characterizes the continuity of convex functions.

Theorem 4.18 (continuity of convex functions) Let f : X → R be a convexfunction with the property that f is bounded above on a neighbourhood of a pointof its domain. If f is not proper, then f is identically −∞ on int(dom f). If fis proper, then f is continuous on int(dom f).

Proof. Let x ∈ dom f, δ > 0 and M ≥ 0 be such that f(y) ≤ M for all y ∈ B(x, δ). Then B(x, δ) ⊆ dom f, and so x ∈ int(dom f).

If f takes the value −∞, then, according to Proposition 3.7 and Proposition 1.13, f(z) = −∞ for all z ∈ icr(dom f) = int(dom f).

Assume now that f is proper and let z ∈ int(dom f). Then there exists γ > 0 such that u := z + γ(z − x) ∈ dom f. For λ := 1/(1+γ) ∈ (0, 1) we obtain

λu + (1 − λ)B(x, δ) = z − (γ/(1+γ))x + (γ/(1+γ))x + B(0, γδ/(1+γ)) = B(z, γδ/(1+γ)),

thus every y ∈ B(z, γδ/(1+γ)) can be written as y = λu + (1 − λ)v with v ∈ B(x, δ), and

f(y) ≤ λf(u) + (1 − λ)f(v) ≤ λf(u) + (1 − λ)M = f(z) + M_1,

where

M_1 := λf(u) + (1 − λ)M − f(z) ≥ λf(u) + (1 − λ)f(x) − f(λu + (1 − λ)x) ≥ 0.

The continuity of f at z follows from Proposition 4.17. □

The following result shows that a proper and convex function which is bounded above on a neighbourhood of a point of its domain is even locally Lipschitz continuous on the interior of its domain.


Theorem 4.19 (local Lipschitz continuity of convex functions) Let f : X → R̄ be a proper and convex function with the property that f is bounded above on a neighbourhood of a point of its domain. Then f is locally Lipschitz continuous on int(dom f), in other words, for every x ∈ int(dom f) there exist δ > 0 and L > 0 such that B(x, δ) ⊆ dom f and

|f(y) − f(z)| ≤ L‖y − z‖ ∀y, z ∈ B(x, δ).

Proof. Let x ∈ int(dom f). According to Theorem 4.18, f is continuous at x, which means that there exists δ > 0 such that B(x, 2δ) ⊆ dom f and for all u, v ∈ B(x, 2δ) it holds

|f(u) − f(v)| ≤ |f(u) − f(x)| + |f(x) − f(v)| < 1/2 + 1/2 = 1.

Let y, z ∈ B(x, δ), y ≠ z, and α := ‖y − z‖ > 0. We define u := y + (δ/α)(y − z) and notice that ‖u − y‖ = (δ/α)‖y − z‖ = δ, which means that u ∈ B(x, 2δ). Using that f is convex and noticing that y = (α/(α+δ))u + (δ/(α+δ))z, we obtain

f(y) − f(z) ≤ (α/(α+δ))f(u) + (δ/(α+δ))f(z) − f(z) = (α/(α+δ))(f(u) − f(z)) < α/(α+δ).

Switching the roles of y and z allows us to conclude that

|f(y) − f(z)| < α/(α+δ) < α/δ = (1/δ)‖y − z‖,

thus the conclusion holds with L := 1/δ > 0. □

In finite-dimensional spaces we have a stronger formulation of the last two theorems.

Theorem 4.20 Let f : R^n → R̄ be a proper and convex function. Then f is locally Lipschitz continuous on int(dom f), and therefore continuous on int(dom f).

Proof. Let x ∈ int(dom f) and 0 < γ ≤ 1 be such that x ± γe_i ∈ dom f for all i = 1, ..., n, where e_i, i = 1, ..., n, denotes the canonical basis of R^n. Then there exists 0 < δ < γ/√n such that B(x, δ) ⊆ dom f. Let y ∈ B(x, δ). Then Σ_{i=1}^n |y_i − x_i| ≤ √n (Σ_{i=1}^n |y_i − x_i|²)^{1/2} ≤ √n δ < γ. We denote α_i := (y_i − x_i)/γ for all i = 1, ..., n. Then Σ_{i=1}^n |α_i| < 1 and

y = x + Σ_{i=1}^n α_i γ e_i = Σ_{i: α_i>0} α_i(x + γe_i) + Σ_{i: α_i<0} (−α_i)(x − γe_i) + (1 − Σ_{i=1}^n |α_i|)x.

By denoting

M := max{f(x), f(x ± γe_1), ..., f(x ± γe_n)},

it holds

f(y) ≤ Σ_{i: α_i>0} α_i f(x + γe_i) + Σ_{i: α_i<0} (−α_i) f(x − γe_i) + (1 − Σ_{i=1}^n |α_i|) f(x) ≤ M,

which proves that f is bounded above on B(x, δ). The conclusion follows from Theorem 4.18 and Theorem 4.19. □
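The convex combination used in the proof can be reproduced for concrete data. The following Python sketch is an illustration only: the dimension n = 2, the test function f(v) = v_1² + |v_2|, the base point x = 0, the step γ = 1 and the point y = (0.3, −0.2) are all assumptions made for the example.

```python
def f(v):
    """An assumed convex test function on R^2: f(v) = v_1^2 + |v_2|."""
    return v[0] ** 2 + abs(v[1])

x = [0.0, 0.0]       # assumed base point
gamma = 1.0          # step with x +/- gamma*e_i in dom f
y = [0.3, -0.2]      # point with sum |y_i - x_i| < gamma
n = len(x)

alpha = [(y[i] - x[i]) / gamma for i in range(n)]
weight_x = 1.0 - sum(abs(a) for a in alpha)

def shifted(i, sign):
    """Return x + sign * gamma * e_i."""
    return [x[j] + (sign * gamma if j == i else 0.0) for j in range(n)]

# Convex combination from the proof: weight |alpha_i| on x + sign(alpha_i)*gamma*e_i,
# and the remaining weight on x itself.
terms = [(abs(alpha[i]), shifted(i, 1.0 if alpha[i] > 0 else -1.0)) for i in range(n)]
terms.append((weight_x, x))
recombined = [sum(w * p[j] for w, p in terms) for j in range(n)]

M = max([f(x)] + [f(shifted(i, s)) for i in range(n) for s in (1.0, -1.0)])
print(recombined, f(y), M)
```

The weights are nonnegative and sum to one, the combination reproduces y up to rounding, and f(y) stays below M, as the estimate in the proof predicts.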

An important consequence of the previous theorem is that convex functionswith full domains defined on finite-dimensional spaces are continuous.

Corollary 4.21 A convex function f : Rn → R is locally Lipschitz continuouseverywhere, and therefore continuous everywhere.

We close the section with a result which relates the lower semicontinuity of a convex function defined on a Banach space to its continuity.

Theorem 4.22 Let X be a Banach space and f : X → R̄ a convex and lower semicontinuous function which takes a finite value at at least one point. Then f is continuous on int(dom f) = core(dom f).

Proof. According to Proposition 4.14, f is a proper function. The inclusion int(dom f) ⊆ core(dom f) holds in general, thus, if core(dom f) = ∅, then the statement of the theorem is true.

Let x ∈ core(dom f) and r ∈ R be such that f(x) < r. The set C := {y ∈ X | f(y) ≤ r} is convex and closed. We will prove that x ∈ core C. To this end, let u ∈ X be fixed. Since x ∈ core(dom f), there exists δ > 0 such that x + λu ∈ dom f for all λ ∈ [−δ, δ]. Define the function g : R → R̄, g(t) = f(x + tu). Then g is a proper and convex function (being the composition of a convex function with an affine mapping) with [−δ, δ] ⊆ dom g. According to Theorem 4.20, g is continuous at 0 ∈ (−δ, δ) ⊆ int(dom g). Since g(0) = f(x) < r, there exists t > 0 such that g(t) = f(x + tu) < r, which means that x + tu ∈ C. Since u was arbitrarily chosen in X, according to Proposition 1.9, x ∈ core C.

Taking into account that X is a Banach space and C is a convex and closed set, Theorem 1.11 implies that x ∈ core C = int C ⊆ int(dom f). This proves that int(dom f) = core(dom f).

In addition, since x ∈ int C, there exists a neighbourhood of x which is contained in C and, consequently, on which f is bounded above by r. From here it follows according to Theorem 4.18 that f is continuous at x. □

The following corollary is a direct consequence of Theorem 4.22.

Corollary 4.23 A convex and lower semicontinuous function f : X → R definedon a Banach space X is continuous everywhere.

Chapter III

Conjugate functions and subdifferentiability

In this chapter we will introduce the notions of conjugate function and (convex) subdifferential of a (convex) function and investigate their properties.

5 Conjugate functions

Definition 5.1 (conjugate function) Let f : X → R̄ be a given function. The function

\[ f^* : X^* \to \overline{\mathbb{R}}, \qquad f^*(x^*) := \sup_{x \in X} \{ \langle x^*, x \rangle - f(x) \}, \]

is called the (Fenchel) conjugate function of f.

Remark 5.2 The conjugate function of f is convex and weakly∗ lower semicontinuous (therefore, norm (strongly) lower semicontinuous on X∗). Indeed, if f is not proper, then

• either f is identically +∞, which means that f∗ is identically −∞;

• or there exists x′ ∈ X such that f(x′) = −∞, which means that f∗ is identically +∞.

If f is proper, then dom f ≠ ∅ and

\[ f^*(x^*) = \sup_{x \in \operatorname{dom} f} \{ \langle x^*, x \rangle - f(x) \} \quad \forall x^* \in X^*. \]

Since for all x ∈ dom f the mapping x∗ ↦ 〈x∗, x〉 − f(x) is affine and weakly∗ lower semicontinuous, according to Proposition 3.11 and Proposition 4.5, f∗ is convex and weakly∗ lower semicontinuous.

In the following result we collect some elementary properties of conjugate functions. For their verification one just has to use Definition 5.1.


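Proposition 5.3 Let $f, g, f_i : X \to \overline{\mathbb{R}}$, $i \in I$, be given functions. The following statements are true:

(a) $f(x) + f^*(x^*) \ge \langle x^*, x \rangle$ for all $x \in X$ and all $x^* \in X^*$ (Young-Fenchel inequality);

(b) $\inf_{x \in X} f(x) = -f^*(0)$;

(c) if $f \le g$, then $f^* \ge g^*$;

(d) $(\sup_{i \in I} f_i)^* \le \inf_{i \in I} f_i^*$ and $(\inf_{i \in I} f_i)^* = \sup_{i \in I} f_i^*$;

(e) $(\lambda f)^*(x^*) = \lambda f^*\bigl(\tfrac{1}{\lambda} x^*\bigr)$ for all $x^* \in X^*$ and all $\lambda > 0$;

(f) $(f + c)^*(x^*) = f^*(x^*) - c$ for all $x^* \in X^*$ and all $c \in \mathbb{R}$;

(g) if, for $z \in X$, $f_z : X \to \overline{\mathbb{R}}$, $f_z(x) = f(x - z)$, then $(f_z)^*(x^*) = f^*(x^*) + \langle x^*, z \rangle$ for all $x^* \in X^*$;

(h) if, for $z^* \in X^*$, $f_{z^*} : X \to \overline{\mathbb{R}}$, $f_{z^*}(x) = f(x) + \langle z^*, x \rangle$, then $(f_{z^*})^*(x^*) = f^*(x^* - z^*)$ for all $x^* \in X^*$;

(i) $(f + g)^*(x^* + y^*) \le f^*(x^*) + g^*(y^*)$ for all $x^*, y^* \in X^*$;

(j) $(\lambda f + (1 - \lambda) g)^*(x^*) \le \lambda f^*(x^*) + (1 - \lambda) g^*(x^*)$ for all $x^* \in X^*$ and all $\lambda \in (0, 1)$.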

Another property which follows directly from the definition of the conjugate function is stated in the following proposition.

Proposition 5.4 Let $X_i$, $i = 1, \dots, m$, be nontrivial real normed spaces, $f_i : X_i \to \overline{\mathbb{R}}$, $i = 1, \dots, m$, given functions and

\[ f : X_1 \times \dots \times X_m \to \overline{\mathbb{R}}, \qquad f(x_1, \dots, x_m) = \sum_{i=1}^{m} f_i(x_i). \]

Then

\[ f^* : X_1^* \times \dots \times X_m^* \to \overline{\mathbb{R}}, \qquad f^*(x_1^*, \dots, x_m^*) = \sum_{i=1}^{m} f_i^*(x_i^*). \]

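Example 5.5 (a) For $f : \mathbb{R} \to \mathbb{R}$, $f(x) = \frac{1}{2}x^2$, it holds

\[ f^* : \mathbb{R} \to \mathbb{R}, \qquad f^*(x^*) = \frac{1}{2}(x^*)^2. \]

(b) For $f : \mathbb{R} \to \mathbb{R}$, $f(x) = e^x$, it holds

\[ f^* : \mathbb{R} \to \overline{\mathbb{R}}, \qquad f^*(x^*) = \begin{cases} x^*(\ln x^* - 1), & \text{if } x^* > 0, \\ 0, & \text{if } x^* = 0, \\ +\infty, & \text{if } x^* < 0. \end{cases} \]

(c) For

\[ f : \mathbb{R} \to \overline{\mathbb{R}}, \qquad f(x) = \begin{cases} x(\ln x - 1), & \text{if } x > 0, \\ 0, & \text{if } x = 0, \\ +\infty, & \text{if } x < 0, \end{cases} \]

it holds $f^* : \mathbb{R} \to \mathbb{R}$, $f^*(x^*) = e^{x^*}$.

(d) For $z^* \in X^*$ and $c \in \mathbb{R}$, let $f : X \to \mathbb{R}$, $f(x) = \langle z^*, x \rangle + c$. For all $x^* \in X^*$ it holds

\[ f^*(x^*) = \begin{cases} -c, & \text{if } x^* = z^*, \\ +\infty, & \text{otherwise}. \end{cases} \]

(e) Let $C \subseteq X$ be a given set. For all $x^* \in X^*$ it holds

\[ \delta_C^*(x^*) = \sup_{x \in X} \{ \langle x^*, x \rangle - \delta_C(x) \} = \sup_{x \in C} \langle x^*, x \rangle = \sigma_C(x^*). \]

(f) Let $C \subseteq X$ be a nonempty set. The conjugate of the Minkowski functional of $C$ at $x^* \in X^*$ reads

\[ \begin{aligned} p_C^*(x^*) &= \sup_{x \in X} \bigl\{ \langle x^*, x \rangle - \inf\{ \lambda \ge 0 : x \in \lambda C \} \bigr\} = \sup_{x \in X} \Bigl\{ \langle x^*, x \rangle + \sup_{\lambda \ge 0,\, x \in \lambda C} \{ -\lambda \} \Bigr\} \\ &= \sup_{\lambda \ge 0} \Bigl\{ -\lambda + \sup_{z \in C} \langle x^*, \lambda z \rangle \Bigr\} = \sup_{\lambda \ge 0} \Bigl\{ \lambda \Bigl( \sup_{z \in C} \langle x^*, z \rangle - 1 \Bigr) \Bigr\} \\ &= \begin{cases} 0, & \text{if } \sigma_C(x^*) \le 1, \\ +\infty, & \text{otherwise}. \end{cases} \end{aligned} \]

(g) Since $p_{B(0,1)} = \| \cdot \|$ and $\sigma_{B(0,1)} = \| \cdot \|_*$, it holds for all $x^* \in X^*$

\[ (\| \cdot \|)^*(x^*) = \delta_{B_*(0,1)}(x^*) = \begin{cases} 0, & \text{if } \|x^*\|_* \le 1, \\ +\infty, & \text{otherwise}. \end{cases} \]

(h) For $1 < p < \infty$, let $f : X \to \mathbb{R}$, $f(x) = \frac{1}{p}\|x\|^p$. For all $x^* \in X^*$ it holds

\[ f^*(x^*) = \frac{1}{q}\|x^*\|_*^q, \quad \text{where } \frac{1}{p} + \frac{1}{q} = 1. \]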
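
The closed forms in Example 5.5 (a) and (b) can be sanity-checked numerically. The following is a minimal sketch, assuming that the supremum in Definition 5.1 may be replaced by a maximum over a finite sampling grid; the grid bounds, the step size and the helper names (`conjugate_on_grid`, `half_square`, `exp_conjugate`) are illustrative choices and not part of the notes.

```python
import math

def conjugate_on_grid(f, slope, xs):
    """Approximate f*(slope) = sup_x { slope * x - f(x) } by a maximum over xs."""
    return max(slope * x - f(x) for x in xs)

# Hypothetical sampling grid on [-10, 10] with step 0.001 (the true supremum is over R).
xs = [i / 1000.0 for i in range(-10000, 10001)]

def half_square(x):
    """f(x) = x^2 / 2 from Example 5.5 (a)."""
    return 0.5 * x * x

def exp_conjugate(s):
    """Closed form of the conjugate of exp from Example 5.5 (b), for s >= 0."""
    return s * (math.log(s) - 1.0) if s > 0 else 0.0

for s in (0.5, 1.0, 2.0):
    approx_a = conjugate_on_grid(half_square, s, xs)
    approx_b = conjugate_on_grid(math.exp, s, xs)
    # Each pair (grid value, closed form) should agree up to the grid resolution.
    print(s, approx_a, 0.5 * s * s, approx_b, exp_conjugate(s))
```

For slopes whose maximizer falls outside the grid (for instance very large x∗ in (b)), the grid value only gives a lower bound for the conjugate.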

In the following proposition we calculate the conjugate functions of the infimal function of a given function through a continuous linear operator and of the infimal convolution of a finite family of functions, respectively.

Proposition 5.6 (a) For $f : X \to \overline{\mathbb{R}}$ a given function and $A : X \to Y$ a continuous linear operator it holds $(Af)^* = f^* \circ A^*$, where

\[ A^* : Y^* \to X^*, \qquad \langle A^* y^*, x \rangle = \langle y^*, Ax \rangle \quad \forall y^* \in Y^* \ \forall x \in X, \]

denotes the adjoint operator of $A$.

(b) For $f_i : X \to \overline{\mathbb{R}}$, $i = 1, \dots, m$, given functions it holds $(f_1 \,\square\, \dots \,\square\, f_m)^* = \sum_{i=1}^{m} f_i^*$.

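Proof. (a) By the definition of the conjugate function we have for all $y^* \in Y^*$

\[ \begin{aligned} (Af)^*(y^*) &= \sup_{y \in Y} \{ \langle y^*, y \rangle - (Af)(y) \} = \sup_{y \in Y} \Bigl\{ \langle y^*, y \rangle - \inf_{x \in X,\, Ax = y} f(x) \Bigr\} \\ &= \sup_{x \in X} \{ \langle y^*, Ax \rangle - f(x) \} = \sup_{x \in X} \{ \langle A^* y^*, x \rangle - f(x) \} = (f^* \circ A^*)(y^*). \end{aligned} \]

(b) We define as in the proof of Proposition 3.14 the function $f : X \times \dots \times X \to \overline{\mathbb{R}}$, $f(x_1, \dots, x_m) = \sum_{i=1}^{m} f_i(x_i)$, and the continuous linear operator $A : X \times \dots \times X \to X$, $A(x_1, \dots, x_m) = \sum_{i=1}^{m} x_i$, and recall that $Af = f_1 \,\square\, \dots \,\square\, f_m$. According to statement (a) we have

\[ (f_1 \,\square\, \dots \,\square\, f_m)^* = (Af)^* = f^* \circ A^*. \]

The conclusion follows by using Proposition 5.4 and the fact that the adjoint operator of $A$ is given by $A^* x^* = (x^*, \dots, x^*)$ for all $x^* \in X^*$. □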
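
Statement (b) can be illustrated on the real line. The sketch below is an illustration under stated assumptions, not part of the notes: it takes the hypothetical functions f(x) = x²/2 and g(x) = x², whose conjugates are s²/2 and s²/4, tabulates the infimal convolution on a finite grid and compares its grid conjugate with the sum of the two closed-form conjugates.

```python
# Hypothetical example functions (not from the notes):
# f(x) = x^2 / 2 and g(x) = x^2, with conjugates f*(s) = s^2 / 2 and g*(s) = s^2 / 4.
def f(x):
    return 0.5 * x * x

def g(x):
    return x * x

# Sampling grid on [-10, 10] with step 0.05; suprema and infima are taken over it.
grid = [i / 20.0 for i in range(-200, 201)]

# Tabulate the infimal convolution (f □ g)(x) = inf_y { f(y) + g(x - y) }.
inf_conv = {x: min(f(y) + g(x - y) for y in grid) for x in grid}

def conjugate_of_inf_conv(s):
    """Grid approximation of (f □ g)*(s) = sup_x { s x - (f □ g)(x) }."""
    return max(s * x - inf_conv[x] for x in grid)

def sum_of_conjugates(s):
    """Closed form f*(s) + g*(s) = s^2 / 2 + s^2 / 4."""
    return 0.5 * s * s + 0.25 * s * s

for s in (0.5, 1.0, 2.0):
    print(s, conjugate_of_inf_conv(s), sum_of_conjugates(s))
```

The slopes are chosen so that the maximizers lie well inside the grid; for slopes near the grid boundary the truncation would distort the comparison.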

The following result shows that an arbitrary function and its lower semicontinuous hull have the same conjugate function.

Proposition 5.7 For f : X → R̄ a given function it holds f∗ = (f̄)∗, where f̄ denotes the lower semicontinuous hull of f.

Proof. Since f̄ ≤ f, by Proposition 5.3 (c) we have (f̄)∗ ≥ f∗. We assume that there exists x∗ ∈ X∗ such that (f̄)∗(x∗) > f∗(x∗). Then there exists c ∈ R such that (f̄)∗(x∗) > c > f∗(x∗). This means that 〈x∗, x〉 − f(x) ≤ c for all x ∈ X or, equivalently, 〈x∗, x〉 − c ≤ f(x) for all x ∈ X. Since x ↦ 〈x∗, x〉 − c is a continuous (hence lower semicontinuous) minorant of f, this implies that 〈x∗, x〉 − c ≤ f̄(x) for all x ∈ X or, equivalently, (f̄)∗(x∗) = sup_{x∈X} {〈x∗, x〉 − f̄(x)} ≤ c. Contradiction! □

The following definition introduces the biconjugate function of a given function.

Definition 5.8 (biconjugate function) Let f : X → R̄ be a given function. The function

\[ f^{**} : X \to \overline{\mathbb{R}}, \qquad f^{**}(x) = \sup_{x^* \in X^*} \{ \langle x^*, x \rangle - f^*(x^*) \}, \]

is called the biconjugate function of f.

If X is a reflexive Banach space, then f∗∗ = (f∗)∗ can be seen as the conjugate of the conjugate function of f.

By arguing as in Remark 5.2, one can see that f∗∗ is convex and weakly lower semicontinuous, therefore, norm lower semicontinuous on X.


According to the Young-Fenchel inequality we have for all x ∈ X that

\[ \langle x^*, x \rangle - f^*(x^*) \le f(x) \quad \forall x^* \in X^*, \]

thus f∗∗(x) ≤ f(x).

Since f∗∗ is lower semicontinuous, in view of Remark 4.8, it holds

\[ f^{**}(x) \le \bar{f}(x) \le f(x) \quad \forall x \in X. \tag{5.1} \]

The following theorem shows that for proper, convex and lower semicontinuous functions the above inequality holds as equality.

Theorem 5.9 (Fenchel-Moreau Theorem) If f : X → R̄ is proper, convex and lower semicontinuous, then f∗ : X∗ → R̄ is proper and f = f∗∗.

Proof. We prove first that f∗ is proper. If there exists z∗ ∈ X∗ such that f∗(z∗) = −∞, then f∗∗ is identically +∞, thus, by (5.1), f is identically +∞. Contradiction to f proper!

As f is proper, convex and lower semicontinuous, by Theorem 4.16 it follows that there exists (x∗, c) ∈ X∗ × R such that 〈x∗, x〉 + c ≤ f(x) for all x ∈ X. This is equivalent to f∗(x∗) ≤ −c, which implies that f∗ is not identically +∞. This proves the properness of f∗.

The fact that f = f∗∗ follows also from Theorem 4.16, namely, for all x ∈ X it holds

\[ \begin{aligned} f^{**}(x) &= \sup_{x^* \in X^*} \{ \langle x^*, x \rangle - f^*(x^*) \} = \sup_{x^* \in X^*,\, c \in \mathbb{R},\, f^*(x^*) \le -c} \{ \langle x^*, x \rangle + c \} \\ &= \sup \{ \langle x^*, x \rangle + c \mid (x^*, c) \in X^* \times \mathbb{R},\ \langle x^*, y \rangle + c \le f(y) \ \forall y \in X \} = f(x). \end{aligned} \]

□
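
Convexity cannot be dropped in the Fenchel-Moreau Theorem: for a nonconvex function the inequality f∗∗ ≤ f from (5.1) can be strict. The following minimal sketch illustrates this on a finite grid for the hypothetical double-well function f(x) = (|x| − 1)², which is not taken from the notes; a direct calculation gives f∗(s) = |s| + s²/4, so that f∗∗ vanishes on [−1, 1] and coincides with f outside. The grid bounds and step size are assumptions.

```python
def f(x):
    """Hypothetical nonconvex double-well function (|x| - 1)^2."""
    return (abs(x) - 1.0) ** 2

# Sampling grid on [-6, 6] with step 0.02, used for both x and the slopes s.
grid = [i / 50.0 for i in range(-300, 301)]

# Grid approximation of f*(s) = sup_x { s x - f(x) }.
f_conj = {s: max(s * x - f(x) for x in grid) for s in grid}

def biconjugate(x):
    """Grid approximation of f**(x) = sup_s { s x - f*(s) }."""
    return max(s * x - f_conj[s] for s in grid)

for x in (0.0, 0.5, 2.0):
    # Columns: x, f(x), approximate f**(x).
    print(x, f(x), biconjugate(x))
```

At x = 0 the approximate biconjugate is 0 while f(0) = 1, and at x = 2 both values agree up to the grid resolution.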

Having a nonempty convex and weakly∗ closed set C ⊆ X∗ and x∗ ∉ C, according to the Hahn-Banach Strong Separation Theorem (in locally convex spaces), there exists x ∈ X, x ≠ 0, such that

\[ \inf_{c^* \in C} \langle c^*, x \rangle > \langle x^*, x \rangle. \]

As in the proof of Theorem 4.16, one can show that a function g : X∗ → R̄ is convex and weakly∗ lower semicontinuous with g(x∗) > −∞ for all x∗ ∈ X∗ if and only if there exists (x′, c′) ∈ X × R such that g(x∗) ≥ 〈x∗, x′〉 + c′ for all x∗ ∈ X∗ and

\[ g(x^*) = \sup \{ \langle x^*, x \rangle + c \mid (x, c) \in X \times \mathbb{R},\ \langle y^*, x \rangle + c \le g(y^*) \ \forall y^* \in X^* \} \quad \forall x^* \in X^*. \]

This leads to the following version of the Fenchel-Moreau Theorem for convex and weakly∗ lower semicontinuous functions defined on X∗.

Theorem 5.10 If g : X∗ → R̄ is proper, convex and weakly∗ lower semicontinuous, then g∗ : X → R̄ is proper and g = g∗∗.

Remark 5.11 It is natural to ask whether it makes sense to continue the process of defining conjugate functions of higher order for an arbitrary function f : X → R̄. We will see that it does not make sense, since for

\[ f^{***} : X^* \to \overline{\mathbb{R}}, \qquad f^{***}(x^*) = \sup_{x \in X} \{ \langle x^*, x \rangle - f^{**}(x) \}, \]

we always have f∗ = f∗∗∗.

If f∗ is proper, then the statement follows from Theorem 5.10. If f∗ is identically +∞, then f∗∗ is identically −∞ and f∗∗∗ is identically +∞, thus f∗ = f∗∗∗. If there exists z∗ ∈ X∗ such that f∗(z∗) = −∞, then f∗∗ is identically +∞, which yields that f∗∗∗ is identically −∞. On the other hand, from (5.1) it follows that f is identically +∞, and so f∗ is identically −∞, thus f∗ = f∗∗∗.

We will close this section by providing formulas for the conjugate of the composition of a proper, convex and lower semicontinuous function with a continuous linear operator and for the sum of finitely many proper, convex and lower semicontinuous functions, which we will obtain as a consequence of the Fenchel-Moreau Theorem.

To this end we will denote in the following by

\[ \overline{g}^{w^*} : X^* \to \overline{\mathbb{R}}, \qquad \overline{g}^{w^*}(x^*) = \inf \{ t \mid (x^*, t) \in \operatorname{cl}_{w^*}(\operatorname{epi} g) \}, \]

the weakly∗ lower semicontinuous hull of a function g : X∗ → R̄.

Theorem 5.12 (Moreau-Rockafellar Theorem)

(a) For $f : Y \to \overline{\mathbb{R}}$ a proper, convex and lower semicontinuous function and $A : X \to Y$ a continuous linear operator fulfilling $A^{-1}(\operatorname{dom} f) \ne \emptyset$ it holds

\[ (f \circ A)^* = \overline{A^* f^*}^{w^*}. \]

(b) For $f_i : X \to \overline{\mathbb{R}}$, $i = 1, \dots, m$, proper, convex and lower semicontinuous functions fulfilling $\bigcap_{i=1}^{m} \operatorname{dom} f_i \ne \emptyset$ it holds

\[ (f_1 + \dots + f_m)^* = \overline{f_1^* \,\square\, \dots \,\square\, f_m^*}^{w^*}. \]

Proof. (a) According to Proposition 3.13, the function $A^* f^* : X^* \to \overline{\mathbb{R}}$ is convex, thus $\overline{A^* f^*}^{w^*} : X^* \to \overline{\mathbb{R}}$ is convex and weakly∗ lower semicontinuous. According to Proposition 5.6, the counterpart of Proposition 5.7 for the weakly∗ lower semicontinuous hull, and Theorem 5.10 it holds

\[ \bigl( \overline{A^* f^*}^{w^*} \bigr)^*(x) = (A^* f^*)^*(x) = (f^{**} \circ A)(x) = (f \circ A)(x) \quad \forall x \in X. \tag{5.2} \]

Next we will prove that $\overline{A^* f^*}^{w^*}$ is proper. If it is identically $+\infty$, then, according to (5.2), $f \circ A$ is identically $-\infty$, which is a contradiction to $f$ proper. If there exists $z^* \in X^*$ such that $\overline{A^* f^*}^{w^*}(z^*) = -\infty$, then $f \circ A$ is identically $+\infty$, which is a contradiction to the fact that $A^{-1}(\operatorname{dom} f) \ne \emptyset$.

Using again Theorem 5.10, it yields

\[ \overline{A^* f^*}^{w^*}(x^*) = \bigl( \overline{A^* f^*}^{w^*} \bigr)^{**}(x^*) = (f \circ A)^*(x^*) \quad \forall x^* \in X^*. \]

(b) We consider the proper, convex and lower semicontinuous function $f : X \times \dots \times X \to \overline{\mathbb{R}}$, $f(x_1, \dots, x_m) = \sum_{i=1}^{m} f_i(x_i)$, and the continuous linear operator $A : X \to X \times \dots \times X$, $Ax = (x, \dots, x)$. Then $A^{-1}(\operatorname{dom} f) \ne \emptyset$ and, according to (a),

\[ (f_1 + \dots + f_m)^* = (f \circ A)^* = \overline{A^* f^*}^{w^*} = \overline{f_1^* \,\square\, \dots \,\square\, f_m^*}^{w^*}. \]

□

6 Convex subdifferential

Definition 6.1 ((convex) subdifferential) Let f : X → R̄ be a given function and x ∈ X. The set defined as

\[ \partial f(x) := \{ x^* \in X^* \mid f(y) - f(x) \ge \langle x^*, y - x \rangle \ \forall y \in X \} \quad \text{for } f(x) \in \mathbb{R}, \]

and ∂f(x) := ∅ for f(x) ∉ R, is called the (convex) subdifferential of f at x. The elements of ∂f(x) are called (convex) subgradients of f at x. Therefore, the (convex) subdifferential can be seen as a set-valued operator ∂f : X ⇒ X∗ from X to X∗.

Remark 6.2 The (convex) subdifferential of a function f : X → R̄ at x ∈ X is a convex and weakly∗ closed subset of X∗; however, it can be empty even if f(x) ∈ R.

For f(x) ∈ R we have

\[ \partial f(x) = \bigcap_{y \in \operatorname{dom} f} \{ x^* \in X^* \mid \langle x^*, y \rangle - f(y) \le \langle x^*, x \rangle - f(x) \} \]

and for all y ∈ dom f the set {x∗ ∈ X∗ | 〈x∗, y〉 − f(y) ≤ 〈x∗, x〉 − f(x)} is convex and weakly∗ closed, being either the empty set (if f(y) = −∞) or a lower level set of the affine and weakly∗ continuous mapping x∗ ↦ 〈x∗, y − x〉 − f(y) + f(x).

The function

\[ f : \mathbb{R} \to \overline{\mathbb{R}}, \qquad f(x) = \begin{cases} -\sqrt{x}, & \text{if } x \ge 0, \\ +\infty, & \text{otherwise}, \end{cases} \]

is proper, convex and lower semicontinuous with f(0) = 0; however, ∂f(0) = ∅. Indeed, assuming that there exists x∗ ∈ ∂f(0), one has for all y > 0

\[ -\sqrt{y} \ge x^* y \quad \text{or, equivalently,} \quad -\frac{1}{\sqrt{y}} \ge x^*. \]

Letting y ↓ 0, the left-hand side tends to −∞. Contradiction!

The following proposition shows that the Young-Fenchel inequality for a function f : X → R̄ is fulfilled as equality only for pairs (x, x∗) lying on the graph of its (convex) subdifferential, namely, fulfilling x∗ ∈ ∂f(x).

Proposition 6.3 Let f : X → R̄ be a given function, x ∈ X and x∗ ∈ X∗. Then

\[ x^* \in \partial f(x) \iff f^*(x^*) + f(x) = \langle x^*, x \rangle. \]

Proof. We have that x∗ ∈ ∂f(x) if and only if f(x) ∈ R and f∗(x∗) = sup_{y∈X} {〈x∗, y〉 − f(y)} ≤ 〈x∗, x〉 − f(x), which is further equivalent to f(x) ∈ R and f∗(x∗) + f(x) ≤ 〈x∗, x〉. In view of the Young-Fenchel inequality (Proposition 5.3 (a)), this is further equivalent to f(x) ∈ R and f∗(x∗) + f(x) = 〈x∗, x〉 and finally to f∗(x∗) + f(x) = 〈x∗, x〉. □
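
Proposition 6.3 can be illustrated for the absolute value on the real line, whose conjugate is the indicator function of [−1, 1] (Example 5.5 (g) with X = R). The sketch below compares the test via equality in the Young-Fenchel inequality with the subgradient inequality of Definition 6.1 evaluated on a finite grid; the function names, the grid and the tolerance are illustrative assumptions.

```python
import math

def f(x):
    return abs(x)

def f_conj(s):
    """Conjugate of |.| on R: the indicator function of [-1, 1]."""
    return 0.0 if abs(s) <= 1.0 else math.inf

def is_subgradient_via_conjugate(s, x, tol=1e-12):
    """Test f*(s) + f(x) = s * x (equality in the Young-Fenchel inequality)."""
    return abs(f_conj(s) + f(x) - s * x) <= tol

def is_subgradient_via_definition(s, x, ys, tol=1e-12):
    """Test f(y) - f(x) >= s * (y - x) on the sample points ys only."""
    return all(f(y) - f(x) >= s * (y - x) - tol for y in ys)

# Hypothetical sample points on [-5, 5] standing in for "all y in X".
ys = [i / 10.0 for i in range(-50, 51)]

for s, x in [(1.0, 2.0), (0.5, 0.0), (0.5, 2.0), (-1.0, -3.0), (1.5, 0.0)]:
    print(s, x, is_subgradient_via_conjugate(s, x),
          is_subgradient_via_definition(s, x, ys))
```

Both tests agree on the listed pairs; the grid test can only refute the subgradient inequality at sampled points, so it is a plausibility check rather than a proof.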

Next we present some consequences of the nonemptiness of the (convex) subdifferential of an arbitrary function at a given point.

Proposition 6.4 Let f : X → R̄ be a given function and x ∈ X be such that ∂f(x) ≠ ∅. The following statements are true:

(a) f∗∗(x) = f̄(x) = f(x), f is lower semicontinuous at x, and f, f̄ and f∗∗ are proper;

(b) ∂f∗∗(x) = ∂f̄(x) = ∂f(x);

(c) if f is convex, then f̄ = f∗∗.

Proof. (a) Let x∗ ∈ ∂f(x). Then f(x) ∈ R and the function h : X → R, h(y) = 〈x∗, y〉 + f(x) − 〈x∗, x〉, is an affine-continuous minorant of f. According to the proof of Theorem 5.9 we have that f∗∗ is the pointwise supremum of the family of affine-continuous minorants of f, which, in combination with (5.1), gives

\[ h \le f^{**} \le \bar{f} \le f. \tag{6.1} \]

Taking into consideration that h(x) = f(x), we deduce that h(x) = f∗∗(x) = f̄(x) = f(x), which implies that f is lower semicontinuous at x. The properness of f, f̄ and f∗∗ follows from (6.1).

(b) It follows by using Proposition 6.3, statement (a) and that (see Proposition 5.7 and Remark 5.11)

\[ f^*(x^*) = (\bar{f})^*(x^*) = f^{***}(x^*) \quad \forall x^* \in X^*. \]

(c) If f is convex, then f̄ is proper, convex and lower semicontinuous, thus, according to Theorem 5.9, it holds

\[ \bar{f} = (\bar{f})^{**} = ((\bar{f})^*)^* = f^{**}. \]

□

In the following proposition we collect some properties of the (convex) subdifferential.

Proposition 6.5 Let f : X → R̄ be a given function. The following statements are true:

(a) 0 ∈ ∂f(x) ⇔ f(x) ∈ R and f(x) = min_{y∈X} f(y);

(b) ∂(λf)(x) = λ∂f(x) ∀x ∈ X ∀λ > 0;

(c) if, for z ∈ X, fz : X → R̄, fz(x) = f(x + z), then ∂fz(x) = ∂f(x + z) ∀x ∈ X;

(d) if, for z∗ ∈ X∗, fz∗ : X → R̄, fz∗(x) = f(x) + 〈z∗, x〉, then ∂fz∗(x) = ∂f(x) + z∗ ∀x ∈ X.

The following proposition provides the formula of the (convex) subdifferential of a separable function.

Proposition 6.6 Let $X_i$, $i = 1, \dots, m$, be nontrivial real normed spaces, $f_i : X_i \to \overline{\mathbb{R}}$, $i = 1, \dots, m$, given functions and

\[ f : X_1 \times \dots \times X_m \to \overline{\mathbb{R}}, \qquad f(x_1, \dots, x_m) = \sum_{i=1}^{m} f_i(x_i). \]

Then

\[ \partial f(x_1, \dots, x_m) = \partial f_1(x_1) \times \dots \times \partial f_m(x_m) \quad \forall (x_1, \dots, x_m) \in X_1 \times \dots \times X_m. \]

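Example 6.7 (a) Let $C \subseteq X$ be a given set. Then the (convex) subdifferential of its indicator function $\delta_C$ reads

\[ \partial \delta_C(x) = \{ x^* \in X^* \mid \langle x^*, y - x \rangle \le 0 \ \forall y \in C \} = N_C(x) \quad \text{for } x \in C \]

and $\partial \delta_C(x) = \emptyset$ for $x \notin C$.

(b) It holds

\[ \partial \| \cdot \|(x) = \begin{cases} \{ x^* \in X^* \mid \|x^*\|_* \le 1 \}, & \text{for } x = 0, \\ \{ x^* \in X^* \mid \|x^*\|_* = 1,\ \langle x^*, x \rangle = \|x\| \}, & \text{for } x \ne 0. \end{cases} \]

According to Proposition 6.3 and Example 5.5 (g) we have

\[ x^* \in \partial \| \cdot \|(x) \iff (\| \cdot \|)^*(x^*) + \|x\| = \langle x^*, x \rangle \iff \|x^*\|_* \le 1 \text{ and } \langle x^*, x \rangle = \|x\|. \]

If $x = 0$, then everything is clear. If $x \ne 0$, then the statement follows by taking into account that $\|x\| = \langle x^*, x \rangle \le \|x^*\|_* \|x\|$, thus $1 \le \|x^*\|_*$.

(c) It holds

\[ \partial \Bigl( \frac{1}{2} \| \cdot \|^2 \Bigr)(x) = \{ x^* \in X^* \mid \|x^*\|_* = \|x\|,\ \langle x^*, x \rangle = \|x\| \|x^*\|_* \} \quad \forall x \in X. \]

Let $x \in X$. According to Proposition 6.3 and Example 5.5 (h) we have

\[ \begin{aligned} x^* \in \partial \Bigl( \frac{1}{2} \| \cdot \|^2 \Bigr)(x) &\iff \frac{1}{2}\|x^*\|_*^2 + \frac{1}{2}\|x\|^2 = \langle x^*, x \rangle \le \|x\| \|x^*\|_* \le \frac{1}{2}\|x^*\|_*^2 + \frac{1}{2}\|x\|^2 \\ &\iff \langle x^*, x \rangle = \|x\| \|x^*\|_* \text{ and } \|x\| = \|x^*\|_*. \end{aligned} \]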

The next result displays some connections between the (convex) subdifferential of a given function f and the one of its conjugate.

Theorem 6.8 Let f : X → R̄ be a given function and x ∈ X. The following statements are true:

(a) if x∗ ∈ ∂f(x), then x ∈ ∂f∗(x∗);

(b) if f(x) = f∗∗(x), then x∗ ∈ ∂f(x) if and only if x ∈ ∂f∗(x∗);

(c) if f is proper, convex and lower semicontinuous, then x∗ ∈ ∂f(x) if and only if x ∈ ∂f∗(x∗).

Proof. (a) Since x∗ ∈ ∂f(x), according to Proposition 6.3 we have f(x) + f∗(x∗) = 〈x∗, x〉. But f∗∗(x) ≤ f(x), and thus f∗∗(x) + f∗(x∗) ≤ 〈x∗, x〉. As the reverse inequality is always fulfilled, using once more Proposition 6.3, we get x ∈ ∂f∗(x∗).

(b) Because of (a) only the sufficiency must be proven. For any x ∈ ∂f∗(x∗), again by Proposition 6.3, it holds 〈x∗, x〉 = f∗(x∗) + f∗∗(x) = f∗(x∗) + f(x) and therefore x∗ ∈ ∂f(x).

(c) Theorem 5.9 yields f = f∗∗ and the equivalence follows from (b). □


The next theorem shows that proper and convex functions defined on a normed space have nonempty (convex) subdifferentials at their points of continuity.

Theorem 6.9 Let f : X → R̄ be a proper and convex function and x ∈ dom f such that f is continuous at x. Then ∂f(x) ≠ ∅ and ∂f(x) is norm bounded, thus weakly∗ compact.

Proof. Let δ > 0 be such that f(y) < f(x) + 1 for all y ∈ B(x, δ). Then B(x, δ) × [f(x) + 1, +∞) ⊆ epi f, which means that int(epi f) ≠ ∅. We have that (x, f(x)) ∉ int(epi f), since, otherwise, there would exist η > 0 such that (x, f(x) − η) ∈ epi f, which is impossible. Taking into account that int(epi f) is convex, by the Hahn-Banach Separation Theorem (Theorem 2.19) there exists (x∗, α) ∈ X∗ × R, (x∗, α) ≠ (0, 0), such that

\[ \langle x^*, y \rangle + \alpha r \ge \langle x^*, x \rangle + \alpha f(x) \quad \forall (y, r) \in \operatorname{cl}(\operatorname{epi} f) = \operatorname{cl}(\operatorname{int}(\operatorname{epi} f)). \tag{6.2} \]

Choosing (y, r) := (x, f(x) + 1) ∈ epi f in (6.2), it yields α ≥ 0. If α = 0, then, from (6.2) it follows that

\[ \langle x^*, y - x \rangle \ge 0 \quad \forall y \in \operatorname{dom} f, \]

therefore,

\[ \langle x^*, u \rangle \ge 0 \quad \forall u \in B(0, \delta). \]

This further implies that x∗ = 0, which leads to a contradiction. Consequently, α > 0. We divide (6.2) by α and, using that for all y ∈ dom f it holds (y, f(y)) ∈ epi f, we get

\[ f(y) \ge \Bigl\langle -\frac{1}{\alpha} x^*, y - x \Bigr\rangle + f(x) \quad \forall y \in \operatorname{dom} f, \]

which gives

\[ f(y) \ge \Bigl\langle -\frac{1}{\alpha} x^*, y - x \Bigr\rangle + f(x) \quad \forall y \in X, \]

thus −(1/α)x∗ ∈ ∂f(x).

In order to prove the second statement of the theorem we consider an arbitrary element x∗ ∈ ∂f(x). Then, for every u ∈ B(0, δ), it holds 〈x∗, u〉 ≤ f(x + u) − f(x) < 1. This means that

\[ \delta \|x^*\|_* = \delta \sup_{\|v\| < 1} |\langle x^*, v \rangle| = \sup_{\|v\| < 1} |\langle x^*, \delta v \rangle| = \sup_{\|u\| < \delta} |\langle x^*, u \rangle| = \sup_{\|u\| < \delta} \langle x^*, u \rangle \le 1, \]

which proves that x∗ ∈ B∗(0, 1/δ). We have shown that ∂f(x) ⊆ (1/δ)B∗(0, 1), in other words, that ∂f(x) is norm bounded.

According to the Theorem of Banach-Alaoglu (see, for instance, [8]), the closed unit ball of X∗ is weakly∗ compact and, since ∂f(x) is weakly∗ closed, it is weakly∗ compact, too. □


We recall that, if X is a Banach space, then, according to the Uniform Boundedness Principle, every weakly∗ compact subset of X∗ is norm bounded and, of course, weakly∗ closed.

In finite-dimensional spaces the (convex) subdifferential of a proper and convex function is nonempty even at points in the relative interior of its effective domain (at which f need not be continuous).

Theorem 6.10 If f : Rn → R̄ is a proper and convex function and x ∈ ri(dom f), then ∂f(x) ≠ ∅.

Proof. Since f is a proper and convex function, its epigraph is a nonempty convex set. According to Theorem 2.18 (see also Exercise 20) we have

\[ \operatorname{ri}(\operatorname{epi} f) = \{ (x, r) \in \mathbb{R}^n \times \mathbb{R} \mid x \in \operatorname{ri}(\operatorname{dom} f),\ f(x) < r \}, \]

which means that (x, f(x)) ∉ ri(epi f). By Theorem 2.22, there exists (a, α) ∈ Rn × R, (a, α) ≠ (0, 0), such that

\[ \inf \{ a^T y + \alpha r \mid (y, r) \in \operatorname{epi} f \} \ge a^T x + \alpha f(x) \tag{6.3} \]

and

\[ \sup \{ a^T y + \alpha r \mid (y, r) \in \operatorname{epi} f \} > a^T x + \alpha f(x). \tag{6.4} \]

As in the proof of Theorem 6.9, from (6.3) it yields that α ≥ 0. If α = 0, then the two above statements become

\[ \inf \{ a^T y \mid y \in \operatorname{dom} f \} \ge a^T x \tag{6.5} \]

and, respectively,

\[ \sup \{ a^T y \mid y \in \operatorname{dom} f \} > a^T x. \tag{6.6} \]

We will prove that $a^T z = a^T x$ for all z ∈ dom f, which will then contradict (6.6). Indeed, since x ∈ ri(dom f), there exists ε > 0 such that B(x, ε) ∩ aff(dom f) ⊆ dom f. Let z ∈ dom f be arbitrarily chosen. Then there exists λ < 0 such that x + λ(z − x) ∈ B(x, ε) and, since x + λ(z − x) ∈ aff(dom f), it yields x + λ(z − x) ∈ dom f. According to (6.5) it holds $a^T (x + \lambda(z - x)) \ge a^T x$ or, equivalently, $a^T z \le a^T x$. Since the opposite inequality is also true by (6.5), we get $a^T z = a^T x$.

This proves that α > 0. The conclusion follows as in the proof of Theorem 6.9 after dividing (6.3) by α. □

Example 6.11 The (convex) subdifferential of a proper and convex function f : Rn → R̄ at an element x ∈ ri(dom f) is not necessarily bounded. Indeed, the function $\delta_{\mathbb{R} \times \{0\}} : \mathbb{R}^2 \to \overline{\mathbb{R}}$ is proper and convex and it holds $\operatorname{ri}(\operatorname{dom} \delta_{\mathbb{R} \times \{0\}}) = \mathbb{R} \times \{0\}$. For every $(x, 0) \in \mathbb{R} \times \{0\}$ it holds

\[ \partial \delta_{\mathbb{R} \times \{0\}}(x, 0) = N_{\mathbb{R} \times \{0\}}(x, 0) = \{0\} \times \mathbb{R}. \]


7 Directional derivative and differentiability

The main aim of this section is to introduce a directional derivative notion for convex functions and to characterize the (convex) subdifferential in terms of it.

Theorem 7.1 (directional derivative) Let f : X → R̄ be a proper and convex function and x ∈ dom f. Then for every u ∈ X there exists

\[ f'(x; u) := \lim_{t \downarrow 0} \frac{f(x + tu) - f(x)}{t} = \inf_{t > 0} \frac{f(x + tu) - f(x)}{t} \in \overline{\mathbb{R}}, \]

which is called the directional derivative of f at x in direction u. The function u ↦ f′(x; u) is sublinear and dom f′(x; ·) = cone(dom f − x). If x ∈ core(dom f), then f′(x; u) ∈ R for all u ∈ X.

Proof. Let u ∈ X and define

\[ \varphi : (0, +\infty) \to \overline{\mathbb{R}}, \qquad \varphi(t) = \frac{f(x + tu) - f(x)}{t}. \]

We will show that φ is nondecreasing, in other words, that for 0 < t1 < t2 it holds φ(t1) ≤ φ(t2). This will imply that f′(x; u) = lim_{t↓0} φ(t) = inf_{t>0} φ(t) ∈ R̄. Indeed, let 0 < t1 < t2. Then, using the convexity of f, it yields

\[ f(x + t_1 u) = f\Bigl( \Bigl( 1 - \frac{t_1}{t_2} \Bigr) x + \frac{t_1}{t_2} (x + t_2 u) \Bigr) \le \Bigl( 1 - \frac{t_1}{t_2} \Bigr) f(x) + \frac{t_1}{t_2} f(x + t_2 u) \]

and, from here,

\[ \varphi(t_1) = \frac{f(x + t_1 u) - f(x)}{t_1} \le \frac{f(x + t_2 u) - f(x)}{t_2} = \varphi(t_2). \]

Next we will show that u ↦ f′(x; u) is sublinear. Obviously, f′(x; 0) = 0. For λ > 0 and u ∈ X it holds

\[ f'(x; \lambda u) = \lim_{t \downarrow 0} \frac{f(x + t \lambda u) - f(x)}{t} = \lambda \lim_{t \downarrow 0} \frac{f(x + \lambda t u) - f(x)}{\lambda t} = \lambda \lim_{s \downarrow 0} \frac{f(x + su) - f(x)}{s} = \lambda f'(x; u), \]

which proves that u ↦ f′(x; u) is positively homogeneous.

Moreover, for u1, u2 ∈ X it holds, by using again that f is convex,

\[ \begin{aligned} f'(x; u_1 + u_2) &= \lim_{t \downarrow 0} \frac{f(x + t(u_1 + u_2)) - f(x)}{t} = \lim_{t \downarrow 0} \frac{f\bigl( \frac{1}{2}(x + 2t u_1) + \frac{1}{2}(x + 2t u_2) \bigr) - f(x)}{t} \\ &\le \lim_{t \downarrow 0} \frac{f(x + 2t u_1) - f(x)}{2t} + \lim_{t \downarrow 0} \frac{f(x + 2t u_2) - f(x)}{2t} = f'(x; u_1) + f'(x; u_2), \end{aligned} \]

which proves that u ↦ f′(x; u) is subadditive.

Since

\[ \begin{aligned} u \in \operatorname{dom} f'(x; \cdot) &\iff f'(x; u) < +\infty \iff \exists t > 0 \text{ such that } \frac{f(x + tu) - f(x)}{t} < +\infty \\ &\iff \exists t > 0 \text{ such that } x + tu \in \operatorname{dom} f \\ &\iff \exists t > 0 \text{ such that } u \in \frac{1}{t}(\operatorname{dom} f - x), \end{aligned} \]

it follows that dom f′(x; ·) = cone(dom f − x).

Assume now that x ∈ core(dom f). Since dom f is a convex set, it yields dom f′(x; ·) = cone(dom f − x) = X. In other words, f′(x; u) < +∞ for all u ∈ X. Assuming that there exists ū ∈ X such that f′(x; ū) = −∞, according to Proposition 3.7 it holds f′(x; u) = −∞ for all u ∈ icr(dom f′(x; ·)) = X, which is a contradiction to f′(x; 0) = 0. This proves that f′(x; u) ∈ R for all u ∈ X. □
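
The monotonicity of the difference quotient, which is the key step of the proof, can be observed numerically. The following minimal sketch uses the hypothetical convex function f(x) = eˣ + |x| (not from the notes) at x = 0, where a direct calculation gives f′(0; 1) = 2 and f′(0; −1) = 0; the step sizes are an arbitrary choice.

```python
import math

def f(x):
    """Hypothetical convex test function exp(x) + |x|."""
    return math.exp(x) + abs(x)

def quotient(x, u, t):
    """Difference quotient (f(x + t u) - f(x)) / t for t > 0."""
    return (f(x + t * u) - f(x)) / t

# Step sizes t = 1, 1/2, 1/4, ..., 2^-15, decreasing towards 0.
steps = [2.0 ** (-k) for k in range(16)]

values_pos = [quotient(0.0, 1.0, t) for t in steps]   # direction u = +1
values_neg = [quotient(0.0, -1.0, t) for t in steps]  # direction u = -1

for t, vp, vn in zip(steps, values_pos, values_neg):
    # Both columns shrink as t decreases and approach f'(0; 1) and f'(0; -1).
    print(t, vp, vn)
```

Since the quotients decrease as t decreases, the last entries approximate the infimum over t > 0 from above.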

The next theorem provides a formulation for the (convex) subdifferential of a convex function in terms of its directional derivative.

Theorem 7.2 Let f : X → R̄ be a proper and convex function and x ∈ dom f. Then

\[ \partial f(x) = \{ x^* \in X^* \mid f'(x; u) \ge \langle x^*, u \rangle \ \forall u \in X \}. \]

Proof. “⊆” Let x∗ ∈ ∂f(x). For all u ∈ X and all t > 0 it holds

\[ \frac{f(x + tu) - f(x)}{t} \ge \frac{1}{t} \langle x^*, x + tu - x \rangle = \langle x^*, u \rangle, \]

which implies that f′(x; u) ≥ 〈x∗, u〉.

“⊇” Let x∗ ∈ X∗ be such that f′(x; u) ≥ 〈x∗, u〉 for all u ∈ X. Then for all u ∈ X it holds

\[ \langle x^*, u \rangle \le f'(x; u) = \lim_{t \downarrow 0,\, t \le 1} \frac{f((1 - t)x + t(x + u)) - f(x)}{t} \le \lim_{t \downarrow 0,\, t \le 1} \frac{t (f(x + u) - f(x))}{t} = f(x + u) - f(x). \]

Consequently,

\[ f(y) - f(x) \ge \langle x^*, y - x \rangle \quad \forall y \in X, \]

which implies x∗ ∈ ∂f(x). □

On the other hand, we have a formulation for the directional derivative of a function in terms of its (convex) subdifferential.

Theorem 7.3 Let f : X → R̄ be a proper and convex function such that f is continuous at x ∈ dom f. Then f′(x; ·) is continuous and

\[ f'(x; u) = \max \{ \langle x^*, u \rangle \mid x^* \in \partial f(x) \} \quad \forall u \in X. \]

Proof. Let δ > 0 be such that f(y) − f(x) < 1 for all y ∈ B(x, δ). Then B(x, δ) ⊆ dom f, thus x ∈ int(dom f) = core(dom f). According to Theorem 7.1 it holds f′(x; u) ∈ R for all u ∈ X. On the other hand, for all u ∈ B(0, δ) we have

\[ f'(x; u) = \inf_{t > 0} \frac{f(x + tu) - f(x)}{t} \le f(x + u) - f(x) < 1, \]

in other words, f′(x; ·) is bounded above on B(0, δ). According to Theorem 4.18, f′(x; ·) is continuous on int(dom f′(x; ·)) = X, which proves the first statement of the theorem.

For all x∗ ∈ X∗ we have

\[ (f'(x; \cdot))^*(x^*) = \sup_{u \in X} \Bigl\{ \langle x^*, u \rangle - \inf_{t > 0} \frac{f(x + tu) - f(x)}{t} \Bigr\} = \sup_{u \in X,\, t > 0} \frac{\langle x^*, tu \rangle - f(x + tu) + f(x)}{t}. \]

If x∗ ∈ ∂f(x), then for all u ∈ X and all t > 0 it holds

\[ \frac{\langle x^*, tu \rangle - f(x + tu) + f(x)}{t} \le 0. \]

Since for u := 0 and t := 1 the fraction is equal to 0, it yields (f′(x; ·))∗(x∗) = 0.

If x∗ ∉ ∂f(x), then there exists y ∈ X such that 〈x∗, y − x〉 − f(y) + f(x) > 0, thus, for all t > 0 (by taking u := (y − x)/t) it holds

\[ (f'(x; \cdot))^*(x^*) \ge \frac{\langle x^*, y - x \rangle - f(y) + f(x)}{t}. \]

We let t → 0 and this implies (f′(x; ·))∗(x∗) = +∞. This proves that (f′(x; ·))∗ = δ_{∂f(x)}.

By the Fenchel-Moreau Theorem we have for all u ∈ X

\[ f'(x; u) = (f'(x; \cdot))^{**}(u) = (\delta_{\partial f(x)})^*(u) = \sigma_{\partial f(x)}(u) = \sup \{ \langle x^*, u \rangle \mid x^* \in \partial f(x) \}. \]

It follows from the fact that ∂f(x) is weakly∗ compact (see Theorem 6.9) that the supremum is attained. □
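
The max formula can be illustrated in R² for the ℓ1-norm f(x) = |x1| + |x2| at the point x = (1, 0), where f is continuous but not differentiable. The sketch below relies on the standard fact that ∂f(1, 0) = {1} × [−1, 1], which is an assumption imported here and not derived in the notes; the step size t and the number of sampled subgradients are illustrative choices.

```python
def l1_norm(v):
    return abs(v[0]) + abs(v[1])

def directional_derivative(x, u, t=1e-6):
    """One-sided difference quotient as a proxy for f'(x; u)."""
    shifted = (x[0] + t * u[0], x[1] + t * u[1])
    return (l1_norm(shifted) - l1_norm(x)) / t

def support_of_subdifferential(u, samples=201):
    """Max of <x*, u> over sampled subgradients (1, s), s in [-1, 1], at x = (1, 0)."""
    return max(
        u[0] + (-1.0 + 2.0 * k / (samples - 1)) * u[1]
        for k in range(samples)
    )

x = (1.0, 0.0)
for u in [(1.0, 2.0), (-1.0, -0.5), (0.0, 1.0)]:
    # The two printed values should agree up to rounding.
    print(u, directional_derivative(x, u), support_of_subdifferential(u))
```

Because the ℓ1-norm is piecewise linear, the difference quotient is already constant for small t, so the comparison is exact up to floating-point rounding.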

Theorem 7.2 and Theorem 7.3 have important consequences when applied to convex and Gateaux differentiable functions. We recall that a proper function f : X → R̄ is said to be Gateaux differentiable at x ∈ dom f if there exists x∗ ∈ X∗ such that

\[ \langle x^*, u \rangle = \lim_{t \to 0} \frac{f(x + tu) - f(x)}{t} \quad \forall u \in X. \]

The functional x∗ is called the Gateaux differential (derivative) of f at x, and it is denoted by ∇f(x).

Remark 7.4 (a) We recall that a proper function f : X → R̄ is said to be Frechet differentiable at x ∈ int(dom f) if there exists x∗ ∈ X∗ such that

\[ \lim_{u \to 0} \frac{f(x + u) - f(x) - \langle x^*, u \rangle}{\|u\|} = 0. \]

In this case f is continuous and Gateaux differentiable at x. The functional x∗ is called the Frechet differential (derivative) of f at x, and is consequently also denoted by ∇f(x).

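(b) The function

\[ f : \mathbb{R}^2 \to \mathbb{R}, \qquad f(x_1, x_2) = \begin{cases} \dfrac{x_1^3 x_2}{x_1^4 + x_2^2}, & \text{if } (x_1, x_2) \ne (0, 0), \\ 0, & \text{otherwise}, \end{cases} \]

is continuous and Gateaux differentiable at (0, 0), and it is not Frechet differentiable at (0, 0).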
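
The behaviour described in (b) can be made visible numerically. The sketch below, with illustrative helper names, evaluates the difference quotient along straight lines through the origin, where it tends to 0 (so the Gateaux derivative at the origin is 0), and the Frechet quotient along the parabola u = (t, t²), where f(t, t²) = t/2 and the quotient tends to 1/2 instead of 0.

```python
import math

def f(x1, x2):
    """The function from Remark 7.4 (b)."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1 ** 3 * x2 / (x1 ** 4 + x2 ** 2)

def line_quotient(u1, u2, t):
    """Difference quotient (f(t u) - f(0)) / t along the direction u = (u1, u2)."""
    return f(t * u1, t * u2) / t

def parabola_ratio(t):
    """Frechet quotient |f(u) - f(0) - <0, u>| / ||u|| along u = (t, t^2)."""
    return abs(f(t, t * t)) / math.hypot(t, t * t)

for t in (1e-1, 1e-2, 1e-3):
    # First two columns shrink towards 0, the last one stays near 1/2.
    print(t, line_quotient(1.0, 1.0, t), line_quotient(1.0, -2.0, t), parabola_ratio(t))
```

The quotient along the parabola does not vanish in the limit, which is exactly the failure of Frechet differentiability at the origin.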

(c) If f : X → R̄ is proper and Gateaux differentiable on B(x, δ) ⊆ dom f, for δ > 0, and the Gateaux derivative ∇f : B(x, δ) → X∗ is continuous at x, then f is Frechet differentiable at x.

In order to show this, we consider an arbitrary u ∈ B(0, δ), u ≠ 0. Then there exists ε > 0 such that x + tu ∈ B(x, δ) for all t ∈ (−ε, 1 + ε). We define φ : (−ε, 1 + ε) → R, φ(t) = f(x + tu) − f(x) − t〈∇f(x), u〉. For all t ∈ (−ε, 1 + ε) it holds

\[ \varphi'(t) = \lim_{s \to 0} \frac{\varphi(t + s) - \varphi(t)}{s} = \lim_{s \to 0} \frac{f(x + (t + s)u) - f(x + tu)}{s} - \langle \nabla f(x), u \rangle = \langle \nabla f(x + tu) - \nabla f(x), u \rangle, \]

which also means that φ is differentiable on (−ε, 1 + ε) and therefore continuous on [0, 1]. By the Mean Value Theorem there exists λ ∈ (0, 1) such that

\[ |f(x + u) - f(x) - \langle \nabla f(x), u \rangle| = |\varphi(1)| = |\varphi(1) - \varphi(0)| = |\varphi'(\lambda)| \le \| \nabla f(x + \lambda u) - \nabla f(x) \|_* \|u\|, \]

consequently,

\[ \frac{|f(x + u) - f(x) - \langle \nabla f(x), u \rangle|}{\|u\|} \le \sup_{t \in [0, 1]} \| \nabla f(x + tu) - \nabla f(x) \|_*. \]

The Frechet differentiability of f at x follows from the fact that

\[ \lim_{u \to 0} \sup_{t \in [0, 1]} \| \nabla f(x + tu) - \nabla f(x) \|_* = 0. \]

Corollary 7.5 If f : X → R̄ is proper, convex and Gateaux differentiable at x ∈ dom f, then ∂f(x) = {∇f(x)}.

Proof. Since for all u ∈ X it holds

\[ f'(x; u) = \lim_{t \to 0} \frac{f(x + tu) - f(x)}{t} = \langle \nabla f(x), u \rangle, \]

according to Theorem 7.2 we have

\[ \partial f(x) = \{ x^* \in X^* \mid \langle \nabla f(x) - x^*, u \rangle \ge 0 \ \forall u \in X \} = \{ x^* \in X^* \mid \langle \nabla f(x) - x^*, u \rangle = 0 \ \forall u \in X \} = \{ \nabla f(x) \}. \]

□

Corollary 7.6 If f : X → R̄ is proper, convex and continuous at x ∈ dom f such that ∂f(x) is a singleton, then f is Gateaux differentiable at x.

Proof. Let ∂f(x) = {x∗}, where x∗ ∈ X∗. By Theorem 7.3 it holds for all u ∈ X

\[ f'(x; u) = \lim_{t \downarrow 0} \frac{f(x + tu) - f(x)}{t} = \langle x^*, u \rangle. \]

From here it yields for all u ∈ X that

\[ \lim_{t \uparrow 0} \frac{f(x + tu) - f(x)}{t} = -\lim_{s \downarrow 0} \frac{f(x + s(-u)) - f(x)}{s} = -\langle x^*, -u \rangle = \langle x^*, u \rangle. \]

In conclusion,

\[ \lim_{t \to 0} \frac{f(x + tu) - f(x)}{t} = \langle x^*, u \rangle, \]

which proves that f is Gateaux differentiable at x and ∇f(x) = x∗. □

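Example 7.7 Let (H, 〈·, ·〉) be a real Hilbert space.

(a) The function $f : H \to \mathbb{R}$, $f(x) = \frac{1}{2}\|x\|^2$, is Frechet differentiable on $H$ and for all $x \in H$ it holds $\nabla f(x) = x$. Indeed,

\[ \lim_{u \to 0} \frac{f(x + u) - f(x) - \langle x, u \rangle}{\|u\|} = \lim_{u \to 0} \frac{\|x + u\|^2 - \|x\|^2 - 2\langle x, u \rangle}{2\|u\|} = \lim_{u \to 0} \frac{\|u\|}{2} = 0. \]

(b) For $f : H \to \mathbb{R}$, $f(x) = \|x\|$, it holds

\[ f'(0; u) = \lim_{t \downarrow 0} \frac{t\|u\|}{t} = \|u\| \quad \forall u \in H, \]

which shows that $f$ is not Gateaux differentiable at 0, since $u \mapsto \|u\|$ is not linear.

On the other hand, for $x \ne 0$ we have

\[ f'(x; u) = \lim_{t \downarrow 0} \frac{\|x + tu\| - \|x\|}{t} = \lim_{t \downarrow 0} \frac{\|x + tu\|^2 - \|x\|^2}{t(\|x + tu\| + \|x\|)} = \lim_{t \downarrow 0} \frac{t\|u\|^2 + 2\langle x, u \rangle}{\|x + tu\| + \|x\|} = \frac{\langle x, u \rangle}{\|x\|} = \Bigl\langle \frac{x}{\|x\|}, u \Bigr\rangle \quad \forall u \in H. \]

Consequently, $\partial f(x) = \bigl\{ \frac{x}{\|x\|} \bigr\}$, which shows that $f$ is Gateaux differentiable at $x$ with $\nabla f(x) = \frac{x}{\|x\|}$.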

We close the section with a result which provides a useful characterization of the convexity of a function on a convex and open set by means of its Gateaux derivative.

Theorem 7.8 Let C ⊆ X be a nonempty, convex and open set and f : C → R a Gateaux differentiable function on C. The following statements are equivalent:

(i) f is convex on C;

(ii) f(y) − f(x) ≥ 〈∇f(x), y − x〉 ∀x, y ∈ C;

(iii) 〈∇f(y) − ∇f(x), y − x〉 ≥ 0 ∀x, y ∈ C;

(iv) if f is twice Gateaux differentiable on C, then

\[ \nabla^2 f(x)(u, u) \ge 0 \quad \forall x \in C \ \forall u \in X, \]

where

\[ \nabla^2 f(x)(u, v) := \lim_{t \to 0} \frac{\langle \nabla f(x + tv), u \rangle - \langle \nabla f(x), u \rangle}{t} \]

denotes the second Gateaux differential (derivative) of f at x ∈ C in the directions u ∈ X and v ∈ X.

Proof. “(i) ⇒ (ii)” Let x, y ∈ C. The function f + δC : X → R̄ is proper, convex and Gateaux differentiable at x ∈ dom(f + δC) and ∇(f + δC)(x) = ∇f(x). According to Corollary 7.5 it holds ∇f(x) ∈ ∂(f + δC)(x), thus 〈∇f(x), y − x〉 ≤ (f + δC)(y) − (f + δC)(x) = f(y) − f(x).

“(ii) ⇒ (i)” Let x, y ∈ C and λ ∈ [0, 1]. Then λx + (1 − λ)y ∈ C and therefore

\[ f(x) - f(\lambda x + (1 - \lambda)y) \ge \langle \nabla f(\lambda x + (1 - \lambda)y), (1 - \lambda)(x - y) \rangle \]

and

\[ f(y) - f(\lambda x + (1 - \lambda)y) \ge \langle \nabla f(\lambda x + (1 - \lambda)y), \lambda(y - x) \rangle. \]

We multiply the first inequality by λ, the second one by 1 − λ and add the resulting inequalities. This yields

\[ \lambda f(x) + (1 - \lambda) f(y) - f(\lambda x + (1 - \lambda)y) \ge 0. \]

Consequently, f is convex on C.

“(ii) ⇒ (iii)” For all x, y ∈ C we have

\[ f(y) - f(x) \ge \langle \nabla f(x), y - x \rangle \]

and

\[ f(x) - f(y) \ge \langle \nabla f(y), x - y \rangle, \]

which by summation gives 0 ≥ 〈∇f(x) − ∇f(y), y − x〉 or, equivalently, 〈∇f(y) − ∇f(x), y − x〉 ≥ 0.

“(iii) ⇒ (ii)” Let x, y ∈ C and ε > 0 be such that (1 − t)x + ty ∈ C for all t ∈ (−ε, 1 + ε). We define φ : (−ε, 1 + ε) → R, φ(t) = f(x + t(y − x)). For all t ∈ (−ε, 1 + ε) it holds

\[ \varphi'(t) = \lim_{s \to 0} \frac{\varphi(t + s) - \varphi(t)}{s} = \lim_{s \to 0} \frac{f(x + (t + s)(y - x)) - f(x + t(y - x))}{s} = \langle \nabla f(x + t(y - x)), y - x \rangle, \]

which also means that φ is differentiable on (−ε, 1 + ε) and therefore continuous on [0, 1]. By the Mean Value Theorem there exists λ ∈ (0, 1) such that

\[ f(y) - f(x) = \varphi(1) - \varphi(0) = \varphi'(\lambda) = \langle \nabla f(x + \lambda(y - x)), y - x \rangle. \]

On the other hand,

\[ \langle \nabla f(x + \lambda(y - x)) - \nabla f(x), y - x \rangle = \frac{1}{\lambda} \langle \nabla f(x + \lambda(y - x)) - \nabla f(x), \lambda(y - x) \rangle \ge 0, \]

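which implies that f(y) − f(x) ≥ 〈∇f(x), y − x〉.

“(iii) ⇒ (iv)” Let x ∈ C and u ∈ X. Since x ∈ core C, there exists δ > 0 such that y := x + δu ∈ C. Then, using tu = (t/δ)(y − x) and (iii),

\[ \nabla^2 f(x)(u, u) = \lim_{t \to 0} \frac{\langle \nabla f(x + tu), u \rangle - \langle \nabla f(x), u \rangle}{t} = \lim_{t \to 0} \frac{1}{t^2} \Bigl\langle \nabla f\Bigl( x + \frac{t}{\delta}(y - x) \Bigr) - \nabla f(x), \frac{t}{\delta}(y - x) \Bigr\rangle \ge 0, \]

where we note that x + (t/δ)(y − x) ∈ C for all t sufficiently close to 0, as C is open.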

“(iv) ⇒ (ii)” Let x, y ∈ C and ε > 0 be such that (1 − t)x + ty ∈ C for all t ∈ (−ε, 1 + ε), and φ : (−ε, 1 + ε) → R, φ(t) = f(x + t(y − x)). We have seen above that for all t ∈ (−ε, 1 + ε) it holds

\[ \varphi'(t) = \langle \nabla f(x + t(y - x)), y - x \rangle. \]

In addition, for all t ∈ (−ε, 1 + ε) it holds

\[ \begin{aligned} \varphi''(t) &= \lim_{s \to 0} \frac{\varphi'(t + s) - \varphi'(t)}{s} = \lim_{s \to 0} \frac{\langle \nabla f(x + (t + s)(y - x)), y - x \rangle - \langle \nabla f(x + t(y - x)), y - x \rangle}{s} \\ &= \nabla^2 f(x + t(y - x))(y - x, y - x) \ge 0. \end{aligned} \]

According to Taylor’s Theorem, there exists λ ∈ (0, 1) such that

\[ \varphi(1) = \varphi(0) + \varphi'(0) + \frac{1}{2} \varphi''(\lambda) \ge \varphi(0) + \varphi'(0), \]

in other words,

\[ f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle. \]

□

Chapter IV

Duality theory

In this chapter we will discuss in detail the theory of conjugate duality in convex analysis and convex optimization, and some of its important consequences.

8 A general perturbation approach

In this section we will present a so-called perturbation approach which allows us to assign to a given minimization problem a conjugate dual problem.

Let X be a nontrivial real normed space, F : X → R̄ a proper function and consider the following so-called primal optimization problem

\[ (PG) \qquad \inf_{x \in X} F(x). \]

Let Y be another nontrivial real normed space and Φ : X × Y → R̄ a function, called perturbation function, fulfilling Φ(x, 0) = F(x) for all x ∈ X. The optimization problem (PG) is nothing else than

\[ (PG) \qquad \inf_{x \in X} \Phi(x, 0). \]

The conjugate dual problem to (PG) is defined as the maximization problem

\[ (DG) \qquad \sup_{y^* \in Y^*} \{ -\Phi^*(0, y^*) \}, \]

where Φ∗ : X∗ × Y∗ → R̄ is the conjugate function of Φ. We denote by v(PG) and v(DG) the optimal objective values of the problems (PG) and (DG), respectively. The next result shows that for (PG) and (DG) weak duality always holds, namely, the optimal objective value of the primal problem is always greater than or equal to the optimal objective value of the dual problem.

Theorem 8.1 (weak duality) It holds

\[ -\infty \le v(DG) \le v(PG) \le +\infty. \]


Proof. For all x ∈ X and all y∗ ∈ Y∗ we have

\[ -\Phi^*(0, y^*) = \inf_{(z, y) \in X \times Y} \{-\langle y^*, y \rangle + \Phi(z, y)\} \leq \Phi(x, 0), \]

which implies that v(PG) ≥ v(DG). □

Next we explore the concept of strong duality, which is the situation when v(PG) = v(DG) and the dual has an optimal solution. An important role in the characterization of strong duality will be played by the infimal value function of Φ, h : Y → R, h(y) = inf_{x∈X} Φ(x, y). One can notice that v(PG) = h(0). The following result shows that the optimal objective value of (DG) can be expressed in terms of the biconjugate of h.

Proposition 8.2 It holds v(DG) = h∗∗(0).

Proof. For all y∗ ∈ Y∗ we have

\[ h^*(y^*) = \sup_{y \in Y} \{\langle y^*, y \rangle - h(y)\} = \sup_{x \in X,\, y \in Y} \{\langle y^*, y \rangle - \Phi(x, y)\} = \Phi^*(0, y^*) \]

and, from here,

\[ h^{**}(0) = \sup_{y^* \in Y^*} \{-h^*(y^*)\} = \sup_{y^* \in Y^*} \{-\Phi^*(0, y^*)\} = v(DG). \]

□

Since h∗∗ ≤ h, it holds h∗∗(0) ≤ h(0), which is nothing else than the weak duality statement expressed via the infimal value function h.
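To make the roles of h, h∗ and h∗∗(0) concrete, the following sketch evaluates them on a grid for a toy perturbation function. The choice Φ(x, y) = x² + (x + y − 1)² on R × R, the grids and all function names are our own assumptions, not part of the notes. For this Φ a short calculation suggests h(y) = (y − 1)²/2 and h∗(y∗) = y∗ + (y∗)²/2, so that v(PG) = h(0) = 1/2 = v(DG), attained at y∗ = −1.

```python
def phi(x, y):
    # hypothetical perturbation function; phi(x, 0) = F(x) = x^2 + (x - 1)^2
    return x * x + (x + y - 1.0) ** 2

# common grid on [-3, 3] with step 0.01 for both x and y
GRID = [i / 100.0 for i in range(-300, 301)]

def h(y):
    # infimal value function, approximated by a minimum over the x-grid
    return min(phi(x, y) for x in GRID)

# tabulate h once so that the conjugate below is cheap to evaluate
H_VALUES = [(y, h(y)) for y in GRID]

def h_conj(y_star):
    # conjugate of h, approximated by a maximum over the y-grid
    return max(y_star * y - hy for y, hy in H_VALUES)

v_primal = h(0.0)
dual_values = [-h_conj(s / 10.0) for s in range(-30, 31)]
v_dual = max(dual_values)

print("v(PG) =", v_primal)
print("v(DG) =", v_dual)
```

On this grid every dual value stays below h(0), in line with weak duality, and the two optimal values coincide, which is the situation characterized in the results that follow.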

Definition 8.3 (normal problem) We say that the problem (PG) is normal if h(0) ∈ R and h is lower semicontinuous at 0.

The next result provides a characterization of the normality of (PG) in case the perturbation function Φ is convex.

Proposition 8.4 Let Φ : X × Y → R be a proper and convex function. The following statements are equivalent:

(i) the problem (PG) is normal;

(ii) v(PG) = v(DG) ∈ R.


Proof. “(i) ⇒ (ii)” Let h̄ denote the lower semicontinuous hull of h. We have h∗∗ ≤ h̄ ≤ h. Since Φ is convex, h is also convex, and this implies that h̄ is convex, too. The problem (PG) being normal, we have h̄(0) = h(0) ∈ R. Since h̄ is a convex and lower semicontinuous function, we have according to Proposition 4.14 that h̄(y) > −∞ for all y ∈ Y. This guarantees the properness of h̄. According to the Fenchel-Moreau Theorem we obtain

\[ \bar{h} = (\bar{h})^{**} = ((\bar{h})^*)^* = (h^*)^* = h^{**} \]

and so h∗∗(0) = h̄(0) = h(0) ∈ R. Since v(PG) = h(0) and v(DG) = h∗∗(0), statement (ii) is shown.

“(ii) ⇒ (i)” Statement (ii) can be equivalently written as h∗∗(0) = h(0) ∈ R. Together with h∗∗ ≤ h̄ ≤ h this implies h̄(0) = h(0) ∈ R, which actually means that (PG) is normal. □

The following notion will be used to characterize the strong duality for (PG) and (DG).

Definition 8.5 (stable problem) We say that the problem (PG) is stable if h(0) ∈ R and ∂h(0) ≠ ∅.

Proposition 8.6 Let Φ : X × Y → R be a proper and convex function. The following statements are equivalent:

(i) the problem (PG) is stable;

(ii) the primal problem (PG) is normal, namely, v(PG) = v(DG) ∈ R, and the dual (DG) has an optimal solution. In this situation the set of optimal solutions of (DG) is equal to ∂h(0).

Proof. “(i) ⇒ (ii)” Assume that h(0) ∈ R and ∂h(0) ≠ ∅. Thus, according to Proposition 6.4, h(0) = h∗∗(0), which is nothing else than v(PG) = v(DG) ∈ R. By Proposition 8.4 it follows that (PG) is normal. Further, consider an element y∗ ∈ ∂h(0). We have h(0) + h∗(y∗) = 0 or, equivalently, v(PG) = h(0) = −h∗(y∗) = −Φ∗(0, y∗), which, according to the weak duality theorem, implies that y∗ is an optimal solution of (DG).

“(ii) ⇒ (i)” Assume that (PG) is normal and that its dual (DG) has an optimal solution y∗ ∈ Y∗. Thus h(0) = v(PG) = v(DG) = −Φ∗(0, y∗) = −h∗(y∗) ∈ R, which is the same as h(0) + h∗(y∗) = 0 or, equivalently, y∗ ∈ ∂h(0). Since the set ∂h(0) is nonempty, the stability of (PG) is proved. □

When strong duality holds for (PG) and (DG), we write

\[ \inf_{x \in X} \Phi(x, 0) = \max_{y^* \in Y^*} \{-\Phi^*(0, y^*)\}, \]

and use “max” instead of “sup” in order to emphasize that the dual problem has an optimal solution.

As we have seen above, stability completely characterizes the existence of strong duality for a given convex optimization problem and its conjugate dual. One of the main challenges in convex analysis is to formulate sufficient conditions, called regularity conditions, which ensure that the primal problem is stable. In the following we will formulate several such generalized interior point regularity conditions in terms of the perturbation function Φ, and also prove the corresponding strong duality theorems.

Theorem 8.7 Let Φ : X × Y → R be a proper and convex function such that 0 ∈ PrY(dom Φ). If the regularity condition

(RC^Φ_1) ∃x′ ∈ X such that (x′, 0) ∈ dom Φ and Φ(x′, ·) is continuous at 0

is fulfilled, then for (PG) and (DG) strong duality holds.

Proof. The infimal value function h : Y → R, h(y) = inf_{x∈X} Φ(x, y), is convex and it fulfills h(0) = v(PG). If h(0) = −∞, then, according to the weak duality theorem, v(PG) = v(DG) = −∞ and every element y∗ ∈ Y∗ is an optimal solution of the dual problem (DG).

Let h(0) > −∞. Since h(0) ≤ Φ(x′, 0) < +∞, it holds h(0) ∈ R. According to (RC^Φ_1), there exists δ > 0 such that

\[ h(y) \leq \Phi(x', y) < \Phi(x', 0) + 1 \quad \forall y \in B(0, \delta), \tag{8.1} \]

which implies that B(0, δ) ⊆ dom h. In conclusion, 0 ∈ int(dom h).

Since h is bounded above on B(0, δ), we are in the setting of Theorem 4.18. If h were not proper, then it would be equal to −∞ on int(dom h), which would contradict h(0) ∈ R. This shows that h is a proper function and, consequently, continuous on int(dom h). According to Theorem 6.9, it holds ∂h(0) ≠ ∅. In other words, the problem (PG) is stable, which leads to the desired conclusion. □

One can notice in the proof of the above theorem that a consequence of the required regularity condition was that 0 is an interior point of dom h = PrY(dom Φ). We will see in the next theorem that one can prove strong duality for the primal-dual pair (PG) − (DG) by assuming “only” that 0 ∈ core(dom h), however, in the setting of complete normed spaces and by additionally requiring that the perturbation function Φ is also lower semicontinuous.

Theorem 8.8 Let Φ : X × Y → R be a proper and convex function such that 0 ∈ PrY(dom Φ). If the regularity condition

(RC^Φ_2) X and Y are Banach spaces, Φ is lower semicontinuous and 0 ∈ core(PrY(dom Φ))

is fulfilled, then for (PG) and (DG) strong duality holds.



Proof. We define Φ̃ : X × Y → R, Φ̃(x, y) = max{Φ(x, y), ‖x‖}, and h̃ : Y → R, h̃(y) = inf_{x∈X} Φ̃(x, y). Then Φ̃ is a proper, convex and lower semicontinuous function, h̃ is a convex function, Φ ≤ Φ̃ and h ≤ h̃. In addition,

\[ \widetilde{h}(y) \geq 0 \quad \forall y \in Y. \]

Let x′ ∈ X such that Φ(x′, 0) ∈ R. Then h̃(0) < +∞, which proves that h̃ is proper. Since dom Φ̃ = dom Φ, it holds dom h = PrY(dom Φ) = PrY(dom Φ̃) = dom h̃, and so 0 ∈ core(dom h) = core(dom h̃).

Let ε := max{h̃(0), 0} + 1 ∈ R and W := {y ∈ Y | h̃(y) < ε}. Then 0 ∈ core W. Indeed, let y ∈ Y and λ > 0 such that λy ∈ dom h̃. We choose t ∈ (0, 1] such that t(h̃(λy) − ε + 1) < 1. Then

\[ \widetilde{h}(t\lambda y) \leq t\,\widetilde{h}(\lambda y) + (1 - t)\,\widetilde{h}(0) \leq t\,\widetilde{h}(\lambda y) + (1 - t)(\varepsilon - 1) = t(\widetilde{h}(\lambda y) - \varepsilon + 1) + \varepsilon - 1 < \varepsilon, \]

which shows that tλy ∈ W and, consequently, proves that 0 ∈ core W. From here we immediately have that 0 ∈ core(cl W).

According to Theorem 1.11, for the convex and closed set cl W we have 0 ∈ core(cl W) = int(cl W), which means that there exists δ > 0 such that B(0, 2δ) ⊆ cl W.

Next we will prove that

\[ \widetilde{h}(y) \leq \varepsilon \quad \forall y \in B(0, \delta). \tag{8.2} \]

Let y ∈ B(0, δ) be arbitrarily chosen. Then 2y ∈ cl W and, consequently, there exists y1 ∈ W such that ‖2y − y1‖ < δ. This means that 4y − 2y1 = 2(2y − y1) ∈ cl W and, consequently, there exists y2 ∈ W such that ‖4y − 2y1 − y2‖ < δ. Repeating this procedure we obtain a sequence (yk)k∈N ⊆ W such that

\[ \left\| 2^k y - 2^{k-1} y_1 - \dots - y_k \right\| < \delta \quad \forall k \geq 1 \]

or, equivalently,

\[ \left\| y - \frac{1}{2} y_1 - \dots - \frac{1}{2^k} y_k \right\| < \frac{1}{2^k} \delta \quad \forall k \geq 1, \]

which proves that \( \sum_{k=1}^{+\infty} \frac{1}{2^k} y_k = y \).

For all k ∈ N, since yk ∈ W, we can choose xk ∈ X such that Φ̃(xk, yk) < ε, which implies ‖xk‖ < ε. Since X is a Banach space and \( \sum_{k=1}^{+\infty} \frac{1}{2^k} \|x_k\| < +\infty \), there exists x ∈ X such that \( \sum_{k=1}^{+\infty} \frac{1}{2^k} x_k = x \) (see, for instance, [4, Lemma 1.12 (ii)]). This proves that

\[ \sum_{k=1}^{+\infty} \frac{1}{2^k} (x_k, y_k) = (x, y). \]

Since Φ̃(x′, 0) ∈ R, by using the convexity of Φ̃ (see Remark 3.2 (a)), we obtain for all n ∈ N

\[ \widetilde{\Phi}\left( \sum_{k=1}^{n} \frac{1}{2^k} (x_k, y_k) + \frac{1}{2^n} (x', 0) \right) \leq \sum_{k=1}^{n} \frac{1}{2^k} \widetilde{\Phi}(x_k, y_k) + \frac{1}{2^n} \widetilde{\Phi}(x', 0) \leq \sum_{k=1}^{n} \frac{1}{2^k} \varepsilon + \frac{1}{2^n} \widetilde{\Phi}(x', 0). \]

For n → +∞ and by using that Φ̃ is lower semicontinuous, this yields

\[ \widetilde{h}(y) \leq \widetilde{\Phi}(x, y) \leq \varepsilon. \]

This proves relation (8.2), which, since h ≤ h̃, further implies

\[ h(y) \leq \varepsilon \quad \forall y \in B(0, \delta), \]

thus 0 ∈ int(dom h). At this point we can argue as in the proof of Theorem 8.7. If h(0) = −∞, then strong duality follows automatically. If h(0) > −∞, then h(0) ≤ Φ(x′, 0) < +∞, in other words, h(0) ∈ R. By Theorem 4.18 we obtain that h is proper and continuous on int(dom h), while Theorem 6.9 further gives that ∂h(0) ≠ ∅. This provides the desired conclusion. □

Remark 8.9 As it follows from the proof of Theorem 8.8, if Φ : X × Y → R is a proper and convex function such that 0 ∈ PrY(dom Φ), then (RC^Φ_2) is equivalent to

(RC^Φ_2′) X and Y are Banach spaces, Φ is lower semicontinuous and 0 ∈ int(PrY(dom Φ)).

The following theorem introduces a regularity condition which is “weaker” than (RC^Φ_2) and (RC^Φ_2′).

Theorem 8.10 Let Φ : X × Y → R be a proper and convex function such that 0 ∈ PrY(dom Φ). If the regularity condition

(RC^Φ_3) X and Y are Banach spaces, Φ is lower semicontinuous and 0 ∈ sqri(PrY(dom Φ))

is fulfilled, then for (PG) and (DG) strong duality holds.

Proof. Since 0 ∈ sqri(PrY(dom Φ)), it holds that Z := cone(PrY(dom Φ)) is a closed linear subspace of the Banach space Y, consequently, it is complete. In addition, dom h = PrY(dom Φ) ⊆ Z.

We assume first that Z = {0}, which corresponds to the case when dom h = {0}. Then v(PG) = h(0) and for all y∗ ∈ Y∗ it holds h∗(y∗) = sup_{y∈Y}{〈y∗, y〉 − h(y)} = −h(0), consequently, v(DG) = sup_{y∗∈Y∗}{−h∗(y∗)} = h(0) = v(PG) and every y∗ ∈ Y∗ is an optimal solution of the dual problem.

Next we address the case when Z is a nontrivial Banach space. Let j : Z → Y, j(z) = z, be the canonical embedding of Z into Y and define Φ̃ : X × Z → R, Φ̃(x, z) = Φ(x, j(z)). The function Φ̃ is proper, convex and lower semicontinuous and its infimal value function is defined as

\[ \widetilde{h} : Z \to \mathbb{R}, \quad \widetilde{h}(z) = \inf_{x \in X} \widetilde{\Phi}(x, z) = \inf_{x \in X} \Phi(x, j(z)) = h(j(z)). \]

Since dom h ⊆ Z, it holds dom h̃ = j^{−1}(dom h) = dom h, which means that 0 ∈ core(PrZ(dom Φ̃)). Then, according to Theorem 8.8, it holds

\[ \inf_{x \in X} \widetilde{\Phi}(x, 0) = \max_{z^* \in Z^*} \{-\widetilde{\Phi}^*(0, z^*)\} \]

or, equivalently,

\[ h(0) = \widetilde{h}(0) = \max_{z^* \in Z^*} \{-\widetilde{h}^*(z^*)\}. \tag{8.3} \]

Denoting by j∗ : Y∗ → Z∗ the adjoint operator of j, one can easily notice that j∗(y∗) = y∗|Z for all y∗ ∈ Y∗. For all y∗ ∈ Y∗, by taking into account that dom h ⊆ Z, it holds

\[ \widetilde{h}^*(j^*(y^*)) = \sup_{z \in Z} \{\langle j^*(y^*), z \rangle - \widetilde{h}(z)\} = \sup_{z \in Z} \{\langle y^*, j(z) \rangle - h(j(z))\} = \sup_{z \in Z} \{\langle y^*, z \rangle - h(z)\} = \sup_{z \in Y} \{\langle y^*, z \rangle - h(z)\} = h^*(y^*). \]

According to (8.3), there exists z̄∗ ∈ Z∗ such that h(0) = −h̃∗(z̄∗). According to the Hahn-Banach Theorem (see, for instance, [4, Theorem 6.6]), there exists ȳ∗ ∈ Y∗ such that j∗(ȳ∗) = ȳ∗|Z = z̄∗. From here we have h(0) = −h̃∗(j∗(ȳ∗)) = −h∗(ȳ∗). Since, due to weak duality, h(0) ≥ −h∗(y∗) for all y∗ ∈ Y∗, it holds

\[ h(0) = \max_{y^* \in Y^*} \{-h^*(y^*)\}, \]

in other words, for (PG) and (DG) strong duality holds. □

In finite-dimensional spaces one can formulate a regularity condition for strong duality in terms of the relative interior.

Theorem 8.11 Let Φ : X × Y → R be a proper and convex function such that 0 ∈ PrY(dom Φ). If the regularity condition

(RC^Φ_4) Y is finite-dimensional and 0 ∈ ri(PrY(dom Φ))

is fulfilled, then for (PG) and (DG) strong duality holds.

Proof. The function h : Y → R is convex and 0 ∈ ri(dom h). If h(0) = −∞, then the statement follows from the weak duality theorem. If h(0) ∈ R, then, by Theorem 2.6 and Proposition 3.7, h is proper. Thus, according to Theorem 6.10, ∂h(0) ≠ ∅, and the conclusion follows from Proposition 8.6. □

Regularity conditions and strong duality lie at the basis of exact conjugate and subdifferential calculus.

Theorem 8.12 (exact conjugate and subdifferential formulas) Let Φ : X × Y → R be a proper and convex function such that 0 ∈ PrY(dom Φ). If one of the regularity conditions (RC^Φ_i), i ∈ {1, 2, 2′, 3, 4}, is fulfilled, then the following statements are true:

(a) (Φ(·, 0))∗(x∗) = min_{y∗∈Y∗} Φ∗(x∗, y∗) for all x∗ ∈ X∗;

(b) ∂(Φ(·, 0))(x) = PrX∗(∂Φ(x, 0)) for all x ∈ X.

Proof. (a) Let x∗ ∈ X∗. We define Φ_{x∗} : X × Y → R, Φ_{x∗}(x, y) = Φ(x, y) − 〈x∗, x〉. By weak duality we always have

\[ (\Phi(\cdot, 0))^*(x^*) = \sup_{x \in X} \{\langle x^*, x \rangle - \Phi(x, 0)\} = -\inf_{x \in X} \Phi_{x^*}(x, 0) \leq -\sup_{y^* \in Y^*} \{-\Phi_{x^*}^*(0, y^*)\} = \inf_{y^* \in Y^*} \Phi_{x^*}^*(0, y^*) = \inf_{y^* \in Y^*} \Phi^*(x^*, y^*). \]

Furthermore, the function Φ_{x∗} is proper and convex with dom Φ_{x∗} = dom Φ, therefore, one of the regularity conditions (RC^{Φ_{x∗}}_i), i ∈ {1, 2, 2′, 3, 4}, is fulfilled. According to Theorems 8.7, 8.8, 8.10 and 8.11 we have

\[ (\Phi(\cdot, 0))^*(x^*) = -\inf_{x \in X} \Phi_{x^*}(x, 0) = -\max_{y^* \in Y^*} \{-\Phi_{x^*}^*(0, y^*)\} = \min_{y^* \in Y^*} \Phi^*(x^*, y^*), \]

which proves statement (a).

(b) Let x ∈ X.

“⊇” Let x∗ ∈ PrX∗(∂Φ(x, 0)). Then there exists y∗ ∈ Y∗ such that (x∗, y∗) ∈ ∂Φ(x, 0). In other words,

\[ \Phi(u, v) - \Phi(x, 0) \geq \langle x^*, u - x \rangle + \langle y^*, v \rangle \quad \forall (u, v) \in X \times Y. \]

From here it follows that Φ(u, 0) − Φ(x, 0) ≥ 〈x∗, u − x〉 for all u ∈ X, which means that x∗ ∈ ∂(Φ(·, 0))(x). This proves that the inclusion

\[ \Pr\nolimits_{X^*}(\partial \Phi(x, 0)) \subseteq \partial(\Phi(\cdot, 0))(x) \]

always holds.

“⊆” Let x∗ ∈ ∂(Φ(·, 0))(x). Then Φ(x, 0) + (Φ(·, 0))∗(x∗) = 〈x∗, x〉. According to (a), there exists y∗ ∈ Y∗ such that (Φ(·, 0))∗(x∗) = Φ∗(x∗, y∗), consequently, Φ(x, 0) + Φ∗(x∗, y∗) = 〈x∗, x〉 or, equivalently, (x∗, y∗) ∈ ∂Φ(x, 0). This proves that x∗ ∈ PrX∗(∂Φ(x, 0)). □


9 Closedness-type regularity conditions

In this section we will introduce a new class of regularity conditions, called closedness-type regularity conditions, that guarantee strong duality for the general optimization problem (PG) and its conjugate dual problem (DG). The conditions will be formulated in terms of the conjugate function of the perturbation function Φ : X × Y → R. Notice that, since Φ∗ : X∗ × Y∗ → R is a convex function, its infimal value function inf_{y∗∈Y∗} Φ∗(·, y∗) : X∗ → R is also a convex function.

Theorem 9.1 Let Φ : X × Y → R be a proper, convex and lower semicontinuous function such that 0 ∈ PrY(dom Φ). Then it holds

\[ (\Phi(\cdot, 0))^*(x^*) = \overline{\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)}^{\,w^*}(x^*) \quad \forall x^* \in X^*. \]

Proof. Since Φ is proper, convex and lower semicontinuous, for the conjugate of inf_{y∗∈Y∗} Φ∗(·, y∗) it holds for all x ∈ X

\[ \left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right)^*(x) = \sup_{x^* \in X^*} \left\{ \langle x^*, x \rangle - \inf_{y^* \in Y^*} \Phi^*(x^*, y^*) \right\} = \sup_{x^* \in X^*,\, y^* \in Y^*} \{\langle x^*, x \rangle - \Phi^*(x^*, y^*)\} = \Phi^{**}(x, 0) = \Phi(x, 0). \]

This also implies that the function \( \overline{\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)}^{\,w^*} \) is proper. Assuming that it takes everywhere the value +∞, it yields that its conjugate, which coincides with (inf_{y∗∈Y∗} Φ∗(·, y∗))∗, is identically −∞. This contradicts the properness of Φ. On the other hand, assuming that \( \overline{\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)}^{\,w^*} \) takes somewhere the value −∞, by the same argument as above, one would have that Φ(x, 0) is equal to +∞ for all x ∈ X. Since this contradicts the feasibility assumption 0 ∈ PrY(dom Φ), the function \( \overline{\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)}^{\,w^*} \) is everywhere greater than −∞.

By the Fenchel-Moreau Theorem we obtain

\[ (\Phi(\cdot, 0))^* = \left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right)^{**} = \left( \overline{\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)}^{\,w^*} \right)^{**} = \overline{\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)}^{\,w^*}, \]

which concludes the proof. □

A first consequence of this result follows.

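Theorem 9.2 Let Φ : X × Y → R be a proper, convex and lower semicontinuous function with 0 ∈ PrY(dom Φ). Then

\[ \operatorname{epi}(\Phi(\cdot, 0))^* = \operatorname{cl}_{\omega^*}\left( \operatorname{epi}\left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right) \right) = \operatorname{cl}_{\omega^*}\left( \Pr\nolimits_{X^* \times \mathbb{R}}(\operatorname{epi} \Phi^*) \right). \]

Proof. We will prove that

\[ \Pr\nolimits_{X^* \times \mathbb{R}}(\operatorname{epi} \Phi^*) \subseteq \operatorname{epi}\left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right) \subseteq \operatorname{cl}_{\omega^*}\left( \Pr\nolimits_{X^* \times \mathbb{R}}(\operatorname{epi} \Phi^*) \right). \tag{9.1} \]

If (x∗, r) ∈ PrX∗×R(epi Φ∗), then it is clear that inf_{y∗∈Y∗} Φ∗(x∗, y∗) ≤ r, thus (x∗, r) ∈ epi(inf_{y∗∈Y∗} Φ∗(·, y∗)).

If (x∗, r) ∈ epi(inf_{y∗∈Y∗} Φ∗(·, y∗)), then for each ε > 0 there is a y∗ ∈ Y∗ such that Φ∗(x∗, y∗) ≤ r + ε. This means that for all ε > 0 we have (x∗, r + ε) ∈ ∪_{y∗∈Y∗} epi(Φ∗(·, y∗)) = PrX∗×R(epi Φ∗), and so (x∗, r) ∈ cl_{ω∗}(PrX∗×R(epi Φ∗)).

The conclusion follows from Theorem 9.1, which yields

\[ \operatorname{epi}(\Phi(\cdot, 0))^* = \operatorname{cl}_{\omega^*}\left( \operatorname{epi}\left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right) \right), \]

and (9.1). □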

Theorem 9.1 and Theorem 9.2 lead to the following result.

Theorem 9.3 Let Φ : X × Y → R be a proper, convex and lower semicontinuous function such that 0 ∈ PrY(dom Φ). Then PrX∗×R(epi Φ∗) is weakly∗ closed if and only if

\[ (\Phi(\cdot, 0))^*(x^*) = \min_{y^* \in Y^*} \Phi^*(x^*, y^*) \quad \forall x^* \in X^*. \]

Proof. The infimal value function inf_{y∗∈Y∗} Φ∗(·, y∗) is proper (see the proof of Theorem 9.1).

“⇒” If PrX∗×R(epi Φ∗) is weakly∗ closed, then, according to Theorem 9.2 and (9.1), epi(Φ(·, 0))∗ = PrX∗×R(epi Φ∗) = epi(inf_{y∗∈Y∗} Φ∗(·, y∗)). This proves that (Φ(·, 0))∗ = inf_{y∗∈Y∗} Φ∗(·, y∗). Let x∗ ∈ X∗. If (Φ(·, 0))∗(x∗) = +∞, then the infimum in the infimal value function of Φ∗ is attained at every y∗ ∈ Y∗. If (Φ(·, 0))∗(x∗) ∈ R, then

\[ (x^*, (\Phi(\cdot, 0))^*(x^*)) \in \operatorname{epi}(\Phi(\cdot, 0))^* = \Pr\nolimits_{X^* \times \mathbb{R}}(\operatorname{epi} \Phi^*) = \bigcup_{y^* \in Y^*} \operatorname{epi}(\Phi^*(\cdot, y^*)), \]

consequently, there exists y∗ ∈ Y∗ such that

\[ \Phi^*(x^*, y^*) \leq (\Phi(\cdot, 0))^*(x^*) = \inf_{y^* \in Y^*} \Phi^*(x^*, y^*). \]

“⇐” The fact that (Φ(·, 0))∗ = inf_{y∗∈Y∗} Φ∗(·, y∗) implies that inf_{y∗∈Y∗} Φ∗(·, y∗) is weakly∗ lower semicontinuous, consequently, from (9.1)

\[ \operatorname{cl}_{\omega^*}\left( \Pr\nolimits_{X^* \times \mathbb{R}}(\operatorname{epi} \Phi^*) \right) \subseteq \operatorname{epi}\left( \overline{\inf_{y^* \in Y^*} \Phi^*(\cdot, y^*)}^{\,w^*} \right) = \operatorname{epi}\left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right). \]

On the other hand, since for all x∗ ∈ X∗ the infimum in the infimal value function of Φ∗ is attained, we have

\[ \operatorname{epi}\left( \inf_{y^* \in Y^*} \Phi^*(\cdot, y^*) \right) \subseteq \bigcup_{y^* \in Y^*} \operatorname{epi}(\Phi^*(\cdot, y^*)) = \Pr\nolimits_{X^* \times \mathbb{R}}(\operatorname{epi} \Phi^*). \]

This proves that PrX∗×R(epi Φ∗) is weakly∗ closed. □


In the light of Theorem 9.3 we can formulate a new so-called closedness-type regularity condition, which is sufficient not only for strong duality for the optimization problems (PG) and (DG), but also for the exact conjugate and subdifferential formulas in Theorem 8.12.

Theorem 9.4 Let Φ : X × Y → R be a proper and convex function such that 0 ∈ PrY(dom Φ). If the regularity condition

(RC^Φ_5) Φ is lower semicontinuous and PrX∗×R(epi Φ∗) is weakly∗ closed

is fulfilled, then the following statements are true:

(a) (Φ(·, 0))∗(x∗) = min_{y∗∈Y∗} Φ∗(x∗, y∗) for all x∗ ∈ X∗;

(b) ∂(Φ(·, 0))(x) = PrX∗(∂Φ(x, 0)) for all x ∈ X.

In particular, for (PG) and (DG) strong duality holds.

Remark 9.5 If Φ : X × Y → R is a proper, convex and lower semicontinuous function such that 0 ∈ PrY(dom Φ) and one of the regularity conditions (RC^Φ_i), i ∈ {1, 2, 2′, 3, 4}, is fulfilled, then, according to Theorem 8.12 and Theorem 9.3, the closedness-type regularity condition (RC^Φ_5) is also fulfilled. In the next sections we will provide examples of primal-dual pairs of optimization problems for which the closedness-type regularity condition is satisfied, but the generalized interior point ones fail.

10 Fenchel duality

In this section, by using the general perturbation approach we develop a duality framework for the optimization problem

\[ (PA) \quad \inf_{x \in X} \{f(x) + (g \circ A)(x)\} \]

and some of its particular instances.

Here, X and Y are nontrivial real normed spaces, A : X → Y is a continuous linear operator, and f : X → R and g : Y → R are proper functions fulfilling A(dom f) ∩ dom g ≠ ∅.

We will consider as perturbation function to this problem

\[ \Phi^A : X \times Y \to \mathbb{R}, \quad \Phi^A(x, y) = f(x) + g(Ax + y). \]

Obviously, Φ^A(·, 0) = f + g ◦ A.

Its conjugate function (Φ^A)∗ : X∗ × Y∗ → R reads for all (x∗, y∗) ∈ X∗ × Y∗

\begin{align*}
(\Phi^A)^*(x^*, y^*) &= \sup_{x \in X,\, y \in Y} \{\langle x^*, x \rangle + \langle y^*, y \rangle - f(x) - g(Ax + y)\} \\
&= \sup_{x \in X,\, r \in Y} \{\langle x^*, x \rangle + \langle y^*, r - Ax \rangle - f(x) - g(r)\} \\
&= \sup_{x \in X,\, r \in Y} \{\langle x^* - A^* y^*, x \rangle + \langle y^*, r \rangle - f(x) - g(r)\} \\
&= f^*(x^* - A^* y^*) + g^*(y^*)
\end{align*}

and gives rise to the following dual problem to (PA)

\[ (DA) \quad \sup_{y^* \in Y^*} \{-f^*(-A^* y^*) - g^*(y^*)\}. \]

By the weak duality theorem we have v(DA) ≤ v(PA).

If f and g are convex functions, then Φ^A is convex, too; if f and g are lower semicontinuous, then Φ^A is lower semicontinuous, too. The feasibility condition guarantees that

\[ 0 \in \Pr\nolimits_Y(\operatorname{dom} \Phi^A) = \{y \in Y \mid \exists x \in \operatorname{dom} f \text{ such that } y \in \operatorname{dom} g - Ax\} = \operatorname{dom} g - A(\operatorname{dom} f). \]

Thus, the regularity condition (RC^Φ_1) becomes

(RC^A_1) ∃x′ ∈ dom f ∩ A^{−1}(dom g) such that g is continuous at Ax′.

In Banach spaces we obtain from (RC^Φ_2) and (RC^Φ_2′) the following regularity conditions

(RC^A_2) X and Y are Banach spaces, f and g are lower semicontinuous and 0 ∈ core(dom g − A(dom f))

and

(RC^A_2′) X and Y are Banach spaces, f and g are lower semicontinuous and 0 ∈ int(dom g − A(dom f)),

respectively, while the regularity condition (RC^Φ_3) leads to

(RC^A_3) X and Y are Banach spaces, f and g are lower semicontinuous and 0 ∈ sqri(dom g − A(dom f)).

The regularity condition (RC^Φ_4) becomes

(RC^A_4) Y is finite-dimensional and ri(dom g) ∩ ri(A(dom f)) ≠ ∅.


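In order to formulate a closedness-type regularity condition for (PA) − (DA), we will need to calculate PrX∗×R(epi(Φ^A)∗). We have

\begin{align*}
(x^*, r) \in \Pr\nolimits_{X^* \times \mathbb{R}}(\operatorname{epi}(\Phi^A)^*) &\Leftrightarrow \exists y^* \in Y^* \text{ such that } f^*(x^* - A^* y^*) + g^*(y^*) \leq r \\
&\Leftrightarrow \exists y^* \in Y^* \text{ such that } (x^* - A^* y^*, r - g^*(y^*)) \in \operatorname{epi} f^* \\
&\Leftrightarrow \exists y^* \in Y^* \text{ such that } (x^*, r) \in \operatorname{epi} f^* + (A^* y^*, g^*(y^*)) \\
&\Leftrightarrow (x^*, r) \in \operatorname{epi} f^* + (A^* \times \operatorname{Id}_{\mathbb{R}})(\operatorname{epi} g^*),
\end{align*}

where Id_R denotes the identity operator on R and A∗ × Id_R : Y∗ × R → X∗ × R is defined as (A∗ × Id_R)(y∗, r) = (A∗y∗, r). Consequently, the regularity condition (RC^Φ_5) leads to

(RC^A_5) f and g are lower semicontinuous and epi f∗ + (A∗ × Id_R)(epi g∗) is weakly∗ closed.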

By combining Theorem 8.12 and Theorem 9.4, we obtain the following result.

Theorem 10.1 Let f : X → R and g : Y → R be proper and convex functions and A : X → Y a continuous linear operator such that A(dom f) ∩ dom g ≠ ∅. If one of the regularity conditions (RC^A_i), i ∈ {1, 2, 2′, 3, 4, 5}, is fulfilled, then the following statements are true:

(a) (f + g ◦ A)∗(x∗) = min_{y∗∈Y∗}{f∗(x∗ − A∗y∗) + g∗(y∗)} for all x∗ ∈ X∗;

(b) ∂(f + g ◦ A)(x) = ∂f(x) + A∗∂g(Ax) for all x ∈ X.

In particular, for (PA) and (DA) strong duality holds.

Proof. The two statements follow from the formula of the conjugate function of Φ^A and the fact that PrX∗(∂Φ^A(x, 0)) = ∂f(x) + A∗∂g(Ax) for all x ∈ X. To see that the last statement is true, let x ∈ X be fixed. Then x∗ ∈ PrX∗(∂Φ^A(x, 0)) if and only if there exists y∗ ∈ Y∗ such that (x∗, y∗) ∈ ∂Φ^A(x, 0) or, equivalently, Φ^A(x, 0) + (Φ^A)∗(x∗, y∗) = 〈x∗, x〉. In other words,

\[ f(x) + g(Ax) + f^*(x^* - A^* y^*) + g^*(y^*) = \langle x^*, x \rangle, \]

which is further equivalent to

\[ f(x) + f^*(x^* - A^* y^*) - \langle x^* - A^* y^*, x \rangle + g(Ax) + g^*(y^*) - \langle y^*, Ax \rangle = 0. \]

By taking into account the Young-Fenchel inequality, this holds if and only if

\[ f(x) + f^*(x^* - A^* y^*) - \langle x^* - A^* y^*, x \rangle = 0 \quad \text{and} \quad g(Ax) + g^*(y^*) - \langle y^*, Ax \rangle = 0 \]

or, equivalently,

\[ x^* - A^* y^* \in \partial f(x) \quad \text{and} \quad y^* \in \partial g(Ax). \]

This shows that x∗ ∈ PrX∗(∂Φ^A(x, 0)) if and only if x∗ ∈ ∂f(x) + A∗∂g(Ax) and concludes the proof. □
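As a numerical illustration of the primal-dual pair (PA) − (DA), the sketch below uses a one-dimensional instance chosen by us: X = Y = R, Ax = 2x, f(x) = x²/2 and g(y) = |y − 1|. These data, the grids and the function names are assumptions made for illustration only. Here g is continuous everywhere, so a condition of type (RC^A_1) holds, and a direct calculation suggests that both optimal values equal 1/8, with the dual optimum at y∗ = −1/4.

```python
A = 2.0  # the linear operator x -> A*x on R (hypothetical choice); its adjoint is again A

def f(x):
    return 0.5 * x * x

def g(y):
    return abs(y - 1.0)

def f_conj(u):
    # conjugate of x^2/2 is u^2/2
    return 0.5 * u * u

def g_conj(v):
    # conjugate of |y - 1|: equals v on [-1, 1] and +infinity elsewhere
    return v if abs(v) <= 1.0 else float("inf")

xs = [i / 1000.0 for i in range(-3000, 3001)]   # grid for the primal variable
vs = [i / 1000.0 for i in range(-1000, 1001)]   # grid covering dom g* = [-1, 1]

# primal objective f(x) + g(Ax), dual objective -f*(-A* v) - g*(v)
v_primal = min(f(x) + g(A * x) for x in xs)
v_dual = max(-f_conj(-A * v) - g_conj(v) for v in vs)

print("v(PA) =", v_primal)
print("v(DA) =", v_dual)
```

The grids contain the exact optimizers x = 1/2 and y∗ = −1/4, which is why the two computed values agree up to rounding.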


In case X = Y and A = Id_X, the identity operator on X, the primal optimization problem (PA) becomes

\[ (P^F) \quad \inf_{x \in X} \{f(x) + g(x)\}. \]

The conjugate of the perturbation function

\[ \Phi^F : X \times X \to \mathbb{R}, \quad \Phi^F(x, y) = f(x) + g(x + y), \]

reads

\[ (\Phi^F)^*(x^*, y^*) = f^*(x^* - y^*) + g^*(y^*) \quad \forall (x^*, y^*) \in X^* \times X^* \]

and gives rise to the following so-called Fenchel dual problem to (P^F)

\[ (D^F) \quad \sup_{y^* \in X^*} \{-f^*(-y^*) - g^*(y^*)\}. \]

For (P^F) and (D^F) weak duality always holds.

The generalized interior point regularity conditions stated for the primal-dual pair (PA) − (DA) become

(RC^F_1) ∃x′ ∈ dom f ∩ dom g such that g (or f) is continuous at x′,

in case X is a Banach space

(RC^F_2) X is a Banach space, f and g are lower semicontinuous and 0 ∈ core(dom g − dom f),

(RC^F_2′) X is a Banach space, f and g are lower semicontinuous and 0 ∈ int(dom g − dom f),

and

(RC^F_3) X is a Banach space, f and g are lower semicontinuous and 0 ∈ sqri(dom g − dom f),

and in case X is finite-dimensional

(RC^F_4) X is finite-dimensional and ri(dom g) ∩ ri(dom f) ≠ ∅,

respectively. The regularity condition (RC^F_3) was introduced by Attouch and Brezis and bears their names, while (RC^F_2) and (RC^F_2′) are called the Moreau-Rockafellar regularity conditions.

The closedness-type regularity condition (RC^A_5) reads in case of the primal-dual pair (P^F) − (D^F)

(RC^F_5) f and g are lower semicontinuous and epi f∗ + epi g∗ is weakly∗ closed.


Theorem 10.1 gives rise to the following result.

Theorem 10.2 Let f, g : X → R be proper and convex functions such that dom f ∩ dom g ≠ ∅. If one of the regularity conditions (RC^F_i), i ∈ {1, 2, 2′, 3, 4, 5}, is fulfilled, then the following statements are true:

(a) (f + g)∗(x∗) = min_{y∗∈X∗}{f∗(x∗ − y∗) + g∗(y∗)} = (f∗ □ g∗)(x∗) for all x∗ ∈ X∗;

(b) ∂(f + g)(x) = ∂f(x) + ∂g(x) for all x ∈ X.

In particular, for (P^F) and (D^F) strong duality holds.
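The conjugate formula in statement (a) can be tested numerically on a simple pair of functions. The sketch below assumes X = R, f(x) = x²/2 and g(x) = |x|; this pair, the grids and the names `conj_of_sum` and `inf_convolution` are our own illustrative choices. For them f∗(u) = u²/2 and g∗ is the indicator function of [−1, 1], so the infimal convolution reduces to a minimum of f∗(x∗ − y∗) over y∗ ∈ [−1, 1].

```python
def f(x):
    return 0.5 * x * x

def g(x):
    return abs(x)

def f_conj(u):
    # conjugate of x^2/2
    return 0.5 * u * u

xs = [i / 1000.0 for i in range(-5000, 5001)]   # grid for the supremum over x
ys = [i / 1000.0 for i in range(-1000, 1001)]   # grid covering dom g* = [-1, 1]

def conj_of_sum(s):
    # (f + g)*(s), approximated by a maximum over the x-grid
    return max(s * x - f(x) - g(x) for x in xs)

def inf_convolution(s):
    # (f* box g*)(s); g* vanishes on [-1, 1], so only f*(s - y) remains
    return min(f_conj(s - y) for y in ys)

points = [-3.0, -1.5, 0.0, 0.5, 2.0]
pairs = [(conj_of_sum(s), inf_convolution(s)) for s in points]
for s, (lhs, rhs) in zip(points, pairs):
    print(s, lhs, rhs)
```

Both columns should agree with max{|x∗| − 1, 0}²/2 at the sampled points, which is what the formula predicts for this pair, since g is continuous and hence (RC^F_1) holds.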

Example 10.3 (a) Let X and Y be nontrivial real Banach spaces and f = δ_{X×{0}}, g = δ_{{0}×Y} : X × Y → R proper, convex and lower semicontinuous functions. It holds dom f ∩ dom g = {(0, 0)} and v(P^F) = inf_{(x,y)∈X×Y}{f(x, y) + g(x, y)} = 0. Since f∗ = δ_{{0}×Y∗} and g∗ = δ_{X∗×{0}}, it holds v(D^F) = 0 and (x∗, y∗) = (0, 0) is an optimal solution of the dual, which means that for (P^F) and (D^F) strong duality holds. Since neither f nor g is continuous at (0, 0), the regularity condition (RC^F_1) is not fulfilled. On the other hand, since dom g − dom f = X × Y, we have 0 ∈ core(dom g − dom f), thus the regularity condition (RC^F_2) is fulfilled.

(b) Let X be an infinite-dimensional real Banach space and x^♯ : X → R a non-continuous linear functional. The functions f = x^♯ and g = −x^♯ are proper and convex and have full domain, which means that 0 ∈ core(dom f − dom g). Their conjugate functions are identically +∞, thus v(D^F) = −∞. On the other hand, v(P^F) = 0. This shows that for the regularity conditions (RC^F_i), i ∈ {2, 2′, 3}, the assumption that the functions f and g are lower semicontinuous cannot be omitted.

Example 10.4 (a) The functions f = δ_{[−1,+∞)×{0}}, g = δ_{(−∞,1]×{0}} : R² → R are proper, convex and lower semicontinuous, (0, 0) ∈ dom f ∩ dom g and it holds v(P^F) = inf_{x∈R²}{f(x) + g(x)} = 0. We have

\[ f^* : \mathbb{R}^2 \to \mathbb{R}, \quad f^*(x_1^*, x_2^*) = \begin{cases} -x_1^*, & \text{if } x_1^* \leq 0, \\ +\infty, & \text{if } x_1^* > 0, \end{cases} \tag{10.1} \]

and

\[ g^* : \mathbb{R}^2 \to \mathbb{R}, \quad g^*(x_1^*, x_2^*) = \begin{cases} x_1^*, & \text{if } x_1^* \geq 0, \\ +\infty, & \text{if } x_1^* < 0, \end{cases} \tag{10.2} \]

thus

\[ v(D^F) = \sup_{(x_1^*, x_2^*) \in \mathbb{R}^2} \{-f^*(-x_1^*, -x_2^*) - g^*(x_1^*, x_2^*)\} = \sup_{(x_1^*, x_2^*) \in \mathbb{R}_+ \times \mathbb{R}} \{-2x_1^*\} = 0 \]

and every (0, x∗_2), with x∗_2 ∈ R, is an optimal solution of the dual. This shows that for (P^F) and (D^F) strong duality holds. Since cone(dom g − dom f) = R × {0}, neither (RC^F_1) nor (RC^F_2) is fulfilled. However, the regularity condition (RC^F_3) is fulfilled.

(b) The functions f = δ_{(−1,+∞)×{0}}, g = δ_{(−∞,1)×{0}} : R² → R are proper, convex and not lower semicontinuous, (0, 0) ∈ dom f ∩ dom g and it holds v(P^F) = inf_{x∈R²}{f(x) + g(x)} = 0. Their conjugate functions are given by (10.1) and (10.2), respectively, thus for (P^F) and (D^F) strong duality holds. None of the regularity conditions (RC^F_i), i ∈ {1, 2, 2′, 3}, is fulfilled, however, since ri(dom g) ∩ ri(dom f) = (−1, 1) × {0}, the regularity condition (RC^F_4) holds.

Example 10.5 Let ℓ² = {(x_n)_{n∈N} ⊆ R | Σ_{n∈N} |x_n|² < +∞}, the convex and closed sets C = {(x_n)_{n∈N} ∈ ℓ² | x_{2n−1} + x_{2n} = 0 ∀n ∈ N} and D = {(x_n)_{n∈N} ∈ ℓ² | x_{2n} + x_{2n+1} = 0 ∀n ∈ N}, and the proper, convex and lower semicontinuous functions f, g : ℓ² → R, f = δ_C and, for x = (x_n)_{n∈N}, g(x) = x_1 + δ_D(x). It holds v(P^F) = inf_{x∈ℓ²}{f(x) + g(x)} = 0 and v(D^F) = sup_{x∗∈ℓ²}{−f∗(x∗) − g∗(x∗)} = −∞, while 0 ∉ sqri(dom g − dom f) and 0 ∈ icr(dom g − dom f). This shows that the condition 0 ∈ sqri(dom g − dom f) in (RC^F_3) cannot be replaced by the weaker condition 0 ∈ icr(dom g − dom f).

Example 10.6 (closedness-type versus generalized interior point regularity conditions for Fenchel duality) Let f, g : R → R, f(x) = (1/2)x² + δ_{R+}(x) and g = δ_{R−} be proper, convex and lower semicontinuous functions. Then dom f ∩ dom g = {0} and v(P^F) = 0. It holds

\[ f^*(x^*) = \begin{cases} \frac{1}{2}(x^*)^2, & \text{if } x^* \geq 0, \\ 0, & \text{if } x^* < 0 \end{cases} \quad \text{and} \quad g^* = \delta_{\mathbb{R}_+}, \]

thus v(D^F) = 0 and every x∗ ≥ 0 is an optimal solution of the dual. Since dom g − dom f = R−, none of the regularity conditions (RC^F_i), i ∈ {1, 2, 2′, 3, 4}, is fulfilled. On the other hand, epi f∗ + epi g∗ = R × R+, which means that the closedness-type regularity condition (RC^F_5) is verified.
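The conjugate of f and the dual value in Example 10.6 can be checked numerically. The sketch below is a hedged illustration: the truncation of dom f = R+ to the grid [0, 10], the sample points and all function names are our own assumptions. It compares a grid approximation of f∗(x∗) = sup_{x≥0}{x∗x − x²/2} with the closed form (1/2)(x∗)² for x∗ ≥ 0 and 0 for x∗ < 0, and evaluates the dual objective −f∗(−x∗) − g∗(x∗).

```python
# grid for dom f = [0, +inf), truncated to [0, 10] for the computation
xs = [i / 1000.0 for i in range(0, 10001)]

def f_conj_numeric(s):
    # sup over x >= 0 of s*x - x^2/2, approximated on the grid
    return max(s * x - 0.5 * x * x for x in xs)

def f_conj_closed(s):
    # closed form of the conjugate of f(x) = x^2/2 + indicator of [0, +inf)
    return 0.5 * s * s if s >= 0 else 0.0

def dual_objective(s):
    # -f*(-s) - g*(s), where g* is the indicator function of [0, +inf)
    return -f_conj_closed(-s) if s >= 0 else float("-inf")

samples = [-2.0, -0.5, 0.0, 0.7, 3.0]
errors = [abs(f_conj_numeric(s) - f_conj_closed(s)) for s in samples]

v_primal = 0.0  # dom f and dom g only meet at 0, where f + g vanishes
v_dual = max(dual_objective(s / 10.0) for s in range(-50, 51))

print("max deviation from the closed form:", max(errors))
print("v(PF) =", v_primal, " v(DF) =", v_dual)
```

The dual objective is identically 0 on R+, consistent with every x∗ ≥ 0 being optimal for the dual.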

Example 10.7 For C = R+ × R and D = {(x_1, x_2) ∈ R² | 2x_1 + x_2² ≤ 0}, the functions f = δ_C, g = δ_D : R² → R are proper, convex and lower semicontinuous and fulfill dom f ∩ dom g = {(0, 0)} and v(P^F) = inf_{(x_1,x_2)∈R²}{f(x_1, x_2) + g(x_1, x_2)} = 0. It holds

\[ f^* = \delta_{\mathbb{R}_- \times \{0\}} \quad \text{and} \quad g^*(x_1^*, x_2^*) = \begin{cases} \dfrac{(x_2^*)^2}{2x_1^*}, & \text{if } x_1^* > 0, \\ 0, & \text{if } x_1^* = x_2^* = 0, \\ +\infty, & \text{otherwise}, \end{cases} \]

and v(D^F) = sup_{(x∗_1,x∗_2)∈R²}{−f∗(−x∗_1, −x∗_2) − g∗(x∗_1, x∗_2)} = 0, and (x∗_1, x∗_2) = (0, 0) is an optimal solution of the dual. This means that, despite the fact that none of the regularity conditions (RC^F_i), i ∈ {1, 2, 2′, 3, 4, 5}, is fulfilled, for (P^F) and (D^F) strong duality holds.


In case f(x) = 0 for all x ∈ X, the primal optimization problem (PA) becomes

\[ (P^{A_g}) \quad \inf_{x \in X} (g \circ A)(x). \]

The conjugate of the perturbation function

\[ \Phi^{A_g} : X \times Y \to \mathbb{R}, \quad \Phi^{A_g}(x, y) = g(Ax + y), \]

reads for all (x∗, y∗) ∈ X∗ × Y∗

\[ (\Phi^{A_g})^*(x^*, y^*) = f^*(x^* - A^* y^*) + g^*(y^*) = \begin{cases} g^*(y^*), & \text{if } x^* = A^* y^*, \\ +\infty, & \text{otherwise}, \end{cases} \]

and gives rise to the following dual problem to (P^{A_g})

\[ (D^{A_g}) \quad \sup_{y^* \in Y^*,\, A^* y^* = 0} \{-g^*(y^*)\}. \]

For (P^{A_g}) and (D^{A_g}) weak duality always holds.

The generalized interior point regularity conditions stated for the primal-dual pair (PA) − (DA) become

(RC^{A_g}_1) ∃x′ ∈ A^{−1}(dom g) such that g is continuous at Ax′,

in case X and Y are Banach spaces

(RC^{A_g}_2) X and Y are Banach spaces, g is lower semicontinuous and 0 ∈ core(dom g − ran A),

(RC^{A_g}_2′) X and Y are Banach spaces, g is lower semicontinuous and 0 ∈ int(dom g − ran A),

and

(RC^{A_g}_3) X and Y are Banach spaces, g is lower semicontinuous and 0 ∈ sqri(dom g − ran A),

and in case Y is finite-dimensional

(RC^{A_g}_4) Y is finite-dimensional and ri(dom g) ∩ ran A ≠ ∅,

respectively, while the closedness-type regularity condition reads

(RC^{A_g}_5) g is lower semicontinuous and (A∗ × Id_R)(epi g∗) is weakly∗ closed.

Theorem 10.1 leads to the following result.


Theorem 10.8 Let g : Y → R be a proper and convex function and A : X → Y a continuous linear operator such that ran A ∩ dom g ≠ ∅. If one of the regularity conditions (RC^{A_g}_i), i ∈ {1, 2, 2′, 3, 4, 5}, is fulfilled, then the following statements are true:

(a) (g ◦ A)∗(x∗) = min_{y∗∈Y∗, A∗y∗=x∗}{g∗(y∗)} for all x∗ ∈ X∗;

(b) ∂(g ◦ A)(x) = A∗∂g(Ax) for all x ∈ X.

In particular, for (P^{A_g}) and (D^{A_g}) strong duality holds.

We close this section by considering a primal-dual pair of optimization problems which is introduced as a special instance of (P^{A_g}) − (D^{A_g}). For m ≥ 2, let f_i : X → R, i = 1, ..., m, be proper functions such that ∩_{i=1}^m dom f_i ≠ ∅. We denote X^m := X × ... × X (m times). Then g : X^m → R, g(x_1, ..., x_m) = Σ_{i=1}^m f_i(x_i), is a proper function and A : X → X^m, Ax = (x, ..., x), is a continuous linear operator such that ran A ∩ dom g ≠ ∅.

For this choice, the primal optimization problem (P^{A_g}) becomes

\[ (P^\Sigma) \quad \inf_{x \in X} \left\{ \sum_{i=1}^{m} f_i(x) \right\}. \]

For all (x∗_1, ..., x∗_m) ∈ (X∗)^m, g∗(x∗_1, ..., x∗_m) = Σ_{i=1}^m f∗_i(x∗_i) and A∗(x∗_1, ..., x∗_m) = Σ_{i=1}^m x∗_i, thus the dual (D^{A_g}) is nothing else than

\[ (D^\Sigma) \quad \sup_{x_i^* \in X^*,\, i = 1, \dots, m,\ \sum_{i=1}^{m} x_i^* = 0} \left\{ -\sum_{i=1}^{m} f_i^*(x_i^*) \right\}. \]

For (P^Σ) and (D^Σ) weak duality always holds.

As dom g = ∏_{i=1}^m dom f_i and A^{−1}(dom g) = ∩_{i=1}^m dom f_i, the regularity condition (RC^{A_g}_1) reads

(RC^Σ_1) ∃x′ ∈ ∩_{i=1}^m dom f_i such that f_i is continuous at x′, i = 1, ..., m.

We have ran A = {(x, ..., x) | x ∈ X} = Δ_{X^m}, which denotes the diagonal set of X^m. The regularity conditions (RC^{A_g}_i), i ∈ {2, 2′, 3, 4}, lead to

(RC^Σ_2) X is a Banach space, f_i, i = 1, ..., m, are lower semicontinuous and 0 ∈ core(∏_{i=1}^m dom f_i − Δ_{X^m}),

(RC^Σ_2′) X is a Banach space, f_i, i = 1, ..., m, are lower semicontinuous and 0 ∈ int(∏_{i=1}^m dom f_i − Δ_{X^m}),


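(RC^Σ_3) X is a Banach space, f_i, i = 1, ..., m, are lower semicontinuous and 0 ∈ sqri(∏_{i=1}^m dom f_i − Δ_{X^m}),

and

(RC^Σ_4) X is finite-dimensional and ∩_{i=1}^m ri(dom f_i) ≠ ∅,

respectively.

We notice that

\begin{align*}
(x^*, r) \in (A^* \times \operatorname{Id}_{\mathbb{R}})(\operatorname{epi} g^*) &\Leftrightarrow \exists (x_1^*, \dots, x_m^*) \in (X^*)^m \text{ such that } g^*(x_1^*, \dots, x_m^*) \leq r \text{ and } A^*(x_1^*, \dots, x_m^*) = x^* \\
&\Leftrightarrow \exists (x_1^*, \dots, x_m^*) \in (X^*)^m \text{ such that } \sum_{i=1}^{m} f_i^*(x_i^*) \leq r \text{ and } \sum_{i=1}^{m} x_i^* = x^* \\
&\Leftrightarrow (x^*, r) \in \sum_{i=1}^{m} \operatorname{epi} f_i^*,
\end{align*}

thus the closedness-type regularity condition (RC^{A_g}_5) reads

(RC^Σ_5) f_i, i = 1, ..., m, are lower semicontinuous and Σ_{i=1}^m epi f∗_i is weakly∗ closed.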

Theorem 10.8 leads to the following result.

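Theorem 10.9 Let $f_i : X \to \overline{\mathbb{R}}$, $i = 1, ..., m$, be proper and convex functions such that $\bigcap_{i=1}^m \operatorname{dom} f_i \neq \emptyset$. If one of the regularity conditions $(RC_i^\Sigma)$, $i \in \{1, 2, 2', 3, 4, 5\}$, is fulfilled, then the following statements are true:

(a) $\left(\sum_{i=1}^m f_i\right)^*(x^*) = \min\left\{\sum_{i=1}^m f_i^*(x_i^*) \,\middle|\, \sum_{i=1}^m x_i^* = x^*\right\} = (f_1^* \,\square\, ... \,\square\, f_m^*)(x^*)$ for all $x^* \in X^*$;

(b) $\partial\left(\sum_{i=1}^m f_i\right)(x) = \sum_{i=1}^m \partial f_i(x)$ for all $x \in X$.

In particular, for $(P^\Sigma)$ and $(D^\Sigma)$ strong duality holds.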
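The conjugate formula in Theorem 10.9 (a) can be checked by hand on a one-dimensional instance. The following is a minimal sketch, assuming the illustrative choice $X = \mathbb{R}$, $m = 2$ and $f_1(x) = f_2(x) = \frac{1}{2}x^2$ (so that $f_1^* = f_2^* = \frac{1}{2}(\cdot)^2$); the function names and the grid are hypothetical and only serve the illustration. Suprema and infima are replaced by maxima and minima over a finite rational grid that contains the exact optimizers.

```python
from fractions import Fraction

def conj_on_grid(f, p, grid):
    """Approximate f*(p) = sup_x { p*x - f(x) } by a maximum over a finite grid."""
    return max(p * x - f(x) for x in grid)

def inf_conv_on_grid(h1, h2, p, grid):
    """Approximate (h1 [] h2)(p) = inf_{p1} { h1(p1) + h2(p - p1) } over a finite grid."""
    return min(h1(p1) + h2(p - p1) for p1 in grid)

# Assumed test data: f1(x) = f2(x) = x^2 / 2 on R, whose conjugate is again (.)^2 / 2.
half_square = lambda t: Fraction(1, 2) * t * t
f_sum = lambda x: half_square(x) + half_square(x)      # f1 + f2

grid = [Fraction(k, 4) for k in range(-40, 41)]        # exact rational grid on [-10, 10]
p = Fraction(3)

lhs = conj_on_grid(f_sum, p, grid)                          # (f1 + f2)*(p)
rhs = inf_conv_on_grid(half_square, half_square, p, grid)   # (f1* [] f2*)(p)
print(lhs, rhs)
```

Both quantities are computed with exact rational arithmetic, so they can be compared with `==` rather than up to a tolerance.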

11 Lagrange duality

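In this section, by using the general perturbation approach, we develop a duality framework for the constrained optimization problem

$$(P^L) \quad \begin{array}{ll} \inf & f(x). \\ \text{such that} & x \in S, \\ & g(x) \leqq_C 0 \end{array}$$

Here, $X$ and $Z$ are nontrivial real normed spaces, $C \subseteq Z$ is a nonempty proper ($C \cap -C = \{0\}$) convex cone, $S \subseteq X$ is a nonempty set, $f : X \to \overline{\mathbb{R}}$ is a proper function and $g : X \to Z$ is a vector function such that $\operatorname{dom} f \cap S \cap g^{-1}(-C) \neq \emptyset$. By "$\leqq_C$" we denote the partial order on $Z$ defined for all $u, v \in Z$ by

$$u \leqq_C v \Leftrightarrow v - u \in C.$$

We will consider as perturbation function to this problem

$$\Phi^L : X \times Z \to \overline{\mathbb{R}}, \quad \Phi^L(x, z) = \begin{cases} f(x), & \text{if } x \in S,\ g(x) \leqq_C z, \\ +\infty, & \text{otherwise.} \end{cases}$$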

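Its conjugate function $(\Phi^L)^* : X^* \times Z^* \to \overline{\mathbb{R}}$ reads for all $(x^*, z^*) \in X^* \times Z^*$

$$\begin{aligned}
(\Phi^L)^*(x^*, z^*) &= \sup_{\substack{x \in S,\ z \in Z \\ g(x) - z \in -C}} \{\langle x^*, x\rangle + \langle z^*, z\rangle - f(x)\} \\
&= \sup_{s \in -C} \langle -z^*, s\rangle + \sup_{x \in S} \{\langle x^*, x\rangle + \langle z^*, g(x)\rangle - f(x)\} \\
&= \begin{cases} \sup_{x \in S} \{\langle x^*, x\rangle + \langle z^*, g(x)\rangle - f(x)\}, & \text{if } z^* \in -C^*, \\ +\infty, & \text{otherwise,} \end{cases}
\end{aligned}$$

where $C^* = \{z^* \in Z^* \mid \langle z^*, c\rangle \geq 0\ \forall c \in C\}$ denotes the dual cone of $C$, and gives rise to the following so-called Lagrange dual problem to $(P^L)$

$$(D^L) \quad \sup_{z^* \in -C^*} \inf_{x \in S} \{f(x) - \langle z^*, g(x)\rangle\},$$

or, equivalently,

$$(D^L) \quad \sup_{z^* \in C^*} \inf_{x \in S} \{f(x) + \langle z^*, g(x)\rangle\}.$$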

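By the weak duality theorem we have $v(D^L) \leq v(P^L)$.

The perturbation function $\Phi^L$ is proper. If $S$ is a convex set, $f$ a convex function and $g$ a $C$-convex function, namely,

$$g(\lambda x + (1 - \lambda)y) \leqq_C \lambda g(x) + (1 - \lambda) g(y) \quad \forall x, y \in X\ \forall \lambda \in [0, 1]$$

or, equivalently, its $C$-epigraph $\operatorname{epi}_C g = \{(x, z) \in X \times Z \mid g(x) \leqq_C z\}$ is a convex set, then $\Phi^L$ is convex. This can be easily noticed from the fact that

$$\operatorname{epi} \Phi^L = \{(x, z, r) \in X \times Z \times \mathbb{R} \mid (x, r) \in \operatorname{epi} f\} \cap (S \times Z \times \mathbb{R}) \cap (\operatorname{epi}_C g \times \mathbb{R}).$$

The feasibility condition guarantees that

$$0 \in \operatorname{Pr}_Z(\operatorname{dom} \Phi^L) = \{z \in Z \mid \exists x \in \operatorname{dom} f \cap S \text{ such that } z \in g(x) + C\} = g(\operatorname{dom} f \cap S) + C.$$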

The regularity condition $(RC_1^\Phi)$ states in this particular case that there exists $x' \in \operatorname{dom} f \cap S \cap g^{-1}(-C)$ such that the function $z \mapsto f(x') + \delta_S(x') + \delta_{g(x') + C}(z)$ is continuous at $0$, which is the same as asking that there exists $x' \in \operatorname{dom} f \cap S \cap g^{-1}(-C)$ such that $0 \in \operatorname{int}(g(x') + C)$ or, equivalently, that

$(RC_1^L)$ $\exists x' \in \operatorname{dom} f \cap S$ such that $g(x') \in -\operatorname{int} C$.

This is nothing else than the classical Slater constraint qualification.

If $S$ is a closed set, $f$ is a lower semicontinuous function and $g$ is a $C$-epi closed function, which means that its $C$-epigraph is a closed set, then $\operatorname{epi} \Phi^L$ is a closed set, in other words, $\Phi^L$ is lower semicontinuous. If $C$ is closed and $\langle z^*, g(\cdot)\rangle$ is lower semicontinuous for all $z^* \in C^*$, then $g$ is $C$-epi closed.

In Banach spaces we obtain from $(RC_2^\Phi)$ and $(RC_{2'}^\Phi)$ the following regularity conditions

$(RC_2^L)$ $X$ and $Z$ are Banach spaces, $S$ is closed, $f$ is lower semicontinuous, $g$ is $C$-epi closed and $0 \in \operatorname{core}(g(\operatorname{dom} f \cap S) + C)$

and

$(RC_{2'}^L)$ $X$ and $Z$ are Banach spaces, $S$ is closed, $f$ is lower semicontinuous, $g$ is $C$-epi closed and $0 \in \operatorname{int}(g(\operatorname{dom} f \cap S) + C)$,

respectively, while the regularity condition $(RC_3^\Phi)$ leads to

$(RC_3^L)$ $X$ and $Z$ are Banach spaces, $S$ is closed, $f$ is lower semicontinuous, $g$ is $C$-epi closed and $0 \in \operatorname{sqri}(g(\operatorname{dom} f \cap S) + C)$.

The regularity condition $(RC_4^\Phi)$ becomes

$(RC_4^L)$ $Z$ is finite-dimensional and $0 \in \operatorname{ri}(g(\operatorname{dom} f \cap S) + C)$.

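In order to formulate a closedness-type regularity condition for $(P^L) - (D^L)$, we will need to calculate $\operatorname{Pr}_{X^* \times \mathbb{R}}(\operatorname{epi}(\Phi^L)^*)$. We have

$$\begin{aligned}
(x^*, r) \in \operatorname{Pr}_{X^* \times \mathbb{R}}(\operatorname{epi}(\Phi^L)^*)
&\Leftrightarrow \exists z^* \in -C^* \text{ such that } \sup_{x \in S} \{\langle x^*, x\rangle + \langle z^*, g(x)\rangle - f(x)\} \leq r \\
&\Leftrightarrow \exists z^* \in C^* \text{ such that } \sup_{x \in X} \{\langle x^*, x\rangle - \langle z^*, g(x)\rangle - f(x) - \delta_S(x)\} \leq r \\
&\Leftrightarrow \exists z^* \in C^* \text{ such that } (f + \langle z^*, g(\cdot)\rangle + \delta_S)^*(x^*) \leq r \\
&\Leftrightarrow (x^*, r) \in \bigcup_{z^* \in C^*} \operatorname{epi}(f + \langle z^*, g(\cdot)\rangle + \delta_S)^*.
\end{aligned}$$

Consequently, the regularity condition $(RC_5^\Phi)$ leads to

$(RC_5^L)$ $S$ is closed, $f$ is lower semicontinuous, $g$ is $C$-epi closed and $\bigcup_{z^* \in C^*} \operatorname{epi}(f + \langle z^*, g(\cdot)\rangle + \delta_S)^*$ is weakly$^*$ closed.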

By combining Theorem 8.12 and Theorem 9.4, we obtain the following result.

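Theorem 11.1 Let $S \subseteq X$ be a nonempty convex set, $f : X \to \overline{\mathbb{R}}$ a proper and convex function and $g : X \to Z$ a $C$-convex function such that $\operatorname{dom} f \cap S \cap g^{-1}(-C) \neq \emptyset$. If one of the regularity conditions $(RC_i^L)$, $i \in \{1, 2, 2', 3, 4, 5\}$, is fulfilled, then the following statements are true:

(a) $\inf_{x \in S,\ g(x) \leqq_C 0} \{f(x) - \langle x^*, x\rangle\} = \max_{z^* \in C^*} \inf_{x \in S} \{f(x) + \langle z^*, g(x)\rangle - \langle x^*, x\rangle\}$ for all $x^* \in X^*$;

(b) $\partial\left(f + \delta_{S \cap g^{-1}(-C)}\right)(x) = \bigcup_{z^* \in C^*,\ \langle z^*, g(x)\rangle = 0} \partial(f + \langle z^*, g(\cdot)\rangle + \delta_S)(x)$ for all $x \in S \cap g^{-1}(-C)$.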

In particular, for (PL) and (DL) strong duality holds.

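Proof. The two statements follow from the formula of the conjugate function of $\Phi^L$ and the fact that

$$\operatorname{Pr}_{X^*}(\partial \Phi^L(x, 0)) = \bigcup_{z^* \in C^*,\ \langle z^*, g(x)\rangle = 0} \partial(f + \langle z^*, g(\cdot)\rangle + \delta_S)(x) \quad \forall x \in S \cap g^{-1}(-C).$$

To see that the last statement is true, let $x \in S \cap g^{-1}(-C)$ be fixed. Then $x^* \in \operatorname{Pr}_{X^*}(\partial \Phi^L(x, 0))$ if and only if there exists $-z^* \in Z^*$ such that $(x^*, -z^*) \in \partial \Phi^L(x, 0)$ or, equivalently, $\Phi^L(x, 0) + (\Phi^L)^*(x^*, -z^*) = \langle x^*, x\rangle$. This is nothing else than $z^* \in C^*$ and

$$f(x) + (f + \langle z^*, g(\cdot)\rangle + \delta_S)^*(x^*) = \langle x^*, x\rangle,$$

which, by taking into account the Young-Fenchel inequality, is further equivalent to $z^* \in C^*$, $\langle z^*, g(x)\rangle = 0$ and

$$f(x) + \langle z^*, g(x)\rangle + \delta_S(x) + (f + \langle z^*, g(\cdot)\rangle + \delta_S)^*(x^*) = \langle x^*, x\rangle.$$

This shows that $x^* \in \operatorname{Pr}_{X^*}(\partial \Phi^L(x, 0))$ if and only if there exists $z^* \in C^*$ such that $\langle z^*, g(x)\rangle = 0$ and $x^* \in \partial(f + \langle z^*, g(\cdot)\rangle + \delta_S)(x)$, which concludes the proof. $\square$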

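In case $Z = \mathbb{R}^{m+k}$, $C = \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}$, which is a proper and convex cone, and $g : X \to \mathbb{R}^{m+k}$, $g(x) = (g_1(x), ..., g_m(x), h_1(x), ..., h_k(x))$, $(P^L)$ becomes the following optimization problem with inequality and equality constraints

$$(P^L) \quad \begin{array}{ll} \inf & f(x). \\ \text{such that} & x \in S, \\ & g_i(x) \leq 0,\ i = 1, ..., m, \\ & h_j(x) = 0,\ j = 1, ..., k \end{array}$$

Since $C^* = \mathbb{R}^m_+ \times \mathbb{R}^k$, its Lagrange dual problem reads

$$(D^L) \quad \sup_{\substack{\lambda_i \geq 0,\ i = 1, ..., m \\ \mu_j \in \mathbb{R},\ j = 1, ..., k}} \inf_{x \in S} \left\{ f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^k \mu_j h_j(x) \right\}.$$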

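If $g_i : X \to \mathbb{R}$, $i = 1, ..., m$, are convex and $h_j : X \to \mathbb{R}$, $j = 1, ..., k$, are affine, then $g : X \to \mathbb{R}^{m+k}$, $g(x) = (g_1(x), ..., g_m(x), h_1(x), ..., h_k(x))$, is $C$-convex. One can easily notice that $(RC_1^L)$ is not fulfilled (for $k \geq 1$ the cone $C$ has empty interior). However, if

$$0 \in \operatorname{ri}\left(g(\operatorname{dom} f \cap S) + \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}\right), \quad (11.1)$$

then for $(P^L)$ and $(D^L)$ strong duality holds and for all $x \in S \cap g^{-1}(-\mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\})$ we have

$$\partial\left(f + \delta_{S \cap g^{-1}(-\mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\})}\right)(x) = \bigcup_{\substack{\lambda_i \geq 0,\ \lambda_i g_i(x) = 0,\ i = 1, ..., m, \\ \mu_j \in \mathbb{R},\ j = 1, ..., k}} \partial\left(f + \sum_{i=1}^m \lambda_i g_i + \sum_{j=1}^k \mu_j h_j + \delta_S\right)(x).$$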

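This means that, if (11.1) is fulfilled, then $x$ is an optimal solution of $(P^L)$ if and only if there exist so-called Lagrange multipliers $\lambda_i \geq 0$, $i = 1, ..., m$, and $\mu_j \in \mathbb{R}$, $j = 1, ..., k$, such that the so-called Karush-Kuhn-Tucker system of optimality conditions

$$\begin{cases} x \in S,\ g_i(x) \leq 0,\ \lambda_i g_i(x) = 0,\ i = 1, ..., m, \\ h_j(x) = 0,\ j = 1, ..., k, \\ 0 \in \partial\left(f + \sum_{i=1}^m \lambda_i g_i + \sum_{j=1}^k \mu_j h_j + \delta_S\right)(x) \end{cases}$$

is true.

If, additionally, $X = \mathbb{R}^n$, let $E = \{(x, z) \in \mathbb{R}^n \times \mathbb{R}^{m+k} \mid x \in \operatorname{dom} f \cap S,\ z - g(x) \in \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}\}$. For all $x \in \mathbb{R}^n$, let $E(x) = \{z \in \mathbb{R}^{m+k} \mid (x, z) \in E\}$. Then $E(x) = \emptyset$ for $x \notin \operatorname{dom} f \cap S$ and $E(x) = g(x) + \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}$ for $x \in \operatorname{dom} f \cap S$. According to Theorem 2.18 it holds

$$(x, z) \in \operatorname{ri} E \Leftrightarrow x \in \operatorname{ri}(\operatorname{dom} f \cap S) \text{ and } z \in \operatorname{ri}(g(x) + \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}) = g(x) + \operatorname{int} \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}.$$

Consequently,

$$\operatorname{ri} E = \{(x, z) \in \mathbb{R}^n \times \mathbb{R}^{m+k} \mid x \in \operatorname{ri}(\operatorname{dom} f \cap S),\ z - g(x) \in \operatorname{int} \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}\}$$

and, from here,

$$\operatorname{ri}\left(g(\operatorname{dom} f \cap S) + \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}\right) = \operatorname{ri} \operatorname{Pr}_{\mathbb{R}^{m+k}}(E) = \operatorname{Pr}_{\mathbb{R}^{m+k}}(\operatorname{ri} E) = g(\operatorname{ri}(\operatorname{dom} f \cap S)) + \operatorname{int} \mathbb{R}^m_+ \times \{0_{\mathbb{R}^k}\}.$$

This proves that, in this case, the regularity condition (11.1) is nothing else than the weak Slater constraint qualification

$$\exists x' \in \operatorname{ri}(\operatorname{dom} f \cap S) \text{ such that } g_i(x') < 0,\ i = 1, ..., m,\ h_j(x') = 0,\ j = 1, ..., k. \quad (11.2)$$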
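The Karush-Kuhn-Tucker system above can be verified numerically on a small instance. The sketch below assumes a hypothetical problem that does not appear in these notes: $X = S = \mathbb{R}^2$, $f(x) = x_1^2 + x_2^2$, a single inequality constraint $g_1(x) = 1 - x_1 - x_2 \leq 0$ and no equality constraints. Since $f$ and $g_1$ are convex and differentiable, the subdifferential inclusion reduces to $\nabla f(x) + \lambda \nabla g_1(x) = 0$. The weak Slater condition (11.2) holds, for instance, at $x' = (1, 1)$.

```python
def kkt_residuals(x, lam):
    """Residuals of the KKT system for the assumed toy problem
    min x1^2 + x2^2  subject to  1 - x1 - x2 <= 0  (S = R^2, no equality constraints)."""
    x1, x2 = x
    g = 1.0 - x1 - x2                      # constraint value g_1(x)
    grad_f = (2.0 * x1, 2.0 * x2)          # gradient of f
    grad_g = (-1.0, -1.0)                  # gradient of g_1
    stationarity = [df + lam * dg for df, dg in zip(grad_f, grad_g)]
    return {
        "primal": max(g, 0.0),             # violation of g_1(x) <= 0
        "dual": max(-lam, 0.0),            # violation of lam >= 0
        "complementarity": abs(lam * g),   # lam * g_1(x) = 0
        "stationarity": max(abs(c) for c in stationarity),
    }

# Candidate primal-dual pair for the toy problem.
residuals = kkt_residuals((0.5, 0.5), 1.0)
print(residuals)
```

All four residuals vanish at $x = (\frac{1}{2}, \frac{1}{2})$ with multiplier $\lambda = 1$, whereas the Slater point $(1, 1)$ with $\lambda = 0$ is feasible but violates stationarity.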

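Example 11.2 (closedness-type versus generalized interior point regularity conditions for Lagrange duality) Let $X = Z = \mathbb{R}^2$, $C = \mathbb{R}^2_+$, $S = \mathbb{R}^2_+$, $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x_1, x_2) = \frac{1}{2}x_1^2 + x_2$, a convex and continuous function, and $g : \mathbb{R}^2 \to \mathbb{R}^2$, $g(x_1, x_2) = (x_1, x_2 - x_1)$, an $\mathbb{R}^2_+$-convex and continuous function. Then $\operatorname{dom} f \cap S \cap g^{-1}(-\mathbb{R}^2_+) = \{(0, 0)\}$ and $v(P^L) = 0$. It holds

$$v(D^L) = \sup_{\lambda_1, \lambda_2 \geq 0} \inf_{x_1, x_2 \geq 0} \left\{ \frac{1}{2}x_1^2 + x_2 + \lambda_1 x_1 + \lambda_2(x_2 - x_1) \right\} = 0$$

and $(\lambda_1, \lambda_2) = (0, 0)$ is an optimal solution of the dual problem. One can easily see that none of the regularity conditions $(RC_i^L)$, $i \in \{1, 2, 2', 3, 4\}$, is fulfilled. On the other hand, $\bigcup_{(\lambda_1, \lambda_2) \in \mathbb{R}^2_+} \operatorname{epi}(f + \langle (\lambda_1, \lambda_2), g(\cdot)\rangle + \delta_{\mathbb{R}^2_+})^* = \mathbb{R}^2 \times \mathbb{R}_+$, which means that the closedness-type regularity condition $(RC_5^L)$ is verified.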
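The value $v(D^L) = 0$ in Example 11.2 can be double-checked numerically. The inner objective separates as $\frac{1}{2}x_1^2 + (\lambda_1 - \lambda_2)x_1 + (1 + \lambda_2)x_2$, so for $\lambda_1, \lambda_2 \geq 0$ the inner infimum over $x_1, x_2 \geq 0$ equals $-\frac{1}{2}\max\{\lambda_2 - \lambda_1, 0\}^2$. The following sketch compares this closed form with a brute-force minimum over a grid; the function names, the grid step and the truncation of $\mathbb{R}^2_+$ to a box are assumptions made only for the illustration.

```python
def dual_function(lam1, lam2):
    """Closed form of q(lam) = inf_{x1, x2 >= 0} { x1^2/2 + x2 + lam1*x1 + lam2*(x2 - x1) }
    for lam1, lam2 >= 0, obtained by minimizing separately in x1 and x2."""
    return -0.5 * max(lam2 - lam1, 0.0) ** 2

def dual_function_on_grid(lam1, lam2, step=0.01, upper=10.0):
    """Brute-force approximation of q(lam) on the truncated grid [0, upper]^2."""
    n = int(round(upper / step))
    xs = [i * step for i in range(n + 1)]
    best_x1 = min(0.5 * x * x + (lam1 - lam2) * x for x in xs)   # part depending on x1
    best_x2 = min((1.0 + lam2) * x for x in xs)                  # part depending on x2
    return best_x1 + best_x2

# Sample multipliers: the dual function is nonpositive and vanishes at (0, 0).
samples = [(a * 0.5, b * 0.5) for a in range(5) for b in range(5)]
values = [dual_function(l1, l2) for l1, l2 in samples]
print(max(values))
```

The maximum over the sampled multipliers is attained whenever $\lambda_2 \leq \lambda_1$, in particular at $(0, 0)$, in agreement with the example.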

Chapter V

Maximal monotone operators

In this chapter we will discuss the most significant notions and results of the theory of maximal monotone operators.

12 Monotone set-valued operators

Definition 12.1 (monotone operator) A set-valued operator $T : X \rightrightarrows X^*$ is said to be monotone if

$$\langle y^* - x^*, y - x\rangle \geq 0 \quad \forall x, y \in X\ \forall x^* \in T(x)\ \forall y^* \in T(y).$$

Let $T : X \rightrightarrows X^*$ be a set-valued operator. The set

• $D(T) := \{x \in X \mid T(x) \neq \emptyset\}$ denotes the domain of $T$;

• $R(T) := \bigcup_{x \in X} T(x)$ denotes the range of $T$;

• $G(T) := \{(x, x^*) \in X \times X^* \mid x^* \in T(x)\}$ denotes the graph of $T$.

Example 12.2 (a) For a function $f : X \to \overline{\mathbb{R}}$, its convex subdifferential $\partial f : X \rightrightarrows X^*$ is a monotone operator. Indeed, let $x, y \in X$, $x^* \in \partial f(x)$ and $y^* \in \partial f(y)$. Then $f(x), f(y) \in \mathbb{R}$,

$$f(y) - f(x) \geq \langle x^*, y - x\rangle \text{ and } f(x) - f(y) \geq \langle y^*, x - y\rangle,$$

which gives $0 \geq \langle x^* - y^*, y - x\rangle$ or, equivalently, $\langle y^* - x^*, y - x\rangle \geq 0$.

If $f : X \to \mathbb{R}$ is a convex and Gateaux-differentiable function, then $\nabla f : X \to X^*$ is a single-valued monotone operator (see also Theorem 7.8).

(b) Let $S : X \to X^*$ be a single-valued linear operator. Then $S$ is monotone if and only if $\langle S(y - x), y - x\rangle = \langle Sy - Sx, y - x\rangle \geq 0$ for all $x, y \in X$, in other words, $\langle Sz, z\rangle \geq 0$ for all $z \in X$, which is nothing else than saying that $S$ is positive.

(c) A function $\varphi : \mathbb{R} \to \mathbb{R}$ is a monotone operator if and only if it is nondecreasing, in other words, for every $t_1, t_2 \in \mathbb{R}$ with $t_1 < t_2$ it holds $\varphi(t_1) \leq \varphi(t_2)$.

(d) Let $(H, \langle \cdot, \cdot\rangle)$ be a real Hilbert space, $C \subseteq H$ a nonempty set and $S : C \to H$ a nonexpansive operator, which means that $\|S(x) - S(y)\| \leq \|x - y\|$ for all $x, y \in C$. Then the operator $T : H \rightrightarrows H$, defined as $T(x) = x - S(x)$ for all $x \in C$ and $T(x) = \emptyset$, otherwise, is monotone. Indeed, for all $x, y \in C$ it holds

$$\begin{aligned}
\langle T(y) - T(x), y - x\rangle &= \langle y - x - S(y) + S(x), y - x\rangle \\
&= \|y - x\|^2 - \langle S(y) - S(x), y - x\rangle \\
&\geq \|y - x\|^2 - \|S(y) - S(x)\| \|y - x\| \\
&= \|y - x\| \left(\|y - x\| - \|S(y) - S(x)\|\right) \geq 0.
\end{aligned}$$

Notice that $0 \in R(T)$ if and only if $S$ has a fixed point in $C$.

(e) Let $(H, \langle \cdot, \cdot\rangle)$ be a real Hilbert space and $C \subseteq H$ a nonempty subset. The (set-valued) projection operator onto the set $C$

$$P_C : H \rightrightarrows H, \quad P_C(x) = \left\{u \in C \,\middle|\, \|x - u\| = \inf_{c \in C} \|x - c\|\right\},$$

is monotone. Indeed, let $x, y \in H$, $u \in P_C(x)$ and $v \in P_C(y)$. Then $\|x - u\| \leq \|x - v\|$ and $\|y - v\| \leq \|y - u\|$, which implies

$$\|x - u\|^2 + \|y - v\|^2 \leq \|x - v\|^2 + \|y - u\|^2$$

or, equivalently,

$$\langle x, u\rangle + \langle y, v\rangle \geq \langle x, v\rangle + \langle y, u\rangle.$$

This is nothing else than $\langle y - x, v - u\rangle \geq 0$.
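The monotonicity inequality of Definition 12.1 is easy to test on finitely many sample points. The sketch below does this in $H = \mathbb{R}^2$ for two instances of Example 12.2: $T = \operatorname{Id} - S$ with $S$ the rotation by 90 degrees (an isometry, hence nonexpansive) and the projection onto the box $C = [0, 1]^2$. The helper names and the sample set are assumptions for the illustration. Passing such a test is only a necessary condition for monotonicity, not a proof.

```python
import itertools

def inner(u, v):
    """Euclidean inner product on R^2 (or R^n)."""
    return sum(a * b for a, b in zip(u, v))

def diff(u, v):
    """Componentwise difference u - v."""
    return tuple(a - b for a, b in zip(u, v))

def is_monotone_on(T, samples):
    """Check <T(y) - T(x), y - x> >= 0 for all pairs of sample points."""
    return all(inner(diff(T(y), T(x)), diff(y, x)) >= 0
               for x, y in itertools.product(samples, repeat=2))

def rotate(x):
    """S: rotation by 90 degrees, a nonexpansive operator on R^2."""
    return (-x[1], x[0])

def id_minus_rotate(x):
    """T = Id - S as in Example 12.2 (d)."""
    return diff(x, rotate(x))

def project_box(x):
    """Projection onto C = [0, 1]^2 as in Example 12.2 (e); single-valued here."""
    return tuple(min(max(t, 0), 1) for t in x)

samples = [(a, b) for a in range(-3, 4) for b in range(-3, 4)]
print(is_monotone_on(id_minus_rotate, samples), is_monotone_on(project_box, samples))
```

Integer sample points keep all inner products exact, so the sign test is not affected by rounding.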

The following statement shows that the monotonicity is preserved by multiplication with a positive scalar, addition and composition with a continuous linear operator.

Proposition 12.3 Let $S : X \rightrightarrows X^*$, $T : Y \rightrightarrows Y^*$ be monotone operators, $A : X \to Y$ a continuous linear operator and $\gamma > 0$. Then the operators

$$\gamma S : X \rightrightarrows X^*, \quad (\gamma S)(x) = \gamma S(x),$$

and

$$S + A^* T A : X \rightrightarrows X^*, \quad (S + A^* T A)(x) = S(x) + A^*(T(Ax)),$$

are monotone.

Proof. That $\gamma S$ is monotone is clear. Let $u, v \in X$, $u^* \in S(u) + A^*(T(Au))$ and $v^* \in S(v) + A^*(T(Av))$. Then $u^* = x_1^* + A^* y_1^*$, with $x_1^* \in S(u)$ and $y_1^* \in T(Au)$, and $v^* = x_2^* + A^* y_2^*$, with $x_2^* \in S(v)$ and $y_2^* \in T(Av)$. It holds

$$\begin{aligned}
\langle v^* - u^*, v - u\rangle &= \langle x_2^* - x_1^*, v - u\rangle + \langle A^* y_2^* - A^* y_1^*, v - u\rangle \\
&= \langle x_2^* - x_1^*, v - u\rangle + \langle y_2^* - y_1^*, Av - Au\rangle \geq 0,
\end{aligned}$$

which proves that $S + A^* T A$ is monotone. $\square$

Definition 12.4 (maximal monotone operator) A monotone set-valued operator $T : X \rightrightarrows X^*$ is said to be maximally monotone if there exists no monotone operator $T' : X \rightrightarrows X^*$ such that $G(T) \subsetneq G(T')$.

A monotone operator $T : X \rightrightarrows X^*$ is maximally monotone if and only if

$$(x, x^*) \in X \times X^* \text{ and } \langle y^* - x^*, y - x\rangle \geq 0\ \forall (y, y^*) \in G(T) \Rightarrow (x, x^*) \in G(T). \quad (12.1)$$

Assume that $T : X \rightrightarrows X^*$ is maximally monotone and that, for $(x, x^*) \in X \times X^*$, $\langle y^* - x^*, y - x\rangle \geq 0$ for all $(y, y^*) \in G(T)$. Then $(x, x^*) \in G(T)$, since, otherwise, the operator $T' : X \rightrightarrows X^*$ defined such that $G(T') = G(T) \cup \{(x, x^*)\}$ is a proper monotone extension of $T$.

On the other hand, assume that (12.1) holds and $T : X \rightrightarrows X^*$ is not maximally monotone. Then there exists a monotone operator $T' : X \rightrightarrows X^*$ such that $G(T) \subsetneq G(T')$. Let $(x, x^*) \in G(T') \setminus G(T)$. Since $T'$ is monotone, for all $(y, y^*) \in G(T)$ it holds $\langle y^* - x^*, y - x\rangle \geq 0$, which, according to (12.1), implies $(x, x^*) \in G(T)$. Contradiction!

The next theorem shows that every monotone operator admits a maximal monotone extension.

Theorem 12.5 Let $T : X \rightrightarrows X^*$ be a monotone operator. Then there exists a maximal monotone extension of $T$, namely, a maximal monotone operator $T' : X \rightrightarrows X^*$ such that $G(T) \subseteq G(T')$.

Proof. Let

$$\mathcal{M} := \{S : X \rightrightarrows X^* \mid S \text{ is monotone and } G(T) \subseteq G(S)\},$$

which is obviously a nonempty set. We define on $\mathcal{M}$ the partial order

$$S_1 \leq S_2 \Leftrightarrow G(S_1) \subseteq G(S_2).$$

It is easy to see that "$\leq$" is reflexive, transitive and antisymmetric.

Let $(S_i)_{i \in I}$ be a totally ordered subset of $\mathcal{M}$, which means that for all $i, j \in I$ either $S_i \leq S_j$ or $S_j \leq S_i$. Then the operator whose graph is $\bigcup_{i \in I} G(S_i)$ belongs to $\mathcal{M}$ and is an upper bound of this chain. According to the Zorn Lemma (see, for instance, [4, page 44]), there exists a maximal element $T'$ in $\mathcal{M}$. This means that $T'$ is monotone, $G(T) \subseteq G(T')$ and there exists no $S \in \mathcal{M}$, $S \neq T'$, such that $G(T') \subseteq G(S)$. In other words, $T'$ is maximally monotone. $\square$

Example 12.6 If $S : X \to X^*$ is a single-valued linear and positive operator, then $S$ is maximally monotone. According to Example 12.2 (b), $S$ is monotone. Let $(x, x^*) \in X \times X^*$ be such that $\langle Sy - x^*, y - x\rangle \geq 0$ for all $y \in X$. For all $u \in X$ and all $\lambda > 0$, by taking $y := x + \lambda u$, it holds

$$\lambda \langle Sx - x^*, u\rangle + \lambda^2 \langle Su, u\rangle \geq 0$$

or, equivalently,

$$\langle Sx - x^*, u\rangle + \lambda \langle Su, u\rangle \geq 0.$$

We let $\lambda$ converge to zero and obtain $\langle Sx - x^*, u\rangle \geq 0$ for all $u \in X$. Replacing $u$ by $-u$ yields $\langle Sx - x^*, u\rangle = 0$ for all $u \in X$, in other words, $Sx = x^*$. This shows that (12.1) holds, which proves that $S$ is maximally monotone.

Proposition 12.7 Let $T : X \rightrightarrows X^*$ be a maximally monotone operator. Then for all $x \in X$ the set $T(x)$ is convex and weakly$^*$ closed.

Proof. Let $x \in X$ be fixed. Then $x^* \in T(x)$ if and only if $\langle y^* - x^*, y - x\rangle \geq 0$ for all $(y, y^*) \in G(T)$. This means that

$$T(x) = \bigcap_{(y, y^*) \in G(T)} \{x^* \in X^* \mid \langle x^*, y - x\rangle \leq \langle y^*, y - x\rangle\}$$

and proves that $T(x)$ is the intersection of a family of convex and weakly$^*$ closed sets. Consequently, $T(x)$ is convex and weakly$^*$ closed. $\square$

In real Hilbert spaces we have the following criterion for the maximality of a monotone operator.

Proposition 12.8 Let $(H, \langle \cdot, \cdot\rangle)$ be a real Hilbert space and $T : H \to H$ a single-valued monotone and continuous operator. Then $T$ is maximally monotone.

Proof. Let $(x, u) \in H \times H$ be such that $\langle T(y) - u, y - x\rangle \geq 0$ for all $y \in H$. For all $n \in \mathbb{N}$ and $y_n := x + \frac{1}{n}(u - T(x))$, we have $\langle T(y_n) - u, u - T(x)\rangle \geq 0$. Since $T(y_n) \to T(x)$ as $n \to +\infty$, this yields $0 \leq \langle T(x) - u, u - T(x)\rangle = -\|T(x) - u\|^2$, thus, $T(x) = u$. This shows that (12.1) holds, which proves that $T$ is maximally monotone. $\square$

Example 12.9 As a consequence of Proposition 12.8 it follows that the projection operator $P_C : H \to H$ onto a nonempty, convex and closed subset $C$ of a real Hilbert space $H$ is maximally monotone.

13 The maximal monotonicity of the convex subdifferential

This section is dedicated to the proof of the fact that the convex subdifferential of a proper, convex and lower semicontinuous function defined on a nontrivial real Banach space is maximally monotone. To this end we will prove a number of important intermediary results. We will start with a very important result in nonlinear analysis, the so-called Ekeland variational principle.

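Theorem 13.1 (Ekeland variational principle) Let $(X, d)$ be a complete metric space and $f : X \to \overline{\mathbb{R}}$ a proper, lower semicontinuous and bounded from below function. Then for every $x_0 \in \operatorname{dom} f$ and every $\varepsilon > 0$ there exists $x_\varepsilon \in X$ such that

$$f(x_\varepsilon) \leq f(x_0) - \varepsilon d(x_0, x_\varepsilon)$$

and

$$f(x_\varepsilon) < f(x) + \varepsilon d(x_\varepsilon, x) \quad \forall x \in X \setminus \{x_\varepsilon\}.$$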

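Proof. Let $x_0 \in \operatorname{dom} f$ and $\varepsilon > 0$ be fixed. For every $x \in X$ we consider the set $F(x) := \{y \in X \mid f(y) + \varepsilon d(x, y) \leq f(x)\}$. We will prove that there exists $x_\varepsilon \in F(x_0)$ such that $F(x_\varepsilon) = \{x_\varepsilon\}$, which is nothing else than the conclusion of the theorem.

Since $y \mapsto f(y) + \varepsilon d(x, y)$ is lower semicontinuous, the set $F(x)$ is closed for every $x \in X$. For every $x \in \operatorname{dom} f$ it holds $x \in F(x) \subseteq \operatorname{dom} f$, while for every $x \in X \setminus \operatorname{dom} f$ it holds $F(x) = X$.

We also note that $F(y) \subseteq F(x)$ for every $y \in F(x)$. This is obvious for $x \notin \operatorname{dom} f$. Let $x \in \operatorname{dom} f$, $y \in F(x)$ and $z \in F(y)$. Then

$$f(z) + \varepsilon d(y, z) \leq f(y) \text{ and } f(y) + \varepsilon d(x, y) \leq f(x).$$

By the triangle inequality we get

$$f(z) + \varepsilon d(x, z) \leq f(y) + \varepsilon (d(x, z) - d(y, z)) \leq f(y) + \varepsilon d(x, y) \leq f(x),$$

which proves that $z \in F(x)$.

Since $f$ is bounded from below, $v_0 := \inf\{f(x) \mid x \in F(x_0)\} \in \mathbb{R}$. Then there exists $x_1 \in F(x_0)$ such that $f(x_1) < v_0 + \frac{1}{2}$. We have $v_1 := \inf\{f(x) \mid x \in F(x_1)\} \in \mathbb{R}$ and there exists $x_2 \in F(x_1)$ such that $f(x_2) < v_1 + \frac{1}{2^2}$. Continuing in this way we obtain the sequences $(v_n)_{n \geq 0}$ and $(x_n)_{n \geq 0}$ such that for all $n \geq 0$

$$v_n := \inf\{f(x) \mid x \in F(x_n)\} \in \mathbb{R}, \quad x_{n+1} \in F(x_n) \quad \text{and} \quad f(x_{n+1}) < v_n + \frac{1}{2^{n+1}}.$$

From $F(x_{n+1}) \subseteq F(x_n)$ it follows that $v_{n+1} \geq v_n$ for all $n \geq 0$. In addition, as $x_{n+1} \in F(x_n)$, for all $n \geq 1$ it holds

$$\varepsilon d(x_{n+1}, x_n) \leq f(x_n) - f(x_{n+1}) \leq f(x_n) - v_n \leq f(x_n) - v_{n-1} < \frac{1}{2^n}.$$

Consequently, for $n, p \geq 1$ it holds

$$d(x_{n+p}, x_n) \leq \sum_{j=n}^{n+p-1} d(x_{j+1}, x_j) < \frac{1}{\varepsilon} \sum_{j=n}^{n+p-1} \frac{1}{2^j} < \frac{1}{\varepsilon} \cdot \frac{1}{2^{n-1}},$$

which shows that $(x_n)_{n \geq 0}$ is a Cauchy sequence, thus it converges to some element $x_\varepsilon \in X$.

Let $n \geq 0$. For every $m \geq n$ it holds $x_m \in F(x_m) \subseteq F(x_n)$ and, since $F(x_n)$ is closed, we have $x_\varepsilon \in F(x_n)$. In particular, $x_\varepsilon \in F(x_0)$, thus $x_\varepsilon \in \operatorname{dom} f$.

Let $x \in F(x_\varepsilon)$. Then $x \in F(x_n)$ for all $n \geq 0$ and, for all $n \geq 1$, $\varepsilon d(x, x_n) \leq f(x_n) - f(x) \leq f(x_n) - v_n < \frac{1}{2^n}$. This yields that $x_n \to x$ as $n \to +\infty$, thus, $x = x_\varepsilon$. Since, obviously, $x_\varepsilon \in F(x_\varepsilon)$, we get $F(x_\varepsilon) = \{x_\varepsilon\}$ and the proof is complete. $\square$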
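The construction in the proof can be carried out literally on a finite metric space, where every infimum $v_n$ is attained. The following is a minimal sketch under that assumption: the function name `ekeland_point`, the sample function $f(x) = |x^2 - 4|$ and the grid are hypothetical choices for the illustration. In each step $x_{n+1}$ is taken as a minimizer of $f$ over $F(x_n)$, which satisfies $f(x_{n+1}) < v_n + 2^{-(n+1)}$ trivially, and the iteration stops as soon as $F(x_n) = \{x_n\}$.

```python
from fractions import Fraction

def ekeland_point(points, f, d, x0, eps):
    """Follow the proof of the Ekeland variational principle on a finite metric space.

    F(x) = { y : f(y) + eps * d(x, y) <= f(x) }. Every y in F(x) other than x has
    f(y) < f(x), so f decreases strictly along the iteration and the loop terminates.
    """
    def F(x):
        return [y for y in points if f(y) + eps * d(x, y) <= f(x)]

    x = x0
    while True:
        nxt = min(F(x), key=f)   # x_{n+1} in F(x_n) with f(x_{n+1}) = v_n
        if nxt == x:             # x minimizes f over F(x), hence F(x) = {x}
            return x
        x = nxt

# Assumed sample data: a rational grid in R with the usual distance.
points = [Fraction(k, 2) for k in range(-10, 11)]
f = lambda x: abs(x * x - 4)
d = lambda x, y: abs(x - y)
x0, eps = Fraction(5), Fraction(1, 2)

x_eps = ekeland_point(points, f, d, x0, eps)
print(x_eps, f(x_eps))
```

Exact rational arithmetic is used so that the two inequalities of Theorem 13.1 can be checked without rounding issues.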

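The following variant of the Ekeland variational principle is often used in applications.

Corollary 13.2 Let $(X, d)$ be a complete metric space and $f : X \to \overline{\mathbb{R}}$ a proper, lower semicontinuous and bounded from below function. Let $\varepsilon > 0$ and $x_0 \in \operatorname{dom} f$ be such that $f(x_0) \leq \inf_{x \in X} f(x) + \varepsilon$. Then for every $\lambda > 0$ there exists $x_\lambda \in X$ such that

$$f(x_\lambda) \leq f(x_0), \quad d(x_0, x_\lambda) \leq \lambda$$

and

$$f(x_\lambda) < f(x) + \frac{\varepsilon}{\lambda} d(x, x_\lambda) \quad \forall x \in X \setminus \{x_\lambda\}.$$

Proof. According to Theorem 13.1, applied with $\frac{\varepsilon}{\lambda}$ in place of $\varepsilon$, there exists $x_\lambda \in X$ such that

$$f(x_\lambda) \leq f(x_0) - \frac{\varepsilon}{\lambda} d(x_0, x_\lambda)$$

and

$$f(x_\lambda) < f(x) + \frac{\varepsilon}{\lambda} d(x, x_\lambda) \quad \forall x \in X \setminus \{x_\lambda\}.$$

Obviously, $f(x_\lambda) \leq f(x_0)$. On the other hand,

$$f(x_\lambda) + \frac{\varepsilon}{\lambda} d(x_0, x_\lambda) \leq f(x_0) \leq \inf_{x \in X} f(x) + \varepsilon \leq f(x_\lambda) + \varepsilon,$$

which proves that $d(x_0, x_\lambda) \leq \lambda$. $\square$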

The following extension of the (convex) subdifferential of a function will play an important role in the subsequent analysis.

Definition 13.3 ((convex) $\varepsilon$-subdifferential) Let $f : X \to \overline{\mathbb{R}}$ be a given function, $x \in X$ and $\varepsilon \geq 0$. The set defined as

$$\partial_\varepsilon f(x) := \{x^* \in X^* \mid f(y) - f(x) \geq \langle x^*, y - x\rangle - \varepsilon\ \forall y \in X\} \text{ for } f(x) \in \mathbb{R},$$

and $\partial_\varepsilon f(x) := \emptyset$ for $f(x) \notin \mathbb{R}$, is called the (convex) $\varepsilon$-subdifferential of $f$ at $x$. The elements of $\partial_\varepsilon f(x)$ are called (convex) $\varepsilon$-subgradients of $f$ at $x$. Therefore, the (convex) $\varepsilon$-subdifferential is a set-valued operator $\partial_\varepsilon f : X \rightrightarrows X^*$.

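Let $x \in X$. It holds $\partial_0 f(x) = \partial f(x)$ and, for $0 \leq \varepsilon_1 \leq \varepsilon_2$, $\partial_{\varepsilon_1} f(x) \subseteq \partial_{\varepsilon_2} f(x)$. Let $\varepsilon \geq 0$. Then $\partial_\varepsilon f(x)$ is a convex and weakly$^*$ closed set,

$$\partial_\varepsilon f(x) = \bigcap_{\eta > \varepsilon} \partial_\eta f(x),$$

and

$$x^* \in \partial_\varepsilon f(x) \Leftrightarrow f^*(x^*) + f(x) \leq \langle x^*, x\rangle + \varepsilon \Rightarrow x \in \partial_\varepsilon f^*(x^*).$$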
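For a concrete feeling of how $\partial_\varepsilon f(x)$ enlarges $\partial f(x)$, consider the assumed example $X = \mathbb{R}$ and $f(x) = \frac{1}{2}x^2$, which is not taken from these notes. Here $f^* = f$, so the characterization above gives $\frac{1}{2}(x^* - x)^2 \leq \varepsilon$, that is, $\partial_\varepsilon f(x) = [x - \sqrt{2\varepsilon}, x + \sqrt{2\varepsilon}]$. The sketch below compares this interval with a direct test of the defining inequality on a grid; the function names and the grid are hypothetical.

```python
import math

def eps_subdifferential_half_square(x, eps):
    """Interval [x - sqrt(2 eps), x + sqrt(2 eps)] equal to the eps-subdifferential
    of f(t) = t^2 / 2 at x (assumed example on X = R)."""
    r = math.sqrt(2.0 * eps)
    return (x - r, x + r)

def is_eps_subgradient(x_star, x, eps, grid, tol=1e-12):
    """Test f(y) - f(x) >= x_star * (y - x) - eps for all y in a finite grid."""
    f = lambda t: 0.5 * t * t
    return all(f(y) - f(x) >= x_star * (y - x) - eps - tol for y in grid)

grid = [k / 10 for k in range(-100, 101)]          # sample points y in [-10, 10]
interval = eps_subdifferential_half_square(1.0, 0.5)
print(interval)
```

For $\varepsilon = 0$ the interval collapses to the single point $\{x\} = \partial f(x)$, in line with $\partial_0 f(x) = \partial f(x)$.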

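Next we formulate a theorem which is a consequence of Ekeland's variational principle and relates the graph of the (convex) $\varepsilon$-subdifferential of a proper, convex and lower semicontinuous function defined on a Banach space to the graph of its (convex) subdifferential.

Theorem 13.4 (Brøndsted-Rockafellar Theorem) Let $X$ be a Banach space and $f : X \to \overline{\mathbb{R}}$ a proper, convex and lower semicontinuous function. Then for every $x_0 \in \operatorname{dom} f$, $\varepsilon > 0$, $\lambda > 0$ and $x_0^* \in \partial_\varepsilon f(x_0)$, there exists $(x_\lambda, x_\lambda^*) \in X \times X^*$ such that

$$x_\lambda^* \in \partial f(x_\lambda), \quad \|x_\lambda - x_0\| \leq \lambda \quad \text{and} \quad \|x_\lambda^* - x_0^*\|_* \leq \frac{\varepsilon}{\lambda}.$$

In particular,

$$D(\partial f) \subseteq \operatorname{dom} f \subseteq \operatorname{cl}(D(\partial f)) \quad \text{and} \quad R(\partial f) \subseteq \operatorname{dom} f^* \subseteq \operatorname{cl}(R(\partial f)).$$

Proof. Consider the proper, convex and lower semicontinuous function $g : X \to \overline{\mathbb{R}}$, $g(x) = f(x) - \langle x_0^*, x\rangle$. It holds $g(x_0) \leq \inf_{x \in X} g(x) + \varepsilon$. By Corollary 13.2, there exists $x_\lambda \in X$ such that $\|x_\lambda - x_0\| \leq \lambda$ and $g(x_\lambda) \leq g(x) + \frac{\varepsilon}{\lambda}\|x - x_\lambda\|$ for all $x \in X$. Consequently, from Theorem 10.2 (b) and Proposition 6.5 (c),

$$\begin{aligned}
0 \in \partial\left(g + \frac{\varepsilon}{\lambda}\|\cdot - x_\lambda\|\right)(x_\lambda) &= \partial g(x_\lambda) + \frac{\varepsilon}{\lambda} \partial\|\cdot - x_\lambda\|(x_\lambda) = \partial g(x_\lambda) + \frac{\varepsilon}{\lambda} \partial\|\cdot\|(0) \\
&= \partial f(x_\lambda) - x_0^* + \frac{\varepsilon}{\lambda} \partial\|\cdot\|(0),
\end{aligned}$$

which means that there exists $x_\lambda^* \in \partial f(x_\lambda)$ such that $\frac{\lambda}{\varepsilon}(x_0^* - x_\lambda^*) \in \partial\|\cdot\|(0)$ or, equivalently, $\|x_\lambda^* - x_0^*\|_* \leq \frac{\varepsilon}{\lambda}$. The last two statements of the theorem follow immediately. $\square$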

Now we can prove the main result of this section.

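Theorem 13.5 Let $X$ be a Banach space and $f : X \to \overline{\mathbb{R}}$ a proper, convex and lower semicontinuous function. Then $\partial f : X \rightrightarrows X^*$ is a maximal monotone operator.

Proof. Let $(x, x^*) \in X \times X^*$ be such that $\langle y^* - x^*, y - x\rangle \geq 0$ for all $(y, y^*) \in G(\partial f)$. We will prove that $x^* \in \partial f(x)$, which will mean that (12.1) is fulfilled and will, consequently, lead to the desired conclusion.

We define $g : X \to \overline{\mathbb{R}}$, $g(y) = f(y + x) - \langle x^*, y\rangle$, which is a proper, convex and lower semicontinuous function. By the Fenchel-Moreau Theorem, there exists $u^* \in X^*$ such that

$$\inf_{y \in X} \left\{g(y) + \frac{1}{2}\|y\|^2\right\} = -g^*(u^*) - \frac{1}{2}\|u^*\|_*^2 \in \mathbb{R}.$$

That this infimum is finite is an easy consequence of the fact that $g$ is proper, convex and lower semicontinuous.

From here it follows that there exists a sequence $(y_n)_{n \in \mathbb{N}} \subseteq X$ such that for all $n \in \mathbb{N}$

$$\begin{aligned}
\frac{1}{n^2} &\geq g(y_n) + \frac{1}{2}\|y_n\|^2 + g^*(u^*) + \frac{1}{2}\|u^*\|_*^2 \quad (13.1) \\
&\geq \langle u^*, y_n\rangle + \frac{1}{2}\|y_n\|^2 + \frac{1}{2}\|u^*\|_*^2 \\
&\geq \frac{1}{2}(\|y_n\| - \|u^*\|_*)^2,
\end{aligned}$$

where the second inequality follows from the Young-Fenchel inequality. This yields, on the one hand,

$$\|y_n\| \to \|u^*\|_* \quad \text{and} \quad \langle u^*, y_n\rangle \to -\|u^*\|_*^2 \quad \text{as } n \to +\infty.$$

On the other hand, for all $n \geq 1$ it holds

$$g(y_n) + g^*(u^*) \leq \frac{1}{n^2} + \langle u^*, y_n\rangle$$

or, equivalently, $u^* \in \partial_{\frac{1}{n^2}} g(y_n)$. According to Theorem 13.4 (applied with $\varepsilon = \frac{1}{n^2}$ and $\lambda = \frac{1}{n}$) there exist sequences $(z_n)_{n \in \mathbb{N}} \subseteq X$ and $(z_n^*)_{n \in \mathbb{N}} \subseteq X^*$ such that for all $n \in \mathbb{N}$ it holds

$$z_n^* \in \partial g(z_n), \quad \|z_n - y_n\| \leq \frac{1}{n} \quad \text{and} \quad \|z_n^* - u^*\|_* \leq \frac{1}{n}.$$

We notice that for all $n \in \mathbb{N}$ we have $z_n^* \in \partial g(z_n) = \partial f(z_n + x) - x^*$ or, equivalently, $(z_n + x, z_n^* + x^*) \in G(\partial f)$, which, according to our assumption, yields

$$\langle z_n^*, z_n\rangle \geq 0.$$

We have for all $n \in \mathbb{N}$

$$\begin{aligned}
0 \leq \langle z_n^*, z_n\rangle &= \langle z_n^*, z_n - y_n\rangle + \langle z_n^*, y_n\rangle = \langle z_n^*, z_n - y_n\rangle + \langle z_n^* - u^*, y_n\rangle + \langle u^*, y_n\rangle \\
&\leq \left(\frac{1}{n} + \|u^*\|_*\right)\|z_n - y_n\| + \|z_n^* - u^*\|_* \|y_n\| + \langle u^*, y_n\rangle
\end{aligned}$$

and let in this inequality $n \to +\infty$. This yields $0 \leq -\|u^*\|_*^2$, thus $u^* = 0$, which also proves that $y_n \to 0$ as $n \to +\infty$. Using that $g$ is lower semicontinuous, (13.1) yields for $n \to +\infty$

$$g(0) + g^*(0) = 0$$

or, equivalently, $0 \in \partial g(0) = \partial f(x) - x^*$, in other words, $(x, x^*) \in G(\partial f)$. This provides the desired conclusion. $\square$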

The following example shows that there are maximal monotone operators which are not (convex) subdifferentials.

Example 13.6 (maximal monotone operators which are not (convex) subdifferentials) A single-valued linear operator $S : X \to X^*$ is said to be skew if $\langle Sx, x\rangle = 0$ for all $x \in X$ or, equivalently, if

$$\langle Sy, x\rangle = -\langle Sx, y\rangle \quad \forall x, y \in X.$$

As seen in Example 12.6, skew operators are maximally monotone.

We will show that nonzero single-valued linear and skew operators are not (convex) subdifferentials. This means that for every Banach space $X$ of dimension strictly greater than 1 there exist maximal monotone operators defined on $X$ which are not (convex) subdifferentials.

Let $S : X \to X^*$ be a nonzero single-valued linear and skew operator and assume that there exists $f : X \to \overline{\mathbb{R}}$ such that $\partial f(x) = \{Sx\}$ for all $x \in X$. This implies that $\operatorname{dom} f = X$ and for all $x, y \in X$ it holds

$$f(y) - f(x) \geq \langle Sx, y - x\rangle \quad \text{and} \quad f(x) - f(y) \geq \langle Sy, x - y\rangle.$$

From here we get

$$\langle Sx, y\rangle = -\langle Sy, x\rangle \geq f(y) - f(x) \geq \langle Sx, y\rangle \quad \forall x, y \in X,$$

consequently, $f(y) = f(x) + \langle Sx, y\rangle$ for all $x, y \in X$. In conclusion, $f(y) = f(0)$ for all $y \in X$, which further implies that $S$ is a zero operator. Contradiction!
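The simplest nonzero skew operator lives on $X = \mathbb{R}^2$ (identified with its dual): the rotation by 90 degrees, $S(x_1, x_2) = (-x_2, x_1)$. This particular operator is an assumed illustration and is not named in the notes. The sketch below checks the two equivalent defining identities of skewness on integer sample points, where all inner products are exact.

```python
import itertools

def S(x):
    """Rotation by 90 degrees on R^2: a nonzero linear skew operator (assumed example)."""
    return (-x[1], x[0])

def inner(u, v):
    """Euclidean inner product, used as the duality pairing on R^2."""
    return sum(a * b for a, b in zip(u, v))

samples = [(a, b) for a in range(-3, 4) for b in range(-3, 4)]

# <Sx, x> = 0 for all sample points x.
skew_diagonal = all(inner(S(x), x) == 0 for x in samples)

# <Sy, x> = -<Sx, y> for all pairs of sample points.
skew_pairs = all(inner(S(y), x) == -inner(S(x), y)
                 for x, y in itertools.product(samples, repeat=2))
print(skew_diagonal, skew_pairs)
```

In matrix form $S$ is antisymmetric, so it cannot be the gradient of a twice differentiable function, whose Hessian would have to be symmetric; this matches the conclusion of the example.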

14 A representation of monotone operators by convex functions

Throughout this section we will assume that $X$ is a nontrivial real Banach space. We will discuss some basic results related to the representation of monotone operators by convex functions. The central role in our investigations will be played by the so-called Fitzpatrick function.

Definition 14.1 (Fitzpatrick function) Let $S : X \rightrightarrows X^*$ be a monotone operator. The Fitzpatrick function associated to $S$ is defined by

$$\varphi_S : X \times X^* \to \overline{\mathbb{R}}, \quad \varphi_S(x, x^*) = \sup_{(s, s^*) \in G(S)} \{\langle x^*, s\rangle + \langle s^*, x\rangle - \langle s^*, s\rangle\}.$$

The most important properties of the Fitzpatrick function associated to a maximal monotone operator are collected in the following result.


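Theorem 14.2 Let $S : X \rightrightarrows X^*$ be a maximal monotone operator. The Fitzpatrick function $\varphi_S : X \times X^* \to \overline{\mathbb{R}}$ associated to $S$ has the following properties:

(a) $\varphi_S(x, x^*) \geq \langle x^*, x\rangle$ for all $(x, x^*) \in X \times X^*$;

(b) $\varphi_S$ is proper, convex and weak$\times$weak$^*$ lower semicontinuous;

(c) $G(S) = \{(x, x^*) \in X \times X^* \mid \varphi_S(x, x^*) = \langle x^*, x\rangle\} = \{(x, x^*) \in X \times X^* \mid \varphi_S^*(x^*, x) = \langle x^*, x\rangle\}$.

Proof. (a) Let $(x, x^*) \in X \times X^*$. If $(x, x^*) \in G(S)$, then, obviously, $\varphi_S(x, x^*) \geq \langle x^*, x\rangle$. If $(x, x^*) \notin G(S)$, then there exists $(s, s^*) \in G(S)$ such that $\langle s^* - x^*, s - x\rangle < 0$ or, equivalently, $\langle x^*, x\rangle < \langle x^*, s\rangle + \langle s^*, x\rangle - \langle s^*, s\rangle \leq \varphi_S(x, x^*)$.

(b) The function $\varphi_S$ is the pointwise supremum of a family of affine and weak$\times$weak$^*$ continuous functions, consequently, it is convex and weak$\times$weak$^*$ lower semicontinuous. The properness of $\varphi_S$ follows from (c) by taking into account that $G(S) \neq \emptyset$.

(c) We have seen that for $(x, x^*) \notin G(S)$ it holds $\varphi_S(x, x^*) > \langle x^*, x\rangle$, which proves that $\{(x, x^*) \in X \times X^* \mid \varphi_S(x, x^*) = \langle x^*, x\rangle\} \subseteq G(S)$. For the opposite inclusion, let $(x, x^*) \in G(S)$. By using the monotonicity of $S$, for all $(s, s^*) \in G(S)$ it holds $\langle s^* - x^*, s - x\rangle \geq 0$ or, equivalently, $\langle x^*, x\rangle \geq \langle x^*, s\rangle + \langle s^*, x\rangle - \langle s^*, s\rangle$. From here we get that $\varphi_S(x, x^*) \leq \langle x^*, x\rangle$, which, in combination with (a), proves that $\varphi_S(x, x^*) = \langle x^*, x\rangle$.

For all $(x, x^*) \in X \times X^*$ we have

$$\begin{aligned}
\varphi_S(x, x^*) &= \sup_{(s, s^*) \in G(S)} \{\langle x^*, s\rangle + \langle s^*, x\rangle - \langle s^*, s\rangle\} \\
&= \sup_{(s, s^*) \in G(S)} \{\langle x^*, s\rangle + \langle s^*, x\rangle - \varphi_S(s, s^*)\} \\
&\leq \sup_{(s, s^*) \in X \times X^*} \{\langle (x^*, x), (s, s^*)\rangle - \varphi_S(s, s^*)\} \\
&= \varphi_S^*(x^*, x),
\end{aligned}$$

which proves that

$$\varphi_S^*(x^*, x) \geq \varphi_S(x, x^*) \geq \langle x^*, x\rangle \quad \forall (x, x^*) \in X \times X^*. \quad (14.1)$$

If for $(x, x^*) \in X \times X^*$ it holds $\varphi_S^*(x^*, x) = \langle x^*, x\rangle$, then $\varphi_S(x, x^*) = \langle x^*, x\rangle$, which implies that $(x, x^*) \in G(S)$. On the other hand, let $(x, x^*) \in G(S)$. For all $(y, y^*) \in X \times X^*$ it holds

$$\varphi_S(y, y^*) \geq \langle x^*, y\rangle + \langle y^*, x\rangle - \langle x^*, x\rangle = \langle (x^*, x), (y, y^*)\rangle - \langle x^*, x\rangle,$$

which implies that

$$\langle x^*, x\rangle \geq \sup_{(y, y^*) \in X \times X^*} \{\langle (x^*, x), (y, y^*)\rangle - \varphi_S(y, y^*)\} = \varphi_S^*(x^*, x)$$

and gives, in combination with (14.1), $\varphi_S^*(x^*, x) = \langle x^*, x\rangle$. $\square$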
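Properties (a) and (c) can be seen explicitly in the simplest case. Assume, for illustration only, $X = \mathbb{R}$ and $S = \operatorname{Id}$, which is maximally monotone by Proposition 12.8. Then $\varphi_S(x, x^*) = \sup_{s \in \mathbb{R}} \{x^* s + s x - s^2\} = \frac{1}{4}(x + x^*)^2$, so that $\varphi_S(x, x^*) - x^* x = \frac{1}{4}(x - x^*)^2 \geq 0$ with equality exactly on the graph $x^* = x$. The sketch below compares the closed form with a supremum over a finite grid; function names and grid are hypothetical.

```python
from fractions import Fraction

def fitzpatrick_identity(x, x_star):
    """Closed form of the Fitzpatrick function of S = Id on R (assumed example):
    sup_s { x_star*s + s*x - s*s } = (x + x_star)^2 / 4."""
    return (x + x_star) ** 2 / Fraction(4)

def fitzpatrick_on_grid(x, x_star, grid):
    """Approximate the supremum over the graph {(s, s)} by a maximum over a finite grid."""
    return max(x_star * s + s * x - s * s for s in grid)

grid = [Fraction(k, 2) for k in range(-20, 21)]     # rational grid on [-10, 10]
pairs = [(a, b) for a in range(-3, 4) for b in range(-3, 4)]

# Gap phi_S(x, x*) - <x*, x>, which should vanish exactly when x* = x.
gaps = {(x, xs): fitzpatrick_identity(x, xs) - x * xs for x, xs in pairs}
print(fitzpatrick_identity(1, 2), fitzpatrick_on_grid(1, 2, grid))
```

The maximizer $s = \frac{1}{2}(x + x^*)$ lies in the grid for the integer pairs used here, so the grid value and the closed form agree exactly.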


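Example 14.3 (a) Let $f : X \to \overline{\mathbb{R}}$ be a proper, convex and lower semicontinuous function. For all $(x, x^*) \in X \times X^*$ it holds

$$\langle x^*, x\rangle \leq \varphi_{\partial f}(x, x^*) \leq f(x) + f^*(x^*) \leq \varphi_{\partial f}^*(x^*, x) \leq \langle x^*, x\rangle + \delta_{G(\partial f)}(x, x^*).$$

(b) Let $H$ be a real Hilbert space and $C \subseteq H$ a nonempty, convex and closed set. The Fitzpatrick function of the normal cone operator $N_C : H \rightrightarrows H$ reads $\varphi_{N_C}(x, x^*) = \delta_C(x) + \delta_C^*(x^*)$.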

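We will denote by $J : X \rightrightarrows X^*$,

$$J(x) = \partial\left(\frac{1}{2}\|\cdot\|^2\right)(x) = \{x^* \in X^* \mid \|x^*\|_* = \|x\| \text{ and } \langle x^*, x\rangle = \|x\| \|x^*\|_*\},$$

the so-called duality mapping of the space $X$. By Theorem 13.5, $J$ is a maximal monotone operator.

For $(x, x^*) \in X \times X^*$, let

$$\begin{aligned}
\Delta(x, x^*) &= \frac{1}{2}\|x\|^2 + \langle x^*, x\rangle + \frac{1}{2}\|x^*\|_*^2 \\
&\geq \frac{1}{2}\|x\|^2 - \|x\| \|x^*\|_* + \frac{1}{2}\|x^*\|_*^2 = \frac{1}{2}(\|x\| - \|x^*\|_*)^2 \geq 0.
\end{aligned}$$

Then

$$\Delta(x, x^*) = 0 \Leftrightarrow \|x\| = \|x^*\|_* \text{ and } \langle -x^*, x\rangle = \|x\| \|x^*\|_* = \|x\|^2 \Leftrightarrow -x^* \in J(x).$$

In reflexive Banach spaces, one can formulate in terms of the duality mapping the so-called "$-J$" criterion for the maximality of a monotone operator. In this setting, we will identify the dual of $X \times X^*$ with $X^* \times X$ and consider as duality product $\langle (y^*, y), (x, x^*)\rangle = \langle y^*, x\rangle + \langle x^*, y\rangle$ for $(x, x^*) \in X \times X^*$ and $(y^*, y) \in X^* \times X$. Consequently, the conjugate of a function $h : X \times X^* \to \overline{\mathbb{R}}$ reads

$$h^* : X^* \times X \to \overline{\mathbb{R}}, \quad h^*(y^*, y) = \sup_{(x, x^*) \in X \times X^*} \{\langle y^*, x\rangle + \langle x^*, y\rangle - h(x, x^*)\}.$$

Theorem 14.4 ("$-J$" criterion for maximality) Let $X$ be a reflexive Banach space and $S : X \rightrightarrows X^*$ a monotone operator. The following statements are equivalent:

(i) $S$ is maximally monotone;

(ii) $G(S) + G(-J) = X \times X^*$;

(iii) $R(S(x + \cdot) + J(\cdot)) = X^*$ for all $x \in X$.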


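Proof. "(ii) $\Rightarrow$ (i)" Let $(x, x^*) \in X \times X^*$ be such that $\langle y^* - x^*, y - x\rangle \geq 0$ for all $(y, y^*) \in G(S)$. Then there exists $(s, s^*) \in G(S)$ such that $(x - s, x^* - s^*) \in G(-J)$ or, equivalently, $\Delta(x - s, x^* - s^*) = 0$. Since

$$\begin{aligned}
0 = \Delta(x - s, x^* - s^*) &= \frac{1}{2}\|x - s\|^2 + \langle x^* - s^*, x - s\rangle + \frac{1}{2}\|x^* - s^*\|_*^2 \\
&\geq \frac{1}{2}\|x - s\|^2 + \frac{1}{2}\|x^* - s^*\|_*^2 \geq 0,
\end{aligned}$$

we obtain $(x, x^*) = (s, s^*) \in G(S)$. This proves that $S$ is maximally monotone.

"(i) $\Rightarrow$ (iii)" Let $x \in X$ and $x^* \in X^*$. Then $\varphi_S(z, z^*) - \langle z^*, z\rangle + \Delta(x - z, x^* - z^*) \geq 0$ for all $(z, z^*) \in X \times X^*$ or, equivalently,

$$\inf_{(z, z^*) \in X \times X^*} \{\varphi_S(z, z^*) + g(z, z^*)\} \geq 0,$$

where $g : X \times X^* \to \mathbb{R}$,

$$\begin{aligned}
g(z, z^*) &= \Delta(x - z, x^* - z^*) - \langle z^*, z\rangle \\
&= \frac{1}{2}\|x - z\|^2 + \frac{1}{2}\|x^* - z^*\|_*^2 - \langle x^*, z\rangle - \langle z^*, x\rangle + \langle x^*, x\rangle,
\end{aligned}$$

is a convex and continuous function. According to strong Fenchel duality, there exists $(u^*, u) \in X^* \times X$ such that

$$\varphi_S^*(u^*, u) + g^*(-u^*, -u) \leq 0.$$

We have

$$\begin{aligned}
g^*(-u^*, -u) &= \sup_{(z, z^*) \in X \times X^*} \left\{\langle x^* - u^*, z\rangle + \langle z^*, x - u\rangle - \frac{1}{2}\|x - z\|^2 - \frac{1}{2}\|x^* - z^*\|_*^2\right\} - \langle x^*, x\rangle \\
&= \sup_{t \in X} \left\{\langle x^* - u^*, x - t\rangle - \frac{1}{2}\|t\|^2\right\} + \sup_{t^* \in X^*} \left\{\langle x^* - t^*, x - u\rangle - \frac{1}{2}\|t^*\|_*^2\right\} - \langle x^*, x\rangle \\
&= \langle x^* - u^*, x\rangle + \frac{1}{2}\|x^* - u^*\|_*^2 + \langle x^*, x - u\rangle + \frac{1}{2}\|x - u\|^2 - \langle x^*, x\rangle \\
&= \Delta(x - u, x^* - u^*) - \langle u^*, u\rangle,
\end{aligned}$$

consequently, also by using (14.1),

$$0 \leq \varphi_S^*(u^*, u) - \langle u^*, u\rangle + \Delta(x - u, x^* - u^*) \leq 0.$$

From here it follows that $\varphi_S^*(u^*, u) = \langle u^*, u\rangle$ and $\Delta(x - u, x^* - u^*) = 0$ or, equivalently, $(u, u^*) \in G(S)$ and $x^* - u^* \in J(u - x)$. Consequently,

$$x^* = u^* + x^* - u^* \in S(x + (u - x)) + J(u - x) \subseteq R(S(x + \cdot) + J(\cdot)).$$

"(iii) $\Rightarrow$ (ii)" Let $(x, x^*) \in X \times X^*$. Then there exists $u \in X$ such that $x^* \in S(x + u) + J(u)$, which means that there exists $-u^* \in J(u)$ with $x^* + u^* \in S(x + u)$. One can easily notice that $u^* \in J(-u)$ or, equivalently, $-u^* \in (-J)(-u)$. This proves that

$$(x, x^*) = (x + u, x^* + u^*) + (-u, -u^*) \in G(S) + G(-J).$$

$\square$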

An immediate consequence of Theorem 14.4 is the celebrated Rockafellar's surjectivity theorem.

Theorem 14.5 (Rockafellar's surjectivity theorem) Let $X$ be a nontrivial reflexive Banach space and $S : X \rightrightarrows X^*$ a maximal monotone operator. Then

$$R(S + J) = X^*.$$

Remark 14.6 If $X$ is a reflexive Banach space, $S : X \rightrightarrows X^*$ a monotone operator and $J$ and $J^{-1}$ are single-valued, then $R(S + J) = X^*$ implies that $S$ is maximally monotone.

Indeed, let $(x, x^*) \in X \times X^*$ be such that $\langle y^* - x^*, y - x\rangle \geq 0$ for all $(y, y^*) \in G(S)$. Since $x^* + J(x) \in R(S + J)$, there exists $(s, s^*) \in G(S)$ such that $x^* + J(x) = s^* + J(s)$ or, equivalently, $J(s) - J(x) = x^* - s^*$. It holds

$$\begin{aligned}
\frac{1}{2}\|s\|^2 + \frac{1}{2}\|J(s)\|_*^2 + \frac{1}{2}\|x\|^2 + \frac{1}{2}\|J(x)\|_*^2 &= \langle J(s), s\rangle + \langle J(x), x\rangle \\
&= \langle J(x), s\rangle + \langle J(s), x\rangle + \langle J(s) - J(x), s - x\rangle \\
&= \langle J(x), s\rangle + \langle J(s), x\rangle + \langle x^* - s^*, s - x\rangle \\
&\leq \langle J(x), s\rangle + \langle J(s), x\rangle \\
&\leq \frac{1}{2}\|s\|^2 + \frac{1}{2}\|J(s)\|_*^2 + \frac{1}{2}\|x\|^2 + \frac{1}{2}\|J(x)\|_*^2.
\end{aligned}$$

Consequently, $\frac{1}{2}\|s\|^2 + \frac{1}{2}\|J(x)\|_*^2 = \langle J(x), s\rangle$ and $\frac{1}{2}\|x\|^2 + \frac{1}{2}\|J(s)\|_*^2 = \langle J(s), x\rangle$, which is further equivalent to $(s, J(x)), (x, J(s)) \in G(J)$. Since $J$ is single-valued, it holds $J(x) = J(s)$, thus, $x^* = s^*$ and, using in addition that $J^{-1}$ is single-valued, we have $x = s$. In conclusion, $(x, x^*) = (s, s^*) \in G(S)$, which proves that $S$ is maximally monotone.

Remark 14.7 In Hilbert spaces, by combining Theorem 14.5 and Remark 14.6, we rediscover the celebrated Minty's theorem. This says that if $H$ is a real Hilbert space and $S : H \rightrightarrows H$ a monotone operator, then $S$ is maximally monotone if and only if $R(S + \operatorname{Id}_H) = H$.

If a Banach space is uniformly smooth and uniformly convex, as are for instance $L^p$-spaces, for $1 < p < \infty$, then both $J$ and $J^{-1}$ are single-valued.
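Minty's theorem can be made tangible for the assumed example $H = \mathbb{R}$ and $S = \partial |\cdot|$, which is maximally monotone by Theorem 13.5. Surjectivity of $S + \operatorname{Id}$ means that for every $y \in \mathbb{R}$ there is some $x$ with $y \in x + \partial |x|$; this $x$ is given by the well-known soft-thresholding formula. The sketch below, with hypothetical function names, computes $x$ and verifies the inclusion $y - x \in \partial |x|$ on rational sample points.

```python
from fractions import Fraction

def resolvent_abs(y):
    """Solve y in x + d|.|(x) for x (soft thresholding with threshold 1)."""
    if y > 1:
        return y - 1
    if y < -1:
        return y + 1
    return 0

def in_subdiff_abs(x_star, x):
    """Membership test x_star in d|.|(x): {1} for x > 0, {-1} for x < 0, [-1, 1] at 0."""
    if x > 0:
        return x_star == 1
    if x < 0:
        return x_star == -1
    return -1 <= x_star <= 1

ys = [Fraction(k, 4) for k in range(-20, 21)]        # sample right-hand sides in [-5, 5]
solved = all(in_subdiff_abs(y - resolvent_abs(y), resolvent_abs(y)) for y in ys)
print(solved)
```

Every sampled $y$ is reached by $S + \operatorname{Id}$, as Minty's theorem predicts for a maximally monotone $S$.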

The following proposition proposes an approach to construct monotone operators inspired by the concept of Fitzpatrick function.

Proposition 14.8 Let $h : X \times X^* \to \overline{\mathbb{R}}$ be a convex function such that

$$h(x, x^*) \geq \langle x^*, x\rangle \quad \forall (x, x^*) \in X \times X^*,$$

and $M_h : X \rightrightarrows X^*$, $G(M_h) = \{(x, x^*) \in X \times X^* \mid h(x, x^*) = \langle x^*, x\rangle\}$. The following statements are true:

(a) $M_h$ is a monotone operator;

(b) if $(x, x^*), (y, y^*) \in X \times X^*$ are such that

$$h(x, x^*) - \langle x^*, x\rangle + \Delta(y - x, y^* - x^*) \leq 0,$$

then $(x, x^*) \in G(M_h)$;

(c) if $S : X \rightrightarrows X^*$ is such that $G(S) \subseteq G(M_h)$ and for all $(y, y^*) \in X \times X^*$ there exists $(x, x^*) \in G(S)$ such that

$$h(x, x^*) - \langle x^*, x\rangle + \Delta(y - x, y^* - x^*) \leq 0,$$

then $S$ is maximally monotone and $S = M_h$.

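Proof. (a) For all (x, x∗), (y, y∗) ∈ G(Mh) it holds

\[
\begin{aligned}
\tfrac{1}{2}\langle x^*, x\rangle + \tfrac{1}{2}\langle y^*, y\rangle
&= \tfrac{1}{2}h(x, x^*) + \tfrac{1}{2}h(y, y^*)
\ge h\left(\tfrac{1}{2}x + \tfrac{1}{2}y, \tfrac{1}{2}x^* + \tfrac{1}{2}y^*\right) \\
&\ge \left\langle \tfrac{1}{2}x^* + \tfrac{1}{2}y^*, \tfrac{1}{2}x + \tfrac{1}{2}y \right\rangle,
\end{aligned}
\]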

consequently, 〈y∗ − x∗, y − x〉 ≥ 0.

(b) Since ∆(y − x, y∗ − x∗) ≥ 0, it yields h(x, x∗) ≤ 〈x∗, x〉, thus h(x, x∗) = 〈x∗, x〉.

(c) From G(S) ⊆ G(Mh) and (a) we have that S is monotone. Let (x, x∗) ∈ X × X∗ be such that 〈y∗ − x∗, y − x〉 ≥ 0 for all (y, y∗) ∈ G(S). By assumption, there exists (z, z∗) ∈ G(S) such that h(z, z∗) − 〈z∗, z〉 + ∆(x − z, x∗ − z∗) ≤ 0. Since h(z, z∗) − 〈z∗, z〉 = 0, it yields

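\[
\begin{aligned}
0 \ge \Delta(x - z, x^* - z^*)
&= \tfrac{1}{2}\|x - z\|^2 + \langle x^* - z^*, x - z\rangle + \tfrac{1}{2}\|x^* - z^*\|_*^2 \\
&\ge \tfrac{1}{2}\|x - z\|^2 + \tfrac{1}{2}\|x^* - z^*\|_*^2 \ge 0,
\end{aligned}
\]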

which implies that (x, x∗) = (z, z∗) ∈ G(S). This proves that S is maximally monotone. Since Mh is monotone by (a) and G(S) ⊆ G(Mh), the maximality of S yields S = Mh; in particular, Mh is also maximally monotone. □

In the following we show that in reflexive Banach spaces one can build on the approach introduced in Proposition 14.8 to construct maximal monotone operators.


Proposition 14.9 Let X be a nontrivial reflexive Banach space and h : X × X∗ → R a proper and convex function such that

h(x, x∗) ≥ 〈x∗, x〉 and h∗(x∗, x) ≥ 〈x∗, x〉 ∀(x, x∗) ∈ X × X∗.

Then the set-valued operator having as graph {(x, x∗) ∈ X × X∗ | h∗(x∗, x) = 〈x∗, x〉} is maximally monotone.

Proof. We define k : X × X∗ → R, k(x, x∗) = h∗(x∗, x), which is a convex function that fulfils k(x, x∗) ≥ 〈x∗, x〉 for all (x, x∗) ∈ X × X∗.

Let (y, y∗) ∈ X × X∗. Then h(z, z∗) − 〈z∗, z〉 + ∆(y − z, y∗ − z∗) ≥ 0 for all (z, z∗) ∈ X × X∗ or, equivalently,

\[
\inf_{(z, z^*) \in X \times X^*} \{h(z, z^*) + g(z, z^*)\} \ge 0,
\]

where g : X × X∗ → R, g(z, z∗) = ∆(y − z, y∗ − z∗) − 〈z∗, z〉, is a convex and continuous function. According to strong Fenchel duality, there exists (x∗, x) ∈ X∗ × X such that

k(x, x∗) + g∗(−x∗,−x) = h∗(x∗, x) + g∗(−x∗,−x) ≤ 0,

which, by taking into account the formula for g∗ derived in the proof of Theorem 14.4, is nothing else than

k(x, x∗)− 〈x∗, x〉+ ∆(y − x, y∗ − x∗) ≤ 0.

This gives k(x, x∗) = 〈x∗, x〉, namely, (x, x∗) ∈ G(Mk), and implies, according to Proposition 14.8 (c), that Mk is maximally monotone. □

Remark 14.10 If X is a reflexive Banach space and f : X → R a proper, convex and lower semicontinuous function, then h : X × X∗ → R, h(x, x∗) = f(x) + f∗(x∗), is proper and convex and it fulfils for all (x, x∗) ∈ X × X∗

h(x, x∗) = f(x) + f∗(x∗) ≥ 〈x∗, x〉 and h∗(x∗, x) = f∗(x∗) + f(x) ≥ 〈x∗, x〉.

Proposition 14.9 allows us to conclude that, since G(∂f) = {(x, x∗) ∈ X × X∗ | h∗(x∗, x) = 〈x∗, x〉} = {(x, x∗) ∈ X × X∗ | f∗(x∗) + f(x) = 〈x∗, x〉}, the (convex) subdifferential of f is maximally monotone.
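The construction in Remark 14.10 can be checked numerically in a one-dimensional case. The sketch below assumes X = R and the illustrative choice f = |·|, whose conjugate is the indicator function of [−1, 1]; this example and the helper names `gap` and `in_subdiff` are not part of the notes. It verifies on a grid that h(x, x∗) − 〈x∗, x〉 is nonnegative and vanishes exactly on the graph of ∂f.

```python
import math

# Sketch (assumption: X = R, f = |.|, hence f* is the indicator of [-1, 1]).

def f(x):
    return abs(x)


def f_conj(x_star):
    """Conjugate of |.|: 0 on [-1, 1] and +infinity elsewhere."""
    return 0.0 if abs(x_star) <= 1.0 else math.inf


def gap(x, x_star):
    """h(x, x*) - <x*, x> with h(x, x*) = f(x) + f*(x*)."""
    return f(x) + f_conj(x_star) - x_star * x


def in_subdiff(x, x_star):
    """Membership of x* in the subdifferential of |.| at x."""
    if x > 0:
        return x_star == 1.0
    if x < 0:
        return x_star == -1.0
    return abs(x_star) <= 1.0


# Half-integer grid, so all arithmetic below is exact in floating point.
grid = [k / 2.0 for k in range(-4, 5)]
pairs = [(x, xs) for x in grid for xs in grid]

gap_nonnegative = all(gap(x, xs) >= 0.0 for x, xs in pairs)
zero_gap_is_graph = all((gap(x, xs) == 0.0) == in_subdiff(x, xs) for x, xs in pairs)
print(gap_nonnegative, zero_gap_is_graph)
```

The grid is chosen so that the equality test `gap(x, xs) == 0.0` is exact; for general data a tolerance would be needed.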

Theorem 14.11 Let X be a reflexive Banach space and h : X × X∗ → R a proper, convex and lower semicontinuous function such that

h(x, x∗) ≥ 〈x∗, x〉 ∀(x, x∗) ∈ X × X∗.

Then the set-valued operator having as graph {(x, x∗) ∈ X × X∗ | h(x, x∗) = 〈x∗, x〉} is maximally monotone if and only if

h∗(x∗, x) ≥ 〈x∗, x〉 ∀(x, x∗) ∈ X × X∗.


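Proof. “⇒” For all (x, x∗) ∈ X × X∗ we have

\[
\begin{aligned}
h^*(x^*, x) &= \sup_{(s, s^*) \in X \times X^*} \{\langle (x^*, x), (s, s^*)\rangle - h(s, s^*)\} \\
&\ge \sup_{(s, s^*) \in G(M_h)} \{\langle x^*, s\rangle + \langle s^*, x\rangle - h(s, s^*)\} \\
&= \sup_{(s, s^*) \in G(M_h)} \{\langle x^*, s\rangle + \langle s^*, x\rangle - \langle s^*, s\rangle\} \\
&= \varphi_{M_h}(x, x^*) \ge \langle x^*, x\rangle.
\end{aligned}
\]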

“⇐” By the Fenchel-Moreau Theorem, the function k : X × X∗ → R, k(x, x∗) = h∗(x∗, x), is proper and convex. It fulfils k(x, x∗) ≥ 〈x∗, x〉 for all (x, x∗) ∈ X × X∗. In addition, again via the Fenchel-Moreau Theorem,

k∗(x∗, x) = h∗∗(x, x∗) = h(x, x∗) ≥ 〈x∗, x〉 ∀(x, x∗) ∈ X ×X∗.

According to Proposition 14.9, applied to k, the operator Mh with graph {(x, x∗) ∈ X × X∗ | k∗(x∗, x) = 〈x∗, x〉} = {(x, x∗) ∈ X × X∗ | h(x, x∗) = 〈x∗, x〉} is maximally monotone. □

15 The maximal monotonicity of the sum of two maximal monotone operators

Let X be a nontrivial real Banach space. We will address the question of when the sum of two maximal monotone operators is maximally monotone. We have seen in Proposition 12.3 that the sum is monotone; however, it is in general not maximally monotone.

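Example 15.1 The sets $C = \{(x_1, x_2) \in \mathbb{R}^2 \mid (x_1 - 1)^2 + x_2^2 \le 1\}$ and $D = \{(x_1, x_2) \in \mathbb{R}^2 \mid (x_1 + 1)^2 + x_2^2 \le 1\}$ are both nonempty, convex and closed, thus, according to Theorem 13.5, the normal cone operators $N_C, N_D : \mathbb{R}^2 \rightrightarrows \mathbb{R}^2$ are maximally monotone. For every $x \ne (0, 0)$ it holds $N_C(x) + N_D(x) = \emptyset$, while $N_C(0, 0) + N_D(0, 0) = \mathbb{R}_- \times \{0\} + \mathbb{R}_+ \times \{0\} = \mathbb{R} \times \{0\}$. One can easily see that $N_C + N_D$ is not maximally monotone, since $G(N_C + N_D) = \{(0, 0)\} \times (\mathbb{R} \times \{0\}) \subsetneq G(N_{C \cap D}) = \{(0, 0)\} \times \mathbb{R}^2$.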
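The failure of maximality in Example 15.1 can be illustrated with a small numerical sketch. It uses the two unit discs of the example and the description N_C(0, 0) + N_D(0, 0) = R × {0} computed there; the function names and the sampling grids are hypothetical choices. The sketch checks that the origin is the only grid point common to C and D, and that the pair ((0, 0), (0, 1)) is monotonically related to sampled points of G(N_C + N_D) without belonging to that graph.

```python
# Sketch for Example 15.1: C and D are the unit discs centred at (1, 0) and (-1, 0).

def in_disc(point, center, radius=1.0):
    """Membership of a point of R^2 in a closed disc."""
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return dx * dx + dy * dy <= radius * radius


C_CENTER = (1.0, 0.0)
D_CENTER = (-1.0, 0.0)

# Grid on [-2, 2]^2 with step 1/4: only the origin lies in both discs.
grid = [(i / 4.0, j / 4.0) for i in range(-8, 9) for j in range(-8, 9)]
common_points = [p for p in grid if in_disc(p, C_CENTER) and in_disc(p, D_CENTER)]


def in_sum_at_origin(v):
    """Membership in N_C(0,0) + N_D(0,0) = R x {0}, as stated in the example."""
    return v[1] == 0.0


def monotonically_related(x, x_star, graph_samples):
    """Check <y* - x*, y - x> >= 0 for all sampled (y, y*)."""
    return all(
        (ys[0] - x_star[0]) * (y[0] - x[0]) + (ys[1] - x_star[1]) * (y[1] - x[1]) >= 0.0
        for y, ys in graph_samples
    )


origin = (0.0, 0.0)
# Samples of G(N_C + N_D) = {(0,0)} x (R x {0}).
graph_samples = [(origin, (t / 2.0, 0.0)) for t in range(-10, 11)]

candidate = (origin, (0.0, 1.0))
related = monotonically_related(candidate[0], candidate[1], graph_samples)
missing = not in_sum_at_origin(candidate[1])
print(common_points, related, missing)
```

Since every element of the graph has first component (0, 0), any pair ((0, 0), v) is monotonically related to it, which is why the graph can be enlarged to {(0, 0)} × R².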

Remark 15.2 If f, g : X → R are proper, convex and lower semicontinuous functions, then the (convex) subdifferential operators ∂f, ∂g : X ⇒ X∗ are maximally monotone. We have seen in Theorem 10.2 (b) that if one of the regularity conditions $(RC_i^F)$, i ∈ {1, 2, 2′, 3, 4, 5}, is fulfilled, then ∂f(x) + ∂g(x) = ∂(f + g)(x) for all x ∈ X, which means that ∂f + ∂g = ∂(f + g) is maximally monotone.

In the following theorem we will provide a weak sufficient regularity condition for the maximal monotonicity of the sum of two maximal monotone operators in reflexive Banach spaces formulated in terms of their Fitzpatrick functions.


Theorem 15.3 Let X be a reflexive Banach space and S, T : X ⇒ X∗ be two maximal monotone operators such that

0 ∈ sqri (PrX domϕS − PrX domϕT ) . (15.1)

Then S + T is maximally monotone.

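Proof. Let h : X × X∗ → R be defined by

\[
h(x, x^*) = \inf_{s^* \in X^*} \{\varphi_S(x, s^*) + \varphi_T(x, x^* - s^*)\}.
\]

Since h is the infimal value function of a convex function, it is convex. For all (x, x∗) ∈ X × X∗ it holds

\[
\begin{aligned}
h(x, x^*) &= \inf_{s^* \in X^*} \{\varphi_S(x, s^*) + \varphi_T(x, x^* - s^*)\}
\ge \inf_{s^* \in X^*} \{\langle s^*, x\rangle + \langle x^* - s^*, x\rangle\} \\
&\ge \langle x^*, x\rangle > -\infty.
\end{aligned}
\]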

Since 0 ∈ PrX dom ϕS − PrX dom ϕT, there exist x ∈ X and s∗, t∗ ∈ X∗ such that (x, s∗) ∈ dom ϕS and (x, t∗) ∈ dom ϕT. Consequently, h(x, s∗ + t∗) ≤ ϕS(x, s∗) + ϕT(x, t∗) < +∞, which proves that h is proper.

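Let (x, x∗) ∈ X × X∗ be fixed. It holds

\[
\begin{aligned}
h^*(x^*, x) &= \sup_{(z, z^*) \in X \times X^*} \{\langle x^*, z\rangle + \langle z^*, x\rangle - h(z, z^*)\} \\
&= \sup_{z \in X,\ z^*, s^* \in X^*} \{\langle x^*, z\rangle + \langle z^*, x\rangle - \varphi_S(z, s^*) - \varphi_T(z, z^* - s^*)\} \\
&= \sup_{z \in X,\ s^*, t^* \in X^*} \{\langle x^*, z\rangle + \langle s^* + t^*, x\rangle - \varphi_S(z, s^*) - \varphi_T(z, t^*)\} \\
&= (F + G)^*(x^*, x, x), \qquad (15.2)
\end{aligned}
\]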

where

F,G : X×X∗×X∗ → R, F (u, s∗, t∗) = ϕS(u, s∗) and G(u, s∗, t∗) = ϕT (u, t∗).

The functions F and G are proper, convex and lower semicontinuous and fulfil

domF = {(u, s∗, t∗) ∈ X ×X∗ ×X∗ | (u, s∗) ∈ domϕS}

and

domG = {(u, s∗, t∗) ∈ X ×X∗ ×X∗ | (u, t∗) ∈ domϕT}.

It holds

domF − domG = (PrX domϕS − PrX domϕT )×X∗ ×X∗,


thus, according to (15.1), 0 ∈ sqri(dom F − dom G). From Theorem 10.2 (a) and (15.2) we have

\[
\begin{aligned}
h^*(x^*, x) &= (F + G)^*(x^*, x, x) = (F^* \,\Box\, G^*)(x^*, x, x) \\
&= \min_{(u^*, s, t) \in X^* \times X \times X} \{F^*(u^*, s, t) + G^*(x^* - u^*, x - s, x - t)\}. \qquad (15.3)
\end{aligned}
\]

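For all (u∗, s, t) ∈ X∗ × X × X it holds

\[
\begin{aligned}
F^*(u^*, s, t) &= \sup_{u \in X,\ s^*, t^* \in X^*} \{\langle u^*, u\rangle + \langle s^*, s\rangle + \langle t^*, t\rangle - \varphi_S(u, s^*)\} \\
&= \begin{cases} \varphi_S^*(u^*, s), & \text{if } t = 0, \\ +\infty, & \text{otherwise,} \end{cases}
\end{aligned}
\]

and

\[
\begin{aligned}
G^*(x^* - u^*, x - s, x - t) &= \sup_{u \in X,\ s^*, t^* \in X^*} \{\langle x^* - u^*, u\rangle + \langle s^*, x - s\rangle + \langle t^*, x - t\rangle - \varphi_T(u, t^*)\} \\
&= \begin{cases} \varphi_T^*(x^* - u^*, x - t), & \text{if } x = s, \\ +\infty, & \text{otherwise,} \end{cases}
\end{aligned}
\]

consequently,

\[
h^*(x^*, x) = \min_{u^* \in X^*} \{\varphi_S^*(u^*, x) + \varphi_T^*(x^* - u^*, x)\}.
\]

From here, by using (14.1), we get for all (x, x∗) ∈ X × X∗

\[
h^*(x^*, x) \ge \inf_{u^* \in X^*} \{\langle u^*, x\rangle + \langle x^* - u^*, x\rangle\} = \langle x^*, x\rangle.
\]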

This proves that the hypotheses of Proposition 14.9 are fulfilled, consequently, the set-valued operator having as graph {(x, x∗) ∈ X × X∗ | h∗(x∗, x) = 〈x∗, x〉} is maximally monotone.

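We notice that for (x, x∗) ∈ X × X∗ it holds

\[
\begin{aligned}
h^*(x^*, x) = \langle x^*, x\rangle
&\Leftrightarrow \exists u^* \in X^* \text{ such that } \\
&\qquad \langle x^*, x\rangle = \varphi_S^*(u^*, x) + \varphi_T^*(x^* - u^*, x) \ge \langle u^*, x\rangle + \langle x^* - u^*, x\rangle = \langle x^*, x\rangle \\
&\Leftrightarrow \exists u^* \in X^* \text{ such that } \varphi_S^*(u^*, x) = \langle u^*, x\rangle \text{ and } \varphi_T^*(x^* - u^*, x) = \langle x^* - u^*, x\rangle \\
&\Leftrightarrow \exists u^* \in X^* \text{ such that } (x, u^*) \in G(S) \text{ and } (x, x^* - u^*) \in G(T) \\
&\Leftrightarrow \exists u^* \in X^* \text{ such that } x^* = u^* + (x^* - u^*) \in S(x) + T(x) \\
&\Leftrightarrow (x, x^*) \in G(S + T).
\end{aligned}
\]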

This proves that S + T is maximally monotone. □


Remark 15.4 If X is a reflexive Banach space and S, T : X ⇒ X∗ are two maximal monotone operators, then one can prove that (see [9])

0 ∈ sqri (PrX domϕS − PrX domϕT )

if and only if

cone (D(S)−D(T )) is a closed linear subspace.

A sufficient condition for these conditions to hold and, consequently, for the maximal monotonicity of S + T is Rockafellar’s celebrated regularity condition

int(D(S)) ∩ D(T) ≠ ∅.

Indeed, one has from here

0 ∈ int(D(S) − D(T))

and, consequently, cone(D(S) − D(T)) = X.

Notice that for the maximal monotone operators in Example 15.1 one has

cone(D(NC) − D(ND)) = cone(C − D) = cone(C) = {(x1, x2) ∈ R² | x1 > 0} ∪ {(0, 0)},

which is obviously not a linear subspace.

Remark 15.5 Since F and G in the proof of Theorem 15.3 are proper, convex and lower semicontinuous functions, one can obtain relation (15.3), according to Theorem 10.2 (a), also if the closedness-type regularity condition (here we also take into account that convex closed sets are weakly closed and that X is reflexive)

epi F∗ + epi G∗ is closed in X∗ × X × X × R

is fulfilled.

In other words, if X is a reflexive Banach space and S, T : X ⇒ X∗ are two maximal monotone operators such that

\[
\{(x^* + y^*, x, y, r) \in X^* \times X \times X \times \mathbb{R} \mid \varphi_S^*(x^*, x) + \varphi_T^*(y^*, y) \le r\} \text{ is closed,} \qquad (15.4)
\]

then S + T is maximally monotone.

We will provide in the following an example of a pair of maximal monotone operators for which the interiority-type regularity conditions are not fulfilled, while the closedness-type one holds.

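For f, g : R → R, $f(x) = \tfrac{1}{2}x^2 + \delta_{\mathbb{R}_+}(x)$ and $g(x) = \delta_{\mathbb{R}_-}(x)$, the operators S = ∂f and T = ∂g are maximally monotone. For all x ∈ R we have

\[
S(x) = x + N_{\mathbb{R}_+}(x) =
\begin{cases}
\{x\}, & \text{if } x > 0, \\
\mathbb{R}_-, & \text{if } x = 0, \\
\emptyset, & \text{otherwise,}
\end{cases}
\]

$T(x) = N_{\mathbb{R}_-}(x)$, and

\[
S(x) + T(x) =
\begin{cases}
\mathbb{R}, & \text{if } x = 0, \\
\emptyset, & \text{otherwise,}
\end{cases}
\]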

which means that S + T is maximally monotone. It holds cone(D(S) − D(T)) = R+, thus, the interiority-type regularity conditions are not fulfilled. For all (x, x∗) ∈ R² it holds

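\[
\begin{aligned}
\varphi_S(x, x^*) &= \max\left\{ \sup_{y^* \le 0} x y^*,\ \sup_{y > 0} \{(x^* + x)y - y^2\} \right\} \\
&= \begin{cases}
\tfrac{1}{4}(x + x^*)^2, & \text{if } x \ge 0,\ x + x^* \ge 0, \\
0, & \text{if } x \ge 0,\ x + x^* \le 0, \\
+\infty, & \text{otherwise,}
\end{cases}
\end{aligned}
\]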

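and, for all (y, y∗) ∈ R², by Example 14.3 (b), $\varphi_T(y, y^*) = \delta_{\mathbb{R}_-}(y) + \delta_{\mathbb{R}_+}(y^*)$. Therefore, for all (x, x∗) ∈ R²,

\[
\begin{aligned}
\varphi_S^*(x^*, x) &= \max\left\{ \sup_{z \ge 0,\ z + z^* \le 0} \{x^* z + x z^*\},\ \sup_{z \ge 0,\ z + z^* \ge 0} \left\{x^* z + x z^* - \tfrac{1}{4}(z + z^*)^2\right\} \right\} \\
&= \begin{cases}
x^2, & \text{if } x \ge 0,\ x \ge x^*, \\
+\infty, & \text{otherwise,}
\end{cases}
\end{aligned}
\]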

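and, for all (y, y∗) ∈ R², $\varphi_T^*(y^*, y) = \delta_{\mathbb{R}_+}(y^*) + \delta_{\mathbb{R}_-}(y)$. Consequently,

\[
\{(x^* + y^*, x, y, r) \in \mathbb{R} \times \mathbb{R} \times \mathbb{R} \times \mathbb{R} \mid \varphi_S^*(x^*, x) + \varphi_T^*(y^*, y) \le r\}
= \mathbb{R} \times \bigcup_{x \ge 0} \left( \{x\} \times \mathbb{R}_- \times [x^2, +\infty) \right),
\]

which is a closed set, thus, the closedness-type regularity condition (15.4) is fulfilled.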
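The closed-form expression for the Fitzpatrick function of S in this example can be cross-checked numerically. The sketch below assumes the definition ϕS(x, x∗) = sup over (y, y∗) ∈ G(S) of {〈y∗, x〉 + 〈x∗, y〉 − 〈y∗, y〉} and replaces the supremum by a maximum over a finite sample of G(S); the function names, the truncation bound and the sample size are hypothetical choices. For x ≥ 0 the sampled maximum should agree with the piecewise formula, while for x < 0 it grows with the truncation bound, in line with the value +∞.

```python
import math

# Sketch: S(x) = {x} for x > 0, S(0) = R_-, S(x) empty for x < 0 (as in Remark 15.5).

def phi_S_closed(x, x_star):
    """Piecewise formula for the Fitzpatrick function of S stated in the remark."""
    if x < 0.0:
        return math.inf
    t = x + x_star
    return 0.25 * t * t if t >= 0.0 else 0.0


def phi_S_sampled(x, x_star, n=2000, bound=20.0):
    """Maximum of y* x + x* y - y* y over a finite sample of the graph of S."""
    best = -math.inf
    # Branch y = 0, y* <= 0: the expression reduces to x * y*.
    for k in range(n + 1):
        y_star = -bound * k / n
        best = max(best, x * y_star)
    # Branch y > 0, y* = y: the expression reduces to (x + x*) y - y^2.
    for k in range(1, n + 1):
        y = bound * k / n
        best = max(best, (x + x_star) * y - y * y)
    return best


test_points = [(x, xs) for x in (0.0, 0.5, 1.0, 2.0) for xs in (-3.0, -1.0, 0.0, 1.0, 2.5)]
max_error = max(abs(phi_S_closed(x, xs) - phi_S_sampled(x, xs)) for x, xs in test_points)

# The Fitzpatrick function should dominate the duality product.
dominates = all(phi_S_closed(x, xs) >= x * xs for x, xs in test_points)
print(max_error < 1e-6, dominates)
```

The test points are chosen so that the maximiser of the quadratic branch lies on the sampling grid; for other points the sampled value would only approximate the supremum from below.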

Remark 15.6 (Rockafellar’s conjecture) R.T. Rockafellar conjectured that if X is a general Banach space and S, T : X ⇒ X∗ are two maximal monotone operators such that int(D(S)) ∩ D(T) ≠ ∅, then S + T is maximally monotone. The conjecture has not been solved yet.

Bibliography

[1] H.H. Bauschke, P.L. Combettes: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer New York Dordrecht Heidelberg London, 2017

[2] J.M. Borwein, J.D. Vanderwerff: Convex Functions, Cambridge University Press, 2010

[3] R.I. Bot: Conjugate Duality in Convex Optimization, Lecture Notes in Economics and Mathematical Systems, Vol. 637, Springer Berlin Heidelberg, 2010

[4] R.I. Bot: Skript zur Bachelor-Vorlesung “Funktionalanalysis” (SS2020), www.mat.univie.ac.at/~rabot/tutorials/st20/skriptfunktionalanalysis.pdf

[5] I. Ekeland, R. Temam: Convex Analysis and Variational Problems, SIAM, 1976

[6] R.R. Phelps: Convex Functions, Monotone Operators and Differentiability, Lecture Notes in Mathematics, Vol. 1364, Springer Berlin Heidelberg New York, 1993

[7] R.T. Rockafellar: Convex Analysis, Princeton University Press, 1970

[8] W. Rudin: Functional Analysis, McGraw-Hill, 1991

[9] S. Simons: From Hahn-Banach to Monotonicity, Lecture Notes in Mathematics, Vol. 1693, Springer New York, 2008

[10] C. Zalinescu: Convex Analysis in General Vector Spaces, World Scientific, 2002
