Lecture 10 - Linearly Constrained Problems: Separation → Alternative Theorems → Optimality Conditions
▶ A hyperplane
  H = {x ∈ ℝⁿ : aᵀx = b}   (a ∈ ℝⁿ\{0}, b ∈ ℝ)
  is said to strictly separate a point y ∉ S from a set S if
  aᵀy > b and aᵀx ≤ b for all x ∈ S.

Theorem (separation of a point from a closed and convex set). Let C ⊆ ℝⁿ be a nonempty closed and convex set, and let y ∉ C. Then there exist p ∈ ℝⁿ\{0} and α ∈ ℝ such that
  pᵀy > α and pᵀx ≤ α for all x ∈ C.
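A small numerical sketch (not from the slides; function and variable names are mine) illustrating the theorem with C taken to be the unit Euclidean ball: set p = y − P_C(y) and α = pᵀP_C(y), then check the two separation inequalities on sampled points of C.

```python
import numpy as np

# Illustrative check of the separation theorem for C = unit Euclidean ball.
rng = np.random.default_rng(0)

def proj_unit_ball(y):
    """Orthogonal projection onto C = {x : ||x|| <= 1}."""
    nrm = np.linalg.norm(y)
    return y if nrm <= 1 else y / nrm

y = np.array([3.0, 4.0])            # a point outside C
x_bar = proj_unit_ball(y)           # P_C(y)
p = y - x_bar                       # separating direction p = y - P_C(y)
alpha = p @ x_bar                   # alpha = p^T P_C(y)

assert p @ y > alpha                # p^T y > alpha
for _ in range(1000):               # p^T x <= alpha for sampled x in C
    x = rng.normal(size=2)
    x = x / max(1.0, np.linalg.norm(x))   # force x into the unit ball
    assert p @ x <= alpha + 1e-12
print("strict separation verified with p =", p, "alpha =", alpha)
```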
Proof of the Separation Theorem
▶ By the second orthogonal projection theorem, the vector x̄ = P_C(y) ∈ C satisfies
  (y − x̄)ᵀ(x − x̄) ≤ 0 for all x ∈ C,
  which is the same as
  (y − x̄)ᵀx ≤ (y − x̄)ᵀx̄ for all x ∈ C.
▶ Denote p = y − x̄ ≠ 0 and α = (y − x̄)ᵀx̄. Then
  pᵀx ≤ α for all x ∈ C.
▶ On the other hand,
  pᵀy = (y − x̄)ᵀy = (y − x̄)ᵀ(y − x̄) + (y − x̄)ᵀx̄ = ‖y − x̄‖² + α > α.
Farkas' Lemma - an Alternative Theorem

Farkas' Lemma. Let c ∈ ℝⁿ and A ∈ ℝ^{m×n}. Then exactly one of the following systems has a solution:
I.  Ax ≤ 0, cᵀx > 0.
II. Aᵀy = c, y ≥ 0.

Another equivalent formulation is the following.

Farkas' Lemma - second formulation. Let c ∈ ℝⁿ and A ∈ ℝ^{m×n}. Then the following two claims are equivalent:
(A) The implication Ax ≤ 0 ⇒ cᵀx ≤ 0 holds true.
(B) There exists y ∈ ℝᵐ₊ such that Aᵀy = c.
What does it mean?
Example. A = ( 1  5
              −1  2 ),   c = (−1
                               9).
Here c = a₁ + 2a₂, where a₁ᵀ, a₂ᵀ are the rows of A, so system II is solved by y = (1, 2)ᵀ ≥ 0 and, consequently, system I has no solution (a numerical check follows below).
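A hedged numerical companion to the example above (data as reconstructed; the script and names are mine): it solves Aᵀy = c, confirms y ≥ 0, so system II is feasible and, by the lemma, system I must be infeasible.

```python
import numpy as np

# Farkas example from the slide (as reconstructed): does c lie in the cone
# generated by the rows of A?
A = np.array([[1.0, 5.0],
              [-1.0, 2.0]])
c = np.array([-1.0, 9.0])

# System II: A^T y = c, y >= 0.
y = np.linalg.solve(A.T, c)
print("y =", y)                       # expected (1, 2)
assert np.all(y >= 0)                 # II is feasible

# Hence, by Farkas' lemma, system I (Ax <= 0, c^T x > 0) is infeasible:
# for any x with Ax <= 0 we get c^T x = y^T (A x) <= 0.
rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.normal(size=2)
    if np.all(A @ x <= 0):
        assert c @ x <= 1e-12
```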
Proof of Farkas' Lemma

▶ Suppose that system (B) is feasible: ∃y ∈ ℝᵐ₊ such that Aᵀy = c.
▶ To see that the implication (A) holds, suppose that Ax ≤ 0 for some x ∈ ℝⁿ.
▶ Multiplying this inequality from the left by yᵀ ≥ 0:
  yᵀAx ≤ 0.
▶ Hence, cᵀx = (Aᵀy)ᵀx = yᵀAx ≤ 0.
▶ Conversely, suppose that the implication (A) is satisfied, and let us show that the system (B) is feasible. Suppose in contradiction that system (B) is infeasible.
▶ Consider the following closed and convex (why?) set
  S = {x ∈ ℝⁿ : x = Aᵀy for some y ∈ ℝᵐ₊}.
▶ Since (B) is infeasible, c ∉ S.
Proof Contd.
▶ By the separation theorem, ∃p ∈ ℝⁿ\{0} and α ∈ ℝ such that pᵀc > α and
  pᵀx ≤ α for all x ∈ S.   (1)
▶ 0 ∈ S ⇒ α ≥ 0 ⇒ pᵀc > 0.
▶ (1) is equivalent to
  pᵀAᵀy ≤ α for all y ≥ 0,
  or to
  (Ap)ᵀy ≤ α for all y ≥ 0.   (2)
▶ Therefore, Ap ≤ 0 (otherwise a positive component of Ap could be scaled by an arbitrarily large y ≥ 0, violating (2)).
▶ Together with pᵀc > 0, this contradicts the assertion that implication (A) holds.
Gordan's Alternative Theorem

Theorem. Let A ∈ ℝ^{m×n}. Then exactly one of the following two systems has a solution:
(A) Ax < 0.
(B) p ≠ 0, Aᵀp = 0, p ≥ 0.

Proof.
▶ Suppose that system (A) has a solution x.
▶ Assume in contradiction that (B) is feasible: ∃p ≠ 0 satisfying Aᵀp = 0, p ≥ 0.
▶ Multiplying the equality Aᵀp = 0 from the left by xᵀ yields (Ax)ᵀp = 0, which is impossible, since Ax < 0, p ≥ 0 and p ≠ 0 force (Ax)ᵀp < 0.
▶ Conversely, suppose that system (A) does not have a solution.
▶ System (A) is equivalent (s being a scalar) to Ax + se ≤ 0, s > 0,
▶ or to
  Ã (x; s) ≤ 0,  c̃ᵀ(x; s) > 0,  where à = (A  e) and c̃ = e_{n+1}.
▶ The infeasibility of (A) is thus equivalent to the infeasibility of the system
  Ãw ≤ 0, c̃ᵀw > 0, w ∈ ℝⁿ⁺¹.
Proof of Gordan Contd.
▶ By Farkas' lemma, ∃z ∈ ℝᵐ₊ such that
  Ãᵀz = (Aᵀ; eᵀ) z = c̃,
▶ ⇔ ∃z ∈ ℝᵐ₊ : Aᵀz = 0, eᵀz = 1,
▶ ⇔ ∃ 0 ≠ z ∈ ℝᵐ₊ : Aᵀz = 0
▶ ⇒ system (B) is feasible (take p = z). □
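A small self-contained instance (my own, not from the slides) illustrating Gordan's alternative: the rows of this A combine with positive weights to zero, so Ax < 0 is impossible, and that weight vector is exactly a certificate p for system (B).

```python
import numpy as np

# Illustrative instance for Gordan's theorem: the rows of A positively
# combine to zero, so Ax < 0 can have no solution.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])

# Certificate for system (B): p != 0, A^T p = 0, p >= 0.
p = np.array([1.0, 1.0, 1.0])
assert np.allclose(A.T @ p, 0) and np.all(p >= 0) and p.any()

# Consistency check: no sampled x achieves Ax < 0, since p^T (Ax) = 0
# means the entries of Ax cannot all be negative.
rng = np.random.default_rng(2)
assert not any(np.all(A @ rng.normal(size=2) < 0) for _ in range(10000))
```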
KKT Conditions for Linearly Constrained Problems
Theorem (KKT conditions for linearly constrained problems - necessary optimality conditions). Consider the minimization problem

(P)  min f(x)
     s.t. aᵢᵀx ≤ bᵢ, i = 1, 2, ..., m,

where f is continuously differentiable over ℝⁿ, a₁, a₂, ..., aₘ ∈ ℝⁿ, b₁, b₂, ..., bₘ ∈ ℝ, and let x* be a local minimum point of (P). Then there exist λ₁, λ₂, ..., λₘ ≥ 0 such that

  ∇f(x*) + ∑_{i=1}^m λᵢaᵢ = 0   (3)

and

  λᵢ(aᵢᵀx* − bᵢ) = 0, i = 1, 2, ..., m.   (4)
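A hedged numerical illustration of the theorem (the problem data are mine, not from the slides): for min (x₁−2)² + (x₂−1)² subject to x₁ ≤ 1, x₂ ≤ 1, a solver returns x* = (1, 1), and multipliers satisfying (3)-(4) can be written down and checked directly.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative problem (my own data): min (x1-2)^2 + (x2-1)^2
#                                     s.t. x1 <= 1, x2 <= 1.
A = np.array([[1.0, 0.0], [0.0, 1.0]])   # rows a_i^T
b = np.array([1.0, 1.0])

f = lambda x: (x[0] - 2) ** 2 + (x[1] - 1) ** 2
grad = lambda x: np.array([2 * (x[0] - 2), 2 * (x[1] - 1)])

# Solve numerically (SLSQP handles the linear inequalities a_i^T x <= b_i).
cons = [{"type": "ineq", "fun": lambda x: b - A @ x}]
x_star = minimize(f, x0=np.zeros(2), constraints=cons).x
print("x* =", x_star)                                  # expected (1, 1)

# KKT multipliers: lambda = (2, 0) satisfies (3) and (4).
lam = np.array([2.0, 0.0])
assert np.allclose(grad(x_star) + A.T @ lam, 0, atol=1e-5)   # condition (3)
assert np.allclose(lam * (A @ x_star - b), 0, atol=1e-5)     # condition (4)
```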
Proof of the KKT Theorem

▶ x* is a local minimum ⇒ x* is a stationary point of (P):
  ∇f(x*)ᵀ(x − x*) ≥ 0 for every x ∈ ℝⁿ satisfying aᵢᵀx ≤ bᵢ for all i = 1, 2, ..., m.
▶ Denote the set of active constraints by
  I(x*) = {i : aᵢᵀx* = bᵢ}.
▶ Making the change of variables y = x − x*, we have
  ∇f(x*)ᵀy ≥ 0 for any y ∈ ℝⁿ satisfying aᵢᵀ(y + x*) ≤ bᵢ, i = 1, 2, ..., m,
▶ or: ∇f(x*)ᵀy ≥ 0 for any y satisfying
  aᵢᵀy ≤ 0,            i ∈ I(x*),
  aᵢᵀy ≤ bᵢ − aᵢᵀx*,   i ∉ I(x*).
▶ The second set of inequalities can be removed; that is, we will prove that
  aᵢᵀy ≤ 0 for all i ∈ I(x*) ⇒ ∇f(x*)ᵀy ≥ 0.
Proof Contd.
▶ Suppose then that y satisfies aᵢᵀy ≤ 0 for all i ∈ I(x*).
▶ Since bᵢ − aᵢᵀx* > 0 for all i ∉ I(x*), there exists a small enough α > 0 for which aᵢᵀ(αy) ≤ bᵢ − aᵢᵀx* for all i ∉ I(x*).
▶ Thus, since in addition aᵢᵀ(αy) ≤ 0 for any i ∈ I(x*), the point x* + αy is feasible, and the stationarity condition gives ∇f(x*)ᵀ(αy) ≥ 0, hence ∇f(x*)ᵀy ≥ 0.
▶ We have shown: aᵢᵀy ≤ 0 for all i ∈ I(x*) ⇒ ∇f(x*)ᵀy ≥ 0, i.e., (−∇f(x*))ᵀy ≤ 0.
▶ By Farkas' lemma (second formulation, with c = −∇f(x*)), ∃λᵢ ≥ 0, i ∈ I(x*), such that
  −∇f(x*) = ∑_{i ∈ I(x*)} λᵢaᵢ.
▶ Defining λᵢ = 0 for all i ∉ I(x*), we get λᵢ(aᵢᵀx* − bᵢ) = 0 for all i ∈ {1, 2, ..., m} and
  ∇f(x*) + ∑_{i=1}^m λᵢaᵢ = 0. □
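A sketch of how the Farkas step of the proof can be mimicked numerically (my own construction, assuming SciPy is available): stack the active-constraint rows at a candidate minimizer and recover nonnegative multipliers with a nonnegative least-squares solve; a zero residual certifies condition (3).

```python
import numpy as np
from scipy.optimize import nnls

def kkt_multipliers(grad_x, A, b, x, tol=1e-8):
    """Recover lambda >= 0 with grad f(x) + sum_i lambda_i a_i = 0 and
    lambda_i = 0 on inactive constraints (linearly constrained case)."""
    active = np.where(np.abs(A @ x - b) <= tol)[0]          # I(x*)
    lam = np.zeros(A.shape[0])
    if active.size:
        # Solve  A_active^T lambda ~= -grad f(x)  with lambda >= 0.
        lam_act, resid = nnls(A[active].T, -grad_x)
        lam[active] = lam_act
        assert resid <= 1e-6, "no nonnegative multipliers found"
    return lam

# Same toy problem as above: min (x1-2)^2 + (x2-1)^2 s.t. x <= (1, 1).
A = np.eye(2); b = np.ones(2); x_star = np.array([1.0, 1.0])
grad = np.array([2 * (x_star[0] - 2), 2 * (x_star[1] - 1)])
print(kkt_multipliers(grad, A, b, x_star))    # expected [2. 0.]
```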
The Convex Case
Theorem [KKT conditions for convex linearly constrained problems - necessary and sufficient optimality conditions]. Consider the minimization problem

(P)  min f(x)
     s.t. aᵢᵀx ≤ bᵢ, i = 1, 2, ..., m,

where f is a convex continuously differentiable function over ℝⁿ, a₁, a₂, ..., aₘ ∈ ℝⁿ, b₁, b₂, ..., bₘ ∈ ℝ, and let x* be a feasible solution of (P). Then x* is an optimal solution if and only if there exist λ₁, λ₂, ..., λₘ ≥ 0 such that

  ∇f(x*) + ∑_{i=1}^m λᵢaᵢ = 0   (5)

and

  λᵢ(aᵢᵀx* − bᵢ) = 0, i = 1, 2, ..., m.   (6)
Proof of KKT in Convex Case
▶ Necessity was already proven; it remains to show sufficiency.
▶ Suppose that x* is a feasible solution of (P) satisfying (5) and (6), and let x be any feasible solution of (P).
▶ Define the (convex) function
  h(x) = f(x) + ∑_{i=1}^m λᵢ(aᵢᵀx − bᵢ).
▶ By (5), ∇h(x*) = 0 ⇒ x* is a minimizer of h over ℝⁿ.
▶ Therefore,
  f(x*) = f(x*) + ∑_{i=1}^m λᵢ(aᵢᵀx* − bᵢ) ≤ f(x) + ∑_{i=1}^m λᵢ(aᵢᵀx − bᵢ) ≤ f(x),
  where the equality uses (6), the first inequality uses that x* minimizes h, and the last uses λᵢ ≥ 0 and aᵢᵀx ≤ bᵢ. Hence x* is optimal. □
Problems with Equality and Inequality Constraints

Theorem [KKT conditions for linearly constrained problems]. Consider the minimization problem

(Q)  min f(x)
     s.t. aᵢᵀx ≤ bᵢ, i = 1, 2, ..., m,
          cⱼᵀx = dⱼ, j = 1, 2, ..., p,

where f is continuously differentiable, aᵢ, cⱼ ∈ ℝⁿ, bᵢ, dⱼ ∈ ℝ.

(i) (necessity of the KKT conditions) If x* is a local minimum of (Q), then there exist λ₁, λ₂, ..., λₘ ≥ 0 and μ₁, μ₂, ..., μₚ ∈ ℝ such that

  ∇f(x*) + ∑_{i=1}^m λᵢaᵢ + ∑_{j=1}^p μⱼcⱼ = 0,   (7)
  λᵢ(aᵢᵀx* − bᵢ) = 0, i = 1, 2, ..., m.   (8)

(ii) (sufficiency in the convex case) If f is convex over ℝⁿ and x* is a feasible solution of (Q) for which there exist λ₁, ..., λₘ ≥ 0 and μ₁, ..., μₚ ∈ ℝ such that (7) and (8) are satisfied, then x* is an optimal solution of (Q).
Representation Via the Lagrangian

Given a problem

(NLP)  min f(x)
       s.t. gᵢ(x) ≤ 0, i = 1, 2, ..., m,
            hⱼ(x) = 0, j = 1, 2, ..., p,

the associated Lagrangian function is

  L(x, λ, μ) = f(x) + ∑_{i=1}^m λᵢgᵢ(x) + ∑_{j=1}^p μⱼhⱼ(x).

The KKT conditions can be written as (see the symbolic sketch below)

  ∇ₓL(x*, λ, μ) = ∇f(x*) + ∑_{i=1}^m λᵢ∇gᵢ(x*) + ∑_{j=1}^p μⱼ∇hⱼ(x*) = 0,
  λᵢgᵢ(x*) = 0, i = 1, 2, ..., m.
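A minimal symbolic sketch (a toy problem of mine, assuming SymPy is available): it forms the Lagrangian of min x₁² + x₂² s.t. 1 − x₁ − x₂ ≤ 0 and solves the stationarity-plus-complementarity system, recovering the KKT point x* = (1/2, 1/2) with λ = 1.

```python
import sympy as sp

# Toy problem: f(x) = x1^2 + x2^2,  g1(x) = 1 - x1 - x2 <= 0.
x1, x2, lam = sp.symbols("x1 x2 lam", real=True)
f = x1**2 + x2**2
g1 = 1 - x1 - x2

# Lagrangian L(x, lambda) = f(x) + lambda * g1(x).
L = f + lam * g1

# KKT system: grad_x L = 0 and complementarity lambda * g1 = 0.
kkt = [sp.diff(L, x1), sp.diff(L, x2), lam * g1]
solutions = sp.solve(kkt, [x1, x2, lam], dict=True)

# Keep only the solutions with lambda >= 0 and g1 <= 0 (feasible KKT points).
kkt_points = [s for s in solutions if s[lam] >= 0 and g1.subs(s) <= 0]
print(kkt_points)   # expected: x1 = x2 = 1/2, lam = 1
```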
Examples

▶ min ½(x₁² + x₂² + x₃²)
  s.t. x₁ + x₂ + x₃ = 3.

▶ min x₁² + 2x₂² + 4x₁x₂
  s.t. x₁ + x₂ = 1,
       x₁, x₂ ≥ 0.

In class (a numerical sketch for the first example follows below).
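A hedged numerical check of the first example (the closed form below is my own working, to be confirmed in class): stationarity of the Lagrangian, x* + μe = 0 together with eᵀx* = 3, suggests x* = (1, 1, 1) and μ = −1, which the script verifies.

```python
import numpy as np
from scipy.optimize import minimize

# First example: min 0.5 * ||x||^2  s.t.  e^T x = 3.
e = np.ones(3)
res = minimize(lambda x: 0.5 * x @ x,
               x0=np.zeros(3),
               constraints=[{"type": "eq", "fun": lambda x: e @ x - 3}])
x_star = res.x
print("x* =", x_star)                      # expected (1, 1, 1)

# KKT: grad f(x*) + mu * e = x* + mu * e = 0  with  mu = -1.
mu = -1.0
assert np.allclose(x_star + mu * e, 0, atol=1e-6)
assert np.isclose(e @ x_star, 3, atol=1e-6)
```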
Projection onto Affine Spaces

Lemma. Let C be the affine space
  C = {x ∈ ℝⁿ : Ax = b},
where A ∈ ℝ^{m×n} has full row rank and b ∈ ℝᵐ. Then
  P_C(y) = y − Aᵀ(AAᵀ)⁻¹(Ay − b).

Proof. In class (a numerical verification follows below).
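A short numerical verification of the lemma (random data; names are mine): the formula is applied to a random point and the result is checked to be feasible and at least as close to y as sampled feasible points.

```python
import numpy as np

def proj_affine(y, A, b):
    """P_C(y) = y - A^T (A A^T)^{-1} (A y - b) for C = {x : Ax = b}."""
    return y - A.T @ np.linalg.solve(A @ A.T, A @ y - b)

rng = np.random.default_rng(3)
A = rng.normal(size=(2, 5))            # full row rank with probability 1
b = rng.normal(size=2)
y = rng.normal(size=5)

p = proj_affine(y, A, b)
assert np.allclose(A @ p, b)           # P_C(y) is feasible

# No sampled feasible point is closer to y than P_C(y).
x0 = np.linalg.lstsq(A, b, rcond=None)[0]      # one feasible point
N = np.linalg.svd(A)[2][2:].T                  # basis of null(A)
for _ in range(200):
    x = x0 + N @ rng.normal(size=N.shape[1])   # random feasible point
    assert np.linalg.norm(x - y) >= np.linalg.norm(p - y) - 1e-9
```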
Orthogonal Projection onto Hyperplanes

Consider the hyperplane
  H = {x ∈ ℝⁿ : aᵀx = b}   (0 ≠ a ∈ ℝⁿ, b ∈ ℝ).
Then, by the previous slide,
  P_H(y) = y − a(aᵀa)⁻¹(aᵀy − b) = y − ((aᵀy − b)/‖a‖²) a.

Lemma (distance of a point from a hyperplane). Let H = {x ∈ ℝⁿ : aᵀx = b}, where 0 ≠ a ∈ ℝⁿ and b ∈ ℝ. Then
  d(y, H) = |aᵀy − b| / ‖a‖.

Proof.
  d(y, H) = ‖y − P_H(y)‖ = ‖y − (y − ((aᵀy − b)/‖a‖²) a)‖ = (|aᵀy − b|/‖a‖²)‖a‖ = |aᵀy − b| / ‖a‖. □
Orthogonal Projection onto Half-Spaces

Let H⁻ = {x ∈ ℝⁿ : aᵀx ≤ b}, where 0 ≠ a ∈ ℝⁿ and b ∈ ℝ. Then
  P_{H⁻}(x) = x − ([aᵀx − b]₊ / ‖a‖²) a,
where [t]₊ = max{t, 0}.

Proof. In class (a numerical verification follows below).
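A brief numerical sketch (random data; names are mine): a feasible point is left unchanged by the formula, while the projection of an infeasible point lands on the boundary hyperplane and no sampled feasible point is closer to the original point.

```python
import numpy as np

def proj_halfspace(x, a, b):
    """P_{H-}(x) = x - [a^T x - b]_+ / ||a||^2 * a for H- = {x : a^T x <= b}."""
    return x - max(a @ x - b, 0.0) / (a @ a) * a

rng = np.random.default_rng(4)
a, b = rng.normal(size=4), 1.0

x_in = np.zeros(4)                           # a^T 0 = 0 <= 1: already feasible
assert np.allclose(proj_halfspace(x_in, a, b), x_in)

x_out = (b + 1.0) * a / (a @ a)              # a^T x_out = b + 1 > b: infeasible
p = proj_halfspace(x_out, a, b)
assert np.isclose(a @ p, b)                  # projection lies on the boundary
for _ in range(500):                         # no sampled feasible point is closer
    z = rng.normal(size=4)
    if a @ z <= b:
        assert np.linalg.norm(z - x_out) >= np.linalg.norm(p - x_out) - 1e-9
```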
Orthogonal Regression
▶ a₁, . . . , aₘ ∈ ℝⁿ.
▶ For a given 0 ≠ x ∈ ℝⁿ and y ∈ ℝ, we define the hyperplane
  H_{x,y} := {a ∈ ℝⁿ : xᵀa = y}.
[Figure: scatter of five data points a₁, ..., a₅ in the plane, approximated by a line (the hyperplane H_{x,y}).]
▶ In the orthogonal regression problem we seek a nonzero vector x ∈ ℝⁿ and y ∈ ℝ such that the sum of squared Euclidean distances between the points a₁, . . . , aₘ and H_{x,y} is minimal:
  min_{x,y} { ∑_{i=1}^m d(aᵢ, H_{x,y})² : 0 ≠ x ∈ ℝⁿ, y ∈ ℝ }.
Orthogonal Regression

▶ d(aᵢ, H_{x,y})² = (aᵢᵀx − y)² / ‖x‖², i = 1, . . . , m.
▶ The orthogonal regression problem is therefore the same as
  min { ∑_{i=1}^m (aᵢᵀx − y)² / ‖x‖² : 0 ≠ x ∈ ℝⁿ, y ∈ ℝ }.
▶ Fixing x and minimizing first with respect to y, we obtain that the optimal y is given by
  y = (1/m) ∑_{i=1}^m aᵢᵀx = (1/m) eᵀAx,
  where A is the m×n matrix whose rows are a₁ᵀ, . . . , aₘᵀ and e is the all-ones vector.
▶ Using the above expression for y, we obtain
  ∑_{i=1}^m (aᵢᵀx − y)² = ∑_{i=1}^m (aᵢᵀx − (1/m)eᵀAx)²
    = ∑_{i=1}^m (aᵢᵀx)² − (2/m) ∑_{i=1}^m (eᵀAx)(aᵢᵀx) + (1/m)(eᵀAx)²
    = ∑_{i=1}^m (aᵢᵀx)² − (1/m)(eᵀAx)² = ‖Ax‖² − (1/m)(eᵀAx)²
    = xᵀAᵀ(I_m − (1/m)eeᵀ)Ax.
Orthogonal Regression
▶ Therefore, a reformulation of the problem is
  min_x { xᵀ[Aᵀ(I_m − (1/m)eeᵀ)A]x / ‖x‖² : x ≠ 0 }.

Proposition. An optimal solution of the orthogonal regression problem is (x, y), where x is an eigenvector of Aᵀ(I_m − (1/m)eeᵀ)A associated with the minimum eigenvalue and y = (1/m) ∑_{i=1}^m aᵢᵀx. The optimal function value of the problem is λ_min[Aᵀ(I_m − (1/m)eeᵀ)A].
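A compact numerical sketch of the proposition (synthetic data; names are mine): form B = Aᵀ(I_m − (1/m)eeᵀ)A, take an eigenvector of B for its smallest eigenvalue as x, compute y, and compare the achieved objective with λ_min(B).

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 50, 2
A = rng.normal(size=(m, n))            # rows are the data points a_i^T
e = np.ones(m)

# B = A^T (I_m - (1/m) e e^T) A, the centered Gram matrix of the data.
B = A.T @ (np.eye(m) - np.outer(e, e) / m) @ A

# Optimal x: unit eigenvector of B for the minimum eigenvalue; optimal y.
w, V = np.linalg.eigh(B)               # eigenvalues in ascending order
x = V[:, 0]
y = (e @ A @ x) / m

# The achieved sum of squared distances equals lambda_min(B) (here ||x|| = 1).
obj = np.sum((A @ x - y) ** 2) / (x @ x)
assert np.isclose(obj, w[0])
print("optimal value =", obj)
```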
Amir Beck, “Introduction to Nonlinear Optimization”, Lecture Slides - Linearly Constrained Problems