Integer Programming: Large Scale Methods Part II (Chapter 12 and Chapter 16)
University of Chicago, Booth School of Business
Kipp Martin
November 27, 2017




Outline

List of Files

Key Concepts

Cut Generation

Lagrangian Approach

Capstone Case: Capacitated Lot Sizing

Lagrangian Approach Application – Convex Optimization


Files

- genAssign.gms
- genAssignSub.gms
- genAssignData.txt


Key Concepts

- the quality of the formulation is what counts, not the size; quality is usually measured by the lower bound
- if we can read the problem into memory we can solve the linear relaxation (of course it may be a poor relaxation)
- many interesting applications have too many constraints or variables to read into memory
- if there are too many variables we generate variables as needed
- we generate variables as needed by a pricing algorithm
- if there are too many constraints we generate constraints as needed
- we generate constraints as needed by a separation algorithm


Key Concepts

The mixed integer linear program under consideration is:

      min  cᵀx
(MIP) s.t. Ax ≥ b
           Bx ≥ d
           x ≥ 0
           x_j ∈ Z, j ∈ I

One solution approach is to solve the LP relaxation within branch and bound:

     min  cᵀx
(LP) s.t. Ax ≥ b
          Bx ≥ d
          x ≥ 0


Key Concepts

As an alternative, we try to define A and B appropriately and solve:

            min  cᵀx
(conv(MIP)) s.t. Ax ≥ b
                 x ∈ conv(Γ)

where Γ = {x ≥ 0 : Bx ≥ d, x_j ∈ Z, j ∈ I}. How to break the constraints into A and B takes a fair amount of skill and practice. We want the characterization of conv(Γ) to be an easy hard problem.

If it is too easy there may not be a big win in terms of better lower bounds.


Key Concepts

There are three approaches to actually solving

            min  cᵀx
(conv(MIP)) s.t. Ax ≥ b
                 x ∈ conv(Γ)

Approach 1: Characterize conv(Γ) by the inequalities that define the polyhedron. This is the cutting plane approach. (Today)

Approach 2: Characterize conv(Γ) through inverse projection. Often called decomposition or column generation.

Approach 3: Lagrangian relaxation. (Today)


Cut Generation

There are three approaches to actually solving

            min  cᵀx
(conv(MIP)) s.t. Ax ≥ b
                 x ∈ conv(Γ)

We now illustrate cut generation with the TSP.

We need to figure out how to break up the constraints.


Cut Generation

Let's go back and solve the cut generation problem. The problem is:

    min  Σ_{e∈E} c_e x_e                            (1)
    s.t. Σ_{e∈δ(i)} x_e = 2,         i ∈ V          (2)
         Σ_{e∈E(S)} x_e ≤ |S| − 1,   S ⊂ V \ {1}    (3)
         Σ_{e∈E} x_e = n                            (4)
         x_e ∈ {0, 1},               ∀e ∈ E         (5)


Cut Generation

Rewrite this as

    min  Σ_{e∈E} c_e x_e
    s.t. Σ_{e∈δ(i)} x_e = 2,  i ∈ V
         x ∈ Γ

where Γ corresponds to constraints (3)-(5). In the previous slide deck, we characterized Γ by its extreme points and used column generation.

However, we can also generate the constraints in (3) through separation.

Cut Generation

We now work with this example:

[Figure: an example graph with numbered vertices and edge costs; the layout and edge labels are not recoverable from the extracted text.]


Cut Generation

Form a problem relaxation by deleting the tour-breaking constraints (3):

    min  Σ_{e∈E} c_e x_e
    s.t. Σ_{e∈δ(i)} x_e = 2,  i ∈ V
         Σ_{e∈E} x_e = n
         x_e ∈ {0, 1},  ∀e ∈ E

12 / 49

Page 13: Integer Programming: Large Scale Methods Part II (Chapter ...faculty.chicagobooth.edu/kipp.martin/root/htmls/coursework/36900/... · Methods Part II (Chapter 12 and Chapter 16)

Cut Generation

Here is the solution. Argh! Subtours!

[Figure: the solution of the relaxation, which contains subtours; in particular, vertices 2, 3, and 6 form a subtour.]


Cut Generation

The separation problem

          max  Σ_{e∈E} x_e α_e − Σ_{i=1, i≠k}^n θ_i     (6)
(SP_k(x)) s.t. α_e ≤ θ_i,  e ∈ δ(i), i ∈ V              (7)
               θ_i ≥ 0.                                  (8)


Cut Generation

Let k = 2. The solution is:

θ_2 = 1, θ_3 = 1, θ_6 = 1

α_(2,3) = 1, α_(2,6) = 1, α_(3,6) = 1

This identifies the subtour set S = {2, 3, 6} and the violated cut x_(2,3) + x_(2,6) + x_(3,6) ≤ |S| − 1 = 2.


Lagrangian Approach

The MIP is

      min  cᵀx
(MIP) s.t. Ax ≥ b
           x ∈ Γ

where, as always, the set Γ has special structure. Dualize the constraints Ax ≥ b with a conformable vector λ ≥ 0 of Lagrange multipliers and get

    min  cᵀx + λᵀ(b − Ax)
    s.t. x ∈ Γ


Lagrangian Approach

Define L(λ) := min{cᵀx + λᵀ(b − Ax) | x ∈ Γ}.

The Lagrangian Dual problem (LD) is:

(LD) max{L(λ) : λ ≥ 0}.

Theorem (Weak Duality): v(LD) ≤ v(MIP)

Proof?

Theorem: L(λ) is a concave function of λ.

Proof?

Remark: No matter how ugly and non-convex the set Γ is, L(λ) is concave, so at least we are trying to maximize a concave function.
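To make the two theorems concrete, here is a small numeric sketch on an invented one-dimensional instance (not from the course files): min 3x s.t. x ≥ 1.5, x ∈ Γ = {0, 1, 2}. For each fixed x, the Lagrangian cᵀx + λᵀ(b − Ax) is affine in λ, so L(λ) is a pointwise minimum of affine functions, hence concave; and for every λ ≥ 0, L(λ) is at most v(MIP).

```python
# Hypothetical toy MIP: min 3x s.t. x >= 1.5, x in Gamma = {0, 1, 2}.
# Dualize x >= 1.5: L(lam) = min_{x in Gamma} 3x + lam*(1.5 - x).
GAMMA = [0, 1, 2]
c, b = 3.0, 1.5

def L(lam):
    # pointwise minimum of affine functions of lam -> concave
    return min(c * x + lam * (b - x) for x in GAMMA)

v_mip = min(c * x for x in GAMMA if x >= b)    # = 6.0, attained at x = 2

lams = [0.1 * k for k in range(101)]           # grid on [0, 10]
# Weak duality: L(lam) <= v(MIP) for every lam >= 0
assert all(L(l) <= v_mip + 1e-9 for l in lams)
# Concavity (midpoint test): L((a+b)/2) >= (L(a) + L(b))/2
for i in range(len(lams)):
    for j in range(i, len(lams)):
        a, bb = lams[i], lams[j]
        assert L((a + bb) / 2) >= (L(a) + L(bb)) / 2 - 1e-9
print(round(max(L(l) for l in lams), 6))       # best dual bound on the grid
```

On this instance the best dual bound is 4.5 (at λ = 3), strictly below v(MIP) = 6 but above the naive bound L(0) = 0.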


Lagrangian Approach – Using Subgradients

Concave is not too bad, but this is a linear programming class! I don't have the ambition to tackle a general concave function. You know me: 1) I worry; and 2) I like easy problems.

γ^k is a subgradient of the concave function L(λ) at λ^k iff

    L(λ) ≤ L(λ^k) + (γ^k)ᵀ(λ − λ^k)

for all λ.

Solve the following linear programming relaxation:

    max  θ
    s.t. θ ≤ L(λ^k) + (γ^k)ᵀ(λ − λ^k),  k = 1, …, K     (9)
         λ ≥ 0

where γ^k, k = 1, …, K are subgradients. What is the geometry of this approach?


Lagrangian Approach – Finding Subgradients

Here is how to find a subgradient. Given a λ̄ ≥ 0, solve

    L(λ̄) = min{cᵀx + λ̄ᵀ(b − Ax) | x ∈ Γ}

and obtain an optimal x(λ̄).

Theorem: The vector γ̄ = b − Ax(λ̄) is a subgradient of L(λ) at λ = λ̄.

Proof: Next slide.


Lagrangian Approach – Finding Subgradients

If

    L(λ) = min{cᵀx + λᵀ(b − Ax) | x ∈ Γ}
    L(λ̄) = cᵀx(λ̄) + λ̄ᵀ(b − Ax(λ̄))

then

    L(λ̄) + (b − Ax(λ̄))ᵀ(λ − λ̄)
      = cᵀx(λ̄) + (b − Ax(λ̄))ᵀλ̄ + (b − Ax(λ̄))ᵀ(λ − λ̄)
      = cᵀx(λ̄) + (b − Ax(λ̄))ᵀλ̄ + (b − Ax(λ̄))ᵀλ − (b − Ax(λ̄))ᵀλ̄
      = cᵀx(λ̄) + (b − Ax(λ̄))ᵀλ
      ≥ L(λ)

The final inequality holds because x(λ̄) ∈ Γ, so it is feasible in the minimization defining L(λ).
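A quick numeric check of the theorem on a made-up one-dimensional instance (Γ = {0, 1, 2}, c = 3, A = 1, b = 1.5): evaluate L at λ̄ = 1, take γ̄ = b − Ax(λ̄), and verify the subgradient inequality on a grid.

```python
# Hypothetical toy instance: Gamma = {0, 1, 2}, c = 3, A = 1, b = 1.5.
GAMMA = [0, 1, 2]
c, b = 3.0, 1.5

def lagrangian(lam):
    # returns (L(lam), a minimizer x(lam))
    val, x = min((c * x + lam * (b - x), x) for x in GAMMA)
    return val, x

lam_bar = 1.0
L_bar, x_bar = lagrangian(lam_bar)     # L(1) = 1.5, attained at x = 0
gamma = b - x_bar                      # subgradient b - A x(lam_bar) = 1.5

# Subgradient inequality: L(lam) <= L(lam_bar) + gamma*(lam - lam_bar), all lam
for k in range(201):
    lam = 0.05 * k                     # grid on [0, 10]
    assert lagrangian(lam)[0] <= L_bar + gamma * (lam - lam_bar) + 1e-9
print("subgradient inequality verified on the grid")
```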


Lagrangian Approach - Subgradient Algorithm

Step 0: Initialize: choose tolerance ε, set K ← 1 and λ^K ← 0. Solve L(λ^K) and obtain subgradient γ^K.

Step 1: Solve the relaxed Lagrangian dual (9) and obtain optimal θ and λ.

Step 2: K ← K + 1, λ^K ← λ, solve

    L(λ^K) = min{cᵀx + (λ^K)ᵀ(b − Ax) | x ∈ Γ}

and save γ^K = b − Ax(λ^K) and L(λ^K).

Step 3: Stop if the ε tolerance is satisfied. Otherwise, add

    θ ≤ L(λ^K) + (γ^K)ᵀ(λ − λ^K)

to the relaxed Lagrangian dual (9).

Step 4: Go to Step 1.
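A minimal end-to-end sketch of Steps 0-4 on an invented one-dimensional toy (min 3x s.t. x ≥ 1.5, x ∈ Γ = {0, 1, 2}). Since λ is scalar here, the master problem (9) can be solved exactly by scanning the breakpoints of the piecewise-linear cut model; λ is restricted to a box 0 ≤ λ ≤ 10 so the first master is bounded. This is a sketch of the scheme, not of genAssignSub.gms.

```python
# Hypothetical toy instance: min 3x s.t. x >= 1.5, x in Gamma = {0, 1, 2}.
GAMMA, c, b = [0, 1, 2], 3.0, 1.5
LAM_MAX = 10.0                               # box on lambda so master (9) is bounded

def solve_lagrangian(lam):
    """L(lam) = min_{x in Gamma} c*x + lam*(b - x); return value and subgradient."""
    val, x = min((c * x + lam * (b - x), x) for x in GAMMA)
    return val, b - x                        # gamma = b - A x(lam)

def solve_master(cuts):
    """max theta s.t. theta <= Lk + gk*(lam - lamk) for each cut, 0 <= lam <= LAM_MAX.
    In 1-D the optimum sits at lam = 0, lam = LAM_MAX, or a pairwise cut crossing."""
    cand = {0.0, LAM_MAX}
    for (L1, g1, l1) in cuts:
        for (L2, g2, l2) in cuts:
            if abs(g1 - g2) > 1e-12:
                lam = ((L2 - g2 * l2) - (L1 - g1 * l1)) / (g1 - g2)
                if 0.0 <= lam <= LAM_MAX:
                    cand.add(lam)
    def model(lam):
        return min(Lk + gk * (lam - lk) for (Lk, gk, lk) in cuts)
    lam = max(cand, key=model)
    return model(lam), lam                   # (theta, lam)

# Step 0: initialize with lam = 0
lam = 0.0
Lval, gamma = solve_lagrangian(lam)
cuts = [(Lval, gamma, lam)]
best = Lval
for _ in range(20):                          # Steps 1-4
    theta, lam = solve_master(cuts)          # Step 1: relaxed dual (9)
    Lval, gamma = solve_lagrangian(lam)      # Step 2: evaluate L, get subgradient
    best = max(best, Lval)
    if theta - best <= 1e-9:                 # Step 3: eps-tolerance satisfied
        break
    cuts.append((Lval, gamma, lam))          # add the cut, Step 4: go to Step 1
print(best)
```

On this toy the loop converges in three Lagrangian evaluations to v(LD) = 4.5, which equals the LP-relaxation value since Γ here has no side constraints.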


Lagrangian Approach - Subgradient Algorithm

Consider the generalized assignment problem. Let x_ij be 1 if task i is assigned to person j, and 0 otherwise.

    min  Σ_{i=1}^n Σ_{j=1}^m c_ij x_ij
    s.t. Σ_{j=1}^m x_ij = 1,           i = 1, …, n
         Σ_{i=1}^n a_ij x_ij ≤ b_j,    j = 1, …, m
         x_ij ∈ {0, 1},  i = 1, …, n, j = 1, …, m

How do we partition into Ax ≥ b and Γ? We want an easy hard problem.
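One natural partition (the one used on the following slides) dualizes the assignment constraints with multipliers λ_i ≥ 0, leaving the knapsack constraints in Γ. Then L(λ) = Σ_i λ_i + Σ_j [min Σ_i (c_ij − λ_i) x_ij s.t. Σ_i a_ij x_ij ≤ b_j, x binary], i.e., the Lagrangian subproblem separates into m independent 0/1 knapsacks. A sketch with invented integer data (not the genAssignData.txt instance):

```python
# Hypothetical GAP data: n = 3 tasks, m = 2 servers.
cdata = [[9, 2], [1, 2], [3, 8]]      # c[i][j]
adata = [[6, 8], [7, 5], [4, 6]]      # a[i][j] (integer resource use)
bdata = [11, 7]                       # b[j]
n, m = 3, 2

def min_knapsack(weights, sizes, cap):
    """min sum w_i x_i s.t. sum s_i x_i <= cap, x binary (integer sizes).
    Only items with negative weight can lower the cost, so this is a
    max-knapsack over those items."""
    best = [0.0] * (cap + 1)          # best[u] = max cost reduction with capacity u
    for w, s in zip(weights, sizes):
        if w < 0 and s <= cap:
            for u in range(cap, s - 1, -1):
                best[u] = max(best[u], best[u - s] - w)
    return -best[cap]

def L(lam):
    # L(lam) = sum_i lam_i + sum over servers j of a knapsack with
    # adjusted costs c_ij - lam_i (cf. adj_c in genAssignSub.gms)
    val = sum(lam)
    for j in range(m):
        val += min_knapsack([cdata[i][j] - lam[i] for i in range(n)],
                            [adata[i][j] for i in range(n)], bdata[j])
    return val

print(L([0.0, 0.0, 0.0]))   # lam = 0: no adjusted cost is negative, so L = 0.0
print(L([4.0, 3.0, 5.0]))   # a nonzero multiplier vector gives a better bound
```

By weak duality, every printed value is a valid lower bound on the IP optimum of this little instance.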


Lagrangian Approach - Subgradient Algorithm

The model genAssign.gms solves the integer programming formulation for the generalized assignment problem.

SETS
  i "task", j "server";

PARAMETERS
  c(i, j), a(i, j), b(j);

VARIABLES
  z "minimize the cost";

BINARY VARIABLES
  x(i, j);

EQUATIONS
  cost, assign(i), resource(j);

cost ..        z =e= sum( (i, j), c(i, j)*x(i, j) );
assign(i) ..   sum(j, x(i, j)) =g= 1;
resource(j) .. sum(i, a(i, j)*x(i, j)) =l= b(j);

MODEL genAssign / cost, assign, resource /;


Lagrangian Approach - Subgradient Algorithm

The data file is genAssignData.txt. For these data the optimal LP value is 20.32, and the optimal IP value is 34.0. A huge gap!

The GAMS model file genAssignSub.gms solves the Lagrangian dual. It contains two models: 1) genAssign without the assignment constraints; and 2) lagDual, which is the relaxed Lagrangian dual.

Model lagDual calls model genAssign inside the cut generation loop.

The value of the Lagrangian dual is 29.5, so we really are closing the gap a lot.


Lagrangian Approach - Subgradient Algorithm

cost ..        z =e= sum( (i, j), adj_c(i, j)*x(i, j) );
assign(i) ..   sum(j, x(i, j)) =g= 1;
resource(j) .. sum(i, a(i, j)*x(i, j)) =l= b(j);

MODEL genAssign / cost, resource /;

cut(allcuts) .. theta =l= LagValue(allcuts)
    + sum( i, cutCoeffSub(allcuts, i)*(lambda(i) - cutCoeffLambda(allcuts, i)) );

MODEL lagDual / cut /;

Inside the loop,

adj_c(i, j) = c(i, j) - lambda.l(i);

SOLVE genAssign USING MIP MINIMIZING z ;

SOLVE lagDual USING LP MAXIMIZING theta ;


Lagrangian Approach

If the Lagrangian dual is

    max{L(λ) : λ ≥ 0}

where L(λ) := min{cᵀx + λᵀ(b − Ax) | x ∈ Γ}, then this is equivalent to

(LD) max  θ
     s.t. θ ≤ cᵀx + λᵀ(b − Ax),  ∀x ∈ Γ     (10)
          λ ≥ 0.


Lagrangian Approach

KEY Theorem: The optimal value of (LD) is equal to the optimal value of conv(MIP).

Proof: I am too tired! Accept on faith.

All of our approaches are equivalent!


Capstone Case: Capacitated Lot Sizing

Dynamic Lot Sizing. Variables:

x_it – units of product i produced in period t
I_it – inventory level of product i at the end of period t
y_it – 1 if there is nonzero production of product i during period t, 0 otherwise

Parameters:

d_it – demand for product i in period t
f_it – fixed cost associated with nonzero production of product i in period t
c_it – marginal production cost for product i in period t
h_it – marginal holding cost charged to product i at the end of period t
g_t – production capacity in period t


Capstone Case: Capacitated Lot Sizing

Objective: Minimize the sum of marginal production cost, holding cost, and fixed cost:

    Σ_{i=1}^N Σ_{t=1}^T (c_it x_it + h_it I_it + f_it y_it)

Constraint 1: Do not exceed total capacity in each period:

    Σ_{i=1}^N x_it ≤ g_t,  t = 1, …, T


Capstone Case: Capacitated Lot Sizing

Constraint 2: Inventory balance equations

    I_{i,t−1} + x_it − I_it = d_it,  i = 1, …, N, t = 1, …, T

Constraint 3: Fixed cost forcing constraints

    x_it − M_it y_it ≤ 0,  i = 1, …, N, t = 1, …, T


Capstone Case: Capacitated Lot Sizing

Dynamic Lot Sizing. A "standard formulation" is:

    min  Σ_{i=1}^N Σ_{t=1}^T (c_it x_it + h_it I_it + f_it y_it)
    s.t. Σ_{i=1}^N x_it ≤ g_t,              t = 1, …, T
         I_{i,t−1} + x_it − I_it = d_it,    i = 1, …, N, t = 1, …, T
         x_it − M_it y_it ≤ 0,              i = 1, …, N, t = 1, …, T
         x_it, I_it ≥ 0,                    i = 1, …, N, t = 1, …, T
         y_it ∈ {0, 1},                     i = 1, …, N, t = 1, …, T.


Capstone Case: Capacitated Lot Sizing

The generic model is

      min  cᵀx
(MIP) s.t. Ax ≥ b
           x ∈ Γ

Map the capacitated lot sizing model into this as follows: the capacity constraints Σ_{i=1}^N x_it ≤ g_t map into the Ax ≥ b constraints, and the rest of the constraints

- inventory balance
- fixed charge big-M
- nonnegativity and binary

define the Γ.


Capstone Case: Capacitated Lot Sizing

Four approaches for generating bounds to find the optimal value of

            min  cᵀx
(conv(MIP)) s.t. Ax ≥ b
                 x ∈ conv(Γ)

1. Column generation
2. Subgradient optimization to find the optimal L(λ)
3. Generate the cuts that characterize conv(Γ)
4. The z_itk formulation that also gives conv(Γ) (via Homework 5, question 5)


Capstone Case: Capacitated Lot Sizing

Dynamic Lot Sizing. An alternate formulation:

z_itk is 1 if, for product i in period t, the decision is to produce enough items to satisfy demand for periods t through k, and 0 otherwise.

    Σ_{k=1}^T z_i1k = 1,                                i = 1, …, N
    −Σ_{j=1}^{t−1} z_{ij,t−1} + Σ_{k=t}^T z_itk = 0,    i = 1, …, N, t = 2, …, T
    Σ_{k=t}^T z_itk ≤ y_it,                             i = 1, …, N, t = 1, …, T


Capstone Case: Capacitated Lot Sizing

Dynamic Lot Sizing. Characterize conv(Γ) through cutting planes. Barany, Van Roy, and Wolsey have characterized conv(Γ) by the (ℓ, S) inequalities:

    Σ_{t∈S} x_t ≤ Σ_{t∈S} (Σ_{k=t}^ℓ d_k) y_t + I_ℓ

for ℓ = 1, …, T and ∀S ⊆ {1, …, ℓ}.

Example: Let ℓ = 5 and S = {5}. The cut is

    x_5 ≤ (Σ_{k=5}^5 d_k) y_5 + I_5 = d_5 y_5 + I_5.


Capstone Case: Capacitated Lot Sizing

The separation algorithm (formulated as an LP/IP): given (x̄, Ī, ȳ) and ℓ ∈ {1, …, T}, solve

    max  Σ_{t=1}^ℓ x̄_t α_t − Σ_{t=1}^ℓ (Σ_{k=t}^ℓ d_k) ȳ_t α_t − Ī_ℓ
    s.t. α_t ∈ {0, 1},  t = 1, …, ℓ

and the α_t = 1 index the elements in the set S. If the optimal value is positive, the corresponding (ℓ, S) inequality is violated.
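Because the α_t are independent binaries, this separation problem solves by inspection: put t in S exactly when its objective coefficient x̄_t − (Σ_{k=t}^ℓ d_k) ȳ_t is positive. A sketch on an invented single-product fractional point (T = 2, fractional ȳ from a big-M of 5; not the exam data):

```python
# Hypothetical fractional point for one product, T = 2 periods.
d = [3.0, 2.0]                 # demand
x = [3.0, 2.0]                 # production in an LP-relaxation solution
I = [0.0, 0.0]                 # end-of-period inventory
y = [0.6, 0.4]                 # fractional setup variables (e.g. M_t = 5)

def separate(ell):
    """Most-violated (ell,S) cut: S = {t <= ell with positive coefficient}.
    Returns (violation, S); positive violation means the cut is violated."""
    S, viol = [], -I[ell - 1]
    for t in range(1, ell + 1):
        coef = x[t - 1] - sum(d[t - 1:ell]) * y[t - 1]
        if coef > 1e-9:
            S.append(t)
            viol += coef
    return viol, S

for ell in (1, 2):
    viol, S = separate(ell)
    print(ell, S, round(viol, 6))
# ell = 1 gives S = {1}: the cut x_1 <= d_1 y_1 + I_1 is violated by 1.2
# ell = 2 gives S = {2}: the cut x_2 <= d_2 y_2 + I_2 is violated by 1.2
```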


Capstone Case: Capacitated Lot Sizing

Dynamic Lot Sizing. An alternate approach: use subgradient optimization and Lagrangian duality to get the result. The Lagrangian problem to solve in order to generate the subgradient is

    min  Σ_{i=1}^N Σ_{t=1}^T ((c_it + λ_t) x_it + h_it I_it + f_it y_it) − Σ_{t=1}^T λ_t g_t
    s.t. I_{i,t−1} + x_it − I_it = d_it,    i = 1, …, N, t = 1, …, T
         x_it − M_it y_it ≤ 0,              i = 1, …, N, t = 1, …, T
         x_it, I_it ≥ 0,                    i = 1, …, N, t = 1, …, T
         y_it ∈ {0, 1},                     i = 1, …, N, t = 1, …, T.

What is the economic interpretation of λ_t in the term (c_it + λ_t) x_it?
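For fixed λ there is no capacity coupling left, so this subproblem separates by product into single-item uncapacitated lot sizing problems with adjusted cost c̃_t = c_it + λ_t, each solvable by the classic Wagner-Whitin dynamic program (which exploits the fact that some optimal solution produces only when entering inventory is zero). A sketch for one product with invented data (not the course data file):

```python
# Invented data for one product, T = 4; costs already adjusted: c~_t = c_t + lam_t.
T = 4
d = [4, 2, 5, 3]        # demand
f = [10, 12, 20, 6]     # fixed setup cost
cu = [1, 1, 3, 1]       # adjusted marginal production cost
h = [1, 1, 1, 1]        # holding cost per unit per period

def prod_cost(t, u):
    """Cost of producing period u's demand in period t <= u (production + holding)."""
    return d[u] * (cu[t] + sum(h[t:u]))

def wagner_whitin():
    """F[k] = min cost of covering periods 0..k-1; try each last setup period t."""
    F = [0.0] + [float("inf")] * T
    for k in range(1, T + 1):
        for t in range(k):               # t = last setup period before k
            cost = F[t] + f[t] + sum(prod_cost(t, u) for u in range(t, k))
            F[k] = min(F[k], cost)
    return F[T]

print(wagner_whitin())                   # optimal subproblem cost for this product
```

For this data the DP selects setups in periods 1 and 4 with total cost 42 (produce demand for periods 1-3 in period 1, and period 4's demand in period 4).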


Capstone Case: Capacitated Lot Sizing

Dynamic Lot Sizing. An alternate approach: use column generation to get the convex hull.

This is what we did with the TSP problem in the previous set of slides. See also Example 12.20 in the text.


Capstone Case: Capacitated Lot Sizing

Final Exam: Take the two product, five period data:

http://faculty.chicagobooth.edu/kipp.martin/root/htmls/coursework/36900/datafiles/lotsizeDataExam.txt

and use one of the following techniques to find the convex hull relaxation:

- the Barany, Van Roy, and Wolsey cuts
- subgradient optimization
- column generation

IMPORTANT: The final exam must be done on an individual basis. You are not allowed to collaborate in any way with your classmates.


Lagrangian Approach Application – Convex Optimization

Objective: Apply projection and Lagrangian duality to convex optimization.

     sup  f(x)
(CP) s.t. g_i(x) ≥ 0,  i = 1, …, p
          x ∈ Ω

where

- f(x) and g_i(x), i = 1, …, p, are concave functions on ℝⁿ
- Ω ⊆ ℝⁿ is a closed, convex set.


Lagrangian Approach Application – Convex Optimization

The Lagrangian function L(λ) is

    L(λ) := sup{f(x) + Σ_{i=1}^p λ_i g_i(x) : x ∈ Ω}.

The Lagrangian dual is

    inf_{λ≥0} L(λ).

Weak Duality: inf_{λ≥0} L(λ) ≥ v(CP), where v(CP) is the optimal value of (CP).
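A grid-based sketch on a made-up instance that happens to satisfy the Slater condition discussed below: f(x) = 4 − (x − 2)², g(x) = 1 − x, Ω = [0, 3]. Here v(CP) = 3 (at x = 1), and the dual infimum is also 3 (at λ = 2), so weak duality holds with zero gap.

```python
# Hypothetical concave program: sup 4-(x-2)^2  s.t. 1-x >= 0,  x in [0, 3].
def f(x): return 4.0 - (x - 2.0) ** 2
def g(x): return 1.0 - x

xs = [3.0 * k / 3000 for k in range(3001)]          # grid on Omega = [0, 3]

v_cp = max(f(x) for x in xs if g(x) >= 0)           # primal value: 3 at x = 1

def L(lam):
    # Lagrangian function: sup over Omega of f + lam*g (approximated on the grid)
    return max(f(x) + lam * g(x) for x in xs)

lams = [5.0 * k / 500 for k in range(501)]          # grid on [0, 5]
dual = min(L(l) for l in lams)                      # approximates inf_{lam>=0} L(lam)

assert dual >= v_cp - 1e-6                          # weak duality
print(round(v_cp, 3), round(dual, 3))               # zero gap on this instance
```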


Lagrangian Approach Application – Convex Optimization

Unfortunately, even for a convex problem there may be a duality gap, that is,

    inf_{λ≥0} L(λ) > v(CP).

A great deal of research has gone into finding sufficient conditions that guarantee a zero duality gap.

These are often called constraint qualifications because they typically involve conditions on the constraints.

One of the most well-known constraint qualifications is the Slater constraint qualification.


Lagrangian Approach Application – Convex Optimization

The Slater constraint qualification is that there exists an x∗ ∈ Ω such that g_i(x∗) > 0 for all i = 1, …, p.

The Slater constraint qualification gives the following classic theorem in convex optimization.

Theorem: Suppose the convex program (CP) is feasible and bounded. Moreover, suppose there exists an x∗ ∈ Ω such that g_i(x∗) > 0 for all i = 1, …, p. Then there is zero duality gap between the convex program (CP) and its Lagrangian dual (LD). Moreover, there exists a λ∗ ≥ 0 such that v(CP) = L(λ∗), i.e., the Lagrangian dual is solvable.

We prove this result by using projection (because, as you know by now, that is all I can do).


Lagrangian Approach Application – Convex Optimization

Construct the semi-infinite linear program (yes, linear)

           inf  σ
(CP-SILP)  s.t. σ − Σ_{i=1}^p λ_i g_i(x) ≥ f(x),  ∀x ∈ Ω
                λ ≥ 0.

This is equivalent to the Lagrangian dual

     inf  L(λ)
(LD) s.t. L(λ) = sup{f(x) + Σ_{i=1}^p λ_i g_i(x) : x ∈ Ω}
          λ ≥ 0.

Replace

    L(λ) = sup{f(x) + Σ_{i=1}^p λ_i g_i(x) : x ∈ Ω}

with

    L(λ) ≥ f(x) + Σ_{i=1}^p λ_i g_i(x),  x ∈ Ω.


Lagrangian Approach Application – Convex Optimization

Theorem: Assume

- the convex program (CP) is feasible and bounded, and
- there exists an x∗ ∈ Ω such that g_i(x∗) > 0 for all i = 1, …, p;

then

- there is a zero duality gap between the convex program (CP) and its Lagrangian dual (LD), and
- there exists a λ∗ ≥ 0 such that v(LD) = L(λ∗), i.e., the Lagrangian dual is solvable.


Lagrangian Approach Application – Convex Optimization

KEY IDEA: Show that the semi-infinite linear program (CP-SILP) is solvable and has zero duality gap. Do this using, yes you guessed it, projection!

Put (CP-SILP) into our linear programming format and solve with Fourier-Motzkin elimination:

    z − σ ≥ 0
    σ − Σ_{i=1}^p λ_i g_i(x) ≥ f(x),  ∀x ∈ Ω
    λ_i ≥ 0,  i = 1, …, p

Eliminate σ and end up with

    z − Σ_{i=1}^p λ_i g_i(x) ≥ f(x),  ∀x ∈ Ω
    λ_i ≥ 0,  i = 1, …, p


Lagrangian Approach Application – Convex Optimization

Claim: The variables λ_1, …, λ_p remain clean as the Fourier-Motzkin elimination procedure proceeds.

Use induction to show that after eliminating variables λ_1, …, λ_k, there is an inequality

    z − Σ_{i=k+1}^p λ_i g_i(x∗) ≥ f(x∗).

1. This implies the variables λ_i, for i = k + 1, …, p, are clean. Why?
2. If there are no dirty variables, there is no duality gap. Why?


Lagrangian Approach Application – Convex Optimization

We can also show:

    v(CP-SILP) = v(LD) ≥ v(CP) = v(CP-FDSILP)

See http://www.optimization-online.org/DB_HTML/2013/04/3817.html.

We just showed

    v(CP-SILP) = v(CP-FDSILP)

therefore

    v(LD) = v(CP).


Lagrangian Approach Application – Convex Optimization

The Slater result can be strengthened.

Theorem: An instance of (CP-SILP) with finite optimal value is tidy if and only if the set of optimal solutions to (CP-SILP) is bounded.

See http://www.optimization-online.org/DB_HTML/2013/04/3817.html.
