Short course: Optimality Conditions and Algorithms in Nonlinear Optimization
Part I - Introduction to nonlinear optimization
Gabriel Haeser
Department of Applied Mathematics, Institute of Mathematics and Statistics
University of São Paulo, São Paulo, SP, Brazil
Santiago de Compostela, Spain, October 28-31, 2014
www.ime.usp.br/~ghaeser Gabriel Haeser
Outline
Part I - Introduction to nonlinear optimization
  - Examples and historical notes
  - First and Second order optimality conditions
  - Penalty methods
  - Interior point methods
Part II - Optimality Conditions
  - Algorithmic proof of Karush-Kuhn-Tucker conditions
  - Sequential Optimality Conditions
  - Algorithmic discussion
Part III - Constraint Qualifications
  - Geometric Interpretation
  - First and Second order constraint qualifications
Part IV - Algorithms
  - Augmented Lagrangian methods
  - Inexact Restoration algorithms
  - Dual methods
Optimization
Optimization is a mathematical problem with many “real world” applications. The goal is to find minimizers or maximizers of a multivariable real function over a restricted domain.
  - To draw a map of America with areas proportional to the real areas.
  - Hard-spheres problem: to place $m$ points on an $n$-dimensional sphere in such a way that the smallest distance between two points is maximized.
Problem America
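To draw a map of America, similar to the usual map, with areas proportional to real areas.

\[
\begin{aligned}
\text{Minimize } & \tfrac{1}{2}\sum_{i=1}^{m} \|p_i - \bar{p}_i\|^2,\\
\text{Subject to } & \tfrac{1}{2}\sum_{i=1}^{n_j} \left(p^x_i\, p^y_{i+1} - p^x_{i+1}\, p^y_i\right) = \beta_j, \quad j = 1, \dots, c.
\end{aligned}
\]

  - $c = 17$ countries
  - $\beta_j$ is the real area of country $j$
  - $m = 132$ given points $\bar{p}_i$ on the frontiers of the usual map
  - Green-Gauss formula to compute areas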
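The area constraint above is the Green-Gauss (shoelace) formula applied to the boundary points of each country. A minimal Python sketch of that formula follows; the function name `polygon_area` and the unit-square data are illustrative assumptions, not part of the original problem data.

```python
def polygon_area(points):
    """Signed area of a closed polygon via the Green-Gauss (shoelace) formula.

    `points` is a list of (x, y) vertices in order; the polygon is closed
    by connecting the last vertex back to the first.
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        x_i, y_i = points[i]
        x_next, y_next = points[(i + 1) % n]  # wrap around to close the polygon
        total += x_i * y_next - x_next * y_i
    return 0.5 * total


# Hypothetical test polygon: the unit square, listed counterclockwise.
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
area = polygon_area(unit_square)
```

The sign of the result depends on the orientation of the vertices: counterclockwise order gives a positive area, clockwise order a negative one.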
Problem America
United States (without Alaska and Hawaii) = 8,080,464 km²
Brazil = 8,514,876 km²
Usual map ratio ≈ 1.32
Real ratio ≈ 0.95
[Figure: two panels, "Usual map" and "Areas proportional to real areas"]
Problem America
[Figure: two panels, "Areas proportional to GDP" and "Areas proportional to population"]
Kissing and hard-spheres problems
The kissing number of dimension $n$ is the largest number of unit spheres that can touch an $n$-dimensional unit sphere without overlapping.

The hard-spheres problem consists of maximizing the smallest distance $d$ between $m$ points on the $n$-dimensional sphere of radius 2.

  n     Kissing number
  2     6
  3     12
  4     24
  5     40–44
  6     72–78
  7     126–134
  8     240
  9     306–364
  10    500–554

$d^* \geq 2 \Rightarrow$ kissing number $\geq m$

[Figure: two panels, "$n = 2$, $m = 6$, $d^* = 2$" and "$n = 3$, $m = 12$, $d^* \approx 2.194$"]
Applications: Packing
Applications: Packing
Initial configuration for molecular dynamics
Large scale problems: Finance
Jacek Gondzio and Andreas Grothey (May 2005): quadratic convex program with 353 million constraints and 1010 million variables.
Tool: Interior Point Method
Large scale problems: Localization
Find a point in the rectangle but not in the ellipse such that the sum of the distances to the polygons is minimized.

  - 1,567,804 polygons.
  - 3,135,608 variables.
  - 1,567,804 upper level constraints.
  - 12,833,106 lower level constraints.
  - Convergence in 10 outer iterations, 56 inner iterations, 133 function evaluations, 185 seconds.
Tool: Augmented Lagrangian method
TANGO Project - www.ime.usp.br/~egbirgin/tango
Trustable Algorithms for Nonlinear General Optimization
TANGO Project - www.ime.usp.br/~egbirgin/tango
40,370 visits registered by Google Analytics since 2007 (more than 3,000 downloads)
USA: 7,969, Brazil: 7,230, Germany: 2,974
TANGO Project - www.ime.usp.br/~egbirgin/tango
Spain: 733
Historical Notes
  - Military programs formulated as systems of linear inequalities gave rise to the term Programming in a linear structure (title of the first paper by G. Dantzig, 1948).
  - Koopmans shortened the term to Linear Programming.
  - Dorfman (in 1949) thought that Linear Programming was too restrictive and suggested the more general term Mathematical Programming, now called Mathematical Optimization.
  - Nonlinear Programming is the title of the 1951 paper by Kuhn and Tucker that deals with Optimality Conditions.
  - These results extend the Lagrange rule of multipliers (1813) to the case of equality and inequality constraints. They were previously considered in the 1939 unpublished master’s thesis of Karush (KKT conditions).
  - These works are particularly important because they suggest the development of algorithms to deal with practical problems.
Historical Notes
  - Linear Programming is part of a revolutionary development that gave humanity the capability to formulate an objective and determine a path of detailed decisions to reach this goal in the best way possible.
  - Tools: models, algorithms, computers and software.
  - According to Dantzig, the impossibility of performing large computations is the main reason for the lack of interest in optimization before 1947.

Important topics in computing: (a) dealing with sparsity allows for solving larger problems; (b) global optimization; (c) automatic differentiation of a function represented in a programming language.
Automatic Differentiation
$f(x_1, x_2) = \sin(x_1) + x_1 x_2$
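One way to differentiate this function automatically is forward-mode differentiation with dual numbers. The sketch below is a minimal illustration under that assumption; the slide does not say which mode is shown, and the names `Dual`, `sin`, `f` and `gradient_f` are chosen here for the example.

```python
import math


class Dual:
    """A number value + deriv*eps with eps**2 == 0; deriv carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Promote a plain number to a Dual with zero derivative."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x):
    """Sine with the chain rule: sin(u)' = cos(u) * u'."""
    x = _lift(x)
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def f(x1, x2):
    """The function from the slide: f(x1, x2) = sin(x1) + x1 * x2."""
    return sin(x1) + x1 * x2


def gradient_f(x1, x2):
    """Gradient of f by two forward passes, one per input variable."""
    d1 = f(Dual(x1, 1.0), Dual(x2, 0.0)).deriv  # seed dx1 = 1
    d2 = f(Dual(x1, 0.0), Dual(x2, 1.0)).deriv  # seed dx2 = 1
    return d1, d2
```

The result can be compared with the hand-computed gradient $(\cos(x_1) + x_2,\ x_1)$.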
Duality
Game theory and linear programming:
1948 - G. Dantzig visited John von Neumann in Princeton.

J. von Neumann, 1963. Discussion of a maximum problem.
D. Gale, H. W. Kuhn, A. W. Tucker, 1951. Linear programming and the theory of games.

Elements of duality:

  - a pair of optimization problems, one a maximum problem with objective function $f$ and the other a minimum problem with objective function $h$, based on the same data
  - for feasible solutions to the pair of problems, always $h \geq f$
  - necessary and sufficient conditions for optimality are $h = f$
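The three elements above can be checked on a small example. The sketch below assumes the standard symmetric pair of linear programs (maximize $c^T x$ subject to $Ax \leq b$, $x \geq 0$, and minimize $b^T y$ subject to $A^T y \geq c$, $y \geq 0$); the data `A`, `b`, `c` and the chosen points are hypothetical and not taken from the slides.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical data shared by both problems of the pair.
A = [[1.0, 1.0],
     [1.0, 3.0]]
b = [4.0, 6.0]
c = [3.0, 2.0]


def primal_feasible(x, tol=1e-12):
    """Feasibility for the maximum problem: x >= 0 and A x <= b."""
    return (all(xi >= -tol for xi in x)
            and all(dot(row, x) <= bi + tol for row, bi in zip(A, b)))


def dual_feasible(y, tol=1e-12):
    """Feasibility for the minimum problem: y >= 0 and A^T y >= c."""
    columns = list(zip(*A))
    return (all(yi >= -tol for yi in y)
            and all(dot(col, y) >= cj - tol for col, cj in zip(columns, c)))


def f(x):
    """Objective of the maximum problem."""
    return dot(c, x)


def h(y):
    """Objective of the minimum problem."""
    return dot(b, y)


# Any feasible pair satisfies h >= f.
x_feas, y_feas = [1.0, 1.0], [2.0, 1.0]
# At this pair the two objective values coincide, so both points are optimal.
x_opt, y_opt = [4.0, 0.0], [3.0, 0.0]
```

For the feasible pair, `f(x_feas)` is 5 and `h(y_feas)` is 14; for the second pair both values equal 12, which certifies optimality without solving either problem.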
Duality
(Fermat, 17th century): Given 3 points $p_1$, $p_2$ and $p_3$ on the plane, find the point $x$ that minimizes the sum of the distances from $x$ to $p_1$, $p_2$ and $p_3$.
Duality
(Thomas Moss, The Ladies Diary, 1755): “In the three sides of an equiangular field stand three trees, at the distances of 10, 12 and 16 chains from one another: to find the content of the field, it being the greatest the data will admit.”
Duality
(J.D. Gergonne (ed), Annales de Mathématiques Pures et Appliquées, 1810-1811): Given any triangle, circumscribe the largest possible equilateral triangle about it.

Solution given in the 1811-1812 edition by Rochat, Vecten, Fauguier and Pilatte, where duality was acknowledged.
The problem (NLP)
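\[
\begin{aligned}
\text{Minimize } & f(x),\\
\text{Subject to } & h_i(x) = 0, \quad i = 1, \dots, m,\\
& g_j(x) \leq 0, \quad j = 1, \dots, p.
\end{aligned}
\]

$f, h_i, g_j : \mathbb{R}^n \to \mathbb{R}$ are (twice) continuously differentiable functions.

$\Omega = \{x \in \mathbb{R}^n \mid h(x) = 0,\ g(x) \leq 0\}$ (feasible set)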
Solution
Global Solution: A feasible point $x^* \in \Omega$ is a global minimizer of NLP when

\[ f(x^*) \leq f(x), \quad \forall x \in \Omega. \]

Local Solution: A feasible point $x^* \in \Omega$ is a local minimizer of NLP when there exists a neighbourhood $B(x^*, \varepsilon)$ of $x^*$ such that

\[ f(x^*) \leq f(x), \quad \forall x \in \Omega \cap B(x^*, \varepsilon). \]

$A(x) = \{j \in \{1, \dots, p\} \mid g_j(x) = 0\}$ (set of active inequalities at $x \in \Omega$)
Example
\[
\begin{aligned}
\text{Minimize } & x^2 + y^2,\\
\text{Subject to } & x + y - 1 = 0.
\end{aligned}
\]
First order optimality condition - Lagrange multipliers
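\[
\begin{aligned}
\text{Minimize } & x^2 + y^2,\\
\text{Subject to } & x + y - 1 = 0.
\end{aligned}
\]

\[
x = \tfrac{1}{2},\ y = \tfrac{1}{2}, \qquad
\begin{pmatrix} 1 \\ 1 \end{pmatrix} + (-1) \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 0
\]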
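As a worked check, the point and the multiplier $-1$ shown above can be recovered by solving the Lagrange system by hand. Here $f(x, y) = x^2 + y^2$ and $h(x, y) = x + y - 1$; the symbol $\lambda$ for the multiplier follows the usual convention and is an assumption, since this slide does not name it.

```latex
\begin{aligned}
\nabla f(x, y) + \lambda \nabla h(x, y) = 0
  &\iff 2x + \lambda = 0, \quad 2y + \lambda = 0,\\
h(x, y) = 0
  &\iff x + y - 1 = 0.
\end{aligned}
% The first two equations give x = y = -\lambda / 2.
% Substituting into the constraint:
\[
-\lambda - 1 = 0
\;\Longrightarrow\;
\lambda = -1, \qquad x = y = \tfrac{1}{2}.
\]
```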
Example
\[
\begin{aligned}
\text{Maximize } & x^2 + y^2,\\
\text{Subject to } & x + 2y - 2 \leq 0,\\
& x \geq 0,\ y \geq 0.
\end{aligned}
\]
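\[
\begin{aligned}
\text{Minimize } & -x^2 - y^2,\\
\text{Subject to } & x + 2y - 2 \leq 0,\\
& -x \leq 0,\ -y \leq 0.
\end{aligned}
\]

\[
x = 2,\ y = 0: \qquad
\begin{pmatrix} -4 \\ 0 \end{pmatrix}
+ 4 \begin{pmatrix} 1 \\ 2 \end{pmatrix}
+ 8 \begin{pmatrix} 0 \\ -1 \end{pmatrix} = 0
\]

\[
x = 0,\ y = 1: \qquad
\begin{pmatrix} 0 \\ -2 \end{pmatrix}
+ 1 \begin{pmatrix} 1 \\ 2 \end{pmatrix}
+ 1 \begin{pmatrix} -1 \\ 0 \end{pmatrix} = 0
\]

\[
x = 0.4,\ y = 0.8: \qquad
\begin{pmatrix} -0.8 \\ -1.6 \end{pmatrix}
+ 0.8 \begin{pmatrix} 1 \\ 2 \end{pmatrix} = 0
\]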
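The three points above can be verified numerically. The following sketch measures how far a point and a multiplier vector are from satisfying the first order conditions of the minimization form of the example; the function name `kkt_violation` and the ordering of the constraints ($x + 2y - 2 \leq 0$, $-x \leq 0$, $-y \leq 0$) are conventions chosen here.

```python
def grad_f(x, y):
    """Gradient of the objective -x**2 - y**2."""
    return (-2.0 * x, -2.0 * y)


# Constant gradients of g1 = x + 2y - 2, g2 = -x, g3 = -y.
GRAD_G = [(1.0, 2.0), (-1.0, 0.0), (0.0, -1.0)]


def constraints(x, y):
    """Values of the three inequality constraints g(x, y) <= 0."""
    return [x + 2.0 * y - 2.0, -x, -y]


def kkt_violation(point, mu):
    """Largest violation among stationarity, complementarity,
    primal feasibility and multiplier sign conditions."""
    x, y = point
    gx, gy = grad_f(x, y)
    for mu_j, (a, b) in zip(mu, GRAD_G):
        gx += mu_j * a
        gy += mu_j * b
    g = constraints(x, y)
    stationarity = max(abs(gx), abs(gy))
    complementarity = max(abs(mu_j * g_j) for mu_j, g_j in zip(mu, g))
    infeasibility = max(max(g_j, 0.0) for g_j in g)
    negative_multiplier = max(max(-mu_j, 0.0) for mu_j in mu)
    return max(stationarity, complementarity, infeasibility, negative_multiplier)


# Points and multipliers from the slide (zero for constraints not listed).
candidates = [
    ((2.0, 0.0), (4.0, 0.0, 8.0)),
    ((0.0, 1.0), (1.0, 1.0, 0.0)),
    ((0.4, 0.8), (0.8, 0.0, 0.0)),
]
violations = [kkt_violation(point, mu) for point, mu in candidates]
values = [x * x + y * y for (x, y), _ in candidates]  # objective x^2 + y^2
```

All three violations are zero up to rounding, while the values of $x^2 + y^2$ differ (4, 1 and 0.8), so satisfying the first order conditions does not by itself single out the maximizer.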
First order optimality condition - KKT condition
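(Karush-Kuhn-Tucker) Under some condition (constraint qualification), if $x^*$ is a local solution, there exist Lagrange multipliers $\lambda \in \mathbb{R}^m$ and $\mu \in \mathbb{R}^p$ such that:

\[
\begin{aligned}
& \nabla f(x^*) + \sum_{i=1}^{m} \lambda_i \nabla h_i(x^*) + \sum_{j=1}^{p} \mu_j \nabla g_j(x^*) = 0, && \text{(Lagrange condition)}\\
& \mu_j g_j(x^*) = 0, \quad j = 1, \dots, p, && \text{(complementarity)}\\
& h(x^*) = 0, \quad g(x^*) \leq 0, && \text{(feasibility)}\\
& \mu \geq 0. && \text{(dual feasibility)}
\end{aligned}
\]

Interpretation: up to first order, a feasible direction cannot be a descent direction.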
Second order optimality condition
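\[
x^* = \begin{pmatrix} 0.4 \\ 0.8 \end{pmatrix}, \quad
\nabla g_1(x^*) = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad
\nabla^2 f(x^*) = \begin{pmatrix} -2 & 0 \\ 0 & -2 \end{pmatrix}.
\]

There exists some $d \in \mathbb{R}^n$ with $\nabla g_1(x^*)^T d \leq 0$ and $d^T \nabla^2 f(x^*) d < 0$.

Theorem: Under some conditions, if $x^*$ is a local minimizer, then

\[
d^T \left( \nabla^2 f(x^*) + \sum_{i=1}^{m} \lambda_i \nabla^2 h_i(x^*) + \sum_{j=1}^{p} \mu_j \nabla^2 g_j(x^*) \right) d \geq 0
\]

for every $d \in \mathbb{R}^n$ such that

\[
\begin{aligned}
& \nabla f(x^*)^T d \leq 0,\\
& \nabla h_i(x^*)^T d = 0, \quad i = 1, \dots, m,\\
& \nabla g_j(x^*)^T d \leq 0, \quad j \in A(x^*).
\end{aligned}
\]

Interpretation: All critical directions must be of ascent nature.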
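For the point $x^* = (0.4, 0.8)$ of the earlier example, a direction that violates the second order condition can be exhibited explicitly. The sketch below uses $d = (2, -1)$, a choice made here for illustration; since $g_1$ is linear, the Hessian of the Lagrangian reduces to $\nabla^2 f(x^*)$.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def quadratic_form(H, d):
    """Value of d^T H d for a square matrix H given as nested lists."""
    return dot(d, [dot(row, d) for row in H])


x_star = (0.4, 0.8)
grad_f = (-2.0 * x_star[0], -2.0 * x_star[1])   # gradient of -x^2 - y^2
grad_g1 = (1.0, 2.0)                            # gradient of x + 2y - 2
hess_lagrangian = [[-2.0, 0.0],
                   [0.0, -2.0]]                 # g1 is linear, so only f contributes

d = (2.0, -1.0)                                 # hypothetical direction along the constraint

is_critical = (dot(grad_f, d) <= 1e-12) and (dot(grad_g1, d) <= 1e-12)
curvature = quadratic_form(hess_lagrangian, d)
```

Here `is_critical` is true and `curvature` equals $-10$, so $x^*$ satisfies the first order conditions but cannot be a local minimizer of $-x^2 - y^2$ on the feasible set.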
History of nonlinear programming
Kuhn, Tucker, 1951. Nonlinear programming.

Albert William Tucker (1905 - 1995)
Princeton University
Topology

Harold William Kuhn (1925 - 2014)
Princeton University
PhD 1950, Algebra
Game Theory, Optimization

Saddle point problem

\[ \varphi(x^*, u) \leq \varphi(x^*, u^*) \leq \varphi(x, u^*), \quad \forall x, u \]
History of nonlinear programming
William Karush (1917-1997)
1939. Minima of Functions of Several Variables with Inequalities as Side Conditions. M.Sc. thesis, Department of Mathematics, University of Chicago

Calculus of Variations and Optimization

University of Chicago and California State University (also Manhattan Project)

“I concluded that you two had exploited and developed the subject so much further than I, that there was no justification for my announcing to the world, ‘Look what I did, first.’” (1975)
History of nonlinear programming
Fritz John (1910 - 1994)
1948. Extremum problems with inequalities as subsidiary conditions.

PhD 1933 in Göttingen under Courant
New York University

Partial differential equations, convex geometry, nonlinear elasticity
History of nonlinear programming
Fritz John (1910 - 1994)
Let $S$ be a bounded set in $\mathbb{R}^m$. Find the sphere of least positive radius enclosing $S$.

\[
\begin{aligned}
\text{Minimize } & F(x) := x_{m+1},\\
\text{Subject to } & G(x, y) := x_{m+1} - \sum_{i=1}^{m} (x_i - y_i)^2 \geq 0 \quad \text{for all } y \in S.
\end{aligned}
\]

The boundary of a compact convex set $S$ in $\mathbb{R}^n$ lies between two homothetic ellipsoids of ratio $\leq n$, and the outer ellipsoid can be taken to be the ellipsoid of least volume containing $S$.
Snell’s law of refraction

\[ \frac{\sin \theta_y}{v_y} = \frac{\sin \theta_z}{v_z} \]
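Snell’s law of refraction

\[ \frac{\sin \theta_y}{v_y} = \frac{\sin \theta_z}{v_z} \]

\[
\begin{aligned}
\text{Minimize } & T(x) := \frac{\|x - y\|}{v_y} + \frac{\|x - z\|}{v_z},\\
\text{Subject to } & h(x) = 0.
\end{aligned}
\]

At the solution $x^*$,

\[ \nabla T(x^*) = \frac{x^* - y}{v_y \|y - x^*\|} + \frac{x^* - z}{v_z \|z - x^*\|} \]

is parallel to $\nabla h(x^*)$, the normal vector to the surface.

Define $\bar{y} = x^* + \dfrac{y - x^*}{v_y \|y - x^*\|}$ and $\bar{z} = x^* + \dfrac{z - x^*}{v_z \|z - x^*\|}$. Hence $-\nabla T(x^*) = (\bar{y} - x^*) + (\bar{z} - x^*)$ is the diagonal of the parallelogram with sides $\bar{y} - x^*$ and $\bar{z} - x^*$.

[Figure: parallelogram with vertex $x^*$ and sides $\bar{y} - x^*$, $\bar{z} - x^*$]

Snell’s law of refraction

By triangular similarity, $\bar{y}$ and $\bar{z}$ are equally far from the normal line. Hence

\[ \|\bar{y} - x^*\| \sin \theta_y = \|\bar{z} - x^*\| \sin \theta_z. \]

The calculation $\|\bar{y} - x^*\| = \dfrac{1}{v_y}$ and $\|\bar{z} - x^*\| = \dfrac{1}{v_z}$ yields Snell’s law.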
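The derivation can be checked numerically in the plane. The sketch below assumes a flat interface (the horizontal axis, so $h(x) = x_2$), hypothetical endpoints and speeds, and a ternary search for the minimizer of the travel time; none of these choices come from the slides.

```python
import math


def travel_time(t, y, z, v_y, v_z):
    """Time to travel from y to the interface point (t, 0) at speed v_y,
    then from (t, 0) to z at speed v_z."""
    return (math.hypot(t - y[0], y[1]) / v_y
            + math.hypot(t - z[0], z[1]) / v_z)


def fastest_crossing(y, z, v_y, v_z, iterations=200):
    """Ternary search for the minimizer of the convex map t -> travel_time."""
    lo, hi = min(y[0], z[0]), max(y[0], z[0])
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if travel_time(m1, y, z, v_y, v_z) < travel_time(m2, y, z, v_y, v_z):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


# Hypothetical data: y above the interface, z below it.
y, z = (0.0, 1.0), (1.0, -1.0)
v_y, v_z = 1.0, 0.5

t_star = fastest_crossing(y, z, v_y, v_z)

# Angles are measured from the normal to the interface (the vertical axis).
sin_theta_y = abs(t_star - y[0]) / math.hypot(t_star - y[0], y[1])
sin_theta_z = abs(z[0] - t_star) / math.hypot(t_star - z[0], z[1])
snell_gap = abs(sin_theta_y / v_y - sin_theta_z / v_z)
```

At the computed minimizer, `snell_gap` is zero up to the accuracy of the search, which matches the law obtained from the Lagrange condition.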
External Penalty Method
Choose a sequence $\{\rho_k\}$ with $\rho_k \to +\infty$ and for each $k$ solve the problem

\[ \text{Minimize } f(x) + \rho_k P(x), \]

obtaining the (global) solution $x_k$, if it exists.

  - $P$ is a smooth function
  - $P(x) \geq 0$
  - $P(x) = 0 \Leftrightarrow h(x) = 0,\ g(x) \leq 0$

For example: $P(x) = \|h(x)\|_2^2 + \|\max\{0, g(x)\}\|_2^2$
External Penalty Method
Theorem: If $x_k$ is well defined, then every limit point of $\{x_k\}$ is a global solution to: Minimize $P(x)$.

Theorem: If $x_k$ is well defined and there exists a point where the function $P$ vanishes (the feasible region is not empty), then every limit point of $\{x_k\}$ is a global solution of: Minimize $f(x)$, Subject to $h(x) = 0$, $g(x) \leq 0$.

The External Penalty Method can be used as a theoretical tool to prove KKT conditions, and it can also be adjusted to become an efficient algorithm (augmented Lagrangian method).
External Penalty Method
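\[
\begin{aligned}
\text{Minimize } & x_1^2 + x_2^2,\\
\text{Subject to } & x_1 - 1 = 0,\\
& x_2 - 1 \leq 0.
\end{aligned}
\]

\[ \text{Minimize } x_1^2 + x_2^2 + \rho_k \left( (x_1 - 1)^2 + \max\{0, x_2 - 1\}^2 \right) \quad (= \Phi_k(x)). \]

Solving $\nabla \Phi_k(x) = 0$ we get $x_k = \left( \dfrac{\rho_k}{1 + \rho_k},\ 0 \right) \to (1, 0)$.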
Show simulation
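A minimal numerical version of this example is sketched below. It assumes plain gradient descent with step $1/(2 + 2\rho_k)$ as the inner solver, a warm start from the previous subproblem, and the penalty sequence 1, 10, 100, 1000; these are choices made for illustration and are not the simulation shown in the course.

```python
def grad_phi(x1, x2, rho):
    """Gradient of Phi(x) = x1^2 + x2^2 + rho*((x1 - 1)^2 + max(0, x2 - 1)^2)."""
    return (2.0 * x1 + 2.0 * rho * (x1 - 1.0),
            2.0 * x2 + 2.0 * rho * max(0.0, x2 - 1.0))


def minimize_phi(rho, start, iterations=2000):
    """Gradient descent on Phi with the fixed step 1/L, where L = 2 + 2*rho
    bounds the Lipschitz constant of the gradient."""
    x1, x2 = start
    step = 1.0 / (2.0 + 2.0 * rho)
    for _ in range(iterations):
        g1, g2 = grad_phi(x1, x2, rho)
        x1 -= step * g1
        x2 -= step * g2
    return x1, x2


def external_penalty(rhos=(1.0, 10.0, 100.0, 1000.0), start=(2.0, 2.0)):
    """Solve the penalized subproblems for increasing rho, warm starting
    each one at the previous solution."""
    x = start
    history = []
    for rho in rhos:
        x = minimize_phi(rho, x)
        history.append((rho, x))
    return history


history = external_penalty()
```

Each entry of `history` pairs $\rho_k$ with the computed $x_k$; the first coordinate follows $\rho_k / (1 + \rho_k)$ and approaches 1 from the infeasible side, while the second stays at 0.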
Internal Penalty Method
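Choose a sequence $\{\mu_k\}$ with $\mu_k \to 0^+$ and for each $k$ solve the problem

\[
\begin{aligned}
\text{Minimize } & f(x) + \mu_k B(x),\\
\text{Subject to } & h(x) = 0,\\
& g(x) < 0.
\end{aligned}
\]

  - $B$ is smooth
  - $B(x) \geq 0$
  - $B(x) \to +\infty$ if some $g_j(x) \to 0$ with $g(x) < 0$.

For example: $B(x) = -\sum_{j=1}^{p} \log(-g_j(x))$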
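As an illustration, the method can be applied to the example used for the external penalty (minimize $x_1^2 + x_2^2$ subject to $x_1 - 1 = 0$, $x_2 - 1 \leq 0$) with the logarithmic barrier. The equality fixes $x_1 = 1$, so each subproblem reduces to minimizing $x_2^2 - \mu \log(1 - x_2)$ over $x_2 < 1$. The sketch below solves it by bisection on the derivative, a solver chosen here for simplicity, and compares with the closed form $x_2 = (1 - \sqrt{1 + 2\mu})/2$ obtained from the stationarity equation.

```python
import math


def barrier_minimizer(mu, tol=1e-14):
    """Minimizer of x2**2 - mu*log(1 - x2) on x2 < 1, for 0 < mu < 4,
    found by bisection on the (increasing) derivative."""
    def derivative(x2):
        return 2.0 * x2 + mu / (1.0 - x2)

    lo, hi = -1.0, 0.0  # derivative(lo) < 0 < derivative(hi) when 0 < mu < 4
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if derivative(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def closed_form(mu):
    """Root below 1 of 2*x2*(1 - x2) + mu = 0."""
    return (1.0 - math.sqrt(1.0 + 2.0 * mu)) / 2.0


# Hypothetical decreasing sequence of barrier parameters.
path = [(mu, barrier_minimizer(mu)) for mu in (1.0, 1e-1, 1e-2, 1e-4, 1e-6)]
```

Every iterate is strictly feasible ($x_2 < 1$), and the computed values approach the solution $x_2 = 0$ as $\mu \to 0^+$.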
Interior Point Method
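Consider the convex quadratic problem

\[
\begin{aligned}
\text{Minimize } & c^T x + \tfrac{1}{2} x^T Q x,\\
\text{Subject to } & Ax = b,\\
& x \geq 0,
\end{aligned}
\]

and the barrier subproblem

\[
\begin{aligned}
\text{Minimize } & c^T x + \tfrac{1}{2} x^T Q x - \mu \sum_{j=1}^{n} \log x_j,\\
\text{Subject to } & Ax = b,\\
& x > 0.
\end{aligned}
\]

KKT condition:

\[ c - A^T \lambda + Qx - \mu X^{-1} e = 0, \quad Ax = b, \]

where $X^{-1} = \mathrm{diag}\{x_1^{-1}, \dots, x_n^{-1}\}$ and $e = (1, \dots, 1)^T$. Denoting $s = \mu X^{-1} e$ we get

\[ A^T \lambda + s - Qx = c, \quad Ax = b, \quad XSe = \mu e, \quad (x, s) > 0. \]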
Interior Point Method
Active-set methods

\[ A^T \lambda + s - Qx = c, \quad Ax = b, \quad XSe = 0, \quad (x, s) \geq 0. \]

Interior point methods

\[ A^T \lambda + s - Qx = c, \quad Ax = b, \quad XSe = \mu e, \quad (x, s) > 0. \]
Interior Point Method
Complementarity: $x_i s_i = 0$, $\forall i = 1, \dots, n$.

Active-set methods try to guess the optimal active subset $\mathcal{A} \subseteq \{1, \dots, n\}$ and set $x_i = 0$ for $i \in \mathcal{A}$ (active constraints), $s_i = 0$ for $i \notin \mathcal{A}$ (inactive constraints).

Interior point methods use $\varepsilon$-mathematics:
Replace $x_i s_i = 0$, $\forall i = 1, \dots, n$
by $x_i s_i = \mu$, $\forall i = 1, \dots, n$.

Force convergence by letting $\mu \to 0^+$.
Interior Point Method
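Solve the nonlinear system of equations

\[ f(x, \lambda, s) = 0, \]

where $f : \mathbb{R}^{2n+m} \to \mathbb{R}^{2n+m}$ is the mapping:

\[
f(x, \lambda, s) =
\begin{pmatrix}
A^T \lambda + s - Qx - c \\
Ax - b \\
XSe - \mu e
\end{pmatrix}.
\]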
Interior Point Method
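Newton direction:

\[
\begin{pmatrix}
-Q & A^T & I \\
A & 0 & 0 \\
S & 0 & X
\end{pmatrix}
\begin{pmatrix}
\Delta x \\ \Delta \lambda \\ \Delta s
\end{pmatrix}
=
\begin{pmatrix}
c - A^T \lambda - s + Qx \\
b - Ax \\
\mu e - XSe
\end{pmatrix}.
\]

Reduce $\mu$ at each Newton iteration.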
Interior Point Method
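Algorithm:

Step 0: Choose $(x_0, \lambda_0, s_0)$ with $(x_0, s_0) > 0$, $\mu_0 > 0$ and parameters $0 < \gamma < 1$ and $\varepsilon > 0$. Set $k = 0$.

Step 1: Compute the Newton direction $(\Delta x, \Delta \lambda, \Delta s)$ at $(x, \lambda, s) := (x_k, \lambda_k, s_k)$.

Step 2: Choose a stepsize $\alpha$ such that $(x_k + \alpha \Delta x, s_k + \alpha \Delta s) > 0$ and set $(x_{k+1}, \lambda_{k+1}, s_{k+1}) = (x_k, \lambda_k, s_k) + \alpha (\Delta x, \Delta \lambda, \Delta s)$.

Step 3: Update $\mu_{k+1} = \gamma \mu_k$.

Step 4: If $x_{k+1}^T s_{k+1} \leq \varepsilon\, x_0^T s_0$, stop. Else set $k := k + 1$ and go to Step 1.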
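The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: dense Gaussian elimination for the Newton system, a stepsize rule that goes 90% of the way to the boundary (capped at 1), $\gamma = 0.5$, and a hypothetical two-variable problem; it is not a production solver and the function names are chosen here.

```python
def solve_linear(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def interior_point_qp(Q, A, b, c, x, lam, s,
                      mu=1.0, gamma=0.5, eps=1e-8, max_iter=200):
    """Primal-dual interior point iteration for
    min c^T x + 0.5 x^T Q x  s.t.  A x = b, x >= 0."""
    n, m = len(x), len(b)
    size = 2 * n + m
    gap0 = sum(xi * si for xi, si in zip(x, s))
    for _ in range(max_iter):
        # Step 1: assemble the Newton system, unknowns ordered (dx, dlam, ds).
        M = [[0.0] * size for _ in range(size)]
        rhs = [0.0] * size
        for i in range(n):
            for j in range(n):
                M[i][j] = -Q[i][j]
            for j in range(m):
                M[i][n + j] = A[j][i]          # block A^T
            M[i][n + m + i] = 1.0              # block I
            Qx_i = sum(Q[i][j] * x[j] for j in range(n))
            ATlam_i = sum(A[j][i] * lam[j] for j in range(m))
            rhs[i] = c[i] - ATlam_i - s[i] + Qx_i
        for j in range(m):
            for i in range(n):
                M[n + j][i] = A[j][i]
            rhs[n + j] = b[j] - sum(A[j][i] * x[i] for i in range(n))
        for i in range(n):
            M[n + m + i][i] = s[i]             # block S
            M[n + m + i][n + m + i] = x[i]     # block X
            rhs[n + m + i] = mu - x[i] * s[i]
        step = solve_linear(M, rhs)
        dx, dlam, ds = step[:n], step[n:n + m], step[n + m:]
        # Step 2: stepsize keeping (x, s) strictly positive (assumed rule).
        alpha = 1.0
        for v, dv in list(zip(x, dx)) + list(zip(s, ds)):
            if dv < 0.0:
                alpha = min(alpha, -0.9 * v / dv)
        x = [xi + alpha * d for xi, d in zip(x, dx)]
        lam = [li + alpha * d for li, d in zip(lam, dlam)]
        s = [si + alpha * d for si, d in zip(s, ds)]
        # Step 3: reduce the barrier parameter.
        mu *= gamma
        # Step 4: stop when the complementarity gap is small enough.
        if sum(xi * si for xi, si in zip(x, s)) <= eps * gap0:
            break
    return x, lam, s


# Hypothetical problem: min 0.5*(x1^2 + x2^2) + 2*x2  s.t.  x1 + x2 = 1, x >= 0.
Q = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
b = [1.0]
c = [0.0, 2.0]
x_sol, lam_sol, s_sol = interior_point_qp(Q, A, b, c,
                                          x=[0.5, 0.5], lam=[0.0], s=[1.0, 1.0])
```

For this data the constrained minimizer is $x = (1, 0)$, where the bound $x_2 \geq 0$ is active, and the iterates approach it from the interior of the positive orthant.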
Interior Point Method
Consider the merit function
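\[ \psi(x, s) = (n + \sqrt{n}) \log(x^T s) - \sum_{i=1}^{n} \log(x_i s_i). \]

(Note that $\psi(x, s) \to -\infty \Rightarrow x^T s \to 0$.)

Choosing the stepsize $\alpha$ that minimizes $\psi(x_k + \alpha \Delta x, s_k + \alpha \Delta s)$ (exact line search) we get:

Theorem: If $\gamma = \dfrac{n}{n + \sqrt{n}}$, we have $x_k^T s_k \leq \varepsilon\, x_0^T s_0$ in $O\!\left(\sqrt{n} \log\left(\dfrac{n}{\varepsilon}\right)\right)$ iterations.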
Algorithms
  - There is no “direct method” to solve NLP.
  - NLP is solved using iterative methods.
  - An iterative method generates a sequence of points $x_k \in \mathbb{R}^n$ that converges (or not) to a solution of the problem.
  - Iterative methods are programmed and implemented on computers, where real mathematical operations are replaced by floating point operations.
Algorithms
  - Theory is necessary to avoid performing an infinite number of experiments.
  - Useful theory should be able to predict the behavior of many experiments.
  - Usually, the theory does not refer to the real sequences generated by the computer, but to theoretical sequences defined by the algorithms.
  - The analogy between real sequences and theoretical sequences is not perfect.
  - There are practical phenomena that the theory is not able to predict, but relevant theory is the one that contributes to explaining practical phenomena.