Introduction to Optimization
Karyn Sutton
SAMSI Undergraduate Workshop 2007
Wednesday, May 23, 2007
Optimization
• The study of finding maxima/minima of functions, possibly subject to constraints
• Find the min/max of a real-valued function of n variables: f(x1, ..., xn), xi ∈ R
• Could do this 'manually', evaluating f at various values of x, so why not just do that?
• Unconstrained:
min (x² + y²), x, y ∈ R
• Constrained:
min (x² + y²), x ≥ 0, y ≥ 0
min (x² + y²), y = x + 2
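As a small illustration (not from the original slides), the unconstrained problem above can be handed directly to MATLAB's fminsearch; the starting point used here is an arbitrary choice.

% Minimize f(x, y) = x^2 + y^2 with no constraints (illustrative sketch).
f = @(v) v(1)^2 + v(2)^2;                % v = [x; y]
[vmin, fval] = fminsearch(f, [3; -2]);   % [3; -2] is an arbitrary starting point
% vmin should be near [0; 0] and fval near 0.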
Calculus Review
Finding the min/max of f(x), x ∈ R:
• f′(x) = 0
• f′′(x) > 0 ⇒ local min
• f′′(x) < 0 ⇒ local max
• Global vs. local min/max
Finding the min/max of f(x), x ∈ R²:
• 2nd derivative test ⇒ local min, local max, or saddle point
• Gradient: direction of greatest change of f
∇f = (∂f/∂x1, ∂f/∂x2)ᵀ
• Gradient can be used in numerical methods
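When no analytic gradient is available, one can approximate it numerically; the central-difference sketch below is an added illustration, not part of the workshop code, and the step size h is an assumption.

% Central-difference approximation to the gradient of f at x
% (saved as numgrad.m; f maps a column vector to a scalar).
function g = numgrad(f, x)
    h = 1e-6;                      % finite-difference step (assumed)
    n = numel(x);
    g = zeros(n, 1);
    for i = 1:n
        e = zeros(n, 1);
        e(i) = h;
        g(i) = (f(x + e) - f(x - e)) / (2*h);
    end
end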
Gradient Descent Methods
• Find local extrema by repeatedly subtracting a multiple of the gradient (sketched below)
• Assumes differentiability of the function
∇f = (∂f/∂x1, ..., ∂f/∂xn)ᵀ
• The extremum found depends on the starting point x0
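A minimal sketch of that update (not the workshop's gd.m); the handle gradf, the fixed step size, and the stopping tolerance are all assumptions.

% Gradient descent: x_{k+1} = x_k - step * grad f(x_k).
x = x0;                            % starting point (the extremum found depends on it)
step = 0.1;                        % fixed step size (assumed)
for k = 1:1000
    g = gradf(x);                  % gradf is an assumed handle returning the gradient
    if norm(g) < 1e-8              % stop when the gradient is nearly zero
        break
    end
    x = x - step * g;              % move against the gradient
end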
Newton’s method
• Root-finding method:
xn+1 = xn − f(xn)/f′(xn)
• Approximates the function by the tangent line at the current point
• The next iterate is the x-intercept of that line (see the sketch below)
[Figure: one step of Newton's method for finding a root]
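The root-finding update above takes only a few lines; the sketch below is an illustration in which f, fprime, and x0 are assumed to be defined.

% Newton's method for finding a root of f.
x = x0;                            % initial guess
for k = 1:50
    fx = f(x);
    if abs(fx) < 1e-10             % close enough to a root
        break
    end
    x = x - fx / fprime(x);        % jump to the x-intercept of the tangent line
end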
Newton’s Method
• Application of a root-finding algorithm to f′(x):
xn+1 = xn − f′(xn)/f′′(xn)
• Assumes f is at least twice differentiable.
• Use Newton's method for f(x) = 10x³ − 50x² + 2x + 1 in newton1.m (results plotted for −2 ≤ x ≤ 6; a sketch of the iteration follows below).
• Try different values of initial conditions and note:
– # of iterations to the extremum
– path to the extremum
– which extremum the method converges to
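A sketch of the iteration a script like newton1.m might carry out for this cubic; the actual file may differ, and the starting point, tolerance, and iteration cap are assumptions.

% Newton's method applied to f'(x), with f(x) = 10x^3 - 50x^2 + 2x + 1.
fp  = @(x) 30*x.^2 - 100*x + 2;    % f'(x)
fpp = @(x) 60*x - 100;             % f''(x)
x = x0;                            % initial condition (try several values)
for k = 1:100
    if abs(fp(x)) < 1e-10          % f'(x) ~ 0: at an extremum
        break
    end
    x = x - fp(x) / fpp(x);        % Newton step on f'
end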
Gradient Descent Example
Example of this in gd.m:
• Finding the minimum of f(x, y) = (1/2)(αx² + y²)
• Only one minimum ⇒ local = global
• Iterative method using 1st and 2nd order information
∇f = (αx, y)ᵀ,   H(f) = [α 0; 0 1]
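One way to use both first- and second-order information on this quadratic is steepest descent with the exact step length t = (gᵀg)/(gᵀHg); the sketch below illustrates that idea, but it is not necessarily what gd.m does, and the value of α and the starting point are arbitrary.

% Steepest descent with exact line search on f(x, y) = 0.5*(alpha*x^2 + y^2).
alpha = 4;                         % example value
H = [alpha 0; 0 1];                % Hessian of f
v = [1; 1];                        % starting point [x; y] (arbitrary)
for k = 1:100
    g = H * v;                     % gradient of f at v
    if norm(g) < 1e-10
        break
    end
    t = (g' * g) / (g' * H * g);   % exact minimizer of f along -g
    v = v - t * g;
end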
Gradient Descent Example
• Convergence for α > 0? Convergence for α < 0?
• Rate of convergence: how does x0 affect total # of iterations?
• Worst/best starting point for α = 4?
• Most algorithms cannot guarantee convergence
• But some popular algorithms perform well despite a lack of convergence theory.
• However, some problems involve non-smooth functions ⇒ direct search methods
fminsearch/Nelder-Mead
• Form a simplex using n + 1 points (one way to construct it is sketched below)
• The simplex is their convex hull: the minimal convex set containing these n + 1 points
• An n-simplex is the n-dimensional analogue of a triangle:
– n = 1: line segment
– n = 2: triangle
– n = 3: tetrahedron
– etc.
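For intuition, one common way to build the starting simplex is to perturb each coordinate of an initial point in turn; fminsearch does something broadly similar internally, though the exact rule below (a 5% perturbation) is an assumption for illustration.

% Build an initial n-simplex around x0 by perturbing one coordinate per vertex.
x0 = [1; 2];                       % example starting point (n = 2)
n = numel(x0);
S = repmat(x0, 1, n + 1);          % columns of S are the n + 1 vertices
for i = 1:n
    S(i, i + 1) = S(i, i + 1) + 0.05 * max(abs(x0(i)), 1);   % assumed perturbation
end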
fminsearch/Nelder-Mead
• The method moves away from the 'worst' of these points
• At each step, the simplex can reflect, expand, contract, or shrink
fminsearch/Nelder-Mead
• Generally, no proof of convergence, BUT
• easy to implement
• inexpensive (function evaluations/iteration)
• no derivatives needed
• good progress at beginning of process
• (There are other simplex methods...)
Snapshots of Minimization
At steps 0, 1, 2, 3:
Snapshots of Minimization
At steps 12, 30:
Examples
• In direct_ex.m: uses fminsearch to find the min of f(x) = 10x³ − 50x² + 2x + 1
• In gradient_ex.m: uses fminunc to find the min of f(x) = 10x³ − 50x² + 2x + 1 (a sketch of both calls appears after the exercise list below)
Do the following:
1. Try x0 = −1 for both direct_ex.m and gradient_ex.m.
2. Try x0 = 2 for both.
3. Edit gradient_ex.m so that the user-supplied gradient is not used.
4. Now change the function: try f(x) = |x| (in MATLAB: abs(x)).
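For reference, the two scripts boil down to calls along the following lines; the actual contents of direct_ex.m and gradient_ex.m may differ, so treat this as a sketch.

% Minimize f(x) = 10x^3 - 50x^2 + 2x + 1 two ways.
f  = @(x) 10*x.^3 - 50*x.^2 + 2*x + 1;
x0 = -1;                           % starting point from exercise 1

[x_ds, f_ds] = fminsearch(f, x0);  % direct search (Nelder-Mead)
[x_gb, f_gb] = fminunc(f, x0);     % gradient-based (quasi-Newton by default)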
Our LS problem
• Variables: β = [C, K]
• n = 2, so the simplex is a triangle
• For m data points, the cost function is:
L(β) = Σ_{i=1}^{m} |y(ti; β) − yi|²
• Could modify our cost function: add terms, normalize (yi/||y||), use weights, etc.
• Direct search:
[beta, resnorm] = fminsearch(@cost_beam, init_q, [], time, y_tilde)
• Gradient-based search:
[beta, resnorm] = fminunc(@cost_beam, init_q, [], time, y_tilde)
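A cost function of the form L(β) above fits in a short MATLAB function; this is only a sketch of what a file like cost_beam.m might contain, and beam_model is a hypothetical stand-in for whatever routine produces the model output y(t; β).

% Least-squares cost: L(beta) = sum_i |y(t_i; beta) - y_i|^2.
% (Sketch of cost_beam.m; beam_model is a placeholder, not an actual workshop function.)
function L = cost_beam(beta, time, y_tilde)
    y_model = beam_model(time, beta);        % model predictions y(t_i; beta)
    L = sum(abs(y_model - y_tilde).^2);      % sum of squared residuals
end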