Notation

R                  the set of real numbers
R^n                the set of real n-vectors
R^(m×n)            the set of real m×n matrices
R^n_+ (R^n_++)     the nonnegative (positive) orthant of R^n, i.e., the set of all vectors in R^n all of whose coordinates are nonnegative (positive)
x ∈ X              x is an element of the set X
x ∉ X              x is not an element of the set X
X ⊂ Y              the set X is a subset of the set Y; X = Y is not included
X = Y              the sets X and Y coincide
X ∪ Y              the union of the sets X and Y
X ∩ Y              the intersection of the sets X and Y
X \ Y              the set of all x ∈ X such that x ∉ Y
X × Y              the Cartesian product of the sets X and Y, i.e., the set of pairs (x, y) where x ∈ X and y ∈ Y
{x : T} or {x | T} the set of all x satisfying condition T
int(X), relint(X)  the interior (relative interior) of the set X
cl(X)              the closure of the set X
bd(X) or ∂X        the boundary of the set X
conv(X)            the convex hull of the set X
aff(X)             the affine hull of the set X
∀x                 reads "for all x"
∃x                 reads "there exists x"
dis(x, y)          the distance between two points x and y
dis(x, X)          the distance between the point x and the set X: inf_{y∈X} dis(x, y)
N(x)               an open neighborhood of the point x
N(X)               an open neighborhood of the set X
N_ε(X)             the ε-neighborhood of the set X: {x : dis(x, X) < ε}
X = ∅              the set X is empty
x ∈ (a, b)         x satisfies the condition a < x < b
x ∈ [a, b]         x satisfies the condition a ≤ x ≤ b
f: X → Y           a mapping of X into Y, i.e., a function f with domain X and values in Y
x ∈ R^n            the vector x is an element of the space R^n
x^T, A^T           the transpose of the vector x or the matrix A
‖x‖                the norm of the vector x; typically the Euclidean norm is meant
|A|                the determinant of the matrix A
x ≥ 0 (x > 0)      the coordinates of x are nonnegative (positive)
x ≼ (≺) y          component-wise inequality (strict inequality) between x and y
A ⪰ 0 (A ≻ 0)      the square symmetric matrix A is positive semidefinite (positive definite)
Diag(z) or D(z)    a diagonal matrix whose ith diagonal element is z_i
I_n                the identity matrix of order n
0_(n×m)            the zero matrix of order n×m
∇f(x)              the column vector whose ith element is ∂f(x)/∂x_i (the gradient vector)
∇²f(x) or H(x)     the square matrix whose (i,j)th element is ∂²f(x)/∂x_i∂x_j (the Hessian matrix)
∇G(x)              the rectangular matrix whose (i,j)th element is ∂g_i(x)/∂x_j (the Jacobian matrix)
argmin_{x∈X} f(x)  the set of all x ∈ X at which the minimum of the function f on X is attained
sup, inf           supremum, infimum
Mathematics Review
VECTORS
A vector x ∈ R^n is written as

  x = [x1 x2 … xn]^T.

Special vectors:

  1 = [1 1 … 1]^T;   0 = [0 0 … 0]^T;   e_i = [0 … 1 … 0]^T  (with the 1 in the ith position)

  R^n_+  = {x ∈ R^n | x_i ≥ 0, i = 1, 2, …, n};
  R^n_++ = {x ∈ R^n | x_i > 0, i = 1, 2, …, n}
The NORM of a vector (‖x‖) is a real-valued function ‖·‖: R^n → R that is a measure of "length" or distance. It satisfies the following properties:
(i) ‖x‖ ≥ 0 ∀x;  (ii) ‖x‖ = 0 ⟺ x = 0;  (iii) ‖αx‖ = |α|·‖x‖ for all real α;
(iv) ‖x + y‖ ≤ ‖x‖ + ‖y‖ ∀ x and y.

Various norms:

  l1-norm:  ‖x‖₁ = Σ_{i=1}^n |x_i|
  l2-norm:  ‖x‖₂ = ( Σ_{i=1}^n x_i² )^(1/2)
  lp-norm:  ‖x‖_p = ( Σ_{i=1}^n |x_i|^p )^(1/p)
  l∞-norm:  ‖x‖∞ = Max_i |x_i|
Suppose x ∈ R^n and y ∈ R^n, with x, y ≠ 0. Then

  x^T y = <x, y> = Σ_i x_i·y_i = ‖x‖·‖y‖·cos θ,

where θ is the angle between the vectors x and y. Note that if

  x^T y = 0, then cos θ = 0 ⟹ θ = 90° (orthogonal vectors);
  x^T y > 0, then cos θ > 0 ⟹ θ < 90° (acute angle between vectors);
  x^T y < 0, then cos θ < 0 ⟹ θ > 90° (obtuse angle between vectors).

Also, x^T x = Σ_i x_i² = ‖x‖·‖x‖·cos 0 = ‖x‖² (as given by the definition above).

Note that a vector also has associated with it a direction (i.e., the vector [6 6]^T and the vector [1 1]^T have the same direction). This is easily seen by looking at their product: [6 6][1 1]^T = 12 = ‖[6 6]‖·‖[1 1]‖·cos θ (by the definition above), so that cos θ = 12/(√72·√2) = 1, i.e., θ = 0.

(Figure: two vectors x and y in R².)
Combinations of vectors: Given x^1, x^2, …, x^n:

  Linear combination:  Σ_j λ_j x^j
  Affine combination:  Σ_j λ_j x^j, where Σ_j λ_j = 1
  Convex combination:  Σ_j λ_j x^j, where Σ_j λ_j = 1 and all λ_j ≥ 0

Note that every vector in R² is a linear combination of the two vectors x and y shown in the figure (provided x and y are linearly independent).

Linear Independence: The vectors x^1, x^2, …, x^n are said to be linearly independent if Σ_j λ_j x^j = 0 implies that λ_j = 0 for all j.

Span: The vectors x^1, x^2, …, x^k are said to span the space L ⊆ R^n if any x ∈ L can be written as a linear combination of x^1, x^2, …, x^k.

Basis: The vectors x^1, x^2, …, x^k constitute a basis for R^n if they span R^n, but if any one of them is dropped then the remaining vectors do not span R^n.
(Figure: affine combinations of x and y lie on the entire line through x and y; convex combinations lie on the line segment joining x and y.)
MATRICES
A(m×n) = {a_ij}, where i = 1, 2, …, m is the row index and j = 1, 2, …, n is the column index. We also denote A(m×n) = [a1 a2 … an], where each a_j ∈ R^m.

Special matrices: I_n — the identity matrix of order n×n; [0] — the zero matrix.

A^T = transpose of A, where (A^T)_ij = A_ji. If A^T = A then A is said to be symmetric.

Inverse of a matrix: Given A(n×n), if ∃ an n×n matrix A⁻¹ such that AA⁻¹ = A⁻¹A = I, then A⁻¹ is called the inverse of A; otherwise A is said to be singular.

A is said to have full rank if the maximum number of linearly independent columns or rows (= rank of A) equals Min{m, n}.
Matrix Norms: Suppose A is an n×n matrix. The matrix norm ‖A‖ induced by a corresponding vector norm ‖x‖, x ∈ R^n, is defined as

  ‖A‖ = Max_{‖x‖=1} ‖Ax‖.

Thus

  ‖A‖₁ = Max_{‖x‖₁=1} ‖Ax‖₁ = Max { Σ_i |Σ_j a_ij·x_j| : Σ_j |x_j| = 1 } = Max_{1≤j≤n} Σ_{i=1}^n |a_ij|,

i.e., the maximum absolute column sum. For example, if

  A = [  1   3   2
        -5   7   9
         2  -1   6 ]

then

  Ax = [  x1 + 3x2 + 2x3
         -5x1 + 7x2 + 9x3
          2x1 -  x2 + 6x3 ]

and so

  ‖A‖₁ = Max { |x1+3x2+2x3| + |-5x1+7x2+9x3| + |2x1-x2+6x3| : |x1|+|x2|+|x3| = 1 }
       = Max { 1+5+2 (corr. x = (1,0,0)),  3+7+1 (corr. x = (0,1,0)),  2+9+6 (corr. x = (0,0,1)) }
       = 17  (…with x1 = x2 = 0 and x3 = 1 or -1 at the optimum)
Similarly,

  ‖A‖₂ = Max_{‖x‖₂=1} ‖Ax‖₂ = Max { ( Σ_i (Σ_j a_ij·x_j)² )^(1/2) : ( Σ_j x_j² )^(1/2) = 1 } = √λmax(A^T A),

where λmax(A^T A) is the largest eigenvalue of the matrix A^T A (this may be shown…).

This matrix norm is also called the spectral norm. For our example, the largest eigenvalue of A^T A is 167.47 (check using MATLAB or another package…), so that ‖A‖₂ = √167.47 = 12.94.
Likewise,

  ‖A‖∞ = Max_{‖x‖∞=1} ‖Ax‖∞ = Max { Max_i |Σ_j a_ij·x_j| : Max_j |x_j| = 1 } = Max_{1≤i≤m} Σ_{j=1}^n |a_ij|,

i.e., the maximum absolute row sum. For our example,

  ‖A‖∞ = Max { Max( |x1+3x2+2x3|, |-5x1+7x2+9x3|, |2x1-x2+6x3| ) : Max(|x1|, |x2|, |x3|) = 1 }
       = Max { 1+3+2 (corr. x = (1,1,1)),  5+7+9 (corr. x = (-1,1,1)),  2+1+6 (corr. x = (1,-1,1)) }
       = Max(6, 21, 9) = 21  (…with x1 = -1 and x2 = x3 = 1 at the optimum).

Another special norm is the Frobenius norm, defined as ‖A‖F = ( Σ_i Σ_j a_ij² )^(1/2); note that this is just the l2-norm of the vector whose elements are all the elements of A.

Also, note that ‖I‖ = 1 for any induced matrix norm.
SETS

A set S is a collection of elements, e.g.,

  S = {1, 2, 3, 4} — finite, countable;   S = {1, 2, 3, …} — infinite but countable;
  S = {x ∈ R | 0 ≤ x ≤ 1} — infinite and not countable.

In nonlinear optimization problems, typically we deal with sets of the form
  S = {x ∈ R^n | "some set of conditions"};
e.g., S = {x ∈ R^n | a^T x ≥ 0}, i.e., the set of vectors in R^n making an angle of 90° or less with the vector a. The empty set ∅ is the set with no elements in it.

Neighborhood: Given a point x ∈ R^n, its ε-neighborhood (where ε ∈ R++) is defined as the set Nε(x) = {y ∈ R^n | ‖x − y‖ < ε}. The set Nε(x) is also called a ball (or, more precisely, an open ball). Note that for y ∈ Nε(x), ‖x − y‖ < ε.

We may also define an ε-neighborhood for an entire set S in an analogous fashion:
  Nε(S) = {y ∈ R^n | Min_{x∈S} ‖x − y‖ < ε}.

(Figure: the open ball Nε(x) of radius ε about the point x.)
"LOCAL": points belonging to some ε-neighborhood.  "GLOBAL": if ε is "extremely large," i.e., "everywhere."

Sets can be (i) open, (ii) closed, (iii) neither open nor closed, or (iv) both open and closed.

Suppose S is a subset of R^n. The set S is said to be open if ∀x ∈ S, ∃ε > 0 such that Nε(x) ⊆ S.

The interior of a set S is defined as int(S) = {x ∈ S | ∃ε > 0 such that Nε(x) ⊆ S}, i.e., it is the collection of points in S that have some sufficiently small ε-neighborhood that is contained entirely within S. It should be clear that a set S is open if S = int(S).

The closure of a set S is defined as cl(S) = {x ∈ R^n | ∀ε > 0, S ∩ Nε(x) ≠ ∅}. In other words, it is the set of all points in R^n for which an arbitrarily small ε-neighborhood can still not be separated from S (obviously the closure also includes the interior of S), i.e., it is the set of all points that are in, or can be made "arbitrarily close to," the set S. It should be clear that a set S is closed if S = cl(S).

The set of boundary points of a set S (denoted by ∂S) is defined as the set of all points x ∈ R^n such that for every ε > 0, Nε(x) has at least one point that belongs to S and one that does not belong to S. A closed set contains all of its boundary points.

Note that int(S) ⊆ S ⊆ cl(S).
Examples:
S = {x ∈ R | a ≤ x ≤ b} is the closed set [a, b]. Note that int(S) = {x ∈ R | a < x < b}, cl(S) = S and ∂S = {a, b}.

S = {x ∈ R | a < x < b} is the open set (a, b). Note that int(S) = S, while cl(S) = {x ∈ R | a ≤ x ≤ b} and ∂S = {a, b}.

S = {x ∈ R | a < x ≤ b} is the set (a, b], which is neither open nor closed. Note that int(S) = {x ∈ R | a < x < b}, cl(S) = {x ∈ R | a ≤ x ≤ b} and ∂S = {a, b}.

A set S is said to be bounded if it can be contained within a ball of finite radius, i.e., for all x ∈ S, ‖x‖ < M for some M > 0. Note that all three sets above are bounded, whereas the set S = {x ∈ R | x ≥ 0} is closed but unbounded.
A set S is said to be compact if it is both closed as well as bounded.
SEQUENCES
A sequence of vectors {x^k} (= x^1, x^2, …) is said to converge to a limit point x* if ‖x^k − x*‖ → 0 as k → ∞, i.e., given ε > 0, ∃ a positive integer N(ε) such that ‖x^k − x*‖ < ε for all k ≥ N(ε). We then write lim_{k→∞} x^k = x*.

A subsequence of {x^k} is a subset of x^1, x^2, …, and is denoted by {x^k}_{k∈K}, where K ⊆ {1, 2, 3, …}.

Bolzano-Weierstrass Theorem: Every sequence {x^k} in a compact set S has a convergent subsequence with limit x* in S.

If, for a sequence, given any ε > 0, ∃N(ε) > 0 such that ‖x^k − x^m‖ < ε ∀ k, m ≥ N(ε), then the sequence is called a Cauchy sequence. A sequence in R^n has a limit if, and only if, it is Cauchy.
FUNCTIONS
Given a domain D ⊆ R^n, a function f is a mapping f: D → R that maps each point in the domain to a real number. A function f is said to be continuous at a point x0 ∈ D if ∀ε > 0, ∃δ > 0 such that y ∈ D, ‖x0 − y‖ < δ ⟹ |f(y) − f(x0)| < ε.

A simpler way to state this is that given an ε-neighborhood of f(x0), we can find a δ-neighborhood of x0 such that any point in this δ-neighborhood leads to a function value that is in the ε-neighborhood of f(x0). An equivalent definition of continuity: if f is continuous at x0 ∈ D, then any sequence that converges to x0 also has the corresponding function values converging to f(x0).

Fact: Continuous functions attain their maxima and minima over compact sets. For example, Max f(x) = x s.t. 0 ≤ x ≤ 1 yields a maximum at x = 1, since the maximization is over a compact set. On the other hand, for the problem Max f(x) = x s.t. 0 ≤ x < 1, the maximization is over a non-compact set, and thus even though the supremum of f(x) is equal to 1, it is never attained at any x, i.e., there is no maximizing point. Note that sometimes the maxima or minima over non-compact sets may still be attained (e.g., the minimum for the latter problem is attained at x = 0).

(Figure: continuity of f at x0 illustrated via the δ- and ε-neighborhoods.)
Differentiability

Let S be a nonempty set in R^n. A function f: S → R is differentiable at a point x0 ∈ int(S) if for all h ∈ R^n such that (x0 + h) ∈ S, there exist
(a) a vector ∇f(x0), and
(b) a function α(x0, h) satisfying lim_{h→0} α(x0, h) = 0,
such that

  f(x0 + h) = f(x0) + [∇f(x0)]^T h + ‖h‖·α(x0, h)   (*)

Here ∇f(x0) = [∂f(x0)/∂x1, ∂f(x0)/∂x2, …, ∂f(x0)/∂xn]^T is called the gradient vector.

Note that the above representation of f as given in (*) is called a first-order Taylor series expansion of f at the point x0, and without the remainder term ‖h‖·α(x0, h) it is called a first-order Taylor series approximation of f at x0. An equivalent form is

  f(x) ≈ f(x0) + [x − x0]^T ∇f(x0).

Another important concept: Using the same definitions for f and h as above, f is said to be twice differentiable at x0 ∈ int(S) if, in addition to ∇f(x0) and α(x0, h) as defined above, there exists an n×n symmetric matrix H(x0), called the Hessian, such that

  f(x0 + h) = f(x0) + [∇f(x0)]^T h + (1/2) h^T H(x0) h + ‖h‖²·α(x0, h)   (**)

Here H(x0) has as its (i,j)th element the second partial derivative ∂²f(x0)/∂xi∂xj.

Again, the above representation of f as given in (**) is called a second-order Taylor series expansion of f at the point x0, and without the remainder term ‖h‖²·α(x0, h) it is called a second-order Taylor series approximation of f at x0. An equivalent form is

  f(x) ≈ f(x0) + [x − x0]^T ∇f(x0) + (1/2)[x − x0]^T H(x0) [x − x0].
Mean Value Theorem: Let S be a nonempty convex open set in R^n and let f: S → R be differentiable on S. Then ∀ x1, x2 ∈ S, ∃λ ∈ (0,1) such that, with x = λx1 + (1−λ)x2,

  f(x2) = f(x1) + [x2 − x1]^T ∇f(x), i.e., f(x2) − f(x1) = [x2 − x1]^T ∇f(x).

(Figure: an illustration for a univariate function — the tangent at x is parallel to the chord through (x1, f(x1)) and (x2, f(x2)).)

Taylor's Theorem: Let S be a nonempty convex open set in R^n and let f: S → R be twice differentiable on S. Then ∀ x1, x2 ∈ S, ∃λ ∈ (0,1) such that, with x = λx1 + (1−λ)x2,

  f(x2) = f(x1) + [x2 − x1]^T ∇f(x1) + (1/2)[x2 − x1]^T H(x)[x2 − x1].

Functions with continuous first derivatives are said to be continuously differentiable; functions with continuous first and second derivatives are said to be twice-continuously differentiable, etc. In general, the class of functions with continuous derivatives of orders 1 through k is denoted by C^k. Functions with high degrees of differentiability are said to be "smooth" functions.
NUMERICAL SOLUTION OF EQUATIONS
Used when analytical solutions are not possible; e.g.,
2x - sin x = 0.5,
x2 + x log x= 1
x4 + 5 - 2x = 0
Methods are iterative and provide successively better approximations at each
iteration.
NEWTON-RAPHSON METHOD

To solve f(x) = 0:
1) Start with x0, an initial 'guess.'
2) If |f(x0)| > 0, then we try to find a 'correction,' say Δx, so that |f(x0 + Δx)| < |f(x0)|.
3) Let x1 = x0 + Δx, our new (and better) approximation. Repeat the procedure until |f(xn)| = |f(x*)| ≈ 0 ('sufficiently' close to zero).

Q. How do we get Δx?
A. Use a TAYLOR SERIES expansion about x0:

  f(x0 + Δx) = f(x0) + Δx·f′(x0) + (1/2)(Δx)²·f″(x0) + …

We want the left-hand side to be zero, and for small Δx the second- and higher-order terms are negligible. Thus, at iteration i we set the first-order Taylor series approximation to zero and solve:

  f(xi) + Δx·f′(xi) ≈ 0 ⟹ Δx = −f(xi)/f′(xi).  Then xi+1 = xi + Δx.
GEOMETRIC INTERPRETATION: We look for x* such that f(x*) = 0, i.e., where the graph of f intersects the x-axis. Note that xi+1 = xi + Δx is the intersection of the tangent to the curve at [xi, f(xi)] (whose slope is f′(xi)) with the x-axis.

(Figure: successive tangent lines at xi, xi+1, … approaching the root x*.)

Example: Given a > 0, find √a using only the operations (+, −, ×, ÷).
Let x = √a ⟹ x² − a = 0. To solve the equation f(x) = x² − a = 0, let xi be the guess at iteration i. Then

  0 ≈ f(xi) + Δx·f′(xi) = (xi² − a) + Δx·(2xi)
  ⟹ Δx = (a − xi²)/(2xi)

and the improved guess is

  xi+1 = xi + Δx = 0.5·(xi + a/xi).

e.g., suppose a = 5, the initial guess is x0 = 2, and we use a tolerance of 0.001:

  i   xi       f(xi)     xi+1
  0   2        -1        0.5(2 + 5/2) = 2.25
  1   2.25     0.0625    0.5(2.25 + 5/2.25) = 2.2361
  2   2.2361   0.00014   STOP

Suppose a = 2 and the initial guess is x0 = 1:

  i   xi       f(xi)     xi+1
  0   1        -1        0.5(1 + 2/1) = 1.5
  1   1.5      0.25      0.5(1.5 + 2/1.5) = 1.4167
  2   1.4167   0.00704   0.5(1.4167 + 2/1.4167) = 1.4142
  3   1.4142   -0.00004  STOP
WHEEL-COUNTERWEIGHT PROBLEM

  r = radius of the wheel
  M = unbalanced moment
  t = thickness of the counterweight
  ρ = density of the counterweight
  d = lever arm; wheel center to center of gravity of the counterweight
  θ = half-angle subtending the arc
  A = area of the counterweight

PROBLEM: Design a sector-shaped counterweight for a locomotive driver wheel to balance a moment M; e.g.,
  M = 600 Newton-m,
  t = 0.1 m,
  ρ = 7000 kg/m³,
  r = 1.0 m.

(Figure: wheel of radius r with the sector-shaped counterweight; the c.g. of the counterweight lies at lever arm d from the wheel center.)
Moment (M) = (Force)·(Distance) = (Mass)·(g)·(Distance) = (Volume)·(Density)·(g)·(Distance); i.e.,

  M = (At)·(ρ)·(g)·(d).

Note that the chord subtending the arc has length 2r·sinθ, and cosθ = d/r. Hence the counterweight area A is given by

  A = 2{r²θ − (1/2)(2r·sinθ)·d}   [note that the c.g. is "inside" the counterweight]
    = 2{r²θ − r²·sinθ·cosθ} = r²(2θ − 2·sinθ·cosθ) = r²(2θ − sin 2θ).

Thus we have M = (At)ρgd = tρg·r³·cosθ·(2θ − sin 2θ). SOLVE FOR θ…

ANALYTICAL APPROACH:

  cosθ·(2θ − sin 2θ) = M/(tρg·r³) — CLEARLY IMPOSSIBLE TO SOLVE in closed form!
NEWTON-RAPHSON APPROACH

Let c = M/(tρg·r³) (from the given values). Then we solve

  f(θ) = cosθ·(2θ − sin 2θ) − c = 0

with derivative

  f′(θ) = cosθ·(2 − 2·cos 2θ) − sinθ·(2θ − sin 2θ).

If θi is the approximation at iteration i, then Δθ = −f(θi)/f′(θi), and the improved approximation at the next iteration is

  θi+1 = θi − f(θi)/f′(θi).

(Figure: plot of f(θ) over 0 ≤ θ ≤ 1.6; f ranges from about −0.2 to 0.7 and crosses zero twice.)
#include <iostream>
#include <cmath>
using namespace std;

float c, m, t, rho, r, tol, x, f, fprm;

// Derivative of f(theta) = cos(theta)*(2*theta - sin(2*theta)) - c
float deriv(float y)
{
    return (-sin(y)*(2*y - sin(2*y)) + cos(y)*(2 - 2*cos(2*y)));
}

int main()
{
    cout << "Enter unbalanced moment (M) in Newton-meters: ";          cin >> m;
    cout << "Enter counterweight thickness (T) in meters: ";           cin >> t;
    cout << "Enter counterweight material density (RHO) in kg/cu.m.: "; cin >> rho;
    cout << "Enter radius of flywheel (R) in m: ";                     cin >> r;
    cout << "Enter maximum allowable unbalanced moment in N-m: ";      cin >> tol;
    cout << "Enter initial guess for theta in radians: ";              cin >> x;
    cout << "\n";
    tol = tol/(t*rho*9.81f*(r*r*r));
    c = m/(t*rho*9.81f*(r*r*r));
    // Evaluate f(theta) at the current approximation
    f = (cos(x)*(2*x - sin(2*x))) - c;
    do {
        cout << "theta = " << x << "\tf(theta) = " << f << endl;
        // Evaluate the derivative of f(theta) at the current approximation
        fprm = deriv(x);
        // Evaluate and add the Newton correction
        x = x + (-f/fprm);
        // Find the new function value
        f = (cos(x)*(2*x - sin(2*x))) - c;
    } while (fabs(f) > tol);
    cout << "theta = " << x << "\tf(theta) = " << f << endl;
    float ff = fabs(f)*t*rho*9.81f*(r*r*r);
    cout << "\nFinal unbalanced moment is:\t" << ff;
    return 0;
}
c:\Users\rajgopal\Documents\CPP>newtrap
Enter unbalanced moment (M) in Newton-meters: 600
Enter counterweight thickness (T) in meters: 0.1
Enter counterweight material density (RHO) in kg/cu.m.: 7000
Enter radius of flywheel (R) in m: 1
Enter maximum allowable unbalanced moment in N-m: 0.001
Enter initial guess for theta in radians: 1

theta = 1          f(theta) = 0.501935
theta = 0.180515   f(theta) = -0.079709
theta = 0.815954   f(theta) = 0.346872
theta = 0.46643    f(theta) = 0.0283223
theta = 0.423793   f(theta) = 0.00186569
theta = 0.420555   f(theta) = 1.09459e-05
theta = 0.420536   f(theta) = -1.85746e-09

Final unbalanced moment is: 0

c:\Users\rajgopal\Documents\CPP>newtrap
Enter unbalanced moment (M) in Newton-meters: 600
Enter counterweight thickness (T) in meters: 0.1
Enter counterweight material density (RHO) in kg/cu.m.: 7000
Enter radius of flywheel (R) in m: 1
Enter maximum allowable unbalanced moment in N-m: 0.001
Enter initial guess for theta in radians: 1.4

theta = 1.4       f(theta) = 0.331597
theta = 1.58746   f(theta) = -0.140825
theta = 1.54445   f(theta) = -0.00738953
theta = 1.54193   f(theta) = -2.57154e-05
theta = 1.54192   f(theta) = -4.93854e-08

Final unbalanced moment is: 0
SECANT METHOD [TO SOLVE f(x) = 0]

The Newton-Raphson method requires the derivative f′(x), which is sometimes difficult to compute...

Let x0 and x1 be two initial "guesses." To find x2:
1) Draw the secant through [x0, f(x0)] and [x1, f(x1)].
2) Let its intersection with the x-axis be at x2.
3) Go to Step 1 with x1 and x2 as the two new guesses…

From the properties of similar triangles,

  (x2 − x1)/(0 − f(x1)) = (x1 − x0)/(f(x1) − f(x0)).

Solving for x2 we have

  x2 = x1 − f(x1)·(x1 − x0)/(f(x1) − f(x0)).

(Figure: the secant through [x0, f(x0)] and [x1, f(x1)] crossing the x-axis at x2.)
MINIMIZATION OF A FUNCTION OF A SINGLE VARIABLE (WITHOUT USING DERIVATIVES)

Analytical expressions for f(x) may be unknown (e.g., f(x) is measured in a laboratory), or f(x) may not be differentiable everywhere.

The result of minimization will be an "interval of uncertainty" which contains the optimum, and we assume that an initial interval of uncertainty is specified -- say [a0, b0]. The method stops when the interval is "sufficiently" small.

We assume that f is UNIMODAL.

(Figure: a unimodal f(x) with a single minimum; a non-unimodal f(x) with a local minimum in addition to the global minimum.)

These methods are usually used to solve "subproblems" within larger multi-variable problems.
THREE-POINT EQUI-INTERVAL SEARCH

A simple but rather inefficient approach...

Let [ai, bi] be the interval of uncertainty at iteration i. Let ci = (ai + bi)/2 -- the midpoint -- and let f(ai), f(bi), f(ci) be known. Find the midpoints of the two new subintervals [ai, ci] and [ci, bi] via

  di = (ai + ci)/2,   ei = (ci + bi)/2.

Evaluate f(di) and f(ei), and consider the relative magnitudes of f at the five points a, d, c, e, b. For each of Cases 1 through 6 (see the figures, each showing a different ordering of the five function values), determine the new interval of uncertainty and set ai+1 and bi+1 accordingly.

Repeat the procedure until the interval of uncertainty is smaller than some pre-specified tolerance ε, i.e., |bn − an| < ε.

At each iteration, the interval of uncertainty is (usually) halved (for example in Cases 1-3); occasionally it may be reduced by more than 50% (Cases 4-5) or less than 50% (Case 6). HOWEVER, we typically need TWO function evaluations at each iteration -- this may be quite expensive...

A more efficient search method is the so-called GOLDEN SECTION SEARCH, which tries to reduce the number of function evaluations required.
GOLDEN SECTION SEARCH

Again, we have an interval of uncertainty [ai, bi] at iteration i, along with a point ci in the interior. HOWEVER, ci is NOT the midpoint of the interval...

We choose ci along with another interior point di such that ci and di are symmetric about the midpoint of [ai, bi].

Consider Cases 1 through 5 shown in the figures (each an ordering a, c, d, b with different relative function values): find [ai+1, bi+1] for each of these five cases.
The interval is (usually) reduced by less than half at each step (i.e., the new interval is larger than in the 3-point equi-interval search). HOWEVER, only ONE new function evaluation is required!

QUESTION: How do we locate ci inside [ai, bi]?

Let Δ be the length of the current interval and Δ′ the length of the next interval, and let α and β be constants with α + β = 1. Place c at a distance αΔ from a, and d at a distance αΔ from b (so that c and d are symmetric about the midpoint). Whichever end subinterval is discarded, the next interval has length

  Δ′ = βΔ = (1 − α)Δ, i.e., Δ′/Δ = 1 − α.   (*)

For the retained interior point to again lie at a distance αΔ′ from the end of the new interval (Cases 1 and 2 in the figures), we must also satisfy

  (1 − 2α)Δ = αΔ′, i.e., Δ′/Δ = (1 − 2α)/α.   (**)

So from (*) and (**) it follows that (1 − 2α)/α = 1 − α, i.e., α² − 3α + 1 = 0

  α = (3 ± √5)/2. Choose the "−" sign; "+" leads to α > 1:

  α = (3 − √5)/2 = 0.381966
  β = 1 − α = 0.618034

The ratio (α + β)/β = β/α = 1.618034 is called the GOLDEN RATIO.
Q. WHAT IS THE IMPLICATION?
A. If ci (di) is placed at a distance of 0.381966·(bi − ai) units from ai (bi), then at the beginning of iteration n, cn (dn) will automatically be at a distance of 0.381966·(bn − an) from an (bn).

Here, at each iteration the interval of uncertainty is reduced by approximately 38% and there is only ONE new function evaluation per iteration.

Compare Golden Section to the 3-point Equi-Interval Search:

  No. of evaluations   Interval of Uncertainty     Interval of Uncertainty
  of f(x)              (3-pt. Equi-Interval)       (Golden Section)
   3                   1.00                        1.00
   4                                               (0.618)^1  = 0.618
   5                   (0.5)^1 = 0.5               (0.618)^2  = 0.382
   6                                               (0.618)^3  = 0.236
   7                   (0.5)^2 = 0.25              (0.618)^4  = 0.146
   8                                               (0.618)^5  = 0.090
   9                   (0.5)^3 = 0.125             (0.618)^6  = 0.0557
  10                                               (0.618)^7  = 0.0344
  11                   (0.5)^4 = 0.0625            (0.618)^8  = 0.0213
  12                                               (0.618)^9  = 0.0132
  13                   (0.5)^5 = 0.03125           (0.618)^10 = 0.00813
  14                                               (0.618)^11 = 0.00503
  15                   (0.5)^6 = 0.0156            (0.618)^12 = 0.00311
  16                                               (0.618)^13 = 0.00192
  17                   (0.5)^7 = 0.0078            (0.618)^14 = 0.00119
ELECTRICAL CABLE INSULATION

(Figure: cable cross-section -- conductor of radius r, insulation, and sheath of inner radius R.)

Let
  λ = charge per unit length,
  ε = dielectric constant of the insulation,
  V = line voltage applied = (λ/(2πε))·ln(R/r).

Maximum electrical stress = λ/(2πεr)
Utilization = U = (Line Voltage)/(Maximum Electrical Stress)

  ⟹ U = [(λ/(2πε))·ln(R/r)] / [λ/(2πεr)] = r·ln(R/r)

PROBLEM: Proportion R with r so as to maximize the "utilization" for a given volume (cross-sectional area) of the insulation --- this allows a 'maximum' safe voltage for the cable for a given stress (and hence presumably the most power).

A = area = π(R² − r²) = πr²(R²/r² − 1) = πr²(x² − 1), where we define x = R/r.

Also, U = r·ln(R/r) ⟹ r = U/ln x. Therefore

  A = π(U/ln x)²(x² − 1) ⟹ U = ln x · √(A/(π(x² − 1))).

To maximize U, we perform the equivalent task of minimizing 1/U...

  1/U = √(c(x² − 1)) / ln x, where c = π/A.

(It turns out that this function is unimodal in x.)

Thus one could use (1) Three-point Equi-interval Search, or (2) Golden Section Search.
FIBONACCI SEARCH

Named after Leonardo of Pisa, son of Bonacci (hence Fibonacci!), in 1202 A.D. Demonstrated initially with rabbit-breeding...

ASSUME:
  Each pair of mature rabbits produces one pair per litter
  Exactly ONE litter per month
  Maturation requires 1 month

How many pairs of rabbits are there at the end of each month, starting with one newborn pair in January?

  Month      J   F   M   A   M   J   J   A   S   O
  Immature   1   0   1   1   2   3   5   8  13  21
  Mature     0   1   1   2   3   5   8  13  21  34
  TOTAL      1   1   2   3   5   8  13  21  34  55

FIBONACCI NUMBERS: 1, 1, 2, 3, 5, 8, 13, 21, ...
In general, F0 = F1 = 1 and FN+1 = FN + FN−1.
The Fibonacci numbers are used for the Fibonacci Search. It also starts with 2 initial evaluations, and requires only ONE subsequent evaluation per iteration. HOWEVER, the interval of uncertainty is not reduced by the same amount each time. Also, the number of function evaluations planned (say n) is determined before the search commences.

Suppose at iteration k we have the interval Ik = [ak, bk] of length Δk = bk − ak. Let

  ck = ak + (F(n−k−1)/F(n−k+1))·Δk,   dk = ak + (F(n−k)/F(n−k+1))·Δk.

Then the new interval of uncertainty Ik+1 will usually be [ak, dk] or [ck, bk].
In the first case,  Δk+1 = dk − ak = (F(n−k)/F(n−k+1))·Δk.
In the second case, Δk+1 = bk − ck = (F(n−k)/F(n−k+1))·Δk. (VERIFY!!)
In either case Δ is reduced by a factor of F(n−k)/F(n−k+1).

EXERCISE: At iteration k+1, [ak+1, bk+1] = (i) [ck, bk] or (ii) [ak, dk]. Show that
  (i) ck+1 = dk  or  (ii) dk+1 = ck.
Thus only ONE new function evaluation is required at the next step.
QUESTION: How should we choose n?

Let l = required final interval of uncertainty (tolerance). We know that

  Δ2 = (F(n−1)/F(n))·Δ1,
  Δ3 = (F(n−2)/F(n−1))·Δ2,

etc., etc., etc., so that the ratios telescope:

  Δn = (F(1)/F(2))·Δn−1 = (F(1)/F(n))·Δ1 = Δ1/F(n)   [since F(1) = 1].

So, if we want Δn to be < l, then we must choose Δ1/F(n) < l, i.e., F(n) > Δ1/l. Thus we pick the first n such that this condition is satisfied.
The Fibonacci search then proceeds as follows:

1) Start with a1, b1 (where b1 − a1 = Δ1). Then find c1 and d1 using the formulas given earlier. Set k = 0.
2) Set k = k + 1. Evaluate f(ak), f(bk), f(ck) and f(dk) and eliminate a portion of the interval. In general, we will have one of two cases:
   (i)  ak+1 = ak, bk+1 = dk, dk+1 = ck, and ck+1 is computed via the formula given earlier, or
   (ii) ak+1 = ck, bk+1 = bk, ck+1 = dk, and dk+1 is computed via the formula given earlier.
3) If k ≤ n − 1, go to Step 2.
4) When k = n − 1, the two formulas for ck and dk yield the same value ck = dk = ak + (1/2)(bk − ak), since F0 = F1 = 1. So no further interval reduction is possible. Consequently, we place cn−1 (or dn−1, as the case may be…) at a distance ε from the existing interior point.
5) Repeat the procedure of Step 2 for this last interval and obtain the final interval of uncertainty [an, bn].

NOTE: The quantity ε is referred to as a "distinguishability constant." We will have bn − an = (b1 − a1)/Fn + δ, where δ = 0 or δ = ε.
Example with n = 6 (so Δ1 = 13l, since F6 = 13):

  ITER 1:  Δ1 = 13l    a1  c1  X*  d1  b1
  ITER 2:  Δ2 = 8l     a2  X*  c2  d2  b2
  ITER 3:  Δ3 = 5l     a3  X*  c3  d3  b3
  ITER 4:  Δ4 = 3l     a4  c4  X*  d4  b4
  ITER 5:  Δ5 = 2l     a5  c5  d5  X*  b5   (note that d5 is the midpoint of [a5, b5]…)
  ITER 6:  Δ6 = l      (final interval)

Note that the length of the final interval is (1/F6)·(b1 − a1) + ε = l + ε, since the final interval is [c5, b5] for this example. If the optimum had been in the final interval [a5, d5], then its length would have been l = (1/F6)·(b1 − a1).

Also, note that Δk = Δk+1 + Δk+2; thus we start with Δ1 = Δ2 + Δ3. Since Fn = Fn−1 + Fn−2, we have 1 = F(n−1)/F(n) + F(n−2)/F(n), and correspondingly I1 = I2 + I3, where Δ2 = (F(n−1)/F(n))·Δ1 and Δ3 = (F(n−2)/F(n))·Δ1.
EXAMPLE: Suppose that the initial interval is [1.05, 4.00], so that Δ1 = 2.95, and we need to reduce the final interval of uncertainty to l = 0.01.

Since Δ1/l = 295, the smallest value of n for which Fn exceeds 295 is n = 13 (…F13 = 377). So we will plan on 13 iterations. The search might perhaps then proceed as follows:

  a1 = 1.05
  c1 = 1.05 + (F11/F13)(2.95) = 1.05 + (144/377)(2.95) = 2.1768
  d1 = 1.05 + (F12/F13)(2.95) = 1.05 + (233/377)(2.95) = 2.8732
  b1 = 4.00

  a2 = 1.05
  c2 = 1.05 + (F10/F12)(2.8732 − 1.05) = 1.05 + (89/233)(1.8232) = 1.7464
  d2 = 2.1768
  b2 = 2.8732

  a3 = 1.7464
  c3 = 2.1768
  d3 = 1.7464 + (F10/F11)(2.8732 − 1.7464) = 1.7464 + (89/144)(1.1268) = 2.4428
  b3 = 2.8732

  etc. etc. etc…
LAGRANGE'S INTERPOLATING POLYNOMIALS

Assume that we are given the n+1 values x0, x1, …, xn. Define

  l_j(x) = Π_{i≠j} (x − xi)/(xj − xi)
         = [(x − x0)…(x − x_{j−1})(x − x_{j+1})…(x − xn)] / [(xj − x0)…(xj − x_{j−1})(xj − x_{j+1})…(xj − xn)]

Some properties of l_j(x):
1. l_j(x) is a polynomial of degree n
2. l_j(xj) = 1
3. l_j(xi) = 0 for i ≠ j

Lagrange's interpolating polynomial is

  p(x) = Σ_{j=0}^{n} f(xj)·l_j(x).

Note that p(xi) = f(xi) for i = 0, 1, …, n.
QUADRATIC INTERPOLATION OF A MINIMUM

Given a, b, c and f(a), f(b), f(c), we wish to find a point d and f(d). The interpolating polynomial is

  p(x) = f(a)·(x − b)(x − c)/[(a − b)(a − c)] + f(b)·(x − a)(x − c)/[(b − a)(b − c)] + f(c)·(x − a)(x − b)/[(c − a)(c − b)]

Note: p(a) = f(a), p(b) = f(b), p(c) = f(c).

Differentiating p(x) w.r.t. x and equating to zero, we obtain

  f(a)·(2x − b − c)/[(a − b)(a − c)] + f(b)·(2x − a − c)/[(b − a)(b − c)] + f(c)·(2x − a − b)/[(c − a)(c − b)] = 0

Solving for x:

  x = (1/2)·[ f(a)(b² − c²) + f(b)(c² − a²) + f(c)(a² − b²) ] / [ f(a)(b − c) + f(b)(c − a) + f(c)(a − b) ]

Now let d = x, then find f(d) and proceed as usual…
MULTI-DIMENSIONAL OPTIMIZATION WITHOUT DERIVATIVES: THE HOOKE-JEEVES METHOD
Used to solve the general problem:

  MINIMIZE f(x), where x = [x1, x2, …, xn]^T ∈ R^n.

At iteration i suppose we have the points xi and y1 (∈ R^n) and a set of search directions d1, d2, …, dn.
(NOTE: Usually these directions are the coordinate directions, but this is not necessary…)

STEP 1
Using y1, find the optimum solution to the problem:
  Minimize f(λ) = f(y1 + λd1) to obtain λ1.  Let y2 = y1 + λ1·d1.
Using y2, find the optimum solution to the problem:
  Minimize f(λ) = f(y2 + λd2) to obtain λ2.  Let y3 = y2 + λ2·d2, etc. etc…
Continue until yn+1 = yn + λn·dn. Let xi+1 = yn+1.
Compare xi+1 and xi; IF ‖xi+1 − xi‖ < ε THEN STOP, ELSE go to STEP 2.

STEP 2
Let d = xi+1 − xi; solve the subproblem:
  Minimize f(λ) = f(xi+1 + λd) to obtain λ*.
Let y1 = xi+1 + λ*·d and return to STEP 1.
(NOTE: At iteration 1, there is no y1; so use y1 = x1.)
====================X============X==================
AN EXAMPLE

Minimize f(x) = f(x1, x2) = (x1 − 2)⁴ + (x1 − 2x2)²

Start with x1 = [0, 3]^T, and use d1 = [1, 0]^T, d2 = [0, 1]^T.

ITERATION 1

STEP 1:  y1 = [0, 3]^T.  Minimize f(y1 + λd1) = f(λ, 3),
i.e., Minimize (λ − 2)⁴ + (λ − 2·3)²  ⟹  λ1 = 3.13.
Therefore y2 = y1 + λ1·d1 = [0, 3]^T + 3.13·[1, 0]^T = [3.13, 3.00]^T.
Minimize f(y2 + λd2) = f(3.13, 3 + λ),
i.e., Minimize (3.13 − 2)⁴ + [3.13 − 2(3 + λ)]²  ⟹  λ2 = −1.44.
Therefore y3 = y2 + λ2·d2 = [3.13, 3.00]^T − 1.44·[0, 1]^T = [3.13, 1.56]^T.
Let x2 = [3.13, 1.56]^T; ‖x2 − x1‖ is not negligible, so go to Step 2.

STEP 2
d = x2 − x1 = [3.13, 1.56]^T − [0, 3]^T = [3.13, −1.44]^T.
Minimize f(λ) = f([3.13, 1.56]^T + λ·[3.13, −1.44]^T)  ⟹  λ* = −0.10.
Therefore y1 = x2 + λ*·d = [3.13, 1.56]^T − 0.10·[3.13, −1.44]^T = [2.82, 1.70]^T.
------------------------------
ITERATION 2

STEP 1:  y1 = [2.82, 1.70]^T.  Minimize f(y1 + λd1) = f(2.82 + λ, 1.7). This yields λ1 = −0.12.
Therefore y2 = y1 + λ1·d1 = [2.70, 1.70]^T.
Minimize f(y2 + λd2) = f(2.7, 1.7 + λ)  ⟹  λ2 = −0.35.
Therefore y3 = y2 + λ2·d2 = [2.70, 1.35]^T.
Let x3 = [2.70, 1.35]^T; ‖x3 − x2‖ is not negligible, so go to Step 2.

STEP 2
d = x3 − x2 = [2.70, 1.35]^T − [3.13, 1.56]^T = [−0.43, −0.21]^T.
Minimize f(λ) = f([2.70, 1.35]^T + λ·[−0.43, −0.21]^T)  ⟹  λ* = 1.50.
Therefore y1 = x3 + λ*·d = [2.70, 1.35]^T + 1.50·[−0.43, −0.21]^T = [2.06, 1.04]^T.
------------------------------
ITERATION 3

STEP 1:  y1 = [2.06, 1.04]^T.  Minimize f(y1 + λd1) = f(2.06 + λ, 1.04).  ETC. ETC.

Note that the optimal solution to the problem is x* = [2, 1]^T, and we're headed that way...

GEOMETRICAL INTERPRETATION
STEP 1: Exploratory search to get xi+1 (via y1, y2, y3)
STEP 2: Line search along xi+1 − xi (to get y1 for the next iteration)

(Figure: contours of f(x) with the iterates x1 = [0, 3]^T, x2 = [3.13, 1.56]^T, x3 = [2.70, 1.35]^T and the intermediate points y1, y2, y3.)
CONVEXITY

A set S is said to be CONVEX if for any two points x1 and x2 in S, the line segment joining x1 and x2 also lies entirely in the set S; i.e., if x1 ∈ S, x2 ∈ S ⟹ λx1 + (1−λ)x2 ∈ S for all λ ∈ [0,1], then S is a convex set.

(Figure: a convex set S, and a nonconvex set S in which the line segment joining x1 and x2 is not entirely contained in S.)

In general, λ1·x1 + λ2·x2 is said to be a convex combination of x1 and x2 if λ1, λ2 ≥ 0 and λ1 + λ2 = 1.

CONVEX HULL: The convex hull of the set S, denoted by H(S), is the smallest convex set that contains S. It is the collection of all convex combinations of points in S.

(Figure: three example sets S and their convex hulls H(S).)
CONVEX FUNCTIONS

The EPIGRAPH of a function f — denoted by epi(f) — is the set of points on or above the graph of f(x).
The HYPOGRAPH of a function f — denoted by hyp(f) — is the set of points on or below the graph of f(x).

A function f is convex if, and only if, the epigraph of f is a convex set.

(Figure: a convex function whose epigraph epi(f) is convex, and a nonconvex function whose epigraph is not.)

Equivalently, f is convex if, and only if, for all λ ∈ [0,1],

  (1−λ)·f(x1) + λ·f(x2) ≥ f((1−λ)·x1 + λ·x2).

(Figure: some convex functions; the chord value F = (1−λ)f(x1) + λf(x2) lies above the graph between x1 and x2.)
SOME TERMINOLOGY AND SOME GENERALIZATIONS

1) EPIGRAPH: epi(f) = {(x, y) | x ∈ R^n, y ∈ R, y ≥ f(x)} ⊆ R^(n+1)
2) HYPOGRAPH: hyp(f) = {(x, y) | x ∈ R^n, y ∈ R, y ≤ f(x)} ⊆ R^(n+1)
3) LEVEL SET: Sα = {x ∈ R^n | f(x) ≤ α}
4) QUASICONVEXITY: Given λ ∈ [0,1], x1 ≠ x2, the function f is said to be quasiconvex if

  f(λx1 + (1−λ)x2) ≤ Max{f(x1), f(x2)}.

Clearly, every convex function is also quasiconvex, but the converse is not true. Quasiconvex functions can have discontinuities and local minima in addition to the global minimum.

Theorem: f(x) is quasiconvex ⟺ all of its level sets Sα are convex.
Theorem: Suppose f(x) is quasiconvex. If x* is a strict local minimum of f, it is also a strict global minimum of f.

(Figure: the level set Sα for a convex function, and for a function that is quasiconvex but not convex.)

NOTE: f: R^n → R convex ⟹ Sα is convex for all α.
We can further strengthen quasiconvexity via the following:
4a) Given λ∈(0,1) and x1 ≠ x2, the function f is said to be strongly quasiconvex if
f(λx1 + (1-λ)x2) < max{f(x1), f(x2)}
Theorem: Suppose f(x) is strongly quasiconvex. Then it attains its (global)
minimum at no more than one point.
We can weaken strong quasiconvexity slightly through the following:
4b) Given λ∈(0,1) and x1 ≠ x2 with f(x1) ≠ f(x2), the function f is said to be strictly
quasiconvex if f(λx1 + (1-λ)x2) < max{f(x1), f(x2)}
Theorem: Suppose f(x) is strictly quasiconvex. If x* is a local minimum of f, it is
also a global minimum of f.
4c) A function f is quasiconcave if –f is quasiconvex
4d) A function that is both quasiconvex as well as quasiconcave is said to be
quasimonotone.
[Figures: a strictly (but not strongly) quasiconvex function, a strongly (as well as strictly) quasiconvex function, and a quasimonotone function.]
5) PSEUDOCONVEXITY: f is said to be pseudoconvex on its domain if it is
differentiable everywhere and ∇fT(x1)(x2 - x1) ≥ 0 ⟹ f(x2) ≥ f(x1), where ∇f
is the gradient vector of f (i.e., [∇f]i = ∂f/∂xi)
Theorem: If the gradient vector of a pseudoconvex function is equal to zero at
some point, the point must be a (global) minimizer.
Note that every (differentiable) convex function is also pseudoconvex, and every
pseudoconvex function is strictly quasiconvex (and therefore every local
minimum is also a global one).
THEOREM: Suppose S is a nonempty open convex set in Rn and let f: S→R be
differentiable on S. Then f is (strictly) convex if, and only if, for any x'∈S,
f(x) ≥ (>) f(x') + ∇fT(x')(x - x') for each x∈S (with x ≠ x' in the strict case)
Note that if we are minimizing a convex f over x∈S then, given any point x'∈S, the
affine function f(x') + ∇fT(x')(x - x') bounds f from below. So the minimum of
this affine function over x∈S yields a lower bound on the optimum value of f.
This fact is often used in optimization algorithms.
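The lower-bounding idea can be sketched in one dimension; f(x) = x² and the linearization point below are illustrative choices, not from the notes:

```python
# Convex f and its gradient (one-dimensional illustration).
def f(x):
    return x * x

def grad(x):
    return 2 * x

x_prime = 1.5  # linearization point

def affine(x):
    # First-order (tangent) approximation of f at x_prime.
    return f(x_prime) + grad(x_prime) * (x - x_prime)

# The affine function never exceeds f at any sampled point:
grid = [-10 + 0.5 * i for i in range(41)]
print(all(f(x) >= affine(x) - 1e-12 for x in grid))  # True
```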
[Figure: a convex f(x) lying on or above its affine (tangent) lower bound.]
POSITIVE (NEGATIVE) SEMIDEFINITE (DEFINITE) MATRICES
Let H(x) be a square symmetric matrix. H is said to be POSITIVE
SEMIDEFINITE (DEFINITE) at x' if xTH(x')x ≥ (>) 0 for all x ≠ 0, and NEGATIVE
SEMIDEFINITE (DEFINITE) if xTH(x')x ≤ (<) 0 for all x ≠ 0.
If none of these conditions is satisfied, the matrix H is said to be INDEFINITE.
(Also, note here that having all entries of H ≥ 0 implies nothing by itself -- this is
not a sufficient condition for H to be positive semidefinite.)
Examples:
H = [2 0; 0 2] ⟹ xTHx = 2x1² + 2x2² > 0 for all x ≠ 0. Thus H is positive definite everywhere.
H = [2 2; 2 2] ⟹ xTHx = 2(x1 + x2)² ≥ 0 for all x. Thus H is positive semidefinite everywhere.
H = [2 0; 0 -2] ⟹ xTHx = 2x1² - 2x2², which may be of any sign. Thus H is indefinite.
H = [6 2 0; 2 12 1; 0 1 1] ⟹ xTHx = 6x1² + 12x2² + x3² + 4x1x2 + 2x2x3 > 0 for all x ≠ 0. Thus H is positive definite everywhere.
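These sign patterns can be spot-checked by evaluating the quadratic form directly; the sample vectors below are arbitrary choices, and sampling can refute but never prove definiteness:

```python
def quad_form(H, x):
    # Computes x^T H x for a matrix given as a list of rows.
    n = len(x)
    return sum(H[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

samples = [(1.0, 0.0), (0.0, 1.0), (1.0, -1.0), (2.0, 3.0), (-1.0, 2.0)]
H_pd  = [[2, 0], [0, 2]]    # 2*x1^2 + 2*x2^2   -> positive definite
H_psd = [[2, 2], [2, 2]]    # 2*(x1 + x2)^2     -> positive semidefinite
H_ind = [[2, 0], [0, -2]]   # 2*x1^2 - 2*x2^2   -> indefinite

print(all(quad_form(H_pd, x) > 0 for x in samples))    # True
print(all(quad_form(H_psd, x) >= 0 for x in samples))  # True (0 at (1, -1))
print(quad_form(H_ind, (1.0, 0.0)), quad_form(H_ind, (0.0, 1.0)))  # 2.0 -2.0
```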
TWICE DIFFERENTIABLE CONVEX FUNCTIONS
Suppose that f has second partial derivatives. Then ∇²f(x) = H(x) is called the
HESSIAN matrix of f, where Hij(x) = [∇²f(x)]ij = ∂²f(x)/∂xi∂xj.
QUADRATIC FORM: The quadratic form is a convenient way of representing
quadratic functions (polynomials of order no higher than 2). Specifically, it has
the form f(x) = ½xTHx + bTx, where H = ∇²f(x) is the (constant) Hessian matrix of the
function f(x), with Hij = ∂²f/∂xi∂xj.
Example 1: f(x1, x2) = ax1² + bx2² + cx1x2
⟹ H = [2a c; c 2b]
Example 2: f(x1, x2, x3) = ax1² + bx2² + cx3² + kx1x2 + lx1x3 + mx2x3 + p
⟹ H = [2a k l; k 2b m; l m 2c]
(the constant p has no effect on the Hessian)
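As a sketch of the link between a quadratic's coefficients and its Hessian, the snippet below estimates the Hessian of f(x1, x2) = ax1² + bx2² + cx1x2 by central finite differences (the coefficient values and evaluation point are arbitrary choices for illustration):

```python
def hess_fd(f, x, h=1e-3):
    # Central-difference estimate of the Hessian; exact up to rounding
    # when f is quadratic.
    def shift(p, i, d):
        q = list(p)
        q[i] += d
        return q
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            H[i][j] = (f(shift(shift(x, i, h), j, h))
                       - f(shift(shift(x, i, h), j, -h))
                       - f(shift(shift(x, i, -h), j, h))
                       + f(shift(shift(x, i, -h), j, -h))) / (4 * h * h)
    return H

a, b, c = 1.0, 3.0, 0.5  # illustrative coefficients
f = lambda x: a * x[0]**2 + b * x[1]**2 + c * x[0] * x[1]
H = hess_fd(f, [0.7, -1.2])
print([[round(v, 6) for v in row] for row in H])  # approximately [[2a, c], [c, 2b]]
```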
THEOREM
a) f: S⊆R→R; f convex ⟺ f''(x) ≥ 0 ∀x∈S
b) f: S⊆Rn→R; f convex ⟺ ∇²f(x) is POSITIVE SEMIDEFINITE ∀x∈S
c) f: S⊆Rn→R; ∇²f(x) is POSITIVE DEFINITE ∀x∈S ⟹ f strictly convex;
f strictly convex ⟹ ∇²f(x) is POSITIVE SEMIDEFINITE ∀x∈S
EIGENVALUES AND EIGENVECTORS
Given an n-row by n-column (n×n) symmetric matrix A, if we have a scalar λ and
a nonzero vector v∈Rn satisfying Av = λv, i.e., [A - λI]v = 0, then v is called
an eigenvector of A, and λ is called the corresponding eigenvalue of A.
To compute the eigenvalues, we solve
Det(A - λI) = |A - λI| = 0 for λ.
Then for each λi (in general we have n of them) we solve
(A - λiI)v = 0 to find the eigenvector v.
THEOREM: A symmetric matrix is positive (negative) semidefinite if, and only if, all its
eigenvalues are nonnegative (nonpositive); it is positive (negative) definite if, and only
if, all its eigenvalues are positive (negative).
THEOREM: A real n×n symmetric matrix has n (not necessarily distinct)
eigenvalues and n linearly independent eigenvectors.
Example: A = [2 2; 2 2]
Then |A - λI| = (2-λ)² - 4 = λ² - 4λ = 0.
λ(λ - 4) = 0 ⟹ the eigenvalues are λ1 = 4, λ2 = 0.
(Note that these being nonnegative, A is positive semidefinite.) Then the
eigenvectors are vT = [v1 v2], where
[2-λ 2; 2 2-λ][v1; v2] = 0
i.e., (2-λ)v1 + 2v2 = 0, 2v1 + (2-λ)v2 = 0
i.e., v = t[1 1]T for λ = 4, and v = t[1 -1]T for λ = 0,
where t is any arbitrary real number.
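The computation in this example can be reproduced with the closed-form roots of the 2×2 characteristic polynomial; the helper below is a sketch of our own, not notation from the notes:

```python
import math

def eig2(p, q, r):
    # Eigenvalues of the symmetric matrix [[p, q], [q, r]], from
    # lam^2 - (p + r)*lam + (p*r - q^2) = 0.
    tr, det = p + r, p * r - q * q
    disc = math.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

l1, l2 = eig2(2, 2, 2)   # the matrix A = [[2, 2], [2, 2]] from the example
print(l1, l2)            # 4.0 0.0
# The eigenvectors t*(1, 1) and t*(1, -1) are orthogonal:
print(1 * 1 + 1 * (-1))  # 0
```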
THEOREM: Two eigenvectors of a real symmetric matrix corresponding to
different eigenvalues are orthogonal, i.e., if λ1 ≠ λ2 then (v1)Tv2 = 0.
Further, the eigenvalues of such a matrix are always real.
DEFINITION: A principal minor of a (square) matrix is the determinant of the
(square) sub-matrix obtained by deleting a set of rows and the set of columns
with the same indices. The leading principal minor of order k is the determinant
of the sub-matrix formed by the first k rows and the first k columns.
THEOREM: A real symmetric matrix has eigenvalues that are all positive if, and
only if, all the leading principal minors are positive. If the eigenvalues are all
nonnegative, then the leading principal minors are all nonnegative; conversely, if ALL
possible principal minors (not only the leading ones) are nonnegative, then the
eigenvalues are all nonnegative.
The above theorem indicates that if some leading principal minor is ≤ 0, then H can
never be positive definite; and if some principal minor is < 0, then H can never be
positive semidefinite.
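The leading-principal-minor test can be sketched in code; the matrix below is a sample symmetric matrix chosen for illustration, and the cofactor-expansion determinant is a naive helper suitable only for small matrices:

```python
def det(M):
    # Naive cofactor-expansion determinant (fine for small matrices).
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def leading_principal_minors(H):
    # Determinants of the top-left k x k sub-matrices, k = 1..n.
    return [det([row[:k] for row in H[:k]]) for k in range(1, len(H) + 1)]

H = [[6, 2, 0], [2, 12, 1], [0, 1, 1]]  # sample symmetric matrix
minors = leading_principal_minors(H)
print(minors)                        # [6, 68, 62]
print(all(m > 0 for m in minors))    # True -> positive definite
```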
THEOREM: A real symmetric matrix A has eigenvalues that are all negative if, and
only if, (-1)^i |Ai| > 0 for i = 1,...,n, where |Ai| is the ith leading principal minor.
If the eigenvalues are all nonpositive, then each (-1)^i |Ai| is nonnegative; but for
the eigenvalues to be nonpositive, (-1)^i times every principal minor of order i must be
nonnegative, for ALL possible principal minors.
The last two theorems provide another way to check the definiteness of a
matrix (and thus to check for convexity...).
EXAMPLE 1: f(x1, x2) = -x1² - 5x2² + 2x1x2 + 10x1 - 10x2
∇f = [-2x1 + 2x2 + 10, 2x1 - 10x2 - 10]T and H = [-2 2; 2 -10]
|H - λI| = 0 ⟹ (-2-λ)(-10-λ) - 4 = 0 ⟹ λ = -6 ± 2√5
Since λ1 = -6 - 2√5 < 0 and λ2 = -6 + 2√5 < 0, both eigenvalues are negative,
and therefore the Hessian H is negative definite, implying f is strictly concave.
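A quick numerical cross-check of Example 1, using the closed-form eigenvalues of a symmetric 2×2 matrix (variable names are our own):

```python
import math

# H = [[-2, 2], [2, -10]]: trace = -12, determinant = 20 - 4 = 16.
p, q, r = -2.0, 2.0, -10.0
tr, det_ = p + r, p * r - q * q
disc = math.sqrt(tr * tr - 4 * det_)       # sqrt(144 - 64) = sqrt(80)
l1, l2 = (tr - disc) / 2, (tr + disc) / 2
print(l1, l2)             # about -10.472 and -1.528, i.e., -6 -/+ 2*sqrt(5)
print(l1 < 0 and l2 < 0)  # True -> negative definite
```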
EXAMPLE 2: f(x1, x2) = x1·e^-(x1+x2). Here, letting Q = e^-(x1+x2), the Hessian is
H = Q·[x1-2 x1-1; x1-1 x1]
Then |H - λI| = 0 ⟹ {Q(x1-2) - λ}(Qx1 - λ) - Q²(x1-1)² = 0
Solving for λ, we get (VERIFY...)
λ1 = Q[(x1-1) + √((x1-1)² + 1)] > 0
λ2 = Q[(x1-1) - √((x1-1)² + 1)] < 0
Therefore H is indefinite and f is neither convex nor concave.
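Assuming, as the Hessian entries in Example 2 suggest, that the function is f(x1, x2) = x1·e^-(x1+x2) (an inference from the entries shown, not stated explicitly in the notes), the indefiniteness can be spot-checked at a sample point:

```python
import math

x1, x2 = 0.5, 0.25                             # arbitrary sample point
Q = math.exp(-(x1 + x2))
p, q, r = Q * (x1 - 2), Q * (x1 - 1), Q * x1   # Hessian entries at this point
tr, det_ = p + r, p * r - q * q                # det = -Q^2 < 0 at every point
disc = math.sqrt(tr * tr - 4 * det_)
l1, l2 = (tr + disc) / 2, (tr - disc) / 2
print(l1 > 0 > l2)  # True: one positive, one negative eigenvalue -> indefinite
```

Because the determinant of the Hessian is -Q² < 0 everywhere, the same conclusion holds at any point, not just the sampled one.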