
ACM 113: Lecture 1 - Stanford University


Agenda

- What is a mathematical optimization problem?
- Linear and nonlinear programming
- First examples
- Least squares
- Convex optimization
- Nonlinear optimization
- What is this course about?


What is a mathematical optimization problem?

A mathematical optimization problem (or simply optimization problem) has the form

(P)   minimize   f0(x)
      subject to fi(x) ≤ bi,   i = 1, …, m

- x ∈ R^n is the optimization variable
- f0 : R^n → R is the objective function
- the functions fi are the constraints

A vector x⋆ is a solution to (P) if, for all x such that

fi(x) ≤ bi   for all i,

we have f0(x⋆) ≤ f0(x).
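To make the definition concrete, here is a small numerical sketch. The problem instance and all names are invented for illustration: a two-variable instance of (P) with one constraint, plus a sampling spot-check that a candidate x⋆ satisfies f0(x⋆) ≤ f0(x) over feasible points.

```python
import numpy as np

# Toy instance of (P), with invented data:
# minimize f0(x) = (x1 - 1)^2 + (x2 - 2)^2 subject to f1(x) = x1 + x2 <= 2.
def f0(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

def is_feasible(x):
    return x[0] + x[1] <= 2.0

# Candidate solution: the projection of the unconstrained minimizer (1, 2)
# onto the half-plane x1 + x2 <= 2.
x_star = np.array([0.5, 1.5])

# Spot-check the definition: f0(x_star) <= f0(x) for sampled feasible x.
rng = np.random.default_rng(0)
samples = rng.uniform(-3.0, 3.0, size=(10_000, 2))
feasible = samples[samples.sum(axis=1) <= 2.0]
assert all(f0(x_star) <= f0(x) + 1e-12 for x in feasible)
```

Sampling of course only falsifies, it does not prove optimality; the point is just to exercise the definition.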



Linear and nonlinear programming

Linear programming: the objective and the constraints are linear. A function f is linear if

f(x + λy) = f(x) + λ f(y)

for all x, y ∈ R^n and λ ∈ R.

Nonlinear programming: the objective or one of the constraints is not linear.

Convex programming: the objective functional and the constraint functionals are convex. A function f is convex if

f(λx + (1 − λ)y) ≤ λ f(x) + (1 − λ) f(y)

for all x, y ∈ R^n and 0 ≤ λ ≤ 1.

In this course, the emphasis is on convex programming/optimization.
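Both definitions are easy to test numerically. In this sketch (the function choices are mine, for illustration), a linear f satisfies the convexity inequality with equality, while a convex nonlinear f satisfies it with slack:

```python
import numpy as np

def chord_gap(f, x, y, lam):
    # Convexity requires f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y);
    # this gap is nonnegative exactly when the inequality holds.
    return lam * f(x) + (1 - lam) * f(y) - f(lam * x + (1 - lam) * y)

def f_linear(x):
    # Linear, hence also convex: the inequality holds with equality.
    return 3.0 * x

rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.uniform(-5.0, 5.0, size=2)
    lam = rng.uniform(0.0, 1.0)
    assert chord_gap(np.exp, x, y, lam) >= -1e-12   # exp is convex, not linear
    assert abs(chord_gap(f_linear, x, y, lam)) < 1e-9
```

This is the sense in which convexity is more general than linearity: linear functions sit exactly on the boundary of the convex inequality.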



Convexity is more general than linearity

[Figure: graph of a convex function f, showing the chord value λ f(x) + (1 − λ) f(y) lying above the function value f(λx + (1 − λ)y) between the points (x, f(x)) and (y, f(y)).]


Applications of mathematical optimization

Nearly all fields of science and engineering

Examples:
- Portfolio management
- Data fitting
- Regularization in signal processing


Examples of optimization problems

Least Squares

- No constraints
- Objective functional

  f0(x) = ‖b − Ax‖_2^2 = Σ_{i=1}^k (a_i^T x − b_i)^2

  where A ∈ R^{k×n} (typically n ≤ k) with rows a_1^T, …, a_k^T

- Solution

  (A^T A) x = A^T b  ⇒  x⋆ = (A^T A)^{-1} A^T b

  assuming A^T A is invertible (if not?)

- Many methods for solving least squares problems: fast and stable
- Mature technology
- Important and straightforward to recognize a least squares problem
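A minimal NumPy sketch of both solution routes, on random data chosen only for illustration: forming the normal equations directly, and calling `numpy.linalg.lstsq`, which uses an orthogonal-factorization route and answers the "if not?" question by returning a minimum-norm least squares solution when A^T A is singular.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 50, 3                           # typically n <= k (overdetermined)
A = rng.standard_normal((k, n))
b = rng.standard_normal(k)

# Normal equations: (A^T A) x = A^T b -- valid when A^T A is invertible,
# i.e. when A has full column rank.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# SVD-based solver: numerically more stable, and if A^T A is singular it
# returns the minimum-norm least squares solution instead of failing.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(x_normal, x_lstsq)
```

In practice the factorization-based solver is preferred over explicitly forming A^T A, which squares the condition number.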



Another example:

f0(x) = ‖b − Ax‖_2^2 + λ‖x‖_2^2

- Connected with statistical regularization (ridge regression)
- Trade-off between fit and the size of the x_j's
- E.g. a prior distribution on parameters in Bayesian statistics
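A sketch of the closed-form solution, on the same kind of invented data as before: adding λ‖x‖_2^2 changes the normal equations to (A^T A + λI)x = A^T b, which is invertible for every λ > 0, and larger λ shrinks the solution toward 0.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 50, 3
A = rng.standard_normal((k, n))
b = rng.standard_normal(k)

def ridge(A, b, lam):
    # Minimizer of ||b - Ax||_2^2 + lam * ||x||_2^2 solves
    # (A^T A + lam * I) x = A^T b; for lam > 0 the matrix is always
    # invertible, even when A^T A alone is singular.
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)

# The fit-vs-size trade-off: larger lam shrinks the solution norm.
norms = [np.linalg.norm(ridge(A, b, lam)) for lam in (0.0, 1.0, 100.0)]
assert norms[0] >= norms[1] >= norms[2]
```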


Linear programming

minimize   c^T x
subject to a_i^T x ≤ b_i,   i = 1, …, m

- No analytical solution
- Mature technology: fast and stable solvers
  - Simplex method (Dantzig, 1947)
  - Interior point methods (Karmarkar, 1984)

Using linear programming:
- Many problems can be recast as LPs
- Not as easy to recognize as least squares problems
- Example:

  minimize ‖b − Ax‖_∞ = minimize max_i |a_i^T x − b_i|
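"No analytical solution" does not mean no solution in practice; mature solvers are a library call away. A sketch using `scipy.optimize.linprog` on an invented two-variable LP (note that linprog imposes x ≥ 0 unless told otherwise, so the bounds are relaxed to match the formulation above):

```python
import numpy as np
from scipy.optimize import linprog

# Invented data for: minimize c^T x subject to a_i^T x <= b_i.
c = np.array([-1.0, -2.0])
A_ub = np.array([[1.0, 1.0],
                 [1.0, 0.0],
                 [0.0, 1.0]])
b_ub = np.array([3.0, 2.0, 2.0])

# linprog defaults to x >= 0; make the variables free to match (P).
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)
assert res.success
x_star, optimal_value = res.x, res.fun
```

The solver (HiGHS under the hood in recent SciPy) returns the optimal vertex and value; no formula for them exists, only algorithms.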



minimize ‖b − Ax‖_∞ = minimize max_i |a_i^T x − b_i|

- Objective is the maximum absolute deviation, not the mean squared deviation
- Objective functional is not differentiable

Recast as

minimize   t
subject to a_i^T x − b_i ≤ t
           a_i^T x − b_i ≥ −t

Important to recognize LPs:

minimize   Σ_{j=1}^n |x_j|
subject to a_i^T x ≤ b_i,   i = 1, …, m

Is this an LP?
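The epigraph recast above is mechanical to implement. A sketch on random data: stack the variables as z = (x, t) and feed the 2k inequalities to `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
k, n = 30, 2
A = rng.standard_normal((k, n))
b = rng.standard_normal(k)

# Variables z = (x, t).  The recast constraints a_i^T x - b_i <= t and
# a_i^T x - b_i >= -t become A x - t*1 <= b and -A x - t*1 <= -b.
c = np.zeros(n + 1)
c[-1] = 1.0                        # objective: minimize t
ones = np.ones((k, 1))
A_ub = np.vstack([np.hstack([A, -ones]),
                  np.hstack([-A, -ones])])
b_ub = np.concatenate([b, -b])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
assert res.success
x, t = res.x[:n], res.x[-1]
# The optimal t equals the actual l-infinity residual of the optimal x.
assert np.isclose(t, np.abs(b - A @ x).max(), atol=1e-6)
```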



Convex optimization

minimize   f0(x)
subject to x ∈ C or fi(x) ≤ bi

with f0 and fi convex (or C a convex set)

Solving convex problems:
- In general, no analytical solution
- Many reliable and efficient algorithms exist
- An almost mature technology

Using convex optimization:
- Convex problems often go unrecognized
- Many problems can be formulated as convex problems
- Important to learn the skills to cast nonlinear programs into convex programs
- Important to distinguish between convex and nonconvex problems



Solving optimization problems

General optimization problem:
- Very difficult to solve
- Typical methods involve a compromise; e.g. extremely long (prohibitive) computation time, or a solution is not found

Exceptions:
- Least squares problems
- LPs
- Convex optimization problems

“In fact, the great watershed in optimization isn’t between linearity and nonlinearity, but convexity and nonconvexity.”

— Rockafellar, SIAM Review, 1993



Nonlinear optimization

1. Nonconvex objective

[Figure: a nonconvex function f(x) with several local minima.]


2. Nonconvex constraints

minimize   c^T x
subject to Ax ≤ b

- Easy
- Feasible for n ∼ 100,000

minimize   c^T x
subject to Ax ≤ b and x_j ∈ {0, 1}

- Extremely hard
- Infeasible for n ≥ 40?

Not much is known about general nonlinear problems; more of an ‘art’ than a science and technology.
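The gap between the two problems can be seen on a tiny invented instance: dropping x_j ∈ {0, 1} to 0 ≤ x_j ≤ 1 gives an easy LP whose optimal value lower-bounds the 0/1 problem, which here is small enough to brute-force. (The data below are illustrative only.)

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

# Invented data for: minimize c^T x s.t. Ax <= b, x_j in {0, 1}.
c = np.array([-2.0, -3.0, -1.0])
A_ub = np.array([[3.0, 4.0, 2.0]])
b_ub = np.array([5.0])

# Easy: the LP with 0 <= x_j <= 1 instead of x_j in {0, 1}.
relaxed = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * 3)

# Hard in general: brute force over all 2^n binary vectors (fine for
# n = 3, hopeless for n >= 40).
best = min(c @ np.array(x) for x in product((0, 1), repeat=3)
           if (A_ub @ np.array(x) <= b_ub).all())

# The relaxation's optimal value lower-bounds the 0/1 optimum.
assert relaxed.success and relaxed.fun <= best + 1e-9
```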



Methods

Local optimization:
- Finds a local minimum
- Often gives no idea whether this is a global minimum, or how far it is from a global minimum
- Requires an initial guess
- Usually efficient methods

Global optimization:
- Worst-case complexity is exponential in the problem size
- Even small problems may not be practical
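A sketch of the local-method failure mode (the function is chosen by me for illustration): plain gradient descent on a one-dimensional nonconvex function converges to different local minima depending on the initial guess, with no warning that a better minimum exists.

```python
def f(x):
    return x**4 - 3 * x**2 + x     # nonconvex: two local minima

def fprime(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, step=0.01, iters=2000):
    # Local method: efficient, but the answer depends on x0.
    x = x0
    for _ in range(iters):
        x -= step * fprime(x)
    return x

left = gradient_descent(-2.0)      # lands in one basin
right = gradient_descent(2.0)      # lands in the other
assert abs(left - right) > 0.5     # different starts, different answers
assert f(left) < f(right)          # only one of them is global
```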



Insights from convex optimization

- Initialization for local optimization methods
- Convex heuristics for nonconvex problems

Example:

minimize   #{j : x_j ≠ 0}
subject to Ax = b

Combinatorial problem (practically impossible)

minimize   Σ_{j=1}^n |x_j|
subject to Ax = b

Convex problem. Sometimes, same solution (Candes et al., 2004)

- Bounds for global optimization: the dual of an optimization problem is always convex → gives a lower bound on the optimal value
- Relaxation: nonconvex constraints replaced with looser convex constraints
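The ℓ1 substitute is itself an LP. A sketch on random data (the split x = u − v with u, v ≥ 0 is a standard recast, not taken from the slides):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 10, 30                     # underdetermined system Ax = b
A = rng.standard_normal((m, n))
x_sparse = np.zeros(n)
x_sparse[[2, 11, 25]] = [1.5, -2.0, 0.7]   # a 3-sparse solution
b = A @ x_sparse

# Recast min sum_j |x_j| s.t. Ax = b as an LP: write x = u - v with
# u, v >= 0, so that sum(u + v) equals sum_j |x_j| at the optimum.
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b,
              bounds=[(0, None)] * (2 * n))
assert res.success
x_hat = res.x[:n] - res.x[n:]

# x_sparse is feasible, so the l1 optimum can only be smaller; whether
# x_hat recovers x_sparse exactly depends on the instance (Candes et al.).
assert np.abs(x_hat).sum() <= np.abs(x_sparse).sum() + 1e-6
```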



This course

- Recognize and formulate convex problems
- Develop algorithms and efficient code
- Learn about the useful theory

Topics:
- Convexity: convex sets, functions, optimization
- Algorithms
- Examples and applications
