BBO Example MADS Computational experiments Discussion References
The Mesh Adaptive Direct Search Algorithm for Discrete Blackbox Optimization
Sébastien Le Digabel, Charles Audet, Christophe Tribes
GERAD and École Polytechnique de Montréal
ICCOPT, Tokyo
2016-08-08
BBO: Algorithm and Applications 1/32
Presentation outline
Blackbox optimization
Motivating example
The MADS algorithm
Computational experiments
Discussion
Blackbox optimization (BBO) problems
- Optimization problem:
      min_{x ∈ Ω} f(x)
- Evaluations of f (the objective function) and of the functions defining Ω are usually the result of a computer code (a blackbox).
- Variables are typically continuous, but in this work some of them are discrete: integers or granular variables.
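Concretely, a blackbox is just a function call whose body is an external code. A minimal Python sketch (the `simulator` command in the comment and the stand-in quadratic objective are hypothetical, for illustration only):

```python
import subprocess  # used only in the commented-out real-world variant

def blackbox(x):
    """Evaluate f(x); the optimizer sees only the returned value."""
    # In practice this would invoke an external code, e.g.:
    #   out = subprocess.run(["./simulator"] + [str(v) for v in x],
    #                        capture_output=True, text=True)
    #   return float(out.stdout)
    # Stand-in objective so the sketch is runnable:
    return sum((v - 0.5) ** 2 for v in x)

print(blackbox([0.25, 0.75]))  # → 0.125
```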
Blackboxes as illustrated by J. Simonis [ISMP 2009]
Motivating example: Parameter tuning (1/2)
- [Audet and Orban, 2006].
- The classical Trust-Region algorithm depends on four parameters x = (η1, η2, α1, α2) ∈ R^4_+.
- Consider a collection of 55 test problems from CUTEr.
- Let f(x) be the CPU time required to solve the collection of problems by a Trust-Region algorithm with parameters x.
- f(x0) ≈ 3h45 with the textbook values x0 = (1/4, 3/4, 1/2, 2).
Motivating example: Parameter tuning (2/2)
[Diagram: the MADS algorithm proposes x ∈ R^4_+ to the blackbox, which runs the Trust-Region code on the collection and returns the CPU time f(x).]
- This optimization produced x with f(x) ≈ 2h50.
- Victory? No, because x = (0.22125, 0.94457031, 0.37933594, 2.3042969).
Granular variables
The initial point x0 = (0.25, 0.75, 0.50, 2.00) is frequently used because each entry is a multiple of 0.25. Its granularity is G = 0.25.
- How can we devise a direct search algorithm so that it stops on a prescribed granularity? With a granularity of G = 0.05, the code might produce x = (0.20, 0.95, 0.40, 2.30), which is much nicer (for a human) than x = (0.22125, 0.94457031, 0.37933594, 2.3042969).
- This may be achieved using integer variables together with a relative scaling, but there is a simpler way for mesh-based methods.
Mesh Adaptive Direct Search (MADS) in R^n
- [Audet and Dennis, Jr., 2006].
- An iterative algorithm that evaluates the blackbox at trial points lying on a spatial discretization called the mesh.
- One iteration = search and poll.
- The search allows trial points generated anywhere on the mesh.
- The poll consists in generating a list of trial points constructed from poll directions. These directions grow dense.
- At the end of the iteration, the mesh size is reduced if no new success point is found.
[0] Initializations (x0; ∆0: initial poll size)
[1] Iteration k
    Let δk ≤ ∆k be the mesh size parameter.
    Search:
        test a finite number of mesh points.
    Poll (if the Search failed):
        construct a set of directions Dk;
        test the poll set Pk = {xk + δk d : d ∈ Dk}, with ‖δk d‖ ≈ ∆k.
[2] Updates
    If success:
        xk+1 ← success point; increase ∆k.
    Else:
        xk+1 ← xk; decrease ∆k.
    k ← k + 1; stop if ∆k ≤ ∆min, or go to [1].
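The [0]-[1]-[2] skeleton above can be sketched in a few lines of Python. This is a deliberately simplified, poll-only illustration with fixed coordinate directions and a δk = ∆k² mesh rule, not the actual MADS (whose poll directions grow dense in the unit sphere):

```python
import itertools

def mads_sketch(f, x0, delta0=1.0, delta_min=1e-6):
    """Bare-bones poll-only direct search following the [0]-[1]-[2]
    skeleton (coordinate poll directions only; no search step)."""
    x, Delta = list(x0), delta0
    while Delta > delta_min:                 # stop if Delta_k <= Delta_min
        delta = min(Delta, Delta ** 2)       # mesh size delta_k <= Delta_k
        scale = round(Delta / delta)
        # Poll set: x_k + delta_k * d with ||delta_k * d|| ~ Delta_k
        dirs = [d for d in itertools.product((-scale, 0, scale), repeat=len(x))
                if any(d)]
        trial = min((tuple(xi + delta * di for xi, di in zip(x, d))
                     for d in dirs), key=f)
        if f(trial) < f(x):                  # success: move, enlarge Delta_k
            x, Delta = list(trial), min(1.0, 2 * Delta)
        else:                                # failure: shrink Delta_k
            Delta /= 4
    return x

x_star = mads_sketch(lambda x: (x[0] - 1/3) ** 2, [0.0])
print(x_star)  # close to 1/3
```

Even on this smooth one-dimensional problem, the final iterate carries the "ugly decimals" issue motivated earlier: the mesh is dyadic, so the answer is a long binary fraction near 1/3.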
Poll illustration (successive fails and mesh shrinks)
[Figure: three successive polls around xk. With δk = 1 and ∆k = 1, the trial points are {p1, p2, p3}; after a failure, δk+1 = 1/4 and ∆k+1 = 1/2 give trial points {p4, p5, p6}; after another failure, δk+2 = 1/16 and ∆k+2 = 1/4 give trial points {p7, p8, p9}.]
Discrete variables in MADS – so far
- MADS has been designed for continuous variables.
- Some theory exists for categorical variables [Abramson, 2004].
- So far, only a patch allows handling integer variables: rounding plus a minimal mesh size of one.
In this work, we start from scratch and present direct search methods with a natural way of handling discrete variables.
Mesh refinement on min (x − 1/3)^2

∆k            xk
1             0
0.5           0.5
0.25          0.25
0.125         0.375
0.0625        0.3125
0.03125       0.34375
0.015625      0.328125
0.0078125     0.3359375
0.00390625    0.33203125
0.001953125   0.333984375

alternately

∆k      xk
1       0
0.5     0.5
0.2     0.4
0.1     0.3
0.05    0.35
0.02    0.34
0.01    0.33
0.005   0.335
0.002   0.332
0.001   0.333

Idea: instead of dividing ∆k by 2, change it so that
    10 × 10^b refines to 5 × 10^b,
    5 × 10^b refines to 2 × 10^b,
    2 × 10^b refines to 1 × 10^b.

To get three decimals, one simply sets the granularity to 0.001. Integer variables are treated by setting the granularity to G = 1.
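The 10 → 5 → 2 → 1 refinement rule above can be sketched directly (the helper name `refine` is ours; the small epsilon and final rounding only guard against floating-point noise):

```python
import math

def refine(delta):
    """One refinement step on the {1, 2, 5} x 10^b ladder:
    ... -> 5*10^b -> 2*10^b -> 1*10^b -> 5*10^(b-1) -> ..."""
    b = math.floor(math.log10(delta) + 1e-12)   # write delta = a * 10^b
    a = round(delta / 10 ** b)                  # a in {1, 2, 5}
    nxt = {5: 2 * 10 ** b, 2: 1 * 10 ** b, 1: 5 * 10 ** (b - 1)}[a]
    return round(nxt, 12)                       # tidy float noise

# Refine from 1 down to a granularity of 0.001:
seq, d = [1.0], 1.0
while d > 0.001:
    d = refine(d)
    seq.append(d)
print(seq)  # 1.0, 0.5, 0.2, 0.1, 0.05, 0.02, 0.01, 0.005, 0.002, 0.001
```

Unlike halving, every value on this ladder has a bounded number of decimals, so the iterates it generates stay on a decimal grid.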
Poll and mesh size parameter update
- The poll size parameter ∆k is updated as
      10 × 10^b ←→ 5 × 10^b ←→ 2 × 10^b ←→ 1 × 10^b.
- The fine underlying mesh is defined with the mesh size parameter
      δk = 1 if ∆k ≥ 1, and δk = max{10^(2b), G} otherwise, where ∆k ∈ {1, 2, 5} × 10^b.
- Example with a granularity of G = 0.005:

  δk      ∆k
  1       5
  1       2
  1       1
  0.01    0.5
  0.01    0.2
  0.01    0.1
  0.005   0.05
  0.005   0.02
  0.005   0.01
  0.005   0.005  ← stop
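The mesh size rule δk = 1 if ∆k ≥ 1, else max{10^(2b), G} with ∆k = a × 10^b, a ∈ {1, 2, 5}, can be sketched as follows (the helper name `mesh_size` is ours; the epsilon guards the floating-point floor):

```python
import math

def mesh_size(Delta, G):
    """Mesh size delta_k for a poll size Delta_k = a * 10^b, a in {1,2,5}:
    delta_k = 1 if Delta_k >= 1, else max(10^(2b), G)."""
    if Delta >= 1:
        return 1.0
    b = math.floor(math.log10(Delta) + 1e-12)   # exponent b of Delta_k
    return max(10.0 ** (2 * b), G)

# Reproduce the G = 0.005 ladder:
for Delta in (5, 2, 1, 0.5, 0.2, 0.1, 0.05, 0.02, 0.01, 0.005):
    print(Delta, mesh_size(Delta, G=0.005))
```

Note how δk shrinks like ∆k² (as in continuous MADS) until it hits the floor G, at which point the algorithm can stop exactly on the requested granularity.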
[0] Initializations (x0; ∆0 ∈ {1, 2, 5} × 10^b; G: granularity)
[1] Iteration k
    Let δk ≤ ∆k be the mesh size parameter.
    Search:
        test a finite number of mesh points.
    Poll (if the Search failed):
        construct a set of directions Dk;
        test the poll set Pk = {xk + δk d : d ∈ Dk}, with ‖δk d‖ ≈ ∆k.
[2] Updates
    If success:
        xk+1 ← success point; increase ∆k.
    Else:
        xk+1 ← xk; decrease ∆k.
    k ← k + 1; stop if δk = G, or go to [1].
Theory
- The MADS analysis relies on the property that "the sequence of trial points is located on some discretization of the space of variables called the mesh."
- By multiplying or dividing ∆k by a rational number τ, [Torczon, 1997] showed that all trial points from iteration 0 to ℓ are located on a fine underlying mesh. The proof is not trivial, uses the fact that τ ∈ Q, and does not work for τ ∈ R (the paper won the SIAM Outstanding Paper Prize).
- With the new mesh, that technical part of the proof becomes: consider any trial point t considered from iterations k = 0 to ℓ.
    If a granularity Gi is requested on variable i, then ti lies on the mesh of granularity Gi.
    If no granularity is requested on variable i, then ti lies on the mesh of granularity 10^(bi), with bi = min{bi^k : k = 0, 1, ..., ℓ}.
Trust-Region parameter tuning
Find the values of the four parameters x = (η1, η2, α1, α2) ∈ R^4_+ that minimize the overall CPU time to solve 55 CUTEr problems.
- A surrogate function s is defined as the time to solve a collection of small-sized problems.
- In 2005, f(x) ≈ 3h-4h and s(x) ≈ 1m. The surrogate was 200 times faster (now ≈ 140 times).

Theorem (Moore's Law is valid)

  Year    CPU f(x)   CPU s(x)
  2005    13800 s    69 s
  2016    330 s      2.3 s
  Ratio:  42         30

Moore's Law says speed doubles every 2 years: 2016 − 2005 = 11 years ⇒ 5.5 cycles, a speedup of 2^5.5 ≈ 45.
Standard ×2, ÷2 versus new {1, 2, 5} × 10^b [comparison figure]
Trust-Region parameter tuning: Results

        Standard        New               Granularity
        ×2, ÷2          {1,2,5} × 10^b    0.01       0.05
CPU:    228 s           207 s             205 s      220 s
η1      0.2508936274    0.5325498121      0.35       0.35
η2      0.9008242693    0.9938378210      0.99       0.95
α1      0.5464425868    0.4817164472      0.50       0.45
α2      1.5505441560    3.5206901980      2.82       3.20

Observations:
- On this example, the new strategies seem preferable.
- Trust-region recommendation for humans: (η1, η2, α1, α2) = (0.35, 0.99, 0.5, 2.82).
Results on a collection of mixed-integer problems
- 43 problems taken from 3 papers by J. Müller et al.:
    - SO-MI [Müller et al., 2013]: 19 problems, mixed, constrained;
    - SO-I [Müller et al., 2014]: 16 problems, discrete, constrained;
    - MISO [Müller, 2016]: 8 problems, mixed, unconstrained.
- 10 LHS starting points are considered for each problem, for a total of 430 instances.
- NOMAD 3.7.3 (current release, classic mesh) vs NOMAD 3.8.Dev (prototype, new mesh).
- Performance and data profiles [Moré and Wild, 2009].
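As a reminder of what a data profile measures [Moré and Wild, 2009]: for a tolerance τ, an instance counts as solved once f(x) ≤ f_L + τ(f(x0) − f_L), where f_L is the best value found by any solver, and the profile reports the fraction of instances solved within a budget of κ simplex gradients, i.e. κ(n + 1) evaluations. A minimal sketch (the history format is our own assumption):

```python
def data_profile(histories, f_best, kappas, tau=1e-3):
    """Moré-Wild data profile for one solver.
    histories: list of (n, [f after 1st eval, after 2nd eval, ...]);
    f_best: best value f_L found by any solver on each instance;
    kappas: budgets in units of simplex gradients (n + 1 evaluations)."""
    profile = []
    for kappa in kappas:
        solved = 0
        for (n, fs), fL in zip(histories, f_best):
            target = fL + tau * (fs[0] - fL)      # convergence test level
            budget = int(kappa * (n + 1))         # evaluation budget
            if any(f <= target for f in fs[:budget]):
                solved += 1
        profile.append(solved / len(histories))
    return profile

# Toy check on two 2-variable instances (hypothetical histories):
hists = [(2, [10.0, 4.0, 1.0, 0.5]), (2, [8.0, 8.0, 8.0, 8.0])]
print(data_profile(hists, f_best=[0.5, 8.0], kappas=[1.0]))  # → [0.5]
```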
Performance profiles for the 430 instances
[Figure: performance profiles (43 × 10 instances) for NOMAD 3.7.3 vs NOMAD 3.8.Dev; percentage of problems solved (75-100%) against precision (0-0.9).]
Data profiles for the 430 instances (precision 10^-1)
[Figure: data profiles (43 × 10 instances, precision 10^-1) for NOMAD 3.7.3 vs NOMAD 3.8.Dev; percentage of problems solved against the number of simplex gradients (0-1000).]
Data profiles for the 430 instances (precision 10^-3)
[Figure: data profiles (43 × 10 instances, precision 10^-3) for NOMAD 3.7.3 vs NOMAD 3.8.Dev; percentage of problems solved against the number of simplex gradients (0-1000).]
Data profiles for the 430 instances (precision 10^-7)
[Figure: data profiles (43 × 10 instances, precision 10^-7) for NOMAD 3.7.3 vs NOMAD 3.8.Dev; percentage of problems solved against the number of simplex gradients (0-1000).]
Discussion (1/2)
- New mesh parameter update rules, {1, 2, 5} × 10^b, control the number of decimals:
    - a native way to handle the granularity G of the variables;
    - integer variables are handled by setting G = 1.
- Computational experiments on trust-region parameters:
    - the new parameters reduce CPU time by 35% (versus the textbook values);
    - the new parameters have granularity 0.01 (readable by humans);
    - experiments on other test problems are in progress.
- Computational experiments on analytical problems: ≈ 3% performance improvement over the previous NOMAD version.
Discussion (2/2)
- This will be part of our NOMAD 3.8 software:
    - the only additional input from the user is G...
    - ...and it is optional.
- www.gerad.ca/nomad.
References I

Abramson, M. (2004). Mixed variable optimization of a load-bearing thermal insulation system using a filter pattern search algorithm. Optimization and Engineering, 5(2):157-177.

Audet, C. and Dennis, Jr., J. (2006). Mesh adaptive direct search algorithms for constrained optimization. SIAM Journal on Optimization, 17(1):188-217.

Audet, C. and Orban, D. (2006). Finding optimal algorithmic parameters using derivative-free optimization. SIAM Journal on Optimization, 17(3):642-664.

Le Digabel, S. (2011). Algorithm 909: NOMAD: Nonlinear optimization with the MADS algorithm. ACM Transactions on Mathematical Software, 37(4):44:1-44:15.

Moré, J. and Wild, S. (2009). Benchmarking derivative-free optimization algorithms. SIAM Journal on Optimization, 20(1):172-191.
References II

Müller, J. (2016). MISO: mixed-integer surrogate optimization framework. Optimization and Engineering, 17(1):177-203.

Müller, J., Shoemaker, C., and Piché, R. (2013). SO-MI: A surrogate model algorithm for computationally expensive nonlinear mixed-integer black-box global optimization problems. Computers and Operations Research, 40(5):1383-1400.

Müller, J., Shoemaker, C., and Piché, R. (2014). SO-I: A surrogate model algorithm for expensive nonlinear integer programming problems including global optimization applications. Journal of Global Optimization, 59(4):865-889.

Torczon, V. (1997). On the convergence of pattern search algorithms. SIAM Journal on Optimization, 7(1):1-25.