
UPTEC F16 061

Examensarbete 30 hp, September 2016

High Performance Multi-Objective Voyage Planning Using Local Gradient-Free Methods

Niklas Fejes


Abstract

High Performance Multi-Objective Voyage Planning Using Local Gradient-Free Methods

Niklas Fejes

A number of parallel gradient-free local optimization methods are investigated in application to problems of voyage planning for maritime ships. Two optimization algorithms are investigated: a parallel version of the Nelder–Mead Simplex method and the Subplex method with Nelder–Mead Simplex as its inner solver. Additionally, two new formulations of the optimization problem are suggested which, together with an improved implementation of the objective function, increase the overall performance of the model. Numerical results show the efficiency of these methods in comparison with the earlier introduced Grid search method and solvers from an open-source optimization library.

ISSN: 1401-5757, UPTEC F16 061
Examinator: Tomas Nyberg
Ämnesgranskare: Maya Neytcheva
Handledare: Kateryna Mishchenko

Popular science summary

Maritime voyage planning for cargo ships has traditionally been a task performed manually, but in recent years the work has increasingly shifted to computer-based systems that can optimize the routes both faster and with greater accuracy. Much work and research has already been done in the area, and commercial voyage planning products are available on the market. Current research develops the area further by improving the existing methods and introducing new features, such as the possibility to optimize the routes for several different, possibly conflicting criteria, for example optimizing both fuel consumption and travel time.

This thesis, which builds on three previous theses carried out at ABB Corporate Research, analyzes and develops methods for multi-objective voyage planning using so-called local solution methods. With multi-objective optimization, in contrast to single-objective optimization, one aims to find all solutions where one objective function cannot be improved without another objective function deteriorating. In practice this often yields an infinite, continuous set of optimal solutions. The objective functions considered in this thesis are, in addition to fuel consumption and travel time, also a simple measure of voyage safety, intended to make the route as safe as possible for crew, ship, and cargo.

In the preceding theses, a number of methods were investigated, such as the discrete Grid search method based on dynamic programming, and evolutionary methods that mimic natural selection. This thesis investigates another class of optimization methods, called gradient-free local methods. These have the property that they do not use the derivatives of the objective functions, which is essential when the objectives are computed by complex simulation code. That the methods are local means that they find local minima of the objective functions. Although this is a drawback, since it is often the global solutions that are sought, it can be outweighed by the methods being considerably faster than continuous global methods. This trade-off between accuracy and speed is evaluated thoroughly in the thesis.

In the thesis, a number of algorithms are implemented, analyzed, and developed, and compared against the global Grid search method. The methods investigated are Nelder–Mead Simplex search and the Subplex method, together with parallel generalizations of these two methods. The methods are optimized with respect to performance and computational efficiency, and the whole voyage optimization system is reviewed and made more efficient. In addition, the best way to parallelize the system is investigated, which has resulted in code that can run up to 200 times faster on the same hardware. The implemented methods are also compared and benchmarked against the open-source optimization library NLopt.

The resulting code shows that an efficient parallel implementation can be comparable to the global methods currently in use. Through evaluation and analysis of the system, critical parts of the algorithms and the simulation code have been identified, which in turn has given insight into how the methods can be improved and best applied to the problem setting in question. The problem that the local algorithms do not always find the best possible solution remains, but changes that can improve the “globality” of the methods have been investigated with good results. Further development could combine global and local methods, and improve the cluster parallelization used when evaluating the methods.


Acknowledgments

First of all, I would like to express my gratitude to my supervisors Kateryna Mishchenko and Mats Molander for all their support and good advice. I appreciate the encouraging discussions and the guidance which has helped me make this thesis possible. A special thanks to Maya Neytcheva for her proof-reading of my report and for her valuable suggestions.

This thesis was made in conjunction with ABB Corporate Research in Västerås, Sweden. I would like to thank Rickard Lindkvist and Shiva Sander-Tavallaey for their support and for the opportunity to be a part of this project. Finally, I would like to thank my fellow thesis workers at the department for the reminders to take coffee breaks and our interesting and profound discussions in the lunch room.

Niklas Fejes, July 2016


Abbreviations

Abbreviation Meaning Section

VTW speed through water 2.2

VOG speed over ground 5.1

FIX fixed-time 5.2

TAV time-as-variable 5.3

SB St. John’s – Bodø 2.7

GN Gothenburg – New York 2.7

GD Gothenburg – Dunkirk 2.7

KQ Kashechewan – Quebec City 2.7

NMS Nelder–Mead Simplex 3.1

GBNM Globalized Bounded Nelder–Mead 3.1

SBPLX Subplex/Subspace-searching simplex 3.2

SOO single-objective optimization 2.1

MOO multi-objective optimization 2.1


Contents

1 Introduction

2 Problem formulation
  2.1 Mathematical framework
  2.2 The voyage planning cost function
  2.3 Scalarizing the multi-objective optimization problem
  2.4 Constraint penalization
  2.5 Pareto front sampling
  2.6 Problem summary
  2.7 Test voyages

3 Optimization algorithms
  3.1 The Nelder–Mead Simplex method
  3.2 The Subplex method
  3.3 Parallel optimization algorithms
    3.3.1 Trivial Nelder–Mead Simplex method parallelization
    3.3.2 Parallel Nelder–Mead Simplex method
    3.3.3 Parallel Subplex method
  3.4 The Grid search method
  3.5 Other local reference algorithms

4 Algorithm and problem tuning
  4.1 Improving the efficiency of the objective function computation
    4.1.1 Profiling results
  4.2 Wave data interpolation
  4.3 Modification of the Parallel Subplex algorithm
  4.4 Parameter tuning of the Parallel Subplex algorithm
    4.4.1 Optimal parameters

5 Choice of decision variables
  5.1 Speed over ground variable
  5.2 Fixed travel time variable
  5.3 Time as decision variable

6 Results
  6.1 Evaluation of the accuracy of the solutions
    6.1.1 Wave data interpolation
    6.1.2 NLopt vs. Parallel Subplex method
    6.1.3 Modification of the Parallel Subplex algorithm
    6.1.4 Choice of decision variable
  6.2 Evaluation of the convergence speed
  6.3 Evaluation of the computational efficiency
    6.3.1 Total front computation time

7 Discussion & conclusions
  7.1 Computational improvements
  7.2 Algorithms
  7.3 Viability of the local solver approach

8 Further improvements

A Timing of the objective function

B Implementation details
  B.1 Implementation of the constraint penalization
  B.2 Measuring the number of function calls

C Analysis of the initial algorithms
  C.1 Analysis of the NLopt library
  C.2 Analysis of the Subplex method
  C.3 Analysis of the Parallel Nelder–Mead Simplex method

1 | Introduction

Voyage planning for ocean-going ships was traditionally a manually performed task, but in recent years there has been a shift towards using computer-aided systems to optimize the planning of the routes. A lot of work and research has already been done in the area, and commercial software tools for voyage planning are readily available. Current research is developing the area further by advancing the methods and introducing new features, such as the possibility to optimize the routes for multiple conflicting criteria.

This thesis is a continuation of three previous thesis projects [10], [1], [4], which developed methods for multi-objective voyage planning at ABB Corporate Research. In a single-objective voyage planning tool only one objective is considered for optimization, most commonly the fuel consumption of the voyage. In a multi-objective planning tool, several other criteria are considered as well, such as travel time and the travel safety of the ship's crew and cargo. These possibly conflicting objectives form a multi-objective minimization problem, for which there might be an infinite number of optimal solutions.

In the previous thesis projects, the aims have been to develop the voyage planning model and to investigate the use of global multi-objective minimization methods to solve the problem. The most prominent method so far has been the Grid search method, which uses dynamic programming. Other methods which have been tested are evolutionary algorithms, using stochastic methods to mimic the evolution of species.

This thesis aims to investigate the use of local optimization methods for finding the global solutions to the multi-objective voyage planning problem. A local method is advantageous in that it converges quickly and in general should be faster than any comparable continuous global method. The obvious disadvantage is that a local optimization algorithm is non-global, meaning that it might find suboptimal solutions. This creates a trade-off between speed and accuracy for the approach, which must be considered. Moreover, the computational efficiency of the developed implementation is thoroughly examined and tuned in order to attain a high-performance voyage planning tool.

Outline

Chapter 2 introduces the voyage planning problem and its mathematical framework. Chapter 3 describes the optimization algorithms being studied or used for reference in the thesis. In Chapters 4 and 5, the improvements and changes made to the problem and algorithm are described and evaluated. Chapter 6 presents the results, and Chapters 7 and 8 conclude with discussion and further improvements.


2 | Problem formulation

2.1 Mathematical framework

Before we go into the details of the maritime voyage planning problem, let us first define the general multi-objective optimization problem we are considering in this thesis. All optimization problems are assumed to be minimization problems, meaning that we want to find the minimal value of some objective. In the following definitions the variables xi ∈ R for i = 1, . . . , N are called the decision variables. The vector x = (x1, . . . , xN)T ∈ X ⊆ RN is the decision vector belonging to the feasible decision space X. The vector function f : X → RM is the objective function, mapping the feasible decision space X ⊆ RN to the objective space RM. A point z ∈ RM is called an objective vector. The feasible objective space Z ⊆ RM is the image Z = f(X). Each component of f is a single objective function which may be in conflict with other single objectives in f.

The feasible decision space X is formed by the nonlinear constraint vector function g : RN → RK, where a point x ∈ RN is feasible if g(x) ≤ 0. With bound constraints xLB and xUB included, the feasible decision space X is then defined as

    X = { x ∈ RN | g(x) ≤ 0, xLB ≤ x ≤ xUB }.   (2.1)

Note that we use vector inequalities defined such that for x, y ∈ RN

x ≤ y if xi ≤ yi for all i = 1, . . . , N. (2.2)

Definition 2.1. An objective vector z∗ ∈ Z ⊆ RM is said to be Pareto optimal in Z if

    {z ∈ Z | z ≤ z∗, z ≠ z∗} = ∅,   (2.3)

i.e. no other point z ∈ Z exists such that z ≤ z∗. Similarly, a decision vector x∗ ∈ X ⊆ RN is said to be Pareto optimal in X if its corresponding objective vector f(x∗) is Pareto optimal in Z (= f(X)).

The set of all Pareto optimal points in Z is called the Pareto front.

Definition 2.2. A multi-objective minimization problem is a problem of the form

    minimize_{x ∈ X} f(x),

where x∗ minimizes f if no other point x ∈ X exists such that f(x) ≤ f(x∗) and f(x) ≠ f(x∗). (Equivalently, x∗ is Pareto optimal in X.) Note that the solution to the optimization problem is a possibly infinite set of vectors, which we usually want to sample in a way such that it identifies the properties of the full Pareto front.

2.2 The voyage planning cost function

This section explains how the maritime voyage planning cost function is formulated mathematically. Put in words, the objective is a function that maps a route between two points on a map to a set of conflicting objective values describing the cost of the voyage. In this thesis these objectives are the travel time, the total amount of fuel consumed, and the mean wave height.

In order to describe the voyage route with a finite number of decision variables, it is divided into L line segments, where the first segment starts in the route's departure coordinate and the last segment ends in the route's arrival coordinate. The (L − 1) intermediate coordinates are determined by the nominal route, which is an a priori given feasible solution to the optimization problem, in our implementation created by the Isochrone method as described in [4]. The parameterized route is based on this nominal route in a way such that we can vary the intermediate coordinates perpendicularly to the travel direction, and we can also vary the speed over each segment. With L segments the number of decision variables is thus N = 2L − 1, and an example of such a route is given in Figure 2.1.

Figure 2.1: The nominal route for a voyage between St. John's and Bodø. The route has 54 coordinates and 53 segments. The 52 intermediate coordinates can be varied along the dotted lines, and the speed over each segment can be varied between 5 and 25 knots. This yields a total of 105 decision variables for this problem.

The voyage planning problem studied in this thesis is defined as follows.

Definition 2.3. A speed through water problem (Problem type VTW) for a route with L segments is the problem

    minimize_{x ∈ X} f(x)  with  X = { x ∈ R2L−1 | g(x) ≤ 0, xLB ≤ x ≤ xUB },   (2.4)

where

    x = (d1, . . . , dL−1, v1, . . . , vL)T,
    di = “perpendicular distance from the nominal route's i-th intermediate coordinate”,
    vi = “speed through water for the i-th segment of the route”,   (2.5)

    f(x) = (“travel time”, “fuel consumed”, “mean wave height”)T,
    g(x) = (“distance limit violation”, “land violation”, “power violation”, “travel time violation”)T.   (2.6)

For the test voyages studied in the thesis, the distances di are measured in nautical miles and are limited to the range [−64, 64]. The speeds vi are measured as speed through water in knots and are limited to the range [5, 25].
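To make the layout of the decision vector concrete, the following minimal Matlab sketch unpacks x into its offset and speed parts and clamps them to the bounds above; the function name and clamping step are illustrative and not taken from the thesis code.

    % Unpack a VTW decision vector x = (d1, ..., d(L-1), v1, ..., vL)' for a
    % route with L segments, and clamp it to the bounds of the test voyages.
    function [d, v] = unpackDecisionVector(x, L)
        assert(numel(x) == 2*L - 1, 'expected N = 2L-1 decision variables');
        d = x(1:L-1);               % perpendicular offsets [nautical miles]
        v = x(L:2*L-1);             % speed through water per segment [knots]
        d = min(max(d, -64), 64);   % offset bounds [-64, 64] nmi
        v = min(max(v, 5), 25);     % speed bounds [5, 25] knots
    end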


The specific implementation of the route properties in equation (2.6) is not discussed in detail since they have been thoroughly defined in [4] and [8]. A few of the implementation properties that are significant for this thesis are that:

• f(x) and g(x) are always computed simultaneously,

• neither f nor g is continuous in all regions of the decision space,

• g is implemented such that g(x) = 0 for all feasible x ∈ X, i.e. we cannot estimate the distance to the boundary of the feasible region,

• all components of f and g are aggregated over the segments of the route, e.g. f1 = “travel time” = ∑_{i=1}^{L} “travel time over segment i”,

• travel distance and time are measured w.r.t. the earth's curvature, while coordinates are given in latitude/longitude,

• both ocean currents and waves are considered when computing f(x) and g(x).

2.3 Scalarizing the multi-objective optimization problem

In order to make the single-objective optimization algorithms compatible with the multi-objective function, the Direction method is used to scalarize the problem. In short, the method translates the multi-objective problem into a set of single-objective problems that can then be solved separately. A full mathematical description of this method is given in [4], and a shorter overview is presented below.

Let the bounding objectives zI, zN ∈ RM be such that for a Pareto front P ⊆ Z ⊆ RM it holds that

z∗ ≥ zI and z∗ ≤ zN for all z∗ ∈ P. (2.7)

A visualization of two such points for a two-dimensional objective is given in Figure 2.2. If possible, these two points should be the ideal point and nadir point as defined in [4]. Use these bounding objectives to scale the objective space RM such that for an objective vector z ∈ RM the scaled vector y has components

    yi = (zi − zIi) / (zNi − zIi).   (2.8)

The scaled Pareto front P′ is then contained in the unit hypercube, i.e.

    P′ = { y∗ | y∗i = (z∗i − zIi)/(zNi − zIi), z∗ ∈ P } ⊆ [0, 1]M.   (2.9)

The basic idea behind the Direction method is that for each subproblem, a cone {y | y ≤ y0 + kd} ⊆ RM should be found such that only one point y∗ ∈ P′ is contained therein. For an illustration, see Figure 2.3. If we can find such a cone, the point y∗ must be Pareto optimal by definition. Given a reference point y0 ∈ RM and a direction d > 0 (usually d = (1, 1, . . . )T), this problem can be stated as the minimization problem

    minimize_y  k = max_i (yi − y0i)/di.


Figure 2.2: The bounding objectives zI and zN that form a rectangle containing the Pareto front (blue line).

Figure 2.3: The scaled Pareto front (blue line), and the reference plane Z0 (red line). The shaded region is the cone {y ≤ y0 + kd}.

If k∗ is the solution to this problem, and there is only one corresponding vector y (such that k∗ = max_i (yi − y0i)/di), then y is Pareto optimal. If there are multiple vectors {y1, y2, . . . } which give the same value of k, then some of them might be what are called “weakly Pareto optimal” solutions, meaning that there is no other point which improves the solution in all directions. Such solutions can occur if the search line does not pass through the Pareto front, e.g. if there are discontinuities or “holes” in the front, or if the search line passes beside it.

In order to find a set of cones that sample the full Pareto front as well as possible, we need to pick our starting points y0 in some smart way. This is done by projecting the unit cube [0, 1]M onto the hyperplane passing through 0 with normal d > 0, and then picking our y0 from this projection. We call this hyperplane the reference plane Z0, shown as the red line in Figure 2.3. Ideally the reference plane Z0 should be sampled such that the obtained Pareto front becomes uniformly sampled, but this is not possible without first knowing the shape of the Pareto front. If Z0 is sampled uniformly, the Pareto optimal points obtained will be well distributed over the Pareto front.

The full scalarizing function Sy0 for a search line starting in y0 ∈ Z0 is then defined such that

    Sy0(z) = max_i (1/di) · ((zi − zIi)/(zNi − zIi) − y0i),   (2.10)

where z ∈ RM is an objective vector.
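Equation (2.10) translates directly into a few lines of Matlab. The sketch below assumes M-by-1 column vectors with zN > zI componentwise; the function name is illustrative.

    % Scalarizing function S_y0(z) from equation (2.10).
    % z, zI, zN, y0, d are M-by-1 vectors, with d > 0 (usually all ones).
    function s = directionScalarize(z, zI, zN, y0, d)
        y = (z - zI) ./ (zN - zI);   % scale the objectives, equation (2.8)
        s = max((y - y0) ./ d);      % largest scaled excess along the line
    end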

2.4 Constraint penalization

Since some of the optimization algorithms studied in this thesis do not handle nonlinear constraints by default, it is necessary to take care of them explicitly by some penalization method. One method commonly used in gradient-free algorithms is to set any infeasible objective value to infinity, which is possible if we have a feasible initial guess and if we never need to compare two infeasible objective values to each other. In the Nelder–Mead Simplex method (Section 3.1) we might benefit from being able to do so, and in this thesis we thus use a penalization method where two infeasible objective values can be compared by the value of the constraint function. The constraint penalization method used is based on the barrier penalization used in [2].


Given a scalar objective value v ∈ R and a constraint vector c ∈ RK (e.g. c = g(x)), introduce the constraint scalarization function g : RK → R≥0 as

    g(c) = ∑_{i=1}^{K} max(0, ci).   (2.11)

The penalizing function H : (R, RK) → R is then

    H(v, c) = v + k · (1 + g(c))   if g(c) > 0,
    H(v, c) = v                    if g(c) ≤ 0,   (2.12)

where k is a nonnegative parameter.

The parameter k is used to control the degree of penalization of the constraints, and should be set larger than or equal to some upper bound of the scalar objective, i.e. k ≥ vmax. This guarantees that an infeasible point will always have a higher penalized objective value than a feasible point, since we get that v ≤ vmax < k · (1 + g(c)). In the voyage planning problem (using d = (1, 1, . . . )T), any scalarizing function Sy0 will have its minimal value in the range [0, 1], even though objective values larger than 1 can occur for non-optimal routes. This means that we should pick k ≥ 1, and in this thesis the parameter is set to k = 1. The components of the constraint vector ci can be scaled to balance the significance of the constraints, but in this thesis the constraints are left unscaled since we have no such measure of significance.

Due to numerical issues the constraint penalization is implemented with a tolerance variable, which is explained in Appendix B.1.
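Equations (2.11)–(2.12) likewise translate directly. The sketch below folds them into one function, with tol standing in for the tolerance variable of Appendix B.1 (an assumed parameter here).

    % Penalized scalar objective H(v, c) from equations (2.11)-(2.12).
    % v is the scalarized objective value, c a K-by-1 constraint vector,
    % k the penalty parameter (k = 1 in this thesis), and tol an assumed
    % small tolerance below which the violation is treated as zero (B.1).
    function h = penalize(v, c, k, tol)
        gc = sum(max(0, c));         % constraint scalarization, eq. (2.11)
        if gc > tol
            h = v + k * (1 + gc);    % infeasible: add the barrier penalty
        else
            h = v;                   % feasible: objective value unchanged
        end
    end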

2.5 Pareto front sampling

Given a voyage planning problem, the full Pareto front sampling is then performed by creating a large set of search lines. For each search line a minimization problem is solved, which results in one single point on the Pareto front. The algorithm is:

input : voyage planning problem (Definition 2.3),
        bounding objectives zI, zN as in equation (2.7),
        number of sample points N
output: sampled Pareto front PS = {x∗1, x∗2, . . . , x∗N}

Create a set of N reference points Z0 = {y1, y2, . . . , yN} based on the bounding objectives zI and zN, as described in Section 2.3;
for i = 1 : N do
    Find the minimum of the function

        hi(x) = H(Syi(f(x)), g(x)),

    where H is the penalization function from equation (2.12), Syi is the scalarizing function from equation (2.10), and f, g are from the voyage planning problem;
    Let x∗i = argmin_{x ∈ X} hi(x);
end

Algorithm 2.1: The Pareto sampling algorithm.

Note that each minimization problem can be solved independently of the other minimization problems, i.e. the algorithm is trivially parallelizable.
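Since each search line is an independent minimization, the loop in Algorithm 2.1 maps directly onto e.g. a Matlab parfor. In the sketch below, directionScalarize and penalize are the sketches from Sections 2.3 and 2.4, while generateReferencePoints and localSolve are hypothetical stand-ins for the reference point generation and the chosen local solver.

    % Sketch of Algorithm 2.1 with the trivially parallel outer loop.
    % f and g are the objective and constraint functions of Definition 2.3;
    % x0 is the nominal route's decision vector (the initial guess).
    Y0 = generateReferencePoints(zI, zN, Ns);   % Ns reference points, M-by-Ns
    X  = zeros(numel(x0), Ns);                  % one solution per column
    parfor i = 1:Ns
        hi = @(x) penalize(directionScalarize(f(x), zI, zN, Y0(:,i), d), ...
                           g(x), 1, tol);       % scalarized, penalized objective
        X(:,i) = localSolve(hi, x0);            % one candidate Pareto point
    end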


If the argmin would give a set of solutions {x1, x2, . . . }, we pick one (or all) xi such that xi is Pareto optimal in {x1, x2, . . . }. By changing the algorithm to return all found Pareto optimal points for each search line, we can find more than N points in total. We have no guarantee that two distinct reference points will give distinct points from the Pareto front, so we should always do a final “Pareto filtering” which removes duplicate (or non–Pareto optimal) points from the sample set, meaning that we may get fewer than N distinct Pareto optimal points from the algorithm.

The bounding objectives zI, zN are specified as inputs to the algorithm, but if we do not have such estimates we can perform an initial minimization search for each of the M scalar objectives. The bounding objectives can then be estimated as points such that zi ≥ zI and zi ≤ zN for all found solution vectors zi, i = 1, . . . , M.

2.6 Problem summary

With the definitions given hitherto, we can summarize the full front sampling process into a single system. In Figure 2.4, the flowchart for the front sampling process is shown, bringing together all of the sections in this chapter. Note that this is not a true Pareto front sampling since we are using local optimization methods, meaning that we have no guarantee of global optimality for the solutions. By using local optimization methods to solve our minimization problems, we are essentially changing the Pareto front sampling algorithm (Algorithm 2.1) from “find the (global) minimum of the function. . . ” to “find a local minimum of the function. . . ” in order to approximate the global minimum. The accuracy of this approximation is further evaluated in Section 6.1.

Figure 2.4: Flowchart for the full front sampling process. Route information (weather, wave data, nominal route, etc.) and configuration (front size, parameters, etc.) feed the search line generation, which produces reference points for the scalarization step (MOO → SOO). The resulting single objective functions are minimized by the local solver (NLopt, Parallel Subplex), which queries the voyage planning model (simulation code, fuel model, etc.) and outputs the candidate optimal solutions (objective & decision vectors).

The input of the front sampling process is the route information, such as departure and arrival coordinates, ocean wave and weather forecasts, fuel model specifications for the vessel, and the arrival time window. The front sampling process is controlled by the configuration of front size (number of search lines), parameters for the voyage planning model, scalarization method, choice of local solver, and the parameters of the solver method. The output of the process is a set of objective and decision vectors, which are the candidate optimal solutions. An example of the output is visualized in Figure 2.5, where the Pareto optimal objective and decision vectors from the Grid search algorithm (Section 3.4) are visualized for one of the studied routes.

Figure 2.5: Visualization of a set of Pareto optimal solutions for the voyage between St. John's and Bodø: (a) Pareto optimal objective vectors (travel time, fuel, wave height); (b) routes for all Pareto optimal decision vectors. Each point in (a) corresponds to one route trajectory in (b).

While the principal focus of this thesis has been to investigate the local optimization methods, all parts of the front sampling process have been examined in order to find out what can be done to improve the applicability of local solvers when sampling the Pareto front of the voyage planning problem. In addition to investigating the local optimization algorithms, this involves optimizing the efficiency of the simulation code and evaluating where parallelism can be utilized.

2.7 Test voyages

The test voyages studied are the same as the voyages considered in [4], with one additional route between the two Canadian cities Kashechewan and Quebec City. Also, the weather conditions in the Gothenburg – New York route have been changed to simulate a storm. The four voyage planning problems considered are:

Gothenburg – Dunkirk (GD)
This route has 38 segments (75 decision variables), and there are no ocean waves modeled in the problem. Since this makes the “mean wave height” objective equal to zero for all routes, the voyage is only used for timing purposes and not Pareto front sampling.

Gothenburg – New York (GN)
This route mostly goes over open water, but has some tight passages near Gothenburg and the northern British Isles. The ocean waves in this voyage contain a “storm” close to the coast of New York which is intended to test the “mean wave height” objective of the problem. This route has 89 segments, i.e. 177 decision variables.

St. John's – Bodø (SB)
This route mostly goes over open water, with fewer tight passages. It has 53 segments, i.e. 105 decision variables.


Kashechewan – Quebec City (KQ)
This route between the two Canadian cities goes along the coast through Hudson Bay, around Newfoundland, and into the Gulf of St. Lawrence. It was added to the set of test voyages to investigate a voyage with many land constraints. The ocean wave data in this voyage is missing for some regions, and like the GD test voyage it has thus only been used for timing purposes and not Pareto front sampling. The voyage has 68 segments, i.e. 135 decision variables.

The nominal routes for all four test voyages are shown in Figure 2.6.

Figure 2.6: The nominal routes for the four test voyages: Gothenburg – Dunkirk (GD), Gothenburg – New York (GN), St. John's – Bodø (SB), and Kashechewan – Quebec City (KQ). The black solid line shows the nominal route, and the dotted lines indicate the maximum distance from the nominal route that a ship is allowed to travel. The gray regions indicate land or unnavigable water.


3 | Optimization algorithms

A number of optimization algorithms have been studied and evaluated in this thesis. Their common feature is that they are all gradient-free, local search methods, meaning that they are designed to converge towards a local minimum of the objective function without the use of information about the gradient of the function.

Previous results in [2] have indicated that this kind of method might work well on the voyage optimization problem, with promising convergence rates and accurate local minima. This chapter describes the main concepts of the studied algorithms and their advantages and disadvantages with respect to the high-dimensional voyage planning problem.

In [2] the gradient-free methods from the open source NLopt¹ library are evaluated. There, the most promising algorithms are found to be SBPLX (a variant of the Subplex method), Cobyla, Nelder–Mead Simplex, and Praxis. According to [2], these four local methods outperform a number of global methods (MLSL, CRS2 LM, ESCH, ISRES) in terms of both accuracy and CPU time. Based on their results, the focus of this thesis has been to investigate some of these four algorithms further and evaluate whether it is possible to parallelize them to some extent. The initial analysis of these algorithms, leading up to the choice of parallel algorithms in Section 3.3, is included in Appendix C.

3.1 The Nelder–Mead Simplex method

One of the most popular gradient-free local optimization algorithms is the Simplex method introduced by Nelder and Mead [9] in 1965. The full algorithm description can be found in [9], [12], and [5]. It can be thought of as a generalized unbounded line search, which uses a simplex with N+1 vertices to search an N-dimensional space by “rolling” the simplex in a direction away from the worst objective value. This rolling is performed by moving the worst vertex of the simplex in the direction of the other vertices in an attempt to decrease its objective value, through a set of moves known as reflection, expansion, contraction, and massive contraction.

One mechanism that is important in this thesis is the formation of the initial simplex. Given a decision vector x ∈ RN, the initial simplex is formed by x and N points created by increasing or decreasing a single decision variable in x. As described in [12], it is formed by the (N+1) vectors {x0, x1, . . . , xN} by setting x0 = x, and xi = x0 + scalei · ei for i = 1, . . . , N, where scalei is a parameter and ei is the i-th column of the N × N identity matrix.
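As a minimal Matlab sketch of this formation (assuming x is a column vector and scale an N-vector of the scalei parameters):

    % Initial simplex formation: vertex 0 is x itself, and vertex i is x
    % shifted by scale(i) along the i-th coordinate axis. Returns the N+1
    % vertices as the columns of S.
    function S = initialSimplex(x, scale)
        N = numel(x);
        S = repmat(x(:), 1, N+1);                  % every column starts at x
        S(:, 2:N+1) = S(:, 2:N+1) + diag(scale);   % xi = x0 + scale_i * e_i
    end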

Although the method is extremely popular, converges quickly, and has been investigated thoroughly, it does not come without flaws. There is no guarantee of convergence, and there even exist example problems where the method converges to a non-stationary point of the objective function. Another flaw is that the method's efficiency decreases as the dimension of the problem increases. This is pointed out in [12], and it is one of the reasons behind the development of the Subplex method. The paper mentions that the algorithm works well for small dimensions (e.g. n ≤ 5) but becomes inefficient for much higher dimensions, which is the case for the voyage planning problem where the number of dimensions is in the range of 100–200.

¹ http://ab-initio.mit.edu/nlopt


In Luersen and Le Riche [7] the authors introduce a variant of the Nelder–Mead Simplex method called the Globalized Bounded Nelder–Mead (GBNM) algorithm, which attempts to improve both the “globality” and the robustness of the method. In order to increase robustness, the method reinitializes the simplex from the best found point when the simplex becomes too “disfigured”. The observation is that if the simplex takes too many steps it will degenerate, i.e. collapse into a subspace such that it no longer can take further steps in all directions. In three dimensions, this happens if the initial tetrahedron transforms in a way that all vertices lie on a plane, or even worse, on a line. The suggested solution is then to reinitialize the simplex, effectively restarting the search with the best found point as the initial guess. If this point is a local minimum, the simplex will shrink down to a neighborhood around the local minimum without becoming degenerate. In the GBNM algorithm the solution is to restart based on three different criteria, called the small, flat and degenerate simplex tests. An additional advantage of the restart is that it allows the simplex to search a larger neighborhood around the best found point, thus making it possible for the simplex to escape from a local minimum. In [7] this is taken one step further by a probabilistic restart mechanism, which in short samples a number of points around the best found point, and uses the best of those to initialize the next simplex. Hence the “Globalized” prefix in the algorithm name. What is important for this thesis is the conclusion that the performance of the Nelder–Mead Simplex method can be improved by simplex degeneracy detection, and handling of such through reinitialization.

While the distinction between reinitialization and restart is vague in [7], the usage in this thesis is as follows: Reinitialization is when the vertices of the simplex are formed around the best vertex, in the same way as with the initial simplex. The scalei parameters are changed such that the size of the simplex (by some definition) is unchanged or similar to before the reinitialization. Restart is when the simplex is reinitialized around the best vertex, but with the scalei parameters as in the initial simplex formation.

The original algorithm does not handle bounded variables, and in this thesis the approach is to truncate any coordinate of the simplex if it lies outside of the feasible decision space. Mathematically, this is

    x′i = xmini   if xi < xmini,
    x′i = xmaxi   if xi > xmaxi,
    x′i = xi      otherwise,   (3.1)

and the same approach is used in both NLopt's implementation and in [7].
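In Matlab the truncation (3.1) amounts to a componentwise clamp of each vertex to the bound constraints:

    % Truncation (3.1) of a point x to the bound constraints, applied
    % componentwise; xmin and xmax are the lower/upper bound vectors.
    xTrunc = min(max(x, xmin), xmax);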

3.2 The Subplex method

The Subplex algorithm was proposed by Rowan in [12] as an attempt to avoid the degeneration problems of the Nelder–Mead Simplex method when used on high-dimensional problems. In short, the algorithm partitions the search space into subspaces of sizes which are more suitable for the Nelder–Mead Simplex search. Hence the name, which is a contraction of “Subspace-searching simplex”. The algorithm as implemented in this thesis is defined in Algorithm 3.1.


input : objective function f : RN → R,
        initial parameters x0 = (x1, x2, . . . , xN)
output: best found parameters xbest

xbest = x0;
while outer stopping criteria not met do
    partition the set of indices {1, 2, . . . , N} into subsets (ξ1, ξ2, . . . , ξk) using Algorithm 3.2, where the number of subsets k is determined by Algorithm 3.2 such that 1 ≤ k ≤ N;
    for i = 1 : k do
        let indices {s1, . . . , sm} = ξi;
        minimize f(xs1, . . . , xsm) with the Nelder–Mead Simplex method, fixing the remaining parameters xi according to xbest;
        update xbest with the best found {xs1, . . . , xsm};
    end
end

Algorithm 3.1: The Subplex algorithm.

The while loop in Algorithm 3.1 is called the outer optimization loop, while the inner loop is the one found in the Nelder–Mead algorithm.

There are several parameters affecting the behavior of the outer optimization loop. First of all, the partitioning groups together “similar” dimensions by looking at the previous iteration's relative change in xbest. The idea is that dimensions where xbest has changed the most should be grouped together in the next outer iteration, and so on. The partitioning algorithm also limits the size of the subsets to the range nsmin to nsmax, which are the two partitioning parameters.

The simplex size reduction factor ψ controls how far the inner Nelder–Mead Simplex method should search before it stops, which it does by an extra stopping criterion in the Nelder–Mead Simplex method. The stopping criterion is formulated as

    ‖xl − xh‖1 / ‖x0l − x0h‖1 < ψ,   (3.2)

where xl is the vertex in the simplex with the lowest objective value, xh the vertex with the highest, and x0l and x0h the vertices with the lowest and highest objective value in the initial simplex. The stopping criterion is designed to prevent the method from searching “too accurately” in the subspaces, since these inner solutions are likely to change in the next outer iteration. The subspace searches become more exact if ψ is reduced.
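As a one-line Matlab sketch, the criterion compares the 1-norm spread of the current simplex with that of the initial simplex:

    % Inner stopping test (3.2): stop the Nelder-Mead search in the current
    % subspace when the simplex has shrunk by the factor psi relative to
    % the initial simplex. xl/xh are the current best/worst vertices,
    % xl0/xh0 those of the initial simplex.
    stopInner = norm(xl - xh, 1) / norm(xl0 - xh0, 1) < psi;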

The partitioning algorithm (Algorithm 3.2) uses the function of merit ϕ : (Rn, Z) → R defined as

    ϕ(w, m) = (1/m) ∑_{i=1}^{m} |wi| − (1/(n−m)) ∑_{i=m+1}^{n} |wi|   if m < n,
    ϕ(w, n) = (1/n) ∑_{i=1}^{n} |wi|.   (3.3)

Given a vector w of length n, and a splitting point m, this function measures the difference between the average value of the left and the right vector if the vector is split after component m. If the vector is not split (m = n), then the average value of the whole vector is used.


input      : step size vector ∆x = (|∆x1|, . . . , |∆xN|), containing the magnitude of the change in each parameter value in the previous outer iteration
output     : partitions (ξ1, ξ2, . . . , ξk)
parameters : nsmin, nsmax

let the indices s1, s2, . . . , sN be such that |∆xs1| ≥ |∆xs2| ≥ · · · ≥ |∆xsN|;
let i = 0; j = 1;
while i < N do
    let w = (|∆xsi+1|, |∆xsi+2|, . . . , |∆xsN|);
    let m = argmax over m = 1, 2, . . . , N−i of ϕ(w, m),
        subject to nsmin ≤ m ≤ nsmax and nsmin · ⌈(N − m)/nsmax⌉ ≤ N − m;
    ξj = {si+1, si+2, . . . , si+m};
    update i = i + m;
    update j = j + 1;
end

Algorithm 3.2: The subspace partitioning algorithm.

In Algorithm 3.2, note that w is a vector of length (N−i) containing the step sizes of the unpartitioned variables, and that the output partitions is a vector of sets (e.g. ({2, 4}, {1, 3, 5})) of size k, where k is implicitly determined by the algorithm. The second constraint (nsmin · ⌈(N − m)/nsmax⌉ ≤ N − m) is designed to guarantee that the remaining dimensions can be partitioned, and the equation is further explained in [12].

Algorithm 3.1 can be seen as a generalization of a restarting Nelder–Mead Simplex algorithm and an alternating variable algorithm. If the partitioning is such that it always chooses the full set as the only partition (nsmin = nsmax = N), the algorithm will optimize the full space in each inner iteration and restart the simplex when it becomes too small. If the partitioning is such that it uses only one dimension in the inner optimization loop (nsmin = nsmax = 1), then the inner loop will optimize one single dimension at a time, becoming an alternating variable algorithm.

In the original description of Subplex there are two ambiguities in Algorithm 3.2 that make the implementation in this thesis different from the one in NLopt. The first ambiguity is whether w should consist of all components of the sorted step size vector ∆x, or only the last (N−i) components. While the first interpretation is used in NLopt, the second interpretation is what is described in Algorithm 3.2 and is used in our implementation. The difference is that with the second interpretation the already partitioned dimensions will not affect the partitioning of the remaining dimensions, which seems to be more appropriate for the algorithm.

As an example, let nsmin = 1 and nsmax = 2 when we partition the vector ∆x = (7, 7, 2, 2, 1). This problem would be partitioned as ({1, 2}, {3, 4}, {5}) by our implementation, but with NLopt's implementation it would be partitioned as ({1, 2}, {3}, {4}, {5}).
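This interpretation can be made concrete with a short Matlab sketch of Algorithm 3.2; running it on the example above reproduces the partitioning ({1, 2}, {3, 4}, {5}). This is an illustrative reimplementation under the stated interpretation, not the thesis code.

    % Sketch of Algorithm 3.2 as interpreted in this thesis: w holds only
    % the unpartitioned step sizes, and merit ties are broken towards
    % larger m.
    function xi = partitionSubspaces(dx, nsmin, nsmax)
        N = numel(dx);
        [~, s] = sort(abs(dx), 'descend');   % indices by decreasing step size
        xi = {}; i = 0;
        while i < N
            w = abs(dx(s(i+1:N)));           % remaining (unpartitioned) steps
            n = numel(w);
            best = -inf; m = 0;
            for mc = nsmin:min(n, nsmax)
                % second constraint from Algorithm 3.2, as printed there
                if nsmin * ceil((N - mc) / nsmax) > N - mc, continue; end
                if mc < n
                    phi = mean(w(1:mc)) - mean(w(mc+1:n));
                else
                    phi = mean(w);           % m = n: average of whole vector
                end
                if phi >= best, best = phi; m = mc; end   % ">=" prefers larger m
            end
            xi{end+1} = sort(s(i+1:i+m));    %#ok<AGROW> next subspace
            i = i + m;
        end
    end
    % partitionSubspaces([7 7 2 2 1], 1, 2) returns {[1 2], [3 4], [5]}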

The second ambiguity is which value of m should be used if ϕ(w, m) attains the same maximal value for multiple values of m. While this may seem like a negligible choice, the difference will be obvious already in the first outer optimization loop when ∆x = 0, since the choice will decide the size of all but the last subspace. In our implementation the algorithm is designed such that it always chooses larger values of m, since this is beneficial when the Parallel Nelder–Mead Simplex method is used as the inner solver (see Section 3.3.2).


The size of the initial simplex in the inner Nelder–Mead search is controlled by the step vector, which is initiated with the same values as the scalei parameters in the Nelder–Mead Simplex method, and then updated in each outer optimization loop iteration. The vector is updated in two steps. First the lengths of the components are modified as

    step′i ← min(max(‖∆x‖1/‖step‖1, ω), 1/ω) · stepi   if k > 1,
    step′i ← ψ · stepi                                 if k = 1,   (3.4)

for i = 1, . . . , N, where k is the number of subspaces in Algorithm 3.1, and the vector of progress ∆x measures the difference of successive xbest iterates. The variables ψ and ω are algorithm parameters. This is then followed by updating the direction of the steps by

    stepi ← sign(∆xi) · |step′i|   if ∆xi ≠ 0,
    stepi ← −step′i                if ∆xi = 0.   (3.5)

For a more detailed description of the procedure and the reasoning behind it, see [12].

The step reduction coefficient ω controls the degree to which the step vector can be modified. According to [12], a small ω will cause the algorithm to converge rapidly to a local minimum, while a larger ω will cause slower convergence, but lead to a more thorough search that may locate a minimum with a lower objective value.

3.3 Parallel optimization algorithms

For all parallelization methods in this thesis, the concept of parameter level parallelization is used. This means that the algorithm is designed to make use of several independently evaluated objective values which can be computed in parallel. The alternative, objective function parallelization, is another topic that is briefly discussed in Section 5.1. Based on the analysis and improvements of the objective function (Section 4.1 and Appendix C), the parallelization is designed with the following assumptions:

Assumption 1: The objective function f is computed in constant time, i.e. the time taken for one evaluation does not depend on the input vector.

While the computation time for different evaluations does vary, the difference in time between function evaluations is small and the input vector dependency is complex. For example, the land constraint is computed by sampling the trajectory between two coordinates with a fixed step size. Thus, the number of sampled points and the computational time will increase with the length of the trajectory. In reality this effect is small, which is indicated by the timing results in Figures C.1 and C.2. This implies that a load-balancing scheme would likely not be able to increase the parallel throughput in the optimizations.

Assumption 2: Almost all computational time is spent in the objective function, and almost no time is spent in the optimization algorithm routines.

This fact is supported by the timing results in Table C.1, showing that in the initial implementation the Nelder–Mead Simplex routines and Subplex routines account for less than 1% of the total computation time. The implications of this assumption are that any optimization of the objective function will improve the optimization time proportionally, and that the performance of the optimization routines is not critical for the convergence speed.


Assumption 3: The computational time per evaluation can be significantly decreased by evaluating multiple input vectors at the same time, even on a single core machine.

The implication is that even if we only have e.g. 4 parallel processors, an algorithm making use of 400 simultaneous function evaluations will be useful, since each core can compute 100 simultaneous evaluations faster than 100 sequential evaluations. This assumption is well supported by the timing results in Figure 4.1.

In the methods described in this section it is not taken into account how the parallelization is implemented; it is just assumed that the simultaneous evaluation of n objective vectors is faster than the sequential evaluation of n objective vectors. An implementation can thus either distribute the objective vector computations to different cores, or make use of mechanisms like vectorization to make things faster. When used in applications, the parallelization parameters must always be tuned with respect to the specific problem, since the efficiency of the algorithms depends on the relationship between the evaluation time and the number of vectors.

3.3.1 Trivial Nelder–Mead Simplex method parallelization

Some parts of the original Nelder–Mead Simplex method are trivial to parallelize without affecting the results. When the initial simplex is formed (see Section 3.1), the objective values of the N+1 vertices can be computed in parallel. The same can be done for the massive contraction step, where all vertices but the best in the simplex are retracted toward the best vertex, allowing for N parallel objective value computations. The parallel simplex initialization is also useful for the Subplex method (and any kind of restarting variant) since a new simplex is formed each time the method moves on to a new subspace. All parallel algorithms implemented in this thesis make use of these two parallelizations, since the change does not affect the end results of the methods.

3.3.2 Parallel Nelder–Mead Simplex method

A parallel version of the Simplex method is introduced by Lee and Wiswall [5], in which the Nelder–Mead update is applied to the P worst vertices in the (N+1)-simplex, instead of just the single worst vertex. The algorithm parameter P is called the parallelization level, since the method can make use of P parallel objective function evaluations in each iteration. The number P must be chosen such that P ≤ N since one point must be left to reflect over, and the reflection point in the general case is the centroid of the (N−P+1) remaining points. The three possible choices of P for the three-dimensional case are shown in Figure 3.1.

What is noteworthy from the results in [5] is that they find that the algorithm's speed of convergence may increase with the parallelization level P, even if the objective function is computed in serial and not in parallel. These results are reconstructed and further examined in Appendix C.3.


Figure 3.1: The three possible parallelization levels of the three-dimensional simplex. The bold blue lines are the initial simplex, where 1 denotes the worst vertex (highest function value) and 4 denotes the best. The dashed red lines show the search lines for the reflected points, and the green region shows the points from which the centroid is formed. The solid red lines together with the green region show the new simplex formed from the reflected points. (The expansion/contraction points are not shown.)
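As a sketch of the core update in [5] (reflection only; the per-vertex expansion and contraction steps are omitted for brevity), assuming the simplex S stores one vertex per column, fv holds the corresponding objective values, and fbatch is a vectorized objective returning one value per column of its input:

    % One reflection step of the Parallel Nelder-Mead Simplex method [5]:
    % the P worst vertices are reflected through the centroid of the
    % remaining N-P+1 vertices, with all P objectives evaluated in one batch.
    [fv, order] = sort(fv);                   % ascending: fv(1) is the best
    S = S(:, order);
    worst = size(S,2)-P+1 : size(S,2);        % indices of the P worst vertices
    c = mean(S(:, 1:size(S,2)-P), 2);         % centroid of the kept vertices
    R = 2*c - S(:, worst);                    % reflect the P worst through c
    fR = fbatch(R);                           % P objective values in one batch
    acc = fR < fv(worst);                     % keep only improving reflections
    S(:, worst(acc)) = R(:, acc);
    fv(worst(acc)) = fR(acc);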

3.3.3 Parallel Subplex method

What makes the Subplex algorithm hard to parallelize is that the outer optimization loop is sequential, where the problem in each iteration is minimized with respect to a subset of the parameters. This makes it impossible to start the computations in the next iteration without first finishing the current one, and all parallelization must therefore be made in the inner optimization loop, i.e. the Nelder–Mead Simplex search.

The approach in this method is to use the Parallel Nelder–Mead Simplex method as the inner optimization, while keeping the outer optimization loop from the Subplex algorithm. With this approach, the maximal parallelization level for the inner Simplex search is limited by the maximal subspace dimension, determined by the values of the nsmin and nsmax parameters in the Subplex algorithm. The default values for these parameters are 2 and 7, which are suggested in [12] based on the large-dimension performance degradation of the Simplex method (mentioned in Section 3.1). Since we do not know whether this argument holds for the parallel Simplex method, we need to re-examine the parameters in order to increase the parallelization level and algorithm performance. The Parallel Simplex method's P parameter must be tuned as well, so we then need to find a combination of nsmin, nsmax, and P that optimizes the performance of the combined method. The evaluation of the parameters can be found in Section 4.4.

A modification of the Subplex algorithm's step variable has been investigated as well, with the intent to make the algorithm find better local minima by making the search more exhaustive. This modification is explained and evaluated in Section 4.3.

3.4 The Grid search method

The Grid search method is a global, gradient-free optimization method that discretizes the decision variable space to a grid of fixed points, effectively reducing the possible solutions to a finite number. The problem is then reduced to calculating the objective and constraint functions for all combinations of the discrete variables, essentially turning it into a graph problem. The discrete voyage planning problem is then solved by dynamic programming, where the problem is solved by iteratively finding all optimal routes at intermediate stages. This makes it possible to find all Pareto optimal points in the grid without actually evaluating all grid points, since many points can be discarded by the dynamic programming. A summary of the reference implementation of the multi-objective Grid search method can be found in [4], and a detailed description of the single-objective Grid search method is given in [8].

Since this algorithm is comparatively fast and can find the optimal solutions for a discretized problem, an implementation of the Grid search method has been used as a reference for the local algorithms that have been evaluated. This implementation is referred to as the grid solver. The Grid search method finds all Pareto optimal points in a subset of the feasible objective space, so the solution objective vectors are feasible objective vectors in the non-discretized problem. However, the grid Pareto optimal solutions are not necessarily Pareto optimal for the non-discrete problem.

3.5 Other local reference algorithms

In [2] a number of local optimization algorithms from the NLopt library are evaluated, of which two are used for benchmarking in the evaluation in Chapter 6.

The Praxis method

The Praxis method is a local, gradient-free optimization method using the “principal-axis method” by Brent [3]. The evaluated implementation is from the NLopt library, which is essentially an interface to the original Fortran code from 1973.

The analysis of the algorithm performance in [2] indicates that it works well in application to the voyage planning problem, but the algorithm itself is not very well documented, and the only available source code for an implementation is the original Fortran code and the modified “Fortran-to-C” variant used in NLopt. This makes it difficult to analyze the algorithm, and a thorough examination would not fit within the scope of this thesis.

The Cobyla method

The Cobyla method (Constrained Optimization by Linear Approximations) is a local, gradient-free optimization method by Powell [11]. The implementation evaluated is again from the NLopt library, and a further description of its implementation can be found in NLopt's algorithm documentation².

The analysis in [2] indicates similar or slightly better performance than the Praxis method. As with the Praxis method, a thorough examination did not fit within the scope of this thesis, and only the convergence properties of the NLopt implementation have been investigated.

² http://ab-initio.mit.edu/wiki/index.php/NLopt_Algorithms


4 | Algorithm and problem tuning

A number of attempts have been made to improve both the computational speed and the accuracy of the obtained solutions. To improve the speed, both the objective function implementation and the algorithm have been tuned and optimized in different ways. These changes will also affect the accuracy of the solutions, and this chapter describes the improvements made to the different parts of the optimization problem.

4.1 Improving the efficiency of the objective function computation

Regardless of how fast and intelligent a parallel solver is, a big source of speedup can often be found in the computation of the objective function. In general, the problem formulation for gradient-free multi-objective optimization is that the objective function is a “black box” which takes a vector of parameters and returns a vector of objective values. The box is “black” because we do not know what is going on inside of it – we cannot find an analytic gradient, and we cannot make any general assumptions on the input-output relations. This is not the case in this thesis, since we have complete access to the simulation code and can peek into it and make changes as we find suitable. Any purely computational improvement to the objective function will benefit the parallel solvers as well, and this section describes the changes that have been made. The major improvements and conclusions from the analysis of the initial implementation are that:

• There were a few computationally heavy parts in the code that depended only on the geometry of the nominal route and not on the decision variables. These are now precomputed ahead of the optimization, which decreases the computational time by 25–30%.

• The initial Matlab implementation might suffer from high overhead from function calls. This is suggested by the fact that vectorized code that computes objectives for multiple routes simultaneously is significantly faster (about ×100) per function evaluation.

• The computational time can be reduced by changing the decision variable from speed through water to speed over ground. This makes it possible to parallelize/vectorize the computations in the objective function over the route segments, see Section 5.1.

• The search for unnavigable water along the route (for the “land violation” objective) isdifficult to vectorize in Matlab, and a C/MEX implementation of the routines performingthis check improves the computational speed significantly.
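As an illustration of the first point, the following Matlab sketch shows the precomputation pattern. The route data, the distance approximation, and the function name myVoyageCost are illustrative stand-ins, not the actual ABB code:

lat = [57.7; 57.9; 58.4; 59.1];          % nominal route waypoints (example data)
lon = [11.9; 11.2; 10.6; 10.3];
R   = 3440.065;                          % Earth radius in nautical miles
% Segment lengths depend only on the fixed nominal route, so they are
% computed once (here with an equirectangular approximation) and captured
% by the objective handle instead of being recomputed in every evaluation.
D = R * hypot(diff(deg2rad(lat)), ...
              cos(deg2rad(lat(1:end-1))) .* diff(deg2rad(lon)));
objfun = @(x) myVoyageCost(x, D);        % D is reused by every call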

The performance evaluation of the improvements is shown in Section 4.1.1.

The changes mentioned in this section are, unless otherwise stated, used in all evaluations, since they only affect the speed of the computations and not the values of the objective functions. The exception is the change of decision variable (point 3), which will obviously alter the behavior of the objective function, as discussed in Section 5.1.

In the implementation developed in this thesis the objective function is vectorized, i.e. rewritten so that it accepts multiple decision variable vectors instead of only one. The idea is that the simulation code, when evaluating multiple routes, should be faster per evaluated trajectory, since it can combine similar computations, avoid function call overhead, and make use of Matlab's vectorized functions. In general, vectorized code will often also benefit from fewer cache misses and less branching overhead. Counting with Matlab's profile tool³, the computation of n ≥ 2 objective vectors for a route with L segments makes n · (10 + L) function calls, while the vectorized code makes 9 + 2L calls independently of the number of vectors. This means that for 100 parallel objective evaluations, the number of function calls is reduced by 98%.
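A minimal sketch of this vectorization pattern, with a toy two-objective cost in place of the real simulation code (the quadratic fuel model and all names are illustrative):

function F = toyObjectives(X, D)
% X : (2L-1)-by-n matrix whose columns are decision vectors
%     (d_1, ..., d_{L-1}, v_1, ..., v_L)^T
% D : L-by-1 vector of segment lengths [nm]
% F : 2-by-n matrix, one toy objective vector per column
L = numel(D);
V = X(L:end, :);             % speed components of all n routes, L-by-n
T = sum(D ./ V, 1);          % travel time of each route, 1-by-n
fuel = sum(D .* V.^2, 1);    % toy fuel model, quadratic in speed
F = [T; fuel];               % element-wise operations replace a loop over routes
end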

4.1.1 Profiling results

In this section the computational efficiency of the objective function code with and without the changes is analyzed when run on a single processing core. The feasibility of using multiple processing cores is discussed at the end of the section. In the full front sampling process, the number of decision variable vectors n is determined by the parallel local solver, e.g. by the parameter P in the Parallel Nelder–Mead Simplex method.

The timing results for the efficiency improvements when applied to one of the test voyages are presented in Figure 4.1. Results for the three other test voyages are similar to those in Figure 4.1, and are included in Appendix A. The decision vectors used in the computations are randomly generated, and the time is measured as the minimum computation time of 100 runs for the serial code and 25 runs for the vectorized code. All timings are performed on a machine with an AMD Opteron CPU⁴, with execution limited to a single core.

Figure 4.1: Timing of serial and vectorized code for the SB test voyage (see Section 2.7), as a function of the number of vectors n (time relative to the original code; reference levels: original code 59.2 ms, best serial VTW 41.3 ms, best serial VOG 8.3 ms; lines: VTW, VOG, VTW (no mex), VOG (no mex)). VTW denotes speed through water as the decision variable, while VOG denotes speed over ground (see Section 5.1). Note that the dotted horizontal lines are for a single evaluation, i.e. the vectorized VOG can compute over 200 routes in the same time as the original code computes 1 route. To compute the same 200 routes the original code would require 20000% of its single evaluation time.

The linear fits for the evaluation time of n vectors with the vectorized code are

    t_VTW(n) ≈ 94.5 + 0.202 · n [ms]  and  t_VOG(n) ≈ 17.6 + 0.179 · n [ms],

which indicates that the "raw" computation time (without any overhead) for a single route is about 0.17–0.20 ms. This is to be compared with the fastest single route evaluation of 8.3 ms and the original code's 59.2 ms. The time difference between the VTW and the VOG implementation indicates that the "route segment loop" overhead is large. Internally, the difference is that VOG computes (L · n) "route segment costs" through one segment-vectorized function call, while VTW computes the same costs through L segment-vectorized function calls.

³ See Appendix B.2 for details.
⁴ AMD Opteron (Bulldozer) 6274, 2.2 GHz, 16 cores.

The most noteworthy result is that the execution time of the vectorized function is not pro-portional to the number of evaluated variable vectors, i.e. t(n) � n · t(1) for large n, wheret(n) is the time required to compute n decision variable vectors. For a non-vectorized func-tion, the execution time without loop overhead is t(n) = n · t(1), since the computations aremade sequentially. For a perfectly parallel implementation with n computational cores and nooverhead, the execution time is t(n) = t(1). However, this no-overhead assumption is highlydubious, especially with Matlab’s parallel framework which is not designed for low-level paral-lelization. Some further discussion on the actual achieved efficiency improvements can be foundin Section 6.3.

It should be noted that the vectorized VTW implementation is still faster than the original implementation for n ≥ 2, since we then have n · 59.2 > 94.5 + 0.202 · n. For n = 1, the vectorized implementation is somewhat slower, which is due to the overhead of the unused vectorization.

The main results with regard to performance are that:

• The difference between the original code and the best serial VTW is mainly the result of the precomputation mentioned in Section 4.1.

• The C/MEX implementation of the land constraint evaluation reduced the computation time significantly for large values of n.

• With a minor change of decision variable as the only algorithmic change (but many other code optimizations), it was possible to reduce the evaluation time by 86% for a single function evaluation. This result could be applied almost directly to the code used in [2], increasing the speed of convergence by an order of magnitude.

• The vectorized code should always be used when possible. When using the Simplex method (parallel and non-parallel), the evaluation of the initial simplex is now over 200 times faster for all test voyages.

4.2 Wave data interpolation

In the original simulation code, the wave variables (height, frequency, and direction) are derived from a dataset containing estimates of the wave variables in a three-dimensional grid of latitude, longitude, and time. These variables are then used to compute the "mean wave height" objective and the water resistance on the ship when moving through the waves. The implementation was done in a way that, for a given coordinate, the wave data were taken from the closest point in this grid, creating regions with constant weather conditions and discontinuities between them. This approach has its weak points, since for almost every decision vector x ∈ X, the "mean wave height" objective is then constant in some small neighborhood around x. This makes it unlikely for an optimization algorithm to find a good direction of search if the gradient is zero in the neighborhood in which it searches. Also, by definition such a point x is a local minimum, which is obviously suboptimal when a local optimization algorithm is used to solve the problem.


Figure 4.2: A minimum of the objective function without interpolated wave data. Each line shows the effect of a perturbation in a single decision variable.

Figure 4.3: A minimum of the objective function with interpolated wave data. Each line shows the effect of a perturbation in a single decision variable.

The solution has been to interpolate the wave data linearly over the 3D grid, making the wave variables continuous and the objective function less problematic to minimize for the optimization algorithms. Implementation-wise, Matlab's griddedInterpolant objects are used with linear interpolation. While cubic or spline interpolation could be used instead, none of the local optimization methods would make any obvious use of a continuous gradient. The difference wave interpolation makes on the objective function is shown in Figure 4.2 and Figure 4.3, where the voyage planning cost function is visualized around a local minimum. Each line in the plots shows a perturbation of a single decision variable x, where the distance from the local minimum x0 is computed as d = (x − x0)/(x_UB − x_LB). The two plots indicate that the point x0 is a local minimum with respect to each variable independently, but it might not be a local minimum in the full decision variable vector space. This is because it is still possible that some linear combination of two or more variables could lead to a lower objective value in some direction.
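A minimal sketch of the griddedInterpolant setup described above, with made-up grid axes and random wave heights in place of the real dataset:

lat  = linspace(50, 60, 41);            % example grid axes
lon  = linspace(0, 20, 81);
time = 0:3:72;                          % forecast hours
Hs   = rand(41, 81, numel(time));       % placeholder wave heights [m]
% One griddedInterpolant per wave variable, linear in all three dimensions.
waveHeight = griddedInterpolant({lat, lon, time}, Hs, 'linear');
% Queries are continuous in position and time, so the objective no longer
% has the piece-wise constant regions of the nearest-point lookup.
h = waveHeight(57.3, 11.8, 25.5);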

Without the wave interpolation, the objective function appears "step-like" when perturbed in any single dimension, while the objective function with wave interpolation is smoother and without distinctive steps. The vertical lines in Figure 4.3 are due to the constraint penalization, where the objective function is increased by 1 as soon as any constraint is violated.

4.3 Modification of the Parallel Subplex algorithm

In an attempt to make the Parallel Subplex algorithm find local minima with lower objective values, the behavior of the step variable has been altered from the original algorithm description. The modification is that the lengths of the components of step are updated as

    step'_i ← { scale_i                    if Δx ≠ 0,
              { updated as in eq. (3.4)    if Δx = 0,        (4.1)

where scale_i is the initial simplex step parameter from the Subplex method. This modification makes the algorithm effectively "restart" when the inner Simplex search finds a better point, by resetting the step variable to its initial value. This makes the algorithm search a larger region, making it more probable to find a better local minimum. Note that Δx = 0 indicates that the inner search has stopped without finding a better point, likely due to the ψ-criterion in equation (3.2). The modification is evaluated in Section 6.1.3, showing that it increases the accuracy of the found solutions significantly.
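A Matlab-style sketch of the modified update; subplexStepUpdate is a hypothetical stand-in for the original rule in equation (3.4), and the variable names are illustrative:

if any(deltaX ~= 0)                % inner Simplex search found a better point
    step = scale;                  % the modification: restart from the
                                   % initial simplex step length
else
    step = subplexStepUpdate(step, deltaX, psi);   % original rule, eq. (3.4)
end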

4.4 Parameter tuning of the Parallel Subplex algorithm

A number of parameter variations have been investigated for the Parallel Subplex algorithm in order to find out if they can improve the solutions and the convergence rate of the problem. The default parameters from the original Subplex algorithm might not be optimal in the Parallel Subplex algorithm, as discussed in Section 3.3.3.

For all evaluations in this section, the "mean k value" is used to measure how good a solution is. The k value for a search is defined as the scalar objective value, scaled such that k = 0 for the grid solver's solution⁵. When measured over time, the k value is the best found solution after the solver has searched for that amount of time. This value is then averaged over a number of different search lines in order to evaluate the overall performance of a parameter setting.

Figure 4.4: Parameter variation of nsmin (mean k value vs. time, one line per nsmin ∈ {1, 5, 10, 14, 19, 25, 35, 44, 59, 88, 177}). The k value measures how close the best found solution is to the grid solver's solution at time t. Each line is the average of 80 search runs, and the problem is solved for the GN test voyage. For this test voyage the max value of the nsmin parameter is 177 (= N).

The parameter variations that have been investigated are:

Minimum number of subspaces nsmin:
This parameter controls the subspace dimensions in the Subplex method and is expected to affect the convergence speed of the algorithm. In Figure 4.4, the effect of different nsmin values is investigated. For all cases,

    nsmin = ⌊N/m⌋  and  nsmax = ⌈N/m⌉  for some m ∈ [1, N],

where N is the total number of dimensions and m is the number of subspaces. This setup ensures that the partitioning algorithm will succeed, and that the subspace dimensions are either nsmin or (nsmin + 1). The parallelization level P is set to 85% of the subspace dimension, i.e. for a subspace with dimension n, the parallelization level is

    P = round(n · 0.85)

(see the sketch after this list).

⁵ This is the same as the accuracy measure defined in Section 6.1.


Figure 4.5: Parameter variation of the parallelization level P. Each line is the average of 31 search runs, and the problem is solved for the GN test voyage. (a) Mean scalar objective value vs. time, one line per P ∈ {1, 20, 40, . . . , 160, 177}. (b) Best parallelization level P vs. time; the red regions show which values of P have the best solutions at a specific point in time.

Parallelization level P :This parameter controls the parallelization level in the Parallel Nelder–Mead Simplexmethod. In Figure 4.5 the effect of different values of P are investigated when nsmin =nsmax = N . This parameter setting allows the largest parallelization level in the innersearch, and the choice was deemed one of the best from the nsmin parameter tuning.The convergence speeds for all possible variations of P are shown in Figure 4.5a. InFigure 4.5b, the best found solution after t seconds is highlighted.

Subplex size reduction factor ψ vs. Simplex contraction factor β:
These two parameters were varied in order to examine if some values can improve the results for the voyage planning problem. The Subplex method's size reduction factor ψ is described in Section 3.2, while the Nelder–Mead Simplex method's simplex contraction factor is the parameter controlling how much the simplex should shrink in the two contraction moves. In [12] the parameter is called β for the contraction move and δ for the massive contraction move, but in this parameter variation we use δ = β and vary the value of β. The two parameters ψ and β were chosen since they both control how thorough the search is, the first for the outer and the second for the inner search in the Subplex algorithm. The heat map in Figure 4.6 shows how the convergence is affected by varying the parameters. The other algorithm parameters are again nsmin = nsmax = N. The tested parameters are generated by:

beta = linspace(0.0625, 0.9375, 15);
psi  = logspace(log10(0.9), log10(0.01), 15);
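The following runnable sketch summarizes how the tested nsmin/nsmax pairs and the corresponding parallelization levels are generated; the subspace counts m are example values:

N = 177;                        % decision space dimension of the GN test voyage
for m = [1 2 4 12 177]          % example numbers of subspaces
    nsmin = floor(N / m);       % all subspaces get dimension nsmin ...
    nsmax = ceil(N / m);        % ... or nsmin + 1
    P = round(nsmin * 0.85);    % parallelization level of the inner search
    fprintf('m = %3d: nsmin = %3d, nsmax = %3d, P = %3d\n', m, nsmin, nsmax, P);
end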

4.4.1 Optimal parameters

From the nsmin parameter variation, it was determined that the optimal performance for solving the voyage planning problem is obtained when nsmin = nsmax = N. The option nsmin = ⌊N/2⌋, nsmax = ⌈N/2⌉ is also a good candidate, and could be studied further. Since the option nsmin = nsmax = N is essentially a restarting Nelder–Mead Simplex method, the results are that the Parallel Subplex method works best when the parameters are chosen such that it behaves like the Parallel Simplex method with restarting.

Figure 4.6: Parameter variation of the Parallel Subplex algorithm (heat map of the mean k value over ψ, log-scaled from 0.9 down to 0.01, vs. the Simplex contraction parameter, 0.0625–0.9375). The Simplex contraction parameters are the β and δ parameters as defined in [12]. The red dotted lines show where the default parameters are located, i.e. β = 0.5 and ψ = 0.25. The mean k value of the default setting is k = 0.5371, and the mean k value of the best found setting is k = 0.4945. Each point is the average best found solution after 3 minutes for 9 search runs, and the problem is solved for the SB test voyage.

From the parallelization level P variation, it was determined that values of P in the range 150–160 give the best solutions, with rapid convergence from the start. The choice P = 88 finds the best solution in the end, but the convergence is slower than with the higher values of P. Also, we do not know if P = 88 is better because it happens to find better local minima, or if it has better convergence than the other values of P. This makes 150–160 a better choice, since we then have a wider range of good parameters. The optimal level of parallelization is likely dependent on the number of decision variables N, and the optimal value of P should thus be specified as a function of N. In the evaluations in Chapter 6 the choice

    P = round(N · 0.85)

is used as the best found parallelization level. (For the general Parallel Subplex method, use the subspace dimension n instead of N.) In [5], the optimal parallelization level for a smooth⁶ test function with N = 100 is found to be P = 80, and with N = 200 it is found to be P = 150. These levels correspond to 80% and 75% respectively, but it should be remarked that only 10 different values of P are evaluated, and that they do not account for the parallelization/vectorization performance boost. Thus, our found optimal parameter value supports the results of [5].

From the ψ and β parameter variation, it was found that no particular choice of parameters would improve the solutions significantly. While there were choices of parameters that gave better mean k values, the improvements are not substantial enough to support a change from the default values.

⁶ A non-smooth test function is also evaluated, for which the optimal P is lower.


5 | Choice of decision variables

This chapter describes three different decision variable alterations that have been investigated for the voyage planning cost function. The idea is that by changing the decision variables controlling the route which is traveled, we can make the algorithms find better minima in a shorter time.

The first variable alteration is made in order to speed up the computation of the objective function, while the second and the third alterations are introduced to manipulate the behavior of the objective function. The behavior is in this case the gradient and the "landscape" of the objective function, and by changing it we hope to be able to remove local minima and make the local optimization algorithms avoid decision space regions where we know the objective values will be high. While the full analytical implications of the changes are too complex to analyze in this thesis, the motivations behind them are explained, and the experimental validations of the changes are presented in Section 6.1.

The problem definitions in this section are to be compared with the speed through water problem (Definition 2.3). This is the reference problem, and it is abbreviated as VTW (for speed through water) in other parts of the thesis. The indexed variables (e.g. v_i, d_i in equation (2.5)) are always defined the same way throughout this section, but each of the problem definitions has its own corresponding x, f, and g.

5.1 Speed over ground variable

One implementation modification that makes a big difference with respect to computation time is to change the decision variables from speed through water to speed over ground. The difference between these two variables is whether the ocean currents are taken into account: while speed through water is used to compute the fuel consumption of the voyage, speed over ground is used to compute the time it takes to travel between two points on a map. In the original implementation, where speed through water is used as the decision variable, the individual costs and constraints for each segment of the route must be computed one after another, since the time of arrival at an intermediate coordinate is not known until the speed over ground has been computed from the ocean current data. If speed over ground is used, the starting time for each segment is quickly computed from the decision variables only, and the individual costs for each segment of the route can be computed in parallel.

This change makes it possible to use objective function parallelization, i.e. the objective function itself will make use of parallelism at some level. The implementation in this thesis does not assign different segments to different computation cores, but it still benefits from the parallelism through increased vectorization. An evaluation of the achieved performance gain can be found in Section 4.1.1.
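A small sketch (with toy numbers) of why speed over ground decouples the segments: all segment start times follow from one cumulative sum over the decision variables alone.

D = [12; 18; 9; 25];                 % segment lengths [nm] (toy numbers)
v = [14; 15; 13; 16];                % speed over ground per segment [knots]
tStart = [0; cumsum(D(1:end-1) ./ v(1:end-1))];   % start time of each segment [h]
% With speed through water as the variable, tStart(i) would depend on the
% currents encountered on segments 1..i-1, forcing a sequential loop.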

Since this modification is a change of decision variables, it will also affect the behavior of the objective function. The effect of the variable change should be small, however, since the two variables are closely correlated.⁷ To make the feasible objective space equal after the variable change, the bound constraint on speed through water is now included in g instead, since its dependency on the decision variable becomes nonlinear. This constraint asserts that we do not need to constrain speed over ground, and we can thus use

    v_LB = 0  and  v_UB = +∞

for all speeds v_i. We can also enforce more realistic limits, e.g. by

    v_min − v_c ≤ v_i ≤ v_max + v_c,  for all i = 1, . . . , L,

where v_min and v_max are the limits on speed through water, and v_c is an upper bound on the ocean currents in the region of travel.

⁷ For the 2925 solutions obtained in the GN test voyage (Figure 6.2), the average absolute difference between the two variables was 0.192 knots, and the maximum difference was 1.636 knots.

The problem is similar to the speed through water problem in Definition 2.3, and is defined as:

Definition 5.1. A speed over ground problem (Problem type VOG) is the problem

    minimize_{x ∈ X} f(x)  with  X = { x ∈ R^(2L−1) | g(x) ≤ 0, x_LB ≤ x ≤ x_UB },        (5.1)

where

    x = (d_1, . . . , d_{L−1}, v_1, . . . , v_L)^T,
    d_i = "perpendicular distance from the nominal route's i-th intermediate coordinate",
    v_i = "speed over ground for the i-th segment of the route",        (5.2)

    f(x) = ("travel time", "fuel consumed", "mean wave height")^T  and
    g(x) = ("distance limit violation", "land violation", "power violation",
            "travel time violation", "speed through water violation")^T.        (5.3)

5.2 Fixed travel time variable

The second variable alteration is to fix the travel time prior to the optimization, reducing the number of objectives from three to two. Given a decision variable vector x as defined in equation (5.2), it is easy to find another decision variable vector x′ for which the "travel time" objective is any desired value T, by only changing the speed components v_i of x:

1. Let x and f be as defined in equations (5.2) and (5.3).

2. For a vector x, compute f(x) and let T_W = "travel time", i.e. the first component of f(x).

3. Let the vector x′ be such that d′_i = d_i for i = 1, . . . , L−1, and v′_i = (T_W/T) · v_i for i = 1, . . . , L.

4. The "travel time" objective of f(x′) is then T.

But to be able to define the problem properly, we introduce the "unscaled speed" variables w_i as w_i = "unscaled speed over ground for the i-th segment". The relationship between these decision variables and the speed over ground variables v_i is

    v_i = (T_W/T) · w_i,  with  T_W = Σ_{j=1}^{L} D_j/w_j,  for i = 1, . . . , L,        (5.4)

where T is the desired travel time and D_i = "distance in nautical miles between coordinates (i−1) and i".
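A small numeric check of equation (5.4) with toy numbers:

D  = [12; 18; 9; 25];            % segment lengths [nm] (toy numbers)
w  = [10; 12; 9; 11];            % unscaled speeds over ground [knots]
T  = 5;                          % required travel time [h]
TW = sum(D ./ w);                % travel time of the unscaled route
v  = (TW / T) * w;               % scaled speeds over ground, eq. (5.4)
assert(abs(sum(D ./ v) - T) < 1e-12)   % the travel time is now exactly T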

We do not have any upper bound constraints on the decision variables w_i, so we can relax the bound constraints in a similar way as for v_i, i.e. by using w_UB = +∞ for all unscaled speeds w_i. The lower bound is still needed, since the unscaled speeds must be strictly larger than 0, and the lower bound must be a valid value. In the implementations in this thesis the value w_LB = 0.001 knots is used as the lower bound, but one should be careful if the unscaled speeds in the solutions obtained are close or equal to this value. This constraint will not affect the solution space of the problem, but it could be avoided by other formulations of the problem. We can also remove the "travel time violation" constraint, but again have to include the nonlinear constraint on speed through water (as in Section 5.1).

The problem is then defined as:

Definition 5.2. A fixed-time problem (Problem type FIX) for a route with travel time T is the problem

    minimize_{x ∈ X} f_T(x)  with  X = { x ∈ R^(2L−1) | g_T(x) ≤ 0, x_LB ≤ x ≤ x_UB },        (5.5)

where

    x = (d_1, . . . , d_{L−1}, w_1, . . . , w_L)^T,
    d_i = "perpendicular distance from the nominal route's i-th intermediate coordinate",
    w_i = "unscaled speed over ground for the i-th segment of the route",        (5.6)

    f_T(x) = ("fuel consumed", "mean wave height")^T  and
    g_T(x) = ("distance limit violation", "land violation", "power violation",
              "speed through water violation")^T.        (5.7)

With the variable alteration described in Definition 5.2 we get the same computational time benefit as in the speed over ground problem, since the implementation can be parallelized in exactly the same way. When profiling the two implementations used in this thesis, both variations have the same evaluation time, so there is no disadvantage (or advantage) of the alteration with respect to computational efficiency.

The main motivation for the variable change is that we now have one less cost function to optimize for, and one less constraint, since the "travel time" objective is automatically fixed to the required value. This is made possible by utilizing more of the information we have about the voyage planning problem, i.e. that it is a trivial task to find a route (not necessarily feasible) with a required travel time. By solving this subproblem analytically, we can uncouple it from the two other objectives and effectively "hide" it from the solver. Experimental results in Section 6.1 show that the change improves the accuracy of the solutions significantly.

Since we have a different set of objectives, we have to change the algorithm with which the Pareto front is obtained in the following way:


input:  fixed-time voyage planning problem (Definition 5.2),
        minimal and maximal allowed travel time t_min, t_max,
        bounding objectives z^I, z^N as in equation (2.7) (z ∈ R², fuel and wave only),
        number of time sample points N1, number of "fuel–wave" sample points N2
output: sampled Pareto front P_S = {x*_{1,1}, x*_{1,2}, . . . , x*_{N1,N2}}

Create a set of N1 travel times T0 = {t_1, t_2, . . . , t_{N1}} such that
    t_j = t_min + (j−1)/(N1−1) · (t_max − t_min);
Create a set of N2 "fuel–wave" reference points Z0 = {y_1, y_2, . . . , y_{N2}} (where y_i ∈ R²)
    based on the bounding objectives z^I and z^N, as described in Section 2.3;
for j = 1 : N1 do
    for i = 1 : N2 do
        Find the minimum of the function
            h_ij(x) = H(S_{y_i}(f_{t_j}(x)), g_{t_j}(x)),
        where H is the penalization function from equation (2.12), S_{y_i} is the scalarizing
        function from equation (2.10), and f_{t_j}, g_{t_j} are from the voyage planning problem;
        Let x*_ij = argmin_{x ∈ X} h_ij(x).
    end
end

Algorithm 5.1: The fixed-time Pareto sampling algorithm.

5.3 Time as decision variable

The third and last variable alteration that has been investigated is to include the travel time T in the decision variable vector, creating the problem:

Definition 5.3. A time-as-variable problem (Problem type TAV) is the problem

    minimize_{x ∈ X} f(x)  with  X = { x ∈ R^(2L) | g(x) ≤ 0, x_LB ≤ x ≤ x_UB },        (5.8)

where

    x = (d_1, . . . , d_{L−1}, w_1, . . . , w_L, T)^T,
    d_i = "perpendicular distance from the nominal route's i-th intermediate coordinate",
    w_i = "unscaled speed over ground for the i-th segment of the route",
    T = "travel time",        (5.9)

    f(x) = ("travel time", "fuel consumed", "mean wave height")^T  and
    g(x) = ("distance limit violation", "land violation", "power violation",
            "speed through water violation")^T.        (5.10)

The motivation for defining such a problem, in comparison to Definition 5.2, is that we then have the "travel time" as an objective again. This makes the Pareto front optimization comparable to optimizations where the speed through water and speed over ground decision vectors are used, and the "travel time" objective becomes continuous again. The "travel time violation" must be re-included in the problem, but it is now a bound constraint and is embedded in x_LB and x_UB instead of g.

Compared to a speed over ground decision vector, this method has the benefit that the "travel time" objective is completely decoupled from all but one decision variable, on which it depends linearly. This is a good property since a change in d_i or w_i will only affect the "fuel consumed" and "mean wave height" objectives, but not the "travel time" objective. Experimental results are shown in Section 6.1, and they indicate that the accuracy of the solutions is similar to the results of the fixed-time decision variable.


6 | Results

In this chapter the performance of the algorithms and the effects of the problem variations are evaluated. The two main aspects investigated are the speed of convergence of the algorithms and the accuracy of the solutions. While the first aspect is easily defined and measured, the second is not evaluated without difficulty.

6.1 Evaluation of the accuracy of the solutions

One aspect that was not investigated in [2] is the accuracy of the solutions obtained. Since we are investigating the use of local, gradient-free optimization methods to solve minimization problems, we are approximating the global minimum by a local minimum. The accuracy of this approximation must be evaluated in some way, and this is what we do in this section.

This task is not trivial, since we cannot compare an "approximate" front to the actual Pareto front, the decision space being too large for us to find it. Since we cannot find out if a point on the obtained front is Pareto optimal, or how close it is to the Pareto front, we have to measure the accuracy in some other way. The obtained front is thus compared to the discretized problem's Pareto front, which can be computed by the Grid search method described in Section 3.4. While the discretized problem's Pareto front is obviously not the same as the full problem's Pareto front, it is a subset of the feasible solution space and it should be close, in some sense, to the real Pareto front. It can be used to get an indication of how accurate a local solver is, since if the solutions obtained with a local optimization method are equal to or better than the grid solver's solutions, they should be relatively close to the full Pareto front.

In order to be able to evaluate a front, we define the accuracy measure as:

Definition 6.1. Let P ⊆ Z be the Pareto front of the discretized problem, and let P′ be the corresponding scaled Pareto front, by equation (2.9). For a scaled objective vector z_0 ∈ P′, the accuracy k of a decision variable vector x is

    k = H(S_{z0}(f(x)), g(x)),

where H is the penalization function from equation (2.12), S_{z0} is the scalarizing function from equation (2.10), and f, g are from the voyage planning problem. Note that k is the obtained solution to the scalar minimization problem in Algorithm 2.1 if z_0 is used as the reference point.

The accuracy measure can be seen as a measure of how good the local solver is at finding the global solution to the problem. If a local optimization method gives a solution x for which the accuracy is k, then this solution is better than the grid solver's solution if k < 0, equal if k = 0, and worse if k > 0.
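A runnable sketch of the accuracy computation under two assumptions: the penalization H is taken to add 1 whenever any constraint is violated (consistent with the penalization described in Section 4.2), and the scalarization S_z0 is approximated here by a Chebyshev-type maximum over the scaled objective differences; the thesis' exact forms are equations (2.12) and (2.10).

S = @(fx, z0) max(fx - z0);               % assumed scalarization (not eq. (2.10) verbatim)
H = @(s, gx) s + double(any(gx > 0));     % +1 as soon as any constraint is violated
k = @(fx, gx, z0) H(S(fx, z0), gx);       % accuracy against grid reference z0

fx = [0.42; 0.31; 0.55];                  % scaled objectives of a local solution
z0 = [0.40; 0.35; 0.50];                  % grid solver's scaled objectives
gx = [-1; -0.2; 0; -3];                   % constraints, all satisfied (g <= 0)
kval = k(fx, gx, z0);                     % 0.05 here: slightly worse than the grid point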

Definition 6.1 also allows us to evaluate the performance of a local solver when obtaining a full front of solutions. We do this by computing a front with Algorithm 2.1, but instead of sampling N points from the reference plane, we use the grid solver's solutions as our reference points. This gives us an "accuracy map" of the different regions of the Pareto front, which we use to evaluate the accuracy of the full front. In the accuracy maps in this section, all solutions which are better than or equal to the grid solver's solutions are marked with red dots, in order to distinguish the regions where the local solvers find better solutions.


6.1.1 Wave data interpolation

In order to evaluate the effect of the wave data interpolation, accuracy maps for the four NLopt solvers and the Parallel Subplex algorithm, with and without wave interpolation, are shown in Figure 6.1 and Figure 6.2. All runs solve the speed over ground problem (Problem type VOG). Note that the shapes of the Pareto fronts are slightly different between Figure 6.1 and 6.2, since the grid solver's solutions also differ due to the wave data interpolation. Since it is clear that the wave data interpolation increases the accuracy of the solutions significantly, all other evaluations in this chapter have it enabled.

Figure 6.1: Accuracy maps when no wave interpolation is used (panels: Parallel Subplex, NELDERMEAD, SBPLX, COBYLA, PRAXIS; axes: time, fuel, wave height; color scale: accuracy k from 0 to 0.6). The problem analyzed is the GN test voyage, with Problem type VOG. Each point corresponds to a line-search with a grid solver solution as the reference point. The marker position is the grid solver's scaled objective value, and the color corresponds to the accuracy of the best found solution. The red points show where the solutions are better than the grid solver's.

6.1.2 NLopt vs. Parallel Subplex method

Figure 6.1 and 6.2 also show how the NLopt algorithms perform in comparison to the Parallel Subplex method with respect to accuracy, when solving Problem type VOG. It is clear that the Parallel Subplex performs better than the NLopt methods, and it is also better than the grid solver in the region where the wave height objective is less significant.

Figure 6.2: Accuracy maps when wave interpolation is used (panels, axes, and color scale as in Figure 6.1). The problem is otherwise the same as in Figure 6.1.

6.1.3 Modification of the Parallel Subplex algorithm

Figure 6.3 shows the effect of the step variable modification described in Section 4.3. The modification improves the accuracy of the algorithm significantly, and all other Parallel Subplex method evaluations in this section use the modified variable.

Figure 6.3: Accuracy maps of the Parallel Subplex method with and without the step variable modification (panels: unmodified and modified step variable; axes: time, fuel, wave height; color scale: accuracy k from 0 to 0.2). The problem solved is the speed over ground definition, with the GN test voyage.

6.1.4 Choice of decision variable

In Figure 6.4 and 6.5, the accuracy maps for the four problem definitions are shown. For the GN test voyage (Figure 6.4) the speed through water and speed over ground definitions have similar accuracy, while the fixed-time and time-as-variable definitions yield an increased accuracy in all regions of the Pareto front. Similar results are achieved for the SB test voyage (Figure 6.5), but here the time-as-variable problem finds worse solutions. This is due to an algorithm issue where the "travel time" decision variable gets stuck at T = T_max, which is further discussed in Section 7.2. In Figure 6.6, the distributions of the accuracy k for the four problem definitions are compared.


Figure 6.4: Accuracy maps with different problem definitions (panels: speed through water, speed over ground, fixed-time, time-as-variable; axes: time, fuel, wave height; color scale: accuracy k from 0 to 0.2). The solver used is the Parallel Subplex method for all four fronts, with the GN test voyage. Notice that the "speed over ground" front is the same as the rightmost front in Figure 6.2, and that the color scale is changed compared to Figures 6.1 and 6.2.

Figure 6.5: Accuracy maps with different problem definitions (panels: speed through water, speed over ground, fixed-time, time-as-variable; axes: time, fuel, wave height; color scale: accuracy k from 0 to 0.5). The solver used is the Parallel Subplex method for all four fronts, with the SB test voyage.


Figure 6.6: Distribution of the final accuracy k for the four problem definitions (histograms of the number of search lines vs. final accuracy k, one panel per problem definition). (a) GN test voyage. (b) SB test voyage.

6.2 Evaluation of the convergence speed

Figure 6.7: Convergence of all 585 search lines (mean accuracy k vs. time), when the Parallel Subplex is solving the speed over ground problem for the GN test voyage. Shown are the single search line convergences and the mean search line convergence.

In this section the convergence speed for the different problems is evaluated. All plots show the average accuracy k of the best found solution up until the time on the horizontal axis. Each convergence line is the average of all searches in the corresponding accuracy map in Section 6.1. This means that the convergence line for an SB test voyage is the average of 559 searches, and for a GN test voyage the average of 585 searches. In Figure 6.7, all convergence lines for one of the Pareto fronts are shown in order to illustrate how the average is formed. As can be seen, the convergence can vary a lot from search line to search line, meaning that an average accuracy close to or less than zero does not imply that all solutions are as good as or better than the grid solver's solutions.

In Figure 6.8 and 6.9 the average convergence of the NLopt library algorithms is compared to that of the Parallel Subplex algorithm. These two plots also compare the wave data interpolation, and it is clear that the interpolation makes all solvers find better solutions.

In Figure 6.10 and 6.11 the convergence is compared for the four problem definitions. For both test voyages the fixed-time problem definition converges to the best solutions. For the SB test voyage, the time-as-variable problem definition appears to converge to worse solutions than the speed over ground problem definition. This is mostly due to the algorithm issue mentioned in Section 6.1.4.


Figure 6.8: Convergence (mean accuracy k vs. time) for the speed through water problem, GN test voyage, without wave interpolation. Lines: NLopt Nelder–Mead, NLopt Cobyla, NLopt Praxis, NLopt SBPLX, Parallel Subplex.

Figure 6.9: Convergence (mean accuracy k vs. time) for the speed through water problem, GN test voyage, with wave interpolation. Lines: NLopt Nelder–Mead, NLopt Cobyla, NLopt Praxis, NLopt SBPLX, Parallel Subplex.

Figure 6.10: Convergence (mean accuracy k vs. time) for the four problem definitions, for the first 60 seconds. The problems are solved with the Parallel Subplex method, for the GN test voyage. The mean number of function evaluations per second is 641 for the speed through water problem, and ∼1800 for the others.

Figure 6.11: Convergence (mean accuracy k vs. time) for the four problem definitions, for the first 60 seconds. The problems are solved with the Parallel Subplex method, for the SB test voyage. The mean number of function evaluations per second is 662 for the speed through water problem, and ∼2200 for the others.


6.3 Evaluation of the computational efficiency

In order to evaluate the actual computational efficiency improvements achieved when running the optimizations, the time required to compute the Pareto fronts with the original and the serial implementations can be estimated. From the timing results in Section 4.1.1 (Table A.1) it is possible to compute the number of evaluations per second (eval/s) for the non-vectorized implementations, which can then be used to estimate the time required to compute the fronts from Chapter 6. In Table 6.1 these numbers are shown for the SB and GN test voyages. Note that the times are for one single search line and not for the full front search.

(a) SB test voyage

Code             eval/s    time     speedup
new vectorized     2200     2 m         130
new serial          120    37 m         7.1
original code      16.9   260 m           1

(b) GN test voyage

Code             eval/s    time     speedup
new vectorized     1800     3 m         180
new serial         78.7    69 m         7.9
original code      10.0   540 m           1

Table 6.1: Achieved performance relative to the original code.
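As a consistency check of Table 6.1a: a 2-minute SB search line at 2200 eval/s corresponds to roughly 120 · 2200 ≈ 264 000 function evaluations, which would take about 264 000/120 ≈ 2 200 s ≈ 37 m with the new serial code and 264 000/16.9 ≈ 15 600 s ≈ 260 m with the original code.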

The 2 and 3 minute search time limits are heuristically chosen as time durations where most lines have converged to a local minimum. The stopping criterion is further discussed in Section 7.2.

6.3.1 Total front computation time

In this section the total front computation time is compared to the time requirements of the grid solver. It should be noted that these numbers are rough estimates, and that there are many factors that affect the actual time requirements. These factors are discussed further in Chapter 7.

(a) Grid solver

          execution time
Voyage    1 core      16 cores    front size
SB        52 m        8 m         559
GN        4 h 54 m    42 m        585
GD        70 s        –           25

(b) Parallel Subplex

          execution time
Voyage    1 core       16 cores    47 cores    search lines
SB        18 h 38 m    1 h 10 m    24 m        559
GN        29 h 15 m    1 h 51 m    39 m        585
GD        –            –           –           –

Table 6.2: Time required to compute the full front for the different methods.

The execution times for the grid solver are based on measurements⁸ of the serial implementation, and the 16-core time is based on the results from [13], where five parallel implementations of the Grid search method are evaluated. Based on their evaluations, a speedup factor of about 7 can be expected for both the SB and the GN test voyage when using 16 cores.

The execution times for the Parallel Subplex solver are theoretical, and are estimated by a perfectly parallel setup where each search line takes exactly 2 or 3 minutes to compute. The 47-core time is provided since this setup⁹ was used for computing the fronts examined in Sections 6.1 and 6.2. The measured overhead was rather small even for the Matlab implementation, and the actual computation time for the GN test voyage was 40 m 16 s, compared to a theoretical 39 m. Some possible improvements are discussed in Chapter 8.

⁸ CPU: Intel Core i7-3687U @ 2.10 GHz – 2.60 GHz.
⁹ The cluster allocates one separate core for Matlab's main thread, so the actual core count is 48.


7 | Discussion & conclusions

7.1 Computational improvements

A big improvement with respect to the computational efficiency of the full Pareto sampling process has been achieved, mainly through vectorization and code enhancements on different levels of the front sampling process. To achieve the same results as the new implementation, the original code would require 180 times as much time for the largest of the evaluated problems. With the 48-core cluster used to compute the fronts in this thesis, this reduces the time required from 5 days to merely 40 minutes.

With one exception, all improvements are made in Matlab code only. Compared to a C/C++ implementation, this is advantageous since future changes to the voyage planning model are easily integrated into the code base. The one exception is the MEX/C++ implementation of the routine performing the land constraint violation check, which greatly improves the computational speed of the vectorized objective function. This module can be used for other systems as well, but it should be noted that the largest benefit is achieved when the land constraints for many segments are to be computed simultaneously. For the computation of one segment, the original code is just as fast as the new implementation.

The use of parallel optimization algorithms in combination with a vectorized objective function has turned out to be a very successful approach to increasing the efficiency of the implementation. The approach would likely not have had the same relative performance gain if the overhead of the objective function were to be reduced by some high-performance implementation of the whole voyage planning model, but a huge effort would be required for the development of such a code. The approach allows us to reduce the impact of the objective function overhead significantly, without actually using multiple parallel computational units. This in turn enables us to use the full parallel capacity (where available) for different search lines, over which the algorithm is trivially parallelizable.

The alternative to this approach would be to either make the objective function internally parallelized (i.e. objective function parallelization), or couple the parallel optimization algorithm with a vectorized objective function that computes different routes on different computational cores. However, it would likely be difficult to achieve equally good performance, since Matlab's parallel framework tends to have a large communication overhead.

The vectorized code would likely be useful for the evolutionary algorithms evaluated in previous thesis projects [1]. The same speedup of 180 should be observed for the GN test voyage if a population size of 150 (i.e. the same parallelization level) is used, and an even bigger speedup if the population size is larger than this.

7.2 Algorithms

The algorithms studied in this thesis have been evaluated thoroughly, and the conclusions are that the parallel optimization algorithms can be advantageous when applied to the current implementation of the voyage planning problem. The implemented Parallel Subplex algorithm has turned out to have both a good convergence rate and a good accuracy with respect to finding approximate solutions to the global problem.

One issue with using the NLopt library, or any other generic local optimization algorithms, is that these algorithms are designed to find any local minimum of the objective function. In the application of voyage planning, it is of greater importance that the solution is close to the global minimum than that it is a local minimum. By modifying the Parallel Subplex such that it restarts the Simplex search (Section 4.3), we effectively "globalize" the algorithm in a similar fashion as in the GBNM algorithm in [7]. The modification does not guarantee that the solution will converge towards the global minimum, but it will increase the probability of finding a better local minimum. However, the modification might have undesired effects for other kinds of problems, since if the objective is to find any local minimum, the method will converge more slowly towards a sufficiently good solution.

The restart mechanism introduced in Section 4.3 is also a crude brute-force approach that can be improved in several ways. While further research is required, it is likely that a combination of the GBNM algorithm with the Parallel Nelder–Mead algorithm would yield better results than the current implementation. The implementation also has other issues, one of which appears in the time-as-variable accuracy map in Figure 6.5. The high k values in this plot are due to the false assumption that Δf(x) = 0 implies that Δx = 0. This is not the case in the implemented Nelder–Mead Simplex method, since any equally good vertex (i.e. f(x) = f(x0)) could be returned as the best found vector by the inner search. The issue can be resolved by making the inner search return x0 if this is the case, or by changing the condition in equation (4.1) to test for Δf(x) = 0. With the second approach, the algorithm found better results for the mentioned accuracy map, but the effect on the other problems has not been investigated.
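A sketch of the second resolution, using the same illustrative names as the sketch in Section 4.3 (deltaF is the change of the best objective value found by the inner search):

if deltaF ~= 0                  % was: any(deltaX ~= 0)
    step = scale;               % restart only on a genuine improvement
else
    step = subplexStepUpdate(step, deltaX, psi);   % original rule, eq. (3.4)
end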

The conclusion drawn from the investigations of the restarting mechanism is that even the crude approach used in this thesis is good enough to make the solver find significantly better solutions. The reinitialization mechanism is important for achieving a good convergence rate with the Nelder–Mead Simplex method when the number of dimensions is high, and even more so when the parallel Simplex method is used.

A result from the parameter tuning in Section 4.4 was that the choice nsmax = nsmin = N was the best configuration for the Parallel Subplex algorithm when applied to the voyage planning problem. This is unexpected, since it effectively turns the Subplex algorithm into a restarting Nelder–Mead Simplex method. Thus, the best found method is actually a variant of the Parallel Nelder–Mead Simplex method, and not a Parallel Subplex method. Further research could be made on the option nsmin = ⌊N/2⌋, nsmax = ⌈N/2⌉, which had similar performance as the previous choice of parameters.

The stopping criterion used in all optimizations in this thesis is based on execution time, and is chosen heuristically based on observation of the convergence rate for the studied test voyages. As this approach for obvious reasons is not suitable for a final application, a more sophisticated stopping criterion based on convergence rates should be investigated and implemented. The advantage of the execution-time stopping criterion is that each search line gets exactly the same amount of optimization time, and that we are sure that the optimization does not stop too early. The last property is important when we are comparing the accuracy of different problem definitions, since we can be sure that the problem definition will not affect when the algorithm stops. For a final application, one should decide on a specific problem definition before deciding on the stopping criterion.


7.3 Viability of the local solver approach

When a global method like the Grid search algorithm is used, it is not necessary to have a "well-behaved" objective function. By this we mean that the objective function can have discontinuities, zero gradient, many local minima, and multiple disjoint feasible regions. These properties will not cause any problems as long as the voyage planning model and grid discretization are accurate enough for our needs, and the global solver will find all optimal solutions regardless.

The same is not true for a local solver, since it relies on the shape of the objective function to determine where to search next in the objective space. Discontinuities and zero gradients will make it difficult to locate a good "downhill" direction to search in, too many local minima will make it difficult to get close to the global minimum, and disjoint feasible regions make it difficult to "travel" from one region to another.

This creates a disadvantage for the local solver approach, in that the accuracy of the found solutions is sensitive to the design of the voyage planning problem. In this thesis we have shown that tuning the voyage planning model to be more suited for the local solvers is more important than the choice of local optimization algorithm. This is clear from the results in Chapter 6, where both the wave data interpolation and the fixed-time problem definition improved the accuracy of the found solutions much more than the choice of optimization method did. This means that one has to be careful about how the voyage navigation problem is defined, both with respect to objectives and decision variables.

Without the wave data interpolation, the "mean wave height" objective is a sum of discrete variables, making its gradient zero, and all points are thus local minima. The zero gradient is evident in Figure 4.2, where the objective function is piece-wise constant in a neighborhood around the minimum. Finding the minimum of this function is a discrete optimization problem, for which the (continuous) local optimization methods are unsuitable. With interpolation, the objective is a sum of continuous variables, which is better suited for the local optimization methods. The "mean wave height" objective could also be made even more well-behaved by increasing the number of sampled points over the route, which would approximate the mean wave height over the full route better than the sum of the wave heights at the coordinates. This would likely also decrease the number of local minima in the objective space.

The conclusion is that for the local solver approach to be viable, the voyage planning problem must be defined and implemented in such a way that it is a problem suitable for the local solvers. It is thus more difficult to use than the global methods, but there are other properties that must be considered. When compared to the grid solver, the main aspects to consider are:

• The sensitivity to the problem definition, as discussed above.

• As seen in Figure 6.4 and 6.5, the local solvers can find better solutions than the grid solver, but there is no guarantee of finding such. The grid solver will find all optimal solutions to the discrete problem, but we do not know how close they are to the optimal solutions of the continuous problem.

• The front sampling algorithm being used is "embarrassingly parallel", meaning that it scales perfectly with the number of computational cores available, up to the maximum search time of a search line. For the grid solver, a speedup factor of 7 can usually be expected when using 16 cores (result from [13]).

• The computational time scales linearly with the number of required points on the front, but further work can likely be made to reduce the total workload.

• As opposed to the grid solver, the solutions found with the local solvers are continuous. This is an advantage since we can find better solutions that are not present in the discrete problem.

• The size of the search space can be increased without affecting the convergence rate. The current "maximum distance from the nominal route" is set to give a fair comparison to the grid solver, but the limits could easily be loosened to allow the local solvers to search routes over larger regions.

• The local solvers allow for more complex/accurate voyage objective models, since there is no need for the "aggregation" property of the segments. For the grid solver, this can be limiting if further constraints are to be used, or if other properties of the route must be accounted for. An example would be a constraint on the change of speed between two route segments. In this case, another dimension would be needed in the grid, which would complicate the method unreasonably.

• With the current implementation, the grid solver has an advantage with respect to computational time if the same hardware setup is used for both methods. It is however likely that other search schemes can reduce the total computation time for the local solvers, as discussed in Chapter 8.

8 | Further improvements

Figure 8.1: Non-optimal points in the SB test voyage, fixed-time problem (objective space: time, fuel, wave height; markers: Pareto optimal and non–Pareto optimal points).

There are several parts of the full front sampling process (Section 2.6) that can be improved further. In addition to the possible improvements mentioned in Chapter 7, the Pareto sampling algorithm can be better adapted to the use of local solvers. In Figure 8.1, the Pareto optimality of the found solutions from the fixed-time problem in Figure 6.5 is shown. For this front evaluation, 236 out of the 559 solutions were non–Pareto optimal, i.e. dominated by at least one of the other found solutions. The reason for the non-optimality can be either that the search line has not fully converged, or that a non-global minimum has been found.

To improve these non-optimal solutions, one could restart the search with a dominating solution given as the initial guess, which would allow a better solution to be found for that search line. This process could then be iterated until all search lines have one optimal solution.

A similar approach could be used to reduce the required CPU time, by allowing the search lines to "borrow" solutions from each other. If the solver has access to, say, 47 cores, it could start by solving 47 lines in parallel. When it starts the next 47 searches, it could use the so far best found solutions to initiate those search lines, and so on. Since the separate search lines would then have to search for a shorter time, this setup would reduce the total required CPU time. It could also be used to "upsample" a front, i.e. to create new search lines in a region of interest on the front.

Another way of using the local solvers could be to couple them with the grid solver, by letting the grid solver do a "rough" initial search, and then letting the local solver do a final search based on the solutions of the grid solver. If this second-stage search is upsampled, it could improve both the resolution and the accuracy of the solutions, without requiring a higher-resolution discretization in the grid search.

Some other details of the front sampling process that can be worked on are:

• Even if the current process would find the global solutions to the minimization subprob-lems, the solutions are possibly weakly Pareto optimal. A final search should thus bemade to make sure that the non-dominant objectives10 are minimal.

• The use of different scalarization methods should be investigated, since this choice could improve the accuracy of the solutions. In the current implementation, the solutions are sometimes located far from the actual search line, i.e. far from the tip of the cone in Figure 2.3. In [6], the scalarization is formulated as a line search with a “slack variable” instead of the cone formulation used in this thesis. Further improvements could penalize a solution that is located too far from the search line by placing a constraint on the slack variable. In [4], another scalarization method called the Weighting method is described (see the sketch after this list), which could be evaluated further in combination with a similar slack-variable constraint approach.

• Improved restarting mechanism (GBNM, etc.), as discussed in Section 7.2.

• Improved wave measure, as discussed in Section 7.2.
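For reference, the Weighting method mentioned above is commonly stated as follows (standard textbook form; the symbols w_i and f_i are generic and not the notation used in [4]):

    \min_{x \in X} \; \sum_{i=1}^{m} w_i f_i(x),
    \qquad w_i \ge 0, \quad \sum_{i=1}^{m} w_i = 1.

Sweeping the weight vector w over the unit simplex generates Pareto optimal points, but only those on convex parts of the front, which is one reason to combine the method with an additional constraint such as the slack-variable approach discussed above.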



References

[1] Angelica Andersson. “Multi-objective optimisation of ship routes”. M.S. thesis. ABB Corporate Research/Chalmers University of Technology, 2015.

[2] Joakim Borgh and Erasmus Cedernaes. “Analysis of optimization algorithms for a black box problem”. Project report. ABB Corporate Research/Uppsala University, 2016.

[3] Richard P Brent. Algorithms for minimization without derivatives. Prentice-Hall, 1972.

[4] Waqar Hameed. “Multi-Objective Optimization of Voyage Plans for Ships”. M.S. thesis. ABB Corporate Research/Lund University, 2016.

[5] Donghoon Lee and Matthew Wiswall. “A parallel implementation of the simplex function minimization routine”. In: Computational Economics 30.2 (2007), pp. 171–187.

[6] Jonas Linder and Simon Lindkvist. “Interactive multiobjective optimization with application to hot rolling mills”. M.S. thesis. ABB Corporate Research/Chalmers University of Technology, 2011.

[7] Marco A Luersen and Rodolphe Le Riche. “Globalized Nelder–Mead method for engineering optimization”. In: Computers & Structures 82.23 (2004), pp. 2251–2260.

[8] Mats Molander. Algorithm for Optimized Voyage Planning. Technical Report SECRC/AT/TR-14/021. ABB Corporate Research, 2014.

[9] John A Nelder and Roger Mead. “A simplex method for function minimization”. In: The Computer Journal 7.4 (1965), pp. 308–313.

[10] Peter Nordstrom. “Multi-objective optimization and Pareto navigation for voyage planning”. M.S. thesis. ABB Corporate Research/Uppsala University, 2014.

[11] Michael JD Powell. “A direct search optimization method that models the objective and constraint functions by linear interpolation”. In: Advances in optimization and numerical analysis. Springer, 1994, pp. 51–67.

[12] Thomas Harvey Rowan. “Functional stability analysis of numerical algorithms”. Ph.D. thesis. University of Texas at Austin, 1990.

[13] Anton Sundin and Viktor Wase. “Parallelization of ABB’s Grid Search Method using MATLAB”. Project report. ABB Corporate Research/Uppsala University, 2016.


A | Timing of the objective function

[Figure A.1: Timing of serial code and vectorized code for the GD test voyage. Plot omitted: time relative to the original code (0–200%) versus the number of vectors n (1–200) for VTW, VOG, VTW (no mex), and VOG (no mex); reference timings: original code 40.2 ms, best serial VTW 27.3 ms, best serial VOG 6.1 ms.]

[Figure A.2: Timing of serial code and vectorized code for the KQ test voyage. Plot omitted: time relative to the original code (0–250%) versus the number of vectors n (1–200) for VTW, VOG, VTW (no mex), and VOG (no mex); reference timings: original code 77.1 ms, best serial VTW 54.3 ms, best serial VOG 10.1 ms.]

[Figure A.3: Timing of serial code and vectorized code for the GN test voyage. Plot omitted: time relative to the original code (0–250%) versus the number of vectors n (1–200) for VTW, VOG, VTW (no mex), and VOG (no mex); reference timings: original code 100.0 ms, best serial VTW 70.0 ms, best serial VOG 12.7 ms.]


Test voyage | original code | tVTW(n)           | tVOG(n)           | tVOG(1)
------------|---------------|-------------------|-------------------|--------
GD          | 40.2          | 54.9 + 0.125 · n  | 12.8 + 0.111 · n  | 6.1
GN          | 100.0         | 158.5 + 0.359 · n | 28.4 + 0.303 · n  | 12.7
KQ          | 77.1          | 121.4 + 0.265 · n | 22.1 + 0.232 · n  | 10.1
SB          | 59.2          | 94.5 + 0.202 · n  | 17.6 + 0.179 · n  | 8.3

Table A.1: Linear fits of evaluation time for the four test voyages (execution times in ms). The tVTW(n) and tVOG(n) linear fits are for n ≥ 2.
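For illustration, a minimal Matlab sketch of how linear fits of this form can be computed with polyfit, using synthetic timings rather than the thesis measurements:

    n = 2:200;                                  % number of vectors per call
    t = 12.8 + 0.111*n + 0.05*randn(size(n));   % synthetic timings [ms]
    p = polyfit(n, t, 1);                       % p(1) = slope, p(2) = intercept
    fprintf('t(n) = %.1f + %.3f * n ms\n', p(2), p(1));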

B | Implementation details

B.1 Implementation of the constraint penalization

The constraint penalization is implemented in a more complex way than described in Section 2.4, due to numerical reasons. Let x ∈ ℝ^N, f : ℝ^N → ℝ, and let the scalar constraint violation g(x) be defined as in Section 2.4.

Define the barrier function b_ε as

    b_\varepsilon(x) =
    \begin{cases}
      0             & \text{if } x \le 0, \\
      x/\varepsilon & \text{if } 0 < x < \varepsilon, \\
      1             & \text{if } \varepsilon \le x,
    \end{cases}

where ε is a tolerance variable such that ε > 0.

The penalized objective function H_ε with tolerance ε is then

    H_\varepsilon(v, c) = v +
    \begin{cases}
      k \cdot \bigl( b_\varepsilon(g(c)) + g(c) \bigr) & \text{if } g(c) > 0, \\
      0                                                & \text{if } g(c) \le 0,
    \end{cases}

where k and ε are parameters.

The reason behind this design of H_ε is that we want to mimic the behavior of H from equation (2.12), while still retaining a tolerance for small violations of the constraints. Ideally, we would use H as defined in equation (2.12), but because of numerical differences in the evaluation of the objective function, the same decision vector can sometimes evaluate to two different objective values, depending on how many decision vectors are evaluated in the same function call. Even though this effect is small (commonly of the order of 10⁻¹⁵), it is large enough to trigger the barrier function, effectively amplifying the numerical error by several orders of magnitude. By letting the barrier function tolerate some violation of the constraint, the effect of these numerical errors is dampened. The tolerance used throughout this thesis is ε = 10⁻⁸.
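For concreteness, a minimal Matlab sketch of the penalized objective defined above, assuming scalar inputs v (objective value) and gc = g(c) (constraint violation); the value of k is illustrative, while ε = 10⁻⁸ matches the thesis:

    k       = 1e3;     % penalty weight (illustrative)
    eps_tol = 1e-8;    % violation tolerance epsilon

    % b_eps: 0 for x <= 0, x/eps on (0, eps), 1 for x >= eps.
    b_eps = @(x) min(max(x / eps_tol, 0), 1);
    % H_eps: objective plus penalty only when the constraint is violated.
    H_eps = @(v, gc) v + (gc > 0) .* (k * (b_eps(gc) + gc));

    H_eps(5.0, -1e-9)   % no violation: returns 5.0
    H_eps(5.0,  1e-9)   % tiny violation: penalty damped (b_eps = 0.1)
    H_eps(5.0,  1e-3)   % clear violation: full barrier, adds k*(1 + gc)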

The numerical differences that have occurred in the scope of this thesis are due to A\[b,b] ≠ [A\b, A\b] for some matrices A and b, when evaluated in Matlab.
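A minimal sketch that reproduces this kind of discrepancy (the matrix size is arbitrary, and whether the difference is exactly zero depends on the underlying BLAS/LAPACK build, so the observed magnitude is indicative only):

    rng(1);
    A = randn(200);
    b = randn(200, 1);
    d = A \ [b, b] - [A \ b, A \ b];   % same systems, solved jointly vs. separately
    max(abs(d(:)))                     % typically 0 or of the order 10^-15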


B.2 Measuring the number of function calls

In order to measure the number of function calls in Matlab, the following code is used:

profile on
% Expression to count function calls for
f(X);
stats = profile('info');
profile off
% Number of function calls
num_calls = sum([stats.FunctionTable.NumCalls]);

Note that this measures the number of profiled Matlab function calls. While the actual performance degradation due to function call overhead is not evaluated in this thesis, it is often recommended to reduce the number of function calls for performance, especially in high-level programming languages like Matlab.
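For convenience, the recipe above can be wrapped into a reusable helper; the function name count_calls is hypothetical and not part of the thesis code:

    % Save as count_calls.m; call as e.g. n = count_calls(@() f(X));
    function n = count_calls(fun)
        profile on
        fun();                                    % run the expression once
        stats = profile('info');
        profile off
        n = sum([stats.FunctionTable.NumCalls]);  % total profiled calls
    end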


C | Analysis of the initial algorithms

This section contains the results from the initial study of the algorithms, presenting the motivations behind the choice of methods and the reasoning behind the algorithm designs. In the analysis the original implementation is used, and the problem is thus of Problem type VTW without the improvements made in Section 4.

C.1 Analysis of the NLopt library

In order to get a picture of the complexity of the NLopt algorithms, the self time of the local algorithms in the library was analyzed. Table C.1 shows how much of the total computation time is spent in the algorithm routines when optimizing the voyage planning problem. The advantage of a low self time percentage is, besides the reduced search time, that we do not have to worry about the performance of the algorithm code in our own implementation of the algorithm. A low self time percentage also indicates that the algorithm is not excessively complicated, making it easier to focus on algorithm design instead of implementation. Based on this evaluation, it was decided that the Subplex method (SBPLX in NLopt) and the Nelder–Mead algorithm were good candidates to investigate further.

NLopt algorithm | Self time
----------------|----------
SBPLX           | 0.28%
NELDERMEAD      | 0.37%
COBYLA          | 21.94%
PRAXIS          | 0.28%
NEWUOA          | 3.72%
NEWUOA_BOUND    | 63.22%
BOBYQA          | 5.65%

Table C.1: Algorithm self time.

C.2 Analysis of the Subplex method

The problem with parallelizing the Subplex algorithm is that the method essentially has one “best” vector x that it updates sequentially, subspace by subspace. These updates cannot be made in parallel, since the minimum in one subspace will affect the minimum of the next subspace. The only alternative is thus to make the inner search parallel, i.e. to parallelize the NMS search. This is possible with the Parallel Nelder–Mead Simplex method (Section 3.3.2), whose maximal parallelization level is then controlled by the subspace dimensions in the Simplex method.
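To illustrate the sequential dependency, here is a schematic Matlab sketch of the outer-loop structure just described (our simplified reading, not NLopt's actual code; the inner NMS search is replaced by the exact subspace minimizer of a toy objective):

    f = @(x) sum(x.^2);            % toy objective
    x = randn(6, 1);               % current best vector
    subspaces = {1:2, 3:4, 5:6};   % fixed partition, for illustration

    for s = 1:numel(subspaces)
        idx = subspaces{s};
        % Stand-in for the inner (possibly parallel) NMS search over
        % x(idx). It must see the x already updated by the previous
        % subspaces, so this outer loop cannot be parallelized.
        x(idx) = 0;
    end
    f(x)                           % 0 once all subspaces are searched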

To determine whether this is a viable approach, the code in NLopt was analyzed when solving the voyage planning problems studied in this thesis. Some statistical data for the algorithm are shown in Figures C.1 and C.2, where the number of subspace dimensions, the number of function evaluations per inner search (NMS iteration), the time per local search, and the time per function evaluation are presented.


[Figure C.1: Statistics for the NLopt SBPLX algorithm when optimizing the SB test voyage (105 decision variables, 22 Subplex iterations, 20000 function evaluations). Plot omitted: histograms of subspace dimension (1–5), function evaluations per NMS-iteration (0–150), time per NMS [ms] (0–14), and time per evaluation [ms] (0–0.2).]

[Figure C.2: Statistics for the NLopt SBPLX algorithm when optimizing the GD test voyage (75 decision variables, 22 Subplex iterations, 20000 function evaluations). Plot omitted: histograms of subspace dimension (1–5), function evaluations per NMS-iteration (0–150), time per NMS [ms] (0–14), and time per evaluation [ms] (0–0.2).]

The analysis shows that the subspace partitioning algorithm (Algorithm 3.2 with NLopt's implementation) chooses the smallest possible subspace dimension (i.e. nsmin = 2) for the problems, and dimensions higher than 2 are almost never encountered. This is unsatisfactory, since it effectively limits the parallelization level to 2. To make the approach efficient we need to increase the average subspace dimension, e.g. by modifying the algorithm or by changing the algorithm parameters.

C.3 Analysis of the Parallel Nelder–Mead Simplex method

An evaluation of the Parallel Nelder–Mead Simplex method was made to verify that the implementation developed in this thesis has the same convergence properties as in [5], and to investigate those properties further. This was done by testing how the implementation converges on the test problem f(x) = (1/N) xᵀx. This is a minimization problem of an N-dimensional quadratic surface, which is a convex unimodal function with its minimum at x = 0 and its gradient always directed towards the minimum. The problem is thus extremely well-behaved, and any effective optimization algorithm should be expected to converge to the minimum quickly. This problem is evaluated in [5], and the evaluation in this section attempts to recreate the results achieved in their report. The main differences are, aside from possible implementation-level variations, that this evaluation looks at a larger number of parallelization levels and function evaluations, and that it does not restart the simplex if it becomes degenerate (see Section 3.1). While restarting is a crucial mechanism for better convergence, it is excluded in order to emphasize the effect caused by large values of the parallelization level parameter P.
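A minimal Matlab sketch of the test problem and initialization described above:

    N  = 100;
    f  = @(x) (x' * x) / N;   % convex quadratic with minimum at x = 0
    x0 = randn(N, 1);         % components x0_i ~ N(0,1)
    f(x0)                     % approximately 1 on average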

In Figure C.3, the mean convergence rate is shown for different parallelization levels P. The search is initiated with the components of x0 distributed according to the standard normal distribution, i.e. x0,i ∼ N(0, 1). Each line shows the best value found after a given number of function evaluations, averaged over 200 optimization runs. The colored solid lines are the Parallel Nelder–Mead Simplex method for different parallelization levels, while the dashed lines show how two NLopt algorithms perform in comparison (due to a different implementation of the simplex initialization, NLopt's Nelder–Mead Simplex method has a different convergence profile). At 2000 evaluations, the parallelization level P = 72 has the lowest function value, which is similar to the results in [5].



[Figure C.3: Convergence towards the minimum at 0 for different algorithms. The problem dimension is N = 100, and the values are the minimum evaluated function values, averaged over 200 simulations. Plot omitted: mean cumulative minimum (function value 10⁻³–10⁰, log scale) versus function evaluations (0–10000) for parallelization levels P = 10, 20, ..., 100, with NLopt's Nelder–Mead and SBPLX shown as dashed references.]

[Figure C.4: Angle between the search direction and function gradient. When the angle is 90° the direction of search is perpendicular to the gradient, and thus not directed towards the minimum. The dimension is N = 100, and the angle is the average of 200 simulations. Plot omitted: mean gradient angle (20–90 degrees) versus function evaluations (0–10000) for parallelization levels P = 10, 20, ..., 100.]

After 4007 evaluations and onward, the non-parallel Simplex method with P = 1 has the lowest function value. This is because at this point, the rate of convergence for all other algorithms has decreased so much that the serial variant has caught up.

Figure C.4 shows that the search direction for high parallelization levels converges to 90°, which means that the search will not be able to make any progress. This explains why the yellow lines (P in the range 80–100) in Figure C.3 stop decreasing after about 4000 function evaluations.
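A minimal sketch of how such a gradient-angle diagnostic can be computed, under the assumption that the search direction d is the step between consecutive best vertices (the thesis does not spell out the exact definition here):

    N = 100;
    x = randn(N, 1);      % current best vertex
    d = randn(N, 1);      % example search direction (would come from the solver)
    g = 2 * x / N;        % gradient of f(x) = x'*x / N
    % Angle between d and the descent direction -g: 0 degrees means
    % straight downhill, 90 degrees means perpendicular to the gradient.
    angle_deg = acosd(-(g' * d) / (norm(g) * norm(d)))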

The conclusion drawn from this analysis is that it is possible to increase the initial convergence rate by using high values of P even without parallelism, but that some restarting mechanism must be implemented to prevent the gradient angle from deteriorating. Ideally, for this problem, a parallelization level of P ≈ 72 should be used, backed by some restarting mechanism that restarts the simplex after about 1000–2000 function evaluations.
