
An algorithm model for mixed variable programming∗

S. Lucidi†, V. Piccialli†, M. Sciandrone‡

Abstract

In this paper we consider a particular class of nonlinear optimization problems involving both continuous and discrete variables. The distinguishing feature of this class of nonlinear mixed optimization problems is that the structure and the number of variables of the problem depend on the values of some discrete variables. In particular, we define a general algorithm model for the solution of this class of problems, which generalizes the approach recently proposed by Audet and Dennis ([2]) and is based on the strategy of alternating a local search with respect to the continuous variables and a local search with respect to the discrete variables. We prove the global convergence of the algorithm model without specifying exactly the local continuous search, but only identifying its minimal requirements. Moreover, we define a particular derivative-free algorithm for solving mixed variable programming problems where the continuous variables are linearly constrained and derivative information is not available. Finally, we report numerical results obtained by the proposed derivative-free algorithm in solving a real optimal design problem. These results show the effectiveness of the approach.

Key Words: nonlinear optimization, mixed variable programming, derivative-free methods.

1 Introduction

Optimization problems involving both continuous and discrete variables are able to describe many real-world problems. A particular class of such mixed variable optimization problems, which is important but difficult to solve, is characterized by the following distinguishing features:

(i) the problem involves a special kind of discrete variables (called categorical variables), which identify an element of an unordered set (for example colors, shapes or materials) and affect the structure of the optimization problem. These categorical variables can be represented as a set of discrete numbers, but they cannot assume intermediate values, since for such values the corresponding optimization problem can be undefined. This implies that in a minimization procedure their discreteness must always be satisfied;

(ii) the dimensions of the problem are not fixed and are themselves decision variables. In particular, they can be represented by a vector of integer variables (called dimensional variables). Each value of these variables determines the number of the other variables, the number of constraints and the structure of the problem. Also in this case the discreteness of these variables cannot be relaxed. Moreover, the presence of these dimensional variables considerably complicates the analysis and the solution of the optimization problem;

∗This work was partially supported by CNR Agenzia 2000, Optimization Methods for Support Vector Machine training, Roma, Italia.

†Dipartimento di Informatica e Sistemistica “A. Ruberti”, Università degli Studi di Roma “La Sapienza”, via Buonarroti 12, 00185, Roma ([email protected], [email protected]).

‡Istituto di Analisi dei Sistemi ed Informatica, Consiglio Nazionale delle Ricerche, viale Manzoni 30, 00185, Roma ([email protected]).


(iii) the objective function and/or the constraints do not satisfy any convexity assumption, and this makes the minimization process difficult. This implies, among other things, that an efficient bounding technique for fixed values of the integer variables cannot be defined.

In order to formally describe the considered problem, we introduce the vector of decision variables, which has the following form:

(x, y, z),

where z ∈ Z^nz is the vector of dimensional variables, y ∈ Z^ny(z) is the vector of discrete variables, including also the categorical variables, and x ∈ ℝ^nx(z) is the vector of continuous variables. Then we consider problems with the following mathematical formulation:

min f(x, y, z) (1)

z ∈ Fz
y ∈ Fy(z)
x ∈ Fx(y, z)

where Fz ⊆ Z^nz, Fy(z) ⊆ Z^ny(z), Fx(y, z) ⊆ ℝ^nx(z) and f(·, ·, z) : ℝ^nx(z) × Z^ny(z) → ℝ.

We note that the dimensional variables determine both the dimensions of the other variables of the problem and the structure of the feasible set of the discrete variables y, while the feasible set of the continuous variables depends both on the dimensional variables z and on the discrete variables y. Standard solution approaches for mixed variable optimization problems are not able to tackle Problem (1), which presents features (i)-(iii), efficiently. Therefore it is worthwhile to study and define new solution methods for this class of problems.

Recently, a first globally convergent algorithm scheme for a class of problems with similar features was proposed in [2] and then further developed in [1]. The basic idea is to alternate a local search with respect to the continuous variables and a local search with respect to the integer variables. In particular, the method described in [2] considers the special case where the domain Fx of the continuous variables is bound constrained, and it uses a local continuous search based on the pattern search algorithm introduced in [4]. In [1] the approach is extended to the case of nonsmooth functions and more general feasible sets.

In this paper, drawing our inspiration from the approach proposed in [2], we define an algorithm model for the solution of the entire class of problems described by Problem (1). This algorithm model explicitly handles the presence of dimensional variables, and it does not refer to a particular structure of the feasible sets or to a particular local continuous search. The global convergence properties of the algorithm are proved without specifying exactly the local continuous search, but only identifying the minimal requirements that it must satisfy. The proposed algorithm model can be used as a basis for developing new algorithms which exploit the structure of the considered instance of Problem (1).

As an example, in this paper we use this algorithm model to propose a derivative-free algorithm to solve Problem (1) when the continuous variables are linearly constrained and derivative information is not available. In [7] the model described in this paper is the framework used to define derivative-based algorithms for solving Problem (1) in the case where the continuous variables are unconstrained and their number is very large.

The paper is organized as follows: in Section 2 we give some definitions and assumptions. In Section 3 we introduce as an illustrative example a real optimization problem, concerning the optimal design of a magnetic resonance device. In Section 4 we describe our algorithm model and in Section 5 we study its properties. In Section 6 we propose a derivative-free algorithm well suited to solve the application of Section 3, namely able to tackle Problem (1) in the case where the continuous variables are linearly constrained and derivative information is not available. Finally, in Section 7 we report the numerical results obtained by applying the algorithm described in Section 6 to the optimal design application.
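To make the dependence of the problem dimensions on z concrete, the following minimal sketch shows one possible in-memory representation of a point (x, y, z) of Problem (1); the class name, field layout and consistency check are illustrative assumptions of ours, not part of the paper:

```python
# Sketch: a point of Problem (1), where the number of continuous variables
# n_x(z) and of discrete variables n_y(z) is itself determined by z.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class MixedPoint:
    z: List[int]                        # dimensional variables, z in F_z
    y: List[int]                        # discrete (e.g. categorical) variables
    x: List[float]                      # continuous variables
    nx_of: Callable[[List[int]], int]   # z -> n_x(z)
    ny_of: Callable[[List[int]], int]   # z -> n_y(z)

    def check(self) -> None:
        # the dimensions of x and y must agree with the value of z
        assert len(self.x) == self.nx_of(self.z)
        assert len(self.y) == self.ny_of(self.z)

# Example: in the design problem of Section 3, z = (nr, nm) gives
# n_x(z) = 2*nr + nm continuous variables and one discrete variable.
p = MixedPoint(z=[2, 4], y=[5], x=[0.0] * (2 * 2 + 4),
               nx_of=lambda z: 2 * z[0] + z[1], ny_of=lambda z: 1)
p.check()
```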

In the sequel of the paper, we will denote by ‖x‖ the standard Euclidean norm of x and by B(x̄, ε) the ball centered at x̄ of radius ε.

2 Definitions and Assumptions

In this section we start by recalling the definitions of global and local minima of Problem (1). Obviously the global optimum of Problem (1) is a point which satisfies the following definition:

Definition 2.1 A feasible point (x∗, y∗, z∗) is said to be a global minimizer of Problem (1) if

f(x∗, y∗, z∗) ≤ f(x, y, z) ∀z ∈ Fz, ∀y ∈ Fy(z), ∀x ∈ Fx(y, z). (2)

Less immediate is the definition of a local minimum point. This notion refers to the behavior of the objective function in a “suitable neighborhood” of a given point. While a neighborhood of a continuous variable is well represented by a continuous ball, the neighborhood of a discrete variable must be defined taking into account the structure of the particular problem. Furthermore, the discrete neighborhood must represent the fact that variations of the discrete variables can also imply variations of the continuous variables (see for example the next section). Following Audet and Dennis ([2]), we can characterize a local solution (x∗, y∗, z∗) of Problem (1) as a point such that there are no better feasible solutions in the balls centered at the points belonging to the discrete neighborhood of (x∗, y∗, z∗):

Definition 2.2 A feasible point (x∗, y∗, z∗) is said to be a local minimizer of Problem (1) with respect to the feasible discrete neighborhood N(x∗, y∗, z∗) if there exists an ε > 0 such that, for every (x̃, ỹ, z̃) ∈ N(x∗, y∗, z∗),

f(x∗, y∗, z∗) ≤ f(x, ỹ, z̃) ∀x ∈ B(x̃, ε) ∩ Fx(ỹ, z̃),   (3)

where N(x∗, y∗, z∗) is a finite set of feasible points.
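As a purely illustrative reading of Definition 2.2, the following sketch checks the condition numerically by sampling around each discrete neighbor; the helpers `f`, `feasible` and `neighborhood` are assumptions of this sketch, and sampling can of course miss better points:

```python
# Rough numerical test of Definition 2.2: a True answer only means
# "no counterexample found by sampling", not a proof of local optimality.
import random

def looks_like_local_min(f, feasible, neighborhood, xs, ys, zs,
                         eps=1e-2, trials=200):
    fstar = f(xs, ys, zs)
    for (xt, yt, zt) in neighborhood(xs, ys, zs):
        for _ in range(trials):
            # sample x in a box around xt (a crude stand-in for B(xt, eps))
            x = [xi + random.uniform(-eps, eps) for xi in xt]
            if feasible(x, yt, zt) and f(x, yt, zt) < fstar:
                return False   # condition (3) is violated at this neighbor
    return True
```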

This definition depends on the choice of the discrete neighborhoods. In fact, the bigger the discrete neighborhood N(x∗, y∗, z∗), the better the quality of the solution; however, the bigger the discrete neighborhood N(x∗, y∗, z∗), the more difficult it is to locate the solution. As is common in nonlinear programming algorithms, locating a global or local solution of Problem (1) can be prohibitive. It is more reasonable to determine a point (usually called a stationary point) which satisfies suitable necessary optimality conditions. The definition of these conditions, and hence of stationary points of Problem (1), must refer to suitable optimality conditions for the following continuous problem, that is, Problem (1) for a fixed choice of the discrete variables z ∈ Fz and y ∈ Fy(z):

min f(x, y, z) (4)

x ∈ Fx(y, z).

Since we want to treat Problem (1) in its general form, we do not exactly specify optimality conditions for Problem (4), which depend on the particular structure of the original problem. For this reason, in defining stationary points of Problem (1), we generically refer to stationary points of Problem (4), namely to points satisfying suitable optimality conditions.

Definition 2.3 A feasible point (x∗, y∗, z∗) is said to be a stationary point of Problem (1) with respect to the feasible discrete neighborhood N(x∗, y∗, z∗) if


(i) the point x∗ is a stationary point of the following continuous problem:

min f(x, y∗, z∗) (5)

x ∈ Fx(y∗, z∗);

(ii) every (x̃, ỹ, z̃) ∈ N(x∗, y∗, z∗) satisfies f(x̃, ỹ, z̃) ≥ f(x∗, y∗, z∗);

(iii) for every (x̃, ỹ, z̃) ∈ N(x∗, y∗, z∗) such that f(x̃, ỹ, z̃) = f(x∗, y∗, z∗), the point x̃ is a stationary point of the following continuous problem

min f(x, ỹ, z̃)   (6)

x ∈ Fx(ỹ, z̃).

Any minimization algorithm for solving Problem (1) produces vectors with both continuous and discrete components, which can have different dimensions at each iteration. For this reason we need to specify the notions of converging sequence and accumulation point of a sequence.

Definition 2.4 A sequence {(xk, yk, zk)} converges to a point (x̄, ȳ, z̄) if for any ε > 0 there exists an index k̄ such that for all k ≥ k̄ we have yk = ȳ, zk = z̄ and ‖xk − x̄‖ < ε.

Definition 2.5 A point (x̄, ȳ, z̄) is an accumulation point of the sequence {(xk, yk, zk)} if there exists an infinite subset K ⊆ {0, 1, . . .} such that the subsequence {(xk, yk, zk)}K converges to (x̄, ȳ, z̄).

In the sequel we suppose that the following assumptions are verified:

Assumptions

A1. The set Fz contains a finite number of elements.

A2. Given z0 ∈ Fz, y0 ∈ Fy(z0) and x0 ∈ Fx(y0, z0), for each z ∈ Fz, the level set

Lz(x0, y0, z0) = {(x, y) ∈ Fx(y, z)×Fy(z) : f(x, y, z) ≤ f(x0, y0, z0) + ξ, ξ > 0} (7)

is compact.

A3. For each z ∈ Fz and for each y ∈ Fy(z) the function f(·, y, z) is continuous in a neighborhood of {x : (x, y, z) ∈ Lz(x0, y0, z0)}.

A4. Let {(xk, yk, zk)} be a sequence converging to (x̄, ȳ, z̄); then, for any (x̃, ỹ, z̃) ∈ N(x̄, ȳ, z̄), there exists a sequence {(x̃k, ỹk, z̃k)} converging to (x̃, ỹ, z̃) such that (x̃k, ỹk, z̃k) ∈ N(xk, yk, zk).

Assumption A1 is a reasonable assumption and requires that the vector of dimensional variables can assume a finite number of values. Assumption A2 requires the standard compactness of the level set of the objective function over the feasible set. Assumption A3 is a minimal smoothness requirement on the objective function. Assumptions A1-A3 ensure the existence of a solution of the problem, as shown in the next proposition. Assumption A4 is a mild continuity assumption on the discrete neighborhoods, and it is equivalent to the lower semicontinuity of a point-to-set map as defined in [3].

Proposition 2.6 Under Assumptions A1-A3 the objective function f(x, y, z) admits a global minimum point on the feasible set.

Proof. By Assumption A1 we have that the vector z belongs to the finite set {z1, . . . , zm}. Given a feasible point (x0, y0, z0), for each feasible vector z^i, i = 1, . . . , m, consider the level set L_{z^i}(x0, y0, z0). By Assumption A2, this set is compact and hence the vector of discrete variables y can assume a finite number of values on it, namely y ∈ {y^i_1, . . . , y^i_p}, where the superscript i means that these values depend on the vector z^i. Now, for a fixed value y^i_j of the vector y, consider the set

L^{y^i_j}_{z^i}(x0, y0, z0) = {x ∈ Fx(y^i_j, z^i) : f(x, y^i_j, z^i) ≤ f(x0, y0, z0) + ξ, ξ > 0}.

This set is compact, since it is a closed subset of the set L_{z^i}(x0, y0, z0), which is compact by Assumption A2. Moreover, by Assumption A3, f(x, y^i_j, z^i) is continuous on L^{y^i_j}_{z^i}(x0, y0, z0) and hence it admits a global minimum point (x∗_{i,j}, y^i_j, z^i) on this set. Thus we can conclude that the objective function f(x, y, z) admits a global minimum point on the feasible set, given by

(x∗, y∗, z∗) = arg min_{i=1,...,m; j=1,...,p} f(x∗_{i,j}, y^i_j, z^i).

3 An illustrative application

In order to illustrate our approach, we consider a real application concerning the optimal design of a small magnetic resonance device (see [5] for the details). The aim is to construct a device with the following features:

- a high uniformity of the magnetic field to obtain high resolution images,

- a large uniformity region, to be able to scan an area which is large with respect to the dimension of the apparatus,

- low weight to make the apparatus transportable.

The realization of such magnetic resonance apparatuses would greatly benefit the diagnosis, prognosis and therapy of many pathologies. One possible technique to build a low field dedicated magnet is to use permanent magnets surrounded by an iron yoke that amplifies the magnetic field. The structure we consider is cylindrical and constituted by elliptical rings, each having a certain number of small magnets screwed on it (see Fig. 1). Each magnet has a cylindrical shape with fixed height. Moreover, the structure is symmetric with respect to the semi-axes of the rings and with respect to a plane parallel to the rings which divides the model into two halves. For this reason we can consider only one eighth of the multipolar magnet and get the other parts by reflection, reducing the number of variables necessary to describe the structure and the computational effort to calculate the objective function of this problem.

Figure 1: A possible structure of the multipolar magnet

Let XYZ be a system of Cartesian coordinates with origin at the center of the structure, X axis parallel to the cylinder axis, and Z and Y axes directed respectively along the shortest and longest semi-axis of the elliptical base. Our aim is to make the magnetic field B generated by the multipolar magnet as uniform as possible within the target region and directed along the Z axis. The decision variables of the problem are:

- the number of rings nr,

- the number of magnets nm on each ring (all the rings have the same number of magnets),

- the position di of the i-th ring along the X axis (see fig. 2 point (a)),

- the angular position ϕj of the magnet j, which is the same on each ring (see fig. 2 point (b)),

- one variable bi for each ring, which represents the difference between the effective length of the two semi-axes of the i-th ring and two nominal values a and b (see fig. 2 point (c)),

- the radius r of the magnets.

Figure 2: Variables of the problem with 6 rings and 4 magnets

Therefore, in the corresponding optimization problem we have that the discrete dimensional variables are the number of rings (nr) and the number of magnets (nm); the continuous variables are the positions di and the “offsets” bi of the rings, for i = 1, . . . , nr, and the angular positions ϕj of the magnets, for j = 1, . . . , nm. Finally, since the small magnets commercially available belong to a finite list, their radius (r) can assume only some integer values in a fixed range and hence is a categorical discrete variable.


We note that the dimension of the continuous variables depends on the dimensional variables. Summarizing, and using the notation introduced in Section 1, the decision variables of the problem are the following:

(x, y, z),

where

x = (d1, . . . , dnr, b1, . . . , bnr, ϕ1, . . . , ϕnm)^T,  y = (r)  and  z = (nr, nm)^T,

with x ∈ ℝ^(2nr+nm), y ∈ [ly, uy] and z ∈ [lz, uz].

As regards the objective function to be minimized, it must be a measure of the non-uniformity of the magnetic field. Let Np be the number of points uniformly distributed on a grid inside the cylindrical target region of interest. Let B_X^{(i)}(x, y, z), B_Y^{(i)}(x, y, z) and B_Z^{(i)}(x, y, z) be the three components of the magnetic field measured at the i-th point of the grid. The objective function is

f(x, y, z) = Σ_{i=1}^{Np} [ (B_Z^{(i)}(x, y, z) − B̄_Z(x, y, z))² + B_X^{(i)}(x, y, z)² + B_Y^{(i)}(x, y, z)² ],

where B̄_Z is the average Z-component of the magnetic field probed on the grid points. It can be easily verified that the objective function f(x, y, z) attains its global minimum value, 0, when B_Z^{(i)}(x, y, z) = B̄_Z(x, y, z), B_X^{(i)}(x, y, z) = 0 and B_Y^{(i)}(x, y, z) = 0 for all i = 1, . . . , Np, that is, when the magnetic field is “sufficiently” uniform and directed along the Z-axis at every point of the control grid.

We note that the magnetic field cannot be analytically determined, and hence it is computed by a field simulation program. This implies that our objective function is a black box function and is expensive to calculate. In particular, each function evaluation can take several seconds.
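The following sketch shows how such a black-box objective might be wrapped in code; the simulator interface `simulate_field` is an assumption made here for illustration, since the paper's field simulation program is not specified:

```python
import numpy as np

def non_uniformity(x, y, z, simulate_field, grid):
    """Objective of Section 3: sum over the Np grid points of
    (B_Z^(i) - mean(B_Z))^2 + (B_X^(i))^2 + (B_Y^(i))^2.
    `simulate_field` stands for the expensive field simulation program and is
    assumed to return an (Np, 3) array of (B_X, B_Y, B_Z) at the grid points."""
    B = np.asarray(simulate_field(x, y, z, grid))  # shape (Np, 3)
    bx, by, bz = B[:, 0], B[:, 1], B[:, 2]
    return float(np.sum((bz - bz.mean()) ** 2 + bx ** 2 + by ** 2))
```

Since each simulator call can take several seconds, in practice one would also cache evaluations.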

As regards the feasible set, we should have nonlinear constraints on the angular positions ϕj of the magnets to avoid overlapping. However, we have substituted these nonlinear constraints with simpler but more restrictive box constraints. Moreover, we have introduced linear constraints on the positions di of the rings to avoid their overlapping. Finally, the semi-axes of the elliptical rings cannot be arbitrary, and hence lower and upper bounds are imposed on the offsets bi. Thus the complete formulation of the problem is the following:

min f(d, ϕ, b, r, nr, nm)   (8)

lnm ≤ nm ≤ unm
lnr ≤ nr ≤ unr
lr ≤ r ≤ ur
lϕ(r, nm) ≤ ϕ ≤ uϕ(r, nm)
A(r, nr) d ≤ b(r, nr)
lb ≤ b ≤ ub
r, nm, nr integer,

where lϕ, uϕ ∈ ℝ^nm, A(r, nr) ∈ ℝ^(nr×nr), b(r, nr) ∈ ℝ^nr and lb, ub ∈ ℝ^nr. For this problem, we define the discrete neighborhood of a given point as follows:

- the two points obtained respectively by increasing and by decreasing the radius r of the magnets by 1 mm, keeping fixed the number of rings nr and the number of magnets nm;

- the two points obtained respectively by adding (nr := nr + 1) and deleting (nr := nr − 1) one ring, keeping fixed the number of magnets nm and the radius r of the magnets;


- the two points obtained respectively by adding (nm := nm + 1) and deleting (nm := nm − 1) one small magnet, keeping fixed the number of rings nr and the radius r of the magnets.

We have to note that when we obtain a new point by changing one discrete variable, dimensional or general, it can be necessary to change the continuous variables as well. For example, if we increase the radius r of the magnets by 1 mm and we do not change the angular positions ϕj of the magnets, it can happen that two magnets overlap at the new point, that is, the new point is infeasible. A minimal sketch of this neighborhood construction is given below.
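In the sketch, the `Point` container, the bound checks and the `repair` step are illustrative assumptions (a real implementation would, in particular, re-space the magnets to restore feasibility, as just discussed); only the six neighbor moves come from the paper:

```python
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class Point:
    d: tuple    # ring positions d_1..d_nr
    b: tuple    # ring offsets  b_1..b_nr
    phi: tuple  # magnet angles phi_1..phi_nm
    r: int      # magnet radius in mm (categorical)
    nr: int     # number of rings (dimensional)
    nm: int     # number of magnets per ring (dimensional)

def repair(p: Point) -> Point:
    # placeholder: restore feasibility (e.g. re-space the angles so that the
    # magnets do not overlap after r, nr or nm have changed)
    return p

def neighborhood(p: Point) -> List[Point]:
    """The six neighbors of Section 3: r +/- 1 mm, nr +/- 1, nm +/- 1.
    Changing a dimensional variable changes the dimensions of d, b or phi."""
    cand = [
        replace(p, r=p.r + 1), replace(p, r=p.r - 1),
        # adding/deleting a ring changes the dimensions of d and b
        replace(p, nr=p.nr + 1, d=p.d + (p.d[-1] + 1.0,), b=p.b + (0.0,)),
        replace(p, nr=p.nr - 1, d=p.d[:-1], b=p.b[:-1]),
        # adding/deleting a magnet changes the dimension of phi
        replace(p, nm=p.nm + 1, phi=p.phi + (0.0,)),
        replace(p, nm=p.nm - 1, phi=p.phi[:-1]),
    ]
    return [repair(q) for q in cand if q.nr >= 1 and q.nm >= 1 and q.r >= 1]
```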

4 Algorithm model MIVAM

In this section we describe our algorithm model for solving Problem (1). In defining an algorithm model it is necessary to take into account the presence of both continuous and discrete variables, which need different minimization procedures. The algorithm is based on the idea of alternating two phases:

- an attempt to update the continuous variables by using a local continuous search (Phase 1),

- an attempt to update the discrete variables by using a local search in the discrete neighborhood of the current point (Phase 2).

Phase 1:

Given the current feasible point (xk, yk, zk), we keep the discrete variables (yk, zk) fixed and we consider the following continuous optimization problem:

min f(x, yk, zk) (9)

x ∈ Fx(yk, zk).

Starting from xk, we perform a local continuous search (denoted in the algorithm by LCS) with the aim of producing a new vector x̃k which is, roughly speaking, a better approximation of a stationary point of Problem (9).

Phase 2:

In this phase we try to update the discrete variables by considering the points belonging to the discrete neighborhood N(x̃k, yk, zk) of the point (x̃k, yk, zk) produced by Phase 1. As a first step, we simply evaluate the objective function at the points belonging to N(x̃k, yk, zk). If one of these points produces a sufficient decrease with respect to f(x̃k, yk, zk), then it becomes the current point and a new iteration is performed. We note, as already said, that the definition of the discrete neighborhood implies that also the continuous variables of the new current point can change.

If none of the points belonging to N(x̃k, yk, zk) produces a sufficient decrease with respect to f(x̃k, yk, zk), this does not necessarily imply that it is not worthwhile to accept the discrete variables of some of these points. In fact, we are comparing the objective value of these points with the objective value of (x̃k, yk, zk), where x̃k is produced by a minimization process.

Therefore, we still try to update the discrete variables by selecting some points belonging to N(x̃k, yk, zk) with objective value not significantly worse than f(x̃k, yk, zk) and performing a suitable number of local continuous searches starting from each of these selected points.

Now, we describe the proposed algorithm model:


Mixed Integer Variable Algorithm Model (MIVAM)

Data: z0 ∈ Fz, y0 ∈ Fy(z0), x0 ∈ Fx(y0, z0), ξ ≥ 0, θ ∈ (0, 1), η0 > 0, µin > 0.

Step 0: Set k = 0, zk = z0, yk = y0, xk = x0, ηk = η0, µ^0_k = µin.

Step 1: Compute x̃k and µk by LCS(xk, µ^0_k, x̃k, µk, yk, zk) and set µ^0_{k+1} = µk.

Step 2: If there exists a (x̃k+1, ỹk+1, z̃k+1) ∈ N(x̃k, yk, zk) such that

f(x̃k+1, ỹk+1, z̃k+1) ≤ f(x̃k, yk, zk) − ηk,

set xk+1 = x̃k+1, yk+1 = ỹk+1, zk+1 = z̃k+1, ηk+1 = ηk and go to Step 5.

Step 3: Define Wk = {(x̃, ỹ, z̃) ∈ N(x̃k, yk, zk) : f(x̃, ỹ, z̃) ≤ f(x̃k, yk, zk) + ξ}.

3.1: If Wk ≠ ∅, choose (x̂, ŷ, ẑ) ∈ Wk, set j = 1, x^j = x̂, µ^{j−1} = µk.

Otherwise go to Step 4.

3.2: Compute x^{j+1} and µ^j by LCS(x^j, µ^{j−1}, x^{j+1}, µ^j, ŷ, ẑ).

3.3: If f(x^{j+1}, ŷ, ẑ) ≤ f(x̃k, yk, zk) − ηk, set xk+1 = x^{j+1}, yk+1 = ŷ,

zk+1 = ẑ, ηk+1 = ηk and go to Step 5.

3.4: If µ^j > µk, set j = j + 1 and go to Step 3.2;

otherwise set Wk = Wk \ {(x̂, ŷ, ẑ)} and go to Step 3.1.

Step 4: Set xk+1 = x̃k, yk+1 = yk, zk+1 = zk. If

f(xk+1, yk+1, zk+1) ≤ f(xk, yk, zk) − ηk,

set ηk+1 = ηk, otherwise set ηk+1 = θηk.

Step 5: Set k = k + 1 and go to Step 1.

At Step 1, Phase 1 is performed by applying the local continuous search procedure LCS(xk, µ^0_k, x̃k, µk, yk, zk). This procedure should guarantee in the limit the stationarity with respect to Problem (9). Most local continuous searches exploit some information obtained in the previous iterates. For this reason the procedure LCS uses the scalar µ^0_k, which derives from the previous iterations and gives, roughly speaking, an initial estimate of the expected improvement of the objective function obtainable from the point xk. The procedure LCS tries to produce a new point x̃k where the objective function is sufficiently decreased. The information obtained during this process can be used to update the estimate µ^0_k, producing a new scalar µk. A suitable choice of the scalar µk (and µ^0_k) and of its updating rule must imply that the smaller µk is, the better x̃k approximates a stationary point. In particular, if the procedure LCS is not able to produce a sufficient decrease of the objective function, the point x̃k is set equal to xk and the scalar µk must be set to a value sufficiently smaller than µ^0_k.

At Step 2 and Step 3, Phase 2 is performed. In particular, at Step 2 the objective function is evaluated at the points (x̃k+1, ỹk+1, z̃k+1) ∈ N(x̃k, yk, zk). If one of these points produces a decrease with respect to f(x̃k, yk, zk) greater than or equal to ηk, then it becomes the new current point and a new iteration is performed.

Otherwise, the discrete neighborhood is further investigated at Step 3. In particular, a set Wk ⊆ N(x̃k, yk, zk) of points with objective value not significantly worse than f(x̃k, yk, zk) is selected. Each of these points (x̂, ŷ, ẑ) ∈ Wk is considered promising, and the algorithm tries to understand whether it is worth replacing (yk, zk) with (ŷ, ẑ).

In particular, starting from each point (x̂, ŷ, ẑ) ∈ Wk, the local continuous search LCS is repeated until

- either a point is produced which is significantly better than the point (x̃k, yk, zk),

- or the test at Step 3.4 fails.

In the first case we accept the discrete variables (ŷ, ẑ); in fact, the produced point becomes the new current point and a new iteration is performed. In the second case we reject the discrete variables (ŷ, ẑ); in fact, the failure of the test at Step 3.4 implies that Steps 3.1-3.4 have not been able to produce a sufficient decrease of the objective function by a minimization process comparable to the one that yielded the point (x̃k, yk, zk).

At Step 4 the point (x̃k, yk, zk) becomes the new current point and, if neither the local continuous search nor the discrete search has been able to produce a decrease of the objective function greater than or equal to ηk, this parameter is reduced.
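The loop structure of MIVAM can be summarized by the following sketch, written under the assumption that a local continuous search `lcs` (returning a point and an updated scalar µ) and a discrete `neighborhood` function are supplied; it is a schematic rendering of Steps 0-5, not the authors' implementation:

```python
def mivam(f, lcs, neighborhood, x0, y0, z0,
          xi=1.0, theta=0.5, eta=1.0, mu0=1.0, max_iter=100):
    x, y, z = x0, y0, z0
    for _ in range(max_iter):
        # Phase 1 (Step 1): continuous search with the discrete part fixed
        xt, mu = lcs(x, mu0, y, z)
        mu0 = mu
        # Phase 2 (Step 2): sufficient decrease over the discrete neighborhood
        nbrs = list(neighborhood(xt, y, z))
        moved = next(((xn, yn, zn) for (xn, yn, zn) in nbrs
                      if f(xn, yn, zn) <= f(xt, y, z) - eta), None)
        if moved:
            x, y, z = moved
            continue
        # Step 3: continuous searches from the promising neighbors W_k
        W = [(xn, yn, zn) for (xn, yn, zn) in nbrs
             if f(xn, yn, zn) <= f(xt, y, z) + xi]
        accepted = None
        for (xj, yn, zn) in W:
            muj = mu
            while accepted is None:
                xj, muj = lcs(xj, muj, yn, zn)
                if f(xj, yn, zn) <= f(xt, y, z) - eta:   # Step 3.3
                    accepted = (xj, yn, zn)
                elif muj <= mu:                          # Step 3.4 fails
                    break
            if accepted:
                break
        if accepted:
            x, y, z = accepted
            continue
        # Step 4: keep the Phase-1 point; shrink eta if no sufficient decrease
        if f(xt, y, z) > f(x, y, z) - eta:
            eta *= theta
        x = xt
    return x, y, z
```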

In order to complete the description of the algorithm, we should specify the procedure LCS; but, as already said in the introduction, we prefer to identify only the properties that it must satisfy to ensure the global convergence of this algorithm model.

In particular, we require that Procedure LCS(xk, µ^0_k, x̃k, µk, yk, zk) satisfies the following two properties:

Property A

Given a feasible point (xh, yh, zh) and a scalar µ^0_h > 0, the local continuous search produces a point x̃h and a scalar µh > 0 such that:

(i) x̃h ∈ Fx(yh, zh) and f(x̃h, yh, zh) ≤ f(xh, yh, zh);

(ii) - either f(x̃h, yh, zh) ≤ f(xh, yh, zh) − σ(µh) (where the function σ(·) ≥ 0 is such that if σ(th) → 0, then th → 0);

- or µh ≤ δµ^0_h, δ ∈ (0, 1).


Property B

Let {(xh, yh, zh)} be a sequence of feasible points. For all h, suppose to apply the LCS procedure starting from (xh, yh, zh) and to obtain a point (x̃h, yh, zh) and a scalar µh > 0. If the sequence {µh} is such that

lim_{h→∞} µh = 0,   (10)

then every accumulation point (x̄, ȳ, z̄) of the sequence {(x̃h, yh, zh)} is such that x̄ is a stationary point of the problem

min f(x, ȳ, z̄)   (11)

x ∈ Fx(ȳ, z̄).

Properties A and B are the minimal requirements that Procedure LCS, when embedded in Algorithm MIVAM, must satisfy in order to ensure in the limit the stationarity with respect to the continuous variables. We note that, by Property A, Algorithm MIVAM produces a sequence of feasible points such that the corresponding sequence of objective values is non increasing.

Moreover, the algorithm model accepts a new point only when it produces a “sufficient decrease” of the objective function. In fact, at Step 2 and Step 3 the discrete variables are updated only if the decrease of the objective function is greater than or equal to the scalar ηk, while the local continuous search is required to satisfy point (ii) of Property A. This requirement could be weakened by requiring just a simple decrease of the objective function, as in the algorithm proposed in [2], but this would imply strong restrictions on the sampling technique of the objective function with respect to the continuous variables and further assumptions on the discrete neighborhoods. Anyway, on the one hand, the tests at Step 2 and Step 3 cannot be too restrictive, since the parameter ηk is reduced by the algorithm whenever a sufficient decrease is not obtained. On the other hand, a suitable value of ηk should avoid changing the discrete variables too early and should prevent oscillation between different discrete variables.

5 Theoretical Analysis

In this section we analyze the theoretical properties of the algorithm model MIVAM. First of all we prove that the algorithm is well posed, namely that it cannot cycle at Step 3:

Proposition 5.1 For every µk > 0 and for every (x̂, ŷ, ẑ) ∈ Wk, there exists an index j*_k such that the scalar µ^{j*_k} produced at Step 3.2 satisfies:

µ^{j*_k} ≤ µk.

Proof. Suppose by contradiction that

µ^j > µk, ∀j.   (12)

This implies that Steps 3.1-3.4 produce an infinite sequence of points {x^j} and an infinite sequence of scalars {µ^j}. By Property A of LCS, we have two possibilities:

- either

f(x^{j+1}, ŷ, ẑ) − f(x^j, ŷ, ẑ) ≤ −σ(µ^j),   (13)

- or

µ^j ≤ δµ^{j−1}.   (14)

Now we split the inner iterations of Step 3 into two subsets, J1 and J2, where J1 and J2 identify the iterations where (13) or (14) hold, respectively. If j ∈ J1, by (13) we have:

|f(x^{j+1}, ŷ, ẑ) − f(x^j, ŷ, ẑ)| ≥ σ(µ^j).   (15)

By Property A of Procedure LCS we have that the sequence {f(x^j, ŷ, ẑ)} is non increasing. Moreover, by Assumption A2 it follows that the sequence {f(x^j, ŷ, ẑ)} is bounded from below. Therefore the sequence {f(x^j, ŷ, ẑ)} converges. This implies that, if J1 is infinite, the left-hand term of the above inequality tends to zero, and hence

lim_{j→∞, j∈J1} σ(µ^j) = 0,   (16)

so that, by the property of σ(·), we have

lim_{j→∞, j∈J1} µ^j = 0.   (17)

We now consider the set J2. For each j ∈ J2, let mj be the biggest index such that mj < j and mj ∈ J1 (we can assume mj = 0 if the index mj does not exist, that is, if J1 is empty). Then we have

µ^j ≤ δ^{j−mj} µ^{mj}.   (18)

If J2 is finite, recalling (16), we can write

lim_{j→∞} σ(µ^j) = lim_{j→∞, j∈J1} σ(µ^j) = 0,

which implies that µ^j → 0. Then assume that J2 is infinite. As j → ∞ and j ∈ J2, either mj → ∞ (if J1 is an infinite subset) or (j − mj) → ∞. In the first case, from (16) and (18), recalling that δ ∈ (0, 1), it follows that µ^j → 0. In the second case, from (18), taking into account again that δ ∈ (0, 1), we obtain µ^j → 0. Thus we have proved that µ^j → 0, but this contradicts (12). □

In order to analyze the properties of the algorithm, we need to characterize its iterations. First of all, we split the set of indices of the iterates into two subsets: the set of unsuccessful iterates

Ku = {k : ηk < ηk−1}

and the set of successful iterates Ks, which is the complement of the subset Ku. Note that an iteration is declared unsuccessful when it does not produce a point where the objective function is sufficiently decreased; otherwise the iteration is declared successful. Furthermore, for every z ∈ Fz, we introduce the following subset of indices of iterates:

K(z) = {k ∈ Ku : zk = z}.   (19)

The next proposition describes some properties of the sequences produced by the algorithm, which will be used later for the convergence analysis.

Proposition 5.2 Let {(xk, yk, zk)}, {ηk}, {µk} be the sequences produced by the algorithm. Then:

(i) the sequence of function values {f(xk, yk, zk)} is non increasing and convergent;

(ii) the sequence {(xk, yk, zk)} is bounded;

(iii) the sequence {ηk} is such that lim_{k→∞} ηk = 0;

(iv) the subset of indices of unsuccessful iterates Ku is infinite;

(v) the sequence {µk} is such that lim_{k→∞} µk = 0.

Proof. Point (i): the instructions of the algorithm and Property A of Procedure LCS imply that the sequence {f(xk, yk, zk)} is non increasing. Moreover, by Proposition 2.6 we have that there exists a value f̄ such that f(xk, yk, zk) ≥ f̄, and hence the sequence {f(xk, yk, zk)} is convergent.

Point (ii): by Assumption A1 we have that the vector zk belongs to the finite set {z1, . . . , zm}. The instructions of the algorithm imply that each point (xk, yk, zk) belongs to the set L_{zk}(x0, y0, z0), which is compact by Assumption A2. Thus the whole sequence {(xk, yk, zk)} is contained in the set ∪_{i=1,...,m} L_{z^i}(x0, y0, z0), which is the union of a finite number of compact sets and hence is compact. Then we can conclude that the sequence {(xk, yk, zk)} is bounded.

Point (iii): it follows from the instructions of the algorithm that at each iteration we have two possibilities:

- the iteration is successful, that is, k ∈ Ks, so we change the current point and the corresponding objective value satisfies

f(xk, yk, zk) ≤ f(xk−1, yk−1, zk−1) − ηk−1 = f(xk−1, yk−1, zk−1) − ηk,   (20)

where the equality follows from the definition of Ks, that is, Ks = {k : ηk = ηk−1};

- the iteration is unsuccessful, that is, k ∈ Ku, and then

ηk = θηk−1, θ ∈ (0, 1).

Consider the set Ks: if k ∈ Ks we have from (20)

|f(xk, yk, zk) − f(xk−1, yk−1, zk−1)| ≥ ηk,   (21)

so that, if Ks is infinite, point (i) implies that

lim_{k→∞, k∈Ks} ηk = 0.   (22)

Now consider the set Ku. For each k ∈ Ku, let mk be the biggest index such that mk < k and mk ∈ Ks (assume mk = 0 if the index mk does not exist, that is, if Ks is empty). Then we have

ηk = θ^{k−mk} ηmk.   (23)

If Ku is finite, recalling (22), we can write

lim_{k→∞} ηk = lim_{k→∞, k∈Ks} ηk = 0.   (24)

Then assume that Ku is infinite. As k → ∞ and k ∈ Ku, either mk → ∞ (if Ks is an infinite subset) or (k − mk) → ∞. In the first case, from (22) and (23), recalling that θ ∈ (0, 1), it follows that lim_{k→∞} ηk = 0. In the second case, from (23), taking into account again that θ ∈ (0, 1), it follows that lim_{k→∞} ηk = 0, which is the thesis.

Point (iv): it follows from the definition of Ku and from point (iii).

Point (v): the proof is similar to that of point (iii); however, for the sake of completeness, we report it below. Let x̃k and µk be the point and the scalar produced by Procedure LCS at Step 1 of iteration k. By Property A of LCS, we have two possibilities:


- either

f(x̃k, yk, zk) − f(xk, yk, zk) ≤ −σ(µk),   (25)

- or

µ^0_{k+1} = µk ≤ δµ^0_k.   (26)

In the first case, by the instructions of the algorithm and by (25), since f(xk+1, yk, zk) ≤ f(x̃k, yk, zk), we have:

f(xk+1, yk, zk) − f(xk, yk, zk) ≤ −σ(µk).   (27)

Now we split the iterations of the algorithm into two subsets, Ka and Kb, where Ka and Kb identify the iterations where (25) or (26) hold, respectively. If k ∈ Ka, by (27) we have

|f(xk+1, yk, zk) − f(xk, yk, zk)| ≥ σ(µk).   (28)

By point (i) we have that the sequence {f(xk, yk, zk)} converges, so that, if Ka is infinite, the left-hand term of the above inequality tends to zero, and hence

lim_{k→∞, k∈Ka} σ(µk) = 0.   (29)

Then, the property of σ(·) implies

lim_{k→∞, k∈Ka} µk = 0.   (30)

We now consider the set Kb. For each k ∈ Kb, let mk be the biggest index such that mk < k and mk ∈ Ka (we can assume mk = 0 if the index mk does not exist, that is, if Ka is empty). Then we have

µk ≤ δ^{k−mk} µmk.   (31)

If Kb is finite, recalling (29), we can write

lim_{k→∞} σ(µk) = lim_{k→∞, k∈Ka} σ(µk) = 0,   (32)

which implies that lim_{k→∞} µk = 0. Then assume that Kb is infinite. As k → ∞ and k ∈ Kb, either mk → ∞ (if Ka is an infinite subset) or (k − mk) → ∞. In the first case, from (30) and (31), recalling that δ ∈ (0, 1), it follows that lim_{k→∞} µk = 0. In the second case, from (31), taking into account again that δ ∈ (0, 1), it follows that lim_{k→∞} µk = 0, and hence we have the thesis. □

Before stating the first main convergence result, we report the following proposition:

Proposition 5.3

(i) There exists at least one vector z∗ ∈ Fz such that the subset K(z∗) of indices of iterates is infinite.

(ii) Let K(z∗) be an infinite subset; then:

- the sequence {(xk, yk, zk)}K(z∗) admits at least one accumulation point;

- if (x∗, y∗, z∗) is an accumulation point of the sequence {(xk, yk, zk)}K(z∗), then every (x̃, ỹ, z̃) ∈ N(x∗, y∗, z∗) is an accumulation point of a sequence {(x̃k, ỹk, z̃k)}K(z∗), where (x̃k, ỹk, z̃k) ∈ N(xk, yk, zk).


Proof.

(i) By Assumption A1, z belongs to the finite set {z1, . . . , zm} and hence, by definition (19), we can write Ku = ∪_{i=1,...,m} K(z^i). Since the set Ku is infinite (see point (iv) of Proposition 5.2), we can conclude that there exists at least one vector z∗ ∈ {z1, . . . , zm} such that K(z∗) is infinite.

(ii) Point (ii) of Proposition 5.2 implies that the sequence {(xk, yk, z∗)}K(z∗) admits at least one accumulation point. Assumption A4 implies that every (x̃, ỹ, z̃) ∈ N(x∗, y∗, z∗) is an accumulation point of a sequence {(x̃k, ỹk, z̃k)}K(z∗), where (x̃k, ỹk, z̃k) ∈ N(xk, yk, zk). □

The main result can now be proved.

Proposition 5.4 Let {(xk, yk, zk)} be the sequence produced by the algorithm. Let F∗ be the subset of Fz such that for all z∗ ∈ F∗ the corresponding K(z∗) is infinite. Then F∗ is non empty and, for each z∗ ∈ F∗, the sequence {(xk, yk, zk)}K(z∗) admits accumulation points and every accumulation point (x∗, y∗, z∗) is a stationary point of Problem (1).

Proof. Proposition 5.3 implies that the set F∗ is non empty and that for all z∗ ∈ F∗ there exists at least one accumulation point of the corresponding subsequence {(xk, yk, zk)}K(z∗). Let (x∗, y∗, z∗) be any accumulation point of any of these subsequences: we must prove that it satisfies conditions (i), (ii), (iii) stated in Definition 2.3.

Condition (i): by the instructions of the algorithm, we have that any (xk, yk, zk) produced is a starting point of Procedure LCS at Step 1. Then the result follows from point (v) of Proposition 5.2 and from Property B.

Condition (ii): suppose by contradiction that there exists a (x̃, ỹ, z̃) ∈ N(x∗, y∗, z∗) such that f(x̃, ỹ, z̃) < f(x∗, y∗, z∗). Then we can find an η > 0 such that

f(x̃, ỹ, z̃) ≤ f(x∗, y∗, z∗) − η.

Continuity of the function f with respect to the continuous variables guarantees the existence of an ε > 0 such that, if x belongs to the ball B(x̃, ε) of radius ε centered at x̃, then

f(x, ỹ, z̃) ≤ f(x∗, y∗, z∗) − η/2.   (33)

As (x̃, ỹ, z̃) ∈ N(x∗, y∗, z∗), assertion (ii) of Proposition 5.3 and point (iii) of Proposition 5.2 imply that there exists a k̄ ∈ K(z∗) such that for all k ≥ k̄ and k ∈ K(z∗) we have:

(x̃k, ỹk, z̃k) ∈ N(xk, yk, zk),   (34)

x̃k ∈ B(x̃, ε), ỹk = ỹ, z̃k = z̃,   (35)

ηk−1 < η/4.   (36)

Then we have from (33) and (35)

f(x̃k, ỹk, z̃k) ≤ f(x∗, y∗, z∗) − η/2.   (37)

Since K(z∗) is an infinite subset of the set Ku of unsuccessful iterates and k ∈ K(z∗), we have that the tests at Step 2 and Step 4 fail at iteration k − 1. Then, recalling that xk = x̃k−1, yk = yk−1 and zk = zk−1, we can write:

f(x̃k, ỹk, z̃k) > f(xk, yk, zk) − ηk−1 > f(xk, yk, zk) − 2ηk−1,   (38)


for all (x̃k, ỹk, z̃k) ∈ N(xk, yk, zk). It follows from (36), (38) and point (i) of Proposition 5.2 that

f(x̃k, ỹk, z̃k) > f(xk, yk, zk) − η/2 ≥ f(x∗, y∗, z∗) − η/2,

but this contradicts (37).

Condition (iii): let us consider any (x̃, ỹ, z̃) ∈ N(x∗, y∗, z∗) such that

f(x∗, y∗, z∗) = f(x̃, ỹ, z̃).   (39)

We observe that Proposition 5.3 implies that (x̃, ỹ, z̃) is an accumulation point of a sequence {(x̃k, ỹk, z̃k)}K(z∗), where (x̃k, ỹk, z̃k) ∈ N(xk, yk, zk). Then, (39) and the continuity of f imply that, for sufficiently large values of k,

f(x̃k, ỹk, z̃k) ≤ f(xk, yk, zk) + ξ.   (40)

Therefore, for such values of k, Steps 3.2-3.4 produce the points x^2_k, . . . , x^{j*_k}_k (where j*_k is the number of repetitions of Steps 3.2-3.4 until the test at Step 3.4 fails) which, by the instructions at Step 3.2 and by point (i) of Property A, satisfy the following inequalities:

f(x̃k, ỹk, z̃k) ≥ f(x^2_k, ỹk, z̃k) ≥ . . . ≥ f(x^{j*_k}_k, ỹk, z̃k).   (41)

Since k ∈ K(z∗), and hence xk = x̃k−1, yk = yk−1 and zk = zk−1, we can write

f(x^{j*_k}_k, ỹk, z̃k) ≥ f(xk, yk, zk) − ηk−1.   (42)

Moreover, as the sequences {(x̃k, ỹk, z̃k)}K(z∗) and {(xk, yk, zk)}K(z∗) converge to the points (x̃, ỹ, z̃) and (x∗, y∗, z∗) respectively, by (39), (41), (42) and by point (iii) of Proposition 5.2 we obtain:

lim_{k→∞, k∈K(z∗)} f(x̃k, ỹk, z̃k) = lim_{k→∞, k∈K(z∗)} f(x^2_k, ỹk, z̃k) = lim_{k→∞, k∈K(z∗)} f(xk, yk, zk).   (43)

Finally, by point (ii) of Property A of LCS, we have two possibilities:

- either

f(x^2_k, ỹk, z̃k) − f(x̃k, ỹk, z̃k) ≤ −σ(µ^1_k),   (44)

- or

µ^1_k ≤ δµk.   (45)

Taking into account that {µk} → 0 and recalling (43), (44) and (45), it follows easily that the sequence {µ^1_k} tends to zero, and hence the result follows again by Property B. □

The convergence analysis just presented proves that Algorithm MIVAM converges to a point satisfying the necessary conditions for optimality introduced in Section 2.

Now we state a further result, which shows some properties of the limit points of the vectors produced at Step 3.2. This additional analysis is strongly related to some results stated in [2] (see in particular Propositions 4.8 and 4.12).

Proposition 5.5 Let (x∗, y∗, z∗) be an accumulation point of the sequence {(xk, yk, zk)}K(z∗) and let (x̃, ỹ, z̃) ∈ N(x∗, y∗, z∗) be an accumulation point of a sequence {(x̃k, ỹk, z̃k)}K(z∗), with (x̃k, ỹk, z̃k) ∈ N(xk, yk, zk) (see Proposition 5.3). Then


(i) If (x̃, ỹ, z̃) ∈ N(x∗, y∗, z∗) satisfies f(x̃, ỹ, z̃) < f(x∗, y∗, z∗) + ξ, there exists a feasible point (x̂, ỹ, z̃) which is an accumulation point of the subsequence {(x^{j*_k}_k, ỹk, z̃k)}K̄, where K̄ = {k ∈ K(z∗) : (x̃k, ỹk, z̃k) ∈ Wk−1} and x^{j*_k}_k is obtained at iteration k starting from (x̃k, ỹk, z̃k) and repeating Steps 3.2-3.4 j*_k times until the test at Step 3.4 fails. Moreover, the point (x̂, ỹ, z̃) satisfies

f(x∗, y∗, z∗) ≤ f(x̂, ỹ, z̃) ≤ f(x̃, ỹ, z̃)   (46)

and it is a stationary point of the problem

min f(x, ỹ, z̃)   (47)

x ∈ Fx(ỹ, z̃).

(ii) Suppose that Procedure LCS satisfies the following property: given a feasible point (xh, yh, zh), it produces a point x̃h such that:

f(x̃h, yh, zh) ≤ f(xh, yh, zh) − σ(‖x̃h − xh‖),   (48)

where the function σ(·) is such that if σ(th) → 0, then th → 0.

Then, if (x̃, ỹ, z̃) ∈ N(x∗, y∗, z∗) satisfies f(x̃, ỹ, z̃) = f(x∗, y∗, z∗) = f(x̂, ỹ, z̃) and ‖x̃ − x̂‖ = s > 0 (where the point (x̂, ỹ, z̃) is defined at point (i)), then for all p ∈ (0, s) there exists an accumulation point (x̄, ỹ, z̃) of a subsequence {(x^{j_k}_k, ỹk, z̃k)}K(z∗), where for all k ∈ K(z∗) the point (x^{j_k}_k, ỹk, z̃k) is obtained at Step 3.2 by applying the LCS Procedure jk times (with jk ≤ j*_k) starting from (x̃k, ỹk, z̃k). Moreover, this limit point satisfies the following properties:

(a) ‖x̄ − x̃‖ = p,

(b) f(x̄, ỹ, z̃) = f(x̃, ỹ, z̃),

(c) (x̄, ỹ, z̃) is a stationary point of Problem (47).

Proof. Point (i): Proposition 5.3 and the continuity assumption on f ensure that every point (x̃, ỹ, z̃) ∈ N(x∗, y∗, z∗) satisfying f(x̃, ỹ, z̃) < f(x∗, y∗, z∗) + ξ is an accumulation point of a sequence {(x̃k, ỹk, z̃k)}K(z∗), where (x̃k, ỹk, z̃k) ∈ N(xk, yk, zk), and that, for sufficiently large values of k,

f(x̃k, ỹk, z̃k) ≤ f(xk, yk, zk) + ξ.   (49)

Thus the set K̄ is not empty. Then Assumption A2 implies (see the proof of point (ii) of Proposition 5.2) that the sequence {(x^{j*_k}_k, ỹk, z̃k)}K̄ admits at least one accumulation point (x̂, ỹ, z̃), which is feasible by point (i) of Property A of LCS. Since k ∈ K̄, we have that the test at Step 3.3 fails at the (k − 1)-th iteration, that is:

f(x^{j*_k}_k, ỹk, z̃k) > f(xk, yk, zk) − ηk,

which implies, with point (iii) of Proposition 5.2, for k ∈ K̄ and k → ∞:

f(x̃, ỹ, z̃) ≥ f(x̂, ỹ, z̃) ≥ f(x∗, y∗, z∗),

where the left inequality follows from the instructions at Step 3.2 and from Property A of LCS. Now we have to prove that (x̂, ỹ, z̃) is a stationary point of Problem (47). By the instructions at Step 3.4, we have that, for all k, µ^{j*_k}_k ≤ µk. By point (v) of Proposition 5.2 it follows that

lim_{k→∞} µ^{j*_k}_k = 0   (50)


and hence, by Property B, we have the thesis.

Point (ii): First of all, we note that, taking into account Assumption A2 and the instructions at Step 3, and by repeating reasonings similar to the ones used to prove point (ii) of Proposition 5.2, we can conclude that every sequence {(x^{j_k}_k, ỹk, z̃k)}K(z∗), jk ∈ {1, . . . , j*_k}, is bounded.

Let K̂ ⊆ K(z∗) be a set of indices such that the subsequences {(xk, yk, zk)}K̂ and {(x̃k, ỹk, z̃k)}K̂ converge respectively to the points (x∗, y∗, z∗) and (x̃, ỹ, z̃). Now, for all k ∈ K̂, k sufficiently large, we define the index jk as an index belonging to the set {j = 1, . . . , j*_k : ‖x^j_k − x̃‖ ≤ p, ‖x^{j+1}_k − x̃‖ > p}. Note that jk is well defined since, by definition, {x^1_k}K̂ → x̃ and {x^{j*_k}_k}K̂ → x̂ and, by assumption, x̃ ≠ x̂. By definition of jk, we have:

‖x^{j_k}_k − x̃‖ ≤ p < ‖x^{j_k+1}_k − x̃‖ ≤ ‖x^{j_k+1}_k − x^{j_k}_k‖ + ‖x^{j_k}_k − x̃‖,   (51)

and, by the instructions of the algorithm, we have:

f(x̃k, ỹk, z̃k) ≥ . . . ≥ f(x^{j_k}_k, ỹk, z̃k) ≥ f(x^{j_k+1}_k, ỹk, z̃k) ≥ . . . ≥ f(x^{j*_k}_k, ỹk, z̃k).   (52)

Then, by assumption, we have f(x̃, ỹ, z̃) = f(x∗, y∗, z∗) = f(x̂, ỹ, z̃), and hence it follows that

lim_{k→∞, k∈K̂} f(x^{j_k}_k, ỹk, z̃k) = lim_{k→∞, k∈K̂} f(x^{j_k+1}_k, ỹk, z̃k) = f(x̃, ỹ, z̃).   (53)

By (48) we have that

f(x^{j_k+1}_k, ỹk, z̃k) ≤ f(x^{j_k}_k, ỹk, z̃k) − σ(‖x^{j_k+1}_k − x^{j_k}_k‖),   (54)

and hence it follows from (53) that

lim_{k→∞, k∈K̂} ‖x^{j_k+1}_k − x^{j_k}_k‖ = 0.   (55)

Let (x̄, ỹ, z̃) be any accumulation point of the sequence {(x^{j_k}_k, ỹk, z̃k)}K̂. Now, taking limits in (51) and recalling (55), we have that

‖x̄ − x̃‖ = p,   (56)

and this proves point (a). As regards point (b), it follows from (53). Finally, we have to prove point (c). Recalling point (ii) of Property A of LCS, we split the set K̂ into two subsets, K1 and K2, such that for all k ∈ K1 we have

f(x^{j_k+1}_k, ỹk, z̃k) − f(x^{j_k}_k, ỹk, z̃k) ≤ −σ(µ^{j_k}_k)   (57)

and for all k ∈ K2 we have

µ^{j_k+1}_k ≤ δµ^{j_k}_k, δ ∈ (0, 1).   (58)

By using (53), if k ∈ K1 we have

lim_{k→∞, k∈K1} µ^{j_k}_k = 0.   (59)

If k ∈ K2, let mk be the biggest index such that mk < jk and such that

f(x^{m_k+1}_k, ỹk, z̃k) − f(x^{m_k}_k, ỹk, z̃k) ≤ −σ(µ^{m_k}_k).   (60)

It surely exists, since otherwise j*_k = 1 and hence x^{j*_k}_k = x̃k. Then we have

µ^{j_k}_k ≤ δ^{j_k−m_k} µ^{m_k}_k.   (61)


Recalling (52) and (53), by (60) and by the property of σ we have that

lim_{k→∞, k∈K2} µ^{m_k}_k = 0   (62)

and hence (61) implies

lim_{k→∞, k∈K2} µ^{j_k}_k = 0.   (63)

By (59) and (63), we have that

lim_{k→∞, k∈K̂} µ^{j_k}_k = 0   (64)

and hence Property B of LCS implies point (c). □

Remark: the property stated in the above proposition assumes a relevant role if Problem (1) satisfies convexity assumptions, that is, if for all z ∈ Fz, y ∈ Fy(z), the objective function f(x, y, z) is convex with respect to the continuous variables and the feasible set Fx(y, z) is convex. In fact, consider any (x̃, ỹ, z̃) ∈ N(x∗, y∗, z∗) such that f(x̃, ỹ, z̃) < f(x∗, y∗, z∗) + ξ. Proposition 5.5 and the convexity assumption imply that the global minimum point of the corresponding continuous problem (47) has objective value greater than or equal to f(x∗, y∗, z∗). Therefore, for a choice of ξ such that f(x̃, ỹ, z̃) < f(x∗, y∗, z∗) + ξ for all (x̃, ỹ, z̃) ∈ N(x∗, y∗, z∗), it follows that the stationary point (x∗, y∗, z∗) is also a local minimum point of Problem (1) according to Definition 2.2.

6 A derivative-free algorithm

We consider here a particular class of Problem (1), which includes the optimal design problem described in Section 3. In particular, we assume that the feasible set is defined by linear constraints and that the derivatives of f(x, y, z) with respect to the continuous variables x can be neither calculated nor approximated explicitly. More formally, the problem we consider is:

min f(x, y, z)   (65)

zl ≤ z ≤ zu
yl ≤ y ≤ yu
A(y, z) x ≤ b(y, z)
z ∈ Z^nz, y ∈ Z^ny(z),

where A(y, z) ∈ ℝ^(m(y,z)×nx(z)), b(y, z) ∈ ℝ^m(y,z), yl, yu ∈ Z^ny(z) and zl, zu ∈ Z^nz. We suppose that Assumptions A1, A2 and A4 are verified, and we replace Assumption A3 with the following assumption:

A3′. For each z ∈ Fz and for each y ∈ Fy(z), the function f(·, y, z) is continuously differentiable in a neighborhood of {x : (x, y, z) ∈ Lz(x0, y0, z0)}.

We define as a stationary point of the following continuous problem

min f(x, y, z)   (66)

A(y, z) x ≤ b(y, z)

any point (x̄, y, z) such that

∇xf(x̄, y, z)^T (x − x̄) ≥ 0 ∀x : A(y, z) x ≤ b(y, z).   (67)
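Condition (67) can be checked numerically by solving a linear program. The sketch below is an illustration under the assumption that bounds on x keep the LP bounded; it uses SciPy's `linprog` and returns a nonpositive measure that vanishes exactly at stationary points:

```python
import numpy as np
from scipy.optimize import linprog

def stationarity_measure(grad, A, b, xbar, bounds):
    """min over {A x <= b, bounds} of grad^T (x - xbar): equals 0 at a
    stationary point in the sense of (67) and is negative otherwise."""
    res = linprog(c=grad, A_ub=A, b_ub=b, bounds=bounds, method="highs")
    return float(np.dot(grad, res.x - xbar))
```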


We describe a particular instance of Algorithm MIVAM by specifying the local continuous search procedure representing LCS, which takes into account the features of the considered class of problems. In order to overcome the lack of gradient information, drawing our inspiration from the derivative-free method proposed in [6], we define a procedure based on a sampling of the objective function along suitable sets of search directions. In particular, we assume that for every feasible point (x, y, z) there exists a set of search directions

D(x, y, z) = {d^1, . . . , d^r}, d^i ∈ ℝ^nx(z), ‖d^i‖ = 1, i = 1, . . . , r,   (68)

that satisfies the following condition.

Condition 1

There exists a subset of feasible directions T(x, y, z) ⊆ D(x, y, z) which satisfies the following properties:

i) for every feasible pair (y, z), two positive constants ᾱ and ε̄ exist such that, for all x̄ ∈ Fx(y, z) and for all d^i ∈ T(x̄, y, z), we have that

x + αd^i ∈ Fx(y, z)

for all α ∈ (0, ᾱ] and for all x ∈ B(x̄, ε̄) ∩ Fx(y, z);

ii) if a sequence of points {(xk, yk, zk)} converges to a point (x̄, ȳ, z̄), then

∇xf(x̄, ȳ, z̄)^T (x − x̄) ≥ 0 ∀x ∈ Fx(ȳ, z̄)   (69)

if and only if

lim_{k→∞} Σ_{i∈I(xk, yk, zk)} min{0, ∇xf(xk, yk, zk)^T d^i_k} = 0,

where I(xk, yk, zk) = {i : d^i_k ∈ T(xk, yk, zk)}.

We refer to [4] and [6] (see Sect. 3, Proposition 4 and point (iv) of Proposition 8) for the construction of a set of directions satisfying Condition 1. We present below the derivative-free procedure (DFP) which plays the role of Procedure LCS in the proposed instance of Algorithm MIVAM.


Procedure DFP(x, µ^0, x̃, µ, y, z)

Data: γ ≥ 0, δ1 ∈ (0, 1), δ2 ∈ (0, 1).

Step 1: Determine a set of directions D(x, y, z) given by (68) and satisfying Condition 1.

Step 2: Set i = 1, x^i = x.

Step 3: Compute the maximum steplength ᾱ^i such that x^i + ᾱ^i d^i ∈ Fx(y, z).

Set α^i = min{ᾱ^i, µ^0}.

Step 4: If α^i > 0 and f(x^i + α^i d^i, y, z) ≤ f(x^i, y, z) − γ(α^i)², go to Step 5.

Otherwise set α^i = 0 and go to Step 6.

Step 5: Set α = α^i.

5.1: Let α̃ = min{ᾱ^i, α/δ2}.

5.2: If α̃ = ᾱ^i or f(x^i + α̃ d^i, y, z) > f(x^i, y, z) − γα̃², set α^i = α

and go to Step 6.

5.3: Set α = α̃ and go to Step 5.1.

Step 6: Set x^{i+1} = x^i + α^i d^i. If i < r, set i = i + 1 and go to Step 3.

Step 7: Set x̃ = x^{i+1}. Set µ = max{max_{i=1,...,r} {α^i}, δ1µ^0} and stop.
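For the special case of bound constraints, where the coordinate directions ±e_i are a natural choice for the direction set, Procedure DFP might be sketched as follows; this is an illustrative rendering under that assumption, not the authors' code:

```python
import numpy as np

def dfp(f, x, mu0, lower, upper, gamma=1e-6, delta1=0.5, delta2=0.5):
    """One sweep of DFP along +/- e_i for the box lower <= x <= upper.
    Returns (x_new, mu) as in Step 7: mu is the largest accepted steplength,
    or delta1*mu0 when no direction yields a sufficient decrease."""
    x = np.asarray(x, dtype=float).copy()
    n = x.size
    dirs = [s * np.eye(n)[i] for i in range(n) for s in (+1.0, -1.0)]
    alphas = [0.0]
    for d in dirs:
        # Step 3: maximum feasible steplength abar along d inside the box
        with np.errstate(divide="ignore", invalid="ignore"):
            t = np.where(d > 0, (upper - x) / d,
                         np.where(d < 0, (lower - x) / d, np.inf))
        abar = float(np.min(t))
        alpha = min(abar, mu0)
        # Step 4: sufficient-decrease test with gamma * alpha^2
        if alpha <= 0 or f(x + alpha * d) > f(x) - gamma * alpha ** 2:
            continue
        # Step 5: expand the step while the sufficient decrease persists
        while True:
            a_try = min(abar, alpha / delta2)
            if a_try <= alpha or f(x + a_try * d) > f(x) - gamma * a_try ** 2:
                break
            alpha = a_try
        # Step 6: accept the step and move to the next direction
        x = x + alpha * d
        alphas.append(alpha)
    return x, max(max(alphas), delta1 * mu0)   # Step 7
```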

First, we show that Procedure DFP is well defined.

Proposition 6.1 Assume that f(x, y, z) is bounded from below on Lz(x, y, z). Then Procedure DFP is well defined, i.e. it does not cycle at Step 5.

Proof. Suppose by contradiction that Procedure DFP cycles at Step 5. This implies that there exists an index i ∈ {1, . . . , r} such that for all j

ᾱ^i > 0,   (70)

x^i + (α/(δ2)^j) d^i ∈ Fx(y, z),   (71)

f(x^i + (α/(δ2)^j) d^i, y, z) ≤ f(x^i, y, z) − γ(α/(δ2)^j)², j = 0, 1, . . .   (72)


Taking limits in (72) for j → ∞, we obtain that the objective function is unbounded from below on the level set Lz(x, y, z), but this contradicts the assumption. □

We note that the hypothesis of Proposition 6.1 holds under Assumptions A2 and A3′. The next proposition shows that Procedure DFP satisfies the properties needed to ensure the global convergence of the sequence of points generated by Algorithm MIVAM.

Proposition 6.2 Procedure DFP satisfies Properties A and B.

Proof. Property A: by the instructions of Procedure DFP, we have that the point x̃ produced by Procedure DFP is feasible and such that f(x̃, y, z) ≤ f(x, y, z), so that point (i) of Property A holds. As regards point (ii), by the instructions of DFP, we have that either µ = δ1µ^0 or µ = max_{i=1,...,r} {α^i}. In the latter case, we have that x̃ ≠ x and for each i = 1, . . . , r we have

f(x^{i+1}, y, z) ≤ f(x^i, y, z) − γ(α^i)²,   (73)

and hence

f(x̃, y, z) ≤ f(x, y, z) − γ Σ_{i=1}^{r} (α^i)².   (74)

Since x̃ ≠ x, there exists at least one index i such that α^i ≠ 0, so that (74) implies

f(x̃, y, z) ≤ f(x, y, z) − γ(max_{i=1,...,r} {α^i})² = f(x, y, z) − γ(µ)²,   (75)

and hence point (ii) holds.

Property B: let $\{(x_h, y_h, z_h)\}$ be a sequence of feasible points. We have to prove that, if we apply, for all $h$, Procedure DFP starting from $(x_h, y_h, z_h)$ and if the corresponding sequence $\{\tilde\mu_h\}$ is such that

$$\lim_{h \to \infty} \tilde\mu_h = 0, \qquad (76)$$

then every accumulation point $(\bar x, \bar y, \bar z)$ of the sequence $\{(x_h, y_h, z_h)\}$ is such that $\bar x$ is a stationary point of the following continuous problem

$$\min\ f(x, \bar y, \bar z) \quad \text{s.t.}\quad x \in F_x(\bar y, \bar z), \qquad (77)$$

that is

$$\nabla_x f(\bar x, \bar y, \bar z)^T (x - \bar x) \ge 0 \quad \forall\, x \in F_x(\bar y, \bar z). \qquad (78)$$

We denote by $d_i^h$, $i = 1, \ldots, r$, the directions used by DFP when applied at $(x_h, y_h, z_h)$, and by $x_i^h$, $\bar\alpha_i^h$, $\alpha_i^h$ the corresponding points and scalars. Let us consider any (possibly relabelled) subsequence $\{(x_h, y_h, z_h)\}$ such that $\lim_{h \to \infty}(x_h, y_h, z_h) = (\bar x, \bar y, \bar z)$ and $\lim_{h \to \infty} d_i^h = \bar d_i$. First of all we note that for all $i \in \{1, \ldots, r\}$ we have

$$x_i^h = x_h + \sum_{j=1}^{i-1} \alpha_j^h d_j^h,$$

and then we can write

$$\|x_i^h - x_h\| = \Big\|\sum_{j=1}^{i-1} \alpha_j^h d_j^h\Big\| \le \sum_{j=1}^{i-1} \alpha_j^h \le r \max_{j=1,\ldots,r}\{\alpha_j^h\} \le r \tilde\mu_h, \qquad (79)$$


where the last inequality follows from the definition of $\tilde\mu_h$ at Step 7. Moreover, again by the definition of $\tilde\mu_h$, we have $\alpha_i^h \le \tilde\mu_h$, which together with (76) implies that

$$\lim_{h \to \infty} \alpha_i^h = 0. \qquad (80)$$

By (76), (79) and (80) we have that, for $h$ sufficiently large,

$$x_i^h \in B(x_h, \bar\epsilon) \cap F_x(y_h, z_h), \qquad (81)$$

$$\frac{2\alpha_i^h}{\delta_2} \le \bar\alpha, \qquad (82)$$

where $\bar\epsilon$ and $\bar\alpha$ are the scalars defined at point i) of Condition 1. Now, again Condition 1 implies that for all $i \in I(x_h, y_h, z_h) = \{i : d_i^h \in T(x_h, y_h, z_h)\}$ we have

$$x_i^h + \frac{2\alpha_i^h}{\delta_2}\, d_i^h \in F_x(y_h, z_h), \qquad (83)$$

from which, recalling the definition of $\bar\alpha_i^h$, we get

$$0 \le \frac{\alpha_i^h}{\delta_2} < \bar\alpha_i^h \quad \forall\, i \in I(x_h, y_h, z_h). \qquad (84)$$

Then we have two possibilities: either $\alpha_i^h > 0$ or $\alpha_i^h = 0$. For all $i \in I(x_h, y_h, z_h)$, we can introduce the two sets of indices

$$H_1(i) = \{h : \alpha_i^h > 0\}, \qquad H_2(i) = \{h : \alpha_i^h = 0\}.$$

For all $h \in H_1(i)$, the definition of $\alpha_i^h$ and (84) imply:

$$f\Big(x_i^h + \frac{\alpha_i^h}{\delta_2}\, d_i^h, y_h, z_h\Big) - f(x_i^h, y_h, z_h) > -\gamma \Big(\frac{\alpha_i^h}{\delta_2}\Big)^2. \qquad (85)$$

By the mean value theorem, we obtain

$$\nabla f\Big(x_i^h + \lambda \frac{\alpha_i^h}{\delta_2}\, d_i^h, y_h, z_h\Big)^T d_i^h > -\gamma \frac{\alpha_i^h}{\delta_2}, \qquad (86)$$

where $\lambda \in (0, 1)$. From (86), we have

$$\begin{aligned}
\nabla f(x_h, y_h, z_h)^T d_i^h > {} & -\gamma \frac{\alpha_i^h}{\delta_2} - \Big(\nabla f\Big(x_i^h + \lambda \frac{\alpha_i^h}{\delta_2}\, d_i^h, y_h, z_h\Big) - \nabla f(x_i^h, y_h, z_h)\Big)^T d_i^h \\
& + \big(\nabla f(x_h, y_h, z_h) - \nabla f(x_i^h, y_h, z_h)\big)^T d_i^h. \qquad (87)
\end{aligned}$$

Taking limits in (87) for $h \to \infty$, $h \in H_1(i)$, recalling that $(x_h, y_h, z_h) \to (\bar x, \bar y, \bar z)$ and using (79), (80) and the continuity of $\nabla_x f$, we can write:

$$\lim_{h \to \infty,\, h \in H_1(i)} \nabla f(x_h, y_h, z_h)^T d_i^h = \nabla_x f(\bar x, \bar y, \bar z)^T \bar d_i \ge 0. \qquad (88)$$

For all $h \in H_2(i)$, point i) of Condition 1 guarantees that $\bar\alpha_i^h > 0$, so that the steplength tested at Step 4, namely $\hat\alpha_i^h = \min\{\bar\alpha_i^h, \mu_h^0\}$, is positive; since $\alpha_i^h = 0$, the sufficient decrease test of Step 4 must have failed, and the instructions of the algorithm imply:

$$f(x_i^h + \hat\alpha_i^h d_i^h, y_h, z_h) - f(x_i^h, y_h, z_h) > -\gamma (\hat\alpha_i^h)^2. \qquad (89)$$

By the mean value theorem, we obtain

$$\nabla f(x_i^h + \lambda \hat\alpha_i^h d_i^h, y_h, z_h)^T d_i^h > -\gamma \hat\alpha_i^h, \qquad (90)$$


where $\lambda \in (0, 1)$. From (90), we have

$$\begin{aligned}
\nabla f(x_h, y_h, z_h)^T d_i^h > {} & -\gamma \hat\alpha_i^h - \big(\nabla f(x_i^h + \lambda \hat\alpha_i^h d_i^h, y_h, z_h) - \nabla f(x_i^h, y_h, z_h)\big)^T d_i^h \\
& + \big(\nabla f(x_h, y_h, z_h) - \nabla f(x_i^h, y_h, z_h)\big)^T d_i^h. \qquad (91)
\end{aligned}$$

By repeating the reasonings used above, and recalling that the instructions of Procedure DFP imply

$$\hat\alpha_i^h \le \mu_h^0 \le \frac{\tilde\mu_h}{\delta_1},$$

so that $\hat\alpha_i^h \to 0$ by (76), we obtain:

$$\lim_{h \to \infty,\, h \in H_2(i)} \nabla f(x_h, y_h, z_h)^T d_i^h = \nabla_x f(\bar x, \bar y, \bar z)^T \bar d_i \ge 0. \qquad (92)$$

Therefore, by (88) and (92), for all $i \in I(x_h, y_h, z_h)$ we have

$$\lim_{h \to \infty} \nabla f(x_h, y_h, z_h)^T d_i^h = \nabla_x f(\bar x, \bar y, \bar z)^T \bar d_i \ge 0, \qquad (93)$$

from which we obtain

$$\lim_{h \to \infty} \sum_{i \in I(x_h, y_h, z_h)} \min\{0, \nabla f(x_h, y_h, z_h)^T d_i^h\} = 0. \qquad (94)$$

This, combined with point ii) of Condition 1, implies that (78) holds, and hence the thesis.

Finally, the next proposition shows that Procedure DFP also satisfies the additional property required in Proposition 5.5, and this completes the convergence analysis of this particular instance of Algorithm MIVAM.

Proposition 6.3 Procedure DFP satisfies the condition:

$$f(\tilde x, y, z) \le f(x, y, z) - \sigma(\|\tilde x - x\|), \qquad (95)$$

where $\sigma(\cdot) \ge 0$ is such that if $\sigma(t_h) \to 0$, then $t_h \to 0$.

Proof. By the instructions of Procedure DFP, we have two possibilities: either $\tilde x = x$, and hence (95) is obviously satisfied, or $\tilde x \ne x$. In this case, we have

$$\tilde x = x + \sum_{i=1}^{r} \alpha_i d_i, \qquad (96)$$

and hence, since $\|d_i\| = 1$,

$$\|\tilde x - x\| \le \sum_{i=1}^{r} \alpha_i. \qquad (97)$$

Moreover, by the acceptability criterion on $\alpha_i$, we have that

$$f(\tilde x, y, z) - f(x, y, z) \le -\gamma \sum_{i=1}^{r} (\alpha_i)^2. \qquad (98)$$

Recalling that

$$\Big(\sum_{i=1}^{r} \alpha_i\Big)^2 \le r \sum_{i=1}^{r} (\alpha_i)^2, \qquad (99)$$

by (97) and (98) we obtain

$$f(\tilde x, y, z) - f(x, y, z) \le -\frac{\gamma}{r}\, \|\tilde x - x\|^2, \qquad (100)$$

that is (95) with $\sigma(t) = \frac{\gamma}{r}\, t^2$.
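For completeness, inequality (99) follows from the Cauchy-Schwarz inequality applied to the vectors $(1, \ldots, 1)$ and $(\alpha_1, \ldots, \alpha_r)$:

$$\Big(\sum_{i=1}^{r} \alpha_i\Big)^2 = \Big(\sum_{i=1}^{r} 1 \cdot \alpha_i\Big)^2 \le \Big(\sum_{i=1}^{r} 1^2\Big)\Big(\sum_{i=1}^{r} \alpha_i^2\Big) = r \sum_{i=1}^{r} \alpha_i^2.$$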

Propositions 6.2 and 6.3 show that the particular instance of Algorithm MIVAM where Procedure DFP is used as Procedure LCS maintains all the convergence properties established in Section 5 for the general case.


7 Optimal design of a magnetic resonance device: numerical results

In this section we report the results of the computational experience on Problem (8), which regards, as already said in Section 3, the optimal design of a small apparatus for magnetic resonance. We have applied Algorithm MIVAM to this problem, using the definition of discrete neighborhood described in Section 3 and setting the parameters to the values $\xi = 10^{-2}$, $\theta = 0.5$, $\eta_0 = 10^{-5}$, $\mu_{in} = 0.5$. As regards the local continuous search LCS, we have employed the derivative-free Procedure DFP introduced in Section 6 with $\gamma = 10^{-6}$, $\delta_1 = \delta_2 = 0.5$. The directions determined at Step 1 of Procedure DFP have been computed as described in [6]. Finally, we have stopped Algorithm MIVAM when $\mu_k$ became less than or equal to $10^{-6}$.
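For reference, the experimental settings above can be gathered in a single configuration object; the following snippet is purely illustrative and all names are ours:

```python
# Hypothetical configuration mirroring the reported settings; the role of each
# parameter is defined in the descriptions of Algorithm MIVAM and Procedure DFP.
MIVAM_CONFIG = {
    "xi": 1e-2,
    "theta": 0.5,
    "eta0": 1e-5,
    "mu_in": 0.5,
    "gamma": 1e-6,   # sufficient-decrease constant of Procedure DFP
    "delta1": 0.5,   # Step 7 safeguard factor of Procedure DFP
    "delta2": 0.5,   # Step 5 expansion factor of Procedure DFP
    "mu_tol": 1e-6,  # stop Algorithm MIVAM when mu_k <= mu_tol
}
```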

The device structure corresponding to the starting point is reported in Fig. 3. This structure was considered reasonable by the magnetic resonance experts. We note that it is made of 5 rings and 3 magnets with a radius of 22 mm, and the corresponding objective value is $f_0 = 0.3715$.

Figure 3: Structure corresponding to the starting point

The best solution $f^* = 6.03 \times 10^{-3}$ has been found by Algorithm MIVAM after 2162 objective function evaluations, and the corresponding structure, depicted in Fig. 4, is made of 7 rings and 3 magnets with a radius of 22 mm.

Figure 4: Structure corresponding to the best point found by Algorithm MIVAM

We can note that a significant improvement of the uniformity of the magnetic field has been obtained with a relatively small number of function evaluations. Moreover, by comparing Fig. 3 and Fig. 4, we can observe that the final structure is quite different from the initial one, which points out the important role played by the discrete dimensional variables.

In order to evaluate more thoroughly the possible advantages deriving from the use of the proposed mixed optimization algorithm, we have performed further experiments.

In particular, we have applied a different strategy, which does not alternate continuous and discrete searches as Algorithm MIVAM does, but treats the dimensional and categorical variables as parameters, whose values are chosen by considering the points belonging to the discrete neighborhood of the initial vector. Then, starting from each one of these points and keeping the discrete variables fixed, the corresponding nonlinear continuous problem is solved, i.e. a stationary point with respect to the continuous variables is produced. A sketch of this baseline strategy is given below.
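The fragment below sketches this baseline in Python, under the same conventions as the previous sketches; `discrete_neighborhood` (the neighborhood of Section 3) and `continuous_df_solve` (playing the role of the derivative-free method of [6]) are hypothetical placeholders, not names from the original paper:

```python
def baseline_strategy(f, x0, y0, z0, discrete_neighborhood, continuous_df_solve):
    """One continuous minimization per discrete neighbor; keep the best result."""
    best = (None, None, None, float("inf"))  # (x, y, z, objective value)
    # The discrete variables act as fixed parameters: for each point of the
    # discrete neighborhood of the initial vector, solve the corresponding
    # continuous problem (i.e. find a stationary point with respect to x only).
    for (x, y, z) in discrete_neighborhood(x0, y0, z0):
        x_star = continuous_df_solve(lambda v, y=y, z=z: f(v, y, z), x)
        f_star = f(x_star, y, z)
        if f_star < best[3]:
            best = (x_star, y, z, f_star)
    return best
```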

The continuous minimization has been performed by means of the derivative-free algorithm proposed in [6], which is an iterative method that generates a sequence of feasible points with non-increasing objective values. At each iteration, the objective function is evaluated at a finite number of points along a suitable set of search directions, in order to find one that provides a sufficient decrease of the objective function value. The search directions and the sampling technique are the same employed by Procedure DFP used within Algorithm MIVAM. Thus, the implemented optimization strategy alternative to Algorithm MIVAM is based on the same continuous search technique used by the latter algorithm, and the comparison between the two appears reasonable.
In particular, the implemented strategy has found the best solution $f^* = 7.54 \times 10^{-3}$ (with 6 rings, 3 magnets and a radius of 22 mm), which is significantly worse (with respect to the considered application) than the one determined by Algorithm MIVAM. Moreover, it has required 5579 function evaluations, many more than Algorithm MIVAM. The obtained results seem to indicate that the proposed approach is promising, even though further work is needed on the specific optimal design problem considered. In particular, in order to obtain a better solution, two aspects may deserve to be further investigated. The first regards the introduction of more sophisticated definitions of the discrete neighborhoods. The second concerns the introduction of a global optimization technique in the minimization of the continuous variables.

References

[1] Abramson M. A., Pattern Search Algorithms for Mixed Variable General Constrained Optimization Problems, Ph.D. Thesis, Department of Computational and Applied Mathematics, Rice University, August 2002.

[2] Audet C., Dennis J. E., Pattern Search Algorithms for Mixed Variable Programming, SIAM Journal on Optimization, vol. 11, 2001, pp. 573-594.

[3] Berge C., Topological Spaces, Macmillan, New York, 1963.

[4] Lewis R. M., Torczon V., Pattern search methods for linearly constrained minimization, SIAM Journal on Optimization, vol. 10, no. 3, 2000, pp. 917-941.

[5] Liuzzi G., Lucidi S., Placidi G., Sotgiu A., A Magnetic Resonance Device designed via global optimization techniques, TR 09-02, Department of Computer and Systems Science "Antonio Ruberti", University of Rome "La Sapienza", 2002.

[6] Lucidi S., Sciandrone M., Tseng P., Objective-derivative-free methods for constrained optimization, Mathematical Programming, vol. 92 (1), 2002, pp. 37-59.

[7] Lucidi S., Piccialli V., A derivative-based algorithm for a particular class of mixed variable optimization problems, TR 07-03, Department of Computer and Systems Science "Antonio Ruberti", University of Rome "La Sapienza", 2003.