
Modeling Pareto-Optimal Set Using B-Spline Basis Functions

Piyush Bhardwaj, Bhaskar Dasgupta, and Kalyanmoy Deb*

Department of Mechanical Engineering, Indian Institute of Technology Kanpur, Kanpur, PIN 208016, India

Email: {piyushb,dasgupta,deb}@iitk.ac.in

KanGAL Report Number 2012008

Abstract

In the past few years, multi-objective optimization (MOO) algorithms have been extensively applied in several fields, including engineering design problems. A major reason is the advancement of evolutionary multi-objective optimization (EMO) algorithms that are able to find a set of non-dominated points spread on the respective Pareto-optimal front in a single simulation. Besides just finding a set of Pareto-optimal solutions, one is often interested in capturing knowledge about the variation of variable values over the Pareto-optimal front. Recent innovization approaches for knowledge discovery from Pareto-optimal solutions remain a major activity in this direction. In this paper, we present a different data-fitting approach for continuous parameterization of the Pareto-optimal front. Cubic B-spline basis functions are used for fitting the data returned by an EMO procedure in the variable space. No prior knowledge about the order in the data is assumed. An automatic procedure for detecting gaps in the Pareto-optimal front is also implemented. The algorithm takes the points returned by the EMO as input and returns the control points of the B-spline manifold representing the Pareto-optimal set. Results for several standard and engineering bi-objective and tri-objective optimization problems demonstrate the usefulness of the proposed procedure.

1 Introduction

Multi-objective optimization (MOO) entails multiple conflicting objectives for which no unique single solution can be found. Therefore, the outcome of an MOO is often a Pareto-optimal set. The image of the Pareto-optimal set in the objective space constitutes the Pareto-optimal front. Points on the Pareto-optimal front are the trade-off solutions, which means no point on the Pareto-optimal front can be improved in one objective without compromising it in another objective. The concept of dominance is used to compare two solutions in MOO.

A solution X is said to dominate another solution Y (X ≼ Y) if X is not worse than Y in all of the objectives and X is better than Y in at least one of them (Deb, 2001). In a set, the points not dominated by any other set member are called the non-dominated points of that set. The non-dominated set of the entire search space constitutes the Pareto-optimal set.
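For concreteness, the dominance test can be written down directly. The following is a minimal Python sketch for minimization problems; all names are illustrative and not part of the original report:

```python
import numpy as np

def dominates(fx, fy):
    """True if objective vector fx dominates fy (minimization):
    no worse in every objective and strictly better in at least one."""
    fx, fy = np.asarray(fx), np.asarray(fy)
    return bool(np.all(fx <= fy) and np.any(fx < fy))

assert dominates([1.0, 2.0], [1.0, 3.0])      # better in f2, equal in f1
assert not dominates([1.0, 3.0], [2.0, 1.0])  # mutually non-dominated
assert not dominates([2.0, 1.0], [1.0, 3.0])
```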

Several classical and evolutionary methods exist to solve the multi-objective optimization problem (MOOP). Methods like the weighted sum approach (Athan and Papalambros, 1996), NBI (Das and Dennis, 1998), the weighted min-max approach (Marler and Arora, 2004), goal programming (Charnes et al., 1967; Charnes and Cooper, 1977) and physical programming (Messac, 1996) give a single point on the Pareto-optimal front in each run. Evolutionary multi-objective optimization (EMO) algorithms attempt to find multiple Pareto-optimal solutions in a single run. A survey of methods to solve MOOPs in engineering is available elsewhere (Marler and Arora, 2004).

∗Corresponding author. Email: [email protected]


Most of these algorithms return either a single solution or a set of solutions on the Pareto-optimal front, which is a continuous or a piecewise continuous entity. Though one or only a few solutions may eventually be implemented, a user may like to see solutions well spread on the front. To give a user the flexibility of choosing any point that lies on the Pareto-optimal front, one needs a continuous parameterization of the front. In this work, a continuous parameterization is achieved by data fitting of the non-dominated solutions returned by a multi-objective optimization algorithm (MOOA). A continuous parameterization of the front allows the user to produce points on the front at his own discretion. Hence, well-spaced points can be produced on the front to aid the process of decision making. Missing solutions can also be generated by this procedure. Moreover, generating a continuous parameterization in the variable space rather than the objective space allows us to map the parameterized entity in the variable space to the objective space by using the definition of the objective functions, and thereby obtain a continuous parameterization in both spaces. This also facilitates an easy transition from the objective to the variable space, as the parameter value associated with a solution in the variable space is the same as that of its image in the objective space. A continuous parameterization allows one to investigate the variation of the variables with the parameters along the Pareto-optimal front. This helps the user in finding interesting common properties of the points in the Pareto-optimal set. For example, some variables may not change their values at all along the Pareto-optimal front. The objective of the present study is to develop an algorithm to find a continuous parameterization of the Pareto-optimal set. This parameterization can be used

1. to produce points on the Pareto-optimal front with discretion,

2. to facilitate transition from the objective to the variable space, and

3. to investigate the variation of variables along the Pareto-optimal front and deduce common principles.

A few algorithms attempt to give a continuous representation of the Pareto-optimal front as the result of the optimization process. Steuer et al. (2010) proposed continuous parametric curves and surfaces for bi-objective and tri-objective problems using parametric quadratic programming, but their algorithm was restricted to convex problems only. Grosan (2003) proposed a Pareto Evolutionary Continuous Region Algorithm (PECRA) to handle continuous regions on the Pareto-optimal front; however, it was restricted to single-variable MOOPs only. Dumitrescu et al. (2001) proposed an evolutionary approach called the Continuous Pareto Set (CPS) algorithm to find a continuous Pareto-optimal set (and the corresponding Pareto-optimal front). In their algorithm, individuals in the population are closed intervals or points. CPS is also restricted to single-variable MOOPs only. Approximation of the true Pareto-optimal front using experimentally found solutions has been attempted by Veldhuizen (1999). Response surface approximation was proposed by Goel et al. (2007) to approximate the Pareto-optimal front. Their method gives one of the objectives as a function of the other objectives, so the relationship exists in the objective space only; moreover, their algorithm was tested on a problem with a continuous Pareto-optimal front. Several evolutionary algorithms make use of the statistical properties of the solutions of the current generation to create solutions for the next generations (Okabe et al., 2004; Bosman and Thierens, 2005; Zhou et al., 2005, 2006). However, these algorithms focus on producing points on the front rather than giving a continuous picture. Dellnitz et al. (2005) use multilevel subdivision techniques to approximate a Pareto-optimal set. They use a set-oriented approach to create tight boxes covering the Pareto-optimal set. Their algorithm is developed as a stand-alone global optimization method, in contrast to the algorithm presented here, which is based on post-processing of points returned by some MOOA. Their method gives neither a parameterization of the Pareto-optimal set nor the variations of the design variables along the Pareto-optimal front.

Finding innovative design principles has been investigated in detail by Bandaru and Deb (2011). The process of 'innovization' produces relationships between objectives, variables and constraints, which are found by using them as basis functions. The algorithm presented in this paper generates parametric equations for all the variables, which can be used to find qualitative relationships between the variables or to verify known relationships. It focuses on design changes as the designer progresses along the Pareto-optimal front. The innovization process does not give a continuous parameterization of the Pareto-optimal set. Bonham and Parmee (2004) proposed Cluster Oriented Genetic Algorithms (COGA) to identify high-performing regions


in the search space. COGA works in the design space to rapidly restrict the domain of interest to regions of high performance. Multi-objective COGA explores the high-performing regions independently for each objective and then projects them on a hyperplane to find the common high-performance region as well as relationships and interactions between objectives. Rather than giving a Pareto-optimal front, it focuses on finding regions of the design space with high performance.

In this paper, a data fitting method is proposed to give a continuous parameterization of the non-dominated set returned by the MOOA. Thus, a continuous picture is obtained from the discrete data given by the MOOA. The data fitting procedure is carried out in the variable space. The variable space can have a high dimension, but the dimension of the manifold to be fitted to the data depends on the number of objectives. It can be shown for smooth objective and constraint functions, under some mild regularity conditions, that the Pareto-optimal set of an M-objective MOOP is at most an (M − 1)-dimensional entity (termed a hyposurface in this study). Also, the order of the function describing the Pareto-optimal set in the variable space is lower than or equal to that of the function describing the Pareto-optimal front in the objective space. This has been shown empirically for several standard optimization problems (Jin and Sendhoff, 2003). Dellnitz et al. (2005) also made use of this property to approximate a Pareto-optimal set using multilevel subdivision techniques. We can have confidence in the data fitting approach because a Pareto-optimal set often consists of connected or piecewise connected points (Jin and Sendhoff, 2003). This means that, between two close points of the Pareto-optimal set returned by the MOOA, we can find more points which lie in the Pareto-optimal set but are not included in the output of the MOOA. This property of connectedness has been studied by several researchers, for example (Ehrgott and Klamroth, 1997; Jin and Sendhoff, 2003; Daniilidis et al., 1997).

To detect the disconnected parts of a piecewise connected Pareto-optimal set, we use a clustering procedure. For data fitting we use cubic B-spline basis functions with a uniform knot vector (Mortenson, 1985). Many methods exist for curve fitting of data points; however, most of them need some initial ordering of the data (Fang and Gossard, 1995). The present algorithm assumes no order in the data.

The paper is structured in the following manner. Section 2 covers the mathematical formulation of the problem and provides a brief background of the subject. Sections 3 and 4 provide details of the error reduction and the clustering steps, respectively. Section 5 gives a detailed explanation of the data fitting approach. In Section 6, results for standard and engineering design optimization problems are presented. Limitations of the algorithm are discussed in Section 7, followed by conclusions in Section 8.

2 Mathematical Formulation

This section gives the mathematical formulation and a brief background necessary for the understanding of the paper. A MOOP in its general form is stated as:

\[
\begin{aligned}
\text{Minimize/Maximize} \quad & f_m(\mathbf{x}), && m = 1, 2, \ldots, M, \qquad (1)\\
\text{subject to} \quad & g_j(\mathbf{x}) \geq 0, && j = 1, 2, \ldots, J, \qquad (2)\\
& h_k(\mathbf{x}) = 0, && k = 1, 2, \ldots, K, \qquad (3)\\
& x_i^{(L)} \leq x_i \leq x_i^{(U)}, && i = 1, 2, \ldots, N, \qquad (4)
\end{aligned}
\]

where f_m, g_j, h_k, x_i^(L) and x_i^(U) represent the objective functions, the inequality constraints, the equality constraints and the lower and upper bounds on the variables, respectively. For a MOOP with differentiable objectives and constraints, a point in the Pareto-optimal set will necessarily satisfy the Karush-Kuhn-Tucker (KKT) conditions.


The KKT conditions can be stated as follows (Miettinen, 1999; Deb, 2001):

\[
\begin{aligned}
& \sum_{i=1}^{M} \lambda_i \nabla f_i(\mathbf{x}) - \sum_{j=1}^{J} \mu_j \nabla g_j(\mathbf{x}) - \sum_{k=1}^{K} \nu_k \nabla h_k(\mathbf{x}) = 0, \qquad (5)\\
& \mu_j g_j(\mathbf{x}) = 0, \quad \forall j \in [1, \ldots, J],\\
& \lambda_i \geq 0, \quad \forall i \in [1, \ldots, M],\\
& \mu_j \geq 0, \quad \forall j \in [1, \ldots, J],\\
& g_j(\mathbf{x}) \geq 0, \quad \forall j \in [1, \ldots, J],\\
& h_k(\mathbf{x}) = 0, \quad \forall k \in [1, \ldots, K],
\end{aligned}
\]

where λ, µ and ν represent the Lagrange multipliers for the objectives, the inequality constraints and the equality constraints, respectively. A point satisfying the KKT conditions is a candidate solution for optimality. All points on the Pareto-optimal front (and hence in the Pareto-optimal set) must satisfy the above KKT conditions. However, as the KKT conditions are not sufficient conditions for Pareto-optimality, not all KKT points are optimal points. Using the KKT conditions, regularity of the Pareto-optimal front solutions can be inferred under certain smoothness conditions. Regularity implies that for an M-objective problem the Pareto-optimal front is at most an (M − 1)-dimensional piecewise continuous manifold in both the objective and the variable space. This is an important property which has mostly gone unnoticed in the EMO field (Zhang et al., 2008). Here, we use this property to ensure that for an M-objective problem we need (M − 1) parameters to parameterize the Pareto-optimal set.

An N-dimensional manifold in a K-dimensional space represents a curve for N = 1, a surface for N = 2 and a hypersurface for N = K − 1. 'Manifold' is often used as a generalized term for the intermediate dimensions. However, the term 'manifold' has a specific definition in mathematics and topology and cannot be used casually. For the lack of any other well-established term, we define the term hyposurface as a substitute for the term 'manifold'.

Definition 1. An M -dimensional hyposurface in an N -dimensional space (1 ≤ M < N) is an entity whichrequires M independent parameters for its complete description.

Thus, a 1-d hyposurface is a curve and a 2-d hyposurface is a surface. From here on, the use of the term 'manifold' will be avoided, and instead of referring to data fitting as curve or surface fitting, the general term hyposurface fitting will be used.

Cubic B-spline basis functions with a uniform knot vector are used for modeling the hyposurface. They provide a convenient matrix notation for hyposurface fitting, and the control points of a B-spline hyposurface provide good global and local control. The number of control points (Ncp) required to model the B-spline hyposurface depends on the dimension of the hyposurface and the order of the B-spline basis functions used for the modeling. The detailed theory of B-spline curves and surfaces is covered by Mortenson (1985).

The data fitting problem is presented here. Given a set of data points x_1, x_2, ..., x_N, where each data point is a K-dimensional vector (x_i = [x_{i1} x_{i2} ... x_{iK}]^T), and a function f(u_{i1}, u_{i2}, ..., u_{iM}, P_k) : R^{M+Ncp} → R, where u_{ij} ∈ R, ∀j ∈ [1, ..., M], are the M parameters representing the i-th point on the M-dimensional hyposurface and P_k is the k-th column of an Ncp × K matrix of control points that models the hyposurface, the problem of hyposurface fitting is to find the appropriate values of P and the parameters u_i ∈ R^M corresponding to each data point i such that f(u_i, P_j) = x_{ij}, ∀i ∈ [1, ..., N] and ∀j ∈ [1, ..., K]. The problem of hyposurface fitting inherently leads to a least-squares problem, which is mathematically stated as follows:

\[
\text{Minimize} \quad \sum_{i=1}^{N} \left( f(\mathbf{u}_i, \mathbf{P}_j) - x_{ij} \right)^2, \quad \forall j \in [1, \ldots, K]. \qquad (6)
\]

The problem, stated in other words, is the minimization of the distance between the input points X and the hyposurface modeled by the function f, where the parameters assigned to each data point (u_i) and the control points (P) that model the hyposurface are the unknowns. The complete problem of parameterizing the Pareto-optimal set can be stated as follows:


PROBLEM 1. Given the non-dominated set of points of an M -objective MOOP in N variables, returnedby some MOOA, find the control points and the parameterization which model the Pareto-optimal set of thegiven MOOP.

We propose an algorithm to solve the above problem. The proposed algorithm can be divided into two major parts. The first part is a data pre-processing step, where the given input data (X) undergoes an error reduction step and a clustering step. The second part of the algorithm deals with the hyposurface fitting. Figure 1 shows the major steps of the algorithm pictorially. In the following sections, each step of the algorithm is described.

Figure 1: Main steps of the algorithm: MOOP data → KKT-error reduction → clustering → hyposurface fitting → control points.

3 KKT-Error Reduction

As statistical methods for approximating the Pareto-optimal set are used in this study, it is important that the input data has a low error: the lower the error, the better the modeled Pareto-optimal set. Non-dominated points returned by certain MOOAs, like the EMO algorithms, may not necessarily be Pareto-optimal. In NSGA-II (Deb et al., 2002), a popular EMO algorithm, the population quickly gets close to the Pareto-optimal front and all the members of the population become non-dominated after a few generations. These non-dominated members may include some Pareto-optimal as well as some sub-optimal solutions. The chances of creating better solutions reduce because of the proximity of the current solutions to the Pareto-optimal front. Also, because of diversity-preserving measures like the crowding distance, a Pareto-optimal solution that has been obtained may be lost. This can lead to slow convergence to the true Pareto-optimal front. This problem is called Pareto drift and has been studied in (Goel et al., 2007). One possible solution suggested by them is the use of an archive to store all the non-dominated points; however, the convergence to the true Pareto-optimal front remains slow. A local search on the sub-optimal points, taking them closer to the Pareto-optimal front, can instead be used to reduce the error in the solutions found by the MOOA.

The KKT-error (ε) is defined here to estimate the error in the input data. Using the KKT conditions defined in Equation (5), we define the KKT-error for an input point X_k as:

\[
\epsilon = \left\| \sum_{m=1}^{M} \lambda_m \frac{\nabla f_m(X_k)}{\| \nabla f_m(X_k) \|} - \sum_{j=1}^{J} \mu_j \frac{\nabla g_j(X_k)}{\| \nabla g_j(X_k) \|} - \sum_{t=1}^{K} \nu_t \frac{\nabla h_t(X_k)}{\| \nabla h_t(X_k) \|} \right\|^2 \qquad (7)
\]

For brevity, we drop the equality constraints hereafter. At a point satisfying the KKT conditions, the vectors λ∇f and µ∇g are in equilibrium and are linearly dependent. The KKT-error term (ε) is a measure of the degree of imbalance in these vectors at every input point. The degree of imbalance does not depend on the size of these vectors; therefore, the unit vectors ∇f_m(X_k)/‖∇f_m(X_k)‖ and ∇g_j(X_k)/‖∇g_j(X_k)‖ can be used. This has the advantage of making ε independent of the scale of the objectives, hence permitting the use of a common threshold error value (ε_thresh) for all problems.

At a point on the Pareto-optimal front, ε = 0. To calculate the error, the problem is to find the vectors λ and µ such that ε is minimized. The bounds on the variables are converted into inequality constraints. The active constraints are found and their gradients are calculated. For the weights, λ ≠ 0 and each λ_i ≥ 0; normalization is done by enforcing Σ_{i=1}^{M} λ_i = 1. The problem is mathematically defined as:


\[
\begin{aligned}
\text{Minimize} \quad & \epsilon, \qquad (8)\\
\text{such that} \quad & \sum_{i=1}^{M} \lambda_i = 1,\\
& \lambda_i \geq 0, \quad \forall i \in [1, \ldots, M],\\
& \mu_j \geq 0, \quad \forall j \in [1, \ldots, J].
\end{aligned}
\]

The vector inside the norm in Equation (7) is linear in λ and µ, so ε is a quadratic function of the multipliers, and quadratic programming (QP) is used to solve the above optimization problem. The minimum value of ε for each input point, as returned by the QP, is stored and compared with the threshold value (ε_thresh). Points for which ε > ε_thresh are tagged as erroneous points, and a local search is conducted for them to reduce the KKT-error.
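A minimal Python sketch of this computation is given below; SciPy's SLSQP routine stands in for the QP solver used in the paper, only the active inequality constraints are passed in (as described above), equality constraints are dropped, and all identifiers are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def kkt_error(grad_f, grad_g):
    """Sketch of the KKT-error (Eq. 7) at one input point.

    grad_f: (M, N) objective gradients; grad_g: (J, N) gradients of the
    active inequality constraints (J may be 0). Returns minimized epsilon.
    """
    F = np.asarray(grad_f, float)
    F = F / np.linalg.norm(F, axis=1, keepdims=True)   # unit gradient vectors
    G = np.asarray(grad_g, float).reshape(-1, F.shape[1])
    if len(G):
        G = G / np.linalg.norm(G, axis=1, keepdims=True)
    M, J = len(F), len(G)

    def eps(z):                                  # z = [lambda; mu]
        v = z[:M] @ F - (z[M:] @ G if J else 0.0)
        return float(v @ v)                      # squared norm of imbalance

    z0 = np.concatenate([np.full(M, 1.0 / M), np.zeros(J)])
    res = minimize(eps, z0, method="SLSQP",
                   bounds=[(0.0, None)] * (M + J),
                   constraints=[{"type": "eq",
                                 "fun": lambda z: z[:M].sum() - 1.0}])
    return res.fun
```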

A local search procedure based on minimizing the achievement scalarizing function (ASF) (Wierzbicki, 1979) is conducted here. The ASF procedure has been successfully used by Sindhya et al. (2009) to conduct a local search. The ASF uses a reference point z, which is the objective function vector at the erroneous point, and finds the solution on the Pareto-optimal front close to the reference point. The single-objective problem solved to get a point on the Pareto-optimal front is defined as:

\[
\begin{aligned}
\text{Minimize} \quad & \max_{i=1}^{M} \left[ w_i \left( f_i(\mathbf{x}) - z_i \right) \right], \qquad (9)\\
\text{such that} \quad & \mathbf{x} \in S,
\end{aligned}
\]

where w is a vector of weights used to scalarize the function and S is the feasible search space. The reference point z helps us to focus on a particular part of the Pareto-optimal front, whereas the weight vector provides a finer trade-off between the objectives, leading to convergence to a particular point on the Pareto-optimal front (Deb et al., 2006).

Figure 2: KKT-error reduction leads to a better organization of points: (a) before KKT-error reduction; (b) after KKT-error reduction.

The λ vector found during the KKT-error calculation is used to compute the weights for the local search procedure: w_i = λ_i / (upbd_i − lwbd_i) is used as the weight for each objective function, where upbd_i and lwbd_i are the upper and lower bounds of the i-th objective function. The 'max' function in the problem definition makes the problem non-smooth. The problem is converted into an equivalent smooth problem (Miettinen, 1999) and is solved here using the sequential quadratic programming (SQP) routine of MATLAB. For the smooth variant of the problem, refer to Miettinen (1999); Sindhya et al. (2009).
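A sketch of the smooth reformulation, in which the 'max' is replaced by an auxiliary variable α bounded below by every weighted deviation, might look as follows. Python with SciPy's SLSQP stands in for MATLAB's routine, the MOOP's own constraints defining S are omitted for brevity, and all names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def asf_local_search(fs, x0, z, w, bounds):
    """Smooth form of Eq. (9): minimize alpha subject to
    w_i (f_i(x) - z_i) <= alpha for all objectives i.

    fs: list of objective callables; z: reference point (objective vector
    at the erroneous point); w: weights; bounds: variable bounds.
    """
    M = len(fs)

    def obj(v):                        # v = [x; alpha]; minimize alpha
        return v[-1]

    def cons(v):                       # alpha - w_i (f_i(x) - z_i) >= 0
        x, alpha = v[:-1], v[-1]
        return np.array([alpha - w[i] * (fs[i](x) - z[i]) for i in range(M)])

    v0 = np.append(x0, max(w[i] * (fs[i](x0) - z[i]) for i in range(M)))
    res = minimize(obj, v0, method="SLSQP",
                   bounds=list(bounds) + [(None, None)],
                   constraints=[{"type": "ineq", "fun": cons}])
    return res.x[:-1]                  # locally improved point
```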


Figure 2 shows the KKT-error reduction for the 'POL' problem. Figure 2a shows the points before the KKT-error reduction. In Figure 2a, the points on the left appear to have a lower error, as a curve could plausibly be fitted through them, but the points on the right are highly unordered and no curve can fit through them with a low fitting error. After the KKT-error reduction by the ASF procedure, the points have arranged themselves in a better order in both clusters. A regularity in the Pareto-optimal points is expected, and the orderly arrangement of points after the KKT-error reduction indicates the same, thereby confirming the usefulness of our local search approach.

4 Clustering

The Pareto-optimal front and the Pareto-optimal set of a MOOP may be disconnected in certain problems. To take care of this, a clustering procedure is suggested to locate the gaps that may exist in a Pareto-optimal set. The proposed clustering process divides the corrected input data into K clusters, where K is the number of disconnected Pareto-optimal regions, which may not be known a priori. The automated procedure should produce exactly as many clusters as there are disconnected Pareto-optimal regions. For example, for connected data the procedure should not divide the data into more than one cluster; for a problem with two disconnected Pareto-optimal regions, exactly two clusters should be returned.

As the number of clusters that exist in the Pareto-optimal data is unknown, the clustering process is carried out in three steps: (i) approximate estimation of the number of clusters, (ii) K-means clustering based on the estimate, and (iii) merging of clusters which are close to each other. Figure 3 pictorially shows this three-step process.

Figure 3: Pictorial representation of the three-step clustering process: corrected input data → estimate number of clusters → K-means clustering → merging clusters → clustered data.

First, the subtractive clustering method (Chiu, 1994) is used to make an initial estimate of the number of clusters and the cluster centers. Subtractive clustering gives only an approximate estimate of the number of clusters in the data set and does not associate a cluster with every data point. Therefore, the number of clusters and the cluster centers returned by subtractive clustering are fed to the next step as the estimate of the number of clusters (K) and the initial center estimates, respectively. Thereafter, in the second step, each point is associated with a particular cluster. The use of such a hybridized clustering procedure to obtain improved results has been advocated by several researchers (Ratrout, 2011; Hammouda and Karray, 2006). The above clustering procedure may still produce more than the expected number of clusters. Therefore, in the third step, a cluster correction procedure is carried out, in which clusters with a small minimum distance between them are combined to form a larger cluster; a sketch of steps (ii) and (iii) is given below. The threshold for the minimum distance is decided on the basis of the bounds on the data. One such situation is shown in Figure 4a. Here, the initial clustering step gives eight clusters, while it is clear from the plot of the data that there exists only one big cluster. The second step associates each point with a different cluster, as shown in the figure. The cluster correction (third) step joins all the clusters together to indicate that there is a single large cluster associated with the supplied points (Figure 4b). On the 'POL' problem, Figure 5 shows the outcome of the proposed three-step clustering approach; two clusters are correctly identified.
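Steps (ii) and (iii) might be sketched as follows. This is a Python illustration only: scikit-learn's KMeans stands in for the K-means step, the subtractive-clustering center estimates from step (i) are assumed to be given, and the merge threshold expressed as a fraction of the bounding-box diagonal is an assumption of this sketch:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def cluster_and_merge(X, centers0, diag_fraction=0.05):
    """K-means seeded with subtractive-clustering estimates, followed by
    merging of clusters whose minimum inter-point distance is small.

    X: (n, d) corrected input data; centers0: (K, d) initial centers.
    """
    labels = KMeans(n_clusters=len(centers0), init=centers0,
                    n_init=1).fit_predict(X)
    # Threshold tied to the body diagonal of the enclosing hypercube.
    thresh = diag_fraction * np.linalg.norm(X.max(0) - X.min(0))

    merged = True
    while merged:                       # repeat until no pair merges
        merged = False
        ids = np.unique(labels)
        for a in ids:
            for b in ids[ids > a]:
                if cdist(X[labels == a], X[labels == b]).min() < thresh:
                    labels[labels == b] = a
                    merged = True
                    break
            if merged:
                break
    return labels
```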

Figure 4: Erroneous and corrected clustering on a hypothetical problem: (a) erroneous result of initial clustering (eight clusters found); (b) corrected result after merging clusters.

Figure 5: Two clusters correctly returned by the proposed three-step clustering process for the 'POL' problem.

5 Hyposurface Fitting

In Section 2, the task of hyposurface fitting was discussed and a method was proposed based on solving the optimization problem given in Equation (6). For uniform open cubic B-spline data fitting, the objective f(u_i, P_j) can be defined using a convenient matrix notation (Mortenson, 1985). For example, a one-dimensional hyposurface can be written as follows:

\[
f(u_i, \mathbf{P}_j) =
\begin{bmatrix} u_{i1}^3 & u_{i1}^2 & u_{i1} & 1 \end{bmatrix}
\frac{1}{6}
\begin{bmatrix}
-1 & 3 & -3 & 1 \\
 3 & -6 & 3 & 0 \\
-3 & 0 & 3 & 0 \\
 1 & 4 & 1 & 0
\end{bmatrix}
\begin{bmatrix} p_{i-1,j} \\ p_{i,j} \\ p_{i+1,j} \\ p_{i+2,j} \end{bmatrix}, \qquad (10)
\]

where i ∈ [1, ..., N] and j ∈ [1, ..., K], N is the number of data members of the node and K is the dimension of the data. A similar matrix notation can also be written for two- or higher-dimensional hyposurfaces.
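A direct transcription of Equation (10) for evaluating one cubic segment is sketched below (Python; names are illustrative):

```python
import numpy as np

# Uniform cubic B-spline basis matrix from Eq. (10).
B = np.array([[-1.0,  3.0, -3.0, 1.0],
              [ 3.0, -6.0,  3.0, 0.0],
              [-3.0,  0.0,  3.0, 0.0],
              [ 1.0,  4.0,  1.0, 0.0]]) / 6.0

def bspline_point(u, P4):
    """Point on one cubic segment at local parameter u in [0, 1].

    P4: (4, K) array with control points [p_{i-1}; p_i; p_{i+1}; p_{i+2}].
    """
    U = np.array([u ** 3, u ** 2, u, 1.0])
    return U @ B @ P4

# Collinear, equally spaced control points reproduce the straight line:
P = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
print(bspline_point(0.5, P))   # -> [1.5, 1.5]
```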

Given the data, u and P are the unknowns of the minimization problem (Equation (6)). The problem is linear in the control points P and non-linear in u. An iterative procedure is used to find the unknowns that minimize the error value. The iterative procedure consists of the following two steps (a sketch is given after Figure 6 is discussed below):

1. Finding the control points: For fixed values of the parameters, find the optimal control points that minimize the error.


2. Updating the parameters: Keeping the control points found in the previous step fixed, find the parameter values that minimize the error.

Both steps are repeated until either the error is reduced below the fixed threshold or a pre-specified maximum number of iterations is reached. If the error remains large, the data is subdivided into several nodes using a k-way tree approach. A full k-way tree is a rooted tree in which each node other than the leaf nodes has exactly k children. Hyposurface fitting is done during the process of tree creation, using the above process, with an acceptable data fit leading to leaf-node creation. The procedure for tree building goes as follows. Initially, all the data is in the root node of the tree. Hyposurface fitting takes place for the data in the node. If the fit is not acceptable, the node is further divided into 2^M children nodes. For every child node the above process is repeated recursively. At the end of the recursion, the hyposurface models stored in the leaf nodes are joined together to give a continuous or piecewise continuous hyposurface. See Figure 6a for an example of such a tree. Here, the initial fit for the data in the root node had an error greater than the threshold, so the data was divided into two nodes, child 1 and child 2. The fit for the data in node child 1 was acceptable, so no further subdivision takes place and child 1 becomes a leaf node. For node child 2, the fit was still not good, and the data was subdivided into two further children nodes, child 3 and child 4. The fits for these two nodes are satisfactory and they become leaf nodes. Figure 6b shows a flowchart of the algorithm.
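The two-step loop can be sketched as follows for a one-dimensional hyposurface. This Python sketch uses SciPy's BSpline with an open-uniform knot vector and a bounded scalar minimizer in place of the paper's MATLAB SQP subproblems, and the linspace initialization stands in for the PCA-based parameterization of Section 5.2:

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize_scalar

def fit_curve(X, n_ctrl=10, iters=20, tol=1e-4):
    """Alternating sketch of the two-step fitting loop for a 1-d hyposurface.

    X: (n, K) data in one node. Returns control points P and parameters u.
    """
    k = 3                                             # cubic B-spline
    t = np.concatenate([np.zeros(k), np.linspace(0.0, 1.0, n_ctrl - k + 1),
                        np.ones(k)])                  # open-uniform knots
    u = np.linspace(0.0, 1.0, len(X))                 # initial parameters

    def basis_matrix(u):
        # Column i holds basis function B_i evaluated at all parameters.
        I = np.eye(n_ctrl)
        return np.stack([BSpline(t, I[i], k)(u) for i in range(n_ctrl)],
                        axis=1)

    for _ in range(iters):
        # Step 1: control points by linear least squares, u fixed.
        P, *_ = np.linalg.lstsq(basis_matrix(u), X, rcond=None)
        spl = BSpline(t, P, k)
        # Step 2: per-point parameter update, control points fixed.
        u = np.array([minimize_scalar(lambda s: np.sum((spl(s) - x) ** 2),
                                      bounds=(0.0, 1.0), method="bounded").x
                      for x in X])
        if np.mean(np.sum((spl(u) - X) ** 2, axis=1)) < tol:
            break                                     # acceptable fit
    return P, u
```

If the loop exits with a large error, the node's data would be subdivided and the same routine applied recursively, as described above.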

For the data in each node, several steps are carried out for hyposurface fitting. Figure 7 shows a flowchart of these steps. A brief description of these steps follows.

5.1 Finding Outliers

The purpose of this subroutine is to find those data members in the node which are dramatically different from the rest of the data members in the same node. An archive of outliers is maintained, and its members are added to the members of the current node. After this, the centroid of the data is found. Data members which lie at more than three times the radius distance from the centroid are deleted from the current node and are stored as the new, updated archive; a sketch is given below. Data members which are outliers for one node may be valid members of other nodes, so this process ensures that no information is lost.
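A Python sketch of this step follows. The exact definition of the 'radius' is not spelled out above, so the mean distance to the centroid is assumed here:

```python
import numpy as np

def split_outliers(X, factor=3.0):
    """Flag points far from the node centroid (Section 5.1).

    Returns (kept, outliers); the outliers would go to the archive.
    """
    c = X.mean(axis=0)                     # centroid of the node's data
    d = np.linalg.norm(X - c, axis=1)      # distance of each point to it
    radius = d.mean()                      # assumed 'radius' of the data
    mask = d <= factor * radius            # keep points within 3x radius
    return X[mask], X[~mask]
```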

5.2 Principal Component Analysis

The next step is to perform a Principal Component Analysis (PCA) of the data members in the node. The aim of the PCA is to find an initial parameterization of the data. PCA finds a new basis system in which the covariance between the variables is minimized, i.e., the covariance matrix becomes diagonal in the new basis system. It can be shown mathematically that the principal components of the data matrix x are the eigenvectors of the matrix xx^T. The principal directions are sorted in descending order of the corresponding eigenvalues, and the data is projected along each of the first M principal directions. Scaling the projected data to [0, 1] provides a good estimate of the initial parameterization.

It is important that all the neighboring nodes have a similar orientation of the directions returned by the PCA, as this helps in maintaining a proper continuity in the parameterization, necessary for joining the modeled segments with C2 continuity at the end. To ensure this, the sign of the dot product of the eigenvectors of the parent node and the child node is checked. If the sign is negative, the direction for the child node is flipped. This process ensures that the parent and the children are similarly oriented.

PCA is also used to decide the number of control points (m) in each direction for the B-spline hyposurface. The total number of control points is fixed as some percentage of the number of data members in each node. These control points are divided between each of the M orthogonal directions in proportion to the eigenvalue associated with the corresponding eigenvector. A sketch of this initialization follows.
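The PCA-based initialization might be sketched as follows, using the SVD of the centred data, which yields the same directions as the eigenvectors of the covariance matrix (Python; names are illustrative):

```python
import numpy as np

def pca_parameterization(X, M):
    """Sketch of the PCA initialization (Section 5.2).

    Returns the data projected on the first M principal directions,
    scaled to [0, 1], together with the directions and eigenvalues.
    """
    Xc = X - X.mean(axis=0)                    # centre the node's data
    # Right singular vectors of Xc = eigenvectors of the covariance matrix,
    # already sorted by descending singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    dirs, eigvals = Vt[:M], (s ** 2)[:M]
    U = Xc @ dirs.T                            # projections on M directions
    span = U.max(axis=0) - U.min(axis=0)
    U = (U - U.min(axis=0)) / np.where(span > 0, span, 1.0)  # scale to [0, 1]
    return U, dirs, eigvals
```

The returned eigenvalues could then be used to apportion the node's control points among the M directions, as described above.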

5.3 Iterative hyposurface fitting

The steps of finding the control points and updating the parameters are iteratively repeated to reduce the error, as described earlier. The problem of finding the control points for a fixed parameterization is a linear least-squares problem with linear constraints and is solved using the SQP routine of MATLAB. The linear constraints represent the bounds on the control points. These bounds are decided depending upon the range of the input data and ensure that the hyposurface is restricted in space. In the next step, the control points found in the previous step are fixed and the non-linear problem in the unknown parameters is solved to find the updated parameter values. The iterations of these two steps end when the error value becomes small or the maximum number of iterations is reached. If the error value is small, the control points, the corresponding parameterization, and the number of control points for the current node are stored; otherwise, the data members of the current node are subdivided into 2^M nodes.

Figure 6: (a) A 1-way tree created by recursive subdivision (root node; children 1 and 2; children 3 and 4 under child 2); (b) flowchart of the algorithm: input data in the root node, recursive fitting for each node until tree creation is complete, then joining of the hyposurfaces stored in the leaf nodes.

Figure 7: Flowchart of the hyposurface fitting procedure: find outliers → perform PCA → find number of control points → find initial parameterization → iterate (find control points, update parameterization) until the fit is acceptable or the maximum iterations are reached; if the fit is acceptable, store the control points for the model, otherwise divide the data into 2^M nodes.

5.4 Joining Hyposurface Segments

The k-way tree creation process results in discontinuous segments, modeled using B-spline basis functions and stored in the leaf nodes of the tree. The next step is to join these segments together to create a C2 continuous hyposurface. The two-step iterative hyposurface fitting method described above is used for modeling the whole hyposurface in one go. To model a hyposurface using this iterative procedure, a good initial parameterization and the total number of control points necessary for the modeling are required. A parameterization as well as the number of control points required for modeling every segment is stored in the leaf nodes of the k-way tree. The sum of these control points serves as the number of control points required to model the complete hyposurface. By scaling the parameterization stored in each node to the required range, based on the position of the node in the tree, an initial parameterization for the data is worked out. The iterative procedure for the error reduction is then used to find the control points which model the whole hyposurface. The iterations are carried out until the threshold error is satisfied. At the end of this process we get a C2 continuous hyposurface whose parametric equation is known to us.

Figure 8 shows the steps of the algorithm for the data points in the first cluster of the 'POL' problem. Figure 8a shows the data in the root node; no outliers were found in the data. Figure 8b shows the initial fit. As the error value for this fit is greater than the threshold error value, the data is divided into two nodes. Figure 8c shows the subdivided data in the children nodes. Figure 8d shows the data fitting for child node 1. This fit is only slightly better than the previous one and does not satisfy the threshold error, so the node is further divided into children nodes. As a result, after several subdivisions, a binary tree is created, with the level of a leaf node indicating the intricacy of the features captured by it. Subsequent panels show the progress as the data is subdivided until an acceptable fit is found in Figure 8f. Figure 8g shows the discontinuous segments after the tree creation is complete. Finally, the segments are joined together to give a C2 continuous curve, as shown in Figure 8h.

6 Results and Discussion

To show the efficacy of the model, the algorithm is tested on several bi-objective and tri-objective problems. A problem set that covers unconstrained and constrained standard optimization problems as well as two engineering design problems is considered. The following standard MOOPs are used: FON, ZDT2, ZDT3, POL, DTLZ2, DTLZ7 and OSY (Deb, 2001). Two engineering design problems, viz. a two-bar truss design problem and a cutting parameter selection problem for machining, are considered. For all of these problems both the Pareto-optimal front and the Pareto-optimal set are (M − 1)-dimensional hyposurfaces.

The input points are generated by NSGA-II, run with a population size of 400 for 400 generations; the results are passed to our procedure as input data. The threshold value of the KKT-error is fixed at 10^−4. If the KKT-error is greater than this threshold, a local search is conducted using the SQP routine of MATLAB. The maximum number of function evaluations for SQP is fixed at 1,000. For the subtractive clustering process, the radius of influence is fixed at 0.10. For merging clusters, the threshold distance between two clusters depends on the length of the body diagonal of the hypercube enclosing the data. The number of control points that model the hyposurface is fixed as one-tenth of the number of data members stored in the node. The threshold error value for dividing the data recursively into 2^M children nodes is taken to be 10^−4. The results for the problems mentioned above are presented in the following subsections.

6.1 FON problem

FON is a two-objective problem with n variables. We show the results for n = 3, for better visualization of the solutions in the objective and variable spaces. Figure 9 shows the results for the FON problem. Figure 9a shows the input data set, which is updated to Figure 9b by the KKT-error reduction technique.

Figure 8: The fitting algorithm illustrated on the first cluster of the 'POL' problem: (a) root node data; (b) initial fit; (c) child node data; (d) fitting for child node data; (e) further subdivision and fitting; (f) acceptable fitting for a leaf node; (g) segments fitting data in leaf nodes; (h) segments joined together.

Figure 9: Results for the three-variable 'FON' problem: (a) input data set; (b) after KKT-error reduction; (c) Pareto-optimal set; (d) Pareto-optimal front with parameter values.

The KKT-error reduction routine reduces the error in the data, and the data is then better organized for curve fitting. Figures 9c and 9d show the Pareto-optimal set and the Pareto-optimal front in the variable and the objective space, respectively. The Pareto-optimal front is obtained by mapping the fitted curve to the objective space. The parameter values of points in the Pareto-optimal set are shown at their respective images in the objective space in Figure 9d. This makes moving from the objective space to the variable space easy: one can readily find the variable values corresponding to any point on the Pareto-optimal front. Also, the parameter values allow the user to choose any point on the front.

6.2 ZDT2

ZDT2 has a non-convex Pareto-optimal front. First the results of a three-variable ZDT2 problem are shown, followed by those for a 30-variable problem. Figure 10a shows the initial input and the corrected data for the three-variable ZDT2 problem; the red points show the corrected data. We can see that the data after error correction is much more organized and lies on a straight line. Figure 10b shows the curve fitting for the Pareto-optimal set. All variables except x1 are close to zero, while x1 takes all values from zero to one. These values are the same as those shown by Deb (2001). Figure 10c shows the image of the Pareto-optimal set in the objective space. Figure 10d shows the Pareto-optimal front for a 30-variable ZDT2 problem. The Pareto-optimal set for the 30-variable ZDT2 problem cannot be shown on a plot due to the high dimension of the variable space. However, we can see that the Pareto-optimal front of the 30-variable ZDT2 problem is similar to its three-variable counterpart.

Figure 10: Results for the 'ZDT2' problem: (a) input data set; (b) Pareto-optimal set; (c) Pareto-optimal front with parameter values; (d) Pareto-optimal front for the 30-variable problem.

6.3 ZDT3

ZDT3 has a number of disconnected Pareto-optimal fronts. First the results for a three-variable ZDT3 problem are shown, followed by those for a 30-variable problem. Figure 11a shows the initial input and the corrected data for the ZDT3 problem. Several points with high error values have been corrected, and the corrected data is better organized for all five disconnected Pareto-optimal sets. The clustering routine identified five clusters in the corrected data. Figure 11b shows the curve fitting for the Pareto-optimal set. Curve fitting is carried out independently for each of the clusters; as a result, we get five different parameterizations. Figure 11c shows the image of the Pareto-optimal set in the objective space. From Figure 11b it can be inferred that for all the points in the Pareto-optimal set, all variables except x1 are close to zero. x1 does not take all values in [0, 1]; it consists of five disconnected parts, which leads to a disconnected Pareto-optimal front. We cannot show the Pareto-optimal set for a 30-variable ZDT3 problem. The Pareto-optimal front with parameter values for the 30-variable problem is shown in Figure 11d.

Figure 11: Results for the 'ZDT3' problem: (a) input data set; (b) Pareto-optimal set; (c) Pareto-optimal front for the three-variable problem; (d) Pareto-optimal front for the 30-variable problem.

6.4 POL

The 'POL' problem has a non-convex and disconnected Pareto-optimal front. Results for this two-variable problem are shown in Figure 12. The KKT-error reduction routine reduces the error in the input points and provides a better structure to the input data. The clustering procedure detects two clusters, as both the Pareto-optimal set and the Pareto-optimal front are disconnected. The Pareto-optimal set has two parts, A and B; their mapping in the objective space, along with the parameter values, is shown in Figure 12d.

The variation of variables x1 and x2 with parameter u is shown in Figure 13. For part A of the front, the value of x1 increases initially and then becomes constant at one, whereas x2 rises continuously from 1.57 to 2. For part B of the front, x1 remains constant over most of the front at a value of −3.14, indicating that this part of the front occurs at the lower bound of x1; x2 rises from −1 to 0.71. We observe that for both parts of the front, x2 is greater than x1. The interesting properties of the solutions revealed by the above analysis are enumerated:

1. Part B of the front occurs at the lower bound of variable x1.

2. x2 is greater than x1 over the whole front.

3. x2 always increases as we move along the front.

Figure 12: Results for the two-variable 'POL' problem: (a) input data set; (b) after KKT-error reduction; (c) Pareto-optimal set; (d) Pareto-optimal front with parameter values.

Figure 13: Variation of variables with parameter u for the 'POL' problem: (a) part A; (b) part B.

6.5 DTLZ2

A three-objective DTLZ2 problem has a spherical Pareto-optimal front. The Pareto-optimal set is a plane with x3 = 0.5 and the Pareto-optimal front is part of the sphere f1^2 + f2^2 + f3^2 = 1. Results for the problem are shown in Figure 14. The KKT-error reduction routine reduces the error from 0.0011 to 8 × 10^−10. Figures 14a and 14b show the Pareto-optimal set and the Pareto-optimal front for the problem, respectively.

Figure 14: Results for the three-variable 'DTLZ2' problem: (a) Pareto-optimal set; (b) Pareto-optimal front.

6.6 DTLZ7

DTLZ7 has a disconnected Pareto-optimal front and Pareto-optimal set; there are 2^{M−1} disconnected Pareto regions in this problem.

Figure 15 shows the results for a three-objective, three-variable DTLZ7 problem. There are four disconnected Pareto regions here, which have been identified by the clustering procedure. The average KKT-error for the input points reduced from 8.1 × 10^−3 to 1.7 × 10^−6 after the error reduction routine. The fitted Pareto-optimal set and the Pareto-optimal front, along with the input points, are shown in Figures 15a and 15b, respectively.


It can be noticed that some of the points in the Pareto-optimal set do not lie exactly on the front. Because of this, the surface fitting is not exact and the generated surface is not planar.

6.7 OSY

OSY is a six-variable problem. The Pareto-optimal front consists of five parts which are continuously concatenated. However, in the variable space these five parts are disconnected.

Figure 16a shows the Pareto-optimal front for the OSY problem; 'AB', 'BC', 'CD', 'DE' and 'EF' mark the five parts of the front. The data is divided into five parts by the clustering procedure, indicating that the Pareto-optimal set is disconnected. For each of the five parts, the B-spline curve fitting process gives several cubic splines joined with C2 continuity. As the B-spline used is cubic, we get cubic equations as relationships; however, the coefficients of some of the terms may be negligible. These relationships, correct to two decimal places, are summarized in Table 1. The values of the variables obtained by our method are very close to those tabulated elsewhere (Deb, 2001). There are small differences, which we describe below.

Figure 15: Results for the three-variable 'DTLZ7' problem: (a) Pareto-optimal set (error-corrected data); (b) Pareto-optimal front.

Table 1: Relationships among variables inferred from the plots for problem OSY.

Front part          x1               x2               x3               x4   x5   x6
AB                  5                1                (1.41,...,5)     0    5    0
BC                  5                1                (1.66,...,4.88)  0    1    0
CD                  (4.06,...,5)     (0.69,...,1)     1                0    1    0
DE                  0                2                (1.63,...,3.69)  0    1    0
EF (u ∈ [0, 0.1])   (0,...,0.01)     (2.00,...,1.99)  (1.24,...,1.14)  0    1    0
EF (u ∈ (0.1, 1])   (0.01,...,0.97)  (1.99,...,1.03)  (1.14,...,1)     0    1    0

To show the variation of the variables with parameter u, graphs of variable values against parameter values are plotted for each of the five parts of the Pareto-optimal front; Figure 16 shows these plots. For AB, as we move from parameter value zero to one, x1, x2, x4, x5 and x6 remain constant at five, one, zero, five and zero, respectively. Only x3 changes along this part of the Pareto-optimal front; its value rises from 1.41 to 5. For BC also, x1, x2, x4, x5 and x6 remain constant, at five, one, zero, one and zero, respectively; here too only x3 changes, from 4.88 to 1.66. For CD, x3 and x5 are constant at one, while x4 and x6 take a value of zero. The rise in the values of x1 and x2 is linear. A closer inspection of the expressions of x1(u) and x2(u) reveals that x2(u) = (x1(u) − 2)/3. Such an inspection was carried out because the relationship was known to us from a previous study (Deb, 2001). For DE, we notice that x1, x2, x4, x5 and x6 are constant at zero, two, zero, one and zero, respectively.

19

Page 20: ModelingPareto-OptimalSetUsingB-SplineBasisFunctions · 2012. 5. 28. · run. Evolutionary Multi-objective optimization (EMO) algorithms attempt to find multiple Pareto-optimal solutions

[Figure 16, panel (a): the Pareto-optimal front for the OSY problem in the f1-f2 plane, with parts AB, BC, CD, DE and EF annotated with parameter values; panels (b)-(f): variation of x1 through x6 with parameter u for parts AB, BC, CD, DE and EF, respectively.]

Figure 16: Variation of six variables with parameter for ‘OSY’ problem.


The value of x3 varies between 1.63 and 3.69. For the part EF of the front, from parameter values zero to nearly 0.1, the properties of the variables are similar to those for DE. It seems that the end of DE has been captured as part of EF. For the initial part of EF, x1 varies from 0 to 0.01, x2 varies from 2 to 1.99, x3 varies from 1.24 to 1.14, and x4, x5 and x6 are constant at zero, one and zero, respectively. For the rest of EF, x1 rises with u while x2 declines with u. It appears that the plots of x1 and x2 are mirror images of each other about the line x = 1. This makes x2 = 2 − x1, which was also found in (Deb, 2001). Variable x3 is nearly constant, and variable x5 is constant at a value of one, while x4 and x6 have a constant value of zero. We can infer some interesting properties of the optimal points by this procedure, which can be useful to the decision maker. Some inferences are enumerated:

1. x4 = x6 = 0 along the front.

2. x5 is constant at 5 for part AB and at 1 for the rest of the front.

3. For AB, BC and DE, the change in the objective function values is caused by the change in the value of x3 alone.

4. x3 is non-decreasing throughout the Pareto-optimal front.

5. For part EF, x2 = 2 − x1.

6.8 Two bar truss design problem

Figure 17: A two bar truss.

In the two-bar truss design problem, the goal is to minimize the volume of the structure while allowing the structure to carry a minimum load without elastic failure. Figure 17 shows the diagram of a two-bar truss. The maximum stress in the two members, AC and BC, has to be minimized. A load F = 10^5 N acts at the junction C. The vertical distance between B and C is the variable y, whereas the horizontal distance between the hinges is 5 m. x1 and x2 are the cross-sectional areas of AC and BC, respectively, and are measured in m². The yield stress (Sy) is 10^8 Pa. The three-variable problem is defined as follows:

Minimize   f1(x, y) = x1 √(16 + y²) + x2 √(1 + y²),          (11)
Minimize   f2(x, y) = max(σAC, σBC),
Subject to max(σAC, σBC) ≤ 10^8,
           1 ≤ y ≤ 3,
           x ≥ 0,

where σAC = F √(16 + y²) / (5 y x1)  and  σBC = 4F √(1 + y²) / (5 y x2).

The objective f2 and the constraint on the maximum stress use a 'max' function, which makes them non-differentiable. This prevents an effective local search, and the data points may not have a good structure to fit a curve through them. To bypass this problem, an alternate formulation is considered. The constraint can be split into two different constraints, σAC ≤ 10^8 and σBC ≤ 10^8. These constraints demand that


the stress in both rods be less than the maximum permissible value of stress, which also implies that the maximum of the stresses in the two rods is less than the maximum limit, thus satisfying the original constraint.

To circumvent the problem of non-differentiability of f2, an alternate smooth formulation of the local search procedure can be considered. The following optimization problem is solved for local search, starting from a reference point z returned by NSGA-II:

Minimize   ε1 + ε2,                                          (12)
Subject to w1 (f1(x, y) − z1) ≤ ε1,
           w2 (ε2 − z2) ≤ ε1,
           σAC ≤ ε2,
           σBC ≤ ε2,
           1 ≤ y ≤ 3,
           x ≥ 0,
           ε1 ≤ 0.                                           (13)
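The reformulated local search is a smooth nonlinear program that can be handed to an off-the-shelf NLP solver. Below is a minimal sketch using SciPy's SLSQP; the solver choice, the reference point z, the weights w and the starting design are illustrative assumptions, not values prescribed here.

import numpy as np
from scipy.optimize import minimize

F = 1e5  # applied load in N

def f1(x1, x2, y):
    # Structure volume, as in Eq. (11).
    return x1 * np.sqrt(16 + y**2) + x2 * np.sqrt(1 + y**2)

def sigma_ac(x1, y):
    return F * np.sqrt(16 + y**2) / (5 * y * x1)

def sigma_bc(x2, y):
    return 4 * F * np.sqrt(1 + y**2) / (5 * y * x2)

# Decision vector v = (x1, x2, y, eps1, eps2).
z = [0.01, 5e7]   # illustrative reference point (z1, z2) from NSGA-II
w = [1.0, 1e-8]   # illustrative weights balancing the two scales

# SLSQP inequality constraints are of the form fun(v) >= 0.
cons = [
    {'type': 'ineq', 'fun': lambda v: v[3] - w[0] * (f1(v[0], v[1], v[2]) - z[0])},
    {'type': 'ineq', 'fun': lambda v: v[3] - w[1] * (v[4] - z[1])},
    {'type': 'ineq', 'fun': lambda v: v[4] - sigma_ac(v[0], v[2])},
    {'type': 'ineq', 'fun': lambda v: v[4] - sigma_bc(v[1], v[2])},
]
bounds = [(1e-6, None), (1e-6, None), (1, 3), (None, 0), (0, None)]
v0 = [0.002, 0.004, 2.0, 0.0, 1e8]  # start near the reference design

res = minimize(lambda v: v[3] + v[4], v0, method='SLSQP',
               bounds=bounds, constraints=cons)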

This formulation helps us remove the max function, which caused the objective function to be non-smooth. The difference between the results of the original formulation for local search and the new formulation is shown in Figure 18. The original formulation does not lead to any considerable correction in the data, while the new formulation leads to a substantial reduction in the error.

[Figure 18, panel (a): error reduction with the original local search formulation (input data versus data after KKT-error reduction); panel (b): error reduction with the new local search formulation (input data versus error-corrected data).]

Figure 18: Error reduction results for two bar truss problem.

In this problem, f1 lies in [0, 0.05] and f2 lies in [0, 10^8]. There is a vast difference in the scales of the two objectives. Scaling in the variable space is also considerably different. This can cause problems in the data-fitting step. To circumvent this problem, the data is normalized to the interval [0, 1]. The data is rescaled back to show the results. Figure 19 shows the results for the above problem. The input data set and the data set after error correction are shown in Figure 19a. Figure 19b shows the Pareto-optimal set for this problem. The mapped Pareto-optimal front is shown in Figure 19d. It is very close to the non-dominated points returned by the MOOA, shown in Figure 19c.
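The normalization itself is a simple per-dimension min-max mapping. A minimal sketch is given below (assuming each dimension spans a nonzero range); the forward mapping is applied before fitting, and the inverse rescaling before presenting results.

import numpy as np

def normalize(data):
    # Map each column of an (n_points, n_dims) array to [0, 1].
    lo, hi = data.min(axis=0), data.max(axis=0)
    return (data - lo) / (hi - lo), (lo, hi)

def rescale(norm_data, bounds):
    # Invert normalize() using the stored per-column bounds.
    lo, hi = bounds
    return norm_data * (hi - lo) + lo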

The variation of the variables with parameter values is shown in Figure 20. Notice that after a parameter value of around 0.89, there is a sudden change in the behavior of all the variables.


[Figure 19, panel (a): the input data set and the error-corrected data; panel (b): the Pareto-optimal set; panel (c): the non-dominated points returned by NSGA-II; panel (d): the Pareto-optimal front with parameter values.]

Figure 19: Results for 3-variable ‘truss’ problem.

Moving from parameter value 0 to 0.89, it can be noticed that the value of variable y does not change much; however, the values of x1 and x2 rise steadily. It can be noticed that the variation of x1 and x2 with parameter u is linear, and a relation between the slopes of x1 and x2 can be found using the parametric equation of the curve. The cross-sectional area of BC (x2) is always greater than the cross-sectional area of AC (x1) for all the optimal solutions. There is little change in the value of x2 for parameter values greater than 0.89: x2 reaches its maximum value of 0.01 and remains constant there. Thus, for variable x2, the bound becomes active for u > 0.89. This may be the reason for the sudden change in the behavior of the variables at u = 0.89. The value of x1 decreases slightly for u > 0.89. However, a drastic change is seen for the variable y in the same range. This means that beyond the parameter value of 0.89, the change in the optimal value of both functions is mostly caused by the change in the value of y.

For further analysis, the variation of the constraints g1 ≡ 10^8 − σAC and g2 ≡ 10^8 − σBC with parameter u is studied. Figure 21 shows the variation of g1 and g2 with parameter u. We observe that g1 and g2 have identical plots, implying that σAC = σBC throughout the Pareto-optimal front.

Several interesting inferences which can be drawn from the plots are enumerated below. These observations are similar to the observations made in (Bandaru and Deb, 2011), but here we find them using a different approach.


[Figure 20, panel (a): variation of x1 and x2 with parameter u; panel (b): variation of y with parameter u.]

Figure 20: Variation of three variables with parameter for ‘truss’ problem.


Figure 21: Variation of constraints g1 and g2 with parameter u for ‘truss’ problem.

The relationships that exist between the variables, as found in (Bandaru and Deb, 2011), are y = 2 and x2 = 2x1.

1. x2 is greater than x1 throughout the Pareto-optimal front.

2. The relationship x2 = 2x1 can be confirmed from Figure 20a.

3. Variable y remains constant at y = 2 for parameter u < 0.89 and rises rapidly for higher u values.

4. After the parameter value of around 0.89, x2 reaches its upper bound and remains constant there.

5. σAC = σBC throughout the front (a short consistency check follows this list).
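These inferences are mutually consistent, as a short check against the stress expressions of Eq. (11) shows (the algebra below is ours, not part of the fitted model). Substituting x2 = 2x1 into σAC = σBC gives

    F √(16 + y²) / (5 y x1) = 4F √(1 + y²) / (5 y · 2x1)  ⟹  16 + y² = 4 (1 + y²)  ⟹  y = 2,

so the constant y = 2 observed for u < 0.89 is exactly the condition under which both members carry equal stress when x2 = 2x1.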

6.9 Metal cutting problem

The metal cutting problem was introduced by Sardinas et al. (2006) and was discussed in Deb and Bandaru (2012). The problem is to minimize the production time (Tp) and the used tool life (ξ) for a machining process.


The variables are the cutting parameters: cutting speed (v), feed (f) and depth of cut (a). The problem has three constraints apart from the lower and upper bounds on all the variables. Bounds on the variables are supplied by the tool maker. The cutting force (Fc) should be less than a threshold (Fcmax) that depends on the strength and stability of the machine and the tool. The cutting power (P) should be less than the available machine motor power (PMOT). The surface roughness should be less than a specified maximum roughness (Rmax). The problem is defined as follows:

Minimize   Tp(v, f, a),                                      (14)
Minimize   ξ(v, f, a),
Subject to g1(v, f, a) ≡ Fcmax − Fc(v, f, a) ≥ 0,
           g2(v, f, a) ≡ η PMOT / 100 − P(v, f, a) ≥ 0,
           g3(v, f, a) ≡ Rmax − R(v, f, a) ≥ 0,
           vmin ≤ v ≤ vmax,  fmin ≤ f ≤ fmax,  amin ≤ a ≤ amax,

where Tp(v, f, a) = τs + (V / M(v, f, a)) (1 + τtc / T(v, f, a)) + τo,
      M(v, f, a) = 1000 v f a,
      T(v, f, a) = CT v^α f^β a^γ,
      ξ(v, f, a) = 100 V / (M(v, f, a) T(v, f, a)),
      Fc(v, f, a) = CF v^α′ f^β′ a^γ′,
      P(v, f, a) = v Fc(v, f, a) / (6 × 10^4),
      R(v, f, a) = 125 f² / re.

Parameters τs, τtc, τo and V are the setup time, the tool change time, the idle tool time and the volume of the material removed, respectively. These are usually constant for a cutting process. CT, α, β, γ, CF, α′, β′ and γ′ are experimentally calculated constants. T(v, f, a) and M(v, f, a) are the tool life and the material removal rate, respectively, and are functions of the cutting parameters. η is the transmission efficiency of the motor and re is the tool nose radius. Values of the constants for machining of a steel bar on a CNC machine using a P20 carbide tool, as described by Sardinas et al. (2006), are given in Table 2. Further details of the problem can be found in Sardinas et al. (2006).
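As a sanity check on the model, the two objectives of Eq. (14) can be evaluated directly from the constants of Table 2. The sketch below is illustrative (Python assumed); the sample cutting condition is arbitrary and need not satisfy the constraints.

import numpy as np

TAU_S, TAU_TC, TAU_O = 0.15, 0.20, 0.05        # times in min
V_REMOVED = 219_912                             # removed volume in mm^3
C_T, ALPHA, BETA, GAMMA = 5.48e9, -3.46, -0.696, -0.460

def objectives(v, f, a):
    # Return (Tp, xi) for cutting speed v [m/min], feed f [mm/rev] and
    # depth of cut a [mm], following Eq. (14).
    M = 1000 * v * f * a                        # material removal rate
    T = C_T * v**ALPHA * f**BETA * a**GAMMA     # Taylor-type tool life
    Tp = TAU_S + (V_REMOVED / M) * (1 + TAU_TC / T) + TAU_O
    xi = 100 * V_REMOVED / (M * T)              # used tool life, percent
    return Tp, xi

print(objectives(250.0, 0.55, 3.0))  # evaluate one candidate condition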

Figure 22 shows the results for the above problem. The points returned by NSGA-II, before and after error correction, are shown in Figures 22a and 22b, respectively. NSGA-II was run for 700 generations with a population size of 400. Not much difference between the initial and the corrected data is observable from the figures. However, an investigation of the average KKT-error of the initial and corrected points reveals a reduction from the initial value of 0.2629 to 0.0311 after the error reduction. As there is a large difference in the ranges in which the variables lie, scaling of the variables is done to achieve accurate results for hyposurface fitting. Figure 22c shows the continuous Pareto-set as modeled by the cubic B-splines. The results mapped to the objective space along with the parameter values are shown in Figure 22d. The use of cubic B-splines enables accurate modeling of the linear as well as the non-linear part of the Pareto-optimal set. The sharp junction between the linear and non-linear parts of the Pareto-optimal front is also captured by the B-spline.

The production time varies from 0.8545 min to 1.1258 min and the used tool life varies from 2.1131 to 9.4883 along the Pareto-optimal front. From parameter value u = 0 to u = 0.6, the production time (Tp) decreases from 1.1258 min to 0.9554 min. In the same interval, the percentage used tool life (ξ) changes from 2.1131 to 2.5659. In this interval, the change in production time relative to its total change along the front is large compared to the corresponding relative change in the used tool life. It implies that the solutions in this interval may not be of much interest to the user.


Table 2: Parameters for metal cutting problem.

Parameter        Value
τs               0.15 min
τtc              0.20 min
τo               0.05 min
V                219,912 mm³
CT               5.48 × 10⁹
α                −3.46
β                −0.696
γ                −0.460
CF               6.56 × 10³
α′               −0.286
β′               0.917
γ′               1.10
amin, amax       0.6, 6 mm
fmin, fmax       0.15, 0.55 mm/rev
vmin, vmax       250, 400 m/min
re               0.8 mm
Fcmax            5000 N
PMOT             10 kW
η                0.75

Figure 23a shows the variation in the value of the cutting speed (v) with parameter value. From parameter value u = 0 to u = 0.6, v remains constant at its lower bound value of 250 m/min. In this interval, the value of f rises steadily to 0.55 mm/rev, which is its upper bound. The value of a always decreases along the front. In the range u = 0 to u = 0.6, f and a show an inverse relationship, whereas from u = 0.6 to u = 1, v and a show an inverse relationship. Figure 23d shows the variation of the constraints g2 and g3 with parameter u. The constraint g1 is inactive everywhere along the front. From the figure, it is observed that the roughness constraint (g3) is also inactive everywhere along the front, while the maximum cutting power constraint (g2) is active along the whole front. The inferences drawn are summarized below:

1. From u = 0 to u = 0.6, v = 250 m/min.

2. From u = 0 to u = 0.6, f and a are inversely related.

3. From u = 0.6 to u = 1, f = 0.55 mm/rev.

4. From u = 0.6 to u = 1, v and a are inversely related.

5. a decreases from u = 0 to u = 1.

6. Constraint g2 is active everywhere along the front.

7 Limitations of the algorithm

As a statistical model is used, it cannot be said with certainty that all the points on the modeled hyposurface are Pareto-optimal points. This is an inherent problem of any data-fitting procedure. Getting a continuous picture from a discrete image means making some assumptions about the model. If the data is dense, we will have more confidence in the modeled hyposurface, as we do not expect sudden changes in the Pareto-optimal front. However, with sparse data we can get an erroneous Pareto-optimal front even though the least-squares distance error is sufficiently low. In our model, the clustering process ensures that all the features of the input data are captured.


[Figure 22, panel (a): the input data set; panel (b): the data set after KKT-error reduction; panel (c): the Pareto-optimal set; panel (d): the Pareto-optimal front with parameter values.]

Figure 22: Results for three-variable metal cutting problem.

The maximum distance between two points for them to be considered a continuous entity is fixed. If the distance between two points exceeds this threshold, the points are considered to belong to two different clusters, and a gap is thus detected.
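A minimal sketch of this gap-detection rule (with an illustrative threshold) is given below: consecutive points farther apart than the fixed maximum distance start a new cluster.

import numpy as np

def split_into_clusters(points, max_gap):
    # Split an ordered (n, d) array wherever consecutive points are
    # farther apart than max_gap; each piece becomes one cluster.
    gaps = np.linalg.norm(np.diff(points, axis=0), axis=1)
    breaks = np.where(gaps > max_gap)[0] + 1
    return np.split(points, breaks)

# Example: two well-separated segments are detected as two clusters.
pts = np.vstack([np.linspace([0, 0], [1, 0], 10),
                 np.linspace([3, 0], [4, 0], 10)])
print(len(split_into_clusters(pts, max_gap=0.5)))  # -> 2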

For constrained optimization problems, the solutions may lie on the constraint boundary. In such situations, the points modeled by the hyposurface may be infeasible. Often, for such infeasible points, the function values may be slightly better than those of the true Pareto-optimal points. Thus, the constraints at any point generated using this model must be checked.

The process of KKT-error reduction requires the objective functions to be differentiable so that a local search can be conducted. If the objective function is not smooth and there are sudden kinks in the objective function or constraints, then the process will have difficulties in reducing the error, leading to a higher, unacceptable fitting error. In such cases, a subdifferential-based KKT-error minimization procedure (Deb et al., 2007) can be adopted.

We consider problems where regularity conditions are met. So, for an M-objective problem, the algorithm assumes that the Pareto-optimal front and the Pareto-optimal set are (M − 1)-dimensional piecewise continuous entities. However, there is a class of problems for which this may not be so, and the Pareto-optimal set can be a manifold of dimension lower than (M − 1) (for example, the DTLZ6 problem (Deb et al., 2005)).


[Figure 23, panel (a): variation of cutting speed (v) with parameter value; panel (b): variation of feed (f); panel (c): variation of depth of cut (a); panel (d): variation of constraints g2 and g3.]

Figure 23: Variation of three variables with parameter for metal cutting problem.

Several such problems are considered by Zhou et al. (2009).

8 Conclusions

This paper has proposed an algorithm to model the Pareto-optimal set of a MOOP using the data returned by a MOOA. A continuous model of the Pareto-optimal surface has been obtained from the discrete data set returned by the MOOA. The algorithm has been designed to take advantage of the regularity and connectedness properties of the Pareto-optimal front to perform the modeling for a certain class of optimization problems. Cubic B-spline basis functions are used for the process of data fitting. Results have been shown for some standard test problems, indicating that the Pareto-optimal front obtained by the model is close to the true Pareto-optimal front.

Such a procedure helps to provide a continuous picture of the front, which gives a sense of completion in solving an optimization problem. As a by-product, using the parametric hyposurface, points not returned by


the MOOA can also be generated. As the fitting is done in the variable space, one can easily generate points on the Pareto-optimal front at one's own discretion and trace them back to the variable space using the parameter values. This can be helpful in cases where variable values play an important role in decision making. On this account, the data-fitting procedure proposed here can be implemented within an EMO algorithm and applied after every few generations so as to help generate new and near-optimal points quickly using the fitted hyposurface. In this paper, results for several standard and engineering-design optimization problems were discussed. In future studies, more engineering design problems can be analyzed. In principle, the algorithm can be extended to problems with more objectives as well. In higher dimensions, issues with clustering and merging of clusters may become important. The data-fitting procedure will also be used for designing a MOOA that uses data fitting as an intermediate step to discover relationships in the non-dominated frontier data and uses them for faster convergence.

References

K. Deb. Multi-Objective Optimization Using Evolutionary Algorithms. Wiley, 2001.

T. W. Athan and P. Y. Papalambros. A note on weighted criteria methods for compromise solutions in multi-objective optimization. Engineering Optimization, 27:155–176, 1996.

I. Das and J. E. Dennis. Normal-boundary intersection: A new method for generating the Pareto surface in nonlinear multicriteria optimization problems. SIAM Journal on Optimization, 8:631–657, 1998.

R. T. Marler and J. S. Arora. Survey of multi-objective optimization methods for engineering. Structural and Multidisciplinary Optimization, 26(6):369–395, April 2004.

A. Charnes, R. W. Clower, and K. O. Kortanek. Effective control through coherent decentralization with preemptive goals. Econometrica, 35:294–320, 1967.

A. Charnes and W. W. Cooper. Goal programming and multiple objective optimization; part 1. European Journal of Operational Research, 1:39–54, 1977.

A. Messac. Physical programming: effective optimization for computational design. AIAA Journal, 34:149–158, 1996.

R. E. Steuer, M. Hirschberger, and Y. Qi. Large-scale MV efficient frontier computation via a procedure of parametric quadratic programming. European Journal of Operational Research, 204:581–588, 2010.

C. Grosan. A new evolutionary technique for detecting Pareto continuous regions. In A. Barry, editor, 2003 Genetic and Evolutionary Computation Conference, Workshop Program, pages 304–307, Chicago, Illinois, USA, July 2003. AAAI.

D. Dumitrescu, C. Grosan, and M. Oltean. Genetic chromodynamics for obtaining continuous representation of Pareto regions. Studia Universitatis Babes-Bolyai, Informatica, XLVI(1):15–30, 2001.

D. A. V. Veldhuizen. Multiobjective Evolutionary Algorithms: Classification, Analyses and New Innovations. PhD thesis, Air Force Institute of Technology, Air University, 1999.

T. Goel, R. Vaidyanathan, R. T. Haftka, W. Shyy, N. V. Queipo, and K. Tucker. Response surface approximation of Pareto optimal front in multi-objective optimization. Computer Methods in Applied Mechanics and Engineering, 196:879–893, 2007.

T. Okabe, Y. Jin, B. Sendhoff, and M. Olhofer. Voronoi-based estimation of distribution algorithm for multi-objective optimization. In Proceedings of the Congress on Evolutionary Computation (CEC 2004), pages 1594–1601, Portland, Oregon, USA, 2004. IEEE Press.

P. A. N. Bosman and D. Thierens. The naive MIDEA: A baseline multi-objective EA. In Third International Conference on Evolutionary Multi-Criterion Optimization (EMO 2005), volume 3410 of LNCS, pages 428–442, Guanajuato, Mexico, 2005. Springer.

A. Zhou, Q. Zhang, Y. Jin, E. Tsang, and T. Okabe. A model-based evolutionary algorithm for bi-objective optimization. In Proceedings of the Congress on Evolutionary Computation (CEC 2005), pages 2568–2575, Edinburgh, U.K., 2005. IEEE Press.

A. Zhou, Y. Jin, Q. Zhang, B. Sendhoff, and E. Tsang. Combining model-based and genetics-based offspring generation for multi-objective optimization using a convergence criterion. In Proceedings of the Congress on Evolutionary Computation (CEC 2006), pages 3234–3241, Vancouver, BC, Canada, 2006. IEEE Press.

M. Dellnitz, O. Schutze, and T. Hestermeyer. Covering Pareto sets by multilevel subdivision techniques. Journal of Optimization Theory and Applications, 124(1):113–136, January 2005.

S. Bandaru and K. Deb. Towards automating the discovery of certain innovative design principles through a clustering-based optimization technique. Engineering Optimization, pages 1–31, 2011. ISSN 0305-215X.

C. R. Bonham and I. C. Parmee. Developments of the cluster oriented genetic algorithm. Engineering Optimization, 36(2):249–279, April 2004.

Y. Jin and B. Sendhoff. Connectedness, regularity and the success of local search in evolutionary multi-objective optimization. In Proceedings of the IEEE Congress on Evolutionary Computation, volume 3, pages 1910–1917, 2003.

M. Ehrgott and K. Klamroth. Connectedness of efficient solutions in multiple criteria combinatorial optimization. European Journal of Operational Research, 1997.

A. Daniilidis, N. Hadjisavvas, and S. Schaible. Connectedness of the efficient set for three-objective quasiconcave maximization problems. Journal of Optimization Theory and Applications, 93(3):517–524, 1997. ISSN 0022-3239. doi: 10.1023/a:1022634827916.

M. E. Mortenson. Geometric Modeling. John Wiley and Sons, 1985.

L. Fang and D. C. Gossard. Multidimensional curve fitting to unorganized data points by nonlinear minimization. Computer-Aided Design, 27(1):48–58, 1995.

K. Miettinen. Nonlinear Multiobjective Optimization. Kluwer, Boston, 1999.

Q. Zhang, A. Zhou, and Y. Jin. RM-MEDA: A regularity model-based multiobjective estimation of distribution algorithm. IEEE Transactions on Evolutionary Computation, 12(1):41–63, 2008.

K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197, April 2002.

A. P. Wierzbicki. The use of reference objectives in multiobjective optimization: theoretical implications and practical experiences. International Institute for Applied Systems Analysis, pages 1–32, 1979. URL http://www.iiasa.ac.at/Publications/Documents/WP-79-066.pdf.

K. Sindhya, A. Sinha, K. Deb, and K. Miettinen. Local search based evolutionary multi-objective optimization algorithm for constrained and unconstrained problems. In 2009 IEEE Congress on Evolutionary Computation (CEC 2009), pages 2919–2926, Trondheim, Norway, May 2009. IEEE Press.

K. Deb, J. Sundar, U. B. Rao N., and S. Chaudhuri. Reference point based multi-objective optimization using evolutionary algorithms. International Journal of Computational Intelligence Research, 2(3):273–286, 2006.

S. Chiu. Fuzzy model identification based on cluster estimation. Journal of Intelligent and Fuzzy Systems, 2(3), September 1994.

N. T. Ratrout. Subtractive clustering-based k-means technique for determining optimum time-of-day breakpoints. Journal of Computing in Civil Engineering, 25(5):380–387, September 2011.

K. Hammouda and F. Karray. A comparative study of data clustering techniques. 2006.

R. Q. Sardinas, M. R. Santana, and E. A. Brindis. Genetic algorithm-based multi-objective optimization of cutting parameters in turning processes. Engineering Applications of Artificial Intelligence, 19:127–133, March 2006. ISSN 0952-1976. doi: 10.1016/j.engappai.2005.06.007.

K. Deb and S. Bandaru. Higher and lower-level innovizations for knowledge discovery in practical problems. Technical report, Kanpur Genetic Algorithm Laboratory, 2012.

K. Deb, R. Tiwari, M. Dixit, and J. Dutta. Finding trade-off solutions close to KKT points using evolutionary multi-objective optimization. In Proceedings of the Congress on Evolutionary Computation (CEC 2007), pages 2109–2116, 2007.

K. Deb, L. Thiele, M. Laumanns, and E. Zitzler. Scalable test problems for evolutionary multi-objective optimization. In A. Abraham, L. Jain, and R. Goldberg, editors, Evolutionary Multiobjective Optimization, pages 105–145. London: Springer-Verlag, 2005.

A. Zhou, Q. Zhang, and Y. Jin. Approximating the set of Pareto-optimal solutions in both the decision and objective spaces by an estimation of distribution algorithm. IEEE Transactions on Evolutionary Computation, 13(5):1167–1189, 2009.
