A Trust Region SQP Algorithm for Equality Constrained Parameter Estimation with Simple Parameter Bounds

Computational Optimization and Applications, 28, 51–86, 2004. © 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.


NIKHIL ARORA AND LORENZ T. BIEGLER∗
Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA

Received February 4, 2002; Revised May 1, 2003

Abstract. We describe a new algorithm for a class of parameter estimation problems, which are either unconstrained or have only equality constraints and bounds on parameters. Due to the presence of unobservable variables, parameter estimation problems may have non-unique solutions for these variables. These can also lead to singular or ill-conditioned Hessians and this may be responsible for slow or non-convergence of nonlinear programming (NLP) algorithms used to solve these problems. For this reason, we need an algorithm that leads to strong descent and converges to a stationary point. Our algorithm is based on Successive Quadratic Programming (SQP) and constrains the SQP steps in a trust region for global convergence. We consider the second-order information in three ways: quasi-Newton updates, Gauss-Newton approximation, and exact second derivatives, and we compare their performance. Finally, we provide results of tests of our algorithm on various problems from the CUTE and COPS sets.

Keywords: nonlinear programming, parameter estimation, trust region, bound constraints, equality constraints

1. Introduction

In this paper we develop a trust region algorithm for a class of parameter estimation problems. These problems are characterized by few parameters and nonlinear constraints. In addition, they can have a large number of variables, but we assume that all linear algebra can be done inexpensively with direct methods. Such problems typically arise in chemical engineering applications, particularly in the statistical estimation of parameters associated with reaction or process engineering models. Inequality constraints are typically not encountered in such problems but we may have bounds on the parameters. However, these parameter bounds are intended as safeguards and are generally inactive at the solution.

Parameter estimation problems can be described as:

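$$
\begin{aligned}
\min_{\theta,\, y} \quad & f(\theta, y) \\
\text{s.t.} \quad & c(\theta, y) = 0 \\
& \theta^L \le \theta \le \theta^U,
\end{aligned} \tag{1}
$$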

∗Author to whom correspondence should be addressed.


where $f(\theta, y)$ is an objective function often derived from a statistical basis, $x^\top = [\theta^\top, y^\top]$ is the total set of variables, $\theta$ is the set of parameters, and $\theta^L$ and $\theta^U$ are the lower and upper bounds, respectively, on the parameters. Here we assume that $f : R^n \to R$, $c : R^n \to R^m$, $\theta \in R^{(n-m)}$, $y \in R^m$, and $n > m$. The functions $f$ and $c$ are assumed to be at least twice continuously differentiable. Observe that we have bounds only on the parameters. Problem (1) is solved by successive quadratic programming (SQP) by repeatedly solving the following quadratic program for the $k$th iteration:

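$$
\begin{aligned}
\min_{d} \quad & \nabla_x f(x_k)^\top d + \tfrac{1}{2}\, d^\top W_k d \\
\text{s.t.} \quad & c(x_k) + A(x_k)\, d = 0 \\
& \theta^L \le \theta_k + d_\theta \le \theta^U.
\end{aligned} \tag{2}
$$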

Here $d \in R^n$, $d^\top = [d_\theta^\top, d_y^\top]$, $d_\theta \in R^{(n-m)}$, and $d_y \in R^m$. Throughout this paper, we shall use the following notation for brevity: $f_k = f(x_k)$, $g_k = \nabla_x f(x_k) \in R^n$, $c_k = c(x_k)$, $A_k = A(x_k) = \nabla_x c_k^\top \in R^{m \times n}$, and $W_k \in R^{n \times n}$ is the Hessian of the Lagrangian, $L_k = f_k + c_k^\top \lambda_k$. We denote by $Z_k$ a matrix whose columns span the null space of the Jacobian $A_k$. Here $Z_k \in R^{n \times (n-m)}$, and $A_k Z_k = 0$. The matrix $Y_k \in R^{n \times m}$ is calculated so that the matrix $[Z_k \; Y_k]$ is nonsingular. The null-space matrix $Z_k$ can be computed by a QR factorization of $A_k$, which can also impose $Y_k^\top Z_k = 0$. However, for large $m$ and $n$, this can be an expensive calculation. By suitably arranging the columns of the Jacobian as:

$$
A_k = [N_k \;\; C_k], \quad C_k \in R^{m \times m} \text{ non-singular, and } N_k \in R^{m \times (n-m)}, \tag{3}
$$

we can obtain orthonormal [37] or orthogonal [9, 10, 33, 43] matrices $Y_k$ and $Z_k$. Here $C_k$ is the basis matrix, $\nabla_y c(x_k)^\top$. We assume this matrix to be non-singular for a large domain of $x$. A third method to calculate $Y_k$ and $Z_k$ is based on the coordinate decomposition [5] where we have:

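$$
Z_k = \begin{bmatrix} I_{(n-m)} \\ -C_k^{-1} N_k \end{bmatrix}, \quad \text{and} \quad Y_k = \begin{bmatrix} 0 \\ I_m \end{bmatrix}. \tag{4}
$$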
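The coordinate decomposition (4) can be sketched in a few lines. This is a minimal illustration, assuming a dense Jacobian whose trailing $m$ columns form the basis matrix $C_k$; the function name `coordinate_basis` and the numerical data are hypothetical and are not taken from the paper, which relies on sparse factorizations.

```python
import numpy as np


def coordinate_basis(A, m):
    """Build Z and Y as in (4), assuming A = [N C] with C the trailing m x m block."""
    n = A.shape[1]
    N, C = A[:, : n - m], A[:, n - m:]
    # Solve C X = N rather than forming the inverse of C explicitly.
    Z = np.vstack([np.eye(n - m), -np.linalg.solve(C, N)])
    Y = np.vstack([np.zeros((n - m, m)), np.eye(m)])
    return Z, Y


# Illustrative data (not from the paper): n = 5 variables, m = 3 constraints.
N = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]])
C = np.array([[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [0.0, 1.0, 4.0]])
A = np.hstack([N, C])
Z, Y = coordinate_basis(A, m=3)
print(np.abs(A @ Z).max())                       # near zero: columns of Z lie in the null space of A
print(np.linalg.matrix_rank(np.hstack([Z, Y])))  # equals n, so [Z Y] is nonsingular
```

Note that $A_k Y_k = C_k$ in this basis, so the quasi-normal step only requires solves with the basis matrix.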

Following the approach in [5, 10, 39], we split the above QP (2) into the so-called quasi-normal and tangential sub-problems. The quasi-normal sub-problem aims to attain feasibility of the constraints while the tangential sub-problem aims to actually reduce the QP objective function in (2). By splitting the step taken by the QP into two parts using a coordinate basis, we write the resulting sub-problems as:

Quasi-normal sub-problem

$$
c_k + A_k Y_k p_Y = 0, \tag{5}
$$

and


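Tangential sub-problem

$$
\begin{aligned}
\min_{p_Z} \quad & (Z_k^\top g_k + w_k)^\top p_Z + \tfrac{1}{2}\, p_Z^\top B_k\, p_Z \\
\text{s.t.} \quad & p_Z^L \le p_Z \le p_Z^U,
\end{aligned} \tag{6}
$$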

where $d = Y_k p_Y + Z_k p_Z$, $p_Y \in R^m$, and $p_Z \in R^{(n-m)}$. Here $B_k$ may be calculated exactly by $Z_k^\top W_k Z_k$, if second derivatives are available, or be approximated by quasi-Newton updates; we also denote the cross term $w_k$ as $Z_k^\top W_k Y_k p_Y$ or a suitable approximation. This term may also be omitted, but at the risk of degrading the performance of the algorithm [4, 5]. In our algorithm, we provide options to calculate the Hessian of the Lagrangian exactly or approximate it by two different quasi-Newton approximations: the BFGS and the Symmetric Rank-1 (SR1) updates.

Note that orthogonal matrices $Z_k$ and $Y_k$ (i.e., $Y_k^\top Z_k = 0$) require $O((n-m)^2 m)$ operations to compute and complicate the tangential sub-problem (6), because the simple bounds on $\theta$ now become linear inequalities on $p_Z$. In contrast, $Z_k$ and $Y_k$ in (4) are not orthogonal and the steps $Y_k p_Y$ and $Z_k p_Z$ are not closely related. This may lead to poor performance of the SQP algorithm in some cases, as $p_Y$ is sensitive to the conditioning of $C_k$. However, we use (4) because it is relatively inexpensive and works well in most cases [14, 16]. Also, the bounds on $p_Z$ are now the same as the bounds on $d_\theta$.

Because (1) may have non-unique solutions and can include non-convex functions, careful attention is required for reliable performance. To ensure global convergence of our algorithm, we constrain the quasi-normal and tangential steps of the QP within trust regions. A comprehensive treatment of trust region methods and their convergence properties is given in [13]. For the parameter estimation problem discussed here we develop a specialization of composite step trust region SQP methods described in Chapter 15 of [13]. In particular, we apply Byrd-Omojokun [39] trust region methods that use exact penalty merit functions for equality constrained optimization, with minor modifications to handle parameter bounds.

Implementations of the Byrd-Omojokun approach for equality constrained NLPs are described in [9] and [33] and reviewed extensively in [13] and [38]. Related implementations based on augmented Lagrangian merit functions are developed and analyzed in [19], extended to more general trust region algorithms in [14] and developed for box trust regions in [27]. Finally, more recent trust region methods have been developed without penalty functions. These include the filter approach of Fletcher et al. [20, 21] and Leyffer et al. [34] as well as the non-monotonic approach of Ulbrich and Ulbrich [44].

Moreover, Dennis et al. [16] proposed a closely related algorithm for trust region SQP with coordinate basis. To handle bounds for the tangential step, they augment the reduced Hessian of the Lagrangian with an inverse-barrier term using the algorithm in [12], and maintain strict feasibility with respect to the bounds. In addition, they obtain an approximate solution to the tangential problem by using truncated Newton methods. Finally, for nonlinear programs with more general inequality constraints, we also mention the interior point trust region strategy [9] as well as the trust region filter SQP methods [20, 21, 34].

For the parameter estimation problems we consider, the degrees of freedom, $(n - m)$, are small compared to the total number of variables, function evaluations may be expensive, and the calculation of the quasi-normal step may be expensive due to a large basis matrix. Such problems arise in the estimation of rate parameters from experimental data in chemical kinetics, and of process parameters for large-scale models of steady state or dynamic chemical processes. As shown by the example in the next section, we develop this algorithm to focus on parameter estimation problems that have only a few model parameters. To solve these problems, we will rely on sparse matrix factorizations for the quasi-normal problem. Moreover, in order to reduce the number of function and gradient calculations, we solve the tangential problem using direct linear solvers.

In this study, we do not consider the important class of inverse problems, which are ill-posed with associated reduced Hessian matrices that contain a large cluster of small singular values. Such problems arise, for instance, upon discretization of a field variable to a set of parameters, $\theta$. While trust region methods can still be applied (see [42]), solving inverse problems requires the imposition of regularization schemes that are beyond the scope of this study. Regularization schemes consider a specific objective in (1) of the form $f(y) + \beta \phi(\theta)$, where $\phi(\theta)$ is often a problem dependent norm (as in Tikhonov regularization) and $\beta > 0$ is a regularization parameter (see Hansen [30]). Regularization approaches are essential to create well-posed NLP problems and they have a strong similarity to trust region methods. In particular, the quadratically constrained trust region subproblem is equivalent to Tikhonov regularization [42] and a number of iterative linear solvers [6, 11, 26, 41, 42] have been developed to deal with the large scale trust region subproblem (LSTRS). A survey of regularization methods and comparison of regularization and trust region methods is given in Rojas [40].

In Section 2 we discuss the general class of problems we intend to solve with this algorithm and we present motivating examples to stress the need for trust region methods. In Section 3 we outline our algorithm, provide some background and discuss the solution of the quasi-normal and tangential sub-problems. In Section 4 we describe additional implementation details and in Section 5 we provide an outline of the convergence analysis. Section 6 presents results on test problems from the CUTE [7] and COPS [17] sets and we conclude the paper in Section 7. In the discussion that follows, the norm $\|\cdot\|$ refers to the $\ell_2$ norm unless explicitly specified. The subscript $k$ on a quantity means that it is evaluated at the $k$th iterate, while the superscript $(j)$ on a vector refers to its $j$th element.

2. Motivating examples

To motivate the discussion, we consider the isomerization of α-pinene [17, 43] as a typical parameter estimation problem (1). Here the goal is to determine the reaction rate parameters $\theta_i$, $i \in \{1, \ldots, 5\}$, from the following differential equation process model (7):

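$$
\begin{aligned}
y_1' &= -(\theta_1 + \theta_2)\, y_1 \\
y_2' &= \theta_1 y_1 \\
y_3' &= \theta_2 y_1 - (\theta_3 + \theta_4)\, y_3 + \theta_5 y_5 \\
y_4' &= \theta_3 y_3 \\
y_5' &= \theta_4 y_3 - \theta_5 y_5.
\end{aligned} \tag{7}
$$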


Here the process model (7) is discretized to form the equality constraints in (1) using orthogonal collocation on finite elements. In particular, we divide the time domain into a set of finite elements $\{1, \ldots, n_e\}$, $0 = t_1 < \cdots < t_{n_e} = t$, and denote $h_i := t_{i+1} - t_i$. We can now calculate the states at any given time $t \in [t_i, t_{i+1}]$ using the monomial basis [3] as:

$$
y(t) = y(t_i) + h_i \sum_{j=1}^{n_{col}} \Omega_j(t)\, \dot{y}_{ij}, \tag{8}
$$

where $\Omega_j$ is a polynomial approximation at the collocation point $j$, $n_{col}$ is the number of collocation points, and $\dot{y}_{ij}$ is the derivative of the state $y$ in the $i$th element evaluated at the collocation point $j$. Here, the total number of variables is $n_y n_e (1 + n_{col}) + n_p$, where $n_y = 5$ is the number of states and $n_p = 5$ is the number of parameters. Clearly, the total number of variables increases linearly with $n_e$ and $n_{col}$. However, $n - m$ is still $n_p = 5$. The objective for systems like (7) is often a weighted least squares function or a more general maximum likelihood function that reflects the fit to experimental data $y(t^l)$ for a time series of measurements, $t^l$, $l = 1, \ldots, N_D$, that do not necessarily correspond to $t_i$.
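As a worked count of the problem dimensions, assume a discretization with $n_e = 25$ finite elements and $n_{col} = 3$ collocation points (illustrative values, not taken from the paper):

$$
n = n_y n_e (1 + n_{col}) + n_p = 5 \cdot 25 \cdot (1 + 3) + 5 = 505, \qquad m = n - n_p = 500, \qquad n - m = 5 .
$$

Doubling $n_e$ to 50 gives $n = 1005$ and $m = 1000$, while the number of degrees of freedom stays at 5.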

More generally, process models used to formulate (1) consist of algebraic equations or can be derived from discretization of ordinary or partial differential equations. A key point for (1) is that the set of parameters $\theta$ is intrinsic to the model and not affected by discretization of the model equations. Therefore, even for large systems, $n - m$ remains constant and is normally much smaller than $m$. Finally, problem (1) is often part of a larger model discrimination exercise where statistical information for the parameters and even the suitability of the process model can be inferred from the solution of (1).

Parameter estimation problems have a number of interesting features. Of course, nonlinear process models lead to a non-convex problem (1), thus leading to non-positive curvature, indefinite reduced Hessians and stationary points that do not satisfy second order conditions. Moreover, parameter estimation problems that are over-parametrized also have singular or nearly singular Hessian matrices at the optimum (see, e.g. [2] for one such objective function). To illustrate these characteristics and further motivate the need for trust region algorithms, we consider a derived example with three stationary points: a local minimum, a global minimum, and a saddle point:

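$$
\min_{\theta^{(1)},\,\theta^{(2)}} \;\; \frac{(\theta^{(1)})^4}{4} - (\omega_1 + \omega_2 + \omega_3)\frac{(\theta^{(1)})^3}{3} + (\omega_1\omega_2 + \omega_2\omega_3 + \omega_3\omega_1)\frac{(\theta^{(1)})^2}{2} - \omega_1\omega_2\omega_3\,\theta^{(1)} + \frac{(\theta^{(2)})^2}{2}, \tag{9}
$$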

where $\omega_1$, $\omega_2$, and $\omega_3 \in R$ are distinct constants with $\omega_1 < \omega_2 < \omega_3$. The problem has a local minimum at $(\omega_1, 0)$, the global minimum at $(\omega_3, 0)$ and a saddle point at $(\omega_2, 0)$, respectively. Figures 1 and 2 depict the contours and the surface generated by the function (9) when $\{\omega_1, \omega_2, \omega_3\} = \{-4, -1, 5\}$.
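The classification of the three stationary points of (9) can be checked numerically. The sketch below uses the values $\{-4, -1, 5\}$ from the text; the function names `f`, `grad`, and `hess` are illustrative. It relies on the fact that the derivative of (9) with respect to $\theta^{(1)}$ factors as $(\theta^{(1)} - \omega_1)(\theta^{(1)} - \omega_2)(\theta^{(1)} - \omega_3)$.

```python
import numpy as np

w1, w2, w3 = -4.0, -1.0, 5.0


def f(t1, t2):
    """Objective (9)."""
    return (t1**4 / 4 - (w1 + w2 + w3) * t1**3 / 3
            + (w1 * w2 + w2 * w3 + w3 * w1) * t1**2 / 2
            - w1 * w2 * w3 * t1 + t2**2 / 2)


def grad(t1, t2):
    # The theta^(1) derivative factors into the three roots w1, w2, w3.
    return np.array([(t1 - w1) * (t1 - w2) * (t1 - w3), t2])


def hess(t1, t2):
    d2 = 3 * t1**2 - 2 * (w1 + w2 + w3) * t1 + (w1 * w2 + w2 * w3 + w3 * w1)
    return np.diag([d2, 1.0])


for w in (w1, w2, w3):
    eigs = np.linalg.eigvalsh(hess(w, 0.0))
    kind = "minimum" if eigs.min() > 0 else "saddle point"
    print(f"theta = ({w:g}, 0): f = {f(w, 0.0):g}, Hessian eigenvalues = {eigs}, {kind}")
```

The Hessian at $(-1, 0)$ has one negative eigenvalue, which is the negative curvature that a trust region method can exploit.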

If we use $(-1, \xi)$ as a starting point with $\xi > 0$ and the trust region algorithm (described later) with the exact Hessian calculation, the Hessian is detected as being indefinite and the algorithm takes directions of negative curvature to converge to the global minimum $(5, 0)$. The algorithm converges to $(-4, 0)$ if $\xi < 0$. However, linesearch algorithms such as MINOS, rSQP [5], and IPOPT [45, 46] simply stop at the saddle point $(-1, 0)$. This is an advantage of the trust region framework because it allows one to exploit negative curvature in a direct manner. In Tables 1 and 2 we summarize the well-known behavior (see, e.g. [13], Chapter 6) of linesearch and trust region algorithms when the Hessian of the Lagrangian of the original problem is singular or indefinite and is calculated exactly or approximated by quasi-Newton updates.

Figure 1. Contour plot of example (9) (axes: x1, x2).

Table 1. Effects of various Hessian approximations on convergence when actual Hessian is singular at the solution.

| Hessian calculation | Linesearch | Trust region |
|---|---|---|
| Exact | ‖d‖ → ∞ | ‖d‖ = δ, slow convergence |
| SR1 | ‖d‖ → ∞ | ‖d‖ = δ, Hessian updates possible |
| BFGS with Powell's damping / with skipping | ‖d‖ → ∞ / slow convergence or line search failure | ‖d‖ = δ, Hessian updates possible |

Table 2. Effects of various Hessian approximations on convergence when actual Hessian is indefinite.

| Hessian calculation | Linesearch | Trust region |
|---|---|---|
| Exact | No guarantee of descent, line-search failures | Converges by taking some directions of negative curvature |
| SR1 | No guarantee of descent, line-search failures | Converges by taking some directions of negative curvature |
| BFGS with Powell's damping / with skipping | ‖d‖ → ∞ / slow convergence or line search failure | Could stop at saddle point if started close to the saddle point |

Figure 2. Mesh plot of example (9).

3. Derivation of algorithm

We begin by discussing the quasi-normal and tangential subproblems for composite step trust region algorithms of the Byrd-Omojokun type.


Trust region quasi-normal sub-problem

$$
\begin{aligned}
\min_{p_Y} \quad & \|c_k + A_k Y_k p_Y\|^2 \\
\text{s.t.} \quad & \|p_Y\| \le \delta_1.
\end{aligned} \tag{10}
$$

This subproblem is a large NLP (in $R^m$) with a quadratic objective and a single quadratic constraint. We assume $A_k$ to be full row rank so that this problem is strictly convex. Due to the large size of this problem, we prefer to solve it inexactly with the aid of sparse matrix factorizations within a dogleg method.

Trust region tangential sub-problem

$$
\begin{aligned}
\min_{p_Z} \quad & (Z_k^\top g_k + w_k)^\top p_Z + \tfrac{1}{2}\, p_Z^\top B_k\, p_Z \\
\text{s.t.} \quad & \|p_Z\| \le \delta_2 \\
& \theta^L \le \theta_k + p_Z \le \theta^U.
\end{aligned} \tag{11}
$$

This subproblem is a small NLP (in $R^{(n-m)}$) with a quadratic objective, a single quadratic constraint and bounds. For this problem we apply a classical trust region algorithm extended to deal with simple bounds. Because $(n - m)$ is relatively small, we opt for an exact approach with direct matrix factorizations [23] rather than an inexact approach based on truncated Newton methods (e.g. [13, 35]).

In addition, we need to coordinate the adjustment of $\delta_1$ and $\delta_2$ in the sub-problems in order to enforce $\|Y_k p_Y + Z_k p_Z\| \le \Delta$, where $\Delta$ is the radius of a trust region over the entire step (Figure 3). For the problems (10) and (11) we equate $\delta_1$ and $\delta_2$ and adjust them simultaneously. However, in Section 3.3, we consider $\|Z_k p_Z\| \le \delta_2$ and will choose $\delta_2$ to be larger than and proportional to $\delta_1$.

Observe that in the quasi-normal sub-problem, we minimize the squared residual of the linearized constraints and constrain the step in a trust region (10). This step may be significantly smaller than that predicted by (5). The tangential sub-problem minimizes the same objective function as (6). However, here we constrain $p_Z$ or $Z_k p_Z$ in a trust region (11). We then take as our next starting point:

$$
x_{k+1} = x_k + Y_k p_Y + Z_k p_Z, \tag{12}
$$

provided we obtain sufficient reduction in the merit function $f(x) + \mu\|c(x)\|$. If not, we reject this step, reduce the sizes of the trust regions and repeat the calculation of the quasi-normal and tangential steps. In addition, we obtain Lagrange multiplier estimates at iteration $k$ using the first order approximation [39]:

$$
\lambda_k = -(Y_k^\top A_k^\top)^{-1} Y_k^\top g_k. \tag{13}
$$
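With the coordinate basis (4) and the partition (3), the estimate (13) simplifies. The short derivation below is a sketch that assumes the variable ordering $x^\top = [\theta^\top, y^\top]$ and writes $g_k^\top = [\nabla_\theta f_k^\top, \nabla_y f_k^\top]$:

$$
Y_k^\top A_k^\top = \begin{bmatrix} 0 & I_m \end{bmatrix} \begin{bmatrix} N_k^\top \\ C_k^\top \end{bmatrix} = C_k^\top, \qquad Y_k^\top g_k = \nabla_y f_k \quad \Longrightarrow \quad \lambda_k = -C_k^{-\top} \nabla_y f_k .
$$

Under these assumptions, the multiplier estimate requires only one solve with the transposed basis matrix.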


Figure 3. A trust region step.

In the remainder of this section we provide details of the methods used to solve the quasi-normal and tangential sub-problems, and discuss a number of algorithmic issues regarding solution of the tangential sub-problem.

3.1. Calculation of the quasi-normal step

The trust region constrained problem for the quasi-normal step (10) is solved inexactly by the dogleg method of Powell [9, 33], assuming $A_k Y_k$ is nonsingular. This method has been employed because the Hessian in this problem, $Y_k^\top A_k^\top A_k Y_k$, is positive definite and the calculation of a step is relatively cheap. A dogleg step is a linear combination of the Newton and Cauchy steps:

$$
Y_k p_Y^D = \eta\, Y_k p_Y^N + (1 - \eta)\, Y_k p_Y^C. \tag{14}
$$

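Here, $\eta \in [0, 1]$ and the superscripts D, N, and C refer to the dogleg, Newton and Cauchy steps, respectively. These steps are calculated as:

$$
p_Y^N = -(A_k Y_k)^{-1} c_k, \tag{15}
$$

$$
p_Y^C = -\gamma\, \frac{(A_k Y_k)^\top c_k}{\|(A_k Y_k)^\top c_k\|}, \tag{16}
$$

$$
\gamma = \min\left[\delta_1,\; \frac{\|(A_k Y_k)^\top c_k\|^3}{\|(A_k Y_k)(A_k Y_k)^\top c_k\|^2}\right]. \tag{17}
$$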


The parameter $\eta$ is set to one if $\|p_Y^N\| \le \delta_1$. Otherwise, it is adjusted so that the length of the entire step in (14) is equal to the radius of the trust region, $\delta_1$.
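The dogleg calculation (14)-(17) can be sketched as follows. This is a minimal dense illustration, assuming that `M` stands for the square, nonsingular matrix $A_k Y_k$; the function name `dogleg_quasi_normal` and the test data are hypothetical, and the paper itself uses sparse factorizations for the solve in (15).

```python
import numpy as np


def dogleg_quasi_normal(M, c, delta):
    """Dogleg step for min ||c + M p||^2 s.t. ||p|| <= delta, with M = A_k Y_k."""
    p_newton = -np.linalg.solve(M, c)                          # Newton step (15)
    if np.linalg.norm(p_newton) <= delta:
        return p_newton                                        # eta = 1
    g = M.T @ c
    g_norm = np.linalg.norm(g)
    gamma = min(delta, g_norm**3 / np.linalg.norm(M @ g)**2)   # (17)
    p_cauchy = -gamma * g / g_norm                             # Cauchy step (16)
    # Find eta in [0, 1) with ||p_cauchy + eta (p_newton - p_cauchy)|| = delta.
    d = p_newton - p_cauchy
    a = d @ d
    b = 2.0 * (p_cauchy @ d)
    c0 = p_cauchy @ p_cauchy - delta**2                        # non-positive by construction
    eta = (-b + np.sqrt(b * b - 4.0 * a * c0)) / (2.0 * a)
    return p_cauchy + eta * d                                  # dogleg step (14)


# Illustrative data (not from the paper).
M = np.array([[2.0, 0.0], [0.0, 1.0]])
c = np.array([4.0, 4.0])
for delta in (1.0, 3.0, 10.0):
    p = dogleg_quasi_normal(M, c, delta)
    print(delta, np.linalg.norm(p), np.linalg.norm(c + M @ p))
```

For the largest radius the Newton step is accepted and the linearized constraints are satisfied exactly; for the smaller radii the step lies on the trust region boundary.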

3.2. Calculation of the tangential step

The tangential step is obtained by solving the QP (11) by a calculation analogous to that performed by Goldfeld et al. [25]. Gay [23] implemented this algorithm in the codes NL2SOL [15] and Algorithm 717 [8]. He also considered bounds on variables [24]. We now describe Gay's implementation. Without active bounds, the first order Karush-Kuhn-Tucker (KKT) conditions for (11) are:

$$
(B_k + \alpha_k I)\, p_Z = -(Z_k^\top g_k + w_k), \tag{18}
$$

$$
\alpha_k = 0 \ \text{if} \ \|p_Z\| < \delta_2 \quad \text{and} \quad \alpha_k \ge 0 \ \text{if} \ \|p_Z\| = \delta_2. \tag{19}
$$

Here $\alpha_k$ is the Marquardt parameter, which is just the Lagrange multiplier associated with the trust region constraint. If the Newton step ($\alpha_k = 0$) lies inside the trust region, then this step is taken. Otherwise $\alpha_k$ is calculated by solving:

$$
\frac{1}{\|(B_k + \alpha_k I)^{-1} \bar{g}_k\|} - \frac{1}{\delta_2} = 0, \tag{20}
$$

where we have used the compact notation $\bar{g}_k = (Z_k^\top g_k + w_k)$. This equation is solved by Newton's method. Previously, Hebden [31] and others [36] have observed that the quality of Newton iterations may be very poor if $\alpha_k$ is not constrained between suitable bounds. Observe that the first term in (20) is concave if $B_k$ is positive definite and increases monotonically with $\alpha_k$. However, if $B_k$ is indefinite or singular, we have to calculate an $\alpha_k$ that makes the matrix $(B_k + \alpha_k I)$ positive definite. Lower and upper bounds are imposed on $\alpha_k$ along with an approximate upper bound on the smallest eigenvalue of $B_k$ if $B_k$ is indefinite. Once an $\alpha_k$ is found that makes the matrix $(B_k + \alpha_k I)$ positive definite, we can have monotonically increasing lower and monotonically decreasing upper bounds to constrain $\alpha_k$.
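For the easy case in which $B_k$ is positive definite, the Newton iteration on (20) can be sketched as below. This is a simplified illustration and not Gay's implementation: it omits the safeguarding bounds on $\alpha_k$ and the treatment of indefinite $B_k$ described above, and the function name `marquardt_parameter` is hypothetical. The update uses the standard Cholesky-based form of Newton's method for this equation.

```python
import numpy as np


def marquardt_parameter(B, g_bar, delta, tol=1e-10, max_iter=50):
    """Solve (18)-(20) for (alpha, p_Z), assuming B is symmetric positive definite."""
    n = B.shape[0]
    alpha = 0.0
    for _ in range(max_iter):
        L = np.linalg.cholesky(B + alpha * np.eye(n))          # B + alpha I = L L^T
        p = -np.linalg.solve(L.T, np.linalg.solve(L, g_bar))   # step from (18)
        norm_p = np.linalg.norm(p)
        if alpha == 0.0 and norm_p <= delta:
            return alpha, p                                    # Newton step lies inside the region
        if abs(norm_p - delta) <= tol * delta:
            return alpha, p                                    # step lies on the boundary
        q = np.linalg.solve(L, p)
        # Newton update applied to 1/||p(alpha)|| - 1/delta = 0.
        alpha += (norm_p / np.linalg.norm(q))**2 * (norm_p - delta) / delta
    raise RuntimeError("Newton iteration on (20) did not converge")


# Illustrative data (not from the paper).
B = np.diag([1.0, 2.0])
g_bar = np.array([1.0, 1.0])
alpha, p = marquardt_parameter(B, g_bar, delta=0.5)
print(alpha, np.linalg.norm(p))
```

Starting from $\alpha_k = 0$ with a positive definite $B_k$, the iterates increase monotonically toward the root because the first term in (20) is concave and increasing.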

If $B_k$ is indefinite, having the magnitude of $\alpha_k$ approach the magnitude of a negative eigenvalue of $B_k$ would normally cause $\|(B_k + \alpha_k I)^{-1} \bar{g}_k\| \to \infty$. However, if $\bar{g}_k$ is orthogonal to the eigenvector corresponding to the most negative eigenvalue of $B_k$, then $\|(B_k + \alpha_k I)^{-1} \bar{g}_k\|$ may remain finite, even when $\alpha_k$ is arbitrarily close to the absolute value of the most negative eigenvalue of $B_k$. This behavior is known as the hard case and will be considered further in Section 3.3. Here, the eigenvector corresponding to the smallest eigenvalue of $B_k$ is computed and its multiple is added to $-(B_k + \alpha_k I)^{-1} \bar{g}_k$. This is also a direction of negative curvature and ensures that the step-length equals the radius of the trust region. For a more comprehensive discussion of the above, the reader is referred to Moré [36] and Gay [23].

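To enforce bounds in (11), we adopt the approach of [24]. Using the solution from (18) and (19), we first obtain the step $p_Z$ and ignore bounds in (11). With this step, we define $\bar{\omega} = \max\{\omega \mid \theta^{(j)} + \omega (Z p_Z)^{(j)} \in [\theta^L, \theta^U],\ j = 1, \ldots, n - m\}$, denote by $\bar{\jmath}$ the index of the bound that determines $\bar{\omega}$, and set

$$
\tilde{g}_k = \left(\bar{g}_k + \bar{\omega}\,(B_k + \alpha_k I)\, p_Z\right)^{(1, \ldots, \bar{\jmath}-1,\, \bar{\jmath}+1, \ldots, (n-m))} \in R^{(n-m-1)}, \tag{21}
$$

with $\tilde{B}_k \in R^{(n-m-1) \times (n-m-1)}$ as $B_k + \alpha_k I$ with the $\bar{\jmath}$th row and column removed. We next perform the subspace minimization:

$$
\min_{\tilde{p}_Z = [\tilde{p}_Z^{(1)}, \ldots, \tilde{p}_Z^{(\bar{\jmath}-1)},\, \tilde{p}_Z^{(\bar{\jmath}+1)}, \ldots, \tilde{p}_Z^{(n-m)}]^\top} \;\; \tilde{g}_k^\top \tilde{p}_Z + \tfrac{1}{2}\, \tilde{p}_Z^\top \tilde{B}_k\, \tilde{p}_Z, \tag{22}
$$

and calculate the new step as:

$$
p_Z \leftarrow \left[\bar{\omega} p_Z^{(1)} + \tilde{p}_Z^{(1)}, \ldots, \bar{\omega} p_Z^{(\bar{\jmath}-1)} + \tilde{p}_Z^{(\bar{\jmath}-1)},\; \bar{\omega} p_Z^{(\bar{\jmath})},\; \bar{\omega} p_Z^{(\bar{\jmath}+1)} + \tilde{p}_Z^{(\bar{\jmath}+1)}, \ldots, \bar{\omega} p_Z^{(n-m)} + \tilde{p}_Z^{(n-m)}\right]^\top. \tag{23}
$$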

After each such minimization, the most violated bound is identified, made active, and a sub-space minimization is carried out again until the final step lies within the bounds, or exceeds the radius of the trust region in length, when the step calculated previously is sufficiently smaller than the trust region radius. Observe that each time we solve (22), we reduce the dimension of the QP by removing the variables at active bounds. Also, if during the above procedure, the overall step significantly exceeds the trust region radius and the step calculated previously is significantly smaller than the trust region radius, then we recalculate the Marquardt parameter so that the step fits inside the trust region.

More information on the solution of this bound constrained subproblem can be found in Gay [24], whose code has been used here for the solution of (11). Another approach based on projected search directions has been suggested by Lin and Moré [35]. Here, it is not necessary to grow the active-set incrementally because it is now possible to add more than one constraint to the active-set per subspace minimization. However, for a small set of bound constraints, this should not make much difference in computational effort.
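The first stage of the bound handling, namely finding the largest multiple of the unconstrained step that keeps the parameters within their bounds and the bound that blocks it, amounts to a ratio test. The sketch below is an illustration under simplifying assumptions and is not Gay's code: it treats only the parameter components of the step, and the function name `max_feasible_fraction` is hypothetical.

```python
import numpy as np


def max_feasible_fraction(theta, p, lower, upper):
    """Largest omega with lower <= theta + omega * p <= upper, and the blocking index."""
    omega, blocking = np.inf, None
    for j, (t, pj, lo, hi) in enumerate(zip(theta, p, lower, upper)):
        if pj > 0.0:
            limit = (hi - t) / pj      # moving toward the upper bound
        elif pj < 0.0:
            limit = (lo - t) / pj      # moving toward the lower bound
        else:
            continue                   # this component does not move
        if limit < omega:
            omega, blocking = limit, j
    return omega, blocking


# Illustrative data (not from the paper).
theta = np.array([0.5, 1.0])
p = np.array([1.0, -4.0])
omega, j_block = max_feasible_fraction(theta, p, lower=np.zeros(2), upper=2.0 * np.ones(2))
print(omega, j_block)
```

The component returned as blocking is the one that would be fixed at its bound before the reduced subspace minimization is carried out.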

3.3. Trust region modifications for coordinate basis

So far we have described conditions to ensure $\|p_Z\| \le \delta_2$. This trust region should work well in most cases. However, if during some iteration $\|Z_k p_Z\| \gg \|Y_k p_Y\|$, $\|Z_k p_Z\| \ll \|Y_k p_Y\|$, or $\cos(\theta_k) = \dfrac{\langle Z_k p_Z, Y_k p_Y \rangle}{\|Z_k p_Z\|\,\|Y_k p_Y\|} \approx -1$, then the overall step may

• make poor progress toward feasibility,
• make poor progress toward reducing the objective function, or
• be extremely small and lead to slow convergence.

From [9, 33, 39] we require the entire step $d$ be inside a trust region. But here $\|Z_k p_Z\| + \|Y_k p_Y\| \ne \|d\|$ because we use the coordinate basis to calculate $Z_k$ and $Y_k$. Having non-orthogonal $Z_k p_Z$ and $Y_k p_Y$ may require a trust region that is significantly larger than $\delta_1$ for $d$. Hence we relate $\delta_2$ to $\delta_1$, with $\delta_2$ large enough to allow the tangential step to cut back sufficiently if the quasi-normal step significantly increases the objective function and hence the merit function (43). For this purpose, we impose $\delta_2 = M\delta_1$, $M > 2$. This also constrains the overall step in a trust region whose radius is bounded above by $(M + 1)\delta_1$, i.e.

$$
\|Z_k p_Z\| \le M\delta_1, \quad \text{and} \tag{24}
$$

$$
\|Y_k p_Y + Z_k p_Z\| \le (M + 1)\delta_1. \tag{25}
$$

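We modify the tangential subproblem described in the previous section to determine $Z_k p_Z$ as follows. The singular-value decomposition of $Z_k$ gives:

$$
Z_k = U_k \begin{bmatrix} \Sigma \\ 0 \end{bmatrix} V_k^\top, \tag{26}
$$

$$
\Sigma = \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_{(n-m)} \end{bmatrix} \in R^{(n-m) \times (n-m)}, \quad \sigma_i \ge \sigma_j \ \text{if} \ i \le j. \tag{27}
$$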

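Now,

$$
\sigma_{(n-m)} \|p_Z(\alpha)\| \le \|Z_k p_Z(\alpha)\| \le \sigma_1 \|p_Z(\alpha)\|, \quad \text{for } \alpha \ge 0, \tag{28}
$$

where $p_Z(\alpha)$ is calculated from (18) and (19). If we solve (18) and (19) by replacing (19) by $\sigma_1 \|p_Z(\alpha_k)\| \le \delta_2$ and $\sigma_{(n-m)} \|p_Z(\alpha_k)\| \le \delta_2$ respectively, we obtain values for $\alpha_k^U$ and $\alpha_k^L$, upper and lower bounds on $\alpha_k$, respectively. We now solve for $\alpha_k$:

$$
\frac{1}{\|Z_k p_Z(\alpha_k)\|} - \frac{1}{\delta_2} = 0, \quad \alpha_k^L \le \alpha_k \le \alpha_k^U, \tag{29}
$$

by a secant method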

and impose a trust region on the overall tangential step. Note that this is not equivalent to solving (11) with $\|Z_k p_Z\| \le \delta_2$ substituted for $\|p_Z\| \le \delta_2$. Instead, this is a cheaper calculation that uses the conditions (18) directly.

We conclude this section by discussing the hard case when $Z_k p_Z$ is constrained in a trust region. For the hard case, the reduced Hessian is indefinite and the length of the step calculated is much less than the radius of the trust region, even when the Marquardt parameter is arbitrarily close to the smallest eigenvalue in magnitude and $(B + \alpha I)$ is nearly singular. As seen in Section 3.2, the tangential trust region problem is first solved without imposing bounds on $p_Z$. If we impose the trust region over $p_Z$, as in (19), then we can follow the approach of Gay [23], calculate a step in the direction of negative curvature and constrain it within the trust region so that the length of the step equals the radius. The direction of negative curvature is the eigenvector corresponding to the smallest (and hence most negative) eigenvalue of the Hessian. We follow a similar approach when we impose a trust region on $Z p_Z$ using (18):

$$
\|Z p_Z\| = \|-Z (B + \alpha I)^{-1} \bar{g} + \tau Z v\| = \delta_2, \tag{30}
$$

where $\tau \in R$, $\|v\| = 1$, and $v$ is the eigenvector corresponding to the smallest eigenvalue of $B$. Upon solving this for $\tau$, we obtain the following two values:

$$
\tau_{\pm} = \frac{-v^\top (Z^\top Z)\, s \pm \sqrt{(v^\top Z^\top Z s)^2 + (\delta_2^2 - \|Z s\|^2)\,\|Z v\|^2}}{\|Z v\|^2}, \tag{31}
$$

and $s = -(B + \alpha I)^{-1} \bar{g}$. To select $\tau$ we define the quadratic function:

$$
\varphi(s) = \bar{g}^\top s + \tfrac{1}{2}\, s^\top B s, \tag{32}
$$

and choose the value of $\tau_{\pm}$ that gives the lower value of $\varphi(s + \tau v)$. The step $Z p_Z$ is now constrained within a trust region of radius $\delta_2$.
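The hard-case step defined by (30)-(32) can be sketched as below. This is an illustration under stated assumptions: dense matrices, a value of $\alpha$ for which $B + \alpha I$ is nonsingular, and $\|Zs\| < \delta_2$ so that the square root in (31) is real. The function name `hard_case_step` and the data are hypothetical.

```python
import numpy as np


def hard_case_step(Z, B, g_bar, alpha, delta2):
    """Return p_Z = s + tau * v with ||Z p_Z|| = delta2, following (30)-(32)."""
    n = B.shape[0]
    s = -np.linalg.solve(B + alpha * np.eye(n), g_bar)
    _, eigvecs = np.linalg.eigh(B)          # eigenvalues are returned in ascending order
    v = eigvecs[:, 0]                       # unit eigenvector of the smallest eigenvalue
    Zs, Zv = Z @ s, Z @ v
    a = Zv @ Zv
    b = Zv @ Zs
    root = np.sqrt(b * b + (delta2**2 - Zs @ Zs) * a)
    taus = [(-b + root) / a, (-b - root) / a]                    # the two values in (31)

    def phi(p):
        return g_bar @ p + 0.5 * p @ B @ p                       # quadratic model (32)

    tau = min(taus, key=lambda t: phi(s + t * v))
    return s + tau * v


# Illustrative hard case (not from the paper): g_bar is orthogonal to the
# eigenvector of the negative eigenvalue of B.
B = np.diag([-2.0, 1.0])
g_bar = np.array([0.0, 1.0])
Z = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]])
p_Z = hard_case_step(Z, B, g_bar, alpha=2.0 + 1e-6, delta2=2.0)
print(np.linalg.norm(Z @ p_Z))
```

Because both roots of (31) are evaluated, the sign ambiguity of the computed eigenvector does not affect the result.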

4. Implementation details

In this section we discuss a number of options for second order information and describea number of remaining details for our trust region algorithm. The section then concludeswith a statement of the implemented algorithm.

4.1. Calculation of the reduced Hessian and correction terms

We have implemented three options for second order information: the exact Hessian of the Lagrangian, the Gauss-Newton approximation, and quasi-Newton updates of the reduced Hessian. We use two different types of quasi-Newton updates depending upon the user's preference. These are the positive definite rank-2 BFGS and the possibly indefinite SR1 updates. In case of exact Hessian calculation, we use the software ADOL-C [28] to calculate the Hessian of the Lagrangian and its product with the null-space matrix $Z$ to form $B_k = Z_k^\top W_k Z_k$ and $w_k = Z_k^\top W_k Y_k p_Y$. In addition, we use ADOL-C to calculate the gradient of the objective function in all cases.

4.1.1. Quasi-Newton options. For the quasi-Newton updates, we define the vectors:

$$
s_k = p_Z, \quad \text{and} \tag{33}
$$

$$
r_k = Z_{k+1}^\top g_{k+1} - Z_k^\top g_k - w_k. \tag{34}
$$


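As in [5], $w_k$ is calculated by:

$$
w_k = Z_k^\top \left[\left(g(x_k + Y_k p_Y) + A(x_k + Y_k p_Y)^\top \lambda_{k+1}\right) - g_k\right], \tag{35}
$$

$$
w_k \leftarrow \begin{cases} w_k, & \|w_k\| \le \|p_Y\| / \gamma_k \\[4pt] w_k\, \dfrac{\|p_Y\|}{\gamma_k \|w_k\|}, & \text{otherwise,} \end{cases} \tag{36}
$$

$$
\gamma_k = 0.01\,(n - m)^{1/4}\, k^{-1.1}, \tag{37}
$$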

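and we implement Powell's damping for the BFGS update [38]:

$$
t = \begin{cases} 1, & \text{if } r_k^\top s_k \ge 0.2\, s_k^\top B_k s_k, \\[4pt] \dfrac{0.8\, s_k^\top B_k s_k}{s_k^\top B_k s_k - r_k^\top s_k}, & \text{otherwise,} \end{cases} \qquad r_k \leftarrow t\, r_k + (1 - t)\, B_k s_k. \tag{38}
$$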

Now, the update is calculated as:

$$
B_{k+1} = B_k - \frac{B_k s_k s_k^\top B_k}{s_k^\top B_k s_k} + \frac{r_k r_k^\top}{r_k^\top s_k}, \tag{39}
$$

and skipped when $r_k^\top r_k \ge 10^8\, r_k^\top s_k$.

For the SR1 update, we implement the following skipping rule:

if $s_k^\top (r_k - B_k s_k) \le 10^{-8}\, \|r_k - B_k s_k\|\, \|s_k\|$, skip.

The SR1 update now is:

$$
B_{k+1} = B_k + \frac{(r_k - B_k s_k)(r_k - B_k s_k)^\top}{(r_k - B_k s_k)^\top s_k}. \tag{40}
$$

During calculation of the tangential step by (11), we may calculate the cross-term as $w_k = Z_k^\top W_k Y_k p_Y$ if we know $W_k$. Otherwise, when we only have an update of the reduced Hessian, we estimate this term using a finite-difference approximation of the matrix $Z_k^\top W_k Y_k$ along $p_Y$, i.e.,

$$
w_k = Z_k^\top \left[g(x_k + Y_k p_Y) + A(x_k + Y_k p_Y)^\top \lambda_k - g_k\right] \approx Z_k^\top W_k Y_k p_Y. \tag{41}
$$
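The two quasi-Newton options can be sketched as dense updates of the reduced Hessian. This is a minimal illustration, assuming that `s` and `r` hold the vectors $s_k$ and $r_k$ defined in (33) and (34); the function names are hypothetical, and the skipping tests are coded exactly as printed in the text.

```python
import numpy as np


def damped_bfgs_update(B, s, r):
    """BFGS update (39) with Powell's damping (38) and the stated skipping test."""
    Bs = B @ s
    sBs = s @ Bs
    rs = r @ s
    t = 1.0 if rs >= 0.2 * sBs else 0.8 * sBs / (sBs - rs)
    r = t * r + (1.0 - t) * Bs                       # damped r_k from (38)
    if r @ r >= 1e8 * (r @ s):
        return B                                     # skip the update
    return B - np.outer(Bs, Bs) / sBs + np.outer(r, r) / (r @ s)


def sr1_update(B, s, r):
    """SR1 update (40) with the stated skipping rule."""
    v = r - B @ s
    if s @ v <= 1e-8 * np.linalg.norm(v) * np.linalg.norm(s):
        return B                                     # skip the update
    return B + np.outer(v, v) / (v @ s)


# Illustrative data (not from the paper): r indicates negative curvature along s.
B0 = np.eye(2)
s = np.array([1.0, 0.0])
print(damped_bfgs_update(B0, s, np.array([-1.0, 0.0])))  # remains positive definite
print(sr1_update(B0, s, np.array([3.0, 0.0])))           # satisfies B s = r
```

In the first example the damping replaces $r_k$ by a vector with $r_k^\top s_k = 0.2\, s_k^\top B_k s_k$, which keeps the BFGS matrix positive definite.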


4.1.2. Gauss-Newton approximation for the Hessian of the Lagrangian. Objective functions in parameter estimation problems, such as least squares, reflect the error in fitting data to the process model. If this error is small at the solution, the values of the objective function and its gradient are also small in magnitude. If $\nabla_x f(x^*) \approx 0$ and $\nabla_x c(x^*)$ is full rank, then $\lambda^* \approx 0$ and the Hessian of the Lagrange function can be approximated by $\nabla^2_{xx} f(x)$. This Gauss-Newton approximation, also used in [15, 32, 43], significantly reduces the computational work required to calculate the Hessian of the Lagrangian, and can accelerate local convergence relative to quasi-Newton Hessian approximations. On the other hand, for large residual problems the Gauss-Newton approximation may not be very effective and we may have to resort to exact-Hessian calculation or apply some switching strategy [1, 22, 43].

4.2. Lagrange multipliers and second-order correction

We can apply a Newton step to calculate the step $d$ and Lagrange multipliers $\lambda_k$ of QP (2) at iteration $k$ to obtain:

$$
\begin{bmatrix}
Y_k^\top W_k Y_k & Y_k^\top W_k Z_k & Y_k^\top A_k^\top \\
Z_k^\top W_k Y_k & Z_k^\top W_k Z_k & 0 \\
A_k Y_k & 0 & 0
\end{bmatrix}
\begin{bmatrix} p_Y \\ p_Z \\ \lambda_k \end{bmatrix}
= -\begin{bmatrix} Y_k^\top g_k \\ Z_k^\top g_k \\ c_k \end{bmatrix}. \tag{42}
$$

If we ignore the quadratic terms $Y_k^\top W_k Y_k$ and $Y_k^\top W_k Z_k$ in the first equation, we can obtain Lagrange multiplier estimates using (13). Here, Lagrange multipliers play a subordinate role in our algorithm and are used only to calculate the exact Hessian of the Lagrangian, if required.

Our algorithm uses the non-differentiable $\ell_2$ merit function:

$$
\phi(x, \mu) = f(x) + \mu \|c(x)\|, \tag{43}
$$

with $\mu$ being the penalty parameter. This merit function is susceptible to the Maratos effect, which we counter by calculating a second-order correction step that is added to the quasi-normal step. We employ the cheap calculation:

$$
p_Y^S = -C_k^{-1} c(x_k + d). \tag{44}
$$

This ensures a move toward feasibility of the constraints.

4.3. Predicted and actual reductions, trust region update, and penalty parameter

After calculating the quasi-normal and tangential steps, we assess the quality of the overall step by comparing the reduction, if any, in the merit function to the reduction in the quadratic models. The predicted reduction in the quadratic models contains the predicted reduction in the model for the tangential step ($q$)

$$
q_k = -(Z_k^\top g_k + w_k)^\top p_Z - \tfrac{1}{2}\, p_Z^\top B_k\, p_Z, \tag{45}
$$

and the predicted reduction in the model for the quasi-normal step ($\vartheta$)

$$
\vartheta_k = \|c_k\| - \|c_k + A_k Y_k p_Y\|. \tag{46}
$$

The overall predicted reduction is now:

$$
\mathrm{pred} = q_k + \mu^+ \vartheta_k - g_k^T Y_k p_Y - \tfrac{1}{2}\, p_Y^T Y_k^T W_k Y_k p_Y, \tag{47}
$$

where $\mu^+$ is the updated penalty parameter calculated below. When we update the reduced Hessian with quasi-Newton approximations, we omit the term $\tfrac{1}{2}\, p_Y^T Y_k^T W_k Y_k p_Y$ because we do not know $W_k$. Otherwise, we use the complete expression (47).

The actual reduction is just the difference in the merit function at two successive iterates:

$$
\mathrm{ared} = f(x_k) - f(x_k + Y_k p_Y + Z_k p_Z) + \mu^+ \bigl( \|c(x_k)\| - \|c(x_k + Y_k p_Y + Z_k p_Z)\| \bigr). \tag{48}
$$

Here $p_Y$ may include the second-order correction. The quality of the step is now assessed and the radius of the trust region altered. Also define

$$
\rho = \frac{\mathrm{ared}}{\mathrm{pred}}. \tag{49}
$$

The step is evaluated by using a modification of the method in [38], as in figure 4.

Finally, we choose an update strategy for the penalty parameter µ that ensures that pred is positive, and that µ is positive and monotonically increasing as we approach the solution. With this in mind, we employ the following updating rule, which has also been used in [9]:

$$
\mu^+ = \max\left[ \mu,\; \frac{-\bigl( q_k - g_k^T Y_k p_Y - \tfrac{1}{2}\, p_Y^T Y_k^T W_k Y_k p_Y \bigr)}{(1 - \zeta)\, \vartheta_k} \right]. \tag{50}
$$

Here, we choose ζ = 0.3. Again, if we use quasi-Newton updates we drop the $\tfrac{1}{2}\, p_Y^T Y_k^T W_k Y_k p_Y$ term in (50).
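The interplay between (47) and (50) can be illustrated with a small numerical sketch. The function and argument names below are hypothetical and not taken from the authors' Fortran code; the scalars `gY` and `curv` stand for $g_k^T Y_k p_Y$ and $\tfrac{1}{2} p_Y^T Y_k^T W_k Y_k p_Y$, and the guard for $\vartheta_k \le 0$ is an added assumption to avoid division by zero.

```python
def update_penalty(mu, q, gY, curv, theta, zeta=0.3):
    """Penalty update in the spirit of (50).

    q     : tangential model reduction q_k from (45)
    gY    : g_k^T Y_k p_Y
    curv  : 0.5 * p_Y^T Y_k^T W_k Y_k p_Y (pass 0.0 for quasi-Newton updates)
    theta : quasi-normal model reduction vartheta_k from (46)
    """
    if theta <= 0.0:
        # Assumption for this sketch: keep mu when the quasi-normal step
        # predicts no decrease in infeasibility.
        return mu
    return max(mu, -(q - gY - curv) / ((1.0 - zeta) * theta))


def predicted_reduction(mu_plus, q, gY, curv, theta):
    """Overall predicted reduction (47)."""
    return q + mu_plus * theta - gY - curv


# Illustrative numbers only.
mu, q, gY, curv, theta = 1.0, 0.5, 2.0, 0.25, 1.0
mu_plus = update_penalty(mu, q, gY, curv, theta)
pred = predicted_reduction(mu_plus, q, gY, curv, theta)
print(mu_plus, pred)
```

With these numbers the update raises µ so that pred equals $\zeta \mu^+ \vartheta_k$ up to rounding, which is positive; when the current µ is already large enough, it is left unchanged.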


Figure 4. Evaluating the trust region step.

4.4. Convergence criteria

We choose different convergence criteria depending on whether the problems are equality constrained, bound-constrained or both. If bounds are absent in constrained problems or inactive at the solution, then the problem is said to have converged if

$$
\max\{ \|Z_k^T g_k\|_\infty,\; \|c_k\|_\infty \} \le \mathrm{tol}, \tag{51}
$$

where tol is some tolerance defined by the user. If bounds are active at the solution, we revert to the first-order KKT conditions of the NLP (1) and the QP on the null-space (11) with the trust region on $p_Z$. If some bounds are active at the solution of the NLP, they are active at the solution of the null space QP due to the structure of the problem. From (11) we have:

$$
Z_k^T g_k + w_k + B_k p_Z + \alpha_k p_Z - \nu_k^L + \nu_k^U = 0. \tag{52}
$$


If the i-th bound is active, we calculate $\mathrm{rhs}^{(i)} = [-(Z_k^T g_k + w_k) - B_k p_Z - \alpha_k p_Z]^{(i)}$ and set $\nu_k^{(i),L} = -\mathrm{rhs}^{(i)}$ or $\nu_k^{(i),U} = \mathrm{rhs}^{(i)}$ depending upon the lower or upper bound being active, respectively. If the multipliers are nonnegative then the problem is said to have converged if

$$
\max\{ \|Z_k^T g_k - \nu_k^L + \nu_k^U\|_\infty,\; \|c_k\|_\infty \} \le \mathrm{tol}. \tag{53}
$$

Note that we have the same convergence criterion if the trust region is on $p_Z$ or $Z p_Z$, due to the way we determine $Z p_Z$.

If the problem has no equality constraints, we first check whether $\|g_k - \nu_k^L + \nu_k^U\|_\infty \le \mathrm{tol}$. If so, we stop. Otherwise we also check whether ared ≤ 2 · pred and pred ≤ tol. If so, the algorithm has converged with the so-called "Relative Function Convergence". Otherwise, if $\alpha_k = 0$ and $\|x_{k+1} - x_k\| / \|x_{k+1} + x_k\| \le \mathrm{tol}$, the algorithm has converged with so-called "X-convergence". Finally, if only $\|x_{k+1} - x_k\| / \|x_{k+1} + x_k\| \le \mathrm{tol}$, the algorithm has converged with so-called "False convergence" [15].
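The bound-multiplier recovery and the test (53) can be sketched in a few lines. This is a minimal illustration with hypothetical names, not the authors' implementation; the example assumes $w_k = 0$ and $p_Z = 0$ at the candidate point, so that rhs reduces to $-Z_k^T g_k$.

```python
def inf_norm(v):
    """Infinity norm of a list of floats (0.0 for an empty list)."""
    return max((abs(t) for t in v), default=0.0)


def bound_multipliers(rhs, active_lower, active_upper):
    """Set nu_L[i] = -rhs[i] on active lower bounds and nu_U[i] = rhs[i]
    on active upper bounds; all other multipliers stay zero."""
    n = len(rhs)
    nu_L, nu_U = [0.0] * n, [0.0] * n
    for i in active_lower:
        nu_L[i] = -rhs[i]
    for i in active_upper:
        nu_U[i] = rhs[i]
    return nu_L, nu_U


def kkt_converged(ZTg, c, nu_L, nu_U, tol):
    """Test (53); it reduces to (51) when all bound multipliers are zero."""
    if any(v < 0.0 for v in nu_L) or any(v < 0.0 for v in nu_U):
        return False  # negative multiplier: not a KKT point
    resid = [g - l + u for g, l, u in zip(ZTg, nu_L, nu_U)]
    return max(inf_norm(resid), inf_norm(c)) <= tol


# Illustrative data: the first reduced variable sits at its lower bound.
ZTg = [0.5, 0.0]
rhs = [-g for g in ZTg]          # assumes w_k = 0 and p_Z = 0
nu_L, nu_U = bound_multipliers(rhs, active_lower=[0], active_upper=[])
print(kkt_converged(ZTg, [1e-9], nu_L, nu_U, tol=1e-6))
```

In this example the reduced gradient is nonzero, but it is balanced by a nonnegative lower-bound multiplier, so the point passes the test; without the multiplier it would not.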

4.5. Overall algorithm

We are now ready to state the complete algorithm.

0. Set all tolerances and constants; set $B_1 = I$ if using quasi-Newton updates; set $x_1$, $\mu_1$ and $\delta_1$.

For k = 1, . . .

1. At $x_k$ evaluate $f_k$, $c_k$, $g_k$, $A_k$, $Y_k$ and $Z_k$.
2. Calculate $\lambda_k$ from (13) and, if exact second derivatives are available, calculate $W_k$ and $B_k = Z_k^T W_k Z_k$.
3. Calculate $Y_k p_Y$ from (14).
4. If quasi-Newton options are used, calculate $w_k$ from (41). Else set $w_k = Z_k^T W_k Y_k p_Y$. For the exact option, $W_k$ contains the multipliers from (13). For the Gauss-Newton option, set $\lambda_k = 0$.
5. Solve for $p_Z$ and hence $Z_k p_Z$ as in Sections 3.2–3.3.
6. Update the penalty parameter ($\mu^+$) from (50).
7. Calculate the merit function $f_k + \mu^+ \|c_k\|$.
8. Calculate ared and pred.
9. Check if a second-order correction is needed (figure 4). Define $d = Y_k p_Y + Z_k p_Z + Y_k p_Y^S$ if the second-order correction was needed or $d = Y_k p_Y + Z_k p_Z$ if not. Recalculate ared if the second-order correction was needed.
10. Evaluate the step and update $\delta_1$ and $\delta_2$ from figure 4.
11. If converged, STOP. Else, if ($\delta_1 < \delta_{\min}$), STOP.
12. Calculate $x_{k+1} = x_k + d$.
13. If quasi-Newton updates are used, calculate $w_k$, $s_k$, and $r_k$ as in Section 4.1 and update $B_{k+1}$.

End For


5. Convergence properties

Convergence properties of this approach follow directly from well-known properties of Byrd-Omojokun methods [13, 38], combined with properties of Gay’s algorithm [23, 24]. In this section we state the relevant assumptions and briefly summarize these results. In particular, we show that our algorithm is globally convergent, possesses a Q-quadratic rate of convergence to a critical point, and can converge to strong second-order critical points.

5.1. Global convergence

We use the following assumptions:

1. The first and second derivatives of f(x) and c(x) exist and are Lipschitz continuous in some open set containing the iterates generated by the algorithm.
2. f(x) is uniformly bounded below and g(x), c(x), and A(x) are uniformly bounded, with c(x) being Lipschitz continuous, at all points generated by the algorithm.
3. The sequence of second derivatives $\{W_k\}$ is bounded.
4. The singular values of the null-space matrix, $Z_k$, are bounded above and below, i.e., satisfy: $\kappa_{lbn}^{-1} \le \sigma_{\min}(Z_k) \le \sigma_{\max}(Z_k) \le \kappa_{ubn}$, ∀ k.
5. There is a constant $\kappa_{sv} > 1$ such that for all k, $A(x_k)$ and $C(x_k)$ have singular values that satisfy: $0 < \kappa_{sv}^{-1} \le \sigma_{\min}(A_k) \le \sigma_{\max}(A_k) \le \kappa_{sv}$ and $0 < \kappa_{sv}^{-1} \le \sigma_{\min}(C_k) \le \sigma_{\max}(C_k) \le \kappa_{sv}$.
6. Strict complementarity and the linear independence constraint qualification (LICQ) hold at limit points $x^*$.

We note that from Assumption 5, there exists a constant $\kappa_c$ such that $\|p_Y\| \le \kappa_c \|c(x_k)\|$. With these assumptions, we briefly summarize the properties from [13].

Theorem 1 (All limit points are feasible). Suppose that the above assumptions hold for the algorithm in Section 4, then

$$
\lim_{k \to \infty} c(x_k) = 0. \tag{54}
$$

Proof: The proof is identical to Corollary 15.4.20 and follows from Assumption 6 and Theorem 15.4.19 of [13]. The proof of Theorem 15.4.19 follows from Lemma 15.4.8, which ensures boundedness of the penalty parameter, and Lemma 15.4.17, which ensures a fraction of decrease in the Cauchy step. The proofs in [13] for these lemmas remain unchanged for this algorithm.

Theorem 2. Suppose that the above assumptions hold for the algorithm in Section 4, then

$$
\liminf_{k \to \infty} \bigl\| Z_k^T g_k + \nu_k^U - \nu_k^L \bigr\| = 0, \tag{55}
$$

where the multipliers νU and νL are calculated from Section 4.4.


Proof: The proof is by contradiction. It is based on Theorem 2 in [24], which applies to Gay’s bound constrained algorithm, and follows in a similar manner to Theorem 15.4.10 from [13].

We assume that no subsequence of the iterates has a limit point that satisfies (55). From Theorem 1 we know that $\lim_{k \to \infty} c(x_k) = 0$. We assume that $\|Z_k^T g_k + \nu_k^U - \nu_k^L\| > 0$ for all k and we consider a limit point $x^*$ and a neighborhood

$$
x_k \in B(x^*, \epsilon), \tag{56}
$$

where $\|Z^T(x^*) g(x^*) + \nu_*^U - \nu_*^L\| > 0$. Now consider a particular subsequence which has this limit point; then for ε small enough, we can assume the active set becomes constant. Also, from Lemma 15.4.3 [13] one can show that the trust region within this neighborhood will not decrease and very successful steps can be taken. As a result, since $\|Z_k^T g_k + \nu_k^U - \nu_k^L\|$ is bounded away from zero, the number of iterates within this neighborhood will be finite. Thus, the assumption that $\|Z^T(x^*) g(x^*) + \nu_*^U - \nu_*^L\| > 0$ is false.

Finally we state the theorem that asserts first order criticality of all limit points of all subsequences of iterates.

Theorem 3 (All limit points are first-order critical). Suppose that the above assumptions hold for the algorithm in Section 4, then

$$
\lim_{k \to \infty} \| Z^T(x_k) g(x_k) + \nu^U(x_k) - \nu^L(x_k) \| = 0. \tag{57}
$$

Proof: This property follows directly from Theorem 6.4.6 in [13].

5.2. Fast convergence

Here, we have to consider second order corrections to prevent the Maratos effect. We apply a second order correction of the form $(d_k^{CS})^T = [\, 0 \;\; -c(x_k + d_k)^T C_k^{-T} \,]$, which can be derived from the following equation:

$$
\begin{bmatrix} I & A_k^T \\ A_k & 0 \end{bmatrix}
\begin{bmatrix} d_k^{CS} \\ \lambda_k^{CS} \end{bmatrix}
= -
\begin{bmatrix} g_k^{CS} \\ c(x_k + d_k) \end{bmatrix},
\tag{58}
$$

where $(g_k^{CS})^T = [\, -c(x_k + d_k)^T (C_k^T C_k)^{-1} N_k \mid 0 \,]$. Note that the matrix in (58) is uniformly bounded and $g_k^{CS} = O(\|x_k - x^*\| \, \|d_k\|)$. As a result Theorem 15.4.23 in [13] holds and hence the second order correction does not interfere with the proof of global convergence. To derive the rate of convergence, we assume that in the neighborhood of a critical point:


1. First order conditions and second order sufficiency conditions hold.
2. Second derivatives of f(x) and $c^{(i)}(x)$ exist and are Lipschitz continuous.
3. Assumptions 1, 2, 3, 4, 5 from the previous section.
4. Penalty parameter is bounded above.
5. Strict complementarity holds at every critical point.

Now, the active set is constant, steps are very successful for a small enough trust-region radius, and eventually, the trust-region becomes inactive. Hence, Theorem 15.4.24 in [13] holds and is stated below.

Theorem 4. Let the iterates $\{x_k\}$ and the Lagrange multiplier estimates $\lambda_k$ have a limit point $(x^*, \lambda^*)$ where $A^*$ is full rank and first-order necessary and second-order sufficient conditions hold. Also let Assumptions 1, 2, and 6 from the previous section hold. Let $W_k$ be the exact Hessian of the Lagrangian, i.e., $W_k = \nabla^2_{xx} f(x) + \sum_i \lambda^{(i)} \nabla^2_{xx} c^{(i)}(x)$. Finally, let the Newton step be taken whenever possible when $(x_k, \lambda_k, \nu_k^L, \nu_k^U)$ is close to $(x^*, \lambda^*, \nu^{*,L}, \nu^{*,U})$. Then $\{x_k\}$ converges to $x^*$ Q-superlinearly and $\{x_k\} \to x^*$ Q-quadratically if $\|\lambda_k - \lambda^*\| = O(\|x_k - x^*\|)$ and $\|(\nu_k^L - \nu_*^L,\; \nu_k^U - \nu_*^U)\| = O(\|x_k - x^*\|)$.

5.3. Convergence to second order points

Here we state the same property as in [13].

Theorem 5. Suppose the above assumptions hold and that the exact Hessian is used in the algorithm, then the algorithm converges to a limit point that is a strong second order critical point.

Proof: This property follows directly from the proof of Theorem 15.4.25 in [13].

6. Results

We have tested our algorithm on a set of problems from the CUTE and the COPS sets. The problems selected have nonlinear equality and possibly bound constraints. Many have the structure of parameter estimation problems including ELEC, α-pinene, gasoil, and methanol, which are from the COPS set. In Tables 3–6 we present the results of solving these problems when the trust region is imposed on $p_Z$ and $Z p_Z$, respectively. We have also compared the performance of the algorithm with quasi-Newton Hessian updates and the exact Hessian. The code has been written in Fortran 77 and is compiled with g77 and gcc (versions 2.96) with the compiler flag -g on a computer with dual-Pentium III processors running Linux as the operating system. The algorithm is terminated when the KKT error in (53) is less than $10^{-6}$.

A note on certain problems is in order. The Square problems are three problems we constructed to test our code’s performance in solving square problems. The VARIANT problems are actually two QPs that have been described in [4]. The ORTHREG [29] problems are orthogonal regression problems with (n − m) = O(n), that fit a cardioid orthogonally to


Table 3. Quasi-Newton Hessian: Trust region on pZ .

No. Problem n/m ni/nf/ng/nc/nJ $\delta_1^0$, $\mu_0$ BFGS/SR1

1. Beale 2/0 15/15/30/0/0 1.0, 1.0 B

2. BT11 5/3 68/70/122/70/122 ” S

3. Zangwil2 2/0 3/3/6/0/0 ” B

4. Zangwil3 3/0 5/5/10/0/0 ” S

5. Rosenbrock 2/0 47/57/84/0/0 1.0, 1.0 B

6. Square1 2/2 6/0/0/6/12 1.0, 1.0 B, S

7. Square2 3/3 8/0/0/8/15 ” B, S

8. Square3 2/2 4/0/0/4/8 ” B, S

9. BT12 5/3 13/14/26/14/26 ” B

10. HS46 5/2 28/44/53/44/53 ” S

11. BT4 3/2 8/8/16/8/16 ” B

12. BT8 5/2 11/11/22/11/22 ” B∗

13. BT6 5/2 25/26/46/26/46 ” B

14. BT1 2/1 71/119/110/119/110 1.0, 1.0 B

15. HS78 5/3 11/11/21/11/21 ” S

16. HS79 5/3 10/10/10/10/10 ” B

17. HS56 7/4 16/18/30/18/30 ” B

18. HS40 4/3 6/6/12/6/12 1.0, 1.0 B

19. HS050 5/3 14/14/28/14/28 ” B

20. HS80 5/3 16/22/30/22/30 ” S

21. HS060 3/1 9/9/17/9/17 ” B

22. HS77 5/2 35/40/63/40/63 ” B

23. ELEC 30/10 316/476/534/476/534 0.1, 100.0 B

24. ORTHREGB 27/6 288/399/485/399/485 0.1, 1.0 S

25. ORTHREGC 25/10 733/991/1213/991/1213 0.01, 1.0 B

105/50 – – B, S

26. ORTHREGD 23/10 – – B, S

27. ORTHREGE 36/20 274/515/431/515/431 10.0, 0.1 B‡

28. HANGING CHAIN 402/203 – – B, S

29. GENHS28 300/298 4/4/8/4/8 1.0, 1.0 S

30. HAGER2 201/101 7/7/14/7/14 ” S

31. ORTHREGA 37/16 194/273/326/273/326 1.0, 100.0 S

13/4 33/35/60/35/60 1.0, 1.0 B

32. LCH 300/1 173/209/311/209/311 1.0, 1.0 S

33. HAGER4 1001/501 74/74/148/74/148 1.0, 1.0 B

2001/1001 ” B

34. VARIANT1 10000/9999 9/9/18/9/18 ” B

35. VARIANT2 2000/1000 4/4/8/4/8 ” B

36. α-pinene 1005/1000 – – B, S

37. Gasoil 4003/4000 66/69/125/69/125 1.0, 1.0 B

38. Methanol 4805/4800 62/111/104/111/104 100.0, 1.0 B

‡Different solution.


Table 4. Exact Hessian: Trust region on pZ .

No. Problem n/m ni/nf/ng/nc/nJ $\delta_1^0$, $\mu_0$ ni/nf/ng/nc/nJ (with $\delta_1^0 = 1.0$, $\mu_0 = 1.0$)

1. Beale 2/0 10/10/8/0/0 1.0, 1.0

2. BT11 5/3 11/11/10/11/10 1.0, 1.0

3. Zangwil2 2/0 3/3/3/0/0 1.0, 1.0

4. Zangwil3 3/0 5/5/5/0/0 ”

5. Rosenbrock 2/0 30/30/24/0/0 0.1, 1.0 51/51/30/0/0

6. Square1 2/2 6/0/0/6/6 1.0, 1.0

7. Square2 3/3 8/0/0/8/7 ”

8. Square3 2/2 4/0/0/4/4 ”

9. BT12 5/3 6/6/6/6/6 ”

10. HS46 5/2 16/26/16/26/16 ”

11. BT4 3/2 6/6/6/6/6 ”

12. BT8 5/2 11/11/11/11/11 ”

13. BT6 5/2 15/17/13/17/13 ”

14. BT1 2/1 14/16/11/16/11 ”

15. HS78 5/3 5/5/5/5/5 ”

16. HS79 5/3 8/8/8/8/8 ”

17. HS56 7/4 9/9/9/9/9 ”

18. HS40 4/3 4/4/4/4/4 1.0, 1.0

19. HS050 5/3 10/10/10/10/10 ”

20. HS80 5/3 6/6/6/6/6 ”

21. HS060 3/1 6/6/6/6/6 ”

22. HS77 5/2 13/14/12/14/12 ”

23. ELEC 30/10 59/76/35/76/35 1.0, 10.0 270/322/184/322/184

75/25 145/195/84/195/84 0.1, 100.0 699/771/565/771/565

150/50 181/237/112/237/112 10.0, 100.0 454/585/282/585/282

300/100 766/1175/469/1175/469 10.0, 100.0 996/1294/628/1294/628

24. ORTHREGB 27/6 7/7/7/7/7 1.0, 1.0

25. ORTHREGC 25/10 12/12/10/12/10 1.0, 10.0 38/38/26/38/26

105/50 90/117/56/117/56 1.0, 1.0

26. ORTHREGD 23/10 55/70/31/70/31 10.0, 1.0 84/104/52/104/52

27. ORTHREGE 36/20 22/30/19/30/19 0.1, 1.0 592/1137/327/1137/327 ‡

28. HANGING CHAIN 402/203 220/225/206/225/206 1.0, 10.0 slow (>1000)

29. GENHS28 300/298 4/4/4/4/4 ”

30. HAGER2 201/101 6/6/5/6/5 ”

31. ORTHREGA 517/256 25/25/18/25/18 1000.0, 1.0 slow (>1000)

133/64 47/57/31/57/31 1.0, 10.0 309/428/184/428/184

37/16 45/54/27/54/27 10.0, 1.0 50/59/33/59/33

13/4 8/10/6/10/6 1.0, 1.0




32. LCH 300/1 26/31/19/31/19 10.0, 1.0 700+

33. HAGER4 1001/501 5/5/5/5/5 1.0, 1.0

2001/1001 6/6/6/6/6 ”

34. VARIANT1 10000/9999 7/7/7/7/7 ”

35. VARIANT2 2000/1000 4/4/4/4/4 ”

36. α-pinene 4005/4000 13/13/13/13/13 1.0, 1.0

3005/3000 13/13/13/13/13 ”

2005/2000 11/11/11/11/11 ”

1005/1000 11/11/11/11/11 ”

37. Gasoil 4003/4000 44/44/36/44/36 1.0, 1.0

38. Methanol 4805/4800 5/5/5/5/5 100.0, 1.0 slow (>1000)


a set of points in a plane. The problems are set up so that at their initial points, both the objective function and its gradient are zero. Hence, the initial tangential step and the multipliers for the equality constraints are zero. The problems α-pinene, gasoil, and methanol are similar to the process engineering problems we are interested in solving. These problems are discretized-ODE-constrained parameter estimation problems, where the number of parameters remains constant and small as compared to the total number of variables. In each case, the objective is to minimize the least-squares error in measurements. We have solved four instances of the α-pinene and methanol problems, and three instances of the gasoil problem (Table 7). These instances differ in the number of discretization time intervals and we have used the sizes presented in [17]. A robust algorithm must solve all instances with a similar amount of effort as measured by the number of function evaluations.

Our algorithm is able to solve most of the problems with reasonable computational effort (defined as the number of function evaluations being less than 1000) when quasi-Newton Hessian approximations are used. The Jacobian and Hessian matrices of some problems tend to become ill-conditioned as the iterations proceed. This often leads to steps that cause small decreases in the merit function. The algorithm is not able to solve some of the ORTHREG problems when the quasi-Newton updates are used for the Hessian. Biegler et al. [4] have observed that frequent changes in basis, as C becomes ill-conditioned, are required for their algorithm to perform reasonably. This is a feature that we do not allow in our algorithm due to the structure of the problems we are interested in solving. The ELEC problem from the COPS collection is another difficult problem. Here, the problem is to find the equilibrium state distribution of electrons positioned on a conducting sphere. The problem has many local minima where the objective function value is very close to that at the global minimum. The number of local minima also grows exponentially with the number of variables making the determination of the global minimum very


Table 5. Quasi-Newton Hessian: Trust region on Z pZ .

No. Problem n/m ni/nf/ng/nc/nJ $\delta_1^0$, $\mu_0$ BFGS/SR1

1. Beale 2/0 17/17/34/0/0 ” B

2. BT11 5/3 59/60/106/60/106 1.0, 10.0 S

3. Zangwil2 2/0 3/3/6/0/0 1.0, 1.0 B

4. Zangwil3 3/0 5/5/10/0/0 ” S

5. Rosenbrock 2/0 47/57/84/0/0 0.1, 1.0 B

6. Square1 2/2 6/0/0/6/12 1.0, 1.0 B, S

7. Square2 3/3 8/0/0/8/15 ” B, S

8. Square3 2/2 4/0/0/4/8 ” B, S

9. BT12 5/3 11/11/22/11/22 ” B

10. HS46 5/2 39/60/77/60/77 ” B

11. BT4 3/2 8/8/16/8/16 ” B

12. BT8 5/2 11/11/22/11/22 ” B

13. BT6 5/2 30/33/58/33/58 ” B

14. BT1 2/1 154/264/243/264/243 1.0, 1.0 B

15. HS78 5/3 14/14/26/14/26 ” S

16. HS79 5/3 16/16/29/16/29 ” B

17. HS56 7/4 14/14/26/14/26 ” B

18. HS40 4/3 6/6/12/6/12 ” B

19. HS050 5/3 11/11/22/11/22 ” S

20. HS80 5/3 16/16/31/16/31 ” B

21. HS060 3/1 10/10/19/10/19 ” S

22. HS77 5/2 22/23/41/23/41 ” B

23. ELEC 30/10 257/419/425/419/425 0.01, 100.0 B

24. ORTHREGB 27/6 – – B

25. ORTHREGC 25/10 428/500/785/500/785 1.0, 1.0 B

26. ORTHREGD 23/10 – – B, S

27. ORTHREGE 36/20 303/452/507/452/507 10.0, 0.1 B ‡

28. HANGING CHAIN 403/203 slow (>1000) – B

29. GENHS28 300/298 4/4/8/4/8 1.0, 1.0 B

4/4/8/4/8 ” S

30. HAGER2 2001/1001 7/7/14/7/14 ” B

31. ORTHREGA 37/16 624/639/1228/639/1228 10.0, 1.0 BFGS w/o damping

13/4 47/48/88/48/88 0.1, 1.0 B

32. LCH 300/1 626/807/1075/807/1075 1.0, 1.0 B

33. HAGER4 1001/501 74/74/148/74/148 1.0, 1.0 B

34. VARIANT1 10000/9999 9/9/18/9/18 ” B

35. VARIANT2 2000/1000 4/4/8/4/8 ” B

” ” S

36. α-Pinene 1005/1000 – – B

37. Gasoil 4003/4000 102/107/193/107/193 1.0, 1.0 B

38. Methanol 4805/4800 56/102/97/102/97 100.0, 1.0 B



Table 6. Exact Hessian: Trust region on Z pZ , full quadratic model for pred.



1. Beale 2/0 10/10/8/0/0 1.0, 1.0

2. BT11 5/3 12/12/11/12/11 ”

3. Zangwil2 2/0 3/3/3/0/0 ”

4. Zangwil3 3/0 5/5/5/0/0 ”

5. Rosenbrock 2/0 30/30/24/0/0 0.1, 1.0 50/50/29/0/0

6. Square1 2/2 6/0/0/6/6 1.0, 1.0

7. Square2 3/3 8/0/0/8/7 ”

8. Square3 2/2 4/0/0/4/4 ”

9. BT12 5/3 6/6/6/6/6 ”

10. HS46 5/2 16/26/16/26/16 ”

11. BT4 3/2 6/6/6/6/6 ”

12. BT8 5/2 38/61/16/61/16 0.1, 1.0 †

13. BT6 5/2 21/23/17/23/17 1.0, 1.0

14. BT1 2/1 16/17/10/17/10 ”

15. HS78 5/3 5/5/5/5/5 ”

16. HS79 5/3 8/8/8/8/8 ”

17. HS56 7/4 5/5/5/5/5 ”

18. HS40 4/3 4/4/4/4/4 ”

19. HS050 5/3 10/10/10/10/10 ”

20. HS80 5/3 6/6/6/6/6 ”

21. HS060 3/1 7/7/7/7/7 ”

22. HS77 5/2 12/13/11/13/11 ”

23. ELEC 30/10 157/167/110/167/110 1.0, 1.0

75/25 101/106/64/106/64 10.0, 10.0 568/580/307/580/307

150/50 230/241/156/241/156 10.0, 100.0 FAIL

300/100 255/271/162/271/162 10.0, 100.0 522/531/412/531/412

24. ORTHREGB 27/6 14/16/12/16/12 1.0, 1.0

25. ORTHREGC 25/10 17/18/16/18/16 10.0, 1.0 92/93/63/93/63

105/50 536/553/430/553/430 ‡ 1.0, 1.0

26. ORTHREGD 23/10 30/32/21/32/21 1.0, 10.0 57/61/37/61/37

27. ORTHREGE 36/20 686/1342/363/1342/363 ‡ 0.1, 1.0 slow (>1000)

28. HANGING CHAIN 403/203 102/114/86/114/86 1.0, 1.0

29. GENHS28 300/298 4/4/4/4/4 1.0, 1.0

30. HAGER2 2001/1001 5/5/5/5/5 ”

31. ORTHREGA 517/256 17/18/14/18/14 1000.0, 1.0 slow (>1000)

133/64 19/19/18/19/18 1.0, 1.0

37/16 30/34/21/34/21 0.1, 1.0 216/218/158/218/158

13/4 24/33/15/33/15 1.0, 1.0 77/94/42/94/42




32. LCH 300/1 FAIL FAIL

33. HAGER4 1001/501 5/5/5/5/5 1.0, 1.0

2001/1001 4/4/4/4/4 ”

34. VARIANT1 10000/9999 6/6/6/6/6 ”

35. VARIANT2 2000/1000 4/4/4/4/4 ”

36. α-Pinene 4005/4000 13/13/13/13/13 1.0, 1.0

3005/3000 13/13/13/13/13 1.0, 1.0

2005/2000 11/11/11/11/11 1.0, 1.0

1005/1000 11/11/11/11/11 ”

37. Gasoil 4003/4000 61/61/54/61/54 1.0, 1.0

38. Methanol 4805/4800 5/5/5/5/5 ” slow (>1000)

†Numerical difficulties; ‡Different solution.

Table 7. Comparison of quasi-Newton, exact, and Gauss-Newton Hessian calculations on parameter estimation problems.

Column groups, left to right: q-N Hessian, Exact Hessian, G-N Hessian

Problem n/m ni/nf/nc ni/nf/nc ni/nf/nc

ORTHREGB 27/6 288/399/399 7/7/7 6/6/6

ORTHREGC 25/10 733/991/991 38/38/38 38/38/38

105/50 – 90/117/117 80/81/81

ORTHREGD 23/10 – 55/70/70 19/19/19

ORTHREGA 517/256 – 24/25/25 248/325/325

133/64 – 47/57/57 88/124/124

37/16 194/273/273 45/54/54 89/122/122

13/4 33/35/35 8/10/10 5/5/5

α-Pinene 1005/1000 – 11/11/11 20/25/25∗

2005/2000 – 11/11/11 21/27/27∗∗

3005/3000 – 13/13/13 15/16/16∗∗∗

4005/4000 – 13/13/13 15/16/16∗∗∗

Gasoil 1003/1000 68/80/80 59/83/83 59/83/83

2002 / 2000 67/81/81 48/48/48 47/47/47

4003/4000 66/69/69 44/44/44 46/46/46

Methanol 605/600 47/79/79 5/5/5 6/6/6

1205/1200 52/91/91 5/5/5 6/6/6

2405/2000 55/100/94 5/5/5 6/6/6

4805/4800 62/111/111 5/5/5 6/6/6

∗Looser tolerance (1×10−5); ∗∗Looser tolerance (1×10−3); ∗∗∗Looser tolerance (1×10−2).


difficult. Ill-conditioning of the quasi-Newton Hessian matrices is also observed on this problem.

By implementing our algorithm with the exact Hessian calculation, we are nevertheless able to converge these problems with fairly competitive results when compared to [5, 17, 33]. This shows that our algorithm is fairly robust. The last three problems, α-pinene, gasoil, and methanol, are parameter estimation problems. In each case, the objective is to minimize the least-squares error in measurements subject to a system of ordinary differential equations. The algorithm takes fewer than 50 function evaluations in most cases and fewer than 100 function evaluations on the largest instance of gasoil with 8003 variables. In most cases, the algorithm converges in fewer than 15 function evaluations. The performance of the algorithm is insensitive to the size of the problem, in that the number of function evaluations stays the same or changes only slightly.

We also compare the performance of using a trust region on $p_Z$ with a trust region on $Z p_Z$, using M = 2.5. In most cases, some of the small problems take slightly more function evaluations when the trust-region is on $Z p_Z$. This is attributed to the fact that these problems are well conditioned and the algorithm “wants” to take a Newton step but is restricted by imposing the trust-region on $Z p_Z$. (A larger value of M can correct this behavior.) However, imposing the trust-region on $Z p_Z$ gives better performance on ill-conditioned problems like Hanging Chain and ORTHREGA than when we impose a trust-region on $p_Z$ (Tables 4 and 6).

We then compare the performance of the quasi-Newton, the Gauss-Newton, and the exact Hessian calculations on some parameter estimation problems in Table 7. Observe that the quasi-Newton Hessian calculations perform poorly with respect to the other two cases. The problem α-pinene and the larger instances of ORTHREGA and ORTHREGC could not be solved by quasi-Newton Hessian approximations. Gauss-Newton approximation performs well on the small-residual problems like the smallest ORTHREGC and ORTHREGA, gasoil, and methanol. It takes significantly more function evaluations to solve the large-residual problems like the larger instances of ORTHREGA. It also cannot solve the two largest instances of α-pinene. However, for small or zero residual problems, using the Gauss-Newton Hessian approximation seems to be a cheap and reliable alternative to calculating the Hessian of the Lagrangian or using ill-conditioned quasi-Newton approximations.

We finally give some comparison results with recent algorithms IPOPT [45, 46] and KNITRO [9], to highlight some of the characteristics of our algorithm. Both of these algorithms are based on an interior-point approach, rather than the active set strategy that we use. KNITRO solves the barrier problems by a trust region SQP approach. It uses an orthogonal basis for quasi-normal and tangential steps, and solves the latter sub-problem using a truncated Newton method. On the other hand, IPOPT uses a line search algorithm with a filter instead of a merit function and solves the QPs in full-space. Loss of reduced space positive curvature is treated by augmenting the Hessian matrix with a positive diagonal matrix. Both algorithms use the exact Hessian of the Lagrangian.

Compared to KNITRO (version 1.0), Table 8, we obtain better performance in terms of iterations, function, gradient, and constraint evaluations on 8 problems. We obtain comparable performance on 4 problems and worse performance on the remaining problems. We


Table 8. Comparison with KNITRO.

Column groups, left to right: $\|Z p_Z\| \le 2.5\,\delta_1$, $\|p_Z\| \le \delta_1$, KNITRO

No. Problem n/m ni/nf/ng/nc ni/nf/ng/nc ni/nf/ng/nc

1. Beale 2/0 10/10/8/0 10/10/8/0 7/8/8/8

2. BT11 5/3 12/12/11/12 11/11/10/11 6/7/7/6

3. Zangwil2 2/0 3/3/3/0 3/3/3/0 2/3/3/3

4. Zangwil3 3/0 5/5/5/0 5/5/5/0 4/5/5/5

5. Rosenbrock 2/0 30/30/24/0 30/30/24/0 25/26/21/26

6. BT12 5/3 6/6/6/6 6/6/6/6 5/7/6/7

7. HS46 5/2 16/26/16/26 16/26/16/26 16/17/17/17

8. BT4 3/2 6/6/6/6 6/6/6/6 6/7/7/7

9. BT8 5/2 38/61/16/61 11/11/11/11 7/8/8/7

10. BT6 5/2 21/23/17/23 15/17/13/17 10/12/10/12

11. BT1 2/1 16/17/10/17 14/16/11/16 5/6/6/6

12. HS78 5/3 5/5/5/5 5/5/5/5 4/5/5/5

13. HS79 5/3 8/8/8/8 8/8/8/8 6/7/7/7

14. HS56 7/4 5/5/5/5 9/9/9/9 96/144/52/144‡

15. HS40 4/3 4/4/4/4 4/4/4/4 3/4/4/4

16. HS050 5/3 10/10/10/10 10/10/10/10 9/10/10/10

17. HS80 5/3 6/6/6/6 6/6/6/6 10/11/11/11

18. HS060 3/1 7/7/7/7 6/6/6/6 8/9/9/9

19. HS77 5/2 12/13/11/13 13/14/13/14 10/12/10/12

20. ELEC 30/10 157/167/110/167 59/76/35/76 36/58/23/58

75/25 101/106/64/106 145/195/84/195 44/63/28/63

150/50 230/241/156/241 181/237/112/237 46/63/33/63

300/100 255/271/162/271/162 766/1175/469/1175/469 90/146/53/146

21. ORTHREGB 27/6 14/16/12/16 7/7/7/7 3/4/4/4

22. ORTHREGC 25/10 17/18/16/18 12/12/10/12 9/10/10/10

23. ORTHREGD 23/10 30/32/21/32 55/70/31/70 8/9/9/9

24. ORTHREGE 36/20 686/1342/363/1342 22/30/19/30 92/163/51/163‡

25. GENHS28 300/298 4/4/4/4 4/4/4/4 2/3/3/3

26. HAGER2 2001/1001 5/5/5/5 6/6/5/6 3/4/4/4

27. ORTHREGA 517/256 17/18/14/18 25/25/18/25 †

28. HAGER4 1001/501 5/5/5/5 5/5/5/5 13/14/14/14

2001/1001 4/4/4/4 6/6/6/6 14/15/15/15

29. α-Pinene 1005/1000 11/11/11/11 11/11/11/11 4/5/5/5

30. Gasoil 4003/4000 61/61/54/61 44/44/36/44 353/671/188/671

31. Methanol 4805/4800 5/5/5/5 5/5/5/5 93/173/55/173

†Different objective attained; ‡Same objective function, some variables different.


removed from consideration the problems ORTHREGE, ORTHREGA, and HS56 because KNITRO obtains a different solution. On the large parameter estimation problems gasoil and methanol, our algorithm performs considerably better. We attribute our worse performance on the ELEC and the ORTHREG problems to the fact that we do not change the basis selection at all. We then examine IPOPT (version 1.6) in Table 9. Here, we obtain better performance on 10 problems, worse on 20, and comparable on 2. We removed the problem HS56 from consideration because IPOPT obtains a different solution.

We represent the above discussion in figures 5–7. These are performance profile plots of the kind proposed by Dolan and Moré [18]. Here, we compare different solvers based upon a certain characteristic: function evaluations in our case. For each problem we identify the solver that takes the minimum number of function evaluations to solve that problem. If the minimum number of function evaluations to solve problem i is $f_i^{\min}$ and the total number of problems is $n_p$, for each solver, s, we calculate the ratio:

$$
\varrho_i^s = \log_2 \frac{f_i^s}{f_i^{\min}}, \tag{59}
$$

[Figure 5: performance profile. x-axis: factor difference between function evaluations and minimum function evaluations (0 to 10); y-axis: fraction of problems solved (0.2 to 1); curves: quasi-Newton Hessian, exact Hessian.]

Figure 5. Results on CUTE problems.


Table 9. Comparison with IPOPT: full-space version, filter.

Column groups, left to right: $\|Z p_Z\| \le 2.5\,\delta_1$, $\|p_Z\| \le \delta_1$, IPOPT

No. Problem n/m ni/nf/nc ni/nf/nc ni/nf/nc

1. Beale 2/0 10/10/0 10/10/0 6/13/0

2. BT11 5/3 12/12/12 11/11/11 7/8/8

3. Zangwil2 2/0 3/3/0 3/3/0 1/2/0

4. Zangwil3 3/0 5/5/0 5/5/0 1/2/0

5. Rosenbrock 2/0 30/30/0 30/30/0 21/33/0

6. BT12 5/3 6/6/6 6/6/6 4/5/5

7. HS46 5/2 16/26/16 16/26/26 13/14/14

8. BT4 3/2 6/6/6 6/6/6 13/14/14

9. BT8 5/2 38/61/61 11/11/11 28/29/29

10. BT6 5/2 21/23/23 15/17/17 13/18/18

11. BT1 2/1 16/17/17 14/16/16 14/152/161

12. HS78 5/3 5/5/5 5/5/5 4/5/5

13. HS79 5/3 8/8/8 8/8/8 4/5/5

14. HS56 7/4 5/5/5 9/9/9 60/89/89‡

15. HS40 4/3 4/4/4 4/4/4 3/4/4

16. HS050 5/3 10/10/10 10/10/10 9/10/10

17. HS80 5/3 6/6/6 6/6/6 7/8/8

18. HS060 3/1 7/7/7 6/6/6 7/8/8

19. HS77 5/2 12/13/13 13/14/14 10/17/17

20. ELEC 30/10 157/167/167 59/76/76 31/32/32

75/25 101/106/106 145/195/195 68/91/91

150/50 230/241/241 181/237/237 68/69/69

300/100 255/271/162 766/1175/469 122/123/123

21. ORTHREGB 27/6 14/16/16 7/7/7 2/3/3

22. ORTHREGC 25/10 17/18/18 12/12/12 19/25/25

23. ORTHREGD 23/10 30/32/32 55/70/70 10/12/12

24. ORTHREGE 36/20 686/1342/1342 22/30/30 57/182/182‡

25. GENHS28 300/298 4/4/4 4/4/4 1/2/2

26. HAGER2 2001/1001 5/5/5 6/6/6 1/2/2

27. ORTHREGA 517/256 17/18/18 25/25/25 48/70/70

28. HAGER4 1001/501 5/5/5 5/5/5 10/11/11

2001/1001 4/4/4 6/6/6 10/11/11

29. α-Pinene 1005/1000 11/11/11 11/11/11 9/10/10

30. Gasoil 4003/4000 61/61/61 44/44/44 17/32/31

31. Methanol 4805/4800 5/5/5 5/5/5 10/11/11

‡Same objective function, some variables different.


[Figure 6: performance profile. x-axis: 0 to 10; y-axis: fraction of problems solved (0 to 1); curves: quasi-Newton Hessian, exact Hessian, Gauss-Newton Hessian.]

Figure 6. Parameter estimation problems.

where $f_i^s$ is the number of function evaluations taken by solver s for problem i. Then for each fixed value of u ∈ {1, 2, 4, . . .}, we calculate the fraction of problems with ratio at most u:

$$
\varsigma_u^s = \sum_{i \,:\, \varrho_i^s \le u} \frac{1}{n_p}. \tag{60}
$$

Clearly, as u increases, so does $\varsigma_u^s$ until s is able to solve the maximum number of problems possible. In figures 5–7, we have plotted $\varsigma_u^s$ on the y-axis against various values of u on the x-axis.

In figure 5, we compare the performance of the quasi-Newton Hessian calculation with the exact Hessian calculation when the trust-region is on $p_Z$. Here, we have used only the problems that the quasi-Newton option could solve. Observe that the exact Hessian option solves all the problems in the least number of function evaluations. The quasi-Newton option eventually solves ≈92% of the problems. In figure 6, we compare the performance of the quasi-Newton Hessian approximations, the exact Hessian calculation, and the Gauss-Newton Hessian evaluation on the parameter estimation problems of Table 7. Here, the trust-region has been imposed on $p_Z$. Observe that here, the quasi-Newton Hessian approximations are the worst performers. Gauss-Newton approximations fail on α-pinene. For this problem, $\|\lambda^*\|$ is not small and the Gauss-Newton assumption is violated.
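The profile construction in (59) and (60) can be sketched as follows. This is an illustrative reading of the formulas with hypothetical function and variable names; the function-evaluation counts are the nf entries for ORTHREGB (27/6), ORTHREGC (25/10), and ORTHREGD (23/10) from Table 7, with `None` marking a problem that a solver could not solve, and the grid of u values is chosen arbitrarily.

```python
import math


def performance_profile(evals, u_values):
    """evals maps a solver name to its list of function-evaluation counts,
    one entry per problem (None = not solved). Returns, for each solver,
    the fraction of problems whose log2 ratio (59) is at most u, as in (60)."""
    solvers = list(evals)
    n_p = len(evals[solvers[0]])
    # Best (smallest) count per problem among the solvers that solved it.
    best = []
    for i in range(n_p):
        solved = [evals[s][i] for s in solvers if evals[s][i] is not None]
        best.append(min(solved) if solved else None)
    profile = {}
    for s in solvers:
        ratios = [
            math.log2(evals[s][i] / best[i]) if evals[s][i] is not None
            else math.inf  # unsolved problems never enter the count
            for i in range(n_p)
        ]
        profile[s] = [sum(r <= u for r in ratios) / n_p for u in u_values]
    return profile


evals = {
    "quasi_newton": [399, 991, None],
    "exact": [7, 38, 70],
    "gauss_newton": [6, 38, 19],
}
profile = performance_profile(evals, u_values=[0, 1, 2, 8])
for solver, fractions in profile.items():
    print(solver, fractions)
```

The value at u = 0 is the fraction of problems on which a solver is the best (or tied for best), and the limiting value for large u is the fraction of problems it solves at all.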


[Figure 7: performance profile. x-axis: 0 to 10; y-axis: fraction of problems solved (0.2 to 1); curves: trust-region on ZpZ, trust-region on pZ, KNITRO, IPOPT.]

Figure 7. Comparison with KNITRO and IPOPT.

To generate these plots, we have used the best performance provided by the different Hessian options. Finally, in figure 7, we compare our algorithm to KNITRO and IPOPT. Here, we have used exact second-order information. The two options with which we use our solver are the trust-region on $p_Z$ or $Z p_Z$. Observe that both KNITRO and IPOPT are quite efficient on most problems. However, eventually our algorithm solves all the problems before the others do, as seen in figure 7. We have again used the best possible results obtained by our algorithm.

At this stage, there are various parameters in the algorithm that may be tuned to somewhat alter iteration count. These are the initial radius of the trust regions, the initial penalty parameter, and the relationship between the trust regions on $p_Z$ and $Zp_Z$. For most problems, we were able to obtain good results with $(\delta^0_1, \mu^0) = (1, 1)$. We did not experiment with different values of $M$ because our objective at this stage was to check if our approach was suitable for the class of problems that we have described.

7. Conclusions

Parameter estimation problems in process engineering are characterized by few decision variables with bound constraints and potentially many equality constraints. Moreover, provisions must be made to deal with loss of positive curvature in the reduced Hessian. To deal with these problems, we describe a robust trust region SQP algorithm and compare its performance with existing solvers.

Our approach is based on the approach of Byrd and Omojokun with tangential and quasi-normal subproblems. Here we solve the quasi-normal problem using a conventional, inexact dogleg approach. On the other hand, because $(n - m)$ is relatively small, the tangential subproblem is solved using the bound constraint trust region algorithm of Gay [23], which uses direct matrix factorizations. This trust region approach seems to be well suited for this class of parameter estimation problems.

The algorithm detects and exploits negative curvature and has strong descent properties. In comparisons with general purpose line search (IPOPT) and trust region (KNITRO) methods, the algorithm performs quite well when exact Hessian information is used with first-order Lagrange multipliers. As future work we will be fine-tuning the implementation and considering more diverse and larger problems in process engineering.

Acknowledgments

We are grateful to Prof. Jorge Nocedal for his insightful comments during the course of this work. Financial support from the ELKEM Foundation is gratefully acknowledged.

References

1. J.S. Albuquerque, “Parameter estimation and data reconciliation for dynamic systems,” PhD thesis, Carnegie Mellon University, 1996.

2. N. Arora and L.T. Biegler, “Redescending estimators for data reconciliation and parameter estimation,” Computers and Chemical Engineering, vol. 25, p. 1585, 2001.

3. G. Bader and U. Ascher, “A new basis implementation for a mixed order boundary value ODE solver,” SIAM J. Scientific and Statistical Computing, vol. 8, p. 483, 1987.

4. L.T. Biegler, J. Nocedal, C. Schmid, and D.J. Ternet, “Numerical experience with a reduced Hessian method for large scale constrained optimization,” Computational Optimization and Applications, vol. 15, p. 45, 2000.

5. L.T. Biegler, J. Nocedal, and C. Schmid, “A reduced Hessian method for large-scale constrained optimization,” SIAM J. Optimization, vol. 5, no. 2, p. 314, 1995.

6. A. Bjorck, E. Grimme, and P. van Dooren, “An implicit shift bidiagonalization algorithm for ill-posed systems,” BIT, vol. 34, p. 510, 1994.

7. I. Bongartz, A.R. Conn, N.I.M. Gould, and Ph.L. Toint, “CUTE: Constrained and unconstrained testing environment,” ACM Transactions on Mathematical Software, vol. 21, no. 1, p. 123, 1995.

8. D.S. Bunch, D.M. Gay, and R.E. Welsch, “Algorithm 717: Subroutines for maximum likelihood and quasi-likelihood estimation of parameters in nonlinear regression models,” ACM Transactions on Mathematical Software, vol. 19, no. 1, p. 109, 1993.

9. R.H. Byrd, M.E. Hribar, and J. Nocedal, “An interior point algorithm for large scale nonlinear programming,” SIAM J. Optimization, vol. 9, no. 4, pp. 877–900, 1999.

10. R.H. Byrd and J. Nocedal, “An analysis of reduced Hessian methods for constrained optimization,” Mathematical Programming, vol. 49, pp. 285–323, 1991.

11. D. Calvetti, L. Reichel, and Q. Zhang, “Estimation of the L-curve via Lanczos bidiagonalization,” BIT, vol. 39, no. 4, p. 603, 1999.

12. T.F. Coleman and Y. Li, “An interior trust region approach for nonlinear minimization subject to bounds,” SIAM J. Optimization, vol. 6, pp. 418–445, 1996.


13. A. Conn, N. Gould, and P. Toint, Trust-region methods, MPS-SIAM Series on Optimization, Philadelphia, PA, 2000.

14. J.E. Dennis, M. El-Alem, and M.C. Maciel, “A global convergence theory for general trust region based algorithms for equality constrained optimization,” SIAM J. Opt., vol. 7, p. 177, 1997.

15. J.E. Dennis Jr., D.M. Gay, and R.E. Welsch, “An adaptive nonlinear least-squares algorithm,” ACM Transactions on Mathematical Software, vol. 7, no. 3, p. 348, 1981.

16. J.E. Dennis, M. Heinkenschloss, and L. Vicente, “Trust-region interior-point SQP algorithms for a class of nonlinear programming problems,” SIAM J. Control and Optimization, vol. 36, no. 5, p. 1750, 1998.

17. E.D. Dolan and J. More, “Benchmarking optimization software with COPS,” Technical report, ANL/MCS-246, Argonne National Laboratory, 2001.

18. E.D. Dolan and J. More, “Benchmarking optimization software with performance profiles,” Mathematical Programming Series A, on-line, 2002.

19. M. El-Alem, “A global convergence theory for the Celis-Dennis-Tapia trust region algorithm for constrained optimization,” SIAM J. Numerical Analysis, vol. 28, p. 266, 1991.

20. R. Fletcher, N. Gould, S. Leyffer, P. Toint, and A. Wachter, “Global convergence of a trust region SQP filter algorithm for general nonlinear programming,” SIAM J. Optimization, vol. 13, no. 3, pp. 635–665, 2003.

21. R. Fletcher and S. Leyffer, “Nonlinear programming without a penalty function,” Mathematical Programming, vol. 92, no. 2, p. 239, 2002.

22. R. Fletcher and C. Xu, “Hybrid methods for nonlinear least squares,” IMA J. of Numerical Analysis, vol. 7, p. 371, 1987.

23. D.M. Gay, “Computing optimal locally constrained steps,” SIAM J. Scientific and Statistical Computing, vol. 2, no. 2, p. 186, 1981.

24. D.M. Gay, “A trust-region approach to linearly constrained optimization,” Numerical Analysis Proceedings (Dundee, 1983). D.F. Griffiths, (Ed.), Springer-Verlag, vol. 72, 1983.

25. S.M. Goldfeld, R.E. Quandt, and H.F. Trotter, “Maximization by quadratic hill climbing,” Econometrica, vol. 34, no. 3, p. 541, 1966.

26. G. Golub and U. von Matt, “Quadratically constrained least squares and quadratic problems,” Numerische Mathematik, vol. 59, p. 561, 1991.

27. F. Gomes, M.C. Maciel, and J.M. Martinez, “Nonlinear programming algorithms using trust regions and augmented Lagrangians with nonmonotone penalty parameters,” Mathematical Programming, vol. 84, p. 161, 1999.

28. A. Griewank, D. Juedes, and J. Utke, “ADOL-C: A package for the automatic differentiation of algorithms written in C / C++,” ACM Transactions on Mathematical Software, vol. 22, no. 2, p. 131, 1996.

29. M. Gullicksson, “Algorithms for nonlinear least-squares with applications to orthogonal regression,” Technical report, UMINF-178.90, University of Umea, Sweden, 1990.

30. P.C. Hansen, Rank Deficient and Discrete Ill-Posed Problems, SIAM: Philadelphia, 1999.

31. M.D. Hebden, “An algorithm for minimization using exact second derivatives,” Technical report, Atomic Energy Research Establishment report T.P. 515, Harwell, England, 1973.

32. N. Krejic, J.M. Martinez, M. Mello, and E. Pilotta, “Validation of an augmented Lagrangian algorithm with a Gauss-Newton Hessian approximation using a set of hard-spheres problems,” Computational Optimization and Applications, vol. 16, p. 247, 2000.

33. M. Lalee, J. Nocedal, and T. Plantenga, “On the implementation of an algorithm for large-scale equality constrained optimization,” SIAM J. Optimization, vol. 8, no. 3, p. 682, 1998.

34. S. Leyffer, R. Fletcher, and P. Toint, “On the global convergence of a filter SQP algorithm,” SIAM J. Optimization, vol. 13, no. 1, p. 44, 2002.

35. C. Lin and J.J. More, “Newton’s method for large bound-constrained optimization problems,” SIAM Journal on Optimization, vol. 9, no. 4, p. 1100, 1999.

36. J.J. More, The Levenberg-Marquardt Algorithm: Implementation and Theory, volume Numerical Analysis, Dundee 1977. Springer-Verlag: Berlin, 1977.

37. J. Nocedal and M.L. Overton, “Projected Hessian updating algorithms for nonlinearly constrained optimization,” SIAM J. Numer. Anal., vol. 22, no. 5, p. 821, 1985.

38. J. Nocedal and S.J. Wright, Numerical Optimization, Springer: New York, 1999.


39. E.O. Omojokun, “Trust region algorithms for optimization with nonlinear equality and inequality constraints,” PhD thesis, Department of Computer Science, University of Colorado, Boulder, 1989.

40. M. Rojas, “A large scale trust region approach to the regularization of discrete ill-posed problems,” PhD thesis, Rice University, 1998.

41. M. Rojas, S.A. Santos, and D.C. Sorensen, “A matrix free algorithm for the large-scale trust region subproblem,” SIAM J. Optimization, vol. 11, no. 3, p. 611, 2000.

42. M. Rojas and D.C. Sorensen, “A trust region approach to the regularization of large-scale discrete forms of ill-posed problems,” SIAM J. on Scientific Computing, vol. 23, no. 6, pp. 1842–1860, 2002.

43. I.B.F. Tjoa, “Simultaneous solution and optimization strategies for data analysis,” PhD thesis, Carnegie Mellon University, 1991.

44. M. Ulbrich and S. Ulbrich, “Nonmonotone trust region methods for nonlinear equality constrained optimization without a penalty function,” Mathematical Programming, Series B, on-line, 2002.

45. A. Wachter and L.T. Biegler, “Global and local convergence of a reduced space Quasi-Newton barrier algorithm for large-scale nonlinear programming,” Technical report, CAPD Technical Report B-00-06, Carnegie Mellon University, August, 2000.

46. A. Wachter and L.T. Biegler, “Global and local convergence of line search filter methods for nonlinear programming,” Technical report, CAPD Technical Report B-01-09, Carnegie Mellon University, August, 2001.
