
Computers and Chemical Engineering 25 (2001) 1585–1599

Redescending estimators for data reconciliation and parameter estimation

Nikhil Arora, Lorenz T. Biegler *

Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA

* Corresponding author. Tel.: +1-412-268-2232; fax: +1-412-268-7139. E-mail address: [email protected] (L.T. Biegler).

Received 30 October 2000; received in revised form 27 June 2001; accepted 29 June 2001

Abstract

Gross error detection is crucial for data reconciliation and parameter estimation, as gross errors can severely bias the estimates and the reconciled data. Robust estimators significantly reduce the effect of gross errors (or outliers) and yield less biased estimates. An important class of robust estimators are maximum likelihood estimators or M-estimators. These are commonly of two types, Huber estimators and Hampel estimators. The former significantly reduce the effect of large outliers whereas the latter nullify their effect. In particular, these two estimators can be evaluated through the use of an influence function, which quantifies the effect of an observation on the estimated statistic. Here, the influence function must be bounded and finite for an estimator to be robust. For the Hampel estimators the influence function becomes zero for large outliers, nullifying their effect. On the other hand, Huber estimators do not reject large outliers; their influence function is simply bounded. As a result, we consider the three-part redescending estimator of Hampel and compare its performance with a Huber estimator, the Fair function. A major advantage of redescending estimators is that it is easy to identify outliers without having to perform any exploratory data analysis on the residuals of regression. Instead, the outliers are simply the rejected observations. In this study, the redescending estimators are also tuned to the particular observed system data through an iterative procedure based on the Akaike information criterion (AIC). This approach is not easily afforded by the Huber estimators, and this can have a significant impact on the estimation. The resulting approach is incorporated within an efficient non-linear programming algorithm. Finally, all of these features are demonstrated on a number of process and literature examples for data reconciliation. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Redescending estimator; Data reconciliation; Parameter estimation


1. Introduction

Data reconciliation and parameter estimation are important components of model fitting, validation, and real time optimization in the chemical industries. In its most general form, data reconciliation is a minimization of measurement errors subject to satisfying the constraints of the process model. Parameter estimation is the step after data reconciliation in which the reconciled values of the process variables are used to set values for the model parameters (Marlin & Hrymak, 1997; Perkins, 1998). This inefficient two-step approach has led to the development of simultaneous strategies for data reconciliation and parameter estimation (DRPE; Tjoa, 1991). The most commonly used formulation of a simultaneous DRPE problem is:

min (least squares error in measurements) (1.1)
subject to model constraints and bounds

Formulation (1.1) is based on the assumption that measurements have normally distributed random errors, in which case least squares is the maximum likelihood estimator. The problem is compounded when gross errors or biases are present in the data, as these can lead to incorrect estimations and severely bias reconciliation of the other measurements. Gross errors can arise from a plethora of reasons, such as broken gauges, process leaks, improper use of measuring devices, and random sources, such as an operator preparing the process log (Albuquerque & Biegler, 1996). A number of methods have been devised to identify and negate the effects of gross errors. These range from sequential or combinatorial methods to simultaneous data reconciliation and gross error detection. Crowe (1994) provides a good summary of the evolution of various methods for gross error detection.

1.1. Simultaneous data reconciliation and parameter estimation

The general form of a simultaneous DRPE problem is:

$$\min_{x,u,p}\; F(x^M, x) \tag{1.2}$$
$$\text{s.t.}\quad h(x, u, p) = 0, \quad g(x, u, p) \le 0,$$
$$x^L \le x \le x^U, \quad p^L \le p \le p^U, \quad u^L \le u \le u^U,$$

where F is some objective function dependent upon the difference between the measurement of a variable and its value for all measured variables, x^M is the set of measurement data of the corresponding variables x, p is the set of parameters, u is the set of unmeasured variables, h is the set of equality constraints, and g is the set of inequalities. In this case, if there are multiple measurements of a variable, its reconciled value would lie somewhere in between its successive measurements. This leads to a smaller variance of the reconciled variables and also reduces their sensitivity to any gross error detection tests. If we assume all measurements change with each data set, the problem is an errors-in-variables-measured (EVM) problem, formulated as:

$$\min_{x_i,u_i,p}\; \sum_{i=1}^{m} F_i(x_i^M, x_i) \tag{1.3}$$
$$\text{s.t.}\quad h(x_i, u_i, p) = 0, \quad g(x_i, u_i, p) \le 0,$$
$$x^L \le x_i \le x^U, \quad p^L \le p \le p^U, \quad u^L \le u_i \le u^U, \quad \forall i,$$

where the subscript i refers to the ith measurement set and the rest of the symbols mean the same as in Eq. (1.2). Finally, if the model comes from a discretized differential algebraic equation (DAE) system, then time is generally incorporated into the constraints and the objective function. The constraints discretized successively in time may not be decoupled in this case as in Eqs. (1.2) and (1.3). The size of the problem in Eq. (1.3) is generally much larger than that of the problem in Eq. (1.2), and this can be a significant issue if the problem in Eq. (1.2) is itself large.
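Formulation (1.2) maps directly onto a general-purpose NLP solver. The following minimal Python sketch, for the simplified case where all variables are measured, uses scipy.optimize.minimize with SLSQP; the names reconcile, rho, and h_eq are our illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def reconcile(x_meas, rho, h_eq, x0, bounds):
    """Sketch of formulation (1.2) for measured variables only:
    minimize the sum of rho(residuals) subject to h(x) = 0 and bounds."""
    obj = lambda x: np.sum(rho(x - x_meas))        # estimator on residuals
    cons = {"type": "eq", "fun": h_eq}             # model constraints h(x) = 0
    res = minimize(obj, x0, method="SLSQP", bounds=bounds, constraints=cons)
    return res.x
```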

If there are biases or outliers in the data, the least squares objective function would be severely biased, leading to incorrect reconciliation and estimation. The common procedure is to identify the measurements that suffer from gross errors and to somehow account for them.

Yamamura, Nakajima and Matsuyama (1988) applied the Akaike information criterion (AIC) to identify biased measurements in a least squares framework for gross error detection. Due to the combinatorial nature of the problem attempted, they suggested a branch and bound method to solve the problem. The above combinatorial algorithm for data reconciliation can be automated by mixed integer programming techniques (see e.g. Soderstrom, Himmelblau & Edgar, 2000), but this is computationally expensive as it requires a discrete decision for each measurement. The problems become even more difficult when the model constraints, the objective function, or both are non-linear. Albuquerque and Biegler (1996) used a robust M-estimator, the Fair function, to reduce the effects of gross errors. They used boxplots to identify outliers from the residuals of regression. They also review the advantages of using estimators based on robust statistics. Here a key advantage for outlier detection is the elimination of the combinatorial procedure. Estimators derived from robust statistics can be used as objective functions in simultaneous DRPE problems. These estimators put less weight on large residuals corresponding to outliers. This results in less biased parameter estimates and reconciled values. Statistical tests can then be applied to identify outliers (Hoaglin, Mosteller & Tukey, 1983). Commonly used robust estimators include the M-estimators by Huber (1981), Rey (1988), Hampel (1974) and Hoaglin et al. (1983). Before data reconciliation is performed, it is beneficial to identify the unobservable variables and parameters, and non-redundant measurements. Here gross errors in non-redundant measurements can lead to unobservable values and non-unique solutions, rendering the estimates and fitted values useless. Stanley and Mah (1981) and later Crowe (1989) proposed observability and redundancy tests for steady state data reconciliation. Albuquerque and Biegler (1996) extended these to dynamic systems. To simplify and speed the calculations, they applied sparse LU decomposition rather than a QR factorization; this is what we have also used in this paper.

The presence of many different methods for DRPE raises the question of whether there is a common statistical framework in which both combinatorial and robust methods can be interpreted. In an attempt to provide an answer, we consider the AIC as a general framework for DRPE. In the following sections, we describe the AIC and its applications to combinatorial and robust approaches to gross error detection. In Section 3, we describe mixed integer programming approaches. In Section 4, we describe properties of robust estimators and introduce a Huber estimator, the Fair function, and a Hampel estimator, the three-part redescending estimator. We also suggest methods of outlier detection. With these concepts, we provide a simple algorithm for tuning the redescending estimator based on the AIC. In Section 5, we solve a DRPE problem with linear model constraints, compare the performance of the robust estimators with mixed integer approaches, and show the similarity between the two approaches. We then solve two non-linear examples and also compare the performance of various estimators. Finally, we conclude the paper in Section 6.

2. DRPE with the Akaike information criterion

Data reconciliation and gross error detection can be addressed as a model discrimination and parameter estimation problem, where multiple models correspond to the partitioning of random and gross errors. If more than one of these models can be fitted to the data under consideration, it becomes necessary to identify which model to use. To this end, one is interested in obtaining the most likely model and its parameters. Since maximum likelihood estimators are asymptotically efficient under certain conditions (Akaike, 1974), the likelihood function is a very sensitive criterion of deviation of model parameters from their true values. The AIC is an estimate of the Kullback–Leibler mean information for distance between the true model and the model under consideration. It is given by:

$$\mathrm{AIC} = -2\log(\text{maximum likelihood}) + 2\,(\text{number of independently adjusted parameters within the model}) \tag{2.1}$$

and can be re-written as:

$$\mathrm{AIC} = E(S) = 2\sum_{i=1}^{N} -\log\left(l(\varepsilon(i,p), i, p)\right) + 2\dim(p), \tag{2.2}$$

where E is the expectation, ε is the measurement error obtained after reconciliation, i is the observation index, l is the likelihood function, and p is the set of independently adjusted model parameters.

For data reconciliation, we consider the total number of parameters to be given by:

$$\dim(p) = \dim(p_0) + n_{out}, \tag{2.3}$$

where p_0 denotes the model parameters and n_out is the number of outliers. Here variables with outlying measurements are treated as parameters, because their reconciled values are adjusted only from measurements having no gross errors. For this paper, we consider the likelihood function to be the least squares function formed after removing the outliers. We shall observe later that the AIC also offers a novel method to tune redescending estimators for efficient data reconciliation.
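As a concrete illustration of Eqs. (2.2) and (2.3), the sketch below evaluates the AIC from standardized residuals, assuming a Gaussian likelihood (with constant terms dropped) on the measurements not flagged as outliers; the function and argument names are ours, not from the paper.

```python
import numpy as np

def aic(residuals, outlier_mask, n_model_params):
    """AIC of Eqs. (2.2)-(2.3) under a Gaussian likelihood on the
    non-outlying standardized residuals (additive constants dropped)."""
    good = residuals[~outlier_mask]      # residuals retained in the fit
    n_out = int(outlier_mask.sum())      # each outlier adds one parameter
    neg2_loglik = np.sum(good ** 2)      # -2 log-likelihood up to a constant
    return neg2_loglik + 2 * (n_model_params + n_out)
```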

3. Mixed integer approaches for data reconciliation

Yamamura et al. (1988) used the AIC for DRPE on a linear system. They divided the set of measurement sensors (J) into faulty (F) and non-faulty (J − F) sets. For the faulty sensors they estimated the biases in the following format:

$$\min_{\eta_j,\,\delta_j,\,F}\;\; \frac{1}{2}\sum_{j \in J-F} \eta_j^2 + \frac{1}{2}\sum_{j \in F} (\eta_j - \delta_j)^2 + |F| \tag{3.1}$$
$$\text{s.t.}\quad \sum_{j \in J} a_{ij}\,\eta_j = b_i, \quad i \in I.$$

Here η_j are the studentized residuals, δ_j are the biases scaled by the instrument standard deviations, and I is the set of equations resulting from elimination of the measured variables. To systematically select F, they devised a branch-and-bound strategy: a set of biased sensors was selected and Eq. (3.1) was solved, which constituted the branching operation. For bounding the objective functions, they divided the power set of J into two non-intersecting subsets constituting faulty and non-faulty instruments. By solving Eq. (3.1) for these subsets, they successively improved the lower bound, which is the objective function of the above subproblems. This procedure can easily be translated into a mixed integer non-linear program (MINLP) with binary variables identifying faulty sensors. Using a linear model of the measured variables, we state the MINLP as:

$$\min_{x_i,\delta_i,y_i}\; \sum_{i=1}^{n} \left[\frac{(x_i^M - x_i)}{\sigma_i} - \frac{\delta_i}{\sigma_i}\right]^2 + 2\sum_{i=1}^{n} y_i \tag{3.2}$$
$$\text{s.t.}\quad Ax = 0, \quad |\delta_i| \le U_i y_i, \quad |\delta_i| \ge L_i y_i,$$
$$y_i \in \{0, 1\}, \quad x_i \ge 0,$$

where n is the number of measured variables, x_i^M is the measurement of the ith variable, x_i is the reconciled value of the ith variable, σ_i is the standard deviation of the ith variable, y_i is a binary variable denoting the existence of bias in the ith variable, A is the constraint matrix, δ_i is the magnitude of bias in the ith variable, and L_i and U_i are the lower and upper bounds on the bias in the ith variable. If we use an MINLP solver such as DICOPT to solve Eq. (3.2), we cannot always guarantee that all the intermediate mixed integer linear programs (MILPs) will be feasible and bounded. To ensure boundedness of the MILPs, we add bound constraints of the form:

$$x_i \le X_i. \tag{3.3}$$

Also, due to the presence of the absolute value operator in the bound constraints of Eq. (3.2), we have to reformulate the problem in order to remove the non-differentiability caused by this operator:

$$\min_{x_i,\delta_i,y_i,z_i}\; \sum_{i=1}^{n} \left[\frac{(x_i^M - x_i)}{\sigma_i} - \frac{\delta_i}{\sigma_i}\right]^2 + 2\sum_{i=1}^{n} y_i \tag{3.4}$$
$$\text{s.t.}\quad Ax = 0,$$
$$\delta_i \le U_i y_i, \quad -\delta_i \le U_i y_i,$$
$$\delta_i - z_i U_i - z_i L_i + L_i y_i \le 0,$$
$$-\delta_i + z_i U_i + z_i L_i + L_i y_i \le L_i + U_i,$$
$$z_i \le y_i, \quad 0 \le x_i \le X_i, \quad y_i, z_i \in \{0, 1\}.$$

Here, z_i is a binary variable for the sign of the bias value δ_i for the ith variable.

Recently, Soderstrom et al. (2000) devised an MILP approach to minimize an objective function similar to the AIC. The advantage of an MILP is that it eliminates the non-linear programming subproblems associated with the MINLP algorithm. The quadratic term in Eq. (3.4) is replaced by an l1 norm and a penalty is added:

$$\min_{x_i,\delta_i,y_i,z_i}\; \sum_{i=1}^{n} \left|\frac{(x_i^M - x_i)}{\sigma_i} - \frac{\delta_i}{\sigma_i}\right| + \sum_{i=1}^{n} w_i y_i \tag{3.5}$$
$$\text{s.t.}\quad Ax = 0,$$
$$\delta_i \le U_i y_i, \quad -\delta_i \le U_i y_i,$$
$$\delta_i - z_i U_i - z_i L_i + L_i y_i \le 0,$$
$$-\delta_i + z_i U_i + z_i L_i + L_i y_i \le L_i + U_i,$$
$$z_i \le y_i, \quad 0 \le x_i \le X_i, \quad y_i, z_i \in \{0, 1\},$$

with w_i being 'weight' functions that penalize identification of too many biases. The non-differentiability caused by the l1 norm is removed by rewriting the argument of the absolute value as the difference of two non-negative numbers:

$$x_i^M - (x_i + \delta_i) = r_i - q_i, \tag{3.6}$$
$$q_i,\; r_i \ge 0. \tag{3.7}$$

Problem (3.5) contains a more robust objective function but does not directly minimize the AIC. Also, the choice of the weight functions may be arbitrary, and the AIC is a more complete measure of model fitting as it also includes the maximum likelihood on good data. In the next section, we consider an alternative to solving the MINLP by using robust statistics, and we also relate this approach back to the AIC.

Both the MILP and the MINLP are suitable only for problems with a few variables. For large problems such as EVM problems, the combinatorial overhead is too great to justify their use, especially on-line. Also, if there are non-linear constraints in the problems, the computational overhead of the MINLP can be large. In what follows, we shall observe that not only do robust estimators perform well, they also significantly reduce the computational overhead associated with mixed integer approaches. They are also suitable for large EVM problems because they allow efficient non-linear programming algorithms to be applied.
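For illustration, formulation (3.5) can be stated in a few lines with an off-the-shelf MILP modeling layer. The sketch below uses PuLP with its bundled CBC solver; the helper name milp_reconcile and the argument layout are our assumptions, not part of the original study (which used GAMS with DICOPT and CPLEX).

```python
import pulp

def milp_reconcile(x_meas, sigma, A, w, L, U, X):
    """Sketch of the MILP (3.5): l1 reconciliation with binary bias flags."""
    n = len(x_meas)
    prob = pulp.LpProblem("drpe_milp", pulp.LpMinimize)
    x = [pulp.LpVariable(f"x{i}", 0, X[i]) for i in range(n)]
    d = [pulp.LpVariable(f"d{i}", -U[i], U[i]) for i in range(n)]   # bias delta_i
    y = [pulp.LpVariable(f"y{i}", cat="Binary") for i in range(n)]  # bias exists
    z = [pulp.LpVariable(f"z{i}", cat="Binary") for i in range(n)]  # bias sign
    r = [pulp.LpVariable(f"r{i}", 0) for i in range(n)]             # Eqs. (3.6)-(3.7)
    q = [pulp.LpVariable(f"q{i}", 0) for i in range(n)]
    # objective: |residual| as r + q, plus the weighted penalty on biases
    prob += pulp.lpSum(r) + pulp.lpSum(q) + pulp.lpSum(w[i] * y[i] for i in range(n))
    for i in range(n):
        prob += x_meas[i] - x[i] - d[i] == sigma[i] * (r[i] - q[i])
        prob += d[i] <= U[i] * y[i]                                    # bias only if y_i = 1
        prob += -d[i] <= U[i] * y[i]
        prob += d[i] - z[i] * (U[i] + L[i]) + L[i] * y[i] <= 0          # negative branch
        prob += -d[i] + z[i] * (U[i] + L[i]) + L[i] * y[i] <= L[i] + U[i]  # positive branch
        prob += z[i] <= y[i]
    for row in A:                                                       # model: A x = 0
        prob += pulp.lpSum(row[i] * x[i] for i in range(n)) == 0
    prob.solve()
    return [v.value() for v in x], [v.value() for v in y]
```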

4. Robust statistics and estimators based on robust statistics

In most DRPE problems, data are assumed to follow a certain distribution, which often determines the objective functions used. Data are mostly assumed to follow the normal distribution, in which case the associated likelihood function is the least squares estimator. It is often difficult to determine the distribution of data corrupted with outliers. The use of estimators derived from a fixed probability distribution is thus not justified. In such cases, robust estimators may be used. Robust estimators are largely distribution independent and produce unbiased results in the presence of data derived from an ideal distribution (say, normal), but are insensitive to deviations from ideality (Albuquerque & Biegler, 1996). These estimators put less weight on outlying measurements, and this protects other measurements from being corrupted.

Let (x_1, …, x_n) be derived from a distribution f(x), let T be the estimator of a parameter p, and let a(x) be an approximate model of the distribution of x. The unbiased estimate of p is then p̂ = T[f(x)] and its approximate estimate is p̃ = T[a(x)]. Let the distributions of these estimates of p be π(p̂, f) and π(p̃, a). The estimator T[·] is robust iff

$$\forall\,\epsilon\;\exists\,\delta : \; d(f, a) \le \delta \;\Rightarrow\; d\left[\pi(\hat{p}, f),\, \pi(\tilde{p}, a)\right] \le \epsilon, \tag{4.1}$$

where d(·) is a distance function (Albuquerque & Biegler, 1996). This means that a finite difference between the true and assumed p.d.f.s leads to a bounded difference in the parameter estimates. The influence function of an estimator measures the effect of an observation on the estimates obtained. If the residuals ε of an estimation process are drawn from a probability distribution f(ε), and if T[f(ε)] is the unbiased estimate of a parameter p, then the influence function of a residual ε₀ is given by:

$$\mathrm{IF} = \Omega(\varepsilon_0) = \lim_{t \to 0} \frac{T\left[(1-t)f + t\,\delta(\varepsilon - \varepsilon_0)\right] - T[f]}{t}. \tag{4.2}$$


Here, δ(ε − ε₀) is the delta function centered about ε₀. The influence functions of robust estimators are bounded. In this paper we compare the least squares estimator, the Fair function (Rey, 1988), and the three-part redescending estimator of Hampel, which are all M-estimators.

4.1. M-estimators: properties and examples

We now discuss the form and properties of M-estimators for the objective function F in Eqs. (1.2) and (1.3). If l(x_i | p) is the likelihood function of an observation x_i dependent upon parameters p, then the overall likelihood function L of the errors in N observations is given by:

$$L = \prod_{i=1}^{N} l(x_i \,|\, p). \tag{4.3}$$

The M-estimator associated with this likelihood function is its negative logarithm, i.e.

$$\rho_M = \sum_{i=1}^{N}\rho_i = -\log(L) = -\sum_{i=1}^{N}\log l(x_i \,|\, p) = F \;\;\text{or}\;\; \sum_{i=1}^{N}F_i, \tag{4.4}$$

depending on whether the problem is in the standard form or EVM, respectively. Here ρ_M is the overall M-estimator and ρ_i is the estimator associated with the ith observation. Two particular M-estimators include the Fair function, a Huber (1981) estimator, and the redescending estimator proposed by Hampel (1974). These M-estimators are defined as:

Least squares estimator:

$$\rho_{Li} = \frac{1}{2}\varepsilon_i^2, \tag{4.5}$$

Fair function:

$$\rho_{Fi} = C^2\left[\frac{|\varepsilon_i|}{C} - \log\left(1 + \frac{|\varepsilon_i|}{C}\right)\right], \tag{4.6}$$

and the three-part redescending estimator of Hampel:

$$\rho_{Hi} = \begin{cases} \dfrac{1}{2}\varepsilon_i^2, & 0 \le |\varepsilon_i| \le a \\[4pt] a|\varepsilon_i| - \dfrac{a^2}{2}, & a \le |\varepsilon_i| \le b \\[4pt] ab - \dfrac{a^2}{2} + (c-b)\dfrac{a}{2}\left[1 - \left(\dfrac{c - |\varepsilon_i|}{c-b}\right)^2\right], & b \le |\varepsilon_i| \le c \\[4pt] ab - \dfrac{a^2}{2} + (c-b)\dfrac{a}{2}, & |\varepsilon_i| \ge c, \end{cases} \tag{4.7}$$

where ε_i is the ith residual of regression and the constants satisfy

$$c \ge b + 2a. \tag{4.8}$$

Here C is the tuning constant for the Fair function; a, b, and c are the tuning constants for the redescending estimator; and least squares is the only non-robust estimator.
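For reference, the three estimators of Eqs. (4.5)–(4.7) translate directly into vectorized code. The following NumPy sketch is ours and the function names are illustrative.

```python
import numpy as np

def rho_ls(e):
    """Least squares estimator, Eq. (4.5)."""
    return 0.5 * e**2

def rho_fair(e, C):
    """Fair function, Eq. (4.6)."""
    u = np.abs(e) / C
    return C**2 * (u - np.log1p(u))

def rho_hampel(e, a, b, c):
    """Three-part redescending estimator, Eq. (4.7); requires c >= b + 2a."""
    u = np.abs(e)
    flat = a*b - a**2/2 + (c - b) * a/2          # constant value for |e| >= c
    return np.where(u <= a, 0.5 * u**2,
           np.where(u <= b, a*u - a**2/2,
           np.where(u <= c, a*b - a**2/2 + (c - b)*(a/2)*(1 - ((c - u)/(c - b))**2),
                    flat)))
```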

Fig. 1. Plots of the three M-estimators considered.

Fig. 2. Probability plot of the three-part redescending estimator.

In Fig. 1, we depict plots of the above M-estimators. Observe that the Fair function increases only linearly for large residuals. This gives it some robustness compared with the least squares estimator, because large residuals are given lower influence. The redescending estimator becomes constant for large observations; thus large residuals have zero influence, which makes the redescending estimator very robust (i.e. IF → 0 as |ε₀| → ∞). Both the Fair function and the redescending estimator are approximately least squares for small residuals. This gives these estimators high efficiency for data derived from a Gaussian distribution. Fig. 2 depicts the probability distribution of the redescending estimator for a=1, b=2, and c=4. Notice that the central part of the distribution is derived from a normal distribution, which is explained by the estimator being a least squares function on [−a, a]. The probability falls rapidly on [−c, −b] and [b, c] until it becomes constant on (−∞, −c) and (c, ∞). The long tail is responsible for the robustness. Notice that both the least squares estimator and the Fair function are convex functions, whereas the redescending estimator is non-convex. In many cases, one would need to initialize the redescending estimator by performing DRPE with least squares or the Fair function and, depending on the quality of the local solutions, one may have to resort to global optimization. The redescending estimator has the capability of being tuned to the system under investigation by setting/obtaining the best possible values for a, b, and c. This would enable the 'best' possible fit to the data and also detect the outliers. However, it is not clear how to test the goodness of fit without prior knowledge of the process, i.e. the past data or some kind of expected performance. In tuning the estimator, it is deemed necessary to estimate two of the three parameters. The third parameter can be assigned a workable lower/upper bound by Eq. (4.8). In the remainder of this section, we apply the AIC to tune this robust M-estimator.

4.2. Tuning the robust estimators

The Fair function contains a single tuning constant C that is related to the asymptotic efficiency of the estimator (Rey, 1988) and allows the user to balance insensitivity to outliers with this efficiency (Albuquerque, 1996). On the other hand, we will tune the constants for the redescending estimator by minimizing the AIC associated with applications of this estimator. From Eq. (4.7) we see that the constant a is related to the amount of contamination in the data. One could also tune c, consider some relation between b and c, and obtain a from Eq. (4.8). Here the relationship of b to a or c depends upon the desired size of the redescending region (see Fig. 1). Setting b = c would make the estimator lose its resistance to rounding and grouping errors. For samples drawn largely from a Gaussian distribution, we want very few data items to lie outside [−b, b]; in this case it would be acceptable to have large values for b and have b close to c. However, if samples are drawn from distributions with tails heavier than Gaussian, it is advisable to have b small (Hoaglin et al., 1983). All these decisions require some preprocessing of data, which may be substantial. Here we present a simple tuning strategy which requires no preprocessing and is dependent upon the slope of the redescending region.

We identify the best estimator by a two-step procedure. First, various estimators resulting from different combinations of a, b, and c are considered so that the minimum of the resulting AIC can be bounded. Bounds for the location of the minimum are then selected and a golden section search (Edgar & Himmelblau, 1988; Fletcher, 1987) is performed over the value of c to obtain the best estimator. The complete procedure is:

1. Select a wide range of redescending estimators by setting a, b, and c, taking care to satisfy Eq. (4.8).
2. Perform data reconciliation with each set of constants and identify outliers by Eq. (4.10).
3. Calculate the AIC and store its values and/or plot them.
4. We now know the approximate location of the minimum. Select upper and lower bounds on the values of c within which the minimum seems to be located.
5. Perform golden section search for the precise value of c. In all iterations of the golden section search, use
   $$b = \frac{c}{2} \quad\text{and}\quad a = \frac{(c-b)}{2}. \tag{4.9}$$
6. The estimator eventually selected by golden section search is the estimator suited best to the data.

The tuning process outlined above is not optimal, because search methods for selecting values of b given c or a could differ from Eq. (4.9). In fact, this selection could be extended to a multidimensional search in a, b, and c. However, because of the relatively minor influence of a and b on the objective function, we believe that Eq. (4.9) leads to a reasonably good tuning.
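A minimal sketch of steps 4–6 follows, assuming a caller-supplied function aic_of_c that performs the reconciliation for a given c (with a and b set by Eq. (4.9)) and returns the scaled AIC; all names are illustrative.

```python
import numpy as np

PHI = (np.sqrt(5) - 1) / 2   # golden ratio factor

def tune_c(aic_of_c, c_lo, c_hi, tol=1e-2):
    """Golden section search over the cutoff c; b = c/2 and a = (c - b)/2
    from Eq. (4.9). aic_of_c stands in for a full DRPE solve."""
    lo, hi = c_lo, c_hi
    c1 = hi - PHI * (hi - lo)
    c2 = lo + PHI * (hi - lo)
    f1, f2 = aic_of_c(c1), aic_of_c(c2)
    while hi - lo > tol:
        if f1 < f2:                       # minimum lies in [lo, c2]
            hi, c2, f2 = c2, c1, f1
            c1 = hi - PHI * (hi - lo)
            f1 = aic_of_c(c1)
        else:                             # minimum lies in [c1, hi]
            lo, c1, f1 = c1, c2, f2
            c2 = lo + PHI * (hi - lo)
            f2 = aic_of_c(c2)
    c = 0.5 * (lo + hi)
    b = c / 2
    a = (c - b) / 2
    return a, b, c
```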


4.3. Outlier detection

We obtain the residuals of regression after DRPE. An analysis of these residuals helps identify outlying measurements. Different methods of outlier detection exist for different estimators. For the redescending estimator, since we have an explicit cutoff point, c, for outliers, an observation is deemed to have an outlier if its corresponding residual satisfies

$$|\varepsilon| \ge c. \tag{4.10}$$

No such cutoff points are available for the Fair function or least squares. We thus have to resort to exploratory data analysis (EDA; Hoaglin et al., 1983) to identify outliers (Albuquerque & Biegler, 1996). The EDA method we have used is the boxplot.

In order to construct boxplots, data are first sorted and their fourth spreads are calculated as:

$$d_F = F_U - F_L, \tag{4.11}$$

where F_U and F_L correspond to the upper and lower fourths, respectively. An observation x is then deemed an outlier if

$$x \le F_L - \frac{3}{2} d_F \tag{4.12}$$

or

$$x \ge F_U + \frac{3}{2} d_F. \tag{4.13}$$
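In code, the boxplot rule of Eqs. (4.11)–(4.13) reduces to a few lines. The sketch below approximates the fourths by the quartiles, which is a common simplification and an assumption on our part.

```python
import numpy as np

def boxplot_outliers(x):
    """Flag points beyond 1.5 fourth-spreads outside the fourths,
    with quartiles used as a stand-in for the fourths."""
    f_lo, f_hi = np.percentile(x, [25, 75])
    d_f = f_hi - f_lo                       # fourth spread, Eq. (4.11)
    return (x <= f_lo - 1.5 * d_f) | (x >= f_hi + 1.5 * d_f)
```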

5. Examples

We shall now motivate the benefits of using redescending estimators for DRPE with three examples taken from the literature. The first example is a linearly constrained stream metering problem, through which we motivate the benefits of robust estimators with emphasis on redescending estimators. We also compare the performance of the mixed integer approaches on this problem. In the second and third examples, we tune the redescending estimator based on the AIC. The last two examples are non-linear EVM problems. All examples have been solved on a Pentium III machine with dual 800 MHz CPUs or a Pentium III machine with a single 667 MHz CPU, both using Linux as the operating system.

5.1. Linear example: stream metering

This problem was first presented by Serth and Heenan (1986) and subsequently used by Narasimhan and Mah (1987), Rollins, Cheng and Devanathan (1996) and Soderstrom et al. (2000). The problem is a stream metering process with 28 streams flowing in and out of 11 nodes, as depicted in Fig. 3. The flowrates of the streams are measured at the nodes. The problem has 11 equality constraints and 28 variables, all of which are measured, and all of the measurements are redundant. Each of the above authors has generated data for the problem differently, so their results are not entirely comparable.

Fig. 3. Stream metering problem: example 1.


Table 1. Data reconciliation on the stream metering problem with the Fair function

| Horizon length | # Biased | OP^ff_s1 | AVT1^ff_s1 | OP^ff_s2 | AVT1^ff_s2 |
|---|---|---|---|---|---|
| 5 | 3 | 0.624 | 2.736 | 0.640 | 3.907 |
| 5 | 5 | 0.544 | 2.873 | 0.551 | 3.149 |
| 5 | 7 | 0.515 | 2.789 | 0.448 | 3.461 |
| 10 | 3 | 0.665 | 1.945 | 0.627 | 3.460 |
| 10 | 5 | 0.583 | 2.111 | 0.539 | 2.816 |
| 10 | 7 | 0.507 | 2.554 | 0.480 | 2.772 |
| 20 | 3 | 0.664 | 1.666 | 0.629 | 3.427 |
| 20 | 5 | 0.587 | 1.912 | 0.537 | 2.725 |
| 20 | 7 | 0.470 | 2.670 | 0.504 | 2.482 |
| 40 | 3 | 0.649 | 1.499 | 0.633 | 3.381 |
| 40 | 5 | 0.589 | 1.887 | 0.536 | 2.683 |
| 40 | 7 | 0.471 | 2.588 | 0.504 | 2.440 |

We have used the approach of Soderstrom et al. to generate data and compare our results with theirs. In order to solve the problem, a horizon of data is generated by sampling process data at various time intervals and organizing them into a data matrix as (Jang, Joseph & Mukai, 1986; Liebman, Edgar & Lasdon, 1992; Robertson, Lee & Rawlings, 1996):

$$X^M = \begin{bmatrix} x_{11}^M & x_{12}^M & \cdots & x_{1H}^M \\ x_{21}^M & x_{22}^M & \cdots & x_{2H}^M \\ \vdots & \vdots & & \vdots \\ x_{n1}^M & x_{n2}^M & \cdots & x_{nH}^M \end{bmatrix}, \tag{5.1}$$

where x_ij^M is the ith measured variable in the jth time period, n is the number of measured variables, and H is the size of the horizon. (5.2)

The size of a horizon is fixed in order to accommodate data in a real time system. Every new measurement leads to the dropping of the corresponding oldest measurement, and new horizons are constructed this way: the first column of the X^M matrix is dropped and a new column is appended. The DRPE problem is now written as:

$$\min_{x_i}\; \sum_{i=1}^{n}\sum_{j=1}^{H} \rho_M(x_i - x_{ij}^M) \tag{5.3}$$
$$\text{s.t.}\quad Ax = 0, \quad x \ge 0,$$

where ρ_M is an M-estimator, the Fair function or the redescending estimator.
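The moving-window update described above amounts to a single array operation; a minimal sketch (names ours) follows.

```python
import numpy as np

def advance_horizon(XM, x_new):
    """Slide the data matrix of Eq. (5.1) one step: drop the oldest
    column and append the newest measurement vector x_new."""
    return np.column_stack([XM[:, 1:], x_new])
```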

A Monte Carlo study is performed by generating data through the addition of noise and biases to the true values of the variables. Standard deviations, σ, of the observations are taken to be 2.5% of the true values of the variables. Normal noise from N(0, σ) is added to the true values for any set of moving windows. Biases numbering 3, 5 or 7, with their locations selected randomly, are then added to the above. The magnitudes of the biases are taken to be between 5σ and 25σ. One hundred moving windows, each of length H, are chosen for a particular configuration, and 100 such configurations are generated with the location of gross errors being different across successive configurations. The performance of the estimators is defined by the observed power (OP) and the average number of type-1 errors (AVT1). These are defined as:

$$\mathrm{OP} = \frac{\text{number of biased variables correctly identified}}{\text{number of biased variables simulated}}, \tag{5.4}$$

$$\mathrm{AVT1} = \frac{\text{number of unbiased variables wrongly identified as biased}}{\text{number of simulation trials}}. \tag{5.5}$$

Data for the problem are generated using an application developed in MATLAB. The data reconciliation problem is solved using MINOS5 as the NLP solver, invoked in GAMS. The redescending estimator, being a non-convex function, can lead to the presence of local optima. To obtain a good initialization, we first solve the problem with the convex Fair function as the objective function. The results of this problem are then used as starting points for the redescending estimator. This way, the starting points are feasible and, hopefully, lead to the 'best' local optimum. Also, to handle the non-differentiable terms in Eq. (4.7), we apply an interior point smoothing function based on Chen and Mangasarian (1995) and Gopal and Biegler (1999).
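Eqs. (5.4) and (5.5) can be accumulated over the Monte Carlo trials as follows; this sketch and its names are illustrative.

```python
import numpy as np

def op_avt1(flagged, truly_biased, n_trials):
    """Observed power and average type-1 errors, Eqs. (5.4)-(5.5).
    flagged and truly_biased are boolean arrays over all
    (trial, variable) pairs."""
    op = np.sum(flagged & truly_biased) / np.sum(truly_biased)
    avt1 = np.sum(flagged & ~truly_biased) / n_trials
    return op, avt1
```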

Outlier detection is now addressed in two ways. For data reconciliation with the Fair function, we have used boxplots of the residuals as the method for outlier detection (Eqs. (4.12) and (4.13)). The Fair function tuned at a relative asymptotic efficiency of 70% was used in all the trials because of the added robustness. We have used four different redescending estimators in order to identify a range of the best redescending estimators for the problem. Results are reported in Tables 1 and 2. The superscripts ff and red indicate the estimator used: ff = Fair function and red = redescending estimator.

The Fair function performs worse than most cases of the redescending estimator (Tables 1 and 2). This also suggests using the results of DRPE with the Fair function as starting points for DRPE with redescending estimators.

From Table 2, we observe that the OP stabilizes very quickly between the different horizon sizes; it even shows a slight decrease as the horizon size is increased. Type-1 errors significantly decrease as the size of the horizon is increased. Type-1 errors also decrease as the redescending estimators are made less strict. This behavior is to be expected, as the redescending estimator can then tolerate larger residuals without flagging them as outliers. In all the cases, we obtain a larger OP when redescending estimators are used, except when they are the least restrictive. More significant than OP are the type-1 errors, whose reduction with increasing horizon size indicates increasing accuracy of the robust estimators.

Table 2. Data reconciliation on the stream metering problem using redescending estimators

| Horizon | # Biased | a, b, c | OP^red | AVT1^red | OP (Soderstrom et al., 2000) | AVT1 (Soderstrom et al., 2000) |
|---|---|---|---|---|---|---|
| 5 | 3 | 0.5, 1, 2 | 0.673 | 5.820 | 0.67 | 0.23 |
| | | 1, 2, 4 | 0.626 | 2.065 | | |
| | | 2, 4, 8 | 0.571 | 0.305 | | |
| | | 4, 8, 16 | 0.455 | 0.069 | | |
| 5 | 5 | 0.5, 1, 2 | 0.610 | 6.005 | 0.60 | 0.51 |
| | | 1, 2, 4 | 0.552 | 2.462 | | |
| | | 2, 4, 8 | 0.498 | 0.624 | | |
| | | 4, 8, 16 | 0.392 | 0.161 | | |
| 5 | 7 | 0.5, 1, 2 | 0.616 | 6.538 | 0.55 | 0.84 |
| | | 1, 2, 4 | 0.546 | 3.295 | | |
| | | 2, 4, 8 | 0.476 | 1.407 | | |
| | | 4, 8, 16 | 0.376 | 0.609 | | |
| 10 | 3 | 0.5, 1, 2 | 0.667 | 5.844 | 0.69 | 0.10 |
| | | 1, 2, 4 | 0.624 | 1.901 | | |
| | | 2, 4, 8 | 0.573 | 0.280 | | |
| | | 4, 8, 16 | 0.454 | 0.065 | | |
| 10 | 5 | 0.5, 1, 2 | 0.606 | 6.034 | 0.60 | 0.56 |
| | | 1, 2, 4 | 0.544 | 2.353 | | |
| | | 2, 4, 8 | 0.499 | 0.586 | | |
| | | 4, 8, 16 | 0.391 | 0.154 | | |
| 10 | 7 | 0.5, 1, 2 | 0.613 | 6.526 | 0.59 | 0.97 |
| | | 1, 2, 4 | 0.542 | 3.125 | | |
| | | 2, 4, 8 | 0.475 | 1.357 | | |
| | | 4, 8, 16 | 0.374 | 0.611 | | |
| 20 | 3 | 0.5, 1, 2 | 0.669 | 5.761 | 0.74 | 0.12 |
| | | 1, 2, 4 | 0.622 | 1.851 | | |
| | | 2, 4, 8 | 0.572 | 0.268 | | |
| | | 4, 8, 16 | 0.452 | 0.074 | | |
| 20 | 5 | 0.5, 1, 2 | 0.599 | 5.911 | 0.63 | 0.62 |
| | | 1, 2, 4 | 0.540 | 2.240 | | |
| | | 2, 4, 8 | 0.499 | 0.565 | | |
| | | 4, 8, 16 | 0.392 | 0.155 | | |
| 20 | 7 | 0.5, 1, 2 | 0.602 | 6.452 | 0.62 | 0.74 |
| | | 1, 2, 4 | 0.532 | 3.007 | | |
| | | 2, 4, 8 | 0.470 | 1.364 | | |
| | | 4, 8, 16 | 0.375 | 0.602 | | |
| 40 | 3 | 0.5, 1, 2 | 0.666 | 5.653 | 0.75 | 0.17 |
| | | 1, 2, 4 | 0.624 | 1.741 | | |
| | | 2, 4, 8 | 0.577 | 0.249 | | |
| | | 4, 8, 16 | 0.456 | 0.061 | | |
| 40 | 5 | 0.5, 1, 2 | 0.589 | 5.794 | 0.71 | 0.59 |
| | | 1, 2, 4 | 0.538 | 2.177 | | |
| | | 2, 4, 8 | 0.510 | 1.669 | | |
| | | 4, 8, 16 | 0.388 | 0.163 | | |
| 40 | 7 | 0.5, 1, 2 | 0.597 | 6.297 | 0.59 | 1.26 |
| | | 1, 2, 4 | 0.528 | 2.857 | | |
| | | 2, 4, 8 | 0.472 | 1.311 | | |
| | | 4, 8, 16 | 0.380 | 0.570 | | |

Table 3. Results of data reconciliation with MINLP and MILP on five horizons and three biases

| Solution method | wi | OP | AVT1 |
|---|---|---|---|
| MINLP | – | 0.885 | 0.071 |
| MILP | 1.0 | 0.685 | 2.265 |
| MILP | 2×H | 0.704 | 0.081 |
| MILP | 100.0 | 0.112 | 0.000 |

5.1.1. Results from MINLP and MILP

Finally, we compare the performance of the MINLP and the MILP with the redescending estimator and the Fair function. Refer to Eq. (3.2), where we have set U_i to 30σ_i and L_i to 3σ_i, so that these are fairly generic bounds. We have tested the efficacy of Eq. (3.4) on a problem with a horizon size of five, the number of biases equal to 3, the number of moving windows per data set equal to 100, and 100 overall data sets. The above problem has been solved using the solver DICOPT invoked in GAMS, calling CPLEX as the MIP solver. Results are summarized in Table 3.

The problem formulated in this way is the true maximum likelihood estimator of the AIC. We then solved the problem as by Soderstrom et al. Here one can set the weight functions to any value and obtain different results; in Table 3 we have given results for some values of the weight functions. Observe that we obtain much better results (higher OP and lower AVT1) than Soderstrom et al. when we set w_i = 2×H, where H is the horizon length. We obtain similar results when we do not weight the binary variables (w_i = 1), and very poor performance when we set the weight to the arbitrarily large value of 100, although this drives the AVT1 to 0. The weights can be optimized as we have done for the redescending estimator; however, each time we would have to solve a potentially expensive problem (MILP vs. NLP), which is not a very attractive proposition.

In Table 4 we also present values of the AIC for the above four redescending estimators and the MINLP. The AIC values have been scaled by the total number of measurements. Results are reported for the problem with three biases and a horizon size of five. As expected, the redescending estimator giving a higher OP also has a lower AIC, and the AIC increases as OP decreases for the redescending estimator. The AIC of the MINLP is smaller than the AIC of all redescending estimators. This is explained by a better overall fit produced by the MINLP due to a discrete decision on each measurement.

Table 4. Performance of the redescending estimator, MINLP, and MILP compared on five horizons and three biases

| Solution method | AIC | CPU time per horizon (s) |
|---|---|---|
| Redescending estimator (a=0.5, b=1, c=2) | 3.491 | 1.114 |
| Redescending estimator (a=1, b=2, c=4) | 14.112 | 1.075 |
| Redescending estimator (a=2, b=4, c=8) | 40.299 | 1.057 |
| Redescending estimator (a=4, b=8, c=16) | 96.399 | 1.019 |
| MINLP | 1.067 | 42.989 |
| MILP (wi=1) | 1.211 | 4.369 |
| MILP (wi=2×H) | 1.117 | 0.592 |
| MILP (wi=100.00) | 17.148 | 0.386 |

Note also that OP and AVT1, which are commonly used to assess the performance of an estimator, are consistent but not as complete as the AIC, which maximizes the overall probability of the estimator for the DRPE problem with gross errors. This is because the AIC considers the reconciled residual values as well; that is, an estimator having a high OP may produce poor reconciliation in uncorrupted measurements. This can be seen clearly in Table 4.

From Table 4 we see that the minimum value of the AIC is provided by the MINLP approach, and the MILP results approach this for w_i = 1 and w_i = 2×H. In comparison, the AIC for the NLP approach with the redescending function is not as good as that of the MINLP or MILP approaches, although it is still quite reasonable. Moreover, these results can be further improved by tuning the constants using the AIC as a guide. On the other hand, the MINLP problem is very expensive for this example; it requires about 40 times more computation than the redescending approach. In fact, the expense of the MINLP approach allowed us to consider only 15 horizons out of the 100 that were formulated.

With respect to the MILP approach, the NLP approach requires about the same computational expense, although this depends heavily on the weight factor in the MILP formulation. However, the NLP approach can be extended directly to non-linear models with similar performance. On the other hand, extending the MILP approach to non-linear models requires an MINLP formulation, which is likely to be much more expensive.

The solution of non-linear examples is considered in the next section, along with the need to obtain the redescending estimator best suited to the system using an AIC tuning procedure. Table 4 shows that the AIC is strongly influenced by the tuning parameters in the redescending estimator. This motivates the need to search for the redescending estimator best suited to the process; it is the focus of the next two non-linear examples.


5.2. Non-linear examples

5.2.1. Pai and Fisher (1988)

This problem has been taken from Pai and Fisher (1988) and has also been used by Tjoa (1991) to study the contaminated normal estimator. We have analyzed this problem by solving it both in the standard formulation (Eq. (1.2)) and the EVM formulation (Eq. (1.3)). In this example there are six equality constraints, five measured variables (x1–x5) whose measurements are all redundant, and eight variables in all, the unmeasured variables being observable. Variables x1–x5 are measured 100 times, and each measurement set has 20 gross errors in the form of stuck sensors. The first 20 measurements of x1 have gross errors, followed by the next 20 measurements of x2, and so on, making a total of 100 gross errors. In addition, normal noise is added with a S.D. of 0.1. The exact values of the variables are:

$$x_{exact} = [4.5124,\; 5.5819,\; 1.9260,\; 1.4560,\; 4.8545]^T, \quad u_{exact} = [11.070,\; 0.61467,\; 2.0504]^T \tag{5.6}$$

First consider the data reconciliation problem solved in the standard form, henceforth referred to as case 1. Results are summarized in Table 5. All estimators produce reconciled values that are very close to the true values of the variables. This is because of the large redundancy introduced by multiple measurements of the same variable. Neither the least squares estimator nor the Fair functions were able to flag outliers, as verified by boxplots (Eqs. (4.11)–(4.13)). The robustness offered by the Fair function is offset by the large redundancy in the measurements. On the other hand, consider reconciliation performed with the redescending estimators (M1–M8). From Table 5 we observe that none of the estimators M1–M4 is able to detect outliers. Outlier detection starts from M5, where we also observe a sharp decrease in the value of the AIC. M7 and M8 flag a large fraction of the measurements as being corrupted. This is due to the increase in type-1 errors with increasing robustness of the estimator (Davies & Gather, 1993). The AIC also increases from M7 to M8. With this knowledge we tuned the redescending estimator. The resulting estimator, the reconciled values due to it, and its AIC are given in Table 5. We have divided the AIC by the number of measurements, i.e. 500, in order to scale its values. As mentioned before, the tuned estimator lies close to M6 and provides the best estimation. It, however, produces more type-1 errors than M6, but these do not deteriorate the reconciliation significantly.

We then solved the problem in the EVM form (Eq. (1.3)), henceforth referred to as case 2. The redescending estimators were the same as were used in the standard form. In this study, however, we count the number of outliers identified correctly and the number of observations incorrectly flagged as outliers. The least squares estimator is again unable to detect any outliers. The Fair function is more sensitive in this case and, as we make it more robust, it detects more outliers, as seen in Table 6. However, the maximum power of outlier detection is only 31%. The redescending estimators start detecting outliers from M4, but most of the outliers appear as type-1 errors, which progressively increase from M4 to M8, as shown in Table 6. Overall power is less than 75%, and this indicates that the quality of estimation is not very good. The values of the AIC are, however, lower than for case 1 (Table 6). We then tuned the estimator, which led to identification of an estimator close to M6 (Table 6). The tuned estimator performs somewhat better than M6, as it suffers from fewer type-1 errors and leads to higher power. We can conclude that redescending estimators are more sensitive than the Fair function in detecting outliers, but this is often at the price of type-1 errors.

Table 5. Data reconciliation of example 2 in standard form — case 1

| Estimator | x1 | x2 | x3 | x4 | x5 | u1 | u2 | u3 |
|---|---|---|---|---|---|---|---|---|
| True values | 4.5124 | 5.5819 | 1.9260 | 1.4560 | 4.8545 | 11.070 | 0.6146 | 2.0504 |
| LSE | 4.5503 | 5.5602 | 1.9208 | 1.4755 | 4.8442 | 11.1361 | 0.6152 | 2.0497 |
| Fair function (effect=95%) | 4.5444 | 5.5633 | 1.9216 | 1.4713 | 4.8468 | 11.1264 | 0.6151 | 2.0502 |
| Fair function (effect=80%) | 4.5448 | 5.5626 | 1.9216 | 1.4690 | 4.8489 | 11.1279 | 0.6152 | 2.0510 |
| Fair function (effect=70%) | 4.5474 | 5.5608 | 1.9212 | 1.4687 | 4.8497 | 11.1331 | 0.6152 | 2.0510 |

Redescending estimator (a, b, c, nout, AIC):

| Estimator | x1 | x2 | x3 | x4 | x5 | u1 | u2 | u3 |
|---|---|---|---|---|---|---|---|---|
| M1 (10, 20, 40, 0, 4.353) | 4.5605 | 5.5543 | 1.9194 | 1.4801 | 4.8420 | 11.1541 | 0.6154 | 2.0497 |
| M2 (5, 10, 20, 0, 4.353) | 4.5605 | 5.5543 | 1.9194 | 1.4801 | 4.8420 | 11.1541 | 0.6154 | 2.0497 |
| M3 (4, 8, 16, 0, 4.353) | 4.5622 | 5.5534 | 1.9192 | 1.4812 | 4.8413 | 11.1569 | 0.6154 | 2.0496 |
| M4 (2, 4, 8, 0, 4.360) | 4.5477 | 5.5612 | 1.9212 | 1.4720 | 4.8468 | 11.1325 | 0.6152 | 2.0505 |
| M5 (1, 2, 4, 80, 1.768) | 4.5279 | 5.5727 | 1.9239 | 1.4627 | 4.8514 | 11.0977 | 0.6149 | 2.0505 |
| M6 (0.5, 1, 2, 119, 1.083) | 4.5313 | 5.5708 | 1.9234 | 1.4644 | 4.8505 | 11.1036 | 0.6149 | 2.0504 |
| M7 (0.2, 0.4, 0.8, 287, 1.227) | 4.5661 | 5.5476 | 1.9189 | 1.4643 | 4.8571 | 11.1702 | 0.6155 | 2.0561 |
| M8 (0.1, 0.2, 0.4, 378, 1.525) | 4.5702 | 5.5452 | 1.9183 | 1.4661 | 4.8562 | 11.1775 | 0.6156 | 2.0562 |
| Tuned (0.369, 0.739, 1.478, 158, 1.034) | 4.5449 | 5.5609 | 1.9217 | 1.4609 | 4.8562 | 11.1311 | 0.6152 | 2.0539 |


Table 6. Results of estimation of example 2 in the EVM form — case 2

| Estimator | nout | Type | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|---|---|
| Fair function (effect=95%) | 1 | Correct | 0 | 0 | 0 | 0 | 0 |
| Fair function (effect=80%) | – | Correct | 0 | 18 | 0 | 0 | 4 |
| Fair function (effect=70%) | 48 | Correct | 0 | 9 | 17 | 1 | 4 |

Redescending estimator (a, b, c, AIC):

| Estimator | nout | Type | x1 | x2 | x3 | x4 | x5 |
|---|---|---|---|---|---|---|---|
| M1 (10, 20, 40, 2.747) | 0 | Correct | 0 | 0 | 0 | 0 | 0 |
| M2 (5, 10, 20, 2.747) | 0 | Correct | 0 | 0 | 0 | 0 | 0 |
| M3 (4, 8, 16, 2.759) | 0 | Correct | 0 | 0 | 0 | 0 | 0 |
| M4 (2, 4, 8, 2.859) | 2 | Correct | 0 | 0 | 0 | 0 | 0 |
| M5 (1, 2, 4, 1.323) | 94 | Correct | 0 | 10 | 20 | 9 | 7 |
| M6 (0.5, 1, 2, 0.980) | 182 | Correct | 4 | 9 | 20 | 6 | 7 |
| M7 (0.2, 0.4, 0.8, 1.276) | 316 | Correct | 10 | 11 | 20 | 9 | 20 |
| M8 (0.1, 0.2, 0.4, 1.410) | 352 | Correct | 18 | 19 | 20 | 13 | 5 |
| Tuned (0.697, 1.393, 2.787, 0.882) | 129 | Correct | 17 | 10 | 20 | 10 | 1 |

Fig. 4. Connected tanks.

Even when there are repeated measurements of variables, as in case 1, redescending estimators are superior to the Fair function due to their ability to discard outliers. We now examine the final example, which furthers these conclusions.

5.2.2. Connected tanks

This problem concerns two CSTRs connected by a valve, with liquid flowing into the first tank, from the first tank into the second tank, and out of the second tank. The three flows, F0, F1, and F2, and the heights L1 and L2 are measured periodically. The areas of the tanks, A1 and A2, are the unknown parameters. The process is depicted in Fig. 4 and the system is described by an index-2 DAE system (Albuquerque & Biegler, 1996).

Measurement data are simulated as was done by Albuquerque and Biegler (1996). All data have normal noise, with variance 0.01, added. In addition, the sensors for F2 and L1 are assumed to be stuck at the second time instant of measurement. The DAE system is discretized by the implicit Euler scheme. Measurements are taken at each interval of discretization. Thus this problem is an EVM problem with coupling in successive constraints for each set of measurements.
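To make the discretization step concrete, the sketch below writes implicit Euler residuals for simplified tank balances A1 dL1/dt = F0 − F1 and A2 dL2/dt = F1 − F2. These balances, and all names, are our simplifying assumptions and not the paper's exact index-2 DAE model.

```python
import numpy as np

def euler_residuals(L1, L2, F0, F1, F2, invA1, invA2, h):
    """Implicit Euler residuals for simplified tank mass balances,
    written in terms of the estimated parameters 1/A1 and 1/A2.
    Arrays are indexed by discretization point; h is the step size."""
    r1 = L1[1:] - L1[:-1] - h * invA1 * (F0[1:] - F1[1:])
    r2 = L2[1:] - L2[:-1] - h * invA2 * (F1[1:] - F2[1:])
    return np.concatenate([r1, r2])   # appended as equality constraints h(x, u, p) = 0
```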

All measurements except F0 are redundant, and the parameters 1/A1 and 1/A2 are observable. The problem is solved with the least squares estimator, Fair functions at 95, 80, and 70% asymptotic efficiencies, and the redescending estimator. All cases have been solved using the reduced Hessian successive quadratic programming (rSQP) approach of Biegler, Nocedal and Schmid (1995). Results for the estimation of 1/A1 and 1/A2 are summarized in Table 7. Observe that the least squares estimator performs very poorly: the estimates of the two parameters hit their lower and upper bounds, respectively. This behavior is to be expected, given the lack of robustness of the least squares estimator. The Fair functions perform somewhat better, but the estimate of 1/A1 still hits the lower bound. Again, the estimate of 1/A2 becomes better as the Fair function becomes more robust. In contrast, the redescending estimators perform much better: from M2 to M6, the estimates of 1/A1 and 1/A2 are very close to their true values. The AIC values (Table 7) have again been scaled by the number of measurements, as in the previous example. They indicate that the best estimator lies close to M6 and, upon tuning the estimator, we find this to be indeed the case: the estimates of 1/A1 and 1/A2 are almost exactly their true values.

In Figs. 5 and 6, we have plotted the fitted values of L1 and F2 when estimation is performed by the least squares estimator, the Fair function (efficiency = 70%), and the tuned redescending estimator. In addition, we have also plotted the original noisy data and the stuck measurements. The poor performance of least squares is clearly brought out: its fitted values are grossly underestimated. The Fair function underestimates L1 but fits F2 very closely; this is the effect of 1/A1 being at the lower bound. In both cases, the redescending estimator ignores the noise and fits L1 and F2 at their true values.

Table 7. DRPE on connected tanks

| Trials | 1/A1 | 1/A2 |
|---|---|---|
| True values | 0.5 | 0.5 |
| Least squares | 0.25 (a) | 0.55 (a) |
| Fair function (effect=95%) | 0.25 (a) | 0.320 |
| Fair function (effect=80%) | 0.25 (a) | 0.388 |
| Fair function (effect=70%) | 0.25 (a) | 0.439 |

Redescending estimators (a, b, c, nout, AIC):

| Trials | 1/A1 | 1/A2 |
|---|---|---|
| M1 (a=8, b=16, c=32, 48, 24.855) | 0.308 | 0.391 |
| M2 (a=4, b=8, c=16, 49, 36.647) | 0.454 | 0.485 |
| M3 (a=2, b=4, c=8, 95, 1.409) | 0.484 | 0.498 |
| M4 (a=1, b=2, c=4, 95, 1.078) | 0.485 | 0.499 |
| M5 (a=0.5, b=1, c=2, 103, 1.048) | 0.498 | 0.499 |
| M6 (a=0.1, b=0.2, c=0.4, 163, 1.303) | 0.492 | 0.495 |
| Tuned (a=0.362, b=0.724, c=1.449, 105, 0.974) | 0.499 | 0.502 |

(a) Estimate at its bound.

Fig. 5. Fitted values of L1.

Fig. 6. Fitted values of F2.

6. Conclusions

A simultaneous data reconciliation and parameter estimation strategy using redescending estimators has been presented. Redescending estimators are found to be very robust, as brought out by their superior performance with respect to a Huber estimator, the Fair function, on a variety of examples. In addition, outlier detection with the redescending estimators has been shown to be very straightforward. We have also provided a derivation of DRPE based on the AIC, and compared and contrasted broad strategies that can be derived from the AIC. The MINLP approach is a direct minimizer of the AIC, and this is supported by the earlier work of Yamamura et al. (1988); there also exists related work by Soderstrom et al. (2000). On the other hand, robust statistics and the redescending function perform the similar task of minimizing the AIC while requiring only the solution of an NLP, hence reducing the computational load of the MINLP/MILP approaches and alleviating the problems associated with them. An innovative two-step tuning strategy for the redescending estimator has been suggested that is based upon minimizing the AIC; this has been found to work very well on the examples considered. Future work will deal with the development of more efficient and reliable NLP algorithms that take advantage of the structure of DRPE problems with M-estimators. An important issue here is also reliable algorithm behavior in the presence of non-redundant variables. These topics shall be addressed in a future paper.

Acknowledgements

Funding from the Elkem Foundation is gratefully acknowledged for this work. The authors are also grateful to Dr Guillermo Sentoni for very helpful suggestions on redescending functions over the course of this work.

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, AC-19(6), 716.

Albuquerque, J. S. (1996). Parameter estimation and data reconciliation for dynamic systems. Ph.D. thesis, Carnegie Mellon University.

Albuquerque, J. S., & Biegler, L. T. (1996). Data reconciliation and gross-error detection for dynamic systems. American Institute of Chemical Engineering Journal, 42(10), 2841.

Biegler, L. T., Nocedal, J., & Schmid, C. (1995). A reduced Hessian method for large-scale constrained optimization. SIAM Journal of Optimization, 5(2), 314.

Chen, C., & Mangasarian, O. L. (1995). A class of smoothing functions for nonlinear and mixed complementarity problems. Technical report, Computer Sciences Department, University of Wisconsin.

Crowe, C. M. (1989). Observability and redundancy of process data for steady state reconciliation. Chemical Engineering Science, 44(12), 2909.

Crowe, C. M. (1994). Data reconciliation—progress and challenges. In Proceedings of PSE'94.

Davies, L., & Gather, U. (1993). The identification of multiple outliers. Journal of the American Statistical Association, 88(423), 782.

Edgar, T. F., & Himmelblau, D. M. (1988). Optimization of Chemical Processes. New York: McGraw-Hill.

Fletcher, R. (1987). Practical Methods of Optimization (2nd ed.). New York: Wiley.

Gopal, V., & Biegler, L. T. (1999). Smoothing methods for complementarity problems in process engineering. American Institute of Chemical Engineering Journal, 45(7), 1535.

Hampel, F. R. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69(346), 383.

Hoaglin, D. C., Mosteller, F., & Tukey, J. W. (1983). Understanding Robust and Exploratory Data Analysis. New York: Wiley.

Huber, P. J. (1981). Robust Statistics. New York: Wiley.

Jang, S. S., Joseph, B., & Mukai, H. (1986). Comparison of two approaches to on-line parameter and state estimation of nonlinear systems. Industrial Engineering and Chemical Process Design and Development, 25, 809.

Liebman, M. J., Edgar, T. F., & Lasdon, L. S. (1992). Efficient data reconciliation and estimation for dynamic processes using nonlinear programming techniques. Computers and Chemical Engineering, 16(10/11), 963.

Marlin, T. E., & Hrymak, A. N. (1997). Real-time operations optimization of continuous processes. In Fifth International Conference on Chemical Process Control (CACHE/AIChE).

Narasimhan, S., & Mah, R. S. H. (1987). Generalized likelihood ratio method for gross error identification. American Institute of Chemical Engineering Journal, 33(9), 1514.

Pai, C. C. D., & Fisher, G. D. (1988). Application of Broyden's method to reconciliation of nonlinearly constrained data. American Institute of Chemical Engineering Journal, 34(5), 873.

Perkins, J. D. (1998). Plant wide optimization—opportunities and challenges. In FOCAPO (CACHE/AIChE).

Rey, W. J. J. (1988). Introduction to Robust and Quasi-Robust Statistical Methods. Berlin/New York: Springer.

Robertson, D. G., Lee, J. H., & Rawlings, J. B. (1996). A moving horizon-based approach for least-squares estimation. American Institute of Chemical Engineering Journal, 42(8), 2209.

Rollins, D. K., Cheng, Y., & Devanathan, S. (1996). Intelligent selection of hypothesis tests to enhance gross error identification. Computers and Chemical Engineering, 20(5), 517.

Serth, R. W., & Heenan, W. A. (1986). Gross error detection and data reconciliation in stream-metering systems. American Institute of Chemical Engineering Journal, 32(5), 733.

Soderstrom, T. A., Himmelblau, D. M., & Edgar, T. F. (2000). A mixed integer optimization approach for simultaneous data reconciliation and identification of measurement bias. In ADCHEM 2000.

Stanley, G. M., & Mah, R. S. H. (1981). Observability and redundancy in process data estimation. Chemical Engineering Science, 36, 259.

Tjoa, I. B. F. (1991). Simultaneous solution and optimization strategies for data analysis. Ph.D. thesis, Carnegie Mellon University.

Yamamura, K., Nakajima, M., & Matsuyama, H. (1988). Detection of gross errors in process data using mass and energy balances. International Chemical Engineering, 28(1), 91.