Chemical Engineering Science 65 (2010) 6353–6361


Process data modeling using modified kernel partial least squares

Yingwei Zhang*, Yongdong Teng

Key Laboratory of Integrated Automation of Process Industry, Ministry of Education, Northeastern University, Shenyang, Liaoning 110004, PR China

Article info

Article history:

Received 6 January 2010

Received in revised form 27 August 2010

Accepted 3 September 2010
Available online 1 October 2010

Keywords:

Chemical analysis

Design

Simulation

Process control

Data modeling

Regression model

* Corresponding author. E-mail addresses: [email protected], [email protected] (Y. Zhang).

Abstract

In this paper, a multivariate data modeling approach based on modified kernel partial least squares (MKPLS) with a signal filtering method is proposed and then applied to quality prediction in industrial processes. The original KPLS has several disadvantages: (1) it must iterate until the score vectors converge in order to extract one principal component, which slows the computation and wastes time; (2) when the score vectors do not converge, a maximum number of iterations and a precision limit must be imposed, which reduces the accuracy of the original KPLS; and (3) it contains unwanted disturbing variation, i.e., the original KPLS is not able to remove undesirable systematic variation in X that is unrelated to Y. To address these problems, a modified KPLS regression model with orthogonal kernel projections to latent structures (O-KPLS) is proposed, which is called OKPLS-KPLS. The advantages of the proposed OKPLS-KPLS are: (1) it obtains the score vectors directly from the eigenvector corresponding to the largest eigenvalue instead of by iterative calculation, which improves the computing speed; (2) it does not require a maximum number of iterations or a precision limit, which increases the accuracy compared with the original KPLS; and (3) it removes unwanted disturbing variation through the data preprocessing method (O-KPLS). O-KPLS is proposed here as a nonlinear data preprocessing method that removes from X information not correlated to Y. Furthermore, O-KPLS solves the issue of data nonlinearity compared with orthogonal projections to latent structures (O-PLS). In this paper, the prediction performance of the proposed approach (OKPLS-KPLS) is compared to those of the original KPLS and OPLS using two examples. Of the three methods, OKPLS-KPLS shows the best performance in terms of regression fitting capacity and predicting future observations of the response variable(s).

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

A major objective in process data analysis is establishing regression models and predicting product quality from experimental or historical data. However, the high dimensionality and collinearity of such data make it difficult or, in some cases, impossible to reliably measure the product quality. The need to describe the quality of the final product from such data has led to the development of multivariate predictive models such as partial least squares (PLS) (Geladi and Kowalski, 1986; Wold et al., 2001a, 2001b; Hoskuldsson, 1988). PLS is a dimensionality reduction technique that finds a set of latent variables through the projection of the process (X) and quality (Y) spaces onto new subspaces by maximizing the covariance between the two spaces. PLS has been shown to be a powerful technique for process modeling and calibration in systems where the predictor variables are collinear, the measurement data contain noise, the variables have high dimensionality, and there are fewer observations than predictor variables.

In practical situations dealing with complex chemical and physical systems, however, linear PLS is inappropriate for describing the underlying data structure, because such systems may exhibit significant nonlinear characteristics while PLS assumes that the process data are linear. To tackle the issue of data nonlinearity, a number of approaches have been proposed that incorporate nonlinear features into the linear PLS framework. One of the approaches used a polynomial nonlinear mapping, formulated on the assumption that the relationship between the predictor and response latent variables can be modeled using a polynomial expansion (Wold et al., 1989; Frank, 1990; Wold, 1992). Other approaches have fitted the nonlinear inner mapping using artificial neural networks, which can approximate any continuous function to any desired accuracy (Qin and McAvoy, 1992; Holcomb and Morari, 1992; Malthouse et al., 1997).

Recently, a nonlinear PLS technique for tackling the problem of data nonlinearity, called kernel PLS (KPLS), was developed (Rosipal and Trejo, 2001). KPLS differs from the previously mentioned nonlinear PLS algorithms in that the original input data are nonlinearly transformed into a feature space of arbitrary dimensionality via a nonlinear mapping, and a linear PLS model is then created in that feature space. KPLS can efficiently compute regression coefficients in high-dimensional feature spaces using nonlinear kernel functions. Compared to other nonlinear approaches, the main advantage of KPLS is that it avoids nonlinear optimization by utilizing the kernel function corresponding to the inner product in the feature space (Rosipal and Trejo, 2001). As a result, KPLS essentially requires only linear algebra, making it as simple as standard PLS. Moreover, KPLS can handle a wide range of nonlinearities owing to its ability to use different kinds of kernels. Based on these merits, KPLS has been shown to perform better than linear PLS in regressing and classifying data from nonlinear systems (Rosipal, 2003; Rosipal et al., 2003; Rannar et al., 1994). However, the original KPLS algorithm must iterate until the score vectors converge in order to extract one principal component, which slows the computation and wastes time. If the score vectors do not converge, a maximum number of iterations and a precision limit must be imposed to obtain the score vectors, which reduces the accuracy of the original KPLS. Hence, a modified KPLS regression model is proposed which, in contrast to the original KPLS regression model (Kyungpil et al., 2005), does not wait for the convergence of the score vectors but obtains them directly by solving a constrained optimization problem.

In general, multivariate process data contain unwanted systematic variation in X that is unrelated to Y. Several data preprocessing methods have been devised to remove such undesirable systematic variation. Wold et al. (1998) developed orthogonal signal correction (OSC) to remove systematic variation that is unrelated, or orthogonal, to the response matrix Y from the predictor matrix X. In OSC, the largest variation in X having zero correlation with the reference value Y is selectively removed from X. Recently, an orthogonal projections to latent structures (O-PLS) method was proposed by Trygg and Wold (2002). O-PLS provides a way to remove from an input data set X systematic variation that is not correlated to the response set Y; in other words, to remove variability in X that is orthogonal to Y. The O-PLS method analyzes the disturbing variation in each regular PLS component. However, in the O-PLS method, PLS can be performed effectively only on a set of observations that vary linearly. When the variations are nonlinear, linear PLS is inappropriate for fitting the nonlinear data, which degrades the ability of O-PLS to remove from X systematic variation that is not correlated to the response set Y. For this reason, an O-KPLS method is proposed, which solves the issue of data nonlinearity compared to O-PLS.

In this paper, the modified KPLS regression model is combined with the O-KPLS signal filtering method; the combination is referred to as OKPLS-KPLS.

The paper is organized as follows. The modified KPLS and the signal filtering method are proposed in Section 2. In Section 3, the performance of OKPLS-KPLS in predicting the response variables is compared to those of the original KPLS and OPLS using two examples. Finally, concluding remarks are given in Section 4.

2. Proposed methods

2.1. The modified KPLS

While PLS can be performed on linear systems, kernel partial least squares (KPLS) maps nonlinear data into a higher-dimensional space, called the feature space, in which they can be modeled linearly. KPLS is formulated in this feature space to extend linear PLS to its nonlinear kernel form. As in Kyungpil et al. (2005), the KPLS algorithm, called the original KPLS here, is an iterative process: to extract one principal component, one must wait for the score vectors to converge, which costs a great deal of time. At the same time, to guarantee termination, a maximum number of iterations and a precision limit must be imposed, which reduces the accuracy of the original KPLS. To solve these problems, a modified KPLS is proposed here, which does not wait for the convergence of the score vectors but obtains them directly by solving a constrained optimization problem.

As in Kyungpil et al. (2005), a nonlinear mapping function $\Phi(\cdot)$ projects the input vectors from the input space into a feature space F:

\[ X_i \in \mathbb{R}^N \rightarrow \Phi(X_i) \in F \]

so the sample matrix X becomes $[\Phi(X_1), \Phi(X_2), \ldots, \Phi(X_n)]^T$, that is, $\Phi(X) = [\Phi(X_1), \Phi(X_2), \ldots, \Phi(X_n)]^T$. The kernel matrix K is defined as $K = \Phi(X)\Phi(X)^T$, where the $(i, j)$ element of K is $K_{ij} = K(X_i, X_j) = \Phi(X_i)^T \Phi(X_j)$.

The modified KPLS seeks projection vectors $\alpha_F$ and $v$ such that the following function attains its maximum:

\[ \max_{\alpha_F, v} \rho(\alpha_F, v) = \alpha_F^T \Phi(X)^T Y v \qquad (1) \]

subject to

\[ \alpha_F^T \alpha_F = 1 \qquad (2) \]
\[ v^T v = 1 \qquad (3) \]

There exists an n-dimensional column vector $\alpha$ such that

\[ \alpha_F = \Phi(X)^T \alpha \qquad (4) \]

Substituting Eq. (4) into Eqs. (1) and (2), we obtain

\[ \max_{\alpha, v} \rho(\alpha, v) = \alpha_F^T \Phi(X)^T Y v = \alpha^T \Phi(X)\Phi(X)^T Y v = \alpha^T K Y v \qquad (5) \]

subject to

\[ \alpha_F^T \alpha_F = \alpha^T \Phi(X)\Phi(X)^T \alpha = \alpha^T K \alpha = 1 \qquad (6) \]
\[ v^T v = 1 \qquad (7) \]

To solve this constrained problem, construct the Lagrange function

\[ L(\alpha, v, \lambda, \mu) = \alpha^T K Y v - \tfrac{1}{2}\lambda\,(\alpha^T K \alpha - 1) - \tfrac{1}{2}\mu\,(v^T v - 1) \qquad (8) \]

where $\tfrac{1}{2}\lambda$ and $\tfrac{1}{2}\mu$ are Lagrange multipliers. Setting the partial derivatives of $L(\alpha, v, \lambda, \mu)$ with respect to $\alpha$ and $v$ equal to zero gives

\[ \frac{\partial L}{\partial \alpha} = K Y v - \lambda K \alpha = 0 \qquad (9) \]

\[ \frac{\partial L}{\partial v} = Y^T K \alpha - \mu v = 0 \qquad (10) \]

From Eqs. (9) and (10) we get $\alpha^T K Y v = \lambda$ and $v^T Y^T K \alpha = \mu$, so $\lambda = \lambda^T = v^T Y^T K \alpha = \mu$. Eqs. (9) and (10) then become

\[ K Y v = \lambda K \alpha \qquad (11) \]
\[ Y^T K \alpha = \lambda v \qquad (12) \]

Because K is a positive definite matrix, from Eqs. (11) and (12) we obtain

\[ Y Y^T K \alpha = \lambda^2 \alpha \qquad (13) \]
\[ Y^T K Y v = \lambda^2 v \qquad (14) \]


Because $\max_{\alpha,v}\rho(\alpha,v) = \alpha^T K Y v = \lambda$, the objective $\alpha^T K Y v$ attains its maximum when $\alpha$ and $v$ are the eigenvectors corresponding to the largest eigenvalue. Thus, the constrained optimization problem reduces to solving the two characteristic equations above. The eigenvalues and eigenvectors of $YY^TK$ and $Y^TKY$ have the following properties (a numerical illustration follows Property 2).

Property 1. $YY^TK$ and $Y^TKY$ have the same non-zero eigenvalues: $\lambda_1^2 \ge \lambda_2^2 \ge \cdots \ge \lambda_r^2 > 0$, where $r = \min(\operatorname{rank}(YY^TK), \operatorname{rank}(Y^TKY))$.

Let $A = K^{-1/2} K Y Y^T K K^{-1/2}$ and $B = Y^T K Y$. By the theory of matrices, $YY^TK$ and $A$, and $Y^TKY$ and $B$, have the same non-zero eigenvalues, respectively.

Let $H = K^{-1/2} K Y$; then $A = HH^T$ and $B = H^T H$. Based on the singular value decomposition (SVD) theorem, we obtain

\[ H = \sum_{i=1}^{r} \lambda_i\, u_i d_i^T \]

where $\lambda_i^2$ is the $i$-th non-zero eigenvalue of $A$ and $B$, and $u_i$ and $d_i$ are the orthonormal eigenvectors corresponding to $\lambda_i^2$ of $A$ and $B$, respectively ($i = 1, 2, \ldots, r$).

Property 2. Let $\alpha_i = K^{-1/2} u_i$ and $v_i = d_i$, $i = 1, 2, \ldots, r$. Then

(1) $\alpha_i$ and $v_i$ are the eigenvectors corresponding to $\lambda_i^2$ of $YY^TK$ and $Y^TKY$, respectively;
(2) $\alpha_i^T K \alpha_j = v_i^T v_j = \delta_{ij}$ and $\alpha_i^T K Y v_j = \lambda_i \delta_{ij}$, where $\delta_{ij} = 1$ for $i = j$ and $\delta_{ij} = 0$ for $i \ne j$, $i, j = 1, 2, \ldots, r$.
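As a quick numerical illustration of Property 1 (not part of the original derivation), the shared non-zero spectrum of $YY^TK$ and $Y^TKY$ can be checked on random data. The sketch below uses NumPy; the sample sizes, the kernel width c and all variable names are arbitrary choices made for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, q = 30, 5, 2                      # samples, inputs, outputs (illustrative sizes)
X = rng.normal(size=(n, m))
Y = rng.normal(size=(n, q))

# Radial basis kernel matrix K with an arbitrary width c
c = 10.0
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq / c)

# Non-zero eigenvalues of Y Y^T K (n x n) and of Y^T K Y (q x q) should coincide
eig_big = np.linalg.eigvals(Y @ Y.T @ K)
eig_small = np.linalg.eigvals(Y.T @ K @ Y)
top = lambda w: np.sort(np.real(w))[::-1][:q]
print(top(eig_big))    # largest q eigenvalues of Y Y^T K ...
print(top(eig_small))  # ... match those of Y^T K Y (Property 1)
```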

After obtaining $\alpha$ and $v$, the score vectors $t$ and $u$ are calculated as

\[ t = \Phi(X)\Phi(X)^T \alpha = K\alpha \qquad (15) \]
\[ u = Yv \qquad (16) \]

and the K and Y matrices are then deflated:

\[ K \leftarrow (I - tt^T)\, K\, (I - tt^T) \qquad (17) \]
\[ Y \leftarrow Y - tt^T Y \qquad (18) \]

The deflation in (17) and (18) is a rank-one reduction of the K and Y matrices using the newly extracted score vector $t$. In particular, the K matrix is deflated as

\[ K \leftarrow (I - tt^T)\, K\, (I - tt^T) = K - tt^T K - Ktt^T + tt^T K tt^T \]

where I is an n-dimensional identity matrix.

The regression coefficient B in the modified KPLS algorithm can be obtained from

\[ B = \Phi^T U (T^T K U)^{-1} T^T Y \qquad (19) \]

where T and U are $(n \times k)$ matrices of the $k$ extracted score vectors. When the number of test data is $n_t$ $(1, \ldots, n_t)$, the predictions on the training data and the test data can be made, respectively, as

\[ \hat{Y} = \Phi B = K U (T^T K U)^{-1} T^T Y \qquad (20) \]
\[ \hat{Y}_{new} = \Phi_{new} B = K_{new} U (T^T K U)^{-1} T^T Y \qquad (21) \]

Here, $\Phi_{new}$ is the matrix of the mapped test points and $K_{new}$ is the $(n_t \times n)$ test kernel matrix whose elements are $K_{ij} = K(X_i, X_j)$, where $X_i$ is the $i$-th test vector and $X_j$ is the $j$-th training vector.
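To make Eqs. (19)–(21) concrete, the following sketch evaluates the training and test predictions given score matrices T and U that have already been extracted (for example, by the algorithm summarized at the end of this subsection). It is an illustrative NumPy transcription, not the authors' code; the function and argument names are ours.

```python
import numpy as np

def kpls_predict(K, K_new, T, U, Y):
    """Predictions of Eqs. (20)-(21): Y_hat = K U (T^T K U)^{-1} T^T Y."""
    core = np.linalg.solve(T.T @ K @ U, T.T @ Y)  # (T^T K U)^{-1} T^T Y
    y_fit = K @ U @ core                          # Eq. (20): fitted training responses
    y_new = K_new @ U @ core                      # Eq. (21): predicted test responses
    return y_fit, y_new
```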

Before applying the modified KPLS, mean centering in the high-dimensional feature space should be performed. This can be done by substituting the kernel matrices $K$ and $K_{new}$ with $\tilde{K}$ and $\tilde{K}_{new}$, where

\[ \tilde{K} = \left(I - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^T\right) K \left(I - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^T\right) \]

\[ \tilde{K}_{new} = \left(K_{new} - \frac{1}{n}\mathbf{1}_{n_t}\mathbf{1}_n^T K\right)\left(I - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^T\right) \]

Here, I is an n-dimensional identity matrix, and $\mathbf{1}_n$, $\mathbf{1}_{n_t}$ are vectors whose elements are ones, with length $n$ and $n_t$, respectively.
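A direct transcription of the two centering formulas, assuming K is the (n x n) training kernel and K_new the (n_t x n) test kernel (an illustrative sketch; the names are ours):

```python
import numpy as np

def center_kernels(K, K_new):
    """Mean-center training and test kernel matrices in feature space."""
    n, nt = K.shape[0], K_new.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n              # I - (1/n) 1_n 1_n^T
    K_tilde = C @ K @ C                              # centered training kernel
    K_new_tilde = (K_new - (np.ones((nt, n)) / n) @ K) @ C   # centered test kernel
    return K_tilde, K_new_tilde
```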

There exist a number of kernel functions. According to Mercer's theorem of functional analysis, there exists a mapping into a space where a kernel function acts as a dot product if the kernel function is a continuous kernel of a positive integral operator. Hence, the requirement on the kernel function is that it satisfies Mercer's theorem (Cristianini and Shawe-Taylor, 2000). Representative kernel functions are given below.

Polynomial kernel:
\[ k(x, y) = \langle x, y \rangle^{d} \]

Sigmoid kernel:
\[ k(x, y) = \tanh(\beta_0 \langle x, y \rangle + \beta_1) \]

Radial basis kernel:
\[ k(x, y) = \exp\left(-\frac{\|x - y\|^2}{c}\right) \]

where $d$, $\beta_0$, $\beta_1$ and $c$ (Mika et al., 1999; Lee et al., 2004b) are specified a priori by the user. The polynomial kernel and the radial basis kernel always satisfy Mercer's theorem, whereas the sigmoid kernel satisfies it only for certain values of $\beta_0$ and $\beta_1$ (Haykin, 1999; Zhang and Joe Qin, 2008a, 2008b, 2009). A specific choice of kernel function implicitly determines the mapping $\Phi$ and the feature space F. Among the different types of kernels, the radial basis kernel is the most common.
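The three kernels can be written down directly; d, b0, b1 and c are the user-specified parameters mentioned above, and the vectorized radial basis form below builds a whole kernel matrix between two sample sets (an illustrative sketch, names ours):

```python
import numpy as np

def poly_kernel(x, y, d=2):
    return np.dot(x, y) ** d                     # k(x, y) = <x, y>^d

def sigmoid_kernel(x, y, b0=1.0, b1=0.0):
    return np.tanh(b0 * np.dot(x, y) + b1)       # k(x, y) = tanh(b0 <x, y> + b1)

def rbf_kernel_matrix(X1, X2, c=1.0):
    """Radial basis kernel matrix K_ij = exp(-||x_i - x_j||^2 / c) for row-wise samples."""
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / c)
```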

The steps of the proposed modified KPLS algorithm are as follows (a code sketch follows the list):

(1) Compute $\alpha$ according to Eq. (13), where $\alpha$ is the eigenvector corresponding to the largest eigenvalue of $YY^TK$.
(2) Compute $v$ according to Eq. (14), where $v$ is the eigenvector corresponding to the largest eigenvalue of $Y^TKY$.
(3) Compute $t$ according to Eq. (15).
(4) Compute $u$ according to Eq. (16).
(5) Deflate:
\[ K \leftarrow (I - tt^T)\, K\, (I - tt^T) \]
\[ Y \leftarrow Y - tt^T Y \]

Steps (1)–(5) are repeated until the desired number of latent variables has been extracted.
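Putting steps (1)–(5) together, the sketch below is a minimal NumPy illustration of the eigenvector-based extraction loop. It assumes a centered kernel matrix K and a centered response matrix Y, and it normalizes each score vector t so that the rank-one deflation of Eqs. (17) and (18) acts as a projection (a scaling the paper leaves implicit); the function and variable names are ours, not the authors'.

```python
import numpy as np

def modified_kpls(K, Y, n_components):
    """Modified KPLS score extraction, a sketch of steps (1)-(5) above.

    K : (n, n) centered kernel matrix; Y : (n, q) centered response matrix.
    Returns score matrices T and U with one column per latent variable.
    """
    K, Y = K.copy(), Y.astype(float)
    n = K.shape[0]
    T, U = [], []
    for _ in range(n_components):
        # Step (1): alpha = eigenvector of Y Y^T K for its largest eigenvalue (Eq. (13))
        w, V = np.linalg.eig(Y @ Y.T @ K)
        alpha = np.real(V[:, np.argmax(np.real(w))])
        # Step (2): v = eigenvector of Y^T K Y for its largest eigenvalue (Eq. (14))
        wv, Vv = np.linalg.eigh(Y.T @ K @ Y)
        v = Vv[:, -1]
        # Steps (3)-(4): t = K alpha (Eq. (15)), normalized, and u = Y v (Eq. (16))
        t = K @ alpha
        t /= np.linalg.norm(t)
        u = Y @ v
        # Step (5): deflate K and Y (Eqs. (17)-(18))
        P = np.eye(n) - np.outer(t, t)
        K = P @ K @ P
        Y = Y - np.outer(t, t) @ Y
        T.append(t)
        U.append(u)
    return np.column_stack(T), np.column_stack(U)
```

Together with the kernel, centering, and prediction sketches above, this gives an end-to-end illustration of the modified KPLS regression of Eqs. (13)–(21).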

2.2. OKPLS-KPLS

The orthogonal signal correction (OSC) method was proposed by Wold et al. (1998). The idea was to remove systematic information in X not correlated to the modeling of Y in order to achieve better models in multivariate calibration. It was followed by a number of different OSC methods (Trygg and Wold, 2002; Fearn, 2000; Sjoblom et al., 1998; Westerhuis et al., 2001; Eriksson et al., 2000; Hoskuldsson, 2001; Wold, 1982; Kvalheim, 1992; Feudale et al., 2002). Svensson et al. (2002), Goicoechea and Olivieri (2001) and Trygg (2001) have reviewed and compared different OSC methods.

Trygg and Wold (2002) proposed an OSC method called orthogonal projections to latent structures (O-PLS). O-PLS is a preprocessing and filtering method that removes systematic orthogonal variation from a given data set X. The O-PLS method can be performed effectively only on a set of observations that vary linearly. When the variations are nonlinear, O-PLS cannot be performed effectively on the nonlinear data, which degrades its ability to remove from X systematic variation that is not correlated to the response set Y. Thus, an OKPLS-KPLS method is proposed in this section. Its steps are as follows (a condensed kernel-space sketch is given after the test-set steps):

For training set X and y:

(1) Set the number of latent variables $h$. Calculate vectors $u$, $t$: KPLS(X, y) $\Rightarrow$ $u$, $t$.
(2) Calculate the weight vector $w = \Phi(X)^T u$.
(3) Calculate the loading vector $p = \Phi(X)^T t/(t^T t)$.
(4) Calculate the orthogonal weight vector:
\[ w_\perp = p - \frac{w^T p}{w^T w}\, w = \Phi(X)^T t/(t^T t) - \frac{u^T K t/(t^T t)}{u^T K u}\, \Phi(X)^T u \]
(5) Calculate the orthogonal score vector:
\[ t_\perp = \Phi(X) w_\perp = K t/(t^T t) - \frac{u^T K t/(t^T t)}{u^T K u}\, K u \]
(6) Calculate the orthogonal loading vector $p_\perp = \Phi(X)^T t_\perp/(t_\perp^T t_\perp)$.
(7) Update $\Phi(X)$ and the matrix K after denoising:
\[ \Phi(X) \leftarrow \Phi(X) - t_\perp t_\perp^T \Phi(X)/(t_\perp^T t_\perp) = \left(I - t_\perp t_\perp^T/(t_\perp^T t_\perp)\right)\Phi(X) \]
\[ K = \Phi(X)\Phi(X)^T \leftarrow \left(I - t_\perp t_\perp^T/(t_\perp^T t_\perp)\right) K \left(I - t_\perp t_\perp^T/(t_\perp^T t_\perp)\right) \]
(8) Compute $\alpha$ according to Eq. (13), where $\alpha$ is the eigenvector corresponding to the largest eigenvalue of $YY^TK$.
(9) Compute $v$ according to Eq. (14), where $v$ is the eigenvector corresponding to the largest eigenvalue of $Y^TKY$.
(10) Compute $t$ according to Eq. (15).
(11) Compute $u$ according to Eq. (16). Repeat until $h$ latent variables are extracted.
(12) Deflate:
\[ K \leftarrow (I - tt^T)\, K\, (I - tt^T) \]
\[ Y \leftarrow Y - tt^T Y \]

For the test set $X_{new}$:

(1) Calculate the orthogonal score vector:
\[ t_{\perp,new} = \Phi(X_{new}) w_\perp = K_{new} t/(t^T t) - \frac{u^T K t/(t^T t)}{u^T K u}\, K_{new} u \]
(2) Update $\Phi(X_{new})$ and the matrix $K_{new}$ after denoising:
\[ \Phi(X_{new}) \leftarrow \Phi(X_{new}) - t_{\perp,new} t_\perp^T \Phi(X)/(t_\perp^T t_\perp) \]
\[ K_{new} = \Phi(X_{new})\Phi(X)^T \leftarrow \left(K_{new} - t_{\perp,new} t_\perp^T K/(t_\perp^T t_\perp)\right)\left(I - t_\perp t_\perp^T/(t_\perp^T t_\perp)\right) \]
(3) Compute $\hat{Y}_{new}$ according to Eq. (21).

For a description of the O-KPLS method with a Y matrix, please see Appendix A. In this paper, the properties of KPLS shown in Properties 1 and 2 are proposed, and KPLS is performed by using these properties, i.e., the score vectors $t$ and $u$ are obtained through Properties 1 and 2. Compared with Rosipal and Trejo (2001), the two properties are new.
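The filtering steps above enter the later computations only through the kernel matrices, so they can be condensed into kernel-space operations (the explicit Φ(X) updates need not be formed). The sketch below assumes a single response vector y, one orthogonal component, and score vectors t and u from a preliminary KPLS fit; it is an illustration under those assumptions, with names of our choosing.

```python
import numpy as np

def okpls_filter(K, K_new, t, u):
    """Remove one Y-orthogonal component from the training kernel K (n x n)
    and the test kernel K_new (nt x n), following the O-KPLS steps above."""
    tt = float(t @ t)
    # orthogonal score vectors for training and test data (steps (5) and (1))
    coef = (u @ K @ t / tt) / (u @ K @ u)
    t_o = K @ t / tt - coef * (K @ u)
    t_o_new = K_new @ t / tt - coef * (K_new @ u)
    # deflate both kernels with the projector I - t_o t_o^T / (t_o^T t_o)
    too = float(t_o @ t_o)
    P = np.eye(K.shape[0]) - np.outer(t_o, t_o) / too
    K_filtered = P @ K @ P
    K_new_filtered = (K_new - np.outer(t_o_new, t_o) @ K / too) @ P
    return K_filtered, K_new_filtered, t_o, t_o_new
```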

2.3. Quality prediction

To compare the predictive abilities of the models, several measures of a model's ability to fit data and of its predictive power are introduced. All of these measures provide an estimate of the average deviation of the model from the data. The root-mean-square error (RMSE) of the residuals is defined as

\[ \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n}} \]

where $y_i$ is the reference value, $\hat{y}_i$ is the predicted value, and $n$ is the total number of samples. The RMSE is termed the root mean square error in calibration (RMSEC) for the calibration set and the root mean square error in prediction (RMSEP) for the test set.

Another measure of the model fit to the training data is $R^2$, defined as

\[ R^2 = 1 - \frac{\mathrm{SSR}}{\mathrm{SSY}} \]

where SSR is the sum of squares of the residuals and SSY is the sum of squares of the response variable corrected for the mean. A value of $R^2 = 1$ denotes that the model fits the data perfectly. A value of $R^2 = 0.5$ means that only half of the total sum of squares in the training set is explained by the model, and that the other half is in the residuals. A model with an $R^2$ value of 0.7 is considered a useful representation of the calibration data, and a model with $R^2 > 0.9$ is considered very good. The same measure can be computed for the values predicted from the test set, giving $Q^2$. Usually, $R^2$ for a calibration set is larger than $Q^2$ for a test set, since calibration models can easily lead to overfitting of the data.
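The fit and prediction measures above are straightforward to compute; a minimal sketch (function names ours):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error: RMSEC on the calibration set, RMSEP on the test set."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

def r_squared(y_true, y_pred):
    """R^2 = 1 - SSR/SSY; the same formula on test-set predictions gives Q^2."""
    y_true = np.asarray(y_true)
    ssr = np.sum((y_true - np.asarray(y_pred)) ** 2)   # residual sum of squares
    ssy = np.sum((y_true - y_true.mean()) ** 2)        # mean-corrected total sum of squares
    return float(1.0 - ssr / ssy)
```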

3. Results and discussion

In this section, the prediction performances of the method proposed here (OKPLS-KPLS) and two other regression models (original KPLS and OPLS-KPLS) are evaluated on the Tennessee Eastman process simulation data set and the continuous annealing process data set. The two data sets were mean centered and scaled to unit variance before modeling. To ensure a fair comparison, the same calibration and validation sets were used for each of the models. The OPLS-KPLS and OKPLS-KPLS models were built after data correction by OSC pretreatment, and the same number of OSC components was used for the filtering in both models.

3.1. Tennessee Eastman process

The Tennessee Eastman process is a complex nonlinear process, which was created by the Eastman Chemical Company to provide a realistic industrial process for evaluating process control and monitoring methods. The test process is based on a simulation of an actual industrial process where the components, kinetics, and operating conditions have been modified for proprietary reasons. There are five major unit operations in the process: a reactor, a condenser, a recycle compressor, a separator, and a stripper; and it contains eight components: A, B, C, D, E, F, G and H. The four reactants A, C, D and E and the inert B are fed to the reactor, where the products G and H are formed and a byproduct F is also produced. The process has 22 continuous process measurements, 12 manipulated variables and 19 composition measurements sampled less frequently. The details of the process description are well explained in Chiang et al. (2001) and Lee et al. (2004a). A total of 52 variables are used for monitoring in this study. The compositions need to be predicted since they are hard to measure on-line in practice. A sampling interval of 3 min was used to collect the simulated data for the training and testing sets. For each fault, two sets of data were generated: the training data were used to build the models, and the testing data were used to validate them. The training data sets for each fault are composed of 480 observations, and the testing data sets for each fault are composed of 960 observations. All faults in the test data set were introduced from sample 160. The data were generated by Chiang et al. (2001) and Lee et al. (2004a) and can be downloaded from http://brahms.scs.uiuc.edu.

The prediction results of the original KPLS, OPLS-KPLS and OKPLS-KPLS regression models for the training and test sets are listed in Tables 1 and 2, and the response variables are shown in Figs. 1–6. For variable 49 (composition E), OKPLS-KPLS shows the best predictive ability, with the highest Q² of 0.9987 and the lowest RMSEP of 0.0914 compared to the original KPLS and OPLS-KPLS. At the same time, OKPLS-KPLS shows improved regression fitting capacity (R² = 0.9992 and RMSEC = 0.0647) compared to the original KPLS and OPLS-KPLS. As shown in Figs. 1–3, the OKPLS-KPLS model fits the test data best. For variable 52 (composition H), OKPLS-KPLS shows the best predictive ability, with the highest Q² of 0.9987 and the lowest RMSEP of 0.0468 compared to the original KPLS and OPLS-KPLS. At the same time, OKPLS-KPLS shows improved regression fitting capacity (R² = 0.9996 and RMSEC = 0.0307) compared to the original KPLS and OPLS-KPLS. As shown in Figs. 4–6, the OKPLS-KPLS model fits the test data best. Consequently, among the three approaches, OKPLS-KPLS is the best in terms of maintaining the highest predictive ability over the test set. From Table 3, the training times of the original KPLS and OPLS-KPLS are longer than that of OKPLS-KPLS, since the iterative calculation is avoided and more disturbances are removed.

Table 1. Summary of modeling results (variable 49).

                        Original KPLS   OPLS-KPLS   OKPLS-KPLS
A (latent variables)    8               7           7
Training  R²            0.9960          0.9968      0.9992
          RMSEC         0.1494          0.1321      0.0647
Test      Q²            0.9645          0.9875      0.9987
          RMSEP         0.4756          0.2816      0.0914

Table 2. Summary of modeling results (variable 52).

                        Original KPLS   OPLS-KPLS   OKPLS-KPLS
A (latent variables)    10              9           9
Training  R²            0.9980          0.9983      0.9996
          RMSEC         0.0659          0.0604      0.0307
Test      Q²            0.9573          0.9872      0.9987
          RMSEP         0.2699          0.1475      0.0468

Fig. 1. Relationships of predicted and observed response variable of test data using KPLS in the Tennessee Eastman process (variable 49).

Fig. 2. Relationships of predicted and observed response variable of test data using OPLS-KPLS in the Tennessee Eastman process (variable 49).

Fig. 3. Relationships of predicted and observed response variable of test data using OKPLS-KPLS in the Tennessee Eastman process (variable 49).

Table 3. Training time.

                        Original KPLS   OPLS-KPLS   OKPLS-KPLS
Variable 49             4.8             4.3         3.7
Variable 52             5.2             4.6         4.1

Fig. 4. Prediction results of test data (variable 52) using KPLS in the Tennessee Eastman process (blue: observation; red: prediction). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 5. Prediction results of test data (variable 52) using OPLS-KPLS in the Tennessee Eastman process (blue: observation; red: prediction).

Fig. 6. Prediction results of test data (variable 52) using OKPLS-KPLS in the Tennessee Eastman process (blue: observation; red: prediction).

3.2. Annealing process

The physical layout of the continuous annealing process is shown in Fig. 7. The steel strip travels a distance of 210 m and stays in the section 6–8 min; the maximum line speed is 1060 m/min, the width of the strip is 730–1230 mm, the thickness is 0.18–0.55 mm, the maximum weight is 26.5 t, and the strip is heated to 710 °C. The continuous annealing process is composed of 21 tension control sections. The entry coil is first opened by the payoff reel (POR) and is finally welded into a strip. The strip passes through 1# bridle roll (1BR), entry loop (ELP), 2# bridle roll (2BR), 1# dancer roll (1DCR), and 3# bridle roll (3BR); then it enters the continuous annealing furnace. The annealing technology consists of rapid cooling, reheating, and inclined over-ageing. The annealing equipment includes the heating furnace (HF), soaking furnace (SF), slow cooling furnace (SCF), 1# cooling furnace (1C), reheating furnace (RF), over-ageing furnace (OA), and 2# cooling furnace (2C). After the annealing is completed, the strip passes in turn through 4# bridle roll (4BR), delivery loop (DLP), 5# bridle roll (5BR), the temper rolling machine (TPM), 6# bridle roll (6BR), 2# dancer roll (2DCR), and 7# bridle roll (7BR). Finally, the strip enters the roll type reel (TR) to become a coil.

In the continuous annealing process, strip tension is an important factor that determines whether the continuous annealing line works steadily and promptly. The bridle roll is one of the key components for controlling tension, and the bridle roll speeds and tensions along the line are directly related to the quality of the steel strip. There are seven bridle rolls, 1BR–7BR. The variables can be divided into groups according to the process units or work zones. For example, 1BR has five rolls, and each roll has two variables: the percentage of current and the velocity of the roll. The percentage of current and the velocity changes of the 1st–4th rolls are input variables. Since the first four rolls drive the fifth roll, the percentage of current and the velocity of the fifth roll are output variables. There are 300 samples for training. For the output variable 1BR5R current, we take 20 samples as test data; for the output variable 1BR5R velocity, we use 12 samples as test data. The performance parameters are summarized in Tables 4 and 5, and the response variables are shown in Figs. 8–13. For variable 1BR5R current, the original KPLS model predicts the test data with Q² = 0.7347 and RMSEP = 0.5349 using seven latent variables. After OSC preprocessing, OPLS-KPLS shows slightly improved predictive ability (Q² = 0.8363 and RMSEP = 0.4202) compared to the original KPLS. Compared to OPLS-KPLS, OKPLS-KPLS shows a better predictive ability, with a higher Q² of 0.9059 and a lower RMSEP of 0.3185 for the test data. As shown in Figs. 8–10, the OKPLS-KPLS model fits the test data best. For variable 1BR5R velocity, the original KPLS model fits the training data with R² = 0.8315 and RMSEC = 0.0159, while the prediction result for the test data set gives Q² = 0.7005 and RMSEP = 0.0237. In the OPLS-KPLS model, removing one OSC component improves the predictive ability for the test data (Q² = 0.7921, RMSEP = 0.0197). In OKPLS-KPLS, R² for the training data is higher than that of OPLS-KPLS, and its estimation capability for the test data (Q²) is improved from 0.7921 to 0.8343. As shown in Figs. 11–13, the OKPLS-KPLS model fits the test data best. Consequently, among the three approaches, OKPLS-KPLS shows the highest predictive ability. From Table 6, the training times of the original KPLS and OPLS-KPLS are again longer than that of OKPLS-KPLS, since the iterative calculation is avoided and more disturbances are removed.

Fig. 7. The continuous annealing process. POR: payoff reel; BR: bridle roll; ELP: entrance loop; DCR: dancer roll; HF: heating furnace; SF: soaking furnace; SCF: slow cooling furnace; C: cooling furnace; RF: reheating furnace; DLP: delivery loop; TPM: temper rolling machine; TR: roll type reel; OA: over ageing furnace; TM: tension model.

Table 4. Summary of modeling results (variable 1BR5R current).

                        Original KPLS   OPLS-KPLS   OKPLS-KPLS
A (latent variables)    7               6           6
Training  R²            0.9253          0.9246      0.9239
          RMSEC         0.2938          0.2950      0.2964
Test      Q²            0.7347          0.8363      0.9059
          RMSEP         0.5349          0.4202      0.3185

Table 5. Summary of modeling results (variable 1BR5R velocity).

                        Original KPLS   OPLS-KPLS   OKPLS-KPLS
A (latent variables)    16              15          15
Training  R²            0.8315          0.9375      0.9717
          RMSEC         0.0159          0.0097      0.0065
Test      Q²            0.7005          0.7921      0.8343
          RMSEP         0.0237          0.0197      0.0176

Table 6. Training time.

                        Original KPLS   OPLS-KPLS   OKPLS-KPLS
Current                 3.9             3.4         3.1
Velocity                4.1             3.6         3.3

Fig. 8. Prediction results of test data (variable 1BR5R current) using KPLS in the continuous annealing process (blue: observation; red: prediction).

Fig. 9. Prediction results of test data (variable 1BR5R current) using OPLS-KPLS in the continuous annealing process (blue: observation; red: prediction).

Fig. 10. Prediction results of test data (variable 1BR5R current) using OKPLS-KPLS in the continuous annealing process (blue: observation; red: prediction).

Fig. 11. Prediction results of test data (variable 1BR5R velocity) using KPLS in the continuous annealing process (blue: observation; red: prediction).

Fig. 12. Prediction results of test data (variable 1BR5R velocity) using OPLS-KPLS in the continuous annealing process (blue: observation; red: prediction).

Fig. 13. Prediction results of test data (variable 1BR5R velocity) using OKPLS-KPLS in the continuous annealing process (blue: observation; red: prediction).

4. Conclusions

In this paper we have proposed a new multivariate statistical regression approach designated OKPLS-KPLS. The proposed approach combines O-KPLS, which effectively removes information not correlated to a target parameter, with the modified KPLS, which captures nonlinear relationships among the predictor variables as well as with the response variable(s) through high-dimensional feature mapping. The prediction performance of OKPLS-KPLS was compared to those of the original KPLS and OPLS-KPLS. Of the three methods, OKPLS-KPLS gave the best performance in terms of regression fitting capacity and predicting future observations of the response variable(s).

Acknowledgements

The work is supported by China's National 973 program (2009CB320600) and NSF in China (61020106003 and 60974057).

Appendix A

An outline of the proposed O-KPLS method is shown here for a matrix Y.

For the training set X and Y (centered and scaled):

(1) $w = \Phi(X)^T y/(y^T y)$. For each column $y$ in Y, estimate the corresponding $w$ and collect them in a matrix $W = [W\ w]$.
(2) Calculate the score matrix $T_w$: KPCA(W) $\Rightarrow$ $T_w$.
(3) Calculate vectors $t$: KPLS(X, Y) $\Rightarrow$ $t$.
(4) Calculate the loading vector $p = \Phi(X)^T t/(t^T t)$.
(5) Calculate the orthogonal weight vector $w_\perp$:
\[ p = p - \frac{t_w^T p}{t_w^T t_w}\, t_w = \Phi(X)^T t/(t^T t) - \frac{t_w^T \Phi(X)^T t/(t^T t)}{t_w^T t_w}\, t_w, \]
orthogonalize $p$ to each column in $T_w$, then set $w_\perp = p$.
(6) Calculate the orthogonal score vector $t_\perp = \Phi(X) w_\perp$.
(7) Calculate the orthogonal loading vector $p_\perp = \Phi(X)^T t_\perp/(t_\perp^T t_\perp)$.
(8) Update $\Phi(X)$ and the matrix K after denoising:
\[ \Phi(X) \leftarrow \Phi(X) - t_\perp t_\perp^T \Phi(X)/(t_\perp^T t_\perp) = \left(I - t_\perp t_\perp^T/(t_\perp^T t_\perp)\right)\Phi(X) \]
\[ K = \Phi(X)\Phi(X)^T \leftarrow \left(I - t_\perp t_\perp^T/(t_\perp^T t_\perp)\right) K \left(I - t_\perp t_\perp^T/(t_\perp^T t_\perp)\right) \]

For the test set $X_{new}$:

(1) Calculate the orthogonal score vector $t_{\perp,new} = \Phi(X_{new}) w_\perp$.
(2) Update $\Phi(X_{new})$ and the matrix $K_{new}$ after denoising:
\[ \Phi(X_{new}) \leftarrow \Phi(X_{new}) - t_{\perp,new} t_\perp^T \Phi(X)/(t_\perp^T t_\perp) \]
\[ K_{new} = \Phi(X_{new})\Phi(X)^T \leftarrow \left(K_{new} - t_{\perp,new} t_\perp^T K/(t_\perp^T t_\perp)\right)\left(I - t_\perp t_\perp^T/(t_\perp^T t_\perp)\right) \]

References

Chiang, L.H., Russell, E.L., Braatz, R.D., 2001. Fault Detection and Diagnosis in Industrial Systems. Springer, London.

Cristianini, N., Shawe-Taylor, J., 2000. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, UK.

Eriksson, L., Trygg, J., Johansson, E., Bro, R., Wold, S., 2000. Orthogonal signal correction, wavelet analysis, and multivariate calibration of complicated process fluorescence data. Analytica Chimica Acta 420, 181–195.

Fearn, T., 2000. On orthogonal signal correction. Chemometrics and Intelligent Laboratory Systems 50, 47–52.

Feudale, R.N., Tan, H.W., Brown, S.D., 2002. Piecewise orthogonal signal correction. Chemometrics and Intelligent Laboratory Systems 63, 129–138.

Frank, I.E., 1990. A non-linear PLS model. Chemometrics and Intelligent Laboratory Systems 8, 109–119.

Geladi, P., Kowalski, B.R., 1986. Partial least squares regression: a tutorial. Analytica Chimica Acta 185, 1–17.

Goicoechea, H.C., Olivieri, A.C., 2001. A comparison of orthogonal signal correction and net analyte preprocessing methods. Theoretical and experimental study. Chemometrics and Intelligent Laboratory Systems 56, 73–81.

Haykin, S., 1999. Neural Networks. Prentice-Hall, Englewood Cliffs, NJ.

Holcomb, T.R., Morari, M., 1992. PLS/neural networks. Computers and Chemical Engineering 16, 393–411.

Hoskuldsson, A., 1988. PLS regression methods. Journal of Chemometrics 2, 211–228.

Hoskuldsson, A., 2001. Variable and subset selection in PLS regression. Chemometrics and Intelligent Laboratory Systems 55, 23–38.

Kvalheim, O.M., 1992. The latent variable. Chemometrics and Intelligent Laboratory Systems 14, 1–3.

Kyungpil, K., Lee, J.-M., Lee, I.-B., 2005. A novel multivariate regression approach based on kernel partial least squares with orthogonal signal correction. Chemometrics and Intelligent Laboratory Systems 79, 22–30.

Lee, G., Han, C.H., Yoon, E.S., 2004a. Multiple-fault diagnosis of the Tennessee Eastman process based on system decomposition and dynamic PLS. Industrial and Engineering Chemistry Research 43, 8037–8048.

Lee, J.-M., Yoo, C., Choi, S.W., Vanrolleghem, P.A., Lee, I.-B., 2004b. Nonlinear process monitoring using kernel principal component analysis. Chemical Engineering Science 59, 223–234.

Malthouse, E.C., Tamhane, A.C., Mah, R.S.H., 1997. Nonlinear partial least squares. Computers and Chemical Engineering 21, 875–890.

Mika, S., Scholkopf, B., Smola, A.J., Muller, K.-R., Scholz, M., Ratsch, G., 1999. Kernel PCA and de-noising in feature spaces. Advances in Neural Information Processing Systems 11, 536–542.

Qin, S.J., McAvoy, T.J., 1992. Nonlinear PLS modeling using neural networks. Computers and Chemical Engineering 16, 379–391.

Rannar, S., Lindgren, F., Geladi, P., Wold, S., 1994. A PLS kernel algorithm for data sets with many variables and fewer objects. Part 1: theory and algorithm. Journal of Chemometrics 8, 111–125.

Rosipal, R., 2003. Kernel partial least squares for nonlinear regression and discrimination. Neural Network World 13, 291–300.

Rosipal, R., Trejo, L.J., 2001. Kernel partial least squares regression in reproducing kernel Hilbert space. Journal of Machine Learning Research 2, 97–123.

Rosipal, R., Trejo, L.J., Matthews, B., 2003. Kernel PLS-SVC for linear and nonlinear classification. In: Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Washington, DC, pp. 640–647.

Sjoblom, J., Svensson, O., Josefson, M., Kullberg, H., Wold, S., 1998. An evaluation of orthogonal signal correction applied to calibration transfer of near infrared spectra. Chemometrics and Intelligent Laboratory Systems 44, 229–244.

Svensson, O., Kourti, T., MacGregor, J.F., 2002. An investigation of orthogonal signal correction algorithms and their characteristics. Journal of Chemometrics 16, 176–188.

Trygg, J., 2001. Parsimonious multivariate models. Ph.D. Thesis, Umea University, pp. 19–54.

Trygg, J., Wold, S., 2002. Orthogonal projections to latent structures (O-PLS). Journal of Chemometrics 16, 119–128.

Westerhuis, J.A., de Jong, S., Smilde, A.K., 2001. Direct orthogonal signal correction. Chemometrics and Intelligent Laboratory Systems 56, 13–25.

Wold, H., 1982. Soft modeling: the basic design and some extensions. In: Joreskog, K.-G., Wold, H. (Eds.), Systems under Indirect Observation, vol. 2. North-Holland, Amsterdam, pp. 1–53.

Wold, S., 1992. Nonlinear partial least squares modelling: II. Spline inner relation. Chemometrics and Intelligent Laboratory Systems 14, 71–84.

Wold, S., Kettaneh-Wold, N., Skagerberg, B., 1989. Nonlinear PLS modeling. Chemometrics and Intelligent Laboratory Systems 7, 53–65.

Wold, S., Antti, H., Lindgren, F., Ohman, J., 1998. Orthogonal signal correction of near-infrared spectra. Chemometrics and Intelligent Laboratory Systems 44, 175–185.

Wold, S., Trygg, J., Berglund, A., Antti, H., 2001a. Some recent developments in PLS modeling. Chemometrics and Intelligent Laboratory Systems 58, 131–150.

Wold, S., Sjostrom, M., Eriksson, L., 2001b. PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems 58, 109–130.

Zhang, Y., Joe Qin, S., 2008a. Improved nonlinear fault detection technique and statistical analysis. AIChE Journal 54 (12), 3207–3220.

Zhang, Y., Joe Qin, S., 2008b. Fault diagnosis and isolation of multi-input-multi-output networked control systems. Industrial & Engineering Chemistry Research 47, 2636–2642.

Zhang, Y., Joe Qin, S., 2009. Adaptive actuator fault compensation for linear systems with matching and unmatching uncertainties. Journal of Process Control 16 (9), 985–990.