

Overview of PCA-Based Statistical Process-Monitoring Methods for

Time-Dependent, High-Dimensional Data

BART DE KETELAERE*, MIA HUBERT**, and ERIC SCHMITT**

Leuven Statistics Research Centre, KU Leuven, Celestijnenlaan 200B, B-3001 Heverlee, Belgium
*Department of Biosystems, Division MeBioS, KU Leuven, Kasteelpark Arenberg 30, B-3001 Heverlee, Belgium

**Department of Mathematics, KU Leuven, Celestijnenlaan 200B, B-3001 Heverlee, Belgium

High-dimensional and time-dependent data pose significant challenges to statistical process monitoring. Dynamic principal-component analysis, recursive principal-component analysis, and moving-window principal-component analysis have been proposed to cope with high-dimensional and time-dependent features. We present a comprehensive review of this literature for the practitioner encountering this topic for the first time. We detail the implementation of the aforementioned methods and direct the reader toward extensions that may be useful to their specific problem. A real-data example is presented to help the reader draw connections between the methods and the behavior they display. Furthermore, we highlight several challenges that remain for research in this area.

Key Words: Autocorrelation; Nonstationarity; Principal-Component Analysis.

1. Introduction

QUALITY CONTROL CHARTS are a widely used tool, developed in the field of statistical process monitoring (SPM), to identify when a system is deviating from typical behavior. High-dimensional, time-dependent data frequently arise in applications ranging from health care, industry, and IT to the economy. These data features challenge many canonical SPM methods, which lose precision as the dimensionality of the process grows or are not well suited for monitoring processes with a high degree of correlation between variables. In this paper, we present an overview of foundational principal-component analysis-based techniques currently available to cope with these process types and indicate some of their advantages and disadvantages.

A wide range of scenarios encountered in SPM has motivated the development of many control-chart techniques, which have been improved and reviewed over the course of the last 40 years. Bersimis et al. (2006) give an overview of many multivariate process-monitoring techniques, such as the multivariate EWMA and multivariate CUSUM, but provide minimal coverage of techniques for high-dimensional processes. Barceló et al. (2010) compare the classical multivariate time-series Box–Jenkins methodology with a partial least squares (PLS) method. The latter is capable of monitoring high-dimensional processes, but methods for a broader range of time-dependent process scenarios are not covered. In discussing the monitoring of multivariate processes, Bisgaard (2012) highlights principal-components analysis (PCA), partial least squares, factor analysis, and canonical correlation analysis as applicable monitoring methods. These methods and their extensions are capable of handling high-dimensional process data and time dependence. All of them project the high-dimensional process onto a lower dimensional subspace and monitor the process behavior with respect to it. Woodall and Montgomery (2014) provide a survey of multivariate process-monitoring techniques as well as motivations for their use. The authors also provide clear insights into possible process types and which monitoring methods might be suitable, and offer commentary on popular performance measures, such as the average run length and false discovery rate. Other books and papers devote more attention to PCA process monitoring. Kourti (2005) describes fundamental control-charting procedures for latent variables, including PCA and PLS, but does not discuss many of the main methods for time-dependent data nor their extensions. Kruger and Xie (2012) include a chapter covering the monitoring of high-dimensional, time-dependent processes but focus on only one method. Qin (2003) provides a review of fault detection, identification, and reconstruction methods for PCA process monitoring. He mentions the challenges of monitoring time-dependent processes but restricts his primary results to cases where the data are not time dependent. However, to the best of our knowledge, an overview directly focusing on the range of available control-chart techniques for high-dimensional, time-dependent data, with directions for practical use, has not yet been written.

Dr. De Ketelaere is Research Manager in the Division of Mechatronics, Biostatistics and Sensors (MeBioS). His email is [email protected].

Dr. Hubert is Professor in the Department of Mathematics. Her email is [email protected].

Mr. Schmitt is a Doctoral Student in the Department of Mathematics. His email is [email protected]. He is the corresponding author.

Journal of Quality Technology 318 Vol. 47, No. 4, October 2015

We assume that we have observed a large number, p, of time series x_j(t_i) (1 ≤ j ≤ p) during a calibration period t_1, t_2, ..., t_T. As time continues, more measurements become available. SPM aims to detect deviations from typical process behavior during two distinct phases of process measurement, called phase I and phase II. Phase I is the practice of retrospectively evaluating whether a previously completed process was statistically in control. Phase II is the practice of determining whether new observations from the process are in control as they are measured. Two types of time dependence are autocorrelation and nonstationarity. Autocorrelation arises when the measurements within one time series are not independent. Nonstationarity arises when the parameters governing a process, such as the mean or covariance, change over time. While it can be advantageous to include process knowledge, such as information about normal state changes, for the sake of focus we will assume no such prior knowledge.

When no autocorrelation is present in the dataand the process is stationary, control charts basedon PCA have been successfully applied in process-monitoring settings with high dimensionality. These

methods operate by fitting a model on a T × p calibration data matrix X_{T,p}, where the ith row in the jth column contains the ith measurement of the jth time series x_j(t_i) for 1 ≤ i ≤ T. The number of rows of X_{T,p} thus refers to the number of observed time points and the number of columns to the number of time series measured in the system. The calibration data are chosen to be representative of typical behavior of the system. A new observation at time t, x(t) = (x_1(t), x_2(t), ..., x_p(t))′, is compared with the data in X_{T,p} and evaluated by the control chart to determine whether it is typical. This is called static PCA because the fitted model remains static as new observations are obtained. Therefore, it will not adjust as underlying parameter values change (nonstationarity), and no attempt is made to model relationships between observations at different time points (autocorrelation). One can identify autocorrelation in a process by examining autocorrelation and cross-correlation functions of the data, as we shall do below. Nonstationarity can be assessed on univariate data using the augmented Dickey–Fuller test for a unit root. In high-dimensional data, a compromise is to perform this test on each of the scores of a static PCA model.

Three classes of approaches have been proposed to extend PCA methods to cope with time-dependent data. These are dynamic PCA (DPCA), recursive PCA (RPCA), and moving-window PCA (MWPCA). DPCA was developed to handle autocorrelation, whereas RPCA and MWPCA are able to cope with nonstationary data. No method has yet been proposed for settings where both autocorrelation and nonstationarity are present. Although existing methods may provide acceptable monitoring in some contexts, this remains an area for further research.

2. Introducing the NASA Bearings Data Set

Throughout this paper, the NASA Prognostics Center of Excellence bearing data set (Lee et al. (2007)) will be used to illustrate the behavior of the methods on data with autocorrelation and nonstationarity. As shown in Figure 1, the data consist of measurements of eight sensors (p = 8), with each sensor representing either the x- or y-axis vibration intensities of a bearing. Four bearings are monitored at intervals of approximately 15 minutes, and a vibration signal of about a second is recorded to describe the "stability". These raw data are then compressed into a single feature for each sensor. The resulting


FIGURE 1. Data Series Depicting the Autocorrelated, Nonstationary NASA Ball-Bearing Data Set. Sensors 7 and 8 are plotted in light gray. Other sensors are plotted in dark gray.

observations are eight-dimensional vectors of bearing vibration intensities spaced at approximately 15-minute intervals. These are paired such that the first two sensors correspond to the first bearing and so on. Figure 1 shows that there are two variables, belonging to the seventh and eighth sensors corresponding to the fourth bearing (plotted in light gray), which begin to deviate from typical behavior shortly after the 600th observation. Later in the experiment, a catastrophic failure of all of the bearings is observed.

The NASA process shares many similarities with a multistream process (MSP). An MSP results in multiple streams of output for which, from the perspective of SPM, the quality variable and its specifications are identical across all streams. An MSP may also be defined as a continuous process where multiple measurements are made on a cross section of the product (Epprecht et al. (2011)). The NASA process has features of both of these definitions. It resembles the first in the sense that each of the bearings may be seen as having similar specifications to one another, with the average vibrations tending to be slightly different (but this can be adjusted so that they have the same mean) and the displayed variance being similar. The NASA process resembles the second definition in the sense that multiple measurements are made on a cross section of the process; namely, all of the bearings are measured by two sensors. We detect some correlation between the streams, but, as Epprecht and Simoes (2013) note, this violates the assumption, made by most MSP methods, that none is present. Given these process features, PCA and its

FIGURE 2. Histograms, Scatterplots, and Correlations of Sensors 1, 2, 7, and 8 During the First 120 Measurements.

extensions are a possible monitoring solution. Runger et al. (1996) applied PCA to MSPs and note that this approach models the correlation structure between process variables. PCA is also capable of monitoring more general multivariate processes consisting of outputs that do not have identical properties, which may be the case when the second MSP definition is more appropriate and multiple measurements are made on a cross section. An additional advantage of PCA is that it is capable of modeling high-dimensional processes, which can pose problems for many MSP methods requiring an invertible covariance matrix.

Histograms, correlations, and pairwise scatterplots of vibration intensity measurements from sensors (1 and 2) placed on a typical bearing and sensors (7 and 8) on a deviating bearing are presented in Figure 2 for the first 120 observations, as these exhibit behavior characteristic of the in-control process. The corresponding autocorrelation functions (ACFs) up to 50 lags are depicted in Figure 3. The autocorrelation is presented as light-gray bars, while a limit to identify lags with high autocorrelation is drawn as a dark-gray line. During this early period, the pairs of sensors are only mildly correlated, with the autocorrelation exceeding the dark-gray line indicating the 97.5 percentile limit for only a few lags. For comparative purposes, the descriptive plots and autocorrelation functions are also shown for observations between


FIGURE 3. ACFs of Sensors 1, 2, 7, and 8 During the First 120 Measurements.

t = 600 and t = 1,000 in Figures 4 and 5. In the plots for the later time period, we see that sensors 7 and 8 become highly correlated as failure occurs. An advantage of multivariate control charts is that they take the change in the correlation between variables into account when determining whether a system is going out of control. Furthermore, because nonstationarity has begun to develop, the ACFs now report very high-order autocorrelation. Earlier observations will be used to fit models, but control charts will also be used to assess these observations. In our context,

FIGURE 4. Histograms, Scatterplots, and Correlations of Sensors 1, 2, 7, and 8 During the Time Period Between t = 600 and t = 1,000.

FIGURE 5. ACFs of Sensors 1, 2, 7, and 8 During the Time Period Between t = 600 and t = 1,000.

we will consider this monitoring phase I because it could be used by the practitioner to gain a better understanding of the behavior of this process from historical data. For the purposes of this paper, we will consider the later observations to be absent from the historical observations the practitioner could access for phase I monitoring, and thus monitoring these later observations will constitute phase II.
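The ACF diagnostics used in this section can be reproduced with a short script. The sketch below is a minimal numpy implementation (the function names are our own, not from the paper): it computes the sample ACF of one series and an approximate white-noise limit of the kind drawn as the dark-gray line in Figures 3 and 5.

```python
import numpy as np
from statistics import NormalDist

def sample_acf(x, nlags=50):
    """Sample autocorrelation function of a univariate series, lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x ** 2)
    acf = [1.0] + [np.sum(x[h:] * x[:-h]) / denom for h in range(1, nlags + 1)]
    return np.array(acf)

def acf_limit(n, percentile=0.975):
    """Approximate white-noise limit for the sample ACF: z_{0.975} / sqrt(n)."""
    return NormalDist().inv_cdf(percentile) / np.sqrt(n)
```

A lag is flagged as significantly autocorrelated when its sample ACF exceeds the limit in absolute value; with n = 120 calibration points the limit is roughly 0.18.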

3. Static PCA

3.1. Method

Principal-components analysis defines a linear relationship between the original variables of a data set, mapping them to a set of uncorrelated variables. In general, static PCA assumes an (n × p) data matrix X_{n,p} = (x_1, ..., x_n)′. Let 1_n = (1, 1, ..., 1)′ be of length n. Then the mean can be calculated as

x̄ = (1/n) X′_{n,p} 1_n

and the covariance matrix as

S = [1/(n − 1)] (X_{n,p} − 1_n x̄′)′ (X_{n,p} − 1_n x̄′).

Each p-dimensional vector x is transformed into a score vector y = P′(x − x̄), where P is the p × p loading matrix, containing column-wise the eigenvectors of S. More precisely, S can be decomposed as S = P Λ P′. Here, Λ = diag(λ_1, λ_2, ..., λ_p) contains the eigenvalues of S in descending order. Throughout this paper, PCA calculations will be performed using the covariance matrix. However, the methods discussed can generally also be performed using the correlation matrix R by employing different formulas.
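As a concrete illustration, the centering, covariance, and eigendecomposition steps above can be sketched in a few lines of numpy (`fit_pca` and `scores` are our own helper names, not from the paper):

```python
import numpy as np

def fit_pca(X):
    """Fit static PCA on an (n x p) calibration matrix via the
    eigendecomposition of the sample covariance matrix S = P Lambda P'."""
    xbar = X.mean(axis=0)                 # column means x-bar, shape (p,)
    Xc = X - xbar                         # centered data
    S = Xc.T @ Xc / (X.shape[0] - 1)      # sample covariance, (p x p)
    eigvals, eigvecs = np.linalg.eigh(S)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]     # reorder to descending, as in Lambda
    return xbar, eigvals[order], eigvecs[:, order]

def scores(x, xbar, P, k):
    """k-dimensional score vector y = P_k'(x - xbar), cf. equation (1)."""
    return P[:, :k].T @ (x - xbar)
```

Because the loading matrix returned by `np.linalg.eigh` is orthonormal, the sum of the eigenvalues equals the total variance (the trace of S).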

It is common terminology to call y the scores and the eigenvectors, P, the loading vectors. In many cases, due to redundancy between the variables, fewer components are sufficient to represent the data. Thus, using k < p of the components, one can obtain k-dimensional scores by the following:

y = P_k′ (x − x̄),   (1)

where P_k contains only the first k columns of P. To select the number of components to retain in the PCA model, one can resort to several methods, such as the scree plot or cross-validation. For a review of these and other methods, see, e.g., Valle et al. (1999) and Jolliffe (2002). In this paper, the number of components will be selected based on the cumulative percentage of variance (CPV), which is a measure of how much variation is captured by the first k PCs and is

CPV(k) = (Σ_{j=1}^k λ_j / Σ_{j=1}^p λ_j) × 100%.

The number of PCs is selected such that the CPV is greater than the minimum amount of variation the model should explain.
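A minimal implementation of this selection rule might look as follows (the helper name and the 90% default threshold are our own choices, not prescribed by the paper):

```python
import numpy as np

def select_k_by_cpv(eigenvalues, threshold=0.90):
    """Return the smallest k whose cumulative percentage of variance
    CPV(k) = sum(lambda_1..lambda_k) / sum(lambda_1..lambda_p)
    reaches the required threshold (given here as a fraction)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending
    cpv = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(cpv, threshold) + 1)
```

For example, with eigenvalues (4, 3, 2, 1), two components already explain 70% of the variance, so `select_k_by_cpv([4, 3, 2, 1], 0.6)` returns 2.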

Control charts can be generated from PCA models by using Hotelling's T² statistic and the Q-statistic, which is also sometimes referred to as the squared prediction error (SPE). For any p-dimensional vector x, Hotelling's T² is defined as

T² = (x − x̄)′ P_k Λ_k⁻¹ P_k′ (x − x̄) = y′ Λ_k⁻¹ y,

where Λ_k = diag(λ_1, λ_2, ..., λ_k) is the diagonal matrix consisting of the k largest eigenvalues of S. The Q-statistic is defined as

Q = (x − x̄)′ (I − P_k P_k′)(x − x̄) = ‖x − x̂‖²,

with x̂ = x̄ + P_k P_k′ (x − x̄). Hotelling's T² is the Mahalanobis distance of x in the PCA model space, and the Q-statistic is the squared orthogonal distance to the PCA space. Assuming temporal independence and multivariate normality of the scores, the 100(1 − α)% control limit for Hotelling's T² is

T²_α = [k(n² − 1) / (n(n − k))] F_{k,n−k}(α).   (2)

Here, F_{k,n−k}(α) is the (1 − α) percentile of the F-distribution with k and n − k degrees of freedom. If the number of observations is large, the control limits can be approximated using the (1 − α) percentile of the χ² distribution with k degrees of freedom, thus T²_α ≈ χ²_k(α). The simplicity of calculating this limit is

advantageous. The control limit corresponding to the (1 − α) percentile of the Q-statistic can be calculated, provided that all the eigenvalues of the matrix S can be obtained (Jackson and Mudholkar (1979)), as

Q_α = θ_1 [ z_α √(2 θ_2 h_0²) / θ_1 + 1 + θ_2 h_0 (h_0 − 1) / θ_1² ]^{1/h_0},

where

θ_i = Σ_{j=k+1}^p λ_j^i  for i = 1, 2, 3  and  h_0 = 1 − 2 θ_1 θ_3 / (3 θ_2²),

and z_α is the (1 − α) percentile of the standard normal distribution. Another way of obtaining cut-offs for the Q-statistic, based on a weighted χ² distribution, is detailed in Nomikos and MacGregor (1995). An advantage of this approach is that it is relatively fast to compute. During phase I, the T²- and Q-statistics are monitored for all observations x(t_i) = (x_1(t_i), ..., x_p(t_i))′ with 1 ≤ i ≤ T. It is important to note that fitting the PCA model to these data will result in a biased model, with possibly inaccurate fault detection, if faults are present, because they can bias the fit. If faults are present, it is advised to fit a robust PCA model and refer to the monitoring statistics it produces. Phase II consists of evaluating contemporary observations x_t = x(t) using the T²- and Q-statistics based on the outlier-free calibration set.
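The statistics and their cut-offs can be sketched in code. In the fragment below (helper names are our own), the T² limit uses the large-n chi-square approximation mentioned above; as an extra assumption, made only to keep the sketch free of non-standard dependencies, the chi-square quantile is itself obtained from the Wilson–Hilferty normal approximation. The Q limit follows the Jackson–Mudholkar formula.

```python
import numpy as np
from statistics import NormalDist

def t2_q(x, xbar, P, lam, k):
    """Hotelling's T^2 and the Q-statistic of one observation x, given the
    calibration mean xbar, loadings P, and eigenvalues lam (descending)."""
    d = x - xbar
    y = P[:, :k].T @ d                  # scores in the k-dimensional model space
    t2 = float(np.sum(y ** 2 / lam[:k]))  # y' Lambda_k^{-1} y
    resid = d - P[:, :k] @ y            # component orthogonal to the PCA space
    return t2, float(resid @ resid)     # (T^2, Q)

def t2_limit(k, alpha=0.01):
    """Large-n limit T^2_alpha ~ chi^2_k(alpha), with the chi-square quantile
    computed via the Wilson-Hilferty normal approximation (an assumption of
    this sketch, not of the paper)."""
    z = NormalDist().inv_cdf(1 - alpha)
    return k * (1 - 2 / (9 * k) + z * np.sqrt(2 / (9 * k))) ** 3

def q_limit(lam, k, alpha=0.01):
    """Jackson-Mudholkar limit Q_alpha from the discarded eigenvalues lam[k:]."""
    z = NormalDist().inv_cdf(1 - alpha)
    th1, th2, th3 = (float(np.sum(lam[k:] ** i)) for i in (1, 2, 3))
    h0 = 1 - 2 * th1 * th3 / (3 * th2 ** 2)
    term = z * np.sqrt(2 * th2 * h0 ** 2) / th1 + 1 + th2 * h0 * (h0 - 1) / th1 ** 2
    return th1 * term ** (1 / h0)
```

An observation is then flagged whenever its T² or Q value exceeds the corresponding limit.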

An intuitive depiction of static PCA is given in Figure 6. This figure will serve as a basis of comparison between the DPCA, RPCA, and MWPCA techniques that are discussed in the following sections. Variables are represented as vertical lines of dots measured over time. The light-gray rectangle contains the observed data during the calibration period that is used to estimate the model for subsequent monitoring. The dark-gray rectangle is the new observation to be evaluated. The two plots

FIGURE 6. A Schematic Representation of Static PCA at Times t (Left) and t + 1 (Right). The model is fitted on observations highlighted in light gray. The new observation, highlighted in dark gray, is evaluated.


show that, at time t + 1 (right), the same model is used to evaluate the new observation in dark gray as in the previous time period, t (left).

PCA is well suited for monitoring processes where the total quality of the output is properly assessed by considering the correlation between all variables. However, if a response variable is also measured and the relationship of the process variables to it is of primary interest, the technique of partial least squares (PLS) is preferred to PCA. Like linear regression, PLS models the linear relation between a set of regressors and a set of response variables; but, like PCA, it projects the observed variables onto a new space, allowing it to cope with high-dimensional data. Control charts may be implemented for PLS in much the same way as they are for PCA. Kourti (2005) provides a comparison of PCA and PLS, as well as some references for the PLS control-chart literature.

Static PCA requires a calibration period to fit a model. However, it is well known that PCA is highly susceptible to outliers. If outliers are included in the data used to fit a monitoring model, the detection accuracy can be severely impaired. Robust PCA methods, such as ROBPCA (Hubert et al. (2005)), have been designed to provide accurate PCA models even when outliers are present in the data. A robust PCA method can be used to identify outliers in the calibration data for removal or examination. Once these are removed, the resulting robust PCA model can be used as the basis for subsequent process monitoring. ROBPCA may be performed using the robpca function in the LIBRA toolbox (Verboven and Hubert (2005)) or the PcaHubert function in the R package rrcov (Todorov and Filzmoser (2009)).

In addition to outliers, observations with missing data, whether in the calibration set or among future observations, present challenges for process monitoring. In the context of PCA control charts, a number of options for addressing these issues exist. The problem of future observations with missing data is typically addressed by using the process model and the nonmissing elements of the new observation, x_new, to correct for the missingness of some of its elements. Examples of algorithms using this approach at various levels of complexity are discussed in Arteaga and Ferrer (2002). They conclude that a method referred to as trimmed score regression (TSR) has the most advantages, in terms of accuracy and computational feasibility, of the methods they considered. TSR uses information from the full

score matrix Y from the calibration data, the loadings in P corresponding to the nonmissing variables in x_new, and x_new itself to estimate y_new. In the event that the calibration data have missing values, one does not have access to existing estimates of P and Y to use for missing-data corrections. Walczak and Massart (2001) propose a method for missing-data imputation based on the expectation-maximization (EM) algorithm. Serneels and Verdonck (2008) make this method robust, allowing missing-data imputation to proceed even when the calibration data set is contaminated by outliers. An implementation is available in the rrcovNA package (Todorov (2013)) in R (R Core Team (2014)).

3.2. Static PCA Applied to the NASA Data

In this subsection, we apply static PCA to the NASA data. Before constructing control charts, we performed ROBPCA on the first 120 observations that we use to fit the PCA model. No significant outliers were detected, so we fit a PCA model on those data without removing observations. No data were missing in this data set, so missing-data methods were not employed. It is common in many fields to perform preprocessing. The type of preprocessing is typically determined by the type of process being monitored, with chemometrics, for instance, giving rise to many preprocessing approaches specific to that context. In the case of the NASA data, no special preprocessing is necessary. Since all of the sensors in the NASA data measure vibrations in the same units, standardizing the data is not strictly necessary.

FIGURE 7. Static PCA Control Charts for the Entire NASA Data Set. The first 120 observations are used to fit the underlying model.


FIGURE 8. ACFs of the First Two Scores of Static PCA Applied to the NASA Data Set for t ≤ 120.

Static PCA applied to the NASA bearing data set generates the control chart in Figure 7 and the ACF plot in Figure 8. We plot the logarithm of the T²- and Q-statistics in these and subsequent charts as solid light-gray lines and the control limit as a solid, dark-gray line. The first 120 observations are used to fit the underlying model, as we do not observe any large change in the vibration intensity of any of the sensors during this period, and this will also allow us to evaluate the estimated model against the well-behaved data observed before t = 120. Therefore, we differentiate between phase I, which takes place when t ≤ 120, and phase II. A vertical line divides these two periods in Figure 7. Five components are retained in accordance with the CPV criterion. We see that failure of the system is detected before catastrophic failure occurs, at around t = 120 by the Q-statistic and at around t = 300 by the T²-statistic. Because we did not detect any major outliers using ROBPCA during phase I, it is not surprising that few observations exceed the cut-offs during this early period and that later, during phase II, when the issue with the fourth bearing develops, we find a failure. Figure 8 shows there is room to reduce the variability of the statistics by accounting for autocorrelation. Examining the first score, we see that the autocorrelations are fairly low, but when the number of lags is less than 10 or more than 30, many exceed the cut-off. The second component exhibits even stronger autocorrelation. Reducing the autocorrelation would more strongly justify the assumption that the control-chart statistics are being calculated on i.i.d. inputs.

It is desirable that a model of the data be interpretable. One way to interpret PCA is by examining the loadings it produces. In some cases, this reveals a logical structure to the data. Table 1 presents the loadings of the static PCA model of the NASA data. In the case of this data set, a clear structure is not revealed by the loadings. The first component loads most heavily on sensors 1, 2, and 5. It is understandable that sensors 1 and 2 might be correlated because they both measure the first bearing,

TABLE 1. Loadings of the Static PCA Model of the NASA Data

                        Component
Sensor       1        2        3        4        5

  1       -0.471    0.231   -0.173    0.264
  2       -0.430    0.306   -0.341    0.403
  3       -0.249    0.175   -0.194   -0.400   -0.359
  4       -0.259    0.110   -0.320   -0.570
  5       -0.467   -0.615    0.205    0.301   -0.240
  6       -0.368   -0.464            -0.418    0.269
  7       -0.236    0.422    0.788   -0.128   -0.276
  8       -0.233    0.198    0.212             0.807

but sensor 5 measures the third bearing. The remaining components are similarly ambiguous, with none corresponding to an intuitive structure. One way to improve the interpretability of PCA models is to employ a rotation, such as the varimax rotation. However, doing so is not necessary to achieve desirable fault-detection properties. The last three components differ from the first two in that some of the values of the loadings are so small that they are effectively zero (these are left blank in the table). The omission of relatively unimportant variables from components increases their interpretability. Two similar procedures for accomplishing this are sparse PCA (Zou et al. (2006)) and SCoTLASS (Jolliffe et al. (2003)). These methods are designed to return a PCA model that fits the data well, while giving many variables small or zero loadings on the components where they are relatively unimportant.

As a byproduct of PCA, one can construct a contribution plot, showing the contribution of each variable to the control statistics for a given observation (Miller et al. (1998)). The contributions of the jth variable to the T²- and the Q-statistic of an observation x are the jth elements of the vectors

T²_contr = (x − x̄)′ P_k Λ_k^{−1/2} P_k′,
Q_contr = (x − x̄)′ (I − P_k P_k′).   (3)
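Equation (3) translates directly into code. The sketch below (a numpy version; the function name is our own) returns both contribution vectors for a single observation:

```python
import numpy as np

def contributions(x, xbar, P, lam, k):
    """Per-variable contributions to T^2 and Q for one observation x,
    i.e., the vectors (x - xbar)' P_k Lambda_k^{-1/2} P_k' and
    (x - xbar)' (I - P_k P_k') of equation (3)."""
    d = x - xbar
    Pk = P[:, :k]
    t2_contr = ((d @ Pk) / np.sqrt(lam[:k])) @ Pk.T  # weighted scores, back-projected
    q_contr = d - Pk @ (Pk.T @ d)                    # residual of the projection
    return t2_contr, q_contr
```

Plotting each vector as one bar per variable reproduces the format of Figure 9. Note that, because I − P_k P_k′ is idempotent, the squared Q contributions sum to the Q-statistic itself.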

These contributions can be plotted as bars, with the expectation that variables that made a large contribution to a fault can be identified by higher-magnitude bars. This does not necessarily lead to precise identification of the source of the fault, but it shows which variables are also behaving atypically at the time of occurrence. In Figure 9, we display


FIGURE 9. Contribution Plots Showing the Contribution of Each Sensor to the T²- and Q-Statistics for Observations at t = 100 and t = 1200.

contribution plots for observations before the fault (t = 100) and after (t = 1200). Comparing the two plots, we see that both statistics are much less influenced by the observation from t = 100 than from t = 1200. Focusing on the contribution plots for the later observation, we see that the plot for the Q-statistic is ambiguous but that the contribution plot for the T²-statistic clearly indicates sensors 7 and 8 as the primary sources for this observation's deviation on the model space. Interpreting these plots, the practitioner would likely investigate the fourth bearing more closely. When many variables are being monitored, the contribution plot can become difficult to interpret. Hierarchical contribution plots are a way of overcoming this issue (Qin et al. (2001)). Qin (2003) provides further detail on extensions to the contribution plot and fault reconstruction.
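As an illustration, the contributions in Equation (3) can be computed directly from a fitted PCA model. The following minimal numpy sketch is our own (simulated data rather than the NASA set; variable names are assumptions), with the loading matrix P_k and eigenvalues obtained from an SVD of the centered calibration data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))            # simulated calibration data, p = 8
xbar = X.mean(axis=0)
Xc = X - xbar

# PCA via SVD of the centered data; P is the p x k loading matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3
P = Vt[:k].T
lam = s[:k] ** 2 / (len(X) - 1)          # eigenvalues of the sample covariance

def contributions(x):
    """Per-variable contributions of observation x to T^2 and Q (Equation (3))."""
    d = x - xbar
    t2_contr = d @ P @ np.diag(lam ** -0.5) @ P.T
    q_contr = d @ (np.eye(len(d)) - P @ P.T)
    return t2_contr, q_contr

t2_contr, q_contr = contributions(X[0])
```

Plotting t2_contr and q_contr as bar charts reproduces the kind of display shown in Figure 9. Note that, because I − P_k P_k′ is symmetric and idempotent, the inner product of (x − x̄) with Q_contr recovers the Q-statistic itself.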

4. Dynamic PCA

4.1. Method

One approach for addressing autocorrelation is to perform first-order differencing. This can diminish the effects of autocorrelation, but it is problematic in the context of process monitoring. Problems arise when detection of some fault types, such as step faults, is desired. In the case of step faults, differencing will reveal the large change that takes place when the fault first occurs, but subsequent faulty observations will appear normal because they are in control relative to one another. As a result, an operator interpreting the control chart may be led to believe that the first faulty observation was an outlier and the process is back in control. Dynamic PCA was first proposed in Ku et al. (1995) as a way to extend static PCA tools to autocorrelated, multivariate systems. The authors note that, previously, others had taken the approach of addressing autocorrelated data by fitting univariate ARIMA models to the data and analyzing the residuals, which ignores cross-correlation between the variables. Attempts were made to improve the results by estimating multivariate models using this approach, but this quickly proves to be a complex task as p grows, due to the high number of parameters that must be estimated and the presence of cross-correlation.

DPCA combines the facility of PCA in high dimensions with the ability of ARIMA to cope with autocorrelation. The approach of Ku et al. (1995) is that, in addition to the observed variables, the respective lagged values up to the proper order can also be included as input for PCA estimation. For example, an AR(1) process will require the inclusion of lagged values up to order one.

Given data observed up to time T, X_{T,p}, DPCA with one lag models the process based on a matrix including one lag, X̃_{T−1,2p}, which has twice as many variables and one fewer row as a result of the lagging. More generally, for an AR(l) process, we obtain X̃_{T−l,(l+1)p}, where the ith row of X̃_{T−l,(l+1)p} is (x(t_{i+l}), x(t_{i+l−1}), …, x(t_i)) with i = 1, …, T − l. As new observations are measured, they are also augmented with lags as in the rows of X̃_{T−l,(l+1)p} and compared with the model estimated by DPCA. In estimating the linear relationships for the dimensionality reduction, this method also implicitly estimates the autoregressive structure of the data, as, e.g., illustrated in Tsung (2000). For addressing the issue of moving-average (MA) terms, it is well known that an MA process can be approximated by using a high enough order AR process. As functions of the model, the T²- and Q-statistics now will also be functions of

FIGURE 10. A Schematic Representation of DPCA with One Lag at Times t (Left) and t + 1 (Right).


the lag parameters. If the outlier detection methods discussed in Section 3.1 are of interest, they can be applied after including the appropriate lags.
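To make the construction of X̃ concrete, here is a minimal numpy sketch (the function name is our own) that stacks each observation with its l predecessors, exactly as in the rows (x(t_{i+l}), …, x(t_i)) above:

```python
import numpy as np

def augment_with_lags(X, l):
    """Build the lagged matrix for DPCA: row i of the result is
    (x(t_{i+l}), x(t_{i+l-1}), ..., x(t_i)), so a T x p input
    becomes a (T - l) x ((l + 1) p) matrix."""
    T, _ = X.shape
    return np.hstack([X[l - j : T - j] for j in range(l + 1)])

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
X_lagged = augment_with_lags(X, 1)   # one lag: twice the variables, one fewer row
```

Static PCA applied to X_lagged then yields the DPCA model; newly arriving observations are augmented the same way before scoring.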

DPCA is characterized intuitively in Figure 10, where a model estimated from observations in the light-gray window is used to evaluate whether the newly observed observation and the corresponding lagged observations, in dark gray, deviate from typical behavior. Note that, because the assumption is that the mean and covariance structures remain constant, it is sufficient to use the same model to evaluate observations at any future time point.

Ku et al. (1995) demonstrate that their procedure accounts for the dynamic structure in the raw data but note that the score variables will still be autocorrelated and possibly cross-correlated, even when no autocorrelation is present. Kruger et al. (2004) prove that the scores of DPCA will inevitably exhibit some autocorrelation. They show that the presence of autocorrelated score variables leads to an increased rate of false alarms from DPCA procedures using Hotelling's T². They claim that the Q-statistic, on the other hand, is applied to the model residuals, which are assumed to be i.i.d., and thus this statistic is not affected by autocorrelation of the scores. They propose to remedy the presence of autocorrelation in the scores through ARMA filtering. Such an ARMA filter can be inverted and applied to the score variables so that unautocorrelated residuals are produced for testing purposes. Another possibility is to apply an ARMA filter to the process data but, in cases where the data are high dimensional, it is generally more practical to work on the lower-dimensional scores.

Luo et al. (1999) propose that the number of false alarms generated using DPCA methods can be reduced by applying wavelet filtering to isolate the effects of noise and process changes from the effects of physical changes in the sensor itself. This approach does not specifically address problems of autocorrelation and nonstationarity, but the authors find that results improve when a DPCA model is applied to autocorrelated data that have been filtered.

Another approach to reduce the autocorrelation of the scores was introduced and explored by Rato and Reis (2013a, c). Their method, DPCA-DR, proceeds by comparing the one-step-ahead prediction scores (computed by means of the expectation-maximization algorithm) with the observed scores. The resulting residuals are almost entirely uncorrelated and therefore suitable for monitoring. Statistics based on this approach are typically better behaved than those produced by both static and conventional DPCA, sometimes significantly so.

4.2. Choice of Parameters

A simple way to select the number of lags manually is to apply a PCA model with no lags and examine the ACFs of the scores. If autocorrelation is observed, then an additional lag can be added. This process can be repeated until enough lags have been added to sufficiently reduce the autocorrelation. However, this approach is extremely cumbersome due to the number of lags that it may be necessary to investigate; similarly, if there are many components, there will be many ACFs to inspect. Ku et al. (1995) provide an algorithm to specify the number of lags that follows from the argument that a lag should be included if it adds an important linear relationship. Beginning from no lags, their algorithm sequentially increases the number of lags and evaluates whether the new lag leads to an important linear relationship for one of the variables. This method explicitly counts the number of linear relationships. When a new lag does not reveal an important linear relationship, the algorithm stops and the number of lags from the previous iteration is used. The number of lags selected is usually one or two, and all variables are given the same number of lags.
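The manual ACF inspection described above can be automated in a rough way. The sketch below is our own illustration, not the Ku et al. (1995) algorithm: the function names are hypothetical, and the 2/√n threshold is the usual approximate 95% white-noise bound:

```python
import numpy as np

def acf(x, max_lag=20):
    """Sample autocorrelations of a 1-D series at lags 1..max_lag."""
    x = x - x.mean()
    c0 = x @ x / len(x)
    return np.array([x[:-h] @ x[h:] / (len(x) * c0) for h in range(1, max_lag + 1)])

def scores_autocorrelated(X, k=2, max_lag=20):
    """True if any of the k retained PCA scores has an autocorrelation
    beyond the approximate white-noise bound 2/sqrt(n)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T
    bound = 2 / np.sqrt(len(X))
    return any(np.any(np.abs(acf(scores[:, j], max_lag)) > bound) for j in range(k))

# Demo: the scores of strongly autocorrelated AR(1) data are flagged.
rng = np.random.default_rng(2)
e = rng.normal(size=(400, 3))
X = np.empty_like(e)
X[0] = e[0]
for t in range(1, 400):
    X[t] = 0.9 * X[t - 1] + e[t]
flagged = scores_autocorrelated(X)
```

One would then add a lag, rebuild the lagged data matrix, and repeat until the check no longer flags autocorrelation (or a lag cap is reached). Because white noise occasionally exceeds the bound by chance, a practical rule would require exceedances at several lags.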

Rato and Reis (2013b) propose two new, complementary methods for specifying the lag structure. The first is a more robust method than the Ku et al. (1995) approach for selecting the common number of lags applied to all variables. It also incrementally adds lags, but the algorithm stops after l lags if, roughly said, the smallest singular value of the covariance matrix of the extended data matrix X̃ is significantly lower than the one using l − 1 lags. Intuitively, this corresponds to the new lag not providing additional modeling power. The second method begins from the previous one and improves it by also reducing the number of lags for variables that do not require so many, thereby giving a variable-determined lag structure. The authors show that this better controls for autocorrelation in the data and leads to better behavior of the test statistics.

4.3. DPCA Applied to the NASA Data

DPCA control charts for the NASA data are shown in Figure 11. Parameter values for DPCA and the adaptive methods are presented in Table 2. For DPCA, this is the number of lags; for RPCA, the forgetting factor η; and for MWPCA, the window size H. All models select the number of latent variables (LV) such that the CPV is at least 80%. The number of components used at the last evaluation of the system is included for each setting. Typically, the number of latent variables varies at the beginning of the control chart and then stabilizes to the value that is shown.

FIGURE 11. DPCA Control Charts for the NASA Data Set Using 1 (Top) and 20 (Bottom) Lags.

TABLE 2. Parameter Values (PV) Used in the NASA Data Example for All Time-Dependent Methods

                 Low               High
Method       LV      PV        LV      PV

DPCA          8       1        39      20
RPCA          2       0.9       2      0.9999
MWPCA         1      40         1      80

FIGURE 12. ACFs of the First Two Scores of DPCA Applied to the NASA Data Set when Using 1 (Upper) and 20 (Lower) Lags for t ≤ 120.

Proposals for automatically selecting the parameter of each of the methods are available, but a consensus does not exist on which is best for any of the three. Thus, for each method, we select low and high values for the parameter of interest to illustrate how this influences the performance. Nonetheless, we still note that automatic methods, such as those discussed for selecting the number of lags for DPCA, should be considered within the context facing the practitioner.

When DPCA is applied, the number of components needed to explain the structure of the model input grows. For one lag, 8 components are needed, while for 20 lags, 39 components are taken. This has the shortcoming that data sets with few observations may not be able to support such a complex structure. Figure 11 shows the results of DPCA control charts fitted on the first 120 observations. Again, we consider the period when t ≤ 120 as phase I monitoring and, at later points, phase II monitoring takes place. When l = 1, the ACF of the first score (see Figure 12) exhibits autocorrelation at lags below 10 and above 20, as we saw in the case of static PCA (see Figure 8). The second score of static PCA showed autocorrelations exceeding the cut-off for almost all lags, but we now see that almost none exceed the cut-off. However, when 20 lags are used, we notice that, in the bottom plot of Figure 11, the monitoring statistics are clearly autocorrelated. The ACFs of the first two scores, shown in Figure 12, confirm that autocorrelation is a major problem. This is an illustration of the trade-off between adding lags to manage autocorrelation and the issue that simply adding more can actually increase autocorrelation. A choice of the number of lags between 1 and 20 shows the progression toward greater autocorrelation.


It is possible to apply a contribution plot to a DPCA model, as we did for static PCA. However, DPCA tends to use many more variables due to the inclusion of lags. This can make interpretation more difficult. A subspace approach for autocorrelated processes, such as the one proposed by Treasure et al. (2004), may be used to increase interpretability, though the authors note that the detection performance remains comparable with that of DPCA.

5. Recursive PCA

5.1. Method

Besides being sensitive to autocorrelation and moving-average processes, static PCA control charts are also unable to cope with nonstationarity. If a static PCA model is applied to data with a nonstationary process in it, then issues can arise where the mean and/or covariance structure of the model becomes misspecified because they are estimated using observations from a time period with little similarity to the one being monitored. DPCA provides a tool for addressing autoregressive and moving-average structures in the data. However, it is vulnerable to nonstationarity for the same reason as static PCA. Differencing is a possible strategy for coping with nonstationarity, but it suffers from the same shortcoming as in the situation when the data are autocorrelated (see Section 4.1). In response to the need for an effective means of coping with nonstationarity, two approaches have been proposed: RPCA and MWPCA. Both of these attempt to address nonstationarity by limiting the influence of older observations on estimates of the mean and covariance structures used to assess the status of observations at the most recent time point.

The idea of using new observations and exponentially down-weighting old ones to calculate the mean and covariance matrix obtained from PCA was first investigated by Wold (1994) and Gallagher et al. (1997). However, both of these approaches require all of the historical observations and complete recalculation of the parameters at each time point. A more efficient updating approach was proposed in Li et al. (2000), which provided a more detailed treatment of the basic approach to mean and covariance/correlation updating that is used in the recent RPCA literature. A new observation is evaluated when it is obtained. If the T²- or Q-statistics exceed the limits because the observation is a fault or an outlier, then the model is not updated. However, when the observation is in control, it is desirable to update the estimated mean and covariance/correlation from the previous period. The approach of Li et al. (2000) was inspired by a recursive version of PLS by Dayal and MacGregor (1997b). This RPLS algorithm is supported by a code implementation in the counterpart paper of Dayal and MacGregor (1997a).

More precisely, assume that the mean and covariance of all observations up to time t have been estimated by x̄_t and S_t. Then, at time t + 1, the T²- and Q-statistics are evaluated in the new observation x_{t+1} = x(t + 1) = (x₁(t + 1), …, x_p(t + 1))′. If both values do not exceed their cut-off value, one could augment the data matrix X_{T,p} with observation x_{t+1} as X_{t+1,p} = [X′_{T,p} x_{t+1}]′ and recompute the model parameters while using a forgetting factor 0 ≤ η ≤ 1. In practice, updating is not performed using the full data matrix, but rather a weighting is performed to update only the parameters. Denoting n_t as the total number of observations measured at time t, the updated mean is defined as

x̄_{t+1} = (1 − (n_t/(n_t + 1)) η) x_{t+1} + (n_t/(n_t + 1)) η x̄_t,

and the updated covariance matrix is defined as

S_{t+1} = (1 − (n_t/(n_t + 1)) η)(x_{t+1} − x̄_{t+1})(x_{t+1} − x̄_{t+1})′ + (n_t/(n_t + 1)) η S_t.

This is equivalent to computing a weighted mean and covariance of X_{t+1,p}, where older values are down-weighted exponentially as in a geometric progression. Using a forgetting factor η < 1 allows RPCA to automatically give lower weight to older observations. As η → 1, the model forgets older observations more slowly. The eigenvectors of S_{t+1} are used to obtain a loading matrix P_{t+1}. Calculating the new loading matrix can be done in a number of ways that we touch on when discussing computational complexity. Updating with correlation matrices involves similar intuition, but different formulas. In order to lower the computational burden of repeatedly updating the mean and covariances, one strategy has been to reduce the number of updates; see He and Yang (2008). Application of the outlier detection and missing-data methods discussed in Section 3.1 is problematic in the case of RPCA because those techniques are based on static PCA and the number of observations used to initialize RPCA may be too short to apply them reliably. However, if the calibration data is assumed to be a locally stationary realization of the process, then it may be possible to apply them. The integration of such methods into adaptive PCA monitoring methods remains an open field in the literature.

FIGURE 13. A Schematic Representation of Recursive PCA with a Forgetting Factor η < 1 at Times t (Left) and t + 1 (Right). The observations used to fit the model are assigned lower weight if they are older. This is represented by the lightening of the light-gray region as the observations it covers become relatively old.

RPCA is characterized intuitively in Figure 13, where a model estimated from observations in the light-gray region is used to evaluate whether the newly observed observation, in dark gray, deviates from typical behavior. In this characterization, observations in the light-gray region are given diminishing weight by a forgetting factor to reflect the relative importance of contemporary information in establishing the basis for typical behavior. As the choice of the forgetting factor varies, so does the weighting. Furthermore, new observations are later used to evaluate future observations because, under the assumption that the monitored process is nonstationary, new data are needed to keep the model contemporary. When an observation is determined to be out of control based on the T²- or Q-statistic, the model is not updated.
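The recursive mean and covariance updates can be written compactly in code. This is a minimal sketch of the Li et al. (2000)-style recursion in numpy (our own variable names), to be called only when the new observation is in control:

```python
import numpy as np

def rpca_update(xbar, S, x_new, n, eta):
    """One recursive update of the mean and covariance with forgetting
    factor eta; n is the number of observations observed up to time t."""
    w = (n / (n + 1)) * eta                  # weight retained by the old estimates
    xbar_new = (1 - w) * x_new + w * xbar
    d = (x_new - xbar_new)[:, None]
    S_new = (1 - w) * (d @ d.T) + w * S
    return xbar_new, S_new

# The loadings for the next monitoring step then come from an
# eigendecomposition of S_new, e.g. eigval, P = np.linalg.eigh(S_new).
```

With η = 1 the mean recursion reduces to the ordinary running mean; smaller η shifts weight toward the newest observation, and η = 0 discards the history entirely.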

Updating the control limits is necessary, as the dimensionality of the data could vary, and the underlying mean and covariance parameters of the PCA model change. In order to do so for the T², it is only necessary to recalculate T²_α = χ²_{k_t}(α) for the newly determined number of PCs, k_t. Furthermore, because Q(α) is a function of θ_i, which are in turn functions of the eigenvalues of the covariance matrix, once the new PCA model has been estimated, the Q-statistic control limit is updated to reflect changes to these estimates. This is illustrated in the top (and bottom) plots of Figure 14, which shows RPCA control charts of the NASA data for low and high values of the forgetting parameter η. Here, we see that the cut-off of

FIGURE 14. RPCA Control Charts for the NASA Data Set Using η = 0.9 (Top) and η = 0.9999 (Bottom).

the T²-statistic experiences small, sharp steps up as the number of components increases and down if they decrease. This is also the case for the cut-off of the Q-statistic, although the fluctuations are the result of the combined effects of a change in the number of components and the covariance structure of the data. The time at which the major fault is detected is clearly visible in the chart of the Q-statistic as the time point at which the control limit stops changing from t = 637.
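For instance, when the number of retained components changes to a new k_t, the updated T² limit is just a chi-squared quantile. A small sketch using scipy (here α denotes the upper-tail probability, matching the notation T²_α = χ²_{k_t}(α)):

```python
from scipy.stats import chi2

def t2_limit(k_t, alpha=0.01):
    """Updated T^2 control limit: the upper-alpha quantile of a chi-squared
    distribution with k_t degrees of freedom, recomputed whenever k_t changes."""
    return chi2.ppf(1 - alpha, df=k_t)
```

For example, t2_limit(2, 0.05) is about 5.99, and moving to k_t = 3 steps the limit up to about 7.81, producing exactly the small jumps visible in Figure 14.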

In order to differentiate between outlier observations and false alarms, a rule is often imposed that a number of consecutive observations must exceed the control limits before an observation is considered a fault (often three is used). Choi et al. (2006) propose that an effective way of using observations that may be outliers or may prove to be faults is to implement a robust reweighting approach. Thus, when an observation exceeds the control limit but is not yet determined to be a true fault in the process, they propose using a reweighted version of the observed vector x, where each component of x is down-weighted according to its residual to the current model. The intention


of this approach is to prevent outliers from influencing the updating process while still retaining information from them instead of completely discarding them.

5.2. Choice of Parameters

Selecting a suitable forgetting factor in RPCA is crucial. Typically, 0.9 ≤ η ≤ 0.9999 because forgetting occurs exponentially, but lower values may be necessary for highly nonstationary processes. In Choi et al. (2006), RPCA is augmented using variable forgetting factors for the mean and the covariance or correlation matrix. This allows the model to adjust the rate of forgetting to suit a process with nonstationarity. First, they define minimum and maximum values of the forgetting factors that can be applied to the mean and covariance, respectively. Then they allow the forgetting factor to vary within those bounds based on how much the parameter has changed since the previous period relative to how much it typically changes between periods.

Computational complexity is an important concern faced by algorithms that perform frequent updates. Updating the mean is relatively straightforward because doing so is only a rank-one modification. Updating the covariance matrix and then calculating the new loading matrix proves to be more involved. It is possible to proceed using the standard SVD calculation, but this is relatively slow, with O(p³) time, and hence other approaches to the eigendecomposition have been proposed. Kruger and Xie (2012) highlight the first-order perturbation [O(p²)] and data projection method [O(pk²)] as particularly economical. When p grows larger than k, the data-projection approach becomes faster relative to first-order perturbation. However, the data-projection approach assumes a constant value of k, and this is not a requirement of the first-order perturbation method. When updating is performed in blocks, fewer updates are performed for a given period of monitoring, which in turn reduces the computational cost.

5.3. RPCA Applied to the NASA Data

We apply two RPCA models to the NASA data. The first has a relatively fast forgetting factor of 0.9. This implies that it quickly forgets observations and provides a more local model of the data than our second specification, which uses a slow forgetting factor of 0.9999. Both are initiated using a static PCA model fitted on the first 120 observations, which, according to our exploration of the NASA data, are stationary. Then we apply the updating RPCA model to those data to obtain phase I results. In this sense, phase I serves as a validation set confirming that the model is capable of monitoring the process when it is in control without producing a high false-detection rate. We then proceed to apply the model to the observations after t = 120, constituting phase II. We note that, in practice, if the initialization period cannot be assumed stationary, then a fitting/validation approach based on continuous sets of data should be used to fit the model, with the validation set serving to prevent overfitting. Results for these two monitoring models are shown in Figure 14. Because the model with η = 0.9 (top) is based on a small set of observations, it is more local but also less stable. This translates into a control chart with many violations of the control limit. Both the T²- and Q-statistics detect failure before the end of the calibration period. In contrast, the model with η = 0.9999 (bottom) detects the failure at about t = 600 using the Q-statistic and t = 300 using the T²-statistic. The times of these detections are later than for static PCA and DPCA because the RPCA model with η = 0.9999 is stable enough to produce a reliable model of the process but adaptive enough that it adjusts to the increasingly atypical behavior of the fourth bearing during the early stages of its failure. This increased time to detecting the failure is a shortcoming of RPCA in this context, but the results also illustrate how it is capable of adapting to changes in the system. If these changes are natural and moderate, such adaptation may be desirable. Fault-identification techniques are compatible with PCA methods for nonstationary data. The only restriction is that the model used for monitoring at the time of the fault should be the one used to form the basis of the contribution plot.

6. Moving-Window PCA

6.1. Method

MWPCA updates at each time point while restricting the observations used in the estimations to those that fall within a specified window of time. With each new observation, this window excludes the oldest observation and includes the observation from the previous time period. Thus, for window size H, the data matrix at time t is X_t = (x_{t−H+1}, x_{t−H+2}, …, x_t)′ and, at time t + 1, it is X_{t+1} = (x_{t−H+2}, x_{t−H+3}, …, x_{t+1})′. The updated x̄_{t+1} and S_{t+1} can then be calculated using the observations in the new window. In a sense, the


FIGURE 15. Moving Window PCA with Window Length H = 10 at Times t (Left) and t + 1 (Right).

MWPCA windowing is akin to RPCA using a fixed, binary forgetting factor. While completely recalculating the parameters for each new window is straightforward and intuitively appealing, methods have been developed to improve on computational speed (see, for example, Jeng (2010)). As was the case for RPCA, the model is not updated when an observation is determined to be out of control. A good introduction to MWPCA can be found in Kruger and Xie (2012, chap. 7). In particular, it includes a detailed comparison of the difference in computation time between a complete recomputation of the parameters versus an up- and down-dating approach. Both have O(p²) time complexity but, in most practical situations, the adaptive approach works faster. The outlier detection and missing-data methods discussed in Section 3.1 can be applied to the window of calibration data used to initialize the MWPCA model because it is assumed to be sufficiently locally stationary to perform static PCA modeling.
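A minimal moving-window sketch follows; note it is our own simplification that refits from scratch on each window, which is exactly the expensive variant the up-/down-dating comparison above improves on (class and method names are hypothetical):

```python
import numpy as np
from collections import deque

class MovingWindowPCA:
    """Keep the last H in-control observations; refit mean, covariance,
    and loadings on the current window."""

    def __init__(self, H):
        self.window = deque(maxlen=H)     # the oldest observation drops out automatically

    def add(self, x):
        self.window.append(np.asarray(x, dtype=float))

    def fit(self, k):
        X = np.array(self.window)
        xbar = X.mean(axis=0)
        S = np.cov(X, rowvar=False)
        eigval, eigvec = np.linalg.eigh(S)
        order = np.argsort(eigval)[::-1]  # sort eigenvalues in descending order
        return xbar, eigval[order[:k]], eigvec[:, order[:k]]

mw = MovingWindowPCA(H=40)
rng = np.random.default_rng(3)
for x in rng.normal(size=(60, 5)):
    mw.add(x)                             # after 60 adds, only the last 40 remain
xbar, lam, P = mw.fit(k=2)
```

In a full implementation, `add` would be skipped for observations flagged as out of control, so faulty data never enter the window.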

MWPCA is characterized intuitively in Figure 15, where a model estimated from observations in the light-gray window is used to evaluate whether the new observation, in dark gray, deviates from typical behavior. In this characterization, at each new time point, the oldest observation is excluded from the light-gray window, and the observation of the previous period is added in order to accommodate nonstationarity. The length of the window, H, is selected based on the speed at which the mean and covariance parameters change, with large windows being well suited to slow change and small windows being well suited to rapid change.

6.2. Choice of Parameters

One challenge in implementing MWPCA is to select the window length H. This can be done using expert knowledge or examination of the process by a practitioner. Chiang et al. (2001) provide a rough estimate of the window size needed to correctly estimate the T²-statistic, based on the convergence of the χ² distribution to the F distribution, that recommends minimum window sizes greater than roughly 10 times the number of variables. For the Q-statistic, this window size is something of an absolute minimum and a larger size is likely necessary. Inspired by Choi et al. (2006), He and Yang (2008) propose a variable MWPCA approach that changes the length of the window in order to adapt to the rate at which the system under monitoring changes. Once the window size is selected, the additional complication that there is not yet enough observed data may arise. One approach to address this is to simply use all of the data until the window can be filled and then proceed with MWPCA. Another method, proposed in Jeng (2010), is a combination of MWPCA with RPCA such that, for the early monitoring period, RPCA is used because it is not obliged to consider a specific number of observations. Then, once enough observations have been recorded to fill the MWPCA window, MWPCA is used. Jin et al. (2006) also propose an approach for combining MWPCA with a dissimilarity index based on changes in the covariance matrix, with the objective of identifying optimal update points. Importantly, they also discuss a heuristic for the inclusion of process knowledge into the control chart that is intended to reduce unnecessary updating and to prevent adaptation to anticipated disturbances.

Jin et al. (2006) elaborate on the value of reducing the number of updates in order to reduce computational requirements and reduce sensitivity to random perturbations. He and Yang (2011) propose another approach aiming to reduce the number of updates, based on waiting for M samples to accumulate before updating the PCA model. This approach is intended to be used in a context where slow ramp faults are present. In their paper, He and Yang (2011) propose a procedure for selecting the value of M.

Wang et al. (2005) propose a method for quickly updating the mean and covariance estimates for cases where the window size exceeds three times the number of variables, and of using a V-step-ahead prediction in order to prevent the model from adapting so quickly that it ignores faults when they are observed. This approach proceeds by using a model estimated at time t to predict the behavior of the system at time t + V and evaluate whether a fault has occurred. The intention is to ensure that the model


does not overly adapt to the data and will be able to detect errors that accumulate slowly enough to pass as normal observations at each time point. As the authors point out, using a longer window will also make the fault-detection process less sensitive to slowly accumulating errors. One advantage of the V-step-ahead approach is that it can operate with a smaller data matrix than a longer window would require, so computational efficiency can be gained. However, the trade-off is that the number of steps ahead must be chosen in addition to the window length.

6.3. MWPCA Applied to the NASA Data

Figure 16 displays the results of control charts for MWPCA models. These were fitted on the last H observations of the phase I data (because an MWPCA model is only based on H observations) and then reapplied to the phase I observations. As for RPCA, applying a model to observations that are not consecutive with the endpoint of the calibration period is plausible for the NASA process because the early observations are stationary. Then, phase II observations are monitored using the model. Window sizes of H = 40 and 80 were used to parameterize models, corresponding to one third and two thirds of the size of the calibration set. MWPCA shows slightly more stability during the phase I monitoring when H = 80, reinforcing what was observed when RPCA was applied: that forgetting observations too quickly can lead to too rapidly varying models and inconsistent process monitoring. We can see that the model with H = 80 convincingly detects the fault based on the Q-statistic at about the same time as the RPCA model with η = 0.9999 (t = 600), but the T²-statistic remains more or less in control as well until about t = 600. Thus, the monitoring statistics of MWPCA with H = 80 are somewhat more consistent with each other than those of RPCA with η = 0.9999. Although the monitoring statistics become very large after t = 600 for the MWPCA model with H = 40, there tend to be more detections prior to this time point, indicating that the model is less stable than the one obtained with H = 80. In this respect, the results are similar to those of RPCA with η = 0.9. Although we find in this case that MWPCA with the longer window of H = 80 performs better than with H = 40, we also note that it performs differently from static PCA because it convincingly detects the fault only at around t = 600. This could be desirable because, before t = 600, the vibrations in bearing four are not so

FIGURE 16. MWPCA Control Charts for the NASA Data Set Using H = 40 (Top) and H = 80 (Bottom).

great that they necessarily justify stopping the machine, but beyond this time point, the vibrations begin to increase rapidly.
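As a concrete illustration of the window mechanics discussed above, the following sketch fits a PCA model on the most recent H observations and computes the T²- and Q-statistics for each new observation before sliding the window forward. This is a minimal sketch in Python/NumPy on toy data; the function names, toy process, and retained dimension k are illustrative assumptions, not the implementation used for the NASA results.

```python
# Minimal moving-window PCA (MWPCA) monitoring sketch (illustrative only).
import numpy as np

def fit_pca(X, k):
    """Mean-center X and return the mean, the loadings, and the
    eigenvalues of the k retained principal components."""
    mu = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)
    return mu, Vt[:k].T, eigvals[:k]

def monitor_stats(x, mu, P, eigvals):
    """Hotelling T^2 and Q (squared prediction error) for one observation."""
    xc = x - mu
    t = P.T @ xc                      # scores in the k-dimensional subspace
    T2 = np.sum(t**2 / eigvals)       # Hotelling T^2
    resid = xc - P @ t                # part of x outside the PCA subspace
    Q = resid @ resid
    return T2, Q

rng = np.random.default_rng(0)
H, k = 80, 2
X = rng.normal(size=(300, 5))         # toy process data
window = X[:H].copy()                 # calibration: the last H phase I points
for x in X[H:]:
    mu, P, eigvals = fit_pca(window, k)
    T2, Q = monitor_stats(x, mu, P, eigvals)
    # ...compare T2 and Q against control limits here...
    window = np.vstack([window[1:], x])   # slide: forget the oldest observation
```

A smaller H adapts faster but, as seen in Figure 16, tends to yield a less stable model with more spurious detections.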

7. Discussion

Control charts based on static PCA models have been widely used for monitoring systems with many variables that do not exhibit autocorrelation or nonstationarity. DPCA, RPCA, and MWPCA provide methodologies for addressing these scenarios. To summarize, a rubric of the situations in which these methods are applicable is provided in Table 3. However, while extensions have sought to make them as generally implementable as static PCA, a number of challenges have not yet been resolved.

An area for further research lies in investigating the performance of models mixing DPCA and R/MWPCA to handle autocorrelation and nonstationarity simultaneously. Presently, works have focused on examining the performance of methods intended for only one type of dynamic data, but combinations of the two remain unexplored.

TABLE 3. Applicability of Different PCA Methods to Time-Dependent Processes

                            Nonstationarity
  Autocorrelation      No               Yes
  No                   Static PCA       R/MWPCA
  Yes                  DPCA             ?

Among the most important questions is how to choose the optimal values of the parameters used by DPCA, RPCA, and MWPCA. We have focused on illustrating the properties of these algorithms as their parameters vary by using low and high values. However, in practice, an optimal value for monitoring is desired. Often, the determination of these parameters is left to the discretion of an expert on the system being monitored. Automatic methods have been described, but no consensus exists on which is the best, and further research is particularly needed in the area of automatic methods for RPCA and MWPCA parameter selection.

Currently, a weakness of DPCA is that, if an observation is considered out of control but as an outlier rather than a fault, then the practitioner would normally continue monitoring while ignoring this observation. However, doing so destroys the lag structure of DPCA. Therefore, a study of the benefits of reweighting the observation, as in Choi et al. (2006), or of removing the observation and replacing it with a prediction would be a useful contribution.
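To see why removing an observation is problematic, recall that DPCA augments the data matrix with lagged copies of each observation, so every observation appears in several consecutive rows of the augmented matrix. The sketch below (a hypothetical helper on toy data, not code from the original study) makes this structure explicit.

```python
# Illustration of the DPCA lag structure: each row of the augmented matrix
# stacks the current observation with its l previous ones, so deleting a
# single row of X would misalign every subsequent augmented row.
import numpy as np

def lag_matrix(X, l):
    """Augment an (n x m) data matrix with l lags -> (n - l) x (m*(l+1)).

    Row i holds [x_{i+l}, x_{i+l-1}, ..., x_i]."""
    n = X.shape[0]
    return np.hstack([X[l - j : n - j] for j in range(l + 1)])

X = np.arange(12.0).reshape(6, 2)   # toy data: 6 observations, 2 variables
Xl = lag_matrix(X, l=1)
# Xl[0] is [x_1, x_0] = [2, 3, 0, 1]; dropping, say, x_2 from X would
# shift the lagged copies in every row of Xl from that point on.
```

Because each observation feeds multiple augmented rows, discarding an outlier leaves "holes" in several rows at once, which is why reweighting or prediction-based replacement is attractive.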

Methods for addressing the influence of outliers exist for the calibration phase (see, e.g., Hubert et al. (2005) and Jensen et al. (2007)) as well as for online monitoring (see Chiang and Colegrove (2007), Choi et al. (2006), and Li et al. (2000)). These methods address the problem of how best to make use of the information captured in outliers, and approaches range from excluding outliers completely to downweighting the influence exerted by such observations. Which approach is preferable, and whether different types of outliers should be treated differently, are still open questions. Similarly, approaches for missing-data imputation for PCA that can be applied when the calibration data are incomplete have also been proposed (Walczak and Massart (2001) and Serneels and Verdonck (2008)), but little has been done to explore the performance of these methods in the PCA process-monitoring setting or when the data are autocorrelated.

Further research is also warranted in the area of fault isolation. The contribution plot, residual-based tests, and variable reconstruction are three well-studied approaches to this problem (Kruger and Xie (2012), Qin (2003)). Recently, some new methods for fault isolation based on modifications of the contribution-plot methodology have been proposed (see Elshenawy and Awad (2012)). However, these methods cannot isolate the source of faults in many complex failure settings, a task that becomes more difficult still when the data are time dependent. Improvements on the classical contribution plot, or entirely new methods, would be a valuable addition to the PCA control-chart toolbox. Woodall and Montgomery (2014) cover some control-chart performance metrics, such as the average run length and the false discovery rate (FDR), and elaborate on challenges faced by these metrics in real-data applications. They propose that the FDR may be more appropriate for high-dimensional cases, but state that further research is necessary to draw firm conclusions. This advice is especially relevant for PCA control-chart methods because they are often applied to high-dimensional data, and the FDR should be investigated as an option for measuring their performance.
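As a point of reference for the contribution-plot approach, the following minimal sketch decomposes the Q-statistic into per-variable contributions, i.e., the squared residuals after projection onto the PCA subspace. The toy data, the injected fault, and the helper name are illustrative assumptions, not part of the original study.

```python
# Minimal Q-statistic contribution-plot sketch (illustrative only).
import numpy as np

def q_contributions(x, mu, P):
    """Per-variable contributions to Q = ||(I - P P^T)(x - mu)||^2."""
    xc = x - mu
    resid = xc - P @ (P.T @ xc)   # residual outside the PCA subspace
    return resid**2               # contributions sum to Q

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))     # toy calibration data
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
P = Vt[:2].T                      # loadings of the first two components

x_fault = X[0].copy()
x_fault[2] += 10.0                # inject a fault in variable 2
contrib = q_contributions(x_fault, mu, P)
# the faulty variable typically shows the largest contribution, but
# fault smearing across correlated variables is exactly the weakness
# that motivates the newer isolation methods cited above
```

The comment in the last lines reflects the limitation noted in the text: contributions can smear across correlated variables, so a large contribution does not always pinpoint the root cause.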

We make the code and data on which our results in this paper are based available on request.

References

Arteaga, F. and Ferrer, A. (2002). "Dealing with Missing Data in MSPC: Several Methods, Different Interpretations, Some Examples". Journal of Chemometrics 16(8–10), pp. 408–418.

Barceló, S.; Vidal-Puig, S.; and Ferrer, A. (2010). "Comparison of Multivariate Statistical Methods for Dynamic Systems Modeling". Quality & Reliability Engineering International 27(1), pp. 107–124.

Bersimis, S.; Psarakis, S.; and Panaretos, J. (2006). "Multivariate Statistical Process Control Charts: An Overview". Quality & Reliability Engineering International 23, pp. 517–543.

Bisgaard, S. (2012). "The Future of Quality Technology: From a Manufacturing to a Knowledge Economy & From Defects to Innovations". Quality Engineering 24, pp. 30–36.

Chiang, L. and Colegrove, L. (2007). "Industrial Implementation of On-Line Multivariate Quality Control". Chemometrics and Intelligent Laboratory Systems 88, pp. 143–153.

Chiang, L.; Russell, E.; and Braatz, R. (2001). Fault Detection and Diagnosis in Industrial Systems. London, UK: Springer-Verlag.

Choi, S.; Martin, E.; Morris, A.; and Lee, I. (2006). "Adaptive Multivariate Statistical Process Control for Monitoring Time-Varying Processes". Industrial & Engineering Chemistry Research 45, pp. 3108–3118.

Dayal, B. S. and MacGregor, J. F. (1997a). "Improved PLS Algorithms". Journal of Chemometrics 11(1), pp. 73–85.

Dayal, B. S. and MacGregor, J. F. (1997b). "Recursive Exponentially Weighted PLS and Its Applications to Adaptive Control and Prediction". Journal of Process Control 7(3), pp. 169–179.

Elshenawy, L. and Awad, H. (2012). "Recursive Fault Detection and Isolation Approaches of Time-Varying Processes". Industrial & Engineering Chemistry Research 51(29), pp. 9812–9824.

Epprecht, E. K.; Barbosa, L. F. M.; and Simoes, B. F. T. (2011). "SPC of Multiple Stream Processes: A Chart for Enhanced Detection of Shifts in One Stream". Production 21, pp. 242–253.

Epprecht, E. K. and Simoes, B. F. T. (2013). "Statistical Control of Multiple-Stream Processes: A Literature Review". Paper presented at the 11th International Workshop on Intelligent Statistical Quality Control, Sydney, Australia.

Gallagher, N.; Wise, B.; Butler, S.; White, D.; and Barna, G. (1997). "Development and Benchmarking of Multivariate Statistical Process Control Tools for a Semiconductor Etch Process: Improving Robustness Through Model Updating". Process: Impact of Measurement Selection and Data Treatment on Sensitivity, Safe Process 1997, pp. 26–27.

He, B. and Yang, X. (2011). "A Model Updating Approach of Multivariate Statistical Process Monitoring". IEEE International Conference on Information and Automation (ICIA), pp. 400–405.

He, X. and Yang, Y. (2008). "Variable MWPCA for Adaptive Process Monitoring". Industrial & Engineering Chemistry Research 47(2), pp. 419–427.

Hubert, M.; Rousseeuw, P.; and van den Branden, K. (2005). "ROBPCA: A New Approach to Robust Principal Components Analysis". Technometrics 47, pp. 64–79.

Jackson, J. and Mudholkar, G. (1979). "Control Procedures for Residuals Associated with Principal Component Analysis". Technometrics 21(3), pp. 341–349.

Jeng, J.-C. (2010). "Adaptive Process Monitoring Using Efficient Recursive PCA and Moving Window PCA Algorithms". Journal of the Taiwan Institute of Chemical Engineers 44, pp. 475–481.

Jensen, W.; Birch, J.; and Woodall, W. (2007). "High Breakdown Estimation Methods for Phase I Multivariate Control Charts". Quality and Reliability Engineering International 23(5), pp. 615–629.

Jin, H.; Lee, Y.; Lee, G.; and Han, C. (2006). "Robust Recursive Principal Component Analysis Modeling for Adaptive Monitoring". Industrial & Engineering Chemistry Research 45(20), pp. 696–703.

Jolliffe, I. (2002). Principal Component Analysis, 2nd edition. New York, NY: Springer.

Jolliffe, I. T.; Trendafilov, N. T.; and Uddin, M. (2003). "A Modified Principal Component Technique Based on the LASSO". Journal of Computational and Graphical Statistics 12, pp. 531–547.

Kourti, T. (2005). "Application of Latent Variable Methods to Process Control and Multivariate Statistical Process Control in Industry". International Journal of Adaptive Control and Signal Processing 19(4), pp. 213–246.

Kruger, U. and Xie, L. (2012). Advances in Statistical Monitoring of Complex Multivariate Processes: With Applications in Industrial Process Control. New York, NY: John Wiley.

Kruger, U.; Zhou, Y.; and Irwin, G. (2004). "Improved Principal Component Monitoring of Large-Scale Processes". Journal of Process Control 14(8), pp. 879–888.

Ku, W.; Storer, R.; and Georgakis, C. (1995). "Disturbance Detection and Isolation by Dynamic Principal Component Analysis". Chemometrics and Intelligent Laboratory Systems 30(1), pp. 179–196.

Lee, J.; Qiu, H.; Yu, G.; Lin, J.; and Services, R. T. (2007). "Bearing Data Set". IMS, University of Cincinnati. NASA Ames Prognostics Data Repository.

Li, W.; Yue, H.; Valle-Cervantes, S.; and Qin, S. (2000). "Recursive PCA for Adaptive Process Monitoring". Journal of Process Control 10(5), pp. 471–486.

Luo, R.; Misra, M.; and Himmelblau, D. (1999). "Sensor Fault Detection via Multiscale Analysis and Dynamic PCA". Industrial & Engineering Chemistry Research 38(4), pp. 1489–1495.

Miller, P.; Swanson, R.; and Heckler, C. (1998). "Contribution Plots: A Missing Link in Multivariate Quality Control". Applied Mathematics and Computer Science 8, pp. 775–792.

Nomikos, P. and MacGregor, J. (1995). "Multivariate SPC Charts for Monitoring Batch Processes". Technometrics 37, pp. 41–59.

Qin, J.; Valle-Cervantes, S.; and Piovoso, M. (2001). "On Unifying Multi-Block Analysis with Applications to Decentralized Process Monitoring". Journal of Chemometrics 15, pp. 715–742.

Qin, S. (2003). "Statistical Process Monitoring: Basics and Beyond". Journal of Chemometrics 17, pp. 480–502.

R Core Team (2014). "R: A Language and Environment for Statistical Computing".

Rato, T. and Reis, M. (2013a). "Advantage of Using Decorrelated Residuals in Dynamic Principal Component Analysis for Monitoring Large-Scale Systems". Industrial & Engineering Chemistry Research 52(38), pp. 13685–13698.

Rato, T. and Reis, M. (2013b). "Defining the Structure of DPCA Models and Its Impact on Process Monitoring and Prediction Activities". Chemometrics and Intelligent Laboratory Systems 125, pp. 74–86.

Rato, T. and Reis, M. (2013c). "Fault Detection in the Tennessee Eastman Benchmark Process Using Dynamic Principal Components Analysis Based on Decorrelated Residuals (DPCA-DR)". Chemometrics and Intelligent Laboratory Systems 125, pp. 101–108.

Runger, G. C.; Alt, F. B.; and Montgomery, D. C. (1996). "Controlling Multiple Stream Processes with Principal Components". International Journal of Production Research 34(11), pp. 2991–2999.

Serneels, S. and Verdonck, T. (2008). "Principal Component Analysis for Data Containing Outliers and Missing Elements". Computational Statistics & Data Analysis 52(3), pp. 1712–1727.

Todorov, V. (2013). "Scalable Robust Estimators with High Breakdown Point for Incomplete Data". R package, version 0.4-4.

Todorov, V. and Filzmoser, P. (2009). "An Object-Oriented Framework for Robust Multivariate Analysis". Journal of Statistical Software 32(3), pp. 1–47.


Treasure, R. J.; Kruger, U.; and Cooper, J. E. (2004). "Dynamic Multivariate Statistical Process Control Using Subspace Identification". Journal of Process Control 14(3), pp. 279–292.

Tsung, F. (2000). "Statistical Monitoring and Diagnosis of Automatic Controlled Processes Using Dynamic PCA". International Journal of Production Research 38(3), pp. 625–637.

Valle, S.; Li, W.; and Qin, S. (1999). "Selection of the Number of Principal Components: The Variance of the Reconstruction Error Criterion with a Comparison to Other Methods". Industrial & Engineering Chemistry Research 38(11), pp. 4389–4401.

Verboven, S. and Hubert, M. (2005). "LIBRA: A MATLAB Library for Robust Analysis". Chemometrics and Intelligent Laboratory Systems 75, pp. 127–136.

Walczak, B. and Massart, D. (2001). "Dealing with Missing Data. Part I". Chemometrics and Intelligent Laboratory Systems 58, pp. 15–27.

Wang, X.; Kruger, U.; and Irwin, G. (2005). "Process Monitoring Approach Using Fast Moving Window PCA". Industrial & Engineering Chemistry Research 44(15), pp. 5691–5702.

Wold, S. (1994). "Exponentially Weighted Moving Principal Components Analysis and Projections to Latent Structures". Chemometrics and Intelligent Laboratory Systems 23(1), pp. 149–161.

Woodall, W. and Montgomery, D. (2014). "Some Current Directions in the Theory and Application of Statistical Process Monitoring". Journal of Quality Technology 46(1), pp. 78–94.

Zou, H.; Hastie, T.; and Tibshirani, R. (2006). "Sparse Principal Component Analysis". Journal of Computational and Graphical Statistics 15, pp. 265–286.
