
Received: 9 March 2009, Revised: 13 November 2009, Accepted: 28 January 2010, Published online in Wiley InterScience: 13 April 2010

Optical coefficient-based multivariate calibration on near-infrared spectroscopy

Zhenqi Shi a, Robert P. Cogdill b, Harald Martens c and Carl A. Anderson a*

The time and expense of calibration development limit the feasibility of NIR spectroscopy for many industrial applications, with a major portion of the costs being related to the creation of a sufficient set of calibration samples. Net analyte signal (NAS) and generalized least squares (GLS) pre-processing have been proposed in the literature as methods to simplify multivariate calibration by reducing the number of calibration samples through orthogonalizing or shrinking interference signals. Synthetic calibration has also been reported as a method to combine interference signals with pure component spectra to generate virtual calibration models, thereby reducing the number of real calibration samples required. The goals of this paper were to (1) compare theoretical and practical differences between NAS and GLS pre-processing and (2) explore the potential of simplified NIR calibrations, both empirical and synthetic, constructed using optical coefficient-based signal processing, for predicting the chemical composition of pharmaceutical powder mixtures. A reduced calibration dataset including only one pharmaceutical powder mixture composition and pure component spectra was used for both empirical and synthetic calibrations. Absorption and reduced scattering coefficients, obtained from spatially-resolved spectroscopy, were used as interference signals in NAS/GLS pre-processing for both calibrations. NAS and GLS were shown to be equivalent in both theoretical and practical senses. After optical coefficient-based signal processing, the simplified calibrations, both empirical and synthetic, were demonstrated to have model performance similar to that of generic pre-processing methods such as SNV and derivatives, while requiring fewer principal components and achieving a lower prediction error.
Copyright © 2010 John Wiley & Sons, Ltd.

Keywords: near-infrared spectroscopy; multivariate calibration; net analyte signal; generalized least squares; synthetic calibration; absorption coefficients; reduced scattering coefficients

1. INTRODUCTION

1.1. Empirical calibration

Quantitative analysis of intact samples by near-infrared (NIR) spectroscopy typically requires multivariate calibration in order to overcome additive and multiplicative effects due to chemical and physical interferences (e.g. constituents with overlapping peaks, light scattering, temperature effects). The prevailing methods for multivariate calibration require [1]: (1) multiple calibration samples that span the potential sources of variation in prediction samples with respect to chemical composition (the analyte of interest and interferences), physical quality attributes and instrument performance factors; and (2) pre-processing routines that mitigate interferences in order to enhance the chemical signals embedded in the spectra. These multivariate calibration efforts are generally regarded as the empirical approach. However, these requirements can be problematic. First, creating calibration samples may require extensive resources and time, which makes the approach difficult to implement in long-term practical applications in industry [2]. Second, most data pre-processing routines are non-specific (e.g. standard normal variate, SNV, and multiplicative scattering correction, MSC) and may suppress important chemical information [3,4]. This suppression is expected to be especially problematic when components of low concentration are the analytes of interest, such as the active pharmaceutical ingredient (API) in a low-dose dosage form.

1.2. Efficient calibration and synthetic calibration

In order to simplify the empirical calibration procedure, several chemometric algorithms have been proposed to pre-process the input spectra into an 'interference-free' data matrix. This allows calibration on a reduced number of samples without including interference variations. The term 'reduced' here means fewer (high-leverage) calibration samples compared to those used in empirical calibration. Two types of so-called efficient calibration algorithms are generally reported. The first type projects the raw spectra onto the null space of the interference spectra, that is, the orthogonal complement of the space they span. Examples include the many variants of net analyte signal (NAS) and orthogonal signal correction (OSC) [5,6]. The second type shrinks the interference matrix via matrix division. Classic examples include multivariate generalized least squares (GLS) [7] and the spectroscopic Wiener filter [8,9].

More recently, a third method of efficient calibration has been proposed, termed synthetic calibration. The term 'synthetic' is used because synthetic (artificial or virtual) spectra are created from mixtures of pure-component spectra and random combinations of expected or known interference signals. Using prior knowledge of the analyte and interference signals, virtual spectra can be created for model calibration, thereby saving the time and labor associated with creating real calibration samples. Synthetic calibration can be considered a form of Monte Carlo simulation. It effectively transfers the effort of calibration development from the creation of standard samples to the development of accurate interference models. The advantage lies in the expectation that such interference models tend to be more general (common across products) and cheaper to acquire. In addition to efficient calibration, synthetic spectra can also be used for risk assessment. The use of synthetic calibration has already been demonstrated in agricultural, pharmaceutical and biomedical applications [2,10,11]. In synthetic calibration, interference signals are added to pure component spectra. This is essentially analogous to the interference projection in NAS and the interference shrinking in GLS. However, a rigorous side-by-side theoretical and practical comparison of these techniques has not been published.

Estimates of chemical and physical interference signals are required for each of these efficient calibration approaches to be effective. In some examples from the literature, two kinds of signal were used for this purpose [3,12]. Expected or known absorption signals of interferants represented the chemical interferences. Empirical relationships between wavelength and scattering represented the physical interferences. However, it is still unclear how well the empirical wavelength-scattering relationship represents physical interferences (i.e. scattering). Both types of interference signals are functions of the chemical and physical properties of the interferants. Therefore, determining the absorption and scattering signals of the raw materials of known interferants is expected to be useful. It should help these efficient calibration approaches reduce the time and cost of multivariate calibration. Owing to the nature of these efficient calibration approaches, only additive phenomena are considered in this paper. Spectra are therefore regarded as a summed function of the analyte and interferences.

(www.interscience.wiley.com) DOI: 10.1002/cem.1301

Research Article

* Correspondence to: C. A. Anderson, Graduate School of Pharmaceutical Sciences, Duquesne University, Pittsburgh, PA, 15282, USA. E-mail: [email protected]

a Z. Shi, C. A. Anderson
Graduate School of Pharmaceutical Sciences, Duquesne University, Pittsburgh, PA 15282, USA

b R. P. Cogdill
College of Engineering, University of Nebraska, Lincoln, NE 68588, USA

c H. Martens
Nofima Mat AS and CIGENE/IMT, Norwegian University of Life Sciences, Aas, Norway

J. Chemometrics 2010; 24: 288–299 Copyright © 2010 John Wiley & Sons, Ltd.

1.3. Optical coefficients-based efficient calibration

It is well known that absorption and scattering are the two basic events that occur when a photon impinges upon a solid sample under NIR illumination. The classical radiative transfer equation describes these two events with separate optical coefficients: the absorption coefficient and the scattering coefficient [13,14]. Absorption is the loss of photon energy due to the altered molecular dipole of a bond. It is described by the absorption coefficient, $\mu_a(\lambda)$ (cm$^{-1}$), defined as the probability of a photon being absorbed per unit path length. Scattering is a stochastic process that occurs when a photon encounters mismatched refractive index interfaces within a sample. Therefore, physical parameters such as powder density and porosity, which affect the nature and density of refractive index interfaces, can have a dramatic effect on scattering. Scattering is described by the scattering coefficient, $\mu_s(\lambda)$ (cm$^{-1}$), defined as the probability of a photon being scattered per unit path length. Because scattering is the dominant event in the NIR regime, the diffusion approximation to the radiative transfer equation is normally applied. This allows the reduced scattering coefficient, $\mu_s'(\lambda)$ (cm$^{-1}$), to be used to quantify the scattering property of a sample matrix. The relationship between $\mu_s$ and $\mu_s'$ is $\mu_s' = \mu_s(1-g)$, in which $g$ is the anisotropy factor (the mean cosine of the scattering angle in the sample matrix).

Owing to the intrinsic properties of $\mu_a(\lambda)$ and $\mu_s'(\lambda)$, both coefficients have been correlated with chemical and physical properties of the analyte of interest for quantitative analysis [15,16]. For instance, density-adjusted absorption coefficients were used to quantify the content of acetylsalicylic acid in binary mixtures with microcrystalline cellulose [15]. Reduced scattering coefficients were used to predict the median particle size of lactose powder [16]. In these studies, both $\mu_a(\lambda)$ and $\mu_s'(\lambda)$ of individual samples were determined and applied directly for quantitative analysis in lieu of the measured absorbance spectra. In this paper, however, the optical coefficients serve as interference signals in the efficient calibration approaches, so $\mu_a(\lambda)$ and $\mu_s'(\lambda)$ of individual samples were not used directly. Instead, the absorption and scattering spectra (i.e. $\mu_a(\lambda)$ and $\mu_s'(\lambda)$) of the pure-component raw materials were combined with efficient calibration approaches. This allowed component-specific signal processing (i.e. signal removal and addition) to be performed on the absorbance spectra. In the signal processing for a given constituent (the 'analyte'), two sets of interferences were used. The measured $\mu_a(\lambda)$ of the other constituents (the 'interferants') served as the chemical interferences. The $\mu_s'(\lambda)$ of all the constituents served as the physical interferences. Including $\mu_s'(\lambda)$ as explicit prior knowledge in signal processing is expected to characterize the scattering interferences better than the empirical wavelength-scattering relationship does.

The purpose of this paper is to demonstrate the potential of using a priori knowledge (i.e. $\mu_a(\lambda)$ and $\mu_s'(\lambda)$ of pure components) in multivariate calibration techniques for NIR spectroscopy. This provides useful insight into the idea of 'fundamentals-based modeling' [17], which may simplify or even substitute for empirical modeling in the future. The objectives of this study were to:

(1) Compare the theoretical and practical aspects of NAS and GLS as pre-processing methods;

(2) Explore the potential of pure component optical coefficients as prior information in NAS/GLS pre-processing to simplify empirical multivariate model calibration;

(3) Investigate the use of pure component optical coefficients as prior information to assist synthetic calibration in predicting chemical compositions under practical conditions, and compare its prediction performance against that of empirical model calibration.

For clarity, note that in this work the NAS/GLS calculation was in all cases used only for pre-processing, followed by partial least squares (PLS) calibration. This two-stage approach represents situations where prior knowledge is available but not complete. The pre-processing removes interferences based on prior knowledge, while the multivariate calibration accounts for unidentified types of interferences. This should be distinguished from NAS/GLS calibration, in which a subsequent principal component-based modeling routine is not typically applied [18].


2. EXPERIMENTS AND METHODS

2.1. Materials and spectra acquisition

A Plackett–Burman experimental design (Table I) was used to generate eight mixtures containing acetaminophen (APAP), lactose monohydrate (LAC) and microcrystalline cellulose (MCC). Two grades of APAP were used: fine powder (Mallinckrodt Inc., Raleigh, NC) and coarse powder (Rhodia Organique, Cedex, France). LAC was supplied by Foremost Farms USA, Rothschild, WI. Two grades of MCC were also used: PH 101 and PH 200 (FMC Biopolymer, Mechanicsburg, PA). All materials were used as received.

Powder blending was monitored online via a process spectrometer (BM-1000, Control Development Inc., South Bend, IN) mounted on top of a 5.5 L bin-blender. Blending lasted 90 min, with one spectrum collected every 5 s. The spectrometer was auto-referenced using an internal Spectralon disk. The spectra were collected over the wavelength range 1050–1620 nm at a 5 nm interval, giving 115 wavelength channels in total. Each blending experiment was performed in triplicate. Blending spectra collected during the last 10 min (120 spectra for each blending run) were used here to represent homogeneous powder mixtures. Homogeneity at this stage was confirmed elsewhere [19].

One of the eight blending compositions (Design 7, marked in Table I) was arbitrarily selected to form a reduced calibration dataset in combination with the spectra described below. The remaining seven blending compositions served as the prediction dataset. Design 7 and pure component spectra of the raw materials (50 replicate spectra for each raw material) were used as the empirical calibration spectra. Design 7 was also combined with virtual spectra (as described in Section 2.4.2) to serve as the synthetic calibration spectra.
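The acquisition settings above determine the size of each dataset. The short Python sketch below re-derives those counts from the numbers stated in the text: the wavelength range and interval, the collection rate, the 10 min window and the triplicate runs. The variable names are illustrative only.

```python
import numpy as np

# Wavelength grid: 1050-1620 nm inclusive at a 5 nm interval
wavelengths = np.arange(1050, 1620 + 5, 5)
n_channels = wavelengths.size                         # 115 channels

# Last 10 min of each blending run at one spectrum per 5 s
spectra_per_run = (10 * 60) // 5                      # 120 spectra per run

runs_per_design = 3                                   # triplicate blending runs
n_design7 = runs_per_design * spectra_per_run         # 360 Design 7 mixture spectra
n_prediction = 7 * runs_per_design * spectra_per_run  # 2520 spectra (Designs 1-6 and 8)

print(n_channels, spectra_per_run, n_design7, n_prediction)
```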

2.2. Determination of optical coefficients

Five pure-component powder materials were used: LAC, PH 101, PH 200 and APAP (fine and coarse grades). Optical coefficients for these raw materials were determined according to a previously reported protocol [4]. First, spatially-resolved spectroscopy (based on chemical imaging) was used to measure the radially-diffused reflectance of the raw powder materials. Second, a center component design was used to generate multiple pairs of $\mu_a$ and $\mu_s'$. Each pair of optical coefficients was used as input to a Monte Carlo simulation procedure that generated the corresponding radially-diffused reflectance. A PLS model was then built between the simulated radially-diffused reflectance and the optical coefficients. Third, this Monte Carlo simulation-based PLS model was used to predict the optical coefficients of the raw materials from their measured radially-diffused reflectance.

2.3. Comparison of NAS and GLS pre-processing

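NAS and GLS pre-processing were performed according to the reported formulas:

NAS [6]: $$X_{\mathrm{NAS}} = X_{\mathrm{input}}\,\left(I - \beta\,K^{+}K\right) \tag{1}$$

GLS [7]: $$X_{\mathrm{GLS}} = X_{\mathrm{input}}\,\left(\gamma\,K'K + I\right)^{-1/2} \tag{2}$$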

• $X_{\mathrm{input}}$ ($n \times m$) is the matrix of mean-centered raw input spectra, where $n$ is the number of sample spectra and $m$ is the number of variables (i.e. wavelengths).

• $K$ ($p \times m$) is the matrix of interference signals, where $p$ is the number of interference signals. If collinearity exists in $K$, singular value decomposition is applied prior to matrix inversion. Both $K^{+}K$ and $K'K$ are used to calculate the covariance matrix of the interferences, and both are unitless.

• The exponent $-1/2$ in Equation (2) follows from the derivation of the GLS calculation.
  - Ordinary GLS regression de-weights the uncertainty covariance among samples: $b = (X_{\mathrm{input}}'\,S^{-1}X_{\mathrm{input}})^{-1}X_{\mathrm{input}}'\,S^{-1}Y_{\mathrm{input}}$, where $S$ is $n \times n$. In GLS pre-processing, the uncertainty covariance is instead among the wavelengths, so $S$ becomes $m \times m$.
  - Assume the spectrum of an analyte, $K_{\mathrm{analyte}}$ ($1 \times m$), is known. The model can then be written as $X_{\mathrm{input}} = C_{\mathrm{analyte}}\,K_{\mathrm{analyte}} + e$, where $S = \mathrm{cov}(e) = \gamma K'K + I$.
  - The GLS estimator is $C_{\mathrm{analyte}} = X_{\mathrm{input}}\,S^{-1}K_{\mathrm{analyte}}'\,(K_{\mathrm{analyte}}\,S^{-1}K_{\mathrm{analyte}}')^{-1}$.
  - To express this estimator as an ordinary least squares (OLS) solution, $X_{\mathrm{input}}$ and $K_{\mathrm{analyte}}$ must be pre-processed as $X_{\mathrm{GLS}} = X_{\mathrm{input}}\,S^{-1/2}$ and $K_{\mathrm{analyte,GLS}} = K_{\mathrm{analyte}}\,S^{-1/2}$.
  - The GLS estimator written as an OLS solution then becomes $C_{\mathrm{analyte}} = X_{\mathrm{GLS}}\,K_{\mathrm{analyte,GLS}}'\,(K_{\mathrm{analyte,GLS}}\,K_{\mathrm{analyte,GLS}}')^{-1}$.

• $\beta$ and $\gamma$ are unitless scalar adjustment factors that control the degree to which the interference covariance matrix is applied. Unity was used for $\beta$ in the NAS calculation. A range of values was used for $\gamma$: 0, 1, $10^2$, $10^4$, $10^6$ and $10^8$.

• $^{+}$ indicates a pseudo-inverse operation.

• $'$ indicates a transpose operation.

• $X_{\mathrm{NAS}}$ and $X_{\mathrm{GLS}}$ are the spectra after NAS and GLS pre-processing, with the same units as $X_{\mathrm{input}}$.

Table I. Plackett–Burman design matrix

Design  APAP (%, w/w)  MCC (%, w/w)  LAC (%, w/w)  APAP type      MCC type
1       15             56.7          28.3          Coarse powder  PH 200
2       5              63.3          31.7          Coarse powder  PH 101
3       15             28.3          56.7          Coarse powder  PH 200
4       5              31.7          63.3          Coarse powder  PH 101
5       15             56.7          28.3          Fine powder    PH 101
6       5              63.3          31.7          Fine powder    PH 200
7a      15             28.3          56.7          Fine powder    PH 101
8       5              31.7          63.3          Fine powder    PH 200

a Design 7 was used in the calibration dataset, while the other designs were used in the prediction dataset.
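The whitening argument in the GLS bullet can be checked numerically. The following Python/NumPy sketch uses randomly generated, hypothetical matrices rather than the paper's data. It computes the GLS estimator in two ways:

- directly with $S^{-1}$;
- as an OLS solution after post-multiplying both $X_{\mathrm{input}}$ and $K_{\mathrm{analyte}}$ by $S^{-1/2}$.

It then checks that the two results agree. The helper `inv_sqrt_sym` is an illustrative name introduced here; it forms the symmetric inverse square root by eigendecomposition.

```python
import numpy as np


def inv_sqrt_sym(S):
    """Symmetric inverse square root S^(-1/2) of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


rng = np.random.default_rng(0)
n, m, p = 20, 115, 3                     # samples, wavelengths, interferences (hypothetical)
K = rng.normal(size=(p, m))              # hypothetical interference signals (p x m)
K_analyte = rng.normal(size=(1, m))      # hypothetical known analyte spectrum (1 x m)
C_true = rng.uniform(0.05, 0.15, size=(n, 1))
X = C_true @ K_analyte + rng.normal(size=(n, p)) @ K + 0.01 * rng.normal(size=(n, m))
X = X - X.mean(axis=0)                   # mean-centre, as for X_input

gamma = 1e2
S = gamma * K.T @ K + np.eye(m)          # S = cov(e) = gamma K'K + I  (m x m)
S_inv = np.linalg.inv(S)

# GLS estimator written directly with S^-1
C_gls = X @ S_inv @ K_analyte.T @ np.linalg.inv(K_analyte @ S_inv @ K_analyte.T)

# The same estimator as an OLS solution after whitening X and K_analyte with S^-1/2
W = inv_sqrt_sym(S)
X_gls = X @ W
K_gls = K_analyte @ W
C_ols = X_gls @ K_gls.T @ np.linalg.inv(K_gls @ K_gls.T)

print(np.allclose(C_gls, C_ols))
```

The agreement follows from $S^{-1/2}S^{-1/2} = S^{-1}$ for symmetric $S$, so it holds for any positive $\gamma$, not only the value chosen here.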

As can be seen from the above equations, NAS orthogonalizes the spectra with respect to the interfering factors by projecting the data onto their null space. GLS, by contrast, shrinks the interference space via matrix division.

Prediction of APAP concentrations from the empirical calibration dataset was used as an example to compare NAS and GLS pre-processing. In this case, the interference vectors ($K$) included the reduced scattering coefficients of all components and the absorption coefficients of LAC and MCC.
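Equations (1) and (2) translate directly into matrix code. The NumPy sketch below is a minimal illustration under the following assumptions:

- $K$ and the spectra are random, hypothetical stand-ins (115 wavelengths, four interference rows);
- $\beta = 1$, as in the text;
- $\gamma$ takes the values listed for Equation (2).

The sketch reports the relative difference between the GLS- and NAS-pre-processed spectra, which shrinks as $\gamma$ grows. The function names `nas_preprocess` and `gls_preprocess` are illustrative and do not come from the in-house Matlab library.

```python
import numpy as np


def nas_preprocess(X, K, beta=1.0):
    """Equation (1): X_NAS = X (I - beta K^+ K)."""
    m = X.shape[1]
    P = np.linalg.pinv(K) @ K            # m x m projector onto the row space of K
    return X @ (np.eye(m) - beta * P)


def gls_preprocess(X, K, gamma):
    """Equation (2): X_GLS = X (gamma K'K + I)^(-1/2)."""
    m = X.shape[1]
    S = gamma * K.T @ K + np.eye(m)      # symmetric positive definite
    vals, vecs = np.linalg.eigh(S)
    return X @ (vecs @ np.diag(vals ** -0.5) @ vecs.T)


rng = np.random.default_rng(42)
n, m, p = 50, 115, 4
K = np.abs(rng.normal(size=(p, m)))      # hypothetical interference spectra
s_analyte = np.abs(rng.normal(size=m))   # hypothetical analyte spectrum
c = rng.uniform(0.05, 0.15, size=(n, 1))
X = c * s_analyte + rng.normal(size=(n, p)) @ K
X = X - X.mean(axis=0)                   # mean-centre, as for X_input

X_nas = nas_preprocess(X, K, beta=1.0)
gammas = [0.0, 1.0, 1e2, 1e4, 1e6, 1e8]
rel_diff = [np.linalg.norm(gls_preprocess(X, K, g) - X_nas) / np.linalg.norm(X_nas)
            for g in gammas]
for g, d in zip(gammas, rel_diff):
    print(f"gamma = {g:8.0e}  relative difference GLS vs NAS = {d:.2e}")
```

On the interference subspace, GLS scales each direction by $(\gamma\sigma_i^2+1)^{-1/2}$, where $\sigma_i$ are the singular values of $K$, whereas NAS with $\beta = 1$ removes those directions entirely. The difference therefore decays roughly as $\gamma^{-1/2}$ rather than vanishing at any finite $\gamma$.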

2.4. Optical coefficient-based pre-processing

2.4.1. Empirical calibration dataset

Design 7 and pure component spectra were used as the empirical calibration dataset to build individual PLS-I models for predicting each chemical component in the prediction dataset. NAS/GLS pre-processing was applied separately for each component of the designed three-component mixtures. For each component, the interference signals ($K$) comprised the absorption spectra ($\mu_a(\lambda)$) of the other components and the scattering spectra ($\mu_s'(\lambda)$) of all the components.

For comparison with optical coefficient-based pre-processing, pure-component spectra were also used as vectors for NAS/GLS pre-processing. When the concentration of one component was to be predicted, the absorbance spectra of the remaining components in the mixture were treated as chemical interference in this pure-component spectra-based NAS/GLS pre-processing. Additionally, commonly employed pre-processing routines (e.g. SNV, detrending, derivatives and their combinations) were combined with optical coefficient-based pre-processing to investigate potential effects on prediction performance.
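The component-specific choice of interference vectors described above can be sketched as follows. This is a toy Python/NumPy illustration under explicit assumptions:

- the absorption and reduced scattering spectra are random, hypothetical stand-ins;
- the mixture spectra are noiseless and strictly additive, as assumed in Section 1.2;
- a single-component PLS1 written from scratch replaces the PLS_Toolbox routines.

The helper names `interference_matrix`, `nas` and `fit_pls1_one_component` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(7)
m = 115
components = ["APAP", "MCC", "LAC"]
# Hypothetical stand-ins for measured pure-component optical coefficients
mu_a = {c: np.abs(rng.normal(size=m)) for c in components}
mu_s_red = {c: np.abs(rng.normal(size=m)) + 5.0 for c in components}


def interference_matrix(analyte):
    """K for one analyte: mu_a of the other components plus mu_s' of all components."""
    rows = [mu_a[c] for c in components if c != analyte]
    rows += [mu_s_red[c] for c in components]
    return np.vstack(rows)


def nas(X, K):
    """Equation (1) with beta = 1."""
    return X @ (np.eye(X.shape[1]) - np.linalg.pinv(K) @ K)


def fit_pls1_one_component(X, y):
    """One-component PLS1 on mean-centred data; returns a prediction function."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc
    w /= np.linalg.norm(w)               # weight vector
    t = Xc @ w                           # scores
    q = (t @ yc) / (t @ t)               # y-loading
    return lambda X_new: y_mean + ((X_new - x_mean) @ w) * q


def simulate(n):
    """Toy additive spectra: absorbance terms plus random scattering baselines."""
    C = rng.dirichlet([5.0, 5.0, 5.0], size=n)          # rows sum to 1
    A = np.vstack([mu_a[c] for c in components])
    B = np.vstack([mu_s_red[c] for c in components])
    return C, C @ A + rng.normal(scale=0.1, size=(n, 3)) @ B


C_cal, X_cal = simulate(40)
C_test, X_test = simulate(10)
errors = {}
for j, analyte in enumerate(components):
    K = interference_matrix(analyte)
    model = fit_pls1_one_component(nas(X_cal, K), C_cal[:, j])
    errors[analyte] = np.max(np.abs(model(nas(X_test, K)) - C_test[:, j]))
print(errors)
```

Because the toy spectra are exactly additive and noiseless, each NAS-processed matrix has rank one, and a single latent variable recovers the concentrations to within rounding error. With real spectra, unmodelled interferences remain. This is why the two-stage approach in Section 1.3 follows the pre-processing with a PLS calibration.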

2.4.2. Synthetic calibration dataset

A modification of the reported protocol [10] was used to generate synthetic calibration spectra.

(1) A noise matrix (N) was generated by orthogonalizing the noise spectra (M) with respect to the pure-component reduced scattering coefficients (Q) using the following formula:

$$N = M\,\left(I - Q^{+}Q\right) \tag{3}$$

The noise spectra (M) were composed of high- and low-frequency (temporal) noise. The high-frequency part consisted of mean-centered spectra of a single run of Design 7, representing sampling variations. The low-frequency part consisted of mean-centered spectra across three replicate runs of Design 7, representing replicate/batch variations. The estimated ranks of M and N were calculated to be 3 and 105, respectively. The rank of M was expected to reflect the signals (with large eigenvalues) representative of the scattering properties of the powder mixtures. The rank of N was expected to reflect structured and unstructured noise (with small eigenvalues).

(2) Singular value decomposition was used to extract loadings (P1) from the noise matrix (N). These loadings were combined with the reduced scattering coefficients (Q) of the pure-component raw materials to form an interference matrix (L) representing all the potential interference vectors. The singular vectors (P2) of the combined matrix (L) were extracted for subsequent use. The number of extracted factors in P1 and P2 was set to 10 in order to include additional noise and interference factors and produce spectral effects similar to random 'white' noise. Singular value decomposition has also been used elsewhere to generate multivariate subspace models of interference [20].

$$N = T_1\,P_1' + E \tag{4}$$

$$L = T_2\,P_2' + E \tag{5}$$

where $T_1$ and $T_2$ are the scores for the noise and interference matrices, respectively, and $E$ represents the residual term after singular value decomposition.

(3) The singular interference vectors (P2) were projected onto the noise matrix (N) to generate their corresponding scores (T3). The variance (w1) of the scores for each interference vector was then calculated and used to weight the interference vectors.

$$T_3 = N\,P_2 \tag{6}$$

(4) The interference spectra (XINT) were generated by combining normally distributed random numbers (J) with the weighted interference vectors. In Equation (7), diag denotes arranging a vector of scalar values into a diagonal matrix.

$$X_{\mathrm{INT}} = J\,\mathrm{diag}(w_1)\,P_2' \tag{7}$$

(5) Mixture spectra (XMIX) were generated from randomly generated concentration values (H) and weighted pure-component spectra.

$$X_{\mathrm{MIX}} = H\,\mathrm{diag}(w_2)\,X_{\mathrm{pure\ component}} \tag{8}$$

For the randomly generated concentration values, the blending composition of Design 7 (0.15/0.283/0.567, w/w, for APAP/MCC/LAC) was used as the mean for the individual components. The relative standard deviation for all three components was set to 25%. This choice follows the report that relative standard deviations larger than 10% made no significant difference to calibration performance [10]. The blending composition of Design 7 was also used as the weight factor (w2) for the individual components.

(6) The interference spectra were added to the mixture spectra to generate the synthetic spectra (XSYN). In total, 1500 synthetic spectra were generated.

$$X_{\mathrm{SYN}} = X_{\mathrm{MIX}} + X_{\mathrm{INT}} \tag{9}$$
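Steps (1) to (6) can be condensed into one function. The Python/NumPy sketch below follows Equations (3) to (9) as written. In particular, it uses the score variances as w1, and it uses the Design 7 composition (APAP/MCC/LAC = 0.15/0.283/0.567) both as the mean of H and as w2. The inputs M, Q and X_pure are random, hypothetical placeholders rather than measured spectra, and the function name `synthetic_spectra` is illustrative.

```python
import numpy as np


def synthetic_spectra(M, Q, X_pure, composition, rsd=0.25, n_syn=1500,
                      n_factors=10, rng=None):
    """Generate synthetic calibration spectra following Equations (3)-(9)."""
    rng = np.random.default_rng() if rng is None else rng
    composition = np.asarray(composition, dtype=float)
    m = M.shape[1]

    # (1) Eq. (3): orthogonalize the noise spectra with respect to the mu_s' rows in Q
    N = M @ (np.eye(m) - np.linalg.pinv(Q) @ Q)

    # (2) Eqs. (4)-(5): loadings P1 of N, stacked with Q, then singular vectors P2 of L
    _, _, Vt1 = np.linalg.svd(N, full_matrices=False)
    P1 = Vt1[:n_factors].T                          # m x n_factors
    L = np.vstack([P1.T, Q])                        # interference matrix
    _, _, Vt2 = np.linalg.svd(L, full_matrices=False)
    P2 = Vt2[:n_factors].T                          # m x n_factors

    # (3) Eq. (6): scores of N on P2; their variances weight the vectors
    T3 = N @ P2
    w1 = T3.var(axis=0, ddof=1)

    # (4) Eq. (7): random interference spectra
    J = rng.standard_normal((n_syn, n_factors))
    X_int = J @ np.diag(w1) @ P2.T

    # (5) Eq. (8): random concentrations around the composition with the given RSD
    H = rng.normal(loc=composition, scale=rsd * composition,
                   size=(n_syn, composition.size))
    X_mix = H @ np.diag(composition) @ X_pure

    # (6) Eq. (9): synthetic spectra
    return X_mix + X_int, H


rng = np.random.default_rng(3)
m = 115
M = rng.normal(scale=0.01, size=(360, m))           # placeholder noise spectra
M -= M.mean(axis=0)
Q = np.abs(rng.normal(size=(3, m))) + 5.0           # placeholder mu_s' spectra
X_pure = np.abs(rng.normal(size=(3, m)))            # placeholder pure-component spectra
X_syn, H = synthetic_spectra(M, Q, X_pure, [0.15, 0.283, 0.567], rng=rng)
print(X_syn.shape, H.shape)
```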

The synthetic spectra were combined with real spectra collected from Design 7 to build separate PLS regression models for predicting the concentration of each component in the prediction dataset. The prediction performance of the synthetic calibration set was compared against that of the empirical calibration set. This comparison explored how well synthetic calibration predicts concentrations under practical conditions. The signal processing was performed as described in Section 2.4.1.


2.5. Software and data processing

All of the above calculations were executed using Matlab 7.1 (The MathWorks, Natick, MA) with PLS_Toolbox 3.1 (Eigenvector Research, Inc., Manson, WA), as well as a library of Matlab routines written in-house to support this work.

3. RESULTS AND DISCUSSION

3.1. Optical coefficients of pure component raw materials

The optical coefficients of the five raw materials (Figure 1) form the basis of the following observations.

First, absorption coefficients increased and reduced scattering coefficients decreased with increasing wavelength. These relationships agree well with previously reported results [12,21].

Second, the reduced scattering coefficient was inversely correlated with the absorption coefficient for each raw material. Scattering correction via the reduced scattering coefficient is therefore expected to be more attenuated in regions of absorption peaks than in other wavelength regions. This agrees with the argument that removal of scattering signals (representative of physical interferences) should be minimized (i.e. down-weighted) in spectral regions where concomitant chemical constituents absorb very strongly [3].

Third, because different particle sizes of APAP and MCC were used, the relationship between optical coefficients and particle size could be examined, and it matched expectations. Coarse APAP and PH 200 (median particle size = 180 μm) showed larger absorption coefficients and smaller reduced scattering coefficients than fine APAP and PH 101 (median particle size = 50 μm).

3.2. NAS versus GLS

Figure 2 compares NAS and GLS for removing interference signals in the APAP case. GLS pre-processed spectra (using the GLS formula reported by Martens et al. [7]) converged with NAS pre-processed spectra when $\gamma$ was increased beyond $10^6$. No further change in the pre-processed spectral shape was observed as $\gamma$ was increased beyond that. This similarity can be explained by the equivalence of $\beta$ in NAS and $\gamma$ in GLS, which Martens proved analytically [22]. The equivalence is described by the following equation:

$$\gamma = \left(\frac{1+\beta}{1-\beta}\right)^{2} - 1 \tag{10}$$

As indicated by Equation (10), γ becomes infinite when β equals 1. Because β = 1 was used in the NAS calculation, interference signals were suppressed completely (100%) via orthogonalization. Consequently, γ in GLS had to be made as large as possible for the suppression of interference signals to approach 100%. This explains why the GLS pre-processed spectra overlapped with the NAS pre-processed spectra once γ was increased to 10⁶ or higher.

Although unity is commonly used for β, other β values are needed to test the validity of Equation (10). From the equation, the working range of the two scalars is γ > −1 and 0 ≤ β ≤ 1. Therefore, multiple β values were randomly selected within this range, the corresponding γ values were calculated from the equation, and the pre-processing calculations were then performed according to Equations (1) and (2). The multiple pairs of β and γ were found to give identical $\mathbf{X}_{NAS}$ and $\mathbf{X}_{GLS}$ within machine precision, while the eigenvalues of the matrices $[\mathbf{I} - \beta \cdot \mathbf{K}^{+} \cdot \mathbf{K}]$ in Equation (1) and $(\gamma \cdot \mathbf{K}' \cdot \mathbf{K} + \mathbf{I})^{-1/2}$ in Equation (2) indicated that the eigenvalues converge when β is close to 1 and γ is close to infinity. Therefore, the analytically derived Equation (10) can be used for empirical purposes when relating NAS to GLS pre-processed spectra. However, the equation holds strictly only for β = 1, with γ approaching infinity.

Figure 1. Predicted absorption coefficients μa(λ) (left) and reduced scattering coefficients μs′(λ) (right) of pure-component raw materials.

www.interscience.wiley.com/journal/cem Copyright © 2010 John Wiley & Sons, Ltd. J. Chemometrics 2010; 24: 288–299

Z. Shi et al.

Prediction performance after NAS and GLS pre-processing is also compared in Table II. The original idea of NAS and GLS pre-processing was to suppress potential interferences, so that only one principal component (PC) is required to describe the variance left in the system, which ideally belongs only to the analyte of interest [7]. Thus, the prediction error for APAP concentration at the first PC was used to compare the prediction performance of NAS and GLS pre-processed spectra. As expected, γ had to be increased for GLS to reach a prediction performance (prediction error) equivalent to that of NAS. Based on these results, NAS and GLS can be used equivalently in both theoretical and practical senses. Given this equivalence, only NAS pre-processing is discussed in the remaining sections.
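The convergence of GLS towards NAS can be reproduced numerically. The sketch below assumes that Equation (1) has the form X_NAS = X[I − βK⁺K] and Equation (2) the form X_GLS = X(γK′K + I)^(−1/2), as implied by the matrices quoted above, with interference spectra stored as rows of K and sample spectra as rows of X. The random matrices are placeholders, not the data of this study, and the function names are hypothetical. With β = 1 the NAS operator is an orthogonal projector, and as γ grows the GLS operator shrinks the interference directions toward zero.

```python
import numpy as np

def nas_preprocess(X, K, beta=1.0):
    """Assumed form of Eq. (1): X_NAS = X [I - beta * K^+ K]."""
    projector = np.linalg.pinv(K) @ K                # projects onto the interference row space
    return X @ (np.eye(K.shape[1]) - beta * projector)

def gls_preprocess(X, K, gamma):
    """Assumed form of Eq. (2): X_GLS = X (gamma * K'K + I)^(-1/2)."""
    C = gamma * (K.T @ K) + np.eye(K.shape[1])      # symmetric, eigenvalues >= 1 in exact arithmetic
    w, V = np.linalg.eigh(C)
    return X @ (V @ np.diag(w ** -0.5) @ V.T)        # inverse matrix square root of C

rng = np.random.default_rng(0)
K = rng.normal(size=(2, 50))    # placeholder interference spectra (rows)
X = rng.normal(size=(10, 50))   # placeholder sample spectra (rows)

X_nas = nas_preprocess(X, K, beta=1.0)
for gamma in (0.0, 1.0, 1e2, 1e4, 1e6):
    gap = np.abs(gls_preprocess(X, K, gamma) - X_nas).max()
    print(f"gamma = {gamma:7.0e}   max |X_GLS - X_NAS| = {gap:.1e}")
```

With γ = 0 the GLS output equals the input spectra (as for Figure 2B), and for large γ the remaining gap shrinks roughly in proportion to γ^(−1/2), because each interference direction is scaled by (γλ + 1)^(−1/2), where λ is the corresponding eigenvalue of K′K.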

3.3. Optical coefficient-based pre-processing on empirical calibration dataset

Spectra of the individual components after different pre-processing routines are compared in Figure 3. Without pre-processing, the raw input spectra were identical for the three components (Figure 3D–3F). Using pure-component spectra as vectors in NAS pre-processing enhanced the spectral features (Figure 3G–3I) corresponding to the individual components (Figure 3A–3C). Compared with pre-processing based on pure-component spectra, using optical coefficients as vectors in NAS pre-processing offered the additional advantage of suppressing baseline noise (Figure 3J–3L), which was caused by physical interference when spectra of moving solids were collected during powder blending. For instance, the APAP peak around 1150 nm was practically indistinguishable from baseline noise in Figure 3G, whereas its signal-to-noise ratio was dramatically enhanced in Figure 3J. The attenuation of baseline noise can be attributed to the inclusion of reduced scattering coefficients (representative of physical interferences) during signal processing.

Including optical coefficients as interference vectors in NAS pre-processing improved prediction performance, as illustrated in Figure 4. Compared with raw spectra (dots) and spectra after pure-component-based pre-processing (pluses), the same or even lower prediction error was achieved with fewer PCs after using optical coefficients in NAS pre-processing (circles). Only one PC was required for the prediction of APAP (Figure 4A), indicating efficient mitigation of both absorption and scattering interference signals. However, more than one PC was required to reach the smallest prediction error for LAC and MCC (Figure 4B and 4C). There are two potential reasons for this. First, the main spectral features of LAC and MCC are highly correlated (Figure 3B and 3C). The general NAS/GLS requirement [7] that the analyte spectrum be linearly independent of the major interference spectra was therefore not strictly met, and the signals were not efficiently separated. Second, the multiplicative scattering effects might not be fully addressed by using optical coefficients in NAS/GLS pre-processing. NAS/GLS pre-processing is designed to remove interference signals under the assumption that a spectrum is a summed (additive) function of the analyte and interference signals; because the calculation is linear and additive, it might not fully remove non-additive, multiplicative scattering effects. Thus, the spectral features of APAP, which are distinct from the multiplicative scattering signals, allowed a one-PC PLS model, whereas the less distinctive spectral features of LAC and MCC required more than one PC.

Figure 2. Spectral comparison between NAS and GLS pre-processing. In total, 10 example spectra are shown to represent the total variance span of the pre-processed spectra. (A) Spectra after NAS pre-processing. (B–F) Spectra after GLS pre-processing with γ = 0, 1, 10², 10⁴ and 10⁶. When γ = 0, the spectra in plot B are equivalent to the raw spectra. The Y-axis in the plots is expressed in arbitrary units.

Table II. Comparison of prediction performance for APAP concentration after NAS and GLS pre-processing using only the first PC

Prediction error (%, w/w)ᵃ | GLS, γ = 0 | GLS, γ = 1 | GLS, γ = 10² | GLS, γ = 10⁴ | GLS, γ = 10⁶ | GLS, γ = 10⁸ | NAS
Median | 14.05 | 12.45 | 35.27 | 8.14 | 2.78 | 2.89 | 2.87
25th percentile | 6.58 | 5.04 | 24.34 | 5.46 | 1.24 | 1.23 | 1.23
75th percentile | 23.71 | 22.30 | 39.82 | 10.87 | 4.19 | 4.36 | 4.37

ᵃ Prediction error is the absolute value of the difference between predicted and nominal concentration.

Optical coefficient-based NIR calibration

Prediction performance was also compared among different generic spectral pre-treatments (i.e. SNV, detrending, derivatives and their combinations). SNV was the most efficient pre-treatment, reaching the lowest prediction error for all three components (data not shown). Therefore, a combination of SNV and optical coefficient-based NAS pre-processing was subsequently applied to investigate its effect on prediction performance. Given the similarity between MSC and SNV, and the purpose of the paper (optical coefficient-based signal processing), other commonly used pre-treatments such as MSC and EMSC were not tested. Combining EMSC with optical coefficient-based signal processing may be explored in the future.

Combining SNV with optical coefficient-based NAS pre-processing (Figure 3P–3R) reduced baseline noise and further enhanced the signal-to-noise ratio compared with SNV pre-processing alone (Figure 3M–3O). For instance, the APAP peak around 1150 nm had a higher signal-to-noise ratio in Figure 3P than in Figure 3M. Correspondingly, in Figure 4 the combination pre-treatment (triangles) yielded better prediction performance than SNV pre-processing alone (squares), especially for APAP and MCC. The enhanced performance of the combination pre-treatment may be explained from two perspectives. First, although it is based on an additive matrix operation, optical coefficient-based NAS pre-processing appears capable of addressing some of the multiplicative interferences that SNV cannot. This capacity might be attributed to the unique spectral features of μa(λ) and μs′(λ) mentioned in Section 3.1, and these results agree well with an earlier study [4]. Second, because NAS pre-processing is a linear, additive operation, optical coefficient-based NAS pre-processing might address additive interference and noise better than SNV. Whether the underlying mechanism of optical coefficient-based NAS pre-processing is an additive, a multiplicative or a combined correction requires further investigation.

Figure 3. Spectral comparison after different spectral pre-treatments for individual components. In total, 10 example spectra are shown to represent the total variance span of the pre-processed spectra. Rows represent spectra after different pre-processing routines; columns, from left to right, represent spectra used for concentration prediction of APAP, LAC and MCC. A–C are the pure-component spectra. D–F are mean-centered raw input spectra. G–I are spectra after using pure-component spectra in NAS pre-processing. J–L are spectra after using optical coefficients in NAS pre-processing. M–O are spectra after SNV pre-processing. P–R are spectra after SNV and optical coefficient-based NAS pre-processing. The Y-axis in the plots is expressed in arbitrary units.

Additionally, compared with optical coefficient-based NAS pre-processing alone (Figure 3J–3L and circles in Figure 4), the combination pre-treatment enhanced spectral features (e.g. the peak around 1350 nm in Figure 3P–3R) and improved prediction performance for all three components (triangles in Figure 4). This supports the earlier explanation that using optical coefficients in NAS/GLS pre-processing does not fully address multiplicative scattering interference. With the combination pre-treatment, the same number of PCs was used for LAC and MCC, and one more PC was required for APAP, compared with optical coefficient-based NAS pre-processing.
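The combination pre-treatment discussed in this section can be sketched as SNV followed by NAS with optical-coefficient interference vectors. This is a minimal sketch under stated assumptions, not the procedure used in this study: the ordering (SNV first), the hypothetical choice of interference rows (absorption coefficients of the other components plus the reduced scattering coefficients of all components), the use of the interference vectors without SNV, and all toy spectra are illustrative.

```python
import numpy as np

def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std

def nas_operator(K, beta=1.0):
    """I - beta * K^+ K; with beta = 1 this projects out the row space of K."""
    return np.eye(K.shape[1]) - beta * (np.linalg.pinv(K) @ K)

def interference_matrix(mu_a, mu_s_prime, analyte):
    """Hypothetical interference rows for one analyte: absorption coefficients
    of the other components plus reduced scattering coefficients of all components."""
    rows = [spec for name, spec in mu_a.items() if name != analyte]
    rows += list(mu_s_prime.values())
    return np.vstack(rows)

# Toy optical-coefficient spectra on a hypothetical wavelength grid (nm).
wl = np.linspace(1100.0, 1700.0, 121)

def band(center, width):
    return np.exp(-0.5 * ((wl - center) / width) ** 2)

mu_a = {"APAP": band(1150, 20), "LAC": band(1450, 40), "MCC": band(1480, 60)}
mu_s_prime = {name: (wl / 1000.0) ** -power
              for name, power in zip(mu_a, (0.5, 1.0, 1.5))}

# Toy mixture spectra: absorption bands plus random scattering baselines.
rng = np.random.default_rng(1)
conc = rng.dirichlet(np.ones(3), size=20)            # each row sums to 1
A = np.vstack(list(mu_a.values()))
S = np.vstack(list(mu_s_prime.values()))
X = conc @ A + rng.normal(scale=0.2, size=(20, 3)) @ S

K = interference_matrix(mu_a, mu_s_prime, "APAP")
P = nas_operator(K, beta=1.0)
X_combined = snv(X) @ P                               # SNV first, then optical coefficient-based NAS
print("max |X_combined K^T| =", np.abs(X_combined @ K.T).max())
print("APAP band norm after NAS:", np.linalg.norm(mu_a["APAP"] @ P))
```

The processed spectra are orthogonal to every interference row, while the APAP band, which is not spanned by those rows, keeps most of its magnitude.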

3.4. Optical coefficient-based pre-processing on synthetic calibration dataset

The procedure used here to generate synthetic calibration spectra differed slightly from the reported protocol [10]. Because reduced scattering coefficients representative of physical interferences were available, the reduced scattering coefficients of the pure-component raw materials were used directly as baseline noise vectors instead of polynomial functions.

Regarding prediction performance, optical coefficient-based NAS pre-processing, alone and combined with second derivative (pluses and triangles in Figure 5, respectively), enhanced the prediction performance of the synthetic calibration dataset, similar to what was observed for the empirical calibration dataset. In general, optical coefficient-based NAS pre-processing achieved the same or even lower prediction error with more parsimonious PLS models. Combining a second derivative (window size 11, polynomial order 2, derivative order 2) with optical coefficient-based NAS pre-processing resulted in models with reasonable predictions for LAC and MCC (triangles in Figure 5B and 5C). The utility of the second derivative for synthetic calibration has also been reported elsewhere [10]. Owing to the distinctive spectral features of APAP relative to LAC and MCC, only optical coefficient-based NAS pre-processing was required for APAP (pluses in Figure 5A).

In the RMSEP scree plots of models using optical coefficient-based NAS pre-processing, adding extra PCs increased the prediction error for APAP and MCC (Figure 5A and 5C). A possible reason is that, after suppression of the potential absorption and scattering interference signals, structured noise was expected to be the second-largest source of signal variation after the analyte of interest. Adding PCs that represent this structured noise risks degrading model performance.
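The synthetic-calibration idea described in this section, with reduced scattering coefficients taking the place of polynomial baseline vectors, and the second-derivative pre-treatment (window 11, polynomial order 2) can be sketched as follows. This is not the protocol of [10]: the concentration and weight distributions, the toy spectra and the function names are hypothetical. The Savitzky–Golay weights are derived from a least-squares polynomial fit with NumPy; `scipy.signal.savgol_filter(X, 11, 2, deriv=2, axis=1)` is an alternative that also handles the spectrum edges.

```python
import numpy as np

def synthetic_calibration(pure, baselines, n_samples, rng, weight_scale=0.1):
    """Toy synthetic spectra: random mixtures of pure-component spectra plus random
    multiples of baseline vectors (here, reduced scattering curves).
    The sampling distributions are illustrative assumptions."""
    conc = rng.dirichlet(np.ones(pure.shape[0]), size=n_samples)   # fractions summing to 1
    weights = rng.normal(scale=weight_scale, size=(n_samples, baselines.shape[0]))
    return conc @ pure + weights @ baselines, conc

def savgol_kernel(window=11, polyorder=2, deriv=2):
    """Savitzky-Golay weights for the derivative at the window centre (unit spacing)."""
    half = window // 2
    z = np.arange(-half, half + 1, dtype=float)
    vander = np.vander(z, polyorder + 1, increasing=True)   # columns z**0 .. z**polyorder
    fit = np.linalg.pinv(vander)                            # least-squares polynomial fit
    return fit[deriv] * np.prod(np.arange(1, deriv + 1))    # deriv! * coefficient of z**deriv

def second_derivative(X, window=11, polyorder=2):
    """Row-wise SG second derivative; 'valid' mode drops window // 2 points at each end."""
    h = savgol_kernel(window, polyorder, deriv=2)
    return np.array([np.correlate(row, h, mode="valid") for row in np.atleast_2d(X)])

rng = np.random.default_rng(2)
wl = np.linspace(1100.0, 1700.0, 121)                       # hypothetical grid (nm)
pure = np.vstack([np.exp(-0.5 * ((wl - c) / w) ** 2)
                  for c, w in [(1150, 20), (1450, 40), (1480, 60)]])
scatter = np.vstack([(wl / 1000.0) ** -p for p in (0.5, 1.0, 1.5)])
X_syn, C_syn = synthetic_calibration(pure, scatter, 50, rng)
D2 = second_derivative(X_syn)
print(X_syn.shape, D2.shape)
```

Because the window fits a quadratic exactly, the second derivative of any straight-line baseline is zero; in the combined pre-treatment the derivative output would then be passed to NAS with optical-coefficient interference vectors.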

3.5. Practical significance

In this study, one of the blending compositions (Design 7) was arbitrarily chosen and included in the calibration dataset, where it was combined with pure-component spectra or synthetic spectra to form an example of a reduced calibration dataset. In future work, it would be worthwhile to choose other blending compositions and investigate how the choice of this single blending composition affects the performance of the optical coefficient-based calibration approach.

For the empirical calibration dataset, multiple pure-component spectra were combined with spectra collected from Design 7 to serve as the calibration spectra. In a practical industrial setting, multiple pure-component spectra are not always available. Thus, a single pure-component spectrum of each component was also investigated and compared with the results obtained using multiple pure-component spectra. As Figure 6 shows, the RMSEP obtained with a single pure-component spectrum (solid lines) was quite similar to that obtained with multiple pure-component spectra (triangles), especially for LAC and MCC. The only difference was found for APAP, where a single pure-component spectrum required one more PC than multiple pure-component spectra. Overall, no significant difference in RMSEP at the required number of PCs was found between single and multiple pure-component spectra at the 0.05 significance level. These results indicate the potential of optical coefficients in NAS pre-processing to build NIR calibration models based on only one mixture composition (i.e. one blending composition) and at least one pure-component spectrum for each component.

Figure 4. Scree plots of RMSEP for the empirical calibration dataset. A–C represent results for APAP, LAC and MCC, respectively. Inside each plot, dots represent spectra without pre-processing; pluses represent spectra after using pure-component spectra in NAS pre-processing; circles represent spectra after using optical coefficients in NAS pre-processing; squares represent spectra after SNV pre-processing; and triangles represent spectra after SNV and optical coefficient-based NAS pre-processing. RMSEP at zero PC was calculated as the square root of the mean squared difference between the individual nominal y values in the prediction dataset and the mean nominal y value in the calibration dataset.

Figure 6. Comparison of prediction performance between using single and multiple pure-component spectra in the empirical calibration dataset. A–C represent results for APAP, LAC and MCC, respectively. Triangles represent results after using multiple pure-component spectra. Solid lines and error bars represent results after repeatedly using a single pure-component spectrum for each component. RMSEP at zero PC was calculated as the square root of the mean squared difference between the individual nominal y values in the prediction dataset and the mean nominal y value in the calibration dataset.

Figure 5. Scree plots of RMSEP for the synthetic calibration dataset. A–C represent results for APAP, LAC and MCC, respectively. Inside each plot, dots represent results without pre-processing; pluses represent results after using optical coefficients in NAS pre-processing; circles represent results after second-derivative pre-processing; and triangles represent results after second derivative and optical coefficient-based NAS pre-processing. RMSEP at zero PC was calculated as the square root of the mean squared difference between the individual nominal y values in the prediction dataset and the mean nominal y value in the calibration dataset.

For the synthetic calibration dataset, the synthetic spectra were expected to be simpler than the empirical spectra, because the virtual calibration spectra are linear combinations of interference signals and pure-component spectra. Thus, the commonly used error criteria, such as RMSEC and RMSECV, were no longer suitable for selecting the optimal number of PCs [10], since in the present case they carry information only from virtual samples. A better metric, based on real samples, is needed to determine the optimal number of PCs for synthetic calibration. Figure of merit (FOM) calculations have been proposed to indicate the required number of PCs for a synthetic calibration dataset [10], and the same approach was taken here. Calculating FOMs requires an estimate of the instrumental noise [23]; the mean standard deviation of predicted concentration across three replicate blending runs of Design 7 was therefore used to represent the instrumental noise. The signal-to-noise ratio is used here as an example to illustrate the usefulness of FOMs for determining the optimal number of PCs. The equations used are as follows:

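$$\widehat{\mathbf{NAS}}_i = \mathbf{b}\left(\mathbf{b}^{T}\mathbf{b}\right)^{-1}\mathbf{b}^{T}\mathbf{x}_i \qquad (11)$$

$$\widehat{S/N}_i = \frac{a_1 \cdot \left\lVert \widehat{\mathbf{NAS}}_i \right\rVert + a_0}{\delta_r} \qquad (12)$$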

where $\mathbf{x}_i$ is a sample spectrum, $\mathbf{b}$ is the regression vector for an individual component in the synthetic calibration dataset, and $\widehat{\mathbf{NAS}}_i$ is the net analyte signal vector for the sample spectrum. $\lVert \widehat{\mathbf{NAS}}_i \rVert$ is a scalar representation of the sample's $\widehat{\mathbf{NAS}}_i$. To translate the NAS value into a signal-to-noise ratio, linear regression was performed between measured concentration and the univariate NAS values to estimate the scale ($a_1$) and offset ($a_0$) coefficients that transform the NAS value into units of concentration. $\delta_r$ is the instrumental noise expressed in concentration units, so the signal-to-noise ratio is unitless. Finally, the signal-to-noise ratio was reported as the mean of the S/N values of all samples under consideration.

As can be seen in Figure 7D–7F, the signal-to-noise ratio based on the calibration dataset (circles) showed profiles corresponding to the prediction performance for all three components (Figure 7A–7C), indicating its suitability for determining the necessary number of PCs. The advantage of using an FOM to identify the number of PCs lies in its ability to weigh the increase in accuracy (i.e. enhanced chemical signals) against the reduction of instability (i.e. unmodeled noise) simultaneously, whereas the error profiles monitor only the unmodeled noise. Incorporating enhanced chemical signals into the selection of the number of PCs matches the goal of using optical coefficients in NAS/GLS pre-processing: to reduce interference signals and ultimately capture the greatest fraction of the net analyte signal embedded in the spectra.

Prediction performance using the empirical and the synthetic calibration datasets is compared in Figure 8 and Table III. Although the empirical calibration dataset showed slightly better prediction performance than the synthetic calibration, both calibration datasets reached roughly the same prediction performance. In the calibration datasets used in this paper, both empirical and synthetic, only one blending composition was used in addition to the pure-component spectra. Given the availability of these two types of data in industrial settings, the optical coefficient-based signal processing reported here is expected to be useful for routine model calibration as well as model updating. First, because it can remove the potential absorption and scattering signals of available interferents, it can simplify empirical model calibration without the need to include interference variations. Second, by removing known interference signals, it can help PLS account for any sample-independent variation (e.g. instrument drift) and thereby enhance model robustness. Third, periodic updates of the absorption and scattering interferences followed by NAS/GLS pre-processing are also expected to enhance model robustness by reducing the prediction error caused by variation in the absorption and scattering properties of the pure-component raw materials, such as their moisture content, particle size and particle shape. The idea of periodic updating agrees well with the fundamental basis of the efficient calibration method-development flow path [10].

Figure 7. Comparison between RMSEP and signal-to-noise ratio (S/N) in the synthetic calibration dataset. A–C represent RMSEP for APAP, LAC and MCC, respectively. RMSEP at zero PC was calculated as the square root of the mean squared difference between the individual nominal y values in the prediction dataset and the mean nominal y value in the calibration dataset. D, E and F represent S/N for APAP, LAC and MCC, respectively. Inside D, E and F, circles are S/N for the synthetic calibration dataset, while pluses are S/N for the empirical prediction dataset.

Additionally, optical coefficients can also be determined for intermediates and finished pharmaceutical products. Where pure-component raw materials are not available, optical coefficients determined for in-process materials or finished products can still serve as vectors in NAS/GLS pre-processing to mitigate potential interferences based on prior knowledge. For instance, the reduced scattering coefficient of a pharmaceutical compact can be treated as an interference and suppressed via NAS/GLS pre-processing when the chemical information of the compact is desired.

Finally, the advantages of optical coefficient-based signal processing demonstrated above are expected to apply in industries beyond pharmaceuticals, such as agriculture and food. For example, the determination of optical coefficients of apples has already been reported [21,24]. Variation in agricultural products, e.g. physical variation caused by production area and year, is normally considered more intense than in pharmaceutical samples. Combining the available optical coefficients with these efficient calibration approaches is expected to be useful both for model calibration without having to include variation in these physical interferences and for routine model updating, by removing interference signals in the same manner as for pharmaceutical products. In this way, the robustness of spectroscopy-based multivariate models can be enhanced.
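The signal-to-noise figure of merit of Equations (11) and (12), used in this section to choose the number of PCs for the synthetic calibration, can be sketched as below. The sketch assumes that b is the regression vector of a PLS model with a given number of PCs (repeating the calculation for models with 1, 2, ... PCs would give an S/N profile such as Figure 7D–7F), that δ_r is computed with the sample standard deviation (ddof = 1, an assumption), and that the toy data stand in for real spectra. Note that the norm in Equation (11) discards the sign of the projection.

```python
import numpy as np

def nas_norm(X, b):
    """Eq. (11): ||NAS_i|| with NAS_i = b (b^T b)^-1 b^T x_i for each row x_i of X."""
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    H = b @ np.linalg.inv(b.T @ b) @ b.T       # projector onto the regression vector
    return np.linalg.norm(X @ H, axis=1)        # H is symmetric, so row i equals (H x_i)^T

def instrument_noise(pred_runs):
    """delta_r: mean over samples of the standard deviation of predicted
    concentration across replicate runs; pred_runs has shape (n_runs, n_samples)."""
    return np.std(pred_runs, axis=0, ddof=1).mean()

def mean_signal_to_noise(X, y_measured, b, delta_r):
    """Eq. (12), averaged over samples: (a1 * ||NAS_i|| + a0) / delta_r."""
    nas = nas_norm(X, b)
    a1, a0 = np.polyfit(nas, y_measured, deg=1)  # scale and offset to concentration units
    return np.mean((a1 * nas + a0) / delta_r)

rng = np.random.default_rng(3)
s = rng.normal(size=40)                          # hypothetical analyte direction
y = np.linspace(10.0, 30.0, 9)                   # hypothetical concentrations (%, w/w)
X = np.outer(y, s) + rng.normal(scale=0.01, size=(9, 40))
replicates = y + rng.normal(scale=0.5, size=(3, 9))   # three hypothetical replicate runs
delta_r = instrument_noise(replicates)
print(f"delta_r = {delta_r:.3f}, mean S/N = {mean_signal_to_noise(X, y, s, delta_r):.1f}")
```

In the noise-free limit, where each spectrum is exactly its concentration times the analyte direction, the regression maps ‖NAS_i‖ back to the concentration itself, so the mean S/N reduces to the mean concentration divided by δ_r.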

4. CONCLUSIONS

NAS and GLS pre-processing were shown to be equivalent both theoretically and practically. Optical coefficient-based signal processing simplified NIR multivariate calibration (both empirical and synthetic) using limited calibration samples, allowing parsimonious multivariate models that reached the same or even lower prediction error. Optical coefficient-based signal processing is therefore expected to benefit both model calibration and model updating in routine NIR spectroscopic analysis.

Table III. Prediction performance comparison between empirical and synthetic calibration

Calibration | Component | PC | Median prediction error (%, w/w)ᵃ | 25th percentile (%, w/w)ᵃ | 75th percentile (%, w/w)ᵃ
Empirical | APAP | 2 | 1.02 | 0.41 | 2.37
Empirical | LAC | 4 | 2.18 | 0.94 | 4.45
Empirical | MCC | 2 | 3.19 | 1.42 | 5.20
Synthetic | APAP | 1 | 2.46 | 1.13 | 4.29
Synthetic | LAC | 4 | 2.11 | 0.91 | 4.34
Synthetic | MCC | 1 | 3.62 | 1.70 | 5.35

ᵃ Prediction error is the absolute value of the difference between predicted and nominal concentration.

Figure 8. Predicted versus observed concentration for the prediction dataset using empirical (left) and synthetic (right) calibration spectra. Circles represent the 50th percentile, while the upper and lower asterisks represent the 25th and 75th percentiles, respectively. Inside each plot, the results from bottom to top are for APAP, LAC and MCC, respectively. The unity line is shown in black. The results for LAC and MCC are offset for clarity.

Acknowledgements

The authors would like to acknowledge Ms Ryanne Palermo for her skillful scientific and grammatical editing, which was helpful in preparing this manuscript.

REFERENCES

1. Martens H, Naes T (eds). Multivariate Calibration. Wiley: New York, 1989.

2. Sáiz-Abajo MJ, Mevik BH, Segtnan VH, Næs T. Ensemble methods and data augmentation by noise addition applied to the analysis of spectroscopic data. Anal. Chim. Acta 2005; 533: 147–159.

3. Martens H, Nielsen JP, Engelsen SB. Light scattering and light absorbance separated by extended multiplicative signal correction. Application to near-infrared transmission analysis of powder mixtures. Anal. Chem. 2003; 75: 394–404.

4. Shi ZQ, Anderson CA. Scattering orthogonalization of near-infrared spectra for analysis of pharmaceutical tablets. Anal. Chem. 2009; 81: 1389–1396.

5. Goicoechea HC, Olivieri AC. A comparison of orthogonal signal correction and net analyte preprocessing methods. Theoretical and experimental study. Chemom. Intell. Lab. Syst. 2001; 56: 73–81.

6. Lorber A. Error propagation and figures of merit for quantification by solving matrix equations. Anal. Chem. 1986; 58: 1167–1172.

7. Martens H, Høy M, Wise BM, Bro R, Brockhoff P. Pre-whitening of data by covariance-weighted pre-processing. J. Chemom. 2003; 17: 153–165.

8. Marbach R. On Wiener filtering and the physics behind statistical modeling. J. Biomed. Opt. 2002; 7: 130–147.

9. Marbach R. A new method for multivariate calibration. J. Near Infrared Spectrosc. 2005; 13: 241–254.

10. Cogdill RP, Herkert T, Anderson CA, Drennen JK. Synthetic calibration for efficient method development: analysis of tablet API concentration by near-infrared spectroscopy. J. Pharm. Innov. 2007; 2: 93–105.

11. Maruo K, Oota T, Tsurugi M, Nakagawa T, Arimoto H, Tamura M, Ozaki Y, Yamada Y. New methodology to obtain a calibration model for noninvasive near-infrared blood glucose monitoring. Appl. Spectrosc. 2006; 60: 441–449.

12. Abrahamsson C, Lowgren A, Stromdahl B, Svensson T, Andersson-Engels S, Johansson J, Folestad S. Scatter correction of transmission near-infrared spectra by photon migration data: quantitative analysis of solids. Appl. Spectrosc. 2005; 59: 1381–1387.

13. Mobley J, Vo-Dinh T. Optical properties of tissue. In Biomedical Photonics Handbook, Vo-Dinh T (ed.). CRC Press: Boca Raton, 2003; 2.1–2.75.

14. Welch AJ, van Gemert MJC, Star WM, Wilson BC. Definitions and overview of tissue optics. In Optical-Thermal Response of Laser-Irradiated Tissue, Welch AJ, van Gemert MJC (eds). Plenum Press: New York, 1995; 15–46.

15. Burger T, Fricke J, Kuhn J. NIR radiative transfer investigations to characterise pharmaceutical powders and their mixtures. J. Near Infrared Spectrosc. 1998; 6: 33–40.

16. Sun Z, Torrance S, McNeil-Watson FK, Sevick-Muraca EM. Application of frequency domain photon migration to particle size analysis and monitoring of pharmaceutical powders. Anal. Chem. 2003; 75: 1720–1725.

17. Martens H, Stark E. Extended multiplicative signal correction and spectral interference subtraction: new preprocessing methods for near infrared spectroscopy. J. Pharm. Biomed. Anal. 1991; 9: 625–635.

18. Cogdill RP, Short SM, Forcht R, Shi ZQ, Shen YC, Taday PF, Andersson-Engelsen S, Drennen JK. An efficient method-development strategy for quantitative chemical imaging using terahertz pulse spectroscopy. J. Pharm. Innov. 2006; 1: 63–75.

19. Shi Z, Cogdill RP, Short SM, Anderson CA. Process characterization of powder blending by near-infrared spectroscopy: blend end-points and beyond. J. Pharm. Biomed. Anal. 2008; 47: 738–745.

20. Kohler A, Sule-Suso J, Sockalingum GD, Tobin M, Bahrami F, Yang Y, Pijanka J, Dumas P, Cotte M, van Pittius DG, Parkes G, Martens H. Estimating and correcting Mie scattering in synchrotron-based microscopic Fourier transform infrared spectra by extended multiplicative signal correction. Appl. Spectrosc. 2008; 62: 259–266.

21. Chauchard F, Roger JM, Bellon-Maurel V, Abrahamsson C, Andersson-Engels S, Svanberg S. MADSTRESS: a linear approach for evaluating scattering and absorption coefficients of samples measured using time-resolved spectroscopy in reflection. Appl. Spectrosc. 2005; 59: 53–59.

22. Martens H. Personal communication with the author. 10 October 2008.

23. Short SM, Cogdill RP, Anderson CA. Determination of figures of merit for near-infrared and Raman spectrometry by net analyte signal analysis for a 4-component solid dosage system. AAPS PharmSciTech 2007; 8: 96.

24. Chauchard F, Roussel S, Roger JM, Bellon-Maurel V, Abrahamsson C, Svensson T, Andersson-Engels S, Svanberg S. Least-squares support vector machines modelization for time-resolved spectroscopy. Appl. Opt. 2005; 44: 1–17.
