Anal Letters 2007


CHEMOMETRICS

Selection of Wavelength Range and Number of Factors to be Used in the PLS Treatment of Spectrophotometric Data

M. Luz Luis, Jose M. G. Fraga, Francisco Jimenez,

Ana I. Jimenez, Oscar M. Hernandez, and Juan J. Arias

Departamento de Química Analítica, Nutrición y Bromatología, Facultad de Química, Universidad de La Laguna, La Laguna, Tenerife, Spain

Abstract: A procedure is proposed for selecting the wavelength range and number of factors to be used in partial least squares calibration; it involves calculating the prediction residual sum of squares (PRESS) under different conditions. The best model is that with the minimum PRESS value that is significantly different from that of the corresponding model with fewer factors. The ability of the proposed method to minimize errors in partial least squares (PLS) prediction is demonstrated by applying it to the resolution of phenytoin (DPH) and phenobarbital (PB) binary mixtures with errors less than 2.8%; the results are compared with those obtained using another wavelength selection procedure. The ensuing method, which was validated by high performance liquid chromatography (HPLC), also gives good results with real samples (pharmaceutical preparations).

Keywords: Wavelength selection, number of factors selection, partial least squares

Received 9 March 2006; accepted 23 May 2006

The authors wish to acknowledge the financial support of this work by the Education, Culture and Sports Council of the Canary Government, research project PI2002/014.

Address correspondence to Juan J. Arias, Departamento de Química Analítica, Nutrición y Bromatología, Facultad de Química, Universidad de La Laguna, E-38071, La Laguna, Tenerife, Spain. Tel.: 34-922-318-076; Fax: 34-922-318-003; E-mail: [email protected]

Analytical Letters, 40: 257–280, 2007
Copyright © Taylor & Francis Group, LLC
ISSN 0003-2719 print/1532-236X online
DOI: 10.1080/00032710600867598

INTRODUCTION

A whole absorption spectrum provides a more accurate description of the sample concerned than a single measurement does; there is, however, some redundancy, as not all wavelengths contribute unique information. Multivariate calibration methods can deal with this shortcoming of the spectral data, but the improvement of results in these cases has raised the need for alternative methods to identify those measured variables of actual relevance with a view to constructing the calibration model.

Partial least squares (PLS) calibration methodology resolves the matrix of

independent variables (X, absorbances at different wavelengths) and that of

dependent variables (Y, analyte concentrations in the samples) into their

respective latent variables or factors. The number of independent variables

usually exceeds that of dependent variables, so the number of latent

variables or PLS factors that could in principle be used is at least as large
as the rank of the independent variable matrix. However, all those factors

are rarely used because the independent variables are not always free of

noise and some factors may contain noise and introduce it into the model. If

the two matrices are linearly related, there are no interferences, and the independent variable matrix is free of noise, then the number of latent variables or

factors required to describe the model equals the rank of the dependent

variables. Such a number must be increased in the presence of a nonlinear

relation or some interference.

As with any calibration method, the selection of the independent variables

to be used plays a crucial role in the development of a quality PLS model.

Thus, although most PLS applications use the entire wavelength range (i.e., the full-spectrum method), a number of alternative methods for wavelength selection that exclude those associated with high noise, nonlinearity, or irrelevant information have been developed (Osborne et al. 1997; Goicoechea and

Olivieri 1999; Centner et al. 1996; Jouan-Rimbaud et al. 1995; Espinosa-

Mansilla et al. 2002; Leardi et al. 2002; Jiang et al. 2002; Galvao et al.

2001; Kompany-Zareh and Mirzaei 2004; Gourvenec et al. 2004; Abdollahi

and Bagheri 2004; Ghasemi et al. 2003). However, many researchers find

these methods difficult to apply because most of them involve complex algorithms and mathematical developments. Therefore, new methods that can be

easily implemented are always welcome.

In this work, we developed a simple, easy-to-implement method for the simultaneous selection of the wavelengths and number of factors to be used in the PLS regression of spectrophotometric data. The method involves constructing, for each analyte and from the same calibration set, as many PLS-1 models with f factors as there are possible wavelength ranges. Each model is then used to calculate the predicted residual sum of squares (PRESS) for the validation set with a variable number of factors from 1 to f. For each analyte and f value, the model exhibiting the lowest PRESS that is significantly different from that of the model corresponding to the same interval, but with (f − 1) factors, is chosen. The PRESS values of the models thus selected for each number of factors are then compared, and the one with the lowest PRESS that is significantly different from the value for the model with one less factor is chosen.


The proposed method for the simultaneous selection of the wavelength range and number of factors was validated by applying it to the determination of two antiepileptics [viz., phenytoin (DPH) and phenobarbital (PB), the spectra for which in the UV region are completely overlapped] in pharmaceutical preparations. The results thus obtained are compared with those provided by another method for the selection of wavelengths and number of factors, the Bw coefficients method (Garrido et al. 1995); this method was chosen because it is also simple and easy to implement.

A method has been developed concerning spectrophotometric data

(Garrido et al. 1995) that selects those wavelengths exhibiting the highest B

coefficients in the PLS regression of the full spectrum. To ensure an identical variance in all variables, the data must previously be mean-centered and standardized (i.e., weighted with the reciprocal of the standard

deviation); the regression coefficients thus obtained are designated Bw.

Matrix Bw can be calculated directly from the PLS loadings for the model with the optimum number of factors:

$$B_w = W (P^{T} W)^{-1} Q^{T}$$

where W is the X weight-loading matrix, P the X-loading matrix, and Q the Y-loading matrix. The centered, nonstandardized X matrix is then used to construct various models involving a small number of variables corresponding to a given level of Bw coefficients, and each model is subjected to a leave-one-out cross-validation process to determine the number of wavelengths to be included in the final model.
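As a rough illustration of how regression coefficients of the form W (PᵀW)⁻¹ Qᵀ arise, the following is a minimal pure-Python sketch of a NIPALS-style PLS-1 fit. It is not the Unscrambler implementation used by the authors; the function names (`solve`, `pls1_coefficients`) and the toy data are hypothetical, and the input is assumed to be already mean-centered (and, for Bw, standardized).

```python
def solve(M, v):
    """Solve the small linear system M x = v by Gaussian elimination."""
    n = len(v)
    A = [M[r][:] + [v[r]] for r in range(n)]            # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]                   # partial pivoting
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= factor * A[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                        # back substitution
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


def pls1_coefficients(X, y, n_factors):
    """Return b = W (P^T W)^-1 q for centered X (samples x variables) and y."""
    n, m = len(X), len(X[0])
    E = [row[:] for row in X]                             # X residuals
    f = y[:]                                              # y residuals
    W, P, q = [], [], []                                  # one entry per factor
    for _ in range(n_factors):
        w = [sum(E[i][k] * f[i] for i in range(n)) for k in range(m)]
        norm = sum(val * val for val in w) ** 0.5
        w = [val / norm for val in w]                     # weight vector
        t = [sum(E[i][k] * w[k] for k in range(m)) for i in range(n)]  # scores
        tt = sum(val * val for val in t)
        p = [sum(E[i][k] * t[i] for i in range(n)) / tt for k in range(m)]
        qa = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][k] - t[i] * p[k] for k in range(m)] for i in range(n)]
        f = [f[i] - qa * t[i] for i in range(n)]
        W.append(w); P.append(p); q.append(qa)
    a = n_factors
    PtW = [[sum(P[r][k] * W[c][k] for k in range(m)) for c in range(a)]
           for r in range(a)]
    coef = solve(PtW, q)                                  # (P^T W)^-1 q
    return [sum(W[c][k] * coef[c] for c in range(a)) for k in range(m)]


# Toy, already-centered data (hypothetical): y = 2*x1 - x2 exactly.
X = [[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0], [0.0, 0.0]]
y = [2.0, -3.0, 1.0, 0.0]
b = pls1_coefficients(X, y, n_factors=2)
```

With as many factors as X has independent columns, the PLS-1 coefficients coincide with the ordinary least-squares ones, which is why the toy example recovers the coefficients 2 and −1 up to rounding.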

SELECTION METHODS

Best Significant PRESS Method

The process was conducted using macros in the software Unscrambler

(Computer-Aided Modelling A/S, 1993). The steps leading to the best model are as follows:

1. The spectra for the calibration and validation samples are recorded, and the calibration matrix (C), which is used to develop the models, and the validation matrix (O), which allows the best model to be identified, are constructed. Both matrices encompass i variables (wavelengths $n_1, n_2, \ldots, n_i$) and j samples.

2. A model $M^{n_1,n_i}$ is then constructed by applying PLS-1 to the calibration matrix $C^{n_1,n_i}$ [i × j]; the model is used to predict the validation samples with 1, 2, ..., h, ..., f factors, and the $\mathrm{PRESS}_h^{n_1,n_i}$ value for each prediction $P_h^{n_1,n_i}$ made with each number h of factors is calculated.

3. The variable with the highest subscript, $n_i$, is removed from $C^{n_1,n_i}$ to obtain a calibration matrix $C^{n_1,n_{i-1}}$ [(i − 1) × j] that is used to construct a model $M^{n_1,n_{i-1}}$ by PLS-1, and the corresponding predictions, $P_h^{n_1,n_{i-1}}$, with their respective PRESS values for each number h of factors ($\mathrm{PRESS}_h^{n_1,n_{i-1}}$), are calculated.

4. Step 3 is repeated as many times as needed to obtain matrix $C^{n_1,n_3}$ [3 × j], $M^{n_1,n_3}$, $P_h^{n_1,n_3}$, and $\mathrm{PRESS}_h^{n_1,n_3}$. In this way, every possible model having $n_1$ as its initial wavelength and at least three variables (i − 2 models) is constructed and the PRESS for its predictions with 1, 2, ..., h, ..., f factors calculated.

5. The variable with the lowest subscript ($n_1$) is removed from the initial matrix, $C^{n_1,n_i}$ [i × j], to obtain $C^{n_2,n_i}$ [(i − 1) × j], from which model $M^{n_2,n_i}$ is constructed, and the corresponding predictions, $P_h^{n_2,n_i}$, and the $\mathrm{PRESS}_h^{n_2,n_i}$ value for each number of factors tested are calculated.

6. The variable with the highest subscript ($n_i$) is removed from $C^{n_2,n_i}$ [(i − 1) × j] to obtain $C^{n_2,n_{i-1}}$ [(i − 2) × j], and a model $M^{n_2,n_{i-1}}$ is constructed from which $P_h^{n_2,n_{i-1}}$ and then $\mathrm{PRESS}_h^{n_2,n_{i-1}}$ are obtained.

7. Step 6 is repeated until matrix $C^{n_2,n_4}$ [3 × j], $M^{n_2,n_4}$, $P_h^{n_2,n_4}$, and $\mathrm{PRESS}_h^{n_2,n_4}$ are obtained. In this way, every possible model having $n_2$ as its initial wavelength and at least three variables (i − 3 models) is constructed.

8. Steps 5 through 7 are repeated until the following calibration matrix series are obtained in each iteration:

$$
\begin{array}{l}
C^{n_1,n_i};\ C^{n_1,n_{i-1}};\ C^{n_1,n_{i-2}};\ \ldots;\ C^{n_1,n_5};\ C^{n_1,n_4};\ C^{n_1,n_3} \\
C^{n_2,n_i};\ C^{n_2,n_{i-1}};\ C^{n_2,n_{i-2}};\ \ldots;\ C^{n_2,n_5};\ C^{n_2,n_4} \\
C^{n_3,n_i};\ C^{n_3,n_{i-1}};\ C^{n_3,n_{i-2}};\ \ldots;\ C^{n_3,n_5} \\
\vdots \\
C^{n_{i-4},n_i};\ C^{n_{i-4},n_{i-1}};\ C^{n_{i-4},n_{i-2}} \\
C^{n_{i-3},n_i};\ C^{n_{i-3},n_{i-1}} \\
C^{n_{i-2},n_i}
\end{array}
$$

with their respective models ($M^{n_x,n_y}$), predictions ($P_h^{n_x,n_y}$), and $\mathrm{PRESS}_h^{n_x,n_y}$ values, where $y \geq x + 2$.

9. The optimum range for each number of factors, h, is that with the minimum $\mathrm{PRESS}_h$ value significantly smaller than $\mathrm{PRESS}_{h-1}$ for the same model with one less factor. This is usually checked by calculating the statistic F, the tabulated value of which for a probability of 0.75 allows significant differences between models to be accurately detected according to the methodology of Haaland and Thomas (Haaland and Thomas 1988). If the model with the lowest PRESS does not fulfill the previous requirement, then the model with the next lowest PRESS is chosen and the same criterion applied. This process must be carried out for each of the 1, 2, ..., h, ..., f factors tested. Because it is a restrictive criterion, it avoids overfitting.

10. The previous step yields f models (one per number of factors), the best of which is identified by applying the previous criterion again. In fact, the most suitable model will be that possessing the lowest PRESS, provided that its calculated F value reflects significant differences with the PRESS corresponding to one less factor.
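The enumeration and selection logic of steps 1 through 10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Unscrambler macros: the function names are hypothetical, the significance test is assumed to take the form PRESS(h − 1)/PRESS(h) > F_crit, and the value `F_CRIT = 1.4` is a placeholder rather than a tabulated value. The PRESS figures are a subset of the DPH rows of Table 1.

```python
def enumerate_windows(n_vars, min_vars=3):
    """All contiguous ranges (x, y), 1-based, holding at least min_vars variables."""
    return [(x, y)
            for x in range(1, n_vars + 1)
            for y in range(x + min_vars - 1, n_vars + 1)]


def best_window_for_factor(press, h, f_crit):
    """Step 9: lowest PRESS_h whose ratio to PRESS_(h-1) exceeds f_crit."""
    for window in sorted(press, key=lambda w: press[w][h]):
        if h == 1 or press[window][h - 1] / press[window][h] > f_crit:
            return window
    return None


def best_model(press, max_factors, f_crit):
    """Step 10: compare the per-factor winners and keep the lowest PRESS."""
    winners = []
    for h in range(1, max_factors + 1):
        window = best_window_for_factor(press, h, f_crit)
        if window is not None:
            winners.append((press[window][h], window, h))
    return min(winners)


# PRESS for DPH by wavelength range (nm) and number of factors, from Table 1.
press_dph = {
    (202, 242): {1: 60.84, 2: 7.70e-3, 3: 9.50e-3, 4: 6.05e-2},
    (200, 242): {1: 60.58, 2: 1.41e-2, 3: 1.35e-2, 4: 5.45e-2},
    (200, 246): {1: 61.58, 2: 3.69e-2, 3: 8.10e-3, 4: 4.61e-2},
    (200, 248): {1: 61.97, 2: 5.46e-2, 3: 8.30e-3, 4: 4.20e-2},
    (210, 240): {1: 58.01, 2: 9.52e-2, 3: 2.93e-1, 4: 1.73e-2},
    (212, 240): {1: 46.75, 2: 5.85, 3: 2.21e-2, 4: 2.20e-2},
}

F_CRIT = 1.4  # hypothetical placeholder for the tabulated F value
value, window, n_factors = best_model(press_dph, max_factors=4, f_crit=F_CRIT)
n_models = len(enumerate_windows(41))  # e.g. 41 variables per spectrum
```

For i variables the enumeration yields (i − 1)(i − 2)/2 ranges, in line with the i − 2, i − 3, ... models counted in steps 4 and 7. On this subset the sketch settles on the 202–242 nm range with two factors.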

Bw Coefficients Method

In spectroscopic applications, b coefficients (B) are those that relate the set of independent variables (X) to that of dependent variables (Y). These coefficients cannot be used as such to identify the most relevant wavelengths in the model, since a large absolute coefficient may indicate either that a variable is significant or merely that the variable has small absolute values and a high variability. This shortcoming can be avoided by standardizing data (i.e., by weighting the X variables with the reciprocal of the standard deviation, which causes all variables to have the same variance). In this way, a large absolute value of the bw coefficients will inevitably correspond to a major X variable so that the specific variables to be used to construct the model can be readily identified.

To find the optimum number of wavelengths, one must examine every

model constructed from wavelengths with bw coefficients not exceeding

specific, preset values. One additional problem, always present in PLS regression, arises from the need to determine the optimum number of factors, which is calculated from the root mean square error of cross-validation (RMSECV):

$$\mathrm{RMSECV} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2}$$

where $\hat{y}_i$ is the predicted concentration for the ith calibration sample and $y_i$ its actual concentration. The optimum number of factors can be decided on by using two different criteria, namely:

1. Criterion I, which adopts the number of factors corresponding to the first local RMSECV minimum in a plot of RMSECV against the number of factors included in the model.

2. Criterion II, which adopts the number of factors preceding the first local RMSECV minimum, provided that its RMSECV is not significantly greater than the minimum itself. Otherwise, the local minimum is chosen. This is the criterion used by Unscrambler software (Computer-Aided Modelling A/S, 1993).
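The RMSECV expression and Criterion I are simple enough to sketch directly. The snippet below is illustrative only: the function names and the RMSECV curve are hypothetical, and Criterion II is omitted because the significance test it relies on is not detailed here.

```python
import math


def rmsecv(predicted, actual):
    """Root mean square error between cross-validated predictions and actual values."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def first_local_minimum(rmsecv_by_factors):
    """Criterion I: rmsecv_by_factors[k] is the RMSECV obtained with k + 1 factors."""
    for k in range(len(rmsecv_by_factors) - 1):
        if rmsecv_by_factors[k] < rmsecv_by_factors[k + 1]:
            return k + 1
    return len(rmsecv_by_factors)


# Hypothetical RMSECV curve for 1 to 4 factors (not data from this work).
curve = [0.90, 0.05, 0.06, 0.04]
n_factors = first_local_minimum(curve)
```

Note that the first local minimum (two factors here) need not be the global one; the curve above dips again at four factors, which Criterion I deliberately ignores.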

Depending on the particular bw levels chosen, sets with different numbers

of variables are obtained. Each set is used to construct a model and plot the




resulting RMSECV values against the corresponding numbers of wavelengths.

A minimum for each criterion used to select the optimum number of factors

will be obtained, and the number resulting in the lowest RMSECV is

chosen as optimal.

EXPERIMENTAL SECTION

Apparatus

Spectra were recorded on a Hewlett-Packard HP 8452A diode array spectrophotometer (Avondale, PA, USA) interfaced to a Vectra ES computer, also

from Hewlett-Packard. Quartz cuvettes of 1 cm light path and 4 mL inner

volume were used throughout.

Chromatographic runs were conducted on a Waters 600E liquid chromatograph (Waters; Milford, MA, USA) with a 20-µL loop volume equipped with a Model 996 diode array detector from the same manufacturer. Both were operated via an NEC Power Mate 433 computer. All determinations were carried out using a NovaPack C18 column (3.9 × 150 mm) packed with particles 4 µm in size.

pH measurements were made with a Crison digital pH meter (Crison; Barcelona, Spain) furnished with a glass-saturated calomel double electrode.

A Selecta ultrasonic bath (R. Espinar; Sevilla, Spain) and an AND ER-

182 analytical balance (Pacisa, S.A.; Madrid, Spain) were also used.

Software

The software used included Millennium version 2.10 (Millipore Corp.;

Milford, MA, USA; 1993) and MS-DOS-UV-Visible HP 8452A to control

the chromatographic system and spectrophotometer, respectively, and

Unscrambler II version 5.0 (Computer-Aided Modelling A/S; Trondheim, Norway; 1993) to perform the partial least squares regressions.

Reagents

The reagents employed included 100 µg mL⁻¹ standard solutions of DPH and PB, which were prepared by direct weighing of the respective pure commercial products (Sigma; Madrid, Spain) and dissolution in ethanol; a 0.2 M NH4Cl/NH3 buffer of pH 10.0; and a 10-mM TRIS buffer of pH 7.0. All reagents were analytical-grade, and aqueous solutions were prepared in Millipore/Milli-Q (Millipore; Bedford, MA, USA) deionized water.




Procedure

Calibration, Validation, and Prediction Sets

The calibration, validation, and prediction sets were constructed by using a factorial design consisting of two analytes at four levels (4²). Concentrations in the calibration set were within the linear range for each analyte (viz., 1.97–9.85 µg mL⁻¹ for DPH and 2.02–10.08 µg mL⁻¹ for PB). The lowest and highest values for the validation and prediction sets were greater than the minimum and lower than the maximum values, respectively, used for the calibration set.
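A two-analyte, four-level full factorial design amounts to taking every pairing of the levels, which gives 16 mixtures per set. A minimal sketch follows; the variable names are hypothetical, and the levels used are the prediction-set concentrations listed in Table 2.

```python
from itertools import product

# Concentration levels of the prediction set (Table 2).
dph_levels = [2.17, 4.73, 7.09, 9.66]
pb_levels = [2.22, 4.84, 7.26, 9.88]

# Full factorial design: every DPH level combined with every PB level.
design = list(product(dph_levels, pb_levels))
```

Each tuple in `design` is one (DPH, PB) mixture, ordered with the PB level varying fastest.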

Determination of DPH and PB in Synthetic Mixtures

Into 25 mL calibrated flasks were placed 5 mL of NH4Cl/NH3 buffer and the volumes of DPH and PB solutions required to obtain final concentrations over the ranges 1.97–9.85 and 2.02–10.08 µg mL⁻¹, respectively, plus the volume of ethanol needed to make the mixture 20% (v/v) in it. The solutions were made to volume with deionized water, and their absorption spectra between 200 and 300 nm, every 2 nm, were recorded against an identically prepared blank containing no analyte.
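The volumes involved follow from a simple dilution balance. The sketch below is a worked example under assumptions that are not stated explicitly in the text: the stock solutions are taken at a nominal 100 µg mL⁻¹, and the ethanol in which the stocks are dissolved is assumed to count toward the 20% (v/v) total. All names are hypothetical.

```python
STOCK_UG_PER_ML = 100.0   # nominal stock concentration (assumed)
FLASK_ML = 25.0           # final volume of the calibrated flask
ETHANOL_FRACTION = 0.20   # target ethanol content, v/v


def stock_volume_ml(target_ug_per_ml):
    """Volume of stock needed to reach the target concentration in the flask."""
    return target_ug_per_ml * FLASK_ML / STOCK_UG_PER_ML


def extra_ethanol_ml(dph_target, pb_target):
    """Ethanol still to be added once both ethanolic stocks are in the flask."""
    total_ethanol = ETHANOL_FRACTION * FLASK_ML
    return total_ethanol - stock_volume_ml(dph_target) - stock_volume_ml(pb_target)


v_high = stock_volume_ml(9.85)        # about 2.46 mL of DPH stock
extra = extra_ethanol_ml(1.97, 2.02)  # about 4.00 mL of additional ethanol
```

Under these assumptions the two stocks at their highest levels already supply close to the full 5 mL of ethanol, which is consistent with the 20% (v/v) figure.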

Determination of DPH and PB in Pharmaceutical Preparations

Using PLS Regression

DPH and PB were determined in Epilantin® (Otsuka Pharmaceutical, S.A.;

Barcelona, Spain) by weighing, grinding, and homogenizing 10 tablets of

the pharmaceutical, and using six samples of 40 mg each for analysis.

The drugs were extracted with 100 mL of ethanol in an ultrasonic bath for

20 min and diluted to 250 mL with deionized water. Finally, an aliquot of

3 mL of the resulting solution was analyzed as described in Determination

of DPH and PB in Synthetic Mixtures.

Determination of DPH and PB in Pharmaceutical Preparations

Using HPLC

The previous solutions of the two drugs were filtered through a polypropylene membrane of 0.4 µm pore size prior to injection into the chromatograph. Determinations were carried out by using a NovaPack C18 (3.9 × 150 mm) column and a 60:40 mixture of methanol and 10 mM TRIS buffer of pH 7.0 that was circulated at a flow rate of 1 mL min⁻¹. The drugs were quantified spectrophotometrically from the areas under the peaks at 254 nm.




RESULTS AND DISCUSSION

As can be seen in Fig. 1, the absorption spectra for DPH and PB are completely

overlapped throughout the UV region. Both drugs exhibit a strong peak at

212 nm; in addition, PB gives a maximum at 240 nm and DPH a shoulder at

the same wavelength. DPH and PB absorb UV light at wavelengths up to 270

and 280 nm, respectively. Its strong spectral overlap makes this mixture an appropriate model to assess the performance of the proposed method for selecting

the wavelength range and number of factors to be used in the determination.

The applicability of the method was assessed by applying it to three different sample sets (viz., a calibration, a validation, and a prediction set). The calibration set was used to construct the PLS models for the different possible wavelength ranges ($M^{n_x,n_y}$). The validation set was employed to select the prediction resulting in the fewest errors (viz., the lowest significant $\mathrm{PRESS}_h^{n_x,n_y}$ value) among those obtained with the models provided by the calibration set and a variable number of factors ($P_h^{n_x,n_y}$). Finally, the prediction set, which consisted of samples not included in the calibration or validation set, was used to assess the predictive ability of the method.

These three sets, prepared as described in the section titled Experimental,

were used to select the optimum wavelength range and number of factors for

applying the PLS-1 method to the determination of each analyte in the sample.

Figure 1. Absorption spectra of DPH and PB solutions.




variables (200–242 and 200–244 nm) but did not improve on the previous

results. The PRESS value obtained with two factors was significantly lower

than that obtained with one factor in the three cases.

As can be seen from Fig. 2b, the inclusion of a third factor decreased

PRESS in the region of high values (226–232 nm) and increased it in the

region of low values (valleys). The surface in the region where a peak was

observed with two factors lost smoothness as a result of the system being

overfitted in most of these ranges. As can be seen from Table 1, the ranges

providing the lowest PRESS values with two and three factors were virtually

identical; however, the PRESS values obtained with two factors in the ranges

Table 1. Minimum values of PRESS and corresponding wavelength ranges for the different numbers of factors assayed

Range (nm)     Number of factors
               1        2               3               4

DPH
Best models using 2 factors
202–242        60.84    7.70 × 10⁻³ *   9.50 × 10⁻³     6.05 × 10⁻²
200–242        60.58    1.41 × 10⁻²     1.35 × 10⁻²     5.45 × 10⁻²
200–244        61.58    1.86 × 10⁻²     8.60 × 10⁻³     5.37 × 10⁻²
Best models using 3 factors
200–246        61.58    3.69 × 10⁻²     8.10 × 10⁻³ *   4.61 × 10⁻²
200–248        61.97    5.46 × 10⁻²     8.30 × 10⁻³     4.20 × 10⁻²
200–244        61.34    1.86 × 10⁻²     8.60 × 10⁻³     5.37 × 10⁻²
Best models using 4 factors
210–240        58.01    9.52 × 10⁻²     2.93 × 10⁻¹     1.73 × 10⁻² *
212–240        46.75    5.85 × 10⁰      2.21 × 10⁻²     2.20 × 10⁻²
212–232        52.60    8.91 × 10⁻²     3.55 × 10⁻¹     2.29 × 10⁻²

PB
Best models using 2 factors
232–238        26.01    1.68 × 10⁻² *   2.01 × 10⁻¹     2.31 × 10⁻¹
228–242        25.75    1.70 × 10⁻²     2.51 × 10⁻¹     9.62 × 10⁻²
226–244        25.95    1.74 × 10⁻²     1.12 × 10⁻¹     1.75 × 10⁻²
Best models using 3 factors
214–280        42.61    5.75 × 10⁻²     1.34 × 10⁻² *   7.63 × 10⁻²
216–276        36.94    4.65 × 10⁻²     1.36 × 10⁻²     1.94 × 10⁻¹
218–278        32.19    4.17 × 10⁻²     1.36 × 10⁻²     1.53 × 10⁻¹
Best models using 4 factors
208–278        57.75    8.93 × 10⁻²     4.69 × 10⁻²     1.23 × 10⁻² *
208–276        57.75    8.74 × 10⁻²     4.69 × 10⁻²     1.26 × 10⁻²
208–274        57.75    8.74 × 10⁻²     4.69 × 10⁻²     1.26 × 10⁻²

Values marked with an asterisk (bold in the original) show the minimum value of PRESS for each number of factors.




corresponding to the minima obtained with three factors were significantly

higher.

Finally, the use of four factors yielded a less uniform surface (Fig. 2c). This may be the result of virtually all ranges being overfitted and of factors strongly dependent on noise or some other source not related to the analyte concentration being incorporated into the process. As can clearly be seen from Table 1, the lowest PRESS value obtained with four factors was more than twice that provided by two or three factors. Also, the second-best PRESS obtained with four factors (2.20 × 10⁻²) was not even significantly lower than that corresponding to the same range with three factors (2.21 × 10⁻²).

If only the best range for each factor is considered, then the 202–242 nm range with two factors was that yielding the lowest PRESS in the determination of DPH.

Determination of PB

The determination of PB in the presence of DPH was performed by constructing a calibration matrix similar to that used for DPH, except that the wavelengths were those in the range 200–280 nm (viz., the region where the analyte absorbs UV light). By following a procedure similar to that for DPH, matrix $C^{200,280}$ [41 × 16] was used to obtain all possible wavelength ranges from 200 to 280 nm with at least three variables. The models thus obtained were used to calculate the concentrations for the prediction set with 1, 2, 3, and 4 factors, as well as the corresponding PRESS values.

As with DPH, a single factor provided models that accounted for less than

95% of the initial variance and were poorly predictive. This suggests that the

use of a single factor resulted in underfitting; in fact, the best PRESS value thus obtained was 7.81 (wavelength range 250–256 nm).

Fig. 3a shows the PRESS values obtained with two factors. As can be seen,

most (the only exceptions being those for the ranges containing wavelengths below 230 or above 236 nm) were less than 0.5. By exception, the ranges starting at 254 or 256 nm and ending in 276 or 278 nm had PRESS values around 0.020, even though their wavelengths were in the region of high PRESS levels. The ranges providing the best PRESS values with two factors were those with

their centers at 235 nm and including 4, 8, or 10 variables. As can be seen

from Table 1, the PRESS values obtained with two factors differed by three

orders of magnitude from those provided by a single factor. Fig. 3b shows

the PRESS values obtained with three factors. Most were lower than those

obtained with two factors, except in the ranges with the lowest PRESS

levels (viz., those with a central value of 235 nm and spanning 4, 6, 8, or

10 nm), where PRESS was greater than in the other ranges as the likely

result of the system with three factors being overdimensioned. The ranges

with the lowest PRESS values in the prediction of the PB concentration with

three factors were quite wide; in fact, they spanned nearly the whole spectral

zone studied. The best PRESS values were those for the 214–280, 216–276, and 218–278 nm ranges (1.34 × 10⁻², 1.36 × 10⁻², and 1.36 × 10⁻², respectively). As can be seen from Table 1, the three values are significantly smaller than those obtained for the same ranges with only two factors.

The best PRESS values obtained with four factors (1.23 × 10⁻², 1.26 × 10⁻², and 1.26 × 10⁻²) were slightly lower than those obtained with three factors (1.34 × 10⁻², 1.36 × 10⁻², and 1.36 × 10⁻²). However, the shape of the PRESS surface (Fig. 3c) suggests that the models were overdimensioned. The lowest PRESS values obtained with four factors were virtually identical and corresponded to very similar ranges (viz., 208–278, 208–276, and 208–274 nm) (see Table 1).

A comparison of the PRESS values for the ranges yielding the best results

with each number of factors reveals that PRESS decreased with an increasing

number of factors. The differences, however, were all insignificant. We

therefore adopted two factors and the wavelength range 232–238 nm as the

best choice for the determination of PB.

Figure 3. PRESS obtained for PB as a function of initial and final wavelengths by

using (a) two factors, (b) three factors, and (c) four factors.




Calibration and Validation Sets

Fig. 4 shows the results obtained in the determination of DPH and PB by

applying PLS-1 methodology to the calibration and validation sets used to

identify the optimum conditions (viz., prediction with two factors over the

ranges 202–242 and 232–238 nm for DPH and PB, respectively). As can

be seen, the results were quite satisfactory (the errors were all less than

2.0%).

With calibration by full cross-validation, the chosen models accounted for

99.95% and 99.99% of the initial variance of validation in the concentration

matrix for DPH and PB, respectively.

Prediction Set

The predictive ability of a model cannot be assessed by using any samples

employed to establish its optimum parameter values (i.e., those in the calibration set) or determine the optimum wavelength range and number of

factors (i.e., those in the validation set). For this reason, we prepared a third

sample set, the prediction set, the concentrations of which were determined

under the previously described optimum conditions with a view to assessing

the predictive ability of the proposed model.

As can be seen from Table 2, the errors in the concentrations of both analytes were all less than 3.0%. Also, there was good correlation between the amounts added and found, as reflected in the slope and intercept obtained in the mutual confidence region test (α = 0.05) (Mandel and Linnig 1957), which suggests the absence of systematic errors in the prediction of either analyte.
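The PRESS and relative-error figures reported for the prediction set can be reproduced from the added and found concentrations of Table 2 (significant PRESS method). The short sketch below does so; the function and variable names are hypothetical, and PRESS is taken as the sum of squared differences between found and added concentrations.

```python
# Added and found concentrations for the prediction set (Table 2).
added_dph = [2.17] * 4 + [4.73] * 4 + [7.09] * 4 + [9.66] * 4
added_pb = [2.22, 4.84, 7.26, 9.88] * 4
found_dph = [2.17, 2.11, 2.22, 2.18, 4.66, 4.69, 4.62, 4.67,
             7.07, 7.09, 7.07, 7.19, 9.64, 9.63, 9.59, 9.54]
found_pb = [2.23, 4.86, 7.26, 9.86, 2.25, 4.90, 7.25, 9.89,
            2.22, 4.84, 7.29, 9.90, 2.22, 4.87, 7.27, 9.86]


def press(found, added):
    """Prediction residual sum of squares."""
    return sum((f - a) ** 2 for f, a in zip(found, added))


def relative_errors(found, added):
    """Relative error of each prediction, in percent."""
    return [100.0 * (f - a) / a for f, a in zip(found, added)]


press_dph = press(found_dph, added_dph)   # close to 5.98e-2
press_pb = press(found_pb, added_pb)      # close to 8.30e-3
worst = max(abs(e) for e in relative_errors(found_dph, added_dph)
            + relative_errors(found_pb, added_pb))
```

The two sums match the PRESS row of Table 2, and the largest absolute relative error stays below 3%.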

One question raised by the proposed procedure for establishing the

optimum wavelength range and number of factors for PLS determinations

based on spectrophotometric data is to what extent the data matrix used

to calculate PRESS influences the final result. This was studied by

repeating the previously described procedure but using the prediction set

instead of the validation set in the calculations. Fig. 5 shows the PRESS values obtained in the different wavelength ranges used to predict the concentrations of the validation matrix with two factors. As can be seen, the

resulting surface was highly similar to that for the PRESS of the validation

matrix (Fig. 3a).

The lowest PRESS values found in this way were 7.40 × 10⁻³ and 8.30 × 10⁻³, and corresponded to the ranges 234–240 and 232–238 nm, respectively. These ranges, which contain four variables each, are very similar; also, the latter (232–238 nm) coincides with the previously obtained lowest-PRESS range. Consequently, the performance of the

proposed method is not influenced by the specific matrix used to optimize

the wavelength range and number of factors since the use of two different

matrices to calculate PRESS yields virtually the same results.




Figure 4. Relative errors obtained for the determination of DPH and PB in (a) calibration set and (b) validation set, by using PLS-1 with two factors in the ranges 202–242 nm and 232–238 nm, respectively.




Table 2. Results obtained for the determination of DPH and PB in the prediction set by using PLS-1 with two factors in the ranges 202–242 nm and 232–238 nm, respectively, and by the bw coefficients method

Added (µg/mL)    Significant PRESS method                              bw coefficients method
                 Found (µg/mL)                Relative error (%)       Found (µg/mL)
DPH     PB       DPH           PB             DPH      PB              DPH           PB

2.17    2.22     2.17 ± 0.09   2.23 ± 0.03    0.00     0.45            2.13 ± 0.02   2.2
2.17    4.84     2.11 ± 0.04   4.86 ± 0.04    −2.76    0.41            2.2 ± 0.02    4.
2.17    7.26     2.22 ± 0.04   7.26 ± 0.05    2.30     0.00            2.18 ± 0.05   7.2
2.17    9.88     2.18 ± 0.06   9.86 ± 0.02    0.46     −0.20           2.11 ± 0.03   9.
4.73    2.22     4.66 ± 0.03   2.25 ± 0.02    −1.48    1.35            4.69 ± 0.01   2.2
4.73    4.84     4.69 ± 0.03   4.90 ± 0.03    −0.85    1.24            4.76 ± 0.03   4.
4.73    7.26     4.62 ± 0.04   7.25 ± 0.01    −2.33    −0.14           4.72 ± 0.04   7.2
4.73    9.88     4.67 ± 0.04   9.89 ± 0.02    −1.27    0.10            4.79 ± 0.04   9.
7.09    2.22     7.07 ± 0.04   2.22 ± 0.02    −0.28    0.00            7.11 ± 0.04   2.
7.09    4.84     7.09 ± 0.05   4.84 ± 0.02    0.00     0.00            7.08 ± 0.04   4.
7.09    7.26     7.07 ± 0.03   7.29 ± 0.02    −0.28    0.41            7.12 ± 0.02   7.
7.09    9.88     7.19 ± 0.04   9.90 ± 0.03    1.41     0.20            7.19 ± 0.06   9.9
9.66    2.22     9.64 ± 0.05   2.22 ± 0.01    −0.21    0.00            9.63 ± 0.02   2.2
9.66    4.84     9.63 ± 0.04   4.87 ± 0.03    −0.31    0.62            9.71 ± 0.05   4.
9.66    7.26     9.59 ± 0.04   7.27 ± 0.03    −0.72    0.14            9.88 ± 0.03   7.2
9.66    9.88     9.54 ± 0.09   9.86 ± 0.02    1.24     −0.20           9.45 ± 0.04   9.

PRESS            5.98e-2       8.30e-3                                 1.20e-1       1.9



To confirm the benefits of wavelength selection, Table 3 shows the

PRESS obtained in the determination of DPH and PB by using PLS-1 for

the models built with the full-spectrum method (200–270 nm for DPH and 200–280 nm for PB) and with the best ranges obtained from the proposed method for wavelength selection (202–242 nm for DPH and 232–238 nm for PB). It can be seen that PRESS values diminish substantially in the case of the best model found by the significant PRESS method, so wavelength selection improves the results of multivariate calibration methods.

Likewise, Fig. 6 shows that relative errors in the determination of DPH and PB

are lower when wavelength range is optimized.

Figure 5. PRESS obtained for determination of PB in the prediction set.

Table 3. PRESS obtained for the determination of DPH and PB by using PLS-1

with two factors in the ranges selected by the significant PRESS method and in the

full-spectrum method

Set            DPH                            PB
               202–242 nm     200–270 nm      232–238 nm     200–280 nm

Calibration    2.50 × 10⁻²    1.81 × 10⁻¹     6.90 × 10⁻³    5.00 × 10⁻²
Validation     7.70 × 10⁻³    8.97 × 10⁻²     1.68 × 10⁻²    1.41 × 10⁻¹
Prediction     5.98 × 10⁻²    2.03 × 10⁻¹     8.30 × 10⁻³    5.54 × 10⁻²




Figure 6. Relative errors obtained in the determination of (a) DPH and (b) PB in the

prediction set by using PLS-1 with two factors in the ranges selected by the significant

PRESS method and in the full-spectrum method.




Bw Coefficients Method

This method was implemented by conducting a PLS-1 regression on the calibration matrix for each analyte, prepared as described in the previous sections, following centering and standardization of data. The resulting models were validated by full cross-validation.

Fig. 7 shows the RMSECV for each analyte as a function of the number of

factors used in the prediction. As can be seen, the optimum number of factors

was two, both with the criterion of the first local RMSECV minimum and with

that of the RMSECV value not significantly different from the first local

minimum.

Fig. 8 shows the variation of the bw coefficients for each analyte as a function of wavelength with two factors. The curves exhibit two distinct portions above and below about 226 nm. In the first portion, both DPH and PB exhibit high bw coefficients (positive for the former and negative for the latter) at 212–214 nm, which is the zone most strongly influenced by DPH. Above 226 nm, bw coefficients are high (negative for DPH and positive for PB) at 242–244 nm, where PB exhibits an absorption

maximum. Also, some regions are under no appreciable effect from either

analyte. Such is the case at wavelengths above 266 nm, where both analytes

Figure 7. RMSECV as a function of number of factors obtained by PLS-1 using

standardized data of the whole spectra for DPH and PB.




absorb very weakly; near 226 nm, where bw coefficients change sign and are

thus very small; and below 206 nm, where the absorption spectra for both

analytes are fully overlapped.

The next step involved selecting wavelengths in terms of the corresponding bw value: the greater the coefficient was, the stronger was the significance of the wavelength concerned. To determine the number of variables to be considered, a series of sets consisting of wavelengths with bw values not exceeding a preset level were constructed. Table 4 shows the bw values

used to include or exclude variables in the models, as well as the number of

variables contained in each.
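The construction of these variable sets can be sketched as a simple filter. The sketch assumes that a wavelength is kept when the absolute value of its bw coefficient is at or above the limit, which is consistent with Table 4 (a limit of 0 keeps all 41 variables, and higher limits keep fewer). The function name `select_wavelengths` and the coefficient values are hypothetical; only the limit 2.831 is taken from Table 4.

```python
def select_wavelengths(wavelengths, bw, limit):
    """Keep the wavelengths whose |bw| is at least `limit` (assumed rule)."""
    return [wl for wl, b in zip(wavelengths, bw) if abs(b) >= limit]

# Hypothetical coefficients at a 2 nm step (not the paper's data).
wavelengths = [238, 240, 242, 244, 246, 248]
bw_example = [0.9, 2.9, 3.1, 3.0, 2.85, 1.2]

# Raising the limit shrinks the variable set, as in Table 4.
for limit in (0.0, 1.0, 2.831):
    subset = select_wavelengths(wavelengths, bw_example, limit)
    print(limit, len(subset), subset)
```

Using the absolute value matters here because the coefficients of the two analytes have opposite signs in the informative regions.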

Then the PLS-1 models for the previous sets were constructed from centered, nonstandardized data, and the respective RMSECV values were calculated. The model yielding the lowest RMSECV was chosen as optimal, and the number of variables contained in it was adopted as the most suitable. Fig. 9 shows the RMSECV values obtained with the different models tested in the determination of DPH and PB. As can be seen, the optimum number of factors was two with all models and both analytes, whatever the number of variables used. Also, RMSECV increased slightly with increasing number of variables, particularly from 20 to 24 variables for DPH and from 24 to 28 for PB. The lowest RMSECV in the determination of DPH, 0.0312, was obtained with four variables; the ensuing model accounted for 99.0% of the initial variance in the data. On the other hand, the lowest RMSECV for PB, 0.0195, was obtained with eight variables, and the resulting model accounted for 99.4% of the initial variance in the data. Note that the wavelengths contained in the optimum model for determining DPH (240–246 nm) were included in that for the determination of PB (236–248 nm).

Figure 8. bw coefficients of DPH and PB.

To compare this variable selection method with the proposed one, the best model provided by the bw coefficients method was used to determine the concentrations of the samples in the prediction set used to assess the performance of the best significant PRESS method. The results, shown in Table 2, reveal that the predictions were reasonably accurate, with relative errors less than 3.0% and 1.5% for DPH and PB, respectively. Also, the slopes and intercepts obtained with the mutual confidence region method reveal no signs of systematic errors in the quantitation of either analyte.

As can also be seen from Table 2, the PRESS values obtained with the best significant PRESS method (viz., 5.98 × 10⁻² for DPH and 8.30 × 10⁻³ for PB) are smaller than those provided by the bw coefficients method (viz., 1.20 × 10⁻¹ for DPH and 1.96 × 10⁻² for PB). Consequently, the proposed method provides better results.
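For reference, PRESS is the sum of squared differences between predicted and reference concentrations over the prediction set. The short sketch below computes it for a hypothetical set of concentrations and then forms the ratio of the PRESS values quoted above; the function name `press` and the example concentrations are assumptions, while the four PRESS values are those given in the text.

```python
def press(predicted, reference):
    """Prediction residual sum of squares: sum of squared prediction errors."""
    return sum((p - r) ** 2 for p, r in zip(predicted, reference))

# Hypothetical concentrations in the same arbitrary units (not the paper's data).
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.9]
print(press(predicted, reference))

# Ratios of the PRESS values quoted in the text
# (bw coefficients method divided by best significant PRESS method).
ratio_dph = 1.20e-1 / 5.98e-2
ratio_pb = 1.96e-2 / 8.30e-3
print(round(ratio_dph, 2), round(ratio_pb, 2))
```

Both ratios are somewhat above two, so on this prediction set the bw coefficients method gives roughly twice the PRESS of the proposed method for each analyte.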

Determination of DPH and PB in Commercial Pharmaceutical

Preparations

Once the optimum wavelength range and number of factors for determining DPH and PB with the lowest possible errors were identified, the two analytes were quantified simultaneously in the commercial pharmaceutical preparation Epilantin®. The predictive ability of the proposed model (viz., PLS-1 with two factors and the wavelength ranges 202–242 nm for DPH and 232–238 nm for PB) was assessed by quantifying the samples using HPLC as the reference method. As can be seen from Table 5, there was good consistency between the contents stated and those found with both PLS and HPLC.

Table 4. Limit values for bw coefficients and corresponding number of variables in the construction of the models

Model   DPH     PB      Number of variables
1       0       0       41
2       0.130   0.135   36
3       0.263   0.254   32
4       0.508   0.528   28
5       0.915   0.699   24
6       1.358   0.953   20
7       1.793   1.354   16
8       2.206   1.729   12
9       2.566   2.553   8
10      2.831   3.230   4

Figure 9. RMSECV as a function of the number of variables for (a) DPH and (b) PB. (The number of factors used is indicated on the corresponding curve.)

CONCLUSIONS

In this work, we developed a simple, easily implemented method for the simultaneous selection of the optimum continuous wavelength range and number of factors as a function of the prediction errors obtained by applying PLS to a sample set with a view to resolving mixtures of analytes. The use of the selected wavelength range improved the results relative to those obtained with the full-spectrum method. The method was used for the spectrophotometric determination of two antiepileptics (viz., phenytoin and phenobarbital) in both synthetic mixtures and a pharmaceutical preparation, and was found to provide better results than other existing, easy-to-implement methods.


Table 5. Determination of DPH and PB in pharmaceuticals (100 mg of DPH and 50 mg of PB declared per tablet)

Method     DPH (mg) ± SD    PB (mg) ± SD
PLS (a)    99.04 ± 0.88     49.61 ± 0.35
HPLC (b)   99.71 ± 0.92     49.88 ± 0.28

(a) Value provided by Unscrambler. (b) Value obtained from the straight-line model.


