8/11/2019 Anal Letters 2007
CHEMOMETRICS
Selection of Wavelength Range and Number of Factors to be Used in the PLS Treatment of Spectrophotometric Data
M. Luz Luis, Jose M. G. Fraga, Francisco Jimenez, Ana I. Jimenez, Oscar M. Hernandez, and Juan J. Arias
Departamento de Química Analítica, Nutrición y Bromatología, Facultad de Química, Universidad de La Laguna, La Laguna, Tenerife, Spain
Abstract: A procedure is proposed for selecting the wavelength range and number of factors to be used in partial least squares (PLS) calibration; it involves calculating the prediction residual sum of squares (PRESS) under different conditions. The best model is taken to be the one with the minimum PRESS value that is significantly lower than that of the corresponding model with one less factor. The ability of the proposed method to minimize errors in PLS prediction is demonstrated by applying it to the resolution of phenytoin (DPH) and phenobarbital (PB) binary mixtures with errors less than 2.8%; the results are compared with those obtained using another wavelength selection procedure. The ensuing method, which was validated by high performance liquid chromatography (HPLC), also gives good results with real samples (pharmaceutical preparations).
Keywords: Wavelength selection, number of factors selection, partial least squares
Received 9 March 2006; accepted 23 May 2006
The authors wish to acknowledge the financial support of this work by Education, Culture and Sports Council of Canary Government, research project PI2002/014.
Address correspondence to Juan J. Arias, Departamento de Química Analítica, Nutrición y Bromatología, Facultad de Química, Universidad de La Laguna, E-38071, La Laguna, Tenerife, Spain. Tel.: 34-922-318-076; Fax: 34-922-318-003; E-mail: [email protected]
Analytical Letters, 40: 257–280, 2007
Copyright © Taylor & Francis Group, LLC
ISSN 0003-2719 print/1532-236X online
DOI: 10.1080/00032710600867598
INTRODUCTION
A whole absorption spectrum provides a more accurate description of the sample concerned than a single measurement does; there is, however, some redundancy, as not all wavelengths contribute unique information. Multivariate calibration methods can deal with this shortcoming of the spectral data, but the improvement of results in these cases has raised the need for alternative methods to identify those measured variables of actual relevance with a view to constructing the calibration model.
Partial least squares (PLS) calibration methodology resolves the matrix of
independent variables (X, absorbances at different wavelengths) and that of
dependent variables (Y, analyte concentrations in the samples) into their
respective latent variables or factors. The number of independent variables
usually exceeds that of dependent variables, so the number of latent
variables or PLS factors that could in principle be used is at least as large as the rank of the independent variable matrix. However, all those factors
are rarely used because the independent variables are not always free of
noise and some factors may contain noise and introduce it into the model. If
the two matrices are linearly related, there are no interferences, and the independent variable matrix is free of noise, then the number of latent variables or
factors required to describe the model equals the rank of the dependent
variables. Such a number must be increased in the presence of a nonlinear
relation or some interference.
As with any calibration method, the selection of the independent variables
to be used plays a crucial role in the development of a quality PLS model.
Thus, although most PLS applications use the entire wavelength range (i.e., the full-spectrum method), a number of alternative methods for wavelength
selection that exclude those associated with high noise, nonlinearity, or irrele-
vant information have been developed (Osborne et al. 1997; Goicoechea and
Olivieri 1999; Centner et al. 1996; Jouan-Rimbaud et al. 1995; Espinosa-
Mansilla et al. 2002; Leardi et al. 2002; Jiang et al. 2002; Galvao et al.
2001; Kompany-Zareh and Mirzaei 2004; Gourvenec et al. 2004; Abdollahi
and Bagheri 2004; Ghasemi et al. 2003). However, many researchers find
these methods difficult to apply because most of them involve complex algorithms and mathematical developments. Therefore, new methods that can be
easily implemented are always welcome.
In this work, we developed a simple, easy-to-implement method for the simultaneous selection of the wavelengths and number of factors to be used in the PLS regression of spectrophotometric data. The method involves constructing, for each analyte and from the same calibration set, a PLS-1 model with f factors for every possible wavelength range. Each model is then used to calculate the prediction residual sum of squares (PRESS) for the validation set with a variable number of factors from 1 to f. For each analyte and f value, the model exhibiting the lowest PRESS that is significantly different from that of the model for the same interval with (f − 1) factors is chosen. The PRESS values of the models thus selected for each number of factors are then compared, and the one with the lowest PRESS that is significantly different from the value for the model with one less factor is chosen.
The proposed method for the simultaneous selection of the wavelength range and number of factors was validated by applying it to the determination of two antiepileptics [viz., phenytoin (DPH) and phenobarbital (PB), whose spectra in the UV region overlap completely] in pharmaceutical preparations. The results thus obtained are compared with those provided by another method for the selection of wavelengths and number of factors, the Bw coefficients method (Garrido et al. 1995); this method was chosen because it is also simple and easy to implement.
The Bw coefficients method (Garrido et al. 1995) was developed for spectrophotometric data and selects those wavelengths exhibiting the highest B coefficients in the PLS regression of the full spectrum. To ensure an identical variance in all variables, the data must previously be mean-centered and standardized (i.e., weighted with the reciprocal of the standard deviation); the regression coefficients thus obtained are designated Bw. Matrix Bw can be calculated directly from the PLS loadings for the model with the optimum number of factors:
$$B_w = W (P^T W)^{-1} Q^T$$
where W is the X weight-loading matrix, P the X-loading matrix, and Q the Y-loading matrix. The centered, nonstandardized X matrix is then used to construct various models involving a small number of variables corresponding to a given level of Bw coefficients, and each model is subjected to a leave-one-out cross-validation process to determine the number of wavelengths to be included in the final model.
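As an illustration of how regression coefficients follow from the PLS weights and loadings, the minimal sketch below runs a plain NIPALS PLS-1 on a small synthetic two-component mixture and forms the regression vector as W (PᵀW)⁻¹ q. The function name `pls1_coefficients`, the pure spectra, and the concentrations are hypothetical; the data are assumed to be centered already (and standardized beforehand if Bw rather than B is wanted). It is not the Unscrambler implementation used in the paper.

```python
def pls1_coefficients(X, y, n_factors):
    """Regression vector b = W (P^T W)^-1 q from a basic NIPALS PLS-1.

    X (list of rows) and y (list) are assumed to be centered already.
    """
    X = [row[:] for row in X]      # work on copies; both are deflated below
    y = y[:]
    n, m = len(X), len(X[0])
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = [sum(X[i][k] * y[i] for i in range(n)) for k in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]                                       # weights
        t = [sum(X[i][k] * w[k] for k in range(m)) for i in range(n)]   # scores
        tt = sum(v * v for v in t)
        p = [sum(X[i][k] * t[i] for i in range(n)) / tt for k in range(m)]  # X loadings
        qa = sum(y[i] * t[i] for i in range(n)) / tt                    # y loading
        for i in range(n):                                              # deflation
            for k in range(m):
                X[i][k] -= t[i] * p[k]
            y[i] -= qa * t[i]
        W.append(w)
        P.append(p)
        q.append(qa)
    h = n_factors
    # Augmented system [P^T W | q], solved by Gaussian elimination.
    A = [[sum(P[a][k] * W[b][k] for k in range(m)) for b in range(h)] + [q[a]]
         for a in range(h)]
    for col in range(h):
        piv = max(range(col, h), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, h):
            f = A[r][col] / A[col][col]
            A[r] = [x - f * z for x, z in zip(A[r], A[col])]
    c = [0.0] * h
    for r in reversed(range(h)):
        c[r] = (A[r][h] - sum(A[r][k] * c[k] for k in range(r + 1, h))) / A[r][r]
    return [sum(W[a][k] * c[a] for a in range(h)) for k in range(m)]


s1, s2 = [1.0, 0.5, 0.1], [0.2, 0.6, 1.0]          # hypothetical pure spectra
conc = [(1, 1), (2, 1), (1, 3), (3, 2), (2, 4)]    # hypothetical concentrations
X = [[c1 * a + c2 * b for a, b in zip(s1, s2)] for c1, c2 in conc]
y = [c1 for c1, _ in conc]
col_means = [sum(col) / len(X) for col in zip(*X)]
Xc = [[x - mu for x, mu in zip(row, col_means)] for row in X]
y_mean = sum(y) / len(y)
yc = [v - y_mean for v in y]


def max_residual(n_factors):
    """Largest absolute fitting error on the (noise-free) synthetic data."""
    b = pls1_coefficients(Xc, yc, n_factors)
    fitted = [sum(x * bk for x, bk in zip(row, b)) for row in Xc]
    return max(abs(f - v) for f, v in zip(fitted, yc))


print(max_residual(1), max_residual(2))
```

On this noise-free, linear two-component example one factor underfits, whereas two factors reproduce the concentrations to within rounding error, in line with the remark in the Introduction that the required number of factors is small when the data are linear and free of noise.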
SELECTION METHODS
Best Significant PRESS Method
The process was conducted using macros in the software Unscrambler (Computer-Aided Modelling A/S, 1993). The steps leading to the best model are as follows:
1. The spectra for the calibration and validation samples are recorded, and the calibration matrix (C), which is used to develop the models, and the validation matrix (O), which allows the best model to be identified, are constructed. Both matrices encompass i variables (wavelengths $n_1, n_2, \ldots, n_i$) and j samples.
2. A model $M^{n_1,n_i}$ is then constructed by applying PLS-1 to the calibration matrix $C^{n_1,n_i}$ [i × j]. The model is used to predict the validation samples by using 1, 2, ..., h, ..., f factors, and the $PRESS_h^{n_1,n_i}$ value for each prediction $P_h^{n_1,n_i}$ made with each number h of factors is calculated.
3. The variable with the highest subscript, $n_i$, is removed from $C^{n_1,n_i}$ to obtain a calibration matrix $C^{n_1,n_{i-1}}$ [(i − 1) × j] that is used to construct a model $M^{n_1,n_{i-1}}$ by PLS-1, and the corresponding predictions, $P_h^{n_1,n_{i-1}}$, with their respective PRESS values for each number h of factors ($PRESS_h^{n_1,n_{i-1}}$), are calculated.
4. Step 3 is repeated as many times as needed to obtain matrix $C^{n_1,n_3}$ [3 × j], $M^{n_1,n_3}$, $P_h^{n_1,n_3}$, and $PRESS_h^{n_1,n_3}$. In this way, every possible model having $n_1$ as its initial wavelength and at least three variables (i − 2 models) is constructed, and the PRESS for its predictions with 1, 2, ..., h, ..., f factors is calculated.
5. The variable with the lowest subscript ($n_1$) is removed from the initial matrix, $C^{n_1,n_i}$ [i × j], to obtain $C^{n_2,n_i}$ [(i − 1) × j], from which model $M^{n_2,n_i}$ is constructed, and the corresponding predictions, $P_h^{n_2,n_i}$, and the $PRESS_h^{n_2,n_i}$ value for each number of factors tested are calculated.
6. The variable with the highest subscript ($n_i$) is removed from $C^{n_2,n_i}$ [(i − 1) × j] to obtain $C^{n_2,n_{i-1}}$ [(i − 2) × j], and a model $M^{n_2,n_{i-1}}$ is constructed from which $P_h^{n_2,n_{i-1}}$ and then $PRESS_h^{n_2,n_{i-1}}$ are obtained.
7. Step 6 is repeated until matrix $C^{n_2,n_4}$ [3 × j], $M^{n_2,n_4}$, $P_h^{n_2,n_4}$, and $PRESS_h^{n_2,n_4}$ are obtained. In this way, every possible model having $n_2$ as its initial wavelength and at least three variables (i − 3 models) is constructed.
8. Steps 5 through 7 are repeated until the following calibration matrix series are obtained in each iteration:
$$
\begin{array}{l}
C^{n_1,n_i},\ C^{n_1,n_{i-1}},\ C^{n_1,n_{i-2}},\ \ldots,\ C^{n_1,n_5},\ C^{n_1,n_4},\ C^{n_1,n_3} \\
C^{n_2,n_i},\ C^{n_2,n_{i-1}},\ C^{n_2,n_{i-2}},\ \ldots,\ C^{n_2,n_5},\ C^{n_2,n_4} \\
C^{n_3,n_i},\ C^{n_3,n_{i-1}},\ C^{n_3,n_{i-2}},\ \ldots,\ C^{n_3,n_5} \\
\vdots \\
C^{n_{i-4},n_i},\ C^{n_{i-4},n_{i-1}},\ C^{n_{i-4},n_{i-2}} \\
C^{n_{i-3},n_i},\ C^{n_{i-3},n_{i-1}} \\
C^{n_{i-2},n_i}
\end{array}
$$
with their respective models ($M^{n_x,n_y}$), predictions ($P_h^{n_x,n_y}$), and $PRESS_h^{n_x,n_y}$ values, where $y \geq x + 2$.
9. The optimum range for each number of factors, h, is that with the minimum $PRESS_h$ value significantly smaller than $PRESS_{h-1}$ for the same model with one less factor. This is usually checked by calculating the statistic F, the tabulated value of which for a probability of 0.75 allows significant differences between models to be accurately detected according to the Haaland and Thomas methodology (Haaland and Thomas 1988). If the model with the lowest PRESS does not fulfill the previous requirement, then the model with the next lowest PRESS is chosen and the same criterion applied. This process must be carried out for the 1, 2, ..., h, ..., f factors tested. Because it is a restrictive criterion, it avoids overfitting.
10. The previous step yields f models (one per number of factors), the best of which is identified by applying the previous criterion again. In fact, the most suitable model will be that possessing the lowest PRESS, provided that its calculated F value reflects significant differences with the PRESS corresponding to one less factor.
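The enumeration in steps 2 through 8 and the selection rule of step 9 can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `windows` and `best_window` are hypothetical, the F statistic is assumed here to be the plain ratio PRESS(h − 1)/PRESS(h), and the threshold `F_CRIT` is a placeholder rather than the tabulated value used by the authors. The PRESS values are those listed for three DPH ranges in Table 1.

```python
def windows(n_vars, min_vars=3):
    """Yield (first, last) zero-based index pairs for every contiguous window
    holding at least `min_vars` variables, in the order of steps 2-8."""
    for first in range(n_vars - min_vars + 1):
        for last in range(n_vars - 1, first + min_vars - 2, -1):
            yield first, last


def best_window(press, h, f_crit):
    """Step 9: return the window with the lowest PRESS for h factors whose
    ratio PRESS(h-1)/PRESS(h) exceeds f_crit (None if no window qualifies).

    press maps a window to a list where press[w][k - 1] is the PRESS with k factors.
    """
    for w in sorted(press, key=lambda w: press[w][h - 1]):
        if h == 1 or press[w][h - 2] / press[w][h - 1] > f_crit:
            return w
    return None


# 41 variables correspond to 200-280 nm sampled every 2 nm.
print(sum(1 for _ in windows(41)))

# PRESS for 1-4 factors in three DPH ranges (nm), as listed in Table 1;
# the 2-factor entry of 212-240 nm is read here as 5.85 x 10^0.
press = {
    (210, 240): [58.01, 9.52e-2, 2.93e-1, 1.73e-2],
    (212, 240): [46.75, 5.85, 2.21e-2, 2.20e-2],
    (212, 232): [52.60, 8.91e-2, 3.55e-1, 2.29e-2],
}
F_CRIT = 1.4  # placeholder threshold, NOT the tabulated F value from the paper

print(best_window(press, 4, F_CRIT))

# If the best range were unavailable, the next-lowest PRESS (212-240 nm) would
# be rejected because it barely improves on three factors.
rest = {w: v for w, v in press.items() if w != (210, 240)}
print(best_window(rest, 4, F_CRIT))
```

For i variables the generator produces (i − 2)(i − 1)/2 windows, which is why the exhaustive search stays cheap for spectra sampled at a few tens of wavelengths.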
Bw Coefficients Method
In spectroscopic applications, b coefficients (B) are those that relate the set of independent variables (X) to that of dependent variables (Y). These coefficients cannot be used as such to identify the most relevant wavelengths in the model, since a large absolute value may indicate either that a variable is significant or that it possesses a small absolute value and a high variability. This shortcoming can be avoided by standardizing the data (i.e., by weighting the X variables with the reciprocal of the standard deviation, which causes all variables to have the same variance). In this way, a large absolute value of the bw coefficients will inevitably correspond to a major X variable, so that the specific variables to be used to construct the model can be readily identified.
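The standardization described above amounts to mean-centering each wavelength column and dividing it by its standard deviation. A minimal sketch follows, assuming the sample standard deviation is used (the paper does not say which estimator was applied); the function name `autoscale` and the absorbance values are hypothetical.

```python
from statistics import mean, stdev


def autoscale(X):
    """Mean-center each column of X and divide it by its sample standard deviation."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]   # assumes no column is constant
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in X]


# Hypothetical absorbances: 4 samples (rows) x 3 wavelengths (columns).
X = [[0.10, 1.20, 0.50],
     [0.12, 1.80, 0.55],
     [0.09, 2.40, 0.45],
     [0.11, 3.00, 0.50]]
Xw = autoscale(X)

# After scaling, every column has zero mean and unit standard deviation,
# so no wavelength dominates merely because of its absorbance scale.
for col in zip(*Xw):
    print(abs(mean(col)) < 1e-9, abs(stdev(col) - 1.0) < 1e-9)
```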
To find the optimum number of wavelengths, one must examine every model constructed from wavelengths with bw coefficients not exceeding specific, preset values. One additional problem, always present in PLS regression, arises from the need to determine the optimum number of factors, which is calculated from the root mean square error of cross-validation (RMSECV):
$$\mathrm{RMSECV} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2}$$
where $\hat{y}_i$ is the predicted concentration for the ith calibration sample and $y_i$ its actual concentration. The optimum number of factors can be decided on by using two different criteria, namely:
1. Criterion I, which takes the number of factors corresponding to the first local RMSECV minimum in a plot of RMSECV against the number of factors included in the model.
2. Criterion II, under which the number of factors preceding the first local RMSECV minimum is taken as most suitable, provided its RMSECV is not significantly greater than the minimum itself. Otherwise, the local minimum is chosen. This is the criterion used by Unscrambler software (Computer-Aided Modelling A/S, 1993).
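The RMSECV expression and Criterion I translate directly into code. The sketch below is only an illustration: the function names and the RMSECV curve are hypothetical, and Criterion II is omitted because the significance test it relies on is not specified in the text.

```python
from math import sqrt


def rmsecv(predicted, actual):
    """Root mean square error over the N cross-validated predictions."""
    n = len(actual)
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def first_local_minimum(curve):
    """Criterion I: 1-based number of factors at the first local minimum of
    `curve`, where curve[k - 1] is the RMSECV obtained with k factors."""
    for k in range(len(curve) - 1):
        if curve[k] < curve[k + 1]:
            return k + 1
    return len(curve)


curve = [0.90, 0.031, 0.035, 0.030]   # hypothetical RMSECV values for 1-4 factors
print(first_local_minimum(curve))     # selects 2 factors, not the global minimum at 4
```

Stopping at the first local minimum rather than the global one is what keeps the criterion from drifting toward models with many factors.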
Depending on the particular bw levels chosen, sets with different numbers of variables are obtained. Each set is used to construct a model, and the resulting RMSECV values are plotted against the corresponding numbers of wavelengths. A minimum for each criterion used to select the optimum number of factors will be obtained, and the number of wavelengths resulting in the lowest RMSECV is chosen as optimal.
EXPERIMENTAL SECTION
Apparatus
Spectra were recorded on a Hewlett-Packard HP 8452A diode array spectrophotometer (Avondale, PA, USA) interfaced to a Vectra ES computer, also from Hewlett-Packard. Quartz cuvettes of 1 cm light path and 4 mL inner volume were used throughout.
Chromatographic runs were conducted on a Waters 600E liquid chromatograph (Waters; Milford, MA, USA) with a 20-µL loop volume equipped with a Model 996 diode array detector from the same manufacturer. Both were operated via an NEC Power Mate 433 computer. All determinations were carried out using a NovaPack C18 column (3.9 × 150 mm) packed with particles 4 µm in size.
pH measurements were made with a Crison digital pH meter (Crison; Barcelona, Spain) furnished with a glass-saturated calomel double electrode.
A Selecta ultrasonic bath (R. Espinar; Sevilla, Spain) and an AND ER-182 analytical balance (Pacisa, S.A.; Madrid, Spain) were also used.
Software
The software used included Millennium version 2.10 (Millipore Corp.; Milford, MA, USA; 1993) and MS-DOS-UV-Visible HP 8452A to control the chromatographic system and spectrophotometer, respectively, and Unscrambler II version 5.0 (Computer-Aided Modelling A/S; Trondheim, Norway; 1993) to perform the partial least squares regressions.
Reagents
The reagents employed included 100 µg mL⁻¹ standard solutions of DPH and PB, which were prepared by direct weighing of the respective pure commercial products (Sigma; Madrid, Spain) and dissolution in ethanol; a 0.2 M NH4Cl/NH3 buffer of pH 10.0; and a 10 mM TRIS buffer of pH 7.0. All reagents were analytical grade, and aqueous solutions were prepared in Millipore/Milli-Q (Millipore; Bedford, MA, USA) deionized water.
Procedure
Calibration, Validation, and Prediction Sets
The calibration, validation, and prediction sets were constructed by using a factorial design consisting of two analytes at four levels (4²). Concentrations in the calibration set were within the linear range for each analyte (viz., 1.97–9.85 µg mL⁻¹ for DPH and 2.02–10.08 µg mL⁻¹ for PB). The lowest and highest values for the validation and prediction sets were greater than the minimum and lower than the maximum values, respectively, used for the calibration set.
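A two-analyte, four-level factorial design is simply the Cartesian product of the two sets of levels. The sketch below builds the 16 mixtures of the prediction set from the concentration levels listed in Table 2 and checks that they fall strictly inside the calibration ranges quoted above; the variable names are hypothetical.

```python
from itertools import product

dph_levels = [2.17, 4.73, 7.09, 9.66]   # DPH levels of the prediction set (Table 2)
pb_levels = [2.22, 4.84, 7.26, 9.88]    # PB levels of the prediction set (Table 2)

# Every DPH level combined with every PB level: 4 x 4 = 16 mixtures.
design = list(product(dph_levels, pb_levels))

# Calibration ranges quoted in the text, in the same concentration units.
cal_range = {"DPH": (1.97, 9.85), "PB": (2.02, 10.08)}
inside = (cal_range["DPH"][0] < min(dph_levels)
          and max(dph_levels) < cal_range["DPH"][1]
          and cal_range["PB"][0] < min(pb_levels)
          and max(pb_levels) < cal_range["PB"][1])

print(len(design), inside)
```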
Determination of DPH and PB in Synthetic Mixtures
In 25 mL calibrated flasks were placed 5 mL of NH4Cl/NH3 buffer and the volumes of DPH and PB solutions required to obtain final concentrations over the ranges 1.97–9.85 and 2.02–10.08 µg mL⁻¹, respectively, plus the volume of ethanol needed to make the mixture 20% (v/v) in it. The solutions were made to volume with deionized water, and their absorption spectra were recorded between 200 and 300 nm, every 2 nm, against an identically prepared blank containing no analyte.
Determination of DPH and PB in Pharmaceutical Preparations
Using PLS Regression
DPH and PB were determined in Epilantin®
(Otsuka Pharmaceutical, S.A.;
Barcelona, Spain) by weighing, grinding, and homogenizing 10 tablets of
the pharmaceutical, and using six samples of 40 mg each for analysis.
The drugs were extracted with 100 mL of ethanol in an ultrasonic bath for
20 min and diluted to 250 mL with deionized water. Finally, an aliquot of
3 mL of the resulting solution was analyzed as described in Determination
of DPH and PB in Synthetic Mixtures.
Determination of DPH and PB in Pharmaceutical Preparations
Using HPLC
The previous solutions of the two drugs were filtered through a polypropylene membrane of 0.4 µm pore size prior to injection into the chromatograph. Determinations were carried out by using a NovaPack C18 (3.9 × 150 mm) column and a 60:40 mixture of methanol and 10 mM TRIS buffer of pH 7.0 that was circulated at a flow rate of 1 mL min⁻¹. The drugs were quantified spectrophotometrically from the areas under the peaks at 254 nm.
RESULTS AND DISCUSSION
As can be seen in Fig. 1, the absorption spectra for DPH and PB overlap completely throughout the UV region. Both drugs exhibit a strong peak at 212 nm; in addition, PB gives a maximum at 240 nm and DPH a shoulder at the same wavelength. DPH and PB absorb UV light at wavelengths up to 270 and 280 nm, respectively. This strong spectral overlap makes the mixture an appropriate model to assess the performance of the proposed method for selecting the wavelength range and number of factors to be used in the determination.
The applicability of the method was assessed by applying it to three different sample sets (viz., a calibration, a validation, and a prediction set). The calibration set was used to construct the PLS models for the different possible wavelength ranges ($M^{n_x,n_y}$). The validation set was employed to select the prediction resulting in the smallest errors (viz., the lowest significant $PRESS_h^{n_x,n_y}$ value) among those obtained with the models provided by the calibration set and a variable number of factors ($P_h^{n_x,n_y}$). Finally, the prediction set, which consisted of samples not included in the calibration or validation set, was used to assess the predictive ability of the method.
These three sets, prepared as described in the Experimental Section, were used to select the optimum wavelength range and number of factors for applying the PLS-1 method to the determination of each analyte in the sample.
Figure 1. Absorption spectra of DPH and PB solutions.
variables (200–242 and 200–244 nm) but did not improve on the previous results. The PRESS value obtained with two factors was significantly lower than that obtained with one factor in the three cases.
As can be seen from Fig. 2b, the inclusion of a third factor decreased PRESS in the region of high values (226–232 nm) and increased it in the region of low values (valleys). The surface in the region where a peak was observed with two factors lost smoothness as a result of the system being overfitted in most of these ranges. As can be seen from Table 1, the ranges providing the lowest PRESS values with two and three factors were virtually identical; however, the PRESS values obtained with two factors in the ranges corresponding to the minima obtained with three factors were significantly higher.
Table 1. Minimum values of PRESS and corresponding wavelength ranges for the different numbers of factors assayed
Range (nm)    1 factor    2 factors       3 factors       4 factors
DPH
Best models using 2 factors
202–242       60.84       7.70 × 10⁻³*    9.50 × 10⁻³     6.05 × 10⁻²
200–242       60.58       1.41 × 10⁻²     1.35 × 10⁻²     5.45 × 10⁻²
200–244       61.58       1.86 × 10⁻²     8.60 × 10⁻³     5.37 × 10⁻²
Best models using 3 factors
200–246       61.58       3.69 × 10⁻²     8.10 × 10⁻³*    4.61 × 10⁻²
200–248       61.97       5.46 × 10⁻²     8.30 × 10⁻³     4.20 × 10⁻²
200–244       61.34       1.86 × 10⁻²     8.60 × 10⁻³     5.37 × 10⁻²
Best models using 4 factors
210–240       58.01       9.52 × 10⁻²     2.93 × 10⁻¹     1.73 × 10⁻²*
212–240       46.75       5.85 × 10⁰      2.21 × 10⁻²     2.20 × 10⁻²
212–232       52.60       8.91 × 10⁻²     3.55 × 10⁻¹     2.29 × 10⁻²
PB
Best models using 2 factors
232–238       26.01       1.68 × 10⁻²*    2.01 × 10⁻¹     2.31 × 10⁻¹
228–242       25.75       1.70 × 10⁻²     2.51 × 10⁻¹     9.62 × 10⁻²
226–244       25.95       1.74 × 10⁻²     1.12 × 10⁻¹     1.75 × 10⁻²
Best models using 3 factors
214–280       42.61       5.75 × 10⁻²     1.34 × 10⁻²*    7.63 × 10⁻²
216–276       36.94       4.65 × 10⁻²     1.36 × 10⁻²     1.94 × 10⁻¹
218–278       32.19       4.17 × 10⁻²     1.36 × 10⁻²     1.53 × 10⁻¹
Best models using 4 factors
208–278       57.75       8.93 × 10⁻²     4.69 × 10⁻²     1.23 × 10⁻²*
208–276       57.75       8.74 × 10⁻²     4.69 × 10⁻²     1.26 × 10⁻²
208–274       57.75       8.74 × 10⁻²     4.69 × 10⁻²     1.26 × 10⁻²
Values marked with an asterisk (bold in the original) are the minimum value of PRESS for each number of factors.
Finally, the use of four factors yielded a less uniform surface (Fig. 2c). This may be the result of virtually all ranges being overfitted and of factors strongly dependent on noise, or on some other source not related to the analyte concentration, being incorporated into the process. As can clearly be seen from Table 1, the lowest PRESS value obtained with four factors was more than twice that provided by two or three factors. Also, the second-best PRESS obtained with four factors (2.20 × 10⁻²) was not even significantly lower than that corresponding to the same range with three factors (2.21 × 10⁻²).
If only the best range for each number of factors is considered, then the 202–242 nm range with two factors was that yielding the lowest PRESS in the determination of DPH.
Determination of PB
The determination of PB in the presence of DPH was performed by constructing a calibration matrix similar to that used for DPH, except that the wavelengths were those in the range 200–280 nm (viz., the region where the analyte absorbs UV light). By following a procedure similar to that for DPH, matrix $C^{200,280}$ [41 × 16] was used to obtain all possible wavelength ranges from 200 to 280 nm with at least three variables. The models thus obtained were used to calculate the concentrations for the prediction set with 1, 2, 3, and 4 factors, as well as the corresponding PRESS values.
As with DPH, a single factor provided models that accounted for less than 95% of the initial variance and were poorly predictive. This suggests that the use of a single factor resulted in underfitting; in fact, the best PRESS value thus obtained was 7.81 (wavelength range 250–256 nm).
Fig. 3a shows the PRESS values obtained with two factors. As can be seen, most were less than 0.5, the only exceptions being those for the ranges containing wavelengths below 230 or above 236 nm. By exception, the ranges starting at 254 or 256 nm and ending at 276 or 278 nm had PRESS values around 0.020, even though their wavelengths were in the region of high PRESS levels. The ranges providing the best PRESS values with two factors were those with their centers at 235 nm and including 4, 8, or 10 variables. As can be seen from Table 1, the PRESS values obtained with two factors differed by three orders of magnitude from those provided by a single factor. Fig. 3b shows the PRESS values obtained with three factors. Most were lower than those obtained with two factors, except in the ranges with the lowest PRESS levels (viz., those with a central value of 235 nm and spanning 4, 6, 8, or 10 nm), where PRESS was greater than in the other ranges, probably because the system with three factors was overdimensioned. The ranges with the lowest PRESS values in the prediction of the PB concentration with three factors were quite wide; in fact, they spanned nearly the whole spectral zone studied. The best PRESS values were those for the 214–280, 216–276, and 218–278 nm ranges (1.34 × 10⁻², 1.36 × 10⁻², and 1.36 × 10⁻², respectively). As can be seen from Table 1, the three values are significantly smaller than those obtained for the same ranges with only two factors.
The best PRESS values obtained with four factors (1.23 × 10⁻², 1.26 × 10⁻², and 1.26 × 10⁻²) were slightly lower than those obtained with three factors (1.34 × 10⁻², 1.36 × 10⁻², and 1.36 × 10⁻²). However, the shape of the PRESS surface (Fig. 3c) suggests that the models were overdimensioned. The lowest PRESS values obtained with four factors were virtually identical and corresponded to very similar ranges (viz., 208–278, 208–276, and 208–274 nm) (see Table 1).
A comparison of the PRESS values for the ranges yielding the best results with each number of factors reveals that PRESS decreased with an increasing number of factors. The differences, however, were all insignificant. We therefore adopted two factors and the wavelength range 232–238 nm as the best choice for the determination of PB.
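The final choice for PB can be retraced with a short sketch of the last selection step, using the best PRESS reported for each number of factors (7.81 with one factor, and the Table 1 minima for two to four factors). The function name `best_number_of_factors` is hypothetical, the F statistic is assumed to be the plain ratio PRESS(h − 1)/PRESS(h), and the threshold 1.4 is a placeholder rather than the tabulated value used by the authors.

```python
def best_number_of_factors(chosen, f_crit):
    """Return the number of factors h with the lowest PRESS whose ratio
    PRESS(h-1)/PRESS(h) exceeds f_crit.

    chosen maps h to the PRESS of the range already selected for h factors.
    """
    for h in sorted(chosen, key=chosen.get):
        if h - 1 not in chosen:
            return h                  # nothing simpler to compare against
        if chosen[h - 1] / chosen[h] > f_crit:
            return h
    return None


# Best PRESS per number of factors for PB, as reported in the text and Table 1.
pb_best = {1: 7.81, 2: 1.68e-2, 3: 1.34e-2, 4: 1.23e-2}

print(best_number_of_factors(pb_best, 1.4))   # placeholder threshold
```

The successive ratios are about 1.09 (three versus four factors), 1.25 (two versus three), and 465 (one versus two), so with these numbers any threshold between roughly 1.26 and 460 leads to the same choice of two factors.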
Figure 3. PRESS obtained for PB as a function of initial and final wavelengths by
using (a) two factors, (b) three factors, and (c) four factors.
Calibration and Validation Sets
Fig. 4 shows the results obtained in the determination of DPH and PB by
applying PLS-1 methodology to the calibration and validation sets used to
identify the optimum conditions (viz., prediction with two factors over the
ranges 202–242 and 232–238 nm for DPH and PB, respectively). As can
be seen, the results were quite satisfactory (the errors were all less than
2.0%).
With calibration by full cross-validation, the chosen models accounted for
99.95% and 99.99% of the initial variance of validation in the concentration
matrix for DPH and PB, respectively.
Prediction Set
The predictive ability of a model cannot be assessed by using any samples
employed to establish its optimum parameter values (i.e., those in the cali-
bration set) or determine the optimum wavelength range and number of
factors (i.e., those in the validation set). For this reason, we prepared a third
sample set, the prediction set, the concentrations of which were determined
under the previously described optimum conditions with a view to assessing
the predictive ability of the proposed model.
As can be seen from Table 2, the errors in the concentrations of both analytes were all less than 3.0%. Also, there was good correlation between the amounts added and found, as reflected in the slope and intercept obtained in the mutual confidence region test (α = 0.05) (Mandel and Linnig 1957), which suggests the absence of systematic error in the prediction of either analyte.
One question raised by the proposed procedure for establishing the
optimum wavelength range and number of factors for PLS determinations
based on spectrophotometric data is to what extent the data matrix used
to calculate PRESS influences the final result. This was studied by
repeating the previously described procedure but using the prediction set
instead of the validation set in the calculations. Fig. 5 shows the PRESS values obtained in the different wavelength ranges used to predict the concentrations of the prediction matrix with two factors. As can be seen, the
resulting surface was highly similar to that for the PRESS of the validation
matrix (Fig. 3a).
The lowest PRESS values found in this way were 7.40 × 10⁻³ and 8.30 × 10⁻³, and corresponded to the ranges 234–240 and 232–238 nm, respectively. These ranges, which contain four variables each, are very similar; also, the latter (232–238 nm) coincides with the previously obtained lowest-PRESS range. Consequently, the performance of the proposed method is not influenced by the specific matrix used to optimize the wavelength range and number of factors, since the use of two different matrices to calculate PRESS yields virtually the same results.
Figure 4. Relative errors obtained for the determination of DPH and PB in (a) calibration set and (b) validation set, by using PLS-1 with two factors in the ranges 202–242 nm and 232–238 nm, respectively.
Table 2. Results obtained for the determination of DPH and PB in the prediction set by using PLS-1 with two factors in the ranges 202–242 nm and 232–238 nm, respectively, and by the bw coefficients method
Added (µg/mL)    Significant PRESS method: Found (µg/mL)    Relative error (%)    bw method: Found (µg/mL)
DPH     PB       DPH            PB             DPH      PB       DPH            PB
2.17    2.22     2.17 ± 0.09    2.23 ± 0.03     0.00     0.45    2.13 ± 0.02    2.2…
2.17    4.84     2.11 ± 0.04    4.86 ± 0.04    −2.76     0.41    2.2 ± 0.02     4.…
2.17    7.26     2.22 ± 0.04    7.26 ± 0.05     2.30     0.00    2.18 ± 0.05    7.2…
2.17    9.88     2.18 ± 0.06    9.86 ± 0.02     0.46    −0.20    2.11 ± 0.03    9.…
4.73    2.22     4.66 ± 0.03    2.25 ± 0.02    −1.48     1.35    4.69 ± 0.01    2.2…
4.73    4.84     4.69 ± 0.03    4.90 ± 0.03    −0.85     1.24    4.76 ± 0.03    4.…
4.73    7.26     4.62 ± 0.04    7.25 ± 0.01    −2.33    −0.14    4.72 ± 0.04    7.2…
4.73    9.88     4.67 ± 0.04    9.89 ± 0.02    −1.27     0.10    4.79 ± 0.04    9.…
7.09    2.22     7.07 ± 0.04    2.22 ± 0.02    −0.28     0.00    7.11 ± 0.04    2.…
7.09    4.84     7.09 ± 0.05    4.84 ± 0.02     0.00     0.00    7.08 ± 0.04    4.…
7.09    7.26     7.07 ± 0.03    7.29 ± 0.02    −0.28     0.41    7.12 ± 0.02    7.…
7.09    9.88     7.19 ± 0.04    9.90 ± 0.03     1.41     0.20    7.19 ± 0.06    9.9…
9.66    2.22     9.64 ± 0.05    2.22 ± 0.01    −0.21     0.00    9.63 ± 0.02    2.2…
9.66    4.84     9.63 ± 0.04    4.87 ± 0.03    −0.31     0.62    9.71 ± 0.05    4.…
9.66    7.26     9.59 ± 0.04    7.27 ± 0.03    −0.72     0.14    9.88 ± 0.03    7.2…
9.66    9.88     9.54 ± 0.09    9.86 ± 0.02    −1.24    −0.20    9.45 ± 0.04    9.…
PRESS            5.98e-2        8.30e-3                          1.20e-1        1.9…
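The PRESS row and the relative errors in Table 2 follow directly from the added and found concentrations. As a sketch, the code below recomputes them for the significant PRESS method, assuming that PRESS is the plain sum of squared differences between found and added values over the 16 prediction samples; the function and variable names are hypothetical.

```python
added_dph = [2.17] * 4 + [4.73] * 4 + [7.09] * 4 + [9.66] * 4
added_pb = [2.22, 4.84, 7.26, 9.88] * 4
# Concentrations found by the significant PRESS method (Table 2).
found_dph = [2.17, 2.11, 2.22, 2.18, 4.66, 4.69, 4.62, 4.67,
             7.07, 7.09, 7.07, 7.19, 9.64, 9.63, 9.59, 9.54]
found_pb = [2.23, 4.86, 7.26, 9.86, 2.25, 4.90, 7.25, 9.89,
            2.22, 4.84, 7.29, 9.90, 2.22, 4.87, 7.27, 9.86]


def press(found, added):
    """Prediction residual sum of squares."""
    return sum((f - a) ** 2 for f, a in zip(found, added))


def relative_errors(found, added):
    """Signed relative errors in percent."""
    return [100 * (f - a) / a for f, a in zip(found, added)]


print(round(press(found_dph, added_dph), 4), round(press(found_pb, added_pb), 4))
worst = max(abs(e) for e in relative_errors(found_dph, added_dph)
            + relative_errors(found_pb, added_pb))
print(round(worst, 2))
```

The recomputed sums agree with the tabulated PRESS values of 5.98e-2 for DPH and 8.30e-3 for PB, and the largest relative error stays below 3%.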
To confirm the benefits of wavelength selection, Table 3 shows the PRESS obtained in the determination of DPH and PB by using PLS-1 for the models built with the full-spectrum method (200–270 nm for DPH and 200–280 nm for PB) and with the best ranges obtained from the proposed wavelength selection method (202–242 nm for DPH and 232–238 nm for PB). PRESS values are substantially lower for the best model found by the significant PRESS method, so wavelength selection improves the results of multivariate calibration methods. Likewise, Fig. 6 shows that relative errors in the determination of DPH and PB are lower when the wavelength range is optimized.
Figure 5. PRESS obtained for determination of PB in the prediction set.
Table 3. PRESS obtained for the determination of DPH and PB by using PLS-1 with two factors in the ranges selected by the significant PRESS method and in the full-spectrum method
Set            DPH, 202–242 nm   DPH, 200–270 nm   PB, 232–238 nm   PB, 200–280 nm
Calibration    2.50 × 10⁻²       1.81 × 10⁻¹       6.90 × 10⁻³      5.00 × 10⁻²
Validation     7.70 × 10⁻³       8.97 × 10⁻²       1.68 × 10⁻²      1.41 × 10⁻¹
Prediction     5.98 × 10⁻²       2.03 × 10⁻¹       8.30 × 10⁻³      5.54 × 10⁻²
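To put a number on the improvement shown in Table 3, the sketch below divides the full-spectrum PRESS by the PRESS for the selected range, for each analyte and sample set. The dictionary layout and names are hypothetical; the values are those of Table 3.

```python
# (PRESS with selected range, PRESS with full spectrum), from Table 3.
press_table3 = {
    ("DPH", "calibration"): (2.50e-2, 1.81e-1),
    ("DPH", "validation"): (7.70e-3, 8.97e-2),
    ("DPH", "prediction"): (5.98e-2, 2.03e-1),
    ("PB", "calibration"): (6.90e-3, 5.00e-2),
    ("PB", "validation"): (1.68e-2, 1.41e-1),
    ("PB", "prediction"): (8.30e-3, 5.54e-2),
}

# Factor by which PRESS drops when the selected range replaces the full spectrum.
ratios = {key: full / selected for key, (selected, full) in press_table3.items()}

for (analyte, sample_set), ratio in ratios.items():
    print(f"{analyte:3s} {sample_set:11s} {ratio:5.2f}")
```

In every case the full-spectrum PRESS is more than three times that of the selected range; the smallest gain is for DPH in the prediction set.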
Figure 6. Relative errors obtained in the determination of (a) DPH and (b) PB in the prediction set by using PLS-1 with two factors in the ranges selected by the significant PRESS method and in the full-spectrum method.
Bw Coefficients Method
This method was implemented by conducting a PLS-1 regression on the calibration matrix for each analyte, prepared as described in the previous sections, following centering and standardization of data. The resulting models were validated by full cross-validation.
Fig. 7 shows the RMSECV for each analyte as a function of the number of
factors used in the prediction. As can be seen, the optimum number of factors
was two, both with the criterion of the first local RMSECV minimum and with
that of the RMSECV value not significantly different from the first local
minimum.

Figure 7. RMSECV as a function of number of factors obtained by PLS-1 using
standardized data of the whole spectra: DPH () and PB (B).

Fig. 8 shows the variation of the bw coefficients for each analyte as a
function of wavelength with two factors. The curves exhibit two distinct
portions above and below about 226 nm. Below 226 nm, both DPH and PB exhibit
high bw coefficients (positive for the former and negative for the latter) at
212–214 nm, which is the zone most strongly influenced by DPH. Above 226 nm,
bw coefficients are high (negative for DPH and positive for PB) at 242–244 nm,
where PB exhibits an absorption maximum. Also, some regions are under no
appreciable effect from either analyte. Such is the case at wavelengths above
266 nm, where both analytes absorb very weakly; near 226 nm, where bw
coefficients change sign and are thus very small; and below 206 nm, where the
absorption spectra for both analytes are fully overlapped.
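The factor-selection step described above (PLS-1 on standardized data, full cross-validation, first local RMSECV minimum) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the NIPALS routine, the function names, the 2 nm grid from 200 to 280 nm, and the synthetic two-band spectra are all assumptions made for the example. Only the first-local-minimum criterion is coded; the significance-based criterion is omitted.

```python
import numpy as np

def pls1_coefficients(X, y, n_factors):
    """Regression vector of a PLS-1 (NIPALS) model for centered X and centered y."""
    Xk, yk = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)        # unit-length weight vector
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        q_a = yk @ t / tt             # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - q_a * t             # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)   # b = W (P'W)^-1 q

def rmsecv_by_factors(X, y, max_factors):
    """Leave-one-out RMSECV for 1..max_factors, standardizing X inside each fold."""
    n = len(y)
    press = np.zeros(max_factors)
    for i in range(n):
        train = np.arange(n) != i
        mean = X[train].mean(axis=0)
        std = X[train].std(axis=0, ddof=1)
        y_mean = y[train].mean()
        Z = (X[train] - mean) / std
        z_out = (X[i] - mean) / std
        for a in range(1, max_factors + 1):
            b = pls1_coefficients(Z, y[train] - y_mean, a)
            press[a - 1] += (y_mean + z_out @ b - y[i]) ** 2
    return np.sqrt(press / n)

def first_local_minimum(rmsecv):
    """Number of factors (1-based) at the first local minimum of the RMSECV curve."""
    for k in range(len(rmsecv) - 1):
        if rmsecv[k] <= rmsecv[k + 1]:
            return k + 1
    return len(rmsecv)

# Synthetic binary mixtures (hypothetical data, not the DPH/PB spectra).
rng = np.random.default_rng(0)
wavelengths = np.arange(200, 282, 2)          # assumed grid: 41 points

def band(centre, width):
    return np.exp(-0.5 * ((wavelengths - centre) / width) ** 2)

pure_spectra = np.vstack([band(212, 15), band(242, 15)])
conc = rng.uniform(0.2, 1.0, size=(16, 2))
X = conc @ pure_spectra + rng.normal(0.0, 0.002, size=(16, wavelengths.size))

rmsecv = rmsecv_by_factors(X, conc[:, 0], max_factors=5)
n_opt = first_local_minimum(rmsecv)
print("RMSECV per number of factors:", np.round(rmsecv, 4))
print("first local minimum at", n_opt, "factors")
```

Standardizing inside each cross-validation fold keeps the left-out sample from influencing the scaling. Whether the original software does the same is not stated in the text, so this is a design choice of the sketch.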
The next step involved selecting wavelengths in terms of the corresponding bw
value: the greater the absolute value of the coefficient was, the stronger was
the significance of the wavelength concerned. To determine the number of
variables to be considered, a series of sets consisting of wavelengths with
absolute bw values not lower than a preset level were constructed. Table 4
shows the bw limit values used to include or exclude variables in the models,
as well as the number of variables contained in each.
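The construction of these variable sets can be illustrated with a short sketch. It assumes that a wavelength is retained when the absolute value of its bw coefficient reaches the limit, which is the reading consistent with Table 4 (a limit of 0 keeps all 41 variables, and higher limits keep fewer). The limits are the DPH values from Table 4; the bw curve itself is hypothetical, shaped only loosely after the description of Fig. 8, so the resulting counts will not reproduce the table.

```python
import numpy as np

wavelengths = np.arange(200, 282, 2)   # assumed 2 nm grid, 41 variables

# Hypothetical b_w curve: positive lobe near 212-214 nm, negative lobe near
# 242-244 nm, sign change between the two. Not the published coefficients.
b_w = (3.0 * np.exp(-0.5 * ((wavelengths - 213) / 6.0) ** 2)
       - 3.0 * np.exp(-0.5 * ((wavelengths - 243) / 6.0) ** 2))

# Limit values listed for DPH in Table 4 (models 1 to 10).
limits = [0, 0.130, 0.263, 0.508, 0.915, 1.358, 1.793, 2.206, 2.566, 2.831]

def select_wavelengths(wavelengths, b_w, limit):
    """Keep the wavelengths whose |b_w| is at least `limit` (assumed rule)."""
    return wavelengths[np.abs(b_w) >= limit]

subsets = [select_wavelengths(wavelengths, b_w, lim) for lim in limits]
for model, (lim, subset) in enumerate(zip(limits, subsets), start=1):
    print(f"model {model}: limit {lim:.3f}, {subset.size} variables")
```

Each subset would then feed a separate PLS-1 model, so the sets are nested by construction: raising the limit can only remove wavelengths.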
Figure 8. bw coefficients of DPH () and PB ( ).

Then the PLS-1 models for the previous sets were constructed from centered,
nonstandardized data, and the respective RMSECV values were calculated. The
model yielding the lowest RMSECV was chosen as optimal, and the number of
variables contained in it was adopted as the most suitable. Fig. 9 shows the
RMSECV values obtained with the different models tested in the determination
of DPH and PB. As can be seen, the optimum number of factors was two with all
models and both analytes, whatever the number of variables used. Also, RMSECV
increased slightly with the increasing number of variables, particularly from
20 to 24 variables for DPH and from 24 to 28 for PB. The lowest RMSECV in the
determination of DPH, 0.0312, was obtained with four variables; the ensuing
model accounted for 99.0% of the initial variance in the data. On the other
hand, the lowest RMSECV for PB, 0.0195, was obtained with eight variables, and
the resulting model accounted for 99.4% of the initial variance in the data.
Note that the wavelengths contained in the optimum model for determining DPH
(240–246 nm) were included in that for the determination of PB (236–248 nm).
To compare this variable selection method with the proposed one, the best
model provided by the bw coefficient method was used to determine the
concentrations of the samples in the prediction set used to assess the
performance of the best significant PRESS value. The results, shown in Table 2,
reveal that the predictions were reasonably accurate, with relative errors less
than 3.0% and 1.5% for DPH and PB, respectively. Also, the slopes and
intercepts obtained with the mutual confidence region method reveal no signs
of systematic errors in the quantitation of either analyte.
As can also be seen from Table 2, the PRESS values obtained with the best
significant PRESS method (viz., 5.98 × 10^-2 for DPH and 8.30 × 10^-3 for PB)
are smaller than those provided by the bw coefficient method (viz.,
1.20 × 10^-1 for DPH and 1.96 × 10^-2 for PB). Consequently, the proposed
method provides better results.
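For a sense of scale, the comparison can be worked through numerically. The sketch below assumes the usual definition of PRESS as the sum of squared differences between predicted and reference concentrations; the helper name `press` and the two-sample call are illustrative only. The four PRESS values are those quoted in the text.

```python
import numpy as np

def press(predicted, reference):
    """Prediction residual sum of squares (assumed standard definition)."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sum((predicted - reference) ** 2))

# Illustrative call with made-up concentrations: 0.5**2 + 0.5**2
example = press([1.0, 2.0], [1.5, 2.5])

# PRESS values quoted in the text for the prediction set.
reported = {
    "DPH": {"significant PRESS": 5.98e-2, "bw coefficients": 1.20e-1},
    "PB": {"significant PRESS": 8.30e-3, "bw coefficients": 1.96e-2},
}

ratios = {
    analyte: values["bw coefficients"] / values["significant PRESS"]
    for analyte, values in reported.items()
}
for analyte, ratio in ratios.items():
    # With the same prediction set, the RMS error ratio is the square root.
    print(f"{analyte}: PRESS ratio {ratio:.2f}, RMS error ratio {ratio ** 0.5:.2f}")
```

The PRESS of the bw coefficient method is thus about 2.0 times (DPH) and 2.4 times (PB) that of the significant PRESS method. Because both methods were evaluated on the same prediction set, this corresponds to root-mean-square errors roughly 1.4 and 1.5 times larger.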
Table 4. Limit values for bw coefficients and corresponding number of
variables in the construction of the models

Model   DPH     PB      Number of variables
1       0       0       41
2       0.130   0.135   36
3       0.263   0.254   32
4       0.508   0.528   28
5       0.915   0.699   24
6       1.358   0.953   20
7       1.793   1.354   16
8       2.206   1.729   12
9       2.566   2.553   8
10      2.831   3.230   4

Figure 9. RMSECV as a function of the number of variables for (a) DPH and (b) PB.
(Number of factors used is placed in the corresponding curve).

Determination of DPH and PB in Commercial Pharmaceutical
Preparations

Once the optimum wavelength range and number of factors for determining
DPH and PB with the lowest possible errors were identified, the two
analytes were quantified simultaneously in the commercial pharmaceutical
preparation Epilantin®. The predictive ability of the proposed model (viz.,
PLS-1 with two factors and the wavelength ranges 202–242 nm for DPH
and 232–238 nm for PB) was assessed by quantifying the samples using
HPLC as the reference method. As can be seen from Table 5, there was good
consistency between the contents stated and those found with both PLS and
HPLC.
CONCLUSIONS
In this work, we developed a simple, easily implemented method for the
simultaneous selection of the optimum continuous wavelength range and
number of factors, based on the prediction errors calculated by applying PLS
to a sample set, with a view to resolving mixtures of analytes. Using the
selected wavelength range improved the results relative to those obtained
with the full-spectrum method. The method was used for the spectrophotometric
determination of two antiepileptics (viz., phenytoin and phenobarbital) in
both synthetic mixtures and a pharmaceutical preparation, and was found to
provide better results than other existing, easy-to-implement methods.
Table 5. Determination of DPH and PB in pharmaceuticals (100 mg of DPH and
50 mg of PB declared per tablet)

           DPH (mg) ± SD    PB (mg) ± SD
PLS (a)    99.04 ± 0.88     49.61 ± 0.35
HPLC (b)   99.71 ± 0.92     49.88 ± 0.28

(a) Value provided by Unscrambler.
(b) Value obtained from the straight line model.