Anal Letters 2007


  • 8/11/2019 Anal Letters 2007


    CHEMOMETRICS

    Selection of Wavelength Range and Number of Factors to be Used in the PLS Treatment of Spectrophotometric Data

    M. Luz Luis, José M. G. Fraga, Francisco Jiménez,

    Ana I. Jiménez, Oscar M. Hernández, and Juan J. Arias

    Departamento de Química Analítica, Nutrición y Bromatología, Facultad de Química, Universidad de La Laguna, La Laguna, Tenerife, Spain

    Abstract: A procedure is proposed for selecting the wavelength range and number of factors to be used in partial least squares (PLS) calibration; it involves calculating the prediction residual sum of squares (PRESS) under different conditions. The best model is the one with the minimum PRESS value, provided that this value is significantly lower than that of the corresponding model with one factor fewer. The ability of the proposed method to minimize errors in PLS prediction is demonstrated by applying it to the resolution of phenytoin (DPH) and phenobarbital (PB) binary mixtures, with errors below 2.8%; the results are compared with those obtained using another wavelength selection procedure. The resulting method, which was validated by high performance liquid chromatography (HPLC), also gives good results with real samples (pharmaceutical preparations).

    Keywords: Wavelength selection, number of factors selection, partial least squares

    INTRODUCTION

    A whole absorption spectrum provides a more accurate description of a sample than a single measurement does; there is, however, some redundancy, as not all wavelengths contribute unique information. Multivariate calibration methods can cope with this redundancy, but the effort to improve their results has created a need for methods that identify which measured variables are actually relevant to the calibration model.

    Received 9 March 2006; accepted 23 May 2006

    The authors wish to acknowledge the financial support of this work by the Education, Culture and Sports Council of the Canary Government, research project PI2002/014.

    Address correspondence to Juan J. Arias, Departamento de Química Analítica, Nutrición y Bromatología, Facultad de Química, Universidad de La Laguna, E-38071, La Laguna, Tenerife, Spain. Tel.: 34-922-318-076; Fax: 34-922-318-003; E-mail: [email protected]

    Analytical Letters, 40: 257–280, 2007

    Copyright © Taylor & Francis Group, LLC

    ISSN 0003-2719 print/1532-236X online

    DOI: 10.1080/00032710600867598

    Partial least squares (PLS) calibration decomposes the matrix of independent variables (X, absorbances at different wavelengths) and that of dependent variables (Y, analyte concentrations in the samples) into their respective latent variables or factors. The number of independent variables usually exceeds that of dependent variables, so the number of latent variables or PLS factors that could in principle be used can be as large as the rank of the independent variable matrix. However, all those factors are rarely used, because the independent variables are not always free of noise and some factors may mainly describe noise and introduce it into the model. If the two matrices are linearly related, there are no interferences, and the independent variable matrix is free of noise, then the number of latent variables or factors required to describe the model equals the rank of the dependent variable matrix. More factors are needed when the relationship is nonlinear or interferences are present.

    As with any calibration method, the selection of the independent variables plays a crucial role in developing a good PLS model. Thus, although most PLS applications use the entire wavelength range (i.e., the full-spectrum method), a number of alternative wavelength selection methods that exclude wavelengths associated with high noise, nonlinearity, or irrelevant information have been developed (Osborne et al. 1997; Goicoechea and Olivieri 1999; Centner et al. 1996; Jouan-Rimbaud et al. 1995; Espinosa-Mansilla et al. 2002; Leardi et al. 2002; Jiang et al. 2002; Galvão et al. 2001; Kompany-Zareh and Mirzaei 2004; Gourvenec et al. 2004; Abdollahi and Bagheri 2004; Ghasemi et al. 2003). However, many researchers find these methods difficult to apply because most of them involve complex algorithms and mathematical developments. Therefore, new methods that can be easily implemented are always welcome.

    In this work, we developed a simple, easy-to-implement method for the simultaneous selection of the wavelengths and number of factors to be used in the PLS regression of spectrophotometric data. For each analyte, the method involves constructing, from the same calibration set, one PLS-1 model with up to f factors for every possible wavelength range. Each model is then used to calculate the prediction residual sum of squares (PRESS) for the validation set with a variable number of factors, from 1 to f. For each analyte and number of factors, the model with the lowest PRESS that is significantly different from that of the model for the same interval with one factor fewer is chosen. The PRESS values of the models thus selected for each number of factors are then compared, and the one with the lowest PRESS that is significantly different from the value for the model with one factor fewer is chosen.
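    The model-building loop summarized above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' Unscrambler macros: the helper names `pls1_fit` and `press_grid` are hypothetical, PLS-1 is implemented with the standard NIPALS algorithm on mean-centred data, samples are stored as rows (the transpose of the [i × j] layout used in the text), and wavelengths are indexed from 0. Enumerating every contiguous range with at least three variables gives (i − 1)(i − 2)/2 models; for the 41-wavelength PB matrix (200–280 nm every 2 nm) that is 780 ranges.

```python
import numpy as np


def pls1_fit(X, y, n_factors, tol=1e-10):
    """PLS-1 by NIPALS on mean-centred data (samples in rows).

    Returns (x_mean, y_mean, coefs), where coefs[h - 1] is the regression
    vector of the model with h factors. Extraction stops early when the
    residual X-y covariance vanishes (no further factor can be extracted).
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q, coefs = [], [], [], []
    for _ in range(n_factors):
        w = Xr.T @ yr                      # weight vector (covariance with y)
        norm = np.linalg.norm(w)
        if norm < tol:
            break
        w = w / norm
        t = Xr @ w                         # scores
        tt = t @ t
        p = Xr.T @ t / tt                  # X loadings
        qa = (yr @ t) / tt                 # y loading
        Xr = Xr - np.outer(t, p)           # deflate X and y
        yr = yr - qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
        Wm, Pm = np.column_stack(W), np.column_stack(P)
        coefs.append(Wm @ np.linalg.solve(Pm.T @ Wm, np.array(q)))
    return x_mean, y_mean, coefs


def press_grid(X_cal, y_cal, X_val, y_val, n_factors, min_vars=3):
    """Validation PRESS for every contiguous wavelength range.

    Keys are (first, last) column indices, inclusive; values list the PRESS
    obtained with h = 1, 2, ... factors.
    """
    n_wl = X_cal.shape[1]
    grid = {}
    for first in range(n_wl - min_vars + 1):
        for last in range(first + min_vars - 1, n_wl):
            cols = slice(first, last + 1)
            x_mean, y_mean, coefs = pls1_fit(X_cal[:, cols], y_cal, n_factors)
            y_hat = [(X_val[:, cols] - x_mean) @ b + y_mean for b in coefs]
            grid[(first, last)] = [float(np.sum((yh - y_val) ** 2)) for yh in y_hat]
    return grid
```

    Because `pls1_fit` stops adding factors once the residual X–y covariance vanishes, narrow windows (or noise-free data) may report fewer than f PRESS values; the set of ranges is the same as in the step-by-step procedure, only the order of construction differs.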

    M. L. Luis et al.258


    The proposed method for the simultaneous selection of the wavelength range and number of factors was validated by applying it to the determination of two antiepileptics in pharmaceutical preparations, namely phenytoin (DPH) and phenobarbital (PB), whose UV spectra are completely overlapped. The results are compared with those provided by another method for selecting wavelengths and number of factors, the Bw coefficients method (Garrido et al. 1995), which was chosen because it is also simple and easy to implement.

    Garrido et al. (1995) developed a method for spectrophotometric data that selects the wavelengths with the highest absolute B coefficients in the PLS regression of the full spectrum. To give all variables identical variance, the data must first be mean-centered and standardized (i.e., weighted with the reciprocal of the standard deviation); the regression coefficients thus obtained are designated Bw. Matrix Bw can be calculated directly from the PLS loadings for the model with the optimum number of factors:

    $$B_w = W\,(P^{T} W)^{-1}\,Q^{T}$$

    where W is the X weight-loading matrix, P the X-loading matrix, and Q the Y-loading matrix. The centered, nonstandardized X matrix is then used to construct various models involving a small number of variables corresponding to a given level of Bw coefficients, and each model is subjected to leave-one-out cross-validation to determine the number of wavelengths to be included in the final model.
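    To make the Bw formula concrete, the following NumPy sketch (an illustration, not Garrido et al.'s implementation) autoscales X, fits PLS-1 by NIPALS, and evaluates $B_w = W(P^{T}W)^{-1}Q^{T}$, where for a single analyte Q reduces to a vector of y loadings. The function name `bw_coefficients` is hypothetical, and `n_factors` is assumed not to exceed the rank of X.

```python
import numpy as np


def bw_coefficients(X, y, n_factors):
    """Regression vector of a PLS-1 model fitted to autoscaled X.

    X is samples x wavelengths. Each column is mean-centred and divided by
    its standard deviation, so every variable has unit variance and |Bw|
    can be compared across wavelengths.
    """
    Xr = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yr = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)          # X weights
        t = Xr @ w                         # scores
        tt = t @ t
        p = Xr.T @ t / tt                  # X loadings
        qa = (yr @ t) / tt                 # y loading
        Xr = Xr - np.outer(t, p)           # deflation
        yr = yr - qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q))   # Bw = W (P^T W)^-1 q
```

    `np.argsort(-np.abs(bw))` then lists the wavelengths from largest to smallest |Bw|, which gives the candidate variables for the subsequent models built on the centred, nonstandardized X.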

    SELECTION METHODS

    Best Significant PRESS Method

    The process was conducted using macros in the Unscrambler software (Computer-Aided Modelling A/S, 1993). The steps leading to the best model are as follows:

    1. The spectra of the calibration and validation samples are recorded, and the calibration matrix (C), which is used to develop the models, and the validation matrix (O), which allows the best model to be identified, are constructed. Both matrices encompass i variables (wavelengths $n_1, n_2, \ldots, n_i$) and j samples.

    2. An $M_{n_1,n_i}$ model is then constructed by applying PLS-1 to the calibration matrix $C_{n_1,n_i}$ [i × j]; this model is used to predict the validation samples with 1, 2, …, h, …, f factors, and the $\mathrm{PRESS}^{n_1,n_i}_h$ value of each prediction $P^{n_1,n_i}_h$ made with h factors is calculated.

    3. The variable with the highest subscript, $n_i$, is removed from $C_{n_1,n_i}$ to obtain a calibration matrix $C_{n_1,n_{i-1}}$ [(i − 1) × j], which is used to construct an $M_{n_1,n_{i-1}}$ model by PLS-1; the corresponding predictions, $P^{n_1,n_{i-1}}_h$, and their PRESS values for each number h of factors, $\mathrm{PRESS}^{n_1,n_{i-1}}_h$, are then calculated.

    4. Step 3 is repeated as many times as needed to obtain matrix $C_{n_1,n_3}$ [3 × j], $M_{n_1,n_3}$, $P^{n_1,n_3}_h$, and $\mathrm{PRESS}^{n_1,n_3}_h$. In this way, every possible model having $n_1$ as its initial wavelength and at least three variables (i − 2 models) is constructed, and the PRESS of its predictions with 1, 2, …, h, …, f factors is calculated.

    5. The variable with the lowest subscript ($n_1$) is removed from the initial matrix, $C_{n_1,n_i}$ [i × j], to obtain $C_{n_2,n_i}$ [(i − 1) × j], from which model $M_{n_2,n_i}$ is constructed; the corresponding predictions, $P^{n_2,n_i}_h$, and $\mathrm{PRESS}^{n_2,n_i}_h$ values for each number of factors tested are then calculated.

    6. The variable with the highest subscript ($n_i$) is removed from $C_{n_2,n_i}$ [(i − 1) × j] to obtain $C_{n_2,n_{i-1}}$ [(i − 2) × j], and a model $M_{n_2,n_{i-1}}$ is constructed, from which $P^{n_2,n_{i-1}}_h$ and then $\mathrm{PRESS}^{n_2,n_{i-1}}_h$ are obtained.

    7. Step 6 is repeated until $C_{n_2,n_4}$ [3 × j], $M_{n_2,n_4}$, $P^{n_2,n_4}_h$, and $\mathrm{PRESS}^{n_2,n_4}_h$ are obtained. In this way, every possible model having $n_2$ as its initial wavelength and at least three variables (i − 3 models) is constructed.

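    8. Steps 5 through 7 are repeated, each time starting from the next wavelength, until the following series of calibration matrices is obtained:

    $$
    \begin{array}{l}
    C_{n_1,n_i};\; C_{n_1,n_{i-1}};\; C_{n_1,n_{i-2}};\; \ldots;\; C_{n_1,n_5};\; C_{n_1,n_4};\; C_{n_1,n_3}\\
    C_{n_2,n_i};\; C_{n_2,n_{i-1}};\; C_{n_2,n_{i-2}};\; \ldots;\; C_{n_2,n_5};\; C_{n_2,n_4}\\
    C_{n_3,n_i};\; C_{n_3,n_{i-1}};\; C_{n_3,n_{i-2}};\; \ldots;\; C_{n_3,n_5}\\
    \quad\vdots\\
    C_{n_{i-4},n_i};\; C_{n_{i-4},n_{i-1}};\; C_{n_{i-4},n_{i-2}}\\
    C_{n_{i-3},n_i};\; C_{n_{i-3},n_{i-1}}\\
    C_{n_{i-2},n_i}
    \end{array}
    $$

    together with their respective models ($M_{n_x,n_y}$), predictions ($P^{n_x,n_y}_h$), and $\mathrm{PRESS}^{n_x,n_y}_h$ values, where $y \ge x + 2$.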

    9. The optimum range for each number of factors, h, is the one with the minimum $\mathrm{PRESS}_h$ value that is significantly smaller than $\mathrm{PRESS}_{h-1}$ for the same range with one factor fewer. This is checked by calculating the F statistic and comparing it with its tabulated value for a probability of 0.75, which allows significant differences between models to be detected, following the methodology of Haaland and Thomas (1988). If the model with the lowest PRESS does not fulfill this requirement, the model with the next-lowest PRESS is chosen and the same criterion applied. This process is carried out for each of the 1, 2, …, h, …, f factors tested. Because the criterion is restrictive, it helps to avoid overfitting.


    10. The previous step yields f models (one per number of factors), the best of which is identified by applying the same criterion again: the most suitable model is the one with the lowest PRESS, provided that its calculated F value reflects a significant difference from the PRESS obtained with one factor fewer.
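    Steps 9 and 10 can be sketched in plain Python. The sketch assumes, as one reading of the Haaland and Thomas approach, that a difference is significant when $\mathrm{PRESS}_{h-1}/\mathrm{PRESS}_h$ exceeds a critical value F(0.75; n, n), with n the number of validation samples; `f_crit` must be taken from F tables (or, with SciPy, `scipy.stats.f.ppf(0.75, n, n)`), and the value 1.4 used below is only a placeholder. In step 10 each per-factor winner is compared with the winner for one factor fewer, which is an interpretation of the text; one-factor models have no reference and are accepted as they are. The function names are hypothetical.

```python
def first_significant(candidates, f_crit):
    """Lowest-PRESS candidate that is significantly better than its reference.

    candidates: list of (press_h, press_ref, label); press_ref is the PRESS
    with one factor fewer, or None when there is nothing to compare with.
    Returns (label, press_h), or None if no candidate passes.
    """
    for press_h, press_ref, label in sorted(candidates, key=lambda c: c[0]):
        if press_ref is None or press_ref > f_crit * press_h:   # F = ref / press_h
            return label, press_h
    return None


def best_significant_press(grid, f_crit):
    """grid maps a wavelength range to [PRESS_1, PRESS_2, ..., PRESS_f]."""
    f = max(len(v) for v in grid.values())
    winners = {}                                      # h -> (range, PRESS_h)
    for h in range(1, f + 1):                         # step 9
        cands = [(v[h - 1], v[h - 2] if h > 1 else None, rng)
                 for rng, v in grid.items() if len(v) >= h]
        best = first_significant(cands, f_crit)
        if best is not None:
            winners[h] = best
    cands = [(press, winners[h - 1][1] if h - 1 in winners else None, (h, rng))
             for h, (rng, press) in winners.items()]  # step 10
    return first_significant(cands, f_crit)


PB_TABLE1 = {            # PB rows of Table 1: PRESS with 1-4 factors
    (232, 238): [26.01, 1.68e-2, 2.01e-1, 2.31e-1],
    (228, 242): [25.75, 1.70e-2, 2.51e-1, 9.62e-2],
    (226, 244): [25.95, 1.74e-2, 1.12e-1, 1.75e-2],
    (214, 280): [42.61, 5.75e-2, 1.34e-2, 7.63e-2],
    (216, 276): [36.94, 4.65e-2, 1.36e-2, 1.94e-1],
    (218, 278): [32.19, 4.17e-2, 1.36e-2, 1.53e-1],
    (208, 278): [57.75, 8.93e-2, 4.69e-2, 1.23e-2],
    (208, 276): [57.75, 8.74e-2, 4.69e-2, 1.26e-2],
    (208, 274): [57.75, 8.74e-2, 4.69e-2, 1.26e-2],
}

if __name__ == "__main__":
    # f_crit = 1.4 is a placeholder for F(0.75; n, n), not a tabulated value
    print(best_significant_press(PB_TABLE1, f_crit=1.4))   # ((2, (232, 238)), 0.0168)
```

    Fed only the PB rows of Table 1 (a subset of all ranges), the sketch rejects four and three factors because their PRESS values are not significantly lower than those with one factor fewer, and returns two factors and 232–238 nm, the same choice reported in the Results.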

    Bw Coefficients Method

    In spectroscopic applications, the b coefficients (B) are those that relate the set of independent variables (X) to that of dependent variables (Y). These coefficients cannot be used as such to identify the most relevant wavelengths in the model, since a large absolute value may indicate either that a variable is significant or that it has a small absolute value and high variability. This shortcoming can be avoided by standardizing the data (i.e., by weighting the X variables with the reciprocal of their standard deviation, which gives all variables the same variance). In this way, a large absolute value of a bw coefficient corresponds to an important X variable, so the variables to be used to construct the model can be readily identified.

    To find the optimum number of wavelengths, one must examine every model constructed from the wavelengths whose bw coefficients (in absolute value) exceed specific, preset levels. An additional problem, always present in PLS regression, is the need to determine the optimum number of factors, which is calculated from the root mean square error of cross-validation (RMSECV):

    $$\mathrm{RMSECV} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^{2}}$$

    where $\hat{y}_i$ is the predicted concentration for the ith calibration sample, $y_i$ its actual concentration, and N the number of calibration samples. The optimum number of factors can be decided on by using two different criteria, namely:

    1. Criterion I: the number of factors corresponding to the local RMSECV minimum in a plot of RMSECV against the number of factors included in the model.

    2. Criterion II: the number of factors preceding the first local RMSECV minimum is chosen, provided that its RMSECV is not significantly greater than the minimum itself; otherwise, the local minimum is chosen. This is the criterion used by the Unscrambler software (Computer-Aided Modelling A/S, 1993).
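    The RMSECV formula and the two criteria can be sketched in plain Python. The sketch is illustrative: the function names are hypothetical, Criterion I is read as the first local minimum of the RMSECV curve, and because the text does not specify the significance test used in Criterion II, it is assumed here to be an F-type comparison of mean squared errors against a user-supplied `f_crit`.

```python
import math


def rmsecv(y_pred, y_true):
    """RMSECV = sqrt(sum((y_hat_i - y_i)^2) / N) over N held-out predictions."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / n)


def first_local_minimum(r):
    """0-based index of the first local minimum of an RMSECV curve."""
    for k in range(len(r)):
        left_ok = k == 0 or r[k] <= r[k - 1]
        right_ok = k == len(r) - 1 or r[k] <= r[k + 1]
        if left_ok and right_ok:
            return k
    return len(r) - 1


def criterion_one(r):
    """Number of factors at the first local RMSECV minimum (r[0] is 1 factor)."""
    return first_local_minimum(r) + 1


def criterion_two(r, f_crit):
    """Prefer one factor fewer unless its RMSECV is significantly larger."""
    k = first_local_minimum(r)
    if k > 0 and (r[k - 1] / r[k]) ** 2 <= f_crit:   # assumed F-type test
        return k
    return k + 1
```

    For the curve [0.9, 0.3, 0.25, 0.27, 0.2], Criterion I stops at the first local minimum (three factors) even though five factors give a lower RMSECV, and Criterion II returns two factors whenever the squared ratio 0.3/0.25 (1.44) does not exceed `f_crit`.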

    Depending on the particular bw levels chosen, sets with different numbers of variables are obtained. Each set is used to construct a model, and the resulting RMSECV values are plotted against the corresponding numbers of wavelengths. A minimum is obtained for each criterion used to select the optimum number of factors, and the number of wavelengths giving the lowest RMSECV is chosen as optimal.
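    The search over bw levels can be sketched as a short Python loop. This is an illustration under assumptions: `rmsecv_of` is a hypothetical, caller-supplied function that builds a model on the centred, nonstandardized X restricted to the given wavelength indices and returns its RMSECV at the optimum number of factors (chosen with Criterion I or II), and a wavelength is retained at a given level when its |bw| is at least that level. The function names are hypothetical.

```python
def bw_subsets(bw, levels):
    """Wavelength indices whose |bw| reaches each preset level."""
    return {lvl: [i for i, b in enumerate(bw) if abs(b) >= lvl] for lvl in levels}


def best_bw_level(bw, levels, rmsecv_of):
    """Return (level, indices, rmsecv) for the subset with the lowest RMSECV."""
    best = None
    for lvl, idx in bw_subsets(bw, levels).items():
        if not idx:
            continue                      # no wavelength reaches this level
        err = rmsecv_of(idx)
        if best is None or err < best[2]:
            best = (lvl, idx, err)
    return best


if __name__ == "__main__":
    bw = [0.10, -0.80, 0.50, 0.05, -0.30]          # toy coefficients
    fake_rmsecv = {5: 0.30, 3: 0.12, 2: 0.15, 1: 0.40}
    print(best_bw_level(bw, [0.0, 0.25, 0.4, 0.6],
                        lambda idx: fake_rmsecv[len(idx)]))   # (0.25, [1, 2, 4], 0.12)
```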

    EXPERIMENTAL SECTION

    Apparatus

    Spectra were recorded on a Hewlett-Packard HP 8452A diode array spectrophotometer (Avondale, PA, USA) interfaced to a Vectra ES computer, also from Hewlett-Packard. Quartz cuvettes of 1 cm light path and 4 mL inner volume were used throughout.

    Chromatographic runs were conducted on a Waters 600E liquid chromatograph (Waters; Milford, MA, USA) with a 20-µL loop, equipped with a Model 996 diode array detector from the same manufacturer. Both were operated via an NEC Power Mate 433 computer. All determinations were carried out using a NovaPack C18 column (3.9 × 150 mm) packed with 4 µm particles.

    pH measurements were made with a Crison digital pH meter (Crison; Barcelona, Spain) fitted with a glass/saturated calomel double electrode. A Selecta ultrasonic bath (R. Espinar; Sevilla, Spain) and an AND ER-182 analytical balance (Pacisa, S.A.; Madrid, Spain) were also used.

    Software

    The software used included Millennium version 2.10 (Millipore Corp.; Milford, MA, USA; 1993) and MS-DOS UV-Visible HP 8452A to control the chromatographic system and spectrophotometer, respectively, and Unscrambler II version 5.0 (Computer-Aided Modelling A/S; Trondheim, Norway; 1993) to perform the partial least squares regressions.

    Reagents

    The reagents employed included 100 µg mL⁻¹ standard solutions of DPH and PB, prepared by direct weighing of the respective pure commercial products (Sigma; Madrid, Spain) and dissolution in ethanol; a 0.2 M NH4Cl/NH3 buffer of pH 10.0; and a 10 mM TRIS buffer of pH 7.0. All reagents were of analytical grade, and aqueous solutions were prepared in Millipore/Milli-Q (Millipore; Bedford, MA, USA) deionized water.


    Procedure

    Calibration, Validation, and Prediction Sets

    The calibration, validation, and prediction sets were constructed using a factorial design with two analytes at four levels (4²). Concentrations in the calibration set were within the linear range for each analyte (viz., 1.97–9.85 µg mL⁻¹ for DPH and 2.02–10.08 µg mL⁻¹ for PB). The lowest and highest concentrations in the validation and prediction sets were, respectively, greater than the minimum and smaller than the maximum values used for the calibration set.

    Determination of DPH and PB in Synthetic Mixtures

    In 25 mL calibrated flasks were placed 5 mL of NH4Cl/NH3 buffer, the volumes of DPH and PB solutions required to obtain final concentrations over the ranges 1.97–9.85 and 2.02–10.08 µg mL⁻¹, respectively, and the volume of ethanol needed to make the mixture 20% (v/v) in ethanol. The solutions were made up to volume with deionized water, and their absorption spectra were recorded between 200 and 300 nm, every 2 nm, against an identically prepared blank containing no analyte.
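    The volumes needed for each mixture follow from C1V1 = C2V2. The Python sketch below is illustrative only: it takes the stock concentration as 100 µg mL⁻¹ for both drugs, assumes that the ethanol contributed by the stock solutions counts toward the 20% (v/v) total, and uses the prediction-set levels of Table 2 to build a 4² design. The function and constant names are hypothetical.

```python
from itertools import product

FLASK_ML = 25.0           # final volume of each mixture (mL)
STOCK_UG_PER_ML = 100.0   # DPH and PB stock solutions, both in ethanol
ETHANOL_FRACTION = 0.20   # final ethanol content (v/v)


def mixture_volumes(c_dph, c_pb):
    """mL of DPH stock, PB stock and extra ethanol for one 25 mL mixture."""
    v_dph = c_dph * FLASK_ML / STOCK_UG_PER_ML       # C1 V1 = C2 V2
    v_pb = c_pb * FLASK_ML / STOCK_UG_PER_ML
    v_ethanol = ETHANOL_FRACTION * FLASK_ML - v_dph - v_pb
    return v_dph, v_pb, v_ethanol


DPH_LEVELS = [2.17, 4.73, 7.09, 9.66]   # µg/mL, prediction set (Table 2)
PB_LEVELS = [2.22, 4.84, 7.26, 9.88]
DESIGN = list(product(DPH_LEVELS, PB_LEVELS))   # 4 x 4 = 16 mixtures

if __name__ == "__main__":
    for c_dph, c_pb in DESIGN:
        print(c_dph, c_pb, [round(v, 3) for v in mixture_volumes(c_dph, c_pb)])
```

    At the highest levels (9.66 and 9.88 µg mL⁻¹) the two stocks already supply 4.885 mL of the 5 mL of ethanol required, so only about 0.12 mL more needs to be added; the top-up never becomes negative within this design.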

    Determination of DPH and PB in Pharmaceutical Preparations Using PLS Regression

    DPH and PB were determined in Epilantin® (Otsuka Pharmaceutical, S.A.; Barcelona, Spain) by weighing, grinding, and homogenizing 10 tablets of the pharmaceutical and using six samples of 40 mg each for analysis. The drugs were extracted with 100 mL of ethanol in an ultrasonic bath for 20 min, and the extract was diluted to 250 mL with deionized water. Finally, a 3 mL aliquot of the resulting solution was analyzed as described in Determination of DPH and PB in Synthetic Mixtures.
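    The amount of each drug in a weighed portion can be back-calculated from the concentration found. This Python sketch assumes, as an interpretation of the procedure, that the 3 mL aliquot is made up to 25 mL exactly as in the synthetic-mixture procedure; the function name is hypothetical, and scaling the result to a per-tablet amount would also require the mean tablet mass, which is not given here.

```python
def drug_in_portion_mg(c_found_ug_per_ml, aliquot_ml=3.0, flask_ml=25.0,
                       extract_ml=250.0):
    """mg of drug in one weighed powder portion (40 mg in the text)."""
    c_extract = c_found_ug_per_ml * flask_ml / aliquot_ml   # µg/mL in the 250 mL extract
    return c_extract * extract_ml / 1000.0                  # µg -> mg


if __name__ == "__main__":
    # hypothetical measured value of 6.0 µg/mL in the 25 mL flask
    print(drug_in_portion_mg(6.0))   # 12.5
```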

    Determination of DPH and PB in Pharmaceutical Preparations Using HPLC

    The solutions described above were filtered through a polypropylene membrane of 0.4 µm pore size prior to injection into the chromatograph. Determinations were carried out using a NovaPack C18 column (3.9 × 150 mm) and a 60:40 mixture of methanol and 10 mM TRIS buffer of pH 7.0, circulated at a flow rate of 1 mL min⁻¹. The drugs were quantified spectrophotometrically from the areas under the peaks at 254 nm.


    RESULTS AND DISCUSSION

    As can be seen in Fig. 1, the absorption spectra of DPH and PB overlap completely throughout the UV region. Both drugs exhibit a strong peak at 212 nm; in addition, PB gives a maximum at 240 nm and DPH a shoulder at the same wavelength. DPH and PB absorb UV light at wavelengths up to 270 and 280 nm, respectively. This strong spectral overlap makes the mixture an appropriate model for assessing the performance of the proposed method for selecting the wavelength range and number of factors to be used in the determination.

    The applicability of the method was assessed by applying it to three different sample sets (viz., a calibration, a validation, and a prediction set). The calibration set was used to construct the PLS models for the different possible wavelength ranges ($M_{n_x,n_y}$). The validation set was used to select the prediction with the smallest errors (viz., the lowest significant $\mathrm{PRESS}^{n_x,n_y}_h$ value) among those obtained with the models built from the calibration set with a variable number of factors ($P^{n_x,n_y}_h$). Finally, the prediction set, which consisted of samples included in neither the calibration nor the validation set, was used to assess the predictive ability of the method.

    These three sets, prepared as described in the Experimental Section, were used to select the optimum wavelength range and number of factors for applying the PLS-1 method to the determination of each analyte in the sample.

    Figure 1. Absorption spectra of DPH and PB solutions.


    variables (200–242 and 200–244 nm) but did not improve on the previous results. The PRESS value obtained with two factors was significantly lower than that obtained with one factor in all three cases.

    As can be seen from Fig. 2b, the inclusion of a third factor decreased PRESS in the region of high values (226–232 nm) and increased it in the region of low values (valleys). The surface in the region where a peak was observed with two factors lost smoothness because the system was overfitted in most of these ranges. As can be seen from Table 1, the ranges providing the lowest PRESS values with two and three factors were virtually identical; however, the PRESS values obtained with two factors in the ranges corresponding to the minima obtained with three factors were significantly higher.

    Table 1. Minimum values of PRESS and corresponding wavelength ranges for the different numbers of factors assayed

    Range (nm)      PRESS by number of factors
                    1         2             3             4
    DPH
    Best models using 2 factors
    202–242         60.84     7.70×10⁻³*    9.50×10⁻³     6.05×10⁻²
    200–242         60.58     1.41×10⁻²     1.35×10⁻²     5.45×10⁻²
    200–244         61.58     1.86×10⁻²     8.60×10⁻³     5.37×10⁻²
    Best models using 3 factors
    200–246         61.58     3.69×10⁻²     8.10×10⁻³*    4.61×10⁻²
    200–248         61.97     5.46×10⁻²     8.30×10⁻³     4.20×10⁻²
    200–244         61.34     1.86×10⁻²     8.60×10⁻³     5.37×10⁻²
    Best models using 4 factors
    210–240         58.01     9.52×10⁻²     2.93×10⁻¹     1.73×10⁻²*
    212–240         46.75     5.85          2.21×10⁻²     2.20×10⁻²
    212–232         52.60     8.91×10⁻²     3.55×10⁻¹     2.29×10⁻²
    PB
    Best models using 2 factors
    232–238         26.01     1.68×10⁻²*    2.01×10⁻¹     2.31×10⁻¹
    228–242         25.75     1.70×10⁻²     2.51×10⁻¹     9.62×10⁻²
    226–244         25.95     1.74×10⁻²     1.12×10⁻¹     1.75×10⁻²
    Best models using 3 factors
    214–280         42.61     5.75×10⁻²     1.34×10⁻²*    7.63×10⁻²
    216–276         36.94     4.65×10⁻²     1.36×10⁻²     1.94×10⁻¹
    218–278         32.19     4.17×10⁻²     1.36×10⁻²     1.53×10⁻¹
    Best models using 4 factors
    208–278         57.75     8.93×10⁻²     4.69×10⁻²     1.23×10⁻²*
    208–276         57.75     8.74×10⁻²     4.69×10⁻²     1.26×10⁻²
    208–274         57.75     8.74×10⁻²     4.69×10⁻²     1.26×10⁻²

    * Minimum value of PRESS for each number of factors.

    Finally, the use of four factors yielded a less uniform surface (Fig. 2c). This may be because virtually all ranges were overfitted and factors strongly dependent on noise, or on some other source unrelated to the analyte concentration, were incorporated into the model. As can clearly be seen from Table 1, the lowest PRESS value obtained with four factors was more than twice that provided by two or three factors. Also, the second-best PRESS obtained with four factors (2.20×10⁻²) was not even significantly lower than that for the same range with three factors (2.21×10⁻²). If only the best range for each number of factors is considered, the 202–242 nm range with two factors yielded the lowest PRESS in the determination of DPH.

    Determination of PB

    The determination of PB in the presence of DPH was performed by constructing a calibration matrix similar to that used for DPH, except that the wavelengths were those in the range 200–280 nm (viz., the region where the analyte absorbs UV light). Following a procedure similar to that for DPH, matrix $C_{200,280}$ [41 × 16] was used to obtain all possible wavelength ranges from 200 to 280 nm with at least three variables. The models thus obtained were used to calculate the concentrations for the validation set with 1, 2, 3, and 4 factors, as well as the corresponding PRESS values.

    As with DPH, a single factor provided models that accounted for less than 95% of the initial variance and were poorly predictive. This suggests that the use of a single factor resulted in underfitting; in fact, the best PRESS value thus obtained was 7.81 (wavelength range 250–256 nm).

    Fig. 3a shows the PRESS values obtained with two factors. As can be seen, most of them, except those for ranges containing wavelengths below 230 or above 236 nm, were less than 0.5. As an exception, the ranges starting at 254 or 256 nm and ending at 276 or 278 nm had PRESS values around 0.020, even though their wavelengths were in the region of high PRESS levels. The ranges providing the best PRESS values with two factors were those centered at 235 nm and including 4, 8, or 10 variables. As can be seen from Table 1, the PRESS values obtained with two factors were three orders of magnitude lower than those provided by a single factor. Fig. 3b shows the PRESS values obtained with three factors. Most were lower than those obtained with two factors, except in the ranges with the lowest PRESS levels (viz., those centered at 235 nm and spanning 4, 6, 8, or 10 nm), where PRESS was greater than in the other ranges, probably because the system with three factors was overdimensioned. The ranges with the lowest PRESS values in the prediction of the PB concentration with three factors were quite wide; in fact, they spanned nearly the whole spectral zone studied. The best PRESS values were those for the 214–280, 216–276, and 218–278 nm ranges (1.34×10⁻², 1.36×10⁻², and 1.36×10⁻², respectively). As can be seen from Table 1, these three values are significantly smaller than those obtained for the same ranges with only two factors.

    The best PRESS values obtained with four factors (1.23×10⁻², 1.26×10⁻², and 1.26×10⁻²) were slightly lower than those obtained with three factors (1.34×10⁻², 1.36×10⁻², and 1.36×10⁻²). However, the shape of the PRESS surface (Fig. 3c) suggests that the models were overdimensioned. The lowest PRESS values obtained with four factors were virtually identical and corresponded to very similar ranges (viz., 208–278, 208–276, and 208–274 nm) (see Table 1).

    A comparison of the PRESS values for the ranges yielding the best results with each number of factors reveals that PRESS decreased as the number of factors increased. The differences, however, were not significant. We therefore adopted two factors and the wavelength range 232–238 nm as the best choice for the determination of PB.

    Figure 3. PRESS obtained for PB as a function of initial and final wavelengths by

    using (a) two factors, (b) three factors, and (c) four factors.


    Calibration and Validation Sets

    Fig. 4 shows the results obtained in the determination of DPH and PB by applying the PLS-1 method to the calibration and validation sets used to identify the optimum conditions (viz., prediction with two factors over the ranges 202–242 and 232–238 nm for DPH and PB, respectively). As can be seen, the results were quite satisfactory (all errors were less than 2.0%).

    With calibration by full cross-validation, the chosen models accounted for 99.95% and 99.99% of the initial validation variance in the concentration matrix for DPH and PB, respectively.

    Prediction Set

    The predictive ability of a model cannot be assessed with any of the samples used to establish its optimum parameter values (i.e., those in the calibration set) or to determine the optimum wavelength range and number of factors (i.e., those in the validation set). For this reason, we prepared a third sample set, the prediction set, whose concentrations were determined under the optimum conditions described above in order to assess the predictive ability of the proposed model.

    As can be seen from Table 2, the errors in the concentrations of both analytes were all less than 3.0%. Also, the amounts added and found were in good agreement, as reflected in the slope and intercept obtained in the mutual confidence region test (α = 0.05) (Mandel and Linnig 1957), which suggests that there is no systematic error in the prediction of either analyte.
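    The joint slope–intercept test can be sketched in plain Python. The sketch uses the usual joint confidence region for the intercept a and slope b of a least-squares line of found versus added concentrations: the ideal point (0, 1) lies inside the region when $F = \frac{n a^2 + 2a(b-1)\sum x + (b-1)^2 \sum x^2}{2 s^2}$ does not exceed the tabulated F(2, n − 2) at the chosen α, where $s^2$ is the residual variance. Whether this matches every detail of the authors' computation is an assumption; the critical value must be supplied from tables (for example `scipy.stats.f.ppf(0.95, 2, n - 2)` for α = 0.05), and the function name is hypothetical.

```python
def joint_region_f(added, found):
    """Least-squares intercept, slope and F statistic for H0: a = 0, b = 1."""
    n = len(added)
    sx, sy = sum(added), sum(found)
    sxx = sum(x * x for x in added)
    sxy = sum(x * y for x, y in zip(added, found))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)      # slope
    a = (sy - b * sx) / n                               # intercept
    s2 = sum((y - a - b * x) ** 2 for x, y in zip(added, found)) / (n - 2)
    db = b - 1.0
    q = n * a * a + 2.0 * a * db * sx + db * db * sxx
    return a, b, q / (2.0 * s2)


if __name__ == "__main__":
    # DPH, significant PRESS method (Table 2): added vs mean found, µg/mL
    added = [2.17] * 4 + [4.73] * 4 + [7.09] * 4 + [9.66] * 4
    found = [2.17, 2.11, 2.22, 2.18, 4.66, 4.69, 4.62, 4.67,
             7.07, 7.09, 7.07, 7.19, 9.64, 9.63, 9.59, 9.54]
    a, b, f_stat = joint_region_f(added, found)
    print(f"a = {a:.4f}, b = {b:.4f}, F = {f_stat:.2f}")
```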

    One question raised by the proposed procedure for establishing the optimum wavelength range and number of factors is to what extent the data matrix used to calculate PRESS influences the final result. This was studied by repeating the procedure described above but using the prediction set instead of the validation set in the calculations. Fig. 5 shows the PRESS values obtained in the different wavelength ranges used to predict the concentrations of the prediction matrix with two factors. As can be seen, the resulting surface was very similar to that for the PRESS of the validation matrix (Fig. 3a).

    The lowest PRESS values found in this way were 7.40×10⁻³ and 8.30×10⁻³, corresponding to the ranges 234–240 and 232–238 nm, respectively. These ranges, which contain four variables each, are very similar; also, the latter (232–238 nm) coincides with the lowest-PRESS range obtained previously. Consequently, the proposed method does not appear to be significantly affected by the specific matrix used to optimize the wavelength range and number of factors, since using two different matrices to calculate PRESS gives virtually the same results.


    Figure 4. Relative errors obtained in the determination of DPH and PB in (a) the calibration set and (b) the validation set, using PLS-1 with two factors in the ranges 202–242 nm and 232–238 nm, respectively.


    Table 2. Results obtained for the determination of DPH and PB in the prediction set by using PLS-1 with two factors in the ranges 202–242 nm and 232–238 nm, respectively, and by the bw coefficients method

                      Significant PRESS method                               bw coefficients method
    Added (µg/mL)     Found (µg/mL)                 Relative error (%)       Found (µg/mL)
    DPH     PB        DPH            PB              DPH       PB            DPH            PB
    2.17    2.22      2.17 ± 0.09    2.23 ± 0.03      0.00      0.45         2.13 ± 0.02    2.2…
    2.17    4.84      2.11 ± 0.04    4.86 ± 0.04     −2.76      0.41         2.2 ± 0.02     4.…
    2.17    7.26      2.22 ± 0.04    7.26 ± 0.05      2.30      0.00         2.18 ± 0.05    7.2…
    2.17    9.88      2.18 ± 0.06    9.86 ± 0.02      0.46     −0.20         2.11 ± 0.03    9.…
    4.73    2.22      4.66 ± 0.03    2.25 ± 0.02     −1.48      1.35         4.69 ± 0.01    2.2…
    4.73    4.84      4.69 ± 0.03    4.90 ± 0.03     −0.85      1.24         4.76 ± 0.03    4.…
    4.73    7.26      4.62 ± 0.04    7.25 ± 0.01     −2.33     −0.14         4.72 ± 0.04    7.2…
    4.73    9.88      4.67 ± 0.04    9.89 ± 0.02     −1.27      0.10         4.79 ± 0.04    9.…
    7.09    2.22      7.07 ± 0.04    2.22 ± 0.02     −0.28      0.00         7.11 ± 0.04    2.…
    7.09    4.84      7.09 ± 0.05    4.84 ± 0.02      0.00      0.00         7.08 ± 0.04    4.…
    7.09    7.26      7.07 ± 0.03    7.29 ± 0.02     −0.28      0.41         7.12 ± 0.02    7.…
    7.09    9.88      7.19 ± 0.04    9.90 ± 0.03      1.41      0.20         7.19 ± 0.06    9.9…
    9.66    2.22      9.64 ± 0.05    2.22 ± 0.01     −0.21      0.00         9.63 ± 0.02    2.2…
    9.66    4.84      9.63 ± 0.04    4.87 ± 0.03     −0.31      0.62         9.71 ± 0.05    4.…
    9.66    7.26      9.59 ± 0.04    7.27 ± 0.03     −0.72      0.14         9.88 ± 0.03    7.2…
    9.66    9.88      9.54 ± 0.09    9.86 ± 0.02     −1.24     −0.20         9.45 ± 0.04    9.…
    PRESS             5.98×10⁻²      8.30×10⁻³                               1.20×10⁻¹      1.9…


    To confirm the benefits of wavelength selection, Table 3 shows the
    PRESS obtained in the determination of DPH and PB by using PLS-1 for
    the models built with the full-spectrum method (200–270 nm for DPH and
    200–280 nm for PB) and with the best ranges obtained from the proposed
    wavelength selection method (202–242 nm for DPH and 232–238 nm for PB).
    PRESS values decrease substantially for the best model found by the
    significant PRESS method, which shows that wavelength selection improves
    the results of multivariate calibration methods. Likewise, Fig. 6 shows that
    the relative errors in the determination of DPH and PB are lower when the
    wavelength range is optimized.
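The improvement in Table 3 can be summarized as the ratio of prediction-set PRESS values for the full-spectrum and selected-range models. Here PRESS is the sum of squared differences between the found concentration ĉᵢ and the added concentration cᵢ over the n samples of a set:

```latex
\begin{aligned}
\mathrm{PRESS} &= \sum_{i=1}^{n} \left(\hat{c}_i - c_i\right)^2 \\
\frac{\mathrm{PRESS}_{\mathrm{DPH}}(200\text{--}270\,\mathrm{nm})}{\mathrm{PRESS}_{\mathrm{DPH}}(202\text{--}242\,\mathrm{nm})}
  &= \frac{2.03\times10^{-1}}{5.98\times10^{-2}} \approx 3.4 \\
\frac{\mathrm{PRESS}_{\mathrm{PB}}(200\text{--}280\,\mathrm{nm})}{\mathrm{PRESS}_{\mathrm{PB}}(232\text{--}238\,\mathrm{nm})}
  &= \frac{5.54\times10^{-2}}{8.30\times10^{-3}} \approx 6.7
\end{aligned}
```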

    Figure 5. PRESS obtained for determination of PB in the prediction set.

    Table 3. PRESS obtained for the determination of DPH and PB by using PLS-1
    with two factors in the ranges selected by the significant PRESS method and in the
    full-spectrum method

                     DPH                           PB
    Set              202–242 nm    200–270 nm      232–238 nm    200–280 nm
    Calibration      2.50 × 10⁻²   1.81 × 10⁻¹     6.90 × 10⁻³   5.00 × 10⁻²
    Validation       7.70 × 10⁻³   8.97 × 10⁻²     1.68 × 10⁻²   1.41 × 10⁻¹
    Prediction       5.98 × 10⁻²   2.03 × 10⁻¹     8.30 × 10⁻³   5.54 × 10⁻²


    Figure 6. Relative errors obtained in the determination of (a) DPH and (b) PB in the

    prediction set by using PLS-1 with two factors in the ranges selected by the significant

    PRESS method () and in the full-spectrum method (A).


    bw Coefficients Method

    This method was implemented by conducting a PLS-1 regression on the calibration
    matrix for each analyte (prepared as described in the previous sections)
    after centering and standardization of the data. The resulting models were
    validated by full (leave-one-out) cross-validation.
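The bw coefficients step can be sketched in Python with NumPy: PLS-1 on centered, standardized data, validated by full cross-validation. The sketch rests on three assumptions:

- NIPALS PLS-1 stands in for the software implementation used by the authors.
- "Full cross-validation" is taken to be leave-one-out.
- The regression vector obtained on standardized (unit-variance) variables is treated as the bw coefficients.

Setting `standardize=False` gives the centered, nonstandardized variant used later for the variable subsets. All function names are hypothetical.

```python
import numpy as np


def pls1_coefficients(X, y, n_factors):
    """NIPALS PLS-1 on mean-centered X (n x p) and y (n,); returns the regression vector."""
    X = np.asarray(X, dtype=float).copy()
    y = np.asarray(y, dtype=float).copy()
    W, P, Q = [], [], []
    for _ in range(n_factors):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        p = X.T @ t / tt
        q = (y @ t) / tt
        X -= np.outer(t, p)
        y -= q * t
        W.append(w); P.append(p); Q.append(q)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(Q))


def fit_predict(Xcal, ycal, Xnew, n_factors, standardize=True):
    """Center (and optionally scale to unit variance) with calibration statistics only;
    return predictions for Xnew and the regression vector on the scaled variables."""
    x_mean, y_mean = Xcal.mean(axis=0), ycal.mean()
    x_scale = Xcal.std(axis=0, ddof=1) if standardize else np.ones(Xcal.shape[1])
    b = pls1_coefficients((Xcal - x_mean) / x_scale, ycal - y_mean, n_factors)
    return y_mean + ((Xnew - x_mean) / x_scale) @ b, b


def rmsecv(X, y, n_factors, standardize=True):
    """Leave-one-out root mean square error of cross-validation."""
    n = len(y)
    residuals = []
    for i in range(n):
        keep = np.arange(n) != i
        pred, _ = fit_predict(X[keep], y[keep], X[i:i + 1], n_factors, standardize)
        residuals.append(pred[0] - y[i])
    return float(np.sqrt(np.mean(np.square(residuals))))
```

With a spectra matrix `X` and concentrations `y`, `fit_predict(X, y, X, 2)[1]` gives one coefficient per wavelength for a two-factor model. Evaluating `rmsecv` for 1, 2, 3, ... factors gives a curve of the kind shown in Fig. 7.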

    Fig. 7 shows the RMSECV for each analyte as a function of the number of
    factors used in the prediction. As can be seen, the optimum number of factors
    was two, both with the criterion of the first local RMSECV minimum and with
    that of the RMSECV value not significantly different from the first local
    minimum.

    Fig. 8 shows the variation of the bw coefficients for each analyte as a
    function of wavelength with two factors. The curves exhibit two distinct

    portions, above and below about 226 nm. Below 226 nm, both DPH and
    PB exhibit bw coefficients of large magnitude (positive for the former and
    negative for the latter) at 212–214 nm, which is the zone most strongly
    influenced by DPH. Above 226 nm, the bw coefficients are again large (negative
    for DPH and positive for PB) at 242–244 nm, where PB exhibits an absorption
    maximum. Also, some regions are not appreciably influenced by either

    analyte. Such is the case at wavelengths above 266 nm, where both analytes
    absorb very weakly; near 226 nm, where the bw coefficients change sign and are
    thus very small; and below 206 nm, where the absorption spectra of both
    analytes fully overlap.

    Figure 7. RMSECV as a function of the number of factors obtained by PLS-1 using
    standardized data of the whole spectra: DPH () and PB (B).
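The two factor-selection criteria mentioned for Fig. 7 can be written as small helper functions. The sketch assumes that the RMSECV list starts at one factor. The significance test is not specified here, so it is replaced by a hypothetical relative tolerance (`tolerance`). Both choices are illustrative.

```python
def first_local_minimum(rmsecv_values):
    """Number of factors (1-based) at the first local minimum of an RMSECV curve.
    rmsecv_values[0] is assumed to correspond to one factor."""
    for k in range(len(rmsecv_values) - 1):
        if rmsecv_values[k] <= rmsecv_values[k + 1]:
            return k + 1
    return len(rmsecv_values)


def fewest_factors_not_worse(rmsecv_values, tolerance):
    """Smallest number of factors whose RMSECV is within a relative `tolerance` of the
    first local minimum (a hypothetical stand-in for a formal significance test)."""
    k_min = first_local_minimum(rmsecv_values)
    target = rmsecv_values[k_min - 1] * (1.0 + tolerance)
    for k in range(k_min):
        if rmsecv_values[k] <= target:
            return k + 1
    return k_min
```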

    The next step involved selecting wavelengths according to the corresponding
    bw value: the larger the absolute value of the coefficient, the more significant
    the wavelength concerned. To determine the number of variables to be
    considered, a series of sets was constructed, each consisting of the wavelengths
    whose absolute bw values were at or above a preset limit. Table 4 shows the
    bw limit values used to include or exclude variables in the models, as well as
    the number of variables contained in each.
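The variable sets in Table 4 are built by thresholding the absolute bw coefficients. The sketch below assumes a wavelength is kept when |bw| is at or above the limit. This is consistent with a limit of 0 retaining all 41 variables. The function names and the example values are hypothetical.

```python
def select_wavelengths(wavelengths, bw, limit):
    """Keep the wavelengths whose absolute bw coefficient is at least `limit`."""
    return [wl for wl, b in zip(wavelengths, bw) if abs(b) >= limit]


def nested_subsets(wavelengths, bw, limits):
    """One wavelength subset per limit value (a larger limit gives fewer variables)."""
    return {limit: select_wavelengths(wavelengths, bw, limit) for limit in limits}


# Hypothetical bw coefficients, for illustration only
example = nested_subsets([200, 202, 204, 206, 208], [0.1, -2.0, 0.5, 3.0, -0.05],
                         [0.0, 0.3, 2.5])
for limit, subset in example.items():
    print(limit, len(subset), subset)
```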

    The PLS-1 models for these sets were then constructed from centered,
    nonstandardized data, and the respective RMSECV values were calculated. The
    model yielding the lowest RMSECV was chosen as optimal, and the number of
    variables it contained was adopted as the most suitable. Fig. 9 shows the
    RMSECV values obtained with the different models tested in the determination
    of DPH and PB. As can be seen, the optimum number of factors was two for all
    models and both analytes, whatever the number of variables used. Also, RMSECV
    increased slightly with the number of variables, particularly from 20 to 24
    variables for DPH and from 24 to 28 for PB. The lowest RMSECV in the
    determination of DPH, 0.0312, was obtained with four variables; the resulting
    model accounted for 99.0% of the initial variance in the data. The lowest
    RMSECV for PB, 0.0195, was obtained with eight variables, and the resulting
    model accounted for 99.4% of the initial variance in the data. Note that the
    wavelengths contained in the optimum model for determining DPH (240–246 nm)
    were included in that for the determination of PB (236–248 nm).

    Figure 8. bw coefficients of DPH () and PB ( ).
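The final choice among the candidate sets takes the lowest RMSECV over all sets and numbers of factors, which can be written as a single minimum over tuples. The sketch is illustrative. The input layout, the tie-breaking rule (fewer variables first, then fewer factors) and the example curves are assumptions, not values from Fig. 9.

```python
def best_subset(rmsecv_by_subset):
    """rmsecv_by_subset maps n_variables -> list of RMSECV values (index 0 = one factor).
    Returns (n_variables, n_factors, rmsecv) for the overall lowest RMSECV; ties go to
    the model with fewer variables, then fewer factors."""
    candidates = [
        (value, n_vars, k + 1)
        for n_vars, curve in rmsecv_by_subset.items()
        for k, value in enumerate(curve)
    ]
    value, n_vars, n_factors = min(candidates)
    return n_vars, n_factors, value


# Hypothetical RMSECV curves for three variable subsets
print(best_subset({41: [0.5, 0.06, 0.061], 8: [0.4, 0.035, 0.036], 4: [0.3, 0.031, 0.0315]}))
```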

    To compare this variable selection method with the proposed one, the best
    model provided by the bw coefficients method was used to determine the
    concentrations of the samples in the prediction set that had been used to
    assess the model selected by the significant PRESS method. The results, shown
    in Table 2, reveal that the predictions were reasonably accurate, with relative
    errors less than 3.0% and 1.5% for DPH and PB, respectively. Also, the slopes
    and intercepts evaluated with the mutual confidence region method reveal no
    signs of systematic errors in the quantitation of either analyte.
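A mutual confidence region test checks whether the ideal point (intercept 0, slope 1) falls inside the joint confidence ellipse of the least-squares line of found versus added concentrations. The sketch below uses the standard joint ellipse for a straight-line fit. The critical value F(2, n - 2) must be supplied, for example from tables or from `scipy.stats.f.ppf(0.95, 2, n - 2)` if SciPy is available. It is about 3.74 for 16 prediction samples. Function names are hypothetical, and this is not necessarily the exact variant used by the authors.

```python
def ols_line(x, y):
    """Least-squares intercept a, slope b and residual variance s2 for y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return a, b, s2


def ideal_point_inside_joint_region(added, found, f_crit):
    """True if (intercept, slope) = (0, 1) lies inside the joint confidence ellipse
    n*da^2 + 2*sum(x)*da*db + sum(x^2)*db^2 <= 2 * s2 * F(2, n - 2)."""
    a, b, s2 = ols_line(added, found)
    n = len(added)
    da, db = a, b - 1.0
    sx, sxx = sum(added), sum(xi * xi for xi in added)
    q = n * da ** 2 + 2 * sx * da * db + sxx * db ** 2
    return q <= 2 * s2 * f_crit
```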

    As can also be seen from Table 2, the PRESS values obtained with the
    significant PRESS method (viz., 5.98 × 10⁻² for DPH and 8.30 × 10⁻³ for PB)
    are smaller than those provided by the bw coefficients method (viz.,
    1.20 × 10⁻¹ for DPH and 1.96 × 10⁻² for PB). Consequently, the proposed
    method provides better results.

    Table 4. Limit values for bw coefficients and corresponding
    number of variables in the construction of the models

    Model    DPH      PB       Number of variables
    1        0        0        41
    2        0.130    0.135    36
    3        0.263    0.254    32
    4        0.508    0.528    28
    5        0.915    0.699    24
    6        1.358    0.953    20
    7        1.793    1.354    16
    8        2.206    1.729    12
    9        2.566    2.553    8
    10       2.831    3.230    4

    Figure 9. RMSECV as a function of the number of variables for (a) DPH and (b) PB.
    (The number of factors used is indicated on the corresponding curve.)

    Determination of DPH and PB in Commercial Pharmaceutical Preparations

    Once the optimum wavelength range and number of factors for determining
    DPH and PB with the lowest possible errors had been identified, the two
    analytes were quantified simultaneously in the commercial pharmaceutical
    preparation Epilantin®. The predictive ability of the proposed model (viz.,
    PLS-1 with two factors and the wavelength ranges 202–242 nm for DPH
    and 232–238 nm for PB) was assessed by quantifying the samples using
    HPLC as the reference method. As can be seen from Table 5, there was good
    consistency between the declared contents and those found with both PLS and
    HPLC.
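The Table 5 means can be expressed as recoveries of the declared content and as relative differences from the HPLC reference. This is simple arithmetic; the helper names below are illustrative.

```python
def recovery_percent(found_mg, declared_mg):
    """Found content as a percentage of the declared content per tablet."""
    return 100.0 * found_mg / declared_mg


def relative_difference_percent(value, reference):
    """Relative difference (%) of a result with respect to the reference method."""
    return 100.0 * (value - reference) / reference


declared = {"DPH": 100.0, "PB": 50.0}
pls = {"DPH": 99.04, "PB": 49.61}    # Table 5, PLS
hplc = {"DPH": 99.71, "PB": 49.88}   # Table 5, HPLC
for drug in declared:
    print(drug,
          round(recovery_percent(pls[drug], declared[drug]), 2),
          round(recovery_percent(hplc[drug], declared[drug]), 2),
          round(relative_difference_percent(pls[drug], hplc[drug]), 2))
```

For the Table 5 means, this gives recoveries between 99.0% and 99.8%. The PLS results differ from the HPLC results by less than 0.7%.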

    CONCLUSIONS

    In this work, we developed a simple, easily implemented method for the
    simultaneous selection of the optimum continuous wavelength range and
    number of factors on the basis of the prediction errors obtained by applying
    PLS to a sample set, with a view to resolving mixtures of analytes. The use
    of the selected wavelength range was shown to improve the results relative
    to those obtained with the full-spectrum method. The method was applied to
    the spectrophotometric determination of two antiepileptics (viz., phenytoin
    and phenobarbital) in both synthetic mixtures and a pharmaceutical
    preparation, and was found to provide better results than other existing,
    easy-to-implement methods.


    Table 5. Determination of DPH and PB in pharmaceuticals
    (100 mg of DPH and 50 mg of PB declared per tablet)

               DPH (mg) ± SD    PB (mg) ± SD
    PLSᵃ       99.04 ± 0.88     49.61 ± 0.35
    HPLCᵇ      99.71 ± 0.92     49.88 ± 0.28

    ᵃValue provided by Unscrambler. ᵇValue obtained from the straight-line model.
