
Calibration transfer based on maximum margin criterion for qualitative analysis using Fourier Transform Infrared spectroscopy†

Yong Hu,∗ Silong Peng, Yiming Bi, and Liang Tang

Received Xth XXXXXXXXXX 20XX, Accepted Xth XXXXXXXXX 20XX
First published on the web Xth XXXXXXXXXX 200X
DOI: 10.1039/b000000x

Traditional multivariate calibration transfer methods such as piecewise direct standardization (PDS) are usually applied to quantitative analysis. To extend such methods to qualitative analysis with Fourier transform infrared spectroscopy (FTIR), we propose an improved calibration transfer method based on the maximum margin criterion (CTMMC). The new method not only considers the spectral changes under different measurement conditions but also takes into account the geometric characteristics of spectra from different classes, so that the transformed spectra of different classes are separated as far as possible; this improves the performance of the subsequent qualitative analysis. CTMMC is compared with other traditional calibration transfer methods on two data sets, and the experimental results show that it achieves better performance than the previous methods.

1 Introduction

The overall goal of qualitative analysis using Fourier transform infrared spectroscopy is classification. The classification step is often accomplished with one of several well-established techniques, including principal component analysis (PCA)1, k-nearest neighbors (KNN)2, support vector machines (SVM)3 and regularized discriminant analysis (RDA)4. Nevertheless, a practical limitation of calibration models arises when an existing model is applied to spectra measured under new environmental conditions or on a different instrument. Even when identical samples are measured, the spectral variation captured by the model differs between the response matrices recorded under the two conditions. For this reason, a model developed on one instrument cannot be used directly on spectra from a second instrument to predict sample properties. One solution to this calibration transfer problem is to re-measure every sample and build a new model for the newly acquired spectra, but this is usually expensive and time-consuming. An alternative is to apply chemometric techniques that correct the differences caused by a change of environment5 or instrument6, which makes the model robust and avoids full recalibration.

Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, P. R. China. E-mail: [email protected]; Tel: +86 10 62520293.
† Electronic Supplementary Information (ESI) available: [details of any supplementary information available should be included here]. See DOI: 10.1039/b000000x/

Various calibration transfer techniques have been developed for quantitative analysis over the past years7. Much research has been carried out on calibration transfer of infrared (IR) spectra, for example simple slope and bias correction8, direct standardization (DS)9 and piecewise direct standardization (PDS)8; these standardization methods transform the spectra from the new instrument so that they resemble those from the original instrument. An alternative approach to correcting spectral differences is data preprocessing, which, unlike standardization, does not require transfer samples. Multiplicative signal correction (MSC)10 and finite impulse response (FIR)11 filtering use a linear regression procedure to correct the secondary spectra against a reference spectrum, usually the mean of the calibration set acquired on the primary instrument, whereas standard normal variate (SNV)10 scaling centers and scales each spectrum by its own mean and standard deviation. In addition, new techniques have been developed by modifying earlier calibration transfer algorithms12–14. Fan et al.15 proposed a calibration transfer method based on canonical correlation analysis (CTCCA) and showed that canonical correlation analysis (CCA) is a powerful tool that is especially well suited for relating two sets of measurements.
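The two transfer-free corrections can be illustrated with a short NumPy sketch. It is not the implementation of refs 10 or 11: `msc` and `snv` are hypothetical helper names, the MSC reference defaults to the mean of the input spectra (for transfer, one would pass the mean calibration spectrum from the primary instrument), SNV uses the sample standard deviation (`ddof=1`) by assumption, and the FIR filter of ref. 11 is not shown.

```python
import numpy as np

def msc(X, reference=None):
    """Multiplicative signal correction of each row of X against a reference spectrum.

    Each spectrum x is regressed on the reference (x ~ a + b * ref) and
    corrected as (x - a) / b. The reference defaults to the mean of X.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        b, a = np.polyfit(ref, x, 1)  # slope b and intercept a of the linear fit
        corrected[i] = (x - a) / b
    return corrected

def snv(X):
    """Standard normal variate: center and scale each spectrum by its own mean and std."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    centered = X - X.mean(axis=1, keepdims=True)
    return centered / X.std(axis=1, ddof=1, keepdims=True)

# Hypothetical example: a secondary spectrum with a gain of 1.3 and an offset of 0.2.
ref = np.sin(np.linspace(0.0, 3.0, 50)) + 1.0
secondary = 1.3 * ref + 0.2
print(np.allclose(msc(secondary, reference=ref)[0], ref))  # True
```

Because SNV uses only each spectrum's own statistics, a positive gain and an offset applied to a whole spectrum vanish after SNV, whereas MSC needs the reference spectrum to estimate them.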

In quantitative analysis, the traditional PDS8 method is regarded as a direct transfer technique with good performance and has been applied successfully to a wide range of data sets. This standardization method assumes that the spectral information at a given wavelength on one instrument (the master instrument) is related to the information in a small spectral region on the other instrument (the slave instrument). A local multivariate model (transfer matrix) is computed for each spectral window around a given wavelength of the slave instrument, which reduces the risk of over-fitting. PDS thus maps spectra from the slave instrument, on which they are collected, to the master instrument, on which the calibration model was developed, and the transfer matrix is chosen so that the mapped slave spectra approximate the master spectra. In qualitative analysis, however, the goal is classification, so the transformed samples of different classes should be separated as far as possible; class separability therefore needs to be considered when developing the transfer matrix.

In this paper, we propose a novel calibration transfer method based on the maximum margin criterion (CTMMC) to improve the robustness of the transfer matrix in qualitative analysis. The method adds a regularization term (the maximum margin criterion) to the traditional PDS formulation when computing the transfer matrix; this term encourages the samples transformed by the transfer matrix to remain separable by class, which is useful for qualitative analysis.

2 Theory and algorithm

2.1 Piecewise direct standardization

Piecewise direct standardization (PDS)8 directly relates the response of a sample measured on the master instrument to its response obtained on the slave instrument. This linear relationship is described by the transformation matrix $\mathbf{M}$ according to:

$$\mathbf{A}_m = \mathbf{A}_s\mathbf{M} \tag{1}$$

where $\mathbf{A}_m$ and $\mathbf{A}_s$ are the response matrices of the standardization samples obtained on the master and slave instruments, respectively, with one sample per row. Once $\mathbf{M}$ has been calculated, a new sample $\mathbf{x}$ (a row vector) measured on the slave instrument is projected into the master instrument measurement space, giving the corresponding vector $\hat{\mathbf{x}}$:

$$\hat{\mathbf{x}} = \mathbf{x}\mathbf{M} \tag{2}$$

In PDS, the response of the standardization samples measured at wavelength j on the master instrument, $\mathbf{A}_m^{j}$, is related to the responses at the wavelengths located in a small window around j measured on the slave instrument:

$$\mathbf{A}_m^{j} = \mathbf{A}_s^{j}\mathbf{m}_j \tag{3}$$

where $\mathbf{A}_s^{j}$ is the localized response matrix of the standardization samples measured on the slave instrument and $\mathbf{m}_j$ is the vector of transformation coefficients for the j-th wavelength. The regression vectors calculated for each window are then assembled to form a banded diagonal matrix $\mathbf{M}$ according to:

$$\mathbf{M} = \mathrm{diag}(\mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_j, \ldots, \mathbf{m}_k) \tag{4}$$

where k is the number of wavelengths. The response of any new sample measured on the slave instrument can then be standardized as if it had been measured on the master instrument using Eq. 2.
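The windowed regression of Eqs. 3 and 4 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the toolbox routine used in this work: each local model $\mathbf{m}_j$ is fitted by ordinary least squares (PDS implementations often use PCR or PLS within each window instead), the window is truncated at the spectrum edges, and `pds_transfer_matrix`, `standardize` and the synthetic instrument shift are hypothetical.

```python
import numpy as np

def pds_transfer_matrix(A_m, A_s, width):
    """Banded PDS transfer matrix M (Eqs. 3-4).

    A_m, A_s : (n_samples, n_wavelengths) standardization spectra from the
               master and slave instruments, one sample per row.
    width    : odd window width; column j is fitted on slave channels
               j - (width - 1) // 2 ... j + (width - 1) // 2, truncated at the edges.
    """
    n_wl = A_s.shape[1]
    half = (width - 1) // 2
    M = np.zeros((n_wl, n_wl))
    for j in range(n_wl):
        lo, hi = max(0, j - half), min(n_wl, j + half + 1)
        # Eq. 3 by ordinary least squares (an assumption; PCR/PLS are also common).
        m_j, *_ = np.linalg.lstsq(A_s[:, lo:hi], A_m[:, j], rcond=None)
        M[lo:hi, j] = m_j  # m_j fills only the window rows of column j
    return M

def standardize(X_slave, M):
    """Eq. 2: map slave spectra (rows) into the master measurement space."""
    return np.asarray(X_slave) @ M

# Hypothetical instruments: the master reads 0.8 times the slave's previous
# channel (a one-channel shift plus a gain change).
rng = np.random.default_rng(1)
A_s = rng.normal(size=(40, 25))
A_m = np.empty_like(A_s)
A_m[:, 0] = A_s[:, 0]
A_m[:, 1:] = 0.8 * A_s[:, :-1]
M = pds_transfer_matrix(A_m, A_s, width=5)
print(np.allclose(standardize(A_s, M), A_m))  # True
```

The synthetic data are built so that every master channel lies in the span of its slave window, so the fit is exact; with real spectra it is not, and ω has to be chosen on validation data as in Section 5.1.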

2.2 Maximum margin criterion

Suppose we have a set of n samples $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ (column vectors) belonging to c classes. The within-class scatter matrix $\mathbf{S}_w$ and the between-class scatter matrix $\mathbf{S}_b$ are computed as follows16:

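$$\mathbf{S}_w = \sum_{k=1}^{c}\sum_{i=1}^{n_k}\left(\mathbf{x}_i^{(k)} - \boldsymbol{\mu}^{(k)}\right)\left(\mathbf{x}_i^{(k)} - \boldsymbol{\mu}^{(k)}\right)^T \tag{5}$$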

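$$\mathbf{S}_b = \sum_{k=1}^{c} n_k\left(\boldsymbol{\mu}^{(k)} - \boldsymbol{\mu}\right)\left(\boldsymbol{\mu}^{(k)} - \boldsymbol{\mu}\right)^T \tag{6}$$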

where $(\cdot)^T$ denotes the matrix transpose, $\boldsymbol{\mu}$ is the total sample mean vector, $n_k$ is the number of samples in the k-th class, $\boldsymbol{\mu}^{(k)}$ is the mean vector of the k-th class, and $\mathbf{x}_i^{(k)}$ is the i-th sample in the k-th class. Define the total scatter matrix:

$$\mathbf{S}_t = \sum_{i=1}^{n}\left(\mathbf{x}_i - \boldsymbol{\mu}\right)\left(\mathbf{x}_i - \boldsymbol{\mu}\right)^T \tag{7}$$

where $n = \sum_{k=1}^{c} n_k$, and $\mathbf{S}_t = \mathbf{S}_w + \mathbf{S}_b$ (ref. 17). Define:

$$J = \mathrm{tr}(\mathbf{S}_w - \mathbf{S}_b) \tag{8}$$

where $\mathrm{tr}(\cdot)$ denotes the matrix trace. Since $\mathrm{tr}(\mathbf{S}_b)$ measures the overall variance of the class mean vectors, a large $\mathrm{tr}(\mathbf{S}_b)$ implies that the class means are widely scattered, while a small $\mathrm{tr}(\mathbf{S}_w)$ implies that each class has a small spread. Minimizing J therefore pushes the classes apart and gives them a maximum margin. The maximum margin criterion J may represent class separability better than PCA1: PCA maximizes the total scatter $\mathrm{tr}(\mathbf{S}_t)$ after a linear transformation, but because $\mathbf{S}_t = \mathbf{S}_b + \mathbf{S}_w$, a data set with a large within-class scatter can have a large total scatter even when its between-class scatter is small, and such data are not easy to classify.
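Eqs. 5 to 8 translate directly into array operations. The sketch below uses hypothetical names (`scatter_matrices`, `mmc_J`) and synthetic data, stores samples as rows (so each $\mathbf{x}_i$ is a row of `X`), checks the identity $\mathbf{S}_t = \mathbf{S}_w + \mathbf{S}_b$, and shows that moving the class means apart while keeping the within-class spread fixed lowers J.

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class, between-class and total scatter (Eqs. 5-7).

    X : (n_samples, n_features); row i is the sample x_i of the text.
    y : class label of each row.
    """
    mu = X.mean(axis=0)
    p = X.shape[1]
    S_w = np.zeros((p, p))
    S_b = np.zeros((p, p))
    for k in np.unique(y):
        X_k = X[y == k]
        mu_k = X_k.mean(axis=0)
        D = X_k - mu_k
        S_w += D.T @ D                 # sum_i (x_i - mu_k)(x_i - mu_k)^T
        d = (mu_k - mu)[:, None]
        S_b += len(X_k) * (d @ d.T)    # n_k (mu_k - mu)(mu_k - mu)^T
    D_t = X - mu
    return S_w, S_b, D_t.T @ D_t       # S_t = sum_i (x_i - mu)(x_i - mu)^T

def mmc_J(X, y):
    """J = tr(S_w - S_b) of Eq. 8; a smaller J means tighter, farther-apart classes."""
    S_w, S_b, _ = scatter_matrices(X, y)
    return np.trace(S_w - S_b)

# Hypothetical data: identical within-class noise, class means 0.1 or 5 apart.
rng = np.random.default_rng(2)
y = np.repeat([0, 1], 30)
noise = rng.normal(size=(60, 4))
X_near = noise + 0.1 * y[:, None]
X_far = noise + 5.0 * y[:, None]
S_w, S_b, S_t = scatter_matrices(X_far, y)
print(np.allclose(S_t, S_w + S_b), mmc_J(X_far, y) < mmc_J(X_near, y))  # True True
```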

2.3 Calibration transfer based on maximum margin criterion

Assume that standardization samples from c classes are measured on both a master instrument and a slave instrument, and that $\mathbf{A}_m$ and $\mathbf{A}_s$ are their response matrices as defined in Eq. 1. In qualitative analysis, the transfer matrix $\mathbf{M}$ should not only reflect the spectral changes between instruments but also preserve the separability of samples from different classes. To embody these two roles, we propose the following optimization framework:

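$$\mathbf{M}^{*} = \arg\min_{\mathbf{M}} \|\mathbf{A}_m - \mathbf{A}_s\mathbf{M}\|_F^2 + \lambda\,\mathrm{tr}\left(\mathbf{M}^T(\mathbf{S}_w - \mathbf{S}_b)\mathbf{M}\right) \tag{9}$$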

where $\|\cdot\|_F$ denotes the Frobenius norm, and $\mathbf{S}_w$ and $\mathbf{S}_b$ denote the within-class and between-class scatter matrices of the samples measured on the slave instrument ($\mathbf{A}_s$), respectively. The first term on the right-hand side of Eq. 9 measures how well the transfer matrix $\mathbf{M}$ maps the slave spectra onto the master spectra, while the second term is the maximum margin criterion J for the transferred samples in the master instrument measurement space (details are given in Appendix A). The parameter λ controls the trade-off between approximation accuracy and class separability; when λ = 0, the problem reduces to the ordinary PDS problem. Problem (9) is solved by setting its partial derivative with respect to $\mathbf{M}$ equal to zero18:

$$\mathbf{A}_s^T\mathbf{A}_m = \left[\mathbf{A}_s^T\mathbf{A}_s + \lambda(\mathbf{S}_w - \mathbf{S}_b)\right]\mathbf{M} \tag{10}$$

To avoid over-fitting, we replace $\mathbf{A}_m$ in Eq. 1 by $\mathbf{A}_s^T\mathbf{A}_m$ and $\mathbf{A}_s$ by $\mathbf{A}_s^T\mathbf{A}_s + \lambda(\mathbf{S}_w - \mathbf{S}_b)$, and compute the transfer matrix $\mathbf{M}$ window by window in the same way as PDS (Section 2.1). We call the proposed method calibration transfer based on the maximum margin criterion (CTMMC).
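The text does not spell out how the windowing interacts with the penalty term, so the following is one possible reading, written as a hypothetical NumPy helper `ctmmc_transfer_matrix`. If $\mathbf{M}$ in Eq. 9 is restricted to the PDS band, both $\|\mathbf{A}_m - \mathbf{A}_s\mathbf{M}\|_F^2$ and $\mathrm{tr}(\mathbf{M}^T(\mathbf{S}_w - \mathbf{S}_b)\mathbf{M})$ split into a sum over columns, so each column solves Eq. 10 restricted to its own window. With λ = 0 this reduces to windowed least squares (PDS), and with a window covering all channels it reproduces the global solution of Eq. 10. One caution: because $\mathbf{A}_s^T\mathbf{A}_s = \mathbf{S}_w + \mathbf{S}_b + n\boldsymbol{\mu}\boldsymbol{\mu}^T$, the matrix in Eq. 10 equals $(1+\lambda)\mathbf{S}_w + (1-\lambda)\mathbf{S}_b + n\boldsymbol{\mu}\boldsymbol{\mu}^T$, which can be indefinite for λ > 1, in which case Eq. 10 gives a stationary point of Eq. 9 rather than a minimum.

```python
import numpy as np

def scatter_wb(X, y):
    """Within- and between-class scatter of the rows of X (Eqs. 5-6)."""
    mu = X.mean(axis=0)
    p = X.shape[1]
    S_w, S_b = np.zeros((p, p)), np.zeros((p, p))
    for k in np.unique(y):
        X_k = X[y == k]
        mu_k = X_k.mean(axis=0)
        S_w += (X_k - mu_k).T @ (X_k - mu_k)
        d = (mu_k - mu)[:, None]
        S_b += len(X_k) * (d @ d.T)
    return S_w, S_b

def ctmmc_transfer_matrix(A_m, A_s, y_std, width, lam):
    """Banded solution of Eq. 9, one window at a time (hypothetical helper).

    Column j solves the windowed form of Eq. 10:
        (A_sW^T A_sW + lam * S_WW) m_j = A_sW^T a_mj,  with S = S_w - S_b.
    """
    S_w, S_b = scatter_wb(A_s, y_std)      # scatter of the slave spectra
    S = S_w - S_b
    n_wl = A_s.shape[1]
    half = (width - 1) // 2
    M = np.zeros((n_wl, n_wl))
    for j in range(n_wl):
        W = slice(max(0, j - half), min(n_wl, j + half + 1))
        X_W = A_s[:, W]
        G = X_W.T @ X_W + lam * S[W, W]
        # lstsq also copes with a singular G (e.g. lam = 0 and a wide window).
        m_j, *_ = np.linalg.lstsq(G, X_W.T @ A_m[:, j], rcond=None)
        M[W, j] = m_j
    return M

# Hypothetical standardization set: 2 classes x 20 samples, 12 channels.
rng = np.random.default_rng(3)
y_std = np.repeat([0, 1], 20)
A_s = rng.normal(size=(40, 12)) + 2.0 * y_std[:, None]
A_m = 1.1 * A_s + 0.05 * rng.normal(size=(40, 12))
M_ctmmc = ctmmc_transfer_matrix(A_m, A_s, y_std, width=5, lam=1.0)
```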

3 Data set

3.1 Chinese liquor Guotai samples

This data set consists of FTIR spectra of Chinese liquor Guotai samples measured in our Multi-dimensional Data Analysis laboratory (MDA). The spectra were recorded over 4000–650 cm−1 with a Perkin-Elmer Spectrum GX FTIR spectrometer equipped with the Universal ATR Sampling Accessory (ZnSe cell), at a spectral resolution of 4 cm−1 with sixteen co-added scans per spectrum. The data set contains samples from two classes measured on two different instruments: the PerkinElmer Spectrum 400 (SP400) and Spectrum 100 (SP100) FTIR spectrometers. We partition the data according to the sample collection process rather than with an algorithmic division such as the Kennard–Stone algorithm19. The samples are divided into three sets: 104 samples measured on SP400 form the calibration set, 543 samples measured on SP100 form the prediction set, and 100 samples independently measured on SP100 form the validation set used for parameter selection. In addition, 40 standardization samples measured on both the master (SP400: $\mathbf{A}_m$) and slave (SP100: $\mathbf{A}_s$) instruments are used to compute the transfer matrix $\mathbf{M}$. The partition is summarized in Table 1, and the spectra of the standardization samples measured on both instruments are shown in Fig. 1.

3.2 Pharmaceutical tablet samples

This data set consists of FTIR spectra of 655 pharmaceutical tablet samples measured on two instruments, made available through the Software Shootout at the IDRC 2002 (International Diffuse Reflectance Conference, Chambersburg) (www.eigenvector.com/data/tablets/index.html). The tablets have specified weights ranging from 363.9 to 390.99, and each spectrum contains 650 variables from 600 to 1898 nm in 2 nm increments (16667–5269 cm−1). These samples were originally used for quantitative analysis; we use them for qualitative analysis by dividing them into two classes according to weight: samples weighing more than 379.5 form one class, and samples weighing less than 378 form the other. 395 calibration samples are measured on Instrument 1 (INS 1), 102 prediction samples and 35 validation samples are measured on Instrument 2 (INS 2), and 35 standardization samples are measured on both INS 1 and INS 2. The partition is summarized in Table 2, and the spectra of the standardization samples measured on both instruments are shown in Fig. 2.

Table 1 Sample partition for the Chinese liquor Guotai data set

| Data set | Number | Instrument |
|---|---|---|
| Calibration set | 104 (54 + 50) | SP400 (master) |
| Prediction set | 543 (223 + 320) | SP100 (slave) |
| Validation set | 100 (50 + 50) | SP100 |
| Standardization samples | 40 (20 + 20) | SP400, SP100 |

The values in parentheses are the numbers of samples in each of the two classes.

[Figure: three panels of absorbance versus wavenumber (cm−1), about 1000–4000 cm−1]
Fig. 1 Spectra of the 40 standardization samples measured on the master (above) and slave (middle) instruments, and the difference spectra between the master and slave instruments (below).

Table 2 Sample partition for the pharmaceutical tablet data set

| Data set | Number | Instrument |
|---|---|---|
| Calibration set | 395 (197 + 198) | INS 1 (master) |
| Prediction set | 102 (63 + 39) | INS 2 (slave) |
| Validation set | 35 (21 + 14) | INS 2 |
| Standardization samples | 35 (9 + 26) | INS 1, INS 2 |

The values in parentheses are the numbers of samples in each of the two classes.

[Figure: three panels of absorbance versus wavenumber (cm−1), about 0.6–1.6 × 10^4 cm−1]
Fig. 2 Spectra of the 35 standardization samples measured on the master (above) and slave (middle) instruments, and the difference spectra between the master and slave instruments (below).

[Figure: two panels of absorbance versus wavenumber (cm−1); legends "Guotai sample spectrum" / "Pharmaceutical tablet sample spectrum" and "Region used for calibration"]
Fig. 3 Guotai sample spectrum (above) and pharmaceutical tablet sample spectrum (below).
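The weight-based class rule for the tablet data can be written as a small function. It is only a sketch: `weight_class` is a hypothetical name, the labels 0 and 1 are arbitrary, and apart from 363.9 and 390.99 (the ends of the stated weight range) the example weights are made up. Samples whose weight lies between 378 and 379.5 fall into neither class.

```python
def weight_class(weight, upper=379.5, lower=378.0):
    """Class rule of Section 3.2: 1 if weight > upper, 0 if weight < lower, else None."""
    if weight > upper:
        return 1
    if weight < lower:
        return 0
    return None  # weights in [lower, upper] are not assigned to either class

weights = [363.9, 377.2, 378.6, 379.5, 385.0, 390.99]  # mostly made-up example weights
print([weight_class(w) for w in weights])  # [0, 0, None, None, 1, 1]
```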

4 Experimental

4.1 Data processing and modeling

We use the 2101–1001 cm−1 region for the Guotai samples, and the 16667–15631 cm−1 and 7376–6147 cm−1 regions for the pharmaceutical tablet samples. The first calibration sample of each data set (black) and the regions used in the following discussion (blue) are shown in Fig. 3. We compare CTMMC with traditional PDS8, CTCCA15 and the spectral preprocessing methods MSC10, SNV10 and FIR11. FIR has a single adjustable parameter, the window width ω; for all methods, the optimal parameters are selected on the validation set.
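Because the tablet spectra are recorded on a wavelength axis, selecting the wavenumber regions above requires the conversion wavenumber (cm−1) = 10^7 / wavelength (nm); for example, 10^7/600 ≈ 16667 and 10^7/1898 ≈ 5269. The sketch below builds a boolean mask over the 650 tablet variables; `nm_to_wavenumber` and `region_mask` are hypothetical helpers, and treating the region bounds as inclusive is an assumption.

```python
import numpy as np

def nm_to_wavenumber(wavelength_nm):
    """Wavenumber in cm^-1 from wavelength in nm (1 cm = 1e7 nm)."""
    return 1e7 / np.asarray(wavelength_nm, dtype=float)

def region_mask(wavenumbers, regions):
    """True for variables inside any (high, low) wavenumber region, bounds inclusive."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    mask = np.zeros(wavenumbers.shape, dtype=bool)
    for high, low in regions:
        mask |= (wavenumbers <= high) & (wavenumbers >= low)
    return mask

wavelengths_nm = np.arange(600, 1900, 2)   # 600, 602, ..., 1898 nm (650 variables)
wavenumbers = nm_to_wavenumber(wavelengths_nm)
tablet_mask = region_mask(wavenumbers, [(16667, 15631), (7376, 6147)])
print(wavelengths_nm.size, tablet_mask.sum())  # 650 156
```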

The experiments with PDS and CTMMC consist of the following steps:

(1) Build a classifier from the calibration samples using linear discriminant analysis (LDA)17.
(2) Calculate the transfer matrix M from the standardization samples measured on the master and slave instruments.
(3) Project the prediction samples measured on the slave instrument into the master instrument measurement space.
(4) Classify the projected prediction samples with the classifier built in step (1).
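Steps (1) to (4) can be wired together in a self-contained sketch. Two substitutions are assumptions: the classifier is a two-class Fisher discriminant (a simple form of LDA, using a pseudo-inverse of the within-class scatter), and the transfer matrix is the least-squares PDS of Section 2.1. All data, gains and helper names (`fit_lda`, `predict_lda`, `pds_transfer_matrix`) are hypothetical. Because the synthetic slave instrument only rescales each channel, the projected prediction spectra should coincide with their master counterparts.

```python
import numpy as np

def pds_transfer_matrix(A_m, A_s, width):
    """Banded PDS transfer matrix, one least-squares fit per window (Section 2.1)."""
    n_wl = A_s.shape[1]
    half = (width - 1) // 2
    M = np.zeros((n_wl, n_wl))
    for j in range(n_wl):
        lo, hi = max(0, j - half), min(n_wl, j + half + 1)
        M[lo:hi, j] = np.linalg.lstsq(A_s[:, lo:hi], A_m[:, j], rcond=None)[0]
    return M

def fit_lda(X, y):
    """Two-class Fisher LDA: w = pinv(S_w)(mu_1 - mu_0), threshold halfway between projected means."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    D = np.vstack([X[y == 0] - mu0, X[y == 1] - mu1])
    w = np.linalg.pinv(D.T @ D) @ (mu1 - mu0)
    return w, 0.5 * (mu0 + mu1) @ w

def predict_lda(model, X):
    w, threshold = model
    return (X @ w > threshold).astype(int)

# Hypothetical data: 30 channels, class 1 shifted by +1; the slave instrument
# multiplies channel j by a gain between 0.5 and 1.5.
rng = np.random.default_rng(0)
n_wl = 30
gains = np.linspace(0.5, 1.5, n_wl)

def master_spectra(n_per_class):
    X = np.vstack([rng.normal(size=(n_per_class, n_wl)),
                   rng.normal(size=(n_per_class, n_wl)) + 1.0])
    return X, np.repeat([0, 1], n_per_class)

X_cal, y_cal = master_spectra(50)        # calibration set, master instrument
A_m, _ = master_spectra(20)              # standardization samples on the master...
A_s = A_m * gains                        # ...and the same samples on the slave
X_pred_m, y_pred = master_spectra(100)   # prediction samples, master view (reference only)
X_pred_s = X_pred_m * gains              # prediction samples as measured on the slave

model = fit_lda(X_cal, y_cal)                   # (1) classifier from the calibration set
M = pds_transfer_matrix(A_m, A_s, width=5)      # (2) transfer matrix
X_pred_t = X_pred_s @ M                         # (3) project into the master space
y_hat = predict_lda(model, X_pred_t)            # (4) classify the projected samples
print("accuracy:", np.mean(y_hat == y_pred))
```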

4.2 Evaluation measures

To evaluate two-class classification performance fairly, we use sensitivity and specificity. Sensitivity is the accuracy on the positive (minority) samples, true positives / (true positives + false negatives), while specificity is the accuracy on the negative (majority) samples, true negatives / (true negatives + false positives). Kubat et al.20 suggested the g-means metric, defined as:

$$\text{g-means} = \sqrt{\text{sensitivity} \times \text{specificity}} \tag{11}$$

This measure has been used by several researchers to evaluate classifiers on imbalanced data sets21,22, and we use it to compare the methods in this paper. We also report the overall classification accuracy, so that the two measures can be compared.
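Eq. 11 and the overall accuracy are easy to compute from predicted labels. In the sketch below, `g_means` and `accuracy` are hypothetical helpers and the minority class is assumed to be labeled 1. The example reuses the class sizes of the Guotai prediction set (223 + 320, Table 1) and assigns every sample to the 223-sample class; it illustrates how an accuracy of 0.41 can coexist with a g-means of 0, and is our reading of the "No" column of Table 3 rather than data from the experiments.

```python
import math

def g_means(y_true, y_pred, positive=1):
    """Eq. 11: sqrt(sensitivity * specificity) for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)   # accuracy on the positive (minority) class
    specificity = tn / (tn + fp)   # accuracy on the negative (majority) class
    return math.sqrt(sensitivity * specificity)

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Class sizes of the Guotai prediction set; every sample predicted as class 1.
y_true = [1] * 223 + [0] * 320
y_pred = [1] * 543
print(round(accuracy(y_true, y_pred), 2), g_means(y_true, y_pred))  # 0.41 0.0
```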

4.3 Software

All computations are performed in-house in MATLAB R2008a (The MathWorks, Inc.) on a personal computer with a 2.20 GHz Intel Core 2 processor, 2 GB of RAM and the Windows XP operating system. The CTMMC algorithm was written in our Multi-dimensional Data Analysis laboratory∗; the PDS algorithm is taken from PLS_Toolbox 4.0 (Eigenvector Research, Manson, WA)23. All other routines, such as calibration model building, PDS and performance evaluation, are performed with our own programs in the MATLAB environment.

5 Results and discussion

5.1 The parameter effect for PDS method

PDS has only one adjustable parameter, the window width ω, so a suitable window width must be determined before the PDS model is developed. In this experiment, ω is varied from 1 to 100 in steps of 2 for both data sets. For each ω, the transfer matrix M is developed from the standardization samples with PDS, and the classifier developed from the calibration samples is then used to predict the validation samples projected with M. As shown in Fig. 4, for the Guotai samples the g-means of the validation set increases at small ω and then decreases as ω grows; we therefore set ω = 9 for PDS. For the pharmaceutical tablet samples, g-means fluctuates somewhat with ω, and we set ω = 13 for this data set in the following discussion.

∗ http://mda.ia.ac.cn/English/codedata/codedata_index.htm

[Figure: g-means of the validation samples versus window width ω (0–100), one curve per data set]
Fig. 4 Variation of g-means for validation samples with the window width ω for PDS: Guotai samples (blue), pharmaceutical tablet samples (red).

5.2 The parameter effect for CTMMC method

CTMMC has two adjustable parameters: the window width ω and the regularization parameter λ. As Eq. 9 shows, λ sets the trade-off between approximation accuracy and class separability. If λ is too small, the optimization favors approximating the master spectra from the slave spectra through M and pays little attention to class separability, which may reduce the classification accuracy of the final model; if λ is too large, it favors class separability at the expense of approximation accuracy, which may also reduce the classification accuracy. In this study, ω is again varied from 1 to 100 in steps of 2, and λ is varied from $10^{-5}$ to $10^{5}$ in multiplicative steps of 10 for both data sets. For each combination of ω and λ, the transfer matrix M is developed from the standardization samples with CTMMC, and the classifier developed from the calibration samples is then used to predict the validation samples projected with M. We consider the two parameters jointly and use a contour plot to choose the optimal combination. The variation of g-means with the two parameters is shown in Fig. 5 and Fig. 6 for the two data sets. As shown in Fig. 5, for the Guotai samples we set λ = 1 and ω = 17 in the following discussion; as shown in Fig. 6, for the pharmaceutical tablet samples we set λ = 0.01 and ω = 71.

[Figure: contour plot of g-means for Guotai samples; x-axis window width ω, y-axis regularization parameter log10(λ) from −5 to 5; contour levels 0.2 to 0.8]
Fig. 5 Variation of g-means for Guotai samples with the two parameters of CTMMC.

[Figure: contour plot of g-means for pharmaceutical tablet samples; x-axis window width ω, y-axis regularization parameter log10(λ) from −5 to 5; contour levels 0.2 to 0.8]
Fig. 6 Variation of g-means for pharmaceutical tablet samples with the two parameters of CTMMC.
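The joint search over ω and λ described in Sections 5.1 and 5.2 amounts to a grid search. In the sketch below the grids follow the text (odd widths 1 to 99, λ from $10^{-5}$ to $10^{5}$ in factors of 10), but `toy_score` is a hypothetical stand-in for "build M, transfer the validation set, compute g-means": an arbitrary smooth surface whose peak is placed at ω = 17 and λ = 1 purely for illustration.

```python
import itertools
import math

# Parameter grids described in Section 5.2.
widths = list(range(1, 100, 2))               # omega = 1, 3, ..., 99
lambdas = [10.0 ** e for e in range(-5, 6)]   # lambda = 1e-5, 1e-4, ..., 1e5

def grid_search(score, widths, lambdas):
    """Return (best_score, width, lam) maximizing score(width, lam)."""
    return max((score(w, lam), w, lam) for w, lam in itertools.product(widths, lambdas))

def toy_score(width, lam):
    """Hypothetical stand-in for the validation g-means; peaks at width 17, lambda 1."""
    return math.exp(-((width - 17) / 20) ** 2 - math.log10(lam) ** 2 / 4)

best_score, best_width, best_lam = grid_search(toy_score, widths, lambdas)
print(len(widths) * len(lambdas), best_width, best_lam)  # 550 17 1.0
```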

5.3 Prediction results

Using the optimal parameters above, we compare CTMMC with PDS, CTCCA, MSC, SNV and FIR. To show the effect of calibration transfer, we also give the prediction results obtained without any transfer (No). All methods are developed with the calibration samples and applied to the prediction samples. The results are shown in Table 3.

For the Guotai samples, the prediction results clearly improve with every calibration transfer method (PDS, CTMMC, CTCCA, MSC, SNV, FIR). Because the calibration and prediction samples are measured on different instruments, and a model developed on one instrument generally cannot be used directly on spectra from a second instrument, the prediction result without a transfer matrix is poor: although the accuracy is 0.41, all samples of one class are misclassified, so g-means is 0. Compared with the other calibration transfer methods, CTMMC obtains the best results for both g-means and accuracy (0.86/0.85).

Table 3 Prediction results (g-means/overall accuracy) with different calibration transfer methods

| Data set | No | PDS | CTMMC | CTCCA | MSC | SNV | FIR |
|---|---|---|---|---|---|---|---|
| Guotai samples | 0/0.41 | 0.76/0.75 | 0.86/0.85 | 0.61/0.60 | 0.44/0.52 | 0.54/0.58 | 0.73/0.73 (69) |
| Pharmaceutical tablet samples | 0.15/0.58 | 0.49/0.52 | 0.63/0.66 | 0.61/0.60 | 0/0.62 | 0/0.62 | 0.45/0.49 (5) |

The values in parentheses are the FIR window sizes.

For the pharmaceutical tablet samples, MSC and SNV reach an accuracy of 0.62, yet both misclassify every sample of one class (g-means 0), which shows why g-means is the more informative measure here. CTMMC (0.63/0.66) performs better than the other methods, although its margin over CTCCA (0.61/0.60) is small. We attribute this to computing the transfer matrix with both spectral approximation and class separability in mind: the transfer matrix obtained with CTMMC reflects the differences between the master and slave instruments, and, because of the maximum margin criterion, the prediction samples transformed from the slave instrument measurement space into the master instrument measurement space retain a degree of class separability, which benefits the subsequent qualitative analysis.

In this paper, parameter selection for CTMMC has been investigated. One difficulty is choosing the optimal window width, because an unsuitable window can introduce transfer artifacts; another is choosing the regularization parameter, which sets the trade-off between approximation accuracy and class separability. For the Guotai and pharmaceutical tablet samples, the selected λ is 1 and 0.01 and the selected ω is 17 and 71, respectively. Making parameter selection adapt to the data set remains an open question for future work.

6 Conclusions

The traditional calibration transfer method PDS is mostly used for quantitative analysis. To make it more effective for qualitative analysis, we have improved PDS with the maximum margin criterion. Experimental results on two data sets demonstrate the effectiveness of the proposed algorithm.

Acknowledgements

The research is supported in part by the National Natural Science Foundation of China (60972126), the Joint Funds of the National Natural Science Foundation of China (U0935002/L05), the Beijing Municipal Natural Science Foundation (4102060) and the State Key Program of National Natural Science of China (61032007).

Appendix A

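Suppose the transfer matrix is $\mathbf{M}$ and we have a set of n standardization samples belonging to c classes, $\mathbf{A}_s = (\mathbf{x}_{s1}, \mathbf{x}_{s2}, \ldots, \mathbf{x}_{sn})^T$, where $\mathbf{x}_{si}$ (a column vector) is the i-th sample measured on the slave instrument. From Eq. 2, the samples transferred with $\mathbf{M}$ are:

$$\hat{\mathbf{A}}_m = \mathbf{A}_s\mathbf{M} = (\mathbf{x}_{s1}, \mathbf{x}_{s2}, \ldots, \mathbf{x}_{sn})^T\mathbf{M} = (\mathbf{M}^T\mathbf{x}_{s1}, \mathbf{M}^T\mathbf{x}_{s2}, \ldots, \mathbf{M}^T\mathbf{x}_{sn})^T$$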

where $\hat{\mathbf{A}}_m$, obtained via Eq. 2, approximates $\mathbf{A}_m$ of Eq. 1. The within-class scatter matrix $\hat{\mathbf{S}}_w$ and the between-class scatter matrix $\hat{\mathbf{S}}_b$ of the transferred samples $\hat{\mathbf{A}}_m$ are then computed according to Eqs. 5 and 6:

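$$
\begin{aligned}
\hat{\mathbf{S}}_w &= \sum_{k=1}^{c}\sum_{i=1}^{n_k}\left(\mathbf{M}^T\mathbf{x}_{si}^{(k)} - \mathbf{M}^T\boldsymbol{\mu}^{(k)}\right)\left(\mathbf{M}^T\mathbf{x}_{si}^{(k)} - \mathbf{M}^T\boldsymbol{\mu}^{(k)}\right)^T \\
&= \mathbf{M}^T\left\{\sum_{k=1}^{c}\sum_{i=1}^{n_k}\left(\mathbf{x}_{si}^{(k)} - \boldsymbol{\mu}^{(k)}\right)\left(\mathbf{x}_{si}^{(k)} - \boldsymbol{\mu}^{(k)}\right)^T\right\}\mathbf{M} \\
&= \mathbf{M}^T\mathbf{S}_w\mathbf{M}
\end{aligned}
$$

$$
\begin{aligned}
\hat{\mathbf{S}}_b &= \sum_{k=1}^{c} n_k\left(\mathbf{M}^T\boldsymbol{\mu}^{(k)} - \mathbf{M}^T\boldsymbol{\mu}\right)\left(\mathbf{M}^T\boldsymbol{\mu}^{(k)} - \mathbf{M}^T\boldsymbol{\mu}\right)^T \\
&= \mathbf{M}^T\left\{\sum_{k=1}^{c} n_k\left(\boldsymbol{\mu}^{(k)} - \boldsymbol{\mu}\right)\left(\boldsymbol{\mu}^{(k)} - \boldsymbol{\mu}\right)^T\right\}\mathbf{M} \\
&= \mathbf{M}^T\mathbf{S}_b\mathbf{M}
\end{aligned}
$$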

where $\mathbf{S}_w$ and $\mathbf{S}_b$ are the within-class and between-class scatter matrices of the sample matrix $\mathbf{A}_s$, respectively. Thus, the maximum margin criterion J for the transferred samples in the master instrument measurement space is:

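$$
\begin{aligned}
J &= \mathrm{tr}(\hat{\mathbf{S}}_w - \hat{\mathbf{S}}_b) \\
&= \mathrm{tr}(\mathbf{M}^T\mathbf{S}_w\mathbf{M} - \mathbf{M}^T\mathbf{S}_b\mathbf{M}) \\
&= \mathrm{tr}\left(\mathbf{M}^T(\mathbf{S}_w - \mathbf{S}_b)\mathbf{M}\right)
\end{aligned}
$$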
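The identity derived above holds for any transfer matrix, which makes it easy to check numerically. The sketch compares the criterion computed from the scatter of the transferred samples $\mathbf{A}_s\mathbf{M}$ with $\mathrm{tr}(\mathbf{M}^T(\mathbf{S}_w - \mathbf{S}_b)\mathbf{M})$; the data, the random $\mathbf{M}$ and the helper `scatter_wb` are hypothetical.

```python
import numpy as np

def scatter_wb(X, y):
    """Within- and between-class scatter of the rows of X (Eqs. 5-6)."""
    mu = X.mean(axis=0)
    p = X.shape[1]
    S_w, S_b = np.zeros((p, p)), np.zeros((p, p))
    for k in np.unique(y):
        X_k = X[y == k]
        mu_k = X_k.mean(axis=0)
        S_w += (X_k - mu_k).T @ (X_k - mu_k)
        d = (mu_k - mu)[:, None]
        S_b += len(X_k) * (d @ d.T)
    return S_w, S_b

rng = np.random.default_rng(4)
y = np.repeat([0, 1, 2], 10)                  # c = 3 classes, 10 samples each
A_s = rng.normal(size=(30, 6)) + y[:, None]   # slave spectra, one sample per row
M = rng.normal(size=(6, 6))                   # an arbitrary transfer matrix

S_w, S_b = scatter_wb(A_s, y)
S_w_hat, S_b_hat = scatter_wb(A_s @ M, y)     # scatter of the transferred samples
J_direct = np.trace(S_w_hat - S_b_hat)
J_formula = np.trace(M.T @ (S_w - S_b) @ M)
print(np.isclose(J_direct, J_formula))  # True
```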

References
1 I. Jolliffe and MyiLibrary, Principal component analysis, Wiley Online Library, 2002, vol. 2.
2 Y. Gao, B. Zheng, G. Chen, W. Lee, K. Lee and Q. Li, IEEE Trans. Knowl. Data Eng., 2009, 21, 1314–1327.
3 V. Vapnik, IEEE Trans. Neural Netw., 1999, 10, 988–999.
4 D.-Q. Dai and P. Yuen, IEEE Trans. Syst., Man, Cybern. B, Cybern., 2007, 37, 1080–1085.
5 X. Shao, J. Kang and W. Cai, Talanta, 2010, 82, 1017–1021.
6 E. Bouveresse, C. Hartmann, D. L. Massart, I. R. Last and K. A. Prebble, Anal. Chem., 1996, 68, 982–990.
7 R. N. Feudale, N. A. Woody, H. Tan, A. J. Myles, S. D. Brown and J. Ferré, Chemom. Intell. Lab. Syst., 2002, 64, 181–192.
8 Y. Wang, D. J. Veltkamp and B. R. Kowalski, Anal. Chem., 1991, 63, 2750–2756.
9 J. Lin, S.-C. Lo and C. W. Brown, Anal. Chim. Acta, 1997, 349, 263–269.
10 C. Pereira, M. Pimentel, R. Galvao, F. Honorato, L. Stragevitch and M. Martins, Anal. Chim. Acta, 2008, 611, 41–47.
11 T. B. Blank, S. T. Sum, S. D. Brown and S. L. Monfre, Anal. Chem., 1996, 68, 2987–2995.
12 F. Lima and L. Borges, J. Near Infrared Spectrosc., 2002, 10, 269–278.
13 J. Yoon, B. Lee and C. Han, Chemom. Intell. Lab. Syst., 2002, 64, 1–14.
14 L. Zhang, G. W. Small and M. A. Arnold, Anal. Chem., 2003, 75, 5905–5915.
15 W. Fan, Y. Liang, D. Yuan and J. Wang, Anal. Chim. Acta, 2008, 623, 22–29.
16 H. Li, T. Jiang and K. Zhang, IEEE Trans. Neural Netw., 2006, 17, 157–165.
17 K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, second edition, 1990.
18 R. Bellman, Introduction to matrix analysis (2nd edition), Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1997.
19 R. Galvao, M. Araujo, G. Jose, M. Pontes, E. Silva and T. Saldanha, Talanta, 2005, 67, 736–740.
20 M. Kubat and S. Matwin, Machine learning international workshop, 1997, pp. 179–186.
21 H. Nguyen, E. Cooper and K. Kamei, International Journal of Knowledge Engineering and Soft Data Paradigms, 2011, 3, 4–21.
22 K. Huang, H. Yang, I. King and M. Lyu, Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, 2004, pp. II–558.
23 B. Wise, N. Gallagher, R. Bro, J. Shaver, W. Windig and J. Koch, Eigenvector Research Inc., Manson, WA, 2006.
