
Food Chemistry 134 (2012) 412–418

Contents lists available at SciVerse ScienceDirect

Food Chemistry

journal homepage: www.elsevier.com/locate/foodchem

Analytical Methods

Quantitative determination of fat and total protein in milk based on visible light scatter

Andrey Bogomolov a,*, Stefan Dietrich a, Barbara Boldrini b, Rudolf W. Kessler b

a J&M Analytik AG, 8 Willy-Messerschmitt-Strasse, 73457 Essingen, Germany
b Reutlingen University, Alteburgstrasse 150, 72762 Reutlingen, Germany

Article info

Article history:
Received 6 May 2011
Received in revised form 19 October 2011
Accepted 14 February 2012
Available online 22 February 2012

Keywords: Milk analysis; Fat content; Total protein content; Visible spectroscopy; Near infrared spectroscopy; Light scatter; Mie theory; Design of experiment

0308-8146/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
doi:10.1016/j.foodchem.2012.02.077

* Corresponding author. Fax: +49 7361 92812.
E-mail address: [email protected] (A. Bogomolov).

Abstract

A new optical spectroscopic method for milk fat and total protein analysis has been developed. In contrast to the conventional approach, which generally relies on the components’ absorption, the suggested method is based on the phenomenon of light scatter by fat and protein particles. This fundamental distinction enables shifting the measurement to the cost-effective visible and adjacent near-infrared region (below 1000 nm), where the scatter strongly dominates.

Partial Least-Squares regression modelling on a designed set of training and validation milk samples resulted in root mean-square prediction errors of 0.05% and 0.03% for fat and protein content, respectively, which is close to the accuracy of the reference analysis. It has been shown that multivariate data analysis is capable of distinguishing the individual scatter spectra of fat and protein. This conclusion has been supported by Mie-theory calculations.

The method is suitable for routine laboratory analysis or in-line quality monitoring in dairy production.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Fat and protein are the two main nutrients of milk, and their concentrations are routinely monitored during commercial production and in the final product. Growing quality requirements in the food and pharmaceutical industries promote the dissemination of spectroscopic analysis, which enables real-time monitoring of processes and products. Optical spectroscopy is one of the most favored techniques for industrial quality monitoring owing to its wide availability, undemanding sampling, quick measurement and the possibility of using fibre optics and remote probes.

Spectroscopic analysis of homogenized and raw natural milk has been intensively investigated by infrared (IR) spectroscopy (Iñón, Garrigues, & De la Guardia, 2004; Luginbühl, 2002) and by near-IR (NIR) spectroscopy, both in the “silicon” detector range (700–1100 nm) (Brennan, Alderman, Sattler, O’Connor, & O’Mathuna, 2003; Chen, Iyo, Kawano, & Terada, 1999; Kalinin et al., 2008; Šašić & Ozaki, 2001; Tsenkova, Atanassova, Ozaki, Toyoda, & Itoh, 2001; Tsenkova et al., 1999; Wu, He, & Feng, 2008) and in the far-NIR range (1100–2500 nm) (Kalinin, Krasheninnikov, & Potapov, 2009; Kalinin et al., 2008; Purnomoadi, Batajoo, Ueda, & Terada, 1999; Tsenkova, Atanassova, Itoh, Ozaki, & Toyoda, 2000; Tsenkova et al., 1999). Reported prediction errors for both fat and protein content typically lie between 0.07% and 0.10% or above.


Milk has pronounced light-scattering properties due to the presence of emulsified fat globules and suspended protein micelles (Walstra, Wouters, & Geurts, 2006, chap. 1). When applying spectroscopic analysis, researchers typically attempt to minimize the scatter and thus emphasize the absorption. The scatter can be reduced during the measurement or through mathematical post-processing of the spectra, e.g. multiplicative scatter correction (MSC) by Geladi, MacDougall, and Martens (1985), standard normal variate (SNV) or numerical derivatives. Multivariate regression analysis is then commonly used to obtain models capable of predicting component concentrations (Rodriguez-Otero, Hermida, & Centeno, 1997).
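Both corrections named above are short row-wise operations on the spectral matrix. The Python/NumPy sketch below implements their textbook definitions; the function names `snv` and `msc` are illustrative and do not refer to any particular package. SNV centres and scales each spectrum by its own mean and standard deviation, while MSC regresses each spectrum on a reference (here, by assumption, the mean spectrum) and removes the fitted offset and slope.

```python
from typing import Optional

import numpy as np


def snv(X: np.ndarray) -> np.ndarray:
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


def msc(X: np.ndarray, reference: Optional[np.ndarray] = None) -> np.ndarray:
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum x is fitted as x ~ a + b * reference by least squares,
    and the corrected spectrum is (x - a) / b. The mean spectrum of X is
    used as the reference unless another one is supplied.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        b, a = np.polyfit(ref, x, deg=1)  # polyfit returns slope first, then intercept
        corrected[i] = (x - a) / b
    return corrected


# Usage on two hypothetical spectra differing by an offset and a scale factor:
wl = np.linspace(400, 1000, 601)
base = 0.5 + 0.2 * np.exp(-((wl - 980) / 30) ** 2)
X = np.vstack([base, 0.1 + 1.3 * base])
print(np.allclose(snv(X)[0], snv(X)[1]), np.allclose(msc(X)[0], msc(X)[1]))
```

Two spectra that differ only by an additive offset and a positive multiplicative factor become identical after either correction, which is precisely the kind of information these transforms discard.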

A complex multimodal size distribution of scattering particles significantly complicates the spectroscopic analysis of milk, specifically in the economically very attractive visible (Vis) region (400–700 nm), where the scatter is considerably stronger than in the NIR. For this reason, and because the visible range contains no intense characteristic absorption bands of the milk components, visible wavelengths are usually ignored. There are only a few published works exploiting the Vis region for milk analysis (Crofcheck, Payne, Hicks, Menguc, & Nokes, 2000; Muñiz et al., 2009).

Scatter is generally considered a “parasitic” phenomenon that complicates the spectroscopic analysis of opaque media. At the same time, scatter may deliver quantitative information. The detected intensities of scattered light at different wavelengths depend on the number and sizes of colloidal particles and, consequently, on the respective component contents. Hence, for systems with a stable composition (or predictably narrow variations), such as milk, the scatter can potentially be used for quantitative analysis. This statement is supported by an example from powder technology, where Nieuwmeyer et al. (2007) demonstrated the applicability of diffuse reflectance NIR spectroscopy to the analysis of particle size distribution. Nevertheless, the feasibility of quantitative analysis of multi-component mixtures from scatter spectra remains poorly studied.

There are a number of earlier works establishing the correlation between visible light scatter and milk fat content, the so-called turbidimetric methods (Ashworth, 1969; Walstra, 1967). None of them, however, resulted in a workable analytical technique with practically acceptable accuracy. An essential shortcoming of these early works was the univariate approach. Although full-spectrum data were available, the analysis still relied on individual wavelengths, one at a time. As a result, globule size differences and the presence of smaller protein particles became factors seriously limiting the method accuracy. The method by Ashworth (1969) required sample pre-treatment to disperse the protein micelles, but the univariate model still depended on homogenization efficiency and could not be transferred from one dairy to another.

To our knowledge, there is no scatter-based method for the quantitative determination of total protein in milk. The task of physically based rendering of milk appearance has been successfully solved by Frisvad, Christensen, and Jensen (2007). The authors also stated the feasibility of the inverse solution, i.e. resolving the constituent scatter properties from an image. This conclusion is important as an independent theoretical confirmation of the formal modelling technique we suggest in this manuscript.

In the present work, spectroscopy in the Vis region and the adjacent short-wave (SW) edge of the NIR region (up to 1000 nm) has been successfully applied to the quantitative analysis of fat and total protein in bovine milk. Partial Least-Squares (PLS) regression analysis of milk spectra has been performed using a series of samples closely reproducing natural milk composition. Fat and total protein concentrations in the training and validation sample sets were systematically varied in accordance with the authors’ original design scheme. The main purpose of the research was to investigate the feasibility of accurate prediction of milk composition from Vis/SW-NIR spectra. Comparative analysis of PLS diagnostic plots (scores and loadings) and mathematical scatter simulations has led to important conclusions on the role of the milk components’ individual scatter spectra in the multivariate regression modelling.

[Figure: design grid of sample numbers 1–21; axes % Fat (3.02–3.98) and % Protein (2.60–3.20)]

Fig. 1. Experimental design of milk sample composition for multivariate regression analysis. Validation samples are shown in bold. Samples added to upgrade the scheme to the CCC design (gray fields) are underlined.

2. Materials and methods

2.1. Milk sample design and preparation

For robust modelling, the milk samples should be compositionally close to the analyzed production milk. The sample set should provide systematic coverage of the whole design space, including the intervals of possible content variation. To achieve this, the samples should be prepared by controlled dosing of the main constituents.

To prepare milk samples with a desired composition, the following technique was used (Kalinin et al., 2008, 2009). The samples were mixed from four source components. Two of them were cream (fat source) and skim milk (protein source), containing (w/w) 34.20% and 0.06% fat, 2.00% and 3.31% protein, and 2.97% and 4.83% lactose, respectively. These intermediates were taken directly from the normalized milk production line of a large dairy (Schwarzwaldmilch GmbH, Freiburg, Germany). Lactose concentration was kept constant at 4.70%. This was achieved by adding an aqueous 10% lactose solution and pure water (sterile distilled water for injection was used in both cases).
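Given the four source compositions above, the weights to dose follow from a linear mass balance: one equation each for fat, protein and lactose, plus one for the total mass. The Python sketch below assumes that all percentages are mass fractions, that a 100 g batch is prepared and that the 10% lactose solution and water contain no fat or protein; the function name `dosing_weights` and the example target (3.50% fat, 2.90% protein) are hypothetical.

```python
import numpy as np

# Source compositions (w/w, %) from the text, in the order:
# cream, skim milk, 10 % lactose solution, water.
FAT = np.array([34.20, 0.06, 0.0, 0.0])
PROTEIN = np.array([2.00, 3.31, 0.0, 0.0])
LACTOSE = np.array([2.97, 4.83, 10.0, 0.0])


def dosing_weights(fat_pct: float, protein_pct: float,
                   lactose_pct: float = 4.70, total_g: float = 100.0) -> np.ndarray:
    """Masses (g) of cream, skim milk, lactose solution and water for one sample.

    Solves the 4 x 4 mass balance (fat, protein, lactose, total mass) and
    raises ValueError when the target needs a negative amount of a source.
    """
    A = np.vstack([FAT / 100.0, PROTEIN / 100.0, LACTOSE / 100.0, np.ones(4)])
    rhs = np.array([fat_pct, protein_pct, lactose_pct, 100.0]) * total_g / 100.0
    masses = np.linalg.solve(A, rhs)
    if np.any(masses < -1e-9):
        raise ValueError(f"target not reachable with these sources: {masses}")
    return masses


print(dosing_weights(3.50, 2.90).round(2))
```

For the hypothetical target this gives roughly 10.1 g cream, 81.5 g skim milk, 4.6 g lactose solution and 3.8 g water per 100 g; a target outside the region spanned by the sources (e.g. 40% fat) is rejected rather than returned with negative masses.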

Each sample was prepared by mixing calculated weights of the above four constituents and treating them with ultrasound in a SONOREX TK30 50 kHz bath (Bandelin electronic, Berlin, Germany) for 5 min. It was initially supposed that the ultrasound treatment would homogenize the milk, as reported by Ertugay, Şengül, and Şengül (2004). However, subsequent microscopic observation showed that the samples still contained an essential share of fat particles larger than 5 µm. Hence, it should be concluded that the homogenization was far from complete.

The samples were composed in accordance with the scheme presented in Fig. 1. The widest possible intervals of fat and protein content, given the source material composition, were selected. A custom design, suggested by the authors, formed the basis of the experiment. The main principle of this design is to cover as many concentration levels of the two components as possible with a minimal number of samples. Accordingly, fat and protein contents are varied at 11 levels each using only 11 samples. Several additional samples were added to upgrade the existing scheme to the central composite circumscribed (CCC) design (Eriksson, Johansson, Kettaneh-Wold, Wikström, & Wold, 2008, chap. 6.4) with five replicates at the central point. The CCC design is often used to study the effects of multiple factors on a response function. In Section 3, the dependences of Vis/SW-NIR spectra on fat and protein content will be illustrated using the CCC samples. In total, 21 samples were prepared.
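For two factors, a CCC design consists of four factorial corners, four axial (star) points outside the factorial square and replicated centre points. The sketch below lists these points in coded units; the axial distance α = √2 (the rotatable choice) is our assumption, since the text does not state the value used, and the centre and step passed to the hypothetical helper `decode` are illustrative, not the paper’s design values.

```python
import itertools
import math


def ccc_design(alpha: float = math.sqrt(2.0), n_center: int = 5):
    """Two-factor central composite circumscribed design in coded units.

    Returns the factorial corners (+-1, +-1), the axial points at +-alpha
    on each axis and n_center replicates of the centre point (0, 0).
    """
    factorial = list(itertools.product((-1.0, 1.0), repeat=2))
    axial = [(-alpha, 0.0), (alpha, 0.0), (0.0, -alpha), (0.0, alpha)]
    center = [(0.0, 0.0)] * n_center
    return factorial + axial + center


def decode(point, center, step):
    """Map a coded point to concentrations: center + coded value * step, per factor."""
    return tuple(c + x * s for x, c, s in zip(point, center, step))


# Hypothetical mapping to (% fat, % protein); centre and step are illustrative.
for point in ccc_design():
    print(point, decode(point, center=(3.50, 2.90), step=(0.30, 0.20)))
```

With α = √2 every non-centre point lies at the same distance from the centre, which is what makes the design rotatable; a different α only moves the four star points.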

Reference analysis of milk fat and total protein content was performed using a MilkoScan™ FT120 (FOSS, Denmark). The actual concentrations of fat and protein in the analyzed samples could deviate slightly from the design (Fig. 1), for example because of component dosing inaccuracy or fat sedimentation on the walls of the sample vessels.

The validation samples (see also Section 2.3) were selected starting from samples 2–5 (central-point replicates) and successively adding further samples so that no two validation samples ever occupy adjacent levels of either component. In addition, the validation set should not include any marginal samples, i.e. those forming the outer border of the design space (numbers 12, 13 and 16–21). This algorithm resulted in a well-balanced structure of validation samples without any noticeable damage to the calibration (training) set, formed by the 13 remaining samples.
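The selection rule can be written as a greedy pass over the samples. The sketch below is one reading of the rule, under stated assumptions: candidates are visited in ascending sample number, and “adjacent” means level indices differing by exactly one. The level indices in the usage example are hypothetical toy values, not the design of Fig. 1, and the function name `select_validation` is ours.

```python
def select_validation(levels, start, marginal):
    """Greedy choice of validation samples on a level grid (sketch).

    levels   -- dict: sample id -> (fat level index, protein level index)
    start    -- ids selected first (e.g. the centre-point replicates)
    marginal -- ids on the border of the design space, never selected
    A candidate is accepted only if, for both components, its level index
    differs by exactly one from none of the already selected samples.
    """
    selected = list(start)
    for sid in sorted(levels):
        if sid in selected or sid in marginal:
            continue
        fat, prot = levels[sid]
        if all(abs(fat - levels[s][0]) != 1 and abs(prot - levels[s][1]) != 1
               for s in selected):
            selected.append(sid)
    training = [s for s in sorted(levels) if s not in selected]
    return selected, training


# Hypothetical toy grid: samples 2 and 3 are centre replicates, 7 is marginal.
toy_levels = {1: (0, 5), 2: (3, 3), 3: (3, 3), 4: (1, 0), 5: (5, 5), 6: (2, 6), 7: (6, 1)}
print(select_validation(toy_levels, start=[2, 3], marginal={7}))
```

Marginal samples always end up in the training set, so the calibration model still spans the full design space.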

2.2. Spectroscopic measurements

Spectroscopic analysis of the milk samples was performed in a stainless-steel flow cell with reproducible variable pathlength (RVP) by J&M Analytik AG (Essingen, Germany), which was provided with a funnel and a stopcock to adapt it for static laboratory measurements (Fig. 2). Vis/SW-NIR absorbance spectra between 400 and 1000 nm, interpolated to a 1-nm step, were acquired in transmission mode (detection at 180°) through a 5.5-mm pathlength. The illumination and detection ports of the RVP flow cell were equipped with collimating optics. The light source was a 5 V/7.5 W halogen lamp connected to the cell optics with a short 600-µm fibre optical cable. The output port was connected to the MMS spectrometer (TIDAS NIR, J&M Analytik AG) through another fibre cable.

The spectra were collected with an integration time of 500 ms and five accumulations. Distilled water was used as the reference and was measured once at the beginning of the series. To compensate for the optical density difference between water and milk, the reference spectrum was attenuated with an optical filter built in front of the detector. The filter evenly reduced the incident light intensity in the chosen wavelength region by four orders of magnitude (four optical density units). In this way, the reference was optimized for the measurement dynamic range and for better interpretability of the milk spectra.
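The effect of the attenuated reference on the recorded scale can be made explicit. Assuming that the filter attenuates only the stored water reference, the recorded milk absorbance is A_rec = -log10[I_milk / (10^-4 · I_water)] = A_milk/water - 4, i.e. the whole spectrum is shifted down by the filter’s four optical density units without any change in shape. The short sketch below encodes this relation; the function name `absorbance` and its `reference_od` parameter are illustrative.

```python
import numpy as np


def absorbance(i_sample, i_reference, reference_od: float = 0.0):
    """Absorbance -log10(I_sample / I_reference_stored).

    reference_od is the optical density of an attenuator placed in the
    reference measurement only; it lowers the stored reference intensity
    by a factor of 10**reference_od.
    """
    i_sample = np.asarray(i_sample, dtype=float)
    i_ref_stored = np.asarray(i_reference, dtype=float) * 10.0 ** (-reference_od)
    return -np.log10(i_sample / i_ref_stored)


# A hypothetical sample transmitting 10**-4.6 of the water intensity:
i_water = np.array([1000.0, 2000.0])
i_milk = i_water * 10 ** -4.6
print(absorbance(i_milk, i_water), absorbance(i_milk, i_water, reference_od=4.0))
```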

The sample spectra were acquired at 24 ± 1 °C (room temperature).

2.3. PLS regression modelling and validation

PLS regression (Esbensen, 2000, chaps. 6–8; Wold, Martens, & Wold, 1983) is widely used in quantitative spectroscopic analysis. It aims at finding a vector of regression coefficients b that establishes a linear relationship between the matrix X of spectral data (n × m: n spectra in rows and m variables in columns) and a vector y (n × 1) of response values, i.e. milk fat or protein concentrations measured by a reference method. The model b (m × 1) is then used to predict an unknown value ŷ for a new sample spectrum x_new (Eq. (1)):

$$\hat{\mathbf{y}} = \mathbf{X}\mathbf{b}, \qquad \hat{y} = \mathbf{x}_{\mathrm{new}}\mathbf{b} \tag{1}$$

Fig. 2. RVP flow cell schematic: 1 – cell body, 2 – funnel, 3 – adjustable pathlength, 4 – collimating optics, 5 – halogen lamp, 6 – stopcock, 7 – lamp power supply and 8 – fibre cable to spectrometer.

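PLS modelling will be considered here in some detail, as it is relevant for the presentation and discussion of the results. The analysis starts with a decomposition of X into two complementary matrices, the scores T (n × a) and the loadings P (m × a), in accordance with Eq. (2):

$$\mathbf{X} = \mathbf{T}\mathbf{P}^{T} + \mathbf{E} = \sum_{i=1}^{a} \mathbf{t}_i \mathbf{p}_i^{T} + \mathbf{E} \tag{2}$$

where a ≪ n, m is the number of latent variables or principal components (PCs) retained in the model, and t_i (n × 1) and p_i (m × 1) are the score and loading vectors constituting the matrices T and P, respectively. The score vectors are mutually orthogonal; in PLS, the orthonormal vectors are the weights from which the scores are computed, whereas the loadings need not be orthogonal.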

Each subsequent i-th PC, determined by the pair t_i–p_i, takes the direction of maximal covariance between the residual X-variance (not captured by the previous components) and y. A total of n or m components, whichever is smaller, can be extracted in this way. However, beyond a certain number a, the components typically start to describe noise rather than useful data structure. The first a components are therefore called principal components and retained, while the subsequent ones are ignored. Each PC can be characterized by the percentage of variance it captures. The matrix product of T and Pᵀ reproduces the most important variance in X, leaving the noise (or error) in the residual matrix E (n × m). In effect, PLS projects the X-data onto a new lower-dimensional space, represented by T and P, where it can be analyzed effectively.

In addition to its predictive power, a PLS model is helpful for investigating the data structure using scores, loadings and the vector of regression coefficients b. The matrices T and P provide valuable information on the internal data structure. Their interpretation is based on the fundamental fact that the correlation between two variables, or the similarity of two samples, is a function of their distance in the PC space. Pair-wise score plots t_i–t_j, in particular t1–t2, which usually captures the main data variance, are often referred to as ‘sample maps’ revealing sample groupings and trends. Similarly, the loadings describe the variable structure and show variable correlations. The distance from the origin to a sample in the score plot, or to a variable in the loading plot, along a certain PC reflects its importance with regard to this PC. Comparison of the respective score and loading plots helps to find interrelations between samples and variables.

An important property of the PLS regression coefficient vector b is that its values reflect the relative importance of the variables for the model. Thus, a coefficient close to zero means that the respective variable does not contribute noticeably to the prediction, and vice versa. Correlated variables tend to form spectrum-like patterns that can be interpreted in terms of the contribution of particular spectral features of the components to the model performance.
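Eqs. (1) and (2) can be seen at work in the NIPALS algorithm for a single response (PLS1). The sketch below is a textbook PLS1, not the implementation in The Unscrambler; it mean-centres X and y, which is our assumption because the text does not state the preprocessing used by the software, so the prediction adds back the mean of y. The regression vector follows from the weights W, loadings P and y-loadings q as b = W(PᵀW)⁻¹q. Function names are hypothetical, and the usage data are synthetic, noise-free “spectra” built from a flat and a steeply falling profile.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """PLS1 by NIPALS on mean-centred data (sketch).

    Returns scores T (n x a), loadings P (m x a), weights W (m x a),
    y-loadings q (a,), regression vector b (m,) and the centring means.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # residuals, deflated after each PC
    n, m = E.shape
    T = np.zeros((n, n_components))
    P = np.zeros((m, n_components))
    W = np.zeros((m, n_components))
    q = np.zeros(n_components)
    for i in range(n_components):
        w = E.T @ f                        # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                          # scores of this PC
        tt = t @ t
        p = E.T @ t / tt                   # X-loadings
        q[i] = f @ t / tt                  # y-loading
        E = E - np.outer(t, p)             # deflate X
        f = f - q[i] * t                   # deflate y
        T[:, i], P[:, i], W[:, i] = t, p, w
    b = W @ np.linalg.solve(P.T @ W, q)    # b = W (P^T W)^-1 q
    return T, P, W, q, b, x_mean, y_mean


def pls1_predict(X_new, b, x_mean, y_mean):
    """Eq. (1) on centred spectra: y_hat = (x_new - x_mean) b + y_mean."""
    return (np.atleast_2d(X_new) - x_mean) @ b + y_mean


# Usage on hypothetical, noise-free two-component spectra:
rng = np.random.default_rng(0)
wl = np.linspace(520.0, 960.0, 100)
S = np.vstack([np.ones_like(wl), (wl / 520.0) ** -4])   # flat and steeply falling profiles
C = rng.uniform([3.0, 2.6], [4.0, 3.2], size=(15, 2))    # fat-like and protein-like contents
X = C @ S
T, P, W, q, b, x_mean, y_mean = pls1_nipals(X, C[:, 0], n_components=2)
print(np.max(np.abs(pls1_predict(X, b, x_mean, y_mean) - C[:, 0])))
```

Because these synthetic spectra have rank two, two PCs reproduce the first concentration essentially exactly; with real, noisy spectra the retained number of PCs has to be chosen from validation errors instead.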

The PLS models for fat and protein were built and validated using the designed training and validation sets (Fig. 1). Alternatively, leave-one-out (full) cross-validation was applied (Esbensen, 2000, chaps. 6–8). Model performance was characterized by the root mean-square error (RMSE) of prediction (or cross-validation) as well as by the coefficient of determination R² (Eq. (3)):

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{k} (\hat{y}_i - y_i)^2}{k}}, \qquad R^2 = 1 - \frac{\sum_{i=1}^{k} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{k} (y_i - \bar{y})^2}, \qquad \text{where } \bar{y} = \frac{\sum_{i=1}^{k} y_i}{k} \tag{3}$$

where k is the number of samples in the validation set (for cross-validation, k = n).
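Eq. (3) and the leave-one-out scheme translate directly into code. In the sketch below, `loo_predictions` accepts any fit/predict pair; purely to keep the example self-contained, an ordinary least-squares model with intercept (`fit_ols`, `predict_ols`, hypothetical helpers) stands in for PLS.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean-square error over k samples, Eq. (3)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination, Eq. (3), with the mean of y_true as y-bar."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def loo_predictions(X, y, fit, predict):
    """Leave-one-out (full) cross-validation: each sample is predicted by a model built without it."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    y_cv = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = fit(X[keep], y[keep])
        y_cv[i] = predict(model, X[i:i + 1])[0]
    return y_cv


def fit_ols(X, y):
    """Stand-in model for the example: least squares with intercept (not PLS)."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]


def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


Xd = np.arange(5.0).reshape(-1, 1)
yd = 2.0 * Xd[:, 0] + 1.0
y_cv = loo_predictions(Xd, yd, fit_ols, predict_ols)
print(rmse(yd, y_cv), r_squared(yd, y_cv))
```

For cross-validation, the same two metrics are simply evaluated on the n left-out predictions, which is why k = n in that case.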


2.4. Software

PLS regression was performed in The Unscrambler v.9.6 (CAMO, Oslo, Norway). MiePlot v.4.2 software by Laven (2011) was used for the Mie-theory calculations.

3. Results, discussion and conclusions

Spectra of the designed milk samples are presented in Fig. 3. The spectra have very similar features, closely resembling the spectrum of water with an intense peak of the symmetric and asymmetric stretching overtones of the water molecule around 980 nm. This peak appears in the spectra because the absorption of the milk water is not completely compensated by the reference, which combines water and an optical filter (Section 2.2). The varying offset seems to be related to the fat content: compare the highlighted marginal samples 18 and 21. Indeed, a higher fat content results in a higher number of particles and, hence, stronger scatter, observed as stronger extinction in absorbance units. The spectra of samples 16 and 19 correspond to the lowest and highest protein content, respectively, and reveal distinct slope differences, while their offset stays similar and agrees with the middle fat level.

Scatter correction techniques such as SNV, MSC and first or second derivatives, routinely applied to milk spectra prior to multivariate analysis (e.g. Chen et al., 1999; Kalinin et al., 2008; Šašić & Ozaki, 2001; Tsenkova et al., 2001; Wu et al., 2008), dramatically influence the offsets and slopes. Considering the character of the spectral responses to milk composition (Fig. 3), such preprocessing can be destructive for quantitative fat and protein analysis and thus should not be applied by default. Indeed, preliminary data analyses showed that much better performance is achieved without any data preprocessing.
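Why these transforms can erase the composition signal is easy to check on a toy example. The sketch below uses a hypothetical smooth spectrum on a 1-nm grid (not measured data) and applies two changes patterned on Fig. 3: a constant offset (fat-like) and a linear tilt (protein-like). Derivatives are taken with `numpy.gradient`, as a simple stand-in for the smoothed derivatives usually applied in practice.

```python
import numpy as np

# Hypothetical toy spectrum on a 1-nm grid (not measured data).
wavelengths = np.arange(520.0, 961.0)
base = 0.6 + 0.1 * np.exp(-((wavelengths - 980.0) / 40.0) ** 2)

offset_shift = base + 0.15                           # fat-like change: constant offset
slope_shift = base + 2e-4 * (wavelengths - 740.0)    # protein-like change: linear tilt


def snv(s: np.ndarray) -> np.ndarray:
    """Standard normal variate of a single spectrum."""
    return (s - s.mean()) / s.std(ddof=1)


def first_derivative(s: np.ndarray) -> np.ndarray:
    return np.gradient(s, wavelengths)


def second_derivative(s: np.ndarray) -> np.ndarray:
    return np.gradient(first_derivative(s), wavelengths)


for name, changed in (("offset", offset_shift), ("tilt", slope_shift)):
    print(name,
          "SNV unchanged:", np.allclose(snv(changed), snv(base)),
          "1st derivative unchanged:", np.allclose(first_derivative(changed), first_derivative(base)),
          "2nd derivative unchanged:", np.allclose(second_derivative(changed), second_derivative(base)))
```

The offset disappears under SNV and under both derivatives, so a fat-like signal carried by the offset would be lost; the tilt survives SNV and the first derivative (there only as a constant) but is removed completely by the second derivative.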

Individual PLS models were built for fat and total protein content. The optimal spectral regions for modelling fat and protein were found to be 625–995 and 520–960 nm, respectively. The optimization was performed by successively eliminating single data points at each end of the spectrum until the RMSE reached a minimum.
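The window optimization can be sketched as a greedy search over a contiguous column range. This is one possible reading of the procedure, not the authors’ code: at each step one point is dropped from whichever end lowers the RMSE more, and the search stops at the first step where neither end helps (a local minimum). `error_fn` is a hypothetical callback that would build and validate a model on columns lo..hi-1; with spectra starting at 400 nm on a 1-nm grid, column i corresponds to 400 + i nm. The quadratic toy error only demonstrates the mechanics.

```python
def trim_spectral_window(error_fn, lo: int, hi: int, min_width: int = 10):
    """Greedy search for a contiguous wavelength window [lo, hi) (sketch).

    error_fn(lo, hi) must return the validation RMSE of a model built on
    columns lo..hi-1. One point is removed per step from the end that
    lowers the RMSE more; the search stops when neither end helps.
    """
    best = error_fn(lo, hi)
    while hi - lo > min_width:
        candidates = [(error_fn(lo + 1, hi), lo + 1, hi),
                      (error_fn(lo, hi - 1), lo, hi - 1)]
        err, new_lo, new_hi = min(candidates)
        if err >= best:
            break
        best, lo, hi = err, new_lo, new_hi
    return lo, hi, best


def toy_error(lo, hi):
    """Hypothetical error surface with its minimum at columns 30..79."""
    return (lo - 30) ** 2 + (hi - 80) ** 2 + 1


print(trim_spectral_window(toy_error, 0, 100))
```

Because only one end moves per step, each evaluation requires refitting the model; with 601 spectral points and small sample sets this remains cheap.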

The optimal number of PCs was determined by analyzing the explained X- and y-variance as well as the RMSE (Fig. 4). Two principal components were chosen for both fat and protein modelling. In the model for fat content, two PCs correspond to the lowest RMSE. Although in the case of protein the RMSE minimum lies at three PCs, the first two describe about 100% and 98% of the spectral (X) and protein content (y) variance, respectively (Fig. 4b). Therefore, the third PC was neglected as less significant and potentially affected by noise.

[Figure: absorbance (0.4–1.1) versus wavelength (500–1000 nm); samples 16, 18, 19 and 21 labelled]

Fig. 3. Vis/SW-NIR spectra of the designed milk samples; marginal samples 16 (red solid), 18 (blue dash-dotted), 19 (black dashed) and 21 (green dotted) are highlighted (see sample descriptions in Fig. 1); the optimal wavelength regions for the modelling of fat and protein are designated with solid and dashed vertical lines, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this paper.)

PLS regression resulted in accurate models for both fat and protein, as shown in Table 1 and Fig. 5. The close agreement between the designed-set validation and the cross-validation results confirms the plausibility of the error estimates (Table 1).

PLS score and loading plots (Fig. 6) provide further insight into the data and their relation to milk composition. The score plots of both the fat and protein models, shown overlaid in Fig. 6a, reveal similar patterns that closely reproduce the sample composition (Fig. 1). This important observation makes it possible to interpret the first two PLS principal components as basically responsible for fat (PC1) and protein (PC2) content. As a consequence, the loadings (Fig. 6b) can be interpreted as resolved spectral contributions of the respective constituents.

It should be clarified that scores and loadings are abstract by definition, and their physical interpretation is generally discouraged. However, provided that the scores t1 and t2 take the directions of fat and protein concentrations, it is also reasonable, in this specific case, to associate the first two loading vectors with the individual spectral responses of the respective components.

The absence of distinct absorption peaks in the loadings (except for the water band at 980 nm) indicates that light scatter dominates in both the fat and protein spectra. The contribution to the modelling of the fat absorption band at 930 nm known from the literature, and possibly of some weaker bands between 950 and 1000 nm (Šašić & Ozaki, 2001; Tsenkova et al., 1999; van Veen, Sterenborg, Pifferi, Torricelli, & Cubeddu, 2004; Wu et al., 2008), should be relatively low. However, the fact that the optimal region for fat modelling is shifted towards longer wavelengths may indirectly point to some absorption contribution.

Another factor affecting the spectral intervals optimal for fat and protein modelling is the increase in measurement error towards the borders of the spectrometer’s operational region (300–1100 nm). For optically dense media such as milk, measurements at the spectral margins may be suboptimal. At the “blue” end this is caused by the light source spectrum, whose intensity falls quickly below 450 nm. At the long-wave end, detector sensitivity becomes a limiting factor above about 1000 nm. Therefore, the borders of the optimal region are defined by a compromise between the gain from using wider spectral regions (more information) and the negative effect of including less accurate spectral variables.

The first loading vector p1 in both models is an almost flat horizontal line, which is a typical scatter spectrum of particles with sizes comparable to or larger than the wavelengths of visible light, such as fat globules, distributed between 1 and 10 µm (Walstra et al., 2006, chap. 1). At the same time, the protein micelles are narrowly distributed in the range from 80 to 200 nm (Kalinin et al., 2009; Walstra et al., 2006, chap. 1) and are thus smaller than the incident light wavelengths. In this situation, the scatter is strongly wavelength-dependent, and its intensity is expected to fall with decreasing photon energy, i.e. towards longer wavelengths. This behaviour agrees with the shapes of the p2 loadings in Fig. 6b, associated with the protein spectral response.

The regression coefficients are presented in Fig. 7. In both the fat and protein models, variable significance grows from the middle towards the edges of the spectral region. The absence of any distinct peaks confirms that the main role in the modelling belongs to scatter rather than absorption phenomena. The noisy character of b above 950 nm indicates the lower reliability of the variables as they approach the technical limits of the spectrometric system.


[Figure: panels (a) and (b); RMSECV (0.02–0.12) and % variance explained (0–100) versus PC number (1–5)]

Fig. 4. Dependence of the cross-validation RMSE (solid line) and of the explained X- (dashed line) and y-variance (dotted line) on the number of PCs for the PLS models of fat (a) and total protein (b) content; the vertical red line designates the optimal number of PCs retained in the model for prediction.

Table 1
Validation statistics for PLS-predicted fat and total protein concentrations in different wavelength regions. In all cases the prediction was performed with two principal components (a = 2).

| Model (wavelength region) | Designed set RMSE | Designed set R² | Cross-validation RMSE | Cross-validation R² |
|---|---|---|---|---|
| %Fat^a (625–995 nm) | 0.045 | 0.964 | 0.047 | 0.973 |
| %Protein (520–960 nm) | 0.030 | 0.946 | 0.032 | 0.964 |
| %Fat^a (600–700 nm) | 0.057 | 0.944 | 0.061 | 0.955 |
| %Protein (600–700 nm) | 0.044 | 0.883 | 0.050 | 0.915 |

^a Sample #13 was removed from the fat modelling as an evident outlier.

[Figure: (a) % fat predicted versus % fat reference (3.0–3.8); (b) % protein predicted versus % protein reference (2.6–3.2); points labelled with sample numbers]

Fig. 5. Predicted versus measured values: (a) fat content and (b) total protein content; calibration and test samples are shown with hollow and filled markers, respectively.

[Fig. 6 plot: (a) score plot with axes t1 (−1 to 1) and t2 (−0.1 to 0.2), points labelled with sample numbers 1–21 for the fat and protein models; (b) loadings p1 and p2 versus wavelength (600–900 nm), curves labelled %fat and %protein.]

Fig. 6. First two principal components of the PLS regression models for fat (red circles and solid lines) and total protein (blue squares and dashed lines) content: (a) t1 versus t2 score plots; and (b) p1 and p2 loading line plots. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this paper.)

416 A. Bogomolov et al. / Food Chemistry 134 (2012) 412–418


[Fig. 8 plot: normalized scattering coefficients µs versus wavelength (600–900 nm).]

Fig. 8. Scattering coefficients for fat globules (red solid line) and casein micelles (blue dashed line) calculated in accordance with Mie theory; both curves are normalized to unit vector length. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this paper.)


Under the independent scattering approximation, which is valid for emulsions in the present concentration ranges (Giusto et al., 2003; Michels, Foschum, & Kienle, 2008), Mie theory can be employed to calculate the optical properties of the system. Such a simulation can help to interpret the experimental observations.

The Mie-theory calculations were performed under the following assumptions. The real parts of the refractive indices of the fat globules and casein micelles are 1.46 (Michalski, Briard, & Michel, 2001) and 1.57 (Griffin & Griffin, 1985), respectively, and no absorption occurs. A polydisperse log-normal distribution was assumed, with mean particle diameters of 120 nm for the protein and 5 µm for the fat; the standard deviation was taken as 30% in both cases. The simulation was aimed at a qualitative assessment of the individual optical properties of fat and protein emulsions, in particular the character and strength of the wavelength dependence of the scatter. This qualitative aim justifies possible inaccuracies in the assumptions made.

The calculated scattering coefficients µs are presented in Fig. 8. Both vectors were normalized to unit length for better comparability with the loading plots. The simulated data show that the experimentally observed spectral behaviour of fat and protein can be explained essentially by the differences in particle size and refractive index between the two species. The interpretation of the loadings (Fig. 6b) given above, as resolved contributions of individual milk constituents to the total scattering, is therefore consistent with the theory.
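The calculation described above can be sketched in pure Python (standard library only). This is not the authors' code: it re-implements the standard Bohren–Huffman (BHMIE) recurrences for a non-absorbing sphere and averages the scattering cross-section over a log-normal number distribution, then normalizes the spectrum to unit length. Assumptions not stated in the text: the "mean diameter" is taken as the arithmetic mean of the number distribution, the medium is water with a wavelength-independent refractive index of 1.33, and the distribution is sampled at 41 points within ±4σ in ln d. The function names are hypothetical.

```python
import math


def mie_qsca(x, m):
    """Scattering efficiency of a non-absorbing sphere (BHMIE-style recurrences).

    x: size parameter pi * d * n_medium / wavelength
    m: real relative refractive index n_particle / n_medium
    """
    xstop = x + 4.0 * x ** (1.0 / 3.0) + 2.0
    nstop = int(xstop)
    y = m * x
    nmx = int(max(xstop, abs(y))) + 15
    # Logarithmic derivative D_n(mx) by downward recurrence (numerically stable)
    d = [0.0] * (nmx + 1)
    for n in range(nmx, 1, -1):
        d[n - 1] = n / y - 1.0 / (d[n] + n / y)
    # Riccati-Bessel functions psi_n(x) and chi_n(x) by upward recurrence
    psi0, psi1 = math.cos(x), math.sin(x)
    chi0, chi1 = -math.sin(x), math.cos(x)
    xi1 = complex(psi1, -chi1)
    qsca = 0.0
    for n in range(1, nstop + 1):
        psi = (2 * n - 1) * psi1 / x - psi0
        chi = (2 * n - 1) * chi1 / x - chi0
        xi = complex(psi, -chi)
        fa = d[n] / m + n / x
        fb = m * d[n] + n / x
        a_n = (fa * psi - psi1) / (fa * xi - xi1)
        b_n = (fb * psi - psi1) / (fb * xi - xi1)
        qsca += (2 * n + 1) * (abs(a_n) ** 2 + abs(b_n) ** 2)
        psi0, psi1, chi0, chi1 = psi1, psi, chi1, chi
        xi1 = complex(psi1, -chi1)
    return 2.0 * qsca / x ** 2


def lognormal_grid(mean_d, rel_sd, n_points=41):
    """Diameters and number weights of a log-normal distribution (arithmetic mean mean_d)."""
    s2 = math.log(1.0 + rel_sd ** 2)
    s = math.sqrt(s2)
    mu = math.log(mean_d) - s2 / 2.0
    zs = [-4.0 + 8.0 * i / (n_points - 1) for i in range(n_points)]
    diameters = [math.exp(mu + s * z) for z in zs]
    weights = [math.exp(-0.5 * z * z) for z in zs]   # uniform grid in ln d
    total = sum(weights)
    return diameters, [w / total for w in weights]


def scattering_spectrum(mean_d_nm, n_particle, wavelengths_nm, n_medium=1.33, rel_sd=0.30):
    """Unit-length vector proportional to mu_s(lambda) for a polydisperse population."""
    diameters, weights = lognormal_grid(mean_d_nm, rel_sd)
    m = n_particle / n_medium
    mu_s = []
    for lam in wavelengths_nm:
        total = 0.0
        for d_nm, w in zip(diameters, weights):
            x = math.pi * d_nm * n_medium / lam
            total += w * mie_qsca(x, m) * math.pi * d_nm ** 2 / 4.0  # Q_sca * area
        mu_s.append(total)
    norm = math.sqrt(sum(v * v for v in mu_s))
    return [v / norm for v in mu_s]


wl = list(range(600, 901, 10))
fat = scattering_spectrum(5000.0, 1.46, wl)      # fat globules, 5 µm
casein = scattering_spectrum(120.0, 1.57, wl)    # casein micelles, 120 nm
```

The size distribution is sampled uniformly in ln d so that the normal-density weights act directly as quadrature weights; a finer grid or more sigma range would trade speed for smoother curves.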

This consideration leads to an important conclusion about the fundamental feasibility of precise analysis of two important milk nutrients in the Vis and SW-NIR region. Although the region overlaps with that of conventional SW-NIR spectroscopy of milk, discussed in Section 1, our method differs clearly: its main feature is the predominant use of light-scatter effects for quantitative modelling of spectral data. In contrast, traditional SW-NIR methods of milk analysis tend to apply various scatter-suppression techniques to emphasize the component absorbance above 900 nm. Because the suggested method relies on the wavelength dependences of light scatter by fat and protein particles, extending the region toward the Vis is advantageous. Therefore, from the point of view of the information content of different spectral regions, the analysis has essentially been transferred to the Vis.

[Fig. 7 plot: regression coefficients b versus wavelength (600–900 nm); y-axis from −0.1 to 0.15.]

Fig. 7. PLS regression coefficients for fat (red solid line) and total protein (blue dashed line). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this paper.)

Indeed, narrowing the analytical region to wavelengths between 600 and 700 nm results in only a moderate loss in model performance (Table 1), which is a reasonable consequence of the reduced number of variables. It also means that the NIR absorption bands above 700 nm are not critical for the modelling. Further evidence is that applying scatter-correction techniques does not improve the performance of our method; on the contrary, it degrades the models. The success of PLS in resolving the individual spectral responses of fat and protein provides the mathematical basis for the new technique. Another prerequisite for scatter-based modelling is the compositional stability of milk as a colloidal system, i.e. the fact that fat and protein are the predominant light-scattering particles and are present in fixed concentration ranges.
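One way to see why scatter correction can hurt a scatter-based model: if a change in particle concentration acts on a spectrum largely as an overall scale factor or baseline offset, standard normal variate (SNV) correction removes exactly that information. The sketch below, with a hypothetical power-law profile, shows SNV mapping two spectra that differ by a factor of three and an offset onto the same corrected spectrum. Real milk spectra also change shape with composition, so this illustrates a mechanism rather than reproducing the authors' result.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


wavelengths = np.linspace(600.0, 995.0, 80)
profile = (wavelengths / 600.0) ** -2.0      # hypothetical scatter profile
dilute = 1.0 * profile
concentrated = 3.0 * profile + 0.2           # 3x more scatterers plus a baseline offset
corrected = snv([dilute, concentrated])
```

After correction the two rows of `corrected` coincide, so no regression model built on them could tell the two concentrations apart.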

The fundamental possibility of performing milk composition analysis in the Vis/SW-NIR region up to 1000 nm is of great practical importance. It paves the way for cost-effective detectors (e.g. MMS or silicon CCD chip technology), optics and light-guiding materials. Another advantage is a wider choice of light sources. For instance, light-emitting diodes (LEDs) are available in a wide range of powers and emission spectra; they are compact and energy-saving, and have a much longer lifetime than traditional halogen lamps.

Vis/SW-NIR scatter spectroscopy presents a viable alternative to conventional NIR analysis of milk. NIR spectroscopy of aqueous systems is complicated by strong water absorption and peak overlap, which may limit the accuracy and decrease the robustness of quantitative multivariate modelling. Besides, in the Vis–NIR region the molecular cross-sections of scatter are several orders of magnitude higher than the cross-sections in absorption. Therefore, scatter methods are expected to be more sensitive (and potentially more precise) than absorption spectroscopy. On the other hand, the dairy community currently lacks low-cost analytical instruments that could bring smaller and intermediate producers into permanent quality control. From this point of view, visible spectroscopy offers greater potential for price reduction than NIR.

A large proportion of commercially produced milk is homogenized, i.e. processed to give fat globules of nearly uniform size. In this situation, the scatter intensity correlates directly with the particle number and can thus be straightforwardly converted to a component concentration. Casein micelles also have a narrow and relatively constant size distribution. Therefore, the method can readily be applied to the analysis of uniformly homogenized milk. The flow cell (Fig. 2) presented in this work has been designed to be mounted into a production pipeline for real-time in-line quality control, but it is also suitable for static laboratory analysis.
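Under the independent scattering approximation, the turbidity is a sum of per-particle contributions, τ(λ) = N_fat·C_fat(λ) + N_casein·C_casein(λ), so with known per-particle spectra the particle numbers follow from a linear least-squares fit. The sketch below illustrates this proportionality with classical least squares (CLS), a simpler technique than the PLS used in this work. The per-particle spectra and number concentrations are hypothetical, and multiple scattering in real milk would make the relation only approximate.

```python
import numpy as np

wavelengths = np.linspace(600.0, 995.0, 80)
# Hypothetical per-particle scattering spectra (arbitrary units)
c_fat = 1.0 + 0.3 * (wavelengths - 600.0) / 395.0
c_casein = (wavelengths / 600.0) ** -3.5
K = np.column_stack([c_fat, c_casein])


def turbidity(n_fat, n_casein):
    """Additive turbidity of two independent particle populations."""
    return K @ np.array([n_fat, n_casein])


def estimate_numbers(tau):
    """Least-squares estimate of the two particle number concentrations."""
    coeffs, *_ = np.linalg.lstsq(K, tau, rcond=None)
    return coeffs


tau = turbidity(2.0, 5.0)
est = estimate_numbers(tau)   # approximately [2.0, 5.0]
```

Doubling both particle numbers doubles τ and therefore doubles the estimates, which is the direct proportionality the text relies on.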

The applicability of the method to raw milk, which has a variable distribution of fat particle sizes, is an important issue that should be investigated further. Although the samples in this study were not completely homogenized, the spectral analysis still resulted in accurate predictive models. This may point to the feasibility of spectral analysis of strongly scattering polydisperse systems, owing to the size- and wavelength-dependent contributions of different particles to the transmitted light intensity. Multivariate PLS regression analysis, performed on raw spectra without any preprocessing, was capable of extracting quantitative information. Wavelength region selection noticeably improved the performance. Since the mixture in our case was limited to only two source particle distributions (cream and skim milk), a statement about the analyzability of the whole diversity of natural raw milk samples would be premature. A final generalization requires a thorough study of the effect of variable particle size distribution on the modelling. The fundamental question here is whether the Vis/SW-NIR spectra themselves carry sufficient information on the particle variations. From a theoretical point of view this can be expected, considering the different wavelength dependences of the scatter coefficient for differently sized fat globules (e.g. Walstra, 1967). The general feasibility of the approach has been proven in practice for powder mixtures, e.g. Nieuwmeyer et al. (2007). An experimental study of the effect of fat particle size on the scatter parameters, establishing the modelling feasibility in this case, is scheduled as the next research step.

Further improvement of milk fat and total protein prediction performance could hypothetically be achieved by separating the components' absorbance and scatter spectra using the approach described in earlier publications (e.g. Kessler, Oelkrug, & Kessler, 2009, and references therein).

The original experimental design can be regarded as a separate outcome of this work. The suggested diagonal scheme is an optimal way of building calibration and validation sample sets for simultaneous quantitative modelling of two components in a mixture free of the closure constraint.

Acknowledgements

The authors thank Dr. Andrey Kalinin (Institute of Spectroscopy, Troitsk, Russia) for sharing his expertise in spectroscopic analysis of milk products. Mr. Franz Merk and Mr. Bernd Schätzle (Schwarzwaldmilch GmbH) are acknowledged for their kind assistance during the experiments at the dairy. The project was supported by BMBF (PRODICHT, No. 13N9330).

References

Ashworth, U. S. (1969). Turbidimetric methods for measuring fat content of homogenized milk. Journal of Dairy Science, 52, 262–263.

Brennan, D., Alderman, J., Sattler, L., O'Connor, B., & O'Mathuna, C. (2003). Issues in development of NIR micro spectrometer system for on-line process monitoring of milk product. Measurement, 33, 67–74.

Chen, J. Y., Iyo, C., Kawano, S., & Terada, F. (1999). Development of calibration with sample cell compensation for determining the fat content of unhomogenised raw milk by a simple near infrared transmittance method. Journal of Near Infrared Spectroscopy, 7, 265–273.

Crofcheck, C. L., Payne, F. A., Hicks, C. L., Menguc, M. P., & Nokes, S. E. (2000). Fiber optic sensor response to low levels of fat in skim milk. Journal of Food Process Engineering, 23, 163–175.

Eriksson, L., Johansson, E., Kettaneh-Wold, N., Wikström, C., & Wold, S. (2008). Design of experiments: Principles and applications (3rd ed.). Umeå: Umetrics AB.

Ertugay, M. F., Şengül, M., & Şengül, M. (2004). Effect of ultrasound treatment on milk homogenisation and particle size distribution of fat. Turkish Journal of Veterinary and Animal Sciences, 28, 303–308.

Esbensen, K. H. (2000). Multivariate data analysis – In practice (4th ed.). Oslo: CAMO Process AS.

Frisvad, J. R., Christensen, N. J., & Jensen, H. W. (2007). Computing the scattering properties of participating media using Lorenz-Mie theory. ACM Transactions on Graphics, 26(3) [Article 60].

Geladi, P., MacDougall, D., & Martens, H. (1985). Linearization and scatter-correction for near-infrared reflectance spectra of meat. Applied Spectroscopy, 39, 491–500.

Giusto, A., Saija, R., Iatì, M. A., Denti, P., Borghese, F., & Sindoni, O. I. (2003). Optical properties of high-density dispersions of particles: Application to intralipid solutions. Applied Optics, 42, 4375–4380.

Griffin, M. C. A., & Griffin, W. G. (1985). A simple turbidimetric method for the determination of the refractive index of large colloidal particles applied to casein micelles. Journal of Colloid and Interface Science, 104, 409–415.

Iñón, F. A., Garrigues, S., & De la Guardia, M. (2004). Nutritional parameters of commercially available milk samples by FT-IR and chemometric techniques. Analytica Chimica Acta, 513, 401–412.

Kalinin, A. V., Krasheninnikov, V. N., & Potapov, A. V. (2009). Effect of dispersion structure variation on chemometrical calibration of near-infrared spectrometer: Protein fractions in milk and reversed micelles solutions. Chemometrics and Intelligent Laboratory Systems, 97, 33–38.

Kalinin, A., Krivtsun, V., Krashenuinnikov, V., Sadovskiy, S., Denisovich, H., & Yurova, H. (2008). Calibration models for multi-component quantitative analyses of dairy with the use of two different types of portable near infrared spectrometer. Journal of Near Infrared Spectroscopy, 16, 343–348.

Kessler, W., Oelkrug, D., & Kessler, R. (2009). Using scattering and absorption spectra as MCR-hard model constraints for diffuse reflectance measurements of tablets. Analytica Chimica Acta, 642, 127–134.

Laven, Ph. (2011). A computer program for scattering of light from a sphere using Mie theory and the Debye series. http://www.philiplaven.com/mieplot.htm (04.04.11).

Luginbühl, W. (2002). Evaluation of designed calibration samples for casein calibration in Fourier transform infrared analysis of milk. Lebensmittel-Wissenschaft and Technologie, 35, 554–558.

Michalski, M. C., Briard, V., & Michel, F. (2001). Optical parameters of milk fat globules for laser light scattering measurements. Le Lait, 81, 787–796.

Michels, R., Foschum, F., & Kienle, A. (2008). Optical properties of fat emulsions. Optics Express, 16, 5907–5925.

Muñiz, R., Pérez, M. A., De la Torre, C., Carleos, C. E., Corral, N., & Baro, J. A. (2009). Comparison of principal component regression (PCR) and partial least square (PLS) methods in prediction of raw milk composition by Vis–NIR spectrometry. Application to development of on-line sensors for fat, protein and lactose content. XIX IMEKO World Congress Fundamental and Applied Metrology, September 6–11, 2009, Lisbon, Portugal, Conference materials, ISBN 978-963-88410-0-1, 2564–2568.

Nieuwmeyer, F. J. S., Damen, M., Gerich, A., Rusmini, F., van der Voort Maarschalk, K., & Vromans, H. (2007). Granule characterization during fluid bed drying by development of a near infrared method to determine water content and median granule size. Pharmaceutical Research, 24, 1854–1861.

Purnomoadi, A., Batajoo, K. K., Ueda, K., & Terada, F. (1999). Influence of feed source on determination of fat and protein in milk by near-infrared spectroscopy. International Dairy Journal, 9, 447–452.

Rodriguez-Otero, J. L., Hermida, M., & Centeno, J. (1997). Analysis of dairy products by near-infrared spectroscopy: A review. Journal of Agricultural and Food Chemistry, 45, 2815–2819.

Šašić, S., & Ozaki, Y. (2001). Short-wave near-infrared spectroscopy of biological fluids. 1. Quantitative analysis of fat, protein and lactose in raw milk by partial least-squares regression and band assignment. Analytical Chemistry, 73, 64–71.

Tsenkova, R., Atanassova, S., Itoh, K., Ozaki, Y., & Toyoda, K. (2000). Near infrared spectroscopy for biomonitoring: Cow milk composition measurement in a spectral region from 1100 to 2400 nm. Journal of Animal Science, 78, 515–522.

Tsenkova, R., Atanassova, S., Ozaki, Y., Toyoda, K., & Itoh, K. (2001). Near-infrared spectroscopy for biomonitoring: Influence of somatic cell count on cow's milk composition analysis. International Dairy Journal, 11, 779–783.

Tsenkova, R., Atanassova, S., Toyoda, K., Ozaki, Y., Itoh, K., & Fearn, T. (1999). Near-infrared spectroscopy for dairy management: Measurement of unhomogenized milk composition. Journal of Dairy Science, 82, 2344–2351.

van Veen, R. L. P., Sterenborg, H. J. C. M., Pifferi, A., Torricelli, A., & Cubeddu, R. (2004). Determination of VIS–NIR absorption coefficients of mammalian fat, with time- and spatially resolved diffuse reflectance and transmission spectroscopy. OSA annual BIOMED topical meeting (BIO), Miami Beach, Florida, April 14, 2004.

Walstra, P. (1967). Turbidimetric method for milk fat determination. Journal of Dairy Science, 50, 1839–1840.

Walstra, P., Wouters, J. T. M., & Geurts, T. J. (2006). Dairy science and technology (2nd ed.). Boca Raton: CRC Press.

Wold, S., Martens, H., & Wold, H. (1983). The multivariate calibration problem in chemistry solved by the PLS method. In A. Ruhe & B. Kågström (Eds.), Proc. Conf. Matrix Pencils, Lecture Notes Math. (Vol. 973, pp. 286–293).

Wu, D., He, Y., & Feng, S. (2008). Short-wave near-infrared spectroscopy analysis of major compounds in milk powder and wavelength assignment. Analytica Chimica Acta, 610, 232–242.