Chemometric analysis of Tuscan olive oils


Chemometrics and Intelligent Laboratory Systems, 5 (1989) 343-354 Elsevier Science Publishers B.V., Amsterdam - Printed in The Netherlands

Original Research Paper

Chemometric Analysis of Tuscan Olive Oils

C. ARMANINO *, R. LEARDI and S. LANTERI

Istituto di Analisi e Tecnologie Farmaceutiche ed Alimentari, via Brigata Salerno, I-16147 Genova (Italy)

G. MODI

S.M.P. Laboratorio Chimico - USL 10/A, via Ponte alle Mosse 211, I-50144 Florence (Italy)

(Received 12 July 1988; accepted 11 November 1988)

ABSTRACT

Armanino, C., Leardi, R., Lanteri, S. and Modi, G., 1989. Chemometric analysis of Tuscan olive oils. Chemometrics and Intelligent Laboratory Systems, 5: 343-354.

The chemical information (fatty acids, sterols, triterpenic alcohols) on 120 olive oil samples from Tuscany, Italy, collected in 88 different areas of production, was evaluated by display methods and cluster analysis.

Inside this small region of varied orography, four groups of similar samples were detected and some relationships with the geographic profile were revealed by using a CAD package to produce effective geographical representations of variables, eigenvectors and clusters.

INTRODUCTION

In recent years, several chemometric studies of the chemical composition of olive oils have demonstrated that the geographic origin of olive oil samples can be revealed by classification methods applied to chemical components: fatty acids [1], sterols and fatty acids [2], fatty acid methyl esters [3], and sterols and triterpenic alcohols [4].

The data sets studied came from wide geographic areas, with large variations in latitude and climatic parameters. Only samples from East and West Liguria, a small region of Italy, had been investigated by display and classification methods [5-7]. Beyond the pure statistical methods, an expert system, SEXIA [8], has recently been developed and verified with Spanish olive oil samples of six varieties from Catalonia and Andalusia.

0169-7439/89/$03.50 © 1989 Elsevier Science Publishers B.V.

In this study we evaluated the information provided by the chemical composition (fatty acids, sterols, triterpenic alcohols) of 120 olive oil samples collected in Tuscany from 88 known and different production areas. The aim was to verify, by chemometric methods such as cluster analysis and principal component analysis, whether the samples from a region as small and as orographically varied as Tuscany were homogeneous or whether groups of similar samples were present.

At each step of the chemometric evaluation, which was performed by PARVUS [9], the classical display methods were supported by colour maps obtained using AUTOCAD [10] as a geographical output of PARVUS files.

EXPERIMENTAL

Data

Extra virgin olive oils (120 samples), which had been collected directly from the producing oil-mills in Tuscany during the same cropping season, were examined.

For each sample, the chemical composition of glyceride and unsaponifiable fractions and some physicochemical parameters were known: in all, the 29 variables listed in Table 1 had been measured. Free fatty acids, refractive index and the spectrophotometric variables (K268 and ΔK) had been measured by official procedures [11]. Fatty acids were determined by gas chromatography using a flame-ionization detector: methyl esters were determined by esterification with methyl alcohol, sulphuric acid and benzene (respectively 97.9, 2 and 0.1%) and analyzed by a Dani 3600 gas chromatograph (butanediol succinate 15%, Chromosorb W HMDS 80-100, 2 m x 2 mm). Aliphatic and triterpene alcohols, sterols and erythrodiol (the alcohol of the unsaponifiable fraction) were esterified and separated on a silica-gel layer (0.5 mm). The components of the alcoholic (OV-17 3%, Anakrom Q 80-100, 2 m x 4 mm) and sterolic fraction (SE-30 2.5% on Chromosorb G AW-DMCS 80-100, 2 m x 4 mm) were determined by gas chromatography using a Perkin Elmer 900. The raw data table has been published elsewhere [12].

TABLE 1

Original variables and their transformations

Index  Name                               Transform
1      Free fatty acids                   log(free fatty acids)
2      Refractive index                   -
3      K268                               -
4      ΔK                                 -
5      Palmitic acid                      -
6      Palmitoleic acid                   log(palmitoleic acid)
7      Stearic acid                       log(stearic acid)
8      Oleic acid                         -
9      Linoleic acid                      log(linoleic acid)
10     Linolenic acid                     log(linolenic acid)
11     Arachidic acid                     log(arachidic acid)
12     Eicosenoic acid                    log(eicosenoic acid + 0.1)
13     Total sterols                      log(total sterols)
14     Campesterol                        log(campesterol)
15     Stigmasterol                       log(stigmasterol)
16     Betasitosterol                     -
17     Δ5-Avenasterol                     -
18     Betasitosterol + Δ5-avenasterol    -
19     Erythrodiol                        -
20     Behenyl alcohol                    -
21     Lignoceryl alcohol                 -
22     Ceryl alcohol                      log(ceryl alcohol)
23     Montanyl alcohol                   log(montanyl alcohol)
24     C26/C24                            log(C26/C24)
25     Cycloartenol                       -
26     24-Methylenecycloartanol           -
27     Citrostadienol                     log(citrostadienol + 0.1)
28     Cyclobranol                        log(cyclobranol + 0.1)
29     Triterpene alc./Total alc.         -

Packages

Multivariate data evaluation was done by PARVUS running under DOS. Clustering was carried out by PARVUS programs not included in the Elsevier edition. A package for computer graphics, AUTOCAD, was connected with PARVUS output files by simple intermediate programs.


RESULTS AND DISCUSSION

Pretreatments

The data matrix was formed by 120 objects and 29 variables; no categories were known a priori.

Fig. 1. Map of Tuscany.


Raw data were first examined by histograms to check the normal distribution of the variables. Because some of the original variables were not normally distributed, having a positive skewness, their logarithmic transforms were used. This simple transformation was sufficient to give new variables that were almost normally distributed. Table 1 (on the right) shows the transforms of variables used in the following data evaluation, rather than the original ones.

AUTOCAD was used to display the variations of each variable related to the geographic origin of the samples. To do so, a map of Tuscany (Fig. 1) was first stored by using a graphic tablet. After that, each sample was considered as a zone whose area and location correspond to the site of sampling, so that it was possible to colour each sampling zone independently of the other ones, according to a chromatic scale depicting the value of the variable taken into account.

Fig. 2. Map of linoleic acid content of Tuscan olive oils: pure blue for the lowest values and pure red for the highest value of linoleic acid.
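The pretreatment described above (a check for positive skewness followed by a logarithmic transform, with a 0.1 offset for variables such as eicosenoic acid in Table 1) can be sketched as follows. This is only an illustration: the authors inspected histograms visually, so the moment-based `skewness` function, the `threshold` value and the use of base-10 logarithms are assumptions introduced here, not part of the original procedure.

```python
import numpy as np

def skewness(x):
    """Moment coefficient of skewness (positive for a long right tail)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

def log_if_skewed(x, offset=0.0, threshold=0.5):
    """Return (values, transformed_flag).

    `threshold` is a hypothetical cut-off standing in for the visual
    inspection of histograms; `offset` plays the role of the 0.1 added
    in Table 1 to variables that can be zero.
    """
    x = np.asarray(x, dtype=float)
    if skewness(x) > threshold:
        return np.log10(x + offset), True   # base 10 is an assumption
    return x, False
```

A variable that contains zeros would be passed with `offset=0.1`, so that the logarithm stays finite.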

During this univariate examination, the chosen chromatic scale went from pure blue (lowest values = 8 units of blue, 0 units of red) to pure red (highest values = 0 blue, 8 red), through seven intermediate colours: thus, the layer corresponding to an object with a value placed in the middle of the range of that variable will appear as violet (4 blue, 4 red). This transformation was performed by using the command Script, applied to a SCRIPT file obtained by transforming the original PARVUS file containing the values of the variable to be represented; in it, for each object, the code of the colour in which it had to be painted was specified. As an example of this graphic display, Fig. 2 shows this technique applied to the variable linoleic acid: here it is very easy to see how this variable follows a regular pattern, in which one of the sources of variation appears to be the distance from the sea.
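The nine-step blue-to-red scale can be written as a small function. This is a minimal sketch, assuming equal-width class intervals over the range of the variable (the text does not state how the class limits were chosen); the function name `blue_red_code` is hypothetical and no AUTOCAD script syntax is reproduced.

```python
def blue_red_code(value, vmin, vmax, levels=9):
    """Map a value to (blue_units, red_units) on a blue-to-red scale.

    With the default nine levels, the lowest class is (8, 0), the
    highest is (0, 8) and the central class is (4, 4), i.e. violet.
    Equal-width classes are an assumption of this sketch.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    frac = (value - vmin) / (vmax - vmin)
    k = min(max(int(frac * levels), 0), levels - 1)  # class index 0..levels-1
    return (levels - 1 - k, k)
```

Each sampling zone would then be painted with the colour whose code is returned for the value of the variable in that zone.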

The correlation coefficient matrix was computed; the most relevant correlations were the following:

β-sitosterol - Δ5-avenasterol              r = -0.936
lignoceryl alcohol - C26/C24               r = -0.899
cycloartenol - 24-methylenecycloartanol    r = -0.891
ceryl alcohol - C26/C24                    r = +0.859
palmitic acid - oleic acid                 r = -0.825
oleic acid - linoleic acid                 r = -0.724

Since the data were uncategorized, the correlation coefficient is the only parameter on which a feature selection, though very rough, can be based. To do so, we performed the following cyclic procedure: the determinant and the rank of the correlation matrix were computed; afterwards, one of the variables of the pair with the highest correlation coefficient was deleted and the new rank and determinant were computed, together with the ratio between the two determinants. When this ratio is very high, it may be supposed that the deleted variable does not carry any significant information.

Fig. 3. Bar graphs of the variable loadings on eigenvectors 1 and 2 (absolute values).

On this basis, the following variables were eliminated: Δ5-avenasterol, C26/C24, cycloartenol, and oleic acid. After that, among the remaining twenty-five variables, the highest correlation was between behenyl and ceryl alcohols (r = 0.637).
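The cyclic elimination based on the determinant of the correlation matrix might be sketched as follows. Several details are assumptions made only for illustration: the pair is chosen by the largest absolute correlation, the second variable of the pair is the one deleted (the text does not say which member was removed), and `min_ratio` is a hypothetical numerical stand-in for "very high".

```python
import numpy as np

def eliminate_correlated(X, names, min_ratio):
    """Repeatedly drop one variable of the most correlated pair.

    A deletion is accepted while det(R_after) / det(R_before) is at
    least `min_ratio`. Returns the kept names and a log of tuples
    (deleted name, r of the pair, rank after deletion, ratio).
    """
    X = np.asarray(X, dtype=float)
    names = list(names)
    log = []
    while X.shape[1] > 2:
        R = np.corrcoef(X, rowvar=False)
        det_before = np.linalg.det(R)
        off_diag = np.abs(R) - np.eye(R.shape[0])   # diagonal set to ~0
        i, j = np.unravel_index(np.argmax(off_diag), off_diag.shape)
        keep = [k for k in range(X.shape[1]) if k != j]
        R_new = np.corrcoef(X[:, keep], rowvar=False)
        ratio = np.linalg.det(R_new) / det_before
        if ratio < min_ratio:
            break                                   # deletion not justified
        log.append((names[j], float(R[i, j]),
                    int(np.linalg.matrix_rank(R_new)), float(ratio)))
        X = X[:, keep]
        names = [names[k] for k in keep]
    return names, log
```

The determinant of a correlation matrix approaches zero when two variables are nearly collinear, so removing one of them raises it sharply; this is why a large ratio suggests that the deleted variable was redundant.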

Principal component analysis

The raw data were normalized by autoscaling (column centering and column standardization), then the eigenvectors of the generalized covariance matrix were computed.

After a first examination of the eigenvector plots, one object appeared to be very different from the remaining ones: its content of palmitic acid was so low as to lead one to suppose that it was obtained from olives grown outside Tuscany. It was judged to be anomalous and was discarded; the principal component analysis and the cluster analysis were performed on the remaining 119 objects. To define the number of significant eigenvectors, double cross-validation [13] was applied; by this means it was found that the first five eigenvectors, explaining 57.8% of the total variance, were significant. Since a five-dimensional display presents some practical difficulties, we used the first two eigenvectors for display, while the first five, considered as the totality of significant information, were used for cluster analysis. The loadings of the twenty-five variables for eigenvectors 1 and 2 are depicted in Fig. 3.
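Autoscaling followed by an eigenvector decomposition can be illustrated with the sketch below. It is not the PARVUS implementation: it simply diagonalizes the covariance matrix of the autoscaled data (which equals the correlation matrix of the input), and it does not implement the double cross-validation [13] used to decide how many eigenvectors are significant. The function names are hypothetical.

```python
import numpy as np

def autoscale(X):
    """Column centering and column standardization."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def eigenvector_analysis(X):
    """Return scores, loadings, eigenvalues and explained-variance fractions.

    Eigenvectors are sorted by decreasing eigenvalue; column k of
    `loadings` holds the loadings of the variables on eigenvector k+1.
    """
    Z = autoscale(X)
    C = np.cov(Z, rowvar=False)            # correlation matrix of X
    eigval, eigvec = np.linalg.eigh(C)     # eigh returns ascending order
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    scores = Z @ eigvec
    explained = eigval / eigval.sum()
    return scores, eigvec, eigval, explained
```

The cumulative sum of `explained` over the first few eigenvectors gives the kind of retained-variance figure quoted in the text; the first two columns of `scores` are what an eigenvector plot displays.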

Fig. 4 shows a classical representation of the scores on eigenvectors 1 and 2 (29.6% of retained variance): almost no deduction can be drawn from it, since the objects do not seem to follow any pattern. In this case, too, a geographical representation has been attempted: in Figs. 5 and 6 the scores on eigenvectors 1 and 2 were separately represented on the map of Tuscany. The first eigenvector is evidently related to either latitude or distance from the Tyrrhenian sea; on it, the coastal and the southern part of the region (high scores) are separated from the inside and the northern areas (low scores).

The second eigenvector separates the samples having the lowest scores on eigenvector 1 into two groups: the first one corresponds to the hills of Valdarno and Val di Chiana (low values of eigenvector 2), while the second one corresponds to two separate regions (Apuan Alps and a zone at the border with Umbria and Latium) with high values of eigenvector 2.

Fig. 4. Eigenvector plot of Tuscan olive oil samples.

In an attempt to visualize more information than that revealed by a single eigenvector, a bidimensional scale was also defined. In it, the first dimension (eigenvector 1) corresponds to the blue-red scale already used, while the second dimension (eigenvector 2) is described by a grading of brightness, going from bright to dark, through 5 class intervals (the global scale is then formed by 45 different possibilities: 9 colours, each with 5 brightnesses). This scale is shown in Fig. 7. Thus, a layer will appear coloured dark red when it corresponds to an object having high scores on both eigenvectors, in bright blue when it corresponds to an object with low values on both eigenvectors, and in medium-bright violet for an object with central values on both eigenvectors.
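The bidimensional scale amounts to assigning each object a pair of class indices, one per eigenvector. The sketch below assumes equal-width classes over the range of each score (the class limits are not given in the text); the names `class_index` and `bidimensional_code` are hypothetical.

```python
def class_index(value, vmin, vmax, n_classes):
    """Equal-width class index in 0..n_classes-1 (assumed binning)."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    frac = (value - vmin) / (vmax - vmin)
    return min(max(int(frac * n_classes), 0), n_classes - 1)

def bidimensional_code(score1, range1, score2, range2):
    """Return (colour, brightness) for a pair of eigenvector scores.

    colour:     0 = pure blue ... 8 = pure red      (eigenvector 1)
    brightness: 0 = brightest ... 4 = darkest       (eigenvector 2)
    `range1` and `range2` are (min, max) tuples of the two scores.
    """
    colour = class_index(score1, range1[0], range1[1], 9)
    brightness = class_index(score2, range2[0], range2[1], 5)
    return (colour, brightness)
```

With this convention, (8, 4) corresponds to dark red, (0, 0) to bright blue and (4, 2) to medium-bright violet, matching the three cases described above.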

In this case, too, a SCRIPT file was prepared by a short program (available upon request), containing the names of objects and their colour codes derived from the score file of PARVUS. As might have been expected from the maps in Figs. 5 and 6, the resulting representation (Fig. 8) allows detection of three zones, formed essentially by the objects of coastal hills and of the South (red), by those of the low hills of Valdarno and Val di Chiana (bright blue), and by those of the high hills of the Apuan Alps and of the zone near Umbria and Latium (dark blue).

Fig. 5. Eigenvector 1 map: pure blue for the lowest values of scores for eigenvector 1 and pure red for the highest.

Cluster analysis

The eigenvector maps displayed groups of contiguous samples with very close values of scores for the first two eigenvectors; then, to verify the similarity among samples and to single out some categories, we applied cluster analysis to the information held by the five significant eigenvectors.


Fig. 6. Eigenvector 2 map: pure blue for the lowest values of scores for eigenvector 2 and pure red for the highest.

A hierarchical clustering technique was used [14]; the similarity matrix was computed on the basis of the Euclidean distance among the 119 five-dimensional scores of the objects. The objects were clustered by the average linkage method (weighted pair group). The similarity values of the linkages were represented by the dendrogram shown in Fig. 9: at a similarity value of 0.47 four clusters of similar objects and one singleton are separated.
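A weighted pair group (weighted average linkage) clustering on Euclidean distances, stopped at a chosen similarity level, can be sketched as below. The conversion of distance to similarity as 1 - d/d_max is an assumption made for this illustration; the text does not give the formula used by PARVUS. The function name `wpgma_clusters` is hypothetical.

```python
import numpy as np

def wpgma_clusters(scores, similarity_cut):
    """Weighted average linkage on Euclidean distances.

    Merging proceeds from the most similar pair and stops when the next
    linkage would fall below `similarity_cut`. Similarity is taken as
    1 - d / d_max (assumed). Returns one integer label per object.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    diff = scores[:, None, :] - scores[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    d_max = dist.max()
    np.fill_diagonal(dist, np.inf)       # a cluster never merges with itself
    labels = np.arange(n)                # every object starts alone
    active = list(range(n))              # one representative row per cluster
    while len(active) > 1:
        sub = dist[np.ix_(active, active)]
        a, b = np.unravel_index(np.argmin(sub), sub.shape)
        i, j = active[a], active[b]
        if 1.0 - dist[i, j] / d_max < similarity_cut:
            break
        # Weighted update: plain mean of the two merged clusters' distances
        merged = 0.5 * (dist[i] + dist[j])
        dist[i, :] = merged
        dist[:, i] = merged
        dist[i, i] = np.inf
        labels[labels == labels[j]] = labels[i]
        active.remove(j)
    return labels
```

Applied to the five-dimensional scores with `similarity_cut=0.47`, a procedure of this kind yields the groups read off the dendrogram at that level. Library routines such as `scipy.cluster.hierarchy.linkage` with `method="weighted"` implement the same linkage rule.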

Their composition was immediately found by assigning a colour to each cluster, drawing a new map of Tuscany, and painting each sampling zone with the colour of the cluster to which the sample belonged. This map, shown in Fig. 10, is self-explanatory: the objects of three clusters are contiguous, as the display methods had already shown, while this does not happen with the samples of the fourth cluster (light blue).

The contiguous clusters correspond to the zones already detected; the objects of the fourth one derive from sites which, although not topographically adjacent, are in any case mutually similar, since they are characterized by having an altitude higher than that of the surrounding zone.

Fig. 7. Bidimensional chromatic scale used for colouring Fig. 8.

When applying the clustering procedure of complete (instead of average) linkage, almost identical clusters were found.

CONCLUSIONS

On the basis of their chemical composition, four different groups of olive oil samples were detected inside Tuscany, three of which correspond to different production areas of similar orography and distance from the sea.

The maps used to display variables, eigenvectors and clusters have been very useful in developing this data analysis.

In the same way, this kind of representation could give interesting information in similar problems, in which the chemical composition of samples is supposed to be related to the geographic origin.


Fig. 8. Eigenvector 1-2 map represented by the bidimensional chromatic scale in Fig. 7: eigenvector 1 is represented by the blue-red scale, eigenvector 2 by a brightness (low values of scores) and darkness (high values) scale.

Fig. 9. Dendrogram of Tuscan olive oils obtained by hierarchical clustering (weighted average linkage method). Red: low hills of Valdarno and Val di Chiana; blue: high hills; green: coastal hills and South; maroon: samples from hills surrounded by plain; black: singleton.


Fig. 10. Clustering map: the sampling zones are coloured with the colour of the cluster to which they belong, as in Fig. 9 (light blue, instead of maroon, was used for painting the samples of the fourth cluster, grey for the singleton).

ACKNOWLEDGEMENTS

We are grateful to Dr. P. Zunin for useful discussions. This work received financial support from the Education Department (MPI 40%) and from the National Council of Research. The paper was presented in part at the EUCHEM Conference, Trieste, Italy, in June 1988.

REFERENCES

1 M. Forina and E. Tiscornia, Pattern recognition methods in the prediction of Italian olive oil origin by their fatty acid content, Annali di Chimica (Rome), 72 (1982) 143-155.

2 M. Forina, C. Armanino, S. Lanteri, C. Calcagno and E. Tiscornia, Valutazione delle caratteristiche chimiche dell’olio di oliva in funzione dell’annata di produzione, mediante metodi di classificazione multivariati, Rivista Italiana delle Sostanze Grasse, 60 (1983) 607-613.

3 O. Eddib and G. Nickless, Elucidation of olive oil classification by chemometrics, The Analyst (London), 112 (1987) 391-395.

4 R. Leardi and V. Paganuzzi, Caratterizzazione dell’origine di oli di oliva extra vergini mediante metodi chemiometrici applicati alla frazione sterolica, Rivista Italiana delle Sostanze Grasse, 64 (1987) 131-136.

5 M. Forina and C. Armanino, Eigenvector projection and simplified non-linear mapping of fatty acid content of Italian olive oils, Annali di Chimica (Rome), 72 (1982) 127-141.

6 M. Forina, C. Armanino, S. Lanteri and C. Calcagno, Simplified non-linear mapping of analytical data, Annali di Chimica (Rome), 73 (1983) 641-651.

7 M.P. Derde, D. Coomans and D.L. Massart, Effect of scaling on class modelling with the SIMCA method, Analytica Chimica Acta, 141 (1982) 187-192.

8 R. Aparicio, Characterization of foods by inexact rules: the SEXIA expert system, Journal of Chemometrics, 3 (1988) 175-192.

9 M. Forina, R. Leardi, C. Armanino and S. Lanteri, PARVUS: an extendable package of programs for data exploration, classification and correlation, Elsevier Scientific Software, Amsterdam, 1988.

10 AUTOCAD, Autodesk AG, Version 2.6.44 IBM PC, Advanced Drafting Extension 3.

11 Ministero Agricoltura e Foreste, Metodi Ufficiali di Analisi per Olii e Grassi, Supplemento 1, Rome, 1963.

12 A. Fabbrini, G. Modi, M. Piccinini, G. Simiani and S. Bonciani, Indagine su olii di pressione prodotti in Toscana, Boll. Chim. Igien., 35 (1984) 59-71.

13 S. Wold, Cross validatory estimation of the number of components in factor and principal components models, Technometrics, 20 (1978) 397-406.

14 D.L. Massart and L. Kaufman, Hierarchical clustering methods, in J.D. Winefordner (Editor), The Interpretation of Analytical Chemical Data by the Use of Cluster Analysis, Wiley, New York, 1983, pp. 75-99.
