Chemometrics and Intelligent Laboratory Systems, 5 (1989) 343-354
Elsevier Science Publishers B.V., Amsterdam - Printed in The Netherlands
Original Research Paper
Chemometric Analysis of Tuscan Olive Oils
C. ARMANINO *, R. LEARDI and S. LANTERI
Istituto di Analisi e Tecnologie Farmaceutiche ed Alimentari, via Brigata Salerno, I-16147 Genova (Italy)
G. MODI
S.M.P. Laboratorio Chimico - USL 10/A, via Ponte alle Mosse 211, I-50144 Florence (Italy)
(Received 12 July 1988; accepted 11 November 1988)
ABSTRACT
Armanino, C., Leardi, R., Lanteri, S. and Modi, G., 1989. Chemometric analysis of Tuscan olive oils. Chemometrics and Intelligent Laboratory Systems, 5: 343-354.
The chemical information (fatty acids, sterols, triterpenic alcohols) on 120 olive oil samples from Tuscany, Italy, collected in 88 different areas of production, was evaluated by display methods and cluster analysis.
Inside this small region of varied orography, four groups of similar samples were detected and some relationships with the geographic profile were revealed by using a CAD package to produce effective geographical representations of
variables, eigenvectors and clusters.
INTRODUCTION
In recent years several studies, carried out by chemometric techniques on the chemical composition of olive oils, have demonstrated that the geographic origin of olive oil samples can be revealed by classification methods applied to chemical components: fatty acids [1], sterols and fatty acids [2], fatty acid methyl esters [3], and sterols and triterpenic alcohols [4] were evaluated.
The data sets studied came from wide geographic areas, with large variations in latitude and climatic parameters. Only samples from East and West Liguria, a small region of Italy, had been investigated by display and classification methods [5-7]. Beyond the pure statistical methods, an expert system, SEXIA [8], has recently been developed and verified with Spanish olive oil samples of six varieties from Catalonia and Andalusia.
0169-7439/89/$03.50 © 1989 Elsevier Science Publishers B.V.
In this study we simply evaluated the information provided by the chemical composition (fatty acids, sterols, triterpenic alcohols) of 120 samples of olive oils collected in Tuscany, from 88 known and different production areas. The aim was to verify, by chemometric methods such as cluster analysis and principal component analysis, whether the samples were homogeneous or whether groups of similar samples were present within a region as small and as orographically varied as Tuscany.
At each step of the chemometric evaluation, which was performed by PARVUS [9], the classical display methods were supported by colour maps obtained using AUTOCAD [10] as a geographical output of PARVUS files.
EXPERIMENTAL
Data
Extra virgin olive oils (120 samples), which had been collected directly from the producing oil-mills in Tuscany during the same cropping season, were examined.
For each sample, the chemical composition of glyceride and unsaponifiable fractions and some physicochemical parameters were known: in all, the 29 variables listed in Table 1 had been measured. Free fatty acids, refractive index and the spectrophotometric variables (K268 and ΔK) had been measured by official procedures [11]. Fatty acids were determined by gas chromatography using a flame-ionization detector: methyl esters were prepared by esterification with methyl alcohol, sulphuric acid and benzene (respectively 97.9, 2 and 0.1%) and analyzed by a Dani 3600 gas chromatograph (butanediol succinate 15%, Chromosorb W HMDS 80-100, 2 m x 2 mm). Aliphatic and triterpene alcohols, sterols and erythrodiol (the alcohol of the unsaponifiable fraction) were esterified and separated on a silica-gel layer (0.5 mm). The components of the alcoholic (OV-17 3%, Anakrom Q 80-100, 2 m x 4 mm) and sterolic fraction (SE-30 2.5% on Chromosorb G AW-DMCS 80-100, 2 m x 4 mm) were determined by gas chromatography using a Perkin Elmer 900. The raw data table has been published elsewhere [12].

TABLE 1
Original variables and their transformations

Index  Name                             Transform
1      Free fatty acids                 log(free fatty acids)
2      Refractive index                 -
3      K268                             -
4      ΔK                               -
5      Palmitic acid                    -
6      Palmitoleic acid                 log(palmitoleic acid)
7      Stearic acid                     log(stearic acid)
8      Oleic acid                       -
9      Linoleic acid                    log(linoleic acid)
10     Linolenic acid                   log(linolenic acid)
11     Arachidic acid                   log(arachidic acid)
12     Eicosenoic acid                  log(eicosenoic acid + 0.1)
13     Total sterols                    log(total sterols)
14     Campesterol                      log(campesterol)
15     Stigmasterol                     log(stigmasterol)
16     Betasitosterol                   -
17     Δ5-Avenasterol                   -
18     Betasitosterol + Δ5-avenasterol  -
19     Erythrodiol                      -
20     Behenyl alcohol                  -
21     Lignoceryl alcohol               -
22     Ceryl alcohol                    log(ceryl alcohol)
23     Montanyl alcohol                 log(montanyl alcohol)
24     C26/C24                          log(C26/C24)
25     Cycloartenol                     -
26     24-Methylenecycloartanol         -
27     Citrostadienol                   log(citrostadienol + 0.1)
28     Cyclobranol                      log(cyclobranol + 0.1)
29     Triterpene alc./Total alc.       -

Packages
Multivariate data evaluation was done by PARVUS running under DOS. Clustering was carried out by PARVUS programs not included in the Elsevier edition. A package for computer graphics, AUTOCAD, was connected with PARVUS output files by simple intermediate programs.
RESULTS AND DISCUSSION
Pretreatments
The data matrix was formed by 120 objects and 29 variables; no categories were known a priori.
Fig. 1. Map of Tuscany.
Raw data were first examined by histograms to check the normal distribution of the variables. Because some of the original variables were not normally distributed, having a positive skewness, their logarithmic transforms were used. This simple transformation was sufficient to give new variables that were almost normally distributed. Table 1 (right-hand column) shows the transforms of variables used in the following data evaluation, rather than the original ones.
AUTOCAD was used to display the variations of each variable related to the geographic origin of the samples. To do so, a map of Tuscany (Fig. 1) was first stored by using a graphic tablet. After that, each sample was considered as a zone whose area and location correspond to the site of sampling, so that it was possible to colour each sampling zone independently of the other ones, according to a chromatic scale depicting the value of the variable taken into account.
Fig. 2. Map of linoleic acid content of Tuscan olive oils: pure blue for the lowest values and pure red for the highest value of linoleic acid.
During this univariate examination, the chosen chromatic scale went from pure blue (lowest values = 8 units of blue, 0 units of red) to pure red (highest values = 0 blue, 8 red), through seven intermediate colours: thus, the layer corresponding to an object with a value placed in the middle of the range of that variable will appear as violet (4 blue, 4 red). This transformation was performed by using the command Script, applied to a SCRIPT file obtained by transforming the original PARVUS file containing the values of the variable to be represented; in it, for each object, the code of the colour in which it had to be painted was specified. As an example of this graphic display, Fig. 2 shows this technique applied to the variable linoleic acid: here it is very easy to see how this variable follows a regular pattern, in which one of the sources of variation appears to be the distance from the sea.
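The nine-step blue-to-red scale described above can be sketched in a few lines. This is only an illustration: the function name `blue_red_class` is hypothetical, and the use of equal-width class intervals over the range of the variable is an assumption, since the text does not state how the class limits were chosen.

```python
def blue_red_class(value, lo, hi, n_classes=9):
    """Return (blue_units, red_units) for a value on the blue-to-red scale.

    Assumption: the range [lo, hi] is split into n_classes equal-width
    intervals; the paper does not specify how the classes were defined.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Relative position of the value inside the range, clipped to [0, 1].
    frac = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    # Class index 0..n_classes-1; the top of the range falls in the last class.
    k = min(int(frac * n_classes), n_classes - 1)
    units = n_classes - 1  # 8 units shared between blue and red
    return units - k, k


# Hypothetical values of one variable for three sampling zones.
zones = {"zone_a": 5.0, "zone_b": 7.5, "zone_c": 10.0}
lo, hi = min(zones.values()), max(zones.values())
colours = {name: blue_red_class(v, lo, hi) for name, v in zones.items()}
```

With these made-up values the lowest zone gets pure blue (8, 0), the central one violet (4, 4) and the highest pure red (0, 8), matching the three reference points given in the text.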
The correlation coefficient matrix was computed; the most relevant correlations were the following:
β-sitosterol - Δ5-avenasterol              r = -0.936
lignoceryl alcohol - C26/C24               r = -0.899
cycloartenol - 24-methylenecycloartanol    r = -0.891
ceryl alcohol - C26/C24                    r = +0.859
palmitic acid - oleic acid                 r = -0.825
oleic acid - linoleic acid                 r = -0.724
Since data were uncategorized, the coefficient of correlation is the only parameter on which a feature selection, though very rough, can be based. To do so, we performed the following cyclic procedure: the determinant and the rank of the correlation matrix were computed; afterwards, one of the variables of the pair with the highest correlation coefficient was deleted and the new rank and determinant were computed, together with the ratio between the two determinants: when this ratio is very high it may be supposed that the deleted variable does not carry any significant information.
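One step of this cyclic procedure can be sketched as follows. The sketch assumes that the ratio is taken as the determinant after deletion divided by the determinant before deletion (the text does not say which way round); under that assumption the ratio equals 1/(1 - R²), where R² is the squared multiple correlation of the deleted variable with the remaining ones, so a very high value indicates a nearly redundant variable. The rank computation is omitted, and all function names and the toy data are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long columns."""
    n = len(columns[0])
    unit = []
    for col in columns:
        mean = sum(col) / n
        dev = [x - mean for x in col]
        norm = math.sqrt(sum(d * d for d in dev))
        unit.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in unit] for ci in unit]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = 1.0
    for i in range(size):
        pivot = max(range(i, size), key=lambda r: abs(m[r][i]))
        if abs(m[pivot][i]) < 1e-12:
            return 0.0
        if pivot != i:
            m[i], m[pivot] = m[pivot], m[i]
            det = -det
        det *= m[i][i]
        for r in range(i + 1, size):
            factor = m[r][i] / m[i][i]
            for c in range(i, size):
                m[r][c] -= factor * m[i][c]
    return det


def most_correlated_pair(corr):
    """Indices (i, j) of the pair with the largest absolute correlation."""
    size = len(corr)
    pairs = [(abs(corr[i][j]), i, j) for i in range(size) for j in range(i + 1, size)]
    _, i, j = max(pairs)
    return i, j


def deletion_ratio(corr, k):
    """det(matrix without variable k) / det(full matrix); assumed direction."""
    reduced = [[v for j, v in enumerate(row) if j != k]
               for i, row in enumerate(corr) if i != k]
    before = determinant(corr)
    return math.inf if before == 0.0 else determinant(reduced) / before


# Toy data: y is almost proportional to x, z is only weakly related to both.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
z = [3.0, 1.0, 4.0, 1.0, 5.0]
corr = correlation_matrix([x, y, z])
i, j = most_correlated_pair(corr)
ratio = deletion_ratio(corr, j)
```

For the toy data the most correlated pair is (x, y), and deleting y gives a ratio of several hundred, which in the spirit of the procedure would mark y as carrying little information beyond x.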
On this basis, the following variables were eliminated: Δ5-avenasterol, C26/C24, cycloartenol, and oleic acid. After that, among the remaining twenty-five variables, the highest correlation was between behenyl and ceryl alcohols (r = 0.637).
Fig. 3. Bar graphics of the variable loadings on eigenvectors 1 and 2 (absolute values).
Principal component analysis
The raw data were normalized by autoscaling (column centering and column standardization); the eigenvectors of the generalized covariance matrix were then computed.
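As a rough illustration of this step, the sketch below autoscales a small data table, forms the covariance matrix of the autoscaled data and extracts its leading eigenvector. It is not the PARVUS implementation: power iteration is used here only to keep the example free of external libraries, the function names are hypothetical, and the toy data are invented.

```python
import math


def autoscale(rows):
    """Column centering followed by scaling to unit standard deviation."""
    n = len(rows)
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        scaled_cols.append([(x - mean) / sd for x in col])
    return [list(r) for r in zip(*scaled_cols)]


def covariance(centred_rows):
    """Covariance matrix of already centred data (objects in rows)."""
    n = len(centred_rows)
    p = len(centred_rows[0])
    return [[sum(r[i] * r[j] for r in centred_rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its eigenvector by power iteration.

    The start vector (1, 2, 3, ...) is an arbitrary choice; it must not be
    orthogonal to the leading eigenvector for the iteration to converge.
    """
    p = len(matrix)
    v = [float(i + 1) for i in range(p)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return eigenvalue, v


# Toy table: 5 objects, 2 strongly correlated variables.
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]
scaled = autoscale(rows)
cov = covariance(scaled)
eigenvalue, loadings = leading_eigenpair(cov)
scores = [sum(x * l for x, l in zip(row, loadings)) for row in scaled]
explained = eigenvalue / len(cov)  # fraction of total variance
```

After autoscaling every variable has unit variance, so the total variance equals the number of variables and the fraction explained by an eigenvector is its eigenvalue divided by that number.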
After a first examination of the eigenvector plots, an object that appeared to be very different from the remaining ones (indeed, its content of palmitic acid was so low as to lead one to suppose that it was obtained from olives grown outside Tuscany) was judged to be anomalous and was discarded; the principal component analysis and the cluster analysis were performed on the remaining 119 objects. To define the number of significant eigenvectors, double cross-validation [13] was applied; by this means it was found that the first five eigenvectors, explaining 57.8% of the total variance, were significant. Since a five-dimensional display currently presents some practical difficulties, we used the first two eigenvectors for display, while the first five, considered as the totality of significant information, were used for cluster analysis. The loadings of the twenty-five variables for eigenvectors 1 and 2 are depicted in Fig. 3.
Fig. 4 shows a classical representation of the scores on eigenvectors 1 and 2 (29.6% of retained variance): almost no deduction can be drawn from it, since the objects do not seem to follow any pattern. In this case, too, a geographical representation has been attempted: in Figs. 5 and 6 the scores on eigenvectors 1 and 2 were separately represented on the map of Tuscany. The first eigenvector is evidently related to either latitude or distance from the Tyrrhenian sea; on it, the coastal and the southern part of the region (high scores) are separated from the inland and the northern areas (low scores).
The second eigenvector separates the samples having the lowest scores on eigenvector 1 into two groups: the first one corresponds to the hills of Valdarno and Val di Chiana (low values of eigenvector 2), while the second one corresponds to two separate regions (Apuan Alps and a zone at the border with Umbria and Latium) with high values of eigenvector 2.
Fig. 4. Eigenvector plot of Tuscan olive oil samples.
In an attempt to visualize more information than that revealed by a single eigenvector, a bidimensional scale was also defined. In it, the first dimension (eigenvector 1) corresponds to the blue-red scale already used, while the second dimension (eigenvector 2) is described by a grading of brightness, going from bright to dark, through 5 class intervals (the global scale is then formed by 45 different possibilities: 9 colours, each with 5 brightnesses). This scale is shown in Fig. 7. Thus, a layer will appear coloured dark red when it corresponds to an object having high scores on both eigenvectors, in bright blue when it corresponds to an object with low values on both eigenvectors, and in medium-bright violet for an object with central values on both eigenvectors.
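The bidimensional scale can be sketched as a pair of class assignments, one per eigenvector. As before, this is an illustration under assumptions: equal-width class intervals are assumed for both dimensions, the darkness level is coded 0 (brightest) to 4 (darkest), and the function names are hypothetical.

```python
def class_index(value, lo, hi, n_classes):
    """Equal-width class index in 0..n_classes-1 (assumed binning rule)."""
    frac = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    return min(int(frac * n_classes), n_classes - 1)


def bidimensional_code(score1, score2, range1, range2,
                       n_colours=9, n_brightness=5):
    """Return (blue_units, red_units, darkness) for a pair of scores.

    score1 drives the blue-to-red colour, score2 the darkness level:
    0 is the brightest class (low scores), n_brightness - 1 the darkest.
    """
    k = class_index(score1, range1[0], range1[1], n_colours)
    d = class_index(score2, range2[0], range2[1], n_brightness)
    return n_colours - 1 - k, k, d


# Hypothetical score ranges on eigenvectors 1 and 2.
r1, r2 = (-3.0, 3.0), (-3.0, 3.0)
low_low = bidimensional_code(-3.0, -3.0, r1, r2)    # bright blue
centre = bidimensional_code(0.0, 0.0, r1, r2)       # medium-bright violet
high_high = bidimensional_code(3.0, 3.0, r1, r2)    # dark red
```

Nine colour classes combined with five darkness classes give the 45 possible codes mentioned in the text.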
In this case, too, a SCRIPT file was prepared by a short program (available upon request), containing the names of objects and their colour codes derived from the score file of PARVUS. As might have been expected from the maps in Figs. 5 and 6, the resulting representation (Fig. 8) allows detection of three zones, formed essentially by the objects of coastal hills and of the South (red), by those of the low hills of Valdarno and Val di Chiana (bright blue), and by those of the high hills of the Apuan Alps and of the zone near Umbria and Latium (dark blue).
Fig. 5. Eigenvector 1 map: pure blue for the lowest values of scores for eigenvector 1 and pure red for the highest.
Cluster analysis
The eigenvector maps displayed groups of contiguous samples with very close values of scores for the first two eigenvectors. To verify the similarity among samples and to single out some categories, we then applied cluster analysis to the information held by the five significant eigenvectors.
Fig. 6. Eigenvector 2 map: pure blue for the lowest values of scores for eigenvector 2 and pure red for the highest.
A hierarchical clustering technique was used [14]; the similarity matrix was computed on the basis of the Euclidean distance among the 119 five-dimensional score vectors of the objects. The objects were clustered by the average linkage method (weighted pair group). The similarity values of the linkages were represented by the dendrogram shown in Fig. 9: at a similarity value of 0.47, four clusters of similar objects and one singleton are separated.
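A minimal sketch of weighted average linkage (weighted pair group) clustering on Euclidean distances is given below. It is not the PARVUS program used by the authors. In particular, the conversion from distance to similarity, s = 1 - d/d_max with d_max the largest distance between two objects, is an assumption, as are the function names and the toy points.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_average_linkage(points):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = {i: (i,) for i in range(len(points))}
    dist = {(i, j): euclidean(points[i], points[j])
            for i in clusters for j in clusters if i < j}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        (a, b), d = min(dist.items(), key=lambda item: item[1])
        merges.append((clusters[a], clusters[b], d))
        # Weighted pair group update: the distance from the new cluster to any
        # other cluster is the plain mean of the distances from its two parts.
        for c in clusters:
            if c not in (a, b):
                dac = dist.pop((min(a, c), max(a, c)))
                dbc = dist.pop((min(b, c), max(b, c)))
                dist[(c, next_id)] = 0.5 * (dac + dbc)
        del dist[(a, b)]
        clusters[next_id] = clusters[a] + clusters[b]
        del clusters[a], clusters[b]
        next_id += 1
    return merges


def cut(points, similarity_level):
    """Cluster label per object, keeping only merges at or above the level.

    Assumption: similarity = 1 - distance / largest inter-object distance.
    """
    d_max = max(euclidean(p, q) for p in points for q in points)
    labels = list(range(len(points)))
    for members_a, members_b, d in weighted_average_linkage(points):
        if 1.0 - d / d_max < similarity_level:
            break  # merge distances never decrease with this linkage
        members = members_a + members_b
        for i in members:
            labels[i] = min(members)
    return labels


# Toy score vectors: two tight pairs and one isolated object.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (30.0, 30.0)]
fine = cut(points, 0.9)
coarse = cut(points, 0.5)
```

On the toy data a high similarity level separates the two pairs and the isolated object, while a lower level joins the two pairs and leaves the isolated object as a singleton, which is the kind of reading made on the dendrogram at a chosen similarity value.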
Their composition was immediately found by assigning a colour to each cluster, drawing a new map of Tuscany, and painting each sampling zone with the colour of the cluster to which the sample belonged. This map, shown in Fig. 10, is self-explanatory: the objects of three clusters are contiguous, as the display methods had already shown, while this does not happen with the samples of the fourth cluster (light blue).
The contiguous clusters correspond to the zones already detected; the objects of the fourth one derive from sites which, although not topographically adjacent, are in any case mutually similar, since they are characterized by having an altitude higher than that of the surrounding zone.
Fig. 7. Bidimensional chromatic scale used for colouring Fig. 8.
When applying the clustering procedure of complete (instead of average) linkage, almost identical clusters were found.
CONCLUSIONS
On the basis of their chemical composition, four different groups of olive oil samples were detected inside Tuscany, three of which correspond to different production areas of similar orography and distance from the sea.
The maps used to display variables, eigenvectors and clusters have been very useful in developing this data analysis.
In the same way, this kind of representation could give interesting information in similar problems, in which the chemical composition of samples is supposed to be related to the geographic origin.
Fig. 8. Eigenvector 1-2 map represented by the bidimensional chromatic scale in Fig. 7: eigenvector 1 is represented by the blue-red scale, eigenvector 2 by a brightness (low values of scores) and darkness (high values) scale.
Fig. 9. Dendrogram of Tuscan olive oils obtained by hierarchical clustering (weighted average linkage method). Red: low hills of Valdarno and Val di Chiana; blue: high hills; green: coastal hills and South; maroon: samples from hills surrounded by plain; black: singleton.
Fig. 10. Clustering map: the sampling zones are coloured with the colour of the cluster to which they belong, as in Fig. 9 (light blue, instead of maroon, was used for painting the samples of the fourth cluster, grey for the singleton).
ACKNOWLEDGEMENTS
We are grateful to Dr. P. Zunin for useful discussions. This work received financial support from the Education Department (MPI 40%) and from the National Council of Research. The paper was presented in part at the EUCHEM Conference, Trieste, Italy, in June 1988.
REFERENCES
1 M. Forina and E. Tiscornia, Pattern recognition methods in the prediction of Italian olive oil origin by their fatty acid content, Annali di Chimica (Rome), 72 (1982) 143-155.
2 M. Forina, C. Armanino, S. Lanteri, C. Calcagno and E. Tiscornia, Valutazione delle caratteristiche chimiche dell'olio di oliva in funzione dell'annata di produzione, mediante metodi di classificazione multivariati, Rivista Italiana delle Sostanze Grasse, 60 (1983) 607-613.
3 O. Eddib and G. Nickless, Elucidation of olive oil classification by chemometrics, The Analyst (London), 112 (1987) 391-395.
4 R. Leardi and V. Paganuzzi, Caratterizzazione dell'origine di oli di oliva extra vergini mediante metodi chemiometrici applicati alla frazione sterolica, Rivista Italiana delle Sostanze Grasse, 64 (1987) 131-136.
5 M. Forina and C. Armanino, Eigenvector projection and simplified non-linear mapping of fatty acid content of Italian olive oils, Annali di Chimica (Rome), 72 (1982) 127-141.
6 M. Forina, C. Armanino, S. Lanteri and C. Calcagno, Simplified non-linear mapping of analytical data, Annali di Chimica (Rome), 73 (1983) 641-651.
7 M.P. Derde, D. Coomans and D.L. Massart, Effect of scaling on class modelling with the SIMCA method, Analytica Chimica Acta, 141 (1982) 187-192.
8 R. Aparicio, Characterization of foods by inexact rules: the SEXIA expert system, Journal of Chemometrics, 3 (1988) 175-192.
9 M. Forina, R. Leardi, C. Armanino and S. Lanteri, PARVUS: an extendable package of programs for data exploration, classification and correlation, Elsevier Scientific Software, Amsterdam, 1988.
10 AUTOCAD, Autodesk AG, Version 2.6.44 IBM PC, Advanced Drafting Extension 3.
11 Ministero Agricoltura e Foreste, Metodi Ufficiali di Analisi per Olii e Grassi, Supplemento 1, Rome, 1963.
12 A. Fabbrini, G. Modi, M. Piccinini, G. Simiani and S. Bonciani, Indagine su olii di pressione prodotti in Toscana, Boll. Chim. Igien., 35 (1984) 59-71.
13 S. Wold, Cross validatory estimation of the number of components in factor and principal components models, Technometrics, 20 (1978) 397-406.
14 D.L. Massart and L. Kaufman, Hierarchical clustering methods, in J.D. Winefordner (Editor), The Interpretation of Analytical Chemical Data by the Use of Cluster Analysis, Wiley, New York, 1983, pp. 75-99.