Baxter, M. J. (2001). Statistical Modelling of Artefact Compositional Data

STATISTICAL MODELLING OF ARTEFACT

COMPOSITIONAL DATA*

M. J. BAXTER

Department of Mathematics, Statistics and Operational Research, Nottingham Trent University, Clifton Campus,

Nottingham NG11 8NS, UK

Model-based methods for clustering artefacts, given their chemical composition, often assume

sampling from a mixture of multivariate normal distributions and/or make explicit assumptions

about the way in which a composition is formed. It is argued that, analysed within a

modelling framework, several important and apparently competing methodologies are more

similar than would initially appear. The opportunity is taken to note that models for

populations are often not compatible with models for compositions, and that dilution

correction, which can be accomplished in a variety of ways, can be interpreted as an

attempt to resolve this problem.

KEYWORDS: CLUSTER ANALYSIS, DATA TRANSFORMATION, COMPOSITION, DILUTION,

MAHALANOBIS DISTANCE, MODEL, MULTIVARIATE NORMALITY

INTRODUCTION

This paper has arisen from a project that has, as one of its aims, the investigation of statistical

model-based approaches to the analysis of artefact compositional data. The particular focus is on

the use of model-based methods for grouping or clustering data. A model-based analysis is taken

to be an approach that incorporates some, or all, of the following features:

(1) A model is formulated for the composition of an artefact or case.

(2) A model is formulated that explains why artefacts (cases) of the same kind (and possibly

from the same source) differ in their composition. This may include assumptions about the

nature of the population from which a sample of artefacts is drawn.

(3) The data analysis is influenced by the modelling assumptions of (1) and/or (2).

Model-based approaches may be contrasted with exploratory methods, which include methods

of cluster analysis (e.g., average linkage, complete linkage) commonly used in the archaeometric

literature. Model-based methods typically involve the exploitation of statistical assumptions that

are absent from exploratory methods. They may be more demanding of data and computational

resources, but are also potentially more rewarding in ways to be discussed later.

The focus in this paper is on model-based approaches that have been used in the archaeometric

literature for the analysis of artefact compositional data. A main contention is that several

apparently competing, and superficially distinct, methodologies that have been proposed are

much more similar than is apparent at first sight. For convenience of exposition, there is an

emphasis on the analysis of ceramic compositional data, though the arguments advanced hold

more generally. A separate paper will investigate model-based clustering approaches that have

been proposed in the statistical literature, but which have yet to be exploited by archaeologists

(Papageorgiou et al. 2000).

© University of Oxford, 2001.

Archaeometry 43, 1 (2001) 131–147. Printed in Great Britain

* Received 13 March 2000; accepted 5 July 2000.

It is noted that models that have been proposed for ceramic paste compositions and models

that have been used to analyse samples of ceramics are contradictory, in the sense that

assumptions of the latter are incompatible with the implications of the former. What has

sometimes been referred to as `dilution correction' can be viewed as an attempt to resolve this

problem, and this is also studied from a modelling perspective.

The next two sections establish notation and develop the models for ceramic pastes and

samples of ceramics that underpin discussion in the main section of the paper. In the main

section methodologies developed in laboratories at Brookhaven, Missouri, Bonn and Barcelona

(Bieber et al. 1976; Glascock 1992; Beier and Mommsen 1994; Buxeda 1999) are examined, and

compared within an explicit modelling framework. In all but the last case, the methodologies

depend (at least in part) on assumptions about the normality of compositional data within groups,

and use probability calculations based on Mahalanobis distance calculations to assess whether

cases belong to a group (e.g., Sayre 1975; Sayre et al. 1992; Glascock 1992; Beier and

Mommsen 1994; Leese and Main 1994). Such calculations, which exploit the modelling

assumption of normality, are not possible in exploratory methodologies, and constitute one of

the major uses of modelling assumptions in statistical analyses of archaeometric data.

NOTATION AND TERMINOLOGY

A single artefact will be referred to as a case; a collection of n cases, to be subjected to chemical

and statistical analysis, will be called a sample. Models may be proposed for both case

composition and sample distribution, as will be seen in the next section.

The observed composition for the $i$th case will be denoted by the $p \times 1$ vector $x_i$, with $x_{ij}$ the

value for case $i$ on the $j$th variable, $X_j$. Let $\bar{x}_j$ and $s_j^2$ be the estimated mean and variance of the $j$th

variable, with $\bar{x}$ the $p \times 1$ vector of means.

The notation $x_{ij}$ will also be used for transformed data, as follows and as will be clear from the

context:

(1) $x_{ij} \leftarrow x_{ij}$, untransformed data;

(2) $x_{ij} \leftarrow (x_{ij} - \bar{x}_j)$, centred data;

(3) $x_{ij} \leftarrow (x_{ij} - \bar{x}_j)/s_j$, standardized data;

(4) $x_{ij} \leftarrow \log x_{ij}$, logged data;

(5) $x_{ij} \leftarrow \log(x_{ij}/x_{ip})$ for $j = 1, 2, \ldots, p - 1$, log-ratio transformed data.

The data matrix $X$ with typical element $x_{ij}$ will be called the untransformed, centred,

standardized, logged or log-ratio data matrix, according to which transform is used; and other

possibilities, such as standardized logged data, exist.
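The five transformations listed above can be illustrated with a minimal Python sketch. The function names and the small data matrix are hypothetical, introduced only for illustration; natural logarithms are assumed, and a different base would merely rescale the logged values.

```python
import math

def column_means(X):
    """Mean of each variable (column) of the n x p data matrix X."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def column_sds(X):
    """Sample standard deviation of each column (divisor n - 1)."""
    n = len(X)
    means = column_means(X)
    return [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
            for j in range(len(X[0]))]

def centred(X):
    """Transformation (2): subtract the column mean."""
    means = column_means(X)
    return [[x - means[j] for j, x in enumerate(row)] for row in X]

def standardized(X):
    """Transformation (3): centre, then divide by the column standard deviation."""
    means, sds = column_means(X), column_sds(X)
    return [[(x - means[j]) / sds[j] for j, x in enumerate(row)] for row in X]

def logged(X):
    """Transformation (4): natural logarithm of every entry (entries must be > 0)."""
    return [[math.log(x) for x in row] for row in X]

def log_ratio(X):
    """Transformation (5): log of each of the first p - 1 variables over the pth."""
    return [[math.log(x / row[-1]) for x in row[:-1]] for row in X]

# Hypothetical 3 x 3 data matrix (three cases, three variables), for illustration only.
X = [[60.0, 25.0, 15.0],
     [50.0, 30.0, 20.0],
     [40.0, 35.0, 25.0]]

print(centred(X)[0])        # first case after centring
print(standardized(X)[0])   # first case after standardization
print(log_ratio(X)[0])      # first case as p - 1 = 2 log-ratios
```

Combined transformations follow by composition; for example, standardized logged data would be obtained in this sketch as `standardized(logged(X))`.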

The multivariate normal distribution (MVN) plays a fundamental role in most modelling

approaches to the analysis of artefact compositional data. If $x_i$ is sampled from a $p$-variate

multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, write

$$x_i \sim \mathrm{MVN}(\mu, \Sigma).$$

If all $n$ cases are sampled from the same MVN, then $\bar{x}$ is an unbiased estimate of $\mu$ and

$S = X'X/(n - 1)$, with $X$ the centred data matrix, is an unbiased estimate of $\Sigma$.

In many applications, data are analysed with the expectation that there is more than one group

or cluster in the data, reflecting the underlying population structure. In this situation assume that

the data are sampled from G populations, where G may be unknown in advance of analysis. Let

$n_g$ be the number of cases sampled from the $g$th population, so that $\sum_{g=1}^{G} n_g = n$. Let

$\gamma = (\gamma_1, \ldots, \gamma_n)'$ be a labelling vector such that $\gamma_i = g$ if case $i$ is a sample from the $g$th

population. Then $X_g$ is a data matrix whose rows consist of those $x_i'$ for which $\gamma_i = g$. That is, $X_g$

is an $n_g \times p$ (or $n_g \times (p - 1)$ for log-ratios) data matrix. It is typically assumed that

$$x_i \sim \mathrm{MVN}(\mu_g, \Sigma_g)$$

where $\mu_g$ and $\Sigma_g$ are the mean vector and covariance matrix for population $g$ and $x_i'$ is a row of

$X_g$.

MODELS FOR CERAMIC COMPOSITIONAL DATA

Models for cases

To fix ideas, consider a model for a ceramic made from two components, a clay and a temper.

The composition of the ceramic paste may be modelled as

$$y_i = \pi_{1i} z_{1i} + \pi_{2i} z_{2i} \qquad (1)$$

where the $p \times 1$ vector $z_{ci}$, $c = 1, 2$, is the composition of component $c$ and the $\pi_{ci}$, $c = 1, 2$, are

mixing proportions with $\pi_{1i} + \pi_{2i} = 1$. This is a weighted sum of the two components, and

expresses, in mathematical language, a natural idea about how pastes are formed.

To explain why two pastes formed from components from the same clay and temper vary in

composition, statistical assumptions must be invoked. One obvious reason (e.g., Neff et al.

1988, 1989) is that mixing proportions, $\pi_{ci}$, may vary. Another is that $z_{ci}$ is randomly distributed

within a source. If this distribution is modelled as MVN we have a model for a case of the

form

$$y_i \sim \pi_{1i}\,\mathrm{MVN}(\nu_1, \Theta_1) + \pi_{2i}\,\mathrm{MVN}(\nu_2, \Theta_2) \qquad (2)$$

where $\nu_c$ and $\Theta_c$ are the mean vector and covariance matrix that characterize variation within the

source for component $c$.

Equation (2) generalizes directly to $C > 2$ components, as

$$y_i \sim \sum_{c=1}^{C} \pi_{ci}\,\mathrm{MVN}(\nu_c, \Theta_c) \qquad (3)$$

and to other materials, where $\sum_{c=1}^{C} \pi_{ci} = 1$.

The paste composition $y_i$ may be modified by a variety of processes, so that the observed

composition $x_i$ may differ from that of the paste. Formally, this can be written as

$$x_i \leftarrow y_i \leftarrow z_{1i}, z_{2i}, \ldots, z_{Ci}$$

to indicate the transformation from components to paste to measured composition. To simplify

presentation it is assumed, unless otherwise stated, that there is no significant difference between

the paste and observed composition, so $x_i$ and $y_i$ can be used interchangeably, and equation (3) is

the model for $x_i$.

In equation (3), $y_i$ (or $x_i$) is a weighted sum of normal distributions, and is thus itself normal.

The precise form of the distribution depends on the mixing proportions, $\pi_{ci}$, so that, in general,

two cases $x_i$ and $x_j$ will be samples from different normal distributions. A sample of $n$ cases (with

the same component sources) can thus be regarded as a mixture of $n$ MVNs (assuming that the $\pi_{ci}$

differ). Since a mixture of n MVNs is not, in general, MVN, an immediate consequence is that the

sample is not itself MVN.
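The dependence of each case's distribution on its own mixing proportions can be made explicit with a short worked step. It rests on an assumption that is not stated in the text, namely that the component compositions for a case are mutually independent, and it treats the mixing proportions for case $i$ as fixed.

```latex
% Assumption: z_{1i}, ..., z_{Ci} are mutually independent, z_{ci} ~ MVN(nu_c, Theta_c),
% and the mixing proportions pi_{ci} are treated as fixed for case i.
y_i = \sum_{c=1}^{C} \pi_{ci} z_{ci}
\quad\Longrightarrow\quad
y_i \sim \mathrm{MVN}\!\left( \sum_{c=1}^{C} \pi_{ci}\,\nu_c ,\; \sum_{c=1}^{C} \pi_{ci}^{2}\,\Theta_c \right)
```

Under these assumptions two cases with different mixing proportions have different mean vectors and different covariance matrices, which is why a sample of such cases behaves as a mixture rather than as a single MVN.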

The model for case compositions in equation (3) is superficially similar to, and inspired by,

models used in the simulation studies of Neff et al. (1988, 1989) and Bishop and Neff (1989) to

investigate the effects of tempering in the statistical analysis of compositional data. However, it

is not identical. They model the $z_{ci}$ as log-normal rather than normal, so that for a single variable

a multiplicative model for the paste composition of the form

$$y_{ij} = \prod_{k=1}^{C} z_{kij}^{\pi_{ki}}$$

is obtained. This is less natural than either the additive model discussed above or the alternative

multiplicative model introduced later in equation (5). If individual components are log-normal

the paste composition will not, of course, be normal under the additive model.

Models for samples

Neff (1998, 116), in a discussion of chemistry-based ceramic provenance studies, argues that the

fundamental challenge is to `align geographic co-ordinate units with the multi-dimensional

space defined by measured elemental concentration units'. This is based on the provenance

postulate of Weigand et al. (1977, 24) `that there exist differences in chemical composition

between different natural sources that exceed, in some recognizable way, the differences

observed within a given source'. Neff (1998, 116) equates `source' with a `point or zone of

origin' from which the materials used to make a pot originate.

This can be viewed as a `model' of the kind of variation to be expected in samples of cases

obtained from different sources. Ideally, cases will form separated clusters in a high-dimensional

space determined by the number of variables measured. These clusters may then be viewed as

samples from distinct populations that may be equated with `source' and located in geographical

space.

These ideas can be fleshed out in the form of statistical models, one possibility being a mixture

model of the form

$$x_i \sim \sum_{g=1}^{G} \pi_g\,\mathrm{MVN}(\mu_g, \Sigma_g) \qquad (4)$$

where the $\pi_g$ are the mixing proportions, and $\sum_{g=1}^{G} \pi_g = 1$. It is important to emphasize that what

we now have is a model for the population from which the sample of cases $x_i$ is drawn, rather

than a model for the composition of a single case, as previously.

The model states that the observed sample is drawn from G separate populations, which

have distinct MVN distributions. The MVN assumption is not essential, but is that almost

invariably used in model-based approaches. Many provenance studies are not based on

models such as equation (4), but rely on exploratory methods of cluster analysis to determine

the value of G, and associate cases with the population from which they are sampled. This

approach, or any sensible grouping method, is likely to work well if the component

populations of the mixture are well separated, and does not require the MVN assumption.

Problems can arise if components are not well separated and/or exhibit similar and high

variable correlations within different populations.

The main point to make here is that the models for cases in equation (3) and for samples in

equation (4) are incompatible. In particular, if the former model is valid, it cannot be assumed

that a sample of cases from a population has an MVN distribution, and hence the assumption

that populations in the latter model are MVN must be invalid. The former model is the more

fundamental and it then follows that any modelling approach based on the latter model must be

wrong, unless corrective action is taken. Some practitioners are clearly aware of this difficulty

and attempts to avoid or minimize the problem are discussed after first looking at how models

have been used in practice.

MODELS IN PRACTICE

Mahalanobis distance

Mahalanobis distance plays an important role in several approaches that make use of models that

can be related to equation (4). Its properties are discussed at some length in Baxter (1999) and

Baxter and Buck (2000), and only the most salient features are reviewed here.

The squared Mahalanobis distance between a case and the centroid of a group is defined as

$$d_i^2 = (x_i - \bar{x})' S^{-1} (x_i - \bar{x}).$$

It can be viewed as a generalization of Euclidean distance that allows for the correlation

structure of a group. The smaller $d_i^2$ is, the closer a case is to a group centroid. For sufficiently

`small' $d_i^2$ a case may plausibly be regarded as a member of the group.

The idea of `small' may be made precise by introducing the modelling assumption that the

group is a sample from a $p$-variate MVN population. This allows $d_i^2$ to be transformed to a

probability (p-value). If the p-value is too small (conventionally less than 0.05 or 0.01) it may

be doubted that the case belongs to the group against which it is being tested.
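The conversion of a squared Mahalanobis distance into a p-value can be sketched as follows. This is a minimal illustration, not the procedure of any of the laboratories discussed: it assumes the large-sample chi-squared reference distribution with $p$ degrees of freedom, tests a case that is not itself a member of the group, and uses a hypothetical five-case, two-variable group that is far too small for serious use. All function names are introduced here for illustration.

```python
import math

def mean_vector(rows):
    """Centroid of a group: the mean of each variable."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def covariance(rows):
    """Sample covariance matrix S with divisor n - 1."""
    n, p = len(rows), len(rows[0])
    m = mean_vector(rows)
    return [[sum((r[j] - m[j]) * (r[k] - m[k]) for r in rows) / (n - 1)
             for k in range(p)] for j in range(p)]

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    w = [0.0] * n
    for k in range(n - 1, -1, -1):
        w[k] = (M[k][n] - sum(M[k][c] * w[c] for c in range(k + 1, n))) / M[k][k]
    return w

def mahalanobis_sq(x, group):
    """d^2 = (x - xbar)' S^{-1} (x - xbar), computed without forming S^{-1}."""
    m = mean_vector(group)
    diff = [a - b for a, b in zip(x, m)]
    w = solve(covariance(group), diff)
    return sum(d * v for d, v in zip(diff, w))

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)                 # Q(1, h) = exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))      # Q(1/2, h) = erfc(sqrt(h))
    while a < df / 2.0:
        # recurrence Q(a + 1, h) = Q(a, h) + h^a exp(-h) / Gamma(a + 1)
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Hypothetical group (n = 5, p = 2) and a case tested against it.
group = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [0.0, 0.0]]
case = [1.0, 1.0]
d2 = mahalanobis_sq(case, group)
p_value = chi2_sf(d2, df=len(case))
print(d2, p_value)
```

A small-sample calculation, or one that adjusts for whether the case was used in estimating the centroid and covariance matrix, would replace `chi2_sf` with a different reference distribution; the distance computation itself is unchanged.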

Details of implementation of this idea vary, with some researchers basing the probability

calculations on large sample approximations (e.g., Beier and Mommsen 1994) and others on

small sample approximations (e.g., Glascock 1992). The more sophisticated uses of the idea take

into account whether or not xi belongs to the group against which its membership is being tested

(e.g., Leese and Main 1994; Slane et al. 1994).

In ceramic studies this usage of Mahalanobis distance can be traced to work at the Brookhaven

National Laboratory (BNL) in the 1970s (Sayre 1975; Bieber et al. 1976; Harbottle 1976). It has

subsequently been developed by researchers, some now at the Missouri University Research

Laboratory (MURR) (Neff 1992; Glascock 1992) and will be referred to as the BNL/MURR

approach. In certain approaches to the statistical analysis of lead isotope ratio data, Mahalanobis

distance is also important (Sayre et al. 1992). Lead isotope fields for an ore source are assumed

to have a trivariate normal distribution. Samples are used to estimate the distribution of a field,

and cases too distant from the centroid of the sample are excluded from this estimation process.

Once a field has been delineated in this way, Mahalanobis distance and probability calculations,

based on the lead isotope signatures of artefacts, may be used to assess whether an ore source is a

possible provenance for an artefact.

A major practical problem in using Mahalanobis distance that has long been recognized is the

sample size requirement. For a single group, as a minimum, n > p is necessary in order to be able

to estimate $\Sigma$. For stable estimation much larger values of $n$ are needed, with $n > 3p$ and

preferably n > 5p having been suggested (Harbottle 1976). The sample size implications, when

there are several groups in the data and p is large, as is typical with many modern analytical

techniques such as Neutron Activation Analysis (NAA), are obvious.

Even for problems where the dimensionality is small, as in lead isotope ratio analysis where

$p = 3$, the requirements are not trivial. Pollard and Heron (1996, 328) detected an emerging

consensus that $n = 20$ was an `agreeable minimum' sample size. This has been challenged in

Baxter et al. (2000) who argue that 20 may be adequate if lead isotope fields are MVN

distributed, but will usually be inadequate if one wishes to test this modelling assumption.


Two themes emerge here, that recur in the study of model-based methods. One is that sample

sizes may preclude the use of model-based methods that one would like to use. The second is

that, even if such use is possible, it will often not be feasible to test the assumptions of the model,

so that the model's validity is essentially an act of faith.

The BNL/MURR approach

The BNL/MURR approach is not prescriptive, but rather may be viewed as comprising a set of

tools to be deployed as appropriate (e.g., Slane et al. 1994; Lizee et al. 1995; Steponaitis et al.

1996; Hegmon et al. 1997; Triadan et al. 1997; Herrera et al. 1999). The discussion that follows

concentrates on aspects relevant to model-based approaches to statistical analysis.

Typically data are logged (to base 10) before analysis. One reason for this is an assumption

that the variables (often trace elements measured in ppm) are more likely to be MVN within

groups on the log scale, and the MVN assumption is necessary for other procedures that are used.

The merits, or otherwise, of transformation are discussed in Sayre (1975), Bieber et al. (1976),

Pollard (1986), Beier and Mommsen (1994), Baxter (1995) and below. It is often assumed that

measurement error is unimportant in relation to `natural' variation, so that the former can be

ignored, and an explicit justification for this assumption is given in Bieber et al. (1976).

A recurring concern is the problems posed by high correlations among variables within

populations (Sayre 1975; Harbottle 1976, 1991). If present, these give rise to populations, and

samples from them, that have an elliptical shape in p-dimensional space. Exploratory methods of

cluster analysis are the multivariate workhorse in the analysis of compositional data, and often

the only method presented in publications (Baxter 1994). These often produce spherical clusters

of roughly equal size and can be misleading if the true structure is ellipsoidal. This can be

demonstrated explicitly for Ward's method of cluster analysis, but may also be the case for other

supposedly `model-free' methods (Gordon 1999, 65–8). Where this is a concern a solution is to

model the correlation or covariance structure explicitly (Krzanowski and Marriott 1995, 89).

The BNL/MURR approach relegates cluster analysis to a minor role and applies group

evaluation methodology to groups provisionally defined using cluster analysis or on the basis of

archaeological criteria. Using the assumption that the groups being sought are samples from

MVN distributions, probability calculations are undertaken using Mahalanobis distance to

evaluate whether or not a case could belong to a group. This allows for the elliptical shape of

clusters, provided that sample sizes permit the calculation. Where this is not immediately

possible, recourse may be had to a subset of the principal components of the data (e.g., Slane et

al. 1994; Herrera et al. 1999) or to subsets of variables that appear to discriminate between

groups (Bieber et al. 1976, 67–8). Cases may be added to, or deleted from, a group and the

process repeated until a stable clustering is found. It is worth noting that, where initial groups are

defined using multivariate methods, this often involves the use of unstandardized logged data.

Subsequent group evaluation does, however, introduce standardization through the use of

Mahalanobis distance.

It may be remarked that, in assuming an MVN distribution, cases that do not conform with the

assumption are likely to be rejected from groups, so that what finally remains will tend to be

MVN. In this sense the MVN assumption might be regarded as a `self-fulfilling prophecy' that

tends to impose MVN structure on the results obtained.

The Bonn approach

The approach developed at Bonn University (e.g., Mommsen et al. 1988; Beier and Mommsen

1994) challenges several aspects of the BNL/MURR methodology, including the need for using

logged data, the lack of importance of measurement error, and the importance of high

correlations. The argument to follow is that the methodologies are more similar than might

appear at first sight.

The central ideas in Beier and Mommsen (1994) are readily explained. Starting from a single

case, or a small number of similar cases, a group is `grown' by adding to it cases that are `close'

to the starting set. This gives a new group, and cases close to this are further added to the group.

This process proceeds iteratively until no cases are close to the group. A second group is then

`grown' from a different starting point and so on until all cases are assigned to a group or

regarded as outliers. The fundamental modelling assumption is that groups are sampled from an

MVN distribution. Closeness to a group is decided on the basis of distance and probability

calculations, underpinned by the MVN assumption, using either weighted Euclidean or

Mahalanobis distance.
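The group-growing idea described above can be sketched in a few lines of Python. This is a loose illustration of the verbal description only and should not be read as Beier and Mommsen's actual procedure: the fixed per-variable variances, the cutoff (roughly the 95% point of a chi-squared distribution with 2 degrees of freedom), the rule of seeding each group with the lowest-numbered unassigned case, and the minimum group size are all assumptions made for the example, as are the function names and data.

```python
def weighted_sq_distance(x, centre, variances):
    """Weighted squared Euclidean distance, each variable scaled by its variance."""
    return sum((a - c) ** 2 / v for a, c, v in zip(x, centre, variances))

def centroid(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def grow_group(data, seed, variances, cutoff, available):
    """Grow one group from the seed cases until no unassigned case is close."""
    group = set(seed)
    while True:
        centre = centroid([data[i] for i in sorted(group)])
        close_cases = {i for i in available - group
                       if weighted_sq_distance(data[i], centre, variances) <= cutoff}
        if not close_cases:
            return group
        group |= close_cases

def grow_all(data, variances, cutoff, min_size=2):
    """Grow groups one after another; leftover singletons are treated as outliers."""
    available = set(range(len(data)))
    groups, outliers = [], []
    while available:
        seed = min(available)            # assumed seeding rule
        group = grow_group(data, [seed], variances, cutoff, available)
        available -= group
        if len(group) >= min_size:
            groups.append(sorted(group))
        else:
            outliers.extend(sorted(group))
    return groups, outliers

# Hypothetical data: six cases measured on two variables.
data = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
        [5.0, 5.0], [5.1, 4.9],
        [9.0, 1.0]]
variances = [0.04, 0.04]   # assumed per-variable variances
cutoff = 5.99              # assumed cutoff, about the 95% chi-squared point for 2 df

groups, outliers = grow_all(data, variances, cutoff)
print(groups, outliers)
```

In this sketch the groups are found one at a time, which mirrors the sequential character of the approach; replacing `weighted_sq_distance` with a Mahalanobis distance based on the current group's covariance matrix would allow for correlated variables, provided the group is large enough for that matrix to be estimated.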

Groups are defined sequentially and iteratively, whereas the BNL/MURR method involves a

simultaneous determination of groups that are subsequently refined in an iterative fashion.

Arguably, the spirit underlying both methods is very similar.

The Bonn methodology allows for measurement uncertainty. In assessing whether a case $x_i$

could belong to a group, $S$ in the definition of Mahalanobis distance is replaced by $\hat{S} = S_x + S$,

where $S_x$ is a diagonal matrix of the measurement `uncertainties' (i.e., the variance of the

analytical error). The $j$th diagonal element of $\hat{S}$ may be written as $s_{xj}^2 + s_j^2$, where the former

term is the analytical error variance, and the latter term the estimated variance of the variable

within the group, which incorporates both analytical and natural error. This generalization of

other methodologies may have little effect in many practical situations if either natural variation

dominates analytical variation, or if there are only a few variables for which analytical variation

is the major component of variability within a group.

It is argued, in Beier and Mommsen (1994), that there is no need to transform data

logarithmically, as this gives similar results to the use of untransformed data. Some of the

detailed evidence for this is given in Beier and Mommsen (1991). Evaluating the general

validity of this claim raises a number of issues that have wider implications. One reason for a

lack of distinction between results based on untransformed and logged data appears to be the

relatively small spread of variable values within groups. This may well be a function of the

particular type of fine ware studied, and does not necessarily generalize. It is unlikely that it is

the high precision of results that allows the formation of groups with quite small spreads (Beier

and Mommsen 1991), since precision of measurement and the natural spread of a variable in a

group are logically unrelated. It is possible that the modelling methodology used gives rise to

results of the kind reported. In particular, the groups examined for normality are defined by an

algorithm that assumes normality, and which may impose that kind of structure on the groups

found. This relation between modelling assumptions and structure found has already been noted

for the BNL/MURR methodology.

Another possible reason for the lack of a distinction between results for untransformed and

logged data is the role played by standardization. Beier and Mommsen (1994) argue against the

use of principal component analysis, on the basis that this usually involves the use of

standardized data, and that this changes as cases are added to a data base. However, their

approach is also dependent on standardization. The main difference is that standardization takes

place within groups rather than across the sample as a whole (and will change as these groups are

iteratively modified). Baxter (1995) found little difference in results obtained in studies using

standardized data and standardized logged data, except in the presence of clear outliers. Since


the Bonn methodology does depend on standardization, and outliers are excluded in group

formation, it follows that results may not be strongly dependent on whether or not data are

logged.

The gist of the argument so far is that the Bonn and BNL/MURR approaches are more similar

than would seem at first sight. The refinement of allowing for analytical error in the grouping

procedure will make little difference when natural variation dominates. When high-precision

data are used, and analytical variation dominates natural variation, groups are likely to be tightly

defined and found by any reasonable method. When analytical variation dominates and precision

is low it is questionable whether the elements so affected should be used in clustering. It has also

been argued that the use of untransformed as opposed to logged data is not a critical difference.

The treatment of highly correlated data is considered in the subsection on modelling dilution

effects.

Multiplicative models

An alternative multiplicative model for case compositions to that introduced earlier is that of

Buxeda (1999). This has statistical implications in the sense that a particular log-ratio

transformation of the data, not widely employed in archaeometric studies, is indicated for

data analysis. Given this transformation, any of the available methods of statistical analysis

might then be used. It is argued in what follows that Buxeda's (1999) approach will often be well

approximated by the simpler use of logged data, bringing it into the mainstream of methodologies

that have been proposed for artefact compositional analysis.

Buxeda's (1999) model views the composition as a perturbation of the original clay

composition, $z_{1i}$ say. If there are $A$ separate perturbation processes, a multiplicative model,

for a single element, of the form

$$\tilde{x}_{ij} = z_{1ij} u_{ij} \qquad (5)$$

is obtained, where

$$u_{ij} = \prod_{a=1}^{A} u_{aij}$$

and $u_{aij} > 0$ represents the effect of the perturbation at the $a$th stage, on the composition that

exists at that point.

If all naturally occurring elements ($D$ say) in the periodic table are measured, the $x_{ij}$ must sum

to 100%, so that the observed composition is of the form

$$x_{ij} = 100\,\tilde{x}_{ij} \Big/ \sum_{j=1}^{D} \tilde{x}_{ij}$$

and $\sum_{j=1}^{D} x_{ij} = 100$. This gives rise to (fully) compositional data in the sense of Aitchison

(1986).
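The perturbation model of equation (5), followed by rescaling to 100%, can be illustrated with a small Python sketch. The clay composition and the stage-by-stage perturbation factors are hypothetical, and the function names are introduced only for this example.

```python
def perturb(clay, stages):
    """Equation (5): multiply each element by its factor at every stage."""
    composition = list(clay)
    for factors in stages:
        composition = [x * u for x, u in zip(composition, factors)]
    return composition

def close_to_100(values):
    """Rescale positive values so that they sum to 100%."""
    total = sum(values)
    return [100.0 * v / total for v in values]

# Hypothetical clay with D = 3 elements (percentages) and A = 2 perturbation stages.
clay = [50.0, 30.0, 20.0]
stages = [[1.0, 1.0, 2.0],    # stage 1 doubles the third element
          [0.5, 1.0, 1.0]]    # stage 2 halves the first element

x_tilde = perturb(clay, stages)   # perturbed, not yet summing to 100
x = close_to_100(x_tilde)         # observed composition
print(x_tilde)
print(x, sum(x))
```

In this example the second element is never perturbed directly, yet its observed percentage still changes once the composition is rescaled (from 30 to roughly 31.6). This is one way of seeing why the compositional constraint complicates `standard' analysis of the individual variables.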

The compositional constraint presents problems for `standard' statistical analysis, documented

in Aitchison (1982, 1986). One way of avoiding these problems, and that advocated in

Buxeda (1999), is to base analyses on log-ratios of the form

$$y_{ij} = \log(x_{ij}/x_{iD}) \qquad (6)$$

for $j = 1, 2, \ldots, D - 1$ and a suitable choice of the $D$th element for the divisor. Part of the

reasoning behind this choice is that, whereas $x_{ij}$ is constrained to lie between 0 and 100%, $y_{ij}$ is

unrestricted and more amenable to analysis by standard statistical methods. In practice, of

course, only $p \ll D$ elements are used. Provided that these are such that $\sum_{j=1}^{p} x_{ij} < 100$, similar

considerations apply with $y_{ij} = \log(x_{ij}/x_{ip})$ and $j = 1, 2, \ldots, p - 1$. If the $p$ measured elements

are such that $\sum_{j=1}^{p} x_{ij} \ll 100$, there is less of a case for using the log-ratio transformation (see the

discussion of Aitchison 1982), but this debate is not pursued here.

Equation (5) can be viewed as a mathematical model of ceramic compositional data, the form

of which suggests that a particular data transformation be applied before statistical analysis. That

the transformation depends on the choice of divisor may be viewed as a potential weakness of

the methodology, and is the focus of some attention in Buxeda (1999), following Aitchison

(1986, 1990). The choice is intimately linked to the concept of the variation matrix defined

(Aitchison 1990) as the $p \times p$ matrix with typical element

$$\tau_{ij} = \mathrm{var}\{\log(X_i/X_j)\} = v_i^2 + v_j^2 - 2 r_{ij} v_i v_j$$

where $\mathrm{var}\{\cdot\}$ is the variance; $v_i^2$ is the variance of the logarithm of variable $i$, $\mathrm{var}\{\log(X_i)\}$; and

$r_{ij}$ is the correlation between $\log(X_i)$ and $\log(X_j)$. The total variation in the data is then defined as

$$\mathit{vt} = \sum_{ij} \tau_{ij}/2p$$

and

$$\tau_{.s} = \sum_{i} \tau_{is}$$

is the total variance in the log-ratio covariance matrix when variable $s$ is used as a divisor. It can

be shown that $\tau_{.s} > \mathit{vt}$, and the excess is interpretable as variability imposed by the choice of

variable $s$ as the divisor in equation (6). Buxeda's (1999) strategy is to choose as a divisor that

variable for which $\mathit{vt}/\tau_{.s}$ is a maximum. In other words, variable $s$ is chosen to impose the least

variability.

The choice of divisor requires the minimization of

$$\tau_{.s} = \sum_{i=1}^{p} v_i^2 + p v_s^2 - 2 v_s \sum_{i=1}^{p} r_{is} v_i.$$

The first term on the right-hand side is constant for all $s$, so it is the last two terms that must be

minimized. If, on the log-scale, a variable is approximately constant, these last two terms will

be approximately zero, and analysis is then effectively based on the log-transformed data of the

remaining variables, a standard procedure (e.g., Bieber et al. 1976; Glascock 1992).
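The divisor-selection rule described above can be sketched in Python as follows. The sketch computes the variation matrix, the total variation and the column totals directly from their definitions and picks the variable with the largest ratio of total variation to column total; the function names and the four-case data set are hypothetical, and the sample variance with divisor $n - 1$ is an assumption about how the variances are estimated.

```python
import math

def sample_variance(values):
    """Sample variance with divisor n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def variation_matrix(X):
    """p x p matrix with typical element tau_ij = var{log(X_i / X_j)}."""
    p = len(X[0])
    logs = [[math.log(x) for x in row] for row in X]
    return [[sample_variance([r[i] - r[j] for r in logs]) for j in range(p)]
            for i in range(p)]

def choose_divisor(X):
    """Return (index of chosen divisor, total variation vt, column totals tau_.s)."""
    T = variation_matrix(X)
    p = len(T)
    vt = sum(sum(row) for row in T) / (2.0 * p)
    tau_dot = [sum(T[i][s] for i in range(p)) for s in range(p)]
    best = max(range(p), key=lambda s: vt / tau_dot[s])
    return best, vt, tau_dot

# Hypothetical data: the third variable is constant, the first two vary.
X = [[10.0, 1.0, 5.0],
     [20.0, 2.0, 5.0],
     [40.0, 2.0, 5.0],
     [80.0, 1.0, 5.0]]

best, vt, tau_dot = choose_divisor(X)
print(best, vt, tau_dot)
```

With these data the constant third variable (index 2) is selected, so that the log-ratios differ from the logged values of the first two variables only by a constant. The first two variables were deliberately made uncorrelated on the log scale; with strongly correlated variables the last two terms of the expression above can be negative for some variable, and the constant variable need not then be the one chosen.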

This ideal is unlikely to arise in practice, but the thrust of Buxeda's (1999) strategy is to

choose the variable for which this state is most closely approximated. It can thus be conjectured

that the use of log-ratios proposed by Buxeda (1999) will often be closely approximated by the

simpler procedure of using logarithms of variables. The argument above is heuristic, but

empirical evidence suggests that it is reasonable. In Buxeda's (1999) analysis of ceramic data

from Abella results are largely determined by just six of 14 ratios used. These account for most

of the variance in the data on the unstandardized log-ratio scale. It is straightforward to

determine, empirically, that the same six variables dominate an analysis based on unstandardized

log-transformed data, and lead to virtually identical results. Essentially this happens

because the transformations (log-ratio or log) differentially weight the variables in the absence

of subsequent standardization, leading to implicit variable selection of the same variables. In the


context of analyses of glass compositional data that sum to 100%, Baxter (1993) noted a

tendency for a small number of minor oxides with high variances on the log-ratio scale to

dominate analysis. Re-analysis of several glass data sets of the kind referred to has confirmed

that virtually identical results are obtained if unstandardized log-transformed data are used.

Finally, some previous debate on the relative merits of using log-transformed and log-ratio data

has demonstrated that they produce similar results (Church 1995; Hoard et al. 1995). The

analysis given above may help explain why.

Likelihood and Bayesian clustering

Other model-based approaches to clustering that have received limited use in archaeometry are

only noted briefly. These include the Bayesian methodology of Buck and Litton (1996) and Buck

et al. (1996), which assumes that within a particular provenance a sample of cases (possibly after

transformation) is drawn from an MVN distribution. The total sample is assumed to be drawn

from a mixture of G such distributions, where G is unknown. The procedure is illustrated in Buck

and Litton (1996) for a $150 \times 15$ data set, in which there are three fairly clear groups.

Similar assumptions underpin classification and mixture maximum likelihood models that

have also had little archaeometric application. They are investigated in Papageorgiou et al.

(2000). One potential attraction of both methodologies is that tests of the numbers of groups in

the sample are possible. A second potential attraction is the ability to model elliptical groups of

the kind to be expected with correlated data. Krzanowski and Marriott (1995) give a concise

account of the mathematics of the methodologies.

Modelling `dilution' effects

The term `dilution' has been used in various ways in the literature and is introduced here through

an idealized example. Suppose that for a two-component paste, modelled as in equation (1),

repeated below in slightly modified form,

$$y_i = p_{1i} z_{1i} + (1 - p_{1i}) z_{2i}$$

the clay, $z_{1i}$, involves $(p - 1)$ non-zero elements and is identical for each case in a sample of $n$.

Suppose, also, that the temper consists of a single element, different from those in the clay. The

composition for a case is then

$$(p_{1i} z_{1i1},\; p_{1i} z_{1i2},\; \ldots,\; p_{1i} z_{1i(p-1)},\; 100(1 - p_{1i}))$$

since the $p$th variable comprises 100% of the temper. It is clear in this formulation that the

composition of cases from the same clay source differs only because of differences in $p_{1i}$ or,

equivalently, the proportion of temper in the paste. This effect, in which the variable addition of

temper to a paste can obscure the similarity of the clay compositions, has sometimes been

referred to as a dilution effect.
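
The idealized model can be made concrete with a short simulation. The following is a minimal sketch in Python and is not part of the original analysis; the clay composition, the range of clay proportions and the function name `diluted_composition` are illustrative assumptions. It generates cases from one clay source that differ only in the proportion of temper.

```python
import random


def diluted_composition(clay, clay_fraction):
    """Composition (in %) of a paste made of `clay` plus a one-element temper.

    `clay` holds the (p - 1) clay element percentages, summing to 100.
    The temper contributes a single additional element, placed last.
    """
    body = [clay_fraction * z for z in clay]
    return body + [100.0 * (1.0 - clay_fraction)]


# Hypothetical clay with three elements (percentages summing to 100).
clay = [60.0, 30.0, 10.0]

random.seed(0)
# Five cases that differ only in the proportion of clay in the paste.
fractions = [random.uniform(0.6, 0.95) for _ in range(5)]
cases = [diluted_composition(clay, f) for f in fractions]

for f, y in zip(fractions, cases):
    print(f"clay fraction {f:.2f}: " + ", ".join(f"{v:6.2f}" for v in y))
```

Every simulated case sums to 100%, and the ratio of any two clay elements is the same in every case, although the raw values differ from case to case.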

What we have here is a simple model of dilution. If interest centres on identifying cases for

which the clay source is the same, or similar, the model can be used to remove or understand the

effect of tempering, in order better to identify cases with similar sources. Bishop and Neff's

(1989, 83) emphasis on the importance of modelling encompasses this kind of modelling, as

distinct from the more purely statistical models discussed elsewhere in this paper.

In the present simple example, a dilution correction can be accomplished in several equivalent

ways. One is to identify and remove the tempering element from the composition and rescale

remaining elements to sum to 100%. A second possibility is to remove the tempering element

and work with ratios of the form

$$y_{ij}/y_{ik} = p_{1i} z_{1ij} / (p_{1i} z_{1ik}) = z_{1ij}/z_{1ik}$$

for $j \neq k$ and some choice of $k$, in which the effect of tempering is `cancelled out'. A third

possibility is to note that, for distinct cases $i$ and $j$,

$$y_{ik}/y_{jk} = p_{1i} z_{1ik} / (p_{1j} z_{1jk}) = p_{1i}/p_{1j} = a_{ij}$$

where $a_{ij}$ is constant for all $k$, and estimate $a_{ij}$ using any $k$. Once this is done, the values of any

case may be adjusted to match, as closely as possible, that of any other case or, as is more

commonly done, a group mean. These ideas, while simple and based on an idealized model,

form the basis of much that has been done in practice to deal with dilution effects.
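
The three corrections can be sketched in a few lines of Python. This is an illustration under the idealized model only; the two cases and the function names are hypothetical, and the temper element is assumed to be stored last.

```python
def rescale_without_temper(y):
    """Drop the last (temper) element and rescale the rest to sum to 100%."""
    body = y[:-1]
    total = sum(body)
    return [100.0 * v / total for v in body]


def ratios_to_reference(y, k=0):
    """Ratios y_j / y_k of the clay elements to a chosen reference element k."""
    body = y[:-1]
    return [v / body[k] for j, v in enumerate(body) if j != k]


def scale_to_match(y_i, y_j, k=0):
    """Estimate a_ij = y_ik / y_jk from element k, then rescale case i onto case j."""
    a_ij = y_i[k] / y_j[k]
    return [v / a_ij for v in y_i[:-1]]


# Two hypothetical cases from the same clay (60, 30, 10), with 20% and 40% temper.
case_a = [48.0, 24.0, 8.0, 20.0]
case_b = [36.0, 18.0, 6.0, 40.0]

print(rescale_without_temper(case_a), rescale_without_temper(case_b))
print(ratios_to_reference(case_a), ratios_to_reference(case_b))
print(scale_to_match(case_a, case_b), case_b[:-1])
```

In each of the three printed comparisons the two cases become indistinguishable. With real data the three routes will no longer agree exactly, which is why the scaling factor has to be estimated rather than read off from a single element.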

For $i = 1, \ldots, n$ and any pair of variables, $j, k$, not involved in the temper, a bivariate plot of $y_{ij}$

against $y_{ik}$ will show a scatter of points lying on a straight line, or a vector passing through the

origin. This can be seen by noting that the plot is of $p_{1i} z_{1ij}$ against $p_{1i} z_{1ik}$ and, by assumption,

$z_{1ik} = b z_{1ij}$ for some constant $b$ and for all $i$. Another way of stating this is that the effect of

dilution, of the kind being modelled, will be to induce high positive correlations among the

variables. In the present case the effect of a dilution correction (based on the centroid of the point scatter, for

example) will be to `shrink' all observations to that point and remove all correlation from the

data.
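
The induced correlation is easily checked numerically. The sketch below uses hypothetical values and a hand-written Pearson correlation (the name `pearson` is ours) to confirm that two clay elements subject to pure dilution are perfectly correlated and lie on a line through the origin.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical clay elements (60% and 30%) observed at four clay proportions.
clay = [60.0, 30.0]
fractions = [0.6, 0.7, 0.8, 0.9]
y1 = [f * clay[0] for f in fractions]
y2 = [f * clay[1] for f in fractions]

r = pearson(y1, y2)
slopes = [b / a for a, b in zip(y1, y2)]  # constant slope: a line through the origin
print(r, slopes)
```

After an exact correction all four cases coincide, so the within-group variance is zero and no correlation remains to be measured.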

It is, in fact, possible to view high positive correlations among variables as potentially

indicative of dilution, where dilution is now interpreted in a much more general sense than an

effect due to tempering. This is the view taken in Beier and Mommsen (1994, 295–6) who note a

variety of technical effects that can give rise to `dilution', which they define to `include both

shifts due to different additional components in the clay and due to technical effects'. From this

perspective dilution can give rise to elliptical clouds of points, in $p$-dimensional hyperspace, in

which the major axis of the ellipse passes through the origin. It may be remarked that this

definition of dilution encompasses data that are naturally and highly correlated, a point that is

considered further below.

In practice, of course, matters are more complicated. Clays from the same source will vary;

tempers will consist of more than one element; and the elemental composition of clays and

tempers will overlap. Nevertheless, several researchers have considered the tempering model

used above to be sufficiently close to what may sometimes occur in practice to devote time to

developing methods to correct for it.

More realistically, and expressed somewhat informally, `dilution' may be occurring if, for two

cases, i and j,

$$y_{ik} \approx a_{ij} y_{jk}$$

for some constant $a_{ij}$ and for a majority, $p'$, of the $p$ variables. In our idealized example $a_{ij}$ could

be determined from the ratio $y_{ik}/y_{jk}$ for any variable, $k$, not involved in the temper. In practice,

different $k$ will give rise to different values, so that $a_{ij}$ must be estimated in some way.

In the best relative fit method developed at BNL (Harbottle 1976), if it is thought that a case $y_i$

is related to a group with mean $\bar{y}$ and a dilution model of the form $y_{ij} = a_i \bar{y}_j$ is postulated, $a_i$ is

estimated as

$$\hat{a}_i = \left[ \prod_{j=1}^{p'} (y_{ij}/\bar{y}_j) \right]^{1/p'}$$

the geometric mean of the correction factor determined separately for each element considered

to be affected by dilution. From this, adjusted values of the form $y_{ij}/\hat{a}_i$ can be calculated. An

arithmetic mean might also be used (e.g., Mommsen et al. 1988, 50).
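
A minimal sketch of the best relative fit calculation follows, with the arithmetic-mean variant for comparison. The group mean and case values are hypothetical, the function names are ours, and all listed elements are assumed to be affected by dilution.

```python
import math


def best_relative_fit(case, group_mean):
    """Geometric mean of the element-wise ratios case_j / mean_j."""
    logs = [math.log(y / m) for y, m in zip(case, group_mean)]
    return math.exp(sum(logs) / len(logs))


def arithmetic_fit(case, group_mean):
    """Arithmetic mean of the same ratios, as an alternative estimate."""
    ratios = [y / m for y, m in zip(case, group_mean)]
    return sum(ratios) / len(ratios)


# Hypothetical group mean and a single case believed to be diluted.
group_mean = [5.0, 2.0, 1.0, 3.0]
case = [4.0, 1.5, 0.8, 2.1]

a_hat = best_relative_fit(case, group_mean)
adjusted = [y / a_hat for y in case]

print(a_hat, arithmetic_fit(case, group_mean))
print(adjusted)
```

By construction, the adjusted case has a best relative fit factor of one with respect to the group mean. The arithmetic mean of the ratios is never smaller than their geometric mean, so the two estimates differ slightly unless all the ratios are equal.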

The most general of the procedures proposed in Beier and Mommsen (1994) is numerically

more complex. For matching a case to a mean a modified Mahalanobis `distance' of the

form

$$d_i^2 = (v_i x_i - \bar{x})' \hat{S}_{v_i}^{-1} (v_i x_i - \bar{x})$$

is used, where $v_i$ is a parameter, used to model the dilution effect, that is estimated to minimize

$d_i^2$, and

$$\hat{S}_{v_i} = v_i^2 S_x + S.$$

In general, $v_i$ must be determined numerically, although simplification is possible if the

measurement uncertainty is ignored.
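
One way in which the simplification can arise is sketched below. It assumes that ignoring measurement uncertainty amounts to setting $S_x = 0$, so that $\hat{S}_{v_i} = S$ no longer depends on $v_i$; this is offered as a plausible reading rather than as the form given by Beier and Mommsen (1994). With $S$ symmetric and positive definite, $d_i^2$ is then quadratic in $v_i$, and setting its derivative to zero gives a closed form:

$$\frac{\partial d_i^2}{\partial v_i} = 2\, x_i' S^{-1} (v_i x_i - \bar{x}) = 0 \quad \Longrightarrow \quad \hat{v}_i = \frac{x_i' S^{-1} \bar{x}}{x_i' S^{-1} x_i}.$$

When $S_x \neq 0$ the matrix $\hat{S}_{v_i}$ itself varies with $v_i$, the criterion is no longer quadratic, and a numerical search is needed.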

Buxeda's (1999) methodology, being based on log-ratios, will deal with dilution effects of

the kind under consideration. Similarly, ratios in the form

$$y_{ij} = \log\{x_{ij}/g(x_i)\}$$

where $g(x_i)$ is the geometric mean of the elements of $x_i$, have been used explicitly to deal with

dilution effects arising from tempering in Leese et al. (1989) and, less transparently, to deal with

dilution arising for technical reasons, in Pike and Fulford (1983).
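
The reason such ratios deal with dilution is that a common multiplicative factor cancels between each element and the geometric mean. The sketch below illustrates this with hypothetical values; natural logarithms are assumed, although the choice of base affects only the scale, and the function name is ours.

```python
import math


def centred_log_ratio(x):
    """Return log(x_j / g(x)), where g(x) is the geometric mean of x."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)  # the log of the geometric mean
    return [value - mean_log for value in logs]


# A hypothetical case, and the same case with every element reduced by 25%.
case = [48.0, 24.0, 8.0]
diluted = [0.75 * v for v in case]

print(centred_log_ratio(case))
print(centred_log_ratio(diluted))
```

The two printed vectors agree to within rounding error, and each sums to zero, so the transformed values are unaffected by dilution of this multiplicative kind.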

To illustrate some of the foregoing ideas, a data set published in Tubb et al. (1980) on the

chemical composition of Romano-British pottery, measured by atomic absorption spectrometry,

will be used. In its original form this is a 48 × 9 data set. The pottery comes from five kiln sites in

three regions and previous multivariate analyses suggest the three regions are chemically

distinct. This is shown in the upper plot of Figure 1 based on a principal component analysis of

scaled data. Four oxides (Fe2O3, MgO, CaO and K2O), identified in Tubb et al. as the only ones

necessary for discrimination, have been used, and one outlier with an unusually low value of K2O

has been omitted (C14 in the original publication).

The separation of the three regional groups is evident. Groups 1 and 2 are dispersed in

comparison to group 3 and, to the centre right of the plot, there is a single specimen of group 1

that is outlying with respect to the rest of the group. It has previously been noted that this is a

multivariate outlier, possibly attributable to dilution effects (Baxter 1999). This can be examined

and corrected for in a variety of ways. For example, if the model $y_{ij} = a_i \bar{y}_j$ is postulated this

gives rise to a model of the form $\log(y_{ij}) = \log(a_i) + \log(\bar{y}_j)$. In Figure 2 $\log(y_{ij})$ is plotted

against $\log(\bar{y}_j)$ (using base 10 logarithms) along with the associated regression line. The solid

line is that which would be obtained in the absence of dilution (i.e., $\log(a_i) = 0$). It is

approximately parallel to the regression line and the difference between the two lines, of

about $-0.14$, gives an informal estimate of $\log(a_i)$. This suggests that $a_i$ is about 0.73 or, in other

words, that values for the specimen be multiplied by about 1.37 ($= 1/0.73$) to `correct' for

dilution. Using exact calculations, the best relative fit method leads to $a_i = 0.76$ and a

multiplying factor of 1.32. Use of the Beier–Mommsen approach, ignoring measurement

error, gives a multiplying factor of 1.34. Even more simply, averaging $\bar{y}_j/y_{ij}$ leads to a

multiplying factor of 1.33. In this instance, the different methods of correcting for dilution

lead to very similar results.
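
The link between the graphical estimate and the best relative fit method can be sketched as follows. The values are hypothetical and are not those of the Tubb et al. data. The sketch also fixes the slope of the line at one, which is an assumption; the regression line in Figure 2 is fitted with a free slope. With a unit slope the least-squares intercept is simply the mean of the differences between the logged values and the logged means, and raising 10 to that power gives the geometric mean of the ratios.

```python
import math


def log10_offset(case, group_mean):
    """Mean of log10(y_j) - log10(mean_j): the intercept of a unit-slope line."""
    diffs = [math.log10(y) - math.log10(m) for y, m in zip(case, group_mean)]
    return sum(diffs) / len(diffs)


# Hypothetical group means and case values for four variables.
group_mean = [5.0, 2.0, 1.0, 3.0]
case = [4.0, 1.5, 0.8, 2.1]

offset = log10_offset(case, group_mean)  # informal estimate of log10(a_i)
a_i = 10 ** offset                       # implied dilution factor
factor = 1 / a_i                         # multiplier that `corrects' the case

print(offset, a_i, factor)
```

Under the unit-slope assumption the graphical estimate and the best relative fit estimate therefore coincide; any difference between them in practice reflects reading the offset from a line whose fitted slope is not exactly one.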

Using this last approach to obtaining a correction factor, and adjusting all cases in group 1 to

the mean of that group, and similarly for group 2, leads to the principal components plot in the

lower part of Figure 1. It can be seen that, with the exception of four cases in group 1, the groups

are more concentrated and spherical than in the original analysis.

Figure 1 Principal component plots of scaled data using a subset of variables and cases from Tubb et al. (1980) (see text

for details). The upper figure uses the original data, and the lower after `correcting' for dilution in two of three regional

groups.

It is interesting to view these attempts to deal with dilution in the context of the statistical

models for compositional data that have been used in this paper. It has been argued that the MVN

assumption used in all the modelling applications discussed is at odds with the more

fundamental model of case composition presented in equation (3). One way to reconcile the

two models is to transform case compositions in such a way that the MVN assumption for a

sample from a component population is more likely to be true. Dilution correction procedures

can be viewed as an attempt to do precisely this, and the variety of approaches noted above all

stem from the same simple model of dilution.

When sample sizes preclude the use of Mahalanobis distance, recourse must be had to

Euclidean distance, and this is less than ideal when dealing with highly elliptical clusters.

Dilution corrections will, if successful, have the effect of reducing correlations within

groups, possibly quite considerably (Beier and Mommsen 1994), so that Euclidean

methods are more satisfactory. Beier and Mommsen (1994) interpret their results as showing

that the prevalence of highly correlated data, and the problems it causes, has been

exaggerated, but their methodology does not distinguish between naturally and artificially

correlated data. The foregoing argument suggests that their methodology can be interpreted

as an approach to data transformation that will generate approximately spherical and

normally distributed groups if successful.

Thus, from a statistical standpoint, dilution correction can be viewed as a methodology for

making model assumptions more valid and easing the computational burden.

Figure 2 Using four variables, the logged (to base 10) values for a single case are plotted against the logged means for

each of the variables and a regression (dotted) line is shown. The good linear fit is approximately parallel to the solid line

that would be obtained if the case had values equal to the means and is indicative of a dilution effect. The vertical

distance between the lines provides an estimate of $\log(a_i)$, where $a_i$ defines the proportionate relation between the case

and `variable' means in the presence of dilution.

DISCUSSION

This paper has examined a number of competing approaches to the statistical analysis of artefact

compositional data within a model-based framework. It has been argued that, despite their

apparent differences, the methods examined have strong similarities and might often be

expected to produce similar results in practice.

Using a modelling framework makes quite explicit the fact that models that have been

entertained for artefact compositions are at variance with models that are used for grouping

cases. Whether this is a serious problem will depend on the separation of groups and departure

from multivariate normality of a sample from a population. Methods that assume normality will

tend to impose normal structure on the groups found, regardless of the `true' situation, and may

mislead about their effectiveness. Different approaches to dilution correction can be viewed as

attempts to transform compositions to normality that satisfy the assumptions of grouping

procedures, although this is not the primary reason for the development of such approaches. In

the absence of dilution and presence of non-normality, simpler methods, such as the use of log

transformation, exist.

Finally, the creators of the methods discussed here might not necessarily view what they do as

`model-based', although adopting such a view helps in understanding how methods compare.

More thorough-going model-based Bayesian and likelihood methods of grouping data,

developed in the statistical literature and little used in archaeometry, have been noted. They

have the potential attraction that clusters of highly correlated variables can be dealt with in a

natural fashion, and tests of the number of clusters in the data are possible. Some of these

methods are investigated in more detail in Papageorgiou et al. (2000).

ACKNOWLEDGEMENTS

I am particularly grateful to Thomas Beier, Caitlin Buck, Jaume Buxeda i Garrigós, Hans Mommsen and Hector Neff for

discussing their approaches to data analysis with me. There is no implication that they necessarily agree with my

interpretations, and any misunderstandings and errors are entirely my responsibility. Jaume Buxeda i Garrigós is thanked,

additionally, for providing and allowing use of his data from Abella. My colleagues Christian Beardah and Ioulia

Papageorgiou are thanked for contributing to my understanding of the practicalities of implementing some of the methods

discussed. This work forms part of the GEOPRO Research Network funded by the DGXII of the European Commission,

under the TMR Network Programme (Contract Number ERBFMRX-CT98-0165).

REFERENCES

Aitchison, J., 1982, The statistical analysis of compositional data (with discussion), Journal of the Royal Statistical

Society, B44, 139–77.

Aitchison, J., 1986, The statistical analysis of compositional data, Chapman and Hall, London.

Aitchison, J., 1990, Relative variation diagrams for describing patterns of compositional variability, Mathematical

Geology, 22, 487–511.

Baxter, M. J., 1993, Comment on D. Tangri and R. V. S. Wright, `Multivariate analysis of compositional data ...',

Archaeometry, 35, 112–15.

Baxter, M. J., 1994, Exploratory multivariate analysis in archaeology, Edinburgh University Press, Edinburgh.

Baxter, M. J., 1995, Standardization and transformation in principal component analysis, with applications to

archaeometry, Applied Statistics, 44, 513–27.

Baxter, M. J., 1999, Detecting multivariate outliers in artefact compositional data, Archaeometry, 41, 321–38.

Baxter, M. J., and Buck, C. E., 2000, Data handling and statistical analysis, in Modern analytical methods in art and

archaeology (eds. E. Ciliberto and G. Spoto), 681–746, John Wiley, New York.

Baxter, M. J., Beardah, C. C., and Westwood, S., 2000, Sample size and related issues in the analysis of lead isotope data,

Journal of Archaeological Science, 27, 973–80.


Beier, T., and Mommsen, H., 1991, On the distribution function of elements within groups of pottery and some

consequences for multivariate analysis, unpublished conference paper.

Beier, T., and Mommsen, H., 1994, Modified Mahalanobis filters for grouping pottery by chemical composition,


Bieber, A. M., Brooks, D. W., Harbottle, G., and Sayre, E. V., 1976, Application of multivariate techniques to analytical

data on Aegean ceramics, Archaeometry, 18, 59–74.

Bishop, R. L., and Neff, H., 1989, Compositional data analysis in archaeology, in Archaeological chemistry IV (ed. R. O.

Allen), American Chemical Society Advances in Chemistry Series 220, 57–86, Washington, DC.

Buck, C. E., and Litton, C. D., 1996, Mixtures, Bayes and archaeology, in Bayesian statistics 5 (eds. J. M. Bernardo, J. O.

Berger, A. P. Dawid and A. F. M. Smith), 499–506, Clarendon Press, Oxford.

Buck, C. E., Cavanagh, W. G., and Litton, C. D., 1996, Bayesian approach to interpreting archaeological data, John

Wiley, New York.

Buxeda i Garrigós, J., 1999, Alteration and contamination of archaeological ceramics: the perturbation problem, Journal

of Archaeological Science, 26, 295–313.

Church, T., 1995, Comment on `Neutron-activation analysis of stone from the Chadron formation and a Clovis site on the

Great Plains' by Hoard et al. (1992), Journal of Archaeological Science, 22, 1–5.

Glascock, M. D., 1992, Characterization of archeological ceramics at MURR by neutron activation analysis and

multivariate statistics, in Chemical characterization of ceramic pastes in archaeology (ed. H. Neff), 11–26,

Prehistory Press, Madison, Wisconsin.

Gordon, A. D., 1999, Classification, 2nd edn, Chapman and Hall/CRC, London.

Harbottle, G., 1976, Activation analysis in archaeology, in Radiochemistry 3 (ed. G. W. A. Newton), 33–72, Chemical

Society, London.

Harbottle, G., 1991, The efficiencies and error-rates in Euclidean Mahalanobis searches in hypergeometries of

archaeological ceramic compositions, in Archaeometry 90 (eds. E. Pernicka and G. A. Wagner), 413–24, Birkhäuser

Verlag, Basel.

Hegmon, M., Allison, J. R., Neff, H., and Glascock, M. D., 1997, Production of San Juan red ware in the Northern

Southwest: insights into regional interaction in early Puebloan prehistory, American Antiquity, 62, 449–63.

Herrera, R. S., Neff, H., Glascock, M. D., and Elam, J. M., 1999, Ceramic patterns, social interaction, and the Olmec:

neutron activation analysis of Early formative Pottery in the Oaxaca Highlands of Mexico, Journal of Archaeological

Science, 26, 967–87.

Hoard, R. J., Holen, S. R., Glascock, M. D., and Neff, H., 1995, Additional comments on neutron-activation analysis of

stone from the Great Plains - reply, Journal of Archaeological Science, 22, 7–10.

Krzanowski, W. J., and Marriott, F. H. C., 1995, Multivariate analysis: part 2, Edward Arnold, London.

Leese, M. N., and Main, P. L., 1994, The efficient computation of unbiased Mahalanobis distances and their

interpretation in archaeology, Archaeometry, 36, 307–16.

Leese, M. N., Hughes, M. J., and Stopford, J., 1989, The chemical composition of tiles from Bordesley, in Computer

applications and quantitative methods in archaeology 1989 (eds. S. Rahtz and J. Richards), 241–9, BAR

International Series 548, British Archaeological Reports, Oxford.

Lizee, J. M., Neff, H., and Glascock, M. D., 1995, Clay acquisition and vessel distribution patterns - neutron activation

analysis of Late Windsor and Shantok tradition ceramics from southern New England, American Antiquity, 60, 515–30.

Mommsen, H., Kreuser, A., and Weber, J., 1988, A method for grouping pottery by chemical composition,


Neff, H. (ed.), 1992, Chemical characterization of ceramic pastes in archaeology, Prehistory Press, Madison, Wisconsin.

Neff, H., 1998, Units in chemistry-based provenance investigations of ceramics, in Measuring time, space and material:

unit issues in archaeology (eds. A. F. Ramenofsky and A. Steffen), 115–27, University of Utah Press, Provo, UT.

Neff, H., Bishop, R. L., and Sayre, E. V., 1988, A simulation approach to the problem of tempering in compositional

studies of archaeological ceramics, Journal of Archaeological Science, 15, 159–72.

Neff, H., Bishop, R. L., and Sayre, E. V., 1989, More observations on the problem of tempering in compositional studies

of archaeological ceramics, Journal of Archaeological Science, 16, 57–69.

Papageorgiou, I., Baxter, M. J., and Cau, M. A., 2000, Model-based cluster analysis of artefact compositional data

(submitted for publication).

Pike, H. H. M., and Fulford, M. G., 1983, Neutron activation analysis of black-glazed pottery from Carthage,


Pollard, A. M., 1986, Multivariate methods of data analysis, in Greek and Cypriot pottery: a review of scientific studies

(ed. R. E. Jones), 56–83, Occasional Paper 1, British School at Athens Fitch Laboratory, Athens.

Pollard, A. M., and Heron, C., 1996, Archaeological chemistry, Royal Society of Chemistry, Cambridge.

Sayre, E. V., 1975, Brookhaven procedures for statistical analyses of multivariate archaeometric data, Unpublished

manuscript.

Sayre, E. V., Yener, K. A., Joel, E. C., and Barnes, I. L., 1992, Statistical evaluation of the presently accumulated lead

isotope data from Anatolia and surrounding regions, Archaeometry, 34, 73–105.

Slane, K. W., Elam, J. M., Glascock, M. D., and Neff, H., 1994, Compositional analysis of eastern sigillata A and related

wares from Tel-Anafa (Israel), Journal of Archaeological Science, 21, 51–64.

Steponaitis, V. P., Blackman, M. J., and Neff, H., 1996, Large scale patterns in the chemical composition of

Mississippian pottery, American Antiquity, 61, 555–72.

Triadan, D., Neff, H., and Glascock, M. D., 1997, An evaluation of the archaeological relevance of weak-acid extraction

ICP: White Mountain redware as a case study, Journal of Archaeological Science, 24, 997–1002.

Tubb, A., Parker, A. J., and Nickless, G., 1980, The analysis of Romano-British pottery by atomic absorption

spectrophotometry, Archaeometry, 22, 153–71.

Weigand, P. C., Harbottle, G., and Sayre, E. V., 1977, Turquoise sources and source analysis: Mesoamerica and the

southwestern U.S.A., in Exchange systems in prehistory (eds. T. K. Earle and J. E. Ericson), 15–34, Academic Press,

New York.

