Methodological Issues In The Study Of Assemblage Diversity

M.J. Baxter

Department of Mathematics, Statistics and Operational Research, The Nottingham Trent University, Clifton Campus, Nottingham, NG11 8NS, UK.

Abstract

Three approaches that have been used to investigate assemblage diversity in the archaeological literature, two established and one new, are studied, with a particular emphasis on assemblage richness. It is argued that the established regression and simulation approaches, as often used, are only strictly valid if they assume what they are supposed to test, namely that assemblages are sampled from populations with the same richness or structure. Rarefaction methodology provides an alternative to the simulation approach and suggests that, even if the latter is used, sampling without rather than with replacement is preferable. Some potential limitations of a recently proposed approach using jackknife methods are noted, and it is suggested that bootstrapping may be a more natural resampling method to use.

Introduction

The comparison of archaeological assemblages in terms of indices that measure different aspects of diversity has attracted attention since the early 1980s. Diversity can be characterized in different ways, one of the simplest being the richness of an assemblage, defined as the number of artifact classes represented in an assemblage. If an assemblage is conceived as a sample from a population containing a finite number of classes then richness will tend to increase as sample size increases up to some limit. This sample-size/richness relation militates against the direct comparison of richness from different assemblages as a means of evaluating the similarity of the underlying populations in terms of richness.

Two ‘standard’ approaches developed to deal with this problem are the regression method of Jones et al. (1983) and the simulation method of Kintigh (1984, 1989). This paper examines some of the methodological issues that arise in using these methods, and a newer jackknifing approach suggested by Kaufman (1998). This is by no means the first such critique and previous reviews include Rhode (1988), Bobrowsky and Ball (1989), Cowgill (1989), Dunnell (1989), McCartney and Glass (1990), Ringrose (1993), Byrd (1997), Kaufman (1998) and Orton (2000). Despite some rather trenchant criticism of the methods (e.g. Ringrose 1993) they continue to be used (e.g. Shott 1997; Potter 1997; Grayson and Cole 1998; Dye and Tuggle 2000) and to be developed further (Byrd 1997), so that further examination of their merits or otherwise seems warranted.

In part, this paper reiterates previous critiques, though possibly in a different way that I hope clarifies some of the issues involved, particularly with regard to the regression approach. The opportunity is taken to demonstrate an alternative to the simulation method that has been previously suggested but not, to the best of my knowledge, illustrated, and which highlights an issue concerning its use that I have not seen raised before. Some of the observations on simulation and jackknifing are, I believe, novel.

1 Framework and notation

This paper addresses a number of methodological issues that have arisen in the comparative study of assemblages and will use the following framework. Assume that $J$ assemblages are to be compared; then

$S_j$ = number of classes in the population from which the $j$th assemblage is sampled.
$S$ = number of different classes represented in the $J$ sampled populations.
$s$ = number of different classes observed in the $J$ assemblages.
$s_j$ = observed number of classes in the $j$th assemblage.
$n_{ij}$ = observed number of cases for the $i$th class in the $j$th assemblage.
$n_j$ = observed number of cases in the $j$th assemblage.
$n_i$ = observed number of cases in the $i$th class.
$n$ = total number of cases in the assemblages to be compared.
$P_{ij}$ = proportion of the $i$th class in the $j$th population.
$p_{ij} = n_{ij}/n_j$ is a sample estimate of $P_{ij}$.
$p_i = n_i/n$ is the observed proportion of cases in the $i$th class, across all assemblages.
$P_i$ = a ‘population’ proportion estimated by $p_i$ whose meaning is discussed later.
$\mathbf{P} = (P_1, P_2, \ldots, P_S)$ represents the ‘population’ defined by the $P_i$.
$\mathbf{p} = (p_1, p_2, \ldots, p_S)$ is an estimate of $\mathbf{P}$.
$\mathbf{n} = (n_1, n_2, \ldots, n_S)$ where the $n_i$ are intended.
$\mathbf{P}_j = (P_{1j}, P_{2j}, \ldots, P_{Sj})$.
$\mathbf{p}_j = (p_{1j}, p_{2j}, \ldots, p_{Sj})$.

In this formulation it is explicitly assumed that assemblages are samples from populations and that the sampling fraction is negligible. This assumption is implicit in many studies of diversity (Dunnell, 1989) and is adequate for what follows. For comment on the situation where the assemblage is equated with the population, or the sampling fraction is large, see Dunnell (1989), Plog and Hegmon (1993) and Leonard (1997).

Studies of diversity in archaeological assemblages frequently attempt to compare them in terms of richness and/or evenness.

1. One measure of the richness of the $j$th population is $S_j$. If comparison of the $S_j$ is attempted this cannot be done simply using the $s_j$ because, even if $S_j = S_k$, it will usually be the case that $s_j > s_k$ if $n_j \gg n_k$, unless both $n_j$ and $n_k$ are large. This is the well-known sample-size effect.

2. The evenness of the $j$th population can be defined as a function of the distribution of the $P_{ij}$ across the $S$ classes which ignores the ordering. Whatever function is used it must be estimated using the $p_{ij}$.

In studies of richness the sample-size effect poses considerable problems that, arguably, have not been satisfactorily resolved, and the main focus of this paper is on this problem.
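The sample-size effect described above can be sketched numerically. The following is a minimal illustration only: the 30-class population and its proportions are hypothetical and are not taken from any data set discussed in this paper. Every sample is drawn from the same population, so any difference in observed richness reflects sample size alone.

```python
import random

def sample_richness(weights, n, rng):
    """Draw n cases with replacement from a population whose class
    proportions are proportional to `weights`; return the number of
    distinct classes observed (the sample richness)."""
    classes = range(len(weights))
    drawn = rng.choices(classes, weights=weights, k=n)
    return len(set(drawn))

def mean_richness(weights, n, reps, rng):
    """Average observed richness over `reps` independent samples of size n."""
    return sum(sample_richness(weights, n, rng) for _ in range(reps)) / reps

# Hypothetical population (an assumption for illustration): S = 30 classes
# with unequal proportions, the i-th class having weight 1/i.
WEIGHTS = [1.0 / (i + 1) for i in range(30)]

if __name__ == "__main__":
    rng = random.Random(1)
    for n in (20, 50, 200, 1000):
        # Same population richness throughout; only the sample size changes.
        print(n, round(mean_richness(WEIGHTS, n, 200, rng), 1))
```

Observed richness rises with sample size even though population richness is fixed at 30, which is why the $s_j$ cannot be compared directly when the $n_j$ differ.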

Regression approaches

In the regression approach (Jones et al. 1983; Grayson 1984) the relation between sample richness and sample size is modelled statistically, usually as a log-linear regression model. One formulation is

$$\log(s_j) = \alpha' + \beta \log(n_j)$$

which implies that

$$s_j = \alpha n_j^{\beta}$$

where $\alpha' = \log(\alpha)$. If $S_j$ is finite, log-linearity cannot hold as $s_j$ approaches this limit (Cowgill 1989: 133). To avoid this problem Byrd (1997) proposed a non-linear regression (growth) model that, in a more general form, can be written as

$$s_j = S_j - \alpha \exp(-\beta n_j).$$

If $\beta$ is positive then, as $n_j$ becomes large, $s_j \to S_j$, which is an unknown parameter to be estimated. This explicit dependence on $S_j$ makes immediately apparent the main problem with the regression approach, which is that, without some restriction on the $S_j$, results from it will be uninterpretable. This is because in a comparison of $J$ assemblages there are only $J$ data points of the form $(s_j, n_j)$ but, in the non-linear formulation above, $(J + 2)$ parameters need to be estimated. This problem is most readily seen in the non-linear model but causes equal difficulties for the log-linear formulations, as consideration of Figure 1 shows.

Figure 1 shows several curves derived from the non-linear model, on a log-log scale. Imposed on this are six hypothetical data points corresponding to samples of size (26, 35, 43, 74, 99, 141) with sample richness (3, 6, 7, 13, 18, 26). Since the curves are at least approximately log-linear over much of the range shown, a log-linear analysis would be acceptable as a first approximation.

The data points are constructed so that they lie about a single curve, and observed data of this form could be interpreted as demonstrating a sample-size effect. Equally, the points lie on curves corresponding to different population richness. It is impossible to distinguish between the two situations from the data alone. Dunnell (1989) makes essentially this point, though it is expressed differently.

Figure 1: Six ‘growth curves’ relating sample richness to sample size are shown, using the model presented in the text. The legend shows the richness of the populations used to generate the curves. The five parallel curves are generated using β = −0.015 and α = 1; the solid curve is generated using β = −0.006 and α = 1.1. The hypothetical data points displayed can be interpreted as arising from a single growth curve with $S_j = 50$; several growth curves with constant α and β; or a mixture of the two. In practice it would be impossible to know, from the data alone, what situation applied.

The linear correlation between sample size and richness on the log scale is 0.99 but, despite this high value, it cannot be interpreted as showing that varying assemblage richness is ‘explained’ by varying sample size. Dunnell (1989: 146) takes several of the contributors to Leonard and Jones (1989), including Thomas (1989), to task for this kind of interpretation, suggesting that it arises from the mistaken conflation of cause and correlation. Thomas (1989: 87) includes statements along the lines that, because of high correlation, variability in diversity ‘can be accounted for strictly in terms of sample size’. This is only so if it is assumed that $S_j = S$ for all $j$, the very assumption that the regression approach is presumably supposed to test. Byrd (1997) makes a similar assumption in his non-linear model. His application involves the comparison of different sets of assemblages and it is assumed that within a set the assemblages are drawn from populations with the same richness. The non-linear estimation method used by Byrd is, unfortunately, flawed. That this is so is demonstrated in Baxter (2000), where an alternative method of estimation is proposed and illustrated.[1]
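The correlation quoted for the six hypothetical points of Figure 1 can be checked with a few lines of code. This is a minimal sketch using the sample sizes and richness values given in the text; the function names are illustrative only.

```python
import math

# Hypothetical data points quoted in the text for Figure 1.
sizes = [26, 35, 43, 74, 99, 141]
richness = [3, 6, 7, 13, 18, 26]

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def log_log_correlation(n, s):
    """Correlation between log sample size and log sample richness."""
    return pearson([math.log(v) for v in n], [math.log(v) for v in s])

r = log_log_correlation(sizes, richness)

if __name__ == "__main__":
    print(round(r, 2))
```

The result should agree with the value of about 0.99 given above. The calculation uses only the pairs $(s_j, n_j)$ and carries no information about the $S_j$, which is why a high value cannot by itself settle the question.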

Another point, as Ringrose (1993) observed and as is obvious from Figure 1, is that it is perfectly possible to conceive of a set of data giving rise to a zero correlation if smaller samples are associated with richer populations. Negative correlations are also possible and there is an example of one in Thomas (1989: 91), albeit for small $J = 3$.

[1] In Byrd’s (1997) model it is assumed that α = 1 and β is estimated using an ad-hoc procedure based on experimentation with simulated data. This reduces $s_j$ to a non-linear function of $S_j$. Given a set of assemblages for which it is assumed the $S_j$ are the same, a ‘best-fitting’ curve is fitted using an iterative minimum chi-squared method. The more ad-hoc aspects of this approach could be avoided by using modern statistical software such as S-Plus that permits non-linear estimation of parameters using maximum likelihood methods (Venables and Ripley 1999: Chapter 8).

To summarize, the use of the regression approach to identify sample-size effects for a single set of assemblages is inappropriate. Linearity does not demonstrate that the populations from which the assemblages are sampled have a similar richness. The method might be used to compare several sets of assemblages. For example, Grayson and Cole (1998) compare French Mousterian, Châtelperronian and Aurignacian stone-tool assemblage richness using regression methods, and also compare different industries within the Mousterian. For this to be strictly valid, however, it needs to be assumed that the richness is the same within the populations from which a particular set of assemblages is sampled.

Simulation approaches

In Kintigh’s (1984, 1989) simulation method, expected richness and associated confidence intervals are generated for different sample sizes, by repeatedly sampling from a ‘background’ population whose structure has to be defined.

For illustration a data set, originally discussed by Conkey (1980), that has been used by others for illustrative purposes (e.g., Kintigh, 1984; Rhode, 1988; Kaufman, 1998) is considered. The data are given in Kaufman (1998) in the form of a 44 × 5 table where columns correspond to different Early Magdalenian sites in Cantabrian Spain and rows to design elements on engraved bone artifacts. Entries in the original table correspond to counts of design elements from each site. The five sites Altamira (A), Cueto de la Mina (CM), El Juyo (EJ), El Cierro (EC) and La Paloma (LP) differ in their apparent richness, but also in terms of sample size. Altamira, with $n_j = 152$ and $s_j = 38$, is apparently the richest, but there has been some debate about whether this is a real phenomenon or an effect of sample size (Rhode 1988, Kaufman 1998). In particular, it is of interest to know if Altamira is genuinely richer than Cueto de la Mina, which has a lower $s_j = 27$ but also lower $n_j = 69$, since different methods have suggested apparently different answers to this question (Kaufman, 1998).

Using 100 samples of size $m$, for $m$ = 5 to 160 in intervals of 5, the solid lines in Figure 2 show the estimated 80% confidence interval for $E(s_m)$, the expected richness in samples of size $m$. Also shown are the actual sample sizes and richness for the five sites used. The ‘population’ is generated, following Kintigh’s (1984, 1989) recommendation, by summing the totals for each assemblage. The result is essentially the same as that shown in Kintigh (1984) and has been interpreted as showing that Altamira is richer than expected under the ‘null’ model, whereas Cueto de la Mina does not depart significantly from its expected richness.

In our notation this involves treating the sample estimate $\mathbf{p}$ as if it is the population $\mathbf{P}$ and sampling from the (essentially infinite) population defined by the latter. This is equivalent to sampling from $\mathbf{n}$ with replacement.
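The simulation just described can be sketched as follows. This is an illustrative reading of the procedure, not Kintigh's own program: the pooled counts are hypothetical, the 80% limits are taken here as the empirical 10th and 90th percentiles of the simulated richness values (an assumption), and the `replace` switch is added only so that sampling with and without replacement from $\mathbf{n}$ can be compared.

```python
import random

def simulate_richness(pooled_counts, m, reps, rng, replace=True):
    """Simulate the richness of samples of size m from a pooled background.

    pooled_counts[i] is n_i, the pooled count for class i.
    replace=True  : draw classes with probabilities p_i = n_i / n.
    replace=False : draw m of the n pooled cases without replacement.
    Returns (mean, lower, upper); lower and upper are the empirical 10th
    and 90th percentiles of the simulated richness values.
    """
    classes = range(len(pooled_counts))
    # One entry per pooled case, labelled by its class.
    cases = [i for i in classes for _ in range(pooled_counts[i])]
    results = []
    for _ in range(reps):
        if replace:
            drawn = rng.choices(classes, weights=pooled_counts, k=m)
        else:
            drawn = rng.sample(cases, m)
        results.append(len(set(drawn)))
    results.sort()
    lower = results[int(0.10 * (reps - 1))]
    upper = results[int(0.90 * (reps - 1))]
    return sum(results) / reps, lower, upper

# Hypothetical pooled counts (n = 100 cases in 11 classes), for illustration.
POOLED = [40, 25, 12, 8, 5, 3, 2, 2, 1, 1, 1]

if __name__ == "__main__":
    rng = random.Random(1)
    for m in range(5, 101, 5):
        with_repl = simulate_richness(POOLED, m, 100, rng, replace=True)
        without = simulate_richness(POOLED, m, 100, rng, replace=False)
        print(m, with_repl, without)
```

When $m$ equals the pooled total, sampling without replacement necessarily recovers every class, whereas sampling with replacement generally does not.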

Figure 2: The solid lines show the 80% confidence interval obtained using Kintigh’s (1984) simulation method, with 100 simulations, applied to Conkey’s (1980) data. The dotted lines show a similar interval calculated using rarefaction and the dot-dash line shows the expected richness using the same method.

My reading of the critical literature that has grown up around this approach is that it mostly centers around the legitimacy of using $\mathbf{P}$, which may not correspond to a real population in any useful sense. This concern can be expressed in different ways.

1. If all the $\mathbf{P}_j$ are the same then $\mathbf{P}$ is well defined; however, one is then assuming that the sampled populations have the same richness, which is the assumption the method is supposed to test (e.g. Rhode 1988).

2. If the $\mathbf{P}_j$ differ then $\mathbf{P}$ is an artificial construct that need bear no relation to any of the $\mathbf{P}_j$ and may well differ completely from all of them (e.g., Ringrose 1993).

3. The form of $\mathbf{P}$ will be biased towards that of the larger assemblages, which will affect the comparisons made. McCartney and Glass (1990: 528-529), whose examples illustrate this, attempt to defend this feature of the method, unconvincingly in my view. If the $j$th assemblage dominates in terms of size, for example, it seems better to use $\mathbf{P}_j$ estimated by $\mathbf{p}_j$ to define the population, and test directly if other assemblages are compatible with it in terms of richness.

4. Even in circumstances where the definition of $\mathbf{P}$ on the basis of the observed assemblages is not problematic, to equate it with $\mathbf{p}$ and ignore the uncertainty of estimation is questionable. This is considered further below.

Both Rhode (1988) and Ringrose (1993), citing Smith and Grassle (1977), observe that simulation is unnecessary because the expected richness and confidence limits can be calculated analytically.[2] Neither author illustrates the approach. If a random sample of size $m < n$ is drawn from an assemblage containing $s$ classes, without replacement, an unbiased estimate of the number of classes expected in a sample of that size is

$$E(s_m) = \sum_{i=1}^{s} \left( 1 - \frac{(n - n_i)!\,(n - m)!}{(n - n_i - m)!\,n!} \right)$$

For small $m$ the value of $E(s_m)$ is dominated by the common classes; for larger $m$ rarer classes have more effect. This formula is given in Ringrose (1993), who notes that it may be viewed as defining a series of diversity indices of differing sensitivity to rare classes. He does not provide the corresponding formula for standard errors, but notes that the method is equivalent to ‘rarefaction’ methodology, and the appropriate variance estimate for this is given in Orton (2000: 221), following Birks and Line (1992), as

$$\operatorname{var}(s_m) = a^{-1} \left[ \sum_{i=1}^{s} a_i \left( 1 - \frac{a_i}{a} \right) + 2 \sum_{i=1}^{s-1} \sum_{j=i+1}^{s} \left( a_{ij} - \frac{a_i a_j}{a} \right) \right]$$

where

$$a = \frac{n!}{(n - m)!\,m!}, \qquad a_i = \frac{(n - n_i)!}{(n - n_i - m)!\,m!}, \qquad a_{ij} = \frac{(n - n_i - n_j)!}{(n - n_i - n_j - m)!\,m!}$$

and $b! = b(b - 1) \cdots 1$ is the usual factorial notation. These expressions are all binomial coefficients of the form $n!/((n - x)!\,x!)$ and are equal to 0 by definition if $x > n$.

It is instructive to compare the analytical results obtained using these formulae with Kintigh’s (1984) simulation approach and this is also illustrated in Figure 2. The outer dotted limits are the estimated 80% confidence limits, with the central, dashed and dotted, line the expected richness. These tend to give higher values than the simulation approach, particularly for larger values of $m$. This is because the rarefaction approach and the results given above assume sampling without replacement, whereas Kintigh’s (1984, 1989) method assumes sampling with replacement or, equivalently, that $\mathbf{p}$ estimates $\mathbf{P}$ without error.
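The rarefaction formulae above translate directly into code. The sketch below is one possible implementation rather than the program used to produce Figure 2. It works with the ratios $a_i/a$ and $a_{ij}/a$, which is algebraically the same as the expressions given, and relies on exact integer binomial coefficients to sidestep the large factorials. The function name is illustrative.

```python
from math import comb, sqrt

def rarefaction(counts, m):
    """Expected richness and its variance for a sample of size m drawn
    without replacement from an assemblage with class counts n_i.

    Uses q_i = a_i / a and q_ij = a_ij / a, so that
      E(s_m)   = sum_i (1 - q_i)
      var(s_m) = sum_i q_i (1 - q_i) + 2 sum_{i<j} (q_ij - q_i q_j).
    """
    n = sum(counts)
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= n")
    a = comb(n, m)
    # comb(k, m) is 0 when m > k, matching the convention in the text.
    q = [comb(n - ni, m) / a for ni in counts]
    expected = sum(1 - qi for qi in q)
    variance = sum(qi * (1 - qi) for qi in q)
    s = len(counts)
    for i in range(s - 1):
        for j in range(i + 1, s):
            q_ij = comb(n - counts[i] - counts[j], m) / a
            variance += 2 * (q_ij - q[i] * q[j])
    # Guard against tiny negative values caused by floating-point rounding.
    return expected, max(variance, 0.0)

if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    counts = [40, 25, 12, 8, 5, 3, 2, 2, 1, 1, 1]
    for m in (10, 25, 50, 100):
        e, v = rarefaction(counts, m)
        print(m, round(e, 2), round(sqrt(v), 2))
```

As a check, for two classes with two cases each and $m = 2$, direct enumeration gives an expected richness of 5/3 and a variance of 2/9, which the function reproduces.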

Conclusions differ according to which method is used. For example, using Kintigh’s original method Altamira is judged to be significantly richer than expected under the assumed population model, whereas its richness is close to that expected under the rarefaction approach. Conclusions as to whether Altamira and Cueto de la Mina are significantly different also depend on the approach used. With the rarefaction method they both appear to be close to the expected richness and lie comfortably within the 80% confidence interval. If Kintigh’s (1984) method is modified to sample without replacement from $\mathbf{n}$ then limits similar to those derived analytically are obtained.

[2] That the limits can be calculated analytically is not, in itself, a criticism of the simulation method. The appropriate formulae depend on factorials that can be large and give rise to computational problems, so simulation is one way of avoiding these. What is important is that the formulae (presumably) referred to by Rhode (1988) and Ringrose (1993) assume sampling without replacement, in contrast to the use of sampling with replacement implicit in Kintigh’s (1984) method.

Ringrose (1993) suggested, without illustration, an alternative use of $E(s_m)$ that is illustrated here. If $J$ assemblages are to be compared then, for each assemblage in turn, $E(s_m)$ may be calculated for $m = 2, 3, \ldots, M$ and plotted against $m$. The resulting curves may be compared for each assemblage, and if a curve for one assemblage is completely above that for another it suggests that the population from which that assemblage is drawn is the richer. Ringrose (1993) suggests taking $M$ as the minimum of the sample sizes across the assemblages to be compared. Applying the foregoing methodology to these data gives rise to Figure 3, which can be interpreted as showing that Altamira and then Cueto de la Mina are the richest sites, differing somewhat from the other three.

Figure 3: The expected richness index for five sites for $m = 2, 3, \ldots, 23$ is shown (see the text for the key to the legend). The consistent separation, with no crossings, suggests an ordering in terms of population richness, with Altamira (A) and Cueto de la Mina (CM) the most rich.

The statistics may also be used to test whether two sites differ significantly in richness. For example, $n_1 = 69$ for Cueto de la Mina and $s_1 = 27$. If a sample of this size is selected from the Altamira assemblage the expected richness is 28.15 with a standard deviation of 2.05. The observed richness for Cueto de la Mina lies comfortably within one standard deviation of the expected richness, suggesting that the two sites are not significantly different in terms of richness.
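As a worked check of the arithmetic, using only the figures quoted above (the symbol $z$ is introduced here purely for illustration), the standardized difference between the observed richness of Cueto de la Mina and the richness expected in a sample of 69 from Altamira is

$$z = \frac{s_1 - E(s_{69})}{\sqrt{\operatorname{var}(s_{69})}} = \frac{27 - 28.15}{2.05} \approx -0.56,$$

that is, a little over half a standard deviation below the expected value.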

To summarize this section, previous criticisms of the simulation approach, which center on the way in which the background population that underpins the simulation has been defined, have been noted. An analytical method of calculating the confidence limits produced by the simulation method, previously noted but not used by Rhode (1988) and Ringrose (1993), has been illustrated. This raises the issue that, if the simulation method is used, sampling from the background population without replacement may be more appropriate than the sampling with replacement that is normally used. This can affect the substantive conclusions drawn from an analysis. The statistical formulae used to get the analytical results arise in rarefaction analysis and can be used to effect comparisons between different pairs of assemblages.

Resampling approaches

Dissatisfaction with the regression and simulation methods has recently led Kaufman (1998) to propose an alternative approach based on resampling, specifically the jackknife technique. This section contains a number of observations on this technique and will also illustrate bootstrapping, which is an alternative resampling method that could be used.

Jackknifing is a procedure for reducing bias in estimates and providing confidence intervals where these are otherwise difficult to obtain. Kaufman recommends the technique primarily because it ‘does not require any prior assumptions about, or knowledge of, the structure of the original data’ (Kaufman 1998: 83). Given a sample of size $n$ the principle is to omit each sample member in turn, generating $n$ samples of size $(n - 1)$. The parameter of interest is generated for each of these $n$ samples and these $n$ estimates are combined to provide the jackknife estimate and standard deviation for the parameter. Kaufman claims that ‘jackknifing is quite sensitive to picking up variation between assemblages and performs better in this regard than the regression and simulation approaches’.

Kaufman’s (1998) paper may be referred to for technical details and some worked examples. Let $\Theta_j$ be the value of a statistic that measures some aspect of diversity for an assemblage, and let $\Theta_{ij}$ be the value of the statistic on omitting the $i$th class, for $i = 1, 2, \ldots, s_j$. From these quantities pseudovalues

$$\Phi_{ij} = s_j \Theta_j - (s_j - 1)\Theta_{ij}$$

are defined for $i = 1, 2, \ldots, s_j$. The mean of the $\Phi_{ij}$, $\bar{\Phi}_j$, is used as a measure of diversity for the assemblage and its estimated standard error, calculated from the $s_j$ values of $\Phi_{ij}$, can be used to test if it differs significantly from means for other assemblages. Kaufman (1998) undertakes analyses that use Menhinick’s index, $\Theta_j = s_j/\sqrt{n_j}$, as a measure of diversity related to richness. His results, applied to Conkey’s (1980) data, differ in some details from previous analyses. For example, it is concluded that Cueto de la Mina, with $\bar{\Phi}_j = 4.74$, is richer than Altamira with 4.56, though not significantly so. This differs from analyses that have concluded that Altamira is the richer site.
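The pseudovalue calculation can be sketched in code. This is a minimal illustration under stated assumptions, not a reproduction of Kaufman's program: omitting the $i$th class is taken to mean removing all $n_{ij}$ of its cases, so that $\Theta_{ij} = (s_j - 1)/\sqrt{n_j - n_{ij}}$, and the standard error is the usual jackknife one (the standard deviation of the pseudovalues divided by $\sqrt{s_j}$). The counts are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def menhinick(counts):
    """Menhinick's index s / sqrt(n) for a list of positive class counts."""
    return len(counts) / sqrt(sum(counts))

def jackknife_pseudovalues(counts):
    """Pseudovalues s*Theta - (s - 1)*Theta_i, omitting each class in turn.

    Assumption: dropping class i removes all of its cases from the
    assemblage, so both richness and sample size change.
    """
    s = len(counts)
    theta = menhinick(counts)
    pseudo = []
    for i in range(s):
        reduced = counts[:i] + counts[i + 1:]
        pseudo.append(s * theta - (s - 1) * menhinick(reduced))
    return pseudo

def jackknife_estimate(counts):
    """Mean of the pseudovalues and its estimated standard error.
    Requires at least three classes."""
    pseudo = jackknife_pseudovalues(counts)
    return mean(pseudo), stdev(pseudo) / sqrt(len(pseudo))

if __name__ == "__main__":
    # Hypothetical assemblage: 15 classes, only a few distinct counts.
    counts = [9, 5, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1]
    print(round(menhinick(counts), 2), jackknife_estimate(counts))
```

Because many classes share the same count, the pseudovalues take only a handful of distinct values, which is relevant to the question of their distribution.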

The following observations on the procedure may be made.

1. Conclusions about the relative richness of different sites are a function of the measure of richness chosen and not of the jackknifing procedure. For example, $\Theta_j = s_j/\sqrt{n_j} = 3.25$ for Cueto de la Mina and 3.08 for Altamira. This difference for the complete sample simply feeds through into the jackknifed values and pseudovalues, so that the difference in results from previous analyses is a reflection of the choice of statistic used for jackknifing and nothing to do with the jackknifing procedure itself.

2. A major motivation for using jackknifing and other resampling procedures is that standard errors and confidence intervals for statistics of interest can be derived. Kaufman (1998) uses such information to argue that Altamira and Cueto de la Mina do not differ significantly in terms of richness. This apparently differs from Kintigh’s (1984) conclusion, but we have seen in the previous section that, in fact, the same conclusion is obtained using sampling without replacement from either $\mathbf{n}$ or the Altamira assemblage.

3. It is assumed in Kaufman (1998: 75) that the pseudovalues can be treated as normally distributed random variables and this is used as the basis for inferential procedures based on the $\bar{\Phi}_j$. In fact normality cannot be taken for granted. Figure 4 shows the distribution of pseudovalues for El Cierro and it is evident that their distribution is highly non-normal, as was the case for the other sites. While the operation of the central limit theorem might lead one to hope that, nevertheless, the mean $\bar{\Phi}_j$ can be treated as normally distributed, the non-normality in Figure 4 is so great and the sample size, 15, sufficiently small that this should not be assumed. Investigation of this question provides a convenient way of introducing bootstrap methodology (Efron and Tibshirani, 1993; Davison and Hinkley, 1997).

Figure 4: A histogram of the jackknifed pseudovalues for the El Cierro assemblage. This is highly non-normal and appears discrete because only six different counts occur in the 15 classes represented in the assemblage.

Bootstrapping is a data-based simulation method that can be used to investigate the properties of a statistic estimated from a sample. Given a sample of size $n$, a bootstrap sample is obtained by sampling $n$ observations from the sample with replacement. This procedure is repeated a large number of times, say $N$, and the $N$ estimates are used to estimate the distribution of the parameter, from which inferences may be drawn.

Figure 5: A histogram based on 1000 bootstrap replications of the mean of the jackknifed pseudovalues for the El Cierro assemblage.

In a jackknife analysis there are 15 non-empty classes for El Cierro, giving rise to 15 pseudovalues. A bootstrap sample consists of 15 observations selected from these values, sampled with replacement. By taking 1000 bootstrap samples the distribution of the statistic to be investigated can be approximated without reference to distributional assumptions, and used as the basis for drawing inferences. Figure 5 shows the distribution of the mean of the pseudovalues estimated from 1000 bootstrap samples and is clearly not normal.

The empirical 95% bootstrap percentile interval, defined by the 0.025 and 0.975 percentiles of the distribution in Figure 5, leads to an interval (3.12, 4.14). This contrasts with the interval derived from the jackknifed values using normal theory of (3.16, 4.24). The difference is not too great, though the bootstrap interval is somewhat narrower; however, the important point in principle is that it is unsafe to assume normality of the jackknifed mean if the number of classes is small.
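The bootstrap percentile interval for the mean of a set of pseudovalues can be sketched as follows. The 15 pseudovalues used here are hypothetical stand-ins with six distinct values, not the El Cierro values, and the percentile rule (simple order statistics of the sorted bootstrap means) is one of several conventions in use.

```python
import random
from statistics import mean

def bootstrap_means(values, reps, rng):
    """Means of `reps` bootstrap samples, each the same size as `values`
    and drawn from it with replacement."""
    k = len(values)
    return [mean(rng.choices(values, k=k)) for _ in range(reps)]

def percentile_interval(samples, level=0.95):
    """Empirical percentile interval taken from the order statistics."""
    ordered = sorted(samples)
    tail = (1 - level) / 2
    lo_idx = int(tail * (len(ordered) - 1))
    hi_idx = int(round((1 - tail) * (len(ordered) - 1)))
    return ordered[lo_idx], ordered[hi_idx]

# Hypothetical pseudovalues (an assumption for illustration): 15 values,
# six of them distinct, with a long right tail.
PSEUDO = [2.1] * 6 + [3.0] * 3 + [3.9] * 2 + [4.8] * 2 + [6.5, 9.2]

if __name__ == "__main__":
    rng = random.Random(1)
    boots = bootstrap_means(PSEUDO, 1000, rng)
    print(percentile_interval(boots, 0.95))
```

No normality assumption enters at any point; the interval is read directly from the simulated distribution of the mean.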

The bootstrap can be applied directly to Menhinick’s index, and arguably produces more interpretable results than the pseudovalues generated in jackknifing (Efron and Tibshirani, 1993: 145). A bootstrap sample is drawn from the $s_j$ values of $n_{ij}$ for assemblage $j$ and Menhinick’s index may be calculated from this. To illustrate, Figure 6 shows the histogram of the index for 1000 bootstrap samples for the Altamira assemblage. A 95% bootstrap percentile interval for the index is (2.68, 3.62). This comfortably includes the observed value of 3.25 for Cueto de la Mina, leading once again to the conclusion that they do not differ significantly in richness.[3]

Figure 6: A histogram based on 1000 bootstrap replications of Menhinick’s index of richness for the Altamira assemblage. The 95% bootstrap percentile interval given in the text is defined by the two values, (2.68, 3.62), that cut off 0.025 of the distribution in each tail.

[3] The bootstrap percentile interval is used because it is easy to explain. Improvements exist, such as the BCa interval, discussed in Chapter 14 of Efron and Tibshirani (1993). This will not be explained here as it does not affect the point of principle being made.

Jackknifing and bootstrapping are both resampling approaches to problems of statistical inference that have only been used sporadically in archaeology. Aldenderfer (1988: 111) has recently noted the potential of this kind of resampling approach for archaeological data analysis and suggested that its popularity is likely to grow. Kaufman (1998) has recommended the jackknifing procedure for use in the comparison of assemblage richness. That some of his results differ from previous analyses of the same data is a function of the statistic used to measure richness, rather than the methodology. The jackknife can be used to test for significant differences between measures of richness, but this is also possible using the rarefaction methods discussed in the previous section. In fact the use of bootstrapping demonstrates that some of the assumptions in Kaufman’s use of the jackknife for inference are questionable. The bootstrap method is an alternative to the jackknife as a resampling approach for studying richness that is more transparent and avoids, for example, the need to grapple with the concept of ‘pseudovalues’ and their interpretation. It is also more computer-intensive but, at least for the problem studied here, this is not a serious issue with modern computing power.

2 Discussion

The main focus of this paper has been on methodological issues that arise in three approaches, two established and one new, that have been used to compare diversity in archaeological assemblages. Particular attention has been paid to the measurement of richness and the sample-size effect. The main conclusions may be summarized as follows.

1. Byrd’s (1997) non-linear generalization of the regression approach includes population richness as an unknown parameter. Using this model, a graphical demonstration has been produced to explain why the regression approach will generally produce uninterpretable results. Unless it is assumed that the assemblages studied derive from populations with the same richness, a strong linear (or non-linear) relationship between sample size and richness cannot be interpreted as solely attributable to a sample-size effect. Similarly, without this assumption, outliers in a regression cannot be assumed to correspond to populations that differ in richness from those lying on the regression line.

2. The simulation method, when the background ‘population’ is generated by summing the observed assemblages, is open to a similar objection. Unless the assemblages are assumed to be samples from similar populations it is difficult to know what the background population represents. The technique of rarefaction has been illustrated and could be used in pairwise comparisons to establish if an assemblage could reasonably be regarded as a sample from a population having the same structure as a larger assemblage. It has also been argued that the simulation method, if appropriate, is better applied using sampling without replacement from the background ‘population’. This can have non-trivial effects on inference unless the sampled population is large, when sampling with or without replacement will produce similar results.

3. It was noted that in the jackknifing approach advocated by Kaufman (1998) differences in results from other approaches are a function of the richness index used rather than the jackknifing method. The possibility of testing for significant differences between measures of richness is not unique to the jackknife technique, as earlier use of rarefaction methodology demonstrated. In fact the assumption concerning pseudovalues and the mean derived from them that underpins Kaufman’s use of the jackknife for inference is questionable, at least for assemblages with few classes. This was demonstrated using bootstrap methodology, and the bootstrap is arguably a more natural approach than the jackknife for drawing inferences about estimates of diversity. Whether or not a particular richness statistic is a sensible one to use is a separate issue.
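The contrast drawn in the second conclusion between sampling with and without replacement can be made concrete with the expected richness of a sample of size n drawn from a background ‘population’. The sketch below is illustrative only: the counts are hypothetical, the function names are not taken from any of the works cited, and the without-replacement expectation is the usual hypergeometric rarefaction formula.

```python
from math import comb


def expected_richness_without_replacement(counts, n):
    """Rarefaction: expected classes in n items drawn without replacement."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the population size")
    denom = comb(total, n)
    # comb(total - c, n) / denom is the chance that a class with c items is missed.
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def expected_richness_with_replacement(counts, n):
    """Expected classes in n items drawn with replacement (multinomial)."""
    total = sum(counts)
    # (1 - c / total) ** n is the chance that a class with c items is missed.
    return sum(1 - (1 - c / total) ** n for c in counts if c > 0)


# Hypothetical background 'population' of 87 items in 8 classes.
population = [40, 25, 10, 5, 3, 2, 1, 1]
comparison = [
    (n,
     expected_richness_without_replacement(population, n),
     expected_richness_with_replacement(population, n))
    for n in (10, 30, 60, 87)
]
```

For any n the without-replacement expectation is at least as large as the with-replacement one, and the gap narrows as the population becomes large relative to n, which is the sense in which the two schemes give similar results for large sampled populations.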

The main aim of this paper has been to discuss methodology rather than substantive interpretations of data sets. It may, however, be noted that using rarefaction, or sampling without replacement in the simulation approach, led to the same conclusion as Kaufman (1998) that Altamira and Cueto de la Mina do not differ significantly in terms of richness. The regression approach cannot usefully be used to address this question.
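A pairwise comparison of the kind referred to above might be sketched as follows. This is an assumption-laden illustration, not a reconstruction of the analysis reported here: the counts, the size and richness of the smaller assemblage, and the function names are all hypothetical. The larger assemblage is repeatedly subsampled without replacement down to the size of the smaller one, and the observed richness of the smaller assemblage is then compared with the central range of the simulated values.

```python
import random


def simulated_richness(counts, n, replications=1000, seed=0):
    """Richness of subsamples of size n drawn without replacement."""
    rng = random.Random(seed)
    # Expand the class counts into one label per artifact.
    items = [label for label, c in enumerate(counts) for _ in range(c)]
    return [len(set(rng.sample(items, n))) for _ in range(replications)]


def richness_interval(counts, n, replications=1000, alpha=0.05, seed=0):
    """Central (1 - alpha) range of simulated richness at sample size n."""
    sims = sorted(simulated_richness(counts, n, replications, seed))
    lower = sims[int((alpha / 2) * replications)]
    upper = sims[int((1 - alpha / 2) * replications) - 1]
    return lower, upper


# Hypothetical data: a larger assemblage and a smaller one to compare with it.
larger_counts = [40, 25, 10, 5, 3, 2, 1, 1]
smaller_size = 30
smaller_richness = 6

low, high = richness_interval(larger_counts, smaller_size)
# True if the smaller assemblage's richness lies inside the simulated range.
consistent = low <= smaller_richness <= high
```

If the observed richness falls inside the range, the smaller assemblage could reasonably be regarded as a sample from a population with the same structure as the larger one; a value outside it would suggest a difference in richness.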

References

Aldenderfer, M. 1998 Quantitative methods in archaeology: A review of recent trends and developments. Journal of Archaeological Research 6, 91-126.

Baxter, M.J. 2000 Non-linear models of assemblage richness: A technical note. Research Report 28/00. Department of Mathematics, Statistics and Operational Research, Nottingham Trent University, UK.

Birks, H.J.B. and J.M. Line 1992 The use of rarefaction analysis for estimating palynological richness from Quaternary pollen-analytical data. Holocene 2, 1-10.


Bobrowsky, P.T. and B.F. Ball 1989 The theory and mechanics of ecological diversity in archaeology. In Quantifying Diversity in Archaeology, edited by R.D. Leonard and G.T. Jones, pp. 4-12. Cambridge University Press, Cambridge.

Byrd, J.E. 1997 The analysis of diversity in archaeological faunal assemblages: Complexity and subsistence strategies in the southeast during the Middle Woodland period. Journal of Anthropological Archaeology 16, 49-72.

Conkey, M.W. 1980 The identification of prehistoric aggregation sites: the case for Altamira. Current Anthropology 21, 609-630.

Cowgill, G.L. 1989 The concept of diversity in archaeological theory. In Quantifying Diversity in Archaeology, edited by R.D. Leonard and G.T. Jones, pp. 131-141. Cambridge University Press, Cambridge.

Davison, A.C. and D.V. Hinkley 1997 Bootstrap Methods and their Application. Cambridge University Press, Cambridge.

Dunnell, R.C. 1989 Diversity in archaeology: a group of measures in search of application? In Quantifying Diversity in Archaeology, edited by R.D. Leonard and G.T. Jones, pp. 142-149. Cambridge University Press, Cambridge.

Dye, T.S. and H.D. Tuggle 2001 Land snail extinctions at Kalaeloa, O’ahu. Internet Archaeology 10 (intarch.ac.uk/journal/issue10/).

Efron, B. and R.J. Tibshirani 1993 An Introduction to the Bootstrap. London: Chapman and Hall.

Grayson, D.K. 1984 Quantitative Zooarchaeology. Academic Press: Orlando.

Grayson, D.K. and S.C. Cole 1998 Stone tool assemblage richness during the middle and early Upper Palaeolithic in France. Journal of Archaeological Science 25, 927-938.

Jones, G.T., D.K. Grayson and C. Beck 1983 Artifact class richness and sample size in archaeological surface assemblages. In Lulu Linear Punctuated: Essays in Honour of George Irving Quimby, edited by R.C. Dunnell and D.K. Grayson, pp. 55-73. Anthropological Papers 72, Museum of Anthropology, University of Michigan, Ann Arbor.

Kaufman, D. 1998 Measuring archaeological diversity: an application of the jackknife technique. American Antiquity 63, 73-85.

Kintigh, K.W. 1984 Measuring archaeological diversity by comparison with simulated assemblages. American Antiquity 49, 44-54.

Kintigh, K.W. 1989 Sample size, significance and measures of diversity. In Quantifying Diversity in Archaeology, edited by R.D. Leonard and G.T. Jones, pp. 25-36. Cambridge University Press, Cambridge.

Leonard, R.D. 1997 The sample size-richness relation: a comment. American Antiquity 62, 713-716.

Leonard, R.D. and G.T. Jones (eds.) 1989 Quantifying Diversity in Archaeology. Cambridge University Press, Cambridge.

McCartney, P.H. and M.G. Glass 1990 Simulation models and the interpretation of archaeological diversity. American Antiquity 55, 521-536.

Orton, C. 2000 Sampling in Archaeology. Cambridge University Press, Cambridge.

Plog, S. and M. Hegmon 1993 The sample size-richness relation: the relevance of research questions, sampling strategies and behavioral variation. American Antiquity 58, 489-496.

Potter, J.M. 1997 Communal ritual and faunal remains: an example from the Dolores Anasazi. Journal of Field Archaeology 24, 353-364.

Rhode, D. 1988 Measurement of archaeological diversity and the sample-size effect. American Antiquity 53, 708-716.

Ringrose, T.J. 1993 Diversity indices and archaeology. In Computing the Past: CAA92, edited by J. Andresen, T. Madsen and I. Scollar, pp. 279-285. Aarhus University Press, Aarhus.

Shott, M.J. 1997 Activity and formation as sources of variation in Great Lakes paleoindian assemblages. Midcontinental Journal of Archaeology 22, 197-236.

Smith, W. and J.F. Grassle 1977 Sampling properties of a family of diversity measures. Biometrics 33, 288-292.

Thomas, D.H. 1989 Diversity in hunter-gatherer cultural geography. In Quantifying Diversity in Archaeology, edited by R.D. Leonard and G.T. Jones, pp. 85-91. Cambridge University Press, Cambridge.

Venables, W.N. and B.D. Ripley 1999 Modern Applied Statistics with S-PLUS: Third edition. New York: Springer.
