Journal of Archaeological Science 1981, 8, 77-88

The Effects of Sample Size on Some Derived Measures in Vertebrate Faunal Analysis

Donald K. Grayson

Analysis of three kinds of measures derived from archaeological and palaeontological faunal collections (measures of relative taxonomic abundance, measures of taxonomic diversity, and a measure proposed to distinguish natural from cultural bone) demonstrates that in many instances such measures may be, or are, a function of sample size. A procedure for detecting this situation is suggested, and the source of the interrelationships discussed.

Keywords: FAUNAL ANALYSIS, SAMPLE SIZE, TAXONOMIC ABUNDANCE, DIVERSITY MEASURES, VALIDITY.

Introduction

To ask whether a measure is valid is to ask whether it is measuring what we think it is measuring. Questions of validity are routinely asked in psychometrics, since “it is always necessary to gather some sort of evidence which provides confidence that a test score really represents what it appears to represent” (Helmstadter, 1964, p. 86). In both vertebrate palaeontology and archaeology, questions of the validity of measures in common use to assess the abundance of taxa represented in excavated bone samples have often been asked, although the recent upsurge of interest in asking such questions clearly relates to the fact that the validity of measures long in use now seems less secure (see Grayson, 1979, in press; Wolff, 1975, and references therein).

In this paper, I examine the relationship between the size of excavated bone samples and three kinds of measures derived from such samples: relative abundances of single taxa or series of taxa which have been chosen as indicative of some parameter under investigation; measures of taxonomic diversity; and a measure employed to distinguish natural from cultural bone in archaeological sites. I shall argue that in many instances, such measures either may be, or are, a function of sample size.

Relative Abundances and Sample Size

Unless a palaeontological or archaeological site is completely excavated and every bone preserved in that site retrieved, the bone collection recovered is a sample of the entire set of bones which could have been retrieved. While all faunal analysts recognize that extremely small faunal samples probably do not provide a data base from which statistical generalizations concerning the faunal population may be validly made, it is also true that very little is known about how large a sample must be to represent adequately a population in any given instance, since the answer to such questions requires that something be known about the distribution of the variables under study. In practice, this situation results in samples that intuitively appear to be small being rejected as the basis for statistical manipulation, while samples that appear to be large are so used. The statistical analyses conducted with these apparently sufficiently large and sufficiently representative samples typically involve the application of some measure of the abundances of the taxa represented in those collections. Routinely, the number of identified specimens per taxon (NISP) or the minimum number of individuals per taxon (MNI) is the measure used to assess taxonomic abundance; the reason for applying such measures almost invariably relates to an attempt to make statements about past environments or about past human subsistence.

Department of Anthropology, University of Washington, Seattle, Washington 98195, U.S.A.

0305-4403/81/010077+12 $02.00/0 © 1981 Academic Press Inc. (London) Limited

In these situations, the measure in use would be considered valid if the palaeoenvironmental or subsistence parameters under investigation were, in fact, being measured. Analysis of actual applications of such measures, however, suggests that what is at times detected is not differing values of the parameter of interest, but instead differing sizes of the samples from which the measures have been derived. I shall demonstrate this through the examination of three archaeological examples and one palaeontological example, and then return to a more general discussion of the phenomenon.

Example 1: Snaketown, southern Arizona, U.S.A.

I begin with a very simple example. Included in the classic study of the prehistoric Hohokam village of Snaketown (Haury, 1976) is a brief analysis of a sample of the vertebrate fauna from this site (Greene & Matthews, 1976). Greene & Matthews provide a figure which displays the frequency of “the principal meat-producing animals by phase” (1976, p. 371), calculated as the percentage of the minimum number of individuals per taxon by Hohokam phase. The abundances of the six taxa included in this figure vary from phase to phase, although Haury (1976) notes that the differences from phase to phase in abundance of the most common of these taxa do not seem significant. Nonetheless, it is reasonable to ask if changes in these numbers across phases are telling us anything about subsistence, or if instead they are measuring something very different.

Table 1. Minimum numbers of individuals and relative abundances of deer by phase, Snaketown (from Greene & Matthews, 1976)

Phase         MNI   Rank, MNI   % Deer   Rank, % Deer
Vahki          65       1        15.4         6
Estrella       15       6        40.0         2
Sweetwater     56       2        25.1         4
Snaketown      27       4        11.1         7
Gila Butte     49       3        24.5         5
Santa Cruz     23       5        39.1         3
Sacaton         4       7        50.0         1

Spearman’s r_s, MNI-% deer = -0.75 (P < 0.05).

Table 1 presents minimum numbers of individuals by unmixed phase at Snaketown, and also displays the relative abundances of deer (Odocoileus hemionus) expressed as a percentage of the total MNI for each phase. Inspection of these two sets of numbers suggests that something other than the changing abundances of deer at Snaketown might be being measured here: as sample size increases, the relative abundance of deer seems to decrease. Calculation of Spearman’s rank order correlation coefficient (rho) for these data confirms this inverse relationship: r_s = -0.75 (P < 0.05; all tests of significance reported in this paper are two-tailed). I note that the only phases represented in Table 1 are those displayed in the figure provided by Greene & Matthews (1976, p. 371); addition of the Pioneer phase to the analysis does not alter the results (r_s = -0.74; P < 0.05).
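The rank-order check just described can be sketched in a few lines of Python. This is an illustration only, not the procedure used to produce the published figures: the function name `spearman_from_ranks` is hypothetical, the two rank lists are those given in Table 1, and the shortcut formula assumes that there are no tied ranks.

```python
def spearman_from_ranks(ranks_x, ranks_y):
    """Spearman's rho as 1 - 6*sum(d^2) / (n*(n^2 - 1)); assumes no tied ranks."""
    n = len(ranks_x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Ranks from Table 1, in phase order Vahki ... Sacaton
mni_rank = [1, 6, 2, 4, 3, 5, 7]
deer_pct_rank = [6, 2, 4, 7, 5, 3, 1]

rho = spearman_from_ranks(mni_rank, deer_pct_rank)
print(round(rho, 2))  # -0.75
```

The sketch returns only the coefficient; judging its significance still requires a table of critical values or a separate test.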

As a result, it is no longer clear whether these relative abundances are measuring changing abundances of deer through time, or are instead being determined by differing sample sizes across phases. That is, the significant negative correlation between sample size and the relative abundance of deer across phases suggests that these relative abundances may not be a valid measure of the relative importance of deer characteristic of Snaketown phases.

Example 2: Hogup Cave, northwestern Utah, U.S.A.

Neither Haury (1976) nor Greene & Matthews (1976) based further analyses on the changing relative abundances of deer, and most other taxa, across Snaketown phases. As a result, the fact that these changing relative abundances may be a function of sample size does not harm their interpretation of the Snaketown fauna. It is, however, not hard to find situations in which possible sample size effects on relative abundances may render interpretation of those abundances as due to other causes less convincing.

Harper & Alder (1970) presented a detailed and valuable palaeoenvironmental analysis of the plant and mammal remains from Hogup Cave, northwestern Utah. In order to track changing abundances of mesic and xeric habitats in the area surrounding the cave through time, Harper & Alder (1970) divided the rodents from the site into mesic and xeric groups, eliminating from consideration those which contributed less than one individual per 1000 years to the deposits. They then plotted changing relative abundances of mesic and xeric rodents against the five major Hogup stratigraphic units, and demonstrated that the relative abundance of xeric rodents through time at Hogup Cave climbed steadily from the time of deposition of the earliest unit, between 6400 and 1250 BC, to the time of deposition of the latest unit, between AD 1350 and AD 1850. Harper & Alder (1970, p. 234) concluded that “relative abundance of mesic and xeric forms would seem to suggest continued drying of the upland environment throughout the period of time”.

Table 2. Sample sizes and relative abundance of xeric rodents by major stratigraphic unit, Hogup Cave (from Harper & Alder, 1970)

Hogup unit   MNI    Rank   Xeric plus mesic rodents, MNI   Rank   % Xeric rodents   Rank
1             444     2              135                     2      [illegible]       5
2            2247     1              371                     1      [illegible]       4
3             298     4               79                     3          15            3
4             303     3               44                     4          89            2
5              84     5               29                     5          93            1

Spearman’s r_s, MNI-% xeric rodents = -0.80 (P = 0.10).
Spearman’s r_s, xeric plus mesic rodents-% xeric rodents = -0.90 (P = 0.05).

Table 2 presents total numbers of mammalian individuals per Hogup Cave stratigraphic unit, as well as the per cent of the rodent fauna assigned by Harper & Alder (1970) to the xeric group. Inspection of this table shows that as the total MNI for a stratigraphic unit increases, the relative abundance of xeric rodents decreases; Spearman’s r_s for these two sets of data is -0.80 (P = 0.10). Table 2 also presents the total number of rodents assigned to mesic and xeric categories and analysed by Harper & Alder (1970); Spearman’s r_s for the total number of rodents analysed and per cent of xeric rodents is -0.90 (P = 0.05).

As with the Snaketown data, these significant correlations between sample size and relative abundances make it reasonable to question the assertion that “per cent xeric rodents” is measuring the abundance of xeric habitats in the area surrounding Hogup Cave. It would appear, instead, that this measure may be largely a function of sample size: as the sample size increases, the relative abundance of xeric rodents decreases. It is not unreasonable to argue that “per cent xeric rodents” is not a valid measure of the abundance of xeric habitats surrounding Hogup Cave in the past.

Example 3: Raddatz Rockshelter, southcentral Wisconsin, U.S.A.

In 1966, Cleland published a perceptive and important study of the prehistoric animal ecology and prehistoric human use of vertebrates in the upper Great Lakes region. As part of this analysis, he re-examined the vertebrate remains from Raddatz Rockshelter (Wittry, 1959), which had previously been published by Parmalee (1959). While Parmalee (1959) had interpreted the abundances of taxa from this site on essentially nominal and ordinal levels, Cleland (1966) wished to make more detailed statements about changing environments and changing human utilization of the prehistoric fauna through time. Of interest here is his attempt to infer changing habitat types through time from changes in per cent abundances of a set of taxa which he felt were indicative of those habitat types.

Table 3. Sample sizes and relative abundances of deer, Raddatz Rockshelter (from Cleland, 1966)

Level   Total NISP   Rank   % Deer         Rank
 1         117        11    93 (93.16)       5
 2         181         7    90               6
 3         286         6    87               7
 4         374         4    95               2
 5         551         2    [illegible]      1
 6         575         1    93 (93.22)       4
 7         442         3    94               3
 8         344         5    86               8
 9         179         8    80               9
10         170         9    69 (68.82)      11
11         125        10    63              12
12          82        12    56              13
13          49        14    69 (69.39)      10
14          69        13    45              14
15          44        15    43              15

Spearman’s r_s, NISP-% deer = +0.84 (P < 0.001).

Table 3 presents the total number of identified elements per level at Raddatz, the percentage of those elements which are deer (Odocoileus virginianus) in those levels, and the rank orders of those levels in terms of both total NISP and per cent deer. I chose to analyse deer in this example because that taxon is the most abundant in every level; because of the interdependence of percentage values, the behaviour of other taxa will tend to be strongly correlated with that of deer. As Table 3 shows, Spearman’s r_s between level NISP values and per cent deer for those levels is very high: r_s = +0.84 (P < 0.001). As a result, it becomes very difficult to interpret changing relative abundances of deer through time: perhaps, as Cleland (1966) suggests, they are providing information on changing abundances of habitat types through time in the region surrounding Raddatz Rockshelter. Perhaps, however, they are primarily telling us about the number of bone specimens identified per level.


Example 4: Drover’s Cave, southwestern Western Australia

It is, of course, obvious that palaeontological faunas are as prone to sample size effects as are archaeological ones. The Drover’s Cave fauna (Lundelius, 1960) illustrates this fact well.

Located on the coastal plain of southwestern Western Australia, Drover’s Cave provided a faunal sequence estimated by Lundelius (1960) to have begun sometime during the late Pleistocene and to have continued into comparatively recent times. As did Harper & Alder (1970), Lundelius (1960) used changing relative abundances of species through time as an indicator of past environments. He noted that five taxa increased in abundance through time at the cave, while one (Pseudomys occidentalis) decreased. The decrease of P. occidentalis, today an animal of areas more humid than that in which Drover’s Cave is located, was interpreted by Lundelius (1960) as indicating more humid conditions during the time of deposition of the deeper sediments within the cave. The increase of two of the five taxa which became more common through time in the cave was interpreted as indicating an increase in aridity in the region. The increase in abundance of the three remaining taxa in this group was not given a palaeoenvironmental interpretation.

Table 4. Sample sizes and relative abundances of selected mammals, Drover’s Cave (from Lundelius, 1960)

NISP by level

Taxonomic group             4.5-5 ft       4 ft    2.5 ft         0-1 ft
A                             175            70     216            285
B                           [illegible]      41      53             62
C                           [illegible]     139    [illegible]      68
Total NISP, all mammals      1388           788     768            659

Ranks and relative abundances

Level      NISP   Rank   Group A (%)   Rank   Group B (%)   Rank   Group C (%)   Rank
4.5-5 ft   1388    1         13         3          4         4         15         2
4 ft        788    2          9         4          5         3         18         1
2.5 ft      768    3         28         2          7         2         12         3
0-1 ft      659    4         43         1          9         1         10         4

Spearman’s r_s, NISP-group A % = -0.80 (P = 0.10).
Spearman’s r_s, NISP-group B % = -1.00 (P < 0.05).
Spearman’s r_s, NISP-group C % = +0.80 (P = 0.10).

Table 4 presents information on the relative and absolute abundances of these taxa. Group A includes those taxa which Lundelius (1960) noted as increasing through time; group B includes the two taxa which Lundelius (1960) saw as implying increasing aridity through time, while group C includes the one species which Lundelius (1960) interpreted as implying more humid conditions at the time of accumulation of the deeper sediments in the site. Inspection of this table again suggests sample size effects; calculation of rank order correlation coefficients confirms the presence of a relationship between sample sizes and the relative abundances of the taxa examined by Lundelius (1960). Spearman’s r_s between group A and sample size is -0.80 (P = 0.10); for group B and sample size, -1.00 (P < 0.05); and, for group C and sample size, +0.80 (P = 0.10). As in the previous examples, it is reasonable to suspect that the abundances noted by Lundelius (1960) may be at least in part a function of sample size.
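When only raw counts are at hand, the same check can start one step earlier: derive the percentages from the counts, rank both series, and then correlate the ranks. The following is a minimal sketch under stated assumptions, using the group A counts and level totals from Table 4. The helper names `ranks_descending` and `spearman_rho` are hypothetical, and the ranking routine assumes there are no tied values.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value; assumes no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman's rho from the rank-difference formula (no ties)."""
    rank_x, rank_y = ranks_descending(x), ranks_descending(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Table 4, levels in the order 4.5-5 ft, 4 ft, 2.5 ft, 0-1 ft
total_nisp = [1388, 788, 768, 659]
group_a_nisp = [175, 70, 216, 285]

# Relative abundance of group A as a percentage of each level's total NISP
pct_group_a = [100 * a / t for a, t in zip(group_a_nisp, total_nisp)]
rho_a = spearman_rho(total_nisp, pct_group_a)

print([round(p) for p in pct_group_a])  # [13, 9, 28, 43]
print(round(rho_a, 2))                  # -0.8
```

With only four levels, a coefficient of this size rests on very few possible rank orderings, so the sketch is best read as a screening step rather than a formal test.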

Sample size and derived abundances: some general comments

In the four examples just presented, I have demonstrated significant correlations between relative abundances of taxa and the sizes of samples from which those relative abundances were defined. There are two possible reasons for those correlations. First, it is possible that no causal relationship between sample size and the derived measures exists: for instance, some third factor may be causing both. In the case of Hogup Cave, for example, it is possible that both increasing abundances of xeric rodents and decreasing sample sizes are caused by an increasingly arid environment; were this the case, then the relative abundances of xeric rodents might, in fact, provide a good indicator of aridity, as might the sample sizes themselves.

Second, changing sample sizes might be determining relative abundance values. For none of the examples presented above has a convincing argument been made that the retrieved bone sample is, in fact, representative of the populations about which inferences are being made. It is important to note that these populations are actually of at least two sorts (Grayson, 1979, in press). In conducting subsistence analyses of single sites, the target population is the population of animals originally deposited in that site itself. In conducting palaeoenvironmental analyses, the target population is the population of animals which existed in the environments of interest, of which the set of animals deposited in the site is itself a sample drawn in usually unknown ways. In both situations, questions of the relationship between the bones present in the site at the time of excavation and the population about which inferences are to be made are extremely complex. Although I am not going to discuss these complexities here (see, for instance, Behrensmeyer et al., 1979; Grayson, 1979, in press, and references therein), I do wish to point out that these complexities make the assessment of whether or not a sample is representative of the target population equally complex. Indeed, given our knowledge of the taphonomy of archaeological and palaeontological deposits, it is difficult to see that the authors discussed above could have presented convincing arguments documenting the representative nature of the bone samples with which they were dealing. The problem lies not with the authors, but with archaeological and palaeontological method and theory.

But if the sample cannot be shown to be representative, and if in fact it does not accurately portray variability in the target population, sample size effects may readily come to the fore. For instance, one cause of such effects in vertebrate faunal analysis may relate to the fact that in most (but not all) archaeological and palaeontological sites, only a very few taxa are represented by a large number of elements; the majority of taxa are represented by a small number of identified specimens (Casteel, n.d.; Grayson, 1979), a situation which mirrors the distribution of abundance of living organisms in many environments (Williams, 1964). Since this is the case, small samples will most likely over-represent the most abundant taxa; as sample size increases, the abundance of rarer taxa will increase strictly as a function of the probability that such rarer taxa will be detected (see also Tipper, 1979). As a result, measures of relative abundance may vary significantly as sample sizes vary. Clearly, unless it can be argued that the sample retrieved is representative of the population sampled, the possibility that measures derived from that sample are a function of sample size must be seriously entertained.
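The role of detection probability can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that each specimen is an independent random draw from fixed taxon proportions; this sampling model is an assumption of the sketch, not one stated in the text. The proportions are hypothetical and the function names are invented here. Under that model a taxon with proportion p appears at least once among n specimens with probability 1 - (1 - p)^n.

```python
def detection_probability(p, n):
    """Chance that a taxon with proportion p appears at least once in n
    independent draws (an assumed sampling model, for illustration only)."""
    return 1 - (1 - p) ** n

def expected_taxa_detected(proportions, n):
    """Expected number of taxa represented by at least one specimen."""
    return sum(detection_probability(p, n) for p in proportions)

# Hypothetical, strongly skewed assemblage: a few common taxa, several rare ones
proportions = [0.70, 0.20, 0.05, 0.03, 0.01, 0.01]

expected = {n: expected_taxa_detected(proportions, n) for n in (10, 50, 250)}
for n, value in expected.items():
    print(n, round(value, 2))
```

Under these assumptions the expected number of taxa represented rises with sample size simply because the rare taxa become more likely to be drawn at all.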

In sum, while the significant correlations between sample sizes and relative abundances in the examples discussed above cannot be said necessarily to indicate sample size effects, the fact that there is no compelling set of reasons which implies that the retrieved samples are representative of the populations sampled suggests that these measures may as readily be measuring sample size as they are anything else. Thus, in these cases, it is reasonable to question the validity of those measures as indicating anything other than the size of the samples from which they were derived.

Diversity Indices and Sample Size

Ecological theory specifies that the adaptations of many organisms are, in part, keyed to the distribution and abundance of the organisms upon which they depend for subsistence. Generalists, which feed upon a wide variety of organisms in roughly equal numbers, differ in many ways from specialists, which prey upon a smaller number of taxa but which utilize larger numbers of individuals of those taxa (see, for instance, MacArthur, 1972; Cody, 1974). Similar arguments have been made for human adaptations (Cleland, 1966; Dunnell, 1972), and it does, in fact, seem clear that much may be learned from the investigation of generalist versus specialist adaptations.

As a result, it becomes important to be able to measure subsistence resource diversity within human subsistence systems. Archaeologically, such measures can be readily applied, since the information which most diversity measures require (identification of the taxa present, and the abundances of those taxa) is commonly available from archaeological data.

Unfortunately, the archaeological applications of such diversity measures are quite prone to sample size effects. This situation results from the fact that MNI values must be used in these measures, since most indices provide an indication not only of the numbers of taxa present, but also of the nature of the distribution of individuals across those taxa. However, the minimum number of individuals is, as I have discussed elsewhere, a function of sample size (Grayson, 1978a, 1979). Casteel (n.d.) has pointed out that since the relationship

$$\mathrm{MNI}/\mathrm{NISP} = \alpha(\mathrm{NISP})^{\beta} \qquad (1)$$

obtains (Grayson, 1978a), then it is also true that

$$\mathrm{MNI} = \alpha(\mathrm{NISP})^{\beta+1} \qquad (2)$$

That is, the minimum number of individuals bears a predictable relationship to sample size (see also Ducos, 1968, 1975).

Since this is the case, whenever the relative abundances of taxa measured by NISP are a function of sample size, the diversity measures using MNI will also be a function of sample size. I shall illustrate this relationship using one popular measure of diversity.

In frequent use in modern ecological studies, the measure -Σ p_i ln p_i has also been used as an index of diversity in archaeological analyses (Wing, 1963, 1975; see MacArthur, 1972 for a discussion of this measure). The value p_i in this index is calculated as MNI_i(100)/ΣMNI; thus, this measure may be rewritten as

$$-\sum \frac{\mathrm{MNI}_i(100)}{\sum \mathrm{MNI}} \ln \frac{\mathrm{MNI}_i(100)}{\sum \mathrm{MNI}} \qquad (3)$$

Since MNI = α(NISP)^(β+1) (equation 2), equation 3 also equals

$$-\sum \frac{\alpha(\mathrm{NISP})_i^{\beta+1}(100)}{\sum \alpha(\mathrm{NISP})^{\beta+1}} \ln \frac{\alpha(\mathrm{NISP})_i^{\beta+1}(100)}{\sum \alpha(\mathrm{NISP})^{\beta+1}} \qquad (4)$$

Clearly, if the value (NISP)_i/Σ(NISP) is a function of sample size, then the diversity indices calculated from such data will also vary as a function of sample size.
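For readers who wish to compute an index of this kind, a minimal sketch follows. It takes p_i as a proportion summing to 1, which is the conventional form of the measure and an assumption of this sketch. The function name `shannon_diversity` is hypothetical, and the MNI counts are invented for illustration and come from no site discussed here.

```python
import math

def shannon_diversity(counts):
    """-sum(p_i * ln p_i), with p_i = count_i / total; zero counts are skipped."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical MNI counts per taxon for two levels (not data from any site)
even_level = [10, 10, 10, 10]    # individuals spread evenly over four taxa
skewed_level = [37, 1, 1, 1]     # same total, dominated by a single taxon

h_even = shannon_diversity(even_level)
h_skewed = shannon_diversity(skewed_level)
print(round(h_even, 3), round(h_skewed, 3))
```

Because the index depends only on the p_i values, anything that shifts those proportions, including sample size, shifts the index with them.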

To illustrate this effect, I shall examine the data presented in Wing (1963). Wing (1963) analysed the vertebrate remains from the Jungerman Site, located near the east coast of Florida. As part of this analysis, she calculated diversity indices for the fauna from nine of the 13 excavated levels, four pertaining to the St John’s I phase, the remaining five to the St John’s II phase. She discovered that diversity measures for St John’s II levels were less than those for St John’s I levels, and suggested several cultural explanations to account for this apparent decrease in diversity through time.


Since diversity measures will vary as a function of sample size if the values (NISP)_i/Σ(NISP) vary as a function of sample size, it is appropriate to examine this relationship in the Jungerman data employed by Wing (1963) to calculate diversity indices. Table 5 presents those values for the two most abundant taxa at Jungerman: gopher tortoise (Gopherus polyphemus) and sharks (Squaliformes). The Spearman’s r_s value between (NISP)_i/Σ(NISP) and total level NISP for gopher tortoise is -0.87 (P < 0.01); that between (NISP)_i/Σ(NISP) and total level NISP for sharks is +0.95 (P < 0.01). Clearly, it is reasonable to suppose that these values are, in fact, determined by sample size: as sample size per level increases, the relative abundance of gopher tortoise decreases, while that for shark increases (since these are percentage values, the two measures are not, of course, independent of one another).

Table 5. NISP_i/ΣNISP and sample size values for gopher tortoise and shark, Jungerman fauna (from Wing, 1963)

         Gopher tortoise              Shark
Level    NISP_i(100)/ΣNISP   Rank    NISP_i(100)/ΣNISP   Rank    Total level NISP   Rank
 2          82                 2          0                8.5          17            8
 3          45                 3          0                8.5          11            9
 4          89                 1          2                7            45            6
 5          35                 4          6                6            34            7
 8           7                 9         61                1           180            2
 9          13 (13.2)          8         45                2           220            1
10          13 (13.4)          7         37                3           134            3
11          19                 5         28                4            54            5
12          17                 6         25                5            63            4

Spearman’s r_s, NISP_i(100)/ΣNISP gopher tortoise-NISP = -0.87 (P < 0.01).
Spearman’s r_s, NISP_i(100)/ΣNISP shark-NISP = +0.95 (P < 0.01).

Because these values vary in concert with sample size, Wing’s diversity measures, calculated as -Σ p_i ln p_i, should also vary with sample size. Table 6 presents minimum numbers and element counts by level, and the diversity measures calculated for those levels. The relationship predicted on the basis of the behaviour of (NISP)_i/Σ(NISP) values occurs: the Spearman’s r_s coefficient between MNI and diversity is +0.87 (P < 0.01); that between NISP and diversity is +0.85 (P < 0.01).

Table 6. Diversity indices and sample sizes, Jungerman fauna (from Wing, 1963; all diversity indices recalculated from data presented by Wing)

Level   MNI   Rank   NISP   Rank   Diversity   Rank
 2        6    9       17    8       1.25       8
 3        7    7.5     11    9       1.27       6.5
 4        7    7.5     45    6       1.27       6.5
 5       17    6       34    7       1.17       9
 8       38    2      180    2       2.66       3
 9       33    3      220    1       2.70       2
10       40    1      134    3       2.98       1
11       20    5       54    5       2.32       4
12       22    4       63    4       2.29       5

Spearman’s r_s, MNI-diversity = +0.87 (P < 0.01).
Spearman’s r_s, NISP-diversity = +0.85 (P < 0.01).


Thus, the predicted relationship exists. The general conclusion is clear: if the values (NISP)_i/Σ(NISP) vary with sample size, then the diversity measures based upon these values will also so vary. As a result, the meaning of such indices will become clouded: it may not be at all clear whether they are measuring the diversity of an archaeological fauna, or the size of the faunal samples per stratum or level excavated from that site.

MNI/NISP as a measure of relative skeletal completeness

In a number of places, I have noted that the relationship between the ratio MNI/NISP and sample size in a given collection is hyperbolic: the larger the sample size, the smaller the ratio (Grayson, 1978a, b). Since MNI values are defined from element counts, the nature of this relationship must be taken into account in assessing the meaning of MNI values, and of any measure which attempts to use the raw ratio MNI/NISP as an indicator of anything other than sample size. For instance, Shotwell’s attempt to use the corrected number of specimens per individual (CSI) as an indicator of community membership of taxa represented within mammalian palaeontological assemblages (Shotwell, 1955, 1958) fails because this measure has as its heart the relationship NISP/MNI and is, as a result, a measure of sample size, not of community membership (Grayson, 1978b).

Numerous other suggestions concerning the possible use of the ratio NISP/MNI or MNI/NISP have been made. Chaplin (1971), for instance, suggested the ratio be used to recognize “butcher’s meat”, as opposed to animals killed where they were eaten. One of the most clever suggested uses of this ratio was proposed by Thomas (1971), who adopted Shotwell’s CSI in an attempt to distinguish taxa brought to an archaeological site by humans from those present for non-cultural (“natural”) reasons. Thomas (1971) reasoned that taxa represented by more complete skeletons (more elements per individual) most likely represented those present as a result of natural causes, since the “dietary practices of man tend to destroy and disperse the bones of his prey-species” (1971, p. 367).

Thomas defined a coefficient, B, as 5 - ln(CSI) in order to assess degree of skeletal disruption in the faunas with which he was working. The measure CSI he defined precisely as Shotwell (1955, 1958) defined it: 100(NISP)/[Estimated Number of Elements (MNI)], in which the estimated number of elements refers to the number of identifiable elements in the skeleton of the taxon in question.

Since MNI/NISP is related to NISP in a hyperbolic fashion, however, the measure CSI, which is essentially a normed reciprocal of MNI/NISP, must vary with sample size in a fashion reciprocal to that of MNI/NISP (see also Grayson, 1978b). Further, the coefficient B, used by Thomas (1971) to separate natural from cultural bone, must also be determined by sample size: the larger the sample size, the smaller the value of the coefficient.
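The dependence of B on sample size can be made explicit with a short derivation. It is offered as a sketch under stated assumptions: the power-law relationship MNI/NISP = α(NISP)^β of equation 1 is taken to hold for the values in question, and the symbol E, introduced here for convenience, stands for the estimated number of elements.

```latex
\begin{align*}
\mathrm{CSI} &= \frac{100\,\mathrm{NISP}}{E \cdot \mathrm{MNI}}
              = \frac{100}{E}\left(\frac{\mathrm{MNI}}{\mathrm{NISP}}\right)^{-1}
              = \frac{100}{E\,\alpha}\,\mathrm{NISP}^{-\beta}, \\
B &= 5 - \ln(\mathrm{CSI})
   = 5 - \ln\!\left(\frac{100}{E\,\alpha}\right) + \beta\,\ln(\mathrm{NISP}).
\end{align*}
```

Under these assumptions B is a linear function of ln(NISP) with slope β. Because the hyperbolic relationship implies a negative β, B falls as NISP rises.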

Examination of the relationship between coefficient B and ln(NISP) for the three sites analysed by Thomas (1971) shows that these variables are tightly correlated in all three data sets: for Little Smoky, Pearson’s r = -0.87 (P < 0.001); for Hanging Rock, r = -0.92 (P < 0.001); and, for Smoky Creek, r = -0.92 (P < 0.001).

Thus, Thomas’ coefficient B primarily measures sample size, not relative skeletal completeness per taxon. There are ways of avoiding the effects of the relationship between MNI/NISP (or any related measure, such as CSI or B) and NISP. The best-fit line between a set of MNI/NISP and NISP values within a faunal collection represents the predicted MNI/NISP values for all sample sizes for that collection. Points which fall above this line represent taxa whose skeletons are relatively less complete than those beneath it; if CSI (or NISP/MNI) is the measure employed, those points which fall above the line are relatively more skeletally complete than those beneath it (Grayson, 1978b). Thus, it is the residuals, and not the MNI/NISP (or CSI, or B) values themselves, which must become the target of analysis (see Draper & Smith, 1966 for a discussion of the analysis of residuals). Without such an approach, however, MNI/NISP, or related measures, are primarily providing information about sample size, not about relative skeletal completeness.
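The residual approach can be sketched as follows. Two assumptions are made that the text does not specify: the best-fit line is an ordinary least-squares line, and it is fitted to log-transformed values, ln(MNI/NISP) against ln(NISP), on the grounds that the untransformed relationship is hyperbolic. The taxon names and counts are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def completeness_residuals(counts):
    """Residual of ln(MNI/NISP) from the fitted line, per taxon.

    counts maps a taxon name to a (NISP, MNI) pair. A positive residual
    lies above the line: relatively less skeletally complete than
    predicted for that sample size.
    """
    names = list(counts)
    xs = [math.log(counts[t][0]) for t in names]
    ys = [math.log(counts[t][1] / counts[t][0]) for t in names]
    slope, intercept = fit_line(xs, ys)
    return {t: y - (intercept + slope * x)
            for t, x, y in zip(names, xs, ys)}


# Hypothetical collection; taxon_e has many individuals for its NISP.
hypothetical = {
    "taxon_a": (2, 1), "taxon_b": (5, 2), "taxon_c": (12, 3),
    "taxon_d": (40, 5), "taxon_e": (150, 30), "taxon_f": (600, 20),
}
residuals = completeness_residuals(hypothetical)
least_complete = max(residuals, key=residuals.get)
```

Ranking taxa by these residuals, rather than by MNI/NISP itself, compares each taxon with what would be expected for a sample of its size.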

Conclusions

Sample size effects in vertebrate faunal analysis are pernicious: they seem to lurk everywhere, even in the apparently solid faunal studies I have examined here. The kinds of effects I have discussed have two sources. Those present in measures derived from ratios of minimum numbers to specimen counts derive from the mathematical relationship between these two, functionally related, variables. As a result, any measure incorporating such a ratio will vary as a function of sample size and thus be an invalid index of whatever else one is trying to measure, unless steps are taken to remove the effects of sample size statistically. I have suggested one way of accomplishing such a removal.

Sample size effects on measures derived from numbers of identified specimens are more troublesome, not only because they are less obvious, but also because the cure is less simple. I wish to emphasize that the examples I have discussed here do not provide instances of bad science; they provide instances of perceptive analyses which may be flawed by problems which are not at all obvious. I have suggested that the basic source of these effects lies in the fact that faunal analysts have traditionally worried little about whether or not their samples are representative of the populations about which they are trying to make inferences. If the samples under study are not representative, variation in sample size may become the source of variation in derived measures, rather than variation in the population under study. How frequently has this occurred in the published literature? I do not really know, but it may be of interest to point out that for every three faunas I have examined for these problems, one did not pass the test (examples of those that did pass include the faunas presented in Butler, 1972; Flannery, 1967; Harris, 1963; Hole et al., 1969; Klein, 1976).

It is clear, fortunately, that it is often easy to detect sample size effects: all one need do is look for them. While there are many ways to conduct such a search, in this paper I have depended heavily on rank order correlation coefficients since I believe that specimen counts and minimum numbers can usually be at most ordinal scale measures (Grayson, 1979, in press). The approach which I have used to test for sample size effects here can be used generally: rank the units which are being measured in terms of sample size, rank them in terms of the derived measure, and test to see if a significant rank order correlation emerges. If it does, then the possibility that the derived measure is a function of sample size, rather than a valid measure of the variable of interest, must be seriously entertained.
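The general procedure just described can be sketched as follows, assuming Spearman's rho as the rank order coefficient and an exact two-sided permutation test for significance (practical only for a small number of units). The function names and the example values are hypothetical.

```python
import math
from itertools import permutations


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson correlation; applied to ranks it yields Spearman's rho."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(sample_sizes, measure):
    """Rank order correlation between sample size and a derived measure."""
    return pearson_r(average_ranks(sample_sizes), average_ranks(measure))


def exact_p_value(sample_sizes, measure):
    """Two-sided exact permutation p-value for Spearman's rho."""
    rx = average_ranks(sample_sizes)
    ry = average_ranks(measure)
    observed = abs(pearson_r(rx, ry))
    total = extreme = 0
    for perm in permutations(ry):
        total += 1
        if abs(pearson_r(rx, perm)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical units (e.g. strata): sample size and a derived measure.
sizes = [25, 60, 140, 310, 800, 1500]
derived = [0.10, 0.14, 0.13, 0.22, 0.30, 0.41]
rho = spearman_rho(sizes, derived)
p = exact_p_value(sizes, derived)
```

A large absolute rho with a small p value would be the signal that the derived measure may be tracking sample size rather than the variable of interest.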

The need for more detailed analyses of the quantitative structure of archaeological and palaeontological faunas, and of the causes of this structure, is, perhaps, clear. More specifically, the apparently common occurrence of sample size effects on derived measures in vertebrate faunal analysis suggests that issues of sample size, on the one hand, and of the representativeness of faunal samples, on the other, must continue as a focus of detailed research. It is one thing to be aware of a potential problem and to have methods to detect its presence; it is quite another to design research questions and research programs in such a way that the problem will not exist.

Acknowledgements

I thank C. Melvin Aikens, David J. Meltzer, David H. Thomas and Elizabeth S. Wing for critical comments on this paper.

References

Behrensmeyer, A. K., Western, D. & Boaz, D. E. D. (1979). New perspectives in vertebrate palaeoecology from a Recent bone assemblage. Paleobiology 5, 12-21.

Butler, B. R. (1972). The Holocene or postglacial ecological crisis on the eastern Snake River plain. Tebiwa 15, 49-61.

Casteel, R. W. (n.d.). A treatise on the minimum number of individuals index: an analysis of its behaviour and a method for its prediction. Manuscript on file, Department of Archaeology, Simon Fraser University.

Chaplin, R. E. (1971). The Study of Animal Bones from Archaeological Sites. London: Seminar Press.

Cleland, C. E. (1966). The prehistoric animal ecology and ethnozoology of the upper Great Lakes region. Museum of Anthropology, University of Michigan, Anthropological Papers 29.

Cody, M. L. (1974). Competition and the structure of bird communities. Monographs in Population Biology 7. Princeton: Princeton University Press.

Draper, N. R. & Smith, H. (1966). Applied Regression Analysis. New York: Wiley.

Ducos, R. (1968). L’origine des animaux domestiques en Palestine. Publications de l’Institut de Préhistoire de l’Université de Bordeaux 6.

Ducos, R. (1975). Analyse statistique des collections d’ossements d’animaux. In (A. T. Clason, Ed.) Archaeozoological Studies. Amsterdam: North-Holland Publishing Co., pp. 35-44.

Dunnell, R. C. (1972). The prehistory of Fishtrap, Kentucky. Yale University Publications in Anthropology 75.

Flannery, K. V. (1967). The vertebrate fauna and hunting patterns. In (D. S. Byers, Ed.) The Prehistory of the Tehuacan Valley, Volume I: Environment and Subsistence. Austin: University of Texas Press, pp. 132-177.

Grayson, D. K. (1978a). Minimum numbers and sample size in vertebrate faunal analysis. American Antiquity 43, 53-65.

Grayson, D. K. (1978b). Reconstructing mammalian communities: a discussion of Shotwell’s method of paleoecological analysis. Paleobiology 4, 77-81.

Grayson, D. K. (1979). On the quantification of vertebrate archaeofaunas. In (M. B. Schiffer, Ed.) Advances in Archaeological Method and Theory. Vol. 2. New York: Academic Press, pp. 199-237.

Grayson, D. K. A critical view of the use of archaeological vertebrates in paleoenvironmental reconstruction. In (M. V. Gallagher, Ed.) Ethnobiology Today: A Collection of Papers Honoring Lyndon L. Hargrave and Alfred E. Whiting. Flagstaff: Museum of Northern Arizona, in press.

Greene, J. L. & Matthews, T. W. (1976). Faunal study of unworked mammalian bones. In (E. W. Haury, Ed.) The Hohokam: Desert Farmers and Craftsmen. Tucson: University of Arizona Press, pp. 367-373.

Harper, K. T. & Alder, G. M. (1970). The macroscopic plant remains of the deposits of Hogup Cave, Utah, and their paleoclimatic interpretation. In (C. M. Aikens, Ed.) Hogup Cave. University of Utah Anthropological Papers 93, 215-240.

Harris, A. H. (1963). Vertebrate remains and past environmental reconstruction in the Navajo Reservoir district. Museum of New Mexico Papers in Anthropology 11.

Haury, E. (1976). The Hohokam: Desert Farmers and Craftsmen. Tucson: University of Arizona Press.

Helmstadter, G. C. (1964). Principles of Psychological Measurement. New York: Appleton-Century-Crofts.

Hole, F., Flannery, K. V. & Neely, J. A. (1969). Prehistory and human ecology of the Deh Luran Plain. Museum of Anthropology, University of Michigan, Memoirs 1.

Klein, R. G. (1976). The mammalian fauna of the Klasies River mouth sites, southern Cape Province, South Africa. South African Archaeological Bulletin 31, 75-98.

Lundelius, E., Jr. (1960). Post-Pleistocene faunal succession in Western Australia and its climatic interpretation. Proceedings of the International Geological Congress 21, 142-153.

MacArthur, R. H. (1972). Geographical Ecology. New York: Harper and Row.

Parmalee, P. (1959). Animal remains from the Raddatz Rockshelter, Sk5, Wisconsin. Wisconsin Archaeologist 4, 83-90.

Shotwell, J. A. (1955). An approach to the paleoecology of mammals. Ecology 36, 327-337.

Shotwell, J. A. (1958). Inter-community relationships in Hemphillian (mid-Pliocene) mammals. Ecology 39, 271-282.

Thomas, D. H. (1971). On distinguishing natural from cultural bone in archaeological sites. American Antiquity 36, 366-371.

Tipper, J. C. (1979). Rarefaction and rarefiction: the use and abuse of a method in paleoecology. Paleobiology 5, 423-434.

Williams, C. B. (1964). Patterns in the Balance of Nature. London: Academic Press.

Wing, E. S. (1963). Vertebrates from the Jungerman and Goodman sites near the east coast of Florida. Contributions of the Florida State Museum, Social Sciences 10, 51-60.

Wing, E. S. (1975). Hunting and herding in the Peruvian Andes. In (A. T. Clason, Ed.) Archaeozoological Studies. Amsterdam: North-Holland Publishing Co., pp. 302-308.

Wittry, W. (1959). The Raddatz Rockshelter, Sk5, Wisconsin. Wisconsin Archaeologist 40, 33-69.

Wolff, R. G. (1975). Sampling and sample size in ecological analyses of fossil mammals. Paleobiology 1, 195-204.