Which species are described first?: the case of North American butterflies

Biodiversity and Conservation 4, 119-127 (1995)

KEVIN J. GASTON*‡§, TIM M. BLACKBURN§ and NATASHA LODER‡¶

‡Department of Entomology, The Natural History Museum, Cromwell Road, London SW7 5BD, UK, §Department of Biology and NERC Centre for Population Biology, Imperial College, Silwood Park, Ascot, Berkshire SL5 7PY, UK and ¶Department of Animal and Plant Sciences, P.O. Box 601, University of Sheffield, Sheffield S10 2UQ, UK

Received 15 June 1994; revised and accepted 14 September 1994

Within a taxon, some species are described before others, and some have greater numbers of synonyms. Here, we explore the correlates of date of description and numbers of synonyms for a well-known group of species, the butterflies of North America. Larger and more widespread species were described earlier. Species which were described earlier and are more widespread have greater numbers of associated synonyms. Development of an understanding both of patterns of non-random description and of their determinants is particularly important as increasing use is made of historical (museum) collections of specimens to document spatial patterns in the occurrence of individual species and levels of species richness, and large scale patterns in species-level traits. Exploration of patterns of description of well-known groups provides a point of reference for assessing potential difficulties with those which are more poorly known.

Keywords: butterflies; description; taxonomy; synonymy.

Introduction

The described fauna is a non-random subset of all species. Vertebrate taxa are better known than invertebrate, and within groups of invertebrates some groups are better known than others. For example, a greater proportion of dragonfly and butterfly species have been described than of weevils or ichneumonids (Gaston, 1991a). A variety of variables can be suggested as important in determining which groups are described differentially at the species level. These include the relatedness of the taxon to humans, its body size, its apparency and its spatial distribution. Within taxa, the reasons why some species are described earlier than others are often less obvious. Again, factors such as body size, apparency and distribution have been imputed. Indeed, correlations between dates of description and individual factors have been explored on a few occasions. For example, there is a negative relationship between body length and date of description for British beetles (Gaston, 1991b), and the most recently described species of bird have a mean body mass which is less than would be expected if they were a random sample from the world avifauna (Gaston and Blackburn, 1994). However, we are aware of only one study to date (for oscine passerine birds) which has examined the relationships between several possible factors and dates of species description within a group (Blackburn and Gaston, in press).

*To whom correspondence should be addressed.

0960-3115 © 1995 Chapman & Hall

Work on the related problem of synonymy is yet more scarce. Whilst it is evident that many species are described under more than one name, no empirical analyses appear to have been performed to ascertain the characteristics of species which are associated with their names having low or high numbers of allied synonyms. Nonetheless, amongst many insect taxa at least, 20% or more of species-group names have been found to be synonyms (Gaston and Mound, 1993; Mound and Gaston, 1993).

In this paper we explore the correlates of date of description and numbers of synonyms for species of North American butterflies.

Materials and methods

We considered 537 species of North American butterflies. These were delimited as the assemblage having year-round ranges in North America north of Mexico (determined from Scott, 1986). Analyses were performed on these species, and separately on the subset endemic to this area (260 species). Where possible, each valid species was scored on the following variables: (i) date of description (Miller and Brown, 1981); (ii) numbers of associated synonyms (Miller and Brown, 1981), counted as the number of species-group synonyms (including the number of subspecies); (iii) body size, measured as wingspan (Pyle, 1981; Opler and Krizek, 1984). If sizes of subspecies were given separately, or a range of sizes provided, the arithmetic mean of the range was used; (iv) range size, scored as the number of grid squares occupied of an equal-area grid (size approx. 611 000 km², Williams, 1992), for species' year-round and migrant distributions. Data on species distributions were taken from Scott (1986); (v) number of life zones, of a possible seven, occupied (Scott, 1986). Life zones are large-scale biogeographic areas defined on the basis of temperature and predominant vegetational cover, and the number occupied thus represents a crude measure of ecological flexibility; (vi) number of closely related species, scored as the number of congenerics.

Description date, body size, geographic range size and number of closely related species were normalized by log10 transformation before analysis, and data on the number of synonyms normalized by log10(n + 1) transformation (some species have no synonyms).
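The two transformations described above can be sketched as follows. This is only an illustration: the function names and the two example records are hypothetical and are not taken from the data set analysed here.

```python
import math


def log10_transform(value):
    """log10 transform for strictly positive variables (e.g. wingspan, range size)."""
    if value <= 0:
        raise ValueError("log10 transform requires a positive value")
    return math.log10(value)


def log10_plus_one(count):
    """log10(n + 1) transform for counts that may be zero (e.g. number of synonyms)."""
    if count < 0:
        raise ValueError("counts cannot be negative")
    return math.log10(count + 1)


# Hypothetical records for illustration only (not values from the paper).
species = [
    {"name": "species A", "year": 1758, "synonyms": 9},
    {"name": "species B", "year": 1900, "synonyms": 0},
]

for s in species:
    s["log_year"] = log10_transform(s["year"])
    s["log_synonyms"] = log10_plus_one(s["synonyms"])
```

Adding 1 before taking logarithms keeps species with no synonyms in the analysis, because log10(0) is undefined whereas log10(0 + 1) = 0.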

Closely related species may share adaptations through common ancestry. Thus, treating species as independent data points in a comparative study such as this may overestimate the number of times a trait, or a relationship among traits, has evolved. Controlling for the effects of identity by descent is therefore important (Harvey and Pagel, 1991). It is not immediately obvious that common ancestry should affect species' dates of description or number of synonyms, because these traits are not inherited, but consider the following example. Suppose all members of the family Papilionidae have early description dates and large body sizes, whereas most other species have late description dates and small body sizes. A relationship between body size and description date based on a comparison across all species could simply result from the difference between papilionids and other butterflies. However, papilionids may differ from other butterflies in a number of other ways, some of which may be difficult to quantify: they may be more brightly coloured, or have behavioural traits that make them more likely to be collected at earlier dates than other species. A more convincing test of a comparative relationship would be to document it within all butterfly taxa individually. A relationship demonstrated between, say, description date and body size within each taxon in the fauna will not be biased by special features of one particular taxon. Therefore, in the analyses that follow, we compare relationships between species traits both across species, and within taxa.

We examine relationships between species traits within taxa using a method of analysis derived from a model reported by Felsenstein (1985). Felsenstein described a method for making comparisons between pairs of taxa at each bifurcation of a known true phylogeny. The difference between two taxa that share an immediate common ancestor is not confounded by phylogenetic differences between them, because the two taxa share all of their phylogenetic history down to the point of their split from their common ancestor. If two variables are correlated, a large difference on one of them should be associated with a large difference on the other. The set of differences obtained from comparing the pairs of taxa at each higher node of a branching phylogeny can be used to test the overall comparative hypothesis: large differences on one variable should be associated with large differences on the other, and vice versa, within each taxon across the phylogenetic tree if the hypothesis of a positive relationship is true. Unfortunately, the applicability of Felsenstein's method is limited by its requirement that the true phylogeny be known.
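The logic of comparing sister taxa can be illustrated with a deliberately simplified sketch. It takes the raw difference between the two members of each pair and counts how often the differences in two traits agree in sign. It ignores branch lengths and any standardization of the differences, so it is not Felsenstein's full procedure. The function names and trait values are hypothetical.

```python
def pair_differences(pairs, trait):
    """Difference in `trait` between the two members of each sister pair."""
    return [a[trait] - b[trait] for a, b in pairs]


def sign_concordance(dx, dy):
    """Count pairs whose two differences share a sign, and pairs whose
    differences have opposite signs. Pairs with a zero difference are skipped."""
    same = sum(1 for x, y in zip(dx, dy) if x * y > 0)
    opposite = sum(1 for x, y in zip(dx, dy) if x * y < 0)
    return same, opposite


# Hypothetical sister pairs (log10 wingspan, log10 description year).
pairs = [
    ({"log_size": 1.9, "log_year": 3.245}, {"log_size": 1.5, "log_year": 3.279}),
    ({"log_size": 1.4, "log_year": 3.290}, {"log_size": 1.7, "log_year": 3.260}),
    ({"log_size": 1.6, "log_year": 3.270}, {"log_size": 1.2, "log_year": 3.285}),
]

d_size = pair_differences(pairs, "log_size")
d_year = pair_differences(pairs, "log_year")
same, opposite = sign_concordance(d_size, d_year)
```

In this made-up example the larger member of every pair has the earlier description date, so all three pairs fall in the `opposite` count, which is the pattern expected under a negative relationship between body size and description date.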

The method used here (evolutionary covariance regression; Pagel and Harvey, 1989; Harvey and Pagel, 1991) employs Felsenstein's idea of making comparisons between taxa sharing common ancestors and applies it to data sets for which the true phylogeny may not be known. This method calculates a single value ('contrast') for each variable within each taxon in a taxonomy, for example across species within genera, across genera within tribes, across tribes within subfamilies, and so on. Each contrast represents the magnitude and direction of the change in the variable within the taxon. If variables are correlated independent of taxonomy, they will show similar changes within each taxon, and hence their contrasts will also be correlated. The set of within-taxa contrasts can be analysed using standard regression techniques, with the proviso that regressions on contrasts must be forced through the origin (Garland et al., 1992). All regressions, both within taxa and across species, were calculated using the methods of ordinary least squares. Stepwise multiple regression models were constructed using a forward selection procedure (Sokal and Rohlf, 1981), adding to the model the predictor variable with the highest partial correlation at each step, with the proviso that this variable explained a significant (p < 0.05) additional proportion of the variance in the dependent variable.
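A regression forced through the origin has no intercept term, so the least-squares slope and the matching correlation are computed from raw, uncentred sums of squares and cross-products. The following is a minimal sketch of that calculation for one predictor. It assumes the contrasts have already been computed; the function names and the three example contrasts are hypothetical.

```python
import math


def slope_through_origin(x, y):
    """Least-squares slope for y = b * x with no intercept: b = sum(xy) / sum(x^2)."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("all x contrasts are zero")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


def correlation_through_origin(x, y):
    """Correlation from uncentred sums: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("contrasts must not all be zero")
    return sum(xi * yi for xi, yi in zip(x, y)) / math.sqrt(sxx * syy)


# Hypothetical contrasts for two variables (illustration only).
x_contrasts = [1.0, -2.0, 3.0]
y_contrasts = [-2.0, 4.0, -6.0]

b = slope_through_origin(x_contrasts, y_contrasts)        # -2.0
r = correlation_through_origin(x_contrasts, y_contrasts)  # -1.0
```

The means are not subtracted because the direction in which each contrast is taken is arbitrary: reversing the sign of a contrast in both variables leaves these uncentred sums unchanged.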

Results

(i) Interspecific analysis

Species' body size, geographic range size, number of closely related species and number of life zones occupied were all correlated with species' date of description, although there was no correlation between number of closely related species and description date when analysis was restricted to North American endemics (Table 1). Large bodied species, with large geographic ranges, and occupying a wider range of life zones, tend to be described earlier. We used stepwise multiple regression to determine which, if any, of these variables were correlated with description date independently of the other variables, and which variables together explained most variance in description date. Geographic range size and body size are the strongest correlates of species' description date, and are correlated with description date independently. Controlling for these two variables, the number of closely related species explains additional variance in species' description date, but the number of life zones occupied does not (Table 2). These results hold both for all species and for North American endemics.

Table 1. The relationship between species date of description (dependent variable) and the variables in the first column, across species, either including all species, or restricting analysis to North American endemics. r = Pearson correlation coefficient, n = number of species in the analysis

                        All species               North American endemics
                        r        n      p         r        n      p
Log wingspan            -0.365   480    0.0001    -0.263   236    0.0001
Log range size          -0.416   528    0.0001    -0.512   258    0.0001
Log no. related spp.     0.136   528    0.0017     0.075   258    0.2280
Number of life zones    -0.239   526    0.0001    -0.405   258    0.0001
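The paper does not state how the p values attached to the correlation coefficients were obtained. Assuming the usual test of a Pearson correlation against zero, the test statistic is t = r √((n − 2)/(1 − r²)) on n − 2 degrees of freedom. The sketch below computes this statistic for one entry of Table 1; the function name is hypothetical.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a Pearson correlation r, based on n observations,
    against zero. It has n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least three observations")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Log no. related spp., North American endemics (Table 1): r = 0.075, n = 258.
t_endemics = t_from_r(0.075, 258)  # about 1.20 on 256 d.f.
```

A t value of about 1.2 on 256 degrees of freedom is well short of the two-tailed 5% critical value of roughly 1.97, in line with the non-significant p reported for that entry.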

Table 2. Results of a stepwise multiple regression across species to determine which of the variables in Table 1 correlate independently with species date of description, and which variables together explain most variance in description date. Only those variables that explain a significant (p < 0.05) proportion of the variance in description date are included in this Table. The overall statistics show the total proportion of the variance explained (r²) by the variables in the model

                        All species               North American endemics
                        t        n      p         t        n      p
Log wingspan            -8.10           0.0001    -4.03           0.0001
Log range size          -11.32          0.0001    -8.32           0.0001
Log no. related spp.     5.56           0.0001     2.17           0.031

                        r²                        r²
Overall statistics       0.332   480    0.0001     0.289   236    0.0001

Species body size, geographic range, and number of life zones occupied were also all correlated with the number of synonyms under which a species had been described, as additionally was species date of description (Table 3). Species with a high number of associated synonyms tend to have been described early, to be large bodied, and to occupy a large geographic range and a high number of life zones. Species geographic range, and description date are the strongest correlates of number of synonyms, and are correlated with number of synonyms independently (Table 4).

(ii) Within-taxon analysis

Analysis within taxa tended to confirm the results of the analysis across species for correlates of date of description. Thus, subtaxa of larger average body size within taxa, with large geographic ranges, and occupying more life zones, tend to be described earlier. Early described taxa also tend to include a higher number of closely related species, although the correlation is not formally significant when analysis is restricted to North American endemics (Table 5). Only range size entered significantly in stepwise multiple regression with description date as the dependent variable (Table 6).

Within taxa, geographic range, number of life zones occupied, and date of description were all correlated with number of synonyms (Table 7). Thus, subtaxa within taxa which, on average, have larger geographic ranges or occupy more life zones, have earlier dates of description and include species with more synonyms. Geographic range size and description date explain significant amounts of the variance in number of synonyms when all variables are included in a stepwise multiple regression (Table 8). If analysis is restricted to North American endemics, geographic range size and number of occupied life zones are independently correlated with number of synonyms (Table 8).

Table 3. The relationships between number of synonyms for a species (dependent variable) and the variables in the first column, across species, and either including all species, or restricting analysis to North American endemics. r and n as in Table 1

                        All species               North American endemics
x variable              r        n      p         r        n      p
Log wingspan             0.145   472    0.0015     0.211   233    0.0012
Log range size           0.442   510    0.0001     0.488   253    0.0001
Number of life zones     0.230   508    0.0001     0.357   253    0.0001
Log no. related spp.     0.084   510    0.057      0.015   253    0.817
Log description date    -0.396   509    0.0001    -0.464   252    0.0001

Table 4. Results of a stepwise multiple regression across species of the variables in Table 3 on number of synonyms. See Table 2 and Methods for details

                        All species               North American endemics
                        t        n      p         t        n      p
Log range size           7.83           0.0001     5.45           0.0001
Log description date    -5.99           0.0001    -4.67           0.0001

                        r²                        r²
Overall statistics       0.248   509    0.0001     0.298   252    0.0001

Discussion

Developing an understanding of the non-randomness of patterns of species description within groups is important for several reasons. Foremost this is because of the increasing use being made of historical (museum) collections of specimens to document spatial patterns in the occurrence of individual species and levels of species richness, and large scale patterns in species-level traits. Without some knowledge of the biases in collections, such as the kinds of species which are most likely to be described and incorporated, such patterns cannot sensibly be interpreted.

For North American butterflies, earlier dates of species description tend to be associated with larger body sizes and larger geographic range sizes, whether analyses account for relatedness or not (Tables 1, 2, 5 and 6). They also tend to be associated with larger numbers of life zones. However, it seems probable that this variable partly acts as another measure of range size, with widely distributed species occurring in more life zones; unlike range size, number of life zones seldom enters significantly into the multiple regressions (Tables 2, 4, 6 and 8). Body size and range size are not significantly correlated for North American butterflies (N. Loder and K.J. Gaston, in preparation), a result which is reflected in the significant entry of both variables in some stepwise multiple regressions (Table 2).

Table 5. The relationships within-taxa between species date of description (dependent variable) and the variables in the first column, either including all species, or restricting analysis to North American endemics. Within-taxon relationships were calculated using evolutionary covariance regression (see Methods). r = Pearson correlation coefficient, n = number of independent within-taxon comparisons in the analysis

                        All species               North American endemics
x variable              r        n      p         r        n      p
Log wingspan            -0.261   82     0.017     -0.290   51     0.037
Log range size          -0.650   99     0.0001    -0.735   56     0.0001
Log no. related spp.     0.583   16     0.014      0.457   16     0.065
Number of life zones    -0.505   91     0.0001    -0.609   54     0.0001

Table 6. Results of a stepwise multiple regression of the variables in Table 3 on species date of description, using the within-taxon contrasts calculated by evolutionary covariance regression. See Table 2 and Methods for details

                        All species               North American endemics
x variable              t        n      p         t        n      p
Log range size          -8.09           0.0001    -7.01           0.0001

                        r²                        r²
Overall statistics       0.416   93     0.0001     0.491   52     0.0001

It seems most probable that large bodied and widely distributed species were described earlier simply because they were collected more readily. In some sense they could be argued to be more apparent. Broad correlations between range size and abundance suggest that widespread species also tend to have higher local densities (Brown, 1984; Gaston, 1994). This would further enhance their likelihood of early collection. Some of the earliest species to be described in the North American butterfly fauna as a whole have holarctic distributions and were described on the basis of specimens collected in Europe. The importance of body size may in part also reflect greater taxonomic difficulties posed by some small-bodied species of North American butterflies; some on-going problems in determining species limits involve small-bodied taxa (e.g. Panoquina panoquinoides, Amblyscirtes aenus, Poanes zabulon, Euphilotes battoides; Scott, 1986). Body size appears to have played a more significant role in the timing of description of North American butterfly species than it did with species of South American oscine passerines (Blackburn and Gaston, in press).

In addition to being larger and more widely distributed, species described early tend to have fewer congeners (Tables 1, 2 and 5). There seem to be two plausible reasons for this. First, smaller genera may have taken shorter periods to sort out taxonomically. Second, there could be relationships between both the range size of a species and its body size and the number of congeners it has. The latter explanation can essentially be rejected. Although there is a relationship between range size and number of congeners across all species it explains little variance (r² = 0.027, n = 537, p = 0.0001), and disappears when only endemics are considered (r² = 0.005, n = 260, NS). There is no relationship between body size and number of congeners for all species (r² = 0.0003, n = 484, NS), or for endemics (r² = 0.007, n = 237, NS).

Table 7. The relationships within-taxa between number of synonyms (dependent variable) and the variables in the first column, either including all species, or restricting analysis to North American endemics. Within-taxon relationships were calculated using evolutionary covariance regression (see Methods). r and n as in Table 5

                        All species               North American endemics
                        r        n      p         r        n      p
Log wingspan            -0.047   86     0.664     -0.008   52     0.952
Log range size           0.647   98     0.0001     0.661   58     0.0001
Number of life zones     0.412   90     0.0001     0.581   56     0.0001
Log no. related spp.     0.205   17     0.414      0.027   16     0.919
Log description date    -0.599   98     0.0001    -0.371   58     0.0001

Table 8. Results of a stepwise multiple regression of the variables in Table 7 on number of synonyms, using the within-taxon contrasts calculated by evolutionary covariance regression. See Table 2 and Methods for details

                        All species               North American endemics
x variable              t        n      p         t        n      p
Log description date    -2.77           0.007
Log range size           3.93           0.0001     2.007          0.05
Number of life zones                               2.59           0.013

                        r²                        r²
Overall statistics       0.427   90     0.0001     0.450   52     0.0001

Of the North American breeding butterfly species studied, 71% have associated synonyms. Range size can be postulated to be a particularly important determinant of the numbers of synonyms associated with a valid species name. This is because there is a greater likelihood that specimens collected in one area will be regarded as distinct from specimens collected from another area for species with large ranges, for which those areas can be farther apart, and for which morphological variation is apt on average to be greater. Such a result is indeed found here (Tables 3, 4, 7 and 8), but is complicated by the relatively strong negative correlation between the numbers of synonyms associated with a valid species name and the time which has passed since that valid name was erected (i.e. date of description; Tables 3 and 7). Nonetheless, with one exception, both variables enter significantly in the multiple regressions (Tables 4 and 8).

The emphasis of this paper has explicitly been placed on the effects of species traits on description and synonymy. It has, nonetheless, to be recognized that other factors play a role, in particular the history and geography of the entomological exploration of a region. These are likely to account for much of the residual variance in the patterns documented, and will interact with those patterns in potentially complex ways.

The North American butterflies are a well-known group of species. How well will the results documented generalize to other, more poorly known, groups of insects? As yet we do not know. However, it seems reasonable, and perhaps safest, to assume that for many other groups we have progressed little beyond essentially documenting the larger and more widespread species. Although techniques for collecting insect specimens have improved markedly over recent decades, most collecting continues to be serendipitous (Mound and Gaston, 1993), and much taxonomy is anyway based on material collected long ago. Indeed, if we have only described the larger and more widespread species, great caution will need to be exercised both in the interpretation of ecological and biogeographic patterns documented on the basis of historical collections, and in the analysis of such data for the purposes of establishing conservation priorities. Whilst we have some sympathy with the argument that conservation priorities be based on available data rather than on no data at all (Väisänen and Heliövaara, 1994), such an approach may have some serious shortcomings.

Acknowledgements

We are grateful to Laurence Mound and Malcolm Scoble for helpful discussion and comments. T.M.B. was supported by NERC grant GR3/8029.

References

Blackburn, T.M. and Gaston, K.J. What determines the probability of discovering a species?: a study of South American oscine passerine birds. J. Biogeog. (In press)

Brown, J.H. (1984) On the relationship between abundance and distribution of species. Am. Nat. 124, 255-79.

Felsenstein, J. (1985) Phylogenies and the comparative method. Am. Nat. 125, 1-15.

Garland, T., Harvey, P.H. and Ives, A.R. (1992) Procedures for the analysis of comparative data using phylogenetically independent contrasts. Syst. Biol. 41, 18-32.

Gaston, K.J. (1991a) The magnitude of global insect species richness. Conserv. Biol. 5, 283-96.

Gaston, K.J. (1991b) Body size and probability of description: the beetle fauna of Britain. Ecol. Entomol. 16, 505-8.

Gaston, K.J. (1994) Rarity. London: Chapman & Hall.

Gaston, K.J. and Blackburn, T.M. (1994) Are newly discovered bird species small-bodied? Biodiv. Lett. 2, 16-20.

Gaston, K.J. and Mound, L.A. (1993) Taxonomy, hypothesis testing and the biodiversity crisis. Proc. R. Soc. (Lond. B) 251, 139-42.

Harvey, P.H. and Pagel, M.D. (1991) The comparative method in evolutionary biology. Oxford: Oxford University Press.

Miller, L.D. and Brown, F.M. (1981) A catalogue/checklist of the butterflies of America, north of Mexico. [S.l.]: The Lepidopterists' Society.

Mound, L.A. and Gaston, K.J. (1993) Conservation and systematics - the agony and the ecstasy. In Perspectives on Insect Conservation (K.J. Gaston, T.R. New and M.E. Samways, eds) pp. 185-95. Andover: Intercept.

Opler, P. and Krizek, G.O. (1984) Butterflies east of the Great Plains. Baltimore: Johns Hopkins University Press.

Pagel, M.D. and Harvey, P.H. (1989) Comparative methods for examining adaptation depend on evolutionary models. Folia Primatol. 53, 203-20.

Pyle, R.M. (1981) The Audubon Society field guide to North American butterflies. New York: Chanticleer Press.

Scott, J.A. (1986) The butterflies of North America: a natural history and field guide. Stanford: Stanford University Press.

Sokal, R.R. and Rohlf, F.J. (1981) Biometry, 2nd edition. San Francisco: Freeman.

Väisänen, R. and Heliövaara, K. (1994) Hot-spots of insect diversity in northern Europe. Ann. Zool. Fennici 31, 71-81.

Williams, P.H. (1992) WORLDMAP: priority areas for biodiversity. Using version 3. London: Privately distributed.