A bootstrap variance estimator for the observed species richness in quadrat sampling

Steen Magnussen, Canadian Forest Service, Victoria BC

Lorenzo Fattorini, University of Siena, IT

Ron McRoberts, USDA Forest Service, St. Paul, Minnesota

TIES09, Bologna, July 5-9, 2009

The number of species (S) in a population is an important indicator of biodiversity. For many populations a census is infeasible, and a sample survey yields an observed number of species S(n) ≤ S.

• Interest lies in estimating the richness S.
• Model-based estimation of S and of its precision.
• Quadrat sampling or k-distance sampling is popular and efficient in vegetation surveys.
• Sample locations lie on a grid or follow a systematic non-aligned design.

Non-random species associations

Species co-occurrence in a sample unit is predominantly non-random (positive or negative correlation).

Non-randomness gives rise to over-dispersion in the sampling variance of S(n).

A non-random spatial distribution of species has no effect on E(S(n)) but lowers efficiency.
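To illustrate why E(S(n)) is unaffected while the variance changes, the sketch below draws repeated without-replacement samples from two hypothetical toy populations (100 PSUs, 10 species, each species present in exactly 10 PSUs) that differ only in how the species co-occur. The populations, sample size, number of replicates, and seed are illustrative assumptions, not survey data.

```python
import random

def observed_richness(delta, rows):
    """S(n): number of species (columns) present in at least one sampled PSU."""
    return sum(1 for j in range(len(delta[0])) if any(delta[r][j] for r in rows))

def simulate(delta, n, reps, rng):
    """Observed richness for `reps` simple random samples (wor) of n PSUs."""
    return [observed_richness(delta, rng.sample(range(len(delta)), n)) for _ in range(reps)]

def mean_var(xs):
    """Mean and variance (divisor len(xs)) of a list of numbers."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

N, S, OCC = 100, 10, 10  # hypothetical: 100 PSUs, 10 species, each in 10 PSUs

# Positive association: all species occur together in the same 10 PSUs.
clumped = [[1] * S if r < OCC else [0] * S for r in range(N)]
# Negative association: species j occurs only in PSUs 10j, ..., 10j + 9.
disjoint = [[1 if j == r // OCC else 0 for j in range(S)] for r in range(N)]

rng = random.Random(2009)
clumped_draws = simulate(clumped, 10, 4000, rng)
disjoint_draws = simulate(disjoint, 10, 4000, rng)
for name, draws in (("clumped", clumped_draws), ("disjoint", disjoint_draws)):
    m, v = mean_var(draws)
    print(f"{name:8s} mean S(n) = {m:.2f}  var S(n) = {v:.2f}")
```

Because every species occupies the same number of PSUs in both populations, E(S(n)) is identical and the two Monte Carlo means agree within simulation error; the clumped population, where all species co-occur, gives a far larger variance of S(n).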

Do we need a variance estimator for S(n)?

The sampling variance of S(n) propagates to estimates of richness.

A variance estimator for $\hat{S}$ should be consistent with the estimator of var[S(n)].

If the estimator of S is a function of S(n) and other sample statistics, use the delta technique to estimate its variance.
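As a worked illustration of the delta technique, suppose (hypothetically) that the richness estimator has the form $\hat{S} = g(S(n), T)$ for a smooth function g and a single other sample statistic T. A first-order expansion about the observed values gives

$$\widehat{\operatorname{var}}(\hat{S}) \approx \left(\frac{\partial g}{\partial S(n)}\right)^{2}\widehat{\operatorname{var}}[S(n)] + \left(\frac{\partial g}{\partial T}\right)^{2}\widehat{\operatorname{var}}(T) + 2\,\frac{\partial g}{\partial S(n)}\,\frac{\partial g}{\partial T}\,\widehat{\operatorname{cov}}[S(n), T]$$

with the partial derivatives evaluated at the observed S(n) and T. Any bias in $\widehat{\operatorname{var}}[S(n)]$ therefore carries directly into $\widehat{\operatorname{var}}(\hat{S})$, which is why the two variance estimators need to be consistent.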

Few variance estimators for S(n) exist: a single design-based estimator, one that can be adapted to S(n) [1], and one based on balanced repeated replications [2].

1. Haas PJ, Liu YS, Stokes L. 2006. An estimator of number of species from quadrat sampling. Biometrics 62: 135-14

2. Magnussen S. 2009. A balanced repeated replication estimator of sampling variance for apparent and predicted species richness. For. Sci.

A design-based estimator of variance

Ugland KI, Gray JS, Ellingsen KE. 2003. The species-accumulation curve and estimation of species richness. J. Anim. Ecol. 72: 888-897.

• Finite population of N primary sampling units (PSUs).
• Sampling without replacement.
• Impractical for a single sample survey, since it requires occurrence data for all N PSUs, but inspirational for the proposed bootstrap estimator of variance.
• Designed for sub-sampling applications.

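The expectation of S(n)

$$E[S(n)] = \sum_{j=1}^{S}\left[1 - \binom{N-N_j}{n}\binom{N}{n}^{-1}\right] = S - \sum_{i=1}^{N} f_i \binom{N-i}{n}\binom{N}{n}^{-1}$$

where $f_i$ = number of species found in i of the N sampled PSUs, and $N_j/N$ = relative occurrence of species j in the PSUs (species j occurs in $N_j$ of the N PSUs).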

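$$\operatorname{var}[S(n)] = E[S(n)^{2}] - E[S(n)]^{2} = \sum_{i=1}^{S} E[I_i] + 2\sum_{i=1}^{S-1}\sum_{j=i+1}^{S} E[I_i I_j] - E[S(n)]^{2}$$

where $I_i$ = 1 if the ith species is in the sample and 0 otherwise, and $I_i I_j$ = 1 if both the ith and the jth species are in the sample and 0 otherwise.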
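Under simple random sampling without replacement both expectations have closed forms. Species i is missed with probability $\binom{N-N_i}{n}\binom{N}{n}^{-1}$, so $E[I_i] = 1 - \binom{N-N_i}{n}\binom{N}{n}^{-1}$, and by inclusion-exclusion $E[I_iI_j] = 1 - P(\text{miss } i) - P(\text{miss } j) + P(\text{miss both})$, where both are missed only if the sample avoids all PSUs holding either species (the count of those PSUs is written n_ab in the code, a label introduced here). The sketch below evaluates the formulas from a full N × S occurrence matrix; the function name and the list-of-rows representation are illustrative choices.

```python
from itertools import combinations
from math import comb

def richness_moments(delta, n):
    """Exact E[S(n)] and var[S(n)] under SRSwor of n of the N rows of delta.

    delta: N x S list of 0/1 rows; delta[r][j] = 1 if species j occurs in PSU r.
    """
    N, S = len(delta), len(delta[0])
    total = comb(N, n)

    def p_miss(k):
        # P(a size-n wor sample avoids k given PSUs); comb() returns 0 if N - k < n.
        return comb(N - k, n) / total

    N_j = [sum(row[j] for row in delta) for j in range(S)]  # PSUs holding species j
    e_i = [1.0 - p_miss(k) for k in N_j]                    # E[I_i]
    expected = sum(e_i)                                      # E[S(n)]
    second = expected                                        # sum_i E[I_i], as I_i**2 = I_i
    for a, b in combinations(range(S), 2):
        n_ab = sum(1 for row in delta if row[a] or row[b])   # PSUs holding a or b
        e_ab = 1.0 - p_miss(N_j[a]) - p_miss(N_j[b]) + p_miss(n_ab)  # E[I_a I_b]
        second += 2.0 * e_ab
    return expected, second - expected ** 2
```

With the whole population matrix in hand this gives the exact design variance; the pairwise loop costs on the order of S² × N operations, which is fine for tree-species counts of the size considered here.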

The bootstrap estimator

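Data: an n × S(n) occurrence matrix $\delta_{n\times S(n)}$, where $\delta_{ij}$ = 1 if species j occurs in the ith PSU and 0 otherwise.

• Generate $\delta^{*}_{N\times S(n)}$ from $\delta_{n\times S(n)}$ by N - n hot-deck imputations for the non-sampled PSUs.
• Bootstrap samples (wor) of size n would miss an expected $\Delta^{*}S(n)$ species.
• Add $\Delta^{*}S(n)$ columns to $\delta^{*}_{N\times S(n)}$.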
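The hot-deck step can be sketched as follows. The sketch assumes a simple random hot deck, in which each non-sampled PSU copies the occurrence row of a donor drawn uniformly, with replacement, from the n sampled PSUs; the donor-selection rule and the function name `hot_deck_impute` are assumptions, not details taken from the slides.

```python
import random

def hot_deck_impute(delta_sample, N, rng):
    """Expand delta_{n x S(n)} to an N x S(n) matrix (a sketch of delta*).

    delta_sample: n rows of 0/1 occurrence flags, one row per sampled PSU.
    Assumption: each of the N - n non-sampled PSUs copies the row of a donor
    chosen uniformly at random, with replacement, from the sampled PSUs.
    """
    n = len(delta_sample)
    if N < n:
        raise ValueError("N must be at least the sample size n")
    imputed = [list(rng.choice(delta_sample)) for _ in range(N - n)]
    return [list(row) for row in delta_sample] + imputed

# Usage: three sampled PSUs, two species, a population of N = 8 PSUs.
sample = [[1, 0], [1, 1], [0, 1]]
delta_star = hot_deck_impute(sample, 8, random.Random(7))
```

Every imputed row is a copy of an observed row, so $\delta^{*}_{N\times S(n)}$ holds exactly the S(n) observed species and bootstrap samples drawn from it can never show more than S(n) species.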

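Expected number of missed species $\Delta^{*}S(n)$

$$\hat{\Delta}^{*}S(n) = \sum_{j=1}^{S(n)} \binom{N-N_j^{*}}{n}\binom{N}{n}^{-1}$$

where $N_j^{*}$ is the number of PSUs in $\delta^{*}_{N\times S(n)}$ that contain species j.

Add: $\Delta^{*}S(n)$ columns to $\delta^{*}_{N\times S(n)}$, with $\Delta^{*}S(n) \sim \sum_{j=1}^{S(n)} \text{Bernoulli}\left(\binom{N-N_j^{*}}{n}\binom{N}{n}^{-1}\right)$.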

Adding species (columns)

Select the $\Delta^{*}S(n)$ columns to be added from the S(n) columns of $\delta^{*}_{N\times S(n)}$ with probability proportional to their chance of being missed in a sample of size n, $\binom{N-N_j^{*}}{n}\binom{N}{n}^{-1}$.
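The augmentation step, computing $\hat{\Delta}^{*}S(n)$ and copying columns chosen with probability proportional to their miss probability, might be sketched as below. The sketch assumes that the number of added columns is a sum of independent Bernoulli draws with the per-species miss probabilities, and that columns are selected with replacement; both are assumptions, as is the function name `augment`.

```python
import random
from math import comb

def augment(delta_star, n, rng):
    """Add columns for species a size-n sample would likely miss (sketch).

    delta_star: N x S(n) 0/1 matrix after hot-deck imputation.
    Returns (augmented matrix, expected number of missed species).
    """
    N, S = len(delta_star), len(delta_star[0])
    N_j = [sum(row[j] for row in delta_star) for j in range(S)]   # N_j*
    p_miss = [comb(N - k, n) / comb(N, n) for k in N_j]           # P(species j missed)
    expected_missed = sum(p_miss)                                 # hat Delta* S(n)
    n_add = sum(rng.random() < p for p in p_miss)                 # sum of Bernoulli(p_j)
    if n_add == 0:
        return [list(row) for row in delta_star], expected_missed
    # Copy existing columns, chosen with probability proportional to p_miss.
    picks = rng.choices(range(S), weights=p_miss, k=n_add)
    augmented = [list(row) + [row[j] for j in picks] for row in delta_star]
    return augmented, expected_missed

# Usage: N = 6 imputed PSUs, two species, bootstrap samples of size n = 2.
delta_star = [[1, 0], [1, 0], [0, 1], [1, 1], [0, 0], [1, 0]]
augmented, expected_missed = augment(delta_star, 2, random.Random(3))
```

Each added column duplicates the occurrence pattern of an existing rare species, so the added "species" inherit the spatial pattern of the species most likely to be missed.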

Bootstrap sample

Take a size-n (wor) random sample from the augmented matrix $\delta^{*}_{N\times S^{*}(n)}$, where $S^{*}(n) = S(n) + \Delta^{*}S(n)$.

Repeat the sequence of hot-deck imputations, augmentation, and bootstrap sampling B times.
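One bootstrap sample, and its B-fold repetition, could look like the sketch below. `make_augmented` is a hypothetical caller-supplied function standing in for the hot-deck imputation and augmentation steps, so that each replicate is drawn from a freshly generated matrix as the slide describes; both function names are illustrative.

```python
import random

def bootstrap_indicators(augmented, n, rng):
    """Draw n rows (PSUs) without replacement from the augmented N x S*(n)
    matrix and return the occurrence indicator I*_{b,i} of every column i."""
    rows = rng.sample(range(len(augmented)), n)
    return [int(any(augmented[r][i] for r in rows)) for i in range(len(augmented[0]))]

def bootstrap_replicates(make_augmented, n, B, rng):
    """Repeat imputation + augmentation (make_augmented) and sampling B times."""
    return [bootstrap_indicators(make_augmented(rng), n, rng) for _ in range(B)]

# Usage with a fixed toy matrix in place of a real imputation/augmentation step.
toy = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
reps = bootstrap_replicates(lambda rng: toy, 2, 5, random.Random(11))
s_star = [sum(ind) for ind in reps]  # observed richness in each bootstrap sample
```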

The bootstrap variance estimator

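Generate for each bootstrap sample b = 1, ..., B the species sample occurrence indicators $I^{*}_{b,i}$ and $I^{*}_{b,ij}$, for $i, j = 1, \ldots, \max_b S^{*}_b(n)$.

Compute var(S(n)) as per Ugland et al., with $E[I_i]$ and $E[I_iI_j]$ replaced by their averages over the B bootstrap samples.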
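Replacing $E[I_i]$ and $E[I_iI_j]$ in the Ugland et al. formula by bootstrap averages gives the sketch below. It assumes plug-in averages with divisor B and pads each replicate with zeros up to the largest number of columns among the B replicates; both choices, and the function name, are illustrative assumptions.

```python
from itertools import combinations

def bootstrap_variance(indicators):
    """Plug-in var(S(n)) from B bootstrap indicator vectors (sketch).

    indicators: list of B lists of 0/1 flags I*_{b,i}; shorter lists are padded
    with zeros up to the longest one (assumption).
    """
    B = len(indicators)
    width = max(len(ind) for ind in indicators)
    padded = [list(ind) + [0] * (width - len(ind)) for ind in indicators]
    e_i = [sum(ind[i] for ind in padded) / B for i in range(width)]   # E*[I_i]
    e_ij = [sum(ind[i] * ind[j] for ind in padded) / B                 # E*[I_i I_j]
            for i, j in combinations(range(width), 2)]
    expected = sum(e_i)                                               # E*[S(n)]
    return expected + 2 * sum(e_ij) - expected ** 2
```

Because $I^{*}_{b,i}$ is 0/1, this plug-in quantity is algebraically identical to the variance (divisor B) of the B bootstrap counts $\sum_i I^{*}_{b,i}$, so `statistics.pvariance` applied to those counts returns the same number at much lower cost; the pairwise form is kept here only to mirror the formula.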

Assessment of estimator

• Simulated wor sampling from large USDA Forest Service FIA collections of plot data; FIA plot records treated as finite populations.
• Sample sizes n = 20, 40, ..., 120 (fp < 0.05).
• State-wide inventories from Georgia (GA), Minnesota (MN), and Utah (UT); regional inventory from Wisconsin (ASP212).
• Benchmark: Monte Carlo variance from 10,000 samples.
• Coverage of the estimated 95% confidence intervals.

Data set | Year | N (plots) | Species (S) | Species per PSU (min, median, max) | Median trees per plot
ASP212 | ca. 2002 | 5771 | 76 | 1, 2, 15 | 8
Georgia (GA) | 1989 | 6524 | 82 | 1, 4, 16 | 16
Georgia (GA) | 2006 | 4429 | 147 | 1, 5, 19 | 25
Minnesota (MN) | 1977 | 8815 | 54 | 1, 4, 11 | 19
Minnesota (MN) | 2006 | 5769 | 70 | 1, 4, 12 | 25
Utah (UT) | 1993 | 2733 | 20 | 1, 2, 7 | 14
Utah (UT) | 2006 | 2198 | 20 | 1, 2, 8 | 17
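The assessment design (repeated wor samples from a plot collection treated as a finite population, a Monte Carlo benchmark variance, and coverage of nominal 95% intervals) can be sketched as follows. The `estimator` argument is a hypothetical callable returning an estimated var[S(n)] for one sample, and coverage is computed here against the Monte Carlo mean of S(n); that target is one reading of the slide, not a documented detail.

```python
import math
import random

def assess(delta, n, estimator, reps, rng, z=1.96):
    """Monte Carlo check of a variance estimator for S(n) (sketch).

    delta: N x S 0/1 population occurrence matrix (plots as PSUs).
    estimator: hypothetical callable(sample_rows, rng) -> estimated var[S(n)].
    Returns (Monte Carlo variance of S(n), mean estimated variance, coverage).
    """
    N, S = len(delta), len(delta[0])
    richness, variances = [], []
    for _ in range(reps):
        rows = [delta[r] for r in rng.sample(range(N), n)]   # one wor sample
        richness.append(sum(1 for j in range(S) if any(row[j] for row in rows)))
        variances.append(estimator(rows, rng))
    mean_s = sum(richness) / reps
    mc_var = sum((s - mean_s) ** 2 for s in richness) / reps  # benchmark variance
    # Share of intervals S(n) +/- z * sqrt(v) that contain the Monte Carlo mean.
    hits = sum(abs(s - mean_s) <= z * math.sqrt(v) for s, v in zip(richness, variances))
    return mc_var, sum(variances) / reps, hits / reps
```

Comparing the mean estimated variance with the Monte Carlo variance shows bias, and the coverage figure corresponds to the CI95 coverage reported in the conclusions.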

Conclusions

The bootstrap variance estimator performs reasonably well in low-intensity sampling in both species-rich and species-poor populations.

• It tends to underestimate the actual variance.
• Coverage of the 95% confidence intervals is typically 0.90-0.93.
• It is about on par with a balanced repeated replication estimator and much better than the estimator by Haas et al.
