
1

Sampling designs for spawning data on the Middle Fork Salmon River

• What sampling design should be used for estimating the number of Chinook redds on a river network*?

– Estimation of status: the number of spring Chinook redds in the Middle Fork Salmon River in one year.

– Measurement design: not considered here. We assume there is some way to identify and count redds once a crew reaches a location.

* A network a lot like the Middle Fork Salmon R.

2

The Middle Fork Salmon River

3

[Figure: redd counts by year (1995, 1996, 1997, 1998, 2001, 2002) for Index and Other reaches; vertical axis 0 to 1500.]

Redd data – “the Truth”

• IDFG dataset (Russ Thurow): the number of redds in the Middle Fork Salmon River, counted via helicopter

• All spawning reaches were censused each year

• Sampling was done by helicopter and, where necessary, on foot

• Six years of data: 1995-1998, 2001, and 2002

• These data can be considered the truth

Year          1995  1996  1997  1998  2001  2002
Total redds     20    83   424   661  1789  1730

4

Objectives

• Compare several designs to see if one estimates the number of redds (and only redds) best

– unbiased designs (estimators)

– “best” determined by

• standard error of the estimator

• coverage probability (how often the 95% confidence interval actually contains the true number of redds)

• cost

– Keep things fair by sampling the same total length of stream: the index covers 976 segments, or 195.2 km of stream.

• Equal length does not imply equal cost

• Although some standard errors can be calculated analytically, the coverage needs to be assessed via simulation.

5

Methods

• Compare sampling strategies using the IDFG data as the truth.

• A sampling strategy consists of a sampling design together with an estimator for the total and its confidence interval.

6

[Figure: histogram; horizontal axis 1400 to 2200, vertical axis Frequency (0 to 250).]

Methods

• Use simulation by resampling the population over and over
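The resampling idea can be sketched as follows for the simplest case, a simple random sample with the usual expansion estimator. This is only an illustration under stated assumptions: the function name `simulate_srs`, the normal-theory 95% interval, and the made-up population in the demo are not from the slides, and the real study resamples the censused IDFG redd counts instead.

```python
import math
import random
import statistics


def simulate_srs(population, n, reps=2000, z=1.96, seed=0):
    """Repeatedly draw simple random samples from a fully known population.

    Returns the empirical standard error of the expansion estimator of the
    total and the share of nominal 95% intervals that contain the true total.
    """
    rng = random.Random(seed)
    N = len(population)
    true_total = sum(population)
    estimates = []
    covered = 0
    for _ in range(reps):
        sample = rng.sample(population, n)      # sampling without replacement
        total_hat = N * statistics.fmean(sample)
        # Estimated variance of the total, with finite population correction.
        var_hat = N * N * (1 - n / N) * statistics.variance(sample) / n
        half_width = z * math.sqrt(var_hat)
        if total_hat - half_width <= true_total <= total_hat + half_width:
            covered += 1
        estimates.append(total_hat)
    return {
        "true_total": true_total,
        "mean_estimate": statistics.fmean(estimates),
        "se": statistics.stdev(estimates),
        "coverage": covered / reps,
    }


if __name__ == "__main__":
    # Hypothetical population: 1000 segments, most of them without redds.
    population = [0] * 600 + [1] * 200 + [2] * 100 + [5] * 100
    print(simulate_srs(population, n=100))
```

The same loop applies to any other design by swapping the sampling step and the estimator; only the summary (standard error and coverage) stays the same.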


7

Cost & Crew-trips

• Each segment is assigned an access point.

• Travel to access sites depends on the mode of access:

– airplane

– auto

• Travel from an access site to the sampling reaches is the maximum distance from the access site to the furthest sampling reach in each “direction” along a tributary

• Cost = Fn(km by foot)

• 4 round trips required
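One possible reading of this cost rule, written as a small function. The slides give only "Cost = Fn(km by foot)", so the details here are assumptions: the input layout (access point, direction, distance), the name `foot_travel_km`, and the way the four round trips multiply the one-way distance are all illustrative.

```python
from collections import defaultdict


def foot_travel_km(sampled_reaches, round_trips=4):
    """Total kilometers walked for one sample.

    sampled_reaches: iterable of (access_point, direction, km_from_access).
    For each access point and direction only the furthest sampled reach
    matters, because nearer reaches are passed on the way to it.
    """
    furthest = defaultdict(float)
    for access_point, direction, km in sampled_reaches:
        key = (access_point, direction)
        furthest[key] = max(furthest[key], km)
    one_way_km = sum(furthest.values())
    # Assumption: every leg is walked out and back, round_trips times.
    return 2 * round_trips * one_way_km


if __name__ == "__main__":
    sample = [
        ("airstrip A", "upstream", 3.0),
        ("airstrip A", "upstream", 7.5),
        ("airstrip A", "downstream", 2.0),
        ("trailhead B", "upstream", 10.0),
    ]
    print(foot_travel_km(sample))
```

Under this reading, clustered samples are cheap because several reaches share one leg, which is consistent with the cost advantage reported for the cluster design.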

8

Distances shown in 5 km intervals.

Many areas require over 20 km hike

Maximum distance is 33 km.

9

The sampling designs

• Index – sample the index reaches

• Simple random sampling – using the unbiased estimator

• Systematic sampling – sort tributaries in random order, then systematically sample along the resulting line

• Stratify by Index – sample independently within and outside the index regions

• Adaptive cluster sampling – choose segments with a simple random sample; if sampled sites have redds, sample the adjacent segments

• Spatially balanced design – based on the EMAP design, though selecting segments within primary sampling units rather than points (not yet implemented)

10

Index sampling

• When the sample size is smaller than the overall size of the index region, a simple random sample of the segments within the index is assumed.

• Two possibilities to estimate the number of redds from the index sample:

1. Assume there are no redds outside of the index – estimates will be too small.

2. Assume that the average number of redds per segment outside the index is the same as inside, and simply inflate the index estimator – estimates will be too large.

11

[Figure: Bias of Inflating Estimator from Index Sample. Redds (vertical axis, 0 to 5000) by year (1995-1998, 2001, 2002); estimates compared with the true value (*).]

12

Systematic sampling

• Order the tributaries in random order along a line

• Choose sampling interval, k, so that final sample size is approximately n

• Select a random number, r, between 1 and k

• Sample reaches r, r+k, r+2k, …, r+(n-1)k

• Systematic sampling is cluster sampling where clusters are made up of units far apart in space and one cluster is sampled
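The selection steps above can be sketched as below. The dictionary layout (tributary name mapped to its segments in stream order), the function name `systematic_sample`, and the choice k = N // n are assumptions made for illustration; the slides only say that k is chosen so the sample size is approximately n.

```python
import random


def systematic_sample(tributaries, n, seed=None):
    """Systematic sample of segments along a randomly ordered line.

    tributaries: dict mapping a tributary name to its list of segment ids,
    listed in order along the stream.
    """
    rng = random.Random(seed)
    order = list(tributaries)
    rng.shuffle(order)                      # tributaries in random order
    line = [seg for name in order for seg in tributaries[name]]
    k = max(1, len(line) // n)              # sampling interval
    r = rng.randint(1, k)                   # random start between 1 and k
    # Positions r, r+k, r+2k, ... (1-based) along the line.
    return line[r - 1::k]


if __name__ == "__main__":
    # Hypothetical network: three tributaries with 100 segments in total.
    network = {
        "a": list(range(0, 40)),
        "b": list(range(40, 70)),
        "c": list(range(70, 100)),
    }
    print(systematic_sample(network, n=10, seed=1))
```

When N is not a multiple of n the realized sample size varies slightly with r, which matches the "approximately n" wording.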

[Figure: sampled reaches r, r+k, r+2k, r+3k, r+4k along the line, spaced k apart.]

13

Stratify by Index

• Stratify by index and oversample index reaches

• Simple random sample in each stratum

• Allocation:

– Equal allocation: Usually does not perform well

– Proportional allocation: Does not oversample index sites so will probably not have good precision

– Optimal allocation: need to know the standard deviation
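For two of the allocation rules listed above, a minimal sketch. "Optimal" is taken here to mean Neyman allocation (sample size proportional to stratum size times stratum standard deviation), which is an assumption about what the slides used; the stratum sizes and standard deviations in the demo are made up.

```python
def proportional_allocation(n, sizes):
    """Sample size per stratum proportional to stratum size N_h."""
    total = sum(sizes.values())
    return {h: n * size / total for h, size in sizes.items()}


def neyman_allocation(n, sizes, sds):
    """Sample size per stratum proportional to N_h * S_h."""
    weights = {h: sizes[h] * sds[h] for h in sizes}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical strata: a small index stratum with highly variable counts
    # and a large non-index stratum with mostly empty segments.
    sizes = {"index": 1000, "other": 3000}
    sds = {"index": 3.0, "other": 1.0}
    print(proportional_allocation(100, sizes))
    print(neyman_allocation(100, sizes, sds))
```

The results are real-valued and would still need rounding, and capping at the stratum size, before use. The contrast shows why proportional allocation under-samples the index stratum when most of the variability sits there.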

Year                 1995  1996  1997  1998  2001  2002
Proportion in index  0.76  0.54  0.48  0.48  0.42  0.46

14

Adaptive cluster sampling

• Original sample is simple random sample

• If a sampled site meets the criteria, also sample the sites in its neighborhood

– Criteria: presence of redds

– Neighborhood: segments directly upstream and downstream

• Continue until sites do not meet the criteria

– Both legs of confluences

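[Figure: stream network with segments 1 to 6. Segment 2 is in the original sample and meets the criteria, so its neighbors are included. Segment 4 meets the criteria, so its neighbor 6 is included. Segments 1 and 3 do not meet the criteria.]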

Final sample includes: 2 1 3 4 6
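The expansion rule can be sketched as a breadth-first search over the stream network. The adjacency used below is an assumed topology chosen to be consistent with the six-segment example (the figure itself is not recoverable), and the redd counts are made up; only the initial sample (segment 2) and the final sample (2, 1, 3, 4, 6) come from the slide.

```python
from collections import deque


def adaptive_sample(initial, neighbors, redds):
    """Expand an initial sample: whenever a sampled segment has redds,
    add its upstream and downstream neighbors, and repeat."""
    sampled = set()
    queue = deque(initial)
    while queue:
        segment = queue.popleft()
        if segment in sampled:
            continue
        sampled.add(segment)
        if redds.get(segment, 0) > 0:       # criterion: presence of redds
            for nb in neighbors.get(segment, ()):
                if nb not in sampled:
                    queue.append(nb)
    return sampled


# Assumed topology: segment 2 sits at a confluence with legs 3 and 4.
NEIGHBORS = {1: [2], 2: [1, 3, 4], 3: [2, 5], 4: [2, 6], 5: [3], 6: [4]}
# Hypothetical counts: only segments 2 and 4 hold redds.
REDDS = {2: 3, 4: 1}

if __name__ == "__main__":
    print(sorted(adaptive_sample([2], NEIGHBORS, REDDS)))
```

Segment 5 is never visited because its only neighbor, segment 3, has no redds, so the search stops there.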

15

Results: Normalized standard error of estimators

Design (run size)    20    83   424   661  1789  1730
SRS                39.0  18.6   9.9   8.2   7.3   7.1
Cluster (1km)      47.7  22.7  15.1  12.7  12.5  12.1
SYS                24.9  17.1   9.2   8.0   5.0   6.1
STRS (optimal)     27.2  15.3   8.6   7.2   6.7   6.2
ADAPT              39.3  18.8  10.5   7.6   8.7   8.4

Coverage Probability (.95)

Design (run size)    20    83   424   661  1789  1730
SRS                86.6  93.0  94.5  95.3  93.3  94.3
Cluster (1km)      89.0  91.2  92.5  92.6  92.9  94.3
SYS                96.4  95.2  97.0  95.0  99.0  97.3
STRS (optimal)     92.0  95.0  94.3  95.5  92.9  93.5
ADAPT              87.6  94.8  94.7  94.9  94.7  93.9

16

Costs

[Figure: costs by design (SRS, SRS-1km, SYS, STRS, Adaptive), with adaptive sampling shown by year ('95, '96, '97, '98, '01, '02); vertical axis 400 to 700.]

17

Precision per cost (10% sampling fraction)

big is good: high precision per km traveled

[Figure: precision per cost (vertical axis, 0.002 to 0.014) against run size (horizontal axis, 0 to 1500) for SRS, SRS-1km, SYS, STR, and ADP.]

18

Conclusions

• Stratifying by index results in the most precise estimates except in the large runs where systematic sampling seems to work best.

• The index sites should be oversampled in the stratified design. Proportional allocation (based on the size of the strata) results in poor precision.

• Although the systematic sampling strategy often is the most precise, there is not a good estimator for the variance. The estimator that assumes a simple random sample is conservative.

• Same pattern for different sampling fractions.

19

Conclusions

• The cluster sampling design is not very precise but reduces costs significantly.

• Adaptive cluster sampling is not as precise as other designs.

– It is optimal for rare, clustered populations

– During small years the redds are not clustered enough

– During large years they are not rare enough

– Only during the medium years does it compete with other designs.

• When cost and precision are analyzed together:

– small runs – either stratified by index or SRS-1km works best

– large runs – either systematic or stratified by index works best

20

Not yet finished

• EMAP-type design

• Successive difference variance estimator for the systematic sampling

• Adaptive sampling with the same initial sample size (the cost function does not penalize this much)

• Cost function

– including road travel

– crew trips/day units

21

Points vs. Lines

• Pick points – points are picked along the stream continuum and the measurement unit is constructed around the point

• advantages:

– different size measurement units are easily implemented

• disadvantages:

– difficulty with overlapping units

– inadvertent variable probability design because of confluences and headwaters

– analysis may be complicated

• Pick segments – the universe is segmented before sampling and segments are picked from the population of segments

• advantages:

– simple to implement

– simple estimators

• disadvantages:

– difficult frame construction before sampling

– cannot accommodate varying lengths of sampling unit

22

Adaptive Cluster Sampling

• Use the draw-by-draw probability estimator:

– Let w_i be the average number of redds in the network to which segment i belongs, then

– with variance

Thompson 1992
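A sketch of the estimator in code, assuming the slide refers to the standard form based on the initial simple random sample of n out of N segments: the estimated total is N times the mean of the w_i, and its estimated variance is N(N - n) s_w^2 / n, where s_w^2 is the sample variance of the w_i. The function name `acs_total` and the numbers in the demo are illustrative only.

```python
def acs_total(N, w):
    """Estimate the population total from network means w_i.

    N: number of segments in the population.
    w: one value per segment in the initial sample; w_i is the mean count
       over the network containing segment i (a segment without redds is
       its own network, so w_i is simply 0 there).
    Returns (estimated total, estimated variance of the total).
    """
    n = len(w)
    mean_w = sum(w) / n
    total_hat = N * mean_w
    s2_w = sum((wi - mean_w) ** 2 for wi in w) / (n - 1)
    var_hat = N * (N - n) * s2_w / n
    return total_hat, var_hat


if __name__ == "__main__":
    # Hypothetical initial sample of four segments from 100.
    print(acs_total(100, [0, 0, 2, 2]))
```

Segments added adaptively enter only through the network means of the initially sampled segments, which is why the formula looks like the one for simple random sampling.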

23

COSTS

• Our costs are based on the number of kilometers traveled by foot.

• Each segment in the MF is assigned to an access point, and the distance along the stream from that access point is calculated. The assignment is not optimized: in some rare instances the assigned access point is not the closest.

• There are two types of access points: air fields and trailheads. For this exercise they both have the same price.

• Because we are tallying the number of km along the streams, this cost function also models other types of sampling, including via helicopter and raft.

24

Six years

1995 1996

25

Six years

1997 1998

26

Six years

2001 2002

27

Access to MFSR

• Roadless area

• Airplane access possible

28

29

air vs. car access

30

Index sample

• Not sure how to build estimates for the total number of redds in the Middle Fork.

– expand the current estimator (assume the same density outside of the index)

– use the current estimate (assume 0 redds outside of the index)

Year                     1995  1996  1997  1998  2001  2002
Number counted in index    19    62   290   448  1178  1199
Total number of redds      20    83   424   661  1789  1730
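The two options can be made concrete with the counts in the table above. The first function shows how much of the true total the raw index count captures each year (the "assume 0 redds outside" option). The second shows the inflation option; the total number of segments it needs is not given in the slides, so the value used in the demo is purely hypothetical.

```python
YEARS = [1995, 1996, 1997, 1998, 2001, 2002]
INDEX_COUNT = [19, 62, 290, 448, 1178, 1199]
TOTAL_REDDS = [20, 83, 424, 661, 1789, 1730]


def index_share(index_count, total_redds):
    """Fraction of the true total found inside the index reaches."""
    return [i / t for i, t in zip(index_count, total_redds)]


def inflated_total(index_count, n_index_segments, n_all_segments):
    """Expand the index count, assuming equal density outside the index."""
    return index_count * n_all_segments / n_index_segments


if __name__ == "__main__":
    for year, share in zip(YEARS, index_share(INDEX_COUNT, TOTAL_REDDS)):
        print(year, round(share, 2))
    # 976 index segments is from the slides; 1952 total is hypothetical.
    print(inflated_total(19, 976, 1952))
```

The share falls from about 0.95 in the smallest run to roughly two thirds in the larger runs, so the size of the underestimate depends on run size.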

31

32

Stratify by Index

• Oversample index sites where most redds are located

• Simple random sample in each stratum

• Equal allocation:

Year          1995    1996    1997    1998    2001    2002
Std. error    5.33   12.68   36.61   47.34  121.43  106.98
Coverage      90.4    94.6    94.2    94.8    92.9    93.4

• Proportional allocation:

Year          1995    1996    1997    1998    2001    2002
Std. error    7.77   15.26   41.08   52.37  124.90  115.56
Coverage      88.0    94.7    95.0    94.9    94.4    93.6

33

Stratify by index

• Optimal allocation

• Using

Year                 1995  1996  1997  1998  2001  2002
Proportion in index  0.76  0.54  0.48  0.48  0.42  0.46
n index               746   530   475   464   407   445
n other               230   446   501   512   569   531

Year          1995    1996    1997    1998    2001    2002
Std. error    5.49   12.76   36.60   47.26  120.50  106.58
Coverage      92.0    95.0    94.3    95.5    92.9    93.5
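The slides do not define "normalized" standard error. One reading, sketched below, is the standard error expressed as a percentage of the true total. Under the assumption that the first row of values above (5.49, 12.76, ...) is the standard error for optimal allocation, this reading agrees with the STRS (optimal) row of the results table to within a few tenths of a percentage point. Both the definition and the interpretation of that row are inferences, not statements from the source.

```python
TOTAL_REDDS = [20, 83, 424, 661, 1789, 1730]
# Values from the optimal-allocation table, assumed to be standard errors.
SE_OPTIMAL = [5.49, 12.76, 36.60, 47.26, 120.50, 106.58]
# STRS (optimal) row of the normalized standard error results table.
REPORTED = [27.2, 15.3, 8.6, 7.2, 6.7, 6.2]


def normalized_se(se, total):
    """Standard error as a percentage of the true total."""
    return 100.0 * se / total


if __name__ == "__main__":
    for se, total, reported in zip(SE_OPTIMAL, TOTAL_REDDS, REPORTED):
        print(round(normalized_se(se, total), 2), reported)
```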

34

Stratify by index

• Using

Year     1995  1996  1997  1998  2001  2002
n index   746   530   475   464   407   445
n other   230   446   501   512   569   531
