1
Sampling designs for spawning data on
The Middle Fork Salmon River
*a lot like the Middle Fork Salmon R.
• What sampling design should be used for estimating the number of chinook redds on a river network*?
– Estimation of status: the number of spring-chinook redds in the Middle Fork Salmon River in one year
– Measurement design: not considered here; we assume there is some way to identify and count redds once a crew reaches a location.
2
The Middle Fork Salmon River
3
[Figure: plot by year (1995-1998, 2001, 2002); vertical axis 0 to 1500; legend: Index, Other]
Redd data – “the Truth”
• IDFG dataset (Russ Thurow): the number of redds in the Middle Fork Salmon River was counted by helicopter
• All spawning reaches were censused each year
• Sampling was done by helicopter and, where necessary, on foot
• Six years of data: 1995-1998, 2001, and 2002
• These data can be considered the truth
year 1995 1996 1997 1998 2001 2002
Total redds 20 83 424 661 1789 1730
4
Objectives
• Compare several designs to see if one estimates the number of redds (and only redds) the best
– unbiased designs (estimators)
– “best” determined by
• standard error of the estimator
• coverage probability (how often the 95% confidence interval actually contains the true number of redds)
• cost
– Keep things fair by sampling the same total length of stream: the index covers 976 segments, or 195.2 km of stream.
• Equal length does not imply equal cost
• Although some standard errors can be calculated analytically, coverage needs to be addressed via simulation.
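A minimal sketch of how coverage could be checked by simulation, assuming simple random sampling without replacement and the usual expansion estimator of the total. The population below is hypothetical (it is not the IDFG data), and the function names are illustrative only.

```python
import math
import random

def srs_total_estimate(sample, N):
    """Expansion estimate of the total and its estimated standard error
    for a simple random sample drawn without replacement from N units."""
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((y - mean) ** 2 for y in sample) / (n - 1)
    total = N * mean
    # Finite population correction (1 - n/N) applied to the variance.
    se = N * math.sqrt((1 - n / N) * s2 / n)
    return total, se

def simulate_coverage(population, n, reps=2000, z=1.96, seed=1):
    """Fraction of repeated samples whose normal-theory interval
    (estimate +/- z * se) contains the true population total."""
    rng = random.Random(seed)
    N = len(population)
    true_total = sum(population)
    hits = 0
    for _ in range(reps):
        sample = rng.sample(population, n)
        est, se = srs_total_estimate(sample, N)
        if est - z * se <= true_total <= est + z * se:
            hits += 1
    return hits / reps

# Hypothetical population: 1000 segments, most of them without redds.
pop_rng = random.Random(0)
population = [pop_rng.choice([0, 0, 0, 0, 1, 2, 5]) for _ in range(1000)]
coverage = simulate_coverage(population, n=100)
print(f"Empirical coverage of the nominal 95% interval: {coverage:.3f}")
```

The same loop works for any design and estimator pair: only the sampling step and the estimate function change.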
5
Methods
• Compare sampling strategies using IDFG data as the truth.
• A sampling strategy includes both a sampling design and an estimator.
[Diagram: sample design; estimator for the total and confidence interval]
6
[Figure: histogram; horizontal axis 1400 to 2200; vertical axis Frequency, 0 to 250]
Methods
• Use simulation by resampling the population over and over
7
Cost & Crew-trips
• Each segment is assigned an access point.
• Travel to access sites depends on the mode of access:
– airplane
– automobile
• Travel from an access site to the sampling reaches is the maximum distance from the access site to the furthest sampling reach in each “direction” along a tributary
• Cost = Fn(km by foot)
• 4 round trips required
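One way the foot-travel distance described above might be tallied, as a rough sketch. The record layout (access point, direction, km from access point), the site names, and the doubling for a round trip are all assumptions made for illustration; the "4 round trips" note is not modeled.

```python
from collections import defaultdict

def foot_km(sampled_reaches, round_trip=True):
    """Sum, over every (access point, direction) pair, the distance to the
    furthest sampled reach. Reaches closer than the furthest one are passed
    on the way, so they add no extra distance.

    sampled_reaches: iterable of (access_point, direction, km_from_access).
    """
    furthest = defaultdict(float)
    for access, direction, km in sampled_reaches:
        key = (access, direction)
        furthest[key] = max(furthest[key], km)
    one_way = sum(furthest.values())
    # Assumption: crews walk out and back along the same route.
    return 2 * one_way if round_trip else one_way

# Hypothetical sample of four reaches served by two access points.
sample = [
    ("airstrip_A", "upstream", 3.2),
    ("airstrip_A", "upstream", 7.8),
    ("airstrip_A", "downstream", 1.4),
    ("trailhead_B", "upstream", 12.0),
]
total_km = foot_km(sample)
print(f"Foot travel: {total_km:.1f} km")
```

Because only the furthest reach in each direction counts, designs that cluster reaches near one another cost less per reach than designs that scatter them.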
8
Distances are shown in 5 km intervals.
Many areas require a hike of over 20 km.
Maximum distance is 33 km.
9
The sampling designs
• Index – sample the index reaches
• Simple random sampling – using the unbiased estimator
• Systematic sampling – sort tributaries in random order, systematically sample along the resulting line.
• Stratify by Index – sample independently within and outside the index regions.
• Adaptive cluster sampling – choose segments with a simple random sample. If sampled sites have redds, sample adjacent segments.
• Spatially balanced design – based on the EMAP design, though selecting segments within primary sampling units rather than points (not yet implemented)
10
Index sampling
• When the sample size is smaller than the overall size of the index region, a simple random sample of the segments within the index is assumed.
• Two possibilities to estimate the number of redds from the index sample:
1. Assume there are no redds outside of the index – estimates will be too small.
2. Assume that the average number of redds per segment outside the index is the same as inside, and simply inflate the index estimator – estimates will be too large.
11
[Figure: Bias of Inflating Estimator from Index Sample. Estimates and true values (*) by year, 1995 to 2002; vertical axis: Redds, 0 to 5000]
12
Systematic sampling
• Order the tributaries in random order along a line
• Choose sampling interval, k, so that final sample size is approximately n
• Select a random number, r, between 1 and k
• Sample reaches r, r+k, r+2k, …, r+(n-1)k
• Systematic sampling is cluster sampling where clusters are made up of units far apart in space and one cluster is sampled
[Diagram: a line of reaches with sampling interval k; sampled positions r, r+k, r+2k, r+3k, r+4k]
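The selection steps for systematic sampling can be sketched as follows. The tributary names and segment labels are hypothetical, and choosing k as the integer part of N/n is one simple assumption for getting a sample size of approximately n.

```python
import random

def systematic_sample(tributaries, n, seed=None):
    """Lay the tributaries end to end in random order, then take every k-th
    segment starting from a random position r between 1 and k.

    tributaries: dict mapping tributary name -> list of segments in stream order.
    Returns (sampled segments, k, r).
    """
    rng = random.Random(seed)
    names = list(tributaries)
    rng.shuffle(names)                      # random order of tributaries
    line = [seg for name in names for seg in tributaries[name]]
    k = max(1, len(line) // n)              # sampling interval
    r = rng.randint(1, k)                   # random start, 1-based
    return line[r - 1::k], k, r

# Hypothetical network: five tributaries with ten segments each.
tributaries = {name: [f"{name}-{i}" for i in range(1, 11)] for name in "ABCDE"}
sample, k, r = systematic_sample(tributaries, n=10, seed=42)
print(k, r, sample)
```

When the number of segments is not a multiple of k, the realized sample size depends on r and can exceed n, which is why the slide says "approximately n".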
13
Stratify by Index
• Stratify by index and oversample index reaches
• Simple random sample in each stratum
• Allocation:
– Equal allocation: usually does not perform well
– Proportional allocation: does not oversample index sites, so will probably not have good precision
– Optimal allocation: need to know the standard deviation

year                  1995  1996  1997  1998  2001  2002
proportion in index   0.76  0.54  0.48  0.48  0.42  0.46
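A sketch of why optimal allocation needs the standard deviations, assuming "optimal" means Neyman allocation with equal per-unit cost in both strata (the slides do not state the formula used). Under that assumption the sample in stratum h is proportional to N_h times S_h. The stratum sizes and standard deviations below are hypothetical.

```python
def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Split a total sample of n across strata in proportion to
    (stratum size) x (stratum standard deviation).

    Returns fractional sample sizes; rounding and capping at the
    stratum size are left to the caller.
    """
    weights = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}

# Hypothetical strata: the "other" stratum is larger but much less variable.
sizes = {"index": 100, "other": 300}
sds = {"index": 3.0, "other": 1.0}
alloc = neyman_allocation(100, sizes, sds)
print(alloc)
```

With these numbers the index stratum holds a quarter of the segments but receives half of the sample, which is the oversampling of index reaches described above.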
14
Adaptive cluster sampling
• Original sample is simple random sample
• If a sampled site meets the criteria, also sample sites in its neighborhood
– Criteria: presence of redds
– Neighborhood: segments directly upstream and downstream
• Continue until sites do not meet criteria
– Both legs of confluences
[Diagram: stream network with segments 1 to 6. Segment 2 is in the original sample and meets the criteria, so its neighbors are included. Segment 4 meets the criteria. Segments 6, 1 and 3 do not meet the criteria.]
Final sample includes: 2 1 3 4 6
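The expansion rule can be sketched as a breadth-first search over the stream network. The adjacency and redd counts below are hypothetical, chosen only so that the result matches the final sample listed on the slide; the actual layout of the diagram is not known.

```python
from collections import deque

def adaptive_cluster_sample(initial, neighbors, redds):
    """Start from the initial sample; whenever a sampled segment has redds,
    add its upstream and downstream neighbors, and repeat until no newly
    added segment meets the criterion."""
    sampled = set()
    queue = deque(initial)
    while queue:
        seg = queue.popleft()
        if seg in sampled:
            continue
        sampled.add(seg)
        if redds[seg] > 0:                  # criterion: presence of redds
            queue.extend(nb for nb in neighbors[seg] if nb not in sampled)
    return sampled

# Hypothetical network: segments 1 and 3 join at a confluence into 2,
# which flows into 4, then 6, then 5.
neighbors = {1: [2], 3: [2], 2: [1, 3, 4], 4: [2, 6], 6: [4, 5], 5: [6]}
redds = {1: 0, 2: 3, 3: 0, 4: 1, 5: 0, 6: 0}
final = adaptive_cluster_sample([2], neighbors, redds)
print(sorted(final))
```

Segment 5 is never visited because its only neighbor, segment 6, has no redds and so does not trigger further expansion.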
15
Results: Normalized standard error of estimators, by run size

Design            20     83    424    661   1789   1730
SRS             39.0   18.6    9.9    8.2    7.3    7.1
Cluster (1km)   47.7   22.7   15.1   12.7   12.5   12.1
SYS             24.9   17.1    9.2    8.0    5.0    6.1
STRS (optimal)  27.2   15.3    8.6    7.2    6.7    6.2
ADAPT           39.3   18.8   10.5    7.6    8.7    8.4

Coverage Probability (.95), by run size

Design            20     83    424    661   1789   1730
SRS             86.6   93.0   94.5   95.3   93.3   94.3
Cluster (1km)   89.0   91.2   92.5   92.6   92.9   94.3
SYS             96.4   95.2   97.0   95.0   99.0   97.3
STRS (optimal)  92.0   95.0   94.3   95.5   92.9   93.5
ADAPT           87.6   94.8   94.7   94.9   94.7   93.9
16
Costs
[Figure: costs for SRS, SRS-1km, SYS, STRS and Adaptive; adaptive sampling shown for ‘95, ‘96, ‘97, ‘98, ‘01 and ‘02; axis ticks 400 to 700]
17
Precision per cost (10% sampling fraction)
Larger is better: high precision per km traveled
[Figure: precision per cost (0.002 to 0.014) against run size (0 to 1500) for SRS, SRS-1km, SYS, STR and ADP]
18
Conclusions
• Stratifying by index results in the most precise estimates except in the large runs where systematic sampling seems to work best.
• The index sites should be oversampled in the stratified design. Proportional allocation (based on the size of the strata) results in poor precision.
• Although the systematic sampling strategy often is the most precise, there is not a good estimator for the variance. The estimator that assumes a simple random sample is conservative.
• The same pattern holds for different sampling fractions.
19
Conclusions
• The cluster sampling design is not very precise but reduces costs significantly.
• Adaptive cluster sampling is not as precise as other designs.
– It is optimal for rare clustered populations
– during small years the redds are not clustered enough
– during large years they are not rare enough
– only during the medium years does it compete with other designs.
• When cost and precision are analyzed together
– small runs – either stratified by index or SRS-1km work best
– large runs – either systematic or stratified by index work best
20
Not yet finished
• EMAP type design.
• Successive difference variance estimator for the systematic sampling
• Adaptive sampling with same initial sample size (cost function does not penalize this much)
• Cost function
– including road travel
– crew trips/day units
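For the successive difference item in the list above, a minimal sketch of the estimator in its common textbook form: the variance is built from squared differences between neighboring sampled units along the systematic line. The exact variant the authors intend is not stated, so the form and the function name here are assumptions.

```python
import math

def successive_difference_se(ordered_counts, N):
    """Standard error of the estimated total from a systematic sample,
    using squared differences between consecutive sampled units.

    ordered_counts: redd counts in the order the units fall along the line.
    N: number of units in the population.
    """
    n = len(ordered_counts)
    if n < 2:
        raise ValueError("need at least two sampled units")
    ssd = sum((b - a) ** 2 for a, b in zip(ordered_counts, ordered_counts[1:]))
    # Variance of the sample mean, with finite population correction.
    var_mean = (1 - n / N) * ssd / (2 * n * (n - 1))
    return N * math.sqrt(var_mean)

# Hypothetical counts from four sampled segments out of eight.
se = successive_difference_se([0, 2, 0, 2], N=8)
print(se)
```

Because it uses only local differences, this estimator is less inflated by smooth trends along the line than the simple random sample formula.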
21
Points vs. Lines
• Pick points – points are picked along the stream continuum and the measurement unit is constructed around the point
• advantages:
– different size measurement units are easily implemented
• disadvantages:
– difficulty with overlapping units
– inadvertent variable probability design because of confluences and headwaters
– analysis may be complicated
• Pick segments – the universe is segmented before sampling and segments are picked from the population of segments
• advantages:
– simple to implement
– simple estimators
• disadvantages:
– difficult frame construction before sampling
– cannot accommodate varying lengths of sampling unit
22
Adaptive Cluster Sampling
• Use the draw-by-draw probability estimator:
– Let w_i be the average number of redds in the network to which segment i belongs, then
[Equation: estimator of the total]
– with variance
[Equation: variance of the estimator]
Thompson 1992
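The formulas on this slide did not survive, so the sketch below uses the form usually given for this kind of estimator in the adaptive cluster sampling literature: the total is estimated by N times the mean of the w_i over the initial sample, and the variance by N(N-n)/n times the sample variance of the w_i. Treat this as an assumption about what the slide showed; the notation may differ.

```python
def acs_total_estimate(network_means, N):
    """Estimate the population total from an adaptive cluster sample.

    network_means: one value w_i per segment in the INITIAL sample, where
        w_i is the average number of redds over the network containing
        segment i (a segment without redds is its own network, so w_i = 0).
    N: number of segments in the population.
    Returns (estimated total, estimated variance of the total).
    """
    n = len(network_means)
    w_bar = sum(network_means) / n
    total = N * w_bar
    s2 = sum((w - w_bar) ** 2 for w in network_means) / (n - 1)
    var = N * (N - n) * s2 / n
    return total, var

# Hypothetical initial sample of three segments from a population of 30.
total, var = acs_total_estimate([0, 2, 4], N=30)
print(total, var)
```

Edge segments that were visited only because a neighbor had redds do not enter the estimate unless they were also in the initial sample.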
23
COSTS
• Our costs are based on the number of kilometers traveled by foot.
• Each segment in the MF is assigned to an access point, and the distance along the stream from that access point is calculated. The assignment is not optimized: in some rare instances the assigned access point is not the closest.
• There are two types of access points: airfields and trailheads. For this exercise they both have the same price.
• Because we are tallying the number of km along the streams, this cost function also models other types of sampling, including via helicopter and raft.
24
Six years
1995 1996
25
Six years
1997 1998
26
Six years
2001 2002
27
Access to MFSR
• Roadless area
• Airplane access possible
28
29
air vs. car access
30
Index sample
• Not sure how to build estimates for the total number of redds in the Middle Fork.
– expand current estimator (assume same density outside of index)
– use current estimate (assume 0 redds outside of index)

year                      1995  1996  1997  1998  2001  2002
Number counted in Index     19    62   290   448  1178  1199
Total number of redds       20    83   424   661  1789  1730
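The size of the shortfall under the "0 redds outside of index" assumption follows directly from the two rows of the table. A short calculation, using only the counts listed above:

```python
years = [1995, 1996, 1997, 1998, 2001, 2002]
index_counts = [19, 62, 290, 448, 1178, 1199]
total_redds = [20, 83, 424, 661, 1789, 1730]

# Share of all redds that fall inside the index reaches, by year.
shares = {y: i / t for y, i, t in zip(years, index_counts, total_redds)}

for year, share in shares.items():
    print(f"{year}: index holds {share:.1%} of redds, "
          f"shortfall if none assumed outside: {1 - share:.1%}")
```

The index share is highest in the smallest run (1995) and falls to roughly two thirds in the larger runs, so the index count alone misses proportionally more redds in large-run years.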
31
32
Stratify by Index
• Oversample index sites where most redds are located
• Simple random sample in each stratum
• Equal allocation:

year                  1995   1996   1997   1998    2001    2002
std. error of total   5.33  12.68  36.61  47.34  121.43  106.98
coverage              90.4   94.6   94.2   94.8    92.9    93.4

• Proportional allocation:

year                  1995   1996   1997   1998    2001    2002
std. error of total   7.77  15.26  41.08  52.37  124.90  115.56
coverage              88.0   94.7   95.0   94.9    94.4    93.6
33
Stratify by index
• Optimal allocation
• Using

year                  1995  1996  1997  1998  2001  2002
proportion in index   0.76  0.54  0.48  0.48  0.42  0.46
n index                746   530   475   464   407   445
n other                230   446   501   512   569   531

year                  1995   1996   1997   1998    2001    2002
std. error of total   5.49  12.76  36.60  47.26  120.50  106.58
coverage              92.0   95.0   94.3   95.5    92.9    93.5
34
Stratify by index
• Using

year      1995  1996  1997  1998  2001  2002
n index    746   530   475   464   407   445
n other    230   446   501   512   569   531