Assessing the impact of errors in sorting and identifying macroinvertebrate samples
Peter Haase1,*,†, John Murray-Bligh2, Susanne Lohse1, Steffen Pauls1, Andrea Sundermann1,†, Rick Gunn3 & Ralph Clarke3
1 Department of Limnology and Conservation Research, Senckenberg – Research Institute and Natural History Museum, Clamecystraße 12, 63571 Gelnhausen, Germany
2 Environment Agency, Manley House, Kestrel Way, EX6 8EX Exeter, UK
3 CEH Dorset, Winfrith Technology Centre, Winfrith Newburgh, Dorchester, DT2 8ZD Dorset, UK
(*Author for correspondence: E-mail: [email protected])
Key words: stream assessment, error estimation, sample sorting, macroinvertebrate identification
Abstract
This study assesses the impact of errors in sorting and identifying macroinvertebrate samples collected and analysed using different protocols (e.g. STAR-AQEM, RIVPACS). The study is based on the auditing scheme implemented in the EU-funded project STAR and presents the first attempt at analysing the audit data. Data from 10 participating countries are analysed with regard to the impact of sorting and identification errors. These differences are measured in the form of gains and losses at each level of audit for 120 samples. Based on gains and losses to the primary results, qualitative binary taxa lists were derived for each level of audit for a subset of 72 data sets. Between these taxa lists the taxonomic similarity and the impact of differences on selected metrics common to stream assessment were analysed. The results of our study indicate that in all methods used, a considerable amount of sorting and identification error could be detected. This total impact is reflected in most functional metrics. In some metrics indicative of taxonomic richness, the total impact of differences is not directly reflected in differences in metric scores. The results stress the importance of implementing quality control mechanisms in macroinvertebrate assessment schemes.
Introduction
All assessments of the ecological status of a river site based on biological samples are subject to uncertainty and errors. Biological surveys can only detect a change in river quality when the difference in the results before and after change is greater than the uncertainty caused by natural variability and human error. In this paper, we explore the size of these errors in survey and analytical methods for assessing river quality that are used throughout Europe. Error is rarely measured in monitoring surveys, or it is considered negligible because it is often assumed to be small and constant. If this assumption is incorrect, there is a high risk that conclusions drawn from such surveys could be wrong.
Most quantitative assessments of the biological status of water bodies are based on the values of biological indices or metrics derived from the taxonomic composition of the sample, where the metric is intended to measure some specific aspect or general feature of the biota (Cao et al., 2003; Bohmer et al., 2004; Hering et al., 2004a, b). These measures are of little value without knowing their degree of uncertainty (Clarke, 2000; Clarke et al., 2002). This is because differences in river quality can only be confirmed when they exceed the uncertainty inherent in the data. Uncertainty is caused both by the natural variability of the biota used to evaluate river quality and by human error introduced by the analyst. It arises from every stage of data collection, from sampling (e.g. Carter & Resh, 2001; Clarke et al., 2002; Ostermiller & Hawkins, 2004) to sample analysis and data handling (e.g. Doberstein et al., 2000; Haase et al., 2004a, b). The sources of this uncertainty must be identified so that they can be reduced and accounted for when results are evaluated. This study focuses on the two major sources of analytical error: sorting error and identification error.
†These authors contributed equally to this work.
Hydrobiologia (2006) 566:505–521. © Springer 2006. M.T. Furse, D. Hering, K. Brabec, A. Buffagni, L. Sandin & P.F.M. Verdonschot (eds), The Ecological Status of European Rivers: Evaluation and Intercalibration of Assessment Methods. DOI 10.1007/s10750-006-0075-6
The EU Water Framework Directive (EU-WFD) (European Union, 2000) requires the level of confidence and precision of results provided by monitoring programmes to be given in the River Basin Management Plans (EU-WFD: Annexe V, Section 1.3). As with all ecological analyses, it is more important to have moderate errors that have been quantified than to have small errors but no estimate of their magnitude. The former allows the significance of any differences to be determined whereas the latter does not. In most member states, monitoring for the EU-WFD will be undertaken by environmental protection agencies or commercial environmental laboratories, and in some cases by research laboratories. Irrespective of who does the analysis, it is impossible to eliminate all errors from data based on field survey and laboratory analysis. Therefore it is essential to understand how error can be quantified and minimised and to provide tools to assess these errors.
One aim of the EU-funded STAR1 project was to identify and quantify different sources of error that affect metrics and thus assessment results (Furse et al., 2006). Within the project, the uncertainty associated with site selection, natural variability within a site or between seasons, different sub-sampling strategies, and human error in sample processing were studied (Furse et al., 2006). To evaluate the error of sample processing, a sorting and identification audit was implemented for two major biological quality assessment components, macroinvertebrates and diatoms. For invertebrates, the emphasis was placed on qualitative sorting and identification errors inherent in the laboratory treatment of invertebrate samples.
In the present study, pre-audit and post-audit macroinvertebrate taxa lists and resulting metric values are compared, based on samples collected following standardised sampling and processing protocols. The focus of this study thus lies on determining differences between individual sources of error and how sorting and identification errors affect metrics commonly used in river quality assessment. The results from other sources of error, such as replicate sampling (Clarke et al., 2006a), sub-sampling (Clarke et al., 2006a; Vlek et al., 2006) and natural variability like seasonal change (Sporka et al., 2006) are presented in other papers of this issue.
Materials and methods
Audit design
The auditing approach for macroinvertebrate samples applied in the STAR project involved two separate components: (1) a sorting audit at family level, undertaken by a single auditing laboratory, to assess sorting errors across the whole project in a consistent and unified way, and (2) an identification audit undertaken by partners familiar with analysing invertebrates from similar environments. Generally, these were laboratories from neighbouring countries from the same Ecoregion. This approach was chosen for the identification audit, because no laboratory involved in the project had sufficient experience in analysing all the species found in the geographic area covered by the project to undertake the identification audit for the whole project. Because of this, each partner's identification audit was done by one or more partners from neighbouring countries. Although this caused the quality of the identification audit to vary between partners, it ensured that audit results were more accurate. Throughout this paper, the terms primary sample, primary analyst and primary data relate to the main analysis of a sample, the terms audit sample and audit data to the re-analysis of a sample in the audit.
1 Standardisation of river classifications: Framework method for calibrating different biological survey results against ecological quality classifications to be developed for the Water Framework Directive, STAR. Contract No: EVK1-CT 2001-00089
Selecting audit samples
Macroinvertebrate samples of the STAR project sampling programme were taken at all sites by each participating partner using two different methods: (a) the STAR-AQEM method, a multi-habitat-sampling protocol developed within the STAR project (Furse et al., 2006) and (b) a 'national' method, which was normally a widely used protocol within the individual partner's member state (Furse et al., 2006). In Germany, Austria and Greece there were no existing common 'national' sampling protocols. As an alternative, the UK RIVPACS protocol was used (Murray-Bligh et al., 1997). Twelve samples from each of 10 countries (Nsort=120) were analysed in the sorting audit: 6 STAR-AQEM samples (NSA=60) and 6 collected and analysed by the corresponding national method (NNat=60) (Table 1). A subset of these samples was used for the identification audit. This subset comprised the 12 samples from each of the 6 countries for which RIVPACS or the RIVPACS-comparable PERLA was the national method (from here on referred to as RIVPACS/PERLA), to complement the data available from the STAR-AQEM data sets. This allowed for a comparison of different methods based on a reasonable sample size for both STAR-AQEM and RIVPACS/PERLA samples (N=72 across both methods) (Table 1).
Primary analysts were aware that all 1090 invertebrate samples collected for the STAR project were potentially subject to audit. A partner not involved in any primary analyses selected the samples for the audit randomly. Audit samples were selected roughly evenly between seasons and included samples representing high, good and moderate ecological quality (pre-classification based on expert judgement). For each combination of site and season, one sample collected by the STAR-AQEM protocol and one sample collected by the national survey protocol was chosen for audit. Partners were not told which samples were selected for audit until all the primary data had been entered into the STAR database, AQEMdip (AQEM Consortium, 2004) so that the primary data could not be altered after the audit samples had been selected. This ensured that primary analysts could not give any special attention to audit samples and the audit results would therefore reflect the quality of all the primary analyses.
Audit procedure
Sorting audit
When sorting STAR-AQEM samples, the primary analyst had to remove all specimens from the sub-sample. After analysis, the sorted specimens were placed in a labelled vial or jar containing preservative and stored for the identification audit (see Furse et al., 2006 for a detailed sampling and sorting protocol). All organic and inorganic material from the sorted STAR-AQEM sub-sample, together with any animals remaining in it, was returned to a jar with preservative for the sorting audit. In the sorting audit, the auditors re-sorted the whole sub-sample, removing any animals they found and placing them in a new, labelled vial. The only identification undertaken by the sorting auditors was to identify any additional families that were not recorded by the primary analyst, which were then recorded as gains.
For national methods protocols that did not demand that all specimens were removed during sorting, the primary analysts had to remove up to three representatives (but not every specimen) of every taxon for the identification audit (see Furse et al., 2006 for detailed protocols of the sampling and sorting procedures used in the STAR project). The taxa were based on the taxonomic level of the primary analysis: if the identification was to family level, the taxa removed were families; if the sample was analysed to species level, the taxa removed were species (Table 1). The specimens removed had to be good quality examples and not simply the first ones that the analysts found in the sample. The sorting auditors re-sorted the sample and removed from it all specimens of families missed by the primary analyst. They also removed up to three good quality specimens of every potentially different species that they found in the sample. The auditors put these specimens in a vial with preservative.
Table 1. Samples used in the present study. "National method" refers to the sampling and sorting protocol applied in the respective country for the national method samples, "National method category" refers to the method category into which the protocol was placed for the selection of the identification audit subset of samples

| Country | STAR-AQEM sample* | National method sample* | National method | National method category | Season | River | Site | STAR Site No. | Sample date | Sorting audit | ID audit | ID audit analyses |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Austria | A0500261 | A0500431 | RIV | RIV/PER | Spring | Sarmingbach | Wolfsschlucht | 600 | 16.04.2003 | F | S | X |
| Austria | A0500291 | A0500461 | RIV | RIV/PER | Spring | Grosse Ysper | near Altenmarkt | 603 | 16.04.2003 | F | S | X |
| Austria | A0500332 | A0500502 | RIV | RIV/PER | Summer | Sarmingbach | Waldhausen | 607 | 09.07.2002 | F | S | X |
| Austria | A0600141 | A0600341 | RIV | RIV/PER | Spring | Wildbach | near Kramermirtl | 701 | 28.05.2003 | F | S | X |
| Austria | A0600192 | A0600392 | RIV | RIV/PER | Summer | Stullneggbach | near Aichegg | 706 | 30.07.2002 | F | S | X |
| Austria | A0600232 | A0600432 | RIV | RIV/PER | Summer | Stullneggbach | near Mainsdorf | 708 | 30.07.2002 | F | S | X |
| Czech Republic | C0401621 | C0403561 | PERLA | RIV/PER | Spring | Velka Hana | Rychtarov | 614 | 04.04.2003 | F | S | X |
| Czech Republic | C0401701 | C0403631 | PERLA | RIV/PER | Spring | Nectava | Brezinky | 620 | 27.03.2003 | F | S | X |
| Czech Republic | C0401172 | C0403152 | PERLA | RIV/PER | Summer | Umori | Zbraslavec | 625 | 19.07.2002 | F | S | X |
| Czech Republic | C0501212 | C0503182 | PERLA | RIV/PER | Summer | Huntava | Valsovsky dul | 713 | 26.07.2002 | F | S | X |
| Czech Republic | C0501272 | C0503232 | PERLA | RIV/PER | Summer | Luha | Sloup | 717 | 22.07.2002 | F | S | X |
| Czech Republic | C0501941 | C0503831 | PERLA | RIV/PER | Spring | Trebuvka | Borsov | 722 | 09.04.2003 | F | S | X |
| Denmark | K0201011 | K0201021 | DSFI | Nat | Spring | Karstoft | Noerre Grene | 662 | 01.04.2003 | F | S | |
| Denmark | K0202012 | K0202022 | DSFI | Nat | Summer | Mattrup | Stids Moelle | 663 | 06.08.2002 | F | S | |
| Denmark | K0206011 | K0206021 | DSFI | Nat | Spring | Kastbjerg | Edderup | 667 | 01.04.2003 | F | S | |
| Denmark | K0207012 | K0207022 | DSFI | Nat | Summer | Fjederholt | Okkels | 668 | 08.08.2002 | F | S | |
| Denmark | K0209011 | K0209021 | DSFI | Nat | Spring | Skibsted | Skibstedbro | 670 | 07.04.2003 | F | S | |
| Denmark | K0210012 | K0210022 | DSFI | Nat | Summer | Skals | Faarup | 671 | 12.08.2002 | F | S | |
| France | F0800013 | F0800193 | IBGN | Nat | Autumn | Aube | Aubepierre-sur-Aube | 724 | 25.09.2002 | F | F | |
| France | F0800021 | F0800201 | IBGN | Nat | Spring | Seine | Ermitage du Val de Seine | 725 | 15.04.2003 | F | F | |
| France | F0800041 | F0800221 | IBGN | Nat | Spring | Aujon | upstream of Giey-sur-Aujon | 726 | 25.05.2003 | F | F | |
| France | F0800063 | F0800243 | IBGN | Nat | Autumn | Ornain | downstream of Abainville | 728 | 30.09.2002 | F | F | |
| France | F0800073 | F0800253 | IBGN | Nat | Autumn | Meuse (Bassoncourt) | between Daillecourt & Bassoncourt | 729 | 10.10.2002 | F | F | |
| France | F0800111 | F0800291 | IBGN | Nat | Spring | Mouzon | Sartes | 733 | 09.04.2003 | F | F | |
| Germany | D0400392 | D0400512 | RIV | RIV/PER | Summer | Wehebach | Wehebachtalsperre | 627 | 29.06.2002 | F | S | X |
| Germany | D0400461 | D0400581 | RIV | RIV/PER | Spring | Salwey | Niedersalwey | 634 | 25.03.2003 | F | S | X |
| Germany | D0300202 | D0300352 | RIV | RIV/PER | Summer | Stepenitz | near Putlitz | 649 | 15.07.2002 | F | S | X |
| Germany | D0300201 | D0300351 | RIV | RIV/PER | Spring | Stepenitz | near Putlitz | 649 | 10.04.2003 | F | S | X |
| Germany | D0600022 | D0600122 | RIV | RIV/PER | Summer | Ilme | above Relliehausen | 816 | 21.06.2002 | F | S | X |
| Germany | D0600071 | D0600171 | RIV | RIV/PER | Spring | Klingbach | above Hausen | 821 | 11.03.2003 | F | S | X |
| Greece | H0400222 | H0400282 | RIV | RIV/PER | Summer | Peristeria | Artiki | 735 | 29.07.2002 | F | F | X |
| Greece | H0400242 | H0400302 | RIV | RIV/PER | Summer | Tsouraki | Tsouraki | 737 | 01.08.2002 | F | F | X |
| Greece | H0400051 | H0400151 | RIV | RIV/PER | Spring | Tsouraki | SL 98 | 738 | 21.05.2003 | F | F | X |
| Greece | H0400031 | H0400131 | RIV | RIV/PER | Spring | Krathis | Tsivlos | 739 | 22.05.2003 | F | F | X |
| Greece | H0400262 | H0400322 | RIV | RIV/PER | Summer | Gadouras | Gadouras | 753 | 24.08.2002 | F | F | X |
| Greece | H0400011 | H0400111 | RIV | RIV/PER | Spring | Gorgopotamos | Gorgopotamos Bridge | 756 | 18.05.2003 | F | F | X |
| Portugal | P0411313 | P0431313 | PMP | Nat | Autumn | Xevora | Xevora | 863 | 03.02.2003 | F | G | |
| Portugal | P0411133 | P0431133 | PMP | Nat | Autumn | Tripeiro | Tripeiro | 864 | 06.12.2002 | F | G | |
| Portugal | P0411121 | P0431121 | PMP | Nat | Spring | Taveiro | Taveiro | 865 | 13.05.2003 | F | G | |
| Portugal | P0411213 | P0431213 | PMP | Nat | Autumn | Alpreade | Alpreade | 866 | 05.12.2002 | F | G | |
| Portugal | P0411321 | P0431321 | PMP | Nat | Spring | Ponsul | Ponsul | 867 | 14.05.2003 | F | G | |
| Portugal | P0411221 | P0431221 | PMP | Nat | Spring | Basagueda | Basagueda | 868 | 14.05.2003 | F | G | |
| Slovakia | V0100473 | V0100483 | PERLA | RIV/PER | Autumn | Bystrica | pod Vel'kou skalou | 984 | 17.09.2003 | F | S | X |
| Slovakia | V0100493 | V0100503 | PERLA | RIV/PER | Autumn | Bystrica | Hornadomovina | 986 | 17.09.2003 | F | S | X |
| Slovakia | V0100513 | V0100523 | PERLA | RIV/PER | Autumn | Bystrica | Bystrieany | 987 | 17.09.2003 | F | S | X |
| Slovakia | V0100423 | V0100433 | PERLA | RIV/PER | Autumn | Hostiansky potok | pri Pod Javorom | 988 | 16.09.2003 | F | S | X |
| Slovakia | V0100443 | V0100453 | PERLA | RIV/PER | Autumn | Hostiansky potok | pod Obecnym vrchom | 989 | 16.09.2003 | F | S | X |
| Slovakia | V0100463 | V0100563 | PERLA | RIV/PER | Autumn | Hostiansky potok | nad Topoleiankami | 990 | 18.09.2003 | F | S | X |
| Sweden | S0501351 | S0502311 | Swedish | Nat | Spring | Nittalven | downstream of Nordtjarnsalven | 685 | 04.06.2003 | F | S | |
| Sweden | S0501063 | S0502023 | Swedish | Nat | Autumn | Savalven | upstream of Savefors | 689 | 23.10.2002 | F | S | |
| Sweden | S0501431 | S0502391 | Swedish | Nat | Spring | Horksalven | Brattforsen | 691 | 04.06.2003 | F | S | |
| Sweden | S0601193 | S0602153 | Swedish | Nat | Autumn | Forsmarksan | Johannisfors | 875 | 30.10.2002 | F | S | |
| Sweden | S0601293 | S0602253 | Swedish | Nat | Autumn | Hagaan | Lurbo | 876 | 19.11.2002 | F | S | |
| Sweden | S0601561 | S0602521 | Swedish | Nat | Spring | Stromaran | Hillebola | 878 | 22.05.2003 | F | S | |
| United Kingdom | U1510011 | U1510321 | RIV | RIV/PER | Spring | Ecchinswell Brook | Headley | 639 | 07.04.2003 | F | S | X |
| United Kingdom | U1510663 | U1510973 | RIV | RIV/PER | Autumn | Westbury Brook | Westbury | 642 | 08.10.2002 | F | S | X |
| United Kingdom | U1510101 | U1510411 | RIV | RIV/PER | Spring | Cliff Brook | Crowton | 648 | 13.04.2003 | F | S | X |
| United Kingdom | U2310763 | U2311073 | RIV | RIV/PER | Autumn | Clun | Marlow | 674 | 28.09.2002 | F | S | X |
| United Kingdom | U2310181 | U2310491 | RIV | RIV/PER | Spring | Ogmore | Bridgend | 678 | 09.04.2003 | F | S | X |
| United Kingdom | U2310833 | U2311143 | RIV | RIV/PER | Autumn | Sirhowy | Ynysddu | 681 | 27.09.2002 | F | S | X |
| No. of samples | NSA=60 | NNat=60 | | NRIV/PER=32 | | | | | | Nsort=120 | NID=72 | |

"STAR Site No." refers to the unique site code used throughout the STAR project. (SA = STAR-AQEM, RIV = RIVPACS, IBGN = French national method, DSFI = Danish Stream Fauna Index protocol, PERLA = Czech national method, Swedish = Swedish national method, PMP = Portuguese national method, NAT = other national methods; for further information on the different methods see Furse et al., 2005). Taxonomic level indicated for sorting audit and identification audit (F = family level, G = mainly genus level, S = mainly species level). "X" indicates samples used for subsequent analyses in the identification audit. *Refers to STAR samples code.
Identification audit
The identification audit was undertaken at the taxonomic level used for the calculation of the metrics by the primary analyst partner. For some partners, this was species, for others it was mixed taxonomic level (Table 1). Vials of specimens and material mounted on permanent microscope slides by the primary analysts were sent to the identification auditors. Temporary mounts could not be sent. The identification auditors used the same method of identification that they used for their primary analysis. Partners that used experts for their primary analyses used the same experts for auditing identifications. The identification auditors recorded a new list of taxa based on their identification of the vial(s) and slide mounts from the primary analyst. They recorded gains and losses, compared to the primary analyst's taxa list. Because it was impractical to undertake the identification audit quantitatively, the metrics used to compare the two samples were based on presence/absence data. A binary taxa list was created to allow a qualitative comparison of the results. The binary taxa lists comprised the primary analysts' results (from here on referred to as "primary" or "P"), the primary analysts' results plus further taxa observed at the sorting audit level (from here on referred to as "sorting audit" or "AS") and a taxa list based on the identification auditor's results (from here on referred to as "identification audit" or "AID"). AID is only based on taxa found in the primary analysis and does not consider taxa gained through the sorting audit. A last taxa list represents the combined results of the sorting and identification audits (from here on referred to as "total audit" or "ATOT"). By this design, it was possible to establish the effects of errors at each audit level independently and the cumulative error of both the sorting and identification audit.
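The construction of the four binary taxa lists can be sketched in Python. This is a minimal illustration, not the procedure used in the STAR database: the function name `build_taxa_lists` and the example taxa are hypothetical, all taxa are assumed to be recorded at one common taxonomic level, and ATOT is assumed to be the identification audit list plus the sorting gains.

```python
def build_taxa_lists(primary, sorting_gains, id_gains, id_losses):
    """Return the presence/absence taxa lists P, AS, AID and ATOT as sets.

    Hypothetical helper for illustration only.
    """
    p = set(primary)
    # AS: primary results plus taxa first found in the sorting audit.
    a_s = p | set(sorting_gains)
    # AID: primary results revised by the identification audit only
    # (losses removed, gains added); sorting gains are not considered.
    a_id = (p - set(id_losses)) | set(id_gains)
    # ATOT: assumed here to combine the identification audit list
    # with the sorting gains.
    a_tot = a_id | set(sorting_gains)
    return {"P": p, "AS": a_s, "AID": a_id, "ATOT": a_tot}


# Illustrative sample: one taxon re-identified, one taxon missed in sorting.
lists = build_taxa_lists(
    primary={"Baetis rhodani", "Gammarus pulex", "Hydropsyche siltalai"},
    sorting_gains={"Elmis aenea"},
    id_gains={"Hydropsyche pellucidula"},
    id_losses={"Hydropsyche siltalai"},
)
for name, taxa in lists.items():
    print(name, sorted(taxa))
```

Representing each list as a set keeps the data strictly qualitative, which matches the presence/absence basis of the audit.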
Audit analyses
Chironomidae, Nematoda and Oligochaeta taxa were not included in the audit. All sorting and identification audit results were based on qualitative errors only. Two parameters were used to measure analytical quality in these audits: the number of gains (taxa that were not recorded as being present in the sample but which the auditors found in the sample) and the number of losses (taxa that are recorded as being present but which were not found in the sample by the auditor). Gains (G) and losses (L) were identified by comparing the auditor's taxa list to that of the primary analyst. Only gains were recorded in the sorting audit. Losses and gains were recorded in the identification audit.
Neither the primary analyst's nor the identification auditor's species lists were considered to be definitive; they were considered simply as two views of the same data. Audit results were not used to correct the primary data.
The primary analysts calculated a range of metrics separately for the primary and audit sample analyses using the AQEM-STAR assessment software ASTERICS (www.eu-star.at). The differences in metric results for primary and audit samples were calculated and used to determine the effect of analytical errors on a selection of metrics that are commonly used in the member states to classify river quality (Hering et al., 2004). The audit results were qualitative and the metrics were calculated from the binary taxa lists, i.e. presence/absence data. For some of the selected metrics, which are normally based on quantitative data, e.g. Shannon–Wiener diversity index, this approach could only reflect the qualitative component of the error. Depending on the abundance structure of a sample, this approach may overestimate or underestimate the impact of differences in some metrics. However, this approach made it possible not only to test the effect of uncertainty on "counting" metrics, such as Number of Taxa, but also to get an idea of the uncertainty related to sorting and identification errors based on commonly used richness measures and functional metrics. The similarity between the primary and audit results was investigated by Jaccard similarity (Jaccard, 1901), which was calculated using PC-ORD Version 4.25 (McCune & Mefford, 1999). The statistical analysis of metric results included mean deviation and the spread of differences between primary and audit results. It was also possible to compare the performance of the STAR-AQEM and RIVPACS methods in regard to the deviation of metric results between primary and audit samples. Mann–Whitney U-tests (Mann & Whitney, 1947) were used to see whether deviation was larger using one particular method. The Wilcoxon Test (Wilcoxon, 1945) was used to see if differences between primary and audit samples were significant. All statistical analyses were performed in Statistica 6.1 (StatSoft, 2002).
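For presence/absence data, the Jaccard similarity of two taxa lists is the number of shared taxa divided by the number of taxa found in either list. The sketch below illustrates that definition only; it is not the PC-ORD implementation, the function name `jaccard_similarity` is hypothetical, and returning 1.0 for two empty lists is an assumption made here to avoid division by zero.

```python
def jaccard_similarity(list_a, list_b):
    """Jaccard similarity of two presence/absence taxa lists.

    J = |A intersect B| / |A union B|, ranging from 0 (no shared taxa)
    to 1 (identical lists).
    """
    a, b = set(list_a), set(list_b)
    union = a | b
    if not union:
        # Assumption: two empty lists are treated as identical.
        return 1.0
    return len(a & b) / len(union)


# Illustrative comparison of a primary list with an audit list.
primary = {"Baetidae", "Gammaridae", "Elmidae"}
audit = {"Gammaridae", "Elmidae", "Leuctridae"}
print(jaccard_similarity(primary, audit))
```

Because every gain or loss enlarges the union without enlarging the intersection, each difference between the two lists lowers the similarity.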
Results
Absolute differences in resulting taxa lists
The results of the sorting audit are summarised as gains and the identification audit results as a combination of gains and losses. Figure 1 shows the number of differences between taxa lists based on gains and losses after each audit by different methods. In STAR-AQEM samples, the number of differences identified in the sorting audit is significantly smaller than those observed during identification audit (Wilcoxon Test, p<0.001). In both the national methods and the RIVPACS/PERLA methods complex there is no significant difference between the number of differences identified in the sorting or the identification audit (Wilcoxon Test, p<0.5). The number of gains observed at the sorting level is significantly higher for RIVPACS/PERLA than for national methods or for STAR-AQEM (Mann–Whitney U-Test, p<0.003). The number of gains observed at the identification audit and the total audit level is significantly lower in the national methods than in RIVPACS/PERLA or STAR-AQEM (Mann–Whitney U-Test, p<0.001). This could be because many of the national methods determine taxa at the family or genus level, where there is a very low error. In both RIVPACS/PERLA and STAR-AQEM identification is generally to species level, presumably leading to higher levels of identification difference. The results further indicate that in national method samples and RIVPACS/PERLA complex samples, both sorting and identification differences contribute about equally to the total number of differences, while in STAR-AQEM samples the differences are mainly caused by varying identification results.
The qualitative similarity between taxa lists based on primary results, sorting audit results, identification audit results and total audit results was also tested by calculating Jaccard similarity between the different fractions. This similarity value was calculated for samples from those countries where STAR-AQEM and RIVPACS/PERLA methods were applied (cf. Table 1). Figure 2 shows the Jaccard similarity values by method between the primary taxa lists (P) and the sorting audit taxa lists (AS) (P/AS), between the primary and identification audit taxa lists (AID) (P/AID) and between the primary and the total audit taxa lists (P/ATOT). In both methods, there is a significant difference in Jaccard similarity between sorting audit and identification audit: in STAR-AQEM Jaccard similarity is significantly higher after sorting audit, while in RIVPACS/PERLA it is significantly lower after the sorting audit (Wilcoxon Test for both, p<0.01). There is no significant difference in Jaccard similarity after identification audit or total audit between the two methods (Mann–Whitney U-Test, p>0.36). However, in RIVPACS/PERLA samples the Jaccard similarity between primary and sorting audit samples is significantly lower than in STAR-AQEM samples (Mann–Whitney U-Test, p<0.01). This indicates that in RIVPACS/PERLA samples the sorting error contributes more to the total error than differences in identification, while in STAR-AQEM samples the effect of the sorting audit is much less than that of the identification audit. In both methods, the Jaccard similarity is significantly lower after the total audit (P/ATOT: RIVPACS/PERLA: median=0.58; STAR-AQEM: median=0.63) than after the sorting audit or identification audit (Wilcoxon Test, p<0.01). This shows that in both methods there is a cumulative effect of both errors with respect to Jaccard similarity.
Metric results
For the same subset of samples, qualitative taxa lists were also used to calculate 12 metrics commonly used in river quality assessments, to examine the impact of sorting and identification error on the metrics and thus the assessment results. Six of the metrics examined were richness measures: number of taxa (No. Taxa), number of families (No. Families), number of genera (No. Genera), number of Ephemeroptera, Plecoptera, Trichoptera, Coleoptera, Odonata and Bivalvia taxa (EPTCOB Taxa), number of Ephemeroptera, Plecoptera, Trichoptera taxa (EPT Taxa) and the Shannon–Wiener Diversity index (Diversity Shannon–Wiener) (Shannon & Weaver, 1949). Two were relative measures of composition: the ratio of the number of Ephemeroptera, Plecoptera, Trichoptera taxa to the number of Diptera taxa (EPT/Diptera Taxa) and the number of taxa scored as r-strategists compared to the number of taxa scored as K-strategists (r/K relationship). Four were functional metrics: Biological Monitoring Working Party score (BMWP), the average score per taxon (ASPT) (both Armitage et al., 1983), the Rhithron Typie Index (RTI) (Biss et al., 2002) and the Rhithron-Feeding types index (RETI), which analyses the proportion of shredders and grazers (Schweder, 1992). An explanation of all these metrics can be found in the AQEM-STAR assessment software ASTERICS (www.eu-star.at).
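To illustrate why a normally quantitative metric such as the Shannon–Wiener index carries only qualitative information here, consider a binary taxa list with S taxa. The derivation below assumes that each present taxon is given the same relative abundance p_i = 1/S and that the natural logarithm is used; how ASTERICS actually treats presence/absence input is not stated in the text, so both points are assumptions.

```latex
H' = -\sum_{i=1}^{S} p_i \ln p_i
   = -\sum_{i=1}^{S} \frac{1}{S} \ln \frac{1}{S}
   = \ln S
```

Under these assumptions the index depends only on the number of taxa, so a gain or loss changes H' exactly as it changes ln S, and any effect of the abundance structure of the sample is lost. For example, a list of 30 taxa that gains 3 taxa in an audit would change from ln 30 ≈ 3.40 to ln 33 ≈ 3.50.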
Table 2 gives the absolute differences in metric values compared to the primary result. A Wilcoxon Test was used to see if there are significant differences between metric values scored for the primary taxa list and those scored after the sorting audit, the identification audit and the total audit. In STAR-AQEM samples there are significant differences between the primary results and results after sorting audit and identification audit for eight metrics (Wilcoxon Test p<0.05). In RIVPACS/PERLA samples eight metrics showed significant differences between the primary metric results and the results after sorting and seven metrics showed significant differences after the identification and total audit (Wilcoxon Test p<0.05). In both methods, six of these significant differences were observed in metrics that measure taxonomic richness. BMWP was significantly different at all levels of the audit in both methods (Wilcoxon Test, p<0.05). In STAR-AQEM samples only two metrics were significantly different at the total audit level (BMWP and ASPT).
The absolute differences between primarymetric results and those metric results scored aftersorting (P-AS), identification (P-AID) and totalaudit (P-ATOT) were also calculated (Fig. 3).
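The summary of absolute differences per metric can be sketched as follows. This is an illustration under stated assumptions: the function name `abs_difference_summary` and the example values are hypothetical, and the sample standard deviation (n − 1 denominator) is used because the text does not say which form was reported.

```python
from statistics import mean, stdev


def abs_difference_summary(primary_values, audit_values):
    """Mean and sample SD of |primary - audit| for one metric.

    Each position in the two sequences must refer to the same sample.
    """
    if len(primary_values) != len(audit_values):
        raise ValueError("primary and audit values must be paired")
    diffs = [abs(p - a) for p, a in zip(primary_values, audit_values)]
    return mean(diffs), stdev(diffs)


# Hypothetical No. Taxa values for three samples before and after audit.
primary_taxa = [30, 25, 40]
audit_taxa = [33, 25, 36]
avg, sd = abs_difference_summary(primary_taxa, audit_taxa)
print(round(avg, 2), round(sd, 2))
```

Taking absolute values means that gains and losses cannot cancel each other out, so the summary reflects the size of the disagreement and not its direction.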
In STAR-AQEM samples differences in metric results observed after identification audit are higher than after sorting audit for all metrics but ASPT and RETI. For RTI, No. Taxa, EPTCOB Taxa, EPT Taxa and Diversity Shannon–Wiener the difference is significant (Wilcoxon Test, p<0.02) (Table 2). In RIVPACS/PERLA samples differences in metric results are higher after sorting audit than after identification audit; the only exception is the r/K relationship (Table 2). The differences after sorting audit are significantly higher in eight metrics. These eight metrics are the six richness measures, the BMWP score and the ASPT score (Wilcoxon Test, p<0.02). These results suggest that in RIVPACS/PERLA samples, sorting error causes more analytical error than identification, while in STAR-AQEM differences in identification appear to be more important than sorting error.

Figure 1. Box plots showing the number of differences observed during the sorting audit (AS: gains only) and identification audit (AID: gains plus losses) and the cumulative number of differences (ATOT: gains plus losses) for 24 samples collected and analysed following national methods protocols, 36 RIVPACS/PERLA (RIV/PER) samples and 60 STAR-AQEM samples. Box plots show the median, the 25–75th percentile range (box), outliers and extreme values. N = 120.
The differences after both audits are generally lower in STAR-AQEM samples than they are in RIVPACS/PERLA samples (Fig. 3). For all functional metrics the differences observed are largest after the total audit, suggesting a cumulative effect of the two audit levels. For richness measures the situation is different. In RIVPACS/PERLA samples the largest differences are observed at the sorting audit level (Table 2, Fig. 3). In STAR-AQEM samples the largest differences are observed at the identification audit level for richness measures (Table 2, Fig. 3).
Discussion
In this paper, we make a first attempt to evaluate the analytical error observed in stream assessments based on sorting and identification components of laboratory sample treatment. The errors entailed in these processes are important for providing confidence in assessment results. There are many other sources of variation in macroinvertebrate sampling and sample analysis. These include natural variability (e.g. McElravy et al., 1989; Weatherby & Ormerod, 1990; Boulton & Lake, 1992), operator dependent sampling variability (Clarke et al., 2002), variability in sample
Figure 2. Box plots showing Jaccard similarity values between the primary taxa lists (P) and the taxa lists based on the sorting audit (AS) (P/AS), between the primary taxa lists and the identification audit (AID) taxa lists (P/AID) and between the primary taxa lists and the total audit (ATOT) taxa lists (P/ATOT) for 36 RIVPACS/PERLA (RIV/PER) and 36 STAR-AQEM samples. Box plots show the median, the 25–75th percentile range (box), outliers and extreme values. N = 72.
513
Table
2.Meanvalues
(Ø)andstandard
deviation(SD)ofabsolute
differencesin
metricvalues
Primary
(P)
Sortingaudit(P-A
S)
Identificationaudit(P-A
ID)
Totalaudit(P-A
TOT)
P/A
SP/A
IDP/A
TOT
ر
SD
ر
SD
ر
SD
ر
SD
p*
p*
p*
STAR-A
QEM
BMWP
118.36±44.89
4.92±
7.90
7.92±
11.73
9.53±8.65
<0.001
<0.001
0.031
ASPT
6.70±0.98
0.10±
0.32
0.10±
0.23
0.30±0.50
0.041
0.011
<0.001
RTI
11.10±
4.15
0.26±
0.48
0.55±
0.55
0.94±1.00
0.600
0.789
0.307
RETI
0.58±0.11
0.03±
0.05
0.02±
0.02
0.05±0.08
0.175
0.926
0.598
EPT/D
iptera
Taxa
3.12±2.17
0.34±
0.64
0.55±
0.86
0.99±1.41
0.218
0.939
0.304
r/K
relationship
0.06±0.05
0.01±0.01
0.01±0.01
0.02±0.03
0.260
0.125
0.092
No.Taxa
34.53±
15.12
3.00±
3.22
5.42±
4.06
3.14±2.98
<0.001
<0.001
0.352
No.Families
21.58±
6.51
1.08±
1.36
1.31±
1.85
1.42±1.25
<0.001
<0.001
0.737
No.Genera
27.56±
10.46
1.78±
2.27
2.58±
2.89
2.56±2.47
<0.001
<0.001
0.596
EPTCOB
Taxa
24.92±
13.04
2.17±
2.62
3.89±
3.47
2.81±2.75
<0.001
<0.001
0.244
EPTTaxa
17.83±
10.27
1.33±
1.96
3.14±
2.87
2.36±2.32
<0.001
<0.001
0.110
Diversity
Shannon–Wiener
3.41±0.60
0.09±
0.12
0.15±
0.11
0.10±0.09
<0.001
<0.001
0.214
RIV
PACS
BMWP
124.72±51.88
18.83±
18.49
6.83±
12.54
19.39±
18.18
<0.001
0.034
<0.001
ASPT
6.90±0.96
0.22±
0.28
0.09±
0.16
0.28±0.29
1.000
0.438
0.797
RTI
11.54±
3.43
0.59±
0.88
0.48±
0.66
1.01±0.90
0.265
0.514
0.838
RETI
0.58±0.15
0.04±
0.04
0.02±
0.03
0.05±0.05
0.299
0.355
0.528
EPT/D
iptera
Taxa
3.64±2.38
0.52±
0.74
0.46±
0.48
0.69±0.88
0.027
0.688
0.188
r/K
relationship
0.04±0.05
0.01±0.02
0.01±0.02
0.02±0.03
0.207
0.614
0.432
No.Taxa
36.19±
17.64
10.89±
8.86
5.28±
6.39
8.17±7.21
<0.001
<0.001
<0.001
No.Families
21.86±
8.41
4.08±
3.86
1.39±
2.23
4.06±3.88
<0.001
0.003
<0.001
No.Genera
28.53±
12.96
6.56±
6.12
2.83±
4.31
5.83±5.46
<0.001
<0.001
<0.001
EPTCOB
Taxa
26.56±
14.12
7.78±
6.49
3.36±
4.03
5.78±5.54
<0.001
<0.001
<0.001
EPTTaxa
20.33±
11.77
4.53±
4.65
2.44±
3.08
3.58±3.78
<0.001
<0.001
0.001
Diversity
Shannon–Wiener
3.39±0.76
0.29±
0.35
0.15±
0.16
0.26±0.36
<0.001
<0.001
<0.001
Metrics
values
basedonthesortingaudit(P-A
S),identificationaudit(P-A
ID)andtotalaudit(P-A
TOT)comparedto
thevalues
observed
usingprimary
taxalist(P).Significance
ofdifferencesbetweenP-A
SandP-A
IDandsignificance
betweenaverageprimary
metricresultandtheaveragemetricresultsatthedifferentauditlevelsweretested
using
WilcoxonTest(p
indicateslevel
ofsignificance).*WilcoxonTest:boldface
values.significantatp<
0.05.
514
treatment (Haase et al., 2004b) and sub-samplingvariability (Lorenz et al., 2004; Clarke et al.2006b). The present study provides a first approx-imation of the quality and degree of error that maybe observed from sorting errors and operator-dependent differences in macroinvertebrate identi-fications. Aspects of variability and sources of er-ror related to replicate sampling (Clarke et al.,2006a), sub-sampling (Clarke et al., 2006a; Vleket al., 2006) and natural variability (Sporka et al.,2006) are discussed in other essays in this issue.
Absolute differences in resulting taxa lists
Our study provides some interesting insights into the two sources of error examined in this study: sorting and identification error. The two components of error play a different role in STAR-AQEM and RIVPACS/PERLA samples. While sorting error seems to be more important in RIVPACS/PERLA samples, identification error seems to be more important in STAR-AQEM samples (Figs. 1–3). The effect of identification error should, by its nature, be similar or equal in both methods because it is caused by two analysts looking at the same set of specimens. The difference in taxonomic expertise or in the interpretation of distinguishing morphological characters (e.g. relative bristle length or coloration) is the same for the operators, independent of the method used to obtain the set of specimens. Therefore, it is not surprising that no significant differences in identification error between the two methods were detected. STAR-AQEM samples seem to be less affected by sorting error than RIVPACS/PERLA samples. This could be the result of differences in the sorting procedures for the methods. While both methods apply a sub-sampling procedure, the sub-sampling approach is very different. In STAR-AQEM, a defined fraction of the sample is completely sorted and all animals are removed from the sub-sample for identification. In RIVPACS, the whole sample is sorted sequentially by transferring small aliquots of sample material into a dish and sorting a defined fraction of this dish (e.g. 1/4 or 1/8), depending on the number of specimens in the total sample (see Furse et al., 2006 for a detailed sampling and sorting protocol for the methods used in the STAR project). This defined fraction will from here on be referred to as the "sorted fraction". The rest of the material in the dish is scanned, and only taxa that have not been observed in the sorted fraction or in any of the sorted fractions of previous dishes are picked and recorded. Also, instead of removing all individuals of abundant taxa from the sorted fraction, they are left in the tray and counted. This inevitably leads to a more variable sorting protocol in RIVPACS, which also requires a higher level of taxonomic expertise from the person sorting the sample than the sorting protocol in STAR-AQEM (Haase et al., 2004a, b and references therein). This source of error may be overestimated in the present study, because for many partners this was the first time they applied the RIVPACS protocol. Although the same is true for most partners with respect to the STAR-AQEM protocol, the complexity of the RIVPACS protocol may make it more prone to mistakes by novices.
Our results indicate that, for RIVPACS samples, the sorting error is as important as the identification error, while in STAR-AQEM samples the sorting error is less severe. This is supported by both the Jaccard similarity analyses and the number of gains and losses observed at each level of the audit procedure (Figs. 1 and 2).
There appears to be a cumulative effect of analytical error in the two levels of audit. Taken singly, sorting error and differences in identification still leave a high Jaccard similarity (>0.8) between samples regardless of method. The cumulative effect is much more severe in both methods. In both methods the cumulative error decreases Jaccard similarity to about 0.6; 60% shared taxa between pre- and post-audit taxa lists is a very poor value. In both methods there are numerous samples in which differences based only on analytical error exceed this value. Similar values are observed, for example, when comparing caddisfly assemblages from different regions (Wiberg-Larsen et al., 2000) or differently impacted sites along a river stretch (Ganasan & Hughes, 1998). This suggests that the differences between taxa lists caused by the analytical errors assessed in this study are severe. These results stress the need for a high degree of standardisation of methods and raise the issue of increasing confidence in assessment results through independent sample auditing. Our results suggest that errors caused during sorting and identification procedures cannot be ignored in river quality assessment. Several questions concerning the effect of error on quantitative data sets and the practical implementation of auditing schemes should be subject to further study. For example, are the errors observed in the present study more or less pronounced in quantitative data sets? If they are as pronounced, are the errors constant or stable, and can the degree of error be estimated in a one-off survey or research project? If not, must they be measured continually as an integral part of the survey?
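To make the Jaccard values discussed above concrete, the following minimal sketch computes the similarity between two qualitative (presence/absence) taxa lists as the number of shared taxa divided by the number of taxa in the union. The function name and the two short taxa lists are hypothetical and chosen only so that the result matches the value of about 0.6 discussed in the text.

```python
def jaccard_similarity(taxa_a, taxa_b):
    """Jaccard similarity of two presence/absence taxa lists."""
    a, b = set(taxa_a), set(taxa_b)
    union = a | b
    if not union:
        return 1.0  # two empty lists are treated as identical
    return len(a & b) / len(union)

# Hypothetical primary and total-audit taxa lists differing in one taxon.
primary = ["Baetis rhodani", "Gammarus pulex", "Elmis aenea", "Drusus annulatus"]
total_audit = ["Baetis rhodani", "Gammarus pulex", "Elmis aenea", "Hydropsyche siltalai"]

print(jaccard_similarity(primary, total_audit))  # 3 shared / 5 in union = 0.6
```

In this small example a single substituted taxon in a list of four is already enough to lower the similarity to 0.6, because the substitution adds one taxon to the union and removes one from the intersection.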
Metric results
This study provides a preliminary view of how metrics are affected by sorting and identification errors, but can only quantify these errors on single qualitative metrics. How quantitative metrics or multimetric assessment results are affected can only be estimated. The reason for this lies in the qualitative nature of the audit approach. The indices of the BMWP-score system (National Water Council, 1981) were ideally suited to the sorting audit because they are based on family-level presence-absence data and included both an index of organic pollution (ASPT) and of general stress (No. Taxa). However, other metrics that make use of the absolute or relative abundances of taxa or involve species are less suited to this qualitative approach. Many of the metrics used for stream assessment rely not only on presence/absence data, but also on abundance data for each taxon. Four of the twelve metrics investigated in this study normally use quantitative data. The results obtained for these metrics based on qualitative data must therefore be interpreted with caution. Also, all of the multimetric assessment schemes implemented to date for the countries whose samples were analysed in this study are based on at least one metric that requires quantitative data (Böhmer et al., 2004; Hering et al., 2004a; Ofenböck et al., 2004). Therefore, we cannot estimate the effect of sorting and identification error on a multimetric assessment result. It is important that this subject is addressed by a quantitative audit scheme in future studies. Then error ranges can be assigned to assessment results and confidence in assessment results increased.
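The reason why the BMWP indices suit a qualitative audit can be sketched in a few lines: the BMWP score is the sum of the scores of the families present, and ASPT is that sum divided by the number of scoring families, so both can be computed from a family-level presence/absence list. The function names and the example sample are hypothetical, and the small score table below is an illustrative placeholder that should be replaced with the published BMWP family scores before any real use.

```python
# Illustrative placeholder scores for a handful of families; NOT a complete
# or authoritative BMWP table.
EXAMPLE_FAMILY_SCORES = {
    "Heptageniidae": 10,
    "Limnephilidae": 7,
    "Gammaridae": 6,
    "Baetidae": 4,
    "Chironomidae": 2,
}

def bmwp_score(families, scores=EXAMPLE_FAMILY_SCORES):
    """Sum of family scores; each scoring family counts once (presence only)."""
    return sum(scores[f] for f in set(families) if f in scores)

def aspt(families, scores=EXAMPLE_FAMILY_SCORES):
    """Average score per taxon: BMWP score / number of scoring families."""
    scoring = [f for f in set(families) if f in scores]
    if not scoring:
        return 0.0
    return sum(scores[f] for f in scoring) / len(scoring)

# Hypothetical sample: the sorting audit finds one family the sorter missed.
primary = ["Gammaridae", "Baetidae", "Chironomidae"]
sorting_audit = primary + ["Heptageniidae"]

print(bmwp_score(primary), aspt(primary))              # 12 4.0
print(bmwp_score(sorting_audit), aspt(sorting_audit))  # 22 5.5
```

In this invented case a single overlooked high-scoring family changes the BMWP score by 10 and ASPT by 1.5, which is the kind of absolute difference summarised per audit level in Table 2.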
Figure 3. Box plots showing the absolute differences in metric values between the primary taxa lists (P) and the taxa lists based on the sorting audit (A_S) (P-A_S), between the primary taxa lists and the identification audit (A_ID) taxa lists (P-A_ID), and between the primary taxa lists and the total audit (A_TOT) taxa lists (P-A_TOT) for 36 RIVPACS/PERLA (RIV/PER) and 36 STAR-AQEM samples. Box plots show the median, the 25–75th percentile range (box), outliers and extreme values. N = 72.

Despite the difficulties related to the qualitative nature of our audit design, some interesting observations were made regarding the effect of analytical error on certain metrics. Intuitively, one would expect the total number of differences or the total error (observed in both the sorting and identification audit) to be higher than that observed in the sorting or the identification audit only, independent of the method. For the number of gains and losses and Jaccard similarity values, this was the case. This was also the case for functional metrics. In qualitative metrics that are indicative of richness, the results presented in this study are different and somewhat counter-intuitive. In RIVPACS/PERLA samples especially, the differences in richness metrics are more pronounced in the sorting audit than in the total audit result where both sorting and identification differences are considered (Fig. 3, Table 2). One explanation for this could be that in RIVPACS samples, most gains were identified in the sorting audit. These gains (G_S) plus the number of gains observed in the identification audit (G_ID) are not eliminated by the losses (L) observed in the identification audit (G_S + G_ID − L = 8.03). Therefore, in metrics that measure taxonomic richness (i.e. qualitative metrics that count taxa) the number of differences increases more strongly between original and audited samples in RIVPACS samples than it does in STAR-AQEM samples. This is because in STAR-AQEM samples, sorting and identification gains are more or less eliminated by identification losses (G_S + G_ID − L = 0.44). The higher number of differences observed in the sorting audit compared to the total audit can be explained by the number of losses. On average, G_ID − L is −2.83 for RIVPACS/PERLA and −2.55 for STAR-AQEM, so the effect is about the same for both methods. However, because G_S is much greater in RIVPACS than in STAR-AQEM samples, the differences in number of taxa between the primary results and the sorting audit results are greater than those between the primary result and the sorting plus identification audit result. This effect is maximised in metrics with the highest level of taxonomic differentiation, i.e. highest probability of observing differences, e.g. No. Taxa. The effect is reduced as the number of taxa observed is reduced. This is, for example, the case when only certain taxonomic groups are considered (effect in EPTCOB Taxa > EPT Taxa) or when the level of identification is lowered (effect in No. Taxa > No. Genera > No. Families).
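The arithmetic behind this argument can be worked through with the averages quoted above. The sketch below assumes that the two reported averages for each method (G_S + G_ID − L and G_ID − L) refer to the same set of samples, so that the average sorting gain G_S follows by subtraction; the function and variable names are hypothetical.

```python
def net_richness_change(gains_sorting, gains_identification, losses_identification):
    """Net change in number of taxa after the total audit: G_S + G_ID - L."""
    return gains_sorting + gains_identification - losses_identification

# Averages quoted in the text: total = G_S + G_ID - L, ident_only = G_ID - L.
REPORTED_AVERAGES = {
    "RIVPACS/PERLA": {"total": 8.03, "ident_only": -2.83},
    "STAR-AQEM": {"total": 0.44, "ident_only": -2.55},
}

def implied_sorting_gains(total, ident_only):
    """G_S = (G_S + G_ID - L) - (G_ID - L), rounded to two decimals."""
    return round(total - ident_only, 2)

for method, values in REPORTED_AVERAGES.items():
    g_s = implied_sorting_gains(values["total"], values["ident_only"])
    print(f"{method}: G_S = {g_s:.2f}, net after total audit = {values['total']:.2f}")
```

Under this assumption the implied average sorting gain is 10.86 taxa for RIVPACS/PERLA and 2.99 taxa for STAR-AQEM. In both methods the identification audit then removes between two and three taxa on average, which is why the net change after the total audit is smaller than the change after the sorting audit alone.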
It thus appears that some metrics will hardly be affected by the cumulative error in sorting and identification of samples. These are the metrics that count taxa as measures of species richness. While affected by both sorting and identification errors, the overall number of taxa, and therefore the number of taxa belonging to a taxonomic group, is hardly influenced in the overall assessment because the errors do not act cumulatively but cancel each other out. For example, if the primary analyst identifies Drusus annulatus and another analyst identifies the same individuals as Drusus destitutus, one would have two differences in the taxa lists, but no differences in the number of taxa, genera, families or EPT taxa. Functional metrics may, however, respond to such differences, e.g. feeding types, r/K relationship, ASPT or BMWP scores, and saprobic valences.
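The Drusus example can be sketched in code. The two taxa lists below differ only in the species assigned to the same individuals; the additional taxa, the function names and in particular the indicator values are hypothetical and serve only to show that a richness count can stay constant while a metric based on species-specific values shifts.

```python
def genera(taxa):
    """Genus names, taken as the first word of each binomial."""
    return {name.split()[0] for name in taxa}

def mean_indicator(taxa, indicator_values):
    """Mean of species-specific indicator values over a presence/absence list."""
    return sum(indicator_values[t] for t in taxa) / len(taxa)

primary = {"Drusus annulatus", "Baetis rhodani", "Gammarus pulex"}
audit = {"Drusus destitutus", "Baetis rhodani", "Gammarus pulex"}

# HYPOTHETICAL species-specific values, invented for this illustration only.
HYPOTHETICAL_INDICATOR = {
    "Drusus annulatus": 1.0,
    "Drusus destitutus": 2.0,
    "Baetis rhodani": 2.0,
    "Gammarus pulex": 2.0,
}

differences = primary ^ audit  # taxa present in exactly one of the two lists
print(len(differences))                           # 2 differences in the lists
print(len(primary), len(audit))                   # 3 3, same number of taxa
print(len(genera(primary)), len(genera(audit)))   # 3 3, same number of genera
print(mean_indicator(primary, HYPOTHETICAL_INDICATOR),
      mean_indicator(audit, HYPOTHETICAL_INDICATOR))
```

The richness counts are identical for both lists, whereas the mean of the invented indicator values differs, mirroring the contrast between richness metrics and functional metrics described above.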
Sorting and identification audits and quality control
Regardless of the method used, a considerable amount of sorting and identification error could be shown. It also became evident that these errors affect metric results and therefore should be taken into account in stream assessment.
The performance of partners varied considerably, especially in the sorting audit. This could be the result of the limited experience of some partners with one or both of the protocols. Another reason might be the existence of an audit system. In our study the best performance (of all samples) in the sorting audit was achieved by the UK partner, the only country that established an audit system many years ago. It appears that experience and routine auditing of samples lead to better quality of performance. Errors in the long-term auditing scheme in the UK were greatest in the first year of the audit and have decreased over time for various laboratories in the UK, including other government agencies and commercial contractors. Poor results are especially common in the first audit but improve very rapidly thereafter (Murray-Bligh et al., 2006). This is an effect of training and experience, but may also be an effect of operators knowing that samples can and will be audited. Those partners whose audit results were much poorer than expected are unlikely to have similarly poor results if they are audited again in the future. Conclusions made on the basis of the results in this study may therefore differ from those of potential future studies with a similar auditing scheme, and the results may not be generally applicable to laboratories currently involved in operational monitoring schemes with auditing.
However, the present study and past experience with auditing schemes show that there is a considerable effect resulting from experience and the training that operators receive. Biologists often receive no formal training, particularly in sorting, and unless someone points out mistakes, they will remain unaware of shortcomings. Sorting is conceptually very simple and the task is sometimes left to the most junior and inexperienced biologists. The audit results demonstrate that sorting is in fact a task that requires more skill than has been recognised in the past. The audit results also point out the need for formal training and audit strategies for operators working to implement bioassessment schemes using macroinvertebrates. Extensive training is of utmost importance in the identification of macroinvertebrates. This is demonstrated by the large contribution of identification error to the total audit error (Figs. 1 and 2). In STAR-AQEM it was the main component of error, and it was equally important in RIVPACS. Correct taxonomic identification is also very important when assessment strategies are based on metrics, because many functional metrics are based on species-specific autecological data. As fewer and fewer alpha-taxonomic skills are taught within tertiary education programs around the globe, the need for specialist-based training and extra-curricular schooling for analysts dealing with the identification of stream biota will become increasingly important. Rigorous training programs and auditing schemes will minimise the analytical error related to sample sorting and species identification observed in this study. This will increase the precision of assessment results and strengthen water managers' confidence in assessment results. Such programs should therefore become an integral part of biological stream assessment in the future and seem vital for the successful implementation of the EU-WFD.
Acknowledgements
We would like to thank all project partners who contributed data to this study. Rebecca Bloch, Britta Gehenio and Jenny Schmidt are thanked for assistance in data formatting. This study was supported by the EU and presents results from the STAR project (Contract No: EVK1-CT 2001-00089).
References
AQEM consortium, 2004. AQEMdip: AQEM data input program. Downloadable from http://www.eu-star.at.
Armitage, P. D., D. Moss, J. F. Wright & M. T. Furse, 1983. The performance of a new biological water quality score system based on macroinvertebrates over a wide range of unpolluted running-water sites. Water Research 17: 333–347.
Biss, R., P. Kübler, I. Pinter & U. Braukmann, 2002. Leitbildbezogenes biozönotisches Bewertungsverfahren für Fließgewässer (aquatischer Bereich) in der Bundesrepublik Deutschland. Ein erster Beitrag zur integrierten ökologischen Fließgewässerbewertung – Final report on CD-ROM. UBA Texts 62/02, Berlin.
Böhmer, J., C. Rawer-Jost, A. Zenker, C. Meier, C. Feld, R. Biss & D. Hering, 2004. Development of a multimetric invertebrate based assessment system for German rivers. Limnologica 34: 416–432.
Boulton, A. J. & P. S. Lake, 1992. The ecology of two streams in Victoria, Australia. III. Temporal changes in species composition. Freshwater Biology 27: 123–138.
Cao, Y., C. P. Hawkins & M. R. Vinson, 2003. Measuring and controlling data quality in biological assemblage surveys with special reference to stream benthic macroinvertebrates. Freshwater Biology 48: 1898–1911.
Carter, J. L. & V. H. Resh, 2001. After site selection and before data analysis: sampling, sorting, and laboratory procedures used in stream benthic macroinvertebrate monitoring programs by USA state agencies. Journal of the North American Benthological Society 20: 658–682.
Clarke, R. T., 2000. Uncertainty in estimates of river quality based on RIVPACS. In Wright, J. F., D. W. Sutcliffe & M. T. Furse (eds), Assessing the Biological Quality of Freshwaters: RIVPACS and Similar Techniques. Freshwater Biological Association, Ambleside 39–54.
Clarke, R. T., M. T. Furse, R. J. M. Gunn, J. M. Winder & J. F. Wright, 2002. Sampling variation in macroinvertebrate data and implications for river quality indices. Freshwater Biology 47: 1735–1751.
Clarke, R. T., J. Davy-Bowker, L. Sandin, N. Friberg, R. K. Johnson & B. Bis, 2006a. Estimates and comparisons of the effects of sampling variation using 'national' macroinvertebrate sampling protocols on the precision of metrics used to assess ecological status. Hydrobiologia 566: 477–503.
Clarke, R. T., A. Lorenz, L. Sandin, A. Schmidt-Kloiber, J. Strackbein, N. T. Kneebone & P. Haase, 2006b. Effects of sampling and sub-sampling variation using the STAR-AQEM sampling protocol on the precision of macroinvertebrate metrics. Hydrobiologia 566: 441–459.
Doberstein, C., J. Karr & L. Conquest, 2000. The effect of fixed-count subsampling on macroinvertebrate biomonitoring in small streams. Freshwater Biology 44: 355–371.
European Union, 2000. Directive 2000/60/EC. Establishing a framework for community action in the field of water policy. European Commission PE-CONS 3639/1/100 Rev 1, Luxemburg.
Furse, M., D. Hering, O. Moog, P. Verdonschot, R. K. Johnson, K. Brabec, K. Gritzalis, A. Buffagni, P. Pinto, N. Friberg, J. Murray-Bligh, J. Kokeš, R. Alber, P. Usseglio-Polatera, P. Haase, R. Sweeting, B. Bis, K. Szoszkiewicz, H. Soszka, G. Springe, F. Šporka & I. Krno, 2006. The STAR project: context, objectives and approaches. Hydrobiologia 566: 3–29.
Ganasan, V. & R. M. Hughes, 1998. Application of an index of biological integrity (IBI) to fish assemblages of the rivers Khan and Kshipra (Madhya Pradesh), India. Freshwater Biology 40: 367–383.
Haase, P., S. Lohse, S. Pauls, K. Schindehütte, A. Sundermann, P. Rolauffs & D. Hering, 2004a. Assessing streams in Germany with benthic invertebrates: development of a practical standardised protocol for macroinvertebrate sampling and sorting. Limnologica 34: 349–365.
Haase, P., S. Pauls, A. Sundermann & A. Zenker, 2004b. Testing different sorting techniques in macroinvertebrate samples from running waters. Limnologica 34: 366–378.
Hering, D., C. Meier, C. Rawer-Jost, R. Biss, C. Feld, A. Zenker, A. Sundermann, S. Lohse & J. Böhmer, 2004a. Assessing streams in Germany with benthic invertebrates: selection of candidate metrics. Limnologica 34: 398–415.
Hering, D., O. Moog, L. Sandin & P. F. M. Verdonschot, 2004b. Overview and application of the AQEM assessment system. Hydrobiologia 516: 1–20.
Jaccard, P., 1901. Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Société Vaudoise des Sciences Naturelles 37: 547–579.
Lorenz, A., L. Kirchner & D. Hering, 2004. 'Electronic subsampling' of macrobenthic samples: how many individuals are needed for a valid assessment result? Hydrobiologia 516: 299–312.
Mann, H. B. & D. R. Whitney, 1947. On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics 18: 50–60.
McCune, B. & M. J. Mefford, 1999. PC-ORD. Multivariate Analysis of Ecological Data. Version 4.25. MjM Software, Gleneden Beach, Oregon, USA.
McElravy, E. P., G. A. Lamberti & V. H. Resh, 1989. Year-to-year variation in the aquatic macroinvertebrate fauna of a northern Californian Stream. Journal of the North American Benthological Society 8: 51–63.
Murray-Bligh, J. A. D., M. T. Furse, F. H. Jones, R. J. M. Gunn, R. A. Dines & J. F. Wright, 1997. Procedure for collecting and analysing macroinvertebrate samples for RIVPACS. Joint publication by the Institute of Freshwater Ecology and the Environment Agency, 162 pp.
Murray-Bligh, J., J. van der Molen & P. Verdonschot, 2006. STAR deliverable No. 7: Audit of Performance incorporating Results of the La Bresse sampling and analysis workshop. Unpublished report. www.eu-star.at.
National Water Council, 1981. River Quality: The 1980 Survey and Future Outlook. National Water Council, UK.
Ofenböck, T., O. Moog, J. Gerritsen & M. Barbour, 2004. A stressor specific multimetric approach for monitoring running waters in Austria using benthic macro-invertebrates. Hydrobiologia 516: 251–268.
Ostermiller, J. D. & C. P. Hawkins, 2004. Effects of sampling error on bioassessments of stream ecosystems: application to RIVPACS-type models. Journal of the North American Benthological Society 23: 363–382.
Shannon, C. E. & W. Weaver, 1949. Mathematical Theory of Communication. The University of Illinois Press, Urbana, IL.
Schweder, H., 1992. Neue Indices für die Bewertung des ökologischen Zustandes von Fließgewässern, abgeleitet aus der Makroinvertebraten-Ernährungstypologie. Limnologie Aktuell 3: 353–377.
Šporka, F., H. E. Vlek, E. Bulánková & I. Krno, 2006. Influence of seasonal variation on bioassessment of streams using macroinvertebrates. Hydrobiologia 566: 543–555.
StatSoft, Inc., 2002. STATISTICA for Windows (Software-System for Data Analysis) Version 6.1. www.statsoft.com.
Weatherby, N. S. & S. J. Ormerod, 1990. The constancy of univoltine assemblages in soft water streams: implications for the publication and detection of environmental change. Journal of Applied Ecology 27: 952–964.
Wiberg-Larsen, P., K. P. Brodersen, S. Birkholm, P. N. Grøn & J. Skriver, 2000. Species richness and assemblage structure of Trichoptera in Danish streams. Freshwater Biology 43: 633–647.
Wilcoxon, F., 1945. Individual Comparisons by Ranking Methods. Biometrics 1: 80–83.
Vlek, H. E., F. Šporka & I. Krno, 2006. Influence of macroinvertebrate sample size on bioassessment of streams. Hydrobiologia 566: 523–542.