Assessing the impact of errors in sorting and identifying macroinvertebrate samples

Peter Haase 1,*,†, John Murray-Bligh 2, Susanne Lohse 1, Steffen Pauls 1, Andrea Sundermann 1,†, Rick Gunn 3 & Ralph Clarke 3

1 Department of Limnology and Conservation Research, Senckenberg – Research Institute and Natural History Museum, Clamecystraße 12, 63571 Gelnhausen, Germany
2 Environment Agency, Manley House, Kestrel Way, EX6 8EX Exeter, UK
3 CEH Dorset, Winfrith Technology Centre, Winfrith Newburgh, Dorchester, DT2 8ZD Dorset, UK
(*Author for correspondence: E-mail: [email protected])
† These authors contributed equally to this work.

Hydrobiologia (2006) 566:505–521, © Springer 2006
M.T. Furse, D. Hering, K. Brabec, A. Buffagni, L. Sandin & P.F.M. Verdonschot (eds), The Ecological Status of European Rivers: Evaluation and Intercalibration of Assessment Methods
DOI 10.1007/s10750-006-0075-6

Key words: stream assessment, error estimation, sample sorting, macroinvertebrate identification

Abstract

This study assesses the impact of errors in sorting and identifying macroinvertebrate samples collected and analysed using different protocols (e.g. STAR-AQEM, RIVPACS). The study is based on the auditing scheme implemented in the EU-funded project STAR and presents the first attempt at analysing the audit data. Data from 10 participating countries are analysed with regard to the impact of sorting and identification errors. These differences are measured in the form of gains and losses at each level of audit for 120 samples. Based on gains and losses to the primary results, qualitative binary taxa lists were derived for each level of audit for a subset of 72 data sets. Between these taxa lists, the taxonomic similarity and the impact of differences on selected metrics common to stream assessment were analysed. The results of our study indicate that in all methods used, a considerable amount of sorting and identification error could be detected. This total impact is reflected in most functional metrics. In some metrics indicative of taxonomic richness, the total impact of differences is not directly reflected in differences in metric scores. The results stress the importance of implementing quality control mechanisms in macroinvertebrate assessment schemes.

Introduction

All assessments of the ecological status of a river site based on biological samples are subject to uncertainty and errors. Biological surveys can only detect a change in river quality when the difference in the results before and after the change is greater than the uncertainty caused by natural variability and human error. In this paper, we explore the size of these errors in survey and analytical methods for assessing river quality that are used throughout Europe. Error is rarely measured in monitoring surveys, or it is considered negligible, as it is often assumed to be small and constant. If this assumption is incorrect, there is a high risk that conclusions drawn from such surveys could be wrong.

Most quantitative assessments of the biological status of water bodies are based on the values of biological indices or metrics derived from the taxonomic composition of the sample, where the metric is intended to measure some specific aspect or general feature of the biota (Cao et al., 2003; Böhmer et al., 2004; Hering et al., 2004a, b). These measures are of little value without knowing their degree of uncertainty (Clarke, 2000; Clarke et al., 2002). This is because differences in river quality can only be confirmed when they exceed the uncertainty inherent in the data. Uncertainty is caused both by the natural variability of the biota used to evaluate river quality and by human error introduced by the analyst. It arises from every stage of data collection, from sampling (e.g. Carter & Resh, 2001; Clarke et al., 2002; Ostermiller & Hawkins, 2004) to sample analysis and data handling (e.g. Doberstein et al., 2000; Haase et al., 2004a, b). The sources of this uncertainty must be identified so that they can be reduced and accounted for when results are evaluated. This study focuses on the two major sources of analytical error: sorting error and identification error.

The EU Water Framework Directive (EU-WFD) (European Union, 2000) requires the level of confidence and precision of results provided by monitoring programmes to be given in the River Basin Management Plans (EU-WFD: Annexe V, Section 1.3). As with all ecological analyses, it is more important to have moderate errors that have been quantified than to have small errors but no estimate of their magnitude. The former allows the significance of any differences to be determined whereas the latter does not. In most member states, monitoring for the EU-WFD will be undertaken by environmental protection agencies or commercial environmental laboratories, and in some cases by research laboratories. Irrespective of who does the analysis, it is impossible to eliminate all errors from data based on field survey and laboratory analysis. Therefore it is essential to understand how error can be quantified and minimised and to provide tools to assess these errors.

One aim of the EU-funded STAR¹ project was to identify and quantify different sources of error that affect metrics and thus assessment results (Furse et al., 2006). Within the project, the uncertainty associated with site selection, natural variability within a site or between seasons, different sub-sampling strategies, and human error in sample processing was studied (Furse et al., 2006). To evaluate the error of sample processing, a sorting and identification audit was implemented for two major biological quality assessment components, macroinvertebrates and diatoms. For invertebrates, the emphasis was placed on qualitative sorting and identification errors inherent in the laboratory treatment of invertebrate samples.

In the present study, pre-audit and post-audit macroinvertebrate taxa lists and the resulting metric values are compared, based on samples collected following standardised sampling and processing protocols. The focus of this study thus lies on determining differences between individual sources of error and how sorting and identification errors affect metrics commonly used in river quality assessment. The results from other sources of error, such as replicate sampling (Clarke et al., 2006a), sub-sampling (Clarke et al., 2006a; Vlek et al., 2006) and natural variability like seasonal change (Sporka et al., 2006), are presented in other papers of this issue.

Materials and methods

Audit design

The auditing approach for macroinvertebrate samples applied in the STAR project involved two separate components: (1) a sorting audit at family level, undertaken by a single auditing laboratory, to assess sorting errors across the whole project in a consistent and unified way, and (2) an identification audit undertaken by partners familiar with analysing invertebrates from similar environments. Generally, these were laboratories from neighbouring countries from the same Ecoregion. This approach was chosen for the identification audit, because no laboratory involved in the project had sufficient experience in analysing all the species found in the geographic area covered by the project to undertake the identification audit for the whole project. Because of this, each partner's identification audit was done by one or more partners from neighbouring countries. Although this caused the quality of the identification audit to vary between partners, it ensured that audit results were more accurate. Throughout this paper, the terms primary sample, primary analyst and primary data relate to the main analysis of a sample, and the terms audit sample and audit data relate to the re-analysis of a sample in the audit.

¹ Standardisation of river classifications: Framework method for calibrating different biological survey results against ecological quality classifications to be developed for the Water Framework Directive, STAR. Contract No: EVK1-CT 2001-00089

Selecting audit samples

Macroinvertebrate samples of the STAR project sampling programme were taken at all sites by each participating partner using two different methods: (a) the STAR-AQEM method, a multi-habitat sampling protocol developed within the STAR project (Furse et al., 2006) and (b) a 'national' method, which was normally a widely used protocol within the individual partner's member state (Furse et al., 2006). In Germany, Austria and Greece there were no existing common 'national' sampling protocols; the UK RIVPACS protocol was used instead (Murray-Bligh et al., 1997). Twelve samples from each of 10 countries (Nsort=120) were analysed in the sorting audit: 6 STAR-AQEM samples per country (NSA=60) and 6 collected and analysed by the corresponding national method (NNat=60) (Table 1). A subset of these samples was used for the identification audit. This subset comprised the 12 samples from each of the 6 countries for which RIVPACS or the RIVPACS-comparable PERLA was the national method (from hereon referred to as RIVPACS/PERLA), to complement the data available from the STAR-AQEM data sets. This allowed for a comparison of different methods based on a reasonable sample size for both STAR-AQEM and RIVPACS/PERLA samples (N=72 for both methods) (Table 1).

Primary analysts were aware that all 1090 invertebrate samples collected for the STAR project were potentially subject to audit. A partner not involved in any primary analyses selected the samples for the audit randomly. Audit samples were selected roughly evenly between seasons and included samples representing high, good and moderate ecological quality (pre-classification based on expert judgement). For each combination of site and season, one sample collected by the STAR-AQEM protocol and one sample collected by the national survey protocol was chosen for audit. Partners were not told which samples were selected for audit until all the primary data had been entered into the STAR database, AQEMdip (AQEM Consortium, 2004), so that the primary data could not be altered after the audit samples had been selected. This ensured that primary analysts could not give any special attention to audit samples and the audit results would therefore reflect the quality of all the primary analyses.
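The random pairing of samples described above can be sketched roughly as follows. This is a minimal illustration in Python, not the procedure actually used in STAR: the record keys (`sample_id`, `site`, `season`, `protocol`), the protocol labels, the function name `select_audit_pairs` and the fixed seed are all assumptions made for the example, and the balancing across seasons and quality classes mentioned in the text is left out.

```python
import random
from collections import defaultdict

def select_audit_pairs(samples, n_pairs, seed=0):
    """Randomly pick site/season visits and return, for each visit,
    one STAR-AQEM sample and one national-method sample.

    `samples` is a list of dicts with the hypothetical keys
    'sample_id', 'site', 'season' and 'protocol'.
    """
    by_visit = defaultdict(dict)
    for s in samples:
        by_visit[(s["site"], s["season"])][s["protocol"]] = s["sample_id"]
    # Only visits sampled with both protocols can be audited as a pair.
    complete = sorted(
        visit for visit, found in by_visit.items()
        if "STAR-AQEM" in found and "national" in found
    )
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    chosen = rng.sample(complete, n_pairs)
    return [
        (visit, by_visit[visit]["STAR-AQEM"], by_visit[visit]["national"])
        for visit in chosen
    ]
```

In practice the person running such a selection would be someone not involved in the primary analyses, and the result would be withheld until all primary data had been entered.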

Audit procedure

Sorting audit

When sorting STAR-AQEM samples, the primary analyst had to remove all specimens from the sub-sample. After analysis, the sorted specimens were placed in a labelled vial or jar containing preservative and stored for the identification audit (see Furse et al., 2006 for a detailed sampling and sorting protocol). All organic and inorganic material from the sorted STAR-AQEM sub-sample, together with any animals remaining in it, was returned to a jar with preservative for the sorting audit. In the sorting audit, the auditors re-sorted the whole sub-sample, removing any animals they found and placing them in a new, labelled vial. The only identification undertaken by the sorting auditors was to identify any additional families that were not recorded by the primary analyst, which were then recorded as gains.

For national method protocols that did not demand that all specimens were removed during sorting, the primary analysts had to remove up to three representatives (but not every specimen) of every taxon for the identification audit (see Furse et al., 2006 for detailed protocols of the sampling and sorting procedures used in the STAR project). The taxa were based on the taxonomic level of the primary analysis: if the identification was to family level, the taxa removed were families; if the sample was analysed to species level, the taxa removed were species (Table 1). The specimens removed had to be good quality examples and not simply the first ones that the analysts found in the sample. The sorting auditors re-sorted the sample and removed from it all specimens of families missed by the primary analyst. They also removed up to three good quality specimens of every potentially different species that they found in the sample. The auditors put these specimens in a vial with preservative.

Table 1. Samples used in the present study. "National method" refers to the sampling and sorting protocol applied in the respective country for the national method samples; "National method category" refers to the method category into which the protocol was placed for the selection of the identification audit subset of samples

| Country | STAR-AQEM sample* | National method sample* | National method | National method category | Season | River | Site | STAR Site No. | Sample date | Sorting audit | ID audit | ID audit analyses |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Austria | A0500261 | A0500431 | RIV | RIV/PER | Spring | Sarmingbach | Wolfsschlucht | 600 | 16.04.2003 | F | S | X |
| Austria | A0500291 | A0500461 | RIV | RIV/PER | Spring | Grosse Ysper | near Altenmarkt | 603 | 16.04.2003 | F | S | X |
| Austria | A0500332 | A0500502 | RIV | RIV/PER | Summer | Sarmingbach | Waldhausen | 607 | 09.07.2002 | F | S | X |
| Austria | A0600141 | A0600341 | RIV | RIV/PER | Spring | Wildbach | near Kramermirtl | 701 | 28.05.2003 | F | S | X |
| Austria | A0600192 | A0600392 | RIV | RIV/PER | Summer | Stullneggbach | near Aichegg | 706 | 30.07.2002 | F | S | X |
| Austria | A0600232 | A0600432 | RIV | RIV/PER | Summer | Stullneggbach | near Mainsdorf | 708 | 30.07.2002 | F | S | X |
| Czech Republic | C0401621 | C0403561 | PERLA | RIV/PER | Spring | Velka Hana | Rychtarov | 614 | 04.04.2003 | F | S | X |
| Czech Republic | C0401701 | C0403631 | PERLA | RIV/PER | Spring | Nectava | Brezinky | 620 | 27.03.2003 | F | S | X |
| Czech Republic | C0401172 | C0403152 | PERLA | RIV/PER | Summer | Umori | Zbraslavec | 625 | 19.07.2002 | F | S | X |
| Czech Republic | C0501212 | C0503182 | PERLA | RIV/PER | Summer | Huntava | Valsovsky dul | 713 | 26.07.2002 | F | S | X |
| Czech Republic | C0501272 | C0503232 | PERLA | RIV/PER | Summer | Luha | Sloup | 717 | 22.07.2002 | F | S | X |
| Czech Republic | C0501941 | C0503831 | PERLA | RIV/PER | Spring | Trebuvka | Borsov | 722 | 09.04.2003 | F | S | X |
| Denmark | K0201011 | K0201021 | DSFI | Nat | Spring | Karstoft | Noerre Grene | 662 | 01.04.2003 | F | S | |
| Denmark | K0202012 | K0202022 | DSFI | Nat | Summer | Mattrup | Stids Moelle | 663 | 06.08.2002 | F | S | |
| Denmark | K0206011 | K0206021 | DSFI | Nat | Spring | Kastbjerg | Edderup | 667 | 01.04.2003 | F | S | |
| Denmark | K0207012 | K0207022 | DSFI | Nat | Summer | Fjederholt | Okkels | 668 | 08.08.2002 | F | S | |
| Denmark | K0209011 | K0209021 | DSFI | Nat | Spring | Skibsted | Skibstedbro | 670 | 07.04.2003 | F | S | |
| Denmark | K0210012 | K0210022 | DSFI | Nat | Summer | Skals | Faarup | 671 | 12.08.2002 | F | S | |
| France | F0800013 | F0800193 | IBGN | Nat | Autumn | Aube | Aubepierre-sur-Aube | 724 | 25.09.2002 | F | F | |
| France | F0800021 | F0800201 | IBGN | Nat | Spring | Seine | Ermitage du Val de Seine | 725 | 15.04.2003 | F | F | |
| France | F0800041 | F0800221 | IBGN | Nat | Spring | Aujon | upstream of Giey-sur-Aujon | 726 | 25.05.2003 | F | F | |
| France | F0800063 | F0800243 | IBGN | Nat | Autumn | Ornain | downstream of Abainville | 728 | 30.09.2002 | F | F | |
| France | F0800073 | F0800253 | IBGN | Nat | Autumn | Meuse (Bassoncourt) | between Daillecourt & Bassoncourt | 729 | 10.10.2002 | F | F | |
| France | F0800111 | F0800291 | IBGN | Nat | Spring | Mouzon | Sartes | 733 | 09.04.2003 | F | F | |
| Germany | D0400392 | D0400512 | RIV | RIV/PER | Summer | Wehebach | Wehebachtalsperre | 627 | 29.06.2002 | F | S | X |
| Germany | D0400461 | D0400581 | RIV | RIV/PER | Spring | Salwey | Niedersalwey | 634 | 25.03.2003 | F | S | X |
| Germany | D0300202 | D0300352 | RIV | RIV/PER | Summer | Stepenitz | near Putlitz | 649 | 15.07.2002 | F | S | X |
| Germany | D0300201 | D0300351 | RIV | RIV/PER | Spring | Stepenitz | near Putlitz | 649 | 10.04.2003 | F | S | X |
| Germany | D0600022 | D0600122 | RIV | RIV/PER | Summer | Ilme | above Relliehausen | 816 | 21.06.2002 | F | S | X |
| Germany | D0600071 | D0600171 | RIV | RIV/PER | Spring | Klingbach | above Hausen | 821 | 11.03.2003 | F | S | X |
| Greece | H0400222 | H0400282 | RIV | RIV/PER | Summer | Peristeria | Artiki | 735 | 29.07.2002 | F | F | X |
| Greece | H0400242 | H0400302 | RIV | RIV/PER | Summer | Tsouraki | Tsouraki | 737 | 01.08.2002 | F | F | X |
| Greece | H0400051 | H0400151 | RIV | RIV/PER | Spring | Tsouraki | SL 98 | 738 | 21.05.2003 | F | F | X |
| Greece | H0400031 | H0400131 | RIV | RIV/PER | Spring | Krathis | Tsivlos | 739 | 22.05.2003 | F | F | X |
| Greece | H0400262 | H0400322 | RIV | RIV/PER | Summer | Gadouras | Gadouras | 753 | 24.08.2002 | F | F | X |
| Greece | H0400011 | H0400111 | RIV | RIV/PER | Spring | Gorgopotamos | Gorgopotamos Bridge | 756 | 18.05.2003 | F | F | X |
| Portugal | P0411313 | P0431313 | PMP | Nat | Autumn | Xevora | Xevora | 863 | 03.02.2003 | F | G | |
| Portugal | P0411133 | P0431133 | PMP | Nat | Autumn | Tripeiro | Tripeiro | 864 | 06.12.2002 | F | G | |
| Portugal | P0411121 | P0431121 | PMP | Nat | Spring | Taveiro | Taveiro | 865 | 13.05.2003 | F | G | |
| Portugal | P0411213 | P0431213 | PMP | Nat | Autumn | Alpreade | Alpreade | 866 | 05.12.2002 | F | G | |
| Portugal | P0411321 | P0431321 | PMP | Nat | Spring | Ponsul | Ponsul | 867 | 14.05.2003 | F | G | |
| Portugal | P0411221 | P0431221 | PMP | Nat | Spring | Basagueda | Basagueda | 868 | 14.05.2003 | F | G | |
| Slovakia | V0100473 | V0100483 | PERLA | RIV/PER | Autumn | Bystrica | pod Vel'kou skalou | 984 | 17.09.2003 | F | S | X |
| Slovakia | V0100493 | V0100503 | PERLA | RIV/PER | Autumn | Bystrica | Horna domovina | 986 | 17.09.2003 | F | S | X |
| Slovakia | V0100513 | V0100523 | PERLA | RIV/PER | Autumn | Bystrica | Bystrieany | 987 | 17.09.2003 | F | S | X |
| Slovakia | V0100423 | V0100433 | PERLA | RIV/PER | Autumn | Hostiansky potok | pri Pod Javorom | 988 | 16.09.2003 | F | S | X |
| Slovakia | V0100443 | V0100453 | PERLA | RIV/PER | Autumn | Hostiansky potok | pod Obecnym vrchom | 989 | 16.09.2003 | F | S | X |
| Slovakia | V0100463 | V0100563 | PERLA | RIV/PER | Autumn | Hostiansky potok | nad Topoleiankami | 990 | 18.09.2003 | F | S | X |
| Sweden | S0501351 | S0502311 | Swedish | Nat | Spring | Nittalven | downstream of Nordtjarnsalven | 685 | 04.06.2003 | F | S | |
| Sweden | S0501063 | S0502023 | Swedish | Nat | Autumn | Savalven | upstream of Savefors | 689 | 23.10.2002 | F | S | |
| Sweden | S0501431 | S0502391 | Swedish | Nat | Spring | Horksalven | Brattforsen | 691 | 04.06.2003 | F | S | |
| Sweden | S0601193 | S0602153 | Swedish | Nat | Autumn | Forsmarksan | Johannisfors | 875 | 30.10.2002 | F | S | |
| Sweden | S0601293 | S0602253 | Swedish | Nat | Autumn | Hagaan | Lurbo | 876 | 19.11.2002 | F | S | |
| Sweden | S0601561 | S0602521 | Swedish | Nat | Spring | Stromaran | Hillebola | 878 | 22.05.2003 | F | S | |
| United Kingdom | U1510011 | U1510321 | RIV | RIV/PER | Spring | Ecchinswell Brook | Headley | 639 | 07.04.2003 | F | S | X |
| United Kingdom | U1510663 | U1510973 | RIV | RIV/PER | Autumn | Westbury Brook | Westbury | 642 | 08.10.2002 | F | S | X |
| United Kingdom | U1510101 | U1510411 | RIV | RIV/PER | Spring | Cliff Brook | Crowton | 648 | 13.04.2003 | F | S | X |
| United Kingdom | U2310763 | U2311073 | RIV | RIV/PER | Autumn | Clun | Marlow | 674 | 28.09.2002 | F | S | X |
| United Kingdom | U2310181 | U2310491 | RIV | RIV/PER | Spring | Ogmore | Bridgend | 678 | 09.04.2003 | F | S | X |
| United Kingdom | U2310833 | U2311143 | RIV | RIV/PER | Autumn | Sirhowy | Ynysddu | 681 | 27.09.2002 | F | S | X |

No. of samples: NSA=60; NNat=60; NRIV/PER=32; Nsort=120; NID=72

"STAR Site No." refers to the unique site code used throughout the STAR project. (SA=STAR-AQEM, RIV=RIVPACS, IBGN=French national method, DSFI=Danish Stream Fauna Index protocol, PERLA=Czech national method, Swedish=Swedish national method, PMP=Portuguese national method, NAT=other national methods; for further information on the different methods see Furse et al., 2005). Taxonomic level indicated for sorting audit and identification audit (F=family level, G=mainly genus level, S=mainly species level). "X" indicates samples used for subsequent analyses in the identification audit. *Refers to STAR samples code.

Identification audit

The identification audit was undertaken at the taxonomic level used for the calculation of the metrics by the primary analyst partner. For some partners, this was species, for others it was a mixed taxonomic level (Table 1). Vials of specimens and material mounted on permanent microscope slides by the primary analysts were sent to the identification auditors. Temporary mounts could not be sent. The identification auditors used the same method of identification that they used for their primary analysis. Partners that used experts for their primary analyses used the same experts for auditing identifications. The identification auditors recorded a new list of taxa based on their identification of the vial(s) and slide mounts from the primary analyst. They recorded gains and losses, compared to the primary analyst's taxa list. Because it was impractical to undertake the identification audit quantitatively, the metrics used to compare the two samples were based on presence/absence data. A binary taxa list was created to allow a qualitative comparison of the results. The binary taxa lists comprised the primary analysts' results (from hereon referred to as "primary" or "P"), the primary analysts' results plus further taxa observed at the sorting audit level (from hereon referred to as "sorting audit" or "AS") and a taxa list based on the identification auditor's results (from hereon referred to as "identification audit" or "AID"). AID is only based on taxa found in the primary analysis and does not consider taxa gained through the sorting audit. A last taxa list represents the combined results of the sorting and identification audits (from hereon referred to as "total audit" or "ATOT"). By this design, it was possible to establish the effects of errors at each audit level independently and the cumulative error of both the sorting and identification audit.
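The four binary taxa lists can be pictured as simple set operations. The sketch below is one plausible reading of the description above, not the authors' actual procedure: the function name `build_audit_lists`, its argument names and the example sample are hypothetical, and treating ATOT as the identification audit list plus the sorting gains is an assumption.

```python
def build_audit_lists(primary, sorting_gains, id_gains, id_losses):
    """Return the presence/absence taxa lists P, AS, AID and ATOT as sets."""
    p = set(primary)
    # AS: primary result plus taxa newly found by the sorting auditors.
    a_s = p | set(sorting_gains)
    # AID: primary result as revised by the identification auditors only.
    a_id = (p - set(id_losses)) | set(id_gains)
    # ATOT (assumption): identification audit list plus the sorting gains.
    a_tot = a_id | set(sorting_gains)
    return {"P": p, "AS": a_s, "AID": a_id, "ATOT": a_tot}

# Hypothetical sample: one taxon missed in sorting, one re-identified.
lists = build_audit_lists(
    primary={"Baetis rhodani", "Gammarus pulex", "Leuctra nigra"},
    sorting_gains={"Ancylus fluviatilis"},
    id_gains={"Leuctra fusca"},
    id_losses={"Leuctra nigra"},
)
for name in ("P", "AS", "AID", "ATOT"):
    print(name, sorted(lists[name]))
```

Keeping the lists as sets makes it explicit that only presence or absence is compared, never abundance.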

Audit analyses

Chironomidae, Nematoda and Oligochaeta taxa were not included in the audit. All sorting and identification audit results were based on qualitative errors only. Two parameters were used to measure analytical quality in these audits: the number of gains (taxa that were not recorded as being present in the sample but which the auditors found in the sample) and the number of losses (taxa that were recorded as being present but which were not found in the sample by the auditor). Gains (G) and losses (L) were identified by comparing the auditor's taxa list to that of the primary analyst. Only gains were recorded in the sorting audit. Losses and gains were recorded in the identification audit.

Neither the primary analyst's nor the identification auditor's species lists were considered to be definitive; they were considered simply as two views of the same data. Audit results were not used to correct the primary data.

The primary analysts calculated a range of metrics separately for the primary and audit sample analyses using the AQEM-STAR assessment software ASTERICS (www.eu-star.at). The differences in metric results for primary and audit samples were calculated and used to determine the effect of analytical errors on a selection of metrics that are commonly used in the member states to classify river quality (Hering et al., 2004). The audit results were qualitative and the metrics were calculated from the binary taxa lists, i.e. presence/absence data. For some of the selected metrics, which are normally based on quantitative data, e.g. Shannon–Wiener diversity index, this approach could only reflect the qualitative component of the error. Depending on the abundance structure of a sample, this approach may overestimate or underestimate the impact of differences in some metrics. However, this approach made it possible not only to test the effect of uncertainty on "counting" metrics, such as Number of Taxa, but also to get an idea of the uncertainty related to sorting and identification errors based on commonly used richness measures and functional metrics. The similarity between the primary and audit results was investigated by Jaccard similarity (Jaccard, 1901), which was calculated using PC-ORD Version 4.25 (McCune & Mefford, 1999). The statistical analysis of metric results included mean deviation and the spread of differences between primary and audit results. It was also possible to compare the performance of the STAR-AQEM and RIVPACS methods in regard to the deviation of metric results between primary and audit samples. Mann–Whitney U-tests (Mann & Whitney, 1947) were used to see whether deviation was larger using one particular method. The Wilcoxon test (Wilcoxon, 1945) was used to see if differences between primary and audit samples were significant. All statistical analyses were performed in Statistica 6.1 (StatSoft, 2002).
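For readers unfamiliar with it, Jaccard similarity between two presence/absence lists is the number of shared taxa divided by the number of taxa found in either list. The authors used PC-ORD for this; the short Python function below only illustrates the formula, and the name `jaccard`, the convention for two empty lists and the example taxa are assumptions of this sketch.

```python
def jaccard(list_a, list_b):
    """Jaccard similarity of two presence/absence taxa lists."""
    a, b = set(list_a), set(list_b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty lists are identical
    return len(a & b) / len(union)

# Hypothetical primary (P) and total audit (ATOT) lists.
P = {"Baetis rhodani", "Gammarus pulex", "Leuctra nigra"}
ATOT = {"Baetis rhodani", "Gammarus pulex", "Leuctra fusca", "Ancylus fluviatilis"}
print(jaccard(P, ATOT))  # 2 shared taxa out of 5 in total
```

A value of 1 means the two lists are identical, and each gain or loss lowers the value by enlarging the union or shrinking the intersection.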

Results

Absolute differences in resulting taxa lists

The results of the sorting audit are summarised as gains and the identification audit results as a combination of gains and losses. Figure 1 shows the number of differences between taxa lists based on gains and losses after each audit by different methods. In STAR-AQEM samples, the number of differences identified in the sorting audit is significantly smaller than those observed during identification audit (Wilcoxon Test, p<0.001). In both the national methods and the RIVPACS/PERLA methods complex there is no significant difference between the number of differences identified in the sorting or the identification audit (Wilcoxon Test, p<0.5). The number of gains observed at the sorting level is significantly higher for RIVPACS/PERLA than for national methods or for STAR-AQEM (Mann–Whitney U-Test, p<0.003). The number of gains observed at the identification audit and the total audit level is significantly lower in the national methods than in RIVPACS/PERLA or STAR-AQEM (Mann–Whitney U-Test, p<0.001). This could be because many of the national methods determine taxa at the family or genus level, where there is a very low error. In both RIVPACS/PERLA and STAR-AQEM identification is generally to species level, presumably leading to higher levels of identification difference. The results further indicate that in national method samples and RIVPACS/PERLA complex samples, both sorting and identification differences contribute about equally to the total number of differences, while in STAR-AQEM samples the differences are mainly caused by varying identification results.
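To make the paired comparison concrete, the sketch below computes the Wilcoxon signed-rank sums for two paired series, such as the per-sample number of differences found in the sorting audit and in the identification audit. It is a plain-Python illustration of the rank statistic only; it gives no p-value, it is not the Statistica routine the authors used, and the function name and the example counts are hypothetical.

```python
def wilcoxon_rank_sums(x, y):
    """Return (W+, W-), the rank sums of positive and negative paired
    differences x[i] - y[i]. Zero differences are dropped and tied
    absolute differences share their average rank."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Hypothetical per-sample counts of differences.
sorting_differences = [2, 0, 5, 3, 1]
identification_differences = [4, 3, 5, 9, 2]
print(wilcoxon_rank_sums(sorting_differences, identification_differences))
```

When one rank sum is close to zero, as in this made-up example, nearly all samples shift in the same direction, which is the pattern a significant test result reflects.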

The qualitative similarity between taxa lists based on primary results, sorting audit results, identification audit results and total audit results was also tested by calculating Jaccard similarity between the different fractions. This similarity value was calculated for samples from those countries where STAR-AQEM and RIVPACS/PERLA methods were applied (cf. Table 1). Figure 2 shows the Jaccard similarity values by method between the primary taxa lists (P) and the sorting audit taxa lists (AS) (P/AS), between the primary and identification audit taxa lists (AID) (P/AID) and between the primary and the total audit taxa lists (P/ATOT). In both methods, there is a significant difference in Jaccard similarity between sorting audit and identification audit: in STAR-AQEM Jaccard similarity is significantly higher after sorting audit, while in RIVPACS/PERLA it is significantly lower after the sorting audit (Wilcoxon Test for both, p<0.01). There is no significant difference in Jaccard similarity after identification audit or total audit between the two methods (Mann–Whitney U-Test, p>0.36). However, in RIVPACS/PERLA samples the Jaccard similarity between primary and sorting audit samples is significantly lower than in STAR-AQEM samples (Mann–Whitney U-Test, p<0.01). This indicates that in RIVPACS/PERLA samples the sorting error contributes more to the total error than differences in identification, while in STAR-AQEM samples the effect of the sorting audit is much less than that of the identification audit. In both methods, the Jaccard similarity is significantly lower after the total audit (P/ATOT: RIVPACS/PERLA: median=0.58; STAR-AQEM: median=0.63) than after the sorting audit or identification audit (Wilcoxon Test, p<0.01). This shows that in both methods there is a cumulative effect of both errors with respect to Jaccard similarity.

Metric results

For the same subset of samples, qualitative taxa lists were also used to calculate 12 metrics commonly used in river quality assessments, to examine the impact of sorting and identification error on the metrics and thus the assessment results. Six of the metrics examined were richness measures: number of taxa (No. Taxa), number of families (No. Families), number of genera (No. Genera), number of Ephemeroptera, Plecoptera, Trichoptera, Coleoptera, Odonata and Bivalvia taxa (EPTCOB Taxa), number of Ephemeroptera, Plecoptera, Trichoptera taxa (EPT Taxa) and the Shannon–Wiener Diversity index (Diversity Shannon–Wiener) (Shannon & Weaver, 1949). Two were relative measures of composition: the ratio of the number of Ephemeroptera, Plecoptera, Trichoptera taxa to the number of Diptera taxa (EPT/Diptera Taxa) and the number of taxa scored as r-strategists compared to the number of taxa scored as K-strategists (r/K relationship). Four were functional metrics: Biological Monitoring Working Party score (BMWP), the average score per taxon (ASPT) (both Armitage et al., 1983), the Rhithron Typie Index (RTI) (Biss et al., 2002) and the Rhithron Feeding Types Index (RETI), which analyses the proportion of shredders and grazers (Schweder, 1992). An explanation of all these metrics can be found in the AQEM-STAR assessment software ASTERICS (www.eu-star.at).
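As an illustration of how presence/absence data feed into two of the functional metrics, the following sketch computes BMWP and ASPT from a family list. The score dictionary is a small illustrative subset and should be checked against the published BMWP table before any real use; the function names and sample lists are hypothetical.

```python
# Illustrative subset of BMWP family scores (assumption: verify against the
# complete published score table before any real calculation).
BMWP_SCORES = {
    "Heptageniidae": 10, "Leuctridae": 10, "Gammaridae": 6,
    "Baetidae": 4, "Chironomidae": 2, "Oligochaeta": 1,
}

def bmwp(families):
    """Sum of family scores; each scoring family counts once (presence only)."""
    return sum(BMWP_SCORES[f] for f in set(families) if f in BMWP_SCORES)

def aspt(families):
    """Average score per taxon: BMWP divided by the number of scoring families."""
    scoring = [f for f in set(families) if f in BMWP_SCORES]
    return bmwp(scoring) / len(scoring) if scoring else 0.0

# Hypothetical primary family list and the list after a sorting audit gain.
primary = ["Gammaridae", "Baetidae", "Chironomidae", "Oligochaeta"]
audited = primary + ["Heptageniidae"]

print(bmwp(primary), aspt(primary))
print(bmwp(audited), aspt(audited))
```

A single high-scoring family missed during sorting changes both indices in this example, which shows why family-level presence/absence metrics are sensitive to sorting gains.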

Table 2 gives the absolute differences in metric values compared to the primary result. A Wilcoxon Test was used to see if there are significant differences between metric values scored for the primary taxa list and those scored after the sorting audit, the identification audit and the total audit. In STAR-AQEM samples there are significant differences between the primary results and results after sorting audit and identification audit for eight metrics (Wilcoxon Test, p<0.05). In RIVPACS/PERLA samples eight metrics showed significant differences between the primary metric results and the results after sorting, and seven metrics showed significant differences after the identification and total audit (Wilcoxon Test, p<0.05). In both methods, six of these significant differences were observed in metrics that measure taxonomic richness. BMWP was significantly different at all levels of the audit in both methods (Wilcoxon Test, p<0.05). In STAR-AQEM samples only two metrics were significantly different at the total audit level (BMWP and ASPT).
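The paired comparison behind these results can be sketched as an exact Wilcoxon signed-rank test. This is not the implementation used in the study, which relied on a statistics package; the function name and the metric values are hypothetical, and the exact enumeration shown here is only practical for a handful of pairs.

```python
from itertools import product

def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples.

    Returns (w_plus, w_minus, p_value). Zero differences are dropped and
    tied absolute differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    observed = min(w_plus, w_minus)
    total = sum(ranks)
    # Under the null hypothesis every pattern of signs is equally likely.
    extreme = 0
    for signs in product((False, True), repeat=n):
        s = sum(r for r, positive in zip(ranks, signs) if positive)
        if min(s, total - s) <= observed:
            extreme += 1
    return w_plus, w_minus, extreme / 2 ** n

# Hypothetical No. Taxa values for six samples before and after an audit.
primary_no_taxa = [30, 25, 41, 18, 36, 22]
audited_no_taxa = [33, 27, 45, 19, 41, 28]
w_plus, w_minus, p_value = wilcoxon_signed_rank(primary_no_taxa, audited_no_taxa)
print(w_plus, w_minus, p_value)
```

With 36 samples per method, enumerating all sign patterns is not feasible, so a normal approximation or an established statistics library would be the practical choice.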

The absolute differences between primary metric results and those metric results scored after sorting (P-AS), identification (P-AID) and total audit (P-ATOT) were also calculated (Fig. 3).
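A minimal sketch of how such per-sample absolute differences can be summarised as a mean and standard deviation follows. The metric values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev

def abs_diff_summary(primary_values, audited_values):
    """Mean and sample SD of per-sample absolute differences |P - A|."""
    diffs = [abs(p - a) for p, a in zip(primary_values, audited_values)]
    return mean(diffs), stdev(diffs)

# Hypothetical No. Taxa values for five samples (primary vs. sorting audit).
primary_no_taxa = [34, 28, 41, 19, 36]
sorting_audit_no_taxa = [37, 28, 45, 20, 38]

mean_diff, sd_diff = abs_diff_summary(primary_no_taxa, sorting_audit_no_taxa)
print(mean_diff, sd_diff)
```

Because absolute values are averaged, gains and losses in different samples cannot cancel each other in this summary, unlike in a net difference.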

In STAR-AQEM samples differences in metric results observed after identification audit are higher than after sorting audit for all metrics but ASPT and RETI. For RTI, No. Taxa, EPTCOB Taxa, EPT Taxa and Diversity Shannon–Wiener the difference is significant (Wilcoxon Test, p<0.02) (Table 2). In RIVPACS/PERLA samples differences in metric results are higher after sorting audit than after identification audit; the only exception is the r/K relationship (Table 2). The differences after sorting audit are significantly higher in eight metrics: all six richness measures, the BMWP score and the ASPT score (Wilcoxon Test, p<0.02). These results suggest that in RIVPACS/PERLA samples, sorting error causes more analytical error than identification, while in STAR-AQEM differences in identification appear to be more important than sorting error.

Figure 1. Box plots showing the number of differences observed during the sorting audit (AS: gains only) and the identification audit (AID: gains plus losses) and the cumulative number of differences (ATOT: gains plus losses) for 24 samples collected and analysed following national method protocols, 36 RIVPACS/PERLA (RIV/PER) samples and 60 STAR-AQEM samples. Box plots indicate the median, the 25–75th percentile range (box), outliers and extreme values. N = 120.

The differences after both audits are generally lower in STAR-AQEM samples than they are in RIVPACS/PERLA samples (Fig. 3). For all functional metrics the differences observed are largest after the total audit, suggesting a cumulative effect of the two audit levels. For richness measures the situation is different. In RIVPACS/PERLA samples the largest differences are observed at the sorting audit level (Table 2, Fig. 3). In STAR-AQEM samples the largest differences are observed at the identification audit level for richness measures (Table 2, Fig. 3).

Discussion

In this paper, we make a first attempt to evaluate the analytical error observed in stream assessments based on sorting and identification components of laboratory sample treatment. The errors entailed in these processes are important for providing confidence in assessment results. There are many other sources of variation in macroinvertebrate sampling and sample analysis. These include natural variability (e.g. McElravy et al., 1989; Weatherby & Ormerod, 1990; Boulton & Lake, 1992), operator dependent sampling variability (Clarke et al., 2002), variability in sample

Figure 2. Box plots showing Jaccard similarity values between the primary taxa lists (P) and the taxa lists based on the sorting audit (AS) (P/AS), between the primary taxa lists and the identification audit (AID) taxa lists (P/AID) and between the primary taxa lists and the total audit (ATOT) taxa lists (P/ATOT) for 36 RIVPACS/PERLA (RIV/PER) and 36 STAR-AQEM samples. Box plots indicate the median, the 25–75th percentile range (box), outliers and extreme values. N = 72.

Table 2. Mean values (Ø) and standard deviation (SD) of absolute differences in metric values

| Method | Metric | Primary (P) Ø ± SD | Sorting audit (P-AS) Ø ± SD | Identification audit (P-AID) Ø ± SD | Total audit (P-ATOT) Ø ± SD | p* (P/AS) | p* (P/AID) | p* (P/ATOT) |
|---|---|---|---|---|---|---|---|---|
| STAR-AQEM | BMWP | 118.36 ± 44.89 | 4.92 ± 7.90 | 7.92 ± 11.73 | 9.53 ± 8.65 | <0.001 | <0.001 | 0.031 |
| STAR-AQEM | ASPT | 6.70 ± 0.98 | 0.10 ± 0.32 | 0.10 ± 0.23 | 0.30 ± 0.50 | 0.041 | 0.011 | <0.001 |
| STAR-AQEM | RTI | 11.10 ± 4.15 | 0.26 ± 0.48 | 0.55 ± 0.55 | 0.94 ± 1.00 | 0.600 | 0.789 | 0.307 |
| STAR-AQEM | RETI | 0.58 ± 0.11 | 0.03 ± 0.05 | 0.02 ± 0.02 | 0.05 ± 0.08 | 0.175 | 0.926 | 0.598 |
| STAR-AQEM | EPT/Diptera Taxa | 3.12 ± 2.17 | 0.34 ± 0.64 | 0.55 ± 0.86 | 0.99 ± 1.41 | 0.218 | 0.939 | 0.304 |
| STAR-AQEM | r/K relationship | 0.06 ± 0.05 | 0.01 ± 0.01 | 0.01 ± 0.01 | 0.02 ± 0.03 | 0.260 | 0.125 | 0.092 |
| STAR-AQEM | No. Taxa | 34.53 ± 15.12 | 3.00 ± 3.22 | 5.42 ± 4.06 | 3.14 ± 2.98 | <0.001 | <0.001 | 0.352 |
| STAR-AQEM | No. Families | 21.58 ± 6.51 | 1.08 ± 1.36 | 1.31 ± 1.85 | 1.42 ± 1.25 | <0.001 | <0.001 | 0.737 |
| STAR-AQEM | No. Genera | 27.56 ± 10.46 | 1.78 ± 2.27 | 2.58 ± 2.89 | 2.56 ± 2.47 | <0.001 | <0.001 | 0.596 |
| STAR-AQEM | EPTCOB Taxa | 24.92 ± 13.04 | 2.17 ± 2.62 | 3.89 ± 3.47 | 2.81 ± 2.75 | <0.001 | <0.001 | 0.244 |
| STAR-AQEM | EPT Taxa | 17.83 ± 10.27 | 1.33 ± 1.96 | 3.14 ± 2.87 | 2.36 ± 2.32 | <0.001 | <0.001 | 0.110 |
| STAR-AQEM | Diversity Shannon–Wiener | 3.41 ± 0.60 | 0.09 ± 0.12 | 0.15 ± 0.11 | 0.10 ± 0.09 | <0.001 | <0.001 | 0.214 |
| RIVPACS | BMWP | 124.72 ± 51.88 | 18.83 ± 18.49 | 6.83 ± 12.54 | 19.39 ± 18.18 | <0.001 | 0.034 | <0.001 |
| RIVPACS | ASPT | 6.90 ± 0.96 | 0.22 ± 0.28 | 0.09 ± 0.16 | 0.28 ± 0.29 | 1.000 | 0.438 | 0.797 |
| RIVPACS | RTI | 11.54 ± 3.43 | 0.59 ± 0.88 | 0.48 ± 0.66 | 1.01 ± 0.90 | 0.265 | 0.514 | 0.838 |
| RIVPACS | RETI | 0.58 ± 0.15 | 0.04 ± 0.04 | 0.02 ± 0.03 | 0.05 ± 0.05 | 0.299 | 0.355 | 0.528 |
| RIVPACS | EPT/Diptera Taxa | 3.64 ± 2.38 | 0.52 ± 0.74 | 0.46 ± 0.48 | 0.69 ± 0.88 | 0.027 | 0.688 | 0.188 |
| RIVPACS | r/K relationship | 0.04 ± 0.05 | 0.01 ± 0.02 | 0.01 ± 0.02 | 0.02 ± 0.03 | 0.207 | 0.614 | 0.432 |
| RIVPACS | No. Taxa | 36.19 ± 17.64 | 10.89 ± 8.86 | 5.28 ± 6.39 | 8.17 ± 7.21 | <0.001 | <0.001 | <0.001 |
| RIVPACS | No. Families | 21.86 ± 8.41 | 4.08 ± 3.86 | 1.39 ± 2.23 | 4.06 ± 3.88 | <0.001 | 0.003 | <0.001 |
| RIVPACS | No. Genera | 28.53 ± 12.96 | 6.56 ± 6.12 | 2.83 ± 4.31 | 5.83 ± 5.46 | <0.001 | <0.001 | <0.001 |
| RIVPACS | EPTCOB Taxa | 26.56 ± 14.12 | 7.78 ± 6.49 | 3.36 ± 4.03 | 5.78 ± 5.54 | <0.001 | <0.001 | <0.001 |
| RIVPACS | EPT Taxa | 20.33 ± 11.77 | 4.53 ± 4.65 | 2.44 ± 3.08 | 3.58 ± 3.78 | <0.001 | <0.001 | 0.001 |
| RIVPACS | Diversity Shannon–Wiener | 3.39 ± 0.76 | 0.29 ± 0.35 | 0.15 ± 0.16 | 0.26 ± 0.36 | <0.001 | <0.001 | <0.001 |

Metric values based on the sorting audit (P-AS), identification audit (P-AID) and total audit (P-ATOT) compared to the values observed using the primary taxa list (P). Significance of differences between P-AS and P-AID and significance between the average primary metric result and the average metric results at the different audit levels were tested using the Wilcoxon Test (p indicates level of significance). *Wilcoxon Test: values of p<0.05 are significant.

treatment (Haase et al., 2004b) and sub-sampling variability (Lorenz et al., 2004; Clarke et al., 2006b). The present study provides a first approximation of the quality and degree of error that may be observed from sorting errors and operator-dependent differences in macroinvertebrate identifications. Aspects of variability and sources of error related to replicate sampling (Clarke et al., 2006a), sub-sampling (Clarke et al., 2006a; Vlek et al., 2006) and natural variability (Sporka et al., 2006) are discussed in other essays in this issue.

Absolute differences in resulting taxa lists

Our study provides some interesting insights into the two sources of error examined in this study: sorting and identification error. The two components of error play a different role in STAR-AQEM and RIVPACS/PERLA samples. While sorting error seems to be more important in RIVPACS/PERLA samples, identification error seems to be more important in STAR-AQEM samples (Figs. 1–3). The effect of identification error should, by its nature, be similar or equal in both methods because it is caused by two analysts looking at the same set of specimens. The difference in taxonomic expertise or interpretations of distinguishing morphological characters (e.g. relative bristle length or coloration) is the same for the operators, independent of the method used to obtain the set of specimens. Therefore, it is not surprising that no significant differences in identification error between the two methods were detected.

STAR-AQEM samples seem to be less affected by sorting error than RIVPACS/PERLA samples. This could be the result of differences in the sorting procedures for the methods. While both methods apply a sub-sampling procedure, the sub-sampling approach is very different. In STAR-AQEM, a defined fraction of the sample is completely sorted and all animals are removed from the sub-sample for identification. In RIVPACS, the whole sample is sorted sequentially by transferring small aliquots of sample material into a dish and sorting a defined fraction of this dish (e.g. 1/4 or 1/8), depending on the number of specimens in the total sample (see Furse et al., 2006 for a detailed sampling and sorting protocol for the methods used in the STAR project). This defined fraction will from here on be referred to as the "sorted fraction". The rest of the material in the dish is scanned and only taxa that have not been observed in the sorted fraction or any of the sorted fractions in previous dishes are picked and recorded. Also, individuals of abundant taxa are not all removed from the sorted fraction; they are left in the tray and counted instead. This inevitably leads to a more variable sorting protocol in RIVPACS, which also requires a higher level of taxonomic expertise from the person sorting the sample than the sorting protocol in STAR-AQEM (Haase et al., 2004a, b and references therein). This source of error may be overestimated in the present study, because for many partners, this was the first time they applied the RIVPACS protocol. Although the same is true for most partners with respect to the STAR-AQEM protocol, the complexity of the RIVPACS protocol may make it more prone to mistakes by novices.
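The dish-by-dish logic of the RIVPACS sorting described above can be sketched as follows. This is a strongly simplified model and not the protocol itself: the function name, the representation of a dish as a list of specimen names, and the choice of the first specimens in each list as the sorted fraction are all assumptions made for illustration.

```python
from collections import Counter

def rivpacs_style_sort(dishes, fraction=0.25):
    """Simplified model of sequential dish sorting with a sorted fraction.

    dishes: list of dishes, each a list of taxon names (one per specimen).
    fraction: share of each dish that is fully sorted (assumed fixed here).
    Returns (counts from the sorted fractions, taxa recorded by scanning only).
    """
    counts = Counter()    # specimens counted in the sorted fractions
    scanned_only = set()  # taxa picked while scanning the rest of a dish
    for dish in dishes:
        cut = max(1, round(len(dish) * fraction)) if dish else 0
        sorted_part, rest = dish[:cut], dish[cut:]
        counts.update(sorted_part)
        for taxon in rest:
            # Only taxa not yet seen in any sorted fraction are picked.
            if taxon not in counts:
                scanned_only.add(taxon)
    # A taxon that later appears in a sorted fraction is no longer scan-only.
    scanned_only.difference_update(counts)
    return counts, scanned_only

# Hypothetical sample split into two dishes.
dishes = [
    ["Baetis", "Baetis", "Gammarus", "Baetis", "Elmis", "Baetis", "Gammarus", "Leuctra"],
    ["Gammarus", "Baetis", "Ancylus", "Baetis"],
]
counts, scanned_only = rivpacs_style_sort(dishes)
print(counts, scanned_only)
```

Even in this toy version the sorter must recognise during scanning whether a taxon has already been recorded, which is the step that demands taxonomic expertise and may make the protocol more variable.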

Our results indicate that, for RIVPACS samples, the sorting error is as important as the identification error, while in STAR-AQEM samples the sorting error is less severe. This is supported by both the Jaccard similarity analyses and the number of gains and losses observed at each level of the audit procedure (Figs. 1 and 2).

There appears to be a cumulative effect of analytical error in the two levels of audit. Taken singly, sorting error and differences in identification still leave a high Jaccard similarity (>0.8) between samples regardless of method. The cumulative effect is much more severe in both methods. In both methods the cumulative error decreases Jaccard similarity to about 0.6; 60% shared taxa between pre- and post-audit taxa lists is a very poor value. In both methods there are numerous samples in which differences based only on analytical error exceed this value. Similar values are observed, for example, when comparing caddisfly assemblages from different regions (Wiberg-Larsen et al., 2000) or differently impacted sites along a river stretch (Ganasan & Hughes, 1998). This suggests that the differences between taxa lists caused by the analytical errors assessed in this study are severe. These results stress the need for a high degree of standardisation of methods and raise the issue of increasing confidence in assessment results through independent sample auditing. Our results suggest that errors caused during sorting and identification procedures cannot be ignored in river quality assessment. Several questions concerning the effect of error on quantitative data sets and the practical implementation of auditing schemes should be subject to further study. For example, are the errors observed in the present study more or less pronounced in quantitative data sets? If they are as pronounced, are the errors constant or stable and can the degree of error be estimated in a one-off survey or research project? If not, must they be measured continually as an integral part of the survey?

Metric results

This study provides a preliminary view of how metrics are affected by sorting and identification errors, but can only quantify these errors on single qualitative metrics. How quantitative metrics or multimetric assessment results are affected can only be estimated. The reason for this lies in the qualitative nature of the audit approach. The indices of the BMWP-score system (National Water Council, 1981) were ideally suited to the sorting audit because they are based on family-level presence-absence data and included both an index of organic pollution (ASPT) and of general stress (No. Taxa). However, other metrics which make use of the absolute or relative abundances of taxa or involve species are less suited to this qualitative approach. Many of the metrics used for stream assessment rely not only on presence/absence data, but also on abundance data for each taxon. Four of the twelve metrics investigated in this study normally use quantitative data. The results obtained for these metrics based on qualitative data must therefore be interpreted with caution. Also, all of the multimetric assessment schemes implemented to date for the countries whose samples were analysed in this study are based on at least one metric that requires quantitative data (Böhmer et al., 2004; Hering et al., 2004a; Ofenböck et al., 2004). Therefore, we cannot estimate the effect of sorting and identification error on a multimetric assessment result. It is important that this subject is addressed by a quantitative audit scheme in future studies. Then error ranges can be assigned to assessment results and confidence in assessment results increased.

Figure 3. Box plots showing the absolute differences in metric values between the primary taxa lists (P) and the taxa lists based on the sorting audit (AS) (P-AS), between the primary taxa lists and the identification audit (AID) taxa lists (P-AID) and between the primary taxa lists and the total audit (ATOT) taxa lists (P-ATOT) for 36 RIVPACS/PERLA (RIV/PER) and 36 STAR-AQEM samples. Box plots indicate the median, the 25–75th percentile range (box), outliers and extreme values. N = 72.

Despite the difficulties related to the qualitative nature of our audit design, some interesting observations were made regarding the effect of analytical error on certain metrics. Intuitively, one would expect the total number of differences or the total error (observed in both the sorting and identification audit) to be higher than that observed in the sorting or the identification audit only, independent of the method. For the number of gains and losses and Jaccard similarity values, this was the case. This was also the case for functional metrics. In qualitative metrics that are indicative of richness, the results presented in this study are different and somewhat counter-intuitive. In RIVPACS/PERLA samples especially, the differences in richness metrics are more pronounced in the sorting audit than in the total audit result where both sorting and identification differences are considered (Fig. 3, Table 2). One explanation for this could be that in RIVPACS samples, most gains were identified in the sorting audit. These gains (GS) plus the number of gains observed in the identification audit (GID) are not eliminated by the losses (L) observed in the identification audit (GS + GID − L = 8.03). Therefore, in metrics that measure taxonomic richness, i.e. qualitative metrics that count taxa, the number of differences increases more strongly between original and audited samples in RIVPACS samples than it does in STAR-AQEM samples. This is because in STAR-AQEM samples, sorting and identification gains are more or less eliminated by identification losses (GS + GID − L = 0.44). The higher number of differences observed in the sorting audit compared to the total audit can be explained by the number of losses. On average, GID − L is −2.83 for RIVPACS/PERLA and −2.55 for STAR-AQEM, so the effect is about the same for both methods. However, because GS is much greater in RIVPACS than in STAR-AQEM samples, the differences in number of taxa between the primary results and the sorting audit results are greater than those between the primary result and the sorting plus identification audit result. This effect is maximised in metrics with the highest level of taxonomic differentiation, i.e. highest probability of observing differences, e.g. No. Taxa. The effect is reduced as the number of taxa observed is reduced. This is, for example, the case when only certain taxonomic groups are considered (effect in EPTCOB Taxa > EPT Taxa) or when the level of identification is lowered (effect in No. Taxa > No. Genera > No. Families).
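The average sorting gains implied by these figures can be recovered with one rearrangement. The step below uses only the values quoted in the text; the symbol for the net change in the number of taxa is introduced here for illustration.

```latex
\begin{align*}
\Delta_{\mathrm{TOT}} &= G_S + G_{ID} - L \\
G_S &= \Delta_{\mathrm{TOT}} - (G_{ID} - L) \\
\text{RIVPACS/PERLA:}\quad G_S &= 8.03 - (-2.83) = 10.86 \\
\text{STAR-AQEM:}\quad G_S &= 0.44 - (-2.55) = 2.99
\end{align*}
```

These values agree closely with the mean differences in No. Taxa after the sorting audit given in Table 2 (10.89 and 3.00), as would be expected because the sorting audit records gains only; the small discrepancies may reflect rounding.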

It thus appears that some metrics will hardly be affected by the cumulative error in sorting and identification of samples. These are the metrics that count taxa as measures of species richness. While affected by both sorting and identification errors, the overall number of taxa, and therefore the number of taxa belonging to a taxonomic group, is hardly influenced in the overall assessment as the errors do not act cumulatively but cancel each other out. For example, if the primary analyst identifies Drusus annulatus and another analyst identifies the same individuals as Drusus destitutus, one would have two differences in the taxa lists, but no differences in the number of taxa, genera, families or EPT taxa. Functional metrics may however respond to such differences, e.g. feeding types, r/K relationship, ASPT or BMWP scores, saprobic valences.
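The Drusus example can be written out as a short sketch. Apart from the two Drusus species named in the text, the taxa lists are hypothetical, and the function and variable names are chosen for this illustration only.

```python
def gains_and_losses(primary, audited):
    """Taxa gained (only in the audit) and lost (only in the primary list)."""
    gains = set(audited) - set(primary)
    losses = set(primary) - set(audited)
    return gains, losses

def genera(taxa):
    """Genus names, taken as the first word of each binomial."""
    return {name.split()[0] for name in taxa}

primary = {"Drusus annulatus", "Baetis rhodani", "Gammarus pulex"}
audited = {"Drusus destitutus", "Baetis rhodani", "Gammarus pulex"}

gains, losses = gains_and_losses(primary, audited)
n_differences = len(gains) + len(losses)       # differences between taxa lists
richness_change = len(audited) - len(primary)  # change in No. Taxa
print(n_differences, richness_change, genera(primary) == genera(audited))
```

The two lists differ in two entries while the number of taxa and the set of genera stay the same, so richness counts do not register the disagreement, whereas a metric that uses species-specific traits could.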

Sorting and identification audits and quality control

Regardless of the method used, a considerable amount of sorting and identification error was detected. It also became evident that these errors affect metric results and should therefore be taken into account in stream assessment.

The performance of partners varied considerably, especially in the sorting audit. This could be the result of the limited experience of some partners with one or both of the protocols. Another reason might be the existence of an audit system. In our study the best performance (of all samples) in the sorting audit was achieved by the UK partner, the only country which established an audit system many years ago. It appears that experience and regular auditing of samples lead to better quality of performance. Errors in the long-term auditing scheme in the UK were greatest in the first year of the audit and have decreased over time for various laboratories in the UK, including other government agencies and commercial contractors. Poor results are especially common in the first audit but improve very rapidly thereafter (Murray-Bligh et al., 2006). This is an effect of training and experience, but may also be an effect of operators knowing that samples can and will be audited. Those partners whose audit results were much poorer than expected are unlikely to have similarly poor results if they are audited again in the future. Conclusions made on the basis of the results in this study may therefore differ from those of potential future studies with a similar auditing scheme and the results may not be generally applicable to laboratories currently involved in operational monitoring schemes with auditing. However, the present study and past experience with auditing schemes show that there is a considerable effect resulting from experience and the training that operators receive. Biologists often receive no formal training, particularly in sorting, and unless someone points out mistakes, they will remain unaware of shortcomings. Sorting is conceptually very simple and the task is sometimes left to the most junior and inexperienced biologists. The audit results demonstrate that sorting is in fact a task that requires more skill than has been recognised in the past. The audit results also point out the need for formal training and audit strategies for operators working to implement bioassessment schemes using macroinvertebrates.

Extensive training is of utmost importance in the identification of macroinvertebrates. This is demonstrated by the large contribution identification error made to the total audit error (Figs. 1 and 2). In STAR-AQEM it was the main component of error, and was equally important in RIVPACS. Correct taxonomic identification is also very important when assessment strategies are based on metrics, because many functional metrics are based on species-specific autecological data. As fewer and fewer alpha-taxonomic skills are being taught within tertiary education programs around the globe, the need for specialist-based training and extra-curricular schooling for analysts dealing with the identification of stream biota will become increasingly important. Rigid training programs and auditing schemes will minimise analytical error related to sample sorting and species identification observed in this study. This will increase the precision of assessment results and strengthen water managers' confidence in assessment results. Such programs should therefore become an integral part of biological stream assessment in the future and seem vital for the successful implementation of the EU-WFD.

Acknowledgements

We would like to thank all project partners who contributed data to this study. Rebecca Bloch, Britta Gehenio and Jenny Schmidt are thanked for assistance in data formatting. This study was supported by the EU and presents results from the STAR project (Contract No: EVK1-CT 2001-00089).

References

AQEM consortium, 2004. AQEMdip: AQEM data input program. Downloadable from http://www.eu-star.at.

Armitage, P. D., D. Moss, J. F. Wright & M. T. Furse, 1983. The performance of a new biological water quality score system based on macroinvertebrates over a wide range of unpolluted running-water sites. Water Research 17: 333–347.

Biss, R., P. Kübler, I. Pinter & U. Braukmann, 2002. Leitbildbezogenes biozönotisches Bewertungsverfahren für Fließgewässer (aquatischer Bereich) in der Bundesrepublik Deutschland. Ein erster Beitrag zur integrierten ökologischen Fließgewässerbewertung – Final report on CD-ROM. UBA Texts 62/02, Berlin.

Böhmer, J., C. Rawer-Jost, A. Zenker, C. Meier, C. Feld, R. Biss & D. Hering, 2004. Development of a multimetric invertebrate based assessment system for German rivers. Limnologica 34: 416–432.

Boulton, A. J. & P. S. Lake, 1992. The ecology of two streams in Victoria, Australia. III. Temporal changes in species composition. Freshwater Biology 27: 123–138.

Cao, Y., C. P. Hawkins & M. R. Vinson, 2003. Measuring and controlling data quality in biological assemblage surveys with special reference to stream benthic macroinvertebrates. Freshwater Biology 48: 1898–1911.

Carter, J. L. & V. H. Resh, 2001. After site selection and before data analysis: sampling, sorting, and laboratory procedures used in stream benthic macroinvertebrate monitoring programs by USA state agencies. Journal of the North American Benthological Society 20: 658–682.

Clarke, R. T., 2000. Uncertainty in estimates of river quality based on RIVPACS. In Wright, J. F., D. W. Sutcliffe & M. T. Furse (eds), Assessing the Biological Quality of Freshwaters: RIVPACS and Similar Techniques. Freshwater Biological Association, Ambleside: 39–54.

Clarke, R. T., M. T. Furse, R. J. M. Gunn, J. M. Winder & J. F. Wright, 2002. Sampling variation in macroinvertebrate data and implications for river quality indices. Freshwater Biology 47: 1735–1751.

Clarke, R. T., J. Davy-Bowker, L. Sandin, N. Friberg, R. K. Johnson & B. Bis, 2006a. Estimates and comparisons of the effects of sampling variation using 'national' macroinvertebrate sampling protocols on the precision of metrics used to assess ecological status. Hydrobiologia 566: 477–503.

Clarke, R. T., A. Lorenz, L. Sandin, A. Schmidt-Kloiber, J. Strackbein, N. T. Kneebone & P. Haase, 2006b. Effects of sampling and sub-sampling variation using the STAR-AQEM sampling protocol on the precision of macroinvertebrate metrics. Hydrobiologia 566: 441–459.

Doberstein, C., J. Karr & L. Conquest, 2000. The effect of fixed-count subsampling on macroinvertebrate biomonitoring in small streams. Freshwater Biology 44: 355–371.

European Union, 2000. Directive 2000/60/EC. Establishing a framework for community action in the field of water policy. European Commission PE-CONS 3639/1/100 Rev 1, Luxemburg.

Furse, M., D. Hering, O. Moog, P. Verdonschot, R. K. Johnson, K. Brabec, K. Gritzalis, A. Buffagni, P. Pinto, N. Friberg, J. Murray-Bligh, J. Kokes, R. Alber, P. Usseglio-Polatera, P. Haase, R. Sweeting, B. Bis, K. Szoszkiewicz, H. Soszka, G. Springe, F. Sporka & I. Krno, 2006. The STAR project: context, objectives and approaches. Hydrobiologia 566: 3–29.

Ganasan, V. & R. M. Hughes, 1998. Application of an index of biological integrity (IBI) to fish assemblages of the rivers Khan and Kshipra (Madhya Pradesh), India. Freshwater Biology 40: 367–383.

Haase, P., S. Lohse, S. Pauls, K. Schindehütte, A. Sundermann, P. Rolauffs & D. Hering, 2004a. Assessing streams in Germany with benthic invertebrates: development of a practical standardised protocol for macroinvertebrate sampling and sorting. Limnologica 34: 349–365.

Haase, P., S. Pauls, A. Sundermann & A. Zenker, 2004b. Testing different sorting techniques in macroinvertebrate samples from running waters. Limnologica 34: 366–378.

Hering, D., C. Meier, C. Rawer-Jost, R. Biss, C. Feld, A. Zenker, A. Sundermann, S. Lohse & J. Böhmer, 2004a. Assessing streams in Germany with benthic invertebrates: selection of candidate metrics. Limnologica 34: 398–415.

Hering, D., O. Moog, L. Sandin & P. F. M. Verdonschot, 2004b. Overview and application of the AQEM assessment system. Hydrobiologia 516: 1–20.

Jaccard, P., 1901. Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Société Vaudoise des Sciences Naturelles 37: 547–579.

Lorenz, A., L. Kirchner & D. Hering, 2004. 'Electronic subsampling' of macrobenthic samples: how many individuals are needed for a valid assessment result? Hydrobiologia 516: 299–312.

Mann, H. B. & D. R. Whitney, 1947. On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics 18: 50–60.

McCune, B. & M. J. Mefford, 1999. PC-ORD. Multivariate Analysis of Ecological Data. Version 4.25. MjM Software, Gleneden Beach, Oregon, USA.

McElravy, E. P., G. A. Lamberti & V. H. Resh, 1989. Year-to-year variation in the aquatic macroinvertebrate fauna of a northern Californian Stream. Journal of the North American Benthological Society 8: 51–63.

Murray-Bligh, J. A. D., M. T. Furse, F. H. Jones, R. J. M. Gunn, R. A. Dines & J. F. Wright, 1997. Procedure for collecting and analysing macroinvertebrate samples for RIVPACS. Joint publication by the Institute of Freshwater Ecology and the Environment Agency, 162 pp.

Murray-Bligh, J., J. van der Molen & P. Verdonschot, 2006. STAR deliverable No. 7: Audit of Performance incorporating Results of the La Bresse sampling and analysis workshop. Unpublished report. www.eu-star.at.

National Water Council, 1981. River Quality: The 1980 Survey and Future Outlook. National Water Council, UK.

Ofenböck, T., O. Moog, J. Gerritsen & M. Barbour, 2004. A stressor specific multimetric approach for monitoring running waters in Austria using benthic macro-invertebrates. Hydrobiologia 516: 251–268.

Ostermiller, J. D. & C. P. Hawkins, 2004. Effects of sampling error on bioassessments of stream ecosystems: application to RIVPACS-type models. Journal of the North American Benthological Society 23: 363–382.

Shannon, C. E. & W. Weaver, 1949. Mathematical Theory of Communication. The University of Illinois Press, Urbana, IL.

Schweder, H., 1992. Neue Indices für die Bewertung des ökologischen Zustandes von Fließgewässern, abgeleitet aus der Makroinvertebraten-Ernährungstypologie. Limnologie Aktuell 3: 353–377.

Sporka, F., H. E. Vlek, E. Bulankova & I. Krno, 2006. Influence of seasonal variation on bioassessment of streams using macroinvertebrates. Hydrobiologia 566: 543–555.

StatSoft, Inc., 2002. STATISTICA for Windows (Software-System for Data Analysis) Version 6.1. www.statsoft.com.

Weatherby, N. S. & S. J. Ormerod, 1990. The constancy of univoltine assemblages in soft water streams: implications for the publication and detection of environmental change. Journal of Applied Ecology 27: 952–964.

Wiberg-Larsen, P., K. P. Brodersen, S. Birkholm, P. N. Grøn & J. Skriver, 2000. Species richness and assemblage structure of Trichoptera in Danish streams. Freshwater Biology 43: 633–647.

Wilcoxon, F., 1945. Individual Comparisons by Ranking Methods. Biometrics 1: 80–83.

Vlek, H. E., F. Sporka & I. Krno, 2006. Influence of macroinvertebrate sample size on bioassessment of streams. Hydrobiologia 566: 523–542.