
Uncertainty evaluation in proficiency testing: state-of-the-art, challenges, and perspectives




Received: 29 September 2000
Accepted: 3 December 2000

Presented at the EURACHEM/EQUALM Workshop “Proficiency Testing in Analytical Chemistry, Microbiology and Laboratory Medicine”, 24–26 September 2000, Borås, Sweden

Abstract The evaluation of measurement uncertainty, and that of uncertainty statements of participating laboratories, will be a challenge to be met in the coming years. The publication of ISO 17025 has led to the situation that testing laboratories should, to a certain extent, meet the same requirements regarding measurement uncertainty and traceability. As a consequence, proficiency test organizers should deal with the issues of measurement uncertainty and traceability as well. Two common statistical models used in proficiency testing are revisited to explore the options to include the evaluation of the measurement uncertainty of the PTRV (proficiency test reference value). Furthermore, the use of this PTRV and its uncertainty estimate for assessing the uncertainty statements of the participants will be discussed for the two models. It is concluded that, in analogy to Key Comparisons, it is feasible to implement proficiency tests in such a way that the new requirements can be met.

Keywords Proficiency testing · Measurement uncertainty · Reference value · Consensus value · Assessment of laboratories

Accred Qual Assur (2001) 6:160–163
© Springer-Verlag 2001
PRACTITIONER’S REPORT

Adriaan M.H. van der Veen

Uncertainty evaluation in proficiency testing: state-of-the-art, challenges, and perspectives

Introduction

The current practice in proficiency testing differs considerably from the practice in comparisons in the calibration area. This is not caused by differences between calibration and testing; it finds its origin in the fact that most test results are at best accompanied by an indication of their repeatability, whereas calibration results come with an uncertainty statement. In view of the new ISO 17025 [1], this difference will disappear, leaving a task for the proficiency testing providers in redesigning their services. Uncertainty calculations will play a dominant role at all levels of proficiency testing in the near future. The laboratories are required to express their uncertainty, and the organizer of the proficiency tests will be required to evaluate the uncertainty statements delivered.

This paper aims to set the frame for the newly designed proficiency tests. Furthermore, it will compare the proposed new practices with classical proficiency testing as it is carried out today. An important aspect of this comparison is to see how these developments affect the assessment of the participants’ results. Obviously, if a laboratory performs well today, it should also do so tomorrow. This holds only for the reported value; as the laboratory has to deliver an uncertainty statement, there is still the option of performing unsatisfactorily on this part.

The “Guide to the expression of uncertainty in measurement” (GUM) [2] provides the framework for doing uncertainty calculations. It does not distinguish between physics, chemistry, or biology; neither does it distinguish between calibration and testing. This observation is very important, as it allows the option of using well-designed approaches from key comparisons, or other comparisons in the calibration area. The nature of the problems in designing a proficiency test does not differ from that in the calibration area. Problems like obtaining a reference value, expressing its uncertainty, and dealing with covariances and correlations are all the same.

A.M.H. van der Veen
Nederlands Meetinstituut,
Schoemakerstraat 97,
2628 VK Delft,
The Netherlands
e-mail: [email protected]
Tel.: +31-15-269-1733
Fax: +31-15-261-2971



Basic considerations for evaluating measurement uncertainty

The basis for proficiency testing is described in ISO Guide 43-1:1997 [3]. One of the tools necessary to assess the performance of the participating laboratories is an assigned value, which is used as a reference point. In this paper, the abbreviation PTRV (proficiency test reference value) will be used for this purpose. Classically, there are two ways to obtain a PTRV:

1. By prior measurement (“reference value”)
2. From the participants’ results (“consensus value”)

Irrespective of the model chosen, the GUM [2] provides a framework for the evaluation of the measurement uncertainty of the PTRV. From a fundamental point of view, there is no difference between the two ways of obtaining a PTRV. A practical example of working out the establishment of a PTRV using prior measurement is given elsewhere [4]. Although the process is not uncomplicated, the estimation of measurement uncertainty is certainly feasible.

When working with a consensus value, the philosophy is not different: the GUM can be implemented straightforwardly, as soon as the establishment of the consensus value is defined appropriately. There are, however, some practical difficulties to be overcome, which have mainly to do with the quality of the participants’ data. It should be noted first that the quality of the PTRV is directly dependent on the quality of the participants’ data. This will be reflected in the uncertainty of the PTRV as well. A further problem is the presence of suspicious results (e.g., outliers). It is not acceptable in a proficiency test to work without some policy to treat outliers.

In this paper, the establishment of a PTRV through consensus among participants will be revisited. There are, in fact, a few different cases to be considered:

1. Results with credible uncertainty statements
2. Results with non-credible uncertainty statements
3. Results without uncertainty statements

PTRV through consensus

Establishment of a PTRV through consensus is more complicated than through prior measurement. The reason for this is that it is more difficult to develop a set of assumptions and assertions that is in compliance with the data obtained and at the same time a sufficient basis on which to develop an algorithm. The days are gone when all data from all participants could be thrown into a big “hat” and the consensus value would automatically come out. Building consensus values is probably one of the most complex tasks to be carried out by the organizer.

The other mainstream design, with a PTRV based on prior measurement, is always easier to implement. The understanding of what is going on during the establishment of reference values is usually better than in the case of consensus values: consensus values are often used in cases too complex to be handled by reference values. This is often a result of a lack of understanding, in terms of modeling, of the measurement problem. Properties of the sample, matrix effects, extraction/destruction yields, etc. all contribute greatly to this lack of understanding. All these aspects, which may greatly influence the measurement results and therefore also their uncertainty, may lead to the conclusion that working with a consensus value is inevitable. So, this lack of understanding has more to do with the state of the art in measurement science than with the skills of the team operating the proficiency test.

The topic of correlation between measurement results is a very critical one, and it is gaining more and more interest. The assumption of IID data (independent, identically distributed data) is easily made, but difficult to verify, and in most cases highly critical. If data are not IID, most of the known statistics do not work. Often, the problem is not so much in the distribution; it is more in the (in)dependence. Dependent data can already be observed in cases where all laboratories use the same pure substances for their calibration, for instance. This happens, for example, in PAH analysis, where there is only one series of certified pure substances available. Obviously, the purity data of these substances cannot be treated as being independent.

Both in testing and in calibration, correlation of data plays an important role. Ignoring the fact that data are correlated leads to wrong uncertainty estimates. The worst part of the message is that it is not even known whether this leads to over- or underestimation problems. As a result, it will simply not work to ignore correlations. A safe practice is to drop the assumption of independence, and to work from there. It does make life somewhat more complicated under certain circumstances, but underestimation problems will be avoided.
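The size of this effect can be illustrated with a small numerical sketch (all figures hypothetical): for p results that share a common standard uncertainty s and a uniform pairwise correlation coefficient rho, the variance of their mean is s^2(1 + (p-1)rho)/p, which collapses to the familiar s^2/p only when rho = 0.

```python
import math

def u_mean(s: float, p: int, rho: float) -> float:
    """Standard uncertainty of the mean of p equally correlated results,
    each with standard uncertainty s and common pairwise correlation rho.
    For rho = 0 this reduces to the usual s / sqrt(p)."""
    return math.sqrt(s**2 * (1.0 + (p - 1) * rho) / p)

# Hypothetical figures: 10 laboratories, s = 0.5
print(u_mean(0.5, 10, 0.0))  # independent case: s / sqrt(10)
print(u_mean(0.5, 10, 0.5))  # correlated case: more than twice as large
```

Note that with rho = 0.5 the uncertainty of the mean barely shrinks as p grows; this is exactly why assuming independence where it does not hold can lead to serious underestimation.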

Case of credible uncertainty statements

The first important case to be considered is the case of credible uncertainty statements. The development of a procedure for the calculation of the consensus value does not differ from an approach suggested for evaluating key comparisons [5], which has also been demonstrated to work for the certification of reference materials [6]. In a recent paper by this author, an implementation of this recipe has been given for the case of reference materials. A disadvantage of the method is that a full description of all measurement models is required. This is, apart from the considerable extra effort, undesirable for another reason: it is far away from the present philosophy of proficiency testing, as it violates the principle of working under “normal conditions”.

The crux in designing an evaluation method is in the treatment of the data from the laboratories, in relation to the issue of correlations between results. In principle, for each laboratory pair in the proficiency test, the covariance should be computed. To the full extent, this has been established elsewhere [5, 6]. Here, a simpler method will be proposed. The task for the statistician responsible for the evaluation of the proficiency test is to make a fair estimate of the degree of correlation between two laboratory results. In order to make such an estimate, the organizer should have some insight into the methods, chemicals, standards used, etc. In most proficiency tests, such information is obtained through an inquiry and/or regular participant–organizer communication.

Instead of requesting all measurement models from all laboratories to be reported, as in the case of reference materials [6], the statistician should make a conservative estimate of the (possible) degree of correlation of results. This conservative value should flow into the evaluation method as proposed for the reference materials, and the calculation can be started. Using the methodology of looking at the degrees of equivalence [5, 6], the unsatisfactory results can be removed and the consensus value can be established. Then, with the consensus value after removal of unsatisfactory results, the results of the laboratories can be assessed.
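As a sketch of how such a conservative correlation estimate could flow into the calculation (the function name and all numbers are hypothetical; this is not the full procedure of [5, 6]): build a covariance matrix from the reported uncertainties and a single assumed correlation coefficient, and take the generalized least-squares weighted mean as the consensus value.

```python
import numpy as np

def consensus_gls(y, u, rho):
    """Consensus value and its standard uncertainty from results y with
    standard uncertainties u, assuming a uniform pairwise correlation rho.
    Generalized least-squares weighted mean:
        m = (1^T V^-1 y) / (1^T V^-1 1),  u^2(m) = 1 / (1^T V^-1 1)."""
    y = np.asarray(y, float)
    u = np.asarray(u, float)
    V = rho * np.outer(u, u)                  # off-diagonal covariances
    np.fill_diagonal(V, u**2)                 # variances on the diagonal
    w = np.linalg.solve(V, np.ones_like(y))   # V^-1 1
    return float(w @ y / w.sum()), float(1.0 / np.sqrt(w.sum()))

# Hypothetical results of five laboratories
m, u_m = consensus_gls([10.2, 9.8, 10.1, 10.4, 9.9],
                       [0.2, 0.3, 0.2, 0.4, 0.3], rho=0.3)
```

A conservative (larger) rho widens u(m), so underestimation of the consensus uncertainty is avoided; the screening of unsatisfactory results via degrees of equivalence would precede this step.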

Case of non-credible uncertainty statements

This case cannot be compared with the case of credible uncertainty statements. The problem is that the organizer of the proficiency test gets a lot of information, but the value of this information is to a certain degree questionable. Obviously, the judgment as to whether information is credible or not is something that must be decided from case to case, but always beforehand. If, during a proficiency test, it appears that the wrong decision has been taken, then it is not an easy task to repair it: the danger of violating other assumptions is great. Furthermore, it leaves the participants in doubt about the outcome of the proficiency test, something to be avoided at all cost.

If the uncertainty statements are not credible, it is better to refrain from using the uncertainty information at all for the establishment of the consensus value. It is better practice to use some kind of approximation, for instance the following formula:

u^2(m) = s_L^2 / p + \sum_i u^2_{i,other}    (1)

where the last term reflects those uncertainty sources other than those randomized in the proficiency test.

These uncertainties are considered to be more or less the same for all participants. The standard deviation s_L is just the standard deviation of the laboratory means, whereas m is the mean of these laboratory means; p denotes the number of laboratories. Further treatment of data can take place as usual, including outlier/straggler testing and/or removal if considered appropriate. It should be noted that the larger the proficiency test (p), the smaller the first term in the expression for the uncertainty, so the more important the second term becomes. This is a serious disadvantage of the approach, and it cannot be solved easily, due to apparent problems in the uncertainty estimation.
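Equation (1) can be sketched in a few lines (the data and the single "other" component are hypothetical):

```python
import statistics

def consensus_uncertainty(lab_means, u_other):
    """Consensus value m and its standard uncertainty per Eq. (1):
    u^2(m) = s_L^2 / p + sum of squared 'other' components, where s_L
    is the standard deviation of the p laboratory means."""
    p = len(lab_means)
    m = statistics.mean(lab_means)
    s_L = statistics.stdev(lab_means)
    u_m = (s_L**2 / p + sum(u**2 for u in u_other))**0.5
    return m, u_m

# Hypothetical: eight laboratory means and one common extra component
m, u_m = consensus_uncertainty([5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 5.1, 5.0],
                               u_other=[0.05])
```

As p grows, only the first term shrinks; the second term sets a floor on u(m), which is the disadvantage described in the text.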

The method can obviously also be carried out with robust estimation techniques, for instance using the median and the (normalized) median of absolute deviations, MADe. The procedure remains the same, and usually the results from robust estimation techniques do not differ significantly from those of an evaluation using classical statistical techniques [7, 8].
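A minimal sketch of the robust counterparts (data hypothetical); MADe is the median of absolute deviations from the median, multiplied by 1.483 so that it estimates the standard deviation for normally distributed data:

```python
import statistics

def robust_location_scale(values):
    """Median and normalized MAD (MADe = 1.483 * MAD): robust
    counterparts of the mean and the standard deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med, 1.483 * mad

# Hypothetical data with one gross outlier (14.7)
med, made = robust_location_scale([10.1, 9.9, 10.0, 10.2, 9.8, 14.7])
```

The outlier leaves the median and MADe almost untouched, whereas the mean and the classical standard deviation would be dominated by it.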

The evaluation of the performance of the laboratories can now take place as in the case of the credible uncertainty statements, as the uncertainty of the consensus value is now available, and so are all uncertainty statements from the laboratories.

No uncertainty information available

In several cases it may still be impossible to come up with an uncertainty statement. This is probably the worst situation, as the customer of the laboratory does not have any indication of the reliability of the reported data. In the absence of uncertainty data it is obviously impossible to work with anything other than the reported laboratory averages. It still leaves the organizer of the proficiency test with the task of estimating the uncertainty of the consensus value. Typically, one could proceed as follows. The uncertainty at the level of a laboratory can be computed from

u^2(y) = s_L^2 + \sum_i u^2_{i,other}    (2)

where all symbols have the same meaning as in the previous case. The major difference is that the division by p has vanished. This is a necessity, as only the reported value of the laboratory (y) can be assessed (there is no uncertainty information).

In this case, the well-known Z-score can still be used:

Z = (m - y) / u(y)    (3)

to assess the performance of the laboratories. The estimation of the uncertainty of a “typical” laboratory is a real burden, as the organizer must find ways to come up with an uncertainty statement in the complete absence of information. This situation should be avoided, or circumvented by working with fixed limits in the performance characteristics. That is a completely different philosophy, and outside the scope of this paper.
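Equations (2) and (3) can be combined in a short sketch (data hypothetical); u(y) is attributed to a single "typical" laboratory, so the division by p of Eq. (1) is absent:

```python
import statistics

def z_score(y, lab_means, u_other):
    """Z-score of a reported value y against the consensus mean m,
    with u^2(y) = s_L^2 + sum of squared 'other' components (Eq. 2)
    and Z = (m - y) / u(y) (Eq. 3)."""
    m = statistics.mean(lab_means)
    s_L = statistics.stdev(lab_means)
    u_y = (s_L**2 + sum(u**2 for u in u_other))**0.5
    return (m - y) / u_y

# Hypothetical: six laboratory means, one reported value to be assessed
z = z_score(21.0, [20.1, 19.8, 20.3, 19.9, 20.4, 20.0], u_other=[0.1])
```

Under the customary interpretation of Z-scores (|Z| >= 3 unsatisfactory), this hypothetical laboratory would fail the assessment.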

Role of homogeneity and stability of PTMs

Similarly to the uncertainty of the property values of (certified) reference materials, the uncertainty of the property values of PTMs (proficiency test materials) should also include the between-bottle homogeneity [9] and the short- and long-term stability [10]. It should be noted that (1) the stability of the material is only of concern as long as the comparison is ongoing, and (2) short-term stability might pose even greater problems than in the case of CRMs. This is due to the fact that PTMs are often more like “real-world” samples, in the sense that the measures taken to improve stability are less severe than for several groups of CRMs. The inclusion of these uncertainty components in the uncertainty of the PTM is analogous to the uncertainty model established for reference materials and is described elsewhere [6, 11].
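Following the reference-material uncertainty model referred to here, these components are typically combined in quadrature with the characterization uncertainty; a minimal sketch (all component values and names hypothetical):

```python
def u_ptm(u_char, u_bb, u_sts, u_lts):
    """Combined standard uncertainty of a PTM property value:
    characterization (u_char), between-bottle homogeneity (u_bb),
    short-term (u_sts) and long-term (u_lts) stability components,
    combined in quadrature."""
    return (u_char**2 + u_bb**2 + u_sts**2 + u_lts**2)**0.5

# Hypothetical components, all in the units of the property value
u = u_ptm(u_char=0.10, u_bb=0.05, u_sts=0.02, u_lts=0.03)
```

For a proficiency test, the long-term stability component only needs to cover the duration of the comparison, as noted in the text.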

Conclusions

In conclusion, it is demonstrated that practical approaches are at hand to run proficiency tests in the testing area in the same way as comparisons in the calibration area. The nature of the two comparisons is exactly the same: the problems of credible uncertainty statements as well as those of correlated variables exist in both cases. The outcome of the restyled proficiency test must not differ from the classical approach, provided that the same assumptions are used and that they are “translated” correctly into the model.

Uncertainty calculations in the testing area are no longer completely different from those in the calibration area. There are differences, and both areas have their specific problems. There is a big task ahead for proficiency testing organizers in adapting to the new situation, but they can borrow a lot from existing techniques made available in comparisons in the calibration area. It will probably bring the science of experimental measurement and the science of uncertainty evaluation more closely and more consistently together, which will improve the learning cycle in proficiency testing considerably. It will give a boost to the understanding of how measurement systems behave, and this will allow for more direct and better corrective actions if method improvement is necessary.

References

1. ISO (1999) International Organization for Standardization: ISO 17025: General requirements for the competence of testing and calibration laboratories. ISO, Geneva

2. BIPM, IEC, IFCC, ISO, IUPAC, IUPAP, OIML (1995) Guide to the expression of uncertainty in measurement, 1st edn, 2nd corrected printing. ISO, Geneva

3. ISO (1997) International Organization for Standardization: ISO/IEC Guide 43-1:1997: Proficiency testing by interlaboratory comparisons – Part 1: Development and operation of proficiency testing schemes. ISO, Geneva

4. Van der Veen AMH, Horvat M, Milačič R, Bučar T, Repinc U, Ščančar J, Jaćimović R (2001) Operation of a proficiency test of trace elements in sewage sludge with reference values. Accred Qual Assur (submitted for publication)

5. Nielsen L (1999) Evaluation of measurement intercomparisons by the method of least squares. DFM Rep 99-R39; presented at the EUROMET workshop on uncertainty calculations in key comparisons, Teddington, Nov 1999

6. Van der Veen AMH (2000) Determination of the certified value of a reference material appreciating the uncertainty statements obtained in the collaborative study. Presented at AMCTM 2000, Monte de Caparica, May 2000

7. Van der Veen AMH, Broos AJM (1996) Preparation and characterisation of coal samples and maceral concentrates for studies on gasification and combustion reactivity of coals in combined cycle processes. Draft Final Rep ECSC 7220/EC-036, Eygelshoven, NL

8. Cox MG (1999) A discussion of approaches for determining a reference value in the analysis of key-comparison data. NPL Rep CISE 42/99, Teddington, UK

9. Van der Veen AMH, Linsinger TPJ, Pauwels J (2001) Uncertainty calculations in the certification of reference materials. 2. Homogeneity study. Accred Qual Assur 6:26–30

10. Van der Veen AMH, Linsinger TPJ, Lamberty A, Pauwels J (2001) Uncertainty calculations in the certification of reference materials. 3. Stability study. Accred Qual Assur (in press)

11. Van der Veen AMH, Linsinger TPJ, Schimmel H, Lamberty A, Pauwels J (2001) Uncertainty calculations in the certification of reference materials. 4. Characterisation and certification. Accred Qual Assur (in press)