IUPAC-CITAC Guide Draft 0PT Schemes2019.10.09

  • 7/25/2019 IUPAC-CITAC Guide Draft 0PT Schemes2019.10.09

    INTERNATIONAL UNION OF PURE AND APPLIED CHEMISTRY
    ANALYTICAL CHEMISTRY DIVISION*
    INTERDIVISIONAL WORKING PARTY FOR HARMONIZATION
    OF QUALITY ASSURANCE SCHEMES

    COOPERATION ON INTERNATIONAL TRACEABILITY
    IN ANALYTICAL CHEMISTRY (CITAC)

    IUPAC/CITAC GUIDE
    SELECTION AND USE OF PROFICIENCY TESTING SCHEMES
    FOR A LIMITED NUMBER OF PARTICIPANTS -
    CHEMICAL ANALYTICAL LABORATORIES

    (IUPAC Technical Report)

    Prepared for publication by
    ILYA KUSELMAN¹ AND ALEŠ FAJGELJ²

    ¹The National Physical Laboratory of Israel, Givat Ram, Jerusalem 91904, Israel;
    ²International Atomic Energy Agency, Wagramer Strasse 5, P.O. Box 100, Vienna A-1400, Austria

    Corresponding author: e-mail: [email protected]

    *Membership of the Analytical Chemistry Division during the final preparation of this report was as follows:
    President: A. Fajgelj (IAEA); Vice-President: W. Lund (Norway); Past-President: R. Lobinski (France); Secretary: D.B. Hibbert (Australia); Titular Members: M.F. Camões (Portugal); Z. Chai (China); P. De Bièvre (Belgium); J. Labuda (Slovakia); Z. Mester (Canada); S. Motomizu (Japan); Associate Members: P. De Zorzi (Italy); A. Felinger (Hungary); M. Jarosz (Poland); D.E. Knox (USA); P. Minkkinen (Finland); P.M. Pingarrón (Spain); National Representatives: S.K. Aggarwal (India); R. Apak (Turkey); M.S. Iqbal (Pakistan); H. Kim (Korea); T.A. Maryutina (Russia); R.M. Smith (UK); N. Trendafilova (Bulgaria)

    Membership of the Task Group:
    Chairman: A. Fajgelj (IAEA); Members: I. Kuselman (Israel); M. Belli (Italy); S.L.R. Ellison (UK); U. Sansone (IAEA); W. Wegscheider (Austria)

    ACKNOWLEDGEMENTS
    The Task Group would like to thank P. Fisicaro (France) and M. Koch (Germany) for their data and their help in the preparation of Examples 1 and 2, respectively, in Annex B of the Guide; H. Emons (IRMM) for helpful discussions; and Springer, Heidelberg (www.springer.com) and the Royal Society of Chemistry, London (www.rsc.org) for permission to use material from the published papers cited in the Guide.

    IUPAC/CITAC Guide
    Selection and Use of Proficiency Testing Schemes for a Limited Number of Participants - Chemical Analytical Laboratories
    (IUPAC Technical Report)

    Abstract: A metrological background for implementation of proficiency testing (PT) schemes for a limited number of participating laboratories (fewer than 30) is discussed. Such schemes should be based on the use of certified reference materials with traceable property values to serve as proficiency test items whose composition is unknown to the participants. It is shown that achieving quality of PT results in the framework of the concept "tested once, accepted everywhere" requires both metrological comparability and compatibility of these results.
    The possibility of assessing the collective/group performance of PT participants by comparing the PT consensus value (mean or median of the PT results) with the certified value of the test items is analyzed. Tabulated criteria for this assessment are proposed.
    Practical examples are described to illustrate the issues discussed.

    Keywords: proficiency testing, sample size, metrological traceability, measurement uncertainty, metrological comparability and compatibility

    ABBREVIATIONS AND SYMBOLS

    A - critical value for numbers N+ and/or N-
    AAS - atomic absorption spectrometry
    ai - empirical sensitivity coefficient of the i-th component
    AN - acid number
    AS - adequacy score
    α - probability equivalent to the area under the tail/s of a distribution
    bcf - buoyancy correction factor
    β - probability of type 2 error
    c1, c2 - measurement/test results corresponding to the crossing points of two probability density functions
    ccert - certified (assigned) value of a particular property of a CRM
    ci - measurement/test result of the i-th laboratory participating in PT
    cis - value of a particular property of routine samples
    CP - criterion power
    cPT - population (theoretical) mean of PT results
    cPT/avg - observed/experimental mean of PT results (consensus value)
    CRM - certified reference material
    γ - ratio σcert/σPT
    Δ - permissible bias of MPT from ccert
    and - parameters
    EMD - Ecole des Mines de Douai
    F - frequency of a c-value
    f - probability density function

    GC-MS - gas chromatography-mass spectroscopy
    GF-AAS - graphite furnace-atomic absorption spectrometry
    H0 - null hypothesis
    H1 - alternative hypothesis
    hand - hand preparation of a sample
    HPLC - high performance liquid chromatography
    i, j, n - index numbers
    ICP-MS - inductively coupled plasma-mass spectroscopy
    ICP-OES - inductively coupled plasma-optical emission spectroscopy
    ID-ICP-MS - isotope dilution-inductively coupled plasma-mass spectrometry
    IHRM - in-house reference material
    INPL - National Physical Laboratory of Israel
    ISO - International Organization for Standardization
    K - kelvin
    LNE - Laboratoire National de Métrologie et d'Essais
    MCL - maximum contaminant level
    mAs2O3 - mass of a sample of arsenic oxide
    mdil - mass of the diluted solution (a sample)
    mdil/t - total mass of the diluted solution
    mlot - total mass of the final lot
    MPT - population median of PT results
    mss - mass of the stock solution (a sample)
    mss/t - total mass of the stock solution
    N- - number of PT results ci < ccert - Δ
    N - size of a statistical sample of measurement results of PT participants

    N* - number of potentiometric titration results
    N+ - number of PT results ci > ccert + Δ
    NIST SRM - standard (certified) reference material developed by the National Institute of Standards and Technology, USA
    NMR - nuclear magnetic resonance
    Np - size of the population of PT participants
    P - probability
    pc - purity of chemicals
    Pe - probability of an event
    pH-metr. - pH-metric method
    Pot. titr. - potentiometric titration
    PT - proficiency testing
    pAs/As2O3 - proportion of atomic weights of As and As2O3
    Π - symbol of multiplication
    Qest - questionable
    RAN - limit of a difference between two results of AN determination (range)
    Ri - ratio of the min to the max values from two concentrations
    RL - reference laboratory
    ρlot - density of a lot of an aqueous IHRM
    s - observed sample standard deviation
    SADCMET - Southern African Cooperation in Measurement Traceability
    sbsi and sisi - between-sample and intra-sample standard deviations
    SI - International System of Units
    sPT - observed sample standard deviation of PT results
    σPT - population standard deviation of PT results

    σPT/av - standard deviation of the sample mean cPT/avg of PT results
    σtarg - target standard deviation of PT results
    t1-α/2 - percentile of the one-tailed Student's distribution at level of confidence 1-α/2
    TP - test power
    u(ci) and U(ci) - standard and expanded uncertainties of ci, respectively
    ucert and Ucert - standard and expanded uncertainty of ccert, respectively
    ucomb - combined standard uncertainty
    umLP - standard measurement uncertainty declared by a laboratory participating in PT
    umRL - standard measurement uncertainty declared by the reference laboratory
    USN - ultrasonic nebulization
    UV - ultraviolet
    vibr - sample preparation with a vibrating table
    VIM3 - International Vocabulary of Metrology, 3rd ed.
    xj - normalized value of the j-th PT result
    χ²{α, N-1} - 100α percentile of the χ² distribution at N-1 degrees of freedom
    - function of the normalized normal distribution
    (xj) - value of the function of the normalized normal distribution for xj
    fraction of the statistical sample of size N from the population of size Np
    ω² - empirical value of the Cramer-von-Mises criterion
    z, ζ and En - scores for assessment of proficiency of a laboratory participating in PT

    CONTENTS
    1. INTRODUCTION
    1.1. Scope and field of application
    1.2. Terminology
    2. APPROACH
    2.1. Properties of PT consensus values: dependence on the statistical sample size
    2.2. Measurement uncertainty use for interpretation of PT results
    2.3. What is a metrological approach to PT?
    3. VALUE ASSIGNMENT
    3.1. Metrological traceability of a CRM property value and of PT results
    3.1.1. Commutability of the CRMs and routine samples
    3.1.2. Three scenarios
    3.2. Scenario I: Use of adequate CRM
    3.3. Scenario II: No closely matched CRMs
    3.4. Scenario III: Appropriate CRMs are not available
    4. INDIVIDUAL LABORATORY PERFORMANCE EVALUATION AND SCORING
    4.1. Single (external) criterion for all laboratories participated in a PT
    4.2. Own criterion for every laboratory
    5. METROLOGICAL COMPARABILITY & COMPATIBILITY OF PT RESULTS
    6. EFFECT OF SMALL LABORATORY POPULATION ON SAMPLE ESTIMATES
    7. OUTLIERS
    8. EFFECTIVENESS OF APPROACHES TO PT
    ANNEX A. CRITERIA FOR ASSESSMENT OF METROLOGICAL COMPATIBILITY OF PT RESULTS
    ANNEX B. EXAMPLES
    ANNEX C. REFERENCES

    1. INTRODUCTION
    The International Harmonized Protocol for the proficiency testing (PT) of analytical chemistry laboratories, adopted by IUPAC in 1993 [1], was revised in 2006 [2]. Statistical methods for use in PT [3] have been published as a complementary standard to ISO/IEC Guide 43, which describes PT schemes based on interlaboratory comparisons [4]. General requirements for PT are updated in the new standard [5]. International Laboratory Accreditation Cooperation (ILAC) Guidelines define requirements for the competence of PT providers [6]. Guidelines for PT use in specific sectors, such as clinical laboratories, are also widely available [7]. In some other sectors they are under development.

    These documents are, however, oriented mostly towards PT schemes for a relatively large number N of laboratories or participants (greater than or equal to 30), henceforth referred to as "large schemes". This is important from a statistical point of view, since with N below 30, evaluations by statistical methods become increasingly unreliable, especially for N < 20. For example, uncertainties in estimates of location (such as mean and median) are sufficiently small to be neglected in scoring as N increases to approximately 30, but cannot be neglected safely with N < 20. Deviations from normal distribution are harder to identify if N is small. Robust statistics, too, are not usually recommended when N < 20. Therefore, the assigned/certified value of the proficiency test items ccert cannot be calculated safely from the measurement results obtained by the participants (PT results) as a consensus value: its uncertainty becomes large enough to affect scores in "small schemes", that is, schemes with small numbers of participants (N < 20).

    Moreover, if the size Np of the population of laboratories participating in PT is not infinite, and the size of the statistical sample N is greater than 5 to 10 % of Np, the value of the sample fraction N/Np may need to be taken into account.
    Thus, implementation of small PT schemes is sometimes not a routine task. Such schemes are quite often required for quality assurance of environmental analysis specific to a local region, for analysis of specific materials in an industry (e.g. under development), for the purposes of a regulator or a laboratory accreditation body, etc. [8].

    1.1. Scope and field of application
    This Guide is developed for implementation of simultaneous participation schemes when the number of laboratories is smaller than 30. This includes: 1) selection of a scheme based on simultaneous distribution of test items to participants for concurrent quantitative testing; 2) use of certified reference materials (CRMs) as test items unknown to the participants; 3) assessment of individual laboratory performance, and assessment of the metrological comparability and compatibility of the measurement results of the laboratories taking part in the PT scheme as a collective (group) of participants.
    The document is intended for PT providers and PT participants (chemical analytical laboratories), for accreditation bodies, laboratory customers, regulators, quality managers, metrologists and analysts.

    1.2. Terminology
    Terminology used in this Guide corresponds to ISO standards 17043 [5] and 3534 [9], and ISO Guide 99 (VIM) [10].

    2. APPROACH
    2.1. Properties of PT consensus values: dependence on the statistical sample size
    The difference between the population parameters and the corresponding sample estimates increases with decreasing sample size N. In particular, a sample mean cPT/avg of N PT results can differ from the population mean cPT by up to $1.96\,\sigma_{PT}/\sqrt{N}$ with 95 % probability, where 1.96 is the appropriate percentile of the normal distribution for a two-sided 95 % interval and σPT is the population standard deviation of the results. The dependence of the upper limit of the interval for the expected bias |cPT/avg - cPT| on N is shown (in units of σPT) in Fig. 1, where the range N = 20 to 30 is indicated by the grey bar. Even for N = 30 the bias may reach 0.36σPT at the 95 % level of confidence.
    Similarly, the sample standard deviation sPT is expected to be in the range
    $$\sigma_{PT}\left[\frac{\chi^2\{0.025,\,N-1\}}{N-1}\right]^{1/2} \le s_{PT} \le \sigma_{PT}\left[\frac{\chi^2\{0.975,\,N-1\}}{N-1}\right]^{1/2}$$
    with probability of 95 %, where χ²{α, N-1} is the 100α percentile of the χ² distribution at N-1 degrees of freedom. The dependence of the range limits for sPT on N is shown in Fig. 2 (again in σPT units), also with the range N = 20 to 30 marked by the grey bar. For example, for N = 30 the upper 95 % limit for sPT is 1.26σPT. In other words, sPT can differ from σPT for N = 30 by over 25 % rel. at the level of confidence 0.95. For N < 30 the difference between the sample and the population characteristics increases with decreasing N, especially dramatically for the standard deviation when N < 20.
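    The two limits discussed above can be checked numerically. The following is only a minimal sketch: the function names are illustrative, and the χ² percentile is obtained from the Wilson-Hilferty approximation (an assumption made here to stay within the Python standard library, not a method prescribed by the Guide), so the last digits may differ slightly from exact tables.

```python
from math import sqrt
from statistics import NormalDist


def bias_limit(n, level=0.95):
    """Upper limit of |cPT/avg - cPT| in units of sigma_PT for n PT results."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return z / sqrt(n)


def chi2_percentile(p, dof):
    """Approximate 100p percentile of the chi-square distribution.

    Uses the Wilson-Hilferty approximation (an assumption of this sketch).
    """
    z = NormalDist().inv_cdf(p)
    h = 2.0 / (9.0 * dof)
    return dof * (1.0 - h + z * sqrt(h)) ** 3


def s_range(n, level=0.95):
    """Lower and upper limits of sPT in units of sigma_PT for n PT results."""
    tail = (1.0 - level) / 2
    dof = n - 1
    lower = sqrt(chi2_percentile(tail, dof) / dof)
    upper = sqrt(chi2_percentile(1.0 - tail, dof) / dof)
    return lower, upper


for n in (10, 20, 30):
    low, high = s_range(n)
    print(f"N={n}: bias <= {bias_limit(n):.2f} sigma_PT, "
          f"sPT/sigma_PT in [{low:.2f}, {high:.2f}]")
```

    For N = 30 the sketch gives limits that round to the 0.36σPT and 1.26σPT quoted in the text.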

    [Figure 1: plot of bias/σPT (vertical axis, 0.1 to 1.3) against N (horizontal axis, 0 to 100)]
    Fig. 1. Dependence of the upper limit of the bias |cPT/avg - cPT| (in units of σPT) on the number N of PT results; reproduced from ref. [8] by permission of Springer. The line is the upper 97.5th percentile, corresponding to the upper limit of the two-sided 95 % interval for the expected bias. The range of N = 20 to 30, intermediate between small and large sample sizes, is shown by the grey bar.

    While consensus mean values are less affected than observed standard deviations, uncertainties in consensus means are relatively large in small schemes, and will practically never meet the guidelines for unqualified scoring suggested in the IUPAC Harmonized Protocol [2] for cases when the uncertainties are negligible. It follows that scoring for small schemes should usually avoid simple consensus values. Methods of obtaining traceable assigned values ccert are to be used wherever possible to provide comparable PT results [11, 12].

    The high variability of dispersion estimates in small statistical samples has special implications for scoring based on the observed participant standard deviation sPT. This practice is already not recommended even for large schemes [3], on the grounds that it does not provide consistent interpretation of scores from one round (or scheme) to the next. For small schemes, the variability of sPT magnifies the problem.

    [Figure 2: plot of sPT/σPT (vertical axis, 0.0 to 2.0) against N (horizontal axis, 0 to 100)]
    Fig. 2. Dependence of the sample standard deviation sPT limits (in units of σPT) on the number N of PT results; reproduced from ref. [8] by permission of Springer. Solid lines show the 2.5th (lower line) and 97.5th (upper line) percentiles for sPT. The dashed line is at sPT/σPT = 1.0 for reference. The grey bar shows the range of intermediate sample sizes (N = 20 to 30).

    It follows that scores based on the observed participant standard deviation should not be applied in such a case. If a PT provider can set an external, fit-for-purpose, normative or target standard deviation σtarg, then z-scores, which compare a result bias from the assigned value with σtarg, can be calculated in a small scheme in the same manner as recommended in refs. [1-5] for a large scheme. The only condition is that the standard uncertainty of the assigned/certified value ucert is insignificant in comparison to σtarg ($u_{cert}^2 < 0.1\,\sigma_{targ}^2$).
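    As an illustration of the condition above, a z-score calculation might be sketched as follows. The function name and the numerical values are hypothetical; the sketch assumes the usual form z = (ci - ccert)/σtarg and refuses to score when ucert is not negligible.

```python
def z_score(c_i, c_cert, sigma_targ, u_cert=0.0):
    """z-score of result c_i against the assigned value c_cert.

    Raises ValueError when u_cert**2 >= 0.1 * sigma_targ**2, i.e. when the
    uncertainty of the assigned value is not negligible against sigma_targ.
    """
    if sigma_targ <= 0:
        raise ValueError("sigma_targ must be positive")
    if u_cert ** 2 >= 0.1 * sigma_targ ** 2:
        raise ValueError("u_cert is not negligible in comparison to sigma_targ")
    return (c_i - c_cert) / sigma_targ


# Hypothetical values: assigned value 10.0, target SD 0.5, u_cert 0.1
print(round(z_score(10.8, 10.0, 0.5, u_cert=0.1), 2))
```

    Raising an error instead of returning a score is a design choice of this sketch; when the condition fails, the uncertainty-based scores of Section 2.2 are the alternative described in the text.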

    2.2. Measurement uncertainty use for interpretation of PT results
    When the information necessary to set σtarg is not available, and/or ucert is not negligible, the information included in the measurement uncertainty u(ci) of the result ci reported by the i-th laboratory is helpful for performance assessment using zeta-scores and/or En numbers [2, 3]. It may also be important for a small scheme that laboratories working according to their own fitness-for-purpose criteria (for example, in conditions of competition) can be judged by individual criteria based on their declared measurement uncertainty values.
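    The two scores mentioned above are not written out in this section. A minimal sketch, assuming the commonly used definitions (zeta based on standard uncertainties, En based on expanded uncertainties) as given in the PT literature cited as refs. [2, 3], could look like this; the function names and numbers are illustrative only.

```python
from math import sqrt


def zeta_score(c_i, u_i, c_cert, u_cert):
    """zeta = (c_i - c_cert) / sqrt(u(c_i)^2 + u_cert^2), standard uncertainties."""
    return (c_i - c_cert) / sqrt(u_i ** 2 + u_cert ** 2)


def en_number(c_i, U_i, c_cert, U_cert):
    """En = (c_i - c_cert) / sqrt(U(c_i)^2 + U_cert^2), expanded uncertainties."""
    return (c_i - c_cert) / sqrt(U_i ** 2 + U_cert ** 2)


# Hypothetical laboratory result 10.8 with u = 0.3; assigned value 10.0 with u = 0.4
print(round(zeta_score(10.8, 0.3, 10.0, 0.4), 2))
# Same data with expanded uncertainties (coverage factor 2 assumed)
print(round(en_number(10.8, 0.6, 10.0, 0.8), 2))
```

    Unlike the z-score, both scores depend on the uncertainty declared by the laboratory itself, which is what allows each laboratory to be judged by its own criterion.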

    2.3. What is a metrological approach to PT?
    The approach based on metrological traceability of an assigned value of test items, providing comparability of PT results, and on scoring PT results taking into account uncertainties of the assigned value and uncertainties of the measurement results, has been described as a "metrological approach" [13].
    Two main steps are common for any PT scheme using this approach: 1) establishment of a metrologically traceable assigned value, ccert, of analyte concentration in the test items/reference material and quantification of the standard uncertainty ucert of this value, including components arising from the material homogeneity and stability during the PT round, and 2) calculation of fitness-for-purpose performance statistics as well as assessment of the laboratory performance, taking into account the laboratory measurement uncertainty. For the second step it may be necessary in addition to take into account the small population size of laboratories able to take part in the PT. These issues are considered below.

    3. VALUE ASSIGNMENT
    3.1. Metrological traceability of a CRM property value and of PT results
    Since the approach to PT for a limited number N of participants is based on the use of CRMs as test items unknown to the participants, metrological traceability of a CRM property value is a key to understanding metrological comparability and compatibility of the PT results. Interrelations of these parameters are shown in Fig. 3.

    [Figure 3. Labels recoverable from the diagram: SI units (kg, K, mol, others); assigned value / measurement result; primary CRM (NMIs); secondary CRM (CRM producers); working CRM/IHRM (testing labs and other users); Ref. meas. stand.; pointers: Uncertainty, Traceability, Comparability, Compatibility, CRM commutability]
    Fig. 3. A scheme of calibration hierarchy, traceability and commutability (adequacy or match) of reference materials used for PT, comparability and compatibility of PT results; reproduced from ref. [16] by permission of Springer.

    The left pyramid in Fig. 3 illustrates the calibration hierarchy of CRMs as measurement standards or calibrators [10], ranked by increasing uncertainties of supplied property values from primary CRMs (mostly pure substances developed by National Metrology Institutes - NMIs), to secondary CRMs (e.g. a matrix CRM traceable to primary CRMs), and from secondary to working CRMs (certified in-house reference materials - IHRMs - developed by testing/analytical laboratories, PT providers and other users) [14,15]. When a CRM of a higher level is used for certification of a reference material of a lower level by comparing them (for example, for certification of an IHRM), the first one plays the role of a reference measurement standard; this is shown in Fig. 3 by semicircular pointers. Since the uncertainty of CRM property values increases in this way, the uncertainty pointer is directed from the top of the pyramid to the bottom.

    The same CRM can be used for calibration of a measurement system and for PT, i.e. for two different purposes: as a calibrator and as a quality control material (test items), but not at the same time, in the same measurement or in the same test [17].
    The right-side overturned pyramid in Fig. 3 shows traceability chains from a reference material certified value and the corresponding measurement/analysis/test results to SI units. As a rule, one result is to be traceable to the definition of its unit, while simultaneously there are several influence quantities which also need to be traceable to the definitions of their own units: to the mole of the analyte entities per mass of sample (i.e. for the concentrations in the calibration solutions), to the kilogram because the size of a sample under analysis is quantified by mass or volume, to the kelvin when the temperature influences the results obtained for the main quantity, etc. Thus, the traceability pointer has a direction opposite to that of the measurement uncertainty pointer. Of course, the width of the overturned pyramid is not correlated with the uncertainty values, as is the case in the left-side pyramid.

    Understanding traceability of measurement/analysis/test and PT results to the mole (realized through the chain of the CRMs according to their hierarchy) is often not simple and requires reliable information about the measurement uncertainty. The problem is that the uncertainty of analytical results may increase because of deviations of the chemical composition of the matrix CRM (used for calibration of the measurement system) from the chemical composition of the routine samples under analysis. Similarly, the difference between a certified value of the matrix reference material (applied in a PT as test items) and the result of a laboratory participating in the PT may increase when the CRM has a different chemical composition than the routine samples. This is known as the problem of CRM commutability - adequacy or match - to a sample under analysis [18], and is shown in Fig. 3 as an additional pointer above the uncertainty pointer. Commutability is discussed in the following paragraph 3.1.1, while the metrological comparability and compatibility pointers, also shown in Fig. 3, are discussed in paragraph 5.

    3.1.1. Commutability of the CRMs and routine samples
    Since a difference in property values and matrices of the CRM and of routine samples influences the measurement uncertainty in PT, the chemical composition of both the measurement standard (the CRM used as test items) and the routine samples of the test object should be as close as possible. An algorithm for a priori evaluation of CRM adequacy can be based on the use of an adequacy score:
    $$AS\,\% = 100 \prod_{i=1}^{n} R_i^{a_i},$$
    where Π is the symbol of multiplication; i = 1, 2, ..., n is the number of a component or of a physico-chemical parameter; $R_i = \min(c_{i,s}, c_{i,cert})/\max(c_{i,s}, c_{i,cert})$ is the ratio of the minimal to the maximal values from ci,s and ci,cert; ci,s and ci,cert are the concentrations of the i-th component or the values of the i-th physico-chemical parameter in the sample and certified in the CRM, respectively; and 0 ≤ ai ≤ 1 is the empirical sensitivity coefficient, which allows decreasing the influence of a component or a parameter on the score value if the component or the parameter is less important for the analysis than others. According to this score, the ideal adequacy (AS = 100 %) is achieved when the composition and properties of the sample and of the CRM coincide. The adequacy is absent (AS = 0 %) when the sample and the CRM are different substances or materials, and/or the analyte is absent in the CRM (ci,cert = 0). Intermediate cases, for example for two components under control, are shown in Fig. 4. The ratios R1 and R2 providing adequacy score values AS = 70, 80 and 90 % form here curves 1, 2 and 3, respectively.
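    The adequacy score defined above is simple to compute. The following sketch uses hypothetical concentrations; the function name, the default of ai = 1 for every component, and the error raised when a component is zero in both materials are assumptions of the sketch, not part of the Guide.

```python
def adequacy_score(sample, cert, weights=None):
    """Adequacy score AS in percent: 100 * product of R_i ** a_i.

    sample, cert -- values c_i,s and c_i,cert for each component or parameter
    weights      -- sensitivity coefficients a_i in [0, 1]; default is 1 for all
    """
    if weights is None:
        weights = [1.0] * len(sample)
    if not (len(sample) == len(cert) == len(weights)):
        raise ValueError("sample, cert and weights must have the same length")
    score = 100.0
    for c_s, c_c, a in zip(sample, cert, weights):
        if not 0.0 <= a <= 1.0:
            raise ValueError("sensitivity coefficients must lie in [0, 1]")
        largest = max(c_s, c_c)
        if largest <= 0:
            raise ValueError("ratio undefined when both values are zero")
        score *= (min(c_s, c_c) / largest) ** a
    return score


# Two components, hypothetical values: R1 = 0.8, R2 = 0.9
print(round(adequacy_score([8.0, 9.0], [10.0, 10.0]), 1))
# Second component treated as less important (a2 = 0.5)
print(round(adequacy_score([8.0, 9.0], [10.0, 10.0], [1.0, 0.5]), 1))
```

    Lowering a coefficient ai moves the corresponding factor towards 1, so a mismatch in a less important component reduces the score less.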

    Fig. 4. Adequacy score AS values in dependence on the ratios R1 and R2 of concentrations of two components in a sample under analysis and in a CRM; reproduced from ref. [16] by permission of Springer. Curves 1, 2 and 3 correspond to AS = 70, 80 and 90 %, respectively. The dotted pointer shows the direction of increasing adequacy.

    The adequacy score may be helpful for the choice of a CRM as a calibrator, since direct use of a CRM having a low adequacy score can lead to an incorrect/broken traceability chain. Such a CRM applied for PT will decrease the reliability of a laboratory performance assessment. Therefore, CRM commutability in PT and a score allowing its evaluation are also important. However, the adequacy score does not properly quantify the measurement uncertainty contribution caused by insufficient commutability (AS < 100 %). This requires a special study.
    For more details of AS calculations, see Annex B, Example 5.

    3.1.2. Three scenarios
    Thus, the task of value assignment is divided into the following three scenarios: I) an adequate matrix CRM with traceable property value is available for use as test items; II) available matrix CRMs are not directly applicable, but a CRM can be used in formulating a spiked material with traceable property values; III) only an IHRM with a limited traceability chain of the property value is available (for example, because of instability of the material under analysis).

    3.2. Scenario I: Use of adequate CRM
    The ideal case is when the test items distributed among the laboratories participating in the PT are portions of a purchased adequate matrix CRM (primary or secondary measurement standard). However, when the CRMs available in the market are too expensive for direct use in PT as test items, a corresponding IHRM (working measurement standard) is to be developed. Characterization of an IHRM with a property value traceable to the CRM value by comparison, and application of the IHRM for PT, are described in refs. [3, 19-21]. The characterization can be effectively carried out by analysis of the two materials in pairs, each pair consisting of one portion of the IHRM and one portion of the CRM. A pair is analyzed practically simultaneously, by the same analyst and method, in the same laboratory and under the same conditions. According to this design, the analyte concentration in the IHRM under characterization is compared with the certified value of the CRM and is calculated using differences in results of the analyte determinations in the pairs. The standard uncertainty of the IHRM certified value is evaluated as a combination of the CRM standard uncertainty and of the standard uncertainty of the differences (the standard deviation of the mean of the differences). The uncertainty of the IHRM certified value includes homogeneity uncertainties of both the CRM and the IHRM, since the differences in the results are caused not only by the measurement uncertainties, but also by fluctuations of the analyte concentrations in the test portions. When more than one unit of IHRM is prepared for PT, care still needs to be taken to include the IHRM between-unit homogeneity term in evaluating the uncertainty. Since, in this scenario, the CRM and IHRM have similar matrices and close chemical compositions, under similar processing, packaging and transportation conditions their stability characteristics during PT are assumed to be identical unless there is information to the contrary. The CRM uncertainty forms a part of the IHRM uncertainty budget and is expected to include any necessary uncertainty related to stability; therefore no additional stability term is included in the IHRM uncertainty.
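    The paired design described above can be sketched numerically. The sketch rests on two assumptions that the text does not state explicitly: the IHRM value is taken as the CRM certified value plus the mean of the pair differences, and the "combination" of uncertainties is a root sum of squares. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def characterize_ihrm(pairs, c_crm, u_crm):
    """Assign a value and standard uncertainty to an IHRM from paired analyses.

    pairs -- list of (IHRM result, CRM result) obtained pair by pair
    c_crm -- certified value of the CRM
    u_crm -- standard uncertainty of the CRM certified value
    """
    diffs = [ihrm - crm for ihrm, crm in pairs]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    c_ihrm = c_crm + mean(diffs)         # assumed additive model
    u_diff = stdev(diffs) / sqrt(n)      # standard deviation of the mean difference
    u_ihrm = sqrt(u_crm ** 2 + u_diff ** 2)  # assumed root-sum-of-squares combination
    return c_ihrm, u_ihrm


# Hypothetical results for three pairs; CRM certified as 10.0 with u = 0.05
value, uncertainty = characterize_ihrm(
    [(10.2, 10.0), (10.4, 10.0), (10.3, 10.0)], c_crm=10.0, u_crm=0.05
)
print(f"IHRM value {value:.2f}, standard uncertainty {uncertainty:.3f}")
```

    Because each difference is taken within a pair analyzed under the same conditions, effects common to both portions largely cancel, which is the reason for pairing in this design.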

    The criterion of fitness-for-purpose uncertainty of the property value of a reference material applied for PT is formulated depending on the task. For example, for PT in the field of water analysis in Israel [22], expanded uncertainty values should be negligible in comparison to the maximum contaminant level (MCL), i.e. the maximum permissible analyte concentration in water delivered to any user of the public water system. In this example, the uncertainty was limited to 2ucert

    A related scenario is based on traceable quantitative elemental analysis and qualitative information on purity/degradation of the analyte under characterization in the IHRM. For example, IHRMs for determination of inorganic polysulfides in water have been developed in this way [24]. The determination included derivatization of the polysulfides with a methylation agent, followed by GC-MS or HPLC analysis of the difunctionalized polysulfides. Therefore, the IHRMs were synthesized in the form of dimethylated polysulfides containing four to eight atoms of sulfur. The composition of the compounds was confirmed by NMR and by the dependence of the HPLC retention time of the dimethylpolysulfides on the number of sulfur atoms in the molecule. Stability of the IHRMs was studied by HPLC with UV detection. Total sulfur content was determined by oxidation of the IHRMs with perchloric acid in high-pressure vessels (bombs), followed by determination of the formed sulfate using ICP-OES. The IHRM certified values were traceable to NIST SRM 682 through the Anion Multi-Element Standard II from Merck (containing a certified concentration of sulfate ions), which was used for the ICP-OES calibration, and to the SI kg, since all the test portions were quantified by weight.

    For a more detailed example, see Annex B, Example 2.

    3.4. Scenario III: Appropriate CRMs are not available

    This scenario can arise when a component or an impurity of an object/material under analysis is unstable, or the matrix is unstable, and no CRMs (primary or secondary measurement standards) are available. The proposed PT scheme for such a case is based on preparation of an individual sample of IHRM for every participant under the same conditions provided by a reference laboratory (RL), allowing the participant to start the measurement/test process immediately after the sample preparation. In this scheme IHRM instability is not relevant as a source of measurement/test uncertainty, while intra- and between-sample inhomogeneity parameters are evaluated using the results of RL testing of the samples taken at the beginning, the middle and the end of the PT experiment. For example, such a PT scheme was used for concrete testing; for more details, see Annex B, Example 3.

    4. INDIVIDUAL LABORATORY PERFORMANCE EVALUATION AND SCORING

    4.1. Single (external) criterion for all laboratories participating in a PT

    The present IUPAC Harmonized Protocol [2] recommends that z-score values

    $$ z_i = \frac{c_i - c_{cert}}{\sigma_{targ}} $$

    are considered acceptable within ±2, unacceptable outside ±3, and questionable with intermediate values (the grounds for that are discussed thoroughly elsewhere [2]). This score provides the simplest and most direct answer to the question: "Is the laboratory performing to the quantitative requirement (σ_targ) set for the particular scheme?" The laboratory's quoted uncertainty is not directly relevant to this particular question, so it is not included in the score. Over the longer term, however, a laboratory will be scored poorly if its real (as opposed to estimated) uncertainty is too large for the job, whether the problem is caused by unacceptable bias or unacceptable variability. This scoring, based on an externally set value σ_targ (without explicitly taking the uncertainty of the assigned value and the participant uncertainties into account), remains applicable to small schemes, provided that laboratories share a common purpose for which a single value of σ_targ can be determined for each round.

    For examples of setting σ_targ and using z-scores, see Annex B, Examples 1-2.
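    The z-score rule described above can be illustrated with a minimal Python sketch. The laboratory names and numerical values are hypothetical, and the treatment of results lying exactly on a limit (|z| = 2 counted as acceptable, |z| = 3 as questionable) is an assumption of this sketch rather than something stated in the Harmonized Protocol text quoted here.

```python
def z_score(c_i, c_cert, sigma_targ):
    """z = (c_i - c_cert) / sigma_targ for one participant result."""
    return (c_i - c_cert) / sigma_targ


def classify(score):
    """Apply the conventional limits: within 2 acceptable, outside 3 unacceptable."""
    magnitude = abs(score)
    if magnitude <= 2.0:   # boundary handling is an assumption of this sketch
        return "acceptable"
    if magnitude > 3.0:
        return "unacceptable"
    return "questionable"


# Hypothetical round: certified value 10.0 and target standard deviation 0.25,
# both in the same (arbitrary) concentration units.
results = {"Lab A": 10.3, "Lab B": 10.6, "Lab C": 9.1}
for lab, c_i in results.items():
    z = z_score(c_i, c_cert=10.0, sigma_targ=0.25)
    print(f"{lab}: z = {z:+.2f} ({classify(z)})")
```

    Note that neither the participant's own uncertainty nor the uncertainty of the certified value enters this calculation, which is why the score answers only the external fitness-for-purpose question.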


    4.2. Own criterion for every laboratory

    Often, however, a small group of laboratories has sufficiently different requirements that a single criterion is not appropriate. It may then (as well as generally) be of interest to consider a somewhat different question about performance: "Are the participant's results consistent with their own quoted uncertainties?" For this purpose, zeta (ζ) and E_n number scores are appropriate. The scores are calculated as

    $$ \zeta_i = \frac{c_i - c_{cert}}{\sqrt{u^2(c_i) + u_{cert}^2}} \quad \text{and} \quad E_n = \frac{c_i - c_{cert}}{\sqrt{U^2(c_i) + U_{cert}^2}}, $$

    where u(c_i) and U(c_i) are the standard and expanded uncertainties of the i-th participant result c_i, respectively, u_cert is the standard uncertainty and U_cert is the expanded uncertainty of the certified (or otherwise assigned) value c_cert. Zeta score values are typically interpreted in the same way as z-score values (see Annex B, Example 3). The E_n number differs from the zeta score in the use of expanded uncertainties, and E_n values are usually considered acceptable within ±1. The advantages of zeta scoring are that i) it takes explicit account of the laboratory's reported uncertainty; ii) it provides feedback on both the laboratory result and on the laboratory's uncertainty estimation procedures. The main disadvantages are that i) it cannot be directly related to an independent criterion of fitness-for-purpose; ii) pessimistic uncertainty estimates lead to consistently good zeta scores irrespective of whether they are fit for a particular task; and iii) the PT provider has no way of checking that reported uncertainties are the same as those given to customers, although a customer or accreditation body is able to check this if necessary. The E_n number shares these characteristics, but adds two more. First, it additionally evaluates the laboratory's choice of coverage factor for converting standard to expanded uncertainty. This is an advantage. Second, unless the confidence level is set in advance, E_n is sensitive to the level of confidence chosen both by the participant and by the provider in calculating U(c_i) and U_cert. It is obviously important to ensure consistency in the use of coverage factors if E_n numbers are to be compared.

    It is clear that a single score cannot provide simultaneous information on whether laboratories meet external criteria (z-scores apply best here) and on whether they meet their own criteria (zeta or E_n number apply best).
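    As an illustration of the two scores defined in this section, the following Python sketch computes zeta and E_n for a single result. All numerical values are hypothetical, and the common coverage factor k = 2 used to obtain the expanded uncertainties is an assumption of the sketch.

```python
import math


def zeta_score(c_i, u_i, c_cert, u_cert):
    """zeta = (c_i - c_cert) / sqrt(u(c_i)^2 + u_cert^2), standard uncertainties."""
    return (c_i - c_cert) / math.hypot(u_i, u_cert)


def en_number(c_i, U_i, c_cert, U_cert):
    """E_n = (c_i - c_cert) / sqrt(U(c_i)^2 + U_cert^2), expanded uncertainties."""
    return (c_i - c_cert) / math.hypot(U_i, U_cert)


# Hypothetical data: participant result 10.5 with u = 0.3,
# certified value 10.0 with u_cert = 0.4, same coverage factor k = 2 for both.
k = 2.0
zeta = zeta_score(10.5, 0.3, 10.0, 0.4)
en = en_number(10.5, k * 0.3, 10.0, k * 0.4)
print(f"zeta = {zeta:.2f}, E_n = {en:.2f}")
```

    When participant and provider use the same coverage factor k, E_n equals zeta divided by k, so the limits ±2 for zeta and ±1 for E_n coincide at k = 2. With different coverage factors the two scores are no longer related in this simple way, which is the sensitivity noted in the text.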

    5. METROLOGICAL COMPARABILITY & COMPATIBILITY OF PT RESULTS

    The meaning of metrological comparability of PT results is that, being traceable to the same metrological reference, they are comparable independently of the result values and of the associated measurement uncertainties. Since scoring a laboratory's proficiency in the discussed small PT schemes is based on evaluation of the bias c_i − c_cert of the i-th laboratory result c_i from the certified property value c_cert of the test items, both PT results and the CRM certification (measurement) data should be comparable, i.e. traceable to the same metrological reference. The same is true for different runs of the PT scheme, when laboratory score values obtained in these runs are compared. Inasmuch as metrological comparability is a consequence of metrological traceability, the comparability pointer in Fig. 3 is directed like the traceability one.

    Metrological compatibility can be interpreted for PT results as the property satisfied by each pair of PT results, so that the absolute value of the difference between them is smaller than some chosen multiple of the standard measurement uncertainty of that difference. Moreover, successful PT scoring means that the absolute value of the bias c_i − c_cert is smaller than the corresponding chosen multiple of the bias standard uncertainty. In other words, a PT result is successful when it is compatible with the CRM (test item) certified value. Therefore compatibility is shown in Fig. 3 by a horizontal pointer uniting the direct and the inverted pyramids.

    Thus, achieving the quality of measurement/analysis/test and PT results in the framework of the concept "tested once, accepted everywhere" [11, 25] requires both comparability and compatibility of the results.

    When PT is based on the metrological approach, there are two key parameters for assessment of comparability & compatibility of results [26]: 1) the position of the CRM sent to the participants in the calibration hierarchy of measurement standards, and 2) the closeness of the distribution of PT results to the distribution of the CRM data.

    The position of a CRM in the calibration hierarchy depends on the top measurement standard in the traceability chain. For example, if a CRM property value is traceable to SI units (by scenarios I and II), it confirms world-wide comparability of PT results. Any PT scheme based on the use of IHRM with a limited traceability chain of the property value (not traceable to SI units: scenario III) provides the possibility of confirming local comparability only. The same situation took place in the classical fields of mass and length measurements before the Convention of the Metre, when measurement results in different countries were traceable to different national (local) measurement standards.

    Whatever the traceability of the CRM property value used, the closeness of the distributions of the PT results and of the CRM data is important for the result compatibility and performance assessment. Since laboratory performance is assessed individually for each PT participant, even in a case when the performance of the majority of them is found to be successful, compatibility of all the PT results (i.e. a group performance characteristic of the laboratories participating in PT) still remains unassessed.

    The situation is illustrated in Fig. 5, where both distribution density functions f of PT results (curve 1) and of CRM data (curve 2) are shown as normal ones. The vertical lines are the centers of these distributions: c_PT and c_cert, respectively. The common shaded area P under the density function curves is the probability of obtained PT results belonging to the population of the CRM data. It can be considered as a parameter of compatibility. The value P tends to zero when the difference between c_PT and c_cert is significantly larger than the standard deviations σ_PT and u_cert of both distributions. The closer c_PT is to c_cert (shown by the semicircular pointers in Fig. 5), the higher the P value is.

    Fig. 5. Probability density functions f of PT results, curve 1, and of CRM data, curve 2; reproduced from ref. [16] by permission of Springer. Vertical lines are the centers of these distributions: c_PT and c_cert, respectively. The common shaded area under the density function curves is the probability P of obtained PT results belonging to the population of the CRM data. The semicircular pointers show the direction of increasing compatibility.

    The distributions, P values and hypotheses necessary for assessment of compatibility of the results of a limited number N of PT participants, as a group, and suitable criteria for that based on analysis of the statistical sample characteristics (average c_PT/av, standard deviation s_PT, etc.) are discussed in detail in Annex A.

    In principle, c_PT/av and s_PT are consensus values, which cannot be used for a reliable assessment of an individual laboratory's performance when the number of laboratories participating in the PT scheme is limited. However, here the consensus values are used for another purpose: for comparison of PT results, as a statistical sample, with the CRM data (see Examples 1-4 in Annex B). The compatibility of PT results of a group of laboratories can be low if one or more laboratories from the group perform badly. Analysis of the reasons leading to such a situation, as well as of the ways to correct it, is a task for the corresponding accreditation body and/or the regulator responsible for these laboratories and interested in the comparability & compatibility of the results.

    6. EFFECT OF SMALL LABORATORY POPULATION ON SAMPLE ESTIMATES

    The population of possible laboratory participants is not usually infinite. For example, the population size of possible PT participants in motor oil testing organized by the Israel Forum of Managers of Oil Laboratories was only N_p = 12, while the statistical sample size, i.e. the number of participants that agreed to take part in the PT in different years, was N = 6 to 10 (see Annex B, Example 4). In such cases the sample fraction φ = N/N_p = 6/12 to 10/12 = 0.5 to 0.8 (i.e. 50 to 80 %) is not negligible and corrections for finite population size are necessary in the statistical data analyses. The corrections include the standard deviation (standard uncertainty) of the sample mean c_PT/av of N PT results, equal to

    $$ \sigma_{PT/av} = \sigma_{PT} \left\{ \frac{(N_p - N)/(N_p - 1)}{N} \right\}^{1/2}, $$

    and the standard deviation of a PT result, equal to

    $$ s_{PT} = \sigma_{PT} \left[ \frac{N_p}{N_p - 1} \right]^{1/2}. $$

    After simple transformations the following formula for the sample mean can be obtained:

    $$ \frac{\sigma_{PT/av}}{\sigma_{PT}/\sqrt{N}} = \left[ \frac{N_p - N}{N_p - 1} \right]^{1/2} = \left[ \frac{1 - \varphi}{1 - 1/N_p} \right]^{1/2}. $$

    The dependence of σ_PT/av on φ is shown (in units of σ_PT/√N) in Fig. 6 for the populations of N_p = 10, 20 and 100 laboratories, curves 1, 2 and 3, respectively.
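    The finite-population correction for the standard deviation of the sample mean can be sketched in Python as follows. The function names are illustrative only; the population and sample sizes are those of the motor oil example quoted in the text (N_p = 12, N = 6 and 10).

```python
import math


def mean_sd_factor(n, n_p):
    """Ratio of sigma_PT/av to sigma_PT/sqrt(N): sqrt((N_p - N) / (N_p - 1))."""
    if n_p < 2 or not 1 <= n <= n_p:
        raise ValueError("need 1 <= N <= N_p and N_p >= 2")
    return math.sqrt((n_p - n) / (n_p - 1))


def mean_sd_factor_from_fraction(phi, n_p):
    """Same ratio written through the sample fraction phi = N / N_p."""
    return math.sqrt((1.0 - phi) / (1.0 - 1.0 / n_p))


n_p = 12
for n in (6, 10):
    phi = n / n_p
    print(f"N = {n:2d}, phi = {phi:.2f}: factor = {mean_sd_factor(n, n_p):.3f}")
```

    For N = 6 the factor is about 0.74 and for N = 10 about 0.43, so ignoring the finite population would noticeably overstate the standard deviation of the mean in this example.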

    Fig. 6. Dependence of the standard deviation of the sample mean σ_PT/av (in units of σ_PT/√N) on the sample fraction φ, %; reproduced from ref. [8] by permission of Springer. Curves 1, 2 and 3 are for the populations of N_p = 10, 20 and 100 laboratories, respectively. The grey bar shows the intermediate range of sample fraction values φ = 5 to 10 % (at φ < 5 % corrections for a finite population size are negligible, as a rule).

    Since at least two PT results are necessary for calculation of a standard deviation (i.e. the minimal sample size is N = 2), curve 1 is shown for φ ≥ 20 %, curve 2 for φ ≥ 10 %, and curve 3 for φ ≥ 2 %. The population size has much less influence here than the sample fraction value.

    The dependence of s_PT on φ by the formula s_PT/σ_PT = [1/(1 − φ/N)]^{1/2} is weak in comparison with the previous one in Fig. 6, since the correction factor values are of 0.96 to 1.00 only for any event when the sample size is N = 10 to 100 PT results.

    As N_p increases and φ decreases, the values (N_p − N)/(N_p − 1) → 1 and 1/(1 − φ/N) → 1, and the corrections for finite population size disappear: σ_PT/av → σ_PT/√N and s_PT → σ_PT. Therefore, the corrections are negligible for φ values up to around 5 to 10 % (shown by the grey bar in Fig. 6).

    These corrections should, however, be applied with care, and only when the population is really finite.

    7. OUTLIERS

    Since the number of PT results (the sample size N) is limited, it is also important to treat extreme results correctly if they are not caused by a known gross error or miscalculation. Even at large N, extreme results can provide valuable information to the PT provider and should not be disregarded entirely in analysis of the PT results without due consideration. When N is small, extreme results cannot usually be identified as outliers by known statistical tests because of the low power of these tests.

    Fortunately, the metrological approach for small schemes makes outlier handling less important, since assigned values should not be calculated by consensus, and scores are not expected to be based on observed standard deviations. Accordingly, outliers affect scoring only for the laboratory reporting outlying results and for the PT provider seeking the underlying causes of such problems.

    8. EFFECTIVENESS OF APPROACHES TO PT

    While traditional approaches to PT (using consensus values for assessment of laboratory performance) are not acceptable for N < 30, the metrological one (based on the use of CRMs) is acceptable from the statistical and metrological points of view for any N, including N ≥ 30. However, the PT cost, which increases with N, should also be taken into account in any correct PT scheme design.

    ANNEX A. CRITERIA FOR ASSESSMENT OF METROLOGICAL COMPATIBILITY OF PT RESULTS

    CONTENTS

    1. RELATIONSHIP BETWEEN THE DISTRIBUTION OF CRM ASSIGNED VALUE DATA AND THE DISTRIBUTION OF PT RESULTS
    2. NULL AND ALTERNATIVE HYPOTHESES
    3. A CRITERION FOR PT RESULTS BEING NORMALLY DISTRIBUTED
    3.1. Example
    3.2. Reliability of the assessment
    4. A NON-PARAMETRIC TEST FOR PT RESULTS WITH AN UNKNOWN DISTRIBUTION
    4.1. Reliability of the test
    4.2. Example
    4.3. Limitations

    1. RELATIONSHIP BETWEEN THE DISTRIBUTION OF CRM ASSIGNED VALUE DATA AND THE DISTRIBUTION OF PT RESULTS

    Data used for calculation of the CRM assigned value, and the measurement/analysis results of the laboratories participating in PT can be considered as independent random events. Therefore, the relation between them can be characterized by the common area P under the density function curves for both CRM data and PT results. The P value is the probability of joint events and, therefore, the probability of obtained PT results belonging to the population of CRM data.

    For the sake of simplicity, both distributions are assumed to be normal, with parameters c_cert, σ_cert and c_PT, σ_PT, as shown in Fig. 7. The figure refers to a simulated example of aluminum determination in coal fly ashes using a CRM developed by NIST, USA: SRM 2690 with c_cert = 12.35 % and σ_cert = 0.14 % (as mass fraction) [27].

    Fig. 7. Probability density functions f of the PT results and of the CRM data when c_PT = 12.25 % and σ_PT = 0.34 %; reproduced from ref. [27] by permission of RSC. Values c_1 and c_2 are the measurement/test results corresponding to the crossing points of the f curves.

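    Since both density functions, f_cert of CRM data and f_PT of PT results, are equal at the c_1 and c_2 values, one can write

    $$ f_{PT} = \frac{1}{\sigma_{PT}\sqrt{2\pi}}\, e^{-(c - c_{PT})^2/(2\sigma_{PT}^2)} = \frac{1}{\sigma_{cert}\sqrt{2\pi}}\, e^{-(c - c_{cert})^2/(2\sigma_{cert}^2)} = f_{cert}. \qquad (1) $$

    As shown in ref. [27], after transformations of expression (1), c_1 and c_2 can be calculated by the following formula:

    $$ c_{1,2} = \frac{c_{cert}\,\sigma_{PT}^2 - c_{PT}\,\sigma_{cert}^2 \mp \sigma_{PT}\,\sigma_{cert}\sqrt{\Delta}}{\sigma_{PT}^2 - \sigma_{cert}^2}, \qquad (2) $$

    where

    $$ \Delta = (c_{cert} - c_{PT})^2 + 2\,(\sigma_{PT}^2 - \sigma_{cert}^2)\,\ln\frac{\sigma_{PT}}{\sigma_{cert}}. \qquad (3) $$

    When c_1 and c_2 are known, the probability is conveniently calculated by the next formula:

    $$ P = \int_{-\infty}^{c_1} f_{cert}\,dc + \int_{c_1}^{c_2} f_{PT}\,dc + \int_{c_2}^{+\infty} f_{cert}\,dc = 1 + \Phi\!\left(\frac{c_1 - c_{cert}}{\sigma_{cert}}\right) - \Phi\!\left(\frac{c_1 - c_{PT}}{\sigma_{PT}}\right) + \Phi\!\left(\frac{c_2 - c_{PT}}{\sigma_{PT}}\right) - \Phi\!\left(\frac{c_2 - c_{cert}}{\sigma_{cert}}\right), \qquad (4) $$

    where Φ stands for the normalized normal distribution function. For example, calculations by formulas (2)-(4) in the case shown in Fig. 7 yield c_1 = 12.16, c_2 = 12.58 and P = 0.58.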
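    The calculation of the crossing points and of the common area P can be checked with a short Python sketch. It solves the quadratic equation obtained by equating the two normal densities and then sums the three tail and central areas; the function names are illustrative, and the sketch assumes σ_PT > σ_cert, as in the Fig. 7 example.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def crossing_points(c_cert, s_cert, c_pt, s_pt):
    """Abscissas c1 < c2 at which the two normal densities are equal."""
    if s_pt <= s_cert:
        raise ValueError("this sketch assumes sigma_PT > sigma_cert")
    delta = (c_cert - c_pt) ** 2 + 2.0 * (s_pt**2 - s_cert**2) * math.log(s_pt / s_cert)
    centre = c_cert * s_pt**2 - c_pt * s_cert**2
    spread = s_pt * s_cert * math.sqrt(delta)
    denom = s_pt**2 - s_cert**2
    return (centre - spread) / denom, (centre + spread) / denom


def common_area(c_cert, s_cert, c_pt, s_pt):
    """Area under min(f_cert, f_PT): CRM tails outside [c1, c2], PT curve inside."""
    c1, c2 = crossing_points(c_cert, s_cert, c_pt, s_pt)
    return (1.0
            + phi((c1 - c_cert) / s_cert) - phi((c1 - c_pt) / s_pt)
            + phi((c2 - c_pt) / s_pt) - phi((c2 - c_cert) / s_cert))


# Values of the simulated SRM 2690 example (mass fraction, %).
c1, c2 = crossing_points(12.35, 0.14, 12.25, 0.34)
p = common_area(12.35, 0.14, 12.25, 0.34)
print(f"c1 = {c1:.2f}, c2 = {c2:.2f}, P = {p:.2f}")
```

    With the example parameters the sketch reproduces the values quoted in the text (c_1 = 12.16, c_2 = 12.58, P = 0.58).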

    Information on the distributions of both PT results and CRM data is limited by the experimental statistical sample sizes. Therefore, the common area P under the probability density function curves of the distributions (the probability of obtained PT results belonging to the population of the CRM data) can adequately characterize the metrological compatibility only insofar as the goodness-of-fit of the empirical and theoretical distributions is high. However, the P value is of practical importance, since it allows one to choose a suitable null hypothesis for a criterion of a "yes-no" type for assessment of the metrological compatibility of a relatively small (not infinite) number of PT results.

    2. NULL AND ALTERNATIVE HYPOTHESES

    The chosen null hypothesis H0 states that the metrological compatibility is satisfactory if the bias |c_PT − c_cert| exceeds σ_cert only by a value which is insignificant in comparison with random interlaboratory errors:

    $$ H_0:\ |c_{PT} - c_{cert}| \le \left[\sigma_{cert}^2 + (0.3\,\sigma_{PT})^2\right]^{1/2}, \qquad (5) $$

    where a coefficient of 0.3 is used according to the known metrological rule defining one standard deviation as insignificant in comparison with another one when the former does not exceed 1/3 of the latter (i.e. the first variance is smaller than the second one by an order of magnitude). By this hypothesis, the probability P of considering the PT results as belonging to the population of CRM data is P ≥ 0.53 for the ratio γ = σ_cert/σ_PT ≥ 0.4 (as shown in Fig. 7), when the right-hand side of expression (5) reaches the value of 1.25σ_cert.

    The alternative hypothesis H1 assumes that the metrological compatibility is not satisfactory and the bias |c_PT − c_cert| exceeds σ_cert significantly, for example:

    $$ H_1:\ |c_{PT} - c_{cert}| = 2.0\left[\sigma_{cert}^2 + (0.3\,\sigma_{PT})^2\right]^{1/2}, \qquad (6) $$

    etc.

    3. A CRITERION FOR PT RESULTS BEING NORMALLY DISTRIBUTED

    The criterion for not rejecting H0 for a statistical sample of size N, i.e. for results of N laboratories participating in the PT, is

    $$ |c_{PT/av} - c_{cert}| + t_{1-\alpha/2}\, s_{PT}/N^{1/2} \le \left[\sigma_{cert}^2 + (0.3\,\sigma_{PT})^2\right]^{1/2}, \qquad (7) $$

    where c_PT/av and s_PT are the sample estimates of c_PT and σ_PT calculated from the same N results as the sample average and standard deviation, respectively; the left-hand side of the expression represents the upper limit of the confidence interval for the bias |c_PT − c_cert|; t_{1−α/2} is the percentile of the one-tailed Student's distribution for the number of degrees of freedom N − 1; the 1 − α/2 value is the probability of the bias not exceeding the upper limit of its confidence interval.

    By substituting the ratio γ and s_PT/σ_PT = [χ²_{α/2}/(N − 1)]^{1/2}, where χ²_{α/2} is the 100α/2 percentile of the χ² distribution for the number of degrees of freedom N − 1, into formula (7), the following transformation of the criterion is obtained:

    $$ \frac{|c_{PT/av} - c_{cert}|}{s_{PT}} \le \left(\gamma^2 + 0.09\right)^{1/2} \left(\frac{N - 1}{\chi^2_{\alpha/2}}\right)^{1/2} - \frac{t_{1-\alpha/2}}{N^{1/2}}. \qquad (8) $$

    Table 1 gives the numerical values for the right-hand side of the criterion at α = 0.05.

    Table 1. The bias norms in s_PT units by criterion (8)

    γ \ N    5      10     15     20     30     40     50
    0.4      0.20   0.20   0.23   0.26   0.30   0.32   0.34
    0.7      0.95   0.68   0.65   0.64   0.65   0.66   0.67
    1.0      1.76   1.19   1.09   1.06   1.03   1.02   1.02

    These values are the norms for the bias of the average PT result from the analyte concentration certified in the CRM (in s_PT units). The value of γ should be set based on the requirements to the analytical results, taking into account the PT fit-for-purpose value σ_PT, which is equal either to the standard analytical/measurement uncertainty or to the target standard deviation σ_targ calculated using the Horwitz curve [2, 3] or another database.
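    The bias norms of Table 1 can be recomputed from the right-hand side of criterion (8), as in the following Python sketch. To keep it free of external libraries, the Student's t and chi-square percentiles for α = 0.05 are hard-coded from standard statistical tables for a few values of N; this lookup table and the function name are assumptions of the sketch, not part of the Guide.

```python
import math

# Standard tabulated percentiles for alpha = 0.05 and N - 1 degrees of freedom:
# N -> (t at 0.975, chi-square at 0.025)
QUANTILES = {
    5: (2.776, 0.4844),
    10: (2.262, 2.700),
    15: (2.145, 5.629),
    20: (2.093, 8.907),
    30: (2.045, 16.047),
}


def bias_norm(gamma, n):
    """Right-hand side of criterion (8), i.e. the permissible bias in s_PT units."""
    t, chi2 = QUANTILES[n]
    return math.sqrt((gamma**2 + 0.09) * (n - 1) / chi2) - t / math.sqrt(n)


for gamma in (0.4, 0.7, 1.0):
    row = "  ".join(f"{bias_norm(gamma, n):.2f}" for n in sorted(QUANTILES))
    print(f"gamma = {gamma:.1f}: {row}")
```

    For the tabulated values of N up to 30 the sketch agrees with Table 1 to within rounding, for example 0.23 for γ = 0.4 and N = 15.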

    5

    3.1. Example6

    According to the ASTM standard [29], the means of the results of duplicate7

    aluminum determinations in coal fly ashes carried out by different laboratories on8

    riffled splits of the analysis sample should not differ by more than 2.0 % for Al2O3,9

    i.e. 1.06 % for aluminum. Since the range for two laboratory results is limited by the10

    standard, PT= 1.06/2.77 = 0.38 %, where 2.77 is the 95 % percentile of the range11

    distribution. In case of the discussed SRM 2690 with cert = 0.14 % the value12

    is 0.14/0.38 = 0.4. Simulated statistical samples of the PT results are given in13

    Table 2. Metrological compatibility of results of the first 15 laboratories can be14

    assessed as satisfactory by the norm in Table 1 for = 0.4 (0.23), since15

    cPT/av - ccert= 12.30 12.35= 0.05 < 0.23 sPT= 0.23 0.34 = 0.08 % (as mass16

    fraction). The same is true concerning the metrological compatibility of results of all17

    the 30 laboratories (the norm in Table 1 is 0.30):cPT/av - ccert= 12.38 12.35=18

    0.03 < 0.30 sPT

    = 0.30 0.35 = 0.11 %.19

    Other detailed examples see in Annex B, Examples 3 and 4.20
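    The arithmetic of this example can be retraced with the simulated results of Table 2, as in the Python sketch below. The bias norms are taken from Table 1 for γ = 0.4; the function and variable names are illustrative only.

```python
import statistics

C_CERT = 12.35  # certified aluminum content of SRM 2690, % (mass fraction)

# Simulated PT results of Table 2, laboratories 1 to 30, in %.
RESULTS = [
    12.76, 12.19, 12.68, 12.21, 12.96, 12.27, 11.96, 12.03, 11.88, 11.97,
    12.23, 12.48, 12.69, 12.21, 11.98, 12.60, 12.81, 12.39, 11.96, 11.91,
    11.86, 12.32, 12.53, 12.84, 12.67, 12.86, 12.75, 12.66, 11.99, 12.61,
]

# Bias norms in s_PT units from Table 1 for gamma = 0.4.
NORMS = {15: 0.23, 30: 0.30}


def assess(results, c_cert, norm):
    """Return (mean, s, bias, compatible) for one statistical sample."""
    mean = statistics.mean(results)
    s = statistics.stdev(results)  # sample standard deviation, N - 1 in the denominator
    bias = abs(mean - c_cert)
    return mean, s, bias, bias <= norm * s


for n, norm in NORMS.items():
    mean, s, bias, ok = assess(RESULTS[:n], C_CERT, norm)
    print(f"N = {n}: mean = {mean:.2f}, s = {s:.2f}, "
          f"bias = {bias:.2f}, limit = {norm * s:.2f}, compatible = {ok}")
```

    Both samples satisfy the criterion, in agreement with the assessment given in the text.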


    Table 2. PT results of aluminum determination in SRM 2690 (simulated, in % as mass fraction)

    Lab. No. i   100 c_i      Lab. No. i   100 c_i
    1            12.76        16           12.60
    2            12.19        17           12.81
    3            12.68        18           12.39
    4            12.21        19           11.96
    5            12.96        20           11.91
    6            12.27        21           11.86
    7            11.96        22           12.32
    8            12.03        23           12.53
    9            11.88        24           12.84
    10           11.97        25           12.67
    11           12.23        26           12.86
    12           12.48        27           12.75
    13           12.69        28           12.66
    14           12.21        29           11.99
    15           11.98        30           12.61
    c_PT/av      12.30        c_PT/av      12.38
    s_PT         0.34         s_PT         0.35

    3.2. Reliability of the assessment

    Reliability in such metrological compatibility assessment is determined by the probabilities of not rejecting the null hypothesis H0 when it is true, and rejecting it when it is false (i.e. when the alternative hypothesis H1 is true). Criterion (8) does not allow rejecting hypothesis H0 with probability 1 − α/2 when it is true. The probability of an error of type 1 by this criterion (to reject the H0 hypothesis when it is true) is α/2. The probability of rejecting H0 when it is false, i.e. when the alternative hypothesis H1 is actually true (the criterion power, CP), is:

    $$ CP = \Phi\!\left\{ \frac{t_{\alpha/2} + \lambda}{\left[1 + t_{1-\alpha/2}^2/\left(2(N - 1)\right)\right]^{1/2}} \right\}, \qquad (9) $$

    where

    $$ \lambda = \frac{|c_{PT} - c_{cert}| - (\gamma^2 + 0.09)^{1/2}\,\sigma_{PT}}{\sigma_{PT}/\sqrt{N}}. \qquad (10) $$

    The value of the deviation parameter λ is calculated by substituting the bias |c_PT − c_cert| in equation (10) by its value corresponding to the alternative hypothesis. For hypothesis H1 by formula (6) the substitution is 2.0[σ_cert² + (0.3σ_PT)²]^{1/2} and, therefore, λ = [(0.09 + γ²)N]^{1/2}. The probability of an error of type 2 (not rejecting H0 when it is false) equals β = 1 − CP. Both operational characteristics of the criterion, CP and β, are shown in Fig. 8 at α = 0.05 for different γ values and different numbers N of the PT participants.

    Thus, the reliability of the compatibility assessment using the hypothesis H0 against H1 for the PT scheme for aluminum determination in coal fly ashes (where γ = 0.4) can be characterized by 1) probability 1 − α/2 = 0.975 of the correct assessment of the compatibility as successful (i.e. not rejecting the null hypothesis H0 when it is true) for any number N of the laboratories participating in PT, and by 2) probability CP = 0.42 of correct assessment of the compatibility as unsuccessful (i.e. rejecting H0 when the alternative hypothesis H1 is true) for N = 15, and probability CP = 0.75 for N = 30 results. Probability α/2 of a type 1 error is 0.025 for any N, while probability β of a type 2 error is 0.58 for N = 15, and 0.25 for N = 30, etc.
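    The power values quoted for γ = 0.4 can be reproduced with the Python sketch below. It assumes that the criterion power is evaluated with the usual normal approximation to the non-central t distribution, CP ≈ Φ{(λ − t)/[1 + t²/(2(N − 1))]^{1/2}} with t the 0.975 percentile of Student's distribution; this reading of formula (9), the hard-coded t values and the symbol λ for the deviation parameter are assumptions of the sketch.

```python
import math

# Standard tabulated 0.975 percentiles of Student's t for N - 1 degrees of freedom.
T_975 = {15: 2.145, 20: 2.093, 30: 2.045}


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def criterion_power(gamma, n):
    """Approximate power of criterion (8) against H1 of formula (6)."""
    t = T_975[n]
    lam = math.sqrt((0.09 + gamma**2) * n)  # deviation parameter under H1
    return phi((lam - t) / math.sqrt(1.0 + t * t / (2.0 * (n - 1))))


for n in sorted(T_975):
    cp = criterion_power(0.4, n)
    print(f"N = {n}: CP = {cp:.2f}, beta = {1.0 - cp:.2f}")
```

    Under these assumptions the sketch gives CP of about 0.42 for N = 15 and about 0.75 for N = 30, matching the values in the text.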

    Fig. 8. Power CP of the criterion and probability β of an error of type 2 (in dependence on the number N of laboratories participating in PT) for probability α/2 = 0.025 of an error of type 1; reproduced from ref. [28] by permission of Springer. Curve 1 is at γ = 0.4, and curve 2 is at γ = 1.0.

    The power of criterion (8) is high (CP > 0.5) for a number of PT participants N ≥ 20.

    4. A NON-PARAMETRIC TEST FOR PT RESULTS WITH AN UNKNOWN DISTRIBUTION

    In the case of unknown distributions differing from the normal one, the median is more robust than the average, i.e. better reproduced in repeated experiments, being less sensitive to extreme results/outliers. Therefore, the null hypothesis, assuming here that the bias of PT results exceeds σ_cert by a value which is insignificant in comparison with random interlaboratory errors, has the following form:

    $$ H_0:\ |M_{PT} - c_{cert}| \le \left[\sigma_{cert}^2 + (0.3\,\sigma_{PT})^2\right]^{1/2} = \Delta, \qquad (11) $$

    where M_PT is the median of PT results of a hypothetically infinite number N of participants, i.e. the population median.

    If M_PT ≥ c_cert, the null hypothesis H0 implies that the probability P_e of an event when a result c_i of the i-th PT-participating laboratory exceeds the value c_cert + Δ is P_e{c_i > c_cert + Δ} ≤ 1/2, according to the median definition. If M_PT < c_cert, the probability of c_i falling below the value c_cert − Δ is also P_e{c_i < c_cert − Δ} ≤ 1/2. The alternative hypothesis assumes that the bias exceeds σ_cert significantly and the probabilities of the events described above are P_e > 1/2, for example:

    $$ H_1:\ |M_{PT} - c_{cert}| = 2\Delta, \qquad (12) $$

    where Δ is the same as in expression (11). Probabilities P_e of the events according to the alternative hypothesis H1 at normal distribution (depending on the permissible bias Δ in σ_PT units at different γ values) are shown in Table 3.

    Table 3
    Probability Pe according to alternative hypothesis H1

    γ      Δ/σPT   Pe
    0.4    0.50    0.69
    0.7    0.75    0.77
    1.0    1.04    0.85
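The Pe column of Table 3 can be checked numerically. The following is a minimal sketch, assuming that for normally distributed results under H1 the probability Pe equals the standard normal distribution function evaluated at Δ/σPT (this assumption reproduces the tabulated values; the helper name `phi` is illustrative only).

```python
import math

def phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Delta/sigma_PT values copied from Table 3, keyed by gamma.
delta_over_sigma = {0.4: 0.50, 0.7: 0.75, 1.0: 1.04}

# Assumption: Pe = Phi(Delta / sigma_PT) under H1 for normal data.
pe = {gamma: round(phi(d), 2) for gamma, d in delta_over_sigma.items()}

for gamma, value in pe.items():
    print(f"gamma = {gamma:.1f}: Pe = {value:.2f}")
```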

    Since the population median is unknown in practice, and the results of N laboratories participating in PT form an N-size statistical sample from the population, hypothesis H0 is not rejected when the upper limit of the median confidence interval does not exceed ccert + Δ, or the lower limit does not fall below ccert − Δ. The limits can be evaluated based on the simplest non-parametric sign test [30]. According to this test, the number N+ of results ci > ccert + Δ or the number N− of results ci < ccert − Δ should not exceed the critical value A (the bias norm) in order not to reject H0. The A values are available, for example, in ref. [31]. For N from 5 to 50 PT participants and levels of confidence 0.975 (α/2 = 1 − 0.975 = 0.025) and 0.95 (α/2 = 0.05), these values are shown in Table 4. The A value for fewer than six participants at α/2 = 0.025 cannot be determined, and therefore is not presented in Table 4 for N = 5.

    Table 4
    The bias norms A by the sign test

    α/2 \ N    5    10   15   20   30   40   50
    0.025      -    1    3    5    9    13   17
    0.05       0    1    3    5    10   14   18
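The entries of Table 4 can be regenerated from the binomial distribution. This is a sketch only, assuming that A is the largest integer k for which the lower-tail probability P(X ≤ k) of a binomial variable X with N trials and success probability ½ does not exceed α/2. That is the usual construction of sign-test tables; the guide itself takes the values from ref. [31]. The function name is hypothetical.

```python
from math import comb

def sign_test_critical_value(n: int, alpha_half: float):
    """Largest k with P(X <= k) <= alpha_half for X ~ Binomial(n, 1/2).

    Returns None when even k = 0 is too probable (no critical value exists).
    """
    limit = alpha_half * 2 ** n   # compare counts of sign patterns, not probabilities
    cumulative = 0                # number of patterns with at most k signs of one kind
    critical = None
    for k in range(n + 1):
        cumulative += comb(n, k)
        if cumulative > limit:
            break
        critical = k
    return critical

sizes = [5, 10, 15, 20, 30, 40, 50]
table4 = {a: [sign_test_critical_value(n, a) for n in sizes] for a in (0.025, 0.05)}
for a, row in table4.items():
    print(a, row)
```

The `None` returned for N = 5 at α/2 = 0.025 corresponds to the dash in Table 4: with five results, even zero signs of one kind has probability 1/32 ≈ 0.031, which is above 0.025.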


    4.1. Reliability of the test

    When hypothesis H0 is true, the test does not reject it with a probability of 1 − α/2. The probability of an error of type 1 by this test (rejecting H0 when it is true) is α/2. The probability of rejecting the null hypothesis when it is false, i.e. when the alternative hypothesis is actually true (the test power, TP), is tabulated in ref. [31]. The probability of a type 2 error (not rejecting H0 when it is false) equals β = 1 − TP. The operational characteristics of the test (TP and β) are shown in Fig. 9 at α = 0.05 for the alternative hypothesis H1 at different γ values and different numbers N of PT participants.

    Fig. 9. Power TP of the test and probability β of an error of type 2 in dependence on the number N of laboratories participating in PT, when the probability of an error of type 1 is α/2 = 0.025; reproduced from ref. [30] by permission of Springer. The null hypothesis H0 is tested against the alternative hypotheses H1 at γ = 0.4 and γ = 1.0, shown by curves 1 and 2, respectively.


    4.2. Example

    The hypothesis about normal distribution of the PT results in the example shown in Table 2 was not tested because of the small size of the statistical samples. Therefore, the sample size is increased here to N = 50: the simulated data are presented in Table 5 (the simulation is performed by the known method of successive approximations). Such a sample size allows testing the hypothesis of normal distribution of the data by applying the Cramér-von Mises ω²-criterion, powerful for statistical samples of small sizes [32]:

    $$ \omega^2 = -N - 2\sum_{j=1}^{N}\left\{\frac{2j-1}{2N}\,\ln\Phi(x_j) + \left[1-\frac{2j-1}{2N}\right]\ln\left[1-\Phi(x_j)\right]\right\} , \qquad (13) $$

    where j = 1, 2, …, N is the number of the PT result cj in the statistical sample ranked by increasing c value (c1 ≤ c2 ≤ … ≤ cN); xj = (cj − cPT/av)/sPT is the normalized value of the j-th result, which is distributed with the mean of 0 and the standard deviation of 1; and Φ(xj) is the value of the function of the normalized normal distribution for xj.
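Formula (13) can be evaluated as sketched below. The sketch rests on several assumptions: cPT/av is taken as the sample mean, sPT as the sample standard deviation with N − 1 in the denominator, and the input data are hypothetical values chosen for illustration (they are not data from the guide). The function names are illustrative as well.

```python
import math
from statistics import mean, stdev

def phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def omega_squared(results):
    """Statistic of formula (13) for a list of PT results."""
    c = sorted(results)          # rank the results by increasing value
    n = len(c)
    c_av = mean(c)               # assumed estimate of c_PT/av
    s_pt = stdev(c)              # assumed estimate of s_PT (n - 1 denominator)
    total = 0.0
    for j, cj in enumerate(c, start=1):
        xj = (cj - c_av) / s_pt  # normalized value of the j-th result
        w = (2 * j - 1) / (2 * n)
        total += w * math.log(phi(xj)) + (1 - w) * math.log(1 - phi(xj))
    return -n - 2 * total

# Hypothetical results, for illustration only.
sample = [12.1, 12.3, 12.35, 12.4, 12.45, 12.5, 12.6, 12.9]
w2 = omega_squared(sample)
print(f"omega^2 = {w2:.3f}")
```

The computed statistic is then compared with a tabulated critical value for the given N, as done in the text with the values from ref. [31].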

    The value ω² = 1.95 calculated by formula (13) for the data in Table 5 exceeds the critical value 1.94 (for N = 50), which can be exceeded by chance with a probability of 0.10 [31]. Therefore, the hypothesis about normal distribution of these data should be rejected at the level of confidence of 0.90. The corresponding empirical histogram and the theoretical (normal) distribution are shown in Fig. 10. It is clear that the empirical distribution is bimodal; therefore, no normal distribution can fit it. Since other known distributions are also not suitable here, let us apply the proposed non-parametric test for the comparability assessment of the results.

    Table 5
    PT results of aluminum determination in SRM 2690 (simulated in % as mass fraction) ranked according to their increasing value

    j    cj, %   cj-ccert, %  Sign    j    cj, %   cj-ccert, %  Sign    j    cj, %   cj-ccert, %  Sign
    1    11.86   -0.49        -       18   12.44   0.09         0       35   12.53   0.18         0
    2    11.88   -0.47        -       19   12.44   0.09         0       36   12.55   0.20         +
    3    11.90   -0.45        -       20   12.45   0.10         0       37   12.56   0.21         +
    4    11.91   -0.44        -       21   12.46   0.11         0       38   12.57   0.22         +
    5    11.93   -0.42        -       22   12.46   0.11         0       39   12.60   0.25         +
    6    11.96   -0.39        -       23   12.47   0.12         0       40   12.61   0.26         +
    7    11.96   -0.39        -       24   12.48   0.13         0       41   12.64   0.29         +
    8    11.97   -0.38        -       25   12.49   0.14         0       42   12.66   0.31         +
    9    11.98   -0.37        -       26   12.49   0.14         0       43   12.67   0.32         +
    10   11.99   -0.36        -       27   12.50   0.15         0       44   12.68   0.33         +
    11   12.03   -0.32        -       28   12.50   0.15         0       45   12.69   0.34         +
    12   12.07   -0.28        -       29   12.51   0.16         0       46   12.76   0.41         +
    13   12.17   -0.18        0       30   12.51   0.16         0       47   12.81   0.46         +
    14   12.19   -0.16        0       31   12.52   0.17         0       48   12.84   0.49         +
    15   12.20   -0.15        0       32   12.52   0.17         0       49   12.90   0.55         +
    16   12.34   -0.01        0       33   12.53   0.18         0       50   12.96   0.61         +
    17   12.43   0.08         0       34   12.53   0.18         0       N- = 12; N+ = 15

    Taking into account ccert = 12.35 %, σcert = 0.14 %, σPT = 0.38 %, and γ = 0.14/0.38 ≈ 0.4, one can calculate Δ = 0.50 × 0.38 = 0.19 % (Table 5), ccert + Δ = 12.54 % and ccert − Δ = 12.16 %. There are N+ = 15 results cj > 12.54 %, N− = 12 results cj < 12.16 %, and N − N+ − N− = 23 values in the range ccert ± Δ. The sample median found is c25 = c26 = 12.49 % > ccert = 12.35 %, and N+ > N−. However, N+ is lower than the critical value A = 17 at α/2 = 0.025 and N = 50 (Table 4). Therefore, the null hypothesis H0 concerning successful metrological compatibility of the results is not rejected.
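The counting in this example can be reproduced as follows. This is a minimal sketch using the simulated results of Table 5, the limits 12.54 % and 12.16 %, and the critical value A = 17 quoted in the text. The decision rule coded here is the one stated in the text (neither N+ nor N− may exceed A); the variable names are illustrative.

```python
from statistics import median

# Simulated PT results from Table 5 (mass fraction of Al, %).
results = [
    11.86, 11.88, 11.90, 11.91, 11.93, 11.96, 11.96, 11.97, 11.98, 11.99,
    12.03, 12.07, 12.17, 12.19, 12.20, 12.34, 12.43, 12.44, 12.44, 12.45,
    12.46, 12.46, 12.47, 12.48, 12.49, 12.49, 12.50, 12.50, 12.51, 12.51,
    12.52, 12.52, 12.53, 12.53, 12.53, 12.55, 12.56, 12.57, 12.60, 12.61,
    12.64, 12.66, 12.67, 12.68, 12.69, 12.76, 12.81, 12.84, 12.90, 12.96,
]

upper = 12.54        # c_cert + Delta, %
lower = 12.16        # c_cert - Delta, %
critical_a = 17      # Table 4, N = 50, alpha/2 = 0.025

n_plus = sum(c > upper for c in results)     # results above the upper limit
n_minus = sum(c < lower for c in results)    # results below the lower limit
n_inside = len(results) - n_plus - n_minus   # results within c_cert +/- Delta

# Rule as stated in the text: H0 is not rejected while neither count exceeds A.
h0_rejected = n_plus > critical_a or n_minus > critical_a

print(n_plus, n_minus, n_inside, median(results), h0_rejected)
```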

    Fig. 10. Histogram of PT results (frequency F of a result value c, %): solid line, and the fitted normal distribution: dotted line; reproduced from ref. [30] by permission of Springer.

    Reliability of the assessment with hypotheses H0 against H1 for this case can be characterized by: 1) probability 1 − α/2 = 0.975 of correct assessment of the compatibility as successful (not rejecting the null hypothesis when it is true) for any number N ≥ 6 of the PT participants, and 2) probability TP = 0.73 of correct assessment of the compatibility of N = 50 PT results as unsuccessful (rejecting H0 when alternative hypothesis H1 is true). The probability α/2 of a type 1 error is 0.025 for any N ≥ 6, while the probability β of a type 2 error is 0.27 for N = 50.

    For additional examples of the use of the sign test, see Annex B, Examples 1 and 2; for an application of the ω²-criterion, see Example 3.

    4.3. Limitations

    Since the sign test critical A values are determined for N ≥ 4 to 8, depending on the probability α, and the test power is also calculated only for N ≥ 6 to 8, the proposed metrological compatibility assessment cannot be performed for a smaller sample size.

    The power efficiency of the sign test in relation to the t-test (the ratio of the sizes N of statistical samples from normal populations allowing the same power) ranges from 0.96 for N = 5 to 0.64 for infinite N. For example, practically the same power (0.73 and 0.75) was achieved in the sign test of the compatibility of PT results for aluminum determination in coal fly ashes at N = 50 discussed above, and in the t-test for the same purpose at N = 30 in the previous paragraph 3. The power efficiency here is approximately 30/50 = 0.6. On the other hand, when information about the distribution of PT results is limited by N < 50, it is a problem to evaluate the goodness-of-fit of the empirical and theoretical/normal distributions, as well as the decrease of the t-test power and the corresponding decrease of reliability of the compatibility assessment caused by deviation of the empirical distribution from the normal one.


    ANNEX B. EXAMPLES

    CONTENTS

    EXAMPLE 1. SCENARIO 1: PT FOR LEAD DETERMINATION IN AIRBORNE PARTICLES
    1.1. Aim of the PT
    1.2. Procedure for preparation of the IHRM
    1.3. Analytical methods used and raw data
    1.4. Statistical analysis of the data
    1.4.1. Metrological compatibility assessment

    EXAMPLE 2. SCENARIO 2: PT FOR ARSENIC DETERMINATION IN WATER
    2.1. Aim of the PT
    2.2. Procedure for preparation of the IHRM
    2.3. Analytical methods used and raw data
    2.4. Statistical analysis of the data
    2.4.1. Metrological compatibility assessment

    EXAMPLE 3. SCENARIO 3: PT FOR DETERMINATION OF CONCRETE COMPRESSIVE STRENGTH
    3.1. Aim of the PT
    3.2. Procedure for preparation of the IHRM
    3.2.1. IHRM homogeneity, certified value and its uncertainty
    3.3. Methods used and raw data
    3.4. Statistical analysis of the data
    3.4.1. Metrological compatibility assessment

    EXAMPLE 4. A LIMITED POPULATION OF PT PARTICIPANTS: PT FOR ACID NUMBER DETERMINATION IN USED MOTOR OILS
    4.1. Aim of the PT
    4.2. Procedure for preparation of the IHRM
    4.2.1. Characterization of the IHRM
    4.3. Methods used and raw data
    4.4. Statistical analysis of the data
    4.4.1. Metrological compatibility assessment

    EXAMPLE 5. SELECTION OF THE MOST COMMUTABLE (ADEQUATE) CRM FOR PT OF CEMENTS
    5.1. Twelve components
    5.2. Six components
    5.3. One component
    5.4. Sensitivity coefficient

    EXAMPLE 1. SCENARIO 1: PT FOR LEAD DETERMINATION IN AIRBORNE PARTICLES

    1.1. Aim of the PT

    The objectives of this PT were to determine whether the quality criteria described in the European Directives [33, 34] concerning the analysis of As, Cd, Ni and Pb in airborne particles are met, and to identify the most important sources of uncertainty. The measurement method is divided by the standard [35] into two main parts: first the sampling in the field, and second the analysis in the laboratory. During sampling, particles are collected by drawing a measured volume of air through a filter mounted in a sampler designed to collect the fraction of suspended particulate matter of less than 10 μm (PM10) [36]. The sample filter is transported to the laboratory and the analytes are taken into solution by closed-vessel microwave digestion using nitric acid and hydrogen peroxide. The resultant solution is analysed by known analytical methods. When the quantity of an analyte in the solution is measured, its concentration can be expressed in ng/m3 of the sampled air.

    The PT was organized in 2005 and focused on the second (analytical) part of the method. The PT provider was the Ecole des Mines de Douai (EMD) supported by the Laboratoire National de Métrologie et d'Essais (LNE). Ten laboratories (N = 10) of the Associations Agréées de Surveillance de la Qualité de l'Air participated in this trial.

    Results for lead only are discussed below for brevity.

    1.2. Procedure for preparation of the IHRM

    The PM10 fraction of suspended particulate matter was collected by EMD on an industrial site according to the standard [36]. The sampling was performed on 20 quartz filters (diameter of 50 mm) during one week at a flow rate of 1 m3 h-1, which means a total of 168 m3. Dust on the filters was then digested with 5 ml HNO3 + 1 ml H2O2 in a closed microwave oven.

    The LNE was in charge of preparing one liter of a solution from the digestion residue which could be used in the PT as an IHRM. The assigned/certified value of the lead content in the solution, ccert = 26.72 μg l-1, provided by LNE was obtained with a primary method: isotope dilution inductively coupled plasma mass spectrometry (ID-ICP-MS). This content corresponds to 26.72 × 1000/168 = 159 ng m-3 Pb in the sampled air. The expanded measurement uncertainty of the certified value was Ucert = 0.77 μg l-1 at the level of confidence 0.95 and the coverage factor of 2. No stability tests were conducted, since the laboratories used the solution just after the preparation. The uncertainty due to inhomogeneity of the one liter solution was considered negligible. Note that the standard uncertainty was ucert = 0.77/2 = 0.38 μg l-1, i.e. 1.4 % of the certified value.

    Each laboratory received a bottle of 50 ml of this solution (for all analytes).
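The arithmetic of this paragraph can be followed in a few lines. This is a sketch using only the figures quoted in the text (one liter of solution, 168 m3 of sampled air, coverage factor 2); the variable names are illustrative.

```python
c_cert = 26.72          # ug/l, certified Pb content of the solution
volume_l = 1.0          # l, volume of the solution prepared from the digest
air_volume_m3 = 168.0   # m3, total sampled air volume stated in the text
u_expanded = 0.77       # ug/l, expanded uncertainty U_cert
coverage_factor = 2.0

# Mass of Pb in the solution (ug converted to ng) per m3 of sampled air.
pb_in_air = c_cert * volume_l * 1000.0 / air_volume_m3   # ng/m3

u_cert = u_expanded / coverage_factor                    # ug/l, standard uncertainty
u_cert_rel = 100.0 * u_cert / c_cert                     # % of the certified value

print(f"{pb_in_air:.0f} ng/m3; u_cert = {u_cert:.3f} ug/l ({u_cert_rel:.1f} %)")
```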

    1.3. Analytical methods used and raw data

    The list of the participating laboratories was confidential. All of them followed the standard [35]. The methods used were: inductively coupled plasma mass spectrometry (ICP-MS), graphite furnace atomic absorption spectrometry (GF-AAS), and inductively coupled plasma optical emission spectroscopy with ultrasonic nebulization (ICP-OES-USN). The measurement results ci of the i-th laboratory, i = 1, 2, …, N = 10, are shown in Table 6.

    1.4. Statistical analysis of the data

    There was no statistically significant dependence of the results on the analytical method used. The robust value of the experimental standard deviation sPT of a laboratory result ci, calculated by the LNE from the data shown in Table 6 using Algorithm A of the standards [3, 37], was 3.93 μg l-1, i.e. 14.7 % of the certified value. Since the expanded uncertainty stated for lead in the European Directives [33, 34] and the standard [35, p. 30] is 25 %, the target value for the standard deviation of a laboratory result in the PT was σtarg = 25/2 = 12.5 % or 3.34 μg l-1.

    Table 6
    Results of the PT for lead content determination in the solution

    Lab No. i   Method         ci, μg l-1   ci-ccert, μg l-1   zi      Sign
    1           ICP-MS         20.12        -6.60              -1.98   -
    2           ICP-MS         20.28        -6.44              -1.93   -
    3           ICP-OES-USN    30.34        3.62               1.08    +
    4           GF-AAS         29.00        2.28               0.68    +
    5           ICP-MS         25.00        -1.72              -0.51   -
    6           GF-AAS         28.40        1.68               0.50    +
    7           ICP-MS         27.80        1.08               0.32    +
    8           ICP-MS         25.70        -1.02              -0.31   -
    9           GF-AAS         28.20        1.48               0.44    +
    10          ICP-MS         25.51        -1.21              -0.36   -

    The uncertainty of the certified value, ucert = 1.4 %, was negligible in comparison with σtarg, and the z-score was applicable for the proficiency testing based on the target σtarg value. The calculated z-score values are shown in Table 6. All of them are between −2 and +2 and were therefore interpreted as satisfactory.
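The z-score column of Table 6 can be recomputed as sketched here, assuming the usual definition z = (ci − ccert)/σtarg with σtarg equal to 12.5 % of ccert. The variable names are illustrative.

```python
c_cert = 26.72                    # ug/l, certified value
sigma_targ = 0.125 * c_cert       # 12.5 % of c_cert, i.e. 3.34 ug/l

# Laboratory results c_i from Table 6, ug/l (laboratories 1 to 10).
results = [20.12, 20.28, 30.34, 29.00, 25.00, 28.40, 27.80, 25.70, 28.20, 25.51]

# Assumed definition: z_i = (c_i - c_cert) / sigma_targ, rounded as in Table 6.
z_scores = [round((c - c_cert) / sigma_targ, 2) for c in results]
all_satisfactory = all(abs(z) <= 2 for z in z_scores)

print(z_scores, all_satisfactory)
```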

    1.4.1. Metrological compatibility assessment

    Since a hypothesis on the normal distribution of the PT results was not taken into account, compatibility of the results (as a group) is tested based on non-parametric statistics as shown in Annex A, para. 4.

    As the standard uncertainty of the certified value ucert = 1.4 % was insignificant in comparison with the target standard deviation of PT results σtarg = 12.5 %, the permissible bias of the median of the PT results from the certified value was Δ = 0.3σtarg = 3.75 % or 1.00 μg l-1. Therefore, ccert + Δ = 27.72 μg l-1 and ccert − Δ = 25.72 μg l-1. There were N+ = 5 results ci > 27.72 μg l-1 and N− = 5 results ci < 25.72 μg l-1. They are shown in Table 6 as signs "+" and "-", respectively. Both N+ and N− values are higher than the critical value A = 1 in Table 4. Therefore, the null hypothesis H0 concerning compatibility of this group of results should be rejected, in spite of the satisfactory z-score values for every laboratory participating in the PT. The probability of a type 1 error (rejecting the hypothesis when it is correct) of the decision is 0.025, while the probability of a type 2 error (not rejecting the hypothesis when it is false) is above 0.85 according to Fig. 9.
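The sign counts of this assessment can be reproduced as follows. This is a minimal sketch with the Table 6 results, Δ = 0.3σtarg, and the decision rule as stated in the text (H0 is rejected when N+ or N− exceeds A); the variable names are illustrative.

```python
c_cert = 26.72                    # ug/l, certified value
sigma_targ = 0.125 * c_cert       # ug/l, target standard deviation (12.5 %)
delta = 0.3 * sigma_targ          # ug/l, permissible bias, about 1.00
critical_a = 1                    # Table 4, N = 10, alpha/2 = 0.025

# Laboratory results c_i from Table 6, ug/l.
results = [20.12, 20.28, 30.34, 29.00, 25.00, 28.40, 27.80, 25.70, 28.20, 25.51]

n_plus = sum(c > c_cert + delta for c in results)    # "+" signs
n_minus = sum(c < c_cert - delta for c in results)   # "-" signs

# Rule as stated in the text: reject H0 when either count exceeds A.
h0_rejected = n_plus > critical_a or n_minus > critical_a

print(n_plus, n_minus, h0_rejected)
```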

    EXAMPLE 2. SCENARIO 2: PT FOR ARSENIC DETERMINATION IN WATER

    2.1. Aim of the PT

    The aim of the PT was to support water testing laboratories from the Southern African Development Community (SADC) and from the East African Community in their effort to improve the quality of measurement results. The PT round was organized in 2006 within the Water PT Scheme of the SADCMET (SADC Cooperation in Measurement Traceability). The organizers were the Water Quality Services, Windhoek, Namibia, in cooperation with the Universität Stuttgart, Germany, and with financial support by the Physikalisch-Technische Bundesanstalt, Braunschweig, Germany. The analytes were Ca, Mg, Na, K, Fe, Mn, Al, Pb, Cu, Zn, Cr, Ni, Cd, As, SO4^2-, Cl-, F-, NO3-, and PO4^3- in synthetic water modeling drinking/ground water. Three IHRMs with different analyte concentrations were prepared and distributed between the participating laboratories for analysis.

    In the following description, the determination of the arsenic concentration in one IHRM only was selected as an example.

    2.2. Procedure for preparation of the IHRM

    The IHRM was formulated on the basis of analytical grade water spiked with pure chemicals. Arsenic(III) oxide from Sigma-Aldrich (purity pc = 99.995 %) was used for the preparation of the stock solution with a content of As of about 0.4 mg g-1. The mass mAs2O3 of the oxide was measured on an analytical balance (Sartorius RC 210D); the total mass mss/t of the stock solution was determined by difference weighing on a Sartorius BA3100P balance. About mss = 100 g of the stock solution was diluted to about mdil/t = 1000 g, also on a Sartorius BA3100P balance. Finally, about mdil = 200 g of the diluted solution (also weighed on the same balance) was diluted to about mlot = 49900 g. The total mass mlot of this lot was determined by difference weighing on a Sartorius F150S balance.

    The assigned/certified value of the As concentration in the IHRM was assessed according to the preparation procedure, taking into account the proportion pAs/As2O3 of atomic weights (from IUPAC publications), the purity of the As2O3 used, the density ρlot of the final lot and a buoyancy correction factor bcf. The density of the final lot was measured gravimetrically using a 100 ml pycnometer. The certified value ccert of the mass concentration of As in the final lot was calculated by the following formula:

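    $$ c_{\mathrm{cert}} = \frac{m_{\mathrm{As_2O_3}}\, p_{\mathrm{As/As_2O_3}}\, p_c\, m_{\mathrm{ss}}\, m_{\mathrm{dil}}\, \rho_{\mathrm{lot}}}{m_{\mathrm{ss/t}}\, m_{\mathrm{dil/t}}\, m_{\mathrm{lot}}\, b_{\mathrm{cf}}} . \qquad (14) $$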

    Formula (14) also enables calculation of the uncertainty budget of the certified value. The uncertainties of the masses were derived from precision experiments, delivering directly the standard uncertainty, and from the linearity tolerances given by the manufacturer (treated as a rectangular distribution). The uncertainty of the purity was derived from the manufacturer's information. The uncertainty of the buoyancy correction factor was estimated from the possible variations in atmospheric pressure, air humidity and temperature [38]. For the estimation of the uncertainty of density, a separate budget was calculated, taking into account the uncertainties of the weighing and of the temperature measurement. The uncertainties of the atomic weights and of the stability and homogeneity of the solution were neglected.

    The assigned/certified value of the As content in the IHRM and its expanded uncertainty were ccert ± Ucert = 0.1706 ± 0.0001 mg l-1 at the level of confidence 0.95 and the coverage factor of 2. Note that the expanded uncertainty was 0.07 % of the reference value.

    Each laboratory received a 1 L bottle of this IHRM (for all analytes).

    2.3. Analytical methods used and raw data

    Nine participating laboratories (N = 9) reported results of the determination of the As concentration, shown in Table 7. One of the major problems of the current situation with water analysis in Africa is the absence of any common standard for analytical methods. The methods used were: inductively coupled plasma optical emission spectrometry (ICP-OES), atomic absorption spectrometry (AAS) and others.

    2.4. Statistical analysis of the data

    High standard deviations from the certified value (above 20 % of the value) were expected at a workshop organized for representatives of the participating laboratories prior to this PT round. Therefore, it was decided to use the target standard deviation σtarg of 20 % of the certified value when the experimental standard deviation sPT > 20 %. Since in the As case the robust sPT value, calculated from the data shown in Table 7 by Algorithm A of the standards [3, 37], was 50.5 % (0.086 mg l-1), the stated target value σtarg = 20 % (0.034 mg l-1) was applied for the proficiency assessment with the z-score. The z-score values are shown in Table 7 with the comments: satisfactory (Yes) when they were between −2 and +2, questionable (Quest) for 2


    Table 7
    Results of the PT for arsenic content determination in water

    Lab No. i   Method     ci, mg l-1   ci-ccert, mg l-1   zi      Comment   Sign
    4           AAS        0.03         -0.1406            -4.12   No        -
    10          other      0.20         0.0294             0.86    Yes       +
    18          ICP-OES    0.20         0.0294             0.86    Yes       +
    19          ICP-OES    0.12         -0.0506            -1.48   Yes       -
    26          ICP-OES    0.12         -0.0506            -1.48   Yes       -
    34          AAS        0.169        -0.0016            -0.05   Yes       0
    35          AAS        0.08         -0.0906            -2.66   Quest     -
    37          ICP-OES    0.789        0.6184             18.12   No        +
    38          other      0.258        0.0874             2.56    Quest     +

    Therefore, the permissib