49
Statistics and Data Analysis in Proficiency Testing Michael Thompson School of Biological and Chemical Sciences Birkbeck College (University of London) Malet Street London WC1E 7HX [email protected]

Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

  • Upload
    lydien

  • View
    215

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Statistics and Data Analysisin

Proficiency Testing

Michael ThompsonSchool of Biological and Chemical Sciences

Birkbeck College (University of London)Malet Street

London WC1E [email protected]

Page 2: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Organisation of a proficiency test

“Harmonised Protocol”. Pure Appl Chem. 2006, 78, 145-196.

Page 3: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Where do we use statistics inproficiency testing?

• Finding a consensus and its uncertainty touse as an assigned value

• Assessing participants’ results• Assessing the efficacy of the PT scheme• Testing for sufficient homogeneity and

stability of the distributed test material• Others

Page 4: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Criteria for an ideal scoringmethod

• Adds value to raw results.• Easily understandable, based on the

properties of the normal distribution.• Has no arbitrary scaling transformation.• Is transferable between different

concentrations, analytes, matrices, andmeasurement principles.

Page 5: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

How can we construct a score?

• An obvious idea is to utilise the propertiesof the normal distribution to interpret theresults of a proficiency test.

BUT…

We do not makeany assumptionsabout the actualdata.

Page 6: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Example dataset A• Determination of protein nitrogen in a meat

product.

Page 7: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

A weak scoring method

• On average, slightly more than 95% of laboratoriesreceive z-score within the range ±2.

sxxz

077.0126.2

sx

Page 8: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Robust mean and standarddeviation

• Robust statistics is applicable to datasets thatlook like normally distributed samplescontaminated with outliers and stragglers (i.e.,unimodal and roughly symmetric.

• The method downweights the otherwise largeinfluence of outliers and stragglers on theestimates.

• It models the central ‘reliable’ part of the dataset.

robrob ˆ,ˆ

Page 9: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Can I use robust estimates?

Measurement axis

Skewed

Bimodal

Heavy-tailed

Page 10: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

nxxx 21Tx

MAD5.1ˆ,medianˆ,0,21Set 00 pk

ppipp

ppipp

ppippi

i

kxk

kxk

kxkx

x

ˆˆifˆˆ

ˆˆifˆˆ

ˆˆˆˆif~

)~var()(ˆ

)~(meanˆ2

1

1

ip

ip

xkf

x

1converged,notIf pp

Huber’s H15

Page 11: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

References: robust statistics

• Analytical Methods Committee,Analyst,1989, 114, 1489

• AMC Technical Brief No 6, 2001(download from www/rsc.org/amc)

• P J Rousseeuw, J. Chemomet, 1991, 5, 1.

Page 12: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Is that enough?

• On average, slightly less than 95% oflaboratories receive a z-score between ±2.

robrobxz ˆˆ

048.0ˆ

128.2ˆ

rob

rob

Page 13: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

What more do we need?

• We need a method that evaluates the datain relation to its intended use, rather thanmerely describing it.

• This adds value to the data rather thansimply summarising it.

• The method is based on fitness forpurpose.

Page 14: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Fitness for purpose

• Fitness for purpose occurs when the uncertaintyof the result uf gives best value for money.

• If the uncertainty is smaller than uf , the analysismay be too expensive.

• If the uncertainty is larger than uf , the cost andthe probability of a mistaken decision will rise.

Page 15: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Fitness for purpose

• The value of uf can sometimes be estimatedobjectively by decision theoretic methods, but ismost often simply agreed between thelaboratory and the customer by professionaljudgement.

• In the proficiency test context, uf should bedetermined by the scheme provider.

Reference: T Fearn, S A Fisher, M Thompson,and S L R Ellison, Analyst, 2002, 127, 818-824.

Page 16: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

• If we now define a z-score thus:

we have a z-score that is both robustified againstextreme values and tells us something about fitnessfor purpose.

• In an exactly compliant laboratory, scores of 2<|z|<3will be encountered occasionally, and scores of |z|>3rarely. Better performers will receive fewer of theseextreme z-scores.

A score that meets all of thecriteria

fpprob uxz whereˆ

Page 17: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Example data A again• Suppose that the fitness for purpose criterion set

for the analysis is an RSD of 1%. This gives us:021.01.201.0 p

Page 18: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Finding a consensus fromparticipants’ results

• The consensus is not theoretically the bestoption for the assigned value but is usuallythe only practicable value.

• The consensus is not necessarily identicalwith the true value. PT providers have tobe alert to this possibility.

Page 19: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

What is a ‘consensus’?

• Mean? - easy to calculate, but affected byoutliers and asymmetry.

• Robust mean? - fairly easy to calculate, handlesoutliers but affected by asymmetry.

• Median? - easy to calculate, more robust forasymmetric distributions, but larger standarderror than robust mean.

• Mode? - intuitively good, difficult to define,difficult to calculate.

Page 20: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

• The robust mean provides a useful consensusin the great majority of instances, where theunderlying distribution is roughly symmetricand there are 0-10% outliers.

• The uncertainty of this consensus can besafely taken as

The robust mean as consensus

nxu roba ̂

Page 21: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

When can I use robust estimates?

Measurement axis

Skewed

Bimodal

Heavy-tailed

Page 22: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Skewed distributions

• Skews can arise when the participants’results come from two or moreinconsistent methods.

• They can also arise as an artefact at lowconcentrations of analyte as a result ofdata recording practice.

• Rarely, skews can arise when thedistribution is truly lognormal.

Page 23: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Possible use of a trimmed dataset?

Page 24: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Can I use the mode?How many modes? Where are they?

Page 25: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

The normal kernel density foridentifying a mode

where Φ is the standard normal density,

AMC Technical Brief No. 4

n

i

i

hxx

nhy

1

1

2

)2/exp()(

2aa

Page 26: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

A normal kernel

Page 27: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

A kernel density

Page 28: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Another kernel density

Page 29: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Graphical representation of sample data

Page 30: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Kernel density of the aflatoxin data

Page 31: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Uncertainty of the mode

• The uncertainty of the consensus can beestimated as the standard error of themode by applying the bootstrap to theprocedure.

• The bootstrap is a general procedurebased on resampling for estimatingstandard errors of complex statistics.

• Reference: Bump-hunting for the proficiencytester – searching for multimodality. P JLowthian and M Thompson, Analyst, 2002,127,1359-1364.

Page 32: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

The normal mixture model

AMC Technical Brief No 23, and AMC Software.Thompson, Acc Qual Assur, 2006, 10, 501-505.

1,)()(11

m

jj

m

jjj pyfpyf

2

2/)(exp()(

22j

j

yyf

Page 33: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Mixture models found by the maximumlikelihood method (the EM algorithm)

• The M-step

• The E-step

nyjPpn

iij /)(̂ˆ

1

n

ii

n

iiij yjPyjPy

11)(̂)(̂̂

)(̂)(̂)̂(ˆ1 1

22i

n

j

m

iiji yjPyjPy

)(ˆ)(ˆ)(̂1

i

m

jjjijji yfpyfpyjP

Page 34: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Kernel density and fit of 2-componentnormal mixture model

Page 35: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Kernel density and variance-inflatedmixture model

Page 36: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Useful References

• Mixture modelsM Thompson. Accred Qual Assur. 2006, 10, 501-505.AMC Technical Brief No. 23, 2006. www/rsc.org/amc

• Kernel densitiesB W Silverman, Density estimation for statistics and dataanalysis. Chapman and Hall, London, 1986.AMC Technical Brief, no. 4, 2001 www/rsc.org/amc

• The bootstrapB Efron and R J Tibshirani, An introduction to thebootstrap. Chapman and Hall, London, 1993AMC Technical Brief, No. 8, 2001 www/rsc.org/amc

Page 37: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

• Use z-scores based on fitness forpurpose.

• Estimate the consensus as the robustmean and its uncertainty asif the dataset is roughly symmetric.

• If the dataset is skewed and plausiblycomposite, use kernel density methodsor mixture models

Conclusions—scoring

nrob̂

Page 38: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Homogeneity testing

• Comminute and mix bulk material.• Split into distribution units.• Select m>10 distribution units at random.• Homogenise each one.• Analyse 2 test portions from each in

random order, with high precision, andconduct one-way analysis of variance onresults.

Page 39: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Design for homogeneity testing

2,

MSWMSBsMSWs saman

Page 40: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Problems with simple ANOVAbased on testing

• Analytical precision too low—methodcannot detect consequential degree ofheterogeneity.

• Analytical precision too high—methodfinds significant degree of heterogeneitythat may not be consequential.

(Everything is heterogeneous!)

0:0 samH

Page 41: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as
Page 42: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as
Page 43: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as
Page 44: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

• Material passes homogeneity test if

• Problems are:– ssam may not be well estimated;– too big a probability of rejecting

satisfactory test material.

“Sufficient homogeneity”:original definition

pLsams 3.0

Page 45: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Fearn test

• Test by rejecting when

Ref: Analyst, 2001, 127, 1359-1364.

220 : LsamH

2

11

,122

12

2

mmanmL

samFs

ms

Page 46: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

Problems with homogeneitydata

• Problems with data are common:e.g., no proper randomisation, insufficientprecision, biases, trends, steps,insufficient significant figures recorded,outliers.

• Laboratories need detailed instructions.• Data need careful scrutiny before

statistics.• HP1 is incorrect in saying that all outlying

data should be retained.

Page 47: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as
Page 48: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as
Page 49: Statistics and Data Analysis in Proficiency Testing 2007_tcm18-87000.pdf · Where do we use statistics in proficiency testing? • Finding a consensus and its uncertainty to use as

General references

• The Harmonised Protocol (revised)M Thompson, S L R Ellison and R WoodPure Appl. Chem., 2006, 78, 145-196.

• R E Lawn, M Thompson and R F Walker,Proficiency testing in analytical chemistry. TheRoyal Society of Chemistry, Cambridge, 1997.

• ISO Guide 43. International StandardsOrganisation, Geneva, 1997.