
ADVANCED ANALYTICAL CHEMISTRY – 1S 2018

MODULE 2

Statistics Applied to Analytical Chemistry

- Basic Concepts

Class notes : www.ufjf.br/baccan

Prof. Rafael Arromba de Sousa, Departamento de Química - ICE

Statistics Applied to Analytical Chemistry

Last class:

Basic Concepts and definitions

There is no absolute value for an analytical result

Correct way to express results

Accuracy and precision definition

Importance of rejecting anomalous results

Exercises:

Exerc. 1 Significant figures

Exerc. 2 Data “treatment”

RELATION BETWEEN ACCURACY AND PRECISION

Accuracy and precision can be related in three main ways:

[Figure: analyte concentrations found by analytical methods A, B and C, shown relative to the true value. The three cases illustrated are: precise and accurate; precise but inaccurate; imprecise and inaccurate.]

PRACTICAL EXAMPLES?

Some formalities... TERMINOLOGIES

Codex Committee on Methods of Analysis and Sampling. Guidelines on Analytical Terminology (CAC/GL72 – 2009)

1) Good precision = good repeatability ≠ reproducibility

TERMINOLOGIES

2) Intermediate Precision

(Similar to what is presented in Rev. 05 – Aug 2016)

STATISTICS IN CHEMICAL ANALYSIS

Comparison of Results:

- Precisions (F-test)

- Averages (Student's t-test)

F-test (or Snedecor's test)

Used to compare the precisions (or variances) associated with two averages (A and B):

$F = \dfrac{S_A^2}{S_B^2}$

“A” is the data set with the larger standard deviation, so that F ≥ 1

IF Fcalculated > Fcritical for the desired confidence level

There is a significant difference between the precisions of the two data sets

IF Fcalculated < Fcritical for the desired confidence level

There is no significant difference between the data in relation to their precisions

Tables for F

Eg: Critical values of F at the 5% level*

Degrees of freedom: numerator across, denominator down

Denom. \ Numer.      3       4       5       6      12      20
 3                9.28    9.12    9.01    8.94    8.74    8.64
 4                6.59    6.39    6.26    6.16    5.91    5.80
 5                5.41    5.19    5.05    4.95    4.68    4.56
 6                4.76    4.53    4.39    4.28    4.00    3.87
12                3.49    3.26    3.11    3.00    2.69    2.54
20                3.10    2.87    2.71    2.60    2.28    2.12

* The 5% level corresponds to 95% confidence in the test (when concluding that there is no significant difference)

EXERCISE

Exerc. 3 - Comment on the difference in precision obtained in laboratories A and B for the determination of Mg in the same milk sample, considering a confidence level of 95%.

Data:
Lab. A: 34.97; 34.85; 34.94 and 34.88 mg L-1
Lab. B: 35.02; 34.96; 34.99; 35.07 and 34.85 mg L-1

(close precisions, comparable)
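The F-test for Exerc. 3 can be sketched in a few lines of Python. This is only an illustration: the function name `f_test` is hypothetical, and the critical value 9.12 is read from the 5% table assuming its row and column labels are degrees of freedom (numerator 4, denominator 3).

```python
from statistics import variance

# Mg results (mg L-1) from Exerc. 3
lab_a = [34.97, 34.85, 34.94, 34.88]
lab_b = [35.02, 34.96, 34.99, 35.07, 34.85]


def f_test(set_1, set_2):
    """Return (F, numerator df, denominator df), larger variance on top."""
    var_1, var_2 = variance(set_1), variance(set_2)  # sample variances (n - 1)
    if var_1 >= var_2:
        return var_1 / var_2, len(set_1) - 1, len(set_2) - 1
    return var_2 / var_1, len(set_2) - 1, len(set_1) - 1


f_calc, df_num, df_den = f_test(lab_a, lab_b)

# Assumption: critical value taken from the 5% table (numerator 4, denominator 3)
F_CRITICAL = 9.12

significant = f_calc > F_CRITICAL
print(f"F = {f_calc:.2f} (df {df_num}/{df_den}), significant difference: {significant}")
```

Lab. B has the larger variance, so it goes in the numerator; F calculated stays well below F critical, in line with the answer "close precisions, comparable".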

To evaluate accuracy and precision it is also necessary to understand which errors affect these parameters!

ERROR TYPES:
- SYSTEMATIC (traceable and can be avoided)
- RANDOM (always present ...)

1) Systematic or Determinate Errors (can be known and traced)

- Method errors: arise from the non-ideal chemical or physical behavior of analytical systems
- Personal errors: result from lack of care, lack of attention or personal limitations of the analyst
- Instrumental errors: caused by non-ideal behavior of an instrument, by faulty calibrations or by the use of inappropriate conditions

These errors affect the accuracy

How to detect a systematic error?

- Certified Reference Materials (CRM)

- Analyte addition and recovery method (spike)

- Comparative methods

- Interlaboratory tests

CRM

- Acquired from specialized suppliers (NIST*)
- Samples with a certificate of analysis (result ± uncertainty)
- Substances commonly found in samples: ENVIRONMENTAL - CLINICAL - BIOLOGICAL - FORENSIC

(*) National Institute of Standards and Technology

2) Indeterminate Errors (random)

- Cannot be located…
- Affect precision
- They vary according to a normal distribution: measurements fluctuate randomly around the mean

Eg. of a Normal Distribution (Calibrating a pipette)

[Figure: histogram showing the distribution of 50 measurements of the volume delivered by a 10.00 mL pipette, with the Gauss curve (distribution profile) superimposed. x-axis: volume (mL), classes from 9.969 to 9.995; y-axis: % of measurements.]

OBS: This slide was prepared from material by Prof. Célio Pasquini (IQ-Unicamp)

Characteristics of a Normal Distribution

The experimental values fall at times above and at times below the arithmetic mean because of errors that appear to be random, which is the expected and therefore "normal" behavior

Karl F. Gauss

Gauss' Normal Distribution

Probability of occurrence of a result (Y):

$Y = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left[-\dfrac{1}{2}\,\dfrac{(X_i - \mu)^2}{\sigma^2}\right]$

μ is the mean of the population (situation of many measurements) and σ is the population standard deviation

Thus, one can calculate a range for a result R, assuming that the observed deviations follow a normal distribution

Normal Distribution and Classical Statistics

In classical statistics, moderate deviations from normality usually do not invalidate the statistical results (the methods are robust), and this idea is supported by the "Central Limit Theorem":

“If the total fluctuation in a certain random variable is the result of the sum of the fluctuations of many independent variables of more or less equal importance, their distribution will tend to be normal, no matter what the nature of the distributions of the individual variables”

William S. Gosset (1876 – 1937): Developed the Student distribution to analyze alcohol content in beer

Gauss' Normal Distribution

[Figure: normal (Gauss) curve. x-axis: μ, μ ± 1σ, μ ± 2σ; y-axis: relative frequency, from 0 to 0.4.]

Confidence limits for the arithmetic mean of a result:

$\mu = \bar{x} \pm \dfrac{z\sigma}{\sqrt{N}}$

Confidence level (%)      z
50                     0.67
68                     1.00
80                     1.28
90                     1.64
95                     1.96
95.4                   2.00
99                     2.58
99.7                   3.00
99.9                   3.29

OBS: This slide was prepared from material by Prof. Célio Pasquini (IQ-Unicamp)
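As a quick check, the z values can be regenerated from the standard normal distribution. The sketch below uses Python's `statistics.NormalDist`; the helper name `z_for_confidence` is hypothetical, and the results agree with the tabulated values only to within the rounding of the confidence levels (e.g. 68% gives a value slightly below 1.00).

```python
from statistics import NormalDist


def z_for_confidence(level_percent):
    """Two-sided z value for a given confidence level (in %)."""
    tail = (1 - level_percent / 100) / 2  # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)


for level in (50, 68, 80, 90, 95, 95.4, 99, 99.7, 99.9):
    print(f"{level:>5} %   z = {z_for_confidence(level):.2f}")
```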

Confidence limits when σ is unknown (S is used as its estimate)

$\mu = \bar{x} \pm \dfrac{tS}{\sqrt{N}}$   instead of   $\mu = \bar{x} \pm \dfrac{z\sigma}{\sqrt{N}}$

Degrees of freedom     95%      99%
 1                   12.71    63.66
 2                    4.30     9.93
 3                    3.18     5.84
 4                    2.78     4.60
 5                    2.57     4.03
 6                    2.45     3.71
 7                    2.36     3.50
 8                    2.31     3.36
 9                    2.26     3.25
10                    2.23     3.17
...                    ...      ...
∞                     1.96     2.58

Example 4 (application of the concept in the expression of a result):

An individual made four determinations of iron in a metal alloy, finding an average value of 31.40 % m/m and an estimated standard deviation of 0.11 % m/m. What is the range for the population mean at a confidence level of 95%?

$\mu = \bar{x} \pm \dfrac{tS}{\sqrt{N}}$

With N = 4 there are 3 degrees of freedom, so t = 3.18:

µ = 31.40 ± (3.18 × 0.11) / √4
µ = 31.40 ± 0.17
C Fe = (31.23 – 31.57) % m/m
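The arithmetic of Example 4 can be reproduced with a short Python sketch. The t value is typed in from the table (3 degrees of freedom, 95%) rather than computed, and the variable names are only illustrative.

```python
from math import sqrt

mean, s, n = 31.40, 0.11, 4   # average (% m/m), standard deviation (% m/m), determinations
T_95 = 3.18                   # tabulated t for n - 1 = 3 degrees of freedom, 95%

half_width = T_95 * s / sqrt(n)   # t * S / sqrt(N)
lower, upper = mean - half_width, mean + half_width

print(f"mu = {mean:.2f} ± {half_width:.2f}  ->  ({lower:.2f} to {upper:.2f}) % m/m")
```

Note that the divisor is √N (here √4 = 2), which is what gives the ± 0.17 of the example.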

Comparison between averages

Situation 1: Result ± uncertainty compared with a reference value

Situation 2: Result ± uncertainty compared with a reference value ± uncertainty

Basic data treatment

Example of application – SITUATION 1 (Sampling Experiment)

Eg: Comparison of the M&M color distribution with the manufacturer's specification

Based on the quantification of different samples:

- average ($\bar{x}$) and estimated standard deviation (S) for each colour
- manufacturer's average (µ)

“t” is calculated and compared with the tabulated value for the desired confidence level:

$t = \dfrac{|\bar{x} - \mu|}{S}\sqrt{N}$

If t calculated < t tabulated, there is no significant difference between the averages

Another way to use the Confidence Limit for this same type of comparison

Comparison of a mean with a reference value when the reference value has no stated deviation

[Figure: confidence interval for the result shown together with the reference value. Example of a situation where the obtained result agrees with the reference value.]

SITUATION 2: If the precisions are compatible, the averages can be compared (Student's t-test)

$S_p = \sqrt{\dfrac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}}$

$t = \dfrac{|\bar{x}_1 - \bar{x}_2|}{S_p \sqrt{\dfrac{n_1 + n_2}{n_1 n_2}}}$

ν for t tabulated = n1 + n2 - 2

n is the number of measurements for each average; Sp corresponds to the pooled ("grouped") S

IF tcalculated < tcritical for the desired confidence level, there is no significant difference between the averages and the observed differences are due to random errors
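A minimal Python sketch of this comparison, reusing the Mg data of Exerc. 3 as an illustration (the function name `pooled_t` is hypothetical, and the critical value 2.36 is read from the t table for ν = 7 at 95%):

```python
from math import sqrt
from statistics import mean, variance

# Mg results (mg L-1) from Exerc. 3
lab_a = [34.97, 34.85, 34.94, 34.88]
lab_b = [35.02, 34.96, 34.99, 35.07, 34.85]


def pooled_t(set_1, set_2):
    """Return (t, Sp, degrees of freedom) for two sets with comparable precisions."""
    n1, n2 = len(set_1), len(set_2)
    sp = sqrt(((n1 - 1) * variance(set_1) + (n2 - 1) * variance(set_2))
              / (n1 + n2 - 2))
    t = abs(mean(set_1) - mean(set_2)) / (sp * sqrt((n1 + n2) / (n1 * n2)))
    return t, sp, n1 + n2 - 2


t_calc, sp, dof = pooled_t(lab_a, lab_b)

T_CRITICAL = 2.36  # tabulated t for 7 degrees of freedom, 95%
print(f"t = {t_calc:.2f}, Sp = {sp:.3f}, nu = {dof}, "
      f"significant difference: {t_calc > T_CRITICAL}")
```

For these data t calculated is below t tabulated, so the two averages would not be considered significantly different.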

BUT if the precisions are not comparable:

The pooled expressions used for comparable precisions (Sp, t and ν = n1 + n2 - 2) should not be applied as they stand

ν for t tabulated: SPECIFIC FORMULAS (To search ...), with separate cases for n1 = n2 and n1 ≠ n2
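The slides do not say which specific formulas are meant. One commonly used option, shown here purely as an assumption, is Welch's unequal-variance t-test with the Welch-Satterthwaite approximation for ν. The function name `welch_t` is hypothetical, and the Exerc. 3 data are reused only to show the mechanics (their precisions are in fact comparable).

```python
from math import sqrt
from statistics import mean, variance

# Mg results (mg L-1) from Exerc. 3, used only to illustrate the calculation
lab_a = [34.97, 34.85, 34.94, 34.88]
lab_b = [35.02, 34.96, 34.99, 35.07, 34.85]


def welch_t(set_1, set_2):
    """Welch's t and approximate degrees of freedom (unequal variances)."""
    n1, n2 = len(set_1), len(set_2)
    u1 = variance(set_1) / n1   # S1^2 / n1
    u2 = variance(set_2) / n2   # S2^2 / n2
    t = abs(mean(set_1) - mean(set_2)) / sqrt(u1 + u2)
    dof = (u1 + u2) ** 2 / (u1 ** 2 / (n1 - 1) + u2 ** 2 / (n2 - 1))
    return t, dof


t_calc, dof = welch_t(lab_a, lab_b)
print(f"t = {t_calc:.2f}, nu ≈ {dof:.1f}")
```

The resulting ν is generally not an integer; a conservative choice is to round it down before looking up t tabulated.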

Critical values for t at the 95 and 99% confidence levels (P = 0.025 and P = 0.005 in each tail)

Degrees of freedom     95%      99%
 1                   12.71    63.66
 2                    4.30     9.93
 3                    3.18     5.84
 4                    2.78     4.60
 5                    2.57     4.03
 6                    2.45     3.71
 7                    2.36     3.50
 8                    2.31     3.36
 9                    2.26     3.25
10                    2.23     3.17
...                    ...      ...
∞                     1.96     2.58

- Statistical tests are valid when the errors involved are random

- The higher ν (degrees of freedom), the more reliable the test

Other ways to use Student's t-test

Comparison of data in pairs: Paired t-test

Different sample groups (of the same type or from different lots) analyzed by different methods or by different analysts

Procedure:
1) Arrange the data in pairs
2) Calculate the difference for each pair, then the mean ($\bar{d}$) and the standard deviation ($S_d$) of these differences
3) Calculate the t value:

$t = \dfrac{|\bar{d}|}{S_d}\sqrt{N}$

ν for t tabulated = N - 1

ν = degrees of freedom; N = number of pairs
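The three steps of the paired t-test can be sketched as follows. The data are hypothetical values invented only to illustrate the calculation (four samples analyzed by two methods), and `paired_t` is an illustrative name.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical results: the same four samples analyzed by two methods
method_1 = [10.2, 12.5, 9.8, 11.4]
method_2 = [10.0, 12.1, 9.9, 11.0]


def paired_t(results_1, results_2):
    """Return (t, degrees of freedom) for paired results."""
    differences = [a - b for a, b in zip(results_1, results_2)]  # steps 1 and 2
    n = len(differences)
    t = abs(mean(differences)) * sqrt(n) / stdev(differences)    # step 3
    return t, n - 1


t_calc, dof = paired_t(method_1, method_2)

T_CRITICAL = 3.18  # tabulated t for 3 degrees of freedom, 95%
print(f"t = {t_calc:.2f}, nu = {dof}, significant difference: {t_calc > T_CRITICAL}")
```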

EXERCISE 4

In your postgraduate project, will you compare results?

If so, which test will you use? Why?

Statistical tests for results comparison

- Precision
- Accuracy

Hypothesis Tests

- Null hypothesis (H0)
H0: µ = µ0

- Alternative hypothesis (Ha)
Ha: µ ≠ µ0 (µ0 is the true or reference value)

Some Examples of "Error Propagation"

It takes into account the uncertainties of each stage of the process

For indeterminate (random) errors (most usual situation), for a result “R”:

R = A + B – C (addition and subtraction)

$S_R = \sqrt{S_A^2 + S_B^2 + S_C^2}$

R = AB/C (multiplication and division)

$\dfrac{S_R}{R} = \sqrt{\left(\dfrac{S_A}{A}\right)^2 + \left(\dfrac{S_B}{B}\right)^2 + \left(\dfrac{S_C}{C}\right)^2}$

Error Propagation “Homework”

Exercise 5: Considering that the S of a balance is 0.0001 g, calculate the estimated standard deviation of a weighing done on this balance.

(SR = 0.0001 g)
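Both propagation rules are easy to code. The sketch below uses hypothetical helper names (`s_sum`, `s_product`) and hypothetical input values chosen only to show how the formulas are applied.

```python
from math import sqrt


def s_sum(*s_values):
    """S_R for R = A + B - C: absolute deviations combine in quadrature."""
    return sqrt(sum(s ** 2 for s in s_values))


def s_product(result, *value_s_pairs):
    """S_R for R = AB/C: relative deviations combine in quadrature."""
    relative = sqrt(sum((s / value) ** 2 for value, s in value_s_pairs))
    return abs(result) * relative


# Hypothetical deviations for an addition/subtraction
s_r_sum = s_sum(0.03, 0.04, 0.12)

# Hypothetical (value, S) pairs for R = AB/C
a, b, c = (2.00, 0.02), (5.00, 0.05), (4.00, 0.04)
r = a[0] * b[0] / c[0]
s_r_prod = s_product(r, a, b, c)

print(f"S_R (sum) = {s_r_sum:.3f}")
print(f"R = {r:.2f}, S_R (product) = {s_r_prod:.3f}")
```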

FOR INSTRUMENTAL METHODS OF ANALYSIS, STATISTICAL DATA TREATMENT ALSO INCLUDES:

- Linear regression: analytical / calibration curve
  Types:
  - univariate (“conventional”)
  - multivariate (chemometric tool)

- Detection and quantification limits: calculations based on the estimate of the standard deviation of the blank, in order to predict the detectability of the instrument and method used

LINEAR REGRESSION

It is the line that best represents the relationship between the measured property (e.g. absorbance) and the concentration of the standards

[Figure: calibration curve. x-axis: concentration (mg L-1); y-axis: absorbance; points: blank and standards. Fitted line: Abs = 48.3x + 0.24, r = 0.9987]

- The correlation coefficient (r) varies between -1 and +1. The closer its absolute value is to unity, the better the correlation

LINEAR REGRESSION

A linear analytical curve is not always possible, and a nonlinear regression can be used as long as it presents a "good correlation"

Linear regressions are the most usual and can be obtained with software, which uses the "least squares method":

n = number of points (xi; yi) used in the calibration curve

For y = ax + b, with a correlation coefficient “r”:

$a = \dfrac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$

$b = \bar{y} - a\bar{x}$

$r = \dfrac{n \sum x_i y_i - \sum x_i \sum y_i}{\left\{\left[n \sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n \sum y_i^2 - \left(\sum y_i\right)^2\right]\right\}^{1/2}}$

LINEAR REGRESSION

The "least squares method" minimizes the sum of the squared "residuals" (errors) that the regression line generates for each of the points

Near the center of a calibration curve the uncertainty of the regression line is smallest. Implications?

LINEAR REGRESSION

“Homework”: Study examples 4.9 and 4.10 from “Vogel – Análise Química Quantitativa”, 6ª Ed:

To determine chitin by molecular fluorescence, chitin standards were used at the following concentrations: 0.10; 0.20; 0.30 and 0.40 μg mL-1, which resulted in the following emission values, respectively: 5.20; 9.90; 15.30 and 19.10 cps. The blank generated a reading of 0.00 cps. Considering that a linear correlation coefficient greater than 0.99 is satisfactory, calculate the chitin concentration of a sample whose analytical signal was 16.10 cps.

(y = 48.3x + 0.24; r = 0.9987; C chitin = 0.33 µg mL-1)
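The homework can be checked by coding the least-squares formulas directly. This is a sketch with illustrative names (`least_squares`, `sample_conc`); the blank is treated as the point (0.00; 0.00), which is an assumption consistent with the quoted equation.

```python
from math import sqrt

# Standards (µg mL-1) and emission (cps); the blank is included as (0.00, 0.00)
conc = [0.00, 0.10, 0.20, 0.30, 0.40]
signal = [0.00, 5.20, 9.90, 15.30, 19.10]


def least_squares(x, y):
    """Return slope a, intercept b and correlation coefficient r for y = ax + b."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi ** 2 for xi in x)
    syy = sum(yi ** 2 for yi in y)
    a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = sy / n - a * sx / n          # b = mean(y) - a * mean(x)
    r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return a, b, r


a, b, r = least_squares(conc, signal)
sample_conc = (16.10 - b) / a        # invert y = ax + b for the sample signal

print(f"y = {a:.1f}x + {b:.2f}; r = {r:.4f}; C = {sample_conc:.2f} µg mL-1")
```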

LINEAR REGRESSION

In order to obtain a "good calibration" it is necessary that:

a) The only errors in the measurements are due to fluctuations in the dependent variable "y"
b) The points are homoscedastic (random errors with zero mean and constant variance)
   - Cochran (Q) or Levene tests; "normal" or weighted regression
c) The errors are not correlated
   - Prepare all standards in the same way and perform the experimental measurements in random order
d) The errors follow a normal distribution
   - Shapiro-Wilk test
e) The obtained model does not show lack of fit (ANOVA)

LINEAR REGRESSION AND MATRIX EFFECT

Since the linear regression is a "mean" line, one can calculate the uncertainty of its angular coefficient (slope) and linear coefficient (intercept). These uncertainties can be used to evaluate the regression itself as well as other parameters

Eg: Evaluation of matrix effect in the development of an analytical method*:

[Figure: AI versus Pb concentration (µg/L, 0 to 10) for two curves ( y = ax + b ): the standard addition curve (alkaline medium) and the analytical curve in acidic medium.]

Angular coefficients (a):
“standard addition” (alkaline medium):   1.60 × 10^-3 ± 0.05 × 10^-3
“usual calibration” (acidic medium):     1.78 × 10^-3 ± 0.03 × 10^-3

The angular coefficients (a) are compared at a 95% confidence level to find out whether the media (alkaline and acidic) are different or not!

(*) R.A.de Sousa et al. / Talanta 104 (2013) 90–96

LINEAR REGRESSION AND MATRIX EFFECT

Calculation of the uncertainties of “a” and “b”:

$S_{y/x} = \left[\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}\right]^{1/2}$

($\hat{y}_i$ is obtained using the regression equation itself)

$S_a = \dfrac{S_{y/x}}{\left[\sum (x_i - \bar{x})^2\right]^{1/2}}$

$S_b = S_{y/x}\left[\dfrac{\sum x_i^2}{n \sum (x_i - \bar{x})^2}\right]^{1/2}$

A careful observation of the values of $y_i - \hat{y}_i$ shows that the regression error is smaller in the vicinity of the center of the calibration curve!
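As an illustration, these expressions can be applied to the chitin calibration data of the homework, taking the slope and intercept quoted there (a = 48.3, b = 0.24) as given. Variable names are illustrative only.

```python
from math import sqrt

# Chitin calibration data from the homework (blank included as the first point)
conc = [0.00, 0.10, 0.20, 0.30, 0.40]       # µg mL-1
signal = [0.00, 5.20, 9.90, 15.30, 19.10]   # cps
a, b = 48.3, 0.24                           # slope and intercept quoted for these data

n = len(conc)
x_mean = sum(conc) / n

# Residuals y_i - y_hat_i, with y_hat_i from the regression equation itself
residuals = [yi - (a * xi + b) for xi, yi in zip(conc, signal)]

s_yx = sqrt(sum(e ** 2 for e in residuals) / (n - 2))
sxx = sum((xi - x_mean) ** 2 for xi in conc)

s_a = s_yx / sqrt(sxx)
s_b = s_yx * sqrt(sum(xi ** 2 for xi in conc) / (n * sxx))

print(f"S y/x = {s_yx:.3f}; S a = {s_a:.2f}; S b = {s_b:.3f}")
```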

LINEAR REGRESSION AND FIGURES OF MERIT

Estimation of DETECTION LIMIT (LOD) and QUANTIFICATION LIMIT (LOQ)

These limits represent the METHOD DETECTABILITY

"REMEMBER": the instrumental LOD is different from the method LOD

Example: To quantify 0.02 mg kg-1 Pb in a fish sample by GF AAS, the LOQ should be considerably lower than 0.02 mg kg-1 Pb!

- Elemental analysis: the sample is digested
- In the final solution the sample is diluted (e.g. 10x)
- C Pb in the sample solution = 0.02 mg kg-1 / 10 = 0.002 mg kg-1

Instrumental LOQ ≤ 0.002 mg kg-1 Pb, and not 0.02 mg kg-1

Definitions

Limit of detection: the lowest concentration or mass of analyte that can be detected with some confidence. It depends on the magnitude of the analytical signal in relation to the blank fluctuation:

$\mathrm{LOD} = \dfrac{3\,S_{blank}}{m}$

m = angular coefficient (slope) of the analytical curve

LINEAR REGRESSION AND FIGURES OF MERIT

CALCULATION OF DETECTION LIMIT

CONSIDERATION: the LOD is the concentration corresponding to the lowest detectable signal

Lowest detectable signal = Signal_blank + 3 S_blank   (1)
(for a 99% confidence level)

IF the analytical curve is Y = mX + b:
b = Signal_blank   (2)
At the limit concentration, Y = lowest detectable signal and X = LOD   (3)

Replacing (2) and (3) in (1):
Signal_blank + 3 S_blank = m LOD + Signal_blank
LOD = 3 S_blank / m

LINEAR REGRESSION AND FIGURES OF MERIT

THE LIMIT OF QUANTIFICATION

At the LOD level the measurement is significantly affected by instrumental noise (low accuracy). Quantitative measurements should be made at a level above the LOD, generally:

$\mathrm{LOQ} = \dfrac{10\,S_{blank}}{m}$

Can be determined experimentally (as a function of precision)
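A small sketch of the LOD and LOQ calculations, including the dilution step of the Pb example. The blank readings and the slope are hypothetical numbers invented for illustration, and `detection_limits` is an illustrative name.

```python
from statistics import stdev


def detection_limits(blank_readings, slope):
    """Return (LOD, LOQ) as 3 and 10 times S_blank divided by the slope m."""
    s_blank = stdev(blank_readings)
    return 3 * s_blank / slope, 10 * s_blank / slope


# Hypothetical blank signals and calibration slope (signal per concentration unit)
blank = [0.0010, 0.0012, 0.0008, 0.0011, 0.0009]
slope = 0.0016

lod, loq = detection_limits(blank, slope)

# Method LOQ: the instrumental value scaled by the sample dilution (10x in the Pb example)
DILUTION_FACTOR = 10
method_loq = loq * DILUTION_FACTOR

print(f"instrumental LOD = {lod:.3f}, LOQ = {loq:.3f}, method LOQ = {method_loq:.2f}")
```

The results are in the concentration units of the calibration curve; the method LOQ is the one to compare with the level to be quantified in the original sample.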

There are other ways to calculate LOD and LOQ values, depending on the technique and purpose of the method / analysis

Questions? Comments?

Other Uses of Statistics in the Laboratory...

Chemometric methods for design and optimization of experiments, multivariate data analysis and multivariate calibration

“Chemometrics can be defined as the application of mathematical and statistical methods in the planning or optimization of procedures and the obtaining of chemical information through the analysis of relevant results”

Subject for another discipline!!

Recommended Basic Literature for Initial Studies

ARTICLE
Bruns, RE; Faigle, FG. “Quimiometria”, Quím. Nova, 8 (1985) 84

BOOK
Gemperline, PJ. Practical Guide to Chemometrics (2nd Ed), CRC Press Taylor and Francis, New York (2006)

USED REFERENCES

1) D. A. Skoog, D. M. West, F. J. Holler, Stanley R. Crouch, Fundamentos de Química Analítica, 8a Ed., CENGAGE Learning, 2009
2) J. Mendham, R. C. Denney, J. D. Barnes, M. Thomas, Vogel - Análise Química Quantitativa, 6a ed., LTC, 2002
3) D. C. Harris, Análise Química Quantitativa, 7a ed., LTC, 2008
4) B. B. Neto, I. E. Scarminio, R. E. Bruns, Como Fazer Experimentos, Editora da Unicamp, 2001
5) J. N. Miller, J. C. Miller, Statistics and Chemometrics for Analytical Chemistry, 5th Ed, Pearson Education Limited, 2005
