ADVANCED ANALYTICAL CHEMISTRY – 1S 2018
MODULE 2
Statistics Applied to Analytical Chemistry
- Basic Concepts
Class notes: www.ufjf.br/baccan
Prof. Rafael Arromba de Sousa, Departamento de Química (Department of Chemistry) - ICE
Last class:
Basic Concepts and definitions
There is no absolute value for an analytical result
Correct way to express results
Accuracy and precision definition
Importance of rejecting anomalous results
Exercises:
Exerc. 1 Significant figures
Exerc. 2 Data “treatment”
RELATION BETWEEN ACCURACY AND PRECISION
Accuracy and precision are related mainly in three different ways:
[Figure: results of analytical methods A, B and C plotted against analyte concentration, with the true value marked. The three cases shown are: precise and accurate; precise but inaccurate; imprecise and inaccurate.]
Practical examples?
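A minimal Python sketch of how the three situations can be quantified. The replicate values and the true value (10.00) are hypothetical, chosen only for illustration: the bias (mean minus true value) reflects accuracy, and the standard deviation of the replicates reflects precision.

```python
from statistics import mean, stdev


def bias_and_sd(replicates, true_value):
    """Return (bias, standard deviation) for a set of replicate results.

    bias: mean minus true value, an indicator of accuracy
    sd:   sample standard deviation, an indicator of precision
    """
    return mean(replicates) - true_value, stdev(replicates)


TRUE_VALUE = 10.00  # hypothetical true analyte concentration

# Hypothetical replicate sets, one for each situation in the figure
methods = {
    "precise and accurate": [10.01, 9.99, 10.00, 10.02, 9.98],
    "precise but inaccurate": [10.51, 10.49, 10.50, 10.52, 10.48],
    "imprecise and inaccurate": [10.9, 9.7, 11.4, 10.2, 11.3],
}

for label, values in methods.items():
    bias, sd = bias_and_sd(values, TRUE_VALUE)
    print(f"{label:26s} bias = {bias:+.3f}  s = {sd:.3f}")
```

A small bias with a small s corresponds to the precise and accurate case; a large bias with a small s is precise but inaccurate.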
Some formalities... TERMINOLOGIES
Codex Committee on Methods of Analysis and Sampling. Guidelines on Analytical Terminology (CAC/GL72 – 2009)
1) Good precision = good repeatability ≠ reproducibility
TERMINOLOGIES
2) Intermediate Precision
(Similar to what is presented in Rev. 05 – Aug 2016)
STATISTICS IN CHEMICAL ANALYSIS
Comparison of results:
- Precisions (F-test)
- Averages (Student's t-test)
F-test (or Snedecor's test)
$$F = \frac{S_A^2}{S_B^2}$$
IF Fcalculated ≥ Fcritical for the desired confidence level
There is a significant difference between the data
IF Fcalculated < Fcritical for the desired confidence level
There is no significant difference between the data in relation to their precisions…
To compare the precisions (or variances) of two sets of results (A and B), each with its own average
“A” is the set with the larger standard deviation (so that F ≥ 1)
E.g.: critical values of F at the 5% significance level*
* This gives 95% confidence in the test conclusion (that there is no significant difference)
Degrees of freedom: numerator (columns) and denominator (rows)
Denom. \ Num.   3      4      5      6      12     20
3               9.28   9.12   9.01   8.94   8.74   8.64
4               6.59   6.39   6.26   6.16   5.91   5.80
5               5.41   5.19   5.05   4.95   4.68   4.56
6               4.76   4.53   4.39   4.28   4.00   3.87
12              3.49   3.26   3.11   3.00   2.69   2.54
20              3.10   2.87   2.71   2.60   2.28   2.12
Tables for F
EXERCISE
Exerc. 3 - Comment on the difference in precision obtained by laboratories A and B for the determination of Mg in the same milk sample, at a 95% confidence level. Data: Lab. A: 34.97; 34.85; 34.94 and 34.88 mg L-1. Lab. B: 35.02; 34.96; 34.99; 35.07 and 34.85 mg L-1
(Answer: the precisions are close and comparable)
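A short Python sketch of the F-test applied to the Exercise 3 data. The critical value 9.12 is read by hand from the 5% table (4 degrees of freedom in the numerator for Lab B, which has the larger variance, and 3 in the denominator for Lab A); hard-coding it is an assumption made to keep the example free of third-party libraries.

```python
from statistics import variance

lab_a = [34.97, 34.85, 34.94, 34.88]          # Mg, mg L-1 (Exercise 3)
lab_b = [35.02, 34.96, 34.99, 35.07, 34.85]   # Mg, mg L-1 (Exercise 3)


def f_test(x, y):
    """Return (F, df_numerator, df_denominator), larger variance in the numerator."""
    vx, vy = variance(x), variance(y)
    if vx >= vy:
        return vx / vy, len(x) - 1, len(y) - 1
    return vy / vx, len(y) - 1, len(x) - 1


F, df_num, df_den = f_test(lab_a, lab_b)
F_CRITICAL = 9.12  # 5 % table value for df_num = 4, df_den = 3 (read by hand)
print(f"F = {F:.2f} (df {df_num}, {df_den}); significant difference: {F >= F_CRITICAL}")
```

For these data F ≈ 2.26, well below 9.12, which agrees with the answer given for the exercise.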
To evaluate accuracy and precision, it is also necessary to understand which errors affect these parameters!
ERROR TYPES:
- SYSTEMATIC (traceable and can be avoided)
- RANDOM (always present...)
- Method errors: arise from the non-ideal chemical or physical behavior of analytical systems
- Personal errors: result from lack of care, lack of attention or personal limitations of the analyst
- Instrumental errors: caused by non-ideal behavior of an instrument, by faulty calibrations or by the use of inappropriate conditions
These errors affect accuracy.
1) Systematic or Determinate Errors (can be identified and traced)
How to detect a systematic error?
- Certified Reference Materials (CRM)
- Analyte addition and recovery method (spike)
- Comparative methods
- Interlaboratory tests
Systematic or Determinate Errors
CRMs: acquired from specialized suppliers (e.g., NIST*); samples supplied with a certificate of analysis (result ± uncertainty); substances commonly found in samples: ENVIRONMENTAL - CLINICAL - BIOLOGICAL - FORENSIC
(*) National Institute of Standards and Technology
2) Indeterminate (Random) Errors
Cannot be located…
- Affect precision
- Vary according to a normal distribution: measurements fluctuate randomly around the mean
E.g., a normal distribution (calibrating a pipette)
[Figure: histogram showing the distribution of 50 measurements of the volume delivered by a 10.00 mL pipette; x axis: volume (mL), bins from 9.969 to 9.995 mL; y axis: measurements (%); the Gauss curve (distribution profile) is superimposed.]
OBS: This slide was prepared from material by Prof. Célio Pasquini (IQ-Unicamp)
Characteristics of a Normal Distribution
Experimental values fall sometimes above and sometimes below the arithmetic mean, owing to errors that appear to be random; this is expected behavior and therefore "normal"
Karl F. Gauss
$$Y = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(X_i - \mu)^2}{2\sigma^2}\right]$$
Y: probability (density) of occurrence of a result Xi
Gauss's normal distribution
μ is the population mean (the situation of a very large number of measurements) and σ is the population standard deviation
Thus, a range can be calculated for a result R, assuming that the observed deviations follow a normal distribution
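The Gaussian formula can be checked numerically. This sketch, assuming a hypothetical pipette population (μ = 9.982 mL, σ = 0.006 mL, not read from the histogram), writes the formula out directly and compares it with Python's built-in `statistics.NormalDist`; it also computes the fraction of results expected within μ ± kσ.

```python
import math
from statistics import NormalDist


def gauss_pdf(x, mu, sigma):
    """Normal probability density written out term by term."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def fraction_within(k):
    """Fraction of a normal population expected within mu ± k*sigma."""
    std_normal = NormalDist()  # mu = 0, sigma = 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


# Hypothetical pipette population (assumed values, not read from the histogram)
mu, sigma = 9.982, 0.006
print(math.isclose(gauss_pdf(9.990, mu, sigma), NormalDist(mu, sigma).pdf(9.990)))

for k in (1, 2, 3):
    print(f"within ±{k} sigma: {100 * fraction_within(k):.1f} %")
```

The coverages obtained (about 68.3%, 95.4% and 99.7% for k = 1, 2 and 3) are the confidence levels paired with z = 1.00, 2.00 and 3.00 in the table of confidence limits for the mean.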
Normal Distribution and Classical Statistics
In classical statistics, moderate deviations from normality do not seriously affect the statistical results (the methods are robust); this idea rests on the "Central Limit Theorem":
“If the total fluctuation in a certain random variable is the result of the sum of the fluctuations of many independent variables of more or
less equal importance, their distribution will tend to be normal, no matter what the nature of the distributions of the individual
variables”
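The theorem can be illustrated with a small simulation, given here as a sketch: each "total fluctuation" is the sum of 12 independent uniform contributions (a flat, clearly non-normal distribution), yet the totals behave approximately normally. The number of terms, the sample size and the seed are arbitrary choices.

```python
import random
from statistics import mean, stdev


def total_fluctuation(n_terms, rng):
    """Sum of n_terms independent uniform(0, 1) contributions."""
    return sum(rng.random() for _ in range(n_terms))


rng = random.Random(42)  # fixed seed so the illustration is reproducible
totals = [total_fluctuation(12, rng) for _ in range(20_000)]

# Theory for 12 uniform terms: mean = 12 * 0.5 = 6, variance = 12 * (1/12) = 1
m, s = mean(totals), stdev(totals)
within_1s = sum(abs(t - m) <= s for t in totals) / len(totals)
print(f"mean = {m:.3f}, s = {s:.3f}, fraction within ±1 s = {within_1s:.3f}")
```

Roughly 68% of the simulated totals fall within ±1 s of the mean, close to the normal-distribution value, even though each individual contribution is uniform.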
William S. Gosset (1876 – 1937): developed the Student distribution to analyze alcohol content in beer
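[Figure: Gauss normal distribution, relative frequency (0 to 0.4) versus z, with −2σ, −1σ, +1σ and +2σ marked around μ]
$$\mu = \bar{x} \pm \frac{z\,\sigma}{\sqrt{N}}$$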
Confidence limits for the arithmetic mean of a result (Gauss's normal distribution)
Confidence level (%)   z
50                     0.67
68                     1.00
80                     1.28
90                     1.64
95                     1.96
95.4                   2.00
99                     2.58
99.7                   3.00
99.9                   3.29
OBS: This slide was prepared from material by Prof. Célio Pasquini (IQ-Unicamp)
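A sketch of how the z values in the table and the limits μ = x̄ ± zσ/√N can be computed with the Python standard library, assuming σ is known. The example numbers (mean 10.00, σ = 0.02, N = 4) are hypothetical.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided z for a confidence level given as a fraction (e.g. 0.95)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def z_limits(x_bar, sigma, n, confidence=0.95):
    """Confidence limits x_bar ± z*sigma/sqrt(N) for a known population sigma."""
    half_width = z_value(confidence) * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


for level in (0.90, 0.95, 0.99):
    print(f"{100 * level:.0f} %: z = {z_value(level):.2f}")

# Hypothetical example: mean of 4 results = 10.00, sigma known to be 0.02
low, high = z_limits(10.00, 0.02, 4)
print(f"mu between {low:.4f} and {high:.4f}")
```

The loop prints z = 1.64, 1.96 and 2.58, the values listed in the table for 90, 95 and 99%.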
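Confidence limits when σ is unknown (estimated by S):
$$\mu = \bar{x} \pm \frac{t\,S}{\sqrt{N}}$$
instead of the expression used when σ is known:
$$\mu = \bar{x} \pm \frac{z\,\sigma}{\sqrt{N}}$$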
Degrees of freedom   95%     99%
1                    12.71   63.66
2                    4.30    9.93
3                    3.18    5.84
4                    2.78    4.60
5                    2.57    4.03
6                    2.45    3.71
7                    2.36    3.50
8                    2.31    3.36
9                    2.26    3.25
10                   2.23    3.17
...
∞                    1.96    2.58
Example 4 (application of the concept to the expression of a result): An analyst made four determinations of iron in a metal alloy, finding an average value of 31.40 % m/m and a standard deviation estimate of 0.11 % m/m. Within what range does the population mean lie, at a 95% confidence level?
µ = ?  With ν = N − 1 = 3, t = 3.18:
µ = 31.40 ± (3.18 × 0.11) / √4
µ = 31.40 ± 0.17
C Fe = (31.23 – 31.57) % m/m
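Example 4 can be reproduced with a few lines of Python. This sketch hard-codes the 95% t values from the table in these notes (so it only covers N ≤ 11) rather than computing them.

```python
import math

# Two-tailed Student t values at 95 % confidence, copied from the t table
T_95 = {1: 12.71, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57,
        6: 2.45, 7: 2.36, 8: 2.31, 9: 2.26, 10: 2.23}


def t_interval(x_bar, s, n, t_table=T_95):
    """Confidence limits x_bar ± t*S/sqrt(N), using nu = N - 1 degrees of freedom."""
    t = t_table[n - 1]
    half_width = t * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width, half_width


# Example 4: four iron determinations, mean 31.40 % m/m, S = 0.11 % m/m
low, high, half = t_interval(31.40, 0.11, 4)
print(f"mu = 31.40 ± {half:.2f} % m/m  ->  ({low:.2f} - {high:.2f}) % m/m")
```

The half-width is 0.1749, which rounds to the ± 0.17 % m/m of the example, with limits 31.23 and 31.57 % m/m.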
Comparison between averages
- Situation 1: result ± uncertainty vs. reference value
- Situation 2: result ± uncertainty vs. reference value ± uncertainty
Basic data treatment
Example of application – SITUATION 1 (Sampling Experiment)
E.g.: comparison of the M&M's colour distribution with the manufacturer's specification
Based on the quantification of several samples:
the average (x̄) and standard deviation estimate (S) are obtained for each colour, and the manufacturer's average (µ) is the reference. The value of t is calculated and compared with the tabulated value at the desired confidence level:
If t calculated < t tabulated, there is no significant difference between the average and the reference value
$$t = \frac{|\mu - \bar{x}|}{S}\sqrt{N}$$
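A minimal sketch of the comparison of a mean with a reference value. The colour counts and the manufacturer value below are hypothetical (the slide gives no data); the critical value 2.78 is the tabulated 95% value for ν = N − 1 = 4.

```python
import math
from statistics import mean, stdev


def t_vs_reference(values, mu_ref):
    """t = |mu - x_bar| * sqrt(N) / S, for comparing a mean with a reference value."""
    n = len(values)
    return abs(mu_ref - mean(values)) * math.sqrt(n) / stdev(values)


counts = [12, 15, 13, 14, 16]   # hypothetical counts of one colour in 5 samples
MU_REF = 13.0                   # hypothetical manufacturer value
T_CRIT_95_DF4 = 2.78            # two-tailed, 95 %, nu = 4 (t table)

t_calc = t_vs_reference(counts, MU_REF)
print(f"t = {t_calc:.2f}; significant difference: {t_calc >= T_CRIT_95_DF4}")
```

Here t ≈ 1.41 < 2.78, so no significant difference would be declared.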
Another way to use the confidence limits for this same type of comparison:
comparison of a mean with a reference value when the reference value has no associated deviation (uncertainty)
[Figure: confidence interval for the result, with the reference value lying inside it: an example of a situation where the obtained result agrees with the reference value]
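The confidence-limit version of the comparison, sketched with hypothetical counts and hypothetical reference values: the result agrees with the reference value when the reference lies inside x̄ ± tS/√N.

```python
import math
from statistics import mean, stdev


def agrees_with_reference(values, mu_ref, t_crit):
    """True if the reference value lies inside x_bar ± t*S/sqrt(N)."""
    half_width = t_crit * stdev(values) / math.sqrt(len(values))
    return abs(mean(values) - mu_ref) <= half_width


counts = [12, 15, 13, 14, 16]  # hypothetical counts of one colour in 5 samples
T_95_DF4 = 2.78                # tabulated two-tailed t, 95 %, nu = 4
print(agrees_with_reference(counts, 13.0, T_95_DF4))  # hypothetical reference 13.0
print(agrees_with_reference(counts, 16.5, T_95_DF4))  # hypothetical reference 16.5
```

This check is algebraically equivalent to t calculated ≤ t tabulated, so both approaches always reach the same conclusion.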
SITUATION 2: if the precisions are compatible, the averages can be compared (Student's t-test)
$$S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$$
IF tcalculated < tcritical for the desired confidence level there is no significant difference between the averages and
the observed differences are due to random errors
n is the number of measurements for each average; Sp is the pooled ("grouped") S
$$t = \frac{|\bar{x}_1 - \bar{x}_2|}{S_p}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$
ν (t tabulated) = n1 + n2 − 2
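A sketch of the pooled-S t-test. It reuses the Exercise 3 data (Mg in milk, Labs A and B), whose precisions were found comparable by the F-test; applying the t-test to these data is an illustration added here, not part of the exercise. The critical value 2.36 is the tabulated 95% value for ν = 7.

```python
import math
from statistics import mean, variance


def pooled_t(x1, x2):
    """Student t for two means with comparable precisions.

    Returns (t, Sp, nu), with Sp the pooled standard deviation and nu = n1 + n2 - 2.
    """
    n1, n2 = len(x1), len(x2)
    sp = math.sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2))
    t = abs(mean(x1) - mean(x2)) / sp * math.sqrt(n1 * n2 / (n1 + n2))
    return t, sp, n1 + n2 - 2


lab_a = [34.97, 34.85, 34.94, 34.88]          # Exercise 3 data, mg L-1
lab_b = [35.02, 34.96, 34.99, 35.07, 34.85]
T_CRIT_95_DF7 = 2.36                          # two-tailed, 95 %, nu = 7 (t table)

t_calc, sp, nu = pooled_t(lab_a, lab_b)
print(f"t = {t_calc:.2f}, Sp = {sp:.3f}, nu = {nu}; significant: {t_calc >= T_CRIT_95_DF7}")
```

For these data t ≈ 1.41 < 2.36, so there would be no significant difference between the two laboratory means.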
BUT if the precisions are not comparable:
ν (t tabulated) is given by SPECIFIC FORMULAS (to be looked up), for the cases n1 = n2 and n1 ≠ n2
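The slides leave the unequal-precision formulas to be looked up. One widely used option, shown here only as a sketch, is Welch's t-test with the Welch–Satterthwaite degrees of freedom; it uses a single expression for both n1 = n2 and n1 ≠ n2, and whether it matches the formulas intended in the course is an assumption. The Exercise 3 data are reused purely as a numerical example.

```python
import math
from statistics import mean, variance


def welch_t(x1, x2):
    """Welch's t and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    n1, n2 = len(x1), len(x2)
    q1, q2 = variance(x1) / n1, variance(x2) / n2   # S^2 / n for each set
    t = abs(mean(x1) - mean(x2)) / math.sqrt(q1 + q2)
    nu = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, nu


lab_a = [34.97, 34.85, 34.94, 34.88]
lab_b = [35.02, 34.96, 34.99, 35.07, 34.85]
t_calc, nu = welch_t(lab_a, lab_b)
print(f"t = {t_calc:.2f}, nu = {nu:.1f}")
```

For these data ν ≈ 6.9; a common, conservative convention is to round ν down (here to 6) before reading the t table.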
Degrees of freedom   95%     99%
1                    12.71   63.66
2                    4.30    9.93
3                    3.18    5.84
4                    2.78    4.60
5                    2.57    4.03
6                    2.45    3.71
7                    2.37    3.50
8                    2.31    3.36
9                    2.26    3.25
10                   2.23    3.17
...
∞                    1.96    2.58
Critical values of t at the 95% and 99% confidence levels (two-tailed: P = 0.025 and P = 0.005 in each tail)
- Statistical tests are valid when the errors involved are random
- The higher ν (degrees of freedom), the more reliable the test
Other ways to use Student's t-test
Comparison of data in pairs: paired t-test
$$t = \frac{\bar{d}}{S_d}\sqrt{N}$$
Used for different groups of samples (of the same type or from different lots) analyzed by different methods or by different analysts.
Procedure: 1) Arrange the data in pairs; 2) Calculate the difference d for each pair, the mean difference d̄ and the standard deviation of the differences (Sd); 3) Calculate the t value with the formula above.
ν (t tabulated) = N − 1, where ν = degrees of freedom and N = number of pairs
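A sketch of the paired t-test following the three steps of the procedure; the results for the five samples are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(results_1, results_2):
    """Paired t-test: returns (t, nu) with t = |d_bar| * sqrt(N) / S_d and nu = N - 1."""
    if len(results_1) != len(results_2):
        raise ValueError("each sample needs a result from both methods")
    d = [a - b for a, b in zip(results_1, results_2)]  # step 2: difference per pair
    n = len(d)
    return abs(mean(d)) * math.sqrt(n) / stdev(d), n - 1


# Step 1: hypothetical results for five different samples, one pair per sample
method_1 = [10.2, 12.5, 9.8, 11.1, 13.0]
method_2 = [10.1, 12.3, 9.8, 11.0, 12.9]
t_calc, nu = paired_t(method_1, method_2)  # step 3
print(f"t = {t_calc:.2f}, nu = {nu}")
```

Here t ≈ 3.16 > 2.78 (ν = 4, 95%), so the two methods would be judged significantly different for these hypothetical data.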
EXERCISE 4
In your graduate project, will you compare results?
If so, which test will you use? Why?
Statistical tests for comparing results
- Precision - Accuracy
Hypothesis tests
- Null hypothesis (H0): H0: µ = µ0
- Alternative hypothesis (Ha): Ha: µ ≠ µ0 (µ0 is the true or reference value)
Some examples of "error propagation"
It takes into account the uncertainties of each stage of the process.
For indeterminate (random) errors (the most usual situation), for a result “R”:
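R = A + B − C (addition and subtraction):
$$S_R = \sqrt{S_A^2 + S_B^2 + S_C^2}$$
R = AB/C (multiplication and division):
$$\frac{S_R}{R} = \sqrt{\left(\frac{S_A}{A}\right)^2 + \left(\frac{S_B}{B}\right)^2 + \left(\frac{S_C}{C}\right)^2}$$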
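The two propagation rules written as small Python helpers; a sketch, with hypothetical values and standard deviations for A, B and C.

```python
import math


def sd_sum(*sds):
    """S_R for R = A + B - C: absolute standard deviations add in quadrature."""
    return math.sqrt(sum(s ** 2 for s in sds))


def sd_product(r, *pairs):
    """S_R for R = A*B/C: relative standard deviations add in quadrature.

    r     -- the computed result R
    pairs -- (value, standard deviation) for each factor A, B, C, ...
    """
    relative = math.sqrt(sum((s / v) ** 2 for v, s in pairs))
    return abs(r) * relative


# Hypothetical values with their standard deviations
a, sa = 2.50, 0.02
b, sb = 1.20, 0.01
c, sc = 0.80, 0.01

print(f"A + B - C = {a + b - c:.2f} ± {sd_sum(sa, sb, sc):.3f}")
r = a * b / c
print(f"A*B/C = {r:.2f} ± {sd_product(r, (a, sa), (b, sb), (c, sc)):.3f}")
```

Note that the sum rule uses absolute standard deviations, while the product rule uses relative ones (S/value), exactly as in the two formulas.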
Error propagation “Homework”
Exercise 5: Considering that the S of a balance is 0.0001 g, calculate the standard deviation estimate of a weighing made on this balance.
(SR = 0.0001 g)
FOR INSTRUMENTAL METHODS OF ANALYSIS,
STATISTICAL DATA TREATMENT ALSO INCLUDES:
- Linear regression (analytical/calibration curve). Types: univariate (“conventional”) and multivariate (chemometric tool)
- Detection and quantification limits
Calculations based on the estimate of the standard deviation of the blank in order to predict the detectability of the instrument and method used
LINEAR REGRESSION
It is the line that best represents the relationship between the measured property (e.g., Abs) and the concentration of the standards
[Figure: calibration curve, absorbance versus concentration (mg L-1), showing the blank and the standards; Abs = 48.3x + 0.24, r = 0.9987]
- The correlation coefficient (r) varies between −1 and +1; the closer |r| is to 1, the better the correlation
LINEAR REGRESSION
A linear analytical curve is not always possible; a nonlinear regression can be used as long as it shows a "good correlation"
Linear regressions are the most usual and can be obtained with software, which uses the "least squares method":
n = number of points (xi; yi) used in the calibration curve
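For y = ax + b, with correlation coefficient “r”:
$$a = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} \qquad b = \bar{y} - a\bar{x}$$
$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\left\{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]\right\}^{1/2}}$$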
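A sketch of the least-squares formulas in plain Python, applied to the chitin data from the homework later in these notes (the blank is included as the point (0, 0)). With these data it reproduces y = 48.3x + 0.24 and r = 0.9987, and gives C ≈ 0.328 µg mL-1 for the sample signal of 16.10 cps.

```python
import math


def least_squares(x, y):
    """Slope a, intercept b and correlation coefficient r for y = a x + b."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = sy / n - a * sx / n          # b = y_bar - a * x_bar
    r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return a, b, r


conc = [0.00, 0.10, 0.20, 0.30, 0.40]      # chitin standards (blank first), µg mL-1
signal = [0.00, 5.20, 9.90, 15.30, 19.10]  # fluorescence emission, cps
a, b, r = least_squares(conc, signal)
c_sample = (16.10 - b) / a                 # invert y = a x + b for the sample signal
print(f"y = {a:.1f}x + {b:.2f}, r = {r:.4f}, C = {c_sample:.3f} µg mL-1")
```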
LINEAR REGRESSION
The "least squares method" minimizes the sum of the squares of the "residuals" (errors) left by the regression line at each point. The uncertainty of the fitted line is smallest at the center (x̄, ȳ) of the calibration curve. Implications?
LINEAR REGRESSION
“Homework”: Study examples 4.9 and 4.10 in “Vogel – Análise Química Quantitativa”, 6th ed.: To determine chitin by molecular fluorescence, chitin standards were used at the following concentrations: 0.10, 0.20, 0.30 and 0.40 μg mL-1, which gave the following emission values, respectively: 5.20, 9.90, 15.30 and 19.10 cps. The blank gave a reading of 0.00 cps. Considering that a linear correlation coefficient greater than 0.99 is satisfactory, calculate the chitin concentration of a sample whose analytical signal was 16.10 cps.
(y = 48.3x + 0.24; r = 0.9987; C chitin ≈ 0.33 µg mL-1)
LINEAR REGRESSION
In order to obtain a "good calibration" it is necessary that:
a) The only errors in the measurements are due to fluctuations in the dependent variable "y"
b) The points are homoscedastic (random errors with zero mean and constant variance), which can be checked with Cochran's or Levene's test; depending on the result, "normal" (unweighted) or weighted regression is used
c) The errors are not correlated: prepare all standards in the same way and perform the experimental measurements in random order
d) The errors follow a normal distribution (Shapiro–Wilk test)
e) The obtained model does not show lack of fit (ANOVA)
LINEAR REGRESSION AND MATRIX EFFECT
Since the linear regression is a "mean" line, the uncertainties of its slope (angular coefficient) and intercept (linear coefficient) can be calculated. These uncertainties can be used to evaluate the regression itself as well as other parameters.
Eg: Evaluation of matrix effect in the development of an analytical method*:
[Figure: AI (integrated absorbance) versus Pb concentration (µg/L, 0 to 10) for a standard addition curve (alkaline medium) and an analytical curve in acidic medium (usual calibration). Slopes: a (standard addition, alkaline medium) = 1.60 × 10⁻³ ± 0.05 × 10⁻³; a (usual calibration, acidic medium) = 1.78 × 10⁻³ ± 0.03 × 10⁻³]
The slopes (a, in y = ax + b) are compared at the 95% confidence level to determine whether the two media (alkaline and acidic) differ or not!
(*) R. A. de Sousa et al., Talanta 104 (2013) 90–96
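A quick screening sketch for the slope comparison. It assumes, since the slide does not say, that the ± values are 95% confidence half-widths, and it assigns each slope to a medium following the order of the labels on the figure. Non-overlapping intervals indicate a significant difference; overlapping ones would still call for a formal test.

```python
def intervals_overlap(a1, u1, a2, u2):
    """True if [a1 - u1, a1 + u1] and [a2 - u2, a2 + u2] overlap."""
    return a1 - u1 <= a2 + u2 and a2 - u2 <= a1 + u1


# Slopes from the slide (y = a x + b); the ± values are ASSUMED to be
# 95 % confidence half-widths, which the slide does not state explicitly
a_std_add, u_std_add = 1.60e-3, 0.05e-3   # standard addition, alkaline medium
a_usual, u_usual = 1.78e-3, 0.03e-3       # usual calibration, acidic medium

if intervals_overlap(a_std_add, u_std_add, a_usual, u_usual):
    print("Intervals overlap: a formal test on the slopes is still needed")
else:
    print("Intervals do not overlap: the slopes differ between the two media")
```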
Calculation of the uncertainties of “a” and “b”:
$$S_{y/x} = \left[\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}\right]^{1/2}$$
where ŷi is obtained from the regression equation itself
$$S_a = \frac{S_{y/x}}{\left[\sum (x_i - \bar{x})^2\right]^{1/2}}$$
$$S_b = S_{y/x}\left[\frac{\sum x_i^2}{n\sum (x_i - \bar{x})^2}\right]^{1/2}$$
A careful observation of the values of yi − ŷi shows that the regression error is smaller near the center of the calibration curve!
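A sketch of these uncertainty formulas, using the chitin calibration data from the homework purely as an illustration. The values returned are standard deviations; confidence limits would multiply them by t for n − 2 degrees of freedom.

```python
import math


def regression_uncertainties(x, y):
    """Return (a, b, S_y/x, S_a, S_b) for an unweighted fit y = a x + b."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx_c = sum((xi - x_bar) ** 2 for xi in x)           # sum of (xi - x_bar)^2
    a = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx_c
    b = y_bar - a * x_bar
    residuals = [yi - (a * xi + b) for xi, yi in zip(x, y)]   # yi - y_hat_i
    s_yx = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    s_a = s_yx / math.sqrt(sxx_c)
    s_b = s_yx * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx_c))
    return a, b, s_yx, s_a, s_b


conc = [0.00, 0.10, 0.20, 0.30, 0.40]      # chitin standards, µg mL-1
signal = [0.00, 5.20, 9.90, 15.30, 19.10]  # cps
a, b, s_yx, s_a, s_b = regression_uncertainties(conc, signal)
print(f"a = {a:.1f} ± {s_a:.2f}, b = {b:.2f} ± {s_b:.2f}, S_y/x = {s_yx:.3f}")
```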
LINEAR REGRESSION AND FIGURES OF MERIT Estimation of DETECTION LIMIT (LOD) and QUANTIFICATION LIMIT (LOQ)
Represents the METHOD DETECTABILITY
"REMEMBER": the instrumental LOD is different from the method LOD. Example:
To quantify 0.02 mg kg-1 Pb in a fish sample by GF AAS, the LOQ should be considerably less than 0.02 mg kg-1 Pb!
Elemental analysis: the sample is digested, and in the final solution it is diluted (e.g., 10×):
C Pb (sample solution) = 0.02 mg kg-1 / 10 = 0.002 mg kg-1
Instrumental LOQ ≤ 0.002 mg kg-1 Pb, and not 0.02 mg kg-1
Definitions
Limit of detection: the lowest concentration or mass of analyte that can be detected with a given confidence. It depends on the magnitude of the analytical signal relative to the fluctuation of the blank:
$$\text{LOD} = \frac{3S_{blank}}{m}$$
m = slope (angular coefficient) of the analytical curve
CONSIDERATION: the LOD is the concentration corresponding to the lowest detectable signal.
(1) Lowest detectable signal = Signal_blank + 3 S_blank (for a confidence level of about 99%)
IF the analytical curve is Y = mX + b, then:
(2) b = Signal_blank
(3) At the limit concentration, Y = lowest detectable signal and X = LOD
Replacing (2) and (3) in (1):
$$\text{Signal}_{blank} + 3S_{blank} = m\,\text{LOD} + \text{Signal}_{blank} \quad\Rightarrow\quad \text{LOD} = \frac{3S_{blank}}{m}$$
CALCULATION OF DETECTION LIMIT
At the LOD level the measurement is strongly affected by instrumental noise (low accuracy). Quantitative measurements should be made at a level above the LOD, generally:
$$\text{LOQ} = \frac{10S_{blank}}{m}$$
The LOQ can also be determined experimentally (as a function of precision).
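A minimal sketch of both figures of merit. The ten blank readings are hypothetical, and the slope (48.3 cps per µg mL-1) is borrowed from the chitin calibration purely for illustration.

```python
from statistics import stdev


def lod_loq(blank_signals, slope):
    """LOD = 3 S_blank / m and LOQ = 10 S_blank / m (m = calibration slope)."""
    s_blank = stdev(blank_signals)
    return 3 * s_blank / slope, 10 * s_blank / slope


# Hypothetical replicate blank readings (cps)
blanks = [0.21, 0.25, 0.19, 0.23, 0.22, 0.20, 0.24, 0.26, 0.18, 0.22]
SLOPE = 48.3  # cps per µg mL-1, borrowed from the chitin example for illustration

lod, loq = lod_loq(blanks, SLOPE)
print(f"LOD = {lod:.4f} µg mL-1, LOQ = {loq:.4f} µg mL-1")
```

Because both limits use the same S_blank and slope, the LOQ is always 10/3 of the LOD in this approach.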
THE LIMIT OF QUANTIFICATION
There are other ways to calculate LOD and
LOQ values, depending on the technique and purpose of the method / analysis
Questions? Comments?
Other Uses of Statistics in the Laboratory...
Chemometric methods for the design and optimization of experiments, multivariate data analysis and multivariate calibration. “Chemometrics can be defined as the application of mathematical and statistical methods in the planning or optimization of procedures and the obtaining of chemical information through the analysis of relevant results”
Subject for another course!
Recommended Basic Literature for Initial Studies
- ARTICLE: Bruns, R. E.; Faigle, F. G. “Quimiometria”, Quím. Nova, 8 (1985) 84
- BOOK: Gemperline, P. J. Practical Guide to Chemometrics (2nd ed.), CRC Press Taylor and Francis, New York (2006)
REFERENCES USED
1) D. A. Skoog, D. M. West, F. J. Holler, S. R. Crouch, Fundamentos de Química Analítica, 8th ed., CENGAGE Learning, 2009
2) J. Mendham, R. C. Denney, J. D. Barnes, M. Thomas, Vogel - Análise Química Quantitativa, 6th ed., LTC, 2002
3) D. C. Harris, Análise Química Quantitativa, 7th ed., LTC, 2008
4) B. B. Neto, I. E. Scarminio, R. E. Bruns, Como Fazer Experimentos, Editora da Unicamp, 2001
5) J. N. Miller, J. C. Miller, Statistics and Chemometrics for Analytical Chemistry, 5th ed., Pearson Education Limited, 2005