13
1 BASICS OF CHEMOMETRICS Juan Antonio Fernández Pierna Vincent Baeten Pierre Dardenne Walloon Agricultural Research Centre (CRA-W) Valorisation of Agricultural Products Department Gembloux, Belgium 2 INTRODUCTION - Chemometrics Introduction What is this and why we need it - Some definitions - Overview of methods - Examples 2 3 The use of multivariate analysis in the discipline X: - Biometrics (used in biology) - Technometrics (used in engineering) - Psychrometrics (used in phychology) - Chemometrics (used in chemistry) Statistical, mathematical or graphical technique, considers multiple variables simultaneously X - METRICS 4 “Chemometrics is the chemical discipline that uses mathematics and statistics to design or select optimal experimental procedures, to provide maximum relevant chemical information by analyzing chemical data, and to obtain knowledge about chemical systems” D. L. Massart CHEMOMETRICS INTRODUCTION

- Chemometrics Introduction What is this and why …CHEMOMETRICS – APPLICATIONS 10 Linear algebra is the language of Chemometrics. One cannot expect to truly understand most chemometric

  • Upload
    others

  • View
    14

  • Download
    0

Embed Size (px)

Citation preview

Page 1: - Chemometrics Introduction What is this and why …CHEMOMETRICS – APPLICATIONS 10 Linear algebra is the language of Chemometrics. One cannot expect to truly understand most chemometric

1

BASICS OF CHEMOMETRICS

Juan Antonio Fernández Pierna

Vincent Baeten

Pierre Dardenne

Walloon Agricultural Research Centre (CRA-W)

Valorisation of Agricultural Products Department

Gembloux, Belgium

2

INTRODUCTION

- Chemometrics Introduction

What is this and why we need it

- Some definitions

- Overview of methods

- Examples

2

3

The use of multivariate analysis in the discipline X:

- Biometrics (used in biology)

- Technometrics (used in engineering)

- Psychrometrics (used in phychology)

- Chemometrics (used in chemistry)

Statistical, mathematical or graphical

technique, considers multiple variables

simultaneously

X - METRICS

4

“Chemometrics is the chemical discipline that

uses mathematics and statistics to design or

select optimal experimental procedures, to

provide maximum relevant chemical

information by analyzing chemical data, and to

obtain knowledge about chemical systems”

D. L. Massart

CHEMOMETRICS – INTRODUCTION

Page 2: - Chemometrics Introduction What is this and why …CHEMOMETRICS – APPLICATIONS 10 Linear algebra is the language of Chemometrics. One cannot expect to truly understand most chemometric

5

Statistics

Food Feed

Organicchemistry

Theoreticaland physical

chemistry

Analyticalchemistry

Organicchemistry

Theoreticaland physical

chemistry

chemistry

Computing

Engineering

BiologyIndustrialMathematics

Chemometrics

Meeting point of various disciplines

CHEMOMETRICS – INTRODUCTION

6

The scientific world today

CHEMOMETRICS – INTRODUCTION

• The data flood generated by modern analytical

instrumentation produces large quantity of numbers

to understand and quantify phenomenons around us.

• A deeper understanding of those methods and tools

for viewing all data simultaneously are needed.

• The evolution of personal computers allows faster

acquisition, processing and interpretation of chemical

data.

• Every scientist uses software related to mathematical

methods or to processing of knowledge.

7

Use of mathematical and statistical methods for

selecting optimal experiments

Statistical experimental design

Design of Experiments (DoE)…

Use of mathematical and statistical methods for

selecting optimal experiments

Statistical experimental design

Design of Experiments (DoE)…

Extracting maximum amount of information when

analysing multivariate (chemical) data

Classification

Process monitoring,

Multivariate calibration…

Extracting maximum amount of information whenwhen

analysing multivariate (chemical) data

Classification

Process monitoring,

Multivariate calibration…

CHEMOMETRICS – INTRODUCTION

8

CHEMOMETRIC

S Analytical

request

Analytical

method

Analytical

answer

CHEMOMETRICS

Useful at any point in an analysis, from the first

conception of an experiment until the data is

discarded.

CHEMOMETRICS – INTRODUCTION

Page 3: - Chemometrics Introduction What is this and why …CHEMOMETRICS – APPLICATIONS 10 Linear algebra is the language of Chemometrics. One cannot expect to truly understand most chemometric

9

Huge growth area in past 15 years

• Process Control and analysis

• Food and feed analysis

• Biology – metabolomics etc

• Environmental monitoring

• Analytical Chemistry

CHEMOMETRICS – APPLICATIONS

10

Linear algebra is the language of Chemometrics. One cannot expect

to truly understand most chemometric techniques without a basic

understanding of linear algebra (Wise and Gallagher, 1998)

Matrix and vector operations

CHEMOMETRICS – DEFINITIONS

11

- Samples are referred to as OBJECTS

CHEMOMETRICS:

Extract meaningful information about the objects and the

variables from data matrices

- Measurement results (e.g. concentration, absorbances, …) are referred

to as VARIABLES

X

K variables

M objects - observations

-A data table of K variables and M objects is referred to as a DATA

MATRIX OF SIZE MxK

CHEMOMETRICS – DEFINITIONS

12

úúúúú

û

ù

êêêêê

ë

é

=

np2n1n

p22221

p11211

x...xx

............

x...xx

x...xx

X

Variables

Sam

ple

s

If X has 3 rows and 5 columns

3 x 5 data matrix

1200 1400 1600 1800 2000 2200 2400-0.1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

Wavelength

Absorb

ance

1200 1400 1600 1800 2000 2200 2400-0.1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Wavelength

Absorb

ance

1200 1400 1600 1800 2000 2200 2400-0.1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Wavelength

Absorb

ance

CHEMOMETRICS – DEFINITIONS

Page 4: - Chemometrics Introduction What is this and why …CHEMOMETRICS – APPLICATIONS 10 Linear algebra is the language of Chemometrics. One cannot expect to truly understand most chemometric

13

úúúúú

û

ù

êêêêê

ë

é

npnn

p

p

xxx

xxx

xxx

...

............

...

...

21

22221

11211

In summary…

X

K variables

M objects

=

CHEMOMETRICS – INTRODUCTION

14

Ø Sampling, selection of objects and variables

Ø Clustering

Ø Multivariate regressions, calibrations and predictions

Ø Neural Networks

Ø Validation

Ø Graphical display and outlier detection

Source: Chemometrics – Introduction - Jens C. Frisvad

CHEMOMETRICS – IMPORTANT DISCIPLINES

15

Selection optimal complexity

Model building

Supervised analysis

Regression models

Model construction

Calibration

Calibration data set (X) Sample Selection

Visualization

Unsupervised analysis

Principal Component Analysis

(PCA)

outlier detection

Data matrix pre-treatment

clustering tendency

Data Exploration

Pattern Recognition

Outliers Prediction

New data

Uncertainty estimation

Validation

16

Population sources of variation:

- origins of samples

- processes

- varieties

- storage conditions

- sample preparations (t°, particle size)

- residual moisture

INCLUDE IN THE CALIBRATION SET ALL THE FACTORS OF VARIATION:

Measurement factors:

- room temperature

- operator

- instrument setup

SAMPLE SELECTION

Page 5: - Chemometrics Introduction What is this and why …CHEMOMETRICS – APPLICATIONS 10 Linear algebra is the language of Chemometrics. One cannot expect to truly understand most chemometric

17

Grouping of objects e.g. how similar is the behaviour of compounds,

how similar are products…

Looking at relationships:

Between samples: food samples / patients / people / spectra / …

Between variables: elemental compositions / compound

concentrations / spectral peaks / …

PATTERN RECOGNITION

PCA

Principal Component Analysis

18

0.20.25

0.30.35

0.4

0.4

0.5

0.6

0.7

0.8-5

-4

-3

-2

-1

x 10-3

PCA is a chemometric technique for visualizing high

dimensional data. PCA reduces the dimensionality (the

number of variables) of a data set by maintaining as much

variance as possible.

PC1

PC2

-3 -2 -1 0 1 2 3 4-1.2

-1

-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

PC 1 (96.09%)

PC

2 (

3.4

2%

)

PC1 PC2

PRINCIPAL COMPONENT ANALYSIS

19

-3 -2 -1 0 1 2 3 4-0.25

-0.2

-0.15

-0.1

-0.05

0

0.05

0.1

0.15

0.2

Scores on PC 1 (96.09%)

Score

s o

n P

C 3

(0.3

4%

)

Samples/Scores Plot of Xoffset

Object space

-3 -2 -1 0 1 2 3 4-0.25

-0.2

-0.15

-0.1

-0.05

0

0.05

0.1

0.15

0.2

Scores on PC 1 (96.09%)

Score

s o

n P

C 3

(0.3

4%

)

Samples/Scores Plot of Xoffset

Clusters

Outliers

-3 -2 -1 0 1 2 3 4-0.25

-0.2

-0.15

-0.1

-0.05

0

0.05

0.1

0.15

0.2

Scores on PC 1 (96.09%)

Score

s o

n P

C 3

(0.3

4%

)

Samples/Scores Plot of Xoffset

PRINCIPAL COMPONENT ANALYSIS

20

Quantitative or qualitative estimation. Especially mixtures.

Univariate calibration : one measurement e.g. a peak hight

Multivariate calibration : several measurements e.g. spectra

Statistical, mathematical or graphical technique, considers

multiple variables simultaneously

CALIBRATION

Page 6: - Chemometrics Introduction What is this and why …CHEMOMETRICS – APPLICATIONS 10 Linear algebra is the language of Chemometrics. One cannot expect to truly understand most chemometric

21

Linking two sets of data together : peak hight to

concentration / Spectra to concentrations / biological

activity to structure / …

MULTIVARIATE CALIBRATION

• Relate instrumental response to chemical concentration.

• Improper calibration yields improper results.

• Linear, non-linear and multivariate calibration algorithms

are available, depending on the instrument and the analysis

involved.

22

X – Data

spectra

Y – Data

Reference values

Calibration

model +

Calibration

- Infrared spectroscopy (IR-Raman)

- Fourier-Transform Infrared spectroscopy (FTIR)

- Nuclear Magnetic Resonance (NMR)

- X-Ray Fluorescence (XRF)

- X-Ray Diffraction (XRD) spectroscopy

- …

- Moisture (regression)

- Concentration of protein (regression)

- country of origin (discrimination)

- PDO (discrimination)

- …

MULTIVARIATE CALIBRATION

23

X – Data

spectra

Y – Data

Ref value

Calibration

model +

X – Data

spectra

Calibration

model

Y – Data

// ref value. +

Calibration

Validation

Calibration uses empirical data and prior knowledge for

determining how to predict unknown quantitative

information y from available measurements x, via some

mathematical transfer functions

MULTIVARIATE CALIBRATION - VALIDATION

24

Y is a matrix nxm containing the reference values

X is a matrix nxp containing the spectra (NIR, MIR,

Raman,…)

B is a matrix pxm containing the regression coeficients

E is a matrix nxm explaining for the model error.

Y=f(X)=XB+E

MULTIVARIATE CALIBRATION

Page 7: - Chemometrics Introduction What is this and why …CHEMOMETRICS – APPLICATIONS 10 Linear algebra is the language of Chemometrics. One cannot expect to truly understand most chemometric

25

Multivariate analysis

Unsupervised

Principal Component

Analysis (PCA)

Cluster Analysis

(CA)

Supervised

Regression

Multiple Linear Regression (MLR)

Principal Component Regression (PCR)

Partial Least Squares (PLS)

Artificial Neural Networks (ANN)

LS–Support Vector Machines (LS-SVM)

Local techniques

Discrimination

PLS-Discriminant Analysis (PLS-DA)

SIMCA

2525

K-Nearest Neighbours (k-NN)

25

Support Vector Machines (SVM)Algorithms 26

Multivariate analysis

Unsupervised

Principal Component

Analysis (PCA)

Cluster Analysis

(CA)

Supervised

Regression

Multiple Linear Regression (MLR)

Principal Component Regression (PCR)

Partial Least Squares (PLS)

Artificial Neural Networks (ANN)

LS–Support Vector Machines (LS-SVM)

Local techniques

Discrimination

PLS-Discriminant Analysis (PLS-DA)

SIMCA

K-Nearest Neighbours (k-NN)

Support Vector Machines (SVM)

Multiple Linear Regression (MLR)

Principal Component Regression (PCR)

Partial Least Squares (PLS)

Artificial Neural Networks (ANN)

LS–Support Vector Machines (LS-SVM)

Local techniques

PLS-Discriminant Analysis (PLS-DA)

SIMCA

K-Nearest Neighbours (k-NN)

Support Vector Machines (SVM)

Supervised

Regression

Discrimination

Unsupervised - Pattern recognition

à Seeks similarities and regularities present in the data.

27

Multivariate analysis

Unsupervised

Principal Component

Analysis (PCA)

Cluster Analysis

(CA)

Supervised

Regression

Multiple Linear Regression (MLR)

Principal Component Regression (PCR)

Partial Least Squares (PLS)

Artificial Neural Networks (ANN)

LS–Support Vector Machines (LS-SVM)

Local techniques

Discrimination

PLS-Discriminant Analysis (PLS-DA)

SIMCA

K-Nearest Neighbours (k-NN)

Support Vector Machines (SVM)

PLS-Discriminant Analysis (PLS-DA)

SIMCA

K-Nearest Neighbours (k-NN)

Support Vector Machines (SVM)

Discrimination

Unsupervised

Principal Component

Analysis (PCA)

Cluster Analysis

(CA)

Supervised - Regression

à It includes any techniques for modeling and analyzing several

variables, when the focus is on the relationship between a dependent

variable (property of interest) and one or more independent

variables (spectra).

28

Multivariate analysis

Unsupervised

Principal Component

Analysis (PCA)

Cluster Analysis

(CA)

Supervised

Regression

Multiple Linear Regression (MLR)

Principal Component Regression (PCR)

Partial Least Squares (PLS)

Artificial Neural Networks (ANN)

LS–Support Vector Machines (LS-SVM)

Local techniques

Discrimination

PLS-Discriminant Analysis (PLS-DA)

SIMCA

K-Nearest Neighbours (k-NN)

Support Vector Machines (SVM)

Multiple Linear Regression (MLR)

Principal Component Regression (PCR)

Partial Least Squares (PLS)

Artificial Neural Networks (ANN)

LS–Support Vector Machines (LS-SVM)

Local techniques

(CA)

Regression

Unsupervised

Principal Component

Analysis (PCA)

Cluster Analysis

(CA)

Supervised - Discrimination

à Regression based statistical technique used in determining which

particular classification or group an item of data or an object belongs to

on the basis of its characteristics or essential features.

Page 8: - Chemometrics Introduction What is this and why …CHEMOMETRICS – APPLICATIONS 10 Linear algebra is the language of Chemometrics. One cannot expect to truly understand most chemometric

29

Find a set of predictor variables which gives a good fit,

predicts the dependent value well and is as small as

possible.

Ø Backward elimination: Start with all the predictors and

potentially drop predictors in subsequent steps.

Ø Forward selection: Start with no predictors and potentially

add predictors in subsequent steps.

Ø Stepwise regression: Combination of backward

elimination and forward selection.

VARIABLE SELECTION

30

VARIABLE SELECTION

31

Reducing number of variables

Possibility of constructing fast spectrometers based on a reduced

number of variables aiming to a reduction of training and

utilization times to be used, for instance, in a conveyer belt.

Moreover, the fact of grouping the different selected variables in

regions allows an easy chemical interpretation of the spectra.

References

- ‘Backward variable selection in PLS (BVSPLS)’; J.A. Fernández

Pierna, V. Baeten, P. Dardenne. Analytica Chimica Acta 642 (2009)

pp. 89-93.

- ‘Prediction error improvements using variable selection on small

calibration sets - a comparison of some recent methods’. S.I.

Overgaard, J.A. Fernández Pierna, V. Baeten, P. Dardenne & T.

Isaksson. J. Near Infrared Spectrosc. 20, 329-337 (2012).

VARIABLE SELECTION

32

CHEMOMETRICS – SOFTWARE

Page 9: - Chemometrics Introduction What is this and why …CHEMOMETRICS – APPLICATIONS 10 Linear algebra is the language of Chemometrics. One cannot expect to truly understand most chemometric

33

Vegetal

Fish

Poultry Bovine

+ Pig

Terrestrial animal

Other

Other

CAL LOOCV

Belonging

to...

% classified as... % classified as...

Fish Rest Fish Rest

Fish 98.6 1.4 96.8 3.2

Rest 9.5 90.5 10 90

FARIMAL

93.4 % correct classification

EXAMPLE: FEED

34

EXAMPLE: FEED

35

SVM discrimination models

‘NIR hyperspectral imaging spectroscopy and chemometrics for the detection of undesirable substances in food and

feed’ J.A. Fernández Pierna, Ph. Vermeulen, O. Amand, A. Tossens, P. Dardenne and V. Baeten. Special issue

Chemometrics and Intelligent Laboratory Systems 117 (2012) 233-239

EXAMPLE: CEREALS - IMPURITIES

36

§ Walnut

§ Hazelnut

§ Brazil nut

§ Almond

§ Cashew

§ Raisin

EXAMPLE: NUTS AND DRIED FRUITS

Page 10: - Chemometrics Introduction What is this and why …CHEMOMETRICS – APPLICATIONS 10 Linear algebra is the language of Chemometrics. One cannot expect to truly understand most chemometric

37

KNN - PCA models

‘Using a visible vision system for on-line determination of quality parameters of olive fruits’. E. Guzmán,

V. Baeten, J.A. Fernández Pierna, J.A. García Mesa. Foof and Nutrition Sciences, 4, 90-98 (2013).

EXAMPLE: OLIVES - IMPURITIES

38

‘Application of low-resolution Raman spectroscopy for the analysis of oxidized olive oil’ E. Guzmán,

V. Baeten, J.A. Fernández Pierna, J.A. García Mesa. (2011) Food control 22, 2036-2040.

EXAMPLE: OLIVE OIL – QUALITY PARAMETERS

39

Detection of the Presence of Hazelnut Oil in Olive Oil by FT-Raman and FT-MIR Spectroscopy' V. Baeten,

J.A. Fernández Pierna, P. Dardenne, M. Meurens, D.L. García González, R. Aparicio. Journal of

Agricultural and Food Chemistry, 53 (16) (2005), 6201-6206.

EXAMPLE: OLIVE OIL – ADULTERATION

40

optical microscopy NIR hyperspectral imaging

SVM discrimination models

‘NIR hyperspectral imaging spectroscopy and chemometrics for the detection of undesirable substances in food and

feed’ J.A. Fernández Pierna, Ph. Vermeulen, O. Amand, A. Tossens, P. Dardenne and V. Baeten. Special issue

Chemometrics and Intelligent Laboratory Systems 117 (2012) 233-239

EXAMPLE: SUGAR BEET PLANTS – CYST

Page 11: - Chemometrics Introduction What is this and why …CHEMOMETRICS – APPLICATIONS 10 Linear algebra is the language of Chemometrics. One cannot expect to truly understand most chemometric

41

EXAMPLE: SUGAR BEET PLANTS – CERCOSPORA

42

Multivariate regression method comparison:

PLS, ANN and LS-SVM

EXAMPLE: FEED PRODUCTS

43

Feed -------------------------------------------- (28676x700)

Ash, Fat, Fibre, Starch, Protein

Feed Ingredients --------------------------- (26652x700)

Ash, Fat, Fibre, Protein

Fresh Silages ------------------------------- (1035x700)

Dry Matter, Fibre, Protein

Soils ------------------------------------------- (1625x700)

CEC, COT_SK, N_Kj

Pre-processing: SNV + detrend + First derivative

EXAMPLE: FEED PRODUCTS

44

Example of data distribution (Soils N_Kj)

EXAMPLE: FEED PRODUCTS

Page 12: - Chemometrics Introduction What is this and why …CHEMOMETRICS – APPLICATIONS 10 Linear algebra is the language of Chemometrics. One cannot expect to truly understand most chemometric

45

Feed - Fat

0

0.2

0.4

0.6

0.8

1

pls ann svm

RM

S

train

val

ks

rd

Feed - fibre

0

0.2

0.4

0.6

0.8

1

1.2

1.4

pls ann svm

RM

S

train

val

ks

rd

Feed - ash

0

0.5

1

1.5

2

2.5

pls ann svm

RM

S

train

val

ks

rd

Feed - starch

0

0.5

1

1.5

2

2.5

3

3.5

pls ann svm

RM

S

train

val

ks

rd

Feed - protein

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

pls ann svm

RM

S

train

val

ks

rd

EXAMPLE: FEED PRODUCTS

46

Feed ingredients - ash

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

pls ann svm

RM

S

train

val

ks

rd

Feed ingredients - fat

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

pls ann svm

RM

S

train

val

ks

rd

Feed ingredients - fibre

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

pls ann svm

train

val

ks

rd

Feed ingredients - protein

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

pls ann svm

RM

S

train

val

ks

rd

EXAMPLE: FEED PRODUCTS

47

Fresh silages - fibre

0

0.5

1

1.5

2

2.5

pls ann svm

RM

S

train

val

ks

rd

Fresh silages - protein

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

pls ann svm

RM

S

train

val

ks

rd

Fresh silages - dry matter

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

pls ann svm

RM

S

train

val

ks

rd

EXAMPLE: FEED PRODUCTS

48

0 10 20 30 40 50 60 700

10

20

30

40

50

60

0 10 20 30 40 50 60 700

10

20

30

40

50

60

CALIBRATION TRANSFER FROM DISPERSIVE

INSTRUMENTS TO HANDHELD SPECTROMETERS (MEMS)

‘Calibration Transfer from Dispersive Instruments to Handheld Spectrometers’, J.A. Fernández

Pierna, P. Vermeulen, B. Lecler, V. Baeten, P. Dardenne. Applied Spectroscopy 64 (6) (2010)

EXAMPLE: TRANSFERT

Page 13: - Chemometrics Introduction What is this and why …CHEMOMETRICS – APPLICATIONS 10 Linear algebra is the language of Chemometrics. One cannot expect to truly understand most chemometric

49

EXAMPLE: TRANSFERT

50

http://www.cra.wallonie.be/

j.fernandez@ cra.wallonie.be

QUESTIONS ABOUT CHEMOMETRICS?

51