Grey modeling approaches to investigate chemical processes
Romà Tauler¹ and Anna de Juan²
¹IIQAB-CSIC, ²UB, Spain
E-mail: [email protected]
Outline
• Introduction to chemical modeling: white (hard), black (soft) and grey modeling in chemistry
• Multivariate Curve Resolution as a grey modeling method
• Grey modeling applications using MCR-ALS
Modeling approaches
• Hard modeling (white modeling): models based on physical/chemical laws.
• Soft modeling (black modeling):
– Empirical models with no knowledge of, or assumptions about, the physical/chemical laws of the system (usually non-linear).
– Models with no assumptions about the physical/chemical model but with assumptions about the measurement model (usually multivariate and linear).
• Soft + hard modeling? Grey modeling? Mixed models that partially use information about the physical/chemical laws.
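Chemical multicomponent systems: structure
• Chemical model (variation of compound contribution):
– Mixture: non-existent.
– Process: known, too complex or unknown.
• Measurement model (variation of the instrumental signal): simple additive linear model (factor analysis tools).
$$\mathbf{D} = \mathbf{D}_1 + \mathbf{D}_2 + \dots + \mathbf{D}_n = \mathbf{c}_1\mathbf{s}_1^{\mathrm{T}} + \mathbf{c}_2\mathbf{s}_2^{\mathrm{T}} + \dots + \mathbf{c}_n\mathbf{s}_n^{\mathrm{T}} = \mathbf{C}\mathbf{S}^{\mathrm{T}}$$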
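As a quick numerical check of the additive measurement model, the sketch below builds D from made-up profiles in two ways: as the sum of rank-one contributions c_n s_nᵀ and as the product C Sᵀ. The sizes and random values are arbitrary assumptions, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, n_comp = 30, 50, 3   # e.g. 30 spectra x 50 wavelengths, 3 compounds (arbitrary)

C = rng.random((n_rows, n_comp))     # concentration profiles c_1 ... c_n as columns
S = rng.random((n_cols, n_comp))     # pure spectra s_1 ... s_n as columns

# D = D_1 + D_2 + ... + D_n, each D_n = c_n s_n^T being a rank-one contribution
contributions = [np.outer(C[:, n], S[:, n]) for n in range(n_comp)]
D_sum = sum(contributions)

# The same data matrix written compactly as D = C S^T
D = C @ S.T

print(np.allclose(D_sum, D))         # True
print(np.linalg.matrix_rank(D))      # 3, one per compound
```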
Hard (White) Modeling
• Data modeling and data fitting in the chemical sciences have traditionally been done by hard modeling techniques.
• They are based on physical/chemical models that are already known (or assumed, proposed, ...).
• The parameters of these models are not known, and they are estimated by least-squares curve fitting.
• This approach may also be called white modeling. It is valid for well-known phenomena and laboratory data, where the variables of the model are under control during the experiments and only the phenomena under study affect the data.
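$$\mathbf{Y} = \hat{\mathbf{Y}}(\text{model}, \mathbf{p}) + \mathbf{E}, \qquad r_{ij} = y_{ij} - \hat{y}_{ij}$$
$$ssq = \sum_{i=1}^{I}\sum_{j=1}^{J} r_{ij}^{2} = f(\mathbf{Y}, \text{model}, \mathbf{p})$$
Find the optimal parameters $\mathbf{p}$ of the model ($\mathbf{p}$: vector of model parameters): $\partial\, ssq / \partial \mathbf{p} = 0$.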
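Case 1. Kinetic systems: $Y_{ij} = A_{ij}$, the measured absorbance of sample/solution $i$ at wavelength $j$.
• Measurement model assumptions: $\hat{A}_{ij} = \sum_{k=1}^{K} C_{ik}\,a_{kj}$ ($C_{ik}$: concentration of species $k$ in sample $i$; $a_{kj}$: absorptivity of species $k$ at wavelength $j$).
• Chemical model assumptions: rate laws, e.g. $[\mathrm{X}] = [\mathrm{X}]_0\,e^{-kt}$ for a first-order decay, and mass balance $T_l = \sum_{k} q_{lk}\,c_k$ ($T_l$: total concentration of component $l$; $q_{lk}$: stoichiometric coefficient of component $l$ in species $k$).
• Defining the residuals: $r_{ij} = A_{ij} - \hat{A}_{ij}$.
• Finding the best model and its parameters (rate constants $\mathbf{k}$): $\partial\, ssq / \partial \mathbf{k} = 0$.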
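In the kinetic case only the rate constants are non-linear parameters: for a trial k, the concentrations follow from the rate law and the pure spectra follow by linear least squares, so ssq becomes a function of k alone. A minimal sketch under stated assumptions: a single first-order reaction A → B, noise-free simulated data, Gaussian-shaped spectra and a true k = 0.5 chosen only for illustration; the function names are hypothetical.

```python
import numpy as np

def conc_first_order(k, t, a0=1.0):
    """Chemical model for A -> B (first order): [A] = a0 exp(-k t), [B] = a0 - [A]."""
    a = a0 * np.exp(-k * t)
    return np.column_stack([a, a0 - a])

def ssq_of_k(k, t, Y):
    """Residual sum of squares for a trial rate constant k.

    C comes from the chemical model; the pure spectra enter linearly and are
    obtained by least squares (Y ~ C @ spectra), so only k is non-linear.
    """
    C = conc_first_order(k, t)
    spectra, *_ = np.linalg.lstsq(C, Y, rcond=None)
    R = Y - C @ spectra                  # r_ij = A_ij - Ahat_ij
    return float(np.sum(R ** 2))

# Noise-free simulated data: true k = 0.5, two Gaussian-shaped spectra (assumed)
t = np.linspace(0, 10, 40)
wl = np.linspace(0, 1, 60)
true_spectra = np.vstack([np.exp(-((wl - 0.3) / 0.1) ** 2),
                          np.exp(-((wl - 0.6) / 0.1) ** 2)])
Y = conc_first_order(0.5, t) @ true_spectra

# Scan ssq over a grid of trial rate constants
ks = np.linspace(0.1, 1.0, 91)
ssqs = np.array([ssq_of_k(k, t, Y) for k in ks])
k_best = ks[np.argmin(ssqs)]
print(round(float(k_best), 3))           # 0.5
```

A grid scan is used here only to make ssq(k) visible; in practice the minimum is located with an iterative optimiser such as the NGL/M algorithm.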
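Case 2. Solution equilibria: $Y_{ij} = A_{ij}$, the measured absorbance of sample/solution $i$ at wavelength $j$.
• Measurement model assumptions: $\hat{A}_{ij} = \sum_{k=1}^{K} C_{ik}\,a_{kj}$.
• Chemical model assumptions: mass action law $c_k = \beta_k \prod_{l} c_l^{\,q_{lk}}$ and mass balance $T_l = \sum_{k} q_{lk}\,c_k$ ($c_l$: free concentration of component $l$; $T_l$: its total concentration; $q_{lk}$: stoichiometric coefficients).
• Defining the residuals: $r_{ij} = A_{ij} - \hat{A}_{ij}$, $ssq = \sum_{i}\sum_{j} r_{ij}^{2}$.
• Finding the best model and its parameters (formation constants $\boldsymbol{\beta}$): $\partial\, ssq / \partial \boldsymbol{\beta} = 0$.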
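For an equilibrium system the concentration profiles follow from the mass action law and the mass balance. A minimal sketch for a diprotic acid H2A over a pH range, using the closed-form species fractions; in terms of the formation constants above, β1 = 1/Ka2 for HA⁻ and β2 = 1/(Ka1·Ka2) for H2A. The pKa values and total concentration are hypothetical, chosen only for illustration.

```python
import numpy as np

def diprotic_profiles(pH, pKa1, pKa2, c_total=1.0):
    """Species concentrations [H2A], [HA-], [A2-] of a diprotic acid vs pH.

    Mass action law: [HA-] = Ka1 [H2A] / [H+],  [A2-] = Ka2 [HA-] / [H+]
    Mass balance:    [H2A] + [HA-] + [A2-] = c_total
    """
    h = 10.0 ** (-np.asarray(pH, dtype=float))
    ka1, ka2 = 10.0 ** -pKa1, 10.0 ** -pKa2
    denom = h ** 2 + ka1 * h + ka1 * ka2
    numer = np.column_stack([h ** 2, ka1 * h, ka1 * ka2 * np.ones_like(h)])
    return c_total * numer / denom[:, None]

pH = np.linspace(1, 11, 101)
C = diprotic_profiles(pH, pKa1=3.5, pKa2=5.0, c_total=0.1)  # hypothetical pKa values
print(np.allclose(C.sum(axis=1), 0.1))   # mass balance holds: True
```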
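The Newton-Gauss-Levenberg/Marquardt (NGL/M) algorithm
1. Set the Marquardt parameter mp = 0 and guess the parameters k0.
2. Calculate the residuals r(k0) and the sum of squares ssq.
3. Compare ssq with the previous value ssq_old:
– if ssq < ssq_old, set mp = mp/3;
– if ssq > ssq_old, set mp = mp × 5;
– if ssq = ssq_old (convergence): if mp ≠ 0, set mp = 0 and continue; if mp = 0, end and display the results.
4. Calculate the Jacobian J and the shift vector Δk, update k0 = k0 + Δk, and return to step 2.
$$\Delta\mathbf{k} = -\left(\mathbf{J}^{\mathrm{T}}\mathbf{J} + mp\,\mathbf{I}\right)^{-1}\mathbf{J}^{\mathrm{T}}\,\mathbf{r}(\mathbf{k}_0)$$
Standard errors of the parameters:
$$\sigma_{k_i} = \sigma_y\sqrt{d_{ii}}, \qquad \sigma_y = \sqrt{\frac{ssq}{n_t n_\lambda - (n_k + n_c n_\lambda)}}$$
where $d_{ii}$ is the $i$-th diagonal element of $(\mathbf{J}^{\mathrm{T}}\mathbf{J})^{-1}$, $n_t$ the number of spectra (times), $n_\lambda$ the number of wavelengths, $n_k$ the number of non-linear parameters and $n_c$ the number of coloured species.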
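A compact sketch of the NGL/M loop above, under several assumptions: the Jacobian is obtained by forward differences, mp is started at 1 the first time a step fails, and, to keep the example short, the residuals are taken on the concentration profiles of the consecutive scheme A → B → C rather than on absorbances. The names `nglm` and `jacobian`, the true rate constants and the starting guess are hypothetical.

```python
import numpy as np

def consecutive_abc(k, t):
    """Hard model: [A], [B], [C] for A -> B -> C with [A]0 = 1 (requires k1 != k2)."""
    k1, k2 = k
    a = np.exp(-k1 * t)
    b = k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))
    return np.column_stack([a, b, 1.0 - a - b])

def jacobian(residual_fn, k, h=1e-7):
    """Forward-difference J = dr/dk (analytical derivatives could be used instead)."""
    r0 = residual_fn(k)
    J = np.empty((r0.size, k.size))
    for p in range(k.size):
        step = np.zeros_like(k)
        step[p] = h * max(abs(k[p]), 1.0)
        J[:, p] = (residual_fn(k + step) - r0) / step[p]
    return J

def nglm(residual_fn, k0, max_iter=200, tol=1e-10):
    """Newton-Gauss-Levenberg/Marquardt minimisation of ssq = sum(r(k)**2)."""
    k = np.asarray(k0, dtype=float)
    r = residual_fn(k)
    ssq = float(r @ r)
    mp = 0.0                                   # Marquardt parameter
    for _ in range(max_iter):
        J = jacobian(residual_fn, k)
        # Shift vector: delta_k = -(J^T J + mp I)^-1 J^T r(k)
        delta = -np.linalg.solve(J.T @ J + mp * np.eye(k.size), J.T @ r)
        r_new = residual_fn(k + delta)
        ssq_new = float(r_new @ r_new)
        if ssq_new < ssq:                      # improvement: accept step, mp / 3
            rel_change = (ssq - ssq_new) / ssq
            k, r, ssq = k + delta, r_new, ssq_new
            mp /= 3
            if rel_change < tol:
                break
        else:                                  # worse (or NaN): reject step, mp x 5
            mp = 1.0 if mp == 0 else 5 * mp    # damping starts at 1 (assumption)
            if mp > 1e12:                      # no further progress possible
                break
    return k, ssq

t = np.linspace(0, 15, 50)
Y = consecutive_abc(np.array([0.8, 0.3]), t)   # noise-free "measured" profiles

def residuals(k):
    return (Y - consecutive_abc(k, t)).ravel()

k_fit, ssq_fit = nglm(residuals, [0.6, 0.2])
print(k_fit)                                   # should be close to [0.8, 0.3]
```

Rejected steps leave the parameters unchanged and only increase the damping, which mirrors the mp × 5 branch of the flowchart.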
Soft (Black) Modeling
• In soft (black) modeling, no physical model is assumed.
• In some cases, a linear measurement model is assumed (factor analysis methods).
• In other cases, dependencies among variables and sources of variation are considered to be non-linear (neural networks, genetic algorithms, ...).
• The goal of these methods is to explain the data variance using minimal (softer) assumptions about the data.
Example of soft (black) modeling: Factor Analysis/Principal Component Analysis
Bilinear model:
$$\mathbf{D} = \mathbf{U}\mathbf{V}^{\mathrm{T}} + \mathbf{E}, \qquad d_{ij} = \sum_{n=1}^{N} u_{in}\,v_{nj} + e_{ij}, \qquad N \ll I \text{ or } J$$
(D and E: I × J; U: I × N; Vᵀ: N × J)
• Unique solutions, but without physical meaning.
• Constraints: U orthogonal, Vᵀ orthonormal, Vᵀ in the direction of maximum variance.
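The PCA decomposition can be obtained from the singular value decomposition by keeping the first N components. A minimal sketch with simulated two-component data plus a little noise (sizes and noise level are assumptions): the scores U come out orthogonal, the loadings Vᵀ orthonormal, and only N singular values stand above the noise.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, N = 40, 60, 2

# Simulated bilinear data plus a little noise (values are arbitrary)
D = rng.random((I, N)) @ rng.random((N, J)) + 1e-3 * rng.standard_normal((I, J))

# SVD: D = U_full diag(sv) Vt_full; keep the first N components
U_full, sv, Vt_full = np.linalg.svd(D, full_matrices=False)
U = U_full[:, :N] * sv[:N]      # scores: orthogonal columns (U^T U is diagonal)
Vt = Vt_full[:N, :]             # loadings: orthonormal rows (Vt Vt^T = I)
E = D - U @ Vt                  # residuals not explained by N components

print(sv[:4].round(3))          # two large singular values, the rest at noise level
print(np.allclose(Vt @ Vt.T, np.eye(N)))
```

Because any rotation of U and Vᵀ would fit equally well, it is the orthogonality and maximum-variance conditions that make this solution unique, not any chemical property of the profiles.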
Hard (white)- vs. soft (black)-modelling
Pros HM
• Well-defined behaviour model (useful chemical information).
• Unique solutions.
• Reduced number of parameters to be optimised (e.g., K, k, ...).
Pros SM
• No explicit model is required.
• Information on the process or signal may be used (constraints).
• May help to set up or to validate a physicochemical model.
Cons HM
• The underlying model should be correct and completely known.
• No variations other than those related to the model should be present in the data set.
Cons SM
• Ambiguous solutions.
• Does not directly provide physicochemical (kinetic, thermodynamic, ...) information.
Use HM
• The variation of the system is completely described by a reliable physicochemical model, e.g. clean reaction systems (kinetic or thermodynamic processes).
Use SM
• The model describing the variation of the data is too complex, unknown or non-existent, e.g. images, chromatographic data, macromolecular processes.
Grey (hard+soft) modeling
• Mixed systems with hard-modelable and soft-modelable parts are proposed:
– Hard model: kinetic process, equilibrium reaction, ...
– Soft model: interferent, background, drift, unknown, ...
• Introducing a hard-model part decreases the ambiguity related to pure soft-modeling methods and gives additional information (parameters).
• Introducing a soft-model part may help to clarify the nature of the physicochemical model and give more reliable results.
Multivariate Curve Resolution as a grey modeling method
Multivariate Curve Resolution (MCR)
A tool to analyse (resolve) changes in composition and response in multicomponent systems.
Goal: knowing the identity and contribution of each pure compound (entity) in the process or in the mixture.
• Process: the composition changes in a continuous, evolutionary manner, e.g. chemical reactions, processes, HPLC-DAD.
• Mixture:
– the composition changes with a random pattern of variation, e.g. a series of independent samples;
– the composition changes with a non-random pattern of variation, e.g. environmental data, spectroscopic images.
Multivariate Curve Resolution
[Figure: the data matrix D (mixed information, e.g. retention times tR × wavelengths) is decomposed into pure component information: C, the pure concentration profiles c1 ... cn (chemical model, process evolution, compound contribution), and Sᵀ, the pure signals s1 ... sn (compound identity).]
Multivariate Curve Resolution methods: D = CSᵀ + E
• Investigation of chemical reactions (kinetics, equilibria, ...) using multivariate measurements (spectrometric, ...).
• Industrial processes (blending, syntheses, ...).
• Macromolecular processes.
• Biochemical processes (protein folding).
• Spectroscopic images.
• Mixture analysis (in general).
• Hyphenated separation techniques (HPLC-DAD, GC-MS, CE-DAD, ...).
• Environmental data (model of pollution sources).
Multivariate Curve Resolution bilinear model: factor analysis model
$$\mathbf{D} = \mathbf{C}\mathbf{S}^{\mathrm{T}} + \mathbf{E}, \qquad d_{ij} = \sum_{n=1}^{N} c_{in}\,s_{nj} + e_{ij}, \qquad N \ll I \text{ or } J$$
(D and E: I × J; C: I × N; Sᵀ: N × J)
• Non-unique solutions, but with physical meaning (rotational and intensity ambiguities are present).
• Constraints: C and Sᵀ non-negative; C or Sᵀ scaled (normalization, closure).
• Other constraints (unimodality, local rank, selectivity, previous knowledge, ...).
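The rotational and intensity ambiguities can be shown in a few lines: any invertible matrix T gives another exact factorisation D = (C T)(T⁻¹ Sᵀ). A minimal sketch with random "true" profiles; the particular T and scaling factors are arbitrary assumptions. Both alternative factorisations reproduce D exactly, which is why constraints are needed to narrow down the set of acceptable solutions.

```python
import numpy as np

rng = np.random.default_rng(2)
C = rng.random((30, 2))          # "true" concentration profiles
St = rng.random((2, 40))         # "true" spectra (rows of S^T)
D = C @ St

# Rotational ambiguity: any invertible T gives D = (C T)(T^-1 S^T)
T = np.array([[1.0, 0.2],
              [0.1, 1.0]])       # small mixing, arbitrary choice
C_alt = C @ T
St_alt = np.linalg.inv(T) @ St
print(np.allclose(D, C_alt @ St_alt))   # True: same fit, different profiles

# Intensity ambiguity: scale a component up in C and down in S^T
scale = np.diag([2.0, 0.5])
print(np.allclose(D, (C @ scale) @ (np.linalg.inv(scale) @ St)))  # True
```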
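Multivariate Curve Resolution Alternating Least Squares (MCR-ALS): extension to multiple data matrices
$$\begin{bmatrix}\mathbf{D}_1\\ \mathbf{D}_2\\ \mathbf{D}_3\end{bmatrix} = \begin{bmatrix}\mathbf{C}_1\\ \mathbf{C}_2\\ \mathbf{C}_3\end{bmatrix}\mathbf{S}^{\mathrm{T}}$$
[Figure: column-wise augmented data matrix built from NM = 3 matrices with NR1, NR2 and NR3 rows and a common set of NC columns. The augmented C holds the concentration profiles of every matrix (quantitative information); Sᵀ holds the spectra profiles shared by all matrices.]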
Advantages of matrix augmentation (multiway data)
• Local rank conditions for resolution are achieved in many situations for well-designed experiments (unique solutions!).
• Rank-deficiency problems can be more easily solved.
• Unique decompositions are easily achieved for trilinear data (trilinear constraints).
• Constraints (local rank/selectivity and natural constraints) can be applied independently to each component and to each individual data matrix.
J. of Chemometrics 1995, 9, 31-58; Chemometrics and Intell. Lab. Systems 1995, 30, 133.
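Column-wise augmentation stacks matrices that share the same columns (e.g. wavelengths), so a single Sᵀ serves all of them while each block keeps its own Ck. A minimal sketch with simulated data of hypothetical sizes: given the augmented C, the shared Sᵀ follows from one least-squares step.

```python
import numpy as np

rng = np.random.default_rng(3)
n_wl, n_comp = 50, 2
St = rng.random((n_comp, n_wl))                               # shared pure spectra
C_blocks = [rng.random((nr, n_comp)) for nr in (20, 25, 30)]  # NR1, NR2, NR3 rows
D_blocks = [Ck @ St for Ck in C_blocks]

D_aug = np.vstack(D_blocks)        # column-wise augmented matrix, (20 + 25 + 30) x 50
C_aug = np.vstack(C_blocks)

# Least-squares estimate of the common S^T from D_aug = C_aug S^T
St_hat, *_ = np.linalg.lstsq(C_aug, D_aug, rcond=None)
print(D_aug.shape, np.allclose(St_hat, St))   # (75, 50) True
```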
Multivariate Curve Resolution – Alternating Least Squares (MCR-ALS)
Data exploration
• Determination of the number of components (e.g. by SVD).
• Building of initial estimates (C or Sᵀ).
Input of external information as constraints
• Iterative optimisation of C and/or Sᵀ by Alternating Least Squares (ALS) subject to constraints.
Fit and validation
• Check for satisfactory CSᵀ data reproduction.
The aim is the optimal description of the experimental data using chemically meaningful pure profiles.
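An algorithm for bilinear Multivariate Curve Resolution models: Alternating Least Squares (MCR-ALS)
C and Sᵀ are obtained by iteratively solving the two least-squares problems ($\hat{\mathbf{D}}_{\mathrm{PCA}}$: data reproduced by PCA with the chosen number of components):
$$\min_{\hat{\mathbf{C}}} \left\| \hat{\mathbf{D}}_{\mathrm{PCA}} - \hat{\mathbf{C}}\,\hat{\mathbf{S}}^{\mathrm{T}} \right\| \quad\text{and}\quad \min_{\hat{\mathbf{S}}^{\mathrm{T}}} \left\| \hat{\mathbf{D}}_{\mathrm{PCA}} - \hat{\mathbf{C}}\,\hat{\mathbf{S}}^{\mathrm{T}} \right\|$$
• Optional constraints (local rank, non-negativity, unimodality, closure, ...) are applied at each iteration.
• Initial estimates of C or Sᵀ are obtained from EFA or from pure variable detection methods.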
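The alternating least-squares loop can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: non-negativity is imposed by clipping negative values after each unconstrained least-squares step (dedicated implementations typically use non-negative least squares instead), the raw data are used instead of the PCA-reproduced matrix, and the initial spectra are simply the true ones plus an offset, standing in for EFA or pure-variable estimates. All profiles and sizes are hypothetical.

```python
import numpy as np

def mcr_als(D, St0, n_iter=500, tol=1e-12):
    """Minimal MCR-ALS with non-negativity on C and S^T (by clipping)."""
    St = np.clip(St0, 0, None)
    lof_old = np.inf
    for _ in range(n_iter):
        # min ||D - C S^T|| over C for fixed S^T (solve S C^T = D^T)
        C = np.clip(np.linalg.lstsq(St.T, D.T, rcond=None)[0].T, 0, None)
        # min ||D - C S^T|| over S^T for fixed C
        St = np.clip(np.linalg.lstsq(C, D, rcond=None)[0], 0, None)
        lof = np.linalg.norm(D - C @ St) / np.linalg.norm(D)  # relative lack of fit
        if abs(lof_old - lof) < tol:
            break
        lof_old = lof
    return C, St, lof

# Simulated two-component chromatographic run (profiles and sizes are assumptions)
t = np.linspace(0, 50, 100)
wl = np.linspace(0, 1, 80)
C_true = np.column_stack([np.exp(-((t - 20) / 4) ** 2), np.exp(-((t - 28) / 5) ** 2)])
St_true = np.vstack([np.exp(-((wl - 0.35) / 0.12) ** 2),
                     np.exp(-((wl - 0.60) / 0.15) ** 2)])
D = C_true @ St_true

C, St, lof = mcr_als(D, St_true + 0.1)
print(f"lack of fit: {100 * lof:.3f} %")
```

Other constraints (unimodality, closure, local rank, hard models) would be applied at the same points as the clipping, to the relevant columns of C or rows of Sᵀ.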
Constraints
Definition: any chemical or mathematical feature obeyed by the profiles of the pure compounds in our data set.
• C and Sᵀ can be constrained differently.
• The profiles within C and Sᵀ can be constrained differently.
Constraints transform resolution algorithms into problem-oriented data analysis tools.
Soft constraints
Non-negativity (concentration profiles, spectra)
[Figure: unconstrained concentration profiles C*, which take negative values, and non-negativity-constrained profiles Cc, plotted against retention times.]
Unimodality (reaction profiles, chromatographic peaks, voltammograms)
[Figure: unconstrained profiles C* and unimodality-constrained profiles Cc plotted against retention times.]
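One simple way to impose unimodality on a single profile is sketched below: starting from the maximum, any value that rises again on either side is cut down to its neighbour closer to the peak. This "vertical" correction is only one possible variant, assumed here for illustration; the function name is hypothetical.

```python
import numpy as np

def unimodal(profile):
    """Force a single maximum by flattening any rise away from the main peak."""
    c = np.asarray(profile, dtype=float).copy()
    top = int(np.argmax(c))
    for i in range(top - 1, -1, -1):     # walk left from the maximum
        c[i] = min(c[i], c[i + 1])
    for i in range(top + 1, c.size):     # walk right from the maximum
        c[i] = min(c[i], c[i - 1])
    return c

# The secondary bumps at positions 1 and 5 are flattened
print(unimodal([0.0, 0.2, 0.1, 0.5, 0.3, 0.4, 0.0]))  # values: 0, 0.1, 0.1, 0.5, 0.3, 0.3, 0
```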
Selectivity/local rank
Concentration selectivity/local rank constraint
[Figure: unconstrained profiles C* and constrained profiles Cc plotted against retention times; in the selected region the concentration of the absent component is forced below threshold values.]
"We know that this region is not rank 3, but rank 2!"
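In code, the selectivity/local rank constraint amounts to capping the entries of C where a component is known to be absent. A minimal sketch with random profiles; the boolean mask, the threshold value and the function name are assumptions for illustration.

```python
import numpy as np

def local_rank_constraint(C, absent, threshold=0.0):
    """Cap concentrations of components known to be absent.

    absent: boolean mask with the shape of C, True where a component cannot be
    present (e.g. outside its elution window). Those entries are limited to
    `threshold`; with threshold = 0 positive values there become zero.
    """
    C = np.array(C, dtype=float)          # work on a copy
    C[absent] = np.minimum(C[absent], threshold)
    return C

rng = np.random.default_rng(4)
C = rng.random((10, 3))                    # 10 retention times, 3 components
absent = np.zeros(C.shape, dtype=bool)
absent[:5, 2] = True                       # component 3 absent in the first 5 points
Cc = local_rank_constraint(C, absent)
print(np.linalg.matrix_rank(Cc[:5]))       # 2: that region is rank 2, not 3
```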
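Concentration correlation constraint (multivariate calibration)
[Scheme: in each ALS iteration, the analyte concentration profile c_ALS is selected from C and split into calibration and prediction values. A local linear model relates the calibration values to the reference concentrations; the prediction values are updated with it and the updated C is returned to the ALS loop to compute Sᵀ.]
$$\mathbf{c}_{\mathrm{ref}} = b\,\mathbf{c}^{\mathrm{cal}}_{\mathrm{ALS}} + b_0 + \text{error}, \qquad \hat{\mathbf{c}}_{\mathrm{pred}} = b\,\mathbf{c}^{\mathrm{pred}}_{\mathrm{ALS}} + b_0$$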
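The local model of the correlation constraint is an ordinary straight-line fit. A simplified sketch, assuming the fitted model is applied to all samples so that the ALS profile is mapped onto the reference concentration scale; the analyte levels, the calibration indices and the function name are hypothetical.

```python
import numpy as np

def correlation_constraint(c_als, cal_idx, c_ref):
    """Fit c_ref = b * c_als[cal_idx] + b0 and apply it to every sample."""
    c_als = np.asarray(c_als, dtype=float)
    b, b0 = np.polyfit(c_als[cal_idx], np.asarray(c_ref, dtype=float), deg=1)
    return b * c_als + b0, b, b0

true_conc = np.array([0.10, 0.30, 0.50, 0.20, 0.40])   # hypothetical analyte levels
c_als = 2.0 * true_conc + 0.1                           # ALS profile in arbitrary units
cal = [0, 1, 2]                                         # samples with known reference values
c_new, b, b0 = correlation_constraint(c_als, cal, true_conc[cal])
print(round(b, 3), round(b0, 3))                        # 0.5 -0.05
```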
Trilinearity constraint (flexible to every species): extension of MCR-ALS to multilinear systems
For the augmented model [D1; D2; D3] = [C1; C2; C3] Sᵀ, the constraint is applied to each selected species:
1. Selection of the species profile in C.
2. Folding of the species profile (one column per data matrix).
3. PCA/SVD of the folded profile: the 1st score gives the common shape; the loadings give the relative amounts.
4. Unfolding of the species profile rebuilt from the 1st score and the loadings.
5. Substitution of the species profile in C.
Unique solutions!
R. Tauler, I. Marqués and E. Casassas. Journal of Chemometrics, 1998, 12, 55-75.
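The folding, SVD and unfolding steps can be sketched for one species as follows. Assumptions: every block of the augmented C has the same number of rows, and the species profile is replaced by its rank-one SVD approximation; the function name and example profile are hypothetical.

```python
import numpy as np

def trilinear_profile(c_aug, n_matrices):
    """Trilinearity constraint for one species of a column-wise augmented C.

    The profile is folded into a (points x matrices) array, replaced by its
    rank-one SVD approximation (common shape x relative amounts) and unfolded.
    """
    c_aug = np.asarray(c_aug, dtype=float)
    folded = c_aug.reshape(n_matrices, -1).T      # one column per data matrix
    U, s, Vt = np.linalg.svd(folded, full_matrices=False)
    shape = U[:, 0] * s[0]                        # 1st score: common shape
    amounts = Vt[0]                               # loadings: relative amounts
    # Signs of score and loadings are arbitrary; their product is not
    rank_one = np.outer(shape, amounts)
    return rank_one.T.reshape(-1), shape, amounts

g = np.exp(-np.linspace(-2, 2, 20) ** 2)              # common elution shape
c_aug = np.concatenate([1.0 * g, 2.0 * g, 0.5 * g])   # same shape, different amounts
noisy = c_aug + 0.01 * np.random.default_rng(0).standard_normal(c_aug.size)
c_tri, shape, amounts = trilinear_profile(noisy, 3)
print(np.linalg.matrix_rank(c_tri.reshape(3, -1)))    # 1
```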
Hard modeling constraints
Hard modeling: mass balance or closure constraint (closed reaction systems)
[Figure: unconstrained profiles C* and closure-constrained profiles Cc plotted against pH; the constrained concentrations add up to the total concentration at every pH, $\sum_n c_{in} = c_{\mathrm{total}}$.]
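A simple form of the closure constraint rescales each row of C so that its concentrations add up to the known total. A minimal sketch, assuming proportional rescaling of whole rows (other implementations may solve a constrained least-squares problem instead) and a hypothetical total concentration of 0.5.

```python
import numpy as np

def closure(C, c_total):
    """Rescale each row of C so that it sums to c_total (scalar or one value per row).

    Rows must not sum to zero.
    """
    C = np.asarray(C, dtype=float)
    totals = np.broadcast_to(np.asarray(c_total, dtype=float), (C.shape[0],))
    return C * (totals / C.sum(axis=1))[:, None]

C_star = np.array([[0.20, 0.10, 0.00],
                   [0.05, 0.30, 0.10],
                   [0.00, 0.10, 0.45]])
Cc = closure(C_star, 0.5)          # total concentration assumed to be 0.5
print(Cc.sum(axis=1))              # [0.5 0.5 0.5]
```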
Hard modeling: mass action law and rate laws
[Figure: soft-modelled profiles C and profiles Ccons constrained by a physicochemical model (rate laws for kinetic processes, mass action law for equilibrium processes), plotted against pH.]
Grey modeling using MCR-ALS: soft + hard modeling constraints
• The hard model is introduced as a new and essential constraint in the soft-modelling resolution process.
• It is applied in a flexible manner, like the soft-modelling constraints:
– to some or all process profiles;
– to some or all matrices in a three-way data set;
– different hard models can be applied to different matrices in a three-way data set.
Grey modeling using MCR-ALS
[Figure: soft-modelled concentration profiles $\mathbf{C}_{SM}$ (soft model: non-negativity) and hard-modelled profiles $\mathbf{C}_{HM}$ (physicochemical model: mass action law or rate law, for equilibrium and kinetic processes), plotted against pH.]
1. Select the soft-modelled profiles to be constrained ($\mathbf{C}_{SM}$).
2. Fit the selected profiles non-linearly according to the selected hard model: $\min\, ssq(\mathbf{C}_{SM} - \mathbf{C}_{HM})$, with $ssq = f(\mathbf{C}_{SM}, \text{model}, \text{parameters})$.
3. Update the soft-modelled profiles $\mathbf{C}_{SM}$ with the fitted $\mathbf{C}_{HM}$.
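The three steps above can be sketched for the simplest hard model, a first-order decay c = a·e^(−kt), applied to selected columns of C_SM inside an ALS iteration. Assumptions: each selected profile is fitted on its own, the amplitude a is solved in closed form for each trial k, and k is found by golden-section search on a bracket assumed to contain a single minimum. Function names and the simulated profiles are hypothetical; the actual method fits the full kinetic or equilibrium model to all selected profiles.

```python
import numpy as np

def fit_first_order(t, c_sm, k_lo=1e-3, k_hi=10.0, n_iter=80):
    """Fit c = a * exp(-k t) to one soft-modelled profile; return k and C_HM column."""
    def fit(k):
        e = np.exp(-k * t)
        a = (e @ c_sm) / (e @ e)             # amplitude is linear: closed form
        return float(np.sum((c_sm - a * e) ** 2)), a

    g = (np.sqrt(5.0) - 1.0) / 2.0           # golden-section ratio
    lo, hi = k_lo, k_hi
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    for _ in range(n_iter):
        if fit(x1)[0] < fit(x2)[0]:          # minimum lies in [lo, x2]
            hi, x2 = x2, x1
            x1 = hi - g * (hi - lo)
        else:                                # minimum lies in [x1, hi]
            lo, x1 = x1, x2
            x2 = lo + g * (hi - lo)
    k = 0.5 * (lo + hi)
    a = fit(k)[1]
    return k, a * np.exp(-k * t)

def hard_model_step(C_sm, t, columns):
    """Replace the selected columns of C_SM by their fitted hard-model profiles."""
    C = np.array(C_sm, dtype=float)
    rate_constants = {}
    for j in columns:
        k, c_hm = fit_first_order(t, C[:, j])
        C[:, j] = c_hm
        rate_constants[j] = k
    return C, rate_constants

rng = np.random.default_rng(5)
t = np.linspace(0, 10, 50)
# Column 0: kinetic species (hard-modelled); column 1: drift (stays soft)
C_sm = np.column_stack([np.exp(-0.7 * t), 0.2 + 0.01 * t])
C_sm += 0.005 * rng.standard_normal(C_sm.shape)
C_new, ks = hard_model_step(C_sm, t, columns=[0])
print(round(ks[0], 2))                       # close to 0.7
```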
Grey modeling applications using MCR-ALS
Examples:
1. Getting kinetic and analytical information from mixed systems (drift and interferents).
2. Using a physicochemical model to decrease resolution ambiguity and getting analytical information.
3. pH-induced transitions in hemoglobin.
Example 1. Getting kinetic information from mixed systems (drift and interferents)
Model: A → B → C, consecutive irreversible reactions, k1 = k2 = 1.
• Kinetic process + drift: D = Da + Dd.
• Kinetic process + interferent: D = Da + Di.
[Figure: simulated concentration profiles vs time and spectra (absorbance vs wavelengths) of A, B and C, together with the drift or interferent contribution.]
Anna de Juan, Marcel Maeder, Manuel Martínez, Romà Tauler. Analytica Chimica Acta 442 (2001) 337–350.
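Kinetic model
$$[\mathrm{A}] = [\mathrm{A}]_0\,e^{-k_1 t}$$
$$[\mathrm{B}] = [\mathrm{A}]_0\,\frac{k_1}{k_2 - k_1}\left(e^{-k_1 t} - e^{-k_2 t}\right)$$
$$[\mathrm{C}] = [\mathrm{A}]_0 - [\mathrm{A}] - [\mathrm{B}]$$
$\mathbf{C}_{HM} = f(k_1, k_2)$
Kinetic process + drift/interferent: A, B and C are hard-modelled (HM); the drift or interferent is soft-modelled (SM).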
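The kinetic model can be evaluated directly to build C_HM. One detail matters for the simulated case k1 = k2 = 1: the expression for [B] becomes 0/0, and its limit for k2 → k1 = k is [B] = [A]0·k·t·e^(−kt). A minimal sketch handling both cases; the function name and time grid are assumptions.

```python
import numpy as np

def consecutive_abc(t, k1, k2, a0=1.0):
    """Concentrations of A, B, C for A -> B -> C (first order, irreversible)."""
    t = np.asarray(t, dtype=float)
    A = a0 * np.exp(-k1 * t)
    if np.isclose(k1, k2):
        # Limit k2 -> k1: the general formula is 0/0; [B] = a0 k1 t exp(-k1 t)
        B = a0 * k1 * t * np.exp(-k1 * t)
    else:
        B = a0 * k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))
    C = a0 - A - B                           # mass balance
    return np.column_stack([A, B, C])

t = np.linspace(0, 10, 101)
C_hm = consecutive_abc(t, 1.0, 1.0)          # the simulated case k1 = k2 = 1
print(t[np.argmax(C_hm[:, 1])])              # [B] peaks at t = 1/k
```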
[Figure: resolved concentration profiles (concentration vs time, a.u.) and spectra (absorptivities vs wavelength channel, a.u.) for the kinetic process + drift and the kinetic process + interferent systems: a) hard modeling (HM); b) hard + soft modeling (HSM).]
Example 1 (continued): recovered rate constants (true values k1 = k2 = 1)
System                   Algorithm   k1     k2
A, B, C (drift)          HM          1.40   0.83
A, B, C (drift)          HSM         1.16   0.90
A, B, C (interferent)    HM          1.16   0.89
A, B, C (interferent)    HSM         0.95   1.05
Anna de Juan, Marcel Maeder, Manuel Martínez, Romà Tauler. Chemometrics and Intelligent Laboratory Systems 54 (2000) 123–141.
Example 2. Using a physicochemical model to decrease resolution ambiguity. Getting analytical information.
Chemical problem: multiequilibria systems.
• Quantitation of an analyte, H2A (malic acid), in the presence of an interferent, H2B (tartaric acid).
• Measurements: FT-IR-monitored pH titrations.
[Figure: concentration profiles of the H2A and H2B species vs pH (1 to 11): highly overlapped concentration profiles.]
Why soft modeling alone fails here:
• Too correlated concentration profiles.
• Too overlapped spectra.
• Too ambiguous SM solutions.
• Quantitation fails.
Data set: pH titrations of a standard (H2A) and of a sample (H2A/H2B), both recorded vs pH.
Example 3. Time effect on pH-induced transitions in hemoglobin
Time effect on pH transitions (UV)
[Figure: concentration profiles vs pH and pure spectra vs wavelength (350 to 700 nm) resolved by soft modeling (SM) for a fresh solution and for the same solution after 24 hours. Species: 1, 2 heme group unbound; 3 native; 4 heme bound (change in coordination).]
• Time-dependent acidic conformations evolve very similarly with pH (rank deficiency).
• The kinetic matrix helps in the resolution of the acidic conformations in the pH-dependent process.
• A hard-modelling constraint applied to the kinetic process helps to recover the acidic conformations in the pH-dependent process less ambiguously.
Global description of the process: the time-dependent and pH-dependent data matrices are augmented and share the same pure spectra.
$$\begin{bmatrix}\mathbf{D}_{\mathrm{time}}\\ \mathbf{D}_{\mathrm{pH}}\end{bmatrix} = \begin{bmatrix}\mathbf{C}_{\mathrm{time}}\\ \mathbf{C}_{\mathrm{pH}}\end{bmatrix}\mathbf{S}^{\mathrm{T}}$$
[Figure panel label: after 48 hours.]
Example 3 (continued): complete description, SM + HM
• All the pH-dependent conformations can be resolved, even the time-dependent ones.
• Additional kinetic information is obtained: k1 = 1.424 × 10⁻⁵ ± 4 × 10⁻⁸.
[Figure: pure spectra (350 to 700 nm), concentration profiles vs time (hard-modelled, HM) and concentration profiles vs pH (soft-modelled, SM).]
Some references: soft + hard (grey) modelling
• A. de Juan, M. Maeder, M. Martínez, R. Tauler. Chemom. Intell. Lab. Syst. 54 (2000) 123.
• A. de Juan, M. Maeder, M. Martínez, R. Tauler. Anal. Chim. Acta 442 (2001) 337.
• J. Diewok, A. de Juan, R. Tauler, B. Lendl. Appl. Spectrosc. 56 (2002) 40-50.
• J. Diewok, A. de Juan, M. Maeder, R. Tauler, B. Lendl. Anal. Chem. 76 (2003) 641-647.
Acknowledgements
• Chemometrics Group (UB and IIQAB-CSIC)
– Staff: Romà Tauler, Javier Saurina, Anna de Juan, Raimundo Gargallo
– Post-doc: Montse Vives, Mónica Felipe
– PhD: Susana Navea, Joaquim Jaumot, Emma Peré-Trepat, Elisabeth Teixido
– Master: Silvia Termes, Silvia Mas, Gloria Muñoz, Marta Terrado, Xavier Puig
• Manel Martínez (University of Barcelona, Spain), Marcel Maeder (University of Newcastle, Australia), Josef Diewok (University of Vienna, Austria)