
Nonsupervised numerical component extraction from pyrolysis mass spectra of complex mixtures


Anal. Chem. 1984, 56, 2297-2303 (Vol. 56, No. 13, November 1984). © 1984 American Chemical Society


Nonsupervised Numerical Component Extraction from Pyrolysis Mass Spectra of Complex Mixtures

Willem Windig* and Henk L. C. Meuzelaar

Biomaterials Profiling Center, University of Utah, Salt Lake City, Utah 84108

Pyrolysis mass spectral data of complex organic mixtures can be expressed in subpatterns describing the pure components of the mixtures and their relative concentrations. The approach described involves factor and discriminant analysis and does not require the presence of pure component spectra in the data set. It is based on a representation of correlations in a data set in the form of a “variance diagram”. Applications of the procedure are discussed for various sets of samples: biopolymers, lignites (brown coals), and grass leaves.

Recently, Windig et al. described a mixture analysis procedure for pyrolysis mass spectrometry (Py-MS) data sets by graphical rotation (1-3). This visually assisted approach was chosen over mathematical procedures because pyrolysis mass spectra of complex mixtures are not necessarily exact linear combinations of the spectra of the individual components. Furthermore, reference spectra of pure biochemical components are often not available. This severely limits the applicability of library search systems, such as used in GC/MS (4), or of target rotation methods, such as developed by Malinowski (5, 6), for Py-MS data sets.

Other methods for mixture analysis, often based on factor analysis, do not require reference spectra. Knorr and Futrell developed a technique to extract information on the pure components from mixture data (7). This procedure relies on a number of assumptions with regard to the structure of the data set: (a) a “pure mass” is present for every component (i.e., a mass value with a finite intensity for one component and zero intensity for all other components), and (b) the mathematically extracted spectrum contains only positive intensities (7). The procedure for finding pure masses or a key set of variables has been refined by Malinowski (8), thereby making this method applicable for mixtures with more than three components. Successful applications of this technique have been reported for mass spectral (7, 8), nuclear magnetic resonance (8, 9), and gas-liquid chromatography (8) data. However, this type of technique is not suited for Py-MS data, since the complexity of most Py-MS spectra (10) makes fulfillment of assumption (a) generally impossible. Due to the normalization procedures applied to the peak intensities, mathematically extracted spectra from Py-MS data sets show positive and negative intensities (1-3); therefore, assumption (b) is not fulfilled either.

Although the graphical rotation of Py-MS data has proven to give valuable information about chemical components (1-3, 11-14) as well as accurate quantitative results (15), the visually assisted graphical rotation approach has some definite disadvantages: (a) a highly experienced person is necessary to judge the results of graphical rotations, (b) the presence of unknown components may be overlooked, and (c) important “tendencies” other than changes in the concentration of a component in a set of mixture spectra (e.g., a pH change, changes caused by linkage differences in polymers, matrix effects, or other physicochemical reactions) may not be recognized visually.

For these reasons, an automated mathematical approach to graphical rotation capable of extracting major component patterns from a Py-MS data set will be a valuable research tool. This paper describes such a mathematical approach for mixture analysis which, compared to graphical rotation, is a time saving aid for the visual evaluation of the results of factor/discriminant analysis.

MATERIALS AND METHODS

Simulated Mixtures. Three pyrolysis mass spectra of biopolymers, viz., the protein bovine serum albumin (BSA), the polyhexose glycogen (GLY), and the peptidoglycan of Bacillus subtilis (PG), form the basis for a set of 10 simulated mixture spectra. Although PG contains aliphatic amino acid residues in addition to the N-acetylamino-sugar chain, the latter units dominate in the pyrolysis mass spectrum. The spectra were taken from literature (1). A matrix of random numbers was generated by computer for constructing the mixtures. In matrix notation

$$[S] = [C][F]$$

where [S] is the data matrix, a 100 × 10 matrix of the intensities of the 100 masses in the 10 different mixtures. The matrix [C], containing the spectra of the three components, is a 100 × 3 matrix and [F] is the matrix (3 × 10) with the mixture composition.

As the factor analysis program used can handle a maximum of 100 variables, the 100 most intense mass peaks out of the 150 present were chosen from the averaged spectrum of the 10 mixtures. The sum of the elements of each column of [C] is 1, resulting in normalized spectra in [S].
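The construction of the simulated data set can be sketched in a few lines. This is a minimal illustration, assuming NumPy; the component spectra below are random stand-ins (not the BSA, GLY, and PG spectra from the literature), and normalizing each column of [F] to sum to 1 is an assumption made here so that the mixture spectra in [S] come out normalized. The names `simulate_mixtures`, `C_demo`, `S_demo`, and `F_demo` are hypothetical.

```python
import numpy as np

def simulate_mixtures(C, n_mixtures=10, seed=0):
    """Return (S, F) with S = C @ F.

    C: (n_masses, n_components) component spectra, one spectrum per column.
    F: (n_components, n_mixtures) random composition matrix.
    """
    rng = np.random.default_rng(seed)
    C = np.asarray(C, dtype=float)
    C = C / C.sum(axis=0, keepdims=True)   # each component spectrum sums to 1
    F = rng.random((C.shape[1], n_mixtures))
    F = F / F.sum(axis=0, keepdims=True)   # assumption: fractions sum to 1 per mixture
    return C @ F, F

# Hypothetical stand-in spectra: 100 masses, 3 components.
C_demo = np.random.default_rng(1).random((100, 3))
S_demo, F_demo = simulate_mixtures(C_demo)
```

With both [C] and [F] normalized by column, every simulated mixture spectrum (column of [S]) sums to 1.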

Real Mixtures. Ten mixtures of three water-soluble biopolymers, gelatin (protein), corn starch (hexose polymer), both of household quality, and chondroitin sulfate (aminosugar, C2139, Sigma Chemical Corp., St. Louis, MO), in concentrations of 1 mg/mL H2O were sampled in triplicate. Drops of 5 µL were coated on ferromagnetic wires with a Curie-point temperature of 610 °C and dried under continuous rotation.

Lignites. The lignite data set consisted of Py-MS spectra of 28 representative lignite samples from the Penn State/DOE coal sample bank obtained through P. Given. Samples were ground under liquid nitrogen and the resulting powder (-300 mesh) was suspended in methanol at a concentration of 4 mg/mL. Drops of 5 µL were coated on 610 °C wires and dried under continuous rotation.

Grass Leaves. A detailed description of the experiments from which these data are taken can be found elsewhere (14). In these experiments the relationship between the pyrolysis data and the insect susceptibility of eight grass species/hybrids (Agropyron repens, A. repens × A. spicatum, A. desertorum, A. repens × A. desertorum, A. cristatum, A. repens × A. cristatum, Phalaris arundinacea, Dactylis glomerata) was studied. The results presented here were obtained on senescent (yellow colored) grass leaves.

The grasses were grown for 12 weeks (March-May) in plastic containers 3.8 cm in diameter and 30 cm long. The soil was a standard greenhouse mixture of two parts loam, one part peat, and one part sand. Temperature was 25 ± 2 °C; light conditions were natural.

For every grass species or interspecific hybrid, 10 undamaged plants were sampled by clipping equivalent lengths of a senescent leaf of each plant. The 10-leaf samples were pooled into a single sample. This procedure was repeated three times using the same plants, resulting in three samples of senescent leaves for every grass species or interspecific hybrid. The samples were ground and suspended in Spectrograde methanol at a concentration of 2 mg/mL. For analysis, 5-µL drops of the suspensions were applied on 510 °C Curie-point wires and air-dried under continuous rotation.

Pyrolysis Mass Spectrometry. The basic Curie-point Py-MS approach has been described elsewhere (10, 16-18). For the studies described here an Extranuclear Model 5000-1 Py-MS system with a manual Curie-point inlet system was used. Pyrolysis conditions were as follows: equilibrium temperature (Curie-point temperature of the filament) 510 °C for grass leaves, 610 °C for biopolymer mixtures and lignites; temperature rise time 5 s; total heating time 10 s. Mass spectrometric conditions were as follows: electron energy (set value) 11 eV for the lignites, 12 eV for the biopolymer mixtures, and 15 eV for the grasses; mass range scanned m/z 20-260 for the lignites, 20-240 for the biopolymer mixtures and grass leaves; scanning speed 1000 amu/s; total scanning time 20 s.

Data Analysis. Prior to multivariate analysis, spectra were normalized with regard to total intensity in order to correct for sample size. However, very large peaks and strongly varying peaks were excluded from the calculation of the 100% total ion intensity value because of their strong influence on the relative intensities of the other peaks in the spectrum. As a consequence, the total ion intensity exceeds 100%.
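The normalization step described above can be illustrated as follows. This is a minimal sketch, assuming NumPy and a samples-by-masses layout; the function name `normalize_spectra` and the demo numbers are hypothetical, and which peaks to exclude is left to the analyst.

```python
import numpy as np

def normalize_spectra(X, exclude=()):
    """Scale each spectrum (row of X) so that the non-excluded peaks sum to 100.

    exclude: indices of mass channels (columns) left out of the 100% total.
    """
    X = np.asarray(X, dtype=float)
    keep = np.ones(X.shape[1], dtype=bool)
    keep[np.asarray(list(exclude), dtype=int)] = False
    totals = X[:, keep].sum(axis=1, keepdims=True)   # total of the retained peaks
    return 100.0 * X / totals

# Two hypothetical spectra with three mass channels; channel 0 is a very large peak.
X_demo = np.array([[50.0, 10.0, 40.0],
                   [80.0,  5.0, 15.0]])
N_demo = normalize_spectra(X_demo, exclude=[0])
```

Because the excluded peak is still rescaled but does not count toward the 100% total, the sum over all peaks of a normalized spectrum exceeds 100, as noted in the text.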

Factor and discriminant analyses followed by graphical rotation have been described previously from a geometric point of view (1-3). Factor and discriminant analyses describe the variance in a data set in independent linear combinations of the variables, according to eq 1, where $L_j$ is the linear combination of the n mass variables.

$$L_j = a_{1j} m_1 + a_{2j} m_2 + \cdots + a_{nj} m_n, \qquad j = 1, 2, \ldots, m \quad (m \le n) \qquad (1)$$

The coefficients $a_{ij}$ are the loadings of the variables $m_i$ and describe the partial contribution of variable $m_i$ to $L_j$. The loadings are the correlation coefficients of the variables with that factor and thus may be considered as the projection of the mass variable axis, with length 1, on the factor, where the cosine of the angle between the “mass axis” and the factor equals the correlation coefficient. A geometrical representation of the results of factor and discriminant analysis will be given under Results and Discussion.

Figure 1. Representation of the original mass variables of the simulated biopolymer mixture data set in the space described by factors F1 and F2. Bars indicate mass variables within 10° sectors. The length of the bars is proportional to the number of mass variables in the bars. The component axes are indicated by arrows: (a) protein axis; (b) hexose axis; (c) aminosugar axis.

The linear combinations (the “factors”) obtained by factor analysis describe the total variance in the data set as efficiently as possible: the first factor describes the maximum variance, the second orthogonal factor the maximum of the residual variance, etc.

Discriminant analysis describes the maximum of the ratio between-group variance/within-group variance, where the between-group variance is the difference between groups of samples, e.g., triplicate analyses of the same sample, and the within-group variance represents the differences within the groups.
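The between/within variance ratio can be made concrete with a small numerical sketch. It assumes NumPy and uses the textbook formulation in which the discriminant directions are eigenvectors of W⁻¹B (W = within-group scatter, B = between-group scatter); this is a generic stand-in, not the SPSS routine used in the paper. The function name `discriminant_functions` and the demo data are hypothetical.

```python
import numpy as np

def discriminant_functions(X, groups, n_functions=2):
    """Return (ratios, directions, B, W) for directions maximizing between/within variance.

    X: (n_samples, n_variables); groups: one label per sample.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    B = np.zeros((p, p))   # between-group scatter
    W = np.zeros((p, p))   # within-group scatter
    for g in np.unique(groups):
        Xg = X[groups == g]
        d = (Xg.mean(axis=0) - grand_mean)[:, None]
        B += len(Xg) * (d @ d.T)
        Xc = Xg - Xg.mean(axis=0)
        W += Xc.T @ Xc
    # Eigenvectors of W^-1 B; W must be nonsingular (enough samples per variable).
    evals, evecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(evals.real)[::-1][:n_functions]
    return evals.real[order], evecs.real[:, order], B, W

# Hypothetical data: 4 samples analyzed in triplicate, 5 variables.
rng = np.random.default_rng(0)
group_means = rng.normal(scale=3.0, size=(4, 5))
X_demo = np.repeat(group_means, 3, axis=0) + rng.normal(size=(12, 5))
groups_demo = np.repeat(np.arange(4), 3)
ratios, directions, B_demo, W_demo = discriminant_functions(X_demo, groups_demo)
```

Each returned ratio equals vᵀBv / vᵀWv for its direction v, and no other direction gives a larger ratio than the first one. With many more mass variables than samples, W becomes singular, so in practice the analysis is usually run on a reduced set of variables or factor scores.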

The first discriminant function describes the maximum of the between/within variance ratio, the second orthogonal discriminant function the maximum of the residual between/within variance ratio, etc. For a comparison of the linear combinations of the variables with pyrolysis mass spectra of chemical components, the loadings have been multiplied by the standard deviation of the mass variables in the data set, according to Windig et al. (1). The resulting values are plotted in the form of spectra and are called factor spectra and discriminant spectra, respectively.
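The relation between loadings, standardized scores, and factor spectra can be sketched numerically. The example assumes NumPy and uses a principal component analysis of the correlation matrix as a stand-in for the factor analysis program; that choice, the function name `factor_spectra`, and the random demo data are assumptions for illustration only.

```python
import numpy as np

def factor_spectra(X, n_factors=2):
    """X: (n_samples, n_masses). Returns (loadings, spectra, scores)."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)                   # standard deviation per mass variable
    Z = (X - X.mean(axis=0)) / sd                # autoscaled mass variables
    R = (Z.T @ Z) / (X.shape[0] - 1)             # correlation matrix
    evals, evecs = np.linalg.eigh(R)             # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_factors]  # keep the largest ones
    evals, evecs = evals[order], evecs[:, order]
    loadings = evecs * np.sqrt(evals)            # correlations of variables with factors
    scores = Z @ evecs / np.sqrt(evals)          # standardized scores: mean 0, sd 1
    spectra = loadings * sd[:, None]             # loadings times standard deviation
    return loadings, spectra, scores

# Hypothetical data: 12 spectra, 6 mass variables.
X_demo = np.random.default_rng(2).random((12, 6))
loadings_demo, spectra_demo, scores_demo = factor_spectra(X_demo)
```

In this sketch each loading equals the correlation coefficient between a mass variable and the factor scores, and multiplying by the standard deviation puts the loadings back on an intensity scale so that they can be plotted like a spectrum.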

The Statistical Package for the Social Sciences (19), in combination with programs for representing the results in the form of spectra and a graphical rotation routine (1), was used for computerized data analysis.

The “intensities” of the original spectra on the factors or discriminant functions are called factor and discriminant scores, respectively. They are presented in standardized form, i.e., the mean of the scores of the factors or discriminant functions is 0 and the standard deviation 1.

The geometrical meaning of the linear combination of variables is discussed in Results and Discussion.

RESULTS AND DISCUSSION

In order to help develop and test the proposed mathematical procedure for determining components in a set of mixture spectra, a simulated set of mixture spectra was constructed by linearly combining three actual pyrolysis mass spectra. Since the simulated data set is based on a normalized system of three variables (the components), factor analysis will result in two factors. The two factors describe the same information as the original 100 mass variables. The contribution of the original mass variables can be visualized by plotting the loadings of the two factors in a system of axes representing the two factors (see Figure 1). Since the data set contains only two-dimensional information, all mass variables are fully described in this system and, consequently, all mass variables have unit length (loading = 1.0). The resulting plot, an “oblique” coordinate system of mass variable axes, is complex. Therefore, in Figure 1 we have attempted to simplify the information by indicating which mass variables are present in sectors of 10°. The contributions of the original spectra (the “scores”) can also be plotted in this coordinate system. The relative contribution of each of the mass variables can simply be derived by projecting the score values on the respective mass axes. The directions of the “component axes” (1), i.e., the directions in the space in which the projected scores are 100% correlated with the concentrations of the respective components and the spectral patterns associated with the changes in the data set (the factor spectra) are optimally reflecting the components involved (1), are indicated by arrows.
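The claims that a normalized three-component system yields two factors and that every mass variable then has unit length can be checked numerically. The sketch below assumes NumPy, random stand-in component spectra, and a principal component analysis of the correlation matrix in place of the factor analysis program; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.random((100, 3))
C /= C.sum(axis=0)                       # three normalized component spectra
F = rng.random((3, 10))
F /= F.sum(axis=0)                       # compositions sum to 1 for each mixture
X = (C @ F).T                            # 10 mixtures x 100 mass variables

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscaled data
rank = np.linalg.matrix_rank(Z, tol=1e-8)          # dimensionality of the variation

R = Z.T @ Z / (X.shape[0] - 1)                     # correlation matrix
evals, evecs = np.linalg.eigh(R)
top = np.argsort(evals)[::-1][:2]                  # the two nonzero factors
loadings = evecs[:, top] * np.sqrt(evals[top])
lengths = np.sqrt((loadings ** 2).sum(axis=1))     # length of each mass axis

# Circular histogram: number of mass axes per 10 degree sector (as in Figure 1).
angles = np.degrees(np.arctan2(loadings[:, 1], loadings[:, 0])) % 360.0
counts, _ = np.histogram(angles, bins=np.arange(0, 361, 10))
```

Because the three concentrations sum to a constant, only two degrees of freedom remain after mean centering, so `rank` is 2 and every entry of `lengths` is 1 to within rounding error.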

The first impression gained from Figure 1 is that of the complexity of the oblique mass axes system; there are no distinct clusters of mass variables related to the component axes. The complexity of the figure is caused by the fact that the mass variables have a partial correlation with each component. Furthermore, the normalization procedure applied also influences the correlation structures in the data set. The spectra are normalized for constant total intensity; consequently a change in the data set has to be “compensated” by a negatively correlated change. This is clearly demonstrated by most difference spectra, factor spectra, and discriminant spectra, which generally show positive and negative intensities (1-3, 10, 20).

A closer study of the diagram in Figure 1 reveals that there are relatively large numbers of mass variables in the directions of the component axes. This indicates that the contribution of mass variables in the direction of the mass axes may contain a clue for mixture analysis, thus prompting us to calculate the variance contribution of the mass variables in all directions. A commonly used approach to determine this contribution is to calculate the variance by simply adding up the squared loadings of a factor. In that case the contribution of all masses to each direction is calculated or, in geometrical terms, a window of 360° is used. However, a 360° window is not useful for our purpose since, by definition, it produces a maximum in the direction of F1. The maximum of the residual variance is in the direction of F2 (and so on if more than two factors are present).

Another frequently used standard rotation method is Varimax (19), which assumes simple correlation structures in the data set by trying to find factors with loading values of -1, 0, or +1, thereby emphasizing mass variables with a high negative, zero, or high positive correlation, respectively. It will be clear that the data originating from Py-MS studies do not have such simple structures. Furthermore, Varimax uses orthogonal rotation criteria and is thus not suitable for determining oblique component axes such as shown in Figure 1. The orthogonal Quartimax (19) rotation maximizes variance in a relatively narrow window (approximately 66°) by maximizing the sum of the factor loadings to the fourth power. As outlined above, however, orthogonal rotations are not suited for our data. The oblique rotation method available in the SPSS package (17) combines features of Quartimax and Varimax but allows a certain degree of obliqueness between factors, controlled by the parameter delta, which has to be determined by the researcher. The number of oblique factors is equal to the number of the original unrotated factors. In normalized systems the number of components (= oblique factors) is at least one more than the number of factors, so the oblique rotation of SPSS is also not suited for Py-MS data sets.

The circular histogram in Figure 1 demonstrates that a 10° window shows maxima in the directions of the component axes, although other maxima are also observed. This made us decide to calculate the variance in windows of 10° using the following algorithm (eq 2):

where $a_i = \alpha_{i1} \cos\gamma + \alpha_{i2} \sin\gamma$ and $\mathrm{Var}(W=\beta)_\gamma$ is the variance in the direction in the space at an angle of γ degrees with the first factor (or discriminant function), using a window W of β degrees. Furthermore, $a_i$ is the loading of mass variable i on the rotated function, m is the total number of mass variables, and $\alpha_{ij}$ are the loadings of the mass variables i on the unrotated factor j. For the commonly used variance, β = 360°. Expressed in terms which are easier to visualize, expression 2 accomplishes the following task in a two-dimensional system: the sum of the squares of the lengths of all mass axes present in a pie-shaped window of β degrees is calculated while the window “scans” the whole two-dimensional space in discrete steps (generally 10°). Applying this formula with β = 10° and γ = 0°, 10°, 20°, ..., 350° gives the diagram presented in Figure 2a.

Figure 2. (a) Variance diagram obtained on a simulated mixture data set by using a 10° window for variance calculation. The variance values as obtained by eq 2 for each sector are plotted as points on the corresponding axes. The distance from the center of the diagram to each point is proportional to the variance value. Points on neighboring axes have been connected by solid lines. All values have been scaled in such a way that the highest variance value is full scale. (b) Variance diagram after normalization correction with eq 3. Note that local maxima opposite to component axes have been removed by this procedure. (c) Configuration obtained by applying the smoothing procedure in eq 4 to the variance diagram in Figure 2b.
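The windowed variance calculation can be sketched as follows. This is a minimal illustration, assuming NumPy and assuming that eq 2 sums the squared rotated loadings $a_i^2$ over those mass variables whose axes fall inside the window of β degrees centered on γ; that reading is inferred from the verbal description above and is not a verified transcription of the equation. The function name `variance_diagram` and the demo loadings are hypothetical.

```python
import numpy as np

def variance_diagram(loadings, window=10.0, step=10.0):
    """loadings: (n_masses, 2) loadings on the two unrotated factors.

    Returns (gammas, var): directions in degrees and the windowed variance.
    """
    A = np.asarray(loadings, dtype=float)
    theta = np.degrees(np.arctan2(A[:, 1], A[:, 0]))        # direction of each mass axis
    gammas = np.arange(0.0, 360.0, step)
    var = np.zeros(len(gammas))
    for k, g in enumerate(gammas):
        rad = np.radians(g)
        a = A[:, 0] * np.cos(rad) + A[:, 1] * np.sin(rad)   # rotated loadings a_i
        d = (theta - g + 180.0) % 360.0 - 180.0             # signed angle to gamma
        inside = (d >= -window / 2.0) & (d < window / 2.0)  # mass axes in the window
        var[k] = np.sum(a[inside] ** 2)
    return gammas, var

# Hypothetical unit-length mass axes at 30, 160, 160 and 280 degrees.
ang = np.radians([30.0, 160.0, 160.0, 280.0])
L_demo = np.column_stack([np.cos(ang), np.sin(ang)])
gammas_demo, var_demo = variance_diagram(L_demo)
_, var_full = variance_diagram(L_demo, window=360.0)
```

With a 360° window every mass variable is included in every direction, so the value in the direction of the first factor reduces to the ordinary sum of squared loadings, which is consistent with the statement that β = 360° gives the commonly used variance.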

A systematic study of Figure 2a shows that the largest local maximum, which has an angle of 160° with F1, is the component axis of the aminosugar constituent. Opposite to this direction (340° with F1) we also see a maximum caused by the normalization procedure. The next local maximum, at a 30° angle with F1, is the component axis of protein. Opposite to this local maximum there is again another local maximum (210° angle with F1). The last component axis, the hexose axis, at a 280° angle with F1, is the next local maximum provided we ignore the local maxima opposite to the protein and aminosugar axes, which are due to normalization. Furthermore, opposite to this last local maximum, some scattered local maxima are observed (at angles of 50°, XI°, and 110° with F1).

Figure 3. Smoothed variance diagrams obtained for a set of actual biopolymer mixtures: (a) before and (b) after normalization correction. The correlation coefficients of the scores on the component axis and the known concentrations are 0.94, 0.92, and 0.93 for the corn starch axis, gelatin axis, and chondroitin sulfate axis, respectively.

Evaluation of Figure 2 as described above demonstrates the necessity to apply a correction for the "normalization effect". It should be noted that in the case of only "pure masses" for each of the component axes, there would not be a local maximum due to normalization opposite to the component axes (except for cases with approximately 100% negatively correlated component axes).

Figure 4. Py-MS spectra of the three components of the actual biopolymer mixtures: (a) corn starch, (b) gelatin, and (c) chondroitin sulfate.

By trial and error a formula for correcting the "normalization effect" was developed as follows. Let the maximum in the variance be $\mathrm{Var}(W=\beta)_\gamma$, where γ is the angle with the first factor. Then the variance corrected for the normalization effect, $\mathrm{Var}_c(W=\beta)$, can be calculated as follows:

$$\mathrm{Var}_c(W=\beta)_{\gamma+180^\circ+\delta_i} = \mathrm{Var}(W=\beta)_{\gamma+180^\circ+\delta_i} - \mathrm{Var}(W=\beta)_{\gamma+\delta_i} \qquad (3)$$

where $\delta_i$ = -30°, -20°, ..., 30°, and if $\mathrm{Var}_c(W=\beta)_{\gamma+180^\circ+\delta_i}$ is negative the value is made 0. So the variance values in a 70° window opposite to a local maximum are made 0, except when they are larger than the local maximum, in order to avoid deletion of real negatively correlated tendencies.

This procedure results in the picture given in Figure 2b, in which the three largest local maxima point exactly in the directions of the component axes. However, this picture also shows many small "noise type" maxima. Consequently, the following formula was applied in order to smooth the data presented in Figure 2b:

$$\mathrm{Var}_s(W=\beta)_{\gamma} = \mathrm{Var}(W=\beta)_{\gamma} + \tfrac{1}{2}\left[\mathrm{Var}(W=\beta)_{\gamma+10^\circ} + \mathrm{Var}(W=\beta)_{\gamma-10^\circ}\right] \qquad (4)$$

where $\mathrm{Var}_s(W=\beta)_\gamma$ is the smoothed variance.

The results of the smoothing procedure on the data corrected for normalization are shown in Figure 2c. In principle, the smoothing procedure applied gives the same results as those obtained by a 20° window (β = 20°) in eq 2. Thus the question may arise why a 20° window is not used directly. The reason is that for a 10° stepwise rotation (γ = 0°, 10°, ..., 350° in eq 2) with a 20° window (β = 20° in eq 2) the variance values are partially overlapping at every step, which causes an overcorrection of the normalization effect.

The above method, when applied to the first two discriminant functions obtained on a data set of 10 actual mixtures of three biopolymers, resulted in the diagrams presented in Figure 3. The three local maxima detected after normalization correction were the same as those found by visually assisted graphical rotation. The pyrolysis mass spectra of the pure components (which are not included in the data set) are given in Figure 4; the discriminant spectra of the component axes are given in Figure 5. Since the negative part is not of interest for recognition of the pattern, it is ignored in the spectral patterns of the component axes in this article.

Figure 5. Discriminant spectra corresponding to the three component axes of the biopolymer mixtures data set: (a) discriminant spectrum of the corn starch axis (compare with Py-MS spectrum of corn starch in Figure 4a); (b) discriminant spectrum of the gelatin axis (compare with Py-MS spectrum of gelatin in Figure 4b); and (c) discriminant spectrum of the chondroitin sulfate axis (compare with Py-MS spectrum of chondroitin sulfate in Figure 4c). It has to be noted that the pure components were not present in the data set.

Figure 6. Typical lignite spectra from different sample sites, showing complex homologous ion series patterns thought to represent aliphatic hydrocarbons, (alkyl)phenols, etc. Mass peaks representing sulfur-containing ion species are labeled with "s", e.g., m/z 34 (H2S+•), 48 (CH3SH+•), 64 (SO2+• and/or S2+•), and 76 (CS2+•). Panels: (a) Texas lignite (Wildcat seam); (b) Montana lignite. Horizontal axis: m/z.
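The normalization correction (eq 3) and the smoothing step (eq 4) operate on the circular array of variance values and can be sketched as follows. The example assumes NumPy and a diagram sampled in 10° steps (36 values); how the correction is repeated over successive maxima is left to the caller, since the text describes it for a single maximum. The function names `correct_normalization` and `smooth` and the demo diagram are hypothetical.

```python
import numpy as np

def correct_normalization(var, peak, step=10, half_width=30):
    """Apply eq 3 for one maximum located at index `peak` of a circular diagram."""
    var = np.asarray(var, dtype=float)
    out = var.copy()
    n = len(var)                       # number of directions, e.g. 36 for 10 degree steps
    half_turn = n // 2                 # index offset corresponding to 180 degrees
    k = half_width // step             # delta runs over -k*step ... +k*step
    for d in range(-k, k + 1):
        src = (peak + d) % n                  # direction gamma + delta
        dst = (peak + half_turn + d) % n      # direction gamma + 180 + delta
        out[dst] = max(var[dst] - var[src], 0.0)   # negative values are set to 0
    return out

def smooth(var):
    """Apply eq 4: add half of each neighboring value (circular)."""
    var = np.asarray(var, dtype=float)
    return var + 0.5 * (np.roll(var, 1) + np.roll(var, -1))

# Hypothetical diagram: maxima at 160 and 30 degrees, plus an artifact at 340 degrees.
var_demo = np.zeros(36)
var_demo[16] = 5.0   # 160 degrees
var_demo[34] = 2.0   # 340 degrees, opposite to the 160 degree maximum
var_demo[3] = 3.0    # 30 degrees
corrected = correct_normalization(var_demo, peak=16)
smoothed = smooth(corrected)
```

In this toy diagram the artifact opposite to the largest maximum is removed while the two genuine maxima are untouched, and smoothing spreads half of each value to its two neighbors.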

The next data set to which this procedure was applied represents a series of lignites (brown coals). Two typical spectra from this data set are shown in Figure 6. The diagrams obtained for the first two discriminant functions are shown in Figure 7. The discriminant spectra of some of these maxima are shown in Figure 8, together with some reference spectra. The first impression after comparing the original spectra in Figure 6 and the mathematically extracted spectra in Figure 8 is the simplicity of the latter, which greatly facilitates chemical interpretation.

Figure 7. Smoothed variance diagrams obtained from the lignite data set: (a) before and (b) after normalization correction.

The discriminant spectrum of the component axis at 0° with D1 clearly shows a characteristic pattern of aliphatic hydrocarbons (e.g., at m/z 56, 70, 84, etc.) as well as a presumed hardwood lignin marker (at m/z 194), as can be deduced from Figure 8a,b. The discriminant spectra at 50° and 90° (not shown) show the optimal representation of reduced sulfur moieties (i.e., H2S at m/z 34) and more oxidized sulfur moieties (i.e., SO2 at m/z 64, and CS2 at m/z 76), respectively. The discriminant spectrum of the component axis at 170° (Figure 8c) clearly shows a softwood lignin pattern (compare with Figure 8d). This agrees with the depositional origin of the Northern Plains Province lignites, which exhibit a high score on this component axis. Finally, the component axis at 200° shows an optimal representation of resin-related mass peaks (not shown).

Figure 8. Discriminant spectra of some of the component axes as located by means of the diagrams in Figure 7, together with relevant reference spectra: (a) discriminant spectrum of the component axis at 0°, which shows an optimal representation of aliphatic hydrocarbons (compare with the pattern of Apiezon L in Figure 8b) and of presumed hardwood lignin marker peaks at m/z 194; (c) discriminant spectrum of the component axis at 170°, which optimally represents a softwood lignin pattern (compare with the spruce lignin pattern in Figure 8d).

The question may arise why more than three component axes can be found in a two-dimensional system. This is the case when the dimensionality of the system is limited by "factors" other than component axes. A clear example can be found in ref 12. In that article, Py-MS patterns of three taxonomical groups of yeasts were analyzed. Although five component axes were found in this case, the dimensionality of the factor space was limited to two since there were only three groups, which, by definition, lie in a plane.
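The geometric argument about three groups can be checked numerically. The sketch below uses made-up group means in a ten-variable space and shows that the centered means of three groups span at most two dimensions, however many component axes contribute to the differences between the groups. All names and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean spectra of three groups in a 10-variable space.
group_means = rng.normal(size=(3, 10))

# Centering on the grand mean removes one degree of freedom,
# so the three points span a plane (rank 2) at most.
centered = group_means - group_means.mean(axis=0)
n_discriminant_dims = np.linalg.matrix_rank(centered)
```

With g groups the same reasoning gives at most g - 1 dimensions between the group means, which bounds the number of discriminant functions available for rotation.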

All component axes found were the same as those found by graphical rotation and loading plots. It has to be noted that the softwood lignin pattern at 170° does not show a local maximum when the normalization correction has not been applied.
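Locating local maxima in a variance diagram is a simple operation once the diagram is available as one value per direction. The sketch below assumes a diagram sampled every 10° and treats it as circular, so that 350° and 0° are neighbours. The diagram values are synthetic, and the helper name is hypothetical; how the diagram itself is computed and smoothed is not reproduced here.

```python
import numpy as np

def circular_local_maxima(angles_deg, values):
    """Return the angles whose value exceeds both circular neighbours."""
    values = np.asarray(values, dtype=float)
    left = np.roll(values, 1)    # value at the previous angle (wraps around)
    right = np.roll(values, -1)  # value at the next angle (wraps around)
    is_peak = (values > left) & (values > right)
    return [int(a) for a, p in zip(angles_deg, is_peak) if p]

# Synthetic diagram with bumps centred at 0 and 170 degrees.
angles = np.arange(0, 360, 10)
circ_dist = np.minimum(angles, 360 - angles)  # circular distance to 0 degrees
diagram = (np.exp(-(circ_dist / 20.0) ** 2)
           + 0.6 * np.exp(-((angles - 170) / 20.0) ** 2))
peaks = circular_local_maxima(angles, diagram)
```

Running the same search on the diagrams before and after normalization correction and comparing the two peak lists would flag axes, such as the one at 170° above, that only appear after correction.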

As a final example, the results obtained on a data set of grass leaves are given. Because of the complex biochemical composition of recent plant materials, spectra of the grass leaves show a complex pattern, as can be seen in Figure 9. The diagram after normalization correction (Figure 10b) reveals a complex pattern between 60° and 140° with three local maxima. Examination of the discriminant spectra (not shown) and the loadings showed a weak proteinlike pattern in this direction in the discriminant space. This possibly indicates changes in several partially correlated proteins or amino acids. Another explanation is that protein is involved in intermolecular interactions, such as shown previously in a mixture analysis study on Py-MS data (15).

The component axis at 200° represents a change in (poly)hexose (e.g., cellulose) content, as can be deduced from Figures 11a and 4b. The discriminant spectrum of this component axis also shows m/z 85 and 114, which indicates a partial correlation with a change in a (poly)pentose (e.g., hemicellulose) component, as deduced from the loadings of these and other pentose-related mass values.

Figure 9. Typical pyrolysis mass spectra from the grass leaves data set. The spectra show a complex mixture of (poly)hexoses, e.g., at m/z 31, 32, 43, 60, 74, 96, 98, 110, 112, and 126; (poly)pentoses, e.g., at m/z 85 and 114; and several other component classes.

Figure 10. Smoothed variance diagram obtained from the grass leaves data set: (a) before and (b) after normalization correction.

Figure 11. Discriminant spectra of grass leaf component axes and relevant model compound spectrum: (a) discriminant spectrum of the component axis at 200°, which optimally represents a hexose pattern (compare with the typical hexose pattern of gelatin in Figure 4b); (b) discriminant spectrum of the component axis at 340°, which optimally represents a (poly)isoprenoid pattern (compare with the typical pattern of natural rubber in Figure 11c).

The component axis at 340° shows the optimal representation of a relatively pure (poly)isoprenoid pattern, as can be deduced from Figure 11b,c. These results again show that factor analysis techniques can greatly simplify complex spectroscopic patterns.

The component axes obtained were the same as those found with graphical rotation. However, it appeared that a component axis of pentose, opposite to the polyisoprenoid axis, was lost by the normalization procedure. It should be noted that the pentose axis is not clearly present in the diagram before normalization correction either.

This procedure can easily be extended to more than two dimensions. For instance, let us assume that a certain component is described by three functions, L1, L2, and L3. The first step is then to find the maximum of this component axis in the space of the first two functions, L1 and L2. If the local maximum of the component axis is found at an angle of α degrees with L1, the next step is to study the variance diagram of the space described by ((cos α)L1 + (sin α)L2) and L3. A variance diagram determined in this way in a three-dimensional space can be found elsewhere (21). Our experience with Py-MS data is that more than five or six linear combinations are rarely needed to describe the bulk of the data.
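The two-step search can be written down compactly. In the sketch below, `coords` is a hypothetical matrix with one row per item (for example, per mass variable) and one column per function L1, L2, and L3, and `alpha_deg` is the angle of the local maximum found in the L1/L2 plane. The combined axis (cos α)L1 + (sin α)L2 is paired with L3 to give the plane in which the next variance diagram would be studied. The function and variable names are illustrative assumptions.

```python
import numpy as np

def next_plane(coords, alpha_deg):
    """Project 3-column coordinates onto the plane spanned by
    (cos a)L1 + (sin a)L2 and L3."""
    a = np.deg2rad(alpha_deg)
    combined_axis = np.array([np.cos(a), np.sin(a), 0.0])  # lies in the L1/L2 plane
    third_axis = np.array([0.0, 0.0, 1.0])                 # L3
    basis = np.column_stack([combined_axis, third_axis])   # shape (3, 2), orthonormal
    return np.asarray(coords, dtype=float) @ basis

# Two made-up items; at 90 degrees the combined axis coincides with L2.
coords = np.array([[1.0, 0.0, 2.0],
                   [0.0, 1.0, -1.0]])
plane_coords = next_plane(coords, 90)
```

Because the combined axis and L3 are orthonormal, the projected coordinates can be fed to the same two-dimensional diagram routine used for the first step, and the scheme can be repeated for further functions.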

CONCLUSIONS

From the results presented above and from results obtained on other data sets (not shown), the rotational procedure described above proved able to find component axes. It is clear, however, that the results have to be checked by the researcher. Jae-On Kim words this well in the SPSS manual (19) when discussing oblique rotation: "Such method, however, can be adequately achieved only with some visual or graphical aid and the discerning eye of the researcher." In our laboratory we use the mathematical method in the following way. First, a study is made of the factor or discriminant spectra, which allows a rough estimate to be made of the directions of the component axes. Subsequent comparison of the variance diagrams before and after normalization correction generally provides a clear indication of the directions of the component axes. Finally, loading values are used to confirm the presence of certain component axes. Compared with graphical rotation, the number of rotated factor or discriminant spectra to be checked is greatly reduced, thus saving time. Furthermore, the mathematically defined component axes can be reproduced in a more objective manner than axes determined by visual means alone.

A clear advantage of representing results in the form of variance diagrams is that oblique axes are much more apparent than when determined by abstract mathematical procedures such as the "delta value" technique in the SPSS oblique rotation. Although the normalization correction plays an important part for our data, a variance diagram is certainly also a helpful tool for oblique rotations of data without a normalization effect.

ACKNOWLEDGMENT

The authors acknowledge the expert technical assistance of Tony Schurtz, Annabelle Sorrels, Van Nguyen, Melinda Van, and Lisa Casner in analyzing the samples, processing the data, and preparing the manuscript.

LITERATURE CITED

(1) Windig, W.; Kistemaker, P. G.; Haverkamp, J. J. Anal. Appl. Pyrolysis 1981, 3, 199-212.
(2) Windig, W.; de Hoog, G. S.; Haverkamp, J. J. Anal. Appl. Pyrolysis 1981, 3, 213-220.
(3) Windig, W.; Haverkamp, J.; Kistemaker, P. G. Anal. Chem. 1983, 55, 81-88.
(4) Waller, G. R., Dermer, V. C., Eds. "Biochemical Applications of Mass Spectrometry"; Wiley: New York, 1980; Section 11.
(5) Malinowski, E. R.; McCue, M. Anal. Chem. 1977, 49, 264-287.
(6) Malinowski, E. R.; Howery, D. G. "Factor Analysis in Chemistry"; Wiley-Interscience: New York, 1980.
(7) Knorr, F. J.; Futrell, J. H. Anal. Chem. 1979, 51, 1236-1241.
(8) Malinowski, E. R. Anal. Chim. Acta 1982, 134, 129-137.
(9) Gillette, P. C.; Lando, J. B.; Koenig, J. L. Anal. Chem. 1983, 55, 630-633.
(10) Meuzelaar, H. L. C.; Haverkamp, J.; Hileman, F. D. "Pyrolysis Mass Spectrometry of Recent and Fossil Biomaterials; Compendium and Atlas"; Elsevier: Amsterdam, 1982.
(11) Windig, W.; Haverkamp, J. Stud. Mycol. 1982, 22, 56-59.
(12) Windig, W.; de Hoog, G. S. Stud. Mycol. 1982, 22, 60-64.
(13) Windig, W.; Meuzelaar, H. L. C.; Haverkamp, J. Proc. R. Netherlands Acad. Arts Sci., Ser. C 1983, C86 (3), 437-444.
(14) Windig, W.; Meuzelaar, H. L. C.; Haws, B. A.; Campbell, W. F.; Asay, K. H. J. Anal. Appl. Pyrolysis 1983, 5, 183-198.
(15) Van de Meent, D.; de Leeuw, J. W.; Schenck, P. A.; Windig, W.; Haverkamp, J. J. Anal. Appl. Pyrolysis 1982, 4, 133-142.
(16) Meuzelaar, H. L. C.; Kistemaker, P. G. Anal. Chem. 1973, 45, 587-590.
(17) Meuzelaar, H. L. C.; Kistemaker, P. G.; Posthumus, M. A. Biomed. Mass Spectrom. 1974, 1, 312-319.
(18) Meuzelaar, H. L. C.; Kistemaker, P. G.; Eshuis, W.; Engel, H. W. B. In "Rapid Methods and Automation in Microbiology"; Johnston, H. H., Newsom, S. W. B., Eds.; Learned Information: Oxford, New York, 1976; pp 225-230.
(19) Nie, N. H.; Hull, C. H. G.; Jenkins, J. G.; Steinbrenner, K.; Bent, W. H. "Statistical Package for the Social Sciences", 2nd ed.; McGraw-Hill: New York, 1975.
(20) Windig, W.; Meuzelaar, H. L. C.; Jacobson, M. J. Anal. Appl. Pyrolysis 1983, 5, 349-382.
(21) Windig, W.; Meuzelaar, H. L. C.; Shafizadeh, F.; Kelsey, R. G. J. Anal. Appl. Pyrolysis, in press.

RECEIVED for review March 7, 1984. Accepted June 22, 1984. The research reported in this publication was supported by NIH Grant No. 5R01GM28581-03 and ARO Contract DAAG29-82-K-0193.