

A New Mapping/Exploration Approach for HT Synthesis of Zeolites

Avelino Corma,* Manuel Moliner, Jose M. Serra, Pedro Serna, María J. Díaz-Cabañas, and Laurent A. Baumes

Instituto de Tecnología Química, UPV-CSIC, Universidad Politécnica de Valencia, Avda. de los Naranjos s/n, 46022 Valencia, Spain

Received March 15, 2006. Revised Manuscript Received May 4, 2006

This work presents a methodology for the synthesis of self-assembled organic-inorganic materials that integrates high-throughput tools for the synthesis and characterization of solid materials with data-mining techniques in materials science. This is illustrated by a detailed exploration of hydrothermal synthesis in the system SiO2:GeO2:Al2O3:F−:H2O:N(16)-methylsparteinium. Data analysis and dimensionality reduction were conducted using principal component analysis and clustering algorithms, allowing the definition of a new, suitable structural vector that summarizes the X-ray diffraction characterization data and improves data visualization and interpretation. Different modeling techniques were applied to predict the properties of the materials, using the synthesis descriptors as model input. Furthermore, different “material property” descriptors were considered as model outcomes, that is, the crystallinity of the formed phases, the structural principal components computed by principal component analysis, or the clustering results. It was found that the final properties of the materials could be successfully modeled using artificial neural networks and decision trees.

1. Introduction

The application of combinatorial and high-throughput (HT) techniques to materials science can help chemists increase both the number of process variables that can be studied in a reasonable time period and the number of samples produced and characterized.¹⁻³ Moreover, data mining and database technology are applied to analyze and model the large amounts of data generated, which in turn speeds up the discovery and optimization process while helping to establish scientific principles. In recent years, the usefulness of HT methods for the discovery of solid functional materials has been demonstrated.⁴⁻⁸

Indeed, these methods allow the simultaneous study of numerous synthesis and processing variables, which is especially important when dealing with highly nonlinear and multidimensional systems, as is the case for the synthesis of microporous molecular sieves.

The hydrothermal crystallization of microporous materials is governed by a large number of parameters that determine the phases formed and the crystallization kinetics. Despite notable efforts to rationalize the synthesis of zeolites,⁹⁻¹² the relationship between synthesis variables and the zeolite structure formed is not clearly understood, because of the metastable nature of zeolites and the complexity of the synthesis mechanisms involved. As a result, the discovery of new microporous materials is still predominantly an empirical process, though strongly guided by accumulated experience. High-throughput methods should be useful in this field¹³⁻¹⁷ to determine the effect of different synthesis parameters and to help in the discovery of new zeolites.

Very recently, a new zeolite, named ITQ-21, containing Si, Ge, and optionally Al as framework cations was reported.¹⁸ This material presents a unique pore topology formed by nearly spherical large cavities of 1.18 nm diameter, each joined to six neighboring cavities by circular 12-ring pore windows with an aperture of 0.74 nm, which results in a three-directional channel system of fully interconnected

* To whom correspondence should be addressed. Tel.: 34(96)3877800. Fax: 34(96)3877809. E-mail: acorma@itq.upv.es.

(1) Combinatorial Materials Science; Xiang, X. D., Takeuchi, I., Eds.; Dekker: New York, 2003.

(2) Koinuma, H.; Takeuchi, I. Nat. Mater. 2004, 3, 429-438.

(3) Hanak, J. J. Appl. Surf. Sci. 2004, 223, 1-8.

(4) Gorer, A. U.S. Patent 6,723,678, 2004, to Symyx Technologies Inc.

(5) Sohn, K. S.; Seo, S. Y.; Park, H. D. Electrochem. Solid State Lett. 2001, 4, H26-H29.

(6) Boussie, T. R.; Diamond, G. M.; Goh, C.; Hall, K. A.; LaPointe, A. M.; Cheryl Lund, M. L.; Murphy, V.; Shoemaker, J. A. W.; Tracht, U.; Turner, H.; Zhang, J.; Uno, T.; Rosen, R. K.; Stevens, J. C. J. Am. Chem. Soc. 2003, 125, 4306-4317.

(7) Corma, A.; Serra, J. M.; Serna, P.; Argente, E.; Valero, S.; Botti, V. J. Catal. 2005, 229, 513-524.

(8) Klanner, C.; Farrusseng, D.; Baumes, L. A.; Mirodatos, C.; Schüth, F. Angew. Chem., Int. Ed. 2004, 43 (40), 5347-5349.

(9) Piccione, P. M.; Yang, S.; Navrotsky, A.; Davis, M. E. J. Phys. Chem. B 2002, 106, 3629.

(10) Corma, A.; Davis, M. E. ChemPhysChem 2004, 5 (3), 304-313.

(11) Schüth, F.; Schmidt, W. Adv. Eng. Mater. 2002, 4 (5), 269-279.

(12) Rajagopalan, A.; Suh, C.; Li, X.; Rajan, K. Appl. Catal., A 2003, 254, 147-160.

(13) Akporiaye, D. E.; Dahl, I. M.; Karlsson, A.; Wendelbo, R. Angew. Chem., Int. Ed. 1998, 37 (5), 609-611.

(14) Holmgren, J.; Bem, D.; Bricker, M.; Gillespie, R.; Lewis, G.; Akporiaye, D.; Dahl, I.; Karlsson, A.; Plassen, M.; Wendelbo, R. Stud. Surf. Sci. Catal. 2001, 135, 461-470.

(15) Bricker, M. L.; Sachtler, J. W. A.; Gillespie, R. D.; McGoneral, C. P.; Vega, H.; Bem, D. S.; Holmgren, J. S. Appl. Surf. Sci. 2004, 223 (1-3), 109-117.

(16) Pescarmona, P. P.; Rops, J. J. T.; van der Waal, J. C.; Jansen, J. C.; Maschmeyer, T. J. Mol. Catal. A 2002, 182-183, 319-325.

(17) Klein, J.; Lehmann, C. W.; Schmidt, H. W.; Maier, W. F. Angew. Chem., Int. Ed. 1999, 38, 3369.

(18) Corma, A.; Díaz-Cabañas, M. J.; Martínez-Triguero, J.; Rey, F.; Rius, J. Nature 2002, 418, 514-517.

3287 Chem. Mater. 2006, 18, 3287-3296

10.1021/cm060620k CCC: $33.50 © 2006 American Chemical Society
Published on Web 06/20/2006


large cavities. This zeolite was synthesized using a large and rigid structure-directing agent, N(16)-methylsparteinium (MSPT), and the directing effect of Ge toward the formation of structures containing double four rings seems decisive for the synthesis of ITQ-21.¹⁹ Zeolite ITQ-30²⁰ is a new structure of the MWW family, which is closely related to MCM-56²¹ but has clearly different X-ray diffraction (XRD) features. The thermal and hydrothermal stability of these Ge-containing zeolites increases as the germanium content decreases. Therefore, for catalytic applications it is important to find the synthesis conditions under which fully crystalline ITQ-21 can be obtained with the lowest amount of Ge (or none) and the highest acidity [determined by the (Si + Ge)/Al ratio].

Classical designs of experiments (DoE),²² such as factorial or combination designs, have been applied successfully to explore synthesis gel conditions aimed at the discovery of new zeolites or the optimization of existing ones.²³⁻²⁵ Clearly, the synthesis variables should be carefully selected in order to cover the largest part of the most promising parameter space, while keeping the total number of experiments at a reasonable and feasible level. Moreover, the HT methods currently applied for parallel hydrothermal synthesis strongly constrain how the synthesis parameters can be studied experimentally. For instance, when autoclave arrays (multiautoclaves with 15-96 wells) are used, the intensive exploration of crystallization temperature and time is restricted. Therefore, DoE strategies should be developed that take into account the specific aspects of HT methods in this field while minimizing the number of experiments. On the basis of the data analysis/mining methodology applied in this work, we propose a new mapping/exploration approach for reducing the screening of unpromising conditions within the multivariate synthesis spaces found in microporous systems.

2. Experimental Section and the Design of Experiments

A detailed exploration of the hydrothermal synthesis in the system SiO2:GeO2:Al2O3:F−:H2O:MSPT was performed at 175 °C under static conditions, to understand the influence of these factors on the growth of ITQ-21 and ITQ-30. Parallel syntheses were carried out using a robotic system and 15-well Teflon-lined stainless steel autoclaves for the crystallization.²⁵ Crystallinity was measured by XRD, using a multisample Philips X'Pert diffractometer with Cu Kα radiation. A factorial experimental design ($4 \cdot 3^{2} \cdot 2^{2} = 144$) was selected for simultaneously studying the starting-gel molar ratios Al/(Si + Ge), MSPT/(Si + Ge), F−/(Si + Ge), H2O/(Si + Ge), and Si/Ge, as well as the crystallization time. Table 1 shows the values and levels considered for each variable. For experimental details, see the Supporting Information.
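Crossing all six rows of Table 1 independently would give 2 · 4 · 3 · 2 · 2 · 3 = 288 runs, whereas the design has 144 runs and Figure 1 speaks of five synthesis variables. The sketch below enumerates the design under one labeled assumption that reconciles these numbers: F/(Si + Ge) is set equal to MSPT/(Si + Ge), so the two ratios act as a single two-level factor. The text does not state this explicitly; the dictionary keys are hypothetical names.

```python
from itertools import product

# Levels taken from Table 1. ASSUMPTION (not stated in the text): F/(Si+Ge)
# is tied to MSPT/(Si+Ge), so only five factors are varied independently.
LEVELS = {
    "time_days": [1, 5],
    "Si/Ge": [15, 20, 25, 50],
    "Al/(Si+Ge)": [0.02, 0.04, 0.067],
    "MSPT/(Si+Ge)": [0.25, 0.5],
    "H2O/(Si+Ge)": [2, 5, 10],
}


def full_factorial(levels):
    """Return every combination of factor levels as a list of dicts."""
    names = list(levels)
    runs = []
    for combo in product(*(levels[name] for name in names)):
        run = dict(zip(names, combo))
        run["F/(Si+Ge)"] = run["MSPT/(Si+Ge)"]  # tied factor (assumption)
        runs.append(run)
    return runs


design = full_factorial(LEVELS)
print(len(design))  # 144
```

Under this reading the run count matches $4 \cdot 3^{2} \cdot 2^{2} = 144$ exactly; dropping the tie would double it.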

Different data-mining techniques have been applied to extract knowledge about the relationships between synthesis conditions and the occurrence of different zeolite phases, minimizing human participation in the analysis of the large amount of data generated. Furthermore, the advantages of data-mining techniques for processing, visualizing, and interpreting this type of nonlinear data are demonstrated. Three issues are key in our methodology: (i) the analysis and extraction of knowledge (i.e., Pareto analysis and data visualization techniques), (ii) a reduction of the complexity/dimensionality of the problem while minimizing the information loss (i.e., clustering analysis and principal component analysis, PCA), and (iii) modeling, enabling a priori predictions (i.e., classification trees and neural networks, NNs). Moreover, this approach, which combines diverse data-mining techniques, has proven to be a realistic way of statistically treating data from materials science. Finally, we used the NN model based on ITQ-21 crystallinity to minimize the germanium content of the final structure, and thereby increase its thermal stability, while maintaining high crystallinity. More details on the data-mining techniques are given in the Supporting Information.

3. Results and Discussion

3.1. Screening Results: Phase Diagram. Figure 1 shows the phase diagram obtained following the factorial design

(19) Blasco, T.; Corma, A.; Díaz-Cabañas, M. J.; Rey, F.; Rius, J.; Sastre, G.; Vidal-Moya, J. A. J. Am. Chem. Soc. 2004, 126, 13414-13423.

(20) Corma, A.; Díaz-Cabañas, M. J.; Moliner, M.; Martínez, C. Discovery of a new catalytically active and selective zeolite (ITQ-30) by high-throughput synthesis techniques. J. Catal., in press.

(21) Fung, A. S.; Lawton, S. L.; Roth, W. J. U.S. Patent 5,362,697, 1994, to Mobil Oil Corp.

(22) Montgomery, D. C. Design and Analysis of Experiments, 4th ed.; John Wiley & Sons Inc.: New York, 1997.

(23) Tagliabue, M.; Carluccio, L. C.; Ghisletti, D.; Perego, C. Catal. Today 2003, 81, 405-412.

(24) Holmgren, J.; Bem, D.; Bricker, M. L.; Gillespie, R. D.; Lewis, G.; Akporiaye, D.; Dahl, I.; Karlsson, A.; Plassen, M.; Wendelbo, R. Proceedings of the 13th International Zeolite Conference; Montpellier, France, July 8-13, 2001; Galarneau, A., Di Renzo, F., Fajula, F., Vedrine, J., Eds.; Stud. Surf. Sci. Catal. 2001, 135, 461.

(25) Moliner, M.; Serra, J. M.; Corma, A.; Argente, E.; Valero, S.; Botti, V. Microporous Mesoporous Mater. 2005, 78, 73-81.

(26) Lobo, R. F.; Davis, M. E. Microporous Mater. 1994, 3, 61.

Table 1. Levels and Ranges of Synthesis Factors Employed in the Experimental Design

factor            number of levels   level 1   level 2   level 3   level 4
time (days)       2                  1         5
Si/Ge             4                  15        20        25        50
Al/(Si + Ge)      3                  0.02      0.04      0.067
MSPT/(Si + Ge)    2                  0.25      0.5
F/(Si + Ge)       2                  0.25      0.5
H2O/(Si + Ge)     3                  2         5         10

Figure 1. Phase diagram showing the occurring materials as a function of the five synthesis variables (starting gel molar ratios and crystallization time).

3288 Chem. Mater., Vol. 18, No. 14, 2006 Corma et al.


described above. ITQ-21, ITQ-30, and amorphous material were obtained in the explored space. The standard X-ray diffractograms of each crystalline phase are shown in Figure 2. The occurrence and crystallinity of each phase were calculated automatically by integrating the area of its characteristic peaks and referring it to that of the fully crystalline material. For ITQ-21, the integration range is 2θ = 25.4-27.2°, and for ITQ-30 it is 2θ = 24.6-25.4°. Because ITQ-30 also presents diffraction peaks in the 25.4-27.2° region, the ITQ-30 contribution in that region is subtracted on the basis of the crystallinity measured from the peak located at 25.0°. According to their crystallinity, the synthesized materials were assigned to three groups. A material is classified as “amorphous” if both the ITQ-21 and ITQ-30 crystallinities are below 20%. A material is labeled “ITQ-21” if its ITQ-21 crystallinity is higher than 20% and its ITQ-30 crystallinity is below 20%. If the ITQ-30 crystallinity is greater than 20%, the material is labeled “ITQ-30”.
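The integration and labeling rules above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: peak areas are integrated with the trapezoidal rule over the 2θ windows given in the text, a value of exactly 20% is treated as "below" the threshold (the text does not say), and the correction for ITQ-30 peaks overlapping the ITQ-21 window is not reproduced. Function names are hypothetical.

```python
import numpy as np

ITQ21_WINDOW = (25.4, 27.2)  # degrees 2-theta, from the text
ITQ30_WINDOW = (24.6, 25.4)


def band_area(two_theta, intensity, lo, hi):
    """Trapezoidal area of a diffractogram between lo and hi (degrees 2-theta)."""
    two_theta = np.asarray(two_theta, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    mask = (two_theta >= lo) & (two_theta <= hi)
    x, y = two_theta[mask], intensity[mask]
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def relative_crystallinity(sample_area, reference_area):
    """Crystallinity (%) relative to a fully crystalline reference sample."""
    return 100.0 * sample_area / reference_area


def classify(cryst_itq21, cryst_itq30, threshold=20.0):
    """Apply the 20% rule; an ITQ-30 crystallinity above threshold takes precedence."""
    if cryst_itq30 > threshold:
        return "ITQ-30"
    if cryst_itq21 > threshold:
        return "ITQ-21"
    return "amorphous"


print(classify(35.0, 5.0))  # ITQ-21
```

Checking ITQ-30 first mirrors the text, where a sample is labeled ITQ-30 whenever its ITQ-30 crystallinity exceeds 20%, regardless of its ITQ-21 value.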

A first approach using Pareto analysis (Figure 3) shows the relative influence of each synthesis factor on the crystallinity of ITQ-21 and ITQ-30 samples. In this chart, the length of each bar is the estimated effect divided by its standard error, which is equivalent to computing a t statistic for each effect. Bars that extend beyond the vertical line correspond to effects that are statistically significant at the 95% confidence level. This statistical treatment quantifies the apparent weight of each factor in the growth of the materials. The crystallinity of both ITQ-21 and ITQ-30 appears to be strongly and negatively influenced by the water and aluminum contents; that is, the more water or the higher the Al/(Si + Ge) ratio, the less crystalline the samples. Next, MSPT/(Si + Ge) and F/(Si + Ge) play a positive role in the formation of ITQ-21 and ITQ-30. However, some important differences can be observed when comparing the analyses for ITQ-21 and ITQ-30. On one hand, MSPT/(Si + Ge) and F/(Si + Ge) are relatively more important for ITQ-30, because this material can be obtained at the minimum MSPT/(Si + Ge) and F/(Si + Ge) contents only in a few small zones. On the other hand, Si/Ge appears as an important negative factor for ITQ-21, while it becomes slightly positive for ITQ-30. This result should be understood as a penalty on the growth of ITQ-21 when the Si/Ge ratio increases: the crystallinity decreases, and some syntheses also switch to ITQ-30. The same reasoning explains the slight benefit of Si/Ge for ITQ-30, as a balance between the loss of crystallinity and the appearance of new ITQ-30 points. Nevertheless, ITQ-21 samples appear at lower Si/Ge ratios. Finally, the relative influence of time differs markedly between the two materials, being much more important for ITQ-30 than for ITQ-21. This effect of time could be understood as a retransformation of ITQ-21, such that after 1 day ITQ-30 is obtained only with the maximum levels of MSPT/(Si + Ge) and F/(Si + Ge) and the minimum level of Al/(Si + Ge).
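The standardized effects in Figure 3 (each estimated effect divided by its standard error, i.e., a t statistic) can be illustrated with a main-effects least-squares model. This is a sketch under assumptions, not the software behind Figure 3: we assume a linear model without interaction terms and rescale each factor's range to [-1, 1]. Perfectly collinear columns (for example, two ratios that always take the same level) make the normal equations singular, so only one of them should be passed in. For about 140 residual degrees of freedom, the two-sided 95% critical t value is close to 2. Function names are hypothetical.

```python
import numpy as np


def standardized_effects(X, y):
    """t statistics (coefficient / standard error) for a main-effects OLS model.

    Each factor column of X is rescaled to [-1, 1] so coefficients are
    comparable across factors; an intercept column is added.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    Xc = 2.0 * (X - lo) / (hi - lo) - 1.0          # assumes every factor varies
    A = np.column_stack([np.ones(len(y)), Xc])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = len(y) - A.shape[1]
    sigma2 = resid @ resid / dof                   # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return beta[1:] / se[1:]                       # drop the intercept


def pareto_order(names, t):
    """Factors sorted by |t|, largest first, as the bars of a Pareto chart."""
    order = np.argsort(-np.abs(t))
    return [(names[i], float(t[i])) for i in order]
```

A negative t (as reported for water and Al/(Si + Ge)) means crystallinity falls as the factor moves from its low to its high level.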

3.2. Analysis and Knowledge Extraction from HT Experimental Data. In this section, different unsupervised analysis techniques are applied to the original data set derived from the XRD characterization of the whole set of samples, improving data visualization and classification as well as the subsequent knowledge extraction. Specifically, structural vectors are computed from the raw characterization data by means of dimensionality reduction and analysis techniques, that is, clustering algorithms and PCA.

Figure 2. XRD patterns of ITQ-21 and ITQ-30.

Figure 3. Standardized Pareto chart for ITQ-21 and ITQ-30 formation, showing the effect of the different synthesis factors on the crystallinity of each zeolite. The length of each bar displayed in the frequency histogram is proportional to the absolute value of its associated estimated effect.


Clustering analysis of raw XRD data allows the as-synthesized samples to be classified into different structural groups without applying any prior knowledge. This can be of interest when the resulting materials contain mixtures of phases or unknown phases, for which conventional phase identification systems have difficulties. Moreover, this type of data classification enables a high degree of automation in the high-throughput experimental workflow.

3.2.A. Clustering Analysis. The k-means clustering algorithm examines each sample in the population and assigns it to one of the clusters, trying to minimize the intraclass variance and maximize the interclass variance. The centroid of a cluster is recomputed each time a new member is added to it, and this process is repeated until all of the samples are grouped into the selected number of clusters. This methodology is sensitive to the initialization of the centroids: depending on the first randomly chosen centroids, the final solution can change considerably. Therefore, numerous assignments were performed in order to obtain a stable and representative solution.
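The k-means procedure with repeated random initializations can be sketched in NumPy as follows. Two assumptions apply: the sketch uses batch (Lloyd) updates, recomputing all centroids after each full assignment pass rather than the incremental per-sample update described above, and among the restarts it keeps the solution with the lowest within-cluster sum of squares, which is one reasonable reading of "stable and representative". Parameter names and defaults are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_restarts=20, max_iter=100, seed=0):
    """Batch k-means with random restarts; returns (labels, within-cluster SSE)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    best_labels, best_sse = None, np.inf
    for _ in range(n_restarts):
        # Random initialization: k distinct samples serve as starting centroids.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Keep an empty cluster's old centroid instead of dividing by zero.
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            converged = np.allclose(new, centroids)
            centroids = new
            if converged:
                break
        # Final assignment consistent with the final centroids.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        sse = float(d2[np.arange(len(X)), labels].sum())
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels, best_sse
```

With the first data set, X would be the 144 × 800 intensity matrix and k = 3.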

A first data set, constituted by the complete XRD data of each sample, was used for the clustering analysis. It consists of vectors with 800 attributes, corresponding to the intensities measured at each diffraction angle, for the 144 samples. The number of clusters used in the subsequent analysis was investigated by means of a tree diagram (called a dendrogram), using Ward's clustering method (see the Clustering Analysis section in the Supporting Information). In this tree diagram (Figure 4), the different groups of samples are plotted as a function of the relative diversity of each group (linkage distance). This classification analysis shows that two large clusters can be clearly recognized, corresponding to amorphous and crystalline materials, and that the crystalline cluster can be split into two new groups, corresponding to ITQ-21 and ITQ-30 samples. More specific subclusters can be related to slight differences in the XRD diffractograms of a given structure, due to changes in crystallinity or germanium content. From a practical point of view, we selected three clusters, to make a first classification based on the three types of materials identified manually, that is, amorphous, ITQ-21, and ITQ-30.
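Ward's method merges, at each step, the two clusters whose fusion least increases the total within-cluster sum of squares, Δ(A, B) = n_A n_B / (n_A + n_B) · ||c_A - c_B||², where c_A, c_B are the centroids and n_A, n_B the cluster sizes. The naive NumPy sketch below is not the software used for Figure 4. It records these merge costs, which a dendrogram plots as merge heights (packages may plot a monotone transform of them instead), and it stops at a chosen number of clusters. Names are hypothetical.

```python
import numpy as np


def ward_clusters(X, n_clusters):
    """Naive agglomerative clustering with Ward's criterion.

    Returns (labels, merge_costs), where merge_costs[i] is the increase in
    within-cluster sum of squares caused by the i-th merge.
    """
    X = np.asarray(X, dtype=float)
    members = [[i] for i in range(len(X))]
    centroids = [x.copy() for x in X]
    merge_costs = []
    while len(members) > n_clusters:
        C = np.array(centroids)
        sizes = np.array([len(m) for m in members], dtype=float)
        sq = (C ** 2).sum(axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * C @ C.T, 0.0)
        cost = sizes[:, None] * sizes[None, :] / (sizes[:, None] + sizes[None, :]) * d2
        np.fill_diagonal(cost, np.inf)             # never merge a cluster with itself
        a, b = np.unravel_index(np.argmin(cost), cost.shape)
        a, b = int(min(a, b)), int(max(a, b))
        na, nb = sizes[a], sizes[b]
        centroids[a] = (na * centroids[a] + nb * centroids[b]) / (na + nb)
        members[a] = members[a] + members[b]
        merge_costs.append(float(cost[a, b]))
        del members[b], centroids[b]               # b > a, so index a is unaffected
    labels = np.empty(len(X), dtype=int)
    for k, m in enumerate(members):
        labels[m] = k
    return labels, merge_costs
```

A large jump in the last few merge costs is what makes a cut at two or three clusters look natural in a dendrogram.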

Figure 4. Tree diagram (dendrogram) showing the Euclidean distances between the different clusters and subclusters.

Figure 5. XRD measurements of the as-synthesized samples ordered according to the cluster distribution obtained by the k-means algorithm using the second data set.

A second data set, constituted by the XRD data of each sample in the characteristic 2θ range of ITQ-30 (24.5-27.5°), was also considered. Figure 5 shows a general visualization of the XRD data, ordered according to the clusters obtained by the k-means clustering algorithm using the second data set. Figure 6 shows the good match between the clusters obtained by k-means analysis for both data sets and the corresponding material/phase. Clustering analysis using the complete XRD pattern accurately distinguishes amorphous from crystalline materials, whereas it fails for a few samples when distinguishing between the ITQ-21 and ITQ-30 phases (Table 2). However, the separation between ITQ-21 and ITQ-30 samples can be improved by taking into account only the 2θ range in which these two structures present different peaks (24.5-27.5°). k-Means clustering restricted to this range strongly improves the classification of both phases (from 89.7% to 100.0% for ITQ-21 and from 69.2% to 92.3% for ITQ-30), although the classification accuracy for the amorphous samples is reduced (from 99.0% to 87.3%). Figure 7 presents the averaged XRD pattern of each cluster (first data set), showing the good match between the clustering analysis and phase identification (compare the diffractograms of the standard ITQ-21 and ITQ-30 samples in Figure 2). The characteristic peaks of ITQ-30 can be observed, and the averaged diffractogram can be clearly distinguished from the ITQ-21 XRD pattern.

Figure 6. Identification of the formed phase using k-means clustering analysis.

Figure 7. Averaged XRD diffractogram for the three clusters obtained by k-means analysis.

Figure 8. Distribution of the three different phases in the SPC coordinates. (PCA computed using the whole of the XRD data, first data set.)

Table 2. Clustering Analysis Carried out Using the XRD Data, Showing the Match between Clustering Results and Phase Identification

cluster          k-means match (%), specific 2θ range   k-means match (%), complete 2θ range
1. amorphous     87.3                                    99.0
2. ITQ-21        100.0                                   89.7
3. ITQ-30        92.3                                    69.2
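The two ingredients of Table 2, restricting each pattern to the 24.5-27.5° window and scoring clusters against manually identified phases, can be sketched as below. The text does not define "match (%)"; this sketch assumes one plausible definition: each cluster is mapped to the phase that is most frequent within it, and the match for a phase is the percentage of that phase's samples that fall in a cluster mapped to it. Function names are hypothetical.

```python
import numpy as np


def restrict_window(two_theta, patterns, lo=24.5, hi=27.5):
    """Keep only the intensity columns whose 2-theta lies in [lo, hi]."""
    two_theta = np.asarray(two_theta, dtype=float)
    mask = (two_theta >= lo) & (two_theta <= hi)
    return np.asarray(patterns)[:, mask]


def match_percent(cluster_labels, phase_labels):
    """Per-phase match (%) after mapping each cluster to its majority phase."""
    cluster_labels = np.asarray(cluster_labels)
    phase_labels = np.asarray(phase_labels)
    mapping = {}
    for c in np.unique(cluster_labels):
        phases, counts = np.unique(phase_labels[cluster_labels == c], return_counts=True)
        mapping[c] = phases[counts.argmax()]       # majority phase of cluster c
    predicted = np.array([mapping[c] for c in cluster_labels])
    return {str(p): float(100.0 * np.mean(predicted[phase_labels == p] == p))
            for p in np.unique(phase_labels)}
```

Under this definition, a phase whose samples are split across clusters dominated by other phases (as suggested for ITQ-30 with the complete pattern) receives a low match even if its own cluster is pure.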

3.2.B. Principal Component Analysis. The principal components computed from the whole XRD data set will be referred to as structural principal components (SPCs) from here on. When PCA is applied, the XRD vector of each sample (800 intensities, one per 2θ angle) can be reduced to a vector of only three new variables (SPCs) without losing the main information in the original data, because these three components account for 81.8% of the cumulative variance. The percentages of variance for the individual components (SPC#1, SPC#2, and SPC#3) are 39.8%, 32.8%, and 9.2%, respectively. Thanks to this simplification of the original vector, the distribution of the samples can now be easily visualized in the three-dimensional SPC space. The results of the k-means clustering algorithm and the PCA can be combined, as shown in Figure 8: the SPC projections of the samples are clearly separated from one cluster to another.
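The reduction from 800 intensities to three SPCs can be sketched with a singular value decomposition of the mean-centered intensity matrix. The preprocessing is an assumption: the text does not say whether intensities were scaled before PCA, so this sketch only centers each 2θ column, and it need not reproduce the reported 39.8%, 32.8%, and 9.2% (81.8% in total). The function name is hypothetical.

```python
import numpy as np


def pca_scores(X, n_components=3):
    """Project mean-centered rows of X onto the leading principal components.

    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each 2-theta column
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)          # fraction of variance per component
    scores = Xc @ Vt[:n_components].T            # SPC-like coordinates per sample
    return scores, explained[:n_components]
```

For the first data set, X would be the 144 × 800 pattern matrix; `scores` then gives each sample's (SPC#1, SPC#2, SPC#3) coordinates, and `explained.sum()` gives the cumulative variance retained.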

Diffraction data usually contain information about the type of crystalline phase as well as the crystallinity of the material, crystallite size, zeolite framework composition, and so forth. Indeed, it has been reported¹⁹ that the ITQ-21 crystallite size can be tuned from nanocrystals to large crystals by controlling the rates of nucleation and crystal growth through the H2O/(Si + Ge) ratio. In the present study, to rationalize the meaning of the SPC space, we examine how phase crystallinity and framework composition vary within this new space. On one hand, Figure 9 shows the distribution of ITQ-21 and ITQ-30 samples with different degrees of crystallinity in the SPC space. The samples are clearly ordered in this space, so that crystallinity can be correlated with the SPCs. On the other hand, the correlation between the germanium content of the ITQ-21 framework and the SPCs was studied. Given that the Si/Ge ratio in the starting gel has been shown to be a very influential factor in the final crystallinity of ITQ-21 (see the Pareto analysis in Figure 3), the effect of Si/Ge was examined separately from the correlation between the SPCs and crystallinity, by comparing samples of similar crystallinity. Specifically, Figure 10 represents the third SPC as a function of Si/Ge for three different degrees of crystallinity. SPC#3 is clearly strongly correlated with the structural changes produced by variation of the framework Si/Ge ratio. This correlation is attributed to the information that PCA extracts from the XRD peak shift produced by the isomorphic substitution of Si by Ge in the zeolite framework, as can be clearly seen in the inset of Figure 10. No correlation was found between Si/Ge and the remaining two SPCs.
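Why Ge substitution appears as a peak shift can be illustrated with a standard relation that does not come from the paper: Ge-O bonds are longer than Si-O bonds, so Ge incorporation tends to expand the unit cell and increase the interplanar spacings d. Differentiating Bragg's law at fixed wavelength λ and order n gives the first-order shift below; the 0.5% expansion in the numerical line is a hypothetical value chosen only for illustration.

```latex
n\lambda = 2d\sin\theta
\quad\Longrightarrow\quad
0 = 2\,\Delta d\,\sin\theta + 2d\cos\theta\,\Delta\theta
\quad\Longrightarrow\quad
\Delta\theta \approx -\tan\theta\,\frac{\Delta d}{d}

% hypothetical example: 2\theta = 26^\circ (\theta = 13^\circ), \Delta d/d = +0.5\%
\Delta(2\theta) \approx -2\tan(13^\circ)\times 0.005\ \text{rad}
\approx -2.3\times 10^{-3}\ \text{rad} \approx -0.13^\circ
```

A higher Ge content (lower Si/Ge) therefore moves reflections to lower 2θ, a small but systematic change across the pattern that a principal component can capture.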

Figure 9. Identification of different structural properties in the SPC space: distribution of ITQ-21 and ITQ-30 with different ranges of crystallinity.

Figure 10. Identification of different structural properties in the SPC space for ITQ-21 samples: correlation between SPC#3 and Si/Ge in the starting gel, for three different degrees of crystallinity. Inset: Partial diffractograms corresponding to four samples with different Si/Ge ratios and the same crystallinity (20%), showing the peak shift.


Consequently, the SPCs summarize the information contained in the XRD patterns concerning the different structural and morphological changes across all of the materials explored. These results demonstrate that applying dimensionality reduction techniques such as PCA to the raw XRD data yields, in a fully automated manner, a new set of structural components that describes the main properties of the synthesized samples. In addition, these structural vectors can be used to improve the predictive performance of QSAR/QSPR models, such as NNs, and to develop new exploration (mapping) tools for nonlinear, multidimensional spaces such as those found in the development of new microporous materials.

3.3. Construction of Predictive Models (QSPR/QSAR).

3.3.A. Predictive Modeling of Material Properties from Synthesis Descriptors. As a first step, NN models were obtained using the synthesis descriptors as input and the zeolite crystallinity as output. Very good prediction results were obtained using an NN with two hidden layers trained with the back propagation algorithm (α = 0.3). A total of 70% of the data were employed for training and the rest for testing. Figure 11 shows the experimental and predicted crystallinity for both zeolites, illustrating the high accuracy of the model despite the experimental error associated with the synthesis and characterization steps. Subsequently, this predictive model was used to find the theoretical synthesis conditions that maximize the ITQ-21 crystallinity while keeping the molar ratio Si/Ge > 30. Three different sets of conditions with predicted crystallinity around 60% were selected for experimental testing, with 2 days of crystallization time. The

Figure 11. Prediction performance of the NN model using the synthesis factors as input and the crystallinity of ITQ-21 and ITQ-30 as output. (Net topology 5_10_4_2, trained using BackProp with the Momentum algorithm and 80% data.)

Figure 12. Decision tree ID3-IV obtained using synthesis descriptors as model input and phase clusters as output. [The importance of each factor is as follows: Si/Ge 100%, Al/(Si + Ge) 79%, MSPT/(Si + Ge) 72%, H2O/(Si + Ge) 70%, and crystallization time 38%.] The initial data partition, called the initial branch or root, encompasses all data records. This root is split into subsets, or child branches, on the basis of the value of a particular input field, which may in turn be split again into sub-branches, and so on.

Table 3. NN and Decision Tree Prediction Performances of the Obtained Phase Using the Synthesis Variables as Model Input

class        % DT accuracy   % NN accuracy
amorphous    92.16           96.08
ITQ-21       93.10           93.10
ITQ-30       92.31           92.31

experimental crystallinity achieved was slightly lower than expected, close to 50% for these samples, as shown in Figure 11 (filled squares).
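Using a trained model to propose synthesis conditions amounts to searching the descriptor space for the highest predicted crystallinity while enforcing the constraint Si/Ge > 30. A minimal grid-search sketch follows; `toy_predict` is a hypothetical stand-in for the trained NN, and the response surface, grid values, and descriptor order are illustrative assumptions. Returning the top three points mirrors the three sets of conditions tested experimentally.

```python
import itertools
import numpy as np

def best_conditions(predict, grid, constraint, n_best=3):
    """Grid points with the highest predicted value among those satisfying constraint."""
    points = [p for p in itertools.product(*grid) if constraint(p)]
    X = np.array(points, dtype=float)
    y = predict(X)
    order = np.argsort(y)[::-1][:n_best]          # largest predictions first
    return X[order], y[order]

# Hypothetical stand-in for the trained NN: crystallinity (%) as a function of
# (Si/Ge, Al/(Si+Ge), MSPT/(Si+Ge), H2O/(Si+Ge), time in days).
def toy_predict(X):
    si_ge, al, mspt, h2o, t = X.T
    return 80 - 0.4 * si_ge - 300 * al + 20 * mspt - 0.5 * (h2o - 7) ** 2 + 2 * t

grid = [
    [20, 30, 40, 50],       # Si/Ge
    [0.0, 0.02, 0.04],      # Al/(Si+Ge)
    [0.25, 0.5],            # MSPT/(Si+Ge)
    [3, 7, 15],             # H2O/(Si+Ge)
    [2],                    # crystallization time fixed at 2 days
]
X_best, y_best = best_conditions(toy_predict, grid, constraint=lambda p: p[0] > 30)
for x, y in zip(X_best, y_best):
    print(x, round(float(y), 1))
```

An exhaustive grid is only practical for a few coarse levels per factor; for finer searches a stochastic optimizer could replace `itertools.product` without changing the constraint handling.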

Subsequently, predictive models based on decision trees and NNs were computed using only the type of material formed as output. Figure 12 shows the best decision tree found, which successfully describes the type of material formed as a function of the synthesis variables. Table 3 compares the prediction performance of the NN and decision tree models: both are highly accurate, although the NN model is slightly better (for the amorphous class). The relative importance of each input factor in the occurrence of each phase follows, in both models, the order Si/Ge > Al/(Si + Ge) > MSPT/(Si + Ge) ≈ H2O/(Si + Ge) > time, in contrast with the standardized effects observed for the crystallinity of each phase (Figure 3), where H2O/(Si + Ge) and Al/(Si + Ge) played the major roles for ITQ-21 and ITQ-30, respectively.
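ID3-type decision trees, such as the one in Figure 12, choose each split by maximizing the information gain, that is, the reduction in entropy of the class labels (here, the phase formed). The sketch below computes that criterion for discretized descriptors; the toy records and the low/high binning of Si/Ge and Al are hypothetical, and the exact ID3-IV variant used in the study may differ.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(records, attribute, target="phase"):
    """Entropy reduction obtained by splitting records on a categorical attribute."""
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return entropy([r[target] for r in records]) - remainder

# Hypothetical, discretized synthesis records with the observed phase as class.
records = [
    {"Si/Ge": "low",  "Al": "low",  "phase": "ITQ-21"},
    {"Si/Ge": "low",  "Al": "high", "phase": "ITQ-21"},
    {"Si/Ge": "high", "Al": "low",  "phase": "ITQ-30"},
    {"Si/Ge": "high", "Al": "high", "phase": "amorphous"},
    {"Si/Ge": "low",  "Al": "low",  "phase": "ITQ-21"},
    {"Si/Ge": "high", "Al": "low",  "phase": "ITQ-30"},
]
gains = {a: information_gain(records, a) for a in ("Si/Ge", "Al")}
root = max(gains, key=gains.get)   # ID3 places the highest-gain attribute at the root
print(gains, root)
```

The tree is grown by repeating the same choice inside each child branch on the records that reach it.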

As a second step, predictive models were computed using the SPCs as model output and the synthesis variables as input. This approach may allow prediction of the structural properties of a material, making it possible to distinguish the type of phase (known or unknown), crystallinity, framework composition, and so forth. The SPC output is well-suited when the aims of the exploration are both the discovery of new structures and the optimization of a given feature when competing phases are also formed. Given that the Pareto analysis showed the synthesis variables to be the main factors in the growth of both ITQ-21 and ITQ-30, and bearing in mind that the SPCs are strongly correlated with the type of material formed, its crystallinity, and its framework composition, clear relationships between the synthesis descriptors and the SPCs were expected. Following this approach, an accurate NN model was obtained using the available data (70% for training and 30% for validation), trained with the back propagation algorithm (α = 0.3). Figure 13 shows the observed SPCs versus the predicted ones; the average prediction error for the test samples is about 10%.
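The SPC model can be illustrated with a small multilayer perceptron trained by back propagation with momentum, mapping synthesis descriptors to several SPC outputs at once with a 70/30 train/test split. This numpy sketch is not the network used in the study: the synthetic data, the single hidden layer of 10 tanh units, the learning rate of 0.1, and the momentum coefficient of 0.3 are assumed settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 5 scaled synthesis descriptors -> 3 SPC-like targets.
n = 200
X = rng.uniform(-1.0, 1.0, size=(n, 5))
Y = np.tanh(X @ rng.normal(scale=0.7, size=(5, 3)))

idx = rng.permutation(n)
train, test = idx[: int(0.7 * n)], idx[int(0.7 * n):]   # 70/30 split

# One hidden layer of 10 tanh units and linear outputs (assumed topology).
W1 = rng.normal(scale=0.5, size=(5, 10))
b1 = np.zeros(10)
W2 = rng.normal(scale=0.5, size=(10, 3))
b2 = np.zeros(3)
params = [W1, b1, W2, b2]
velocity = [np.zeros_like(p) for p in params]
lr, momentum = 0.1, 0.3                                  # assumed settings

def forward(inputs):
    hidden = np.tanh(inputs @ W1 + b1)
    return hidden, hidden @ W2 + b2

for epoch in range(2000):
    H, out = forward(X[train])
    err = out - Y[train]                     # d(0.5 * squared error)/d(output), per sample
    dH = (err @ W2.T) * (1.0 - H**2)         # back-propagate through tanh
    grads = [X[train].T @ dH / len(train), dH.mean(axis=0),
             H.T @ err / len(train), err.mean(axis=0)]
    for p, v, g in zip(params, velocity, grads):
        v *= momentum
        v -= lr * g                          # heavy-ball momentum step
        p += v                               # in-place update of W1, b1, W2, b2

_, pred = forward(X[test])
rmse = float(np.sqrt(np.mean((pred - Y[test]) ** 2)))
print(f"test RMSE {rmse:.3f} vs. target std {Y[test].std():.3f}")
```

Predicting all SPCs jointly lets the hidden layer share features across outputs; separate single-output networks would be a reasonable alternative.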

Considering all of the predictive results based on decision trees and NNs, Figure 12 shows that the lowest Ge content at which ITQ-21 can be synthesized with high crystallinity corresponds to a Si/Ge ratio of 37.5. This agrees with previous results 19 suggesting that ITQ-21 could be obtained for a Si/Ge ratio of 25 but not for 50.

Figure 13. NN prediction performance of the SPC using the synthesis factors as input. The correlation factor for the crystallinity of ITQ-21 and ITQ-30 is 0.960 and 0.958, respectively. The inset shows the topology of the best NN.

Figure 14. Eigenvalues for two different data set sizes: on the left-hand side, 60% of the whole available set of experiments is used, while on the right-hand side, only 40% is used for the calculation of the eigenvectors.

This helps to fine-tune the synthesis conditions for the lowest-Ge-content ITQ-21 samples, which are expected to have the highest stability and better catalytic performance.
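Converting the Si/Ge ratio r into the Ge fraction of the Si + Ge framework atoms makes this compositional limit more concrete. The conversion below ignores Al and assumes that the gel ratio carries over to the framework, which is an approximation:

```latex
x_{\mathrm{Ge}} = \frac{\mathrm{Ge}}{\mathrm{Si}+\mathrm{Ge}} = \frac{1}{1 + r},\qquad
r = 25:\ x_{\mathrm{Ge}} = \tfrac{1}{26} \approx 0.038,\quad
r = 37.5:\ x_{\mathrm{Ge}} = \tfrac{1}{38.5} \approx 0.026,\quad
r = 50:\ x_{\mathrm{Ge}} = \tfrac{1}{51} \approx 0.020
```

Moving from Si/Ge = 25 to 37.5 therefore lowers the Ge fraction by roughly one third.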

3.3.B. Predictive Modeling of Phase Type from the Structural Principal Components. Finally, the correlation between the SPCs and the type of structure was studied by NN modeling. Care is required in this study, both to avoid overfitting the data and to keep the methodology realistic. Therefore, the stability of the approach was tested by drastically reducing the number of experiments used to compute the PCA. Two different sizes, 40% and 60% of the whole available data set, were used for the calculation of the eigenvectors, and the first three principal components were kept in both analyses (see Figure 14). The remaining unseen experimental data (60% and 40%, respectively) were then projected into the reduced space using the analytic definition of the selected principal components (i.e., the first three components); see Figure 15. Next, NNs were trained using only the materials used for the PCA calculations, with PCA coordinates as input and phase types as output. Thus, once the coordinates of the unseen solids have been calculated from the definition of the PCA axes, the NN is used in a second step to assign each of them a label corresponding to the expected phase class. Table 4 indicates the recognition rates for both the training and test sets for the most drastic PCA study (i.e., 40% of the data for

Figure 15. 3D scatter plot of the first three principal components. The left-hand side shows the experiments corresponding to the 40% of the entire data set used for the calculation of the eigenvectors, while the right-hand side shows the projected unseen materials.

Table 4. Best Selected NN: MLP 3:3-10-3:1 (a)

                   Real classes
                   training set: 100% recognition    test set: 96% recognition
predicted class    1     2     3                     1     2     3
1                  35    0     0                     58    0     0
2                  0     16    0                     0     17    0
3                  0     0     6                     1     2     9

(a) NN prediction performances of the obtained phase using the SPC coordinates as input.

Figure 16. Data mining applied in the development of new solid materials: methodology for automated data analysis, visualization, and QSPR modeling.

component calculation). It can be argued that the NN plays a rather minor role here, because the classes are sharply separated in the PCA space. Nevertheless, the recognition rates are high (100% for the training set and 96% for the test set), which suggests that this approach is of considerable interest.
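The stability test can be sketched as follows: the PCA mean and eigenvectors are computed from a 40% subset only, the remaining samples are projected with that fixed definition, and a classifier trained on the subset labels the projected points. In this numpy sketch a nearest-centroid rule deliberately replaces the MLP 3:3-10-3:1 network (a simplification in the spirit of the remark that classes are sharply separated in PCA space), and the three synthetic "phases" are hypothetical. The `recognition_rate` helper is also applied to the Table 4 test counts.

```python
import numpy as np

def fit_pca(X, k=3):
    """PCA definition (mean and first k eigenvectors) from the fitting subset only."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(X, mean, components):
    """Coordinates of (possibly unseen) samples in the fixed PCA space."""
    return (X - mean) @ components.T

def recognition_rate(confusion):
    """Fraction of samples on the diagonal of a confusion matrix."""
    confusion = np.asarray(confusion)
    return np.trace(confusion) / confusion.sum()

rng = np.random.default_rng(0)
prototypes = rng.uniform(0.0, 1.0, size=(3, 200))   # hypothetical mean pattern per phase
labels = rng.integers(0, 3, size=150)
X = prototypes[labels] + rng.normal(scale=0.05, size=(150, 200))

idx = rng.permutation(150)
fit_idx, unseen_idx = idx[:60], idx[60:]             # 40% for PCA, 60% unseen
mean, comps = fit_pca(X[fit_idx])
Z_fit = project(X[fit_idx], mean, comps)
Z_unseen = project(X[unseen_idx], mean, comps)

# Nearest-centroid classifier in PCA space (stand-in for the NN).
centroids = np.vstack([Z_fit[labels[fit_idx] == c].mean(axis=0) for c in range(3)])
dist = np.linalg.norm(Z_unseen[:, None, :] - centroids[None, :, :], axis=2)
predicted = dist.argmin(axis=1)

confusion = np.zeros((3, 3), dtype=int)
np.add.at(confusion, (predicted, labels[unseen_idx]), 1)   # rows: predicted, columns: real
print(confusion, recognition_rate(confusion))

table4_test = [[58, 0, 0], [0, 17, 0], [1, 2, 9]]
print(round(recognition_rate(table4_test), 3))              # 84 of 87 test samples
```

The key point is that `fit_pca` never sees the unseen samples; they enter only through `project`, as in the procedure described above.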

4. Conclusions

This work presents a complete study integrating high-throughput tools for the synthesis and characterization of solid materials with data-mining techniques for the discovery and optimization of new microporous materials. The phase diagram of the system SiO2:GeO2:Al2O3:F-:H2O:N(16)-methylsparteinium hydroxide has been systematically explored following a factorial design, determining the effects of the starting gel composition and of the crystallization time. Two different zeolites (ITQ-21 and ITQ-30) were detected within the explored space.

Data visualization and dimensional reduction were conducted using principal components analysis and clustering algorithms, allowing extraction of the desired structural vectors from the XRD characterization data. These unsupervised techniques provide a view of the screening results that is closer to the topology of the explored multidimensional space, including information about the formed phase(s), crystallinity of the material, particle size, and degree of isomorphic substitution, while also reducing the experimental noise of the original characterization data. Moreover, this type of analysis can easily be automated without any prior knowledge of the problem.
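The noise-reduction effect follows from reconstructing each pattern from only the leading principal components, which discards the variance spread across the remaining, noise-dominated directions. A minimal numpy sketch, assuming synthetic patterns built from two hypothetical phase reflections plus Gaussian noise:

```python
import numpy as np

def pca_denoise(X, k):
    """Reconstruct the rows of X from their first k principal components."""
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean + (U[:, :k] * S[:k]) @ Vt[:k]

def rms_error(A, reference):
    """Root-mean-square deviation of A from the noise-free reference."""
    return float(np.sqrt(np.mean((A - reference) ** 2)))

rng = np.random.default_rng(2)
two_theta = np.linspace(5.0, 40.0, 800)
peak_a = np.exp(-0.5 * ((two_theta - 7.0) / 0.2) ** 2)    # hypothetical phase A reflection
peak_b = np.exp(-0.5 * ((two_theta - 8.5) / 0.2) ** 2)    # hypothetical phase B reflection
weights = rng.uniform(0.0, 1.0, size=(50, 2))
clean = weights @ np.vstack([peak_a, peak_b])
noisy = clean + rng.normal(scale=0.05, size=clean.shape)

denoised = pca_denoise(noisy, k=2)
print(f"RMS error: noisy {rms_error(noisy, clean):.4f}, "
      f"PCA-reconstructed {rms_error(denoised, clean):.4f}")
```

Here k = 2 is known because the synthetic data contain two phases; with real data k would come from the eigenvalue spectrum.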

Different modeling techniques were applied to predict the properties of the materials obtained, using the synthesis data as model input. Furthermore, different “material property” descriptors were considered as model outcome, that is, crystallinity of the formed phase, SPCs computed by PCA, or clustering results. It was found that the final properties of the materials could be successfully modeled using neural networks, which gave high-quality predictions, especially when SPCs were used as model output.

This proposed methodology (see Figure 16) for unsupervised characterization analysis and subsequent predictive modeling could be applied when other material properties are to be explored or optimized, for instance, acidity, fluorescence/phosphorescence, or adsorption properties, and when other characterization techniques are employed, such as Raman, NMR, photoluminescence spectroscopy, and IR imaging. Finally, these predictive models could be used to guide the next experimental round, allowing one to skip the screening of materials predicted to perform poorly and promoting the synthesis of new materials that are dissimilar to those in the explored space, thereby accelerating the exploration of the multiparametric space.
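Guiding the next experimental round with a predictive model can be sketched in two steps: discard candidates whose predicted performance falls below a threshold, then pick, one at a time, the remaining candidate farthest from everything already explored or selected (a greedy max-min diversity rule). The helper `select_next_round`, the threshold, the max-min criterion, and the random descriptors are assumptions for illustration, not the selection procedure of this study.

```python
import numpy as np

def select_next_round(candidates, predicted, explored, threshold, n_select):
    """Hypothetical helper: greedy max-min selection among predicted good performers.

    candidates: (m, d) scaled synthesis descriptors; predicted: (m,) model output;
    explored: (e, d) descriptors of materials already synthesized.
    """
    pool = candidates[predicted >= threshold]        # skip predicted low performers
    reference = explored.copy()
    chosen = []
    for _ in range(min(n_select, len(pool))):
        # distance from each pool point to its nearest explored or selected point
        d = np.linalg.norm(pool[:, None, :] - reference[None, :, :], axis=2).min(axis=1)
        best = int(d.argmax())                       # most dissimilar candidate
        chosen.append(pool[best])
        reference = np.vstack([reference, pool[best]])
        pool = np.delete(pool, best, axis=0)
    return np.array(chosen)

rng = np.random.default_rng(3)
explored = rng.uniform(0.0, 0.5, size=(30, 5))     # hypothetical explored region
candidates = rng.uniform(0.0, 1.0, size=(500, 5))
predicted = rng.uniform(0.0, 100.0, size=500)      # stand-in for predicted crystallinity
batch = select_next_round(candidates, predicted, explored, threshold=60.0, n_select=8)
print(batch.shape)
```

Adding each selected point to `reference` keeps the batch itself spread out, not just far from the explored set.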

Acknowledgment. Financial support from the Spanish government (Project MAT 2003-07945-C02-01 and Grants TIC2003-07369-C02-01 and FPU AP2003-4635) and the E.U. Commission (TOPCOMBI Project) is gratefully acknowledged. The authors thank I. Millet and J. Herrera for technical assistance.

Supporting Information Available: Details for data mining techniques. This material is available free of charge via the Internet at http://pubs.acs.org.

CM060620K
