MMP Manuscript A


  • 8/14/2019 MMP Manuscript A

    1/15

    Using Genetic Programming for an Advanced Performance Assessment of

    Industrially Relevant Heterogeneous Catalysts

    L.A. Baumes1*, A. Blansché2, P. Serna1, A. Tchougang2, N. Lachiche2, P. Collet2, A. Corma1

    1 Instituto de Tecnología Química, UPV, Av. Naranjos s/n, E-46022 Valencia, Spain

    2 Université Louis Pasteur, LSIIT, FDBT, Pôle API, F-67400 Illkirch, France

    * Corresponding author(s)

    Abstract

    Besides the ease and speed brought by automated synthesis stations and reactor technologies in materials science, adapted informatics tools must be further developed to handle the increase in throughput and data volume without slowing down the whole process. This paper reports the use of genetic programming (GP) in heterogeneous catalysis. Although GP has received little attention in this domain, it is shown how such an approach can be turned into a singular and powerful tool for solid optimization, discovery, and monitoring. Jointly with neural networks, the GP paradigm is employed to accurately and automatically estimate the whole conversion versus time curve in the epoxidation of large olefins using the titanosilicates Ti-MCM-41 and Ti-ITQ-2 as catalysts. In contrast to previous studies in combinatorial materials science and high-throughput screening, it was possible to estimate the entire evolution of the catalytic reaction for unsynthesized catalysts. Consequently, the evaluation of the performance of virtual solids is not reduced to a single point (e.g. the conversion level at one given reaction time, or the initial reaction rate). The methodology is thoroughly detailed, with emphasis on the comparison between a newly proposed CAX crossover operator and the traditional one.

    Keywords: High-throughput, Data Mining, Genetic Programming, Materials Science, Heterogeneous

    Catalysis


    1. Introduction

    The availability of long-chain linear olefins from Fischer-Tropsch units opens new possibilities to obtain long-chain aliphatic epoxides that can be functionalised for application in lubricants, plasticizers, chemicals and fine chemicals production. Among the different catalytic systems able to carry out the epoxidation of double bonds, micro- and mesoporous titanosilicates1,2,3 have been shown to be more efficient catalysts than other metal-based materials.4,5,6 Considering this, and the fact that extra-large pores or high external surface areas are required to avoid diffusional restrictions when reacting large olefins, the structured mesoporous material7,8,9 MCM-41 and the delaminated zeolite10,11,12,13,14 ITQ-2 were selected in this paper as silica supports for grafting active Ti species (see Figure 1).

    Figure 1. Synthesis of the catalysts. Right - First, one of the two supports is selected. Then a given amount of titanium is grafted onto the surface. Finally, a given amount of one of the four selected silylating agents is grafted on the solid. Left - Example of a catalyst with ITQ-2 as support and SiMe3 as silylating agent.

    On the other hand, the catalytic activity of such materials can be improved by properly controlling their surface properties, taking into account that the hydrophilic nature of these silica supports can contribute to the deactivation of the Ti sites through water adsorption and the formation of by-products such as diols. Therefore, the design of an efficient epoxidation catalyst requires not only the synthesis of highly active sites, but also a way to prevent their poisoning during the reaction. Tailoring the hydrophobicity allows an optimum adsorption of the reactants, while reducing the adsorption of water and the opening of the desired product (epoxide) to form diols (see Figure 2), which would lead to the deactivation of the catalyst.7

    Figure 2. Reaction scheme. The starting reactant is on the left, the target product is in the middle, and the molecule to avoid is on the right-hand side.

    In the present work, this control has been achieved by anchoring alkyl-silylated agents onto the catalyst surface (see Figure 1), whose apolar character modifies the final hydrophilicity of the material. During the silylation process, the amount of grafted molecules and the nature of the alkyl ligands are key parameters. Four different silylating agents have been selected to test their ability to protect the Ti active sites from the presence of water. Such a procedure introduces numerous variables to be optimised, requiring a substantial experimental effort, which has been reduced by using high-throughput synthesis and testing apparatus,15 see Figure 3. In our previous work,16 the amount of grafted Ti, the level of silylation, and the nature of the silylating agent on the two different supports (MCM-41 and ITQ-2) were studied for the epoxidation of a C10 n-olefin, taking the initial reaction rate as the performance criterion. In contrast to most prior studies16,17 in combinatorial materials science and high-throughput screening applied to heterogeneous catalysis, which restrict the data analysis to a single standpoint (e.g. the conversion value at one given reaction time, or the initial reaction rate), we want to extract more information from the previously collected data in order to automatically compare the behaviour of the materials according to different catalytic criteria.

    Figure 3. High-throughput equipment. Left - Automated solid and liquid handling station for catalyst synthesis. Right - Parallel batch reactors in which catalysts and reactants are mixed and analyzed.

    In the absence of complete kinetic studies of the different synthesized catalysts, which could not be tackled in practice due to the relatively large number of experiments, a new approach needs to be proposed. To do this, a genetic programming18 (GP) technique is employed in order to discover one analytical function f behind the general shape corresponding to all the previously tested catalysts c_i, i = 1..C. The GP objective can be formulated as the minimization of the error e, taking into account all the conversion measurements x_{i,t}, t = t_j..t_T, for the whole dataset. Therefore, considering a given function, its parameters b_{i,k}, k = 1..N, are fitted using the Levenberg-Marquardt method for each solid.

    $$\min \sum_{i}^{C} e_{c_i} \quad \text{with} \quad e_{c_i} = \sum_{t}^{T} \left( x_{i,t} - \hat{x}_{i,t} \right)^{2} \quad \text{and} \quad \hat{x}_{i,t} = f(b_{i,k}, t) \qquad \text{(Equation 1)}$$

    Here \hat{x}_{i,t} denotes the conversion estimated by f for catalyst c_i at time t.
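    The fitting step and the objective of Equation 1 can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the implementation used in this work: the function `candidate`, the sampling times, the synthetic conversions and the starting point are all hypothetical, and `fit_lm` is a textbook two-parameter Levenberg-Marquardt loop with a forward-difference Jacobian written only for this example.

```python
def sse(f, params, times, xs):
    """Squared error of one catalyst (inner sum of Equation 1)."""
    return sum((x - f(t, *params)) ** 2 for t, x in zip(times, xs))


def fit_lm(f, p0, times, xs, max_iter=100, step=1e-6):
    """Small Levenberg-Marquardt loop for a two-parameter model f(t, p1, p2)."""
    p, lam = list(p0), 1e-3
    err = sse(f, p, times, xs)
    for _ in range(max_iter):
        resid = [x - f(t, *p) for t, x in zip(times, xs)]
        # Forward-difference Jacobian, one column per parameter.
        cols = []
        for j in range(2):
            q = list(p)
            q[j] += step
            cols.append([(f(t, *q) - f(t, *p)) / step for t in times])
        a11 = sum(u * u for u in cols[0])
        a22 = sum(v * v for v in cols[1])
        a12 = sum(u * v for u, v in zip(cols[0], cols[1]))
        g1 = sum(u * r for u, r in zip(cols[0], resid))
        g2 = sum(v * r for v, r in zip(cols[1], resid))
        # Damped normal equations (J^T J + lam * diag(J^T J)) d = J^T r,
        # solved with Cramer's rule for the 2x2 case.
        m11, m22 = a11 * (1 + lam), a22 * (1 + lam)
        det = m11 * m22 - a12 * a12
        if det == 0:
            break
        d1 = (g1 * m22 - a12 * g2) / det
        d2 = (m11 * g2 - a12 * g1) / det
        trial = [p[0] + d1, p[1] + d2]
        trial_err = sse(f, trial, times, xs)
        if trial_err < err:       # accept the step and reduce the damping
            p, err, lam = trial, trial_err, lam / 10
        else:                     # reject the step and damp more strongly
            lam *= 10
            if lam > 1e12:
                break
    return p, err


def total_error(f, p0, dataset):
    """Outer sum of Equation 1: parameters are refitted for every catalyst."""
    return sum(fit_lm(f, p0, times, xs)[1] for times, xs in dataset)


def candidate(t, k, n):
    """Hypothetical two-parameter candidate function (illustration only)."""
    u = k * t ** n
    return u / (1.0 + u)


# Hypothetical sampling times (hours) and noise-free synthetic conversions.
times = [0.2, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
dataset = [(times, [candidate(t, 0.2, 0.5) for t in times]),
           (times, [candidate(t, 0.4, 0.7) for t in times])]
print(total_error(candidate, (0.1, 1.0), dataset))
```

    In a GP run, `total_error` would play the role of the fitness of one candidate expression; a library routine for non-linear least squares would normally replace the hand-written loop.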

    Once the best function is found, its parameters can be used as the outputs of a neural network, while the synthesis variables of the catalysts are the inputs. This allows the parameter values, and thus the entire conversion curve, to be obtained for unsynthesized solids. Besides the ease and speed brought by automated synthesis stations and reactor technologies in materials science, adapted informatics tools must be further developed to handle the increase in throughput and data volume without slowing down the whole process.19 In Refs. 20, 21 and 22, Majeed and Ryan present a new genetic programming crossover operator called Context Aware Crossover (CAX) that yielded great results on several usual benchmarks. Therefore, it was decided to try it out on the real problem of catalyst performance modelling, which is a form of multi-objective symbolic regression. We report the use of genetic programming (GP) in heterogeneous catalysis. Although GP has received only a little attention in this domain,23 this paper shows how such an approach can be turned into a singular and powerful tool for solid optimization and discovery. The GP paradigm is employed to accurately and automatically estimate the whole conversion versus time curve in the epoxidation of large olefins using the titanosilicates Ti-MCM-41 and Ti-ITQ-2 as catalysts. Because of this, the evaluation of the performance of the virtual solids is not reduced to a single conversion value or the initial reaction rate, while the knowledge gained about the response of the catalysts, expressed through a few parameters capturing the evolution of the reaction over time, can be applied to predict the behaviour of new (unsynthesized) materials. The methodology is thoroughly detailed, and the analysis of the GP crossover is stressed by comparing the newly proposed CAX operator with the traditional one.

    This paper starts with a quick description of the real dataset. Then the scheme of the employed methodology is outlined, with a focus on the CAX crossover. The results obtained on the catalyst optimisation problem and on different benchmarks allow the CAX to be compared with the standard GP crossover on the basis of consumed CPU time. Finally, a conclusion ends the paper.

    2. Description of the input data and experimental setup

    2.1.- Datasets

    Benchmarks

    Standard benchmarks have been implemented in order to assess the efficiency of the CAX from the new point of view of CPU time, namely the quartic polynomial symbolic regression, the 11-bit multiplexer and the artificial ant on the Santa-Fe trail (with no ADF as in Koza's implementation).

    Real application

    The dataset obtained from the first step of the study16 is composed of 128 different synthesized and tested catalysts: 36 catalysts with SiMe3 as silylating agent and 6 for each of the three other silylating agents, each time on both supports, plus a selection of 10 new diverse catalysts per support for verifying the modelling (36×2 + 6×3×2 + 10×2 = 128). Catalyst activity was monitored for 16 hours, giving a series of seven conversion measurements, i.e. the quantity of initial reactant transformed over time, see Equation 2. Since the reactions were performed in a closed reactor (so-called batch mode), the reactant concentration decreases over time, always providing conversion versus time curves characterized by a positive first derivative and a negative second derivative.

    $$\mathrm{Conversion}\,\%(t) = x_{t} = \frac{x_{1}(t_{0}) - x_{1}(t)}{x_{1}(t_{0})} \times 100 \qquad \text{(Equation 2)}$$
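    As a small illustration of the conversion definition and of the curve shape expected in batch mode, the sketch below computes percent conversions from reactant amounts and checks that the slope is positive and decreasing. The amounts and sampling times are hypothetical values chosen for the example, and the function names are not taken from the original work.

```python
def conversion_percent(x0, xt):
    """Percent of the initial reactant amount x0 that has been consumed."""
    return 100.0 * (x0 - xt) / x0


def has_batch_shape(times, conversions):
    """True if conversion rises with a decreasing slope between samples."""
    slopes = [(conversions[i + 1] - conversions[i]) / (times[i + 1] - times[i])
              for i in range(len(times) - 1)]
    rising = all(s > 0 for s in slopes)
    flattening = all(later < earlier
                     for earlier, later in zip(slopes, slopes[1:]))
    return rising and flattening


# Hypothetical run: initial amount 1.0 and seven samples over 16 hours.
times = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
remaining = [0.95, 0.90, 0.82, 0.70, 0.55, 0.40, 0.30]
conversions = [conversion_percent(1.0, x) for x in remaining]
print(has_batch_shape(times, conversions))
```

    The finite-difference check is only a discrete stand-in for the first and second derivatives mentioned in the text.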

    2.2.- Experimental setup

    The CAX aims at improving the efficiency of the standard GP crossover by improving the second part of the operation, i.e. choosing where to graft into parent 1 (P1) a subtree chosen in parent 2 (P2). Usually, a modern GP crossover operator creates one new child from two selected parents (P1 and P2) by i) randomly selecting a subtree S2 in P2, with a 90% chance of selecting a node, ii) randomly selecting a subtree S1 in P1, pointing to a node if S2 is a node, and iii) creating a child that is a clone of P1 with subtree S2 in place of S1. With the CAX operator, after selecting S2 in P2, one tries to find the best place where it could be grafted in P1. All nodes of P1 can potentially receive the graft, excluding the root of P1 and the nodes at the bottom of P1 because of the depth constraint. All possibilities are deterministically explored by evaluating all possible children resulting from the graft of S2 wherever P1 can receive it (gray nodes in Figure 4), and the child with the best fitness is returned. Even though the exhaustive exploration of all potential crossover points in P1 is clearly expensive, Majeed and Ryan claimed exceptional results, convincing us to try this new operator.
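    The difference between the two operators can be sketched on small expression trees. The code below is a simplified illustration, not the implementation evaluated in this paper: trees are nested tuples, crossover points are drawn uniformly (the 90% node bias is omitted), the depth constraint is modelled by a hypothetical `max_depth` limit, and `fitness` is a toy sum of absolute errors against an arbitrary target.

```python
import random

# A tree is a leaf ('x' or a number) or a tuple (operator, left, right).
OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b, '*': lambda a, b: a * b}


def evaluate(tree, x):
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == 'x' else tree


def paths(tree, prefix=()):
    """All node positions as tuples of child indices; () is the root."""
    found = [prefix]
    if isinstance(tree, tuple):
        for i in (1, 2):
            found += paths(tree[i], prefix + (i,))
    return found


def subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def graft(tree, path, new):
    """Copy of tree in which the node at path is replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = graft(tree[path[0]], path[1:], new)
    return tuple(parts)


def depth(tree):
    if isinstance(tree, tuple):
        return 1 + max(depth(tree[1]), depth(tree[2]))
    return 1


def fitness(tree, target=lambda x: x * x + x):
    """Toy fitness: sum of absolute errors on five points (lower is better)."""
    return sum(abs(evaluate(tree, x) - target(x))
               for x in (-1.0, -0.5, 0.0, 0.5, 1.0))


def standard_crossover(p1, p2, rng):
    """Random subtree S2 of P2 replaces a random subtree S1 of P1."""
    s2 = subtree(p2, rng.choice(paths(p2)))
    return graft(p1, rng.choice(paths(p1)), s2)


def context_aware_crossover(p1, p2, rng, max_depth=6):
    """Random S2, then exhaustive search of the best graft point in P1."""
    s2 = subtree(p2, rng.choice(paths(p2)))
    children = [graft(p1, p, s2) for p in paths(p1) if p]  # root excluded
    children = [c for c in children if depth(c) <= max_depth]
    return min(children, key=fitness) if children else p1


rng = random.Random(0)
parent1 = ('+', ('*', 'x', 'x'), 1.0)
child = context_aware_crossover(parent1, 'x', rng)
print(child, fitness(child))
```

    The cost difference discussed in the text is visible here: `standard_crossover` evaluates nothing, whereas `context_aware_crossover` calls `fitness` once per admissible graft point of P1.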


    Figure 4. Context Aware Crossover (CAX): the shaded nodes in P1 are possible crossover points where

    the selected subtree S2 from P2 can go in.

    In their different papers, Majeed and Ryan suggest first using the standard GP crossover and starting the CAX only after some time, so curves were plotted for CAX_10 (CAX started after 10% of the run), CAX_40, CAX_70 and no CAX (cf. Figure 5).

    In Ref.20, the population is made of 4,000 individuals for standard GP, whereas the algorithm using CAX only needed 200. Figure 5-left shows that if the same population size is used for standard GP and CAX, the generation count just freezes when the CAX starts, due to the huge number of child evaluations that this operator needs. It therefore appears that using 200 individuals for CAX works to the advantage of CAX rather than standard GP. Thus, it was decided to reduce the population by 95% when CAX starts, so as to keep a generation count roughly equivalent to standard GP, as shown in Figure 5-centre.

    Note that in Ref.20, fitness curves are given with reference to the number of generations. However, to produce one child the CAX needs many more evaluations than a standard crossover. In Ref.21, performance is given with respect to the number of evaluations, but one could argue that not all individuals take the same time to be evaluated. For these reasons, the results here are expressed against computing time, all four plots being computed in parallel on a quad-processor machine exclusively devoted to the runs.


    Figure 5. Top - Catalyst optimisation problem. Left - Number of generations for a constant population; Centre - reduced population for CAX; Right - Results averaged over 4 runs for a population size reduced when CAX starts. Each run takes around 13 hours on a 3 GHz PC. Bottom - The implemented population reduction scheme is fair to the CAX, both evaluation- and generation-wise.

    All the experiments were done over 50 runs, each limited to the number of seconds that allows standard GP to perform the same number of evaluations as found in Koza's book. The experiments implement the simple solution of turning on the CAX after completion of a certain percentage of a run. In order to precisely evaluate the effects of CAX, the standard GP population size (4,000) is used at the beginning of CAX runs until the CAX operator is started, after which the population is reduced to 200 individuals. As a consequence, in this paper, the runs using CAX are identical to the standard GP run until the CAX operator is started.

    3. Results

    3.1.- Benchmarks

    Koza's quartic polynomial symbolic regression problem (x^4 + x^3 + x^2 + x) is implemented. To obtain the CAX_10 curve, which takes 1200 seconds (see Figure 6), the algorithm begins with a population of 4,000 individuals for 120 seconds (10% of 1200), after which the CAX is started. At this moment, the population is reduced to 200 individuals using the following process: the best individual is kept (elitism), and the other 199 individuals are selected with a tournament of size 40 (1% of the original population size). Lower arities were tested, with elitist tournament-7 and random selection, but tournament-40 yielded the best results. In Figure 6-Top-Left, it can be observed that all methods perform the same, even when the CAX is started and the population reduced from 4,000 to 200. However, the average population fitness curve (Figure 6-Top-Right) clearly shows that, when the CAX starts, the average population fitness is boosted to values not far from the best individual's, yet this apparently does not lead to premature convergence, which is an interesting feature. Unfortunately, the great improvement announced in Ref.21 was not seen.
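    The population reduction step described above (elitism plus tournament-40) can be sketched as follows. This is an illustrative reading of the description, with assumptions labelled: the name `reduce_population` is hypothetical, higher fitness is taken as better, and an individual is allowed to win several tournaments, which the text does not specify.

```python
import random


def reduce_population(population, fitness, new_size=200,
                      tournament_size=40, rng=None):
    """Keep the best individual, then fill up by tournament selection."""
    rng = rng or random.Random()
    survivors = [max(population, key=fitness)]           # elitism
    while len(survivors) < new_size:
        contenders = rng.sample(population, tournament_size)
        survivors.append(max(contenders, key=fitness))   # tournament winner
    return survivors


# Hypothetical population of 4,000 individuals, each represented here
# by its own fitness value.
rng = random.Random(42)
population = [rng.random() for _ in range(4000)]
reduced = reduce_population(population, fitness=lambda ind: ind, rng=rng)
print(len(reduced), reduced[0] == max(population))
```

    With a tournament as large as 1% of the population, almost every survivor comes from the top of the fitness distribution, which is consistent with the jump in average population fitness reported when the CAX starts.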

    On the 11-bit multiplexer problem, the effects of CAX look much the same: in Figure 6-Centre-Left, starting the CAX does not seem to have much effect at all (although CAX_10 seems to have had a small negative impact on the best individual performance). On the right, one can clearly see the effect of CAX on the population average fitness whenever CAX is started. Before CAX starts, the curve is of course identical to standard GP. What is remarkable, though, is that for CAX_10 the population does not seem to have prematurely converged, even though the average fitness is very close to the best fitness. In the end, the best individual value for CAX_10 is the same as for standard GP.
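    For readers unfamiliar with the 11-bit multiplexer benchmark, the sketch below shows the target function and a hit count over all fitness cases. The bit ordering (three address bits first, then eight data bits) and the function names are assumptions made for this example.

```python
from itertools import product


def multiplexer11(bits):
    """Target output: the data bit selected by the three address bits."""
    a0, a1, a2 = bits[:3]
    return bits[3 + 4 * a0 + 2 * a1 + a2]


def hits(program):
    """Number of the 2**11 fitness cases on which the program is right."""
    return sum(program(bits) == multiplexer11(bits)
               for bits in product((0, 1), repeat=11))


print(hits(multiplexer11))        # a perfect individual
print(hits(lambda bits: 0))       # a constant individual
```

    A GP individual would be a Boolean expression over the eleven inputs; `hits` is then the raw fitness plotted for this benchmark.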

    The last benchmark in Ref.21 was the Lawnmower problem.18 However, this problem uses ADFs, which were not implemented in this work since the original catalysis problem did not need them. So, in order to use a comparable benchmark, the Artificial Ant on the Santa-Fe trail problem was chosen. On this benchmark, still no improvement in the best fitness can be seen (cf. Figure 6-Bottom-Left), although this time CAX_10 does not seem to recover and catch up with standard GP. Here again, a spectacular boost in the population average fitness is observed whenever the CAX starts.

    Figure 6. Top - Quartic polynomial symbolic regression. Left - Best individual performance. Right - Average performance of the population. Middle - 11-bit multiplexer problem. Left - Best performance. Right - Mean performance. Bottom - Artificial Ant on the Santa-Fe Trail. Left - Number of hits of the best individual. Right - Number of hits of the average population.

    3.2.- Real application


    This difficult problem was first tackled with a tailored GP algorithm that did not use the CAX operator. The adjusted fitness (in the Koza sense) of the best individual measured on the evaluation set is 0.93, which corresponds to a mean R² of 0.93 considering all the catalysts and all measurements. The data were previously divided into a learning set, a test set, and an evaluation set in order to detect overfitting.

    Considering the real application, it seems that the exhaustive search started by the CAX in order to find the best grafting positions does not yield much better results than when the same amount of CPU time is used by an ordinary standard crossover, see Figure 5. Finally, different functions can be extracted from the best Pareto front, see Figure 7. For example, the two-parameter function X = h(t) = kt^n/(1 + kt^n), found using the standard GP operator, is selected because it shows the best balance between fitting accuracy and number of parameters. On the other hand, the three-parameter function X = f(t) = a - bc^t is also selected, since the number of operators is minimized while showing approximately the same fitting quality.
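    The selection of functions from a Pareto front can be illustrated with a short sketch. All numbers below are illustrative assumptions: the error values are loosely based on the R² figures quoted in this paper, the operator counts are obtained by counting explicit operators in the written formulas, and the third candidate `g` is entirely hypothetical.

```python
def dominates(a, b):
    """a dominates b: no worse on every criterion, better on at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Names of the candidates that no other candidate dominates."""
    return sorted(name for name, score in candidates.items()
                  if not any(dominates(other, score)
                             for other in candidates.values()))


# Criteria to minimise: (1 - R^2, number of parameters, number of operators).
candidates = {
    'h': (0.07, 2, 6),   # k*t^n / (1 + k*t^n)
    'f': (0.08, 3, 3),   # a - b*c^t
    'g': (0.10, 3, 6),   # hypothetical dominated expression
}
print(pareto_front(candidates))
```

    Neither `h` nor `f` dominates the other, which matches the text: one is preferred when the number of parameters matters most, the other when the number of operators does.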

    Figure 7. Genetic programming trees: a - bc^t on the left-hand side, and kt^n/(1 + kt^n) on the right.

    4. Using genetic programming results

    The GP algorithm using the CAX operator, as well as the one using the ordinary standard crossover, was evaluated on a real set of data consisting of kinetic measurements for 128 different catalysts in the epoxidation of 4-decene. The application of GP to extract an analytical expression reproducing the relationship between the conversion level and the reaction time introduces new opportunities during the evaluation of the results, since all the information obtained during the experimental assays is retained. As a consequence, the loss of information is avoided through an expression capturing the evolution of conversion with reaction time for each catalyst, while data storage is also simplified by transforming the collection of discrete conversion vs. time values into the few parameters of the proposed equation.

    The automatic discovery and fitting of analytical expressions to reproduce kinetic experiments represents a key issue for speeding up the data treatment stage, especially when a large amount of information has been generated using high-throughput technologies. Even when the behaviour of the tested catalysts is to be evaluated from one unique standpoint (the initial reaction rate, or the conversion level at a specific reaction time), it is necessary to normalize the experimental results to perform fair comparisons, since aliquots for each reaction are hardly ever taken at the same reaction times. In this scenario, having a simple equation to rapidly estimate initial rates or the conversion at any reaction time (interpolation) becomes crucial. By simply calculating the derivative of a given analytical function, reaction rates can be evaluated at any reaction time, including the initial reaction rate r0. For example, considering

    $$h(t) = \frac{k\,t^{n}}{1 + k\,t^{n}}, \qquad v(t) = h'(t) = \frac{k\,n\,t^{n-1}}{\left(1 + k\,t^{n}\right)^{2}},$$

    R² = 0.98 is found between the estimated r0 and the values previously reported in Ref.16.
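    The rate calculation can be checked numerically with a few lines of code. This is a sketch under stated assumptions: the parameter values are hypothetical, and the analytical derivative used is the standard quotient-rule result for h(t) = kt^n/(1 + kt^n), compared here with a central finite difference.

```python
def h(t, k, n):
    """Conversion model h(t) = k t^n / (1 + k t^n)."""
    u = k * t ** n
    return u / (1.0 + u)


def rate(t, k, n):
    """Analytical derivative h'(t) = k n t^(n-1) / (1 + k t^n)^2."""
    return k * n * t ** (n - 1) / (1.0 + k * t ** n) ** 2


def numeric_rate(t, k, n, eps=1e-6):
    """Central finite difference, used only as a cross-check."""
    return (h(t + eps, k, n) - h(t - eps, k, n)) / (2 * eps)


k, n = 0.2, 0.5   # hypothetical fitted parameters for one catalyst
for t in (0.2, 1.0, 6.0):
    print(t, rate(t, k, n), numeric_rate(t, k, n))
```

    Note that for n < 1 the expression grows without bound as t approaches 0, so in that case an initial rate has to be evaluated at a small positive reaction time rather than exactly at t = 0; which time was used in the original study is not stated here.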

    On the other hand, analytical equations allow most of the information from the kinetic measurements to be retained, the importance of which is illustrated in Figure 8. In this figure, three representative conversion vs. reaction time curves are depicted, showing that the ranking of catalysts (A, B, C) depends on the selected criterion, i.e. C > B > A at t = 1, B > C > A at t = 4, and A > B > C at t = 10, with t in hours. This result is a direct consequence of the fact that the final catalytic response is defined by a set of chemical-physical phenomena, such as the type and magnitude of the interactions between reactants and the active sites or the occurrence of some deactivation processes.

    [Figure 8 plot: conversion (%) versus time (h) for catalysts A, B and C; time axis marked at 1, 4 and 10 h, conversion axis at 50 and 100%]

    Figure 8. On the importance of the comparison criterion

    Therefore, the use of GP in the catalysis field is about acquiring a global understanding of the cases under study, enhancing the quality of results and the final knowledge gain. For instance, in the present work we have applied the GP algorithm to infer various analytical expressions able to reproduce with very low errors (global R² of 0.93) the conversion vs. reaction time curves for the epoxidation of 4-decene using Ti-MCM-41 and Ti-ITQ-2 catalysts. As a consequence, we are now ready to evaluate the behaviour of the catalysts from different standpoints, as shown in Figure 9. In this figure, experimental results (conversion levels) are represented at two (left) or three (right) reaction times, using filters to identify some of the characteristics of the related catalysts (type of material, i.e. MCM-41 or ITQ-2, top; type of silylating agent, SiMe3, SiMe2Bu, SiMe2Ph, or SiMePh2, bottom). Under this approach, new conclusions can be drawn about the catalysts' mode of action, complementing those previously reported.16 In this regard, it is shown that ITQ-2 samples providing the same conversion level as MCM-41 at the initial stages of the reaction (t = 0.2 h) are generally more active materials at longer reaction times (t = 1 h), as can be inferred from Figure 9 (top, left). A similar analysis can be carried out in three dimensions by considering the behaviour of the catalysts at 0.2, 1, and 6 h (Figure 9, top right). Moreover, when the same results are filtered by the type of silylating agent, it is possible to observe the formation of some clusters, indicating, for instance, that SiMe2Bu is highly active at short reaction times but is overtaken by SiMe3 at 6 h.

    Figure 9. Left - 2D plot of percent conversion at t=1 and t=0.2. Right - 3D plot of percent conversion at t=0.2, t=1, and t=6; t in hours. The influence of the support is shown in the top charts, with ITQ samples as red squares and MCM samples as filled blue circles, while the influence of the silylating agent is shown at the bottom, with blue, red, gray, green, and white circles respectively for SiMe2Ph, SiMe2Bu, SiMe3, SiMePh2, and no silylating agent.

    On the other hand, the power of GP to offer an analytical expression for the experimental data is related not only to the knowledge gain/time savings ratio but also to the possibility of introducing diverse mathematical criteria during the search process. For instance, among the large number of possible equations to fit our kinetic measurements, we have limited the complexity of the solution (number of operators and number of parameters), leading to simple empirical expressions. For instance, the equation f(t) = a - bc^t has been found by minimizing the number of operators needed to obtain a satisfactory correlation (R² = 0.92). Thanks to this, a new criterion can easily be calculated to rank the catalysts with regard to the whole conversion vs. reaction time data, using the area below the kinetic curves (integral of the analytical expression between 0 and T = 10 hours), as shown in Equation 3. Figure 10 shows that this new criterion gives a new point of view on catalyst ranking, complementary to the previously established one.16

    $$\int_{t=0}^{t=T} \left(a - b\,c^{t}\right)dt = F_{T} - F_{0}, \quad \text{with} \quad F_{T} = a\,T - \frac{b\,c^{T}}{\ln c} \qquad \text{(Equation 3)}$$
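    The area criterion can be computed directly from the primitive of a - bc^t. The sketch below, with hypothetical parameter values, evaluates that closed form and compares it with a trapezoidal estimate; it assumes c > 0 and c different from 1 so that the logarithm is defined and non-zero.

```python
import math


def f(t, a, b, c):
    """Three-parameter model f(t) = a - b * c**t."""
    return a - b * c ** t


def area(a, b, c, T=10.0):
    """Integral of f over [0, T] from the primitive a*t - b*c**t / ln(c)."""
    def primitive(t):
        return a * t - b * c ** t / math.log(c)
    return primitive(T) - primitive(0.0)


def trapezoid_area(a, b, c, T=10.0, steps=10000):
    """Numerical cross-check of the closed form."""
    width = T / steps
    inner = sum(f(i * width, a, b, c) for i in range(1, steps))
    return width * (0.5 * f(0.0, a, b, c) + inner + 0.5 * f(T, a, b, c))


a, b, c = 0.8, 0.8, 0.7   # hypothetical fitted parameters
print(area(a, b, c), trapezoid_area(a, b, c))
```

    Ranking catalysts by `area` uses the whole curve up to T, in contrast with a criterion based on a single reaction time.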

    [Figure 10 plots: series "r0" and "Integral" for MCM41 and for ITQ2; horizontal axis from 1 to 61, vertical axis from 0 to 0.07]

    Figure 10. The initial reaction rate is shown in black, while the area below the reaction curve between 0 and 10 hours appears in grey. The area has been divided by 100 in order to keep only one y-axis. Results are given separately for MCM41 (top) and ITQ2 (bottom).


    On the other hand, the equation h(t) = kt^n/(1 + kt^n) has been obtained by minimizing the total number of parameters involved. Although the resulting expression is clearly more complex, making its analytical treatment difficult (in particular the definition of the primitive for integral calculation), it is more convenient for trying to correlate the responses (conversion vs. time curves) of the catalysts with their chemical characteristics using advanced modelling algorithms. In this sense, the regression between parameter values and synthesis variables has been handled with a neural network. The synthesis variables are the following: 2 supports for the titanium grafting process {MCM-41, ITQ-2}; the range [0.1-5] Ti wt% for the titanium grafting; 4 silylating agents to analyse the effect of the alkyl group size on the catalytic properties {SiMe3, SiMe2Ph, SiMePh2, SiMe2Bu}; and the ranges [0.0-1] and [0.0-0.5] for the silylation degree, i.e. SiR3/(SiO2+TiO2), for MCM-41 and ITQ-2 samples respectively. The epoxidation of trans-4-decene is chosen as the test reaction to evaluate the catalytic performance of the synthesized materials. Keeping the number of parameters low allows overfitting of the neural network to be easily handled; thus, the resulting architecture shows a very low level of complexity in both the number of hidden layers and the total number of neurons (Multi-layer Perceptron 4:4-8-2:2), with four synthesis variables as inputs and k and n as outputs.

    Table 1. Neural network statistics

                     Training               Selection              Test
                     k          n           k          n           k          n
    Data mean        0.178283   0.539448    0.223869   0.574403    0.196315   0.561092
    Data S.D.        0.143507   0.162605    0.164714   0.145667    0.159821   0.111543
    Error Mean      -0.000971  -0.001282   -0.032269  -0.036759   -0.010183  -0.019647
    Error S.D.       0.054646   0.075635    0.080973   0.080328    0.084756   0.075151
    Abs. E. Mean     0.041913   0.062049    0.060644   0.069006    0.067030   0.063672
    S.D. Ratio       0.380789   0.465148    0.491600   0.551453    0.530319   0.673738
    Correlation      0.924835   0.886070    0.872050   0.854034    0.854259   0.804007
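    The rows of Table 1 are standard regression statistics, and the S.D. Ratio row matches the Error S.D. divided by the Data S.D. (for example 0.054646/0.143507 ≈ 0.3808 for k on the training set). A sketch of how such statistics can be computed is given below; the observed and predicted values are hypothetical, and the sign convention of the error and the use of the sample standard deviation are assumptions.

```python
import math


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Sample standard deviation (n - 1 denominator, an assumption)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def regression_stats(observed, predicted):
    errors = [p - o for o, p in zip(observed, predicted)]  # assumed sign
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    norm = math.sqrt(sum((o - mo) ** 2 for o in observed)
                     * sum((p - mp) ** 2 for p in predicted))
    return {
        "data_mean": mo,
        "data_sd": std(observed),
        "error_mean": mean(errors),
        "error_sd": std(errors),
        "abs_error_mean": mean([abs(e) for e in errors]),
        "sd_ratio": std(errors) / std(observed),
        "correlation": cov / norm,
    }


observed = [0.10, 0.20, 0.30, 0.40]     # hypothetical values of k
predicted = [0.12, 0.18, 0.33, 0.39]    # hypothetical network outputs
stats = regression_stats(observed, predicted)
for name, value in stats.items():
    print(name, round(value, 6))
```

    An S.D. ratio well below 1 together with a high correlation indicates that the network explains most of the variance of the parameter, which is how the table is read in the text.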

    Figure 11 shows the estimation and nominal errors of k and n using the synthesis variables as inputs (i.e. %Ti, % silylation, silylating agent, and support). Before using the neural network, the dataset is divided into training, selection, and test subsets (the last containing unseen materials) to prevent overfitting; see Table 1 for statistics.

    [Figure 11 plots: series "k", "n", "kpred" and "npred" on an axis from 0.0 to 1.5; series "diff k" and "diff n" on an axis from -0.2 to 0.3]


    Figure 11. Top - Neural network (Multi-layer Perceptron 4:4-6-2:2) estimation. Bottom - Observed versus predicted values of k and n (respectively left and right) for the separate datasets, i.e. T for training, S for selection, and X for test.

    5. Conclusion

    The conclusion is not exactly the one that was originally planned. When starting this work, the aim was to improve the best individual result on the heterogeneous catalyst optimisation problem using the Context Aware Crossover. Unfortunately, things did not turn out as expected: it was impossible to obtain better results with the CAX than with an ordinary crossover operator on this real-world problem. A careful implementation of the benchmarks seems to show that CAX is not capable of improving the best fitness value, although CAX seems to be a very good exploitation operator that boosts the whole population towards much better fitness values while maintaining a good level of diversity (the best individual fitness keeps rising after the CAX is started). This means that CAX remains a very interesting crossover method that deserves further careful investigation regarding diversity preservation. From the point of view of the chemistry, the application of GP allows the relationship between the conversion level and the reaction time to be reproduced, retains all the information, and simplifies data storage. Moreover, the use of GP permits a more global understanding to be acquired, enhancing the quality of results and the final knowledge gain. Catalyst behaviour can be quickly evaluated from different points of view, allowing new conclusions to be drawn about the catalysts' mode of action. For the first time in heterogeneous catalysis, genetic programming has been used for an application of industrial interest. This study has shown how such a tool can open new opportunities for data mining and knowledge extraction in materials science. As an example, the combination with a modelling tool such as a neural network makes the GP strategy all the more promising and relevant.


    References

    [1] A. Corma, M.T. Navarro, J. Perez Pariente, J. Chem. Soc., Chem. Commun. 1994, 147.

    [2] A. Thangaraj, R. Kumar, P. Ratnasamy, J. Catal. 131, 1991, 294.

    [3] W. Fan, P. Wu, S. Namba, T. Tatsumi, Angew. Chem., Int. Ed. 43, 2003, 236.

    [4] P. Barret, F. Pautet, M. Dauton, J.F. Sabot, Pharm. Acta Helv. 62, 1987, 348.

    [5] N. Fdil, A. Romane, S. Allaoud, A. Karim, Y. Castanet, A. Morteaux, J. Mol. Catal. 108, 1996, 15.

    [6] M. Lajunen, A.M.P. Koskinen, Tet. Lett. 35, 1994, 4461.

    [7] A. Corma, M. Domine, J.A. Gaona, J.L. Jorda, M.T. Navarro, F. Rey, J. Perez-Pariente, J. Tsuji, B. McCullock, L.T. Nemeth, Chem. Comm. 1998, 2211.

    [8] W. Zhang, M. Froeba, J. Wang, P.T. Tanev, J. Wong, T.J. Pinnavaia, JACS 1996, 118(38), 9164-9171.

    [9] K.A. Koyano, T. Tatsumi, Microporous Materials 1997, 10(4-6), 259-271.

    [10] A. Corma, V. Fornes, S.B. Pergher, Th.L.M. Maesen, J.G. Buglass, Nature (London) 1998, 396(6709), 353-356.

    [11] A. Corma, U. Diaz, V. Fornes, J.L. Jorda, M.E. Domine, F. Rey, Chem. Comm. (Cambridge) 1999, (9), 779-780.

    [12] A. Corma, U. Diaz, M.E. Domine, V. Fornes, Angewandte Chemie, Int. Ed. 2000, 39(8), 1499-1501.

    [13] A. Corma, U. Diaz, M.E. Domine, V. Fornes, JACS 2000, 122(12), 2804-2809.

    [14] P. Wu, D. Nuntasri, J. Ruan, Y. Liu, M. He, W. Fan, O. Terasaki, T. Tatsumi, J. of Physical Chemistry B 2004, 108(50), 19126-19131.

    [15] (a) Jandeleit, B.; Schaefer, D.J.; Powers, T.S.; Turner, H.W.; Weinberg, W.H., Angew. Chem. Int. Ed. 1999, 38, (17), 2494-2532. (b) Senkan, S.M., Angew. Chem. Int. Ed. 2001, 40, (2), 312-329. (c) Reetz, M.T., Angew. Chem. Int. Ed. 2001, 40, (2), 284-310. (d) Newsam, J.M.; Schuth, F., Biotechnol. Bioeng. 1999, 61, (4), 203-216. (e) Gennari, F.; Seneci, P.; Miertus, S., Catal. Rev.-Sci. Eng. 2000, 42, (3), 385-402.

    [16] P. Serna, L.A. Baumes, M. Moliner, A. Corma, Journal of Catalysis, 258, 35-34, 2008.

    [17] (a) M. Holena, M. Baerns, Catal. Today, 2003, 81, 485-494. (b) L.A. Baumes, M. Moliner, A. Corma, QSAR Comb. Sci. Vol. 26, Issue 2, 255-272, 2007. (c) D. Nicolaides, QSAR Comb. Sci. 2005, 24, 15-21. (d) L.A. Baumes, J.M. Serra, P. Serna, A. Corma, J. Comb. Chem. 2006, 8, 583-596. (e) M.M. Gardner, J.N. Cawse, In Experimental Design for Combinatorial and High Throughput Materials Development, Ed. J.M. Cawse. J. Wiley & Sons, Inc. 2003, 129-145. (f) F. Schüth, L.A. Baumes, F. Clerc, D. Demuth, D. Farrusseng, J. Llamas-Galilea, C. Klanner, J. Klein, A. Martinez-Joaristi, J. Procelewska, M. Saupe, S. Schunk, M. Schwickardi, W. Strehlau, T. Zech, Catal. Today, Vol. 117, 2006, 284-290. (g) A. Corma, J.M. Serra, E. Argente, S. Valero, V. Botti, Chem. Phys. Chem., 2002, 3, 939-945. (h) L.A. Baumes, D. Farrusseng, M. Lengliz, C. Mirodatos, QSAR & Comb. Sci. Nov. 2004, vol. 29, Issue 9, 767-778.

    [18] (a) J.R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Massachusetts, 1992. (b) J.R. Koza. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Massachusetts, 1994.

    [19] (a) Baumes, L.A. Combinatorial Stochastic Iterative Algorithms and High-Throughput Approach: from Discovery to Optimisation of Heterogeneous Catalysts (in English). Univ. Claude Bernard Lyon 1, Lyon, France, 2004. (b) Farrusseng, D.; Baumes, L.A.; Mirodatos, C., Data management for combinatorial heterogeneous catalysis: methodology and development of advanced tool. In High-Throughput Analysis: A Tool For Combinatorial Materials Science, Potyrailo, R.A.; Amis, E.J., Eds. Kluwer Academic/Plenum Publishers: 2003; pp 551-579. (c) http://catalyse.univ-lyon1.fr/gre3b4.htm, website accessed on 20 July 2006. (d) http://www.fist.fr/article259.html, website accessed on 20 July 2006. (e) Adams, N.; Schubert, U.S., Macromol. Rapid. Commun. 2005, 25, 48-58. (f) Adams, N.; Schubert, U.S., QSAR & Comb. Sci. 2005, 24, 58-65. (g) Ohrenberg, A.; von Torne, C.; Schuppet, A.; Knab, B., QSAR & Comb. Sci. 2005, 24, 29-37. (h) Saupe, M.; Fodisch, R.; Sunderrmann, A.; Schunk, S.A.; Finger, K.E., QSAR & Comb. Sci. 2005, 24, 66-77. (i) Gilardoni, F.; Curcin, V.; Karunanayake, K.; Norgaard, J.; Guo, Y., QSAR & Comb. Sci. 2005, 24, 120-130.

    [20] H. Majeed and C. Ryan. A less destructive, context-aware crossover operator for GP. In P. Collet et al., editor, Proc. of the 9th European Conf. on Genetic Programming, vol. 3905 of Lecture Notes in Computer Science, 36-48, Budapest, 2006. Springer.


    [21] H. Majeed and C. Ryan. Using context-aware crossover to improve the performance of GP. In Maarten Keijzer et al., editor, GECCO 2006: Proc. of the 8th annual conf. on Genetic and evolutionary computation, vol. 1, 847-854, Seattle, Washington, USA, 8-12 July 2006. ACM Press.

    [22] (a) H. Majeed and C. Ryan. Context-aware mutation: a modular, context aware mutation operator for genetic programming. In Dirk Thierens et al., editor, GECCO '07: Proc. of the 9th annual conf. on Genetic and evolutionary computation, vol. 2, 1651-1658, London, 7-11 July 2007. ACM Press. (b) H. Majeed and C. Ryan. On the constructiveness of context-aware crossover. In Dirk Thierens et al., editor, GECCO '07: Proc. of the 9th annual conf. on Genetic and evolutionary computation, vol. 2, 1659-1666, London, 7-11 July 2007. ACM Press.

    [23] L.A. Baumes, P. Collet. Computational Materials Science. 2008. In Press.