System Level Application of Metabolic Engineering · System Level Application of Metabolic Engineering ----- In silico Microbial Strain Redesign Hao Wang Helsinki 9.4.2007 ... network

Computational Systems Biology Seminar report System Level Application of Metabolic Engineering ------ In silico Microbial Strain Redesign Hao Wang Helsinki 9.4.2007 UNIVERSITY OF HELSINKI Department of Applied Chemistry and Microbiology

1 Introduction

At present, there are about 600 microbial genomes had been sequenced, and 288 undergoing

or nearly finished genome projects are processing around the world

(http://www.ncbi.nlm.nih.gov/genomes/MICROBES/microbial_taxtree.html). This escalating

condition has turned to be the driving force of new bioinformatics methods and algorisms

served for the data-mining requirements in post-genomic era. With more accurate genome

annotation and more complete pathway information, promising achievements had been made

on in silico metabolic network reconstruction (Van Dien and Lidstrom 2002), and noted that

one of the most comprehensive stoichiometric models built upon the genome of E. coli had

been available (Reed and Palsson 2003). Based on these genome-scale metabolic networks, a

few bio-engineering applications had be proposed, and one new emerging study is aiming at

computationally predict the probability and possibility of overproduction of particular

chemical and biochemical compounds, diversely range from industrial interests to

environmental usages, through metabolic modification. The outcomes of some positive

experimental verification of these microorganism redesigns bring light on their potential

engineering application in the future.

2 OptStrain

OptStrain (Pharkya et al. 2003) is newly developed hierarchical computational framework,

which could be used as a statistical model for microorganism strain redesign through

recombination of available digitalized pathways, its aim is to not only find a proper set of

non-native pathways whose combining could lead to desired extrinsic compound production

but also figure out possible ways of pathway deletion using OptKnock (Burgard et al. 2003;

Pharkya et al. 2003) so as to obtain the maximum output of target compound. The workflow

of OptStrain is illustrated in Figure 1. Its procedures include four steps described as follow.

2.1 Curation of the database

Firstly, a Universal database was constructed by collecting curated biotransformation data

form public domains, such as KEGG (Kanehisa et al. 2004) and in silico E. coli model (Reed

and Palsson 2003), which laid very important foundation for later studies. Then those

downloaded reactions were checked by pre-programmed scripts for automatically parsing and

Figure 1. Flowchart of OptStrain procedure. Step 1 Curation of database(s) of reactions to

compile the Universal database, only elementally balanced reactions included. Step 2

identifies a maximum-yield path enabling the desired biotransformations from a substrate to

product, the white arrows represent native reactions of the host and the yellow arrows denote

non-native reactions. Step 3 minimizes the reliance on non-native reactions, and Step 4

incorporates the non-native functionalities into the microbial host's stoichiometric model and

applies the OptKnock framework to identify and eliminate reactions competing with the

targeted product. (Pharkya et al., 2003).

elementally balancing check before entering the database. The metabolite set N was

comprised of ～4800 metabolites, and the reaction set M consisted of >5700 reactions.

Moreover, compounds with an unspecified number of repeat units [e.g., trans-2-Enoyl-CoA

represented by C25H39N7O17P3S(CH2)n] or unspecified alkyl groups R in their chemical

formulas are removed. A few Perl (Brown 1999) scripts were developed to automatically

process this task routinely. 2.2 Determination of the maximum yield

With the elementally balanced datasets of functionalities obtained from previous step,

theoretical maximum yields of the target product P are calculated by maximizing the sum of

all reaction fluxes producing minus those consuming the target metabolite, weighted by the

stoichiometric coefficient of the target metabolite in all reactions within the database, which

comprised of a set N = {1,..., N} of metabolites (~4800) and a set M = {1,..., M} of reactions

(>5700). The uptake rate of substrate is set to pre-defined value (default is 1 unit of substrate).

This calculation could be simulated as a linear programming (LP) problem as following

equations, which now could be analyzed in a large scale and solved very well.

where MWi is the molecular weight of metabolite i, vj is the molar flux of reaction j (can

either be irreversible or reversible), and Sij is the stoichiometric coefficient of metabolite i in

reaction j. The inequality in constraint (1) allows only for secretion and prevents the uptake of

all metabolites in the network other than the substrates in R. Constraint (2) scales the results

for a total substrate uptake flux of one unit of mass. Solutions of these equations could be the

initial values used in later calculations.

2.3 Identification of the minimum number of non-native reactions for a host

organism One key purpose of strain engineering is to endow microbial hosts with function of producing

additional compounds with economical or environmental interests. In this step, OptStrain is

trying to determine the minimum number of non-native functionalities or reactions, that are

absent in the examined microbial host’s metabolic model and needed to be combined into the

original network. The simulation of this step could be achieved through adding heterologous

fluxes as following equations:

The set Mnon-native comprises the non-native reactions for the examined host, and (1) and (2)

are identical to those in step 2. (3) ensures that the product yield meets the maximum

theoretical yield calculated in Step 2. The binary variable yj is a set of binary values (1 or 0)

for turning the responding reactions on or off, and this constraint is imposed only on reactions

associated with genes heterologous to the specified production host. The parameters vjmin and

vjmax can either be assumed values or calculated by minimizing and maximizing every

reaction flux vj subject to stoichiometric constraints. The addition of constraints derived from

binary controls turned such a case into solving a simulation problem of Mixed Integer Linear

Programming (MILP) model; its solution is a set of non-native pathways, which are obtained

through balancing between minimizing numbers of heterologous genes and maximizing

theoretical product yield.

2.4 OptKnock: pruning of the host organism's stoichiometric model in order to achieve highest production of compounds (bi-level computational framework) Even addition of optimal set(s) of non-native biotransformations would lead to production of

desired compounds; it would not guarantee an overproduction of this compound. In order to

maximize this output, another computational framework, OptKnock (Burgard et al. 2003;

Pharkya et al. 2003) was used to modulate the flux distribution toward specific orientation by

containing other competing reactions and byproducts.

This framework is aim to optimize a bilevel problem, that is balancing between two

competing optimal strategists (cellular objective and chemical production). As one promising

achievement from the same group of OptStrain, OptKnock also utilize genome-scale

metabolic models as stoichiometric basis and flux balance analysis (FBA), then maximization

of biomass formation (cellular objective), optimization of target chemical or biochemical

compounds (chemical production), and candidate gene deletions are set as additional

constraints in looking for a likely flux distribution. Optimal solutions after overall

considering all alternative solutions would give out suggested genes for knocking out. It

should be noted that OptKnock’s results from testing a wide range of products (succinate,

lactate, 1,3-propanediol, glutamate, alanine, hydrogen, vanillin) showed good congruence

with the experimental data published in literatures.

3. Case studies results from OptStrain

To verify the effectiveness of OptStrain, two diversified compounds, hydrogen and vanillin,

are selected as test examples of in silico strain design. In the case of hydrogen production,

three different hosts are used, and a few substrates are screened. As for the case of vanillin,

minimum set of non-native reactions required for compound production was identified, and

this highlights the outcome of OptStrain. Nevertheless, the suggestions of reshaping strain

made by this framework look like only are in silico predictions, and have little support form

existing experimental data. Hence, the results of implementation of those suggestions are

really expected, not only for model validation but also for preliminary engineering trial.

3.1 Hydrogen production The result of LP formulation (step 2) among many substrates shows that methanol turned to

be the most efficient ‘raw material’ for hydrogen production, and glucose also selected for

further test owning to its economic availability. Three strains are used for analysis: 1). E. coli,

2). C. acetobutylicum, 3). M. extorquens.

3.1.1 glucose substrate in E. coli

Table 1. Deletion mutants for enhanced hydrogen production in E. coli

For this case, OptStrain doesn’t find any non-native reaction needed for hydrogen producing.

But OptKnock (step 4) identified two sets of reactions (shown in Table 1), whose deletions

would decrease competition with hydrogen production at most degrees.

3.1.2 glucose substrate in C. acetobutylicum

C. acetobutylicum had been intensively studied as a model organism of hydrogen production

(Chin et al. 2003), and OptStrain also does not find any non-native reactions are needed.

Importantly, the output of OptKnock’s two knockout candidates is highly congruent with

experimental data.

3.1.2 methanol substrate in M. extorquens AM1

M. extorquens AM1 could live on methanol as solely carbon source, therefore, also been

thoroughly studied (Van Dien et al. 2003; Chistoserdova et al. 2004), even a stoichiometric

model of central metabolism had been established (Van Dien and Lidstrom 2002). Single

non-native reaction was identified by OptStrain to enable hydrogen ability on this host; this

might be the reason why M. extorquens AM1 can not produce hydrogen originally.

3.2 Vanillin production There is constant demand for overproduction of vanillin in case of its limited natural yields

and economical value. So initially OptStrain was used to figure out the maximum theoretical

yields and minimum set of non-native reactions required for strain redesign of E. coli. Based

on a maximum theoretical yield of glucose at 0.63 g/g, OptKnock was used again for

overproduction optimization in such an augmented genome-scale model with three non-native

bioconversions, and many alternative pathways were found that their deletions could meet the

criteria of maximum vanillin production. Finally the author look like choose the three cases

which have experimental data support to be further discussed in the paper. They are one

deletion (removal of acetaldehyde dehydrogenase, EC 1.2.1.10), double deletion (with

Figure 2. Calculated flux distributions at the maximum growth rates in the (A) one, (B) two,

and (C) four deletion E. coli mutant networks for overproducing vanillin. Non-native

reactions are denoted by the thicker gray arrows. A basic glucose uptake rate of 10

mmol/gDW per hour was assumed.

additional removal of glucose -6-phophate isomerase EC 5.3.1.9), and four-reaction deletion

(with deletion of acetate kinase EC 2.7.2.1, pyruvate kinase EC 2.7.1.40, the PTS transport

mechanism, and fructose 6-phosphate aldolase). These modifications on flux distribution

network (shown in Figure 2) presume higher vanillin production level under all culture

conditions.

4 OptReg

As an upgraded version of OptKnock, OptReg (Pharkya and Maranas, 2006) was developed

to maximize the production of desired compound through modulation on pathways by up- or

down-regulating reactions besides knocking them out. However, since this extended

computational framework take into account much complex conditions than the former one did,

its computational complexity is magnified.

OptReg still applied Mixed Integer Linear Programming (MILP) simulation to solve the

optimization problem in this framework. In this case, regulation strength parameter C (Figure

3) was introduced into the framework to simulate the conditions of up- and down-regulation,

so there are actually three states (up/down-regulating and knocking out) for any reaction in

the model for optimal secreening. In present paper (Pharkya and Maranas, 2006) all results

were calculated while fixing the value of C as 0.5, however they might be completely

different if C were assigned to another value; and this might be a potential drawback of

OptReg.

It is still unrealistic to check those predictions by present experimental techniques, therefore,

another computational criterion for anticipating microbial systems, MOMA (Segre et al.,

2002), was applied to evaluate OptReg’s performance. Anyway, even the authors agree that a

similar conclusion from both statistical methods could not make confirmative remarks on

OptReg.

Figure 3: A pictorial overview of the definitions of up/down regulations and deletions. 0≤C

≤1, The lower bound vj min for a flux j may be greater than zero if it is required for biomass

formation, and the reaction cannot be knocked out. 5 Calculation and Software

All the optimization problems above were solved using CPLEX 7.0 accessed via the GAMS

(Brooke et al. 1998) modeling environment on an IBM RS6000-270 workstation.

6 Conclusions

Such a series of frameworks (OptKnock, OptStrain and OptReg) highlighted continuous

efforts laid on the advancement of metabolic engineering modification from this group in

Pennsylvania State University (PSU). Their studies would be essentially helpful in improving

the application of biotechnology in industry innovations, and also paved solid basis for more

comprehensive understanding on this leading edge field as well. Admittedly, some of the

examples discussed in their papers are in good agreement with experimental data in published

literatures. But this could not provide us enough confidence on their performance due to only

a limited number of experimental test results could be available until now. As we known,

The stoichiometry models used in metabolic engineering study could have alternative

solutions because only extracellular parameters could be accurately measured, thus

intracellular information is required in order to obtain better predictions from present highly

redundant models. Consequently, more accurate data from intracellular conditions would be

tremendous helpful in polishing these frameworks, and new techniques (e.g. NMR, mass

spectrometry, and isotopic atoms labeling etc.) that could intensively measure those

intracellular parameters would provide more precise and stringent constraints over those

optimization models, therefore, could make significant advancement for the frameworks.

7 References

Brooke, A., Kendrick, D., Meeraus, A., and Raman, R. 1998. GAMS: A user’s guide. GAMS Development Corp., Washington, D.C. Brown, M. 1999. Perl programmer’s reference. Osborne/McGraw-Hill, Berkeley, CA. Burgard, A., Pharkya, P. and Maranas, C. 2003. OptKnock: A Bilevel Programming Framework for Identifying Gene Knockout Strategies for Microbial Strain Optimization, Biotech. Bioeng., 84: 647-657.

Chin, H.L., Chen, Z.S., and Chou, C.P. 2003. Fedbatch operation using Clostridium acetobutylicum suspension culture as biocatalyst for enhancing hydrogen production. Biotechnol. Prog. 19: 383–388. Chistoserdova, L., Laukel, M., Portais, J.C., Vorholt, J.A., and Lidstrom, M.E. 2004. Multiple formate dehydrogenase enzymes in the facultative methylotroph Methylobacterium extorquens AM1 are dispensable for growth on methanol. J. Bacteriol. 186: 22–28. Fischer, E., Zamboni, N., Sauer, U., 2004. High-throughput metabolic flux analysis based on gas chromatography-mass spectrometry derived 13C constraints. Anal. Biochem. 325: 308–316. Ignizio, J.P., Cavalier, T.M., 1994. Linear programming. Prentice Hall, Englewood Cliffs, N.J. Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y. and Hattori, M. 2004. The KEGG resource for depichering the genome, Nucleic Acid res. 32: D277-D280.

Pharkya, P., Burgard, A.P., Maranas, C.D. 2003. Exploring the overproduction of amino acids using the bilevel optimization framework OptKnock. Biotechnol. Bioeng. 84: 887–899.

Pharkya, P., Burgard, A.P., Maranas, C.D., 2004. OptSrain: a computational framework for redesign of microbial production networks. Genome Res. 14: 2367–2376. Pharkya, P., Maranas, C.D., 2006. An optimization framework for identifying reaction activation/inhibition or elimination candidates for overproduction in microbial systems. Metabolic Engineering 8: 1–13. Reed, J.L., Vo, T.D., Schilling, C.H., and Palsson, B.O. 2003. An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biol. 4: R54. Segre, D., Vitkup, D., Church, G.M., 2002. Analysis of optimality in natural and perturbed metabolic networks. Proc. Natl. Acad. Sci. USA 99: 15112–15117. Van Dien, S.J. and Lidstrom, M.E. 2002. Stoichiometric model for evaluating the metabolic capabilities of the facultative methylotroph Methylobacterium extorquens AM1, with application to reconstruction of C(3) and C(4) metabolism. Biotechnol. Bioeng. 78: 296–312.

Van Dien, S.J., Strovas, T., and Lidstrom, M.E. 2003. Quantification of central metabolic fluxes in the facultative methylotroph Methylobacterium extorquens AM1 using 13C-label tracing and mass spectrometry. Biotechnol. Bioeng. 84: 45–55.

Documents

System Level Application of Metabolic Engineering · System Level Application of Metabolic Engineering ----- In silico Microbial Strain Redesign Hao Wang Helsinki 9.4.2007 ... network