FBA

Optimization Based Frameworks and Search Methodologies for the Analysis and Redesign of the Escherichia coli Metabolic Network

Thesis defense by: William W. Gikandi

Major Professor: Matheos Koffas

Additional committee members:

Prof. E. (Manolis) S. Tzanakakis

Prof. Sriram Neelamegham

Cell Modeling to Improve Naringenin Production in E. coli

Cell Modeling

A variety of methods exist to identify the steady state fluxes of a cell. The main ones are Flux Balance Analysis (FBA) and MOMA.

Flux Balance Analysis: Procedure

Steady State Assumption

Is it biologically justifiable to assume a steady state?

“The steady state approximation is generally valid because of fast equilibration of metabolite concentrations (seconds) with respect to the time scale of genetic regulation (minutes)” – Segre 2002

Maximization Objective

Cell’s objective is to maximize biomass

The Maximization objective = the stoichiometric sum of components that constitute Biomass
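
The steady state assumption and the biomass objective combine into the linear program usually written for FBA. The block below is a sketch of that standard formulation; the symbols S (stoichiometric matrix), v (flux vector), c (biomass weights) and the bounds v^min and v^max are notation assumed here, not taken from the slides.

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}.
\end{aligned}
```

The equality S v = 0 is the steady state condition (no metabolite accumulates), and the bounds encode reaction irreversibility and uptake limits.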

Minimization of Metabolic Adjustment (MOMA)

Do mutant bacteria exhibit optimum metabolic states?

Not subjected to the same evolutionary pressure that shaped the wild type

Therefore knockouts probably do not possess a mechanism for immediate regulation of fluxes toward the optimal growth configuration

MOMA

Hypothesis: knocked out bacteria initially display a suboptimal flux distribution with minimal cell-wide changes in fluxes

MOMA uses quadratic programming to approximate this behavior
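
The quadratic program behind MOMA can be sketched as follows. The notation is assumed for illustration: v^wt is the wild-type flux distribution (for example, an FBA solution), S is the stoichiometric matrix, and j indexes the reactions removed by the knockout.

```latex
\begin{aligned}
\min_{v}\quad & \sum_{i} \left( v_i - v_i^{\mathrm{wt}} \right)^2 \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}, \\
& v_j = 0 \quad \text{for every knocked-out reaction } j.
\end{aligned}
```

The objective is the squared Euclidean distance to the wild-type fluxes, which is what makes the problem quadratic rather than linear.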

FBA and MOMA

MOMA calculates initial flux distribution after perturbation assuming sub-optimal growth.

FBA (incorrectly) assumes perturbed cells behave optimally from the onset.

Regulatory/ Kinetic effects not accounted for.

FBA/MOMA: constraints on fluxes

Does Cell Modeling Work?

Qualitatively predict the growth potential of mutant strains

Qualitatively predict media dependent uptake/ secretion of protons in the growth

The average difference between experimental flux measurements and ones predicted by the model was 16%

Quantitatively describe relationship between uptake of a primary carbon source (acetate, malate, succinate), oxygen and maximal cellular growth rate.

Successfully identify triple-knockout gene targets that improved lycopene yield by ~ 40% in E. coli

FBA/ MOMA

Building the Model

[c]akg + ala-L <==> glu-L + pyr

[c]ala-L <==> ala-D

[c]asn-L + h2o --> asp-L + nh4

[c]asp-L + atp + nh4 --> amp + asn-L + h + ppi

[c]asp-L + atp + gln-L + h2o --> amp + asn-L + glu-L + h + ppi

[c]asp-L --> fum + nh4

[c]akg + asp-L <==> glu-L + oaa

[c]3mob + ala-L --> pyr + val-L

[c]ala-D + fad + h2o --> fadh2 + nh4 + pyr

Matrix Creator
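
A minimal sketch of how reaction strings in this notation could be turned into a stoichiometric matrix. This is an illustration only, not the author's Matrix Creator; the function names `parse_side`, `parse_reaction` and `build_matrix` are hypothetical, and the parser assumes the `[c]` prefix is a compartment tag and that `(3)` style prefixes are stoichiometric coefficients.

```python
import re

def parse_side(side):
    """Return {metabolite: coefficient} for one side of a reaction."""
    terms = {}
    for term in side.split(" + "):
        term = term.strip()
        if not term:
            continue  # empty side, e.g. an exchange flux
        match = re.match(r"^\((\d+(?:\.\d+)?)\)\s*(\S+)$", term)
        if match:
            coeff, name = float(match.group(1)), match.group(2)
        else:
            coeff, name = 1.0, term
        terms[name] = terms.get(name, 0.0) + coeff
    return terms

def parse_reaction(text):
    """Parse '[c]a + b --> c' into (stoichiometry dict, reversible flag)."""
    text = re.sub(r"^\[\w\]\s*:?\s*", "", text.strip())  # drop compartment tag
    reversible = "<==>" in text
    left, right = text.split("<==>" if reversible else "-->")
    stoich = {}
    for name, coeff in parse_side(left).items():
        stoich[name] = stoich.get(name, 0.0) - coeff  # consumed
    for name, coeff in parse_side(right).items():
        stoich[name] = stoich.get(name, 0.0) + coeff  # produced
    return stoich, reversible

def build_matrix(reactions):
    """Rows are metabolites (sorted by name), columns are reactions."""
    parsed = [parse_reaction(r) for r in reactions]
    metabolites = sorted({m for stoich, _ in parsed for m in stoich})
    matrix = [[stoich.get(m, 0.0) for stoich, _ in parsed] for m in metabolites]
    reversible = [rev for _, rev in parsed]
    return metabolites, matrix, reversible

reactions = [
    "[c]akg + ala-L <==> glu-L + pyr",
    "[c]ala-L <==> ala-D",
    "[c]asn-L + h2o --> asp-L + nh4",
]
metabolites, S, reversible = build_matrix(reactions)
```

Each column of `S` holds negative entries for consumed metabolites and positive entries for produced ones, so a flux vector `v` satisfies the steady state condition when every row of `S` multiplied by `v` sums to zero.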

1191 Total Fluxes

932 Reactions

259 Transport & Exchange Fluxes

70 Dead end Metabolites

Current Model

Glycolysis, the TCA cycle, the pentose phosphate pathway, respiration, anaplerotic reactions, fermentative reactions, amino acid biosynthesis and degradation, nucleotide biosynthesis and interconversions, fatty acid biosynthesis and degradation, phospholipid biosynthesis, cofactor biosynthesis, and metabolite transport

Testing the Model

Obtained in-silico exchange fluxes vs. Palsson's iJR904 model

Similar results for Anaerobic-Glucose, Aerobic-Succinate, Aerobic-Acetate substrates

[Figure: exchange flux output (mmol/g DW-hr, -20 to 50), Matlab vs. Palsson, for EX_co2(e), EX_h(e), EX_h2o(e), EX_pi(e), EX_nh4(e) and Biomass]

Proton Exchange Flux

Limiting exchange of protons across system boundary

[Figure: relative growth rate (0 to 1.2) vs. proton secretion flux (-10 to 6) for Acetate, Akg, Glucose-D, L-lactate, D-Lactate, Malate, Pyruvate, Succinate and Glycerol]

Naringenin

Reactions added

Participating Enzyme              Reaction
Coumaric Acid transport           cma[e] <==> cma[c]
4-coumarate:coenzyme A ligase     [c]atp + cma + coa --> amp + ppi + cmcoa
Chalcone Synthase                 [c](3) malcoa + cmcoa --> (4) coa + chal + (3) co2
Chalcone Isomerase                [c]chal --> flva
Naringenin exchange flux          [e]flva <==>
Coumaric Acid exchange flux       [e]cma <==>
Naringenin transport              flva[e] <==> flva[c]

Evaluate Scenarios

Gene-Protein Relationships



Gene Map

Overall Process

Standard Search

Combinatorial Explosion

At 2 seconds/calculation…

Primary Knockouts < 3 hours

Secondary Knockouts ~ 1 day

Tertiary Knockouts ~ 12 days

Quaternary Knockouts ~ 230 days

Limited search space
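
The growth in search time follows from counting combinations. The sketch below assumes an exhaustive search over a pool of `n` candidate genes; the pool size of 100 is hypothetical. It does not reproduce the timings quoted on the slide, which come from the author's own, limited search space.

```python
from math import comb

SECONDS_PER_CALCULATION = 2  # figure quoted on the slide
SECONDS_PER_DAY = 86400

def exhaustive_search_days(n_candidates, k):
    """Days needed to evaluate every k-gene knockout among n_candidates."""
    return comb(n_candidates, k) * SECONDS_PER_CALCULATION / SECONDS_PER_DAY

n = 100  # hypothetical pool size, not taken from the slides
for k in range(1, 5):
    days = exhaustive_search_days(n, k)
    print(f"{k} knockouts: {comb(n, k)} combinations, {days:.2f} days")
```

The number of combinations grows roughly as n^k / k!, so each extra knockout level multiplies the cost by a factor on the order of the pool size.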

Problem of large search space

Time taken

Not all of the search space covered

Other methods possible?

Genetic Algorithm

Genetic Algorithm


Crossover - Recombination

Crossover combines genetic material from two parents, in order to produce superior offspring.
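
A minimal sketch of one crossover variant, the scattered (random mix) type that the GA parameters in this deck name. It is written for bit strings to keep the example short; the function name `scattered_crossover` and the seeded generator are illustrative assumptions.

```python
import random

def scattered_crossover(parent_a, parent_b, rng=random):
    """Pick each position from one parent or the other at random."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    return "".join(rng.choice(pair) for pair in zip(parent_a, parent_b))

# Two example bit strings; the seed only makes the run repeatable.
child = scattered_crossover("1010011010", "1111100001", random.Random(0))
```

Wherever the parents agree, the child inherits that value unchanged; only the positions where they differ are mixed.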

Mutation

•Mutation introduces randomness into the population.

•The idea of mutation is to reintroduce divergence into a converging population.
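
A short sketch of bit-flip mutation on a binary chromosome. The default rate of 1 / string length matches the mutation rate listed in the GA parameters; the function name `mutate` and the bit-string encoding are assumptions made for illustration.

```python
import random

def mutate(chromosome, rate=None, rng=random):
    """Flip each bit independently with probability `rate`."""
    if rate is None:
        rate = 1.0 / len(chromosome)  # on average one flip per chromosome
    flipped = {"0": "1", "1": "0"}
    return "".join(
        flipped[bit] if rng.random() < rate else bit for bit in chromosome
    )

mutant = mutate("1010011010", rng=random.Random(0))
```

A rate of 0 leaves the chromosome untouched and a rate of 1 inverts every bit, which is a quick way to sanity-check the function.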

Fitness Function

The Fitness function determines what solutions are better than others.

Fitness is computed for each individual.

Fitness = flavonoid production

Example population

No. Chromosome Fitness

1 1010011010 1

2 1111100001 2

3 1011001100 3

4 1010000000 1

5 0000010000 3

6 1001011111 5

7 0101010101 1

8 1011100111 2

Main idea: better individuals get higher chance

Chances proportional to fitness

Roulette wheel technique

Selection

fitness(A) = 3, selection chance 3/6 = 50%

fitness(B) = 1, selection chance 1/6 = 17%

fitness(C) = 2, selection chance 2/6 = 33%
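
The roulette wheel technique can be sketched as follows, using the three fitness values from this slide. The function names are hypothetical, and this is one common way to implement fitness-proportional selection rather than the author's code.

```python
import random

def selection_probabilities(fitness):
    """Share of the wheel owned by each individual."""
    total = sum(fitness.values())
    return {name: value / total for name, value in fitness.items()}

def roulette_select(fitness, rng=random):
    """Return one key, chosen with probability proportional to its fitness."""
    spin = rng.random() * sum(fitness.values())
    running = 0.0
    for name, value in fitness.items():
        running += value
        if spin < running:
            return name
    return name  # guard against floating-point round-off

fitness = {"A": 3, "B": 1, "C": 2}
probabilities = selection_probabilities(fitness)
picks = [roulette_select(fitness, random.Random(seed)) for seed in range(5)]
```

`probabilities` reproduces the 3/6, 1/6 and 2/6 shares, and repeated calls to `roulette_select` pick A about half the time.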

Stopping Criteria

Final problem is to decide when to stop execution of algorithm.

There are two possible solutions to this problem:

First approach: Stop after production of a definite number of generations

Second approach: Stop when the improvement in average fitness over two generations is below a threshold
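
Both stopping rules fit in one small check. The sketch below assumes the algorithm records the average fitness of every generation; the name `should_stop` and the default values for `max_generations` and `threshold` are hypothetical.

```python
def should_stop(avg_fitness_history, max_generations=100, threshold=1e-3):
    """Decide whether to halt, given the average fitness of each generation."""
    # First approach: fixed generation budget.
    if len(avg_fitness_history) >= max_generations:
        return True
    # Second approach: improvement between the last two generations stalled.
    if len(avg_fitness_history) >= 2:
        improvement = avg_fitness_history[-1] - avg_fitness_history[-2]
        return improvement < threshold
    return False
```

In practice the two rules are often combined like this, so that a run that never stalls still terminates.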

Typical behavior of an EA

Early phase:

quasi-random population distribution

Mid-phase:

population arranged around/on hills

Late phase:

population concentrated on high hills

Phases in optimizing on a 1-dimensional fitness landscape

Advantages of GAs

Search space not limited to first top 10 knockouts

Supports multi-objective optimization

Can return a family of solutions with similar fluxes

Easy to exploit previous or alternate solutions

May find synergistic knockouts overlooked by standard search

Genetic Algorithm

Parameters of the GA

Representation scheme: Integer [00100111] [3 6 7 8]

Mutation rate: 1/ string length / locus restricted

Crossover type: scattered (random mix)

Elite children: 2

Stall generations: 50

Population size: 1000

Mutation probability: Simulated Annealing
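
One reading of the representation example is that the integer list holds the 1-based positions of the set bits, so [00100111] and [3 6 7 8] describe the same knockout set. The sketch below converts between the two forms under that assumption; the function names are hypothetical.

```python
def bits_to_indices(bits):
    """Return the 1-based positions of the '1' characters in a bit string."""
    return [i for i, bit in enumerate(bits, start=1) if bit == "1"]

def indices_to_bits(indices, length):
    """Inverse mapping: mark the listed 1-based positions with '1'."""
    chosen = set(indices)
    return "".join("1" if i in chosen else "0" for i in range(1, length + 1))

knockouts = bits_to_indices("00100111")
```

The integer form stays short when only a handful of genes out of hundreds are knocked out, which is a plausible reason to prefer it here.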

Simulated Annealing

[Figure: Change in Mutation Rate. Mutation rate (0 to 0.6) vs. generation % (0 to 100)]

Results:

Results: Summary

Over 10,000 KO results were stored by the algorithms, out of about 900,000 MOMA calculations performed

Results: Hill Climber VS GA

Results for both methods in agreement

Exhaustive combination of top 10 most frequently suggested KOs yielded no better results

Implications: the search space is not as chaotic as originally assumed

Which is better?

Results: Effect of Gene Mapping

More accurate prediction on reactions affected by disruption of genes

For example, the top yielding candidate for a primary level knockout predicted the loss of two reactions

Results: Primary Level

The top result predicted a flux increase of naringenin from zero with no knockouts performed to 0.6078 mmol/g-DW/hr

Gene: sdhC Reaction:

Reaction reduces amount of fumarate available to the cell. (Other sources available: e.g. glutamate degradation)


Affects ATP availability?


The second-best result

Gene: tpiA (glycolysis)

Affects ATP availability?

Results: Top 3 in each level

[Figure: Top 3 Simulated KO in each level. Naringenin flux (mmol/g-DW/hr, 0 to 1.6) by KO genes:

Primary: ('sdhC'), ('tpiA'), ('gnd')

Secondary: ('gnd' 'sdhC'), ('glyA' 'sdhC'), ('folD' 'sdhC')

Tertiary: ('gdhA' 'gnd' 'sdhC'), ('gcd' 'glyA' 'sdhC'), ('mdh' 'glyA' 'sdhC')

Quaternary: ('dcuC' 'brnQ' 'gnd' 'sdhC'), ('dcuC' 'brnQ' 'folD' 'sdhC'), ('gdhA' 'pgi' 'brnQ' 'gnd')

GA Quaternary: ('gnd' 'dcuC' 'brnQ' 'sdhD'), ('sdhC' 'gdhA' 'aceA' 'gnd'), ('gnd' 'mdh' 'gdhA' 'sdhB')]

Results: Increase over Wild type

% increase over predicted naringenin wildtype flux (0.0002 mmol/g-DW/hr)

[Figure: % increase of naringenin flux (0 to 800,000) for the top three knockout sets at each level (primary, secondary, tertiary, quaternary and GA quaternary)]
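
The scale of this chart follows from the very small wild-type baseline. A quick check of the arithmetic, using the two flux values stated in the slides (the function name `percent_increase` is illustrative):

```python
def percent_increase(new_flux, baseline_flux):
    """Relative increase over the baseline, in percent."""
    return (new_flux - baseline_flux) / baseline_flux * 100.0

WILD_TYPE_FLUX = 0.0002    # mmol/g-DW/hr, predicted wild-type naringenin flux
TOP_PRIMARY_FLUX = 0.6078  # mmol/g-DW/hr, top primary-level result (sdhC)

increase = percent_increase(TOP_PRIMARY_FLUX, WILD_TYPE_FLUX)
```

With these inputs the increase comes to about 303,800%, so percentages in the hundreds of thousands reflect a near-zero baseline rather than an unusually large absolute flux.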

Results: Targets

TCA cycle, the pentose phosphate pathway, and other biosynthetic pathways

Results: Rationalization

Precursor Availability

[Figure: flux outputs (mmol/g-DW/hr, 0 to 5) of malonyl CoA ACP transacylase and acetyl CoA carboxylase vs. naringenin flux (mmol/g-DW/hr, 0 to 1.6)]

Malonyl CoA ACP transacylase: only consumer of malonyl CoA

Acetyl CoA carboxylase: produces malonyl CoA

Naringenin/Biomass Relationship

[Figure: biomass flux (mmol/g-DW/hr, 0 to 0.9) vs. naringenin output (mmol/g-DW/hr, 0 to 1.6)]

Competition for precursors

Results: Diminishing Returns

[Figure: % increases of top 3 KO's over previous levels. % increase of naringenin flux (0 to 300) by KO genes, for the top three primary, secondary, tertiary and quaternary knockout sets]

Results: Diminishing Returns, Biomass Threshold

[Figure: Biomass Flux of top 3 KO's in each level. Biomass flux (mmol/g-DW/hr, 0 to 1) by KO genes, for the wild type and the top three primary, secondary, tertiary, quaternary and GA quaternary knockout sets]

In Conclusion

Will all knockouts identified show increased productivity?

In-vivo results could provide an opportunity to improve the model.

The approaches used justify some optimism regarding gene targeting for strain improvement

Provide a clearer understanding of the nature of the optimization goal

Questions?
