Optimization Based Frameworks and Search Methodologies for the
Analysis and Redesign of the Escherichia coli Metabolic Network.
Thesis defense by: William W. Gikandi
Major Professor: Matheos Koffas
Additional committee Members:
Prof. E. (Manolis) S. Tzanakakis
Prof. Sriram Neelamegham
Cell Modeling to Improve Naringenin Production in E. coli
Cell Modeling
A variety of methods exist to identify the steady-state fluxes of a cell. The main ones are Flux Balance Analysis (FBA) and Minimization of Metabolic Adjustment (MOMA).
Flux Balance Analysis: Procedure
Steady State Assumption
Is it biologically justifiable to assume a steady state?
“The steady state approximation is generally valid because of fast equilibration of metabolite concentrations (seconds) with respect to the time scale of genetic regulation (minutes)” – Segre 2002
Maximization Objective
Cell’s objective is to maximize biomass
The Maximization objective = the stoichiometric sum of components that constitute Biomass
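The steady-state constraint and the biomass objective can be combined into a linear program. The sketch below is a minimal illustration on a made-up four-reaction network, not the thesis model: the matrix `S`, the flux bounds, and the choice of reaction R3 as the "biomass" drain are all assumptions chosen for readability. It uses `scipy.optimize.linprog`, which minimizes, so the objective is negated.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (rows: metabolites A, B; columns: reactions R1..R4).
# R1: -> A (uptake)   R2: A -> B   R3: B -> (biomass drain)   R4: A -> (byproduct)
S = np.array([
    [1.0, -1.0,  0.0, -1.0],  # mass balance on A
    [0.0,  1.0, -1.0,  0.0],  # mass balance on B
])

# Assumed flux bounds: uptake capped at 10, everything irreversible.
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]

# Maximize the biomass flux v3; linprog minimizes, hence the minus sign.
c = np.array([0.0, 0.0, -1.0, 0.0])

# Steady state: S v = 0.
result = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")

fba_fluxes = result.x
fba_biomass = -result.fun
print("fluxes:", fba_fluxes, "biomass flux:", fba_biomass)
```

In a genome-scale model the same structure applies, with `S` holding one column per flux and the objective vector carrying the stoichiometric biomass composition.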
Minimization of Metabolic Adjustment (MOMA)
Do mutant bacteria exhibit optimum metabolic states?
Not subjected to the same evolutionary pressure that shaped the wild type
Therefore knockouts probably do not possess a mechanism for immediate regulation of fluxes toward the optimal growth configuration
MOMA
Hypothesis: knocked out bacteria initially display a suboptimal flux distribution with minimal cell-wide changes in fluxes
MOMA uses quadratic programming to approximate this behavior
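The quadratic program behind this hypothesis can be sketched as: minimize the squared distance between the knockout flux vector and the wild-type flux vector, subject to steady state and to the knocked-out flux being forced to zero. The example below is a toy illustration under stated assumptions: the five-reaction network, the wild-type vector `v_wt`, and the bounds are invented, and SciPy's general-purpose SLSQP solver stands in for whatever QP solver the thesis used.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy network (rows: metabolites A, B; columns: reactions R1..R5).
# R1: -> A (uptake)   R2: A -> B   R3: B -> (biomass)   R4: A -> (byproduct)
# R5: A -> B (alternative route)
S = np.array([
    [1.0, -1.0,  0.0, -1.0, -1.0],  # mass balance on A
    [0.0,  1.0, -1.0,  0.0,  1.0],  # mass balance on B
])

# Assumed wild-type flux distribution (all flux through R2, none through R5).
v_wt = np.array([10.0, 10.0, 10.0, 0.0, 0.0])

# Knock out R2 by fixing its bounds to zero; other bounds are assumptions.
bounds = [(0, 10), (0, 0), (0, 1000), (0, 1000), (0, 1000)]

result = minimize(
    fun=lambda v: np.sum((v - v_wt) ** 2),      # squared Euclidean distance
    x0=np.zeros(5),                             # feasible starting point
    jac=lambda v: 2.0 * (v - v_wt),
    bounds=bounds,
    constraints=[{"type": "eq", "fun": lambda v: S @ v, "jac": lambda v: S}],
    method="SLSQP",
)

moma_fluxes = result.x
print("MOMA fluxes after knockout:", np.round(moma_fluxes, 3))
```

In this toy case the adjusted biomass flux (R3) settles below what the alternative route R5 could support if the cell re-optimized, which is the qualitative behavior the hypothesis describes.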
FBA and MOMA
MOMA calculates initial flux distribution after perturbation assuming sub-optimal growth.
FBA (incorrectly) assumes perturbed cells behave optimally from the onset.
Regulatory/ Kinetic effects not accounted for.
FBA/MOMA: constraints, fluxes
Does Cell Modeling Work?
Qualitatively predict the growth potential of mutant strains
Qualitatively predict media dependent uptake/ secretion of protons in the growth
The average difference between experimental flux measurements and ones predicted by the model was 16%
Quantitatively describe relationship between uptake of a primary carbon source (acetate, malate, succinate), oxygen and maximal cellular growth rate.
Successfully identify triple-knockout gene targets that improved lycopene yield by ~ 40% in E. coli
FBA/ MOMA
Building the Model
[c]akg + ala-L <==> glu-L + pyr
[c]ala-L <==> ala-D
[c]asn-L + h2o --> asp-L + nh4
[c]asp-L + atp + nh4 --> amp + asn-L + h + ppi
[c]asp-L + atp + gln-L + h2o --> amp + asn-L + glu-L + h + ppi
[c]asp-L --> fum + nh4
[c]akg + asp-L <==> glu-L + oaa
[c]3mob + ala-L --> pyr + val-L
[c]ala-D + fad + h2o --> fadh2 + nh4 + pyr
Matrix Creator
1191 Total Fluxes
932 Reactions
259 Transport & Exchange Fluxes
70 Dead end Metabolites
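Turning reaction strings like the ones listed under "Building the Model" into a stoichiometric matrix could look roughly like the sketch below. It is not the thesis's Matrix Creator: the function names `parse_reaction` and `build_matrix` are hypothetical, and the parser only handles reactions with a single leading compartment tag such as `[c]`, so transport reactions with per-metabolite compartments are out of scope here.

```python
import re

def parse_reaction(text):
    """Return ({metabolite: coefficient}, reversible) for one reaction string."""
    text = re.sub(r"^\[\w\]\s*", "", text.strip())   # drop a leading "[c]" tag
    reversible = "<==>" in text
    left, right = re.split(r"<==>|-->", text)
    stoich = {}
    for side, sign in ((left, -1.0), (right, 1.0)):  # substrates negative
        for term in side.split(" + "):
            term = term.strip()
            if not term:
                continue
            # Optional coefficient in parentheses, e.g. "(3) malcoa".
            match = re.match(r"^\((\d+(?:\.\d+)?)\)\s*(.+)$", term)
            coeff, name = (float(match.group(1)), match.group(2)) if match else (1.0, term)
            stoich[name] = stoich.get(name, 0.0) + sign * coeff
    return stoich, reversible

def build_matrix(reactions):
    """Return (metabolite names, matrix as list of rows, reversibility flags)."""
    parsed = [parse_reaction(r) for r in reactions]
    metabolites = sorted({m for stoich, _ in parsed for m in stoich})
    matrix = [[stoich.get(m, 0.0) for stoich, _ in parsed] for m in metabolites]
    return metabolites, matrix, [rev for _, rev in parsed]

reactions = [
    "[c]akg + ala-L <==> glu-L + pyr",
    "[c]asp-L --> fum + nh4",
    "[c](3) malcoa + cmcoa --> (4) coa + chal + (3) co2",
]
metabolites, matrix, reversible = build_matrix(reactions)
for name, row in zip(metabolites, matrix):
    print(f"{name:8s}", row)
```

Each column of the resulting matrix corresponds to one reaction and each row to one metabolite, which is the form the steady-state constraint operates on.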
Current Model
Glycolysis, the TCA cycle, the pentose phosphate pathway, respiration, anaplerotic reactions, fermentative reactions, amino acid biosynthesis and degradation, nucleotide biosynthesis and interconversions, fatty acid biosynthesis and degradation, phospholipid biosynthesis, cofactor biosynthesis, and metabolite transport
Testing the Model
Obtained in-silico exchange fluxes vs. Palsson's iJR904 model
Similar results for Anaerobic-Glucose, Aerobic-Succinate, Aerobic-Acetate substrates
[Figure: Exchange flux. Output (mmol/g DW-hr), Matlab model vs. Palsson, for EX_co2(e), EX_h(e), EX_h2o(e), EX_pi(e), EX_nh4(e), Biomass]
Proton Exchange Flux
Limiting exchange of protons across system boundary
[Figure: Relative growth rate vs. proton secretion flux for Acetate, Akg, Glucose-D, L-lactate, D-Lactate, Malate, Pyruvate, Succinate, Glycerol]
Naringenin
Reactions added
Participating Enzyme              Reaction
Coumaric Acid transport           cma[e] <==> cma[c]
4 coumarate:coenzyme A ligase     [c]atp + cma + coa --> amp + ppi + cmcoa
Chalcone Synthase                 [c](3) malcoa + cmcoa --> (4) coa + chal + (3) co2
Chalcone Isomerase                [c]chal --> flva
Naringenin exchange flux          [e]flva <==>
Coumaric Acid exchange flux       [e]cma <==>
Naringenin transport              flva[e] <==> flva[c]
Evaluate Scenarios
Gene-Protein Relationships
Gene Map
Overall Process
Standard Search
Combinatorial Explosion
At 2 seconds/calculation…
Primary Knockouts < 3 hours
Secondary Knockouts ~ 1 day
Tertiary Knockouts ~ 12 days
Quaternary Knockouts ~ 230 days
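The growth behind these estimates can be illustrated by counting combinations. The sketch below only shows how quickly a fully exhaustive search scales with knockout depth; the candidate count of 1000 is a hypothetical round number, so the printed times are not expected to match the estimates above, which may come from a restricted candidate set.

```python
from math import comb

SECONDS_PER_CALCULATION = 2      # figure quoted on the slide
SECONDS_PER_DAY = 86_400

def exhaustive_days(n_candidates, depth):
    """Days needed to evaluate every knockout set of the given depth."""
    return comb(n_candidates, depth) * SECONDS_PER_CALCULATION / SECONDS_PER_DAY

n_candidates = 1000              # hypothetical number of knockout candidates
for depth, label in enumerate(["Primary", "Secondary", "Tertiary", "Quaternary"], start=1):
    days = exhaustive_days(n_candidates, depth)
    print(f"{label:10s} {comb(n_candidates, depth):>15,d} sets  ~{days:,.2f} days")
```

Each additional knockout level multiplies the number of sets by roughly the number of remaining candidates, which is why deeper levels call for either a restricted search or a heuristic.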
Limited search space
Problem of large search space
Time taken
Not all of the search space is covered
Other methods possible? Genetic Algorithm
Genetic Algorithm
Crossover - Recombination
Crossover combines genetic material from two parents in order to produce superior offspring.
Mutation
Mutation introduces randomness into the population.
The idea of mutation is to reintroduce divergence into a converging population.
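Both operators can be sketched on bit-string chromosomes, where a 1 marks a knocked-out target. This is an illustrative sketch rather than the thesis code: uniform ("scattered") crossover and independent bit-flip mutation are assumed as common variants, and the parent chromosomes are arbitrary.

```python
import random

def scattered_crossover(parent_a, parent_b, rng):
    """Take each gene from parent_a or parent_b with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]

rng = random.Random(0)                       # seeded for repeatability
parent_a = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]    # arbitrary example parents
parent_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

child = scattered_crossover(parent_a, parent_b, rng)
mutant = mutate(child, rate=1 / len(child), rng=rng)
print("child :", child)
print("mutant:", mutant)
```

Crossover can only recombine genes already present in the parents, so mutation is the operator that can introduce a knockout neither parent carries.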
Fitness Function
The Fitness function determines what solutions are better than others.
Fitness is computed for each individual.
Fitness = flavonoid production
Example population
No. Chromosome Fitness
1 1010011010 1
2 1111100001 2
3 1011001100 3
4 1010000000 1
5 0000010000 3
6 1001011111 5
7 0101010101 1
8 1011100111 2
Selection
Main idea: better individuals get a higher chance
Chances proportional to fitness
Roulette wheel technique
fitness(A) = 3: 3/6 = 50%
fitness(B) = 1: 1/6 = 17%
fitness(C) = 2: 2/6 = 33%
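The roulette wheel for individuals A, B and C can be reproduced in a few lines. The sketch below uses Python's standard `random.choices` with fitness values as weights; the helper names are hypothetical and the draw count is arbitrary.

```python
import random
from collections import Counter

def selection_probabilities(fitness):
    """Share of the wheel for each individual: fitness / total fitness."""
    total = sum(fitness.values())
    return {name: value / total for name, value in fitness.items()}

def roulette_select(fitness, rng):
    """Draw one individual with probability proportional to its fitness."""
    names = list(fitness)
    weights = [fitness[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

fitness = {"A": 3, "B": 1, "C": 2}
probabilities = selection_probabilities(fitness)
print(probabilities)

rng = random.Random(42)                      # seeded for repeatability
counts = Counter(roulette_select(fitness, rng) for _ in range(6000))
print(counts)
```

Low-fitness individuals such as B are still selected occasionally, which helps keep diversity in the population.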
Stopping Criteria
The final problem is to decide when to stop execution of the algorithm.
There are two possible solutions to this problem:
First approach: Stop after production of a definite number of generations
Second approach: Stop when the improvement in average fitness over two generations is below a threshold
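The two stopping rules can be combined into a single check. The function below is a sketch with hypothetical names and parameters (`should_stop`, `max_generations`, `threshold`); it compares only the last two recorded averages, which is one straightforward reading of the second approach.

```python
def should_stop(avg_fitness_history, max_generations, threshold):
    """Return True when either stopping rule is met.

    avg_fitness_history holds one average-fitness value per completed generation.
    """
    generations_done = len(avg_fitness_history)
    # First approach: fixed generation budget.
    if generations_done >= max_generations:
        return True
    # Second approach: improvement between the last two generations is too small.
    if generations_done >= 2:
        improvement = avg_fitness_history[-1] - avg_fitness_history[-2]
        return improvement < threshold
    return False

print(should_stop([1.0, 1.5], max_generations=100, threshold=0.01))    # still improving
print(should_stop([1.0, 1.001], max_generations=100, threshold=0.01))  # stalled
```

A stricter variant would require the improvement to stay below the threshold for several consecutive generations before stopping.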
Typical behavior of an EA
Early phase:
quasi-random population distribution
Mid-phase:
population arranged around/on hills
Late phase:
population concentrated on high hills
Phases in optimizing on a 1-dimensional fitness landscape
Advantages of GAs
Search space not limited to first top 10 knockouts
Supports multi-objective optimization
Can return a family of solutions with similar fluxes
Easy to exploit previous or alternate solutions
May find synergistic knockouts overlooked by standard search
Genetic Algorithm
Parameters of the GA
Representation scheme: Integer, e.g. [00100111] corresponds to [3 6 7 8]
Mutation rate: 1/string length, locus restricted
Crossover type: scattered (random mix)
Elite children: 2
Stall generations: 50
Population size: 1000
Mutation probability: Simulated Annealing
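The representation example can be read as a bit string whose set positions, counted from 1, give the integer list of knocked-out targets. That reading is an assumption inferred from the example itself, and the helper names below are hypothetical.

```python
def bits_to_indices(bits):
    """Return the 1-based positions of the '1' characters in a bit string."""
    return [position for position, bit in enumerate(bits, start=1) if bit == "1"]

def indices_to_bits(indices, length):
    """Inverse mapping: build a bit string of the given length from 1-based positions."""
    chosen = set(indices)
    return "".join("1" if position in chosen else "0" for position in range(1, length + 1))

knockouts = bits_to_indices("00100111")
print(knockouts)
print(indices_to_bits(knockouts, length=8))
```

The integer form stays short when only a handful of knockouts are active, whereas the bit string grows with the number of candidate targets.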
Simulated Annealing
[Figure: Change in Mutation Rate. Mutation rate vs. Generation %]
Results:
Results: Summary
Over 10,000 KO results were stored by the algorithms, out of about 900,000 MOMA calculations performed
Results: Hill Climber vs. GA
Results for both methods in agreement
Exhaustive combination of the top 10 most frequently suggested KOs yielded no better results
Implications: the search space is not as chaotic as originally assumed
Which is better?
Results: Effect of Gene Mapping
More accurate prediction on reactions affected by disruption of genes
For example, the top yielding candidate for a primary level knockout predicted the loss of two reactions
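Applying a gene map during a simulated knockout amounts to looking up every reaction associated with the gene and closing its flux bounds. The sketch below is a simplification with placeholder data: the gene and reaction names are hypothetical, and it treats every mapped reaction as lost, ignoring cases where another gene (an isozyme) could still carry the reaction.

```python
def knockout_bounds(bounds, genes, gene_to_reactions):
    """Return a copy of the flux bounds with all reactions of the given genes set to zero."""
    updated = dict(bounds)
    for gene in genes:
        for reaction in gene_to_reactions.get(gene, []):
            updated[reaction] = (0.0, 0.0)
    return updated

# Hypothetical gene map: one gene may control several reactions.
gene_to_reactions = {
    "geneA": ["RXN1", "RXN2"],
    "geneB": ["RXN3"],
}
wild_type_bounds = {
    "RXN1": (0.0, 1000.0),
    "RXN2": (-1000.0, 1000.0),
    "RXN3": (0.0, 1000.0),
}

mutant_bounds = knockout_bounds(wild_type_bounds, ["geneA"], gene_to_reactions)
print(mutant_bounds)
```

Knocking out `geneA` here removes two reactions at once, mirroring the kind of multi-reaction loss described for the top primary candidate.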
Results: Primary Level
The top result predicted a flux increase of naringenin from zero with no knockouts performed to 0.6078 mmol/g-DW/hr
Gene: sdhC Reaction:
Knocking out this reaction reduces the amount of fumarate available to the cell (other sources remain available, e.g. glutamate degradation).
Results: Primary Level
Affects ATP availability?
Results: Primary Level
The second-best result
Gene: tpiA (glycolysis)
Affects ATP availability?
Results: Top 3 in each level
[Figure: Top 3 Simulated KO in each level. Naringenin flux (mmol/g-DW/hr) by KO Genes:
Primary('sdhC'), Primary('tpiA'), Primary('gnd'),
Secondary('gnd' 'sdhC'), Secondary('glyA' 'sdhC'), Secondary('folD' 'sdhC'),
Tertiary('gdhA' 'gnd' 'sdhC'), Tertiary('gcd' 'glyA' 'sdhC'), Tertiary('mdh' 'glyA' 'sdhC'),
Quaternary('dcuC' 'brnQ' 'gnd' 'sdhC'), Quaternary('dcuC' 'brnQ' 'folD' 'sdhC'), Quaternary('gdhA' 'pgi' 'brnQ' 'gnd'),
GA Quaternary('gnd' 'dcuC' 'brnQ' 'sdhD'), GA Quaternary('sdhC' 'gdhA' 'aceA' 'gnd'), GA Quaternary('gnd' 'mdh' 'gdhA' 'sdhB')]
Results: Increase over Wild type
% increase over predicted naringenin wildtype flux (0.0002 mmol/g-DW/hr)
[Figure: % increase of naringenin flux by KO Genes, for the top 3 Primary, Secondary, Tertiary, Quaternary and GA Quaternary knockout sets]
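The percentage increase is computed relative to the predicted wild-type flux. As a worked example, the sketch below applies the standard relative-change formula to the two values quoted in the slides (wild type 0.0002 and the primary sdhC knockout 0.6078 mmol/g-DW/hr); the function name is hypothetical.

```python
WILD_TYPE_FLUX = 0.0002   # predicted wild-type naringenin flux, mmol/g-DW/hr

def percent_increase(flux, baseline=WILD_TYPE_FLUX):
    """Relative change of `flux` over `baseline`, in percent."""
    return (flux - baseline) / baseline * 100.0

sdhc_increase = percent_increase(0.6078)   # primary sdhC knockout
print(f"{sdhc_increase:,.0f} % over wild type")
```

Because the baseline is so close to zero, even modest absolute fluxes translate into increases of several hundred thousand percent.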
Results: Targets
TCA cycle, the pentose phosphate pathway, and other biosynthetic pathways
Results: Rationalization
Precursor Availability
[Figure: Flux outputs (mmol/g-DW/hr) vs. Naringenin Flux (mmol/g-DW/hr) for Malonyl CoA ACP transacylase and acetyl CoA carboxylase]
Malonyl CoA ACP transacylase: only consumer of malonyl CoA
Acetyl CoA carboxylase: produces malonyl CoA
Naringenin/ Biomass Relationship
[Figure: Biomass flux (mmol/g-DW/hr) vs. Naringenin output (mmol/g-DW/hr)]
Competition for precursors
Results: Diminishing Returns
[Figure: % increases of top 3 KO's over previous levels. % increase of naringenin flux by KO Genes, for the top 3 Primary, Secondary, Tertiary and Quaternary knockout sets]
Results: Diminishing Returns
Biomass Threshold
[Figure: Biomass Flux of top 3 KO's in each level. Biomass flux (mmol/g-DW/hr) by KO Genes, for the wild type and the top 3 Primary, Secondary, Tertiary, Quaternary and GA Quaternary knockout sets]
In Conclusion
Will all knockouts identified show increased productivity?
In-vivo results could provide an opportunity to improve the model.
The approaches used justify some optimism regarding gene targeting for strain improvement
Provide a clearer understanding of the nature of the optimization goal
Questions?