Discrete Modeling, Discovery and Prediction for Evolving, Living Systems

Myra B. Cohen (1), Nicole R. Buan (2), Christine Kelley (3), Mikaela Cashman (1), Jennie L. Catlett (2)

(1) Department of Computer Science & Engineering
(2) Department of Biochemistry
(3) Department of Mathematics

Motivation

Green Energy vs. Petroleum-based Fuels

Methane-producing archaea (methanogens)

•  Phylogenetically distinct group

•  Derive all their energy from reduction of C1 compounds to methane

•  4% of the global C cycle (2 Gigatons per year)*

•  Strict anaerobes

* Thauer RK. et al. 2008. Nat. Rev. Microbiol. 6:579-591.

Global C cycle

Methanogen Biotechnology

www.spaceX.com

www.sagentpharma.com

WHO Essential Medicine: ~50% of all chemotherapy

www.fordcngokc.com

www.mineralhq.com

Transportation Cleaner than diesel

Methanogen Biotechnology

Deer Island, MA Hyperion, CA

Lincoln, NE

Biomass Energy - Nebraska

Aliens Among Us

Adapted from Pace, NR. 2009. MMBR. 73(4):565-76

Humans

E. coli

methanogens

A Tale of two pathways… Methylotrophic Acetoclastic

Entropy-retarded Enthalpy-retarded

We can control behavior

Typical Organism Behaviors (e.g., E. coli)

First-principles reasoning?
•  Methanogens are ruled by:
   – Thermodynamics and biochemistry, information processing, regulation, selection, mutation, etc.
•  To date, no general set of equations describes behavior and evolution in a way that applies equally well to methanogens, bacteria, and eukaryotes

Dynamic
•  Organisms reproduce with ~99.999% probability of genetic information being passed to the next generation
•  Mutations occur, which can change gene functionality
•  Environment impacts the behavior:
   –  Food sources
   –  Light
   –  Temperature
   –  Pressure
   –  …?

Data Driven
•  As these organisms grow and die within their environment, they sense the environment and also exchange messages (communicate) with other organisms in their vicinity
•  Based on what they sense, they produce outputs (e.g., methane)

Models Today Chemical Reaction Networks

Reaction Networks
•  Allow us to model the chemical reactions (as PDEs) through a cell
•  Based on the “whole cell model”

Physical Models
•  Flux balance analysis:
   –  Optimization algorithm that solves the system of reaction equations to calculate the steady-state fluxes of an organism’s reaction network
   –  Can be used to predict biomass based on inputs
•  Gapfilling:
   –  Incomplete models may have incomplete networks and will not grow. Gapfilling fills in missing reaction pathways using mixed-integer linear programming
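As a hedged sketch, flux balance analysis is commonly posed as the linear program below. The symbols do not appear on the slides and are assumptions for illustration: S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c is a weight vector that selects the objective (for example, the biomass reaction), and l_j, u_j are lower and upper flux bounds, such as the maximum uptake flux allowed for a media compound.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0 \quad \text{(steady state)} \\
& l_j \le v_j \le u_j \quad \text{for each reaction } j
\end{aligned}
```

Under this formulation, changing the bound u_j for a media compound changes the feasible region, which is one way the inputs can affect the predicted biomass.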

Problems with Existing Models

•  Highly dependent on human annotations from empirical data

•  Infer unknown behavior from organisms that are annotated

•  Complex – difficult to reason about high level behavior

Variance of Pathways

Lieber, Catlett, Madayiputhiya, Nandukumar, Lopez, Metcalf and Buan. 2014. PLOS One. 9(9): e107563.

Application Systems

Lieber, Catlett, Madayiputhiya, Nandukumar, Lopez, Metcalf and Buan. 2014. PLOS One. 9(9): e107563.

Organisms sense, adapt

Use DDDAS?

Software (Discrete) Testing Perspective

Configurable Software

Discrete/Model Sampling

Observe Behavior

Optimize Parameters for an objective

Pierobon, Cohen, Buan, Kelley, SCIM: Sampling, Characterization, Inference and Modeling of Biological Consortia, 2015

Methanogen Configuration Options

•  Media compounds (e.g., glucose)
•  Light
•  Pressure
•  Temp
•  Oxygen

Use discrete values for sampling

Reasoning about Configurations with Coding Theory

Error-correcting codes: transmit information reliably and efficiently across space/time

Factor graphs
•  Variable nodes represent information
•  Constraint nodes represent constraints/dependencies
•  Decoding and error correction are performed via message passing on the edges of the graph
•  Update rules of the messages at the nodes follow the belief propagation algorithm on Bayesian networks

Factor Graph

•  The input (i.e., “channel information”) to each variable node is a vector with n parameters (one for each factor)

•  Update rules are designed for each factor, and iterative decoding is performed to determine how the system behaves for various inputs

•  We can test how the system changes with modifications to certain factors

[Figure: factor graph with factor nodes f1, f2, f3, f4 and variable nodes x1, x2, x3, x4, x5, x6]

f(x1,x2,x3,x4,x5,x6) = f1(x1,x3,x6) f2(x2,x4) f3(x1,x5) f4(x3,x5)
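The factorization on this slide can be sketched in code. The factor tables below are hypothetical, since the slides give only the structure of the factorization, and the variables are assumed binary for brevity. The sketch computes a marginal by brute-force enumeration, which is the quantity that message passing on the factor graph aims to obtain without enumerating every configuration.

```python
from itertools import product

# Hypothetical factor tables over binary variables (illustration only).
def f1(x1, x3, x6):
    return 1.0 + x1 + x3 * x6

def f2(x2, x4):
    return 2.0 if x2 == x4 else 1.0

def f3(x1, x5):
    return 1.0 + 2.0 * x1 * x5

def f4(x3, x5):
    return 1.0 if x3 == x5 else 0.5

def f(x1, x2, x3, x4, x5, x6):
    # Global function as the product of local factors, as on the slide.
    return f1(x1, x3, x6) * f2(x2, x4) * f3(x1, x5) * f4(x3, x5)

def marginal(var_index, domain=(0, 1), n_vars=6):
    """Normalized marginal of one variable by summing f over all assignments."""
    totals = {value: 0.0 for value in domain}
    for assignment in product(domain, repeat=n_vars):
        totals[assignment[var_index]] += f(*assignment)
    z = sum(totals.values())
    return {value: total / z for value, total in totals.items()}

x1_marginal = marginal(0)  # marginal of x1 (index 0)
```

Enumeration costs 2^6 evaluations here and grows exponentially with the number of variables, which is the motivation for message passing. Note that this particular graph contains a cycle (x1, f1, x3, f4, x5, f3), so belief propagation on it would generally give an approximation rather than the exact marginal.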

[Figure: messages at variable node x1: µ(x1→f1) outgoing; ρ(f1→x1) and ρ(f3→x1) incoming; in(x1) channel input]

Configurations
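The message labels on this slide match the standard sum-product update at a variable node: the message sent from x1 to f1 combines the channel input with the messages arriving from every other neighboring factor (here only f3). A minimal sketch follows, assuming each message is a vector of weights over the discrete values of x1; the numbers and function names are hypothetical, and the update rules actually designed for each factor in this work may differ.

```python
def normalize(message):
    """Scale a message so its entries sum to 1."""
    total = sum(message)
    return [m / total for m in message]

def variable_to_factor(channel_input, incoming_from_other_factors):
    """Sum-product variable-node update: elementwise product of the channel
    input and all incoming factor messages except the one from the target."""
    out = list(channel_input)
    for message in incoming_from_other_factors:
        out = [o * m for o, m in zip(out, message)]
    return normalize(out)

# Hypothetical values for a binary x1.
in_x1 = [0.6, 0.4]          # channel information for x1
rho_f3_to_x1 = [0.25, 0.75] # message from factor f3 to x1

# Message from x1 to f1 excludes rho_f1_to_x1 by construction.
mu_x1_to_f1 = variable_to_factor(in_x1, [rho_f3_to_x1])
```

Leaving out the message that came from f1 itself is what keeps a node from echoing a neighbor's own information back to it.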

[Figure: genetic algorithm loop. Configurations are scored by fitness (methane/flux) and the population (members 1, 2, 3, ..., n) is ranked by fitness; crossover and mutation then produce the next population.]
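The loop in this figure (rank the population by fitness, then apply crossover and mutation) can be sketched as a small genetic algorithm. Everything below is an assumption for illustration: the fitness function is a hypothetical stand-in for a methane/flux score that would come from a model or the lab, and the population size, mutation rate, and one-point crossover are arbitrary choices rather than the authors' settings.

```python
import random

LEVELS = (1, 10, 100)  # discrete max-flux values (assumed levels)
N_FACTORS = 6          # number of varied factors (assumed)

def fitness(config):
    # Hypothetical stand-in score: higher flux levels score higher.
    return sum(LEVELS.index(value) for value in config)

def evolve(pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    population = [[rng.choice(LEVELS) for _ in range(N_FACTORS)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness and keep the top half as parents (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, N_FACTORS)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally resample a factor's level.
            child = [rng.choice(LEVELS) if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

best = evolve()
```

Because the parents are carried over unchanged, the best score in the population never decreases from one generation to the next. In a DDDAS setting, `fitness` would be replaced by a call into the model or by measured data.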

DDDAS System

Sensors Evolution/Adaptation

Simulation/updating of models

Feasibility

Goals
•  Evaluate models for optimization
•  Use a well-studied methanogen
   – Methanosarcina acetivorans
•  Explore a part of configuration space contained in KBase
•  Understand how well current models describe the organism

Exploring Environment
•  Iteration One (729 data points)
   – 12 compounds in growth media: H2O, Phosphate, CO2, NH3, Acetate, Sulfate, H+, L-Cysteine, Co2+, Ni2+, Fe2+, H2
   – Vary max flux for 6 of them (3 different flux values each, 3^6 = 729 combinations)
•  Iteration Two (2187 data points)
   – Two compounds had no impact and were made constant; 3 more were added, giving 7 factors (3^7 = 2187 combinations)
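The data-point counts above are consistent with a full factorial design: every combination of 3 flux values over 6 factors, then over 7. A minimal sketch of enumerating such a design follows. The levels 1, 10 and 100 are taken from the flux labels on the results slides, and the factor names are placeholders because the slides do not say which compounds were varied.

```python
from itertools import product

FLUX_LEVELS = (1, 10, 100)  # assumed from the Flux=1 / 10 / 100 labels

def full_factorial(factors, levels=FLUX_LEVELS):
    """Return every assignment of a level to each factor as a dict."""
    return [dict(zip(factors, combo))
            for combo in product(levels, repeat=len(factors))]

# Placeholder factor names (hypothetical).
iteration_one = full_factorial([f"factor_{i}" for i in range(1, 7)])
iteration_two = full_factorial([f"factor_{i}" for i in range(1, 8)])

print(len(iteration_one), len(iteration_two))  # 729 2187
```

Each configuration would then be evaluated, for example by setting the corresponding maximum fluxes in the model and recording the predicted output. Each added factor triples the design size, which is one reason to drop factors that show no impact.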

Results (iteration 1)

[Figure: results by max flux of Phosphate and L-Cysteine; labels Flux=1 and Flux=10 or 100; values shown: 1.2, 4.6, 5.1]

Results (iteration 2)

[Figure: results by max flux of Acetate, Phosphate, L-Cysteine and CO2; labels Flux=1, Flux=10, Flux=100 and Flux=10 or 100; values shown: 0.05, 0.5, 1.2, 4.6, 5.1]

But
•  We know the models are not perfect
•  Still need laboratory data

Next Iterations
•  Drill down on the four primary factors:
   – Acetate, Phosphate, L-Cysteine and CO2
•  Use smaller flux distances
•  Run genetic algorithm on a large number of flux values and more compounds
•  Validate results in lab and update model

Summary
•  View biological organisms as part of a DDDAS
•  Developing techniques for discrete sampling/modeling of their configuration space
•  Developing optimization techniques to fit into the DDDAS loop

References
1.  Thauer RK. et al. 2008. Nat. Rev. Microbiol. 6:579-591
2.  Pace, NR. 2009. MMBR. 73(4):565-76
3.  Lieber, Catlett, Madayiputhiya, Nandukumar, Lopez, Metcalf and Buan. 2014. PLOS One. 9(9): e107563
4.  Pierobon, Cohen, Buan, Kelley, SCIM: Sampling, Characterization, Inference and Modeling of Biological Consortia, 2015
5.  J. Swanson, M.B. Cohen, M.B. Dwyer, B.J. Garvin and J. Firestone, Beyond the Rainbow: Self-Adaptive Failure Avoidance in Configurable Systems, Foundations of Software Engineering, 2014, pp. 377-388

Acknowledgements

CCF-1161767 CNS-1205472 IOS-1449525

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding agencies.
