Some ‘formal’ models - Oxford Statistics · 2017. 10. 16. · there is a phase transition where the entire reaction graph becomes an autocatalytic set, similar to giant connected

Autocatalytic sets and models of early life

Oxford, 2016

Mike Steel

Joint work with Wim Hordijk, Elchanan Mossel, Stuart Kauffman, Filipa Sousa

●  Many ideas/theories re. origin of life (‘RNA world’, genetics-first vs. metabolism-first, hydrothermal vents, etc.).

●  Current DNA/RNA/protein molecular machinery is too complex to have arisen spontaneously all at once.

“It is often said that all the conditions for the first production of a living organism are now present, which could ever have been present.— But if (& oh what a big if) we could conceive in some warm little pond with all sorts of ammonia & phosphoric salts,—light, heat, electricity &c present, that a protein compound was chemically formed, ready to undergo still more complex changes, at the present day such matter would be instantly devoured, or absorbed, which would not have been the case before living creatures were formed.”

Charles Darwin, letter to J. D. Hooker, 1 Feb [1871]

Some ‘formal’ models

●  (1940s-1980s)
   ○  Self-reproducing automata (von Neumann)
   ○  Chemoton model (Gánti)
   ○  ‘Hypercycles’ (Eigen and Schuster)
   ○  Collectively autocatalytic systems (Kauffman, Farmer, Bagley)
   ○  First cycles in directed graphs (Bollobás and Rasmussen)
   ○  (M,R)-systems (Rosen)

●  More recently (1990s-)
   ○  Petri Nets (Sharov)
   ○  Chemical organisation theory (COT) (Contreras et al. 2011; Kreyssig et al. 2012)
   ○  RAF theory

●  Key early steps require the emergence (and evolution) of self-sustaining and autocatalytic networks of reactions.

Vaidya et al., Nature, 2012

Two features of catalysis

Wolfenden, Snider, Acc. Chem. Res., 2001

Accelerates the production of molecules in the network so they accumulate spatially in concentrations sufficient to sustain further reactions and fight diffusion.

Not only much faster rates, but also tightly ‘coordinated’


Catalytic Reaction System (CRS)

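$Q = (X, R_0, C, F)$

●  Molecule types: $X$
●  Reactions: $R_0 \subseteq 2^X \times 2^X$, where $r = (\{\text{reactants}\}, \{\text{products}\}) = (\rho(r), \pi(r))$
●  Catalysis: $C \subseteq X \times R_0$, with pairs $(x, r)$
●  “Food” set: $F \subseteq X$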

Another way to view a CRS


A directed (and bipartite) graph with two types of vertices (molecule types, reactions) and two types of arrows (reactants + products, catalysis).

[Figure: bipartite graph of an example CRS with food molecules f1, f2, f3, reactions r1, r2, r3, and products p1, p2, p3, p4]

Simple example: polymer model

A set of molecules represented by strings over an alphabet (e.g. 0, 1) up to length n, with food molecules up to length t (with t << n).

A set of reactions of two types:

●  ligation: 000 + 111 → 000111
●  cleavage: 0101010 → 0101 + 010

Randomly assigned catalysis: Pr[x catalyzes r] = p(x,r)

Uniform model: p(x,r) = p

Early claim: “The formation of autocatalytic sets of polypeptide catalysts is an expected emergent collective property of sufficiently complex sets of polypeptides, amino acids, and other small molecules.”

(Kauffman, 1986)

Basic idea: Given a fixed probability of catalysis p and increasing n, at some point there is a phase transition where the entire reaction graph becomes an autocatalytic set, similar to giant connected components appearing in random graphs.


Main Criticisms

●  Argument requires an exponential growth rate in level of catalysis (Lifson, 1996).

●  Autocatalytic sets lack evolvability (Vasas, Szathmáry & Santos, 2010).

●  Binary polymer model is not realistic enough (Wills & Henderson, 1997).

We will consider all these issues….

Our approach

Use mathematics (and simulations) to study the polymer model and its extensions

First we need to formalize some notions….

Definitions: Closure

●  Given any subset R of R0, the closure of F (relative to R) is the set of molecule types that can be constructed from F by applying just reactions from R (whether they are catalysed or not).

●  Formally, $\mathrm{cl}_R(F)$ is the unique (minimal) subset W of X that contains F and satisfies, for each reaction r in R:

$\rho(r) \subseteq W \Rightarrow \pi(r) \subseteq W$

●  $\mathrm{cl}_R(F)$ is computable in polynomial time in |Q|
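A minimal sketch of how the closure could be computed by fixed-point iteration, written in Python. The function name `closure` and the encoding of a reaction as a `(reactants, products)` pair of sets are assumptions made for illustration; they are not notation from the slides.

```python
def closure(food, reactions):
    """Return cl_R(F): everything constructible from `food` using `reactions`.

    `reactions` is an iterable of (reactants, products) pairs of sets.
    This encoding is hypothetical; catalysis is deliberately ignored here.
    """
    reactions = list(reactions)
    available = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            # A reaction can fire once all of its reactants are available.
            if reactants <= available and not products <= available:
                available |= products
                changed = True
    return available


example_reactions = [
    ({"a", "b"}, {"ab"}),    # a + b -> ab
    ({"ab", "b"}, {"abb"}),  # ab + b -> abb
    ({"c"}, {"cc"}),         # needs c, which is never available
]
print(sorted(closure({"a", "b"}, example_reactions)))
```

Every pass over the reactions either adds at least one new molecule type or ends the loop, so the number of passes is at most |X| + 1, which is one way to see the polynomial-time claim.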

Definition: F-generated

R is F-generated if $\mathrm{cl}_R(F)$ contains every reactant of every reaction in R.

⇒ each reactant of any reaction in R is either in F or is a product of some other reaction in R

⇐ ?

Definition: RAF (Reflexively Autocatalytic network over F)

A subset R of R0 is an RAF if R ≠ ∅ and it satisfies the following two properties:

●  (RA): each reaction r in R is catalysed by a product of some other reaction in R (or by an element of F);
●  (F): R is F-generated.

Earlier example


Equivalent definition

[Figure: example network with food molecules f1 to f5 and products p1 to p4]

A subset R of R0 is an RAF if $R \neq \emptyset$ and, for each reaction $r \in R$, all of the reactants and at least one catalyst of r are present in $\mathrm{cl}_R(F)$.

Two nice properties of RAFs

●  The union of any collection of RAFs is itself an RAF
   ○  So if Q has an RAF then it contains a unique maximal RAF.
   ○  Denote this by maxRAF(Q)

●  There is a simple algorithm to determine whether or not Q has an RAF, and if so to compute maxRAF(Q) (polynomial time in |Q|).

Related notions:

(Constructively Autocatalytic network over F)

[Figure: three example networks on food molecules f1 to f5 and products p1 to p4, labelled CAF, RAF and pseudo-RAF]

RAF Algorithm

$R_0 \supseteq R_1 \supseteq \cdots$ (nested decreasing sequence) with limit $R_\infty$

$R_{i+1}$ = reactions in $R_i$ that have all their reactants and at least one catalyst in $\mathrm{cl}_{R_i}(F)$

If $R_\infty = \emptyset$, then Q has no RAF, else $R_\infty$ = maxRAF(Q).
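The pruning iteration above can be sketched in Python as follows. This is an illustrative sketch only: the names `closure` and `max_raf`, the dictionary encoding of reactions, and the toy molecules are assumptions, not the authors' implementation.

```python
def closure(food, reactions):
    """Molecule types constructible from `food` using `reactions` (catalysis ignored)."""
    reactions = list(reactions)
    available = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if reactants <= available and not products <= available:
                available |= products
                changed = True
    return available


def max_raf(reactions, catalysts, food):
    """Return the ids of the maximal RAF, or an empty set if there is no RAF.

    reactions: dict id -> (reactants, products), both sets (hypothetical encoding)
    catalysts: dict id -> set of molecule types that catalyse that reaction
    """
    current = dict(reactions)
    while True:
        available = closure(food, current.values())
        # Keep reactions with all reactants and at least one catalyst in the closure.
        kept = {
            rid: (reactants, products)
            for rid, (reactants, products) in current.items()
            if reactants <= available and catalysts.get(rid, set()) & available
        }
        if len(kept) == len(current):  # nothing removed: the sequence has stabilised
            return set(kept)
        current = kept


food = {"a", "b"}
reactions = {
    "r1": ({"a", "b"}, {"ab"}),    # catalysed by abb
    "r2": ({"ab", "b"}, {"abb"}),  # catalysed by ab
    "r3": ({"a"}, {"aa"}),         # catalysed by zz, which is never produced
}
catalysts = {"r1": {"abb"}, "r2": {"ab"}, "r3": {"zz"}}
print(sorted(max_raf(reactions, catalysts, food)))
```

In this toy system r3 is pruned in the first round because its only catalyst never appears, while r1 and r2 survive. Each depends on the other's product for catalysis, so neither could start from the food set alone with its catalyst already present.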

Quantities of Interest

Q = (X, R0, C, F) full binary polymer model (on all sequences of length up to n): $|X| \sim 2^{n+1}$; $|R_0| \sim n2^{n+1}$.

●  Average number of reactions catalyzed by any molecule type: f = p·|R0|

●  Probability Pn = Pr(R0 contains an RAF) that an instance of the binary polymer model contains an RAF set.
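As a rough check on these quantities, the following Python sketch enumerates the binary polymer model and counts molecule types and reactions. It assumes that each ligation and its reverse cleavage are counted together as one reaction, one per way of cutting a string of length at least 2; that counting convention is an assumption, although it does reproduce the value |R0| = 16,388 quoted in this document for n = 10. The function name is hypothetical, and the value of p is the one quoted elsewhere in this document for n = 10.

```python
from itertools import product


def binary_polymer_counts(n):
    """Return (|X|, |R0|) for binary strings of length 1..n.

    Assumption: a ligation and its reverse cleavage count as one reaction,
    so each string of length L contributes L - 1 reactions (one per cut point).
    """
    molecules = [
        "".join(bits)
        for length in range(1, n + 1)
        for bits in product("01", repeat=length)
    ]
    num_reactions = sum(len(m) - 1 for m in molecules)
    return len(molecules), num_reactions


num_molecules, num_reactions = binary_polymer_counts(10)
p = 0.0000792            # catalysis probability quoted in this document for n = 10
f = p * num_reactions    # average number of reactions catalysed per molecule type
print(num_molecules, num_reactions, round(f, 2))
```

Under this convention the exact counts are $|X| = 2^{n+1} - 2$ and $|R_0| = (n-2)2^{n+1} + 4$, consistent with the asymptotic forms above.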


Early results

●  p constant (i.e. independent of n): $P_n \to 1$ as $n \to \infty$ [Kauffman; 1986, 1993]
●  But this requires f to grow exponentially with n, which is biochemically unrealistic (Lifson ‘96)
●  What if f grows more slowly? [S: 2000]
   ○  If $f < \frac{1}{3}e^{-1}$ then $P_n \to 0$ as $n \to \infty$
   ○  If $f > cn^2$ then $P_n \to 1$ as $n \to \infty$

[Conjecture: sub-quadratic]

Probability of RAF Sets

[Hordijk+S, 2004]

Main theoretical results I (Mossel+S, 2005)

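●  Theorem 1: Linear transition for RAFs
   If $f \propto n^{1+\epsilon}$ then $P_n \to 1$ as $n \to \infty$
   If $f \propto n^{1-\epsilon}$ then $P_n \to 0$ as $n \to \infty$

●  Theorem 2: Exponential transition for CAFs
   If $f \propto (2-\epsilon)^n$ then $P_n \to 0$ as $n \to \infty$
   If $f \propto (2+\epsilon)^n$ then $P_n \to 1$ as $n \to \infty$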

The actual bounds (Mossel+S, 2005)

$\kappa$ = #states

t = length of sequences in F

●  If $f \le \lambda n$ then $P_n \le M\lambda$
●  If $f \ge \lambda n$ then $P_n \ge 1 - \frac{(e^{-\lambda})^t}{1 - e^{-\lambda}}$

t = 2, $\kappa$ = 2

Generates all of X but not all of R0 (this requires f to grow quadratically with n). In fact, P10 = 0.5 when f = 1.3 (not f = 17).

Small RAFs

●  Finding a smallest RAF is NP-hard.
●  Definition: An RAF of Q is irreducible if it contains no proper sub-RAF
●  Finding an irrRAF is easy…
●  But there can be exponentially many of them
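One way an irreducible RAF could be found is sketched below in Python: start from the maxRAF and try deleting one reaction at a time, keeping the deletion whenever the remainder still contains an RAF. The slides do not spell out a procedure, so this greedy reduction, the function names and the data encoding are all assumptions for illustration.

```python
def closure(food, reactions):
    """Molecule types constructible from `food` using `reactions` (catalysis ignored)."""
    reactions = list(reactions)
    available = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if reactants <= available and not products <= available:
                available |= products
                changed = True
    return available


def max_raf(reactions, catalysts, food):
    """Ids of the maximal RAF within `reactions`, or an empty set if none exists."""
    current = dict(reactions)
    while True:
        available = closure(food, current.values())
        kept = {
            rid: rp
            for rid, rp in current.items()
            if rp[0] <= available and catalysts.get(rid, set()) & available
        }
        if len(kept) == len(current):
            return set(kept)
        current = kept


def irreducible_raf(reactions, catalysts, food):
    """Greedily shrink the maxRAF until no single reaction can be dropped."""
    current = max_raf(reactions, catalysts, food)
    for rid in sorted(current):
        if rid not in current:
            continue  # already discarded by an earlier reduction
        trial = {k: reactions[k] for k in current if k != rid}
        smaller = max_raf(trial, catalysts, food)
        if smaller:  # an RAF survives without rid, so rid is not essential
            current = smaller
    return current


food = {"a", "b"}
reactions = {
    "r1": ({"a", "b"}, {"ab"}),  # catalysed by its own product ab
    "r2": ({"b"}, {"bb"}),       # catalysed by its own product bb
}
catalysts = {"r1": {"ab"}, "r2": {"bb"}}
print(irreducible_raf(reactions, catalysts, food))
```

Every reaction left at the end was tested and found essential for the set it belonged to at the time, and the set only shrinks afterwards, so the result has no proper sub-RAF. Which irrRAF is returned depends on the order in which reactions are tried, which fits the remark that there can be many of them.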


Small RAFs

●  irrRAFs can be of different sizes:

Two instances of binary polymer model with n=10 (|R0| = 16,388) at f = 1.2 where Pn~0.5.

Main theoretical result (S+Hordijk+Smith 2013)

Theorem 3 There are no small RAFs when they first appear

●  When first cycles first appear in a random directed graph, they are of all sizes (Bollobás and Rasmussen, 1989)

●  There are tiny RA sets, and tiny F-generated sets but none that are both.

[Figure: number of reactions (# reactions, 0 to 2000) in the maxRAF and in an irrRAF, against f = p|R| (1.15 to 1.45)]

If $f = \lambda n$ then as $n \to \infty$:

$P(R_0 \text{ contains an RAF of size} < 2^{cn}) \to 0$

Structure of RAFs

●  In general maxRAF(Q) may contain (many) other sub-RAFs

Movie by Wim Hordijk (KLI institute) 2016: an instance of binary polymer model with n=5 (|R0| = 196), f = 0.73, where Pn~0.5.

Dynamics (Gillespie algorithm)

Poset of RAFs


Computing: It’s easy to….

●  Find all the maximal proper subRAFs of the maxRAF (polynomial time in |Q|).
●  Construct the poset P of subRAFs of Q (polynomial time in |P|x|Q|).

Application to a real experimental system


+Dynamics of subRAFs via Gillespie algorithm

RNA ribozyme replicator system: 16 reactions, 18 molecules, |F|=2, 64 catalysation pairs (x,r). Forms an RAF (but not a CAF!). Contains many subRAFs.

Application to a living organism

●  CRS has 1826 reactions, 1199 molecule types (42 catalysts), |F|=438.

●  RAF set of 1787 reactions out of 1826 = 98%. Not a CAF.

●  Complex subRAF structure
●  min F for a RAF = 123 molecules

[Images: Methanopyrus kandleri, E. coli]

Extensions beyond the uniform model

●  Allowing p(x,r) to vary
●  Template-based catalysis
●  p(x,r) depends on length of x
●  Partitioned polymers system
●  Extension of transition theory to non-polymer systems [n replaced by ratio of # reactions to # molecules]

Recent work [motivated by Oxford summer school project]

•  Hordijk, W., Hasenclever, L., Gao, J., Mincheva, D., and Hein, J. (2014). An investigation into irreducible autocatalytic sets and power law distributed catalysis. Natural Computing 13(3): 287-296.

•  Hordijk, W. and Steel, M. (2016). Autocatalytic sets in polymer networks with variable catalysis distributions. arXiv:1605.03919v1 (submitted to J. Math. Chem.).

Four models

●  Uniform model

●  Power law catalysis

●  Sparse model (each molecule catalyses n reactions w.p. p or no reactions w.p. 1-p)
●  All or nothing (each molecule catalyses all reactions w.p. p or no reactions w.p. 1-p)

M(1 − q∗), and so, from Eqn. (13), Pn is less than or equal to M(λ + o(1)), which converges to zero as λ → 0. □

[Plot: Pn (0.0 to 1.0) against f (0 to 6) for Unif, Plaw, Sparse and All/none, each with n=10 and n=16]

Figure 3: Comparison across the four models of Pn (probability of a RAF) as a function of f (average catalysis rate per molecule) on the binary polymer model for n = 10 and n = 16.

Fig. 3 shows the behavior of Pn as a function of f for the binary polymer model, across our four exemplar distributions (for n = 10 and n = 16). The plots were obtained by simulations and the use of the RAF algorithm (except for the all-or-nothing model, where the exact formula was used). Two features are apparent. Firstly, the uniform model and the power law model show a much sharper transition from Pn = 0 to Pn = 1 than the other two models. Secondly, increasing n (from 10 to 16) has a larger effect on the curves for the sparse and all-or-nothing models than it does for the uniform and power law models. Theorem 1 also leads to the following interesting consequence for the structure of small RAFs in the sparse model, which is different to that for the uniform model.

Corollary 1 Consider the sparse polymer model with f = λn, where λ is sufficiently large such that (from Theorem 1) P̃n ≥ 1 − ϵ. Then with probability at least 1 − ϵ − o(1), any instance of the model will contain a RAF with fewer than $2\lambda n^2$ reactions (where o(1) refers to a term that converges to zero exponentially with increasing n).

$P_n = 1 - e^{-\lambda}$

Result 1


Case         Slope
standard     0.02
all-X        0.70
F & all-X    1.51
theory       1.63

Table 1: The slopes of the linear relationship for the growth rate in required level of catalysis, with increasing maximum molecule length n, as derived from the four different cases described in the text.

5.2 Power law catalysis

[Plot: Pn (0.0 to 1.0) against f (1.0 to 2.0) for n = 8, 10, 12, 14, 16, 18, 20]

Figure 6: The probability Pn of finding RAF sets against the average level of catalysis f for various values of the maximum molecule length n in the binary polymer model with power law distributed catalysis.

In contrast to the uniform catalysis version of the binary polymer model, the power law catalysis version seemed to require no increase in the level of catalysis, with increasing n, to get a probability Pn ≈ 0.50 of finding RAF sets. In Fig. 9 in [3], Pn vs. f is plotted for various values of n (up to n = 12) for the power law case. However, rather than slowly moving to the right (i.e. towards higher f values, as is the case for the uniform model [7]), the curves for increasing n for the power law case cross each other around f = 1.5 [3]. This result seems somewhat counter-intuitive, especially as Theorem 1(ii) above implies that the curves must eventually start moving to the right as n grows. It turns

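Recall: when RAFs first arise, the uniform model has no small (subexponential) RAFs. But the Sparse model has RAFs of size $O(n^2)$.

Uniform model bound (as before): $P_n \ge 1 - \frac{(e^{-\lambda})^t}{1 - e^{-\lambda}}$

●  If $f \le \lambda n$ then $P_n \le M\lambda$
●  If $f \ge \lambda n$ then $P_n \ge \left(1 - \frac{(e^{-\lambda/2})^t}{1 - e^{-\lambda/2}}\right)(1 - e^{-\lambda/2})$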

Impact of inhibition

Definition (u-RAF)

Given $Q = (X, R_0, C, F)$ and $I \subseteq X \times R_0$, a u-RAF is a RAF R that satisfies:

$(x, r) \in I \Rightarrow r \notin R \text{ or } x \notin \mathrm{cl}_R(F)$.

Determining if there exists a u-RAF is NP-hard (Mossel+S, 2008)

Simple algorithm:

$Q \to \mathrm{maxRAF}(Q) \to R' \to \mathrm{maxRAF}(Q')$
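The u-RAF condition can be checked directly for a candidate set of reactions. The Python sketch below tests whether a given subset is an RAF (all reactants and at least one catalyst in the closure) and, in addition, that no inhibitor of any chosen reaction lies in the closure. The function names and the encoding of the inhibition set I as a dictionary from reaction id to inhibiting molecule types are assumptions for illustration. It only verifies a given candidate; it does not search for a u-RAF, which the slide notes is NP-hard in general.

```python
def closure(food, reactions):
    """Molecule types constructible from `food` using `reactions` (catalysis ignored)."""
    reactions = list(reactions)
    available = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if reactants <= available and not products <= available:
                available |= products
                changed = True
    return available


def is_u_raf(subset, reactions, catalysts, inhibitors, food):
    """Check whether `subset` (reaction ids) is an uninhibited RAF.

    inhibitors: dict id -> set of molecule types x with (x, r) in I
    (a hypothetical encoding of the inhibition set I).
    """
    if not subset:
        return False  # an RAF must be non-empty
    available = closure(food, [reactions[rid] for rid in subset])
    for rid in subset:
        reactants, _products = reactions[rid]
        has_inputs = reactants <= available
        has_catalyst = bool(catalysts.get(rid, set()) & available)
        is_inhibited = bool(inhibitors.get(rid, set()) & available)
        if not (has_inputs and has_catalyst) or is_inhibited:
            return False
    return True


food = {"a", "b"}
reactions = {
    "r1": ({"a", "b"}, {"ab"}),
    "r2": ({"ab", "b"}, {"abb"}),
}
catalysts = {"r1": {"abb"}, "r2": {"ab"}}
print(is_u_raf({"r1", "r2"}, reactions, catalysts, {}, food))
print(is_u_raf({"r1", "r2"}, reactions, catalysts, {"r1": {"abb"}}, food))
```

In the second call the molecule abb, which the set itself produces, inhibits r1, so the set remains an RAF but is no longer a u-RAF.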

Result 2

6. Theorem 2 coupled with Fig. 3 suggested that u-RAFs would likely exist, and applying the iteration algorithm over 10 random instances succeeded in detecting u-RAFs in all cases, as shown in Table 2. The fixed-parameter tractable u-RAF algorithm would be quite infeasible here, as m = |X10| = 2046.

(1)     (2)
6393    2547
6415    2586
6447    2729
6410    2716
6351    2581
6359    2667
6395    2718
6397    2512
6418    2631
6438    2625

Table 2: The sizes of (1) the maximal RAF not taking inhibitors into account, and (2) the u-RAF from the iterative algorithm, across 10 random instances of the uniform binary polymer model containing a RAF (with n = 10, catalysis rate f = 4 and inhibition rate h = 6).

6.1 Efficiency of iteration algorithm

Since the “standard” RAF algorithm has a running time that is polynomial in the size |R| of the given CRS, the iteration algorithm above for finding u-RAFs is clearly also polynomial-time. We now compare the results of the iteration algorithm with that of the earlier exact algorithm on model instances in which the number m of inhibitor molecule types is bounded (this exact algorithm, by contrast, has a complexity that grows exponentially with m). We then apply the iteration algorithm to model instances that are beyond the feasible reach of the exact method.

Note that in this section we continue to work in the uniform model, but now there are m molecule types that are inhibitors, and for each of these the probability the molecule type inhibits any particular reaction is q. This is therefore quite different to the set-up in Section 3.1 (and discussed above) where each molecule type has a constant probability of inhibiting each reaction.

Using the same parameter values as in the previous study (i.e., n = 10, t = 2, p = 0.0000792 [giving a probability Pn of 0.50 to get “regular” RAFs]), m = 10, and inhibition probabilities q = 10 × p and q = 100 × p [6], the results are shown in Table 3 (including the additional parameter value q = 50 × p).

For each of the three cases (different values for the inhibition probability q), 10 instances of the model that contain a “regular” RAF were taken and both the exact and the iteration algorithm for finding u-RAFs were applied. In each table, the column labeled (1) shows the size of the (regular) maximal RAFs for

$\mu(f, h)$ = expected number of u-RAFs

h = inhibition rate (expected # reactions inhibited by each reaction)

Inhibition theorem

If $f = \lambda n$ and $h \le \ln(1+e^{-\lambda}) \cdot n$ then $\mu(2f, h) \ge \mu(f, 0)$

Example: n=10, |R0|=16,388

f = 2, h = 30: $\mu(4, 30) \ge \mu(2, 0)$

Possible future applications

●  Economics/social science
●  Cognitive psychology

Further details

•  Steel, M. (2015). Self-sustaining autocatalytic networks within open-ended reaction systems. Journal of Mathematical Chemistry 53(8): 1687-1701.
•  Sousa, F. L., Hordijk, W., Steel, M., and Martin, W. F. (2015). Autocatalytic sets in E. coli metabolism. Journal of Systems Chemistry 6:4.
•  Hordijk, W. and Steel, M. (2013). A formal model of autocatalytic sets emerging in an RNA replicator system. Journal of Systems Chemistry 4:3.
•  Steel, M., Hordijk, W., and Smith, J. (2013). Minimal autocatalytic networks. Journal of Theoretical Biology 332: 96-107.
•  Mossel, E. and Steel, M. (2005). Random biochemical networks and the probability of self-sustaining autocatalysis. Journal of Theoretical Biology 233(3): 327-336.
