Using Weighted Ensemble Sampling in Spatial Stochastic ...€¦ · ten in Cþþ, Python, Lua, and Bash to analyze the data. All simulations were run on the Salk machine at the Pittsburgh

Using Weighted Ensemble Sampling in Spatial Stochastic

Simulations

Rory Donovan MMBios Scientific Meeting 26/2/15

Motivation

• Biological systems are complex

• “Simple” underlying dynamics can be incredible costly to simulate directly

• Need to look for ways to increase efficiency of computations

• If rare events matter, how can we cope?

Resampling

Halve Points, Double Weights

Double Points, Halve Weights

Do Both, In Different Regions

Original Sample

A Weighted Ensemble is Just Repeated Resampling

•Of any stochastic process:

•Molecular dynamics (protein motion)

•Chemical kinetics (cell signaling)

•Branching processes (cancer modeling)

•Agent based models (population dynamics)

Repeated RunsNormalized Histogram

Ensemble Samplingt = 0 t = τ t = 2τ t = 3τ

U(x)

bin 1 bin 2 bin 3

Probability:1/8

x

Time Propagation: Arbitrary Dynamics EngineU(x)

bin 1 bin 2 bin 3 x

U(x)

bin 1 bin 2 bin 3 x

U(x)

bin 1 bin 2 bin 3 x

t = 0 t = τ t = 2τ t = 3τ

U(x)

bin 1 bin 2 bin 3

U(x)

bin 1 bin 2 bin 3

U(x)

bin 1 bin 2 bin 3

U(x)

bin 1 bin 2 bin 3

U(x)

bin 1 bin 2 bin 3

U(x)

bin 1 bin 2 bin 3

U(x)

bin 1 bin 2 bin 3

U(x)

bin 1 bin 2 bin 3

Probability:1/2 1/4 1/81 1/16

WE

Dynamics WE

Dynami

cs WE

Dynamics W

E

x x x x

x x x x

Time Propagation: Arbitrary Dynamics Engine

Split

ting/

Mer

ging

: Wei

ghte

d En

sem

ble

Weighted Ensemble Sampling

Ensembles: Weighted vs Unweighted

Spatial Stochastic Systems

Toy Model

●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●

●

●

●●●

●●●●●●●●●●

●●●●

●●●●10−120

10−100

10−80

10−60

10−40

10−20

100

0 20 40 60 80 100 120Bound Receptors on Opposing Surface

Prob

abili

ty

●

Brute Force MCellWeighted Ensemble

t = 1/100 s

●

●●

●●

●

●●●

●●●

●●●●●●●

●●●●

●●

●

●●

●●

●●

●

●●

●

●

●●

●●10−10

10−8

10−6

10−4

10−2

100

0 10 20 30 40Bound Receptors on Opposing Surface

Prob

abili

ty

t = 1/100 s

Toy Model Results

Neuromuscular Junction

MATERIALS AND METHODS

Model geometry

Spatially realistic model geometries for MCell simulations can beobtained in several ways, such as by reconstruction from electron micro-scopy imagery (14) or by in silico methods using computer-aided designor three-dimensional (3D) content creation software (15). The latter is apowerful new approach for modeling complex biological structures thatallows for rapid creation of model geometries and thus complements tradi-tional reconstructions from electron microscopy. We used Blender (16) tocreate our model’s 3D mesh geometry in silico according to dimensionsbased on published averages (17,18) (see Fig. 1). Our model included26 synaptic vesicles of 50 nm diameter arranged in two double rows.We marked triangular mesh tiles on the bottom of each synaptic vesicleas putative Ca2þ-sensor sites. Similarly, mesh tiles in the trough betweenvesicles were marked as putative VGCC sites at locations suggested bypublished estimates (17–19). These marked tiles could then be populatedwith Ca2þ-sensor sites or VGCCs as needed within MCell’s ModelDescription Language (MDL). From Blender, meshes were exporteddirectly into MDL.

MCell simulations and algorithms

All simulations were carried out with MCell version 3.1 (rev. 788) with acustom binary reaction data output format to facilitate data handling. Foreach simulation condition (different numbers of Ca2þ-sensor sites onvesicles, varying extracellular Ca2þ concentration, etc.), we averaged theresults over 10,000 separate MCell runs obtained via different randomnumber seeds. During each run we tracked the Ca2þ ions emerging fromindividual VGCCs, their binding to Ca2þ buffer, and their binding toCa2þ-sensor sites on vesicles. Ca2þ ions that encountered the edges ofour model geometry were removed from the simulation to mimic theirdisappearance into the adjacent presynaptic space. We used programs writ-ten in Cþþ, Python, Lua, and Bash to analyze the data. All simulations

were run on the Salk machine at the Pittsburgh Supercomputing Center,an SGI Altix 4700 shared-memory NUMA system with 144 Itanium2 pro-cessors, cmist, an eight-core Intel Xeon E5472 machine, or a local desktopwith an Intel i7-980X CPU.

We previously described MCell’s simulation algorithms in detail (20,21).In brief, in an MCell model, the membranes of cells and organelles arerepresented by triangulated surface meshes. Brownian motion of diffusingvolume and surface molecules is simulated using optimized grid-free Brow-nian dynamics random walk algorithms (21). Due to the inherently smalllength scales present in our model (e.g., distances between VGCCs and syn-aptic vesicles or between vesicles and the presynaptic membrane), we useda small simulation time step of 10 ns and corresponding short diffusion steplengths to allow for accurate spatial sampling (22).

Simulation of VGCCs and Ca2D influx

VGCCs in our model opened and closed according to a time-dependentaction potential waveform (Fig. 2 A) and were modeled via a four-statekinetic scheme (three closed states and one open state; see Table 1 and insetin Fig. 2 A). The rate constants between states were voltage dependent withthe following parameters (23):

a ¼ 0:06eðVmþ24Þ=14:5

b ¼ 1:7

eðVmþ34Þ=16:9 þ 1:

Here, Vm(t) is the time-dependent membrane voltage corresponding to theaction potential waveform shown in Fig. 2 A. The time-dependent rate con-stants a and b were precomputed and stored in separate text files that werethen read by MCell at the beginning of each simulation. Because the extra-cellular Ca2þ concentration, [Ca2þ]ext, remains approximately constantduring an action potential, we approximated Ca2þ flux through open

A D

BE

C

FIGURE 1 Overview of the frog NMJ model.(A) Rendered simulation snapshot of the completeAZ model. Visible are the two double rows ofsynaptic vesicles (large red spheres) as well asdiffusing free and buffer-bound Ca2þ ions (smallcolored spheres). For visual clarity, unbound buffersites are not shown. (B and C) Close-ups of synap-tic vesicles, revealing the Ca2þ-sensor sites at theirbottom (the 40-sensor model is shown). The cylin-drical glyphs directly in front of synaptic vesiclesrepresent closed or open VGCCs (red, closed;yellow, open). Open VGCCs release Ca2þ ions(yellow spheres) into the presynaptic space, whichcan bind to endogenous buffer sites (cyan spheres;only Ca2þ-bound buffer sites are shown) or sensorsites on synaptic vesicles. Unbound and boundCa2þ-sensor sites on vesicles are colored blackand yellow, respectively. (D and E) Importantmodel dimensions (drawings are not to scale).

Biophysical Journal 104(12) 2751–2763

2752 Dittrich et al.


Model geometry








a ¼ 0:06eðVmþ24Þ=14:5

b ¼ 1:7

eðVmþ34Þ=16:9 þ 1:


A D

BE

C




Neuromuscular Junction Model

NMJ Zoomed-In


Model geometry








a ¼ 0:06eðVmþ24Þ=14:5

b ¼ 1:7

eðVmþ34Þ=16:9 þ 1:


A D

BE

C




• Calcium is released from bottom, diffuses, and can bind to synaptotagmin vesicles

• Model: if enough calcium bind to one vesicle, in the right pattern, a release event occurs

First Passage Times

10−10

10−8

10−6

10−4

10−2

100

0 0.3 0.6 0.9 1.2 1.5 1.8 2.1 2.4 2.7 3First Passage Time (ms)

Prob

abili

ty

Ca = 0.5, BFCa = 0.5, WE (100 iters, 8 spb)

Total Firing: WE = 0.0014 / BF = 0.0013

Scaling Relationship

Prob ~ Ca4.67

10−5

10−4

10−3

10−2

10−1

100

0.1 0.5 1 1.5 2Ca Concentration (mM)

Prob

abili

ty to

Fir

e

BFWE

Firing Probability vs. Ca Concentration

Realistic Geometry

Signaling Network

Pipeline: CellOrganizer → BioNetGen → MCell

Harris, Hogg and Faeder

R

L

2

2

2

PM

EC

NU

EM

EN

NM

P1

Im

P1

P1

*

P2

Im

P1

Im

*2

P1

Im

NP

R2 R1

R3-5

R6-7

R8

R10,11

R9

R11

R122

R14

R15

R13

R16

R17R19

R18

R21

R20

R23

R22

R22

R24R25

R26

R27

R28

R28

R28

R28

R30

Im

TF TF p

TF p

p

TF p

p TF

p

p

Im TF

p

p

Im TF p

p

R R R R

L L

TF p

TF p

p p

L L

R Rp p

L L

R R

EM

EN

EM

EN

EM

EN

L L

R R

L

R

p2

TF p

pp1

p2

p1 mRNA1

mRNA2

mRNA2

mRNA1

Im

P1

CP

L L

NP

NP

TF p TF p

Figure 1: A compartmental model of the cell. The model couples simplified processes of signal tranduction, nuclear transportand transcriptional regulation in a single eukaryotic cell. The system consists of four volume compartments: extracellular(EC), cytosol (CP), endosomal (EN) and nuclear (NU). These are separated by three membrane surfaces: plasma (PM),endosomal (EM) and nuclear (NM). The model is presented as a pathway that proceeds from ligand (L) binding to expressionof protein P2. The underlying rule-based model defines a set of 354 reactions between 78 species. Bonds between moleculesare shown as black lines. Black arrows between species represent reactions. Gray arrow labels correspond to the rule numberthat describes the reaction (see model files at www.bionetgen.org/wsc09). Black integer-valued arrow labels representreaction stoichiometry (if not equal to unity). DNA promoters are pictured as a double helix icon.

the same molecule or another. Components may be associated with state variables with a finite set of possible values, eachrepresenting a conformational or chemical state of a component, such as phosphorylation status. The name of the moleculetype is given first followed by a comma-separated list of its components in parenthesis. The allowed values of state variablesare indicated by ‘⇠’ followed by a value, as in L(r,d,loc⇠EC⇠EN), which declares a ligand molecule L that contains areceptor binding component r, a dimerization component d, and a location component loc that takes on values EC or EN.

The seed species block defines the species initially present in the system. For example, the line

L(r,d,loc⇠EC) Lig0

specifies that the initial amount of free ligand monomers in EC is Lig0, a parameter defined in the parameters block.Molecular complexes may also be specified, as in L(r,d!1,loc⇠EC).L(r,d!1,loc⇠EC), where a bond linking thed components of each L molecule is indicated by a shared bond label, !1.

The reaction rules block contains rules that define how molecules interact. A rule is comprised of a set of reactantpatterns, a transformation arrow, a set of product patterns, and a rate law. A pattern is a set of molecules that select speciesthrough a mapping operation (Blinov et al. 2006). The match of a molecule in a pattern to a molecule in a species dependsonly on the components specified in the pattern (including wildcards), so that one pattern may select many different species.The ‘+’ operator separates two reactant patterns that must map to distinct species (i.e., they may not reside in the samecomplex). The ‘.’ operator separates molecules that are part of the same species. The transformation arrow may be eitherunidirectional (->) or bidirectional (<->). Five basic types of operations are carried out by the rules in the example system

Protein Histograms

BF Limit

10−10

10−8

10−6

10−4

10−2

100

0 5 10 15 20 25Total Protein 1

Prob

abili

ty

BFWE

Time = 100 secs

Steady-State

In steady state, MFPT(A→B) = 1/Flux(A→B)

First Passage Times

10−20

10−15

10−10

10−5

100

0 20 40 60 80 100Seconds

Flux

WE Steady−state run 1WE Steady−state run 2

BF equiv = 89616 secs of Dynamics

Pipeline:

Spa$ally'dependent'response'

Chemical'Kine$cs'

BioNetGen'

Model'Geometries'

Enhanced'Sampling'Visual'Edi$ng'

Stochas$c'Spa$al'

Dynamics'

Efficient Sampling

Conclusions • Able to sample the rare events and full

distributions for stochastic systems biology models over a wide range of complexity

• Speed-up over brute-force is dramatic enough encourage the design of more complex, more realistic models

• Long time-scale behavior can be extrapolated from short simulations: can bridge dynamics over multiple timescales

Thanks• Collaborators:

• Jose, Devin, Markus, Jim, Bob, Dan

• Funding:

• SF Grant No. MCB-1119091

• NIH Grant No. P41 GM103712

• NIH Grant No. T32 EB009403

• NIH Grant No. GM090033

• NSF Expeditions in Computing Grant (Award No. 0926181)

Documents

Using Weighted Ensemble Sampling in Spatial Stochastic ...€¦ · ten in Cþþ, Python, Lua, and Bash to analyze the data. All simulations were run on the Salk machine at the Pittsburgh