Upload
others
View
13
Download
0
Embed Size (px)
Citation preview
Using Weighted Ensemble Sampling in Spatial Stochastic
Simulations
Rory Donovan MMBios Scientific Meeting 26/2/15
Motivation
• Biological systems are complex
• “Simple” underlying dynamics can be incredible costly to simulate directly
• Need to look for ways to increase efficiency of computations
• If rare events matter, how can we cope?
Resampling
Halve Points, Double Weights
Double Points, Halve Weights
Do Both, In Different Regions
Original Sample
A Weighted Ensemble is Just Repeated Resampling
•Of any stochastic process:
•Molecular dynamics (protein motion)
•Chemical kinetics (cell signaling)
•Branching processes (cancer modeling)
•Agent based models (population dynamics)
Repeated RunsNormalized Histogram
Ensemble Samplingt = 0 t = τ t = 2τ t = 3τ
U(x)
bin 1 bin 2 bin 3
Probability:1/8
x
Time Propagation: Arbitrary Dynamics EngineU(x)
bin 1 bin 2 bin 3 x
U(x)
bin 1 bin 2 bin 3 x
U(x)
bin 1 bin 2 bin 3 x
t = 0 t = τ t = 2τ t = 3τ
U(x)
bin 1 bin 2 bin 3
U(x)
bin 1 bin 2 bin 3
U(x)
bin 1 bin 2 bin 3
U(x)
bin 1 bin 2 bin 3
U(x)
bin 1 bin 2 bin 3
U(x)
bin 1 bin 2 bin 3
U(x)
bin 1 bin 2 bin 3
U(x)
bin 1 bin 2 bin 3
Probability:1/2 1/4 1/81 1/16
WE
Dynamics WE
Dynami
cs WE
Dynamics W
E
x x x x
x x x x
Time Propagation: Arbitrary Dynamics Engine
Split
ting/
Mer
ging
: Wei
ghte
d En
sem
ble
Weighted Ensemble Sampling
Ensembles: Weighted vs Unweighted
Spatial Stochastic Systems
Toy Model
●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●
●
●
●●●
●●●●●●●●●●
●●●●
●●●●10−120
10−100
10−80
10−60
10−40
10−20
100
0 20 40 60 80 100 120Bound Receptors on Opposing Surface
Prob
abili
ty
●
Brute Force MCellWeighted Ensemble
t = 1/100 s
●
●●
●●
●
●●●
●●●
●●●●●●●
●●●●
●●
●
●●
●●
●●
●
●●
●
●
●●
●●10−10
10−8
10−6
10−4
10−2
100
0 10 20 30 40Bound Receptors on Opposing Surface
Prob
abili
ty
t = 1/100 s
Toy Model Results
Neuromuscular Junction
MATERIALS AND METHODS
Model geometry
Spatially realistic model geometries for MCell simulations can beobtained in several ways, such as by reconstruction from electron micro-scopy imagery (14) or by in silico methods using computer-aided designor three-dimensional (3D) content creation software (15). The latter is apowerful new approach for modeling complex biological structures thatallows for rapid creation of model geometries and thus complements tradi-tional reconstructions from electron microscopy. We used Blender (16) tocreate our model’s 3D mesh geometry in silico according to dimensionsbased on published averages (17,18) (see Fig. 1). Our model included26 synaptic vesicles of 50 nm diameter arranged in two double rows.We marked triangular mesh tiles on the bottom of each synaptic vesicleas putative Ca2þ-sensor sites. Similarly, mesh tiles in the trough betweenvesicles were marked as putative VGCC sites at locations suggested bypublished estimates (17–19). These marked tiles could then be populatedwith Ca2þ-sensor sites or VGCCs as needed within MCell’s ModelDescription Language (MDL). From Blender, meshes were exporteddirectly into MDL.
MCell simulations and algorithms
All simulations were carried out with MCell version 3.1 (rev. 788) with acustom binary reaction data output format to facilitate data handling. Foreach simulation condition (different numbers of Ca2þ-sensor sites onvesicles, varying extracellular Ca2þ concentration, etc.), we averaged theresults over 10,000 separate MCell runs obtained via different randomnumber seeds. During each run we tracked the Ca2þ ions emerging fromindividual VGCCs, their binding to Ca2þ buffer, and their binding toCa2þ-sensor sites on vesicles. Ca2þ ions that encountered the edges ofour model geometry were removed from the simulation to mimic theirdisappearance into the adjacent presynaptic space. We used programs writ-ten in Cþþ, Python, Lua, and Bash to analyze the data. All simulations
were run on the Salk machine at the Pittsburgh Supercomputing Center,an SGI Altix 4700 shared-memory NUMA system with 144 Itanium2 pro-cessors, cmist, an eight-core Intel Xeon E5472 machine, or a local desktopwith an Intel i7-980X CPU.
We previously described MCell’s simulation algorithms in detail (20,21).In brief, in an MCell model, the membranes of cells and organelles arerepresented by triangulated surface meshes. Brownian motion of diffusingvolume and surface molecules is simulated using optimized grid-free Brow-nian dynamics random walk algorithms (21). Due to the inherently smalllength scales present in our model (e.g., distances between VGCCs and syn-aptic vesicles or between vesicles and the presynaptic membrane), we useda small simulation time step of 10 ns and corresponding short diffusion steplengths to allow for accurate spatial sampling (22).
Simulation of VGCCs and Ca2D influx
VGCCs in our model opened and closed according to a time-dependentaction potential waveform (Fig. 2 A) and were modeled via a four-statekinetic scheme (three closed states and one open state; see Table 1 and insetin Fig. 2 A). The rate constants between states were voltage dependent withthe following parameters (23):
a ¼ 0:06eðVmþ24Þ=14:5
b ¼ 1:7
eðVmþ34Þ=16:9 þ 1:
Here, Vm(t) is the time-dependent membrane voltage corresponding to theaction potential waveform shown in Fig. 2 A. The time-dependent rate con-stants a and b were precomputed and stored in separate text files that werethen read by MCell at the beginning of each simulation. Because the extra-cellular Ca2þ concentration, [Ca2þ]ext, remains approximately constantduring an action potential, we approximated Ca2þ flux through open
A D
BE
C
FIGURE 1 Overview of the frog NMJ model.(A) Rendered simulation snapshot of the completeAZ model. Visible are the two double rows ofsynaptic vesicles (large red spheres) as well asdiffusing free and buffer-bound Ca2þ ions (smallcolored spheres). For visual clarity, unbound buffersites are not shown. (B and C) Close-ups of synap-tic vesicles, revealing the Ca2þ-sensor sites at theirbottom (the 40-sensor model is shown). The cylin-drical glyphs directly in front of synaptic vesiclesrepresent closed or open VGCCs (red, closed;yellow, open). Open VGCCs release Ca2þ ions(yellow spheres) into the presynaptic space, whichcan bind to endogenous buffer sites (cyan spheres;only Ca2þ-bound buffer sites are shown) or sensorsites on synaptic vesicles. Unbound and boundCa2þ-sensor sites on vesicles are colored blackand yellow, respectively. (D and E) Importantmodel dimensions (drawings are not to scale).
Biophysical Journal 104(12) 2751–2763
2752 Dittrich et al.
MATERIALS AND METHODS
Model geometry
Spatially realistic model geometries for MCell simulations can beobtained in several ways, such as by reconstruction from electron micro-scopy imagery (14) or by in silico methods using computer-aided designor three-dimensional (3D) content creation software (15). The latter is apowerful new approach for modeling complex biological structures thatallows for rapid creation of model geometries and thus complements tradi-tional reconstructions from electron microscopy. We used Blender (16) tocreate our model’s 3D mesh geometry in silico according to dimensionsbased on published averages (17,18) (see Fig. 1). Our model included26 synaptic vesicles of 50 nm diameter arranged in two double rows.We marked triangular mesh tiles on the bottom of each synaptic vesicleas putative Ca2þ-sensor sites. Similarly, mesh tiles in the trough betweenvesicles were marked as putative VGCC sites at locations suggested bypublished estimates (17–19). These marked tiles could then be populatedwith Ca2þ-sensor sites or VGCCs as needed within MCell’s ModelDescription Language (MDL). From Blender, meshes were exporteddirectly into MDL.
MCell simulations and algorithms
All simulations were carried out with MCell version 3.1 (rev. 788) with acustom binary reaction data output format to facilitate data handling. Foreach simulation condition (different numbers of Ca2þ-sensor sites onvesicles, varying extracellular Ca2þ concentration, etc.), we averaged theresults over 10,000 separate MCell runs obtained via different randomnumber seeds. During each run we tracked the Ca2þ ions emerging fromindividual VGCCs, their binding to Ca2þ buffer, and their binding toCa2þ-sensor sites on vesicles. Ca2þ ions that encountered the edges ofour model geometry were removed from the simulation to mimic theirdisappearance into the adjacent presynaptic space. We used programs writ-ten in Cþþ, Python, Lua, and Bash to analyze the data. All simulations
were run on the Salk machine at the Pittsburgh Supercomputing Center,an SGI Altix 4700 shared-memory NUMA system with 144 Itanium2 pro-cessors, cmist, an eight-core Intel Xeon E5472 machine, or a local desktopwith an Intel i7-980X CPU.
We previously described MCell’s simulation algorithms in detail (20,21).In brief, in an MCell model, the membranes of cells and organelles arerepresented by triangulated surface meshes. Brownian motion of diffusingvolume and surface molecules is simulated using optimized grid-free Brow-nian dynamics random walk algorithms (21). Due to the inherently smalllength scales present in our model (e.g., distances between VGCCs and syn-aptic vesicles or between vesicles and the presynaptic membrane), we useda small simulation time step of 10 ns and corresponding short diffusion steplengths to allow for accurate spatial sampling (22).
Simulation of VGCCs and Ca2D influx
VGCCs in our model opened and closed according to a time-dependentaction potential waveform (Fig. 2 A) and were modeled via a four-statekinetic scheme (three closed states and one open state; see Table 1 and insetin Fig. 2 A). The rate constants between states were voltage dependent withthe following parameters (23):
a ¼ 0:06eðVmþ24Þ=14:5
b ¼ 1:7
eðVmþ34Þ=16:9 þ 1:
Here, Vm(t) is the time-dependent membrane voltage corresponding to theaction potential waveform shown in Fig. 2 A. The time-dependent rate con-stants a and b were precomputed and stored in separate text files that werethen read by MCell at the beginning of each simulation. Because the extra-cellular Ca2þ concentration, [Ca2þ]ext, remains approximately constantduring an action potential, we approximated Ca2þ flux through open
A D
BE
C
FIGURE 1 Overview of the frog NMJ model.(A) Rendered simulation snapshot of the completeAZ model. Visible are the two double rows ofsynaptic vesicles (large red spheres) as well asdiffusing free and buffer-bound Ca2þ ions (smallcolored spheres). For visual clarity, unbound buffersites are not shown. (B and C) Close-ups of synap-tic vesicles, revealing the Ca2þ-sensor sites at theirbottom (the 40-sensor model is shown). The cylin-drical glyphs directly in front of synaptic vesiclesrepresent closed or open VGCCs (red, closed;yellow, open). Open VGCCs release Ca2þ ions(yellow spheres) into the presynaptic space, whichcan bind to endogenous buffer sites (cyan spheres;only Ca2þ-bound buffer sites are shown) or sensorsites on synaptic vesicles. Unbound and boundCa2þ-sensor sites on vesicles are colored blackand yellow, respectively. (D and E) Importantmodel dimensions (drawings are not to scale).
Biophysical Journal 104(12) 2751–2763
2752 Dittrich et al.
Neuromuscular Junction Model
NMJ Zoomed-In
MATERIALS AND METHODS
Model geometry
Spatially realistic model geometries for MCell simulations can beobtained in several ways, such as by reconstruction from electron micro-scopy imagery (14) or by in silico methods using computer-aided designor three-dimensional (3D) content creation software (15). The latter is apowerful new approach for modeling complex biological structures thatallows for rapid creation of model geometries and thus complements tradi-tional reconstructions from electron microscopy. We used Blender (16) tocreate our model’s 3D mesh geometry in silico according to dimensionsbased on published averages (17,18) (see Fig. 1). Our model included26 synaptic vesicles of 50 nm diameter arranged in two double rows.We marked triangular mesh tiles on the bottom of each synaptic vesicleas putative Ca2þ-sensor sites. Similarly, mesh tiles in the trough betweenvesicles were marked as putative VGCC sites at locations suggested bypublished estimates (17–19). These marked tiles could then be populatedwith Ca2þ-sensor sites or VGCCs as needed within MCell’s ModelDescription Language (MDL). From Blender, meshes were exporteddirectly into MDL.
MCell simulations and algorithms
All simulations were carried out with MCell version 3.1 (rev. 788) with acustom binary reaction data output format to facilitate data handling. Foreach simulation condition (different numbers of Ca2þ-sensor sites onvesicles, varying extracellular Ca2þ concentration, etc.), we averaged theresults over 10,000 separate MCell runs obtained via different randomnumber seeds. During each run we tracked the Ca2þ ions emerging fromindividual VGCCs, their binding to Ca2þ buffer, and their binding toCa2þ-sensor sites on vesicles. Ca2þ ions that encountered the edges ofour model geometry were removed from the simulation to mimic theirdisappearance into the adjacent presynaptic space. We used programs writ-ten in Cþþ, Python, Lua, and Bash to analyze the data. All simulations
were run on the Salk machine at the Pittsburgh Supercomputing Center,an SGI Altix 4700 shared-memory NUMA system with 144 Itanium2 pro-cessors, cmist, an eight-core Intel Xeon E5472 machine, or a local desktopwith an Intel i7-980X CPU.
We previously described MCell’s simulation algorithms in detail (20,21).In brief, in an MCell model, the membranes of cells and organelles arerepresented by triangulated surface meshes. Brownian motion of diffusingvolume and surface molecules is simulated using optimized grid-free Brow-nian dynamics random walk algorithms (21). Due to the inherently smalllength scales present in our model (e.g., distances between VGCCs and syn-aptic vesicles or between vesicles and the presynaptic membrane), we useda small simulation time step of 10 ns and corresponding short diffusion steplengths to allow for accurate spatial sampling (22).
Simulation of VGCCs and Ca2D influx
VGCCs in our model opened and closed according to a time-dependentaction potential waveform (Fig. 2 A) and were modeled via a four-statekinetic scheme (three closed states and one open state; see Table 1 and insetin Fig. 2 A). The rate constants between states were voltage dependent withthe following parameters (23):
a ¼ 0:06eðVmþ24Þ=14:5
b ¼ 1:7
eðVmþ34Þ=16:9 þ 1:
Here, Vm(t) is the time-dependent membrane voltage corresponding to theaction potential waveform shown in Fig. 2 A. The time-dependent rate con-stants a and b were precomputed and stored in separate text files that werethen read by MCell at the beginning of each simulation. Because the extra-cellular Ca2þ concentration, [Ca2þ]ext, remains approximately constantduring an action potential, we approximated Ca2þ flux through open
A D
BE
C
FIGURE 1 Overview of the frog NMJ model.(A) Rendered simulation snapshot of the completeAZ model. Visible are the two double rows ofsynaptic vesicles (large red spheres) as well asdiffusing free and buffer-bound Ca2þ ions (smallcolored spheres). For visual clarity, unbound buffersites are not shown. (B and C) Close-ups of synap-tic vesicles, revealing the Ca2þ-sensor sites at theirbottom (the 40-sensor model is shown). The cylin-drical glyphs directly in front of synaptic vesiclesrepresent closed or open VGCCs (red, closed;yellow, open). Open VGCCs release Ca2þ ions(yellow spheres) into the presynaptic space, whichcan bind to endogenous buffer sites (cyan spheres;only Ca2þ-bound buffer sites are shown) or sensorsites on synaptic vesicles. Unbound and boundCa2þ-sensor sites on vesicles are colored blackand yellow, respectively. (D and E) Importantmodel dimensions (drawings are not to scale).
Biophysical Journal 104(12) 2751–2763
2752 Dittrich et al.
• Calcium is released from bottom, diffuses, and can bind to synaptotagmin vesicles
• Model: if enough calcium bind to one vesicle, in the right pattern, a release event occurs
First Passage Times
10−10
10−8
10−6
10−4
10−2
100
0 0.3 0.6 0.9 1.2 1.5 1.8 2.1 2.4 2.7 3First Passage Time (ms)
Prob
abili
ty
Ca = 0.5, BFCa = 0.5, WE (100 iters, 8 spb)
Total Firing: WE = 0.0014 / BF = 0.0013
Scaling Relationship
Prob ~ Ca4.67
10−5
10−4
10−3
10−2
10−1
100
0.1 0.5 1 1.5 2Ca Concentration (mM)
Prob
abili
ty to
Fir
e
BFWE
Firing Probability vs. Ca Concentration
Realistic Geometry
Signaling Network
Pipeline: CellOrganizer → BioNetGen → MCell
Harris, Hogg and Faeder
R
L
2
2
2
PM
EC
NU
EM
EN
NM
P1
Im
P1
P1
*
P2
Im
P1
Im
*2
P1
Im
NP
R2 R1
R3-5
R6-7
R8
R10,11
R9
R11
R122
R14
R15
R13
R16
R17R19
R18
R21
R20
R23
R22
R22
R24R25
R26
R27
R28
R28
R28
R28
R30
Im
TF TF p
TF p
p
TF p
p TF
p
p
Im TF
p
p
Im TF p
p
R R R R
L L
TF p
TF p
p p
L L
R Rp p
L L
R R
EM
EN
EM
EN
EM
EN
L L
R R
L
R
p2
TF p
pp1
p2
p1 mRNA1
mRNA2
mRNA2
mRNA1
Im
P1
CP
L L
NP
NP
TF p TF p
Figure 1: A compartmental model of the cell. The model couples simplified processes of signal tranduction, nuclear transportand transcriptional regulation in a single eukaryotic cell. The system consists of four volume compartments: extracellular(EC), cytosol (CP), endosomal (EN) and nuclear (NU). These are separated by three membrane surfaces: plasma (PM),endosomal (EM) and nuclear (NM). The model is presented as a pathway that proceeds from ligand (L) binding to expressionof protein P2. The underlying rule-based model defines a set of 354 reactions between 78 species. Bonds between moleculesare shown as black lines. Black arrows between species represent reactions. Gray arrow labels correspond to the rule numberthat describes the reaction (see model files at www.bionetgen.org/wsc09). Black integer-valued arrow labels representreaction stoichiometry (if not equal to unity). DNA promoters are pictured as a double helix icon.
the same molecule or another. Components may be associated with state variables with a finite set of possible values, eachrepresenting a conformational or chemical state of a component, such as phosphorylation status. The name of the moleculetype is given first followed by a comma-separated list of its components in parenthesis. The allowed values of state variablesare indicated by ‘⇠’ followed by a value, as in L(r,d,loc⇠EC⇠EN), which declares a ligand molecule L that contains areceptor binding component r, a dimerization component d, and a location component loc that takes on values EC or EN.
The seed species block defines the species initially present in the system. For example, the line
L(r,d,loc⇠EC) Lig0
specifies that the initial amount of free ligand monomers in EC is Lig0, a parameter defined in the parameters block.Molecular complexes may also be specified, as in L(r,d!1,loc⇠EC).L(r,d!1,loc⇠EC), where a bond linking thed components of each L molecule is indicated by a shared bond label, !1.
The reaction rules block contains rules that define how molecules interact. A rule is comprised of a set of reactantpatterns, a transformation arrow, a set of product patterns, and a rate law. A pattern is a set of molecules that select speciesthrough a mapping operation (Blinov et al. 2006). The match of a molecule in a pattern to a molecule in a species dependsonly on the components specified in the pattern (including wildcards), so that one pattern may select many different species.The ‘+’ operator separates two reactant patterns that must map to distinct species (i.e., they may not reside in the samecomplex). The ‘.’ operator separates molecules that are part of the same species. The transformation arrow may be eitherunidirectional (->) or bidirectional (<->). Five basic types of operations are carried out by the rules in the example system
Protein Histograms
BF Limit
10−10
10−8
10−6
10−4
10−2
100
0 5 10 15 20 25Total Protein 1
Prob
abili
ty
BFWE
Time = 100 secs
Steady-State
In steady state, MFPT(A→B) = 1/Flux(A→B)
First Passage Times
10−20
10−15
10−10
10−5
100
0 20 40 60 80 100Seconds
Flux
WE Steady−state run 1WE Steady−state run 2
BF equiv = 89616 secs of Dynamics
Pipeline:
Spa$ally'dependent'response'
Chemical'Kine$cs'
BioNetGen'
Model'Geometries'
Enhanced'Sampling'Visual'Edi$ng'
Stochas$c'Spa$al'
Dynamics'
Efficient Sampling
Conclusions • Able to sample the rare events and full
distributions for stochastic systems biology models over a wide range of complexity
• Speed-up over brute-force is dramatic enough encourage the design of more complex, more realistic models
• Long time-scale behavior can be extrapolated from short simulations: can bridge dynamics over multiple timescales
Thanks• Collaborators:
• Jose, Devin, Markus, Jim, Bob, Dan
• Funding:
• SF Grant No. MCB-1119091
• NIH Grant No. P41 GM103712
• NIH Grant No. T32 EB009403
• NIH Grant No. GM090033
• NSF Expeditions in Computing Grant (Award No. 0926181)