# Webpages about Smoldyn and spatial simulation

- Brief description of Smoldyn
- Spatial simulation and simulators
- Comparison table of particle-based simulators
- Questions and answers about Smoldyn
- Advice on choosing simulation parameters
- Research project ideas
- Algorithm details

Choosing appropriate simulation parameters for a model is always a challenge. It is particularly challenging for highly detailed simulations, such as those that Smoldyn does, because these simulations require more parameters than others do. In addition, most necessary parameters have not been measured experimentally.

However, parameterizing models is not as daunting as it might seem. The notes section of this Methods in Molecular Biology paper (pdf) offers several suggestions. For the rest of this page, I simply copied and pasted many of those notes.

# Page contents

- Partitioning space
- Time steps
- Diffusion coefficients
- Molecule lists
- Surface interactions
- Reaction rates

# Partitioning space

Smoldyn subdivides the system within its boundaries into a grid of uniformly spaced "virtual boxes." These boxes do not affect the simulated results. Instead, they make simulations more efficient by reducing the number of potential molecule-molecule and molecule-surface interactions that Smoldyn needs to check at each time step. The default partition spacing, which yields an average of about four molecules per box when the simulation starts, is often good but can usually be improved upon. Improvement is especially important if the model starting state is not representative of its typical state or if molecules are not distributed homogenously.

Optimize the partition spacing by timing simulations (Smoldyn displays the run time when it terminates) that use a range of box sizes, which you can set with molperbox or boxsize, and choose the fastest size for which Smoldyn does not report any errors. The errors to watch for are those that report that bimolecular reaction binding radii, or analogous distances, are larger than the box widths. When they arise, they indicate that Smoldyn will not detect some bimolecular reactions, which means that those reactions will simulate too slowly, and simulation accuracy will be reduced. Other Smoldyn warnings about box sizes, such as those that comment on unusually many or few molecules per box, are simply suggestions that different partition spacings may improve performance.

# Time steps

When choosing the time step for a simulation, there is an unavoidable trade-off between using shorter time steps to get more accurate results, and using longer time steps for faster simulations. Thus, the succinct answer to the question of what time step to use, is that you should choose the longest time step that yields sufficiently accurate results. You can find it by running trial simulations with a wide range of time steps and graphing representative simulation results (e.g., the maximum amount of a product concentration, time until a substrate concentration is halved, steady-state receptor occupancy, etc.) against the log of the time step. Typically, this plot will show results that are independent of time step lengths for short time steps, that errors increase log-linearly for long time steps, and that these regions are separated by a cross-over region that is about a factor of 10 in width. A judgment call is required at this point to decide how much accuracy, if any, you are willing to sacrifice for faster simulations.

In addition to this heuristic method, it is worth considering how the time step length affects individual simulation algorithms. In particular,

- Smoldyn displaces each diffusing molecule by about
s = (2DΔt)
^{1/2}, where D is the molecule's diffusion coefficient and Δt is the time step, at each time step. The average displacement for the fastest diffusing species, s_{max}, is the simulation's spatial resolution. In general, s_{max}should be significantly smaller than important geometrical features, surface curvature radii, and distances between fixed molecules. - Characteristic transition times for unimolecular reactions, molecular desorption, and transitions between surface-bound states are all t = 1/k, where k is the appropriate rate constant. Also, characteristic times for bimolecular reactions with well-mixed reactants are t = ([A] + [B])/(k[A][B]), where k is the reaction rate constant and [A] and [B] are the reactant concentrations. And, characteristic times for molecular adsorption and permeability are t = d/k, where d is the distance between surfaces (e.g., the cell length) and k is the adsorption or transmission coefficient. To simulate these dynamics accurately, the time step should be smaller than these characteristic times.
- Smoldyn performs bimolecular reactions between pairs of reactant molecules that are closer than their "binding radius." Binding radii, which Smoldyn computes from reaction rates, diffusion coefficients, and the simulation time step, increase as time steps are made longer and are often larger than physical molecular radii. If a molecular species is so concentrated that these binding radii frequently overlap each other, which is especially problematic for clustered surface-bound molecules, this decreases accuracy.
- For fast bimolecular reactions that are far from steady state, Smoldyn's simulated results more closely agree with diffusion-limited kinetics when short time steps are used and with activation-limited kinetics when long time steps are used. This is unlikely to affect typical biochemical network models, but may affect reaction biophysics models.

For all of these time step considerations, note that Smoldyn displays all internal simulation parameters, several characteristic times, and any warnings for potential problems before it starts simulations. It is prudent to check that this information seems reasonable.

In practice, many researchers use time steps around 0.1 ms. Smoldyn has also been used with time steps as short as 0.06 ns, in a test of its reaction algorithms, and with time steps up to 10 or 20 ms. The latter models were able to use long time steps because they had relatively low molecular densities and large system geometries.

# Diffusion coefficients

In homogeneous solutions, such as water or liquid growth media, diffusion coefficients can often be approximated reasonably well with the Stokes-Einstein equation. It is

D = k_{B}T/(6πηr)

where D is the diffusion coefficient, k_{B} is Boltzmann's constant, T is the absolute temperature, η
is the solution viscosity, and r is the radius of the diffusing particle, which is assumed to be spherical. This
particle radius can be easily calculated from its mass m and density r,

r = [3m/(4πρ)]^{1/3}

In the latter equality, m is measured in Daltons; also, it assumes an average density of 1.41 g/cm^{3}, which is
reasonably accurate for proteins above 40 kDa and within 10% for smaller proteins. Combining this radius calculation
with the Stokes-Einstein equation yields the protein diffusion coefficient estimate

D = 3270/m^{1/3} μm^{2}/s

This is for a temperature of 20 degrees C, where water's viscosity is very nearly 1 mPa s. Two examples are:
for lysozyme, m is 14.6 kDa, the experimental D is 111 μm^{2}/s, and the computed D is 134
μm^{2}/s; and for green fluorescent protein (GFP), m is 26.9 kDa, the experimental D is 87
μm^{2}/s, and the computed D is 109 μm^{2}/s. Of course, theories that account for protein
shape yield better predictions.

For small molecules in water, experimentally measured diffusion coefficients are
readily available. Examples include 2,010 μm^{2}/s for oxygen at 20 degrees C and 380 μm^{2}/s
for lactose at 15 degrees C. For quick estimates, the Stokes-Einstein equation is usually correct to within a
factor of two.

In eukaryotic cytoplasms, nuclei, and mitochondria, experiments by Verkman's group show that macromolecule
diffusion coefficients are about 25% of their values in water, for masses up to about 500 kDa. Larger molecules and
filamentous molecules, such as DNA, diffuse more slowly, likely due to sieving by actin networks. In the *E. coli*
bacterial cytoplasm, GFP diffuses about another factor of 5 slower than it does in eukaryotes, now with a diffusion
coefficient around 3-8 μm^{2}/s. Diffusion is much slower yet in the *E. coli* periplasm, where the
42.5 kDa maltose-binding protein has a diffusion coefficient of about 0.009 μm^{2}/s. Finally, a couple
of membrane-bound protein diffusion coefficients are 0.09 μm^{2}/s for aquaporin-1 (about 30 kDa) in
nonpolarized fibroblast cell membranes and 0.012 μm^{2}/s for histidine kinase PleC (117 kDa, including
fluorophore) in Caulobacter membranes.

From these data, I suggest the rules-of-thumb: use 80% of the Stokes-Einstein value, as calculated above, for proteins in water, and divide the aqueous diffusion coefficient by 4 for eukaryotic cell and organelle cytoplasms, by 15 for bacterial cytoplasms, by 1,000 for bacterial periplasms, by 1,000 for eukaryotic membranes, and by 4,000 for bacterial membranes.

# Molecule lists

Smoldyn stores molecules in several lists. By default, Smoldyn stores all simulated molecules that do not diffuse in a "fixed" list and those that do diffuse in a "diffusing" list. This scheme is more efficient than a single list because Smoldyn does not need to diffuse or perform surface interactions for molecules in the fixed list; also Smoldyn does not need to check for bimolecular reactions between pairs of molecules in the fixed list.

Adding more lists can speed simulations up further, often by factors of five or more, typically by minimizing the number of potential bimolecular reactions that Smoldyn needs to check. For example, consider the chemical reaction A + B -> C. If Smoldyn stored all three species in the same list, then when Smoldyn searched the list to see which AB molecule pairs could react, it would also encounter many nonreactive AA, AC, BB, BC, and CC molecule pairs, each of which would take a small amount of time to check and then ignore. On the other hand, Smoldyn would only encounter AB pairs if A, B, and C were stored in three separate lists. Generalizing this example, more molecule lists reduce the number of unnecessary checks, so typically produce faster simulations. This trend does not continue indefinitely though, because each molecule list also requires some processing time; this becomes important for lists with few molecules. Thus, overall, simulation performance is usually best when each abundant species has its own list and when sparsely populated species share lists. Further list optimization, by considering the potential bimolecular and molecule-surface interactions that Smoldyn has to check at each time step, can produce additional gains.

Create molecule lists with molecule_lists and assign species to them with mol_list.

# Surface interactions

Adsorption coefficients are limited to about the thermal velocity of the adsorbent, which is

κ_{max} = (k_{B}T/m)^{1/2} = 1.56x10^{9}/m^{1/2} μm/s

The latter equality assumes that m is measured in Daltons and the temperature is 20 degrees C. For example,
the largest possible adsorption coefficient for a 50 kDa protein is about 7x10^{6} μm/s. Real adsorption coefficients
are likely to be vastly smaller than this maximum. For example, Huang and coworkers chose an adsorption coefficient
of 0.025 μm/s for the adsorption of MinD (29.5 kDa) to the inside of the *E. coli* cell membrane in a biochemical
model. Unfortunately, remarkably few experimental papers present quantitative data on protein adsorption to or
desorption from lipid bilayers, despite an extensive literature.

Transmission coefficients for molecules through membranes can be calculated using the equation

κ = K_{mem}D_{mem}/d_{mem}

where K_{mem} is the partition coefficient for the molecule into the membrane from the neighboring
solution (e.g., cytoplasm), D_{mem} is the molecule's diffusion coefficient in the membrane, and
d_{mem} is the membrane thickness. Qualitatively, this equation shows that hydrophobic molecules are
transmitted faster than hydrophilic ones because they partition into the membrane more readily, and that smaller
molecules are transmitted faster than larger ones because they diffuse faster. A paper by Paula et al. shows how
to compute the necessary parameters. It also lists some experimentally determined coefficients for transmission
through lipid bilayers, including: 3.5x10^{8} μm/s for potassium ions, 0.014 μm/s for urea, 0.027
μm/s for glycerol, 5 μm/s for protons, and 150 μm/s for water molecules; these values are for 2.7 nm
thick bilayers, which is about the thickness of typical biological membranes.

Adsorption and transmission coefficients can also be estimated using the respective characteristic times, if these times can be inferred from experiments.

# Reaction rates

Experimental bimolecular reaction rates are limited to

k_{max} = 4π(D_{A} + D_{B})(r_{A} + r_{B})

by the rate at which reactants can diffuse together. Here, k_{max} is the diffusion-limited reaction
rate constant, D_{A} and D_{B} are the diffusion coefficients of the reactants, and r_{A}
and r_{B} are the reactant radii. Highly reactive small molecules sometimes react with rates that are close
to this maximum, but typical biochemical reaction rate constants are much slower. You can verify that your simulated
rates will be well below this maxi- mum by comparing Smoldyn's output parameter labeled "binding radius if dt were 0"
to the sum of the physical reactant radii. They will be equal if your simulated reaction is at the maximum rate, and
proportionately less for slower reactions.

Other than this one qualitative check, reaction rates usually have to found from the scientific literature that
is relevant to your model. Typically, some reaction rates will have been measured, others can be calculated from
published data (e.g., association reaction rates can be calculated from dissociation constants and dissociation
reaction rates), others can be estimated from characteristic reaction times and published figures, and yet others
simply have to be guessed. Previously developed models, whether of your system or similar systems, are often a good
source of reaction rates and literature references. For this, the BioModels database
(http://www.ebi.ac.uk/ biomodels-main) is a particularly good place to start. Finally, curated literature summaries
that list quantitative data are rarely available, but can be excellent resources when they are. For example, the
Yeast Pheromone Model wiki (http:// yeastpheromonemodel.org/wiki/Main_page) lists reaction rates and their
references for most reactions in the yeast pheromone response system. A few representative rates from this wiki
are pheromone binds Ste2 receptors at 1.8x10^{5} M^{-1}s^{-1} (3.1x10^{-4}
μm^{3}/s) and unbinds at 10^{-3} s^{-1}, the Gpa1 G-protein subunit exchanges GDP
for GTP at 6.17x10^{-4} s^{-1}, and the Fus3 MAP kinase binds the Ste5 scaffold protein at
2.3x10^{6} M^{-1}s^{-1} (3.8x10^{-3} μm^{3}/s) and unbinds at
2.3 s^{-1}. While these are reasonably typical values for these types of interactions, other similar
reactions are often faster or slower by several orders of magnitude.

Note that bimolecular reactions that take place in one- and two-dimensional systems, such as along filaments or within membranes, do not have reaction rate constants in the same sense as reactions in three dimensions. Also, current versions of Smoldyn cannot quantitatively simulate low-dimensional reactions.