
Biomolecular Dynamics - University of Torontoarrhenius.med.utoronto.ca/~chan/JBB2026H_18-11-02_handout.pdf · 11/2/2018  · Biomolecular Dynamics • Molecular Dynamics • Basic


JBB2026H –Régis Pomès – 2 November 2018

Biomolecular Dynamics

•  Molecular Dynamics

–  Basic approximations
–  Force field
–  Equations of motion
–  Stochastic simulations
–  Statistical mechanics

•  Applications to Protein Folding and Aggregation

Expanding the scope of computer simulations

→ $10^6$ atoms, $10^{-15}$ to $10^{-3}$ s

•  Develop smarter sampling algorithms
•  Develop smarter computer architectures
•  Use more computers!

Biomolecular simulations

The Biophysicist Control Freak’s Dream Experiment:

- full atomic resolution
- broad range of biomolecular time scales

→ How and why things are happening

Computer simulations of biomolecular systems
typical scales: $10^4$ atoms, $10^{-15}$ to $10^{-6}$ s

SciNet


The nature of conformational space and of molecular dynamics is essentially determined by the nature of atomic interactions

•  In an ideal gas, atoms mostly travel through space, occasionally colliding with other atoms

•  In liquid Ar, all the DOFs are equivalent; the atoms roll around and their interactions are dominated by excluded volume, which restricts conformational space by precluding atomic overlap (box of marbles)

•  In liquid water and lipid bilayers, there are additional preferences due to anisotropy of shape and more complex interactions → more restrictions

•  In proteins, further restrictions due to the complex topology of chemical bonds and diverse interactions → conformational space is inherently heterogeneous

To understand the time evolution of molecular systems, we need to understand molecular interactions


Different types of motion depending on potential energy and time scale

•  Librations. Eg liquid Ar: at sufficiently short t, atoms are trapped by their neighbors. Excluded volume → random collisions

•  Oscillations. Eg diatomic molecule: the coordinate q oscillates with time t

•  Diffusion. For t long enough → reorganization of surrounding atoms → mean squared displacement proportional to t: msd = 6Dt

•  Activated dynamics. Rate decays exponentially with activation energy $E_a$: $k \sim e^{-\beta E_a}$

[Figure: sketches of q versus t for each type of motion, and of E versus r with barrier Ea]

Example: classical 1D harmonic oscillator

•  Eg: Diatomic molecule in vacuo

$E_{pot} = V(x) = \tfrac{1}{2} k x^2$

Newtonian mechanics:

$F = -dV/dx = m\, d^2x/dt^2$

$\Leftrightarrow\ -(k/m)\, x = d^2x/dt^2$   (#)

Given initial conditions $dx/dt = 0$, $x = x_0$,

eq. (#) can be solved analytically:

$x(t) = x_0 \cos\omega t$, with $\omega = (k/m)^{1/2}$

$E_{tot} = E_{pot} + E_{kin} = \tfrac{1}{2} k x^2 + \tfrac{1}{2} m (dx/dt)^2 = \tfrac{1}{2} k (x_0 \cos\omega t)^2 + \tfrac{1}{2} m (x_0\, \omega \sin\omega t)^2 = \tfrac{1}{2} k x_0^2 = \text{constant}$

For a complex molecular system, the equations of motion are solved numerically instead.

[Figure: V(x) versus x, and Etot, Epot, Ekin versus t]


How do atoms and molecules interact with each other?

Molecular Mechanics Force Field

Molecular interactions are described by the potential energy (PE) of the system, $E(r_1, r_2, \ldots, r_N)$ = explicit link between conformation and energy

•  Approximate from a set of empirical functions used to estimate potential energy of molecular system based on the 3-dimensional arrangement of atomic nuclei

•  … rigorously, the energy of a molecular system is obtained from first principles of quantum mechanics by solving the Schrödinger equation Hφ = Eφ

–  but accurate ab initio methods are very expensive, which limits their applicability to small-ish systems (up to ~100 atoms)

•  Many biological phenomena can be described in the classical limit
–  atoms are represented by beads with point charges
–  distinguish between covalent interactions (bond, angle, torsion, improper torsion) and non-bonded interactions (electrostatic and van der Waals)

Total potential energy

•  $V_{total} = \sum_{bonds} V_B + \sum_{angles} V_A + \sum_{dihedrals} V_D + \sum_{impr} V_I + \sum_i \sum_{j>i} [V_C + V_{LJ}]$

Takes into account both local and non-local interactions

(A) Bonded interactions

•  Covalent bonds: $V_B(b) = \tfrac{1}{2} k_B (b - b_0)^2$ (harmonic spring)
•  Angles: $V_A(\theta) = \tfrac{1}{2} k_A (\theta - \theta_0)^2$

•  Dihedral angles or torsions: $V_D(\chi) = k_D [1 + \cos(n\chi + \phi)]$
•  Improper dihedrals: $V_I(\theta) = \tfrac{1}{2} k_I (\theta - \theta_0)^2$

–  enforce planarity/chirality
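The bonded terms listed above can be sketched in a few lines of Python. This is only an illustration of the functional forms: the function names and all numerical parameters are hypothetical and are not taken from any particular force field.

```python
import math

def harmonic(x, x0, k):
    """Harmonic term 1/2 k (x - x0)^2, the form used for bonds, angles and impropers."""
    return 0.5 * k * (x - x0) ** 2

def dihedral(chi, k_d, n, phase):
    """Periodic torsion term k_D [1 + cos(n*chi + phase)], angles in radians."""
    return k_d * (1.0 + math.cos(n * chi + phase))

# Hypothetical parameters, chosen only to illustrate the formulas
bond_energy = harmonic(1.10, x0=1.00, k=600.0)            # bond stretched by 0.1
torsion_at_0 = dihedral(0.0, k_d=1.5, n=3, phase=0.0)     # maximum: 2 * k_D
torsion_at_60 = dihedral(math.radians(60.0), k_d=1.5, n=3, phase=0.0)  # minimum: ~0
```

With n = 3 and zero phase, the torsion term has three equivalent minima per full rotation, which is the kind of periodicity the multiplicity n is meant to encode.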


(B) Non-bonded interactions

•  Assume non-bonded interactions are pairwise-additive ⇒ for two atoms separated by $r_{ij}$:

•  Electrostatic: $V_C(r_{ij}) = q_i q_j / (4\pi\varepsilon\varepsilon_0 r_{ij})$ = Coulomb potential

$q_i$, $q_j$ = atomic charges of atoms i and j

$\varepsilon_0$ = permittivity of vacuum

Long range: $|V_C(r_{ij})| = 332/r_{ij}$ kcal/mol (with $r_{ij}$ in Å) = huge for two unit charges! But often screened by solvent (water dielectric constant ε = 78.5).

Accounts for charge-charge, charge-dipole, dipole-dipole, and (essentially) hydrogen bonding interactions

•  van der Waals: $V_{LJ}(r_{ij}) = 4\epsilon [ (\sigma/r_{ij})^{12} - (\sigma/r_{ij})^{6} ]$ = a.k.a. Lennard-Jones 6-12 potential (here ε is the well depth, not the dielectric constant)

•  $r^{-12}$: excluded volume, steric term (Pauli exclusion principle)

•  $r^{-6}$: dispersion attraction (London force) due to small fluctuations of electronic charge distribution in presence of other atoms

[Figure: Lennard-Jones potential V(r) versus r, with σ and well depth ε marked]

Symbols b, θ, χ, r denote the variables. The other symbols are parameters.
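As a rough numerical check of the two non-bonded terms, the sketch below evaluates the Coulomb and Lennard-Jones expressions directly. The constant 332 is the value quoted in the text (kcal/mol for unit charges 1 Å apart); the function names are hypothetical, and the argon-like σ and ε are illustrative values only.

```python
COULOMB_CONST = 332.0  # kcal*Angstrom/(mol*e^2), the value quoted in the text

def coulomb(qi, qj, r, dielectric=1.0):
    """Coulomb energy in kcal/mol for charges in units of e and r in Angstrom."""
    return COULOMB_CONST * qi * qj / (dielectric * r)

def lennard_jones(r, eps, sigma):
    """Lennard-Jones 6-12 energy, 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

# Two opposite unit charges 3 Angstrom apart: vacuum versus water screening
e_vacuum = coulomb(+1.0, -1.0, 3.0)
e_water = coulomb(+1.0, -1.0, 3.0, dielectric=78.5)

# Illustrative argon-like parameters (assumed, not from the handout)
sigma, eps = 3.4, 0.238
r_min = 2.0 ** (1.0 / 6.0) * sigma        # position of the minimum
e_at_sigma = lennard_jones(sigma, eps, sigma)   # zero crossing
e_at_min = lennard_jones(r_min, eps, sigma)     # well depth, -eps
```

The ion pair is worth about -111 kcal/mol in vacuum but only about -1.4 kcal/mol with the dielectric constant of water, which is the screening effect noted above.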


Parametrisation

•  internal geometry (equilibrium bond lengths, angles, torsions) from ab initio calculations of small molecules (e.g., Ala dipeptide, imidazole ring)

•  vibrational frequencies from ab initio calculations, IR and Raman spectra

•  bulk properties (e.g. water density, dipole moment, non-bonded interactions, partial charges) from ab initio calculations, crystallography, neutron diffraction, calorimetry, mass spectrometry

•  fit point charges to electrostatic field generated by electron distribution in a continuum model of water

•  fit Lennard-Jones parameters to reproduce interaction energy with water molecules, hydration free energy, crystallography of small molecules

… for proteins and other polymers, use fragments

Molecular Dynamics Simulations

•  Non-equilibrium (time-dependent) properties
•  Equilibrium (thermodynamic) properties
•  Affords insight hard to access with experimental tools:
–  length scale: atomic level of detail, O($10^4$ to $10^6$) atoms
–  time scale: fs to ms (up to 12 orders of magnitude)

•  Approach: use physical principles
–  quantum mechanics: Hφ = Eφ, parametrisation of molecular interactions
–  classical mechanics: F = ma = -dV/dr, integration of the equations of motion
–  statistical mechanics: <A>, ensemble averaging


Time trajectory

•  calculate the force acting on each particle from the gradient of the potential energy: $F_i = -\partial E_{pot}/\partial r_i$
•  then use Newton’s 2nd law, $F_i = m_i a_i$, to get the acceleration $a_i(t)$
•  $E_{total} = E_{potential} + E_{kinetic}$

•  To propagate the equations of motion (EOM), we need, for each atom of the system at a given time t:
–  positions {r(t)},
–  velocities {v(t)}, and
–  accelerations {a(t)}

•  Integration of the EOM:

Integration of the equations of motion

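•  Usually achieved by finite difference: $dy/dx \approx \delta y/\delta x$

Eg, Verlet algorithm: Taylor expansion

$r(t+\delta t) = r(t) + v(t)\,\delta t + \tfrac{1}{2} a(t)\,\delta t^2 + \ldots$

$r(t-\delta t) = r(t) - v(t)\,\delta t + \tfrac{1}{2} a(t)\,\delta t^2 - \ldots$

add up: $r(t+\delta t) \approx 2r(t) - r(t-\delta t) + a(t)\,\delta t^2$

velocity: $v(t) = [\, r(t+\delta t) - r(t-\delta t) \,] / (2\,\delta t)$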
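The Verlet recurrence can be tried on the 1D harmonic oscillator, where the analytical solution x(t) = x0 cos ωt is available for comparison. The sketch below is a minimal illustration in arbitrary reduced units; the function name and parameter values are assumptions made for the example.

```python
import math

def verlet_harmonic(x0, k, m, dt, n_steps):
    """Verlet trajectory of a 1D harmonic oscillator released at rest from x0."""
    a0 = -k * x0 / m
    x_prev = x0                       # x(0)
    x = x0 + 0.5 * a0 * dt ** 2       # x(dt) from the Taylor expansion with v(0) = 0
    xs = [x_prev, x]
    for _ in range(n_steps - 1):
        a = -k * x / m                # acceleration from the force F = -k x
        x_next = 2.0 * x - x_prev + a * dt ** 2
        x_prev, x = x, x_next
        xs.append(x)
    return xs

# Arbitrary reduced units (assumed for illustration)
k, m, x0, dt = 1.0, 1.0, 1.0, 0.01
xs = verlet_harmonic(x0, k, m, dt, n_steps=1000)

# Compare with the analytical solution at every stored time step
omega = math.sqrt(k / m)
max_err = max(abs(x - x0 * math.cos(omega * i * dt)) for i, x in enumerate(xs))
```

Note that velocities never enter the position update; they are only needed afterwards, for example to evaluate the kinetic energy through the central-difference formula.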

•  Stochastic dynamics (Brownian motion). Langevin integration:

$r(t+\delta t) = r(t) + c_1 v(t)\,\delta t + c_2 a(t)\,\delta t^2 + \delta r^G$

where
–  $c_0 = \exp(-\zeta\,\delta t)$, ζ = friction coefficient
–  $c_1 = (1 - c_0)/(\zeta\,\delta t)$
–  $c_2 = (1 - c_1)/(\zeta\,\delta t)$
–  $\delta r^G$ = randomly picked from a Gaussian distribution

Reduces to Newton in the low-friction limit. ζ = 5 ps$^{-1}$ mimics overdamping of molecular motion in liquid water.
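The low-friction limit can be checked numerically: as ζδt goes to zero, c1 should tend to 1 and c2 to 1/2, which recovers the Taylor expansion used for Newtonian dynamics. The sketch below only evaluates the coefficients; the random displacement δrG is left out because its variance is not specified here, and the function name is hypothetical.

```python
import math

def langevin_coefficients(zeta, dt):
    """Return (c0, c1, c2) for friction coefficient zeta and time step dt."""
    x = zeta * dt
    c0 = math.exp(-x)
    c1 = -math.expm1(-x) / x      # (1 - c0) / x, written to avoid cancellation
    c2 = (1.0 - c1) / x
    return c0, c1, c2

# zeta = 5 ps^-1 as in the text, with a 1 fs time step expressed in ps
c0, c1, c2 = langevin_coefficients(zeta=5.0, dt=0.001)

# Much weaker friction: coefficients approach the Newtonian values 1 and 1/2
low_c0, low_c1, low_c2 = langevin_coefficients(zeta=1e-3, dt=0.001)
```

Even at ζ = 5 ps^-1 the coefficients stay close to their Newtonian values over a single 1 fs step (c1 ≈ 0.9975, c2 ≈ 0.4992); the damping builds up over many steps.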


Temperature

•  The absolute temperature is related to the average kinetic energy (motion) of the constituent particles of the system:

High T ⇔ higher velocities $v_i$
Low T ⇔ lower velocities

•  Equipartition principle:

In a system of N particles with mass $m_i$ and velocity $v_i$,

$\tfrac{3N}{2} k_B T = \langle \tfrac{1}{2} \sum_i m_i v_i^2 \rangle$

where $k_B$ is the Boltzmann constant

Each of the 3N DOFs has an average kinetic energy of $\tfrac{1}{2} k_B T$

Time trajectory (cont.)

•  initial positions {r(t=0)} taken from structures obtained by X-ray crystallography, NMR spectroscopy or modeling

•  initial atomic velocities are picked from the Maxwell distribution $f(v) \propto v^2 \exp(-\beta m v^2/2)$ to yield temperature T

$\tfrac{3}{2} N k_B T = \sum_i \langle m_i v_i^2/2 \rangle$ = kinetic energy

•  typical time step δt = 1 fs = $10^{-15}$ s, limited by fastest motions = stretching of covalent bonds involving H

•  ~ 1 µs MD simulations of soluble proteins with explicit solvent or membrane protein with explicit lipid and water (~$10^5$ atoms) → eg, 10-100 ns/day/CPU

•  longer if explicit solvent is omitted or atoms are coarse-grained—at the expense of accuracy
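The link between the Maxwell distribution and the equipartition formula can be illustrated as follows. Drawing each Cartesian velocity component from a Gaussian of variance kBT/m gives speeds that follow the Maxwell distribution, and inverting the equipartition relation recovers T. This is a sketch under stated assumptions: the function names, the system of identical particles and the unit system (kcal/mol, amu) are all chosen for illustration.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def sample_velocities(masses, temperature, rng):
    """Draw each Cartesian velocity component from a Gaussian of variance kB*T/m."""
    velocities = []
    for m in masses:
        sigma = math.sqrt(K_B * temperature / m)
        velocities.append(tuple(rng.gauss(0.0, sigma) for _ in range(3)))
    return velocities

def instantaneous_temperature(masses, velocities):
    """Invert (3N/2) kB T = sum_i 1/2 m_i v_i^2 to obtain T."""
    kinetic = sum(0.5 * m * sum(c * c for c in v)
                  for m, v in zip(masses, velocities))
    return 2.0 * kinetic / (3.0 * len(masses) * K_B)

rng = random.Random(0)
masses = [12.0] * 20000            # hypothetical system of identical particles
vels = sample_velocities(masses, 300.0, rng)
t_est = instantaneous_temperature(masses, vels)   # close to 300 K, with sampling noise
```

For a finite system the instantaneous temperature fluctuates around the target; the relative spread shrinks as 1/sqrt(N), which is why a large N is used here.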


MD: the heart of the matter

(1)  Read coordinates, topology, parameters

(2)  Guess initial velocities from Maxwell

(3)  Calculate forces acting on each atom

(4)  Integrate EOMs

(5)  → new coordinates and velocities δt later

Had enough? N → return to (3); Y → end

Treatment of solvent

How to represent infinite systems in computer simulations?

practical calculations are limited to N ~ $10^4$ to $10^6$ atoms

bottleneck = calculation of pairwise non-bonded interactions ($V_C$, $V_{LJ}$)

grows as the number of pairs = N(N-1)/2

realistic hydration (eg, 1.5 nm water shell) of a globular protein typically increases the number of atoms, N, 10-fold

→ CPU requirements up 100-fold

–  1% of pairs are protein-protein
–  18% are protein-solvent
–  81% are solvent-solvent
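The 100-fold cost and the 1% / 18% / 81% split follow directly from counting pairs. The sketch below reproduces the arithmetic for a hypothetical protein of 1,000 atoms whose hydration brings the total to 10,000 atoms; the absolute sizes are assumptions, only the 10-fold ratio comes from the text.

```python
def n_pairs(n):
    """Number of distinct pairs among n atoms, N(N-1)/2."""
    return n * (n - 1) // 2

n_protein = 1_000               # hypothetical protein size
n_total = 10 * n_protein        # 10-fold increase upon hydration
n_solvent = n_total - n_protein

total = n_pairs(n_total)
frac_pp = n_pairs(n_protein) / total        # protein-protein pairs
frac_ps = n_protein * n_solvent / total     # protein-solvent pairs
frac_ss = n_pairs(n_solvent) / total        # solvent-solvent pairs
cost_ratio = total / n_pairs(n_protein)     # hydrated versus bare protein
```

In the large-N limit the fractions are simply 0.1², 2 × 0.1 × 0.9 and 0.9², so most of the effort goes into solvent-solvent interactions.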

Omitting solvent would allow simulation times to jump from the typical limit, ~10 µs, to ~1 ms

[Figure: SH3 domain in a (6 nm)$^3$ water box]


Simulations with explicit solvent

Periodic boundary conditions

•  consider infinite number of replicas of a central cell

•  cut off interactions at some separation $R_{cut} < a/2$, where a is the edge length of the cell, to prevent artificial “crystallization” of the system due to self-interactions

•  only compute interactions between atoms in central cell and those falling within $R_{cut}$ (whichever adjacent cell they may be in)

•  $R_{cut}$ not good for (long-range) Coulombic interactions ⇒ use Ewald sums = rigorous treatment of charge-charge interactions in an infinite periodic system

•  currently best practical description of the liquid state for simulations of biopolymers

•  allows explicit treatment of phospholipid membranes, counterions, excess salt, …
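Finding the nearest periodic copy of a neighbor "whichever adjacent cell it may be in" is commonly done with the minimum-image convention, a standard technique that the slides do not name explicitly. The sketch below assumes a cubic cell of edge a; the function names, coordinates and cutoff value are hypothetical.

```python
import math

def minimum_image(d, a):
    """Wrap one displacement component into [-a/2, a/2] for a cubic cell of edge a."""
    return d - a * round(d / a)

def pair_distance(ri, rj, a):
    """Distance between atom i and the nearest periodic image of atom j."""
    return math.sqrt(sum(minimum_image(xi - xj, a) ** 2
                         for xi, xj in zip(ri, rj)))

a = 6.0        # nm, cell edge (assumed cubic)
r_cut = 1.2    # nm, hypothetical cutoff satisfying r_cut < a/2

# Two atoms on opposite sides of the cell are close through the boundary
d = pair_distance((0.2, 0.0, 0.0), (5.9, 0.0, 0.0), a)
within_cutoff = d < r_cut
```

Because the cutoff is shorter than a/2, each atom interacts with at most one image of any other atom, which is the condition stated above.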

[Figure: central cell of edge length a surrounded by its periodic replicas]

Statistical Mechanical Ensembles

•  Artificial but mathematically convenient division of macroscopic system into a very large collection of microscopic systems (replicas) sharing a set of common characteristics

•  Eg canonical ensemble (N,V,T)

Identical composition of N particles, volume V, absolute temperature T in each replica

= closed systems at thermal equilibrium with one another

•  At a given time, the replicas differ in their conformation and energy
•  But they have the same physical properties on average
•  Statistical mechanics is based on the principle that thermodynamic observables are averages of molecular properties

[Figure: array of replicas, each labelled N, V, T]


Statistical Mechanics and Molecular Simulations

•  The purpose of statistical mechanics is to establish a formal connection between the microscopic and the macroscopic

•  This connection can be exploited with computer simulation techniques such as molecular dynamics (MD)

Understanding biological mechanisms ultimately requires knowledge of molecular interactions and events

But most measurements are made on a very large (macroscopic) number of copies or replicas of the molecular system of interest

-  with notable exceptions such as single-molecule experiments relying on laser spectroscopy, single-channel conductance measurements, atomic-force microscopy

Ensemble averages

At thermodynamic equilibrium, the time evolution of a molecular system (molecular motion) is linked to ensemble averages

The macroscopic properties of a collection of particles are determined by ensemble averages over a huge number of identical microscopic systems whose thermodynamic state is defined by, eg, N, p, T (isothermal-isobaric ensemble) or N, V, T (canonical ensemble)

No two of them are in exactly the same microscopic state, but given a sufficient amount of time, at equilibrium eventually they all visit the same arrangements or conformations with the same probabilities

⇒  Taking averages from the dynamic trajectory of a single microscopic system over “infinitely long” times yields thermodynamic averages = macroscopic observables


Characterising microscopic systems

•  Each of the replicas / snapshots is found in a different conformation = microstate

•  We are interested in the distribution of these microstates = how likely is each of them?

•  If we know that, we can compute thermodynamic averages as weighted sums:

<X> = Σi Xi Pi

where the angular brackets denote ensemble averaging, summation takes place over all possible (thermally-accessible) microstates, and Pi, the weight or probability to find the system in state i, is entirely defined by the chemical composition, the temperature, and (in the canonical ensemble) the volume of the system

•  What is needed to characterise the average and fluctuations (and therefore, the physical properties) of microscopic systems is a way to relate conformation and temperature to the distribution of states

The Boltzmann distribution

The distribution of microstates in the canonical ensemble is given by the Boltzmann distribution:

$P_i \propto e^{-E_i/k_B T}$

where $E_i$ is the energy of microstate i

$P_i = e^{-E_i/k_B T} / \sum_j e^{-E_j/k_B T}$

$Q = \sum_i e^{-E_i/k_B T}$ is the partition function

also written as $\sum_E g(E)\, e^{-E/k_B T}$, where g(E) is the density of states or degeneracy of each distinct energy level E
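For a small set of discrete microstates, the Boltzmann probabilities and a weighted average can be computed directly. The sketch below uses three hypothetical energy levels in kcal/mol; the function name and energies are assumptions made for the example.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def boltzmann_probabilities(energies, temperature):
    """Normalized Boltzmann weights P_i = exp(-E_i/kBT) / Q."""
    beta = 1.0 / (K_B * temperature)
    e_min = min(energies)   # shift for numerical stability; cancels in the ratio
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    q = sum(weights)
    return [w / q for w in weights]

energies = [0.0, 0.6, 1.2]        # hypothetical microstate energies, kcal/mol
probs = boltzmann_probabilities(energies, 300.0)

# Thermodynamic average as a weighted sum, <E> = sum_i E_i P_i
mean_energy = sum(p * e for p, e in zip(probs, energies))

# Relative population of two states depends only on their energy difference
ratio = probs[1] / probs[0]
```

Since kBT is about 0.6 kcal/mol at 300 K, each step of 0.6 kcal/mol in energy reduces the population by roughly a factor of e.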


Conformational averaging

Suppose we are interested in a property X that depends on the conformational state of a molecular system, which in a classical system is uniquely defined by vector $q = (q_1, q_2, \ldots, q_{3N})$

The expectation value of X (macroscopic observable) is obtained as

$\langle X \rangle = \sum_j X(q_j)\, e^{-\beta E(q_j)} / \sum_i e^{-\beta E(q_i)}$

where summation runs over all possible states, $\beta = 1/k_B T$, and $E(q_i)$ is the potential energy.

In the classical limit, the energy levels are so closely spaced that the discrete sums are replaced by integrals over conformational space.

At equilibrium, the relative probability to find two conformations A and B with energies $E_A$ and $E_B$ is given by

$P(A)/P(B) = \exp[-(E_A - E_B)/k_B T]$

Thermodynamic connection

Some useful formulas:

Internal energy: $\Delta U = -(\partial \ln Q/\partial \beta)_V$

Enthalpy: $\Delta H = \Delta U + pV$

Entropy: $S = -k_B \sum_i P_i \ln P_i$

Helmholtz free energy: $\Delta A = \Delta U - T\Delta S$, $A = -k_B T \ln Q$

→ relate to last equation on previous page
→ and to the equilibrium constant (conformational equilibrium)

$K = [\text{Product}]/[\text{Reactant}] = e^{-\beta \Delta F}$, where ΔF is the free energy difference between product and reactant

Gibbs free energy: $\Delta G = \Delta H - T\Delta S = -k_B T \ln Q + pV$

Note all Δ’s on this page are relative to the value at absolute zero.
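These formulas are mutually consistent, which can be verified numerically on a toy system. The sketch below evaluates U, S and A for three hypothetical energy levels, checks that A = U - TS, and compares U with a finite-difference estimate of -∂lnQ/∂β. The function names and energies are assumptions; the ground-state energy is set to zero so that U coincides with ΔU as defined above.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def log_q(energies, beta):
    """ln Q for a discrete set of microstate energies."""
    return math.log(sum(math.exp(-beta * e) for e in energies))

def thermodynamics(energies, temperature):
    """Return (U, S, A) from the partition function of a discrete system."""
    beta = 1.0 / (K_B * temperature)
    q = math.exp(log_q(energies, beta))
    probs = [math.exp(-beta * e) / q for e in energies]
    u = sum(p * e for p, e in zip(probs, energies))       # internal energy <E>
    s = -K_B * sum(p * math.log(p) for p in probs)        # entropy
    a = -K_B * temperature * math.log(q)                  # Helmholtz free energy
    return u, s, a

energies = [0.0, 0.6, 1.2]   # hypothetical levels, kcal/mol
T = 300.0
u, s, a = thermodynamics(energies, T)
residual = a - (u - T * s)   # should vanish: A = U - TS

# Central finite difference for U = -d(ln Q)/d(beta)
beta, h = 1.0 / (K_B * T), 1e-4
u_fd = -(log_q(energies, beta + h) - log_q(energies, beta - h)) / (2.0 * h)
```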


To compute equilibrium properties, we need

•  (i) a way to generate distinct conformational microstates and

•  (ii) a way to assign or determine the energy of each microstate

… This is what molecular simulations do

Monte Carlo Simulations

•  Another way to explore the potential energy hypersurface thermally accessible to a molecular system using Boltzmann statistics
•  An example of a stochastic (random) simulation
•  Based on probabilistic acceptance of random atomic moves {Δri} instead of integrating the equations of motion
•  Metropolis sampling:
–  if $\Delta V = V(\{r_i + \Delta r_i\}) - V(\{r_i\}) < 0$, accept the move
–  otherwise, pick a random number 0 < R < 1: if $e^{-\beta \Delta V} > R$, accept; otherwise, reject
–  go to next random atomic move
•  One advantage is no need to calculate forces or velocities
•  But since the sequence of moves is arbitrary, no dynamic info is generated
•  “moves” don’t have to be atomic displacements
–  Eg: change protonation state, electronic state, temperature…

Repeat ad nauseam
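The Metropolis rule can be tried on a single particle in a 1D harmonic well, where the exact answer is known: <x²> = kBT/k. The sketch below works in reduced units (β = 1, k = 1), so the expected value is 1; the function name, step size and number of steps are arbitrary choices for illustration.

```python
import math
import random

def metropolis_harmonic(n_steps, beta=1.0, k=1.0, max_step=2.0, seed=0):
    """Metropolis sampling of V(x) = 1/2 k x^2; returns (<x^2>, acceptance rate)."""
    rng = random.Random(seed)
    x = 0.0
    accepted = 0
    sum_x2 = 0.0
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_step, max_step)      # random move
        dv = 0.5 * k * trial * trial - 0.5 * k * x * x    # change in potential energy
        # Downhill moves are always accepted; uphill moves with probability exp(-beta*dV)
        if dv < 0.0 or math.exp(-beta * dv) > rng.random():
            x = trial
            accepted += 1
        sum_x2 += x * x    # a rejected move counts the current state again
    return sum_x2 / n_steps, accepted / n_steps

mean_x2, acceptance = metropolis_harmonic(200_000)
```

Only energy differences are needed, never forces, and the order of visited states carries no dynamical meaning, in line with the remarks above.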


Applications to Protein Folding

Use tricks to improve sampling efficiency:

•  1. Replica-exchange simulations
–  Eg García & Sanbonmatsu, PNAS 2002
–  Combine many MD simulations of same system at different T’s with Monte Carlo exchange

•  2. Exploit exponential kinetics + distributed computing
–  Snow et al., Nature, 7 Nov. 2002

•  3. Markov State Models

1. Replica-exchange simulations

Eg García & Onuchic, PNAS 100:13898-13903 (2003)

•  Run many simulations in parallel at $T_i = T_0 + i\Delta T$
•  Eg 48 runs at 275 - 500 K
•  Run MC test every 0.25 ps:
–  If $V(T_{i+1}) < V(T_i)$, switch T
•  Efficient sampling as high-T scouting continues even as “better” conformers cool down

•  The dynamics is lost
•  BUT get free energy surface for a range of temperatures
•  Giant leap in full sampling (peptide to protein!)

•  Folding protein A, a 46-aa, 3-helix bundle
•  Get helix-coil transition thermodynamics over a wide range of temperature ⇔ complete sampling! In explicit solvent!
•  Can use complete sampling to refine force fields
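The exchange test can be sketched as follows. The slide states only the downhill case (swap when the hotter replica has the lower potential energy). The standard parallel-tempering rule, used here, also accepts the other case with probability exp[(βi - βj)(Vi - Vj)]; whether the cited study used exactly this form is an assumption, as are the function names and the example energies.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def temperature_ladder(t_min, t_max, n):
    """Evenly spaced temperatures T_i = T_0 + i*dT, as written on the slide."""
    d_t = (t_max - t_min) / (n - 1)
    return [t_min + i * d_t for i in range(n)]

def swap_probability(v_i, t_i, v_j, t_j):
    """Metropolis probability of exchanging temperatures between two replicas."""
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (v_i - v_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

def attempt_swap(v_i, t_i, v_j, t_j, rng):
    """Return True if the exchange is accepted."""
    return rng.random() < swap_probability(v_i, t_i, v_j, t_j)

ladder = temperature_ladder(275.0, 500.0, 48)

# Hotter replica lower in energy: always swapped (the case stated on the slide)
p_down = swap_probability(v_i=-100.0, t_i=ladder[0], v_j=-105.0, t_j=ladder[1])
# Hotter replica higher in energy: swapped only with some probability
p_up = swap_probability(v_i=-105.0, t_i=ladder[0], v_j=-100.0, t_j=ladder[1])
```

Accepting some unfavorable swaps is what keeps each temperature sampling its own Boltzmann distribution, so that free energy surfaces can be extracted at every T in the ladder.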


Copyright ©2003 by the National Academy of Sciences

Fig. 2. Contour maps of the free energy in the folded state (a), ΔG(folded T = 387 K), and at the transition temperature (b), ΔG(T = T* = 421 K), as a function of the rmsd from the experimental folded structure and Q

2. Exploiting relaxation kinetics

Snow et al., Absolute comparison of simulated and experimental protein-folding dynamics, Nature 420:102-106 (2002)

•  Approach: in exponential-relaxation (two-state) processes, most of the time is spent waiting for a rapid transition

•  Population decay of the unfolded state: $M_f(t) = M(1 - e^{-kt}) \approx Mkt$ if kt is small
–  $M = 10^4$ CPU, $k = 10^{-4}$ ns$^{-1}$, t = 20 ns per processor ⇒ Expect 20 simulations to fold

•  Computational advantages of distributed computing: no wasted time on communication

•  Capture the heterogeneity of folding pathways
•  Note: each simulation should be at least as long as the time t it takes to cross the barrier
•  Application to BBA5, a designed “mini-protein” of 23 aa.
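The estimate of 20 folding events can be reproduced from the numbers quoted above, comparing the exact exponential expression with its linear approximation. The function name is hypothetical; the values of M, k and t are those given in the text.

```python
import math

def expected_folded(m, k, t):
    """Expected number of folded trajectories, M (1 - exp(-k t))."""
    return -m * math.expm1(-k * t)

m = 10_000     # number of independent simulations (one per CPU)
k = 1e-4       # folding rate in ns^-1
t = 20.0       # length of each simulation in ns

exact = expected_folded(m, k, t)    # about 19.98
linear = m * k * t                  # small-kt approximation, 20
```

With kt = 0.002 the linear approximation is accurate to about 0.1%, so many short trajectories can stand in for a single trajectory that would have to be thousands of times longer.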


Results

•  Stochastic dynamics w/ implicit solvent
•  More than 100 folded using 30,000 CPUs for a few months
•  Probably worked well ⇔ no substantial hydrophobic core
•  Total of 700 µs
•  Compare 6 µs folding time to fluorescence and CD data: $\tau_f = 7.5 \pm 3.5$ µs
•  Folding criteria = monitor at once:
–  population of secondary structure (α helix and β hairpin)
–  rms deviation from folded structure
(No guarantee folded is most stable)

•  Download movies from Nature’s website

MD Simulations: Challenges and Progress

–  Statistical errors

•  As a rule of thumb, to be reliable a simulation should extend over at least 10 times as long as the event being investigated

•  Running multiple simulations helps

–  Systematic errors

Sampling: unvisited regions!

Force field:

•  High T stretches limit of force field (both protein’s and solvent’s)…
•  …and so do longer times and larger displacements

→ As sampling capabilities increase, limits of FF accuracy become clearer

Keeping up: Curr Opin Struct Biol, Feb issue = Theory and Simulation