BIASED COARSE-GRAINED MOLECULAR DYNAMICS SIMULATION APPROACH FOR FLEXIBLE FITTING OF X-RAY STRUCTURE INTO CRYO ELECTRON MICROSCOPY
Item Type text; Electronic Thesis
Authors Grubisic, Ivan
Publisher The University of Arizona.
Rights Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Download date 07/01/2021 21:31:03
Link to Item http://hdl.handle.net/10150/192461
CONTRIBUTIONS
All of the experiments were conducted by Ivan Grubisic. The program used for these experiments
was developed by Max Shokirev and Osamu Miyashita. The thesis was authored by Ivan Grubisic
with editorial contribution from Florence Tama.
ABSTRACT
Different approaches have been introduced to fit high-resolution structures into low-resolution data
as obtained from cryo-EM experiments. As conformational changes are often observed in
biological molecules, these techniques need to take into account the flexibility of proteins.
Flexibility has been described in terms of movement between rigid domains and between rigid
secondary structure elements, which present some limitations for studying dynamical properties.
Normal mode analysis has also been used but is limited to medium resolution. All-atom molecular
dynamics fitting techniques are more appropriate to fit structure into higher resolution data as full
protein flexibility is used, but are cumbersome in terms of computational time. Therefore, we
introduce a coarse-grained MD approach to fit high-resolution structures into low-resolution
data. We use a Go-model to represent the biological molecule and biased MD to reproduce the
dynamics. Illustrative examples of our approach on simulated data are shown. Accurate fittings can
be obtained for resolutions ranging from 5 to 20 angstroms. The approach was also tested on
experimental data for Elongation Factor G and for E. coli RNA Polymerase, where its validity was
assessed by comparison with previous models obtained from different flexible fitting techniques. This
comparison demonstrates that quantitative flexible fitting techniques, as opposed to manual
docking, need to be considered to interpret low-resolution data.
INTRODUCTION
Conformational dynamics of biological molecules
are essential to their function. As it is often
difficult to characterize different conformational
states for large biological molecules, medium to
low resolution techniques such as cryo electron
microscopy (EM) are often used to study
dynamical properties of biological molecules.
Cryo-EM has played a particularly key role in
identifying conformational states of
macromolecular assemblies (1) such as the
ribosome (2,3), GroEL, RNA polymerase (4),
myosin (5) and viruses (6,7) amongst others.
Because the cryo-EM technique only provides
medium to low-resolution data, it is useful to
combine these data with known high-resolution
structures (obtained from X-ray or
NMR measurements) of the same biological
molecule by fitting the structures into the maps.
For objective and reproducible fitting,
several algorithms have been developed to replace
manual fitting. In the first quantitative approaches
introduced, only rigid body motions of the
molecules were considered (8-13). However, as
resolution of cryo-EM data has improved, distinct
conformational states can now be observed.
Therefore, in order to interpret the experimental
data at a near atomic level, approaches that include
protein flexibility have been introduced.
The first approaches for flexible fitting considered
biological systems as a collection of domains that
could be fitted independently as individual rigid
bodies. These approaches have revealed
conformational changes of several important
biological systems (5,14-17). However, such
methods rely on a subjective partitioning of the
system and ignore concerted motions between
domains that occur in biological molecules during
conformational changes. Therefore, faulty models
could be obtained.
Following these first methods, a number of
quantitative and more objective techniques have
been introduced for flexible fitting. In some of the
earlier work, reduced models that were deformed
to fit the data have been used (8,18). However, this
method might introduce ambiguity by embedding
the full atomic structure into a reduced model to
perform the flexible fitting. Real space refinement
(RSRef)(19) with molecular dynamics simulation
(20,21) is another avenue for flexible fitting. Such
an approach considers all-atoms, and optimizes the
fit to the data and the stereo-chemical properties of
the molecule (20). However, this implementation
assumes that certain units of the molecules,
domains, are rigid. This approximation could lead
to deformations that are not physically feasible, as
the mechanical properties of the molecules are not
fully reflected in the deformation of the structure.
A more objective representation of the dynamics of
biological molecules is a partitioning of the protein
at a more detailed level. In a molecular
modeling technique based on Monte Carlo,
simulated annealing and coarse-grained models,
secondary structure elements (SSE) were
maintained rigid to refine structural models into
cryo-EM maps (22). In methods that use iterative
comparative modeling for fitting into cryo-EM
data (23,24), multilevel subdivisions of the
structure from domain to secondary structure
elements (25) have been considered. Movements
of SSE have also been used in a flexible fitting
technique that compares structural variability of
domains within a given family. Finally,
partitioning can also be determined by identifying
elements of the biological molecule that are rigid
using graph theory (26,27). Flexibility between
these elements, as obtained from constrained
geometry simulation, can then be used for flexible
fitting (28). In each of these methods, some level
of protein rigidity is considered, which could
impair the interpretation of the mechanical
properties of biological molecules. In particular,
rigid block approximations, even at the SSE level,
would limit the interpretations of smaller scale
conformational changes that can now be observed
with higher-resolution data.
Using an all-atom representation of the biological
molecule for flexible fitting can be cumbersome in
terms of computational time. Therefore coarse-
grained representations of the molecules based on
elastic network model have also been considered
to incorporate full protein flexibility during the
fitting process. Schröder et al. combined the
elastic network model with random walk
displacements and distance restraints, an approach
that has been successful in predicting conformational
changes of the ribose-binding protein (29).
Coarse-grained elastic network normal mode
analysis (NMA) has also been adapted to flexibly
fit high resolution structures into low-resolution
data (30-37).
Due to developments in cryo-EM techniques,
resolution of cryo-EM data can reach up to 4 Å
(for symmetric structures). Such higher-resolution
data provides a more detailed definition of the
structure and smaller scale rearrangements
compared to known high-resolution structures are
now becoming visible (38). Using NMA
techniques, large conformational changes of
biological molecules can be studied; however,
NMA does not describe smaller scale
conformational changes well (39). Methods that
maintain parts of the protein rigid would also limit
the interpretation of such data. Therefore methods
allowing full protein flexibility during fitting are
needed.
Recently, several approaches that consider full
protein flexibility have been introduced for
flexible fitting. These techniques are based on
molecular dynamics simulation, which is a
well-established technique to investigate the dynamics of
biological molecules. Overall, the aim of these
methods is to bias the MD simulation toward a
conformation that would fit the cryo-EM data. The
difference between these approaches is the form of
the biasing potential that is being used. In an
approach introduced by Caulfield et al., the
molecular dynamics is steered using a minimum
biasing function (Maxwell's demon molecular
dynamics) (40). In a different approach, atoms are
steered by a potential map created from the
cryo-EM data (41,42). Because such an approach can
lead to artifacts (43), restraints are applied to
coordinates relevant to secondary structural
elements in order to maintain the stereochemical
quality of the structure (42). Biased molecular dynamics using
the correlation coefficient as the biasing potential has
also been introduced, with no restraint imposed on
the secondary structure (43). Such approaches
have been successful; however, because of their all-atom
representation of the protein, they can be
computationally expensive, especially for large
systems, and because simulations are run in
vacuum there might be undesirable effects on the
protein structure.
In this paper we describe an optimization method
based on molecular dynamics simulation with full
protein flexibility but with reduced protein
representation to reduce computational costs
associated with fitting. In coarse-grained models
used in molecular dynamics simulations, not all of
the atoms in the system are considered explicitly;
rather each residue is reduced to a few points,
which considerably reduces the computational
complexity of the system and therefore shortens
the simulation time while still representing the flexibility
and stereochemistry of the protein structure (44-46).
Also, coarse-grained models would be
sufficient for modeling the conformational
changes based on cryo-EM data, because with
such low-resolution data, it is not possible to
define the exact position of all the atoms beyond
backbone atoms. The setting up of such simulation
is straightforward, as no initial minimization of the
protein structure is needed. Given an appropriate
potential for a coarse-grained model,
conformational changes observed in experimental
data could be reproduced and structural models
consistent with experimental data could be
constructed. Illustrative results of our studies on
simulated EM data from several proteins with
large conformational changes are presented. We
demonstrate that this flexible fitting method yields
structures that agree remarkably well with the
error-free simulated EM maps. Finally, we discuss
the results of our method applied to experimental
data of elongation factor G and RNA polymerase.
METHODS
Biasing potential
“Biased” molecular dynamics simulations, such as targeted
MD (47,48) and steered MD (49), in
which external forces are added to guide the
system into a certain region of the conformational
space, have been successfully employed
to study important conformational transitions of
biological systems (50,51).
Here we employ a classical molecular dynamics
technique with a modified force field potential V,
which is calculated as the sum of the classical
potential from the MD method, $V_{ff}$, and a new
effective potential, $V_{Fit}$, to fit a high-resolution
structure into low-resolution data:
$$V = V_{ff} + V_{Fit}$$
The additional effective potential is calculated
according to the following equation:
$$V_{Fit} = k\,(1 - \mathrm{c.c.}) \qquad (1)$$
The c.c. represents the correlation coefficient that
describes the similarity (overlap) of a target cryo-EM
map to a synthetic cryo-EM map generated from
the all-atom X-ray structure being fitted. The
correlation coefficient is defined in the following
way:
$$\mathrm{c.c.} = \frac{\sum_{ijk} \rho_{exp}(i,j,k)\,\rho_{sim}(i,j,k)}{\sqrt{\sum_{ijk} \rho_{exp}(i,j,k)^2 \, \sum_{ijk} \rho_{sim}(i,j,k)^2}}$$
where $\rho_{exp}(i,j,k)$ and $\rho_{sim}(i,j,k)$ represent the
experimental and synthetically simulated density
of voxel (i,j,k). The synthetic cryo-EM map is
generated from the all-atom structure.
k in Equation 1 is a constant that regulates the
magnitude of the effective potential and it needs to
be calibrated. This constant is the only arbitrary
parameter introduced. In the Discussion section,
we discuss the appropriate value of k that should
be used to obtain optimal results from the fitting.
While the described biasing potential is not the
only choice, other biasing potentials that are
simply proportional to the density (41,42) (C.C.
Jolley and M.F. Thorpe, personal communication)
could lead to artifacts; therefore $V_{Fit}$ was preferred
(43).
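As a rough illustration of Equation 1, the sketch below evaluates the correlation coefficient and the resulting fitting potential for two small density maps. It assumes the normalized (uncentered) cross-correlation form of the c.c.; the function names and the four-voxel toy maps are hypothetical and are not taken from the program used in this work.

```python
import math

def correlation_coefficient(rho_exp, rho_sim):
    """Normalized overlap of two maps given as flat lists of voxel densities.

    Assumed form: sum(exp * sim) / sqrt(sum(exp^2) * sum(sim^2)).
    """
    if len(rho_exp) != len(rho_sim):
        raise ValueError("maps must have the same number of voxels")
    overlap = sum(e * s for e, s in zip(rho_exp, rho_sim))
    norm = math.sqrt(sum(e * e for e in rho_exp) * sum(s * s for s in rho_sim))
    if norm == 0.0:
        raise ValueError("at least one map is empty")
    return overlap / norm

def fitting_potential(rho_exp, rho_sim, k):
    """V_Fit = k (1 - c.c.), following Equation 1."""
    return k * (1.0 - correlation_coefficient(rho_exp, rho_sim))

# Toy maps with made-up densities, for illustration only.
target = [0.0, 1.0, 2.0, 1.0]
scaled = [0.0, 2.0, 4.0, 2.0]    # same shape as the target, different scale
shifted = [1.0, 2.0, 1.0, 0.0]   # density displaced by one voxel

cc_scaled = correlation_coefficient(target, scaled)
cc_shifted = correlation_coefficient(target, shifted)
v_shifted = fitting_potential(target, shifted, k=10_000)
```

With this form, the c.c. is insensitive to an overall scaling of the density, so the potential vanishes for the scaled map and penalizes only the displaced one.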
Go-model
In order to reduce the computational costs
associated with an all-atom description of the
protein in the MD simulation while reproducing full
protein flexibility, a coarse-grained model was
used for flexible fitting. While several coarse-grained
models have been developed, we
employ a well-established one, the Go-model (52,53),
which has been extensively used to study protein
folding (44,54) and more recently to describe
conformational transitions of biological
molecules (55-58). A Go-potential takes into
account only native interactions, and each of these
interactions enters into the energy balance with the
same weight. Residues in the proteins are
represented as single beads centered at their Cα
positions (54). Adjacent beads are connected into a
polymer chain by bond and angle interactions,
while the geometry of the native state is encoded
in the dihedral angle potential and a non-local
bead-bead potential. The parameters for the
potential $V_{ff}$ used in this study were taken from
Clementi et al. (54).
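The one-bead-per-residue representation can be illustrated by extracting Cα coordinates from PDB ATOM records. The following is only a sketch: it relies on the fixed-column PDB layout, ignores alternate locations, and the helper `atom_line` and all coordinates are hypothetical, introduced here purely to build a small test input.

```python
def ca_beads(pdb_lines):
    """Return one (x, y, z) bead per residue, taken from C-alpha ATOM records."""
    beads = []
    for line in pdb_lines:
        # Columns 13-16 hold the atom name; columns 31-54 hold x, y, z.
        if line.startswith("ATOM  ") and line[12:16].strip() == "CA":
            beads.append((float(line[30:38]),
                          float(line[38:46]),
                          float(line[46:54])))
    return beads

def atom_line(serial, name, res, chain, resseq, x, y, z):
    """Format a minimal fixed-column ATOM record (illustrative helper only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

# Made-up coordinates for two residues.
lines = [
    atom_line(1, " N", "MET", "A", 1, 27.340, 24.430, 2.614),
    atom_line(2, " CA", "MET", "A", 1, 26.266, 25.413, 2.842),
    atom_line(3, " C", "MET", "A", 1, 26.913, 26.639, 3.531),
    atom_line(9, " CA", "GLN", "A", 2, 26.850, 29.021, 3.898),
]
beads = ca_beads(lines)
```

Each residue contributes exactly one bead, so `beads` has as many entries as there are residues with a Cα atom in the input.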
Creating the cryo-EM map
Synthetic maps are calculated by locating three-dimensional
Gaussian functions on every atom and
integrating these functions for every atom in each
of the voxels, given a set of atomic
coordinates $(x_n, y_n, z_n)$ (18):
$$\rho_{sim}(i,j,k) = \sum_{n=1}^{N} \int_{V_{ijk}} dx\,dy\,dz\; g(x,y,z;x_n,y_n,z_n)$$
where n denotes the n-th atom from a set of N atoms,
(i,j,k) denotes a given voxel and
$g(x,y,z;x_n,y_n,z_n)$ is the three-dimensional
Gaussian function of the following form:
$$g(x,y,z;x_n,y_n,z_n) = \exp\left[-\frac{3}{2\sigma^2}\left\{(x-x_n)^2 + (y-y_n)^2 + (z-z_n)^2\right\}\right]$$
where $\sigma$ is a resolution parameter. The resolution
of a synthetic map is equal to $2\sigma$ (18). We note
that the resolution parameter is set as a rough
estimate and it may not exactly coincide with the
resolution reported by the experimental data. This
is because the resolution of the experimental data
is often defined as a Fourier filter. However,
previous studies with coarse-grained models have
shown that the precise details of the simulated map
do not affect the fitting performance (18,31,32).
Synthetic cryo-EM maps generated at several
resolutions, 5Å, 8Å, 10Å, 15Å and 20Å were used
as target maps in the simulations. It has been
shown that as long as the grid spacing is smaller
than the resolution of the map, good quality fits can be
obtained (28). A 2 Å grid spacing was used in these
studies.
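The map construction described in this section can be sketched as follows. Because the Gaussian is separable, its integral over a rectangular voxel is a product of three one-dimensional integrals, each of which has a closed form in terms of the error function. The function names, the grid layout and the single-atom example are assumptions made for illustration; this is not the implementation used in this work.

```python
import math

def gaussian_voxel_integral(lo, hi, center, sigma):
    """Integral of exp(-3 (x - center)^2 / (2 sigma^2)) over [lo, hi]."""
    c = 3.0 / (2.0 * sigma * sigma)
    root = math.sqrt(c)
    return 0.5 * math.sqrt(math.pi / c) * (
        math.erf(root * (hi - center)) - math.erf(root * (lo - center)))

def synthetic_map(atoms, origin, n_voxels, spacing, sigma):
    """Return rho_sim as a dict {(i, j, k): density} on a regular grid."""
    rho = {}
    for i in range(n_voxels[0]):
        for j in range(n_voxels[1]):
            for k in range(n_voxels[2]):
                total = 0.0
                for atom in atoms:
                    term = 1.0
                    # Product of the three 1-D integrals over this voxel.
                    for axis, idx in enumerate((i, j, k)):
                        lo = origin[axis] + idx * spacing
                        term *= gaussian_voxel_integral(
                            lo, lo + spacing, atom[axis], sigma)
                    total += term
                rho[(i, j, k)] = total
    return rho

# One atom at the center of a 24 A box, 2 A grid spacing, sigma = 2.5
# (a nominal 5 A resolution under the 2*sigma convention).
atoms = [(0.0, 0.0, 0.0)]
rho = synthetic_map(atoms, origin=(-12.0, -12.0, -12.0),
                    n_voxels=(12, 12, 12), spacing=2.0, sigma=2.5)
total_density = sum(rho.values())
```

For a box much larger than sigma, the summed density of a single atom approaches the full Gaussian volume, (2 pi sigma^2 / 3)^(3/2), which gives a simple sanity check on the grid.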
Determining the folding temperature
To determine the folding temperature of the
protein structure, we monitor how the Q-value, the
fraction of native contacts formed, changes with temperature.
Native contacts were defined as follows: two residues
are in contact when at least one non-hydrogen atom
of the ith amino acid is within 6.5 Å of any
non-hydrogen atom of the jth amino acid (57).
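The contact definition above can be sketched in a few lines. The 6.5 Å heavy-atom cutoff comes from the text; the minimum sequence separation and the criterion used to decide whether a contact is formed in a coarse-grained structure (Cα distance within 1.2 times the native distance) are assumptions chosen for illustration, as are the function names and the toy coordinates.

```python
import math

def native_contacts(residues, cutoff=6.5, min_seq_separation=4):
    """List residue pairs (i, j) with any heavy-atom pair within the cutoff.

    residues: one list of heavy-atom (x, y, z) tuples per residue.
    min_seq_separation is an assumed filter for sequence-local pairs.
    """
    contacts = []
    for i in range(len(residues)):
        for j in range(i + min_seq_separation, len(residues)):
            if any(math.dist(a, b) <= cutoff
                   for a in residues[i] for b in residues[j]):
                contacts.append((i, j))
    return contacts

def q_value(ca_coords, contacts, native_distances, tolerance=1.2):
    """Fraction of native contacts formed (assumed distance criterion)."""
    if not contacts:
        return 0.0
    formed = sum(
        1 for (i, j) in contacts
        if math.dist(ca_coords[i], ca_coords[j])
        <= tolerance * native_distances[(i, j)])
    return formed / len(contacts)

# Toy hairpin with one heavy atom per residue (made-up coordinates).
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0)]
contacts = native_contacts([[p] for p in native])
native_distances = {(i, j): math.dist(native[i], native[j])
                    for (i, j) in contacts}

extended = [(3.8 * n, 0.0, 0.0) for n in range(6)]
q_native = q_value(native, contacts, native_distances)
q_extended = q_value(extended, contacts, native_distances)
```

Tracking such a Q-value over simulations at increasing temperature is one way to locate the temperature at which the native contacts are lost.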
Rigid body fitting
We employed the Situs package (8) to perform
rigid-body fitting of all-atom structures to
experimental cryo-EM maps of the Elongation
Factor G and the RNA polymerase. In this work
we used 10 codebook vectors to represent each PDB
structure and each cryo-EM map. The highest-ranked
model was used as the starting point for flexible fitting.
RESULTS
We have tested our procedure on simulated EM
data. Several proteins have been used in the past
for developing flexible fitting approaches
(25,28,31,42,59). Here we used a subset of
proteins that have been used in some of these
studies. The proteins that we chose undergo a
conformational change that has been observed
experimentally and high-resolution structures of
the two states are available. In addition to these we
chose large proteins and a protein complex: RNA
Polymerase. The proteins that were used for the
simulations are listed in Table 1 along with the
initial and target PDB files and the initial RMSD
between the two structures. Both small and large
proteins have been studied.
A synthetic density map was constructed by
convolution with a Gaussian kernel of σ = 2.5, 4, 5,
7.5 and 10 Å (31) for each structure. Several resolutions
were considered as this method is intended to
target both the high-end resolution spectrum of
cryo-EM data and the lower end. The fitting is
done from the X-ray structures into the calculated
error-free low-resolution maps. During the fitting,
simulated maps from the deformed structures are
created with the same resolution as the target map
in order to evaluate the c.c.
The purpose of the method presented here is to fit
the X-ray structure into a raw experimental (in this
case simulated) EM map of a different
conformation of the same molecule. To examine
the fitness between the atomic structure and the
experimental map during the simulation we
examine the c.c. between the structure and the
target EM map. In the following we also examine
the Root Mean Square Deviation (RMSD)
between the deformed structure and the original
structure from which the target EM map was
created. Ideally, when the c.c. reaches a maximum,
the RMSD should reach a minimum. However,
two different structures with different RMSD can
have the same c.c., because detailed information of
the atomic coordinates is lost in the EM map and
the EM map and atomic coordinates are not simply
related by a linear transformation. Thus to
characterize the performance of our method we
examine both values.
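A contrived example makes the point that the c.c. and the RMSD carry different information. In the sketch below, two beads exchange positions: the density is unchanged, so the c.c. is 1, while the RMSD between matched beads is large. The one-dimensional binned density, the function names and the coordinates are simplifications assumed for illustration only.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between matched atoms, without any superposition."""
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def bin_density(coords, n_bins, spacing):
    """Crude 1-D density: number of atoms per bin along x."""
    rho = [0.0] * n_bins
    for x, _, _ in coords:
        rho[int(x // spacing)] += 1.0
    return rho

def correlation_coefficient(rho_a, rho_b):
    """Normalized overlap of two density arrays (assumed uncentered form)."""
    overlap = sum(a * b for a, b in zip(rho_a, rho_b))
    return overlap / math.sqrt(sum(a * a for a in rho_a) *
                               sum(b * b for b in rho_b))

target = [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
swapped = [(5.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # same positions, labels exchanged

cc = correlation_coefficient(bin_density(target, 4, 2.0),
                             bin_density(swapped, 4, 2.0))
deviation = rmsd(target, swapped)
```

Here the map cannot distinguish the two structures even though every bead is 4 Å from its target position, which is why both quantities are reported.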
In order to perform accurate fitting, flexible fitting
techniques should be able to tolerate noise in the
map. Jolley et al. have analyzed the effect of noise
on the result of fitting with a correlation
coefficient optimization (28). Although a high level
of noise (S/N = 1) is detrimental to fitting, little
impact is observed at S/N > 2.
Similarly, Schröder et al. compared fitting
with and without noise and observed that
their method is robust against noise (30).
Therefore, we have also tested the applicability of
our approach to experimental data, which contains
noise. The elongation factor G (EF-G), an 844
residue protein, and the RNA polymerase were
taken as examples.
Folding Temperature
The folding temperature varies from one protein to
another. In our simulation, we do not want to
unfold the protein as previously done in other Go
model studies; instead we want the protein to
undergo its conformational change while
remaining folded. We have measured the folding
temperature for each protein. The folding
temperature differs between proteins of various sizes
because of the nature of the Go model, but the values are overall
on the same scale (see Table 2). Running
simulations at a temperature that is too close to the
folding temperature runs the risk of unfolding the
protein. Very low temperatures, on the other hand,
will freeze the protein. The temperature lacks
specific units because the simulation has not been
calibrated to the Kelvin scale, but we found that
T=500 worked well for all our test systems with the
force field we used.
Validity of Coarse Graining
The coarse-grained Go model approach fits
the Cα atoms to the full-atom map. To ensure
the validity of our approach, we computed the
correlation coefficient between a given simulated
cryo-EM map built from an all-atom structure and
several maps generated from a Cα model
at different resolutions. Results for the case of
adenylate kinase (PDB: 4AKE) are shown in Table
3. One can see that for all of the resolutions
except 10 Å, the Cα map correlates best
with the full-atom map of the same resolution. The
concern at 10 Å is negligible because the
difference in correlation is in the third
decimal place. Therefore, using a coarse-grained
model seems appropriate, and it has been shown to
be successful in several cases (31,60).
Simulated Data
In our implementation, the weight k is a parameter
to adjust how strongly the system is biased to fit
into the density. k needs to be calibrated so that
sampling of conformations is enhanced while
maintaining the structural integrity of the protein.
Therefore, it is necessary to determine what force
constant or sequence of force constants would be
capable of fitting the initial structure to the target
map. Simulations with weights k ranging from
1,000 to 100,000 were run for each protein.
Initially a force constant k=1,000 was used, but
this was only capable of causing the initial
structure to oscillate around its initial position.
Figure 1a shows this in the case of acetyl CoA
synthase (PDB: 1OAO Chain C). A stronger
force constant, k=10,000, as illustrated in Figure
1b, was capable of reducing the RMSD to
approximately 1 Å for the 5 Å resolution map. In
such cases, the forces resulting from the gradient
of the c.c. are strong enough to overcome the
energetic barriers separating the two structures.
With a weight of 1,000, the potential from the
correlation coefficient is too small, which leads to a
poor fit.
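The role of k can be made explicit by differentiating the total potential. Assuming the biased potential is the sum of the force field term and k(1 - c.c.), as in Equation 1, the force on bead n at position $\mathbf{r}_n$ is

$$\mathbf{F}_n = -\nabla_{\mathbf{r}_n} V = -\nabla_{\mathbf{r}_n} V_{ff} + k\,\nabla_{\mathbf{r}_n}\,\mathrm{c.c.}$$

The biasing force therefore points along the gradient of the c.c. and scales linearly with k, so raising k from 1,000 to 10,000 multiplies the biasing force by ten while leaving the Go-model forces unchanged.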
However, for simulations with higher-resolution maps,
a weight of 10,000 is not always successful. While
the RMSD values initially decrease, they remain
stuck at higher final values, i.e. in a local
minimum, than their lower-resolution counterparts.
This happened whenever the force constant
was initially too high. The most interesting of all
the cases is shown in Figure 2a, where the RMSD for
Ca2+-ATPase drops immediately to a
value between 2 and 5 Å for the 15 Å and 20 Å
resolutions, but at the 5 Å, 8 Å and 10 Å resolutions
the RMSD gets stuck at a minimum around 8 Å.
To solve this problem, a k=1,000 force constant
was applied for 5,000 steps. At a force
constant of k=1,000, it can be seen in Figure 2b
that there are only relatively small changes: the
protein slightly alters its conformation to fit the
experimental data. Once the simulation was
continued from the final positions and velocities
with a k=10,000 force constant, the structure began
to move toward the target structure, causing the
RMSD to decrease. When this method was used,
the final RMSD was lower for the higher-resolution
maps, which is exactly what would be
expected.
The k=10,000 force constant still allows for some
oscillations, and it was of interest to see if
we could minimize them and in turn obtain an even
better final structure. A larger force constant,
k=100,000, was then applied, and it was
able to reduce the RMSD even further (see
Table 4). In each case the final structures agree
well with the simulated cryo-EM map as indicated
by a high c.c. and are very close to the
conformation from which those simulated maps
were derived. As expected, accuracy decreases
with lower-resolution data.
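The staged use of force constants can be written down as a simple schedule that maps a simulation step to the value of k in effect. This is only a sketch: the 5,000 steps at k=1,000 and the final 5,000 steps at k=100,000 follow the text, whereas the length of the k=10,000 stage is not stated and is a placeholder here, as are the function and variable names.

```python
def force_constant_schedule(stages):
    """Build a lookup step -> k from a list of (k, n_steps) stages."""
    boundaries = []
    end = 0
    for k, n_steps in stages:
        end += n_steps
        boundaries.append((end, k))

    def k_at(step):
        for end_step, k in boundaries:
            if step < end_step:
                return k
        raise ValueError("step lies beyond the end of the schedule")

    return k_at

MIDDLE_STAGE_STEPS = 20_000  # placeholder: not given in the text

two_force = force_constant_schedule(
    [(1_000, 5_000), (10_000, MIDDLE_STAGE_STEPS)])
three_force = force_constant_schedule(
    [(1_000, 5_000), (10_000, MIDDLE_STAGE_STEPS), (100_000, 5_000)])
```

In an MD loop, `k_at(step)` would be queried before each force evaluation, with positions and velocities carried over unchanged at stage boundaries.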
The final k=100,000 force constant iteration can
further improve some of the fits to lower-resolution maps
(see Figure 2b). In some cases though, as shown in
Figure 3 for adenylate kinase, the refinement
only worked for a few of the resolutions. One
noticeable difference between adenylate kinase
and the other proteins is its small size: it
has 214 residues, whereas the next smallest
successfully fitted protein has 691 residues. The
large k=100,000 force constant that is applied for
the last 5,000 steps of the simulation is capable of
pushing the smaller proteins out of a global energy
minimum into a nearby local minimum. Since the
correlation coefficient continued to increase while
the RMSD failed to decrease, this is a situation
where over-fitting occurred. What is peculiar
is that the RMSD increased for all of the
resolutions with the exception of the 5 Å and 20 Å
maps.
All of the proteins listed in Table 1, with
the exception of transglutaminase (PDB: 1KV3),
were fitted successfully onto the target PDB structure.
For transglutaminase, the initial RMSD was 28 Å and the final RMSD
was 30 Å, which means that the final structure was in
essence worse than the initial one. The structure of
transglutaminase presents an interesting problem.
Figure 4a shows a trace of the initial
structure with two domains of interest colored
differently. These domains are highlighted because
they have very similar electron densities, which
play a role in determining which areas of
the target map each region can move into. Figure 4b
shows the final coordinates from the Go-model
in relation to the target structure. The
domains are highlighted in the same manner as
in Figure 4a, and it is clear from this
figure that the proximity of the “red” domain
allowed it to move into the area where the “blue”
domain should have moved. In this instance the
final structure is in fact in a global energy
minimum, but with atoms in regions where they
should not be. Figure 4c validates this point by
showing the final structure from the simulation in
the cryo-EM map. Increasing the strength of the
native contacts did not have a positive effect on
the results. The only noticeable consequence was
that the force constant had to be increased
by the same factor to allow for any motion. The
manner in which the backbone moved did
not change, and the final RMSD was the same as
with the initial native contact strength. For this particular
case, our approach fails to predict the correct
structure. Methods using a potential map
created from the cryo-EM data (41,42) would
certainly lead to the same results due to the
limitations associated with this potential (43). It is
also reasonable to think that the NMA-based
approach would fail in such cases. Methods
based on the structural variability of domains within a
given family might provide a better avenue for
such cases (59); however, enough structural
information across the family would be needed for
the approach to be successful.
Overall, this approach constructs models that are
similar to the structures obtained with the NMFF
approach (Table 5). This initial comparison shows
that the Go-model is a better fitting tool than
NMA in two of the three cases studied. Only in the
case of lactoferrin was the Cα fit for the NMA
better, with a value of 1.0 Å compared to the 1.3 Å
from the Go-model. Using the same level of
coarse-graining, this approach is faster than the
normal mode based approach.
Experimental data simulations
We have also applied our approach to
experimental data, the cryo-EM map of the
Elongation Factor G bound to the ribosome at 11.8
Å resolution (61) and the cryo-EM map of the E.
coli RNA polymerase at 15 Å resolution (4).
EF-G has been studied using NMFF, i.e. normal
mode based refinement, by all-atom molecular
dynamics simulations, and by a method based
on the structural variability of protein domains
within the same family (59). The RNA polymerase
has been studied by coarse-grained fitting using
Situs and by NMFF as well. Since the high-resolution
structure corresponding to the cryo-EM
map is not available, one can compare our
modeled structure to the ones that have been
obtained by other approaches.
The structure of an “open” form of the E. coli
core RNAP has been determined at a 15 Å
resolution using single-particle cryo-EM
techniques (4). Because of the high similarity in
sequence between the E. coli and Thermus
aquaticus (Taq) core RNAP, it is possible to
annotate the low-resolution map from the E. coli
system with the high-resolution structure of Taq
(62,63). Previous studies found that a large
conformational change was necessary to
accommodate the high-resolution structure into the
cryo-EM map (4,32).
The Go-model flexible fitting was performed
using the three-force sequence, and the structures
obtained after two and after three force iterations were compared to the
NMA result. With three force iterations the RMSD
from the initial structure increased to 10.9 Å, whereas
with two force iterations it only increased to 5.1 Å,
which is closer to the NMA
structure (7.3 Å). Figure 5 shows the initial and
final structures of RNAP in the two force iteration
condition. A much better fit to the data is
observed; in particular, the jaws are now open,
which is in agreement with previous fits using
coarse-grained dynamics (4,32). Comparisons with
other fittings are useful to determine whether a
two force or three force sequence should be used.
Indeed, the two force fitting is more comparable to
the NMA structure. We have seen with simulated
data that a three force sequence can lead to
over-fitting. Considering that experimental data
contain noise, it is preferable to use a two force
sequence for fitting.
We also performed flexible fitting into the isolated
EF-G map using the X-ray structure of a mutant
factor (64). The PDB entry code for this structure
was 1FNM. The correlation coefficient converged
after steps of simulation. The Cα RMSD between
the starting and the fitted structure is 11.2 Å with
the three force iterations and 8.6 Å with two force
iterations. Figure 6a shows the original rigid body
fitting for which some regions of the density
remain unaccounted for. Figure 6b illustrates our
model in which significant improvement of the fit
to the density is observed.
Results from this fitting can be compared to
models obtained from other techniques such as
NMFF (32), all-atom biased MD (43) and rigid
body manual fitting (see Figure 7). When the
Go-model simulation was applied to Elongation Factor
G (EF-G) using the two force sequence, the
RMSD from the known X-ray structure rose to
8.6 Å, while with the three force sequence it
reached 11.2 Å. The RMSD obtained with the two
force sequence is more comparable with existing
NMA results, where the RMSD was reported to be
8.8 Å. Similarly, comparing the final structure
obtained after each force iteration
with the all-atom MD model of EF-G further
reinforced the idea that the ideal methodology
would be to use only two force iterations. The
third iteration had an RMSD value that ranged
between 6.9 Å and 8.0 Å. The final structure after
the second force iteration had the lowest RMSD
value, 2.5 Å, when compared to the NMA fitted
structure, which reemphasizes the point that a two
force sequence would be a better choice for
experimental data. We should also note that
visual comparison with the results from
Velazquez-Muriel et al. reveals a similar structural
rearrangement (59).
In this study, for EF-G, we have observed that
different flexible fitting approaches, based on
normal mode analysis or molecular dynamics
simulation, with all-atom or coarse-grained protein
representations, converge toward the same
structural model. These flexible fittings reveal
large rearrangements between domains II, IV
and V. In particular, a large displacement (up to 20
Å) of domain IV, which is correlated with
rotations of domains II and V, is observed.
In addition, models obtained from computational
approaches can be compared to a model that was
proposed based on manual docking (PDB: 1PN6)
(61). Figure 7 shows the models from all four fitting methods
superimposed on one another and highlights
the flaw in the manually docked model.
In the zoomed-in view of the top middle
domain, it is clear that the manually docked
structure is rotated to the right with respect to the
other structures. Computational methods provide good
approximations of what structures may look like,
but as can be seen from these results, it is
important that multiple methods are used to verify
a structure.
The case of elongation factor G is a prime
example that illustrates the need for quantitative
computational approaches for flexible fitting.
Indeed, computational approaches based on
different techniques (molecular dynamics
simulation, normal mode analysis and structural
variability within the same family), using either
all-atom protein representations or coarse-grained
models, converge toward structural models that
display an overall identical domain arrangement.
In comparison, the model based on manual
docking leads to a different domain arrangement.
The main difference between these results comes
from the fact that domain rearrangements involve
correlated motions between domains, which are
ignored by simple manual docking. Methods based
on physical properties of biological molecules
naturally take those correlated motions into account and
provide a more accurate model.
Timing
The length of time needed to complete a single
simulation depended on two factors: the size of
the protein and the resolution of the target map.
Adenylate kinase (214 residues) at 5 Å was the
fastest simulation, taking 7 minutes on a single
processor. The 20 Å simulation for the same
protein took 23 minutes. The longest simulation
was for RNA polymerase II (3558 residues): the
5 Å resolution run took 5 hours 18 minutes, while
the 20 Å resolution run lasted 8 hours 40 minutes.
CONCLUSIONS
We introduced a method for the flexible fitting of
high-resolution structures into low-resolution
electron density maps from cryo-EM, based on
molecular dynamics simulations and a coarse-grained
representation of the molecule, the Go-model. This
method can be applied to higher-resolution cryo-EM
maps for which more structural details are
available (8 Å and better), but it is also applicable
to lower-resolution data. The results on simulated
data show that the algorithm is robust: it works on
a multitude of systems with the same parameters
and with very short computational times. The
results presented in this paper are in agreement
with previous work using different approaches.
We also show its success in predicting
conformational changes from experimental data.
The models obtained for the RNA polymerase and
EF-G are in agreement with other computational
studies. This paper also demonstrates the necessity
of employing computational tools to interpret
experimental data. Comparisons of several
computational models with a manually built model
reveal significant differences. Indeed, in general,
computational tools provide robust ways to
flexibly fit proteins while maintaining the overall
architectural integrity. Because MD simulation or
normal mode analysis is used for fitting, with
an all-atom or coarse-grained representation
of the protein, only feasible deformations are
possible; that is, correlated motions between
distant parts of the biological system are taken
into account. Such correlated motions need to be
taken into account for accurate flexible fitting.
Acknowledgements
I would like to thank Dr. Florence Tama and Dr.
Osamu Miyashita for their help and guidance
during this project.
Financial support from National Science
Foundation grant 0744732 is greatly appreciated.
REFERENCES
1. Saibil, H. R. (2000) Conformational changes studied by cryo-electron microscopy, Nat Struct
Biol 7, 711-714.
2. Frank, J., and Agrawal, R. K. (2000) A ratchet-like inter-subunit reorganization of the ribosome
during translocation, Nature 406, 318-322.
3. Valle, M., Gillet, R., Kaur, S., Henne, A., Ramakrishnan, V., and Frank, J. (2003) Visualizing
tmRNA entry into a stalled ribosome, Science 300, 127-130.
4. Darst, S. A., Opalka, N., Chacon, P., Polyakov, A., Richter, C., Zhang, G. Y., and Wriggers, W.
(2002) Conformational flexibility of bacterial RNA polymerase, Proc Natl Acad Sci USA 99, 4296-
4301.
5. Wendt, T., Taylor, D., Trybus, K. M., and Taylor, K. (2001) Three-dimensional image
reconstruction of dephosphorylated smooth muscle heavy meromyosin reveals asymmetry in the
interaction between myosin heads and placement of subfragment 2, Proc Natl Acad Sci USA 98,
4361-4366.
6. Conway, J. F., Wikoff, W. R., Cheng, N., Duda, R. L., Hendrix, R. W., Johnson, J. E., and
Steven, A. C. (2001) Virus maturation involving large subunit rotations and local refolding, Science
292, 744-748.
7. Lee, K. K., and Johnson, J. E. (2003) Complementary approaches to structure determination of
icosahedral viruses, Curr Opin Struct Biol 13, 558-569.
8. Wriggers, W., Milligan, R. A., and McCammon, J. A. (1999) Situs: A package for docking
crystal structures into low-resolution maps from electron microscopy, J Struct Biol 125, 185-195.
9. Volkmann, N., and Hanein, D. (1999) Quantitative fitting of atomic models into observed
densities derived by electron microscopy, J Struct Biol 125, 176-184.
10. Rossmann, M. G. (2000) Fitting atomic models into electron-microscopy maps, Acta Cryst D
56, 1341-1349.
11. Rossmann, M. G., Bernal, R., and Pletnev, S. V. (2001) Combining electron microscopic with
X-ray crystallographic structures, J Struct Biol 136, 190-200.
12. Jiang, W., Baker, M. L., Ludtke, S. J., and Chiu, W. (2001) Bridging the information gap:
Computational tools for intermediate resolution structure interpretation, J Mol Biol 308, 1033-1044.
13. Chacon, P., and Wriggers, W. (2002) Multi-resolution contour-based fitting of macromolecular
structures, J Mol Biol 317, 375-384.
14. Volkmann, N., Hanein, D., Ouyang, G., Trybus, K. M., DeRosier, D. J., and Lowey, S. (2000)
Evidence for cleft closure in actomyosin upon ADP release, Nat Struct Biol 7, 1147-1155.
15. Rawat, U. B. S., Zavialov, A. V., Sengupta, J., Valle, M., Grassucci, R. A., Linde, J.,
Vestergaard, B., Ehrenberg, M., and Frank, J. (2003) A cryo-electron microscopic study of
ribosome-bound termination factor RF2, Nature 421, 87-90.
16. Gao, H., Sengupta, J., Valle, M., Korostelev, A., Eswar, N., Stagg, S. M., Van Roey, P.,
Agrawal, R. K., Harvey, S. C., Sali, A., Chapman, M. S., and Frank, J. (2003) Study of the
structural dynamics of the E coli 70S ribosome using real-space refinement, Cell 113, 789-801.
17. Gao, H., and Frank, J. (2005) Molding atomic structures into intermediate-resolution cryo-EM
density maps of ribosomal complexes using real-space refinement, Structure 13, 401-406.
18. Wriggers, W., and Birmanns, S. (2001) Using Situs for flexible and rigid-body fitting of
multiresolution single-molecule data, J Struct Biol 133, 193-202.
19. Chapman, M. S. (1995) Restrained Real-Space Macromolecular Atomic Refinement Using a
New Resolution-Dependent Electron-Density Function, Acta Cryst A 51, 69-80.
20. Chen, J. Z., Fürst, J., Chapman, M. S., and Grigorieff, N. (2003) Low-resolution structure
refinement in electron microscopy, J Struct Biol 144, 144-151.
21. Fabiola, F., and Chapman, M. S. (2005) Fitting of high-resolution structures into electron
microscopy reconstruction images, Structure 13, 389-400.
22. Mears, J. A., Sharma, M. R., Gutell, R. R., McCook, A. S., Richardson, P. E., Caulfield, T. R.,
Agrawal, R. K., and Harvey, S. C. (2006) A structural model for the large subunit of the
mammalian mitochondrial ribosome, J Mol Biol 358, 193-212.
23. Topf, M., Baker, M. L., John, B., Chiu, W., and Sali, A. (2005) Structural characterization of
components of protein assemblies by comparative modeling and electron cryo-microscopy, J Struct
Biol 149, 191-203.
24. Topf, M., Baker, M. L., Marti-Renom, M. A., Chiu, W., and Sali, A. (2006) Refinement of
protein structures by iterative comparative modeling and cryoEM density fitting, J Mol Biol 357,
1655-1668.
25. Topf, M., Lasker, K., Webb, B., Wolfson, H., Chiu, W., and Sali, A. (2008) Protein structure
fitting and refinement guided by cryo-EM density, Structure 16, 295-307.
26. Jacobs, D. J., and Thorpe, M. F. (1995) Generic Rigidity Percolation - The Pebble Game, Phys
Rev Lett 75, 4051-4054.
27. Jacobs, D. J., Rader, A. J., Kuhn, L. A., and Thorpe, M. F. (2001) Protein flexibility
predictions using graph theory, Proteins 44, 150-165.
28. Jolley, C. C., Wells, S. A., Fromme, P., and Thorpe, M. F. (2008) Fitting low-resolution cryo-
EM maps of proteins using constrained geometric simulations, Biophys J 94, 1613-1621.
29. Schröder, G. F., Brunger, A. T., and Levitt, M. (2007) Combining efficient conformational
sampling with a deformable elastic network model facilitates structure refinement at low
resolution, Structure 15, 1630-1641.
30. Delarue, M., and Dumas, P. (2004) On the use of low-frequency normal modes to enforce
collective movements in refining macromolecular structural models, Proc Natl Acad Sci USA 101,
6957-6962.
31. Tama, F., Miyashita, O., and Brooks III, C. L. (2004) Flexible multi-scale fitting of atomic
structures into low-resolution electron density maps with elastic network normal mode analysis, J
Mol Biol 337, 985-999.
32. Tama, F., Miyashita, O., and Brooks III, C. L. (2004) NMFF: Flexible high-resolution
annotation of low-resolution experimental data from cryo-EM maps using normal mode analysis, J
Struct Biol 147, 315-326.
33. Hinsen, K., Reuter, N., Navaza, J., Stokes, D. L., and Lacapere, J. J. (2005) Normal mode-
based fitting of atomic structure into electron density maps: Application to sarcoplasmic reticulum
Ca-ATPase, Biophys J 88, 818-827.
34. Suhre, K., Navaza, J., and Sanejouand, Y. H. (2006) NORMA: a tool for flexible fitting of
high-resolution protein structures into low-resolution electron-microscopy-derived density maps,
Acta Crystallogr D Biol Crystallogr 62, 1098-1100.
35. Mitra, K., Schaffitzel, C., Shaikh, T., Tama, F., Jenni, S., Brooks III, C. L., Ban, N., and Frank,
J. (2005) Structure of the E. coli protein-conducting channel bound to a translating ribosome,
Nature 438, 318-324.
36. Falke, S., Tama, F., Brooks, C. L., Gogol, E. P., and Fisher, M. T. (2005) The 13 angstrom
structure of a chaperonin GroEL-protein substrate complex by cryo-electron microscopy, J Mol
Biol 348, 219-230.
37. Tama, F., Ren, G., Brooks III, C. L., and Mitra, A. K. (2006) Model of the toxic complex of
anthrax: responsive conformational changes in both the lethal factor and the protective antigen
heptamer, Protein Science 15, 2190-2200.
38. Stahlberg, H., and Walz, T. (2008) Molecular electron microscopy: state of the art and current
challenges, ACS Chem Biol 3, 268-281.
39. Tama, F., and Sanejouand, Y. H. (2001) Conformational change of proteins arising from
normal mode calculations, Protein Eng 14, 1-6.
40. Caulfield, T. R., and Harvey, S. C. (2007) Conformational fitting of atomic models to
cryogenic-electron microscopy maps using Maxwell's demon molecular dynamics, Biophys J,
368A-368A.
41. Noda, K., Nakamura, M., Nishida, R., Yoneda, Y., Yamaguchi, Y., Tamura, Y., Nakamura, H.,
and Yasunaga, T. (2006) Atomic model construction of protein complexes from electron
micrographs and visualization of their 3D structure using a virtual reality system, J Plasma Phys 72,
1037-1040.
42. Trabuco, L. G., Villa, E., Mitra, K., Frank, J., and Schulten, K. (2008) Flexible fitting of
atomic structures into electron microscopy maps using molecular dynamics, Structure 16, 673-683.
43. Orzechowski, M., and Tama, F. (2008) Flexible fitting of high-resolution x-ray structures into
cryoelectron microscopy maps using biased molecular dynamics simulations, Biophys J 95, 5692-
5705.
44. Tozzini, V. (2005) Coarse-grained models for proteins, Curr Opin Struct Biol 15, 144-150.
45. Onuchic, J. N., and Wolynes, P. G. (2004) Theory of protein folding, Curr Opin Struct Biol 14,
70-75.
46. Tozzini, V., and McCammon, J. A. (2005) A coarse grained model for the dynamics of flap
opening in HIV-1 protease, Chemical Physics Letters 413, 123-128.
47. Schlitter, J., Engels, M., and Krüger, P. (1994) Targeted molecular dynamics: a new approach
for searching pathways of conformational transitions, J Mol Graph 12, 84-89.
48. Ma, J., Sigler, P. B., Xu, Z., and Karplus, M. (2000) A dynamic model for the allosteric
mechanism of GroEL, J Mol Biol 302, 303-313.
49. Isralewitz, B., Gao, M., and Schulten, K. (2001) Steered molecular dynamics and mechanical
functions of proteins, Curr Opin Struct Biol 11, 224-230.
50. Krammer, A., Lu, H., Isralewitz, B., Schulten, K., and Vogel, V. (1999) Forced unfolding of
the fibronectin type III module reveals a tensile molecular recognition switch, Proc Natl Acad Sci
USA 96, 1351-1356.
51. Sanbonmatsu, K. Y., Joseph, S., and Tung, C. S. (2005) Simulating movement of tRNA into
the ribosome during decoding, Proc Natl Acad Sci USA 102, 15854-15859.
52. Taketomi, H., Ueda, Y., and Go, N. (1975) Studies on Protein Folding, Unfolding and
Fluctuations by Computer-Simulation .1. Effect of Specific Amino-Acid Sequence Represented by
Specific Inter-Unit Interactions, International Journal of Peptide and Protein Research 7, 445-459.
53. Ueda, Y., Taketomi, H., and Go, N. (1978) Studies on Protein Folding, Unfolding, and
Fluctuations by Computer-Simulation .2. 3-Dimensional Lattice Model of Lysozyme, Biopolymers
17, 1531-1548.
54. Clementi, C., Nymeyer, H., and Onuchic, J. N. (2000) Topological and energetic factors: What
determines the structural details of the transition state ensemble and "en-route" intermediates for
protein folding? An investigation for small globular proteins, J Mol Biol 298, 937-953.
55. Best, R. B., Chen, Y. G., and Hummer, G. (2005) Slow protein conformational dynamics from
multiple experimental structures: The helix/sheet transition of arc repressor, Structure 13, 1755-
1763.
56. Koga, N., and Takada, S. (2006) Folding-based molecular simulations reveal mechanisms of
the rotary motor F-1-ATPase, Proc Natl Acad Sci USA 103, 5367-5372.
57. Okazaki, K., Koga, N., Takada, S., Onuchic, J. N., and Wolynes, P. G. (2006) Multiple-basin
energy landscapes for large-amplitude conformational motions of proteins: Structure-based
molecular dynamics simulations, Proc Natl Acad Sci USA 103, 11844-11849.
58. Whitford, P. C., Miyashita, O., Levy, Y., and Onuchic, J. N. (2007) Conformational transitions
of adenylate kinase: Switching by cracking, J Mol Biol 366, 1661-1671.
59. Velazquez-Muriel, J. A., Valle, M., Santamaría-Pang, A., Kakadiaris, I. A., and Carazo, J. M.
(2006) Flexible fitting in 3D-EM guided by the structural variability of protein superfamilies,
Structure 14, 1115-1126.
60. Wriggers, W., Agrawal, R. K., Drew, D. L., McCammon, A., and Frank, J. (2000) Domain
motions of EF-G bound to the 70S ribosome: insights from a hand-shaking between multi-
resolution structures, Biophys J 79, 1670-1678.
61. Valle, M., Zavialov, A., Sengupta, J., Rawat, U., Ehrenberg, M., and Frank, J. (2003) Locking
and unlocking of ribosomal motions, Cell 114, 123-134.
62. Campbell, E. A., Korzheva, N., Mustaev, A., Murakami, K., Nair, S., Goldfarb, A., and Darst,
S. A. (2001) Structural mechanism for rifampicin inhibition of bacterial RNA polymerase, Cell
104, 901-912.
63. Zhang, G. Y., Campbell, E. A., Minakhin, L., Richter, C., Severinov, K., and Darst, S. A.
(1999) Crystal structure of Thermus aquaticus core RNA polymerase at 3.3 angstrom resolution,
Cell 98, 811-824.
64. Laurberg, M., Kristensen, O., Martemyanov, K., Gudkov, A. T., Nagaev, I., Hughes, D., and
Liljas, A. (2000) Structure of a mutant EF-G reveals domain III and possibly the fusidic acid
binding site, J Mol Biol 303, 593-603.
TABLE LEGENDS
Table 1: A list of the proteins that will be simulated along with the PDB files that will be used.
The total number of residues is also listed for each protein. The initial RMSD value between the
two conformations was calculated by aligning the two PDB files in VMD.
Table 2: The folding temperatures of the proteins.
Table 3: The correlation coefficient between the Cα and full-atom maps at five different resolutions: 5, 8, 10, 15 and 20 angstroms.
Table 4: Summary of the results of the successful simulations, giving the initial PDB for
each protein and then the best correlation coefficient and RMSD value for each resolution. The
table is divided into two sections: one for the two-force iteration (k=1000 and k=10,000) and one
for the three-force iteration (k=1000, k=10,000 and k=100,000). All of the RMSD values were
determined using VMD, whereas the correlation coefficient was obtained from the Go-model's
output file. The timing column refers to the computation time required by the Go-model at each
resolution.
Table 5: A comparison of the results provided by the Go-Model and NMA (31).
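The two quantities reported in Tables 1 and 3 to 5, the RMSD and the correlation coefficient (c.c.), can be illustrated with a minimal Python sketch. This is not the code used in this work: the RMSD values here were obtained with VMD and the c.c. values were read from the Go-model's output file. The function names are hypothetical, the RMSD assumes the two coordinate sets have already been superimposed, and the c.c. assumes the common normalized cross-correlation form, which may differ from the exact definition used by the program.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """RMSD between two equally sized coordinate sets (e.g. C-alpha atoms).

    Assumption: the structures are already superimposed; no alignment is done here.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def correlation_coefficient(density_a: Sequence[float],
                            density_b: Sequence[float]) -> float:
    """Normalized cross-correlation of two density maps given as flat voxel lists.

    Assumed form: sum(a*b) / sqrt(sum(a^2) * sum(b^2)), evaluated on the same grid.
    """
    if len(density_a) != len(density_b) or len(density_a) == 0:
        raise ValueError("maps must be non-empty and sampled on the same grid")
    dot = sum(a * b for a, b in zip(density_a, density_b))
    norm = math.sqrt(sum(a * a for a in density_a) * sum(b * b for b in density_b))
    if norm == 0.0:
        raise ValueError("at least one map is identically zero")
    return dot / norm
```

With this form, two identical maps give a c.c. of 1, and maps with no overlapping density give 0.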
FIGURE LEGENDS
Figure 1: a. Acetyl CoA Synthase's RMSD and c.c. results at T=500 and using a single force
iteration of k=1,000. b. Acetyl CoA Synthase's RMSD and c.c. results at T=500 and using a single
force iteration of k=10,000. The 5Å resolution is represented by ○, 8Å is represented by □, 10Å is
represented by ×, 15Å is represented by Δ and 20Å is represented by +. The RMSD was calculated
using the RMSD Trajectory Tool plug-in in VMD. The correlation coefficient was taken from the
Go-model's output file.
Figure 2: a. Ca2+-ATPase's RMSD and c.c. results at T=500 and using a single force iteration of
k=10,000. b. Ca2+-ATPase's RMSD and c.c. results at T=500 when using three force iterations:
k=1,000 for the first 5,000 steps, k=10,000 for steps 5,000 to 10,000 and k=100,000 for steps
10,000 to 15,000. The 5Å resolution is represented by ○, 8Å is represented by □, 10Å is
represented by ×, 15Å is represented by Δ and 20Å is represented by +. The RMSD was calculated
using the RMSD Trajectory Tool plug-in in VMD. The correlation coefficient was taken from the
Go-model's output file.
Figure 3: Adenylate kinase's RMSD and correlation coefficient results at T=500 when using three
force iterations: k=1,000 for the first 5,000 steps, k=10,000 for steps 5,000 to 10,000 and
k=100,000 for steps 10,000 to 15,000. The 5Å resolution is represented by ○, 8Å is represented by
□, 10Å is represented by ×, 15Å is represented by Δ and 20Å is represented by +. The RMSD was
calculated using the RMSD Trajectory Tool plug-in in VMD. The correlation coefficient was taken
from the Go-model's output file.
Figure 4: a. The solid surface is the target structure of transglutaminase, superimposed with
the initial structure, which is drawn as a tube. The blue tube corresponds to the blue surface and
the red tube corresponds to the red surface. b. The same target structure superimposed with
the final structure after the Go-model simulation at T=500 using three force iterations. Again,
the chains are color coded to show that the red domain moved to where the blue domain should
have been. c. The final Cα backbone of transglutaminase as it fits into the cryo-EM target map.
Figure 5: a. E. coli RNA Polymerase's initial Cα backbone (PDB: 1HQM) in the cryo-EM map
provided by the target experimental data. b. E. coli RNA Polymerase's Cα backbone fit into the
cryo-EM map, produced by the two force iteration at T=500.
Figure 6: a. Elongation Factor G's initial Cα backbone (PDB: 1FNM) in the cryo-EM map
provided by the target experimental data. b. Elongation Factor G's Cα backbone fit into the
cryo-EM map, produced by the two force iteration at T=500.
Figure 7: A comparison of multiple final structures for Elongation Factor G. The Go-model MD
structure, fit to the experimental data using two force iterations, is in cyan; the NMA-fitted
EF-G is in silver; the all-atom MD structure is in ice blue; and the manually docked structure
(PDB: 1PN6) is in red. The domain of interest is zoomed in on and rotated to show the distortion
of the manually docked structure with respect to the others, which is evident from the position
of the two alpha helices.
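The three-force iteration described in the legends of Figures 2 and 3 amounts to a stepwise schedule for the biasing force constant k. The following is a minimal Python sketch of such a schedule, not the code of the program used in this work. The names `THREE_FORCE_SCHEDULE` and `force_constant` are hypothetical, and the treatment of the boundary steps (each stage covering a half-open interval, so that step 5,000 already uses k=10,000) is an assumption, since the legends do not specify it.

```python
from typing import Sequence, Tuple

# Each stage is (end_step_exclusive, force_constant_k), in increasing step order.
# Values follow the figure legends: k=1,000 for the first 5,000 steps,
# k=10,000 up to step 10,000 and k=100,000 up to step 15,000.
THREE_FORCE_SCHEDULE: Tuple[Tuple[int, float], ...] = (
    (5_000, 1_000.0),
    (10_000, 10_000.0),
    (15_000, 100_000.0),
)


def force_constant(step: int,
                   schedule: Sequence[Tuple[int, float]] = THREE_FORCE_SCHEDULE) -> float:
    """Return the biasing force constant k in effect at a given MD step.

    Assumption: stages are half-open intervals [start, end_step_exclusive).
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    for end_step, k in schedule:
        if step < end_step:
            return k
    raise ValueError("step is beyond the end of the schedule")


if __name__ == "__main__":
    for s in (0, 4_999, 5_000, 12_000):
        print(s, force_constant(s))
```

Keeping the schedule as data rather than hard-coded branches makes it straightforward to describe a different number of stages with the same function.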
TABLES
Table 1
Protein                  Initial PDB       Target PDB        Number of Residues   RMSD (Å)
Adenylate Kinase         1AKE              4AKE              214                  7.2
Lactoferrin              1LFG              1LFH              691                  6.4
Elongation Factor 2      1N0V              1N0U              819                  14.5
Ca2+-ATPase              1IWO              1SU4              994                  14.0
Acetyl CoA Synthase      1OAO (Chain C)    1OAO (Chain D)    728                  7.0
Transglutaminase         1KV3              2Q3Z              651                  28.8
RNA Polymerase II        1I50              1I6H              3558                 4.8
Elongation Factor G      1FNM              N/A               655                  N/A
E. coli RNA Polymerase   1HQM              N/A               2650                 N/A
Table 2
Protein Folding Temperature
1AKE 800
1N0V 960
1LFG 960
1IWO 940
1KV3 920
1OAO.C 1040
1I50 1000
Table 3
Rows: Cα map resolution (Å). Columns: full-atom map resolution (Å).
Cα \ Full atom   5       8       10      15      20
5                0.911   0.859   0.821   0.746   0.694
8                0.956   0.966   0.948   0.894   0.846
10               0.948   0.984   0.981   0.947   0.907
15               0.896   0.965   0.985   0.996   0.98
20               0.844   0.924   0.954   0.991   0.999
Table 4
                         Three Forces            Two Forces
PDB      Resolution (Å)  Best C.C.   RMSD (Å)    Best C.C.   RMSD (Å)
1AKE     5               0.93        0.9         0.91        0.9
1AKE     8               0.99        3.9         0.97        1.5
1AKE     10              0.99        3.4         0.98        1.8
1AKE     15              1.00        1.9         1.00        1.7
1AKE     20              1.00        1.7         1.00        1.9
1N0V     5               0.92        3.3         0.89        3.3
1N0V     8               0.97        1.1         0.96        1.2
1N0V     10              0.99        1.2         0.98        1.3
1N0V     15              1.00        1.4         0.99        1.5
1N0V     20              1.00        1.4         1.00        1.9
1LFG     5               0.92        0.7         0.90        0.9
1LFG     8               0.98        1.2         0.96        1.1
1LFG     10              0.99        1.4         0.98        1.3
1LFG     15              1.00        1.2         0.99        1.4
1LFG     20              1.00        1.3         1.00        1.7
1IWO     5               0.92        2.0         0.89        2.2
1IWO     8               0.97        2.1         0.96        2.2
1IWO     10              0.99        2.1         0.97        2.8
1IWO     15              1.00        2.5         0.99        3.0
1IWO     20              1.00        2.6         0.99        3.9
1OAO.C   5               0.92        0.7         0.90        0.8
1OAO.C   8               0.97        1.1         0.96        1.0
1OAO.C   10              0.99        1.1         0.98        1.2
1OAO.C   15              1.00        1.1         0.99        1.3
1OAO.C   20              0.95        1.1         0.95        1.4
1I50     5               N/A         N/A         0.88        2.7
1I50     8               N/A         N/A         0.96        2.9
1I50     10              N/A         N/A         0.98        3.0
1I50     15              N/A         N/A         0.99        3.1
1I50     20              N/A         N/A         1.00        3.2
EFG      10.8            0.94        11.2        0.88        8.6
RNAP     15              0.96        10.8        0.89        5.1
Table 5
PDB    Resolution (Å)   Final RMSD (Å), NMA   Final RMSD (Å), Go-Model
1N0V   10               1.8                   1.3
1LFG   10               1.0                   1.3
1IWO   10               5.1                   2.8
FIGURES
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7