BIASED COARSE-GRAINED MOLECULAR DYNAMICS SIMULATION … · 2020. 4. 2. · states for large biological molecules, medium to low resolution techniques such as cryo electron microscopy

BIASED COARSE-GRAINED MOLECULAR DYNAMICSSIMULATION APPROACH FOR FLEXIBLE FITTING OF X-

RAY STRUCTURE INTO CRY ELECTRON MICROSCOPY

Item Type text; Electronic Thesis

Authors Grubisic, Ivan

Publisher The University of Arizona.

Rights Copyright © is held by the author. Digital access to this materialis made possible by the University Libraries, University of Arizona.Further transmission, reproduction or presentation (such aspublic display or performance) of protected items is prohibitedexcept with permission of the author.

Download date 07/01/2021 21:31:03

Link to Item http://hdl.handle.net/10150/192461

http://hdl.handle.net/10150/192461

2

3

CONTRIBUTIONS

All of the experiments were conducted by Ivan Grubisic. The program used for these experiments

was developed by Max Shokirev and Osamu Miyashita. The thesis was authored by Ivan Grubisic

with editorial contribution from Florence Tama.

4

ABSTRACT

Different approaches have been introduced to fit high-resolution structures into low-resolution data

as obtained from cryo-EM experiments. As conformational changes are often observed in

biological molecules, these techniques need to take into account the flexibility of proteins.

Flexibility has been described in terms of movement between rigid domains and between rigid

secondary structure elements, which present some limitations for studying dynamical properties.

Normal mode analysis has also been used but is limited to medium resolution. All-atom molecular

dynamics fitting techniques are more appropriate to fit structure into higher resolution data as full

protein flexibility is used, but are cumbersome in terms of computational time. Therefore we

introduced and a coarse-grained MD approach to fit high-resolution structure into low-resolution

data. We use a Go-model to represent the biological molecule and biased MD to reproduce the

dynamics. Illustrative examples of our approach on simulated data are shown. Accurate fittings can

be obtained for resolution ranging from 5 to 20 angstroms. The approach was also tested on

experimental data, Elongation Factor G and for E. coli RNA Polymerase, where its validity would

be compared to previous models obtained from different flexible fitting techniques. This

comparison demonstrates the fact that quantitative flexible techniques, as opposed to manual

docking, need to be considered to interpret low-resolution data.

5

INTRODUCTION

Conformational dynamics of biological molecules

are essential to their function. As it is often

difficult to characterize different conformational

states for large biological molecules, medium to

low resolution techniques such as cryo electron

microscopy (EM) are often used to study

dynamical properties of biological molecules.

Cryo-EM has played a particularly key role in

identifying conformational states of

macromolecular assemblies (1) such as the

ribosome (2,3), GroEL, RNA polymerase (4),

myosin (5) and viruses (6,7) amongst others.

Because the cryo-EM technique only provides

medium to low-resolution data, it is interesting to

combine and fit, this data with known high-

resolution structures (obtained from X-ray or

NMR measurements) of the same biological

molecule. For objective and reproducible fitting,

several algorithms have been developed to replace

manual fitting. In the first quantitative approaches

introduced, only rigid body motions of the

molecules were considered (8-13). However, as

resolution of cryo-EM data has improved, distinct

conformational states can now be observed.

Therefore, in order to interpret the experimental

data at a near atomic level, approaches that include

protein flexibility have been introduced.

The first approaches for flexible fitting considered

biological systems as a collection of domains that

could be fitted independently as individual rigid

bodies. These approaches have revealed

conformational changes of several important

biological systems (5,14-17). However, such

methods rely on a subjective partitioning of the

system and ignore concerted motions between

domains that occur in biological molecules during

conformational changes. Therefore, faulty models

could be obtained.

Following these first methods, a number of

quantitative and more objective techniques have

been introduced for flexible fitting. In some of the

earlier work, reduced models that were deformed

to fit the data have been used (8,18). However, this

method might introduce ambiguity by embedding

the full atomic structure into a reduced model to

perform the flexible fitting. Real space refinement

(RSRef)(19) with molecular dynamics simulation

(20,21) is another avenue for flexible fitting. Such

an approach considers all-atoms, and optimizes the

fit to the data and the stereo-chemical properties of

the molecule (20). However, this implementation

assumes that certain units of the molecules,

domains, are rigid. This approximation could lead

to deformations that are not physically feasible, as

the mechanical properties of the molecules are not

fully reflected in the deformation of the structure.

A more objective representation of the dynamic of

biological molecule is a partitioning at a more

detailed level of the protein. In a molecular

modeling technique based on Monte Carlo,

simulated annealing and coarse-grained models,

secondary structure elements (SSE) were

maintained rigid to refine structural models into

cryo-EM maps (22). In methods that use iterative

comparative modeling for fitting into cryo-EM

data (23,24), multilevel subdivisions of the

structure from domain to secondary structure

elements (25) have been considered. Movements

of SSE have also been used in a flexible fitting

technique that compares structural variability of

domains within a given family. Finally,

partitioning can also be determined by identifying

elements of the biological molecule that are rigid

using graph theory (26,27). Flexibility between

these elements, as obtained from constrained

geometry simulation, can then be used for flexible

fitting (28). In each of these methods, some level

of protein rigidity is considered, which could

impair the interpretation of the mechanical

properties of biological molecules. In particular,

rigid block approximations, even at the SSE level,

would limit the interpretations of smaller scale

conformational changes that can now be observed

with higher-resolution data.

Using an all-atom representation of the biological

molecule for flexible fitting can be cumbersome in

terms of computational time. Therefore coarse-

grained representations of the molecules based on

elastic network model have also been considered

to incorporate full protein flexibility during the

fitting process. Schröder et al. combined the

elastic network model with random walk

displacements and distance restraints, which have

been successful in predicting conformational

changes of the ribose-binding protein (29).

6

Coarse-grained elastic network normal mode

analysis (NMA) has also been adapted to flexibly

fit high resolution structures into low-resolution

data (30-37).

Due to developments in cryo-EM techniques,

resolution of cryo-EM data can reach up to 4 Å

(for symmetric structures). Such higher-resolution

data provides a more detailed definition of the

structure and smaller scale rearrangements

compared to known high-resolution structures are

now becoming visible (38). Using NMA

techniques, large conformational changes of

biological molecule can be studied; however,

NMA does not describe smaller scale

conformational changes well (39). Methods that

maintain parts of the protein rigid would also limit

the interpretation of such data. Therefore methods

allowing full protein flexibility during fitting are

needed.

Recently, several approaches that consider full

protein flexibility, have been introduced for

flexible fitting. These techniques are based on

molecular dynamics simulation, which is a well-

established technique to investigate dynamics of

biological molecule. Overall, the aim of these

methods is to bias the MD simulation toward a

conformation that would fit the cryo-EM data. The

difference between these approaches is the form of

the biasing potential that is being used. In an

approach introduced by Caufield et al., the

molecular dynamics is steered using a minimum

biasing function (Maxwell's demon molecular

dynamics) (40). In a different approach, atoms are

steered by a potential map created from the cryo-

EM data (41,42). In order to maintain the

stereochemical quality of the structure, restraints

are applied to coordinates relevant to secondary

structural elements (42) as such approach can lead

to artifacts (43). Biased molecular dynamics using

correlation coefficient as the biasing potential have

also been introduced with no restraint imposed on

the secondary structure (43). Such approaches

have been successful; however, due to an all-atom

representation of the protein, they can be

computationally expensive especially with large

systems and because simulations are run in

vacuum there might be undesirable effect on

protein structure.

In this paper we describe an optimization method

based on molecular dynamics simulation with full

protein flexibility but with reduced protein

representation to reduce computational costs

associated with fitting. In coarse-grained models

used in molecular dynamics simulations, not all of

the atoms in the system are considered explicitly;

rather each residue is reduced to a few points,

which reduces considerably the computational

complexity of the systems and therefore speeds up

the simulation time while representing flexibility

and stereochemistry of the protein structure (44-

46). Also, coarse-grained models would be

sufficient for modeling the conformational

changes based on cryo-EM data, because with

such low-resolution data, it is not possible to

define the exact position of all the atoms beyond

backbone atoms. The setting up of such simulation

is straightforward, as no initial minimization of the

protein structure is needed. Given an appropriate

potential for a coarse-grained model,

conformational changes observed in experimental

data could be reproduced and structural models

consistent with experimental data could be

constructed. Illustrative results of our studies on

simulated EM data from several proteins, with

large conformational changes are presented. We

demonstrate that this flexible fitting method yields

structures that agree remarkably well with the

error-free simulated EM maps. Finally, we discuss

the results of our method applied to experimental

data of elongation factor G and RNA polymerase.

METHODS

Biasing potential

“Biased” molecular dynamics simulation, targeted

MD (47,48) and Steered Molecular MD (49), in

which external forces are added to guide the

system into a certain region of the conformational

space have been have been successfully employed

to study important conformational transitions of

biological systems (50,51).

Here we employ classical molecular dynamics

technique with a modified force field potential V

which is calculated as a sum of the classical

potential from the MD method Vff

and a new

7

effective potential VFit

to fit high-resolution

structure into low-resolution data:

V = Vff + V

Fit

The additional effective potential is calculated

according to the following equation:

VFit

= k (1 – c.c.) (1)

The c.c. represents the correlation coefficient that

describes similarity (overlap) of a target cryo-EM

map to a cryo-EM map synthetically generated for

an all-atom X-ray structure being fitted. The

correlation coefficient is defined in the following

way:

where and represent

experimental and synthetically simulated density

of voxel (i,j,k). The synthetic cryo-EM map is

generated from the all-atom structure.

k in Equation 1 is a constant that regulates the

magnitude of the effective potential and it needs to

be calibrated. This constant is the only arbitrary

parameter introduced. In the Discussion section,

we discuss the appropriate value of k that should

be used to obtain optimal results from the fitting.

While the described biasing potential is not the

only choice, other biasing potentials that are

simply proportional to the density (41,42)(C.C.

Jolley and M.F. Thorpe, personal communication)

could lead to artifacts, therefore VFit

was preferred

(43).

Go-model

In order to reduce computational costs

associated with an all atom description of the

protein in the MD simulation and reproduce full

protein flexibility, a coarse-grained model was

used for flexible fitting. While several coarse-

grained models have been developed, we will

employ a well-established one: Go-model (52,53),

which has been extensively used to study protein

folding (44,54) and more recently to describe

conformational transitions of biological

molecules.(55-58). A Go-potential takes into

account only native interactions, and each of these

interactions enters into the energy balance with the

same weight. Residues in the proteins are

represented as single beads centered at their C

positions (54). Adjacent beads are connected into a

polymer chain by bond and angle interactions,

while the geometry of the native state is encoded

in the dihedral angle potential and a non-local

bead-bead potential. The parameters for the

potential Vff used in this study were taken from

Clementi et al. (54).

Creating the cryo-EM map

Synthetic maps are calculated by locating three–

dimensional Gaussian functions on every atom and

integrating these functions for every atom in each

of the voxels as a given xn, yn, zn set of atomic

coordinates (18):

s i m(i, j,k) dxdydzVi j k g(x,y,z,xn,yn,zn)

n1

N

where n denotes n-th atom from a set of N atoms

and (i,j,k) denotes a given voxel and

g(x,y,z;xn,yn,zn) is the three-dimensional

Gaussian function of the following formula:

g(x,y,z;xn,yn,zn)exp[3

22{(xxn)

2(yyn)2(zzn)

2}]

where is a resolution parameter. The resolution

of a synthetic map is equal to 2 (18). We note

that the resolution parameter is set as a rough

estimate and it may not exactly coincide with the

resolution reported by the experimental data. This

is because the resolution of the experimental data

is often defined as a Fourier filter. However,

previous studies with coarse-grained models have

shown that the precise detail of the simulated map

do not affect the fitting performance (18,31,32).

Synthetic cryo-EM maps generated at several

resolutions, 5Å, 8Å, 10Å, 15Å and 20Å were used

8

as target maps in the simulations. It has been

shown that as long as the grid spacing is smaller

than the resolution of map, good quality fits can be

obtained (28). A 2 Å grid size was used in these

studies.

Determining the folding temperature

To determine the folding temperature of the

protein structure, we monitor Q-values at different

temperatures changes with respect to temperature.

Native contacts were defined as follow: 2 residues

have contact when at least one non-hydrogen atom

of the ith amino acid is within 6.5 Å of any non-

hydrogen atom of the jth amino acid (57).

Rigid body fitting

We employed the Situs package (8) to perform

rigid-body fitting of all-atom structures to

experimental cryo-EM maps of the Elongation

Factor G and the RNA polymerase. In this work

we used 10 codebook vectors to represent a pdb

structure and a cryo-EM map. The highest ranked

model was used to start to flexible fitting.

RESULTS

We have tested our procedure on simulated EM

data. Several proteins have been used in the past

for developing flexible fitting approaches

(25,28,31,42,59). Here we used a subset of

proteins that have been used in some of these

studies. The proteins that we choose undergo a

conformational change that has been observed

experimentally and high-resolution structures of

the two states are available. In addition to these we

chose large proteins and a protein complex: RNA

Polymerase. The proteins that were used for the

simulations are listed in Table 1 along with the

initial and target PDB files and the initial RMSD

between the two structures. Both small and large

proteins have been studied.

A synthetic density map was constructed by

convolution with a Gaussian kernel of = 2.5, 4, 5

7.5, 10 (31) for each structure. Several resolutions

were considered as this method is intended to

target both the high-end resolution spectrum of

cryo-EM data and the lower end. The fitting is

done from the X-ray structures into the calculated

error-free low-resolution maps. During the fitting,

simulated maps from the deformed structures are

created with the same resolution as the target map

in order to evaluate the c.c.

The purpose of the method presented here is to fit

the X-ray structure into a raw experimental (in this

case simulated) EM map of a different

conformation of the same molecule. To examine

the fitness between the atomic structure and the

experimental map during the simulation we

examine the c.c. between the structure and the

target EM map. In the following we also examine

the Root Mean Square Deviation (RMSD)

between the deformed structure and the original

structure from which the target EM map was

created. Ideally, when the c.c. reaches a maximum,

the RMSD should reach a minimum. However,

two different structures with different RMSD can

have the same c.c., because detailed information of

the atomic coordinates is lost in the EM map and

the EM map and atomic coordinates are not simply

related by a linear transformation. Thus to

characterize the performance of our method we

examine both values.

In order to perform accurate fitting, flexible fitting

techniques should be able to tolerate noise in the

map. Jolley et al. have analyzed the effect of noise

on the result of fitting with a correlation

coefficient optimization (28). Though high level

of noise S/N=1 is detrimental for fitting, with

higher S/N > 2, little impact is observed.

Similarly, Schroder et al. have compared fitting

with and without noise and have observed that

their method is robust against noise (30).

Therefore, we have also tested the applicability of

our approach to experimental data, which contains

noise. The elongation factor G (EF-G), an 844

residue protein, and the RNA polymerase were

taken as examples.

Folding Temperature

9

The folding temperature varies from one protein to

another. In our simulation, we do not want to

unfold the protein as previously done in other Go

model studies; instead we want the protein to

undergo its conformational change while

remaining folded. We have measured the folding

temperature for each protein. The folding

temperature for proteins of various size is different

because of the nature of the Go model but overall

in the same scale (see Table 2). Running

simulations at a temperature that is too close to the

folding temperature runs the risk of unfolding the

protein. Very low temperatures on the other hand

will freeze the protein. The temperature lacks

specific units because the simulation has not been

calibrated to the Kelvin scale but we found that a

T=500 worked well for all out test systems for the

force field we used.

Validity of Coarse Graining

The coarse grained Go model approach will look

to fit the C atoms to the full atom map. To ensure

the validity of our approach, we computed the

correlation coefficient between a given simulated

cryo-EM map using an all-atom structure against

several maps generated using a C model and

with different resolutions. Results for the case of

adenylate kinase (PDB: 4AKE) are shown Table

3. One can see that for all of the resolutions,

except for 10Å, the C map correlates the best

with the same resolution full atom map. The

concern at 10Å is negligible though because the

difference in correlation is at the thousandth

decimal point. Therefore using a coarse-grained

model seems appropriate and it has been shown to

be successful in several cases (31,60)

Simulated Data

In our implementation, the weight k is a parameter

to adjust how strongly the system is biased to fit

into the density. k needs to be calibrated so that

sampling of conformations is enhanced while

maintaining the structural integrity of the protein.

Therefore, it is necessary to determine what force

constant or sequence of force constants would be

capable of fitting the initial structure to the target

map. Simulations with weights k ranging from

1,000 to 100,000 were run for each protein.

Initially a force constant k=1,000 was used, but

this was only capable of causing the initial

structure to oscillate around its initial position.

Figure 1a shows this in the case of acetyl CoA

synthase (PDB: 1OAO Chain C). Using a stronger

force constant, k=10,000, as illustrated in Figure

1b was capable of reducing the RMSD to

approximately 1Å for the 5Å resolution map. In

such cases, the forces resulting from the gradient

of the c.c. are strong enough to overcome the

energetic barriers separating the two structures.

With a weight of 1000, the potential from the

correlation coefficient is too small which leads to a

poor fit

However, for higher resolution maps simulations,

a weight of 10,000 is not always successful. While

the RMSD initially decreases, they seem to remain

stuck at higher final RMSD values, i.e. in a local

minimum, than their lower resolution counterparts.

This would happen whenever the force constant

was initially too high. The most interesting of all

the cases is noted in Figure 2a where Ca2+

-

ATPase drops down immediately to a RMSD

value between 2 and 5Å for 15Å and 20Å

resolution, but at the 5Å, 8Å and 10Å resolutions,

the RMSD gets stuck at a minimum around 8Å.

To solve this problem a k=1,000 force constant

was implemented for 5,000 steps. At a force

constant of k=1,000 it can be seen in Figure 2b

that there are only relatively small changes, the

protein slightly alter its conformation to fit the

experimental data. Once the simulation was

continued from the final position and velocity

vectors with a k= 10,000 force constant, it began

to move towards the target structure causing the

RMSD to decrease. When this method was used,

the final RMSD was lower for the higher

resolution maps, which is exactly what would be

expected.

The k=10,000 force constant still allows for some

oscillations though and it was of interest to see if

we could minimize them and in turn get an even

better final structure. A larger force constant,

k=100,000, was then implemented and that was

able to get minimize the RMSD even further (see

Table 4). In each case the final structures agree

10

well with the simulated cryo-EM map as indicated

by a high c.c and are very close to the

conformation from which those simulated maps

were derived. As expected, accuracy decreases

with lower-resolution data.

The final k=100,000 force constant iteration can

minimize some of the poorer resolution structures

(see Figure 2b). In some cases though, as shown in

Figure 3 for adenylate kinase, the minimization

only worked for a few of the resolutions. One

noticeable difference between adenylate kinase

and the other proteins though is its small size. It

has 214 residues versus 691 residues, belonging to

which is the next smallest successful protein. The

large k=100,000 force constant that is applied for

the last 5000 steps of the simulation is capable of

pushing the smaller proteins out of a global energy

minimum into a nearby local minimum. Since the

correlation coefficient continued to increase, and

the RMSD failed to decrease, this is a situation

where over-fitting occurred. What is peculiar

though, is that the RMSD increased for all of the

resolutions with the exception of the 5Å and 20Å

maps.

Of the proteins listed in Table 1, all of them with

the exception of transglutaminase (PDB: 1KV3),

folded successfully onto the target PDB structure.

The initial RMSD was 28Å and the final RMSD

was 30Å. This meant that the final structure was in

essence worse than the initial. The structure of

transglutaminase presents an interesting problem

though. Figure 4a shows a trace of the initial

structure with two domains of interest colored

differently. These domains are highlighted because

they have very similar electron densities, which

will play a role in determining to which areas of

the target map the region can move to. Figure 4b

shows the final coordinate structure from the Go-

model in relation to the target structure. The

domains are highlighted in the same manner that

they were in Figure 4a and it is clear from this

figure that the proximity of the “red” domain

allowed it to move into the area where the “blue”

domain should have moved. In this instance the

final structure is in fact in a global energy

minimum, but with atoms in regions where they

should not be. Figure 4c validates this point by

showing the final structure from the simulation in

the cryo-EM map. Increasing the strength of the

native contacts did not have a positive effect on

the results. The only noticeable event from this is

that the force constant would have to be increased

by the same factor to allow for any motion. The

manner in which the backbone moved though did

not change and the final RMSD was the same as

the initial native contact value. For this particular

case, our approach fails to predict the correct

structure. Methods using a potential map

created from the cryo-EM data (41,42) would

certainly lead to the same results due to the

limitations associated with this potential (43). It is also reasonable to think that the NMA based

approach would also fail in such cases. Methods

based on domains structure variability within a

given family might provide a better avenue for

such cases (59); however, enough structural

information across the family would be needed for

the approach to be successful.

Overall, this approach construct models that are

similar to structure that were obtained from NMFF

approach (Table 5). This initial comparison shows

that the Go-model is a better fitting tool than

NMA in two of the three cases studied. Only in the

case of lactoferrin was the C fit for the NMA

better with a value of 1.0 Å compared to the 1.3 Å

from the Go-model. Using the same level of

coarse-graining, this approach is faster than the

normal mode based approach.

Experimental data simulations

We have also applied our approach to

experimental data, the cryo-EM map of the

Elongation Factor G bound to the ribosome at 11.8

Å resolution (61) and the cryo-EM map of the E.

coli RNA polymerase at 15 Å resolution (4).

EF-G has been studied using NMFF, i.e normal

mode based refinement and by all-atom molecular

dynamics simulations as well as a method based

on using structural variability of protein domains

within a same family (59). The RNA polymerase

has been studied using coarse-grained fitting using

Situs and by NMFF as well. Since the high-

resolution structure corresponding to the cryo-EM

map is not available, one can compare our

11

modeled structure to the ones that have been

obtained by other approaches.

The structure of an „„open‟‟ form of the E. coli

core RNAP has been determined at a 15 Å

resolution using single-particle cryo-EM

techniques (4). Because of the high similarity in

sequence between the E. coli and Thermus

aquaticus (Taq) core RNAP, it is possible to

annotate the low-resolution map from the E. coli

system with the high-resolution structure of Taq

(62,63). Previous studies found that a large

conformational change was necessary to

accommodate the high-resolution structure into the

cryo-EM map (4,32).

The Go-model flexible fitting was performed

using the three-force sequence and structure after

the two and three forces were compared to the

NMA result. With three force iterations the RMSD

increased to 10.9 Å, whereas the two force

iterations only increased it to 5.1 Å compared to

the initial structure, which is closer to the NMA

structure (7.3 Å). Figure 5 shows the initial and

final structures of RNAP in the two force iteration

condition. A much better fit to the data is

observed, in particular the jaw are now opened,

which is in agreement with previous fits using

coarse-grained dynamics (4,32). Comparisons with

other fittings are useful to determine whether a

two force or three force sequence should be used.

Indeed, the two force fitting is more comparable to

the NMA structure. We have seen with simulated

data that a three force sequence can lead to over

fitting, considering the fact that experimental data

contain noise, it is preferable to use a two force

sequence approach for fitting.

We also performed flexible fitting into the isolated

EF-G map using the X-ray structure of a mutant

factor (64). The PDB entry code for this structure

was 1FNM. The correlation coefficient converged

after steps of simulation. The C RMSD between

the starting and the fitted structure is 11.2 Å with

the three force iterations and 8.6 Å. with two force

iterations. Figure 6a shows the original rigid body

fitting for which some regions of the density

remain unaccounted for. Figure 6b illustrates our

model in which significant improvement of the fit

to the density is observed.

Results from this fitting can be compared to

models obtained from other techniques such as

NMFF (32), biased MD all atom (43) and rigid

body manual fitting (see Figure 7). When the Go-

model simulation was applied to Elongation Factor

G (EF-G) using two force iterations sequence, the

RMSD with the known X-ray structure went up to

8.6Å while with the three force sequence it

reached 11.2 Å. The RMSD obtained with the two

force sequence is more comparable with existing

NMA results where the RMSD was reported to be

8.8 Å. Similarly, comparing the final structures of

each simulated protein for each force constant

with the EF-G MD all atom model further

enforced the idea that the ideal methodology

would be to only use two force iterations. The

third iteration had a RMSD value that ranged

between 6.9 Å and 8.0 Å. The final structure after

the second force iteration had the lowest RMSD

value, 2.5 Å, when compared to the NMA fitted

structure which reemphasizes the point that a two

force sequence would be a better choice for

experimental data. We should also note that the

visual comparison with the results from

Velazquez-Muriel et al. reveals a similar structural

rearrangement (59).

In this study, for EF-G, we have observed that

different flexible techniques approaches, based on

normal mode analysis, molecular dynamics

simulation, all-atom protein representation or

coarse-grained approach converge toward a same

structural model. These flexible fittings reveal

large rearrangements between the domains II, IV

and V. In particular, a large displacement (up to 20

Å) of domain IV, which is correlated with

rotations of domains II and V is observed.

In addition models obtained from computational

approaches can be compared to a model that was

proposed based on manual docking (PDB: 1PN6)

(61). Figure 7 shows all four fitting methods

superimposed on one another and looks to point

out the flaw with the manual docking simulation.

Looking at the zoomed in picture of the top middle

domain, it is clear that the manually docked

structure is rotated to the right with respect to the

other structures. Computational methods are good

approximations of what structures may look like,

but as can be seen from these results, it is

12

important that multiple methods are taken to verify

a structure.

The case of elongation factor G is a prime

example that illustrates the need for quantitative

computational approaches for flexible fitting.

Indeed, computational approaches based on

different techniques, molecular dynamics

simulation, normal mode analysis and structural

variability within a same family using either all-

atom protein representation or coarse-grained

models converge toward structural models that

display an overall identical domain arrangement.

In comparison, the model based on manual

docking leads to a different domain arrangement.

The main difference between these results comes

from the fact that domain rearrangements involved

correlated motions between domains, which are

ignored by simple manual docking. Methods based

on physical properties of biological molecules

naturally take into those correlated motions and

provide a more accurate model.

Timing

The length of time needed to complete a single

simulation was dependent on two factors: the size

of the protein and the resolution. Adenylate kinase

(214 residues) was the fastest simulation at 5Å,

which took 7 minutes on a single processor. The

20Å simulation for the same protein took 23

minutes. The longest simulation was RNA

polymerase II (3558 Residues). The 5 Å resolution

took 5 hours 18 minutes while the 20 Å resolution

lasted for 8 hours 40 minutes.

CONCLUSIONS

We introduced a method for the flexible fitting of

high-resolution structures into low-resolution

electron density maps from cryo-EM based on

molecular dynamic simulations and coarse-grained

representation, Go-model, of the molecule. This

method can be applied to higher-resolution cryo-

EM maps for which more structural details are

available (8 Å and higher) but it is also applicable

to lower resolution data. The results on simulated

data have shown that it is a robust algorithm

because of its ability to work on a multitude of

systems with the same parameters with very short

computational time. Results presented in this

paper are in agreement with previous works using

different approaches.

We show also its success in predicting

conformational changes from experimental data.

The models obtained for the RNA polymerase and

EF-G are in agreement with other computational

studies. This paper also demonstrates the necessity

to employ computational tools to interpret

experimental data. Comparisons of several

computational models with a manual build model

reveal significant difference. Indeed, in general

computational tools provide robust ways to

flexibly fit proteins while maintaining the overall

architectural integrity. Due to the use of MD

simulation or normal mode analysis for fitting

with an all-atom or coarse-grained representation

of the protein, only feasible deformations are

possible, i.e. correlated motions between distant

parts of the biological system are taken into

account. Such correlated motions need to be taken

into account for accurate flexible fitting.

Acknowledgements

I would like to thank Dr. Florence Tama and Dr.

Osamu Miyashita for their help and guidance

during this project.

Financial support from National Science

Foundation grant 0744732 is greatly appreciated.

13

REFERENCES

1. Saibil, H. R. (2000) Conformational changes studied by cryo-electron microscopy, Nat Struct

Biol 7, 711-714.

2. Frank, J., and Agrawal, R. K. (2000) A ratchet-like inter-subunit reorganization of the ribosome

during translocation, Nature 406, 318-322.

3. Valle, M., Gillet, R., Kaur, S., Henne, A., Ramakrishnan, V., and Frank, J. (2003) Visualizing

tmRNA entry into a stalled ribosome, Science 300, 127-130.

4. Darst, S. A., Opalka, N., Chacon, P., Polyakov, A., Richter, C., Zhang, G. Y., and Wriggers, W.

(2002) Conformational flexibility of bacterial RNA polymerase, Proc Natl Acad Sci USA 99, 4296-

4301.

5. Wendt, T., Taylor, D., Trybus, K. M., and Taylor, K. (2001) Three-dimensional image

reconstruction of dephosphorylated smooth muscle heavy meromyosin reveals asymmetry in the

interaction between myosin heads and placement of subfragment 2, Proc Natl Acad Sci USA 98,

4361-4366.

6. Conway, J. F., Wikoff, W. R., Cheng, N., Duda, R. L., Hendrix, R. W., Johnson, J. E., and

Steven, A. C. (2001) Virus maturation involving large subunit rotations and local refolding, Science

292, 744-748.

7. Lee, K. K., and Johnson, J. E. (2003) Complementary approaches to structure determination of

icosahedral viruses, Curr Opin Struct Biol 13, 558-569.

8. Wriggers, W., Milligan, R. A., and McCammon, J. A. (1999) Situs: A package for docking

crystal structures into low- resolution maps from electron microscopy, J Struct Biol 125, 185-195.

9. Volkmann, N., and Hanein, D. (1999) Quantitative fitting of atomic models into observed

densities derived by electron microscopy, J Struct Biol 125, 176-184.

10. Rossmann, M. G. (2000) Fitting atomic models into electron-microscopy maps, Acta Cryst D

56, 1341-1349.

11. Rossmann, M. G., Bernal, R., and Pletnev, S. V. (2001) Combining electron microscopic with

X-ray crystallographic structures, J Struct Biol 136, 190-200.

12. Jiang, W., Baker, M. L., Ludtke, S. J., and Chiu, W. (2001) Bridging the information gap:

Computational tools for intermediate resolution structure interpretation, J Mol Biol 308, 1033-1044.

13. Chacon, P., and Wriggers, W. (2002) Multi-resolution contour-based fitting of macromolecular

structures, J Mol Biol 317, 375-384.

14. Volkmann, N., Hanein, D., Ouyang, G., Trybus, K. M., DeRosier, D. J., and Lowey, S. (2000)

Evidence for cleft closure in actomyosin upon ADP release, Nat Struct Biol 7, 1147-1155.

15. Rawat, U. B. S., Zavialov, A. V., Sengupta, J., Valle, M., Grassucci, R. A., Linde, J.,

Vestergaard, B., Ehrenberg, M., and Frank, J. (2003) A cryo-electron microscopic study of

ribosome-bound termination factor RF2, Nature 421, 87-90.

16. Gao, H., Sengupta, J., Valle, M., Korostelev, A., Eswar, N., Stagg, S. M., Van Roey, P.,

Agrawal, R. K., Harvey, S. C., Sali, A., Chapman, M. S., and Frank, J. (2003) Study of the

structural dynamics of the E coli 70S ribosome using real-space refinement, Cell 113, 789-801.

17. Gao, H., and Frank, J. (2005) Molding atomic structures into intermediate-resolution cryo-EM

density maps of ribosomal complexes using real-space refinement, Structure 13, 401-406.

18. Wriggers, W., and Birmanns, S. (2001) Using Situs for flexible and rigid-body fitting of

multiresolution single-molecule data, J Struct Biol 133, 193-202.

14

19. Chapman, M. S. (1995) Restrained Real-Space Macromolecular Atomic Refinement Using a

New Resolution-Dependent Electron-Density Function, Acta Cryst A 51, 69-80.

20. Chen, J. Z., Fürst, J., Chapman, M. S., and Grigorieff, N. (2003) Low-resolution structure

refinement in electron microscopy, J Struct Biol 144, 144-151.

21. Fabiola, F., and Chapman, M. S. (2005) Fitting of high-resolution structures into electron

microscopy reconstruction images, Structure 13, 389-400.

22. Mears, J. A., Sharma, M. R., Gutell, R. R., McCook, A. S., Richardson, P. E., Caulfield, T. R.,

Agrawal, R. K., and Harvey, S. C. (2006) A structural model for the large subunit of the

mammalian mitochondrial ribosome, J Mol Biol 358, 193-212.

23. Topf, M., Baker, M. L., John, B., Chiu, W., and Sali, A. (2005) Structural characterization of

components of protein assemblies by comparative modeling and electron cryo-microscopy, J Struct

Biol 149, 191-203.

24. Topf, M., Baker, M. L., Marti-Renom, M. A., Chiu, W., and Sali, A. (2006) Refinement of

protein structures by iterative comparative modeling and cryoEM density fitting, J Mol Biol 357,

1655-1668.

25. Topf, M., Lasker, K., Webb, B., Wolfson, H., Chiu, W., and Sali, A. (2008) Protein structure

fitting and refinement guided by cryo-EM density, Structure 16, 295-307.

26. Jacobs, D. J., and Thorpe, M. F. (1995) Generic Rigidity Percolation - The Pebble Game, Phys

Rev Lett 75, 4051-4054.

27. Jacobs, D. J., Rader, A. J., Kuhn, L. A., and Thorpe, M. F. (2001) Protein flexibility

predictions using graph theory, Proteins 44, 150-165.

28. Jolley, C. C., Wells, S. A., Fromme, P., and Thorpe, M. F. (2008) Fitting low-resolution cryo-

EM maps of proteins using constrained geometric simulations, Biophys J 94, 1613-1621.

29. Schröder, G. F., Brunger, A. T., and Levitt, M. (2007) Combining efficient conformational

sampling with a deformable elastic network model facilitates structure refinement at low

resolution, Structure 15, 1630-1641.

30. Delarue, M., and Dumas, P. (2004) On the use of low-frequency normal modes to enforce

collective movements in refining macromolecular structural models, Proc Natl Acad Sci USA 101,

6957-6962.

31. Tama, F., Miyashita, O., and Brooks III, C. L. (2004) Flexible multi-scale fitting of atomic

structures into low-resolution electron density maps with elastic network normal mode analysis, J

Mol Biol 337, 985-999.

32. Tama, F., Miyashita, O., and Brooks III, C. L. (2004) NMFF: Flexible high-resolution

annotation of low-resolution experimental data from cryo-EM maps using normal mode analysis, J

Struct Biol 147, 315-326.

33. Hinsen, K., Reuter, N., Navaza, J., Stokes, D. L., and Lacapere, J. J. (2005) Normal mode-

based fitting of atomic structure into electron density maps: Application to sarcoplasmic reticulum

Ca-ATPase, Biophys J 88, 818-827.

34. Suhre, K., Navaza, J., and Sanejouand, Y. H. (2006) NORMA: a tool for flexible fitting of

high-resolution protein structures into low-resolution electron-microscopy-derived density maps,

Acta Crystallogr D Biol Crystallogr 62, 1098-1100.

35. Mitra, K., Schaffitzel, C., Shaikh, T., Tama, F., Jenni, S., Brooks III, C. L., Ban, N., and Frank,

J. (2005) Structure of the E. coli protein-conducting channel bound to a translating ribosome,

Nature 438, 318-324.

15

36. Falke, S., Tama, F., Brooks, C. L., Gogol, E. P., and Fisher, M. T. (2005) The 13 angstrom

structure of a chaperonin GroEL-protein substrate complex by cryo-electron microscopy, J Mol

Biol 348, 219-230.

37. Tama, F., Ren, G., Brooks, C. L. 3., and Mitra, A. K. (2006) Model of the toxic complex of

anthrax: responsive conformational changes in both the lethal factor and the protective antigen

heptamer, Protein Science 15, 2190-2200.

38. Stahlberg, H., and Walz, T. (2008) Molecular electron microscopy: state of the art and current

challenges, ACS Chem Biol 3, 268-281.

39. Tama, F., and Sanejouand, Y. H. (2001) Conformational change of proteins arising from

normal mode calculations, Protein Eng 14, 1-6.

40. Caulfield, T. R., and Harvey, S. C. (2007) Conformational fitting of atomic models to

cryogenic-electron microscopy maps using Maxwell's demon molecular dynamics, Biophys J ,

368A-368A.

41. Noda, K., Nakamura, M., Nishida, R., Yoneda, Y., Yamaguchi, Y., Tamura, Y., Nakamura, H.,

and Yasunaga, T. (2006) Atomic model construction of protein complexes from electron

micrographs and visualization of their 3D structure using a virtual reality system, J Plasma Phys 72,

1037-1040.

42. Trabuco, Villa, Mitra, Frank, and Schulten (2008) Flexible fitting of atomic structures into

electron microscopy maps using molecular dynamics, Structure 16, 673-683.

43. Orzechowski, M., and Tama, F. (2008) Flexible fitting of high-resolution x-ray structures into

cryoelectron microscopy maps using biased molecular dynamics simulations, Biophys J 95, 5692-

5705.

44. Tozzini, V. (2005) Coarse-grained models for proteins, Curr Opin Struct Biol 15, 144-150.

45. Onuchic, J. N., and Wolynes, P. G. (2004) Theory of protein folding, Curr Opin Struct Biol 14,

70-75.

46. Tozzini, V., and McCammon, J. A. (2005) A coarse grained model for the dynamics of flap

opening in HIV-1 protease, Chemical Physics Letters 413, 123-128.

47. Schlitter, J., Engels, M., and Krüger, P. (1994) Targeted molecular dynamics: a new approach

for searching pathways of conformational transitions, J Mol Graph 12, 84-89.

48. Ma, J., Sigler, P. B., Xu, Z., and Karplus, M. (2000) A dynamic model for the allosteric

mechanism of GroEL, J Mol Biol 302, 303-313.

49. Isralewitz, B., Gao, M., and Schulten, K. (2001) Steered molecular dynamics and mechanical

functions of proteins, Curr Opin Struct Biol 11, 224-230.

50. Krammer, A., Lu, H., Isralewitz, B., Schulten, K., and Vogel, V. (1999) Forced unfolding of

the fibronectin type III module reveals a tensile molecular recognition switch, Proc Natl Acad Sci U

S A 96, 1351-1356.

51. Sanbonmatsu, K. Y., Joseph, S., and Tung, C. S. (2005) Simulating movement of tRNA into

the ribosome during decoding, Proc Natl Acad Sci U S A 102, 15854-15859.

52. Taketomi, H., Ueda, Y., and Go, N. (1975) Studies on Protein Folding, Unfolding and

Fluctuations by Computer-Simulation .1. Effect of Specific Amino-Acid Sequence Represented by

Specific Inter-Unit Interactions, International Journal of Peptide and Protein Research 7, 445-459.

53. Ueda, Y., Taketomi, H., and Go, N. (1978) Studies on Protein Folding, Unfolding, and

Fluctuations by Computer-Simulation .2. 3-Dimensional Lattice Model of Lysozyme, Biopolymers

17, 1531-1548.

16

54. Clementi, C., Nymeyer, H., and Onuchic, J. N. (2000) Topological and energetic factors: What

determines the structural details of the transition state ensemble and "en-route" intermediates for

protein folding? An investigation for small globular proteins, J Mol Biol 298, 937-953.

55. Best, R. B., Chen, Y. G., and Hummer, G. (2005) Slow protein conformational dynamics from

multiple experimental structures: The helix/sheet transition of arc repressor, Structure 13, 1755-

1763.

56. Koga, N., and Takada, S. (2006) Folding-based molecular simulations reveal mechanisms of

the rotary motor F-1-ATPase, Proc Natl Acad Sci USA 103, 5367-5372.

57. Okazaki, K., Koga, N., Takada, S., Onuchic, J. N., and Wolynes, P. G. (2006) Multiple-basin

energy landscapes for large-amplitude conformational motions of proteins: Structure-based

molecular dynamics simulations, Proc Natl Acad Sci USA 103, 11844-11849.

58. Whitford, P. C., Miyashita, O., Levy, Y., and Onuchic, J. N. (2007) Conformational transitions

of adenylate kinase: Switching by cracking, J Mol Biol 366, 1661-1671.

59. Velazquez-Muriel, J. A., Valle, M., Santamaría-Pang, A., Kakadiaris, I. A., and Carazo, J. M.

(2006) Flexible fitting in 3D-EM guided by the structural variability of protein superfamilies,

Structure 14, 1115-1126.

60. Wriggers, W., Agrawal, R. K., Drew, D. L., McCammon, A., and Frank, J. (2000) Domain

motions of EF-G bound to the 70S ribosome: insights from a hand-shaking between multi-

resolution structures, Biophys J 79, 1670-1678.

61. Valle, M., Zavialov, A., Sengupta, J., Rawat, U., Ehrenberg, M., and Frank, J. (2003) Locking

and unlocking of ribosomal motions, Cell 114, 123-134.

62. Campbell, E. A., Korzheva, N., Mustaev, A., Murakami, K., Nair, S., Goldfarb, A., and Darst,

S. A. (2001) Structural mechanism for rifampicin inhibition of bacterial RNA polymerase, Cell

104, 901-912.

63. Zhang, G. Y., Campbell, E. A., Minakhin, L., Richter, C., Severinov, K., and Darst, S. A.

(1999) Crystal structure of Thermus aquaticus core RNA polymerase at 3.3 angstrom resolution,

Cell 98, 811-824.

64. Laurberg, M., Kristensen, O., Martemyanov, K., Gudkov, A. T., Nagaev, I., Hughes, D., and

Liljas, A. (2000) Structure of a mutant EF-G reveals domain III and possibly the fusidic acid

binding site, J Mol Biol 303, 593-603.

TABLE LEGENDS

Table 1: A list of the proteins that will be simulated along with the PDB files that will be used.

The total number of residues is also listed for each protein. The initial RMSD value between the

two conformation was calculated by aligning the two PDB files in VMD.

Table 2: The folding temperatures of the proteins.

Table 3: The correlation coefficient between the C and and full atom maps at five different resolutions: 5, 8, 10, 15 and 20 angstroms.

Table 4: Summarizes the results of the successful simulations by providing the initial PDB for

each protein and then the best correlation coefficient and RMSD value for each resolution. The

table is divided into two sections: one for the two force iteration (k=1000 and k=10,000) and one

17

for the three force iteration (k=1000, k=10,000 and k=100,000). All of the RMSD values were

determined using VMD, whereas the correlation coefficient was obtained from the Go-model‟s

output file. The timing column refers to the computation time required by the Go-model at each

resolution.

Table 5: A comparison of the results provided by the Go-Model and NMA (31).

FIGURE LEGENDS

Figure 1: a. Acetyl CoA Synthase‟s RMSD and c.c. results at T=500 and using a single force

iteration of k=1,000. b. Acetyl CoA Synthase‟s RMSD and c.c. results at T=500 and using a single

force iteration of k=10,000. The 5Å resolution is represented by ο, 8 Å is represented by □, 10Å is

represented by ×, 15Å is represented by Δ and 20Å is represented by +. The RMSD was calculated

using the RMSD Trajectory Tool plug-in in VMD. The correlation coefficient was taken from the

Go-model‟s output file.

Figure 2: a. Ca2+

-ATPase‟s RMSD and c.c. results at T=500 and using a single force iteration of

k=10,000. b. Ca2+

-ATPase‟s RMSD and c.c. results at T=500 when using three force iterations:

k=1,000 for the first 5,000 steps, k=10,000 for steps 5,000 to 10,000 and k=100,000 for steps

10,000 to 15,000. The 5Å resolution is represented by ○, 8Å is represented by □, 10Å is

represented by ×, 15Å is represented by Δ and 20Å is represented by +. The RMSD was calculated

using the RMSD Trajectory Tool plug-in in VMD. The correlation coefficient was taken from the

Go-model‟s output file.

Figure 3: Adenylate kinase‟s RMSD and correlation coefficient results at T=500 when using three

force iterations: k=1,000 for the first 5,000 steps, k=10,000 for steps 5,000 to 10,000 and

k=100,000 for steps 10,000 to 15,000. The 5Å resolution is represented by ○, 8Å is represented by

□, 10Å is represented by ×, 15Å is represented by Δ and 20Å is represented by +. The RMSD was

calculated using the RMSD Trajectory Tool plug-in in VMD. The correlation coefficient was taken

from the Go-model‟s output file.

Figure 4: a. The solid surface is the target structure of transglutaminase and it is superimposed by

the initial structure which is defined by a tube. The blue tube corresponds to the blue surface and

the red tube corresponds to the red surface. b. The same target structure is now superimposed on

the final structure after the GO-Model simulation at T=500 and using three force iterations. Again

the chains are color coded to show that the red domain moved to where the blue domain should

have been. c. The final C backbone of transglutaminase as it fits into the cryo-EM target map.

Figure 5: a. E. coli RNA Polymerase‟s initial C backbone (PDB:1HQM) in the cryo-EM map

provided by the target experimental X-ray chromatography data. b. E. coli RNA Polymerase‟s C backbone fit into the cryo-EM map produced by the two force iteration at T=500.

Figure 6: a. Elongation Factor G‟s initial Cbackbone (PDB:1FNM) in the cryo-EM map

provided by the target experimental X-ray chromatography data. b. Elongation Factor G‟s

Cbackbone fit into the cryo-EM map produced by the two force iteration at T=500.

18

Figure 7: A comparison multiple final structures for Elongation Factor G. The cyan is the Go-

Model MD simulation as it is fit to the experimental X-ray chromatography data using two force

iterations, the NMA fitted EF-G is colored silver, the all atom MD simulation is ice blue and the

manual docked structure (PDB: 1PN6) is in red. The domain of interest is zoomed in on and

rotated to show the distortion of the manually docked structure with respect to the others. It is clear

by the position of the two alpha helices.

19

TABLES

Table 1

Protein Initial PDB Target PDB Number of Residues RMSD (Å)

Adenylate Kinase 1AKE 4AKE 214 7.2 Lactoferrin 1LFG 1LFH 691 6.4

Elongation Factor 2 1N0V 1N0U 819 14.5 Ca2+-ATPase 1IWO 1SU4 994 14.0

Acetyl CoA Synthase 1OAO (Chain

C) 1OAO (Chain D) 728 7.0 Transglutaminase 1KV3 2Q3Z 651 28.8 RNA Polymerase II 1I50 1I6H 3558 4.8 Elongation Factor G 1FNM N/A 655 N/A

E. coli RNA Polymerase 1HQM N/A 2650 N/A

20

Table 2

Protein Folding Temperature

1AKE 800

1N0V 960

1LFG 960

1IWO 940

1KV3 920

1OAO.C 1040

1I50 1000

21

Table 3

C Resolution Full atom Resolution

5 8 10 15 20

5 0.911 0.859 0.821 0.746 0.694

8 0.956 0.966 0.948 0.894 0.846

10 0.948 0.984 0.981 0.947 0.907

15 0.896 0.965 0.985 0.996 0.98

20 0.844 0.924 0.954 0.991 0.999

22

Table 4

PDB Resolution

(Å) Best C.C.

RMSD (Å)

Best C.C.

RMSD (Å)

Three Forces Two Forces

1AKE 5 0.93 0.9 0.91 0.9 8 0.99 3.9 0.97 1.5 10 0.99 3.4 0.98 1.8 15 1.00 1.9 1.00 1.7 20 1.00 1.7 1.00 1.9

1N0V 5 0.92 3.3 0.89 3.3 8 0.97 1.1 0.96 1.2 10 0.99 1.2 0.98 1.3 15 1.00 1.4 0.99 1.5 20 1.00 1.4 1.00 1.9

1LFG 5 0.92 0.7 0.90 0.9 8 0.98 1.2 0.96 1.1 10 0.99 1.4 0.98 1.3 15 1.00 1.2 0.99 1.4 20 1.00 1.3 1.00 1.7

1IWO 5 0.92 2.0 0.89 2.2 8 0.97 2.1 0.96 2.2 10 0.99 2.1 0.97 2.8 15 1.00 2.5 0.99 3.0 20 1.00 2.6 0.99 3.9

1OAO.C 5 0.92 0.7 0.90 0.8 8 0.97 1.1 0.96 1.0 10 0.99 1.1 0.98 1.2 15 1.00 1.1 0.99 1.3 20 0.95 1.1 0.95 1.4

1I50 5 N/A N/A 0.88 2.7 8 N/A N/A 0.96 2.9 10 N/A N/A 0.98 3.0 15 N/A N/A 0.99 3.1 20 N/A N/A 1.00 3.2

EFG 10.8 0.94 11.2 0.88 8.6

RNAP 15 0.96 10.8 0.89 5.1

23

Table 5

PDB Resolution (Å) Final RMSD (Å)

NMA Go-Model

1N0V 10 1.8 1.3

1LFG 10 1.0 1.3

1IWO 10 5.1 2.8

24

FIGURES

Figure 1

25

Figure 2

26

Figure 3

27

Figure 4

28

Figure 5

29

Figure 6

30

Figure 7

Documents

BIASED COARSE-GRAINED MOLECULAR DYNAMICS SIMULATION … · 2020. 4. 2. · states for large biological molecules, medium to low resolution techniques such as cryo electron microscopy