CONTENTS
1 Contents
    1.1 Installing MDAnalysis
    1.2 Basics
    1.3 Working with AtomGroups
    1.4 Trajectory analysis
    1.5 Intermediate Level MDAnalysis hacks
2 References
Bibliography
Index
Practical 15: MDAnalysis Documentation, Release 1.0
MDAnalysis is an open source Python library that helps you to
quickly write your own analysis algorithms for studying trajectories
produced by the most popular simulation packages
[Michaud-Agrawal2011].
The online documentation, together with the interactive Python
documentation, should help you while you are using the
library.
DISCLAIMER: Your instructor is one of the main authors of the
package and might be overly enthusiastic in promoting it...
1.1 Installing MDAnalysis
The following notes are specific to the SimBioNano course. In
general, if you have a C compiler and a few other packages
installed, you should be able to follow the general installation
notes.
1.1.1 Installing the binary distribution on the iMacs
The iMacs do not have a C-compiler so a special binary distribution
was prepared as a so-called “egg” file, which contains Python code
and compiled code. To install a Python egg you need the helper
script ez_setup.py, which installs some infrastructure, and the egg
file itself.
On the iMacs, do the following to install the latest released
version of MDAnalysis (0.7.7).
First create the directory where you will install packages:
mkdir -p ~/Library/Python/2.7/lib/python/site-packages
(This directory is automatically searched by the Python
installation on the iMacs so you don’t have to manipulate
PYTHONPATH.) Then create a file ~/.pydistutils.cfg that will tell
the Python installation tools (distutils) that you always want to
put Python packages into your own private directory (a so-called
Mac OS X user installation):
cat > ~/.pydistutils.cfg <<'EOF'
# Mac OS X user installation:
# http://peak.telecommunity.com/DevCenter/EasyInstall#mac-os-x-user-installation
# http://peak.telecommunity.com/DevCenter/EasyInstall#downloading-and-installing-a-package
# for Mac OS X framework installations (such as EPD)
# use site.USER_SITE http://docs.python.org/2/library/site.html
[install]
install_lib = ~/Library/Python/$py_version_short/lib/python/site-packages
install_scripts = ~/bin
EOF
Then download the packages from the SimBioNano/15/eggs directory
(in general you can find source files on the MDAnalysis download
page):
curl -O http://becksteinlab.physics.asu.edu/pages/courses/2013/SimBioNano/15/ez_setup.py
curl -O http://becksteinlab.physics.asu.edu/pages/courses/2013/SimBioNano/15/eggs/MDAnalysis-0.7.7-py2.7-macosx-10.6-i386.egg
curl -O http://becksteinlab.physics.asu.edu/pages/courses/2013/SimBioNano/15/eggs/MDAnalysisTests-0.7.7-py2.7.egg
Finally, install the packages and a few dependencies (which are
downloaded from the internet):
easy_install --no-deps ./MDAnalysis-0.7.7-py2.7-macosx-10.6-i386.egg ./MDAnalysisTests-0.7.7-py2.7.egg
easy_install networkx gridDataFormats
1.1.2 Installing on saguaro from source
saguaro has an installation of all the important Python packages
and the GNU compilers as part of the Software Library and Packages.
Thus it is very easy to install MDAnalysis and its test suite from
the Python Package Index:
module load python
easy_install --user -U MDAnalysis MDAnalysisTests
Just remember to run module load python before you start using
MDAnalysis.
1.1.3 Testing the installation
MDAnalysis comes with over 500 test cases that check its
functionality. These test cases can be run with the command
python -c 'from MDAnalysis.tests import test; test(label="full", verbose=3, extra_argv=["--exe"])'
This can take a few minutes. Ideally, you should only get passing
tests (“ok”, or just a single dot “.” when using verbose=1) or
“KnownFailures”.
1.2 Basics
• use the interactive help in ipython (command? or command??);
• TAB completion, e.g. MDAnalysis.U<TAB> will autocomplete to
MDAnalysis.Universe, and MDAnalysis.Universe.<TAB> will show all
methods and attributes;
• quick plotting with matplotlib (and array manipulations with
numpy).
1.2.1 Loading modules
Load MDAnalysis:
import MDAnalysis
MDAnalysis comes with a bunch of test files and trajectories. One
is the AdK trajectory from Practical 10 that samples a transition
from a closed to an open conformation [Beckstein2009]. The topology
file (CHARMM psf format) and trajectory (CHARMM/NAMD dcd format)
can be loaded into the variables PSF and DCD:
from MDAnalysis.tests.datafiles import PSF, DCD
Finally, also load numpy:
import numpy as np
1.2.2 Universe and AtomGroup
MDAnalysis is object oriented. Molecular systems consist of Atom
objects (“instances” of the “class”
MDAnalysis.core.AtomGroup.Atom), which are grouped in AtomGroup
instances. You build the AtomGroup of your system by loading a
topology (list of atoms and possibly their connectivity) together
with a trajectory (coordinate information) into the central data
structure, the Universe object:
>>> u = MDAnalysis.Universe(PSF, DCD)
>>> print(u)
<Universe with 3341 atoms>
The atoms are stored in the “attribute”
MDAnalysis.core.AtomGroup.Universe.atoms:
>>> print(u.atoms)
<AtomGroup with 3341 atoms>
>>> list(u.atoms[:5])
[< Atom 1: name 'N' of type '56' of resname 'MET', resid 1 and segid '4AKE'>,
 < Atom 2: name 'HT1' of type '2' of resname 'MET', resid 1 and segid '4AKE'>,
 < Atom 3: name 'HT2' of type '2' of resname 'MET', resid 1 and segid '4AKE'>,
 < Atom 4: name 'HT3' of type '2' of resname 'MET', resid 1 and segid '4AKE'>,
 < Atom 5: name 'CA' of type '22' of resname 'MET', resid 1 and segid '4AKE'>]
Any AtomGroup knows the residues that the atoms belong to via the
attribute residues, which produces a ResidueGroup. A ResidueGroup
acts like a list of Residue objects:
>>> u.atoms[100:130].residues
<ResidueGroup [<Residue 'LEU', 6>, <Residue 'GLY', 7>, <Residue 'ALA', 8>]>
Larger organizational units are Segment instances, for example one
protein or all the solvent molecules or simply the whole system.
Atom, AtomGroup, Residue, and ResidueGroup have an attribute
segments that will list the segment IDs (“segids”) as a
SegmentGroup:
>>> u.atoms.segments
<SegmentGroup [<Segment '4AKE'>]>
The converse is also true: each “higher” level in the hierarchy
also knows about the Residue and Atom instances it contains. For
example, to list the atoms of the ResidueGroup we had before:
>>> r = u.atoms[100:130].residues
>>> r.atoms
<AtomGroup with 36 atoms>
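The two directions of navigation can be mimicked with plain Python lists: upward from atoms to residues, and downward from residues to atoms. The sketch below is a toy model built on our own assumptions, not a description of how MDAnalysis stores its topology. Each atom is just an (index, resid) pair, and the helpers residues_of() and atoms_of() are hypothetical.

```python
# Toy model (hypothetical, not MDAnalysis internals): each atom is (atom_index, resid).
atoms = [(i, resid) for i, resid in enumerate([1, 1, 1, 2, 2, 3, 3, 3, 3])]

def residues_of(atom_subset):
    """Upward: the resids touched by atom_subset, in order of first appearance."""
    seen = []
    for _, resid in atom_subset:
        if resid not in seen:
            seen.append(resid)
    return seen

def atoms_of(resids, all_atoms):
    """Downward: every atom in all_atoms that belongs to one of the given resids."""
    wanted = set(resids)
    return [atom for atom in all_atoms if atom[1] in wanted]

sliced = atoms[2:6]              # an arbitrary slice that cuts through residues 1 and 3
res = residues_of(sliced)        # residues touched by the slice: 1, 2 and 3
full = atoms_of(res, atoms)      # all atoms of those complete residues
print(len(sliced), len(full))    # 4 9
```

As with an AtomGroup, going up to the residues and back down to the atoms returns complete residues. The result can therefore contain more atoms than the original slice.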
Exercise 1
1. What residue (“resname”) does the last atom belong to in the
above example?
>>> r = u.atoms[100:130].residues
>>> r.atoms[-1]
< Atom 136: name 'O' of type '70' of resname 'ALA', resid 8 and segid '4AKE'>
2. Why does the expression
len(u.atoms[100:130]) == len(u.atoms[100:130].residues.atoms)
return False?
Because the slice of 30 atoms cuts through residues, whereas
.residues.atoms returns all atoms of every residue that the slice
touches (36 atoms here).
3. How many residues are in the Universe u?
>>> len(u.atoms.residues)
214
>>> u.atoms.numberOfResidues()
214
How do you get a list of the residue names (such as ["Ala", "Gly",
"Gly", "Asp", ...]) and residue numbers (“resid”) for atoms 1000 to
1300? And how do you get them as a list of tuples (resname, resid)?
(Hint: zip())
>>> resnames = u.atoms[999:1300].resnames()
>>> resids = u.atoms[999:1300].resids()
>>> list(zip(resnames, resids))
How do you obtain the resid and the resname for the 100th residue?
(Hint: investigate the Residue object interactively with TAB
completion)
>>> r100 = u.atoms.residues[99]
>>> print(r100.id, r100.name)
100 GLY
4. How many segments are there?
>>> len(u.segments)
1
>>> len(u.atoms.segments)
1
>>> u.atoms.numberOfSegments()
1
>>> s1 = u.segments[0]
>>> s1.id
'4AKE'
See Also:
• numberOfResidues() and numberOfAtoms()
1.2.3 Selections
MDAnalysis comes with a fairly complete atom selection facility.
Primarily, one uses the method selectAtoms() of a Universe:
>>> CA = u.selectAtoms("protein and name CA")
>>> CA
<AtomGroup with 214 atoms>
but really any AtomGroup has a selectAtoms() method:
>>> acidic = CA.selectAtoms("resname ASP or resname GLU")
>>> acidic
<AtomGroup with 35 atoms>
>>> acidic.residues
<ResidueGroup [<Residue 'GLU', 22>, <Residue 'ASP', 33>, <Residue 'GLU', 44>,
 <Residue 'ASP', 51>, <Residue 'ASP', 54>, <Residue 'ASP', 61>,
 <Residue 'GLU', 62>, <Residue 'GLU', 70>, <Residue 'GLU', 75>,
 <Residue 'ASP', 76>, <Residue 'ASP', 84>, <Residue 'ASP', 94>,
 <Residue 'GLU', 98>, <Residue 'ASP', 104>, <Residue 'GLU', 108>,
 <Residue 'ASP', 110>, <Residue 'ASP', 113>, <Residue 'GLU', 114>,
 <Residue 'ASP', 118>, <Residue 'GLU', 143>, <Residue 'ASP', 146>,
 <Residue 'ASP', 147>, <Residue 'GLU', 151>, <Residue 'GLU', 152>,
 <Residue 'ASP', 158>, <Residue 'ASP', 159>, <Residue 'GLU', 161>,
 <Residue 'GLU', 162>, <Residue 'GLU', 170>, <Residue 'GLU', 185>,
 <Residue 'GLU', 187>, <Residue 'ASP', 197>, <Residue 'GLU', 204>,
 <Residue 'ASP', 208>, <Residue 'GLU', 210>]>
See Also:
All the selection keywords are described in the
documentation.
Selections can be combined with boolean expressions, and it is also
possible to select by geometric criteria, e.g. with the around
distance selection keyword:
u.selectAtoms("((resname ASP or resname GLU) and not (backbone or name CB or name CG)) \
and around 4.0 ((resname LYS or resname ARG) \
and not (backbone or name CB or name CG))").residues
What is this selection trying to accomplish?
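Geometrically, a keyword such as around 4.0 keeps the atoms that lie within 4.0 Å of at least one atom of a second selection. The brute-force NumPy sketch below shows that criterion on made-up coordinates. The function name within_cutoff() is hypothetical. The sketch uses a strict "<" at the boundary and assumes the two coordinate sets do not overlap; it does not try to reproduce how MDAnalysis searches for neighbors or handles these details.

```python
import numpy as np

def within_cutoff(candidates, reference, cutoff):
    """Indices of rows in `candidates` (N, 3) closer than `cutoff`
    to at least one row of `reference` (M, 3). Brute force, O(N*M) memory."""
    diff = candidates[:, np.newaxis, :] - reference[np.newaxis, :, :]  # (N, M, 3)
    dist = np.sqrt((diff ** 2).sum(axis=-1))                           # (N, M)
    return np.nonzero((dist < cutoff).any(axis=1))[0]

# made-up coordinates in Angstrom
acidic_O = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [3.0, 2.0, 0.0]])
basic_N = np.array([[3.0, 0.0, 0.0], [20.0, 20.0, 20.0]])
print(within_cutoff(acidic_O, basic_N, 4.0))   # [0 2]
```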
Exercises 2
1. Select the range of resids 100 to 200 (“100-200”) with a
selection. Compare the result to what you get by slicing
u.atoms.residues appropriately.
Which approach would you prefer to use in an analysis script?
Solution:
>>> u.selectAtoms("resid 100-200")
<AtomGroup with 1609 atoms>
Compare to the slicing solution (doing an element-wise comparison,
i.e. residue by residue in each list()):
>>> list(u.selectAtoms("resid 100-200").residues) == list(u.atoms.residues[99:200])
If one wants to get specific residues in scripts, one typically uses
selections instead of slicing, because the index in the slice might
not correspond to the actual residue id (minus 1). If a number of
residues (e.g. 150-160) are missing from the structure, then the
selection will simply give you residues 100-149 and 161-200, but the
slice 99:200 would give you residues 100-149 and 161-211.
2. Select all residues that do not contain a Cβ (“CB”) atom. How
many are there? What residue names did you find?
Solution:
>>> sel = u.selectAtoms("(byres name CA) and not (byres name CB)").residues
>>> len(sel)
20
These are all glycines, as can be seen by comparing the residue
groups element-wise:
>>> glycines = u.selectAtoms("resname GLY")
>>> list(sel) == list(glycines.residues)
True
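The difference between selecting by resid and slicing by position can be checked on a toy list of residue ids. It uses the scenario from the first exercise, in which residues 150-160 are missing. The list is a hypothetical stand-in for u.atoms.residues and contains only the resids.

```python
# Hypothetical structure: residues 1-214 with 150-160 missing.
resids = [r for r in range(1, 215) if not 150 <= r <= 160]

# "Selection": keep residues by their id, like selectAtoms("resid 100-200").
by_id = [r for r in resids if 100 <= r <= 200]

# "Slicing": silently assumes index == resid - 1, which breaks after the gap.
by_slice = resids[99:200]

print(by_id[49:51], by_id[-1])        # [149, 161] 200
print(by_slice[49:51], by_slice[-1])  # [149, 161] 211
```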
1.3 Working with AtomGroups
An AtomGroup has a large number of methods and attributes that
provide information about the atoms, such as names, indices, or the
coordinates in the positions attribute:
>>> CA = u.selectAtoms("protein and name CA")
>>> r = CA.positions
>>> r.shape
(214, 3)
The resulting output is a numpy.ndarray. The main purpose of
MDAnalysis is to get trajectory data into numpy arrays!
1.3.1 Important methods and attributes of AtomGroup
The positions attribute (the coordinates) is probably the most
important information that you can get from an AtomGroup.
Other quantities that can easily be calculated for an AtomGroup
are
• the center of mass centerOfMass() and the center of geometry (or
centroid) centerOfGeometry() (equivalent to centroid());
• the total mass totalMass();
• the total charge totalCharge() (if partial charges are defined in
the topology);
• the radius of gyration
$$R_\mathrm{gyr} = \sqrt{\frac{1}{M}\sum_{i=1}^{N} m_i (\mathbf{r}_i - \mathbf{R})^2}$$
with radiusOfGyration(), where M is the total mass and R the center
of mass;
• the principal axes p1, p2, p3 from principalAxes() via a
diagonalization of the tensor of inertia momentOfInertia(),
$$\Lambda = U^T I U, \quad U = (\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3)$$
where U is a rotation matrix whose columns are the eigenvectors
that form the principal axes, Λ is the diagonal matrix of
eigenvalues (sorted from largest to smallest) known as the
principal moments of inertia, and
$$I = \sum_{i=1}^{N} m_i \left[(\mathbf{r}_i \cdot \mathbf{r}_i) \sum_{\alpha=1}^{3} \mathbf{e}_\alpha \otimes \mathbf{e}_\alpha - \mathbf{r}_i \otimes \mathbf{r}_i\right]$$
is the tensor of inertia.
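The formulas in the list translate directly into NumPy. The sketch below works on plain arrays of positions and masses rather than on an AtomGroup, and the function name describe() is hypothetical. Two conventions are our own choices: the inertia tensor is taken about the center of mass, and the principal axes are returned as columns sorted by decreasing eigenvalue, following the ordering described above. The sketch is meant for checking your understanding, not as a replacement for the AtomGroup methods.

```python
import numpy as np

def describe(positions, masses):
    """Center of mass, centroid, radius of gyration, principal moments and
    principal axes (columns) for (N, 3) positions and (N,) masses."""
    M = masses.sum()
    com = (masses[:, np.newaxis] * positions).sum(axis=0) / M
    cog = positions.mean(axis=0)                      # center of geometry (centroid)
    r = positions - com                               # positions relative to the COM (our choice)
    rgyr = np.sqrt((masses * (r ** 2).sum(axis=1)).sum() / M)
    # I = sum_i m_i [ (r_i . r_i) * identity - r_i (outer) r_i ]
    I = np.zeros((3, 3))
    for m, ri in zip(masses, r):
        I += m * (np.dot(ri, ri) * np.eye(3) - np.outer(ri, ri))
    evals, evecs = np.linalg.eigh(I)                  # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]                   # reorder: largest first
    return com, cog, rgyr, evals[order], evecs[:, order]

# two unit masses on the x axis and one heavier particle on the y axis
pos = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
mass = np.array([1.0, 1.0, 2.0])
com, cog, rgyr, moments, axes = describe(pos, mass)
print(com, cog, rgyr, moments)
```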
1.3.2 Exercises 3
• CORE residues 1-29, 60-121, 160-214 (gray)
• NMP residues 30-59 (blue)
• LID residues 122-159 (yellow)
1. Calculate the center of mass and the center of geometry for each
of the three domains.
• What are the distances between the centers of mass?
(Hint: you can use numpy.linalg.norm() or use a function like
veclength() that you defined previously)
• Does it matter to use center of mass vs center of geometry?
AdK undergoes a conformational transition during which the NMP and
LID domains move relative to the CORE. The movement can be
characterized by two angles, θNMP and θLID, which are defined by the
centers of geometry of the backbone and Cβ atoms of three groups of
residues A, B, and C [Beckstein2009]:

angle    A          B          C
θNMP     115-125    90-100     35-55
θLID     179-185    115-125    125-153
The angle between the vectors BA and BC is
$$\theta = \arccos\left(\frac{\vec{BA}\cdot\vec{BC}}{|\vec{BA}|\,|\vec{BC}|}\right)$$
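The arccos formula can be wrapped in a small helper that works on any three points, for example the three centers of geometry. The same snippet shows that a distance between two centers is just numpy.linalg.norm() of their difference. The name angle_deg() is hypothetical. The np.clip() call is our own addition: rounding can push the cosine marginally outside [-1, 1], and np.arccos returns nan there.

```python
import numpy as np
from numpy.linalg import norm

def angle_deg(A, B, C):
    """Angle ABC in degrees between the vectors BA and BC."""
    A, B, C = (np.asarray(p, dtype=float) for p in (A, B, C))
    BA = A - B
    BC = C - B
    cos_theta = np.dot(BA, BC) / (norm(BA) * norm(BC))
    # guard against round-off pushing |cos| slightly above 1
    return np.rad2deg(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# distance between two centers: the norm of their difference
print(norm(np.array([3.0, 4.0, 0.0]) - np.zeros(3)))                  # 5.0
print(round(float(angle_deg([1, 0, 0], [0, 0, 0], [0, 1, 0])), 6))    # 90.0
```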
2. Write a function theta_NMP() that takes a Universe as an
argument and computes θNMP:
theta_NMP(u)
    Calculate the NMP-CORE angle for E. coli AdK in degrees from Universe u
Use the following incomplete code as a starting point (fill in the
... placeholders):
import numpy as np
from numpy.linalg import norm

def theta_NMP(u):
    """Calculate the NMP-CORE angle for E. coli AdK in degrees"""
    A = u.selectAtoms("resid 115:125 and (backbone or name CB)").centerOfGeometry()
    B = ...
    C = ...
    BA = A - B
    BC = ...
    theta = np.arccos(...)
    return np.rad2deg(theta)
Write the function in a file adk.py and use ipython %run adk.py to
load the function while working on it.
Test it on the AdK simulation (actually, the first frame):
>>> theta_NMP(u)
44.124821
Test it:
>>> theta_LID(u)
107.00881
1.3.3 Processing AtomGroups
You can directly write an AtomGroup to a file with the write()
method:
CORE = u.selectAtoms("resid 1:29 or resid 60:121 or resid 160:214")
CORE.write("AdK_CORE.pdb")
(The extension determines the file type.)
You can do fairly complicated things on the fly, such as writing
the hydration shell around a protein to a file for further analysis
or visualization:
u.selectAtoms("byres (name OW and around 4.0 protein)").write("hydration_shell.pdb")
You can also write Gromacs index files (in case you don’t like
make_ndx...) with the write_selection() method:
CORE.write_selection("CORE.ndx", name="CORE")
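For orientation, a Gromacs index file is plain text. Each group starts with a header line "[ name ]" and is followed by 1-based atom numbers. The sketch below produces that layout from 0-based indices. The function name format_ndx_group() and the default of 15 numbers per line are our own choices, and the exact layout written by write_selection() may differ.

```python
def format_ndx_group(name, indices, per_line=15):
    """Return one index-file group as text. `indices` are 0-based;
    the file itself uses 1-based atom numbers."""
    numbers = [str(i + 1) for i in indices]
    lines = ["[ %s ]" % name]
    for start in range(0, len(numbers), per_line):
        lines.append(" ".join(numbers[start:start + per_line]))
    return "\n".join(lines) + "\n"

print(format_ndx_group("CORE", range(0, 4), per_line=3))
```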
1.4 Trajectory analysis
The Universe binds together the static topology (which atoms, how
are they connected, what un-changing properties do the atoms
possess (such as partial charge), ...) and the changing coordinate
information, which is stored in the trajectory.
The length of a trajectory (number of frames) is
len(u.trajectory)
The standard way to access each time step (or frame) in a
trajectory is to iterate over the Universe.trajectory attribute
(which is an instance of a Reader class):
for ts in u.trajectory:
    print("Frame: %5d, Time: %8.3f ps" % (ts.frame, u.trajectory.time))
    print("Rgyr: %g A" % (u.atoms.radiusOfGyration(), ))
The time attribute contains the time of the current time step. The
Reader only contains information about one time step: imagine a
cursor or pointer moving along the trajectory file. Where the
cursor points, there are your current coordinates, frame number,
and time.
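The cursor picture can be illustrated with a toy reader. It owns a single timestep-like object and overwrites it in place as it moves through the frames. The classes ToyTimestep and ToyReader are hypothetical and much simpler than the MDAnalysis readers. They only show why a loop over the trajectory always sees the data of the current frame. Frame numbers start at 1, mirroring the convention noted in this practical.

```python
class ToyTimestep:
    """Holds the data of one frame; the same object is reused for every frame."""
    def __init__(self):
        self.frame = 0
        self.time = 0.0
        self.positions = None

class ToyReader:
    """Iterates over in-memory frames by updating one shared timestep."""
    def __init__(self, frames, dt=1.0):
        self._frames = frames          # list of per-frame coordinate lists
        self.dt = dt
        self.ts = ToyTimestep()

    def __len__(self):
        return len(self._frames)

    def _read(self, index):
        # move the "cursor": overwrite the shared timestep in place
        self.ts.frame = index + 1      # 1-based frame number
        self.ts.time = index * self.dt
        self.ts.positions = self._frames[index]
        return self.ts

    def __getitem__(self, index):
        return self._read(range(len(self))[index])   # also accepts negative indices

    def __iter__(self):
        for index in range(len(self)):
            yield self._read(index)

    @property
    def time(self):
        return self.ts.time

reader = ToyReader([[0.0], [1.0], [2.0]], dt=0.5)
seen = [(ts.frame, reader.time, ts.positions[0]) for ts in reader]
print(seen)   # [(1, 0.0, 0.0), (2, 0.5, 1.0), (3, 1.0, 2.0)]
```

After the loop, reader.ts still holds the last frame, because there is only one timestep object.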
Normally you will collect the data in a list or array, e.g.
Rgyr = []
protein = u.selectAtoms("protein")
for ts in u.trajectory:
    Rgyr.append((u.trajectory.time, protein.radiusOfGyration()))
Rgyr = np.array(Rgyr)
Note: The coordinates, and properties calculated from them such as
the radius of gyration, change when you move through a trajectory.
Selections such as protein in the example do not change. You can
therefore define the selection once and then recalculate the
property of interest for each frame of the trajectory.
The data can be plotted to give the graph below:
# quick plot
from pylab import *
plot(Rgyr[:,0], Rgyr[:,1], 'r--', lw=2, label=r"$R_G$")
xlabel("time (ps)")
ylabel(r"radius of gyration $R_G$ ($\AA$)")
What does the shape of the R_G(t) time series indicate?
[Figure: radius of gyration R_G (Å) versus time (ps).]
import numpy as np
from numpy.linalg import norm

def theta_NMP(u):
    """Calculate the NMP-CORE angle for E. coli AdK in degrees"""
    C = u.selectAtoms("resid 115:125 and (backbone or name CB)").centerOfGeometry()
    B = u.selectAtoms("resid 90:100 and (backbone or name CB)").centerOfGeometry()
    A = u.selectAtoms("resid 35:55 and (backbone or name CB)").centerOfGeometry()
    BA = A - B
    BC = C - B
    theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
    return np.rad2deg(theta)

def theta_LID(u):
    """Calculate the LID-CORE angle for E. coli AdK in degrees"""
    C = u.selectAtoms("resid 179:185 and (backbone or name CB)").centerOfGeometry()
    B = u.selectAtoms("resid 115:125 and (backbone or name CB)").centerOfGeometry()
    A = u.selectAtoms("resid 125:153 and (backbone or name CB)").centerOfGeometry()
    BA = A - B
    BC = C - B
    theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
    return np.rad2deg(theta)
and calculate the time series θNMP(t) and θLID(t).
Plot them together in one plot.
2. Plot θNMP(t) against θLID(t).
What does the plot show?
Why could such a plot be useful?
[Figure: left, the angles θNMP and θLID versus time t (ps); right, the LID-CORE angle θLID plotted against the NMP-CORE angle θNMP.]
The code to generate the figure contains theta_LID() and
theta_NMP().
import numpy as np
from numpy.linalg import norm

def theta_NMP(u):
    """Calculate the NMP-CORE angle for E. coli AdK in degrees"""
    C = u.selectAtoms("resid 115:125 and (backbone or name CB)").centerOfGeometry()
    B = u.selectAtoms("resid 90:100 and (backbone or name CB)").centerOfGeometry()
    A = u.selectAtoms("resid 35:55 and (backbone or name CB)").centerOfGeometry()
    BA = A - B
    BC = C - B
    theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
    return np.rad2deg(theta)

def theta_LID(u):
    """Calculate the LID-CORE angle for E. coli AdK in degrees"""
    C = u.selectAtoms("resid 179:185 and (backbone or name CB)").centerOfGeometry()
    B = u.selectAtoms("resid 115:125 and (backbone or name CB)").centerOfGeometry()
    A = u.selectAtoms("resid 125:153 and (backbone or name CB)").centerOfGeometry()
    BA = A - B
    BC = C - B
    theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
    return np.rad2deg(theta)

if __name__ == "__main__":
    import MDAnalysis
    from MDAnalysis.tests.datafiles import PSF, DCD
    import matplotlib
    import matplotlib.pyplot as plt

    u = MDAnalysis.Universe(PSF, DCD)
    data = np.array([(u.trajectory.time, theta_NMP(u), theta_LID(u)) for ts in u.trajectory])
    time, NMP, LID = data.T

    # plotting
    degreeFormatter = matplotlib.ticker.FormatStrFormatter(r"%g$^\circ$")
    fig = plt.figure(figsize=(6,3))

    ax1 = fig.add_subplot(121)
    ax1.plot(time, NMP, 'b-', lw=2, label=r"$\theta_{\mathrm{NMP}}$")
    ax1.plot(time, LID, 'r-', lw=2, label=r"$\theta_{\mathrm{LID}}$")
    ax1.set_xlabel(r"time $t$ (ps)")
    ax1.set_ylabel(r"angle $\theta$")
    ax1.yaxis.set_major_formatter(degreeFormatter)
    ax1.legend(loc="best")

    ax2 = fig.add_subplot(122)
    ax2.plot(NMP, LID, 'k-', lw=3)
    ax2.set_xlabel(r"NMP-CORE angle $\theta_{\mathrm{NMP}}$")
    ax2.set_ylabel(r"LID-CORE angle $\theta_{\mathrm{LID}}$")
    ax2.xaxis.set_major_formatter(degreeFormatter)
    ax2.yaxis.set_major_formatter(degreeFormatter)
    ax2.yaxis.tick_right()
    ax2.yaxis.set_label_position("right")

    fig.subplots_adjust(left=0.12, right=0.88, bottom=0.2, wspace=0.15)

    for ext in ('svg', 'pdf', 'png'):
        fig.savefig("NMP_LID_angle_projection.{0}".format(ext))
Note that one would normally write the code more efficiently:
generate the atom groups once and then pass them to a simple
function that calculates the angle:
def theta(A, B, C):
    """Calculate the angle between BA and BC for AtomGroups A, B, C"""
    B_center = B.centroid()
    BA = A.centroid() - B_center
    BC = C.centroid() - B_center
    theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
    return np.rad2deg(theta)
1.4.2 Bells and whistles
List comprehensions (implicit for loops) are especially useful for
interactive analysis in ipython --pylab:
protein = u.selectAtoms("protein")
data = np.array([(u.trajectory.time, protein.radiusOfGyration()) for ts in u.trajectory])
time, RG = data.T
plot(time, RG)
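The idiom time, RG = data.T works because each (time, value) tuple becomes one row of a 2-D array. Transposing turns the columns into rows, which can then be unpacked. The numbers below are made up and only illustrate the shapes.

```python
import numpy as np

# a list comprehension producing (time, value) tuples, one per "frame"
rows = [(t, 10.0 + 0.1 * t) for t in (0.0, 1.0, 2.0, 3.0)]
data = np.array(rows)      # shape (4, 2): one row per frame
time, RG = data.T          # data.T has shape (2, 4): one row per quantity
print(data.shape)          # (4, 2)
```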
More on the trajectory iterator
One can directly jump to a frame by using “indexing syntax”:
>>> u.trajectory[50]
< Timestep 51 with unit cell dimensions array([ 0., 0., 0., 90., 90., 90.], dtype=float32) >
>>> ts.frame
51
You can also slice trajectories, e.g. if you want to start at the
10th frame, stop just before the 10th frame from the end, and only
use every 5th frame:
for ts in u.trajectory[9:-10:5]:
    print(ts.frame)
(although doing this on Gromacs XTC and TRR trajectories is
currently much slower than for DCDs.)
Note: Trajectory indexing and slicing use 0-based indices (as in
standard Python), but MDAnalysis numbers frames starting with 1 (for
historical reasons and following the common practice of MD codes).
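The relation between 0-based slice indices and 1-based frame numbers can be checked with plain Python. Here we assume a hypothetical trajectory of 100 frames and apply the same slice as in u.trajectory[9:-10:5].

```python
n_frames = 100                                  # hypothetical trajectory length
indices = list(range(n_frames))[9:-10:5]        # same slice as u.trajectory[9:-10:5]
frames = [i + 1 for i in indices]               # 1-based frame numbers
print(indices[0], indices[-1], len(indices))    # 9 89 17
print(frames[0], frames[-1])                    # 10 90
```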
1.5 Intermediate Level MDAnalysis hacks
MDAnalysis comes with existing analysis code in the
MDAnalysis.analysis module and with example scripts (see also the
Examples on the MDAnalysis wiki).
1.5.1 RMSD
As an example we will use the MDAnalysis.analysis.rms.rmsd()
function from the MDAnalysis.analysis.rms module. It computes the
coordinate root mean square distance between two sets of
coordinates. For example, for the AdK trajectory the backbone RMSD
between the first and the last frame is
>>> from MDAnalysis import Universe
>>> from MDAnalysis.analysis.rms import rmsd
>>> u = Universe(PSF,DCD)
>>> bb = u.selectAtoms('backbone')
>>> A = bb.positions      # coordinates of first frame
>>> u.trajectory[-1]      # forward to last frame
>>> B = bb.positions      # coordinates of last frame
>>> rmsd(A,B)
6.8342494129169804
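Without any fitting, the RMSD between two (N, 3) coordinate arrays is the square root of the mean squared distance between corresponding atoms, sqrt((1/N) Σ_i |x_i - y_i|²). The NumPy sketch below implements that formula as plain_rmsd(), a hypothetical name. It shows what is being computed and is not meant to replace MDAnalysis.analysis.rms.rmsd(). The example also shows that a pure translation already produces a nonzero RMSD.

```python
import numpy as np

def plain_rmsd(a, b):
    """RMSD between two (N, 3) arrays of corresponding coordinates,
    without any translation or rotation fitting."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("coordinate arrays must have the same shape")
    return np.sqrt(((a - b) ** 2).sum(axis=1).mean())

A = np.zeros((4, 3))
B = A + np.array([1.0, 2.0, 2.0])     # every atom shifted by a vector of length 3
print(plain_rmsd(A, B))               # 3.0
```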
1.5.2 Superposition of structure
In order to superimpose two structures in a way that minimizes the
RMSD we have functions in the MDAnalysis.analysis.align
module.
The example uses files provided as part of the MDAnalysis test
suite (in the variables PSF, DCD, and PDB_small). For all further
examples execute first
>>> from MDAnalysis import Universe
>>> from MDAnalysis.analysis.align import *
>>> from MDAnalysis.tests.datafiles import PSF, DCD, PDB_small
In the simplest case, we can calculate the C-alpha RMSD
between two structures, using rmsd():
>>> ref = Universe(PDB_small)
>>> mobile = Universe(PSF,DCD)
>>> rmsd(mobile.atoms.CA.positions, ref.atoms.CA.positions)
18.858259026820352
Note that in this example translations have not been removed. In
order to look at the pure rotation one needs to superimpose the
centres of mass (or geometry) first:
>>> ref0 = ref.atoms.CA.positions - ref.atoms.CA.centerOfMass()
>>> mobile0 = mobile.atoms.CA.positions - mobile.atoms.CA.centerOfMass()
>>> rmsd(mobile0, ref0)
6.8093965864717951
The rotation matrix that superimposes mobile on ref while
minimizing the CA-RMSD is obtained with the rotation_matrix()
function. We store the RMSD in rmsd_value so that the rmsd()
function is not overwritten:
>>> R, rmsd_value = rotation_matrix(mobile0, ref0)
>>> print(rmsd_value)
6.8093965864717951
>>> print(R)
[[ 0.14514539 -0.27259113  0.95111876]
 [ 0.88652593  0.46267112 -0.00268642]
 [-0.43932289  0.84358136  0.30881368]]
Putting all this together, one can superimpose all of mobile onto
ref:
>>> mobile.atoms.translate(-mobile.atoms.CA.centerOfMass())
>>> mobile.atoms.rotate(R)
>>> mobile.atoms.translate(ref.atoms.CA.centerOfMass())
>>> mobile.atoms.write("mobile_on_ref.pdb")
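For readers who want to see an optimal rotation computed by hand, the sketch below uses the SVD-based Kabsch algorithm. This is a different method from the quaternion-based QCP algorithm (MDAnalysis.core.qcprot) behind rotation_matrix(), although both solve the same least-squares problem. Several conventions here are our own assumptions. The function name kabsch() is hypothetical. Both sets are centered on their unweighted centroids rather than their centers of mass. The returned R maps the centered mobile coordinates onto the reference as mobile @ R.T. Check the convention before mixing R with rotation_matrix() or rotate().

```python
import numpy as np

def kabsch(mobile, ref):
    """Return (R, rmsd) such that (mobile - centroid) @ R.T best matches
    (ref - centroid) in the least-squares sense."""
    P = mobile - mobile.mean(axis=0)
    Q = ref - ref.mean(axis=0)
    H = P.T @ Q                                   # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # -1 would signal a reflection
    D = np.diag([1.0, 1.0, d])                    # enforce a proper rotation
    R = Vt.T @ D @ U.T
    diff = P @ R.T - Q
    rmsd = np.sqrt((diff ** 2).sum(axis=1).mean())
    return R, rmsd

# synthetic check: rotate random points by 30 degrees about z, then shift them
rng = np.random.default_rng(0)
ref = rng.normal(size=(20, 3))
phi = np.deg2rad(30.0)
R_true = np.array([[np.cos(phi), -np.sin(phi), 0.0],
                   [np.sin(phi),  np.cos(phi), 0.0],
                   [0.0,          0.0,         1.0]])
mobile = ref @ R_true + np.array([5.0, -2.0, 1.0])   # each row rotated by R_true.T, then shifted
R, rmsd = kabsch(mobile, ref)
print(np.allclose(R, R_true), rmsd)
```

Because the synthetic mobile set is an exact rigid-body copy of the reference, the recovered rotation matches R_true and the residual RMSD is essentially zero.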
1.5.3 Exercise 5
Use the above in order to investigate how rigid the CORE, NMP, and
LID domains are during the transition: Compute time series of the
CA RMSD of each domain relative to its own starting structure, when
superimposed on the starting structure.
• You will need to make a copy of the starting reference
coordinates that are needed for the shifts, e.g.
NMP = u.selectAtoms("resid 30:59")
u.trajectory[0]   # make sure to be on initial frame
ref_com = NMP.selectAtoms("name CA").centerOfMass()
ref0 = NMP.positions - ref_com
which is then used instead of ref.atoms.CA.centerOfMass() (which
would change for each time step).
• I suggest writing a function that does the superposition for a
given time step, reference, and mobile AtomGroup to make the code
more manageable (or use MDAnalysis.analysis.align.alignto())
[Figure: Cα RMSD (Å) from t = 0 of the CORE, NMP, and LID domains versus time t (ps).]
Possible solution
The code contains the functions superpose() and rmsd(). The latter is
marginally faster because we only need the calculated RMSD and not
the full rotation matrix. (We are calling the lower-level function
MDAnalysis.core.qcprot.CalcRMSDRotationalMatrix() directly, which
has somewhat non-intuitive calling conventions.) superpose() also
superimposes the mobile group onto the reference (but alignto() is
actually a more flexible tool for doing this).
Otherwise it is mostly book-keeping, which is solved by organizing
everything in dictionaries with keys “CORE”, “NMP”, “LID”.
import numpy as np
from MDAnalysis.analysis.align import rotation_matrix
from MDAnalysis.core.qcprot import CalcRMSDRotationalMatrix

def superpose(mobile, xref0, xref_com=None):
    """Superpose the AtomGroup *mobile* onto the coordinates *xref0* centered at the origin.

    The original center of mass of the reference group *xref_com* must
    be supplied or the superposition is done at the origin of the
    coordinate system.
    """
    # 995 us
    xref_com = xref_com if xref_com is not None else np.array([0., 0., 0.])
    mobile_com = mobile.centerOfMass()
    xmobile0 = mobile.positions - mobile_com
    R, rmsd = rotation_matrix(xmobile0, xref0)
    mobile.translate(-mobile_com)   # move mobile's center of mass to the origin
    mobile.rotate(R)                # rotate about the origin
    mobile.translate(xref_com)      # shift onto the reference position
    return rmsd

def rmsd(mobile, xref0):
    """Calculate optimal RMSD for AtomGroup *mobile* onto the coordinates *xref0* centered at the origin.

    The coordinates are not changed. No mass weighting.
    """
    # 738 us
    xmobile0 = mobile.positions - mobile.centerOfMass()
    return CalcRMSDRotationalMatrix(xref0.T.astype(np.float64), xmobile0.T.astype(np.float64), mobile.numberOfAtoms(), None, None)


if __name__ == "__main__":
    import MDAnalysis
    import matplotlib
    import matplotlib.pyplot as plt

    # load AdK DIMS trajectory
    from MDAnalysis.tests.datafiles import PSF, DCD
    u = MDAnalysis.Universe(PSF, DCD)

    # one AtomGroup per domain
    domains = {
        'CORE': u.selectAtoms("(resid 1:29 or resid 60:121 or resid 160:214) and name CA"),
        'LID': u.selectAtoms("resid 122-159 and name CA"),
        'NMP': u.selectAtoms("resid 30-59 and name CA"),
        }
    colors = {'CORE': 'black', 'NMP': 'blue', 'LID': 'red'}

    u.trajectory[0]   # rewind trajectory
    xref0 = dict((name, g.positions - g.centerOfMass()) for name, g in domains.items())

    nframes = len(u.trajectory)
    results = dict((name, np.zeros((nframes, 2), dtype=np.float64)) for name in domains)

    for iframe, ts in enumerate(u.trajectory):
        for name, g in domains.items():
            results[name][iframe, :] = u.trajectory.time, rmsd(g, xref0[name])

    # plot
    fig = plt.figure(figsize=(5,5))
    ax = fig.add_subplot(111)
    for name in "CORE", "NMP", "LID":
        data = results[name]
        ax.plot(data[:,0], data[:,1], linestyle="-", color=colors[name], lw=2, label=name)
    ax.legend(loc="best")
    ax.set_xlabel(r"time $t$ (ps)")
    ax.set_ylabel(r"C$_\alpha$ RMSD from $t=0$, $\rho_{\mathrm{C}_\alpha}$ ($\AA$)")

    for ext in ('svg', 'pdf', 'png'):
        fig.savefig("AdK_domain_rigidity.{0}".format(ext))
BIBLIOGRAPHY
[Michaud-Agrawal2011] N. Michaud-Agrawal, E. J. Denning, T. B.
Woolf, and O. Beckstein. MDAnalysis: A Toolkit for the Analysis of
Molecular Dynamics Simulations. J. Comput. Chem. 32 (2011),
2319–2327. doi:10.1002/jcc.21787, PMCID: PMC3144279
[Beckstein2009] O. Beckstein, E. J. Denning, J. R. Perilla, and
T. B. Woolf. Zipping and Unzipping of Adenylate Kinase: Atomistic
Insights into the Ensemble of Open/Closed Transitions. J. Mol. Biol.
394 (2009), 160–176. doi:10.1016/j.jmb.2009.09.009