Upload
abhilash-kannan
View
530
Download
4
Embed Size (px)
Citation preview
MoleculardynamicsandSimulations
Abhilash Kannan,TIFR (mumbai)
MoleculardynamicsandSimulations
� Molecular dynamics (MD) is a form of computer simulationin which atoms and molecules are allowed to interact for aperiod of time.
� Because molecular systems generally consist of a vast number ofparticles, it is impossible to find the properties of such complexsystems analytically; MD simulation circumvents this problemby using numerical methods.
� It represents an interface between laboratory experiments andtheory.
� It can be understood as a "virtual experiment.
PurposeOfMDforSUMO• Proteins in solution are considered to dynamic.
• It is difficult to study their motions, behavior, structural flexibility insolution.
• The strucutre of small proteins can be solved and studied by theconventional techniqueof X-RAY CRYSTALLOGRAPHY.
• X-RAY techniques require strict periodic boundary conditions whichis very difficult to obtain in a non crystalline strucutres.
• Molecular dynamics simulations can predict the state of a protein insolution and save these states in the form of a trajectory.
• MD can predict the movement of large proteins in the solution whichis not possible in X-ray.
• MD can simulate the exact conditon of the existence of a protein.• Structures obtained after MD simulation can be regarded as best
energy minimized and geometrically optimized structres thusallowing them to be used in various experiments-------NMR,Docking, protein-ligand interactions.
BriefMethodology1. Use physics to find the potential energy between all pairs of atoms.2. Move atoms to the next state.3. Repeat.
Energy Function
� Describes the interaction energies of all atoms and molecules in the system.
� Always an approximation.
Closer to real physics --> more realistic, more computation time (I.e. smaller time steps and more interactions increase accuracy)
ScaleinSimulations
Ηψ = Εψ
F = MA
exp(-ΔE/kT)
domain
quantumchemistry
moleculardynamics
Monte Carlo
mesoscale continuum
Length Scale10-10 M 10-8 M 10-6 M 10-4 M
10-12 S
10-8 S
10-6 S
Moleculardynamicsonproteins
� Although normally represented as static structures, proteinsare in fact dynamic.
� Most experimental properties, for example, measure a timeaverage or an ensemble average over the range of possibleconfigurations the molecule can adopt.
� One way to investigate the range of accessible configurationsis to simulate the motions or dynamics of a moleculenumerically. This can be done by computing a trajectory, aseries of molecular configurations as a function of time, by thesimultaneous integrationof Newton's equations of motion.
SowhatexactlytheMolecularDynamicsis?
• It is the process of giving the movements to proteins internallywhich is produced by increasing the temperature of the systemand cooling them rapidly in a very short time scale.
• During these conditions the steric interactions or the imperfectbonds between the amino acid residues and the peptides areremoved or modified.
• It generates the most stable and the energy minimizedconformations of the protein.
• While doing so it computes many different frames or trajectoriesof the same protein.
MDofproteininvacuumMDofproteininWater
Molecular Dynamics ofSUMO proteins
SUMOproteins� Small Ubiquitin-like Modifier or SUMO proteins are a family of
small proteins that are covalently attached to and detached fromother proteins in cells to modify their function.
� The function performed by SUMO proteins is known asSUMOylation.
� post-trnalational modification involved in various cellularprocesses such as transcriptional regulation, apoptosis, proteinstability etc.
� Similar to ubiquitin and SUMOylation is directed by an enzymaticcascade analogous to that involved in ubiquitination. In contrastto ubiquitin, SUMO is not used to tag proteins for degradation.
StructureschematicofhumanSUMOprotein
NMR structure of SUMO: the backbone of the protein isrepresented as a ribbon, highlighting secondary structure; N-terminus in blue, C-terminus in red
FunctionofSUMO� SUMO modification of proteins has many functions. Among the
most frequent and best studied are protein stability, nuclear-cytosolic transport, and transcriptional regulation.
� Typically, only a small fraction of a given protein is SUMOylatedand this modification is rapidly reversed by the action ofdeSUMOylating enzymes. The SUMO-1 modification of RanGAP1(the first identified SUMO substrate) leads to its trafficking fromcytosol to nuclear pore complex.
� The SUMO modification of protein leads to its movement from the centrosome to the nucleus .
Structure� There are 3 confirmed SUMO isoformsi n humans; SUMO-1, SUMO-2 and
SUMO-3. SUMO-2/3 show high a high degree of similarity to each other and are distinct from SUMO-1.
� SUMO proteins are small; most are around 88 to100 amino acids in length and 12 kDA in mass. The exact length and mass varies between SUMO family members and depends on which organism the protein comes from.
� Although SUMO has very little homology with Ubiquitin at the amino acid level, it has a nearly identical structural fold.
� SUMO1 as a globular protein with both ends of the amino acid chain sticking out of the protein's centre. The spherical core consists of an alpha helix and a beta sheet.
The SUMO protein taken for this work was extracted out from Drosophila melangaster
GivingDynamicstotheproteinStep 1 : generationof structures.
Step 2: performing molecular dynamics on each of the topologies.
Step 3: Recording the potential energy changes in protein during Dynamics.
Step 4: Clustering of the best minimized structures.
Programs/software's used:• Cyana• NAMD/VMD• VEGA ZZ• GROMACS
GenerationofstructuresusingCyana� For the sake of convenience and ease of dynamics, SUMO protein
was divided in to five fragments.
� These fragments were divided based on their propensity to form secondary structures.
Fragment Residue numbers
Fragment 1 1-12
Fragment 2 11-32
Fragment 3 31-53
Fragment 4 52-72
Fragment 5 71-88
� First it was necessary to make 1000 random conformers or topologies from each of the five different fragments.
� The program used for this structure generation was CYANA which is linux-based program.
� For this the sequences of each of the five fragments of SUMO protein was given to the program and was told to generate 1000 random topologies.
� Dynamics and annealing conditions were applied to give the energy of these 1000 random structures.
� The program was told to select 20 best energy minimized structures.� These 20 different topologies could be viewed using softwares like
Pymol, Molmol, VMD etc.
� The dynamics of the protein was carried out in vacuum without giving any constraints to them and the dynamics of the protein could be played using the above mentioned softwares and are saved.
FilesusedinCyana� First need to create .CCO file of the FIVE fragments .CCO file------------
1 MET H HA 6.7277 3.20E+002 SER H HA 6.9968 1.20E+003 ASP H HA 6.4720 1.20E+004 GLU H HA 6.8444 1.20E+005 LYS H HA 6.9625 1.20E+00.......53 THR H HA 7.5359 2.20E+00
•Init.cyaFile
.cya is a batch file which contains a set commands.
Rmsd range := 31.....53Cyana.libRead seq third.seqSwap = 0
Batchfile
� ‘Seed’ asks the program to generate 1000 topologies.
� The last two commands create the 20 best topologies.
performing molecular dynamics on each of the topologies
Using
GROMACS………………
HIGHLIGHTS� Generally 3 to 10 times faster than other Molecular Dynamics programs� Very user-friendly: issues clear error messages, no scripting language is
required to run the programs, prints out the progress of the program that is running, etc.
� Allows the trajectory data to be stored in a compact way. � Gromacs provides a basic trajectory data viewer; xmgr or Grace may also
be used to analyze the results.� Files from earlier versions of Gromacs may be used in the latest Gromacs,
version 3.1.
To run a simulation several things are needed: 1. a file containing the coordinates for all atoms. 2. information on the interactions (bond angles, charges, Van der
Waals). 3. parameters to control the simulation.
The exercise falls apart in four sections, corresponding to the actual steps in an MD simulation.
1. Conversion of the pdb structure file to a Gromacs structure file, with the simultaneous generation of a descriptive topology file.
2. Energy minimization of the structure to release strain.
3. Running a full simulations.
4. Analyzing results.
File Formats� PDB file
------ format used by Brookhaven Protein DataBank.
Atom
residueRes.no
X,Y,Z coordinates
Topology (*.top) file (ascii) ---------contains all the forcefield parameters
Gromacs(*.gro): molecular structure file in the Gromos87 format
(Gromacs format)
x, y, and z position,
in nm
x, y, and z velocity, in
nm/ps
*.tpr Filecontains the starting structure of the simulation, the molecular topology file and all the simulation parameters; binary format.
Trajectory(*.trr)file:contains the trajectory data for the simulation; binary format. It contains allthe coordinates, velocities, forces and energies.
Differenttrajectories
T=1 T=2 T=3
T=4 T=5 T=N
*.xvg file:
file format that is read by Grace (formerly called Xmgr), which is a plotting tool for the X window system.
Plot of X vs Y
*.mdp file: allows the user to set up specific parameters for all the calculations that Gromacs performs.
Recording every 0.002ps
No of steps for MD=500000
Coloumb interactions within 1.4A
Vander waalsinteraction
within 1.4radius
Takes into account interactions from
all the bonds
em.mdp file:setstheparametersforrunningenergyminimizations
integratorIterations
Neighbor list
constraints
Forces and potential
The exercise falls apart in four sections, corresponding to the actual steps in an MD simulation.
1. Conversion of the pdb structure file to a Gromacs structure file, with the simultaneous generation of a descriptive topology file.
2. Energy minimization of the structure to release strain.
3. Running a full simulations.
4. Analyzing results.
Batchfileforexecutingmoleculardynamics
Step1:ConversionofthePDBFile
� Each of the topologies saved in the PDB format were used as an input file for MD simulations with Gromacs.
� It is first necessary to convert it to the gromos file type (*.gro). Original data in the pdb file is often incomplete, carbon bound hydrogens are generally omitted.
� The conversion program pdb2gmx will check every residue in the structure file against a database and add all hydrogens. In the conversion process it also creates a topology file, with all the connections between the atoms listed.
� pdb2gmx can be used by simply typing it at the prompt.Example : we get a list of available options that that this conversion program can execute - - - -
pdb2gmx -h
pdb2gmx-h
� This program reads a pdb file, reads some database files, adds hydrogens to the molecules and generates coordinates in Gromacs(Gromos) format and a topology in Gromacs format.
� This conversion program contains many in-built force-fields. We have to select the required force fields.
Option in Pdb2gmx:
Options Description-f Input-o Output-p Output for topology file-i Output-n Output-q output-ff To assign force field
� The program will ask to select a force field:
� Select the Force Field:
0: GROMOS96 43a1 force field.1: GROMOS96 43b1 vacuum force field.2: GROMOS96 43a2 force field (improved alkane
dihedrals).3: OPLS-AA/L all-atom force field (for aminoacid
dihedrals).4: Gromacs force field (gmx) with hydrogens forNMR.5: Encad all-atom force field, using scaled-down
vacuum charges.6: Encad all-atom force field, using full solvent charges.
Gmx force field was used (for NMR)
Outputs produced by this command…………………..Once the selection of the force field is done, three kinds of output files are produced:
1. PDB files2. the generated topology (.top) file3. gromos (.gro) file
• dsmt3.gro:It looks a lot like the original pdb file, containing the same information regarding
the positions of the atoms, but the layout is different, hydrogens have been added and units have been converted to nm.
• dsmt3.top:This file contains the information on the atom names, types, masses and charges, as
well as a description of bonds, angles, dihedrals, etc.
Forcefieldsusedusingduringmoleculardynamics� force field (also called a forcefield) refers to the functional form and
parameter sets used to describe the potential energy of a system of particles ( in this case the atoms and the residues).
� As protein models consist of hundreds or thousands of atoms the only feasible methods of computing systems of such size are molecular mechanics calculations.
� A Force- Field is assigned to each atom in the protein. This figure is a schematic representation of the four key contributions to a molecular mechanics force field: bond stretching, angle bending, torsional terms and non-bonded interactions.
Bond Stretching Energy
Bending Energy
Torsion Energy
Non-bonded Energy
Energy =
Stretching Energy +
Bending Energy+
Torsion Energy +
Non-Bonded Interaction Energy
Types of force fields
1. All-atom force fields - provide parameters for every atom in a system, including hydrogen.
2. united-atom force fields - treat the hydrogen and carbon atoms in methyl and methylene groups as a single interaction center.
3. Coarse-grained force fields - which are frequently used in long-time simulations of proteins.
These equations together with the data (parameters) required to describe the behavior of different kinds of atoms and bonds, is called a force-field.
Batchfileforexecutingmoleculardynamics
Done
Editconf• editconf puts .gro file into a box• The box can be modified with options -box, -d and -angles. Both -box and –d will center
the system in the box.• Option -bt determines the box type: cubic is a rectangular box with all sides equal
dodecahedron represents a rhombic dodecahedron and octahedron is a truncated octahedron.
• With -d and cubic, dodecahedron or octahedron boxes, the dimensions are set to the diameter of the system.
Options in Editconf
Option Description
-f Input
-n Output
-o Output
-bt For box type
-d Distance between the solute and the box
Batchfileforexecutingmoleculardynamics
Done
Done
Genbox……………….Genbox can do one of 2 things:1) Generate a box of solvent.2) Solvate a solute configuration, eg. a protein, in a bath of solvent molecules. Specify -cp
(solute) and -cs (solvent). The box specified in the solute coordinate file (-cp) is used.Options in Genbox
• Here the solvent of 8M urea (in the form of the denaturant) was prepared with the proteinacting as a solute (protein dissolved in 8M urea).
• The solvent file for urea was in the the form of urea+water.gro
Options Description
-cp Input
-cs Input
-o Output
-p Output
Batchfileforexecutingmoleculardynamics
Done
DoneDone
� Step 2: Energy Minimization• The structure is now complete (hydrogens have been added) and a topology
file has been created. • However, there may be local strain in the protein, due to the generation of
the hydrogens, and bad Van der Waals contacts may exist, caused by particles that are too close.
• The strain has to be removed by energy minimization of the structure. This can be done with the program 'mdrun', which is the MD program. Mdrun uses a single .tpr file as input, which is generated by combining the topology (aki.top), structure (aki.gro) and parameter files (minim.mdp).
• grompp also reads parameters for the mdrun (eg. number of MD steps, time step, cut-off).
• To generate the .tpr file the program grompp has to be used.� A description of grompp can be obtained by giving the command:
grompp -h
Options in GromppOption Description-f grompp input file with MD parameters
-po grompp input file with MD parameters
-c Input
-r Input
-n Input
-p Input the topology file
-pp Preprocess and output the toplology file
-o Output
-t Input the trajectory file
-e Input the energy file
-np Generate the status file
Batchfileforexecutingmoleculardynamics
Done
DoneDone
Done
Mdrun• The mdrun program is the main computational chemistry engine within GROMACS.• It performs Molecular Dynamics simulations, Brownian Dynamics and Langevin
Dynamics as well as Conjugate Gradient or Steepest Descents energy minimization.Principle
The mdrun program reads the run input file (-s)
Distributes the topology over nodes.
The coordinates are passed around, so that computations can begin.
A neighborlist is made, then the forces are computed.
The forces are globally summed, and positions are updated.
If necessary shake is performed to constrain bond lengths and/or bond angles.
• Temperature and Pressure can be controlled using weak coupling to a bath.
• Option in Mdrun
Option Descriptionnp Number of nodes used
s Input
o Output
c Output
e Output
g Output
x Output
� The energy minimization may take some time, depending on the CPU in and the load of the computer.
� The trajectory file is not very important in energy minimizations, but the generated structure file (minimized.gro) will serve as input for the simulation.
� During the minimization the potential energy decreases. A plot from the energy over time can be made from the minim_ener.edr file using g_energy command.
� Simply make a plot from the .edr file by executing:
� This will display something like the following:
g_energy -f dsmt3-em_ener.edr -o dsmt3-em_ener.xvg
� Select the property you want by typing the name, e.g. Potential, which codes for potential energy and then press return and another return to quit.
� The program g_energy produces a .xvg graph, which can be viewed and edited with xmgrace (prgram which makes graphs in gromacs) :
� GRAPH
xmgrace -nxy dsmt3-em_ener.xvg
Batchfileforexecutingmoleculardynamics
Done
DoneDone
DoneDone
Position Restrained MD• molecular dynamics of the water molecules of water molecules are done, and position of the peptide
is kept fixed. This is called position restrained (PR) MD.
• Position Restrained MD keeps the peptide fixed and lets all water molecules equilibrate around the peptide in order to fill holes, etc., which were not filled by the genbox program.
• It is first necessary to pre-process the input files to generate the binary topology. The input files are: the topology file, the structure file (output of the EM) and a parameter file.
• By default, the system was split into two groups - Protein and SOL(vent), to put position restraints on all the atoms of the peptide.
• The parameter file (.mdp extension) contains information about the PR-MD such as step size, number of steps, temperature, etc. This parameter file also tells GROMACS what kind of simulation should be performed
preprocess Position restrained MD fileTopology file of the
proteinEnergy minimized
binary file from previous step
Position restrained MD file with energy minimized protein in
it
Batchfileforexecutingmoleculardynamics
Done
DoneDone
DoneDone
Done
Run MD
Position restrained MD file of
protein
Output of trajectory of PRMD file
Output in gromacsformat
Energy after position
restrained MD
Batchfileforexecutingmoleculardynamics
Done
DoneDone
DoneDone
Done
done
• These two commands are used to give full moloeculardynamics, where none of the systems are fixed.
• Both the proteins and the solvent is subjected to motion until both of them become stable at a particular point, which is retained as the final output.
• g_filter frequency filters trajectories, useful for making smooth movies. Many of the trajectories are filtered and in all only 10 trajectories are kept. These can be read by pymol(protein visualization software).
Batchfileforexecutingmoleculardynamics
Done
DoneDone
DoneDone
Done
done
DoneDone
� Pymol cannot read Gromacs xtc trajectories , and it is better to remove the solvent in the trajectory to concentrate on the proteins.
� This is easy to fix by using another Gromacs program to convert the trajectory to PDB format and only select one group:
� An output group is asked to select and protein is selected in that.
� The trajectories of protein after MD is visualized in Pymol by giving a command:
Pymol dsmt3-finaltraj.pdb
Batchfileforexecutingmoleculardynamics
Done
DoneDone
DoneDone
Done
done
DoneDone
Done
Acknowledgements………………………
Prof. Ramkrishna Hosur (TIFR, Mumbai)Dinesh Kumar (TIFR, Mumbai)Dr. Ganapathy Subramanian (NCL, Pune)
Special thanks to NMR Facility, TIFR, Mumbai
Thank You for the patience