Molecular dynamics and Simulations Abhilash Kannan, TIFR (mumbai)


Page 1: Molecular dynamics and Simulations

Molecular Dynamics and Simulations

Abhilash Kannan, TIFR (Mumbai)

Page 2: Molecular dynamics and Simulations

Molecular Dynamics and Simulations

• Molecular dynamics (MD) is a form of computer simulation in which atoms and molecules are allowed to interact for a period of time.

• Because molecular systems generally consist of a vast number of particles, it is impossible to find the properties of such complex systems analytically; MD simulation circumvents this problem by using numerical methods.

• It represents an interface between laboratory experiments and theory.

• It can be understood as a "virtual experiment".

Page 3: Molecular dynamics and Simulations

Purpose of MD for SUMO

• Proteins in solution are considered to be dynamic.

• It is difficult to study their motions, behavior, and structural flexibility in solution.

• The structure of small proteins can be solved and studied by the conventional technique of X-ray crystallography.

• X-ray crystallography requires a well-ordered, periodic crystal, which is very difficult to obtain for non-crystalline structures.

• Molecular dynamics simulations can predict the states of a protein in solution and save these states in the form of a trajectory.

• MD can predict the movement of large proteins in solution, which X-ray crystallography cannot capture.

• MD can approximate the conditions under which a protein actually exists.

• Structures obtained after MD simulation can be regarded as energy-minimized and geometrically optimized, allowing them to be used in further work such as NMR, docking, and protein-ligand interaction studies.

Page 4: Molecular dynamics and Simulations

Brief Methodology

1. Use physics to find the potential energy between all pairs of atoms.
2. Move atoms to the next state.
3. Repeat.

Energy Function

• Describes the interaction energies of all atoms and molecules in the system.

• Always an approximation.

Closer to real physics --> more realistic, but more computation time (i.e. smaller time steps and more interactions increase accuracy).
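The three-step loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scheme used by any of the programs in this talk: the energy function is a plain Lennard-Jones pair potential in reduced units, the integrator is velocity Verlet, and the names `lj_energy_forces` and `run_md` are made up for the example.

```python
import numpy as np


def lj_energy_forces(pos, epsilon=1.0, sigma=1.0):
    """Step 1: potential energy and forces summed over all pairs of atoms."""
    energy = 0.0
    forces = np.zeros_like(pos)
    n_atoms = len(pos)
    for i in range(n_atoms - 1):
        for j in range(i + 1, n_atoms):
            rij = pos[i] - pos[j]
            r2 = float(np.dot(rij, rij))
            sr6 = (sigma * sigma / r2) ** 3
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # Force on atom i is -dU/dr along rij; atom j feels the opposite.
            fscale = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            forces[i] += fscale * rij
            forces[j] -= fscale * rij
    return energy, forces


def run_md(pos, vel, mass=1.0, dt=0.001, n_steps=100):
    """Steps 2 and 3: move the atoms (velocity Verlet) and repeat."""
    pos = pos.astype(float).copy()
    vel = vel.astype(float).copy()
    _, forces = lj_energy_forces(pos)
    trajectory = [pos.copy()]
    for _ in range(n_steps):
        vel += 0.5 * dt * forces / mass      # half-step velocity update
        pos += dt * vel                      # move atoms to the next state
        _, forces = lj_energy_forces(pos)    # recompute forces
        vel += 0.5 * dt * forces / mass      # second half-step
        trajectory.append(pos.copy())
    return pos, vel, trajectory


if __name__ == "__main__":
    start = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]])
    final_pos, final_vel, traj = run_md(start, np.zeros_like(start))
    print(len(traj), "frames; final energy", lj_energy_forces(final_pos)[0])
```

The all-pairs loop is the expensive part, which is why a more detailed energy function or a smaller time step costs more computation time.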

Page 5: Molecular dynamics and Simulations

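Scale in Simulations

[Figure: simulation methods arranged by length and time scale. Quantum chemistry (Hψ = Eψ), molecular dynamics (F = ma), Monte Carlo (exp(-ΔE/kT)), and mesoscale/continuum (domain) methods. Length-scale axis: 10^-10 m, 10^-8 m, 10^-6 m, 10^-4 m. Time-scale axis: 10^-12 s, 10^-8 s, 10^-6 s.]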

Page 6: Molecular dynamics and Simulations

Molecular dynamics on proteins

• Although normally represented as static structures, proteins are in fact dynamic.

• Most experimental properties, for example, measure a time average or an ensemble average over the range of possible configurations the molecule can adopt.

• One way to investigate the range of accessible configurations is to simulate the motions or dynamics of a molecule numerically. This can be done by computing a trajectory, a series of molecular configurations as a function of time, by the simultaneous integration of Newton's equations of motion.

Page 7: Molecular dynamics and Simulations

So what exactly is Molecular Dynamics?

• Here, it is the process of setting the protein in internal motion by raising the temperature of the system and then cooling it rapidly over a very short time scale.

• Under these conditions, steric clashes and strained bonds between the amino acid residues and the peptides are removed or relaxed.

• It yields stable, energy-minimized conformations of the protein.

• While doing so, it computes many different frames (a trajectory) of the same protein.

Page 8: Molecular dynamics and Simulations

[Figures: MD of protein in vacuum; MD of protein in water]

Page 9: Molecular dynamics and Simulations

Molecular Dynamics of SUMO proteins

Page 10: Molecular dynamics and Simulations

SUMO proteins

• Small Ubiquitin-like Modifier (SUMO) proteins are a family of small proteins that are covalently attached to and detached from other proteins in cells to modify their function.

• The attachment of SUMO to a target protein is known as SUMOylation.

• SUMOylation is a post-translational modification involved in various cellular processes such as transcriptional regulation, apoptosis, and protein stability.

• SUMO is similar to ubiquitin, and SUMOylation is directed by an enzymatic cascade analogous to that involved in ubiquitination. In contrast to ubiquitin, SUMO is not used to tag proteins for degradation.

Page 11: Molecular dynamics and Simulations

Structure schematic of human SUMO protein

NMR structure of SUMO: the backbone of the protein is represented as a ribbon, highlighting secondary structure; N-terminus in blue, C-terminus in red.

Page 12: Molecular dynamics and Simulations

Function of SUMO

• SUMO modification of proteins has many functions. Among the most frequent and best studied are protein stability, nuclear-cytosolic transport, and transcriptional regulation.

• Typically, only a small fraction of a given protein is SUMOylated, and this modification is rapidly reversed by the action of deSUMOylating enzymes. The SUMO-1 modification of RanGAP1 (the first identified SUMO substrate) leads to its trafficking from the cytosol to the nuclear pore complex.

• SUMO modification of a protein can also lead to its movement from the centrosome to the nucleus.

Page 13: Molecular dynamics and Simulations

Structure

• There are 3 confirmed SUMO isoforms in humans: SUMO-1, SUMO-2 and SUMO-3. SUMO-2 and SUMO-3 show a high degree of similarity to each other and are distinct from SUMO-1.

• SUMO proteins are small; most are around 88 to 100 amino acids in length and 12 kDa in mass. The exact length and mass vary between SUMO family members and depend on which organism the protein comes from.

• Although SUMO has very little homology with ubiquitin at the amino acid level, it has a nearly identical structural fold.

• SUMO-1 is a globular protein with both ends of the amino acid chain sticking out of the protein's centre. The spherical core consists of an alpha helix and a beta sheet.

The SUMO protein used in this work was taken from Drosophila melanogaster.

Page 14: Molecular dynamics and Simulations

Giving Dynamics to the protein

Step 1: Generation of structures.

Step 2: Performing molecular dynamics on each of the topologies.

Step 3: Recording the potential energy changes in the protein during dynamics.

Step 4: Clustering of the best minimized structures.

Programs/software used:
• Cyana
• NAMD/VMD
• VEGA ZZ
• GROMACS

Page 15: Molecular dynamics and Simulations

Generation of structures using Cyana

• For the sake of convenience and ease of dynamics, the SUMO protein was divided into five fragments.

• The fragments were chosen based on their propensity to form secondary structures.

Fragment      Residue numbers
Fragment 1    1-12
Fragment 2    11-32
Fragment 3    31-53
Fragment 4    52-72
Fragment 5    71-88
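As a small illustration of the table above, the sketch below cuts a sequence into the five overlapping fragments. The residue ranges come from the table; the 88-residue sequence is a made-up placeholder (not the real SUMO sequence), and the helper names are hypothetical.

```python
# Residue ranges from the fragment table (1-based, inclusive).
FRAGMENTS = {1: (1, 12), 2: (11, 32), 3: (31, 53), 4: (52, 72), 5: (71, 88)}


def fragment_sequence(sequence, start, end):
    """Return residues start..end of a sequence (1-based, inclusive)."""
    return sequence[start - 1:end]


def overlaps(fragments):
    """Number of residues shared by each pair of consecutive fragments."""
    keys = sorted(fragments)
    return [fragments[a][1] - fragments[b][0] + 1 for a, b in zip(keys, keys[1:])]


if __name__ == "__main__":
    # Placeholder sequence of 88 residues, for illustration only.
    placeholder = ("ACDEFGHIKLMNPQRSTVWY" * 5)[:88]
    for number, (start, end) in FRAGMENTS.items():
        piece = fragment_sequence(placeholder, start, end)
        print(f"Fragment {number}: residues {start}-{end}, length {len(piece)}")
    print("Overlap between neighbours:", overlaps(FRAGMENTS))
```

Each fragment shares two residues with its neighbour, which gives a common segment for relating adjacent fragments later.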

Page 16: Molecular dynamics and Simulations

• First, it was necessary to make 1000 random conformers (topologies) for each of the five fragments.

• The program used for this structure generation was CYANA, which is a Linux-based program.

• The sequence of each of the five fragments of the SUMO protein was given to the program, which was told to generate 1000 random topologies.

• Dynamics and annealing conditions were applied to obtain the energy of these 1000 random structures.

• The program was told to select the 20 best energy-minimized structures.

• These 20 topologies can be viewed using software such as Pymol, Molmol, or VMD.

• The dynamics of the protein was carried out in vacuum without applying any constraints; the resulting dynamics can be played back in the above-mentioned software and saved.

Page 17: Molecular dynamics and Simulations

Files used in Cyana

• First, a .CCO file must be created for each of the five fragments.

.CCO file:

 1 MET H HA 6.7277 3.20E+00
 2 SER H HA 6.9968 1.20E+00
 3 ASP H HA 6.4720 1.20E+00
 4 GLU H HA 6.8444 1.20E+00
 5 LYS H HA 6.9625 1.20E+00
 .......
53 THR H HA 7.5359 2.20E+00

Page 18: Molecular dynamics and Simulations

• Init.cya file

A .cya file is a batch file which contains a set of commands.

Rmsd range := 31.....53
Cyana.lib
Read seq third.seq
Swap = 0

Page 19: Molecular dynamics and Simulations

Batch file

• 'Seed' asks the program to generate 1000 topologies.

• The last two commands create the 20 best topologies.

Page 20: Molecular dynamics and Simulations

Performing molecular dynamics on each of the topologies using GROMACS

Page 21: Molecular dynamics and Simulations

HIGHLIGHTS

• Generally 3 to 10 times faster than other molecular dynamics programs.

• Very user-friendly: issues clear error messages, requires no scripting language to run the programs, prints out the progress of the program that is running, etc.

• Allows the trajectory data to be stored in a compact way.

• Gromacs provides a basic trajectory data viewer; xmgr or Grace may also be used to analyze the results.

• Files from earlier versions of Gromacs may be used in the latest Gromacs, version 3.1.

To run a simulation, several things are needed:
1. A file containing the coordinates for all atoms.
2. Information on the interactions (bond angles, charges, van der Waals).
3. Parameters to control the simulation.

Page 22: Molecular dynamics and Simulations

The exercise is divided into four sections, corresponding to the actual steps in an MD simulation.

1. Conversion of the pdb structure file to a Gromacs structure file, with the simultaneous generation of a descriptive topology file.

2. Energy minimization of the structure to release strain.

3. Running a full simulation.

4. Analyzing results.

Page 23: Molecular dynamics and Simulations

File Formats

• PDB file: the format used by the Brookhaven Protein Data Bank.

[Figure: annotated PDB excerpt showing the atom, residue, residue number, and X, Y, Z coordinate columns]
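To make the annotated columns concrete, here is a minimal sketch of reading one ATOM record using the fixed column positions of the PDB format. The function name `parse_pdb_atom` and the example record (built from made-up coordinates) are illustrative assumptions, not data from this work.

```python
def parse_pdb_atom(line):
    """Extract the annotated fields from one fixed-width PDB ATOM record."""
    return {
        "serial": int(line[6:11]),        # atom serial number, columns 7-11
        "atom": line[12:16].strip(),      # atom name, columns 13-16
        "residue": line[17:20].strip(),   # residue name, columns 18-20
        "chain": line[21],                # chain identifier, column 22
        "res_no": int(line[22:26]),       # residue number, columns 23-26
        "x": float(line[30:38]),          # coordinates in Angstrom,
        "y": float(line[38:46]),          # columns 31-54
        "z": float(line[46:54]),
    }


# Build an example record with a format string so the columns line up exactly.
# The values are invented for illustration.
EXAMPLE = "ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}".format(
    1, " N", "MET", "A", 1, 27.340, 24.430, 2.614
)

if __name__ == "__main__":
    print(EXAMPLE)
    print(parse_pdb_atom(EXAMPLE))
```

Because the format is column-based rather than whitespace-separated, slicing by position is safer than `split()` when fields run together.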

Page 24: Molecular dynamics and Simulations

Topology (*.top) file (ASCII): contains all the force-field parameters.

Page 25: Molecular dynamics and Simulations

Gromacs (*.gro): molecular structure file in the Gromos87 format (Gromacs format).

[Figure: annotated .gro excerpt showing the x, y, and z positions in nm and the x, y, and z velocities in nm/ps]
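The position and velocity columns described above can be read with a short fixed-width parser. This is a sketch assuming the standard .gro atom-line layout (residue number, residue name, atom name, atom number, then positions with three decimals and optional velocities with four); `parse_gro_atom` and the example values are made up for illustration.

```python
GRO_FORMAT = "{:5d}{:<5s}{:>5s}{:5d}{:8.3f}{:8.3f}{:8.3f}{:8.4f}{:8.4f}{:8.4f}"


def parse_gro_atom(line):
    """Parse one atom line of a .gro file (positions in nm, velocities in nm/ps)."""
    atom = {
        "res_no": int(line[0:5]),
        "res_name": line[5:10].strip(),
        "atom_name": line[10:15].strip(),
        "atom_no": int(line[15:20]),
        "position_nm": tuple(float(line[20 + 8 * i:28 + 8 * i]) for i in range(3)),
    }
    # Velocities are optional; they occupy three further 8-character columns.
    if len(line.rstrip()) >= 68:
        atom["velocity_nm_per_ps"] = tuple(
            float(line[44 + 8 * i:52 + 8 * i]) for i in range(3)
        )
    return atom


# Invented example line, generated from the format so the widths are exact.
EXAMPLE_GRO = GRO_FORMAT.format(1, "MET", "N", 1, 2.734, 2.443, 0.261, 0.1, -0.2, 0.05)

if __name__ == "__main__":
    print(EXAMPLE_GRO)
    print(parse_gro_atom(EXAMPLE_GRO))
```

Note that a PDB coordinate in Angstrom becomes a .gro coordinate in nm by dividing by 10.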

Page 26: Molecular dynamics and Simulations

*.tpr file: contains the starting structure of the simulation, the molecular topology and all the simulation parameters; binary format.

Page 27: Molecular dynamics and Simulations

Trajectory (*.trr) file: contains the trajectory data for the simulation; binary format. It contains all the coordinates, velocities, forces and energies.

Page 28: Molecular dynamics and Simulations

Different trajectories

[Figure: snapshots of the protein at T=1, T=2, T=3, T=4, T=5, ..., T=N]

Page 29: Molecular dynamics and Simulations

*.xvg file: file format that is read by Grace (formerly called Xmgr), which is a plotting tool for the X Window System.

[Figure: example plot of X vs Y]

Page 30: Molecular dynamics and Simulations

*.mdp file: allows the user to set up specific parameters for all the calculations that Gromacs performs.

[Figure annotations on the .mdp listing: recording every 0.002 ps; number of steps for MD = 500000]
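A quick arithmetic check of what these two numbers imply, assuming the 0.002 ps annotated on the slide is the integration time step (the slide does not say so explicitly): 500000 steps of 0.002 ps cover 1000 ps, i.e. 1 ns. The function name below is hypothetical.

```python
def total_simulation_time_ps(dt_ps, n_steps):
    """Total simulated time in picoseconds for a given time step and step count."""
    return dt_ps * n_steps


if __name__ == "__main__":
    # Values from the slide; treating 0.002 ps as the time step is an assumption.
    total_ps = total_simulation_time_ps(0.002, 500000)
    print(f"{total_ps:.1f} ps = {total_ps / 1000.0:.3f} ns")
```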

Page 31: Molecular dynamics and Simulations

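[Figure annotations on the .mdp listing: Coulomb interactions within 1.4 nm; van der Waals interactions within a radius of 1.4 nm. GROMACS .mdp distances are given in nm, not Å.]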

Page 32: Molecular dynamics and Simulations

[Figure annotation: takes into account interactions from all the bonds]

Page 33: Molecular dynamics and Simulations

em.mdp file: sets the parameters for running energy minimizations.

[Figure annotations on the em.mdp listing: integrator; iterations; neighbor list; constraints; forces and potential]

Page 34: Molecular dynamics and Simulations


Page 35: Molecular dynamics and Simulations

Batch file for executing molecular dynamics

Page 36: Molecular dynamics and Simulations

Step 1: Conversion of the PDB File

• Each of the topologies saved in the PDB format was used as an input file for MD simulations with Gromacs.

• It is first necessary to convert it to the gromos file type (*.gro). The original data in the pdb file is often incomplete; carbon-bound hydrogens are generally omitted.

• The conversion program pdb2gmx checks every residue in the structure file against a database and adds all hydrogens. In the conversion process it also creates a topology file, with all the connections between the atoms listed.

• pdb2gmx can be used by simply typing it at the prompt. For example, the following command lists the options that this conversion program accepts:

pdb2gmx -h

Page 37: Molecular dynamics and Simulations

pdb2gmx -h

• This program reads a pdb file, reads some database files, adds hydrogens to the molecules and generates coordinates in Gromacs (Gromos) format and a topology in Gromacs format.

• This conversion program contains many built-in force fields. The required force field has to be selected.

Options in pdb2gmx:

Option   Description
-f       Input
-o       Output
-p       Output for topology file
-i       Output
-n       Output
-q       Output
-ff      To assign force field

Page 38: Molecular dynamics and Simulations

• The program will ask the user to select a force field:

Select the Force Field:

0: GROMOS96 43a1 force field.
1: GROMOS96 43b1 vacuum force field.
2: GROMOS96 43a2 force field (improved alkane dihedrals).
3: OPLS-AA/L all-atom force field (for aminoacid dihedrals).
4: Gromacs force field (gmx) with hydrogens for NMR.
5: Encad all-atom force field, using scaled-down vacuum charges.
6: Encad all-atom force field, using full solvent charges.

The gmx force field (option 4, with hydrogens for NMR) was used.

Page 39: Molecular dynamics and Simulations

Outputs produced by this command

Once the force field has been selected, three kinds of output files are produced:

1. PDB files
2. The generated topology (.top) file
3. The gromos (.gro) file

• dsmt3.gro: It looks a lot like the original pdb file, containing the same information regarding the positions of the atoms, but the layout is different, hydrogens have been added and units have been converted to nm.

• dsmt3.top: This file contains the information on the atom names, types, masses and charges, as well as a description of bonds, angles, dihedrals, etc.

Page 40: Molecular dynamics and Simulations

Force fields used during molecular dynamics

• A force field (also called a forcefield) refers to the functional form and parameter sets used to describe the potential energy of a system of particles (in this case the atoms and the residues).

• As protein models consist of hundreds or thousands of atoms, the only feasible methods of computing systems of such size are molecular mechanics calculations.

• Force-field parameters are assigned to each atom in the protein. The figure is a schematic representation of the four key contributions to a molecular mechanics force field: bond stretching, angle bending, torsional terms and non-bonded interactions.

Page 41: Molecular dynamics and Simulations

Bond Stretching Energy

Page 42: Molecular dynamics and Simulations

Bending Energy

Page 43: Molecular dynamics and Simulations

Torsion Energy

Page 44: Molecular dynamics and Simulations

Non-bonded Energy

Page 45: Molecular dynamics and Simulations

Energy = Stretching Energy + Bending Energy + Torsion Energy + Non-Bonded Interaction Energy

Types of force fields

1. All-atom force fields: provide parameters for every atom in a system, including hydrogen.

2. United-atom force fields: treat the hydrogen and carbon atoms in methyl and methylene groups as a single interaction center.

3. Coarse-grained force fields: use a cruder representation and are frequently used in long-time simulations of proteins.

The energy equations, together with the data (parameters) required to describe the behavior of different kinds of atoms and bonds, are called a force field.
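The four contributions to the energy can be written down with commonly used functional forms. The sketch below assumes harmonic bond and angle terms, a periodic cosine torsion, and Lennard-Jones plus Coulomb non-bonded terms; individual force fields differ in the exact forms and parameters, so this is illustrative rather than the expression used by any force field named in this talk. All function names are made up, and the Coulomb constant is an approximate value in kJ mol^-1 nm e^-2.

```python
import math

COULOMB_CONST = 138.935  # approximate electric conversion factor, kJ mol^-1 nm e^-2


def bond_energy(r, r0, k_b):
    """Bond stretching: harmonic spring around the equilibrium length r0."""
    return 0.5 * k_b * (r - r0) ** 2


def angle_energy(theta, theta0, k_theta):
    """Angle bending: harmonic in the deviation from the equilibrium angle."""
    return 0.5 * k_theta * (theta - theta0) ** 2


def torsion_energy(phi, k_phi, multiplicity, phase):
    """Torsion: periodic cosine term for rotation about a bond."""
    return k_phi * (1.0 + math.cos(multiplicity * phi - phase))


def nonbonded_energy(r, epsilon, sigma, q_i, q_j):
    """Non-bonded: Lennard-Jones (van der Waals) plus Coulomb."""
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    coulomb = COULOMB_CONST * q_i * q_j / r
    return lennard_jones + coulomb


def total_energy(bonds, angles, torsions, pairs):
    """Energy = stretching + bending + torsion + non-bonded, each given as argument tuples."""
    return (
        sum(bond_energy(*b) for b in bonds)
        + sum(angle_energy(*a) for a in angles)
        + sum(torsion_energy(*t) for t in torsions)
        + sum(nonbonded_energy(*p) for p in pairs)
    )
```

A force field, in this picture, is the set of functions plus the table of `r0`, `k_b`, `epsilon`, charges and so on for every atom and bond type.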

Page 46: Molecular dynamics and Simulations

Batch file for executing molecular dynamics

[Figure: batch file listing with 1 step marked as done]

Page 47: Molecular dynamics and Simulations

Editconf

• editconf puts the .gro file into a box.

• The box can be modified with the options -box, -d and -angles. Both -box and -d will center the system in the box.

• Option -bt determines the box type: cubic is a rectangular box with all sides equal, dodecahedron represents a rhombic dodecahedron, and octahedron is a truncated octahedron.

• With -d and cubic, dodecahedron or octahedron boxes, the dimensions are set to the diameter of the system (largest distance between atoms) plus twice the specified distance.

Options in Editconf

Option   Description
-f       Input
-n       Input (index file)
-o       Output
-bt      Box type
-d       Distance between the solute and the box
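The effect of -d on a cubic box can be illustrated with a few lines of Python. This is a sketch of the rule as I understand it (edge length = largest interatomic distance plus twice the solute-box distance), not a reimplementation of editconf; the function name and the coordinates are invented.

```python
import numpy as np


def cubic_box_edge(coords_nm, d_nm):
    """Edge of a cubic box: system diameter plus twice the solute-box distance."""
    coords = np.asarray(coords_nm, dtype=float)
    # All pairwise difference vectors, then the largest distance between atoms.
    diffs = coords[:, None, :] - coords[None, :, :]
    diameter = np.sqrt((diffs ** 2).sum(axis=-1)).max()
    return diameter + 2.0 * d_nm


if __name__ == "__main__":
    # Invented coordinates (nm) standing in for a small solute.
    solute = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [1.0, 1.0, 1.0]]
    print("cubic box edge (nm):", cubic_box_edge(solute, d_nm=1.0))
```

The pairwise distance matrix is quadratic in the number of atoms, which is fine for a sketch but not how one would handle a large solvated system.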

Page 48: Molecular dynamics and Simulations
Page 49: Molecular dynamics and Simulations

Batch file for executing molecular dynamics

[Figure: batch file listing with 2 steps marked as done]

Page 50: Molecular dynamics and Simulations

Genbox

Genbox can do one of 2 things:
1) Generate a box of solvent.
2) Solvate a solute configuration, e.g. a protein, in a bath of solvent molecules. Specify -cp (solute) and -cs (solvent). The box specified in the solute coordinate file (-cp) is used.

• Here, a solvent of 8 M urea (acting as the denaturant) was prepared, with the protein acting as the solute (protein dissolved in 8 M urea).

• The solvent file for urea was in the form of urea+water.gro.

Options in Genbox

Option   Description
-cp      Input
-cs      Input
-o       Output
-p       Output
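To get a feel for what 8 M urea means at the scale of a simulation box, the sketch below converts a molar concentration into a number of molecules for a given box volume. The 5 nm cubic box is a hypothetical size chosen for the example; it is not taken from this work, and the function name is made up.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
LITRES_PER_NM3 = 1.0e-24   # 1 nm^3 = 1e-27 m^3 = 1e-24 L


def molecules_for_molarity(molarity, box_volume_nm3):
    """Number of solute molecules needed to reach a molar concentration in a box."""
    return molarity * AVOGADRO * box_volume_nm3 * LITRES_PER_NM3


if __name__ == "__main__":
    edge_nm = 5.0  # hypothetical cubic box edge
    n_urea = molecules_for_molarity(8.0, edge_nm ** 3)
    print(f"8 M urea in a {edge_nm} nm cube needs about {round(n_urea)} urea molecules")
```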

Page 51: Molecular dynamics and Simulations
Page 52: Molecular dynamics and Simulations

Batch file for executing molecular dynamics

[Figure: batch file listing with 3 steps marked as done]

Page 53: Molecular dynamics and Simulations

Step 2: Energy Minimization

• The structure is now complete (hydrogens have been added) and a topology file has been created.

• However, there may be local strain in the protein, due to the generation of the hydrogens, and bad van der Waals contacts may exist, caused by particles that are too close.

• The strain has to be removed by energy minimization of the structure. This can be done with the program 'mdrun', which is the MD program. mdrun uses a single .tpr file as input, which is generated by combining the topology (aki.top), structure (aki.gro) and parameter (minim.mdp) files.

• grompp also reads parameters for mdrun (e.g. number of MD steps, time step, cut-off).

• To generate the .tpr file, the program grompp has to be used.

• A description of grompp can be obtained by giving the command:

grompp -h
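The way the three input files are combined into one .tpr file and then handed to mdrun can be sketched as two command lines assembled in Python. Only flags listed in the option tables of this talk are used (-f, -c, -p, -o for grompp; -s, -c for mdrun). The input file names are those mentioned above; the output name `em.tpr` is a hypothetical choice for the example. The commands are only printed, not executed.

```python
import shlex


def grompp_command(mdp, structure, topology, tpr):
    """Command that preprocesses parameters, structure and topology into a .tpr file."""
    return ["grompp", "-f", mdp, "-c", structure, "-p", topology, "-o", tpr]


def mdrun_command(tpr, out_structure):
    """Command that runs the minimization from the .tpr file."""
    return ["mdrun", "-s", tpr, "-c", out_structure]


if __name__ == "__main__":
    # "em.tpr" is a made-up output name; the other names come from the slides.
    steps = [
        grompp_command("minim.mdp", "aki.gro", "aki.top", "em.tpr"),
        mdrun_command("em.tpr", "minimized.gro"),
    ]
    for step in steps:
        print(shlex.join(step))
```

Keeping each command as a list of arguments makes it easy to pass to `subprocess.run` later without shell-quoting problems.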

Page 54: Molecular dynamics and Simulations

Options in Grompp

Option   Description
-f       grompp input file with MD parameters
-po      grompp output file with the processed MD parameters
-c       Input
-r       Input
-n       Input
-p       Input the topology file
-pp      Preprocess and output the topology file
-o       Output
-t       Input the trajectory file
-e       Input the energy file
-np      Generate the status file

Page 55: Molecular dynamics and Simulations

Batch file for executing molecular dynamics

[Figure: batch file listing with 4 steps marked as done]

Page 56: Molecular dynamics and Simulations

Mdrun

• The mdrun program is the main computational chemistry engine within GROMACS.

• It performs Molecular Dynamics simulations, Brownian Dynamics and Langevin Dynamics as well as Conjugate Gradient or Steepest Descents energy minimization.

Principle

1. The mdrun program reads the run input file (-s).
2. It distributes the topology over the nodes.
3. The coordinates are passed around, so that computations can begin.
4. A neighbor list is made, then the forces are computed.
5. The forces are globally summed, and positions are updated.
6. If necessary, SHAKE is performed to constrain bond lengths and/or bond angles.

• Temperature and pressure can be controlled using weak coupling to a bath.
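Weak coupling to a temperature bath is usually associated with the Berendsen scheme, in which the velocities are rescaled at each step by a factor close to 1. The sketch below shows that scaling factor; it assumes the Berendsen form and uses invented example values, and it is not extracted from the GROMACS source.

```python
import math


def berendsen_scale(t_current, t_target, dt, tau):
    """Velocity scaling factor for weak coupling to a bath at t_target.

    dt is the time step and tau the coupling time constant (same units).
    """
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_current - 1.0))


if __name__ == "__main__":
    # Invented values: system at 290 K relaxing towards a 300 K bath.
    temperature = 290.0
    for step in range(5):
        lam = berendsen_scale(temperature, 300.0, dt=0.002, tau=0.1)
        # Kinetic temperature scales with the square of the velocities.
        temperature *= lam * lam
        print(step, round(temperature, 3))
```

A larger `tau` gives a gentler correction, so the system drifts towards the bath temperature more slowly.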

Page 57: Molecular dynamics and Simulations

• Options in Mdrun

Option   Description
-np      Number of nodes used
-s       Input
-o       Output
-c       Output
-e       Output
-g       Output
-x       Output

Page 58: Molecular dynamics and Simulations

• The energy minimization may take some time, depending on the CPU and the load of the computer.

• The trajectory file is not very important in energy minimizations, but the generated structure file (minimized.gro) will serve as input for the simulation.

• During the minimization the potential energy decreases. A plot of the energy over time can be made from the minim_ener.edr file using the g_energy command.

• Simply make a plot from the .edr file by executing:

g_energy -f dsmt3-em_ener.edr -o dsmt3-em_ener.xvg

• This will display a list of the energy terms that can be selected.

Page 59: Molecular dynamics and Simulations

• Select the property you want by typing its name, e.g. Potential, which stands for the potential energy; then press return, and return once more to quit.

• The program g_energy produces an .xvg graph, which can be viewed and edited with xmgrace (the program that draws graphs for Gromacs):

xmgrace -nxy dsmt3-em_ener.xvg

[Figure: graph]
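If xmgrace is not available, the .xvg file can also be read directly, since it is plain text: lines starting with `#` are comments, lines starting with `@` are Grace directives, and the rest are columns of numbers. The reader below is a minimal sketch; `read_xvg` is a made-up name and the sample values are invented, not results from this work.

```python
def read_xvg(lines):
    """Return the first two numeric columns of an .xvg file as two lists."""
    xs, ys = [], []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments (#) and Grace formatting directives (@).
        if not line or line[0] in "#@":
            continue
        fields = line.split()
        xs.append(float(fields[0]))
        ys.append(float(fields[1]))
    return xs, ys


# Invented sample in the .xvg layout, for illustration only.
SAMPLE_XVG = """\
# comment line written by the analysis tool
@    title "Energy"
@    xaxis  label "Time (ps)"
0.0  -1000.0
1.0  -1500.0
2.0  -1750.0
"""

if __name__ == "__main__":
    steps, energies = read_xvg(SAMPLE_XVG.splitlines())
    print(steps, energies)
    # For a real file: read_xvg(open("dsmt3-em_ener.xvg"))
```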

Page 60: Molecular dynamics and Simulations

Batch file for executing molecular dynamics

[Figure: batch file listing with 5 steps marked as done]

Page 61: Molecular dynamics and Simulations

Position Restrained MD

• Molecular dynamics is run on the water molecules while the position of the peptide is kept fixed. This is called position restrained (PR) MD.

• Position restrained MD keeps the peptide fixed and lets all water molecules equilibrate around the peptide in order to fill holes, etc., which were not filled by the genbox program.

• It is first necessary to pre-process the input files to generate the binary topology. The input files are: the topology file, the structure file (output of the EM) and a parameter file.

• By default, the system is split into two groups, Protein and SOL(vent), so that position restraints can be put on all the atoms of the peptide.

• The parameter file (.mdp extension) contains information about the PR-MD such as step size, number of steps, temperature, etc. This parameter file also tells GROMACS what kind of simulation should be performed.

[Figure annotations on the preprocessing command: preprocess; position restrained MD file; topology file of the protein; energy minimized binary file from the previous step; output position restrained MD file with the energy minimized protein in it]
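Position restraints are typically implemented as harmonic springs that tie each restrained atom to its reference position, so the peptide is held near its starting structure rather than rigidly frozen. The sketch below computes that restraint energy; the force constant of 1000 kJ mol^-1 nm^-2 is a commonly used default and is an assumption here, as are the function name and coordinates.

```python
import numpy as np


def position_restraint_energy(positions_nm, reference_nm, k=1000.0):
    """Harmonic restraint energy, 0.5 * k * |r - r_ref|^2 summed over atoms.

    k is in kJ mol^-1 nm^-2 (assumed default), positions in nm.
    """
    disp = np.asarray(positions_nm, dtype=float) - np.asarray(reference_nm, dtype=float)
    return 0.5 * k * float(np.sum(disp * disp))


if __name__ == "__main__":
    reference = [[0.0, 0.0, 0.0], [0.15, 0.0, 0.0]]   # invented reference positions
    displaced = [[0.1, 0.0, 0.0], [0.15, 0.0, 0.0]]   # first atom moved by 0.1 nm
    print("restraint energy (kJ/mol):", position_restraint_energy(displaced, reference))
```

Water molecules carry no such term, so they are free to move and fill the gaps around the restrained peptide.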

Page 62: Molecular dynamics and Simulations

Batch file for executing molecular dynamics

[Figure: batch file listing with 6 steps marked as done]

Page 63: Molecular dynamics and Simulations

Run MD

[Figure annotations on the run command: position restrained MD file of the protein; output trajectory of the PR-MD run; output in Gromacs format; energy after position restrained MD]

Page 64: Molecular dynamics and Simulations

Batch file for executing molecular dynamics

[Figure: batch file listing with 7 steps marked as done]

Page 65: Molecular dynamics and Simulations

• These two commands are used to run full molecular dynamics, where no part of the system is held fixed.

• Both the protein and the solvent are allowed to move until they become stable, and that state is retained as the final output.

• g_filter applies a frequency filter to trajectories, which is useful for making smooth movies. Most of the frames are filtered out and only 10 are kept. These can be read by Pymol (protein visualization software).

Page 66: Molecular dynamics and Simulations

Batch file for executing molecular dynamics

[Figure: batch file listing with 9 steps marked as done]

Page 67: Molecular dynamics and Simulations

• Pymol cannot read Gromacs xtc trajectories, and it is better to remove the solvent from the trajectory in order to concentrate on the protein.

• This is easy to fix by using another Gromacs program to convert the trajectory to PDB format and select only one group:

• The program asks for an output group, and Protein is selected.

• The trajectory of the protein after MD is visualized in Pymol by giving the command:

pymol dsmt3-finaltraj.pdb

Page 68: Molecular dynamics and Simulations

Batch file for executing molecular dynamics

[Figure: batch file listing with 10 steps marked as done]

Page 69: Molecular dynamics and Simulations

Acknowledgements

Prof. Ramkrishna Hosur (TIFR, Mumbai)
Dinesh Kumar (TIFR, Mumbai)
Dr. Ganapathy Subramanian (NCL, Pune)

Special thanks to the NMR Facility, TIFR, Mumbai

Page 70: Molecular dynamics and Simulations

Thank you for your patience