Helsinki University of Technology S-114.2500 Basics for Bio Systems of the Cell 22.11.2006 Using Protein Data Bank and Swiss-PdbViewer to study protein structures Saara Suikkanen Pia Ojala




1. Introduction

2. Proteins

2.1 Amino acids

2.2 Primary structure

2.3 Secondary structure

2.4 Tertiary structure

2.5 Quaternary structure

2.6 Prosthetic group

2.7 Example protein

3. Protein Data Bank

3.1 Files in Protein Data Bank

3.2 Our example protein, xylose isomerase

4. Swiss-PdbViewer

4.1 Protein visualization

4.3 Visualization of xylose isomerase's active site

5. Conclusion

6. References


1. Introduction

Proteins are essential parts of all living organisms and participate in virtually every process within the cell. Each protein folds into a characteristic three-dimensional structure, which is described at four levels: the primary, secondary, tertiary and quaternary structure. The function of a protein depends on its structure. Proteins can be enzymes, or they can have structural or mechanical functions, among others. Researchers around the world have determined the three-dimensional structures of proteins, and these structures have been gathered into the Protein Data Bank, which is maintained by the Research Collaboratory for Structural Bioinformatics (RCSB). A protein's 3-D structure can be studied with different visualization programs. We have chosen to study protein structures with a program called Swiss-PdbViewer. This paper first gives a brief look at the basic structure of proteins. We then introduce our example protein, find it in the Protein Data Bank and look at it with Swiss-PdbViewer.


2. Proteins

Proteins are polypeptides consisting of amino acids arranged in a linear chain. The sequence of amino acids in a protein is defined by a gene and encoded in the genetic code. The genetic information is translated into a protein in a process called protein synthesis, which we will not discuss here.

2.1 Amino acids

There are 20 standard amino acids occurring in proteins. All of them contain one amino group, one carboxyl group, one hydrogen atom and one side chain (R), all attached to the same carbon atom, the so-called α-carbon, as in figure [1]. The amino acids differ in the shape and chemical properties of their side chains (R groups) and are classified by their side chains into five groups:

1. Nonpolar, aliphatic R groups
2. Polar, uncharged R groups
3. Aromatic R groups
4. Positively charged R groups
5. Negatively charged R groups

Amino acids exist as two enantiomers (mirror images), the L- and D-forms; glycine, whose side chain is a single hydrogen atom, is the only exception. Only L-amino acids are found in proteins.

2.2 Primary structure

The amino acids are joined together by peptide bonds. A peptide bond is formed between two amino acids when the carboxyl group of one amino acid reacts with the amino group of another, releasing a molecule of water (H2O), as shown in figure [2]. Many amino acids linked together form a polypeptide. The amino acid sequence of a protein is called the protein's primary structure. At one end of the primary structure is the N-terminus (left end) and at the other end the C-terminus (right end). The primary structure is always given in the direction from the N-terminus to the C-terminus.

Although the peptide bond is formally a single bond, there is no free rotation around it, because conjugation gives the bond partial double-bond character. Peptide bonds are therefore quite rigid. Other single bonds in the peptide chain are free to rotate.

Figure 1 Amino acid general form

Figure 2 Peptide bond forming

2.3 Secondary structure

The amino acid residues in the protein's primary structure interact with each other non-covalently, for example by forming hydrogen bonds. The hydrogen bonds draw the chain together into secondary structures. Secondary structures are local conformations of some part of a polypeptide. There are different secondary structure folding patterns, but the most common and stable ones are the α helix, the β conformation, and turns and loops. Turns and loops are connecting elements that link α helices and β conformations; they are found where the polypeptide chain changes direction.

In the α helix, figure [3], the polypeptide backbone is tightly wound around an imaginary axis drawn longitudinally through the middle of the helix, and the R groups of the amino acid residues protrude outward from the helical backbone. Each helical turn includes 3.6 amino acid residues and extends about 5.4 Å along the axis. The α helices found in proteins are right-handed.
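The two helix figures quoted above (3.6 residues per turn, about 5.4 Å per turn) imply a rise and a rotation per residue. The following is a minimal sketch of that arithmetic; the function names are our own, and the helix length is only an idealized estimate for a perfectly regular helix.

```python
# Values quoted in the text for the alpha helix.
RESIDUES_PER_TURN = 3.6
PITCH_ANGSTROM = 5.4  # advance along the helix axis per full turn


def rise_per_residue() -> float:
    """Advance along the helix axis per residue, in angstroms."""
    return PITCH_ANGSTROM / RESIDUES_PER_TURN


def rotation_per_residue_deg() -> float:
    """Rotation around the helix axis per residue, in degrees."""
    return 360.0 / RESIDUES_PER_TURN


def helix_length(n_residues: int) -> float:
    """Approximate axial length of an ideal helix of n residues, in angstroms."""
    return n_residues * rise_per_residue()
```

For example, `helix_length(10)` estimates the axial length of a ten-residue helical segment.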

In the β conformation the polypeptide chains are organized into sheets. The backbone of the polypeptide chain is extended into a zigzag. The zigzag polypeptide chains can be arranged side by side to form a structure resembling a series of pleats. In this so-called β sheet conformation, hydrogen bonds are formed between adjacent segments of the polypeptide chain. The adjacent chains in a β sheet can be either parallel, figure [4], or antiparallel.

2.4 Tertiary structure

Every active protein has its own unique three-dimensional structure, referred to as the protein's tertiary structure. The tertiary structure of a protein is its overall shape, that is, the way the whole polypeptide chain is packed. The helical and folded chains pack into globular structures or coil into elongated, fiber-like structures. In addition to hydrogen bonds, the tertiary structure is held together by van der Waals forces, ionic interactions and disulfide (sulfur) bridges, which are formed between the side chains of two cysteine residues.

Figure 3 α helix conformation

Figure 4 Parallel β sheet conformation


2.5 Quaternary structure

Some proteins contain two or more separate polypeptide chains, or subunits, which may be identical or different. The arrangement of these protein subunits in three-dimensional complexes constitutes the quaternary structure.

2.6 Prosthetic group

Many proteins contain groups other than amino acids, and these groups are called prosthetic groups or coenzymes. Prosthetic groups are bound tightly, often covalently, to their protein and can be, for example, metal ions or vitamins. Prosthetic groups can be of great importance for a protein's function; in enzymes, for example, the prosthetic groups are usually involved in the active site in some way.

2.7 Example protein

To make it easier to follow the different functions of the programs we use, we use one example protein throughout this paper. The example protein we have chosen is an enzyme called xylose isomerase, also known as glucose isomerase. Xylose isomerase catalyses the isomerization of D-xylose to D-xylulose, and it also catalyses the interconversion of D-glucose and D-fructose. The substrate (D-xylose or D-glucose) binds to the active site of the enzyme, where the isomerization then takes place.

Bacteria utilize xylose by isomerizing it to xylulose, a process which is catalyzed by xylose isomerase. Dozens of Streptomyces strains are known to produce xylose isomerases, but only a few of them are suitable for large-scale production. In industry xylose isomerase is widely used to convert glucose, which is not very sweet, to fructose, the sweetest of the natural sugars. Syrups from this process compete with sucrose (cane sugar) in many food applications. Almost all manufacturers of soft drinks use high-fructose syrups as sweeteners because they are less expensive than sucrose. Xylose isomerase can also be used in ethanol production: xylose, the principal sugar in agricultural residues, is converted to xylulose by this enzyme and then to ethanol by fermentation. The xylose isomerase we have picked as an example is isolated from Actinoplanes missouriensis. It is a tetramer with four identical subunits, and each subunit contains 394 amino acids. The molecular mass per monomer is 43 500 daltons, and the enzyme contains two magnesium ions per monomer unit.

Figure 5 D-xylose (left) and D-xylulose (right)
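The subunit figures quoted above can be combined into a few derived quantities. This is a small illustrative sketch; it assumes that the 394 amino acids and the 43 500 daltons both refer to a single subunit, and the function names are our own.

```python
# Figures quoted in the text for xylose isomerase from Actinoplanes missouriensis.
SUBUNITS = 4                  # tetramer of identical subunits
RESIDUES_PER_SUBUNIT = 394    # assumed to be the per-subunit count
MONOMER_MASS_DA = 43_500      # molecular mass per monomer, in daltons
MG_IONS_PER_MONOMER = 2


def tetramer_mass_da() -> int:
    """Mass of the whole tetramer, ignoring bound ions and substrate."""
    return SUBUNITS * MONOMER_MASS_DA


def mean_residue_mass_da() -> float:
    """Average mass per amino acid residue in one subunit."""
    return MONOMER_MASS_DA / RESIDUES_PER_SUBUNIT


def peptide_bonds_per_subunit() -> int:
    """A linear chain of n residues is held together by n - 1 peptide bonds."""
    return RESIDUES_PER_SUBUNIT - 1


def mg_ions_per_tetramer() -> int:
    """Total number of magnesium ions in the tetramer."""
    return SUBUNITS * MG_IONS_PER_MONOMER
```

The mean residue mass that results, a little over 110 daltons, is in the range commonly used as a rule of thumb for proteins.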


3. Protein Data Bank

The Research Collaboratory for Structural Bioinformatics (RCSB) provides a non-commercial database containing information about the three-dimensional structures of large biological molecules such as proteins and nucleic acids. This database is called the Protein Data Bank (PDB) and can be accessed for free. The PDB was established in 1971 at Brookhaven National Laboratory and initially contained only 7 protein structures. In 1998 the RCSB became responsible for the management of the PDB. Together with the Japanese Protein Data Bank and the European Bioinformatics Institute, the RCSB ensures that the Protein Data Bank is uniform and available worldwide. Since the beginning the number of available structures has grown exponentially, and the growth shows no sign of stopping. Every Wednesday the RCSB releases new structures, and the total number of new structures each year is about 500. In September 2006 the database contained 32754 released atomic structures, about 30000 of them proteins and the rest nucleic acids and nucleic acid-protein complexes.

3.1 Files in Protein Data Bank

The information on each protein structure has been derived experimentally and includes sequence details, atomic coordinates, crystallization conditions and 3-D structure neighbours. The information has been gathered through X-ray diffraction and NMR studies and is saved in PDB files. A PDB file describes the molecule atom by atom, and every file is identified by a unique PDB ID. When searching for a particular protein in the Protein Data Bank you can use either the PDB ID or a plain text search on the name of the protein. The advantage of the PDB ID is that the same protein can exhibit different 3-D structures in different environments, and the PDB IDs distinguish these structures from each other.

The key idea in the RCSB PDB is that the 3-D structure of every protein it contains is known and has been derived experimentally (not theoretically predicted). There are also data banks containing even more proteins than this one, but they usually lack information on the 3-D structure. So when studying the dimensions and tertiary structures of proteins, the RCSB PDB should be used. If you are only interested in the amino acid sequence of a particular protein, the much larger databases of Swiss-Prot and the International Nucleotide Sequence Database Collaboration are the right tools.
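To illustrate what "atom by atom" means in practice, here is a minimal sketch of reading the coordinate records of a PDB file. It assumes the classic fixed-column PDB text format, in which each ATOM or HETATM line stores the atom name, residue, chain and x, y, z coordinates at fixed character positions. The class and function names are our own, and the sketch assumes well-formed records.

```python
from typing import Iterable, List, NamedTuple


class Atom(NamedTuple):
    record: str     # "ATOM" (polymer atoms) or "HETATM" (ions, ligands, water)
    serial: int     # atom serial number
    name: str       # atom name, e.g. "CA" for the alpha carbon
    res_name: str   # residue name, e.g. "ALA" or "MG"
    chain: str      # chain identifier, e.g. "A"
    res_seq: int    # residue sequence number
    x: float        # orthogonal coordinates in angstroms
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using the fixed column positions."""
    return Atom(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


def read_atoms(lines: Iterable[str]) -> List[Atom]:
    """Keep only coordinate records and skip headers, remarks and so on."""
    return [
        parse_atom_line(line)
        for line in lines
        if line.startswith(("ATOM  ", "HETATM"))
    ]
```

With a downloaded file, `read_atoms(open("6XIM.pdb"))` would return the list of atoms, where the file name is only an example.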

3.2 Our example protein, xylose isomerase

Our example protein, xylose (glucose) isomerase, can be found in the RCSB PDB under the PDB ID 6XIM. Part of the information on this protein in the RCSB PDB is shown in table [1]. There you can find the experimental method used to determine the 3-D structure and the researchers behind this specific study.


Figure 6 3-D picture of xylose isomerase

Table 1 Information on protein 6XIM from the RCSB PDB web site.

Title: PROTEIN ENGINEERING OF XYLOSE (GLUCOSE) ISOMERASE FROM ACTINOPLANES MISSOURIENSIS. 1. CRYSTALLOGRAPHY AND SITE-DIRECTED MUTAGENESIS OF METAL BINDING SITES
Authors: Janin, J.
Primary Citation: Jenkins, J., Janin, J., Rey, F., Chiadmi, M., van Tilbeurgh, H., Lasters, I., De Maeyer, M., Van Belle, D., Wodak, S.J., Lauwereys, M., Stanssens, P., Matthyssens, G., Lambeir, A.M. Protein engineering of xylose (glucose) isomerase from Actinoplanes missouriensis. 1. Crystallography and site-directed mutagenesis of metal binding sites. Biochemistry v31 pp.5449-5458, 1992
History: Deposition 1992-04-03, Release 1993-07-15
Experimental Method: Type X-RAY DIFFRACTION, Data N/A
Parameters: Resolution [Å] 2.50, R-Value 0.151 (obs.), R-Free n/a, Space Group P 32 2 1
Unit Cell: Length [Å] a 143.45, b 143.45, c 231.50; Angles [°] alpha 90.00, beta 90.00, gamma 120.00
Molecular Description: Asymmetric Unit, Polymer: 1, Molecule: D-XYLOSE ISOMERASE, Chains: A, B, C, D, EC no.: 5.3.1.5
Classification: Isomerase (intramolecular oxidoreductase)
Source: Polymer: 1, Scientific Name: Actinoplanes missouriensis
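The unit cell lengths and angles listed in table [1] determine the volume of the crystallographic unit cell. The sketch below uses the general volume formula for a triclinic cell, which also covers this cell with alpha = beta = 90° and gamma = 120°. The function name is our own, and this calculation is an illustration rather than something Swiss-PdbViewer or the RCSB site reports.

```python
import math


def unit_cell_volume(a: float, b: float, c: float,
                     alpha_deg: float, beta_deg: float, gamma_deg: float) -> float:
    """Volume of a unit cell in cubic angstroms.

    General formula:
    V = a*b*c*sqrt(1 - cos^2(alpha) - cos^2(beta) - cos^2(gamma)
                   + 2*cos(alpha)*cos(beta)*cos(gamma))
    """
    ca = math.cos(math.radians(alpha_deg))
    cb = math.cos(math.radians(beta_deg))
    cg = math.cos(math.radians(gamma_deg))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


# Cell parameters of 6XIM as listed in table [1].
volume_6xim = unit_cell_volume(143.45, 143.45, 231.50, 90.0, 90.0, 120.0)
```

For this cell the formula reduces to a·b·c·sin(gamma), because the two 90° angles contribute nothing.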

The PDB file of 6XIM is available on the same site and can be downloaded to your own computer. Then, with appropriate software, you can visualize the protein from the structural data. Swiss-PdbViewer is one such program, and we are going to use it in our visualization.
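The download can also be scripted. The sketch below is only an illustration: the address pattern `https://files.rcsb.org/download/<ID>.pdb` is an assumption about the current RCSB file server and is not taken from this paper, and the function names are our own.

```python
import urllib.request

# Assumed address pattern of the RCSB file server (not from the original paper).
RCSB_DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_download_url(pdb_id: str) -> str:
    """Build the download address for a four-character PDB ID such as 6XIM."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("expected a four-character PDB ID, got %r" % pdb_id)
    return RCSB_DOWNLOAD_URL.format(pdb_id=pdb_id)


def download_pdb(pdb_id: str, filename: str) -> str:
    """Fetch the PDB file over the network and save it under the given name."""
    urllib.request.urlretrieve(pdb_download_url(pdb_id), filename)
    return filename
```

Calling `download_pdb("6XIM", "6XIM.pdb")` would then save the file locally, ready to be opened in a viewer.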


4. Swiss-PdbViewer

Swiss-PdbViewer is a user-friendly program for visualizing PDB files. It was developed by Nicolas Guex and is tightly linked to SWISS-MODEL, an automated homology modeling server developed within the Swiss Institute of Bioinformatics at the Structural Bioinformatics Group at the Biozentrum in Basel. With Swiss-PdbViewer you can analyze several proteins at the same time, and the proteins can be superimposed in order to deduce structural alignments and compare their active sites. H-bonds, angles and distances between atoms in the structure can also be measured with this program.

Figure 7 Bond-lengths between nearby atoms.
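The distance and angle measurements mentioned above are simple vector calculations on the atomic coordinates. The following sketch shows how they can be computed from (x, y, z) tuples; it illustrates the geometry only and is not Swiss-PdbViewer's own code. The function names are our own.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def distance(p, q):
    """Distance between two atoms, in the units of the coordinates."""
    return _norm(_sub(p, q))


def angle(p, q, r):
    """Angle p-q-r in degrees, with atom q at the vertex."""
    u, v = _sub(p, q), _sub(r, q)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees around the p1-p2 bond."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    return math.degrees(math.atan2(_norm(b1) * _dot(b0, n2), _dot(n1, n2)))
```

The backbone angles phi, psi and omega of a residue are torsion angles of this kind, each defined by four consecutive backbone atoms.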

The easiest way to display a protein is simply to drag a PDB file into Swiss-PdbViewer. The program reads the atomic coordinates of the molecule from the PDB file and constructs the 3-D model based on this information. Some atoms may be missing from the loaded data. In that case Swiss-PdbViewer reconstructs the missing atoms by choosing the rotamer that gives the maximum number of H-bonds and the minimum of steric hindrance. Because of this reconstruction process, it is currently impossible to load a PDB file that contains only alpha carbons.

Molecules appear in wireframe representation in a black window. Unlinked atoms appear as small crosses. By default the atoms in the model are coloured as follows: carbon (C) is white, oxygen (O) is red, nitrogen (N) is blue, sulfur (S) is yellow, phosphorus (P) is orange, hydrogen (H) is cyan and other atoms are grey. The colour coding can be seen in figure [8].


Figure 8 Colour coding for C9O3H7
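The default colour scheme described above amounts to a lookup table from element symbol to colour, with grey as the fallback. A minimal sketch, using only the colours listed in the text; the names in the code are our own.

```python
# Default atom colours as listed in the text.
DEFAULT_COLOURS = {
    "C": "white",
    "O": "red",
    "N": "blue",
    "S": "yellow",
    "P": "orange",
    "H": "cyan",
}


def atom_colour(element: str) -> str:
    """Return the default colour for an element symbol; other atoms are grey."""
    return DEFAULT_COLOURS.get(element.strip().upper(), "grey")
```

Under this scheme a magnesium ion, for example, falls into the grey "other atoms" group.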

Working with this program greatly reduces the amount of work necessary to generate models, as it is possible to thread a protein's primary sequence onto a 3-D template. Swiss-PdbViewer can also read electron density maps and provides various tools to build into the density. In addition, various modeling tools are integrated, and command files for popular energy minimization packages can be generated. In this paper, however, we concentrate on the basic visualization tools of the program.

4.1 Protein visualization

Swiss-PdbViewer is a wide-ranging program with a large number of functions. Here we describe only the most basic features, which we hope will give you a good start with the program. To visualize the chosen protein structure, download the wanted PDB file from the Protein Data Bank and open it with Swiss-PdbViewer. In our example case we download the 6XIM PDB file. The program then opens the protein's 3-D structure in the window. To open the control panel, choose Window → Control Panel from the menu, and to show the alignment with the primary structure, choose Window → Alignment.


In the Control Panel you can do the following:

1. In the upper left corner you can make protein structures visible or invisible. With this function you can have many PDB files open at the same time and switch between the different structures. You can also display many structures at the same time in the window (not possible with our example).

2. In the first column (on the right) you can see the letters A to D (use the scroll bar on the left). These letters denote the subunits of the protein (four subunits = tetramer). By clicking one of the letters with the left mouse button and then pressing enter, you get only that subunit visible on the screen. This makes it easier to study the protein structure (because the subunits are similar).

3. The second column shows the amino acid sequence (primary structure) of the protein.

4. The third column controls the polypeptide backbone. Clicking with the left mouse button makes one amino acid disappear, and clicking with the right button makes the whole peptide chain disappear (clicking the same buttons again reverses this).

5. The fourth column controls the amino acids' side chains, and the same functions apply as in the third column.

6. In the fifth column you can make amino acid labels visible on the screen with the left and right mouse buttons.

7. In the eighth column you can choose one amino acid and change its colour, so that it is easier to distinguish a particular amino acid on the screen.

The alignment shows the protein's primary structure. When you move the mouse over the letters describing the amino acids, the amino acid under the pointer changes colour on the screen. Clicking a letter with the left mouse button makes the chosen amino acid flash on the screen.

Figure: Swiss-PdbViewer window with the labels Menu, Control Panel, Protein structure and Alignment

In the menu you can find the menu buttons and the fast tool buttons. Next we introduce the fast tool buttons:

- The first button centers your protein structure.
- With the “hand” button you can move your protein structure.
- With the middle button you can enlarge and reduce your structure.
- With the last button you can rotate your structure.

1. Measures the distance between two atoms
2. Measures the bond angle between three atoms
3. Measures the omega, phi and psi angles of the picked amino acid
4. Provenance of an atom (gives the name of the molecule, group, chain and atom)
5. Displays groups that are within a certain distance of an atom
6. Centers the chosen atom on the screen
7. Fits one molecule onto another; you can use this only when you have many PDB files open (not possible in our example)
8. Makes a mutation in your protein, changing one amino acid into another
9. Torsion tool: you can pick one atom on the screen and have the program rotate the structure

In the menu bar you can find a wide range of functions. We have chosen a few functions from the Tools, Display and Colour menus to describe:

- Tools → Compute H-bonds
  o With this tool the program displays the hydrogen bonds of the protein structure, and you can see how the different amino acids interact with each other.
- Display → Slab
  o This tool displays one disc/slab of the protein structure. It allows you to go "inside a protein" without having to see all the peripheral atoms.
- Display → Stereo view
  o This tool splits the screen into two parts and shows the same structure in both.
- Display → Show X
  o In the Display menu you can find many functions that show different things on the protein, for example show backbone oxygens or show hydrogens. It is sometimes easier to see something when not everything is shown on the screen.
- Colour → by X
  o X = CPK colours the atoms with the default colours.
  o X = Type colours basic amino acids in blue, acidic amino acids in red, polar in yellow and hydrophobic in grey.
  o X = secondary structure colours helices in red and strands/sheets in yellow. The rest of the structure is coloured in grey.
  o X = selection colours selected residues in cyan and non-selected residues in dark grey. This is useful to quickly highlight the position of selected amino acids compared to the rest of the protein.
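The idea behind the slab display can be sketched as a simple depth filter: only atoms whose coordinate along the viewing direction lies within a thin slice are kept. The sketch below assumes that the viewing direction is the z axis and that atoms are given as (x, y, z) tuples; how Swiss-PdbViewer implements its slab internally is not described in this paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def slab(atoms: Sequence[Point], z_centre: float, thickness: float) -> List[Point]:
    """Keep the atoms whose z coordinate lies within the slab.

    The slab is centred on z_centre and extends thickness / 2 on either side.
    """
    half = thickness / 2.0
    return [atom for atom in atoms if abs(atom[2] - z_centre) <= half]
```

Atoms bonded to partners outside the slice are still shown, which is why some bonds can appear to hang in the air in a slab view.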

4.3 Visualization of xylose isomerase's active site

Now we introduce some functions of the program with the help of our example protein, xylose isomerase. After opening the 6XIM PDB file we want to find the active site of the xylose isomerase. Because we know that xylose isomerase has two metal ions in its active site, we try to locate them.

1. We display only one subunit on the screen, using the first column in the control panel.

2. We hide the amino acid side chains by using the control panel's fourth column. This gives us a better view into the structure.

3. We use the rotate tool to rotate the protein structure and try to see two crosses indicating the metal ions (you can also find the metal ions in the control panel's second column).

4. When we have located the metal ions we use the fourth (from the left) fast tool button to center one of them.

5. We enlarge the protein's active site with the third (from the right) fast tool button.

6. When the active site is enlarged and we have rotated it to a convenient position, we can see a structure in the active site that is not bound to the protein structure. This is the substrate of the enzyme.

7. Now we can display the hydrogen bonds on the screen with the Compute H-bonds function. The hydrogen bonds are displayed as green dashed lines.

8. To see all the bonds between the substrate and the protein structure we want the side chains to show on the screen. When displaying the side chains (with the help of the control panel) we get a “mess” on the screen.

9. The next step is to use the Display → Slab tool. This tool makes the picture more understandable.

10. Now we can rotate our picture and see how the substrate is bonded to the active site with hydrogen bonds. Because of the slab display, some bonds seem to be hanging in the air, but if you rotate the structure you can see that this is not so.
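The search for the active site in the steps above can also be expressed as a distance query: list the residues that have at least one atom within a chosen radius of a metal ion. The sketch below is a hedged illustration with our own names; it expects atoms as simple records with a residue name, residue number and coordinates, and the radius is a free parameter rather than a value from this paper.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class Atom(NamedTuple):
    name: str       # atom name, e.g. "OE1"
    res_name: str   # residue name, e.g. "GLU" or "MG"
    res_seq: int    # residue number, e.g. 396 for the ion labelled MG396
    x: float
    y: float
    z: float


def residues_near(atoms: Sequence[Atom], centre: Atom,
                  radius: float) -> List[Tuple[int, str]]:
    """Residues with at least one atom within radius (angstroms) of centre.

    The residue that centre itself belongs to is left out of the result.
    """
    found = set()
    for atom in atoms:
        if (atom.res_name, atom.res_seq) == (centre.res_name, centre.res_seq):
            continue
        d = math.dist((atom.x, atom.y, atom.z), (centre.x, centre.y, centre.z))
        if d <= radius:
            found.add((atom.res_seq, atom.res_name))
    return sorted(found)
```

Applied to the atoms of one subunit with a magnesium ion as the centre, this would list the residues and the substrate that line the active site.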


Figure: the active site of xylose isomerase in Swiss-PdbViewer, with labels marking the metal ion (MG396) in the control panel, the substrate and the metal ions


5. Conclusion

While DNA acts as the memory storage of the cell, proteins play a vital role in nearly every cell event. Having the right kind of protein in the right place allows a healthy cell to divide and prolongs its life. In a diseased cell, on the other hand, the right kind of protein in the right place stops cell division and prevents several diseases from developing. Mutations in the DNA double helix may change the tertiary structure of a protein, and ultimately such structural changes in proteins, and thereby in their behavior, may cause cancer or other fatal diseases in human beings. Studying the structural conformation of different kinds of proteins is therefore a highly explored field in medical science.

The RCSB PDB is one of the most up-to-date data banks available because of its worldwide cooperation partners. It has been estimated that 2.2 structures are downloaded from the RCSB PDB every second. The Protein Data Bank is in great favor among bioscientists and researchers, who can also add new structures to the PDB. User-friendly visualization programs like Swiss-PdbViewer, in turn, make it possible for almost everyone to study 3-D structures. With the graphical user interface even less experienced computer users can handle the program quite easily. The creators of the program have not forgotten the more advanced users either: for them, Swiss-PdbViewer offers the ability to edit the data and pictures in more experimental ways. All in all, the Protein Data Bank and Swiss-PdbViewer make a handy tool together and should be in common use.


6. References

1. David L. Nelson, Michael M. Cox, Lehninger Principles of biochemistry, fourth edition, W.H. Freeman and Company

2. Pastinen, Ossi. Xylose isomerase from Streptomyces rubiginosus: Stability, novel reactions and applications. Espoo 2000, Helsinki University of Technology.

3. Swiss-PdbViewer www pages, http://us.expasy.org/spdbv/mainpage.htm, ref. 20.11.2006

4. Protein Data Bank www pages, http://www.rcsb.org/pdb/home/home.do;jsessionid=6A01A9457B02EB4F2D08C858E5B76B68, ref. 20.11.2006