Macromolecular structure
Bioinformatics
Contents
Determination of protein structure
Structure databases
Secondary structure elements (SSE)
Tertiary structure
Structure analysis: structure alignment, domain recognition
Structure prediction: homology modelling, threading/fold recognition, secondary structure prediction, ab initio prediction
Jacques van Helden, jvanheld@ucmb.ulb.ac.be
Determination of protein structure
Structure
Crystallisation
[Figure: hanging drop method / vapour diffusion method. Labels: microscope slide; 1, dilute protein solution; 2, concentrated salt solution; microscope; crystal]
Many different conditions of 1 and 2 must be tried.
Slide courtesy of Shoshana Wodak
Determination of protein structure
[Figure: diffraction pattern and atomic model]
Slide courtesy of Shoshana Wodak
The resolution problem
A high-resolution protein structure: 1.5-2.0 Å resolution.
Slide courtesy of Shoshana Wodak
Nuclear Magnetic Resonance (NMR)
Source: Branden & Tooze (1991)
Interatomic forces
Covalent interactions
Hydrogen bonds
Hydrophobic/hydrophilic interactions
Ionic interactions
van der Waals forces
Repulsive forces
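Of the forces listed above, van der Waals attraction and short-range repulsion are commonly modelled together by a Lennard-Jones 12-6 potential. The slide does not name a functional form, so the sketch below is an illustration under that assumption: `epsilon` is the depth of the energy well, `sigma` the distance at which the energy is zero, and the numerical values used are illustrative only.

```python
def lennard_jones(r, epsilon, sigma):
    """Lennard-Jones 12-6 energy between two atoms at distance r.

    The (sigma/r)**12 term models short-range repulsion,
    the (sigma/r)**6 term van der Waals (dispersion) attraction.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_minimum_distance(sigma):
    """Distance of the energy minimum, where the energy equals -epsilon."""
    return 2.0 ** (1.0 / 6.0) * sigma


if __name__ == "__main__":
    eps, sig = 0.2, 3.4  # illustrative parameters, not taken from the source
    for r in (3.0, sig, lj_minimum_distance(sig), 6.0):
        print(f"r = {r:.2f}  E = {lennard_jones(r, eps, sig):+.3f}")
```

The energy is positive (repulsive) below `sigma`, reaches its minimum of `-epsilon` at 2^(1/6) `sigma`, and tends to zero at long range.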
Structure databases
PDB (Protein Data Bank): official structure repository.
SCOP (Structural Classification of Proteins): structure classification. The top level reflects structural classes. The second level, called Fold, includes topological and similarity criteria.
CATH (Class, Architecture, Topology and Homologous superfamily)
PDB entry header
HEADER TRANSCRIPTION REGULATION 06-MAR-92 1D66
COMPND GAL4 (RESIDUES 1 - 65) COMPLEX WITH 19MER DNA
SOURCE (SACCHAROMYCES $CEREVISIAE) OVEREXPRESSED IN (ESCHERICHIA
SOURCE 2 $COLI)
AUTHOR R.MARMORSTEIN,S.HARRISON
REVDAT 1 15-APR-93 1D66 0
JRNL AUTH R.MARMORSTEIN,M.CAREY,M.PTASHNE,S.C.HARRISON
JRNL TITL /DNA$ RECOGNITION BY /GAL4$: STRUCTURE OF A
JRNL TITL 2 PROTEIN(SLASH)/DNA$ COMPLEX
JRNL REF NATURE V. 356 408 1992
JRNL REFN ASTM NATUAS UK ISSN 0028-0836 006
REMARK 1
REMARK 2
REMARK 2 RESOLUTION. 2.7 ANGSTROMS.
REMARK 3
REMARK 3 REFINEMENT.
REMARK 3 PROGRAM CORELS;TNT;XPLOR
REMARK 3 AUTHORS J.SUSSMAN;D.TRONRUD;A.BRUNGER
REMARK 3 R VALUE 0.230
REMARK 3 RMSD BOND DISTANCES 0.015 ANGSTROMS
REMARK 3 RMSD BOND ANGLES 2.9 DEGREES
REMARK 4
REMARK 4 THERE ARE TWO DNA CHAINS WHICH HAVE BEEN ASSIGNED CHAIN
REMARK 4 INDICATORS *D* AND *E*. THERE ARE TWO PROTEIN CHAINS
REMARK 4 WHICH HAVE BEEN ASSIGNED CHAIN INDICATORS *A* AND *B*.
REMARK 4 EACH PROTEIN - DNA COMPLEX CONTAINS FOUR BOUND CD IONS.
...
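Header records like these are fixed-width text: in the PDB format the record name occupies columns 1-6, and older files, like the one this header comes from, also carry the entry ID and a line serial number in columns 73-80. The minimal sketch below assumes that layout, groups lines by record name and extracts the resolution from REMARK 2. The function names are illustrative; a real parser would need to handle many more record types and edge cases.

```python
import re
from collections import defaultdict


def parse_header(lines):
    """Group PDB header lines by record name (columns 1-6).

    Columns 73-80 (ID and serial number in older files) are dropped.
    """
    records = defaultdict(list)
    for line in lines:
        name = line[:6].strip()
        content = line[6:72].rstrip()
        records[name].append(content)
    return records


# Matches e.g. "   2 RESOLUTION. 2.7 ANGSTROMS." (REMARK 2 content).
RESOLUTION_RE = re.compile(r"\s*2\s+RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS")


def resolution(records):
    """Return the resolution in angstroms stated in REMARK 2, or None."""
    for content in records.get("REMARK", []):
        match = RESOLUTION_RE.match(content)
        if match:
            return float(match.group(1))
    return None


if __name__ == "__main__":
    header = [
        "HEADER    TRANSCRIPTION REGULATION                06-MAR-92   1D66",
        "REMARK   2 RESOLUTION. 2.7 ANGSTROMS.",
        "REMARK   3   R VALUE                    0.230",
    ]
    print(resolution(parse_header(header)))
```

Entries without a numeric resolution (for example NMR structures) simply return `None` here.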
[Figure: CATH levels Class, Architecture, Topology]
Figure from Shoshana Wodak
CATH - A protein domain classification
In CATH, protein domains are classified hierarchically, according to a tree with 4 levels: Class, Architecture, Topology, Homologous superfamily.
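Because each CATH domain sits at a node of this four-level tree, its classification can be written as a dotted code with one number per level (Class.Architecture.Topology.Homologous superfamily). The sketch below assumes codes in that form and reports the deepest level two domains share; the example codes are hypothetical placeholders, not looked-up classifications.

```python
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def shared_cath_level(code_a, code_b):
    """Return the deepest CATH level at which two dotted codes agree, or None."""
    deepest = None
    for level, a, b in zip(CATH_LEVELS, code_a.split("."), code_b.split(".")):
        if a != b:
            break  # the branches diverge here, so deeper levels cannot be shared
        deepest = level
    return deepest


if __name__ == "__main__":
    # Hypothetical domain codes, for illustration only.
    print(shared_cath_level("3.40.50.300", "3.40.50.720"))
```

Stopping at the first mismatch reflects the tree structure: two domains in different architectures are unrelated at the topology level even if the topology numbers happen to coincide.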
Classifications of protein structures (domains)
CATH: Structural classification of proteins [http://www.biochem.ucl.ac.uk/bsm/cath/]
SCOP: Structural classification of proteins [http://scop.mrc-lmb.cam.ac.uk/scop/]
FSSP: Fold classification based on structure alignments [http://www.sander.ebi.ac.uk/fssp/]
HSSP: Homology-derived secondary structure assignments [http://www.sander.ebi.ac.uk/hssp/]
DALI: Classification of protein domains [http://www.ebi.ac.uk/dali/domain/]
VAST: Structural neighbours by direct 3D structure comparison [http://www.ncbi.nlm.nih.gov:80/Structure/VAST/vast.shtml]
CE: Structure comparisons by Combinatorial Extension [http://cl.sdsc.edu/ce.html]
Slide courtesy of Shoshana Wodak
Books
Branden, C. & Tooze, J. (1991). Introduction to Protein Structure. 1st edition. Garland Publishing Inc., New York and London.
Westhead, D.R., Parish, J.H. & Twyman, R.M. (2002). Bioinformatics. BIOS Scientific Publishers, Oxford.
Mount, D.W. (2001). Bioinformatics: Sequence and Genome Analysis. 1st edition. Cold Spring Harbor Laboratory Press, New York.
Gibas, C. & Jambeck, P. (2001). Developing Bioinformatics Computer Skills. O'Reilly.
Secondary structure elements
Secondary structure - α-helix
Source: Branden & Tooze (1991)
[Figure: α-helix with 3.6 residues per turn; hydrogen bonds marked; atom labels: carbon, nitrogen, oxygen]
Hydrophobicity of side-chain residues in helices
Source: Branden & Tooze (1999). Colour key: blue, polar; red, basic or acidic.
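With 3.6 residues per turn, consecutive residues in an α-helix are rotated by 360/3.6 = 100° around the helix axis. Projecting a sequence onto such a "helical wheel" shows whether hydrophobic side chains cluster on one face. The sketch below only computes angular positions; the ±50° window used to decide whether two residues lie "on the same face" is an arbitrary illustrative choice, not a value from the source.

```python
def wheel_angles(sequence, residues_per_turn=3.6):
    """Angular position (degrees, 0-360) of each residue on a helical wheel."""
    step = 360.0 / residues_per_turn  # 100 degrees for an ideal alpha-helix
    return [(aa, (i * step) % 360.0) for i, aa in enumerate(sequence)]


def same_face(i, j, residues_per_turn=3.6, half_width=50.0):
    """True if residues i and j point within +/- half_width degrees of each other."""
    step = 360.0 / residues_per_turn
    delta = abs((i - j) * step) % 360.0
    return min(delta, 360.0 - delta) <= half_width


if __name__ == "__main__":
    print(wheel_angles("LKELLKK"))
    print([i for i in range(1, 19) if same_face(0, i)])
```

With this window, residue 0 shares a face with residues 4, 7, 11, 14 and 18, so hydrophobic residues spaced three to four positions apart line up along one side of the helix.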
Secondary structure - sheets
[Figure: antiparallel and parallel sheets]
Source: Branden & Tooze (1991)
Secondary structure - twist of sheets
[Figure: mixed sheet]
Source: Branden & Tooze (1991)
Angles of rotation
Each dipeptide unit is characterized by two angles of rotation:
φ (phi), around the N-Cα bond
ψ (psi), around the Cα-C bond
[Figure: dipeptide unit]
Image from Branden & Tooze (1999)
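φ and ψ are dihedral (torsion) angles, so each can be computed from the coordinates of four consecutive backbone atoms: φ from C(i-1), N(i), Cα(i), C(i) and ψ from N(i), Cα(i), C(i), N(i+1). The sketch below uses the standard atan2 formula for a signed dihedral angle; plain 3-tuples stand in for atom coordinates, and the helper names are illustrative.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, -180..180) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone (phi, psi) of residue i from its own and neighbouring atoms."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

Using atan2 rather than an arccosine keeps the sign of the angle, which is what distinguishes, for example, the right-handed and left-handed helix regions of the Ramachandran map.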
The Ramachandran map
[Figure: Ramachandran map for the dipeptide unit]
Slide courtesy of Shoshana Wodak
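On the Ramachandran map, regular secondary structures occupy characteristic (φ, ψ) regions. The sketch below assigns a (φ, ψ) pair to the nearest of three region centres, measuring distance on the periodic map so that -180° and +180° coincide. The centre values are approximate textbook positions, and the nearest-centre rule is a deliberate simplification: real analyses use the contours of the allowed regions, not a single point per region.

```python
import math

# Approximate (phi, psi) centres in degrees; illustrative, not region boundaries.
REGION_CENTRES = {
    "alpha_R": (-60.0, -45.0),  # right-handed alpha-helix
    "beta": (-120.0, 130.0),    # beta-strand
    "alpha_L": (60.0, 45.0),    # left-handed helix
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def nearest_region(phi, psi):
    """Name of the region centre closest to (phi, psi) on the periodic map."""
    return min(
        REGION_CENTRES,
        key=lambda name: math.hypot(angle_diff(phi, REGION_CENTRES[name][0]),
                                    angle_diff(psi, REGION_CENTRES[name][1])),
    )


if __name__ == "__main__":
    print(nearest_region(-65.0, -40.0))
    print(nearest_region(-120.0, -170.0))
```

The wrap-around matters: ψ = -170° lies only 60° from the β centre at ψ = +130°, not 300°.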
Tertiary structure
Combinations of secondary structures
[Figure labels: loop, α-helix, β-sheet]
Retinol binding protein (PDB:1rpb)
Analysis of structure
Question: is structure A similar to structure B?
[Figure: structure A and structure B]
Approach: structure alignments
Structure-structure alignment and comparison
Slide courtesy of Shoshana Wodak
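A common way to quantify the similarity of two structures, once residues have been paired by an alignment, is the root-mean-square deviation (RMSD) of equivalent atoms after optimal superposition. The sketch below uses the Kabsch method (centre both coordinate sets, then obtain the best rotation from a singular value decomposition). It assumes NumPy is available and that row k of `P` corresponds to row k of `Q`, i.e. that the residue correspondence is already known; finding that correspondence is the harder part of structure alignment and is not shown.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between paired N x 3 coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    P = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
    # Same points rotated 90 degrees about z and translated: RMSD close to 0.
    Q = [[-y + 1, x + 2, z + 3] for x, y, z in P]
    print(kabsch_rmsd(P, Q))
```

The reflection check matters for proteins: a mirror image of a structure is a different (chiral) object and should not superpose onto the original with zero RMSD.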
Analyzing conformational changes
Citrate synthase: ligand-induced conformational changes (open form and closed form)
Domain motion and small structural distortions
Slide courtesy of Shoshana Wodak
Defining Domains: What for?
Link between domain structure and function:
Different structural domains can be associated with different functions.
Enzyme active sites are often at domain interfaces; domain movements play a functional role.
[Figure: cathepsin D; DNA methyltransferase]
Slide courtesy of Shoshana Wodak
[Figure: domains delimited by 1, 2 and 4 cuts of the polypeptide chain; N- and C-termini marked]
Slide courtesy of Shoshana Wodak
Methods for Identifying Domains
Underlying principle: domain limits are defined by identifying groups of residues such that the number of contacts between groups is minimized.
Domains From Contact Map
Lactate dehydrogenase
Slide courtesy of Shoshana Wodak
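The principle of minimizing contacts between groups can be illustrated on the simplest case, a single cut of the chain into two segments. The sketch below builds a contact map from one coordinate per residue (for example Cα atoms) with an assumed 8 Å cutoff, then tries every cut position and keeps the one with the fewest inter-domain contacts. The cutoff and minimum domain size are illustrative parameters; because raw contact counts favour splitting off tiny fragments, practical methods normalize the score and also consider domains made of several chain segments.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Boolean matrix: True where two different residues lie closer than cutoff."""
    n = len(coords)
    return [[i != j and math.dist(coords[i], coords[j]) < cutoff for j in range(n)]
            for i in range(n)]


def best_single_cut(contacts, min_size=2):
    """Return (k, n_contacts) for the cut separating residues [0, k) and [k, n)
    with the fewest contacts between the two parts (None if the chain is too short)."""
    n = len(contacts)
    best = None
    for k in range(min_size, n - min_size + 1):
        between = sum(contacts[i][j] for i in range(k) for j in range(k, n))
        if best is None or between < best[1]:
            best = (k, between)
    return best


if __name__ == "__main__":
    # Two compact groups of four residues, far apart along x (toy coordinates).
    coords = [(float(x), 0.0, 0.0) for x in range(4)]
    coords += [(float(x), 0.0, 0.0) for x in range(20, 24)]
    print(best_single_cut(contact_map(coords)))
```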
Structure prediction
Methods for structure prediction
Homology modelling: building a 3D model on the basis of similar sequences.
Threading: threading the sequence onto all known protein structures and testing the consistency.
Secondary structure prediction.
Ab initio prediction of tertiary structure: for proteins of normal size, it is almost impossible to predict structures ab initio. Some results have been obtained in the prediction of oligopeptide structures.
Homology modelling - steps
1. Similarity search
2. Modelling of backbone: secondary structure elements, loops
3. Modelling of side chains
4. Refinement of the model
5. Verification: steric compatibility of the residues
Homology modelling - similarity search
Starting from a query sequence, search for similar sequences with known structure.
Search for similar sequences in a database of protein structures.
Build a multiple alignment. A weight can be assigned to each matching protein (higher scores for more similar proteins).
The higher the sequence similarity, the more accurate the predicted structure will be.
When structures are available for proteins with >70% similarity to the query, a good model can be expected.
When the similarity is <40%, homology modelling gives poor results.
The lack of available structures constitutes one of the main limitations of homology modelling. In 2004, the PDB contained …
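The similarity thresholds above can be turned into a rough triage rule. The sketch below computes percent identity over the ungapped columns of a pairwise alignment and maps it onto the >70% and <40% ranges quoted on the slide. Using identity as the similarity measure, the "intermediate" label for the range in between, and weights proportional to alignment scores are assumptions made for illustration; the template names are hypothetical.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_query, aligned_template)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def expected_model_quality(similarity):
    """Coarse expectation following the thresholds quoted in the text."""
    if similarity > 70:
        return "good model expected"
    if similarity < 40:
        return "poor results expected"
    return "intermediate"  # assumption: the source gives no label for 40-70%


def template_weights(scores):
    """Weight each template in proportion to its alignment score."""
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


if __name__ == "__main__":
    pid = percent_identity("AC-DEFGHIK", "ACQDEFGHLK")
    print(round(pid, 1), expected_model_quality(pid))
    print(template_weights({"template_1": 3.0, "template_2": 1.0}))
```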
Homology modelling - Backbone modelling
Modelling of secondary structure elements (α-helices, β-sheets): for each secondary structure element of the template, align the backbone of query and template.
Loop modelling: databases of loop regions. The loop main chain depends on the number of amino acids and on the neighbouring elements (α-α, α-β, β-α, β-β).
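The backbone step can be sketched as copying template coordinates onto the query wherever the alignment pairs two residues; query residues aligned to gaps are left without coordinates and become the loops to be modelled from a loop database. For simplicity the sketch assumes one coordinate triple per residue (for example the Cα position) rather than full backbone atoms, and the function name is hypothetical.

```python
def transfer_backbone(query_aln, template_aln, template_coords):
    """Copy template coordinates onto aligned query residues.

    Returns (model, loops): model maps query residue index -> coordinates,
    loops lists inclusive (start, end) query index ranges with no template partner.
    """
    model, loops = {}, []
    qi = ti = 0  # running residue indices in the ungapped query and template
    loop_start = None
    for q, t in zip(query_aln, template_aln):
        if q != "-" and t != "-":
            model[qi] = template_coords[ti]
            if loop_start is not None:  # an insertion in the query just ended
                loops.append((loop_start, qi - 1))
                loop_start = None
        elif q != "-" and loop_start is None:
            loop_start = qi  # query residue without a template partner
        if q != "-":
            qi += 1
        if t != "-":
            ti += 1
    if loop_start is not None:
        loops.append((loop_start, qi - 1))
    return model, loops


if __name__ == "__main__":
    coords = [(float(i), 0.0, 0.0) for i in range(5)]  # toy template coordinates
    model, loops = transfer_backbone("ACDQQEF", "ACD--EF", coords)
    print(sorted(model), loops)
```

Deletions in the query (template residues aligned to gaps) are simply skipped here; in practice they also distort the neighbouring backbone and usually require remodelling the flanking segment.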
Homology modelling - Side chain modelling
Side-chain conformation (model building and energy refinement)
Conserved side chains take the same coordinates as in the template.
For non-conserved side chains, use rotamer libraries to determine the most favourable conformation.
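The rule on this slide (conserved side chains copied from the template, others chosen from a rotamer library) can be written as a small decision function. In the sketch below the rotamer library is reduced to a list of χ1 angles and the energy is a caller-supplied scoring function; both are placeholders, since real rotamer libraries and energy functions are far more detailed. The three χ1 values in the example (about +60°, 180° and -60°) are the classic staggered positions, and the toy energy is invented purely for illustration.

```python
def choose_side_chain(query_aa, template_aa, template_chi, rotamers, energy):
    """Return the chi angles for one query residue.

    Conserved residues keep the template conformation; otherwise the rotamer
    with the lowest value of the supplied energy function is selected.
    """
    if query_aa == template_aa:
        return template_chi
    return min(rotamers, key=energy)


def toy_energy(chi):
    """Hypothetical score: pretend conformations near chi1 = 180 clash least."""
    return abs(abs(chi[0]) - 180.0)


if __name__ == "__main__":
    staggered_chi1 = [(60.0,), (180.0,), (-60.0,)]
    print(choose_side_chain("L", "L", (-65.0,), staggered_chi1, toy_energy))
    print(choose_side_chain("F", "L", (-65.0,), staggered_chi1, toy_energy))
```

Passing the energy as a function keeps the selection rule separate from the physics: the same code works whether the score is a simple clash count or a full force-field evaluation.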
Homology modelling - refinement
After the steps above have been completed, the model can be refined by modifying the positions of some atoms in order to reduce the energy.
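Refinement by energy reduction can be illustrated with steepest descent on a toy energy. The sketch below uses only harmonic bond terms, E = sum over bonds of k (r - r0)^2, and lets the caller restrict which atoms may move, echoing the idea of modifying the positions of only some atoms. The energy function, step size and iteration count are illustrative assumptions; real refinement uses a full force field and more robust minimizers.

```python
import math


def bond_energy(coords, bonds, k=1.0):
    """Toy energy: sum of k * (r - r0)**2 over bonds given as (i, j, r0)."""
    return sum(k * (math.dist(coords[i], coords[j]) - r0) ** 2 for i, j, r0 in bonds)


def bond_gradient(coords, bonds, k=1.0):
    """Analytical gradient of bond_energy with respect to every coordinate."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j, r0 in bonds:
        r = math.dist(coords[i], coords[j])  # assumes atoms i and j do not overlap
        factor = 2.0 * k * (r - r0) / r
        for d in range(3):
            delta = coords[i][d] - coords[j][d]
            grad[i][d] += factor * delta
            grad[j][d] -= factor * delta
    return grad


def steepest_descent(coords, bonds, movable=None, step=0.1, n_steps=200, k=1.0):
    """Repeatedly move the selected atoms a small step downhill in energy."""
    coords = [list(c) for c in coords]
    movable = range(len(coords)) if movable is None else movable
    for _ in range(n_steps):
        grad = bond_gradient(coords, bonds, k)
        for i in movable:
            for d in range(3):
                coords[i][d] -= step * grad[i][d]
    return coords


if __name__ == "__main__":
    start = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    bonds = [(0, 1, 1.5)]  # hypothetical ideal bond length of 1.5
    refined = steepest_descent(start, bonds, movable=[1])
    print(bond_energy(start, bonds), bond_energy(refined, bonds))
```

A fixed step size keeps the sketch short, but too large a step can make the iteration overshoot and diverge; practical minimizers adapt the step or use conjugate-gradient methods.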