Macromolecular structure Bioinformatics

Macromolecular structure Bioinformatics. Contents Determination of protein structure Structure databases Secondary structure elements (SSE) Tertiary structure

Page 1

Macromolecular structure

Bioinformatics

Page 2

Contents

Determination of protein structure
Structure databases
Secondary structure elements (SSE)
Tertiary structure
Structure analysis
    Structure alignment
    Domain recognition
Structure prediction
    Homology modelling
    Threading / fold recognition
    Secondary structure prediction
    Ab initio prediction

Page 3

Jacques van [email protected]

Determination of protein structure

Structure

Page 4

Crystallisation

Hanging drop method / vapour diffusion method

[Figure labels: microscope; microscope slide; 1 - dilute protein solution; 2 - concentrated salt solution; crystal]

Many different conditions of 1 and 2 must be tried.

Slide courtesy of Shoshana Wodak

Page 5

Determination of protein structure

[Figure panels: diffraction pattern; atomic model]

Slide courtesy of Shoshana Wodak

Page 6

The resolution problem

A high-resolution protein structure: 1.5 - 2.0 Å resolution

Slide courtesy of Shoshana Wodak

Page 7

Nuclear Magnetic Resonance (NMR)

Source: Branden & Tooze (1991)

Page 8

Interatomic forces

Covalent interactions
Hydrogen bonds
Hydrophobic/hydrophilic interactions
Ionic interactions
van der Waals forces
Repulsive forces

Page 9

Structure databases

Structure

Page 10

Structure databases

PDB (Protein Data Bank)
    Official structure repository
SCOP (Structural Classification Of Proteins)
    Structure classification. The top level reflects structural classes. The second level, called Fold, includes topological and similarity criteria.
CATH (Class, Architecture, Topology and Homologous superfamily)

Page 11

PDB entry header

HEADER TRANSCRIPTION REGULATION 06-MAR-92 1D66 1D66 2
COMPND GAL4 (RESIDUES 1 - 65) COMPLEX WITH 19MER DNA 1D66 3
SOURCE (SACCHAROMYCES $CEREVISIAE) OVEREXPRESSED IN (ESCHERICHIA 1D66 4
SOURCE 2 $COLI) 1D66 5
AUTHOR R.MARMORSTEIN,S.HARRISON 1D66 6
REVDAT 1 15-APR-93 1D66 0 1D66 7
JRNL AUTH R.MARMORSTEIN,M.CAREY,M.PTASHNE,S.C.HARRISON 1D66 8
JRNL TITL /DNA$ RECOGNITION BY /GAL4$: STRUCTURE OF A 1D66 9
JRNL TITL 2 PROTEIN(SLASH)/DNA$ COMPLEX 1D66 10
JRNL REF NATURE V. 356 408 1992 1D66 11
JRNL REFN ASTM NATUAS UK ISSN 0028-0836 006 1D66 12
REMARK 1 1D66 13
REMARK 2 1D66 14
REMARK 2 RESOLUTION. 2.7 ANGSTROMS. 1D66 15
REMARK 3 1D66 16
REMARK 3 REFINEMENT. 1D66 17
REMARK 3 PROGRAM CORELS;TNT;XPLOR 1D66 18
REMARK 3 AUTHORS J.SUSSMAN;D.TRONRUD;A.BRUNGER 1D66 19
REMARK 3 R VALUE 0.230 1D66 20
REMARK 3 RMSD BOND DISTANCES 0.015 ANGSTROMS 1D66 21
REMARK 3 RMSD BOND ANGLES 2.9 DEGREES 1D66 22
REMARK 4 1D66 23
REMARK 4 THERE ARE TWO DNA CHAINS WHICH HAVE BEEN ASSIGNED CHAIN 1D66 24
REMARK 4 INDICATORS *D* AND *E*. THERE ARE TWO PROTEIN CHAINS 1D66 25
REMARK 4 WHICH HAVE BEEN ASSIGNED CHAIN INDICATORS *A* AND *B*. 1D66 26
REMARK 4 EACH PROTEIN - DNA COMPLEX CONTAINS FOUR BOUND CD IONS. 1D66 27
...
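As an illustration of how such a header can be read by a program, here is a minimal Python sketch that pulls the deposition date, the entry ID, the resolution and the R value out of header records like those of entry 1D66. The function name `parse_header` and the regular expressions are assumptions made for this example; real PDB files use fixed column positions, which this whitespace-tolerant sketch deliberately ignores.

```python
import re

# A few records copied from the 1D66 header shown on this slide.
SAMPLE = """\
HEADER TRANSCRIPTION REGULATION 06-MAR-92 1D66 1D66 2
COMPND GAL4 (RESIDUES 1 - 65) COMPLEX WITH 19MER DNA 1D66 3
REMARK 2 RESOLUTION. 2.7 ANGSTROMS. 1D66 15
REMARK 3 R VALUE 0.230 1D66 20
"""


def parse_header(text):
    """Collect a few fields from PDB header records (hypothetical helper)."""
    info = {}
    for line in text.splitlines():
        record = line[:6].strip()  # record name: HEADER, COMPND, REMARK, ...
        if record == "HEADER":
            # deposition date (DD-MMM-YY) followed by the 4-character entry ID
            m = re.search(r"(\d{2}-[A-Z]{3}-\d{2})\s+(\w{4})", line)
            if m:
                info["date"], info["id"] = m.group(1), m.group(2)
        elif record == "REMARK":
            m = re.search(r"RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS", line)
            if m:
                info["resolution"] = float(m.group(1))
            m = re.search(r"R VALUE\s+([\d.]+)", line)
            if m:
                info["r_value"] = float(m.group(1))
    return info


if __name__ == "__main__":
    print(parse_header(SAMPLE))
```

For production use, an established parser that follows the column layout of the PDB format is a safer choice than regular expressions.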

Page 12

CATH - A protein domain classification

In CATH, protein domains are classified according to a tree with 4 hierarchical levels:

    Class
    Architecture
    Topology
    Homology (homologous superfamily)

[Figure: examples for the Class, Architecture and Topology levels]

Figure from Shoshana Wodak
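The four-level tree can be represented as a dotted code with one number per level. The sketch below is a hedged illustration only: the names `CathCode`, `parse_cath` and `shared_levels` are hypothetical, and the two codes in the usage example are given purely as sample inputs.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    """One number per level of the hierarchy (hypothetical container)."""
    cls: int
    architecture: int
    topology: int
    superfamily: int


def parse_cath(code):
    """Split a dotted code such as '3.40.50.300' into its four levels."""
    parts = [int(p) for p in code.split(".")]
    if len(parts) != 4:
        raise ValueError("expected four dot-separated levels: C.A.T.H")
    return CathCode(*parts)


def shared_levels(a, b):
    """Count how many levels two domains share, starting from Class."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    d1 = parse_cath("3.40.50.300")
    d2 = parse_cath("3.40.50.720")
    # Same class, architecture and topology, different superfamily.
    print(shared_levels(d1, d2))
```

Two domains that share more leading levels are closer in the classification tree.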

Page 13

Classifications of protein structures (domains)

CATH: Structural classification of proteins [http://www.biochem.ucl.ac.uk/bsm/cath/]
SCOP: Structural classification of proteins [http://scop.mrc-lmb.cam.ac.uk/scop/]
FSSP: Fold classification based on structure alignments [http://www.sander.ebi.ac.uk/fssp/]
HSSP: Homology derived secondary structure assignments [http://www.sander.ebi.ac.uk/hssp/]
DALI: Classification of protein domains [http://www.ebi.ac.uk/dali/domain/]
VAST: Structural neighbours by direct 3D structure comparison [http://www.ncbi.nlm.nih.gov:80/Structure/VAST/vast.shtml]
CE: Structure comparisons by Combinatorial Extension [http://cl.sdsc.edu/ce.html]

Slide courtesy of Shoshana Wodak

Page 14

Books

Branden, C. & Tooze, J. (1991). Introduction to Protein Structure. 1st ed., Garland Publishing Inc., New York and London.

Westhead, D.R., Parish, J.H. & Twyman, R.M. (2002). Bioinformatics. BIOS Scientific Publishers, Oxford.

Mount, D.W. (2001). Bioinformatics: Sequence and Genome Analysis. 1st ed., Cold Spring Harbor Laboratory Press, New York.

Gibas, C. & Jambeck, P. (2001). Developing Bioinformatics Computer Skills. O'Reilly.

Page 15

Secondary structure elements

Structure

Page 16

Secondary structure - α-helix

[Figure labels: 3.6 residues per turn; hydrogen bond; atom key: carbon, nitrogen, oxygen]

Source: Branden & Tooze (1991)

Page 17

Hydrophobicity of side-chain residues in helices

[Figure not available in the transcript]

Colour code: blue, polar; red, basic or acidic.

Source: Branden & Tooze (1999)
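The distribution of hydrophobic and polar side chains around a helix can be explored numerically. The sketch below places successive residues 100 degrees apart (360 degrees divided by 3.6 residues per turn) and computes a mean hydrophobic moment. The choice of the Kyte-Doolittle hydropathy scale and the function names are assumptions made for this example, not something taken from the slide.

```python
import math

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

DEGREES_PER_RESIDUE = 100.0  # 360 degrees / 3.6 residues per turn


def wheel_angles(sequence):
    """Angular position of each residue on a helical wheel, in degrees."""
    return [(i * DEGREES_PER_RESIDUE) % 360.0 for i in range(len(sequence))]


def hydrophobic_moment(sequence, scale=KYTE_DOOLITTLE):
    """Mean hydrophobic moment: length of the vector sum of hydropathies."""
    sx = sy = 0.0
    for i, residue in enumerate(sequence):
        theta = math.radians(i * DEGREES_PER_RESIDUE)
        sx += scale[residue] * math.cos(theta)
        sy += scale[residue] * math.sin(theta)
    return math.hypot(sx, sy) / len(sequence)


if __name__ == "__main__":
    peptide = "LKKLLKKLLKKLLKKL"  # made-up sequence, for illustration only
    print(wheel_angles(peptide)[:5])
    print(round(hydrophobic_moment(peptide), 2))
```

A helix whose hydrophobic residues cluster on one face gives a large moment; a uniform sequence covering a whole number of turns gives a moment close to zero.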

Page 18

Secondary structure - β-sheets

Antiparallel Parallel

Source: Branden & Tooze (1991)

Page 19

Secondary structure - twist of β-sheets

Mixed sheet

Source: Branden & Tooze (1991)

Page 20

Angles of rotation

Each dipeptide unit is characterized by two angles of rotation:

    Phi (φ), around the N-Cα bond
    Psi (ψ), around the Cα-C bond

[Figure not available in the transcript]

Image from Branden & Tooze (1999)
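Both angles are dihedral (torsion) angles defined by four consecutive backbone atoms. A minimal pure-Python sketch follows, assuming the usual convention that phi is measured over C(i-1), N, Cα, C and psi over N, Cα, C, N(i+1); the function names are illustrative.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) around the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Phi and psi of one residue from five backbone atom positions."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Four points forming a right-angle torsion.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Plotting the (phi, psi) pair of every residue of a structure gives the Ramachandran map.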

Page 21

The Ramachandran map

[Figure label: dipeptide unit]

Slide courtesy of Shoshana Wodak

Page 22

Tertiary structure

Structure

Page 23

Combinations of secondary structures

[Figure labels: loop; α-helix; β-sheet]

Retinol binding protein (PDB:1rpb)

Page 24

Analysis of structure

Bioinformatics

Page 25

Structure-structure alignment and comparison

Question: Is structure A similar to structure B?

[Figure panels: structure A; structure B]

Approach: structure alignments

Slide courtesy of Shoshana Wodak
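Once residues of A and B have been put in correspondence, similarity can be expressed as a number. The sketch below uses the distance RMSD (dRMSD), which compares the internal distances of the two structures and therefore needs no superposition. It is one simple measure among several, chosen here for brevity, and not necessarily the one used by the tools cited in this course. The coordinates are assumed to be Cα positions of already matched residues.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance RMSD between two equally long lists of matched 3D points."""
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two matched coordinate lists of length >= 2")
    total = 0.0
    pairs = 0
    for i, j in combinations(range(len(coords_a)), 2):
        # difference between the same internal distance in A and in B
        diff = (math.dist(coords_a[i], coords_a[j])
                - math.dist(coords_b[i], coords_b[j]))
        total += diff * diff
        pairs += 1
    return math.sqrt(total / pairs)


if __name__ == "__main__":
    a = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
    rotated_shifted = [(5, 5, 5), (5, 6, 5), (4, 6, 5)]  # same shape as a
    stretched = [(0, 0, 0), (2, 0, 0), (2, 2, 0)]
    print(drmsd(a, rotated_shifted), drmsd(a, stretched))
```

Because only internal distances are compared, dRMSD cannot distinguish a structure from its mirror image; superposition-based RMSD does not have this limitation.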

Page 26

Analyzing conformational changes

Citrate synthase: ligand-induced conformational changes
Domain motion and small structural distortions

[Figure panels: open form; closed form]

Slide courtesy of Shoshana Wodak

Page 27

Defining Domains: What for?

Link between domain structure and function

    Different structural domains can be associated with different functions.
    Enzyme active sites are often at domain interfaces; domain movements play a functional role.

[Figure panels: Cathepsin D; DNA methyltransferase]

Slide courtesy of Shoshana Wodak

Page 28

Methods for Identifying Domains

Underlying principle: domain limits are defined by identifying groups of residues such that the number of contacts between groups is minimized.

[Figure: chains drawn from N to C terminus, partitioned into domains with 1 cut, 2 cuts and 4 cuts]

Slide courtesy of Shoshana Wodak
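The simplest case of this principle, a single cut, can be sketched as follows: try every cut position along the chain and keep the one with the fewest contacts between the N-terminal and C-terminal parts. The 8 Å Cα distance cutoff, the minimum domain size and the function names are assumptions made for this illustration; published methods use more elaborate normalisations and also handle multiple cuts.

```python
import math


def contacts_across(coords, cut, cutoff=8.0):
    """Number of residue pairs in contact between [0, cut) and [cut, n)."""
    return sum(
        1
        for i in range(cut)
        for j in range(cut, len(coords))
        if math.dist(coords[i], coords[j]) <= cutoff
    )


def best_single_cut(coords, min_size=2, cutoff=8.0):
    """Cut position that minimises the contacts between the two parts."""
    candidates = range(min_size, len(coords) - min_size + 1)
    if not candidates:
        raise ValueError("chain too short for the requested minimum size")
    return min(candidates, key=lambda k: contacts_across(coords, k, cutoff))


if __name__ == "__main__":
    # Toy chain: two compact groups of four residues, far from each other.
    chain = [(3.0 * i, 0.0, 0.0) for i in range(4)]
    chain += [(100.0 + 3.0 * i, 0.0, 0.0) for i in range(4)]
    print(best_single_cut(chain))
```

Without the minimum-size constraint, cuts near the chain ends would win trivially because very few residues lie on one side.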

Page 29

Domains From Contact Map

Lactate dehydrogenase

Slide courtesy of Shoshana Wodak
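A contact map is a square matrix with one row and one column per residue, where a cell is marked when the two residues are close in space. A minimal sketch is given below, assuming Cα coordinates and an 8 Å cutoff (both are illustrative choices); compact domains appear as dense blocks along the diagonal.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=1):
    """Symmetric boolean matrix: True where two residues are in contact."""
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = True
    return cmap


def render(cmap):
    """Text picture of the map: '#' for a contact, '.' otherwise."""
    return "\n".join("".join("#" if c else "." for c in row) for row in cmap)


if __name__ == "__main__":
    # Four residues on a straight line, 3.8 Å apart (toy coordinates).
    line = [(3.8 * i, 0.0, 0.0) for i in range(4)]
    print(render(contact_map(line)))
```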

Page 30

Structure prediction

Structure

Page 31

Methods for structure prediction

Homology modelling
    Building a 3D model on the basis of similar sequences
Threading
    Threading the sequence on all known protein structures, and testing the consistency
Secondary structure prediction
Ab initio prediction of tertiary structure
    For proteins of normal size, it is almost impossible to predict structures ab initio.
    Some results have been obtained in the prediction of oligopeptide structures.

Page 32

Homology modelling - steps

Similarity search
Modelling of backbone
    Secondary structure elements
    Loops
Modelling of side chains
Refinement of the model
Verification
    Steric compatibility of the residues

Page 33

Homology modelling - similarity search

Starting from a query sequence, search for similar sequences with known structure.

    Search for similar sequences in a database of protein structures.
    Multiple alignment.
    A weight can be assigned to each matching protein (higher score to more similar proteins).

The higher the sequence similarity, the more accurate the predicted structure will be.

    When structures are available for proteins with >70% similarity to the query, a good model can be expected.
    When the similarity is <40%, homology modelling gives poor results.

The lack of available structures is one of the main limitations of homology modelling.

    In 2004, PDB contains
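The two thresholds quoted on this slide can be turned into a small decision rule. In the sketch below, percent identity over the aligned (non-gap) columns is used as a simple stand-in for similarity, which is an assumption of this example; the label "intermediate" for the 40-70% range and the function names are also illustrative.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent of identical residues over columns without gaps ('-')."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != "-" and t != "-"
    ]
    if not columns:
        return 0.0
    return 100.0 * sum(q == t for q, t in columns) / len(columns)


def expected_model_quality(similarity_percent):
    """Rule of thumb from the slide: >70% good, <40% poor."""
    if similarity_percent > 70:
        return "good"
    if similarity_percent < 40:
        return "poor"
    return "intermediate"


if __name__ == "__main__":
    pid = percent_identity("ACDE-FG", "ACDQKFG")  # toy alignment
    print(round(pid, 1), expected_model_quality(pid))
```

In practice the same score could serve as the weight assigned to each candidate template.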

Page 34

Homology modelling - Backbone modelling

Modelling of secondary structure elements
    α-helices
    β-sheets
    For each secondary structure element of the template, align the backbone of query and template.
Loop modelling
    Databases of loop regions
    Loop main chain depends on the number of amino acids and on the neighbouring elements (α-α, α-β, β-α, β-β)
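The two properties that select a loop conformation, its length and the types of the flanking elements, can be extracted from a secondary structure string. The sketch below assumes a three-letter encoding (H = helix, E = strand, C = coil), which is a convention chosen for this example; the resulting (length, context) pair is the kind of key one could use to query a loop database. Contexts are written a-b, with a for helix and b for strand.

```python
from itertools import groupby

SSE_NAMES = {"H": "a", "E": "b"}  # a = alpha-helix, b = beta-strand


def segments(sse):
    """Split an SSE string into (type, start, length) runs."""
    runs, position = [], 0
    for kind, group in groupby(sse):
        length = len(list(group))
        runs.append((kind, position, length))
        position += length
    return runs


def loops_with_context(sse):
    """List (start, length, context) for coil runs flanked by two elements."""
    runs = segments(sse)
    loops = []
    for before, loop, after in zip(runs, runs[1:], runs[2:]):
        if loop[0] == "C" and before[0] in SSE_NAMES and after[0] in SSE_NAMES:
            context = f"{SSE_NAMES[before[0]]}-{SSE_NAMES[after[0]]}"
            loops.append((loop[1], loop[2], context))
    return loops


if __name__ == "__main__":
    # Made-up assignment string, for illustration only.
    print(loops_with_context("CCHHHHCCCEEEECCEEEECHHHHCC"))
```

Coil regions at the chain termini have only one neighbouring element and are left out by this sketch.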

Page 35

Homology modelling - Side chain modelling

Side-chain conformation (model building and energy refinement)

Conserved side chains take same coordinates as in the template.

For non-conserved side chains, use rotamer libraries to determine the most favourable conformation.
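The rule above can be sketched in a few lines: copy the template conformation where the residue is conserved, otherwise pick the rotamer with the lowest energy. Everything concrete here is hypothetical: side-chain conformations are reduced to tuples of chi angles, and `TOY_LIBRARY` and `toy_energy` are placeholders for a real rotamer library and a real energy function, which would depend on the neighbouring atoms.

```python
def model_side_chains(query, template, template_chis, rotamer_library, energy):
    """Return one side-chain conformation (chi angles) per aligned residue.

    query, template: aligned sequences of equal length, without gaps.
    template_chis:   conformation observed in the template at each position.
    rotamer_library: residue type -> list of candidate conformations.
    energy:          function (position, conformation) -> float.
    """
    model = []
    for position, (q, t) in enumerate(zip(query, template)):
        if q == t:
            # conserved side chain: keep the template conformation
            model.append(template_chis[position])
        else:
            # non-conserved: most favourable rotamer for the new residue
            candidates = rotamer_library[q]
            model.append(min(candidates, key=lambda c: energy(position, c)))
    return model


# Placeholder data, for illustration only.
TOY_LIBRARY = {"S": [(60.0,), (180.0,), (-60.0,)]}


def toy_energy(position, chis):
    """Pretend that chi1 = 180 degrees is the most favourable everywhere."""
    return abs(chis[0] - 180.0)


if __name__ == "__main__":
    print(model_side_chains("AS", "AT", [(), (55.0,)], TOY_LIBRARY, toy_energy))
```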

Page 36

Homology modelling - refinement

After the steps above have been completed, the model can be refined by modifying the positions of some atoms in order to reduce the energy.
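A toy version of this refinement is sketched below: atoms are moved along the negative gradient of an energy function until the energy stops decreasing (steepest descent). The energy used here is a plain Lennard-Jones term between all atom pairs, with arbitrary parameters; both the energy function and the step-size rule are simplifying assumptions, far from the force fields used by real modelling programs.

```python
import math


def lj_energy(coords, epsilon=1.0, sigma=3.4):
    """Sum of Lennard-Jones energies over all atom pairs (toy parameters)."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            sr6 = (sigma / math.dist(coords[i], coords[j])) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


def minimise(coords, energy=lj_energy, step=0.01, iterations=200, h=1e-4):
    """Steepest descent with a numerical gradient; returns (coords, energy)."""
    coords = [list(c) for c in coords]
    current = energy(coords)
    for _ in range(iterations):
        gradient = []
        for atom in coords:
            g = []
            for k in range(3):
                old = atom[k]
                atom[k] = old + h
                e_plus = energy(coords)
                atom[k] = old - h
                e_minus = energy(coords)
                atom[k] = old  # restore the coordinate
                g.append((e_plus - e_minus) / (2.0 * h))
            gradient.append(g)
        trial = [[x - step * gx for x, gx in zip(atom, g)]
                 for atom, g in zip(coords, gradient)]
        e_trial = energy(trial)
        if e_trial < current:
            coords, current = trial, e_trial  # accept the downhill move
        else:
            step *= 0.5  # overshoot: retry with a smaller step
    return coords, current


if __name__ == "__main__":
    start = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]  # two atoms placed too close
    refined, e = minimise(start)
    print(round(lj_energy(start), 2), round(e, 3))
```

In the two-atom example the atoms move apart until they sit near the minimum of the pair potential, and the energy drops accordingly.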