Protein Tertiary Structure Prediction
Structural Bioinformatics
The Different Levels of Protein Structure
Primary: amino acid linear sequence.
Secondary: α-helices, β-sheets and loops.
Tertiary: the 3D shape of the fully folded polypeptide chain
Predicting 3D Structure
An outstanding, difficult problem
– Comparative modeling (homology)
Based on sequence homology
– Fold recognition (threading)
Based on structural homology
Comparative Modeling
Similar sequences suggest similar structure
Sequence and structure alignments of two Retinol Binding Proteins
Structure Alignments
The outputs of a structural alignment are a superposition of the atomic coordinates and a minimal root-mean-square deviation (RMSD) between the structures. The RMSD of two aligned structures indicates their divergence from one another.
Low RMSD values mean similar structures
There are many different algorithms for structural alignment.
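As a minimal sketch of the RMSD definition above, the function below computes the RMSD of two coordinate sets that are assumed to be already superposed, with atom i of one structure paired with atom i of the other. Finding the optimal superposition itself (the job of a structural alignment algorithm) is not shown, and the coordinates are made-up illustration values.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the structures are already superposed and that atom i in
    coords_a is paired with atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical C-alpha coordinates (in angstroms) for three paired residues.
structure_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
structure_2 = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(structure_1, structure_2))
```

Every paired atom here is exactly 1 Å apart, so the RMSD is 1 Å; identical structures give 0.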
Comparative Modeling
Builds a protein structure model based on its alignment to one or more related protein structures in the database
Similar sequence suggests similar structure
Comparative Modeling
• Accuracy of the comparative model is related to the sequence identity on which it is based
>50% sequence identity = high accuracy
30%-50% sequence identity = 90% modeled
<30% sequence identity = low accuracy (many errors)
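The identity ranges above can be sketched as a small lookup. Percent identity is computed here over aligned (non-gap) columns only, which is one of several conventions and an assumption of this sketch. The function names and the example alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gap columns are skipped (assumed convention)
        aligned += 1
        if a == b:
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0

def expected_accuracy(identity):
    """Map percent identity to the accuracy ranges listed on the slide."""
    if identity > 50:
        return "high accuracy"
    if identity >= 30:
        return "90% modeled"
    return "low accuracy (many errors)"

# Hypothetical target/template alignment.
identity = percent_identity("MKT-AYIA", "MKTQAYLA")
print(round(identity, 1), expected_accuracy(identity))
```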
Homology Threshold for Different Alignment Lengths
Figure: homology threshold t (y-axis, 0 to 90) plotted against alignment length L (x-axis, 0 to 100)
A sequence alignment between two proteins is considered to imply structural homology if the sequence identity is equal to or above the homology threshold t in a sequence region of a given length L.
The threshold values t(L) are derived from the PDB
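The slides do not give a formula for t(L). As an illustration only, the sketch below assumes the widely cited fit of Sander and Schneider (1991), t(L) = 290.15 · L^(-0.562) for alignment lengths 10 to 80, with a flat value of 24.8% for longer alignments. Both the formula and the function names are assumptions and not taken from this presentation.

```python
def homology_threshold(length):
    """Assumed threshold t(L) in percent identity (Sander & Schneider fit)."""
    if length < 10:
        raise ValueError("the assumed fit is only defined for L >= 10")
    if length > 80:
        return 24.8  # assumed plateau for long alignments
    return 290.15 * length ** -0.562

def implies_structural_homology(identity_percent, length):
    """True if the identity is at or above the threshold for this length."""
    return identity_percent >= homology_threshold(length)

for length in (10, 20, 40, 80, 100):
    print(length, round(homology_threshold(length), 1))
```

Under this assumed curve, 30% identity over 100 residues passes the threshold, while 30% identity over only 20 residues does not.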
Comparative Modeling
• Similarity particularly high in core
– Alpha helices and beta sheets preserved
– Even near-identical sequences vary in loops
Comparative Modeling Methods
MODELLER (Sali, Rockefeller/UCSF)
SCWRL (Dunbrack, UCSF)
SWISS-MODEL http://swissmodel.expasy.org//SWISS-MODEL.html
Comparative Modeling
Modeling of a sequence based on known structures
Consists of four major steps:
1. Finding a known structure(s) related to the sequence to be modeled (template), using sequence comparison methods such as PSI-BLAST
2. Aligning sequence with the templates
3. Building a model
4. Assessing the model
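Step 2 above (aligning the sequence with the template) can be illustrated with a minimal global alignment sketch in the style of Needleman-Wunsch. The scoring values (match +1, mismatch -1, linear gap -1) are assumptions chosen for readability. Real modeling pipelines typically use substitution matrices, more elaborate gap penalties and profile methods, none of which are shown here.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (aligned_a, aligned_b, score)."""
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

# Toy sequences; any one-letter strings work the same way.
target, template, total = needleman_wunsch("ACGT", "AGT")
print(target)
print(template)
print(total)
```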
Fold Recognition
Hemoglobin TIM
Protein Folds: sequential and spatial arrangement of secondary structures
Similar folds usually mean similar function
Homeodomain transcription factors
The same fold can have multiple functions
Rossmann
TIM barrel
12 functions
31 functions
Fold Recognition
• Methods of protein fold recognition attempt to detect similarities between protein 3D structures that have no significant sequence similarity.
• Search for folds that are compatible with a particular sequence.
• These methods "turn the protein folding problem on its head": rather than predicting how a sequence will fold, they predict how well a fold will fit a sequence
Basic steps in Fold Recognition :
Compare sequence against a Library of all known Protein Folds (finite number)
Query sequence
MTYGFRIPLNCERWGHKLSTVILKRP...
Goal: find to what folding template the sequence fits best
MAHFPGFGQSLLFGYPVYVFGD...
Potential folds 1) ... 56) ... n), with fit scores -10 ... -123 ... 20.5
There are different ways to evaluate sequence-structure fit
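One very simplified way to picture a sequence-structure fit score is sketched below. Each fold is reduced to a string of per-position environments, B (buried) or E (exposed), and a residue scores -1 when it suits its environment (hydrophobic in B, polar in E) and +1 otherwise, so lower totals mean a better fit. The residue classes, the fold library and the scoring are all hypothetical. Real threading programs also align the sequence to the fold with gaps and use knowledge-based potentials.

```python
# Assumed hydrophobic residue set (one-letter codes); a toy classification.
HYDROPHOBIC = set("AVLIMFWC")

def fit_score(sequence, environments):
    """Toy fit score of a sequence on a fold; lower is better."""
    if len(sequence) != len(environments):
        raise ValueError("this sketch only handles equal lengths (no gaps)")
    score = 0
    for residue, env in zip(sequence, environments):
        hydrophobic = residue in HYDROPHOBIC
        favourable = (hydrophobic and env == "B") or (not hydrophobic and env == "E")
        score += -1 if favourable else 1
    return score

def best_fold(sequence, fold_library):
    """Score the sequence against every fold and return the best name and all scores."""
    scores = {name: fit_score(sequence, envs) for name, envs in fold_library.items()}
    return min(scores, key=scores.get), scores

# Hypothetical fold library and query sequence.
fold_library = {"fold_1": "BBEEBB", "fold_2": "EEBBEE", "fold_3": "BEBEBE"}
query = "LVKDIA"
print(best_fold(query, fold_library))
```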
Programs for fold recognition
• TOPITS (Rost 1995)
• GenTHREADER (Jones 1999)
• SAM-T02 (UCSC HMM)
• 3D-PSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/
Ab Initio Modeling
• Compute molecular structure from laws of physics and chemistry alone
Theoretically: ideal solution
Practically: nearly impossible
Why?
– Exceptionally complex calculations
– Biophysics understanding incomplete
Ab Initio Methods
• Rosetta (Baker lab, Seattle)
• Undertaker (Karplus, UCSC)
CASP - Critical Assessment of Structure Prediction
• Competition among different groups to predict the 3D structure of proteins that are about to be solved experimentally.
• Current state:
– Ab initio: the worst, but greatly improved in recent years.
– Modeling: performs very well when homologous sequences with known structures exist.
– Fold recognition: performs well.
What can you do?
FOLDIT
Solve Puzzles for Science
A computer game to fold proteins
http://fold.it/portal/puzzles
What’s Next
Predicting function from structure
Structural Genomics: a large-scale structure determination project designed to cover all representative protein structures
Zarembinski et al., Proc. Natl. Acad. Sci. USA, 95:15189 (1998)
ATP binding domain of protein MJ0577
As a result of the Structural Genomics initiative, many structures of proteins with unknown function will be solved
Wanted!
Automated methods to predict function from the protein structures resulting from the structural genomics project.
Approaches for predicting function from structure
ConSurf - Mapping the evolution conservation on the protein structure http://consurf.tau.ac.il/
Approaches for predicting function from structure
PFPlus – Identifying positive electrostatic patches on the protein structure http://pfp.technion.ac.il/
A method to distinguish DNA from RNA-binding proteins
DNA binding interface RNA binding interface
RNA and DNA binding interfaces tend to have different geometric features
H=(k1+k2)/2 Mean Curvature
K=k1*k2 Gaussian Curvature
Applying Differential Geometry to characterize DNA and RNA binding proteins
k1 - minimal curvature
k2 - maximal curvature
Peak
Pit
Ridge
Valley
Flat
Minimal Surface
Saddle ridge
Saddle valley
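The eight surface types listed above can be derived from the signs of H and K. The sketch below follows one common sign convention (negative H for convex regions such as peaks and ridges), which is an assumption here because the slides do not state the orientation of the surface normal. The tolerance eps for treating a value as zero is also an illustrative choice.

```python
def classify_surface_point(k1, k2, eps=1e-6):
    """Classify a surface point from its principal curvatures k1 <= k2.

    H = (k1 + k2) / 2 is the mean curvature, K = k1 * k2 the Gaussian
    curvature. The sign convention (H < 0 means peak/ridge) is assumed.
    """
    h = (k1 + k2) / 2.0
    k = k1 * k2
    h_sign = 0 if abs(h) < eps else (1 if h > 0 else -1)
    k_sign = 0 if abs(k) < eps else (1 if k > 0 else -1)
    table = {
        (-1, 1): "peak",
        (1, 1): "pit",
        (-1, 0): "ridge",
        (1, 0): "valley",
        (0, 0): "flat",
        (0, -1): "minimal surface",
        (-1, -1): "saddle ridge",
        (1, -1): "saddle valley",
    }
    # H = 0 with K > 0 cannot occur for real curvatures; guard anyway.
    return table.get((h_sign, k_sign), "undefined")

for k1, k2 in [(-1, -1), (1, 1), (-1, 0), (0, 1), (0, 0), (-1, 1), (-2, 1), (-1, 2)]:
    print(k1, k2, classify_surface_point(k1, k2))
```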
Applying Differential Geometry to characterize DNA and RNA binding proteins
Applying Differential Geometry for DNA and RNA function prediction
Figure: frequency of points, RNA pattern and DNA pattern
RNA binding surfaces are distinguished from DNA binding surfaces based on differential geometric features
78% DNA-binding, 76% RNA-binding
Differential geometry can correctly determine whether a given binding domain binds RNA or DNA
Shazman et al, NAR 2011
How can we view the protein structure ?
• Download the coordinates of the structure from the PDB http://www.rcsb.org/pdb/
• Launch a 3D viewer program. For example, we will use the program PyMOL. The program can be downloaded freely from the PyMOL homepage http://pymol.org
• Load the coordinates into the viewer
PyMOL example
• Launch PyMOL
• Open file “1aqb” (PDB coordinate file)
• Display sequence
• Hide everything
• Show main chain / hide main chain
• Show cartoon
• Color by ss
• Color red
• Color green, resi 1:40
Help : http://pymol.org
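The menu steps of the PyMOL example can also be written as a PyMOL command script. The Python sketch below only builds the script text, so it runs without PyMOL installed. The file name 1aqb.pdb, the backbone atom selection, the colors chosen for "color by ss" and the function name are assumptions, and the residue range is written as resi 1-40.

```python
def build_pymol_script(pdb_file="1aqb.pdb"):
    """Return PyMOL commands mirroring the steps of the example as text."""
    commands = [
        f"load {pdb_file}",            # open the PDB coordinate file
        "set seq_view, 1",             # display sequence
        "hide everything",
        "show lines, name N+CA+C+O",   # show main chain (backbone atoms)
        "hide lines, name N+CA+C+O",   # hide main chain
        "show cartoon",
        "color red, ss h",             # color by secondary structure: helices
        "color yellow, ss s",          # strands
        "color green, ss l+''",        # loops
        "color red",                   # color everything red
        "color green, resi 1-40",      # color residues 1 to 40 green
    ]
    return "\n".join(commands) + "\n"

script = build_pymol_script()
print(script)
```

Saved to a file such as view_1aqb.pml (a hypothetical name), the script can be run from inside PyMOL with @view_1aqb.pml.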