Protein Threading
Zhang group, 2003-10-22
Overview
Background: protein structure, protein folding, and designability
Protein threading
Current limitations to protein threading
Computational complexity of certain formulations of the protein threading problem
Performance of protein threading systems
References
Protein Structure
Primary, secondary, tertiary structure
We can only speak of the structure of a protein once a particular environment is assumed:
solvent environment (aqueous, trans-membrane, ……)
temperature, pH, etc.
Different environments yield different structures, or no stable structure at all
Protein molecules are not completely rigid structures:
kinetic energy and energetic collisions with solvent molecules
vibrations and side-chain conformational changes
flexible sections of the peptide chain
The native tertiary structure of a protein is thus an average
Protein Folding
Protein folding = a search for the minimum-energy conformation
Factors in protein folding:
hydrophobic effects, electrostatic charges on residues, hydrogen bonding
chaperonins, ribosomes
Three stages of folding:
denatured (unfolded) state, molten globule state, native (compact) state
Most proteins will return to their native state after forced denaturation
The Protein Folding Problem
Given a protein's amino acid sequence, what is its tertiary structure?
The protein folding problem is hard
Direct approach: molecular dynamics simulation
Simulate, on an atomic level, the folding of a single protein molecule
protein = thousands of atoms
solvent environment = hundreds to thousands of molecules => thousands of atoms
time steps must be sub-picosecond, yet the simulation must run for 1-5 seconds
We would need many more years of Moore's law to make this computation feasible
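To see the scale of the problem, here is a back-of-envelope count of integration steps. It is a sketch that assumes a 1 fs time step (a common choice for atomistic molecular dynamics; the slide says only "sub-picosecond") and the 1-5 s of folding time quoted above; `md_steps` is a hypothetical helper.

```python
# Back-of-envelope count of molecular dynamics integration steps.
# Assumption (not from the slides): a 1 femtosecond (1e-15 s) time step.
FS_PER_SECOND = 10**15


def md_steps(simulated_seconds: int, timestep_fs: int = 1) -> int:
    """Integration steps needed to cover `simulated_seconds` of folding time."""
    return simulated_seconds * FS_PER_SECOND // timestep_fs


for seconds in (1, 5):
    print(f"{seconds} s at 1 fs/step: {md_steps(seconds):.0e} steps")
```

Each of those roughly 10^15 steps also requires evaluating forces among thousands of atoms, which is why the direct approach is out of reach.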
Designability
A protein with a stable native state cannot have another low-energy state nearby in conformational space
A structure is highly designable if its minimum-energy state has no low-energy neighbours
Protein Threading
Inverse protein folding problem: given a tertiary structure, find an amino acid sequence that folds to that structure
Protein threading: given a library of possible protein folds and an amino acid sequence, find the fold with the best sequence -> structure alignment (threading)
Evolution depends on designability to preserve function under mutation
Estimate: only a limited number of different protein structures exist in nature (Chothia, 1992)
Four components:
a library of protein folds (templates)
a scoring function to measure the fitness of a sequence -> structure alignment
a search technique for finding the best alignment between a fixed sequence and structure
a means of choosing the best fold from among the best-scoring alignments of a sequence to all possible folds
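The four components can be illustrated with a deliberately tiny, hypothetical example. Folds are reduced to strings of per-position environments (`B` = buried, `X` = exposed), the scoring table is invented, and the search only tries ungapped placements; real systems use 3D templates, statistically derived scores and gapped alignments. `LIBRARY`, `env_score`, `best_alignment` and `thread` are illustrative names, not the API of any published method.

```python
# Toy threading pipeline: one (hypothetical) piece per component on the slide.
HYDROPHOBIC = set("AVLIMFWC")

# (1) Template library: each fold reduced to per-position environments,
#     'B' = buried (not exposed to solvent), 'X' = exposed. Made-up folds.
LIBRARY = {
    "fold_A": "XXBBBBXX",
    "fold_B": "BXBXBXBX",
    "fold_C": "XXXXBBBB",
}


# (2) Scoring function: +1 when a residue's hydrophobicity matches its
#     environment (hydrophobic in B, polar in X), -1 otherwise. Invented table.
def env_score(residue: str, env: str) -> int:
    return 1 if (residue in HYDROPHOBIC) == (env == "B") else -1


# (3) Search: best ungapped placement of a template on the sequence,
#     returned as (score, offset); ties go to the smallest offset.
def best_alignment(seq: str, template: str) -> tuple[int, int]:
    if len(seq) < len(template):
        raise ValueError("sequence is shorter than the template")
    candidates = [
        (sum(env_score(seq[off + i], env) for i, env in enumerate(template)), off)
        for off in range(len(seq) - len(template) + 1)
    ]
    return max(candidates, key=lambda c: c[0])


# (4) Fold selection: the template whose best alignment scores highest.
def thread(seq: str) -> tuple[str, int, int]:
    results = {name: best_alignment(seq, t) for name, t in LIBRARY.items()}
    best = max(results, key=lambda name: results[name][0])
    score, offset = results[best]
    return best, score, offset


print(thread("SKLLVIKE"))  # ('fold_A', 8, 0)
```

Here the four hydrophobic residues L, L, V, I land exactly on fold_A's buried stretch, so fold_A wins with the maximum possible score of 8.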
Scoring Schemes for Sequence->Structure Alignments
The scoring scheme for a particular threading of a sequence onto a structure measures the degree to which environmental preferences are satisfied
Different amino acid types prefer different environments, e.g.:
structural preferences: in a helix, in a sheet, not exposed to solvent
pairwise interactions with neighbouring amino acids
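A threading score typically combines the two kinds of preference listed above: a singleton term for how well each residue suits its structural environment, and a pairwise term for residues that end up in contacting positions. The sketch below assumes the sequence has already been placed on the structure; the environment labels, the contact list and all numbers are hypothetical.

```python
# Threading score = singleton (environment) terms + pairwise (contact) terms.
# Environment labels, contacts and all numeric values are hypothetical.
HYDROPHOBIC = set("AVLIMFWC")


def singleton(residue: str, env: str) -> float:
    """Environmental preference: 'B' = buried site, anything else = exposed."""
    hydrophobic = residue in HYDROPHOBIC
    if env == "B":
        return 1.0 if hydrophobic else -1.0
    return -0.5 if hydrophobic else 0.5


def pairwise(r1: str, r2: str) -> float:
    """Interaction between two residues placed in contacting positions."""
    if r1 in HYDROPHOBIC and r2 in HYDROPHOBIC:
        return 0.5   # hydrophobic contact: favourable
    if (r1 in "KR" and r2 in "KR") or (r1 in "DE" and r2 in "DE"):
        return -1.0  # like charges in contact: unfavourable
    return 0.0


def threading_score(threaded: str, envs: str,
                    contacts: list[tuple[int, int]]) -> float:
    """Score a sequence already placed on structure positions 0..len(envs)-1."""
    total = sum(singleton(r, e) for r, e in zip(threaded, envs))
    total += sum(pairwise(threaded[i], threaded[j]) for i, j in contacts)
    return total


print(threading_score("LKEV", "BXXB", [(0, 3), (1, 2)]))  # 3.5
print(threading_score("LKKV", "BXXB", [(0, 3), (1, 2)]))  # 2.5
```

Replacing E with K drops the score because the two lysines now sit in contact. The pairwise sum is what makes the score non-local: the value of one residue's placement depends on where its contact partners were placed.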
Formal Statement of the Protein Threading Problem
$C$ is a protein core having $m$ segments $C_i$, each representing a set of contiguous amino acids. Let $c_i$ be the length of $C_i$.
Sequence $a = a_1 a_2 \ldots a_n$ of amino acids
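The formal statement on this slide ends after defining the sequence. In a common formulation (assumed here), a threading assigns each core segment $C_i$ a start position $t_i$ on the sequence, with segments kept in order and non-overlapping ($t_i + c_i \le t_{i+1}$) and variable-length loops in between. The hypothetical helpers below enumerate all such threadings and also count them in closed form: with $L = \sum_i c_i$ there are $\binom{n - L + m}{m}$ of them, a number that grows combinatorially with $n$ and $m$.

```python
from math import comb
from typing import Iterator


def threadings(n: int, lengths: list[int], start: int = 0) -> Iterator[tuple[int, ...]]:
    """Yield 0-based start positions, one per core segment, for every way to place
    segments of the given lengths on a length-n sequence in order, without overlap."""
    if not lengths:
        yield ()
        return
    first, rest = lengths[0], lengths[1:]
    # This segment must leave room for itself and all later segments.
    last_start = n - sum(lengths)
    for t in range(start, last_start + 1):
        for tail in threadings(n, rest, t + first):
            yield (t,) + tail


def count_threadings(n: int, lengths: list[int]) -> int:
    """Closed form: distribute the n - sum(lengths) loop residues into m + 1 gaps."""
    slack = n - sum(lengths)
    return comb(slack + len(lengths), len(lengths)) if slack >= 0 else 0


print(list(threadings(4, [2, 1])))  # [(0, 2), (0, 3), (1, 3)]
print(sum(1 for _ in threadings(30, [3, 4, 5])), count_threadings(30, [3, 4, 5]))  # 1330 1330
```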
Current limitations to protein threading
Statistical problems
Definition of neighbor and/or pairwise contact environments:
energetic neighbor ≠ contact neighbor
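A purely geometric definition of contact neighbours is easy to write down, which is part of its appeal. The sketch below is one such definition under assumed parameters: one representative atom per residue (for example Cβ), an 8 Å distance cutoff, and a minimum sequence separation of 3 so that chain neighbours are not counted. The slide's caveat is that residues paired this way need not be the ones that actually interact energetically.

```python
import math

Coord = tuple[float, float, float]


def contact_neighbors(coords: list[Coord], cutoff: float = 8.0,
                      min_separation: int = 3) -> list[tuple[int, int]]:
    """Pairs (i, j), i < j, at least `min_separation` apart in sequence whose
    representative atoms lie within `cutoff` angstroms of each other."""
    pairs = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                pairs.append((i, j))
    return pairs


# Hypothetical coordinates for four residues (angstroms).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (5.0, 5.0, 0.0)]
print(contact_neighbors(coords))              # [(0, 3)]
print(contact_neighbors(coords, cutoff=6.0))  # []
```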
Computational Complexity of Finding an Optimal Alignment
The complexity of the protein threading problem depends on whether:
(i) variable-length gaps are allowed in alignments
(ii) the scoring function for an alignment incorporates pairwise interactions between amino acids
Property (i) makes the size of the search space exponential in the length of the sequence
Property (ii) forces a solution to take non-local effects into account
Any protein threading scheme with both properties is NP-complete
(reduction from 3-SAT: Lathrop, 1994)
(reduction from MAX-CUT: Akutsu and Miyano, 1999)
Thus every protein threading approach falls into at least one of four groups:
1 no variable-length gaps allowed
2 no pairwise interactions considered in the scoring function
3 no guarantee of an optimal solution
4 exponential runtime
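The contrast between groups 2 and 4 can be made concrete. In the sketch below (hypothetical helper names and scores), `seg_score(i, t)` is an assumed per-segment score for starting core segment i at sequence position t. When the total score is just a sum of such terms (group 2), a standard dynamic program finds the optimum over all variable-gap threadings in O(m * n) time. Exhaustive search (group 4) can also add pairwise terms, but it tries on the order of n^m placements.

```python
from itertools import product

NEG = float("-inf")


def dp_threading(n, lengths, seg_score):
    """Group 2: variable-length gaps, but only per-segment (singleton) scores.
    Returns the optimal total score (or -inf if the segments cannot fit)."""
    if any(c < 1 for c in lengths):
        raise ValueError("segment lengths must be positive")
    # ready[u]: best score of the segments placed so far, all ending before position u
    ready = [0] * (n + 1)
    for i, c in enumerate(lengths):
        new_ready = [NEG] * (n + 1)
        for u in range(c, n + 1):
            t = u - c  # segment i occupies positions t .. u-1
            new_ready[u] = max(new_ready[u - 1], ready[t] + seg_score(i, t))
        ready = new_ready
    return ready[n]


def brute_force(n, lengths, seg_score, pair_score=None):
    """Group 4: try every placement, so pairwise terms pair_score(i, ti, j, tj)
    can be included, at a cost that grows like n ** m."""
    m = len(lengths)
    best = NEG
    for starts in product(range(n), repeat=m):
        in_order = all(starts[k] + lengths[k] <= starts[k + 1] for k in range(m - 1))
        if not in_order or starts[-1] + lengths[-1] > n:
            continue
        score = sum(seg_score(i, t) for i, t in enumerate(starts))
        if pair_score is not None:
            score += sum(pair_score(i, starts[i], j, starts[j])
                         for i in range(m) for j in range(i + 1, m))
        best = max(best, score)
    return best


def toy_score(i, t):
    # Arbitrary integer scores, just to exercise both solvers.
    return (3 * i + 5 * t) % 7 - 3


for n in range(5, 9):
    print(n, dp_threading(n, [2, 1, 2], toy_score), brute_force(n, [2, 1, 2], toy_score))
```

Once `pair_score` is non-zero, the best placement of one segment depends on where the others go, so the single table `ready` no longer summarizes everything the dynamic program needs; this is the non-local coupling that the NP-completeness results exploit.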
Performance of Protein Threading Systems
CASP1 (1994), CASP2 (1996), CASP3 (1998): Critical Assessment of Structure Prediction meetings
protein threading methods have consistently been the winners
success depends on the structural similarity of the target to known structures
successful even when the target sequence and library sequences have low sequence homology
Much room for improvement in all areas of protein threading, e.g.:
algorithms for searching the threading space
reliable, biologically accurate scoring functions