Protein Structure in 10 Points
• Globular proteins are compact and densely packed, with only a few empty spaces (cavities)
• Protein conformation and dynamics are coded in the amino acid sequence
• A protein in its native conformation is at an energy minimum, which results in spontaneous folding
• In soluble globular proteins, hydrophobic groups are predominantly on the inside and hydrophilic groups on the outside of the globule
• Most backbone NH and C=O groups are involved in H-bonds to other protein atoms
• Homology of sequence (>25-30% identity) ⇒ similarity of structure
• There are clear statistical preferences for some amino acids in some positions in some secondary structures (but secondary structure prediction from the aa-sequence can be wrong…)
• Loops and turns tend to be on the surface of a globular protein
• Proteins are dynamic molecules capable of a wide range of motions
• Protein structure is dominated by secondary structure
Stability, Folding and Kinetics
Levinthal’s Paradox:
There are so many unfolded states that if the polypeptide chain had to search through them all to find the correct (minimum-energy) folded state, it would take longer than the age of the Universe.
Reference: Levinthal, C. (1968). Are There Pathways for Protein Folding? J. Chim. Phys. PCB 65, 44-45.
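The size of the paradox can be made concrete with a back-of-the-envelope calculation. The sketch below is only illustrative: the chain length, the number of conformations per residue and the sampling rate are assumed values, not figures from the slides.

```python
# Back-of-the-envelope Levinthal estimate. All inputs are illustrative assumptions.
N_RESIDUES = 100            # assumed chain length
STATES_PER_RESIDUE = 3      # assumed backbone conformations per residue
SAMPLING_RATE = 1e13        # assumed conformations tried per second
AGE_OF_UNIVERSE_YEARS = 1.38e10
SECONDS_PER_YEAR = 3.156e7

def exhaustive_search_years(n_residues, states_per_residue, rate_per_s):
    """Years needed to visit every conformation once at a fixed sampling rate."""
    n_conformations = states_per_residue ** n_residues
    return n_conformations / rate_per_s / SECONDS_PER_YEAR

years = exhaustive_search_years(N_RESIDUES, STATES_PER_RESIDUE, SAMPLING_RATE)
print(f"search time: {years:.2e} years")
print(f"ratio to age of Universe: {years / AGE_OF_UNIVERSE_YEARS:.2e}")
```

Even with these modest assumptions the exhaustive search exceeds the age of the Universe by many orders of magnitude, which is why folding cannot be a random search.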
Conformational Energy
[Figure: conformational energy plot; only axis tick values survived extraction]
Interactions:
Bonds, bond angles, torsions, electrostatic, van der Waals, (hydrogen bonds)
The free energy, including entropy, rules!!
Free Energy, Entropy and all that
Proteins are only marginally stable (ΔG ≈ 10 kcal/mol) and may denature if the temperature is raised a few °C above normal.
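To see what a stability of this size means for the folded/unfolded equilibrium, one can use a simple two-state model. This is a minimal sketch under assumptions not stated in the slides: only two states (folded, unfolded) exist, and ΔG is the free energy of unfolding at the given temperature.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def fraction_unfolded(delta_g_unfold, temp_k=298.15):
    """Two-state model: fraction of molecules unfolded at equilibrium.

    delta_g_unfold is the free energy of unfolding (kcal/mol, positive = stable).
    """
    k_unfold = math.exp(-delta_g_unfold / (R_KCAL * temp_k))
    return k_unfold / (1.0 + k_unfold)

for dg in (10.0, 5.0, 0.0):
    print(f"dG = {dg:4.1f} kcal/mol -> fraction unfolded = {fraction_unfolded(dg):.2e}")
```

At ΔG = 0 half of the molecules are unfolded, so a loss of a few kcal/mol of stabilization shifts the population substantially.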
The Oil-Drop Model of Protein Folding
Hydrophobic side chains thus tend to be buried inside soluble proteins, but what happens to polar groups in the backbone?
Barnase - one major pathway
Lysozyme - two different pathways.
There are two domains
Disulfide bridge formation: BPTI
Proline isomerization: cyclophilin catalyzes Pro cis-trans isomerization
[Figure: cis/trans populations, labeled 20% / 80% and 0.1% / 99.9%]
Conformational change - calmodulin
Understanding protein folding via free-energy surfaces from theory and experiment
Aaron R. Dinner, Andrej Sali, Lorna J. Smith, Christopher M. Dobson and Martin Karplus, TIBS 25, July 2000
RG (the radius of gyration) measures the size of the protein
Ramachandran diagram for alanine as an "energy" contour map
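The radius of gyration is the root-mean-square distance of the atoms from their common centroid. A minimal sketch, assuming equal masses for all points and using made-up bead coordinates (the 3.8 Å spacing and the two test shapes are illustrative choices):

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of points from their centroid (equal masses assumed)."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    sq_sum = sum(sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in coords)
    return math.sqrt(sq_sum / n)

# Hypothetical 8-bead arrangements with 3.8 A spacing
extended = [(3.8 * i, 0.0, 0.0) for i in range(8)]                    # straight line
compact = [(3.8 * x, 3.8 * y, 3.8 * z)                                # cube corners
           for x in (0, 1) for y in (0, 1) for z in (0, 1)]

print(f"extended: {radius_of_gyration(extended):.2f} A")
print(f"compact:  {radius_of_gyration(compact):.2f} A")
```

The compact arrangement gives the smaller value, which is why RG is a convenient progress coordinate for collapse during folding.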
Model of Protein Folding
The Model:
• a simple lattice on which the polymer is built
• favorable (native) contacts and unfavorable (non-native) contacts
The number of states can be enumerated, and the global free energy (F) minimum identified
Q0: number of native contacts
C: total number of contacts
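The idea of enumerating states and classifying them by Q0 and C can be tried on a much smaller system than the one in the paper. The following is a toy sketch only: it uses a 9-bead chain on a 2D square lattice, a hypothetical native state, and assumed contact energies (`e_native`, `e_nonnative`); none of these come from the slides.

```python
from collections import Counter

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def enumerate_walks(n_beads):
    """Yield every self-avoiding walk of n_beads on a 2D square lattice from the origin."""
    def extend(path, occupied):
        if len(path) == n_beads:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend(path, occupied)
                occupied.remove(nxt)
                path.pop()
    yield from extend([(0, 0)], {(0, 0)})

def contacts(walk):
    """Bead index pairs (i, j), j > i + 1, that occupy adjacent lattice sites."""
    index = {site: i for i, site in enumerate(walk)}
    found = set()
    for i, (x, y) in enumerate(walk):
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1:
                found.add((i, j))
    return found

# Hypothetical native state: a 9-bead chain folded into a compact 3x3 square
NATIVE = ((0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1), (0, 2), (1, 2), (2, 2))
NATIVE_CONTACTS = contacts(NATIVE)

def energy(walk, e_native=-1.0, e_nonnative=0.3):
    """Assumed energy: native contacts favorable, non-native contacts unfavorable."""
    c = contacts(walk)
    q0 = len(c & NATIVE_CONTACTS)
    return e_native * q0 + e_nonnative * (len(c) - q0)

histogram = Counter()   # (Q0, C) -> number of conformations
lowest = None           # (energy, walk) of the best conformation seen
for walk in enumerate_walks(len(NATIVE)):
    c = contacts(walk)
    histogram[(len(c & NATIVE_CONTACTS), len(c))] += 1
    e = energy(walk)
    if lowest is None or e < lowest[0]:
        lowest = (e, walk)

print("conformations:", sum(histogram.values()))
print("lowest energy:", lowest[0])
```

The histogram over (Q0, C) is the density of states; combining it with the energy gives an entropy and hence a free energy for each value of Q0, which is the kind of surface the model slides plot.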
Model Properties – Energetic and Entropic Components
Same native structure in both cases, but in (b,d) the native contacts are not as strong
Fast and Slow Folding Pathways of Lysozyme
Hen lysozyme has 129 aa-residues in two domains
Sequence of events mapped out using NMR hydrogen exchange protection experiments
Fast and Slow Folding Pathways of the Model
This 125-mer lattice model behaves very similarly to lysozyme in the experiments
Core and surface contacts are monitored
Modeling a 3D Structure
• It is sometimes very difficult to obtain an experimental structure.
• Can one construct theoretical 3D models?
• Today: not really, if based on the aa sequence alone
• Homology modeling, or comparative modeling, works fairly well, but requires access to a known structure of a similar protein
Bioinformatics - Concepts
• Identity - Homology (“of common origin”)
• Distance - Similarity
• Score/Scoring matrix/z-score
• Global vs Local Alignment
• Multiple alignment
• Dynamic Programming
• Artificial Neural Networks (ANN)
Sequence Databases
International Nucleotide Sequence Database Collaboration:
• GenBank at NIH
• EMBL
• DDBJ (DNA Data Bank of Japan)
GenBank doubles in size every 14 months!!
3 000 000 000 bases from 47 000 species (late 1999)
NCBI (National Center for Biotechnology Information) at NIH:
http://www.ncbi.nlm.nih.gov/
Example protein sequence in FASTA format:
>4LZM:_ LYSOZYME (E.C.3.2.1.17) (HIGH SALT) - CHAIN _
MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNCNGVITK
DEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRCALINMVFQMGETGVAGFTNSLRM
LQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
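A FASTA record is a header line starting with ">" followed by one or more sequence lines. A minimal parser sketch (the function name `parse_fasta` is my own, not from any library) applied to the record above:

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)   # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

EXAMPLE = """>4LZM:_ LYSOZYME (E.C.3.2.1.17) (HIGH SALT) - CHAIN _
MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAKSELDKAIGRNCNGVITK
DEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRCALINMVFQMGETGVAGFTNSLRM
LQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
"""

records = parse_fasta(EXAMPLE)
header, sequence = records[0]
print(header.split()[0], len(sequence))
```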
Identity? Similarity?
Two identical sequences are easy to recognize, but spotting a relationship when they begin to differ gets progressively more difficult. Sequences may also be of different length - we have gaps due to insertions or deletions (indels).

s: ACACACA
t: ACACACA

s: AGCACACA
t: ACACACTA

s: AGCACAC-A
t: A-CACACTA
or
s: AG-CACACA
t: ACACACT-A
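The two gapped alignments above are not equally good, which can be checked by counting columns. A small sketch (the helper name `alignment_stats` is hypothetical):

```python
def alignment_stats(s, t):
    """Count matches, mismatches and gap columns in two equal-length aligned strings."""
    if len(s) != len(t):
        raise ValueError("aligned sequences must have equal length")
    matches = mismatches = gaps = 0
    for a, b in zip(s, t):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps

for s, t in [("AGCACAC-A", "A-CACACTA"), ("AG-CACACA", "ACACACT-A")]:
    m, x, g = alignment_stats(s, t)
    print(f"{s} / {t}: {m} matches, {x} mismatches, {g} gaps, "
          f"identity {100.0 * m / len(s):.0f}%")
```

The first alignment has seven identical columns and no mismatches; the second has only five identical columns for the same number of gaps.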
Homology
The biological approach:
During evolution DNA sequences (and proteins) diverge, due to point mutations and more sophisticated events. Two sequences which share a common evolutionary origin are said to be homologous, and our task is then to find these relationships even for distantly related sequences.
We may thus use our knowledge about evolution and mutations in our quest for homology.
Scoring
DNA              Protein
s: CUUCCGAAA     s: Leu-Pro-Lys
t: CUAGCGAGA     t: Leu-Ala-Arg
We need to consider the “degree of change” in a substitution, so we need a scoring scheme:
Substitution (score) matrix - there are 210 (20x19/2+20) pairs of amino acids and we need a number, a score, indicating how similar we consider two amino acids to be. For a given alignment of two sequences we sum the pairwise scores and add gap penalties to obtain the total score for the alignment; sometimes the distance between two sequences is considered instead.
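The scoring rule can be sketched in a few lines. The matrix below is a toy one (match +1, mismatch -1) and the gap penalty of -2 is likewise an assumption chosen for illustration; real work would use a published matrix such as PAM or BLOSUM.

```python
def toy_matrix(alphabet, match=1, mismatch=-1):
    """Hypothetical symmetric matrix keyed by unordered residue pairs."""
    return {frozenset((a, b)): (match if a == b else mismatch)
            for a in alphabet for b in alphabet}

def score_alignment(s, t, substitution, gap_penalty):
    """Sum pairwise substitution scores and gap penalties over aligned columns."""
    total = 0
    for a, b in zip(s, t):
        if a == "-" or b == "-":
            total += gap_penalty
        else:
            total += substitution[frozenset((a, b))]
    return total

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
print("unordered amino acid pairs:", len(toy_matrix(AMINO_ACIDS)))

dna = toy_matrix("ACGT")
print(score_alignment("AGCACAC-A", "A-CACACTA", dna, gap_penalty=-2))
print(score_alignment("AG-CACACA", "ACACACT-A", dna, gap_penalty=-2))
```

Keying the matrix by unordered pairs makes the count of distinct entries equal to the 210 pairs quoted above.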
Specific Algorithms
• Global Alignment - find optimal alignment of two complete sequences: Needleman-Wunsch
• Local Alignment - find optimal alignment of fragments of a sequence: Smith-Waterman
• Heuristic (“less stringent”, but faster): FASTA and BLAST (Basic Local Alignment Search Tool) use local high-scoring regions to find initial alignments which can be extended
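Both exact algorithms fill a dynamic-programming table; they differ mainly in the boundary conditions and in whether scores may drop below zero. The sketch below computes scores only (no traceback) and assumes a simple scheme of match +1, mismatch -1 and a linear gap penalty of -2, which is an illustrative choice rather than anything prescribed by the slides.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of s and t (linear gap penalty)."""
    prev = [j * gap for j in range(len(t) + 1)]      # first row: all gaps
    for i in range(1, len(s) + 1):
        curr = [i * gap]                             # first column: all gaps
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def smith_waterman(s, t, match=1, mismatch=-1, gap=-2):
    """Best local alignment score between any fragments of s and t."""
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        curr = [0]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores are floored at 0
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
        best = max(best, max(curr))
        prev = curr
    return best

print(needleman_wunsch("AGCACACA", "ACACACTA"))
print(smith_waterman("AGCACACA", "ACACACTA"))
```

Only two rows of the table are kept in memory, which is enough for the score; recovering the alignment itself would require storing the full table or traceback pointers.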
Sequence Alignment
Overall SCP and SFCP are 80% similar
Multiple sequence alignment helps identify the real similarities
Structure Comparison
Measure of structural similarity:
• Root Mean Square Distance (RMSD) between equivalent, superposed atoms
• Caveats: alignment - use sequence and/or secondary structure, and superposition
• Indels also pose a problem
• Distances between Cα atom-pairs in one structure can be compared to the same distances in the second structure
$$\mathrm{RMSD} = \min_{\text{orientations}} \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathbf{r}_i(1)-\mathbf{r}_i(2)\right)^2}$$
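Both measures mentioned above can be sketched briefly. Note the limitation: `rmsd` below assumes the two structures are already superposed and does not perform the minimization over orientations (that is usually done with a least-squares fit such as the Kabsch algorithm). `distance_rmsd` compares intra-structure pair distances and therefore needs no superposition. Function names and test coordinates are my own.

```python
import math

def rmsd(a, b):
    """RMSD between equivalent atoms of two already-superposed coordinate lists."""
    if len(a) != len(b):
        raise ValueError("need the same number of equivalent atoms")
    sq = sum(sum((p[k] - q[k]) ** 2 for k in range(3)) for p, q in zip(a, b))
    return math.sqrt(sq / len(a))

def distance_rmsd(a, b):
    """RMS difference between all intra-structure pair distances (orientation-free)."""
    if len(a) != len(b):
        raise ValueError("need the same number of equivalent atoms")
    diffs = []
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            diffs.append((math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2)
    return math.sqrt(sum(diffs) / len(diffs))

# Made-up coordinates: the same shape, translated and rotated
shape = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
shifted = [(x, y, z + 2.0) for x, y, z in shape]
rotated = [(-y, x, z) for x, y, z in shape]      # 90 degrees about the z axis

print(rmsd(shape, shifted), distance_rmsd(shape, shifted))
print(rmsd(shape, rotated), distance_rmsd(shape, rotated))
```

The plain RMSD is nonzero for the moved copies even though the shapes are identical, which is why the superposition step matters; the distance-based measure is zero in both cases.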
High sequence similarity also gives high structural similarity
Protein Folds
How many folds are there? (There are ~30000 human genes)
Current estimates: 1 000-10 000
We know about 100 different folds (1%-10%), almost exclusively of water-soluble proteins. Only a handful of membrane protein structures are known, even though membrane proteins may account for about 1/3 of all proteins.
There are no reliable methods today for predicting a 3D structure from an amino acid sequence alone. "Guessing" a fold based on the known structure of a homologous protein is the best we can do: homology modeling works at >25-30% similarity.
Protein Data Bank http://www.rcsb.org
PDBSum http://www.biochem.ucl.ac.uk/bsm/pdbsum/
[Figure: folds in the PDB, new folds vs. old folds]
CATH http://www.biochem.ucl.ac.uk/bsm/cath/
Class, Architecture, Topology, Homologous Superfamily
SCOP http://scop.mrc-lmb.cam.ac.uk/scop/
Structural Classification Of Proteins
DALI http://www.ebi.ac.uk/dali/
Structure comparison server
Homology Modeling
• Problem: have the sequence of a protein and want its 3D structure, but no experimental structure is available.
• Find homologous protein(s) with known 3D structure (>25% similarity recommended!)
• Align sequences (multiple alignment HELPS!); it also helps if you have multiple templates and can use them to identify structurally conserved regions
• Identify conserved and variable regions
• Generate core coordinates from template(s)
• Generate conformations for loops
• Build side-chain conformations
• Refine and evaluate the model
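The steps "identify conserved and variable regions" and "generate core coordinates from template(s)" can be illustrated with a toy mapping from a pairwise alignment. Everything here is hypothetical: the function names, the alignment and the template coordinates are made up, only Cα positions are handled, and chain breaks caused by template residues deleted in the target are not repaired.

```python
def map_core_from_template(target_aln, template_aln, template_ca):
    """Transfer template C-alpha coordinates to aligned target residues.

    Returns one (residue, coordinate) pair per target residue; the coordinate
    is None where the target residue faces a gap and must be built as a loop.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    if sum(c != "-" for c in template_aln) != len(template_ca):
        raise ValueError("one coordinate is needed per template residue")
    model = []
    t_idx = 0  # index of the current template residue in template_ca
    for tgt, tpl in zip(target_aln, template_aln):
        coord = None
        if tpl != "-":
            if tgt != "-":
                coord = template_ca[t_idx]
            t_idx += 1
        if tgt != "-":
            model.append((tgt, coord))
    return model

def loop_segments(model):
    """Index ranges (start, end) of consecutive target residues lacking coordinates."""
    segments, start = [], None
    for i, (_, coord) in enumerate(model):
        if coord is None and start is None:
            start = i
        elif coord is not None and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(model) - 1))
    return segments

# Hypothetical alignment and made-up template coordinates, for illustration only
target_aln = "MKTAYIAK--QR"
template_aln = "MKS--IAKGGQR"
template_ca = [(float(i), 0.0, 0.0) for i in range(10)]

model = map_core_from_template(target_aln, template_aln, template_ca)
print(loop_segments(model))
```

Residues reported by `loop_segments` are the variable regions that the later steps (loop generation, side-chain building, refinement) would have to fill in.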
Loops - from databases
Restricted set of CDR3 main chain conformations
Success rate
Swiss-Model http://swissmodel.expasy.org/
Submit your own sequence, and get a 3D model back (if there are templates available…)
Alphavirus Spike-NC Binding
SFCP Model vs. Xtal
Conserved Residues in Hydrophobic Binding Pocket
Structure Validation
• Biochemically reasonable
• Good stereochemistry, with main chain in acceptable Ramachandran regions
• Planar peptide bonds
• Hydrogen bonding of buried residues
• Apolar and polar residues properly accommodated
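Two of these checks reduce to torsion angles: φ (C, N, Cα, C) and ψ (N, Cα, C, N) place a residue in the Ramachandran diagram, and ω (Cα, C, N, Cα) measures peptide-bond planarity. A minimal sketch of the torsion calculation follows; the 30° planarity tolerance is an assumed threshold, not a value from the slides.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def is_planar_peptide(omega_deg, tolerance=30.0):
    """True if omega is within `tolerance` degrees of trans (180) or cis (0)."""
    dev_trans = 180.0 - abs(omega_deg)
    dev_cis = abs(omega_deg)
    return min(dev_trans, dev_cis) <= tolerance

# Made-up points: the last atom is rotated a quarter turn out of the plane
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
print(is_planar_peptide(175.0), is_planar_peptide(90.0))
```

Applied to backbone atoms read from a coordinate file, `dihedral` gives the φ/ψ pairs needed for a Ramachandran plot.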
Exceptions do exist
The B1 fold has been changed to a Rop-like protein by changing 50% of the amino acids
1994: a $1000 prize was offered for changing a fold by changing no more than 50% of the aa
1997: Lynne Regan, Yale, won the prize
B1 domain of protein G
Rop dimer
“JANUS”
56 aa – 28 may change!
B1 and Rop have only three identical aa positions
Change key amino acids:
e.g. in Rop, Arg16 and Asp45 form a salt bridge
No structure for Janus yet, but CD and NMR spectra indicate clear similarities to Rop, including dimer formation
Designing Protein/Peptide
• May be easier to find an aa-sequence which adopts a specific fold than the opposite, i.e. to find the fold of a given sequence
• Zinc-finger peptide design: allow only certain types of aa in a given region
  core: Ala, Val, Leu, Ile, Phe, Tyr, Trp
  surface: Ala, Ser, Thr, His, Asp, Asn, Glu, Gln, Lys, Arg
  boundary: allow both core and surface sets
  no Pro, Cys, Met; Gly at special positions
  try combinations of these, and evaluate their energy in computer
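Restricting each position to a class of residues shrinks the sequence space that has to be evaluated. The sketch below counts candidates for a short pattern; the residue sets are the ones listed above, while the 6-position pattern is hypothetical and chosen only to show the arithmetic.

```python
from math import prod

CORE = {"Ala", "Val", "Leu", "Ile", "Phe", "Tyr", "Trp"}
SURFACE = {"Ala", "Ser", "Thr", "His", "Asp", "Asn", "Glu", "Gln", "Lys", "Arg"}
ALLOWED = {
    "core": CORE,
    "surface": SURFACE,
    "boundary": CORE | SURFACE,   # Ala is in both sets, so the union has 16 members
}

def count_candidates(position_classes):
    """Number of distinct sequences when each position draws from its class set."""
    return prod(len(ALLOWED[c]) for c in position_classes)

# Hypothetical 6-position pattern
pattern = ["core", "surface", "boundary", "core", "surface", "surface"]
print("restricted:  ", count_candidates(pattern))
print("unrestricted:", 20 ** len(pattern))
```

For comparison, allowing all 20 amino acids at each of the six positions would give 64 000 000 sequences.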
Designed peptide
Can one design a peptide with a zinc-finger fold, without the zinc?
[Figure: real Zn-finger vs. designed peptide (hydrophobic stabilization)]
Profiles
A profile is a compilation of additional information about a sequence - it can even take into account 3D information if a 3D structure is known.
Such profiles can be used to evaluate relationships between proteins or protein families, even for distantly related proteins with little sequence similarity
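One common concrete form of a profile is a position-specific scoring matrix built from a multiple alignment. The sketch below assumes that form; the uniform background frequencies, the pseudocount and the four-sequence "family" are all illustrative assumptions, and 3D information is not included.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Per-column log-odds scores versus a uniform background (assumed)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(alphabet)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        total = sum(counts.values()) + pseudocount * len(alphabet)
        profile.append({
            aa: math.log((counts[aa] + pseudocount) / total / background)
            for aa in alphabet
        })
    return profile

def score_against_profile(sequence, profile):
    """Sum of column scores for an ungapped sequence of the profile's length."""
    if len(sequence) != len(profile):
        raise ValueError("sequence must have one residue per profile column")
    return sum(col[aa] for aa, col in zip(sequence, profile))

# Hypothetical aligned family
family = ["LKVDE", "LRVDE", "IKVEE", "LKIDQ"]
profile = build_profile(family)
for candidate in ("LKVDE", "IRIEQ", "GGGGG"):
    print(candidate, round(score_against_profile(candidate, profile), 2))
```

A sequence such as IRIEQ never appears in the family yet still scores well, because each of its residues is seen at the corresponding column; this is how profiles pick up distant relatives.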
Threading
This asks the inverse question of protein folding:
Given a 3D structure, which amino acid sequences are compatible with the structure?
Thread your sequence through representative set of folds and see if there is a match.
Easier, but not easy… we cannot always tell if a sequence fits a structure - our scoring or energy functions are not accurate enough
Structural Genomics
Similar sequence ⇒ similar structure
The Protein Universe with protein “families”
[Figure labels: RMSD, known structures]
Structural Genomics: which proteins?
Lab 2 goals
• Measure conformational details (distances, angles) in a protein
• Ramachandran plots
• Extract information from coordinate file header
• Investigate disulfide bonds
• RasMol scripts