Let's Play LEGO!
● Use the PDB [1] as the building blocks for 3D protein construction.
● Target is Baker's Top7 protein (novel fold) [2].
[1] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, P. E. Bourne, The Protein Data Bank, Nucl. Acids Res. 28 (2000) 235-242.
[2] B. Kuhlman, G. Dantas, G. C. Ireton, G. Varani, B. L. Stoddard, D. Baker, Design of a novel globular protein fold with atomic-level accuracy, Science 302 (2003) 1364-1368.
Baker's Top7 Protein
● Two beta-alpha-beta modules (with a strand in between).
● Pure antiparallel beta-sheet.
● PDB code 1QYS.
Top7 Design
1) Make a 3D backbone of a novel fold (i.e. not yet observed in the PDB). The novelty here comes from the relative strand arrangement (21354).
2) Optimize the sequence on the backbone using a rotamer library.
3) Optimize the backbone given the sequence using an all-atom potential energy (implicit solvent model).
4) Go to 2) until the sequence/energy converges.
[1] B. Kuhlman, G. Dantas, G. C. Ireton, G. Varani, B. L. Stoddard, D. Baker, Design of a novel globular protein fold with atomic-level accuracy, Science 302 (2003) 1364-1368.
RosettaDesign (http://www.unc.edu/kuhlmanpg/rosettadesign.htm) [1]
Our Approach
1) Sequence -> Secondary Structure Assignments
(alpha-helices, beta-turns, beta-strands, coils, ...)
by computer (SS prediction, ...), by lab (CD, NMR, ...), ...
2) Strands -> Plausible beta-sheet topologies
(at least N! * 2^(N-2) solutions, where N is the number of strands)
Start building the protein with its beta-sheet: portions that are far apart in the sequence interact at short range in the 3D structure! Everything in between (turns, helices) is trapped!
3) Topologies -> 3D structures of beta-sheets
(strands are curled and sheets are twisted, pleated and arched)
Existing software is not able to explore the conformational space of beta-sheets because of its complexity (it does not suffice to update some phi/psi angles here and there).
-> We have a very nice solution to it!
Our Approach
4) From the 3D structures of the beta-sheets add:
● beta-turns.
● beta-alpha-beta units.
-> We also have a nice solution to it!
5) Optimize the hydrophobic moments of all the alpha-helices with a simple search space operator and a simple energy function.
(we will stop here...)
6) Use a rotamer library to add side chains.
7) Use a minimizer to correct:
● bond lengths (especially the peptide link between blocks).
● steric clashes.
Start with the Sequence
● NNPREDICT (http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html) [2][3]
              1         2         3         4         5         6         7         8         9
        456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
S: MGDIQVQVNIDDNGKNFDYTYTVTTESELQKVLNELMDYIKKQGAKRVRISITARTKKEAEKFAAILIKVFAELGYNDINVTFDGDTVTVEGQLEGGSLEHHHHHH
C: -----EEEEEEETTEEEEEEE----HHHHHHHHHHHHHHHHHH---EEEEEEE---HHHHHHHHHHHHHHHHHH---EEEEEEETTEEEEEEE-------------
P: ----EEEE----------EEEE----HHHHHHHHHHHHHHH-----EEEEEEE--HHHHHHHHHHHHHHHHHHH----EEEE-----EEEE-------HHHH----
Legend: S for Sequence, C for Crystal, P for Prediction.
[1] McGuffin LJ, Bryson K, Jones DT. (2000) The PSIPRED protein structure prediction server. Bioinformatics. 16, 404-405.
[2] J. L. McClelland and D. E. Rumelhart. (1988) "Explorations in Parallel Distributed Processing" vol 3. pp 318-362. MIT Press, Cambridge MA.
[3] D. G. Kneller, F. E. Cohen and R. Langridge (1990) "Improvements in Protein Secondary Structure Prediction by an Enhanced Neural Network" J. Mol. Biol. (214) 171-182.
● PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/psiform.html) [1]
              1         2         3         4         5         6         7         8         9
        456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
S: MGDIQVQVNIDDNGKNFDYTYTVTTESELQKVLNELMDYIKKQGAKRVRISITARTKKEAEKFAAILIKVFAELGYNDINVTFDGDTVTVEGQLEGGSLEHHHHHH
C: -----EEEEEEETTEEEEEEE----HHHHHHHHHHHHHHHHHH---EEEEEEE---HHHHHHHHHHHHHHHHHH---EEEEEEETTEEEEEEE-------------
P: ---EEEEEEE-------EEEEEEE-HHHHHHHHHHHHHHHHH----EEEEEEEE--HHHHHHHHHHHHHHHHH----EEEEEE---EEEEEEEE------------
Legend: S for Sequence, C for Crystal, P for Prediction.
Use Hairpin and Helix Predictions Also
● TURNPRED (http://www.jensmeiler.de/turnpred.html) [1]
              1         2         3         4         5         6         7         8         9
        456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
S: MGDIQVQVNIDDNGKNFDYTYTVTTESELQKVLNELMDYIKKQGAKRVRISITARTKKEAEKFAAILIKVFAELGYNDINVTFDGDTVTVEGQLEGGSLEHHHHHH
C: -----EEEEEEETTEEEEEEE----HHHHHHHHHHHHHHHHHH---EEEEEEE---HHHHHHHHHHHHHHHHHH---EEEEEEETTEEEEEEE-------------
P: ---EEEEEEEhhhhhhEEEEEE---HHHHHHHHHHHHHHHHH----EEEEEEEE--HHHHHHHHHHHHHHHHHH---EEEEEEhhhEEEEEEEE------------
Legend: S for Sequence, C for Crystal, P for Prediction.
[1] Kuhn, M.; Meiler, J.; Baker, D. Strand-loop-strand motifs: prediction of hairpins and diverging turns in proteins, Proteins (2003) 54, 282-288.
● PROTSCALE (http://www.expasy.org/tools/protscale.html)
Secondary Structure Prediction
● Exploit the strength of each method (e.g. TURNPRED for beta-turns).
● Use recent methods on recent databases (it is not good to use Chou/Fasman, parameterized in 1973 on 59 proteins).
● Look for consistent and consensus predictions (different predictions are due to an ambiguous peptide signal).
● Relative prediction accuracy: beta-turns > alpha-helices > beta-strands > coils
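The consensus idea can be sketched as a per-residue majority vote. This is only an illustration under stated assumptions: every predictor is assumed to return a string of the same length over the same alphabet (E, H, -), the function name `consensus` is hypothetical, and the toy strings are not output of the servers listed earlier.

```python
from collections import Counter

def consensus(predictions, undecided="?"):
    """Per-residue majority vote over equal-length prediction strings."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    out = []
    for column in zip(*predictions):
        state, votes = Counter(column).most_common(1)[0]
        # Keep a state only if more than half of the predictors agree on it.
        out.append(state if votes * 2 > len(column) else undecided)
    return "".join(out)

# Toy predictions (hypothetical, 10 residues each).
toy = ["--EEEE-HHH", "-EEEEE-HHH", "--EEE--HH-"]
print(consensus(toy))  # prints --EEEE-HHH
```

Positions where no state wins a strict majority are flagged with "?", which is one way to mark the ambiguous peptide signal for closer inspection.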
Top7 Secondary Structure
STRAND 4 10
TURN 10 13
STRAND 13 19
LOOP 19 24
HELIX 24 41
LOOP 41 45
STRAND 45 51
LOOP 51 55
HELIX 55 72
LOOP 72 76
STRAND 76 82
TURN 82 85
STRAND 85 91
Fragments from PDB
Not new:
● ROSETTA [1]
● PROFESY [2]
● 3MER [3]
● Assembly of Segments [4]
...
Ideas behind the use of PDB fragments:
1) Sequence imposes the local backbone conformation [5].
2) PDB blocks reduce the conformational search space [4].
(see also [6] for technical points)
Fragments from PDB (refs)
[1] K. T. Simons, C. Kooperberg, E. Huang, D. Baker, Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol. 268 (1997) 209-25.
[2] J. Lee, S. Y. Kim, K. Joo, I. Kim, J. Lee, Prediction of protein tertiary structure using PROFESY, a novel method based on fragment assembly and conformational space annealing. Proteins 56 (2004) 704-14.
[3] E. Martineau, P. J. L'Heureux, J. R. Gunn, Biased fragment distribution in MC simulation of protein folding, J Comput Chem. 25 (2004) 1895-903.
[4] I. Simon, L. Glasser, H. A. Scheraga, Calculation of protein conformation as an assembly of stable overlapping segments: application to bovine pancreatic trypsin inhibitor. PNAS 88 (1991) 3661-5.
[5] A. G. Street, S. L. Mayo, Intrinsic beta-sheet propensities result from van der Waals interactions between side chains and the local backbone. PNAS 96 (1999) 9074-6.
[6] J. B. Holmes, J. Tsai, Some fundamental aspects of building protein structures from fragment libraries. Protein Sci. 13 (2004) 1636-50.
What is Backtracking?
Variables   Domains
V1          D1 : { 1, 2, 3, ..., 10 }
V2          D2 : { A, B, C, ..., Z }
V3          D3 : { I, II, III, ..., M }
Backtracking enumerates the Cartesian product D1 x D2 x D3 in a depth-first-search-like manner.
What is Backtracking?
V1: 1 2 ... 10
V2: A B ... Z
V3: I II ... M
Partial assignments are checked for validity. If one is not valid, then backtrack!
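A minimal sketch of such a backtracking search, assuming the validity check can be expressed as a function of the partial assignment. The domains are shortened toy versions of D1, D2 and D3, and the constraint `no_two_with_B` is purely hypothetical.

```python
def backtrack(domains, is_valid, partial=()):
    """Depth-first enumeration of the Cartesian product of `domains`,
    abandoning a branch as soon as a partial assignment is invalid."""
    if not is_valid(partial):
        return                      # dead end: backtrack
    if len(partial) == len(domains):
        yield partial               # complete, valid assignment
        return
    for value in domains[len(partial)]:
        yield from backtrack(domains, is_valid, partial + (value,))

# Shortened toy domains for V1, V2, V3.
D1 = [1, 2, 3]
D2 = ["A", "B"]
D3 = ["I", "II"]

def no_two_with_B(partial):
    # Hypothetical constraint: V1 = 2 cannot be combined with V2 = "B".
    return not (len(partial) >= 2 and partial[0] == 2 and partial[1] == "B")

solutions = list(backtrack([D1, D2, D3], no_two_with_B))
print(len(solutions))  # 10 of the 12 combinations survive
```

The invalid pair (2, "B") is rejected at depth 2, so neither of its two children at depth 3 is ever visited.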
Backtrack Optimizations
1) Variables with the smallest domains should be at the root.
2) 1-Consistency (node-consistency)
Often taken for granted because domain values are taken from the PDB (nucleotide conformation, WC relation, pair of beta-strands, ...).
3) 2-Consistency (arc-consistency)
Idea: when a backtrack fails, it often fails for the same reason.
-> Can we eliminate the dead ends from the domain values?
Compute the Cartesian product D(i) x D(i+1). If there is a value j in D(i) for which there is no valid subtree D(i,j)-D(i+1,k) for any k, then remove j from D(i).
-> Not applicable for MC-SYM (why? MC-SYM is 2-consistent...)
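The pruning step can be sketched as follows. This is an illustration only: it assumes a pairwise test `compatible(j, k)` between a value of D(i) and a value of D(i+1), and the toy domains and the "j < k" rule are hypothetical stand-ins for geometric compatibility between fragments.

```python
def prune_arc(d_i, d_next, compatible):
    """Keep only the values of d_i that have at least one compatible
    partner in d_next (one directed arc-consistency pass)."""
    return [j for j in d_i if any(compatible(j, k) for k in d_next)]

def tree_size(domains):
    """Upper bound on the number of leaves: product of the domain sizes."""
    size = 1
    for d in domains:
        size *= len(d)
    return size

def make_arc_consistent(domains, compatible):
    """Repeat the D(i) versus D(i+1) pruning until no domain shrinks."""
    domains = [list(d) for d in domains]
    changed = True
    while changed:
        changed = False
        for i in range(len(domains) - 1):
            pruned = prune_arc(domains[i], domains[i + 1], compatible)
            if len(pruned) != len(domains[i]):
                domains[i] = pruned
                changed = True
    return domains

# Toy example: a value j is compatible with k when j < k.
before = [[1, 5, 9], [2, 6], [3, 4]]
after = make_arc_consistent(before, lambda j, k: j < k)
print(tree_size(before), tree_size(after))  # 12 2
```

Removing a value from D(i+1) can leave a value of D(i) without support, which is why the passes are repeated until nothing changes.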
BetaStrand Shuffler
Given all the beta-strands, how do we arrange them to form a beta-sheet?
With N strands there are N! relative orderings (123, 132, ...).
In each of these, every strand can go either left or right, which leads to 2^N different strand orientations.
BUT there are 2 axes of symmetry (divide by two, two times), so the total number of folds is “just”
N! * 2^(N-2)
Now, consider sliding the strands with respect to the others... this is called registering.
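The count N! * 2^(N-2) can be checked by brute force for small N. The sketch below is an assumption about how to encode the symmetry argument, not the Shuffler's actual code: a topology is a strand order plus one direction (+1 or -1) per strand, and two descriptions are treated as the same sheet if they differ by reversing the order, flipping all directions, or both. Register sliding is not enumerated here.

```python
from itertools import permutations, product
from math import factorial

def canonical(order, dirs):
    """Smallest of the 4 equivalent descriptions of one sheet topology."""
    flipped = tuple(-d for d in dirs)
    variants = [
        (order, dirs),
        (order, flipped),                 # flip every strand direction
        (order[::-1], dirs[::-1]),        # read the sheet from the other edge
        (order[::-1], flipped[::-1]),     # both at once
    ]
    return min(variants)

def topologies(n):
    """All distinct (strand order, strand direction) arrangements."""
    seen = set()
    for order in permutations(range(1, n + 1)):
        for dirs in product((+1, -1), repeat=n):
            seen.add(canonical(order, dirs))
    return seen

for n in (2, 3, 4, 5):
    assert len(topologies(n)) == factorial(n) * 2 ** (n - 2)
print(len(topologies(5)))  # 960 topologies for a five-stranded sheet
```

For the five strands of Top7 this gives 960 topologies before any register sliding is considered.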
BetaStrand Shuffler
We need a program that:
1) generates all plausible beta-sheet topologies.
2) scores each topology so we can select the best one.
see http://wwwlbit.iro.umontreal.ca/bShuffle/index.html...
BetaStrand Shuffler
1) generates all plausible beta-sheet topologies.
-> this is straightforward...
2) scores each topology so we can select the best one.
-> this is not so obvious... why?
Context is a major determinant of beta-sheet propensity [1]. What determines the best registering between 2 strands?
residue pairings? hydrogen bond network? context??
[1] D. L. Minor Jr, P. S. Kim, Context is a major determinant of beta-sheet propensity. Nature 371 (1994) 264-7.
BetaStrand Shuffler
The energy model is as follows:
● Amino-acid composition in parallel and antiparallel sheets differs [1].
● Amino-acid pairings are not “random” [2][3].
● Beta-sheets often have a hydrophobic face [4].
● Number of H-bonds given a topology (~2.8 kcal/mol/H-bond) [5].
[1] S. Lifson, C. Sander, Antiparallel and parallel beta-strands differ in amino acid residue preferences, Nature 282 (1979) 109-111.
[2] S. Lifson, C. Sander, Specific recognition in the tertiary structure of beta-sheets of proteins, J. Mol. Biol. 139 (1980) 627-639.
[3] H. Zhu, W. Braun, Sequence specificity, statistical potentials, and three-dimensional structure prediction with self-correcting distance geometry calculations of beta-sheet formation in proteins, Protein Sci. 8 (1999) 326-342.
[4] J. S. Richardson, D. C. Richardson, Principles and patterns of protein conformation, Plenum Press, New York, 1989, Ch. 1, pp. 1-98.
[5] D. N. Boobbyer, P. J. Goodford, P. M. McWhinnie, R. C. Wade, New hydrogen-bond potentials for use in determining energetically favorable binding sites on molecules of known structure, J. Med. Chem. 32 (1989) 1083-1094.
Top7 BetaSheet
./bShuffle.exe -S -R 6 -O 19-45 -O 51-76 -H -B 23 ./baker.str
STRAND 4 10
STRAND 13 19
STRAND 45 51
STRAND 76 82
STRAND 85 91
● Maximum register sliding is 6 (strands are of length 7).
● We want residues 19 and 45, as well as 51 and 76, to be on opposite sides of the sheet. This is to force the two helices to lie over the beta-sheet.
● The mean number of H-bonds in all the generated topologies is 23.
● Consider also the most hydrophobic face (-H).
Top7 BetaSheet
2 -> [ 13:K+][ 14:N ][ 15:F@][ 16:D-][ 17:Y@][ 18:T ][ 19:Y@]
1 <- [ 10:D-][ 9:D-][ 8:I ][ 7:N ][ 6:V ][ 5:Q ][ 4:V ]
3 -> [ 45:R+][ 46:V ][ 47:R+][ 48:I ][ 49:S ][ 50:I ][ 51:T ]
5 <- [ 91:Q ][ 90:G ][ 89:E-][ 88:V ][ 87:T ][ 86:V ][ 85:T ]
4 -> [ 76:D-][ 77:I ][ 78:N ][ 79:V ][ 80:T ][ 81:F@][ 82:D-]
Pairing Energy: -23.73 kcal/mol
Hydrophobicity Energy: -7.04 kcal/mol
(Face 1 Hydrophobicity Score: +15.71)
(Face 2 Hydrophobicity Score: -20.36)
H-bonding Energy: -12.50 kcal/mol
(Alternative 1: 26 H-bonds [ 85 with 51] Energy: -7.50 kcal/mol)
(Alternative 2: 28 H-bonds [ 82 with 85] Energy: -12.50 kcal/mol)
---------------------------------------
Total Sheet Energy: -43.27 kcal/mol
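As a worked check, the total in the bShuffle output is the sum of the three printed terms. The helper `hbond_energy` goes one step further and is an inference, not something the slides state: the two printed alternatives (26 and 28 H-bonds) happen to be consistent with about -2.5 kcal/mol per H-bond in excess of the reference value 23 passed with -B.

```python
# Values copied from the bShuffle output (kcal/mol).
pairing = -23.73
hydrophobicity = -7.04
hbonding = -12.50

total = pairing + hydrophobicity + hbonding
print(f"{total:.2f}")  # -43.27

def hbond_energy(n_hbonds, reference=23, per_bond=-2.5):
    # Assumed linear form, inferred from the two printed alternatives only.
    return per_bond * (n_hbonds - reference)

print(hbond_energy(26), hbond_energy(28))  # -7.5 -12.5
```

Two data points cannot confirm the functional form, so the per-bond value should be read as a fit to this output rather than as the model's parameter.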
Lowest beta-sheet energy:
● As in Top7, with the H-bonding network [82-85].
● Has the largest number of H-bonds among all topologies.
Top7 BetaSheet
2 -> [ 13:K+][ 14:N ][ 15:F@][ 16:D-][ 17:Y@][ 18:T ][ 19:Y@]
1 <- [ 10:D-][ 9:D-][ 8:I ][ 7:N ][ 6:V ][ 5:Q ][ 4:V ]
3 -> [ 45:R+][ 46:V ][ 47:R+][ 48:I ][ 49:S ][ 50:I ][ 51:T ]
5 <- [ 91:Q ][ 90:G ][ 89:E-][ 88:V ][ 87:T ][ 86:V ][ 85:T ]
4 -> [ 76:D-][ 77:I ][ 78:N ][ 79:V ][ 80:T ][ 81:F@][ 82:D-]
Pairing Energy: -21.93 kcal/mol
Hydrophobicity Energy: -7.04 kcal/mol
(Face 1 Hydrophobicity Score: +15.71)
(Face 2 Hydrophobicity Score: -20.36)
H-bonding Energy: -12.50 kcal/mol
(Alternative 1: 26 H-bonds [ 5 with 18] Energy: -7.50 kcal/mol)
(Alternative 2: 28 H-bonds [ 4 with 19] Energy: -12.50 kcal/mol)
---------------------------------------
Total Sheet Energy: -41.47 kcal/mol
Low beta-sheet energy:
● Same face hydrophobicity as the crystal.
● Networked salt bridge (D9-R47-E89), which is electrostatically more stable than the isolated version [1] (not taken into account).
[1] S. Kumar, R. Nussinov, Salt bridge stability in monomeric proteins, J Mol Biol. 293 (1999) 1241-55.
Top7 BetaSheet
Low beta-sheet energy:
2 -> [ 13:K+][ 14:N ][ 15:F@][ 16:D-][ 17:Y@][ 18:T ][ 19:Y@]
1 <- [ 10:D-][ 9:D-][ 8:I ][ 7:N ][ 6:V ][ 5:Q ][ 4:V ]
4 -> [ 76:D-][ 77:I ][ 78:N ][ 79:V ][ 80:T ][ 81:F@][ 82:D-]
5 <- [ 91:Q ][ 90:G ][ 89:E-][ 88:V ][ 87:T ][ 86:V ][ 85:T ]
3 -> [ 45:R+][ 46:V ][ 47:R+][ 48:I ][ 49:S ][ 50:I ][ 51:T ]
Pairing Energy: -20.89 kcal/mol
Hydrophobicity Energy: -7.04 kcal/mol
(Face 1 Hydrophobicity Score: +15.71)
(Face 2 Hydrophobicity Score: -20.36)
H-bonding Energy: -12.50 kcal/mol
(Alternative 1: 26 H-bonds [ 85 with 82] Energy: -7.50 kcal/mol)
(Alternative 2: 28 H-bonds [ 51 with 85] Energy: -12.50 kcal/mol)
---------------------------------------
Total Sheet Energy: -40.43 kcal/mol
● Same face hydrophobicity as the crystal.
● D9 and D76 are close in space.
● Would force an alpha-helix to lie on the polar face.
Top7 BetaSheet
We cannot discriminate between these plausible topologies based on:
● Face hydrophobicity.
● Number of H-bonds.
Also, each topology has an unpaired charged residue at a border strand. This should prevent amyloid fibril formation [1]. Remember that this has been crystallized...
[1] J. S. Richardson, D. C. Richardson, Natural beta-sheet proteins use negative design to avoid edge-to-edge aggregation, PNAS 99 (2002) 2754-9.
BetaSheet Builder
Given a beta-sheet topology, how can we build 3D structures that satisfy the prescribed topology (including the H-bonding network and the beta-bulges)?
Does only 1 beta-sheet conformation allow for proper placement of the alpha-helices? If so, which is it?
Can we explore the conformational search space for a beta-sheet?
see http://wwwlbit.iro.umontreal.ca/bBuilder/index.html...
BetaSheet Builder
Basically it backtracks on pairs of strands, assembling them in 3D.
BetaSheet Builder
Preserves the original H-bonds from the crystal structures...
BetaSheet Builder
Precision: Accuracy test of the BetaSheet Builder on the beta-sheet of 1TML. The crystal structure has light grey cylinders while the best-RMSD structure (0.83 A) has dark grey ones. Strand ribbons are pictured for the crystal structure.
Flexible: Flexibility test of the BetaSheet Builder on the beta-sheet of 1TML. The crystal structure is in red while the worst-RMSD rebuilt structure is in blue. Both structures are aligned along the strand 40 to 42. The high RMSD (7.88 A) comes from the fact that the rebuilt structure takes a different path from the third strand from the top onward.
Top7 BetaSheet
35
4 VAL  5 GLN  6 VAL  7 ASN  8 ILE  9 ASP  10 ASP
13 LYS  14 ASN  15 PHE  16 ASP  17 TYR  18 THR  19 TYR
45 ARG  46 VAL  47 ARG  48 ILE  49 SER  50 ILE  51 THR
76 ASP  77 ILE  78 ASN  79 VAL  80 THR  81 PHE  82 ASP
85 THR  86 VAL  87 THR  88 VAL  89 GLU  90 GLY  91 GLN
LINKP 10 13
LINKH 10 13
LINKP 76 91
LINKH 76 91
LINKP 91 45
LINKP 45 9
LINKH 45 9
LINKP 9 14
LINKP 77 90
LINKP 90 46
LINKH 90 46
LINKP 46 8
LINKP 8 15
LINKH 8 15
LINKP 78 89
LINKH 78 89
LINKP 89 47
LINKP 47 7
LINKH 47 7
LINKP 7 16
LINKP 79 88
LINKP 88 48
LINKH 88 48
LINKP 48 6
LINKP 6 17
LINKH 6 17
LINKP 80 87
LINKH 80 87
LINKP 87 49
LINKP 49 5
LINKH 49 5
LINKP 5 18
LINKP 81 86
LINKP 86 50
LINKH 86 50
LINKP 50 4
LINKP 4 19
LINKH 4 19
LINKP 82 85
LINKH 82 85
LINKP 85 51
1101334621-9-2.str
● LINKP: beta-sheet partners.
● LINKH: H-bond.
Top7 BetaSheet
psb.exe.AMD64 1101334621-9-2.str cullpdb_pc25_res3.0_R1.0_d040427_chains3083.bspider.pair.dat 0.5 0.5 0.4 9999 1.0
backtrack tree size before arc-consistency: 1.12615e+14 (100%)
backtrack tree size after arc-consistency: 8.04181e+13 ( 71%)
193243 partial structures were rejected for CB(i)-CB(j) steric conflicts.
31422720 partial structures were rejected for C(i)-N(i+1) peptidic deformation 1.32 +/- 0.5 A.
● 579 beta-sheet 3D structures at 1.0 A from each other.
● 13:10:57 on AMD64 @ 2.2 GHz.
-> closest RMSD to crystal structure is 1.0 A (backbone only).
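The peptide-link rejection test can be pictured as a simple distance window. This sketch assumes that "1.32 +/- 0.5 A" means the C(i)-N(i+1) distance must lie within 0.5 A of 1.32 A; the function name and the coordinate format are hypothetical, and the CB-CB clash test is omitted because its threshold is not given.

```python
import math

def peptide_link_ok(c_xyz, n_xyz, ideal=1.32, tolerance=0.5):
    """True if the C(i)-N(i+1) distance (in Angstroms) is within
    `tolerance` of the ideal peptide bond length."""
    return abs(math.dist(c_xyz, n_xyz) - ideal) <= tolerance

# Toy coordinates along the x axis (Angstroms).
print(peptide_link_ok((0.0, 0.0, 0.0), (1.32, 0.0, 0.0)))  # True
print(peptide_link_ok((0.0, 0.0, 0.0), (2.50, 0.0, 0.0)))  # False
```

A window this generous keeps fragment junctions that a later minimization step can still repair.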
Top7 BetaSheet
Strand 4-10 -> cyan
Strand 13-19 -> blue
Strand 45-51 -> green
Strand 76-82 -> red
Strand 85-91 -> yellow
579 structures aligned on the strand 13-19...
Add Turns
A program to add beta-turns on beta-sheets...
● database of: < TFO(1,2), 3D structure > (turn fragments from PDB)
● distance between 2 TFOs
Top7 BetaTurns
STRAND 4 10
● TURN 10 13
STRAND 13 19
STRAND 76 82
● TURN 82 85
STRAND 85 91
foreach i ( 1101334621-9-2-??????.pdb )
  ./addFrag.exe -T -Z 1.5 baker.all.bab $i loops.25.dat
end
● Matrix distance (“closure”) is 1.5 Angstroms.
● This leaves us with 463 of the 579 beta-sheets (80%).
Top7 BetaTurns
~20 A
BetaAlphaBeta Builder
77% of alpha-helices are “tied” to beta-strands.
38% of alpha-helices are “tied” at both N- and C-terms.
BetaAlphaBeta Builder
Backtrack level 1:
variables:
v1: N-term loop.
v2: helix.
v3: C-term loop.
domains:
3D fragments from PDB.
-> generates 3D babs for each bab unit.
Backtrack level 2:
variables:
v(i): the i-th bab unit.
domains:
previously assembled babs.
-> generates 3D structures with coexisting babs.
BetaAlphaBeta Builder
The loops encode the stereochemistry of the inter-strand crossovers...
BetaAlphaBeta Builder
WHY?
BetaAlphaBeta of Top7
On the 463 3D structures of beta-sheets with added beta-turns:
● We obtain 9559 3D structures with the 2 babs.
● Not all beta-sheets are suitable for the helices: 115/463 (25%) did not yield 3D structures for babs!
● Stats on the number of structures with 3D babs: Min: 1, Max: 176, Mean: 27.5, StdDev: 33.0
● About 4 days of computation on AMD64 @ 2.2 GHz.
foreach i ( 1101334621-9-2-??????-??????.pdb )
  ./addFrag.exe -L -Z 1.5 baker.all.bab $i loops.25.dat
end
Hydrophobic Fitness Score
[1] Huang, E. S., Subbiah, S. & Levitt, M. Recognizing native folds by the arrangement of hydrophobic and polar residues. J. Mol. Biol. 252 (1995) 709-720.
[2] Huang, E. S., Subbiah, S., Tsai, J. & Levitt, M. Using a hydrophobic contact potential to evaluate native and near-native folds generated by molecular dynamics simulations. J. Mol. Biol. 257 (1996) 716-725.
Hydrophobic Fitness Score
2 problems with the original formulation:
1) HP partition of amino acids.
Solution: use a hydrophobic scaling of the amino acids.
2) Hard distance cutoffs.
Solution: use a distance switching function.
Hydrophobic Fitness Score
[1] Cowan, R. & Whittaker, R. G. Hydrophobicity indices for amino acid residues as determined by high-performance liquid chromatography. Pept. Res. 3 (1990) 75-80.
Hydrophobic scaling of the amino acids:
Hydrophobic Fitness Score
Distance switching function:
Plot[ (1-Tanh[ x - 10 ])/2, {x, 0, 20} ]
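The same switching function, written as a small Python sketch. The midpoint of 10 comes from the plotted expression; treating x as a distance in Angstroms and the parameter name `midpoint` are assumptions made for illustration.

```python
import math

def switch(distance, midpoint=10.0):
    """Smooth replacement for a hard cutoff: close to 1 well below
    `midpoint`, exactly 0.5 at `midpoint`, close to 0 well above it."""
    return (1.0 - math.tanh(distance - midpoint)) / 2.0

for d in (0, 8, 10, 12, 20):
    print(d, round(switch(d), 3))
```

Unlike a step function, a pair of residues just beyond the midpoint still contributes a little, so small coordinate changes cannot cause jumps in the score.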
Hydrophobic Fitness Score
And finally:
Helix Hydrophobic Moment
The BetaAlphaBeta Builder is a geometrical process: it makes sure that the loop-helix-loop fragment can be attached to the beta-sheet, but it does not take into account the hydrophobic moment of the helix...
Simple procedure to optimize the hydrophobic moment of helices within a 3D structure:
Consider each helix one at a time.
1) Rotate it through 360 degrees and remember the best rotation (we optimize the helix within the context of the others as we currently find them in the 3D structure).
2) Rotate that helix to its best rotation.
Repeat while a helix has rotated to a different angle.
Best rotation: minimum of the HF score!
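The rotation procedure amounts to a coordinate-wise search, sketched below under stated assumptions: `score` stands in for the HF score of the whole 3D structure (lower is better), each helix is reduced to a single rotation angle, and the 10-degree step and the toy score are hypothetical.

```python
import math

def optimize_helices(angles, score, step=10):
    """Rotate one helix at a time over 360 degrees, keep its best angle,
    and repeat until no helix moves any more.

    angles : current rotation angle (degrees) of each helix
    score  : callable(list_of_angles) -> float, lower is better
    """
    angles = list(angles)
    moved = True
    while moved:
        moved = False
        for h in range(len(angles)):
            best_angle, best_score = angles[h], score(angles)
            for candidate in range(0, 360, step):
                trial = angles[:h] + [candidate] + angles[h + 1:]
                s = score(trial)
                if s < best_score:   # strict improvement, so the loop terminates
                    best_angle, best_score = candidate, s
            if best_angle != angles[h]:
                angles[h] = best_angle
                moved = True
    return angles

# Toy score: each helix has a preferred angle (hypothetical values).
targets = [150, 40]

def toy_score(a):
    return sum(1 - math.cos(math.radians(x - t)) for x, t in zip(a, targets))

print(optimize_helices([0, 0], toy_score))  # [150, 40]
```

Because each helix is scored with the others held at their current angles, a second pass is needed whenever an earlier helix's optimum depends on a later one.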
optHelix
HFS: 2.95 HFS: 5.80
● HFS score is better.
● We have disconnected the helices because of the rotation.
optHelix (figure labels: +154, 10, 7, 4, ...)
Add Loops
● The helix rotation optimization procedure can disconnect the helix from its loop because of the rotation.
● We need a procedure to rebuild these loops: addTurn L (instead of adding turns we'll add loops)
● We will destroy any 3D structure whose loops cannot connect the helices properly (say within 1.5 A).
HFS vs RMSD
1QYS HFS Score: 7.37
Best Solution
RMSD: 1.34 HFS: 7.20
Conclusions
● It is possible to build atomic-precision 3D structures from PDB fragments.
● It is possible to build novel folds (goal of Top7) from PDB fragments.
● Independence of sequence, except:
beta-sheet topology determination (start).
helix hydrophobic moment optimization (end).
● Not all beta-sheet 3D structures can accommodate BABs.
● The Hydrophobic Fitness Score is a properly behaved function to pick the native-like structures (in our example).