CBE_Symposium_Poster_Aparajita - sjp


www.buffalo.edu

Introduction

The ability to predict the loop structure in a protein is useful in many studies, including homology modeling, protein design and docking.

Obtaining high-quality models becomes significantly more challenging as the loop length increases.

The current research aims to overcome the challenges caused by the ruggedness of the energy landscape around a native protein structure, i.e. the presence of high energy barriers immediately around the structure, by locally manipulating the shape of the energy landscape during certain steps of the conformational search.

Methodology

Sequence-Robust Loop Modeling with PyRosetta

Aparajita Dasgupta, Dr. Sheldon Park

Department of Chemical and Biological Engineering, University at Buffalo, SUNY, Email: [email protected], [email protected]

PyRosetta is the Python version of Rosetta, a suite of software to support computational protein structure analysis. Within Rosetta, the kinematic closure (KIC) loop algorithm allows prediction of the structure of loops of up to twelve amino acids with high accuracy, i.e. < 1 Å (Mandell et al., Nature Methods 2009, 6:551-2).

We note that protein structure, especially the main chain conformation, often exhibits robustness against small sequence variations. Introducing transient mutations that smooth the energy landscape therefore creates the possibility of improving results during the conformational search.

Figure 1: Procedure to improve conformational search by introducing transient mutations using KIC loop protocol in PyRosetta
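The variant set used in the procedure above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that each mutant carries a single alanine substitution at one loop position (consistent with the 12 mutants and 1 wild type mentioned later for twelve-residue loops), and the loop sequence and the function name `alanine_scan_variants` are hypothetical. In the actual workflow each variant would then be modeled with the KIC loop protocol and mutated back to wild type.

```python
from typing import List, Tuple


def alanine_scan_variants(loop_sequence: str) -> List[Tuple[str, str]]:
    """Return (label, sequence) pairs: the wild type plus one
    single-alanine mutant for every non-alanine loop position."""
    variants = [("WT", loop_sequence)]
    for index, residue in enumerate(loop_sequence):
        if residue == "A":
            # Already alanine: the mutant would be identical to the wild type.
            continue
        mutant = loop_sequence[:index] + "A" + loop_sequence[index + 1:]
        # Label convention: original residue, 1-based loop position, new residue.
        variants.append((f"{residue}{index + 1}A", mutant))
    return variants


# Hypothetical twelve-residue loop (not taken from the poster).
variants = alanine_scan_variants("GSTDKLPNYVQE")
for label, sequence in variants:
    print(label, sequence)
```

For a twelve-residue loop without alanine this yields 13 variants, one wild type and 12 mutants.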

Results

Most protein structures yielded “funnel-shaped” continuous graphs, while only some diverged from this trend

Merely increasing the number of wild type structures (structures without any alanine mutation) did not lead to improved results


Figure 2: RMSD vs minimized energy for each of the 20 wild type (non-mutated) proteins. Each graph represents 600 structures generated by the KIC loop protocol. Note the funnel-shaped contour in most cases. For the proteins where the contour develops differently, prediction of the loop structure is very difficult because multiple conformations with different energies are present at the same RMSD.
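One way to put a number on how funnel-like such a plot is would be the correlation between RMSD and minimized energy across the models of one protein. The poster judges the funnels visually, so the sketch below is only an assumed diagnostic, and all values in it are invented for illustration. A coefficient near 1 means energy rises steadily with RMSD (a funnel); a coefficient near 0 means models at the same RMSD have widely different energies.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Invented example: energy increases with RMSD (funnel-like).
funnel_r = pearson([0.5, 1.0, 2.0, 3.0, 4.0],
                   [-120.0, -115.0, -105.0, -95.0, -85.0])

# Invented example: very different energies at the same RMSD (no funnel).
rugged_r = pearson([1.0, 1.0, 1.0, 4.0, 4.0],
                   [-120.0, -90.0, -100.0, -110.0, -95.0])

print(f"funnel-like: {funnel_r:.2f}, rugged: {rugged_r:.2f}")
```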

Figure 3: RMSD vs minimized energy for 3 wild type proteins (1cnv, 1t1d and 1i7p). Each graph represents 7500 structures generated by the KIC loop protocol for wild type structures. Although the overall energy surface behaves similarly to the case of 600 structures, there is no marked improvement in either minimizing energy or predicting loop structure. This leads to the conclusion that site directed mutagenesis is indeed the right approach. Furthermore, increasing the number of structures did not yield the classic “funnel-shaped” energy contour that is favorable for loop prediction, as is evident in the cases of 1cnv and 1i7p. Although the number of conformations increases, the energy landscape is not smoothed, so structures that are feasible but are not reached because of local maxima are still not taken into account.

One-dimensional analysis of RMSD did not yield conclusive results indicating which amino acids (if any) led to more difficult energy landscapes for modeling purposes

Mutated structures led to lower energy and resulted in better structure prediction

Figure 4: Boxplots depicting the distribution of LRMSD for each of the 20 amino acids. For each protein and its 13 versions (12 mutants and 1 wild type), the minimum RMSD was calculated and the mutated residue for that particular structure was noted. Boxplots were plotted to see whether any clear trends appeared indicating which amino acids posed an issue in de novo modeling. While some amino acids occur more commonly than others, no clear trend was visible in the plots. The main conclusion drawn from this exercise was that one-dimensional analysis does not yield any trends and that a two-dimensional analysis of RMSD with another observable property (energy, in the current experiment) is vital to clearly understand the bottlenecks associated with loop modeling.
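The grouping behind such boxplots can be sketched as below. The caption leaves some details open, so this assumes one reading: the minimum RMSD is taken per variant over its models and then grouped by the type of residue that was mutated. The variant labels and RMSD values are invented, and `min_rmsd_by_residue` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def min_rmsd_by_residue(
    records: Iterable[Tuple[str, str, float]]
) -> Dict[str, List[float]]:
    """records holds (variant_id, mutated_residue, rmsd), one per model.
    Returns, for each residue type, the sorted per-variant minimum RMSDs."""
    best: Dict[Tuple[str, str], float] = {}
    for variant_id, residue, rmsd in records:
        key = (variant_id, residue)
        if key not in best or rmsd < best[key]:
            best[key] = rmsd
    grouped: Dict[str, List[float]] = defaultdict(list)
    for (_, residue), rmsd in best.items():
        grouped[residue].append(rmsd)
    return {residue: sorted(values) for residue, values in grouped.items()}


# Invented records: two models for two glycine variants, one serine model.
records = [
    ("1cnv_G1A", "G", 2.1),
    ("1cnv_G1A", "G", 1.4),
    ("1cnv_S2A", "S", 0.9),
    ("1t1d_G5A", "G", 3.0),
    ("1t1d_G5A", "G", 2.2),
]
grouped = min_rmsd_by_residue(records)
for residue, values in sorted(grouped.items()):
    # Each list is what a single box in the boxplot would summarize.
    print(residue, values, "median:", median(values))
```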

Figure 5: RMSD vs minimized energy for all 20 proteins for wild type and mutant structures. Each data point on each graph represents a single average structure from the clusters formed for each type of mutant. The blue data points are mutant structures, while the purple data points are wild type structures. In all cases the mutated structures had lower energy than the wild type structure. This leads us to the conclusion that site directed mutagenesis can indeed lead to improved de novo structure prediction when coupled with the KIC loop protocol. Since the energy and RMSD are significantly lower than those of the wild type structures, the odds of arriving at a correct structure increase greatly when using these mutated structures.

Future Work

While applying site directed mutagenesis led to better results, there are still minor differences between the predicted structure and the actual structure

Our initial approach was to combine all mutants and wild type structures together and determine whether this smoothed the energy landscape further

However, this approach did not yield conclusive results

The current approach aims to linearize the relationship between RMSD and energy for each protein near the lowest energy threshold obtained, using linear regression techniques and neural networks
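The linear regression part of this plan could look roughly like the sketch below, which fits a straight line to energy as a function of RMSD using only the models within a chosen energy window above the lowest energy. The window parameter, the function name `fit_low_energy_line`, and the data are assumptions made for illustration; the poster does not specify how the threshold is chosen, and the neural network part is not sketched here.

```python
from typing import Sequence, Tuple


def fit_low_energy_line(
    rmsds: Sequence[float], energies: Sequence[float], window: float
) -> Tuple[float, float]:
    """Ordinary least-squares fit of energy = slope * rmsd + intercept,
    restricted to models with energy <= min(energies) + window."""
    e_min = min(energies)
    points = [(r, e) for r, e in zip(rmsds, energies) if e <= e_min + window]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two models inside the energy window")
    mean_r = sum(r for r, _ in points) / n
    mean_e = sum(e for _, e in points) / n
    sxx = sum((r - mean_r) ** 2 for r, _ in points)
    if sxx == 0:
        raise ValueError("all selected models have the same RMSD")
    slope = sum((r - mean_r) * (e - mean_e) for r, e in points) / sxx
    intercept = mean_e - slope * mean_r
    return slope, intercept


# Invented data: the last model lies far above the window and is ignored.
slope, intercept = fit_low_energy_line(
    rmsds=[0.5, 1.0, 1.5, 2.0, 4.0],
    energies=[-100.0, -98.0, -96.0, -94.0, -60.0],
    window=10.0,
)
print(f"energy ~ {slope:.2f} * RMSD + {intercept:.2f}")
```

A positive slope near the energy minimum indicates that lower energy goes with lower RMSD in that region, which is the property a funnel-shaped landscape is expected to have.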

Acknowledgments

The authors would like to thank the UB School of Engineering and Applied Science.

Figure 6: 1cnv native structure and minimum energy model mutated back to wild type. The RMSD is 3.3 Å for this system. The current algorithm still leaves a few questions to be answered with regard to the energy function, the role of each type of amino acid and the characteristic energy landscape for each protein.

Citations

1. Mandell, D. J., Coutsias, E. A., & Kortemme, T. (2009). Sub-angstrom accuracy in protein loop reconstruction by robotics-inspired conformational sampling. Nature Methods, 6, 551-552.

2. Baugh, E. H., Lyskov, S., Weitzner, B. D., & Gray, J. J. (2011). Real-Time PyMOL Visualization for Rosetta and PyRosetta. PLoS ONE.

3. Das, R., & Baker, D. (2008). Macromolecular modeling with Rosetta. Annual Review of Biochemistry, 77, 363-382.