

Protein & Peptide Letters, 2012, 19, 725-731

1875-5305/12 $58.00+.00 © 2012 Bentham Science Publishers

Predicting Protein Crystallizability and Nucleation

Nuria Sánchez-Puig 1, Claude Sauter 2, Bernard Lorber 2, Richard Giegé 2 and Abel Moreno 1,*

1 Instituto de Química, Universidad Nacional Autónoma de México, Circuito Exterior, C.U. México, D.F. 04510 (México); 2 Architecture et Réactivité de l’ARN, Université de Strasbourg, CNRS, IBMC, UPR9002, 15 rue René Descartes 67000 Strasbourg (France)

Abstract: The outcome of protein crystallization attempts is often uncertain due to inherent features of the protein or of the crystallization process that are not fully under the control of the experimentalist. The aim of this contribution is to propose user-friendly tools that can increase the success rate of a protein crystallization project. Different bioinformatic approaches to predict the crystallization feasibility (before any crystallization attempts are undertaken) are discussed and a novel approach to assess the nucleation process of a given protein is proposed. Practical examples illustrate these two points.

Keywords: Proteins, crystallization feasibility, crystal growth, X-ray diffraction, nucleation, surface energy.

1. INTRODUCTION

X-ray crystallography is a powerful technique for determining the 3D structure of proteins. However, as obvious as it seems, it requires the production of protein crystals of suitable quality. Despite the existence of a huge variety of crystallization technologies and access to high-throughput screening systems, statistics from the various structural genomics programs indicate that only ~15% of expressed proteins yield diffracting crystals. This represents a very low success rate considering the cumulative difficulties of cloning, expressing and purifying the proteins [1]. The reasons why some proteins do not crystallize are difficult to identify, but most likely relate to the intrinsic physico-chemical properties of the protein [2]. Therefore, it is useful to have user-friendly tools that allow the experimenter to select a priori protein targets likely to crystallize, or to identify problematic proteins. Proteins recalcitrant to crystallization may be highly flexible [3] or even completely unstructured, or, despite being well folded, may fail to nucleate properly for many reasons, such as a propensity to aggregate as an amorphous phase or difficulty in forming stable crystal contacts [4]. Thus, getting good crystals can be very tricky and often needs a combination of protein engineering, the use of sophisticated crystallization techniques and a good understanding of the nucleation and crystal growth process.

This paper sketches different tools that may help to increase the success rate of a crystallization project. The first part covers methods useful for crystallization prediction before any crystallization attempts are undertaken. The second part presents a new way to predict the nucleation output together with some initial practical applications. Finally, considerations on the importance of impurities in protein crystal growth are presented.

*Address correspondence to this author at the Instituto de Química, UNAM.

Circuito Exterior, C.U. México, D.F. 04510 Mexico (Mexico); Tel: +52-55-56224467; Fax: +52-55-56162217; E-mail: [email protected]

2. PREDICTING PROTEIN CRYSTALLIZABILITY

Protein biochemistry has been dominated by the idea that a well-defined structure is a pre-requisite for function. However, this structure-function paradigm has been re-evaluated based on the growing evidence that many proteins contain disordered domains and that polypeptide chains can exhibit considerable flexibility or be completely unfolded in their functional states [5]. For a long time, the structural characterization of such proteins has been hindered by their highly dynamic conformations. To overcome this bottleneck, crystallization data from the Structural Genomics consortia have been used to create crystallizability predictors that require only the protein sequence as input [6-14]. Depending on the program, the output can be as simple as the phrase “crystallizable protein” or “non-crystallizable protein”, as in the case of the OB-Score program [10]. Programs such as XtalPred [13] give the output as a number between 1 and 5, the highest value standing for proteins "very difficult" to crystallize and the lowest for proteins expected to be "optimal" to crystallize. These algorithms analyze protein characteristics such as sequence length, isoelectric point, average hydropathy [15], instability index [16] and the existence of low complexity regions, signal peptides [17], trans-membrane helices [18] and coiled-coil regions [19]. Comparative analyses suggest that the programs XtalPred [13], CRYSTALP2 [8], ParCrys [11] and OB-Score [10] have a similar prediction accuracy of ~70%. A majority-vote based combination of these methods only improved the prediction success rate to 74% [20]. The program MetaPPCP is a linear model tree-based meta-predictor that exploits the complementarity of the state-of-the-art protein crystallization propensity predictors and claims to improve the accuracy of predictions to over 80% [9].
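As an illustration of the majority-vote idea mentioned above, the following minimal Python sketch combines binary calls from several predictors. It does not query any real server and is not the procedure used in [20]: the function names, the tie-breaking rule and the cutoff used to turn a 1 to 5 class into a yes/no call are all assumptions made for this example, and the input values are invented.

```python
from collections import Counter


def class_to_call(crystallizability_class, cutoff=3):
    """Map a 1 (optimal) .. 5 (very difficult) class to a binary call.

    The cutoff is a hypothetical choice for this sketch, not a published rule.
    """
    if not 1 <= crystallizability_class <= 5:
        raise ValueError("class must be between 1 and 5")
    return crystallizability_class <= cutoff


def majority_vote(predictions):
    """Combine binary crystallizability calls by simple majority.

    `predictions` maps a predictor name to True (crystallizable) or False.
    Ties are resolved conservatively as False (an assumption of this sketch).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    votes = Counter(bool(call) for call in predictions.values())
    return votes[True] > votes[False]


if __name__ == "__main__":
    # Invented example calls; real values would come from each program's output.
    calls = {
        "XtalPred": class_to_call(2),
        "CRYSTALP2": True,
        "ParCrys": False,
        "OB-Score": True,
    }
    print("consensus: crystallizable" if majority_vote(calls) else "consensus: difficult")
```

A vote of this kind can only be as good as its least correlated members, which may be why the reported gain over the individual predictors is modest.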

Besides analyzing the crystallization propensity, it is also important to identify whether the protein of interest is intrinsically unstructured, since this may also hinder its chances of crystallizing. Several programs are useful for predicting the propensity of regions longer than 40 consecutive residues to be disordered [21-31]. The most widely used disorder predictors are those of the PONDR® family. The output of these predictors is a position-dependent score, which varies between 0 and 1, with a minimum threshold of 0.5 for a residue to be considered disordered. In addition, analysis of both the amino acid composition and the sequence of the protein of interest can also hint at a protein being intrinsically unstructured. Statistical analysis has shown that intrinsically disordered proteins, or segments thereof, consist of low complexity amino acid sequences. They commonly contain high levels of the amino acids R, K, E, P and S, denoted as “disorder promoting residues”, and low levels of the amino acids C, W, Y, I and V, considered to be “order promoting residues” [32]. The sequence of a protein can be used to calculate parameters such as the mean net charge <R> and the mean hydropathy <H>. A combination of low mean hydropathy and relatively high mean net charge is a determinant for the absence of compact structure in proteins [33, 34]. The mean net charge <R> is defined as the absolute value of the difference between the numbers of positively and negatively charged residues divided by the total number of residues. The mean hydropathy <H> is the sum of the normalized hydrophobicities of the individual residues divided by the total number of amino acids present in the protein. Individual hydrophobicities are calculated as described in [15]. Disordered proteins display characteristic <R> and <H> values that can be compared to those described in [34]. Analyses of this kind have proven accurate at predicting disordered proteins [35, 36]. A protein that fulfills all the criteria mentioned above very likely belongs to the class of intrinsically disordered proteins. Although very difficult, crystallizing an intrinsically unstructured protein may be achieved in complex with an endogenous partner. However, expression and purification of such partners are not always possible, nor do all proteins have known binding partners. Furthermore, even when the binding partners are known, it is important to get the correct boundaries of the target protein such that any flexible regions that do not participate in the interaction are removed. A good example of such a strategy concerns the crystallization of the C-terminal transactivation domain of HIF-1α bound to the Taz1 domain of CBP and to FIH [37, 38].
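The definitions of <R> and <H> given above translate directly into a few lines of code. The Python sketch below is a minimal illustration under stated assumptions: hydropathies are the Kyte-Doolittle values [15] rescaled linearly to the 0 to 1 range, K and R are counted as positive and D and E as negative (histidine is ignored), and the dividing line <R> = 2.785 <H> − 1.151 is the charge-hydropathy boundary commonly quoted from [34]. The function names are hypothetical, and the test sequences are artificial.

```python
# Kyte-Doolittle hydropathy scale [15]; values range from -4.5 (R) to 4.5 (I).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _clean(sequence):
    """Upper-case the sequence and reject empty input or unknown residues."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return seq


def mean_hydropathy(sequence):
    """<H>: mean of per-residue hydropathies rescaled to the 0..1 range."""
    seq = _clean(sequence)
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(sequence):
    """<R>: |n(K, R) - n(D, E)| divided by the sequence length."""
    seq = _clean(sequence)
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def looks_disordered(sequence):
    """True if the sequence falls on the 'natively unfolded' side of the
    charge-hydropathy boundary <R> = 2.785 <H> - 1.151 (assumed from [34])."""
    return mean_net_charge(sequence) > 2.785 * mean_hydropathy(sequence) - 1.151


if __name__ == "__main__":
    for name, seq in {"charged repeat": "KKKKKKKK", "hydrophobic repeat": "AILVAILV"}.items():
        print(name, round(mean_net_charge(seq), 3), round(mean_hydropathy(seq), 3),
              looks_disordered(seq))
```

Such a whole-sequence test says nothing about where disorder is located; it is best read alongside a position-dependent predictor score rather than as a replacement for it.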

In summary, different bioinformatics tools are available that can help scientists decide whether a target protein is suitable for crystallization. This knowledge can, in turn, lead to redefining goals and establishing what type of project is worth pursuing. This is of practical and strategic importance, since working with a protein whose predicted crystallization propensity is low may mean a long and hard process before the structure is obtained, and may not be worth enduring. A decision tree summarizing the purpose of this section is shown in Figure 1. Finally, strategies to engineer the target protein can be found in Ruggiero et al., in this special issue.

3. PREDICTING PROTEIN NUCLEATION

3.1. Background

Understanding protein crystal nucleation and crystal growth is important to successfully crystallize a protein. Nucleation is the first step in any crystallization process and deals with nucleus formation, nuclei size distribution and nucleus growth rate until nuclei reach a detectable size. Typically, nucleation involves processes occurring simultaneously on different time scales, namely 0.01 ns for molecular conformations, ns for surface structure and defect displacements [39], μs for surface step displacement [40], ms for growth of one atomic layer [41], seconds for hydrodynamic transport, and minutes for the homogeneous nucleation phenomena [42]. There are several methods to investigate the chemical and physical interactions between the different molecules present during crystal growth. For the nucleation step, only a few techniques allow optimal resolution in terms of time scale, quality of the imaging measurement and visualization of the process. In general, dynamic (DLS) and static light scattering (SLS) methods are useful to analyze features such as homogeneity, nucleation, protein-protein interactions and pre-crystallization conditions [43].

Figure 1. Schematic view of protein primary structure analysis to predict the crystallization feasibility of a protein and possible decision routes to take according to the results obtained.

Here we present a method to predict the nucleation behavior of a protein based on the knowledge of hydrodynamic parameters obtained by light scattering and of crystallogenesis parameters obtained under pre-crystallization conditions.

3.2. How to Relate Nucleation and Hydrodynamic Parameters

Following the classical theory of crystallization in solution and the energetics of crystal nucleation, in particular that described by Kashchiev et al. [44] for the calculation of W(n) (i.e. the work necessary to assemble n protein molecules into an n-sized cluster) and the revised Stefan-Skapski-Turnbull formula [44] for the estimation of the specific surface energy of the cluster/solution interface, one can express the Gibbs free energy ($\Delta G$) of nucleation as a function of the radius of the n-sized protein cluster as follows:

$$\Delta G = -\frac{(4/3)\,\pi r^{3}}{\Omega}\,kT\ln\beta + 4\pi r^{2}\gamma$$

where $r$ is the radius of the protein or protein cluster, $\Omega$ is the molar volume occupied by a protein unit in the crystal, $k$ is the Boltzmann constant, $T$ is the absolute temperature, $\beta$ is the supersaturation (the ratio between the protein concentration $C$ in mg mL$^{-1}$ and the solubility $C_e$, which is the protein concentration remaining in solution in equilibrium with the crystals), and $\gamma$ is the specific surface energy of the cluster/solution interface. Details on the establishment of the $\Delta G = f(r)$ formula will be published elsewhere.

The plot of $\Delta G$ versus $r$ shows a maximum of $\Delta G$ representing the critical Gibbs free energy ($\Delta G^{*}$) of the nucleation phenomenon and defines the critical radius ($r^{*}$) of the protein cluster, i.e. the nucleus from which a crystal will grow (Figure 2). The $\Delta G = f(r)$ plot can be computed for any protein (or macromolecular assembly) provided experimental values of $\Omega$, $\beta$ and $\gamma$ are available. They can be derived from dynamic light scattering (DLS), crystallographic and solubility data.

Experimental values of $\Omega$ are typically derived from crystallographic data, but could also be roughly estimated from the hydrodynamic radius of the protein obtained by DLS measurements. Experimental estimates of $\gamma$ require knowledge of the solubility or supersaturation of the protein at the tested condition. This can be done via the revised Stefan-Skapski-Turnbull formula [44]:

$$\gamma = B\,\frac{-kT\ln\left(C_e v_0\right)}{v_0^{2/3}}$$

where $B$ is a numerical factor (ranging from 0.2 to 0.6, its value being ~0.514 for spherical clusters) and $v_0$ is the specific volume of the protein (estimated from $v_0 = M_r / (\rho N_A)$, where $\rho$ is the density of the crystal (g cm$^{-3}$), $N_A$ is the Avogadro number and $M_r$ is the molecular mass of the protein (g mol$^{-1}$)), or alternatively via the equation established previously in [45]:

$$\frac{2\gamma v_0}{r} = kT\ln\beta + A_2\,kT\,M_r\,C_e\,(\beta - 1)$$

where both $A_2$, the osmotic second virial coefficient (in mol cm$^{3}$ g$^{-2}$), and $r$, the radius of the protein, can be derived from DLS measurements. To avoid confusion, the supersaturation $\beta$ is taken here as the ratio between the protein concentration $C$ (in mg mL$^{-1}$) and $C_e$ (in mg mL$^{-1}$).
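To make the shape of the ΔG = f(r) curve concrete, the following Python sketch evaluates ΔG = −[(4/3)πr³/Ω] kT ln β + 4πr²γ and locates its maximum. Setting dΔG/dr = 0 gives r* = 2γΩ/(kT ln β) and ΔG* = (4/3)πr*²γ; these closed forms follow from the expression itself and are not taken from the authors' unpublished derivation. The symbols Ω (volume per protein unit in the crystal), β (supersaturation C/Ce), γ (specific surface energy) and v0 (specific volume), the function names and all numerical inputs are assumptions chosen for illustration; energies are per cluster (J), lengths in cm.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J K^-1
N_A = 6.02214076e23     # Avogadro number, mol^-1


def delta_g(r_cm, omega_cm3, beta, gamma_j_cm2, temp_k=293.15):
    """Gibbs free energy (J) of a spherical cluster of radius r_cm."""
    bulk = -(4.0 / 3.0) * math.pi * r_cm**3 / omega_cm3 * K_B * temp_k * math.log(beta)
    surface = 4.0 * math.pi * r_cm**2 * gamma_j_cm2
    return bulk + surface


def critical_radius(omega_cm3, beta, gamma_j_cm2, temp_k=293.15):
    """r* (cm) from d(delta_g)/dr = 0; only defined for beta > 1."""
    if beta <= 1.0:
        raise ValueError("no nucleation barrier maximum unless beta > 1")
    return 2.0 * gamma_j_cm2 * omega_cm3 / (K_B * temp_k * math.log(beta))


def critical_barrier(omega_cm3, beta, gamma_j_cm2, temp_k=293.15):
    """Delta G* (J per cluster) evaluated at r*."""
    r_star = critical_radius(omega_cm3, beta, gamma_j_cm2, temp_k)
    return (4.0 / 3.0) * math.pi * r_star**2 * gamma_j_cm2


def specific_volume(mr_g_mol, density_g_cm3):
    """v0 = Mr / (rho * N_A), in cm^3 per molecule."""
    return mr_g_mol / (density_g_cm3 * N_A)


if __name__ == "__main__":
    # Illustrative inputs only (not measured values).
    omega, beta, gamma = 1.0e-19, 3.0, 1.0e-8
    r_star = critical_radius(omega, beta, gamma)
    print(f"r*  = {r_star * 1e7:.2f} nm")
    print(f"dG* = {critical_barrier(omega, beta, gamma):.3e} J per cluster")
    # Crystal density of 1.2 g cm^-3 is an assumed, typical order of magnitude.
    print(f"v0  = {specific_volume(14300, 1.2):.2e} cm^3")
```

Multiplying ΔG* by the Avogadro number converts it to a molar quantity. The sketch deliberately leaves out the estimation of γ, because the two formulas above require concentration units that must be chosen consistently with v0 and A2.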

3.3. A Few Test Case Predictions of r*

$\Delta G = f(r)$ plots have been computed for three test case proteins, namely hen egg white lysozyme ($M_r$ 14300), thaumatin ($M_r$ 22200) and apoferritin ($M_r$ 443000), based on in-house experimental data. Table 1 shows the main results, including the estimated critical nucleus radius ($r^{*}$) and the critical Gibbs free energy of nucleation ($\Delta G^{*}$), for the aforementioned proteins. Large variations appear when comparing the numerical values of these two crystallogenesis parameters for the three test-case proteins, with the largest values of both $r^{*}$ and $\Delta G^{*}$ for thaumatin. Interestingly, the value of 50 protein molecules forming the nucleus of lysozyme crystals compares well with data from the literature obtained by other means (20–44 protein units) [46, 47] but deviates significantly from other estimates where the size of the nuclei was either smaller [48] or much larger [49, 50]. Of course, predicted values are subject to the errors on the experimental data, since the input parameters (the volume $\Omega$, the supersaturation $\beta$ and the specific surface energy $\gamma$) are determined using biophysical methods. For instance, errors of 10% on these values (due to experimental conditions) lead to a maximal deviation of 20% on $r^{*}$ in some cases (see the article of Peter Vekilov published in this special issue on nucleation). Application to other proteins, together with a detailed description of the experimental procedures, a critical analysis of the results and a comparison with data from the literature obtained by other experimental approaches, will be published elsewhere.
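How strongly the predicted critical radius reacts to a 10% error in each input can be explored with a simple one-at-a-time perturbation. The Python sketch below assumes the closed form r* = 2γΩ/(kT ln β), obtained by maximizing the ΔG = f(r) expression of Section 3.2, and treats r* as a length rather than as a number of protein units. It is not the authors' error analysis: the function names and the input values are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J K^-1


def critical_radius(omega_cm3, beta, gamma_j_cm2, temp_k=293.15):
    """r* (cm) = 2 * gamma * Omega / (kT ln beta); requires beta > 1."""
    if beta <= 1.0:
        raise ValueError("beta must exceed 1")
    return 2.0 * gamma_j_cm2 * omega_cm3 / (K_B * temp_k * math.log(beta))


def one_at_a_time_sensitivity(omega_cm3, beta, gamma_j_cm2, rel_err=0.10, temp_k=293.15):
    """Largest relative change in r* when each input alone is shifted by +/- rel_err."""
    nominal = {"omega": omega_cm3, "beta": beta, "gamma": gamma_j_cm2}
    base = critical_radius(omega_cm3, beta, gamma_j_cm2, temp_k)
    worst = {}
    for name in nominal:
        worst[name] = 0.0
        for sign in (-1.0, 1.0):
            trial = dict(nominal)
            trial[name] = nominal[name] * (1.0 + sign * rel_err)
            if trial["beta"] <= 1.0:
                continue  # perturbed supersaturation no longer supports nucleation
            r = critical_radius(trial["omega"], trial["beta"], trial["gamma"], temp_k)
            worst[name] = max(worst[name], abs(r - base) / base)
    return worst


if __name__ == "__main__":
    # Illustrative inputs only (not measured values).
    for name, dev in one_at_a_time_sensitivity(1.0e-19, 2.5, 1.0e-8).items():
        print(f"{name}: up to {100 * dev:.1f}% change in r*")
```

Because r* is linear in γ and Ω but depends on β through ln β, an error on the supersaturation weighs more heavily the closer β is to 1.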

4. CONCLUSIONS AND PERSPECTIVES

Purity is the first variable that is essential to obtain good crystals. It has long been established that macromolecular contaminants and micro-heterogeneities present within a protein batch poison the faces of growing crystals and alter the crystal packing. In order to understand the effects of these impurities on crystal growth and their correlation with the surface energy, it is necessary to estimate the value of the surface energy using any of the revised equations shown in this contribution. These calculations were applied to three model proteins (lysozyme, thaumatin and apoferritin) to test our approach. From these results we conclude that the lower the value of the surface energy, the higher the propensity of a protein to be poisoned in its crystal growth mechanism.

Figure 2. Gibbs energy as a function of number of protein units for (A) Lysozyme, (B) Thaumatin and (C) Ferritin.

Table 1. Physicochemical Crystallogenesis Parameters for Different Proteins

| Parameters | Hen egg white lysozyme | Thaumatin | Apoferritin |
|---|---|---|---|
| $\beta = C / C_e$ | 2.5 | 2.7 | 43 |
| $v_0$, cm$^{3}$ | 0.2 × 10$^{-19}$ | 0.3 × 10$^{-19}$ | 5.0 × 10$^{-19}$ |
| $\gamma$, J cm$^{-2}$ | 10 × 10$^{-8}$ | 20 × 10$^{-8}$ | 0.24 × 10$^{-8}$ |
| $\Omega$, cm$^{3}$ | 2.4 × 10$^{-19}$ | 5.3 × 10$^{-19}$ | 62 × 10$^{-19}$ |
| $r^{*}$ (in protein units) | 50 ± 5 | 220 ± 22 | 30 ± 3 |
| $\Delta G^{*}$, J mol$^{-1}$ | 36 × 10$^{3}$ | 1500 × 10$^{6}$ | 24 × 10$^{3}$ |

* means critical; $r^{*}$ values are given assuming an error of 10% on $\Omega$, $\beta$ and $\gamma$.

Concerning crystal growth, there is no unique recipe to crystallize soluble or non-soluble proteins; most of them need special and particular care based on multifactorial elements (precipitants, temperatures, pH values, transport properties, etc.). The propensity of any protein to crystallize depends on the nature of the protein, but it can be estimated using some of the tools presented here. However, these aspects are not usually taken into account, and academic labs seldom predict the crystallization propensity or analyze their target protein before starting a difficult structural biology project. The strategy shown in Figure 1 should be followed as a normal procedure before using kits to search for the crystallization conditions.

Finally, after having understood the physicochemical crystallization behavior of the target protein, or having analyzed its propensity to crystallize, it is possible to move on to crystallization trials. Figure 3 shows a schematic view of the de novo protein crystallization strategy to be followed. It is worth mentioning that temperature (which should be kept constant) and the presence of impurities also play an important role while searching for the crystallization conditions. All these ideas and procedures should be taken into account when trying to crystallize a new protein. Choosing the correct strategies and having a good understanding of the protein of interest may provide experimenters with higher chances of crystallization success in a shorter time.

ACKNOWLEDGEMENTS

The authors thank the Université de Strasbourg, the French CNRS and the Agence Nationale de la Recherche (ANR-09-BLAN-0091-03) for support. A.M. acknowledges DGAPA-UNAM project PAPIIT No. IN201811 for financial support for this research and CNRS for sponsorship during his sabbatical visit in Strasbourg. N. S-P acknowledges the financial support from UNAM-DGAPA project PAPIIT No. IN204010.

Figure 3. Schematic strategies for de novo protein crystallization.

REFERENCES

[1] Fox, B.G.; Goulding, C.; Malkowski, M.G.; Stewart, L.; Deacon, A. Structural genomics: from genes to structures with valuable materials and many questions in between. Nat. Methods, 2008, 5(2), 129-132.

[2] Dale, G.E.; Oefner, C.; D'Arcy, A. The protein as a variable in protein crystallization. J. Struct. Biol., 2003, 142(1), 88-97.

[3] Malawski, G.A.; Hillig, R.C.; Monteclaro, F.; Eberspaecher, U.; Schmitz, A.A.; Crusius, K.; Huber, M.; Egner, U.; Donner, P.; Muller-Tiemann, B. Identifying protein construct variants with increased crystallization propensity: a case study. Protein Sci., 2006, 15(12), 2718-2728.

[4] Moon, A.F.; Mueller, G.A.; Zhong, X.; Pedersen, L.C. A synergistic approach to protein crystallization: combination of a fixed-arm carrier with surface entropy reduction. Protein Sci., 2010, 19(5), 901-913.

[5] Dyson, H.J.; Wright, P.E. Intrinsically unstructured proteins and their functions. Nat. Rev. Mol. Cell. Biol., 2005, 6(3), 197-208.

[6] Chen, K.; Kurgan, L.; Rahbari, M. Prediction of protein crystallization using collocation of amino acid pairs. Biochem. Biophys. Res. Commun., 2007, 355(3), 764-769.

[7] Kandaswamy, K.K.; Pugalenthi, G.; Suganthan, P.N.; Gangal, R. SVMCRYS: an SVM approach for the prediction of protein crystallization propensity from protein sequence. Protein Pept. Lett., 2010, 17(4), 423-430.

[8] Kurgan, L.; Razib, A.A.; Aghakhani, S.; Dick, S.; Mizianty, M.; Jahandideh, S. CRYSTALP2: sequence-based protein crystallization propensity prediction. BMC Struct. Biol., 2009, 9, 50.

[9] Mizianty, M.J.; Kurgan, L. Meta prediction of protein crystallization propensity. Biochem. Biophys. Res. Commun., 2009, 390(1), 10-15.

[10] Overton, I.M.; Barton, G.J. A normalised scale for structural genomics target ranking: the OB-Score. FEBS Lett., 2006, 580(16), 4005-4009.

[11] Overton, I.M.; Padovani, G.; Girolami, M.A.; Barton, G.J. ParCrys: a Parzen window density estimation approach to protein crystallization propensity prediction. Bioinformatics, 2008, 24(7), 901-907.

[12] Overton, I.M.; van Niekerk, C.A.; Barton, G.J. XANNpred: neural nets that predict the propensity of a protein to yield diffraction-quality crystals. Proteins, 2011, 79(4), 1027-1033.

[13] Slabinski, L.; Jaroszewski, L.; Rychlewski, L.; Wilson, I.A.; Lesley, S.A.; Godzik, A. XtalPred: a web server for prediction of protein crystallizability. Bioinformatics, 2007, 23(24), 3403-3405.

[14] Smialowski, P.; Schmidt, T.; Cox, J.; Kirschner, A.; Frishman, D. Will my protein crystallize? A sequence-based predictor. Proteins, 2006, 62(2), 343-355.

[15] Kyte, J.; Doolittle, R.F. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol., 1982, 157(1), 105-132.

[16] Guruprasad, K.; Reddy, B.V.; Pandit, M.W. Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Eng., 1990, 4(2), 155-161.

[17] Plewczynski, D.; Slabinski, L.; Tkacz, A.; Kajan, L.; Holm, L.; Ginalski, K.; Rychlewski, L. The RPSP: Web server for prediction of signal peptides. Polymer, 2007, 48(19), 5493-5496.

[18] Krogh, A.; Larsson, B.; von Heijne, G.; Sonnhammer, E.L. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol., 2001, 305(3), 567-580.

[19] Lupas, A.; Van Dyke, M.; Stock, J. Predicting coiled coils from protein sequences. Science, 1991, 252(5010), 1162-1164.

[20] Kurgan, L.; Mizianty, M. Sequence-based protein crystallization propensity prediction for structural genomics: review and comparative analysis. Nat. Sci., 2009, 1, 93-106.

[21] Dosztanyi, Z.; Csizmok, V.; Tompa, P.; Simon, I. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics, 2005, 21(16), 3433-3434.

[22] Garbuzynskiy, S.O.; Lobanov, M.Y.; Galzitskaya, O.V. To be folded or to be unfolded? Protein Sci., 2004, 13(11), 2871-2877.

[23] Linding, R.; Jensen, L.J.; Diella, F.; Bork, P.; Gibson, T.J.; Russell, R.B. Protein disorder prediction: implications for structural proteomics. Structure, 2003, 11(11), 1453-1459.

[24] Linding, R.; Russell, R.B.; Neduva, V.; Gibson, T.J. GlobPlot: Exploring protein sequences for globularity and disorder. Nucleic Acids Res., 2003, 31(13), 3701-3708.

[25] Obradovic, Z.; Peng, K.; Vucetic, S.; Radivojac, P.; Brown, C.J.; Dunker, A.K. Predicting intrinsic disorder from amino acid sequence. Proteins, 2003, 53 Suppl 6, 566-572.

[26] Prilusky, J.; Felder, C.E.; Zeev-Ben-Mordehai, T.; Rydberg, E.H.; Man, O.; Beckmann, J.S.; Silman, I.; Sussman, J.L. FoldIndex: a simple tool to predict whether a given protein sequence is intrinsically unfolded. Bioinformatics, 2005, 21(16), 3435-3438.

[27] Romero, P.; Obradovic, Z.; Li, X.; Garner, E.C.; Brown, C.J.; Dunker, A.K. Sequence complexity of disordered protein. Proteins, 2001, 42(1), 38-48.

[28] Vullo, A.; Bortolami, O.; Pollastri, G.; Tosatto, S.C. Spritz: a server for the prediction of intrinsically disordered regions in protein sequences using kernel machines. Nucleic Acids Res., 2006, 34, W164-168.

[29] Ward, J.J.; Sodhi, J.S.; McGuffin, L.J.; Buxton, B.F.; Jones, D.T. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol., 2004, 337(3), 635-645.

[30] Xue, B.; Dunbrack, R.L.; Williams, R.W.; Dunker, A.K.; Uversky, V.N. PONDR-FIT: a meta-predictor of intrinsically disordered amino acids. Biochim. Biophys. Acta, 2010, 1804(4), 996-1010.

[31] Yang, Z.R.; Thomson, R.; McNeil, P.; Esnouf, R.M. RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins. Bioinformatics, 2005, 21(16), 3369-3376.

[32] Williams, R.M.; Obradovic, Z.; Mathura, V.; Braun, W.; Garner, E.C.; Young, J.; Takayama, S.; Brown, C.J.; Dunker, A.K. The protein non-folding problem: amino acid determinants of intrinsic order and disorder. Pac. Symp. Biocomput., 2001, 89-100.

[33] Uversky, V.N. Cracking the folding code. Why do some proteins adopt partially folded conformations, whereas other don't? FEBS Lett., 2002, 514(2-3), 181-183.

[34] Uversky, V.N.; Gillespie, J.R.; Fink, A.L. Why are "natively unfolded" proteins unstructured under physiologic conditions? Proteins, 2000, 41(3), 415-427.

[35] Brocca, S.; Samalikova, M.; Uversky, V.N.; Lotti, M.; Vanoni, M.; Alberghina, L.; Grandori, R. Order propensity of an intrinsically disordered protein, the cyclin-dependent-kinase inhibitor Sic1. Proteins, 2009, 76(3), 731-746.

[36] Sanchez-Puig, N.; Veprintsev, D.B.; Fersht, A.R. Binding of natively unfolded HIF-1alpha ODD domain to p53. Mol. Cell, 2005, 17(1), 11-21.

[37] Dames, S.A.; Martinez-Yamout, M.; De Guzman, R.N.; Dyson, H.J.; Wright, P.E. Structural basis for Hif-1 alpha/CBP recognition in the cellular hypoxic response. Proc. Natl. Acad. Sci. USA, 2002, 99(8), 5271-5276.

[38] Hewitson, K.S.; McNeill, L.A.; Riordan, M.V.; Tian, Y.M.; Bullock, A.N.; Welford, R.W.; Elkins, J.M.; Oldham, N.J.; Bhattacharya, S.; Gleadle, J.M.; Ratcliffe, P.J.; Pugh, C.W.; Schofield, C.J. Hypoxia-inducible factor (HIF) asparagine hydroxylase is identical to factor inhibiting HIF (FIH) and is related to the cupin structural family. J. Biol. Chem., 2002, 277(29), 26351-26355.

[39] De Sancho, D.; Best, R.B. What is the time scale for alpha-helix nucleation? J. Am. Chem. Soc., 2011, 133(17), 6809-6816.

[40] Vekilov, P.G. What determines the rate of growth of crystals from solution? Cryst. Growth Des., 2007, 7(12), 2796-2810.

[41] Durbin, S.D.; Feher, G. Studies of crystal growth mechanisms of proteins by electron microscopy. J. Mol. Biol., 1990, 212(4), 763-774.

[42] DeMattei, R.C.; Feigelson, R.S. Controlling Nucleation In Protein Solutions. J. Cryst. Growth, 1992, 122(1-4), 21-30.

[43] George, A.; Wilson, W.W. Predicting protein crystallization from a dilute solution property. Acta Crystallogr. Sect D Biol. Crystallogr., 1994, 50(Pt 4), 361-365.

[44] Kashchiev, D.; van Rosmalen, G.M. Review: Nucleation in solutions revisited. Cryst. Res. Technol., 2003, 38(7-8), 555-574.

[45] Juárez-Martínez, G.; Garza, C.; Castillo, R.; Moreno, A. A dynamic light scattering investigation of the nucleation and growth of thaumatin crystals. J. Cryst. Growth, 2001, 232, 119-131.

[46] Baird, J.K.; Scott, S.C.; Kim, Y.W. Theory of the effect of pH and ionic strength on the nucleation of protein crystals. J. Cryst. Growth, 2001, 232(1-4), 50-62.

[47] Michinomae, M.; Mochizuki, M.; Ataka, M. Electron microscopic studies on the initial process of lysozyme crystal growth. J. Cryst. Growth, 1999, 197(1-2), 257-262.

[48] Tanaka, S.; Ito, K.; Hayakawa, R.; Ataka, M. Size and number density of precrystalline aggregates in lysozyme crystallization process. J. Chem. Phys., 1999, 111(22), 10330-10337.

[49] Georgalis, Y.; Schuler, J.; Frank, J.; Soumpasis, M.D.; Saenger, W. Protein crystallization screening through scattering techniques. Adv. Colloid Interface Sci., 1995, 58(1), 57-86.

[50] Niimura, N.; Minezaki, Y.; Ataka, M.; Katsura, T. Aggregation in supersaturated lysozyme solutions studied by time-resolved small-angle neutron scattering. J. Cryst. Growth, 1995, 154(1-2), 136-144.

Received: August 8, 2011 Revised: August 29, 2011 Accepted: February 11, 2012