Upload
case
View
1
Download
0
Embed Size (px)
Citation preview
Efficient protein crystallization
Lawrence J. DeLucas,a,* Terry L. Bray,a Lisa Nagy,a Debbie McCombs,a
Nikolai Chernov,b David Hamrick,c Larry Cosenza,c Alexander Belgovskiy,d Brad Stoops,d
and Arnon Chaitd
a Center for Biophysical Sciences and Engineering, University of Alabama at Birmingham, Birmingham, AL 35294-4400, USAb Natural Sciences and Mathematics, University of Alabama at Birmingham, Birmingham, AL 35294-4400, USA
c Diversified Scientific, Inc., Birmingham, AL, USAd ANALIZA, Inc., Bay Village, OH, USA
Received 5 February 2003
Abstract
High-throughput molecular biology and crystallography advances have placed an increasing demand on crystallization, the one
remaining bottleneck in macromolecular crystallography. This paper describes three experimental approaches, an incomplete fac-
torial crystallization screen, a high-throughput nanoliter crystallization system, and the use of a neural net to predict crystallization
conditions via a small sample (�0.1%) of screening results. The use of these technologies has the potential to reduce time and samplerequirements. Initial experimental results indicate that the incomplete factorial design detects initial crystallization conditions not
previously discovered using commercial screens. This may be due to the ability of the incomplete factorial screen to sample a
broader portion of ‘‘crystallization space,’’ using a multidimensional set of components, concentrations, and physical conditions.
The incomplete factorial screen is complemented by a neural network program used to model crystallization. This capability is used
to help predict new crystallization conditions. An automated, nanoliter crystallization system, with a throughput of up to 400
conditions/h in 40-nl droplets (total volume), accommodates microbatch or traditional ‘‘sitting-drop’’ vapor diffusion experiments.
The goal of this research is to develop a fully-automated high-throughput crystallization system that integrates incomplete factorial
screen and neural net capabilities.
� 2003 Elsevier Science (USA). All rights reserved.
Keywords: High-throughput nanocrystallization; Incomplete factorial screen; Neural-net
1. Introduction
The promise of high-throughput structural genomics
(HTX) as an enabling technology for rapid drug dis-
covery is becoming a reality. Advances in experimentaland computational technologies have made it possible
for pharmaceutical and drug discovery companies to
apply HTX to the multitude of new targets available
from high-throughput genomics efforts. Successful HTX
application will require flexible systems that allow for
efficient development and usage of experimental data for
iterative optimization.
A number of techniques have been employed for de-
termining protein structures, including X-ray crystal-
lography, nuclear magnetic resonance spectroscopy, and
mass spectrometry. Of these, X-ray crystallography re-
mains the only method routinely used to determinestructures of large biomolecules (i.e., MW in excess of
20 000Da). The elucidation of complete genome se-
quences for a large number of vertebrate and inverte-
brate species (Roses, 2002) has accelerated international
efforts to develop high-throughput methods/technologies
that enable rapid three-dimensional protein structure
determination (Kuhn et al., 2002; Lamzin and Perrakis,
2000). The National Institutes of Health (NIH) estab-lished a structural genomics program with the goal of
‘‘encouraging research on the development of method-
ology and technology underpinning the emerging field of
Journal of Structural Biology 142 (2003) 188–206
www.elsevier.com/locate/yjsbi
Journal of
StructuralBiology
* Corresponding author. Fax: +205-934-2659.
E-mail address: [email protected] (L.J. DeLucas).
1047-8477/03/$ - see front matter � 2003 Elsevier Science (USA). All rights reserved.
doi:10.1016/S1047-8477(03)00050-9
structural genomics, whose goal is the understanding ofprotein structural families, structural folds, and the re-
lation of structure and function.’’ One such program
(P50-GM62407), the ‘‘Southeastern Collaboratory for
Structural Genomics’’ (SECSG), involves researchers
from a consortium of universities, including the Uni-
versity of Georgia (UGA; P.I. B.C. Wang), Georgia
State University, the University of Alabama at Bir-
mingham (UAB), the University of Alabama in Hunts-ville, and Duke University. The complete genomes from
Caenorhabditis elegans and Pyrococcus furiosis, plus se-
lected genes from the human genome, were chosen for
this collaborative program. At UAB, the Center for
Biophysical Sciences and Engineering (CBSE) has been
involved in fundamental studies of macromolecular
crystal growth since 1985. The CBSE previously devel-
oped an automated crystallization system that providesreal-time control of pre- and postnucleation vapor
equilibration kinetics, in an effort to enhance crystal size
and diffraction resolution (Bray et al., 1997, 1998; Col-
lingsworth et al., 2000). More recently, the CBSE and
two collaborating companies, ANALIZA, Inc., and Di-
versified Scientific, Inc., have developed novel technol-
ogies that have the potential to increase the efficiency of
protein crystal screening and optimization. Screening forpotential crystallization conditions often requires large
quantities (25–100mg) of purified protein. Unfortu-
nately, even with the recent advances in cloning and
protein expression, many proteins can be produced in
only submilligram or low-milligram quantities without
enormous expenditures of resources to produce larger
quantities. The recent efforts in genomics, and subse-
quently proteomics, have produced thousands of newproteins for study in structural biology and drug design
projects. The number of new proteins available will
continue to increase significantly in the next several years
as additional investigators become involved in this im-
portant research. As a result, there is a significant need
for more efficient and effective methods for determining
protein structures. Since X-ray crystallography will likely
remain the primary technique for determining structuresof large proteins for the foreseeable future, the limita-
tions associated with this method must be overcome.
There are two primary approaches to solving the limi-
tations of X-ray crystallography: (1) develop more ef-
fective methods for large-scale protein production and
(2) reduce the scale and number of individual screening
experiments so that less protein is needed to perform
screening experiments.A holistic HTX approach is presented, combining
with a thorough understanding of the physical phe-
nomena underlying important problem areas and novel,
automated technology platforms that are modular, effi-
cient, and flexible.
A critical component of X-ray crystallography is ob-
taining well-ordered crystals of the target protein. This
effort traditionally requires screening thousands of so-lutions with varying chemical compositions. In an effort
to minimize the total amount of protein required to
discover suitable crystallization conditions, the CBSE,
ANALIZA, Inc., and Diversified Scientific, Inc., have
explored three different complementary approaches: an
incomplete factorial screen with response surface opti-
mization, a high-throughput nanoliter crystallization
robot, and a neural net software program capable ofusing initial screening results to predict future conditions
that are likely to yield crystals. The incomplete factorial
screen with response surface optimization allows a small
number of experiments to be performed that, sampling
the extremely large experiment space in a statistically
robust manner. This approach allows for efficient de-
termination of solution conditions suitable for crystal-
lizing proteins by performing experiments that take intoaccount the independent and interdependent influences
of each experimental parameter. Hits obtained in the
initial screening process are used to design an optimiza-
tion screen to further improve the initial crystallization
results. To enable a comprehensive search in a large
parameter space for optimal crystallization conditions
using submilligram protein quantities, a modular line of
high-throughput crystallization and inspection work-stations has been developed. The process begins with a
statistically based screen optimization that directs the
production of specialized libraries of crystallization
conditions. The libraries and the proteins are subse-
quently combined using a novel nanoliter-range crystal-
lization screening system (NanoScreenTM). An
automated intelligent high-resolution inspection system
(CrystalScoreTM) is then deployed to periodically exam-ine and classify the optimal starting conditions for the
subsequent scale-up crystallization experiments. This
system is currently being upgraded to include a neural
net crystallization prediction program.
The high-throughput nanoliter crystallization ro-
bot significantly reduces the scale of each experi-
ment, allowing the use of as little as 0.1 lg of protein
per screening experiment condition. The use of neuralnetwork software programs facilitates prediction of
probable crystallization conditions based on results
from a small number of experiments. This can fur-
ther improve the efficiency of protein crystallization
screening experiments by learning from prior experi-
mental results and predicting new conditions that
should produce crystals. The three technologies work
in tandem to facilitate highly efficient and effectivescreening of protein crystallization conditions. The
ultimate goal is to develop an automated system that
combines all three approaches, yielding a more effi-
cient and successful method for macromolecular
crystallization. The following provides our strategy
and preliminary investigations for each of these ap-
proaches.
L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206 189
2. Incomplete factorial screen
The crystallization of biological molecules is a pro-
cess in which large, complex molecules in solution as-
sociate with one another to form a regular lattice in the
solid phase. This process is accomplished by changing
the physical and chemical environment such that the
formation of this lattice is more favorable thermody-
namically than remaining in the solution state. Thethermodynamics can be rationalized in terms of several
factors.
The Gibb�s free energy, DG, is given by the equation
DG ¼ DH � TDS:
The energy released by crystallization is the difference
between the changes in enthalpic and entropic energy.
The energetic cost of order (negative entropy) increases
with temperature.
The macromolecule itself has a favorable (positive)
entropy in solution due to its translational and rotationaldisorder. It is surrounded by an ordered shell (or shells) of
water molecules that have negative entropy. The charged
macromolecular surface forms hydrogen bonds with the
water with favorable (negative) enthalpy. In the crystal-
line form, water is released as a portion of those hydrogen
bonds are sacrificed in favor of stronger intramolecular
hydrogen and ionic bonds. The order of the lattice de-
creases the entropy of the molecules. The exclusion of thewater from the lattice releases thewatermolecules into the
bulk solution, however, raising their entropy. In solution,
hydrophobic patches on the surface of themacromolecule
are also surrounded by (unfavorably) ordered water
molecules, but lack the favorable hydrogen bonds. In the
crystalline form, these hydrophobic patches are often
buried by intermolecular interactions. These energetic
conditions are summarized in Table 1.The conditions chosen to crystallize a macromole-
cule exploit and control these energetic differences. The
pH, for example, determines the charge on the mole-
cule, directly influencing the energetics of its interactionwith the bulk solvent. The addition of counterions
shields surface charges and changes the chemical po-
tential of the solvent. Polymeric alcohols sequester
water away from the macromolecule and may interact
with it as well. Certain other components, such as di-
valent cations and metals, may interact directly with
the macromolecules and moderate lattice contacts.
Although some macromolecules are crystallized solelyby temperature change, in most cases the buffers that
stabilize the solution and crystalline states have differ-
ent compositions. This is often accomplished either by
evaporation or by dialysis. The rate of change can be
controlled by the physical form of the experiment, such
as vapor diffusion, dialysis, or controlled evaporation.
A phase diagram representing a vapor diffusion ex-
periment is shown in Fig. 1. The zigzag line indicatesthe transition from soluble to crystalline macromole-
cule as the macromolecule and precipitant concentra-
tions increase.
Changes in the concentration of the macromolecule
due to water evaporation increase the tendency for
nucleation and crystal growth. Crystal growth itself
Table 1
Summary of molecular energetics that affect crystallization processes
In solution Crystal
Favorable Unfavorable Favorable Unfavorable
Macromolecule
entropy
Freedom of translation
and rotation
Packing order
Macromolecule
enthalpy
Hydrogen bonds to
water, ionic bonds to
counter ions
Ionic and hydrogen
bonds to lattice
neighbors
Loss of hydrogen bonds
to water shell
Solution entropy Ordered surface
water—especially
around hydrophobic
areas
Water released from shell
into bulk solvent
Solution enthalpy Hydrogen bonds to
macromolecule
Fig. 1. Crystallization phase diagram.
190 L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206
removes the macromolecule from the solution phase,decreasing its concentration.
A preliminary crystallization screen searches broadly
though ‘‘crystallization space,’’ a multidimensional set
of possible components, concentrations, and physical
conditions. The goal is good outcomes (i.e., leads to
diffraction-quality crystals). The experiment number
and type for the initial screen are chosen to reflect a
reasonable balance between thoroughness and cost anda reasonable expectation of where the desired results
may be found. For example, excessive trials will return
many repeated results from similar experiments, wast-
ing material, time, and manpower. An insufficient
number of experiments may miss a region that would
have ultimately led to crystals. It is also important to
avoid time- and sample-consuming general trials in
areas of chemical space that would rarely, if ever,produce crystals. These areas might be accessed later
by widening the search if initial trials fail to produce
leads.
There are many strategies available to search for
crystallization conditions. Commercial screens use
sparse matrix methods, in which the experiments are
clustered around conditions that have already given
crystals in the past. The advantage of this approach isthat when a protein is crystallized under one set of
conditions, it will often exhibit hits under other con-
ditions as well. The disadvantage is that some areas
of crystallization space are neglected. Random screens
(such as Crystool) sample these areas, but in both
cases it is difficult to glean information from the
collected results because the screens are not balanced.
With the assistance of Professor Charles Carter(University of North Carolina at Chapel Hill), CBSE
scientists have been developing a set of conditions to
construct an efficient screen. A general model is used
that incorporates factors whose levels are mathemati-
cally balanced. Each possible level of a factor is sam-
pled an equal number of times. In our 360-experiment
screen, each of the six anionic precipitants is sampled
60 times. Binary combinations are also balanced.Third-order and higher combinations are distributed as
randomly as possible. A statistical computer program,
INFAC (C.W. Carter, private communication), is used
to construct the balanced matrices that encode the ex-
periments. A spreadsheet program translates the ma-
trices into chemical recipes for our solution-handling
robots to construct. This balanced design facilitates the
determination of which factor levels are most suitablefor crystallization. For example, comparison of the
average score of all the experiments that contain chlo-
ride as an anionic precipitant to the overall average
(and the other anions) allows one to determine whether
chloride is the best choice for the anion. Some binary
combinations will show obvious synergy, such as the
combination of pH and anion choice. Anions show
varying effectiveness as precipitating agents as the netcharge on the protein changes. This is related to the
Hofmeister effect (Hofmeister, 1888). The temperature
can have a significant effect on the solubility of the
macromolecule, also affecting the solution properties
for other components. Several buffers have a tempera-
ture dependence on their pKa, a characteristic that may
be exploited. The pH affects the net charge on the
molecule and the charge state of the surface aminoacids. Above the protein isoelectric point (pI), the net
charge on the protein is negative and more of the basic
groups are neutral, rather than protonated. At the pI,
the net charge is zero and the protein is less soluble.
Below the pI, the net charge on the protein is positive
and more of the acidic groups are neutral.
The phase diagram (Fig. 1) reflects a general solu-
bility (Ksp) model, in which the nucleation point dependson both the concentrations of macromolecule and pre-
cipitant. At higher concentrations of macromolecule,
less precipitant is required for nucleation. At lower
concentrations of macromolecule, the reverse is true.
Conditions with higher concentrations of protein have
more protein available to form crystals, but may cause
overly rapid crystal growth rates, leading to flaws and
twinning. In a screen, the precipitant concentration isvaried relative to the macromolecule concentration. This
avoids areas of crystallization space that would never
give crystals.
Both organic precipitants, such as polymeric alco-
hols (polyethylene glycols), and ionic salts are useful
as precipitants. Many ionic salts are used as precip-
itants and have particular charges, polarizabilities, si-
zes, and solution activities. Many have significantbuffering capabilities as well. Using a combination of
organic and inorganic precipitants can balance their
various properties. Glycerol can stabilize the macro-
molecule in solution by specific interaction with the
surface and it can shift the nucleation point of the
macromolecule independent of other factors. Divalent
cations stabilize specific interactions between macro-
molecule monomers, often increasing the order of acrystal. Many proteins bind specific divalent metals,
and often the metals have an anomalous signal that
can be used for phasing. Additives such as detergents
and arginine can affect the macromolecule conforma-
tion, strengthen specific contacts, or reduce nonspecific
contacts.
2.1. Methods
Once a screen design has been devised and con-
structed, the experiments are conducted and the out-
comes are scored in a qualitative fashion. We use a
scale that includes scores for various crystalline, quasi-
crystalline, and noncrystalline results. From the out-
comes of the screen, a central starting point is defined
L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206 191
for a fine search of the physical and chemical param-eters. This optimization step is important because at the
optimum, small unavoidable statistical variations in
the experimental conditions have negligible effects on
the outcome. The experiment is more robust and re-
producible at the optimum than at the extremes. In the
case in which a trial has given crystalline material, it
may serve as the center point for a multidimensional
search. A starting point may also be inferred by astatistical treatment (least-squares and/or nonlinear
methods) of the variation of noncrystalline results. In
our optimization model, four or five numerical vari-
ables (protein concentration, pH, total amount of pre-
cipitant, relative amounts of organic and ionic
precipitant, and if used, concentration of glycerol or
polyol) are varied simultaneously to explore the vicinity
of the hit for improved results. The particular param-eters are defined to minimize their synergistic effects.
For example, the concentration of precipitant necessary
to crystallize a protein at 10mg/ml would be insufficient
to crystallize that same protein at 5mg/ml and would
precipitate the protein at 20mg/ml. For this reason, as
the concentration of the protein is varied in the opti-
mization, the relative rather than the absolute concen-
tration of precipitant is varied.A grid search at three levels in five dimensions
would entail 243 experiments. To reduce that number
to a manageable level without sacrificing too much
data, we have used the Design of Experiments–Re-
sponse Surface method (Carter, 1997). Many industrial
processes are optimized using this method. We use
previously published matrices (Carter, 1977) to define
the variation about the center point. These matrices arearrays of numbers from )1 to +1, with repetitions at
the center point. The desired spread of variation for
each dimension is multiplied by the matrix value and
added to the center point for that variable. The out-
comes of the experiments are scored using a size
measurement, such as the ratio of width to length. The
results of the experiments are then fed into the statis-
tical analysis package JMP (SAS Institute, Cary, NC,USA), which has built-in methods to construct and
analyze a model from response surface data. The
program fits the data to a multidimensional quadratic
model and either predicts an optimum within the
bounds of the experiments or suggests a direction in
which to locate a new center point. Full statistical
calculations are presented to allow the researcher to
assess the goodness of fit of the model. An iterativeapproach is then used to refine about the optimum,
leading to higher quality crystals.
A matrix to encode the variables for the screen was
calculated using the program INFAC. The screen size
was chosen to be 360 experiments with 10 variables at
six or three levels each. The most balanced matrix was
chosen from the 10 000 numerical seeds tested. Screen
variables were chosen according to previously publishedwork (Carter and Carter, 1979).
The matrix was translated into a set of recipes using a
spreadsheet program. The stock components were
mixed using the RecipeMaker robot, a custom-config-
ured Hamilton ML 4000 liquid-handling robot capable
of mixing up to 48 different stocks and water. Recipes
were prepared in 2-ml block plates and reformatted into
standard 384-well plates. For comparison, the proteinswere also subjected to a group of commercial screens
consisting of Hampton Crystal Screens I and II,
MembFac, Natrix, and Emerald Wizards I and II (290
experiments). Comparison of crystallization success
rates for incomplete and sparse matrix screens suggests
that the incomplete factorial method may be more ef-
fective. A set of five proteins (two from C. elegans and
three from P. furiosis), 2G1.1, 9c9, UGA 214, UGA 220,and UGA 222, was subjected to crystallization screening
using both incomplete factorial and sparse matrix
models. For this limited sample size, the incomplete
factorial method of screening identified a greater num-
ber of crystallization conditions for each protein. Each
of the five protein samples was crystallized using the
incomplete factorial method, whereas only two proteins,
2G1.1 and UGA 214, crystallized using the sparse ma-trix set. Interestingly, the incomplete factorial method
identified conditions that appear disparate compared to
the results from the sparse matrix screens.
2.2. Screening results
Table 2 summarizes the screening results for five
proteins (two from C. elegans and three from P. furiosis)from the SECSG program for the incomplete factorial
screen vs commercial screens.
2.3. Optimization
Preliminary optimization efforts have focused on two
proteins, the VEE capsid protein and Group B Strep-
tococcus hyaluronate lyase.In the search for diffraction quality crystals of the
VEE capsid protein, one hit in the screen, 12% PEG
8000, 10% glycerol and 150 mM Ammonium Citrate,
Table 2
Comparison of results obtained using incomplete factorial screen
versus available commercial screens
Protein Incomplete factorial
screen hits
Commercial
screen hits
2G1.1 3 5
9C9 1 0
UGA 214 6 1
UGA 220 7 0
UGA 222 2 0
192 L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206
pH 7.2, gave very small crystals that appeared to betetragonal. The normal form, which is orthorhombic,
tended to grow as needles or prisms. A response surface
optimization experiment was constructed such that the
pH, protein concentration, glycerol concentration, and
precipitant concentrations were varied simultaneously.
Thirty optimization experiments were performed using
the response surface methodology to search for im-
proved conditions for VEE capsid crystallization. Theresults can be inferred from the width and length mea-
surements of the crystals obtained. Most conditions
gave small to large blocky tetragonal crystals, while
other conditions produced needles or a mixture of the
two. Representative photos are shown in Fig. 2.
The scores were analyzed by both width (smallest
dimension) and aspect ratio in the statistical program
JMP. Fig. 3 is a plot of the predicted against actualwidths.
2.3.1. Summary of fit
The summary of fit for Fig. 3 is as follows: RSquare,
0.821104; RSquare Adj, 0.694825; Root mean square
error, 0.088871; Mean of response, 0.1521; Observations
(or sum Wts), 30.
When the model was predicted based on the widthscore alone, the surface represented not a maximum, but
a single dimension minimum (Fig. 4). This indicates that
the area probed by the experiments is on the boundary
between two maxima. The results confirm this, as two
forms are present separately and mixed under other
conditions. In order to maximize the tendency toward
the blocky crystal form, a model was constructed based
on aspect ratio (Figs. 5 and 6). Since the majority of the
trials gave the blocky form, the statistical model is not asrobust. However, the predictions indicate that the form
is sensitive to both the protein concentration and the
precipitant ratios. In particular, the blocky form is
preferred at higher concentrations of ammonium citrate,
and the needles form at higher concentrations of PEG
8000.
2.3.2. Summary of fit
The summary of fit for Fig. 5 is as follows: RSquare,
0.73941; RSquare Adj, 0.664956; Root mean square
error, 0.235667; Mean of response, 0.75978; Observa-
tions (or sum Wts), 28.
A set of experiments was prepared at the predicted
maximum. All trials gave large single blocky crystals.
2.3.3. Crystallization of full-length Group B Streptococ-
cus hyaluronate lyase (GBShl)
Preliminary crystals of GBShl were obtained using
vapor diffusion methods. The reservoir precipitant was
Fig. 2. Crystals from the VEE capsid response surface experiment.
Fig. 3. Actual vs. predicted width of crystals obtained using incomplete
factorial screen.
Fig. 4. Prediction profiles based on crystal width.
Fig. 5. Predicted vs. actual aspect ratio.
Fig. 6. Prediction profiles based on crystal aspect ratio.
L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206 193
chosen based on the crystallization conditions reportedfor the 92-kDa fragment of S. agalactiae hyaluronate
lyase and included polyethylene glycol monomethyl
ether 5000 (PEG-MME 5000), potassium thiocyanate,
and 100mM Hepes buffer, pH 7. The protein solution
contained 20mM Hepes, pH 7, and 10mM calcium
chloride. In a typical experiment, 2 ll of protein so-
lution was mixed with 2 ll of the reservoir solution.
The initial crystals were poorly formed, as shown inFig. 7.
As with the previous example, response surface
methods were used to refine the crystallization condi-
tions. The protein concentration, reservoir pH, thiocy-
anate, and PEG-MME 5000 concentrations were varied
simultaneously.
A variety of outcomes, ranging from clear drops to
precipitate to poor crystals and single crystals, was ob-tained. A model of the interactions of these factors was
constructed using the statistics package JMP (SAS In-
stitute). The statistics are rough approximations because
many of the drops that might have produced crystals
failed to nucleate (Fig. 8).
2.3.4. Summary of fit
The summary of fit for Fig. 8 is as follows: RSquare,0.684151; RSquare Adj, 0.514078; Root mean square
error, 2.359997; Mean of response, 3.190476; Observa-
tions (or sum Wts), 21.
The new predicted center point (Fig. 9) had a
protein concentration of 24mg/ml, and the reservoir
buffer was pH 7.6, 19% PEG-MME 5000, and 56mM
KSCN. Larger single crystals were obtained by seed-
ing. Crystals were crushed in the mother liquor, and1 ll of this solution was added to the 1-ml reservoirs
in the vapor diffusion setup. The reservoir was mixed
as normal with the protein solution. The seeded
crystallization conditions were refined using response
surface methods to give consistent large single crystals,
0:2� 0:3� 0:5mm, suitable for diffraction experi-
ments. A tray of 24 replicates produced crystals that
diffracted to 2�AA (Fig. 10).
2.3.5. Problems with optimization
One of the problems inherent with this or any op-
timization experiment is the batch-to-batch variability
of protein samples. Although the protein concentra-
tion and buffer composition can be adjusted to the
desired conditions, variations in handling the sample,
particularly during the concentration step, may give
rise to differing aggregation and oxidation states.
Another problem is the stochastic nature of crystal
nucleation (this can be controlled to some extent via
laser light-scattering measurements). Nucleation oftenfails to occur for some experiments, even though exact
replicates yield large single crystals. Additionally, there
are inherent errors involved with scale-up of experi-
ments from nanoliter to microliter volumes. The
equilibration kinetics for microbatch or vapor diffu-
sion is affected by the protein drop volume and sur-
face area. Although these differences can be modeled,
this added variable complicates scale-up and finaloptimization. Despite these difficulties, initial incom-
plete factorial screen and optimization results are
encouraging.Fig. 7. Initial crystals of GBShl.
Fig. 8. Actual vs. predicted score.
Fig. 9. Prediction profiles for GBShl crystal score.
Fig. 10. Representative optimized crystals of GBShl.
194 L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206
3. High-throughput nanoliter protein crystallization
CBSE scientists recognized several years ago that the
demand for a significant quantity of protein for crystal
screening would inhibit the ability of researchers to de-
termine structures for a significant percentage of the
targeted proteins. In 1997, the CBSE began developing
concepts for reducing the scale of protein crystallization
experiments. New technologies that would support au-tomated, reduced-scale crystallization screening were
also developed. Indeed, several companies, academic
laboratories, and government laboratories are now ac-
tively pursuing reduced-scale crystallization, both with
internally developed technologies and with commer-
cially available systems (Syrrx, Inc., 10410 Science
Center Drive, San Diego, CA 92121, USA; DeTitta
et al., 2001; Luft et al., 2001; Mueller et al., 2001; San-tarsiero et al., 2002; Segelke et al., 2002). This approach
has proven to be an enabling methodology for high-
throughput crystallization screening and structure de-
termination.
3.1. Fundamental vs practical considerations
Macromolecular crystallization is characterized byextremely low and highly anisotropic molecular at-
tachment kinetics. Large molecules require a longer
time (compared with small molecules) to assemble into
a highly ordered crystalline lattice. Critical variables
include the solvent structure, the presence of various
crystallization agents that modify solvent structure,
and the rate at which protein molecules are trans-
ported to the crystalline surface. In large-scale crystalgrowth it is possible to handle highly viscous solutions
and mix and prepare screening solutions containing
minor ingredients in very low proportions. Reduced-
scale crystallization places constraints on these three
requirements.
Our approach to reduced-scale crystallization ex-
plicitly separates preparation of screening solutions
from the screening experiments. Libraries of crystalli-zation solutions are prepared according to previously
described statistical design methodologies, using con-
ventional liquid handlers in large volumes (ca. 1ml).
Rather viscous bulk ingredients can be dispensed at
proper proportions and completely mixed in advance of
the actual screening experiment. Different screening li-
braries specific for each class of proteins are prepared
and stored in sealed 96-deep-well plates for further re-formatting and subsequent use. The actual experiment is
reduced to aspiration/dispension of the screening and
protein solutions for the specific crystallization tech-
nique chosen.
There are three fundamental approaches to deliver-
ing submicroliter volumes. The simplest approach is the
spotting method, whereby a pin with an extremely fine
point is dipped into a solution and removed. A smallamount of solution clings to the tip of the pin and can
be transferred to a microarray or slide by touching the
pin to the target surface. This method is easy to im-
plement but it is difficult to control the accuracy and
precision of the delivered volume. A second approach
involves using chambers of defined dimensions to
control the volumes delivered. A common application
of this method is in microfluidic devices, which utilizefabrication techniques from the computer chip industry
to create channels and chambers in silicon that can
contain nanoliter volumes. However, controlling fluid
movement in this format has proved difficult primarily
due to limitations of valve fabrication. Only recently
has effective valve fabrication technology been achieved
in soft polymeric material devices (Fluidigm Corp.,
7100 Shoreline Court, South San Francisco, CA 94080,USA). A third method for delivering small volumes
involves dispensing technologies. At the outset, the
CBSE and ANALIZA determined that the use of active
dispensing technologies was most feasible for success-
fully performing high-throughput nanoliter crystalliza-
tion. Fundamental technologies capable of dispensing
nanoliter volumes, and adaptable for our application,
were available in the ink-jet printing industry. Otherapproaches were judged to have limitations not easily
overcome in a reasonable time frame. Although the
active dispensing approach was judged to be the most
feasible, there remained several hurdles. One of the
most challenging problems encountered was the deliv-
ery of nanoliter volumes of solutions with widely
varying physicochemical properties. This problem is
solved by the choice of active dispensing method andby tailoring the dispense rate to the solution viscosity.
These two approaches resulted in a system capable of
dispensing solutions exhibiting a wide range of physi-
cochemical properties with high precision and accuracy.
Other practical problems included dispensing multiple
nanoliter volumes simultaneously with similar precision
and accuracy, eliminating cross-contamination between
solutions, and controlling unintended water loss duringexperiment preparation. Dispensing of multiple-nano-
liter solutions was accomplished using a hybrid mi-
crofluidic valve (Innovadyne Technologies, Inc., 2835
Duke Court, P.O. Box 7329, Santa Rosa, CA 95407-
7329, USA) that controlled the fluid flow at each dis-
pensing tip. Cross-contamination was eliminated by use
of custom wash stations that rinsed the dispensing tips
thoroughly before sets of new solution conditions wereaspirated. Water loss during experiment preparation
was virtually eliminated via an automated oil (micro-
batch) dispense system or by using a humidity chamber
(vapor diffusion) to retard evaporation.
Dispensing of viscous screening solutions in very
small volumes (e.g., low nanoliter quantities) is a
vexing problem that rapidly reduces the choice of
L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206 195
dispensing technologies. Techniques relying on surfacewave excitation and instability, such as piezo capillar-
ies, are inherently limited since they typically rely on
small variations in surface tension and viscosity to
achieve quantitative accuracy. Screening solutions vary
in all of their physicochemical parameters thus re-
quiring a robust dispensing technique to achieve re-
producible and quantitative screens. Fast solenoid
(a.k.a. drop-on-demand) techniques were chosen be-cause they rely on fluid inertia for dispensing and
typically result in accurate results across a large oper-
ating range.
Low-volume screening experiments can be executed
in multiple ways. Our initial work concentrated on mi-
crobatch (under-oil) crystallization (Chayen, 1997), in
which the screening and protein droplet is quickly cov-
ered with a mixture of water-impermeable and water-permeable oils, offering the desired kinetic profile (e.g., 1
or 2 weeks for complete drying). Crystallization under
oil is particularly suitable for low-volume screening in
which crystal recovery is not required. Also, rapid
evaporation of prepared droplets is a serious issue when
using small volumes and typical techniques such as base-
plate cooling and humidity control are not optimal since
every droplet (screen) has different colligative properties.For example, base-plate cooling below the dew point of
water can result in droplets continuing to dry while
others, on the same multiwell plate, actually experience
hydration. Rapid sealing of individual wells requires
sophisticated automation, a level of complexity that was
omitted from initial prototypes. Other crystallization
techniques such as sitting drops are easily accommo-
dated, and new miniaturized crystallization plates fa-cilitate such experiments. It should be noted that a thin
layer of silicon oil can be used to slow the rapid initial
evaporation, thereby allowing an entire plate to be
prepared before covering the individual experiment
chambers.
As discussed previously, a preferred method for
conducting reduced-scale crystallization screening is
through the use of technologies, similar to those em-ployed in ink-jet printing devices, which allow small
volumes to be dispensed with high accuracy and pre-
cision. The CBSE partnered with ANALIZA, Inc. (a
Cleveland-based biotechnology company) to develop
NanoScreen, an automated crystallization system that
can dispense 20 nl of solution with �10% accuracy.
This system can prepare 3000 crystallization experi-
ments with less than 350 lg of protein. A wide rangeof solution viscosities can be accommodated while
maintaining accurate dispensing. Twenty-nanoliter
droplets consisting of up to 15% PEG 8000 can be
rapidly dispensed with less than 10% error. Higher
PEG concentrations are achieved by dispensing four-
fold diluted PEG solutions (making the solutions more
water-like with similar chemical contents) or by de-
creasing the dispense rate. The microbatch method canbe adjusted so that water is continually drawn from
the diluted crystallization droplet until the desired
percentage of PEG is obtained. The dilution effect can
be accounted for in our statistical experiment design
strategy.
The NanoScreen system (Fig. 11) is comprised of the
several key components. A microfluidic dispensing head
is used to handle the protein and crystallization (recipe)solutions. The first-generation system has 12 tips for
fluid handling, with 2 tips dedicated to dispensing pro-
tein solutions and the remaining 10 used to deliver the
recipe solutions. Motorized stages allow for accurate x-,
y-, and z-axis movement of the head to the various lo-
cations on the deck of the NanoScreen system. The
system can perform both microbatch and traditional
vapor diffusion (sitting drop) experiments. For micro-batch experiments, an oil-dispensing subsystem is used
to cover the experiment solutions after they are de-
ployed. Wash stations for protein and recipe prevent
contamination between solutions and a quality control
capability is used to calibrate drops and to monitor the
dispense mechanism. Custom software controls all as-
pects of the NanoScreen operation, including solution
aspiration, solution dispensing, tip washing, oil dis-pensing, stage movement, quality control operations,
and drop-dispensing calibration. The system accom-
modates experiments in conventional 384-well or 1536-
well plates, as well as the new Corning 192-experiment
vapor diffusion plate.
The NanoScreen system operates by aspirating 20 llof protein solution into two protein tips and 20 ll of 10different recipe solutions into the 10 recipe tips. Beforethe first experiments are prepared, drops are dispensed
from each tip onto the quality control area to ensure
that the tips are dispensing properly. The dispensing
head then moves to the first set of experiment wells and
dispenses the protein solution. The 10 recipe tips are
then automatically positioned over the protein solutions
in the experiment wells and recipe solutions are simul-
Fig. 11. NanoScreen crystallization system.
196 L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206
taneously dispensed. For microbatch experiments, an oilmixture is immediately dispensed over the experiment
solutions to prevent unintended water loss. For vapor
diffusion experiments, the experiments are prepared
within a constant-humidity chamber to minimize unin-
tended evaporation, but coverage with permeable oil
also offers an option to arrest the initial rapid evapo-
ration while not interfering with the slow approach to
supersaturation. With each set of 10 experiments pre-pared, the recipe tips are washed, followed by aspiration
of the next set of 10 different recipe solutions for sub-
sequent experiments. This process is repeated until all
experiments on a tray have been prepared. The experi-
ment tray is then manually sealed with Crystal Clear
Sealing Tape (MANCO HP260) and placed in a con-
stant-temperature incubator. The NanoScreen system
has a throughput of 400 experiments/h. While approxi-mately 40 ll of protein solution is needed as a total
working volume, very little of this initial solution is
actually consumed. Most of the aspirated protein solu-
tion is returned, after preparation of all experiments is
completed, where it can be recovered and used for future
experiments. It should be noted that the first-generation
NanoScreen system is a prototype, with improvementsalready under development to increase throughput by a
factor of 10.
3.2. Experimental results
Our first experiments in reduced volume crystalliza-
tion involved manually dispensing nanoliter volumes of
commercially available proteins (thaumatin and lyso-zyme) and crystallizing solutions into a custom micro-
array using a micro syringe. Fig. 12 shows one of the
early custom microarrays used and Fig. 13 represents
selected images of thaumatin crystals grown in the mi-
croarray.
The initial total volume (protein + crystallizing solu-
tion) was approximately 50 nl. After verifying that
protein crystals could be grown and visualized innanoliter droplets, scientists at ANALIZA, Inc., devel-
oped and tested a number of technologies to support the
construction of the NanoScreen system.
Initial testing of the prototype system involved the
crystallization of eight commercial proteins. The pro-
teins were crystallized in 60-nl total initial volume mi-
crobatch experiments using known crystallization
conditions. Fig. 14 shows results from three of theseproteins, lysozyme, catalase, and pepsin. Additionally,
using the incomplete factorial screen, previously unre-
ported new crystallization conditions were identified for
several of the commercial proteins.
After completing the initial testing with commercial
proteins, the system was used to crystallize new proteins
obtained from internal projects and from collaborators
at other institutions. More recently, we have begunscreening crystallization conditions for proteins from
our NIH Structural Genomics project. We have
screened more than 170 proteins in nanoliter drops by
microbatch or vapor diffusion experiments, with greater
than 25% producing a crystalline phase separation.
Initial hits are currently being optimized for subsequent
scale-up to produce larger crystals required for X-ray
diffraction analysis.Microbatch crystallization has been used for the
majority of our nanoliter crystallization experiments to
date. Initial total drop volumes for microbatch experi-
ments are typically 60–80 nl. The protein and recipe
solutions are deployed as described earlier into 384-well
Fig. 12. Nanocrystallization microarray.
Fig. 13. Thaumatin crystals grown in 20-nl droplets.
Fig. 14. Nanoliter crystallization results for lysozyme, catalase, and pepsin.
L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206 197
plates, covered with oil, sealed with tape, and imageddaily for up to 2 weeks. Since experiments are performed
at different temperatures as part of our statistical ex-
periment design approach, the nanoliter drops lose wa-
ter through the oil and tray at different rates. Water
dissipation is slower at lower temperatures, thus an ex-
periment at 22 �C is imaged for 5–7 days, while an
experiment at 4 �C is imaged daily for up to 2 weeks. A
10-point scale based upon the Hampton Researchcrystallization scale is used to score results. Hits are
defined as a crystalline phase separation, with higher
scores assigned to results producing three-dimensional
single crystals. Fig. 15 shows microbatch nanocrystalli-
zation results for NIH Structural Genomics proteins
2G1, 4G49A, and W539A.
The NanoScreen system was recently modified by
addition of precise humidity control, thereby accom-modating vapor diffusion crystallization experiments in
the Corning 192 experiment chamber protein crystalli-
zation plate. The Corning plate is based on a 384-well
plate format, with 192 separate experiment chambers
where a protein solution can be equilibrated against a
reservoir solution. The reservoir solutions are pre-
loaded to match the chemical contents of crystallizing
solutions that are deployed into the protein solution foreach experiment. Experiments are prepared as de-
scribed earlier, with a total initial volume of 120 nl.
Larger volumes are used to allow for slight evaporation
that may occur prior to sealing the plate. This does not
increase the consumption of protein significantly rela-
tive to the working volume. The prepared tray is sealed
with clear tape and imaged through the bottom of each
crystallization well. Since the protein solutions areequilibrated against a reservoir, the drops do not lose
water to dryness and can be imaged less frequentlywithout missing hits. Thus far, crystallization experi-
ments for 18 proteins from the NIH Structural Ge-
nomics project have been performed using the vapor
diffusion method, with crystals observed for 10 of these
proteins. Fig. 16 shows sitting-drop vapor diffusion
nanocrystallization results for NIH Structural Genom-
ics proteins 2G1, 11D4A, and UGA 214.
3.3. Experiment imaging
Images are acquired and stored using the Crystal-
Score imaging system developed by Diversified Scien-
tific, Inc. The CrystalScore system allows for
automated image acquisition of each experiment, ar-
chiving of sequential images, data storage, and auto-
matic determination of crystal location, size, andnumber for any experiment. Other CrystalScore capa-
bilities include filtering and sorting relational databases
for ‘‘hits’’ and growth trends; storage of 2M database
records on local workstations; remote database con-
nectivity for Oracle 9I, IBM DB2, and Microsoft SQL
servers; database report generation to HTML, Word,
and Excel; time-lapsed image acquisition; AVI Focus-
Through Movies; and tray support for Linbro 24,Corning 24 and 96, 48-well, 96-well, 192-well, 384-well,
Greiner 288-well, Cryschem 24, VDX 24, and Nunc 72-
well plates.
3.4. Future efforts
The initial approach was to investigate numerous
experimental designs for automated low-volume crys-tallization. It is advantageous to scale-up a well-char-
Fig. 15. Microbatch nanoliter crystallization results for NIH Structural Genomics proteins 2G1, 4G49A, and W539A.
Fig. 16. Sitting drop vapor diffusion nanoliter crystallization results for NIH Structural Genomics proteins 2G1, 11D4A, and UGA214.
198 L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206
acterized process to construct a high-speed system thatsolves only problems unique to low-volume handling.
Future efforts in efficient screening of protein crystalli-
zation will focus on several areas. The NanoScreen
system will be used to routinely prepare experiments
supporting the NIH Structural Genomics project. Ad-
ditionally, several improvements to the NanoScreen
system will be implemented. These include faster
throughput (10-fold) using NanoScreen, and by the in-corporation of additional tips and dispensing heads,
optimizing movements to improve efficiency, and im-
proving the effectiveness of our statistical screens. The
implementation of other embodiments of our original
reduced-volume screening concepts to further improve
the efficiency and effectiveness of screening protein
crystallization conditions is being pursued. These in-
clude customizing tip dispense rates to accommodatemore viscous PEG solutions and more flexible control
software that automatically calculates and prepares re-
sponse surface optimization experiments (being devel-
oped by Diversified Scientific, Inc.). To improve our
imaging throughput, we are constructing an automated
imaging platform consisting of a robotic arm, three
custom CrystalScore systems, and incubators housing
the experiment trays. This imaging platform will be in-tegrated with a bar-code reader, plate imaging sched-
uler, central server for housing image databases, and
client computers for accessing experiment databases and
evaluating crystallization results.
4. Predicting protein crystallization conditions using
neural net technology
Neural net technology developed from artificial in-
telligence research was applied to protein crystallization
screening and resulted in the ability to accurately pre-
dict/recognize conditions that favored crystallization.
The following preliminary research (developed in col-
laboration with scientist and engineers from Diversified
Scientific, Inc., and the University of Alabama at Bir-mingham) deals with an optimization technique that
may increase the success rate for producing diffraction-
quality macromolecular crystals. This technology
demonstrates the most promise for optimizing protein
crystallization if combined with a thorough sampling of
crystallization space. An initial screen based upon
sampling techniques, such as the incomplete factorial
(Carter and Carter, 1979) described previously, is usedfor the protein crystallization trials. Every crystalliza-
tion trial outcome, including failures, is used to train a
neural network. Once trained, the neural network may
recognize conditions that yield crystals. Neural net-
works are based upon a real nervous system paradigm
composed of multiple neurons communicating through
axon connections. Characteristics of neural networks
include self organization, nonlinear processing, andmassive parallelism. The neural network exhibits en-
hanced approximation, noise immunity, and classifica-
tion properties. The self-organizing and predictive
nature of the neural networks allow for accurate pre-
diction of never before seen crystallization conditions,
even in the presence of noise. Our predictive neural
networks are trained via back propagation using the
incomplete factorial screen. If properly trained, theneural network can be used to identify or recognize
important patterns of crystallization. An input pattern
comprised of the incomplete factorial screen is pre-
sented to the network. The outputs are compared to the
known scores. Additional neurons are added and in-
terconnect weights (basis functions) are adjusted to
minimize the error. This process is continued until the
average error across all the training sets is minimized.Eventually, if the correct variables and sample size are
chosen to adequately represent the crystallization na-
ture of the protein, a stable set of hidden neurons and
basis function weights evolve. This neural network can
then be used to predict non-sampled complete factorial
conditions to be used for optimization, i.e. predicting
the conditions that produce crystals from the entire
‘‘crystallization space’’ of possible experimental condi-tions based upon the results from a much smaller
number of actual experiments performed. This ap-
proach has a higher probability of producing accurate
predictions if the small test set is statistically represen-
tative of the crystallization space.
A result of the structural proteomics initiative has
been the automation and miniaturization of protein
crystal growth experiments accompanied by a tremen-dous explosion in the generation of protein crystalli-
zation data. Protein crystallization data are complex.
This complexity is due to the nature of the data in-
corporated, which includes text, images, and quanti-
tative data related to the specific solution conditions
and scoring of results. An added layer of informatics
can also be wrapped around the core information that
is anticipated to link protein amino acid sequence andbiophysical parameters to crystallization results (Juri-
sica et al., 2001; Hennessy et al., 2000). Crystallization
results are the interpretation of the outcome of indi-
vidual protein crystallization experiments into usually
one of four main categories: clear drop, precipitate,
phase change, crystal. A variety of methods have been
adopted for evaluating the results of protein crystalli-
zation experiments. Historically, it is only those ex-periments that yield crystals that are used to generate a
second round of protein crystal growth experiments
for optimization (Gilliland, 1988). Optimization of
protein crystal growth is performed using multivariate
designs such as central composite, Box–Behnken, and
factorial (full and incomplete). These designs system-
atically evaluate several variables around a central
L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206 199
point or within a range (Box and Behnken, 1960; Boxand Hunter, 1957; Box et al., 1978; Carter, 1997;
Carter and Carter, 1979; Shaw Stewart and Baldock,
1999).
A comprehensive model for protein crystallization
does not exist. The optimization methods identify user-
defined variables in a systematic procedure to deter-
mine those variables most important for the particular
protein sample to be crystallized. There usually arefour variables that are common to all protein crys-
tallization experiments: protein concentration, pre-
cipitant concentration, pH, and temperature. Often
there are additional variables contributing to the suc-
cess or failure of crystallization experiments. If possi-
ble it is wise to select variables that have linearly
independent effects on protein crystal growth (Box
et al., 1978). In order to model the protein crystalli-zation data the results are scored. Hence an investi-
gator�s experience can play an important role in the
crystallization of a protein. If the experience of a
crystallographer could be captured into a model, much
of the subjectivity in protein crystallography could be
eliminated.
The recipes used for crystallization were developed by
the CBSE using 10 variables and 47 components. Thedesign for this screen is similar to that displayed in Table
3. Using this system there are (3� 3� 5� 5� 6� 3�3� 3� 3� 3¼ ) 328 050 combinations of components
that can be created. Depending upon the complexity of
the custom screens developed, the possibility of intro-
ducing human error and the amount of time needed to
generate the screens could be areas of potential prob-
lems. These challenges were considered an opportunityfor the development of technology that could facilitate
the high-throughput structural genomics programs and
industrial research.
In an effort to develop a tool for modeling protein
crystallization the screening results from a single pro-
tein were subjected to three types of analyses. The
‘‘best’’ analysis was then applied to protein samples 9c9
and Delta 8-10B. The incomplete factorial recipes de-veloped by the CBSE similar to that shown in Table 3
were used to screen all three protein samples for crys-
tallization. The actual experiments were implemented
using the NanoScreen NanoScreen. The robot prepared
360 microbatch crystallization experiments for each
protein sample. The crystallization experiments them-
selves were performed in a total volume of 80 nl. The
results from these experiments were visualized, ac-quired, and manually scored using CrystalScore. The
experimental preparation developed by the incomplete
factorial design was encoded in a 360� 10 matrix. The
score for each experiment was appended forming a new
column in the matrix. This matrix encoded the experi-
mental preparation and results formed the raw data for
analysis. Lactoglobulin gave a frequency of forming
crystals of 3.1–5.8% depending on the screen used. The
three types of crystallization analyses used for lacto-
globulin were (1) multiple-step regression analysis, (2)
neural net analysis, and (3) Chernov analysis. The
Table 3
Variables and treatments used for incomplete factorial screen
Variable/treatment Index
Variable 1: Temperature
4 �C 1
15 �C 2
20 �C 3
Variable 2: Protein dilution
4.382 1
3.912 2
3.442 3
Variable 3: Anionic precipitate
Chloride 1
Citrate 2
Acetate 3
Sulfate 4
Malonate 5
Thiocyanate 6
Variable 4: Organic precipitate
MPD 1
PEG 400 2
PEG 1450 3
PEG 4000 4
PEG 5000 MME 5
PEG 8000 6
Variable 5: Buffer pH
5.5 1
6 2
6.5 3
7 4
7.5 5
8 6
Variable 6: Precipitation strength
3 1
5.47 2
9.97 3
Variable 7: Organic moment
0.05 1
0.4 2
0.75 3
Variable 8: Percentage glycerol
0 1
5 2
10 3
Variable 9: Additive
None 1
Arginine 2
BOG 3
Variable 10: Divalent ion
None 1
Mg2þ 2
Ca2þ 3
200 L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206
multiple-step regression analysis is an extension of thepublished work of Drs. Carter and Carter, Jr., in which
linear, quadratic, and cross products were used to build
a model based on the screening results. The neural net
analysis was performed using a proprietary neural net
analysis program developed at Diversified Scientific,
Inc. The Chernov analysis was performed in collabo-
ration with the University of Alabama at Birming-
ham Department of Mathematics by Dr. NikolaiChernov. The performances of crystallization analyses
were compared using two metrics, the R2 value and the
ability to correctly identify/predict crystallization out-
come in experimental trials withheld from the training
data set. The best performing crystallization analysis
was then applied to protein 9c9 and Delta 8-10B
screens.
Multiple-step regression analysis of the lactoglobu-lin crystallization screening was inadequate at gener-
ating a model for crystallization. The regression
analysis was performed by Mrs. Shuying Yu, a SAS
programmer employed at the Jefferson County De-
partment of Health in Alabama. The major finding
was a significant lack of fit (R2 ¼ 0:54). This could be
interpreted as requiring further design or additional/
different variables. The regression model responsesurface has a saddle point suggesting that no optimum
response exist. Furthermore, the cross product of the
variables used in designing the screen had a significant
effect on score. The ability of this type of analysis to
model the data for lactoglobulin is still in the inves-
tigative stage. Preliminary research using the Chernov
algorithm and neural network suggested that these
methods produced better R2 values and therefore,more accurately predict the crystallization of Lacto-
globulin.
Preliminary research using a neural network accu-rately predicted the crystallization of lactoglobulin. A
partial sampling of the incomplete factorial design ex-
periment was used to train a neural net to recognize
conditions that result in crystallization with R2 values
between 0.63 and 0.75. The neural network was trained
with 87.5% (315 experiments) of the incomplete factorial
screen results (Fig. 17). A representative sampling of the
training set is shown in Table 4. Table 4 translates theincomplete factorial design in Table 3 to index values that
represent the physical components of the crystallization
recipe. Experiments 316 to 360 were withheld during
training and subsequently used for validation. The 315
experiments allowed the neural network to converge with
an R2 value of 0.754, an acceptable value, but still not
optimal. The input to the neural network is the indexed
variables and the output is the predicted score. Theweights of the hidden neurons are determined by back-
propagation. The remaining 12.5% (45 experiments) of
the incomplete factorial screen results were used for ver-
ification. Fig. 18 displays a comparison between the pre-
dicted and actual scores for lactoglobulin in experiments
316–360. The neural network successfully predicted the
two crystal outcomes (experiments 322 and 340) in the
validation set. The score or result from a protein crys-tallization experiment (y axis) versus the crystallization
experiment (x axis) is displayed. The results from a ‘‘real’’
experiment and those from the predicted experiment us-
ing a neural network are displayed. Fig. 19 illustrates the
physical outcomes of the two predicted crystallization
conditions (experiments 322 and 340). The trained net-
work was able to predict every crystallization outcome in
the 12.5% test set even though the data had never beeninput into the network. This result in the face of the lowR2
value highlighted the neural network�s ability to recognize
Fig. 17. Neural network training data for lactoglobulin. Experiments 1–315 were used to train the neural network.
Table 4
Representative training data for lactoglobulin neural network (experiments 1–315)
Experiment V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 Score
1 1 3 4 1 2 3 4 5 2 3 1
2 1 3 1 2 3 3 2 2 2 2 1
3 1 3 1 1 1 2 1 1 3 3 3
4 1 3 3 1 3 2 5 4 1 3 1
5 1 3 5 3 2 3 3 2 2 2 1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .315 3 1 4 3 1 1 5 3 2 2 1
L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206 201
crystallization conditions for lactoglobulin. A neuralnetworkwas created using a partial (315/360 experiments)
incomplete factorial design for lactoglobulin. This ap-
proach may be used as an optimization tool to identify
crystal-yielding experiments not physically tested in the
incomplete factorial design. This is a relatively dramatic
example of how well the trained neural net functions as a
predictor of crystallization conditions. Graphs of the
crystallization results require specific ordinate scales toaccommodate different weighting schemes required for
training optimization.
The Chernov analysis was developed by Dr. Nikolai
Chernov, at the University of Alabama at Birmingham.
The algorithm is structured similar to a regression
model, but instead of coefficients, the algorithm uses
mathematical basis functions that can be linear or
nonlinear, discrete or continuous. The mathematicalbasis functions take into account higher order interac-
tions between the various components. The mathemat-
ical functions are automatically varied in an organized
procedure until the R2 value, or coefficient of determi-
nation (this is essentially a measure of the difference
between observed and computed values), is within an
acceptable range, usually greater than 0.95. Once
trained, the Chernov algorithm mathematically de-scribes the various states of crystallization (output) in
terms of real, physical variables (inputs). The Chernov
crystallization cost function would look like
F1V1 þ F2V2 þ F3V3 þ � � � þ FnVn ¼ Output;
where Fn are the mathematical basis functions that
can be linear or nonlinear, discrete or continuous; Vn
are the input variables, and Output is the predictedscore.
The Chernov algorithm converged on the lacto-
globulin training set with a R2 value of 0.93. The al-
gorithm was used in a predictive manner to create
optimization conditions for lactoglobulin protein crys-
tallization. Our current screening protocol tests 360 of
328 050 possible permutations for protein crystalliza-
tion. The highest 20 predictions of the 328 050 per-mutations were identified and are in the process of
being evaluated. It is anticipated that these analyses
may improve initial screening results. The Chernov
analysis was performed only on lactoglobulin screen
results. The analysis can be used to construct a model
that will recognize combinations of reagents that will
most likely give a selected score. The simplicity of
training neural networks and the apparent accuracy ofpredicting crystallization conditions in such preliminary
work have very exciting implications for optimization.
Because of the ease in training a neural net on
screening results this analysis was applied to protein
9c9 (unknown C. elegans protein), a previously un-
crystallized sample and Delta 8–10B, a membrane as-
sociated molecule. For protein 9c9, experiments 1–315
were used to train the neural network. There was onlyone crystal-producing condition in the training set
(experiment 239) (Fig. 20). The neural network con-
verged with an R2 value of 0.604. The scoring system
used was binary with noncrystal scored as 0 and crystal
scored as 2000. Experiments 316–360 were withheld
from the training set and used to verify the neural
network. In the validation set (experiments 316–
360), the neural network accurately predicted the onlycrystal-yielding experiment number 350, Fig. 21. The
neural network was able to predict the only crystal-
yielding experimental conditions in experiments 316–
360 for the previously uncrystallized protein 9c9, even
though it was not trained on these conditions. This
analysis further reinforces the hypothesis that the
neural network can be used to predict crystallization
Fig. 19. Predicted crystallization experiments 322 and 340. The neural network successfully predicted the 2 crystal outcomes for the lactoglobulin
verification data set.
Fig. 18. Lactoglobulin actual vs. predicted crystallization.
202 L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206
conditions for previously uncrystallized proteins. The
9c9 training data and results are summarized in Figs.
20–22.
Neural networks may be useful for crystallization
optimization of membrane proteins. An incompletefactorial screen with 11 input variables was designed for
protein Delta 8-10B. This protein is considered a pe-
ripheral membrane protein that has undergone numer-
ous crystallization attempts. It is a therapeutically
important protein whose structural data would be of
great interest to the scientific community. The incom-
plete factorial comprised 288 samples. The incomplete
factorial screen yielded the following outcomes: 226 clear
drops, 3 phase separations, 39 precipitates, 10 micro-
crystals/precipitates, 8 rosettes/spherulites, 2 needles, 0plates, 0 small 3D crystals, and 0 large 3D crystals. By
combining the incomplete factorial variables with the
outcome for each experiment, a set of nonlinear equa-
tions with 11 independent variables and 1 dependent
variable is created. These nonlinear equations become
the basis for training the neural network to recognize
experimental conditions useful for crystallization. A
neural network, trained on 90% of an incomplete facto-rial screen for Delta 8-10B, was able to predict the only
crystallization condition in the remaining 10% of the
incomplete factorial screen. The 90% training set con-
tained only 1 needle condition while the test set con-
tained only 1 needle condition. The neural network was
able to predict the crystallization condition of the needle
in the test set even though the 2 needles crystallized with
very disparate conditions (Table 5).In order to validate the integrity of the training for
the third protein, Delta 8-10B, only 90% of the test set
(259 experiments) was used for training. The remain-
ing 10% (29 experiments) was withheld from training
and used for verification. If the analysis can perform
Fig. 22. Crystallization outcome for protein 9c9 (experiment 350). The
neural network for protein 9c9 successfully predicted the only crystal
yielding outcome in the test set.
Fig. 23. Experiments 1–259 were used to train the neural network for Delta 8–10B.
Fig. 24. Experiments 260–288 were used to verify the neural network
for Delta 8–10B.
Fig. 21. Comparison of crystallization scores between predicted and
actual experiments using input data the neural net has NEVER seen
for protein 9c9. Experiments 316–360 were used to verify the neural
network that was trained only on experiments 1–315.
Fig. 20. Neural network training data for protein 9c9. Experiments 1–315 were used to train the neural network.
L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206 203
well on a small test set it may recognize those con-ditions that yield crystals from the complete crystal-
lization space not previously tested (�35 million
experiments in this example). The experiments were
randomized to remove the sequential sampling due to
the three temperatures. Since there were only 2 needle
conditions that resulted from the 288-condition screen,
1 needle condition (100mM Bicine, pH 8.3, 0.567M
sodium acetate, 0.9% PEG 400, 0.01M CaCl2, 0.05%BOG, protein dilution encoded as 2.3 at 14 �C) was
placed into the training set (Fig. 23) while the other
needle condition, experiment number 282 (100mM
acetate, pH 4.5, 0.648M sodium chloride, 11.6% PEG
M5000, 0.01M CaCl2, protein dilution encoded as 1.5
at 22 �C) was placed into the verification set (Fig. 24).
The neural network�s highest predicted score in the
test set was the needle condition, even though theneural network had never been exposed to the crys-
tallization condition in the test set. Visual inspection
of the crystallization conditions for both crystal results
appear disparate (Table 5). Because the neural net-
work is trained with all the results, including failures,
traditional optimization techniques that use only the
hits probably would have not predicted the crystalli-
zation conditions for the test set. The network wasable to predict the crystallization condition for the
only crystal in the test set (experiment 282) even
though it had never been trained on the experiment.
There were two false positives (experiments 263 and
275) although the relative highest predicted score
was for the correctly predicted crystal condition.
We anticipate simulating the 35 000 000 permutations
not tested, using the highest 20 predicted scores forcrystallization optimization/verification experiments.
The relative importance of each variable is shown in
Fig. 25.
The ability of the neural network to identify patterns
of crystallization in complex nonlinear data sets may
provide a powerful method of optimization. The total
number of permutations is calculated by multiplying the
number of discrete values of each design variable. In thefirst two experiments (lactoglobulin and the C. elegans
protein 9c9), there are 320 050 possible permutations in
the incomplete factorial space. In the second experi-
ment, there are approximately 35 million possible per-
mutations because two of the variables, organic
percentage and salt concentration, each have more than
20 possible discrete values.
We anticipate using the trained neural network on theremaining untested experiments for each protein
(320 050 for lactoglobulin and 9c9 and 35 000 000 for
Delta 8-10B) and then actually perform a limited num-
ber of crystallization experiments (on the top 20 pre-
dicted scores) to determine if the neural net was able to
predict crystallization conditions and perhaps condi-
tions that yield different crystalline habits.Table5
Comparisonofexperimentalconditionthatyield
crystalforDelta
8-10B
Experiment
Tem
perature
[Protein],
dilution
[Buffer]
Buffer
pH
[Salt]
Salt
[Organic,
%]
Organic
[Glycerol]
[Divalent]
Divalent
[Additive]
Additive
Score
Trained
14
2.300
0.100
MBicine
8.3
0.567
MNa
acetate
0.9
%PEG
400
0.0
0.010
MCaCl 2
0.050
%BOG
6
Predicted
22
1.500
0.100
MAcetate
4.5
0.648
MNa
chloride
11.6
%PEG
M5000
0.0
0.010
MCaCl 2
0.000
None
6
204 L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206
5. Summary
In summary, the three technologies presented offer the
potential to significantly reduce the amount of time and
sample required to produce diffraction-quality crystals.Furthermore, preliminary experimental data indicate
that the incomplete factorial screen may be more effective
than available commercial screens are for determining
initial crystallization conditions. The ability to success-
fully use response surface optimization protocols may be
directly affected by protein purity, aggregation, and
process scale up. The accurate assessment of the potential
usefulness of response surface optimization may requireproteins that can be expressed/purified in large (6–10mg)
batches and are stable over a period of several weeks. The
NanoScreen crystallization system effectively minimizes
the amount of sample required while accommodating
both microbatch and traditional vapor diffusion meth-
ods. Both the neural network and Chernov algorithm
appear to accurately model crystallization using only a
small sample (�0.1%) of the crystallization space. Im-portantly, self-learning algorithms may be useful in pre-
dicting optimal crystallization conditions from the entire
set of crystallization conditions (328 050) for a particular
protein, thus addressing a major bottleneck in high-
throughput crystallization. An interesting observation is
that the actual scoring of the screening experiments can
have a dramatic outcome on the ability of mathematical
analyses to model crystallization. Furthermore, it is an-ticipated that class hierarchies may be created based
upon the crystallization functions, enabling more rapid
structure determination.
The eventual goal of this research is to incorporate
the incomplete factorial screen/response surface opti-
mization and the neural net crystallization prediction
program with the nanoliter crystallization robot. If
successful, this approach is expected to minimize ex-periment time and the quantity of protein required to
determine optimum crystallization conditions.
Acknowledgments
We thank Dr. Charles Carter for his consultation and
guidance with the development of our incomplete fac-torial design and response surface optimization pro-
gram. Funding for this research was provided by NIH
Grant P50-GM62407 and NASA Cooperative Agree-
ment NCC8-246.
References
Box, G.E.P., Behnken, D.W., 1960. Technometrics 2, 455.
Box, G.E.P., Hunter, J.S., 1957. Ann. Math. Stat. 28, 195.
Box, G.E.P., Hunter, W.G., Hunter, J.S., 1978. Statistics for Exper-
imenters. Wiley Interscience, New York.
Bray, T.L., Kim, L.J., Askew, R.P., Harrington, M.D., Rosenblum,
W.M., Wilson, W.W., DeLucas, L.J., 1998. New crystallization
systems envisioned for microgravity studies. J. Appl. Crystallogr.
31, 515–522.
Bray, T.L., Powell, D.L., DeLucas, L.J., 1997. Dynamic control of
vapor diffusion protein crystal growth. Am. Inst. Phys. Space
Technol. Appl. Int. Forum Proc., 705.
Carter Jr., C.W., 1997. Response surface methods for optimizing and
improving reproducibility of crystal growth. Methods Enzymol.
276, 74–99.
Carter Jr., C.W., Carter, C.W., 1979. Protein crystallization using
incomplete factorial experiments. J. Biol. Chem. 254, 12219–12223.
Chayen, N.E., 1997. The role of oil in macromolecular crystallization.
Structure 5, 1269–1274.
Collingsworth, P.D., Bray, T.L., Christopher, G.K., 2000. Crystal
growth via computer controlled vapor diffusion. J. Cryst. Growth
219, 283–289.
DeTitta, G.T., Biance, M.A., Collins, R.J., Faust, A.M.E., Kacz-
marek, J.N., Luft, J.R., Fehrman, N.A., Pangborn, W.A., Salerno,
J.M., Wolfley, J.R., 2001. Macromolecular crystallization in a high
throughput setting. Conference and Exhibit on International Space
Station Utilization, Cape Canaveral, FL, October 15–18.
Gilliland, G.L., 1988. A biological macromolecular crystallization
database: a basis for a crystallization strategy. J. Cryst. Growth 90,
51–59.
Hennessy, D., Buchanan, B., Subramanian, D., Wilkosz, P.A.,
Rosenberg, J.M., 2000. Statistical methods for the objective design
of screening procedures for macromolecular crystallization. Acta
Cryst. D Biological Crystallography 56, 817–827.
Hofmeister, F., 1888. On the understanding of the effects of salts.
Arch. Exp. Pathol. Pharmakol. (Leipzig) 24, 247–260.
Jurisica, I., Rogers, P., Glasgow, J., Fortier, S., Luft, J., Wolfley, J.,
Bianca, M., Weeks, D., DiTitta, G., 2001. Intelligent decision
support for protein crystal growth. IBM Syst. J. 40, 394–409.
Kuhn, P., Wilson, K., Patch, M.G., Stevens, R.C., 2002. The genesis of
high-throughput structure-based drug discovery using protein
crystallography. Curr. Opin. Chem. Biol. 6, 704–710, doi:10.1016.
Lamzin, V.S., Perrakis, A., 2000. Current state of automated crystal-
lographic data analysis. Nat. Struct. Biol. Struct. Genom. Suppl.,
978–981.
Luft, J.R., Wolfley, J., Jurisica, I., Glasgow, J., Fortier, S., DeTitta,
G.T., 2001. Macromolecular crystallization in a high throughput
laboratory: the search phase. J. Cryst. Growth 232, 591–595.
Mueller, U., Nyarsik, L., Horn, M., Rauth, H., Przewieslik, T.,
Saenger, W., Lehrach, H., Eickhoff, H., 2001. Development of a
technology for automation and miniaturization of protein crystal-
lization. J. Biotechnol. 85, 7–14.
Roses, A.D., 2002. Genome-based pharmacogenetics and the phar-
maceutical industry. Nat. Rev. Drug Discovery 1, 541–549.
Santarsiero, B.D., Yegian, D.T., Lee, C.C., Spraggon, G., Gu, J.,
Scheibe, D., Uber, D.C., Cornell, E.W., Nordmeyer, R.A., Kolbe,
W.F., Jin, J., Jones, A.L., Jaklevic, J.M., Schultz, P.G., Stevens,
Fig. 25. The chart demonstrates that the buffer index and pH were the two most important factors in the crystallization of Delta 8-10B.
L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206 205
R.C., 2002. An approach to rapid protein crystallization using
nanodroplets. J. Appl. Crystallogr. 35, 278–281.
Segelke, B.W., Rupp, B., Lekin, T.P., Krupka, H.I., Azarani, A.,
Todd, P., Wright, D., Wu, H.-C., 2002. The high speed Hydra-
Plus-One system for automated high-throughput protein crystal-
lography. Acta Crystallogr. D 58, 1523–1526.
Shaw Stewart, P.D., Baldock, P.F.M., 1999. Practical experimental
design techniques for automatic and manual protein crystallization.
J. Cryst. Growth 196, 665–673.
Further reading
Nagy, L., 2003. Screen conditions developed by CBSE investigator.
Temperature: 4, 15, and 22 �C. Protein concentration: 100, 65, and
35% of the provided stock. The stock is assumed to be highly
concentrated. Buffer pH: 4.5, 6, 7, 7.5, 8.3, 9. Total precipitant
(supersaturation S): chosen to give high, medium, and low concen-
trations relative to the amount of protein. For each precipitating
agent, an empirical constant, Ks, represents its molar strength as a
precipitating agent. The total amount of precipitant is calculated
using the empirical equation Ks1 ½Precip1 þ Ks2 ½Precip2 ¼ lnðSÞ�ln½macromolecule. Ratio of organic and ionic precipitants: chosen
to be 0.05 (organic precipitant is an additive), 0.4, 0.75. Glycerol
concentration: 0, 3, 6%. Ionic precipitant: chosen to reflect a variety
of basicities and Hofmeister behaviors, sodium acetate, sodium
chloride, sodium citrate, sodium malonate, ammonium sulfate, and
potassium thiocyanate.Where applicable, the salts are titrated to the
desired pH. Organic precipitant: MPD, polyethylene glycol 5000
monomethyl ether, and polyethylene glycols 400, 1450, 4000, and
8000. Divalent cations: none, 10mM calcium chloride and 10mM
magnesium chloride. Additives: none, 50mM arginine HCl, 0.05%
n-octyl-b-DD-glucopyranoside.
206 L.J. DeLucas et al. / Journal of Structural Biology 142 (2003) 188–206