Review

10.1586/14789450.2.4.511 © 2005 Future Drugs Ltd ISSN 1478-9450 www.future-drugs.com

Structural proteomics in drug discovery

Leslie W Tari, Martin Rosenberg and Anthony B Schryvers†

†Author for correspondence
University of Calgary, Department of Microbiology & Infectious Diseases, Faculty of Medicine, Calgary, AB T2N 4N1, Canada
Tel.: +1 403 220 3703
Fax: +1 403 270 2772
[email protected]

KEYWORDS: automated crystallization, drug design, high-throughput methods, protein crystallography, protein expression, structural genomics

High-throughput, automated or semiautomated methodologies implemented by companies and structural genomics initiatives have accelerated the process of acquiring structural information for proteins via x-ray crystallography. This has enabled the application of structure-based drug design technologies to a variety of new structures that have potential pharmacologic relevance. Although there remain major challenges to applying these approaches more broadly to all classes of drug discovery targets, clearly the continued development and implementation of these structure-based drug design methodologies by the scientific community at large will help to address and provide solutions to these hurdles. The result will be a growing number of protein structures of important pharmacologic targets that will help to streamline the process of identification and optimization of lead compounds for drug development. These lead agonist and antagonist pharmacophores should, in turn, help to alleviate one of the current critical bottlenecks in the drug discovery process; that is, defining the functional relevance of potential novel targets to disease modification. The prospect of generating an increasing number of potential drug candidates will serve to highlight perhaps the most significant future bottleneck for drug development, the cost and complexity of the drug approval process.

Expert Rev. Proteomics 2(4), 511–519 (2005)

CONTENTS

Cloning, protein expression & production

High-throughput robotic crystallization, imaging & tracking of results

High-throughput x-ray data collection

Structure determination & refinement

Data tracking

Leveraging structural information for drug discovery

Expert commentary

Five-year view

Key issues

References

Affiliations

X-ray crystallography is currently the premier technique for elucidating 3D structures of large biologic macromolecules at atomic resolution. Crystal structures provide unparalleled insights into the functions of individual macromolecules and macromolecular complexes, the exact nature of protein–ligand interactions and the catalytic mechanisms of enzymes. Information about the structural, chemical and dynamic landscape of a macromolecular drug target has been used for the development of small-molecule drugs through a process termed structure-based drug design (SBDD). SBDD is an iterative method that combines protein structure and molecular modeling to develop and optimize small-molecule inhibitors. By determining crystal structures of protein targets with substrates, natural product inhibitors, compounds derived from libraries or de novo-designed scaffolds, medicinal chemists and molecular modelers are able to devise strategies for making rational modifications to molecules that may improve compound affinity and selectivity. After the first set of protein–ligand structures is determined, further cycles of rational compound design/modification, target-based screening and cocrystal structure determination are used to develop potential drug candidates. SBDD has the potential to accelerate the rate at which small-molecule candidates reach the clinic by dramatically reducing the number of compounds that have to be synthesized and tested.

Although the ability to effectively act upon the drug target is an essential feature of drug candidates, their utility ultimately depends upon additional properties that are not directly assessed in most SBDD regimens (e.g., formulation, solubility, metabolism, safety, tolerability, absorption, distribution and elimination). However, by precisely localizing regions of the small molecules involved in the key protein–ligand interactions, SBDD provides insights into how modifications to the compound can be made without compromising ligand affinity. Thus, information from testing several small-molecule candidates in animals, or in surrogate tests when available, can provide insights on how to rationally design drug candidates with improved pharmacokinetic properties or a reduced toxicity profile. Several drugs currently on the market, such as the HIV protease inhibitor Agenerase™ (Vertex, Kissei and GlaxoSmithKline) [1] and the neuraminidase inhibitors Relenza™ (Biota and GlaxoSmithKline) [2] and Tamiflu™ (Gilead Sciences and Roche) [3], were developed through structure-based methodologies. Many more are currently in earlier phases of development.

Protein crystallography, while of tremendous utility in drug discovery, has traditionally been slow, labor intensive and expensive, due to serious bottlenecks in a number of steps in the process. However, the time, difficulty and expense of obtaining protein structures by x-ray crystallography have been dramatically reduced by some key technological breakthroughs, which moved the technique into the high-throughput realm. With the advent of molecular biology tools developed in the 1980s and 1990s that allow facile recombinant expression of proteins, a major bottleneck of obtaining suitable protein samples for crystallization was removed. A variety of methods have been developed for parallel expression and purification of large numbers of gene products [4–9]. Such parallel methods have allowed scientists to explore diverse arrays of multiple gene constructs, homologs and variants for specific protein targets. An encompassing search of gene construct space greatly increases the chances of obtaining crystallizable protein samples. Over the past 10 years, the development of robotic systems for the set-up and imaging of crystallization experiments has led to exponential increases in throughput of crystallization experiments and increased reproducibility of results. Concurrently, advances in the structure determination process, including structure solution by multiple-wavelength anomalous dispersion (MAD) using selenomethionine-incorporated overexpressed protein [10,11], rapid diffraction data collection at synchrotron beamlines using flash-cooling [12], beamline sample-mounting robotics [13–15] and automated structure solution methods [13,16–18], have provided the complementary elements enabling high-throughput structural biology [13,19–29].

Many of the structures solved by the structural genomics consortiums and centers are biased towards proteins that can rapidly fold into a stable functional form in the cytoplasm, the ‘low-hanging fruit’ [30]. However, many potential drug targets arise from classes of proteins that are more challenging and which will require more specialized approaches. Fortunately, even though the advances in high-throughput crystallography have largely been developed in the context of large consortiums or well-funded industrial initiatives, many of the strategies can be adopted by smaller groups or by individual laboratories that are targeting specific proteins or classes of proteins. This paper will discuss the various stages of the SBDD process in the context of the recent advances and their applicability for the scientific community at large.

Cloning, protein expression & production

The ambitious structural genomics projects geared to screen all the gene products from a particular organism required efficient systems for cloning genes into expression vectors that are adaptable to high-throughput (robotic) methodologies. Cloning systems such as the Gateway™ [31] and the TOPO™ [32] systems provide the ability to directly clone PCR products into expression vectors without conventional restriction digestion and ligation steps. The Gateway system provides the additional ability to rapidly subclone the genes into a series of customized expression vectors, which can vary with respect to the nature, presence and position of affinity tags for subsequent purification when the mature coding sequence is flanked by the attB sites. Initial problems encountered with expression in some systems that were attributed to the junctional sequences can be addressed, in part, by the availability of the alternate attB sequences developed for multisite cloning applications [33]. Even in the absence of robotics and without the use of commercial cloning systems, it is possible for modest research programs to implement fairly extensive expression screening by rapidly cloning large numbers of genes or gene derivatives into a variety of expression vectors. The demonstrated advantage of including a diverse collection of homologs for crystallization projects [34] and the ever-increasing number of genomic sequences available will likely foster wider adoption of strategies that involve screening expression of a variety of different genetic constructs.

In spite of the extensive efforts at expressing proteins, the structural genomics initiatives provide us with relatively few insights on which parameters are the most important for production of functional protein from different classes. Initially, the vectors were primarily designed for high-level expression in the cytoplasm and thus preferentially yielded proteins that readily and rapidly fold into their functional form under these conditions. The impressive yield of soluble proteins providing diffraction-quality crystals that was obtained from genomes of thermophilic bacteria (23% of the proteome) has not been readily extended to other situations [35]. Significant hurdles still exist. The barriers for functional expression of recombinant protein depend upon the type of protein and have prompted development of specialized expression systems such as those designed for the expression of eukaryotic [36,37] and membrane [38] proteins. However, there remains a clear need for continued development and evaluation of specialized expression systems before the challenge of producing functional protein for crystallization trials has truly been met. When targeting a specific class or type of protein it would be logical to optimize the expression conditions with a limited subset of proteins prior to initiating larger scale expression and production runs. Direct assays for functional activity would obviously be preferable but are unlikely to be adapted to a high-throughput format. Indirect assays such as reporter systems that monitor the host response to heterologous expression of proteins could be incorporated into high-throughput expression experiments to evaluate their correlation with production of protein yielding crystals [39].

In order to be suitable for crystallization, a protein has to fold correctly without aggregation, should be free of extraneous structurally heterogeneous regions and should maintain function such as binding of substrates or interacting with protein partners. Sodium dodecyl sulfate polyacrylamide gel electrophoresis has been routinely used to assess suitability of protein preparations for crystallization trials, but does not assure homogeneity, proper folding or monodispersity. Light scattering and gel filtration chromatography are the best methods to evaluate these parameters, but are not ideally suited for high-throughput screening (HTS) [40]. There is clearly room for development of innovative approaches for evaluating the suitability of proteins for crystallization and, with the increasing numbers of proteins being produced and tested in crystallization screens, there should be ample opportunity to assess their utility.

In the absence of systematic studies that evaluate the various parameters involved in optimum functional expression of different classes of proteins, it would seem prudent to design expression systems that explore properties demonstrated to be important in individual situations or that seem intuitively obvious. Thus, secreted or surface-anchored proteins may require export for functional expression. The genetic background may be critical for successful production of some proteins, such as the expression of membrane proteins in Rhodobacter species [38]; thus, it would be prudent to design systems capable of evaluating various genetic backgrounds in a high-throughput format. In some instances, it may be necessary to design custom expression systems that closely emulate the natural expression environment. In addition, proteins that are normally found in complexes may require coexpression of one or more of the partner proteins for proper folding and assembly.

In vitro expression systems provide an alternate approach for protein production that can provide sufficient protein for current crystal screening methods under optimal conditions [41]. In vitro systems have the advantage of avoiding toxicity problems that may limit protein expression from in vivo systems and have the potential of adding further components to overcome limitations in folding and processing. The issues of relative cost and general applicability will only become apparent after continued comparison to the various in vivo systems that are developed.

The selection of a variety of homologs and subsegments of the genes of interest based on the analysis of sequence alignments and secondary structure predictions may dramatically improve the likelihood of obtaining protein suitable for crystallization screens. For de novo construct generation, the best case scenario is one where something is known about the protein of interest, either from the literature, from a comparison with orthologs/homologs, or from a low-resolution crystal structure. Using sequence alignments, this information can be leveraged into a few well-chosen expression constructs for crystallography. However, even in the absence of any a priori knowledge, identification of crystallizable domains and/or heterogeneous regions can be accomplished using several methods. Systematic truncations from the N- and C-terminal ends of the protein of interest, limited proteolytic digestion, and mutagenesis of putative surface residues from large flexible charged residues (e.g., lysine) to smaller less conformationally labile polar ones (e.g., serine), followed by small-scale protein expression, are commonly used to identify useful variants suitable for structural analysis. Other variables, including the location and nature of affinity purification and secretion tags, the inclusion of proteolytic sites for tag removal, internal sequence truncations (to remove structurally heterogeneous or hydrophobic loops), bicistronic expression of partner proteins, and site-specific mutagenesis to enhance sample solubility and monodispersity, or to prevent or mimic a post-translational modification (e.g., serine to aspartate mutations to mimic phosphorylation), are all strategies commonly used by crystallography groups. However, in many instances, the modifications that may result in crystallizable protein are not apparent. To address this problem, protein engineering using directed evolution approaches can be considered when a suitable reporter system is available, such as the use of green fluorescent protein (GFP) as a C-terminal fusion partner [42]. The split GFP reporter system may be even more appropriate for cytoplasmic proteins due to the reduced effect on inherent solubility of the test protein [43]. Reporter systems that are suitable for proteins produced in other compartments such as membrane or surface proteins would aid in overcoming the relatively low success rate at producing sufficient functional protein for crystallization trials.
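The systematic N- and C-terminal truncation strategy mentioned in the preceding paragraph amounts to enumerating a grid of construct boundaries. The following is a minimal sketch of that enumeration only; the function name `truncation_constructs`, the step size, the maximum trim and the minimum construct length are illustrative assumptions and are not values taken from this review.

```python
def truncation_constructs(sequence, step=5, max_trim=20, min_length=50):
    """Enumerate N- and C-terminal truncations of a protein sequence.

    Returns (start, end, subsequence) tuples with 1-based, inclusive
    residue numbering. All parameter defaults are hypothetical.
    """
    constructs = []
    n = len(sequence)
    for n_trim in range(0, max_trim + 1, step):
        for c_trim in range(0, max_trim + 1, step):
            start = n_trim + 1
            end = n - c_trim
            if end - start + 1 < min_length:
                continue  # skip constructs that are too short to be useful
            constructs.append((start, end, sequence[n_trim:n - c_trim]))
    return constructs


if __name__ == "__main__":
    toy = "M" + "ACDEFGHIKL" * 10  # hypothetical 101-residue sequence
    for start, end, seq in truncation_constructs(toy, step=10, max_trim=20):
        print(f"residues {start}-{end}: {len(seq)} aa")
```

In practice the boundaries would be chosen from sequence alignments and secondary structure predictions rather than from a fixed step, so a grid like this is best read as a fallback for targets about which nothing is known.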

The success of structural genomics initiatives has relied on the development of systems for rapid production and isolation of recombinant proteins. The Escherichia coli-based system developed by the Genomics Institute of the Novartis Research Foundation (GNF) is perhaps the most successful, as it was used to rapidly scan through the entire soluble genome of Thermotoga maritima to extract the crystallizable proteins [35]. Although this system may not be suitable for many other types of proteins (e.g., membrane or eukaryotic proteins), it does illustrate how certain principles and features can be applied for more rapid protein production. Yeast-based expression systems for eukaryotic proteins have the advantage of shorter generation times than insect or mammalian expression systems, but the extraction and purification schemes involved present challenges and are not truly high throughput [36]. The use of affinity tags, and even combinations of affinity tags [36], is an essential feature of rapid, generalizable protein purification protocols, and the tags can be readily removed by specific protease cleavage to avoid interference with crystallization.

High-throughput robotic crystallization, imaging & tracking of results

Second to the generation of crystallization-quality protein samples, the set-up of crystallization experiments served as the next most important bottleneck in obtaining macromolecular structures. Although the use of multichannel pipettors and 96-well, sitting drop crystallization plates can enhance the output by hand, a competent technician is hard pressed to set up several hundred crystallization experiments in a day. A number of robotic systems have been developed to alleviate this bottleneck.

One of the largest high-throughput systems built to date is the GNF robot currently in use at Syrrx, which is able to process in excess of 40,000 crystallization experiments in a single day. Other groups are building automated crystallization systems and some are becoming available commercially, such as the RoboDesign Crystalmation system, the Phenyx crystallization robot (GeneBio), the Hydra fluid-handling system, the Syrrx/RTS system and a system from Data Centric Automation recently installed at GlaxoSmithKline [44]. These new systems supersede liquid handling systems such as those available from Gilson, by incorporating barcoded sample plates, a database and a Laboratory Information Management System (LIMS) to track the enormous volume of data generated by such systems. Automated crystallization confers a number of advantages over conventional methods. First, robotic systems dramatically increase the speed and reproducibility of crystallization set-ups. Additionally, robotic systems allow for higher density drop plating configurations for a more condensed experimentation scale, thereby minimizing plate storage space requirements. In addition, an automated crystallization system facilitates efficient tracking of experimental data, a key requirement when working in any high-throughput environment.

An innovation incorporated into the GNF robotic crystallization system, now commercially available from Syrrx/RTS, is the use of submicroliter (or nanoliter) crystallization volumes [101]. By using volumes 20–50-times smaller than those routinely employed in standard microliter volume experiments, submicroliter volume techniques lessen the amount of sample required to comprehensively sample crystallization space by several orders of magnitude [45]. Using nanovolumes, an extensive exploration of crystallization space at multiple temperatures can be carried out with less than a milligram of protein. The reduced material requirements facilitate parallelization, so that small quantities of numerous protein variants can be screened simultaneously. Using conventional methods, these experiments would be executed serially over many weeks and require large amounts of biomass to generate enough protein for a single screen. However, with the biomass requirements decreased by a factor of 20–50 for growth and purification, many experimental protocols for a given protein target can be tested simultaneously [35]. Nanovolume crystallization droplets have much larger surface area-to-volume ratios than microliter droplets, dramatically increasing the equilibration rate in vapor diffusion experiments. The faster turnaround time for nanovolume crystallization experiments facilitates the rapid assessment and optimization of experimental parameters that impact crystal growth and structure determination [46]. While nanovolume crystallization tends to produce crystals that are smaller than those from microliter volume methods, advances in detector technology coupled with the availability of intense in-house and synchrotron x-ray sources allow for routine data collection without losses in diffraction resolution. An advantage of smaller crystals is an increased surface area-to-volume ratio that provides easier diffusion of cryoprotectants into the crystal lattice, thus minimizing damage due to osmotic shock during cryoprotection, and reducing crystal mosaicity [47]. Additionally, it has been demonstrated that a reduction in the volume of a crystallization experiment mimics the effects of microgravity by reducing convective flow, thereby generating crystals of superior diffraction quality to those grown in larger volumes [48].
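The geometric argument behind the faster equilibration of nanoliter drops can be checked with a short calculation. The sketch below assumes, purely for illustration, that a drop is a sphere (a sitting drop is not), in which case the surface area-to-volume ratio is 3/r and grows as the volume shrinks. The function name and the example volumes are assumptions, not values from this review beyond the 20–50-fold reduction it quotes.

```python
import math


def sphere_surface_to_volume(volume_nl):
    """Surface area-to-volume ratio (in 1/mm) of a sphere of volume_nl nanoliters.

    Idealized geometry: real crystallization drops are not spheres.
    """
    if volume_nl <= 0:
        raise ValueError("volume must be positive")
    volume_mm3 = volume_nl * 1e-3  # 1 nL = 1e-3 mm^3, since 1 uL = 1 mm^3
    radius_mm = (3.0 * volume_mm3 / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 3.0 / radius_mm  # (4*pi*r^2) / ((4/3)*pi*r^3) = 3/r


if __name__ == "__main__":
    reference = sphere_surface_to_volume(1000.0)  # a 1 uL drop
    for volume_nl in (1000.0, 50.0, 20.0):  # 1x, 20x and 50x smaller
        ratio = sphere_surface_to_volume(volume_nl)
        print(f"{volume_nl:7.1f} nL: {ratio:6.2f} per mm "
              f"({ratio / reference:.2f}x the 1 uL drop)")
```

Because the ratio scales with the inverse cube root of the volume, a 20-fold reduction in volume raises it about 2.7-fold and a 50-fold reduction about 3.7-fold under this idealization.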

A new crystallization system that uses nanovolumes and is based on a free-interface diffusion technique, the Topaz Crystallizer™ available from Fluidigm [49], may offer orthogonal advantages over the more traditional vapor diffusion methods. A small volume of protein is pumped through the chip to small chambers where it is allowed to slowly mix, through small channels, with a number of precipitants held in separate chambers. Small pressure valves distributed throughout the chip are used to open and close channels and to pump liquids. The chip is based on fluidic microprocessor technology developed by Stephen Quake and colleagues at Caltech, which has potential for automating and miniaturizing a large number of steps in the proteomics laboratory [50].

The production of thousands of crystallization experiments in a single day leads to the greatest bottleneck in a crystallization experiment: each crystallization trial has to be visually inspected several times over the lifetime of the experiment to detect crystals. This problem has been addressed by a number of groups by developing automated imaging systems that combine a storage system with a motorized microscope and charge-coupled device (CCD) camera. Currently, there are ten crystal storage and imaging systems available commercially [44]. These systems typically store the images as JPEG files that can be scanned quickly by the crystallographer and examined for crystal growth. The crystallization plate storage developed at GNF and deployed at Syrrx involves a gantry-configuration crystal imaging and storage system that can automatically capture and process 138,000 images per day at both 4 and 20°C [44].

Substantial effort has been dedicated to the development of image analysis software to find the crystals in the large number of images that are generated by these systems. One such system has been developed by GNF/Joint Center for Structural Genomics (JCSG) [51]. However, due to the complex and irregular nature of precipitates in crystallization droplets, refractive skins that can form on the surfaces of droplets and other complicating factors, the software sometimes misses crystals or, more frequently, characterizes precipitates and skins as crystals. While the software can be tuned to detect more than 95% of successful crystallization experiments (i.e., to reduce the chance of missing a crystal), this sensitivity comes at the cost of a very high rate of false-positive indications [51]. While this level of accuracy has proven acceptable for scanning easy-to-crystallize prokaryotic targets, it still does not suffice for drug discovery applications where a smaller number of high-value, very difficult targets with a low probability of crystallizing are being pursued.
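The trade-off described above, between detecting most crystals and accepting many false positives, is the usual consequence of lowering a decision threshold on a classifier score. The sketch below is a generic illustration and does not represent the GNF/JCSG software: the function name, the toy scores and the labels are all invented for the example.

```python
def threshold_for_recall(scores, labels, target_recall=0.95):
    """Return the highest score threshold whose recall meets target_recall.

    scores: classifier outputs, higher meaning more crystal-like.
    labels: 1 for a drop that truly contains a crystal, 0 otherwise.
    Returns (threshold, recall, false_positive_rate).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one crystal and one non-crystal image")
    for threshold in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, labels) if s >= threshold]
        true_pos = sum(flagged)
        false_pos = len(flagged) - true_pos
        recall = true_pos / positives
        if recall >= target_recall:
            return threshold, recall, false_pos / negatives
    raise RuntimeError("unreachable: the lowest threshold flags every image")


if __name__ == "__main__":
    # Hypothetical scores for eight drop images (not real data).
    scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]
    labels = [1, 1, 0, 1, 0, 0, 1, 0]
    for target in (0.75, 0.95):
        print(target, threshold_for_recall(scores, labels, target))
```

On this toy set, asking for 75% recall flags one of the four non-crystal images, whereas asking for 95% forces the threshold down far enough to flag three of the four, which mirrors the behaviour the text describes.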

High-throughput x-ray data collection

Using high-throughput cloning/expression/crystallization methods, large consortia and industrial labs are able to generate several hundred crystals per month suitable for diffraction analysis. To keep up with this level of crystal production, tens to hundreds of crystals must be screened for diffraction prior to data collection, and data sets must be collected quickly, so that five to ten data sets are collected per day. These levels of throughput necessitate data collection at intense synchrotron radiation sources to rapidly obtain high-resolution diffraction data on the small crystals generated from automated crystallization experiments. Recent developments in rotating anode x-ray sources and optics by Rigaku/MSC have led to the development of in-house x-ray sources suitable for high-throughput applications such as the FR-E™ superbright generator that provides flux at the sample equivalent to second-generation synchrotron x-ray beamlines by focusing the x-rays onto a small area. Additionally, automated sample mounting is essential for high throughput, particularly at synchrotron sources, where entering and exiting the experimental hutch is a time-consuming procedure. An automated system can be operated remotely and run unsupervised, and samples can be automatically screened for diffraction and ranked in order of diffraction quality and/or resolution. Of equal importance, sample tracking and data capture from diffraction experiments can be automated.

The first automated system was developed at Abbott Laboratories [14], and is now being manufactured by Rigaku/MSC as the ACTOR system. Independently, automated systems have been developed and are in place at the Stanford Synchrotron Radiation Lab (SSRL; CA, USA) [52] and at the Advanced Light Source (ALS; Lawrence Berkeley National Laboratory, CA, USA) [53]. Other beamlines throughout the world are implementing automated sample handling and many systems should be in place in the near future. These systems utilize a robotic arm that retrieves conventionally flash-frozen crystals mounted in rayon loops on magnetic bases from storage, and places them on a motorized goniometer while keeping them frozen at liquid nitrogen temperatures. The use of the mounting robot in the hutch has had a dramatic impact on throughput, such that 100 crystals can be screened with x-rays in 3–5 h [53].
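The quoted screening rate of 100 crystals in 3–5 h translates into a per-crystal time budget with simple arithmetic, sketched below. The helper names are hypothetical, and the 24-hour extrapolation assumes continuous unattended operation, which the text does not claim.

```python
def minutes_per_crystal(n_crystals, hours):
    """Average screening time per crystal, in minutes."""
    if n_crystals <= 0 or hours <= 0:
        raise ValueError("n_crystals and hours must be positive")
    return hours * 60.0 / n_crystals


def crystals_per_day(n_crystals, hours, hours_per_day=24.0):
    """Extrapolated daily capacity, assuming uninterrupted operation."""
    if n_crystals <= 0 or hours <= 0:
        raise ValueError("n_crystals and hours must be positive")
    return n_crystals * hours_per_day / hours


if __name__ == "__main__":
    for hours in (3.0, 5.0):  # the range quoted for 100 crystals
        print(f"{hours:.0f} h: {minutes_per_crystal(100, hours):.1f} min per crystal, "
              f"{crystals_per_day(100, hours):.0f} crystals per 24 h")
```

This works out to roughly 2–3 min per crystal, which is comfortably within the requirement of screening tens to hundreds of crystals ahead of five to ten data collections per day.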

Structure determination & refinement

For most routine problems, structure determination, or the conversion of integrated x-ray intensities into atomic coordinates, has become a streamlined and automated process. In particular, the perfection of the Se-MAD method has greatly reduced the time between initial crystallization of a protein and the time the first model is available [54]. In this method, the methionines in a protein are replaced by selenomethionine. Since selenium absorbs x-rays differently than the rest of the atoms in the protein at wavelengths around the selenium absorption edge, the differences in the intensities that result from taking data at these wavelengths can be treated like data from isomorphous derivatives. The advantage over the traditional heavy atom method of solving protein structures is that the data are fully isomorphous and can be obtained from a single crystal, thus minimizing experimental errors. Initially, this method required the use of a methionine auxotroph and was limited to E. coli protein expression [11]. Procedures have been developed for inserting selenomethionine in yeast, insect and mammalian cell expression systems by adding exogenous selenomethionine while simultaneously downregulating internal methionine synthesis [55]. This has allowed the use of the Se-MAD method on more difficult human drug target proteins.

Phasing of Se-MAD data can be rapidly accomplished using a number of available computer programs [13,16–18]. Model coordinates can be rapidly obtained using the automated tracing and refinement procedures coded in the computer program ARP/wARP [56].

The rapid increase in the number of solved structures has made the method of determining structures by molecular replacement strategies increasingly successful. Molecular replacement techniques allow for much more rapid structure determinations than de novo methods because they do not require the generation of derivatized protein, data collection runs at multiple wavelengths or on multiple crystals, and building of the entire molecular model into experimental electron density. However, for this method to work, a previously solved structure with sequence identity of approximately 30% or greater needs to be available. As the structural proteomics projects currently in progress turn out an ever-increasing number of structures, the probability of finding molecular replacement models for new structures will continue to increase.
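The sequence identity criterion can be made concrete with a minimal sketch. It assumes the two sequences have already been aligned by some external tool, with '-' marking gaps, and it computes identity over columns where both sequences have a residue; other denominators are also in common use. The function names and the way the 30% figure is applied as a hard cutoff are assumptions for illustration, since in practice the threshold is approximate.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned, non-gap columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def is_mr_candidate(aligned_target: str, aligned_model: str, threshold: float = 30.0) -> bool:
    """Flag a solved structure as a possible molecular replacement search model."""
    return percent_identity(aligned_target, aligned_model) >= threshold


# Toy sequences for illustration only.
print(percent_identity("ACDE", "ACDF"))
print(is_mr_candidate("ACDEFGHIKL", "AYYYYYYYYY"))
```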

Once an initial structure determination is completed, it is worth stopping and carefully considering whether this crystal form is the ideal one to continue with for SBDD work. It is rare that a protein will crystallize in only one form. The ideal crystal for SBDD would diffract to better than 2.5 Å resolution, would allow compounds to be bound by soaking, and would be grown in conditions without high concentrations of salt, as high salt makes it difficult to dissolve compounds. Often, the model will have some mobile ends or loops with poor or no density, and a common strategy is to trim these parts in the construct and re-screen for crystals. Other strategies include replacing a few surface residues with another type. Of course, care must be taken not to disrupt the active site of the protein or to adversely affect the activity of the protein. Carefully improving the freezing conditions has also led to crystals with improved diffraction characteristics. In this regard, it is worth determining if the crystal can be grown in the presence of a cryoprotectant so that adding it later is not needed.

Data tracking

The vast amount of experimental data generated from a highly parallel crystallization platform requires a fully integrated information system. For each construct that enters the platform, all data relevant to gene cloning, gene expression, protein purification, protein biophysical characterization, crystallization conditions, crystal annotations, crystal harvesting, diffraction and structure solution must be captured. Such an informatics platform allows scientists real-time access to any experimental data pertaining to a specific project and tracks all the physical materials that progress through the system. Most importantly, the information system provides the basis for efficient bidirectional data sharing that increases platform efficiency. By providing a means for instant access to all the experimental data, the informatics platform enables rapid assessment of key experimental parameters in the gene to structure process to ultimately accelerate the speed at which diffraction quality crystals of traditionally difficult targets are obtained.
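One way to picture such an information system is a per-construct record that accumulates data at each stage of the gene to structure process. The sketch below is a hypothetical data structure, not a description of any platform mentioned in this review: the stage names follow the list given above, while the class, method and field names are assumptions.

```python
from dataclasses import dataclass, field

# Pipeline stages in the order listed in the text.
STAGES = (
    "cloning",
    "expression",
    "purification",
    "biophysical_characterization",
    "crystallization",
    "crystal_annotation",
    "harvesting",
    "diffraction",
    "structure_solution",
)


@dataclass
class ConstructRecord:
    construct_id: str                               # hypothetical identifier
    stage_data: dict = field(default_factory=dict)  # stage name -> captured data

    def record(self, stage: str, **data) -> None:
        """Attach experimental data to one pipeline stage."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.stage_data.setdefault(stage, {}).update(data)

    def current_stage(self):
        """Return the furthest stage with recorded data, or None if none yet."""
        reached = [s for s in STAGES if s in self.stage_data]
        return reached[-1] if reached else None


# Illustrative entries only.
rec = ConstructRecord("construct-001")
rec.record("cloning", vector="example-vector")
rec.record("expression", host="E. coli", soluble=True)
print(rec.current_stage())
```

Keeping every stage under one construct identifier is what allows downstream observations, such as poor diffraction, to be traced back to upstream choices, such as construct boundaries or purification conditions.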

Leveraging structural information for drug discovery

With the emergence of robotics, facile molecular biology tools and increases in computational power over the past 20 years, almost every facet of crystallography has been impacted, such that the volume of crystal structures that can be generated in an industrial setting has increased by orders of magnitude. Thus, SBDD already plays a more prominent role in the refinement of drug leads, as well as in the discovery of novel drug compounds. While x-ray crystallography has proven itself as a powerful method for refinement of small-molecule compounds into viable drug leads, it has traditionally been too slow to compete with HTS methods for the discovery of novel, high-quality drug leads. Now, with the advent of high-throughput paradigms for crystallographic structure determination, it is emerging as a powerful weapon in the drug discovery arsenal. Overall, small-molecule lead discovery has been favorably impacted by several new approaches over the past decade, including combinatorial chemistry, parallel chemical synthesis and HTS [57]. All of the above methods are used in the conventional approach to small-molecule lead discovery, where a protein target is screened against a large library of small drug-like compounds (thousands to millions), and the readout is typically an assay that measures the inhibition of the protein activity. Compounds that inhibit with 50% inhibitory concentration (IC50) values in the low micromolar range are usually classified as ‘hits’, and optimized for potency and pharmacologic properties by medicinal chemists. While many drugs have been successfully developed by this approach, limitations of the method have been recognized. At present, bioassay methods only detect high-affinity interactions, and it is not unusual that most of the hits from HTS fail to progress for a host of other reasons unrelated to their potency. Currently, a major hurdle is that the targets employed in this strategy are often not validated.

Over the past 5 years, a new alternative to HTS approaches that exploits x-ray crystallography, termed fragment- or scaffold-based drug discovery, has been introduced, which is showing early promise as a potent and efficient new drug discovery tool (FIGURE 1) [58]. Fragment-based discovery employs a library of smaller (<200 Da), simpler compounds with fewer functionalities than those typically employed in HTS, which consequently exhibit lower affinities (micro- to millimolar) than their HTS counterparts for a given protein target. X-ray crystallography is ideally suited for this approach, since it is able to detect much lower affinity interactions than typical bioassay methods [59]. Using fragment-based methods, crystals of a target protein are soaked with fragment mixtures, and their structures characterized. Once a fragment or combination of fragments has been found to bind to a target protein, they can be grown, decorated or linked to form larger drug-like compounds using SBDD. The power of the technique becomes apparent when one considers that this method can facilitate the development of leads with 3–5 orders of magnitude increases in compound affinity after synthesis of fewer than 1000 compounds. The most impressive published example of the application of this method to drug discovery is by Card and coworkers, who effectively used fragment-based approaches to develop novel, high-affinity phosphodiesterase (PDE)-4 inhibitors [58]. Starting with a library of 316 fragments, the authors were able to take parent fragments with approximately 100 µM affinity, and use SBDD to increase their affinities to the 20–100-nM range. Even more impressive was that only 21 compounds had to be synthesized to achieve this increase in affinity. Although it remains unclear whether selective PDE4 inhibitors will demonstrate clinical utility without side effects, this example does illustrate the relative speed and efficacy of this approach to the drug discovery process.
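The size of the affinity gain in the PDE4 example can be checked with simple arithmetic. The sketch below assumes the quoted affinities can be compared directly as concentrations in molar units; the function name is hypothetical.

```python
import math


def orders_of_magnitude(start_molar: float, end_molar: float) -> float:
    """Log10 of the fold improvement when affinity tightens from start to end."""
    return math.log10(start_molar / end_molar)


parent = 100e-6  # parent fragment, approximately 100 µM
for optimized in (100e-9, 20e-9):  # optimized range, 100 nM down to 20 nM
    fold = parent / optimized
    print(f"{optimized * 1e9:.0f} nM: {fold:.0f}-fold, "
          f"{orders_of_magnitude(parent, optimized):.1f} orders of magnitude")
```

Going from approximately 100 µM to 20–100 nM is a 1000- to 5000-fold improvement, or about 3 to 3.7 orders of magnitude, which sits at the lower end of the 3–5 orders of magnitude quoted for the approach in general.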

SBDD approaches are now components of the drug discovery pipeline with established track records at almost all large pharmaceutical companies, as well as many newer small and midsized companies around the world. Some of the newer approaches, such as fragment-based drug design, have been coupled with the high-throughput structural biology engines at several new biotechnology companies, including Plexxikon, Astex and Structural GenomiX. While the fragment-based methodology has shown considerable early promise, the next several years will be telling, as compounds generated using the method enter the clinic. Then the results from the preliminary studies can be judged, allowing for a quantitative comparison of the speed, robustness and cost effectiveness of fragment-based SBDD to other HTS approaches.

Figure 1. Fragment-based screening. Hits with small-molecule fragments are optimized using several methods. (A) Fragment (1) is observed binding to a pocket in the drug target. Other functionalities can be tethered to it so that it can be adapted to bind in nearby pockets. (B) Simultaneously bound fragments in adjacent pockets can be linked to increase affinity. (C) Careful analysis of individual fragment-binding modes allows rational modification of individual fragments (i.e., conversion of (1) to (3)) to improve potency or the pharmacokinetic properties of the original compound.

Expert commentary

Parallelized high-throughput approaches that have been developed by structural genomics consortia have greatly increased the success rates for generating diffraction-quality crystals. However, these approaches have not yet been very successful at expression, production and isolation of many of the most appealing drug targets. These targets will require specialized approaches. Progress can be enhanced by more extensive implementation of high-throughput methodologies by the general scientific community or by more specialized focus of structural genomics initiatives. SBDD, including fragment-based approaches, has the potential to provide an increasing number of lead compounds, but needs to be accompanied by novel approaches for assessing their pharmacologic and biologic properties. The integration of structural biology and structural genomics into multidisciplinary research groups should be encouraged as it may be one of the most effective means of coordinating all aspects of the drug development process.

Five-year view

The next 5 years will see continued progress by structural genomics consortia and industry in solving an ever-expanding number of protein structures. Concurrently, various aspects of the high-throughput approaches will be increasingly adopted by the general scientific community and will provide more specialized strategies for tackling the most challenging target proteins. The availability of an increasing number of potential drug targets and fragment- or scaffold-based drug discovery tools will dramatically increase the number of lead compounds that are generated, possibly with an increasing involvement of smaller biotechnology companies. The increasing number of lead compounds will help to alleviate one bottleneck in drug development but will also create another. The availability of new structural leads against novel targets will provide incentive for development of the tools necessary to validate the pharmacologic and biologic properties of these targets. However, in turn, the increasing number of promising drug candidates that pass the initial animal testing phase will encounter another major bottleneck when facing the cost and complexity of the current drug approval process. Clearly, this bottleneck cannot be overcome by scientific or technological advances. If it is to be addressed, it will require political, business and economic solutions.

Key issues

• The advances in high-throughput crystallography need to be more accessible and more widely adopted by the general scientific community in order to facilitate the conquering of more challenging targets.

• Functional expression and crystallization of many potential drug targets remains a challenging problem that will require the continual development and application of a variety of innovative approaches.

• The increasing number of lead compounds that are generated by structure-based drug design will drive the need for innovation in the validation process.

• The drug approval process will increasingly become the major bottleneck for drug development, made more apparent by an increasing number of potential drug candidates.

References

Papers of special note have been highlighted as:
• of interest
•• of considerable interest

1 Kim EE, Baker CT, Dwyer MD et al. Crystal structure of HIV-1 protease in complex with VX-478, a potent and orally bioavailable inhibitor of the enzyme. J. Am. Chem. Soc. 117, 1181–1182 (1995).

2 von Itzstein M, Wu WY, Kok GB et al. Rational design of potent sialidase-based inhibitors of influenza virus replication. Nature 363(6428), 418–423 (1993).

3 Kim CU, Lew W, Williams MA et al. Influenza neuraminidase inhibitors possessing a novel hydrophobic interaction in the enzyme active site: design, synthesis and structural analysis of carbocyclic sialic acid analogues with potent anti-influenza activity. J. Am. Chem. Soc. 119, 681–690 (1997).

4 Ding HT, Ren H, Chen Q et al. Parallel cloning, expression, purification and crystallization of human proteins for structural genomics. Acta Crystallogr. D Biol. Crystallogr. 58(Pt 12), 2102–2108 (2002).

5 Edwards AM, Arrowsmith CH, Christendat D et al. Protein production: feeding the crystallographers and NMR spectroscopists. Nature Struct. Biol. 7(Suppl.), 970–972 (2000).

6 Claverie JM, Monchois V, Audic S, Poirot O, Abergel C. In search of new antibacterial target genes: a comparative/structural genomics approach. Comb. Chem. High Throughput Screen. 5(7), 511–522 (2002).

7 Brizuela L, Braun P, LaBaer J. FLEXGene repository: from sequenced genomes to gene repositories for high-throughput functional biology and proteomics. Mol. Biochem. Parasitol. 118(2), 155–165 (2001).

8 Holz C, Hesse O, Bolotina N, Stahl U, Lang C. A micro-scale process for high-throughput expression of cDNAs in the yeast Saccharomyces cerevisiae. Protein Expr. Purif. 25(3), 372–378 (2002).


9 Lesley SA. High-throughput proteomics: protein expression and purification in the postgenomic world. Protein Expr. Purif. 22(2), 159–164 (2001).

10 Hendrickson WA, Horton JR, Murthy HM, Pahler A, Smith JL. Multiwavelength anomalous diffraction as a direct phasing vehicle in macromolecular crystallography. Basic Life Sci. 51, 317–324 (1989).

11 Hendrickson WA, Horton JR, LeMaster DM. Selenomethionyl proteins produced for analysis by multiwavelength anomalous diffraction (MAD): a vehicle for direct determination of three-dimensional structure. EMBO J. 9(5), 1665–1672 (1990).

12 Garman E. Cool data: quantity AND quality. Acta Crystallogr. D Biol. Crystallogr. 55(Pt 10), 1641–1653 (1999).

13 Rupp B, Segelke BW, Krupka HI et al. The TB structural genomics consortium crystallization facility: towards automation from protein to electron density. Acta Crystallogr. D Biol. Crystallogr. 58(Pt 10 Pt 1), 1514–1518 (2002).

14 Karain WI, Bourenkov GP, Blume H, Bartunik HD. Automated mounting, centering and screening of crystals for high-throughput protein crystallography. Acta Crystallogr. D Biol. Crystallogr. 58(Pt 10 Pt 1), 1519–1522 (2002).

15 Muchmore SW, Olson J, Jones R et al. Automated crystal mounting and data collection for protein crystallography. Structure Fold Des. 8(12), R243–R246 (2000).

16 Lamzin VS, Perrakis A. Current state of automated crystallographic data analysis. Nature Struct. Biol. 7(Suppl.), 978–981 (2000).

17 Diller DJ, Redinbo MR, Pohl E, Hol WG. A database method for automated map interpretation in protein crystallography. Proteins 36(4), 526–541 (1999).

18 Terwilliger TC, Berendzen J. Automated MAD and MIR structure solution. Acta Crystallogr. D Biol. Crystallogr. 55(Pt 4), 849–861 (1999).

19 Yasutake Y, Yao M, Tanaka I. High-throughput protein crystallography. Tanpakushitsu Kakusan Koso 47(8 Suppl.), 1033–1037 (2002).

20 Sugahara M, Miyano M. Development of high-throughput automatic protein crystallization and observation system. Tanpakushitsu Kakusan Koso 47(8 Suppl.), 1026–1032 (2002).

21 Stewart L, Clark R, Behnke C. High-throughput crystallization and structure determination in drug discovery. Drug Discov. Today 7(3), 187–196 (2002).

22 Stevens RC, Wilson IA. Tech.Sight. Industrializing structural biology. Science 293(5529), 519–520 (2001).

23 Stevens RC. High-throughput protein crystallization. Curr. Opin. Struct. Biol. 10(5), 558–563 (2000).

24 Schmid MB. Structural proteomics: the potential of high-throughput structure determination. Trends Microbiol. 10(10 Suppl.), S27–S31 (2002).

25 Kuhn P, Wilson K, Patch MG, Stevens RC. The genesis of high-throughput structure-based drug discovery using protein crystallography. Curr. Opin. Chem. Biol. 6(5), 704–710 (2002).

26 Krupka HI, Rupp B, Segelke BW et al. The high-speed Hydra-Plus-One system for automated high-throughput protein crystallography. Acta Crystallogr. D Biol. Crystallogr. 58(Pt 10 Pt 1), 1523–1526 (2002).

27 Buchanan SG. Structural genomics: bridging functional genomics and structure-based drug design. Curr. Opin. Drug Discov. Devel. 5(3), 367–381 (2002).

28 Burley SK, Bonanno JB. Structuring the universe of proteins. Ann. Rev. Genomics Hum. Genet. 3, 243–262 (2002).

29 Blundell TL, Jhoti H, Abell C. High-throughput crystallography for lead discovery in drug design. Nature Rev. Drug Discov. 1(1), 45–54 (2002).

30 Pusey ML, Liu ZJ, Tempel W et al. Life in the fast lane for protein crystallization and x-ray crystallography. Prog. Biophys. Mol. Biol. 88(3), 359–386 (2005).

31 Walhout AJ, Temple GF, Brasch MA et al. GATEWAY recombinational cloning: application to the cloning of large numbers of open reading frames or ORFeomes. Methods Enzymol. 328, 575–592 (2000).

32 Heyman JA, Cornthwaite J, Foncerada L et al. Genome-scale cloning and expression of individual open reading frames using topoisomerase 1-mediated ligation. Genome Res. 9(4), 383–392 (1999).

33 Sasaki Y, Sone T, Yoshida S et al. Evidence for high specificity and efficiency of multiple recombination signals in mixed DNA cloning by the Multisite Gateway system. J. Biotechnol. 107(3), 233–243 (2004).

34 Locher KP, Lee AT, Rees DC. The E. coli BtuCD structure: a framework for ABC transporter architecture and mechanism. Science 296(5570), 1091–1098 (2002).

• Demonstrates the value of exploiting diversity of genomic information for tackling challenging targets for protein crystallography. The authors cloned 28 distinct and diverse ATP transporters from different species that led to solving the structure of the BtuCD transport complex.

35 Page R, Grzechnik SK, Canaves JM et al. Shotgun crystallization strategy for structural genomics: an optimized two-tiered crystallization screen against the Thermotoga maritima proteome. Acta Crystallogr. D Biol. Crystallogr. 59(Pt 6), 1028–1037 (2003).

36 Prinz B, Schultchen J, Rydzewski R et al. Establishing a versatile fermentation and purification procedure for human proteins expressed in the yeasts Saccharomyces cerevisiae and Pichia pastoris for structural genomics. J. Struct. Funct. Genomics 5(1–2), 29–44 (2004).

37 Possee RD. Baculoviruses as expression vectors. Curr. Opin. Biotechnol. 8(5), 569–572 (1997).

38 Laible PD, Scott HN, Henry L, Hanson DK. Towards higher-throughput membrane protein production for structural genomics initiatives. J. Struct. Funct. Genomics 5(1–2), 167–172 (2004).

• Describes an attempt to overcome limitations in production of integral membrane proteins by using a bacterial species that naturally has relatively large quantities of (internal) membranes. Although the approaches for protein production and purification are clearly not high throughput, the reported yields are impressive.

39 Kolodka D, Hoang TT, Surette M, Schryvers AB. Genome wide analysis of the response to expression of a foreign outer membrane protein. Mol. Microbiol. (2005) (In Press).

40 D’Arcy A. Crystallizing proteins – a rational approach? Acta Crystallogr. D Biol. Crystallogr. 50(Pt 4), 469–471 (1994).

41 Yokoyama S. Protein expression systems for structural genomics and proteomics. Curr. Opin. Chem. Biol. 7(1), 39–43 (2003).

42 Pedelacq JD, Piltch E, Liong EC et al. Engineering soluble proteins for structural genomics. Nature Biotechnol. 20(9), 927–932 (2002).

43 Cabantous S, Terwilliger TC, Waldo GS. Protein tagging and detection with engineered self-assembling fragments of green fluorescent protein. Nature Biotechnol. 23(1), 102–107 (2005).

• Description of a system for detection of solubility of expressed foreign proteins. The system involves fusing a small 15-mer peptide fragment of green fluorescent protein (GFP) to the C-terminus of the protein. The GFP subfragment forms a functional GFP molecule with the remainder of the GFP protein, which is expressed separately. This system has the advantage that the tag does not influence the solubility of the foreign protein.

44 Stevens RC. The cost and value of three-dimensional protein structure. Drug Disc. World 4, 35–48 (2003).

45 Santarsiero BD, Yegian DT, Lee CC et al. An approach to rapid protein crystallization using nanodroplets. J. Applied Crystallog. 35, 278–281 (2002).

46 Hosfield D, Palan J, Hilgers M et al. A fully integrated protein crystallization platform for small-molecule drug discovery. J. Struct. Biol. 142(1), 207–217 (2003).

47 Goodwill KE, Tennant MG, Stevens RC. High-throughput x-ray crystallography for structure-based drug design. Drug Discov. Today 6, S113–S118 (2001).

48 Carter DC, Rhodes P, McRee DE et al. Reduction in diffuso-convective disturbances in nanovolume crystallization experiments. J. Applied Crystallog. 38, 87–90 (2005).

49 Hansen CL, Skordalakes E, Berger JM, Quake SR. A robust and scalable microfluidic metering method that allows protein crystal growth by free interface diffusion. Proc. Natl Acad. Sci. USA 99(26), 16531–16536 (2002).

50 Thorsen T, Maerkl SJ, Quake SR. Microfluidic large-scale integration. Science 298(5593), 580–584 (2002).

51 Spraggon G, Lesley SA, Kreusch A, Priestle JP. Computational analysis of crystallization trials. Acta Crystallogr. D Biol. Crystallogr. 58(Pt 11), 1915–1923 (2002).

52 Abola E, Kuhn P, Earnest T, Stevens RC. Automation of x-ray crystallography. Nature Struct. Biol. 7(Suppl.), 973–977 (2000).

53 Snell G, Cork C, Nordmeyer R et al. Automated sample mounting and alignment system for biological crystallography at a synchrotron source. Structure (Camb.) 12(4), 537–545 (2004).

54 Pahler A, Smith JL, Hendrickson WA. A probability representation for phase information from multiwavelength anomalous dispersion. Acta Crystallogr. A. 46(Pt 7), 537–540 (1990).

55 Bushnell DA, Cramer P, Kornberg RD. Selenomethionine incorporation in Saccharomyces cerevisiae RNA polymerase II. Structure (Camb.) 9(1), R11–R14 (2001).

56 Morris RJ, Perrakis A, Lamzin VS. ARP/wARP’s model-building algorithms. I. The main chain. Acta Crystallogr. D Biol. Crystallogr. 58(Pt 6 Pt 2), 968–975 (2002).

57 Campbell SF. Science, art and drug discovery: a personal perspective. Clin. Sci. (Lond.) 99(4), 255–260 (2000).

58 Card GL, Blasdel L, England BP et al. A family of phosphodiesterase inhibitors discovered by cocrystallography and scaffold-based drug design. Nature Biotechnol. 23(2), 201–207 (2005).

•• Excellent example of the application of fragment-based screening to generate nanomolar lead compounds against phosphodiesterase type IV from a starting pool of 316 fragments and subsequent synthesis of 21 additional compounds.

59 Nienaber VL, Richardson PL, Klighofer V et al. Discovering novel ligands for macromolecules using x-ray crystallographic screening. Nature Biotechnol. 18(10), 1105–1108 (2000).

Patent

101 An innovation incorporated into the GNF robotic crystallization system, now commercially available from Syrrx/RTS, is the use of submicroliter (or nanoliter) crystallization volumes. US Patent No. 6,296,673

Affiliations

• Leslie W Tari, PhD

Director of Structural Biology, ActiveSight, 4045 Sorrento Valley Blvd, San Diego, CA 92121, USA
Tel.: +1 858 455 6870 ext. 104
Fax: +1 858 455 [email protected]

• Martin Rosenberg, PhD

Chief Scientific Officer, Promega Corporation, Research & Development, 2800 Woods Hollow Road, Madison, WI 53711-5399, USA
Tel.: +1 608 274 4330 ext. 1139
Fax: +1 608 277 [email protected]

• Anthony B Schryvers, PhD, MD

Professor and Alberta Heritage Foundation for Medical Research Scientist, University of Calgary, Department of Microbiology & Infectious Diseases, Faculty of Medicine, Calgary, AB T2N 4N1, Canada
Tel.: +1 403 220 3703
Fax: +1 403 270 [email protected]