PROTEINS IN SOLUTION AND AT INTERFACES

Wiley Series on

Surface and Interfacial Chemistry
Series Editors:
Ponisseril Somasundaran
Nissim Garti

Multiple Emulsion: Technology and Applications
By A. Aserin

Colloidal Nanoparticles in Nanotechnology
Edited by Abdelhamid Elaissari

Self-Assembled Supramolecular Architectures: Lyotropic Liquid Crystals
Edited by Nissim Garti, Ponisseril Somasundaran, and Raffaele Mezzenga

Proteins in Solution and at Interfaces: Methods and Applications in Biotechnology and Materials Science
Edited by Juan M. Ruso and Angel Pineiro

PROTEINS IN SOLUTION AND AT INTERFACES

Methods and Applications in Biotechnology and Materials Science

Edited by

JUAN M. RUSO
ANGEL PINEIRO

Copyright © 2013 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Proteins in solution and at interfaces : methods and applications in biotechnology and materials science / edited by Juan M. Ruso, Angel Pineiro.
pages cm

Includes bibliographical references and index.
ISBN 978-0-470-95251-1 (hardback)

1. Proteins–Biotechnology. I. Ruso, Juan M. (Ruso Beiras, Juan Manuel), editor of compilation. II. Pineiro, Angel, 1973– editor of compilation.
TP248.65.P76P768 2013
660.6′3–dc23

2012050742

Printed in the United States of America

ISBN: 9780470952511

10 9 8 7 6 5 4 3 2 1

CONTENTS

PREFACE

CONTRIBUTORS

PART I

1 X-Ray Crystallography of Biological Macromolecules: Fundamentals and Applications
Antonio L. Llamas-Saiz and Mark J. van Raaij

2 Nuclear Magnetic Resonance Methods for Studying Soluble, Fibrous, and Membrane-Embedded Proteins
Victoria A. Higman

3 Small-Angle X-Ray Scattering Applied to Proteins in Solution
Leandro Ramos Souza Barbosa, Francesco Spinozzi, Paolo Mariani, and Rosangela Itri

4 Analyzing the Solution State of Protein Structure, Interactions, and Ligands by Spectroscopic Methods
Veronica I. Dodero and Paula V. Messina

5 Resolving Membrane-Bound Protein Orientation and Conformation by Neutron Reflectivity
Hirsh Nanda

6 Investigating Protein Interactions at Solid Surfaces—In Situ, Nonlabeling Techniques
Olof Svensson, Javier Sotres, and Alejandro Barrantes

7 Calorimetric Methods to Characterize the Forces Driving Macromolecular Association and Folding Processes
Conceicao A.S.A. Minetti, Peter L. Privalov, and David P. Remeta


8 Virtual Ligand Screening Against Comparative Models of Proteins
Hao Fan

9 Atomistic and Coarse-Grained Molecular Dynamics Simulations of Membrane Proteins
Thomas J. Piggot, Peter J. Bond, and Syma Khalid

PART II

10 Preparation of Nanomaterials Based on Peptides and Proteins
Yujing Sun and Zhuang Li

11 Natural Fibrous Proteins: Structural Analysis, Assembly, and Applications
Mark J. van Raaij and Anna Mitraki

12 Amyloid-Like Fibrils: Origin, Structure, Properties, and Potential Technological Applications
Pablo Taboada, Silvia Barbosa, Josue Juarez, Manuel-Alatorre Meda, and Víctor Mosquera

13 Proteins and Peptides in Biomimetic Polymeric Membranes
Alfredo Gonzalez-Perez

14 Study of Proteins and Peptides at Interfaces By Molecular Dynamics Simulation Techniques
David Poger and Alan E. Mark

15 A Single-Molecule Approach to Explore the Role of the Solvent Environment in Protein Folding
Katarzyna Tych and Lorna Dougan

16 Enhanced Functionality of Peroxidases By Its Immobilization at The Solid–Liquid Interface of Mesoporous Materials and Nanoparticles
Jose Campos-Teran, Iker Inarritu, Jorge Aburto, and Eduardo Torres

17 Superactivity of Enzymes in Supramolecular Hydrogels
Ye Zhang and Bing Xu

18 Surfactant Proteins and Natural Biofoams
Malcolm W. Kennedy and Alan Cooper

19 Promiscuous Enzymes
Luis F. Olguin

20 Thermodynamics and Kinetics of Mixed Protein/Surfactant Adsorption Layers at Liquid Interfaces
Reinhard Miller, E.V. Aksenenko, V.S. Alahverdjieva, V.B. Fainerman, C.S. Kotsmar, J. Kragel, M.E. Leser, J. Maldonado-Valderrama, V. Pradines, C. Stefaniu, A. Stocco, and R. Wustneck

21 Application of Force Spectroscopy Methods to the Study of Biomaterials
Chuan Xu and Erika F. Merschrod S.

22 Protein Gel Rheology
Katie Weigandt and Danilo Pozzo


23 Exploring Biomolecular Thermodynamics in Aqueous and Nonaqueous Environments using Time-Resolved Photothermal Methods
Randy W. Larsen, Carissa M. Vetromile, William A. Maza, Khoa Pham, Jaroslava Miksovska

INDEX

PREFACE

This book is the result of superb collective research work carried out by scientists from around the world in the field of protein science. A compendium of 23 chapters, it provides an introduction to this complex and wonderful world and outlines the impact of new theoretical and technical developments in disciplines such as material design, genetic engineering, personalized medicine, drug delivery, biosensors, and biotechnology.

If the incorporation of new technologies and more efficient methods is common practice in all fields of science, it is precisely in the study of proteins where innovation has reached impressive proportions. In the words of several contributors to this book: these new insights complement and extend our knowledge of proteins and their potential applications to unimaginable levels. The technological factor is addressed in all its complexity, and the rich tapestry of scientific, technological, and economic integration associated with the new infrastructure is illustrated in detail. The book has a wide scope, addressing a large diversity of methodologies and current applications based on proteins. However, homogenizing dynamics could meet resistance in the traditional idiosyncrasies of experimental and theoretical work. Finding the appropriate balance between both approaches is one of the underlying purposes of this book. In this way, we have tried to make this book useful for people from both academic and industrial environments. The chapters have been written by selected and reputed experts in their respective fields. In general, all contributions start with an introduction at the fundamental level and then grow in complexity as the chapter unfolds, ending with concluding remarks and future perspectives. This also makes the book appropriate for teaching (academic) purposes.

The book is organized in two parts. The first part focuses on the introduction and description of concepts and techniques of universal application that are typically employed to study protein systems. Particular approaches subjected to specific conditions in different contexts are also included. The fascination with new methods is evident and the authors, who are responsible for facilitating dissemination, assume the role of protagonists. Simultaneously, the appreciation for techniques already consolidated emerges with equal intensity, against the background of critical and efficient reviews that expose the many reasons why they remain in the thick of things. In short, new and classic technologies are woven together in mutual attraction and diverse perspectives that run throughout the book, with trips to both sides that encourage us to learn from the advantages and drawbacks of the different methods.

Eight chapters are devoted to the detailed description of the intricacies of physical principles, devices, and procedures on which the experimental or theoretical methods rely. This part of the book is self-consistent since the evolution of the multidirectional technical progress also brings new protocols and developments that can be applied to different fields, meeting the demands of social and economic interests.

The organization of the chapters within the first part of the book requires a cross reflection on several elements and the result would probably depend on the reader. An overall analysis would expose more clearly and convincingly the potential of joint research using multiple methods. Resources for this added discussion are available in the chapters and the readers can split or unify according to their personal interests.

The main body of this part comprises contributions that focus on experimental methods. Specifically, Chapters 1 to 6 offer a comprehensive picture of several techniques that, in general, are complementary to each other and that provide different levels of detail on the studied samples. Chapter 1 reviews, from a practical point of view, the basis of and recent advances in X-ray crystallography aimed at determining the three-dimensional structure of proteins. This includes the concepts required to understand the fundamental theory. Chapter 2 describes recent efforts to address several challenging issues by nuclear magnetic resonance (NMR) techniques; these include solid-state MAS NMR, dynamics of proteins across a variety of time scales, and intermolecular interactions. Thus, the capability of NMR in the study of large proteins is illustrated together with its short-term perspectives. Chapter 3 shows that the use of small-angle X-ray scattering can greatly facilitate solving the structure of proteins and protein aggregates in solution. Key information regarding protein structure such as radius of gyration, spatial dimensions, folding pathway, molecular weight, or aggregation state can be obtained. Chapter 4 reports on different spectroscopic techniques such as ultraviolet-visible, circular dichroism, fluorescence, Raman, FT infrared, and photon correlation. The authors provide a general overview of these techniques, focusing on methods available for studying protein secondary structure and for assessing changes in the structure as a result of internal or external factors. Chapter 5 highlights the ability of neutron reflectivity to characterize membrane protein structure at the molecular level. After an overview of the most important practical aspects, several recent works are used to demonstrate how integrating high-resolution structures into reflectivity refinement procedures resolves molecular details of protein penetration and orientation on the membrane, as well as conformational changes relevant to their biological function. In Chapter 6, emphasis is placed on surface analytical techniques such as ellipsometry, dual polarization interferometry, surface plasmon resonance, quartz crystal microbalance, and atomic force microscopy. A detailed description of the techniques and the evaluation of the resulting information are followed by an elegant analysis useful for choosing the combination of techniques that best suits the goals of the experimentalist.

Thermodynamics is central to understanding the stabilities and energetics of proteins, and the reactions and interactions that they undergo. This is treated in depth in Chapter 7 through microcalorimetric methods. Differential scanning and isothermal titration microcalorimetry are described and reviewed for an understanding of the relationship between the structure of proteins, the energetics of their stability, and binding with other biomolecules. In keeping with this thematic breadth, Chapters 8 and 9 draw on computational approaches. Chapter 8 merges computational techniques like protein modeling and docking in an integrated protocol that can serve for protein structure prediction and ligand discovery. This is an efficient manner of exploiting the large amount of available information on ligand–protein interactions. Chapter 9 covers fundamental and advanced topics on molecular dynamics simulations. After a description of the theoretical basis of the technique, molecular dynamics simulations at the atomistic and coarse-grained levels are discussed in more detail, with special attention to membrane protein systems. Finally, more recent advances like multiscale approaches are introduced.

Having presented some of the most important experimental and theoretical techniques that are typically employed to deal with protein systems, the second part of this book handles several of the principal present-day applications in the frame of protein science.

Chapter 10 puts the accent on nanomaterials based on peptides and proteins to deal with more sustainable systems. Collagen networks, lysozyme monolayers, or protein cages are nice examples of nanostructured systems prepared by facile synthetic routes. Applications to human health and environmental concerns are offered. Chapter 11 reviews the function, structure, and assembly of fibrous proteins. Fibrous structural motifs show great potential for the design and engineering of novel biomaterials. Nowadays, it is challenging for a scientist to design multifunctional materials of high complexity through the combination of different fibrous motifs. Chapter 12 is devoted to amyloids. The main aspects concerning the origin and possible mechanisms by which proteins fibrillate, with special emphasis on the factors that can both originate and influence this process, are described. At the end, some potential biotechnological applications are summarized. In Chapter 13, a very interesting point of view is presented: the possibility of incorporating functional membrane proteins in lipid-free polymeric membranes. This has opened new, unexpected possibilities to investigate membrane protein functionality in addition to developing applications based on these systems. Chapter 14 covers the study of the interaction of peptides and proteins at interfaces using molecular dynamics simulation techniques. The chapter focuses on the main types of interfaces: membranes, air/oil–water, water–organic, and water–inorganic interfaces. It includes a brief introduction to the most important aspects of computational simulations, highlighting the advantages and drawbacks of the several techniques at different levels of detail to deal with protein systems. The connection between computational results and a number of experimental techniques is also discussed. Chapter 15 is devoted to the important topic of single-molecule force spectroscopy. This concept was introduced only a few years ago and can systematically improve the knowledge of the role of the solvent environment, hydrogen bonds, hydrophobic collapse, or ligands in the complete unfolding and refolding pathways of a protein. Examples of this kind of study are described in this chapter. The interaction of enzymes with solid supports is addressed in Chapter 16. The immobilization of enzymes within a pore or on a surface, such as that of mesoporous materials, has made it possible to enhance enzyme performance and to produce more robust biocatalysts adapted to industrial requirements. Continuing along this line, Chapter 17 explores recent activities in achieving enzyme superactivity by means of molecular hydrogels.


This route provides a more convenient way to handle the enzymes and facilitates the efficient recovery and reuse of costly enzymes, enhancing their stability and performance. Chapter 18 focuses on proteins that exhibit surfactant activity in their native state, without association with other materials such as lipids or carbohydrates. Numerous potential applications exploiting their biocompatibility and biodegradability, comprising three-dimensional scaffolds/matrices for tissue growth, wound healing, or environmental remediation purposes, are clearly presented. Chapter 19 treats an original and interesting subject: enzyme catalytic promiscuity. Such behavior has started to be better understood and it has implications in diverse areas such as acquisition of new functionalities in nature, drug resistance, immune system function, signal transduction, and transcription regulation. In Chapter 20, theoretical and experimental methods are thoroughly presented to gain an extremely detailed picture of the adsorption of proteins and proteins mixed with surfactants at liquid interfaces. Chapter 21 is fully dedicated to the adaptation, both in the experiments and in modeling/analyzing the data, of nanoindentation experiments to biological materials. Chapter 22 serves to present key assumptions and aspects related to the mechanical properties of protein gels found in living systems by a combination of experimental and clinical techniques. In Chapter 23, time-resolved photothermal methods are canvassed to reveal novel insights into the intricate interplay between protein conformation, physiological function, and protein/surface interactions on fast timescales.

Last but not least, we would like to thank each and every one of the authors who contributed to this book. We are enormously grateful for many reasons that this short preface prevents us from enumerating. However, we do not want to close without emphasizing the two reasons that we believe are the most important: first, the close and professional collaboration during the editing process of this book, and second, their invaluable and in-depth scientific contributions.

Juan M. Ruso and Angel Pineiro
Santiago de Compostela, 2012

CONTRIBUTORS

Jorge Aburto, Coordinacion de Procesos de Transformacion, Instituto Mexicano del Petroleo, Col. San Bartolo Atepehuacan, Mexico.

E.V. Aksenenko, Institute of Colloid Chemistry and Chemistry of Water, Kiev, Ukraine.

V.S. Alahverdjieva, Nestle US R&D, PTC Marysville, Ohio, USA.

Leandro Ramos Souza Barbosa, Institute of Physics, University of Sao Paulo, Sao Paulo, Brazil.

Silvia Barbosa, Departamento de Física de la Materia Condensada, Facultad de Física, Campus Vida, Universidade de Santiago de Compostela, Santiago de Compostela, Spain.

Alejandro Barrantes, Biomedical Laboratory Science and Technology, Faculty of Health and Society, Malmo University, Malmo, Sweden.

Peter J. Bond, Department of Chemistry, The Unilever Centre for Molecular Science Informatics, University of Cambridge, Cambridge, UK.

Jose Campos-Teran, Departamento de Procesos y Tecnología, DCNI, Universidad Autonoma Metropolitana-Cuajimalpa, Artificios 40-sexto piso, Col. Hidalgo, Mexico.

Alan Cooper, School of Chemistry, University of Glasgow, College of Science and Engineering, Scotland, UK.

Veronica I. Dodero, Chemistry Department, Universidad Nacional del Sur, Bahía Blanca, Argentina. INQUISUR-CONICET.

Lorna Dougan, School of Physics and Astronomy, University of Leeds, Leeds, UK.

V.B. Fainerman, Donetsk Medical University, Donetsk, Ukraine.

Hao Fan, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, California Institute for Quantitative Biosciences, University of California, San Francisco, California.

Alfredo Gonzalez-Perez, Membrane Biophysics Group, Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, Copenhagen, Denmark.

Victoria A. Higman, Department of Biochemistry, University of Oxford, Oxford, UK.

Iker Inarritu, Departamento de Procesos y Tecnología, DCNI, Universidad Autonoma Metropolitana-Cuajimalpa, Artificios 40-sexto piso, Col. Hidalgo, Mexico.

Rosangela Itri, Institute of Physics, University of Sao Paulo, Sao Paulo, Brazil.

Josue Juarez, Departamento de Física de la Materia Condensada, Facultad de Física, Campus Vida, Universidad de Santiago de Compostela, Santiago de Compostela, Spain.

Malcolm W. Kennedy, Institute of Molecular, Cell and Systems Biology, Institute for Infection, Immunity and Inflammation, College of Medical, Veterinary and Life Sciences, University of Glasgow, Scotland, UK.

Syma Khalid, School of Chemistry, University of Southampton, Southampton, UK.

C.S. Kotsmar, Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California.

J. Kragel, Max Planck Institute of Colloids and Interfaces, Potsdam/Golm, Brandenburg, Germany.


Randy W. Larsen, Department of Chemistry, University of South Florida, Florida, USA.

M.E. Leser, Nestle US R&D, PTC Marysville, Ohio, USA.

Zhuang Li, State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, People’s Republic of China.

Antonio L. Llamas-Saiz, Unidad de Rayos X, RIAIDT, Edificio CACTUS, Campus Sur, Universidad de Santiago de Compostela, Santiago de Compostela, Spain.

J. Maldonado-Valderrama, University of Granada, Facultad de Ciencias, Granada, Spain.

Paolo Mariani, Department of Life and Environmental Sciences, Marche Polytechnic University, Ancona, Italy.

Alan E. Mark, The School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia.

William A. Maza, Department of Chemistry, University of South Florida, Florida, USA.

Manuel-Alatorre Meda, Departamento de Física de la Materia Condensada, Facultad de Física, Campus Vida, Universidad de Santiago de Compostela, Santiago de Compostela, Spain.

Erika F. Merschrod S., Department of Chemistry, Memorial University, St. John’s, Canada.

Paula V. Messina, Chemistry Department, Universidad Nacional del Sur, Bahía Blanca, Argentina. INQUISUR-CONICET.

Jaroslava Miksovska, Department of Chemistry and Biochemistry, Florida International University, Florida, USA.

Reinhard Miller, Max Planck Institute of Colloids and Interfaces, Potsdam/Golm, Brandenburg, Germany.

Conceicao A.S.A. Minetti, Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, New Jersey, USA.

Anna Mitraki, Department of Materials Science and Technology, University of Crete and Institute for Electronic Structure and Laser, Foundation for Research and Technology-Hellas (IESL-FORTH), Vassilika Vouton, Heraklion, Crete, Greece.

Víctor Mosquera, Departamento de Física de la Materia Condensada, Facultad de Física, Campus Vida, Universidad de Santiago de Compostela, Santiago de Compostela, Spain.

Hirsh Nanda, National Institute of Standards and Technology, Center for Neutron Research, Maryland, USA.

Luis F. Olguin, Facultad de Quimica, Universidad Nacional Autonoma de Mexico (UNAM), Mexico.

Khoa Pham, Department of Chemistry and Biochemistry, Florida International University, Florida, USA.

Thomas J. Piggot, School of Chemistry, University of Southampton, Southampton, UK.

David Poger, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia.

Danilo Pozzo, Chemical Engineering, University of Washington, Washington, USA.

V. Pradines, Laboratoire de Chimie de Coordination, Toulouse Cedex 04, France.

Peter L. Privalov, Department of Biology, The Johns Hopkins University, Maryland, USA.

David P. Remeta, Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, New Jersey, USA.

Javier Sotres, Biomedical Laboratory Science and Technology, Faculty of Health and Society, Malmo University, Malmo, Sweden.

Francesco Spinozzi, Department of Life and Environmental Sciences, Marche Polytechnic University, Ancona, Italy.

C. Stefaniu, Max Planck Institute of Colloids and Interfaces, Potsdam/Golm, Germany.

A. Stocco, Soft Matter Team, Laboratoire Charles Coulomb UMR 5221 CNRS-UM2, 34095 Montpellier Cedex 05, France.

Yujing Sun, State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, People’s Republic of China.

Olof Svensson, Department of Theoretical Chemistry, Lund University, Lund, Sweden.

Pablo Taboada, Departamento de Física de la Materia Condensada, Facultad de Física, Campus Vida, Universidad de Santiago de Compostela, Santiago de Compostela, Spain.

Eduardo Torres, Posgrado en Ciencias Ambientales, Centro de Química-ICUAP, Benemerita Universidad Autonoma de Puebla, Edificio 103G, Ciudad Universitaria, Puebla, Mexico.


Katarzyna Tych, School of Physics and Astronomy, University of Leeds, Leeds, UK.

Mark J. van Raaij, Departamento de Estructura de Macromoleculas, Centro Nacional de Biotecnología (CNB-CSIC), Madrid, Spain.

Carissa M. Vetromile, Department of Chemistry, University of South Florida, Florida, USA.

Katie Weigandt, Department of Chemical Engineering, University of Washington, Washington, USA.

R. Wustneck, Max Planck Institute of Colloids and Interfaces, Potsdam/Golm, Brandenburg, Germany.

Bing Xu, Department of Chemistry, Brandeis University, Massachusetts, USA.

Chuan Xu, Department of Chemistry, Memorial University, St. John’s, Canada.

Ye Zhang, Department of Chemistry, Brandeis University, Massachusetts, USA.

PART I

1 X-RAY CRYSTALLOGRAPHY OF BIOLOGICAL MACROMOLECULES: FUNDAMENTALS AND APPLICATIONS

Antonio L. Llamas-Saiz and Mark J. van Raaij

1.1 INTRODUCTION

X-ray crystallography is a powerful technique to determine the three-dimensional structure of any kind of molecule at atomic resolution, including that of biological macromolecules like proteins, nucleic acids, or any complex between them or with smaller compounds like ligands, drugs, cofactors, or inhibitors. The experimental result provided by this technique is the three-dimensional electron density map corresponding to the crystal subjected to the diffraction experiment. In this detailed and “amplified” image of the crystal, an atomic model for the molecules present can be built. The theoretical background involved in X-ray crystallography is very broad, covering different disciplines like mathematics, physics, chemistry, and even biology. The experimental setups can also be very complicated, like the beam lines at synchrotron installations, which include optical and experimental hutches full of dedicated equipment. In this chapter, we will try to cover the main concepts needed to understand the basic theory behind an X-ray diffraction crystal structure determination and to outline, from a practical viewpoint, the principal steps, in order to facilitate the interpretation of the structure determination process and the final results obtained.

1.2 FUNDAMENTALS OF X-RAY DIFFRACTION

1.2.1 X-Ray Radiation and Interaction with Matter

X-rays consist of photons from the region of the electromagnetic spectrum with energies above ultraviolet light and below gamma radiation. The energy ranges from approximately 0.12 to 120 keV, corresponding to wavelengths between 100 and 0.1 Å, respectively (1 Å equals 0.1 nm). The most energetic X-rays, known as hard X-rays, are the ones used in crystallography for single crystal structure determination because of their penetrating ability and because their wavelengths, which vary from 2 to 0.5 Å, are similar to the shortest interatomic distances present in solid matter [1].
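The correspondence between photon energy and wavelength quoted above follows from E = hc/λ. The short sketch below checks the numbers; the function names are illustrative and the constant is the product hc expressed in keV·Å.

```python
# Planck constant times speed of light, expressed in keV * angstrom.
HC_KEV_ANGSTROM = 12.39842


def wavelength_angstrom(energy_kev: float) -> float:
    """Photon wavelength in angstrom for a photon energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev


def energy_kev(wavelength: float) -> float:
    """Photon energy in keV for a wavelength in angstrom."""
    return HC_KEV_ANGSTROM / wavelength


if __name__ == "__main__":
    # Limits of the X-ray range and of the hard X-ray window used in crystallography.
    for e in (0.12, 120.0):
        print(f"{e:7.2f} keV -> {wavelength_angstrom(e):8.3f} A")
    for lam in (2.0, 0.5):
        print(f"{lam:7.2f} A   -> {energy_kev(lam):8.3f} keV")
```

Under these definitions, the 2 to 0.5 Å window used for single crystal work corresponds to photon energies of roughly 6 to 25 keV.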

X-rays interact almost exclusively with the electrons of matter. They do this in different ways: photoelectric absorption, Compton scattering, and Thomson scattering. Thomson scattering, also called coherent or elastic scattering, is predominant in the X-ray diffraction pattern obtained from a crystal. It is a pure scattering interaction and deposits no energy in the scattering material. In the classical free electron model developed by J.J. Thomson in 1898, the charged particle interacts with the X-ray electromagnetic field and starts to oscillate. Consequently, it emits secondary radiation of the same wavelength (same energy) in all directions. The intensity distribution as a function of the scattering angle (the angle between incident and scattered radiation) found using this classical model is comparable to that obtained from quantum mechanical calculations. As we can consider the electrons to be the only X-ray scatterers in a crystal, diffraction should therefore reveal the distribution of electrons, or the electron density, of the atoms or molecules of that crystal.
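For reference, the angular dependence mentioned above can be written out explicitly. The expression below is the standard classical result for a free electron and an unpolarized incident beam; the symbols are not defined in the chapter and are introduced here only for illustration: θ is the scattering angle between the incident and scattered directions, and r_e is the classical electron radius.

```latex
\frac{d\sigma}{d\Omega} = r_e^{2}\,\frac{1+\cos^{2}\theta}{2},
\qquad
r_e = \frac{e^{2}}{4\pi\varepsilon_{0} m_e c^{2}} \approx 2.82\times10^{-15}\ \mathrm{m}
```

The factor (1 + cos²θ)/2 equals 1 in the forward and backward directions and 1/2 at θ = 90°, so the scattered intensity varies only mildly with angle.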

1.2.2 Crystals and Symmetry

Why do we need crystals? Reconstructing the image of a single molecule using X-rays is still not possible, mainly for the following two reasons. The first one is that there is no easy way to focus scattered X-ray beams with lenses.

Proteins in Solution and at Interfaces: Methods and Applications in Biotechnology and Materials Science, First Edition. Edited by Juan M. Ruso and Angel Pineiro.
© 2013 John Wiley & Sons, Inc. Published 2013 by John Wiley & Sons, Inc.


The second reason is that a single molecule scatters X-rays very weakly. Having said that, X-ray diffraction from single molecules using X-ray lasers is under development [2]. Both limitations can be overcome by the use of crystals. The crystalline state concentrates the intensities scattered by every irradiated molecule in small and well-defined regions of space (i.e., it generates a diffraction pattern), thus increasing the local intensities many-fold and facilitating their measurement. After the “phase problem” has been solved (see Sections 1.2.7 and 1.3.4), the recombination of the diffracted beams is then performed by a crystallographer using crystallographic software. This is analogous to what happens in real time in an optical microscope.

The crystalline state is defined by the repetition of a single elemental unit (motif) by means of identical translations. In practice, the final result can be one-dimensional crystals (fibers, as used in fiber diffraction), two-dimensional crystals (a single layer of ordered molecules, as used in electron diffraction), or three-dimensional crystals (used in single crystal structure determination). That means that the crystal is the convolution of a repeating motif with a periodic lattice. A three-dimensional crystal is formed by a motif that repeats in a perfectly regular pattern in three dimensions. Choosing any arbitrary point in the pattern and all the equivalent points related by translation, a three-dimensional lattice can be defined in which all lattice points have exactly the same environment. A fundamental difference between a crystal (the pattern) and its lattice is that the first is a continuous medium (like its electron density) and the lattice is discontinuous. The point lattice is determined by all the points that correspond to successive repetitions of identical crystal components. A lattice may have additional symmetry operators besides its own translation operators and the symmetry operators belonging to the point group or space group of the corresponding crystal. For example, all crystallographic lattices are centrosymmetric.
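The statement that a crystal is the convolution of a motif with a lattice can be illustrated with a one-dimensional toy model. In the sketch below, the lattice is a comb of unit spikes, the motif is a made-up sampled density (the numbers are purely hypothetical), and their discrete convolution reproduces the motif at every lattice point.

```python
def convolve(lattice, motif):
    """Discrete linear convolution of two sequences (pure Python)."""
    out = [0.0] * (len(lattice) + len(motif) - 1)
    for i, weight in enumerate(lattice):
        if weight == 0:
            continue  # only lattice points contribute
        for j, value in enumerate(motif):
            out[i + j] += weight * value
    return out


def point_lattice(n_cells, period):
    """1 at each lattice point, 0 elsewhere: a discontinuous set of points."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n_cells * period)]


# Hypothetical 1D "electron density" of a single motif, sampled on 4 grid points.
motif = [0.0, 3.0, 1.0, 0.0]
period = len(motif)  # the cell length equals the motif length in this toy model
n_cells = 3

lattice = point_lattice(n_cells, period)
crystal = convolve(lattice, motif)[: n_cells * period]

if __name__ == "__main__":
    print(lattice)
    print(crystal)  # the motif repeated once per lattice point
```

The lattice carries only the translations, while all the density information sits in the motif, which is the distinction drawn above between the discontinuous lattice and the continuous pattern.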

A lattice plane can be defined for every set of three noncollinear lattice points. All the equivalent planes in the lattice, parallel and with the same periodic repetition, constitute the associated family of lattice planes. Each family is unambiguously named by the three Miller indices (hkl), which correspond to the number of times that the planes of the family intersect each of the three unit cell vectors a, b, and c, respectively.

The unit cell is the parallelepiped built on the basis vectors, a, b, and c, of a crystal lattice, which can be selected in many different ways. The most convenient choice is the volume enclosed by the set of three noncoplanar lattice vectors with the shortest possible lengths, sorted in a “right-handed” way. A primitive unit cell, containing only one lattice point, can always be defined. However, for symmetry reasons, basis vectors defining nonprimitive unit cells (i.e., face- or body-centered) are sometimes used instead, because they provide a more convenient coordinate system.
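As a small illustration of the unit cell as a parallelepiped, the sketch below computes the cell volume from the six cell parameters and checks the handedness of a set of basis vectors through the sign of the triple product a · (b × c). The function names are hypothetical and chosen only for this example.

```python
import math

def unit_cell_volume(a, b, c, alpha, beta, gamma):
    """Volume of the parallelepiped built on a, b, c (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)

def is_right_handed(a_vec, b_vec, c_vec):
    """True if the triple product a . (b x c) is positive."""
    b_cross_c = (
        b_vec[1] * c_vec[2] - b_vec[2] * c_vec[1],
        b_vec[2] * c_vec[0] - b_vec[0] * c_vec[2],
        b_vec[0] * c_vec[1] - b_vec[1] * c_vec[0],
    )
    return sum(x * y for x, y in zip(a_vec, b_cross_c)) > 0

# Illustrative hexagonal cell: a = b = 10, c = 20, gamma = 120 degrees
volume = unit_cell_volume(10, 10, 20, 90, 90, 120)
```

Swapping any two basis vectors changes the sign of the triple product, which is why the order of a, b, and c matters when the cell is chosen.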

TABLE 1.1 Crystal Systems and Bravais Lattices

| Crystal System | Lattice Centering Symbol | Lattice Symmetry^a | Conditions Imposed by Symmetry on Unit Cell Geometry |
|---|---|---|---|
| Triclinic | P | −1 (Ci) | None |
| Monoclinic | P, C | 2/m (C2h) | Unique axis b: α = γ = 90° |
| Orthorhombic | P, C, F, I | mmm (D2h) | α = β = γ = 90° |
| Tetragonal^b | P, I | 4/mmm (D4h) | a = b; α = β = γ = 90° |
| Trigonal | R | −3m (D3d) | a = b = c; α = β = γ |
| Hexagonal | P^b | 6/mmm (D6h) | a = b; α = β = 90°; γ = 120° |
| Cubic | P, F, I | m−3m (Oh) | a = b = c; α = β = γ = 90° |

^a Hermann–Mauguin (and Schoenflies) symbols.
^b The primitive hexagonal lattice is common to the trigonal and hexagonal crystal systems.

In three dimensions, seven kinds of lattices, or crystal systems, are possible: triclinic, monoclinic, orthorhombic, tetragonal, trigonal, hexagonal, and cubic (Table 1.1). The combination of the seven crystal systems and the possibility of choosing nonprimitive unit cells gives rise to 14 Bravais lattices.

The classification of a crystal into a crystal system is always determined by the symmetry of the lattice (the Laue class to which the crystal structure belongs, see next paragraph) and not by the relationships between the unit cell metric values. For example, a tetragonal unit cell will always have a = b and α = β = γ = 90°; the c axis can take any value, in most cases different from a and b, but it could be equal to them just by chance and the crystal would still belong to the tetragonal system rather than to the cubic system.
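The point that cell metrics are only a necessary condition can be sketched in code. The hypothetical helper below lists every crystal system whose metric conditions from Table 1.1 are satisfied by a given cell, within an assumed tolerance; it deliberately does not (and cannot) decide the true crystal system, which requires the diffraction symmetry.

```python
import math

def metrically_compatible_systems(a, b, c, alpha, beta, gamma, tol=1e-3):
    """Crystal systems whose cell-geometry conditions are met (angles in degrees).

    A match is necessary but not sufficient: the actual system is decided
    by the symmetry (Laue class), not by the cell parameters.
    """
    def eq(x, y):
        return math.isclose(x, y, abs_tol=tol)

    def right(angle):
        return eq(angle, 90.0)

    systems = ["triclinic"]  # no metric condition at all
    if right(alpha) and right(gamma):
        systems.append("monoclinic")  # unique axis b
    if right(alpha) and right(beta) and right(gamma):
        systems.append("orthorhombic")
        if eq(a, b):
            systems.append("tetragonal")
            if eq(b, c):
                systems.append("cubic")
    if eq(a, b) and eq(b, c) and eq(alpha, beta) and eq(beta, gamma):
        systems.append("trigonal (R)")
    if eq(a, b) and right(alpha) and right(beta) and eq(gamma, 120.0):
        systems.append("hexagonal")
    return systems

# A tetragonal crystal whose c axis happens to equal a and b
candidates = metrically_compatible_systems(10, 10, 10, 90, 90, 90)
```

For this cell the list contains both "tetragonal" and "cubic", which is exactly the ambiguity described above.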

By definition, symmetry point groups apply to any object where at least one point remains invariant after the application of all its symmetry operations. Crystallographic point groups play their role in three-dimensional lattices (not in three-dimensional space in general), and in this particular case the rotations and rotoinversions allowed are restricted to 1, 2, 3, 4, 6, and −1, −2 (= m), −3, −4, −6, respectively. There are 32 crystallographic point groups (Table 1.2), also known as crystal classes. The Laue classes correspond to the 11 centrosymmetric crystallographic point groups. On the other hand, the crystal classes that include inversion centers or mirror planes are not allowed for crystals of enantiomerically pure substances, like the biological macromolecules. Crystals of chiral molecules display only one of the 11 enantiomorphic point groups (Table 1.2).

TABLE 1.2 The 32 Crystallographic Point Groups

| Laue Class | Noncentrosymmetric Groups Having the Same Laue Class |
|---|---|
| −1 | 1* |
| 2/m | 2*, m |
| mmm | 222*, 2mm |
| −3 | 3* |
| −3m | 32*, 3m |
| 4/m | 4*, −4 |
| 4/mmm | 422*, 4mm, −42m |
| 6/m | 6*, −6 |
| 6/mmm | 622*, 6mm, −62m |
| m−3 | 23* |
| m−3m | 432*, −43m |

* The 11 enantiomorphic point groups.
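The restriction of lattice-compatible rotations to orders 1, 2, 3, 4, and 6 can be checked numerically. The sketch below relies on a standard argument that is not spelled out in the text: in a lattice basis a rotation is represented by an integer matrix, so its trace, 1 + 2 cos(2π/n), must be an integer. The function name is hypothetical.

```python
import math

def allowed_rotation_orders(max_order=12, tol=1e-9):
    """Rotation orders n for which 2*cos(2*pi/n) is an integer."""
    allowed = []
    for n in range(1, max_order + 1):
        trace_part = 2.0 * math.cos(2.0 * math.pi / n)
        # The trace of an integer matrix is an integer, so trace_part must be one too
        if abs(trace_part - round(trace_part)) < tol:
            allowed.append(n)
    return allowed

orders = allowed_rotation_orders()
```

Five-fold and higher-than-six-fold axes fail the test, which is why they do not appear among the crystallographic point groups.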

The combination of the 32 crystallographic point groups with the 14 Bravais lattices gives rise to 73 symmorphic space groups. In a symmorphic space group, all generating symmetry operations, with the exception of the lattice translations, leave at least one common point fixed. To complete the 230 space groups possible in three-dimensional crystal patterns, another kind of symmetry element must be taken into account: the screw axes and glide planes, in which rotations and reflections, respectively, are combined with translational displacements.

Crystallographic space groups apply to infinite periodic patterns. Therefore, according to the previous description, the symmetry elements of the space groups are translations, symmetry elements of the crystallographic point groups, screw axes, and glide planes. In any case, the space group of a crystal structure determines its point group uniquely, but not vice versa. For a complete description of all symmetry elements compatible with three-dimensional periodic patterns (crystals), see the International Tables for Crystallography, Volume A, in Reference 3. Space groups with mirror planes and/or inversion centers are not allowed for crystals of biological macromolecules, like proteins or nucleic acids, due to the enantiopure nature of these molecules. This means that there are only 65 space groups available for the enantiomorphic crystal structures of biological macromolecules.

The symmetry elements of the crystal space group operate inside the crystal unit cell; therefore, it is possible to define an “asymmetric unit” of the unit cell. The asymmetric unit is the independent fraction of the unit cell that generates the whole crystal structure once all the symmetry operations of the space group are applied. The structural description of this asymmetric unit plus the indication of the corresponding space group is all that is needed to represent the complete crystal structure (and is thus what is normally used by crystallographers and crystallographic programs, and what is deposited in databases such as the Protein Data Bank).
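To illustrate how the asymmetric unit plus the space group regenerates the unit cell contents, the sketch below applies the two general-position operators of space group P2₁ (unique axis b), namely (x, y, z) and (−x, y + 1/2, −z), to fractional coordinates. The atom label and coordinates are invented for the example, and the helper names are hypothetical.

```python
def apply_symop(op, xyz):
    """Apply (rotation rows, translation) to fractional coordinates, wrapped to [0, 1)."""
    rot, trans = op
    return tuple(
        (sum(rot[i][j] * xyz[j] for j in range(3)) + trans[i]) % 1.0
        for i in range(3)
    )

# General positions of P2(1), unique axis b
P21_OPS = [
    (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0.0, 0.0, 0.0)),    # x, y, z
    (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (0.0, 0.5, 0.0)),  # -x, y+1/2, -z
]

def expand_to_unit_cell(asu_atoms, symops):
    """Generate every symmetry copy of the atoms in the asymmetric unit."""
    return [(name, apply_symop(op, xyz)) for op in symops for name, xyz in asu_atoms]

# One illustrative atom in the asymmetric unit
cell_contents = expand_to_unit_cell([("CA", (0.1, 0.2, 0.3))], P21_OPS)
```

Applying the lattice translations to `cell_contents` would then fill the whole crystal.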

1.2.3 Diffraction by Crystals

Crystals are made up of atoms; therefore, let us first consider the X-ray scattering by the atomic electron cloud (considered spherical in shape to a first approximation). The scattering amplitude of an atom is called the atomic scattering factor, or form factor, f. It expresses the scattering power of one atom relative to that of a free single electron, and it is calculated and averaged for spherical electron density distributions.

The values for f are tabulated in the International Tables for Crystallography, Volume C, Table 6.1.1.1, page 555, in Reference 4, for each atom type as a function of sin θ/λ. Usually its value is calculated using Equation 1.1 and the tabulated set of nine Cromer–Mann coefficients, a_i, b_i (i = 1 to 4), and c, in a parameterization of the nondispersive part of the atomic scattering factor for each atom (see Table 6.1.1.4 in Reference 4). This expression is very convenient for calculation in crystal structure software suites. These values are real numbers if the X-ray wavelength is not close to an absorption edge of the atom. Near the absorption edges, the atomic scattering factors become complex numbers, as expressed in Equation 1.2, where f° is the “normal” atomic scattering factor, f′ is the real part of the correction, and f″ is the imaginary one, which is always π/2 ahead in phase of f° [5]. The anomalous dispersion (or more rigorously, resonant scattering) effect, far from being an inconvenience, is a very useful tool to solve crystal structures of macromolecules (see the description of Friedel’s law below and Section 1.3.4.3).

The scattering amplitude of an atom always depends on the angle: it decays with increasing scattering angle for two reasons. The first is interference between the rays scattered from different regions of the atomic electron cloud. In the incident beam direction (θ = 0), all electrons scatter in phase, there is no decay from this source, and the atomic scattering factor is equal to the number of electrons in the atom. This type of decay is reflected in the tabulated values and represented with solid lines in Figure 1.1. The second source of decay is atomic displacement, which makes the apparent size of an atom during the X-ray exposure time larger than it would be at rest (dashed line in Figure 1.1 and Equation 1.3). The spreading of the atomic electron cloud may be due to temperature-dependent atomic vibrations around the equilibrium position (dynamic disorder) or to the situation where equivalent atoms in different unit cells stray around different equilibrium positions. The latter is called static disorder and is temperature-independent. During a typical X-ray diffraction experiment, the measured intensities of the diffracted beams have been averaged in space over all the unit cells that diffract simultaneously. They have also been averaged in time during the data acquisition period, which is much longer than the atomic vibration periods.

$$f^{\circ}(\sin\theta/\lambda) = \sum_{i=1}^{4} a_i\, e^{-b_i(\sin\theta/\lambda)^2} + c \tag{1.1}$$

$$f = f^{\circ} + f' + i\,f'' \tag{1.2}$$

$$f_B = f\, e^{-B(\sin\theta/\lambda)^2}, \qquad B = 8\pi^2 U^2 \tag{1.3}$$

FIGURE 1.1 Schematic representation of the theoretical atomic scattering factors, f, for C, N, and O atoms at rest as a function of the scattering angle, plotted against sin θ/λ (solid lines). Faster decay is observed for vibrating atoms and plotted for the C atom (dashed line) when the atomic displacement parameter, U or B, is different from zero.
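Equations 1.1 and 1.3 are straightforward to evaluate. The sketch below does so for carbon; the nine coefficients are quoted here as an assumption from the commonly tabulated Cromer–Mann set and should be checked against Table 6.1.1.4 in Reference 4 before any real use. The displacement parameter B is passed in directly, and the function names are hypothetical.

```python
import math

# Cromer-Mann coefficients for neutral carbon (assumed values; verify against
# the International Tables before relying on them)
CARBON = {
    "a": (2.3100, 1.0200, 1.5886, 0.8650),
    "b": (20.8439, 10.2075, 0.5687, 51.6512),
    "c": 0.2156,
}

def f0(s, coeffs):
    """Equation 1.1: scattering factor at rest; s = sin(theta)/lambda in 1/Angstrom."""
    return sum(a * math.exp(-b * s * s)
               for a, b in zip(coeffs["a"], coeffs["b"])) + coeffs["c"]

def f_displaced(s, coeffs, b_factor):
    """Equation 1.3: scattering factor damped by the displacement parameter B (Angstrom^2)."""
    return f0(s, coeffs) * math.exp(-b_factor * s * s)

# Values along the horizontal axis of Figure 1.1
curve_at_rest = [f0(s, CARBON) for s in (0.1, 0.3, 0.5, 0.7)]
curve_vibrating = [f_displaced(s, CARBON, 20.0) for s in (0.1, 0.3, 0.5, 0.7)]
```

At s = 0 the sum of the a_i coefficients plus c is close to 6, the number of electrons in carbon, in line with the statement that all electrons scatter in phase in the forward direction.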

When considering the X-ray scattering by the whole crystal, periodicity imposes discontinuity on the resulting diffraction pattern. All scattered intensities are concentrated and magnified in well-defined directions in space where constructive interference of waves occurs, and they are recorded as sharp spots on the X-ray detector. The conditions for constructive interference of the diffracted beams are defined by Bragg’s law (Fig. 1.2) or the equivalent Laue equations. To obtain constructive interference between the waves reflected by successive planes, the equality 2d sin θ = nλ must hold, n being any positive integer (this is the mathematical expression of Bragg’s law). Sometimes it is useful to express this relation in terms of the corresponding family of planes (by use of the Miller indices) instead of the interplanar distance d. This gives rise to the set of Laue equations in three-dimensional space:

$$\mathbf{a}\cdot(\mathbf{s}-\mathbf{s}_0) = h\lambda$$

$$\mathbf{b}\cdot(\mathbf{s}-\mathbf{s}_0) = k\lambda$$

$$\mathbf{c}\cdot(\mathbf{s}-\mathbf{s}_0) = l\lambda,$$

where a, b, and c are the unit cell vectors, h, k, and l are the Miller indices of the corresponding family of planes, and s₀ and s are the unit vectors along the incident and diffracted directions, respectively.

FIGURE 1.2 Geometrical representation of Bragg’s law. The path difference between the X-ray waves that reach the first and second horizontal crystal planes of atoms, separated by a distance d, is equal to 2d sin θ.
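Bragg’s law can be turned into two small helper functions, one in each direction. This is a minimal sketch with hypothetical function names; the wavelength of 1.5418 Å used in the example is the usual value for Cu Kα radiation.

```python
import math

def bragg_d_spacing(theta_deg, wavelength, n=1):
    """Interplanar distance d from 2 d sin(theta) = n lambda (theta in degrees)."""
    return n * wavelength / (2.0 * math.sin(math.radians(theta_deg)))

def bragg_angle(d, wavelength, n=1):
    """Bragg angle theta in degrees for planes with spacing d."""
    x = n * wavelength / (2.0 * d)
    if x > 1.0:
        # sin(theta) cannot exceed 1: this spacing is out of reach at this wavelength
        raise ValueError("no diffraction possible: n*lambda/(2d) > 1")
    return math.degrees(math.asin(x))

CU_K_ALPHA = 1.5418  # Angstrom
theta_for_2A = bragg_angle(2.0, CU_K_ALPHA)
```

The `ValueError` branch reflects a practical limit: with a given wavelength, spacings shorter than λ/2 cannot be observed.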

1.2.4 Real and Reciprocal Space

Given any crystal lattice in real space, it is always possible to construct its one-to-one related counterpart in reciprocal space, the reciprocal crystal lattice. The reciprocal lattice is a very convenient tool for constructing and analyzing the X-ray diffraction pattern. It is obtained by positioning its lattice points along the direction perpendicular to each family of real lattice planes and at a distance from the origin, d∗, equal to the inverse of the interplanar distance corresponding to this family, d∗ = 1/d. According to this construction, each reciprocal lattice point is uniquely associated with a family of lattice planes in real space. Therefore, the Miller indices of this family also correspond to the coordinates of one lattice point in the three-dimensional reciprocal lattice.

As Bragg’s law shows, there is an inverse relation between the diffraction angle θ and the interplanar distance d. Reflections measured at higher diffraction angles correspond to shorter values of d and therefore contain structural information about the electron density distribution at higher resolution. More detail can be seen in electron density maps calculated with data measured up to higher diffraction angles.
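The construction of the reciprocal lattice and the relation d∗ = 1/d can be sketched as follows. The code builds Cartesian basis vectors from the cell parameters (one common axis convention, assumed here, places a along x and b in the xy plane), forms the reciprocal basis by cross products, and returns the interplanar distance for a given set of Miller indices. All function names are hypothetical.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def direct_basis(a, b, c, alpha, beta, gamma):
    """Cartesian cell vectors: a along x, b in the xy plane (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    cx = c * cb
    cy = c * (ca - cb * cg) / sg
    cz = math.sqrt(c * c - cx * cx - cy * cy)
    return (a, 0.0, 0.0), (b * cg, b * sg, 0.0), (cx, cy, cz)

def reciprocal_basis(a_vec, b_vec, c_vec):
    """a* = (b x c)/V, b* = (c x a)/V, c* = (a x b)/V."""
    volume = dot(a_vec, cross(b_vec, c_vec))
    return tuple(tuple(x / volume for x in cross(u, v))
                 for u, v in ((b_vec, c_vec), (c_vec, a_vec), (a_vec, b_vec)))

def d_spacing(hkl, cell):
    """Interplanar distance d = 1/d* for the family of planes (hkl)."""
    a_star, b_star, c_star = reciprocal_basis(*direct_basis(*cell))
    h, k, l = hkl
    d_star = tuple(h * x + k * y + l * z for x, y, z in zip(a_star, b_star, c_star))
    return 1.0 / math.sqrt(dot(d_star, d_star))

# Illustrative orthorhombic cell
d_110 = d_spacing((1, 1, 0), (10.0, 20.0, 30.0, 90.0, 90.0, 90.0))
```

Higher Miller indices give longer reciprocal vectors and therefore shorter d, which is the inverse relation between diffraction angle and resolution described above.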


1.2.5 Structure Factors

The structure factor, F, represents the total wave scattered by all the electrons in the whole unit cell. It can be thought of as the effective number of scattering electrons and is called the structure factor because it depends on the structure, that is, on the electron density distribution of the atoms in the unit cell. Due to the regular periodicity of crystals, it also depends on the scattering direction. The structure factor can be regarded as the sum of the scattering by the atoms in the unit cell, taking into consideration their positions and the corresponding phase differences between the scattered waves.

$$F(h,k,l) = \sum_{j=1}^{\text{atoms}} f_{(j)} \exp\!\left[2\pi i\left(hx_{(j)} + ky_{(j)} + lz_{(j)}\right)\right] \tag{1.4}$$

$$F(h,k,l) = |F(h,k,l)|\, e^{i\alpha(h,k,l)}$$

$$F_{hkl} = \sum_{j=1}^{\text{atoms}} f_j\,(a_j + i\,b_j) = A + iB.$$

It is a complex (vectorial) quantity and can therefore be represented in different ways, for example, by modulus and direction (phase) or as a complex number (real and imaginary parts), as shown in the Argand diagram (Fig. 1.3). It is important not to confuse the “direction” of the structure factor vector in the complex plane, which indicates the phase of the structure factor, with the “direction” of the diffracted X-ray beam in real space, which is determined by the crystal lattice geometry and the particular setup of the diffraction experiment.

FIGURE 1.3 Argand diagram for the representation of complex magnitudes (like the structure factors) in the complex plane. Real and imaginary components are located along horizontal and vertical axes, respectively. The structure factor F_hkl is drawn as a vector with modulus |F| = (A² + B²)^½ and phase φ = tan⁻¹(B/A), with components A = |F| cos φ and B = |F| sin φ.
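Equation 1.4 maps directly onto complex arithmetic. The following sketch sums the atomic contributions for a toy body-centered arrangement of two identical atoms; for simplicity it assumes a constant scattering factor of 6 for both atoms, ignoring the angular dependence of f, and the function name is hypothetical.

```python
import cmath

def structure_factor(hkl, atoms):
    """Equation 1.4; atoms is a list of (f, (x, y, z)) with fractional coordinates."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
               for f, (x, y, z) in atoms)

# Two identical atoms related by body centering (illustrative values)
body_centered = [(6.0, (0.0, 0.0, 0.0)), (6.0, (0.5, 0.5, 0.5))]

F_110 = structure_factor((1, 1, 0), body_centered)
amplitude = abs(F_110)        # modulus |F| in the Argand diagram
phase = cmath.phase(F_110)    # phase angle in radians
A, B = F_110.real, F_110.imag # real and imaginary components
```

In this arrangement the two waves add in phase when h + k + l is even and cancel when it is odd, which shows how atomic positions alone can extinguish a reflection.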

The diffracted intensity is proportional to the square of the modulus of the structure factor, I ∝ |Fhkl|². When the anomalous dispersion effect is negligible, the atomic scattering factors, f, of all atoms are real, and accordingly |Fhkl|² = |F−h−k−l|²; that is, the intensities of the hkl and −h−k−l reflections (a Friedel pair) are equal. This is known as Friedel’s law. The law does not hold for noncentrosymmetric crystals containing atoms that show anomalous dispersion, because of the imaginary part of the atomic scattering factors, f″. The difference between these intensities becomes larger when the X-ray wavelength used is close to an absorption edge of a particular atom in the crystal. Synchrotron radiation, which is a tunable X-ray source, may be used for this purpose. When the differences in intensity between both components of the Friedel pairs are clearly measured, the diffraction pattern reveals the symmetry of the actual point group of the crystal.
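The breakdown of Friedel’s law can be reproduced with a two-atom toy structure. In the sketch below the scattering factors and the anomalous corrections (f′ = −1, f″ = 3) are invented for illustration and do not correspond to any particular element or wavelength; the function names are hypothetical.

```python
import cmath

def structure_factor(hkl, atoms):
    """Equation 1.4 with (possibly complex) scattering factors."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
               for f, (x, y, z) in atoms)

def friedel_difference(hkl, atoms):
    """|F(hkl)|^2 - |F(-h-k-l)|^2 for one Friedel pair."""
    h, k, l = hkl
    plus = abs(structure_factor((h, k, l), atoms)) ** 2
    minus = abs(structure_factor((-h, -k, -l), atoms)) ** 2
    return plus - minus

# Noncentrosymmetric toy structure: a light atom and a heavier atom
normal = [(6.0, (0.0, 0.0, 0.0)), (26.0, (0.2, 0.3, 0.1))]
# Same structure, heavy atom now with f = f0 + f' + i f'' (assumed values)
anomalous = [(6.0, (0.0, 0.0, 0.0)), (26.0 - 1.0 + 3.0j, (0.2, 0.3, 0.1))]

diff_normal = friedel_difference((1, 1, 1), normal)
diff_anomalous = friedel_difference((1, 1, 1), anomalous)
```

With real scattering factors the difference vanishes to within rounding error; once f″ is nonzero the two intensities separate, and this difference is what anomalous dispersion methods exploit.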

1.2.6 Fourier Synthesis and Transform

The electron density distribution is a periodic function; therefore, it can be described as a Fourier series:

$$\rho(x,y,z) = \sum_{h'}\sum_{k'}\sum_{l'} C_{h'k'l'}\, e^{2\pi i(h'x + k'y + l'z)}. \tag{1.5}$$

Analogously to the discrete expression (Eq. 1.4), the structure factor can be expressed as a continuous summation (integration) of the electron density distribution over the whole unit cell volume:

$$F_{hkl} = \int_V \rho(x,y,z)\, e^{2\pi i(hx+ky+lz)}\, dv. \tag{1.6}$$

Substituting the electron density expression (Eq. 1.5) in Equation 1.6, and after some operations, it is not difficult to arrive at

$$\rho(x,y,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} F_{hkl}\, e^{-2\pi i(hx+ky+lz)}, \tag{1.7}$$

where the structure factors are the coefficients of this summation in the Fourier expansion.
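The intermediate operations can be sketched as follows, assuming fractional coordinates and integration over a single unit cell of volume V. Inserting the series of Equation 1.5 into Equation 1.6 and exchanging the order of summation and integration gives

$$F_{hkl} = \sum_{h'}\sum_{k'}\sum_{l'} C_{h'k'l'} \int_V e^{2\pi i[(h+h')x + (k+k')y + (l+l')z]}\, dv.$$

Each integral runs over whole periods of the exponential, so it vanishes unless h′ = −h, k′ = −k, and l′ = −l, in which case the integrand is 1 and the integral equals V. Only one term of the triple sum survives:

$$F_{hkl} = V\, C_{-h\,-k\,-l} \quad\Longrightarrow\quad C_{h'k'l'} = \frac{1}{V}\, F_{-h'\,-k'\,-l'}.$$

Putting these coefficients back into Equation 1.5 and renaming the summation indices (h = −h′, k = −k′, l = −l′) reverses the sign in the exponent and yields the Fourier synthesis with the structure factors as coefficients, Equation 1.7.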

Each structure factor contains contributions from all atoms in the unit cell. Its value (modulus and phase) is determined by the electron density distribution along the direction perpendicular to its associated diffracting family of planes.

The reciprocal space lattice weighted by the corresponding structure factors is the Fourier transform of the electron density distribution of the crystal structure. Therefore, the reciprocal lattice construction is a very convenient representation of the diffraction pattern. To obtain this information, every measured diffracted intensity has to be processed (see Section 1.3.3) to get the structure factor modulus after normalization and correction for Lorentz, polarization, and absorption effects.

1.2.7 The Phase Problem

Only X-ray-diffracted intensities can be measured, and from them only the value of the amplitude (modulus) can be estimated. No direct information about the phase can be recorded in the X-ray diffraction experiment, so the reciprocal space lattice can only be weighted by structure factor amplitudes. To calculate the Fourier transform and obtain the three-dimensional electron density map of the crystal, the value of the phase of each reflection is needed. The crystallographer must obtain the phase angles from further experimentation, as described in Section 1.3.4. This is what is called the “phase problem” in crystal structure determination [6, 7].
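A one-dimensional toy calculation makes the consequence of losing the phases concrete. The sketch below places two point-like atoms in a cell of unit length (invented scattering factors and positions), computes their structure factors, and then performs the Fourier synthesis twice: once with the true complex coefficients and once with amplitudes only, that is, with all phases set to zero. All names and values are assumptions made for the example.

```python
import cmath
import math

ATOMS = [(8.0, 0.25), (6.0, 0.60)]       # (scattering factor, fractional x)
H_MAX = 20                                # reflections -20 .. 20
GRID = [i / 200 for i in range(200)]      # sampling points along the cell

def structure_factor_1d(h, atoms):
    """One-dimensional analogue of Equation 1.4."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)

def fourier_synthesis(coeffs, x):
    """One-dimensional analogue of Equation 1.7 for a cell of unit length."""
    return sum(F * cmath.exp(-2j * math.pi * h * x) for h, F in coeffs.items()).real

true_F = {h: structure_factor_1d(h, ATOMS) for h in range(-H_MAX, H_MAX + 1)}
amplitudes_only = {h: abs(F) for h, F in true_F.items()}  # phases discarded

rho_true = [fourier_synthesis(true_F, x) for x in GRID]
rho_no_phase = [fourier_synthesis(amplitudes_only, x) for x in GRID]

peak_true = GRID[rho_true.index(max(rho_true))]
peak_no_phase = GRID[rho_no_phase.index(max(rho_no_phase))]
```

With the correct phases the highest peak of the map sits on the heavier atom; with amplitudes alone the map peaks at the origin and no longer shows where the atoms are.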

1.3 THE STRUCTURE DETERMINATION PROCESS

Determining the structure of a macromolecule is a process that consists of various steps, comprising many different techniques. The macromolecule, or complex of macromolecules, may have to be expressed in a suitable system if it cannot be isolated from natural sources. For this, a suitable expression vector will need to be constructed, involving genetic engineering and/or cloning. The molecule or complex of interest will have to be isolated and purified, either from its natural source or from the expression host, in sufficient amounts, usually several to many milligrams. Then, many different crystallization trials are performed. When crystals are obtained, they have to be manipulated to allow data collection, and where necessary, heavy atom derivatives may need to be prepared. Cocrystallization or crystal soaking experiments with natural or artificial ligands may also be performed. This part takes place in the laboratory, that is, in vitro. Data processing, structure determination, model construction, refinement, and validation take place in silico, using specialized computer programs developed for this purpose. All these steps are discussed below.

1.3.1 Sample Production and Conditioning

High-quality samples may be obtained by careful purification from natural sources in which the macromolecule or complex of interest is present in sufficient amounts and at high enough concentration to make purification feasible and worthwhile. Examples are myoglobin from sperm whale meat [8], hemoglobin from blood [9], elongation factor Tu and ribosomes from bacteria [10, 11], F1-ATPase from beef hearts [12], and light harvesting center from spinach leaves [13]. However, in many other cases, the macromolecule or complex of interest needs to be expressed in bacteria, yeast, insect cells, or mammalian cells.

1.3.1.1 Protein Expression in Bacteria For expression in prokaryotic systems (most often the bacterium Escherichia coli), expression vectors have to be constructed. Usually, expression plasmids are used. Plasmids are small circular DNAs that replicate in the bacterium independently from the chromosome. To select for bacteria containing the plasmid during cultivation, plasmids contain a gene encoding a protein that confers resistance to a certain antibiotic. For example, they may carry a gene encoding beta-lactamase, which hydrolyzes ampicillin and carbenicillin. Other commonly used antibiotics are kanamycin, streptomycin, and chloramphenicol, with their corresponding resistance-conferring genes. Positive selection of plasmid-containing bacteria is necessary because, without selection, bacteria that have lost the plasmid will inevitably have a growth advantage due to lower energy expenditure and will thus outgrow plasmid-containing ones.

To allow for replication in bacteria, plasmids must contain an origin of replication, the type of which also determines whether the plasmid is present at higher or lower copy numbers. In many cases, high plasmid copy numbers are desirable, because they facilitate plasmid purification and allow for the expression of high amounts of protein in less time. However, in cases where the protein folding rate is limiting, it may be preferable to have lower plasmid copy numbers, leading to less rapid protein expression and thus giving the expressed proteins more time to fold correctly. Growing the cultures at lower temperatures may also promote correct folding.

The promoter included in the expression vector, and its location upstream of the gene to be expressed, determine the amount of messenger RNA that will be produced and thus, indirectly, the rate and amount of protein that will be expressed. In principle, constitutive expression may be employed, but unless the expressed protein is useful for the expression host (e.g., a chaperone), the extra expenditure of energy to produce the protein will be disadvantageous and mutants that do not express the protein will accumulate during repeated growth/dilution cycles. Therefore, several inducible expression systems have been developed. Many use the P_LAC, P_TAC, or P_TRC promoters, inducible with the lactose analogue isopropyl-beta-D-thiogalactoside [14, 15]. Another popular system uses the P_T7 promoter, the late promoter of bacteriophage T7 [16]. In this case, T7 RNA polymerase has to be produced first, which is usually achieved using an expression host that contains a lambda lysogen called DE3, which encodes T7 RNA polymerase under the control of the isopropyl-beta-D-thiogalactoside-inducible lacUV5 promoter. The T7 RNA polymerase then produces the messenger RNA of interest.

Most inducible systems allow some protein expression even before induction. This means that if the protein or complex to be expressed is toxic to the host cells, a system with strong repression before induction must be used. An example of such a system uses the P_BAD promoter of the E. coli arabinose operon and its regulatory gene araC, allowing strong repression in the absence of L-arabinose (and even stronger repression if glucose is added to the culture medium) and high levels of messenger RNA generation after induction with L-arabinose [17]. If the protein to be expressed contains cystine (disulfide) bonds, expression in the reducing bacterial cytoplasm may lead to incorrectly folded protein. In this case, the protein to be expressed may be directed to the less-reducing bacterial periplasm via an N-terminal signal peptide, or bacterial strains mutated in thioredoxin reductase (trxB) and/or glutathione reductase (gor) may be used (like the E. coli Origami strain). For some proteins, coexpression with a specific chaperone or chaperones may be necessary for correct folding [18]. These may be encoded on the same plasmid or on another plasmid to be cotransformed into the bacteria, or their coding sequence may be integrated into the host genome. Another reason for low expression levels may be that the heterologous gene contains codons that are very rare in the bacterium used. Use of a strain overexpressing rare tRNA species may resolve this problem (for instance, the E. coli Rosetta strain).
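A quick scan of a coding sequence can flag whether rare codons are likely to be an issue. The sketch below is only illustrative: the set of codons treated as rare in E. coli is an assumption made for this example (six codons frequently cited in this context) and should be replaced by a codon usage table for the actual host strain; the function name is hypothetical.

```python
# Codons often cited as rare in E. coli (assumed list, for illustration only)
RARE_CODONS_ECOLI = {"AGA", "AGG", "ATA", "CTA", "CCC", "GGA"}

def find_rare_codons(coding_sequence, rare_codons=RARE_CODONS_ECOLI):
    """Return (codon number, codon) for every rare codon in an in-frame sequence."""
    seq = coding_sequence.upper().replace("U", "T")  # accept RNA or DNA letters
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    hits = []
    for start in range(0, len(seq), 3):
        codon = seq[start:start + 3]
        if codon in rare_codons:
            hits.append((start // 3 + 1, codon))  # 1-based codon numbering
    return hits

# Short made-up open reading frame: ATG AGA GGT CCC TAA
hits = find_rare_codons("ATGAGAGGTCCCTAA")
```

Clusters of hits, especially near the start of the gene, would be one reason to consider a tRNA-supplemented strain or a codon-optimized synthetic gene.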

When the object of interest is a protein complex, the proteins may be mixed after purification, or they may be mixed after expression and the resulting complex purified directly. Proteins may also be coexpressed using expression vectors encoding two or more proteins or by the use of multiple expression vectors in the same bacterial host. These multiple expression vectors should be compatible and encode different antibiotic resistance genes, so that selection using the relevant antibiotics simultaneously forces the bacteria to maintain all the plasmids. Terpe [19] has written a short but comprehensive review of commonly used bacterial expression systems.

1.3.1.2 Protein Expression in Eukaryotic Systems Not all eukaryotic proteins fold correctly in prokaryotic expression systems, in which case expression in eukaryotic systems may be tried. Eukaryotic systems may also be necessary if the expressed protein is to contain certain posttranslational modifications. As a single-celled and innocuous organism, the yeast Saccharomyces cerevisiae has been most extensively studied for protein production (reviewed in Reference 20). Expression plasmids have been developed with sequences for propagation in E. coli for DNA amplification and in yeast for protein expression experiments, including yeast promoters and terminators for the production of messenger RNA. Chromosomal integration of a suitable protein expression cassette is also an option, as plasmids are not always stably maintained in yeast cells. Another yeast species, Pichia pastoris, is noted for its high endogenous protein production capacity and is also used routinely [21]. In P. pastoris, expression vectors that integrate into the genome appear to be the norm. In both yeast systems, the proteins to be expressed may be directed to the medium or allowed to accumulate intracellularly.

Cloning the gene to be expressed in a viral vector and infecting eukaryotic cells with the resulting viruses is also a system that can produce high yields of protein. The most developed system for protein expression is the infection of insect (lepidopteran) cells with recombinant baculovirus [22]. Recombinant baculoviruses are constructed by replacing the polyhedrin gene by a gene encoding the protein of interest. Expression is controlled by the strong late polh promoter, which thus allows the production of the recombinant protein at high yield. In vivo, polyhedrin is produced in large amounts (up to 50% of the total infected larva protein mass) and is necessary to form occluded virus, which can survive in the environment until uptake by a new feeding caterpillar. In vitro, polyhedrin is not necessary for virus survival because budded virus can readily infect cultured insect cells and replicate in them. Methods to express multiple proteins to form protein complexes in the baculovirus/insect system have been developed [23]. Other viral systems that have been developed for protein expression include vaccinia virus [24], which allows transient expression in human cell lines (such as HeLa cells). Usually, the P_T7 promoter is used, and the T7 RNA polymerase necessary for this is either constitutively expressed in the cell line used or included in the recombinant vaccinia virus vector.

The DNA containing the gene for the protein to be expressed can also be transferred into eukaryotic cells by transfection. For this, a suitable DNA vector is usually constructed as a plasmid in E. coli and transfected into mammalian cells by electroporation or using cationic lipids (lipofection) for transient expression [25]. Popular cell lines are HEK293 [26], derived from human embryonic kidney, and CHO, derived from Chinese hamster ovary. Cells that have incorporated the DNA into their genome and express the recombinant protein in a stable manner may be selected.

1.3.1.3 Cell-Free Protein Expression If the protein to be expressed is toxic for living cells or very prone to degradation, a cell-free in vitro translation system may be a viable, albeit more expensive, solution. For in vitro protein expression, messenger RNA must first be produced by in vitro transcription. Bacteriophage T7 RNA polymerase may be used for this. In this system, the gene of interest is cloned behind a T7 promoter, allowing large amounts of messenger RNA to be produced when DNA, nucleoside triphosphates, and T7 RNA polymerase are mixed. For the translation step, apart from the messenger RNA, many other components are necessary (initiation factors, ribosomes, transfer RNAs, elongation factors, amino acids, ATP and GTP, termination factors, ions), so that usually cell extracts are used that contain all of them. Examples are rabbit reticulocyte lysate and wheat germ extract. Coupled systems are available in which the transcription and translation steps occur in the same tube, either by the same cell extract (such as an E. coli extract) or by mixing the components necessary for the two steps. An advantage of in vitro systems is that ligands or other protein interaction partners may be added that, in vivo, might not be taken up by the cell or might be degraded by living cells before they can interact with the expressed protein. These interaction partners may make the protein more soluble and/or more stable [27]. More details about cell-free expression systems are available in specialist books, such as that by Spirin and Swartz [28].

1.3.1.4 Production of Nucleic Acids For the study of DNA and RNA structure (alone and in complex with nucleic-acid-binding proteins like transcription factors and restriction enzymes), crystallization-quality nucleic acids will need to be obtained. DNA molecules may be synthesized chemically, and many companies provide oligonucleotide synthesis services. RNA oligos may also be synthesized but are more costly and difficult to produce due to the necessity of protecting the extra 2′-hydroxyl group. RNAs may also be produced in the lab by in vitro transcription [29]. The template may be a pair of complementary DNA oligonucleotides encoding the T7 promoter and the sequence of the RNA to be produced downstream from it. A gene encoding the RNA molecule to be produced may also be cloned into a plasmid under control of the T7 promoter, and the plasmid amplified in E. coli and purified in large amounts. After linearization of the plasmid with an efficient restriction enzyme, T7 RNA polymerase is added along with nucleoside triphosphates, leading to the production of large amounts of RNA. For efficient transcription by T7 RNA polymerase, the first few bases of the RNA to be produced should be purines, while the sequence of the 3′ end is determined by the restriction enzyme used. To avoid these restrictions, cis-acting autocleaving hammerhead ribozymes may be encoded 5′ and 3′ of the sequence to be produced [30]. These authors also pioneered the use of the restriction enzyme BsmAI, which cleaves 5′ to its recognition site, to digest the template DNA prior to transcription. In this way, no restrictions exist for the sequence at the 3′ end of the desired RNA.

1.3.1.5 Purification and Conditioning After production, the macromolecules or complexes to be crystallized need to be purified. Oligonucleotides, where irreversible unfolding is less of a problem than for proteins, may be purified by polyacrylamide gel electrophoresis or high-performance liquid chromatography. If the protein is present in the cultivation medium, it may be purified directly from it after removal of the expression host cells by centrifugation. This has the advantage of a relative absence of insoluble contaminants but the disadvantage of a relatively large volume. If the protein is produced intracellularly or is to be purified from a natural tissue source (e.g., meat or spinach leaves, see Section 1.3.1), a crude extract will need to be prepared. Cells will need to be broken by grinding, sonication, treatment with a hypotonic solution, detergent treatment, or treatment with a cell-wall destroying enzyme like lysozyme. In the case of soluble proteins, cell debris may be removed by centrifugation and the protein purified from the soluble fraction, while in the case of membrane proteins, the protein may be extracted from the membrane with detergents. If the protein of interest is expressed as inclusion bodies, these may be purified by differential centrifugation and sucrose gradient centrifugation and the protein refolded from these inclusion bodies [31]. However, protein refolding is often not straightforward and there is no guarantee of success.

To facilitate purification, proteins may be expressed with purification tags or as fusion proteins. The first purification step may then be performed using affinity chromatography; examples are metal affinity chromatography for proteins containing an oligohistidine sequence, a matrix with a modified streptavidin for proteins with a streptavidin-recognizing octapeptide, amylose–resin chromatography for proteins expressed as maltose-binding protein fusions, or glutathione affinity chromatography for proteins containing a glutathione S-transferase tag. Maltose-binding protein and glutathione S-transferase have the additional advantage that they may help the target protein stay soluble during expression, although for crystallization such a large fusion partner is likely to be detrimental and would have to be removed (usually by including a specific protease site between the two fusion partners). If no purification tag is present, usually some bulk fractionation step needs to be performed before proceeding to more traditional column chromatography steps. These may include ammonium sulfate precipitation, streptomycin sulfate precipitation to remove nucleic acids, or sucrose gradient centrifugation to isolate large complexes. Then, purification takes place using anion and/or cation exchange chromatography and size exclusion chromatography (often as a final “polishing” step). It should be stressed that no universal purification protocols are available and specialized schemes have to be developed for each particular protein.

During and after purification, the identity and state of the sample should be verified. In the case of proteins, N-terminal sequence analysis (Edman degradation) and mass spectrometry can be used to verify the identity of the protein and to confirm that the N-terminus (and sometimes the C-terminus) is as expected. In the case of enzymes and macromolecules that bind specific ligands, activity and binding assays may be performed to verify identity and correct folding.

For successful crystallization, it is usually necessary to concentrate the purified macromolecule to values of more than 10 mg/mL. Although proteins have been successfully crystallized from samples at 2 mg/mL or less, a higher concentration increases the chances of success, and if the protein remains soluble at 20, 50, or even 100 mg/mL, crystallization trials may be set up at these higher concentrations. Concentration of macromolecular samples may be achieved by filtration using membranes through which the protein does not pass. The necessary pressure to force the buffer through the membrane may be provided by centrifugation or pressurized nitrogen or air. Alternative methods include protein precipitation by ammonium sulfate followed by dialysis, or covering a dialysis tube containing the sample with polyethylene glycol powder, which removes solvent from the sample but retains the macromolecule in the tube, optionally followed by dialysis.
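The volume to aim for in such a concentration step follows from conservation of protein mass. The following is a minimal sketch, assuming that no protein is lost on the membrane or by precipitation; the function name and the example numbers are hypothetical.

```python
def final_volume(initial_volume_ml, initial_conc_mg_ml, target_conc_mg_ml):
    """Volume (mL) at which the sample reaches the target concentration.

    Assumes the total protein mass is conserved (c1 * v1 = c2 * v2),
    i.e. no loss on the membrane or through precipitation.
    """
    if initial_conc_mg_ml <= 0 or target_conc_mg_ml <= 0:
        raise ValueError("concentrations must be positive")
    return initial_volume_ml * initial_conc_mg_ml / target_conc_mg_ml


# Hypothetical example: 5 mL at 2 mg/mL concentrated to 20 mg/mL.
volume_ml = final_volume(5.0, 2.0, 20.0)
print(volume_ml)  # 0.5
```

In practice the recovered concentration should be measured rather than assumed, since some loss of material is common.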

Crystals consist of regularly repeating units of the same molecule or complex, each in the same conformation. For a sample to crystallize successfully, purity is therefore very important. Only the minimum amount of buffer components needed to keep the protein stable should be included; in fact, many macromolecules are stable in water alone, and the purification buffer can be exchanged for water or the minimum buffer in the last concentration or dialysis step. The chemical purity of the macromolecule or complex may be assessed using denaturing gel electrophoresis. This should also reveal whether the protein is intact or whether proteolysis has occurred during expression and purification.

While chemical purity is necessary, it is not sufficient; conformational homogeneity is just as important. Typical causes of conformational heterogeneity are partial and unspecific aggregation, unfolding, or flexible domains. The aggregation state of the protein may be investigated by native gel electrophoresis, size exclusion chromatography, dynamic light scattering, or analytical ultracentrifugation. The fact that a protein forms oligomers is not necessarily a problem, as long as it forms a homogeneous population of them, leading to a monodisperse sample. Certain proteins may need to form specific oligomers to perform their natural function and may not even be stable as monomers. If the macromolecule or complex is large enough, it may be useful to observe single particles by electron microscopy, which may quickly reveal large differences in conformation or oligomerization state using only small amounts of sample. Native gel electrophoresis or isoelectric focusing may also reveal multiple charge states for the macromolecule. If this happens, these may need to be separated by ion exchange chromatography or preparative isoelectric focusing.

To have a reasonable chance of crystallizing, the macromolecule or complex of interest should be folded correctly. While many unfolded proteins aggregate unspecifically and often even precipitate, some proteins may be perfectly soluble and monomeric, even when unfolded. The degree of folding of a protein may be judged by NMR spectroscopy; a folded protein should have a more dispersed set of amide proton resonances than an unfolded, random coil protein (see also Chapter 2). If it is suspected that the macromolecule has disordered loops or larger flexible domains, it may be necessary to remove these by limited proteolysis or by redesigning the expression vector. A specific ligand or inhibitor may also be included to try to lock the protein, nucleic acid, or complex into a unique conformation.

1.3.2 Crystallization

Several different methods exist for obtaining crystals of macromolecules. In most of them, the solution containing the macromolecules (the mother liquor) is mixed with a similar volume of precipitant solution and allowed to equilibrate with a larger volume of the same precipitant solution. Equilibration by vapor diffusion is the most commonly used method. Traditionally, this was (and is) performed by the hanging drop method, placing the drop of mother liquor on a siliconized microscope cover slip and inverting this cover slip over a well with precipitant solution in a Linbro plate. The borders of the well are sealed with mineral oil or vacuum grease. Currently, sitting drop vapor diffusion experiments are becoming more popular because of their relative ease of setup, ease of crystal harvesting, and suitability for automation. Sitting drop vapor diffusion experiments can be sealed with extraclear tape, which permits opening individual wells by carefully removing the tape only from that well and resealing with a piece of the same tape.
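The concentrating effect of vapor diffusion can be illustrated with a small calculation. The sketch below is an idealization, not a description of any particular setup: it assumes that only water leaves the drop, that the protein stock contains no precipitant, that the reservoir is large enough for its concentration to stay constant, and that equilibrium is reached when the precipitant concentration in the drop equals that in the reservoir. The function name and the example values are hypothetical.

```python
def equilibrate_drop(protein_stock, precipitant_reservoir, v_protein, v_precipitant):
    """Idealized vapor diffusion drop before and after equilibration.

    protein_stock:         protein concentration of the stock (e.g. mg/mL)
    precipitant_reservoir: precipitant concentration in the well (e.g. M)
    v_protein, v_precipitant: volumes mixed in the drop (e.g. uL)
    """
    v_initial = v_protein + v_precipitant
    # Mixing dilutes both components.
    protein_initial = protein_stock * v_protein / v_initial
    precipitant_initial = precipitant_reservoir * v_precipitant / v_initial
    # Water leaves the drop until its precipitant concentration
    # matches the (assumed constant) reservoir concentration.
    v_final = v_initial * precipitant_initial / precipitant_reservoir
    protein_final = protein_initial * v_initial / v_final
    return {
        "protein_initial": protein_initial,
        "precipitant_initial": precipitant_initial,
        "volume_final": v_final,
        "protein_final": protein_final,
    }


# Hypothetical example: 1 uL of protein at 10 mg/mL mixed with 1 uL of 2.0 M precipitant.
state = equilibrate_drop(10.0, 2.0, 1.0, 1.0)
print(state)
```

Under these assumptions a 1:1 drop starts at half the stock protein concentration and half the reservoir precipitant concentration, and both roughly double as the drop shrinks to half its initial volume.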

Some proteins are sensitive to air, and although vapor diffusion experiments can be set up under a nitrogen atmosphere to prevent oxidation, dialysis may be a better option [12]. Microdialysis buttons are available for small volumes (5–350 μL of mother liquor), although these are still an order of magnitude larger than the volumes used in vapor diffusion or microbatch experiments (see next paragraph). The buttons are covered with a piece of dialysis membrane kept in place with a rubber O-ring and incubated in a vial with a large volume of precipitant solution. A further advantage of this method is that after crystal growth, ligands, cryoprotectant, and other components can be introduced into the mother liquor without disturbing the crystals by adding them to the precipitant solution or exchanging the precipitant solution and waiting for equilibration.

Macromolecules can also be crystallized in batch, by simply mixing a concentrated solution of them with precipitant solution and waiting. In microbatch experiments, protein solution is directly mixed with precipitant solution and incubated under a layer of mineral oil, allowing for slow evaporation of aqueous solvent through the oil layer. A percentage of silicone oil can be mixed in with the mineral oil if faster evaporation is desired. This is often done in Terasaki plates, which contain 60 or 72 small wells.

Free interface diffusion is another commonly used technique [32]. The solution containing the concentrated macromolecules is brought into direct contact with the precipitant solution in a capillary and slow free diffusion is allowed to take place through the small contact surface. The concentration gradient that forms along the capillary allows sampling of a larger fraction of crystallization space in a smaller number of experiments.


Crystallization robots can significantly expedite the crystallization process, eliminating a lot of tedious manipulations and allowing for small-volume drops (typically 50 μL). There are robots specialized in microbatch experiments or sitting-drop vapor diffusion, but multipurpose ones are available that can also perform hanging-drop vapor diffusion experiments. Robots generally use 96-well plates, with the possibility of multiple crystallization drops per well.

A typical initial screen consists of one or more 96-well plates with very different conditions [33, 34], and if possible, the same experiments are incubated at different temperatures (e.g., at 20 °C and 5 °C). Incubation should be in low-vibration conditions. If crystals are obtained, they are measured to confirm that they are protein crystals, not crystals of salt or another small-molecule additive, and to assess their diffraction limit and quality. If crystalline precipitates are obtained, further screens are performed around these conditions to see if crystals can be obtained. At the same time, it is worth carefully examining the cloning, expression, and purification strategy to see if improvements in protein purity and conformational homogeneity can be obtained (see Section 1.3.1.5). In addition to these initial more-or-less random screens, it is worth screening common precipitants such as ammonium sulfate and polyethylene glycol at different concentrations, pH values, and temperatures. Precipitant solutions should be prepared using high-grade chemicals. Other parameters that may be varied to obtain crystals or improve their size and quality are initial protein concentration, drop size, and the ratio of protein solution to precipitant solution in the drop. Additives of different classes may be tried, such as multivalent cations, common salts, chaotropes, reducing agents, polyamines, and organic molecules.
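A systematic screen of one precipitant against pH can be laid out as a grid over a 96-well plate. The following is a minimal sketch of such a layout, assuming a standard plate of 8 rows (A to H) and 12 columns, with precipitant concentration varied along the columns and pH along the rows; the function name, the ranges, and the units are hypothetical choices for illustration.

```python
from itertools import product
import string


def grid_screen(precip_min, precip_max, ph_min, ph_max, n_cols=12, n_rows=8):
    """Map well names (A1..H12) to a precipitant concentration and a pH.

    Concentration varies linearly along the columns and pH along the rows.
    """
    def linspace(lo, hi, n):
        step = (hi - lo) / (n - 1)
        return [round(lo + i * step, 2) for i in range(n)]

    concentrations = linspace(precip_min, precip_max, n_cols)
    ph_values = linspace(ph_min, ph_max, n_rows)

    plate = {}
    for (row, ph), (col, conc) in product(enumerate(ph_values),
                                          enumerate(concentrations)):
        well = f"{string.ascii_uppercase[row]}{col + 1}"
        plate[well] = {"precipitant_percent": conc, "pH": ph}
    return plate


# Hypothetical example: 10-21% (w/v) polyethylene glycol against pH 5.0-8.5.
plate = grid_screen(10.0, 21.0, 5.0, 8.5)
print(len(plate), plate["A1"], plate["H12"])
```

The same layout can be repeated at each incubation temperature, or with a different precipitant such as ammonium sulfate.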

The results of crystallization experiments include clear drops and precipitates due to unspecific protein aggregation. In these cases, future experiments in which the precipitant concentration is increased or decreased, respectively, may yield more promising results. Phase separation in which the protein concentrates in an organic phase may also be observed, and sometimes protein crystals nucleate on the edges of such organic phases. Crystalline precipitates may form due to excessive nucleation or, conversely, clusters of crystals may form due to insufficient nucleation sites. Sometimes, crystals or crystal fragments useful for diffraction experiments may be separated from these clusters. Single crystals may also be observed. Often, crystal growth is not equally efficient in all three dimensions and needle- or plate-shaped crystals result, but if the conditions are just right, crystals with sizes of 10–100 μm in all three dimensions may be obtained. Where crystals are too small, seeding drops with preformed microcrystals may lead to growth of larger crystals [35, 36]. Seeding may also improve crystal qualities other than size. More complete treatments of protein crystallization are available in textbooks [37–39].

1.3.3 Data Collection and Processing

The first step of data collection is the recovery of the fragile crystals from the crystallization setup. For room temperature data collection, they may be carefully transferred to a quartz capillary and mounted in conditions in which the crystal will neither dry out nor attract moisture from the surrounding atmosphere and dissolve. They can also be picked up with a nylon or plastic microloop about the same size as the crystal. The loop is then covered with a plastic hood filled with a drop of mother liquor. To prolong crystal life, a crystal can also be briefly incubated in a suitable cryoprotectant and then flash-frozen, either in a nitrogen gas stream at 100 K or in liquid nitrogen [40]. If data collection is then performed at 90–120 K, a significant increase in crystal lifetime can be obtained, as radiation damage decreases at lower temperature [41].

The most common setup used nowadays to measure X-ray diffraction intensities is the oscillation method. Consecutive images are recorded for small rotation angles (0.25° to 2°) around an axis perpendicular to the incident X-ray beam [42].
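The number of images in an oscillation dataset follows directly from the total rotation range and the rotation angle per image. The sketch below illustrates this arithmetic; the function name and the example values are hypothetical, and the estimate of the exposure time ignores detector readout and other overheads.

```python
import math


def oscillation_plan(total_rotation_deg, oscillation_deg, exposure_s):
    """Return the number of images and the total exposure time in seconds.

    total_rotation_deg: total crystal rotation to be covered
    oscillation_deg:    rotation angle per image
    exposure_s:         exposure time per image
    """
    if oscillation_deg <= 0:
        raise ValueError("oscillation angle must be positive")
    # Small tolerance so that exact multiples are not rounded up by
    # floating-point noise.
    n_images = math.ceil(total_rotation_deg / oscillation_deg - 1e-9)
    return n_images, n_images * exposure_s


# Hypothetical example: 180 degrees in 0.5 degree steps, 1 s per image.
n_images, total_exposure_s = oscillation_plan(180.0, 0.5, 1.0)
print(n_images, total_exposure_s)  # 360 360.0
```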

Depending on the space group of the crystals obtained and the structure solution method that is to be used, somewhat different data collection procedures will need to be employed. In all cases, complete datasets are necessary, and if the anomalous signal in the diffraction data is to be exploited, Friedel pairs will have to be collected for each reflection at high multiplicity. This is because the anomalous intensity differences between Friedel pairs are generally small compared to the diffraction intensities. For high-symmetry space groups, a relatively small fraction of reciprocal space needs to be explored, while for lower-symmetry space groups, a larger fraction of reciprocal space will need to be covered, that is, more images per dataset will have to be collected. For structure solution by molecular replacement or isomorphous replacement methods (see Section 1.3.4), high multiplicity is not a necessity (although it is always an advantage), while for anomalous dispersion methods it is very important. High-multiplicity datasets will require longer data collection times, while at the same time radiation damage will have to be avoided [43]. Therefore, to allow successful structure solution, at times higher resolution data will have to be sacrificed (i.e., less exposure time per image) for data completeness and/or multiplicity. Once the structure is solved and more crystals are available, one can always attempt to collect a complete higher resolution dataset for the final refinement of the structure. Completeness means that as many reflections as possible for this particular crystal structure are well measured. A common mistake is to overexpose crystals in order to achieve the highest possible resolution, leading to overloaded low-resolution reflections. In some cases this problem is best overcome by merging two datasets measured at low and high beam intensity or exposure time.
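Completeness and multiplicity can be made concrete with a small calculation. The sketch below assumes that every observation has already been mapped to its symmetry-unique index (h, k, l), and that the number of unique reflections expected to the resolution limit is known; both would normally come from data processing software. The function name and the toy data are hypothetical.

```python
from collections import Counter


def completeness_and_multiplicity(observations, n_unique_possible):
    """Return (completeness, multiplicity) for a list of observations.

    observations:      symmetry-unique (h, k, l) index of each measurement
    n_unique_possible: unique reflections expected to the resolution limit
    """
    counts = Counter(observations)
    n_unique_measured = len(counts)
    # Fraction of the expected unique reflections measured at least once.
    completeness = n_unique_measured / n_unique_possible
    # Average number of measurements per measured unique reflection.
    multiplicity = len(observations) / n_unique_measured
    return completeness, multiplicity


# Toy data: six observations of three unique reflections, four expected.
observations = [(1, 0, 0), (1, 0, 0), (0, 1, 0), (0, 1, 0), (0, 1, 0), (0, 0, 1)]
completeness, multiplicity = completeness_and_multiplicity(observations, 4)
print(completeness, multiplicity)  # 0.75 2.0
```

When the anomalous signal is to be used, the two members of a Friedel pair would be counted as separate unique reflections in such a calculation.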