
Nature Structural Molecular Biology February


www.nature.com/nsmb

EDITORIAL OFFICE [email protected]
75 Varick Street, Fl 9, New York, NY 10013-1917
Tel: (212) 726 9331, Fax: (212) 679 0735
Editor: Boyana Konforti
Senior Editor: Michelle Montoya
Associate Editors: Inês Chen, Sabbi Lall
Copy Editor: Carrie Patis
Senior Production Editor: Jessica Iannuzzi
Production Editor: Jamel Wooten
Senior Illustrator: Katie Ris-Vicari
Illustrator: Kimberly Caesar
Cover Design: Erin Boyle
Editorial Assistant: Christina Polizoto

MANAGEMENT OFFICES
NPG New York
75 Varick Street, Fl 9, New York, NY 10013-1917
Tel: (212) 726 9200, Fax: (212) 696 9006
Executive Editor: Linda Miller
Chief Technology Officer: Howard Ratner
Head of Nature Research & Reviews Marketing: Sara Girard
Marketing Manager: Leah Rodriguez
Assistant Production Coordinator: Karen Wilson
Head of Web Services: Anthony Barrera
Web Production Manager: Susan Kline

NPG London
The Macmillan Building, 4 Crinan Street, London N1 9XW
Tel: 44 207 833 4000, Fax: 44 207 843 4996
Managing Director: Steven Inchcoombe
Publishing Director: Alison Mitchell
Editor-in-Chief, Nature Publications: Philip Campbell
Marketing Director: Della Sar
Director of Web Publishing: Timo Hannay

NPG Nature Asia-Pacific
Chiyoda Building, 2-37 Ichigayatamachi, Shinjuku-ku, Tokyo 162-0843
Tel: 81 3 3267 8751, Fax: 81 3 3267 8746
Publishing Director, Asia Pacific: David Swinbanks
Associate Director: Antoine E. Bocquet
Manager: Koichi Nakamura
Senior Marketing Manager: Peter Yoshihara
Asia-Pacific Sales Director: Kate Yoneyama
Asia-Pacific Sales Manager: Ken Mikami

DISPLAY ADVERTISING [email protected] (US/Canada) [email protected] (Europe) [email protected] (Asia)
Global Head of Display Advertising Sales: John Michael, Tel: 44 207 843 4960, Fax: 44 207 843 4996
Asia-Pacific Sales Manager: Ken Mikami, Tel: 81 3 3267 8765, Fax: 81 3 3267 8746
Display Account Managers:
New England: Sheila Reardon, Tel: (617) 399 4098, Fax: (617) 426 3717
New York/Mid-Atlantic/Southeast: Jim Breault, Tel: (212) 726 9334, Fax: (212) 696 9481
Midwest: Mike Rossi, Tel: (212) 726 9255, Fax: (212) 696 9481
West Coast South: George Lui, Tel: (415) 781 3804, Fax: (415) 781 3805
West Coast North: Bruce Shaver, Tel: (415) 781 6422, Fax: (415) 781 3805
Germany/Switzerland/Austria: Sabine Hugi-Fürst, Tel: 41 52761 3386, Fax: 41 52761 3419
United Kingdom/Ireland/France/Belgium/Eastern Europe: Jeremy Betts, Tel: 44 207 843 4968, Fax: 44 207 843 4749
Scandinavia/The Netherlands/Italy/Spain/Portugal/Israel/Iceland: Graham Combe, Tel: 44 207 843 4914, Fax: 44 207 843 4749
Greater China/Singapore: Gloria To, Tel: 852 2811 7191, Fax: 852 2811 0743

NATUREJOBS [email protected] (US/Canada) [email protected] (Europe) [email protected] (Asia)
US Sales Manager: Peter Bless, Tel: (212) 726 9248, Fax: (212) 696 9482
European Sales Manager: Andrew Douglas, Tel: 44 207 843 4975, Fax: 44 207 843 4996
Asia-Pacific Sales Manager: Ayako Watanabe, Tel: 81 3 3267 8765, Fax: 81 3 3267 8746

SITE LICENSE BUSINESS UNITAmericas: Tel: (888) 331 6288 [email protected]/Pacific: Tel: 81 3 3267 8751 [email protected]/New Zealand: Tel: 61 3 9825 1160 [email protected]: Tel: 91 124 2881054/55 [email protected]: Tel: 44 207 843 4759 [email protected]

CUSTOMER SERVICE www.nature.com/help
Senior Global Customer Service Manager: Gerald Coppin
For all print and online assistance, please visit www.nature.com/help
Purchase subscriptions:
Americas: Nature Structural & Molecular Biology, Subscription Dept., 342 Broadway, PMB 301, New York, NY 10013-3910. Tel: (866) 363 7860, Fax: (212) 689 9108
Europe/ROW: Nature Structural & Molecular Biology, Subscription Dept., Macmillan Magazines Ltd., Brunel Road, Houndmills, Basingstoke RG21 6XS, United Kingdom. Tel: 44 1256 329 242, Fax: 44 1256 812 358
Japan: Nature Structural & Molecular Biology, NPG Nature Asia-Pacific, Chiyoda Building, 2-37 Ichigayatamachi, Shinjuku-ku, Tokyo 162-0843. Tel: 81 3 3267 8751, Fax: 81 3 3267 8746
India: Nature Structural & Molecular Biology, NPG India, 3A, 4th Floor, DLF Corporate Park, Gurgaon 122002, India. Tel: 91 124 2881054/55, Fax: 91 124 2881052

REPRINTS [email protected]
Nature Structural & Molecular Biology Reprint Department, Nature Publishing Group, 75 Varick Street, Fl 9, New York, NY 10013-1917, USA.
For commercial reprint orders of 600 or more, please contact:
UK Reprints: Tel: 44 1256 302 923, Fax: 44 1256 321 531
US Reprints: Tel: (212) 726 9278, Fax: (212) 679 0843

© 2009 Nature America, Inc. All rights reserved.


Volume 16 Number 2 February 2009

Nature Structural & Molecular Biology (ISSN 1545-9993) is published monthly by Nature Publishing Group, a trading name of Nature America Inc. located at 75 Varick Street, Fl 9, New York, NY 10013-1917. Periodicals postage paid at New York, NY and additional mailing post offices. Editorial Office: 75 Varick Street, Fl 9, New York, NY 10013-1917. Tel: (212) 726 9331, Fax: (212) 679 0735. Annual subscription rates: USA/Canada: US$225 (personal), US$3,060 (institution). Canada add 7% GST #104911595RT001; Euro-zone: €287 (personal), €2,430 (institution); Rest of world (excluding China, Japan, Korea): £185 (personal), £1,570 (institution); Japan: Contact NPG Nature Asia-Pacific, Chiyoda Building, 2-37 Ichigayatamachi, Shinjuku-ku, Tokyo 162-0843. Tel: 81 (03) 3267 8751, Fax: 81 (03) 3267 8746. POSTMASTER: Send address changes to Nature Structural & Molecular Biology, Subscriptions Department, 342 Broadway, PMB 301, New York, NY 10013-3910. Authorization to photocopy material for internal or personal use, or internal or personal use of specific clients, is granted by Nature Publishing Group to libraries and others registered with the Copyright Clearance Center (CCC) Transactional Reporting Service, provided the relevant copyright fee is paid direct to CCC, 222 Rosewood Drive, Danvers, MA 01923, USA. Identification code for Nature Structural & Molecular Biology: 1545-9993/04. Back issues: US$45, Canada add 7% for GST. CPC PUB AGREEMENT #40032744. Printed on acid-free paper by Dartmouth Journal Services, Hanover, NH, USA. Copyright © 2009 Nature Publishing Group. Printed in USA.

EDITORIAL

99 The year that was and the year ahead

NEWS AND VIEWS

100 Tip20p reaches out to Dsl1p to tether membranes
Mary Munson (see also p 114)

102 Wedging out DNA damage
Orlando D Schärer & Arthur J Campbell (see also p 138)

104 Towards the architecture of the chromosomal architects
Valentin V Rybenkov

106 RESEARCH HIGHLIGHTS

PERSPECTIVE

107 Nonsense-mediated mRNA decay (NMD) mechanisms
Saverio Brogna & Jikai Wen

ARTICLES

114 Structural characterization of Tip20p and Dsl1p, subunits of the Dsl1p vesicle tethering complex
Arati Tripathi, Yi Ren, Philip D Jeffrey & Frederick M Hughson (see also p 100)

124 High-resolution dynamic mapping of histone-DNA interactions in a nucleosome
Michael A Hall, Alla Shundrovsky, Lu Bai, Robert M Fulbright, John T Lis & Michelle D Wang

130 An RNA code for the FOX2 splicing regulator revealed by mapping RNA-protein interactions in stem cells
Gene W Yeo, Nicole G Coufal, Tiffany Y Liang, Grace E Peng, Xiang-Dong Fu & Fred H Gage

Crystal structures of bacterial endonuclease V in complex with its substrate and product give insight into a major base-repair pathway. (p 138, News and Views p 102)

Fatty acid synthase is composed of several catalytic domains that work in sequence. Asturias and colleagues use single-particle EM analysis of rat FAS to reveal the movements of the domains during the reaction cycle. The cover image of a flamenco dancer represents the complex motions of FAS in action (© Emanuele Ferrari, iStockphoto). pp 190–197


138 Structures of endonuclease V with DNA reveal initiation of deaminated adenine repair
Bjørn Dalhus, Andrew S Arvai, Ida Rosnes, Øyvind E Olsen, Paul H Backe, Ingrun Alseth, Honghai Gao, Weiguo Cao, John A Tainer & Magnar Bjørås (see also p 102)

144 Biological basis for restriction of microRNA targets to the 3′ untranslated region in mammalian mRNAs
Shuo Gu, Lan Jin, Feijie Zhang, Peter Sarnow & Mark A Kay

151 Nucleosomes can invade DNA territories occupied by their neighbors
Maik Engeholm, Martijn de Jager, Andrew Flaus, Ruth Brenk, John van Noort & Tom Owen-Hughes

159 SRS2 and SGS1 prevent chromosomal breaks and stabilize triplet repeats by restraining recombination
Alix Kerrest, Ranjith P Anand, Rangapriya Sundararajan, Rodrigo Bermejo, Giordano Liberi, Bernard Dujon, Catherine H Freudenreich & Guy-Franck Richard

168 Helix movement is coupled to displacement of the second extracellular loop in rhodopsin activation
Shivani Ahuja, Viktor Hornak, Elsa C Y Yan, Natalie Syrett, Joseph A Goncalves, Amiram Hirshfeld, Martine Ziliox, Thomas P Sakmar, Mordechai Sheves, Philip J Reeves, Steven O Smith & Markus Eilers

176 Recognition of atypical 5′ splice sites by shifted base-pairing to U1 snRNA
Xavier Roca & Adrian R Krainer

183 A distinct class of small RNAs arises from pre-miRNA–proximal regions in a simple chordate
Weiyang Shi, David Hendrix, Mike Levine & Benjamin Haley

190 Conformational flexibility of metazoan fatty acid synthase enables catalysis
Edward J Brignole, Stuart Smith & Francisco J Asturias

Structural and functional analysis of MIA40 gives insight into its role in mitochondrial import of certain Cys-motif-containing proteins. (p 198)

In vitro analyses indicate that the DNA territory of nucleosomes can overlap, resulting in single compact particles. (p 151)

The U1 snRNA has unexpected flexibility and recognizes atypical splice sites by shifting its base pairing. (p 176)


198 MIA40 is an oxidoreductase that catalyzes oxidative protein folding in mitochondria
Lucia Banci, Ivano Bertini, Chiara Cefaro, Simone Ciofi-Baffoni, Angelo Gallo, Manuele Martinelli, Dionisia P Sideris, Nitsa Katrakili & Kostas Tokatlidis

207 RDE-1 slicer activity is required only for passenger-strand cleavage during RNAi in Caenorhabditis elegans
Florian A Steiner, Kristy L Okihara, Suzanne W Hoogstrate, Titia Sijen & René F Ketting

212 Nucleic acid polymerases use a general acid for nucleotidyl transfer
Christian Castro, Eric D Smidansky, Jamie J Arnold, Kenneth R Maksimchuk, Ibrahim Moustafa, Akira Uchida, Matthias Götte, William Konigsberg & Craig E Cameron

219 Polyubiquitin substrates allosterically activate their own degradation by the 26S proteasome
Dawadschargal Bech-Otschir, Annett Helfrich, Cordula Enenkel, Gesa Consiglieri, Michael Seeger, Hermann-Georg Holzhütter, Burkhardt Dahlmann & Peter-Michael Kloetzel

BRIEF COMMUNICATION

226 Replisome stalling and stabilization at CGG repeats, which are responsible for chromosomal fragility
Irina Voineagu, Christine F Surka, Alexander A Shishkin, Maria M Krasilnikova & Sergei M Mirkin

NATURE STRUCTURAL & MOLECULAR BIOLOGY CLASSIFIED

See back pages.

An active site residue is implicated as a proton donor in nucleic acid polymerases. (p 212)

An extracellular loop may maintain inactive receptor conformations and propagate conformational changes during rhodopsin activation. (p 168)


EDITORIAL

The year that was and the year ahead

Now that we are well into 2009, I can’t help but think about the year that has passed. Fear not, this will not be one of those dreaded holiday letters where we list all the highs and lows of the year. But as I look back, there are many things I hope that I have permanently crossed off my ‘To Do’ list and others that I am looking forward to doing.

Here are some of the things I would like to do less of in 2009.

1. Try to convince those in government that money invested in science and technology is an investment in the future. Federal funding in the life sciences has fallen in real dollars since 2004. As a result, the success rates of grant applications are dangerously low and excellent science is not being funded.

2. Argue that money that the government does invest in science should go to basic science research rather than applied or directed research. In fact, I would argue that more money should be spent not just on basic research but also on so-called ‘high-risk’ projects, those that are unlikely to succeed but that would have enormous value if they did.

3. Push to get real science and technology issues into the political discussion rather than hot button issues such as abortion.

4. Advocate for the equality of women. Women still make up only 10–20% of full professors and they are paid less than men.

5. Remind scientists of the crucial role they have in educating the public about the scientific process.

6. Explain why evolution belongs in a science classroom and ‘intelligent design’ doesn’t.

7. Convince people that spending money on education is money well spent. The United States is once again not among the top ten countries for science and math education. This means that we are not adequately preparing our children for tomorrow’s workforce.

And here are just some of the things we as a group look forward to doing more of in 2009 (in no particular order).

1. Watch how President Barack Obama puts his science and education policies into action (he actually mentioned both science and education several times in his inaugural speech).

2. Continue to educate ourselves (and our readers) about politics in the United States and abroad and how decisions made at the government level can and do affect us all.

3. Explore issues at the interface between science and society (read: how to get the public more engaged in scientific research and education).

4. Go to more meetings and do more laboratory visits so that we can meet and talk to more of you. That way we can hear about your work and get your thoughts about the journal firsthand. In the meantime, you can write ([email protected]) or call us (+1 212 726 9331).

5. Read The New York Times, The Economist, The New Yorker, The Atlantic Monthly (fill in your favorite magazine), novels, and go to plays, museums, concerts and so on, so that we can have semi-intelligent conversations over meals and at the bar about something other than science.

Certainly not a comprehensive set of lists, but a good start to what we hope will be a year of cautious optimism.

© Paige Foster, iStockphoto


NEWS AND VIEWS

Tip20p reaches out to Dsl1p to tether membranes

Mary Munson

Mary Munson is in the Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA. e-mail: [email protected]

Large, multisubunit complexes have been implicated in tethering transport vesicles to organelle membranes before membrane fusion. New structures add to the growing list of tethering complexes that contain conserved helical bundle structures and provide a first glimpse of how these complexes are assembled.

Eukaryotic cells are crowded with functionally distinct membrane-bound compartments and vesicles that transport protein and lipid cargo between them. How does the cell ensure that each vesicle fuses with the proper organelle membrane? One strategy is the use of tethers: proteins or protein complexes that specifically bind to both the correct vesicle and its target membrane and bring them together. Two general types of vesicle tethers have been identified: elongated coiled coil proteins and large multisubunit tethering complexes. Although progress has been made in defining the structures of individual tethering subunits, the mechanism(s) of the large vesicle tethering complexes cannot be understood without higher-order complex structures that reveal how the subunits assemble and function together. On page 114 of this issue, Tripathi, Ren and co-workers show us the first high-resolution view of how subunits from a helical bundle tethering complex can associate with each other (ref. 1).

This new study focuses on the Dsl1p complex from yeast, which contains the subunits Dsl1p (ZW10 in mammals), Tip20p (RINT1 in mammals) and Sec39p/Dsl3p (no Sec39p homolog has been identified in mammals). The Dsl1p complex functions in the retrograde trafficking of COPI-coated vesicles from the Golgi complex to the endoplasmic reticulum (ER) (ref. 2). Dsl1p is one of eight multisubunit tethering complexes required for intracellular membrane trafficking (Fig. 1; reviewed in refs. 3–5); the others are TRAPP I (ER to Golgi), COG (retrograde intra-Golgi), TRAPP II (intra-Golgi and endosome to late Golgi), CORVET (late Golgi to endosome), HOPS (endosome to vacuole and homotypic vacuole), GARP (endosome to late Golgi) and exocyst (exocytosis and recycling of endocytic vesicles). Although they have limited sequence similarity, several COG and exocyst subunits have been shown to possess conserved helical bundle structures (refs. 6–9); the remaining COG, exocyst, GARP and Dsl1p complex subunits are predicted to have similar structures (refs. 10,11). Notably, the TRAPP I and II subunits are structurally distinct from the exocyst and COG subunits (refs. 12,13), and the sequences of HOPS and CORVET seem to be distinct from those of the exocyst, COG and TRAPP subunits. Here these new structures of Dsl1p (residues 37–355, out of 754) and full-length Tip20p (residues 1–701) reveal that they indeed have helical bundle structures similar to the exocyst and COG subunits (Fig. 1), suggesting a common evolutionary origin for these complexes.

Little is known about how helical bundle proteins assemble together and function in membrane trafficking. Only TRAPP I has been crystallized as a complex (refs. 12,13), and the lack of structural similarity between this complex and the others precludes modeling of the higher-order structures of the other complexes. Assembled exocyst and COG complexes have been observed by quick-freeze/deep-etch EM imaging, but only at low resolution (refs. 14,15). Qualitative in vitro binding experiments with several of the exocyst subunits suggested that they may interact in a side-to-side fashion (ref. 7; see model for the complex in ref. 16).

The Dsl1p complex subunits, however, seem to use a novel interface to mediate association with each other (ref. 1). While determining the structures of the subunits individually, the authors observed that long α-helices near the N termini of both Tip20p and Dsl1p homodimerize in the crystals (although not in solution), and they postulated that these helices might be used for heterodimerization. They showed that these helices are crucial for Tip20p–Dsl1p complex formation in vitro; unfortunately, the complexes did not crystallize. When they cleverly used a construct in which the first 36 amino acids of Dsl1p (which are disordered in the Dsl1p crystals) were replaced by the N-terminal helix (residues 1–40) of Tip20p and a short linker, they were able to obtain diffracting crystals. In this pseudo cocrystal structure, the Tip20p region forms a helix that is antiparallel to Dsl1p’s N-terminal helix, suggesting an elongated tip-to-tip interaction that could tether vesicles at a distance (see Figure 6 of ref. 1).

Can the Dsl1p complex be a useful paradigm for all helical bundle subunit interactions? Is this tip-to-tip binding mode a conserved mechanism for protein-protein interactions between subunits? Intriguingly, no other structures of the extreme N-terminal ends of other helical bundle tethering complexes have been determined; these regions may be natively unfolded in the absence of their binding partners and were therefore removed by limited proteolysis before crystallization. Further biochemical experiments and cocrystal structures are necessary to address this possibility.

If these complexes function just to tether membranes, one of the most puzzling aspects of the helical bundle tethering complexes is why most of them, Dsl1p being the exception, are composed of many (four to eight) large subunits. And why does the Dsl1p complex need only three subunits? The most likely answers are that these complexes perform a range of functions and that Dsl1p carries out only a subset of them. The dogma in the field is that these complexes are responsible for vesicle tethering. The complexes have little or no ability to bind membranes directly, but interact specifically with other proteins, including Rab and Rho GTPases, coat proteins, Sec1/Munc18 proteins and SNARE (soluble N-ethylmaleimide–sensitive factor attachment protein receptors) proteins in the membranes (reviewed in refs. 3–5). These interactions suggest a vesicle tethering function, as has been observed for TRAPP I (ref. 17), HOPS (ref. 18) and the elongated coiled coil tethers (reviewed in refs. 3,19). Surprisingly, however, there is no direct evidence of a tethering function for any of the helical bundle complexes. The availability of purified Dsl1p complex components should greatly facilitate the future development of in vitro tethering assays.

Biochemical and genetic evidence, on the other hand, does implicate these complexes in the regulation of SNARE complex assembly (for review, see refs. 3,4). SNARE proteins are present on the vesicle and target membranes and are core components of the membrane-fusion machinery. The SNARE proteins cannot generate specificity wholly by themselves; a mechanism for controlling fusion specificity, therefore, is to regulate the assembly of specific SNARE complexes. The helical bundle tethering complexes have each been shown to interact with cognate SNARE proteins, and here Tripathi, Ren et al. show direct interactions of Tip20p and Sec39p with the ER SNAREs Sec20p and Use1p, respectively (ref. 1). It will be interesting to see whether the Dsl1p complex binds only to individual SNAREs or also to SNARE complexes, and if these interactions have any effect on the rate and/or specificity of SNARE complex assembly and membrane fusion. Moreover, it is possible that the Dsl1p complex also interacts with other regulators to carry out its function(s), including the Sec1/Munc18 homolog Sly1p and a putative Rab GTPase, although this partner remains to be identified. Similar interactions have been observed for other tethers, although the precise mechanisms of regulation have yet to be worked out.

These Dsl1p–Tip20p studies provide a good start toward answering many important questions about the structure and function of the tethering complexes. Do these complexes actually tether membranes? If so, is the tethering a passive or an active process? How do these complexes regulate SNARE complex assembly? What other functions do these complexes have? How do the complexes assemble and disassemble? Where do assembly and disassembly take place (on which membranes, or in the cytosol)? What factors trigger the assembly and disassembly processes? These questions are not just structural, but also cell biological and biochemical. The key will be more complex structures, combined with in vitro functional assays to reconstitute complex assembly, disassembly, membrane tethering, SNARE complex assembly and so on, as well as in vivo mutagenesis and cell biological analyses. It will be exciting to see the structures and functions of these complexes revealed in the near future.

Figure 1 The Dsl1p complex subunits have similar helical bundle structures to those of the exocyst and COG complexes, whereas the TRAPP I complex is structurally distinct. The Dsl1p complex regulates the retrograde trafficking of COPI vesicles back to the ER (left, Dsl1p; right, Tip20p) (ref. 1); the TRAPP I complex functions in the trafficking of COPII vesicles to the Golgi (left to right: Trs31p, Bet3p-B, Trs23p, Bet5p and Bet3p-A) (ref. 13); the COG complex is important for retrograde trafficking in the Golgi (COG2 is shown); and the exocyst complex regulates exocytosis and recycling of endocytic vesicles (left to right: Exo70p (ref. 7), Exo84CT (ref. 7), Sec6CT (ref. 8) and Sec15CT (ref. 6)). The proteins are oriented with their helical bundles in alignment. Illustration credit: Kim Caesar.

1. Tripathi, A., Ren, Y., Jeffrey, P.D. & Hughson, F.M. Nat. Struct. Mol. Biol. 16, 114–123 (2009).

2. Kraynack, B.A. et al. Mol. Biol. Cell 16, 3963–3977 (2005).

3. Sztul, E. & Lupashin, V. Am. J. Physiol. Cell Physiol. 290, C11–C26 (2006).

4. Cai, H., Reinisch, K. & Ferro-Novick, S. Dev. Cell 12, 671–682 (2007).

5. Kummel, D. & Heinemann, U. Curr. Protein Pept. Sci. 9, 197–209 (2008).

6. Wu, S., Mehta, S.Q., Pichaud, F., Bellen, H.J. & Quiocho, F.A. Nat. Struct. Mol. Biol. 12, 879–885 (2005).

7. Dong, G., Hutagalung, A.H., Fu, C., Novick, P. & Reinisch, K.M. Nat. Struct. Mol. Biol. 12, 1094–1100 (2005).

8. Sivaram, M.V., Furgason, M.L.M., Brewer, D.N. & Munson, M. Nat. Struct. Mol. Biol. 13, 555–556 (2006).

9. Cavanaugh, L.F. et al. J. Biol. Chem. 282, 23418–23426 (2007).

10. Koumandou, V.L., Dacks, J.B., Coulson, R.M. & Field, M.C. BMC Evol. Biol. 7, 29 (2007).

11. Croteau, N.J., Furgason, M.L.M., Devos, D. & Munson, M. PLoS ONE (in the press) (2009).

12. Kim, Y.G. et al. Cell 127, 817–830 (2006).

13. Cai, Y. et al. Cell 133, 1202–1213 (2008).

14. Hsu, S.C. et al. Neuron 20, 1111–1122 (1998).

15. Ungar, D. et al. J. Cell Biol. 157, 405–415 (2002).

16. Munson, M. & Novick, P. Nat. Struct. Mol. Biol. 13, 577–581 (2006).

17. Yu, S. et al. J. Cell Biol. 174, 359–368 (2006).

18. Stroupe, C., Collins, K.M., Fratti, R.A. & Wickner, W. EMBO J. 25, 1579–1589 (2006).

19. Gillingham, A.K. & Munro, S. Biochim. Biophys. Acta 1641, 71–85 (2003).

Wedging out DNA damageOrlando D Schärer & Arthur J Campbell

The DNA-repair machinery is faced with the significant challenge of differentiating DNA lesions from unmodified DNA. Two recent publications, one in this issue of Nature Structural & Molecular Biology, uncover a new way of recognizing minimally distorting DNA lesions: insertion of a 3- or 4-amino-acid wedge into DNA to extrude the lesion into a shallow binding pocket that can accommodate various damaged bases.

Damaged bases are most frequently removed from DNA by one of two pathways: base- excision repair (BER) and nucleotide- excision repair (NER)1. In BER, small modifications to the DNA bases caused by oxidation, deamination or alkylation are recognized and excised by DNA glycosylases2. At least 12 such enzymes are known in humans; they show narrow substrate specificity and recognize one or a few prominent endogenous lesions. DNA glycosylases hydrolyze the glycosidic bond linking the base to the sugar-phosphate backbone, generating an abasic-site product, which is processed by a common set of downstream enzymes that remove the abasic site and restore the original DNA sequence. These enzymes recognize and extrude the damaged nucleotide from the DNA helix into an active site pocket that confers specificity for the damaged base. A single aromatic or aliphatic residue is inserted into the helix from the minor groove and takes up the position of the displaced base3,4.

A few repair enzymes use slightly different strategies to recognize small base lesions. Among them is Endonuclease V (EndoV), which was

Orlando D. Schärer is in the Departments of Pharmacological Sciences and Chemistry and Arthur J. Campbell is in the Department of Chemistry, Graduate Chemistry Building Room 619, Stony Brook University, Stony Brook, New York 11794-3400, USA. e-mail: [email protected]

Figure 1 Recognition and repair pathway initiation of slightly distorting DNA lesions. (a) EndoV recognizes deaminated bases, in particular hypoxanthine (Hyp), by inserting a wedge into the DNA helix and binding the lesion in a specific pocket. The catalytic activity of EndoV makes an incision in the phosphodiester bond located one nucleotide 3′ to the lesion. EndoV remains bound to the incised product, allowing the recruitment of hitherto unknown downstream factors and completion of the pathway without exposure of the nick. (b) DDB2 binds to CPD (and other) lesions by inserting a wedge into the DNA helix to extrude the modified base into a shallow binding pocket. Binding of the DDB2–DDB1 complex to DNA locates the CUL4A/RBX1 ubiquitin ligase in proximity of the lesion, leading to the ubiquitination (Ub) of XPC and DDB2. This modification weakens the DNA binding activity of DDB2, but not that of XPC, and allows XPC–RAD23B (23B) to replace DDB2–DDB1 on the damaged site and progression through the NER pathway.

Lesion recognitionby wedge insertion

IncisionProduct binding

Hyp

Hyp

EndoV

Hyp

a b

Lesion removal (?)Repair synthesis (?)

on

DDB1–CUL4A/RBX1

23B

XPC

23B

XPC

XPC recruitment

Ub

CPD

CPD

DDB2

Progression throughthe NER pathway

UbCPD

Kim

Cae

sar

n e w s a n d v i e w s

©20

09 N

atu

re A

mer

ica,

Inc.

All

rig

hts

res

erve

d.

Page 10: Nature Structural Molecular Biology February

nature structural & molecular biology volume 16 number 2 FebruArY 2009 103

subsequent (currently unknown) enzymes in the pathway that remove the damage and restore the original DNA sequence.

So is this wedge mechanism for base extrusion and recognition found in other DNA-damage binding proteins? In a parallel and almost simultaneous study of a different repair pathway, NER, Scrima et al. report that UV DNA-damaged binding complex (UV-DDB, made up of the DDB1 and DDB2 subunits) uses a similar wedge-base mechanism to find lesions8. NER deals with numerous structurally diverse lesions formed by environmental agents such as solar UV radiation and others9. Within NER, UV-DDB is required for lesions that only moderately destabilize the DNA helix, whereas more distorting lesions are directly recognized by the XPC–RAD23B complex. XPC-RAD23B acts downstream of UV-DDB and recognizes the thermodynamic destabilization induced in the DNA by many NER substrates10 (Fig. 1b). UV-DDB is of particular importance for the repair of cyclopyrimidine dimers (CPD), which have only a mild influence on the structure of the DNA. The recent structure of DDB1–DDB2 bound to another photolesion, the 6-4 photoproduct (6-4PP), provides important insight into the mechanism of damage recognition in NER8. DNA is exclusively bound by the DDB2 subunit of the complex. DDB2 has a WD-40 β-propeller structure, and DNA binding is mediated by loops emerging from one side of the propeller. The key element for DDB2-mediated damage sensing is a 3- residue hairpin made up of the strictly conserved

originally isolated from Escherichia coli as an enzyme to process hypoxanthine arising from the nitrosative deamination of adenine5. EndoV also binds other deaminated bases, as well as mismatches, flap structures and Y structures, and it remains to be elucidated which one of these is the principal substrate in mammals6. Upon binding its substrate, EndoV cleaves the phosphodiester bond located one nucleotide 3′ to the lesion (Fig. 1a). This unusual mechanism for dealing with base lesions raises the question of whether EndoV also uses a nucleotide- flipping mechanism to recognize lesions in DNA. A paper by Dalhus et al.7 on page 138 of this issue of Nature Structure & Molecular Biology provides fascinating insights into this question. The distinguishing feature of the EndoV–DNA complex is a wedge of four residues (PYIP) that inserts into the minor groove of the helix (Fig. 2). In contrast to the precise extrusion of a single nucleotide by DNA glycosylases of almost 180° (ref. 3), the lesioned base in EndoV is rotated by only 90° into the binding pocket of EndoV, located on the minor groove side. The orphaned cytosine base in turn is slightly pushed into the minor groove of the helix. The role of the wedge might be to tap on the helix and present potentially modified bases to the lesion binding pocket. The protein stays bound to the 3′ incised product through a series of hydrogen bonds to the terminal 5′ phosphate at the excision position, with the hypoxanthine still fully engaged in the recognition pocket. This product binding probably serves to ensure that the nicked product is securely passed to the

residues Phe371, Gln372 and His373 that inserts into the DNA from the minor groove, displacing the lesion into a shallow binding pocket located in the major groove (Fig. 2b). The wedge has striking geometric similarity to the one found in EndoV, and in both cases 2 amino acids are stacked at the tip of the wedge (Gln372 and His373 in DDB2 and Tyr80 and Pro82 in EndoV) and inserted into the DNA helix. As is seen in EndoV, the lesion binding pocket in DDB2 is shallow. It does not provide much specificity for the 6-4PP, but seems to be well suited to accommodate various lesions with an intrastrand cross-link, including CPDs and 6-4PPs. Cross-linked bases induce a compression of the phosphate backbone of the DNA, and this compression is readily accommodated in DDB2. These properties allow DDB2 to bind minimally distorting lesions.

The interaction mode of DDB2 with damaged DNA ideally complements the binding of XPC–RAD23. The binding of DDB2 opens the DNA at the lesion and induces a kink, making the CPD recognizable for XPC, which interacts with the adducted site by binding 2 nucleotides of undamaged single-stranded DNA opposite the lesion11. UV-DDB is associated with the ubiquitin ligase CUL4A/RBX1 via the DDB1 subunit12. DDB2 binding to DNA anchors this complex at the damaged sites and this leads to ubiquitination of DDB2 itself and XPC at the site of the lesion13 (Fig. 1b). This event is thought to weaken the binding of DDB2 to the lesion, permitting XPC to displace DDB2 and recruit the core NER factors. Inspection of the DDB2 and XPC DNA structures suggests that a clash of the respective damage binding regions would not permit the two proteins to bind simultaneously, providing one possible reason for the existence of the ubiquitination cascade. The complementary binding strategies of DDB2 and XPC thus provide a powerful way of recognizing a broad variety of lesions that induce diverse structural alterations in the DNA helix. Once XPC binding has been established, downstream NER factors are recruited. These factors ensure that a chemical modification is indeed present in the DNA and that the distortion is not induced by a simple mismatch. They then excise the lesion, as part of an oligonucleotide of 24–32 nucleotides, and fill in the resulting DNA gap9.

Considering the terrific progress made in the structural biology of DNA-damage recognition by proteins in the past 15 years, it is remarkable that studies of two unrelated proteins reveal a novel mechanism for locating damaged sites in DNA, a wedge consisting of 3 or 4 amino acids. It will be interesting to see whether this structural element is also present in other proteins. At least one other protein, the ultraviolet damage

Figure 2 EndoV and DDB1–DDB2 complexes. (a) EndoV binds hypoxanthine by inserting a wedge made up of the amino acids PYIP (turquoise and red, CPK representation) from the minor groove of the helix and displacing the base (tan, CPK) into a shallow binding pocket located in the minor groove. DNA is shown in dark blue and EndoV in light blue. (b) UV-DDB binds to 6-4PP through the DDB2 subunit. DDB2 uses a wedge made up of the amino acids PNH (turquoise, blue and red, CPK) that binds from the minor groove of DNA and extrudes the UV-induced adduct (tan, CPK) into a shallow binding pocket in the major groove. DDB2 is shown in light blue; DDB1 is shown in light red and binds DDB2 on the opposite side of the DNA binding surface. DNA is dark blue. Figures were made using PyMOL (http://pymol.sourceforge.net) using the structures PDB 2W35 and PDB 2W36 for EndoV and PDB 3EI1 for UV-DDB.

news and views

©2009 Nature America, Inc. All rights reserved.


104 volume 16 number 2 February 2009 Nature Structural & Molecular Biology

Correct folding of the chromosome is essential for many fundamental events such as DNA replication, exchange and segregation of genetic information, DNA repair and cell differentiation. To support these functions, the structure of the chromosome is controlled both locally and globally. The global folding is especially intriguing, because it involves long-range ordering of the chromosome by much smaller proteins. Early EM studies suggested that such long-range order might be imposed by a protein scaffold at the core of the chromosomes, which would divide it into a series of giant loops1,2. The scaffold, long regarded as a half-mythical entity, started to take shape with the discovery of condensins, which are now beginning to look like its central component3–7. The mechanism by which condensins organize the chromosome, however, remains a mystery, and new clues have emerged from the recent study by Woo and co-workers8.

Condensins and their close relatives cohesins are widely conserved across species and contain at their core the characteristically

6. Moe, A. et al. Nucleic Acids Res. 31, 3893–3900 (2003).

7. Dalhus, B. et al. Nat. Struct. Mol. Biol. 16, 138–143 (2009).

8. Scrima, A. et al. Cell 135, 1213–1223 (2008).

9. Gillet, L.C. & Schärer, O.D. Chem. Rev. 106, 253–276 (2006).

10. Sugasawa, K. et al. Genes Dev. 15, 507–521 (2001).

11. Min, J.H. & Pavletich, N.P. Nature 449, 570–575 (2007).

12. Groisman, R. et al. Cell 113, 357–367 (2003).

13. Sugasawa, K. et al. Cell 121, 387–400 (2005).

14. Paspaleva, K. et al. Structure 15, 1316–1324 (2007).

endonuclease (UVDE), for which only a DNA-free structure is presently available, seems to contain a similar element14. All three proteins recognize a diversity of lesions, so the wedge may be an important feature in proteins with broad substrate recognition.

ACKNOWLEDGMENT

Work in the O.D.S. laboratory is supported by US National Institutes of Health grants GM080454 and CA092584. A.J.C. is supported by an Integrative Graduate Education and Research Traineeship predoctoral fellowship (National Science Foundation Award No. 0549370).

1. Friedberg, E.C. et al. DNA Repair and Mutagenesis (ASM, Washington DC, 2005).

2. Hegde, M.L., Hazra, T.K. & Mitra, S. Cell Res. 18, 27–47 (2008).

3. Hitomi, K., Iwai, S. & Tainer, J.A. DNA Repair (Amst.) 6, 410–428 (2007).

4. Yang, W. Cell Res. 18, 184–197 (2008).

5. Yao, M., Hatahet, Z., Melamede, R.J. & Kow, Y.W. J. Biol. Chem. 269, 16260–16268 (1994).

V-shaped dimer of SMC (structural maintenance of chromosome) proteins, where two globular heads are connected via two long coiled coils9–13. The head domains can further associate via the shared ABC-type ATPase site, giving rise to SMC rings or macromolecular assemblies (Fig. 1). Yet another mode for macromolecular association is provided by kleisins, conserved non-SMC components of the complex, which also can link the SMC heads together. These numerous modes of protein-protein interactions, coupled with the difficulties in obtaining crystal structures of the full-length proteins, created fruitful grounds for speculation about the architecture of SMCs9,10,13,14.

Even less clear is how SMCs interact with DNA. Condensins and cohesins were found to condense short DNA molecules14–19 and entire chromosomes14,15,17, and they

Valentin V. Rybenkov is in the Department of Chemistry and Biochemistry, University of Oklahoma, 620 Parrington Oval, Norman, Oklahoma 73019, USA. e-mail: [email protected]

Towards the architecture of the chromosomal architects

Valentin V Rybenkov

MukBEF, the bacterial prototype of eukaryotic condensins and cohesins, has a key role in the global chromosomal organization of Escherichia coli and its close relatives. The recent report of the crystal structure of the MukB head domain in complex with its accessory subunits MukEF clearly demonstrates that MukBEF functions as a macromolecular assembly rather than a set of individual molecules and offers clues on how ATP and MukEF regulate the architecture of this complex.


Figure 1 Working models for the organization of MukBEF. ATP-mediated dimerization of the head domains of MukB can lead to ring closure (a) or formation of macromolecular structures (b). The dimerized MukB heads form a composite DNA binding site on the side of the protruding coiled coils. The non-flat surface of the site could prime the formation of large right-handed DNA loops, which have been detected for all condensins tested so far. In the case of intramolecular association of the heads (a), DNA binding inadvertently leads to the topological entrapment of the molecule within the protein ring. For the intermolecular association (b), topological entrapment via MukB and MukEF might be expected.


subunits, together with ATP, help modulate the architecture of the MukBEF assembly.

How this assembly organizes DNA is less clear. The authors interpret their results in terms of the MukEF- and ATP-modulated opening of the gate within the giant macromolecular MukBEF ring, similar to what has been proposed for cohesins. Other models can be put forward if we take into account all available data, including the physical association of MukB with DNA, efficient binding of condensins to linear DNA and the mutually exclusive binding of MukEF and DNA to MukB. Learning the role of the coiled coils could also force us to revise our views on the operation of condensins. All these concerns notwithstanding, we can safely expect that unraveling the architecture of condensins will be greatly facilitated by the newly available crystal structure of MukBEF.

1. Paulson, J.R. & Laemmli, U.K. Cell 12, 817–828 (1977).

2. Kavenoff, R. & Bowen, B.C. Chromosoma 59, 89–101 (1976).

3. Saitoh, N., Goldberg, I.G., Wood, E.R. & Earnshaw, W.C. J. Cell Biol. 127, 303–318 (1994).

4. Wang, Q., Mordukhova, E.A., Edwards, A.L. & Rybenkov, V.V. J. Bacteriol. 188, 4431–4441 (2006).

5. Hudson, D.F., Vagnarelli, P., Gassmann, R. & Earnshaw, W.C. Dev. Cell 5, 323–336 (2003).

6. Maeshima, K. & Laemmli, U.K. Dev. Cell 4, 467–480 (2003).

7. She, W., Wang, Q., Mordukhova, E.A. & Rybenkov, V.V. J. Bacteriol. 189, 7062–7068 (2007).

8. Woo, J.S. et al. Cell 136, 85–96 (2009).

9. Hirano, T. Nat. Rev. Mol. Cell Biol. 7, 311–322 (2006).

10. Nasmyth, K. & Haering, C.H. Annu. Rev. Biochem. 74, 595–648 (2005).

11. Cobbe, N. & Heck, M.M. Mol. Biol. Evol. 21, 332–347 (2004).

12. Hiraga, S. Annu. Rev. Genet. 34, 21–59 (2000).

13. Huang, C.E., Milutinovich, M. & Koshland, D. Phil. Trans. R. Soc. Lond. B 360, 537–542 (2005).

14. Petrushenko, Z.M., Lai, C.H., Rai, R. & Rybenkov, V.V. J. Biol. Chem. 281, 4606–4615 (2006).

15. Kimura, K., Rybenkov, V., Crisona, N., Hirano, T. & Cozzarelli, N. Cell 98, 239–248 (1999).

16. Petrushenko, Z.M., Lai, C.H. & Rybenkov, V.V. J. Biol. Chem. 281, 34208–34217 (2006).

17. Stray, J.E., Crisona, N.J., Belotserkovskii, B.P., Lindsley, J.E. & Cozzarelli, N.R. J. Biol. Chem. 280, 34723–34734 (2005).

18. Strick, T.R., Kawaguchi, T. & Hirano, T. Curr. Biol. 14, 874–880 (2004).

19. Cui, Y., Petrushenko, Z.M. & Rybenkov, V.V. Nat. Struct. Mol. Biol. 15, 411–418 (2008).

20. Losada, A. & Hirano, T. Curr. Biol. 11, 268–272 (2001).

21. Haering, C.H., Farcas, A.M., Arumugam, P., Metson, J. & Nasmyth, K. Nature 454, 297–301 (2008).

22. Yamazoe, M. et al. EMBO J. 18, 5873–5884 (1999).

23. Fennell-Fezzie, R., Gradia, S.D., Akey, D. & Berger, J.M. EMBO J. 24, 1921–1930 (2005).

24. Lammens, A., Schele, A. & Hopfner, K.P. Curr. Biol. 14, 1778–1782 (2004).

25. Haering, C.H. et al. Mol. Cell 15, 951–964 (2004).

coils was highly positively charged, suggesting a possible DNA binding site. The kleisin was bound at the opposite side of MukBhd, with the overall arrangement being similar to that for the yeast SMC1–Scc1 complex25. Given such similarities, it is hard to avoid the conclusion that all SMC complexes share fundamentally the same mechanism.

An important contribution of this study is that, using site-directed mutagenesis, the authors actually verified that the positively charged patches on the top and partially on the sides of MukBhd are indeed involved in DNA binding. Such an arrangement suggests that the bound DNA might be wrapped around MukBhd and could thereby explain the highly conserved propensity of SMCs to introduce right-handed writhe into DNA14,15,17. Furthermore, the experiment clearly demonstrated that the coiled coils are completely expendable for DNA binding and, therefore, that MukB indeed physically interacts with DNA.

The biggest surprise came from the comparison of MukBhd–MukEF crystals grown in the presence and absence of DNA. The crystal grown with DNA contained symmetric, dimeric MukBhd with two bound MukEFs. Curiously, no DNA was found in this structure, suggesting that it was displaced during crystal growth. In the absence of DNA, however, only one MukEF could be found in complex with the ATPγS-sandwiched dimer of MukBhd, with the flexible linker of MukF occupying the binding site for the second MukEF. The authors go to great lengths to demonstrate that the discovered linker-MukBhd interaction is specific and physiologically significant. Moreover, they determine that one of the MukEFs is indeed displaced from MukBhd when ATP-induced dimerization of MukBhds is reconstituted in solution. This result parallels the earlier finding that the full-length MukB can form only a B2(E2F)1 complex, whereas the complex B2(E2F)2 is short-lived and requires magnesium for stability16.

Perhaps the most important result of this study is the realization that MukBEF operates as a macromolecular assembly and not a set of individual molecules. Indeed, as Woo et al. point out, the extended shape of MukEF precludes it from binding to MukB heads that are engaged in the ATP-mediated dimer. Furthermore, the study clearly establishes that the non-SMC

even revealed preference for either intra- or intermolecular DNA condensation, according to their intracellular functions20. These intriguing activities befit the role of these proteins in global chromosome organization but are poorly understood mechanistically. There is no agreement even on whether SMCs physically associate with DNA during the reaction. Thus, genetic and cell biology evidence strongly suggested that cohesins embrace DNA within the protein rings21. In contrast, biochemical studies unequivocally point to a physical association between condensins and DNA14,16,18,19.

Into this picture enters the recent study by Woo and co-workers8. In this study, the authors present the first ever structure of the entire non-SMC component of a condensin, the MukEF complex, and further characterize the interaction of MukEF with the head domain of MukB, MukBhd. This impressive feat was made possible by the somewhat simplifying aspect of MukBEF, which, unlike its eukaryotic counterparts, consists of only three subunits and can be purified as two distinct complexes, MukB and MukEF16,22.

The authors start by presenting the structure of MukEF. In agreement with previous studies16,23, MukEF proved to be a highly elongated complex with the stoichiometry (E2F)2, where the N-terminal winged-helix domain (WHD) of MukF provides the dimerization interface between two E2F units. The MukE molecules seem to be integrated within the middle region of MukF, which lends support to the notion that MukEF indeed operates as a single unit. The two MukB binding domains, the C-terminal WHDs of MukF, are attached to the apices of the extended MukEF molecule via flexible linkers (Fig. 1). Such an arrangement seems ideally suited to ensure the binding of MukEF to two distant MukB heads, which could be located at variable distances and orientations.

The authors then proceed to the structure of the MukEF–MukBhd complex. The use of ATPγS and the ATPase-deficient mutant E1435Q for crystallization ensured the ATPγS-sandwiched dimerization of the head domains of MukB. Despite only 9% sequence identity, the structure of the MukBhd proved to be similar to that of the Pyrococcus furiosus SMC head24 and the yeast Smc1 head25. As in the other structures, the surface that faces the coiled


Written by Angela K. Eggleston, Joshua M. Finkelstein, Maria Hodges & Michelle Montoya

Initiation factors

Although it is known that the eukaryotic ribosome binds the 5′ end of a message then scans to the initiation codon, the mechanism by which such a scanning ribosome would overcome secondary structure in the 5′ untranslated region (5′ UTR) has been less clear. Eukaryotic protein synthesis begins with recognition of the initiation codon by the 40S ribosomal subunit and formation of a 48S initiation complex, in which initiator tRNA is base-paired with the initiation codon. The first step in its assembly is the attachment of a 43S pre-initiation complex, composed of a 40S ribosomal subunit, four initiation factors (eIFs), 1, 1A, 2 and 3, and initiator tRNA, to the 5′-proximal region of mRNA. Once bound, the 43S complex scans along the 5′ UTR to the initiation codon, where it forms the 48S complex. Attachment is mediated by three additional eIFs, 4A, 4B and 4F, which cooperatively unwind the mRNA to allow 43S complexes to bind and then also assist them in scanning. Together, the seven eIFs are sufficient for ribosomal scanning on an unstructured 5′ UTR, but, as Pestova and colleagues show, highly structured 5′ UTRs require an additional factor, the DExH-box protein DHX29. The authors used an in vitro reconstituted initiation system containing the 40S subunits and seven eIFs and found that, in the absence of DHX29, the 48S complex did not form efficiently, even on moderately stable GC-rich mRNA. Furthermore, they noticed an additional toeprint at +8–9 nucleotides from the start codon; properly assembled 48S complexes have a toeprint at +15–17 nucleotides. The authors identified DHX29 as a factor that removes the aberrant toeprint, and it is required for efficient 48S complex formation. They show that it binds to the 40S subunit and hydrolyzes ATP, GTP, UTP and CTP. They speculate that DHX29 induces a conformational change within the 48S complex that enables ribosomal accommodation of mRNA. (Cell 135, 1237–1250, 2008) MH

Translocon quality control

The bacterial translocon core, SecYEG, is a protein-conducting channel essential for the production of most secreted and integral membrane proteins. SecYEG makes important co-translational interactions with the signal-recognition particle and ribosome, and it cooperates with the SecA ATPase during protein translocation of secretory proteins across the membrane. SecYEG can also self-associate into oligomeric complexes. The biogenesis and regulation of SecYEG remains to be fully understood. The protein Syd was originally isolated as a suppressor of a dominant-negative mutant of SecY, and there was evidence to suggest that Syd and SecY interact directly with each other. Using nanodiscs (a single membrane complex, SecYEG in this case, placed in a small lipid bilayer supported by two membrane scaffold proteins), Duong and colleagues were able to determine that Escherichia coli Syd makes interactions with two cytoplasmic loops of SecY that are also known to be involved in SecY’s interaction with SecA. The crystal structure of Syd reveals a charged cavity that cross-linking analysis suggests is involved in making SecY interactions. Interestingly, Syd can interact with a SecYEG monomer in nanodiscs, but it cannot compete with SecA for binding to SecYEG purified from inner membrane vesicles, which exist in an oligomeric form. Analysis of SecE mutants defective in interacting with SecY indicates that Syd preferentially recognizes misassembled SecYEG complexes, and the authors show that Syd can dissociate SecYEG dimers formed in detergent micelles. They suggest that Syd may be acting as part of a quality-control system, interacting with improperly formed complexes and thereby facilitating their degradation by the FtsH protease. (J. Biol. Chem. published online 12 January 2009, doi: 10.1074/jbc.m808305200) MM

RIGging the deck of CARDs

RIG-I is a cytosolic protein that recognizes ‘pathogen-associated molecular patterns’ (PAMPs), for example a 5′-triphosphate group on a strand of RNA or a double-stranded RNA duplex, when viral RNA is present inside a cell. The detection of PAMPs by RIG-I leads to the activation of type-I interferons, which promote a robust immune response. RIG-I has several domains, including two N-terminal caspase activation and recruitment domains (CARDs) and an ATPase domain, the cellular function of which is not clear. Myong et al. used a single-molecule approach to probe the exact function of these domains in vitro, and their experiments revealed that RIG-I uses ATP hydrolysis to translocate, but not to unwind, double-stranded RNA. Deletion of part of or all of the two CARDs led to an increase in the rate of translocation, suggesting that these domains negatively regulate the activity of RIG-I. If the RNA contained a 5′-triphosphate group, the translocase activity of wild-type RIG-I increased dramatically; additional experiments revealed that this occurred only when the 5′-triphosphate group was on the RNA molecule being translocated. Although the physiological function of RIG-I’s translocase activity in vivo is still not clear, the authors note that these findings mean that DExH-box ATPases are now known to include both single-stranded and double-stranded translocases for RNA and DNA. (Science, published online 1 January 2009, doi:10.1126/science.1168352) JMF

An alternative response to damage

Faced with an uncertain, changeable environment, bacteria use multiple promoter-specificity (sigma) factors that modulate the activity of RNA polymerase and, thereby, gene expression. A major alternative sigma factor in Escherichia coli is RpoS. This protein had been linked to the stress response, but its role in the response to DNA damage was unknown. In a screen for genes involved in the DNA-damage response, Lovett and colleagues identified iraD. Previously, IraD had been shown to inhibit RssB, a protein that targets RpoS for ClpXP-mediated degradation. In agreement with this model, they found that rpoS mutants were also sensitive to DNA-damaging agents, and such mutations were epistatic with iraD mutations. In addition, mutation of rssB alleviated the sensitivity of iraD mutant cells, consistent with the notion that the instability of RpoS in the presence of RssB is responsible for the DNA-damage sensitivity. Although IraD expression was induced by most DNA-damaging agents, it was notably not induced by mitomycin C, an agent that strongly elicits the classic SOS damage response. In the SOS pathway, DNA damage leads to formation of a RecA filament, which activates self-cleavage of the transcriptional repressor LexA, bound to several genes needed for DNA repair. Supporting the idea that IraD acts in a separate pathway, iraD lexA cells were hypersensitive to DNA-damaging agents. This result suggests that complementary pathways leading to upregulation of specific damage-repairing genes and a change in the RNA polymerase composition to include the RpoS sigma factor contribute to the DNA-damage response in E. coli. (Proc. Natl. Acad. Sci. USA 106, 611–616, 2009) AKE

research highlights


Saverio Brogna and Jikai Wen are at the University of Birmingham, School of Biosciences, Edgbaston, Birmingham B15 2TT, UK. e-mail: [email protected]

Received 11 February 2008; accepted 2 January 2009; published online 4 February 2009; doi:10.1038/nsmb.1550

and SMG7 are thought to regulate the phosphorylation state of UPF1, with phosphorylation of UPF1 being necessary for its interaction with mRNA-decay factors7,14,15.

NMD is thought to serve as an mRNA-surveillance mechanism to prevent the synthesis of truncated proteins that would have the potential to have toxic effects such as dominant negative interactions, but its full physiological importance is not yet clear. The main NMD substrates are likely to be aberrant transcripts that have acquired PTCs as a consequence of mutations or errors during transcription or RNA processing16. In particular, it has been proposed that NMD may have a proofreading role in gene expression, eliminating transcripts that have not been spliced owing to their suboptimal splice signals17,18. It has also been suggested that NMD regulates alternative splicing by eliminating unproductive splice variants that contain PTCs (reviewed in ref. 19). However, other studies have failed to find evidence of a widespread coupling between NMD and alternative splicing19. Genome-wide studies have also suggested a role for NMD in regulating the expression of a subset of normal mRNAs (reviewed in ref. 20), but there is poor overlap between the sets of transcripts that NMD influences in different species and no correlation with any specific cellular function20.

Deletion of the genes encoding core NMD factors UPF1, UPF2 and UPF3 does not impair viability in S. cerevisiae or C. elegans6,11, but some of the NMD factors are essential for viability in other organisms21–23. For example, UPF1 (also known as Rent1 in mice) is essential for mouse embryonic development21, and UPF2 depletion in hematopoietic cells causes the loss of all hematopoietic stem and progenitor populations24. It is not yet clear whether NMD or some other essential function of these proteins is necessary for survival25–29.

PTC discrimination in yeast: the faux 3′ UTR model

The key question in NMD concerns how the process distinguishes between a PTC and a normal stop codon. Current models hypothesize that UPF1 is selectively recruited onto or activated on prematurely terminating ribosomes, and that UPF1, in association with UPF2 and UPF3 (and additional NMD factors in higher organisms), somehow promotes mRNA degradation4,5,7.

It is not yet understood what directs the recruitment (or activation) of NMD factors at the premature termination site, and the mechanisms may differ considerably between organisms. Early studies in S. cerevisiae showed that deletion of some regions downstream of the nonsense mutation can abolish NMD and suggested that PTCs are distinguished by the presence of a downstream sequence element (DSE) that stimulates NMD by binding NMD-stimulatory factors such as Hrp1p30,31. A DSE was proposed to be a degenerate sequence motif present in a few copies along the mRNA coding region, so the closer the PTC was to the beginning of the gene, the higher the

Nonsense-mediated mRNA decay (NMD) is a translation-coupled mechanism that eliminates mRNAs containing premature translation-termination codons (PTCs). In mammalian cells, NMD is also linked to pre-mRNA splicing, as in many instances strong mRNA reduction occurs only when the PTC is located upstream of an intron. It is proposed that in these systems, the exon junction complex (EJC) mediates the link between splicing and NMD. Recent studies have questioned the role of splicing and the EJC in initiating NMD. Instead, they put forward a general and evolutionarily conserved mechanism in which the main regulator of NMD is the distance between a PTC and the poly(A) tail of an mRNA. Here we discuss the limitations of the new NMD model and the EJC concept; we argue that neither satisfactorily accounts for all of the available data and offer a new model to test in future studies.

NMD was discovered when it was realized that cells often contain unexpectedly low concentrations of mRNAs that are transcribed from alleles carrying nonsense mutations. This phenomenon—observed in all investigated organisms, from bacteria to mammalian cells1–3—has been extensively studied in eukaryotic cells.

In eukaryotic cells, NMD requires both active mRNA translation and NMD-specific trans-acting factors4,5. Three well-investigated trans-acting factors in NMD are the proteins encoded by the UPF1, UPF2 and UPF3 genes, which were discovered in genetic suppressor screens in Saccharomyces cerevisiae6. These genes are evolutionarily conserved, and their deletion or silencing prevents NMD in all tested eukaryotic organisms7. The three UPF proteins are believed to form a trimeric complex that constitutes the core of the NMD machinery and links premature translation termination to mRNA degradation7, and they are thought to be the first NMD factors to bind to prematurely terminating ribosomes. Additional factors are likely to be involved in NMD. In yeast, mutations in several genes involved in translation and pre-mRNA processing also selectively stabilize PTC-containing mRNAs8,9. In multicellular animals, a plethora of additional factors have been implicated in NMD7. The better characterized are the SMG1, SMG5, SMG6 and SMG7 proteins, first identified as allele-specific suppressors of mutations in various Caenorhabditis elegans genes10,11. The initial genetic screen also hit smg-2, smg-3 and smg-4, the C. elegans homologs of UPF1, UPF2 and UPF3 (refs. 11,12). SMG1 is a protein kinase that can phosphorylate UPF1 (or SMG2) (ref. 13); SMG5, SMG6

Nonsense-mediated mRNA decay (NMD) mechanisms

Saverio Brogna & Jikai Wen

perspective


chance of a DSE occurring downstream. The main problem with this model is that there is no clear similarity between the putative DSE sequences from diverse mRNAs, and so the idea of a specific DSE that provokes premature termination remains nebulous32.

It has also been reported that abnormal mRNAs with extended 3′ untranslated regions (UTRs) are NMD substrates33, and it was proposed that an important distinction between normal and premature termination might lie in the very different distances between the stop codon and the 3′ terminal mRNP32,33. Recent studies suggest that the distance between the PTC and the poly(A) tail might be the key determinant. This mechanism is conceptualized by the so-called ‘faux (false) 3′ UTR model’, which says that premature translation termination is intrinsically abnormal because it takes place a long distance from the 3′ end, and that this prevents the normal interaction between the terminating ribosome and poly(A) binding protein (PABP). Instead, NMD factors associate with the terminating ribosome4 (Fig. 1a). The model is mainly based on the observation that ribosomes terminating at a PTC and at the normal stop codon leave different mRNA footprints (toeprints)34. This abnormal ribosome positioning was not observed in cells lacking UPF1 or when the PTC was flanked by a normal 3′ UTR. Moreover, PTC-containing mRNAs were stabilized if PABP was artificially tethered close to the PTC34. Earlier observations—that the peptide- release factor eRF3 and PABP interact during standard termination, and that overexpression of PABP can enhance termination in some eRF3 mutants—are also consistent with the faux 3′ UTR model35,36.

PTC recognition in mammalian cells: the EJC model

The hallmark of eukaryotic gene expression is that transcription and RNA processing take place in the nucleus, whereas translation occurs in the cytoplasm. It was therefore believed that there is generally no link between nuclear events such as pre-mRNA splicing and cytoplasmic events such as translation and mRNA destruction. This view has been challenged in recent years, primarily by the initially baffling discovery that the presence of introns in genes enhances NMD. Several groups found that PTCs in mammalian mRNAs cause NMD when they are located upstream of an intron and that artificially inserting an intron downstream of a wild-type stop codon can make it behave similarly to a PTC37–39. The discovery that mRNAs derived from naturally intronless genes are immune to NMD was consistent with these observations40,41.

It was therefore proposed that downstream introns may serve as ‘second signals’ in NMD. Most genes do not have introns in the 3′ UTR, so this provided a logical mechanism by which the NMD apparatus might distinguish between premature and canonical stop codons. It was also observed that many genes in which PTCs are very close to the last intron are immune to NMD, and this led to the proposal of a ‘rule’: for a PTC to drive strong NMD, it should be located at least 50–55 bases upstream of the last exon-exon junction37,39,42. Later, supporting discoveries indicated that the spliceosome deposits the EJC, a multiprotein complex with a core of four proteins that interact with UPF2 and UPF3, 20–24 bases upstream of the exon-exon junction, and that the EJC serves as a binding platform for NMD factors43 (Fig. 1b).
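The positional ‘rule’ can be written as a short check. This is a sketch under stated assumptions: coordinates are 0-based positions on the spliced mRNA, each junction is given as the index of the first base of the downstream exon, the distance is measured from the last base of the stop codon, and the function names and example coordinates are hypothetical. The default threshold of 50 is the lower bound of the 50–55 base range.

```python
from typing import Optional, Sequence


def distance_to_last_junction(stop_index: int,
                              junctions: Sequence[int]) -> Optional[int]:
    """Bases between the end of a stop codon and the last exon-exon junction.

    Returns None for an intronless transcript. The value is negative when
    the stop codon lies downstream of the last junction.
    """
    if not junctions:
        return None
    return max(junctions) - (stop_index + 3)


def rule_predicts_nmd(stop_index: int,
                      junctions: Sequence[int],
                      threshold: int = 50) -> bool:
    """True if the stop codon is at least `threshold` bases upstream of the last junction."""
    distance = distance_to_last_junction(stop_index, junctions)
    return distance is not None and distance >= threshold


# Hypothetical spliced mRNA with three exon-exon junctions.
junctions = [300, 600, 900]
early_ptc = rule_predicts_nmd(400, junctions)     # 497 bases upstream of the last junction
close_ptc = rule_predicts_nmd(870, junctions)     # only 27 bases upstream
normal_stop = rule_predicts_nmd(1100, junctions)  # downstream of every junction
intronless = rule_predicts_nmd(400, [])           # no junction at all
```

Under these assumptions only the first stop codon is flagged. The check is a positional heuristic and says nothing about mechanism.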

The current version of this ‘EJC model’ is that upon translation termination, the EJC promotes recruitment and activation of UPF1 on the ribosome, initiating the formation of the so-called NMD-inducing complex, and this NMD-inducing complex promotes accelerated destruction of the targeted mRNAs15,44. This model of NMD represents a logical and convenient synthesis and is consistent with the proposal that in mammalian cells the mRNA is susceptible to NMD only during the first round of translation (the pioneer round): once the EJC dissociates, the mRNA becomes refractory to NMD5,45,46. The EJC is displaced either by the translating ribosome, if located before or very close to the PTC,

Figure 1 Current NMD models. (a) The faux or false 3′ UTR model. Normal translation termination is efficient because it takes place close to the proper 3′ UTR, allowing interaction between poly(A) binding protein (PABP), which is associated with the poly(A) tail, and peptide-release factor eRF3 (R3), which is associated with the terminating ribosome. Premature termination is inefficient because the terminating ribosome cannot interact with PABP, and it instead interacts with UPF factors. R1 and R3 represent peptide-release factors eRF1 and eRF3; 4E and 4G represent translation-initiation factors eIF4E and eIF4G. (b) The EJC model. Translation termination triggers NMD when occurring upstream of the EJC by promoting recruitment or activation of UPF factors on either the terminating or post-termination ribosome. The line with a black dot denotes phosphorylation of UPF1. EJC components are indicated with standard abbreviations44. CBC indicates the cap binding complex. (c) The unified faux 3′ UTR model. Similarly to what is described for the faux 3′ UTR model in a, premature translation termination is inefficient because it occurs away from the 3′ end. However, the EJC also has a role in the recruitment and/or activation of UPF factors to inefficiently terminating ribosomes.

[Figure 1 graphic not reproduced. Panel titles: (a) Faux 3′ UTR model in yeast and Drosophila; (b) EJC model in mammalian cells; (c) Unified 3′ UTR model.]

© 2009 Nature America, Inc. All rights reserved.

Nature Structural & Molecular Biology, volume 16, number 2, February 2009

or possibly by the scanning 40S subunit if located in the 5′ UTR. If the EJC is located further downstream of the PTC, it is not clear how it could be removed. It is possible that the first round of translation and the global remodeling of the messenger ribonucleoprotein (mRNP) in the cytoplasm, typified by the replacement of the cap binding complex (CBC) with the eukaryotic initiation factor eIF4E at the 5′ end45,46, can shed downstream EJCs. Remodeling of mRNPs is an active process requiring energy, but, we speculate, it is probably also passively facilitated by the stochastic association and dissociation of proteins. Many components of the EJC are concentrated in the nucleus43,47, so it is possible that some proteins, such as UPF3, may simply detach rapidly at the lower concentrations in the cytoplasm48. The standard EJC model is conceptually similar to the superseded yeast DSE model. Unlike the faux 3′ UTR model, it does not regard premature termination as intrinsically different from normal termination but suggests that the presence of the EJC signals the recruitment or activation of NMD-promoting factors to the post-termination ribosome5,44. However, a recent study reported that UPF1 interferes with termination49, suggesting that the bridging between the ribosome and EJC might also affect termination due to the recruitment or activation of UPF1.

Is the EJC model generally applicable?

Many studies have supported the EJC model of PTC recognition, but a number of observations do not conform to its predictions. First, PTCs that are closer than 50 bases to the final exon-exon junction do induce NMD in the T-cell receptor (TCR)-β and immunoglobulin-µ (Ig-µ) transcripts38,50,51. This suggests that the ribosome can recognize PTCs even when the EJC (or other associated factors) would have been physically displaced by the treadmilling of the mRNA through the ribosome entry tunnel. Second, the general applicability of the EJC model was called into question by the finding that NMD does not require introns or EJC proteins in Drosophila melanogaster S2 cells52 or in C. elegans53. Third, and consistent with the faux 3′ UTR model, the ability of a PTC to cause NMD in S2 cells is primarily modulated by its distance from the 3′ end, and NMD can be suppressed by tethering PABP proximally to the PTC54. In addition, there was no NMD when an mRNA lacked the poly(A) tail or when PABP was depleted by RNAi54.

Several recent studies have questioned the importance of introns and the EJC in NMD and proposed that a faux 3′ UTR model explains NMD better, even in mammalian cells49,55–58. One reported that NMD in an Ig-µ minigene reporter does not require downstream introns or EJC factors; rather, it depends on the distance between the termination codon and the poly(A) tail56. It was then shown that extending the distance between the normal stop codon of a reporter mRNA and its 3′ end caused a reduction in mRNA levels that required the presence of UPF proteins; the normal stop codon was now behaving similarly to a PTC58. Moreover, NMD could be suppressed either by inserting, next to the poly(A) tail, a sequence complementary to a region just downstream of the PTC, thereby making the poly(A) tail fold back to the vicinity of the PTC, or by tethering PABPC1 (one of the four mammalian cytoplasmic PABPs) nearby58,59. Another study that challenged the role of introns in mammalian NMD reported that some introns, including those known to recruit the EJC, did not induce NMD when placed downstream of the termination codon57. Extending the 3′ UTR again destabilized the mRNA in a UPF1-dependent manner, and tethering of PABPC1 re-stabilized this mRNA, so the authors concluded that an EJC in the 3′ UTR is not sufficient or necessary for NMD but can enhance the decay of an mRNA that is already undergoing NMD because it has an extended 3′ UTR57. Notably, this study used the well-characterized β-globin mRNA, in which it was reported that insertion of an intron in the 3′ UTR made the normal stop codon behave similarly to a PTC and caused NMD39. However, this study used inducible reporters and assayed NMD by comparing mRNA half-lives, whereas the earlier work relied on steady-state mRNA levels of constitutively expressed reporters. Two additional studies also reached the conclusion that NMD is triggered when termination occurs a long distance from the 3′ end49,55, with one of those also reporting that PABPC1 stimulates and UPF1 inhibits normal translation termination49.

Issues with the faux 3′ UTR and more recent variants of it

An emerging view, therefore, is that introns are neither sufficient nor necessary for NMD and that ‘premature’ termination may be defined more simply as a termination event that occurs far from the poly(A) tail. These recent studies suggest that PTC recognition is conserved among eukaryotes and that it can be explained by a slightly revised version of the yeast faux 3′ UTR model: the ‘unified 3′ UTR model’4,49,55,57,58 (Fig. 1c). This proposes that the key trigger for NMD is an inefficient termination event caused by the failure of PABP (or other 3′ UTR–associated factors) to interact with the terminating ribosome, with the EJC sometimes functioning as a secondary NMD enhancer.

The faux 3′ UTR models are also consistent with the so-called ‘polarity effect’: NMD is most apparent when a PTC is located in the first half of the coding region. This is well documented in yeast, and indeed, it was noticed in the first report of NMD in budding yeast60. Polarity of NMD has also been observed in the Alcohol dehydrogenase (Adh) gene in D. melanogaster61 and in S2 cells54. However, the polarity effect is not always gradual or linear with the distance from the 3′ UTR, either in yeast62 or D. melanogaster54, and NMD is not typically polarized in mammalian genes, where PTCs at most positions cause a similar degree of mRNA reduction. Some 5′-to-3′ polarity was observed within individual exons in TCR-β minigene reporters, but this is more likely to correlate with the distance to the next intron rather than the distance to the 3′ end51.

A problem with faux 3′ UTR–based models is that they downplay the fact that an mRNA that is being translated is likely to be in a closed-loop conformation stabilized by an interaction between the 5′ and 3′ ends, probably mediated by translation-initiation factors eIF4E, eIF4G and PABP63. The idea that mRNA is translated in a closed-loop conformation is consistent with many studies in budding yeast63,64 and probably also applies to other organisms: the interaction between eukaryotic initiation factor eIF4G and PABP seems to be evolutionarily conserved65,66, and EM images of mammalian cells show polysomes as double rows of ribosomes resembling closed hairpins67.

PTCs close to the 5′ end are, in the context of a circular mRNP, likely to be physically close to the 3′ end, but faux 3′ UTR models do not explain why these usually provoke NMD most effectively. For example, PTCs as close to the start as codon 7 in the CYC1 gene of budding yeast62,68, codon 26 in the mammalian β-globin gene69 or codon 23 in the triosephosphate isomerase gene (TPI) cause strong NMD70. In TPI transcripts, PTCs close to the initiation codon show reduced NMD, probably because translation can re-initiate at a downstream AUG codon71. In the case of the β-globin transcript, PTCs close to the AUG also escape NMD69. This does not seem to be due to re-initiation, but it also does not depend on the length of the 5′ UTR72. It has been suggested that eIF4G may remain associated with the ribosome for a short while after initiation and, because eIF4G interacts with PABP, this may suppress NMD at very early PTCs55.

While the proposal that eIF4G remains temporarily associated with the ribosome55 can be incorporated into faux 3′ UTR models, the retention of some initiation factors on the ribosome for a short while following initiation is, most likely, also the reason why ribosomes can resume translation following termination at short open reading frames73,74. Re-initiation would require scanning, most likely by the 40S subunit rather than the whole 80S ribosome, and it is feasible


that the reason why very early PTCs escape NMD in mammalian systems is that scanning is resumed after these early PTCs, regardless of whether translation is re-initiated.

In one published study58, the fact that a PTC at codon 32 (ter32) did not cause strong NMD was interpreted as meaning that close proximity to the 5′ end prevents NMD, but the possibility of translation re-initiation or a potential dependence on the length of the 5′ UTR remains to be assessed. In addition to its proposed role in termination, PABP has a well-documented role in translation initiation75, but none of the studies mentioned above assessed whether the tethering of PABP influenced translation initiation.

As mentioned above, there are reports that re-initiation of translation downstream of the PTC can prevent NMD2,71, and there are also reports that PABP can promote internal translation initiation76. Given that PABP has multifaceted roles in standard mRNA decay59, one possibility is that the tethering of PABP stabilizes potential NMD substrates simply because it prevents standard mRNA decay, rather than because it inhibits the process of NMD. Consistent with this idea, earlier studies reported that tethering of PABP can stabilize mRNAs that do not contain PTCs in budding yeast77.

The general validity of faux 3′ UTR models is also questioned by the recent report that neither a poly(A) tail nor PAB1 is essential for NMD in budding yeast78. The conclusion that PAB1 is dispensable for NMD is complicated by the fact that this protein is normally essential for budding yeast, so PAB1 requirement was tested in a strain carrying a secondary mutation that suppresses lethality. However, an earlier study, which reported that NMD is unaffected by deletion of the C-terminal region of PAB1 (which contains the PABC domain thought to mediate the interaction with terminating ribosomes), reached similar conclusions79. Therefore, on the basis of the available data, the poly(A) tail and PABP seem unlikely to have a key role in NMD.

It has been argued that in the absence of PABP, other 3′ UTR–associated factors can antagonize NMD57. However, the original PABP-centric faux 3′ UTR model is in conflict with the earlier observation that nonpolyadenylated transcripts with a histone mRNA 3′ end can undergo NMD in HeLa cells80. In this latter case, the histone 3′ terminal stem-loop binding protein (SLBP) may be able to function in a similar way to PABP by interacting with eIF4G81 and possibly with the terminating ribosome82. But whether the interaction modulates NMD is not clear, as endogenous histone mRNAs do not seem to be susceptible to NMD41.

A new NMD model to test in future studies

As it has become clear that the distance between a PTC and the 3′ end is generally an important determinant of NMD, and that faux 3′ UTR models do not accommodate some of the key observations, we would like to offer an alternative speculative model that incorporates evidence that the translation-initiation factor eIF3 is implicated in NMD. Earlier studies suggested some role for eIF3 in NMD in yeast and in mammalian cells, and a recent study showed that UPF1 interacts with eIF3 (refs. 9,83,84). eIF3 is the principal factor that mediates the splitting of the ribosome into subunits after translation termination85. The key observation is that phosphorylated UPF1 suppresses translation initiation, probably by interacting with eIF3 and preventing the association of the 40S and 60S into a translation-competent 80S ribosome83.

In view of these new findings, we would like to propose that NMD might be a consequence of the release of post-termination ribosomes from a region of the mRNA that normally would be translated (Fig. 2a). The model proposes that ribosome subunits are released from the mRNP efficiently, regardless of the position of the PTC. However, whenever termination occurs early, a long region of mRNA downstream of the PTC will not be traversed by ribosome subunits and will destabilize the mRNP. The unstable mRNP will be either shunted to destruction by the canonical exonucleases involved in general mRNA degradation86 or perhaps also attacked by endonucleases. The latter possibility is consistent with the earlier observation that, in D. melanogaster, NMD involves an endonucleolytic attack87. It is also supported by recent reports that such an attack occurs in human cells, with SMG6 (one of the additional factors involved in NMD in multicellular animals) being the responsible endonuclease88,89. SMG6 cleaves the mRNA in a broad region around the PTC, perhaps reflecting an association with the ribosome, but it is feasible that other endonucleases might be involved in NMD. If the PTC is positioned toward the end of the coding sequence, mRNA destruction will be minimized because the untranslated and exposed region is unlikely to be long enough to destabilize the mRNP noticeably (Fig. 2b). A key feature of the model is that NMD, at least in its most basic form, would not depend on the nature of the 3′ UTR but, instead, on the fact that early release of the ribosome from the mRNA leaves a large region of the mRNA exposed, destabilizing the mRNP. This model is consistent with observations that mRNAs that lack poly(A) tails remain susceptible to NMD78,80.

Does this mean that NMD is a passive mechanism by which mRNAs that are not protected by ribosomes are preferentially destroyed? Probably not. NMD depends on UPF1, other specific trans-acting factors

[Figure 2 graphic not reproduced. Panel labels: (a) Early PTC: unstable translation circuit, NMD; (b) Late PTC: fast ribosome recycling, stable translation circuit, normal mRNA stability.]

Figure 2 The proposed ribosome release NMD model. (a) Following termination at PTCs located early in the coding region, the 80S ribosome splits into the 60S and 40S subunits, which swiftly detach from the mRNA. UPF1 promotes the disassociation of the post-termination ribosome from the mRNA. Release of the ribosome subunits leaves the long region downstream of the PTC unprotected, destabilizing the mRNP and translation circuit and causing rapid mRNA decay. (b) Termination at PTCs located near the normal 3′ UTR leaves only a short region that is not trafficked by translating ribosomes, which is not sufficient to destabilize the mRNP and translation circuit. Proximity to the normal 3′ UTR also promotes the recycling of 40S subunits, which remain associated with the mRNA and migrate to the 5′ end of the same mRNA. Both factors synergize to maintain normal mRNP stability.
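The geometric idea behind the proposed model, that an early PTC leaves a long stretch of normally translated mRNA without ribosomes, can be illustrated with a minimal sketch. The function names, the 0-based coordinate convention and the 400-codon example are assumptions made for illustration. The model does not state how long the exposed region must be to destabilize the mRNP, so no cutoff is encoded.

```python
def codon_to_index(cds_start: int, codon_number: int) -> int:
    """0-based index of the first base of a codon (codon 1 is the start codon)."""
    if codon_number < 1:
        raise ValueError("codon numbers start at 1")
    return cds_start + 3 * (codon_number - 1)


def unprotected_length(normal_stop: int, ptc_index: int) -> int:
    """Bases between the end of the PTC and the normal stop codon.

    This is the normally translated region that ribosomes no longer traverse.
    """
    if ptc_index + 3 > normal_stop:
        raise ValueError("PTC must lie upstream of the normal stop codon")
    return normal_stop - (ptc_index + 3)


def unprotected_fraction(cds_start: int, normal_stop: int, ptc_index: int) -> float:
    """Unprotected region as a fraction of the full coding region."""
    cds_length = normal_stop - cds_start
    if cds_length <= 0:
        raise ValueError("coding region must have positive length")
    return unprotected_length(normal_stop, ptc_index) / cds_length


# Hypothetical 400-codon reading frame starting at base 50.
CDS_START, NORMAL_STOP = 50, 1250
early_ptc = codon_to_index(CDS_START, 20)   # PTC at codon 20
late_ptc = codon_to_index(CDS_START, 390)   # PTC at codon 390

early_exposed = unprotected_length(NORMAL_STOP, early_ptc)
late_exposed = unprotected_length(NORMAL_STOP, late_ptc)
early_fraction = unprotected_fraction(CDS_START, NORMAL_STOP, early_ptc)
late_fraction = unprotected_fraction(CDS_START, NORMAL_STOP, late_ptc)
```

In this toy example the early PTC exposes 1,140 bases (95% of the coding region) and the late PTC exposes 30 bases, which matches the qualitative contrast between panels a and b.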


and active translation in all eukaryotic systems1. Our model proposes an active role for UPF1: it promotes NMD by stimulating ribosome release as a result of its interaction with eIF3. Consequently, factors such as the EJC that stimulate either UPF1 recruitment or its activation would enhance NMD. We speculate that in cells without UPF1, the region downstream of the PTC continues to be traversed by ribosome subunits that fail to detach from the mRNA.

UPF1 also regulates the stability of many normal transcripts that do not have obvious NMD features90, consistent with the idea that it may also act at normal stop codons. Our model is consistent with the fact that mRNAs with artificially long 3′ UTRs tend to be susceptible to UPF1-dependent decay. Conversely, we must hypothesize that endogenous mRNAs that naturally possess unusually long 3′ UTRs have evolved features to optimize their stability.

The ‘ribosome release’ model does not impose constraints on the nature of the 3′ UTR. However, some of the idiosyncrasies of NMD most likely depend on the specific properties of individual 3′ UTRs and on the fact that translation probably takes place in a closed-loop mRNP, around which ribosomes may make several circuits after the initial phase of translation91. For example, the fact that PTCs that are close to the 3′ end sometimes escape NMD might be due to rapid recycling of the released 40S subunit onto the 5′ end of the same mRNA91,92, which possibly involves either the bypass of the 3′ UTR by shunting the 40S subunit to the 5′ end directly or retention of the subunit on the mRNA and scanning. Later PTCs will be less likely to interrupt the translation circuit than early ones. Of course, this ribosome release hypothesis is at this stage a speculative working model to be tested by future studies.

Outlook

As discussed above, the role of splicing and the EJC in NMD has been questioned by recent studies; however, despite these new observations, it seems likely to us that the role of nearby introns in regulating NMD is still important, and it deserves to be addressed in the future. For example, one study reported that for one of their PTCs (ter440), NMD can take place without a downstream intron56; but it should be noted that when a downstream intron is present, the residual mRNA level is reduced a further sevenfold. Notably, all the studies that have questioned the role of introns in NMD have used minigene reporters that include one or more introns upstream of the PTC. Perhaps the upstream introns act as ‘failsafe’ elements and removing them would prevent or drastically reduce NMD93. The role of the EJC in NMD will clearly need to be revised after these recent studies, but the concept of a protein complex that is deposited on the mRNA in the nucleus and exported to the cytoplasm remains a plausible explanation for how translation and NMD might be affected by an intron downstream of the PTC. However, if the PTC is located downstream of the intron, it is more difficult to envisage a similar mechanism: the ribosome would be expected to strip off any RNP complex preceding the termination codon. One possible explanation might be that the increased NMD is the secondary consequence of enhanced translation of spliced mRNA; several studies have reported that introns can enhance gene expression94, possibly by affecting the first round of translation95.

Future studies should also investigate the controversial possibility that translation may affect splicing and other steps of pre-mRNA processing directly, as suggested by reports that translation can occur in the nucleus96,97 and that PTCs can affect pre-mRNA 3′ end processing61 and splicing (reviewed in ref. 98). The issue of whether translation and NMD can occur in the nucleus has been neglected lately; however, it was recently discovered that small introns are under selective pressure to encode PTCs in frame with the preceding exon, so that if the intron is not spliced out, it will cause premature translation termination18. This study focused on Paramecium tetraurelia, but the finding seems to hold true across genomes as diverse as those of Arabidopsis thaliana, Homo sapiens, C. elegans, D. melanogaster and Schizosaccharomyces pombe. Knockdown of UPF1 in P. tetraurelia leads to accumulation of unspliced RNA, suggesting that NMD does preferentially remove PTC-containing transcripts18. This is a compelling indication that translation and splicing are coupled. Although these observations would be consistent with the view that NMD regulates splicing indirectly by removing unspliced pre-mRNAs and spliced isoforms containing PTCs19, they would also be consistent with the view that translation may affect splicing and other steps of pre-mRNA processing directly.

Finally, we suggest that many of the ambiguities in the field may result from a failure to consider the more complex idea that nonsense-mediated mRNA reduction (NMMR) could be the result of a ‘double act’ that involves two superimposed processes, only one of which may affect mRNA decay. One would be an exclusively cytoplasmic process that reduces the stability of polysomal mRNA. We suggest that the primary factor triggering NMD is that a long region of the mRNA downstream of the PTC is not covered by ribosomes, as outlined by our model (Fig. 2), but the actual destruction of the mRNA might involve different mechanisms. The other would be a process coupled to nuclear events that we do not yet understand. In this latter interpretation, nuclear NMMR would simply be a result of reduced mRNA production due to inefficient pre-mRNA processing caused by the presence of the PTC. Generally, NMD is portrayed as a single biochemical mechanism because NMMR, be it cytoplasmic or nucleus-associated, typically does not occur in cells that have been depleted of any of the UPF1, UPF2 or UPF3 subunits of the NMD machinery. However, there are cases in which NMD occurs independently of UPF2 and UPF3 (refs. 99,100). Moreover, genome-wide microarray studies have indicated that depleting cells of these NMD factors stabilizes some mRNAs that do not contain PTCs or other features conducive to NMD20,90. Notably, mutations in UPF1 and other NMD factors in D. melanogaster cause an increase in the expression of many transgene reporters, regardless of whether PTCs or other NMD features are present22. Thus, depletion of UPF1 may tend to have a nonspecific mRNA stabilizing activity; if this is the case, it becomes misleading to conclude that two instances of NMMR are caused by the same mechanism just because loss of UPF1 function suppresses them both. Future research will unveil the mechanisms behind the twists and turns of this important field.

ACKNOWLEDGMENTS

We thank B. Michell for critically reading the manuscript. S.B. is supported by a Royal Society URF fellowship and J.W. by a Darwin Trust PhD Scholarship.

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Maquat, L.E. When cells stop making sense: effects of nonsense codons on RNA metabolism in vertebrate cells. RNA 1, 453–465 (1995).

2. Peltz, S.W., Brown, A.H. & Jacobson, A. Messenger RNA destabilization triggered by premature translational termination depends on at least 3 cis-acting sequence elements and one trans-acting factor. Genes Dev. 7, 1737–1754 (1993).

3. Morse, D.E. & Yanofsky, C. Polarity and the degradation of mRNA. Nature 224, 329–331 (1969).

4. Amrani, N., Sachs, M.S. & Jacobson, A. Early nonsense: mRNA decay solves a translational problem. Nat. Rev. Mol. Cell Biol. 7, 415–425 (2006).

5. Maquat, L.E. Nonsense-mediated mRNA decay: splicing, translation and mRNP dynamics. Nat. Rev. Mol. Cell Biol. 5, 89–99 (2004).

6. Culbertson, M.R., Underbrink, K.M. & Fink, G.R. Frameshift suppression in Saccharomyces cerevisiae II. Genetic properties of group II suppressors. Genetics 95, 833–853 (1980).

7. Conti, E. & Izaurralde, E. Nonsense-mediated mRNA decay: molecular insights and mechanistic variations across species. Curr. Opin. Cell Biol. 17, 316–325 (2005).

8. Cui, Y., Gonzalez, C.I., Kinzy, T.G., Dinman, J.D. & Peltz, S.W. Mutations in the MOF2/SUI1 gene affect both translation and nonsense-mediated mRNA decay. RNA 5, 794–804 (1999).

9. Welch, E.M. & Jacobson, A. An internal open reading frame triggers nonsense-mediated decay of the yeast SPT10 mRNA. EMBO J. 18, 6134–6145 (1999).


intronless melanocortin 4-receptor gene is NMD insensitive. Hum. Mol. Genet. 11, 331–335 (2002).

41. Maquat, L.E. & Li, X. Mammalian heat shock p70 and histone H4 transcripts, which derive from naturally intronless genes, are immune to nonsense-mediated decay. RNA 7, 445–456 (2001).

42. Nagy, E. & Maquat, L.E. A rule for termination-codon position within intron-containing genes: when nonsense affects RNA abundance. Trends Biochem. Sci. 23, 198–199 (1998).

43. Le Hir, H., Gatfield, D., Izaurralde, E. & Moore, M.J. The exon-exon junction complex provides a binding platform for factors involved in mRNA export and nonsense-mediated mRNA decay. EMBO J. 20, 4987–4997 (2001).

44. Chamieh, H., Ballut, L., Bonneau, F. & Le Hir, H. NMD factors UPF2 and UPF3 bridge UPF1 to the exon junction complex and stimulate its RNA helicase activity. Nat. Struct. Mol. Biol. 15, 85–93 (2008).

45. Ishigaki, Y., Li, X., Serin, G. & Maquat, L.E. Evidence for a pioneer round of mRNA translation: mRNAs subject to nonsense-mediated decay in mammalian cells are bound by CBP80 and CBP20. Cell 106, 607–617 (2001).

46. Lejeune, F., Ishigaki, Y., Li, X. & Maquat, L.E. The exon junction complex is detected on CBP80-bound but not eIF4E-bound mRNA in mammalian cells: dynamics of mRNP remodeling. EMBO J. 21, 3536–3545 (2002).

47. Tange, T.O., Nott, A. & Moore, M.J. The ever-increasing complexities of the exon junction complex. Curr. Opin. Cell Biol. 16, 279–284 (2004).

48. Serin, G., Gersappe, A., Black, J.D., Aronoff, R. & Maquat, L.E. Identification and characterization of human orthologues to Saccharomyces cerevisiae Upf2 protein and Upf3 protein (Caenorhabditis elegans SMG-4). Mol. Cell. Biol. 21, 209–223 (2001).

49. Ivanov, P.V., Gehring, N.H., Kunz, J.B., Hentze, M.W. & Kulozik, A.E. Interactions between UPF1, eRFs, PABP and the exon junction complex suggest an integrated model for mammalian NMD pathways. EMBO J. 27, 736–747 (2008).

50. Buhler, M., Paillusson, A. & Muhlemann, O. Efficient downregulation of immunoglobulin µ mRNA with premature translation-termination codons requires the 5′-half of the VDJ exon. Nucleic Acids Res. 32, 3304–3315 (2004).

51. Wang, J., Gudikote, J.P., Olivas, O.R. & Wilkinson, M.F. Boundary-independent polar nonsense-mediated decay. EMBO Rep. 3, 274–279 (2002).

52. Gatfield, D., Unterholzner, L., Ciccarelli, F.D., Bork, P. & Izaurralde, E. Nonsense-mediated mRNA decay in Drosophila: at the intersection of the yeast and mammalian pathways. EMBO J. 22, 3960–3970 (2003).

53. Longman, D., Plasterk, R.H., Johnstone, I.L. & Caceres, J.F. Mechanistic insights and identification of two novel factors in the C. elegans NMD pathway. Genes Dev. 21, 1075–1085 (2007).

54. Behm-Ansmant, I., Gatfield, D., Rehwinkel, J., Hilgers, V. & Izaurralde, E. A conserved role for cytoplasmic poly(A)-binding protein 1 (PABPC1) in nonsense-mediated mRNA decay. EMBO J. 26, 1591–1601 (2007).

55. Silva, A.L., Ribeiro, P., Inacio, A., Liebhaber, S.A. & Romao, L. Proximity of the poly(A)-binding protein to a premature termination codon inhibits mammalian nonsense-mediated mRNA decay. RNA 14, 563–576 (2008).

56. Buhler, M., Steiner, S., Mohn, F., Paillusson, A. & Muhlemann, O. EJC-independent degradation of nonsense immunoglobulin-µ mRNA depends on 3′ UTR length. Nat. Struct. Mol. Biol. 13, 462–464 (2006).

57. Singh, G., Rebbapragada, I. & Lykke-Andersen, J. A competition between stimulators and antagonists of Upf complex recruitment governs human nonsense-mediated mRNA decay. PLoS Biol. 6, e111 (2008).

58. Eberle, A.B., Stalder, L., Mathys, H., Orozco, R.Z. & Muhlemann, O. Posttranscriptional gene regulation by spatial rearrangement of the 3′ untranslated region. PLoS Biol. 6, e92 (2008).

59. Mangus, D.A., Evans, M.C. & Jacobson, A. Poly(A)-binding proteins: multifunctional scaffolds for the post-transcriptional control of gene expression. Genome Biol. 4, 223 (2003).

60. Losson, R. & Lacroute, F. Interference of nonsense mutations with eukaryotic messenger RNA stability. Proc. Natl. Acad. Sci. USA 76, 5134–5137 (1979).

61. Brogna, S. Nonsense mutations in the alcohol dehydrogenase gene of Drosophila melanogaster correlate with an abnormal 3′ end processing of the corresponding pre-mRNA. RNA 5, 562–573 (1999).

62. Yun, D.F. & Sherman, F. Initiation of translation can occur only in a restricted region of the CYC1 mRNA of Saccharomyces cerevisiae. Mol. Cell. Biol. 15, 1021–1033 (1995).

63. Sachs, A. Physical and functional interactions between the mRNA cap structure and the poly(A) tail. in Translational Control of Gene Expression. (eds. Sonenberg, N., Hershey, J.W.B. & Mathews, M.B.) 447–465 (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 2000).

64. Amrani, N., Ghosh, S., Mangus, D.A. & Jacobson, A. Translation factors promote the formation of two states of the closed-loop mRNP. Nature 453, 1276–1280 (2008).

65. Imataka, H., Gradi, A. & Sonenberg, N. A newly identified N-terminal amino acid sequence of human eIF4G binds poly(A)-binding protein and functions in poly(A)-dependent translation. EMBO J. 17, 7480–7489 (1998).

66. Le, H. et al. Translation initiation factors eIF-iso4G and eIF-4B interact with the poly(A)-binding protein and increase its RNA binding activity. J. Biol. Chem. 272, 16247–16255 (1997).

67. Christensen, A.K. & Bourne, C.M. Shape of large bound polysomes in cultured fibroblasts and thyroid epithelial cells. Anat. Rec. 255, 116–129 (1999).

68. Kuperwasser, N., Brogna, S., Dower, K. & Rosbash, M. Nonsense-mediated decay does not occur within the yeast nucleus. RNA 10, 1907–1915 (2004).

10. Cali, B.M., Kuchma, S.L., Latham, J. & Anderson, P. smg-7 is required for mRNA surveillance in Caenorhabditis elegans. Genetics 151, 605–616 (1999).

11. Hodgkin, J., Papp, A., Pulak, R., Ambros, V. & Anderson, P. A new kind of informational suppression in the nematode Caenorhabditis elegans. Genetics 123, 301–313 (1989).

12. Page, M.F., Carr, B., Anders, K.R., Grimson, A. & Anderson, P. SMG-2 is a phosphorylated protein required for mRNA surveillance in Caenorhabditis elegans and related to Upf1p of yeast. Mol. Cell. Biol. 19, 5943–5951 (1999).

13. Grimson, A., O’Connor, S., Newman, C.L. & Anderson, P. SMG-1 is a phosphatidylinositol kinase-related protein kinase required for nonsense-mediated mRNA decay in Caenorhabditis elegans. Mol. Cell. Biol. 24, 7483–7490 (2004).

14. Unterholzner, L. & Izaurralde, E. SMG7 acts as a molecular link between mRNA surveillance and mRNA decay. Mol. Cell 16, 587–596 (2004).

15. Kashima, I. et al. Binding of a novel SMG-1-Upf1-eRF1-eRF3 complex (SURF) to the exon junction complex triggers Upf1 phosphorylation and nonsense-mediated mRNA decay. Genes Dev. 20, 355–367 (2006).

16. He, F., Peltz, S.W., Donahue, J.L., Rosbash, M. & Jacobson, A. Stabilization and ribosome association of unspliced pre-mRNAs in a yeast Upf1– mutant. Proc. Natl. Acad. Sci. USA 90, 7034–7038 (1993).

17. Sayani, S., Janis, M., Lee, C.Y., Toesca, I. & Chanfreau, G.F. Widespread impact of nonsense-mediated mRNA decay on the yeast intronome. Mol. Cell 31, 360–370 (2008).

18. Jaillon, O. et al. Translational control of intron splicing in eukaryotes. Nature 451, 359–362 (2008).

19. McGlincy, N.J. & Smith, C.W. Alternative splicing resulting in nonsense-mediated mRNA decay: what is the meaning of nonsense? Trends Biochem. Sci. 33, 385–393 (2008).

20. Rehwinkel, J., Raes, J. & Izaurralde, E. Nonsense-mediated mRNA decay: target genes and functional diversification of effectors. Trends Biochem. Sci. 31, 639–646 (2006).

21. Medghalchi, S.M. et al. Rent1, a trans-effector of nonsense-mediated mRNA decay, is essential for mammalian embryonic viability. Hum. Mol. Genet. 10, 99–105 (2001).

22. Metzstein, M.M. & Krasnow, M.A. Functions of the nonsense-mediated mRNA decay pathway in Drosophila development. PLoS Genet. 2, e180 (2006).

23. Yoine, M., Nishii, T. & Nakamura, K. Arabidopsis UPF1 RNA helicase for nonsense-mediated mRNA decay is involved in seed size control and is essential for growth. Plant Cell Physiol. 47, 572–580 (2006).

24. Weischenfeldt, J. et al. NMD is essential for hematopoietic stem and progenitor cells and for eliminating by-products of programmed DNA rearrangements. Genes Dev. 22, 1381–1396 (2008).

25. Ajamian, L. et al. Unexpected roles for UPF1 in HIV-1 RNA metabolism and translation. RNA 14, 914–927 (2008).

26. Azzalin, C.M. & Lingner, J. The human RNA surveillance factor UPF1 is required for S phase progression and genome stability. Curr. Biol. 16, 433–439 (2006).

27. Luke, B. et al. Saccharomyces cerevisiae Ebs1p is a putative ortholog of human Smg7 and promotes nonsense-mediated mRNA decay. Nucleic Acids Res. 35, 7688–7697 (2007).

28. Azzalin, C.M., Reichenbach, P., Khoriauli, L., Giulotto, E. & Lingner, J. Telomeric repeat containing RNA and RNA surveillance factors at mammalian chromosome ends. Science 318, 798–801 (2007).

29. Brumbaugh, K.M. et al. The mRNA surveillance protein hSMG-1 functions in genotoxic stress response pathways in mammalian cells. Mol. Cell 14, 585–598 (2004).

30. Zhang, S., Ruizechevarria, M.J., Quan, Y. & Peltz, S.W. Identification and characterization of a sequence motif involved in nonsense-mediated messenger RNA decay. Mol. Cell. Biol. 15, 2231–2244 (1995).

31. Gonzalez, C.I., Ruiz-Echevarria, M.J., Vasudevan, S., Henry, M.F. & Peltz, S.W. The yeast hnRNP-like protein Hrp1/Nab4 marks a transcript for nonsense-mediated mRNA decay. Mol. Cell 5, 489–499 (2000).

32. Hilleren, P. & Parker, R. mRNA surveillance in eukaryotes: kinetic proofreading of proper translation termination as assessed by mRNP domain organization? RNA 5, 711–719 (1999).

33. Muhlrad, D. & Parker, R. Aberrant mRNAs with extended 3′ UTRs are substrates for rapid degradation by mRNA surveillance. RNA 5, 1299–1307 (1999).

34. Amrani, N. et al. A faux 3′-UTR promotes aberrant termination and triggers nonsense-mediated mRNA decay. Nature 432, 112–118 (2004).

35. Hoshino, S., Imai, M., Kobayashi, T., Uchida, N. & Katada, T. The eukaryotic polypeptide chain releasing factor (eRF3/GSPT) carrying the translation termination signal to the 3′-poly(A) tail of mRNA. Direct association of erf3/GSPT with polyadenylate-binding protein. J. Biol. Chem. 274, 16677–16680 (1999).

36. Cosson, B. et al. Poly(A)-binding protein acts in translation termination via eukaryotic release factor 3 interaction and does not influence [PSI(+)] propagation. Mol. Cell. Biol. 22, 3301–3315 (2002).

37. Zhang, J., Sun, X.L., Qian, Y.M., LaDuca, J.P. & Maquat, L.E. At least one intron is required for the nonsense-mediated decay of triosephosphate isomerase mRNA: a possible link between nuclear splicing and cytoplasmic translation. Mol. Cell. Biol. 18, 5272–5283 (1998).

38. Carter, M.S., Li, S.L. & Wilkinson, M.F. A splicing dependent regulatory mechanism that detects translation signals. EMBO J. 15, 5965–5975 (1996).

39. Thermann, R. et al. Binary specification of nonsense codons by splicing and cytoplasmic translation. EMBO J. 17, 3484–3494 (1998).

40. Brocke, K.S., Neu-Yilik, G., Gehring, N.H., Hentze, M.W. & Kulozik, A.E. The human



85. Pisarev, A.V., Hellen, C.U. & Pestova, T.V. Recycling of eukaryotic posttermination ribosomal complexes. Cell 131, 286–299 (2007).

86. Lejeune, F., Li, X. & Maquat, L.E. Nonsense-mediated mRNA decay in mammalian cells involves decapping, deadenylating, and exonucleolytic activities. Mol. Cell 12, 675–687 (2003).

87. Gatfield, D. & Izaurralde, E. Nonsense-mediated messenger RNA decay is initiated by endonucleolytic cleavage in Drosophila. Nature 429, 575–578 (2004).

88. Huntzinger, E., Kashima, I., Fauser, M., Sauliere, J. & Izaurralde, E. SMG6 is the catalytic endonuclease that cleaves mRNAs containing nonsense codons in metazoan. RNA 14, 2609–2617 (2008).

89. Eberle, A.B., Lykke-Andersen, S., Muhlemann, O. & Jensen, T.H. SMG6 promotes endonucleolytic cleavage of nonsense mRNA in human cells. Nat. Struct. Mol. Biol. 16, 49–55 (2009).

90. Johansson, M.J., He, F., Spatrick, P., Li, C. & Jacobson, A. Association of yeast Upf1p with direct substrates of the NMD pathway. Proc. Natl. Acad. Sci. USA 104, 20872–20877 (2007).

91. Kopeina, G.S. et al. Step-wise formation of eukaryotic double-row polyribosomes and circular translation of polysomal mRNA. Nucleic Acids Res. 36, 2476–2488 (2008).

92. Uchida, N., Hoshino, S., Imataka, H., Sonenberg, N. & Katada, T. A novel role of the mammalian GSPT/eRF3 associating with poly(A)-binding protein in cap/poly(A)-dependent translation. J. Biol. Chem. 277, 50286–50292 (2002).

93. Matsuda, D., Hosoda, N., Kim, Y.K. & Maquat, L.E. Failsafe nonsense-mediated mRNA decay does not detectably target eIF4E-bound mRNA. Nat. Struct. Mol. Biol. 14, 974–979 (2007).

94. Nott, A., Meislin, S.H. & Moore, M.J. A quantitative analysis of intron effects on mammalian gene expression. RNA 9, 607–617 (2003).

95. Ma, X.M., Yoon, S.O., Richardson, C.J., Julich, K. & Blenis, J. SKAR links pre-mRNA splicing to mTOR/S6K1-mediated enhanced translation efficiency of spliced mRNAs. Cell 133, 303–313 (2008).

96. Brogna, S., Sato, T.A. & Rosbash, M. Ribosome components are associated with sites of transcription. Mol. Cell 10, 93–104 (2002).

97. Iborra, F.J., Jackson, D.A. & Cook, P.R. Coupled transcription and translation within nuclei of mammalian cells. Science 293, 1139–1142 (2001).

98. Maquat, L.E. NASty effects on fibrillin pre-mRNA splicing: another case of ESE does it, but proposals for translation-dependent splice site choice live on. Genes Dev. 16, 1743–1753 (2002).

99. Gehring, N.H. et al. Exon-junction complex components specify distinct routes of nonsense-mediated mRNA decay with differential cofactor requirements. Mol. Cell 20, 65–75 (2005).

100. Chan, W.K. et al. An alternative branch of the nonsense-mediated decay pathway. EMBO J. 26, 1820–1830 (2007).

69. Romao, L. et al. Nonsense mutations in the human β-globin gene lead to unexpected levels of cytoplasmic mRNA accumulation. Blood 96, 2895–2901 (2000).

70. Belgrader, P., Cheng, J., Zhou, X.B., Stephenson, L.S. & Maquat, L.E. Mammalian nonsense codons can be cis effectors of nuclear messenger RNA half life. Mol. Cell. Biol. 14, 8219–8228 (1994).

71. Zhang, J. & Maquat, L.E. Evidence that translation reinitiation abrogates nonsense-mediated mRNA decay in mammalian cells. EMBO J. 16, 826–833 (1997).

72. Inacio, A. et al. Nonsense mutations in close proximity to the initiation codon fail to trigger full nonsense-mediated mRNA decay. J. Biol. Chem. 279, 32170–32180 (2004).

73. Poyry, T.A., Kaminski, A. & Jackson, R.J. What determines whether mammalian ribosomes resume scanning after translation of a short upstream open reading frame? Genes Dev. 18, 62–75 (2004).

74. Szamecz, B. et al. eIF3a cooperates with sequences 5′ of uORF1 to promote resumption of scanning by post-termination ribosomes for reinitiation on GCN4 mRNA. Genes Dev. 22, 2414–2425 (2008).

75. Kahvejian, A., Svitkin, Y.V., Sukarieh, R., M’Boutchou, M.N. & Sonenberg, N. Mammalian poly(A)-binding protein is a eukaryotic translation initiation factor, which acts via multiple mechanisms. Genes Dev. 19, 104–113 (2005).

76. Gilbert, W.V., Zhou, K., Butler, T.K. & Doudna, J.A. Cap-independent translation is required for starvation-induced differentiation in yeast. Science 317, 1224–1227 (2007).

77. Coller, J.M., Gray, N.K. & Wickens, M.P. mRNA stabilization by poly(A) binding protein is independent of poly(A) and requires translation. Genes Dev. 12, 3226–3235 (1998).

78. Meaux, S., van Hoof, A. & Baker, K.E. Nonsense-mediated mRNA decay in yeast does not require PAB1 or a poly(A) tail. Mol. Cell 29, 134–140 (2008).

79. Simon, E. & Seraphin, B. A specific role for the C-terminal region of the poly(A)-binding protein in mRNA decay. Nucleic Acids Res. 35, 6017–6028 (2007).

80. Neu-Yilik, G. et al. Splicing and 3′ end formation in the definition of nonsense-mediated decay-competent human β-globin mRNPs. EMBO J. 20, 532–540 (2001).

81. Ling, J., Morley, S.J., Pain, V.M., Marzluff, W.F. & Gallie, D.R. The histone 3′-terminal stem-loop-binding protein enhances translation through a functional and physical interaction with eukaryotic initiation factor 4G (eIF4G) and eIF3. Mol. Cell. Biol. 22, 7853–7867 (2002).

82. Kaygun, H. & Marzluff, W.F. Regulated degradation of replication-dependent histone mRNAs requires both ATR and Upf1. Nat. Struct. Mol. Biol. 12, 794–800 (2005).

83. Isken, O. et al. Upf1 phosphorylation triggers translational repression during nonsense-mediated mRNA decay. Cell 133, 314–327 (2008).

84. Morris, C., Wittmann, J., Jack, H.M. & Jalinot, P. Human INT6/eIF3e is required for nonsense-mediated mRNA decay. EMBO Rep. 8, 596–602 (2007).


Structural characterization of Tip20p and Dsl1p, subunits of the Dsl1p vesicle tethering complex

Arati Tripathi1,2, Yi Ren1,2, Philip D Jeffrey1 & Frederick M Hughson1

Multisubunit tethering complexes are essential for intracellular trafficking and have been proposed to mediate the initial interaction between vesicles and the membranes with which they fuse. Here we report initial structural characterization of the Dsl1p complex, whose three subunits are essential for trafficking from the Golgi apparatus to the endoplasmic reticulum (ER). Crystal structures reveal that two of the three subunits, Tip20p and Dsl1p, resemble known subunits of the exocyst complex, establishing a structural connection among several multisubunit tethering complexes and implying that many of their subunits are derived from a common progenitor. We show, moreover, that Tip20p and Dsl1p interact directly via N-terminal α-helices. Finally, we establish that different Dsl1p complex subunits bind independently to different ER SNARE proteins. Our results map out two alternative protein-interaction networks capable of tethering COPI-coated vesicles, via the Dsl1p complex, to ER membranes.

Intracellular trafficking of proteins and lipids is accomplished in eukaryotes by means of vesicles that ferry cargo from one compartment to another, or to and from the plasma membrane. Cargo selection, vesicle formation, and vesicle docking and fusion require a large ensemble of cellular proteins and protein complexes1. Some of these, such as vesicle coats and SNAREs (soluble N-ethylmaleimide–sensitive factor attachment protein receptors), have reasonably well-defined functional roles: the assembly of coat subunits helps drive vesicle formation, whereas the assembly of complexes between cognate SNARE proteins catalyzes the fusion of vesicles with appropriate target membranes. Precise functional roles have not, however, been assigned to other proteins with essential roles in vesicle trafficking. Most of these additional proteins are either small G proteins of the Rab family2 or members of a seemingly heterogeneous set of proteins and protein complexes collectively termed ‘tethering factors’3.

Tethering factors have been proposed to mediate an initial, reversible attachment between a transport vesicle and its proper intracellular target membrane3,4. Nonetheless, fundamental questions about tethering factors remain unanswered. First, how many different types of tethering factors are there? A strong distinction can be drawn between homodimeric tethering factors, which are highly elongated coiled-coil proteins, and multisubunit tethering factors, which are composed of as many as ten different polypeptides5,6. Among the multisubunit tethering factors, there is clear evidence for structural diversity, and therefore mechanistic diversity, but the extent of this diversity is not understood. A second question concerns the extent to which multisubunit tethering factors actually mediate vesicle tethering. Considerable uncertainty remains on this central point, in part because, unlike budding and fusion, tethering has not been reconstituted using defined protein and lipid components. Moreover, structural information that could serve as a foundation for probing the function and mechanism of tethering factors has been, in many cases, unavailable. A third question is whether tethering factors fulfill additional functions beyond (or instead of) vesicle tethering. The multisubunit tethering factors, in particular, seem to be architecturally complex and might well possess functionality extending beyond simple membrane attachment. This seems especially plausible in light of the demonstrated genetic and/or physical interactions between multisubunit tethering factors and Rabs, vesicle coat proteins, SNAREs and other components of the cellular trafficking machinery4,7. In several cases, multisubunit tethering factors seem to influence the assembly and/or stability of SNARE complexes8–11, but the mechanism by which this is accomplished is unknown.

To establish a basis for addressing some of these questions, we and others have initiated efforts to determine the structures of multisubunit tethering complexes, or their subunits or subassemblies. To date, eight conserved multisubunit complexes, containing 3–10 subunits each and functioning largely in discrete trafficking pathways, have been identified4. The most complete structural information is available for the 300-kDa TRAPP I (transport protein particle I) complex, which functions in ER-to-Golgi trafficking12,13. EM combined with X-ray crystallography established that TRAPP I is made up of seven subunits that assemble to form a flattened, two-lobed array14. More fragmentary structural information is available for the exocyst15 and COG16 (conserved oligomeric Golgi) complexes, which operate at the plasma membrane and Golgi, respectively. Both exocyst and COG complexes are hetero-octamers with molecular weights exceeding 500 kDa. Structures of five individual subunits (four exocyst subunits17–21 and one COG subunit22) have been reported. Notably, although these structures all resemble one another, none of them resembles TRAPP I subunits. This observation divides the structurally characterized multisubunit tethering complexes into at least two different families. Whether the remaining complexes fit into either of these families is largely unknown, although sequence homology suggests that the GARP (Golgi-associated retrograde protein) complex probably belongs to the exocyst and COG family23,24. Also unknown, except for TRAPP I, is how the subunits within each complex interact with one another.

Received 26 October 2008; accepted 29 December 2008; published online 18 January 2009; doi:10.1038/nsmb.1548

1Department of Molecular Biology, Princeton University, Princeton, New Jersey 08544, USA. 2These authors contributed equally to this work. Correspondence should be addressed to F.M.H. ([email protected]).

NATURE STRUCTURAL & MOLECULAR BIOLOGY VOLUME 16 NUMBER 2 FEBRUARY 2009

Here we report initial biochemical and crystallographic analysis of the Dsl1p multisubunit tethering complex10. The Dsl1p complex has only three known subunits; in the yeast Saccharomyces cerevisiae, they are Dsl1p, Tip20p and Sec39p (also called Dsl3p). All three are essential for viability. Despite lacking predicted transmembrane domains, Dsl1p, Tip20p and Sec39p all localize to ER membranes; temperature-sensitive mutations in any one of them block Golgi-to-ER retrograde trafficking10,25–29. We have determined X-ray structures for two subunits of the Dsl1p complex, Tip20p (full length) and Dsl1p (residues 37–355). Both structures reveal unanticipated but significant similarity to subunits of the exocyst complex, providing direct structural evidence that the Dsl1p, exocyst and COG complexes are derived from a common evolutionary precursor. Our results delineate a series of protein-protein interactions capable of tethering COPI vesicles to the ER via the t-SNAREs Sec20p and Use1p. They furthermore establish that the Dsl1p complex has two independent binding sites for two different SNARE proteins, suggesting a potential role in controlling SNARE assembly.

RESULTS

Tip20p structure

We began by determining the crystal structure of full-length yeast Tip20p (residues 1–701) at 3.0-Å resolution using MAD phasing (Fig. 1a). The structure consists entirely of α-helices and intervening loops of variable length, organized into a series of helix-bundle domains. Despite the absence of any detectable sequence similarity23, there is a strong resemblance between Tip20p and each of the four exocyst subunits that have been structurally characterized (Fig. 1b). This resemblance establishes a structural link to the COG complex as well, because several COG subunits resemble exocyst subunits (ref. 22 and B.C. Richardson and F.M.H., unpublished results). Thus, the available structural data support the sorting of multisubunit tethering complexes into at least two unrelated families: one that includes TRAPP I and TRAPP II, and another that includes the exocyst, COG and Dsl1p complexes.

Tip20p is the first subunit of an exocyst/COG/Dsl1p family tethering complex to be crystallized intact; previously reported structures17–22 were based on crystals (or NMR characterization) of N-terminally truncated subunits. The most nearly complete of the previously reported structures, lacking just 66 out of 623 residues, is the exocyst subunit Exo70p17–19. Comparing Exo70p and Tip20p reveals that they share a core structure consisting of helix-bundle domains (domains A–D; Fig. 1a,b). Tip20p has, in addition, a set of N-terminal helices, as well as an extra C-terminal domain (domain E). An analogous C-terminal domain is present in one other exocyst subunit (Sec6p20), but is lacking from others (Exo70p and Exo84p17). For Sec15, the potential presence of an extra C-terminal domain is ambiguous because the published structure21 lacks C-terminal regions in addition to N-terminal regions. Further details are provided in the Figure 1 legend.

Comparing the intact Tip20p structure to the nearly intact Exo70p structure reveals a marked difference (Fig. 1b). Exo70p’s four domains are arranged in a linear array, giving rise to a rod-like shape. The corresponding domains of Tip20p, on the other hand, are arranged in a curving array, giving rise to a sharply bent, hook-like shape. The difference in global conformation between Exo70p and Tip20p is largely attributable to differences in the A–B and B–C hinge angles. The specific bent conformation observed for Tip20p is probably a thermodynamically favorable one, because it is adopted by all four independent Tip20p monomers in the crystallographic asymmetric unit (pairwise r.m.s. deviation 1.3–2.9 Å). It is possible that the straight and bent conformations simply reflect static structural differences between Exo70p and Tip20p. An intriguing alternative is that Exo70p and Tip20p, and perhaps other exocyst/COG/Dsl1p subunits, are structurally dynamic molecules that adopt both straight and bent conformations during a functional cycle. Crystallographic evidence for modest flexibility at the B–C hinge of Exo70p was reported previously18.

Figure 1 X-ray crystal structures of S. cerevisiae Dsl1p complex subunits. (a) Full-length Tip20p (residues 1–701), color coded by domain. Two views are shown; it can be seen most clearly on the right that the N-terminal helix projects away from the remainder of the protein in a manner stabilized by crystal contacts. (b) Structural alignment of Tip20p and Dsl1ΔC to known exocyst subunits. Shown are S. cerevisiae Sec6p (PDB 2FJI, residues 411–805 out of 805)20, Drosophila melanogaster Sec15 (PDB 2A2F, residues 382–699 out of 766)21, S. cerevisiae Exo70p (PDB 2PFV, residues 67–623 out of 623)17–19 and S. cerevisiae Exo84p (PDB 2D2S, residues 525–753 out of 753)17. Pairwise alignment was performed with the program DaliLite to match each of the exocyst structures to domains C through E of Tip20p; Dsl1ΔC was then aligned to domains A and B of Exo70p. The DaliLite Z scores for the alignments shown were 11.5 (Tip20p–Exo70p), 9.0 (Tip20p–Exo84p), 16.0 (Tip20p–Sec6p), 12.4 (Tip20p–Sec15) and 7.1 (Dsl1ΔC–Exo70p). (c) Dsl1ΔC (residues 37–355 out of 754), color-coded by domain.

The Tip20p structure offers a first opportunity to examine the conformation of the N-terminal region of an exocyst/COG/Dsl1p complex family subunit. The entire N-terminal region, except for residues 1–4, shows clear electron density. Notably, residues 5–38 form a long α-helix that projects away from the main body of the protein (Fig. 1a, right). This helix is stabilized, in the crystals, by forming an antiparallel coiled coil with the corresponding helix of a second monomer. This interaction is not, however, maintained in solution, as judged by sedimentation velocity analytical ultracentrifugation experiments (data not shown). Instead, as discussed below, the N-terminal helix is required for the interaction between Tip20p and another subunit of the Dsl1p complex, Dsl1p itself.

Dsl1p structure

We were able to produce soluble full-length yeast Dsl1p (residues 1–754) but could not generate diffraction-quality crystals, perhaps because the full-length protein contains a central region (residues 388–467) with an unusual concentration of charged residues28 and an absence of predicted regular secondary structure. We therefore tested truncated versions of Dsl1p, obtaining the highest-quality crystals using an N-terminal fragment (residues 1–361) that we named Dsl1ΔC. The X-ray structure of Dsl1ΔC, determined using MAD phasing and refined to 2.4-Å resolution, revealed a molecule with a significant resemblance to other exocyst/COG/Dsl1p complex family subunits (Fig. 1b,c). Like these structures, Dsl1ΔC consists primarily of α-helical bundles.

Figure 2 The Tip20p and Dsl1p subunits of the Dsl1p complex form stoichiometric heterodimers. (a) Tip20p binds full-length Dsl1p (residues 1–754). Tip20p alone, Dsl1p alone or an equimolar mixture were sized on a Superdex 200 gel-filtration column. Protein-containing fractions were analyzed using SDS-PAGE gels stained with Coomassie Blue, false-colored to match the corresponding gel-filtration profiles. (b) Tip20p binds Dsl1ΔC (residues 1–361). A slight molar excess of Tip20p was present in the mixture (blue gel-filtration profile) and accounts for the apparent trailing of the peak. (c) As assessed by isothermal titration calorimetry, Tip20p binds Dsl1ΔC with a dissociation constant of 100 nM to form 1:1 complexes. (d) Tip20ΔN (residues 82–701) does not bind Dsl1ΔC, demonstrating that the N-terminal region of Tip20p is essential for heterodimer formation. (e) Tip20p does not bind Dsl1ΔN (residues 57–754), demonstrating that the N-terminal region of Dsl1p is essential for heterodimer formation. (f) The N terminus of Tip20p (residues 1–43), fused to GST, is sufficient to bind Dsl1ΔC (residues 1–361).

Axis labels recovered from the figure: Volume (ml) for the gel-filtration profiles; Time (min), µcal s–1, Molar ratio and kcal (mol injection)–1 for the calorimetry panel.


No electron density was discernable for the first 36 residues of Dsl1ΔC, suggesting that the extreme N terminus, although present, is not well ordered crystallographically. Residues 38–73 form a long α-helix with a pronounced bend centered around residue 51 (Fig. 1c). The C-terminal portion of the helix, residues 57–73, forms the first helix of domain A. The N-terminal portion of the helix, by contrast, projects away from the rest of the protein and interacts in the crystals with the corresponding region of a second monomer via an antiparallel helix-helix interaction. Thus, both Tip20p and Dsl1ΔC crystallize in such a way that their protruding N-terminal helices are paired and mutually stabilized. It seems likely that, in Tip20p or Dsl1p monomers, these N-terminal regions would be flexible. Such flexibility is consistent with the absence of N-terminal regions from previously reported exocyst and COG structures, all of which were based on stable fragments identified by limited proteolysis17–22.

Tip20p–Dsl1p interactionPure recombinant Tip20p bound in vitro to both full-length Dsl1p(Fig. 2a) and Dsl1DC (Fig. 2b), as judged by comparing gel-filtrationchromatography profiles of the individual proteins to their equimolarmixture. Although this finding is consistent with a wealth of previousdata10,25,26,30, it represents the first demonstration of a direct physicalinteraction between Tip20p and Dsl1p. Sedimentation velocity analy-tical ultracentrifugation demonstrated that Tip20p and Dsl1DC form1:1 complexes (data not shown). Isothermal titration calorimetryyielded the same 1:1 stoichiometry, together with a Kd of 100 nM(Fig. 2c). Control ultracentrifugation experiments, using Tip20p orDsl1DC alone, revealed little or no homodimerization. Therefore, thepairing of N-terminal helices observed in both crystal structures is notsufficient to stabilize either homodimer in solution. It remainedpossible that an analogous interaction between antiparallel N-terminalhelices might mediate the formation of Tip20p–Dsl1p heterodimers.We therefore tested whether N-terminal truncations affected theability of Tip20p and Dsl1p to bind one another. Indeed, deletingthe N-terminal region of either Tip20p (residues 1–81) or Dsl1p(residues 1–56) eliminated heterodimer formation (Fig. 2d,e). On theother hand, removing just those N-terminal Dsl1p residues (1–36)that were poorly ordered in the crystal structure had no effect oncomplex formation (data not shown). A glutathione S-transferase

Dsl1p

Dsl1p

Tip20p

Tip20p–l10D L28E

Tip20p–V17E

Dsl1∆C

Dsl1∆C

Dsl1∆C

N

Leu55 Leu55

Leu48 Leu48

Leu28 Leu28

Val17 Val17

lle10 lle10

Leu58 Leu58

Leu62 Leu62

N C

Tip20p

N

N

Tip20p

C

C

C

97 kDa664597 kDa664597 kDa6645

6.0 10.0Volume (ml)

14.0 18.0

a

cd

b Figure 3 Structural and biochemical characterization of the Tip20p–Dsl1p

interaction. (a) X-ray crystal structure of the Tip20p–Dsl1DC fusion protein

(see text for details). (b) The antiparallel interaction between N-terminal

helices of Tip20p and Dsl1p. Side chains are shown for residues in the

Tip20p–Dsl1p interface. The side chains of the residues selected for

site-directed mutagenesis are labeled and shown as spheres.

‘Intermolecular’ polar interactions are highlighted with black dashed lines.

(c) Representative results of Tip20p–Dsl1DC binding experiments. Dsl1DC

binds wild-type Tip20p (above; see also Fig. 2b) but not the mutant

proteins Tip20p–I10D L28E or Tip20p-V17E. A slight molar excess of

Dsl1DC accounts for the trailing of the blue gel-filtration profile.

(d) Model for Tip20p–Dsl1DC complex generated by replacing Tip20p

residues 9–32 in the Tip20p–Dsl1DC fusion protein with full-length

Tip20p. The model contains a single steric clash, just to the left of the

blue ‘N’, involving a presumably flexible region of Tip20p (see textfor details).

Figure 4 Reconstitution of the heterotrimeric Dsl1p complex. (a) Full-length Tip20p, Dsl1p and Sec39p (residues 1–709) form stoichiometric heterotrimers. (b) Tip20ΔN (residues 82–701) does not bind to Dsl1p–Sec39p heterodimers, demonstrating that the N-terminal region of Tip20p is essential for its incorporation into the Dsl1p complex. (c) Tip20p–Dsl1ΔC heterodimers do not bind Sec39p, demonstrating that a C-terminal region of Dsl1p is essential for incorporation of Sec39p into the Dsl1p complex.


(GST) fusion protein containing only the first 43 residues of Tip20p, corresponding to the N-terminal helix, was sufficient to bind Dsl1ΔC (Fig. 2f). Together, these results provided a strong indication that Tip20p–Dsl1p heterodimerization entails the pairing of one N-terminal helix from each protein.

To further analyze the Tip20p–Dsl1p interaction, we sought to determine the crystal structure of a Tip20p–Dsl1p complex. Because all of the crystals we obtained from mixtures of Tip20p and Dsl1ΔC contained only one of the two proteins, we took an alternative approach, fusing the N-terminal helix of Tip20p to the N terminus of Dsl1ΔC. We obtained high-quality crystals from a fusion protein that linked residues 1–40 of Tip20p to the well-ordered region of Dsl1ΔC (residues 37–339) via an 8-residue glycine-serine linker. The resulting structure, determined by molecular replacement and refined to a resolution of 1.9 Å, includes residues 9–32 of Tip20p and residues 42–338 of Dsl1p (Fig. 3a,b). These are connected by a tether, not visible in electron-density maps, consisting of residues 33–40 of Tip20p, the 8-residue glycine-serine linker and residues 37–41 of Dsl1p; this 21-residue tether is capable of reaching 60 Å or more. In the crystal structure, residues 9–32 of Tip20p form an α-helix of about 35 Å in length that packs, in an antiparallel orientation, against the N-terminal helix of Dsl1p. The presence of the Tip20p helix eliminates the bend in the Dsl1p helix (compare Figs. 1c and 3a). Also noticeable, upon comparing the Dsl1ΔC structure to the fusion protein structure, is a small reorientation of domain B relative to domain A (not shown). Otherwise, neither the presence of the Tip20p helix nor the change in the crystal-packing environment causes significant perturbation in the Dsl1ΔC structure.
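
The two length estimates quoted above (a helix of about 35 Å and a tether reaching 60 Å or more) can be checked with simple geometry. The sketch below is only an illustration: the per-residue distances are textbook approximations chosen here as assumptions, not values taken from the paper, and the function names are hypothetical.

```python
# Textbook approximations (assumptions, not values from the paper).
HELIX_RISE_PER_RESIDUE = 1.5     # Å of axial rise per residue in an ideal alpha-helix
EXTENDED_SPAN_PER_RESIDUE = 3.5  # Å per residue for a nearly fully extended chain


def residue_count(first: int, last: int) -> int:
    """Number of residues in an inclusive range such as 9-32."""
    return last - first + 1


def helix_length(first: int, last: int) -> float:
    """Approximate axial length (Å) of an alpha-helix spanning first..last."""
    return residue_count(first, last) * HELIX_RISE_PER_RESIDUE


def tether_residues() -> int:
    """Tip20p 33-40, the 8-residue Gly-Ser linker, and Dsl1p 37-41."""
    return residue_count(33, 40) + 8 + residue_count(37, 41)


def max_tether_reach() -> float:
    """Upper-bound reach (Å) if the disordered tether were fully extended."""
    return tether_residues() * EXTENDED_SPAN_PER_RESIDUE


if __name__ == "__main__":
    print("helix 9-32:", helix_length(9, 32), "Å")
    print("tether residues:", tether_residues())
    print("tether reach:", max_tether_reach(), "Å")
```

Under these assumptions the 24-residue helix comes to 36 Å and the 21-residue tether to roughly 70 Å when extended, in line with the figures given in the text.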

To test whether the antiparallel helix-helix interaction observed in the crystal structure of the fusion protein was required for Tip20p–Dsl1ΔC heterodimer formation, we used site-directed mutagenesis to change interfacial hydrophobic residues to either glutamate or aspartate (Fig. 3b). In excellent agreement with the structure, each of the following mutations abolished heterodimer formation: Tip20p–I10D L28E, Tip20p-V17E, Dsl1ΔC-L48E, Dsl1ΔC-L55E and Dsl1ΔC-L58D; representative results are shown in Figure 3c. Dsl1ΔC-L41E, on the other hand, modifies a residue that is not ordered in the crystal structure (and therefore not present in Fig. 3b); as expected, this modification had no effect on binding. The only unexpected result was that the buried interfacial residue Leu62 could be replaced by glutamate without eliminating binding. However, closer inspection revealed that the mutant Glu62 side chain could, with minor structural readjustment, salt bridge with Arg13 of Tip20p. Thus, structure-based mutagenesis seems to be fully consistent with the X-ray structural analysis of the Tip20p–Dsl1ΔC interaction.

Figure 5 ER SNAREs Sec20p and Use1p bind Dsl1p complex via different subunits. (a) Sec20ΔC (cytoplasmic domain, residues 1–275) binds directly to Tip20p. (b) Sec20ΔC binds the intact Dsl1p complex to form stoichiometric heterotetramers. (c) Use1ΔC (cytoplasmic domain; see text for details) binds directly to Sec39p. (d) Use1ΔC, Sec39p, Dsl1p and Tip20p form a heterotetrameric complex. (e) Use1ΔC, Sec39p, Dsl1p, Tip20p and Sec20ΔC form a heteropentameric complex. (f) Use1ΔC–Sec39p–Dsl1p does not bind Tip20ΔN–Sec20ΔC.


We then docked full-length Tip20p onto the structure of the Tip20p–Dsl1ΔC fusion protein by overlaying residues 9–32 of Tip20p (Fig. 3d). This docking exercise revealed a steric clash between the N terminus of Dsl1ΔC (residues 42–46) and a short helix in the N-terminal region of Tip20p (residues 46–55). This clash is readily resolved, however, by allowing flexibility in the loop connecting the Tip20p N-terminal helix (residues 5–38) to the clashing helix (residues 46–55). Allowing this flexibility is justified by the high likelihood that the specific positioning of this region is dictated by lattice contacts in the Tip20p crystals. Thus, we propose that Tip20p and Dsl1p interact via sequences at (Tip20p) or near (Dsl1p) their N termini. Furthermore, we suggest that this interaction mode likely results in a pliable connection, because of flexibility between the N-terminal helix of Tip20p and the bulk of the Tip20p molecule.

In light of the importance of the N-terminal regions in mediating the interaction of Tip20p and Dsl1p, we were surprised that replacing the full-length Tip20p or Dsl1p subunits with N-terminally truncated versions had been reported to cause only relatively mild growth and trafficking defects in yeast (refs. 10,31). Nonetheless, we were able to obtain additional evidence for these earlier conclusions by using plasmid shuffling to replace wild-type subunits with mutant subunits incapable of forming stable heterodimers. We tested the mutants Tip20p–I10D L28E, Tip20p-V17E, Tip20ΔN and Dsl1p–L55E L58D; in no case did we observe a growth defect (data not shown). To attempt to resolve this apparent conundrum, we carried out additional experiments to investigate the network of protein interactions centered around Tip20p and Dsl1p.

Dsl1p ternary complex

The only other known component of the Dsl1p complex, in addition to Tip20p and Dsl1p itself, is Sec39p (refs. 10,32). We were able to reconstitute stoichiometric Tip20p–Dsl1p–Sec39p complexes by combining the three full-length recombinant proteins in an equimolar ratio (Fig. 4a). When only two of the three proteins were combined, we found that Sec39p bound directly to Dsl1p but not to Tip20p (Supplementary Fig. 1a online and data not shown). These results are consistent with a model in which the Dsl1p subunit lies at the center of a ternary Sec39p–Dsl1p–Tip20p complex, where it serves to link Sec39p to Tip20p. Further experiments indicated that Tip20p and Sec39p bind to nonoverlapping regions of Dsl1p. As discussed above, Tip20p binds a helix near the N terminus of Dsl1p. Moreover, Tip20ΔN did not bind to Dsl1p–Sec39p complexes (Fig. 4b), demonstrating that the inclusion of Tip20p into Sec39p–Dsl1p–Tip20p complexes requires its N-terminal helix. Sec39p, on the other hand, binds to a C-terminal region of Dsl1p. This conclusion is based on the observation that a C-terminal fragment of Dsl1p, Dsl1ΔN1 (residues 340–754), bound efficiently to Sec39p (Supplementary Fig. 1b). Furthermore, neither Dsl1ΔC (Supplementary Fig. 1c) nor Tip20p–Dsl1ΔC complexes (Fig. 4c) were able to bind Sec39p.

Interaction of the Dsl1p complex with ER-localized SNAREs

The subunits of the Dsl1p complex, although they lack potential transmembrane domains, localize to ER membranes (refs. 10,25–27). This localization may be mediated, at least in part, by an interaction between Tip20p and the ER SNARE protein Sec20p (refs. 10,27,33,34). Indeed, Tip20p (originally named Tip1p) was first discovered in a screen for ‘SEC twenty interacting protein’ genes (ref. 27). We attempted to recapitulate this interaction by testing whether Tip20p and the cytoplasmic domain of the SNARE protein (residues 1–275, denoted Sec20ΔC) bind to one another directly. As predicted, they indeed formed Sec20ΔC–Tip20p complexes (Fig. 5a). Sec20ΔC also bound efficiently to Tip20ΔN (Supplementary Fig. 1d), demonstrating that the N-terminal region of Tip20p is not required for the interaction. Most importantly, Sec20ΔC bound the intact Dsl1p complex, forming a stoichiometric complex containing all four polypeptides (Fig. 5b).

Our findings imply that a chain of binary protein-protein interactions gives rise to a heterotetrameric Sec39p–Dsl1p–Tip20p–Sec20ΔC assembly. A strong prediction of the model is that disrupting the interaction between Dsl1p and Tip20p would cause the heterotetrameric complex to dissociate into two binary complexes, Sec39p–Dsl1p and Tip20p–Sec20ΔC. We tested this prediction in two ways: by replacing full-length Tip20p with Tip20ΔN and by replacing full-length Dsl1p with Dsl1ΔN. In both cases, only the two binary complexes were observed (Supplementary Fig. 1e,f). These results provide strong support for the proposed arrangement of the Dsl1p complex subunits. Notably, they also establish that the Dsl1p complex interacts with the t-SNARE Sec20p primarily, if not exclusively, through the Tip20p subunit.

Previous immunoprecipitation experiments using TAP-tagged proteins (ref. 10) suggested that the Dsl1p complex associates stoichiometrically with a second ER SNARE protein, Use1p. Like Sec20p, Use1p is required for Golgi-to-ER trafficking (refs. 35,36). Although we were unable to overexpress the cytoplasmic domain (residues 1–217) of Use1p in soluble form in Escherichia coli, we found that coexpressing it with Sec39p yielded heterodimers that could be purified to near homogeneity. Unfortunately, despite the addition of protease inhibitors, Use1p was invariably cleaved, presumably by a cellular protease, during purification. Nonetheless, both of the fragments (comprising residues 1–167 and 1–175) bind Sec39p (Fig. 5c). It is worth noting that both of these Use1p fragments lack a substantial portion of the membrane-proximal SNARE motif and are therefore unlikely to form stable SNARE complexes. Gel filtration suggests that the purified Sec39p–Use1ΔC complex, which elutes at a different volume from Sec39p alone, contains little if any unbound Sec39p (Fig. 5c). Thus, the relatively faint Coomassie Blue staining of Use1ΔC is not an indication of substoichiometric binding, but is rather a consequence of its small size relative to Sec39p and the fact that it migrates as two distinct bands. We were able to reconstitute complexes containing the

Figure 6 Schematic model for the tethering of Golgi-derived retrograde trafficking vesicles to the ER via bivalent attachment of the Dsl1p complex to the ER SNAREs Use1p and Sec20p. Also shown are two additional SNAREs, Ufe1p and Sec22p, that together with Use1p and Sec20p are thought to form the quaternary SNARE complex that mediates membrane fusion. A central, potentially disordered region of Dsl1p (residues 388–467) contains binding sites for COPI coat proteins (ref. 30).


full Dsl1p complex plus either Use1ΔC alone (Fig. 5d) or both Use1ΔC and Sec20ΔC (Fig. 5e). These results support the model shown in Figure 6. As expected, based on this model, deletion of the Tip20p N terminus severed the heteropentameric complex into two parts, Use1ΔC–Sec39p–Dsl1p and Tip20p–Sec20ΔC (Fig. 5f).
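
The reasoning behind these reconstitution experiments, a linear chain of binary contacts that falls apart into predictable pieces when one contact is broken, can be expressed as a small graph exercise. The following is a minimal sketch, not part of the study: the contact list simply transcribes the direct interactions reported in the text, and the function name `subcomplexes` is hypothetical.

```python
# Direct pairwise contacts reported in the text (one edge per binary interaction).
CONTACTS = [
    ("Use1", "Sec39"),
    ("Sec39", "Dsl1"),
    ("Dsl1", "Tip20"),
    ("Tip20", "Sec20"),
]


def subcomplexes(contacts, removed=()):
    """Return the connected groups of subunits after deleting the given contacts."""
    broken = {frozenset(pair) for pair in removed}
    graph = {}
    for a, b in contacts:
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if frozenset((a, b)) not in broken:
            graph[a].add(b)
            graph[b].add(a)

    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        members, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in members:
                members.add(node)
                stack.extend(graph[node] - members)
        seen |= members
        components.append(sorted(members))
    return components


if __name__ == "__main__":
    print(subcomplexes(CONTACTS))
    print(subcomplexes(CONTACTS, removed=[("Dsl1", "Tip20")]))
```

With all contacts intact the five polypeptides form a single group; removing the Dsl1p–Tip20p contact leaves one group containing Use1, Sec39 and Dsl1 and another containing Tip20 and Sec20, which mirrors the outcome described for Fig. 5f.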

DISCUSSION

The Dsl1p complex is composed of only three subunits, fewer than any of the other known multisubunit tethering complexes (ref. 4). Here we have shown that these subunits (Tip20p, Dsl1p and Sec39p) combine to form stoichiometric binary and ternary complexes. The Dsl1p subunit itself lies at the center of the complex, interacting via its N-terminal region with the Tip20p subunit and via its C-terminal region with the Sec39p subunit. By determining the X-ray structures for approximately half of the Dsl1p complex, including the entire Tip20p subunit and domains A–B of the Dsl1p subunit, we have been able to place these Dsl1p complex subunits into the same structural family as the known exocyst and COG complex subunits, and distinguish them from the unrelated TRAPP I complex subunits. A fourth multisubunit tethering complex, GARP, probably belongs to the exocyst/COG/Dsl1p structural family, as judged by distant sequence homology among GARP, exocyst and COG subunits (refs. 23,24). Perhaps most notably, comparison of Tip20p and Exo70p reveals a structural homology extending over all four domains of the Exo70p structure. Owing to large differences in the relative orientations of domains A–C, the overall shapes of Tip20p and Exo70p are nonetheless very different, with Exo70p adopting a straight, rodlike conformation, whereas Tip20p shows a sharply bent conformation. Only one other protein bears a strong structural resemblance to the known exocyst and Dsl1p subunits: the cargo binding domain of the yeast myosin V molecular motor Myo2p (ref. 37). Notably, this domain, too, functions in tethering processes; these include, for example, the tethering of yeast secretory vesicles to actin filaments (ref. 38).

Despite the emerging evidence for widespread structural homology among the exocyst/COG/Dsl1p family of multisubunit tethering complexes, it remains difficult to discern the extent to which the various complexes are homologous at the quaternary structural level or to which they operate using homologous mechanisms. At present, the most distinctive property shared by all of these tethering factors is a relatively large array of interacting partners. The yeast exocyst complex, for example, interacts with small GTP binding proteins on both vesicles and the plasma membrane, and in addition binds the plasma membrane t-SNARE Sec9 and the plasma membrane lipid phosphatidylinositol 4,5-bisphosphate (refs. 39–41). The COG complex interacts genetically and physically with the Rab protein Ypt1, Golgi SNAREs and COPI coat subunits (ref. 42). A similar density of interaction partners is emerging for the Dsl1p complex: although no Rab interaction has been reported, each of the three Dsl1p complex subunits binds directly to either a SNARE protein or, as discussed below, the COPI coat complex.

A potential role in catalyzing SNARE assembly is implied by the finding that the Dsl1p complex uses distinct sites to bind two different ER-localized SNARE proteins. For example, the Dsl1p complex could orient Use1p and Sec20p for facile assembly, or it could modify their conformations to release autoinhibitory interactions, or it could simply increase the local concentration of Use1p relative to Sec20p. Further developments in our ability to generate the relevant recombinant SNARE proteins will be necessary to enable in vitro tests of these possibilities; unfortunately, to date we have been unable to produce the full-length cytoplasmic domains of Use1p or the third t-SNARE, Ufe1p (Fig. 6). Nonetheless, published evidence is consistent with a role for the Dsl1p complex in SNARE assembly. Specifically, mutations or truncations in any of the Dsl1p complex subunits cause severe reductions in the amount of Use1p and Sec20p that can be co-immunoprecipitated from yeast lysates (ref. 10). It is intriguing to speculate that the potential ability of Tip20p to adopt both bent and Exo70p-like extended conformations might be important for mediating SNARE assembly. By controlling SNARE assembly, tethering complexes might orchestrate the events leading to membrane fusion.

The Dsl1p subunit contains a central region with overlapping binding sites for two different subunits of the COPI vesicle coat protein complex (refs. 25,30). This observation, in conjunction with our findings, immediately suggests a mechanism for Dsl1p complex–mediated tethering of COPI vesicles to the ER, via bivalent recognition of the ER SNAREs Sec20p and Use1p (Fig. 6). Indeed, our biochemical analysis establishes that the Tip20p subunit binds directly to Sec20p, as predicted based on earlier studies (refs. 10,27,33,34), whereas the Sec39p subunit binds directly to Use1p. Thus, the Dsl1p complex contains a COPI-coated vesicle binding site at the center and one SNARE binding site at each ‘end’. This constellation of binding sites provides a mechanism for vesicular tethering through the simultaneous recognition of vesicles (via direct interactions with the COPI coat itself) and the ER (via direct interactions with Sec20p, Use1p or both). Even in the absence of a Dsl1p–Tip20p interaction, this tethering function could, in principle, be mediated by Use1p–Sec39p–Dsl1p, potentially explaining the lack of an observed growth defect when the Dsl1p–Tip20p interaction is disrupted (refs. 10,31 and data not shown). An alternative explanation, of course, is that the Dsl1p–Tip20p interaction is stabilized in vivo by additional factors not present in our reconstituted system. The Tip20p–Dsl1p interaction could also be sensitive to the assembly state of the SNARE proteins. In the future, additional structural information about the Dsl1p complex and its binding partners should allow these and other models to be tested directly.

METHODS

Protein production. We constructed expression plasmids derived from pQLink (Addgene plasmids 13670, 13667 (ref. 43); for Use1ΔC and Sec39p coexpression), pGEX-4T1 (GE Healthcare; for GST-Tip20(1–43)), or pProExHTb (Gibco; for all other proteins) using PCR. Mutations were introduced using QuickChange Mutagenesis (Stratagene). All expression constructs were confirmed by DNA sequencing. N-terminally His6-tagged proteins were overproduced in either Rosetta or BL21 E. coli (Novagen) grown in LB media at 37 °C to an optical density at 600 nm (OD600) of 0.6–0.8 and induced by the addition of 0.3–0.5 mM IPTG. Cells were harvested after an additional 5 h of growth at 23 °C, and the tagged proteins were purified from cell lysates by Ni²⁺-affinity chromatography followed by removal of the His6 tag by digestion with rTEV protease. The cleaved proteins were then further purified by anion exchange (MonoQ; GE Healthcare) and, for crystallization, size-exclusion (S200; GE Healthcare) chromatography. Purified proteins were stored at –80 °C in 15 mM Tris, pH 8.0, 150 mM NaCl and 1–2 mM DTT. For the preparation of selenomethionine (SeMet)-labeled Dsl1p (residues 1–361) and Tip20p (residues 1–701), methionine synthesis was suppressed by metabolic inhibition essentially as described (ref. 44). Each protein was expressed in Rosetta E. coli cells grown in expression medium (M9 media supplemented with 5% (w/v) dextrose and 0.7% (w/v) yeast nitrogen base without amino acids (DIFCO)) to an OD600 of approximately 0.8. L-selenomethionine (Acros Organics) was added to a final concentration of 50 mg l⁻¹, together with a mixture of amino acids intended to inhibit the methionine biosynthetic pathway (lysine, phenylalanine, threonine, arginine, isoleucine, leucine, valine; final concentrations, 50 mg l⁻¹; Sigma). After 20 min, protein expression was induced by adding 1 mM IPTG and shaking overnight at 18 °C (Tip20p) or 23 °C (Dsl1p). SeMet-labeled proteins were purified as above, with 6 mM β-mercaptoethanol present throughout.


Crystallization and data collection. We obtained crystals of full-length SeMet-substituted Tip20p by vapor diffusion at 23 °C using a 3:1 ratio of protein (2 mg ml⁻¹) and well buffer (0.1 M N-(2-acetamido)-2-iminodiacetic acid, pH 6.0, 10% (w/v) PEG monomethyl ether 5K, 0.2 M Li2SO4, 3% (v/v) isopropanol, 5 mM DTT). After 3 d, crystals of dimensions 200 × 100 × 75 µm were obtained and were subsequently cryoprotected using well buffer supplemented with sequentially increasing amounts of glycerol (up to 22.5% (v/v)) before flash freezing in liquid nitrogen. Crystals of native and SeMet-substituted Dsl1p (residues 1–361) were obtained by vapor diffusion at 23 °C using a 2:3 ratio of protein (4 mg ml⁻¹) and well buffer (0.1 M HEPES, pH 7.5, 0.45–0.50 M sodium citrate). Crystals were cryoprotected by a brief soak in well buffer supplemented with 30% (v/v) glycerol and flash frozen in liquid nitrogen. A Tip20p–Dsl1p fusion protein (residues 1–40 of Tip20p linked to residues 37–339 of Dsl1p by the linker GGGSGGGS) formed plate-like crystals by vapor diffusion at 23 °C using a 5:5:1 ratio of protein (8 mg ml⁻¹), well buffer (0.1 M sodium acetate, pH 5.0, 0.2 M ammonium acetate, 20% (w/v) PEG 4000), and additive (1.0 M lithium chloride). The crystals were flash frozen without additional cryoprotection. All data were collected at the US National Synchrotron Light Source (NSLS) beamlines X25 or X29 and processed using the HKL suite (ref. 45).
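
The mixing ratios quoted for the three crystallization setups determine the starting composition of each drop. The sketch below works through that arithmetic for the moment of mixing only (before vapor equilibration concentrates the drop); the function name and the returned keys are hypothetical and introduced purely for illustration.

```python
def drop_composition(protein_parts, well_parts, protein_mg_ml, additive_parts=0.0):
    """Starting composition of a crystallization drop at the moment of mixing.

    Returns the diluted protein concentration (mg/ml) and the fraction of
    the drop volume contributed by well buffer and by additive.
    """
    total = protein_parts + well_parts + additive_parts
    return {
        "protein_mg_ml": protein_mg_ml * protein_parts / total,
        "well_fraction": well_parts / total,
        "additive_fraction": additive_parts / total,
    }


if __name__ == "__main__":
    # Ratios and stock concentrations as stated in the Methods text.
    print("Tip20p 3:1   ", drop_composition(3, 1, 2.0))
    print("Dsl1p  2:3   ", drop_composition(2, 3, 4.0))
    print("fusion 5:5:1 ", drop_composition(5, 5, 8.0, additive_parts=1))
```

For example, the 3:1 Tip20p drop starts at 1.5 mg/ml protein with the well-buffer components at one quarter of their stock concentrations.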

Structure determination and refinement. Tip20p crystallized in space group P1, with four molecules in the asymmetric unit (Table 1). The structure was determined using MAD phasing methods from SeMet-substituted protein to a maximum resolution of 3.0 Å. The SHELX suite of programs (ref. 46) was used to find the SeMet sites and calculate the initial electron-density maps. The program SHARP (ref. 47) was then used to further improve the phases. Electron-density maps calculated from solvent-flattened experimental phases showed clear density for a number of α-helices (Supplementary Fig. 2 online). Noncrystallographic symmetry (NCS) was determined from the SeMet sites located by SHELXD and from experimentally phased anomalous difference maps. The four copies of the monomer did not share single NCS relationships over the entire length of each of the molecules; therefore, NCS restraints and averaging needed to be defined on a local basis rather than globally over each chain. Sequence assignment was made on the basis of both model- and experimentally phased electron-density maps, with reference to four-fold averaged maps where necessary. Building was done using the programs O (ref. 48) and COOT (ref. 49). The structure was refined using the program CNS (ref. 50) against data in the range 30–3.0 Å from the peak SeMet data set, which showed the least radiation damage. NCS restraints were applied between molecules on main chain atoms (including Cβ). The final model (Supplementary Fig. 3 online) spans nearly the entire molecule, comprising residues 5–701, with residues 217–234 and 546–551 missing because of disordered loops that are not visible in any electron-density map. Data collection and structure refinement statistics are summarized in Table 1. In the Ramachandran plot, 92.0% of the residues are in the most favored regions, whereas 7.0% fall in the additional allowed regions, as judged using MolProbity (ref. 51).

We initially determined the structure of Dsl1ΔC by MAD at 3.0-Å resolution using a SeMet-substituted crystal (Table 1). Initial phases were calculated using SHELX and subsequently improved using SHARP (Supplementary Fig. 2). Refinement using REFMAC5 (ref. 52) against native data to 2.4-Å resolution yielded a model containing residues 37–238 and 245–355; no interpretable electron density was observed for residues 1–36, 239–244 or 356–361 (Supplementary Fig. 3). In the Ramachandran plot, 96.8% of the residues are in the most favored regions, whereas 2.9% fall in additional allowed regions.

We determined the Tip20p–Dsl1p fusion protein structure by molecular replacement using PHASER (ref. 53) (Table 1). Dsl1ΔC, broken into separate

Table 1 Data collection, phasing and refinement statistics

Crystal parameters
                    Tip20p SeMet          Dsl1ΔC                Dsl1ΔC SeMet          Tip20p(1–40)–Dsl1p(37–339)
Space group         P1                    P3₂21                 P3₂21                 C2
Cell dimensions
  a, b, c (Å)       85.5, 111.6, 149.8    110.6, 110.6, 77.2    110.9, 110.9, 78.3    168.4, 61.4, 37.3
  α, β, γ (°)       77.1, 88.1, 70.4      90.0, 90.0, 120.0     90.0, 90.0, 120.0     90.0, 92.0, 90.0

Data collection
                    Tip20p SeMet                           Dsl1ΔC       Dsl1ΔC SeMet                           Tip20p(1–40)–Dsl1p(37–339)
                    Peak         Inflection   Remote                    Peak         Inflection   Remote
Wavelength (Å)      0.9792       0.9794       0.9640       1.1000       0.9795       0.9797       0.9641       1.0400
Resolution (Å)      50–3.00      50–3.00      50–3.20      100–2.40     100–3.00     100–3.00     100–3.00     100–1.94
                    (3.11–3.00)  (3.11–3.00)  (3.31–3.20)  (2.49–2.40)  (3.11–3.00)  (3.11–3.00)  (3.11–3.00)  (1.99–1.94)
Rsym (%)            7.7 (49.7)   8.3 (67.0)   8.1 (56.4)   7.3 (43.3)   9.1 (39.1)   8.9 (48.4)   9.0 (61.7)   5.7 (46.7)
I/σI                12.2 (2.6)   10.8 (1.7)   10.9 (2.4)   28.7 (2.8)   34.0 (5.3)   30.5 (3.6)   27.4 (2.7)   18.0 (2.0)
Completeness (%)    98.5 (98.2)  97.5 (98.4)  98.5 (98.3)  99.2 (96.8)  99.9 (99.5)  99.8 (99.2)  99.8 (99.1)  90.7 (97.0)
Redundancy          3.7 (3.6)    3.7 (3.3)    3.7 (3.5)    7.1 (6.6)    10.4 (9.9)   10.1 (9.1)   9.5 (8.4)    3.1 (2.9)

Refinement
                    Tip20p SeMet  Dsl1ΔC        Tip20p(1–40)–Dsl1p(37–339)
Resolution (Å)      30–3.0        40.0–2.40     40.0–1.94
No. reflections     99,867        20,321        24,166
Rwork/Rfree (%)     22.0/26.4     22.9/26.6     22.5/27.7
No. atoms
  Protein           22,048ᵃ       2,549         2,566
  Water             0             49            221
B-factors (Å²)
  Protein           79.4          69.2          28.3
  Water             NA            62.5          31.3
r.m.s. deviations
  Bond lengths (Å)  0.0084        0.009         0.009
  Bond angles (°)   1.31          1.126         1.091

Values in parentheses are for the highest resolution shell. ᵃThe asymmetric unit contains four Tip20p molecules.
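
For readers less familiar with the quantities reported in Table 1, the residuals are conventionally defined as follows. These are the standard crystallographic definitions, given here as an assumption about what the table reports; the text itself does not spell them out, and the size of the test set used for Rfree is not stated.

```latex
R_{\mathrm{sym}} = \frac{\sum_{hkl}\sum_{i}\left| I_i(hkl) - \langle I(hkl)\rangle \right|}
                        {\sum_{hkl}\sum_{i} I_i(hkl)},
\qquad
R_{\mathrm{work}} = \frac{\sum_{hkl}\bigl|\,|F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)|\,\bigr|}
                         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```

Here I_i(hkl) is the i-th measurement of a reflection and ⟨I(hkl)⟩ the mean over its symmetry-related measurements; F_obs and F_calc are the observed and model-derived structure factor amplitudes. Rfree uses the same expression as Rwork but is summed only over a set of reflections withheld from refinement.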


domains, was used as the search model. The final model, built using COOT and refined against native data to 1.94-Å resolution using REFMAC5, includes Tip20p residues 9–32 and Dsl1p residues 42–238 and 245–338. In the Ramachandran plot, 99.0% of the residues are in the most favored regions, with the remaining 1.0% falling in additional allowed regions.

Structure-based sequence alignment was guided by DaliLite (ref. 54). Molecular graphics were rendered using PyMOL (http://pymol.sourceforge.net).

Binding experiments. For gel-filtration binding experiments, we prepared binding reactions by mixing proteins at 8 µM final concentration in a total volume of 300 µl in 15 mM Tris, pH 8.0, 150 mM NaCl, 2 mM DTT. After incubating on ice for 30 min, samples were loaded onto a Superdex 200 10/30 column (GE Healthcare) equilibrated with the same buffer and run at 4 °C using a flow rate of 0.6 ml min⁻¹. Equal volumes from individual 0.3-ml fractions were analyzed using Coomassie Blue-stained SDS-PAGE gels. For measuring binding by isothermal titration calorimetry, we used a VP-ITC titration microcalorimeter (Microcal). 100 µM Dsl1ΔC in 15 mM Tris, pH 8.0, 150 mM NaCl, 0.5 mM Tris(2-carboxyethyl)phosphine (TCEP) was injected into the sample cell containing 10 µM Tip20p in the same buffer. The resulting titration data were subjected to least-squares fitting using Origin version 7.0 (Origin Laboratories). For measuring binding to immobilized GST fusion proteins, cell lysates containing GST or GST-Tip20p (residues 1–43) were loaded onto glutathione resin (Clontech Laboratories). After washing the beads with buffer (15 mM Tris, pH 8.0, 150 mM NaCl, 1 mM DTT), purified Dsl1ΔC was added and binding was allowed to proceed for 1 h at 23 °C. Beads were washed extensively with the same buffer, after which bound proteins were analyzed using Coomassie-stained SDS-PAGE gels.
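
To see why a 100 nM Kd is expected to give a nearly stoichiometric complex in the gel-filtration mixtures, one can evaluate the standard 1:1 binding equilibrium at the loading concentration. The sketch below is illustrative only: it assumes the proteins are mixed at 8 micromolar each, treats binding as a simple two-state A + B ⇌ AB equilibrium, and ignores the dilution that occurs on the column; the function names are hypothetical.

```python
import math


def complex_concentration(a_total, b_total, kd):
    """Equilibrium [AB] for A + B <-> AB (all arguments in the same units).

    Solves [AB]^2 - (A + B + Kd)[AB] + A*B = 0 and takes the physical root.
    """
    s = a_total + b_total + kd
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0


def fraction_bound(a_total, b_total, kd):
    """Fraction of the limiting component that is in the complex."""
    return complex_concentration(a_total, b_total, kd) / min(a_total, b_total)


if __name__ == "__main__":
    # Concentrations in micromolar: 8 µM of each protein, Kd = 0.1 µM (100 nM).
    print(round(fraction_bound(8.0, 8.0, 0.1), 3))
```

Under these assumptions close to nine tenths of each protein is in the complex at loading, so the equimolar mixture should run predominantly as a single heterodimer peak.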

Accession codes. Protein Data Bank: Crystallographic coordinates for Tip20p, Dsl1ΔC and the Tip20p–Dsl1p fusion protein have been deposited with accession codes 3FHN, 3ETU and 3ETV, respectively.

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS

We gratefully acknowledge B. Kokona and R. Fairman for sedimentation velocity analytical ultracentrifugation; the staff of the National Synchrotron Light Source X25 and X29 beamlines for assistance with X-ray data collection; O. Perisic for advice on crystallization; M. Diefenbacher and A. Spang for many fruitful discussions and for communicating results before publication; K. Bussow (Max Planck Institute for Molecular Genetics, Berlin) for reagents; and S. Munro, M. Munson, A. Spang and members of the Hughson laboratory for critical comments on the manuscript. This work was supported by the US National Institutes of Health grant GM071574.

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Pfeffer, S.R. Unsolved mysteries in membrane traffic. Annu. Rev. Biochem. 76, 629–645 (2007).

2. Grosshans, B.L., Ortiz, D. & Novick, P. Rabs and their effectors: achieving specificity in membrane traffic. Proc. Natl. Acad. Sci. USA 103, 11821–11827 (2006).

3. Pfeffer, S.R. Transport-vesicle targeting: tethers before SNAREs. Nat. Cell Biol. 1, E17–E22 (1999).

4. Cai, H., Reinisch, K. & Ferro-Novick, S. Coats, tethers, Rabs, and SNAREs work together to mediate the intracellular destination of a transport vesicle. Dev. Cell 12, 671–682 (2007).

5. Gillingham, A.K. & Munro, S. Long coiled-coil proteins and membrane traffic. Biochim. Biophys. Acta 1641, 71–85 (2003).

6. Whyte, J.R. & Munro, S. Vesicle tethering complexes in membrane traffic. J. Cell Sci. 115, 2627–2637 (2002).

7. Sztul, E. & Lupashin, V. Role of tethering factors in secretory membrane traffic. Am. J. Physiol. Cell Physiol. 290, C11–C26 (2006).

8. Sato, T.K., Rehling, P., Peterson, M.R. & Emr, S.D. Class C Vps protein complex regulates vacuolar SNARE pairing and is required for vesicle docking/fusion. Mol. Cell 6, 661–671 (2000).

9. Seals, D.F., Eitzen, G., Margolis, N., Wickner, W.T. & Price, A. A Ypt/Rab effector complex containing the Sec1 homolog Vps33p is required for homotypic vacuole fusion. Proc. Natl. Acad. Sci. USA 97, 9402–9407 (2000).

10. Kraynack,, B.A. et al. Dsl1p, Tip20p, and the novel Dsl3(Sec39) protein are requiredfor the stability of the Q/t-SNARE complex at the endoplasmic reticulum in yeast. Mol.Biol. Cell 16, 3963–3977 (2005).

11. Shestakova, A., Suvorova, E., Pavliv, O., Khaidakova, G. & Lupashin, V. Interaction of the conserved oligomeric Golgi complex with t-SNARE Syntaxin5a/Sed5 enhances intra-Golgi SNARE complex stability. J. Cell Biol. 179, 1179–1192 (2007).

12. Cai, H. et al. TRAPPI tethers COPII vesicles by binding the coat subunit Sec23. Nature 445, 941–944 (2007).

13. Sacher, M. et al. TRAPP, a highly conserved novel complex on the cis-Golgi that mediates vesicle docking and fusion. EMBO J. 17, 2494–2503 (1998).

14. Kim, Y.G. et al. The architecture of the multisubunit TRAPP I complex suggests a model for vesicle tethering. Cell 127, 817–830 (2006).

15. Terbush, D.R., Maurice, T., Roth, D. & Novick, P. The Exocyst is a multiprotein complex required for exocytosis in Saccharomyces cerevisiae. EMBO J. 15, 6483–6494 (1996).

16. Ungar, D. et al. Characterization of a mammalian Golgi-localized protein complex, COG, that is required for normal Golgi morphology and function. J. Cell Biol. 157, 405–415 (2002).

17. Dong, G., Hutagalung, A.H., Fu, C., Novick, P. & Reinisch, K.M. The structures of exocyst subunit Exo70p and the Exo84p C-terminal domains reveal a common motif. Nat. Struct. Mol. Biol. 12, 1094–1100 (2005).

18. Hamburger, Z.A., Hamburger, A.E., West, A.P., Jr & Weis, W.I. Crystal structure of the S. cerevisiae exocyst component Exo70p. J. Mol. Biol. 356, 9–21 (2006).

19. Moore, B.A., Robinson, H.H. & Xu, Z. The crystal structure of mouse Exo70 reveals unique features of the mammalian exocyst. J. Mol. Biol. 371, 410–421 (2007).

20. Sivaram, M.V., Furgason, M.L., Brewer, D.N. & Munson, M. The structure of the exocyst subunit Sec6p defines a conserved architecture with diverse roles. Nat. Struct. Mol. Biol. 13, 555–556 (2006).

21. Wu, S., Mehta, S.Q., Pichaud, F., Bellen, H.J. & Quiocho, F.A. Sec15 interacts with Rab11 via a novel domain and affects Rab11 localization in vivo. Nat. Struct. Mol. Biol. 12, 879–885 (2005).

22. Cavanaugh, L.F. et al. Structural analysis of conserved oligomeric Golgi complex subunit 2. J. Biol. Chem. 282, 23418–23426 (2007).

23. Koumandou, V.L., Dacks, J.B., Coulson, R.M. & Field, M.C. Control systems for membrane fusion in the ancestral eukaryote; evolution of tethering complexes and SM proteins. BMC Evol. Biol. 7, 29 (2007).

24. Whyte, J.R. & Munro, S. The Sec34/35 Golgi transport complex is related to the exocyst, defining a family of complexes involved in multiple steps of membrane traffic. Dev. Cell 1, 527–537 (2001).

25. Andag, U., Neumann, T. & Schmitt, H.D. The coatomer-interacting protein Dsl1p is required for Golgi-to-endoplasmic reticulum retrieval in yeast. J. Biol. Chem. 276, 39150–39160 (2001).

26. Reilly, B.A., Kraynack, B.A., VanRheenen, S.M. & Waters, M.G. Golgi-to-endoplasmic reticulum (ER) retrograde traffic in yeast requires Dsl1p, a component of the ER target site that interacts with a COPI coat subunit. Mol. Biol. Cell 12, 3783–3796 (2001).

27. Sweet, D.J. & Pelham, H.R. The TIP1 gene of Saccharomyces cerevisiae encodes an 80 kDa cytoplasmic protein that interacts with the cytoplasmic domain of Sec20p. EMBO J. 12, 2831–2840 (1993).

28. VanRheenen, S.M., Reilly, B.A., Chamberlain, S.J. & Waters, M.G. Dsl1p, an essential protein required for membrane traffic at the endoplasmic reticulum/Golgi interface in yeast. Traffic 2, 212–231 (2001).

29. Kamena, F. & Spang, A. Tip20p prohibits back-fusion of COPII vesicles with the endoplasmic reticulum. Science 304, 286–289 (2004).

30. Andag, U. & Schmitt, H.D. Dsl1p, an essential component of the Golgi-endoplasmic reticulum retrieval system in yeast, uses the same sequence motif to interact with different subunits of the COPI vesicle coat. J. Biol. Chem. 278, 51722–51734 (2003).

31. Frigerio, G. The Saccharomyces cerevisiae early secretion mutant tip20 is synthetic lethal with mutants in yeast coatomer and the SNARE proteins Sec22p and Ufe1p. Yeast 14, 633–646 (1998).

32. Mnaimneh, S. et al. Exploration of essential gene functions via titratable promoter alleles. Cell 118, 31–44 (2004).

33. Sweet, D.J. & Pelham, H.R. The Saccharomyces cerevisiae SEC20 gene encodes a membrane glycoprotein which is sorted by the HDEL retrieval system. EMBO J. 11, 423–432 (1992).

34. Novick, P., Ferro, S. & Schekman, R. Order of events in the yeast secretory pathway. Cell 25, 461–469 (1981).

35. Burri, L. et al. A SNARE required for retrograde transport to the endoplasmic reticulum. Proc. Natl. Acad. Sci. USA 100, 9873–9877 (2003).

36. Dilcher, M. et al. Use1p is a yeast SNARE protein required for retrograde traffic to the ER. EMBO J. 22, 3664–3674 (2003).

37. Pashkova, N., Jin, Y., Ramaswamy, S. & Weisman, L.S. Structural basis for myosin V discrimination between distinct cargoes. EMBO J. 25, 693–700 (2006).

38. Schott, D., Ho, J., Pruyne, D. & Bretscher, A. The COOH-terminal domain of Myo2p, a yeast myosin V, has a direct role in secretory vesicle targeting. J. Cell Biol. 147, 791–808 (1999).

39. Sivaram, M.V., Saporita, J.A., Furgason, M.L., Boettcher, A.J. & Munson, M. Dimerization of the exocyst protein Sec6p and its interaction with the t-SNARE Sec9p. Biochemistry 44, 6302–6311 (2005).

40. Munson, M. & Novick, P. The exocyst defrocked, a framework of rods revealed. Nat. Struct. Mol. Biol. 13, 577–581 (2006).


41. Zhang, X. et al. Membrane association and functional regulation of Sec3 by phospholipids and Cdc42. J. Cell Biol. 180, 145–158 (2008).

42. Ungar, D., Oka, T., Krieger, M. & Hughson, F.M. Retrograde transport on the COG railway. Trends Cell Biol. 16, 113–120 (2006).

43. Scheich, C., Kummel, D., Soumailakakis, D., Heinemann, U. & Bussow, K. Vectors for co-expression of an unrestricted number of proteins. Nucleic Acids Res. 35, e43 (2007).

44. Doublie, S. Preparation of selenomethionyl proteins for phase determination. Methods Enzymol. 276, 523–530 (1997).

45. Otwinowski, Z. & Minor, W. Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307–326 (1997).

46. Sheldrick, G.M. A short history of SHELX. Acta Crystallogr. A 64, 112–122 (2008).

47. Bricogne, G., Vonrhein, C., Flensburg, C., Schiltz, M. & Paciorek, W. Generation, representation and flow of phase information in structure determination: recent developments in and around SHARP 2.0. Acta Crystallogr. D Biol. Crystallogr. 59, 2023–2030 (2003).

48. Jones, T.A., Zou, J.-Y., Cowan, S.W. & Kjeldgaard, M. Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr. A 47, 110–119 (1991).

49. Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D Biol. Crystallogr. 60, 2126–2132 (2004).

50. Brunger, A.T. et al. Crystallography & NMR System (CNS): a new software system for macromolecular structure determination. Acta Crystallogr. D Biol. Crystallogr. 54, 905–921 (1998).

51. Lovell, S.C. et al. Structure validation by Cα geometry: φ, ψ and Cβ deviation. Proteins 50, 437–450 (2003).

52. Murshudov, G.N. Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr. D Biol. Crystallogr. 53, 240–255 (1997).

53. Storoni, L.C., McCoy, A.J. & Read, R.J. Likelihood-enhanced fast rotation functions. Acta Crystallogr. D Biol. Crystallogr. 60, 432–438 (2004).

54. Holm, L. & Park, J. DaliLite workbench for protein structure comparison. Bioinformatics 16, 566–567 (2000).


High-resolution dynamic mapping of histone-DNA interactions in a nucleosome

Michael A Hall¹, Alla Shundrovsky¹,⁴, Lu Bai¹,⁴, Robert M Fulbright¹, John T Lis² & Michelle D Wang¹,³

The nature of the nucleosomal barrier that regulates access to the underlying DNA during many cellular processes is not fully understood. Here we present a detailed map of histone-DNA interactions along the DNA sequence to near base pair accuracy by mechanically unzipping single molecules of DNA, each containing a single nucleosome. This interaction map revealed a distinct ~5-bp periodicity that was enveloped by three broad regions of strong interactions, with the strongest occurring at the dyad and the other two about ±40 bp from the dyad. Unzipping up to the dyad allowed recovery of a canonical nucleosome upon relaxation of the DNA, but unzipping beyond the dyad resulted in removal of the histone octamer from its initial DNA sequence. These findings have important implications for how RNA polymerase and other DNA-based enzymes may gain access to DNA associated with a nucleosome.

The nucleosome is the fundamental repeating unit of eukaryotic chromatin, consisting of ~147 bp of DNA wrapped ~1.7 times around a histone octamer¹. Nucleosomes must be stable and yet dynamic structures, both maintaining eukaryotic DNA in a condensed state and also permitting regulated access to genetic information contained therein. During many important cellular processes, DNA binding proteins must access specific genomic regions that are occluded by nucleosomes. In particular, in vitro studies show that RNA polymerase slows down, pauses or stalls upon encountering a nucleosome²⁻⁷. The resistance that RNA polymerase encounters when transcribing a chromatin template should be largely dictated by both the strengths and locations of histone-DNA interactions in the nucleosome. Therefore a detailed map of these interactions would lay an important foundation for understanding the structural details of eukaryotic transcription and how gene expression may be regulated by histone modifications, DNA sequence and nucleosome remodeling.

Analysis of the crystal structure of the nucleosome indicates that histone-DNA interactions are not uniform along the DNA¹,⁸; however, experimental determination of this interaction map has proven to be challenging and is still largely controversial. Although it is well established that the overall stability of a nucleosome depends on its constituent DNA sequence and histone modifications⁹⁻¹¹, the way in which specific interactions in a nucleosome lead to this stability is less well understood. The mechanical nature of this problem makes it ideally suited for investigation using single-molecule manipulation approaches¹²⁻¹⁹. Previously, we have stretched single DNA molecules of chromatin and obtained data on the relative locations of strong histone-DNA interactions¹⁴,¹⁷. These data indicate the presence of three regions of strong interactions, consistent with those suggested by counting the number of apparent histone-DNA contacts seen in the nucleosome crystal structure²⁰. However, subsequent single-molecule stretching experiments challenged this interpretation and suggested that force signatures from stretching experiments can be attributed to the rotation of the spool geometry of the nucleosome rather than regions of strong histone-DNA interactions²¹. These studies favor a model in which histone-DNA interactions are uniform along the DNA²²,²³. Ambiguities exist because stretching experiments cannot readily separate contributions of geometry from those of interaction strengths, nor can they quantitatively assay interaction strengths near the dyad.

Recently, we have developed a method to sequentially determine the absolute locations of histone-DNA interactions by mechanically unzipping a DNA molecule containing a nucleosome assembled with histones purified from HeLa cells¹⁶. However, the precision of that method was insufficient to map out all of the densely packed histone-DNA interactions in a nucleosome. In the current work, using an improved unzipping method, we have mapped the locations of the interactions to near base pair accuracy along the DNA and quantitatively assayed the strengths of these interactions. The histone-DNA interaction map, together with mechanical invasion experiments, provides a simple explanation of the pausing pattern of RNA polymerase within a nucleosome and makes testable predictions on the fate of histones during transcription.

Received 15 August 2008; accepted 10 November 2008; published online 11 January 2009; doi:10.1038/nsmb.1526

¹Department of Physics, Laboratory of Atomic and Solid State Physics, ²Department of Molecular Biology and Genetics and ³Howard Hughes Medical Institute, Cornell University, Ithaca, New York 14853, USA. ⁴Present addresses: Department of Mechanical Engineering, Yale University, New Haven, Connecticut 06511, USA (A.S.); Rockefeller University, 1230 York Avenue, New York, New York 10065, USA (L.B.). Correspondence should be addressed to M.D.W. ([email protected]).

RESULTS

Mapping of interactions with near base pair precision

The experimental configuration is sketched in Figure 1a (see also Methods and Supplementary Fig. 1 online). A DNA molecule containing a single nucleosome uniquely positioned at a 601 nucleosome-positioning sequence²⁴ was attached to the surface of a microscope coverslip via one of its strands and to a microsphere held in an optical trap via the other strand¹⁶. As the coverslip was moved away from the trapped microsphere, double-stranded DNA (dsDNA) was sequentially converted to single-stranded DNA (ssDNA) upon base pair separation. As the unzipping fork progressed through the nucleosome, it encountered resistance from histone-DNA interactions at well-defined locations and, because these interactions require dsDNA, they were sequentially disrupted. The magnitude of resistance should strongly correlate with histone-DNA affinity, and thus a histone-DNA interaction map was generated along the DNA. We improved the alignment method and showed that this technique achieved a resolution of better than 1 bp (Methods and Supplementary Fig. 2c online). Its accuracy and precision in determining the absolute sequence position of an interaction were both ~1.5 bp (Methods and Supplementary Fig. 2b).

Mapping strengths of histone-DNA interactions in a nucleosome

To quantitatively assay the strengths of the histone-DNA interactions, we unzipped through individual nucleosomal DNA molecules with a constant unzipping force of ~28 pN (Methods). Under a force clamp²⁵, the dwell times at different sequence positions measure the strengths of interactions at those positions, provided that disruption of each interaction follows a similar energy landscape. Thus this method allows direct mapping of the strengths of interactions. Figure 1b shows example traces for unzipping DNA through a nucleosome under a constant force (see Supplementary Fig. 3 online for additional traces). DNA molecules were unzipped from both directions along the DNA (referred to as 'forward' and 'reverse') (Methods and Supplementary Fig. 3). In both cases, the unzipping fork did not move through the nucleosomal DNA at a constant rate but instead dwelled at specific locations within the nucleosome, indicating the presence of strong interactions. In particular, these traces revealed that the fork dwelled with discrete steps spaced by ~5 bp, and the longest dwell times tended to occur near the dyad.
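The proviso about a similar energy landscape can be made explicit with a Bell-type rate expression. This is an illustrative assumption, not a model stated in the text: suppose interaction i is disrupted by crossing a barrier of height ΔG‡ᵢ, with an attempt time τ₀ and a distance to the transition state Δx‡ that are the same for every interaction. At a clamped force F, the mean dwell time and the ratio of dwell times at two sites would then be

```latex
\tau_i(F) = \tau_0 \exp\!\left(\frac{\Delta G_i^{\ddagger} - F\,\Delta x^{\ddagger}}{k_B T}\right),
\qquad
\ln\frac{\tau_i}{\tau_j} = \frac{\Delta G_i^{\ddagger} - \Delta G_j^{\ddagger}}{k_B T}.
```

Under these assumptions the force term cancels in the ratio, so relative dwell times at constant force rank the interactions by barrier height. If Δx‡ differed between sites, the ratio would also depend on F and the ranking could change with the clamp force.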

Figure 1 Nucleosome disruptions under a constant unzipping force. (a) Experimental configuration. A DNA molecule was mechanically unzipped through a nucleosome uniquely positioned at a 601 sequence. (b) Representative traces for unzipping under a constant applied force (~28 pN). Two traces are shown: one from forward unzipping (black) and one from reverse unzipping (red). Both traces were low-pass filtered from the raw traces (gray) to 60 Hz. The unzipping fork paused at specific locations, which are evident from both the traces (left) and their corresponding dwell time histograms (right).

[Figure 1: panel a, schematic of the unzipping geometry (microscope coverslip, trapped microsphere, dsDNA, ssDNA, nucleosome, applied force); panel b, position relative to dyad (bp) versus time, arbitrary origin (s), with dwell time (s bp⁻¹) histograms for forward and reverse unzipping.]

Figure 2 Histone-DNA interaction map within a nucleosome core particle. Above, the crystal structure of the nucleosome core particle¹. Dots indicate regions where interactions between DNA and one of the core histones are likely to occur. The two halves of the nucleosome are shown separately for clarity. Below, a histone-DNA interaction map constructed from the averaged dwell time histograms of the unzipping fork at constant force (~28 pN). Individual traces were low-pass filtered to 60 Hz, and their dwell time histograms were binned to 1 bp. A total of 27 traces from the forward template and 30 traces from the reverse template were used for the construction. Each peak corresponds to an individual histone-DNA interaction, and the heights of the peaks are indicative of their relative strengths. Three regions of strong interactions are indicated: one located at the dyad (region 2) and two off-dyad locations (regions 1 and 3). Colored boxes indicate predictions from the crystal structure of where individual histone binding motifs are expected to interact with DNA. The H3 N-terminal α-helices (αN) and the histone loops (L1, L2) and α-helices (α1) that compose the L1L2 and α1α1 DNA binding sites observed in the crystal structure¹ are also indicated.

[Figure 2: above, the two nucleosome halves with histones H3, H4, H2B and H2A and the dyad marked; below, dwell time (s bp⁻¹) versus position relative to dyad (bp) for forward and reverse unzipping, with regions 1–3 marked.]

We generated an interaction map by averaging dwell time histogram measurements from many traces from both forward and reverse unzipping (Fig. 2). Several features are evident from these plots. (i) There are three broad regions of strong interactions: one located at the dyad and two approximately ±40 bp from the dyad. (ii) An ~5-bp periodicity occurred within each region of interaction. (iii) The interactions near the entry and exit DNA are particularly weak. The unzipping fork did not dwell at a 20-bp region of both entry and exit DNA, indicating that the histones are only loosely bound to the DNA. (iv) For unzipping in both the forward and reverse directions, the first two regions of interactions encountered were always detected, but not the last region. This indicates that, once the dyad region of interactions was disrupted, the nucleosome became unstable and histones dissociated from the 601 sequence. (v) The total dwell time in the nucleosome was longer in the forward direction compared with that in the reverse direction, indicating nucleosomes were more difficult to disrupt when unzipped in the forward direction, probably reflecting the nonpalindromic nature of the 601 sequence.
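The construction of such a map, per-trace dwell time histograms binned to 1 bp and then averaged across traces, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names and the synthetic traces are hypothetical, and each trace is assumed to be a uniformly sampled array of fork positions (bp relative to the dyad) that has already been aligned and filtered. The 60 Hz sampling rate is chosen only for convenience.

```python
import numpy as np


def dwell_time_histogram(position_bp, sample_rate_hz, bin_edges_bp):
    """Dwell time per base pair (s/bp) for one uniformly sampled trace."""
    position_bp = np.asarray(position_bp, dtype=float)
    # Each sample in a bin contributes 1 / sample_rate seconds of dwell time.
    counts, _ = np.histogram(position_bp, bins=bin_edges_bp)
    bin_width_bp = np.diff(bin_edges_bp)
    return counts / sample_rate_hz / bin_width_bp


def average_interaction_map(traces, sample_rate_hz, bin_edges_bp):
    """Average the per-trace dwell time histograms into one map."""
    histograms = [
        dwell_time_histogram(trace, sample_rate_hz, bin_edges_bp)
        for trace in traces
    ]
    return np.mean(histograms, axis=0)


# 1-bp bins centered on integer positions from -70 to +70 bp.
bin_edges = np.arange(-70.5, 71.5, 1.0)
centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

# Two synthetic traces (hypothetical data): the fork sits at -40 bp,
# then at the dyad (0 bp), and in one trace briefly at +20 bp.
trace_a = np.concatenate([np.full(30, -40.0), np.full(120, 0.0), np.full(10, 20.0)])
trace_b = np.concatenate([np.full(60, -40.0), np.full(60, 0.0)])

interaction_map = average_interaction_map([trace_a, trace_b], 60.0, bin_edges)

# Report the bins where the fork spent any time.
for center, dwell in zip(centers, interaction_map):
    if dwell > 0:
        print(f"{center:+.0f} bp: {dwell:.3f} s/bp")
```

Dividing by the bin width keeps the units at seconds per base pair, so maps built with different bin sizes remain comparable. Averaging the forward and reverse templates separately, as in Figure 2, would only require calling `average_interaction_map` once per direction.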

Highlighting histone-DNA interactions near entry or exit DNA

Because the entry and exit DNA regulate the initial invasion of a nucleosome by a motor protein, we carried out experiments starting from a lower unzipping force to specifically detect interactions at those locations and then ramped up the force to allow complete unzipping through the nucleosomal DNA. We unzipped through nucleosomal DNA molecules under a constant loading rate (8 pN s⁻¹), highlighting the edge of the region first encountered¹⁶ (Methods). Figure 3a shows example traces of nucleosomes unzipped from both forward and reverse directions. Figure 3b shows the averaged dwell time histograms measured during both forward and reverse unzipping (see Supplementary Fig. 4 online for additional traces). Aside from the aforementioned bias in the dwell time histogram, many features are consistent with data from unzipping under a constant force. The interactions near the entry and exit DNA were more evident, still showing a clear ~5-bp periodicity. This indicates that DNA segments at least up to 60 bp from the dyad have substantial interactions with the histone core.

Figure 3 Nucleosome disruptions under a constant loading rate. (a) Representative traces for unzipping under a constant loading rate (8 pN s⁻¹). Two traces are shown: one from forward unzipping (black) and one from reverse unzipping (red). For clarity, the naked DNA signature before and after each nucleosome-disruption event is not shown. The unzipping fork again paused at specific locations, which are evident from both the traces (above) and their corresponding dwell time histograms (below). (b) The average dwell time histograms of the unzipping fork under a constant loading rate. Individual traces such as those shown above were low-pass filtered to 60 Hz, and their dwell time histograms were binned to 1 bp. A total of 36 traces from each direction was used for the construction. Other notations are the same as those used in Figure 2.

[Figure 3: panel a, force (pN) versus position relative to dyad (bp) with dwell time (s bp⁻¹) histograms for forward and reverse unzipping; panel b, dwell time (s bp⁻¹) versus position relative to dyad (bp), with regions 1–3 and the dyad marked.]

Figure 4 Mechanical unzipping (left) to mimic motor enzyme progression into a nucleosome (right). (a) DNA was unzipped with a loading-rate clamp (8 pN s⁻¹) until the unzipping force reached ~20 pN, which typically occurred within the first region of interactions (green curve). The unzipping force was then held at this force for 10 s, resulting in a horizontal force line due to the hopping of the unzipping fork among different positions within the first region. These steps mimic a motor invasion into the first region of interactions and subsequent pausing within the region (right). The tension in the DNA was then relaxed for ~3 s, and the state of the nucleosome was determined by unzipping a second time (orange curve). (b) Similar to the experiment in a, except that the unzipping force was held at ~21 pN immediately after the unzipping fork entered the dyad region of interactions. These steps mimic motor invasion into the dyad region of interactions before pausing (right). (c) Similar to the experiment in b, except that DNA was unzipped past the dyad region of interactions. This mimics motor invasion past the dyad (right).

[Figure 4: panels a–c, force (pN) versus position relative to dyad (bp), each paired with a cartoon of a motor protein approaching a canonical nucleosome and then pausing in region 1, pausing in region 2 near the dyad, or continuing unimpeded past the dyad.]

Features shared by nucleosomes on arbitrary DNA sequences

To determine whether the conclusions above are also valid for nucleosomes of arbitrary DNA sequence or just for the 601 sequence, we assembled nucleosomes onto a DNA segment that does not contain any known positioning elements (Methods). The assembly condition was controlled to achieve a relatively low saturation level, so that each DNA molecule had at most one nucleosome. When such nucleosomal DNA molecules were unzipped with a loading-rate clamp using the same conditions as those of Figure 3, we found nucleosomes at various locations on the template (Supplementary Fig. 5 online), probably owing to the lack of any known nucleosome-positioning element on this DNA sequence. Each unzipping trace contains two major regions of strong interaction, with the second region presumably located near the dyad. These nucleosomes possessed essentially identical characteristics to those of the 601 sequence, except that their peak forces within each region were typically smaller by a few piconewtons, reflecting weaker interactions of histone with nonpositioning DNA sequences. The key features remained essentially identical: the presence of three regions of strong interactions, with the strongest at the dyad; the 5-bp periodicity; and the loss of nucleosome stability upon dyad disruption. These results indicate that the conclusions of this work are not restricted to nucleosomes on the 601 sequence but are general to nucleosomes on any sequence.

Mechanical invasion of a nucleosome

To mimic invasion by a motor protein as it progresses into a nucleosome, we carried out three sets of mechanical invasion experiments (Fig. 4). In the first set, unzipping was allowed to proceed into and then held within the first region of strong interactions, before the DNA was relaxed to allow rezipping (Fig. 4a). The state of the nucleosome was subsequently examined by unzipping through the entire 601 sequence. Most of the traces examined in this way (75%) showed a canonical nucleosome structure at the 601 sequence. The remaining 25% showed altered structures, probably resulting from incomplete re-annealing of the DNA in the presence of histones (Supplementary Fig. 6 online). In the second set, unzipping was allowed to proceed into and then held within the dyad region of interactions, before the DNA was relaxed to allow rezipping (Fig. 4b). Most of the resulting structures (70%) again resembled a canonical nucleosome at the 601 sequence. In the third set, unzipping was allowed to proceed past the dyad region of interactions, before the DNA was relaxed to allow rezipping (Fig. 4c). Subsequently, all traces showed force signatures indistinguishable from those of the naked 601 sequence, indicating complete removal of the histone octamer from the 601 sequence. These results indicate that motor enzymes may be capable of accessing nearly half of the underlying DNA without resulting in histone dissociation.

DISCUSSION

Histone-DNA interaction map of a nucleosome

This study presents a high-resolution quantitative map of histone-DNA interactions in a nucleosome. It not only provides a direct measure of the locations of interactions to near base pair resolution, but also quantitatively assays the strengths of these interactions. The overall features of the interaction map are not specific to the 601 sequence but are shared by DNA of arbitrary sequence (Supplementary Fig. 5).

The histone-DNA interaction map reveals the existence of three regions of strong interactions. This is the most direct evidence that the histone-DNA interactions within a nucleosome are not uniform: the strongest region of interactions is located at the dyad and another two regions of strong interactions lie approximately ±40 bp from the dyad. The locations of all three regions are strongly correlated with those estimated from the crystal structure of the nucleosome⁸,²⁰. The central region is clearly the strongest, and this observation explains why nucleosome stability has been shown to be most sensitive to DNA sequence near the dyad²⁶. The locations of the off-dyad regions are also consistent with findings from our previous nucleosome-stretching measurements¹⁴,¹⁷. This also indicates that, in the single-molecule stretching experiments, nucleosome spool geometry may not contribute substantially to force signatures or contribute in a way that coincides with the effects due to the two regions of off-dyad interactions.

We observed a 5-bp periodicity in the interaction map, whereas before this work a 10-bp periodicity would have been expected. The crystal structure of the nucleosome shows that specific DNA-histone contacts are made each time the DNA minor groove faces the histone octamer surface, leading to binding sites spaced at ~10 bp¹. Closer inspection shows that interactions from the two strands of the dsDNA completely stagger with each other and alternate between the two strands along the sequence at every 5 bp. However, in crystal structure analyses, the histone interaction with each minor groove of the DNA has been treated as a single binding site¹,²⁰,²⁷. This is reasonable, as disruption of a histone interaction with one of the DNA strands at a minor groove may result in a concurrent disruption of a histone interaction with the other strand. Before our experiments, we had anticipated that a 10-bp periodicity would be observed. The fact that we have actually observed a 5-bp periodicity indicates that the histone interactions with two strands of DNA at its minor groove are decoupled, and can thus be disrupted sequentially instead of simultaneously.
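One simple way to tell a 5-bp from a 10-bp periodicity in a dwell time profile binned to 1 bp is to look for the dominant peak in its power spectrum. The sketch below is only an illustration of that idea on synthetic profiles. The function name and the cosine test signals are hypothetical, and the text does not state that the periodicity was quantified this way.

```python
import numpy as np


def dominant_period_bp(profile, bin_width_bp=1.0):
    """Return the period (in bp) of the strongest non-zero-frequency component."""
    profile = np.asarray(profile, dtype=float)
    centered = profile - profile.mean()  # suppress the zero-frequency term
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(centered.size, d=bin_width_bp)  # cycles per bp
    peak = np.argmax(power[1:]) + 1  # skip index 0 (zero frequency)
    return 1.0 / freqs[peak]


# Synthetic dwell time profiles (hypothetical data) on 1-bp bins spanning
# -60 to +59 bp: one with peaks every 5 bp, one with peaks every 10 bp.
positions = np.arange(-60, 60)
profile_5bp = 0.05 + 0.04 * np.cos(2 * np.pi * positions / 5.0)
profile_10bp = 0.05 + 0.04 * np.cos(2 * np.pi * positions / 10.0)

period_5 = dominant_period_bp(profile_5bp)
period_10 = dominant_period_bp(profile_10bp)
print(f"dominant periods: {period_5:.2f} bp and {period_10:.2f} bp")
```

On real data the three broad regions of strong interactions would add low-frequency power, so the profile would typically need detrending, or the search would need to be restricted to a band of short periods, before the peak is meaningful.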

The interactions near the exit and entrance DNA were found to be particularly weak, although they maintain the 5-bp periodicity. We propose that these weak interactions permit spontaneous peeling of DNA ends from the octamer surface, as observed by equilibrium accessibility assays²⁸,²⁹.

Implications for transcription

Although RNA polymerases are known to be powerful molecular motors³⁰,³¹, the presence of a nucleosome still presents a major obstacle²⁻⁷. The mechanical unzipping experiments described here resemble the action of RNA polymerase, which opens up a transcription bubble and unzips the downstream DNA while advancing into a nucleosome (Fig. 4, right). The histone-DNA interaction map (Fig. 2) has important implications for how RNA polymerases may gain access to DNA associated with a nucleosome. RNA polymerase is expected to initially proceed smoothly but pause when it encounters the off-dyad interactions. Disruption of these interactions permits it to proceed toward the dyad. The polymerase will then pause most strongly within the dyad region of interactions. Once it overcomes the dyad interactions, it will proceed through the rest of the nucleosomal DNA with minimal resistance. The interaction map also predicts that the 601-positioned nucleosome acts as a polar barrier: transcription in the forward direction is less efficient than in the reverse direction. It is likely that asymmetries of this sort exist in eukaryotic genomes, and they may have functional importance for normal gene expression where positioned nucleosomes reside at key positions transited by RNA polymerase (Pol) II³². Notably, many of these predictions have been verified by biochemical studies of Pol II or Pol III transcription through nucleosomes²⁻⁷.

Although the interaction map also suggests that transcription pausing may show a finer, ~5-bp periodic pattern, an ~10-bp periodicity has been observed⁵,⁶,³³,³⁴. Although this periodicity has been attributed to nucleosome restriction of RNA polymerase rotation coupled with DNA loop formation, this work offers a simpler explanation. The ~10-bp periodicity in transcription pausing may be due to RNA polymerase cooperatively disrupting a pair of interactions located at each minor groove of DNA.

Although the pausing pattern of RNA polymerase is dictated by both the mechanical barriers it encounters and its own motor properties, similarities between the dwell time in the histone-DNA interaction map (Fig. 2) and the polymerase pausing pattern within a nucleosome suggest that the barriers encountered by the polymerase are a major determinant of its pausing behavior. Thus, this explanation of the pausing pattern within a nucleosome provides a simpler explanation than existing models³,⁵,³³. The consistency of the histone-DNA interaction map with biochemical assays of the RNA polymerase pausing pattern is an indication that this map may also be used to predict how other motor enzymes pass through nucleosomes.

The results from nucleosome-invasion experiments yield testable predictions regarding the fate of nucleosomes during transcription. If RNA polymerase backtracks before the dyad, histones will not dissociate from the DNA but will tend to reform a canonical nucleosome at the same location (Supplementary Discussion online), perhaps encouraging further backtracking of the polymerase. Once the RNA polymerase passes the dyad, histones will most likely be removed from their original locations.

METHODS

Nucleosomal DNA templates. We prepared nucleosomal DNA templates using methods similar to those previously described¹⁶. Briefly, each DNA construct consisted of two separate segments (Supplementary Fig. 1a). An ~1.1-kbp anchoring segment was prepared by PCR from plasmid pRL574 (ref. 35) using a digoxigenin-labeled primer and then digested with BstXI (NEB) to produce a ligatable overhang. Each unzipping segment was prepared by PCR using a biotin-labeled primer, and then digested with BstXI and dephosphorylated using calf intestinal phosphatase (CIP; NEB) to introduce a nick into the final DNA template. Nucleosomes were assembled from purified HeLa histones onto the unzipping fragment by a well-established salt-dialysis method³⁶. The two segments were joined by ligation immediately before use. This produced the complete template that was labeled with a single dig tag on one end and a biotin tag located 7 bp after the nick in one DNA strand.

We prepared the forward 601 unzipping segment (0.8 kbp) by PCR from

plasmid 601 (ref. 24) as described previously16. The reverse template is nearly

identical to the forward template, except that the reverse unzipping segment

was flipped so that the unzipping fork would approach the nucleosome from

the opposite direction. To achieve this, the reverse segment was produced using

different primers, such that the ligatable overhang produced through BstXI

digestion and the nick introduced via CIP were located on the end opposite to

that of the forward segment. The unzipping segment that does not contain any

known nucleosome-positioning element (B0.8 kbp) was prepared by PCR

from plasmid pBR322 (NEB).

Hairpin DNA templates. We prepared three different hairpin templates from the forward template (without nucleosomes) by truncating the unzipping segment at precise locations using restriction enzymes and ligating the same hairpin onto the end in each case. The lengths of the unzipping templates are indicated in Supplementary Figure 2b.

Unzipping under constant force. For experiments involving unzipping through a nucleosome under a constant force, we started the unzipping with a loading-rate clamp (8 pN s⁻¹) until the desired force of ~28 pN was reached within a nucleosome. The unzipping force was then held constant by feedback control of the coverslip position (ref. 25). This force is much higher than the sequence-dependent unzipping force of the naked 601 sequence (13–16 pN), minimizing the dwell time contribution due solely to DNA base-pairing interactions, but is low enough to allow sufficient dwell time at each DNA sequence position for detection. Upon reaching the end of the 601 sequence, the unzipping was continued under a loading-rate clamp (8 pN s⁻¹). Unzipping before and after the 601 segment under a constant loading rate generated distinct unzipping signatures that could be used for data alignment (see below).
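The dwell time at each DNA position under constant force can be illustrated with a minimal Python sketch. It assumes the fork position has already been converted to base pairs and sampled at a fixed interval; the function name, the 1-bp bin width and the 1/60 s interval are illustrative assumptions, not details taken from the experiments.

```python
from collections import Counter

def dwell_time_histogram(fork_positions_bp, sample_interval_s, bin_width_bp=1):
    """Total time (s) the unzipping fork spends in each position bin.

    fork_positions_bp: fork position (bp) at each sample of a constant-force trace.
    sample_interval_s: time between samples (hypothetical value in the example).
    """
    counts = Counter(
        int(p // bin_width_bp) * bin_width_bp for p in fork_positions_bp
    )
    # Each sample contributes one sampling interval of dwell time to its bin.
    return {pos: n * sample_interval_s for pos, n in sorted(counts.items())}

# Toy trace: the fork lingers near 10 bp, steps forward, then stalls near 15 bp.
trace = [10.2, 10.4, 10.1, 11.0, 15.3, 15.1, 15.6, 15.9]
hist = dwell_time_histogram(trace, sample_interval_s=1 / 60)
longest_dwell_position = max(hist, key=hist.get)
```

Positions with long dwell times in such a histogram correspond to places where the fork is held up, which is how the interaction map is read.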

Unzipping under constant loading rate. An optical trapping setup was used to unzip a single DNA molecule by moving the microscope coverslip horizontally away from the optical trap (Supplementary Fig. 1b). As barriers to fork progression were encountered, a computer-controlled feedback loop increased the applied load linearly with time (8 pN s⁻¹) as necessary to overcome those barriers. Whenever the unzipping fork stopped, for example, at an interaction, the unzipping force was ramped up linearly with time until the interaction was disrupted (ref. 37). When two interactions occurred in close vicinity, upon the disruption of the first interaction the force was unable to relax back to the baseline before being ramped up again for the second interaction, subjecting this subsequent interaction to a higher initial force. Therefore, for each region of interactions, the dwell time histogram highlighted the edge of the region first encountered. Another feature of this method was the display of the distinctive force signature for a nucleosome, allowing for ease of identification of the nucleosome structure (ref. 16; compare traces in Supplementary Fig. 3 with Supplementary Fig. 4).

Data collection and alignment. Data were low-pass filtered to 5 kHz, digitized at ~12 kHz and later filtered to 60 Hz. Previously, to improve the positional precision and accuracy, the experimental curves were aligned to the theoretical curve by cross-correlation of a region immediately preceding the nucleosome disruption (ref. 16). In the current work, we further improved the precision and accuracy of the data by an additional cross-correlation of a region immediately following the nucleosome disruption. To account for minor instrumental drift, trapping-bead size variations and DNA linker variations, the alignment allowed for a small additive shift (<5 bp) and multiplicative linear stretch (<2%) using algorithms similar to those previously described (ref. 38).
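The idea of aligning a measured curve to a theoretical one with a small shift and stretch can be sketched as a brute-force search. This is only an illustration under stated assumptions: the actual algorithm is not specified here, and the grid search, the step sizes, the Pearson correlation score and all function names below are hypothetical choices.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def sample(curve, x):
    """Linearly interpolate a curve tabulated at 0, 1, 2, ... bp; clamp at the ends."""
    if x <= 0:
        return curve[0]
    if x >= len(curve) - 1:
        return curve[-1]
    i = int(x)
    frac = x - i
    return curve[i] * (1 - frac) + curve[i + 1] * frac

def align(measured, theory, max_shift=5.0, max_stretch=0.02,
          shift_step=0.5, stretch_step=0.005):
    """Find the (shift, stretch) pair that maximizes correlation.

    measured[k] is the force at apparent position k bp; the true position is
    modeled as stretch * k + shift. Returns (correlation, shift, stretch).
    """
    n_shift = round(max_shift / shift_step)
    n_stretch = round(max_stretch / stretch_step)
    best = (-2.0, 0.0, 1.0)
    for i in range(-n_shift, n_shift + 1):
        shift = i * shift_step
        for j in range(-n_stretch, n_stretch + 1):
            stretch = 1.0 + j * stretch_step
            remapped = [sample(theory, stretch * k + shift)
                        for k in range(len(measured))]
            r = pearson(measured, remapped)
            if r > best[0]:
                best = (r, shift, stretch)
    return best
```

Bounding the search to a few base pairs of shift and a few percent of stretch keeps the fit from absorbing real features of the trace into the alignment.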

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS

We thank members of the Wang laboratory and B. Brower-Toland for critical reading of the manuscript, J. Jin for helpful advice with biochemical preparations and D.S. Johnson for helpful discussions on instrumentation. We wish to acknowledge support from the US National Institutes of Health (GM059849 to M.D.W.; GM25232 to J.T.L.), the Keck Foundation (to M.D.W.), the Cornell Nanobiotechnology Center (to M.D.W. and J.T.L.) and the Molecular Biophysics Training Grant Traineeship (to M.A.H.).

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Luger, K., Mader, A.W., Richmond, R.K., Sargent, D.F. & Richmond, T.J. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389, 251–260 (1997).

2. Bondarenko, V.A. et al. Nucleosomes can form a polar barrier to transcript elongation by RNA polymerase II. Mol. Cell 24, 469–479 (2006).

3. Kireeva, M.L. et al. Nature of the nucleosomal barrier to RNA polymerase II. Mol. Cell 18, 97–108 (2005).

4. Kireeva, M.L. et al. Nucleosome remodeling induced by RNA polymerase II: loss of the H2A/H2B dimer during transcription. Mol. Cell 9, 541–552 (2002).

5. Studitsky, V.M., Kassavetis, G.A., Geiduschek, E.P. & Felsenfeld, G. Mechanism of transcription through the nucleosome by eukaryotic RNA polymerase. Science 278, 1960–1963 (1997).

6. Studitsky, V.M., Walter, W., Kireeva, M., Kashlev, M. & Felsenfeld, G. Chromatin remodeling by RNA polymerases. Trends Biochem. Sci. 29, 127–135 (2004).

7. Walter, W., Kireeva, M.L., Studitsky, V.M. & Kashlev, M. Bacterial polymerase and yeast polymerase II use similar mechanisms for transcription through nucleosomes. J. Biol. Chem. 278, 36148–36156 (2003).

8. Davey, C.A., Sargent, D.F., Luger, K., Maeder, A.W. & Richmond, T.J. Solvent mediated interactions in the structure of the nucleosome core particle at 1.9 Å resolution. J. Mol. Biol. 319, 1097–1113 (2002).

9. Cosgrove, M.S., Boeke, J.D. & Wolberger, C. Regulated nucleosome mobility and the histone code. Nat. Struct. Mol. Biol. 11, 1037–1043 (2004).

10. Kouzarides, T. Chromatin modifications and their function. Cell 128, 693–705 (2007).

11. Segal, E. et al. A genomic code for nucleosome positioning. Nature 442, 772–778 (2006).

12. Bancaud, A. et al. Nucleosome chiral transition under positive torsional stress in single chromatin fibers. Mol. Cell 27, 135–147 (2007).

13. Bennink, M.L. et al. Unfolding individual nucleosomes by stretching single chromatin fibers with optical tweezers. Nat. Struct. Biol. 8, 606–610 (2001).

14. Brower-Toland, B.D. et al. Mechanical disruption of individual nucleosomes reveals a reversible multistage release of DNA. Proc. Natl. Acad. Sci. USA 99, 1960–1965 (2002).

15. Cui, Y. & Bustamante, C. Pulling a single chromatin fiber reveals the forces that maintain its higher-order structure. Proc. Natl. Acad. Sci. USA 97, 127–132 (2000).

16. Shundrovsky, A., Smith, C.L., Lis, J.T., Peterson, C.L. & Wang, M.D. Probing SWI/SNF remodeling of the nucleosome by unzipping single DNA molecules. Nat. Struct. Mol. Biol. 13, 549–554 (2006).


17. Brower-Toland, B. et al. Specific contributions of histone tails and their acetylation to the mechanical stability of nucleosomes. J. Mol. Biol. 346, 135–146 (2005).

18. Gemmen, G.J. et al. Forced unraveling of nucleosomes assembled on heterogeneous DNA using core histones, NAP-1, and ACF. J. Mol. Biol. 351, 89–99 (2005).

19. Pope, L.H. et al. Single chromatin fiber stretching reveals physically distinct populations of disassembly events. Biophys. J. 88, 3572–3583 (2005).

20. Luger, K. & Richmond, T.J. DNA binding within the nucleosome core. Curr. Opin. Struct. Biol. 8, 33–40 (1998).

21. Mihardja, S., Spakowitz, A.J., Zhang, Y. & Bustamante, C. Effect of force on mononucleosomal dynamics. Proc. Natl. Acad. Sci. USA 103, 15871–15876 (2006).

22. Kulic, I.M. & Schiessel, H. DNA spools under tension. Phys. Rev. Lett. 92, 228101 (2004).

23. Sakaue, T. & Lowen, H. Unwrapping of DNA-protein complexes under external stretching. Phys. Rev. E 70, 021801 (2004).

24. Lowary, P.T. & Widom, J. New DNA sequence rules for high affinity binding to histone octamer and sequence-directed nucleosome positioning. J. Mol. Biol. 276, 19–42 (1998).

25. Johnson, D.S., Bai, L., Smith, B.Y., Patel, S.S. & Wang, M.D. Single-molecule studies reveal dynamics of DNA unwinding by the ring-shaped T7 helicase. Cell 129, 1299–1309 (2007).

26. Thastrom, A., Bingham, L.M. & Widom, J. Nucleosomal locations of dominant DNA sequence motifs for histone-DNA interactions and nucleosome positioning. J. Mol. Biol. 338, 695–709 (2004).

27. Muthurajan, U.M. et al. Crystal structures of histone Sin mutant nucleosomes reveal altered protein-DNA interactions. EMBO J. 23, 260–271 (2004).

28. Li, G., Levitus, M., Bustamante, C. & Widom, J. Rapid spontaneous accessibility of nucleosomal DNA. Nat. Struct. Mol. Biol. 12, 46–53 (2005).

29. Li, G. & Widom, J. Nucleosomes facilitate their own invasion. Nat. Struct. Mol. Biol. 11, 763–769 (2004).

30. Wang, M.D. et al. Force and velocity measured for single molecules of RNA polymerase. Science 282, 902–907 (1998).

31. Galburt, E.A. et al. Backtracking determines the force sensitivity of RNAP II in a factor-dependent manner. Nature 446, 820–823 (2007).

32. Albert, I. et al. Translational and rotational settings of H2A.Z nucleosomes across the Saccharomyces cerevisiae genome. Nature 446, 572–576 (2007).

33. Studitsky, V.M., Clark, D.J. & Felsenfeld, G. Overcoming a nucleosomal barrier to transcription. Cell 83, 19–27 (1995).

34. Bednar, J., Studitsky, V.M., Grigoryev, S.A., Felsenfeld, G. & Woodcock, C.L. The nature of the nucleosomal barrier to transcription: direct observation of paused intermediates by electron cryomicroscopy. Mol. Cell 4, 377–386 (1999).

35. Schafer, D.A., Gelles, J., Sheetz, M.P. & Landick, R. Transcription by single molecules of RNA polymerase observed by light microscopy. Nature 352, 444–448 (1991).

36. Lee, K.M. & Narlikar, G. Assembly of nucleosomal templates by salt dialysis. Curr. Protoc. Mol. Biol. 21, 21.6 (2001).

37. Koch, S.J., Shundrovsky, A., Jantzen, B.C. & Wang, M.D. Probing protein-DNA interactions by unzipping a single DNA double helix. Biophys. J. 83, 1098–1105 (2002).

38. Deufel, C. & Wang, M.D. Detection of forces and displacements along the axial direction in an optical trap. Biophys. J. 90, 657–667 (2006).


An RNA code for the FOX2 splicing regulator revealed by mapping RNA-protein interactions in stem cells

Gene W Yeo¹,²,⁴, Nicole G Coufal², Tiffany Y Liang¹,³,⁴, Grace E Peng², Xiang-Dong Fu³ & Fred H Gage²

The elucidation of a code for regulated splicing has been a long-standing goal in understanding the control of post-transcriptional gene expression events that are crucial for cell survival, differentiation and development. We decoded functional RNA elements in vivo by constructing an RNA map for the cell type–specific splicing regulator FOX2 (also known as RBM9) via cross-linking immunoprecipitation coupled with high-throughput sequencing (CLIP-seq) in human embryonic stem cells. The map identified a large cohort of specific FOX2 targets, many of which are themselves splicing regulators, and comparison between the FOX2 binding profile and validated splicing events revealed a general rule for FOX2-regulated exon inclusion or skipping in a position-dependent manner. These findings suggest that FOX2 functions as a critical regulator of a splicing network, and we further show that FOX2 is important for the survival of human embryonic stem cells.

Understanding regulated gene expression is vital to providing insights into disease and development. Whereas much effort has been placed on deciphering transcriptional regulation through interactions with functional DNA elements by the more than a thousand transcription factors encoded in mammalian genomes, little is known about an equally sizable number of RNA binding proteins and their involvement in diverse aspects of RNA metabolism. A dominant function of these RNA binding proteins is to regulate alternative splicing, a major form of post-transcriptional regulation of gene expression that is thought to contribute to the structural and functional diversity of the cellular proteome (ref. 1). One of the ultimate goals in the RNA field is to deduce a set of rules that govern the control of splice-site selection to produce the 'splicing code'. This goal can now be approached due to recent advances in functional genomics and high-throughput sequencing.

Human embryonic stem cells (hESCs) are pluripotent cells that propagate perpetually in culture as undifferentiated cells and can be readily induced to differentiate into various cell types both in vitro and in vivo (ref. 2). As hESCs can theoretically generate most if not all of the cell types that constitute a human, they serve as an excellent model for understanding early embryonic development. Furthermore, hESCs are a nearly infinite source for generating specialized cells such as neurons and glia for potential therapeutic purposes or for screening small molecules to intervene with specific biological processes (refs. 3,4). Therefore, there has been intense interest in identifying the molecular changes that are important for the survival of hESCs, maintenance of pluripotency and promotion of cell differentiation.

In our previous Affymetrix exon-tiling array analysis, we demonstrated that the FOX binding motif GCAUG was enriched proximal to a set of exons that are alternatively spliced in hESCs, suggesting that FOX splicing factors may have a vital role in the biology of hESCs (ref. 5). Thus, we selected the RNA binding protein FOX2 to identify the functional RNA elements in the human genome in hESCs by deep sequencing. FOX2, a member of the FOX family of RNA binding proteins, was initially identified as a factor involved in dosage compensation in Caenorhabditis elegans and was later found to be evolutionarily conserved across mammalian genomes (ref. 6). FOX2 is best known for its tissue-specific expression in muscle and neuronal cells and for its activity in regulated splicing in those highly differentiated cell types (refs. 6,7). Unexpectedly, we found that FOX2 is expressed abundantly in the hESC lines HUES6 and H9, which are positive for the pluripotency markers OCT4, SOX2, NANOG and SSEA4 (Fig. 1a and Supplementary Fig. 1 online). In contrast, FOX1 (also known as A2BP1) is not expressed in any hESCs examined. Consistent with their tissue-specific expression in cells of the neural lineage, both FOX1 and FOX2 are expressed in neural progenitors.

RESULTS

CLIP-seq for mapping functional RNA elements

We began to address the function of FOX2 in hESCs by developing a high-throughput experimental approach to large-scale identification of FOX2 targets in vivo, by coupling a modified CLIP technology (ref. 8) with high-throughput sequencing, a method we refer to as CLIP-seq (Fig. 1b). Key features of CLIP include: stabilization of in vivo protein-RNA interactions by UV irradiation, antibody-mediated enrichment of specific RNA-protein complexes, SDS-PAGE to isolate protein-RNA adducts after RNA trimming by nuclease, 3′ RNA linker ligation and 5′ labeling using ³²P-γ-ATP. To prevent continuous RNA trimming by the RNase A used in the original protocol, we used micrococcal nuclease (MNase), which can be inactivated by EGTA, a modification that improves RNA recovery. Titration of the MNase allowed controlled trimming, resulting in short RNA molecules in the range of 50 nucleotides (nt) to 100 nt that remain attached to the protein (bands A and B in Fig. 1b). Recovered RNA was ligated to a 5′ linker before amplification by reverse-transcription PCR (RT-PCR). We designed both linkers to be compatible for sequencing on the Illumina 1G Genome Analyzer.

Received 27 October 2008; accepted 11 December 2008; published online 11 January 2009; doi:10.1038/nsmb.1545

¹Crick-Jacobs Center for Theoretical and Computational Biology, Salk Institute, 10010 North Torrey Pines Road, La Jolla, California 92037, USA. ²Laboratory of Genetics, Salk Institute, 10010 North Torrey Pines Road, La Jolla, California 92037, USA. ³Department of Cellular and Molecular Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-5004, USA. ⁴Present address: Stem Cell Program, Department of Cellular and Molecular Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-5004, USA. Correspondence should be addressed to G.W.Y. ([email protected]), F.H.G. ([email protected]) or X.-D.F. ([email protected]).

We obtained 5.3 million 36-nt sequence reads from anti-FOX2–enriched RNA from HUES6 hESCs, 83% of which (4.4 million) were uniquely mapped to the repeat-masked human genome (data available at hg17 and hg18 genome browsers (http://genome.ucsc.edu/) under 'Regulation'). Our comparisons between genes containing CLIP reads in smaller-scale sequencing runs in HUES6 and H9 cells indicated a high overlap ranging from 70% to 90% (Supplementary Fig. 2 online), indicating that FOX2 binds to similar targets in both cell lines. As expected from a splicing regulator that interacts primarily with transcribed mRNA, we found that the FOX2 binding sites were largely confined within protein-coding genes (~3.7 million or 80% of total tags), 97% of which are oriented in the direction of transcription (sense-strand reads) (Supplementary Fig. 3 online), confirming that DNA contamination was not a major issue with our preparation. Among annotated human genes, 16,642 (75%) contain one read within exonic or intronic regions, 3,598 (22%) have up to 10 reads and 543 (3%) harbored 10 to more than 1,000 reads. This distribution probably reflects the abundance of individual gene transcripts expressed in HUES6 cells, an assumption that was confirmed by the observation that the read density was positively correlated with gene expression measured on Affymetrix exon arrays in general (Supplementary Fig. 4 online). This observation indicates that we cannot identify preferred targets for FOX2 by simply rank ordering the reads that map to individual transcripts.

Genomic distribution of in vivo FOX2 binding sites

To distinguish enriched FOX2 binding sites from background binding, we established gene-specific thresholds based on the assumption that FOX2 may prefer to bind to specific loci, rather than binding randomly to distributed sites along individual transcripts. We therefore computationally extended each genome-aligned sequence read in the 5′-to-3′ direction by 100 nt, the average length of RNA fragments after MNase treatment. The height at each position indicates the number of reads that overlap with that position.
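The read-extension and height calculation can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: it assumes each read is given by the gene-relative coordinate of its 5′ end and its strand, and that "extended by 100 nt" means a total fragment length of 100 nt measured from the 5′ end. The function name and input format are hypothetical.

```python
def coverage_heights(reads, gene_length, fragment_len=100):
    """Per-position height: number of extended reads overlapping each position.

    reads: iterable of (five_prime_pos, strand), 0-based gene coordinates,
           strand '+' or '-'. Extension runs 5'-to-3' along the read's strand.
    """
    diff = [0] * (gene_length + 1)  # difference array: +1 at start, -1 past end
    for five_prime, strand in reads:
        if strand == '+':
            lo, hi = five_prime, five_prime + fragment_len
        else:
            lo, hi = five_prime - fragment_len + 1, five_prime + 1
        lo, hi = max(lo, 0), min(hi, gene_length)  # clip to the gene
        if lo < hi:
            diff[lo] += 1
            diff[hi] -= 1
    heights, running = [], 0
    for d in diff[:gene_length]:
        running += d
        heights.append(running)
    return heights

# Three toy reads in a 200-nt gene.
toy_heights = coverage_heights([(10, '+'), (50, '+'), (120, '-')], gene_length=200)
```

The difference-array form touches each read once and each position once, which keeps the calculation cheap when millions of reads are involved.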

To identify enriched FOX2 binding in clusters, we determined the false-discovery rate (FDR) for each position by computing the 'background' frequency after randomly placing the same number of extended reads within the gene for 100 iterations, similar to an approach that has been described for finding DNA-protein interaction clusters (ref. 9). For a particular height, our modified FDR was computed as the ratio of the probability of observing background positions of at least that height to one standard deviation above the average probability of observing actual positions of at least that height. We identified FOX2 binding clusters by grouping positions that satisfied the condition FDR < 0.001 and occurred within 50 nt of each other. This analysis identified 6,123 FOX2 binding clusters throughout the human genome. The median distance between clusters within protein-coding genes was 813 nt, whereas the median distance between randomly chosen regions of similar sizes was 7,978 nt (Supplementary Fig. 5 online). This result demonstrated that true FOX2 binding loci are indeed distributed non-randomly in protein-coding genes.

Figure 1 CLIP-seq of FOX2 in hESCs. (a) FOX2 is expressed in hESCs positive for pluripotent markers such as cytoplasmic SSEA4 and nuclear OCT4. Nuclei indicated by DAPI staining. (b) Flow chart of CLIP-seq. RNA in complex with RNA binding proteins from UV-irradiated HUES6 hESCs was subjected to enrichment using anti-FOX2 rabbit polyclonal antibody. RNA in the complex was trimmed by MNase at two different concentrations, followed by autoradiography, as illustrated. Protein-RNA covalent complexes corresponding to bands A and B were recovered following SDS-PAGE, RT-PCR amplified and sequenced by the Illumina 1G genome analyzer.
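A permutation-based FDR of this kind can be sketched as follows. The wording of the definition admits more than one reading, so this sketch makes an explicit assumption: the numerator is the mean background tail probability plus one standard deviation across iterations, and the denominator is the observed tail probability. The function names, the single-strand simplification and the capping at 1.0 are all illustrative choices, not the authors' implementation.

```python
import random
from statistics import mean, pstdev

def heights(starts, gene_length, frag_len=100):
    """Per-position height for reads extended frag_len nt from each start."""
    diff = [0] * (gene_length + 1)
    for s in starts:
        lo, hi = max(s, 0), min(s + frag_len, gene_length)
        if lo < hi:
            diff[lo] += 1
            diff[hi] -= 1
    out, running = [], 0
    for d in diff[:gene_length]:
        running += d
        out.append(running)
    return out

def tail_fraction(hs, h):
    """Fraction of positions whose height is at least h."""
    return sum(1 for x in hs if x >= h) / len(hs)

def modified_fdr(starts, gene_length, h, frag_len=100, iterations=100, seed=0):
    """Assumed reading: (mean + 1 s.d. of background tail) / observed tail."""
    rng = random.Random(seed)
    observed = tail_fraction(heights(starts, gene_length, frag_len), h)
    if observed == 0:
        return 1.0  # no real position reaches this height
    background = []
    for _ in range(iterations):
        # Place the same number of reads uniformly at random within the gene.
        shuffled = [rng.randrange(gene_length) for _ in starts]
        background.append(
            tail_fraction(heights(shuffled, gene_length, frag_len), h)
        )
    return min(1.0, (mean(background) + pstdev(background)) / observed)
```

Because the reads are reshuffled within the same gene, the threshold adapts to each gene's read count, which is the point of a gene-specific cutoff when read density tracks expression level.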

To further group the clusters, we determined the reduction in cluster number as a function of increasing window size. The number of clusters decreased eight-fold as the window size increased until the threshold of 1,500 nt was reached (Supplementary Fig. 6 online). In contrast, the number of randomly chosen regions of similar sizes remained unaltered at any window size. Using this approach, we identified 3,547 combined clusters within the 1,500-nt window, probably representing true FOX2 binding events, occurring either individually or in groups, in the human genome.
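Grouping clusters by window size amounts to merging neighboring intervals. The sketch below assumes that two clusters are combined when the gap between the end of one and the start of the next is no larger than the window; the exact definition used in the study is not given here, and the function name and toy coordinates are hypothetical.

```python
def merge_within(clusters, window):
    """Merge (start, end) clusters whose gap to the previous one is <= window."""
    merged = []
    for start, end in sorted(clusters):
        if merged and start - merged[-1][1] <= window:
            # Close enough to the previous group: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Cluster count as a function of window size for a toy gene.
toy_clusters = [(100, 150), (180, 220), (1000, 1100), (5000, 5050)]
counts_by_window = {w: len(merge_within(toy_clusters, w))
                    for w in (0, 50, 1500, 5000)}
```

Plotting such counts against window size for real and randomly placed regions is one way to see where the real clusters stop coalescing.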

Having established grouped clusters, we next determined the motifs enriched in the clusters compared to randomly selected regions of similar sizes within the same protein-coding genes. Using Z-score statistics (ref. 10), we found that the most significantly enriched hexamer within the clusters was UGCAUG (Z-score of 25.16; P-value < 10⁻⁷⁰) (Fig. 2a), which exactly matched the biochemically defined consensus FOX1 and FOX2 binding site (ref. 11). We next calculated the fractions of the grouped clusters that contained the consensus, observing that 1,052 (33%) and 704 (22%) of the FOX2 binding clusters harbored the GCAUG and UGCAUG motif, respectively, compared to 23% and 11%, respectively, of randomly located regions. Although this enrichment is highly significant (P-value < 10⁻¹⁰), the observation indicates that FOX2 did not bind to all available consensus-containing sequences and that FOX2 may also recognize other types of sequences in complex with other RNA-processing regulators. Consistent with the previously published bioinformatics analyses showing that composite functional RNA elements tend to be more evolutionarily conserved than other genomic regions that contain just the consensus (refs. 12–14), we found that 8% and 5% of FOX2 binding clusters contained one or more GCAUG and UGCAUG, respectively, that were perfectly conserved across four mammalian genomes (human, dog, mouse and rat). In contrast, only 2% (four-fold difference) and 1% (five-fold difference) of randomly selected pre-mRNA regions contained one or more perfectly conserved GCAUG and UGCAUG sites. These findings strongly suggest the functional importance of the FOX2 binding sites identified by CLIP-seq.
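Hexamer enrichment of this kind can be illustrated with a simple score. The exact statistic is not spelled out here, so this sketch substitutes a standard two-proportion Z-score comparing each hexamer's frequency in cluster sequences with its frequency in control sequences; the function names and toy sequences are hypothetical.

```python
from collections import Counter
from math import sqrt

def kmer_counts(seqs, k=6):
    """Count all overlapping k-mers, treating DNA 'T' as RNA 'U'."""
    counts = Counter()
    for s in seqs:
        s = s.upper().replace("T", "U")
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def enrichment_z(cluster_seqs, control_seqs, k=6):
    """Two-proportion Z-score per k-mer (an assumed stand-in statistic)."""
    fg, bg = kmer_counts(cluster_seqs, k), kmer_counts(control_seqs, k)
    n_fg, n_bg = sum(fg.values()), sum(bg.values())
    if n_fg == 0 or n_bg == 0:
        raise ValueError("both sequence sets must contain at least one k-mer")
    scores = {}
    for kmer in set(fg) | set(bg):
        p_fg, p_bg = fg[kmer] / n_fg, bg[kmer] / n_bg
        pooled = (fg[kmer] + bg[kmer]) / (n_fg + n_bg)
        se = sqrt(pooled * (1 - pooled) * (1 / n_fg + 1 / n_bg))
        scores[kmer] = (p_fg - p_bg) / se if se > 0 else 0.0
    return scores

# Toy example: the motif recurs in the cluster set but not in the controls.
toy_scores = enrichment_z(
    ["AAUGCAUGAA", "CCUGCAUGCC", "GGUGCAUGGG"],
    ["AACCGGUUAA", "CCAAGGUUCC", "GGAACCUUGG"],
)
top_hexamer = max(toy_scores, key=toy_scores.get)
```

Drawing the control sequences from the same genes, as described above, keeps differences in base composition between genes from being scored as motif enrichment.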

Preferential FOX2 action near alternative splice sites

Characterizing the FOX2 binding profile relative to known splice sites, we found a median of 1.7 reads per kilobase of nucleotide sequence within protein-coding genes, with 13.5 reads per kilobase in exons, 2.2 reads per kilobase in introns, 0.3 reads per kilobase in promoters and 0.7 reads per kilobase in 3′ untranslated regions (UTRs). This observation suggests that FOX2 binds preferentially to exonic and intronic regions, consistent with its function as a splicing regulator. We observed that FOX2 binding clusters were 20-fold more likely to lie within exons and flanking intronic regions relative to randomly selected regions in the same protein-coding genes (Fig. 2b). This enrichment decreases to the background level ~3 kb away from the exons. Notably, the FOX2 binding sites were significantly (P-value < 0.001) enriched in the downstream intronic region ~50–100 nt from the 5′ splice site, consistent with several characterized FOX2 binding sites in regulated splicing (ref. 7). The FOX2 binding sites were also enriched in the upstream intronic regions near the 3′ splice site, but at a level 2.5-fold to 3-fold lower than in the downstream regions (Fig. 2b).

Preferential FOX2 binding to intronic regions near both 3′ and 5′ splice sites supports a crucial role of FOX2 in splice-site selection. Previous studies showed that intronic regions flanking alternatively spliced exons are more conserved than those flanking constitutive exons (refs. 15,16). To determine whether FOX2 functions through conserved cis-acting regulatory RNA elements, we compared the association of mapped FOX2 binding clusters with constitutive and alternative splice sites and found that the highest enrichment occurred around alternative conserved exons (ACE) (Fig. 2b). Conversely, using Phastcons scores as a measure of evolutionary sequence conservation (Phastcons scores vary from 0 to 1, with 1 indicating high conservation) (ref. 17), we confirmed that FOX2-bound intronic regions flanking alternative exons were approximately two-fold more conserved than those flanking constitutive exons, and four- to seven-fold more conserved than other intronic regions containing randomly selected regions of similar sizes (Fig. 2c). These findings are fully consistent with existing examples of FOX2-regulated alternative splicing events (ref. 18), where high levels of flanking-sequence conservation were predictive of regulated splicing in mammalian cells (refs. 15,16).

Figure 2 Genomic mapping and analysis of FOX2 CLIP-seq reads. (a) Consensus in vivo FOX2 binding sites identified by CLIP-seq. Histogram of Z-scores indicating the enrichment of hexamers in CLIP-seq clusters compared to randomly chosen regions of similar sizes in the same genes. Z-scores of the top five hexamers are indicated (UGCAUG, 25.16; GCAUGU, 17.81; GUGAUG, 14.15; UGGUGA, 13.74; GGUGGU, 12.81). (b) Enrichment of FOX2 CLIP-seq clusters within both constitutive and regulated exons and flanking intronic regions, particularly in the 3′ half of exons and downstream intronic regions. The FOX2 CLIP-seq clusters mapped most frequently to alternative conserved exons (ACEs) predicted by ACEScan, followed by EST-verified AS exons (EST-AS), compared to constitutively spliced exons (CS). Randomly chosen regions of similar sizes in the same genes were not distributed near EST-AS exons (EST-AS, random) and CS exons (CS, random). The x axis indicates a composite intron-exon-intron structure, containing sequences from 2,000 nt in the upstream intron and the first 50 nt of the exon (left), and the last 50 nt of the exon and 2,000 nt in the downstream intron (right). The y axis indicates the frequency of FOX2 CLIP-seq clusters. (c) Sequence conservation of FOX2 CLIP-seq clusters associated with different classes of exons. The average Phastcons scores were used to compute the extent of conservation (ref. 17).

FOX2 regulation of RNA targets

Overall, we identified FOX2 binding clusters in 1,876 protein-coding genes, suggesting that ~7% of human genes are subjected to FOX2 regulation in hESCs. To study the function of these FOX2 target genes, we performed gene ontology analysis (ref. 16), revealing a surprising enrichment for RNA binding proteins (P-value < 10⁻⁸; Supplementary Table 1 online). We also noted enrichment for nuclear mRNA splicing factors (P-value < 10⁻⁵) and serine/threonine kinase activity (P-value < 10⁻³). Among these FOX2 target genes were heterogeneous ribonucleoproteins (hnRNPs; for example, A2/B1, H1, H2, PTB and R), known alternative splicing regulators (for example, A2BP1, PTB, nPTB, QKI, SFRS3, SFRS5, SFRS6, SFRS11 and TRA2A) and RNA binding proteins important for stem-cell biology (for example, LIN28 and MSI2). This observation suggests that FOX2 may have a crucial role in establishing and maintaining the splicing and signaling programs in hESCs.

Figure 3 presents four examples of FOX2 RNA targets. A total of 962 CLIP-seq reads were localized within the fibroblast growth factor receptor (FGFR) gene FGFR2 (Fig. 3a), which is known to be subject to FOX2 regulation (ref. 19). A substantial number (103) of FOX2 CLIP-seq reads were clustered around one of the mutually exclusive exons (exon 8, which is selected to produce FGFR7 or keratinocyte growth factor (KGF) in epithelial cells, whereas exon 9 is used to produce FGFR2 in fibroblasts). The mapped FOX2 binding sites coincide with three UGCAUG and two GCAUG sites that are conserved across humans, dogs, mice and rats.

We previously identified in the STE20-like kinase (SLK) gene a 93-nt alternative exon that was included in hESCs but excluded in differentiated cells or tissues (ref. 5). We mapped a total of 495 FOX2 CLIP-seq reads around three conserved (U)GCAUG elements upstream of the alternative exon (Fig. 3b). Indeed, FOX2 knockdown resulted in skipping of the alternative exon.

A total of 2,563 CLIP-seq reads were mapped to the polypyrimidinetract binding protein 2 (nPTB) gene, which is crucial for manyregulated splicing events in neurons20,21. We identified 15 FOX2binding clusters that could be aggregated into four groups (Fig. 3c),

Figure 3 Clustering of FOX2 CLIP-seq reads around regulated splicing events. (a–d) The distribution of FOX2 CLIP-seq clusters in four examples of FOX2-regulated genes. The CLIP-seq reads are shown above each gene, with the y axis indicating the read density at each position. Each gene is diagrammed by vertical black bars (exons) and thin horizontal lines (introns), with arrows representing specific RT-PCR primers. Identified clusters are marked by horizontal orange bars. Exons encased by the red box in each case are illustrated in an expanded view below in which yellow boxes indicate the location of conserved GCAUG (dashed outlines) and UGCAUG (filled outlines) FOX2 binding motifs. Sequence conservation as measured by Phastcons scores is shown below. The insert in each expanded view shows RT-PCR analysis of alternative splicing in response to FOX2 knockdown by shRNA from triplicate experiments, with a representative gel image and s.d. indicated by error bars. FW, RV1 and RV2 represent forward and reverse primers. Ctl, control.


one that contained 958 reads overlapping precisely with the known alternative exon and its flanking introns, which contain four UGCAUG elements in the ultraconserved introns20,21. Such dense FOX2 binding may indicate an unexpected mode of regulation, such as cooperative action, that cannot be explained by simple FOX2 recognition of its consensus binding sites.

In the fourth example, 1,576 reads were located on the FOX2 transcript itself, with 198 reads overlapping with six conserved (U)GCAUG elements proximal to the 40-nt alternative exon 11 (Fig. 3d), a finding that is consistent with the reported autoregulation of the gene crucial for homeostatic FOX2 expression19,22. These and many other examples (see below) show that FOX2 functions as a regulator of other splicing factors, including itself, in hESCs.

Exon inclusion and repression by FOX2 binding in hESCs

Our mapping results revealed preferential FOX2 binding to intronic regions either upstream or downstream or on both sides of the alternative exons. To determine the functional impact of these physical binding events, we selected 23 FOX2 target genes for functional validation in HUES6 cells treated with a lentivirus expressing a short hairpin RNA (shRNA) against FOX2 (Fig. 4). Western blotting 36 h after infection indicated specific downregulation of FOX2, relative to a control shRNA against enhanced green fluorescent protein (EGFP) (Fig. 4). RT-PCR analysis showed that FOX2 depletion indeed induced differential alternative splicing in 17 out of 23 (73%) tested genes (Figs. 3 and 4 and Supplementary Table 2 online).

Notably, we observed a general trend with respect to FOX2-regulated exon inclusion or skipping, depending on the location of FOX2 binding sites in the upstream or downstream intronic regions. Depletion of FOX2 tended to lead to exon inclusion if FOX2 binding sites were located in the upstream intron, as seen in MAP3K7, ZNF532, PARD3 and SFRS11 (Fig. 4a). In contrast, depletion of FOX2 resulted in exon skipping if FOX2 binding sites were located in the downstream intron, for instance, in ECT2, PICALM, PTBP1 and ENAH (Fig. 4b). In several cases, such as in PTBP2 (nPTB), TSC2, SFRS6 and RIMS2, FOX2 binding sites were present in both upstream and downstream introns (Figs. 3 and 4); here, depletion of FOX2 resulted in either exon skipping (PTBP2, TSC2, SFRS6) or inclusion (RIMS2), probably reflecting a dominant effect of one binding site over the other(s). Notably, we also observed FOX2-dependent alternative 3′ end formation in the QK1 gene (Fig. 4a).

On the basis of the results from the experimental validation, we generated a general splicing model by compiling consensus FOX2 binding motifs that are associated with FOX2 depletion–induced exon inclusion (green) or skipping (blue) (Fig. 4c). Compared to shuffled versions of the regions bound by FOX2, we observed an enrichment of seven-fold to nine-fold of the conserved GCAUG motifs within 1,000 nt of the alternatively spliced exons (P-value < 0.01). In fact, this enrichment peaks at 29-fold ~100 nt upstream (repression) and

Figure 4 RNA map of FOX2-regulated alternative splicing. (a) FOX2-dependent exon skipping. (b) FOX2-dependent exon inclusion. Each gene is diagrammed by vertical black bars (exons) and thin horizontal lines (introns) with arrows representing specific RT-PCR primers. The conserved GCAUG FOX2 binding motifs (red vertical bars) generally overlap with mapped FOX2 binding sites by CLIP-seq (blue horizontal bars). Regulated splicing in control (Ctl) shRNA– and FOX2 (FOX2) shRNA–treated hESCs was analyzed by RT-PCR in triplicate, and s.d. is indicated by error bars. Changes in alternative splicing were significant in all cases, as determined by the Student’s t-test (P-value < 0.05). (c) Number of conserved GCAUG sites proximal to the RT-PCR–validated FOX2-regulated alternative splicing, showing that conserved FOX2 binding motifs upstream or downstream of the alternative exon correlate with FOX2-dependent exon skipping (green) or inclusion (blue), respectively. Dashed lines and error bars indicate average number and s.d. of GCAUG sites in 100 independent versions of shuffled CLIP-seq binding sites.


21-fold ~100 nt downstream (activation) of the alternative exons (P-value < 0.001). This splicing model revealed a regulatory RNA map for FOX2 to activate or repress alternative splicing when bound downstream or upstream of the alternative exon, respectively. This RNA map is reminiscent of the trend observed with the neuronal specific splicing regulator Nova23, suggesting a general splicing code for cell type–specific splicing regulators.
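The enrichment comparison described above can be illustrated with a minimal Python sketch. It assumes that "shuffled" binding sites means regions of the same sizes relocated at random within the same transcript, as the Methods describe for the control regions. The function names, the half-open `(start, end)` region convention and the toy inputs are hypothetical and are not taken from the authors' pipeline.

```python
import random


def count_motif(seq, motif="GCAUG"):
    """Count occurrences of motif in seq, allowing overlaps."""
    last = len(seq) - len(motif) + 1
    return sum(1 for i in range(max(last, 0)) if seq.startswith(motif, i))


def motifs_in_regions(seq, regions, motif="GCAUG"):
    """Total motif count inside half-open (start, end) regions of seq."""
    return sum(count_motif(seq[start:end], motif) for start, end in regions)


def relocate(regions, seq_len, rng):
    """Move each region to a random position, preserving its length.

    Assumes every region is no longer than the sequence itself.
    """
    moved = []
    for start, end in regions:
        size = end - start
        new_start = rng.randint(0, seq_len - size)  # inclusive bounds
        moved.append((new_start, new_start + size))
    return moved


def enrichment(seq, regions, n_iter=100, motif="GCAUG", seed=0):
    """Return (observed count, mean background count, fold enrichment)."""
    rng = random.Random(seed)
    observed = motifs_in_regions(seq, regions, motif)
    background = [
        motifs_in_regions(seq, relocate(regions, len(seq), rng), motif)
        for _ in range(n_iter)
    ]
    mean_bg = sum(background) / n_iter
    fold = observed / mean_bg if mean_bg > 0 else float("inf")
    return observed, mean_bg, fold
```

Calling `enrichment(pre_mrna, bound_regions)` on an RNA-alphabet sequence gives the observed motif count, the average over 100 random relocations, and their ratio. A significance estimate such as the reported P-values would need an additional test that this sketch does not attempt.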

Notably, we observed that the alternative splicing patterns for a randomly selected subset of these exons were different in neural progenitors differentiated from HUES6 cells (HUES6-NP) that were FOX2 depleted by lentiviral shRNA, demonstrating that the splicing patterns were embryonic stem cell specific (Supplementary Fig. 7 online). Furthermore, the splicing patterns in FOX2-depleted human fetal neural stem cells (human central nervous system stem cells propagated as neurospheres, hCNS-SCns) were similar to those in FOX2-depleted HUES6-NP cells24. hCNS-SCns are primary fate-restricted neural progenitors that, similarly to HUES6-NP cells, also express FOX2, suggesting that upon neural differentiation the targets for FOX2 regulation will be altered. We conclude that our RNA targets of FOX2 identified by CLIP-seq are specific for embryonic stem cells.

FOX2 is an important gene for hESC survival

During the investigation of FOX2 knockdown–induced alternative splicing, we were surprised to observe a rapid cell-death phenotype in response to FOX2 depletion in a dose-dependent manner (Fig. 5a,b). We observed the same phenotype using two independent lentiviral shRNAs on two independent hESC lines, HUES6 (Fig. 5a) and H9 (Supplementary Fig. 8 online). In contrast, we knocked down FOX2 in hCNS-SCns and observed no effect on cell survival, with the caveat that the knockdown in hCNS-SCns was of slightly lower efficiency owing to decreased infection in neurospheres24 (Fig. 5a,b). However, knockdowns in embryonic stem cells at comparable efficiencies to hCNS-SCns still recapitulate the cell-death phenotype. Live cell counts using trypan blue exclusion indicated cell death in a dose-dependent fashion, exclusively with FOX2 depletion in hESCs, but not hCNS-SCns (Fig. 5c). Furthermore, FOX2 depletion did not affect expression of pluripotency markers in either the HUES6 (Fig. 5d) or H9 hESC lines (Supplementary Fig. 9 online). Additionally, knockdown of FOX2 in transformed cell types such as 3T3 and HEK293T cells also did not affect cell viability (Supplementary Fig. 8), suggesting overall that FOX2 is selectively required for hESC survival.

To determine the possible cause of cell death, we stained HUES6 cells with the monomeric cyanine dye green fluorescent Yo-Pro-1, a marker of early apoptosis correlated with Annexin V staining25. Flow cytometry indicated that a statistically significant portion of FOX2-depleted, but not mock-depleted, cells underwent apoptosis in a dosage-dependent manner (Student’s t-test, P-value < 0.001) (Fig. 5e). This apoptotic death was confirmed by immunocytochemistry for activated caspase 3 (Fig. 5e). We also detected the upregulation of numerous genes involved in the necrosis pathway (Supplementary Fig. 10 online). Together, these results indicate that FOX2-deficient cells underwent both apoptosis and necrosis, independently of cell-cycle arrest (Supplementary Fig. 11 online).

DISCUSSION

Post-transcriptional gene expression regulation is crucial for many diverse cellular processes, such as development, metabolism and cancer. The fate of hundreds of thousands of mRNA molecules in eukaryotic cells is likely to be coordinated and regulated by hundreds of RNA binding proteins and noncoding RNAs (for example,

Figure 5 FOX2 is important for hESC survival. (a) Left, HUES6 hESC cells underwent rapid cell death in a dosage-dependent manner (2 µl, 5 µl and 10 µl) in response to FOX2 knockdown by lentiviral shRNA. Lentiviruses also expressed RFP, indicated in the inset, demonstrating the extent of infection. Right, similar infection of hCNS-SCns (grown as suspended neurospheres) did not result in a cell-death phenotype. Scale bars for hESC and hCNS-SCns are 25 µm and 200 µm, respectively. (b) Efficient FOX2 knockdown determined by western blotting was achieved in all cell types using actin as a loading control. (c) Cell counts using trypan blue exclusion indicate that the cell-death phenotype from FOX2 knockdown is specific to hESCs (HUES6 and H9 lines) and occurs in a dose-dependent fashion. (d) Infection of hESCs with FOX2 shRNA–RFP virus does not affect expression of pluripotency markers OCT4 and Nanog. (e) Knockdown of FOX2 in hESCs resulted in an increase in apoptotic cell death, as indicated by immunocytochemistry toward activated caspase-3 (left) and by FACS analysis using Yo-Pro (right) (* P-value < 0.001). Error bars indicate s.d.


microRNAs). To shed light on the importance and roles of individual RNA binding proteins, it is necessary to identify the spectrum of targets recognized and associated with these RNA binding proteins. Genome-wide unbiased methods have begun to reveal the plethora of targets and diverse rules by which the post-transcriptional regulatory networks are controlled26.

Here we have identified the splicing factor FOX2 as being highly expressed in the nuclei of pluripotent hESCs. hESCs constitute an excellent in vitro model for survival, self-renewal, differentiation and development. Using a modified CLIP-seq technology and computational analyses that accounted for gene-specific variation in RNA abundance, we have uncovered thousands of FOX2 RNA targets representing ~7% of the human genes in hESCs. Confirming and extending previous computational analyses of human intronic regions14–16,18, we observed that FOX2 was preferentially bound near alternative splice sites, and the binding sites were located within regions of higher evolutionary conservation. Experimental validation of targets revealed that FOX2 represses exon usage when bound upstream and enhances exon inclusion when located downstream of the alternative exon, revealing an RNA map for the FOX2-mediated alternative splicing program in hESCs. Our study presenting in vivo targets of FOX2 in a biological system strengthens computational predictions from otherwise indistinguishable conserved FOX1 and FOX2 sites27, as both FOX1 and FOX2 recognize the same RNA element6,11,28. The fact that FOX2 is also expressed in differentiated neural progenitors from hESCs and fetal neural stem cells but was not shown to regulate alternative splicing the same way as in hESCs, despite having conserved binding sites in the same transcribed pre-mRNA, underscores the importance of experimentally identifying in vivo targets in the appropriate cell and tissue context.

The finding that many FOX2 targets are themselves splicing regulators leads to the provocative possibility that FOX2 may function as an upstream regulator of many general and tissue-specific splicing regulators. In addition, we identified FOX2 binding within the FOX2 pre-mRNA itself and, combined with RT-PCR data, demonstrated direct evidence for autoregulation of the FOX2 gene. The alternative splicing of the FOX2 pre-mRNA may result in unique target pre-mRNA splicing regulation; this possibility deserves further attention in the future.

Last, our preliminary results indicate that FOX2 has an important role in maintaining the viability of hESCs, as depletion of FOX2 led to rapid cell death. Given the many genes controlled by FOX2 in hESCs, it is presently unclear which gene(s) or alternative splicing event(s) is responsible for the lethal phenotype. It is possible that the phenotype is a result of the combined effect of multiple affected genes. Given our observation that FOX2 may function as a master regulator of the alternative splicing program in hESCs and signaling pathways, it may well be that many events contribute to the phenotype; that is, it may be unrealistic to think that the complex cellular mortality phenotype could be due to a single altered gene product. Nevertheless, the phenotype is remarkably specific to hESCs, and not other cell lines such as 293T or 3T3. More notably, neither neural progenitors derived from hESCs nor primary human fetal neural stem cells were sensitive to FOX2 depletion, suggesting that FOX2 has a different set of targets and, hence, a dissimilar RNA map in other cell types. Our study provides a starting point for the future characterization of the varying target repertoire of the same splicing factor in different biological systems, embracing a need to understand the uniqueness of factor-target relationships throughout biology.

METHODS

Culturing and differentiation of hESCs. We cultured hESC lines HUES6 and H9 as previously described (http://www.mcb.harvard.edu/melton/HUES/)5. Briefly, we grew cells on growth factor–reduced (GFR) matrigel–coated plates (BD) in mouse embryonic fibroblast–conditioned medium and FGF2 (20 ng ml⁻¹) in DMEM media (Invitrogen) supplemented with 20% (v/v) Knock Out serum replacement (GIBCO), 1 mM L-glutamine, 50 µM β-mercaptoethanol, 0.1 mM nonessential amino acids (Invitrogen) and 10 ng ml⁻¹ FGF2 (R&D Systems), and passaged by manual dissection.

Neural progenitors were derived from hESCs as previously described5. Briefly, colonies were removed by treatment with collagenase IV (Sigma) and resuspended in media without FGF2 in nonadherent plates to form embryoid bodies. After 1 week, embryoid bodies were plated on polyornithine/laminin-coated plates in DMEM/F12 supplemented with N2 (1×) and FGF2. Rosette structures were manually dissected and enzymatically dissociated with TrypLE (Invitrogen), plated on polyornithine/laminin-coated plates and grown in DMEM/F12 supplemented with N2, B27 without retinoic acid and 20 ng ml⁻¹ FGF2. Progenitors were verified by neuronal differentiation using 20 ng ml⁻¹ brain-derived and glial-derived neurotrophic factors (BDNF and GDNF).

Lentiviral short hairpin RNA–mediated knockdown of FOX2. We purchased lentiviral shRNA constructs toward FOX2 from Open Biosystems in the pLKO.1 vector system (TRCN0000074545 and TRCN0000074546). The control virus used was pLKO.1 containing a shRNA toward GFP (Open Biosystems). Lentivirus production was as previously described29. The efficacy of the lentivirus was tested by infection of HUES6 hESCs at varying viral concentrations and subsequent western blotting 36 h after infection with an antibody to FOX2 (1:1,000, Bethyl Laboratories) and actin as a control (1:5,000, Sigma). The FOX2, control and a GFP lentivirus were all made in parallel and concentrated by ultracentrifugation. GFP virus was titered using serial dilutions and infection of HEK293T cells. At 3 d after infection, we analyzed the cells for GFP expression by FACS and determined the viral titer using multiple dilutions, which yielded infections in the linear range. Titers were between 1 × 10⁹ and 3 × 10⁹. We used matched FOX2 and control viruses for hESC infections. Additionally, red fluorescent protein (RFP) was cloned into the pLKO.1 and 74546 lentiviral backbone in place of the puromycin-resistance gene and used in some studies to verify titer and comparable infection rates between the two lentiviruses.
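The titering step can be illustrated with a short Python sketch. The text does not state the formula or the units, so this assumes the commonly used relation titer = (fraction of GFP-positive cells × cells at infection × dilution factor) / inoculum volume, expressed in transducing units per ml. The `linear_range` bounds and all function and parameter names are hypothetical choices made for illustration.

```python
def transducing_units_per_ml(fraction_positive, cells_at_infection,
                             virus_volume_ml, dilution_factor=1.0):
    """Titer estimate from one FACS measurement (assumed standard formula)."""
    if not 0.0 <= fraction_positive <= 1.0:
        raise ValueError("fraction_positive must lie between 0 and 1")
    if virus_volume_ml <= 0:
        raise ValueError("virus_volume_ml must be positive")
    return (fraction_positive * cells_at_infection * dilution_factor
            / virus_volume_ml)


def titer_from_dilutions(measurements, cells_at_infection, virus_volume_ml,
                         linear_range=(0.01, 0.20)):
    """Average the titer over dilutions whose GFP fraction is in the linear range.

    measurements: list of (dilution_factor, fraction_gfp_positive) pairs.
    linear_range: hypothetical bounds; saturated or near-empty wells are skipped.
    """
    low, high = linear_range
    titers = [
        transducing_units_per_ml(frac, cells_at_infection,
                                 virus_volume_ml, dilution)
        for dilution, frac in measurements
        if low <= frac <= high
    ]
    if not titers:
        raise ValueError("no dilution fell within the linear range")
    return sum(titers) / len(titers)
```

With 1 × 10⁵ cells, 10 µl of diluted virus per well and GFP-positive fractions of 0.2 and 0.02 at 10³-fold and 10⁴-fold dilutions, both usable wells give about 2 × 10⁹ under this formula, while a saturated well at a lower dilution would be excluded.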

Analysis of cross-linking immunoprecipitation reads. The human genome sequence (hg17) and annotations for protein-coding genes were obtained from the University of California, Santa Cruz Genome Browser. Known human genes (knownGene containing 43,401 entries) and known isoforms (knownIsoforms containing 43,286 entries in 21,397 unique isoform clusters) with annotated exon alignments to the human hg17 genomic sequence were processed as follows. Known genes that were mapped to different isoform clusters were discarded. All mRNAs aligned to hg17 that were greater than 300 nt were clustered together with the known isoforms. For the purpose of inferring alternative splicing, genes containing fewer than three exons were removed from further consideration. A total of 2.7 million spliced ESTs were mapped onto the 17,478 high-quality gene clusters to identify alternative splicing. To eliminate redundancies in this analysis, final annotated gene regions were clustered together so that any overlapping portion of these databases was defined by a single genomic position. To determine the number of reads that was contained within protein-coding genes, promoters and intergenic regions, we arbitrarily defined promoter regions as 3 kb upstream of the transcriptional start site of the gene and intergenic regions as unannotated regions in the genome. To identify CLIP clusters, we performed the following steps: (i) CLIP reads were associated with protein-coding genes as defined by the region from the annotated transcriptional start to the end of each gene locus. (ii) CLIP reads were separated into the categories of sense or antisense to the transcriptional direction of the gene. (iii) Sense CLIP reads were extended by 100 nt in the 5′-to-3′ direction. The height of each nucleotide position is the number of reads that overlap that position. (iv) The count distribution of heights $1, 2, \ldots, h, \ldots, H-1, H$ is $\{n_1, n_2, \ldots, n_h, \ldots, n_{H-1}, n_H\}$, with $N = \sum_{i=1}^{H} n_i$. For a particular height, $h$, the associated probability of observing a height of at least $h$ is $P_h = \sum_{i=h}^{H} n_i / N$. (v) We computed the background frequency after


randomly placing the same number of extended reads within the gene for 100 iterations. This controls for the length of the gene and the number of reads. For each iteration, the count distribution and probabilities for the randomly placed reads ($P_{h,\mathrm{random}}$) were generated as in step (iv). (vi) Our modified FDR for a peak height was computed as $\mathrm{FDR}(h) = (\mu_h + \sigma_h)/P_h$, where $\mu_h$ and $\sigma_h$ are the average and s.d., respectively, of $P_{h,\mathrm{random}}$ across the 100 iterations. For each gene locus, we chose a threshold peak height $h^*$ as the smallest height satisfying $\mathrm{FDR}(h^*) < 0.001$. We identified FOX2 binding clusters by grouping nucleotide positions that satisfied $h > h^*$ and occurred within 50 nt of each other. This resulted in 6,123 FOX2 binding clusters. This number varied slightly when repeated for different sets of iterations. As a control for authentic FOX2 clusters, artificial randomly located regions were generated as follows. For each gene that contained one or more FOX2 binding clusters, we randomly picked the same number of regions of the same sizes as the FOX2 clusters in the pre-mRNA. Distances between clusters were measured from the 3′ end of a cluster to the 5′ end of the downstream cluster. Clusters were further grouped, as many clusters were closer than expected when compared to the randomly chosen regions. If a cluster was greater than 50 nt in length and within 1,500 nt to another cluster, we grouped that as a single cluster, resulting in 3,547 clusters.
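Steps (iii) to (vi) of the cluster-calling procedure can be sketched in Python as follows. This is an illustrative reading of the description for a single gene, not the authors' code. It assumes that read positions are 0-based start coordinates on the sense strand, that $N$ counts only positions with height of at least 1, that extended reads are truncated at the gene end, and that the s.d. is the population standard deviation. All function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def height_profile(starts, gene_len, ext=100):
    """Step (iii): per-nucleotide height after extending each read to ext nt."""
    heights = [0] * gene_len
    for s in starts:
        for pos in range(s, min(s + ext, gene_len)):
            heights[pos] += 1
    return heights


def tail_probs(heights):
    """Step (iv): P_h = (number of positions with height >= h) / N, h = 1..H."""
    covered = [h for h in heights if h > 0]
    if not covered:
        return {}
    n_total = len(covered)
    return {h: sum(1 for x in covered if x >= h) / n_total
            for h in range(1, max(covered) + 1)}


def fdr_threshold(starts, gene_len, ext=100, n_iter=100,
                  fdr_cutoff=0.001, seed=0):
    """Steps (v)-(vi): smallest h with (mu_h + sigma_h) / P_h below the cutoff."""
    rng = random.Random(seed)
    observed = tail_probs(height_profile(starts, gene_len, ext))
    random_probs = []
    for _ in range(n_iter):
        # Same number of reads, placed uniformly at random within the gene.
        rand_starts = [rng.randrange(gene_len) for _ in starts]
        random_probs.append(tail_probs(height_profile(rand_starts, gene_len, ext)))
    for h in sorted(observed):
        samples = [p.get(h, 0.0) for p in random_probs]
        fdr = (mean(samples) + pstdev(samples)) / observed[h]
        if fdr < fdr_cutoff:
            return h
    return None


def call_clusters(heights, h_star, max_gap=50):
    """Group positions with height > h_star that lie within max_gap nt."""
    clusters = []
    for pos, h in enumerate(heights):
        if h > h_star:
            if clusters and pos - clusters[-1][1] <= max_gap:
                clusters[-1][1] = pos  # extend the current cluster
            else:
                clusters.append([pos, pos])  # open a new cluster
    return [tuple(c) for c in clusters]
```

A typical call computes `heights = height_profile(starts, gene_len)`, then `h_star = fdr_threshold(starts, gene_len)`, then `call_clusters(heights, h_star)`. Because the background is resampled, the threshold can shift with the random seed, which matches the remark above that the cluster count varied slightly between sets of iterations. The second grouping pass (clusters longer than 50 nt and within 1,500 nt of another) is left out of this sketch.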

Cell-cycle and cell-death analysis. We carried out apoptosis staining using Yo-Pro according to the manufacturer’s instructions (Invitrogen). Gating for apoptotic cells was determined empirically using a negative control (no Yo-Pro) and a positive control (4-h treatment with 10 µM camptothecin). Cell-cycle staining was performed as previously described30,31. Briefly, cells were trypsinized, washed and resuspended in PBS, then fixed by the addition of a 3:1 ratio of ice-cold 100% (v/v) ethanol in PBS overnight at –20 °C. Subsequently, cells were washed and resuspended in a solution containing 50 µg ml⁻¹ propidium iodide and 500 ng ml⁻¹ RNase A for 1 h at 37 °C before analysis by FACS on a Becton-Dickinson FACScan. Immunocytochemistry was performed using the activated caspase-3 antibody (Cell Signaling Technologies, 1:150).

Additional cell culture procedures and antibodies used, RNA extraction, RT-PCR, CLIP library construction and sequencing, Processing of 1G data and Genomic analysis are available in Supplementary Methods online.

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS

The authors would like to acknowledge S. Aigner for technical advice, J. Simon for illustration assistance and R. Keithley and B. Miller for cell culture. G.W.Y. was funded by a Junior Fellowship from the Crick-Jacobs Center for Theoretical and Computational Biology, Salk Institute. F.H.G. is funded by the California Institute of Regenerative Medicine, The Picower Foundation and the Lookout Foundation. Part of this work was supported by US National Institutes of Health grants to X.-D.F. (GM049369 and HG004659) and G.W.Y. (HG004659).

AUTHOR CONTRIBUTIONS

G.W.Y. directed the project; G.W.Y. and F.H.G. designed the project; G.W.Y., N.G.C. and X.-D.F. analyzed the data and wrote the manuscript; G.W.Y., N.G.C., T.Y.L. and G.E.P. performed the experiments; G.W.Y. and T.Y.L. carried out bioinformatics data analysis.

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Black, D.L. Mechanisms of alternative pre-messenger RNA splicing. Annu. Rev. Biochem. 72, 291–336 (2003).

2. Thomson, J.A. et al. Embryonic stem cell lines derived from human blastocysts. Science 282, 1145–1147 (1998).

3. Keller, G. Embryonic stem cell differentiation: emergence of a new era in biology and medicine. Genes Dev. 19, 1129–1155 (2005).

4. Sonntag, K.C., Simantov, R. & Isacson, O. Stem cells may reshape the prospect of Parkinson’s disease therapy. Brain Res. Mol. Brain Res. 134, 34–51 (2005).

5. Yeo, G.W. et al. Alternative splicing events identified in human embryonic stem cells and neural progenitors. PLoS Comput. Biol. 3, e196 (2007).

6. Jin, Y. et al. A vertebrate RNA-binding protein Fox-1 regulates tissue-specific splicing via the pentanucleotide GCAUG. EMBO J. 22, 905–912 (2003).

7. Underwood, J.G., Boutz, P.L., Dougherty, J.D., Stoilov, P. & Black, D.L. Homologues of the Caenorhabditis elegans Fox-1 protein are neuronal splicing regulators in mammals. Mol. Cell. Biol. 25, 10005–10016 (2005).

8. Ule, J. et al. CLIP identifies Nova-regulated RNA networks in the brain. Science 302, 1212–1215 (2003).

9. Robertson, G. et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat. Methods 4, 651–657 (2007).

10. Fairbrother, W.G., Yeh, R.F., Sharp, P.A. & Burge, C.B. Predictive identification of exonic splicing enhancers in human genes. Science 297, 1007–1013 (2002).

11. Auweter, S.D. et al. Molecular basis of RNA recognition by the human alternative splicing factor Fox-1. EMBO J. 25, 163–173 (2006).

12. Kabat, J.L. et al. Intronic alternative splicing regulators identified by comparative genomics in nematodes. PLoS Comput. Biol. 2, e86 (2006).

13. Goren, A. et al. Comparative analysis identifies exonic splicing regulatory sequences—the complex definition of enhancers and silencers. Mol. Cell 22, 769–781(2006).

14. Yeo, G.W., Nostrand, E.L. & Liang, T.Y. Discovery and analysis of evolutionarily conserved intronic splicing regulatory elements. PLoS Genet. 3, e85 (2007).

15. Sorek, R. & Ast, G. Intronic sequences flanking alternatively spliced exons are conserved between human and mouse. Genome Res. 13, 1631–1637 (2003).

16. Yeo, G.W., Van Nostrand, E., Holste, D., Poggio, T. & Burge, C.B. Identification and analysis of alternative splicing events conserved in human and mouse. Proc. Natl. Acad. Sci. USA 102, 2850–2855 (2005).

17. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).

18. Brudno, M. et al. Computational analysis of candidate intron regulatory elements for tissue-specific alternative pre-mRNA splicing. Nucleic Acids Res. 29, 2338–2348 (2001).

19. Baraniak, A.P., Chen, J.R. & Garcia-Blanco, M.A. Fox-2 mediates epithelial cell-specific fibroblast growth factor receptor 2 exon choice. Mol. Cell. Biol. 26, 1209–1222 (2006).

20. Makeyev, E.V., Zhang, J., Carrasco, M.A. & Maniatis, T. The microRNA miR-124 promotes neuronal differentiation by triggering brain-specific alternative pre-mRNA splicing. Mol. Cell 27, 435–448 (2007).

21. Boutz, P.L. et al. A post-transcriptional regulatory switch in polypyrimidine tract-binding proteins reprograms alternative splicing in developing neurons. Genes Dev. 21, 1636–1652 (2007).

22. Nakahata, S. & Kawamoto, S. Tissue-dependent isoforms of mammalian Fox-1 homologs are associated with tissue-specific splicing activities. Nucleic Acids Res. 33, 2078–2089 (2005).

23. Ule, J. et al. An RNA map predicting Nova-dependent splicing regulation. Nature 444, 580–586 (2006).

24. Uchida, N. et al. Direct isolation of human central nervous system stem cells. Proc. Natl. Acad. Sci. USA 97, 14720–14725 (2000).

25. Idziorek, T., Estaquier, J., De Bels, F. & Ameisen, J.C. YOPRO-1 permits cytofluorometric analysis of programmed cell death (apoptosis) without interfering with cell viability. J. Immunol. Methods 185, 249–258 (1995).

26. Halbeisen, R.E., Galgano, A., Scherrer, T. & Gerber, A.P. Post-transcriptional gene regulation: from genome-wide studies to principles. Cell. Mol. Life Sci. 65, 798–813 (2008).

27. Zhang, C. et al. Defining the regulatory network of the tissue-specific splicing factors Fox-1 and Fox-2. Genes Dev. 22, 2550–2563 (2008).

28. Ponthier, J.L. et al. Fox-2 splicing factor binds to a conserved intron motif to promote inclusion of protein 4.1R alternative exon 16. J. Biol. Chem. 281, 12468–12474 (2006).

29. Singer, O. et al. Targeting BACE1 with siRNAs ameliorates Alzheimer disease neuropathology in a transgenic model. Nat. Neurosci. 8, 1343–1349 (2005).

30. Crissman, H.A. & Steinkamp, J.A. Rapid, simultaneous measurement of DNA, protein, and cell volume in single cells from large mammalian cell populations. J. Cell Biol. 59, 766–771 (1973).

31. Krishan, A. Rapid flow cytofluorometric analysis of mammalian cell cycle by propidium iodide staining. J. Cell Biol. 66, 188–193 (1975).


Structures of endonuclease V with DNA reveal initiation of deaminated adenine repair

Bjørn Dalhus1–3, Andrew S Arvai4, Ida Rosnes1,3, Øyvind E Olsen1,3, Paul H Backe1,3, Ingrun Alseth1,2, Honghai Gao5, Weiguo Cao5, John A Tainer4 & Magnar Bjøras1–3

Endonuclease V (EndoV) initiates a major base-repair pathway for nitrosative deamination resulting from endogenous processes and increased by oxidative stress from mitochondrial dysfunction or inflammatory responses. We solved the crystal structures of Thermotoga maritima EndoV in complex with a hypoxanthine lesion substrate and with product DNA. The PYIP wedge motif acts as a minor groove–damage sensor for helical distortions and base mismatches and separates DNA strands at the lesion. EndoV incises DNA with an unusual offset nick 1 nucleotide 3′ of the lesion, as the deaminated adenine is rotated ~90° into a recognition pocket ~8 Å from the catalytic site. Tight binding by the lesion-recognition pocket in addition to Mg2+ and hydrogen-bonding interactions to the DNA ends stabilize the product complex, suggesting an orderly recruitment of downstream proteins in this base-repair pathway.

Nitrate or nitrite metabolism generates mutagenic reactive nitrogen oxides, which can deaminate exocyclic amines of DNA nucleobases. Thus, adenine in DNA is deaminated to hypoxanthine, guanine to xanthine or oxanine, and finally cytosine to uracil (Fig. 1a). Such nitrosative deamination of DNA bases can cause transition mutations and cancer predispositions1–5. The deaminated bases have strong miscoding properties and produce mutations during subsequent replication, in which hypoxanthine mispairs with cytosine, leading to mutation from A-T to G-C2. Although it was until recently missing from otherwise comprehensive analyses of DNA base-repair pathways6, EndoV, encoded by the nfi gene, is the key enzyme for initiating repair of deaminated purine bases1,2,7,8. EndoV sequence homologs are conserved in all domains of life from bacteria to humans. A multiple-sequence alignment of EndoV reveals residues characteristic of this protein family, including the fully conserved Asp43, Tyr80, Glu89, Asp110, His116 and Lys139 (Supplementary Fig. 1 online).

Under physiological conditions, EndoV hydrolyzes the second phosphodiester bond 3′ of a deaminated base using Mg2+ as a cofactor9. This unprecedented 3′ offset incision by a DNA-repair protein involved in recognition and excision of single-base lesions is unique to EndoV (Fig. 1b). In contrast, DNA glycosylases in the base-excision repair (BER) pathway remove the damaged base by hydrolyzing the N-glycosylic bond, leaving an abasic site for downstream processing10. Although the details of downstream processing for the EndoV pathway remain unknown, a 3′-5′ exonuclease activity may generate a DNA-repair patch spanning only 2–3 nucleotides (nt) to either side of a hypoxanthine base8. EndoV has high affinity for both the hypoxanthine substrate and the nicked product11,12. Furthermore, EndoV can recognize all possible deaminated DNA bases13. In vitro, EndoV also shows activity toward the single-base lesions abasic sites and urea9, as well as toward base mismatches11,14. Finally, Escherichia coli EndoV can cleave insertion or deletion mismatches, and flap and pseudo Y structures15, which are all characterized by a discontinuous or distorted DNA helix.
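The offset incision rule stated above (cleavage at the second phosphodiester bond 3′ of the lesion) can be written down as a short Python sketch. The function names are hypothetical, and the example strand is the 15-mer given in the Methods; the code only illustrates the positional rule and makes no claim about how the enzyme locates the lesion.

```python
# Illustrative sketch; function names are hypothetical.
CANONICAL = set("ACGT")


def parse_oligo(text):
    """Split e.g. 'ATGCGAC-Hx-GAGCCGT' into one token per nucleotide."""
    tokens = []
    for part in text.split("-"):
        if set(part) <= CANONICAL:
            tokens.extend(part)   # run of ordinary bases, one token each
        else:
            tokens.append(part)   # modified base such as 'Hx' or '5BrU'
    return tokens


def endov_incision(tokens, lesion="Hx"):
    """Return (5' fragment, 3' fragment) for a nick at the second
    phosphodiester bond 3' of the first lesion (1 nt downstream of it)."""
    lesion_index = tokens.index(lesion)   # 0-based position of the lesion
    cut = lesion_index + 2                # lesion plus one nucleotide stay 5'
    if cut >= len(tokens):
        raise ValueError("no phosphodiester bond 1 nt 3' of the lesion")
    return tokens[:cut], tokens[cut:]


five_prime, three_prime = endov_incision(parse_oligo("ATGCGAC-Hx-GAGCCGT"))
```

For this strand the lesion is nucleotide 8, so the 5′ fragment ends at G9 and the 3′ fragment starts at A10, which matches the Gua9 3′ OH and Ade10 5′ phosphate ends described for the product complex.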

Although EndoV cleaves a spectrum of DNA lesions, genetic analysis of E. coli nfi insertion mutants and overproducing strains suggests a major role for EndoV in the in vivo repair of deaminated purine bases such as hypoxanthine2,7,8,16. The function of EndoV in eukaryotic cells is less well understood; however, the nfi mutant of fission yeast Schizosaccharomyces pombe shows a strong mutator phenotype (I. Alseth, personal communication), and EndoV from mice possesses DNA-repair activities resembling those of the bacterial counterparts17. Furthermore, nfi−/− mice show a cancer-prone phenotype (A. Klungland, personal communication), supporting a key role for EndoV in genome integrity and for malfunction of the EndoV pathway in cancer pathophysiology.

To characterize DNA-repair initiation for this prototypic enzyme, we solved the crystal structures of T. maritima EndoV in complex with a hypoxanthine-lesion substrate and product DNA.

Received 29 September 2008; accepted 21 November 2008; published online 11 January 2009; doi:10.1038/nsmb.1538

1Centre for Molecular Biology and Neuroscience (CMBN), Rikshospitalet University Hospital, Sognsvannsveien 20, N-0027 Oslo, Norway. 2Institute of Medical Microbiology, Rikshospitalet University Hospital, Sognsvannsveien 20, N-0027 Oslo, Norway. 3Institute of Clinical Biochemistry, University of Oslo, N-0027 Oslo, Norway. 4Department of Molecular Biology and The Skaggs Institute for Chemical Biology, The Scripps Research Institute, 10550 North Torrey Pine Road, MB4, La Jolla, California 92037, USA. 5Department of Genetics and Biochemistry, South Carolina Experiment Station, Clemson University, 51 New Cherry Street, Clemson, South Carolina 29634, USA. Correspondence should be addressed to M.B. ([email protected]) or J.A.T. ([email protected]).

138 VOLUME 16 NUMBER 2 FEBRUARY 2009 NATURE STRUCTURAL & MOLECULAR BIOLOGY


© 2009 Nature America, Inc. All rights reserved.


RESULTS

Structure determination

To unravel the molecular basis for lesion recognition and the unprecedented 3′ offset nicking of the various substrates of EndoV, we crystallized and solved structures for both the inactive EndoV D43A mutant18 and the active wild-type EndoV from T. maritima in complex with DNA harboring a deaminated adenine. The D43A mutant represents the lesion-recognition complex (LRC) before phosphodiester hydrolysis, whereas the wild-type complex establishes the structure of the product complex (PC). Both complexes crystallized in space group I222, with two copies of EndoV in the asymmetric unit. The LRC structure was solved by SAD with 5-bromo-2-deoxyuracil–substituted DNA and refined to 2.1-Å resolution (Rwork = 21.6%, Rfree = 25.8%). The wild-type PC structure was initially refined by rigid-body refinement, using the atomic coordinates of the protein part only from the LRC as a starting model. The PC was refined to 2.15-Å resolution (Rwork = 25.9%, Rfree = 28.8%). We expect these structures to be generally representative of the EndoV enzyme family, as T. maritima EndoV contains conserved residues characteristic of EndoV enzymes from E. coli to humans (Supplementary Fig. 1).

Description of the structure

Structure determination of T. maritima EndoV reveals an αβα protein with a central eight-stranded β-sheet of parallel and antiparallel strands flanked on either side by α-helices (Fig. 2a). In general, the protein structures of the LRC and PC are nearly indistinguishable, with only minor reorientations of a few side chains surrounding the mutated Asp43. The proteins are superimposable with an r.m.s. deviation of 0.30 Å for 223 Cα atoms. EndoV contains an ‘RNase H–like motif’, resembling that in E. coli RNase H19,20, which is also found in E. coli and yeast Holliday junction resolvase RuvC21,22, the catalytic domain of E. coli DNA transposase23, the PIWI domain of Pyrococcus furiosus Argonaute24 and the 5′ endonuclease domain of the nucleotide-excision repair protein UvrC from T. maritima25. However, the topology and number of strands in the central β-sheet as well as the number and orientation of the surrounding helices vary between these proteins, and none of these other enzymes with known structure shares the substrate affinity and the enzymatic properties of EndoV.
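For readers unfamiliar with the r.m.s. deviation quoted above, the quantity is the square root of the mean squared distance between paired atoms. The following Python sketch is a minimal illustration under one stated assumption: the two coordinate sets are already superimposed and listed in the same atom order. The function name is hypothetical, and the text does not say which program was used for the superposition.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) tuples that are assumed to be already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Applied to the 223 paired Cα positions of the two complexes, a function of this kind would yield a single number in ångströms that is directly comparable to the value reported in the text.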

The positively charged DNA binding surface of EndoV (Fig. 2b) comprises a distinct central cleft of conserved residues (Fig. 2c) that runs across the β-sheet and includes a base lesion–recognition pocket, a strand-separating wedge and a catalytic pocket (Fig. 2d). The liganded DNA forms duplex DNA through homodimerization via a two-fold crystallographic symmetry within the crystal lattice, so that two EndoV molecules bind to a single DNA duplex with two hypoxanthine bases (Supplementary Fig. 2 online). The DNA duplex is sharply bent adjacent to the lesion and bound to EndoV with its minor groove facing the protein (Fig. 2e).

DNA strands are separated by a highly conserved wedge

Most of the interactions between EndoV and the DNA ribose-phosphate backbone are on the 3′ side of the lesion (Fig. 3a and Supplementary Fig. 3 online), including those involving the conserved Lys139 and His214. Several hydrophobic residues are strategically located to have key roles in maintaining the DNA conformation. In particular, a wedge-like segment on the protein surface, arising from the PYIP motif (Pro79-Tyr80-Ile81-Pro82), divides the duplex DNA strands adjacent to the lesion (Figs. 2d,e and 3). The Tyr80 aromatic ring stacks face-to-face with the guanine base lying 3′ to the deaminated base, sterically blocks the vacant hypoxanthine position in the DNA helix and hydrogen-bonds to the DNA phosphate backbone. Pro82 stacks against Tyr80 as well as the base 5′ of the lesion and, together with Ile81, wedges open the duplex (Fig. 3c). This PYIP wedge thus separates the two DNA strands at the lesion and pushes the base (cytosine) opposite hypoxanthine partly out of the duplex; however, base pairs on either side of the lesion remain paired (Fig. 3c). Thermotoga maritima EndoV Y80A and Y80H mutants are severely compromised in their ability to bind to DNA base lesions and the corresponding nicked product, whereas Y80F is similar to the wild-type EndoV in this respect, suggesting that the aromatic ring stacking identified here is key to retaining a high affinity to the DNA18.

Figure 1 EndoV 3′ incision initiating deaminated adenine repair. (a) Deamination of the exocyclic amino group in adenine yields hypoxanthine. (b) EndoV-dependent repair is initiated by cleavage at the second phosphodiester bond 3′ to the lesion (Hx, hypoxanthine) resulting from deamination of adenine.

Figure 2 EndoV overall fold, surface characteristics and protein–DNA complex structure. (a) Stereo pair showing protein fold and ternary structure of T. maritima EndoV. (b) Electrostatic potential of wild-type EndoV mapped onto the solvent-accessible protein surface (blue indicates positive regions; red indicates negative regions). Electrostatic potential was calculated using APBS. (c) Molecular surface showing conserved residues in the EndoV family, colored from dark burgundy (highly conserved) through neutral gray into dark cyan (highly variable). The degree of conservation was calculated using ConSeq (http://conseq.tau.ac.il/). (d) Molecular surface with bound DNA (orange and yellow tubes and rings) showing spatial relationships among key structural elements. The strand-separating PYIP wedge (cyan, left) protrudes out adjacent to residues Asp43, Glu89, Asp110 and His214, which are involved in Mg2+ ion binding and phosphodiester incision (yellow, center). Also shown is the hypoxanthine lesion and the surrounding residues (Leu85, Gly111, Gln112, Gly113, Gly136 and Leu142) that form the nucleobase pocket (red, center). (e) Molecular surface of wild-type T. maritima EndoV showing substantial bending of the bound duplex DNA (orange and yellow ball-and-stick representation). The PYIP wedge is shown in cyan. Experimental electron density is shown for one of the DNA strands in the duplex (σA-weighted 2Fo – Fc map contoured at 1σ).

The discovery of the strand-separating wedge as part of the DNA binding motif helps explain the broad EndoV substrate range, whereby the PYIP wedge forms a key element in the recognition of structures with helical distortions such as insertion or deletion mismatches and the pseudo Y, flap and hairpin substrates.

The deaminated base is inserted into a specific pocket

The PYIP wedge motif is furthermore well suited for an important and independent role in the search for modified single bases by interrogating duplex DNA and presenting specific bases to the active site (Fig. 3c). In contrast to known DNA base–repair mechanisms for damage reversal and BER, which flip the nucleoside ~180° into a lesion-recognition pocket via the major groove26, the hypoxanthine in the EndoV complex is inserted into the predominantly hydrophobic recognition pocket by a ~90° rotation in the opposite direction toward the minor groove (Fig. 3c).

The hypoxanthine base is inserted between the hydrophobic side chains of Leu85 and Leu142, and the high degree of conservation in these positions across species reflects the importance of hydrophobic stacking with the electron-rich, heterocyclic ring (Fig. 4a). The Leu142 side chain seems to be important for contacting the five-membered heterocyclic ring of hypoxanthine and assisting the associated rotation by close contacts to the DNA backbone. The shape of the pocket is further established in part by key conserved glycine residues, Gly83, Gly111, Gly113 and Gly121, the last three of which are invariant in the EndoV family (Supplementary Fig. 1). Introduction of a valine side chain at either Gly111 or Gly113 in the DGXG motif reduces endonucleolytic activity for hypoxanthine-containing DNA by 50%13. Furthermore, the strictly conserved Gly136 has been shown to be important for cleavage of various substrates13. Gly136 lies in the middle of β-strand 4, and insertion of a larger hydrophobic side chain is likely to shift the neighboring loop containing the DGXG motif, thus displacing both the metal-coordinating Asp110 and the remaining residues in the DGXG motif. Altogether, these amino acid substitutions validate the importance of the protein fold as a scaffold for the lesion-recognition pocket and the metal coordination around the active-site Asp110.

Lesion recognition involves protein backbone atoms

The discrimination between adenine and hypoxanthine in particular, and between other native DNA bases and their corresponding deaminated analogs in general, arises through several lesion-specific interactions in the recognition pocket. Except for a single solvent water molecule, all polar contacts between EndoV and the hypoxanthine base involve protein backbone atoms only. Consequently, the recognition pocket is fairly rigid. The close proximity of the Ile122 backbone amide NH to the N1 atom in the deaminated purine bases probably leads to hypoxanthine and xanthine binding in their respective tautomeric imidic acid forms (Fig. 4a–c). The resulting hydroxyl group attached to C6 in hypoxanthine or xanthine forms hydrogen bonds with the Gly83 carbonyl oxygen and Leu85 amide nitrogen, which are reinforced cooperatively (Fig. 4b,c). The xanthine model further suggests a hydrogen-bonding interaction between the Gln112 backbone carbonyl oxygen and the hydroxyl group of C2 in xanthine (Fig. 4d). The invariant His116 side chain also interacts with the backbone carbonyl oxygen of Gly83, securing a stable conformation of the adjacent critical PYIP wedge loop and closing in the two sides of the pocket. Consistent with our structures, His116 can be replaced by hydrogen bond–donating residues such as glutamine and threonine without notable loss of activity13.

Modeling of uracil in the EndoV binding pocket suggests that the two carbonyl moieties are recognized by the backbone amide NH groups of Gly83, Leu85 and Gln112, whereas the Gly83 backbone carbonyl and the Ile122 amide NH are too distal to form any contacts (Fig. 4e). This model furthermore suggests that rejection of the closely related thymine is governed by steric repulsion between the additional methyl group in thymine and the side chain of Pro82 within the conserved PYIP wedge. The smaller pyrimidine ring of uracil will share a smaller contact surface with Leu85 and Leu142, which may explain the lower affinity to uracil compared with the larger purines such as hypoxanthine27.

Figure 3 Protein-DNA contacts. (a) Protein-DNA contacts in the wild-type product complex. Hydrogen bonding and ionic interactions (dashed lines, 3.75-Å cut-off), main chain amide nitrogen atoms (N) and steric interactions (orange arcs, 4.25-Å cut-off) involving side chains (yellow circles) are shown for one of the two EndoV molecules binding symmetrically to the DNA. Hx8, hypoxanthine. (b) Close-up of the coordination around the phosphodiester incision. Wat, water. (c) Close-up of the strand-separating PYIP wedge, with selected distances to DNA bases that stack with residues defining the protein surface. The hypoxanthine (dark red) is partially buried behind the wedge.

EndoV has also been shown to cleave substrates with base mismatches and helical distortions, such as mismatch loops, hairpins and flap structures, implying that EndoV can possibly also accommodate native bases in the recognition pocket. Notably, EndoV cleaves mismatched base pairs preferentially at adenine and guanine purines11. However, EndoV binds to cleaved mismatch base pair products with much lower affinity as compared to cleaved deaminated bases27, indicating that the pocket is substantially less favorable for adenine and guanine than for the corresponding deaminated bases. These results, combined with the present structural analysis, suggest that the EndoV nucleobase pocket is optimized for binding deaminated bases, yet capable of accommodating the normal adenine and guanine bases in a context of a base mismatch or DNA helix distortion only.

Tight binding of DNA ends secures a stable product complex

The crystal structure of wild-type EndoV in complex with DNA reveals the atomic details of the 3′ incised product, including the orientation of all catalytic residues and the metal-ion cofactor (Figs. 3b and 5a). The distance between the Gua9 free 3′ OH and the Ade10 free 5′ phosphate is ~4.75 Å at the cleavage site, and the electron density for both terminal groups is distinct, demonstrating complete phosphodiester cleavage (Fig. 5b). The Mg2+ ion is directly coordinated to the 3′ OH group of Gua9 and the two catalytic residues Asp43 and Asp110. Removal of Asp43, Glu89 or Asp110 severely affects catalysis18. Two water molecules connect the Mg2+ ion to the conserved Glu89, whereas another water molecule bridges the metal cofactor and the free 5′ phosphate. The terminal 5′ phosphate is additionally held firmly in place by Lys139 and His214 (Fig. 5a). The side chain of the conserved Lys139 bridges two DNA phosphate groups on either side of the incision (Figs. 3a and 5a). Combined, this specific hydrogen-bonding network seems to be fine-tuned to secure strong binding of the free DNA ends of the cleavage product after catalysis.

Binding of EndoV to the intermediate cytotoxic single-strand break product is probably crucial for recruitment of and controlled handover to downstream processing factors, as proposed for BER nucleases APE1 and EndoIV26. The side chain of Leu85 forms a physical barrier between the catalytic site and the lesion-recognition pocket, separating the processes of lesion recognition and strand incision by about 8 Å. Thus, the present PC defines how EndoV binds and protects the single-strand break product in the initial step of this pathway. The permanent insertion of the deaminated base in the lesion-recognition pocket following catalysis secures tight binding of the nicked product.

The structurally related enzymes Argonaute24 and UvrC25 have catalytic triads consisting of two aspartate residues and one histidine residue, corresponding to Asp43, Asp110 and His214 (DDH-motif) in T. maritima EndoV. Both these proteins also bind only one Mn2+ ion, yet these enzymes may bind two metals in the presence of a nucleic acid substrate25. However, the metal ion in both Argonaute and UvrC is directly coordinated to the histidine side chain (Supplementary Fig. 4 online), whereas, in the PC of EndoV, His214 is directly involved in DNA binding (Fig. 5a). The second metal ion is not required to bind the product with high affinity. Two metal ions have been observed in the E. coli Tn5 transposase–DNA complex28 and in the Bacillus halodurans RNase HI RNA–DNA hybrid complex29, binding to a related DDD/DDE motif. In eukaryotic EndoV, His214 is replaced by an aspartate residue (Supplementary Fig. 1), so the resultant DDD motif could resemble the two-metal binding sites in RNase HI and Tn5 transposase. In that case, the second Mg2+ cation could function as a bridge between the negatively charged aspartate side chain and the free 5′ phosphate in the incised product, thereby diminishing the phosphate-carboxylate electrostatic repulsion that would otherwise exist in a protein–DNA complex of eukaryotic EndoV.

Figure 4 Protein-DNA contacts in the base lesion pocket. (a) Diagram of interactions involved in hypoxanthine recognition, shown in both side view and front view. Hydrogen bonds and steric interactions are shown with dashed lines; the van der Waals volumes of selected residues involved in hypoxanthine contacts are represented by dotted surfaces. (b) Tautomeric forms of hypoxanthine (Hx) and detailed hydrogen-bonding network. (c) Tautomeric forms of xanthine (Xth) and detailed hydrogen-bonding network. (d) Model of xanthine binding to EndoV. (e) Model of uracil binding to EndoV. Wat, water.

Figure 5 Active-site architecture of the EndoV–DNA complex. (a) Stereo pair showing the active site with the free 3′ hydroxyl and 5′ phosphate groups of the nicked product DNA. A thin protein surface slab shows the steric separation of the recognition pocket and the catalytic center (yellow Mg2+). (b) Stereo pair of the active-site region with experimental electron density showing the DNA incision (density gap) and Mg2+ coordination (2Fo – Fc map contoured at 1.0σ).

The D43A mutant seems unable to bind the Mg2+ cofactor, as no electron density corresponding to a metal binding to the protein is apparent in the LRC structure. Finally, comparison of the LRC and PC complexes of T. maritima EndoV did not reveal any conformational change or substantial shifts in the protein backbone taking place during catalysis (Supplementary Fig. 5 online).

DISCUSSION

Together, the structures of the EndoV–DNA substrate and product complexes presented here provide new insight into the initial step of what has been a structurally undefined, but biologically crucial, DNA base–repair pathway1,6,8,17. The structures reveal a conserved strand-separating PYIP wedge positioned to have an important role as a lesion sensor by presenting deaminated bases to the lesion-specific pocket. The wedge may also work as a sensor for detection of base mismatches, particularly involving purine bases. Moreover, this motif could have an independent role in recognition of DNA structures with helix distortions by wedging into the more open DNA minor groove in such substrates. The present structural analysis suggests that the EndoV lesion binding pocket is optimized for recognizing deaminated bases, but normal adenine and guanine bases can possibly also be accommodated if present in a base mismatch or close to a DNA helix distortion.

Finally, these EndoV structures address the paradox of a base-repair pathway that cleaves DNA 1 nt 3′ of a base lesion: the 3′ downstream nicking is due to a physical barrier between the catalytic site and the lesion-recognition pocket that enforces a 1-nt offset strand incision. This dual pocket feature secures tight binding of the cytotoxic, nicked repair intermediate to EndoV by permanent insertion of the deaminated base in the lesion-recognition pocket in combination with tight hydrogen-bonding of both the free 3′ and 5′ ends by conserved residues in the active site.

METHODS

Expression and purification of endonuclease V from Thermotoga maritima. A pET28b plasmid including a full-length T. maritima EndoV mutant D43A or wild-type sequence was transformed into the E. coli expression strain BL21 CodonPlus (DE3) RIL (Stratagene), and the protein was overexpressed in LB broth cultures supplemented with 50 mg l–1 kanamycin. Expression was induced with 0.5 mM IPTG at an optical density at 600 nm (OD600) of ~0.75 for 2–3 h at 37 °C.

Cell-free extracts were prepared by 3 × 30-s sonications of cell pellets dissolved in 50 mM NaCl, 20 mM MES, pH 6.5, and 10 mM β-mercaptoethanol (buffer A). Cell debris was removed by centrifugation at 20,000–27,000g for 30 min. The protein extracts were incubated at 70 °C for 15 min, followed by a second centrifugation step. The cell-free extracts were loaded onto 5-ml HiTrap SP XL columns (GE Healthcare) equilibrated with buffer A. Each protein was eluted using a linear gradient to 1 M NaCl. The proteins were eluted in fractions and analyzed by SDS-PAGE, and pure EndoV fractions were pooled and dialyzed against buffer A. Aliquots of EndoV (~8 mg ml–1) were stored at –20 °C before crystallization.

Crystallization and data collection of the lesion-recognition complex. An 11-mer DNA oligonucleotide (Operon Biotechnologies GmbH) with sequence 5′-GC-5BrU-AC-Hx-GA-5BrU-CG-3′, containing both 5-bromo-deoxyuracil (5BrU) and hypoxanthine (Hx), was annealed to a complementary strand with T opposite Hx. Purified D43A EndoV was thawed on ice, mixed with the DNA in a molar ratio of 1:1.5 (excess DNA) and equilibrated on ice for >30 min. Plate-shaped crystals of the EndoV–DNA complex were obtained by the vapor diffusion method at room temperature (17–22 °C) using hanging drops equilibrated against 6–12% (w/v) MPEG 2k in 200 mM imidazole-malate buffer, pH 5.8–7.4. Crystals grew to a final size of about 0.1 mm, and they were mounted in cryoloops and flash frozen in liquid nitrogen following a short soak in mother liquor supplemented with ethylene glycol to a final concentration of 30% (v/v). A complete SAD data set to 2.1-Å resolution (T = 100 K, λ = 0.9117 Å) was collected using beamline BL12.3.1 at the Advanced Light Source synchrotron, Berkeley Laboratories. Diffraction images were processed and the integrated data were scaled and merged with the HKL2000 suite30 (Table 1).
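The data-collection wavelengths in the Methods are given in ångströms, whereas beamline settings are often quoted as photon energies. A minimal Python sketch of the conversion E = hc/λ follows; the function and constant names are hypothetical, and the value used for hc (about 12.398 keV Å) is the standard physical constant rather than a number from the paper.

```python
# Illustrative sketch; names are hypothetical.
HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, in keV*Å


def photon_energy_kev(wavelength_angstrom):
    """Convert an X-ray wavelength in ångströms to photon energy in keV."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


lrc_energy = photon_energy_kev(0.9117)  # SAD data set for the D43A complex
pc_energy = photon_energy_kev(0.9185)   # wild-type product complex data set
```

Both values fall between 13 and 14 keV, with the SAD data set collected at the slightly higher energy.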

Crystallization and data collection of the wild-type EndoV product complex. The structure of the LRC revealed that the combination of a low melting point for the 11-mer with the presence of a short self-complementary segment in the DNA resulted in a dimerization with two EndoV monomers binding to one single duplex DNA with two lesions (Supplementary Fig. 2a). Hoping to get a new crystal form of EndoV with a single-lesion duplex DNA, purified wild-type EndoV (8 mg ml–1) supplemented with 5 mM MgCl2 was mixed with a 15-mer DNA oligonucleotide with sequence 5′-ATGCGAC-Hx-GAGCCGT-3′ (with the complementary strand containing T opposite Hx) in molar ratio 1:1.5 (excess DNA). The protein-DNA mixture was screened at room temperature using an Oryx robot (Douglas Instruments). Plate-shaped crystals were obtained with 10% (w/v) polyethylene glycol 4000, 0.2 M ammonium acetate, 0.01 M calcium chloride and 0.05 M sodium cacodylate, pH 6.5. Crystals were soaked in mother liquor supplemented with 20% (w/v) glucose before flash freezing in liquid nitrogen. A complete data set to 2.15-Å resolution (T = 100 K, λ = 0.9185 Å) was collected using beamline ID14-4 at the European Synchrotron Radiation Facility in Grenoble, France. Diffraction images were processed using Mosflm31, and the integrated data were scaled and merged with CCP4/Scala32. Despite using a different DNA sequence and oligonucleotide length (Supplementary Fig. 2), wild-type EndoV still crystallized with two EndoV proteins in the asymmetric unit, both bound to a single-stranded, self-priming DNA forming a DNA duplex through crystallographic two-fold symmetry (Table 1).

Structure determination. Solve/Resolve33 was used to calculate initial experimental phases from the 2.1-Å SAD data of the D43A EndoV LRC. Four bromine sites were identified. Phases were improved by density modification before automatic building of 295 residues in two polypeptide chains using ARP/wARP34. The wild-type EndoV PC was initially refined by rigid-body refinement using the atomic coordinates of the protein part from the refined LRC as a starting model.

Table 1 X-ray data collection and refinement statistics

                        Wild-type EndoV (PC)      D43A EndoV (LRC, 5-BrU peak)
Data collection
Space group             I222                      I222
Cell dimensions
  a, b, c (Å)           55.06, 134.29, 194.45     51.98, 132.24, 191.62
Resolution (Å)          50–2.15 (2.23–2.15)*      50–2.10 (2.19–2.10)
Rsym (%)                7.8 (59.1)                5.3 (34.9)
I / σI                  7.8 (2.6)                 69 (5.9)
Completeness (%)        99.2 (98.9)               87.6 (62.0)
Redundancy              6.0 (5.6)                 13.3 (10.6)

Refinement
Resolution (Å)          50–2.15                   50–2.10
No. reflections         39,299                    32,087
Rwork / Rfree (%)       25.9 / 28.8               21.6 / 25.8
No. atoms
  Protein               3,580                     3,574
  Ligand/ion            435                       648
  Water                 216                       221
B-factors
  Protein               44.5                      54.7
  Ligand/ion            66.0                      74.1
  Water                 47.6                      58.8
r.m.s. deviations
  Bond lengths (Å)      0.006                     0.006
  Bond angles (°)       1.3                       1.3

*Values in parentheses are for highest-resolution shell (2.23–2.15 and 2.19–2.10, respectively).
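As a quick consistency check on the cell dimensions in Table 1, the unit-cell volume of an orthorhombic cell is simply a × b × c, and dividing by the number of general positions gives the volume available to one asymmetric unit. The Python sketch below uses hypothetical names; the figure of eight general positions for space group I222 is a standard crystallographic value and is not stated in the paper.

```python
# Illustrative sketch; names are hypothetical.
GENERAL_POSITIONS_I222 = 8  # body-centred lattice (x2) with point group 222 (x4)


def orthorhombic_cell_volume(a, b, c):
    """Unit-cell volume in cubic ångströms for cell edges given in ångströms."""
    return a * b * c


def volume_per_asymmetric_unit(a, b, c, positions=GENERAL_POSITIONS_I222):
    """Cell volume divided by the number of symmetry-equivalent positions."""
    return orthorhombic_cell_volume(a, b, c) / positions


pc_volume = orthorhombic_cell_volume(55.06, 134.29, 194.45)   # wild-type PC
lrc_volume = orthorhombic_cell_volume(51.98, 132.24, 191.62)  # D43A LRC
```

The two cells differ in volume by several percent even though both crystals belong to space group I222 and contain two EndoV copies per asymmetric unit.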


Model building and refinement. Generally, the two peptide chains in each structure were completed and adjusted by manual inspection and modeling in Coot35, interspersed with simulated annealing refinement in CNS1.1 (ref. 36). Improvements of the models were monitored with Rfree cross-validation against 8% and 5% of the data for D43A and wild-type EndoV, respectively. The DNA was manually built into each model using both Fo – Fc and 2Fo – Fc Fourier maps as guidelines.
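For reference, the crystallographic R-factor behind the Rwork and Rfree values can be written as follows. This is the standard textbook definition, given here as an aid to the reader; the paper itself does not spell it out, and the scale factor between observed and calculated amplitudes is omitted for brevity.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{o}}(hkl)| - |F_{\mathrm{c}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{o}}(hkl)|}
```

Rwork is this sum taken over the reflections used in refinement, whereas Rfree is the same expression evaluated over the reflections set aside for cross-validation (8% of the data for D43A and 5% for wild-type EndoV, as stated above).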

For the LRC, a heptamer single-stranded DNA fragment flanking the flipped hypoxanthine base was easily recognized in the initial difference map. A gradual improvement of the model revealed the location of the remaining four bases in the lesion strand, as well as five nucleotides belonging to the complementary strand (Supplementary Fig. 3). Solvent water molecules were located in succeeding difference maps and manually filtered. Refinement of the occupancy factor for the two strands gave an occupancy factor of ~1.0 and ~0.70 for the lesion and complementary strand, respectively. In the final refinement, the occupancy for the complementary strand was fixed at 0.75. The occupancy factor for the first three terminal nucleosides at the 5′ end of the lesion strand was also fixed at 0.50/0.75. The final model of the EndoV LRC contains two crystallographically independent EndoV–DNA complexes (each with 223 amino acid residues and 16 nucleotides) as well as 222 solvent water molecules.

The initial refinement of the PC structure was carried out by rigid-body refinement and simulated annealing of the protein part of the EndoV LRC. The side chains of residues Asp43, Glu89 and Asp110 were also initially removed and remodeled in the final steps of the refinement. A short DNA fragment flanking the flipped hypoxanthine base was readily identified in the initial difference maps. Gradual improvement of the model revealed the location of additional nucleotides and one Mg2+ ion in the active site. Solvent water molecules were located by difference maps and manual filtering. The final model of the EndoV product complex contains two crystallographically independent EndoV–DNA complexes (each with 223 amino acid residues, 8 or 13 nt and 1 Mg2+ ion) and 216 solvent water molecules. Refinement statistics for the two models are listed in Table 1. Electrostatic potential of wild-type T. maritima EndoV was calculated by APBS37 and the degree of conservation was calculated using ConSeq38. All structural figures were prepared with PyMol (DeLano Scientific, http://www.pymol.org).

Accession codes. Protein Data Bank: Coordinates and structure factors for the PC and the LRC have been deposited with the accession codes 2W35 and 2W36, respectively.

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS

The authors acknowledge the technical support at the BL12.3.1 beamline at Advanced Light Source, Berkeley Laboratories and the ID14-4 beamline at the European Synchrotron Radiation Facility, Grenoble, used to collect X-ray diffraction data. Base repair research in the Tainer laboratory is funded by a grant from the US National Institutes of Health. This work in the Bjoras laboratory is funded by the EU, the Norwegian Research Council (FUGE-CAMST) and the Norwegian Cancer Society.

AUTHOR CONTRIBUTIONS

B.D. and M.B. designed and performed experiments, analyzed data and wrote the manuscript; J.A.T., I.A. and W.C. analyzed data and wrote the manuscript; A.S.A. designed and performed experiments and analyzed data; I.R., O.E.O., P.H.B. and H.G. designed and performed experiments.

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Demple, B. & Linn, S. On the recognition and cleavage mechanism of Escherichia coli endodeoxyribonuclease V, a possible DNA repair enzyme. J. Biol. Chem. 257, 2848–2855 (1982).

2. Schouten, K.A. & Weiss, B. Endonuclease V protects Escherichia coli against specific mutations caused by nitrous acid. Mutat. Res. 435, 245–254 (1999).

3. Hussain, S.P., Hofseth, L.J. & Harris, C.C. Radical causes of cancer. Nat. Rev. Cancer 3, 276–285 (2003).

4. Nguyen, T. et al. DNA damage and mutation in human cells exposed to nitric oxide in vitro. Proc. Natl. Acad. Sci. USA 89, 3030–3034 (1992).

5. Wink, D.A. et al. DNA deaminating ability and genotoxicity of nitric oxide and its progenitors. Science 254, 1001–1003 (1991).

6. Wood, R.D., Mitchell, M., Sgouros, J. & Lindahl, T. Human DNA repair genes. Science 291, 1284–1289 (2001).

7. Guo, G. & Weiss, B. Endonuclease V (nfi) mutant of Escherichia coli K-12. J. Bacteriol. 180, 46–51 (1998).

8. Weiss, B. Removal of deoxyinosine from the Escherichia coli chromosome as studied by oligonucleotide transformation. DNA Repair (Amst.) 7, 205–212 (2008).

9. Yao, M., Hatahet, Z., Melamede, R.J. & Kow, Y.W. Purification and characterization of a novel deoxyinosine-specific enzyme, deoxyinosine 3′ endonuclease, from Escherichia coli. J. Biol. Chem. 269, 16260–16268 (1994).

10. Hegde, M.L., Hazra, T.K. & Mitra, S. Early steps in the DNA base excision/single-strand interruption repair pathway in mammalian cells. Cell Res. 18, 27–47 (2008).

11. Huang, J., Lu, J., Barany, F. & Cao, W. Multiple cleavage activities of endonuclease V from Thermotoga maritima: recognition and strand nicking mechanism. Biochemistry 40, 8738–8748 (2001).

12. Yao, M. & Kow, Y.W. Interaction of deoxyinosine 3′-endonuclease from Escherichia coli with DNA containing deoxyinosine. J. Biol. Chem. 270, 28609–28616 (1995).

13. Feng, H., Dong, L., Klutz, A.M., Aghaebrahim, N. & Cao, W. Defining amino acid residues involved in DNA-protein interactions and revelation of 3′-exonuclease activity in endonuclease V. Biochemistry 44, 11486–11495 (2005).

14. Yao, M. & Kow, Y.W. Strand-specific cleavage of mismatch-containing DNA by deoxyinosine 3′-endonuclease from Escherichia coli. J. Biol. Chem. 269, 31390–31396 (1994).

15. Yao, M. & Kow, Y.W. Cleavage of insertion/deletion mismatches, flap and pseudo-Y DNA structures by deoxyinosine 3′-endonuclease from Escherichia coli. J. Biol. Chem. 271, 30672–30676 (1996).

16. Weiss, B. Endonuclease V of Escherichia coli prevents mutations from nitrosative deamination during nitrate/nitrite respiration. Mutat. Res. 461, 301–309 (2001).

17. Moe, A. et al. Incision at hypoxanthine residues in DNA by a mammalian homologue of the Escherichia coli antimutator enzyme endonuclease V. Nucleic Acids Res. 31, 3893–3900 (2003).

18. Huang, J., Lu, J., Barany, F. & Cao, W. Mutational analysis of endonuclease V from Thermotoga maritima. Biochemistry 41, 8342–8350 (2002).

19. Katayanagi, K. et al. Three-dimensional structure of ribonuclease H from E. coli. Nature 347, 306–309 (1990).

20. Yang, W., Hendrickson, W.A., Crouch, R.J. & Satow, Y. Structure of ribonuclease H phased at 2 Å resolution by MAD analysis of the selenomethionyl protein. Science 249, 1398–1405 (1990).

21. Ariyoshi, M. et al. Atomic structure of the RuvC resolvase: a Holliday junction-specific endonuclease from E. coli. Cell 78, 1063–1072 (1994).

22. Ceschini, S. et al. Crystal structure of the fission yeast mitochondrial Holliday junction resolvase Ydc2. EMBO J. 20, 6601–6611 (2001).

23. Davies, D.R., Goryshin, I.Y., Reznikoff, W.S. & Rayment, I. Three-dimensional structure of the Tn5 synaptic complex transposition intermediate. Science 289, 77–85 (2000).

24. Song, J.J., Smith, S.K., Hannon, G.J. & Joshua-Tor, L. Crystal structure of Argonaute and its implications for RISC slicer activity. Science 305, 1434–1437 (2004).

25. Karakas, E. et al. Structure of the C-terminal half of UvrC reveals an RNase H endonuclease domain with an Argonaute-like catalytic triad. EMBO J. 26, 613–622 (2007).

26. Hitomi, K., Iwai, S. & Tainer, J.A. The intricate structural chemistry of base excision repair machinery: implications for DNA damage recognition, removal, and repair. DNA Repair (Amst.) 6, 410–428 (2007).

27. Yao, M. & Kow, Y.W. Further characterization of Escherichia coli endonuclease V. Mechanism of recognition for deoxyinosine, deoxyuridine, and base mismatches in DNA. J. Biol. Chem. 272, 30774–30779 (1997).

28. Steiniger-White, M., Rayment, I. & Reznikoff, W.S. Structure/function insights into Tn5 transposition. Curr. Opin. Struct. Biol. 14, 50–57 (2004).

29. Nowotny, M., Gaidamakov, S.A., Crouch, R.J. & Yang, W. Crystal structures of RNase H bound to an RNA/DNA hybrid: substrate specificity and metal-dependent catalysis. Cell 121, 1005–1016 (2005).

30. Otwinowski, Z. & Minor, W. Processing of X-ray Diffraction Data Collected in Oscillation Mode. Methods Enzymol. 276, 307–326 (1997).

31. Leslie, A.G.W. Recent changes to the MOSFLM package for processing film and image plate data. Joint CCP4 + ESF-EAMCB Newsletter on Protein Crystallography 26 (1992).

32. Collaborative Computational Project, Number 4. The CCP4 suite: programs for protein crystallography. Acta Crystallogr. D 50, 760–763 (1994).

33. Terwilliger, T.C. & Berendzen, J. Automated MAD and MIR structure solution. Acta Crystallogr. D Biol. Crystallogr. 55, 849–861 (1999).

34. Perrakis, A., Morris, R. & Lamzin, V.S. Automated protein model building combined with iterative structure refinement. Nat. Struct. Biol. 6, 458–463 (1999).

35. Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D Biol. Crystallogr. 60, 2126–2132 (2004).

36. Brunger, A.T. et al. Crystallography & NMR system: a new software suite for macromolecular structure determination. Acta Crystallogr. D Biol. Crystallogr. 54, 905–921 (1998).

37. Baker, N.A., Sept, D., Joseph, S., Holst, M.J. & McCammon, J.A. Electrostatics of nanosystems: application to microtubules and the ribosome. Proc. Natl. Acad. Sci. USA 98, 10037–10041 (2001).

38. Berezin, C. et al. ConSeq: the identification of functionally and structurally important residues in protein sequences. Bioinformatics 20, 1322–1324 (2004).

ARTICLES

NATURE STRUCTURAL & MOLECULAR BIOLOGY VOLUME 16 NUMBER 2 FEBRUARY 2009 143

© 2009 Nature America, Inc. All rights reserved.


Biological basis for restriction of microRNA targets to the 3′ untranslated region in mammalian mRNAs

Shuo Gu1, Lan Jin1, Feijie Zhang1, Peter Sarnow2 & Mark A Kay1

MicroRNAs (miRNAs) interact with target sites located in the 3′ untranslated regions (3′ UTRs) of mRNAs to downregulate their expression when the appropriate miRNA is bound to target mRNA. To establish the functional importance of target-site localization in the 3′ UTR, we modified the stop codon to extend the coding region of the transgene reporter through the miRNA target sequence. As a result, the miRNAs lost their ability to inhibit translation but retained their ability to function as small interfering RNAs in mammalian cells in culture and in vivo. The addition of rare but not optimal codons upstream of the extended open reading frame (ORF) made the miRNA target site more accessible and restored miRNA-induced translational knockdown. Taken together, these results suggest that active translation impedes miRNA-programmed RISC association with target mRNAs and support a mechanistic explanation for the localization of most miRNA target sites in noncoding regions of mRNAs in mammals.

miRNAs are a class of short, 20–22-nt regulatory RNAs expressed in plants and animals1,2. Up to 4% of the human genome is predicted to code for more than 400 miRNAs, which are estimated to regulate at least 30% of all human genes3–5. Although the specific functions of very few have been well established, a growing body of evidence indicates that miRNAs have important regulatory roles in a vast range of biological processes6–8. In plants, most miRNAs hybridize to target mRNAs with a near-perfect complementarity, and they mediate an endonucleolytic cleavage through a similar, if not identical, mechanism to that used by the small interfering RNA (siRNA) pathway9. In animals, with few exceptions, most of the known miRNAs form an imperfect duplex, with sequences located solely in the 3′ UTR region of target mRNA (base-pairing of a minimum 7-nucleotide seed sequence is required)10–12. The central mismatch in the miRNA–mRNA duplex may be responsible for the lack of RNAi-mediated mRNA-cleavage events in animals13,14. The association between the miRNA-programmed RNA-induced silencing complex (RISC) and the target mRNA induces translational repression through a poorly understood mechanism. There is evidence supporting models in which translation repression occurs at the initiation stage or later steps, including elongation (reviewed in refs. 15,16). Repressed mRNA and associated Argonaute (Ago) proteins are enriched in processing bodies (P-bodies), where endogenous cellular mRNAs are kept for storage and degradation17,18, which may partially explain why miRNA-mediated translational inhibition is often coupled with some RISC-independent target-mRNA degradation19.

In contrast to an siRNA, which can target almost any part of an mRNA and be fully functional, almost all identified target sites for endogenous miRNAs are located in the 3′ UTR of target mRNAs in animals. This has been established by extensive bioinformatic sequence analyses and by experimental approaches2. To further define the molecular events involved in miRNA-induced silencing, we cloned both the human mir-30 and Drosophila melanogaster bantam miRNA target sites into the 3′ UTR of the luciferase and green fluorescent protein (GFP) reporter genes so that, by deleting one nucleotide in the stop codon, we were able to extend the ORF into the target site while maintaining the bioactivity of the protein. Using these reporter constructs as a starting point, in combination with the corresponding short hairpin RNA (shRNA) and miRNA expression cassettes, we provide experimental proof that there is a functional basis for the observed distribution of miRNA target sites in mammalian systems.

RESULTS
miRNA-mediated repression is abolished in extended ORFs
To establish whether miRNAs can retain their negative regulatory activity if their targets remain in the 3′ UTR of an mRNA but become embedded within the coding sequence, we constructed luciferase expression plasmids that contained either (i) no miRNA target sites, (ii) tandem mir-30 target sites in the 3′ UTR or (iii) mir-30 target sites with an additional single-base insertion, abolishing the stop codon and extending the ORF through the mir-30 sites (Fig. 1a). Each plasmid was tested for miRNA-induced silencing in mammalian cells. Specifically, luciferase plasmids were co-transfected with plasmids that can direct the expression of miRNAs, such as sh-mir-30 (mismatch), sh-mir-30P (perfect complementarity) or sh-Scramble (scrambled control) (Fig. 1b,c).
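How a single-base insertion in the stop codon extends the reading frame through the target site can be sketched with a toy sequence. This is an illustration under stated assumptions, not the actual pGL3 constructs: the three-codon "reporter", the inserted base, the spacer and the downstream stop codon are all invented, and only the 21-nt stretch is the DNA form of the mir-30 target sequence shown in Fig. 1b. The helper name `count_codons_before_stop` is hypothetical.

```python
# Toy model of ORF extension by a single-base insertion in the stop codon.
# All sequences except TARGET_SITE are invented for illustration.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_codons_before_stop(seq: str) -> int:
    """Count sense codons from position 0 up to the first in-frame stop codon."""
    count = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


CODING = "ATGGCCAAA"                       # invented 3-codon stand-in for the reporter ORF
TARGET_SITE = "GCTGCAAACAAAGACTGAAAG"      # DNA form of the mir-30 target (Fig. 1b)
DOWNSTREAM = "T" + TARGET_SITE + "CTAAGG"  # invented spacer, target, then a downstream TAA

utr_construct = CODING + "TAA" + DOWNSTREAM   # stop codon intact: target lies in the 3' UTR
orf_construct = CODING + "TACA" + DOWNSTREAM  # one base (C) inserted into the TAA stop codon

print(count_codons_before_stop(utr_construct))  # codons read before the original stop
print(count_codons_before_stop(orf_construct))  # codons read when the frame runs on
```

In this toy case the intact construct stops after 3 codons, whereas the insertion lets the frame run for 12 codons, across the whole target site, before reaching the downstream stop codon.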

We first established that the mir-30 and mir-30P expressed from U6-driven cassettes were processed correctly and resulted in similar levels

Received 21 July 2008; accepted 2 January 2009; published online 1 February 2009; doi:10.1038/nsmb.1552

1The Center for Clinical Science Research, Room 2105, 269 Campus Drive, Stanford, California 94305-5164, USA. 2Department of Microbiology and Immunology, 300 Pasteur Drive, Room D309, Stanford University, Stanford, California 94305, USA. Correspondence should be addressed to M.A.K. ([email protected]).


of the mature miRNA transcripts between transfection experiments (Fig. 1c). As expected, co-transfection of HEK293 cells and NIH3T3 cells with plasmids expressing sh-mir-30, sh-mir-30P or a scrambled shRNA with a Firefly luciferase (FF-luciferase) reporter construct without mir-30 target sites did not alter FF-luciferase expression, as measured by enzymatic activity in a dual-luciferase assay (Fig. 1d,e). Moreover, this experiment established that there were no off-target effects using this reporter system from the U6-shRNA–expressing constructs. Consistent with previous studies20, sh-mir-30 effectively downregulated FF-luciferase expression by more than 60%, whereas sh-mir-30P inhibited FF-luciferase expression by >90% when tandem mir-30 target sites were present in the 3′ UTR region (Fig. 1d,e). Notably, when the same target sites were embedded within the extended coding region in both HEK293 cells and NIH3T3 cells (Fig. 1d,e), sh-mir-30–induced repression, but not sh-mir-30P–induced repression, was abolished (<3% for HEK293 cells and <15% for NIH3T3 cells).
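The dual-luciferase readout reported here (FF-luciferase activity normalized to RL-luciferase and expressed as a percentage of the sh-Scramble control) amounts to a ratio of ratios. The sketch below shows that arithmetic on invented luminescence readings; the numbers, the pairing of each sample with the control from the same experiment, and the function name `relative_activity` are assumptions for illustration, not the authors' analysis script.

```python
from statistics import mean, stdev


def relative_activity(ff: float, rl: float, ff_ctrl: float, rl_ctrl: float) -> float:
    """Percent FF/RL ratio of a sample relative to the FF/RL ratio of the control."""
    return 100.0 * (ff / rl) / (ff_ctrl / rl_ctrl)


# Hypothetical (FF, RL) luminescence readings from three independent experiments.
scramble = [(1000.0, 500.0), (1200.0, 600.0), (900.0, 450.0)]  # sh-Scramble control
sh_mir30 = [(380.0, 500.0), (480.0, 600.0), (315.0, 450.0)]    # sh-mir-30 sample

# Normalize each sample to the control measured in the same experiment (an assumption).
percent = [
    relative_activity(ff, rl, ff_c, rl_c)
    for (ff, rl), (ff_c, rl_c) in zip(sh_mir30, scramble)
]

print(f"mean = {mean(percent):.1f}%, s.d. = {stdev(percent):.1f}%")
```

A value near 40% in such a calculation would correspond to roughly 60% repression relative to the scrambled control.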

The construct containing the tandem mir-30 target sites in the extended ORF was predicted to produce a FF-luciferase with extra amino acids at the C-terminal end. Although the luciferase activity produced from the extended ORF was about 100 times lower than the wild-type FF-luciferase activity (data not shown), the enzymatic activity was still in the linear range of the assay. A western blot showed that, as expected, the ORF-extended protein migrated with a higher molecular weight, with a signal intensity similar to that of the wild-type protein (Fig. 1f). Notably, relative changes in the protein-band intensity for both the wild-type and extended ORF paralleled the changes in luciferase-activity measurements under all conditions when they were directly compared.

To confirm that both miRNA- and RNAi-mediated mechanisms were active, we measured luciferase mRNA levels in transfected NIH3T3 cells using an RNase protection assay. Coexpression of the sh-mir-30 and the reporter containing the miRNA target in the 3′ UTR resulted in a 70% downregulation of enzymatic activity and no detectable variation in mRNA, indicating that the reduction in protein level was primarily the result of translational repression, which in turn is suggestive of miRNA-mediated inhibition (Fig. 1g). In contrast, the concentration of the FF-luciferase extended ORF mRNA did not change in the presence of mir-30 expression, but was greatly reduced when mir-30P was coexpressed (Fig. 1g). These results show that, whereas miRNA-mediated translational inhibition was limited to targets in the untranslated region, RNAi-mediated activity directed against the same sequence remained functional, whether or not the site was within a coding sequence. This is consistent with a previous report21 where only minor reductions of siRNA-mediated cleavage efficiency were observed when target sites were switched from an untranslated to translated region.

Figure 1 miRNA-mediated repression is abolished in extended ORFs. (a) The reporter constructs used in this study. pGL3-control contains no miRNA target sites. pGL3-3′ UTR contains two tandem mir-30 target sites located in the 3′ UTR. In pGL3-ORF, the upstream stop codon is abolished and the mir-30 target sites are covered by an extended ORF. Grey box represents the ORF of the FF-luciferase gene. Dark boxes represent tandem mir-30 target sites with 6 nt in between. Positions of the upstream (original) stop codon and downstream stop codon are indicated by solid and dotted arrows, respectively. (b) Schematic illustration of the interactions between the mir-30 target sequence and the guiding-strand sequence of sh-mir-30 and sh-mir-30P, respectively. (c) NIH3T3 cells were co-transfected with plasmids, as described above. shRNA expressed from a U6-driven cassette was detected by northern blotting using either a probe against mir-30 (above) or a probe against mir-30P (below). Owing to sequence similarity, cross-hybridization was observed. Endogenous U6 snRNA was also detected as an internal control. (d,e) HEK293 cells (d) and NIH3T3 cells (e) were co-transfected with different combinations of plasmids, and dual-luciferase assays were performed 36 h post-transfection. FF-luciferase activities were normalized with RL-luciferase, and the percentage of relative enzyme activity compared to the negative control (treated with sh-Scramble) was plotted. Error bars represent the s.d. from three independent experiments, each performed in triplicate. (f) Protein analysis by western blotting was performed in transfected 3T3 cells. A protein band of β-actin was used as an internal control. Positions of the bands representing wild-type or mutant FF-luciferase are indicated by arrows. A nonspecific band is indicated by an asterisk. (g) RNA levels of either FF-luciferase (FF) or RL-luciferase (RL) from transfected 3T3 cells were detected by an RNase protection assay. Full-length probes and protected bands are indicated in the figure. A band labeled with an asterisk is possibly due to a truncated RL-luciferase probe and, therefore, corresponds to the RL-luciferase mRNA level.

[Figure 1 graphic not reproduced. Recoverable panel text: (a) SV40-driven FF-luc constructs pGL3-control (no target site), pGL3-3′ UTR (2× mir-30 target sites) and pGL3-ORF (2× mir-30 target sites). (b) mir-30 guide 3′-CGACGUUUGU [AGG] CUGACUUUC-5′ drawn against one mir-30 target 5′-GCUGCAAACA [AA] GACUGAAAG-3′, with the bracketed central bases set apart from the paired arms; mir-30P guide 3′-CGACGUUUGUUUCUGACUUUC-5′ drawn against the target 5′-GCUGCAAACAAAGACUGAAAG-3′. (c) Northern blot lanes: U6 sh-Scramble, U6 sh-Mir30, U6 sh-Mir30P; bands: precursor, mature small RNA, U6 snRNA; size markers at 50 and 20 nt. (d,e) Bar charts for 293 cells and 3T3 cells; y axis in percent; groups: no targets, 3′ UTR, coding region; series: control, sh-Mir30, sh-Mir30P. (f) Western blot bands: luciferase (60 kD), luciferase with extended ORF (65 kD), β-actin. (g) RNase protection assay: FF mRNA, RL mRNA, undigested RL and FF probes; size markers at 150 and 100 nt.]
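The difference between the sh-mir-30 and sh-mir-30P guide strands in Fig. 1b can be checked by counting Watson-Crick pairs against the target. The sketch below uses the sequences as read from the extracted panel text; the split of the mir-30 duplex into two paired arms around a central unpaired region (guide AGG opposite target AA) is an assumption based on that panel layout, and the helper name `count_pairs` is hypothetical.

```python
# Watson-Crick partners for RNA bases (G-U wobble pairs are deliberately not counted).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def count_pairs(guide_3to5: str, target_5to3: str) -> int:
    """Count Watson-Crick pairs between two antiparallel strands written in register."""
    return sum(PAIR[g] == t for g, t in zip(guide_3to5, target_5to3))


# Sequences as read from Fig. 1b (guide written 3'->5', target written 5'->3').
TARGET = "GCUGCAAACAAAGACUGAAAG"
MIR30P_GUIDE = "CGACGUUUGUUUCUGACUUUC"

# sh-mir-30: two paired arms flanking a central region (assumed reading of the panel).
MIR30_ARMS = [("CGACGUUUGU", "GCUGCAAACA"), ("CUGACUUUC", "GACUGAAAG")]
MIR30_CENTER = ("AGG", "AA")

print(count_pairs(MIR30P_GUIDE, TARGET), "of", len(TARGET), "positions paired (mir-30P)")
for guide_arm, target_arm in MIR30_ARMS:
    print(count_pairs(guide_arm, target_arm), "of", len(target_arm), "positions paired (mir-30 arm)")
print(count_pairs(*MIR30_CENTER), "pairs in the mir-30 central region")
```

Under this reading, mir-30P pairs with the target along its full 21 nt, whereas mir-30 pairs only through the two arms and leaves the center unpaired, consistent with the "perfect complementarity" and "mismatch" labels used in the text.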


To establish that the difference in miRNA-mediated repression between target sites in the 3′ UTR and those in the coding region was not limited to a single reporter system or cell line, we placed the same miRNA targets into an enhanced green fluorescent protein (EGFP) reporter gene and coexpressed this construct with the various miRNAs (Fig. 1) in both NIH3T3 (Supplementary Fig. 1 online) and HEK293 cells (not shown). These experiments yielded similar results.

As a final test for fidelity, we replaced the mir-30 sequences with a bantam miRNA target. The bantam miRNA was originally identified in D. melanogaster and is not believed to have a direct mammalian counterpart22. Co-transfection studies using U6-bantam expression plasmids in mammalian cells (Supplementary Fig. 2a,b online) gave virtually identical results to those observed for the mir-30 constructs in both HEK293 cells and NIH3T3 cells. The ability of the bantam miRNA to repress translation was lost when the target was part of the extended ORF, but the RNAi activity induced by bantam-P was equally robust, whether or not the target was embedded into a coding region (Supplementary Fig. 2c,d). Moreover, to establish that the accessibility and functionality of the miRNA target were functions of its presence, rather than its specific position, in the 3′ UTR, we varied its location relative to the stop codon and poly(A) signal with the insertion of an irrelevant ~700-bp fragment, and found that this had little effect on miRNA-induced silencing (Supplementary Fig. 3 online).

ORFs are refractory to miRNA-mediated regulation in vivo
To establish that the regulatory miRNA circuit is biologically operative under physiological conditions in whole mammals, we examined the efficiency of the mir-30–luciferase system (Fig. 1) in mouse liver. We selected mir-30 because it is not believed to be highly expressed in this tissue23,24. Luciferase expression plasmids (Fig. 1) were co-transfected into mouse liver via a hydrodynamic tail vein infusion, a method known to transfect up to 30% of mouse hepatocytes in vivo25. After 4 d, we measured luciferase expression (Fig. 2a). To control for variation in transfection efficiencies between individual animals, the FF-luciferase expression data were normalized (Fig. 2b) to an added control plasmid expressing a third, unrelated transgene product (Methods).

The data obtained from mouse liver were concordant with the data from tissue-culture cells. Whereas RNAi-mediated knockdown activity was robust whether the target was in the 3′ UTR or part of the extended coding region, miRNA-induced silencing was severely compromised when the target was included within the extended ORF.

Rare codons restore miRNA-mediated knockdown
Because our results suggested that active translation of mRNAs precludes miRNA-induced knockdown, we predicted that ribosome hindrance would interfere with the ability of a miRNA and its associated machinery to attach to its target site. To test this, we introduced a cluster (9 residues) of rare codons upstream of miRNA target sites located in the extended luciferase ORF (Fig. 3a), an approach used to cause ribosome pausing in eukaryotes26,27. As we could not measure ribosome translocation directly, we constructed several different control sequences for direct comparison. We inserted the same 9 residues in the identical location using an optimized set of codons, or placed the rare codons downstream of the miRNA target. When the rare codons were upstream of the target, miRNA-induced silencing from sh-mir-30 was restored to a level close to that observed for the wild type (rescue of >80% and 70% in HEK293 and NIH3T3 cells, respectively). In contrast, replacing the rare with optimal codons or placing the rare codons downstream of the miRNA target was unable to rescue miRNA-induced silencing (Fig. 3b–e). This confirmed that the additional nucleotides or the extra amino acids were not responsible for the differential activity of the miRNA target. To eliminate the possibility that the addition of the extra 27 nucleotides altered the local RNA-folding structure (and, hence, the accessibility and efficacy of miRNA target sites), we inserted these sequences upstream of mir-30 target sites, which remain in the 3′ UTR in the FF-luciferase reporter construct. miRNA-mediated repression was not changed (Fig. 3f). RNA analyses confirmed that the rare or optimal codon clusters had no substantial effect on the steady-state mRNA levels (Fig. 3g).

To further validate that the rescue of the miRNA repression was due to the brief translational pause mediated by rare codons26,27, we mapped the accessibility of sequences downstream of the rare and optimal codons using a DNA-oligonucleotide–RNase H approach28 (Fig. 4a). The sequences immediately downstream (~70 nucleotides; Fig. 4, Oligos 1–3) of the rare codons were more accessible to RNase H–mediated cleavage than were the same sequences in the mRNAs containing the optimal codons (Fig. 4b). In contrast, sequences further downstream of the rare or optimal codons in the 3′ UTR were similar in their accessibility to RNase H cleavage (Fig. 4a, Oligos 4 and 5), indicating that the difference in accessibility is specific to the region just downstream of the rare codon tract (Fig. 4b). In addition, and consistent with our prediction (Fig. 4a), RNase H–mediated cleavage was equal or modestly less robust in sequences contained upstream of the rare versus the optimal codon mRNA sequences, suggesting a slight backup of ribosomes upstream of the rare-codon insertion (Supplementary Fig. 4 online). As the steady-state production of protein (Fig. 3e) and the average density of ribosomes along the mRNA as determined by polysome gradient fractionation (Supplementary Fig. 5 online) were not substantially altered by the rare-codon insertion, the ribosomal pause during active translation over the specific region covered by oligos 1–3 was likely to be brief.

[Figure 2 graphic not reproduced. Recoverable panel text: (a) group labels 3′ UTR + sh-Scramble, 3′ UTR + sh-Mir30, 3′ UTR + sh-Mir30P, ORF + sh-Scramble, ORF + sh-Mir30, ORF + sh-Mir30P. (b) Bar chart, y axis 0–140%; groups: targets in 3′ UTR, targets in ORF; series: sh-Scramble, sh-Mir30, sh-Mir30P.]

Figure 2 miRNA-mediated repression studies were concordant in mouse liver in vivo. (a) The plasmids described in Figure 1 were transfected into mice by hydrodynamic tail injection (n = 5 per group, except group 4, where n = 4 (one animal died after injection)). Real-time transgene expression was determined 4 d after injection. (b) A control plasmid, RSV-hAAT, was co-transfected within each sample as an internal control for transfection efficiency. The FF-luciferase activities were normalized to serum hAAT levels measured by ELISA. The percentage of relative luciferase activity compared to negative controls (treated with sh-Scramble) was plotted. Error bars represent the s.d.


Repressed reporter mRNAs are associated with polyribosomes
Our data were consistent with the requirement of a stable association between miRNA–RISC and target mRNA for miRNA-induced translational repression. We next investigated whether this association results in exclusion of the target mRNA from the translational machinery by analyzing the polysome profiles of repressed target mRNAs. Whole-cell extracts were prepared from NIH3T3 cells transfected with either a luciferase or an EGFP reporter gene containing tandem mir-30 target sequences in either the 3′ UTR or ORF, as well as plasmids expressing sh-mir-30 or sh-Scramble. Polysome-sedimentation profiles of luciferase reporter mRNA and control Renilla (RL)-luciferase mRNA were measured by an RNase protection assay (RPA) (Supplementary Fig. 6 online); EGFP mRNA and actively translated β-actin mRNA levels were determined by northern blotting (Supplementary Fig. 7 online). Notably, reporter mRNAs containing target sequences in their 3′ UTRs or in the extended ORF and coexpressed with sh-Scramble or sh-mir-30 showed distribution profiles similar to those of actively translated mRNA (RL-luciferase or actin) (Supplementary Fig. 6b and Supplementary Fig. 7a–e). To establish that these mRNAs were actually associated with polyribosomes, we performed polysome gradient analyses after treatment with puromycin or EDTA, both of which release polysomes. As shown in Supplementary Figures 6 and 7, the miRNA-repressed mRNAs shifted to the slow-sedimenting part of the gradient to the same degree as actively

[Figure 3 graphic not reproduced. Recoverable panel text: (a) FF-luc constructs with 2× mir-30 target sites: (a1) target sites in 3′ UTR, (a2) target sites in ORF, (a3) rare codons before, (a4) rare codons after. (b,c) Bar charts for 293 cells and 3T3 cells; y axis in percent; groups: 3′ UTR, coding region, rare-before, rare-after; series: control, sh-Mir30, sh-Mir30P. (d) Bar chart for 3T3 cells; groups: coding region, rare codons, optimal codons; same series. (e) Western blot, luciferase (60 kD) and β-actin. Lanes 1–11: 3′ UTR + sh-Scramble; ORF + sh-Scramble; rare codons upstream + sh-Scramble, + sh-Mir30, + sh-Mir30P; rare codons downstream + sh-Scramble, + sh-Mir30, + sh-Mir30P; optimal codons + sh-Scramble, + sh-Mir30, + sh-Mir30P. Band values as printed, lanes 1–11: luciferase 79, 40, 60, 36, 11, 32, 30, 6, 64, 59, 8; β-actin 18, 14, 14, 15, 14, 14, 14, 14, 15, 14, 14; ratio 4.5, 2.7, 4.2, 2.5, 0.8, 2.2, 2.1, 0.4, 4.6, 4.0, 0.5. (f) Constructs with target sites in 3′ UTR and with a rare codon cluster in the 3′ UTR; bar chart for 3T3 cells; series: control, sh-Mir30, sh-Mir30P. (g) RNase protection assay: FF mRNA, RL mRNA; size markers at 150 and 100 nt; lanes 1–11.]

Figure 3 Insertion of rare codons upstream of the extended miRNA ORF rescues miRNA-mediated knockdown. (a) The maps of the reporter constructs used in this study. Plasmids containing tandem mir-30 target sequences in either the 3′ UTR (a1) or the ORF (a2) are the same as those described in Figure 1. A cluster of rare codons (represented as a dark box) was inserted either upstream (a3) or downstream (a4) of the mir-30 target sequences. In another construct, the upstream rare codons (a3) were replaced with optimal-codon sequences that code for the same peptide sequence. The arrows and gray box represent the position of the miRNA target sequences. (b–d) HEK293 cells (b) and NIH3T3 cells (c,d) were transfected with the reporter constructs illustrated in a. Dual-luciferase assays were performed 36 h post-transfection. FF-luciferase activities were normalized with RL-luciferase, and the percentage of relative enzyme activity compared to the negative control (treated with sh-Scramble) was plotted. Error bars represent s.d. from three independent experiments, each performed in triplicate. (e) Protein levels of reporter genes were analyzed by western blotting in transfected NIH3T3 cells. (f) NIH3T3 cells were transfected with constructs as indicated in the figure. Insertion of a rare-codon cluster (dark box) upstream of mir-30 target sites in the 3′ UTR did not substantially change the miRNA-induced repression. (g) RNA levels of reporter genes were analyzed by RNase protection assay. The loading order of lanes 1–11 is the same as noted in e.

[Figure 4 graphic not reproduced. Recoverable panel text: (a) Schematic of mRNAs carrying rare or optimal codons upstream of a stop codon and poly(A) tail, with oligonucleotides 1–5 mapped downstream; workflow: RNase H–mediated cleavage in cell extracts, RNA extraction, qRT-PCR. (b) Bar chart, y axis 0–140%; groups: control, Oligo 1, Oligo 2, Oligo 3, Oligo 4, Oligo 5, Oligo 3′ UTR; series: rare codon, optimal codon; P-values shown: 0.013, 0.0031, 0.0066.]

Figure 4 Insertion of rare codons increases the accessibility of downstream sequences to RNase H–mediated cleavage. (a) Experimental strategy. Cells were transfected with the luciferase reporter constructs containing the cluster of rare or optimal codons (Fig. 3a). After fixing the ribosomes on the mRNA by the addition of cycloheximide, one of six oligonucleotides corresponding to the region between the rare or optimal codons and target 3′ UTR was added into the cell extracts. The hybridization of DNA oligonucleotides at the target site within the mRNA results in cleavage mediated by the endogenous RNase H activity in the cell extracts. The extent of the cleavage represents the relative RNA accessibility, which was quantified by real-time RT-PCR (qRT-PCR) using two primers flanking the cleavage sites. (b) Quantification of RNase H–mediated cleavage. The values are presented as the relative PCR signal compared to control samples treated with a scrambled oligonucleotide and normalized for a GFP mRNA obtained from a co-transfected control plasmid.


translated mRNA after puromycin (Supplementary Figs. 6c and 7f,h) or EDTA treatment (Supplementary Figs. 6d and 7g,i). These results strongly favor a model where miRNA-targeted mRNAs remain associated with the polyribosome. At first consideration, these results seem concordant with some studies29–33 but contrast with other studies where miRNA-repressed mRNAs were found in the fast-sedimenting34,35 or puromycin-resistant, slow-sedimenting pseudo-polysomal fractions36.

DISCUSSION

Taken together, our studies, using multiple expression systems, cells and miRNA targets, are in good concordance. Although our results suggest that location within the 3′ UTR may not cause a large functional difference, there does seem to be a functional reason for the localization of miRNA targets in the 3′ UTR. We propose that these functional constraints may be the primary explanation for the observed distribution pattern of miRNA targets found in mammalian cells. However, other studies have reported that artificially designed, mismatched siRNA or shRNA co-delivery studies can result in some translational repression when mRNAs contained target sequences in the coding regions37,38. The source of these contradictions is not completely clear, but in several studies the mismatched synthetic siRNAs were provided in very high concentrations. Other factors possibly contributing to the degree of repression between miRNAs and their corresponding targets may include sequence composition4, number of target sites39, local RNA structure40 and distance between target sites41. Adding or removing miRNA target sites in coding regions may not elucidate the true natural functional differences between a target site residing in the coding region and one residing in the 3′ UTR. In our study, we carefully designed our reporter constructs such that there was only one nucleotide difference between the mRNA sequences that we compared directly. Therefore, the reduction of miRNA-induced gene repression should be a direct result of changing the target location from the 3′ UTR to the ORF without making major alterations in the mRNA sequence.

Our data support a model whereby miRNA-programmed RISC is required to remain attached to the target mRNA to effectively silence translation in cis. Moreover, when target sites remain at the same site in the mRNA but become part of the coding region, we suggest that ribosomal complexes override and inhibit the miRNA-programmed RISC from attaching to the target site. If the translational process is slowed, we speculate that there is less physical constraint by the ribosomes, thus allowing miRNA-programmed RISC to attach to the target.

This process seems to be functionally distinct from RISC RNAi-mediated RNA degradation, because converting the miRNA to give it perfect complementarity to the target still resulted in loss of the mRNA, presumably through the RNAi pathway, whether the miRNA target was part of the extended coding sequence or located in the 3′ UTR. This is consistent with the finding that, unlike in mammals, miRNA target sites in plants are widely distributed across coding regions, as nearly all of them have perfect complementarity with their target sequences and function through an RNAi-mediated degradation pathway. Curiously, the only known mammalian miRNA that targets the coding region in the mRNA has perfect complementarity with its targets and also functions through RISC-mediated cleavage42. Nonetheless, we cannot exclude the possibility that some functional miRNA targets exist in coding regions. If such sites are identified, it will be of great interest to determine whether they are preceded by rare codons. In fact, one study provided evidence for a functional miRNA target in the coding region of an endogenous mammalian gene43. The miRNA was active when it had an extensive 17-bp, but not the more classical 7-bp, 5′ seed match with the mRNA sequence. This suggests that the downregulation may have been mediated by RNAi cleavage rather than by translational downregulation44.

It will also be of interest in future studies to determine when functional miRISC–mRNA complexes can be assembled in the post-transcriptional life of an mRNA. Our results show that, if translocation of the ribosome is slow, miRISC complexes can still form after translational initiation begins. We favor a model where miRNA–RISC binding to actively translating mRNAs results in reduced translational elongation and termination, concordant with a reduction in ribosomal initiation and possible nascent-peptide destabilization32,33.

Here we provide evidence for why endogenous miRNA target sites are found in noncoding regions, but it is also logical to ask why relatively few miRNA targets are localized in the 5′ UTR. When the translation-initiation complex forms around the cap structure, the 40S subunit of the ribosome will scan the 5′ UTR until it identifies the first AUG sequence, where the 60S subunit joins to form an 80S ribosome. It is possible that the scanning process impairs the formation of miRNA–RISC complexes in some 5′ UTRs, depending on its structure, which can be complex. Our preliminary studies were consistent with this, because we found a great degree of discordance between different miRNA-target 5′ UTR insertions and the degree of translational repression (S.G. and M.A.K., unpublished data). Nonetheless, there are examples of miRNAs that do function with 5′ UTR targets. Whereas one study shows that the mir-122 target sites located in the 5′ UTR region of the hepatitis C virus are important to maintain robust viral replication45, another reports that the mRNA-bearing miRNA target sites in the 5′ UTR can be repressed as effectively as those having miRNA target sites in the 3′ UTR46. Further studies are needed to establish the extent to which functional miRNA targets are present in these noncoding regions.

METHODS

Plasmid constructions. Both strands of 2× Mir30 target sites were chemically synthesized (sense strand: 5′-AATTCGCTGCAAACAAAGACTGAAAGAACTAGTGCGCTGCAAACAAAGACTGAAAGCTGCA-3′; antisense strand: 5′-GCTTTCAGTCTTTGTTTGCAGCGCACTAGTTCTTTCAGTCTTTGTTTGCAGCG-3′), annealed, purified and inserted between EcoRI and PstI sites 67 bp downstream of the FF-luciferase coding region in a pGL3 construct with modified 3′ UTR sequences. We used PCR-based point-mutagenesis approaches to create a single-point insertion to disrupt the stop codon of the FF-luciferase gene. A similar approach was used to generate the GFP reporter system and the FF-luciferase reporter system with bantam target sequences. An ~700-bp sequence in the middle of a kanamycin-resistance gene–coding region was PCR amplified and then inserted into various cloning sites upstream or downstream of the miRNA target sites to reposition the miRNA targets within different regions of the 3′ UTR.
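The design of the two synthetic strands can be checked computationally. The sketch below is an illustration rather than part of the original protocol: it takes the two sequences exactly as printed above, confirms that the antisense strand pairs with an internal stretch of the sense strand, and reports the single-stranded ends left on the sense strand after annealing. The function names are hypothetical. EcoRI digestion leaves a 5′ AATT overhang and PstI digestion leaves a 3′ TGCA overhang, so those are the ends one would expect for a duplex meant to ligate directly into EcoRI/PstI-cut vector.

```python
# Oligonucleotide sequences as printed in the Methods text (5'->3').
SENSE = "AATTCGCTGCAAACAAAGACTGAAAGAACTAGTGCGCTGCAAACAAAGACTGAAAGCTGCA"
ANTISENSE = "GCTTTCAGTCTTTGTTTGCAGCGCACTAGTTCTTTCAGTCTTTGTTTGCAGCG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def sense_overhangs(sense: str, antisense: str) -> tuple[str, str]:
    """Return the unpaired (5' end, 3' end) of the sense strand after annealing.

    Raises ValueError if the antisense strand does not pair with the sense strand.
    """
    paired = reverse_complement(antisense)
    start = sense.find(paired)
    if start < 0:
        raise ValueError("strands are not complementary")
    return sense[:start], sense[start + len(paired):]


five_prime, three_prime = sense_overhangs(SENSE, ANTISENSE)
print(five_prime, three_prime)  # AATT TGCA
```

The two overhangs match the sticky ends produced by EcoRI and PstI, which is consistent with the annealed duplex being inserted between those sites without further digestion.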

Rare codon sequences (5′-GCG CCG GTA ACG GTA CCG GCG ACG GCG-3′) or optimal codon sequences (5′-GCC CCC GTC ACC GTC CCC GCC ACC GCC-3′) were inserted either 53 bp upstream or immediately downstream of the mir-30 target sites. Mir-30/mir-30P or bantam/bantam-P shRNAs were designed as a passenger strand, followed by the mir-22 loop sequence (5′-CCTGACCCA-3′), followed by the guiding-strand sequence. These were cloned downstream of the U6 polymerase III promoter.
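A short check makes the logic of the rare-versus-optimal comparison explicit. The sketch below assumes the standard genetic code and uses a codon table restricted to the codons that appear in the two inserts; the variable and function names are illustrative. It translates both inserts and counts how many codon positions differ.

```python
# Standard genetic code, restricted to the codons used in the two inserts.
CODON_TABLE = {
    "GCG": "A", "GCC": "A",  # alanine
    "CCG": "P", "CCC": "P",  # proline
    "GTA": "V", "GTC": "V",  # valine
    "ACG": "T", "ACC": "T",  # threonine
}

RARE = "GCG CCG GTA ACG GTA CCG GCG ACG GCG"
OPTIMAL = "GCC CCC GTC ACC GTC CCC GCC ACC GCC"


def translate(codons: str) -> str:
    """Translate a space-separated codon string into one-letter amino acids."""
    return "".join(CODON_TABLE[codon] for codon in codons.split())


rare_peptide = translate(RARE)
optimal_peptide = translate(OPTIMAL)
# Number of codon positions at which the two inserts differ.
changed_codons = sum(r != o for r, o in zip(RARE.split(), OPTIMAL.split()))

print(rare_peptide, optimal_peptide, changed_codons)  # APVTVPATA APVTVPATA 9
```

Both inserts encode the same nine residues and differ only at the third position of each codon, so any difference between the two reporters can be attributed to codon usage rather than to the encoded peptide.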

Cell culture and transfections. Adherent HEK293 and NIH3T3 cells were grown in DMEM (Gibco-BRL) with 2 mM L-glutamine and 10% (v/v) heat-inactivated FBS with antibiotics. All transfection assays were done using Lipofectamine 2000 (Invitrogen) following the manufacturer’s protocol. HEK293 and NIH3T3 cells at 90% confluency were transfected in 24-well plates with 50 ng FF-luciferase or EGFP reporter DNA, 50 ng shRNA expression DNA and 5 ng RL-luciferase DNA, unless specified otherwise. Unless indicated, cells were assayed 36 h after transfection.


Dual-luciferase assay. FF-luciferase and RL-luciferase were measured using Promega’s dual-luciferase kit (catalog no. E1980) protocol and detected by a Modulus Microplate Luminometer (Turner BioSystems).

Western blots. NIH3T3 cells (36 h after transfection) were lysed with M-PER mammalian protein-extraction reagent (Pierce, catalog no. 78501) with protease inhibitors (Roche, catalog no. 11836153001). The samples were denatured in Laemmli sample buffer (Bio-Rad, catalog no. 161-0737) for 5 min at 95 °C and separated in 10% (w/v) SDS-PAGE gels. The denatured proteins were then electrotransferred onto a PVDF membrane, which was blocked with 5% (w/v) fat-free milk powder in PBS and 0.5% (v/v) Tween 20 for 1 h. Either an anti–FF-luciferase antibody (diluted 1:5,000, Abcam), anti-GFP antibody (diluted 1:1,000, Abcam) or anti–β-actin antibody (diluted 1:8,000, Sigma) was used. Following three washes in PBS for 5 min, a secondary antibody (horseradish peroxidase (HRP)–anti-mouse IgG; diluted 1:10,000, Sigma) was added for 1 h at room temperature (25 °C), followed by three 5-min washes in PBS. Antibody-bound proteins were visualized using the ECL western blotting analysis system (Amersham, RPN2109).

Northern blots and RNase protection assay. Total RNA was isolated using Trizol (Invitrogen). The DNA-free kit (Ambion, catalog no. 1906) was used to purify total RNA from contaminating DNAs. Total RNA (10–20 µg) was electrophoresed on 1% (w/v) agarose gel. After transfer onto Hybond-N+ membrane (Amersham Pharmacia Biotech), target mRNAs were detected using 32P-labeled full-length cDNA probes.

RPA assays were carried out using the Ambion RPA III kit (catalog no. AM1414). 32P-labeled antisense RNA probes against either FF-luciferase or RL-luciferase were generated by in vitro transcription (Ambion MAXIscript Kit, catalog no. AM1308). DNA templates were produced by PCR using primer sets (FF-luc: 5′-ATCCATCTTGCTCCAACACC-3′ and 5′-TTTTCCGTCATCGTCTTTCC-3′; RL-luc: 5′-GATAACTGGTCCGCAGTGGT-3′ and 5′-ATTTGCCTGATTTGCCCATA-3′). Total RNA from NIH3T3 cells was isolated by Trizol (Invitrogen) 36 h after transfection and purified using a DNA-free kit (Ambion, catalog no. 1906). Hybridization reactions were carried out at 55 °C overnight and RNase digestion at 37 °C for 30 min using the RNase A/T1 cocktail provided in the RPA III kit.

Hydrodynamic tail injection and luciferase imaging. Animal studies were done in accordance with the US National Institutes of Health guidelines and the Stanford Animal Care Committee. Female BALB/c mice, 6–8 weeks of age (Jackson Laboratory), were hydrodynamically infused with a mixture of 2 µg FF-luciferase DNA, 2 µg of the appropriate shRNA plasmid, 2 µg of an RSV-hAAT expression cassette DNA and 34 µg pBluescript plasmid DNA (Stratagene), and were then imaged for luciferase. As described47, raw light values were reported as relative detected light photons per minute, and normalized for serum hAAT expression.

Polyribosome fractionation. Polysomal mRNA was prepared based on a method described previously48. Briefly, before being harvested, cells were incubated with 0.1 mg ml⁻¹ cycloheximide for 3 min at 37 °C. NIH3T3 cells were harvested directly on their culture dish in lysis buffer (15 mM Tris-HCl, pH 7.4, 15 mM MgCl2, 0.3 M NaCl, 1% (v/v) Triton X-100, 0.1 mg ml⁻¹ cycloheximide and 1 mg ml⁻¹ heparin) and loaded onto 10–50% (w/v) sucrose gradients composed of the same extraction buffer lacking Triton X-100. The gradients were sedimented at 210,000g (max.) for 180 min in a SW41 rotor at 4 °C. Fractions of equal volumes were collected from the top using an ISCO fraction-collector system. RNAs were extracted by phenol-chloroform followed by isopropanol precipitation, washes in 75% (v/v) ethanol and resuspension in DNase I reaction buffer (Turbo DNase, Ambion).

Mapping accessibility. This approach is modified from a previous publication49. HEK293 cells were transfected with plasmids expressing the FF-luciferase reporter gene embedded with the cluster of rare or optimal codons along with a GFP control plasmid. At 36 h post-transfection, cells were harvested after incubation with 0.1 mg ml⁻¹ cycloheximide for 3 min at 37 °C. After three washes with PBS, approximately 2 × 10⁷ cells were pelleted and resuspended in two times the volume of the cell pellet in hypotonic swelling buffer (7 mM Tris-HCl, pH 7.5, 7 mM KCl, 1 mM MgCl2 and 1 mM β-mercaptoethanol). After a 10-min incubation on ice, samples were Dounce homogenized (VWR) 40 times with a tight pestle B followed by addition of one-tenth of the final volume of neutralizing buffer (21 mM Tris-HCl, pH 7.5, 116 mM KCl, 3.6 mM MgCl2 and 6 mM β-mercaptoethanol). After centrifugation of the homogenates at 20,000g for 10 min at 4 °C, the supernatants were collected. The RNase H–mediated cleavage experiments were carried out in a total volume of 300 µl, containing 280 µl cell extracts, 1 mM DTT, 20–40 units RNase inhibitor (Promega) and 50 nM each of the defined sequence antisense deoxyribooligonucleotides (ODNs) (Supplementary Table 1 online). The ODNs were incubated in the extracts for 5 min at 37 °C. Total RNA was extracted by phenol-chloroform extraction. After the reverse transcription reaction (Invitrogen RT kit, catalog no. 18080-051) with oligo dT primer, real-time PCR (Qiagen, QuantiTect SYBR green PCR kit) was performed with two primers flanking the cleavage sites (upstream: 5′-AGGCCAAGAAGGGCGGAAAG-3′ or 5′-ACCGCGAAAAAGTTGCGCG-3′; downstream: 5′-TCACTGCATTCTAGTTGTGG-3′). All results were obtained with R > 0.98. Each oligonucleotide was tested six times in two separate experiments. P-values were calculated using the Student’s t-test.
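The text reports the qRT-PCR readout as a signal relative to a scrambled-oligonucleotide control and normalized to GFP mRNA, but it does not state the formula used. One common way to compute such a value is the comparative Ct (2^−ΔΔCt) calculation, sketched below as an assumption rather than as the authors' method. The function name, the default amplification efficiency of 2 (perfect doubling per cycle) and the Ct values are all hypothetical and are not data from the study.

```python
def relative_signal(ct_target: float, ct_gfp: float,
                    ct_target_ctrl: float, ct_gfp_ctrl: float,
                    efficiency: float = 2.0) -> float:
    """Comparative-Ct estimate of the target amplicon remaining after cleavage.

    ct_target / ct_gfp: Ct values from the sample treated with a targeting ODN.
    ct_target_ctrl / ct_gfp_ctrl: Ct values from the scrambled-ODN control.
    efficiency: fold amplification per cycle (2.0 assumes perfect doubling).
    """
    delta_sample = ct_target - ct_gfp             # normalize sample to GFP
    delta_control = ct_target_ctrl - ct_gfp_ctrl  # normalize control to GFP
    return efficiency ** -(delta_sample - delta_control)


# Hypothetical Ct values for illustration only.
signal = relative_signal(ct_target=25.0, ct_gfp=20.0,
                         ct_target_ctrl=24.0, ct_gfp_ctrl=20.0)
print(signal)  # 0.5
```

Under these assumptions a value below 1 means that less intact template spans the cleavage site than in the scrambled control, that is, the site was more accessible to RNase H.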

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS

This work was supported by the US National Institutes of Health grant DK78424. We thank B. Hu for helping prepare some of the samples, R. Cevailos for technical assistance with the polyribosome fractionation experiments and D. Haussecker for critical reading of the manuscript.

AUTHOR CONTRIBUTIONS

S.G. designed and implemented most of the experiments; L.J. performed the studies outlined in Figure 4; F.Z. assisted S.G. with the molecular biology preparations; P.S. provided assistance with the polysome studies and offered critical discussions related to data interpretation; M.A.K. supervised the studies and provided scientific input into the experimental design and data interpretation; S.G. and M.A.K. wrote the manuscript; all authors approved the final manuscript.

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Ambros, V. The functions of animal microRNAs. Nature 431, 350–355 (2004).
2. Bartel, D.P. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281–297 (2004).
3. Berezikov, E. et al. Phylogenetic shadowing and computational identification of human microRNA genes. Cell 120, 21–24 (2005).
4. Lewis, B.P., Burge, C.B. & Bartel, D.P. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15–20 (2005).
5. Xie, X. et al. Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals. Nature 434, 338–345 (2005).
6. O’Donnell, K.A., Wentzel, E.A., Zeller, K.I., Dang, C.V. & Mendell, J.T. c-Myc-regulated microRNAs modulate E2F1 expression. Nature 435, 839–843 (2005).
7. He, L. et al. A microRNA polycistron as a potential human oncogene. Nature 435, 828–833 (2005).
8. Triboulet, R. et al. Suppression of microRNA-silencing pathway by HIV-1 during virus replication. Science 315, 1579–1582 (2007).
9. Vaucheret, H. Post-transcriptional small RNA pathways in plants: mechanisms and regulations. Genes Dev. 20, 759–771 (2006).
10. Lai, E.C. Micro RNAs are complementary to 3′ UTR sequence motifs that mediate negative post-transcriptional regulation. Nat. Genet. 30, 363–364 (2002).
11. Lewis, B.P., Shih, I.H., Jones-Rhoades, M.W., Bartel, D.P. & Burge, C.B. Prediction of mammalian microRNA targets. Cell 115, 787–798 (2003).
12. Doench, J.G. & Sharp, P.A. Specificity of microRNA target selection in translational repression. Genes Dev. 18, 504–511 (2004).
13. Meister, G. et al. Human Argonaute2 mediates RNA cleavage targeted by miRNAs and siRNAs. Mol. Cell 15, 185–197 (2004).
14. Liu, J. et al. Argonaute2 is the catalytic engine of mammalian RNAi. Science 305, 1437–1441 (2004).
15. Pillai, R.S., Bhattacharyya, S.N. & Filipowicz, W. Repression of protein synthesis by miRNAs: how many mechanisms? Trends Cell Biol. 17, 118–126 (2007).
16. Valencia-Sanchez, M.A., Liu, J., Hannon, G.J. & Parker, R. Control of translation and mRNA degradation by miRNAs and siRNAs. Genes Dev. 20, 515–524 (2006).


17. Liu, J., Valencia-Sanchez, M.A., Hannon, G.J. & Parker, R. MicroRNA-dependent localization of targeted mRNAs to mammalian P-bodies. Nat. Cell Biol. 7, 719–723 (2005).
18. Sen, G.L. & Blau, H.M. Argonaute 2/RISC resides in sites of mammalian mRNA decay known as cytoplasmic bodies. Nat. Cell Biol. 7, 633–636 (2005).
19. Bagga, S. et al. Regulation by let-7 and lin-4 miRNAs results in target mRNA degradation. Cell 122, 553–563 (2005).
20. Zeng, Y., Wagner, E.J. & Cullen, B.R. Both natural and designed micro RNAs can inhibit the expression of cognate mRNAs when expressed in human cells. Mol. Cell 9, 1327–1333 (2002).
21. Gu, S. & Rossi, J.J. Uncoupling of RNAi from active translation in mammalian cells. RNA 11, 38–44 (2005).
22. Brennecke, J., Hipfner, D.R., Stark, A., Russell, R.B. & Cohen, S.M. bantam encodes a developmentally regulated microRNA that controls cell proliferation and regulates the proapoptotic gene hid in Drosophila. Cell 113, 25–36 (2003).
23. Lagos-Quintana, M. et al. Identification of tissue-specific microRNAs from mouse. Curr. Biol. 12, 735–739 (2002).
24. Takada, S. et al. Mouse microRNA profiles determined with a new and sensitive cloning method. Nucleic Acids Res. 34, e115 (2006).
25. Yant, S.R. et al. Somatic integration and long-term transgene expression in normal and haemophilic mice using a DNA transposon system. Nat. Genet. 25, 35–41 (2000).
26. Fernandez, J. et al. Ribosome stalling regulates IRES-mediated translation in eukaryotes, a parallel to prokaryotic attenuation. Mol. Cell 17, 405–416 (2005).
27. Lemm, I. & Ross, J. Regulation of c-myc mRNA decay by translational pausing in a coding region instability determinant. Mol. Cell. Biol. 22, 3959–3969 (2002).
28. Scherr, M. et al. Detection of antisense and ribozyme accessible sites on native mRNAs: application to NCOA3 mRNA. Mol. Ther. 4, 454–460 (2001).
29. Seggerson, K., Tang, L. & Moss, E.G. Two genetic circuits repress the Caenorhabditis elegans heterochronic gene lin-28 after translation initiation. Dev. Biol. 243, 215–225 (2002).
30. Olsen, P.H. & Ambros, V. The lin-4 regulatory RNA controls developmental timing in Caenorhabditis elegans by blocking LIN-14 protein synthesis after the initiation of translation. Dev. Biol. 216, 671–680 (1999).
31. Maroney, P.A., Yu, Y., Fisher, J. & Nilsen, T.W. Evidence that microRNAs are associated with translating messenger RNAs in human cells. Nat. Struct. Mol. Biol. 13, 1102–1107 (2006).
32. Nottrott, S., Simard, M.J. & Richter, J.D. Human let-7a miRNA blocks protein production on actively translating polyribosomes. Nat. Struct. Mol. Biol. 13, 1108–1114 (2006).
33. Petersen, C.P., Bordeleau, M.E., Pelletier, J. & Sharp, P.A. Short RNAs repress translation after initiation in mammalian cells. Mol. Cell 21, 533–542 (2006).
34. Bhattacharyya, S.N., Habermacher, R., Martine, U., Closs, E.I. & Filipowicz, W. Relief of microRNA-mediated translational repression in human cells subjected to stress. Cell 125, 1111–1124 (2006).
35. Pillai, R.S. et al. Inhibition of translational initiation by Let-7 MicroRNA in human cells. Science 309, 1573–1576 (2005).
36. Thermann, R. & Hentze, M.W. Drosophila miR2 induces pseudo-polysomes and inhibits translation initiation. Nature 447, 875–878 (2007).
37. Saxena, S., Jonsson, Z.O. & Dutta, A. Small RNAs with imperfect match to endogenous mRNA repress translation. Implications for off-target activity of small inhibitory RNA in mammalian cells. J. Biol. Chem. 278, 44312–44319 (2003).
38. Kloosterman, W.P., Wienholds, E., Ketting, R.F. & Plasterk, R.H. Substrate requirements for let-7 function in the developing zebrafish embryo. Nucleic Acids Res. 32, 6284–6291 (2004).
39. Doench, J.G., Petersen, C.P. & Sharp, P.A. siRNAs can function as miRNAs. Genes Dev. 17, 438–442 (2003).
40. Long, D. et al. Potent effect of target structure on microRNA function. Nat. Struct. Mol. Biol. 14, 287–294 (2007).
41. Saetrom, P. et al. Distance constraints between microRNA target sites dictate efficacy and cooperativity. Nucleic Acids Res. 35, 2333–2342 (2007).
42. Yekta, S., Shih, I.H. & Bartel, D.P. MicroRNA-directed cleavage of HOXB8 mRNA. Science 304, 594–596 (2004).
43. Duursma, A.M., Kedde, M., Schrier, M., le Sage, C. & Agami, R. miR-148 targets human DNMT3b protein coding region. RNA 14, 872–877 (2008).
44. Hutvagner, G. & Zamore, P.D. A microRNA in a multiple-turnover RNAi enzyme complex. Science 297, 2056–2060 (2002).
45. Jopling, C.L., Yi, M., Lancaster, A.M., Lemon, S.M. & Sarnow, P. Modulation of hepatitis C virus RNA abundance by a liver-specific MicroRNA. Science 309, 1577–1581 (2005).
46. Lytle, J.R., Yario, T.A. & Steitz, J.A. Target mRNAs are repressed as efficiently by microRNA-binding sites in the 5′ UTR as in the 3′ UTR. Proc. Natl. Acad. Sci. USA 104, 9667–9672 (2007).
47. Grimm, D. et al. Fatality in mice due to oversaturation of cellular microRNA/short hairpin RNA pathways. Nature 441, 537–541 (2006).
48. Johannes, G. & Sarnow, P. Cap-independent polysomal association of natural mRNAs encoding c-myc, BiP, and eIF4G conferred by internal ribosome entry sites. RNA 4, 1500–1513 (1998).
49. Gu, S., Ji, J., Kim, J.D., Yee, J.K. & Rossi, J.J. Inhibition of infectious human immunodeficiency virus type 1 virions via lentiviral vector encoded short antisense RNAs. Oligonucleotides 16, 287–295 (2006).


Nucleosomes can invade DNA territories occupied by their neighbors

Maik Engeholm1, Martijn de Jager2, Andrew Flaus1,4, Ruth Brenk3, John van Noort2 & Tom Owen-Hughes1

Nucleosomes are the fundamental subunits of eukaryotic chromatin. They are not static entities, but can undergo a number of dynamic transitions, including spontaneous repositioning along DNA. As nucleosomes are spaced close together within genomes, it is likely that on occasion they approach each other and may even collide. Here we have used a dinucleosomal model system to show that the 147-base-pair (bp) DNA territories of two nucleosomes can overlap extensively. In the situation of an overlap by 44 bp or 54 bp, one histone dimer is lost and the resulting complex can condense to form a compact single particle. We propose a pathway in which adjacent nucleosomes promote DNA unraveling as they approach each other and that this permits their 147-bp territories to overlap, and we suggest that these events may represent early steps in a pathway for nucleosome removal via collision.

In eukaryotic cells genomic DNA exists in the form of a nucleoprotein complex called chromatin1. The packaging of the genomic DNA imposes a hindrance to most DNA-dependent processes, including DNA replication, repair and mRNA transcription. This implies an important role for chromatin structure in the control of many nuclear functions2,3. The first step in the packaging hierarchy of chromatin is the formation of a nucleosome core particle (NCP)4. The NCP is commonly defined as a complex comprising 147 bp of double-stranded DNA and an octamer of core histone proteins5. The core particle as a whole possesses a pseudo two-fold symmetry, with the pseudo dyad axis passing through the central base pair of the 147-bp DNA territory5,6. It is convenient to refer to this dyad base pair in order to describe the translational position of a nucleosome.

Of great practical and theoretical interest is the question of how the translational positions of nucleosomes on a DNA molecule are specified. Different mechanisms have been proposed. In the case of direct positioning, the position of a nucleosome is solely determined by its interactions with the underlying DNA7. A more complex situation known as indirect positioning involves binding of non-histone proteins such that they direct the positioning of adjacent nucleosomes8–10. So far, relatively little attention has been paid to the question of whether a first nucleosome is capable of indirectly positioning a second one in a similar manner. Instead, in assigning nucleosome positions genome-wide, it has been assumed that overlap between the 147-bp territories is not possible.

Mononucleosomes can undergo a range of structural transitions including transient detachment of DNA from the surface of the histone octamer11,12, destabilization of histone H2A-H2B dimers13, reconfiguration of histone dimers as part of a packing interaction between nucleosomes14, a chiral transition of the entire H3-H4 tetramer15 and repositioning of histone octamers along DNA16. Given these numerous ways of adapting to their environment, it is not clear what actually happens when two nucleosomes approach or collide. Nonetheless, this is likely to be of fundamental importance in regulating access to the underlying genetic information.

Here we have studied the behavior of Xenopus laevis nucleosomes as they approach each other using a dinucleosomal system. We find that nucleosome-DNA interactions dominate over the principle of indirect positioning through a second nucleosome, demonstrating that it is possible to assemble template molecules on which the nucleosomes extensively overlap with respect to their 147-bp DNA territories. Moreover, we have used this system to analyze the structure of dinucleosomes. We find that nucleosomes promote the unraveling of DNA from their neighbors. In an extreme situation, a territorial overlap of 54 bp is observed and in this case one histone dimer dissociates from the complex, enabling it to form a condensed particle.

RESULTS

Assembly of dinucleosomes with defined separation

We developed a system for assembling dinucleosomes whereby the 601 nucleosome-positioning sequence is used to direct sites of nucleosome assembly with high efficiency17. Two constructs were designed to result in internucleosome spacing of +48 bp or 0 bp; in other words, the central dyads of the nucleosomes were separated by 195 bp and 147 bp, respectively (Fig. 1a). We designed a third construct such that the central base pair of each positioning

Received 7 May 2008; accepted 2 January 2009; published online 1 February 2009; doi:10.1038/nsmb.1551

1Wellcome Trust Centre for Gene Regulation and Expression, University of Dundee, Dundee, DD1 5EH, UK. 2Physics of Life Processes, Leiden Institute of Physics, Leiden University, Niels Bohrweg 2, 2333 CA Leiden, The Netherlands. 3Division of Biological Chemistry and Drug Discovery, College of Life Sciences, University of Dundee, Dundee, DD1 5EH, UK. 4Present address: Department of Biochemistry, NUI Galway, Ireland. Correspondence should be addressed to T.O.-H. ([email protected]).


sequence was separated by only 103 bp, such that they overlap by 44 bp (construct (–44 bp); Fig. 1a).
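The relationship between internucleosome spacing, dyad-to-dyad distance and territorial overlap for the three constructs follows from the 147-bp territory size. A minimal sketch (function names are illustrative) reproduces the numbers quoted in the text:

```python
TERRITORY_BP = 147  # DNA territory of one nucleosome core particle


def dyad_separation(spacing_bp: int) -> int:
    """Distance between the two dyads for a given internucleosome spacing.

    Each dyad sits at the center of its 147-bp territory, so two territories
    separated by `spacing_bp` have dyads 147 + spacing_bp apart.
    """
    return TERRITORY_BP + spacing_bp


def territory_overlap(spacing_bp: int) -> int:
    """Base pairs shared by the two territories (zero if they do not overlap)."""
    return max(0, -spacing_bp)


for spacing in (48, 0, -44):
    print(spacing, dyad_separation(spacing), territory_overlap(spacing))
# 48 195 0
# 0 147 0
# -44 103 44
```

A negative spacing therefore corresponds directly to the number of base pairs that both nucleosomes would claim if each retained its full 147-bp territory.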

Nucleosome-assembly reactions were carried out on all three constructs, and the extent of assembly was first monitored by native gel electrophoresis (Fig. 1b–d). At lower ratios of octamer:DNA, two species were present, consistent with the assembly of just one nucleosome on either of the two positioning elements (Fig. 1b–d, lanes 2 and 3). In assembly reactions with octamer:DNA ratios of approximately 2:1, a slower-migrating species was predominant, consistent with the occupancy of both nucleosome-positioning sequences on the same DNA fragment (Fig. 1b–d, lanes 4 and 5).

We used site-directed nucleosome mapping to determine precisely where histone H4 makes contacts with DNA on these fragments. Briefly, this technique involves mapping the locations of cleavage sites caused as a result of the tethering of a DNA-cleaving compound to a specific location on the histone octamer. These can be used to assign the nucleosomal dyad6. In all cases, cleavage sites occurred only at locations consistent with those previously observed for nucleosomes on the 601 sequence18 (Fig. 1e–g). On the (–44 bp) construct, this involves the assembly of nucleosomes at locations in which the 147-bp territories overlap by 44 bp. As nucleosomes assembled onto this construct at an octamer:DNA ratio of 2:1 have a discrete mobility in native gel electrophoresis (Fig. 1d, lane 5), it is likely that a single species is generated in which normal DNA contacts in the region of the dyad are made simultaneously at both of the positioning sequences on the template. The data shown in Figure 1 were obtained using direct repeats of the 601 sequence, but similar results are observed using inverted repeats of the same sequence (Supplementary Fig. 1 online).

Histone composition of model dinucleosomes

We first used native gel electrophoresis to monitor intermediates in dinucleosome assembly. Assembly of the (0 bp) construct with substoichiometric octamer, tetramer and hexamer resulted in the generation of doublets with distinct electrophoretic mobilities, consistent with the generation of single nucleosomes, hexasomes and tetrasomes, respectively (Fig. 2a, lanes 2–4, bands 1, 2 and 3). Assembly reactions performed at higher histone:DNA ratios enabled us to identify species corresponding to dinucleosome, ditetrasome and dihexasome (Fig. 2a, lanes 6–8, bands 4, 5 and 7). Furthermore, each remaining intermediate during the assembly of two intact adjacent mononucleosomes could also be identified (Supplementary Fig. 2 online). When the same analysis was applied to the (–44 bp) construct, we observed a similar pattern, with the exception that one intermediate between a ditetrasome and the fully assembled species was not detected (Fig. 2b and Supplementary Fig. 2). This indicates that the limit species formed upon assembly on the (–44 bp) construct is missing a histone dimer from one nucleosome, but not the other.

To substantiate this hypothesis, we purified chromatin assembled on the dinucleosomal constructs from native gels followed by tryptic digestion and nLC/ESI/MS/MS analysis (Methods). We next determined the ratio between H2A-H2B dimer peptides and H3-H4 tetramer peptides on the (0 bp) and (–44 bp) constructs. Peptides

[Figure 1, panels a–g. (a) Schematics of the (+48 bp) construct (two 147-bp 601 sequences separated by 48 bp), the (0 bp) construct (two adjoining 147-bp 601 sequences) and the (–44 bp) construct (two 125-bp truncated 601 sequences), with HinP1 I sites marked. (b–g) Gel panels for the (+48 bp), (0 bp) and (–44 bp) constructs at [octamer]/[DNA] ratios of 0, 0.8, 1.2, 1.6 and 2.0, with HinP1 I marker lanes in the mapping panels; asterisks mark single-repeat DNA fragments.]

Figure 1 Chromatin assembly on defined dinucleosomal templates. (a) Schematic representation of three dimeric 601 constructs based on direct repeats of the positioning sequence. The 147-bp positioning sequence is indicated by the shadowed arrows. On the (–44 bp) construct, 22 bp have been removed from the inward-looking end of each of the two copies. (b–d) Native gel analysis of reconstitutions on the above DNA constructs. Reconstitutions were performed at increasing [octamer]:[DNA] ratios. Bands corresponding to the two types of mononucleosomes and one type of dinucleosome are indicated. (e–g) Site-directed mapping analysis of the reconstitutions in b–d. Mapping signals are indicated by the vertical bars on the right. The marker lanes on the left contain DNA partially digested at the HinP1 I restriction sites 2 bp upstream of the 601 dyad position. The DNA fragments used in the reconstitution reactions were prepared by PCR, which also gives rise to some DNA fragments comprising a single repeat of the 601 sequence only (*).

ARTICLES

152 VOLUME 16 NUMBER 2 FEBRUARY 2009 NATURE STRUCTURAL & MOLECULAR BIOLOGY

©2009 Nature America, Inc. All rights reserved.

derived from H3 and H4 were present at comparable abundances on both the (0 bp) and (–44 bp) constructs (Fig. 2c). However, peptides derived from H2A and H2B were on average present at only 0.72 times the abundance in material derived from the overlapping construct compared with the touching nucleosomes (Fig. 2c). This is best explained by the loss of one H2A-H2B dimer on the (–44 bp) construct.
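
The expected size of this effect can be checked with simple arithmetic. The sketch below assumes that two intact nucleosomes carry four H2A-H2B dimers and that the limit species on the (–44 bp) construct retains three, with equal H3-H4 content in both samples. The function name is illustrative and is not part of the original analysis.

```python
from fractions import Fraction

def expected_dimer_ratio(dimers_sample: int, dimers_reference: int) -> Fraction:
    """Expected H2A-H2B signal in the sample relative to the reference,
    assuming equal H3-H4 tetramer content in both (as seen in Fig. 2c)."""
    return Fraction(dimers_sample, dimers_reference)

# Two intact nucleosomes carry 4 dimers; loss of one dimer leaves 3.
expected = expected_dimer_ratio(3, 4)
observed = 0.72  # [dimer]:[tetramer] ratio reported for (-44 bp) vs. (0 bp)

print(f"expected {float(expected):.2f}, observed {observed:.2f}")
print(f"difference {abs(float(expected) - observed):.2f}")
```

Under the same assumptions, loss of two dimers would predict a ratio of 0.50, which is further from the measured value.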

Structural characterization of dinucleosomes by AFM

To gain further insight into how the dinucleosomes are arranged, we subjected them to atomic force microscopy (AFM). Dinucleosomes reconstituted on the three DNA constructs in Figure 1 were isolated by preparative native gel electrophoresis, fixed in the absence or presence of Mg2+ ions and imaged using tapping mode AFM (Fig. 3a–f). In the absence of divalent cations, on all three DNA constructs we were able to resolve two separate particles on most templates (Fig. 3a–c). In the presence of 5 mM Mg2+, the (–44 bp) construct appeared predominantly as single, larger particles, whereas template molecules on the (+48 bp) and (0 bp) constructs were still resolved as two individual nucleosomes (Fig. 3d–f). Moreover, the larger particles in Figure 3f appeared higher than those observed on the other two DNA constructs (Fig. 3g). Under similar ionic conditions, an MNase digest of dinucleosomes on the (–44 bp) construct gave rise to a protected fragment of approximately 250 bp in length (Fig. 3h). This shows that in the presence of Mg2+ the (–44 bp) template molecules condense to form a structure in which the linker DNA between the two particles is not highly accessible.

The particles observed in Figure 3f possess a similar cross-section but increased height compared to a mononucleosome. Such a structure can most easily be obtained if a stacking interaction is formed between the hexasome and the nucleosome. We have built a model of a stacked

Figure 2 Measuring the histone content of dimeric chromatin particles. (a,b) Reconstitution and native gel analysis. Reconstitutions were performed at the indicated [octamer]:[DNA] and [tetramer]:[dimer]:[DNA] ratios and analyzed by means of native gel electrophoresis. Assembly was performed on the (0 bp) (a) and the (–44 bp) (b) construct. Each assembly intermediate is labeled. For a full description see text and Supplementary Figure 2. 1, mononucleosomes; 2, monotetrasomes; 3, monohexasomes; 4, fully assembled species; 5, ditetrasome; 6, tetrasome-hexasome; 7, dihexasome; 8, hexasome-nucleosome. (c) The major fully assembled species (band 4 in both a and b) were purified by native gel electrophoresis, digested with trypsin and analyzed in triplicate by LC/ESI/MS/MS. Signal intensities of peptides in the (–44 bp) samples were divided by the corresponding signal intensities in the (0 bp) samples, and the average over all replicates was calculated for each individual peptide (blue diamonds). Then, these averaged, normalized signal intensities were averaged separately for dimer-derived (green) and tetramer-derived (red) peptides, yielding a [dimer]:[tetramer] ratio of 0.72. Error bars indicate s.d.

[Figure 2 graphic. Panels a,b: native gels, lanes 1–9, for the (0 bp) construct (two 147-bp 601 repeats) and the (–44 bp) construct (two 125-bp truncated 601 repeats), assembled at the Oct:DNA or tetramer:dimer:DNA ratios 0.8:1, 0.8:0:1, 0.8:0.8:1, 0.8:1.6:1, 1.6:1, 1.6:0:1, 1.6:1.6:1 and 1.6:3.2:1, with monomer and dimer DNA markers. Panel c: normalized signal intensity (0–2.0) against peptide number for (–44 bp) vs. (0 bp), with H2A–H2B dimer and H3–H4 tetramer peptides marked.]

[Figure 3 graphic. Panels a–f: AFM images of the (+48 bp), (0 bp) and (–44 bp) constructs. Panel g: histograms of particle height (nm, 0–14) with counts n for each construct. Panel h: MNase time course (–, 2.5 and 6 min) on the (+48 bp) and (–44 bp) constructs, lanes 1–6, with 100–400 bp markers (M).]

Figure 3 AFM imaging of dinucleosomes. (a–f) Dinucleosomes on the respective constructs were gel purified, fixed in the absence (a–c) or presence (d–f) of 5 mM Mg2+ and imaged. In the presence of divalent cations, the overlapping dinucleosomes on the (–44 bp) construct appear as single particles of increased height. (g) Maximal heights of the particles in the experiments in d–f. The average height of the nucleosomes on the (+48 bp) and (0 bp) constructs is indicated by the blue vertical line. (h) Dinucleosomes on the (+48 bp) (lanes 1–3) and (–44 bp) (lanes 4–6) constructs were treated with MNase for the indicated time periods. An MNase-resistant fragment of around 250 bp is observed for the dinucleosomes on the (–44 bp) construct.

dinucleosome based on two copies of the crystal structure of the NCP (using PDB 1KX5; Fig. 4a). The last 44 bp of the DNA double helix in copy 1 were superimposed on the first 44 bp of the DNA in copy 2 of the structure. The resulting structure resembles an extended nucleosome in which the DNA forms a continuous superhelix of approximately three turns. This arrangement of DNA is in good agreement with the observed inaccessibility to MNase digestion (Fig. 3h).
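
As a rough consistency check on the model, the length of DNA wrapped in the fused particle and the resulting number of superhelical turns can be estimated. The sketch assumes 147 bp per NCP and roughly 1.65 superhelical turns per NCP, a commonly quoted value for the crystal structure that is not stated in this article. The function names are illustrative.

```python
NCP_BP = 147          # DNA length in one nucleosome core particle
TURNS_PER_NCP = 1.65  # assumed superhelical turns per NCP (not from this article)

def wrapped_length(overlap_bp: int) -> int:
    """Total DNA wrapped when two NCP footprints share `overlap_bp` base pairs."""
    return 2 * NCP_BP - overlap_bp

def superhelical_turns(length_bp: int) -> float:
    """Turns of a continuous superhelix with the same pitch as in one NCP."""
    return length_bp * TURNS_PER_NCP / NCP_BP

length = wrapped_length(44)
print(length)                                # 250
print(round(superhelical_turns(length), 1))  # 2.8
```

The 250-bp total matches the size of the MNase-resistant fragment in Fig. 3h, and about 2.8 turns is in line with the "approximately three turns" of the model.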

To test our model, we analyzed the steric interference between the histone proteins in both copies of the NCP. When both inner dimers are present, large van der Waals clashes are observed, which are clearly not compatible with the formation of such a structure (Supplementary Fig. 3a online). In fact, the two inner dimers cover almost the same volume (Supplementary Fig. 3c). If, however, the inner dimer is absent from copy 2 of the NCP, only minimal van der Waals clashes occur (Supplementary Fig. 3b). In addition, these clashes map to regions of the histone proteins that could be easily rearranged in the process of folding.

Helical phasing contributes to the folded state

We next repeated the modeling described above for overlap lengths ranging from 3 bp to 60 bp and determined the r.m.s. deviation of the phosphorus atoms in the segment used for superimposition. When plotted as a function of the overlap length, a clear 10-bp periodicity of the r.m.s. deviation becomes apparent (Fig. 4b). This reflects the fact that the DNA in superhelix 2 needs to possess the same rotational frame as the DNA in superhelix 1 so that they can be superimposed smoothly. In the situation of a dinucleosome, this suggests that a folded particle, in which the DNA describes a continuous superhelix, can be formed only if the dyad-to-dyad distance fulfills certain conditions; that is, the distance must be an integer multiple of the helical repeat length of DNA.
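
The phasing argument can be illustrated numerically. The sketch below is not the superimposition procedure used for Fig. 4b; it only asks how far the dyad-to-dyad distance (147 bp minus the overlap) lies from a whole number of helical turns. The helical repeat of 10.3 bp per turn is an assumed value chosen for illustration, and the function name is hypothetical.

```python
NCP_BP = 147           # DNA length in one nucleosome core particle
HELICAL_REPEAT = 10.3  # assumed bp per helical turn (illustrative value)

def phase_mismatch_deg(overlap_bp: int, repeat: float = HELICAL_REPEAT) -> float:
    """Rotational mismatch, in degrees (0-180), between two nucleosomes whose
    147-bp footprints overlap by `overlap_bp` base pairs."""
    dyad_to_dyad = NCP_BP - overlap_bp
    turns = dyad_to_dyad / repeat
    fraction = abs(turns - round(turns))  # distance to the nearest whole turn
    return fraction * 360.0

for overlap in (44, 49, 54):
    print(overlap, NCP_BP - overlap, round(phase_mismatch_deg(overlap), 1))
```

Under these assumptions the 44-bp and 54-bp overlaps come out close to in phase, whereas the 49-bp overlap is close to half a turn out of phase. The outcome is sensitive to the assumed repeat, so this is an illustration of the periodicity rather than a substitute for the structure-based r.m.s. deviation.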

To test this hypothesis, chromatin was assembled on (–44 bp), (–49 bp) and (–54 bp) constructs corresponding to neighboring local minima or maxima in Figure 4b. On all of these constructs, reconstitution gives rise to a limit species containing only three H2A-H2B dimers (data not shown). Following gel purification and fixation in the presence of Mg2+, we imaged the template molecules using AFM (Fig. 4c–e). The (–44 bp) and (–54 bp) constructs typically appeared as single larger particles (Fig. 4c,e), whereas the (–49 bp) construct showed a markedly different behavior. Although some of the template molecules still appeared as single larger particles, on most templates two separate small particles could be distinguished (Fig. 4d). Thus, on the (–49 bp) construct, Mg2+-induced folding indeed occurs much less efficiently than on the other two constructs, for which the dyad-to-dyad distance coincides with local minima in the model for the folded state.

Generation of overlapping nucleosomes by repositioning

We next investigated whether nucleosomes that are initially intact and separate can rearrange to overlap with each other during the course of spontaneous nucleosome repositioning. To do this, dinucleosomes were assembled onto a 379-bp DNA fragment derived from the mouse mammary tumor virus (MMTV) long terminal repeat (LTR)19. Site-directed mapping showed that, in the starting material, nucleosomes were present at the +70 (nucA) and –127 (nucB) positions (Fig. 5a). During temperature incubation nucleosomes were lost from the +70 location, but remained present at –127, consistent with previous observations and indicating that nucA is shifted at lower temperatures compared to nucB (see legend to Supplementary Fig. 4 online). In addition, new mapping signals were present at +22 and –25. The simplest explanation for this is that nucleosomes moved from +70 to +22 and –25. Movement of a nucleosome to –25 while the second nucleosome remains at –127 would result in the dyads of these two nucleosomes being separated by only 102 bp, such that their territories would overlap by 45 bp. Native gel electrophoresis of mono- and dinucleosomes assembled onto this DNA fragment showed that mononucleosomes did not contribute to the mapping signal at the –25 location (Supplementary Fig. 4).

Using AFM, we obtained further support for the spontaneous generation of nucleosomes with overlapping territories on the same DNA molecule. Following temperature incubation and fixation in the presence of Mg2+, a considerable proportion of the dinucleosomes that were assembled on the MMTV fragment appeared as single larger particles similar to those formed on the (–44 bp) construct (compare Figs. 5b and 3f). To assess the occurrence of these particles quantitatively, we determined particle volumes from images obtained before and after mobilization (Fig. 5c,d). The volume distributions we observed show that nucleosomes that are separate initially form

[Figure 4 graphic. Panel a: structural model. Panel b: plot of r.m.s. deviation (0–20) against overlap length (0–60). Panels c–e: AFM images of the (–44 bp), (–49 bp) and (–54 bp) constructs.]

Figure 4 Helical phasing is required for the condensation of overlapping dinucleosomes. (a) Structural models of dinucleosomes on the (–44 bp) construct in the folded state. The inner dimer is highlighted in red; other histone proteins are shown in blue and DNA in yellow. Model based on PDB 1KX5, as described in Supplementary Figure 3. (b) A plot of the r.m.s. deviation of phosphorus atoms as a function of the overlap length for partial superimposition of two copies of the DNA superhelix in the NCP structure (PDB 1KX5). For a chosen overlap length n, the last n bp in the first copy were superimposed with the first n bp in the second copy of the NCP structure. The helical periodicity suggests that formation of compact structures is likely to require helical phasing. (c–e) AFM imaging of dinucleosomes with overlap lengths of –44 bp, –49 bp and –54 bp, following fixation in 5 mM Mg2+. Formation of compact particles is observed for the (–44 bp) and (–54 bp) constructs, which coincide with minima in b; unfolded particles predominate for the (–49 bp) construct.

condensed particles similar to those observed on the (–44 bp) construct (Fig. 5c–g). Together, these observations suggest that spontaneous repositioning can result in collisions between nucleosomes involving DNA being shared between the two nucleosomal particles.

It has been shown previously that SWI/SNF remodeling of nucleosomal arrays gives rise to species that protect 190–250 bp of DNA following digestion with MNase20. In combination with our finding in Figure 3h, this raises the possibility that at least some of these products result from the movement of nucleosomes into positions in which the territories overlap. As a first step toward investigating this, we incubated dinucleosomes on the MMTV LTR fragment with increasing amounts of RSC and subjected them to site-directed mapping (Fig. 5h). Following remodeling, we observed a new mapping signal, albeit weak, predominantly at three positions (+18, –62 and –86). Unfortunately, as the spectrum of products generated following ATP-dependent remodeling is complex and involves both nucleosomes moving from their initial locations, it is not possible to assign their structure in as much detail as is possible during the generation of overlapping nucleosomes by assembly or during spontaneous repositioning. However, the simplest interpretation of the observed mapping products is that nucleosomes are present at +18 concomitantly with one of the other two positions, resulting in a territorial overlap of 67 bp or 43 bp.
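
The overlap values quoted for the heat-shifted and RSC-remodeled positions follow directly from the mapped dyad positions. A minimal sketch, assuming a 147-bp territory centered on each dyad; the function name and dictionary labels are illustrative.

```python
NCP_BP = 147  # DNA territory of one nucleosome

def territorial_overlap(dyad_a: int, dyad_b: int, territory: int = NCP_BP) -> int:
    """Base pairs shared by two nucleosome territories centered on the given
    dyad positions; 0 if the territories do not overlap."""
    separation = abs(dyad_a - dyad_b)
    return max(0, territory - separation)

# Dyad pairs taken from the mapping results described in the text.
pairs = {
    "heat shift, -25 and -127": (-25, -127),
    "RSC, +18 and -62": (18, -62),
    "RSC, +18 and -86": (18, -86),
    "starting material, +70 and -127": (70, -127),
}
for label, (a, b) in pairs.items():
    print(f"{label}: separation {abs(a - b)} bp, overlap {territorial_overlap(a, b)} bp")
```

The first three pairs give overlaps of 45 bp, 67 bp and 43 bp, as stated in the text, whereas the starting positions are far enough apart that the territories do not overlap.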

DISCUSSION

Nucleosomes are not static entities, but can undergo a number of dynamic transitions, including the transient dissociation of the outer turns of DNA and repositioning along DNA11,16,21. The observations we present here support a simple pathway in which these two properties combine. As a nucleosome moves toward a neighbor, the unraveling of DNA at the interface is stabilized such that one nucleosome can invade the DNA territory of another.

Our results demonstrate that both in reconstitution and in repositioning reactions, certain DNA sequences induce the formation of overlapping nucleosomes. Consistent with this, the DNA sequences upon which we observe dinucleosomes are made up from two separate positioning elements, both of which are also functional on their own; that is, they can induce a mononucleosome at a defined dyad position. However, we do not believe that there is a large penalty or benefit associated with the formation of overlapping nucleosomes for the following reasons. First, if there was a penalty associated with formation of overlapping nucleosomes, we would anticipate that most template molecules would first become occupied by mononucleosomes, with overlapping nucleosomes assembling only after each molecule was occupied by one nucleosome. Instead, monomers and overlapping nucleosomes are formed at the same rate, regardless of whether the template directs assembly of separate or overlapping nucleosomes (Fig. 1 and data not shown). Second, if there was a penalty associated with the formation of overlapping nucleosomes, a longer time course or higher temperature would be anticipated when a nucleosome moves to a location coincident with a neighbor in comparison to when it moves to the same location on free DNA. Instead, we find that nucA repositions at the same rate in the presence or absence of nucB (not shown).

The positions of nucleosomes in native chromatin are, at least in part, also selected by the quality of the nucleosome-DNA interactions22. This raises the possibility that overlapping nucleosomes can also be formed inside a living cell, namely in places where two positioning elements occur at a suitable distance along the genomic DNA. Several recent studies that have assigned nucleosome positions across genome segments have relied on a territorial exclusion principle, whereby two neighboring positioning elements cannot be used at

[Figure 5 graphic. Panel a: site-directed mapping across the temperature series, with signals at +70, –127, +22 and –25. Panel b: AFM image of heat-shifted MMTV dinucleosomes. Panels c–g: histograms of particle volume (nm3, 0–2,500) with counts n for MMTV starting material, MMTV shifted, and the (0 bp), (+48 bp) and (–44 bp) constructs. Panel h: site-directed mapping across the RSC titration, with signals at +70, –127, +18, –62 and –86.]

Figure 5 Formation of overlapping nucleosomes as a result of repositioning. (a) Samples of a dinucleosomal reconstitution on the MMTV fragment were incubated at 0 °C, 42 °C, 47 °C or 52 °C for 60 min and analyzed by site-directed mapping. Mapping signals are labeled with the respective dyad positions. (b) Dinucleosomes were gel purified, temperature treated and fixed in the presence of 5 mM Mg2+ followed by AFM imaging at room temperature. (c–g) Volumetric analysis of particles on the MMTV fragment before (c) and following (d) temperature incubation and on the three 601 constructs in the presence of Mg2+. Where two particles could be resolved on a template molecule, their volumes were determined and scored separately. Where only one large particle was present, the total volume of the composite particle was used. (h) Dinucleosomes (2 pmol) assembled on MMTV DNA were subjected to remodeling with 0 fmol, 0.015 fmol, 0.03 fmol, 0.06 fmol or 0.12 fmol RSC in the presence of 1 mM ATP for 30 min at 30 °C. Following remodeling, the major new locations detected are consistent with nucleosomes moving into positions in which their DNA territories will overlap if occupied on the same DNA molecule. The loss of mapping signal in lanes 4 and 5 may be due to the increased heterogeneity in the remodeled chromatin and/or additional alterations to chromatin structure.

the same time if the 147-bp territories of the corresponding nucleosomes overlap22–26. Our results indicate that this restriction may not be justified and that instead an overlap of at least 54 bp is possible. Other studies focus on the use of nucleosomal DNA of 147 bp in length to assign nucleosome positions, when in fact occupancy over a range from 80 bp to 300 bp may be possible27.

For all DNA constructs characterized by AFM, the measured CM-CM (center of mass–center of mass) distances indicate that the stretch of DNA physically present between the two nucleosomes is larger than the amount of DNA between their 147-bp DNA territories (Supplementary Fig. 5g online). It is unlikely that this is simply an artifact occurring during sample preparation, as nucleosomes were fixed with glutaraldehyde before deposition. Furthermore, the extent of separation was dependent on the ionic conditions in solution at the time of cross-linking and persists at millimolar Mg2+ concentrations (compare Supplementary Fig. 5a,b and 5d,e). Previous studies have also reported larger than anticipated CM-CM distances at low ionic strength28–31. The unexpectedly large separation between nucleosomes is best explained by the partial unwrapping of nucleosomal DNA. Transient unwrapping of the outer turns of nucleosomal DNA has been observed in mononucleosomes11,12. The binding of transcription factors to sequences on the edge of nucleosomes acts to promote the unraveling of DNA21,32, and it is possible that the presence of neighboring nucleosomes promotes the exposure of DNA in a similar way. This could occur by steric occlusion or electrostatic repulsion. Consistent with the latter possibility, ionic conditions at the time of cross-linking influence CM-CM distances (compare Supplementary Fig. 5a,b and 5d,e). The end result is that, whereas DNA within a mononucleosome spends approximately 10% of the time in the unwrapped state12, this may be considerably extended in the context of dinucleosomes or longer arrays of uncondensed chromatin.

The dissociation of three helical turns of DNA results in the loss of many of the contacts with the histone dimer on that side of the nucleosome. As histone dimers spontaneously dissociate from octamers at physiological salt concentrations, the unraveling of DNA could promote the loss of histone dimers. This would expose an additional proportion of the DNA territory at the interface. If this is subsequently occupied by an adjacent nucleosome, reassociation of the dimer would no longer be trivial. This provides a simple pathway by which a nucleosome could invade the DNA territories of its neighbor as a result of spontaneous or ATP-driven repositioning.

On some of the dimeric 601 constructs, the two nucleosomes coalesced upon addition of Mg2+ to form particles with a circular cross-section and increased height compared to single nucleosomes. Moreover, the overlap lengths for which Mg2+-induced folding occurs most readily recur with a 10-bp periodicity, and the DNA in the linker region is protected from digestion by MNase. Altogether, these observations are indicative of a cylinder-shaped particle with DNA wrapped around its lateral surface in one continuous superhelix. With the aid of the NCP structure, we have been able to build a detailed model that fits the experimental observations. The model obtained for a 44-bp or 54-bp overlap length contains one inner dimer. Although this dimer formally can be assigned to one of the two NCPs, it might interact with both tetramers in a structurally equivalent manner (Supplementary Fig. 3c).

For the nonoverlapping dinucleosomes on the (0 bp) construct, it is also possible to obtain a model of a stacked particle, in which only minor steric interference occurs (not shown). Experimentally, however, upon addition of Mg2+ we observed no coalescence of the nucleosomes on these template molecules (Fig. 3e), although some compaction has previously been reported by analytical ultracentrifugation33. This could suggest that two nucleosomal ends, both of which are sealed off by a histone H2A-H2B dimer, generally cannot form the type of stacking interaction required for dinucleosomal folding or association in trans. Only when at least one of the two inner dimers is released is a suitable dimerization interface created. This stacking interaction is somewhat reminiscent of previous studies showing that tetramers can stack against each other6,34,35. Indeed, it is possible that, with even greater overlaps than we have studied here, adjacent hexasomes might stack in this way.

Chromatin-remodeling complexes such as RSC and SWI/SNF use energy derived from ATP hydrolysis to overcome thermodynamic constraints on nucleosome mobility. As a result they have the potential to drive the formation of overlapping nucleosomes, even in the absence of suitable positioning elements. Remodeling of mononucleosomes by RSC or SWI/SNF can result in the unraveling of up to 50 bp from the edge of one of these nucleosomes36–38. The exposed DNA binding surface of these remodeled mononucleosomes provides a means by which they may associate to form dinucleosome-like particles39,40. Nucleosomes located within arrays do not have the same opportunity to encounter DNA ends. Instead it is far more likely that one nucleosome will collide with a neighbor as a result of sliding. If DNA is unraveled from either nucleosome at the point of collision, the result would be that one nucleosome would encroach upon territory occupied by the other in a fashion similar to that which we have reported here.

More recently, generation of an altered dinucleosome-like particle, termed the altosome, as a result of SWI/SNF remodeling of polynucleosomal arrays has been reported20,41. The altosome differs from the structures we have detected in that there is a DNA crossover between adjacent intact nucleosomes and all histone polypeptides are retained. The hallmark of the altosome is the protection of DNA fragments from 190 bp to 250 bp in an MNase digest. The latter observation is consistent with the continuous DNA superhelix in our model, but less likely for any arrangement of the DNA that involves a crossover. Although there is evidence that no histone dimers are lost from altosomes20, other studies have found that SWI/SNF-related complexes reduce the stability with which histone dimers are retained in nucleosomes42,43, and in this study we found that detection of a 25% reduction in histone dimer content was technically challenging.

Finally, we observed the redistribution of dinucleosomes to positions in which they seem to overlap following remodeling with RSC (Fig. 5h). Therefore, overlapping dinucleosomes may contribute to the spectrum of products generated during the course of ATP-dependent chromatin-remodeling reactions. As we also observed that the formation of overlapping nucleosomes can result in the dissociation of histone subunits, it is tempting to speculate that similar species generated during remodeling by SWI/SNF complexes could represent intermediates in the complete removal of histone octamers. In such a reaction, an adjacent nucleosome would be required for octamer removal, a concept that is gaining support44–46 and that is consistent with the observed kinetics of nucleosome removal at the PHO5 promoter47. It is now important to establish whether this does indeed occur in vivo. Our preliminary efforts with this aim have been unsuccessful, and it may not be trivial to detect such species if they exist transiently. It is nonetheless also possible that other DNA-translocating enzymes such as DNA and RNA polymerases may precipitate related collisions between nucleosomes.

As collisions between nucleosomes have the potential to perturb chromatin structure on a genome-wide scale, it may be generally advantageous to prevent such collisions from occurring. All eukaryotes possess nucleosome-spacing enzymes, including members of the

ISWI and Chd1 families of Snf2-related proteins, which act to position nucleosomes equidistantly from their neighbors. Many of these enzymes are abundant and have the potential to hydrolyze large quantities of ATP. That they have proven to be of value throughout the evolution of eukaryotes may reflect the importance of preventing internucleosome collisions over the majority of our genomes.

METHODS

DNA fragments. All dimeric 601 constructs are based on the 147-bp core of the 601 sequence as defined previously14. The two halves are joined via an EcoRI restriction site. Where additional DNA is present, it is derived from the DNA flanking the 147-bp core in the original 220-bp 601 sequence48. We cloned direct-repeat constructs into pBluescriptII and then amplified them by PCR. Indirect-repeat constructs were synthesized by preparative ligation. The MMTV DNA sequences used were amplified from pAB438 (ref. 49) by PCR such that 18-bp extensions were generated on the ends protruding beyond nucA and nucB.

Nucleosome reconstitution, native gel electrophoresis and site-directed hydroxyl radical mapping. We carried out expression of histone proteins, octamer refolding and nucleosome assembly as described50. We used H4S47C and H3C110A mutant histones for mapping experiments. Native gel analysis and site-directed mapping were performed as described37. In the experiments in Figures 1 and 2 and Supplementary Figures 1 and 2, we used a Cy5 fluorescent end label, instead of 32P. These gels were scanned using an FLA-5100 imager (Fuji). RSC complex was purified as described previously51.

Mass spectrometry. Gel slices containing the dinucleosomal species of interest were excised from a native gel, and an in-gel tryptic digest was performed. We extracted peptides from the gel slices and analyzed them in triplicate by nLC/ESI/MS/MS (nano liquid chromatography/electrospray ionization/mass spectrometry) on an LTQ Orbitrap XL mass spectrometer. For all peptides identified in a Mascot search, the signal intensities were obtained by extracted ion chromatograms using the Quan Browser software (Xcalibur). Peptides with a signal intensity of less than 50,000 counts per second were discarded. The remaining signal intensities were normalized by dividing sample by reference for each individual peptide. Each replicate of the sample was divided by each replicate of the reference, and the geometric mean was determined. Peptides with an s.d. larger than 0.4 were excluded from further analysis. Averaged normalized signal intensities were plotted against peptide number and the geometric mean was calculated separately for the set of dimer-derived and tetramer-derived peptides. Sequences of the peptides used in Figure 2c are available on request.
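
The normalization described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' analysis code: the intensities are made up, a peptide is dropped if any replicate falls below the intensity cut-off, and the s.d. filter is applied to the set of pairwise sample/reference ratios. The function names are hypothetical.

```python
from itertools import product
from statistics import geometric_mean, stdev

MIN_INTENSITY = 50_000  # counts per second; weaker signals are discarded
MAX_SD = 0.4            # peptides with a larger s.d. are excluded

def normalized_ratio(sample, reference):
    """Return the geometric mean of all sample/reference replicate ratios for
    one peptide, or None if the peptide fails the intensity or s.d. filter."""
    if min(sample + reference) < MIN_INTENSITY:
        return None
    ratios = [s / r for s, r in product(sample, reference)]
    if stdev(ratios) > MAX_SD:
        return None
    return geometric_mean(ratios)

def group_ratio(peptides):
    """Geometric mean of the per-peptide ratios that pass both filters."""
    kept = [r for r in (normalized_ratio(s, ref) for s, ref in peptides) if r is not None]
    return geometric_mean(kept)

# Made-up triplicate intensities: (sample replicates, reference replicates).
dimer_peptides = [
    ([7.2e5, 7.0e5, 7.4e5], [1.0e6, 9.8e5, 1.02e6]),
    ([3.5e5, 3.7e5, 3.6e5], [5.0e5, 5.1e5, 4.9e5]),
    ([4.0e4, 4.5e4, 4.2e4], [6.0e4, 6.1e4, 5.9e4]),  # below the intensity cut-off
]
tetramer_peptides = [
    ([9.9e5, 1.01e6, 1.0e6], [1.0e6, 1.0e6, 9.9e5]),
    ([6.1e5, 5.9e5, 6.0e5], [6.0e5, 6.0e5, 6.1e5]),
]

dimer = group_ratio(dimer_peptides)
tetramer = group_ratio(tetramer_peptides)
print(f"dimer {dimer:.2f}, tetramer {tetramer:.2f}, dimer:tetramer {dimer / tetramer:.2f}")
```

Dividing every sample replicate by every reference replicate gives nine ratios per peptide for triplicate data, and the geometric mean is the natural average for ratios because it treats a twofold increase and a twofold decrease symmetrically.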

Atomic force microscopy. Dinucleosomes were purified by native gel electrophoresis and electroelution. Two microliters of the electroeluted samples were fixed in 10 µl 5 mM Hepes and 2 µl 1% (v/v) glutaraldehyde for 15 min. Where indicated, MgCl2 was present at a final concentration of 5 mM in the fixation reaction. Directly after fixation, 2 µl of the mixture was deposited on freshly cleaved mica, flushed with milli-Q water, dried in a stream of nitrogen gas and imaged using a Nanoscope IV (Digital Instruments) operated in tapping mode AFM, acquiring 1 µm × 1 µm images with 512 × 512 pixels. Image processing was done with custom-built software written in LabVIEW.

Model construction. All in silico models of dinucleosomes were created in PyMol. The models are based on the high-resolution crystal structure of the NCP (PDB 1KX5). For superimposition of DNA helices, we used the phosphorus atoms in both strands.

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS

We would like to thank D. Lamont and K. Beattie for assistance with MS and D. Norman for assistance with modeling. We thank members of the T.O.-H. laboratory for valuable suggestions. M.d.J. and J.v.N. were financially supported by the ‘Netherlands Organisation for Scientific Research’ (NWO) and the European Science Foundation (ESF). M.E. (Studentship), A.F. and T.O.-H. were funded by the Wellcome Trust (Senior Fellowship 064414).

AUTHOR CONTRIBUTIONS

M.E. carried out most of the experimental work and data analysis; M.d.J. and J.v.N. carried out AFM and associated data analysis; A.F. performed the assays in Figure 5; R.B. assisted with the modeling of the dinucleosome structure; M.E. and T.O.-H. designed the experiments and wrote the manuscript.

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. van Holde, K.E. Chromatin (Springer-Verlag, New York, 1988).
2. Groth, A., Rocha, W., Verreault, A. & Almouzni, G. Chromatin challenges during DNA replication and repair. Cell 128, 721–733 (2007).
3. Workman, J.L. Nucleosome displacement in transcription. Genes Dev. 20, 2009–2017 (2006).
4. Kornberg, R.D. & Lorch, Y. Twenty-five years of the nucleosome, fundamental particle of the eukaryote chromosome. Cell 98, 285–294 (1999).
5. Davey, C.A., Sargent, D.F., Luger, K., Maeder, A.W. & Richmond, T.J. Solvent mediated interactions in the structure of the nucleosome core particle at 1.9 Å resolution. J. Mol. Biol. 319, 1097–1113 (2002).
6. Flaus, A., Luger, K., Tan, S. & Richmond, T.J. Mapping nucleosome position at single base-pair resolution by using site-directed hydroxyl radicals. Proc. Natl. Acad. Sci. USA 93, 1370–1375 (1996).
7. Widlund, H.R. et al. Identification and characterization of genomic nucleosome-positioning sequences. J. Mol. Biol. 267, 807–817 (1997).
8. Roth, S.Y., Dean, A. & Simpson, R.T. Yeast α2 repressor positions nucleosomes in TRP1/ARS1 chromatin. Mol. Cell. Biol. 10, 2247–2260 (1990).
9. Strauss, F. & Varshavsky, A. A protein binds to a satellite DNA repeat at three specific sites that would be brought into mutual proximity by DNA folding in the nucleosome. Cell 37, 889–901 (1984).
10. Pazin, M.J., Bhargava, P., Geiduschek, E.P. & Kadonaga, J.T. Nucleosome mobility and the maintenance of nucleosome positioning. Science 276, 809–812 (1997).
11. Polach, K.J. & Widom, J. Mechanism of protein access to specific DNA sequences in chromatin: a dynamic equilibrium model for gene regulation. J. Mol. Biol. 254, 130–149 (1995).
12. Li, G., Levitus, M., Bustamante, C. & Widom, J. Rapid spontaneous accessibility of nucleosomal DNA. Nat. Struct. Mol. Biol. 12, 46–53 (2005).
13. Ferreira, H., Somers, J., Webster, R., Flaus, A. & Owen-Hughes, T. Histone tails and the H3 αN helix regulate nucleosome mobility and stability. Mol. Cell. Biol. 27, 4037–4048 (2007).
14. Schalch, T., Duda, S., Sargent, D.F. & Richmond, T.J. X-ray structure of a tetranucleosome and its implications for the chromatin fibre. Nature 436, 138–141 (2005).
15. Bancaud, A. et al. Nucleosome chiral transition under positive torsional stress in single chromatin fibers. Mol. Cell 27, 135–147 (2007).
16. Meersseman, G., Pennings, S. & Bradbury, E.M. Mobile nucleosomes – a general behavior. EMBO J. 11, 2951–2959 (1992).
17. Thastrom, A. et al. Sequence motifs and free energies of selected natural and non-natural nucleosome positioning DNA sequences. J. Mol. Biol. 288, 213–229 (1999).
18. Dorigo, B., Schalch, T., Bystricky, K. & Richmond, T.J. Chromatin fiber folding: requirement for the histone H4 N-terminal tail. J. Mol. Biol. 327, 85–96 (2003).
19. Richard-Foy, H. & Hager, G.L. Sequence-specific positioning of nucleosomes over the steroid-inducible MMTV promoter. EMBO J. 6, 2321–2328 (1987).
20. Ulyanova, N.P. & Schnitzler, G.R. Human SWI/SNF generates abundant, structurally altered dinucleosomes on polynucleosomal templates. Mol. Cell. Biol. 25, 11156–11170 (2005).
21. Li, G. & Widom, J. Nucleosomes facilitate their own invasion. Nat. Struct. Mol. Biol. 11, 763–769 (2004).
22. Segal, E. et al. A genomic code for nucleosome positioning. Nature 442, 772–778 (2006).
23. Yuan, G.C. et al. Genome-scale identification of nucleosome positions in S. cerevisiae. Science 309, 626–630 (2005).
24. Johnson, S.M., Tan, F.J., McCullough, H.L., Riordan, D.P. & Fire, A.Z. Flexibility and constraint in the nucleosome core landscape of Caenorhabditis elegans chromatin. Genome Res. 16, 1505–1516 (2006).
25. Albert, I. et al. Translational and rotational settings of H2A.Z nucleosomes across the Saccharomyces cerevisiae genome. Nature 446, 572–576 (2007).
26. Lee, W. et al. A high-resolution atlas of nucleosome occupancy in yeast. Nat. Genet. 39, 1235–1244 (2007).
27. Fatemi, M. et al. Footprinting of mammalian promoters: use of a CpG DNA methyltransferase revealing nucleosome positions at a single molecule level. Nucleic Acids Res. 33, e176 (2005).
28. Allen, M.J. et al. Atomic force microscope measurements of nucleosome cores assembled along defined DNA sequences. Biochemistry 32, 8390–8396 (1993).
29. Pisano, S., Pascucci, E., Cacchione, S., De Santis, P. & Savino, M. AFM imaging and theoretical modeling studies of sequence-dependent nucleosome positioning. Biophys. Chem. 124, 81–89 (2006).
30. van Holde, K. & Zlatanova, J. The nucleosome core particle: does it have structural and physiologic relevance? Bioessays 21, 776–780 (1999).
31. Thoma, F., Koller, T. & Klug, A. Involvement of histone H1 in the organization of the nucleosome and of the salt-dependent superstructure of chromatin. J. Cell Biol. 83, 403–427 (1979).


32. Polach, K.J. & Widom, J. A model for the cooperative binding of eukaryotic regulatory proteins to nucleosomal target sites. J. Mol. Biol. 258, 800–812 (1996).
33. Butler, P.J.G. & Thomas, J.O. Dinucleosomes show compaction by ionic strength, consistent with bending of linker DNA. J. Mol. Biol. 281, 401–407 (1998).
34. Alilat, M., Sivolob, A., Revet, B. & Prunell, A. Nucleosome dynamics IV. Protein and DNA contributions in the chiral transition of the tetrasome, the histone (H3–H4)2 tetramer-DNA particle. J. Mol. Biol. 291, 815–841 (1999).
35. Tomschik, M., Karymov, M.A., Zlatanova, J. & Leuba, S.H. The archaeal histone-fold protein HMf organizes DNA into bona fide chromatin fibers. Structure 9, 1201–1211 (2001).
36. Fan, H.Y., He, X., Kingston, R.E. & Narlikar, G.J. Distinct strategies to make nucleosomal DNA accessible. Mol. Cell 11, 1311–1322 (2003).
37. Flaus, A. & Owen-Hughes, T. Dynamic properties of nucleosomes during thermal and ATP-driven mobilization. Mol. Cell. Biol. 23, 7767–7779 (2003).
38. Kassabov, S.R., Zhang, B., Persinger, J. & Bartholomew, B. SWI/SNF unwraps, slides and rewraps the nucleosome. Mol. Cell 11, 391–403 (2003).
39. Lorch, Y., Zhang, M. & Kornberg, R.D. RSC unravels the nucleosome. Mol. Cell 7, 89–95 (2001).
40. Ulyanova, N.P. & Schnitzler, G.R. Inverted factor access and slow reversion characterize SWI/SNF-altered nucleosome dimers. J. Biol. Chem. 282, 1018–1028 (2007).
41. Schnitzler, G.R. et al. Direct imaging of human SWI/SNF-remodeled mono- and polynucleosomes by atomic force microscopy employing carbon nanotube tips. Mol. Cell. Biol. 21, 8504–8511 (2001).
42. Bruno, M. et al. Histone H2A/H2B dimer exchange by ATP-dependent chromatin remodeling activities. Mol. Cell 12, 1599–1606 (2003).
43. Vicent, G.P. et al. DNA instructed displacement of histones H2A and H2B at an inducible promoter. Mol. Cell 16, 439–452 (2004).
44. Cairns, B.R. Chromatin remodeling: insights and intrigue from single-molecule studies. Nat. Struct. Mol. Biol. 14, 989–996 (2007).
45. Dechassa, M.L. et al. Architecture of the SWI/SNF-nucleosome complex. Mol. Cell. Biol. 28, 6010–6021 (2008).
46. Chaban, Y. et al. Structure of a RSC-nucleosome complex and insights into chromatin remodeling. Nat. Struct. Mol. Biol. 15, 1272–1277 (2008).
47. Boeger, H., Griesenbeck, J. & Kornberg, R.D. Nucleosome retention and the stochastic nature of promoter chromatin remodeling for transcription. Cell 133, 716–726 (2008).
48. Lowary, P.T. & Widom, J. New DNA sequence rules for high affinity binding to histone octamer and sequence-directed nucleosome positioning. J. Mol. Biol. 276, 19–42 (1998).
49. Flaus, A. & Richmond, T.J. Positioning and stability of nucleosomes on MMTV 3′ LTR sequences. J. Mol. Biol. 275, 427–441 (1998).
50. Luger, K., Rechsteiner, T.J. & Richmond, T.J. Expression and purification of recombinant histones and nucleosome reconstitution. Methods Mol. Biol. 119, 1–16 (1999).
51. Ferreira, H., Flaus, A. & Owen-Hughes, T. Histone modifications influence the action of Snf2 family remodelling enzymes by different mechanisms. J. Mol. Biol. 374, 563–579 (2007).


SRS2 and SGS1 prevent chromosomal breaks and stabilize triplet repeats by restraining recombination

Alix Kerrest1,4, Ranjith P Anand2,4, Rangapriya Sundararajan2, Rodrigo Bermejo3, Giordano Liberi3, Bernard Dujon1, Catherine H Freudenreich2 & Guy-Franck Richard1

Several molecular mechanisms have been proposed to explain trinucleotide repeat expansions. Here we show that in yeast srs2Δ cells, CTG repeats undergo both expansions and contractions, and they show increased chromosomal fragility. Deletion of RAD52 or RAD51 suppresses these phenotypes, suggesting that recombination triggers trinucleotide repeat instability in srs2Δ cells. In sgs1Δ cells, CTG repeats undergo contractions and increased fragility by a mechanism partially dependent on RAD52 and RAD51. Analysis of replication intermediates revealed abundant joint molecules at the CTG repeats during S phase. These molecules migrate similarly to reversed replication forks, and their presence is dependent on SRS2 and SGS1 but not RAD51. Our results suggest that Srs2 promotes fork reversal in repetitive sequences, preventing repeat instability and fragility. In the absence of Srs2 or Sgs1, DNA damage accumulates and is processed by homologous recombination, triggering repeat rearrangements.

Trinucleotide repeats are a particular class of microsatellites involved in many human neurological and muscular disorders, including fragile X syndrome, Huntington's disease and Friedreich's ataxia (reviewed in refs. 1,2). These disorders are all associated with the expansion of a trinucleotide repeat array near or within a gene. These expansions can be large, in some instances reaching several thousands of repeats in one single generation. Several mechanisms have been proposed to explain trinucleotide repeat expansions, including replication slippage and DNA repair of single-strand nicks (reviewed in refs. 3,4). Several years ago, an alternative model was proposed, involving slippage during double-strand break repair (ref. 5). It was shown that in yeast, contractions and expansions occur during gene conversion associated with double-strand break repair. Rearrangements, which were observed in 20–40% of gene-conversion events, were dependent on the Mre11–Rad50–Xrs2 protein complex, were more frequent during ectopic double-strand break repair and did not involve crossover formation (refs. 6–8).

RAD27, the yeast homolog of the human FEN1 gene involved in Okazaki fragment processing, was the first gene identified whose deletion led to an increased frequency of large expansions and contractions of trinucleotide repeats in yeast (refs. 5,9–11). During the course of a whole-genome screen looking for genes whose deletion gave a synthetic slow-growth or lethal phenotype with the RAD27 deletion, we found 41 mutants showing such a phenotype (ref. 12). Among them, we identified two S-phase helicase genes, SRS2 and SGS1.

SGS1 was originally identified as a suppressor of a type I topoisomerase (TOP3) mutation (ref. 13) and was also shown to interact with TOP2 (ref. 14). The Sgs1 protein is a DEAH-box helicase, having orthologs in Escherichia coli (RecQ), in all sequenced yeasts (ref. 15) and in mammals. Five orthologs are found in humans: WRN, BLM and RTS, respectively involved in Werner's, Bloom's and Rothmund-Thomson's syndromes, and two shorter forms, RecQL and RecQ5. The precise biochemical activity of SGS1 is unknown, but the BLM protein and human topoisomerase IIIα can unwind double-Holliday junctions in vitro (ref. 16). In addition, sgs1 mutants in Saccharomyces cerevisiae show increased levels of crossovers (refs. 17,18), and Drosophila melanogaster mus309 mutants (mutated in the BLM ortholog) are defective in a late stage of double-strand break repair (ref. 19). Altogether, these data point to a role for SGS1 in unwinding Holliday junction–like molecules.

SRS2 is involved in the post-replication repair pathway of DNA damage (ref. 20). The purified Srs2 protein possesses a 3′-to-5′, ATP-dependent helicase activity (ref. 21) and was shown to disrupt Rad51 nucleoprotein filaments in vitro (refs. 22,23). It was recently proposed that the Srs2 helicase could act as an antirecombinogenic protein that unwinds toxic recombination intermediates (ref. 24). One study (ref. 25) showed that short CAG·CTG trinucleotide repeats (13–25 repeats) were more prone to expansions in a srs2 mutant, and that these expansions were largely independent of RAD51. However, long trinucleotide repeat sequences in both yeast and E. coli undergo breakage in a length-dependent manner (refs. 9,26–29). Therefore, we were interested in testing the effect of deleting the SRS2 and SGS1 genes on the stability of long CAG·CTG repeats and determining whether expansions of such long repeats occurred independently of RAD51.

SRS2 and SGS1 mutants show a strong genetic interaction: the srs2Δ sgs1Δ double mutant is lethal (ref. 30). Cell death is suppressed by mutations in RAD51, RAD55 or RAD57, showing that homologous recombination is responsible for cell death. The authors suggested that the Srs2 and Sgs1 helicases are needed either to help restart stalled replication forks or at the replication termination step, or alternatively to process recombination intermediates that form during replication. Their simultaneous absence would lead to accumulation of DNA damage transformed into potentially toxic recombination intermediates, whose resolution would be lethal for the cells (ref. 31).

Received 16 May 2008; accepted 4 December 2008; published online 11 January 2009; corrected online 26 January 2009 (details online); doi:10.1038/nsmb.1544

1Institut Pasteur, Unité de Génétique Moléculaire des Levures, CNRS, URA2171, Université Pierre et Marie Curie, UFR 927, 25 rue du Dr Roux, F-75015 Paris, France. 2Department of Biology, Tufts University, Medford, Massachusetts 02155, USA. 3Instituto FIRC di Oncologia Molecolare, Via Amadello 16, 20141 Milano, Italy. 4These authors contributed equally to this work. Correspondence should be addressed to G.-F.R. ([email protected]).

NATURE STRUCTURAL & MOLECULAR BIOLOGY VOLUME 16 NUMBER 2 FEBRUARY 2009

©2009 Nature America, Inc. All rights reserved.

To study the role of SRS2 and SGS1 on the stability of long trinucleotide repeats during replication, we integrated CAG·CTG repeats in the two opposite orientations in two different chromosomal locations: on yeast chromosome X and on a yeast artificial chromosome (YAC). We found that these repeats are unstable in both mutant backgrounds. Instability is dependent on RAD52 and RAD51, but to different degrees. Chromosomal fragility is also increased in both mutants; however, further analyses showed that only in srs2Δ cells is this fragility dependent on homologous recombination, suggesting that chromosomal breakage occurs by a different pathway in sgs1Δ cells. Analysis of replication and recombination intermediates by two-dimensional gel electrophoresis showed that molecules that migrate in a similar manner to reversed forks form during replication of the trinucleotide repeat tract, and formation of these intermediates depends on the presence of both SRS2 and SGS1. We propose that trinucleotide repeat replication is less efficient in srs2Δ and sgs1Δ cells, leading to accumulation of DNA damage. Processing of this damage by the homologous recombination machinery triggers rearrangements of the repeat tract by sister-chromatid recombination and single-strand annealing.

RESULTS
CAG·CTG repeats are more fragile in the absence of Srs2 or Sgs1
To study whether Srs2 and Sgs1 proteins protect against fragility caused by trinucleotide repeats, we used a YAC-based assay that allowed determination of the rate of CAG·CTG fragility in either wild-type or helicase-deficient yeast (Fig. 1a). Comparison of the 5-fluoroorotic acid resistance (FOA^R) rates in strains carrying a (CAG)70 tract showed that breakage increased 2.8-fold and 5.8-fold in the absence of Sgs1 and Srs2, respectively, as compared to the rate in the wild-type strain (Fig. 1b and Supplementary Table 1 online; P < 0.01 for both). The increase in breakage rate in the absence of either helicase suggests that they both have an important role in preventing or repairing lesions that occur at long trinucleotide repeats. Srs2 seems to have a particularly important function at a CAG repeat tract, as fragility in the srs2Δ background was increased only 1.4-fold for the (CAG)0 control YAC but 5.8-fold for the (CAG)70 YAC. Notably, deletion of RAD51 in an srs2Δ background brought the breakage rate down almost to the wild-type level (wild type versus srs2Δ rad51Δ, P = 0.163; srs2Δ versus srs2Δ rad51Δ, P = 0.02), indicating that the increased fragility observed in a srs2Δ strain is dependent on RAD51. In contrast, deletion of RAD51 in an sgs1Δ background did not suppress fragility and in fact resulted in increased CAG fragility compared to the sgs1Δ single mutant (P = 0.07). Because there are some pathways of recombination that are dependent on Rad52 but not Rad51, we also made the sgs1Δ rad52Δ double mutant. This mutant showed a rate of FOA^R similar to that of the sgs1Δ rad51Δ mutant. The increase observed in the double mutant is consistent with an additive effect of the sgs1Δ and rad52Δ single mutants (Fig. 1b). Thus, the fragility that occurs in the absence of Sgs1 is independent of recombination.

Figure 1 CAG·CTG repeats show increased fragility in the absence of Srs2 or Sgs1 helicases. Molecular analysis of YACs purified from FOA^R colonies, in both wild-type and mutant strains, showed that the rate of FOA^R is correlated with YAC breakage (data not shown; see also ref. 26). (a) Experimental system. If the YAC undergoes breakage at or near the trinucleotide repeat tract, the distal DNA fragment containing the URA3 gene is lost and cells become resistant to 5-fluoroorotic acid (FOA^R). Broken YACs can be recovered by addition of a new telomere onto the 108-bp T4G4/C4A4 telomere seed sequence (TEL). (b,c) Rate of FOA^R (× 10^–6) for cells with YACs containing no repeat (CAG·CTG)0, a (CAG)70 repeat (b) or a (CTG)70 repeat (c). The average of at least three experiments is shown. Error bars indicate s.e.m., and asterisks indicate a significant difference between the wild type and the mutants (pooled t-test: **, P ≤ 0.01; *, P ≤ 0.05). Numbers above each bar represent the fold increase over the wild-type value for the corresponding repeat tract.

Figure 2 Effect of the trinucleotide repeat tract orientation on stability. (a) Experimental design. The ARG2 locus on chromosome X, at which repeats are cloned, is depicted, along with the replication origin (ARS1010) and centromere location (CEN X). Strains used in this study contain the same repeat tract cloned either in the CTG orientation (the CTG sequence is the lagging-strand template) or in the CAG orientation (the CAG sequence is the lagging-strand template). (b) Orientation effect on growth rate on hydroxyurea (HU, 0.2 M) plates compared to standard glucose plates (YPD). WT, wild type.

To determine whether there is an orientation effect on CAG·CTG fragility, we flipped the repeat tract so that the (CTG)70 repeat would be on the lagging-strand template (CTG orientation). This orientation has been previously shown to be more contraction prone in both yeast and E. coli cells (refs. 32–36). In this orientation, the repeat was even more prone to breakage, with the rate of FOA^R being 3.3-fold more than in the CAG orientation (nine-fold more than in the control; Fig. 1c and Supplementary Table 1). Both Sgs1 and Srs2 were still important in preventing repeat tract breakage in this orientation, although notably the importance of each helicase was reversed. In the sgs1Δ mutant, fragility was about three-fold higher than the wild-type rate, whereas, in the srs2Δ strain, fragility was increased only 1.5-fold. As in the CAG orientation, fragility rates returned to wild-type levels in the srs2Δ rad51Δ double mutant, whereas fragility in the sgs1Δ strain was only partially dependent on RAD51. In summary, both Srs2 and Sgs1 helicases are important in preventing fragility of a CAG·CTG tract, regardless of orientation, and fragility of an srs2Δ mutant, but not an sgs1Δ mutant, is rescued by preventing Rad51-dependent recombination.

CAG·CTG repeats are frequently rearranged in srs2Δ and sgs1Δ cells
To determine the role of Srs2 and Sgs1 on trinucleotide repeat stability, we integrated CAG·CTG repeats in the two opposite orientations at either the ARG2 locus on yeast chromosome X or on the YAC. For both locations, Figures 1a and 2a show the replication fork coming from the left, so that when the CTG strand is the top strand it is used as the template for lagging-strand synthesis; this strand will be hereafter referred to as the CTG orientation, or CTG repeats (orientation II; ref. 32). When the CTG strand is the bottom strand, it is used as the template for leading-strand synthesis and will be hereafter referred to as the CAG orientation, or CAG repeats (orientation I; ref. 32).
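The naming convention above can be sketched in a few lines of Python. This is only an illustration, assuming (as stated for Figures 1a and 2a) that the fork arrives from the left, so that the top strand, read 5′ to 3′, serves as the lagging-strand template; the function names are hypothetical and not part of the study.

```python
# Complement table for the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def describe_orientation(top_strand_unit: str) -> dict:
    """Name the repeat orientation from the repeat unit on the top strand.

    Assumption: the replication fork comes from the left, so the top strand
    is the lagging-strand template and the bottom strand (its reverse
    complement) is the leading-strand template.
    """
    lagging = top_strand_unit.upper()
    leading = reverse_complement(lagging)
    return {
        "lagging_template": lagging,
        "leading_template": leading,
        "name": f"{lagging} orientation",
    }


ctg = describe_orientation("CTG")  # CTG on the top strand
cag = describe_orientation("CAG")  # CAG on the top strand
print(ctg["name"], "| leading-strand template:", ctg["leading_template"])
print(cag["name"], "| leading-strand template:", cag["leading_template"])
```

The same tract therefore reads as (CTG)n on one strand and (CAG)n on the other; only the strand that templates lagging-strand synthesis decides which name is used.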

First, we analyzed repeat-size changes in srs2Δ and sgs1Δ strains in the CTG orientation and compared them to the wild-type strain. At the ARG2 locus, we observed no expansion of the (CTG)55 repeat in the wild-type and sgs1Δ strains. In contrast, in the srs2Δ strain, we detected expansions in 8.1% of the colonies analyzed, a substantially higher frequency than observed in the wild-type strain (Table 1). Thus, deletion of SRS2 substantially increases expansions of a long CTG tract, whereas deletion of SGS1 does not increase expansions. For contractions, we observed a 25-fold increase in the srs2Δ strain (58.1%) and a 20-fold increase in the sgs1Δ strain (47.1%) as compared to the wild type (Table 1). Expansion sizes ranged from 17 to 33 repeats (mean: 22 ± 4 repeats), and contraction sizes ranged from 10 to 55 repeats (mean: 29 ± 3 repeats) (size changes smaller than 10 repeats were not detectable at this locus). On the YAC, the (CTG)70 repeat seemed to be more contraction prone than the shorter repeat at the ARG2 locus, with 24.5% of cells showing contractions and none showing expansions in the wild-type background (Table 2). Nonetheless, similarly to the ARG2 locus, an srs2Δ mutation substantially increased the frequency of expansions to 3.1%. The expansion sizes ranged from 26 to 39 repeats (mean: 29 ± 6 repeats) and the contraction sizes ranged from 4 to 70 repeats (mean: 49 ± 2 repeats) (size changes ≥3 repeats were detectable at the YAC locus).

We subsequently looked for instability in both mutant backgrounds in the opposite orientation, where the CAG repeats lie on the lagging-strand template, expecting that the contractions would be less frequent. Indeed, at the ARG2 locus, the (CAG)55 repeats were stable (around 2% of cells showed contractions and none showed expansion in both mutants; Table 1). On the YAC, however, where the repeats seem to be more unstable owing to either the slightly longer repeat tested or the location, or a combination of both, we observed that repeats were destabilized in both mutant backgrounds (Table 2). In the srs2Δ mutant both expansions and contractions were more frequent, whereas in the sgs1Δ strain only contractions showed substantially increased frequencies (Table 2). Expansion sizes ranged from 3 to 30 repeats in the srs2Δ strain (mean: 17 ± 4 repeats) and 3 to 16 repeats in the sgs1Δ strain (mean: 9 ± 7 repeats); contraction sizes ranged from 3 to 69 repeats (srs2Δ mean: 17 ± 5 repeats; sgs1Δ mean: 38 ± 5 repeats). We conclude that trinucleotide repeats are prone to frequent rearrangements in srs2Δ and sgs1Δ cells. However, the two helicases do not act equivalently, as Srs2 protects against both repeat expansions and contractions, whereas only contractions showed increased frequency in the absence of Sgs1.

Table 1 Instability of CTG and CAG triplets on chromosome X in srs2Δ and sgs1Δ mutants

Strain           No. of clones   Contractions (%)^a   Expansions (%)^a   Total (%)^a
CTG orientation
WT               141             2.1 (3)              0 (0)              2.1 (3)
srs2Δ            62              58.1 (36)**          8.1 (5)**          66.2 (41)
sgs1Δ            104             47.1 (49)**          0 (0)              47.1 (49)
rad52Δ           115             0.9 (1)              0 (0)              0.9 (1)
rad51Δ           92              0 (0)                0 (0)              0 (0)
srs2Δ rad52Δ     117             3.4 (4)              0 (0)              3.4 (4)
srs2Δ rad51Δ     114             0 (0)                0 (0)              0 (0)
sgs1Δ rad52Δ     93              0 (0)                0 (0)              0 (0)
sgs1Δ rad51Δ     122             0.8 (1)              5.7 (7)**          6.6 (8)
CAG orientation
srs2Δ            182             2.2 (4)              0 (0)              2.2 (4)
sgs1Δ            192             2.1 (4)              0 (0)              2.1 (4)

^a Numbers in parentheses indicate the number of clones in each class. *, P ≤ 0.05; **, P ≤ 0.01 (Fisher's exact test).

Table 2 Instability of CTG and CAG triplets on the YAC in srs2Δ and sgs1Δ mutants

Strain           No. of clones   % Contractions^a     % Expansions^a     % Total^a
CTG orientation
WT               163             24.5 (40)            0 (0)              24.5 (40)
srs2Δ            130             30.0 (39)            3.1 (4)*           33.1 (43)
sgs1Δ            157             23.6 (37)            0 (0)              23.6 (37)
srs2Δ rad51Δ     134             34.3 (46)            0.8 (1)            35.1 (47)
sgs1Δ rad51Δ     164             32.9 (54)            0.6 (1)            33.5 (55)
sgs1Δ rad52Δ     80              35 (28)              0 (0)              35 (28)
CAG orientation
WT               217             2.8 (6)              1.4 (3)            4.1 (9)
srs2Δ            231             6.5 (15)*            5.6 (13)**         12.1 (28)
sgs1Δ            236             8.9 (21)**           1.7 (4)            10.6 (25)
srs2Δ rad51Δ     144             3.5 (5)              0.7 (1)            4.2 (6)
sgs1Δ rad51Δ     144             6.9 (10)             0 (0)              6.9 (10)
sgs1Δ rad52Δ     164             11.6 (19)**          0.6 (1)            12.2 (20)

^a Numbers in parentheses indicate the number of clones in each class. *, P ≤ 0.05; **, P ≤ 0.01 (Fisher's exact test).
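The significance marks in Table 1 come from Fisher's exact test on clone counts. The sketch below shows one way such a comparison could be computed with the standard library only, using the CTG-orientation counts from Table 1 (mutant versus wild type). It is an illustration under assumptions: the two-sided convention used here (summing all tables no more probable than the observed one) is a common choice, and the source does not say which variant or software the authors used.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are strains; columns are clones with / without the event.
    The p-value sums the hypergeometric probabilities of every table
    (same margins) that is no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Counts from Table 1, CTG orientation: (clones with event, clones without).
wt_contractions = (3, 141 - 3)
srs2_contractions = (36, 62 - 36)
sgs1_contractions = (49, 104 - 49)
wt_expansions = (0, 141)
srs2_expansions = (5, 62 - 5)
sgs1_expansions = (0, 104)

p_srs2_contr = fisher_exact_two_sided(*srs2_contractions, *wt_contractions)
p_sgs1_contr = fisher_exact_two_sided(*sgs1_contractions, *wt_contractions)
p_srs2_exp = fisher_exact_two_sided(*srs2_expansions, *wt_expansions)
p_sgs1_exp = fisher_exact_two_sided(*sgs1_expansions, *wt_expansions)

for label, p in [
    ("srs2 contractions vs WT", p_srs2_contr),
    ("sgs1 contractions vs WT", p_sgs1_contr),
    ("srs2 expansions vs WT", p_srs2_exp),
    ("sgs1 expansions vs WT", p_sgs1_exp),
]:
    print(f"{label}: P = {p:.3g}")
```

Under this convention the three comparisons marked ** in Table 1 fall below 0.01, while the sgs1Δ expansion comparison (no events in either strain) gives P = 1.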

To detect a possible effect of the orientation on cell growth in both mutant backgrounds, we made serial dilutions on plates containing 200 mM hydroxyurea. Hydroxyurea slows down replication fork progression (ref. 37) by inhibiting ribonucleotide reductase (ref. 38). Reduced growth on hydroxyurea plates was visible in srs2Δ cells containing CTG repeats at ARG2, as compared to an isogenic strain containing no repeat (Fig. 2b). Similarly, although sgs1Δ cells were sensitive to hydroxyurea, this phenotype was more severe in sgs1Δ cells containing repeats in the CTG orientation. In the CAG orientation, no growth defect due to the presence of the repeats was detected. This suggests that both helicases are needed to help replicate CTG repeats, perhaps by unwinding structures formed by these repeats (ref. 39) that could lead to lesions such as double- or single-strand breaks, or yet other kinds of lesions. The high level of instability detected in srs2Δ and sgs1Δ mutants in the CTG orientation, as compared to the lower level for the CAG orientation (Table 1), would reflect this difference.

Repeat instability in both mutants depends on recombination
To determine whether trinucleotide repeat instability in the CTG orientation was dependent on homologous recombination, we deleted RAD51 or RAD52 in srs2Δ and sgs1Δ mutants and in the wild-type strain. At the ARG2 locus, trinucleotide repeats were stable in the rad52Δ and rad51Δ single mutants (Table 1). In sgs1Δ rad52Δ, srs2Δ rad52Δ and srs2Δ rad51Δ double mutants, trinucleotide repeats were as stable as in the wild-type strain. Thus, the high level of instability observed in both mutants is mediated by homologous recombination (Table 1). The (CTG)70 repeats on the YACs showed a similar profile, where the increased frequency of expansions observed in srs2Δ cells dropped down to a frequency indistinguishable from that of the wild-type level in srs2Δ rad51Δ cells (Table 2, P = 0.4512), indicating that the expansion events that occurred in srs2Δ cells were dependent on Rad51-mediated recombination.

Figure 3 Analysis of replication intermediates at the ARG2 locus by two-dimensional gel electrophoresis (first dimension: size; second dimension: size and structure). (a) The ARG2 locus in strains without repeats is depicted, showing the positions of the two ClaI sites used to digest total DNA. The X spike signal values are shown as the ratio of the signal at each time point over the signal at 40 min. An enlargement of the 40-min time point is shown to the left, along with a cartoon depicting the types of molecules visualized on the gel. (b) The ARG2 locus in the wild-type strain GFY117 containing the (CTG)55 repeat tract is depicted. The same enzymatic restriction, probe and quantification as above were used. The Y arc (white bars) and the joint molecule (gray bars) signal values are shown as the percentage of Y arc signal over total signal, and as the ratio of joint molecule (JM) signal over Y arc signal, respectively. An enlargement of the 40-min time point is shown to the left, along with a cartoon depicting the different types of molecules visualized on the gel, the spike and the cone being the joint molecules. The position at which the CTG repeats are inserted is shown by a white arrowhead.
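The three quantities defined in the Figure 3 legend are simple ratios of gel signal intensities. A minimal sketch of that arithmetic follows; the intensity values are hypothetical placeholders chosen only to make the example run, not measurements from the study, and the function names are assumptions.

```python
def y_arc_percent(y_arc: float, total: float) -> float:
    """Y arc signal as a percentage of the total signal in the lane."""
    return 100.0 * y_arc / total


def jm_over_y_arc(joint: float, y_arc: float) -> float:
    """Joint molecule (JM) signal expressed relative to the Y arc signal."""
    return joint / y_arc


def normalize_to_reference(series: dict, reference_time: int = 40) -> dict:
    """Express each time point as a ratio over the reference time point."""
    reference = series[reference_time]
    return {t: value / reference for t, value in sorted(series.items())}


# HYPOTHETICAL intensities (arbitrary units), keyed by minutes after release.
x_spike_signal = {40: 12.0, 60: 6.0, 90: 3.0}
lanes = {
    30: {"total": 1000.0, "y_arc": 50.0, "joint": 2.0},
    40: {"total": 1000.0, "y_arc": 90.0, "joint": 9.0},
    60: {"total": 1000.0, "y_arc": 60.0, "joint": 3.0},
}

x_spike_relative = normalize_to_reference(x_spike_signal)
summary = {
    t: {
        "y_arc_pct": y_arc_percent(lane["y_arc"], lane["total"]),
        "jm_over_y": jm_over_y_arc(lane["joint"], lane["y_arc"]),
    }
    for t, lane in sorted(lanes.items())
}
print(x_spike_relative)
print(summary)
```

Expressing joint molecules relative to the Y arc, rather than to total signal, corrects for how many forks are passing through the fragment at each time point.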


The results were more complex for the sgs1Δ mutant. The increased contractions observed at the ARG2 locus in sgs1Δ cells were suppressed by a deletion of RAD52 or RAD51, indicating that these contractions were triggered by homologous recombination (Table 1). However, in the sgs1Δ rad51Δ double mutant, the frequency of expansions was substantially higher than in the wild-type strain or either of the single mutants. Although it is possible to generate expansions by single-strand annealing (SSA) when both SGS1 and RAD51 are inactivated, previous studies showed that mostly contractions were generated by such a mechanism (ref. 5). Further investigations will be needed to clarify the precise mechanism by which expansions occur in this strain background. We also deleted RAD51 or RAD52 in the sgs1Δ mutant, on the YAC. The CAG contractions observed in the sgs1Δ strain were not markedly reduced by the elimination of Rad51 or Rad52 (Table 2). We conclude that repeat instability induced by deletion of SGS1 seems to be only partially dependent on homologous recombination, an observation reminiscent of the CTG repeat fragility we observed in sgs1Δ cells, which was also only partially dependent on homologous recombination (Fig. 1c).

Altogether, we found that all phenotypes (repeat expansions and contractions and chromosomal fragility) that showed increased frequency in srs2Δ strains were dependent on the presence of a functional homologous recombination machinery. However, the repeat instability and fragility occurring in the sgs1Δ strain were only partially dependent on recombination, with the results dependent on the type of instability and on the orientation of the repeat.

Analysis of replication and recombination intermediates by two-dimensional gels
To investigate replication and recombination intermediates during trinucleotide repeat replication, we used two-dimensional gel electrophoresis. In the wild-type strain containing no repeat tract, we observed a Y arc corresponding to replication forks progressing through the ARG2 locus (Fig. 3a). In addition, we detected X-shaped molecules that migrate in a similar manner to Holliday junctions or hemicatenanes and appear and disappear with the Y arc. To determine whether these structures were recombination intermediates, we performed the same experiment in a rad51Δ strain. In this strain, the X spike was still detectable, showing that these molecules are not RAD51-dependent recombination intermediates and suggesting that they are hemicatenanes (ref. 40) (although we cannot formally exclude that they are some kind of RAD51-independent recombination intermediate).

Hemicatenanes were not visible in the wild-type strain containing the trinucleotide repeat tract; instead, they were replaced by two other kinds of structured molecules: a spike-like shape migrating above the Y arc and a conical shape emanating from where the trinucleotide repeat tract is located, on the descending Y arc (Fig. 3b). These structured molecules also appear and disappear with the Y arc, indicating that they are formed during replication of the ARG2 locus and are removed afterwards. These structures migrate in a similar manner to joint molecules, which would be slightly retarded in the second dimension.

In the sgs1Δ and srs2Δ strains, progression of the replication fork is similar to the wild type, with the Y arc peaking in intensity at around 40 min (Fig. 4). Joint molecules were visible in both the srs2Δ and sgs1Δ mutants, but their amount was reduced as compared to wild type. In the srs2Δ strain, the amount of joint molecules was significantly reduced two- to four-fold as compared to wild type at all time points (P = 0.0087). In the sgs1Δ strain, the amount of joint molecules as compared to Y arc is significantly reduced two-fold at 40 min and 60 min (P = 0.0465). Formation of these joint molecules is therefore partially dependent on both SRS2 and SGS1.
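The P values quoted here come from Mann-Whitney tests (see the Figure 4 legend). As a minimal sketch of how such a rank-based comparison works, the following Python code computes the U statistic and an exact two-sided P value by enumerating every possible assignment of the pooled values to the two groups. The JM/Y arc ratios are hypothetical placeholders chosen for illustration, not measurements from this study, and the function names are our own.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x: number of (xi, yj) pairs with xi > yj; ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact P value: fraction of relabelings whose U is at least as far
    from its null expectation (n*m/2) as the observed U."""
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    center = n * m / 2.0
    observed = abs(mann_whitney_u(x, y) - center)
    extreme = 0
    total = 0
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(gx, gy) - center) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical JM/Y arc ratios (illustrative only, not data from the paper).
wild_type = [0.62, 0.71, 0.55, 0.80, 0.67]
mutant = [0.21, 0.30, 0.18, 0.25, 0.33]

u = mann_whitney_u(wild_type, mutant)
p = exact_two_sided_p(wild_type, mutant)
print(f"U = {u}, exact two-sided P = {p:.4f}")
```

Exhaustive enumeration is practical only for the small sample sizes typical of gel quantifications; for larger samples a normal approximation or a statistics library would be the usual choice.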

To determine whether joint molecule formation was dependent on homologous recombination, we analyzed their amounts in a rad51Δ strain and in the sgs1Δ rad51Δ and srs2Δ rad51Δ double mutants, focusing on the 40-min and 60-min time points (Fig. 4). In the rad51Δ strain and in sgs1Δ rad51Δ cells, joint molecule formation was not statistically different from what was observed in the wild type (P > 0.05). However, in the srs2Δ rad51Δ strain, the amount of joint molecules was significantly decreased three- to five-fold compared to the rad51Δ mutant (P = 0.0418) and three- to four-fold compared to the wild type (P = 0.0475), but was not statistically different from the amount in the srs2Δ single mutant (P > 0.05). This shows that SRS2 is epistatic to RAD51 for joint molecule formation, and that RAD51 is epistatic to SGS1 for the same process. Joint molecules could be reversed replication forks; therefore, SRS2 would act first at the replication fork, perhaps to promote replication fork reversal when damage is present. The bacterial Srs2 homolog, UvrD, has been shown to facilitate the reversal of stalled forks by clearing inappropriate binding of recombination proteins such as the Rad51 homolog, RecA41. If fork reversal does not occur properly, the damaged fork will be taken care of by Rad51-mediated homologous recombination, to be eventually resolved by Sgs1. We therefore propose that joint molecules are a mixture of reversed replication forks and Holliday junctions (Fig. 5, and see Discussion).

Figure 4 Analysis of replication intermediates at ARG2 by two-dimensional gel electrophoresis in wild-type (WT) and mutant strains. The ARG2 locus in the wild-type strain GFY117 containing the (CTG)55 repeat tract is depicted as in Figure 3. Representative two-dimensional gels for each time point are shown to the left. Y arc and joint molecule quantifications were performed as in Figure 3. Asterisks above graph bars indicate a significant difference between the wild type and the mutants (Mann-Whitney test: **, P < 0.01; *, P < 0.05).

[Figure 4 panels: map of the ARG2 locus (ARS1010, JEM1, PSF2, YJL070c, ARG2Δ, TRP1; ClaI sites; 7.2 kb and 5.7 kb); two-dimensional gels at 30′, 40′ and 60′ for WT, sgs1Δ, srs2Δ, rad51Δ, sgs1Δ rad51Δ and srs2Δ rad51Δ; plots of Y arc (%CEN X) and JMs/Y arc against time (min).]


NATURE STRUCTURAL & MOLECULAR BIOLOGY VOLUME 16 NUMBER 2 FEBRUARY 2009 1 6 3


DISCUSSION
SRS2 and SGS1 both stabilize long CAG·CTG repeats
Previous studies25 using (CAG·CTG)13 or (CAG·CTG)25 repeats showed that repeat expansions in srs2Δ cells were mostly independent of RAD51. Therefore, the authors proposed that a nonrecombinational pathway, for example, slippage of the 3′ end of a nascent strand combined with hairpin formation, generates expansions in srs2Δ cells42. In the present work, we show that size changes in srs2Δ cells occur mainly by homologous recombination. We therefore propose that when repeat size reaches a given threshold (between 25 and 55 triplets), expansions in srs2Δ cells occur mainly by homologous recombination between sister chromatids (because strains are haploid).

In the same work25, SGS1 was shown to have no effect on (CAG·CTG)25 trinucleotide repeats, whereas, in the present study, the frequency of contractions was substantially increased in sgs1Δ strains. Therefore, there seems to be a size threshold above which SGS1 is important to maintain trinucleotide repeat stability. This idea is strengthened by an experiment using (CTG)40 repeats, for which no contractions were observed out of 92 colonies analyzed in the sgs1Δ mutant (data not shown), suggesting that 40 repeats is under the threshold requiring a functional SGS1 gene. Some contractions in the sgs1Δ mutant may be generated by SSA, as this is an efficient pathway to generate contractions between two (CAG·CTG) repeats5. Sgs1 is also involved in rejection of mismatched SSA, suggesting a role for Sgs1 in unwinding SSA intermediates43,44 and preventing SSA events that would lead to contractions (Fig. 5, above right).

We note that, although the results from both experimental systems (YAC and chromosome) are in good agreement, the level of instability is higher on the YAC than on the chromosome. However, it is well known that cis-acting effects such as chromosomal location and repeat length have major roles in regulating trinucleotide repeat stability (reviewed in refs. 3,45). Our results suggest that YAC repeats are more unstable either because they are slightly longer (70 triplets against 55 on the chromosome) or because they are located in a chromosomal environment that favors instability.

Notably, Srs2 had a stronger, more specific role in preventing fragility in the CAG orientation. In this orientation, CTG hairpins will occur on the nascent lagging strand, suggesting that Srs2 has an important role in preventing inappropriate recombination on this strand, which could lead to expansions and fork breakdown. In contrast, Sgs1 was more important in preventing fragility in the CTG orientation, suggesting that it may have a role in unwinding hairpins that form on the lagging-strand template. WRN, the human homolog of Sgs1, is known to interact with Polymerase δ and facilitate replication through CGG hairpin structures46,47, and Sgs1 helicase has the correct polarity to track along the lagging-strand template in the 3′-to-5′ direction during lagging-strand replication to unwind CTG hairpins. Persistence of the template hairpin could lead to fork stalling and breakdown, explaining the fragility observed in sgs1Δ mutants, or the template hairpin could be bypassed, leading to contractions.

Possible stabilization of trinucleotide repeats by reversed forks
We initially postulated that hairpins formed by trinucleotide repeats could impede replication, therefore creating more opportunities to stall the fork or to promote the formation of single-stranded gaps on repeat-containing DNA. This hypothesis is supported by the slower growth rate of cells carrying CTG repeats in the presence of hydroxyurea (Fig. 2). Previous studies showed a weak, diffuse pausing signal on two-dimensional gels for plasmid-borne (CTG)80 repeats in yeast, as compared to the strong pausing signal observed for (CGG)40 or (CCG)40 repeats48. In bacteria, the pausing signal at (CTG)70 repeats can be clearly detected only when protein synthesis is blocked by chloramphenicol49. In our experiments, we did not observe any strong pause of the replication fork near or within a chromosome-borne (CTG)55 trinucleotide repeat tract in the wild type or any of the mutant strains studied, although we cannot exclude that a weak and/or transient pause exists. If this transient pause is rapidly converted into joint molecules, this would preclude its detection as a spot on the Y arc.

Figure 5 A model showing different pathways to repair replication fork damage due to structure-forming sequences. Srs2 and Sgs1 helicases act at the fork to facilitate replication across structure-forming sequences on the CTG strand (above). Srs2 can facilitate fork reversal, perhaps by removing Rad51 from damaged forks, to allow the damage to be bypassed and the fork to restart in a manner that prevents breakage and repeat-length changes (left arrow). Sgs1 may also help to stabilize a replisome stalled at the CAG·CTG repeats. In the absence of either of these helicases, single-strand gaps or double-strand breaks result (above). Single- or double-strand breaks can be processed by a Rad51-dependent sister chromatid recombination pathway, leading to recombination intermediates that are dissolved by Sgs1–Top3 (middle). Slippage associated with DNA synthesis of CAG·CTG repeats can lead to repeat contractions or expansions7. Double-strand breaks may be repaired by homologous recombination, leading to trinucleotide repeat instability (right downward arrow), by Rad51-independent single-strand annealing (right, above), or, if unrepaired, will result in loss of a chromosome arm as observed in our fragility assay. Joint molecules observed by two-dimensional gels (shown inside the gray box) correspond mainly to reversed forks but may also represent some Rad51-dependent sister chromatid recombination intermediates (see text for details). Alternatively, damage may be processed by a template-switching mechanism, as proposed previously60, leading to similar recombination intermediates. In the absence of both Sgs1 and Rad51 proteins, instability occurs by another pathway, in which replication slippage following formation of unresolved secondary structures on the lagging strand or on its template leads to contractions or expansions (dotted arrow, above left).

[Figure 5 diagram labels: replication slippage; fork reversal; sister chromatid recombination; resolution and replication restart; repeats stabilized; repeat expansions or contractions; gene conversion with or without contractions or expansions; single-strand gap; fork cleavage or breakdown; double-strand break; single-strand annealing; possible joint molecules; proteins shown: Srs2, Sgs1, Top3, Rad51, Rad52.]

The joint molecules detected migrated in a similar manner to reversed forks seen in rad53 yeast mutants50 or hemicatenanes detected during replication near ARS305 in yeast40. Joint molecules were barely detectable in the srs2Δ mutant and in the srs2Δ rad51Δ double mutant (Fig. 4), indicating that Srs2 is involved in their formation or processing. In addition, joint molecules were still visible in the rad51Δ mutant. Therefore, it is unlikely that they represent classical recombination intermediates. We propose that the joint molecules observed are mostly molecules resulting from replication fork reversal (Fig. 5). In support of this hypothesis, it was very recently shown that reversed replication forks appear during replication from a bacteriophage T4 chromosomal origin: in the presence of the gp46 nuclease, there is a transient accumulation of intermediates, forming a conical shape rather than a discrete spike, similarly to what we have observed in the presence of trinucleotide repeats (Fig. 3b). This conical shape contains reversed replication forks whose double-stranded ends have been partially resected by the gp46 nuclease51.

A recent study52 supports the hypothesis that replication fork reversal occurs frequently during replication of trinucleotide repeats in a synthetic replication fork model. We must point out that replication fork reversal can occur in linear DNA but is restrained in topologically closed DNA53,54. Therefore, in vivo, one must speculate that single- or double-strand breaks occur to release topological constraints and allow fork reversal.

In the sgs1Δ mutant, we observed a significant decrease in the amount of joint molecules, with this decrease being partially dependent on the presence of RAD51 (Fig. 4). This is different from what was observed in a previous study in budding yeast, in which X-shaped intermediates accumulated in sgs1Δ cells55. Therefore, it is unlikely that the two kinds of molecules are similar. Sgs1 is therefore likely to be involved in stabilizing CTG repeat–containing replication forks, allowing possible subsequent formation of reversed forks. Consistent with the observation that Sgs1 contributes to replisome stability56, in its absence CTG repeat–containing forks would break, leading to a reduction in the formation of reversed forks, and hence of joint molecules, and to a corresponding increase in repeat fragility.

Various mechanisms have been proposed to lead to trinucleotide repeat expansions, all of which are based on the basic idea that trinucleotide repeats form stable hairpins during different DNA metabolic processes2–4,45. We propose that unrestricted recombination that occurs subsequent to replication through these structure-forming sequences is another factor contributing to the expansion of long CAG·CTG repeats.

METHODS
Strains. Strains used in this study are haploid and isogenic to the S288c strain except for the mutations indicated (Supplementary Table 2 online). Strains containing trinucleotide repeats in the CTG orientation were derivatives of the GFY117 strain6. Strains containing trinucleotide repeats in the CAG orientation were built by transforming linearized plasmid pTRI131 into wild-type, srs2Δ or sgs1Δ cells. We built the plasmid pTRI131 by flipping the trinucleotide repeat tract in pTRI110 (ref. 6). Deletions were done by PCR-mediated gene replacement. SRS2 and SGS1 deletions were made in the GFY117 strain by transformation of PCR fragments containing the HIS3 selection marker and short flanking homologies, to give rise to GFY120 and GFY121 strains. We generated RAD51 and RAD52 deletions by transformation of the KANMX marker flanked by short homologies, in strains GFY117, GFY120 or GFY121. For the fragility experiments on the YAC, SRS2, SGS1 and RAD51 single mutants in the BY4742 strain were obtained from the yeast MATα deletion set. The double mutants derived from BY4742 were generated by transformation of HISMX6 marker flanked by short (40-bp) homologies. To create YAC-containing strains, we introduced YACs with or without CAG·CTG repeats into the various strains via mating with a kar1-1 strain. After introduction of the YAC, repeat length and the genotype of the strain containing the YAC were confirmed by both genetics and PCR analysis. Transformants were screened and CAG·CTG tract length was verified by PCR and by Southern blot. We created the strains with a YAC carrying the repeats in the CTG orientation by the following method. Wild-type yeast strains that carried the YAC with no repeats were plated on a 5-fluoroorotic acid (FOA)-containing plate to select for cells that had undergone a YAC breakage event. Colonies that grew on the FOA plate were analyzed by Southern blot to confirm the breakage and subsequent healing at the G4T4/C4A4 sequence. Cells with the correct YAC structure were then transformed with a linearized pVS20 plasmid that had the repeats in the CTG orientation and selected for the ability to grow on plates lacking uracil. Transformants were checked by Southern and PCR analysis for the presence of the repeats in the correct orientation.

Molecular analysis of CTG and CAG trinucleotide repeat size at ARG2. For each experiment, a single colony was diluted in water, plated on a YPD plate and incubated at 30 °C for 2 d. From this plate, 12 single colonies were picked and inoculated in 1.8-ml cultures in sterile microplates (ABGene AB-0932) and incubated at 30 °C for 24 h. DNA was extracted directly in microplates, following the standard Zymolyase procedure for yeast cells. All DNA transfers during the preparation process were made by a Hydra-96 syringes automatic microdispenser (Robbins Scientific). After DNA extractions, PCRs were performed in microplates (Sorenson) in an Eppendorf Mastercycler to amplify the repeat tract and its flanking regions. PCR products were migrated on 1.2% agarose gels without ethidium bromide and stained after the migration. Alternatively to PCR, total genomic DNA was digested and trinucleotide repeat sizes were analyzed by Southern blot. Whenever the 12 colonies coming from the same plate showed the same contraction or expansion, we assumed that the rearrangement occurred in the mother colony, before plating, and these clones were not taken into account in our analysis.
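The exclusion rule described above can be sketched in a few lines of Python. This is only an illustration of the scoring logic under stated assumptions: tract sizes are given in triplets, a colony counts as rearranged when its size differs from the parental size, and a plate is discarded when every colony carries the identical altered size. The function names and the example sizes are hypothetical and are not taken from the study.

```python
from collections import Counter


def classify(size, parental):
    """Label one colony relative to the parental tract size (in triplets)."""
    if size < parental:
        return "contraction"
    if size > parental:
        return "expansion"
    return "unchanged"


def score_plate(colony_sizes, parental):
    """Return per-class counts for one plate, or None when every colony shows
    the same rearrangement (assumed to have arisen in the mother colony)."""
    if not colony_sizes:
        raise ValueError("a plate needs at least one colony")
    calls = [classify(s, parental) for s in colony_sizes]
    all_identical = len(set(colony_sizes)) == 1
    if all_identical and calls[0] != "unchanged":
        return None  # plate excluded from the analysis
    return Counter(calls)


# Hypothetical plates of 12 colonies each, parental tract of 55 triplets.
plates = [
    [55] * 11 + [40],        # one independent contraction
    [40] * 12,               # same contraction everywhere: excluded
    [55] * 10 + [62, 31],    # one expansion and one contraction
]

totals = Counter()
for sizes in plates:
    result = score_plate(sizes, parental=55)
    if result is not None:
        totals.update(result)

print(dict(totals))
```

Dropping such plates prevents a single early event from being counted twelve times, at the cost of slightly underestimating the true frequency.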

Molecular analysis of CTG and CAG repeat size and fragility on the YAC. Cells were plated for single colonies on supplemented minimal medium lacking uracil and leucine (YC –Leu –Ura) to maintain selection for the YAC and grown at 30 °C. At least two separate cytoductants (from kar1-1 matings) were tested for each strain. The rate of FOA resistance was calculated by the maximum-likelihood method using SALVADOR software. The healing of the YAC at the G4T4/C4A4 tract was confirmed for a subset of FOA-resistant colonies by Southern blotting (data not shown). For determining the stability of CTG or CAG repeats on the YAC, the repeat tract was PCR amplified from colonies that grew on the YC –Leu plates used in the fragility assay to determine total cell count using CTGrev2 (5′-CCCAGGCCTCCAGTTTGC-3′) and T7 (5′-TAATACGACTCACTATAGGG-3′) primers and an IDPOL polymerase (ID labs). Products were separated on 2% Metaphor gels and the CAG tract size was estimated by using the TotalLab software (Nonlinear Dynamics).
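The text states only that the rate of FOA resistance was obtained by the maximum-likelihood method with SALVADOR. As a rough sketch of what such an estimate involves, and not a reproduction of SALVADOR's own code, the Python example below assumes the classical Luria-Delbrück (Lea-Coulson) model: the probability of observing r resistant colonies in a culture is computed by the Ma-Sandri-Sarkar recursion, the expected number of events per culture (m) is found by maximizing the likelihood, and the rate is taken as m divided by the number of cells per culture. The colony counts and the cell number are hypothetical, and the search assumes the likelihood has a single maximum inside the bracket.

```python
import math


def ld_pmf(m, r_max):
    """Lea-Coulson probabilities p_0..p_r_max for m expected events per culture
    (Ma-Sandri-Sarkar recursion)."""
    p = [math.exp(-m)]
    for r in range(1, r_max + 1):
        p.append(m / r * sum(p[i] / (r - i + 1) for i in range(r)))
    return p


def log_likelihood(m, counts):
    """Log-likelihood of the observed resistant-colony counts for a given m."""
    p = ld_pmf(m, max(counts))
    return sum(math.log(p[r]) for r in counts)


def mle_m(counts, lo=1e-6, hi=None, tol=1e-8):
    """Golden-section search for the m that maximizes the log-likelihood."""
    if hi is None:
        hi = max(counts) + 1.0
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if log_likelihood(c, counts) > log_likelihood(d, counts):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2


# Hypothetical FOA-resistant colony counts from ten parallel cultures.
counts = [0, 1, 0, 3, 2, 0, 12, 1, 0, 5]
cells_per_culture = 2.0e7  # hypothetical total cell count per culture

m_hat = mle_m(counts)
rate = m_hat / cells_per_culture
print(f"m = {m_hat:.3f} events per culture, rate = {rate:.2e} per cell division")
```

A maximum-likelihood estimate uses the whole distribution of counts, so a single culture with many resistant colonies distorts it far less than it would distort a simple mean.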

Orientation effect of CAG·CTG repeats on growth on hydroxyurea. Ten-fold serial dilutions of yeast strains were spotted on plates containing 0.2 M hydroxyurea or on standard glucose plates and allowed to grow for 3 d or 2 d, respectively, at 30 °C.

Two-dimensional gel analyses. The closest replication origin to the ARG2 locus is ARS1010, a well-characterized late replication origin that fires approximately 30 min after the beginning of S phase57,58 and is located 7.2 kb 5′ to the ARG2 gene (Fig. 3). Cells were grown overnight at 30 °C in 200-ml YPD cultures. When concentration reached 10⁷ cells per ml, cells were centrifuged, washed, resuspended in fresh YPD medium at a concentration of 0.75 × 10⁷ cells per ml and grown for another 45 min at 23 °C to slow replication. Afterwards, 2 × 10⁹ cells were synchronized using 3 µg ml⁻¹ α-factor for 2 h at 23 °C (or 2.5 h for rad51Δ cells). G1 arrest was checked by microscope observation. When more than 90% of the cells were arrested, they were centrifuged, washed and resuspended in 200 ml fresh YPD medium at 23 °C. Progression of replication was followed by microscope observation and confirmed by fluorescence-activated cell sorting (FACS) analysis. Cells were harvested after 30 min, 40 min, 60 min and 90 min and killed by addition of sodium azide (0.1% final concentration). Total genomic DNA was extracted by the CTAB procedure59, from cells entering S phase to G2-M phase, and analyzed on two-dimensional gels. DNA was transferred overnight in 10× SSC on a charged nylon membrane (Sigma) and UV cross-linked on a Stratagene Stratalinker. Hybridization was performed with a 750-bp probe to the 5′ end of the ARG2 gene, labeled by random priming. Quantifications were performed on a Phosphorimager, using the ImageQuant software. For quantification of joint molecules, we took into account both the cone signal and the spike signal (as pictured in Fig. 1b).

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS
A.K. is grateful to G. Maffioletti for teaching her the two-dimensional gel electrophoresis technique and helping with her first successful experiments. A.K. and G.-F.R. thank the people who gave them advice concerning two-dimensional gel electrophoresis: M. Foiani, C. Maric, A. Ceschia and A. Kaykov. They also gratefully acknowledge the help of G. Millot for advice concerning the various statistical tests used in this manuscript. They and B.D. also thank their colleagues of the Unité de Génétique Moléculaire des Levures for many fruitful discussions and G. Fischer for careful reading of the manuscript. R.P.A. would like to acknowledge the help of K. Suryanarayanan in installing the SALVADOR program. A.K. was funded by the Ministère de la Recherche and the Fondation pour la Recherche Médicale (FRM). This work was supported by grant 3738 from the Association pour la Recherche contre le Cancer (ARC), grant ANR-05-BLAN-0331 from the Agence Nationale de la Recherche, US National Institutes of Health grant GM063066 to C.H.F., Tufts University FRAC award to C.H.F. and GSC Research Award to R.P.A. B.D. is a member of the Institut Universitaire de France.

AUTHOR CONTRIBUTIONS
A.K. and G.-F.R. conceived of and performed the instability and two-dimensional studies on yeast chromosome X; C.H.F. and R.P.A. conceived of and performed the instability and fragility studies on the YAC, with R.S. contributing the rad51Δ (CAG)0 and (CAG)70 fragility analyses; R.B. and G.L. gave expert assistance with two-dimensional gel electrophoresis; A.K., B.D., G.-F.R., C.H.F. and R.P.A. analyzed the data and wrote the manuscript.

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Gatchel, J.R. & Zoghbi, H.Y. Diseases of unstable repeat expansion: mechanisms and common principles. Nat. Rev. Genet. 6, 743–755 (2005).

2. Mirkin, S.M. DNA structures, repeat expansions and human hereditary disorders. Curr. Opin. Struct. Biol. 16, 351–358 (2006).

3. Lenzmeier, B.A. & Freudenreich, C.H. Trinucleotide repeat instability: a hairpin curve at the crossroads of replication, recombination, and repair. Cytogenet. Genome Res. 100, 7–24 (2003).

4. Pearson, C.E., Edamura, K.N. & Cleary, J.D. Repeat instability: mechanisms of dynamic mutations. Nat. Rev. Genet. 6, 729–742 (2005).

5. Richard, G.-F., Dujon, B. & Haber, J.E. Double-strand break repair can lead to high frequencies of deletions within short CAG/CTG trinucleotide repeats. Mol. Gen. Genet. 261, 871–882 (1999).

6. Richard, G.-F., Cyncynatus, C. & Dujon, B. Contractions and expansions of CAG/CTG trinucleotide repeats occur during ectopic gene conversion in yeast, by a MUS81-independent mechanism. J. Mol. Biol. 326, 769–782 (2003).

7. Richard, G.-F., Goellner, G.M., McMurray, C.T. & Haber, J.E. Recombination-induced CAG trinucleotide repeat expansions in yeast involve the MRE11/RAD50/XRS2 complex. EMBO J. 19, 2381–2390 (2000).

8. Richard, G.-F. & Paques, F. Mini- and microsatellite expansions: the recombination connection. EMBO Rep. 1, 122–126 (2000).

9. Freudenreich, C.H., Kantrow, S.M. & Zakian, V.A. Expansion and length-dependent fragility of CTG repeats in yeast. Science 279, 853–856 (1998).

10. Spiro, C. et al. Inhibition of FEN-1 processing by DNA secondary structure at trinucleotide repeats. Mol. Cell 4, 1079–1085 (1999).

11. Schweitzer, J.K. & Livingston, D.M. Expansions of CAG repeat tracts are frequent in a yeast mutant defective in Okazaki fragment maturation. Hum. Mol. Genet. 7, 69–74 (1998).

12. Loeillet, S. et al. Genetic network interactions among replication, repair and nuclear pore deficiencies in yeast. DNA Repair (Amst.) 4, 459–468 (2005).

13. Gangloff, S., McDonald, J.P., Bendixen, C., Arthur, L. & Rothstein, R. The yeast type I topoisomerase Top3 interacts with Sgs1, a DNA helicase homolog: a potential eukaryotic reverse gyrase. Mol. Cell. Biol. 14, 8391–8398 (1994).

14. Watt, P.M., Hickson, I.D., Borts, R.H. & Louis, E.J. SGS1, a homologue of the Bloom’s and Werner’s syndrome genes, is required for maintenance of genome stability in Saccharomyces cerevisiae. Genetics 144, 935–945 (1996).

15. Richard, G.-F., Kerrest, A., Lafontaine, I. & Dujon, B. Comparative genomics of hemiascomycete yeasts: genes involved in DNA replication, repair, and recombination. Mol. Biol. Evol. 22, 1011–1023 (2005).

16. Wu, L. & Hickson, I.D. The Bloom’s syndrome helicase suppresses crossing over during homologous recombination. Nature 426, 870–874 (2003).

17. Ira, G., Malkova, A., Liberi, G., Foiani, M. & Haber, J.E. Srs2 and Sgs1-Top3 suppress crossovers during double-strand break repair in yeast. Cell 115, 401–411 (2003).

18. Robert, T., Dervins, D., Fabre, F. & Gangloff, S. Mrc1 and Srs2 are major actors in the regulation of spontaneous crossover. EMBO J. 25, 2837–2846 (2006).

19. Adams, M.D., McVey, M. & Sekelsky, J.J. Drosophila BLM in double-strand break repair by synthesis-dependent strand annealing. Science 299, 265–267 (2003).

20. Aboussekhra, A. et al. RADH, a gene of Saccharomyces cerevisiae encoding a putative DNA helicase involved in DNA repair. Characteristics of radH mutants and sequence of the gene. Nucleic Acids Res. 17, 7211–7219 (1989).

21. Rong, L. & Klein, H.L. Purification and characterization of the SRS2 DNA helicase of the yeast Saccharomyces cerevisiae. J. Biol. Chem. 268, 1252–1259 (1993).

22. Krejci, L. et al. DNA helicase Srs2 disrupts the Rad51 presynaptic filament. Nature 423, 305–309 (2003).

23. Veaute, X. et al. The Srs2 helicase prevents recombination by disrupting Rad51 nucleoprotein filaments. Nature 423, 309–312 (2003).

24. Dupaigne, P. et al. The Srs2 helicase activity is stimulated by Rad51 filaments on dsDNA: implications for crossover incidence during mitotic recombination. Mol. Cell 29, 243–254 (2008).

25. Bhattacharyya, S. & Lahue, R.S. Saccharomyces cerevisiae Srs2 DNA helicase selectively blocks expansions of trinucleotide repeats. Mol. Cell. Biol. 24, 7324–7330 (2004).

26. Callahan, J.L., Andrews, K.J., Zakian, V.A. & Freudenreich, C.H. Mutations in yeast replication proteins that increase CAG/CTG expansions also increase repeat fragility. Mol. Cell. Biol. 23, 7849–7860 (2003).

27. Napierala, M., Parniewski, P., Pluciennik, A. & Wells, R.D. Long CTG·CAG repeat sequences markedly stimulate intramolecular recombination. J. Biol. Chem. 277, 34087–34100 (2002).

28. Pluciennik, A. et al. Long CTG·CAG repeats from myotonic dystrophy are preferred sites for intermolecular recombination. J. Biol. Chem. 277, 34074–34086 (2002).

29. Jankowski, C., Nasar, F. & Nag, D.K. Meiotic instability of CAG repeat tracts occurs by double-strand break repair in yeast. Proc. Natl. Acad. Sci. USA 97, 2134–2139 (2000).

30. Gangloff, S., Soustelle, C. & Fabre, F. Homologous recombination is responsible for cell death in the absence of the Sgs1 and Srs2 helicases. Nat. Genet. 25, 192–194 (2000).

31. Fabre, F., Chan, A., Heyer, W.-D. & Gangloff, S. Alternate pathways involving Sgs1/Top3, Mus81/Mms4, and Srs2 prevent formation of toxic recombination intermediates from single-stranded gaps created by DNA replication. Proc. Natl. Acad. Sci. USA 99, 16887–16892 (2002).

32. Freudenreich, C.H., Stavenhagen, J.B. & Zakian, V.A. Stability of a CTG/CAG trinucleotide repeat in yeast is dependent on its orientation in the genome. Mol. Cell. Biol. 17, 2090–2098 (1997).

33. Kang, S., Jaworski, A., Ohshima, K. & Wells, R.D. Expansion and deletion of CTG repeats from human disease genes are determined by the direction of replication in E. coli. Nat. Genet. 10, 213–217 (1995).

34. Zahra, R., Blackwood, J.K., Sales, J. & Leach, D.R.F. Proofreading and secondary structure processing determine the orientation dependence of CAG·CTG trinucleotide repeat instability in Escherichia coli. Genetics 176, 27–41 (2007).

35. Miret, J.J., Passoa-Brandao, L. & Lahue, R.S. Orientation-dependent and sequence-specific expansions of CTG/CAG trinucleotide repeats in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 95, 12438–12443 (1998).

36. Maurer, D.J., O’Callaghan, B.L. & Livingston, D.M. Orientation dependence of trinucleotide CAG repeat instability in Saccharomyces cerevisiae. Mol. Cell. Biol. 16, 6617–6622 (1996).

37. Alvino, G.M. et al. Replication in hydroxyurea: it’s a matter of time. Mol. Cell. Biol. 27, 6396–6406 (2007).

38. Lammers, M. & Follmann, H. Deoxyribonucleotide biosynthesis in yeast (Saccharomyces cerevisiae). A ribonucleotide reductase system of sufficient activity for DNA synthesis. Eur. J. Biochem. 140, 281–287 (1984).

39. Bhattacharyya, S. & Lahue, R.S. Srs2 helicase of Saccharomyces cerevisiae selectively unwinds triplet repeat DNA. J. Biol. Chem. 280, 33311–33317 (2005).

40. Lopes, M., Cotta-Ramusino, C., Liberi, G. & Foiani, M. Branch migrating sister chromatid junctions form at replication origins through Rad51/Rad52-independent mechanisms. Mol. Cell 12, 1499–1510 (2003).


41. Flores, M.J., Sanchez, N. & Michel, B. A fork-clearing role for UvrD. Mol. Microbiol. 57, 1664–1675 (2005).

42. Daee, D.L., Mertz, T. & Lahue, R.S. Postreplication repair inhibits CAG-CTG repeat expansions in Saccharomyces cerevisiae. Mol. Cell. Biol. 27, 102–110 (2007).

43. Goldfarb, T. & Alani, E. Distinct roles for the Saccharomyces cerevisiae mismatch repair proteins in heteroduplex rejection, mismatch repair and nonhomologous tail removal. Genetics 169, 563–574 (2005).

44. Sugawara, N., Goldfarb, T., Studamire, B., Alani, E. & Haber, J.E. Heteroduplex rejection during single-strand annealing requires Sgs1 helicase and mismatch repair proteins Msh2 and Msh6 but not Pms1. Proc. Natl. Acad. Sci. USA 101, 9315–9320 (2004).

45. Richard, G.-F., Kerrest, A. & Dujon, B. Comparative genomics and molecular dynamics of DNA repeats in eukaryotes. Microbiol. Mol. Biol. Rev. 72, 686–727 (2008).

46. Kamath-Loeb, A.S., Johansson, E., Burgers, P.M. & Loeb, L.A. Functional interaction between the Werner Syndrome protein and DNA polymerase δ. Proc. Natl. Acad. Sci. USA 97, 4603–4608 (2000).

47. Kamath-Loeb, A.S., Loeb, L.A., Johansson, E., Burgers, P.M. & Fry, M. Interactions between the Werner syndrome helicase and DNA polymerase δ specifically facilitate copying of tetraplex and hairpin structures of the d(CGG)n trinucleotide repeat sequence. J. Biol. Chem. 276, 16439–16446 (2001).

48. Pelletier, R., Krasilnikova, M.M., Samadashwily, G.M., Lahue, R. & Mirkin, S.M. Replication and expansion of trinucleotide repeats in yeast. Mol. Cell. Biol. 23, 1349–1357 (2003).

49. Samadashwily, G.M., Raca, G. & Mirkin, S.M. Trinucleotide repeats affect DNA replication in vivo. Nat. Genet. 17, 298–304 (1997).

50. Lopes, M. et al. The DNA replication checkpoint response stabilizes stalled replication forks. Nature 412, 557–561 (2001).

51. Long, D.T. & Kreuzer, K.N. Regression supports two mechanisms of fork processing in phage T4. Proc. Natl. Acad. Sci. USA 105, 6852–6857 (2008).

52. Fouche, N., Ozgur, S., Roy, D. & Griffith, J.D. Replication fork regression in repetitive DNAs. Nucleic Acids Res. 34, 6044–6050 (2006).

53. Fierro-Fernandez, M., Hernandez, P., Krimer, D.B. & Schvartzman, J.B. Replication fork reversal occurs spontaneously after digestion but is constrained in supercoiled domains. J. Biol. Chem. 282, 18190–18196 (2007).

54. Fierro-Fernandez, M., Hernandez, P., Krimer, D.B. & Schvartzman, J.B. Topological locking restrains replication fork reversal. Proc. Natl. Acad. Sci. USA 104, 1500–1505 (2007).

55. Liberi, G. et al. Rad51-dependent DNA structures accumulate at damaged replication forks in sgs1 mutants defective in the yeast ortholog of BLM RecQ helicase. Genes Dev. 19, 339–350 (2005).

56. Cobb, J.A., Bjergbaek, L., Shimada, K., Frei, C. & Gasser, S.M. DNA polymerase stabilization at stalled replication forks requires Mec1 and the RecQ helicase Sgs1. EMBO J. 22, 4325–4336 (2003).

57. Nieduszynski, C.A., Knox, Y. & Donaldson, A.D. Genome-wide identification of replication origins in yeast by comparative genomics. Genes Dev. 20, 1874–1879 (2006).

58. Raghuraman, M.K. et al. Replication dynamics of the yeast genome. Science 294, 115–121 (2001).

59. Liberi, G. et al. Methods to study replication fork collapse in budding yeast. Methods Enzymol. 409, 442–462 (2006).

60. Goldfless, S.J., Morag, A.S., Belisle, K.A., Sutera, V.A.J. & Lovett, S.T. DNA repeat rearrangements mediated by DnaK-dependent replication fork repair. Mol. Cell 21, 595–604 (2006).


NATURE STRUCTURAL & MOLECULAR BIOLOGY VOLUME 16 NUMBER 2 FEBRUARY 2009 1 6 7

©2009 Nature America, Inc. All rights reserved.


Helix movement is coupled to displacement of the second extracellular loop in rhodopsin activation

Shivani Ahuja1, Viktor Hornak2, Elsa C Y Yan3,6, Natalie Syrett4, Joseph A Goncalves2, Amiram Hirshfeld5, Martine Ziliox2, Thomas P Sakmar3, Mordechai Sheves5, Philip J Reeves4, Steven O Smith2 & Markus Eilers2

The second extracellular loop (EL2) of rhodopsin forms a cap over the binding site of its photoreactive 11-cis retinylidene chromophore. A crucial question has been whether EL2 forms a reversible gate that opens upon activation or acts as a rigid barrier. Distance measurements using solid-state 13C NMR spectroscopy between the retinal chromophore and the β4 strand of EL2 show that the loop is displaced from the retinal binding site upon activation, and there is a rearrangement in the hydrogen-bonding networks connecting EL2 with the extracellular ends of transmembrane helices H4, H5 and H6. NMR measurements further reveal that structural changes in EL2 are coupled to the motion of helix H5 and breaking of the ionic lock that regulates activation. These results provide a comprehensive view of how retinal isomerization triggers helix motion and activation in this prototypical G protein–coupled receptor.

G protein–coupled receptors (GPCRs) comprise the largest and most diverse superfamily of membrane receptors, with a simple architectural core of seven transmembrane helices (H1 to H7) connected by typically short extracellular and cytoplasmic loops. Sequence variability within the transmembrane helices and extracellular loops allows GPCRs to respond to diverse stimuli, including light and a wide variety of ligands. Small-molecule ligands can bind within the helical core of the receptor, whereas larger peptide and protein ligands bind at the extracellular loops. The second extracellular loop (EL2) in particular has been the target of a number of functional studies indicating that it has an integral role in activation of GPCRs that bind either small molecules or large peptide ligands1–4.

The vertebrate visual pigments are unique in the class A GPCRs in that they are activated by photoreaction of an 11-cis retinylidene chromophore. The retinal is covalently attached via a protonated Schiff base (PSB) within the seven-transmembrane-helix bundle. The crystal structure of rhodopsin indicates that EL2 extends from Trp175 on H4 to Thr198 on H5. The intriguing aspect about the EL2 sequence is that it folds into a highly ordered and stable structure consisting of two short β-strands (β3 and β4) that form a lid over the retinal binding site5,6. EL2 is constrained by a conserved disulfide bond between Cys110 at the end of H3 and Cys187 on β4 that is crucial for the correct folding of rhodopsin7,8. Other than the Cys110-Cys187 disulfide bond, the EL2 sequence is not conserved among the class A GPCRs.

The structure of EL2 in rhodopsin is stabilized by several polar residues that form a well-defined hydrogen-bonded network (Supplementary Fig. 1a online). At the center of this network is Glu181 on the β3 strand. Glu181 is hydrogen-bonded to Tyr192 (β4) and Tyr268 (H6) and is connected through water-mediated hydrogen bonds to Ser186 (EL2) and to Glu113 (H3), the counterion to the retinal PSB6. Glu113 is hydrogen-bonded to the backbone carbonyl of Cys187 (EL2) through a water molecule and is within hydrogen-bonding distance to the hydroxyl group of Thr94 (H2)6. The involvement of Glu113 in this stable hydrogen-bonded network is important in raising the pKa of the Schiff base (above 16)9 and ensuring that it remains protonated in the dark state of rhodopsin10,11. Besides the conserved disulfide bond and the hydrogen-bonding network involving Glu181, there are a striking number of hydrogen-bonding interactions between the β-strands and the ends of the transmembrane helices (for example, Trp175-Ser202, Ser176-Thr198, Arg177-Asp190 and Tyr178-Ala168). Computational studies identified this region as part of a stable folding core of rhodopsin12, suggesting that EL2 is important for maintaining a stable, inactive receptor conformation.

In contrast to the role of EL2 as a stable cap, several studies have suggested that EL2 is dynamic and mediates both receptor activity and ligand binding. It has been proposed that in the C5a receptor, EL2 serves as a negative regulator3, by a mechanism where the loop inserts between the transmembrane helices to block receptor activity and then is released upon ligand binding. Other work suggested that a short EL2 in the melanocortin receptor, which is unable to insert into the helical transmembrane core, leads to a high level of constitutive activation13. In the recent crystal structure of the β2-adrenergic receptor (β2-AR)14, EL2 is not closely associated with the ligand binding site. The β2-AR structure, along with the observation that short loops may be correlated with constitutively active GPCRs, raises the question of whether the role of EL2 as a stable cap is unique in rhodopsin because of the crucial requirement that visual pigments must have very low basal activity in the dark.

Received 31 August 2008; accepted 2 January 2009; published online 1 February 2009; doi:10.1038/nsmb.1549

1Departments of Physics and Astronomy, 2Biochemistry and Cell Biology, Stony Brook University, Stony Brook, New York 11794-5215, USA. 3Laboratory of Molecular Biology and Biochemistry, The Rockefeller University, 1230 York Avenue, New York, New York 10065, USA. 4Department of Biological Sciences, University of Essex, Wivenhoe Park, Essex CO4 3SQ, UK. 5Department of Organic Chemistry, The Weizmann Institute, Rehovot 76100, Israel. 6Present address: Department of Chemistry, Yale University, New Haven, Connecticut 06520, USA. Correspondence should be addressed to S.O.S. ([email protected]).

Here we use 13C magic angle spinning (MAS) NMR spectroscopy to address the position of EL2 in rhodopsin and in the active metarhodopsin II (meta II) intermediate, and show how motion of EL2 is coupled to motion of transmembrane helix H5 and the insertion of Tyr223 into the region of the ‘ionic lock’ between H3 and H6. We obtained retinal-protein and protein-protein distance constraints from NMR measurements for rhodopsin and meta II (Supplementary Fig. 1b and Supplementary Table 1 online), and we used them to perform restrained molecular dynamic simulations to obtain an atomistic model of meta II. Chemical shift measurements of the conserved Cys110-Cys187 disulfide bond and distance measurements between the retinal chromophore and the β4 strand of EL2 are consistent with motion of EL2 away from the agonist all-trans retinal Schiff base upon receptor activation. Mutational studies on Glu181 (EL2) and Met288 (H7) show that the hydrogen-bonding network on EL2 is coupled to the hydrogen-bonding network centered on H5 involving His211, which in turn leads to rearrangement of the intracellular end of H5 in meta II. Together, these results explain how EL2 is a pivotal element in locking the extracellular ends of H5, H6 and H7 in inactive conformations in the dark and how EL2 motion allows the intracellular ends of these helices to shift into active conformations in the light.

RESULTS

Activation of rhodopsin is initiated by photoisomerization of its retinal chromophore within a tightly packed protein environment. Because the all-trans retinal chromophore in the active meta II intermediate does not fit in the retinal binding site of the dark state of rhodopsin15, conformational changes of a highly strained retinal must induce changes in the structure of the receptor to release the absorbed light energy.

EL2 is displaced from the retinal binding site in meta II

The first indication that the structure or position of EL2 changes in meta II arises from the large chemical shift changes observed for 13Cβ-Ser186, 13Cβ-Cys187 and 13Cα-Gly188 (Fig. 1a). The Cys110-Cys187 disulfide bond is the only conserved feature in EL2. Figure 1b presents 13C dipolar assisted rotational resonance (DARR) NMR spectra of rhodopsin (black) and meta II (red) labeled with 13Cβ-cysteine. The β-carbon resonances in disulfide bonds occur in a unique chemical shift window (34–50 p.p.m.) and are sensitive to the secondary structure with a range of 34–43 p.p.m. for α-helices and 36–50 p.p.m. for β-sheets16. Figure 1b shows strong cross-peaks between the Cys110-Cys187 β-carbon resonances at 36.4 p.p.m. and 46.8 p.p.m., respectively. The 46.8-p.p.m. chemical shift of Cys187 is consistent with its location in the β4 strand of EL2. Upon conversion to meta II, the Cys187 resonance shifts to 50.1 p.p.m. owing to a change in the conformation of EL2 or a change in the environment around Cys187. The chemical shift of Cys110 does not change appreciably (−0.2 p.p.m.), indicating that the secondary structure of H3 near Cys110 does not change in meta II.
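The chemical shift windows quoted above can be read as a simple lookup. The following Python sketch is illustrative only: the function name and the strict interval comparison are assumptions, and the windows are those stated in the text for disulfide-bonded cysteine β-carbons.

```python
# Illustrative sketch: which secondary structures are compatible with a
# disulfide-bonded cysteine 13C-beta chemical shift, using only the windows
# quoted in the text (34-43 p.p.m. for alpha-helices, 36-50 p.p.m. for
# beta-sheets). Names and the strict comparison are assumptions.

SHIFT_WINDOWS_PPM = {
    "alpha-helix": (34.0, 43.0),
    "beta-sheet": (36.0, 50.0),
}


def compatible_structures(shift_ppm):
    """Return the secondary structures whose window contains shift_ppm."""
    return [
        name
        for name, (low, high) in SHIFT_WINDOWS_PPM.items()
        if low <= shift_ppm <= high
    ]


if __name__ == "__main__":
    for residue, shift in [("Cys110", 36.4), ("Cys187", 46.8)]:
        print(residue, shift, compatible_structures(shift))
```

With these windows, the 46.8-p.p.m. value falls only in the β-sheet range, whereas 36.4 p.p.m. lies in the overlap of the two ranges, so the shift alone does not discriminate between them for Cys110.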

In addition to the chemical shift changes observed in Cys187, we observed a ~1.6-p.p.m. change in the 13Cβ chemical shift of Ser186 and a 2.9-p.p.m. change in the 13Cα resonance of Gly188 (Supplementary Figs. 2 and 3 online). The 13Cβ-Ser186 chemical shift change may be attributed to a change in the hydrogen-bonding interaction of Ser186 with surrounding residues on EL2 and H3, whereas the 13Cα-Gly188 chemical shift is likely to be due to changes in backbone torsion angles.

Confirmation of the motion of EL2 away from the retinal binding site in meta II comes from direct distance measurements. The β4 strand of EL2 is aligned almost parallel to the retinal in the binding site, with Cys185 close to the PSB end of the retinal and with Ile189 close to the retinal β-ionone ring. We observed close contact between the retinal 13C14 and 13C15 carbons and 13Cβ-Ser186 (Fig. 2a), between the retinal 13C12 and 13C20 carbons and 13C1-Cys187 (Fig. 2b), and between the retinal 13C12 and 13C20 carbons and 13Cα-Gly188 in rhodopsin (Fig. 2c). These contacts are lost in meta II. Moreover, we were not able to observe contacts in rhodopsin or meta II between the retinal 13C9 and 13C12 carbons and U-13C6-Ile189 (Fig. 2d).

As indicated above, in general we found that the distances obtained from NMR measurements on rhodopsin were comparable with the corresponding distances in the rhodopsin crystal structure before converting to meta II. The meta II intermediate that we trapped at low temperature in n-dodecyl-β-D-maltoside (DDM) was present in a single, well-defined state (Methods). We typically observed strong cross-peaks for 13C-13C distances of ~4.0 Å or less, moderate cross-peaks for distances of up to 5.0 Å and weak cross-peaks for distances of up to 6.0 Å. Consequently, the lack of contacts in meta II indicates that retinal-EL2 distances are on the order of 6.0 Å or more. In rhodopsin, we observed strong contacts between the 13C1-Cys187 on EL2 and the retinal 13C12 and 13C20 carbons (Fig. 2b). In the rhodopsin crystal structure6, Cys187 is 4.21 Å and 6.22 Å from the retinal C12 and C20 carbons, respectively. On conversion to meta II, we lost both retinal contacts with Cys187, consistent with an increase in separation between EL2 and the retinal.
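The approximate distance cut-offs described above can be summarized as a small classifier. This is a minimal sketch, assuming the quoted values are used as hard thresholds; the function name is hypothetical, and observed intensities depend on factors not modeled here.

```python
# Illustrative sketch: map a 13C-13C distance (in angstroms) to the
# cross-peak intensity typically expected in the DARR spectra, using the
# approximate cut-offs quoted in the text. Treating them as hard thresholds
# is an assumption; the function name is hypothetical.


def expected_cross_peak(distance_angstrom):
    """Classify a distance as 'strong', 'moderate', 'weak' or 'none'."""
    if distance_angstrom < 0:
        raise ValueError("distance must be non-negative")
    if distance_angstrom <= 4.0:
        return "strong"
    if distance_angstrom <= 5.0:
        return "moderate"
    if distance_angstrom <= 6.0:
        return "weak"
    return "none"


if __name__ == "__main__":
    for d in (3.5, 4.5, 5.5, 7.0):
        print(f"{d:.1f} A -> {expected_cross_peak(d)}")
```

In this reading, an absent cross-peak corresponds to the 'none' class, which is how the loss of retinal-EL2 contacts in meta II is interpreted as distances of roughly 6.0 Å or more.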

Further support for separation between the retinal and EL2 in meta II comes from (i) the loss of tyrosine-glycine contacts in meta II and (ii) assignment of a cross-peak at 46.5 p.p.m. between the 13C20 methyl carbon on the retinal and a 13Cα-glycine residue. There are two tyrosine-glycine contacts that connect EL2 with the extracellular ends of transmembrane helices H3 and H6, namely Tyr268-Gly188 and Tyr178-Gly114. Both contacts are lost in meta II (Supplementary Fig. 3). There are only two glycines in the binding cavity close to the C20 methyl group: Gly114 on H3 and Gly188 on EL2 (Fig. 3). In meta II, we assign the C20-glycine cross-peak to Gly114 (H3) based on the presence of this resonance in the two-dimensional DARR spectrum of the G188A mutant of meta II. The assignment of a C20-Gly114 contact in meta II indicates that the C20-Gly188 contact is lost despite the large rotation of the C20 methyl group toward EL2 (Supplementary Figs. 4 and 5 online).

Figure 1 Structural changes involving the conserved Cys110-Cys187 disulfide link on activation of rhodopsin. (a) View of the β4 strand of EL2 from the rhodopsin crystal structure6 highlighting the interactions of Ile189, Gly188, Cys187 and Ser186 with the polyene chain of the retinal. Cys110 on the extracellular end of H3 forms a conserved disulfide link with Cys187 in β4. (b) A region from the two-dimensional DARR NMR spectrum of rhodopsin, selectively labeled with 13Cβ-cysteine. The figure highlights the cross-peak between Cys187 (46.8 p.p.m.) and Cys110 (36.4 p.p.m.) in rhodopsin (black). On conversion to meta II (red), there is a distinct shift in the cross-peak to 50.1 p.p.m. for Cys187. The 13Cβ chemical shift of Cys110 at ~36 p.p.m. does not change appreciably between rhodopsin and meta II. The eight reduced cysteines in rhodopsin are observed as a broad resonance at ~25 p.p.m. (not shown).

The model in Figure 2 shows the crystal structure of rhodopsin containing the 11-cis (red) retinal PSB tightly packed against EL2. The distances between the C20 methyl group and the 13C-labeled positions on Gly114, Cys187 and Gly188 are shown. Cross-peaks between the retinal C20 methyl group and each of these amino acids were observed in the dark. We superimposed the position of the all-trans retinal SB (orange) in meta II predicted using restrained molecular dynamic simulations (Supplementary Table 1). To satisfy distance constraints derived from our NMR measurements, in the molecular dynamic simulations the retinal shifted slightly toward the cytoplasmic side of the binding site and EL2 moved toward the extracellular surface.

Rearrangement of hydrogen-bonding networks involving EL2-H5

The loss of EL2-retinal contacts in meta II and the changes observed in the chemical shifts for the β4 strand indicate that EL2 changes position upon receptor activation. The next question was therefore whether the hydrogen-bonding network involving EL2 remains intact or is disrupted in meta II.

Tyrosine residues are an integral part of the EL2 hydrogen-bonding network6 (Fig. 3). The 13Cζ resonances of the 18 tyrosines in rhodopsin are not resolved (Fig. 4a, black). However, the difference spectrum between rhodopsin and meta II highlights the 13Cζ-tyrosine resonances that change upon rhodopsin activation (Fig. 4b). There are two well-resolved shoulders in the meta II portion of the difference spectrum (Fig. 4b). The distinct meta II resonance at 153.6 p.p.m. is readily assigned to Tyr206 on H5 on the basis of the loss of a Cζ-tyrosine resonance at 153.6 p.p.m. in the meta II component of the Y206F mutant difference spectrum (Fig. 4c). Additional support for this assignment is provided in Supplementary Figure 6 online. The upfield shift of the 13Cζ-Tyr206 resonance is consistent with a weaker Cζ-OH hydrogen bond in meta II.

The downfield resonance at 159.3 p.p.m. is reflective of a more strongly hydrogen-bonded tyrosine17. Both tyrosines with unusual chemical shifts must be coupled to the hydrogen-bonding network involving Glu181 on EL2, because both resonances were lost in the tyrosine difference spectrum of the E181Q mutant (Fig. 4d). There is no evidence for a tyrosinate anion18, which would have shown a chemical shift closer to 165 p.p.m.17.

Figure 2 Two-dimensional 13C DARR NMR spectra of retinal-EL2 interactions. Rows from the two-dimensional 13C DARR NMR spectra of rhodopsin (black) and meta II (red) are shown. (a) Rhodopsin labeled with 13Cβ-serine and 13C14,15-retinal. Cross-peaks are observed between Ser186 (63.3 p.p.m.) and the 13C14 and 13C15 resonances in dark rhodopsin, which are lost (arrows) in meta II. (b) Rhodopsin labeled with 13C1-cysteine and 13C12,20-retinal. Cross-peaks are observed between Cys187 (170.8 p.p.m.) and the 13C12 and 13C20 resonances in dark rhodopsin, which are lost (arrows) in meta II. Asterisks correspond to MAS side bands. (c) Rhodopsin labeled with 13Cα-glycine and 13C12,20-retinal. Cross-peaks are observed between Gly188 (42.0 p.p.m.) and the 13C12 and 13C20 resonances in dark rhodopsin, which are lost (arrows) in meta II. However, a new Gly-C20 contact is observed in meta II, which is assigned to Gly114 (see text). (d) Rhodopsin labeled with U-13C6-isoleucine and 13C9-retinal. No contacts were observed between Ile189 and C9 on the polyene chain of the retinal in either rhodopsin (black arrows) or meta II (red arrows). The structure of EL2 in rhodopsin is shown (center), indicating the contacts observed between the C20 methyl group and Cys187, Gly188 and Gly114 in rhodopsin. To illustrate the displacement of EL2 that is needed to satisfy the NMR constraints, we have superimposed the rhodopsin crystal structure (gray) with the meta II model (orange) obtained from molecular dynamic simulations guided by our experimentally determined retinal-protein contacts.

Figure 3 A view of the extracellular side of rhodopsin from the crystal structure6. The figure highlights the relative position of six tyrosine residues: Tyr10, Tyr178, Tyr191, Tyr192, Tyr206 and Tyr268. Of these tyrosines, Tyr191, Tyr192 and Tyr268 are involved in the hydrogen-bonding network with Glu181. Tyr268 and Tyr191 are also in close contact with Met288 on H7. Tyr206 on H5 is involved in a second hydrogen-bonding network with His211 (H5), Glu122 (H3), Trp126 (H3) and Ala166 (H4) (not shown). Additionally, the figure shows tyrosine-glycine interactions on the extracellular side of rhodopsin between Gly188-Tyr268, Gly3-Tyr10-Gly280 and Gly114-Tyr178.

To assign the tyrosine resonance at 159.3 p.p.m. in meta II, we collected difference spectra for a series of rhodopsin mutants (Y268F, Y192F, Y191F and Y178F) in which tyrosine residues in the retinal binding cavity near Glu181 were mutated individually to phenylalanine (Fig. 4e–h). None of the 13Cζ-tyrosine difference spectra shows a complete loss of the negative peak at 159.3 p.p.m., except for the Y268F mutant spectrum, in which the negative peak at 159.3 p.p.m. seems to shift to 157.5 p.p.m. We are not able to assign the 159.3-p.p.m. resonance to Tyr268 because of the appearance of a positive peak at 159.3 p.p.m. in the dark spectrum of the Y268F mutant, which suggests that mutation of Tyr268 causes another tyrosine in the vicinity to become more strongly hydrogen-bonded. In the difference spectrum of the Y191F mutant, the negative peak at ~159 p.p.m. is split into two components as compared to the wild-type difference spectrum.

The loss of the 159.3-p.p.m. resonance in the E181Q mutant and its sensitivity to mutation of Tyr268 and Tyr191 strongly suggest an assignment to one of the tyrosines associated with EL2. This assignment is supported by two-dimensional DARR data obtained on rhodopsin labeled with 13Cζ-tyrosine and 13Cε-methionine. In the rhodopsin crystal structure (PDB 1U19), there are five Met(13Cε)-Tyr(13Cζ) pairs (Met288-Tyr268, 3.9 Å; Met207-Tyr191, 4.8 Å; Met288-Tyr191, 5.2 Å; Met253-Tyr306, 5.5 Å; Met288-Tyr192, 5.7 Å). In Figure 5a, we observe two cross-peaks between tyrosine and methionine that we assign to the closest methionine-tyrosine pairs (that is, Met288-Tyr268 and Met207-Tyr191). Conversion to meta II generated a cross-peak between the tyrosine resonance at 159.3 p.p.m. and a methionine resonance at 12.8 p.p.m. We can assign this methionine to Met288 on H7 based on the loss of this cross-peak in the M288L mutant (Fig. 5b, orange).
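The reasoning behind assigning the two rhodopsin cross-peaks to the closest methionine-tyrosine pairs can be illustrated by ranking the five distances listed above. The following Python sketch uses only the values given in the text; the variable and function names are hypothetical, and it is not the authors' assignment procedure.

```python
# Illustrative sketch: rank the five Met(13C-epsilon)-Tyr(13C-zeta) pairs
# listed in the text by their crystal-structure distance (PDB 1U19) and pick
# the closest ones. The distances are copied from the text; variable and
# function names are hypothetical.

MET_TYR_DISTANCES_ANGSTROM = {
    ("Met288", "Tyr268"): 3.9,
    ("Met207", "Tyr191"): 4.8,
    ("Met288", "Tyr191"): 5.2,
    ("Met253", "Tyr306"): 5.5,
    ("Met288", "Tyr192"): 5.7,
}


def closest_pairs(distances, count):
    """Return the `count` pairs with the shortest distances, closest first."""
    ranked = sorted(distances.items(), key=lambda item: item[1])
    return [pair for pair, _ in ranked[:count]]


if __name__ == "__main__":
    for met, tyr in closest_pairs(MET_TYR_DISTANCES_ANGSTROM, 2):
        print(f"{met}-{tyr}")
```

Selecting the two shortest distances returns Met288-Tyr268 and Met207-Tyr191, the pairs to which the two rhodopsin cross-peaks are assigned in the text.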

The M288L data along with the tyrosine difference spectra above indicate that the 159.3-p.p.m. resonance belongs to either Tyr191 or Tyr268 in meta II. We assume that the strong hydrogen-bonding interaction for a tyrosine at 159.3 p.p.m. is due to its interaction with Glu181 and that the appearance of a resonance at 159.3 p.p.m. in the Y268F rhodopsin spectrum and in the Y191F meta II spectrum occurs because these mutations lead to the rearrangement of the EL2 hydrogen-bonding network. We assign the 159.3-p.p.m. resonance in meta II to Tyr191, because we observe a cross-peak at 156.5 p.p.m. between a tyrosine and the retinal C20 methyl group19 that we assign to Tyr268. The C20 methyl group is closer to Tyr268 (4.2 Å) than to Tyr191 (8.0 Å) in rhodopsin, and we expect that motion of EL2 away from the retinal would only increase the 13C20-Tyr191(13Cζ) distance. Together, these data argue that Tyr191 becomes more strongly hydrogen-bonded in meta II and that the hydrogen-bonding network involving the tyrosines and Glu181 on EL2 remains intact.

Figure 4 One-dimensional 13C cross-polarization magic angle spinning (CP-MAS) spectra of rhodopsin and meta II labeled with 13Cζ-tyrosine. (a) Overlap of the 13C one-dimensional CP-MAS spectra of the 13Cζ-tyrosine resonance in rhodopsin (rho, solid line) and meta II (broken line). (b–h) Difference spectra for wild-type (WT) rhodopsin (b) and several rhodopsin mutants, Y206F (c), E181Q (d), Y268F (e), Y192F (f), Y191F (g) and Y178F (h). The wild-type difference spectrum is shown in gray in c–h.

Figure 5 Two-dimensional DARR NMR of Tyr(Cζ)-Met(Cε) contacts in rhodopsin and the M288L rhodopsin mutant. (a) Rows through the 13Cζ-tyrosine diagonal resonance from two-dimensional DARR NMR spectra of rhodopsin (black) and the M288L rhodopsin mutant (orange) labeled with 13Cζ-tyrosine and 13Cε-methionine. Asterisks correspond to MAS side bands. (b) Rows through the 13Cε-Met diagonal resonance from two-dimensional DARR NMR spectra of wild-type meta II (WT, black) and the M288L rhodopsin mutant (orange) following conversion to meta II. (c) Rows through the 13Cε-methionine diagonal resonance of rhodopsin (black) and the M288L rhodopsin mutant (orange) showing the cross-peaks to the retinal 13C6 and 13C7 resonances. (d) Same as in c following conversion to meta II. In the M288L mutant of rhodopsin, we observe a contact between Met207 and C6 that is not present in wild-type rhodopsin. This change in the Met207-retinal contact in the M288L mutant of rhodopsin can be interpreted as either a change in the position of the retinal or in the position of Met207 on H5 upon mutation of Met288 (H7) to leucine. Upon activation, the Met207-retinal interactions in the M288L mutant are identical to those in wild-type meta II. (e) A view of the ionic lock between Arg135 and Glu247 from the crystal structure of rhodopsin20. The Tyr223-Met257 distance is well beyond the range of the DARR NMR experiment. (f) Structure of the ionic lock from the recent crystal structure of opsin21,22 showing the close proximity between Tyr223 and Met257.

Coupling of EL2 displacement to rotation of helix H5

The data presented above on the E181Q and M288L mutants and in Supplementary Figure 3 on the Y206F mutant suggest that light-induced structural changes in EL2 are strongly coupled to the hydrogen-bonding network centered on H5. First, in the E181Q mutant (Fig. 4d) the resonance at 153.6 p.p.m. assigned to Tyr206 (H5) is lost. Second, in the M288L (H7) mutant a contact is gained between the ε-CH3 group of Met207 (H5) and the retinal C6 carbon (Fig. 5c). Third, in the Y206F (H5) mutant a tyrosine-glycine contact is lost that is likely to involve Tyr10 or Tyr29 on the extracellular loops of rhodopsin, as there are no glycines in the vicinity of Tyr206.

We propose that the functional unit is the EL2-H5 sequence. The crystal structure of rhodopsin shows that the β-strands in EL2 are extensively knit together by hydrogen-bonding interactions that extend to Tyr268 on H6 and Glu113 on H3 (refs. 6,20). If the motion of EL2 is coupled to the motion of H5, then the Pro170-Pro171 sequence at the H4-β3 boundary may serve as a hinge, leading to observable changes in the hydrogen-bonding interactions that link β3 to H4 and H4 to H5. We observe that many of the hydrogen-bonding contacts involving the extracellular ends of H4 and H5 change in meta II (Supplementary Fig. 1a).

We have previously shown that H5 undergoes a change in orientation in the region of His211 (Supplementary Fig. 6). Figure 5e,f shows how rotation of H5 leads to disruption of the ionic lock between H3 and H6. In the dark state of rhodopsin (Fig. 5e), Arg135 of the conserved ERY sequence on H3 interacts with Glu247 (H6). In the recent structure of opsin (Fig. 5f) with21 and without22 the Gα peptide bound, H5 is rotated and Tyr223 (a residue that is highly conserved across the GPCR family) interacts directly with Arg135 and Met257 on H6. The Tyr223-Arg135 interaction is thought to be one element in breaking the ionic lock and stabilizing the active conformation of rhodopsin. Figure 5b shows a new tyrosine-methionine contact in meta II that we can assign to the Tyr223-Met257 interaction, consistent with the proposal that this active-state geometry is maintained in the opsin structure22. Notably, mutation of Tyr223 to phenylalanine results in an appreciable increase in the decay rate of meta II to opsin (Supplementary Fig. 7 online), in agreement with the idea that the Tyr223-Arg135 interaction stabilizes the active conformation of the receptor. These results indicate that there are two crucial H3-H5 interactions that hold helix H5 in an active geometry: Glu122-His211 (refs. 23,24) and Arg135-Tyr223 (refs. 21,22). The model of activation that emerges from these studies is one where steric contacts between the retinal β-ionone ring with H5 and the retinal C19 methyl group with EL2 shift the EL2-H5 sequence into an active geometry stabilized by H3-H5 contacts; retinals lacking either the ring25 or the C19 methyl group26 fail to activate rhodopsin, and mutation of Tyr223 to phenylalanine leads to rapid meta II decay.

DISCUSSION

EL2 controls access to the retinal binding site

The main conclusion from our studies is that EL2 changes position upon activation and that this change is coupled to motion of transmembrane helix H5. We have recently defined the location of the retinal chromophore in meta II (S.A., E. Crocker, M.E., P.J.R., M.S. and S.O.S., unpublished data), and our current measurements between the β4 strand and the retinal indicate that there must be an increase in the separation between the retinal and EL2 upon activation. The hydrogen-bonding network involving Glu181 seems to remain intact in meta II, and consequently the displacement of EL2 does not seem to be large.

Our observations can be compared with the crystal structures of opsin22 and a ‘photoactivated’ (deprotonated) intermediate of rhodopsin27. In these structures, EL2 does not seem to have moved to any appreciable extent. The differences between meta II and opsin suggest that the all-trans retinal Schiff base holds EL2 in an active conformation in meta II. Release of the retinal to form opsin allows the binding-site residues to rearrange and EL2 to shift back to roughly its position in rhodopsin.

The displacement of EL2 away from the retinal that we observed isconsistent with studies showing that the retinal binding site becomesmore accessible to water and hydroxylamine in meta II28,29. Mutationof many of the residues in the hydrogen-bonding network involvingEL2, such as Glu181 (ref. 30) and Tyr192 (ref. 31), results in increasedaccessibility of the retinal PSB to hydroxylamine in the dark. Also,the appearance of an N-D amide A vibration at 2,366 cm�1 in meta IIhas been attributed to hydrogen-deuterium exchange that occurs

Figure 6 Crystal structure of rhodopsin20 highlighting EL2 and H5. (a) Retinal isomerization within the tightly packed binding site results in steric contacts between the β-ionone ring and H5 and between the retinal C19 and C20 methyl groups and EL2. These interactions trigger the simultaneous displacement of EL2 and H5. Motion of the β-ionone ring is also coupled to the motion of Trp265. Trp265 is packed against the β-ionone ring and C20 of the retinal, as well as Gly121 on H3 and Ala295 on H7. Movement of the Trp265 side chain away from these crucial contacts allows helices H6 and H7 to shift into active conformations. The coupled motions of helices H5-H7, in turn, are coupled to the rearrangement of electrostatic interactions involving the conserved ERY sequence at the cytoplasmic end of H3, exposing the G protein binding site on the cytoplasmic surface of the protein. (b) View of the rhodopsin crystal structure highlighting the interaction between EL2 and EL3 on the extracellular side of the receptor, and the positions of Tyr223 and the conserved Glu134-Arg135-Tyr136 sequence on the intracellular side of the receptor.


following the exposure of the EL2 β-hairpin to water in the meta I–to–meta II transition32. Notably, neither disruption of the Cys110-Cys187 disulfide bond by mutation to alanine nor disruption of the salt bridge between Arg177 and Asp190 on EL2 increases hydroxylamine accessibility33,34, suggesting that the hydrogen-bonding network involving Glu181 is alone sufficient to keep EL2 tightly capped over the retinal binding site.

In a parallel fashion, EL2 may serve to control the access of small-molecule ligands to interior binding sites within the ligand-activated GPCRs. For example, alanine-scanning mutagenesis of the M1 muscarinic acetylcholine receptor revealed that the access of ligands to the binding site was increased by mutation of EL2 residues35. Furthermore, substituted-cysteine accessibility studies of the dopamine D2 receptor showed that the extracellular part of H5 is accessible to hydrophilic reagents36. Finally, the recent crystal structure of β2AR with a bound partial inverse agonist14 shows that EL2 does not cap the amine binding site, as it does in rhodopsin. Taken together, the studies on GPCRs activated by small-molecule ligands suggest that EL2 has a dynamic role in allowing water and ligands to enter the interior binding sites.

EL2 as a negative regulator in GPCR activation
Several studies have suggested that EL2 serves a role as a negative regulator in the class A GPCRs. The simple idea is that EL2 has multiple interactions with the extracellular ends of the transmembrane helices in the inactive state and that displacement of EL2 upon ligand binding allows H5, H6 and H7 to adopt active conformations. For example, one report showed that a high degree of constitutive activity is associated with the mutation of residues in EL2 of the C5a receptor3. The authors proposed that mutation of EL2 increases the flexibility of the loop and releases inhibitory constraints. The high degree of basal activity in the melanocortin receptor, which has a short EL2 and lacks the conserved disulfide bond, was explained by a related mechanism13. Finally, cross-linking in the putative ligand binding site37,38 and metal binding sites39 in the vicinity of EL2 modulate receptor activity. These modifications were designed to mimic the movement of the transmembrane helices, and for this to occur, EL2 was envisioned to change conformation or position.

In rhodopsin, EL2 has also been implicated as a negative regulator of receptor activity. Mutation of Tyr191 and Tyr192 to leucine decreases the stability of the binding pocket, leading to faster meta II decay rates40, and mutation of Ser186 to alanine and Glu181 to phenylalanine strongly perturbs the kinetics of rhodopsin activation41. However, none of the EL2 mutants tested in rhodopsin shows constitutive activity. This may be due to the presence of additional regulatory elements, such as the interaction between the retinal PSB and its Glu113 counterion and the tight packing between the 11-cis retinal and conserved Trp265 (H6), which all contribute to low dark noise in rhodopsin.

EL2-H5 as a structural unit in GPCRs
One of the challenges in understanding the mechanism of GPCR activation is to establish how retinal isomerization42,43 or ligand binding39,44 produces rigid-body motion of the transmembrane helices. Our results suggest that the motion of EL2 is coupled to the motion of H5 and breaking of the ionic lock.

Tight coupling between EL2 and H5 is supported by studies on ligand-activated GPCRs45–48. One study addressed the coupling of EL2 and H5 by replacing the EL2-H5 sequence from the 5HT1D serotonin receptor with the corresponding sequence from the 5HT1B serotonin receptor46. The authors found that it was necessary to replace the entire EL2-H5 sequence to recover antagonist binding; replacing either the EL2 or H5 sequence alone markedly decreased binding. Also, the idea that EL2 is a structured unit is reflected in gonadotropin-releasing hormone receptor studies showing that exchange of the entire EL2 from another species had less effect on ligand binding affinity than point mutations of EL2 within a species48.

Figure 6 highlights the helix-loop-helix (HLH) segments involving EL2 and EL3. Motion of EL2 away from the retinal binding site is coupled to the outward displacement of the extracellular end of H5 and the inward displacement and rotation of the intracellular end of H5 (refs. 21,22). The outward displacement of H5 is driven by steric interaction with the retinal β-ionone ring and is stabilized by a direct Glu122-His211 interaction. Motion of EL2 may allow the extracellular end of the H6-EL3-H7 segment to pivot toward the center of the protein and conversely allow the intracellular end of H6 to rotate outward42,49. Inward motions of the extracellular ends of H6 and H7 are captured in the global toggle switch model of GPCR activation39.

Additionally, Figure 6 shows the positions of key tryptophan residues in rhodopsin, Trp265 (H6) and Trp175 (H4). Trp265 is conserved throughout the class A GPCRs and is an important element of the activation mechanism of rhodopsin50. Trp175 is located at the junction of EL2 with H4 and H5. In rhodopsin, the W175F mutation is one of the only mutations in the H4-EL2-H5 segment that leads to constitutive activity51. The fact that this tryptophan residue is highly conserved in the visual receptors, but not in other class A GPCRs, suggests that the H4-EL2-H5 sequence up to Pro215 is specific to different subfamilies of class A GPCRs.

In conclusion, the structural constraints described above provide insights into how EL2 and its extensive hydrogen-bonding interactions are involved in coupling retinal isomerization to the activation of rhodopsin. The subfamily-specific H4-EL2-H5 unit in rhodopsin holds H5 and the extracellular ends of H6 and H7 in inactive conformations. Retinal isomerization and displacement of EL2 from the retinal binding site are coupled to motion of H5 and to the inward motion of the H6-EL3-H7 unit. Similar motions are likely to occur in other GPCRs39,52, suggesting that EL2 may act as a plug or cork that must be released or rearranged for receptor activation.

METHODS
Expression and purification of 13C-labeled rhodopsin. We used a stable tetracycline-inducible HEK293S cell line53 containing the bovine opsin gene or its mutants54 to express rhodopsin. The cells were grown in DMEM55 prepared from cell culture–tested components (Sigma). Suspension cultures were grown using a bioreactor in medium with specific 13C-labeled amino acids (Cambridge Isotope Laboratories), heat-inactivated FBS (10% (v/v), dialyzed three times against 20 liters PBS per liter of serum)56, 0.1% (w/v) Pluronic F-68, 300 mg l–1 dextran sulfate, 100 units ml–1 penicillin and 100 mg ml–1 streptomycin. On day 4 after incubation, cells were fed with 2.4 g l–1 glucose. Opsin gene expression was induced 5 d after inoculation by addition of both 2 mg l–1 tetracycline and 5 mM sodium butyrate (final concentration) to the growth medium53, and cells were harvested on day 7.

We resuspended the HEK293S cell pellets in 40 ml PBS per liter of cell culture plus protease inhibitors54 and added unlabeled 11-cis retinal in two steps to a final concentration of 15 mM. The rhodopsin-containing cells were solubilized in 40 ml of PBS plus 1% (w/v) DDM per liter of cell culture for 4 h at room temperature (22–25 °C). We carried out subsequent purification by immunoaffinity chromatography using the rho-1D4 antibody (National Cell Culture Center) as described previously54. The eluted rhodopsin fractions were pooled and concentrated to a final volume of ~400 ml using 10-kDa MWCO Centricon devices (Amicon).

Synthesis of 13C-labeled retinals and regeneration into rhodopsin. We synthesized specific 13C-labeled retinals by previously described methods57,58 and purified them using HPLC as previously described50.


We carried out regeneration of the rhodopsin pigments with 13C-labeled retinal in DDM micelles by illuminating the concentrated samples containing a 2:1 molar ratio of labeled retinal to protein, as described previously19. Typically, more than 85% of the sample was regenerated with labeled retinal. Different regeneration rates were observed for wild-type and mutant opsins. A stream of argon gas was used to evaporate the regenerated sample down to a volume of 60 ml.

Solid-state NMR spectroscopy. Concentrated samples (7–10 mg) were loaded into 4-mm MAS zirconia rotors. All NMR spectra were acquired at a static magnetic field strength of 14.1 T (600 MHz) on a Bruker AVANCE spectrometer using double-channel 4-mm MAS probes, as described previously50. Typically, we used MAS spinning rates of 8–12 kHz. One-dimensional 13C spectra were acquired using ramped amplitude cross-polarization, with contact times of 2 ms and acquisition times on the order of 16 ms for all experiments. Intermolecular 13C-13C distance constraints on rhodopsin in the inactive and the active state were obtained using the DARR recoupling technique with a mixing time of 600 ms to maximize homonuclear recoupling between different 13C labels. The 1H radiofrequency field strength during mixing was matched to the MAS speed for each sample, satisfying the n = 1 matching condition. Two-pulse phase-modulated or SPINAL64 proton decoupling was typically used during the evolution and acquisition periods, with a radiofrequency field strength of 80–90 kHz. In each two-dimensional data set, we acquired 1,024 time domain points in the f2 (direct) dimension and 64 points in the f1 (indirect) dimension. All experiments were conducted at −80 °C. 13C spectra were referenced externally to the carbonyl resonance of powdered glycine at 176.46 p.p.m. relative to neat TMS at 0.0 p.p.m.
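The field strength and the matching condition quoted above follow from two simple relations: the Larmor frequency, ν = (γ/2π)·B0, and the rotary resonance condition used for DARR mixing, ν1(1H) = n·νr. The following is a minimal sketch only. The gyromagnetic ratios are standard textbook values that are not given in the text, and the function names are illustrative.

```python
# Illustrative sketch of two relations behind the NMR settings described above.
# Assumption: standard gyromagnetic ratios (gamma / 2*pi) in MHz per tesla.
GAMMA_MHZ_PER_T = {"1H": 42.577, "13C": 10.708}


def larmor_mhz(nucleus: str, b0_tesla: float) -> float:
    """Larmor frequency in MHz for a nucleus at static field b0_tesla."""
    return GAMMA_MHZ_PER_T[nucleus] * b0_tesla


def darr_field_khz(mas_rate_khz: float, n: int = 1) -> float:
    """1H radiofrequency field (kHz) that matches n times the MAS rate."""
    return n * mas_rate_khz


if __name__ == "__main__":
    print(f"1H at 14.1 T:  {larmor_mhz('1H', 14.1):.1f} MHz")
    print(f"13C at 14.1 T: {larmor_mhz('13C', 14.1):.1f} MHz")
    for rate in (8.0, 10.0, 12.0):
        print(f"MAS {rate} kHz -> 1H field {darr_field_khz(rate)} kHz during mixing")
```

With these values, a 14.1 T magnet corresponds to a 1H frequency of about 600 MHz, and the n = 1 condition at MAS rates of 8–12 kHz calls for a 1H field of 8–12 kHz during mixing.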

Trapping of the metarhodopsin II intermediate. Samples were illuminated for 45–60 s at room temperature in the NMR rotor using a 400-W lamp with a >495-nm cutoff filter and immediately placed in the NMR probe with the probe stator warmed to 5 °C. Under slow spinning (~2 kHz), the sample was frozen within 3 min of illumination using nitrogen gas cooled to −80 °C. To confirm that meta II conversion was complete and stably trapped, we monitored the chemical shift changes of the 13C-labeled carbons of the polyene chain of the retinal, as they are sensitive to both protonation and isomerization. The linewidths of the resolved protein and retinal NMR resonances were generally between 1 p.p.m. and 2 p.p.m. in both rhodopsin and meta II. The absence of line broadening or resonance splitting indicates that a spectroscopically well-defined meta II state has been trapped. The time between illumination and freezing of the sample was approximately 3 min, indicating that the proton uptake in our sample was complete; the intermediate we trapped is functionally equivalent to meta II in rod outer segment (ROS) membranes, as it can activate transducin59. Also, it has been shown that the vibrational frequencies observed in the Fourier transform infrared (FTIR) difference spectrum of meta II minus rhodopsin are identical for rhodopsin in DDM or ROS membranes60.

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS
This work was supported by the US National Institutes of Health (NIH)–National Science Foundation instrumentation grants (S10 RR13889 and DBI-9977553), a grant from the NIH to S.O.S. (GM-41412), and a grant from the US-Israel Binational Science Foundation to M.S. We thank C.A. Opefi for help with the M288A and M288L mutants and gratefully acknowledge the W.M. Keck Foundation for support of the NMR facilities in the Center of Structural Biology at Stony Brook. M.S. acknowledges support from the Kimmelman Center for Biomolecular Structure and Assembly.

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Samson, M. et al. The second extracellular loop of CCR5 is the major determinant of ligand specificity. J. Biol. Chem. 272, 24934–24941 (1997).

2. Shi, L. & Javitch, J.A. The second extracellular loop of the dopamine D-2 receptor lines the binding-site crevice. Proc. Natl. Acad. Sci. USA 101, 440–445 (2004).

3. Klco, J.M., Wiegand, C.B., Narzinski, K. & Baranski, T.J. Essential role for the second extracellular loop in C5a receptor activation. Nat. Struct. Mol. Biol. 12, 320–326 (2005).

4. Scarselli, M., Li, B., Kim, S.K. & Wess, J. Multiple residues in the second extracellular loop are critical for M-3 muscarinic acetylcholine receptor activation. J. Biol. Chem. 282, 7385–7396 (2007).

5. Palczewski, K. et al. Crystal structure of rhodopsin: a G protein-coupled receptor. Science 289, 739–745 (2000).

6. Okada, T. et al. The retinal conformation and its environment in rhodopsin in light of a new 2.2 Å crystal structure. J. Mol. Biol. 342, 571–583 (2004).

7. Karnik, S.S. & Khorana, H.G. Assembly of functional rhodopsin requires a disulfide bond between cysteine residues 110 and 187. J. Biol. Chem. 265, 17520–17524 (1990).

8. Hwa, J., Klein-Seetharaman, J. & Khorana, H.G. Structure and function in rhodopsin: mass spectrometric identification of the abnormal intradiscal disulfide bond in misfolded retinitis pigmentosa mutants. Proc. Natl. Acad. Sci. USA 98, 4872–4876 (2001).

9. Steinberg, G., Ottolenghi, M. & Sheves, M. pKa of the protonated Schiff base of bovine rhodopsin: a study with artificial pigments. Biophys. J. 64, 1499–1502 (1993).

10. Sakmar, T.P., Franke, R.R. & Khorana, H.G. The role of the retinylidene Schiff base counterion in rhodopsin in determining wavelength absorbance and Schiff base pKa. Proc. Natl. Acad. Sci. USA 88, 3079–3083 (1991).

11. Cohen, G.B., Oprian, D.D. & Robinson, P.R. Mechanism of activation and inactivation of opsin: role of Glu113 and Lys296. Biochemistry 31, 12592–12601 (1992).

12. Rader, A.J. et al. Identification of core amino acids stabilizing rhodopsin. Proc. Natl. Acad. Sci. USA 101, 7246–7251 (2004).

13. Holst, B. & Schwartz, T.W. Molecular mechanism of agonism and inverse agonism in the melanocortin receptors: Zn2+ as a structural and functional probe. Ann. NY Acad. Sci. 994, 1–11 (2003).

14. Cherezov, V. et al. High-resolution crystal structure of an engineered human β2-adrenergic G protein-coupled receptor. Science 318, 1258–1265 (2007).

15. Matsumoto, H. & Yoshizawa, T. Recognition of opsin to longitudinal length of retinal isomers in formation of rhodopsin. Vision Res. 18, 607–609 (1978).

16. Sharma, D. & Rajarathnam, K. 13C NMR chemical shifts can predict disulfide bond formation. J. Biomol. NMR 18, 165–171 (2000).

17. Herzfeld, J. et al. Solid-state 13C NMR study of tyrosine protonation in dark-adapted bacteriorhodopsin. Biochemistry 29, 5567–5574 (1990).

18. DeLange, F. et al. Tyrosine structural changes detected during the photoactivation of rhodopsin. J. Biol. Chem. 273, 23735–23739 (1998).

19. Patel, A.B. et al. Coupling of retinal isomerization to the activation of rhodopsin. Proc. Natl. Acad. Sci. USA 101, 10048–10053 (2004).

20. Li, J., Edwards, P.C., Burghammer, M., Villa, C. & Schertler, G.F.X. Structure of bovine rhodopsin in a trigonal crystal form. J. Mol. Biol. 343, 1409–1438 (2004).

21. Scheerer, P. et al. Crystal structure of opsin in its G-protein-interacting conformation. Nature 455, 497–502 (2008).

22. Park, J.H., Scheerer, P., Hofmann, K.P., Choe, H.W. & Ernst, O.P. Crystal structure of the ligand-free G-protein-coupled receptor opsin. Nature 454, 183–187 (2008).

23. Patel, A.B. et al. Changes in interhelical hydrogen bonding upon rhodopsin activation. J. Mol. Biol. 347, 803–812 (2005).

24. Imai, H. et al. Single amino acid residue as a functional determinant of rod and cone visual pigments. Proc. Natl. Acad. Sci. USA 94, 2322–2326 (1997).

25. Jager, F. et al. Interactions of the β-ionone ring with the protein in the visual pigment rhodopsin control the activation mechanism. An FTIR and fluorescence study on artificial vertebrate rhodopsins. Biochemistry 33, 7389–7397 (1994).

26. Ganter, U.M., Schmid, E.D., Perez-Sala, D., Rando, R.R. & Siebert, F. Removal of the 9-methyl group of retinal inhibits signal transduction in the visual process. A Fourier transform infrared and biochemical investigation. Biochemistry 28, 5954–5962 (1989).

27. Salom, D. et al. Crystal structure of a photoactivated deprotonated intermediate of rhodopsin. Proc. Natl. Acad. Sci. USA 103, 16123–16128 (2006).

28. Sakmar, T.P., Franke, R.R. & Khorana, H.G. Glutamic acid-113 serves as the retinylidene Schiff base counterion in bovine rhodopsin. Proc. Natl. Acad. Sci. USA 86, 8309–8313 (1989).

29. Zhukovsky, E.A. & Oprian, D.D. Effect of carboxylic acid side chains on the absorption maximum of visual pigments. Science 246, 928–930 (1989).

30. Yan, E.C.Y. et al. Function of extracellular loop 2 in rhodopsin: glutamic acid 181 modulates stability and absorption wavelength of metarhodopsin II. Biochemistry 41, 3620–3627 (2002).

31. Janz, J.M. & Farrens, D.L. Role of the retinal hydrogen bond network in rhodopsin Schiff base stability and hydrolysis. J. Biol. Chem. 279, 55886–55894 (2004).

32. Furutani, Y., Shichida, Y. & Kandori, H. Structural changes of water molecules during the photoactivation processes in bovine rhodopsin. Biochemistry 42, 9619–9625 (2003).

33. Davidson, F.F., Loewen, P.C. & Khorana, H.G. Structure and function in rhodopsin: replacement by alanine of cysteine residues 110 and 187, components of a conserved disulfide bond in rhodopsin, affects the light-activated metarhodopsin II state. Proc. Natl. Acad. Sci. USA 91, 4029–4033 (1994).

34. Janz, J.M., Fay, J.F. & Farrens, D.L. Stability of dark state rhodopsin is mediated by a conserved ion pair in intradiscal loop E-2. J. Biol. Chem. 278, 16982–16991 (2003).

35. Goodwin, J.A., Hulme, E.C., Langmead, C.J. & Tehan, B.G. Roof and floor of the muscarinic binding pocket: variations in the binding modes of orthosteric ligands. Mol. Pharmacol. 72, 1484–1496 (2007).


36. Javitch, J.A., Fu, D. & Chen, J. Residues in the fifth membrane-spanning segment of the dopamine D2 receptor exposed in the binding-site crevice. Biochemistry 34, 16433–16439 (1995).

37. Struthers, M., Yu, H.B. & Oprian, D.D. G protein-coupled receptor activation: analysis of a highly constrained, “straitjacketed” rhodopsin. Biochemistry 39, 7938–7942 (2000).

38. Han, S.J. et al. Identification of an agonist-induced conformational change occurring adjacent to the ligand-binding pocket of the M-3 muscarinic acetylcholine receptor. J. Biol. Chem. 280, 34849–34858 (2005).

39. Elling, C.E. et al. Metal ion site engineering indicates a global toggle switch model for seven-transmembrane receptor activation. J. Biol. Chem. 281, 17337–17346 (2006).

40. Doi, T., Molday, R.S. & Khorana, H.G. Role of the intradiscal domain in rhodopsin assembly and function. Proc. Natl. Acad. Sci. USA 87, 4991–4995 (1990).

41. Yan, E.C.Y. et al. Photointermediates of the rhodopsin S186A mutant as a probe of the hydrogen-bond network in the chromophore pocket and the mechanism of counterion switch. J. Phys. Chem. C 111, 8843–8848 (2007).

42. Farrens, D.L., Altenbach, C., Yang, K., Hubbell, W.L. & Khorana, H.G. Requirement of rigid-body motion of transmembrane helices for light activation of rhodopsin. Science 274, 768–770 (1996).

43. Sheikh, S.P., Zvyaga, T.A., Lichtarge, O., Sakmar, T.P. & Bourne, H.R. Rhodopsin activation blocked by metal-ion-binding sites linking transmembrane helices C and F. Nature 383, 347–350 (1996).

44. Sheikh, S.P. et al. Similar structures and shared switch mechanisms of the β2-adrenoceptor and the parathyroid hormone receptor: Zn(II) bridges between helices III and VI block activation. J. Biol. Chem. 274, 17033–17041 (1999).

45. Olah, M.E., Jacobson, K.A. & Stiles, G.L. Role of the 2nd extracellular loop of adenosine receptors in agonist and antagonist binding: analysis of Chimeric A1/A3-adenosine receptors. J. Biol. Chem. 269, 24692–24698 (1994).

46. Wurch, T., Colpaert, F.C. & Pauwels, P.J. Chimeric receptor analysis of the ketanserin binding site in the human 5-hydroxytryptamine1D receptor: importance of the second extracellular loop and fifth transmembrane domain in antagonist binding. Mol. Pharmacol. 54, 1088–1096 (1998).

47. Conner, M. et al. Systematic analysis of the entire second extracellular loop of the V-1a vasopressin receptor: key residues, conserved throughout a G-protein-coupled receptor family, identified. J. Biol. Chem. 282, 17405–17412 (2007).

48. Pfleger, K.D.G., Pawson, A.J. & Millar, R.P. Changes to gonadotropin-releasing hormone (GnRH) receptor extracellular loops differentially affect GnRH analog binding and activation: evidence for distinct ligand-stabilized receptor conformations. Endocrinology 149, 3118–3129 (2008).

49. Altenbach, C., Kusnetzow, A.K., Ernst, O.P., Hofmann, K.P. & Hubbell, W.L. High-resolution distance mapping in rhodopsin reveals the pattern of helix movement due to activation. Proc. Natl. Acad. Sci. USA 105, 7439–7444 (2008).

50. Crocker, E. et al. Location of Trp265 in metarhodopsin II: implications for the activation mechanism of the visual receptor rhodopsin. J. Mol. Biol. 357, 163–172 (2006).

51. Madabushi, S. et al. Evolutionary trace of G protein-coupled receptors reveals clusters of residues that determine global and class-specific functions. J. Biol. Chem. 279, 8126–8132 (2004).

52. Holst, B., Elling, C.E. & Schwartz, T.W. Partial agonism through a zinc-ion switch constructed between transmembrane domains III and VII in the tachykinin NK1 receptor. Mol. Pharmacol. 58, 263–270 (2000).

53. Reeves, P.J., Kim, J.M. & Khorana, H.G. Structure and function in rhodopsin: a tetracycline-inducible system in stable mammalian cell lines for high-level expression of opsin mutants. Proc. Natl. Acad. Sci. USA 99, 13413–13418 (2002).

54. Reeves, P.J., Thurmond, R.L. & Khorana, H.G. Structure and function in rhodopsin: high level expression of a synthetic bovine opsin gene and its mutants in stable mammalian cell lines. Proc. Natl. Acad. Sci. USA 93, 11487–11492 (1996).

55. Dulbecco, R. & Freeman, G. Plaque production by the polyoma virus. Virology 8, 396–397 (1959).

56. Eilers, M., Reeves, P.J., Ying, W.W., Khorana, H.G. & Smith, S.O. Magic angle spinning NMR of the protonated retinylidene Schiff base nitrogen in rhodopsin: expression of 15N-lysine and 13C-glycine labeled opsin in a stable cell line. Proc. Natl. Acad. Sci. USA 96, 487–492 (1999).

57. Lugtenburg, J. The synthesis of 13C-labeled retinals. Pure Appl. Chem. 57, 753–762 (1985).

58. Crocker, E. et al. Dipolar assisted rotational resonance NMR of tryptophan and tyrosine in rhodopsin. J. Biomol. NMR 29, 11–20 (2004).

59. Han, M., Groesbeek, M., Smith, S.O. & Sakmar, T.P. Role of the C9 methyl group in rhodopsin activation: characterization of mutant opsins with the artificial chromophore 11-cis-9-demethylretinal. Biochemistry 37, 538–545 (1998).

60. Fahmy, K. et al. Protonation states of membrane-embedded carboxylic acid groups in rhodopsin and metarhodopsin II: a Fourier-transform infrared spectroscopy study of site-directed mutants. Proc. Natl. Acad. Sci. USA 90, 10206–10210 (1993).


Recognition of atypical 5′ splice sites by shifted base-pairing to U1 snRNA
Xavier Roca & Adrian R Krainer

Accurate pre-mRNA splicing is crucial for gene expression. The 5′ splice site (5′ ss), the highly diverse element at the 5′ end of introns, is initially recognized via base-pairing to the 5′ end of the U1 small nuclear RNA (snRNA). However, many natural 5′ ss have a poor match to the consensus sequence, and are predicted to be weak. Using genetic suppression experiments in human cells, we demonstrate that some atypical 5′ ss are actually efficiently recognized by U1, in an alternative base-pairing register that is shifted by one nucleotide. These atypical 5′ ss are phylogenetically widespread, and many of them are conserved. Moreover, shifted base-pairing provides an explanation for the effect of a 5′ ss mutation associated with pontocerebellar hypoplasia. The unexpected flexibility in 5′ ss–U1 base-pairing challenges an established paradigm and has broad implications for splice-site prediction algorithms and gene-annotation efforts in genome projects.

Accurate pre-mRNA splicing is crucial for the correct transmission of information from gene to protein1. Splicing is catalyzed by the spliceosome, a large and dynamic complex composed of five small nuclear ribonucleoprotein particles (snRNPs) made up of snRNAs and associated polypeptides, as well as many other protein factors2. Conserved sequences that match degenerate consensus motifs at both ends of introns are essential for splicing1. As first proposed in 1980 (refs. 3,4), and definitively demonstrated in 1986 (ref. 5), the 5′ ss is initially recognized via base-pairing to the 5′ end of the U1 snRNA. The 5′ ss consensus sequence for the major, or U2-type, GT-AG introns in mammals, which comprise >98% of all introns6, has perfect complementarity to the 5′ end of the U1 snRNA3–5,7,8, establishing up to 11 base pairs (bp) in a defined register, here referred to as the ‘canonical’ register (Fig. 1a and Methods). However, the major spliceosome can accurately recognize a highly diverse set of 5′ ss sequences: using SpliceRack6, a comprehensive database of splice sites, we have found 2,503 unique human 5′ ss sequences (considering only the classical 9-nt motif; see Methods) that are used at least three times in the transcribed genome, in 186,630 introns.

Many of these bona fide 5′ ss have few predicted base pairs to U1 (refs. 6,9,10), and selection of these atypical 5′ ss cannot be explained by other known mechanisms, such as splicing via the minor, U12-type spliceosome6,11. We noticed that a subset of atypical 5′ ss have a sequence (ACA/GUUAAGUAU, where / marks the exon-intron boundary) that is reminiscent of the consensus motif (Fig. 1a). This sequence can form only three potential base pairs with the 5′ end of U1 in the canonical scheme (+1G of the 5′ ss base-pairing with C8 of U1); however, this can be increased to 10 base pairs by shifting the 5′ end of U1 snRNA one position downstream of the 5′ ss (+1G of the 5′ ss base-pairing with C9 of U1). We refer to this alternative base-pairing arrangement as the ‘shifted’ register. Thus, we hypothesized that these 5′ ss are recognized via shifted base-pairing to the 5′ end of U1, and here we present experimental evidence to support this model.
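The comparison of the two registers can be made concrete by counting potential base pairs directly. The following is a minimal sketch, not the authors' procedure. It assumes that the 5′ end of human U1 snRNA is 5′-AUACΨΨACCUG-3′ (a sequence not given in this section), writes pseudouridine as U, and counts Watson-Crick pairs plus G·U wobble pairs. The function and variable names are illustrative.

```python
# Illustrative sketch: count potential base pairs between a 5' splice site
# and the 5' end of U1 snRNA in the canonical or the shifted register.
# Assumption: U1 5' end is AUACΨΨACCUG, with pseudouridine (Ψ) written as U.
U1_5P_END = "AUACUUACCUG"  # 5'->3', U1 positions 1..11

# Assumption: Watson-Crick pairs plus G.U wobble pairs count as base pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

CANONICAL = 8  # +1G of the 5' ss faces C8 of U1
SHIFTED = 9    # +1G of the 5' ss faces C9 of U1


def count_base_pairs(site: str, register: int) -> int:
    """Count paired positions for a 5' ss written as EXON/INTRON (5'->3')."""
    exon, intron = site.split("/")
    seq = exon + intron
    plus1 = len(exon)  # 0-based index of intron position +1
    total = 0
    for i, nt in enumerate(seq):
        # The duplex is antiparallel: moving 3' along the splice site
        # moves 5' along U1.
        u1_pos = register - (i - plus1)
        if 1 <= u1_pos <= len(U1_5P_END) and (nt, U1_5P_END[u1_pos - 1]) in PAIRS:
            total += 1
    return total


if __name__ == "__main__":
    for name, site in [("consensus", "CAG/GUAAGUAU"), ("atypical", "ACA/GUUAAGUAU")]:
        print(name, count_base_pairs(site, CANONICAL), count_base_pairs(site, SHIFTED))
```

Under these assumptions the atypical site gives 3 pairs in the canonical register and 10 in the shifted register, and the consensus site gives 11 pairs in the canonical register, in agreement with the counts quoted in the text.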

RESULTS
Some 5′ ss do not base-pair to U1 by the canonical register
To test the shifted base-pairing hypothesis experimentally, we first analyzed the atypical 5′ ss associated with exons 6 and 8 of the human inositol polyphosphate-4-phosphatase, type I (INPP4A) and general transcription factor IIH, polypeptide 1 (GTF2H1) genes, respectively. We transiently transfected three-exon, two-intron minigene constructs into HeLa cells and analyzed the inclusion or skipping of the middle exon carrying the atypical 5′ ss by reverse-transcription PCR (RT-PCR). We found in both cases that the atypical 5′ ss was efficiently used for splicing of the minigene transcripts (Fig. 1b, lane 1) as well as of the endogenous transcripts in HeLa cells (Supplementary Fig. 1 online), with slight retention of the second intron in the case of GTF2H1.

We also mutated the atypical 5′ ss in both minigenes, so as to restore the consensus nucleotides at positions +3 and/or +6 of the 5′ ss. Paradoxically, these mutant minigenes with improved base-pairing potential to U1 in the canonical register (4 bp or 5 bp) and decreased base-pairing in the shifted register (8 bp or 9 bp) expressed many aberrantly spliced mRNAs, generated by skipping of the internal exon, retention of the second intron or cryptic 5′ ss activation (Fig. 1b, lanes 2–4). This observation indicates that these 5′ ss are not recognized via the classical base-pairing register with U1 snRNA.

Next, we used survival of motor neuron minigenes (SMN1/2)12 to test atypical 5′ ss in a heterologous context. The SMN1 and SMN2 paralog pre-mRNAs give different extents of exon 7 inclusion,

Received 11 September 2008; accepted 19 December 2008; published online 25 January 2009; doi:10.1038/nsmb.1546

Cold Spring Harbor Laboratory, PO Box 100, 1 Bungtown Road, Cold Spring Harbor, New York 11724, USA. Correspondence should be addressed to A.R.K.([email protected]).


providing two distinct contexts in which to analyze the efficiency of the test 5′ ss. This difference is mainly attributable to a single-nucleotide divergence at the sixth position of this exon. Whereas a cytidine in SMN1 results in virtually complete exon 7 inclusion, a uridine in SMN2 results predominantly in exon skipping13 because of the sequence change in a cis-acting element(s) recognized by a splicing activator in SMN1 and/or a repressor in SMN2 (refs. 12,14).

We substituted the natural 5′ ss of SMN1/2 exon 7 (GGA/GUAAGUCU; eight base pairs with U1) with different versions of the atypical 5′ ss. In SMN2, the atypical 5′ ss was threefold more efficient than the natural one (Fig. 1c, lanes 1 and 2). This finding is remarkable because all available computational methods15–22 predict the natural SMN1/2 exon 7 5′ ss to be much stronger than the atypical 5′ ss (Table 1). The splicing compatibility of the atypical 5′ ss with the canonical 3′ splice site of SMN1/2 exon 8 also indicates that splicing via this 5′ ss is catalyzed by the major spliceosome1.

SMN1/2 minigenes with mutations at the atypical 5′ ss positions +3 and +6 that restore the consensus nucleotide (but disrupt shifted base-pairing) showed increased exon 7 skipping (Fig. 1c, lanes 3–6), consistent with the above results with the GTF2H1 and INPP4A substrates, and suggesting that the shifted base-pairing register is being used. The simpler mRNA patterns obtained with the SMN1/2 minigenes made them more amenable to further mutational analyses.

Suppressor U1 analysis demonstrates shifted base-pairing
We next sought to determine whether the atypical 5′ ss is indeed recognized by shifted base-pairing to U1 (Fig. 1a). To this end, we transfected a series of SMN1/2 minigenes carrying mutations at the atypical 5′ ss, along with U1 snRNA expression plasmids with compensatory mutations that restore base-pairing. This type of informational suppression analysis is known as suppressor- or shift-U1 experiments5,7–9,23–25.

First, we tested a series of mutations that introduced a consensus nucleotide at different positions of the atypical 5′ ss (Fig. 2a,b and Supplementary Fig. 2a online). All mutations, with the exception of –2A and –1G in the SMN1 context, resulted in partial or complete loss of exon 7 inclusion, further indicating that canonical base-pairing with U1 does not occur at the atypical 5′ ss (Fig. 2c). The corresponding suppressor U1 snRNAs in the shifted base-pairing register partially restored exon 7 inclusion for some of these mutants: +5G and +7A in SMN1 (Fig. 2c, above, lanes 8–9 and 12–13) and –1G and +5G in SMN2 (Fig. 2c, below, lanes 4–5 and 8–9). For one mutant 5′ ss, +3A in the SMN1 context, the suppressor U1 snRNA decreased exon 7 inclusion (Fig. 2c, above, lanes 6 and 7), perhaps reflecting a block in a subsequent step in the splicing reaction. The –2A and +6U mutations could not be rescued by suppressor U1s in either of the two contexts. The –2A mutation resulted in very slight exon 7 skipping. The +6U mutation (as well as +6C in Figure 3, see below) was not rescued by suppressor U1, perhaps because this mutation eliminates a strong G-C base pair essential for efficient binding of U1. Nevertheless, the rescue of exon 7 inclusion by suppressor U1s for mutants –1G, +5G and +7A is consistent with the hypothesis that atypical 5′ ss are recognized via shifted base-pairing to U1.
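The logic of a compensatory (suppressor) mutation can be sketched as a position mapping: for a mutated 5′ ss position, find the U1 nucleotide that faces it in a given register and replace it with the Watson-Crick complement of the new 5′ ss nucleotide. The sketch below is a hypothetical illustration, not the design of the authors' U1 plasmids. It assumes the U1 5′ end sequence 5′-AUACΨΨACCUG-3′ with pseudouridine written as U, and all names are illustrative.

```python
# Illustrative sketch: which U1 position faces a given 5' ss position,
# and which U1 nucleotide would restore a Watson-Crick pair there.
# Assumption: U1 5' end is AUACΨΨACCUG, with pseudouridine (Ψ) written as U.
U1_5P_END = "AUACUUACCUG"  # 5'->3', U1 positions 1..11
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
REGISTERS = {"canonical": 8, "shifted": 9}  # U1 position facing +1G


def u1_partner(ss_position: int, register: str) -> int:
    """Return the 1-based U1 position facing a 5' ss position."""
    # Splice-site numbering has no position 0 (..., -2, -1, +1, +2, ...),
    # so convert it to an offset from +1 before mapping onto U1.
    offset = ss_position - 1 if ss_position > 0 else ss_position
    return REGISTERS[register] - offset


def compensatory_change(ss_position: int, new_nt: str, register: str):
    """Return (U1 position, current U1 nt, U1 nt that pairs with new_nt)."""
    pos = u1_partner(ss_position, register)
    if not 1 <= pos <= len(U1_5P_END):
        raise ValueError("5' ss position lies outside the U1 5' end in this register")
    return pos, U1_5P_END[pos - 1], COMPLEMENT[new_nt]


if __name__ == "__main__":
    for register in REGISTERS:
        print(register, compensatory_change(5, "G", register))
```

For the +5G mutation, this mapping points to U1 position 5 in the shifted register, where a U-to-C change would restore a G-C pair; in the canonical register, +5 already faces C4 and needs no change.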

Second, we analyzed a series of mutant 5′ ss with a cytidine at one intronic position (+3 to +6), with or without co-transfected suppressor U1 snRNAs carrying the compensatory mutation in either the classical or the shifted register (Fig. 3a and Supplementary Fig. 2b–d). We chose the nucleotide cytidine because it cannot form a base pair with wild-type U1 in either of the two arrangements. In all cases, each mutation resulted in predominant skipping of SMN1/2 exon 7 (Fig. 3b). Splicing via the mutant 5′ ss with cytidine at

[Figure 1, panels a–c: 5′ ss–U1 base-pairing diagrams in the canonical (C) and shifted (S) registers; RT-PCR gels for the GTF2H1 (exons 7–9), INPP4A (exons 5–7) and SMN1/2 minigenes.]

Table 1 Scores of the SMN1/2 exon 7 5′ ss (upper sequence) and of the atypical 5′ ss (lower sequence)

Sequence             S&S(a)   ΔG(b)    H-Bond(c)   NN(d)   MAXENT(e)   MDD(f)   MM(g)
5′-GGAGUAAGUCU-3′    77.48    –8.70    14.50       0.99    8.57        12.28    6.36
5′-ACAGUUAAGUA-3′    51.65    –2.20    1.90        0.00    –12.18      –2.72    –4.30

(a) Shapiro and Senapathy Consensus Value, a position-weight matrix15,16. (b) Free energy of the 5′ ss–U1 RNA duplex in the canonical register17. (c) An algorithm based on the hydrogen bonding of the 5′ ss–U1 base-pairing in the canonical register18. (d) Neural Network, a machine learning approach19. (e) Maximum Entropy Model, an algorithm that considers dependencies between positions20,21. (f) Maximum Dependence Decomposition, a decision-tree approach20,21. (g) First-order Markov Model, an algorithm that considers dependencies between adjacent positions20,21. See refs. 18,22 for detailed descriptions of these methods.

Figure 1 Shifted base-pairing between atypical 5′ ss and the 5′ end of U1 snRNA. (a) Diagram of the two base-pairing registers between the 5′ ss (positions are numbered) and U1. Consensus nucleotides are shown in red in all figures (Methods). Ψ, pseudouridine; ●, 2,2,7-trimethylguanosine cap at the 5′ end of U1; box, upstream exon; line, intron. Base pairs in the canonical (C) or shifted (S) register are indicated by vertical lines. Note that the atypical 5′ ss can form seven more base pairs to U1 in the shifted arrangement. (b) Mutations at the atypical (Atp) 5′ ss that disrupt shifted but enhance canonical base-pairing abolish correct splicing. The human GTF2H1 and INPP4A minigenes are schematically represented at the top, indicating the mutations introduced at the atypical 5′ ss. M, molecular weight markers. The identity of the various spliced mRNAs, detected by radioactive RT-PCR, is schematically shown on the left of the gels: 1, correctly spliced mRNA; 2, retention of the downstream intron; 3, use of cryptic 5′ ss in the middle exon; 4, skipping of the middle exon; 5, activation of a cryptic 5′ ss in the first exon. The percentage of correct splicing is shown below. See Supplementary Figure 1 for details about the aberrantly spliced mRNAs. (c) RT-PCR analysis of the atypical 5′ ss in the SMN1/2 context (schematic above). Nat, natural SMN1/2 exon 7 5′ ss. Numbers below show the percentage and s.d. of exon 7 inclusion.

NATURE STRUCTURAL & MOLECULAR BIOLOGY VOLUME 16 NUMBER 2 FEBRUARY 2009 177

© 2009 Nature America, Inc. All rights reserved.

positions +4 or +5 was rescued by suppressor U1s in the shifted, but not in the canonical base-pairing, register (Fig. 3b, lanes 5–10). Splicing via the +3C mutant 5′ ss in SMN1 was rescued by both suppressor U1s (Fig. 3b, lanes 2–4), but the suppressor in the shifted register showed substantially higher activity. We also tested for suppression of the +4C mutation in the INPP4A and GTF2H1 minigenes and found that the suppressor U1 in the shifted but not in the canonical register restored recognition of the mutant 5′ ss (Supplementary Fig. 3 online). Furthermore, analysis of 5′ ss with two mutations in the context of SMN1/2 minigenes gave consistent results (Supplementary Fig. 4 online). Although not all suppressor U1s are effective in this type of experiment5,7,8, our data show that many of the suppressor U1 snRNAs in the shifted register can rescue mutations at atypical 5′ ss. Together, our U1-suppressor experiments formally demonstrate that recognition of these 5′ ss is mediated by base-pairing to U1 that is shifted by one nucleotide relative to the canonical scheme.
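To make the two registers concrete, the following sketch counts Watson–Crick base pairs between an 11-nt 5′ ss (positions –3 to +8, as listed in Table 1) and the 5′ end of U1. It is an illustration under stated assumptions rather than the scoring used in this study: the U1 5′ end is taken as the first 11 nt of human U1 snRNA with the two pseudouridines written as U, G-U wobble pairs are ignored, and the function name `count_base_pairs` is hypothetical.

```python
# First 11 nt of human U1 snRNA (5'->3'); pseudouridines at positions 5 and 6
# are written as U, an assumption made only to keep the pairing rule simple.
U1_5P_END = "AUACUUACCUG"
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def count_base_pairs(splice_site: str, shift: int = 0) -> int:
    """Count Watson-Crick pairs between an 11-nt 5' ss (-3..+8) and U1.

    shift=0 is the canonical register (5' ss -3 opposite U1 position 11);
    shift=1 is the shifted register (5' ss -1 opposite U1 position 10).
    Positions falling outside the 11-nt U1 5' end are ignored.
    """
    pairs = 0
    for i, nt in enumerate(splice_site):
        u1_position = len(U1_5P_END) - i + shift  # 1-based U1 numbering
        if 1 <= u1_position <= len(U1_5P_END):
            if (nt, U1_5P_END[u1_position - 1]) in WATSON_CRICK:
                pairs += 1
    return pairs


NATURAL = "GGAGUAAGUCU"   # SMN1/2 exon 7 5' ss (Table 1)
ATYPICAL = "ACAGUUAAGUA"  # atypical 5' ss (Table 1)

for name, seq in (("natural", NATURAL), ("atypical", ATYPICAL)):
    print(name, "canonical:", count_base_pairs(seq, 0),
          "shifted:", count_base_pairs(seq, 1))
```

In this simplified count the atypical site forms 3 pairs in the canonical register and 9 in the shifted register, whereas the natural site forms 7 and 3. The counts cover only the 11 nt listed in Table 1 and exclude wobble pairs, so they need not equal counts read from the full diagrams.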

Atypical 5′ ss are recognized by U1 and not U1A7 snRNA

A recent report described the expression of three human U1 snRNA variants with 5′ ends different from that of U1 and several nucleotide changes at other positions26. Notably, the U1A7 snRNA 5′ end has perfect complementarity to the atypical 5′ ss, also in the shifted register. To test the role of the U1A7 snRNA in the recognition of atypical 5′ ss, we performed a series of experiments with suppressor U1 or U1A7 snRNAs or RNA decoys.

We used suppressor U1 and U1A7 snRNAs both in the canonical or shifted register to try to rescue the +4C mutation in the SMN1/2 contexts (Supplementary Fig. 5 online). In addition, the 5′ ends and the snRNA bodies of U1 and U1A7 were swapped to make chimeric snRNAs. None of the suppressors with the U1A7 snRNA body rescued exon 7 inclusion. With the U1 body, both the U1 and U1A7 5′ ends carrying the compensatory mutation in the shifted but not in the canonical register rescued splicing. As expected, the 5′ end of U1A7 was more effective than that of U1 because it can form an extra base pair to the +4C 5′ ss. However, owing to the lack of activity of suppressors with the U1A7 body, the much greater abundance of U1 and the fact that an snRNA with the U1 body and the U1A7 5′ end does not exist in human cells, we infer that U1A7 is not involved in the recognition of atypical 5′ ss.

In addition, we used U1- and U1A7-specific RNA decoys to further test which of these two trans-acting factors is involved in the recognition of atypical 5′ ss (Fig. 4). The D1 and D7 decoys are short RNAs that carry a sequence with perfect complementarity to the 5′ end of U1 or U1A7 snRNAs, respectively (Fig. 4a). The D1 decoy has the consensus 5′ ss sequence, and the D7 decoy has the atypical 5′ ss sequence. We determined that RNA decoys bind to their cognate snRNAs only when they have perfect complementarity to them (Supplementary Fig. 6 online), thereby reducing the free levels of these snRNPs in the cell and affecting the splicing of certain introns (data not shown). The decoy RNAs were cotransfected with SMN1/2 minigenes with the natural exon 7 5′ ss or the atypical 5′ ss. The D1

[Figure 2, panels a–c: single consensus mutations at the atypical 5′ ss; base-pairing of the +5G mutant 5′ ss with suppressor U1 (C5); RT-PCR gels for SMN1 and SMN2.]

[Figure 3, panels a,b: +4C mutant 5′ ss with suppressor U1 G5 (canonical, C) or G6 (shifted, S); RT-PCR gels for the +3C to +6C mutants in SMN1 and SMN2.]

Figure 2 Suppressor U1 snRNAs in the shifted register can rescue splicing. (a) Schematic of the single mutations introduced at the atypical 5′ ss in the SMN1/2 context. These mutations substitute a nonconsensus nucleotide by a consensus nucleotide. (b) Base-pairing of the mutant 5′ ss with the corresponding suppressor U1 snRNA. As an example, we show the base-pairing of the +5G mutant 5′ ss with the suppressor U1 snRNA carrying the corresponding compensatory mutation (C5) in the shifted register. The mutant nucleotide at the 5′ end of U1 in each case is shown in red. See Supplementary Figure 2 for the base-pairing of all mutant 5′ ss with their respective suppressor U1s. (c) RT-PCR analysis of the SMN1/2 minigenes carrying the wild-type (lane 1) or mutant atypical 5′ ss (lanes 2–13). The 5′ ss mutation is indicated above, without (–) or with (+) the corresponding suppressor U1 snRNA. The mRNA products are schematically indicated on the left. The fastest migrating band in SMN1 corresponds to an mRNA that skipped exon 7 and used a cryptic 5′ ss 50 nt upstream of the exon 6 5′ ss. The percentage and s.d. of exon 7 inclusion is indicated below each autoradiogram.

Figure 3 Compensatory U1 mutations that restore shifted but not canonical base-pairing rescue splicing at atypical 5′ ss. (a) Scheme of the experimental design. SMN1/2 minigenes carrying point mutations at a heterologous atypical 5′ ss in exon 7 were co-transfected with suppressor U1 snRNAs. The 5′ ss nucleotides at positions +3 to +6 were individually mutated to cytidine (+3C to +6C). The 5′ end of U1 was mutated so as to rescue base-pairing in the canonical or the shifted arrangement (suppressor U1 mutations G3 to G6). Mutant +4C is shown as a representative example, for which U1 mutations G5 or G6 restore base-pairing in the canonical (C) or shifted (S) register, respectively. Mutations are shown in blue type. For the other three mutations, see Supplementary Figure 2. (b) RT-PCR analysis of the +3C to +6C 5′ ss mutations in SMN1/2 with suppressor U1. Labels above indicate the 5′ ss mutant and the suppressor U1 in either register. Atp, wild-type atypical 5′ ss. The percentage and s.d. of exon 7 inclusion is shown below each autoradiogram.


decoy reduced recognition of both the natural (Fig. 4b, lane 11 versus 10) and the atypical (Fig. 4b, lanes 2–5 versus 1) 5′ ss in exon 7 in a dose-dependent manner. The D7 decoy did not substantially affect recognition of the atypical 5′ ss (Fig. 4b, lanes 6–9 versus 1) and had only a subtle effect on the natural exon 7 5′ ss (Fig. 4b, lane 12 versus 10). The results obtained with the U1 and U1A7 suppressors and the decoys demonstrate that both the atypical and the natural SMN1/2 exon 7 5′ ss are recognized by the same trans-acting factor, U1 snRNA, and not by U1A7.

Atypical 5′ ss do not base-pair to U6 in a shifted register

During spliceosome assembly, U1 is displaced from the 5′ ss to allow base-pairing of U5 and U6 snRNAs to the exonic and intronic portions of the 5′ ss, respectively27–31. This replacement is crucial for spliceosome assembly and catalysis. The atypical 5′ ss has an extended potential base-pairing to the phylogenetically invariant U6 ACAGAG box, when its position is shifted by one nucleotide (Fig. 5a, 6 bp versus 3 bp). To test whether this shifted base-pairing to U6 can occur, we used suppressor U6 snRNAs30,32–34 in combination with suppressor U1 snRNAs to try to rescue atypical 5′ ss mutations in the SMN1/2 context (Fig. 5b,c and Supplementary Fig. 7 online). Suppressor U6s with only one compensatory mutation had no effect on exon 7 inclusion (Supplementary Fig. 7), but suppressor U6s with several mutations did (Fig. 5b,c). Suppressor U6 in the canonical register resulted in higher levels of exon 7 inclusion than suppressor U6 in the shifted register (Fig. 5c, lanes 5 and 6 in SMN2). These data suggest that shifted base-pairing between the atypical 5′ ss and U6 does not occur. In other words, the same positions of the 5′ ss base-pair to the same positions in U6 in both conventional and atypical 5′ ss, such as uridine at 5′ ss position +2 base-pairing to 45A in U6. This observation is consistent with the proposed prominent role of the 5′ ss–U6 RNA helix in catalysis1,31 (Discussion).

Estimated counts and conservation of atypical 5′ ss

Atypical 5′ ss that can be recognized by shifted base-pairing to the U1 snRNA 5′ end are present in the five genomes in the SpliceRack database6: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana (Table 2 and Supplementary Tables 1 and 2 online). Conservative estimates of the number of 5′ ss recognized by this new mechanism, based on our current understanding of the shifted base-pairing requirements, range from 20 in D. melanogaster to 115 in A. thaliana. Notably, the C. elegans genome, which has lost all minor, U12-type introns6, has 63 5′ ss predicted to be recognized by shifted base-pairing. Furthermore, a comparison of orthologous 5′ ss pairs between humans and mice showed that the shifted base-pairing arrangement is partially conserved (~50%): we found 27 atypical 5′ ss that either have no nucleotide change between the two species, or have changes that maintain shifted base-pairing to U1; in contrast, we found 21 orthologous 5′ ss pairs recognized by shifted base-pairing in only one of the two species (Supplementary Table 2). These predictions strongly suggest that shifted base-pairing between 5′ ss and U1 is a minor but phylogenetically widespread phenomenon and that many of these atypical 5′ ss are conserved.

[Figure 4, panels a,b: U1 (D1) and U1A7 (D7) snRNA decoys; RT-PCR gels for SMN1 and SMN2 minigenes with the atypical (Atp) or natural (Nat) 5′ ss, lanes 1–12.]

[Figure 5, panels a–c: base-pairing of consensus and atypical 5′ ss with the U6 ACAGAG box in the canonical (C) and shifted (S) registers; suppressor U6 snRNAs for the +5C mutation; RT-PCR gels for SMN1 and SMN2.]

Figure 4 U1 but not U1A7 snRNA decoys reduce splicing via the atypical 5′ ss. (a) Schematic of the U1 (black) and U1A7 (green) snRNA decoys. The D1 and D7 decoys are short RNAs expressed from the potent U6 promoter, and comprise the first 27 nt of the U6 snRNA for stability, and a sequence with perfect complementarity to the 5′ end of U1 (black) or U1A7 (green) snRNAs, respectively. These decoys reduce the free levels of their cognate snRNAs in the cell, affecting the splicing of certain introns. (b) The D1 but not the D7 decoy reduced SMN1/2 exon 7 inclusion in minigenes carrying either the natural or an atypical 5′ ss. Labels above indicate the identity of the 5′ ss in exon 7 and the decoy used. The triangle depicts an increasing amount of decoy plasmid transfected with the minigene.

Figure 5 U6 snRNA does not base-pair to the atypical 5′ ss in a shifted register. (a) Schematic of the base-pairing between consensus (left) or atypical (right) 5′ ss and the conserved U6 ACAGAG box (positions are numbered). The open dot indicates the γ-monomethyl cap. The atypical 5′ ss has an extended base-pairing potential to U6 in the shifted register. (b) Schematic of the suppressor U6 snRNAs carrying compensatory mutations in either the canonical (C) or the shifted (S) register. These mutations (blue type) restore base-pairing for the +5C mutation at atypical 5′ ss in the SMN1/2 context. (c) RT-PCR analysis of the SMN1/2 minigenes cotransfected with suppressor U1 and U6 snRNAs. Labels above indicate the suppressor U1 or U6 used. wt, wild-type U6 snRNA. Suppressor U6s alone had no effect (lanes 3, 4 versus lane 1). In combination with suppressor U1, suppressor U6 in the canonical register resulted in more exon 7 inclusion than suppressor U6 in the shifted register (lanes 5 and 6 in SMN2). These results suggest that atypical 5′ ss establish canonical base-pairing to U6 snRNA.


DISCUSSION

Here we have shown a phylogenetically conserved mechanism of 5′ ss selection by shifted base-pairing to U1 snRNA, with important implications for genomics, evolution and human disease. Shifted base-pairing provides a basis for the efficient recognition of a subset of 5′ ss that are predicted to be very weak (Table 1). This unprecedented mechanism also reveals that the interaction between the 5′ ss and U1 is not as rigid as previously believed, allowing for alternative base-pairing arrangements that result in efficient splicing. The plasticity of the interaction between the 5′ ss and U1 is probably tolerated because the U1 snRNP defines the 5′ ss early on and is displaced from the spliceosome before catalysis1,31. Furthermore, the 5′ ss and U6 snRNA do not seem to show such base-pairing flexibility. Shifted base-pairing between atypical 5′ ss and U6 would imply that an extra nucleotide has to be inserted between the 5′ ss–U6 helix and the scissile bond. As the 5′ ss–U6 helix is at the spliceosomal catalytic core35, subtle perturbations of the positioning of this helix could impair catalysis. Thus, whereas U1 has enough flexibility to recognize the atypical 5′ ss in a shifted register, U6 probably needs to base-pair in the conventional register to allow the first trans-esterification step to occur at the correct position.

Early in splicing, 5′ ss and neighboring sequences are also bound by proteins that influence base-pairing to U1 and hence 5′ ss selection22. For instance, the U1 snRNP–specific polypeptide U1C binds to the 5′ ss before base-pairing with U1 (refs. 36,37). Shifted base-pairing between the 5′ ss and U1 could also rely on proteins by mechanisms that might differ from those for canonical base-pairing. In addition, proteins involved in 5′ ss selection perhaps account for the differences in splicing patterns seen for different mutations at atypical 5′ ss, as well as for the differences in rescue by suppressor U1s (Figs. 2 and 3 and Supplementary Figs. 3 and 4).

We ruled out the possibility that atypical 5′ ss are recognized by the U1 snRNA variant U1A7 (ref. 26) instead of U1. We have shown that suppressor U1A7 snRNAs did not rescue mutations at atypical 5′ ss (Supplementary Fig. 5) and that the U1A7-specific decoy D7 did not compromise recognition of any 5′ ss (Fig. 4). As these atypical 5′ ss were the most likely 5′ ss to be recognized by U1A7, considering their perfect complementarity (11 bp), our data also suggest that U1A7 is unlikely to function in splicing. Nevertheless, it remains possible that U1A7 is involved in processes other than splicing, as is U1 (refs. 18,38,39), or that other U1 variants26 have a role in 5′ ss selection.

A mechanism distinct from shifted base-pairing was proposed for one unusual intron in the HOP2 gene in S. cerevisiae40. Mutational analysis of this noncanonical 5′ ss suggested that it is recognized via an alternative base-pairing arrangement with U1, involving a bulged nucleotide at position +2 or +3 of the 5′ ss. In the case of the human atypical 5′ ss analyzed here, our mutational analyses and suppressor U1 data for position –1 (Fig. 2c, lanes 4 and 5) rule out the possibility of a bulged nucleotide in the interaction between these atypical 5′ ss and U1: the rescue of the –1G mutation in SMN2 by the U1 suppressor C10 indicates that the exonic positions of the atypical 5′ ss base-pair to U1 in the shifted register. This observation rules out a base-pairing register between the atypical 5′ ss and U1 that involves a bulged nucleotide at the 5′ ss, as this arrangement implies that position –1 would not base-pair to position 10 of U1.

Our study leaves open the possibility that other subclasses of atypical 5′ ss base-pair to U1 in other ‘shifted’ registers. We searched SpliceRack6 for other base-pairing arrangements between 5′ ss and U1, by shifting the 5′ end of U1 by two or three positions downstream, as well as by shifting it by one to three positions upstream (data not shown). We found few (15 or fewer) 5′ ss for each of these categories. Furthermore, most of these 5′ ss can establish a similar number of base pairs to U1 in the canonical register, as opposed to the atypical 5′ ss analyzed in this study (Supplementary Table 2). We conclude that, if other shifted base-pairing arrangements between naturally occurring 5′ ss and U1 actually occur, the number of 5′ ss recognized by these putative mechanisms should be far lower than the counts for atypical 5′ ss presented here (Table 2). Finally, we did not find any obvious candidate 5′ ss that could be recognized by shifted base-pairing to U11 snRNA or to the other two U1 variants26 (data not shown).

Notably, a +5 A-to-G mutation at the atypical 5′ ss (AGA/GUUAAGUAU) in intron 2 of the human RARS2 gene results in exon 2 skipping and is associated with pontocerebellar hypoplasia41. The pathogenic effects of this mutation, which paradoxically changes a nonconsensus to a consensus nucleotide, can now be explained by weakening of shifted base-pairing between this 5′ ss and U1: an A-Ψ base pair at position +5 is substituted by a weaker wobble G-Ψ base pair in the shifted register. Indeed, we found that this transition at a similar atypical 5′ ss tested in the SMN1/2 context compromised exon 7 inclusion, and exon 7 inclusion could be partially rescued by the U1 suppressor C5, which restored shifted base-pairing (Fig. 2c, lanes 8 and 9). Thus, shifted base-pairing can explain the effects at the molecular level of the +5 A-to-G mutation in intron 2 of the human RARS2 gene. These observations further strengthen the shifted base-pairing hypothesis and highlight its implications for molecular diagnosis of 5′ ss mutations10,41,42.

Atypical 5′ ss that are recognized by shifted base-pairing to U1 snRNA are found in a wide range of eukaryotic genomes. Even though the estimated number of these atypical 5′ ss in the genome is rather low at present, further experimental analysis of the tolerance of mutations at these 5′ ss is very likely to expand the set of predicted atypical 5′ ss. Furthermore, experimental analysis of the numerous 5′ ss sequences that can potentially base-pair to U1 with similar stability in both registers should allow a reassessment of their mechanism of recognition. In addition, characterization of this alternative mechanism of 5′ ss selection should prompt a recalculation of the 5′ ss motifs recognized in each base-pairing register, as these two categories of 5′ ss should have different consensus motifs (Fig. 1a). This in turn could lead to improved splice-site prediction tools, considering that all current 5′ ss scoring methods estimate these atypical 5′ ss to be very weak (Table 1). Finally, this study should facilitate the development of improved algorithms to find genes and exons in sequenced genomes, as well as to predict the effects of disease-causing mutations and SNPs that map at these atypical 5′ ss.

METHODS

In silico analyses. In addition to base-pairing to the classical 5′ ss motif spanning from positions –3 to +6, we took into account positions +7 and +8, which can also base-pair to U1 and contribute to splicing18,43,44, even though they do not show appreciable conservation in 5′ ss compilations6.

The SpliceRack database is a comprehensive collection of splice sites from five different genomes6: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana. We used the

Table 2 Counts for the atypical 5′ ss in five species and for the conserved 5′ ss between human and mouse

            H. sapiens   M. musculus   D. melanogaster   C. elegans   A. thaliana
Total       59           59            20                63           115
Conserved   27           27


built-in tool ‘Locate splice site sequence patterns’ to search for 5′ ss that are presumably recognized via shifted base-pairing. We restricted the query to ‘splice-site type GT_AG_U2, donor’, and ‘motif start position 5’. We used the following query sequences: NNHGTYRAGT, NYGGTYRAGT, NYAGTRRAGT, NYAGTYYAGT, NYAGTYRBGT, NYAGTYRAHT, and NYAGTYRAGV, where N = A, G, C or T; Y = C or T; R = A or G; H = A, T or C; B = G, C or T; V = G, A or C. We chose these patterns to single out 5′ ss that base-pair to U1 much more efficiently in the shifted than in the canonical register. We selected the intronic positions (+3 to +7) to base-pair to U1 in the shifted but not in the canonical register, but also allowing for one nucleotide mismatch to the putative ‘shifted’ consensus (CA/GUUAAGU). The requirements for the exonic positions (–2 and –1) are less stringent, in that mutations at these positions have weaker effects (Fig. 2). We avoided sequences with U2-type consensus nucleotides at both positions –2 and –1 (–2A–1G) in the searches, because this combination substantially strengthens canonical base-pairing to U1.
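The seven query sequences can be applied to any candidate site with a short script. The sketch below is not the SpliceRack tool itself; it simply expands the IUPAC codes defined above into regular expressions and reports which queries a 10-nt DNA sequence satisfies. It assumes that each query is aligned to 5′ ss positions –3 to +7 (the GT of each pattern falls at +1 and +2), and the helper names `to_regex` and `matching_queries` are hypothetical.

```python
import re

# IUPAC codes used in the queries, expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "Y": "[CT]", "R": "[AG]",
    "H": "[ACT]", "B": "[CGT]", "V": "[ACG]",
}

# The seven query sequences listed in the Methods.
QUERIES = [
    "NNHGTYRAGT", "NYGGTYRAGT", "NYAGTRRAGT", "NYAGTYYAGT",
    "NYAGTYRBGT", "NYAGTYRAHT", "NYAGTYRAGV",
]


def to_regex(pattern: str) -> re.Pattern:
    """Translate an IUPAC pattern into a compiled regular expression."""
    return re.compile("".join(IUPAC[code] for code in pattern))


def matching_queries(site: str) -> list:
    """Return the queries matched by a 10-nt site (positions -3 to +7).

    RNA input is accepted; U is converted to T before matching.
    """
    dna = site.upper().replace("U", "T")
    return [q for q in QUERIES if to_regex(q).fullmatch(dna)]


# First 10 nt of the two sequences in Table 1, written as DNA.
print(matching_queries("ACAGTTAAGT"))  # atypical 5' ss
print(matching_queries("GGAGTAAGTC"))  # natural SMN1/2 exon 7 5' ss
```

With this alignment, the atypical 5′ ss from Table 1 satisfies the first query (NNHGTYRAGT) only, and the natural SMN1/2 exon 7 5′ ss satisfies none.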

We performed all searches for the five species, retrieved and manually curated hits using the ENSEMBL45 and UCSC46 genome browsers. We also derived human-mouse orthologous pairs of 5′ ss. In many cases, the orthologous gene or intron could not be identified in the other species. Nevertheless, the comparison of 5′ ss between humans and mice resulted in the addition of a few extra 5′ ss to the lists from both species, because these genes were missing from the SpliceRack database in one of the species. We provide the total counts of 5′ ss predicted to be recognized by shifted base-pairing to U1 snRNA in Table 2 and Supplementary Table 1, as well as the counts for conserved human-mouse orthologous pairs. The complete list of atypical 5′ ss for the five species is provided in Supplementary Table 2.

We calculated the 5′ ss scores using several methods15–21. See refs. 18,22 for detailed descriptions and comparisons between algorithms.

Cloning procedures. We amplified the three-exon and two-intron GTF2H1 and INPP4A fragments from human genomic DNA and subcloned them into the pcDNA3.1+ vector (Invitrogen). We internally deleted intron 5 of INPP4A to leave only 225 nt at each end. Likewise, we deleted intron 7 of GTF2H1 to leave only 200 nt at each end. The SMN1/2 mutant minigenes in the pCI vector were previously described12. The U1 and U6 expression plasmids, termed pN/S6 and pGemU6, respectively, were a gift from N. Hernandez (University of Lausanne). We derived the plasmid containing the decoy RNAs from the pU6/Hae/RA.2 plasmid47, also obtained from N. Hernandez. This plasmid includes a U6 RNA polymerase III promoter and 27 nt of the U6 snRNA 5′ stem-loop structure to stabilize the small RNA48. In addition, we added unique restriction sites to subclone the different decoy RNA sequences, as well as an RNA polymerase III termination sequence.

We used PCR mutagenesis with PfuI Turbo (Stratagene) and oligonucleotides carrying the various mutations to generate the different mutant constructs. The sequences of all the primers used in this study are available upon request. We digested the PCR products with DpnI (New England Biolabs) before transformation of competent DH5α cells. We verified all mutants by DNA sequencing.

Minigene transfection into HeLa cells. We cultured HeLa cells in DMEM (Invitrogen) containing 10% (v/v) FBS and antibiotics (100 U ml–1 penicillin and 100 µg ml–1 streptomycin). We mixed the various GTF2H1, INPP4A or SMN1/2 plasmid constructs with control or suppressor U1 or U6, or decoy plasmids, and with the pEGFP-N1 plasmid (Clontech). For the suppressor snRNA experiments, we transfected 80 ng of the SMN1/2 minigene and EGFP-N1 plasmids with 800 ng of control (pcDNA3.1+ or pUC19) or suppressor U1 or U6 plasmid. For the decoy experiments, we transfected 55 ng of the SMN1/2 minigene and EGFP-N1 plasmids and 890 ng of decoy plasmid. We transfected a total of 1 µg of plasmid mixture into ~50%-confluent HeLa cells in six-well plates, using FuGENE 6 (Roche Diagnostics) at a 3:1 (plasmid:reagent) ratio.

RNA extraction, reverse transcription and PCR. We harvested cells 48 h after transfection, and extracted total RNA using TRIzol (Invitrogen). We eliminated residual DNA by RQ-DNase1 (Promega) digestion, and we phenol-extracted and ethanol-precipitated the RNA. We used a total of 1 µg of RNA for reverse transcription with Superscript II RT (Invitrogen) and oligo-dT as a primer. We amplified cDNAs derived from expression of the pcDNA3.1+ constructs by PCR using primers located in the transcribed portion of the plasmid. We amplified cDNAs from endogenous GTF2H1 and INPP4A transcripts using primers in the exons flanking the exon with the atypical 5′ ss. We amplified cDNAs from the SMN1/2 minigenes with pCI-Fwb and pCI-Rev primers12. In each case, we radiolabeled the 5′ end of one of the PCR primers using T4 polynucleotide kinase (New England Biolabs) and γ-32P-ATP, and we purified the primers using MicroSpin G-25 columns (GE Healthcare). We performed 23 cycles of PCR, ensuring that amplification remained in the exponential phase (data not shown). We separated the PCR products by 6% native PAGE, followed by phosphorimaging analysis to quantify the intensity of the bands. We performed three experimental replicates (RT-PCRs from three independent transfections) to derive the mean percentage of inclusion for each experiment. In all cases, the s.d. was <5%, such that the exon-inclusion percentage values can be compared between experiments. We determined the identity of each PCR product by using the Original TA Cloning kit (Invitrogen) to subclone gel-purified bands, followed by sequencing on an ABI3730 automated sequencer.
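The quantification step can be illustrated with a short calculation of the mean percentage of exon inclusion and its s.d. across three replicates. This is a sketch under stated assumptions, not the authors' analysis script: it assumes only two products (exon-included and exon-skipped bands) contribute to each lane, it uses the sample standard deviation (n – 1 denominator), and the band intensities shown are invented for illustration.

```python
from statistics import mean, stdev


def percent_inclusion(included: float, skipped: float) -> float:
    """Percentage of signal in the exon-included band for one lane."""
    total = included + skipped
    if total <= 0:
        raise ValueError("lane has no quantifiable signal")
    return 100.0 * included / total


def summarize(replicates):
    """Return (mean, sample s.d.) of percent inclusion over replicate lanes.

    Each replicate is an (included, skipped) pair of band intensities.
    """
    values = [percent_inclusion(inc, skp) for inc, skp in replicates]
    return mean(values), stdev(values)


# Hypothetical phosphorimager intensities from three independent transfections.
lanes = [(620.0, 380.0), (600.0, 400.0), (640.0, 360.0)]
avg, sd = summarize(lanes)
print(f"inclusion: {avg:.0f}%  s.d.: {sd:.0f}")
```

For these invented intensities the three lanes give 62%, 60% and 64% inclusion, so the summary is a mean of 62% with an s.d. of 2.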

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS

We thank M. Hastings and D. Horowitz for insightful comments on the manuscript, R. Sachidanandam for helpful suggestions, and Y. Hua and Z. Zhang for technical advice. X.R. and A.R.K. acknowledge support from the US National Institutes of Health grant GM42699.

AUTHOR CONTRIBUTIONS

X.R. performed the experiments and the in silico analyses; X.R. and A.R.K. contributed to the design of the study and to the preparation of the manuscript.

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Brow, D.A. Allosteric cascade of spliceosome activation. Annu. Rev. Genet. 36, 333–360 (2002).

2. Bessonov, S., Anokhina, M., Will, C.L., Urlaub, H. & Luhrmann, R. Isolation of an active step I spliceosome and composition of its RNP core. Nature 452, 846–850 (2008).

3. Lerner, M.R., Boyle, J.A., Mount, S.M., Wolin, S.L. & Steitz, J.A. Are snRNPs involved in splicing? Nature 283, 220–224 (1980).

4. Rogers, J. & Wall, R. A mechanism for RNA splicing. Proc. Natl. Acad. Sci. USA 77, 1877–1879 (1980).

5. Zhuang, Y. & Weiner, A.M. A compensatory base change in U1 snRNA suppresses a 5′ splice site mutation. Cell 46, 827–835 (1986).

6. Sheth, N. et al. Comprehensive splice-site analysis using comparative genomics. Nucleic Acids Res. 34, 3955–3967 (2006).

7. Seraphin, B., Kretzner, L. & Rosbash, M. A U1 snRNA:pre-mRNA base pairing interaction is required early in yeast spliceosome assembly but does not uniquely define the 5′ cleavage site. EMBO J. 7, 2533–2538 (1988).

8. Siliciano, P.G. & Guthrie, C. 5′ splice site selection in yeast: genetic alterations in base-pairing with U1 reveal additional requirements. Genes Dev. 2, 1258–1267 (1988).

9. Carmel, I., Tal, S., Vig, I. & Ast, G. Comparative analysis detects dependencies among the 5′ splice-site positions. RNA 10, 828–840 (2004).

10. Roca, X. et al. Features of 5′-splice-site efficiency derived from disease-causing mutations and comparative genomics. Genome Res. 18, 77–87 (2008).

11. Will, C.L. & Luhrmann, R. Splicing of a rare class of introns by the U12-dependent spliceosome. Biol. Chem. 386, 713–724 (2005).

12. Cartegni, L., Hastings, M.L., Calarco, J.A., de Stanchina, E. & Krainer, A.R. Determinants of exon 7 splicing in the spinal muscular atrophy genes, SMN1 and SMN2. Am. J. Hum. Genet. 78, 63–77 (2006).

13. Lorson, C.L., Hahnen, E., Androphy, E.J. & Wirth, B. A single nucleotide in the SMN gene regulates splicing and is responsible for spinal muscular atrophy. Proc. Natl. Acad. Sci. USA 96, 6307–6311 (1999).

14. Kashima, T. & Manley, J.L. A negative element in SMN2 exon 7 inhibits splicing in spinal muscular atrophy. Nat. Genet. 34, 460–463 (2003).

15. Senapathy, P., Shapiro, M.B. & Harris, N.L. Splice junctions, branch point sites, and exons: sequence statistics, identification, and applications to genome project. Methods Enzymol. 183, 252–278 (1990).

16. Shapiro, M.B. & Senapathy, P. RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res. 15, 7155–7174 (1987).


17. Serra, M.J. & Turner, D.H. Predicting thermodynamic properties of RNA. Methods Enzymol. 259, 242–261 (1995).

18. Hartmann, L., Theiss, S., Niederacher, D. & Schaal, H. Diagnostics of pathogenic splicing mutations: does bioinformatics cover all bases? Front. Biosci. 13, 3252–3272 (2008).

19. Brunak, S., Engelbrecht, J. & Knudsen, S. Prediction of human mRNA donor and acceptor sites from the DNA sequence. J. Mol. Biol. 220, 49–65 (1991).

20. Burge, C. Modeling dependencies in pre-mRNA splicing signals. in Computational Methods in Molecular Biology, Ch. 8, 129–164 (eds. Salzberg, S.L., Searls, D.B. & Kasif, S.) (Elsevier, Philadelphia, 1998).

21. Yeo, G. & Burge, C.B. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. 11, 377–394 (2004).

22. Roca, X., Sachidanandam, R. & Krainer, A.R. Determinants of the inherent strength of human 5′ splice sites. RNA 11, 683–698 (2005).

23. Mount, S.M. & Anderson, P. Expanding the definition of informational suppression. Trends Genet. 16, 157 (2000).

24. Lo, P.C., Roy, D. & Mount, S.M. Suppressor U1 snRNAs in Drosophila. Genetics 138, 365–378 (1994).

25. Cohen, J.B., Snow, J.E., Spencer, S.D. & Levinson, A.D. Suppression of mammalian 5′ splice-site defects by U1 small nuclear RNAs from a distance. Proc. Natl. Acad. Sci. USA 91, 10470–10474 (1994).

26. Kyriakopoulou, C. et al. U1-like snRNAs lacking complementarity to canonical 5′ splice sites. RNA 12, 1603–1611 (2006).

27. Newman, A.J. & Norman, C. U5 snRNA interacts with exon sequences at 5′ and 3′ splice sites. Cell 68, 743–754 (1992).

28. Wassarman, D.A. & Steitz, J.A. Interactions of small nuclear RNA's with precursor messenger RNA during in vitro splicing. Science 257, 1918–1925 (1992).

29. Kandels-Lewis, S. & Seraphin, B. Involvement of U6 snRNA in 5′ splice site selection. Science 262, 2035–2039 (1993).

30. Lesser, C.F. & Guthrie, C. Mutations in U6 snRNA that alter splice site specificity: implications for the active site. Science 262, 1982–1988 (1993).

31. Staley, J.P. & Guthrie, C. Mechanical devices of the spliceosome: motors, clocks, springs, and things. Cell 92, 315–326 (1998).

32. Hwang, D.Y. & Cohen, J.B. U1 snRNA promotes the selection of nearby 5′ splice sites by U6 snRNA in mammalian cells. Genes Dev. 10, 338–350 (1996).

33. Brackenridge, S., Wilkie, A.O. & Screaton, G.R. Efficient use of a 'dead-end' GA 5′ splice site in the human fibroblast growth factor receptor genes. EMBO J. 22, 1620–1631 (2003).

34. Konarska, M.M., Vilardell, J. & Query, C.C. Repositioning of the reaction intermediate within the catalytic center of the spliceosome. Mol. Cell 21, 543–553 (2006).

35. Rhode, B.M., Harmuth, K., Westhof, E. & Luhrmann, R. Proximity of conserved U6 and U2 snRNA elements to the 5′ splice site region in activated spliceosomes. EMBO J. 25, 2475–2486 (2006).

36. Du, H. & Rosbash, M. Yeast U1 snRNP-pre-mRNA complex formation without U1snRNA-pre-mRNA base pairing. RNA 7, 133–142 (2001).

37. Du, H. & Rosbash, M. The U1 snRNP protein U1C recognizes the 5′ splice site in the absence of base pairing. Nature 419, 86–90 (2002).

38. Lu, X.B., Heimer, J., Rekosh, D. & Hammarskjold, M.L. U1 small nuclear RNA plays a direct role in the formation of a rev-regulated human immunodeficiency virus env mRNA that remains unspliced. Proc. Natl. Acad. Sci. USA 87, 7598–7602 (1990).

39. Boelens, W.C. et al. The human U1 snRNP-specific U1A protein inhibits polyadenylation of its own pre-mRNA. Cell 72, 881–892 (1993).

40. Leu, J.Y. & Roeder, G.S. Splicing of the meiosis-specific HOP2 transcript utilizes a unique 5′ splice site. Mol. Cell. Biol. 19, 7933–7943 (1999).

41. Edvardson, S. et al. Deleterious mutation in the mitochondrial arginyl-transfer RNA synthetase gene is associated with pontocerebellar hypoplasia. Am. J. Hum. Genet. 81, 857–862 (2007).

42. Buratti, E. et al. Aberrant 5′ splice sites in human disease genes: mutation pattern, nucleotide structure and comparison of computational tools that predict their utilization. Nucleic Acids Res. 35, 4250–4263 (2007).

43. Lund, M. & Kjems, J. Defining a 5′ splice site by functional selection in the presence and absence of U1 snRNA 5′ end. RNA 8, 166–179 (2002).

44. Schwartz, S.H. et al. Large-scale comparative analysis of splicing signals and their corresponding splicing factors in eukaryotes. Genome Res. 18, 88–103 (2008).

45. Stalker, J. et al. The Ensembl web site: mechanics of a genome browser. Genome Res. 14, 951–955 (2004).

46. Kent, W.J. et al. The Human Genome Browser at UCSC. Genome Res. 12, 996–1006 (2002).

47. Lobo, S.M. & Hernandez, N. A 7 bp mutation converts a human RNA polymerase II snRNA promoter into an RNA polymerase III promoter. Cell 58, 55–67 (1989).

48. Good, P.D. et al. Expression of small, therapeutic RNAs in human cell nuclei. Gene Ther. 4, 45–54 (1997).


A distinct class of small RNAs arises from pre-miRNA–proximal regions in a simple chordate

Weiyang Shi1,2, David Hendrix1,2, Mike Levine1 & Benjamin Haley1

MicroRNAs (miRNAs) have been implicated in various cellular processes. They are thought to function primarily as inhibitors of gene activity by attenuating translation or promoting mRNA degradation. A typical miRNA gene produces a predominant ~21-nucleotide (nt) RNA (the miRNA) along with a less abundant miRNA* product. We sought to identify miRNAs from the simple chordate Ciona intestinalis through comprehensive sequencing of small RNA libraries created from different developmental stages. Unexpectedly, half of the identified miRNA loci encode up to four distinct, stable small RNAs. The additional RNAs, miRNA-offset RNAs (moRs), are generated from sequences immediately adjacent to the predicted ~60-nt pre-miRNA. moRs seem to be produced by RNAse III–like processing, are ~20 nt long and, like miRNAs, are observed at specific developmental stages. We present evidence suggesting that the biogenesis of moRs results from an intrinsic property of the miRNA processing machinery in C. intestinalis.

miRNA genes have been observed across the Eukarya1–5. A typical miRNA arises from the processing of a larger primary transcript (pri-miRNA) that is synthesized by RNA polymerase II, as seen for protein-coding genes6. The pri-miRNA transcript forms one or multiple fixed hairpin structures that are liberated by the RNase III enzyme Drosha7. The resulting ~70-nt hairpins (pre-miRNAs) are further processed by a separate RNAse III enzyme, Dicer, which produces stable, mature miRNAs of 20–22 nt in length8–10.

Serial processing of pre-miRNAs is usually asymmetric, resulting in the production of a single, predominant miRNA arising from either the 5′ or 3′ arm of the pre-miRNA hairpin. In some cases, the opposite arm produces what is known as a miRNA* sequence that can reach appreciable steady-state levels but is less abundant than the miRNA11. The resulting miRNA and miRNA* can regulate distinct target mRNAs in a coordinated fashion12.

It has been proposed that conserved miRNA gene families provide a distinctive evolutionary signature and that the miRNA repertoire expands along with animal complexity13. To better understand the evolutionary history of miRNA genes among the chordate lineages, we performed a high-resolution study of small RNAs from the ascidian Ciona intestinalis, which belongs to the sister group of the vertebrates14. In contrast to other well-studied model organisms, C. intestinalis possesses a uniquely simplified repertoire of small RNA cofactors, consisting of single copies of Drosha, Pasha, Dicer, TRBP/PACT and Argonaute, and just two PIWI homologs11,14,15.

Here we report that numerous miRNA loci in C. intestinalis produce one or two discrete and stable ~20-nt small RNA species from sequences immediately adjacent to the predicted pre-miRNA hairpins, in addition to conventional miRNA and miRNA* products. The biogenesis of these distinct RNAs is not explained by current models of miRNA processing. We present evidence that moRs are derived from an unanticipated activity of the C. intestinalis miRNA-biogenesis pathway.

RESULTS
Distinct small RNAs encoded by miRNA loci
We prepared small RNA (~16–26-nt) libraries from C. intestinalis at various developmental stages, including unfertilized eggs, early embryos, late embryos and adults. High-throughput sequencing of the resulting cDNAs was performed with an Illumina 1G Genome Analyzer. Combining earlier studies with a recently described miRNA-discovery algorithm, we defined 80 miRNA loci in the C. intestinalis genome16–18. Detailed information regarding the encoded miRNAs and their potential target mRNAs is provided in Supplementary Tables 1–4 online and at the following website: http://flybuzz.berkeley.edu/cgi-bin/CionaMicroRNAs.cgi.

Half of these genes encode a single major product (the miRNA), along with a less abundant miRNA* sequence, as is typically seen in other organisms19,20. For example, the C. intestinalis (Ci) miR-125 gene (ortholog of the prototypic lin-4 miRNA in Caenorhabditis elegans) encodes a predominant miRNA that is stably expressed at all developmental stages examined21 (Supplementary Fig. 1 online). Ci-miR-125 is most highly expressed in adults, and at the adult stage a single clone of miR-125* is also detected.

Unexpectedly, the remaining half of C. intestinalis miRNA loci encode previously uncharacterized small RNAs, in addition to

Received 16 September 2008; accepted 21 November 2008; published online 18 January 2009; doi:10.1038/nsmb.1536

1Department of Molecular Cell Biology, Division of Genetics, Genomics, and Development, Center for Integrative Genomics, University of California, Berkeley, California 94720-3200, USA. 2These authors contributed equally to the work. Correspondence should be addressed to B.H. ([email protected]) or M.L. ([email protected]).

NATURE STRUCTURAL & MOLECULAR BIOLOGY VOLUME 16 NUMBER 2 FEBRUARY 2009 183


conventional miRNA and miRNA* products. This new class of RNAs arises from sequences located adjacent to the predicted pre-miRNA stem-loop, and we hereafter refer to them as ‘moRs’, for miRNA-offset RNAs. Only small RNAs with 5′ monophosphates and free 3′ hydroxyl groups can be cloned by the method used in this study (see Methods), although they could contain modifications on the 2′ oxygen, as seen for Piwi-interacting RNAs (piRNAs) and some miRNA species22. Most moR sequences are 19–20 nt in length, whereas C. intestinalis miRNAs range in size between 19 nt and 22 nt (Supplementary Fig. 2a online). Overall, moRs are considerably less abundant than miRNAs, but just ~50% less abundant than miRNA* sequences (1,552 total reads and 3,353 total reads, respectively) (Supplementary Table 4b). In general, moRs show greater 5′ heterogeneity than miRNA or miRNA* sequences (Supplementary Fig. 2b). However, several abundantly expressed moRs, such as 5′ moRs 124-1 and 219, contain a rigid 5′-terminal nucleotide identity and show developmental regulation, suggesting that particular moRs may be under selective pressure, as has been suggested for the 5′ ends of miRNAs23.
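The abundance comparison quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the two read totals are taken from the text, while the function name `relative_deficit` is a hypothetical helper introduced here and is not part of the authors' analysis.

```python
def relative_deficit(reads_a: int, reads_b: int) -> float:
    """Fraction by which reads_a falls short of reads_b (0.5 = half as abundant)."""
    if reads_b <= 0:
        raise ValueError("reads_b must be positive")
    return 1.0 - reads_a / reads_b


mor_reads = 1552         # total moR reads reported in the text
mirna_star_reads = 3353  # total miRNA* reads reported in the text

deficit = relative_deficit(mor_reads, mirna_star_reads)
print(f"moRs are {deficit:.1%} less abundant than miRNA* sequences")
```

With these totals the deficit comes out slightly above one half, which agrees with the approximate figure given in the text.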

It is possible that the C. intestinalis miRNA loci encoding moRs contain unique structural features, as compared to those that do not24. Global comparisons of base-pairing probabilities across the extended pre-miRNA loci in C. intestinalis revealed only modest structural differences between the two classes of miRNA loci (Supplementary Fig. 3 online). Overall, C. intestinalis miRNA loci maintain a similar base-pairing probability trace as those seen in Drosophila melanogaster, suggesting that C. intestinalis miRNA genes lack an intrinsic, species-specific structure. Similarly, there is no obvious difference in the size of the loop sequences in pre-miRs that produce moRs and those that do not (~13 nt and ~15 nt, respectively; Supplementary Fig. 4a online). In addition, we analyzed sequence motifs for all small RNAs cloned in this study. Whereas C. intestinalis miRNAs retained the expected 5′-uracil bias, no obvious motifs were apparent in the moRs25 (Supplementary Fig. 4b). Thus, it is currently unclear why they arise from particular miRNA loci.

[Figure 1 image: (a) read histograms (y axis: reads, log scale) for the C. intestinalis miR-219 locus in unfertilized eggs, early embryos, late embryos and adults, with the regions 5′ moR-219, miR-219 and miR-219* marked; (b) predicted hairpin with aligned small RNAs. Total reads: miR-219, 715; miR-219*, 60; 5′-moR-219, 232. Secondary structure as extracted: ((((((((((.....)))))..((((.(.(((.(((((.((((((.(.((((........)))).).)))))).)))))))).)......((((((....))))))..)))).]

Figure 1 Developmental expression of small RNAs encoded by the C. intestinalis miR-219 locus. (a) Graphical depiction of small RNAs that map to the miR-219 locus at four developmental time points, indicated to the right. The histograms represent overlapping Illumina sequencing reads (numbered above stack) centered at each position (miRNA, blue; miRNA*, burgundy; 5′-moR, yellow). The y axis is plotted on a log scale. The secondary structure of the locus is presented in parenthetical format. (b) Locations of miRNA, miRNA* and moR sequences on the predicted secondary structure surrounding the pre–miR-219 hairpin. mFold was used to predict pre-miRNA secondary structure here and in the following figures45,46.

[Figure 2 image: (a) read histograms (y axis: reads, log scale) for the C. intestinalis miR-124-1/2 locus in unfertilized eggs, early embryos, late embryos and adults; (b) predicted pre–miR-124-1 and pre–miR-124-2 hairpins with aligned small RNAs; (c) canonical class III RNAse III product (~19-bp core, ~2-nt 3′ overhang) compared with Ci-miR-124-1 and Ci-moR-124-1. Total reads: miR-124-1, 8,211; miR-124-1*, 373; 5′-moR-124-1, 331; 3′-moR-124-1, 2; miR-124-2, 8,202; miR-124-2*, 497; 5′-moR-124-2, 56; 3′-moR-124-2, 8.]

Figure 2 Coincident expression of 5′ and 3′ moR sequences from the C. intestinalis miR-124 locus. (a) Sequencing reads at each position of the miR-124 cluster are shown (miRNA, blue; miRNA*, burgundy; 5′-moR, yellow; 3′-moR, green). (b) miRNA and moR sequences aligned with sequence surrounding the predicted pre–miR-124-1 and pre–miR-124-2 stem-loop structures. A red ‘C’ in the pre–miR-124-1 structure indicates a shared base between multiple 5′-moR and miR-124-1* clones. (c) Standard class III RNAse III product is shown (above), depicting an ~19-nt core of matched RNA bases, along with an ~2-nt 3′ overhang. Aligned sequences are shown in the context of the predicted secondary structure of the pri-miRNA for miR-124-1 (top) and miR-124-1* (bottom), as well as 5′-moR-124-1 (bottom) and 3′-moR-124-1 (top). A shared base between loci is marked as a red ‘C’.
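The captions above describe secondary structures in "parenthetical" (dot-bracket) format. As a rough illustration of how such a string can be turned into explicit base pairs, the sketch below builds a pair table with a stack. It is a generic parser, not the authors' pipeline; the function name `pair_table` and the toy hairpin are assumptions made here for the example, and structure strings recovered from a figure may be truncated or unbalanced, in which case the function raises an error instead of guessing.

```python
def pair_table(structure: str) -> dict[int, int]:
    """Map each paired index (0-based) to its partner for a dot-bracket string."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)          # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at index {i}")
            j = stack.pop()          # most recent unpaired '(' closes here
            pairs[i] = j
            pairs[j] = i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at index {i}")
    if stack:
        raise ValueError(f"unmatched '(' at index {stack[-1]}")
    return pairs


# Hypothetical 11-nt hairpin (4-bp stem, 3-nt loop); not a sequence from the paper.
toy = "((((...))))"
table = pair_table(toy)
print(table[0], table[3])
```

Unpaired positions (dots) are simply absent from the returned dictionary, which makes it easy to ask whether a given small-RNA end falls in a paired region.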


Defining characteristics of moRs
The C. intestinalis miR-219 gene (Ci-miR-219) encodes a predicted 57-nt pre-miRNA hairpin that is processed to produce miR-219 and miR-219*. In addition, a 5′ moR product (5′-moR-219) arises from sequences located immediately adjacent to miR-219 (Fig. 1a,b). The predominant miR-219 and miR-219* sequences are each ~21 nt in length, whereas the 5′-moR-219 sequence is 20 nt (Fig. 1b).

Like most miRNAs, nearly all 5′-moR-219 clones maintain an invariant 5′ end (223 of 232 total reads)23. Each of the three small RNAs observed at the miR-219 locus showed developmental regulation (Fig. 1a). Only miR-219 was detected in unfertilized eggs and adults (Fig. 1a), whereas both miR-219* and 5′-moR-219 were seen during embryogenesis.
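One simple way to quantify 5′-end uniformity of the kind reported above is the fraction of reads that share the most common 5′ start position. The sketch below is an illustrative calculation under stated assumptions: only the 223-of-232 split comes from the text, while the function name `five_prime_homogeneity` and the start coordinates (10 and 11) are hypothetical.

```python
from collections import Counter


def five_prime_homogeneity(start_positions):
    """Return (modal 5' start, fraction of reads that begin at that position)."""
    if not start_positions:
        raise ValueError("no reads supplied")
    counts = Counter(start_positions)
    modal_start, modal_count = counts.most_common(1)[0]
    return modal_start, modal_count / len(start_positions)


# Hypothetical coordinates: 223 reads begin at position 10 and 9 reads begin
# one nucleotide downstream. Only the 223/232 split is taken from the text.
starts = [10] * 223 + [11] * 9
modal, fraction = five_prime_homogeneity(starts)
print(modal, round(fraction, 3))
```

For the 5′-moR-219 counts this gives a modal-start fraction of about 0.96.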

In some cases, two distinct moRs are produced from a single miRNA gene, in addition to miRNA and miRNA* sequences (Fig. 2). The Ci-miR-124 locus encodes a pri-miRNA containing two tandem, but slightly different, ~58-nt pre-miRNAs (Fig. 2b). The resulting miRNAs, miR-124-1 and miR-124-2, are identical, and the sequence shows peak expression, as evidenced by increased read counts (see miR-133 example below), in advanced-stage embryos (Fig. 2a). Both pre-miRNAs produce 5′ and 3′ moRs during embryogenesis (Fig. 2a). We observed the 3′ moR from the pre–miR-124-2 hairpin in both early embryos and late embryos, but the 3′ moR from the pre–miR-124-1 hairpin was detected only in early embryos. Moreover, 5′-moR-124 RNAs are considerably more abundant than the 3′-moR-124 RNAs, a result that is typical of the moRs and reminiscent of the processing of miRNA and miRNA* sequences, as well as processing of pri-miRNA 5′ and 3′ arms by Drosha26–28.

Notably, alignment of coincident 5′ and 3′ moR sequences from numerous miRNA loci suggests that they arise from RNAse III processing (~21-nt duplexed RNAs with ~2-nt 3′ overhangs)29 (Fig. 2c and Supplementary Fig. 5 online).

Despite the high prevalence of moRs associated with miRNA loci in C. intestinalis, we found that, overall, moR sequences are poorly conserved as compared to miRNAs between C. intestinalis and a related ascidian species, Ciona savignyi, and moRs are even less conserved than miRNA* sequences (Supplementary Fig. 4c). However, it has been noted that well-conserved small RNAs are expressed at higher levels than those lacking conservation19,30. This is true for most miRNAs when comparing C. intestinalis to C. savignyi (Supplementary Fig. 4c). Similarly, abundant moRs are also better conserved than those found at low copy number. Nonetheless, the general lack of conservation raises the possibility that moRs may represent unstable processing intermediates during the biogenesis of miRNAs. Such intermediates might be produced through a generic RNA-degradation mechanism that leaves behind spurious and variably sized small RNAs. However, as with miRNAs, the high copy number and near uniformity of clones at each locus suggests that moRs are produced mainly as ~20-nt RNAs. To further address this point, we used northern assays to directly examine the expression and size distribution of miRNA and moRs in C. intestinalis embryos (see below).

Direct detection of moRs as discrete small RNAs
Vertebrate miR-133 genes are often part of a bicistronic pri-miRNA that also contains miR-1, and the two miRNAs work together to promote mesodermal fates31. A similar genomic linkage is seen in

[Figure 3 image: (a) read histograms (y axis: reads, log scale) for the C. intestinalis miR-133 locus in unfertilized eggs, early embryos, late embryos and adults, with the regions 5′ moR, miR-133*, loop and miR-133 marked; (b) predicted hairpin with aligned small RNAs; (c, d) northern blots for miR-133, 5′-moR-133 and U6 RNA with size markers at 30, 25 and 17 nt. Total reads: miR-133, 2,167; miR-133*, 198; 5′-moR-133, 1,002; loop, 20. Secondary structure as extracted: .....(((((..((((((((.((((.((((((.(((.((.((((((((....((.......))..)))))))).)))))..)))))).)))).)))).)))).....(((.....]

Figure 3 Direct detection of the 5′-moR-133 species. (a) Overlapping sequencing reads at each position along the miR-133 locus (miRNA, blue; miRNA*, burgundy; loop, gray; 5′-moR, yellow). (b) Alignment of sequenced reads on the predicted structure surrounding pre–miR-133. (c) Total RNA (~30 μg per lane) was used for northern blots showing the ~21-nt miR-133 (above) and 5′-moR-133 (middle) species throughout C. intestinalis development (M, size markers; Un, unfertilized eggs; EE, early embryos; LE, late embryos; Ad, adult animals). A northern blot for U6 RNA was used as a loading control (below). (d) As in c, comparing tailbud-stage C. intestinalis embryos that are unelectroporated (wild type, WT) or electroporated with a Ci-Brachyury enhancer:minimal Ci-miR-133 transgene (Bra). The Ci-Brachyury enhancer drives expression in the developing notochord33.


C. intestinalis, and previous studies have shown that the primary transcript containing miR-1 and miR-133 is selectively expressed in developing tail muscles during C. intestinalis embryogenesis32. The C. intestinalis miR-133 locus encodes separate miRNA, miRNA* and 5′ moR products (Fig. 3). miR-133 reads steadily increase during embryogenesis and reach peak levels in adults (Fig. 3a). We found that the 5′-moR-133 RNA is most abundant in late embryos and is present at an equal or higher read count than miR-133 and miR-133* at all embryonic stages examined.

The levels of miR-133 and 5′-moR-133 detected in northern assays are in agreement with the sequencing frequencies obtained from the cDNA libraries (Fig. 3c). There is a progressive increase in the steady-state levels of miR-133 in unfertilized eggs, early embryos, late-stage embryos and adults (Fig. 3c, above). Similarly, the predicted 5′-moR-133 RNA was detected as a stable product (appearing as a doublet of ~19–20-nt species in adults), with peak levels seen in late embryos. There was no indication of a smear or ‘ladder’ of higher- or lower-molecular-weight products, as would be expected if moRs represented incompletely degraded hairpin sequences or cleaved pri-miRNA transcripts. Moreover, ectopic expression of Ci-miR-133 directed by a Ci-Brachyury enhancer in the developing C. intestinalis notochord (the primitive chordate backbone) resulted in increased accumulation of both 5′-moR-133 and miR-133, indicating that expression of a discrete moR is correlated with that of the host miRNA transcript33 (Fig. 3d).

Drosophila pri-miRNAs produce moRs in the Ciona tadpole
The preceding analysis suggests that moRs arise from an intrinsic property of the C. intestinalis small RNA–biogenesis machinery (see Discussion). To test this possibility, the miR-309 miRNA cluster (also known as ‘8-miR’) from D. melanogaster was selectively expressed in C. intestinalis34,35 (Fig. 4). We reasoned that the pri–miR-309 transcript would be more likely to produce detectable moRs when expressed in the C. intestinalis tadpole because it seems to produce such products, albeit rarely, in D. melanogaster (Fig. 4a).

We separately placed the entire miR-309 cluster under the control of three different tissue-specific enhancers from C. intestinalis that direct expression in the notochord, epidermis and mesenchyme, respectively33,36. All three transgenes were coelectroporated into fertilized eggs, and the embryos were allowed to develop to the tailbud stage (after neurogenesis). Total RNA was extracted from these embryos and subjected to high-throughput sequencing or used for northern assays.

Drosophila melanogaster moRs are produced at high steady-state levels in C. intestinalis, and here we focused on the miR-3 and miR-5 genes within the miR-309 cluster. We detected only four 3′-moR-3 RNA reads in the D. melanogaster embryo, whereas in C. intestinalis we observed nearly 2,000 copies (Fig. 4a,b). There is also a marked increase in the levels of the 5′-moR-5 RNA produced in C. intestinalis as compared with those in D. melanogaster. Nearly all copies of this moR RNA contain homogenous 5′ and 3′ termini (1,616 of

[Figure 4 image: (a) read histograms (y axis: reads, log scale) along the D. melanogaster miR-309 cluster (miR-309, miR-3, miR-286, miR-4, miR-5, miR-6-1, miR-6-2, miR-6-3) for D. melanogaster and C. intestinalis samples; (b) most abundant reads at the miR-3 locus (5′-moR-3, miR-3*, miR-3, 3′-moR-3) and the miR-5 locus (5′-moR-5, miR-5, miR-5*) in each species; (c) northern blots for miR-3, 5′-moR-3 and U6 in lanes Ci, Ci + miR-309 and Dm, with size markers at 30, 25 and 17 nt.]

Figure 4 Ectopic expression of Drosophila pri-miRNAs can induce moR production in C. intestinalis embryos. (a) Small RNAs were cloned from 2–4-hour-old D. melanogaster Toll10b mutant embryos (above), which contain only mesodermal cell types, or tailbud-stage C. intestinalis embryos expressing the entire D. melanogaster pri–miR-309 cluster (below), and were subjected to Illumina sequencing. The resulting sequencing reads are shown at each position along the D. melanogaster miR-309 locus (miRNA, blue; miRNA*, burgundy; 5′-moR, yellow; intervening loop, gray). (b) The most abundant reads overlapping the respective regions of the miR-3 (above) or miR-5 (below) loci are shown. The number of clones matching the exact sequence depicted is shown in comparison to the overall number of clones overlapping that segment (in parentheses). (c) Northern blots showing miR-3 (above) and 5′-moR-3 (middle) in C. intestinalis and D. melanogaster embryos. For each well, ~50 μg total RNA was analyzed from tailbud-stage C. intestinalis embryos that were unelectroporated (Ci), similarly staged C. intestinalis embryos electroporated with D. melanogaster miR-309 expression plasmids (Ci + miR-309), or 2–4-hour-old Toll10b embryos. Below is shown a northern blot in which a cross-reactive probe for U6 RNA was used as a loading control.


1,629 cloned copies are identical; Fig. 4b). In contrast, miR-3 was cloned at high frequency in D. melanogaster and C. intestinalis. Using northern assays, we identified similar levels of miR-3 in C. intestinalis and D. melanogaster embryos, a result that is consistent with the similar number of reads detected by sequence analysis. However, using a specific 5′-moR-3 hybridization probe, we detected a discrete band, without any obvious intermediate products, only in C. intestinalis embryos ectopically expressing the miR-309 cluster (Fig. 4c).

There is no obvious correlation between the efficiency of moR biogenesis and the size of the loop sequence in the pre-miRNAs or conservation of other features. For example, the pre-miRNAs encoding miR-3 and miR-5 contain loops of 13 nt and 18 nt, respectively, but nonetheless produce similar yields of moRs. These experiments clearly demonstrate that the stable expression of moRs is an intrinsic feature of the C. intestinalis small RNA–processing machinery.

DISCUSSION
We have presented a high-resolution analysis of small RNAs during the development of the simple chordate, C. intestinalis. In the course of documenting 80 C. intestinalis miRNA genes, a distinct species of small RNAs was found to arise from sequences immediately 5′ and 3′ of the expected miRNA and miRNA* products. We have termed these small RNAs moRs (miRNA-offset RNAs).

moRs arise from ~50% of the detected miRNA loci in C. intestinalis. However, there is no obvious sequence or structural difference between those miRNA loci that produce moRs and those that do not. This observation raises the possibility that moRs might reflect an intrinsic property of the small RNA–biogenesis machinery in C. intestinalis (see below). It is currently unclear why this machinery fails to produce moRs from half of the C. intestinalis miRNA genes and why there is differential accumulation of individual moRs during C. intestinalis development.

Putative moR products are seen in D. melanogaster and mouse embryonic stem cells, although they are extremely rare19,37. It was suggested that they might arise as by-products from exonuclease digestion of pri-miRNAs. According to this view, the pre-miRNA stem-loop would be excised from the pri-miRNA by Drosha, followed by decapping and 5′-3′ degradation, leaving behind fortuitously cloned ~21-mers near the base of the pre-miRNA (summarized in Fig. 5). We have presented evidence suggesting that this mechanism probably does not apply to the biogenesis of C. intestinalis moRs. These products are far more abundant in C. intestinalis as compared with D. melanogaster and mouse. Moreover, the most abundant moRs contain homogenous 5′ and 3′ termini, and northern assays did not detect intermediate cleavage products (a smear or ladder), as would be expected from such processive degradation (Figs. 3c,d and 4c).

In C. intestinalis, distinct 5′ and 3′ moRs arise from sequences located between the bicistronic Ci-miR-124-1/2 pre-miRNAs and from an ectopically expressed D. melanogaster pri-miRNA cluster. It is difficult to reconcile the proposed exonucleolytic degradation model with the occurrence of such moRs, because this intervening region should be equally accessible to 5′-3′ and 3′-5′ exonucleases38,39. Once again, such processing would be expected to produce a range of small RNAs rather than the discrete products that are actually observed.

Altogether, the simplest explanation for the biogenesis of moRs is that they arise during Drosha processing of the pri-miRNA transcript. Drosha is a class II RNAse III enzyme containing two tandem RNAse III domains28,40. Following intramolecular dimerization of these domains, the enzyme cleaves the pri-miRNA substrate at a single site (two total phosphodiester bonds), releasing a 5′ and a 3′ product in addition to the pre-miRNA. Analysis of coincident 5′ and 3′ moRs from numerous miRNA loci (such as those arising near miR-124-1) suggests that they may be paired in a manner similar to products generated through a bona fide RNAse III–like mechanism. That is, the duplexed RNAs contain ~2-nt 3′ overhangs, as seen for Dicer products29.
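The overhang criterion described above can be expressed as a small coordinate check. The following is a minimal sketch under strong simplifying assumptions, not the authors' analysis: it models a perfectly paired hairpin (no bulges or mismatches), requires the 5′ end of each small RNA to be base-paired, and uses hypothetical coordinates. The function names `perfect_hairpin_pairs` and `three_prime_overhangs` are introduced here for illustration only.

```python
def perfect_hairpin_pairs(stem: int, loop: int) -> dict[int, int]:
    """Pair table for an idealized hairpin: `stem` base pairs closing a `loop`-nt loop."""
    length = 2 * stem + loop
    pairs = {}
    for i in range(stem):
        j = length - 1 - i      # position i pairs with its mirror position
        pairs[i] = j
        pairs[j] = i
    return pairs


def three_prime_overhangs(pairs, five_arm, three_arm):
    """Return the 3' overhangs (in nt) of the 5'-arm RNA and the 3'-arm RNA.

    Each arm is a (start, end) tuple of 0-based inclusive transcript coordinates.
    The overhang of one strand is how far its 3' end extends past the partner
    of the other strand's 5' end.
    """
    s5, e5 = five_arm
    s3, e3 = three_arm
    if s5 not in pairs or s3 not in pairs:
        raise ValueError("both 5' ends must be base-paired in this simple sketch")
    return e5 - pairs[s3], e3 - pairs[s5]


# Hypothetical example: 24-bp stem with a 4-nt loop (52 nt in total) and two
# 21-nt small RNAs positioned as an RNase III-like duplex.
pairs = perfect_hairpin_pairs(stem=24, loop=4)
overhangs = three_prime_overhangs(pairs, five_arm=(2, 22), three_arm=(31, 51))
print(overhangs)
```

In this idealized case both strands carry a 2-nt 3′ overhang. Real pri-miRNA stems contain bulges and mismatches, so an analysis of actual reads would need a predicted structure and some tolerance around the expected overhang length.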

For a lone C. intestinalis Drosha molecule to produce moRs, the single processing center must cut in a processive fashion at two sites along the pri-miRNA substrate, which is inconsistent with the prevailing model for Drosha activity28. Interactions among Drosha molecules could reconcile this apparent discrepancy. Such a mechanism is suggested by the recent demonstration of multimerized human Drosha complexes28. Notably, mouse embryonic stem cells lacking Dicer show enriched levels of moR-like sequences, which are lost upon disruption of Drosha activity37.

[Figure 5 image: schematic of a capped (7mGpppG) and polyadenylated (AAA...) pri-miRNA hairpin with cut sites marked by crosses; panels a and b, Drosha cleavage; panel c, Drosha multimer cleavage.]

Figure 5 A speculative model for the biogenesis of moRs. (a) Previous analysis of D. melanogaster and mouse small RNAs suggested that pre-miRNA–proximal sequences (analogous to moRs) were by-products of exonucleolytic degradation following excision of the pre-miRNA hairpin by Drosha (Drosha is represented in blue and yellow crosses indicate where Drosha cuts). (b) moR production may result via excision of an ~20-nt, imperfectly paired duplex RNA at the immediate base of the pre-miRNA stem-loop, following two concurrent or sequential cuts by a single Drosha molecule. (c) Alternatively, a multimeric complex containing at least two Drosha molecules could associate with a substrate pri-miRNA. Here each Drosha molecule would cleave the pri-miRNA at a distinct position, liberating the pre-miRNA, as well as the ~20-nt moR duplex.

ARTICLES

NATURE STRUCTURAL & MOLECULAR BIOLOGY VOLUME 16 NUMBER 2 FEBRUARY 2009 187

©2009 Nature America, Inc. All rights reserved.

It is possible that Drosha produces ‘double cuts’ in most or all organisms, not just C. intestinalis. However, the resulting moR RNAs may be subject to rapid degradation by an unknown pathway. Ciona intestinalis might have a modified version of this degradation pathway to produce high steady-state levels of moRs. Future studies will explore the mechanistic details of moR biogenesis and function in C. intestinalis development.

METHODS
Small RNA cloning and detection. We collected adult C. intestinalis animals from Half Moon Bay, California, and maintained them in an artificial seawater tank. We carried out fertilization, dechorionation and electroporations as previously described33. Total RNA was extracted from unfertilized eggs, cleavage stage, tadpole-stage embryos and adults using the miRVana miRNA Isolation Kit (Ambion). Small RNA cloning was carried out as previously described41. Basically, from ~30 μg of total RNA, only 17–25-nt RNAs were size selected via 15% denaturing PAGE. The 3′ ‘modban-1’ adaptor (IDT) was ligated to the RNAs from this fraction with RNA ligase (Ambion) in ATP-free reaction buffer41, and appropriately ligated RNAs were size selected via 15% denaturing PAGE. The modified RNAs were subsequently ligated to a 5′ linker (Solexa linker) in the presence of RNA ligase and in reaction buffer with ATP. The resulting RNA library was reverse transcribed to a cDNA library with SuperScript II (Invitrogen). cDNA was amplified using Illumina sequencing–specific primers, and the resulting libraries were sequenced on an Illumina 1G Genome Analyzer. In parallel, small RNAs were extracted using TRIZOL (Invitrogen), cloned and sequenced, as above, from staged, 2–4-hour-old D. melanogaster Toll10b embryos34. Northern blotting assays were performed as described previously42.

We cloned the D. melanogaster miR-309 cluster by amplifying the locus from yw genomic DNA using pfuUltra High Fidelity polymerase (Stratagene) and the TOPO TA cloning system (Invitrogen).

Ci-Brachyury, Ci-FoxF and Ci-Twist enhancers were used to drive transgene expression in the C. intestinalis notochord, epidermis and mesenchyme, respectively33,36.

Primers used for amplification of the Ci-Twist enhancer were Ci-Twist-F (forward), 5′-ACCACAGCTTCTATTATATA-3′, and Ci-Twist-R (reverse), 5′-CATCGTGTGTTGATTGATTT-3′.

Probe sequences for the Ci-miR-133 northern assay were Ci-miR-133 (5′-CAGCTGGTTGAAGGGGACCAAA-3′), Ci-5′-moR-133 (5′-GACCGACACCCGCAATGTTT-3′) and Ci-U6 (5′-GTCATCCTTGCGCAGGGGCCATGCTAATCTTCTCTGTATCGTTCC-3′).

The C. intestinalis miR-133 amplification primers were Ci-miR-133-F (forward), 5′-CGTTTTATACGGTTATATACAGG-3′, and Ci-miR-133-R (reverse), 5′-TATTTCCGACTACTGAGCG-3′.

The Drosophila miR-309 cluster amplification primers were Dme-8miR-F (forward), 5′-TGCAGACAAATGACGAATTGA-3′, and Dme-8miR-R (reverse), 5′-CCGACCCTTTCAGGTAACAA-3′.

The probe sequences for the Drosophila miR-3 northern assay were Dme-miR-3, 5′-TGAGACACACTTTGCCCAGTGAT-3′, and Dme-5′-moR-3, 5′-CAGGATCGGGACCTTAGGTG-3′.

Data analysis. The standard Illumina pipeline (GAPipeline-0.3.0) was used to extract sequenced reads. Nucleotide positions 1 to 26 were aligned to the C. intestinalis (JGI version 1.0) or D. melanogaster (version 4.3) genomes using ELAND, and were used for the calculation of position-specific error rates18,43. Supplementary Figure 6a online shows the average error rate, defined as the estimated probability of a base call being incorrect as a function of nucleotide position for each of the four lanes (libraries) studied. The error rate model for the Illumina pipeline was calibrated on the basis of uniquely aligned reads to the genome, and then applied to all reads. The average error rate (averaged over all reads) rises sharply beyond the twenty-first base, consistent with an assumption that the reads should be dominated by miRNA sequences of roughly 21 nt, as subsequent unaligned bases of the 3′ adaptor would be scored as low quality.

Reads were trimmed so as to optimize the total nucleotide quality in a dynamic programming approach that produced trimmed reads such that the maximum acceptable error rate over the trimmed sequence is less than 10% ($Q_{\mathrm{PHRED}} = 10$), the total quality of the read is optimized globally over all start and stop positions, and the resulting length is greater than or equal to 17 nt44.
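The pairing of a 10% error ceiling with a quality value of 10 follows from the standard Phred definition. That definition is not stated in the text and is assumed here, with $e$ denoting the per-base error probability:

```latex
Q_{\mathrm{PHRED}} = -10 \log_{10} e
\quad\Longrightarrow\quad
e = 0.10 \;\Leftrightarrow\; Q_{\mathrm{PHRED}} = -10 \log_{10}(0.10) = 10 .
```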

The trimming procedure can be described formally as follows. An optimal trimming can be achieved by defining a penalty $P$ associated with making an incorrect base call at a given nucleotide $n$. Using the position-specific error probability, $e_n$, one can define an expected score for a given nucleotide as $s_n = 1 \cdot (1 - e_n) - P \cdot e_n = 1 - (P + 1)\, e_n$. The total expected score for a trimming of the nucleotide sequence to start at position $i$ and end at position $j$ is then given by:

$$S(i, j) = \sum_{n=i}^{j} s_n = \sum_{n=i}^{j} \left[ 1 - (P + 1)\, e_n \right].$$

One is then free to choose the penalty, such that the expected score is zero when the error rate is the maximum tolerated, so $P = \frac{1}{e_{\max}} - 1$, and any error rate greater than $e_{\max}$ will produce a negative contribution to the score. A dynamic programming search then globally optimizes $S(i, j)$ over all start and stop positions44. Further details of the data analysis rationale and methodology are available in the Supplementary Methods online. A meta-analysis of the distribution for all processed reads across a miRNA locus is presented in Supplementary Figure 7 online.
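The scoring scheme above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study or the procedure of ref. 44: the function name `trim_read`, its arguments and the prefix-sum scan are assumptions chosen for illustration. The sketch scores each base as 1 − (P + 1)e_n with P = 1/e_max − 1 and returns the window of at least `min_len` bases with the highest total score.

```python
from typing import Optional, Sequence, Tuple


def trim_read(
    error_probs: Sequence[float],
    e_max: float = 0.10,
    min_len: int = 17,
) -> Optional[Tuple[int, int, float]]:
    """Return (start, end, score) for the best-scoring window, end exclusive.

    Hypothetical helper: each base n contributes s_n = 1 - (P + 1) * e_n,
    where P = 1 / e_max - 1, so bases with e_n > e_max score negatively.
    Returns None when the read is shorter than min_len.
    """
    if not 0.0 < e_max <= 1.0:
        raise ValueError("e_max must lie in (0, 1]")
    n = len(error_probs)
    if min_len < 1 or n < min_len:
        return None

    penalty = 1.0 / e_max - 1.0

    # prefix[k] holds the summed score of the first k bases.
    prefix = [0.0]
    for e in error_probs:
        prefix.append(prefix[-1] + 1.0 - (penalty + 1.0) * e)

    best: Optional[Tuple[int, int, float]] = None
    min_prefix, min_idx = prefix[0], 0
    for end in range(min_len, n + 1):
        # Starts allowed for this end are 0 .. end - min_len; track the
        # smallest prefix among them so the window sum is maximal.
        candidate = end - min_len
        if prefix[candidate] < min_prefix:
            min_prefix, min_idx = prefix[candidate], candidate
        score = prefix[end] - min_prefix
        if best is None or score > best[2]:
            best = (min_idx, end, score)
    return best


# Toy read (made-up values): 3 poor bases, 20 good bases, 5 poor bases.
toy_probs = [0.5] * 3 + [0.01] * 20 + [0.5] * 5
print(trim_read(toy_probs))
```

A single left-to-right pass suffices because S(i, j) is a difference of two prefix sums, so for each end position the best start is the one with the smallest prefix sum among the starts that satisfy the length constraint.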

Accession codes. Gene Expression Omnibus: Small RNA sequencing data have been deposited with accession code GSE13625.

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS
We thank L. Tonkin of the Vincent J. Coates Genomics Sequencing Laboratory for assistance with high-throughput sequencing and general expertise, H. Melichar for critical reading of the manuscript and members of the Levine laboratory for discussions. B.H. is supported by an American Cancer Society Postdoctoral Fellowship. This work was funded by a grant from the US National Institutes of Health (34431) to M.L.

AUTHOR CONTRIBUTIONS
W.S. and B.H. performed all experiments on C. intestinalis and D. melanogaster, respectively; D.H. performed bioinformatic analyses; M.L. and B.H. supervised the study and wrote the first draft of the manuscript; all authors discussed the results and commented on the manuscript.

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Ambros, V. The functions of animal microRNAs. Nature 431, 350–355 (2004).
2. Zamore, P.D. & Haley, B. Ribo-gnome: the big world of small RNAs. Science 309, 1519–1524 (2005).
3. Bartel, D.P. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281–297 (2004).
4. Lau, N.C., Lim, L.P., Weinstein, E.G. & Bartel, D.P. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858–862 (2001).
5. Pasquinelli, A.E. et al. Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408, 86–89 (2000).
6. Kim, V.N. MicroRNA biogenesis: coordinated cropping and dicing. Nat. Rev. Mol. Cell Biol. 6, 376–385 (2005).
7. Lee, Y. et al. The nuclear RNase III Drosha initiates microRNA processing. Nature 425, 415–419 (2003).
8. Bernstein, E., Caudy, A.A., Hammond, S.M. & Hannon, G.J. Role for a bidentate ribonuclease in the initiation step of RNA interference. Nature 409, 363–366 (2001).
9. Grishok, A. et al. Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell 106, 23–34 (2001).
10. Hutvagner, G. et al. A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 293, 834–838 (2001).
11. Tomari, Y. & Zamore, P.D. Perspective: machines for RNAi. Genes Dev. 19, 517–529 (2005).
12. Okamura, K. et al. The regulatory activity of microRNA* species has substantial influence on microRNA and 3′ UTR evolution. Nat. Struct. Mol. Biol. 15, 354–363 (2008).
13. Heimberg, A.M., Sempere, L.F., Moy, V.N., Donoghue, P.C. & Peterson, K.J. MicroRNAs and the advent of vertebrate morphological complexity. Proc. Natl. Acad. Sci. USA 105, 2946–2950 (2008).


14. Dehal, P. et al. The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science 298, 2157–2167 (2002).
15. Murphy, D., Dancis, B. & Brown, J.R. The evolution of core proteins involved in microRNA biogenesis. BMC Evol. Biol. 8, 92 (2008).
16. Friedlander, M.R. et al. Discovering microRNAs from deep sequencing data using miRDeep. Nat. Biotechnol. 26, 407–415 (2008).
17. Fu, X., Adamski, M. & Thompson, E.M. Altered miRNA repertoire in the simplified chordate, Oikopleura dioica. Mol. Biol. Evol. 25, 1067–1080 (2008).
18. Prochnik, S.E., Rokhsar, D.S. & Aboobaker, A.A. Evidence for a microRNA expansion in the bilaterian ancestor. Dev. Genes Evol. 217, 73–77 (2007).
19. Ruby, J.G. et al. Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res. 17, 1850–1864 (2007).
20. Stark, A. et al. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. 17, 1865–1879 (2007).
21. Slack, F. & Ruvkun, G. Temporal pattern formation by heterochronic genes. Annu. Rev. Genet. 31, 611–634 (1997).
22. Grimson, A. et al. Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals. Nature 455, 1193–1197 (2008).
23. Seitz, H., Ghildiyal, M. & Zamore, P.D. Argonaute loading improves the 5′ precision of both microRNAs and their miRNA strands in flies. Curr. Biol. 18, 147–151 (2008).
24. Han, J. et al. Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 125, 887–901 (2006).
25. Du, T. & Zamore, P.D. microPrimer: the biogenesis and function of microRNA. Development 132, 4645–4652 (2005).
26. Khvorova, A., Reynolds, A. & Jayasena, S.D. Functional siRNAs and miRNAs exhibit strand bias. Cell 115, 209–216 (2003).
27. Schwarz, D.S. et al. Asymmetry in the assembly of the RNAi enzyme complex. Cell 115, 199–208 (2003).
28. Han, J. et al. The Drosha-DGCR8 complex in primary microRNA processing. Genes Dev. 18, 3016–3027 (2004).
29. MacRae, I.J. & Doudna, J.A. Ribonuclease revisited: structural insights into ribonuclease III family enzymes. Curr. Opin. Struct. Biol. 17, 138–145 (2007).
30. Axtell, M.J. Evolution of microRNAs and their targets: are all microRNAs biologically relevant? Biochim. Biophys. Acta 1779, 725–734 (2008).
31. Chen, J.F. et al. The role of microRNA-1 and microRNA-133 in skeletal muscle proliferation and differentiation. Nat. Genet. 38, 228–233 (2006).
32. Davidson, B., Shi, W., Beh, J., Christiaen, L. & Levine, M. FGF signaling delineates the cardiac progenitor field in the simple chordate, Ciona intestinalis. Genes Dev. 20, 2728–2738 (2006).
33. Corbo, J.C., Levine, M. & Zeller, R.W. Characterization of a notochord-specific enhancer from the Brachyury promoter region of the ascidian, Ciona intestinalis. Development 124, 589–602 (1997).
34. Biemar, F. et al. Comprehensive identification of Drosophila dorsal-ventral patterning genes using a whole-genome tiling array. Proc. Natl. Acad. Sci. USA 103, 12763–12768 (2006).
35. Bushati, N., Stark, A., Brennecke, J. & Cohen, S.M. Temporal reciprocity of miRNAs and their targets during the maternal-to-zygotic transition in Drosophila. Curr. Biol. 18, 501–506 (2008).
36. Beh, J., Shi, W., Levine, M., Davidson, B. & Christiaen, L. FoxF is essential for FGF-induced migration of heart progenitor cells in the ascidian Ciona intestinalis. Development 134, 3297–3305 (2007).
37. Babiarz, J.E., Ruby, J.G., Wang, Y., Bartel, D.P. & Blelloch, R. Mouse ES cells express endogenous shRNAs, siRNAs, and other Microprocessor-independent, Dicer-dependent small RNAs. Genes Dev. 22, 2773–2785 (2008).
38. Wang, Z. & Kiledjian, M. Functional link between the mammalian exosome and mRNA decapping. Cell 107, 751–762 (2001).
39. Wilusz, C.J., Wormington, M. & Peltz, S.W. The cap-to-tail guide to mRNA turnover. Nat. Rev. Mol. Cell Biol. 2, 237–246 (2001).
40. Zhang, H., Kolb, F.A., Jaskiewicz, L., Westhof, E. & Filipowicz, W. Single processing center models for human Dicer and bacterial RNase III. Cell 118, 57–68 (2004).
41. Brennecke, J. et al. Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell 128, 1089–1103 (2007).
42. Haley, B., Hendrix, D., Trang, V. & Levine, M. A simplified miRNA-based gene silencing method for Drosophila melanogaster. Dev. Biol. 321, 482–490 (2008).
43. Norden-Krichmar, T.M., Holtz, J., Pasquinelli, A.E. & Gaasterland, T. Computational prediction and experimental validation of Ciona intestinalis microRNA genes. BMC Genomics 8, 445 (2007).
44. Chapman, J. Whole Genome Shotgun Assembly in Theory and Practice. PhD Thesis, Univ. California, Berkeley, 50–51 (2004).
45. Zuker, M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31, 3406–3415 (2003).
46. Mathews, D.H., Sabina, J., Zuker, M. & Turner, D.H. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288, 911–940 (1999).


Conformational flexibility of metazoan fatty acid synthase enables catalysis
Edward J Brignole1, Stuart Smith2 & Francisco J Asturias1

The metazoan cytosolic fatty acid synthase (FAS) contains all of the enzymes required for de novo fatty acid biosynthesis covalently linked around two reaction chambers. Although the three-dimensional architecture of FAS has been mostly defined, it is unclear how reaction intermediates can transfer between distant catalytic domains. Using single-particle EM, we have identified a near continuum of conformations consistent with a remarkable flexibility of FAS. The distribution of conformations was influenced by the presence of substrates and altered by different catalytic mutations, suggesting a direct correlation between conformation and specific enzymatic activities. We interpreted three-dimensional reconstructions by docking high-resolution structures of individual domains, and they show that the substrate-loading and condensation domains dramatically swing and swivel to access substrates within either reaction chamber. Concomitant rearrangement of the β-carbon–processing domains synchronizes acyl chain reduction in one chamber with acyl chain elongation in the other.

The synthesis de novo of long-chain fatty acids universally involves a suite of enzymes that catalyze the iterative elongation and processing of the carbon chain followed by product release (Fig. 1). In all FAS systems, covalently bound reaction intermediates are translocated between active sites by an acyl carrier protein (ACP). In prokaryotes, chloroplasts and mitochondria, the constituent enzymes are free-standing proteins, but in the cytosol of eukaryotes they are integrated into giant multifunctional polypeptide chains1. Interest in the structure and mechanism of action of the mammalian FAS has been stimulated by the realization that the protein is a potential target for the treatment of obesity and cancer, because FAS inhibitors are effective appetite suppressants and can selectively target several types of cancer cells2,3.

Notably, the evolution of the eukaryotic megasynthases has followed two radically different architectural themes. In the 2.6-MDa fungal FAS, the constituent enzymes are all embedded in the interior wall of a rigid barrel-shaped structure. Access of the enzymes to their ACP-bound substrates is facilitated entirely by movement of the ACP domains about their attachment points in the center of the barrel4,5. By contrast, the homodimeric 0.54-MDa metazoan FAS (Fig. 2a) is an extremely flexible macromolecule6. A model of FAS, derived by fitting high-resolution structures of individual prokaryotic enzymes into a 4.5-Å resolution FAS crystallographic density map7, revealed that the FAS subunits come together to form a central interface comprising dimeric β-ketoacyl synthase (KS) and enoyl reductase (ER) domains, and a pair of pseudodimeric dehydratase (DH) domains (Fig. 2b). The ER and DH domains in the upper portion of the FAS structure are flanked by appended pairs of monomeric ketoreductase (KR) domains. In the lower section, the KS domains are positioned between monomeric malonyl/acetyl transferase (MAT) domains. Upper and lower sections are joined by a narrow connection formed by the distal end of the linker connecting the MAT and DH domains. The resolution of the X-ray density map was insufficient to determine whether the FAS subunits were in a back-to-back or crossed-over arrangement. An additional section of the structure must comprise the ACP and the thioesterase (TE) domain, which catalyzes the chain-termination step. Extensive flexibility in their flanking linker regions prevented imaging of these domains in the X-ray crystallography electron-density map, but we surmise (by virtue of the covalent linkage between the ACP and the C terminus of the KR domain) that an ACP-sized density observed below the KR domains in the cryo-EM structure marks the position of the ACP domain in the upper section of the EM reconstruction.

The FAS structure defined by the 4.5-Å crystallographic density map neatly compartmentalizes constituent domains. Domains involved in chain extension (KS and MAT) are grouped into the lower portion of the structure; domains responsible for β-carbon processing (KR, DH and ER) in the upper portion alternately engage the ACP during each catalytic cycle and are arranged around two discrete reaction chambers in which each ACP has access to only one set of catalytic domains7. However, a substantial body of biochemical evidence, including mutant-complementation analyses8 and site-specific cross-linking9, indicates that the ACP domains can make functional contacts with the KS and MAT domains of either subunit (red arrows in Fig. 2b). Clearly, substantial flexibility of the megasynthase would be required to permit functional contacts between these domains that, in the crystal structure, seem distantly located. In this study, we used single-particle macromolecular EM to characterize a wide range of conformations that enable functionality of a mammalian (Rattus norvegicus) FAS by facilitating all required catalytic interactions.

Received 18 August 2008; accepted 14 November 2008; published online 18 January 2009; doi:10.1038/nsmb.1532

1Department of Cell Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, USA. 2Children’s Hospital Oakland Research Institute, 5700 Martin Luther King Jr. Way, Oakland, California 94609, USA. Correspondence should be addressed to F.J.A. ([email protected]).

RESULTS
Structural analysis of FAS using single-particle EM
Derivation of structural information from noisy EM images of single macromolecules relies on averaging of properly aligned images that, in principle, originate from identical particles. If the particles are heterogeneous in conformation, the analysis becomes considerably more complex, especially for the unstained specimens in which biological macromolecules are best preserved. Preservation of particles in stain can result in some distortion, but the relatively high signal-to-noise ratio in the resulting images, in combination with suitable statistical analysis, can allow for a quantitative description of different molecular conformations10. Earlier EM reconstructions of FAS calculated from molecules preserved in stain or ice were markedly similar6, and both resembled the intermediate-resolution X-ray structure7 in overall shape and size (Fig. 2b), indicating that FAS is fairly resistant to stain-induced deformation. Moreover, upon adsorption to the amorphous carbon support film used to prepare stained samples, FAS showed a strongly preferred orientation, so that image alignment and classification can be used to distinguish changes in molecular conformation without the complication of having to determine particle-orientation parameters. Therefore, images of FAS from stained samples could be used to calculate two-dimensional class averages (Fig. 2c) and corresponding three-dimensional reconstructions that would reveal the relative domain positions in different conformational states.

FAS structural pliability
The conformational variability of FAS was characterized using a FAS mutant (Δ22-FAS) bearing a 22-residue deletion in the linker between the ACP and TE domains. The shortened tether was expected to restrict mobility of the TE domain and simplify alignment and classification of single-particle images without affecting the activity of other FAS components (the only effect of the mutation is to slightly slow down the chain-termination step catalyzed by the TE)11, while increasing the probability of imaging the previously undetected TE domain (Supplementary Fig. 1 online). To avoid oversampling conformations related to the slowed catalysis of acyl chain release, we examined Δ22-FAS in the absence of substrates. We collected images of Δ22-FAS as tilted pairs so that three-dimensional structures could be calculated by the random conical tilt reconstruction method12, maximizing the information garnered about each observed conformation. Reference-free image alignment and classification were used to separate particles into the minimum number of groups necessary to describe structural variability in sufficient detail. We identified a total of 16 distinct FAS conformations in two-dimensional projection and calculated three-dimensional structures for each (Fig. 3a–d). FAS seems to adopt a continuum of conformations, and further image subdivision was possible, but at the cost of diminishing the resolution of the resulting three-dimensional structures (Supplementary Fig. 2 online).

[Figure 1 schematic: reaction cycle with stages labeled Initiation, Substrate loading, Chain elongation, β-carbon processing and Chain termination, involving the MAT, KS, KR, DH, ER, TE and ACP domains; R = H-(CH2)n with n = 1, 3, 5, 7, 9, 11, 13.]

Figure 1 Reaction cycle in fatty acid biosynthesis. The fatty acid biosynthesis reaction cycle initiates with transfer of the acetyl moiety to the KS via an ACP-bound intermediate (Initiation). The malonyl thioester is similarly transferred to an ACP (Substrate loading) and then condensed with the KS-bound acyl chain (Chain extension). The resulting β-ketone is then reduced and dehydrated, yielding a saturated acyl chain (β-carbon processing) that is delivered to the KS, initiating the next cycle. After seven cycles, the 16-carbon acyl chain, palmitate, is released (Chain termination).

[Figure 2, panels a–c. Residue labels in panel a: Cys161, Ser581, His878, Asp1032, Gly1672, Gly1886, Ser2151, Ser2302.]

Figure 2 Structural and functional organization of the metazoan FAS. (a) The domains of FAS are linearly arranged along the FAS polypeptide. Key active-site residues for the KS, MAT, DH and TE domains, the location of the glycine-rich motifs of the nucleotide binding sites in the ER and KR domains, and the site of post-translational phosphopantetheinylation in the ACP domain are marked (rat FAS numbering). (b) Crystal and cryo-EM structures capture different conformations of FAS. Atomic structures of individual catalytic domains were positioned according to the intermediate-resolution crystal structure7. ACP domains36 were fitted into remaining densities located below the KR domains in the cryo-EM structure (gray)6. Densities corresponding to the TE domains were not apparent in either the crystal structure or the cryo-EM structure, but the domains are positioned near the outer edge of the two reaction chambers, on the basis of evidence presented in this article. Subunits in the FAS homodimer are depicted in a crossed-over arrangement with the domains of one subunit in faded colors. Catalytic contacts made by the ACP of one subunit are indicated by arrows. The flexibility of the KR-ACP linker and mobility of the phosphopantetheine are insufficient to explain contacts with the distant KS and MAT active sites. (c) A two-dimensional class average calculated from images of FAS molecules preserved in stain has recognizable structural elements and shows good correspondence with the X-ray and cryo-EM three-dimensional structures. The scale bar represents 100 Å.

The remarkable diversity among the observed FAS conformations could be described according to three main criteria: (i) in-plane rotation between the top and bottom portions of the structure (Fig. 3a,c); (ii) reorganization resulting in progressive asymmetry in the upper half of the structure (Fig. 3c and Supplementary Video 1 online); and (iii) off-plane rotation between the top and bottom portions (Fig. 3b,d). A nearly symmetric conformation with approximately equal-sized reaction chambers (Fig. 3a, class #888) closely resembles the conformation in the 4.5-Å X-ray structure7. FAS is faithfully represented in the EM reconstructions (Supplementary Fig. 3 online), and its domain organization can be interpreted by comparison with models derived from the X-ray structure of FAS. Fitting atomic structures of the DH, ER and KR domains requires only an adjustment from their positions in the crystal structure (Fig. 3e–g). A large density is apparent adjacent to the KR monomers at each end of the upper portion of the EM reconstructions. In homologous modular polyketide synthase (PKS) systems, the KR domain is stabilized by partnering with a structural subdomain13. The analogous structural domain (SD) of the metazoan FAS was predicted to have a fold resembling that of S-adenosylmethionine–dependent methyltransferases14, and we fitted the human histamine methyltransferase structure15 into this part of the EM reconstructions (also SD in Fig. 2). Comparison of the different EM three-dimensional structures indicates that the position of the SD is variable, probably explaining the partial absence of density attributable to the SD in the 4.5-Å X-ray structure7. Densities corresponding in size and location to those expected for the ACP (Fig. 2b) and TE domains are also apparent in the EM reconstructions, with the TE domain often positioned within an open reaction chamber or in front of a closed one (Supplementary Fig. 1).

Consistent with the highly structured nature of the linker that connects the MAT and KS domains in FAS (PDB 2JFD) and a modular PKS16, fitting of atomic structures indicates that the MAT and KS domains maintain their relative positions, with the entire lower portion of the FAS structure rotating in plane as a unit (Fig. 3a,c). In molecules where the upper and lower sections were orientated perpendicular to each other (Fig. 3b,d), the MAT densities that extend above and below the plane of the molecule seem to have been compressed, a familiar complication when imaging negatively stained molecules17. However, we were still able to identify the bilobed density representing each KS-MAT didomain and could use it to determine the relative angle between the top and bottom portions of the FAS structure.

Effect of substrates and point mutations on FAS conformation
To expand on a previous analysis of the effect of substrates and catalytic state on the conformation of FAS6, three mutants, Δ22-FAS (slower product release)11, H878A-FAS (DH activity compromised)18 and C161Q-FAS (KS activity compromised)19, were imaged in the presence of substrates. Because different mutations should variously affect specific catalytic steps, changes in the distribution of FAS conformations could be expected following the addition of substrates. Independent analysis of each mutant image data set produced a range of class averages, revealing that all mutants sample the same conformations observed in the absence of substrates (Supplementary Fig. 4 online). This implies that the conformations we have documented most likely represent a full range of domain motions that are sufficient to enable all reactions catalyzed by FAS.

[Figure 3, panels a–g: two-dimensional class averages and three-dimensional reconstructions arranged by "In-plane bottom" versus "Perpendicular bottom" and "Symmetric top" versus "Asymmetric top"; the particle count for each class is printed above its class average.]

Figure 3 Conformational variability of Δ22-FAS in the absence of substrates. (a–d) Single-particle images were classified (black and white images) and corresponding three-dimensional structures were calculated (yellow). The number of particles in each class is indicated above its two-dimensional class average. The domain arrangements in the upper portion of the structure range from predominantly symmetric (a,b) to strongly asymmetric (c,d). The lower domains are arranged with respect to the upper domains either in parallel, swinging from right to left (a,c) or swiveling about the narrow ‘waist’ into a perpendicular arrangement (b,d). (e) Three-dimensional structures of the Δ22-FAS mutant were colored as in Figure 2 to indicate the regions that could be fitted with structures of the KS, MAT, DH, ER, KR and SD. Regions of density that were not fitted (transparent gray) may accommodate the TE and/or ACP domains. (f) Atomic structures of individual domains were fitted into several RCT structures and filtered to match the resolution of the EM structures. (g) Two-dimensional projections of these fitted atomic structures (right image in each pair) closely resemble the two-dimensional class averages (left image in each pair, also in a–d) that correspond to each of the three-dimensional RCT reconstructions (directly above each pair in e). Scale bars represent 100 Å.

To avoid the possible introduction of subjective bias in our comparison of conformation distributions, images from all FAS data sets were combined, simultaneously aligned as one group and then separated into 50 different classes. Finer partition into more classes was facilitated by the larger size of the combined data set and the smaller number of images required for two-dimensional analysis (Fig. 4a). The conformations observed in each class were then categorized (for example, asymmetric or symmetric, perpendicular or in plane; Fig. 4b), and only on completion of this analysis was the origin of the particles in each conformation decoded.
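The final decoding step, in which categorized classes are mapped back to the data set of origin, amounts to a simple tally. The following Python sketch illustrates that bookkeeping only; the function name `asymmetric_fraction`, the record layout and the toy values are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def asymmetric_fraction(
    particles: Iterable[Tuple[str, Hashable]],
    class_category: Dict[Hashable, str],
) -> Dict[str, float]:
    """Fraction of particles per data set that fall in 'asymmetric' classes.

    particles: (data_set_label, class_id) pairs, where the label is only
    consulted here, after every class has already been categorized.
    class_category: class_id -> 'asymmetric' or 'symmetric'.
    """
    totals: Counter = Counter()
    asymmetric: Counter = Counter()
    for data_set, class_id in particles:
        totals[data_set] += 1
        if class_category[class_id] == "asymmetric":
            asymmetric[data_set] += 1
    return {d: asymmetric[d] / totals[d] for d in totals}


# Made-up toy input: two data sets, two classes.
toy_particles = [("A", 0), ("A", 1), ("A", 1), ("B", 0)]
toy_categories = {0: "symmetric", 1: "asymmetric"}
print(asymmetric_fraction(toy_particles, toy_categories))
```

Keeping the data set label out of the categorization step and joining it in only at the end is what keeps the comparison blind.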

The most appreciable change resulting from addition of substrates was a marked increase in the fraction of molecules that showed an asymmetric arrangement of the β-carbon–processing domains (Fig. 4c). Of the three FAS mutants prepared with substrates, the DH mutant has the strongest preference for asymmetry in the upper portion (80% asymmetric), suggesting that this arrangement may facilitate interactions required for β-carbon processing. The Δ22 and KS mutants also show a preponderance of asymmetric upper domains (70% and 64% asymmetric, respectively) when imaged in the presence of substrates. In fact, the Δ22 and KS mutants have nearly identical conformation distributions, possibly reflecting the fact that both are compromised in the chain-elongation step (by slow elongation from 16 to 18 carbon atoms resulting from reduced TE activity11 and by KS inactivity, respectively). To further confirm that the arrangement of the β-carbon–processing domains was directly influenced by catalytic activity (addition of substrates), the DH mutant that showed the largest preference for asymmetric arrangement was imaged in the absence of substrates. This resulted in a two-fold decrease in the number of DH mutant molecules with asymmetrically arranged β-carbon–processing domains (Fig. 4c). Finally, whereas the proportion of molecules showing a perpendicular disposition of the top and bottom portions of the FAS structure varies for each mutant, this conformation is always more prevalent when the upper portion is symmetric, suggesting a possible correlation between catalytic activity and in-plane arrangement of the top and bottom portions of the FAS structure (see below).

Although we consider it unlikely, we cannot completely rule out the possibility that the changes we observed upon addition of substrates might have resulted from selective adsorption of certain molecular conformations, rather than from conformational changes in FAS as it engages in catalysis.

Domain movements and implications

The insight gained from EM reconstructions of different FAS conformations was leveraged by fitting high-resolution structures of individual domains into the EM maps (Fig. 3e–g). The most prominent domain rearrangements result from movement about the narrow connection that separates the upper and lower portions of the FAS structure, which are covalently held together by the linker connecting each of the MAT-DH domain pairs. By analogy with the structure of a PKS KS-MAT module20, the FAS MAT catalytic domain (PDB 2JFD) is followed by a linker region (probably composed of residues from Val823 to the absolutely conserved Trp842) that threads between the structured pre-MAT linker and the KS domain and immediately precedes a region rich in proline, glycine and serine residues. This sequence probably represents an unstructured, solvent-exposed region, well suited to function as a flexible tether. Major flexibility at this ‘hinge’ permits two distinct motions: a pendulum-like swinging of the MAT-KS2-MAT module from side to side, and a swiveling motion perpendicular to the plane of the upper portion of the structure.

With the upper and lower portions of the FAS structure in the same plane, a swinging motion of the bottom as a rigid unit results in changes of up to ~25° in the angle between the upper and lower

[Figure 4 graphic. Panels a and b: class averages and cartoons arranged by top conformation (symmetric, asymmetric) and bottom conformation (in-plane swinging from left closed to right closed, or perpendicular). Panel c: bar chart of % particles (axis 10–80) in each conformation for the Δ22 mutant (– and + substrates), the DH mutant (– and + substrates) and the KS mutant (+ substrates).]

Figure 4 Distribution of FAS conformations is altered in the presence of substrates. (a) The Δ22-FAS and H878A (DH) mutants were imaged without substrates, and these mutants and the C161Q (KS) mutant were imaged in the presence of substrates. Particles from all five data sets were classified together into 50 groups. After discarding 6 classes of grossly misaligned or distorted particles (3.4% of particles, not shown), the remaining 44 classes were categorized into those with symmetric (Sym., red) and asymmetric (blue) conformations in the upper β-carbon–processing section (top conformation) and those with perpendicular (faded colors) or in-plane conformations in the lower MAT-KS2-MAT section (bottom conformation). The in-plane conformations of the lower section are arranged according to the degree of rotation of the lower section: from left swinging (left closed) to right swinging (right closed). For simplicity, class averages that show an opening between the DH, ER and KR domains in the left half of the structure were mirrored so that the opening always appears in the right half of the structure. (b) Cartoon representation of each conformation colored according to a. (c) After categorization of classes, the numbers of particles from each FAS preparation in each category were determined. Bars are colored according to conformations as in a and b.


portions of FAS (Figs. 3e,f and 5a). This swinging motion is considerably larger than the ~7° upward or downward movement suggested by the asymmetric reaction chambers of the FAS crystal structure7 and results in synchronized cycling between the open and closed reaction chamber conformations. Thus, the two reaction chambers might simultaneously engage in different activities. The substrate-loading and chain-extension reactions require close contacts between domains in the upper and lower sections, and it is unlikely that the ACP and associated phosphopantetheine moiety could access the KS and MAT domains in a fully open reaction chamber (Fig. 5b,c). This suggests that closure of the chamber is required for the substrate-loading and chain-extension reactions, whereas the open chamber might be capable of only β-carbon processing and chain termination. As the TE domain seems to be excluded from a closed chamber, the chain-termination reaction is likely to occur preferentially in the open chamber conformation, when free access to the ACP is facilitated.

Our interpretation of rearrangements of the β-carbon–processing domains in the top portion of the FAS structure was based on fitting of DH2, ER2, KR and SD domains into the three-dimensional EM reconstructions (Fig. 5d and Supplementary Fig. 3b). Comparing the position of the fitted domains to their original positions in the X-ray model of FAS revealed a correlated motion and suggested that the changes observed in the top portion of the FAS EM reconstructions could be partially explained by rotation of the domains as a rigid unit. However, such rigid motion was insufficient to explain the degree of asymmetry apparent in most FAS molecules imaged in the presence of substrates. Although certain projections of the top portion of the FAS X-ray model would resemble the asymmetric projections observed in this study (Figs. 3c,d and 4a, and Supplementary Video 1), the largely symmetric arrangement of domains in the X-ray model is in sharp contrast with the clearly asymmetric nature of the corresponding portion of several of the three-dimensional FAS reconstructions presented here (Supplementary Fig. 3c,d). The asymmetric arrangement of the β-carbon–processing domains in the presence of substrates is further supported by consideration of a previously published three-dimensional cryo-EM reconstruction of FAS (Fig. 2b) that was also determined from particles imaged in the presence of substrates6.

The asymmetric arrangement of the β-carbon–processing domains favored in the presence of substrates opens up a space about 15 Å in diameter between the DH, ER and KR domains of one reaction chamber while closing contacts between the same domains in the opposite chamber. The access points to the DH and ER active sites face this asymmetric opening21, and dilation may facilitate alternating ACP access to the DH and ER domains in either chamber (Fig. 5c). Notably, the asymmetric conformation of the top portion was most prevalent when fatty acid synthesis was arrested during dehydration (Fig. 4c, DH mutant with substrates), suggesting that this structural rearrangement is particularly relevant to β-carbon processing and that varying accessibility of the ACP may provide a mechanism for coordinating β-carbon processing and substrate loading and condensation between the two reaction chambers.

Configurations in which the upper and lower portions of the FAS structure swivel by ±80–100° with respect to each other (Fig. 5e) must be facilitated by flexibility at the narrow connection between them formed by extended linkers between the MAT and DH domains. In the resulting perpendicular configuration, the MAT active sites are positioned out of reach of the ACP and would be unable to support substrate loading (Fig. 5f). Moreover, the fraction of molecules with a perpendicular conformation is significantly reduced upon addition of substrates. This is most apparent for the Δ22 mutant, in which a shortened ACP-TE linker must interfere with free rotation of the lower domains or allow the TE domain to participate in protein-protein interactions that transiently stabilize the perpendicular conformation. Taken together, these observations suggest that the perpendicular configuration might not be directly relevant for catalysis. However, its detection as a possible intermediate between two states in which the upper and lower portions of the FAS structure have flipped seems to illuminate a critical aspect of FAS function.

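[Figure 5 graphic. Panels a–g: front, side and bottom views of FAS annotated with rotation angles (25°, 35°, 80°, 100° and 180°) and residue positions (706, 823, 888, 1103); a schematic of the KR-ACP tether (~10-residue linker, <35 Å; ACP, 23 Å; phosphopantetheine, 18 Å); and the loading and condensation steps separated by a swivel.]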

Figure 5 Changes in domain position bring catalytic domains into proximity of the ACP to facilitate catalytic interactions. (a) Lower portion swings relative to upper portion. (b) Crystal and NMR structures of the human and rat ACP domains36,40 and analysis of sequence conservation among metazoan FAS indicate that the KR-ACP linker is likely to consist of approximately ten residues from Lys2109 through Arg2120 (rat FAS numbering). When fully extended, this linker could be up to 35 Å in length. The ACP has a length of approximately 23 Å from its N terminus to the phosphopantetheinylated Ser2151. The phosphopantetheine must extend approximately 18 Å from the ACP to reach the active site within each catalytic domain. (c) After roughly modeling a rigid, extended phosphopantetheine into each active-site pocket, the phosphate was rendered as a sphere with an 8-Å radius. Gray spheres with a radius of 55 Å indicate the distance that the ACP domains could reach from a fixed tether point at the C terminus of the KR. (d) Side view of FAS with the KR and SD domains removed from one subunit, revealing rotation of the DH and ER domains. (e,f) The lower portion swivels relative to the upper portion. (g) Full 180° swiveling of the lower portion of the structure occurs during each catalytic cycle to explain the FAS activity of a heterodimer composed of a wild-type subunit (colored domains with red stars) partnered with a mutant subunit lacking all seven functionalities (indicated by gray domains with black crosses). Domains of FAS are colored as in Figure 2. Scale bar in a represents 100 Å.


DISCUSSION

The single conformations captured in our previous cryo-EM study6 and in the 4.5-Å X-ray structure of the metazoan FAS7 could not account for a substantial body of biochemical evidence demonstrating that both ACP domains in a FAS homodimer can make functional contacts with the MAT and KS domains of either subunit. We previously suggested that FAS subunits might associate in either of two conformations: one with the polypeptides crossed over in the center of the molecule, the other with the subunits arranged back to back6. It now seems clear that FAS must switch between these two alternative modes of subunit association. The present EM results provide direct evidence that FAS can adopt a configuration in which the MAT-KS2-MAT module that constitutes the bottom of the FAS structure is rotated by 80–100° relative to the upper section of the molecule. Swiveling of the lower section in either direction by ~90° restores particles to the familiar conformation in which the upper and lower sections are in the same plane and would bring the ACP domains within reach of either of the KS and MAT domains in the lower section, depending on the direction of rotation. The ability of FAS to swivel between back-to-back and crossed-over configurations would explain the long-standing observation that the bifunctional dibromopropanone reagent cross-links the phosphopantetheine of one subunit with the KS active-site cysteine of the same or opposite subunits9.

Swiveling would explain how a FAS heterodimer in which one wild-type subunit is paired with a subunit compromised by mutations in all seven functional domains (WT/7KO-FAS) can synthesize fatty acids22. In the WT/7KO-FAS heterodimer, the single functional ACP is forced to interact with KS and MAT domains of the same subunit, which can be accessed only from opposite reaction chambers7. Thus, the ability of the WT/7KO-FAS heterodimer to synthesize palmitate at a significant rate implies that the mutant must be capable of extremely rapid 180° swivel motions between every substrate-loading and condensation event (Fig. 5g). The recently published23 improved crystal structures of porcine FAS in the NADPH-bound and free states provide atomic coordinates for most of FAS (the ACP and TE domains were not resolved). In agreement with our conclusion, the authors suggest that the loading and condensation portion of FAS may swivel by 180° relative to the β-carbon–processing portion. However, they speculate that the swiveling motion may involve a further coiling of the linkers rather than uncoiling, which we believe could be more favorable.

Notably, the wild-type FAS should in principle be able to function without swiveling if each ACP would always cooperate with the same MAT and KS domains. The fact that the flexible structural organization of FAS allows for alternative routes for substrate delivery and chain extension suggests that this apparent redundancy offers some inherent benefit. Indeed, kinetic analysis of a panel of FAS heterodimeric mutants in which only one of the two ACP domains was functional revealed that heterodimers retaining both options for substrate delivery and chain extension enjoy a significant catalytic advantage over those that rely on only a single option. For the wild-type FAS, the availability of alternative routes for substrate delivery and chain extension was estimated to contribute ~20% to the overall rate of fatty acid biosynthesis8.

The FAS structures characterized in this study illustrate the conformational flexibility of the protein and reveal specific long-range domain rearrangements ostensibly essential for catalytic activity. The conformational changes we observed (Fig. 3), and the catalytic interactions they facilitate (Fig. 5), are shown in an animation assembled from the gallery of three-dimensional structures (Supplementary Video 2 online). The animation shows how a swinging motion of the substrate-loading and -condensation domains results in cycling between open and closed chamber conformations, whereas swiveling allows an ACP to alternately engage the β-carbon–processing domains of the same subunit and the loading and condensation portions of both subunits. Swinging and swiveling are apparently coordinated with rearrangement of β-carbon–processing domains, possibly resulting in asynchronous chain extension and β-carbon–processing reactions between the two reaction chambers.

The metazoan and fungal FAS systems represent two contrasting alternative solutions to the problem of shuttling reaction intermediates between multiple catalytic sites in a single protein. The fungal FAS has almost 50% of its total mass invested in supportive infrastructure. Structural inserts interspersed both within and between catalytic domains serve to position the active sites of the constituent enzymes so that they can be readily accessed by the mobile ACP domain4,5. In contrast, the metazoan FAS is remarkably parsimonious. Its catalytic domains are almost completely uninterrupted by noncatalytic domains, and the connecting regions account for a small portion of the total mass. Success of the metazoan FAS design depends on maintaining an extraordinary degree of flexibility in the structure to ensure productive interactions between the ACP and its catalytic partners.

METHODS

Protein expression and purification. Mutagenesis, expression and purification of rat FAS mutants, H878A (DH), C161Q (KS) and Δ22, have been described previously11,18,19.

Specimen preparation and electron microscopy. FAS aliquots (1–5 mg ml–1) were stored at –80 °C in 0.25 mM potassium phosphate, pH 7.0, 1 mM DTT, 1 mM EDTA and 10% (v/v) glycerol. For EM sample preparation, FAS aliquots were diluted to a final concentration of 10–15 ng ml–1 in reaction buffer (55 mM potassium phosphate, pH 7.0, 1.1 mM DTT, 20 mM acetyl CoA, 100 mM malonyl CoA, 180 mM NADPH, 1 mM Tris, pH 8.0) or mock reaction buffer lacking malonyl CoA, acetyl CoA (replaced with 1:5 dilution of 3 mM HCl) and NADPH24. Samples were prepared by placing 3–5 µl of FAS reaction mixture on a continuous carbon-coated EM specimen grid (300 mesh, Cu/Rh, Ted Pella) that had been freshly glow discharged, incubating for ~1 min and staining with 1% (w/v) uranyl acetate using a double-carbon layer technique25. Specimens prepared with substrates were automatically imaged at ~15 e– Å–2 using the Leginon26 package in MSI-Raster mode to run a Tecnai F20 microscope (FEI) operating at 120 kV. Images were recorded at 50,000× magnification on Gatan or Tietz 4096 × 4096 charge-coupled device (CCD) cameras resulting in a pixel size of 2.26 Å or 1.63 Å, respectively, on the object scale. Untilted and 55° tilt-pair images of the Δ22 mutant without substrates were collected manually using low-dose conditions (each exposure ~18 e– Å–2) and recorded on SO163 film (Kodak). Micrograph negatives were digitized using an SCAI scanner (Zeiss) and binned to a final pixel size of 4.2 Å on the object scale.

Processing of single-particle images. All image processing was carried out with routines implemented using the SPIDER and Web software packages (Version 13)27. The defocus of each micrograph and CCD frame was computed, power spectra were evaluated visually, and incorrectly estimated defocus values were recalculated manually using Web. Particles from untilted micrographs and CCD frames with estimated defocus between approximately 200 nm and 600 nm were selected for further processing. Individual FAS molecules imaged in the presence of substrates were selected from CCD frames by template matching28 using projections of a previously calculated FAS cryo-EM density map6. Incorrectly selected particles were removed from the data set by visual inspection. Tilted micrographs were assessed on an optical diffractometer to ensure that the image was entirely underfocused. Particles in tilt-pair micrographs were selected manually and interactively using TiltPicker, a program currently under development by the Automated Molecular Imaging group at The Scripps Research Institute (software available at http://www.appion.org). All individual particle images were windowed, ramped and normalized. In addition, particles from CCD frames were interpolated to a final pixel size of 4.2 Å for direct comparison with images of the Δ22-FAS mutant without substrates. Images were then band-pass filtered, retaining information between 21 Å and 330 Å, and a soft-edged circular mask was applied to remove information at the corners of the images.
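The pixel-size and band-pass figures quoted above can be related by simple arithmetic. The following is a minimal sketch, not the SPIDER routines used in this study: it converts the resolution limits into spatial frequencies (cycles per pixel) at the 4.2 Å pixel size and computes the linear rescaling needed to bring the CCD images to that pixel size. The function names are hypothetical and chosen only for illustration.

```python
def resolution_to_frequency(resolution_angstrom, pixel_size_angstrom):
    """Spatial frequency (cycles per pixel) corresponding to a resolution in Å."""
    if resolution_angstrom <= 0 or pixel_size_angstrom <= 0:
        raise ValueError("resolution and pixel size must be positive")
    return pixel_size_angstrom / resolution_angstrom


def band_pass_limits(pixel_size_angstrom, fine_limit_angstrom, coarse_limit_angstrom):
    """Return (low_cut, high_cut) in cycles per pixel for a band-pass filter.

    fine_limit_angstrom is the finest detail retained (21 Å in the text);
    coarse_limit_angstrom is the coarsest detail retained (330 Å in the text).
    """
    low_cut = resolution_to_frequency(coarse_limit_angstrom, pixel_size_angstrom)
    high_cut = resolution_to_frequency(fine_limit_angstrom, pixel_size_angstrom)
    if high_cut > 0.5:  # 0.5 cycles per pixel is the Nyquist frequency
        raise ValueError("fine limit lies beyond the Nyquist frequency")
    return low_cut, high_cut


def rescale_factor(source_pixel_angstrom, target_pixel_angstrom):
    """Linear factor by which image dimensions change when resampling."""
    return source_pixel_angstrom / target_pixel_angstrom


if __name__ == "__main__":
    low, high = band_pass_limits(4.2, 21.0, 330.0)
    print(f"band-pass: {low:.4f} to {high:.4f} cycles per pixel")
    for ccd_pixel in (2.26, 1.63):  # CCD pixel sizes quoted in the text
        print(f"{ccd_pixel} Å -> 4.2 Å: scale by {rescale_factor(ccd_pixel, 4.2):.3f}")
```

At a 4.2 Å pixel size the 21 Å limit corresponds to 0.2 cycles per pixel, well inside the Nyquist limit of 8.4 Å, so the filter discards no information that the sampling could otherwise support.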

Particles were reference-free aligned29, classified using correspondence analysis and hierarchical ascendant clustering30 and then subjected to an additional reference-free alignment within each more homogeneous group. Owing to conformational heterogeneity in the data set, this preliminary alignment of FAS particles yielded variable results, and we repeated it ten times to survey widely the possible structural arrangements. Similar groups from each classification round were then combined by an additional round of classification, and particles were further reference-free aligned within these superclasses. The resulting class averages were used as references to initiate multireference classification followed by reference-free alignment within each class. This multireference and reference-free procedure was repeated 20 times, with the resulting class averages used as references for the next iteration. Finally, particles within each class were reference aligned to their final reference-free class average using an alignment radius optimized for each group of particles. This procedure is summarized in Supplementary Figure 5 online.

An initial round of this alignment and classification routine was used to remove particles that were consistently assigned to poorly aligned classes of particles. After removal of these particles, the Δ22, KS and DH mutants with substrates and the Δ22 and DH mutants without substrates were composed of 35,902, 17,319, 15,369, 13,847 and 11,586 single-particle images, respectively. These remaining particles were subjected to a second round of alignment and classification, producing two-dimensional class averages that enabled a rudimentary comparison of the particle distributions between data sets. To more objectively compare conformation distributions between data sets, all particle images were merged into a single data set and subjected to the alignment and classification routine. After manually categorizing each class of FAS conformation, particles of each FAS variant belonging to each category were identified. Particle distributions were displayed with Matlab 7.5.
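The final tallying step described above (pooled classification, manual categorization of classes, then decoding the origin of each particle) can be sketched as follows. This is an illustrative reconstruction rather than the analysis code used here: the record layout, the category labels and the choice to exclude particles from discarded classes when computing percentages are all assumptions.

```python
from collections import Counter, defaultdict


def conformation_distribution(particles, class_to_category):
    """Percentage of particles per conformation category for each data set.

    particles: iterable of (data_set_name, class_id) pairs, one per particle,
        taken from the pooled classification.
    class_to_category: maps class_id to a category label; classes absent from
        the mapping are treated as discarded (assumed here to be excluded
        from the totals).
    """
    counts = defaultdict(Counter)
    for data_set, class_id in particles:
        category = class_to_category.get(class_id)
        if category is None:
            continue  # particle belongs to a discarded class
        counts[data_set][category] += 1

    distribution = {}
    for data_set, counter in counts.items():
        total = sum(counter.values())
        distribution[data_set] = {
            category: 100.0 * n / total for category, n in counter.items()
        }
    return distribution


if __name__ == "__main__":
    # Toy input only; these are not data from the study.
    toy_particles = [
        ("DH +substrates", 1), ("DH +substrates", 1), ("DH +substrates", 2),
        ("DH -substrates", 2), ("DH -substrates", 3),
    ]
    toy_categories = {1: "asymmetric", 2: "symmetric"}  # class 3 discarded
    for name, dist in conformation_distribution(toy_particles, toy_categories).items():
        print(name, dist)
```

Keeping the data-set label out of the classification step and attaching it only at tally time mirrors the blinding described in the text, in which the origin of the particles was decoded only after the classes had been categorized.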

For the Δ22 mutant without substrates, the final in-plane rotational alignment parameters were used to produce random conical tilt (RCT) reconstructions31. These reconstructions were improved through six iterations of shift refinement to center the tilted particle images, followed by a single 5° angular search.

Fitting domain structures into density maps. Structures of DEBS KS3 (PDB 2QO3)20, human MAT (PDB 2JFD), human mitochondrial ER (PDB 1ZSY), tylosin KR1 (PDB 2Z5L)32 and a homology model of DH33 after removal of the N-terminal residues that are likely to be part of the MAT-DH linker were initially positioned according to the FAS crystal structure7. The sequence between the DH and ER domains (Glu1166 to Gln1520) was submitted to HHpred34 to identify the human histamine methyltransferase (PDB 2AOT)15. This structure was positioned adjacent to the tylosin KR domain by using sequence-based structural alignment implemented in UCSF Chimera35 to superimpose the Rossmann folds of the KR structural subdomain and the methyltransferase. After subtracting this multimodel structure from the cryo-EM map6, the rat ACP (PDB 2PNG)36 was fitted into small protruding densities located just below the KR domain using COAN37.

This FAS model was aligned to each of the three-dimensional EM structures (calculated using the RCT method) using the ‘fit in map’ tool in Chimera, and then the domain positions were manually adjusted to improve the fit into each conformation while maintaining appropriate domain orientations and contacts. For this procedure, the MAT-KS2-MAT, ER2 and DH2 domains could be easily docked as rigid units. The KR-SD didomain was initially treated as a single structural unit and the positions of individual domains were then further adjusted.

To indicate the locations of the substrate binding pockets of each enzyme and approximately define (within an ~8-Å radius) the region where the ACP would dock, fully extended phosphopantetheine ligands were inserted into each active site through the deep binding cleft that leads to the active site within each domain. In some cases, homologous structures with bound CoA were available (PDB 1PN4 (ref. 38) and 2G2Z (ref. 39)). Sequence-based structural alignment was used to position the phosphopantetheine ligands of these structures. Models and animated structures were rendered using Chimera.
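The accessibility argument illustrated in Figure 5c can be expressed as a simple geometric test. The sketch below assumes, as one plausible reading of the caption, that an active site counts as reachable when the 55 Å reach sphere centered on the KR C-terminal tether point overlaps the 8 Å docking sphere centered on the modeled phosphate. The coordinates in the example are hypothetical and are not taken from the fitted models.

```python
import math

ACP_REACH_ANGSTROM = 55.0      # gray sphere radius given in the Figure 5 caption
DOCKING_RADIUS_ANGSTROM = 8.0  # phosphate sphere radius given in the Figure 5 caption


def can_dock(tether_xyz, phosphate_xyz,
             reach=ACP_REACH_ANGSTROM, docking_radius=DOCKING_RADIUS_ANGSTROM):
    """Return True if the ACP reach sphere overlaps the docking sphere.

    Two spheres overlap when the distance between their centers does not
    exceed the sum of their radii (an assumed criterion, see the lead-in).
    """
    distance = math.dist(tether_xyz, phosphate_xyz)
    return distance <= reach + docking_radius


if __name__ == "__main__":
    tether = (0.0, 0.0, 0.0)           # hypothetical KR C terminus, in Å
    near_site = (40.0, 30.0, 0.0)      # hypothetical phosphate, 50 Å from the tether
    far_site = (60.0, 50.0, 20.0)      # hypothetical phosphate, about 81 Å from the tether
    print(can_dock(tether, near_site))
    print(can_dock(tether, far_site))
```

Applied to the domain positions fitted into each reconstruction, a test of this kind would flag which active sites fall out of reach in a given conformation, for example the MAT sites in the perpendicular arrangement.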

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS

We thank A. Witkowski for helpful discussions. We also acknowledge the National Resource for Automated Molecular Microscopy for assistance with data collection. The work was supported by a research fellowship F32 DK080622 (to E.J.B.) and grant RO1 DK16073 (to S.S.) from the US National Institutes of Health.

AUTHOR CONTRIBUTIONS

E.J.B. performed all experiments and data analysis; S.S. provided purified FAS; all authors contributed to designing experiments, interpreting results and writing the manuscript.

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Sul, H.S. & Smith, S. Fatty acid synthesis in eukaryotes. in Biochemistry of Lipids, Lipoproteins and Membranes (eds. Vance, D.E. & Vance, J.E.) 155–190 (Elsevier, Amsterdam; Oxford, 2008).

2. Kuhajda, F.P. et al. Fatty acid synthesis: a potential selective target for antineoplastic therapy. Proc. Natl. Acad. Sci. USA 91, 6379–6383 (1994).

3. Loftus, T.M. et al. Reduced food intake and body weight in mice treated with fatty acid synthase inhibitors. Science 288, 2379–2381 (2000).

4. Jenni, S. et al. Structure of fungal fatty acid synthase and implications for iterative substrate shuttling. Science 316, 254–261 (2007).

5. Lomakin, I.B., Xiong, Y. & Steitz, T.A. The crystal structure of yeast fatty acid synthase, a cellular machine with eight active sites working together. Cell 129, 319–332 (2007).

6. Asturias, F.J. et al. Structure and molecular organization of mammalian fatty acid synthase. Nat. Struct. Mol. Biol. 12, 225–232 (2005).

7. Maier, T., Jenni, S. & Ban, N. Architecture of mammalian fatty acid synthase at 4.5 Å resolution. Science 311, 1258–1262 (2006).

8. Rangan, V.S., Joshi, A.K. & Smith, S. Mapping the functional topology of the animal fatty acid synthase by mutant complementation in vitro. Biochemistry 40, 10792–10799 (2001).

9. Witkowski, A. et al. Dibromopropanone cross-linking of the phosphopantetheine and active-site cysteine thiols of the animal fatty acid synthase can occur both inter- and intrasubunit. Reevaluation of the side-by-side, antiparallel subunit model. J. Biol. Chem. 274, 11557–11563 (1999).

10. Burgess, S.A., Walker, M.L., Thirumurugan, K., Trinick, J. & Knight, P.J. Use of negative stain and single-particle image processing to explore dynamic properties of flexible macromolecules. J. Struct. Biol. 147, 247–258 (2004).

11. Joshi, A.K., Witkowski, A., Berman, H.A., Zhang, L. & Smith, S. Effect of modification of the length and flexibility of the acyl carrier protein-thioesterase interdomain linker on functionality of the animal fatty acid synthase. Biochemistry 44, 4100–4107 (2005).

12. Radermacher, M. The three-dimensional reconstruction of single particles from random and non-random tilt series. J. Electron Microsc. Tech. 9, 359–394 (1988).

13. Keatinge-Clay, A.T. & Stroud, R.M. The structure of a ketoreductase determines the organization of the β-carbon processing enzymes of modular polyketide synthases. Structure 14, 737–748 (2006).

14. Smith, S. & Tsai, S.C. The type I fatty acid and polyketide synthases: a tale of two megasynthases. Nat. Prod. Rep. 24, 1041–1072 (2007).

15. Horton, J.R., Sawada, K., Nishibori, M. & Cheng, X. Structural basis for inhibition of histamine N-methyltransferase by diverse drugs. J. Mol. Biol. 353, 334–344 (2005).

16. Tang, Y., Kim, C.Y., Mathews, I.I., Cane, D.E. & Khosla, C. The 2.7-angstrom crystal structure of a 194-kDa homodimeric fragment of the 6-deoxyerythronolide B synthase. Proc. Natl. Acad. Sci. USA 103, 11124–11129 (2006).

17. Cheng, Y. et al. Single particle reconstructions of the transferrin-transferrin receptor complex obtained with different specimen preparation techniques. J. Mol. Biol. 355, 1048–1065 (2006).

18. Joshi, A.K. & Smith, S. Construction, expression, and characterization of a mutated animal fatty acid synthase deficient in the dehydrase function. J. Biol. Chem. 268, 22508–22513 (1993).

19. Witkowski, A., Joshi, A.K., Lindqvist, Y. & Smith, S. Conversion of a β-ketoacyl synthase to a malonyl decarboxylase by replacement of the active-site cysteine with glutamine. Biochemistry 38, 11643–11650 (1999).

20. Tang, Y., Chen, A.Y., Kim, C.Y., Cane, D.E. & Khosla, C. Structural and mechanistic analysis of protein interactions in module 3 of the 6-deoxyerythronolide B synthase. Chem. Biol. 14, 931–943 (2007).

21. Chen, Z.J. et al. Structural enzymological studies of 2-enoyl thioester reductase of the human mitochondrial FAS II pathway: new insights into its substrate recognition properties. J. Mol. Biol. 379, 830–844 (2008).


22. Joshi, A.K., Rangan, V.S., Witkowski, A. & Smith, S. Engineering of an active animal fatty acid synthase dimer with only one competent subunit. Chem. Biol. 10, 169–173 (2003).

23. Maier, T., Leibundgut, M. & Ban, N. The crystal structure of a mammalian fatty acid synthase. Science 321, 1315–1322 (2008).

24. Smith, S. & Abraham, S. Fatty acid synthase from lactating rat mammary gland. Methods Enzymol. 35, 65–74 (1975).

25. Tischendorf, G.W., Zeichhardt, H. & Stoffler, G. Determination of the location of proteins L14, L17, L18, L19, L22, L23 on the surface of the 50S ribosomal subunit of Escherichia coli by immune electron microscopy. Mol. Gen. Genet. 134, 187–208 (1974).

26. Suloway, C. et al. Automated molecular microscopy: the new Leginon system. J. Struct. Biol. 151, 41–60 (2005).

27. Frank, J. et al. SPIDER and WEB: processing and visualization of images in 3D electron microscopy and related fields. J. Struct. Biol. 116, 190–199 (1996).

28. Rath, B.K. & Frank, J. Fast automatic particle picking from cryo-electron micrographs using a locally normalized cross-correlation function: a case study. J. Struct. Biol. 145, 84–90 (2004).

29. Penczek, P., Radermacher, M. & Frank, J. Three-dimensional reconstruction of single particles embedded in ice. Ultramicroscopy 40, 33–53 (1992).

30. Bretaudiere, J.P. & Frank, J. Reconstitution of molecule images analysed by correspondence analysis: a tool for structural interpretation. J. Microsc. 144, 1–14 (1986).

31. Radermacher, M., Wagenknecht, T., Verschoor, A. & Frank, J. Three-dimensional reconstruction from a single-exposure, random conical tilt series applied to the 50S ribosomal subunit of Escherichia coli. J. Microsc. 146, 113–136 (1987).

32. Keatinge-Clay, A.T. A tylosin ketoreductase reveals how chirality is determined in polyketides. Chem. Biol. 14, 898–908 (2007).

33. Pasta, S., Witkowski, A., Joshi, A.K. & Smith, S. Catalytic residues are shared between two pseudosubunits of the dehydratase domain of the animal fatty acid synthase. Chem. Biol. 14, 1377–1385 (2007).

34. Soding, J., Biegert, A. & Lupas, A.N. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 33, W244–W248 (2005).

35. Pettersen, E.F. et al. UCSF Chimera: a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).

36. Ploskon, E. et al. A mammalian type I fatty acid synthase acyl carrier protein domain does not sequester acyl chains. J. Biol. Chem. 283, 518–528 (2008).

37. Volkmann, N. & Hanein, D. Quantitative fitting of atomic models into observed densities derived by electron microscopy. J. Struct. Biol. 125, 176–184 (1999).

38. Koski, M.K., Haapalainen, A.M., Hiltunen, J.K. & Glumoff, T. A two-domain structure of one subunit explains unique features of eukaryotic hydratase 2. J. Biol. Chem. 279, 24666–24672 (2004).

39. Oefner, C., Schulz, H., D’Arcy, A. & Dale, G.E. Mapping the active site of Escherichia coli malonyl-CoA-acyl carrier protein transacylase (FabD) by protein crystallography. Acta Crystallogr. D Biol. Crystallogr. 62, 613–618 (2006).

40. Bunkoczi, G. et al. Mechanism and substrate recognition of human holo ACP synthase. Chem. Biol. 14, 1243–1253 (2007).


MIA40 is an oxidoreductase that catalyzes oxidative protein folding in mitochondria

Lucia Banci1,2, Ivano Bertini1,2, Chiara Cefaro1,2, Simone Ciofi-Baffoni1,2, Angelo Gallo1,2, Manuele Martinelli1,2, Dionisia P Sideris3,4, Nitsa Katrakili3 & Kostas Tokatlidis3,5

MIA40 has a key role in oxidative protein folding in the mitochondrial intermembrane space. We present the solution structure of human MIA40 and its mechanism as a catalyst of oxidative folding. MIA40 has a 66-residue folded domain made of an α-helical hairpin core stabilized by two structural disulfides and a rigid N-terminal lid, with a characteristic CPC motif that can donate its disulfide bond to substrates. The CPC active site is solvent-accessible and sits adjacent to a hydrophobic cleft. Its second cysteine (Cys55) is essential in vivo and is crucial for mixed disulfide formation with the substrate. The hydrophobic cleft functions as a substrate binding domain, and mutations of this domain are lethal in vivo and abrogate binding in vitro. MIA40 represents a thioredoxin-unrelated, minimal oxidoreductase, with a facile CPC redox active site that ensures its catalytic function in oxidative folding in mitochondria.

Disulfide bonds are crucial for maintaining the structural stability of proteins and are involved in various redox-signaling pathways in cells. The introduction of disulfide bonds in vivo often requires the coordinated action of dedicated enzymes that act as catalysts for the oxidative folding process necessary to adopt a native conformation. Most of our understanding about oxidative folding pathways comes from studies on the eukaryotic protein disulfide isomerase (PDI), which resides in the lumen of the endoplasmic reticulum (ER)1–3, and on the bacterial periplasmic disulfide bond (Dsb) proteins4–6. Recently, a similar process of oxidative folding has been discovered to operate in the mitochondria of eukaryotic cells7–10. Several cysteine-rich proteins of the mitochondrial intermembrane space (IMS) were found to undergo oxidation after entering the organelle, in a pathway that requires the proteins Mia40 and Erv1 and is ultimately linked to the respiratory chain9,11–14.

Mia40 belongs to a protein family whose members share six completely conserved cysteine residues constituting a -CPC-CX9C-CX9C- motif7,15,16. Mia40 primary sequences can, however, vary substantially in length. The human homolog (MIA40, 142 residues) shares high sequence identity (>50%) with its eukaryotic homologs in the central part of its sequence (residues 47–105), which includes the conserved -CPC-CX9C-CX9C- motif (Fig. 1a). Outside this region, the level of homology between different species is low (<20%). MIA40 lacks a large N-terminal extension including a transmembrane region with respect to the yeast homologs (Fig. 1a), thus being completely soluble in the IMS17. Substrate proteins for Mia40 are IMS proteins of less than 20 kDa containing characteristic cysteine motifs, organized in twin CX3C, twin CX9C or CX2C motifs18. Among them are the mitochondrial copper chaperone Cox17 (a CX9C substrate), which participates in Cu(I) transfer to cytochrome c oxidase (CcO)19–21, and the small Tims (CX3C substrates), which are chaperones for mitochondrial membrane proteins22–24. The Mia40-based protein-import mechanism is therefore vital to allow a correct function of several mitochondrial processes such as respiration and protein biogenesis.
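As an illustration of how the conserved cysteine signature described above could be located in a primary sequence, the following minimal Python sketch searches for the -CPC-CX9C-CX9C- pattern with a regular expression. The function name, the unconstrained linker lengths between the three elements and the toy sequence are assumptions made for this example only; the toy sequence is hypothetical and is not the MIA40 sequence.

```python
import re

# Pattern for the conserved -CPC-CX9C-CX9C- cysteine signature.
# Assumption of this sketch: the linkers between CPC and the two CX9C
# elements may have any length, because the text does not specify them.
MIA40_MOTIF = re.compile(r"CPC.*?C.{9}C.*?C.{9}C")


def find_mia40_motif(sequence: str):
    """Return (start, end) of the first motif match (0-based, end exclusive), or None."""
    match = MIA40_MOTIF.search(sequence.upper())
    return match.span() if match else None


# Hypothetical toy sequence built only to exercise the pattern.
toy_sequence = (
    "MSAA" + "CPC" + "LGGMASGP"
    + "C" + "A" * 9 + "C"
    + "GGSTTEEK"
    + "C" + "K" * 9 + "C" + "EE"
)

print(find_mia40_motif(toy_sequence))   # span of the full motif in the toy sequence
print(find_mia40_motif("MSAACPCLGG"))   # None: the twin CX9C elements are missing
```

The lazy quantifiers keep the match as short as possible, so the reported span starts at the CPC element and ends at the last cysteine of the second CX9C element.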

On the basis of binding experiments in vitro and in organello, it has been proposed that Mia40 introduces disulfide bonds into imported precursor substrates after they cross the outer membrane protein-import channel7,9. In a cascade of oxidoreductase reactions, electrons are then transferred from Mia40 to Erv1 and finally to either oxygen or cytochrome c11–13. Such a pathway resembles the reaction cascades underpinning the oxidative folding process in the ER and bacterial periplasm, involving Ero1–Erv2–PDI and DsbB–DsbA, respectively4,5,25–27. The identification of transient intermolecular disulfide bonds (mixed disulfides) between Erv1 and Mia40, as well as between Mia40 and its substrate proteins, such as Cox17 and Tim proteins, supports the model of a regulated transfer of disulfide bonds7,9,14,28–30. However, direct molecular evidence on the mechanism of Mia40-dependent oxidation of the substrates and the structural basis of this process are lacking.

Here we reveal the molecular mechanism of Mia40-dependent oxidative folding. This was achieved through structural characterization of MIA40 and investigation in vitro and in vivo of its interaction with COX17 and Tim substrates. The CPC motif is the active site of MIA40, rapidly catalyzing the formation of a disulfide bond in the substrates.

Received 27 May 2008; accepted 5 January 2009; published online 1 February 2009; doi:10.1038/nsmb.1553

1Magnetic Resonance Center CERM, University of Florence, Via Luigi Sacconi 6, 50019, Sesto Fiorentino, Florence, Italy. 2Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Florence, Italy. 3Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology Hellas (IMBB-FORTH), Heraklion 71110, Crete, Greece. 4Department of Biology, University of Crete, Heraklion 71409, Crete, Greece. 5Department of Materials Science and Technology, University of Crete, Heraklion 71003, Crete, Greece. Correspondence should be addressed to I.B. ([email protected]) or K.T. ([email protected]).

RESULTS

MIA40 binds substrates as a monomer in vitro

MIA40 purified from Escherichia coli cells (Supplementary Methods online) is functionally active, as assessed by testing its binding ability to an authentic substrate (Tim10) or to two control proteins (outer membrane porin and matrix-targeted Su9-DHFR) using import-competition assays (Fig. 1b). Such an assay was used previously to test the functionality of the TIM10 complex in binding its cognate import substrate31. Radiolabeled precursor proteins (Tim10, porin or Su9-DHFR) were incubated with MIA40 purified in aerobic conditions (that is, in its fully oxidized state) and then with isolated mitochondria. MIA40 inhibited specifically the import of Tim10 by more than 99% but only weakly that of porin or Su9-DHFR (Fig. 1b). MIA40 directly bound radiolabeled Tim10, because a β-mercaptoethanol–sensitive mixed disulfide intermediate was detected. Additionally, MIA40 is monomeric in vitro, as shown by static multiangle light scattering and confirmed by a correlation time for the molecule tumbling (τm) of 10.6 ± 0.5 ns from NMR heteronuclear relaxation data (Supplementary Fig. 1 online).

13C NMR reveals distinct redox properties of MIA40 disulfides

NMR Cβ chemical shifts are characteristic of the oxidation state of cysteines32. In aerobic conditions as purified from bacterial cell cultures, the six conserved cysteine residues of MIA40 (-CPC-CX9C-CX9C-) are engaged in three disulfide bonds (MIA403S-S). The CPC (Cys53-Pro54-Cys55) motif could be easily reduced by a low concentration of DTT (2 mM) (Fig. 2). Reduction of the CPC motif entails only local structural changes for the segment 50–64, which encompasses the N-terminal lid domain, as shown by 1H and 15N chemical shift changes (Fig. 2a,b). In contrast, no chemical shift variations were detected for the residues of the ‘core’ domain (Fig. 2a,b), even after treatment with 100 mM DTT (Supplementary Fig. 2 online), indicating that the CX9C disulfides were still in an oxidized state. This behavior was confirmed by AMS (4-acetamido-4′-maleimidylstilbene-2,2′-disulfonic acid) thiol trapping assays (Supplementary Fig. 2 and Supplementary Methods). These data, in agreement with a previous biochemical study33, provide a structural basis for the observed redox behavior of MIA40.
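The per-residue shift changes discussed here are reported in the legend of Figure 2 as a weighted average of the 1H and 15N differences, ([(ΔH)² + (ΔN/5)²]/2)^1/2. A minimal Python sketch of that calculation is given below; the function name and the numerical inputs are illustrative assumptions and are not values measured in this study.

```python
import math


def weighted_shift_difference(delta_h_ppm: float, delta_n_ppm: float) -> float:
    """Weighted-average 1H/15N shift difference in p.p.m.

    Implements sqrt(((dH)^2 + (dN / 5)^2) / 2), the expression quoted in
    the legend of Figure 2.
    """
    return math.sqrt((delta_h_ppm ** 2 + (delta_n_ppm / 5.0) ** 2) / 2.0)


# Hypothetical shift changes for a single residue (illustrative numbers only).
example = weighted_shift_difference(delta_h_ppm=0.10, delta_n_ppm=0.50)
print(round(example, 4))
```

Dividing the 15N difference by 5 puts the two nuclei on a comparable scale, so a residue is flagged as perturbed only when the combined change is large relative to the rest of the protein.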

The reduction potential of the easily reducible CPC motif in the redox couple MIA403S-S–MIA402S-S is −200 ± 5 mV, as measured by fluorescence emission spectra (Fig. 3a,b). This value of redox potential lies between those of Mia40 substrates (for example, −340 mV for COX17 and −320 mV for yeast Tim10) and the enzymatic C-terminal intramolecular cysteine pair C130-C133 of yeast Erv1 (−150 mV13; Fig. 3c). Therefore, on thermodynamic grounds alone, these values support the disulfide-relay reactions observed in mitochondria where reducing equivalents flow from the substrate to the CPC motif of Mia40 and then to Erv1.
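The thermodynamic argument can be made concrete with the relation ΔG = −nFΔE, where ΔE is the acceptor potential minus the donor potential. The Python sketch below applies it to the potentials quoted in the text, assuming a two-electron transfer per disulfide and using the quoted midpoint values as if they were standard-state potentials; actual concentrations and conditions in the IMS are ignored, and the function and dictionary names are hypothetical.

```python
FARADAY = 96485.0  # C/mol, approximate value of the Faraday constant

# Redox potentials quoted in the text, in mV.
POTENTIALS_MV = {
    "COX17": -340.0,
    "Tim10": -320.0,
    "MIA40": -200.0,
    "Erv1": -150.0,
}


def delta_g_kj_per_mol(donor: str, acceptor: str, n_electrons: int = 2) -> float:
    """Free-energy change (kJ/mol) for moving n electrons from donor to acceptor."""
    delta_e_volts = (POTENTIALS_MV[acceptor] - POTENTIALS_MV[donor]) / 1000.0
    return -n_electrons * FARADAY * delta_e_volts / 1000.0


for donor, acceptor in [("COX17", "MIA40"), ("Tim10", "MIA40"), ("MIA40", "Erv1")]:
    print(donor, "->", acceptor, round(delta_g_kj_per_mol(donor, acceptor), 1), "kJ/mol")
```

Under these assumptions each step of the relay is exergonic (roughly −27 kJ/mol for COX17 to MIA40 and roughly −10 kJ/mol for MIA40 to Erv1), which is the sense in which the measured potentials support electron flow from substrate to Mia40 to Erv1.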

Solution structure defines MIA40 as a new type of oxidoreductase

Chemical shift index analysis34 indicates that both MIA403S-S and MIA402S-S states have a small helical segment in residues 56–59 (helix α1) and two longer helical segments (helix α2, residues 65–77, and helix α3, residues 88–100), whereas the other 80% of residues, essentially located at the N and C termini, do not take any secondary-structural conformation (Fig. 4a), a large part of them being highly flexible. Regions 1–41 and 107–142 indeed have R2/R1 ratios below those of the α-helical regions (Supplementary Fig. 1) and are characterized by negative or low (<0.5) 15N{1H} NOE values (Fig. 4b). By contrast, the region containing the CX9C motifs, as well as that encompassing the N-terminal helix α1, have R2/R1 (Supplementary Fig. 1) and 15N{1H} NOE values consistent with a structured conformation (Fig. 4b). The unstructured C terminus is not essential in vivo, as a yeast Mia40 mutant lacking this C-terminal segment could support growth to wild-type levels (data not shown). Similarly, it was previously shown that the N-terminal segment is dispensable for function33.

The solution structure of the folded central region of MIA402S-S, determined by NMR (Fig. 5a and Supplementary Fig. 3 online), consists of a ‘core’ and a ‘lid’ on top of it. The core is composed of helices α2 (residues 65–77) and α3 (residues 88–100), which form an antiparallel α-hairpin kept together by two disulfide pairs, Cys64-Cys97 and Cys74-Cys87, juxtaposing the CX9C motifs (Fig. 5a). The lid (residues 41–64) folds onto the core and is structurally rigid, although it does not have defined secondary-structural elements, with the exception of the short helix α1 (residues 56–59; Fig. 5a). The two residues preceding the lid segment, Pro54 and Cys55, also show some α-helical propensity (~40% of 30 energy-minimized structures). However, the disappearance of the NH signal of Cys55 in MIA402S-S indicates local structural flexibility in the CPC region. In contrast, this NH signal is still detectable in MIA403S-S, suggesting an increased structural rigidity upon disulfide-bond formation. The difference in backbone flexibility between the two redox states of MIA40 may have a role in the catalytic process.

Figure 1 MIA40 is functionally active in binding substrates. (a) Sequence alignments of Mia40 orthologs. The protein secondary structure based on chemical shift index analysis of MIA40 is reported below the alignment. The N-terminal lid and the α-hairpin core are indicated (see text for details). Helices α1 (red), α2 (blue) and α3 (cyan) are shown. The conserved cysteine motif CPC and twin CX9C motifs are shown in yellow; intramolecular disulfide pairings (determined here) are shown above the alignment. (b) Competition import assays. 35S-labeled Porin, Tim10 and Su9-DHFR from yeast were mixed with or without MIA40 and imported into yeast mitochondria at 30 °C for the indicated time points. The imported material was analyzed by reducing SDS-PAGE, visualized by autoradiography and quantified as a percentage imported relative to the starting amount.

Notably, the lid contains conserved hydrophobic residues that interact with conserved aromatic residues located on one side of the α-hairpin core. Indeed, a highly charged region is only present on the α-hairpin face opposite the CPC motif (Fig. 5b). The hydrophobic interactions between the lid and the α-hairpin core position the CPC motif in a solvent-exposed conformation protruding from a hydrophobic cleft (Fig. 5c, red), which consists of the strictly conserved phenylalanine residues Phe68, Phe72, Phe75 and Phe91, as well as Leu42, Ile43, Ile49, Trp51, Leu56, Met59, Ala60, Met94 and Met98 (Fig. 5d). The second cysteine, Cys55, of the CPC motif (Fig. 5c,d, yellow) lies directly above this characteristic hydrophobic cleft.

In summary, the salient structural features of MIA40 are (i) a high proportion of unfolded segments at the N and C termini, (ii) a folded α-hairpin core stabilized by structural disulfide pairings, (iii) a rigid N-terminal lid with an extensive array of hydrophobic interactions with one part of the α-hairpin core and (iv) a solvent-exposed CPC motif ideally placed to be the active site in the oxidation process, with its second cysteine, Cys55, lying adjacent to a hydrophobic cleft that may function as a substrate binding site.

Figure 2 The redox and structural properties of the CPC intramolecular disulfide bonds of human MIA40. (a) The weighted-average chemical shift differences ΔHN (that is, $\{[(\Delta H)^2 + (\Delta N/5)^2]/2\}^{1/2}$, where ΔH and ΔN are chemical shift differences for 1H and 15N, respectively) between MIA402S-S and MIA403S-S. (b) Superimposition of two-dimensional 1H-15N HSQC spectra (800 MHz, 298 K) of MIA402S-S (black) and of MIA403S-S (red). Residues with NH chemical shifts that change upon reduction lie in the vicinity of the CPC motif (indicated in the NMR spectra). (c) Cβ and Cα chemical shift values of Cys53, Cys74 and Cys97, either involved or not involved in disulfide bonds with Cys55, Cys87 and Cys64, respectively, are shown in the CBCANH NMR experiment in the absence and in the presence of 2 mM DTT.

Figure 3 Redox potential of the CPC redox active site. (a) Fluorescence emission spectra of the oxidized (50 mM phosphate buffer, pH 7.0, 0.01 mM GSSG; broken line) and the reduced (50 mM phosphate buffer, pH 7.0, 200 mM GSH; solid line) MIA40 after excitation at 280 nm. (b) The redox equilibrium of MIA40 with different [GSH]²/[GSSG] ratios is shown. Data processing and determination of the equilibrium constant are previously described51. After nonlinear regression, a value of Keq = 68.4 ± 7.9 mM (correlation coefficient: 0.987) was obtained for the MIA40/glutathione equilibrium, corresponding to a redox potential of −200 ± 5 mV for the MIA403S-S–MIA402S-S redox couple. (c) Comparison of the redox potential of MIA40 to that of components of the disulfide relay system in the IMS.

The electron-transfer mechanism of MIA40-catalyzed oxidative folding

Spontaneous air oxidation in the absence of MIA40 can oxidize only ~10% of fully reduced COX176SH to the partially oxidized COX172S-S form (with two disulfide pairs created between the twin CX9C motifs of Cox17) in 12 h. By contrast, MIA403S-S can quantitatively and rapidly (less than 30 min) oxidize COX176SH to COX172S-S, as monitored through 1H-15N NMR spectra (Fig. 6a). These spectra show that, upon addition of MIA403S-S, the NH resonance pattern of COX176SH drastically changes to that of COX172S-S (that is, the form where Cys25, Cys35, Cys44 and Cys54 are oxidized), while the copper binding Cys22 and Cys23 ligands remain in a reduced state and therefore do not participate in the electron-transfer reaction. Consistent with this observation, MIA403S-S undergoes reduction to MIA402S-S (Fig. 6a). Clear NH resonance changes are seen for Cys53 and Cys55 of the CPC motif of MIA40, but also for some neighboring residues (Trp51, Asn52, Gly62, Gly57 and Gly58, indicated in Figure 6a) that are all part of the N-terminal lid. Upon titration of 15N-labeled MIA403S-S with increasing amounts of 15N-labeled COX176SH, the MIA403S-S signal intensity decreased with increasing COX176SH concentration and, concomitantly, the signals corresponding to MIA402S-S and COX172S-S appeared and increased in intensity, with the reaction being complete at a 1:1 protein:protein ratio (Fig. 6b,c).
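The titration behavior described above is what an idealized, quantitative 1:1 reaction would give. The Python sketch below encodes that idealization: it assumes that every added COX17 molecule reduces exactly one MIA40 molecule until MIA40 is exhausted and that no side reactions occur. The function name and the choice to express all species relative to total MIA40 are assumptions of this sketch, not the analysis used in the study.

```python
def titration_fractions(ratio: float) -> dict:
    """Idealized molar fractions (%) at a given [COX17]/[MIA40] ratio.

    Assumes a complete, irreversible 1:1 reaction between reduced COX17
    and oxidized MIA40; all values are expressed relative to total MIA40.
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    reacted = min(ratio, 1.0)  # fraction of MIA40 that has been reduced
    return {
        "MIA40_3S-S": 100.0 * (1.0 - reacted),
        "MIA40_2S-S": 100.0 * reacted,
        "COX17_2S-S": 100.0 * reacted,
    }


for r in (0.0, 0.25, 0.5, 0.75, 1.0, 1.25):
    print(r, titration_fractions(r))
```

In this model the oxidized MIA40 signal falls linearly to zero at a ratio of 1 and stays there for higher ratios, which is the signature of a stoichiometric reaction that is complete at a 1:1 ratio.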

Considering that two disulfides in COX172S-S are formed to the detriment of one in MIA403S-S and that oxygen is difficult to eliminate from our reaction mixture, we postulate that the second disulfide pairing in COX17 is mediated by oxygen. As there are two electrons involved in the latter reaction, we reasoned that transient H2O2 was formed, and we could indeed detect it through a colorimetric H2O2 assay. However, the formation of the first disulfide bond, which is MIA403S-S dependent, is a prerequisite for substrate oxidation, as upon addition of MIA402S-S (where oxygen is still present) to COX176SH no electron transfer was observed. To define which is the crucial disulfide bond of COX17 formed by MIA40, we produced two mutants of COX17 (CX9S/SX9C and SX9C/CX9S) with 15N,13C-selectively labeled cysteine residues and used NMR to investigate their reactions with MIA403S-S. Disulfide-bond formation between the two remaining cysteine residues of CX9C motifs was detected only in the presence of the SX9C/CX9S mutant (Fig. 6d). This result was supported by import experiments in isolated mitochondria where the mixed intermediate with MIA40 could still form substantially for the SX9C/CX9S (or C1/4S) mutant but was almost entirely abolished for the CX9S/SX9C (or C2/3S) mutant (Fig. 6e). Therefore, MIA403S-S specifically catalyzes the formation of the inner disulfide bond between Cys35 and Cys44 in COX17.
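The electron bookkeeping behind the proposed oxygen-mediated step can be written out explicitly. Forming one disulfide from two thiols releases two electrons, and reducing one disulfide consumes two. The short Python sketch below counts the electrons that are not accounted for by MIA40; the function name is hypothetical and the calculation assumes the stoichiometry stated in the text (two disulfides formed in COX17, one reduced in MIA40).

```python
ELECTRONS_PER_DISULFIDE = 2  # 2 R-SH -> R-S-S-R + 2 H+ + 2 e-


def electrons_left_for_other_acceptors(substrate_disulfides_formed: int,
                                       mia40_disulfides_reduced: int) -> int:
    """Electrons released by the substrate that MIA40 does not accept."""
    released = substrate_disulfides_formed * ELECTRONS_PER_DISULFIDE
    accepted = mia40_disulfides_reduced * ELECTRONS_PER_DISULFIDE
    return released - accepted


# Two disulfides form in COX17 while one CPC disulfide is reduced in MIA40.
surplus = electrons_left_for_other_acceptors(2, 1)
print(surplus)  # 2
```

A surplus of two electrons matches a two-electron reduction of O2 to H2O2, consistent with the detection of transient H2O2 reported above.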

We found that the same reaction features occur in intact mitochondria, where the crucial mixed disulfide intermediate of the oxidative folding reaction could be trapped and monitored (Fig. 7a). COX17 was imported efficiently into isolated yeast mitochondria, where it forms a transient mixed disulfide intermediate with endogenous Mia40 within 2 min of import (shown by a blue arrowhead in Figure 7a). The intermediate was stabilized by N-ethyl-maleimide (NEM) treatment, which blocks unreacted cysteines, thereby arresting the substrate in transit and bound to Mia40 (ref. 30). Imported COX17 (initially mostly in a reduced state, shown in green) became gradually oxidized (shown in brown) to the detriment of the mixed disulfide species (shown in blue), which disappears, as would be expected for a productive intermediate (Fig. 7b).

We further dissected the functional role of the cysteines in the CPC motif by investigating the impact of cysteine mutations in vivo using yeast cells, and in vitro using a reconstituted system. We generated three mutants of the CPC motif (SPS, CPS and SPC, exchanging the corresponding cysteine with serine) and tested them in complementation assays with a GALMia40 strain, which grows well on galactose (SG↑) but not on glucose (SC↓) or lactate (SL↓) (Fig. 7c, above, ‘Empty’, and Supplementary Methods). When GALMia40 cells were transformed with a plasmid carrying wild-type yeast Mia40, their growth was restored on glucose and lactate (Fig. 7c, middle, ‘WT’). However, the CPS and SPS mutations were lethal, supporting the concept that the second cysteine of the CPC motif is crucial for survival of the cells. The SPC mutant survived, but had a clear growth defect compared to the wild type. The viability of the SPC mutant can be explained by the fact that this mutant protein has a high tendency to form intermolecular disulfide-bonded dimers in vitro (Supplementary Fig. 4 online) and that it is trapped in a DTT-sensitive high-molecular-weight complex in vivo (data not shown). Thus, this intermolecular disulfide bond can act as the catalytic site.

Figure 4 MIA40 is substantially unstructured at its N and C termini. (a) 1H-15N HSQC spectrum of MIA402S-S showing that NH signals clustered in the central region (bordered by a broken line) belong mainly to the unstructured N and C termini of the protein. (b) 15N{1H} NOE versus MIA402S-S residue number collected at 600 MHz in 50 mM phosphate buffer, pH 7, and 2 mM DTT. Reliable relaxation values cannot be obtained for residues 109–118 as their NH cross-peaks are overlapped in the NMR spectra.

Figure 5 The solution structure of MIA402S-S. (a) Ribbon diagram of the lowest-energy conformer of MIA402S-S. Helix α1 of the N-terminal lid is shown in red, and helices α2 and α3 composing the α-hairpin core are shown in blue and cyan. Disulfide pairings (or free thiols) are shown in yellow. (b) Surface representation of MIA402S-S, mapping the electrostatic potential. White, uncharged residues; red, acidic residues; blue, basic residues. (c) The hydrophobic cleft on the surface of MIA402S-S is shown in red, with ribbon diagram in transparent cyan. The second cysteine of the CPC motif, Cys55, which lies adjacent to the hydrophobic cleft, is shown in yellow. (d) The conserved residues making up the hydrophobic cleft on the surface of MIA402S-S are annotated and shown in red. The ribbon diagram is shown in transparent cyan and Cys55 is shown in yellow.

In an in vitro reconstituted system, bead-immobilized yeast Mia40 was previously reported to efficiently bind substrates in a manner that faithfully represents the in vivo function and allows monitoring of a stably trapped mixed disulfide intermediate28,30. Mia40 formed DTT-sensitive mixed disulfide intermediates with both yeast Tim10 (a CX3C substrate) and COX17 (a CX9C substrate) (Fig. 7d, arrowhead). In contrast, the CPS, SPC and SPS Mia40 mutants showed marked differences in their ability to form the mixed disulfide intermediate. The CPS mutant showed a far stronger defect than the SPC mutant, and the double SPS mutant was essentially incapable of forming the intermediate at all (Fig. 7d). This agrees well with the complementation data (Fig. 7c), is in line with the positioning of Cys55 adjacent to the putative substrate binding hydrophobic cleft in the structure of MIA40 (Fig. 5) and supports the concept that Cys55 is the key residue for the reaction with the substrate.

The effect of the mutations is identical for both types of substrate (CX3C and CX9C), suggesting that the CPC motif–mediated Mia40 mechanism of oxidation proceeds unaffected by the spacing between the substrate cysteines. The stable covalent mixed disulfide intermediate obtained in this assay was found to be consistent with a 1:1 complex of Mia40–Tim10, as shown by blue native PAGE (Fig. 7e). This is in agreement with our data in Supplementary Figure 1 and Figure 6b, and shows that monomeric Mia40 is active. The binding assay was also done using immobilized MIA40 (the same protein used in all our NMR analyses) to further ascertain that the human and yeast proteins are functionally equivalent. MIA40 can indeed efficiently bind both substrates (CX9C COX17 and CX3C yTim10) and can form a DTT-sensitive mixed disulfide intermediate with either of them, mirroring the behavior of Mia40 (Supplementary Fig. 5 online).

A final result regards the hydrophobic cleft of MIA40 (Fig. 7f–h). When six or eight hydrophobic residues were mutated to alanine, we observed a strong defect in vitro, similar to that of the SPS mutant, whereas mutagenesis of only four residues did not result in substantial defects. Retention of binding in two of the three sextuple mutants indicates also that Ile49 and Trp51 are probably less important than the combined effect of Leu56, Met59, Phe72, Phe75, Phe91 and Met94. The fact that several hydrophobic residues must be mutagenized in combination to produce a substantial effect reflects the weak nature of the intermolecular noncovalent hydrophobic interactions, which become physiologically relevant when an extended hydrophobic surface is created. Additionally, these interactions are expected to be transient, as more permanent interactions would ‘freeze’ the intermediate, thus hindering product release from Mia40. As shown by complementation assays (Fig. 7h), the cells harboring any of the mutations in the Mia40 hydrophobic cleft do not survive. This clearly argues for a crucial role of the hydrophobic cleft in vivo. A corresponding hydrophobic patch consisting of conserved residues of the substrate is present in helix α2 of COX17 (Supplementary Fig. 6 online). We have generated a hypothetical docking model of the COX17–MIA40 adduct, showing the relevant hydrophobic intermolecular interactions, using the program HADDOCK35 (Supplementary Fig. 6 and Supplementary Methods).

Figure 6 Interaction of MIA40 with substrates. (a) Oxidation and reduction processes of COX176SH and MIA403S-S, respectively, as followed by NMR. The 1H-15N HSQC spectrum of a 1:1 15N-labeled MIA403S-S / 15N-labeled COX176SH mixture is superimposed with the 1H-15N HSQC spectra of MIA403S-S or MIA402S-S or COX172S-S. The 1H-15N HSQC spectrum of COX176SH is also shown. NH resonances of cysteine residues and some surrounding residues of MIA40 and COX17 are indicated in the NMR spectra. The disulfide formation in COX17 (above) and disruption in MIA40 (below) in relation to their structural changes are shown schematically in the inset. (b) Plot shows the formation of MIA402S-S (red) and COX172S-S (blue) and the decrease in the MIA403S-S level (black) as a function of the COX17 / MIA40 molar ratio. The cross-peaks of residues Trp51, Cys53 and Gly57, whose 1H and 15N chemical shifts change substantially depending on the redox state of MIA40, have been selected to evaluate the molar fraction of MIA40, whereas those of residues Lys44 and Glu47 provide the molar fraction of COX172S-S. (c) Overlay of a selected region of the 1H-15N HSQC spectra of MIA403S-S in the presence of 0 (black), 0.5 (red) or 1.2 (blue) equivalents of 15N-labeled COX176SH, showing the quantitative formation of MIA402S-S at a 1:1 protein ratio. (d) Overlay of the 1H-15N HSQC spectra of the (13C,15N)Cys-selectively labeled SX9C/CX9S COX176SH mutant in the presence of 0 (black) and 1 (red) equivalents of unlabeled MIA403S-S, showing the formation of NH cross-peaks with 1H and 15N chemical shifts typical of those found for Cys44 and Cys35 in wild-type COX172S-S. (e) 35S-labeled COX17 and the cysteine-to-serine mutants C1/4S (outer disulfide bridge) or C2/3S (inner disulfide bridge) were imported into wild-type (WT) yeast mitochondria for 2 min at 30 °C, followed by nonreducing SDS-PAGE and autoradiography. The mixed disulfide intermediate (interm) with yeast Mia40 is shown with an arrowhead, and the oxidized (ox) and reduced (red) species of COX17 are indicated.

Collectively, the in vitro and in vivo experiments show that (i) the CPC motif is crucial for the Mia40 oxidative function, (ii) the second cysteine of the CPC motif is the vital active site cysteine, (iii) the hydrophobic cleft mediates substrate binding and (iv) defects observed in vitro are mirrored as phenotypes in vivo.

DISCUSSION

The structure of MIA40 bears no similarity with any other known oxidoreductase in the cell, thus defining MIA40 as a new type of oxidoreductase. It does not have a thioredoxin domain, which is common among other known oxidoreductases such as eukaryotic PDI and bacterial Dsb proteins. In fact, the well-folded part of MIA40 is much smaller (66 residues) than the typical thioredoxin domain (about 127 residues). In this respect, MIA40 can be thought of as the most ‘minimal’ oxidoreductase domain described so far.

MIA40 has an α-hairpin core, common to other mitochondrial proteins that contain CX9C motifs36 and are, presumably all, substrates for MIA40. In this respect, MIA40 resembles structurally its own substrates, and they may have evolved from a common ancestor. Functional and structural diversification from a putative common ancestor stems, at least partially, from distinct differences in the N-terminal region upstream of the α-hairpin core. First, the N-terminal lid of MIA40 has a much more defined conformation and is more structurally organized and rigid than the N-terminal end of COX172S-S. Second, the CPC active site of MIA40, which precedes the well-defined helix α1, lies at a greater distance from the core compared to the CC metal binding motif in COX172S-S. Third, the N-terminal lid of MIA40 is stabilized onto the core by an extensive array of hydrophobic interactions that are unique to MIA40 and absent in other mitochondrial proteins that share the α-hairpin core. These unique properties of the N-terminal lid endow MIA40 with an oxidoreductase function but not with a copper-chaperone function, in contrast with the equivalent N terminus of COX17.

The N-terminal lid is the functional site of the molecule, with the CPC motif forming the active center. This MIA40-unique motif is accessible to the solvent and positioned favorably for a direct and facile transfer of the disulfide bond to the substrate. In this respect, the CPC motif functions as a redox active site, shuttling between the oxidized and reduced states upon binding to the substrate, without affecting the

[Figure 7, panels a–h: image not reproduced. Panel b plots species distribution (%) against time (min) for the intermediate (‘interm’), oxidized (‘ox’) and reduced (‘red’) species. Panel f lists the hydrophobic cleft mutants:]

Mutant   Yeast Mia40 / human MIA40
1        I292A W294A F311A F315A / I49A W51A F72A F75A
2        L299A M302A F311A F315A / L56A M59A F72A F75A
3        F311A F315A F334A M337A / F72A F75A F91A M94A
4        I292A W294A L299A M302A F311A F315A / I49A W51A L56A M59A F72A F75A
5        I292A W294A F311A F315A F334A M337A / I49A W51A F72A F75A F91A M94A
6        L299A M302A F311A F315A F334A M337A / L56A M59A F72A F75A F91A M94A
7        I292A W294A L299A M302A F311A F315A F334A M337A / I49A W51A L56A M59A F72A F75A F91A M94A

Figure 7 The second cysteine, Cys55, of the active-site CPC is essential in vivo and in vitro. (a) COX17 forms a transient mixed disulfide intermediate with yeast Mia40 in organello. 35S-labeled COX17 was imported into wild-type yeast mitochondria for the indicated time points at 30 °C, followed by nonreducing SDS-PAGE and autoradiography. The mixed intermediate (‘interm’, blue), the oxidized COX17 monomer (‘ox’, brown) and the reduced COX17 monomer (‘red’, green) are indicated. (b) Quantitative analysis of a. The amount of oxidized COX17 increases with time (brown) to the detriment of the transient mixed intermediate with Mia40 (blue) and the reduced COX17 (green). (c) In vivo complementation of CPC mia40 mutants. A GAL-MIA40 strain containing the MIA40 gene under the control of the GAL10 promoter was transformed with plasmids carrying wild-type (WT) or cysteine mutants of yeast MIA40 or nothing (Empty). The resulting transformants were grown in galactose and then shifted to glucose for 18 h before plating on galactose (SG), on glucose (SC) or on lactate with 0.2% (w/v) glucose (SL). (d) Reconstitution of binding in vitro. Bead-immobilized yeast Mia40 (WT or cysteine mutants) was incubated with radioactive yeast Tim10 (above) or COX17 (below), the interaction was arrested by NEM and protein samples were analyzed by SDS-PAGE (with or without β-mercaptoethanol, +/−βMe) and autoradiography. (e) The Mia40–Tim10 mixed disulfide intermediate is a 1:1 monomer. 35S-labeled Tim10 was incubated with bead-immobilized Mia40 as in d, but the bound material was released from the beads by thrombin and analyzed on blue native PAGE. (f) Table showing the mutants that were made in the hydrophobic cleft of yeast Mia40 and the corresponding mutation in MIA40. All of the residues were exchanged with alanines. The same mutant numbering is used in g and h. (g) Reconstitution of binding in vitro using the hydrophobic mutants of Mia40, as in d. (h) In vivo complementation of the hydrophobic mutants of Mia40 in the GAL-MIA40 strain, as in c.


rest of the MIA40 molecule structurally (Figs. 3 and 6). Actually, the CPC motif protrudes into the solvent from a hydrophobic protein surface formed by a number of hydrophobic and aromatic residues, all of which are strictly conserved. The presence of a proline residue in the active site near the hydrophobic cleft where the substrate could bind is a feature of other oxidoreductases such as DsbA and PDI37,38, and it would be tempting to speculate that it has a functional role. However, mutation of this conserved proline resulted in no apparent changes in mixed disulfide formation with the substrate, nor did it cause growth defects in vivo (data not shown). This observation, along with the fact that the crucial proline (for example Pro151 in DsbA, which is adjacent to the active site CPHC motif at residues 30–33 of DsbA) is in a cis conformation in these other enzymes as opposed to its trans conformation in MIA40, argues for a very different mechanism in the case of MIA40 compared to the thioredoxin-like oxidoreductases.

The structural properties of MIA40 may also rationalize the dual function that this protein must perform in the IMS, as an oxidase and as an import receptor: (i) its N-terminal lid is endowed with the oxidation-active CPC site, which introduces disulfides into the substrates, and (ii) the characteristic hydrophobic cleft functions as a substrate recognition and binding site, stabilizing initial noncovalent interactions that appropriately position the partially folded substrates (which usually have exposed hydrophobic segments) so that the first crucial mixed disulfide can form. In this manner, MIA40 fulfills an import-receptor role.

Given these properties of MIA40, and the fact that there is no evidence of a protein disulfide-isomerase activity in the intermembrane space, it seems possible that such an isomerase activity might be dispensable. In agreement with this, MIA40 has a much greater specificity than proteins such as PDI and DsbA and introduces specific disulfide bonds into partially folded substrates that are properly positioned on MIA40. This is the first example of such a protein and distinguishes the oxidative folding pathway from those in the ER of eukaryotes and the periplasm of bacteria.

MIA40 is necessary and sufficient for oxidation of its substrates, via the N-terminal lid CPC motif as its active site, with the second cysteine, Cys55, in CPC being the catalytic residue. The standard redox potential of the CPC disulfide bond (−200 mV) makes oxidation of substrate motifs with more reducing redox potentials, such as −340 mV (CX9C of COX17) or −320 mV (CX3C of small Tims), thermodynamically favored. In the same way, oxidation of the CPC itself by the more oxidizing C-terminal CX2C pair of Erv1 (redox potential of −150 mV) is also favored. The concept of disulfide relay between Erv1, MIA40 and the substrate can be rationalized on the basis of our structural characterization and is in complete agreement with the recent biochemical and reconstitution analysis of the relay system for the yeast proteins33.
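The direction of the relay can be illustrated with a back-of-the-envelope calculation. The sketch below is not part of the original analysis: it assumes a two-electron thiol-disulfide exchange, takes the quoted potentials as negative values (−150, −200, −320 and −340 mV), and applies the standard relation ΔG°′ = −nFΔE°′. The dictionary keys and the function name are illustrative choices only.

```python
# Illustrative sketch: convert the redox potentials quoted in the text into
# standard free-energy changes for each step of the disulfide relay.
# Assumption: each step is a two-electron thiol-disulfide exchange.

FARADAY = 96485.332  # Faraday constant, C/mol
N_ELECTRONS = 2      # electrons transferred per disulfide formed

# Redox potentials in mV, as quoted in the text (signs read as negative)
POTENTIALS_MV = {
    "Erv1 CX2C": -150.0,
    "MIA40 CPC": -200.0,
    "small Tim CX3C": -320.0,
    "COX17 CX9C": -340.0,
}


def delta_g_kj_per_mol(oxidant: str, reductant: str) -> float:
    """Return dG = -n * F * (E_oxidant - E_reductant) in kJ/mol."""
    delta_e_volts = (POTENTIALS_MV[oxidant] - POTENTIALS_MV[reductant]) / 1000.0
    return -N_ELECTRONS * FARADAY * delta_e_volts / 1000.0


# (oxidant, reductant) pairs for the relay steps discussed in the text
RELAY_STEPS = [
    ("MIA40 CPC", "COX17 CX9C"),
    ("MIA40 CPC", "small Tim CX3C"),
    ("Erv1 CX2C", "MIA40 CPC"),
]

for oxidant, reductant in RELAY_STEPS:
    dg = delta_g_kj_per_mol(oxidant, reductant)
    print(f"{oxidant} oxidizes {reductant}: {dg:.1f} kJ/mol")
```

Under these assumptions each step comes out negative (roughly −27, −23 and −10 kJ/mol, respectively), which matches the statement that every transfer in the Erv1, MIA40 and substrate relay is thermodynamically favored. The numbers say nothing about kinetics, which the text treats separately.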

The working model of the electron-transfer reaction that we propose is shown in Figure 8: (i) the substrate (for example COX17) in the fully reduced state cannot efficiently be transformed into the partially oxidized state by oxygen alone for kinetic reasons; (ii) MIA40 efficiently favors the formation of one of the two disulfide bonds within the twin CX9C motif of COX17; (iii) once the first disulfide bond (between Cys35 and Cys44) is introduced by MIA40, the second disulfide bond between the two remaining cysteines of COX17, which are now favorably positioned, can then be formed rapidly by oxygen. In vivo, the second disulfide may alternatively be formed by Erv1, which has been found to be physically linked to Mia40 under certain conditions9. The ability of MIA40 to catalyze disulfide-bond formation for both CX9C and CX3C proteins, as observed for both Cox17 and Tim10, can be rationalized on the following basis: it is sufficient that MIA40 interacts with one cysteine pair, and once this first disulfide bond is formed, the other cysteine pair can undergo a facile oxidation independently of how many residues are in between the two cysteine pairs. Spacing of n residues within the CXnC motifs is crucial for the final stabilization of the substrate structure, either in a relatively aligned two-helical arrangement, as in Cox17 (ref. 36), or when the two helices are more tilted in relation to each other, as is the case for the small Tims31. This working model is consistent with the observation that the fourth cysteine of the CX3C motif of Tim10 (connected to the first one to form the outer disulfide) is necessary and sufficient (in vitro and in organello) for release from MIA40 (ref. 30).

In conclusion, here we have elucidated the Mia40-dependent oxidative folding reaction for mitochondrial cysteine-rich proteins at the molecular level. The mechanism, proposed on the basis of in vitro and in vivo protein-protein interaction studies between Mia40 and its substrates, gives a clear picture of the mitochondrial IMS protein oxidative folding and can explain the wide range of different substrates of Mia40: proteins with repetitive cysteines organized in twin CX3C (small Tims), twin CX9C (Cox17, Cox19, Mdm35, Mic14, Mic17) or twin CX2C motifs (Erv1). The present results are an important step toward revealing the full molecular details of oxidative protein folding in eukaryotes and the interactions of the mitochondrial machinery dedicated to this process.

METHODS

NMR spectroscopy. We carried out all NMR experiments used for resonance assignment and structure calculations on 0.5–1 mM 13C,15N-labeled and 15N-labeled MIA402S-S and MIA403S-S samples in 50 mM phosphate buffer, pH 7.0, containing 10% (v/v) D2O (plus 2 mM DTT for MIA402S-S). All NMR spectra were collected at 298 K, processed using the standard Bruker software (Topspin) and analyzed through the CARA program39. The 1H, 13C and 15N resonance assignments of MIA402S-S and MIA403S-S were performed following a standard protocol using, for backbone assignment, triple-resonance NMR experiments and, for side chain assignment, TOCSY-based NMR experiments.

Structure calculations of MIA402S-S were performed with the software package ATNOS/CANDID/CYANA40–42, using as input the amino acid sequence, the chemical shift lists, three [1H,1H]-NOE experiments (two-dimensional NOESY, three-dimensional 13C-resolved NOESY and three-dimensional 15N-resolved NOESY) and φ and ψ dihedral angle constraints

Figure 8 Model for the interaction of MIA40 with its substrates. Schematic representation of the oxidative folding reaction between MIA403S-S and COX176SH, as observed in vitro by NMR. The disulfide bond of the MIA40 CPC redox active site (red) is readily reduced and transferred to the substrate; the second intramolecular disulfide of the substrate can then be formed by O2. The oxidizing equivalents transferred in the reaction are shown in red.


derived from the chemical shift index analysis34. CSI and PECAN software43 were used to estimate protein secondary structure. In addition, two disulfide bonds between Cys64 and Cys97 and between Cys74 and Cys87 were imposed, as indicated by their 13C chemical shift analysis. All other possible combinations of disulfide pairing cause a drastic increase in the CYANA target function, as they are not in agreement with long-range NOE patterns.

We subjected the 20 conformers with the lowest residual target function values to restrained energy minimization in explicit water with AMBER 8.0 (ref. 44) and evaluated the quality of the structures with the programs PROCHECK, PROCHECK-NMR45 and WHATIF46. The conformational and energetic analyses of the final REM family of 20 structures are reported in Table 1. The Ramachandran plot of the mean minimized structure of MIA402S-S shows that 81.1% of residues lie in the most favorable region of the plot, 18.9% of residues lie in the allowed region and no residues lie in the generously allowed and disallowed regions.

We performed relaxation experiments on 15N-labeled samples at 500 MHz and 600 MHz, and measured the 15N backbone longitudinal (R1) and transverse (R2) relaxation rates, as well as the heteronuclear 15N{1H} NOEs, as previously described47,48.

We followed disulfide reduction of the MIA403S-S form by NMR. Up to 100 mM DTT was added stepwise in anaerobic conditions to 15N-labeled MIA403S-S in 50 mM phosphate buffer, pH 7.0, containing 10% (v/v) D2O, and we acquired two-dimensional 1H-15N HSQC spectra.

To follow the oxidative folding of COX17, we first produced 15N-labeled or 15N,13C-labeled COX176SH by adding a large excess of DTT, which was then removed through a PD-10 desalting column; the cysteine redox state was then checked by NMR. COX176SH was then left exposed to air or was titrated with 15N-labeled MIA403S-S at 25 °C, following the reaction by two-dimensional 1H-15N HSQC spectra and/or triple-resonance experiments. We followed a similar procedure for the reaction between (13C,15N)cysteine-selectively labeled COX176SH mutants and unlabeled MIA403S-S.

Import in yeast mitochondria. We synthesized 35S-labeled precursor proteins using the TNT SP6 coupled transcription/translation kit (Promega). We imported the radioactive material in wild-type yeast mitochondria (50–100 mg) in the presence of 2 mM ATP and 2.5 mM NADH for the indicated time points at 30 °C. We then resuspended mitochondria in 1.2 M sorbitol and 20 mM HEPES, pH 7.4, followed by a treatment with proteinase K (0.1 mg ml−1) to remove unimported material and resuspension in Laemmli sample buffer with or without β-mercaptoethanol, as indicated. We analyzed samples by SDS-PAGE and digital autoradiography (Molecular Dynamics). For the competition experiments, we imported the radioactive precursors in mitochondria with or without 10 mM of recombinant MIA40.

In vitro reconstitution of substrate binding on Mia40. We immobilized Mia40 as a GST-fusion and incubated it with radioactive precursor for 10 min at 4 °C. The reaction was stopped with the addition of 10 mM N-ethylmaleimide. We then washed the bound material three times with 150 mM NaCl, 50 mM Tris, pH 7.4, 0.1% (w/v) BSA and 0.1% (v/v) Triton X-100, resuspended it in Laemmli buffer and analyzed it by nonreducing SDS-PAGE and autoradiography (Molecular Dynamics). For the blue native analysis, we released the bound material from the beads by thrombin treatment for 1 h at 4 °C. The released fraction was then loaded onto a 6–16% (v/v) gradient blue native electrophoresis gel49 followed by autoradiography.

Accession codes. Protein Data Bank: The atomic coordinates and structural restraints for MIA402S-S have been deposited under accession code 2K3J. BioMagResBank: resonance assignments are under accession code 15763.

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS

We are grateful to A. Makris (Mediterranean Agronomic Institute of Chania, Crete) for the plasmid M4801, N. Pfanner (University of Freiburg) for the porin SP6 plasmid, N. Petrakis (K.T. laboratory, Institute of Molecular Biology and Biotechnology-Foundation for Research and Technology (IMBB-FORTH)) for help with the use of the Chimera software used in Figure 5, A. Hatzi (K.T. group, IMBB-FORTH) for some help with part of the mutagenesis and T. Economou (IMBB-FORTH) and T. Pugsley (Institut Pasteur) for comments on the manuscript. This work was supported by European Network of Research Infrastructures for Providing Access and Technological Advancements in Bio-NMR Contract 026145, by the SPINE II-COMPLEXES Contract, LSHG-CT-2006-031220 “From Receptor to Gene: Structures of Complexes from Signalling Pathways Linking Immunology, Neurobiology and Cancer,” and by funds from IMBB-FORTH, the University of Crete and the European Social Fund and National Resources (to K.T.). D.P.S. was supported by a PENED grant. This work was also supported in part by the Italian MIUR-FIRB (Fondo per gli Investimenti della Ricerca di Base, Grant protocollo, MIUR-RBLA032ZM7). Molecular graphics images were produced using the UCSF Chimera package50 from the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco (supported by the US National Institutes of Health grant P41 RR-01081).

AUTHOR CONTRIBUTIONS

I.B. and L.B. planned the research, discussed and guided the flow of experiments and coordinated the writing of the text, to which all the co-authors contributed; M.M. and C.C. coordinated and performed protein production and characterization; A.G. solved the MIA402S-S NMR structure; S.C.-B. planned and recorded the NMR spectra and coordinated the titration experiments; D.P.S. performed the in vivo and in vitro mutational analysis and interactions and analyzed data; N.K. provided technical support; K.T. designed experiments, analyzed data and coordinated the presentation of the data and the writing of the paper.

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Gruber, C.W., Cemazar, M., Heras, B., Martin, J.L. & Craik, D.J. Protein disulphide isomerase: the structure of oxidative folding. Trends Biochem. Sci. 31, 455–464 (2006).

2. Hatahet, F. & Ruddock, L.W. Substrate recognition by the protein disulfide isomerases. FEBS J. 274, 5223–5234 (2007).

3. Sevier, C.S. & Kaiser, C.A. Ero1 and redox homeostasis in the endoplasmic reticulum. Biochim. Biophys. Acta 1783, 549–556 (2008).

Table 1 NMR and refinement statistics for MIA402S-S

                                              MIA402S-S
NMR distance and dihedral constraints
  Distance constraints
    Total NOE                                 1,321
    Intra-residue                             230
    Inter-residue                             1,091
      Sequential (|i – j| = 1)                437
      Medium-range (|i – j| < 4)              398
      Long-range (|i – j| > 5)                256
    Hydrogen bonds                            18
  Total dihedral angle restraints             60
    φ                                         30
    ψ                                         30
Structure statistics
  Violations (mean ± s.d.)
    Distance constraints (Å)                  0.018 ± 0.001
    Dihedral angle constraints (°)            0.48 ± 0.18
    Max. dihedral angle violation (°)         1.65 ± 0.38
    Max. distance constraint violation (Å)    0.33 ± 0.03
  Deviations from idealized geometry
    Bond lengths (Å)                          0.0214 ± 0.0001
    Bond angles (°)                           2.574 ± 0.057
    Impropers (°)                             6.685 ± 0.595
  Average pairwise r.m.s. deviation* (Å)
    Heavy                                     0.51 ± 0.12
    Backbone                                  0.90 ± 0.09

*Pairwise r.m.s. deviation was calculated among 20 refined structures.
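As a quick sanity check on the restraint counts in Table 1, the subtotals can be summed and compared with the reported totals. The sketch below is only an illustration: the dictionary keys and the function name are made up for this example, and the two dihedral rows are assumed to be the φ and ψ restraints.

```python
# Illustrative consistency check of the restraint counts reported in Table 1.
# Key names are hypothetical labels chosen for this sketch.

TABLE1_COUNTS = {
    "total_noe": 1321,
    "intra_residue": 230,
    "inter_residue": 1091,
    "sequential": 437,
    "medium_range": 398,
    "long_range": 256,
    "dihedral_total": 60,
    "phi": 30,  # assumed to be the first dihedral row
    "psi": 30,  # assumed to be the second dihedral row
}


def check_restraint_totals(counts: dict) -> dict:
    """Return True/False for each subtotal that should add up to a total."""
    return {
        "intra + inter == total NOE": (
            counts["intra_residue"] + counts["inter_residue"]
            == counts["total_noe"]
        ),
        "sequential + medium + long == inter-residue": (
            counts["sequential"] + counts["medium_range"] + counts["long_range"]
            == counts["inter_residue"]
        ),
        "phi + psi == total dihedral": (
            counts["phi"] + counts["psi"] == counts["dihedral_total"]
        ),
    }


for label, ok in check_restraint_totals(TABLE1_COUNTS).items():
    print(f"{label}: {'consistent' if ok else 'MISMATCH'}")
```

With the values as printed in the table, all three checks pass, so the extracted numbers are at least internally consistent.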


4. Collet, J.F. & Bardwell, J.C. Oxidative protein folding in bacteria. Mol. Microbiol. 44, 1–8 (2002).

5. Kadokura, H., Katzen, F. & Beckwith, J. Protein disulfide bond formation in prokaryotes. Annu. Rev. Biochem. 72, 111–135 (2003).

6. Nakamoto, H. & Bardwell, J.C. Catalysis of disulfide bond formation and isomerization in the Escherichia coli periplasm. Biochim. Biophys. Acta 1694, 111–119 (2004).

7. Chacinska, A. et al. Essential role of Mia40 in import and assembly of mitochondrial intermembrane space proteins. EMBO J. 23, 3735–3746 (2004).

8. Lu, H., Allen, S., Wardleworth, L., Savory, P. & Tokatlidis, K. Functional TIM10 chaperone assembly is redox-regulated in vivo. J. Biol. Chem. 279, 18952–18958 (2004).

9. Mesecke, N. et al. A disulfide relay system in the intermembrane space of mitochondria that mediates protein import. Cell 121, 1059–1069 (2005).

10. Tokatlidis, K. A disulfide relay system in mitochondria. Cell 121, 965–967 (2005).

11. Allen, S., Balabanidou, V., Sideris, D.P., Lisowsky, T. & Tokatlidis, K. Erv1 mediates the Mia40-dependent protein import pathway and provides a functional link to the respiratory chain by shuttling electrons to cytochrome c. J. Mol. Biol. 353, 937–944 (2005).

12. Bihlmaier, K. et al. The disulfide relay system of mitochondria is connected to the respiratory chain. J. Cell Biol. 179, 389–395 (2007).

13. Dabir, D.V. et al. A role for cytochrome c and cytochrome c peroxidase in electron shuttling from Erv1. EMBO J. 26, 4801–4811 (2007).

14. Rissler, M. et al. The essential mitochondrial protein Erv1 cooperates with Mia40 in biogenesis of intermembrane space proteins. J. Mol. Biol. 353, 485–492 (2005).

15. Naoe, M. et al. Identification of Tim40 that mediates protein sorting to the mitochondrial intermembrane space. J. Biol. Chem. 279, 47815–47821 (2004).

16. Terziyska, N. et al. Mia40, a novel factor for protein import into the intermembrane space of mitochondria is able to bind metal ions. FEBS Lett. 579, 179–184 (2005).

17. Hofmann, S. et al. Functional and mutational characterization of human MIA40 acting during import into the mitochondrial intermembrane space. J. Mol. Biol. 353, 517–528 (2005).

18. Gabriel, K. et al. Novel mitochondrial intermembrane space proteins as substrates of the MIA import pathway. J. Mol. Biol. 365, 612–620 (2007).

19. Cobine, P.A., Pierrel, F. & Winge, D.R. Copper trafficking to the mitochondrion and assembly of copper metalloenzymes. Biochim. Biophys. Acta 1763, 759–772 (2006).

20. Banci, L. et al. Mitochondrial copper(I) transfer from Cox17 to Sco1 is coupled to electron transfer. Proc. Natl. Acad. Sci. USA 105, 6803–6808 (2008).

21. Banci, L. et al. Modeling protein-protein complexes involved in the cytochrome c oxidase copper-delivery pathway. J. Proteome Res. 6, 1530–1539 (2007).

22. Bauer, M.F., Hofmann, S., Neupert, W. & Brunner, M. Protein translocation into mitochondria: the role of TIM complexes. Trends Cell Biol. 10, 25–31 (2000).

23. Endres, M., Neupert, W. & Brunner, M. Transport of the ADP/ATP carrier of mitochondria from the TOM complex to the TIM22.54 complex. EMBO J. 18, 3214–3221 (1999).

24. Vial, S. et al. Assembly of Tim9 and Tim10 into a functional chaperone. J. Biol. Chem. 277, 36100–36108 (2002).

25. Sevier, C.S. & Kaiser, C.A. Conservation and diversity of the cellular disulfide bond formation pathways. Antioxid. Redox Signal. 8, 797–811 (2006).

26. Tu, B.P. & Weissman, J.S. Oxidative protein folding in eukaryotes: mechanisms and consequences. J. Cell Biol. 164, 341–346 (2004).

27. Wilkinson, B. & Gilbert, H.F. Protein disulfide isomerase. Biochim. Biophys. Acta 1699, 35–44 (2004).

28. Milenkovic, D. et al. Biogenesis of the essential Tim9-Tim10 chaperone complex of mitochondria: site-specific recognition of cysteine residues by the intermembrane space receptor Mia40. J. Biol. Chem. 282, 22472–22480 (2007).

29. Muller, J.M., Milenkovic, D., Guiard, B., Pfanner, N. & Chacinska, A. Precursor oxidation by Mia40 and Erv1 promotes vectorial transport of proteins into the mitochondrial intermembrane space. Mol. Biol. Cell 19, 226–236 (2008).

30. Sideris, D.P. & Tokatlidis, K. Oxidative folding of small Tims is mediated by site-specific docking onto Mia40 in the mitochondrial intermembrane space. Mol. Microbiol. 65, 1360–1373 (2007).

31. Webb, C.T., Gorman, M.A., Lazarou, M., Ryan, M.T. & Gulbis, J.M. Crystal structure of the mitochondrial chaperone TIM9.10 reveals a six-bladed α-propeller. Mol. Cell 21, 123–133 (2006).

32. Sharma, D. & Rajarathnam, K. 13C NMR chemical shifts can predict disulfide bond formation. J. Biomol. NMR 18, 165–171 (2000).

33. Grumbt, B., Stroobant, V., Terziyska, N., Israel, L. & Hell, K. Functional characterization of Mia40p, the central component of the disulfide relay system of the mitochondrial intermembrane space. J. Biol. Chem. 282, 37461–37470 (2007).

34. Wishart, D.S. & Sykes, B.D. The 13C chemical shift index: a simple method for the identification of protein secondary structure using 13C chemical shift data. J. Biomol. NMR 4, 171–180 (1994).

35. Dominguez, C., Boelens, R. & Bonvin, A.M. HADDOCK: a protein-protein docking approach based on biochemical or biophysical information. J. Am. Chem. Soc. 125, 1731–1737 (2003).

36. Banci, L. et al. A structural-dynamical characterization of human Cox17. J. Biol. Chem. 283, 7912–7920 (2008).

37. Kadokura, H., Tian, H., Zander, T., Bardwell, J.C. & Beckwith, J. Snapshots of DsbA in action: detection of proteins in the process of oxidative folding. Science 303, 534–537 (2004).

38. Qin, J., Clore, G.M., Kennedy, W.P., Kuszewski, J. & Gronenborn, A.M. The solution structure of human thioredoxin complexed with its target from Ref-1 reveals peptide chain reversal. Structure 4, 613–620 (1996).

39. Keller, R. The Computer Aided Resonance Assignment Tutorial (Cantina, Goldau, 2004).

40. Guntert, P. Automated NMR structure calculation with CYANA. Methods Mol. Biol. 278, 353–378 (2004).

41. Herrmann, T., Guntert, P. & Wuthrich, K. Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J. Mol. Biol. 319, 209–227 (2002).

42. Herrmann, T., Guntert, P. & Wuthrich, K. Protein NMR structure determination with automated NOE-identification in the NOESY spectra using the new software ATNOS. J. Biomol. NMR 24, 171–189 (2002).

43. Eghbalnia, H.R., Wang, L., Bahrani, A., Assadi, A. & Markley, J.L. Protein energetic conformational analysis from NMR chemical shifts (PECAN) and its use in determining secondary structural elements. J. Biomol. NMR 32, 71–81 (2005).

44. Case, D.A. et al. AMBER 8.0 (San Francisco, CA, University of California, 2004).

45. Laskowski, R.A., Rullmann, J.A.C., MacArthur, M.W., Kaptein, R. & Thornton, J.M. AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR. J. Biomol. NMR 8, 477–486 (1996).

46. Vriend, G. WHAT IF: a molecular modeling and drug design program. J. Mol. Graph. 8, 52–56 (1990).

47. Farrow, N.A. et al. Backbone dynamics of a free and phosphopeptide-complexed Src homology 2 domain studied by 15N NMR relaxation. Biochemistry 33, 5984–6003 (1994).

48. Grzesiek, S. & Bax, A. The importance of not saturating H2O in protein NMR. Application to sensitivity enhancement and NOE measurements. J. Am. Chem. Soc. 115, 12593–12594 (1993).

49. Schagger, H. & Von Jagow, G. Blue native electrophoresis for isolation of membrane protein complexes in enzymatically active form. Anal. Biochem. 199, 223–231 (1991).

50. Pettersen, E.F. et al. UCSF Chimera--a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).

51. Banci, L. et al. Human Sco1 functional studies and pathological implications of the P174L mutant. Proc. Natl. Acad. Sci. USA 104, 15–20 (2007).


RDE-1 slicer activity is required only for passenger-strand cleavage during RNAi in Caenorhabditis elegans

Florian A Steiner1, Kristy L Okihara1, Suzanne W Hoogstrate1, Titia Sijen1–3 & Rene F Ketting1,3

RNA interference (RNAi) is a process in which double-stranded RNA is cleaved into small interfering RNAs (siRNAs) that induce the destruction of homologous single-stranded mRNAs. Argonaute proteins are essential components of this silencing process; they bind siRNAs directly and can cleave RNA targets using a conserved RNase H motif. In Caenorhabditis elegans, the Argonaute protein RDE-1 has a central role in RNAi. In animals lacking RDE-1, the introduction of double-stranded RNA does not trigger any detectable level of RNAi. Here we show that RNase H activity of RDE-1 is required only for efficient removal of the passenger strand of the siRNA duplex and not for triggering the silencing response at the target-mRNA level. These results uncouple the role of the RDE-1 RNase H activity in small RNA maturation from its role in target-mRNA silencing in vivo.

In the process of RNAi, long double-stranded RNA (dsRNA) induces the destruction of homologous single-stranded mRNAs1. The long dsRNA is processed into small RNA duplexes by a Dicer protein family member2,3. The resulting siRNAs are subsequently bound by Argonaute proteins, which are essential components of most silencing processes involving small RNAs4,5. In RNAi, Argonaute proteins not only bind siRNAs directly but can also cleave target RNA6–11. The latter is achieved through RNase H, or ‘slicer’ activity within the PIWI domain, which is a common feature of Argonaute proteins7,12–18. Three conserved amino acids, two aspartates and a histidine, have been identified as catalytic residues of the RNase H activity and form the so-called DDH motif10,19. RNase H–mediated target cleavage is essential for RNAi in mammals and flies, and mutation of any residue within the RNase H catalytic triad leads to the loss of cleavage activity and silencing7,17,20,21. In many organisms, the RNase H activity also has a function in siRNA maturation. Following processing by Dicer, the siRNA is loaded in duplex form into the Argonaute protein. The passenger strand has to be subsequently removed to make the guide strand accessible for the target. The RNase H motif of Argonaute proteins has been shown to be important for passenger-strand cleavage, a process required for removal of this strand18,21–27. In cases where RNase H–mediated cleavage is prevented by mismatches within the duplex, as is true for many microRNAs (miRNAs), or upon amino acid substitutions in the catalytic triad of the DDH motif of the Argonaute protein, an unclear mechanism removes the passenger strand25.

In C. elegans, several members of the Argonaute protein family are required to achieve efficient RNAi. RDE-1 binds primary siRNAs derived from the double-stranded trigger RNA. This primary Argonaute complex triggers an amplification machinery containing an RNA-directed RNA polymerase to generate secondary siRNAs28–30. The secondary siRNAs are bound by a set of redundant secondary Argonaute proteins (SAGOs), which are also involved in target degradation31. Of the Argonaute proteins implicated in RNAi, only RDE-1 and CSR-1 seem to have a catalytically active RNase H domain. Many other Argonaute proteins in C. elegans also carry the DDH motif, implying that other pathways require slicer activity as well31 (Supplementary Fig. 1 online).

In vitro analysis demonstrated that siRNA-mediated mRNA-cleavage activity in C. elegans extracts is mediated mainly by CSR-1 and not by RDE-1 (ref. 32). We therefore set out to analyze the role of the DDH motif in RDE-1, using an in vivo approach in which we complemented an rde-1–defective strain with wild-type and mutant versions of rde-1. We find that RDE-1 with mutations in the DDH motif is defective in passenger-strand turnover. These defects can be bypassed by providing miRNA-like RNAi triggers. The integrity of the catalytic triad becomes totally dispensable for effective RNAi in these cases. The functional DDH motif in RDE-1 is thus only required for siRNA maturation, but not for target-mRNA cleavage.

RESULTS

DDH motif mutants are only partially RNAi deficient

In RDE-1, the C. elegans Argonaute protein that binds primary siRNAs, the residues of the DDH motif are present at conserved positions, suggesting that the protein carries RNase H activity (Supplementary Fig. 1). To analyze the importance of an intact DDH motif in RDE-1 for RNAi, we replaced the conserved residues Asp718, Asp801 and His974 with alanines in a hemagglutinin epitope (HA)-tagged version of RDE-1. The resulting mutant DDH motifs are abbreviated DDA (H974A), DAH (D801A), ADH (D718A) and AAA (D718A, D801A, H974A). We tested the ability of the wild-type and

Received 21 January 2008; accepted 2 December 2008; published online 18 January 2009; doi:10.1038/nsmb.1541

1Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences & University Medical Centre Utrecht, Uppsalalaan 8, 3584 CT, Utrecht, The Netherlands. 2Present address: Netherlands Forensic Institute, P.O. Box 24044, 2490 AA, The Hague, The Netherlands. 3These authors contributed equally to this work. Correspondence should be addressed to R.F.K. ([email protected]).

NATURE STRUCTURAL & MOLECULAR BIOLOGY VOLUME 16 NUMBER 2 FEBRUARY 2009 2 0 7


mutant versions to establish RNAi in vivo in a strain carrying a nonsense mutation in the endogenous rde-1 gene and expressing a single primary siRNA (Fig. 1a). This single siRNA, termed 22siRNA, targets the mRNA of the endogenous unc-22 gene and induces a twitching phenotype in an rde-1–dependent manner30.
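The motif abbreviations used for the RDE-1 mutants (DDA, DAH, ADH, AAA) follow directly from which of the three catalytic residues is replaced by alanine. The following is a minimal sketch of that naming rule; the helper `motif_label` and the constant `DDH_POSITIONS` are hypothetical names introduced here for illustration and are not part of the study.

```python
# Catalytic DDH residues of RDE-1 and their positions, as stated in the text.
DDH_POSITIONS = (("D", 718), ("D", 801), ("H", 974))


def motif_label(substitutions):
    """Return the three-letter motif abbreviation for a collection of
    alanine substitutions written like 'D718A' (hypothetical helper)."""
    mutated = set(substitutions)
    label = ""
    for residue, position in DDH_POSITIONS:
        # A mutated position is written as 'A'; an intact one keeps its letter.
        label += "A" if f"{residue}{position}A" in mutated else residue
    return label


if __name__ == "__main__":
    for subs in (["H974A"], ["D801A"], ["D718A"], ["D718A", "D801A", "H974A"]):
        print(subs, "->", motif_label(subs))
```

Reading the label left to right gives the state of Asp718, Asp801 and His974 in that order, so ADH denotes the D718A single mutant.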

No phenotype was observed in the absence of a rescuing version of RDE-1, proving that RDE-1 is absolutely required for RNAi, and its functions cannot be replaced by other Argonaute proteins (Fig. 1b). Wild-type HA–RDE-1 readily rescues the rde-1 stop mutation, leading to unc-22 silencing and a severe twitching phenotype. Mutation of the DDH residues within HA–RDE-1 results in a decrease in silencing, ranging from a mild decrease in the case of HA–RDE-1 DAH to a complete loss of silencing in the cases of HA–RDE-1 ADH and HA–RDE-1 AAA (Fig. 1b). In human AGO2, Drosophila melanogaster AGO2, Neurospora crassa QDE-2 and Schizosaccharomyces pombe Ago1, single amino acid changes of any residue within the RNase H motif led to a complete loss of target-cleavage activity and silencing7,16,18,19,23. In contrast to these findings, we show that HA–RDE-1 DAH and HA–RDE-1 DDA mutants are still RNAi proficient, although at a lower level than wild-type HA–RDE-1 (Fig. 1b). This implies that the RDE-1 DDH motif is not required for target silencing once RDE-1 has been loaded with a single-stranded siRNA.

Passenger-strand turnover is deficient in DDH motif mutants

To test whether the mutations in the RDE-1 DDH motif cause deficiencies in passenger-strand turnover, we replaced the fully matching 22siRNA duplex with miRNA-like siRNA duplexes containing a 1-nucleotide (nt) or 3-nt mismatch in the passenger strand (ps mm10, ps mm9-11, ps mm14 and ps mm17; Fig. 1a). The passenger strands of these siRNAs cannot be cleaved by RDE-1 because of the mismatch but should instead be removed by an alternative mechanism, possibly analogous to the removal of miRNA* strands from miRNA duplexes25. The mismatched siRNA duplexes restored silencing by HA–RDE-1 ADH, some up to wild-type RDE-1 levels, suggesting that the prime defect of RDE-1 ADH lies in passenger-strand removal (Fig. 1b,c).

Consistent with the phenotypic observation of the defects in passenger-strand turnover, we detected both strands of the 22siRNA in lines expressing the HA–RDE-1 DDH mutants by northern blotting (Fig. 2a). In HA–RDE-1 wild type–expressing nematodes, and when the 22siRNA was replaced with an siRNA containing mismatches in the duplex (ps mm10, ps mm9-11), only the guide strand was detected, a result that is in accordance with the observation of active silencing (Fig. 2a). Note that, although the ps mm10 construct produces low levels of mature siRNA that are hard to detect in total-RNA preparations, small RNAs isolated from HA–RDE-1 clearly contain the mature guide strand derived from this construct. The reason behind this low expression may be either lower transcription rates from the transgene or lower stability of the precursor. Quantification experiments showed that the passenger strand is downregulated by a factor of more than 100 compared to the guide strand in these lines (Supplementary Fig. 2 online).

In RDE-1 AAA mutant nematodes, both strands of the 22siRNA cofractionate with the HA–RDE-1 protein in FPLC analyses (Supplementary Fig. 3 online). To show that RDE-1 DDH mutants indeed bind the siRNA duplex, we immunoprecipitated the hemagglutinin-tagged versions of RDE-1 and detected both strands of the 22siRNA by denaturing and native northern blotting (Fig. 2b,c and Supplementary Fig. 4 online).

[Figure 1 graphic. Panel a: transgenic siRNA duplexes 22siRNA, ps mm10, ps mm14, ps mm17 and ps mm9-11, each drawn 5′ to 3′. Panel b: bar chart of % with phenotype (0–100) for each combination of HA–RDE-1 version and transgenic siRNA (None; 22siRNA, None; ps mm10, None; ps mm9–11, Wild type; none, Wild type; 22siRNA, AAA; 22siRNA, DDA; 22siRNA, DAH; 22siRNA, ADH; 22siRNA, ADH; ps mm10, ADH; ps mm9–11), scored by twitching strength (paralysed, strong, weak, no). Panel c: % with phenotype for independent lines: wt; 22si lines 1–3 (n = 76, 56, 66); ADH; ps mm10 lines 1–3 (n = 70, 119, 97); ADH; ps mm14 lines 1–3 (n = 243, 175, 69); ADH; ps mm17 lines 1–2 (n = 181, 168).]

Figure 1 Mismatched siRNA duplexes bypass mutations in the RDE-1 DDH motif. (a) Structure and sequence of the transgenic small RNA duplexes used in this study. The guide strand is shown in bold. ps mm14 and ps mm17 constructs induce mismatches across the seed sequence of the mature siRNA. (b) RDE-1 DDH mutants show various degrees of reduced silencing. The phenotype of 100 adult nematodes from each transgenic line was assessed in three independent trials, and a score was given for penetrance. Phenotypical categories are paralyzed (strongest, black), strong twitching (dark gray), weak twitching (light gray) and nontwitching (white). All lines contain a stop mutation in the endogenous rde-1 gene. The guide strand in the cartoon representation of the siRNA duplex is in bold. Error bars indicate s.d. Arithmetic mean and s.d. of each category are also given in Supplementary Table 1 online. (c) ps mm14 and ps mm17 transgenic lines induce unc-22 RNAi. The efficiency of these constructs is, however, less than the ps mm10 and ps mm9-11 constructs. Several independent transgenic lines were tested, as indicated with numbers 1–3. Details are the same as in Figure 1b.


Only the guide strand was found to be bound to wild-type HA–RDE-1 (Fig. 2b,c, lane 2, and Supplementary Fig. 4). Both strands of the 22siRNA duplex co-immunoprecipitated with the DDH mutant versions and were mostly present in duplex form on nondenaturing gels (Fig. 2b,c and Supplementary Fig. 4). The siRNA duplexes detected were present in vivo and not formed during RNA preparation (Supplementary Fig. 5 online). When the 22siRNA was replaced with an siRNA containing mismatches in the duplex, only the guide strand was bound to HA–RDE-1 ADH (Fig. 2b,c and Supplementary Fig. 4). The levels of mismatched siRNAs in the immunoprecipitation were lower than the 22siRNA because small RNAs expressed from mismatched precursors feed mainly into the miRNA pathway, and only minor amounts are bound by RDE-1 (ref. 33). Placing mismatches away from the center has been shown to have little effect on loading preference into different pathways33, in contrast to what is observed in D. melanogaster34. The populations of siRNAs derived from mismatched precursors (ps mm10, ps mm9-11) that are bound by ALG-1 and ALG-2 have, however, no effect on RNAi, as these lines are completely RNAi deficient in an rde-1 mutant background (Fig. 1b).

Thus, the defects in passenger-strand turnover in the RDE-1 DDH mutants can be bypassed with siRNA duplexes containing mismatches, as the passenger strand of these duplexes is removed by an RNase H–independent mechanism.

RDE-1–target interaction is impaired in DDH motif mutants

To assess whether RDE-1 DDH mutant–bound siRNAs are single stranded and capable of interacting with single-stranded targets, we performed Argonaute protein–capture assays with the hemagglutinin-tagged versions of RDE-1 (Fig. 3). Nematode extracts were incubated with a biotinylated target complementary to the siRNAs, and HA–RDE-1-target complexes were pulled down. Wild-type HA–RDE-1 bound efficiently to the target and precipitated readily using this method (Fig. 3). In contrast, precipitation of the HA–RDE-1 DDH mutants was prevented, as we detected relatively little Argonaute protein compared to the input (Fig. 3). Target recognition was somewhat restored when HA–RDE-1 ADH was provided with mismatched siRNA duplexes (Fig. 3). These results support the finding that the silencing defects in the RDE-1 DDH mutants are caused by passenger strand–turnover defects that interfere with guide-strand accessibility for the target.

DISCUSSION

RDE-1 slices the passenger strand

In fly and mammalian RNAi, target cleavage by the (primary) Argonaute protein is sufficient for silencing7,17. Here we have shown that the slicer function of RDE-1 is used for passenger-strand turnover and is not required for target-mRNA cleavage. These results were obtained by expressing wild-type or mutant versions of the RDE-1 protein in an otherwise rde-1 mutant background. This background is totally RNAi defective in the absence of rescuing RDE-1 protein, ensuring that all the phenotypic effects observed can be attributed to the re-introduced versions of RDE-1. As a consequence, expression levels of RDE-1 differ in the various lines obtained. However, we have analyzed at least two independent lines per construct and have not observed a correlation between the expression level of RDE-1 and its rescuing activity (F.A.S., R.F.K. and T.S., unpublished results). This indicates that the required amount of RDE-1 to obtain rescuing activity is relatively low, at least below that obtained in our lowest-expressing lines.

Together with previously published findings30,33,35,36, our results show that the main role of RDE-1 in RNAi is three-fold: (i) recognition of potential siRNAs based on base-pairing properties of precursor

[Figure 2 and Figure 3 graphics. Lanes in all panels: 22siRNA; Wild type; Wild type; 22siRNA; AAA; 22siRNA; DDA; 22siRNA; DAH; 22siRNA; ADH; 22siRNA; ADH; ps mm10; ADH; ps mm9–11 (lanes 0–8), plus an empty-beads lane (E) in Figure 2b,c. Figure 2a rows: HA–RDE-1 (WB), guide (NB), passenger (NB), 5S rRNA (NB), tubulin (WB). Figure 2b rows: HA–RDE-1 (WB), guide (NB), passenger (NB). Figure 2c rows: HA–RDE-1 (WB), guide and passenger (NNB), each resolved into ds and ss species. Figure 3 rows: input, anti-guide, anti-GL3, tubulin.]

Figure 3 Passenger-strand turnover is required for target recognition. HA–RDE-1–small RNA complexes were captured with biotinylated targets complementary to the small RNAs and pulled down using streptavidin beads. HA–RDE-1 wild-type and DDH mutant versions were detected by western blotting. Capture oligonucleotides are antisense to and fully matching an unrelated luciferase GL3 sequence or the 22siRNA. α-tubulin was detected by western blotting and used as loading control.

Figure 2 In RDE-1 DDH mutants, the passenger strand accumulates and remains associated with the guide strand and RDE-1. (a) Hemagglutinin (HA)-tagged wild-type and mutant versions of RDE-1 (HA–RDE-1) and guide and passenger strands of transgenic small RNAs (22siRNA, ps mm10, ps mm9-11) were detected in total nematode extract or RNA by western blotting (WB) or northern blotting (NB), respectively. 5S ribosomal RNA and α-tubulin were detected by NB and WB, respectively, and used as loading controls. (b,c) HA–RDE-1 wild-type and mutant versions were immunoprecipitated and detected by western blotting. RNA was isolated from immunoprecipitated protein complexes and siRNA guide and passenger strands were detected by denaturing NB (b) or native NB (NNB, c). Empty beads were used as a control (lane E).


RNAs; (ii) siRNA maturation; and (iii) flagging mRNAs for recognition by an RNA-dependent RNA polymerase (RdRP) complex. Under this scenario, efficient target-mRNA cleavage might even interfere with RdRP initiation and secondary siRNA generation, and there may therefore be evolutionary pressure against efficient RDE-1 slicer activity towards its targets. The fact that we have previously cloned secondary siRNAs that span the presumed cleavage site on 22siRNA-targeted unc-22 mRNA, which is only possible if secondary siRNAs are generated on noncleaved templates, supports this hypothesis30. In addition, target slicer activity in total C. elegans extracts is mainly attributed to an Argonaute protein other than RDE-1 (ref. 32); however, as RDE-1 preferentially loads with perfectly matching siRNA duplexes33, RNase H activity remains an essential feature for RDE-1 to eliminate the passenger strand so that the guide strand becomes available for target-mRNA selection. To more directly study this reaction, an in vitro system will need to be developed. Unfortunately, initial attempts at this in the context of the present study have not been successful. Notably, a similar explanation for the retention of the inefficient catalytic activity of the fly AGO1 protein has been proposed37, suggesting that an exclusive requirement of the RNase H domain for passenger-strand removal may be a common feature among Argonaute proteins.

RDE-1 loading through mismatched duplexes

In our experiments, we bypassed catalytically inactive RDE-1 protein by introducing mismatched siRNA duplexes into cells. As differences in the efficiency between these various constructs may be introduced at any step between the in vivo transcription of the transgene through to eventual mRNA degradation, it is hard to derive strong conclusions about the mechanism of passenger-strand removal in RDE-1 specifically. However, we do observe a trend in which mismatches introduced in the central region of the siRNA duplex are more effective in triggering RNAi than those introduced near the ends of the duplex in the context of our rde-1 mutants. This is true also for constructs in which the pairing of the seed sequence of the mature siRNA is impaired. siRNAs containing 3-nt mismatches do not induce RNAi at all (not shown), and 1-nt mismatches at the seed are less effective than those in the center. This could reflect a strong preference of RDE-1 to bind perfectly base-paired siRNA duplexes, and the seed region especially could be an important factor in that selection process. Hence, although passenger-strand removal should in theory be more effective when the seed is mispaired, the net effect in the case of RDE-1 would be a loss of duplex binding leading to a less effective RNAi response.

RNA degradation during RNAi in C. elegans

The finding that RDE-1 is a limited slicer and has no role in target-RNA turnover leaves open the question of how target RNA is degraded. In flies and mammals, the strong slicer activity in the equivalent Ago2 is sufficient to downregulate target-RNA levels. Caenorhabditis elegans, however, requires an amplification machinery for efficient RNAi. The secondary siRNAs generated in this amplification step are bound by the secondary Argonaute proteins and can silence in trans to degrade mRNA molecules that are not complementary to primary siRNAs28–31. Both the secondary Argonaute proteins and the production of secondary siRNAs are required for silencing, implying that target degradation occurs mainly via secondary Argonaute protein complexes29,31,32. Notably, many secondary Argonaute proteins lack the conserved DDH motif, suggesting that a significant fraction of the secondary siRNAs may silence their targets by a mechanism other than RNase H–mediated cleavage31. The exact mechanism by which these cleavage-incompetent secondary Argonaute complexes induce target degradation remains to be investigated.

METHODS

Plasmids, nematode strains and transgenic lines. The alleles used in this study were rde-1(pk3301), 22siRNA(pkIs2289)30, ps mm10(pkIs2450), ps mm9-11(pkIs2446)32 and unc-119 (ed3). Mutations in the DDH motif of HA–RDE-1 were introduced into the vector pHIT-1 (ref. 34) using the QuickChange site-directed mutagenesis kit (Stratagene). Transgenic small RNAs (22siRNA, ps mm10, ps mm9-11) have been described30,32. We generated transgenic lines carrying an HA–RDE-1 mutant and one of the transgenic small RNAs using microinjection or standard ballistic transformation. Newly generated alleles for this study were rde-1–HA(pkIs2449), rde-1–HA D718A D801A H974A;22siRNA(pkIs2461), rde-1–HA H974A;22siRNA(pkIs2424), rde-1–HA D801A;22siRNA(pkIs2421), rde-1–HA D718A;22siRNA(pkIs2460), rde-1–HA D718A;ps mm10(pkIs2464), rde-1–HA D718A;ps mm9-11 (pkIs2458). The small RNAs from the various constructs target nucleotides 11925–11946 of the unc-22 spliced sequence. Nematodes were cultured according to standard procedures, and the unc-22 twitching phenotype was determined by eye.

Protein assays. FPLC experiments, capture assays, RNA immunoprecipitations and western blotting were performed as described32. The 2′-O-methyl oligonucleotide sequences used in the capture assays were 5′-UUUC-X-AUCAC-3′, X being the sequence antisense to 22siRNA, ps mm10 and ps mm9-11, or the luciferase sequence 5′-UCGAAGUACUCAGCGUAAGUU-3′. α-tubulin was detected using an anti–α-tubulin antibody (Sigma).

RNA analysis. RNA from nematode extracts, FPLC fractions and RNA immunoprecipitations was isolated using Trizol LS (Invitrogen) according to the manufacturer's protocol. No trap oligonucleotide was added during the isolation of siRNA duplexes. To distinguish between duplexes formed in vivo and in vitro, siRNA duplexes were heat denatured and re-isolated using Trizol LS (Supplementary Fig. 4).

We carried out primer-extension reactions and denaturing northern blot analyses as described32. Specific DNA 18-mer sequences used for primer extensions are 5′-TGCATTTGTCACTGGAAC-3′ for the guide strand and 5′-GGTGAGGTTCCAGTGAC-3′ for the passenger strand of 22siRNA. For nondenaturing northern blot analyses, RNA was separated on 20% (w/v) polyacrylamide Tris-borate EDTA gels at 4 °C. After running, the gels were heated to 80 °C for 15 min followed by a brief incubation on ice and standard blotting procedures. Transgenic small RNAs and 5S ribosomal RNA were detected using DNA probes.
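Because the two primer-extension oligonucleotides are directed against opposite strands of the same siRNA duplex, they would be expected to be partly reverse complementary to each other. The following is a small sketch that checks this for the two primer sequences exactly as printed above; the function names are hypothetical and the check is an illustration, not part of the published protocol.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_suffix_prefix_overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


# Primer sequences copied from the Methods text, 5' to 3'.
GUIDE_PRIMER = "TGCATTTGTCACTGGAAC"
PASSENGER_PRIMER = "GGTGAGGTTCCAGTGAC"

overlap = longest_suffix_prefix_overlap(
    GUIDE_PRIMER, reverse_complement(PASSENGER_PRIMER)
)

if __name__ == "__main__":
    print("reverse complement of passenger primer:",
          reverse_complement(PASSENGER_PRIMER))
    print("complementary stretch shared by the primers (nt):", overlap)
```

With the sequences as printed, the 3′ end of the guide-strand primer pairs with the 3′ end of the passenger-strand primer over 11 nt, which is consistent with the two oligonucleotides annealing to opposite strands of one duplex.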

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS

We thank C. Mello (University of Massachusetts Medical School) for providing strains and B. Ason, T. Sixma and M. Buhler for help and discussions. The work was supported by a VIDI fellowship from the Dutch Scientific Organization (NWO) to T.S. and the Sixth Framework Programme of the European Commission through the SIROCCO Integrated Project to R.F.K.

AUTHOR CONTRIBUTIONS

F.A.S., T.S. and R.F.K. designed the experiments; F.A.S., K.L.O., S.W.H. and T.S. performed the experiments; F.A.S. and R.F.K. wrote the paper.

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Fire, A. et al. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391, 806–811 (1998).

2. Bernstein, E., Caudy, A.A., Hammond, S.M. & Hannon, G.J. Role for a bidentate ribonuclease in the initiation step of RNA interference. Nature 409, 363–366 (2001).

3. Hutvagner, G. et al. A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 293, 834–838 (2001).

4. Hutvagner, G. & Simard, M.J. Argonaute proteins: key players in RNA silencing. Nat. Rev. Mol. Cell Biol. 9, 22–32 (2008).


5. Peters, L. & Meister, G. Argonaute proteins: mediators of RNA silencing. Mol. Cell 26, 611–623 (2007).

6. Lingel, A., Simon, B., Izaurralde, E. & Sattler, M. Structure and nucleic-acid binding of the Drosophila Argonaute 2 PAZ domain. Nature 426, 465–469 (2003).

7. Liu, J. et al. Argonaute2 is the catalytic engine of mammalian RNAi. Science 305, 1437–1441 (2004).

8. Ma, J.B., Ye, K. & Patel, D.J. Structural basis for overhang-specific small interfering RNA recognition by the PAZ domain. Nature 429, 318–322 (2004).

9. Song, J.J. et al. The crystal structure of the Argonaute2 PAZ domain reveals an RNA binding motif in RNAi effector complexes. Nat. Struct. Biol. 10, 1026–1032 (2003).

10. Song, J.J., Smith, S.K., Hannon, G.J. & Joshua-Tor, L. Crystal structure of Argonaute and its implications for RISC slicer activity. Science 305, 1434–1437 (2004).

11. Yan, K.S. et al. Structure and conserved RNA binding of the PAZ domain. Nature 426, 468–474 (2003).

12. Baumberger, N. & Baulcombe, D.C. Arabidopsis ARGONAUTE1 is an RNA Slicer that selectively recruits microRNAs and short interfering RNAs. Proc. Natl. Acad. Sci. USA 102, 11928–11933 (2005).

13. Gunawardane, L.S. et al. A Slicer-mediated mechanism for repeat-associated siRNA 5′ end formation in Drosophila. Science 315, 1587–1590 (2007).

14. Lau, N.C. et al. Characterization of the piRNA complex from rat testes. Science 313, 363–367 (2006).

15. Saito, K. et al. Specific association of Piwi with rasiRNAs derived from retrotransposon and heterochromatic regions in the Drosophila genome. Genes Dev. 20, 2214–2222 (2006).

16. Irvine, D.V. et al. Argonaute slicing is required for heterochromatic silencing and spreading. Science 313, 1134–1137 (2006).

17. Rand, T.A., Ginalski, K., Grishin, N.V. & Wang, X. Biochemical identification of Argonaute 2 as the sole protein required for RNA-induced silencing complex activity. Proc. Natl. Acad. Sci. USA 101, 14385–14389 (2004).

18. Miyoshi, K., Tsukumo, H., Nagami, T., Siomi, H. & Siomi, M.C. Slicer function of Drosophila Argonautes and its involvement in RISC formation. Genes Dev. 19, 2837–2848 (2005).

19. Rivas, F.V. et al. Purified Argonaute2 and an siRNA form recombinant human RISC. Nat. Struct. Mol. Biol. 12, 340–349 (2005).

20. Martinez, J., Patkaniowska, A., Urlaub, H., Luhrmann, R. & Tuschl, T. Single-stranded antisense siRNAs guide target RNA cleavage in RNAi. Cell 110, 563–574 (2002).

21. Okamura, K., Ishizuka, A., Siomi, H. & Siomi, M.C. Distinct roles for Argonaute proteins in small RNA-directed RNA cleavage pathways. Genes Dev. 18, 1655–1666 (2004).

22. Buker, S.M. et al. Two different Argonaute complexes are required for siRNA generation and heterochromatin assembly in fission yeast. Nat. Struct. Mol. Biol. 14, 200–207 (2007).

23. Maiti, M., Lee, H.C. & Liu, Y. QIP, a putative exonuclease, interacts with the Neurospora Argonaute protein and facilitates conversion of duplex siRNA into single strands. Genes Dev. 21, 590–600 (2007).

24. Leuschner, P.J., Ameres, S.L., Kueng, S. & Martinez, J. Cleavage of the siRNA passenger strand during RISC assembly in human cells. EMBO Rep. 7, 314–320 (2006).

25. Matranga, C., Tomari, Y., Shin, C., Bartel, D.P. & Zamore, P.D. Passenger-strand cleavage facilitates assembly of siRNA into Ago2-containing RNAi enzyme complexes. Cell 123, 607–620 (2005).

26. Rand, T.A., Petersen, S., Du, F. & Wang, X. Argonaute2 cleaves the anti-guide strand of siRNA during RISC activation. Cell 123, 621–629 (2005).

27. Kim, K., Lee, Y.S. & Carthew, R.W. Conversion of pre-RISC to holo-RISC by Ago2 during assembly of RNAi complexes. RNA 13, 22–29 (2007).

28. Pak, J. & Fire, A. Distinct populations of primary and secondary effectors during RNAi in C. elegans. Science 315, 241–244 (2007).

29. Sijen, T. et al. On the role of RNA amplification in dsRNA-triggered gene silencing. Cell 107, 465–476 (2001).

30. Sijen, T., Steiner, F.A., Thijssen, K.L. & Plasterk, R.H. Secondary siRNAs result from unprimed RNA synthesis and form a distinct class. Science 315, 244–247 (2007).

31. Yigit, E. et al. Analysis of the C. elegans Argonaute family reveals that distinct Argonautes act sequentially during RNAi. Cell 127, 747–757 (2006).

32. Aoki, K., Moriguchi, H., Yoshioka, T., Okawa, K. & Tabara, H. In vitro analyses of the production and activity of secondary small interfering RNAs in C. elegans. EMBO J. 26, 5007–5019 (2007).

33. Steiner, F.A. et al. Structural features of small RNA precursors determine Argonaute loading in Caenorhabditis elegans. Nat. Struct. Mol. Biol. 14, 927–933 (2007).

34. Tomari, Y., Du, T. & Zamore, P.D. Sorting of Drosophila small RNA silencing RNAs. Cell 130, 299–308 (2007).

35. Tabara, H. et al. The rde-1 gene, RNA interference, and transposon silencing in C. elegans. Cell 99, 123–132 (1999).

36. Tabara, H., Yigit, E., Siomi, H. & Mello, C.C. The dsRNA binding protein RDE-4 interacts with RDE-1, DCR-1, and a DExH-box helicase to direct RNAi in C. elegans. Cell 109, 861–871 (2002).

37. Forstemann, K., Horwich, M.D., Wee, L., Tomari, Y. & Zamore, P.D. Drosophila microRNAs are sorted into functionally distinct Argonaute complexes after production by Dicer-1. Cell 130, 287–297 (2007).


Nucleic acid polymerases use a general acid for nucleotidyl transfer

Christian Castro1,4, Eric D Smidansky1,4, Jamie J Arnold1, Kenneth R Maksimchuk1, Ibrahim Moustafa1, Akira Uchida1, Matthias Gotte2, William Konigsberg3 & Craig E Cameron1

Nucleic acid polymerases catalyze the formation of DNA or RNA from nucleoside-triphosphate precursors. Amino acid residues in the active site of polymerases are thought to contribute only indirectly to catalysis by serving as ligands for the two divalent cations that are required for activity or substrate binding. Two proton-transfer reactions are necessary for polymerase-catalyzed nucleotidyl transfer: deprotonation of the 3′-hydroxyl nucleophile and protonation of the pyrophosphate leaving group. Using model enzymes representing all four classes of nucleic acid polymerases, we show that the proton donor to pyrophosphate is an active-site amino acid residue. The use of general acid catalysis by polymerases extends the mechanism of nucleotidyl transfer beyond that of the well-established two-metal-ion mechanism. The existence of an active-site residue that regulates polymerase catalysis may permit manipulation of viral polymerase replication speed and/or fidelity for virus attenuation and vaccine development.

Nucleic acid polymerases catalyze the formation of DNA or RNA from 2′-deoxyribonucleotides or ribonucleotides, respectively. Polymerases are therefore required for the reproduction, maintenance and expression of the genomes of all organisms, including viruses. Nucleotidyl transfer, the chemical reaction catalyzed by polymerases, is shown in Figure 1. Nucleophilic attack on the α-phosphorus atom of the (2′-deoxy)ribonucleoside triphosphate by the primer 3′-hydroxyl leads to formation of a phosphodiester bond and release of pyrophosphate. All polymerases require two divalent cations, usually Mg2+, for activity and use a two-metal-ion mechanism for nucleotidyl transfer1,2 (Fig. 1). Metal A lowers the pKa of the primer 3′-hydroxyl, thus facilitating deprotonation of this moiety for in-line nucleophilic attack3. Metal B orients the triphosphate for catalysis, stabilizes the negative charge that arises during formation of the pentavalent transition state and has been suggested to assist pyrophosphate release1,3. Although it is clear that deprotonation of the primer 3′-hydroxyl is required for catalysis4,5, only recently has it been suggested that protonation of the pyrophosphate leaving group occurs before its release from the enzyme4. The acceptor of the 3′-hydroxyl proton and the donor of the pyrophosphate proton are not known. It is now clear that chemistry can be at least partially rate limiting for nucleotide addition by all classes of nucleic acid polymerases4,6, as well as serving as a fidelity checkpoint6–8. Identification of the acceptor and donor for these proton-transfer reactions may inspire new mechanistic hypotheses for how catalytic efficiency can be tuned by the nature (correct versus incorrect) of the bound nucleotide.

Four classes of template-dependent nucleic acid polymerases exist: RNA-dependent RNA polymerase (RdRp); RNA/DNA-dependent DNA polymerase, the so-called reverse transcriptase (RT); DNA-dependent DNA polymerase (DdDp); and DNA-dependent RNA polymerase (DdRp). In this study, we have used the RdRp from poliovirus (PV), the RT from human immunodeficiency virus type 1 (HIV), the DdDp from bacteriophage RB69 and the DdRp from bacteriophage T7 as representatives of the four classes of polymerases because of the wealth of mechanistic and/or structural information available for these enzymes9–21. The objective of this study was to identify the proton donor to the pyrophosphate leaving group, as analysis of high-resolution structures of several polymerases poised for or undergoing catalysis implicated a basic amino acid residue rather than a water molecule as the proton donor. In addition, analysis of the pH dependence of nucleotide addition by PV RdRp was consistent with catalysis being dependent on a residue with a pKa value of 10.5 (ref. 4).
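To give a sense of what a pKa of 10.5 implies for a candidate proton donor, the fraction of a single ionizable group that is protonated at a given pH can be estimated from the Henderson–Hasselbalch relation. The sketch below assumes one independently titrating group; the function name and the pH values passed to it are illustrative choices and are not taken from the study.

```python
def fraction_protonated(pH, pKa):
    """Fraction of a single ionizable group in its protonated (acid) form,
    from the Henderson-Hasselbalch relation: 1 / (1 + 10**(pH - pKa))."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


if __name__ == "__main__":
    pKa = 10.5  # value quoted in the text for PV RdRp (ref. 4)
    for pH in (7.5, 9.5, 10.5, 11.5):  # illustrative pH values only
        print(f"pH {pH:4.1f}: fraction protonated = "
              f"{fraction_protonated(pH, pKa):.3f}")
```

Under this simple model the group is half protonated when the pH equals the pKa, and it is almost fully protonated, and therefore able to act as a general acid, at pH values several units below 10.5.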

RESULTS

A general acid in nucleic acid polymerase catalysis

Analysis of the structural model for the RdRp from Norwalk virus (NV) in complex with primed template RNA and nucleotide showed that Lys374 of conserved structural motif D was located in the vicinity of the triphosphate moiety of the incoming nucleotide22 (Supplementary Figs. 1a,b online). Structural motif D is conserved between RdRps and RTs and has no defined catalytic function23. Analysis of structure-based sequence alignments showed only one conserved

Received 12 August 2008; accepted 26 November 2008; published online 18 January 2009; doi:10.1038/nsmb.1540

1Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA. 2Department of Microbiology & Immunology, McGill University, Montreal, Quebec H3A 2B4, Canada. 3Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA. 4These authors contributed equally to this work. Correspondence should be addressed to C.E.C. ([email protected]).


lysine residue in motif D of RdRps and RTs, including the telomere RT, telomerase (TERT) (Supplementary Fig. 1c). These sequence alignments identified Lys359 of PV RdRp and Lys220 of HIV RT in conserved structural motif D as candidates for the proton donors in these systems (Supplementary Fig. 1c). Lys219 of HIV RT was ruled out as the putative general acid because of the absence of this residue in the RT from human immunodeficiency virus type 2 (HIV-2) (Supplementary Fig. 1c); it is well established that the DNA polymerase activity of HIV-2 RT is on par with that observed for HIV RT24. Unlike Lys374 of NV RdRp, Lys359 of PV RdRp and Lys220 of HIV RT were not oriented in a position to interact readily with the triphosphate moiety of the incoming nucleotide (Fig. 2).

Structural motif D is one of the most dynamic elements of the palm subdomain of RdRps and RTs, with the position of this motif varying by as much as 6 Å when structures are compared (Supplementary Fig. 1b). Solution of the structure of HIV RT in complex with primed template DNA and nucleotide required cross-linking of the enzyme to DNA, which could have influenced the orientation of motif D14. This possibility is supported by a recently published model for the TERT elongation complex that positions the motif D lysine in a position to serve as a general acid25 (Fig. 2). Therefore, only biochemical studies could be used to test the possibility that the conserved lysine in motif D functions as a general acid. Identification of candidates for the general acid in DNA-dependent DNA and RNA polymerases was more straightforward. Structural models show clearly that Lys560 in helix P of RB69 DdDp, a B family polymerase12, and Lys631 in helix O of T7 DdRp, an A family polymerase21, are positioned to serve as proton donors (Fig. 2) for these enzymes.

For each of the polymerases described above, we produced derivatives in which the candidate lysine proton donor was replaced with a leucine residue (Supplementary Methods online). We chose leucine instead of alanine to prevent water from binding to the site and serving as a proton donor (Supplementary Fig. 2 online). For all polymerases tested, the maximal rate constant for nucleotide incorporation (kpol) was reduced by 50-fold to 2,000-fold for the leucine derivative relative to the wild-type, lysine-containing enzyme (Table 1). This observation is consistent with a role for the lysine in the rate-limiting step for nucleotide incorporation. For all of the wild-type polymerases used here, chemistry is at least partially rate limiting4. With the exception of the leucine derivative of T7 DdRp, the kpol values for each polymerase derive from experiments that included a saturating nucleotide concentration (five times the apparent dissociation constant (Kd,app)) (Supplementary Methods). In the case of the leucine derivative of T7 DdRp, the highest concentration of nucleotide attainable was two times the Kd,app (Supplementary Methods). This circumstance increases the error on the kpol value measured; however, our ability to reach conclusions for the leucine derivative is not affected, because the observed 100-fold reduction in kpol value for this derivative is much greater than the error of the measurement.
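The saturation argument can be made concrete with a short calculation. The sketch below assumes the standard hyperbola kobs = kpol × [NTP] / (Kd,app + [NTP]); the exact equation used for fitting is given only in the Supplementary Methods, and the function name is illustrative rather than taken from the paper.

```python
def fraction_of_kpol(multiple_of_kd: float) -> float:
    """Fraction of kpol observed when [NTP] = multiple_of_kd * Kd,app.

    Assumes the standard hyperbola kobs = kpol * [NTP] / (Kd,app + [NTP]),
    which reduces to m / (m + 1) when [NTP] is expressed as m * Kd,app.
    """
    if multiple_of_kd < 0:
        raise ValueError("concentration multiple must be non-negative")
    return multiple_of_kd / (multiple_of_kd + 1.0)


if __name__ == "__main__":
    # Two times Kd,app (T7 DdRp leucine derivative) versus five times (all others).
    for m in (2, 5):
        print(f"[NTP] = {m} x Kd,app -> kobs is {fraction_of_kpol(m):.0%} of kpol")
```

Under this assumption, a nucleotide concentration of two times Kd,app reaches about two-thirds of kpol, whereas five times Kd,app reaches about five-sixths, so the kpol value for the T7 DdRp derivative depends more heavily on extrapolation.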

We did not observe a substantial difference in the Kd,app for the nucleotide substrate measured for the leucine derivatives of PV RdRp or HIV RT (Table 1). This observation is consistent with residues in the conserved, structural motif F of these enzymes functioning independently in triphosphate binding26. In contrast, the Kd,app value for nucleotide substrate measured for the leucine derivatives of RB69 DdDp and T7 DdRp increased 25-fold and 167-fold, respectively (Table 1). This observation is consistent with structural studies showing that these lysines interact with the nucleotide substrate18,20 (Fig. 2).

Unlike with many other polymerases, the stability of PV RdRp on its nucleic acid primer template is not affected by pH; this allows us to interpret the pH dependence of the kinetics of nucleotide incorporation4 (Supplementary Methods). Consistent with Lys359 of PV RdRp functioning as a proton donor during nucleotidyl transfer, the descending limb of the pH rate profile observed for wild-type PV RdRp4 (Fig. 3a and Supplementary Fig. 3 online) was lost in the profile for the K359L derivative (Fig. 3a and Supplementary Fig. 3). The ionization observed for the K359L derivative probably reflects deprotonation of the 3′-hydroxyl (Fig. 3a). Theoretical studies of nucleotidyl transfer by rat DNA polymerase-β have suggested that the primer 3′-hydroxyl has a pKa in the range of 8–9.5 (ref. 7). Because the pKa values measured are kinetic pKa values, the lack of equivalence in the pKa value for the K359L derivative to any measured for the wild-type enzyme probably reflects a change in the rate-determining step for nucleotidyl transfer4. Only the chemistry should be rate limiting for the K359L derivative, whereas a conformational-change step and chemistry are equally rate limiting for the wild-type enzyme10.
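The shape of such pH rate profiles can be illustrated with a small sketch. It assumes the commonly used forms for one and two ionizable groups; the authors' own equation (3) appears only in the Supplementary Methods and may differ in detail. The pKa values of 7.0 and 10.5 are those reported for wild-type PV RdRp in the Figure 3 legend; the pH-independent rate constant `k_ind` and the reuse of pKa 7.0 for the one-group curve are purely illustrative assumptions.

```python
def kpol_one_group(ph: float, k_ind: float, pka: float) -> float:
    """Ascending limb only: activity requires one group to be deprotonated."""
    return k_ind / (1.0 + 10.0 ** (pka - ph))


def kpol_two_groups(ph: float, k_ind: float, pka1: float, pka2: float) -> float:
    """Bell-shaped profile: one group (pka1) must be deprotonated and a
    second group (pka2), here the putative general acid, must be protonated."""
    return k_ind / (1.0 + 10.0 ** (pka1 - ph) + 10.0 ** (ph - pka2))


if __name__ == "__main__":
    K_IND = 1.0  # hypothetical pH-independent rate constant (relative units)
    for ph in (6.0, 7.0, 8.0, 9.0, 10.0, 11.0):
        two = kpol_two_groups(ph, K_IND, 7.0, 10.5)  # wild-type-like profile
        one = kpol_one_group(ph, K_IND, 7.0)         # descending limb removed
        print(f"pH {ph:4.1f}  two groups: {two:.3f}  one group: {one:.3f}")
```

In this sketch the two-group curve falls again at high pH while the one-group curve plateaus, which is the qualitative difference described for the wild-type enzyme and the K359L derivative.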

Loss of the general acid leads to single proton transfer

Theoretical studies of the free-energy landscape for phosphoryl transfer almost always show proton transfer in the transition state27, which would manifest experimentally as a solvent deuterium kinetic isotope effect4. The solvent deuterium kinetic isotope effect is defined as the ratio of kpol values obtained when the reaction is performed in H2O relative to values obtained when the reaction is performed in D2O28 (Supplementary Methods). The observation of a solvent deuterium kinetic isotope effect supports a proton-transfer reaction representing one of the microscopic steps reflected in the macroscopic rate constant, kpol. Each wild-type polymerase showed a solvent deuterium kinetic isotope effect of 2–5 (Table 1). This observation is consistent with chemistry contributing to the rate-limiting step(s) reported by the nucleotide-incorporation assay, as observed for the

[Figure 1: reaction schematic. Labels recovered from the image: Motif A, Motif C, Mg2+ (metals A and B), Base, Primer, 3′, Ha, Hb, α, β, γ, B?, A, Enzyme. Candidate general acids shown: PV RdRp Lys359 (motif D); HIV RT Lys220 (motif D); RB69 DdDp Lys560 (helix P); T7 DdRp Lys631 (helix O).]

Figure 1 Extending the two-metal-ion mechanism of nucleotidyl transfer to include general acid catalysis. Nucleoside triphosphate (green) enters the active site with a divalent cation (Mg2+, metal B). This metal ion is coordinated by the phosphates of the nucleotide, an aspartate residue located in structural motif A of all polymerases, and probably water molecules (indicated as oxygen ligands to metal without specific designation). Metal B orients the triphosphate in the active site and may contribute to charge neutralization during catalysis. A second divalent cation binds (Mg2+, metal A) that is coordinated by the 3′-hydroxyl of the primer terminus (cyan), the nucleotide α-phosphate and aspartate residues of structural motifs A and C. Metal A lowers the pKa of the 3′-hydroxyl, facilitating deprotonation and subsequent nucleophilic attack at physiological pH. As the transition state of nucleotidyl transfer is approached (indicated by dashed red lines), the primer 3′-hydroxyl proton, Ha, is transferred to an unidentified base (B), and we propose that the pyrophosphate leaving group is protonated (Hb) by a basic amino acid on the enzyme. The positions of the candidate general acids of the model polymerases used in this study are shown.


wild-type enzymes4. Notably, the solvent deuterium kinetic isotope effect measured for each leucine derivative was smaller than that observed for the corresponding wild-type enzyme (Table 1). This observation could be interpreted in one of two ways. First, it is possible that the extent to which chemistry limits nucleotide addition by the leucine derivatives is less than occurs for the wild-type enzymes. Second, it is possible that fewer protons are being transferred. This latter possibility would be consistent with the candidate lysine residues contributing one proton transfer during nucleotide incorporation.

To count the number of protons being transferred during the nucleotidyl transfer reaction, we performed a proton-inventory experiment4,29 (Supplementary Methods). This experiment obtains kpol values in the presence of different mole fractions of D2O (kn). The ratio of kn / k0 (k0 is kpol in H2O) is plotted as a function of the mole fraction of D2O (n). If a single proton is transferred during the reaction, then the data should fit to a line. We carried out this experiment for all leucine derivatives except the T7 DdRp derivative, which could not be saturated with nucleotide. The data obtained for all of the leucine derivatives evaluated fit well to a straight line (Fig. 3b), consistent with a single proton-transfer reaction occurring during nucleotidyl transfer for these enzymes. In contrast, the data for the corresponding wild-type enzymes failed to fall on a line defined by kn / k0 values at n = 0 and n = 1, consistent with a model in which more than one proton-transfer reaction occurs during nucleotidyl transfer (Fig. 3b). More extensive analysis of the proton-inventory experiment for the wild-type enzymes, including statistical analysis, was reported previously, convincingly demonstrating that two proton-transfer reactions occur in the transition state for nucleotidyl transfer by the wild-type enzymes used here4. These data provide compelling evidence that an active-site amino acid residue is used as a general acid to protonate the pyrophosphate leaving group and facilitate nucleotidyl transfer.

General acid catalysis in the presence of Mn2+

All of the experiments described to this point were performed in the presence of Mg2+, because this divalent cation is considered to be the biologically relevant cofactor. In the presence of Mg2+, chemistry is only partially rate limiting for PV RdRp at pH 7.5 (ref. 10). In

[Figure 2: structural panels a–g. Panel labels recovered from the image: (a) RdRp: template, primer, CTP, motif D, Lys359. (b) R/DdDp (RT): template, primer, TTP, motif D, Lys220. (c) DdDp (B family): template, primer, dTTP, helix P, Lys560. (d) DdRp (A family): template, primer, NTP, helix O, Lys631. (e) DdDp (X family): template, primer, ddCTP, helix K, Arg183. (f) DdRp (Pol II): template, primer, GTP, trigger loop, His1085. (g) RdDp (TERT): template, primer, TTP, motif D, Lys372.]

Figure 2 Interactions of NTP in the active sites of various polymerase families. In all ternary complexes, the NTP (green carbon atoms), primer and template (cyan and purple carbon atoms, respectively) and key active-site residues (gray carbon atoms) are shown as sticks; the metal ions are shown as magenta spheres. (a) The RdRp from PV (PDB 1RA6)19 with the primer and template and CTP bound to the active site in the presence of two Mn2+ ions. The primer and template, CTP and Mn2+ ions were extracted from the structural complex of Norovirus (NV) polymerase (PDB 3BS0)22 and modeled into the active site of PV RdRp. (b) HIV RT with TTP bound to the active site in the presence of two Mg2+ ions (PDB 1RTD)14. (c) The RB69 DNA polymerase with the dTTP nucleotide bound in the presence of Ca2+ ions (PDB 1IG9)12. (d) The T7 RNA polymerase with the modified nucleotide α,β-methylene-ATP bound in the presence of Mg2+ (PDB 1S76)21. (e) DNA polymerase-β with the ddCTP bound to the active site in the presence of Mg2+ ions (PDB 1BPY)50. (f) Multisubunit RNA polymerase II from yeast (PDB 2E2H)31 in complex with nucleic acids and GTP substrate in the presence of Mg2+. (g) The catalytic subunit of telomerase with TTP bound to the active site in the presence of two Mg2+ ions (PDB 3DU5)25. Primer and template, TTP and Mg2+ ions were extracted from the structural complex of HIV RT (PDB 1RTD)14 and modeled into the active site of TERT as described previously25. Residues equivalent to Lys359 of PV RdRp predicted to function as general acids are labeled. The designations of the structural elements on which the general acid is located are also indicated explicitly.


addition, the extent to which chemistry contributes to the observed rate constant for nucleotide addition changes as a function of pH4. In the presence of Mn2+, however, chemistry is the primary determinant of the rate constant for nucleotide addition by PV RdRp from pH 6.0 to pH 9.0 (ref. 4). Unfortunately, experiments cannot be performed above pH 9.0 in the presence of Mn2+ owing to precipitation of the metal hydroxide.

We evaluated the activity of the K359L derivative of PV RdRp in the presence of Mn2+ (Supplementary Fig. 4 online). The observed rate constant for nucleotide incorporation was reduced to 1 s–1, ten-fold lower than that of the wild-type enzyme, with no substantial change in the apparent dissociation constant for nucleotide (Supplementary Fig. 4a). The phosphorothioate effect observed for the K359L derivative (6 ± 1) was essentially the same as that measured for wild-type enzyme (7 ± 1), consistent with chemistry remaining as the rate-limiting step (Supplementary Fig. 4a). The solvent deuterium kinetic isotope effect observed for the K359L derivative of PV RdRp was 3 ± 1, more than two-fold lower than the value of 7 ± 1 observed for the wild-type enzyme (Supplementary Fig. 4a). These data are consistent with the loss of a proton-transfer reaction, as observed in the presence of Mg2+. As expected, the pH dependence of the rate constant for nucleotide incorporation was now essentially identical within the experimentally accessible pH range (Supplementary Fig. 4a). These data are consistent with chemistry serving as the rate-limiting step for both the K359L derivative and the wild-type enzyme in the presence of Mn2+. The reduced rate constant for nucleotide incorporation and the solvent deuterium kinetic isotope effect observed for the K359L derivative relative to the wild-type enzyme provide additional support for the role of Lys359 as a general acid. We conclude that PV RdRp, and probably the other polymerases, employ general acid catalysis, regardless of the divalent cation cofactor used.

Arginine and histidine substitute for lysine as the general acid

To test further the use of a general acid for nucleotidyl transfer, we produced a PV RdRp derivative in which Lys359 was substituted with arginine. In addition to testing our model for general acid catalysis, the ability of arginine to function at this position as a general acid would be consistent with the use of arginine as a general acid in X family polymerases such as DNA polymerase-β30 (Fig. 2). A derivative containing an arginine at this position would be expected to function better than the K359L derivative but still worse than the wild-type enzyme, would have unique pH dependence relative to the wild-type enzyme and would show two proton-transfer reactions during nucleotidyl transfer. At pH 7.5, the K359R derivative bound nucleotide well and turned over ~three-fold faster than the K359L derivative and 20-fold slower than wild-type enzyme (Fig. 4a). At pH 10, the kpol value for the K359R derivative had risen to within two-fold of the maximum value observed for wild-type enzyme at pH 8.5 and continued to rise (Fig. 4b). In contrast, the value for the wild-type enzyme was falling at pH 10 (Fig. 4b). This difference can be explained by the ΔpKa value of 2 between lysine and arginine. The descending limb of the pH rate profile for the K359R

Figure 3 A conserved basic amino acid in the active site of multiple classes of polymerases participates directly in nucleotidyl transfer and functions as a general acid. (a) Values for kpol plotted as a function of pH for K359L PV RdRp and wild-type (WT) PV RdRp. Kinetic data were obtained for AMP incorporation into S/S-+1 (a symmetrical primer template substrate containing 2-aminopurine in the +1 templating position) using the stopped flow assay4 in the presence of 5 mM MgCl2 (Supplementary Methods). In the case of wild-type (WT) PV RdRp, the solid line shows the fit of the data to a model describing two ionizable groups (equation (3) in the Supplementary Methods), yielding pKa values of 7.0 ± 0.1 and 10.5 ± 0.1 (ref. 4). Error bars indicate s.d. Plots of the log(kpol) as a function of pH are provided in Supplementary Figure 3. (b) Proton-inventory plots for PV RdRp, HIV RT and RB69 DdDp, with either the leucine derivative or lysine-containing polymerase (Supplementary Methods). Values for kn / k0 were plotted as a function of n, where n is the mole fraction of D2O, kn is the observed rate constant for nucleotide incorporation in a reaction containing a given mole fraction of D2O and k0 is the observed rate constant for nucleotide incorporation in H2O. The solid lines represent the fit of the data to a one-proton-transfer model for the leucine derivative (filled squares) and to a two-proton-transfer model for the wild-type polymerase (filled circles) (Supplementary Methods). The dashed lines represent the predicted line for a one-proton-transfer model. Each data point represents the average of two or three independent experiments. Error bars indicate s.d. (<10% in all cases). Graphs for wild-type enzymes are adapted from ref. 4; more rigorous analysis of the two-proton model is reported therein.

[Figure 3 plots. (a) kpol (s–1) versus pH (5–11) for K359L and WT PV RdRp. (b) kn / k0 versus n (0–1) for WT and leucine derivatives of PV RdRp (K359L), HIV RT (K220L) and RB69 DdDp (K560L).]

Table 1 Kinetic analysis of PV RdRp, HIV-1 RT, RB69 DdDp and T7 DdRp supports general acid catalysis in nucleotidyl transfer

| Parameter measured | PV RdRp WT | PV RdRp K359L | HIV RT WT | HIV RT K220L | RB69 DdDp WT | RB69 DdDp K560L | T7 DdRp WT | T7 DdRp K631L |
|---|---|---|---|---|---|---|---|---|
| kpol (s–1) | 50 ± 5 | 1 ± 0.1 | 60 ± 5 | 0.3 ± 0.1 | 200 ± 10 | 0.10 ± 0.01 | 60 ± 5 | 0.6 ± 0.1 |
| Kd,app (µM) (a) | 200 ± 20 | 700 ± 80 | 7 ± 1 | 5 ± 2 | 40 ± 5 | 1000 ± 100 | 300 ± 30 | (5.0 ± 1.0) × 10^4 (c) |
| SDKIE (b) | 3.0 ± 0.3 | 2.5 ± 0.3 | 2.2 ± 0.4 | 1.8 ± 0.4 | 4.2 ± 0.2 | 1.8 ± 0.2 | 5.2 ± 0.5 | 2.6 ± 0.5 (c) |
| PI (d) | 2 | 1 | 2 | 1 | 2 | 1 | 2 | n.d. (e) |

(a) Kd,app is for (d)ATP (Supplementary Methods). (b) SDKIE is the solvent deuterium kinetic isotope effect28, calculated as kobs in H2O / kobs in D2O at saturating (d)ATP concentration. (c) Kd,app, kpol and SDKIE values listed for T7 K631L were obtained by using data collected with 80 mM ATP, a subsaturating concentration. (d) PI is the proton inventory29, calculated from a plot of kn / k0 as a function of n. The data were fitted to a modified Gross-Butler equation for either a two-proton-transfer model (equation (1)) or a one-proton-transfer model (equation (2)). The value reported is the proton-transfer model that best fits the data. (e) n.d., not determined.
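The fold changes quoted in the Results can be recomputed from the central values in Table 1. The sketch below is illustrative only: it ignores the reported uncertainties, the dictionary layout and function name are assumptions, and it relies on wild-type and derivative Kd,app values sharing the same unit so that the ratios are unit-free.

```python
# Central values from Table 1 as (wild type, leucine derivative).
TABLE_1 = {
    "PV RdRp":   {"kpol": (50.0, 1.0),   "kd_app": (200.0, 700.0)},
    "HIV RT":    {"kpol": (60.0, 0.3),   "kd_app": (7.0, 5.0)},
    "RB69 DdDp": {"kpol": (200.0, 0.10), "kd_app": (40.0, 1000.0)},
    "T7 DdRp":   {"kpol": (60.0, 0.6),   "kd_app": (300.0, 5.0e4)},
}


def fold_changes(table):
    """Return {enzyme: (fold decrease in kpol, fold increase in Kd,app)}."""
    result = {}
    for enzyme, values in table.items():
        wt_kpol, mut_kpol = values["kpol"]
        wt_kd, mut_kd = values["kd_app"]
        result[enzyme] = (wt_kpol / mut_kpol, mut_kd / wt_kd)
    return result


if __name__ == "__main__":
    for enzyme, (kpol_fold, kd_fold) in fold_changes(TABLE_1).items():
        print(f"{enzyme:10s} kpol down {kpol_fold:7.1f}-fold, "
              f"Kd,app changed {kd_fold:6.2f}-fold")
```

With these inputs the kpol reductions span 50-fold to 2,000-fold, and the Kd,app increases for RB69 DdDp and T7 DdRp come out at 25-fold and about 167-fold, matching the figures given in the text.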


derivative should appear at pH values greater than 11 (dashed line in Fig. 4b). Unfortunately, experiments cannot be performed at pH values higher than 10 because of the insolubility of metal hydroxide. Finally, the solvent deuterium kinetic isotope effect was 3.4 ± 0.1, essentially the same as that of the wild-type enzyme (Table 1), and proton-inventory data fit well to a two-proton model (Fig. 4c). These data strongly support the use of a general acid for nucleotidyl transfer and suggest that both lysine and arginine can serve this function.

Multisubunit RNA polymerases contain a histidine in the trigger loop that may serve as the general acid for these enzymes31 (Fig. 2). Histidine can function at this position of the PV RdRp (Fig. 4d). We observed a ten-fold reduction in the kpol value for the K359H derivative relative to the wild-type enzyme with only a modest (50%) increase in the Kd,app value for nucleotide at pH 7.5 (Fig. 4d). The pKa value for the ascending limb of the pH rate profile for the K359H derivative was identical to that of the wild-type enzyme; the descending limb was absent, consistent with a pKa value of 6–7 for histidine (Fig. 4e).

DISCUSSION

We conclude that all four classes of nucleic acid polymerases use general acid catalysis for nucleotidyl transfer. Notably, the general acid is not absolutely essential but provides a 50-fold to 2,000-fold rate enhancement, depending upon the polymerase evaluated. The value of the pKa for pyrophosphate in the polymerase active site is not known. The absence of an absolute requirement for a general acid in polymerase-catalyzed nucleotidyl transfer suggests that the value of the pKa for the tetra-anionic form of pyrophosphate is not sufficiently high to preclude it from serving as a leaving group. The differences in rate enhancement conferred by the general acid for the different polymerases may reflect the extent to which the general acid also contributes to the overall neutralization of the negative charge in the active site. RdRps and RTs have evolved an independent structural motif (F) to bind the triphosphate26. In contrast, in other polymerase families the general acid is connected directly (helix O, helix K, trigger loop) or indirectly (helix P) to the structural elements that bind the triphosphate (Fig. 2). There is no doubt that neutralization of the negative charge that forms during the transition state will facilitate nucleotidyl transfer and probably also contributes to the reduced catalytic efficiency of the leucine derivatives12,32,33.

Protonation of the pyrophosphate leaving group in the mechanism of nucleotidyl transfer has been essentially ignored until recently. Most structures of polymerases poised for or undergoing catalysis reveal an interaction between a basic amino acid of the polymerase and the β-phosphate of the nucleotide substrate (Fig. 2), and this interaction has been interpreted as a nucleotide binding determinant. However, one study proposed that the interaction between the β-phosphate of the incoming nucleotide and His1085 (trigger loop) of yeast RNA polymerase II could link substrate recognition to catalysis if this histidine served as a general acid for protonation of the pyrophosphate leaving group31. Our recent studies of nucleotidyl transfer confirmed that protonation of the pyrophosphate leaving group does occur4, thus encouraging the identification of the proton donor.

Nature has used all three basic amino acids as the general acid (Fig. 2). The residue chosen may reflect the need to achieve a balance between the rate of polymerization and the fidelity of nucleotide incorporation. Enzymes that use a lysine as a general acid (for example, PV RdRp, HIV RT, RB69 DdDp and T7 DdRp) elongate nucleic acids with rate constants on the order of 100 s–1 (Table 1). In contrast, rat DNA polymerase-β (which contains arginine as general acid) and yeast RNA polymerase II (histidine as a general acid) synthesize nucleic acid an order of magnitude more slowly at physiological pH8,30,34. The idea that a difference in the general acid leads to changes in elongation rate is supported by the reduced rate of RNA synthesis observed for the K359R and K359H derivatives of PV RdRp (Fig. 4). In addition, changing His1085 of yeast RNA polymerase II to tyrosine reduces the rate of catalysis by an order of magnitude without changing the observed affinity for the nucleotide substrate34. Our study would suggest that the reduced rate of catalysis for the H1085Y mutant at physiological pH is likely to reflect the increased pKa value for the tyrosine hydroxyl proton relative to the histidine imino proton.

It has often been suggested that polymerase translocation occurs concomitantly with or after pyrophosphate release21,35. The best structural description of a polymerase undergoing catalysis was provided by work using a fragment of DNA polymerase I from Bacillus stearothermophilus, an A family polymerase36. In this system, movement of helix O is coupled to catalysis and translocation. Notably, the general acid of all polymerases seems to be associated with exceptionally dynamic structural elements (Supplementary Fig. 1 and Supplementary Fig. 5 online), and the movement of

Figure 4 Altering nucleotidyl transfer kinetics by changing the amino acid that acts as the general acid. (a) Lys359 of PV RdRp was changed to arginine. Kinetic analysis of AMP incorporation by K359R PV RdRp at pH 7.5 fit to a hyperbola with a Kd,app for ATP of 54 ± 6 µM and a kpol of 2.6 ± 0.1 s–1. (b) pH rate profile for K359R PV RdRp fit to a model describing a one-ionizable group, yielding a pKa value of 8.8 ± 0.3. The dashed line shows the predicted curve for a second ionizable group with a pKa of 12.0. The red line shows the curve of kpol as a function of pH for wild-type (WT) PV RdRp with pKa values of 7.0 and 10.5. Error bars indicate s.d. (c) Proton-inventory plot for K359R PV RdRp fit to a two-proton-transfer model. The dashed line represents the predicted line for a one-proton-transfer model. Each data point represents the average of two independent experiments. Error bars indicate s.d. (<10%). (d) Lys359 of PV RdRp was changed to histidine. Kinetic analysis of AMP incorporation by K359H PV RdRp at pH 7.5 fit to a hyperbola with a Kd,app for ATP of 340 ± 20 µM and a kpol of 4.9 ± 0.4 s–1. (e) pH rate profile for K359H PV RdRp fit to a model describing a one-ionizable group, yielding a pKa value of 6.9 ± 0.3. The red line shows the curve of kpol as a function of pH for wild-type PV RdRp with pKa values of 7.0 and 10.5. Error bars indicate the s.d.

[Figure 4 plots. (a) K359R: kobs (s–1) versus [ATP] (µM), 0–600; in-panel annotation: Kd,app = 54 ± 6 µM, kpol = 2.6 ± 0.3 s–1. (b) K359R and WT: kpol (s–1) versus pH (6–14). (c) K359R: kn / k0 versus n (0–1). (d) K359H: kobs (s–1) versus [ATP] (µM), 0–2,000; in-panel annotation: Kd,app = 340 ± 20 µM, kpol = 4.9 ± 0.4 s–1. (e) K359H and WT: kpol (s–1) versus pH (6–12).]


these elements has been implicated in translocation12,18,21,23,31,34,36. It is possible that the protonation-deprotonation cycle of the general acid during catalysis toggles helix O (and its structural equivalents) between the closed (protonated) and open (deprotonated) states, thus driving the translocation reaction. Consistent with this possibility is the observation that changing the general acid to leucine leads to substantial reductions in processivity for all nucleic acid polymerases used in this study (Supplementary Fig. 6 online).

The use of a general acid by polymerases for nucleotidyl transfer has important implications for reaction reversal, pyrophosphorolysis. Because pyrophosphate in solution should be protonated, deprotonation of pyrophosphate should not occur readily for enzymes that use lysine or arginine as a general acid, as these residues should be rapidly reprotonated by solvent after pyrophosphate release. In contrast, enzymes such as the multisubunit RNA polymerases that use a histidine residue should catalyze pyrophosphorolysis more efficiently, an observation that has been reported37–39. The presence of histidine in these polymerases may have been an early solution to deal with the problem of polymerase arrest caused by backtracking, misincorporation or template damage that is now dealt with by using editing factors40.

Finally, it is worth noting that, where studied, the element harboring the general acid has been shown to contribute not only to incorporation speed but also incorporation fidelity41–46. RNA virus pathogenesis and virulence are closely linked to replication speed and fidelity47,48. It has recently been shown that a PV mutant encoding an RdRp with increased fidelity is attenuated47,48, and this attenuated virus serves as an effective vaccine strain49. The identification of a single amino acid in the polymerase active site that can tune replication speed and perhaps modulate incorporation fidelity suggests the provocative hypothesis that a universal strategy for viral attenuation may exist that can be applied to the rational design of virus vaccine strains.

METHODS

Materials. All general experimental materials, buffers, salts, and so on, were of the highest grade available from Sigma, Fisher or VWR. A complete list of any specialty reagents used is provided in the Supplementary Methods.

Construction, expression and purification of polymerases and their derivatives. We constructed derivatives of all polymerases (PV RdRp, RB69 DdDp, T7 DdRp and HIV RT) by using standard recombinant DNA protocols, described in the Supplementary Methods. Briefly, we mutated DNA sequences by PCR. Forward and reverse primers (Supplementary Table 1 online) used for amplification were selected based on the presence of unique restriction sites suitable for subcloning of the mutated DNA fragment into the expression plasmid for the wild-type polymerase.

We expressed all polymerases in Escherichia coli and purified them as described previously4. Briefly, we transformed cells with the appropriate plasmid and used these cells to produce an inoculum for large-scale growth. We induced gene expression during exponential growth by addition of isopropyl-β-D-thiogalactopyranoside. We lysed the induced cells in appropriate buffers and we purified the enzymes to apparent homogeneity by using standard column chromatography resins and protocols. Details are provided in the Supplementary Methods.

Nucleotide-incorporation experiments. We performed nucleotide-incorporation experiments as described previously4. In general, we formed elongation complexes in the appropriate buffer by incubating polymerase with the appropriate primed template followed by rapid mixing with a nucleoside triphosphate substrate, generally ATP. Rapid mixing was performed by using either a chemical quench flow (CQF) or stopped flow instrument (both from KinTek). For the CQF experiments, we used 32P-labeled primers. We monitored primer extension by phosphorimaging of polyacrylamide gels. For the stopped flow experiments, we used templates containing 2-aminopurine ribonucleoside monophosphate on the 5′ side of the templating nucleotide. Primer extension causes a fluorescence change that could be monitored using the stopped flow instrument. All details, including modifications, are provided in the Supplementary Methods.

Solvent deuterium kinetic isotope effect and proton-inventory experiments. We performed solvent deuterium kinetic isotope effect and proton-inventory experiments by monitoring pre–steady state nucleotide incorporation either using the CQF or stopped flow instruments. We prepared enzymes, substrates and buffers in 100% water or 100% D2O and then mixed them at the appropriate ratio to obtain 0%, 25%, 50%, 75% or 100% D2O. Deuterated glycerol was used in all solutions in D2O. The pD was used instead of pH for the solutions in D2O and was adjusted according to pD = pH + 0.4. Solvent deuterium kinetic isotope effect values were calculated as the ratio of kpol values obtained in H2O divided by that obtained in D2O. PI plots consisted of kn / kH2O as a function of n, where kn is the observed rate constant for nucleotide incorporation at a particular mole fraction of D2O, kH2O is the observed rate constant for nucleotide incorporation in H2O and n is the solvent mole fraction of D2O. Proton-inventory data were fit to the modified Gross-Butler equation for a two-proton-transfer model29:

$$\frac{k_n}{k_{\mathrm{H_2O}}} = (1 - n + n\,\Phi_1)(1 - n + n\,\Phi_2) \qquad (1)$$

or for a one-proton-transfer model29:

$$\frac{k_n}{k_{\mathrm{H_2O}}} = (1 - n + n\,\Phi) \qquad (2)$$

where kn is the observed rate constant at the different percentages of D2O, kH2O is the observed rate constant in water, n is the mole fraction of D2O and Φ is the inverse of the isotope effect for each ionizable group.
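The difference between the one-proton and two-proton models can be seen by evaluating both at the D2O mole fractions used in the experiments. The sketch below is a hedged illustration, not the authors' fitting procedure: the function name is hypothetical, the overall isotope effect of 3.0 is the wild-type PV RdRp value from Table 1, and splitting it equally between two protons is an assumption made only for the example.

```python
import math


def gross_butler(n: float, factors) -> float:
    """kn / kH2O at D2O mole fraction n.

    `factors` holds one inverse isotope effect per transferred proton,
    so one entry gives equation (2) and two entries give equation (1).
    """
    ratio = 1.0
    for factor in factors:
        ratio *= (1.0 - n + n * factor)
    return ratio


if __name__ == "__main__":
    sdkie = 3.0                            # overall isotope effect (Table 1, WT PV RdRp)
    one_proton = [1.0 / sdkie]             # a single proton carries the whole effect
    two_protons = [1.0 / math.sqrt(sdkie)] * 2  # assumed equal contributions
    for n in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"n = {n:4.2f}  one proton: {gross_butler(n, one_proton):.3f}  "
              f"two protons: {gross_butler(n, two_protons):.3f}")
```

Both models share the same endpoints (1 at n = 0 and 1/SDKIE at n = 1), but the one-proton model is linear in n whereas the two-proton model bows below that line at intermediate mole fractions.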

Data analysis. Observed rate constants (kobs) for nucleotidyl transfer at various concentrations of nucleotide were obtained by fitting product-versus-time data to an equation defining a single exponential. Values for Kd,app and kpol were obtained by fitting kobs-versus-[NTP] data to an equation defining a hyperbola. Data were fit by nonlinear regression using the program KaleidaGraph (Synergy Software). Specific equations used are provided in Supplementary Methods.
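As a rough illustration of how kpol and Kd,app follow from kobs-versus-[NTP] data, the sketch below fits synthetic, noise-free points. It deliberately swaps the nonlinear regression used in the paper for a Hanes-Woolf linearization ([S]/kobs = [S]/kpol + Kd,app/kpol) so that it runs with the Python standard library alone. The hyperbola form, function names and concentration series are assumptions; the generating parameters (kpol = 2.6 s–1, Kd,app = 54 µM) are the K359R values from Figure 4a.

```python
def hyperbola(conc: float, kpol: float, kd_app: float) -> float:
    """Assumed standard form: kobs = kpol * [NTP] / (Kd,app + [NTP])."""
    return kpol * conc / (kd_app + conc)


def fit_hanes_woolf(conc, kobs):
    """Estimate (kpol, Kd,app) by ordinary least squares on
    [S]/kobs versus [S]; slope = 1/kpol, intercept = Kd,app/kpol.
    All concentrations and kobs values must be non-zero."""
    xs = list(conc)
    ys = [s / k for s, k in zip(conc, kobs)]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kpol = 1.0 / slope
    return kpol, intercept * kpol


if __name__ == "__main__":
    atp_um = [25.0, 50.0, 100.0, 200.0, 400.0, 600.0]       # hypothetical series (µM)
    synthetic = [hyperbola(s, 2.6, 54.0) for s in atp_um]    # noise-free kobs (s^-1)
    kpol_fit, kd_fit = fit_hanes_woolf(atp_um, synthetic)
    print(f"kpol = {kpol_fit:.2f} s^-1, Kd,app = {kd_fit:.1f} µM")
```

With real, noisy data a direct nonlinear fit is generally preferable, because the linearization distorts the error structure; the transformation is used here only to keep the example self-contained.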

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS

We thank S.J. Benkovic, P.C. Bevilacqua, J. Martin Bollinger, K. Murakami, K.D. Raney and J.C. Reese for comments on the manuscript. This study was supported by a grant (AI45818) from the US National Institutes of Health to C.E.C.

AUTHOR CONTRIBUTIONS

C.E.C., J.J.A., C.C. and E.D.S. designed research; C.C., E.D.S., K.R.M., J.J.A., I.M. and A.U. performed research; M.G. and W.K. contributed new reagents and analytical tools; C.E.C., C.C., E.D.S. and J.J.A. analyzed data; C.E.C., E.D.S., C.C. and J.J.A. wrote the paper.

COMPETING INTERESTS STATEMENT

The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/nsmb/.

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Steitz, T.A. A mechanism for all polymerases. Nature 391, 231–232 (1998).

2. Yang, W., Lee, J.Y. & Nowotny, M. Making and breaking nucleic acids: two-Mg2+-ion catalysis and substrate specificity. Mol. Cell 22, 5–13 (2006).

3. Fothergill, M., Goodman, M.F., Petruska, J. & Warshel, A. Structure-energy analysis of the role of metal ions in phosphodiester bond hydrolysis by DNA polymerase I. J. Am. Chem. Soc. 117, 11619–11627 (1995).

4. Castro, C. et al. Two proton transfers in the transition state for nucleotidyl transfer catalyzed by RNA- and DNA-dependent RNA and DNA polymerases. Proc. Natl. Acad. Sci. USA 104, 4267–4272 (2007).


5. Florian, J., Goodman, M.F. & Warshel, A. Computer simulation of the chemical catalysis of DNA polymerases: discriminating between alternative nucleotide insertion mechanisms for T7 DNA polymerase. J. Am. Chem. Soc. 125, 8163–8177 (2003).

6. Showalter, A.K. & Tsai, M.D. A reexamination of the nucleotide incorporation fidelity of DNA polymerases. Biochemistry 41, 10571–10576 (2002).

7. Florian, J., Goodman, M.F. & Warshel, A. Computer simulations of protein functions: searching for the molecular origin of the replication fidelity of DNA polymerases. Proc. Natl. Acad. Sci. USA 102, 6819–6824 (2005).

8. Sucato, C.A. et al. DNA polymerase β fidelity: halomethylene-modified leaving groups in pre-steady-state kinetic analysis reveal differences at the chemical transition state. Biochemistry 47, 870–879 (2008).

9. Anand, V.S. & Patel, S.S. Transient state kinetics of transcription elongation by T7 RNA polymerase. J. Biol. Chem. 281, 35677–35685 (2006).

10. Arnold, J.J. & Cameron, C.E. Poliovirus RNA-dependent RNA polymerase (3Dpol): pre-steady-state kinetic analysis of ribonucleotide incorporation in the presence of Mg2+. Biochemistry 43, 5126–5137 (2004).

11. Ferrer-Orta, C. et al. Sequential structures provide insights into the fidelity of RNA replication. Proc. Natl. Acad. Sci. USA 104, 9463–9468 (2007).

12. Franklin, M.C., Wang, J. & Steitz, T.A. Structure of the replicating complex of a Pol α family DNA polymerase. Cell 105, 657–667 (2001).

13. Gohara, D.W., Arnold, J.J. & Cameron, C.E. Poliovirus RNA-dependent RNA polymerase (3Dpol): kinetic, thermodynamic, and structural analysis of ribonucleotide selection. Biochemistry 43, 5149–5158 (2004).

14. Huang, H., Chopra, R., Verdine, G.L. & Harrison, S.C. Structure of a covalently trapped catalytic complex of HIV-1 reverse transcriptase: implications for drug resistance. Science 282, 1669–1675 (1998).

15. Pomerantz, R.T., Temiakov, D., Anikin, M., Vassylyev, D.G. & McAllister, W.T. A mechanism of nucleotide misincorporation during transcription due to template-strand misalignment. Mol. Cell 24, 245–255 (2006).

16. Sarafianos, S.G. et al. Structures of HIV-1 reverse transcriptase with pre- and post-translocation AZTMP-terminated DNA. EMBO J. 21, 6614–6624 (2002).

17. Spence, R.A., Kati, W.M., Anderson, K.S. & Johnson, K.A. Mechanism of inhibition of HIV-1 reverse transcriptase by nonnucleoside inhibitors. Science 267, 988–993 (1995).

18. Temiakov, D. et al. Structural basis for substrate selection by T7 RNA polymerase. Cell 116, 381–391 (2004).

19. Thompson, A.A. & Peersen, O.B. Structural basis for proteolysis-dependent activation of the poliovirus RNA-dependent RNA polymerase. EMBO J. 23, 3462–3471 (2004).

20. Yang, G., Franklin, M., Li, J., Lin, T.C. & Konigsberg, W. Correlation of the kinetics of finger domain mutants in RB69 DNA polymerase with its structure. Biochemistry 41, 2526–2534 (2002).

21. Yin, Y.W. & Steitz, T.A. The structural mechanism of translocation and helicase activity in T7 RNA polymerase. Cell 116, 393–404 (2004).

22. Zamyatkin, D.F. et al. Structural insights into mechanisms of catalysis and inhibition in Norwalk virus polymerase. J. Biol. Chem. 283, 7705–7712 (2008).

23. Canard, B., Chowdhury, K., Sarfati, R., Doublie, S. & Richardson, C.C. The motif D loop of human immunodeficiency virus type 1 reverse transcriptase is critical for nucleoside 5′-triphosphate selectivity. J. Biol. Chem. 274, 35768–35776 (1999).

24. Hizi, A., Tal, R., Shaharabany, M. & Loya, S. Catalytic properties of the reverse transcriptases of human immunodeficiency viruses type 1 and type 2. J. Biol. Chem. 266, 6230–6239 (1991).

25. Gillis, A.J., Schuller, A.P. & Skordalakes, E. Structure of the Tribolium castaneum telomerase catalytic subunit TERT. Nature 455, 633–637 (2008).

26. Ng, K.K., Arnold, J.J. & Cameron, C.E. Structure-function relationships among RNA-dependent RNA polymerases. Curr. Top. Microbiol. Immunol. 320, 137–156 (2008).

27. Rosta, E., Kamerlin, S.C. & Warshel, A. On the interpretation of the observed linear free energy relationship in phosphate hydrolysis: a thorough computational study of phosphate diester hydrolysis in solution. Biochemistry 47, 3725–3735 (2008).

28. Schowen, R.L. Mechanistic deductions from solvent isotope effects. Progr. Phys. Org. Chem. 9, 275–332 (1972).

29. Venkatasubban, K.S. & Schowen, R.L. The proton inventory technique. CRC Crit. Rev. Biochem. 17, 1–44 (1984).

30. Kraynov, V.S., Showalter, A.K., Liu, J., Zhong, X. & Tsai, M.D. DNA polymerase β: contributions of template-positioning and dNTP triphosphate-binding residues to catalysis and fidelity. Biochemistry 39, 16008–16015 (2000).

31. Wang, D., Bushnell, D.A., Westover, K.D., Kaplan, C.D. & Kornberg, R.D. Structural basis of transcription: role of the trigger loop in substrate specificity and catalysis. Cell 127, 941–954 (2006).

32. Xiang, Y., Oelschlaeger, P., Florian, J., Goodman, M.F. & Warshel, A. Simulating the effect of DNA polymerase mutations on transition-state energetics and fidelity: evaluating amino acid group contribution and allosteric coupling for ionized residues in human Pol β. Biochemistry 45, 7036–7048 (2006).

33. Yang, G., Lin, T., Karam, J. & Konigsberg, W.H. Steady-state kinetic characterization of RB69 DNA polymerase mutants that affect dNTP incorporation. Biochemistry 38, 8094–8101 (1999).

34. Kaplan, C.D., Larsson, K.M. & Kornberg, R.D. The RNA polymerase II trigger loop functions in substrate selection and is directly targeted by α-amanitin. Mol. Cell 30, 547–556 (2008).

35. Marchand, B. & Gotte, M. Site-specific footprinting reveals differences in the translocation status of HIV-1 reverse transcriptase. Implications for polymerase translocation and drug resistance. J. Biol. Chem. 278, 35362–35372 (2003).

36. Johnson, S.J., Taylor, J.S. & Beese, L.S. Processive DNA synthesis observed in a polymerase crystal suggests a mechanism for the prevention of frameshift mutations. Proc. Natl. Acad. Sci. USA 100, 3895–3900 (2003).

37. Erie, D.A., Yager, T.D. & von Hippel, P.H. The single-nucleotide addition cycle in transcription: a biophysical and biochemical perspective. Annu. Rev. Biophys. Biomol. Struct. 21, 379–415 (1992).

38. Rudd, M.D., Izban, M.G. & Luse, D.S. The active site of RNA polymerase II participates in transcript cleavage within arrested ternary complexes. Proc. Natl. Acad. Sci. USA 91, 8057–8061 (1994).

39. Wang, D. & Hawley, D.K. Identification of a 3′-5′ exonuclease activity associated with human RNA polymerase II. Proc. Natl. Acad. Sci. USA 90, 843–847 (1993).

40. Erie, D.A., Hajiseyedjavadi, O., Young, M.C. & von Hippel, P.H. Multiple RNA polymerase conformations and GreA: control of the fidelity of transcription. Science 262, 867–873 (1993).

41. Bebenek, A. et al. Dissecting the fidelity of bacteriophage RB69 DNA polymerase: site-specific modulation of fidelity by polymerase accessory proteins. Genetics 162, 1003–1018 (2002).

42. Carroll, S.S., Cowart, M. & Benkovic, S.J. A mutant of DNA polymerase I (Klenow fragment) with reduced fidelity. Biochemistry 30, 804–813 (1991).

43. Johnson, V.A. et al. Update of the drug resistance mutations in HIV-1: spring 2008. Top. HIV Med. 16, 62–68 (2008).

44. Sousa, R. & Padilla, R. A mutant T7 RNA polymerase as a DNA polymerase. EMBO J. 14, 4609–4621 (1995).

45. Suzuki, M., Yoshida, S., Adman, E.T., Blank, A. & Loeb, L.A. Thermus aquaticus DNA polymerase I mutants with altered fidelity. Interacting mutations in the O-helix. J. Biol. Chem. 275, 32728–32735 (2000).

46. Zhang, H. et al. The L561A substitution in the nascent base-pair binding pocket of RB69 DNA polymerase reduces base discrimination. Biochemistry 45, 2211–2220 (2006).

47. Arnold, J.J., Vignuzzi, M., Stone, J.K., Andino, R. & Cameron, C.E. Remote site control of an active site fidelity checkpoint in a viral RNA-dependent RNA polymerase. J. Biol. Chem. 280, 25706–25716 (2005).

48. Vignuzzi, M., Stone, J.K., Arnold, J.J., Cameron, C.E. & Andino, R. Quasispecies diversity determines pathogenesis through cooperative interactions in a viral population. Nature 439, 344–348 (2006).

49. Vignuzzi, M., Wendt, E. & Andino, R. Engineering attenuated virus vaccines by controlling replication fidelity. Nat. Med. 14, 154–161 (2008).

50. Sawaya, M.R., Prasad, R., Wilson, S.H., Kraut, J. & Pelletier, H. Crystal structures of human DNA polymerase β complexed with gapped and nicked DNA: evidence for an induced fit mechanism. Biochemistry 36, 11205–11215 (1997).


Polyubiquitin substrates allosterically activate their own degradation by the 26S proteasome

Dawadschargal Bech-Otschir1,2, Annett Helfrich1,2, Cordula Enenkel1, Gesa Consiglieri1, Michael Seeger1, Hermann-Georg Holzhütter1, Burkhardt Dahlmann1 & Peter-Michael Kloetzel1

The 26S proteasome degrades polyubiquitylated (polyUb) proteins by an ATP-dependent mechanism. Here we show that binding of model polyUb substrates to the 19S regulator of mammalian and yeast 26S proteasomes enhances the peptidase activities of the 20S proteasome about two-fold in a process requiring ATP hydrolysis. Monoubiquitylated proteins or tetraubiquitin alone exert no effect. However, 26S proteasomes from the yeast α3ΔN open-gate mutant and the rpt2YA and rpt5YA mutants with impaired gating can still be activated (approximately 1.3-fold to 1.8-fold) by polyUb-protein binding. Thus, binding of polyUb substrates to the 19S regulator stabilizes gate opening of the 20S proteasome and induces conformational changes of the 20S proteasome that facilitate channeling of substrates and their access to active sites. In consequence, polyUb substrates will allosterically stimulate their own degradation.

The 26S proteasome is a high-molecular-weight protease complex that in an ATP-dependent process catalyzes the degradation of polyUb cellular proteins. The 26S complex, resembling the active form of the proteasome, is composed of a 19S regulator complex bound to one or both ends of the 20S core proteasome. The 20S proteasome has a cylindrical structure built up as a dimer of two stacked seven-membered rings with subunit compositions α1–α7 and β1–β7, respectively. When the 20S proteasome is in its active state, the outer rings formed by the α subunits enclose a narrow pore of ~13 Å at either end of the cylinder through which peptides or unfolded protein substrates gain access to a central catalytic chamber. The six proteolytically active sites provided by the subunits β1, β2 and β5 within the two inner β-rings exert peptidylglutamyl peptide–hydrolyzing activity (PGPH-like), trypsin-like (T-like) and chymotrypsin-like (ChT-like) activities1. However, in the absence of regulatory proteins, access of substrates to the catalytic chamber is restricted by the N-terminal peptide extensions of the α subunits that gate the central α-ring pore2. In consequence, 20S proteasomes with a closed gate are catalytically inert. Activation of the 20S proteasome is achieved by regulatory proteins such as the proteasome activator PA28 or the 19S regulator, which bind to the outer α-rings and release the occlusion by the N-terminal extensions of the α subunits and thus allow facilitated access of substrates to the catalytic chamber3.

The 19S regulator, formed by at least 19 different protein subunits, can be divided into the lid and base subcomplexes, which together confer ATP-dependent polyUb protein degradation by the 26S proteasome4. The lid is responsible for the recognition of polyUb proteins as well as the removal of the ubiquitin chain from protein substrates (deubiquitylation)5,6. The base is formed by two non-ATPase and six distinct AAA+ ATPase subunits (Rpt1–Rpt6) with different individual functions7. The base has chaperone activity in vitro, and the ATPase subunits Rpt2 and Rpt5 have an essential role in binding of the base to the α-rings of the 20S proteasome, thereby releasing the occlusion of the central gate of the 20S proteasome8–10. Furthermore, one of the ATPases of the base complex, namely Rpt5, has been shown to bind the polyUb chain of a model substrate in a process modulated by ATP hydrolysis11. In consequence, the 19S regulator functions in the opening of the central gate of the 20S proteasome and in the recognition, unfolding and translocation of degradation-targeted proteins into the catalytic chamber12.

To date, little is known about the direct communication between the 19S regulator and the active sites of the 20S proteasome. Some viral proteins bind directly to the 19S regulator and modulate the proteolytic activities of the 26S proteasome, thereby either stimulating or blocking antigen processing, as well as modulating the stability of tumor-suppressor proteins13,14. Furthermore, blockage of the 20S proteasomal active sites by an inhibitor leads to a stabilization of the 26S proteasome, probably owing to the induction of conformational changes in the 20S core complex and possibly also the 19S regulator15,16.

However, so far there has been no information about whether binding of polyUb proteins to the 19S regulator of 26S proteasomes exerts any effect on the conformation of the enzyme complex. Therefore, we analyzed the effects of polyUb-substrate binding on the proteolytic activities of the 26S proteasomes.

Here we demonstrate that the binding of polyUb proteins to the 19S regulator results in stimulation of the peptide-hydrolyzing

Received 2 July 2008; accepted 29 December 2008; published online 25 January 2009; doi:10.1038/nsmb.1547

1Institut für Biochemie, Charité – Universitätsmedizin Berlin, Monbijoustrasse 2, 10117 Berlin, Germany. 2These authors contributed equally to this work. Correspondence should be addressed to P.-M.K. ([email protected]).


activities of the mammalian and yeast 26S proteasomes. This stimulation requires ATP hydrolysis but is independent of substrate degradation. Our data provide evidence for a conserved regulatory principle whereby binding of polyUb substrates to the 19S regulator transfers conformational changes to the 20S proteasome that result in stabilized gate opening and facilitated active-site accessibility of the substrates.

RESULTS

MUC1-derived model substrates for 26S proteasomes

The human epithelial glycoprotein mucin encoded by the gene MUC1 is a tumor-associated antigen. Its major histocompatibility complex (MHC) class I epitope MUC1950–958 is generated by the proteasome system and recognized by cytotoxic T cells (G. Consiglieri, unpublished data). To analyze the degradation of polyUb proteins by the 26S proteasome in vitro, we generated polyUb MUC1 derivatives using a procedure analogous to one previously described17.

The model substrates shown in Figure 1a consist of a tetraubiquitin chain (Ub4) linked to a lysine residue of another ubiquitin moiety, which is fused to the N-terminal end of a MUC1 derivative. The MUC1 derivatives contain the epitope MUC1950–958 either four times in a row ((MUC1950–958)4, described here as MUC4), eight times in a row ((MUC1950–958)8, or MUC8), or only once in the natural MUC1938–1025 context (MUClong). The synthetic repeats of epitope MUC1950–958 were efficiently degraded in vitro by the 20S proteasome (data not shown).

To test whether polyUb MUC1 derivatives are suitable model substrates for the 26S proteasome, we incubated 300 nM Ub5-MUC4 with 30 nM 26S proteasome isolated from human erythrocytes and visualized the reaction products by immunoblotting with an antibody directed against the MUC1950–958 epitope. After 2 h of incubation, approximately 80% of the Ub5-MUC4 substrate was degraded by the purified 26S proteasome. Depletion of ATP by apyrase abolished Ub5-MUC4 degradation but did not prevent deubiquitylation, as evidenced by the appearance of a stabilized, partially deubiquitylated MUC4 substrate (Fig. 1b). We obtained similar results when we assayed the polyUb MUC1 derivatives Ub5-MUC8 and Ub5-MUClong to assess their degradation by 26S proteasomes. Both substrate proteins were stabilized almost completely in the presence of apyrase. Product analysis by MS confirmed the generation of the epitope MUC1950–958 from all three proteins (data not shown).

The fact that polyUb MUC1 derivatives were not degraded by purified 20S proteasomes and that neither 26S nor 20S proteasomes degraded the fusion protein Ub-MUC4 (Supplementary Fig. 1 online) revealed that the proteasomal degradation of polyUb MUC4 conjugates requires not only a functional 19S regulator but also a polyUb signal.

PolyUb proteins stimulate 26S proteasome activity

To date there is no information about whether binding of polyUb conjugates to 26S proteasomes via the 19S regulator affects 20S proteasome activity. To investigate whether there is a functional interdependency between the 20S proteasome and the 19S regulator, we tested whether the presence of MUC1-derived polyUb proteins influences the peptide-hydrolyzing activity of the 26S proteasome. In an initial experiment, we therefore pre-incubated the 26S proteasome with MUC1-derived polyUb proteins for 15 min and subsequently assayed the ChT-like activity of the 26S proteasome using the fluorogenic peptide substrate Suc-LLVY-AMC.

The presence of the different MUC1-derived polyUb conjugates led to an increase in the ChT-like activity of the 26S proteasome (Fig. 2a). We obtained a maximal stimulation of approximately two-fold above the basal activity at a molar ratio of 26S proteasomes to polyUb proteins of 1:10. Pre-incubation with BSA as control protein had no impact on the peptidase activity of the 26S proteasome (Supplementary Fig. 2a online). Notably, with a calculated K1/2 of 45 ± 20 nM, the binding affinities of the MUC1-derived proteins were in the range of those of other polyUb proteins18.
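To give a feel for how a K1/2 of about 45 nM relates to the 300 nM conjugate concentration used here, the sketch below assumes a simple hyperbolic saturation of the stimulation. The text does not state which model was used to calculate K1/2, so the functional form, the function name and the default parameters are all assumptions for illustration.

```python
def fold_stimulation(conjugate_nM, k_half_nM=45.0, max_fold=2.0):
    """Fold activity over basal, assuming hyperbolic saturation.

    k_half_nM is the conjugate concentration giving half-maximal
    stimulation; max_fold is the plateau relative to basal activity.
    """
    fraction_of_maximum = conjugate_nM / (k_half_nM + conjugate_nM)
    return 1.0 + (max_fold - 1.0) * fraction_of_maximum


if __name__ == "__main__":
    # 30 nM proteasome at molar ratios of 1:2, 1:5 and 1:10.
    for conjugate in (60.0, 150.0, 300.0):
        print(conjugate, round(fold_stimulation(conjugate), 2))
```

Under this assumed model, a conjugate concentration several-fold above K1/2 already gives most of the maximal effect, which would be consistent with the stimulation approaching its maximum at a 1:10 ratio.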

Processing of polyUb proteins includes binding, deubiquitylation, unfolding and translocation of proteins into the 20S proteasome by the 19S regulatory particle12. To obtain information about which step of polyUb protein processing triggers the stimulation of the 26S proteasomal peptidase activity, we measured the ChT-like peptide-hydrolyzing activity of 26S proteasomes at different time points after addition of the Ub5-MUC4 protein (Fig. 2b). Stimulation of the peptidase activity occurred immediately after the addition of the polyUb conjugate to the 26S proteasome (Fig. 2c). Maximal stimulation was already reached 5 min after addition of polyUb conjugates and stayed constant for approximately 15 min. After 30 min, the stimulation rate decreased, indicating turnover of polyUb proteins. Consequently, we chose a 15-min incubation and a 1:10 molar ratio of enzyme to protein for all further experiments.

Thus, the immediate stimulation of the peptidase activity of the 26S proteasome by polyUb proteins suggests that very early processing steps performed by the 19S regulatory particle, such as polyUb protein binding or deubiquitylation, must be responsible for the observed stimulatory effect.

[Figure 1 image: (a) schematic of the Ub5-MUC1 derivatives (Ub4 linked to His6-Ub, fused to MUCx, where MUCx is MUC4, MUC8 or MUClong); (b) anti-MUC1 immunoblots of Ub5-MUC4, Ub5-MUC8 and Ub5-MUClong incubated with 26S proteasome for 0, 1 and 2 h, with or without apyrase; size markers at 48, 32, 25 and 16 kDa.]

Figure 1 MUC1-derived model substrates for 26S proteasomes. (a) All polyUb proteins are composed of Ub4 linked to Lys48 of an N-terminally His6-tagged ubiquitin (Ub) fused to the MUC1 derivatives by means of E2-25K. The MUC1 derivatives contain the epitope MUC1950–958 four times in a row ((MUC1950–958)4; MUC4), eight times in a row ((MUC1950–958)8; MUC8) or only once in the natural context (MUC1938–1025). (b) In vitro digestion of the polyUb MUC1 derivatives (300 nM) was performed with the isolated 26S proteasome (30 nM) in the presence of an ATP-regenerating system at 37 °C. The lane labeled ‘+Apyrase’ shows a sample in which the 26S proteasome was pre-incubated with 25 U ml−1 apyrase for 15 min before addition of the polyUb protein. Reaction products obtained after the indicated times were monitored by immunoblotting with an antibody against the epitope MUC1950–958 (Anti-MUC1). The migration positions of the monoubiquitylated and polyUb MUC1 derivatives in SDS-PAGE are indicated.


Ub5-MUC4 binding stimulates all peptidase activities

To investigate whether the peptidase stimulation observed above is specific for the ChT-like active site or affects all proteasomal activities, we next tested whether the two other peptide-hydrolyzing activities of the 26S proteasome, the T-like and the PGPH-like activities, were also affected by the polyUb conjugates. We therefore analyzed the hydrolysis of the fluorogenic peptides Z-VGR-AMC (T-like) and Z-LLE-AMC (PGPH-like) by the 26S proteasome after addition of a 10-fold molar excess of Ub5-MUC4. Addition of Ub5-MUC4 led to an approximately two-fold increase in both types of peptide cleavage activities, and thus was almost identical to the stimulation observed for the ChT-like activity (Fig. 3a–c).

ChT-like activity was blocked by treatment of the 26S proteasome with 50 µM clasto-lactacystin, which specifically binds to the β5 subunit of the 20S proteasome19 (Fig. 3a). PolyUb conjugate–induced stimulation of the T-like as well as the PGPH-like activity in the presence of clasto-lactacystin was about 1.5-fold to 2-fold for both activities (Fig. 3b,c). Thus, all three peptide-hydrolyzing activities of the 26S proteasomes were stimulated by Ub5-MUC4, whereas the stimulation of the T-like as well as PGPH-like peptidase activities of the 26S proteasome did not seem to be affected by inhibition of the β5 active site.

To clarify whether deubiquitylation of polyUb conjugates by the 19S regulator is required for stimulating the 26S proteasome’s peptide-hydrolyzing activities, we used o-phenanthroline (o-PT)5 to inhibit the deubiquitylating activity of Rpn11 in the 26S proteasome before the addition of Ub5-MUC4. After 15 min of pre-incubation, we measured the peptidase activity of the 26S proteasome. The basal peptide-hydrolyzing activities were only slightly reduced in the presence of o-PT (Fig. 3c). Similarly, stimulation of the three peptidase activities of the 26S proteasome by Ub5-MUC4 was also not affected, indicating that o-PT has no effect on substrate binding to the 19S regulator and that the deubiquitylation step is not decisively responsible for the polyUb substrate–induced peptidase activation.

Proteasome activation correlates with Ub5-MUC4 binding

The data obtained so far indicated that binding of the polyUb proteins to the 19S regulator is responsible for the observed stimulation of the 26S proteasome. As hydrophobic peptides can directly stimulate the peptide-cleavage activities of the 20S proteasome by binding to so-called ‘modifier sites’20,21, it was important to exclude the possibility that the different MUC1 derivatives had any influence on the peptide-hydrolyzing activities of the 20S proteasome.

Consistent with the result that 20S proteasomes do not degrade MUC1-derived polyUb conjugates (Supplementary Fig. 1a), neither the polyUb conjugate Ub5-MUC4 nor the fusion protein Ub-MUC4 induced an enhancement of the ChT-like activity of the 20S proteasome (Fig. 4a). Also, addition of the peptides MUC3 (triple MUC1950–958 epitope) or MUC1 (single MUC1950–958 epitope) had no effect on the ChT-like activity of either the 20S or the 26S proteasome (Fig. 4a,b). In contrast, the polyUb protein Ub5-MUC4 strongly stimulated the ChT-like activity of the 26S proteasome (Fig. 4b). Notably, when we tested Ub5-HPV-E7, a polyUb protein derived from the human papilloma virus (HPV) E7 protein, which in contrast to Ub5-MUC4 is not degraded by the 26S proteasome, we also observed a stimulation of

[Figure 3 image: bar charts of (a) ChT-like activity (hydrolysis of Suc-LLVY-AMC), (b) T-like activity (hydrolysis of Z-VGR-AMC) and (c) PGPH-like activity (hydrolysis of Z-LLE-AMC) for 26S, 26S + clasto-lactacystin and 26S + o-PT, each without or with Ub5-MUC4 (1:10); y axes run from 0 to 60.]

Figure 3 Ub5-MUC4 binding stimulates all peptidase activities. (a–c) 26S proteasomes (30 nM) were pre-incubated without (white bars) or with (black bars) Ub5-MUC4 (300 nM) for 15 min before addition of fluorogenic peptides. ChT-like (a), T-like (b) and PGPH-like (c) activities were assayed using the fluorogenic peptides Suc-LLVY-AMC (100 µM), Z-VGR-AMC (100 µM) and Z-LLE-AMC (100 µM), respectively. The peptidase activities were measured after a 15-min incubation at 37 °C. In reactions containing inhibitors, the 26S proteasome was treated with 50 µM clasto-lactacystin or 5 mM o-phenanthroline (o-PT) for 15 min before addition of the polyUb conjugate. Values are the means (± s.d.) from three independent experiments.

[Figure 2 image: (a) bar chart of hydrolysis of Suc-LLVY-AMC by 26S alone or with Ub5-MUC4, Ub5-MUC8 or Ub5-MUClong at molar ratios of 26S to polyUb conjugate of 1:2, 1:5 and 1:10; (b) hydrolysis of Suc-LLVY-AMC versus time (min) for 26S, 26S + Ub5-MUC4 (1:5) and 26S + Ub5-MUC4 (1:10); (c) % increase in ChT-like activity versus time (min) at molar ratios of 1:5 and 1:10.]

Figure 2 PolyUb proteins stimulate the activity of the 26S proteasome. (a) 26S proteasomes (30 nM) were pre-incubated with or without Ub5-MUC4, Ub5-MUC8 or Ub5-MUClong at different molar ratios of 26S proteasome to polyUb proteins as indicated for 15 min before addition of a fluorogenic peptide. The ChT-like activity of the 26S proteasome was measured using the fluorogenic peptide Suc-LLVY-AMC (100 µM) after a 15-min incubation at 37 °C. The hydrolysis of the fluorogenic peptide represents the mean value (± s.d.) of triplicate assays. The units for the y axis are the fluorescence relative to the sensitivity of the photometer. The specific ChT-like peptidase activity of the 26S proteasomes in the presence of Ub5-MUC4 was 6.5 to 7.0 nmol Suc-LLVY-AMC/(min*mg proteasome), about two-fold more than that of nonstimulated 26S proteasomes. (b) 26S proteasomes (30 nM) were pre-incubated with Ub5-MUC4 at molar ratios of proteasome to polyUb protein of 1:5 (open rectangles) and 1:10 (closed triangles), or in the absence of Ub5-MUC4 (open squares). AMC cleavage from Suc-LLVY-AMC (100 µM) by the 26S proteasome was monitored over a time period of 340 min. (c) The data obtained in the experiment shown in b are depicted as a percentage increase relative to the control without Ub5-MUC4.
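The conversion used for panel c, from raw hydrolysis readings to a percentage increase over the control, can be sketched as follows. The function name and the example readings are hypothetical; only the definition (increase relative to the control without Ub5-MUC4) comes from the legend.

```python
def percent_increase(stimulated, control):
    """Percentage increase of a stimulated reading over its control."""
    if control <= 0:
        raise ValueError("control reading must be positive")
    return 100.0 * (stimulated - control) / control


if __name__ == "__main__":
    # Hypothetical fluorescence readings (arbitrary units) at one time point.
    control_reading = 400.0
    stimulated_reading = 800.0
    print(percent_increase(stimulated_reading, control_reading))
```

On this scale a two-fold stimulation corresponds to a 100% increase, and a 1.3-fold stimulation to a 30% increase.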


the 26S proteasome’s peptide-hydrolyzing activity (Supplementary Fig. 2). These data therefore demonstrate that the peptidase activity of the 20S proteasome is stimulated by polyUb proteins only when the 26S proteasome possesses a functional 19S regulator, and that this effect is not necessarily dependent on protein degradation.

To further elucidate the role of the polyUb signal for the stimulation of the 26S proteasome’s peptidase activity, we tested the influence of proteins containing ubiquitin chains of different lengths. In contrast to the polyUb conjugate Ub5-MUC4, neither monoubiquitin nor Ub4 (except at extremely high Ub4 concentrations) induced a considerable increase of the peptidase activity of the 26S proteasome (Fig. 4c). Also, a mixture of Ub-MUC4 and Ub4 had no stimulating effect. From these data we conclude that a sufficient binding affinity of the 19S regulator to polyUb proteins is required to induce an enhancement of the peptidase activities of the 26S proteasome.

To demonstrate that MUC1-derived polyUb protein binds directly to the 19S regulator, we incubated Ub5-MUC4 with or without the 26S proteasome for 15 min in the presence of ATP, separated the samples by glycerol gradient centrifugation and analyzed the resulting fractions using antibodies against the epitope MUC1950–958, the 20S subunit β1 and the 19S regulator subunit Rpn2 (Fig. 4d). Ub5-MUC4 was found to be present in the fractions containing the unbound 19S regulator (fractions 8 to 11) as well as in the fractions containing the 26S proteasome (fractions 15 to 19), supporting the suggestion of a direct interaction between the polyUb conjugate and the 26S proteasome via the 19S regulator complex.

26S proteasome stimulation requires ATP hydrolysis

As demonstrated by the experiments above, binding of the polyUb proteins to the 19S regulator has a crucial role in stimulating the 26S proteasome’s peptidase activity. Previously it was shown that binding of specific polyUb proteins to the 19S regulator occurs via subunit Rpt5 and is coupled to ATP hydrolysis11. Given that our MUC1-derived polyUb conjugates were generated in a manner analogous to the method used to produce Ub5-DHFR18, it may be inferred that the polyUb MUC1 derivatives also preferentially bind to the 19S regulator via Rpt5. To investigate whether ATP hydrolysis is required for polyUb MUC1 derivative–dependent induction of the 26S proteasome’s peptide-hydrolyzing activity, we measured the ChT-like activity of the 26S proteasome in the presence of Ub5-MUC4 and ATP analogs (Fig. 5).

Only in the presence of ATP, provided either by the residual ATP used for preparation of the 26S proteasome or by addition of 2 mM ATP, was the ChT-like activity enhanced in the presence of Ub5-MUC4. Noncleavable ATPγS slightly enhanced the basal peptidase activity but was not sufficient to support a stimulation of 26S proteasome–mediated peptide hydrolysis in the presence of Ub5-MUC4. As expected, removal of the residual ATP by apyrase led to a decrease in the basal proteasomal peptidase activity. No Ub5-MUC4–induced stimulation was detected under ATP-depleted conditions (Fig. 5).

[Figure 4 image: (a–c) bar charts of hydrolysis of Suc-LLVY-AMC by (a) the 20S proteasome and (b) the 26S proteasome with Ub5-MUC4, Ub-MUC4, MUC3, MUC1 or DMSO at 10-fold (×10) or 100-fold (×100) molar excess, and by (c) the 26S proteasome with Ub5-MUC4, Ub, Ub4 (×10, ×20, ×40) or Ub4 plus Ub-MUC4; (d) immunoblots (anti-MUC1, anti-Rpn2, anti-β1) of fractions 1–20 from 15–45% glycerol gradients of Ub5-MUC4 alone and of 26S + Ub5-MUC4, with the positions of PA700, 20S and 26S marked.]

Figure 4 Proteasome activation correlates with Ub5-MUC4 binding affinity.

(a) 20S proteasome (30 nM) alone or with the MUC1 derivatives

Ub5-MUC4, Ub-MUC4, MUC3 (MUC1950–958 epitope three times in

a row) or MUC1 (one MUC1950–958 epitope), at different molar ratios

of proteasome to MUC1 derivatives as indicated, was pre-incubated for15 min before addition of the fluorogenic peptide. The peptide solvent

DMSO was used as a control. The ChT-like activity of proteasomes was

measured using Suc-LLVY-AMC (100 mM) after a 15-min incubation period

at 37 1C. (b) The same experiment as in a was carried out, with the 26S

proteasome included (30 nM). (c) 26S proteasomes (30 nM) were

pre-incubated with Ub5-MUC4, monoubiquitin (300 nM), a mixture of

Ub-MUC4 (300 nM) and Ub4, and Ub4 at a 10-fold, 20-fold or 40-fold

molar excess, respectively, for 15 min before initiation of the fluorogenic

peptide assay. The ChT-like activity was determined using 100 mM

Suc-LLVY-AMC after a 15-min incubation. Values are the means (± s.d.) of three independent experiments. (d) Reactions containing 300 nM Ub5-MUC4

alone or 300 nM Ub5-MUC4 with 26S proteasome (30 nM) were incubated for 15 min in an ATP-containing buffer, then applied to 15–45% (v/v) glycerol

gradients and separated by density centrifugation. Fractions were collected and analyzed by immunoblotting with antibodies against the epitope

MUC1950–958 (Anti-MUC1), the PA700 subunit Rpn2 (Anti-Rpn2) and the 20S subunit b1 (anti-b1).

[Figure 5 graphic not reproduced: bar chart of hydrolysis of Suc-LLVY-AMC by 26S proteasomes alone or with Ub5-MUC4 (1:10) under no ATP, +ATP, +ADP and +ATPγS conditions, without or with apyrase treatment.]

Figure 5 PolyUb-mediated stimulation requires ATP hydrolysis. The 26S proteasome (30 nM) was treated without (left) or with (right) apyrase (25 U ml−1) for 15 min before addition of ATP or ATP analogs. Reaction mixtures with 300 nM Ub5-MUC4 (black bars) or without Ub5-MUC4 (white) were incubated in the presence of 2 mM ATP or ATP analogs, respectively, for an additional 15 min before initiation of the fluorogenic peptide assay. The ChT-like activity of the 26S proteasome was assayed with 100 µM Suc-LLVY-AMC during a 15-min incubation period. The values are means (± s.d.) from three experiments.


Conformational changes of the proteasome by Ub5-MUC4

The data obtained so far suggested that binding of polyUb proteins to the 19S regulator induces structural changes in the 20S proteasomes. To test this conclusion, we took advantage of existing gating mutants of the yeast Saccharomyces cerevisiae.

As revealed by X-ray structure analysis, the 20S proteasome of the open gate mutant α3ΔN, which lacks the N-terminal nine residues of the α3 subunit, possesses a maximally opened gate compared with the wild-type 26S proteasome, which has a closed gate (ref. 2). Consistent with this, the 26S proteasomes of the α3ΔN mutant show increased peptide-hydrolyzing activity in comparison to wild-type 26S proteasomes (Fig. 6).

Wild-type yeast 26S proteasomes degraded the polyUb protein Ub5-MUC8 in a similar fashion to 26S proteasomes from mammalian cells (Supplementary Fig. 3 online). Furthermore, Ub5-MUC4 stimulated the ChT-like peptidase activity of the wild-type 26S proteasome approximately 2.0-fold to 2.5-fold, demonstrating the evolutionary conservation of the underlying mechanism (Fig. 6a,b). However, despite an already maximally opened gate in the 26S proteasomes from the α3ΔN mutant, we still observed an approximately 1.3-fold stimulation of the ChT-like activity by Ub5-MUC4 (Fig. 6a,b).

The AAA+ ATPase gating mutants rpt2YA and rpt5YA of the base complex are characterized by a substitution at the terminal residue of the C-terminal docking motif and show an impaired gating mechanism (ref. 10). Despite this defect, the peptidase activity of the 26S proteasomes from the rpt2YA and rpt5YA mutants could still be stimulated by Ub5-MUC4, by approximately 1.8-fold and 1.4-fold, respectively (Fig. 6a).

Taken together, our experiments indicate that binding of Ub5-MUC4 to the 19S regulator not only affects gate opening of the 20S proteasome but also confers, independently of the gating mechanism, further conformational changes on the 20S proteasome that result in facilitated access of the substrates to the catalytic cavity, thereby increasing the proteolytic activities of the 26S proteasome.
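The fold-stimulation values quoted in this section are ratios of peptidase activity measured with and without Ub5-MUC4. The following is a minimal sketch of that arithmetic, assuming triplicate fluorescence readings in arbitrary units; the function name and all numbers are hypothetical illustrations and are not data from this study.

```python
from statistics import mean, stdev


def fold_stimulation(with_substrate, without_substrate):
    """Return mean activity with substrate divided by mean activity without it."""
    baseline = mean(without_substrate)
    if baseline <= 0:
        raise ValueError("baseline activity must be positive")
    return mean(with_substrate) / baseline


# Hypothetical triplicate readings (arbitrary fluorescence units), not measured data.
basal = [100.0, 110.0, 90.0]
stimulated = [240.0, 260.0, 250.0]

# Report each condition as mean and sample standard deviation, then the ratio.
print(f"basal:      {mean(basal):.1f} +/- {stdev(basal):.1f}")
print(f"stimulated: {mean(stimulated):.1f} +/- {stdev(stimulated):.1f}")
print(f"fold stimulation: {fold_stimulation(stimulated, basal):.2f}")
```

Normalizing to the matched untreated sample in this way is what allows mutants with different basal activities, such as the open gate mutant, to be compared on the same scale.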

DISCUSSION

Degradation of polyUb proteins by the 26S proteasome requires a coordinated multistep process that, despite its importance for protein homeostasis, is not completely understood. Known steps required for degradation of polyUb proteins by the 26S proteasomes include unlocking of the 20S proteasome during 19S regulator and 20S proteasome assembly (refs. 22,23), binding of the substrate to the 19S regulator (ref. 24), unfolding of the substrate involving the ATPase subunits of the base (refs. 25,26), deubiquitylation (ref. 27) and translocation of the unfolded substrate into the catalytic chamber of the 20S proteasome (refs. 10,12).

Our data reveal a previously undiscovered conserved functional interdependence between the 19S regulator and the 20S proteasome that affects the proteolytic activities of the 26S proteasome. We show that a process consisting of binding of polyUb conjugates to the 19S regulator and ATP hydrolysis induces opening of the gate and confers conformational changes onto the 20S proteasome that further facilitate substrate access to the catalytic cavity and the active sites.

In the experiments reported here, we used polyUb MUC1 derivatives that were produced analogously to the previously described Ub5-DHFR substrate. Binding of the polyUb substrate to the 19S regulator of the 26S proteasome, as shown here by co-sedimentation studies, stimulates the peptidase activities of the 20S proteasome, requires ATP hydrolysis and is not supported by ATPγS. Furthermore, peptidase activation was independent of deubiquitylation and not supported by monoubiquitin fusion proteins. Also, no or only negligible stimulatory effects were exerted on the 26S proteasome by Ub4 alone. As our polyUb proteins were already unfolded, the stimulatory effect does not seem to be mediated by other protein degradation–coupled steps such as unfolding and translocation. Notably, binding of the nondegradable substrate Ub5-HPV-E6 was also able to stimulate the peptidase activity, revealing that polyUb substrate–induced activation via the 19S regulator is not necessarily linked to protein degradation.

Cross-linking studies had demonstrated that an analogous polyUb protein, Ub5-DHFR, was recognized with high affinity by only the Rpt5 subunit of the 19S regulator in a process that is modulated by ATP hydrolysis and impaired by ATPγS (ref. 11). Our data therefore suggest that the polyUb MUC1 derivatives probably also interact with the ATPase subunit Rpt5 of the base complex, which attaches the 19S regulator to the 20S proteasome. In support of this idea, deletion of the polyUb binding subunit Rpn10 exerted no effect on the stimulation of the 26S proteasome’s peptidase activity, demonstrating that Rpn10 is not necessary for substrate-induced activation of the 26S proteasome (Fig. 6a,b). Nevertheless, we cannot exclude an involvement of the 19S lid complex, which also has been described as an adaptor for polyUb chains.

The fact that binding of polyUb proteins affects only the Vmax value of the peptide-hydrolyzing reaction but leaves the Km value unaltered (data not shown) is reminiscent of the reported PA28-induced activation of 20S proteasomes (ref. 28). Furthermore, it also indicates that the polyUb substrate–induced stimulation of the 26S proteasome’s peptidase activities is the consequence of gate opening and facilitated substrate entry or product exit from the catalytic cavity.
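The kinetic statement above can be illustrated with the Michaelis–Menten rate law, v = Vmax[S]/(Km + [S]). The following is a minimal sketch, assuming simple Michaelis–Menten behavior for hydrolysis of the fluorogenic peptide; the function name and every numerical value are hypothetical and are not kinetic parameters from this study. It shows that when only Vmax changes and Km is held constant, the fold stimulation is the same at every peptide concentration.

```python
def mm_rate(substrate_conc, vmax, km):
    """Michaelis-Menten rate: v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate_conc / (km + substrate_conc)


# Hypothetical parameters (arbitrary units), chosen only for illustration.
KM = 50.0          # assumed unchanged by polyUb substrate binding
VMAX_BASAL = 1.0   # assumed Vmax without polyUb substrate
VMAX_STIM = 2.0    # assumed Vmax with polyUb substrate bound

# Fold stimulation at low, intermediate and high peptide concentrations.
ratios = []
for conc in (10.0, 100.0, 1000.0):
    ratio = mm_rate(conc, VMAX_STIM, KM) / mm_rate(conc, VMAX_BASAL, KM)
    ratios.append(ratio)
    print(f"[S] = {conc:7.1f}  fold stimulation = {ratio:.2f}")
```

Because the term [S]/(Km + [S]) cancels in the ratio, the stimulation equals the ratio of the two Vmax values. A change in Km would instead make the fold stimulation depend on the peptide concentration.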

Previous biochemical data seemed to establish that binding of the 19S regulator to the 20S proteasome via the six AAA+ ATPases of the base induces almost maximal opening of the gate (ref. 4) and activates the peptide-hydrolyzing activities of the 20S proteasome to a similar extent as the binding of the heptameric proteasome activator PA28 (ref. 29). The observation that binding of polyUb substrate to the 19S regulator results in an additional activation of the 26S proteasomal peptidase activity now suggests that, in the absence of polyUb substrates, the 26S proteasome exists predominantly in a semiactivated state in which opening of the gate is not fully stabilized. Such a conclusion is supported by image analyses of the Drosophila melanogaster 26S proteasome, revealing a remarkably flexible linkage between the 19S regulator and the 20S core complex, with a peculiar wagging-type movement of the 19S regulator relative to the 20S proteasomes (ref. 30). These investigations suggest that in the absence of any substrate the width of the gate may vary considerably depending on the interaction state of the base complex with its six ATPases and the α-rings of the 20S proteasome.

[Figure 6 graphic not reproduced. Panel a: in-gel Suc-LLVY-AMC activity, without and with Ub5-MUC4, and Coomassie staining of 26S proteasomes from WT, rpt2YA, rpt5YA, α3ΔN and Δrpn10 strains, with fold-stimulation values annotated; panel b: bar chart of hydrolysis of Suc-LLVY-AMC by WT, α3ΔN and Δrpn10 26S proteasomes alone or with Ub5-MUC4 (1:10).]

Figure 6 Conformational changes of the proteasome caused by Ub5-MUC4. (a) Affinity-purified proteasomes (2 nM) from wild type (WT), α3ΔN and Δrpn10 mutant strains, and 6 nM proteasomes from the rpt2YA and rpt5YA mutant strains were resolved on 3.5–6.0% nondenaturing gels. Gels were pre-incubated for 30 min at 37 °C with or without Ub5-MUC4 protein (60 nM) in the buffer used for the degradation assay (see above) before addition of 100 µM Suc-LLVY-AMC fluorogenic peptide. After an additional 15-min incubation with the fluorogenic peptide, proteasome activity was visualized and quantified by densitometry as indicated. The gels were stained with Coomassie Blue to assess the amount of protein in each sample. (b) Affinity-purified proteasomes (4 nM) from the wild type and the α3ΔN strains, and 5 nM proteasomes from the Δrpn10 mutant strain, were pre-incubated with 40 nM Ub5-MUC4 for 15 min at 37 °C before addition of 100 µM Suc-LLVY-AMC fluorogenic peptide. After a 15-min incubation with fluorogenic peptide, the peptidase activity of 26S proteasomes was measured.

Previous work also revealed that binding to the 20S proteasome and gate opening require the docking of only two or three of the six ATPases of the 19S regulator base (ref. 10). Both Rpt2 and Rpt5 could stimulate gate opening via their C termini, which each possess a 7-residue peptide containing a so-called HbYX motif that is responsible for docking of the ATPase subunits into one specific intersubunit pocket of the α-ring of the 20S proteasome. In fact, the measurable proteolytic activity was found to vary depending on the number of docked ATPase subunits (ref. 10). There is, however, no direct experimental evidence so far that the varying width of the gate structure proposed by structure analysis is due to alternating ATPase subunit–mediated interactions between the 19S regulator and the 20S proteasome. In consequence, full activation and gating of the 26S proteasomes would require structural and conformational alterations transferred from the 19S regulator to the 20S proteasome upon substrate binding.

In accordance with the view that opening of the central pore is not stable and that 26S proteasomes in the absence of polyUb substrates probably reside in a semiactivated state in vitro, peptide-hydrolyzing activity of wild-type yeast 26S proteasomes has been shown to be considerably reduced in comparison to α3ΔN mutant, open gate 26S proteasomes (ref. 2; Fig. 6). Although substrate-induced stabilization of gate opening represents one part of the mechanism to explain peptidase activation, our experiments also demonstrate that, despite a probably close-to-maximally opened gate in the α3ΔN open gate mutant or the impaired gating mechanism of the yeast rpt2YA and rpt5YA mutants, binding of Ub5-MUC4 to the 19S regulator still activated the peptidase activity of the different types of 26S proteasome to a considerable extent. Ub5-MUC4 induced activation of the three types of 26S proteasome mutants by increasing the maximal degradation rate of the substrate, whereas substrate affinity remained unchanged. This suggests that binding of polyUb substrates to the 19S regulator also exerts conformational changes on the 20S proteasome particle, facilitating the accessibility of the active sites and substrate entry and thereby contributing to enhanced proteolytic activity of the 26S proteasome. In fact, recent structural studies on the mammalian 26S proteasome suggest that docking of the 19S regulator to the 20S core, probably via Rpt2, Rpt5 and possibly also Rpt1, induces a radial displacement of the adjacent α subunits of the 20S proteasome, resulting in a more widely opened central channel that extends toward the catalytic cavity formed by the β subunits (ref. 31).

Recently, occupation of the active sites by a proteasome inhibitor has been shown to result in a 26S proteasome–stabilizing inside to outside effect that is probably due to induction of conformational changes (ref. 16). The experiments performed here, however, show that even when the β5 active site is occupied by lactacystin, polyUb MUC1 derivatives still activate the other peptidase activities of the 26S proteasome, a result that excludes complex stabilization as a major reason for the observed substrate-induced activation. Analogously to the described inside to outside effect, our data reported here may also be interpreted as the first evidence for a true outside to inside effect induced by substrate binding to the 19S regulator. Thus, rather than being in contradiction, the inside to outside effect and the outside to inside effect described here may in fact cooperate in substrate-induced conformational changes of the 26S proteasome to establish a structure-determined feedforward mechanism as a driving force for efficient protein degradation.

In recognition of the complex regulatory gating principle and on the basis of the data obtained here, the following new model for 26S proteasome activity and function emerges. Binding of the 19S regulator to the latent 20S proteasome results in the formation of a ‘semiactivated’ or ‘preactivated’ 26S proteasome. In this state, the opening of the central gate is not fully stabilized or incomplete. Binding of a polyUb substrate to the 19S regulator in conjunction with an ATP-consuming process induces a more rigid interaction between the 19S regulator and the 20S proteasome via Rpt2 and Rpt5, thereby stabilizing opening of the central gate. Through the interaction with the adjacent α subunit ring and stabilized radial displacement of the α subunits, polyUb substrate binding at the same time induces conformational changes in the 20S proteasome that affect the width of the substrate channel leading the substrate to the catalytic cavity. In consequence, there exists a mechanism whereby, as a result of binding to the 19S regulator, polyUb substrates facilitate active-site access and allosterically accelerate their own degradation by the 26S proteasome.

METHODS

Plasmids and antibodies. We generated the plasmids pET3a-D77-Ub and pET3a-K48C-Ub as described previously (ref. 18). cDNA of ubiquitin (Ub)-MUC1 derivatives was obtained by methods similar to those described for Ub-(MUC1(950–958))4 (ref. 17). The following antibodies were used: anti-MUC1(950–958) (charge 14G2, A. Zvirbliene, Institute of Biotechnology, Vilnius, Lithuania), polyclonal anti-β1 (charge 43/4, Institute of Biochemistry, Campus Charite Mitte, Berlin), and polyclonal anti-Rpn2 (Biomol). The secondary antibodies were purchased from Seramun.

Generation of recombinant polyUb derivatives. Recombinant ubiquitin and Ub4 were produced as described previously (ref. 18). The polyUb MUC1 derivatives Ub5-MUC8 and Ub5-MUClong were produced in a manner analogous to that described for Ub5-MUC4 (ref. 17).

Proteasome purification. We purified 20S and 26S proteasomes from human erythrocytes as described previously (refs. 32,33). The protein concentration was determined by Bradford assay; complex purity and integrity were tested by SDS-PAGE, immunoblotting and native-PAGE as described in Supplementary Figure 4 online.

Degradation of polyUb MUC1 derivatives by proteasomes. The 26S proteasome (30 nM) was incubated with 300 nM polyUb MUC1 derivatives at 37 °C in a buffer containing 50 mM Tris-HCl, pH 7.6, 10 mM KCl, 0.5 mM DTT, 2 mM ATP, 5 mM MgCl2, 2.5% (v/v) glycerol and a phosphocreatine-based ATP-regenerating system. To obtain an ATP-free medium, the 26S proteasome was treated with 25 U ml−1 apyrase (Sigma) for 15 min before addition of polyUb MUC1 derivatives. Aliquots were taken at the indicated time points and products were analyzed by SDS-PAGE and immunoblotting.

SDS-PAGE and immunoblotting. Proteins were separated by 12% SDS-PAGE, transferred onto nitrocellulose and incubated with the appropriate antibodies. Antibody binding was detected with appropriate secondary antibodies in connection with the ECL technique (Amersham) according to the instructions of the manufacturer.


Density centrifugation of the proteasome–substrate complex. Degradation mixtures containing 300 nM Ub5-MUC4 with or without 30 nM 26S proteasome, as described above, were applied after a 15-min incubation period to a 15–45% (w/v) glycerol gradient and separated by density centrifugation at 284,000g for 16 h using a Beckman Coulter SW 40 rotor. Fractions of 0.5 ml were collected, and proteins from each fraction were precipitated with trichloroacetic acid (TCA) and analyzed by SDS-PAGE and immunoblotting.
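Reproducing the centrifugation step on a given instrument requires converting the stated relative centrifugal force into a rotor speed. The following is a minimal sketch using the standard relation RCF = 1.118 × 10^-5 × r × rpm^2, with r in cm. The rotor radius used below is an assumption (an approximate maximum radius for a swinging-bucket rotor of this type) and is not given in the text; the value from the rotor manual should be substituted before use.

```python
import math

# Standard conversion constant for RCF (x g) with radius in cm and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_for_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) at a given speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given RCF at a given radius."""
    if rcf <= 0 or radius_cm <= 0:
        raise ValueError("rcf and radius_cm must be positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Assumption: approximate maximum radius in cm; check the rotor documentation.
ASSUMED_RADIUS_CM = 15.88

speed = rpm_for_rcf(284_000, ASSUMED_RADIUS_CM)
print(f"speed for 284,000 x g at r = {ASSUMED_RADIUS_CM} cm: {speed:.0f} rpm")
```

Because the force scales with radius, the same speed gives a lower force at the top of the tube than at the bottom, so it matters whether a protocol quotes the maximum or the average value.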

Peptidase activities of mammalian proteasomes. 26S and 20S proteasomes (30 nM) were incubated with or without 300 nM MUC1 derivative for 15 min before addition of fluorogenic peptides in the buffers described above. The peptidase assay was initiated by addition of 100 µM fluorogenic peptides: Suc-LLVY-AMC for ChT-like activity; Bz-VGR-AMC for T-like activity; and Z-LLE-AMC for PGPH activity. These peptides were purchased from Bachem. After a 15-min incubation at 37 °C the peptide hydrolysis was measured fluorometrically. To hydrolyze endogenous ATP, 26S proteasome was treated with apyrase (25 U ml−1) for 15 min. After a 15-min incubation with 2 mM ATP (Serva) or 2 mM ATP analogs (Sigma) in the presence or absence of the polyUb MUC1 derivative Ub5-MUC4, the ChT-like activity of the 26S proteasome was determined as described above. To test the effects of various ubiquitin species, the 26S proteasome was pre-incubated with different concentrations (see figure legends) of ubiquitin, Ub4, Ub-MUC4 and Ub5-MUC4, respectively, for 15 min before addition of the fluorogenic peptide substrate. For inhibitor studies, the 26S proteasome was pre-incubated with 50 µM clasto-lactacystin (Calbiochem) or 5 mM o-phenanthroline (Sigma) for 15 min before addition of the polyUb proteins. After a 15-min incubation, 100 µM Suc-LLVY-AMC was added and the hydrolysis of the fluorogenic peptide was measured as described above.

Yeast strains and proteasomes. Yeast proteasomes were affinity purified from cells expressing TEV-cleavable protein A–tagged Rpn11 instead of the endogenous subunit (ref. 34). The IgG binding domains of protein A fused to Rpn11 were cleaved by TEV protease in 50 mM Tris-HCl, pH 7.5, 50 mM NaCl, 5 mM MgCl2, 2 mM ATP, 0.5 mM EDTA, 0.5 mM DTT and 10% (v/v) glycerol. Proteasomes were subjected to 3.5–6.0% native PAGE. In-gel activity assays using Suc-LLVY-AMC as fluorogenic peptide were performed in the buffer used for the degradation assay in the presence or absence of Ub5-MUC4 (see figure legends). SDS-PAGE and Coomassie Blue staining were used to control for homogeneity of proteasome preparations. The genotypes of the yeast strains used in this study can be found in the Supplementary Methods online.

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS

We would like to thank D. Finley (Harvard Medical School) for the yeast strains rpt2YA and rpt5YA, J. Dohmen (University of Cologne) for the rpn10 mutant strain and A. Lehmann for excellent technical assistance. This work was supported by grants from the Deutsche Forschungsgemeinschaft (SFB421 and SFB740 to P.-M.K.).

AUTHOR CONTRIBUTIONS

D.B.-O. and A.H. performed all biochemical experiments; C.E. purified yeast proteasomes; G.C. generated the MUC1 peptides; M.S. and H.-G.H. supervised the biochemical and enzyme kinetic experiments; B.D. purified the mammalian proteasomes; D.B.-O., B.D. and P.-M.K. prepared the manuscript; P.-M.K. designed the project.

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Voges, D., Zwickl, P. & Baumeister, W. The 26S proteasome: a molecular machine designed for controlled proteolysis. Annu. Rev. Biochem. 68, 1015–1068 (1999).

2. Groll, M. et al. A gated channel into the proteasome core particle. Nat. Struct. Biol. 7, 1062–1067 (2000).

3. Rechsteiner, M. & Hill, C.P. Mobilizing the proteolytic machine: cell biological roles of proteasome activators and inhibitors. Trends Cell Biol. 15, 27–33 (2005).

4. Liu, C.W. et al. ATP binding and ATP hydrolysis play distinct roles in the function of 26S proteasome. Mol. Cell 24, 39–50 (2006).

5. Verma, R. et al. Role of Rpn11 metalloprotease in deubiquitination and degradation by the 26S proteasome. Science 298, 611–615 (2002).

6. Yao, T. & Cohen, R.E. A cryptic protease couples deubiquitination and degradation by the proteasome. Nature 419, 403–407 (2002).

7. Rubin, D.M., Glickman, M.H., Larsen, C.N., Dhruvakumar, S. & Finley, D. Active site mutants in the six regulatory particle ATPases reveal multiple roles for ATP in the proteasome. EMBO J. 17, 4909–4919 (1998).

8. Braun, B.C. et al. The base of the proteasome regulatory particle exhibits chaperone-like activity. Nat. Cell Biol. 1, 221–226 (1999).

9. Kohler, A. et al. The axial channel of the proteasome core particle is gated by the Rpt2 ATPase and controls both substrate entry and product release. Mol. Cell 7, 1143–1152 (2001).

10. Smith, D.M. et al. Docking of the proteasomal ATPases’ carboxyl termini in the 20S proteasome’s α ring opens the gate for substrate entry. Mol. Cell 27, 731–744 (2007).

11. Lam, Y.A., Lawson, T.G., Velayutham, M., Zweier, J.L. & Pickart, C.M. A proteasomal ATPase subunit recognizes the polyubiquitin degradation signal. Nature 416, 763–767 (2002).

12. Pickart, C.M. & Cohen, R.E. Proteasomes and their kin: proteases in the machine age. Nat. Rev. Mol. Cell Biol. 5, 177–187 (2004).

13. Seeger, M., Ferrell, K., Frank, R. & Dubiel, W. HIV-1 tat inhibits the 20 S proteasome and its 11 S regulator-mediated activation. J. Biol. Chem. 272, 8145–8148 (1997).

14. Ferrell, K., Wilkinson, C.R., Dubiel, W. & Gordon, C. Regulatory subunit interactions of the 26S proteasome, a complex problem. Trends Biochem. Sci. 25, 83–88 (2000).

15. Babbitt, S.E. et al. ATP hydrolysis-dependent disassembly of the 26S proteasome is part of the catalytic cycle. Cell 121, 553–565 (2005).

16. Kleijnen, M.F. et al. Stability of the proteasome can be regulated allosterically through engagement of its proteolytic active sites. Nat. Struct. Mol. Biol. 14, 1180–1188 (2007).

17. Hetfeld, B.K. et al. The zinc finger of the CSN-associated deubiquitinating enzyme USP15 is essential to rescue the E3 ligase Rbx1. Curr. Biol. 15, 1217–1221 (2005).

18. Thrower, J.S., Hoffman, L., Rechsteiner, M. & Pickart, C.M. Recognition of the polyubiquitin proteolytic signal. EMBO J. 19, 94–102 (2000).

19. Fenteany, G. et al. Inhibition of proteasome activities and subunit-specific amino-terminal threonine modification by lactacystin. Science 268, 726–731 (1995).

20. Schmidtke, G., Emch, S., Groettrup, M. & Holzhutter, H.G. Evidence for the existence of a non-catalytic modifier site of peptide hydrolysis by the 20 S proteasome. J. Biol. Chem. 275, 22056–22063 (2000).

21. Kisselev, A.F., Kaganovich, D. & Goldberg, A.L. Binding of hydrophobic peptides to several non-catalytic sites promotes peptide hydrolysis by all active sites of 20 S proteasomes. Evidence for peptide-induced channel opening in the α-rings. J. Biol. Chem. 277, 22260–22270 (2002).

22. Glickman, M.H. Getting in and out of the proteasome. Semin. Cell Dev. Biol. 11, 149–158 (2000).

23. Kohler, A. et al. The substrate translocation channel of the proteasome. Biochimie 83, 325–332 (2001).

24. Verma, R., Oania, R., Graumann, J. & Deshaies, R.J. Multiubiquitin chain receptors define a layer of substrate selectivity in the ubiquitin-proteasome system. Cell 118, 99–110 (2004).

25. Benaroudj, N., Zwickl, P., Seemuller, E., Baumeister, W. & Goldberg, A.L. ATP hydrolysis by the proteasome regulatory complex PAN serves multiple functions in protein degradation. Mol. Cell 11, 69–78 (2003).

26. Navon, A. & Goldberg, A.L. Proteins are unfolded on the surface of the ATPase ring before transport into the proteasome. Mol. Cell 8, 1339–1349 (2001).

27. Amerik, A.Y. & Hochstrasser, M. Mechanism and function of deubiquitinating enzymes. Biochim. Biophys. Acta 1695, 189–207 (2004).

28. Stohwasser, R., Salzmann, U., Giesebrecht, J., Kloetzel, P.M. & Holzhutter, H.G. Kinetic evidences for facilitation of peptide channelling by the proteasome activator PA28. Eur. J. Biochem. 267, 6221–6230 (2000).

29. Kopp, F., Dahlmann, B. & Kuehn, L. Reconstitution of hybrid proteasomes from purified PA700–20 S complexes and PA28αβ activator: ultrastructure and peptidase activities. J. Mol. Biol. 313, 465–471 (2001).

30. Walz, J. et al. 26S proteasome structure revealed by three-dimensional electron microscopy. J. Struct. Biol. 121, 19–29 (1998).

31. da Fonseca, P.C. & Morris, E.P. Structure of the human 26S proteasome: Subunit radial displacements open the gate into the proteolytic core. J. Biol. Chem. 283, 23305–23314 (2008).

32. Groettrup, M. et al. The interferon-γ-inducible 11 S regulator (PA28) and the LMP2/LMP7 subunits govern the peptide production by the 20 S proteasome in vitro. J. Biol. Chem. 270, 23808–23815 (1995).

33. Dahlmann, B., Kuehn, L. & Reinauer, H. Studies on the activation by ATP of the 26 S proteasome complex from rat skeletal muscle. Biochem. J. 309, 195–202 (1995).

34. Wendler, P., Lehmann, A., Janek, K., Baumgart, S. & Enenkel, C. The bipartite nuclear localization sequence of Rpn2 is required for nuclear import of proteasomal base complexes via karyopherin αβ and proteasome functions. J. Biol. Chem. 279, 37751–37762 (2004).


Replisome stalling and stabilization at CGG repeats, which are responsible for chromosomal fragility

Irina Voineagu1,2, Christine F Surka1, Alexander A Shishkin1, Maria M Krasilnikova3 & Sergei M Mirkin1

Expanded CGG repeats cause chromosomal fragility and hereditary neurological disorders in humans. Replication forks stall at CGG repeats in a length-dependent manner in primate cells and in yeast. Saccharomyces cerevisiae proteins Tof1 and Mrc1 facilitate replication fork progression through CGG repeats. Remarkably, the fork-stabilizing role of Mrc1 does not involve its checkpoint function. Thus, chromosomal fragility might occur when forks stalled at expanded CGG repeats escape the S-phase checkpoint.

Fragile sites are chromosomal loci that look constricted or even broken upon replication inhibition. Rare fragile sites are associated with hereditary neurological disorders in humans, such as fragile X syndrome and FRAXE mental retardation, and are caused by expansions of (CGG)n·(CCG)n repeats (ref. 1). Both repeat expansions and chromosomal fragility seem to depend on faulty DNA replication caused by unusual structures of the repeats (ref. 2). CGG repeats adopt hairpin-like and quadruplex-like DNA structures (refs. 3,4), which arrest DNA replication in vitro (refs. 5,6) and in unicellular organisms (refs. 7,8). Is the same true for mammalian cells?

We monitored replication fork progression through CGG repeats in COS-1 fibroblasts using a pSV2neo episome as previously described (ref. 9). This vector contains an SV40 replication origin functional in cells expressing T antigen. T antigen acts as an initiator and a replicative DNA helicase; the other components of the replisome come from the host cell. (CGG)n repeats of normal (n = 18), expansion threshold (n = 40) or premutation (n = 105) lengths were cloned into the pSV2neo vector in either a nontranscribed or a transcribed area. Each repeat was cloned in two orientations relative to the replication origin, positioning either (CGG)n or (CCG)n runs into the lagging-strand template. Their distance from the origin was sufficient to separate stalled intermediates from the nonreplicated molecules, avoiding problems in detection of repeat-mediated replication stalls (ref. 10).

Normal-length CGG repeats did not affect replication fork progression in mammalian cells (Fig. 1). CGG repeats of the expansion-threshold size caused defined replication stall signals whether they were placed in a transcribed or a nontranscribed area. The replication fork stalling further intensified for 105 CGG repeats. (A diffused descending arm of the Y arc here results from contractions of long repeats during plasmid propagation in Escherichia coli.) Besides being stronger (Fig. 1b), the area of delayed replication spanned a large portion of the Y arc (Fig. 1a). In contrast, replication through a twice-longer inverted repeat in mammalian cells produced a defined stall site (ref. 9). We believe that the extended replication slow zone might be due to the lack of a defined symmetry center in a CGG repeat, allowing it to fold into stable structures at multiple points. Notably, for both

[Figure 1 graphic not reproduced. Panel a: maps of the vector showing the position of the (CGG)n/(CCG)n insert relative to the Neo and Amp genes and the AflIII and XmnI restriction sites, with two-dimensional gels for 18, 40 and 105 repeats; panel b: plot of replication stalling (fold) against number of repeats (18, 40, 105) for the CGG and CCG orientations.]

Figure 1 Replication fork stalling at CGG repeats in mammalian cells. (a) Fork progression through CGG repeats in COS-1 fibroblasts. The location of the repeat within the vector is shown above the corresponding gels. The repeat constructs are designated according to the lagging-strand template sequence. Numbers above the gels show the number of repetitive units. Solid arrows indicate replication stall sites. (b) Quantitative analysis of the data obtained in a. Replication stalling was quantified as the maximum radioactive count within the bulge normalized to the adjacent arc; error bars show s.d.
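The quantification described in the legend to Figure 1b can be sketched as a simple ratio. The following is a minimal illustration only: it assumes that "normalized to the adjacent arc" means division by the mean count of a neighboring, smooth section of the Y arc, which is an interpretation and is not specified in the legend. The function name and the counts are hypothetical.

```python
from statistics import mean


def stalling_ratio(bulge_counts, arc_counts):
    """Maximum count in the bulge divided by the mean count of the adjacent arc."""
    if not bulge_counts or not arc_counts:
        raise ValueError("both regions need at least one count")
    reference = mean(arc_counts)
    if reference <= 0:
        raise ValueError("adjacent arc signal must be positive")
    return max(bulge_counts) / reference


# Hypothetical radioactive counts along the Y arc (arbitrary units), not measured data.
bulge = [120.0, 180.0, 150.0]  # positions within the stall signal
arc = [100.0, 90.0, 110.0]     # neighboring positions on the smooth arc

ratio = stalling_ratio(bulge, arc)
print(f"replication stalling: {ratio:.2f}-fold")
```

Under this reading, a value close to 1 means no detectable stall, which matches the scale of the fold axis in panel b.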

Received 16 September; accepted 13 November; published online 11 January 2008; doi:10.1038/nsmb.1527

1Department of Biology, Tufts University, 165 Packard Avenue, Medford, Massachusetts 02155, USA. 2Department of Biochemistry and Molecular Genetics, University of Illinois at Chicago, 900 South Ashland Avenue, Chicago, Illinois 60607, USA. 3Department of Biochemistry and Molecular Biology, Pennsylvania State University, 407 South Frear Laboratory, University Park, Pennsylvania 16802, USA. Correspondence should be addressed to S.M.M. ([email protected]).

226 VOLUME 16 NUMBER 2 FEBRUARY 2009 NATURE STRUCTURAL & MOLECULAR BIOLOGY

BRIEF COMMUNICATIONS

©2009 Nature America, Inc. All rights reserved.


40 and 105 CGG repeats, there was no difference in the severity of the replication blockage between different orientations (Fig. 1b). Thus, CGG repeats stall replication fork progression in mammalian cells in a length-dependent, but orientation-independent, manner. These results are qualitatively similar to our previous observations in yeast8, but the minimal length of the repeat required for replication stalling is higher in mammalian cells, closely matching the threshold length for expansions in humans.

The lack of orientation dependence in replication stalling can be interpreted in two ways: (i) stalling occurs with equal efficiency on both template strands, or (ii) stalling occurs primarily on the lagging-strand template upon the formation of equally stable structures by the repetitive CGG and CCG runs. We favor the second interpretation, as formation of an alternative DNA structure requires a DNA segment to become single stranded, a feature that is intrinsic to the lagging-strand template. In this case, the hairpin-like structures are likely to be responsible for fork stalling by the (CGG)n·(CCG)n repeat, as they can be formed by both strands of the repeat4, in contrast to strand-specific quadruplexes5.

Given the similarity between repeat-mediated replication blockage in mammalian and yeast cells, we studied the mechanisms regulating CGG-mediated replication blockage in yeast. We looked at the effect of fork-stabilizing proteins on the replication of CGG repeats in a yeast 2-µm plasmid via two-dimensional gel electrophoresis8. Tof1 is a fork-stabilizing protein that supports normal replication11 and stabilizes replisomes stalled by hydroxyurea treatment12. In addition, it promotes fork pausing at protein-mediated replication barriers, such as Fob1–DNA complexes, centromeres or tRNA genes, by counteracting the Rrm3 helicase13. Consequently, replication stalling at these sites is absent in Tof1 knockouts14. In contrast, we found that the strength of the fork stalling at CGG repeats was increased in the Δtof1 strain compared with that in the wild-type strain (Fig. 2). This increase was not due to a general replication slowdown in the Tof1 mutant, as the strength of repeat-mediated stalling was normalized to the strength of the Y arc (Supplementary Fig. 1 and Supplementary Methods online). Tof1 mutation similarly affected fork stalling at inverted repeats9. Thus, Tof1 sustains forks stalled by DNA structures, whereas its fork-pausing, counter-helicase function is limited to protein-mediated barriers.

Yeast Mrc1 forms a complex with the Tof1 and Csm3 proteins that travels with the replication fork12. Besides maintaining the integrity of stalled replisomes, Mrc1 mediates replication checkpoint responses15. We found that the severity of replication stalling at the CGG repeat was increased in the Δmrc1 strain compared with that in the wild-type strain (Fig. 2a,b). To distinguish whether this effect was due to the checkpoint or the fork-stabilizing function of the protein, we analyzed its separation-of-function mutant mrc1AQ, which cannot be phosphorylated by the checkpoint kinases15. When we expressed the mrc1AQ allele in the Δmrc1 strain, the severity of repeat-mediated fork stalling was restored to the wild-type level (Fig. 2a,b). Furthermore, repeat-mediated replication stalling in the Δtof1Δmrc1 double mutant was similar to that in the individual mutants. Thus, the fork-stabilizing function of the Mrc1 protein, rather than its checkpoint function, helps the replisome progress through CGG repeats. By the same token, replisomes stalled at CGG repeats do not seem to trigger an intra-S-phase checkpoint response.

We propose that chromosomal fragility arises when a replication fork stalled at the expanded repeat escapes the S-phase checkpoint (Fig. 3). This could result in the continuation of the cell cycle before completion of replication around the repeat16,17. As mitosis proceeds, the under-replicated areas would convert into constrictions and/or double-stranded breaks.

Recent data indicate that breakage and instability of trinucleotide repeats is increased in yeast Tof1 and Mrc1 knockouts18,19. Combined with our current data, these results point to the role of fork stalling in repeat instability. Similarly to our observations, Mrc1 prevented repeat contractions independently of its checkpoint function. The checkpoint function, however, specifically prevented repeat expansions. Thus, replication fork stalling may trigger repeat contractions, whereas expansions may additionally involve a DNA-damage checkpoint–inducing event.

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS
We thank K. Mirkin for her help with plasmid construction, C. Freudenreich for many useful suggestions, S. Elledge (Harvard Medical School) for the plasmid with the mrc1AQ allele and J. and P. White for their generosity. Supported by the US National Institutes of Health grant GM60987 to S.M.M.

AUTHOR CONTRIBUTIONS
I.V. designed and performed experiments in yeast and mammalian cells, and wrote the paper; C.F.S. performed replication studies in mammalian cells; A.A.S. performed cassettes for yeast knockouts; M.M.K. contributed to plasmid

[Figure 2 graphic: (a) gels and (b) plot of replication stalling (fold) for the WT, Δtof1, Δmrc1, mrc1AQ and Δtof1Δmrc1 strains.]

Figure 2 Genetic control of replication fork pausing at CGG repeats. (a) Replication fork progression through (CGG)40 in the wild-type and mutant S. cerevisiae strains. Arrows show replication stall sites. (b) Quantitative analysis of data obtained in a. Replication stalling was quantified as the ratio between the maximum radioactive count within the bulge normalized to the adjacent arc; error bars show s.d.

[Figure 3 graphic: flow diagram. Fork stalling at the CGG repeat → S-phase checkpoint escape → delayed replication of the CGG repeat area and cell-cycle progression → unreplicated area → chromosomal fragility.]

Figure 3 Model of chromosomal fragility at expanded CGG repeats.


construction; S.M.M. designed experiments, supervised the whole project and wrote the paper.

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Fu, Y.H. et al. Cell 67, 1047–1058 (1991).
2. Mirkin, S.M. Nature 447, 932–940 (2007).
3. Fry, M. & Loeb, L.A. Proc. Natl. Acad. Sci. USA 91, 4950–4954 (1994).
4. Gacy, A.M., Goellner, G., Juranic, N., Macura, S. & McMurray, C.T. Cell 81, 533–540 (1995).
5. Usdin, K. & Woodford, K.J. Nucleic Acids Res. 23, 4202–4229 (1995).
6. Kang, S., Ohshima, K., Shimizu, M., Amirhaeri, S. & Wells, R.D. J. Biol. Chem. 270, 27014–27021 (1995).
7. Samadashwily, G.M., Raca, G. & Mirkin, S.M. Nat. Genet. 17, 298–304 (1997).
8. Pelletier, R., Krasilnikova, M.M., Samadashwily, G.M., Lahue, R.S. & Mirkin, S.M. Mol. Cell. Biol. 23, 1349–1357 (2003).
9. Voineagu, I., Narayanan, V., Lobachev, K.S. & Mirkin, S.M. Proc. Natl. Acad. Sci. USA 105, 9936–9941 (2008).
10. Nichol Edamura, K., Leonard, M.R. & Pearson, C.E. Am. J. Hum. Genet. 76, 302–311 (2005).
11. Hodgson, B., Calzada, A. & Labib, K. Mol. Biol. Cell 18, 3894–3902 (2007).
12. Katou, Y. et al. Nature 424, 1078–1083 (2003).
13. Mohanty, B.K., Bairwa, N.K. & Bastia, D. Proc. Natl. Acad. Sci. USA 103, 897–902 (2006).
14. Calzada, A., Hodgson, B., Kanemaki, M., Bueno, A. & Labib, K. Genes Dev. 19, 1905–1919 (2005).
15. Osborn, A.J. & Elledge, S.J. Genes Dev. 17, 1755–1767 (2003).
16. Torres-Rosell, J. et al. Science 315, 1411–1415 (2007).
17. Hansen, R.S., Canfield, T.K., Lamb, M.M., Gartler, S.M. & Laird, C.D. Cell 73, 1403–1409 (1993).
18. Freudenreich, C.H. & Lahiri, M. Cell Cycle 3, 1370–1374 (2004).
19. Razidlo, D.F. & Lahue, R.S. DNA Repair (Amst.) 7, 633–640 (2008).
