
The superhelical TPR-repeat domain of O-linked GlcNAc transferase exhibits structural similarities to importin α


A R T I C L E S

Post-translational modifications, such as phosphorylation, acetylation or methylation, regulate protein function in many cellular processes. Since its discovery two decades ago, O-linked glycosylation of nuclear and cytoplasmic proteins has emerged as a widespread form of post-translational modification (refs. 1,2). It involves the attachment of single GlcNAc moieties to serine or threonine residues of intracellular proteins. The modification is reversible, dynamic and inducible by cellular stimuli (ref. 3). As such, O-GlcNAc modification is more similar to phosphorylation than to the complex N- and O-linked glycosylation occurring in the secretory pathway. Indeed, additions of GlcNAc and phosphate moieties often occur at the same or adjacent amino acid residues and compete with each other (refs. 4,5).

Components of the nuclear pore complex (NPC), the nucleoporins, were among the first intracellular O-GlcNAc-modified proteins to be identified by studies in rat liver nuclei, owing to their interaction with the GlcNAc-binding lectin wheat germ agglutinin (WGA) (ref. 6). The nucleoporin p62 has been shown to have GlcNAc attached at multiple sites (ref. 7). It is now known that O-GlcNAc modification occurs in at least 100 proteins as diverse as transcription factors, RNA polymerase II, cytoskeletal proteins, signaling proteins and tumor suppressors (ref. 8). O-GlcNAc modifications have been shown to impair the trans-activation capability of the transcription factor Sp1 (ref. 9) and inhibit the proteasome (ref. 10). O-GlcNAc-dependent signaling has been implicated in human diseases including neurodegeneration and diabetes (ref. 1). In the case of nuclear transport, the role of the O-GlcNAc modification in NPC structure and function is still unresolved.

The attachment of O-GlcNAc is catalyzed by the enzyme OGT (refs. 11,12), which is encoded by a single gene that is ubiquitously expressed in higher eukaryotes. The gene is absent from the Saccharomyces cerevisiae genome but is essential for embryonic stem cell viability and embryonic development in mice (ref. 13). The protein is highly conserved. Human OGT shares >65% sequence identity with its Caenorhabditis elegans and Drosophila melanogaster orthologs (refs. 11,12). The 110-kDa polypeptide is composed of two separate domains. The C-terminal domain has glycosyltransferase activity and is alone sufficient to glycosylate short peptides in vitro (refs. 14,15). However, glycosylation of physiological protein substrates such as p62 requires the presence of the N-terminal domain, which consists of multiple TPR repeats (refs. 14–16). The TPR domain has also been shown to interact with the transcriptional repressor mSin3A (ref. 17) and to be essential for OGT oligomerization (ref. 16).

TPR repeats are 34-residue motifs known to mediate protein-protein interactions (ref. 18). The motifs share a degenerate consensus sequence and occur in tandem, typically in 3–16 copies per protein (ref. 19). The structure of the TPR domain of protein phosphatase 5 revealed that each TPR motif folds into a pair of antiparallel α-helices and that adjacent repeats stack in a parallel fashion (ref. 20). Although structural information is available for several proteins containing three to six TPR repeats (refs. 21–23), it is unclear what overall structure would be adopted by a larger set of tandem repeats. We report here the structure of the homodimeric TPR domain of human OGT, which contains 11.5 TPR repeats out of the total 13.5. The repeats form an elongated superhelix with a phylogenetically conserved inner surface that exhibits an unexpected structural similarity to armadillo (ARM) repeat–containing proteins, such as importin α and β-catenin. The similarity suggests a possible mechanism for substrate binding that would account for the ability of OGT to recognize a wide range of physiological protein substrates.

¹Structural and Computational Biology and ²Gene Expression Programmes, European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69117 Heidelberg, Germany. ³Laboratory of Cell Biochemistry and Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA. Correspondence should be addressed to E.C. ([email protected]).

Published online 12 September 2004; doi:10.1038/nsmb833

Martin Jínek¹, Jan Rehwinkel², Brooke D Lazarus³, Elisa Izaurralde², John A Hanover³ & Elena Conti¹

Addition of N-acetylglucosamine (GlcNAc) is a ubiquitous form of intracellular glycosylation catalyzed by the conserved O-linked GlcNAc transferase (OGT). OGT contains an N-terminal domain of tetratricopeptide (TPR) repeats that mediates the recognition of a broad range of target proteins. Components of the nuclear pore complex are major OGT targets, as OGT depletion by RNA interference (RNAi) results in the loss of GlcNAc modification at the nuclear envelope. To gain insight into the mechanism of target recognition, we solved the crystal structure of the homodimeric TPR domain of human OGT, which contains 11.5 TPR repeats. The repeats form an elongated superhelix. The concave surface of the superhelix is lined by absolutely conserved asparagines, in a manner reminiscent of the peptide-binding site of importin α. Based on this structural similarity, we propose that OGT uses an analogous molecular mechanism to recognize its targets.

NATURE STRUCTURAL & MOLECULAR BIOLOGY VOLUME 11 NUMBER 10 OCTOBER 2004 1001

© 2004 Nature Publishing Group http://www.nature.com/natstructmolbiol


RESULTS

Nucleoporins are major OGT targets

The nucleoporin p62 is a high-affinity substrate for the OGT enzyme (ref. 24). To first assess the extent of nucleoporin-OGT interactions in vivo, we depleted OGT in D. melanogaster cultured cells (S2 cells) by RNAi and analyzed nuclear envelope staining with WGA, a GlcNAc-binding lectin that is commonly used to detect the nuclear envelope (ref. 25). OGT knockdown resulted in complete loss of WGA staining at the nuclear envelope, without affecting staining by mAb414, an antibody that recognizes several nucleoporins (Fig. 1a). Thus, in the absence of OGT, nucleoporins are present at the nuclear envelope but are no longer glycosylated. Similar results were obtained by knockdown in HeLa cells, even though in this case the depletion was not as efficient as in S2 cells (Supplementary Fig. 1 and Methods online). This indicates that nucleoporins are major OGT targets in vivo.

We next identified a fragment of human OGT that would be sufficient for nucleoporin p62 recognition in vitro and would be amenable to structural studies. The fragment of human OGT we crystallized encompasses residues 16–400 and includes the reported nucleoporin p62-binding site (ref. 14). This fragment is active in p62 recognition, as shown by competition experiments using in vitro glycosylation assays, in which OGT(16–400) substantially inhibited the glycosylation of p62 by full-length OGT (Fig. 1b). The fragment also contains the reported recognition site for mSin3A (ref. 17).

Structure of the protein-targeting domain of OGT

The crystal structure of human OGT(16–400) was solved by multiwavelength anomalous dispersion (MAD) and refined to a resolution of 2.85 Å. The final model has working and free R-values of 25.9% and 29.7%, respectively, and good stereochemistry (Table 1). Two molecules are present in the asymmetric unit of the crystal. One polypeptide chain (molecule 1) is well ordered, with the exception of the first α-helix and two loops (residues 16–26 and 38–45). The other polypeptide chain (molecule 2) consists of two fragments comprising residues 16–319 and 340–400, which are joined by a disordered hinge-like region.

The TPR domain is a dimer of superhelices

The TPR domain is a homodimer of right-handed superhelices packed against each other at an angle of ~80° and related by pseudo two-fold rotational symmetry (Fig. 2). The well-ordered polypeptide chain (molecule 1, yellow in Fig. 2) consists of 23 antiparallel α-helices forming 11.5 TPR repeats. Each repeat folds into a pair of antiparallel helices (termed helices A and B). The repeats are characterized by a pattern of hydrophobic residues consistent with the canonical TPR sequence motif (ref. 19; W4-L7-G8-Y11-A20-F24-A27-P32, where the numbers indicate position within the repeat; Fig. 3a). The sequence preceding the first predicted TPR repeat (residues 79–112) also adopts the TPR fold despite its deviation from the canonical TPR consensus sequence.
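The consensus pattern quoted above can be read as a set of (position, residue) pairs within a 34-residue repeat. The following is a minimal sketch of how one might count consensus matches in a candidate repeat. The function name and both test sequences are hypothetical and invented for illustration; exact-identity matching is a simplification, because the real consensus is degenerate and tolerates similar residues at most positions.

```python
# Canonical TPR consensus positions as quoted in the text
# (1-based position within a 34-residue repeat -> expected residue).
TPR_CONSENSUS = {4: "W", 7: "L", 8: "G", 11: "Y", 20: "A", 24: "F", 27: "A", 32: "P"}
TPR_LENGTH = 34


def consensus_matches(repeat: str) -> list[int]:
    """Return the consensus positions at which the repeat carries the canonical residue."""
    if len(repeat) != TPR_LENGTH:
        raise ValueError(f"expected {TPR_LENGTH} residues, got {len(repeat)}")
    return [pos for pos, aa in TPR_CONSENSUS.items() if repeat[pos - 1] == aa]


# Hypothetical repeat: 'X' everywhere except the consensus positions.
ideal = "".join(TPR_CONSENSUS.get(i, "X") for i in range(1, TPR_LENGTH + 1))

# The same repeat with position 4 changed from W to F, mimicking a degenerate repeat.
degenerate = ideal[:3] + "F" + ideal[4:]

print(consensus_matches(ideal))            # all eight consensus positions
print(len(consensus_matches(degenerate)))  # one fewer match
```

A repeat that adopts the TPR fold while deviating from the consensus, as described for the sequence preceding the first predicted repeat, would score low in such a count, which is why sequence-only prediction can miss repeats that the structure reveals.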

The individual repeats are very similar to one another and to those of other TPR-containing proteins such as protein phosphatase 5 (PP5) (ref. 20), superposing with an r.m.s. deviation of <1.8 Å in the corresponding Cα positions. Tandem stacking of the repeats, together with the rotation and translation relating adjacent repeats, gives rise to a regular superhelical structure. The superhelix consists of two layers of α-helices, with the A and B helices of the TPR motifs forming the inner (concave) and outer (convex) faces, respectively (Fig. 3b). Inter-repeat contacts are mediated by the complementary packing of the small and large hydrophobic residues of the TPR sequence motif, which gives rise to a continuous hydrophobic core.

Figure 1 Nucleoporins are major OGT targets. (a) Depletion of OGT in D. melanogaster S2 cells results in the loss of O-GlcNAc modification at the NPCs (as detected by WGA staining), without detectably affecting NPC integrity (as detected by the nucleoporin-specific antibody mAb414). On days 4 and 8, cells were retransfected with the same dsRNAs. Cells were collected on day 12, fixed and stained with fluorescently labeled WGA or anti-nucleoporin antibody (mAb414). In the merged images, the WGA signal is green and the mAb414 signal is red. (b) Human OGT(16–400) inhibits nucleoporin p62 glycosylation by recombinant full-length human OGT. The fragment used for crystallographic analysis was added to a glycosylation reaction with varying amounts of p62 as a substrate. The assays in the presence and absence of the OGT fragment were performed in triplicate, and a representative experiment is shown as a Lineweaver-Burk plot.

Figure 2 Structure of the homodimeric TPR domain of human OGT. The molecule is shown in two views related by a 90° rotation about the pseudo two-fold axis. Chain 1 is yellow, chain 2 is blue. Residues 320–339 in chain 2, corresponding to TPR repeat 10, are disordered and are indicated with a dashed line. All ribbon drawings were prepared using PyMOL (http://www.pymol.org).

The OGT superhelix is conformationally flexible

The TPR superhelix is ~100 Å long and 35 Å wide, with a pitch of 55 Å. One complete superhelical turn comprises about seven TPR repeats, in good agreement with the model predicted for multiple TPR-containing proteins on the basis of the structure of the TPR-repeat domain of PP5 (ref. 20). The expectation based on the structure of OGT(16–400) is that the full-length TPR domain of OGT (residues 1–464) contains 13.5 repeats. The additional repeats would contribute four more α-helices and extend the superhelix to two complete superhelical turns.
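The geometric statements above follow from simple arithmetic on the quoted parameters. The sketch below, with a hypothetical helper name, treats the superhelix as ideal (exactly seven repeats per turn, uniform pitch), which the text gives only approximately, so the derived numbers are rough estimates and not measurements.

```python
REPEATS_PER_TURN = 7.0   # "about seven TPR repeats" per superhelical turn (from the text)
PITCH_ANGSTROM = 55.0    # superhelical pitch in Å (from the text)
HELICES_PER_REPEAT = 2   # each TPR repeat is a pair of helices, A and B


def superhelix_summary(n_repeats: float) -> dict:
    """Estimate helix count, number of turns and axial rise for an idealized TPR superhelix."""
    turns = n_repeats / REPEATS_PER_TURN
    return {
        "helices": int(n_repeats * HELICES_PER_REPEAT),
        "turns": round(turns, 2),
        "axial_rise_angstrom": round(turns * PITCH_ANGSTROM, 1),
    }


crystal = superhelix_summary(11.5)      # fragment present in the crystal structure
full_length = superhelix_summary(13.5)  # predicted full-length TPR domain

print(crystal)
print(full_length)
print(full_length["helices"] - crystal["helices"])  # additional helices in the full-length domain
```

Under these assumptions the crystallized fragment spans roughly 1.6 turns with an axial rise of about 90 Å, in the same range as the quoted length of ~100 Å, and the full-length domain approaches two turns.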

The two polypeptide chains of the TPR domain adopt different conformations in the crystal structure (Fig. 3c). Although all TPR repeats of molecule 1 are arranged around the superhelical axis, in molecule 2 the superhelix bends by ~40° between TPR 9 and TPR 10. The region spanning TPR repeats 1–9 of chain 2 is related by local two-fold symmetry to the corresponding region of chain 1, with which it superposes with an r.m.s. deviation of 0.90 Å between 300 equivalent Cα atoms. The C-terminal TPR repeats (residues 340–400) have a similar structure to the corresponding repeats of chain 1 (r.m.s. deviation of 1.06 Å for 61 Cα atoms) but are deflected from the local two-fold symmetry. Breakage of the hydrophobic interface between repeats 9 and 10 in molecule 2 is apparently due to the formation of an extensive crystal contact of the C-terminal helix with molecule 1 of a symmetry-related dimer. Crystal-induced unfolding of TPR repeats has previously been observed in the structures of bovine cyclophilin Cyp40 (ref. 26) and trypanosomal peroxin Pex5 (ref. 27). Although the differences between the conformations of the two polypeptide chains observed in the structure are likely due to crystal packing, they reflect the inherent flexibility of the elongated superhelical conformation. This flexibility may be functionally significant for target protein recognition, as observed in the case of the helical HEAT-repeat protein importin β, which adopts different overall conformations upon interaction with different binding partners (refs. 28–30).
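For readers unfamiliar with the r.m.s. deviations quoted above, the quantity is the square root of the mean squared distance between equivalent atoms. The sketch below assumes the two coordinate sets are already optimally superposed and listed in equivalent order; it does not perform the superposition itself. The coordinates are toy values invented for illustration and are not taken from the structure.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points in Å.

    Assumes the sets are pre-superposed and that the i-th points are equivalent atoms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Toy Cα positions (Å); each pair of equivalent atoms is displaced by exactly 1 Å.
chain1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
chain2 = [(0.0, 1.0, 0.0), (3.8, 0.0, 1.0), (7.6, 1.0, 0.0)]

print(rmsd(chain1, chain2))
```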


Figure 3 The contiguous TPR repeats form a regular superhelical architecture. (a) Structure-based sequence alignment of the TPR repeats in human OGT. Secondary structure elements are indicated at the top of the figure (helices A and B). The canonical TPR consensus sequence (ref. 19) is at the bottom. Positions of solvent-exposed residues are marked with asterisks. Surface residues that are strictly conserved in five OGT sequences (human, rat, D. melanogaster, C. elegans and Arabidopsis thaliana) are shaded orange. Strictly conserved residues in the hydrophobic core of the structure are shaded pink. Residues involved in dimer formation are brown. (b) Cartoon drawing of the TPR superhelix (chain 1). The inner surface is formed by the A helices of the TPR motifs (yellow). The B helices, comprising the outer surface of the superhelix, are gray. (c) The TPR domain of OGT is conformationally flexible. Structural superposition of the two polypeptide chains found in the asymmetric unit (chain 1, yellow; chain 2, blue). The C-terminal fragment of chain 2 bends ~40° away from the superhelical axis, breaking the interface between repeats 9 and 10.

Figure 4 The outer face of the TPR superhelix mediates homodimerization of OGT. (a) The dimerization interface of the TPR domain (chain 1, yellow; chain 2, blue) is centered on the local dyad axis. Residues involved in dimer formation are highlighted. The main feature of the interface is a hydrophobic contact involving interdigitating residues Trp198 and Ile201 from both chains. (b) The TPR domain is a constitutive homodimer in solution. Samples of wild-type (gray) and double-mutant W198E I201D (blue) TPR domains were analyzed on a Superdex 200 size-exclusion column. The elution profiles are plotted as A280 against elution volume. (c) Glycosyltransferase activity of wild-type and double-mutant (W198E I201D) OGT enzymes using p62 as a substrate. The y-axis represents percent activity relative to wild type.


The convex surface of the superhelix mediates dimerization

Dimerization of the TPR domain is mediated by the convex faces of the superhelical monomers (Fig. 2). The dimer interface is centered at the local dyad axis of the homodimer and is made up of the B helices of repeats 6 and 7. The interaction is mainly hydrophobic, with the side chains of Trp198 and Ile201 of one molecule interdigitating with the corresponding side chains of the other (Fig. 4a). The interaction is further reinforced by additional hydrophobic contacts (Leu199 and Ile230) and by interchain hydrogen bonds (Glu205, Arg233, Glu196 and His202). The dimer interface is relatively small (560 Å² per monomer) and buries only 3% of the surface area of each monomer. However, the hydrophobic nature of the interface points to a constitutive dimer. In particular, the presence of a tryptophan residue (Trp198) at position 18 in the TPR consensus is unique to TPR repeat 6; the other repeats have polar or charged amino acids at this solvent-exposed position (Fig. 3a).

To investigate the physiological relevance of the dimerization observed in the crystals, we mutated two hydrophobic residues at the interface to negatively charged residues (W198E and I201D) and carried out size-exclusion chromatography. Whereas the wild-type TPR domain migrates with an apparent molecular mass of ~120 kDa, the double mutant migrates with an apparent molecular mass of 70 kDa (Fig. 4b). This suggests that the dimerization observed in the crystals also occurs in solution, and is disrupted in the mutant. The observation that the gel filtration profile overestimates the apparent molecular masses of both the dimer (apparent molecular mass, 120 kDa; expected, 85 kDa) and the monomer (70 kDa and 42 kDa, respectively) is consistent with the rather elongated and nonglobular structure of the TPR domain. This might account for previous reports that suggested trimerization of the TPR domain based on gel filtration data (ref. 16). Disruption of the dimerization interface in the context of the full-length OGT enzyme causes only a modest decrease in enzymatic activity toward p62 (Fig. 4c).
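The degree to which gel filtration overestimates the mass of each species can be made explicit with the values quoted above. This is a minimal sketch; the dictionary layout and function name are illustrative choices, and the only inputs are the apparent and expected masses given in the text.

```python
# Molecular masses in kDa, as quoted in the text.
species = {
    "wild type (dimer)": {"apparent": 120.0, "expected": 85.0},
    "W198E I201D (monomer)": {"apparent": 70.0, "expected": 42.0},
}


def overestimate(apparent: float, expected: float) -> float:
    """Ratio of apparent (gel filtration) mass to the mass expected from sequence."""
    return apparent / expected


ratios = {name: round(overestimate(**masses), 2) for name, masses in species.items()}

for name, ratio in ratios.items():
    print(f"{name}: apparent mass is {ratio}x the expected mass")
```

Both species elute as if they were roughly 1.4 to 1.7 times heavier than expected. An apparent dimer mass of 120 kDa is close to three times the expected monomer mass of 42 kDa, which illustrates how calibration against globular standards could suggest a trimer.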


Figure 5 The inner surface of the TPR superhelix features a ladder of asparagines. (a) Schematic of the inside face of the superhelix, showing residues present at positions 6 (top row) and 9 (bottom row) in the TPR repeats. Absolutely conserved asparagines (Fig. 3a) are highlighted in red. TPR repeats 13 and 14, shaded in gray, are not present in the crystal structure. (b) Stereo view of electron density around the conserved asparagines of TPR repeats 3 and 4 in chain 2 with the final model superimposed. The 2.85-Å electron density map is contoured at 1.1 σ. (c) Ribbon drawing of TPR repeats 2–4 (helices A in yellow, helices B in gray) with solvent-exposed residues highlighted. The position in the TPR consensus sequence (Fig. 3a) of each residue is indicated with circled numbers.

Figure 6 The inner surface of the TPR superhelix is similar to the peptide-binding surface of importin α. (a) The TPR domain of OGT (left) is shown as a transparent surface. Absolutely conserved asparagines at position 6 in the TPR repeats are orange. The ARM domain of importin α (right) is shown in complex with the nucleoplasmin nuclear localization signal (NLS) peptide (purple; ref. 35). The conserved asparagine array responsible for binding the peptide backbone of the NLS is red. The importin α structure was superimposed onto the TPR domain structure using the protein structure comparison service SSM at the European Bioinformatics Institute (http://www.ebi.ac.uk/msd-srv/ssm) and is shown at the same scale and orientation. (b) Close-up views of parts of the concave surface of OGT (left) and of the peptide-binding surface of importin α (right) with a nucleoplasmin NLS peptide bound to it. The conserved asparagines are highlighted.


The concave surface of the superhelix is conserved

We next analyzed the location of phylogenetically conserved residues in the structure. Conserved hydrophobic residues that play a role in maintaining the hydrophobic core of the protein are interspersed in both inner and outer helices throughout the polypeptide chain (pink, Fig. 3a). In contrast, most of the conserved surface-exposed amino acids are restricted to the inner A helix (orange, Fig. 3a) and line the inner concave surface of the superhelix. Particularly well conserved are the residues that line the central part of the surface groove (positions 6 and 9 in the TPR consensus, Fig. 3a). Notably, these positions are mostly occupied by asparagines, which form a continuous ladder throughout the superhelix (Fig. 5a). The asparagines contribute to inter-repeat interactions, with the side chain of the asparagine at position 6 contacting the side chain of the asparagine at position 9 of the upstream TPR (Fig. 5b,c). Several of the residues forming the edge of the surface groove (positions 12 and 13, Fig. 3a) are also conserved.

The asparagine array on the inner surface of the OGT superhelix bears a marked similarity to the array of conserved asparagines in the ARM-repeat proteins importin α and β-catenin (refs. 31,32; Fig. 6a,b). In both ARM-repeat proteins, the asparagine side chains contribute to the binding of the target peptide (a nuclear localization signal in the case of importin α and E-cadherin in the case of β-catenin) by forming bidentate hydrogen bonds with the peptide backbone. The structural similarity suggests that the TPR domain of OGT uses a similar mechanism of protein-protein recognition.

DISCUSSION

To obtain insights into the molecular basis of OGT function, we determined the structure of the protein-protein interaction domain of OGT. On the basis of structural similarity with importin α and evolutionary conservation, we propose that the inner surface of the molecule is the protein-binding site. Here, a target polypeptide would bind in an extended conformation by interactions of its backbone with the conserved asparagine array of the TPR domain. Notably, asparagines at positions 6 and/or 9 are found in other TPR-containing proteins such as Pex5 (ref. 21), Hop (ref. 23), PP5 (ref. 20) and Tom70 (ref. 33). In the structure of Pex5 in complex with a peroxisome import signal (ref. 21), an asparagine at position 6 binds the backbone of the target peptide, even though the overall conformation of Pex5 is rather different from the extended conformations of OGT, importin α and β-catenin. In the structure of Hop bound to Hsp70 and Hsp90 peptides (ref. 23), two absolutely conserved asparagines contact the main chain amide groups and also the C-terminal carboxylate of the peptide.

The structure includes 11.5 TPR repeats out of the total 13.5. It is the largest TPR-containing protein structure determined so far and shows that the repeats are arranged into an extended superhelical architecture; this might be a paradigm for other proteins composed of a large number of tandem TPR repeats. The extensive surface generated by the multiple repeats is likely to represent several potentially overlapping binding pockets, each defined by the presence of a conserved asparagine residue within a TPR repeat. Different substrates might be recognized by different sets of these binding pockets, analogously to importin α, which uses different subsets of its conserved asparagines to bind different types of nuclear localization signals (refs. 34,35). Such interactions would occur independently of the amino acid sequence of the target polypeptides. In this context, specificity is likely to be conferred by neighboring amino acids that flank the asparagines one turn of the A helix upstream or downstream. The proposed mode of substrate recognition provides a rationale for the lack of an apparent consensus sequence in the known OGT substrates and suggests that OGT-interacting sequences would be likely to reside in unstructured regions in target proteins. Thus, the structure reveals how the TPR domain of OGT might function as a versatile molecular scaffold capable of integrating a variety of protein-protein interactions.

METHODS

RNA interference and immunofluorescence. RNAi in D. melanogaster S2 cells was carried out as described (ref. 25). The OGT and GFP double-stranded RNAs (dsRNAs) correspond to the first 800 bp and 650 bp of the OGT and GFP coding regions, respectively. S2 cells grown in suspension were allowed to adhere to poly-D-lysine-coated coverslips for 10 min. Cells were washed once in PBS and fixed with 3.7% (v/v) paraformaldehyde in PBS for 10 min. After fixation, cells were washed in PBS, permeabilized for 10 min with PBS containing 0.5% (v/v) Triton X-100 and washed again with PBS. Indirect immunofluorescence with mAb414 antibodies (diluted 1:1,500) was carried out as described (ref. 25). Cy3-coupled donkey secondary antibody (Molecular Probes) was used at a dilution of 1:500. Nuclear envelopes were stained with Alexa Fluor 488-WGA conjugates (dilution 1:1,500). Cells were mounted using Fluoromount-G (Southern Biotechnology Associates). Images were acquired using a Zeiss LSM510 FCS confocal microscope.


Table 1 Data collection and refinement statistics

                      Nativeᵃ               Derivativeᵇ
                                            Au peak            Au inflection      Au remote
Data collection
Space group           P1                    P1
Cell dimensions
  a, b, c (Å): 64.3, 75.5, 77.7
  α, β, γ (°): 105.1, 105.1, 110.3
Wavelength (Å)        0.933                 1.03730            1.04010            0.99990
Resolution (Å)        20–2.85 (2.95–2.85)   30–3.6 (3.7–3.6)   30–3.6 (3.7–3.6)   30–3.6 (3.7–3.6)
Unique reflections    27,841                12,956             13,262             12,831
Completeness (%)      97.3 (96.8)           93.2 (79.4)        94.0 (85.0)        94.3 (87.1)
Multiplicity          5.8 (5.1)             1.2 (1.2)          1.2 (1.2)          1.1 (1.1)
I / σ                 21.0 (3.5)            10.0 (3.3)         9.6 (3.3)          9.0 (5.0)
Rsym (%)              5.1 (46.7)            3.8 (19.8)         4.4 (21.4)         4.6 (15.6)
Riso (%): 23.2

Refinement
Rfree (%)             29.7
Rwork (%)             25.9
R.m.s. deviations
  Bond lengths (Å)    0.0075
  Bond angles (°)     1.25
No. atoms
  Protein             756
  Water               27
  Ions                1

Values in parentheses correspond to the highest-resolution shell. ᵃESRF ID14-2. ᵇSLS.


Expression and purification of the TPR domain. The full-length TPR region of human OGT (residues 1–464, with numbering corresponding to the sequence of isoform 2, Swiss-Prot entry O15294) was amplified by PCR from I.M.A.G.E. Consortium cDNA clone number 4865031. Attempts to crystallize the full-length TPR region were unsuccessful. A series of deletion constructs were generated based on limited proteolysis experiments and mass spectrometry analysis (data not shown). The TPR domain that yielded crystals comprises residues 16–400. The domain was subcloned in a pET-derived expression vector in frame with an N-terminal His6-tag followed by the rhinovirus 3C protease cleavage site, and expressed in E. coli strain BL21 (DE3). Cells were grown to an A600 of 0.8 at 37 °C and induced with 0.4 mM IPTG at 18 °C for 16 h. Harvested cells were resuspended in 20 mM Tris-Cl, 500 mM NaCl, 5 mM imidazole, pH 8.0, and lysed with a homogenizer (Avestin). The lysate was loaded onto a Talon cobalt resin (Clontech) and eluted with an imidazole gradient. Peak fractions containing the TPR domain were digested overnight with 3C protease while being dialyzed against 20 mM Na-HEPES, pH 7.5, 200 mM NaCl, 2 mM DTT. The TPR domain was further purified by size-exclusion chromatography (Superdex 200, Pharmacia) in the same buffer and concentrated to 30 mg ml⁻¹. Samples were flash-frozen in liquid N2 and stored at –80 °C. The same protocol was used to purify the W198E I201D mutant. The mutation was introduced using the QuikChange mutagenesis kit (Stratagene) according to the manufacturer's instructions and confirmed by DNA sequencing.

Glycosylation assays. In vitro glycosylation of recombinant p62 was carried out essentially as described (ref. 14). Briefly, bacterially expressed p62 was purified by ion exchange chromatography and 1 µg was added to the glycosylation reaction. Bacterially expressed OGT was added with 14C-labeled UDP-GlcNAc to a concentration of ∼0.3 nM. The reaction was terminated by adding SDS sample buffer and was applied to an SDS-PAGE gel. The reaction was monitored in the gel by quantification of p62 glycosylation using a Fuji phosphoimager and Image Gauge 3.0 software. In competition assays, the glycosyltransferase reaction was carried out in the presence of 10 µg of the OGT(16–400) fragment.

Crystallization and data collection. Crystallization of human TPR(16–400) was done at 18 °C using sitting-drop vapor diffusion in 96-well trays (Emerald Bioscience) by mixing 1 µl of the protein solution with 1 µl of the reservoir solution containing 100 mM HEPES-Na/HCl, pH 7.5, 200 mM CaCl2 and 32–36% (v/v) PEG400 (Hampton Research). Triangular crystals appeared overnight and grew to full size (200 × 200 × 50 µm) within one week. For data collection, the crystals were transferred from 18 °C to 4 °C for 24 h, stabilized in 100 mM HEPES, pH 7.5, 150 mM NaCl, 150 mM CaCl2, 40% (v/v) PEG400, 10% (v/v) glycerol, and flash-frozen in liquid nitrogen. Data were processed with XDS (ref. 36). Data collection and phasing statistics are shown in Table 1.
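The drop composition at the moment of mixing follows from simple volume-weighted averaging. The sketch below is illustrative only and is not part of the published protocol. It assumes that the protein solution is the size-exclusion buffer (20 mM Na-HEPES, 200 mM NaCl, 2 mM DTT) at 30 mg ml–1, that volumes are additive, and it ignores subsequent vapor equilibration. The function name `mix_drop` and the dictionary keys are hypothetical.

```python
def mix_drop(protein_solution, reservoir, v_protein_ul=1.0, v_reservoir_ul=1.0):
    """Volume-weighted initial concentrations of a crystallization drop.

    Assumes additive volumes; a component missing from one solution
    counts as zero in that solution.
    """
    total_ul = v_protein_ul + v_reservoir_ul
    names = sorted(set(protein_solution) | set(reservoir))
    return {
        name: (protein_solution.get(name, 0.0) * v_protein_ul
               + reservoir.get(name, 0.0) * v_reservoir_ul) / total_ul
        for name in names
    }


# Assumption: the protein stock is in the size-exclusion buffer at 30 mg/ml.
protein_solution = {
    "protein_mg_per_ml": 30.0,
    "HEPES_mM": 20.0,
    "NaCl_mM": 200.0,
    "DTT_mM": 2.0,
}

# Reservoir at the two ends of the reported PEG400 range (32-36% v/v).
reservoir_low = {"HEPES_mM": 100.0, "CaCl2_mM": 200.0, "PEG400_pct_v_v": 32.0}
reservoir_high = {**reservoir_low, "PEG400_pct_v_v": 36.0}

drop_low = mix_drop(protein_solution, reservoir_low)
drop_high = mix_drop(protein_solution, reservoir_high)

for name, value in drop_low.items():
    print(f"{name}: {value:g}")
```

Under these assumptions the 1:1 drop starts at 15 mg ml–1 protein, 60 mM HEPES, 100 mM NaCl, 100 mM CaCl2, 1 mM DTT and 16–18% (v/v) PEG400, and then concentrates toward the reservoir conditions as vapor diffusion proceeds.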

Structure determination and refinement. Phase information was obtained by a three-wavelength MAD experiment at a resolution of 3.6 Å on a gold derivative generated by soaking crystals in 10 mM KAu(CN)2 for 6 h. Eleven gold sites were located using SOLVE (ref. 37). The initial phases were improved by solvent flattening with RESOLVE (ref. 38) to yield an interpretable electron-density map revealing two polypeptide chains related by pseudo two-fold rotational symmetry. The map was improved locally by two-fold averaging with RAVE (ref. 39). The model was built using O (ref. 40) and refined against the 2.85-Å-resolution native data set using the maximum-likelihood target in CNS (ref. 41). The progress of refinement and rebuilding was monitored with the free R-factor computed from a randomly omitted 4% of the observed diffraction data (ref. 42). Strong noncrystallographic symmetry restraints were applied to residues 80–270 of the two polypeptide chains in the asymmetric unit throughout most of the refinement. The restraints were released when the Rfree had dropped to 33.6%. The final model was refined to an Rfree of 29.7% and Rwork of 25.9%. Although these values are relatively high (probably owing to a combination of the relatively low resolution, low symmetry and partial disorder of TPR repeats 9 and 10 in molecule 2), the model has good stereochemistry (Table 1). It contains residues 16–400 in chain 1, residues 16–319 and 340–400 in chain 2 (in both cases, the sequence GPM, derived from the expression vector, precedes Glu16), 27 water molecules and one Ca2+ ion.
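For readers unfamiliar with the statistics quoted above, the conventional R-factor is R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|, evaluated over the working reflections (Rwork) or over a randomly set-aside test set (Rfree). The sketch below is a minimal illustration of that bookkeeping, not the CNS implementation. It assumes the amplitudes are already on a common scale, uses synthetic numbers rather than the deposited data, and the function names are hypothetical.

```python
import random


def r_factor(f_obs, f_calc):
    """Conventional R = sum(||Fo| - |Fc||) / sum(|Fo|); no scale factor applied."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def split_free_set(n_reflections, free_fraction=0.04, seed=0):
    """Randomly flag a fraction of reflection indices as the test (free) set."""
    rng = random.Random(seed)
    n_free = round(n_reflections * free_fraction)
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


def r_work_and_free(f_obs, f_calc, free_fraction=0.04, seed=0):
    """Return (R_work, R_free) for one random work/test partition."""
    work, free = split_free_set(len(f_obs), free_fraction, seed)
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
    return r_work, r_free


# Synthetic amplitudes for illustration only (not the deposited data).
f_obs = [100.0 + i for i in range(1000)]
f_calc = [fo * 1.05 for fo in f_obs]
r_work, r_free = r_work_and_free(f_obs, f_calc)
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")
```

In an actual refinement the test reflections are excluded from the target function, so Rfree is an unbiased check on the model; here the partition only shows how the two statistics are computed from the same formula.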

Coordinates. The atomic coordinates and structure factors have been deposited in the Protein Data Bank (accession code 1W3B).

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS
We are grateful to beamline scientists of X06SA at the Swiss Light Source (Zürich) and ESRF ID14-2 (Grenoble) for assistance during data collection. We thank M. Hothorn for introduction to data processing with XDS and P. Brick for critical reading of the manuscript. M.J. was supported by the Human Frontier Science Program Organization (RGP0063/2002-C).

COMPETING INTERESTS STATEMENT
The authors declare that they have no competing financial interests.

Received 24 June; accepted 20 July 2004
Published online at http://www.nature.com/nsmb/

1. Hanover, J.A. Glycan-dependent signaling: O-linked N-acetylglucosamine. FASEB J. 15, 1865–1876 (2001).

2. Wells, L., Vosseller, K. & Hart, G.W. Glycosylation of nucleocytoplasmic proteins: signal transduction and O-GlcNAc. Science 291, 2376–2378 (2001).

3. Kearse, K.P. & Hart, G.W. Lymphocyte activation induces rapid changes in nuclear and cytoplasmic glycoproteins. Proc. Natl. Acad. Sci. USA 88, 1701–1705 (1991).

4. Chou, T.Y., Hart, G.W. & Dang, C.V. c-Myc is glycosylated at threonine 58, a known phosphorylation site and a mutational hot spot in lymphomas. J. Biol. Chem. 270, 18961–18965 (1995).

5. Slawson, C. & Hart, G.W. Dynamic interplay between O-GlcNAc and O-phosphate: the sweet side of protein regulation. Curr. Opin. Struct. Biol. 13, 631–636 (2003).

6. Davis, L.I. & Blobel, G. Nuclear pore complex contains a family of glycoproteins that includes p62: glycosylation through a previously unidentified cellular pathway. Proc. Natl. Acad. Sci. USA 84, 7552–7556 (1987).

7. Hanover, J.A., Cohen, C.K., Willingham, M.C. & Park, M.K. O-linked N-acetylglucosamine is attached to proteins of the nuclear pore. Evidence for cytoplasmic and nucleoplasmic glycoproteins. J. Biol. Chem. 262, 9887–9894 (1987).

8. Wells, L., Whalen, S.A. & Hart, G.W. O-GlcNAc: a regulatory post-translational modification. Biochem. Biophys. Res. Com. 302, 435–441 (2003).

9. Yang, X. et al. O-linkage of N-acetylglucosamine to Sp1 activation domain inhibits its transcriptional capability. Proc. Natl. Acad. Sci. USA 98, 6611–6616 (2001).

10. Zhang, F. et al. O-GlcNAc modification is an endogenous inhibitor of the proteasome. Cell 115, 715–725 (2003).

11. Kreppel, L.K., Blomberg, M.A. & Hart, G.W. Dynamic glycosylation of nuclear and cytosolic proteins. Cloning and characterization of a unique O-GlcNAc transferase with multiple tetratricopeptide repeats. J. Biol. Chem. 272, 9308–9315 (1997).

12. Lubas, W.A., Frank, D.W., Krause, M. & Hanover, J.A. O-linked GlcNAc transferase is a conserved nucleocytoplasmic protein containing tetratricopeptide repeats. J. Biol. Chem. 272, 9316–9324 (1997).

13. Shafi, R. et al. The O-GlcNAc transferase gene resides on the X chromosome and is essential for embryonic stem cell viability and mouse ontogeny. Proc. Natl. Acad. Sci. USA 97, 5735–5739 (2000).

14. Lubas, W.A. & Hanover, J.A. Functional expression of O-linked GlcNAc transferase. Domain structure and substrate specificity. J. Biol. Chem. 275, 10983–10988 (2000).

15. Iyer, S.P. & Hart, G.W. Roles of the tetratricopeptide repeat domain in O-GlcNAc transferase targeting and protein substrate specificity. J. Biol. Chem. 278, 24608–24616 (2003).

16. Kreppel, L.K. & Hart, G.W. Regulation of a cytosolic and nuclear O-GlcNAc transferase. Role of the tetratricopeptide repeats. J. Biol. Chem. 274, 32015–32022 (1999).

17. Yang, X., Zhang, F. & Kudlow, J.E. Recruitment of O-GlcNAc transferase to promoters by corepressor mSin3A: coupling protein O-GlcNAcylation to transcriptional repression. Cell 110, 69–80 (2002).

18. D’Andrea, L.D. & Regan, L. TPR proteins: the versatile helix. Trends Biochem. Sci. 28, 655–662 (2003).

19. Sikorski, R.S., Boguski, M.S., Goebl, M. & Hieter, P. A repeating amino acid motif in CDC23 defines a family of proteins and a new relationship among genes required for mitosis and RNA synthesis. Cell 60, 307–317 (1990).

20. Das, A.K., Cohen, P.W. & Barford, D. The structure of the tetratricopeptide repeats of protein phosphatase 5: implications for TPR-mediated protein-protein interactions. EMBO J. 17, 1192–1199 (1998).

21. Gatto, G.J. Jr., Geisbrecht, B.V., Gould, S.J. & Berg, J.M. Peroxisomal targeting signal-1 recognition by the TPR domains of human PEX5. Nat. Struct. Biol. 7, 1091–1095 (2000).

22. Lapouge, K. et al. Structure of the TPR domain of p67phox in complex with Rac.GTP. Mol. Cell 6, 899–907 (2000).

23. Scheufler, C. et al. Structure of TPR domain–peptide complexes: critical elements in the assembly of the Hsp70-Hsp90 multichaperone machine. Cell 101, 199–210 (2000).

24. Lubas, W.A., Smith, M., Starr, C.M. & Hanover, J.A. Analysis of nuclear pore protein p62 glycosylation. Biochemistry 34, 1686–1694 (1995).

25. Herold, A., Klymenko, T. & Izaurralde, E. NXF1/p15 heterodimers are essential for mRNA nuclear export in Drosophila. RNA 7, 1768–1780 (2001).

26. Taylor, P. et al. Two structures of cyclophilin 40: folding and fidelity in the TPR domains. Structure 9, 431–438 (2001).

1006 VOLUME 11 NUMBER 10 OCTOBER 2004 NATURE STRUCTURAL & MOLECULAR BIOLOGY


27. Kumar, A. et al. An unexpected extended conformation for the third TPR motif of the peroxin PEX5 from Trypanosoma brucei. J. Mol. Biol. 307, 271–282 (2001).

28. Cingolani, G., Petosa, C., Weis, K. & Muller, C.W. Structure of importin-β bound to the IBB domain of importin-α. Nature 399, 221–229 (1999).

29. Lee, S.J. et al. The structure of importin-β bound to SREBP-2: nuclear import of a transcription factor. Science 302, 1571–1575 (2003).

30. Fukuhara, N., Fernandez, E., Ebert, J., Conti, E. & Svergun, D. Conformational variability of nucleo-cytoplasmic transport factors. J. Biol. Chem. 279, 2176–2181 (2004).

31. Huber, A.H. & Weis, W.I. The structure of the β-catenin/E-cadherin complex and the molecular basis of diverse ligand recognition by β-catenin. Cell 105, 391–402 (2001).

32. Conti, E., Uy, M., Leighton, L., Blobel, G. & Kuriyan, J. Crystallographic analysis of the recognition of a nuclear localization signal by the nuclear import factor karyopherin α. Cell 94, 193–204 (1998).

33. Young, J.C., Hoogenraad, N.J. & Hartl, F.U. Molecular chaperones Hsp90 and Hsp70 deliver preproteins to the mitochondrial import receptor Tom70. Cell 112, 41–50 (2003).

34. Fontes, M.R., Teh, T. & Kobe, B. Structural basis of recognition of monopartite and bipartite nuclear localization sequences by mammalian importin-α. J. Mol. Biol. 297, 1183–1194 (2000).

35. Conti, E. & Kuriyan, J. Crystallographic analysis of the specific yet versatile recognition of distinct nuclear localization signals by karyopherin α. Structure Fold. Des. 8, 329–338 (2000).

36. Kabsch, W. Automatic processing of rotation diffraction data from crystals of initially unknown symmetry and cell constants. J. Appl. Crystallogr. 26, 795–800 (1993).

37. Terwilliger, T.C. & Berendzen, J. Automated MAD and MIR structure solution. Acta Crystallogr. D 55, 849–861 (1999).

38. Terwilliger, T.C. Maximum-likelihood density modification. Acta Crystallogr. D 56, 965–972 (2000).

39. Kleywegt, G.J. & Read, R.J. Not your average density. Structure 5, 1557–1569 (1997).

40. Jones, T.A., Zou, J.Y., Cowan, S.W. & Kjeldgaard, M. Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr. A 47, 110–119 (1991).

41. Brünger, A.T. et al. Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr. D 54, 905–921 (1998).

42. Brünger, A. Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature 355, 472–475 (1992).
