Distant structural homology leads to the functional characterisation
of an archaeal PIN-domain as an exonuclease.
Vickery L. Arcus1,2,*, Kristina Bäckbro1,4, Annette Roos1,4, Emma L. Daniel1,3 and
Edward N. Baker1,3.
1School of Biological Sciences, 2AgResearch Structural Biology Laboratory and 3Centre
of Molecular Biodiscovery, University of Auckland, Private Bag 92019, Auckland, New
Zealand.
4Present address: Department of Cell and Molecular Biology, Uppsala University,
Biomedical Centre, Box 596, SE-751 24 Uppsala, Sweden.
*Corresponding author
Name: Vickery Arcus
Phone: +64-9-373-7599
Fax: +64-9-373-7414
Email: [email protected]
Running title: Structure and function of an archaeal PIN domain
Key words: Structural genomics, PIN-domain, X-ray crystallography, exonuclease.
JBC Papers in Press. Published on January 20, 2004 as Manuscript M313833200
Copyright 2004 by The American Society for Biochemistry and Molecular Biology, Inc.
Downloaded from http://www.jbc.org/ by guest on April 16, 2018
SUMMARY
Genome sequencing projects have focused attention on the problem of discovering the
functions of protein domains that are widely distributed throughout living species but
which are, as yet, largely uncharacterised. One such example is the PIN domain, found in
eukaryotes, bacteria and archaea, and with suggested roles in signalling, RNase editing
and/or nucleotide binding. The first reported crystal structure of a PIN domain (open
reading frame PAE2754, derived from the crenarchaeon, Pyrobaculum aerophilum) has
been determined to 2.5 Å resolution and is presented here. Mapping conserved residues
from a multiple sequence alignment onto the structure identifies a putative active site.
The discovery of distant structural homology with several exonucleases, including T4
phage RNase H and flap endonuclease (FEN1), further suggests a likely function for PIN
domains as Mg2+-dependent exonucleases, a hypothesis which we have confirmed in
vitro. The tetrameric structure of PAE2754, with the active sites inside a tunnel, suggests
a mechanism for selective cleavage of single stranded overhangs or flap structures. These
results indicate likely DNA or RNA editing roles for prokaryotic PIN domains, which are
strikingly numerous in thermophiles, and in organisms such as Mycobacterium
tuberculosis. They also support previous hypotheses that eukaryotic PIN domains
participate in RNAi and nonsense mediated RNA degradation (NMD).
INTRODUCTION
The explosive growth of whole genome sequencing efforts, and the discovery that a large
proportion of the assumed gene products are of unknown or poorly-understood function,
has focused attention on new approaches to assigning function. In the absence of
sufficient sequence similarity to clearly infer homology with already characterised
proteins, a variety of bioinformatic approaches have been used to obtain functional clues.
These include, for example, analyses of genome location (seeking potential operons),
phylogenetic profiling, and observations of gene fusions in different species (1,2). An
alternative, complementary, approach is to use analyses of protein three-dimensional
structure to derive functional insights, since three-dimensional structure is conserved in
evolution much more strongly than sequence. This provides a rationale for a number of
structural genomics initiatives (3-6).
As part of a pilot structural genomics project aimed at the discovery of biological
function, we have focused on gene products from the hyperthermophilic crenarchaeon
Pyrobaculum aerophilum, an organism whose complete genome sequence was published
recently (7). A whole-genome comparison of P. aerophilum and Mycobacterium
tuberculosis, two organisms with very different, and in a sense extreme, lifestyles, led us
to identify a set of 250 pairs of orthologous genes that are both widely distributed in
nature and are shared by these two organisms. Among these were a set of four genes from
P. aerophilum (PAE0151, PAE0285, PAE0337, PAE2754) and four from M. tuberculosis
(Rv0065, Rv0549, Rv0960, Rv1720) that have since been clustered at NCBI as part of
COG4113, with members drawn from archaea, cyanobacteria, actinobacteria and alpha
proteobacteria (Table 1). These are now annotated in Pfam as PIN domains.
[Table 1]
PIN domains, named for their homology with the N-terminal domain of the pili
biogenesis protein (PIN = PilT N-terminus) (8), comprise a very large family of proteins
with representatives in all three kingdoms of life. They are classified in the Pfam database
(http://www.sanger.ac.uk/Software/Pfam) as PF01850, currently with more than 340
members. Functional annotation of the PIN domains is equivocal. They were initially
thought to function in signalling (9), but a recent bioinformatic analysis of nearly 100
PIN domain sequences identified a set of five conserved acidic residues and a sixth
conserved position at which there is either a serine or threonine residue. This has led to a
suggested exonuclease function (10). In eukaryotes, for example, the C. elegans PIN
domain, smg-5, and the yeast PIN domain, NMD4p, are postulated to be ribonucleases
(RNases) that bind to helicases as part of the machinery for RNA degradation via the
RNAi and nonsense mediated RNA degradation (NMD) pathways (10).
In archaea and thermophilic bacteria, PIN domains have been associated with a possible
role in DNA repair. A recent analysis of conserved gene context across fully sequenced
prokaryotic genomes revealed, in most archaea and some thermophilic bacteria, a
previously unrecognised cluster of genes containing DNA polymerases, helicases,
nucleases and many conserved hypothetical open reading frames, one of which is a PIN
domain clustering in COG1848 (11). This suggested a new DNA repair system in these
organisms. DNA repair, and particularly mismatch repair, in thermophiles is a vexing
question. The absence of key mismatch repair enzymes such as MutS and MutL, which
are highly conserved in mesophiles from E. coli to humans, has been suggested to result
in “mutator” lifestyles for some thermophiles (7), in which adaptive mutations enable the
organism to cope with stress or extreme environments (12). Also absent from many of the
fully sequenced thermophilic genomes are several well conserved nucleotide excision
repair enzymes. The discovery of a new DNA repair operon in archaea addresses this
question and, by implicating PIN domains as part of this operon, adds another piece of
functional evidence for this large protein family.
Here we present the first crystal structure of a PIN domain, from the crenarchaeon
Pyrobaculum aerophilum. The protein, the gene product of open reading frame
PAE2754, proves to be a distant structural homologue of T4 RNase H and other
exonucleases, despite insignificant sequence identity. Strict conservation of the active site
residues suggests that this PIN domain is indeed an exonuclease. We have confirmed this
functional hypothesis in vitro. This has important implications both for archaeal DNA
editing and for the role of PIN domains in eukaryotic RNA editing. It is also an
illustration of the power of structural genomics whereby deep phylogenetic lineages are
apparent at the structural level and lead directly to functional characterisation of proteins
of previously unknown function or with equivocal and/or general functional annotation.
EXPERIMENTAL PROCEDURES
Protein Expression, Purification and Crystallization
The predicted open reading frame PAE2754 was amplified from genomic DNA by PCR,
subcloned into the expression vector pPROeX (Lifetech), transformed into E. coli
BL21(DE3) cells and expressed as an N-terminal His6-tagged protein. Purification
involved a heat step, in which incubation for 40 minutes at 80 °C denatured a large
fraction of the E. coli proteins, followed by Ni2+ affinity chromatography and size
exclusion chromatography, as described (13).
Two site-specific mutations, L65M and L80M, were designed and introduced to facilitate
structure determination by multiwavelength anomalous diffraction (MAD) methods using
the selenomethionine (SeMet)-substituted protein. The single mutants were individually
made and tested for expression and crystallization, followed by the double mutant
L65M/L80M. Mutagenesis was performed with the QuikChange site-directed
mutagenesis kit (Stratagene). The double mutant was stable at 80 °C, suggesting that the
structure was not significantly destabilised by these mutations. The plasmid encoding the
double mutant was transformed into the methionine auxotroph E. coli strain DL41(DE3)
and grown in LeMaster medium with SeMet as the only methionine source. The SeMet-
substituted double mutant protein (SeMet-PAE2754_MM) was then purified as above.
Both native PAE2754 and SeMet-PAE2754_MM were crystallized as described (13) and
flash-cooled for data collection by soaking in cryoprotectant (mother liquor plus 10%
glycerol) immediately prior to placement in a stream of cold N2 gas at 110 K.
Structure Determination and Refinement
Native PAE2754 X-ray diffraction data to 2.5 Å resolution were collected at the National
Synchrotron Light Source (NSLS), Brookhaven, on beamline X8C (λ=1.0000 Å). MAD
data at two wavelengths were collected for SeMet-PAE2754_MM at the Stanford
Synchrotron Radiation Laboratory (SSRL), beamline 9-1. The data were processed using
DENZO and SCALEPACK (14). Data collection and refinement statistics are given in
Table 2. The structure of PAE2754_MM was determined by single-wavelength anomalous
diffraction (SAD) using a single SeMet dataset at λ=0.9794 Å. This wavelength is not
where the anomalous differences are maximised for Se, but its use was necessary because
the “remote” wavelength data set proved to be of poor quality due to crystal decay (data
were initially collected at two wavelengths in accordance with (15)). Using SOLVE (16,17),
a total of 17 out of a
possible 24 Se sites (from the 8 monomers in the crystal asymmetric unit) were located,
based on anomalous differences. These gave initial phases to 2.8 Å with a figure of merit
of 0.18 and a Z-score of 20.6. The phases were improved using maximum likelihood
density modification via RESOLVE (16). Successful improvement of the phases was
dependent on user-defined non-crystallographic symmetry elements based on the initial
Se positions.
Much of the core structure (approximately 74 residues per monomer) was built
automatically with RESOLVE (16) and TEXTAL (18,19), and the rest of the structure
was built manually, during twelve cycles of model building with O (20) and refinement
with CNS (21). Due to the relatively low resolution of the data, non-crystallographic
symmetry (NCS) was included as a restraint in all refinement cycles and weighted
experimental phases from RESOLVE were also included at all stages. The final
PAE2754_MM structure was then used as a molecular replacement model for the native
data and this structure was completed using 5 cycles of model building and refinement.
Refinement statistics for both models are given in Table 2.
[Table 2]
In vitro exonuclease assays
An 18-nucleotide (nt) primer (5'-CGCGCCGTTGCTATCTCC-3') was annealed to a 54 nt
primer (5'-ATTGAGAAATTCACGGCGNNKATANNKNNKGTTNNKGGAGATAGCAACGGCGCG-3';
N=T,G,A,C; K=T,G) to form an 18 bp double stranded DNA duplex with a
36 nt, 5'-3' single stranded, randomised overhang. 200 pM of DNA was then mixed with
200 pM of PAE2754 in 20 mM NaCl and 10 mM MgCl2 (MgCl2 was omitted from the
negative control) and incubated at 37 °C. The reaction was stopped at different time
points by the addition of formamide gel loading buffer (80% v/v formamide, 10 mM
EDTA), followed by freezing at -20 °C. Samples were run on a 20% polyacrylamide-urea
denaturing minigel and visualised using ethidium bromide staining. Samples were also
prepared in the same way using MnCl2 as the metal ion source.
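
The substrate design described above can be checked with a short script. The following is a minimal sketch, not part of the original protocol: it assumes the short primer anneals to the 3' end of the long oligonucleotide, and the function names are illustrative only. It confirms that the two sequences given in the text are complementary over 18 positions, leaving a 36-nucleotide single stranded overhang that contains four degenerate NNK codons.

```python
# Sketch: verify the primer/template pairing of the exonuclease substrate.
# Sequences are copied from the text; helper names are hypothetical.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

PRIMER = "CGCGCCGTTGCTATCTCC"
TEMPLATE = "ATTGAGAAATTCACGGCGNNKATANNKNNKGTTNNKGGAGATAGCAACGGCGCG"


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def anneal(primer, template):
    """Return (duplex length, single stranded overhang), assuming the
    primer pairs with the 3' end of the template."""
    site = reverse_complement(primer)
    if not template.endswith(site):
        raise ValueError("primer is not complementary to the template 3' end")
    return len(site), template[: len(template) - len(site)]


duplex_len, overhang = anneal(PRIMER, TEMPLATE)

# N = any of 4 bases, K = T or G, so each NNK codon has 4 * 4 * 2 variants.
n_variants = 4 ** overhang.count("N") * 2 ** overhang.count("K")

print(f"duplex: {duplex_len} bp, overhang: {len(overhang)} nt")
print(f"NNK codons in overhang: {overhang.count('NNK')}")
print(f"distinct overhang sequences: {n_variants}")
```

Because the overhang is randomised, any cleavage seen in the assay reflects activity on a pool of sequences rather than on one defined substrate.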
RESULTS
Crystal structure of PAE2754
The crystal structure of the protein encoded by the open reading frame PAE2754 from
P. aerophilum was solved at 2.8 Å resolution, as a selenomethionine (SeMet)-labelled
double mutant, incorporating two L→M mutations, that was constructed to facilitate
phasing by MAD methods. This derivative structure was then refined to give a final R-
factor of 0.226 (Rfree = 0.279). The native structure was then solved by molecular
replacement and refined at 2.5 Å resolution to a final R-factor of 0.250 (Rfree = 0.305).
The resulting model has good stereochemistry (Table 2) with 92% of residues in the most
favoured region of the Ramachandran plot.
The PAE2754 monomer forms a single domain in which the 133-residue polypeptide is
folded as an α/β/α stack, with a central twisted parallel β-sheet of five short strands
(Figure 1). The strand order is 32145 and the twist of the sheet is such that the outer
strands, which involve just three residues each, are oriented at 160° with respect to each
other. Between strand 2 and strand 3, helices α2 and α3 pack in an antiparallel manner to
form a long protrusion that extends orthogonally from the α/β/α stack. Hydrophobic
cores above and below the central β-sheet stabilise the α/β/α stack and a third
hydrophobic mini-core is formed below the stack by the orthogonal packing of helices α4
and α5. These hydrophobic cores are highly populated by Ala, Leu and Val residues,
which comprise no less than 40% of the PAE2754 sequence.
[Figure 1]
Dynamic light scattering measurements show that PAE2754 forms a tetramer in solution,
and accordingly the 12 monomers in the crystal asymmetric unit are organised as three
tetramers (8 monomers in the asymmetric unit are organised as two tetramers for
PAE2754_MM). The tetramer is best described as a dimer of dimers (Figure 2). An
extensive dimer interface is formed by the two-fold related packing of a nearly
continuous region of sequence between residues 32 and 94, spanning helices α2, α3 and
α4. This interface buries 1440 Å2 of surface area (19% of the total monomer surface) and
is dominated by hydrophobic interactions, marked by a striking interdigitation of many
large hydrophobic sidechains. There are just six hydrogen bonds between the two protein
chains, all centred around the stacked histidine aromatic rings at the centre of the
interface.
[Figure 2]
The tetramer is formed by the association of two dimers via a relatively small interface,
in which the C-terminus of helix α2 from one monomer contacts the N-terminus of helix
α6 from another. Stabilising interactions at this interface are modest, and include a salt
bridge between Arg48 from one monomer and Asp110 and Glu112 from another. The
backbone carbonyl group from Arg48 also hydrogen bonds across the interface to the
amide group of Arg111. Although only 450 Å2 of surface area per monomer is buried on
formation of this interface, the cooperative association of two dimers buries in total
4x450=1800 Å2. There is also a diagonally stabilising interaction across the tetramer
whereby a chloride ion linearly coordinates two arginine residues. The chloride lies at
the centre of a sphere of charged side-chains which include Arg48 (chain A), Glu112 and
Lys116 (chain B), Glu112 and Lys116 (chain C), and Arg48 (chain D).
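
The buried-surface figures quoted for the two interfaces can be put on a common footing with some simple arithmetic. The sketch below is illustrative only and rests on one assumption that the text does not state explicitly: that the 1440 Å2 buried at the dimer interface is a per-monomer value, so that the quoted 19% can be used to back-calculate the total monomer surface.

```python
# Sketch: compare the dimer and tetramer interfaces using values from the text.
# Assumption (not stated in the text): 1440 A^2 is buried per monomer.

DIMER_BURIED_PER_MONOMER = 1440.0     # A^2, dimer interface
DIMER_FRACTION_OF_SURFACE = 0.19      # 19% of the total monomer surface
TETRAMER_BURIED_PER_MONOMER = 450.0   # A^2, dimer-dimer interface
MONOMERS_PER_TETRAMER = 4

# Total monomer surface implied by the dimer interface figures.
monomer_surface = DIMER_BURIED_PER_MONOMER / DIMER_FRACTION_OF_SURFACE

# Total area buried when two dimers associate (the 4 x 450 in the text).
tetramer_buried_total = MONOMERS_PER_TETRAMER * TETRAMER_BURIED_PER_MONOMER

# Fraction of each monomer's surface used by the dimer-dimer interface.
tetramer_fraction = TETRAMER_BURIED_PER_MONOMER / monomer_surface

print(f"implied monomer surface: {monomer_surface:.0f} A^2")
print(f"buried on tetramer formation: {tetramer_buried_total:.0f} A^2")
print(f"dimer-dimer interface: {tetramer_fraction:.1%} of monomer surface")
```

Under this assumption the dimer-dimer contact uses roughly a third as much of each monomer's surface as the dimer interface does, in keeping with its description as relatively small.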
Residues that are conserved across COG4113 (Figures 2 and 3) are clustered in a pocket
formed at the C-terminal end of the β-sheet and the N-termini of helices α2 and α6. This
arrangement brings together four conserved acidic residues (Asp8, Glu38, Asp92 and
Asp110 in PAE2754) that point into the pocket and create a highly negatively charged
hole. Two other conserved residues on either side of Asp110, Thr108 and Leu112, also
flank the acidic pocket with Thr108 being hydrogen bonded to Asp8. In the PAE2754
dimer (Figure 2) the two acidic pockets are approximately 20 Å apart, and are separated
by an intriguing structure formed by the one remaining fully-conserved residue, Tyr91;
the two Tyr91 side chains lie adjacent at the dimer interface, with their aromatic rings
parallel and 6 Å apart. Upon formation of the tetramer, the four active site pockets lie in
the interior of a tunnel with restricted access via two openings on opposite sides of the
tetramer (Figure 2). Adjacent lysine and glutamic acid residues (Lys45, Glu46) from each
monomer flank the entrances to the tunnel.
[Figure 3]
Structural comparisons
A DALI search (22) using the monomer structure as a search model gave no significant
matches to structures in the Protein Data Bank (PDB). The best structural match was to a
porcine D-amino acid oxidase (DAO) (23) with a Z-score of 3.3, a root-mean-square
difference (RMSD) in atomic positions of 3.9 Å for 97 Cα positions, and 8% sequence
identity. The topological match between PAE2754 and DAO in the overlaid region was
relatively good, with matches for all the major elements of secondary structure in
PAE2754 except α4, although DAO does have two large insertions of 90 amino acids
(between α2 and α3) and 110 amino acids (between β4 and β5). DAO has an FAD (flavin
adenine dinucleotide) cofactor whose nucleotide component binds into the hole where the
active site is hypothesised for PAE2754. This appeared to support the PIN domain
annotation of “possible nucleotide binding protein” from the major databases and was
also consistent with the RNase hypothesis of Clissold and Ponting (10). A second match
to the ADP binding domain of trimethylamine dehydrogenase (24), where the nucleotide
was similarly orientated, added weight to this hypothesis.
Perhaps more significant, however, was the presence in the top ten DALI structural
matches of the T4 RNase H structure (25). This also has low DALI scores (Z=2.8, RMSD
= 3.6 Å over 84 amino acids and 10% sequence identity) and the topological matches
between PAE2754 and T4 RNase H were significantly poorer than for DAO, with no
matches for α1, α2, α4 or α7 of PAE2754. What was striking, however, was the
observation that residues that are conserved across COG4113, including the acidic
residues at the putative active site, aligned structurally with similar residues in T4
RNase H that are involved in Mg2+ binding and catalysis (Figure 4). Furthermore, these
residues are also conserved across a large family of related prokaryotic exonucleases.
This led us to test Mg2+ binding and DNase activity in vitro.
Thus, while the fold of PAE2754 is most closely related to domains that bind ADP or
FAD, suggesting nucleotide binding as part of its core function, sequence conservation at
the active site predicts that PAE2754 (along with the many other PIN domains) belongs
to the T4 RNase H family of exonucleases (26). Further, once the superposition of T4
RNase H on to the PAE2754 structure could be established, it became clear that other
members of the exonuclease family, including the exonuclease domain of Taq DNA
polymerase (27,28) and flap endonuclease-1 (FEN-1) (29,30), which has both endo-
and exonuclease activities, are also structurally related to PAE2754 and, by extension, to
other PIN domains.
In vitro tests for exonuclease activity
Exonuclease assays were carried out using synthetic DNA primers designed to give a
long 5'-3' single stranded overhang. A time course incubation clearly shows PAE2754 has
a Mg2+-dependent exonuclease activity (see Figure 5). This experiment was repeated with
Mn2+ in place of Mg2+ with equivalent results (data not shown), but no activity was seen
in the absence of a suitable divalent cation. These initial tests show that the cleavage of
single stranded DNA by PAE2754 is slow and requires equimolar amounts of DNA and
protein, together with Mg2+ or Mn2+, for catalysis. The sluggish reaction may be the result of
the non-optimal substrate and/or the non-optimal temperature of the assay – we presume
that the optimal temperature for this enzyme is 95-100 °C. Assays to determine substrate
specificity and the optimal temperature are the subject of ongoing work.
[Figure 5]
Putative active site
The four conserved acidic residues in each monomer, Asp8, Glu38, Asp92 and Asp110,
are clustered together in a surface pocket, facing into the tunnel through the centre of the
tetramer. Two of these residues, Asp8 and Asp110, are at the N-termini of helices α1 and
α6 respectively, and their carboxylate groups are fixed in place by typical helix N-cap
hydrogen bonds (with Ala11 NH and Tyr113 NH). This is reminiscent of the first Mg2+
site in the P. furiosus FEN-1 endo/exonuclease (29), where both the Asp residues that
directly coordinate the Mg2+ ion are fixed at helix N-termini. It seems likely, by analogy,
that the Mg2+ site in PAE2754 may be similarly pre-organised, with Asp8 and Asp110
directly coordinating the metal. Asp92 could also coordinate a metal ion bound in this
way, either directly, or indirectly via a water molecule, but Glu38 is more remote (6 Å
away). If two Mg2+ ions are bound, as is the case in T4 ribonuclease H, the flap
exonucleases and many other exo- and endonucleases and polymerases, it is likely that
Glu38 would participate in binding the second Mg2+ ion. If this is so, the distance
between the two Mg2+ ions will be somewhere between the ~4 Å seen in the Klenow
fragment of DNA polymerase (31) and the ~8 Å seen in the T5 5'-exonuclease (32).
The invariant threonine residue, Thr108, is a candidate for involvement in catalysis, but is
fairly well buried, hydrogen bonded to Asp8 Oδ2 and Asp110 NH, and it is difficult to
see how it can play any direct role. Two other hydroxyl-containing residues, Ser10 and
Thr89, which are almost fully conserved, are also adjacent to the metal site where they
are fully exposed in the central tunnel and could play a role in catalysis or binding. Two
other features of the active site region seem likely to be important. First, the pairwise,
parallel, stacking (Figure 2) of the aromatic rings of the conserved Tyr91 residues on the
inner surface of the central tunnel suggests that these residues could participate in
substrate binding by stacking on either side of a nucleotide base. Aromatic residues have
previously been found to perform such a function in, for example, single stranded RNA
and DNA binding domains (33,34). Second, side chains of Lys45 from each monomer
project into the tunnel and could be involved in binding to nucleic acid phosphate groups.
This residue is almost fully conserved as Lys or Arg.
The active site is only accessible from inside the tunnel through the tetramer, implying
that nucleic acid substrates must thread through it. The opening to the
tunnel is an oval with dimensions of approximately 10 x 14 Å, too small for double
stranded DNA or RNA, but consistent with the protein cleaving overhanging single-
stranded nucleic acids or flap structures, as is the case for flap endonucleases.
DISCUSSION
PIN domains, of which PAE2754 is the first to be structurally characterised, appear to be
highly abundant and to have a deep lineage through all three kingdoms of life. Thus,
Clissold and Ponting (10) identified 95 PIN domain sequences from a 90% non-
redundant database using HMM sequence searching methods, Makarova and colleagues
found 122 PIN sequences (35), from all three kingdoms, and Pfam annotates 345 proteins
as containing a PIN domain. No fewer than ten related COGs bear the annotation
“Predicted nucleic acid-binding protein, contains PIN domain”. It is a measure of the
increasing power of bioinformatics approaches that Clissold and Ponting were able to
cluster the PIN domains with exonucleases and hypothesise their role in RNAi and
nonsense mediated RNA degradation when just 5 of ~135 sequence positions are
conserved across the domain.
Our structure of PAE2754 places these conserved residues close together in
3-dimensional space. Together with our experimental evidence for nuclease activity, and
the demonstrated structural homology with known exonucleases such as
T4 ribonuclease H and the flap exonucleases, it further provides compelling evidence that
PIN domains do indeed play a role in DNA and/or RNA editing processes through
Mg2+-dependent exonuclease activity.
Multiple sequence alignments of PIN domains based on COGs or Pfam show three
principal features: (i) three conserved aspartic acid residues, one near the N-terminus of
the protein and two clustered near the C-terminus; (ii) a conserved threonine or serine
adjacent to the last conserved aspartic acid, forming a T/SxD motif in which the threonine
or serine possibly plays a catalytic role; (iii) a well conserved acidic residue (either Asp
or Glu) in the centre of the sequence. The acidic residues are clustered in such a way that
they could support the coordination of either one or two Mg2+ ions. If two Mg2+ ions are
bound, as is commonly the case in polymerases and nucleases (25), and has been
proposed as essential for catalysis (31), it is probable that one will be bound to Asp8,
Asp92 and Asp110, which are in close proximity, and the second to Glu38. The former
site may be of higher affinity given that the side chains of Asp8 and Asp110 are fixed at
the N-termini of α-helices. On the other hand, a direct and essential role for Thr108 in
catalysis seems less likely, given its relatively buried location.
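
The T/SxD motif described above lends itself to a simple pattern search. The following is a minimal sketch under stated assumptions: the example sequence is an invented toy peptide, not the PAE2754 sequence, and the function name is hypothetical. In PAE2754 the motif corresponds to Thr108-x-Asp110.

```python
# Sketch: locate T/SxD motifs (Thr or Ser, any residue, Asp) in a protein
# sequence given in one-letter code. The toy sequence below is made up.
import re

# A lookahead is used so that overlapping motifs are all reported.
TSXD = re.compile(r"(?=([TS].D))")


def find_tsxd(sequence):
    """Return (1-based position, motif) for every T/SxD match."""
    return [(m.start() + 1, m.group(1)) for m in TSXD.finditer(sequence)]


toy_sequence = "MKDAAGTLDLEASVDK"  # hypothetical, for illustration only
for position, motif in find_tsxd(toy_sequence):
    print(position, motif)
```

A three-residue pattern of this kind will match by chance in many proteins, so in practice it is only informative when the hit also aligns with the conserved aspartic acid near the C-terminus of the PIN domain.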
The ways in which PIN domains associate into oligomers, or are combined with other
domains or other proteins, are likely to determine the types of editing in which they are
involved and the types of substrates on which they act. Thus, the PAE2754 structure is
tetrameric, both in solution (as shown by dynamic light scattering and gel filtration) and
in the crystal, and the tunnel through the centre of the tetramer provides quite restricted
access to the active site. Our preliminary modelling suggested that only single stranded
nucleic acids, or single stranded overhangs from duplex structures, could serve as
substrates. This hypothesis is supported by our in vitro assays, using double stranded
DNA with a single-stranded overhang as a template, although we have not demonstrated
any specificity in either substrate or the direction of DNA cleavage and further functional
studies clearly need to be carried out. We also note that the FEN1 endonuclease from
Pyrococcus horikoshii forms a dimer that is topologically similar to the PAE2754
tetramer, with the two active sites inside the dimer and access via a hole to the exterior.
In contrast, a second example of what is clearly a PIN domain structure has recently been
deposited in the Protein Data Bank (ID code 1O4W, deposited by the Joint Center for
Structural Genomics (San Diego)). This protein shares 23% sequence identity with
PAE2754, and forms a very similar monomer in which 101 residues match with an
RMSD of 3.0 Å. This A. fulgidus PIN domain forms a dimer in which the monomers are
tethered by a stretch of 10 amino acids at the C-terminus of the protein, thus separating
the monomer active sites by 42 Å and creating a very different molecular environment
from that of the four active sites in the PAE2754 tetramer.
An intriguing feature of the phylogenetic distribution of PIN domains is that they seem to
be amplified in a number of species, sometimes to a remarkable extent. These include
Archaeoglobus fulgidus, Pyrococcus horikoshii, and Methanococcus jannaschii, all of
which are thermophilic euryarchaeota. Among mesophilic bacteria, PIN domains also
seem to be extraordinarily amplified in Mycobacterium tuberculosis. COG1848, proteins
from which have been predicted to be part of a new DNA repair system (11), includes no
fewer than 14 PIN-domain proteins from M. tuberculosis. Additionally, three other
COGs are found to include a further 12 M. tuberculosis PIN-domain proteins between
them. This presents the intriguing question as to why M. tuberculosis would have 26 PIN
domains as part of its DNA or RNA editing machinery.
This augmentation of the exonuclease family and the accumulation of PIN-domain
proteins in certain species raise intriguing questions as to the cellular function of the
PIN-domains. In thermophilic archaea, it is reasonable to expect that there would be
additional suites of DNA and RNA editing and repair mechanisms due to the elevated
levels of oligonucleotide modification and damage caused by the high temperatures.
Indeed, a predicted new DNA repair operon encompassing ~20 genes in archaea contains
a PIN-domain protein (11,12). However, the augmentation of the exonucleases in
mesophiles is unexpected. M. tuberculosis is the most extreme example of the currently
sequenced organisms, with its 26 PIN-domain proteins, all of which contain the acidic
quartet of Mg2+ binding residues and the adjacent serine or threonine. This suggests that a
large retinue of exo- or endonuclease enzymes is present in this pathogenic bacterium.
There has been speculation about the functional relevance of the presence or absence of
DNA repair genes in M. tuberculosis. It has been suggested that an absence of repair
enzymes might be beneficial under conditions of stress or therapeutic treatment during
long periods of stationary phase as is the case for M. tuberculosis (36,37). On the other
hand, the correlation between those species lacking the mutS gene and thus assumed to be
mismatch repair deficient, and the expansion, in the same species, of the PIN-domain
proteins may not be coincidental. We conclude that these PIN-domain proteins are very
likely substitutes for a number of exo- and endonucleases, with roles in DNA repair that
are apparently missing from the M. tuberculosis genome and from the genomes of
various extremophile species (12).
A second possible role for these nucleases is as a defence arsenal designed to neutralise
phage via DNA and/or RNA degradation. In eukaryotes, it seems that editing and repair
functions for PIN domains have been adapted to degrade RNA via the RNAi and NMD
pathways (10).
ACKNOWLEDGEMENTS
We thank Alexi Murzin for helpful discussions surrounding PIN domains and their
functional and structural classification. We thank Li-Wei Hung for the data collection at
the National Synchrotron Light Source, Brookhaven, under the auspices of the
International Mycobacterium tuberculosis Structural Genomics Consortium. We thank
Clyde Smith for MAD data collection at the SSRL. This work was supported by funding
from the Marsden Fund and the Health Research Council of New Zealand.
REFERENCES
1. Marcotte, E. M., Pellegrini, M., Thompson, M. J., Yeates, T. O., and Eisenberg,
D. (1999) Nature. 402, 83-86
2. von Mering, C., Huynen, M., Jaeggi, D., Schmidt, S., Bork, P., and Snel, B.
(2003) Nucleic Acids Res. 31, 258-261
3. Goulding, C. W., Apostol, M., Anderson, D. H., Gill, H. S., Smith, C. V., Kuo, M.
R., Yang, J. K., Waldo, G. S., Suh, S. W., Chauhan, R., Kale, A., Bachhawat, N.,
Mande, S. C., Johnston, J. M., Lott, J. S., Baker, E. N., Arcus, V. L., Leys, D.,
McLean, K. J., Munro, A. W., Berendzen, J., Sharma, V., Park, M. S., Eisenberg,
D., Sacchettini, J., Alber, T., Rupp, B., Jacobs, W., Jr., and Terwilliger, T. C.
(2002) Curr. Drug Targets - Infect. Disord. 2, 121-141
4. Stevens, R. C., Yokoyama, S., and Wilson, I. A. (2001) Science 294, 89-92
5. Adams, M. W., Dailey, H. A., DeLucas, L. J., Luo, M., Prestegard, J. H., Rose, J.
P., and Wang, B. C. (2003) Acc. Chem. Res. 36, 191-198
6. Teichmann, S. A., Murzin, A. G., and Chothia, C. (2001) Curr. Opin. Struct. Biol.
11, 354-363
7. Fitz-Gibbon, S. T., Ladner, H., Kim, U. J., Stetter, K. O., Simon, M. I., and
Miller, J. H. (2002) Proc. Natl. Acad. Sci. U.S.A. 99, 984-989
8. Wall, D., and Kaiser, D. (1999) Mol. Microbiol. 32, 1-10
9. Noguchi, E., Hayashi, N., Azuma, Y., Seki, T., Nakamura, M., Nakashima, N.,
Yanagida, M., He, X., Mueller, U., Sazer, S., and Nishimoto, T. (1996) EMBO J.
15, 5595-5605
10. Clissold, P. M., and Ponting, C. P. (2000) Curr. Biol. 10, R888-R890
11. Makarova, K. S., Aravind, L., Grishin, N. V., Rogozin, I. B., and Koonin, E. V.
(2002) Nucleic Acids Res. 30, 482-496
12. Makarova, K. S., Aravind, L., Galperin, M. Y., Grishin, N. V., Tatusov, R. L.,
Wolf, Y. I., and Koonin, E. V. (1999) Genome Res. 9, 608-628
13. Arcus, V. L., Backbro, K., Roos, A., and Baker, E. N. (2003) Acta Crystallogr.
Sect. D, submitted for publication.
14. Otwinowski, Z., and Minor, W. (1997) Methods Enzymol. 276, 307-326
15. Gonzalez, A. (2003) Acta Crystallogr. Sect. D 59, 315-322
16. Terwilliger, T. C. (2002) Acta Crystallogr. Sect. D 58, 1937-1940
17. Terwilliger, T. C. (2001) Acta Crystallogr. Sect. D 57, 1755-1762
18. Holton, T., Ioerger, T. R., Christopher, J. A., and Sacchettini, J. C. (2000) Acta
Crystallogr. Sect. D 56, 722-734
19. Ioerger, T. R., and Sacchettini, J. C. (2002) Acta Crystallogr. Sect. D 58, 2043-2054
20. Jones, T. A., Zou, J. Y., Cowan, S. W., and Kjeldgaard, M. (1991) Acta
Crystallogr. Sect. A 47, 110-119
21. Brunger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P.,
Grosse-Kunstleve, R. W., Jiang, J. S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R.
J., Rice, L. M., Simonson, T., and Warren, G. L. (1998) Acta Crystallogr. Sect. D
54, 905-921
22. Holm, L., and Sander, C. (1995) Trends Biochem. Sci. 20, 478-480
23. Miura, R., Setoyama, C., Nishina, Y., Shiga, K., Mizutani, H., Miyahara, I., and
Hirotsu, K. (1997) J. Biochem. 122, 825-833
24. Barber, M. J., Neame, P. J., Lim, L. W., White, S., and Matthews, F. S. (1992) J.
Biol. Chem. 267, 6611-6619
25. Mueser, T. C., Nossal, N. G., and Hyde, C. C. (1996) Cell 85, 1101-1112
26. Lo Conte, L., Ailey, B., Hubbard, T. J., Brenner, S. E., Murzin, A. G., and
Chothia, C. (2000) Nucleic Acids Res. 28, 257-259
27. Eom, S. H., Wang, J., and Steitz, T. A. (1996) Nature 382, 278-281
28. Kim, Y., Eom, S. H., Wang, J., Lee, D. S., Suh, S. W., and Steitz, T. A. (1995)
Nature 376, 612-616
29. Hosfield, D. J., Mol, C. D., Shen, B., and Tainer, J. A. (1998) Cell 95, 135-146
30. Hwang, K. Y., Baek, K., Kim, H. Y., and Cho, Y. (1998) Nature Struct. Biol. 5,
707-713
31. Beese, L. S., and Steitz, T. A. (1991) EMBO J. 10, 25-33
32. Ceska, T. A., Sayers, J. R., Stier, G., and Suck, D. (1996) Nature 382, 90-93
33. Draper, D. E. (1999) J. Mol. Biol. 293, 255-270
34. Mitton-Fry, R. M., Anderson, E. M., Hughes, T. R., Lundblad, V., and Wuttke, D.
S. (2002) Science 296, 145-147
35. Makarova, K. S., and Koonin, E. V. (2003) Genome Biol. 4, 115.111-115.117
36. Karunakaran, P., and Davies, J. (2000) J. Bacteriol. 182, 3331-3335
37. Mizrahi, V., and Andersen, S. J. (1998) Mol. Microbiol. 29, 1331-1339
38. DeLano, W. L. (2002) The PyMOL Molecular Graphics System, DeLano
Scientific, San Carlos, CA, USA
39. Nicholls, A., Sharp, K., and Honig, B. (1991) Proteins Struct. Funct. Genet. 11,
281-296
40. Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F., and Higgins, D. G.
(1997) Nucleic Acids Res. 25, 4876-4882
FOOTNOTES
The atomic coordinates and structure factors for PAE2754 (code 1v8p) and
PAE2754_MM (code 1v8o) have been deposited in the Protein Data Bank, Research
Collaboratory for Structural Bioinformatics, Rutgers University, New Brunswick, NJ
(http://www.rcsb.org/pdb/).
Abbreviations
COG, cluster of orthologous groups; NMD, nonsense-mediated degradation; PAE2754,
protein encoded by open reading frame number 2754 from Pyrobaculum aerophilum;
MAD, multiwavelength anomalous diffraction; SeMet, selenomethionine; SAD,
single-wavelength anomalous diffraction; NCS, non-crystallographic symmetry; DAO,
D-amino acid oxidase; RMSD, root-mean-square deviation.
FIGURE LEGENDS
FIG. 1. Structure and topology of the PAE2754 monomer. A stereo ribbon diagram
showing the organisation of secondary structure elements for PAE2754. Alpha-helices
are labelled sequentially (with the exception of helix 6 whose label is omitted for clarity)
and the N- and C-termini are also labelled. The central, twisted, parallel β-sheet is shown
in green, with the strand order 32145 proceeding perpendicular to the page from front to
back. This figure, along with Figures 2 and 4, was drawn using PyMOL (38).
FIG. 2. Oligomeric state for PAE2754 showing conserved residues. A. View of the
dimer showing the residues that are conserved across COG4113 (also shown in the
alignment in Figure 3). For chain A three of the residues are labelled with their one-letter
amino acid code. B. Surface depiction (in the same orientation as A) of the PAE2754
dimer showing electrostatic charges at the surface using GRASP. C. Orthogonal views of
the tetramer surface showing the tunnel inside which the putative active site resides. This
figure was drawn using PyMOL (38) and GRASP (39).
FIG. 3. Sequence and structural alignments for PIN domain proteins. A. Selection of
protein sequences from COG4113 aligned using the program ClustalX (40). Seven
positions are highlighted in red identifying the near-strict conservation at these positions
across this cluster. Sequences are named according to their genome and open reading
frame numbers in accordance with Table 1. B. Sequences from five structures aligned
based on structural homology using DALI. Secondary structure elements for PAE2754
are shown above this alignment and the five conserved positions are highlighted in red.
Numbers in parentheses within the sequences indicate insertions, that is, the number of
residues in a loop in each structure which do not structurally align amongst the group. Numbers in
parentheses at the C-termini of the sequences indicate the number of amino acids in the
full-length proteins.
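The identification of near-strictly conserved positions described in this legend can be illustrated with a minimal Python sketch. It is not the procedure used by the authors (the alignments were produced with ClustalX and DALI); the function name `conserved_columns`, the `min_fraction` threshold and the toy sequences are all hypothetical and serve only to show how conserved alignment columns might be flagged.

```python
from collections import Counter
from typing import List


def conserved_columns(alignment: List[str],
                      min_fraction: float = 1.0,
                      gap: str = "-") -> List[int]:
    """Return 0-based column indices whose most common residue is not a gap
    and occurs in at least `min_fraction` of the aligned sequences.

    `min_fraction` is an illustrative threshold, not a value from the paper.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    hits = []
    for col in range(width):
        column = [row[col] for row in alignment]
        residue, count = Counter(column).most_common(1)[0]
        if residue != gap and count / len(alignment) >= min_fraction:
            hits.append(col)
    return hits


# Toy alignment (hypothetical sequences, not taken from Figure 3).
toy = ["MDAS-YD",
       "LDASAYD",
       "-DTSLYD"]
strict = conserved_columns(toy)                      # strictly conserved columns
relaxed = conserved_columns(toy, min_fraction=0.66)  # allow one mismatch in three
```

Lowering `min_fraction` below 1.0 corresponds to accepting "near-strict" rather than strict conservation; the appropriate cut-off depends on the number and diversity of the aligned sequences.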
FIG. 4. Structural comparison of the PAE2754 PIN domain with T4 RNase H.
A. A stereo view of the structures of PAE2754 (grey) and T4 RNase H (blue) aligned
according to the superposition matrix from DALI. This produces a close alignment of the
central β-sheet of the two proteins, though other secondary structural elements match less
well. PAE2754 helices are labelled according to Figure 2 and the positions of the two
Mg2+ ions in RNase H are shown as blue spheres. B. An expanded view, in stereo, of the
active site showing just the secondary structure elements for PAE2754 and the matching
residues from each of the two structures along with the two Mg2+ ions from the T4 RNase
H structure. Putative active site residues for PAE2754 are shown in grey and active site
residues for T4 RNase H are shown in blue. For clarity, just three of the five conserved
positions are labelled (PAE2754 sequence and numbering). The remaining unlabelled
residues are D92 and T108.
FIG. 5. In vitro exonuclease activity of PAE2754. A polyacrylamide/urea denaturing gel
showing DNA stained by ethidium bromide (see text). Lane A shows the 54 bp
oligonucleotide alone. Lanes B, C, D, E, F and G show 1, 2, 3, 4, 5 and 19 hr incubations,
respectively, of annealed oligos (54 bp + 18 bp) with PAE2754 and MgCl2 at 37 °C. Lane
H shows a 19 hour incubation of annealed oligos (54 bp + 18 bp) with PAE2754 at 37 °C
in the absence of MgCl2.
TABLES
Table 1. Taxonomy and makeup of COG4113
Taxa                  Organism                     Proteins
Alpha proteobacteria  Sinorhizobium meliloti       SMA0545
Actinobacteria        Mycobacterium tuberculosis   Rv0065, Rv0549, Rv0960, Rv1720
Cyanobacteria         Synechocystis sp.            SLL1225
                      Nostoc sp.                   ALL5132
Crenarchaeota         Pyrobaculum aerophilum       PAE0151, PAE0285, PAE0337, PAE2754
                      Sulfolobus solfataricus      SSO0798, SSO1243, SSO1493, SSO1786,
                                                   SSO1914, SSO1922, SSO1970
Euryarchaeota         Pyrococcus horikoshii        PH0098, PH0389
Table 2. Data collection, structure solution and refinement statistics.
Parameters                       Native                 SeMet
A. Crystal data
  Space group                    P2₁2₁2₁                P2₁
  Cell axial lengths (Å)         60.6, 165.2, 203.4     56.4, 193.3, 60.5
  Cell angles (°)                90, 90, 90             90, 94.6, 90
B. Data collection
  Resolution (Å)                 50-2.50 (2.59-2.50)    40-2.75 (2.85-2.75)
  Measured reflections           879180                 286429
  Unique reflections             69531                  32550
  Completeness (%)               97.5 (91.4)            96.1 (72.7)
  Mosaicity                      0.42                   0.39
  Rmerge (%) a                   6.4 (41.8)             10.3 (31.9)
  I/σI                           21.9 (3.2)             10.7 (1.9)
C. Phasing, SOLVE
  Resolution (Å)                                        40-2.8
  Rano b                                                7.5
  Sites                                                 17
  Mean FOM                                              0.18
  Z-score (σ) c                                         20.6
D. Refinement
  Resolution (Å)                 50-2.5 (2.66-2.50)     40-2.8 (2.98-2.80)
  R                              25.0 (33.5)            22.6 (32.7)
  Rfree                          30.5 (40.4)            27.9 (35.3)
  Molecules/asymmetric unit      12 (3 tetramers)       8 (2 tetramers)
  Protein atoms                  12,658                 8,424
  Water molecules                100                    65
  rms deviation
    bond lengths (Å)             0.007                  0.007
    bond angles (°)              1.3                    1.3
  Average B values (Å²)
    Protein                      44.8                   31.2
    Water                        40.7                   26.2
a Rmerge = Σ|Iobs - <I>|/ΣIobs; b Rano = Σ|F(+) - F(-)|/Σ[F(+) + F(-)]; c (3).
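The definitions of Rmerge and Rano given in the footnotes to Table 2 can be made concrete with a short Python sketch. It simply evaluates the two sums as written in the footnotes; it is not the implementation used by the data-processing or phasing programs cited in the text, and the function names, the input layout (a mapping from Miller indices to repeated intensity measurements, and a list of F(+)/F(-) pairs) and the toy numbers are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Sequence, Tuple

Hkl = Tuple[int, int, int]


def r_merge(observations: Dict[Hkl, Sequence[float]]) -> float:
    """Rmerge = sum|I_obs - <I>| / sum(I_obs).

    `observations` maps each unique reflection to its repeated intensity
    measurements; <I> is the mean intensity of that reflection.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def r_ano(friedel_pairs: Iterable[Tuple[float, float]]) -> float:
    """Rano = sum|F(+) - F(-)| / sum[F(+) + F(-)] over Friedel pairs."""
    pairs = list(friedel_pairs)
    numerator = sum(abs(f_plus - f_minus) for f_plus, f_minus in pairs)
    denominator = sum(f_plus + f_minus for f_plus, f_minus in pairs)
    return numerator / denominator


# Toy data (hypothetical values, not from the PAE2754 data sets).
toy_obs = {(1, 0, 0): [100.0, 110.0], (0, 1, 0): [50.0, 50.0]}
toy_pairs = [(10.0, 8.0), (5.0, 5.0)]
rm = r_merge(toy_obs)    # fraction; multiply by 100 to compare with Table 2
ra = r_ano(toy_pairs)
```

Both functions return fractions, whereas Table 2 reports the corresponding values as percentages.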
Figure 3. Arcus et al.
A
PAE0151    1 ------MKLVVDASAIAALYVPEERSEQ----AERAVSQAQELHTLDLAAYEVANDLWKHARRGLLREDEASNMLEELWEFFKALKVHS
Rv0065     1 -----MDECVVDAAAVVDALAGKGASAI----VLRGLLKESISNAPHLLDAEVGHALRRAVLSDEISEEQAR-AALDALPYLIDNRYP-
Rv0960     1 -------MIVVDASAALAALLNDGQ--------ARQLIAAERLHVPHLVDSEIASGLRRLAQRDRLGAADGR-RALQTWRRLAVTRYP-
SSO0798    1 ---MKGNGFLFDASALYPLLDYIDK------------IDVKKIYILTLTFYEVGNAIWKEYYIHKKVKDPIT-LSMLFNDLLRRFNVV-
SSO1243    1 ---MKDKEFLLDASALYSLLDYVDK------------VDVKKIHVLTLTFYEVGNVIWKEYYIHKKVKDPIT-LSRLFYKLMRKFNVI-
SMA0545    1 -----METLVADASIAIKWVVEEEGTDS-----AVELRSRFRFAAPELLIPECANILWKKVQRGELSRDEAV-LAAKLLERSGIDFVS-
ALL5132    1 MREDVTRVLCLDTSVWIPYLVPEVYQSQAVTLVTEALSLNIRLVAPAFAWAEVGSVLRKKTRMGVITAEEAL-GFFEDFCELPIDYIEE
SLL1225    1 MTNQTSFTICIDSNFIVRLLVGYYEETIYLEMWNKWCNANTKIVAPDLINYEVTNVLWRLNKTNQINYTQAQIALTESFN-LGIELYSN

PAE0151   80 YAEVLKDAFALALKHGVTVYDAAYVALAEKIGGKLLTLDRQLAEKFPALVTP----------------- 131
Rv0065    79 HSPRLIEYT-WQLRHNVTFYDALYVALATALDVPLLTGDSRLAAAPGLPCEIKLVR------------- 133
Rv0960    73 VVG-LFERI-WEIRANLSAYDASYVALAEALNCALVTADLRLSDTGQAQCPITVVPR------------ 127
SSO0798   73 EDPPLDKVMKVAIDKGLTYYDASYVYVAESLGLTLVSNNRELIRKAN-AITLEELIKGV---------- 130
SSO1243   73 EDSPLEGVMRIAIERGLTYYNASYAYVAESLGLILVSNDKELIRKAN-AISLKDLIKSM---------- 130
SMA0545   78 MTGLLEEATNLSIVLSHPAYDCTYLIAAQRTGSRFVTADMRLLRIVSERAPGEIARLCVSLPDARNDAH 146
ALL5132   89 EAIRLRSWEIAEQYGLLTLYDAAFLACAEMTSAEFWTADAALVKQVIPRPSYLREIGEI---------- 147
SLL1225   89 SELHQDALAIAEKFQLSAAYDVHYLALAEKMQIDFYTCDKKLFNSVQQNFPRIKLVIANSS-------- 149

B
PAE2754    1 MPVEYLVDASALYAL--AAHYDKWIK--------HREKLAILHLTIYEAGNALWKEARLGR---VDWAAASRHLKKVLSSFKVL-
AF0591    10 KVRCAVVDTNVLMYVYLNKADVVGQLREF-----GFSRFLITASVKRELEKLEMSLR--------GKEKVAARFALKLLEHFEV-
T4RNaseH  12 KEGICLIDFSQIALSTALVNFPDKEKINLSMV-RHLI(17)KIVLCIDNAKSGYWRRDFAYYYKKTW------DWEGYFESSHKV
TAQ       10 PKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGF(13)AVIVVFDARAP---------------------TPEDFPRQLALI
FEN1      17 EDLKGKKVAIDGMNALYQFLTSIRLPLRNRKGEI---TSA(18)TPIWVFDGEPPKLKEK(22)EDFEEAAKYAKRVSYLTPKMVENC

PAE2754   72 EDPPLDEVLRVAVERGLTFYDASYAYVAESSGLVLVTQDRELLAKTK----GAIDVETLLVRLAAQ---- 133
AF0591    81 VETE-------------SEGDPSLIEAAEKYGCILITNDKELKRKAKQRGIPVGYLKEDKRVFVELL--- 135
T4RNaseH 112 IDELKAYMPYIVMDIDKYEADDHIAVLV(8)KILIISSDGDFTQLHKYPNVKQWSPMHKKWVKIGS---- 184 (305)
TAQ      100 KE-LVDLLGLARLEVPGYEADDVLASLA(8)EVRILTADKDLYQLLS-DRIHVLHPE-GYLITPAWLWEK 171 (832)
FEN1     136 KYLLSLM--GIPYVEAPSEGEAQASYMAKKGDVWVVSQDYDALLYGAPRVVRNLTTTKEMPELIELNEVL 204 (315)
Distant structural homology leads to the functional characterisation of an archaeal
PIN-domain as an exonuclease
Vickery L. Arcus, Kristina Bäckbro, Annette Roos, Emma L. Daniel and Edward N. Baker
J. Biol. Chem. published online January 20, 2004
Access the most updated version of this article at doi: 10.1074/jbc.M313833200