Distant structural homology leads to the functional characterisation

of an archaeal PIN-domain as an exonuclease.

Vickery L. Arcus1,2,*, Kristina Bäckbro1,4, Annette Roos1,4, Emma L. Daniel1,3 and

Edward N. Baker1,3.

1School of Biological Sciences, 2AgResearch Structural Biology Laboratory and 3Centre

of Molecular Biodiscovery, University of Auckland, Private Bag 92019, Auckland, New

Zealand.

4Present address: Department of Cell and Molecular Biology, Uppsala University,

Biomedical Centre, Box 596, SE-751 24 Uppsala, Sweden.

*Corresponding author

Name: Vickery Arcus

Phone: +64-9-373-7599

Fax: +64-9-373-7414

Email: [email protected]

Running title: Structure and function of an archaeal PIN domain

Key words: Structural genomics, PIN-domain, X-ray crystallography, exonuclease.

JBC Papers in Press. Published on January 20, 2004 as Manuscript M313833200

Copyright 2004 by The American Society for Biochemistry and Molecular Biology, Inc.

Downloaded from http://www.jbc.org/ by guest on April 16, 2018

SUMMARY

Genome sequencing projects have focused attention on the problem of discovering the

functions of protein domains that are widely distributed throughout living species but

which are, as yet, largely uncharacterised. One such example is the PIN domain, found in

eukaryotes, bacteria and archaea, and with suggested roles in signalling, RNase editing

and/or nucleotide binding. The first reported crystal structure of a PIN domain (open

reading frame PAE2754, derived from the crenarchaeon, Pyrobaculum aerophilum) has

been determined to 2.5 Å resolution and is presented here. Mapping conserved residues

from a multiple sequence alignment onto the structure identifies a putative active site.

The discovery of distant structural homology with several exonucleases, including T4

phage RNase H and flap endonuclease (FEN1), further suggests a likely function for PIN

domains as Mg2+-dependent exonucleases, a hypothesis which we have confirmed in

vitro. The tetrameric structure of PAE2754, with the active sites inside a tunnel, suggests

a mechanism for selective cleavage of single stranded overhangs or flap structures. These

results indicate likely DNA or RNA editing roles for prokaryotic PIN domains, which are

strikingly numerous in thermophiles, and in organisms such as Mycobacterium

tuberculosis. They also support previous hypotheses that eukaryotic PIN domains

participate in RNAi and nonsense mediated RNA degradation (NMD).

INTRODUCTION

The explosive growth of whole genome sequencing efforts, and the discovery that a large

proportion of the assumed gene products are of unknown or poorly-understood function,

have focused attention on new approaches to assigning function. In the absence of

sufficient sequence similarity to clearly infer homology with already characterised

proteins, a variety of bioinformatic approaches have been used to obtain functional clues.

These include, for example, analyses of genome location (seeking potential operons),

phylogenetic profiling, and observations of gene fusions in different species (1,2). An

alternative, complementary, approach is to use analyses of protein three-dimensional

structure to derive functional insights, since three-dimensional structure is conserved in

evolution much more strongly than sequence. This provides a rationale for a number of

structural genomics initiatives (3-6).

As part of a pilot structural genomics project aimed at the discovery of biological

function, we have focused on gene products from the hyperthermophilic crenarchaeon

Pyrobaculum aerophilum, an organism whose complete genome sequence was published

recently (7). A whole-genome comparison of P. aerophilum and Mycobacterium

tuberculosis, two organisms with very different, and in a sense extreme, lifestyles, led us

to identify a set of 250 pairs of orthologous genes that are both widely distributed in

nature and are shared by these two organisms. Among these were a set of four genes from

P. aerophilum (PAE0151, PAE0285, PAE0337, PAE2754) and four from M. tuberculosis

(Rv0065, Rv0549, Rv0960, Rv1720) that have since been clustered at NCBI as part of

COG4113, with members drawn from archaea, cyanobacteria, actinobacteria and alpha

proteobacteria (Table 1). These are now annotated in Pfam as PIN domains.

[Table 1]

PIN domains, named for their homology with the N-terminal domain of the pili

biogenesis protein (PIN = PilT N-terminus) (8), comprise a very large family of proteins

with representatives in all three kingdoms of life. They are classified in the Pfam database

(http://www.sanger.ac.uk/Software/Pfam) as PF01850, currently with more than 340

members. Functional annotation of the PIN domains is equivocal. They were initially

thought to function in signalling (9), but a recent bioinformatic analysis of nearly 100

PIN domain sequences identified a set of five conserved acidic residues and a sixth

conserved position at which there is either a serine or threonine residue. This has led to a

suggested exonuclease function (10). In eukaryotes, for example, the C. elegans PIN

domain, smg-5, and the yeast PIN domain, NMD4p, are postulated to be ribonucleases

(RNases) that bind to helicases as part of the machinery for RNA degradation via the

RNAi and nonsense mediated RNA degradation (NMD) pathways (10).

In archaea and thermophilic bacteria, PIN domains have been associated with a possible

role in DNA repair. A recent analysis of conserved gene context across fully sequenced

prokaryotic genomes revealed, in most archaea and some thermophilic bacteria, a

previously unrecognised cluster of genes containing DNA polymerases, helicases,

nucleases and many conserved hypothetical open reading frames, one of which is a PIN

domain clustering in COG1848 (11). This suggested a new DNA repair system in these

organisms. DNA repair, and particularly mismatch repair, in thermophiles is a vexing

question. The absence of key mismatch repair enzymes such as MutS and MutL, which

are highly conserved in mesophiles from E. coli to humans, has been suggested to result

in “mutator” lifestyles for some thermophiles (7), in which adaptive mutations enable the

organism to adapt to stress or extreme environments (12). Also absent from many of the

fully sequenced thermophilic genomes are several well conserved nucleotide excision

repair enzymes. The discovery of a new DNA repair operon in archaea addresses this

question and by implicating PIN domains as part of this operon adds another piece of

functional evidence for this large protein family.

Here we present the first crystal structure of a PIN domain, from the crenarchaeon

Pyrobaculum aerophilum. The protein, the gene product of open reading frame

PAE2754, proves to be a distant structural homologue of T4 RNase H and other

exonucleases, despite insignificant sequence identity. Strict conservation of the active site

residues suggests that this PIN domain is indeed an exonuclease. We have confirmed this

functional hypothesis in vitro. This has important implications both for archaeal DNA

editing and for the role of PIN domains in eukaryotic RNA editing. It is also an

illustration of the power of structural genomics whereby deep phylogenetic lineages are

apparent at the structural level and lead directly to functional characterisation of proteins

of previously unknown function or with equivocal and/or general functional annotation.

EXPERIMENTAL PROCEDURES

Protein Expression, Purification and Crystallization

The predicted open reading frame PAE2754 was amplified from genomic DNA by PCR,

subcloned into the expression vector pPROeX (Lifetech), transformed into E. coli

BL21(DE3) cells and expressed as an N-terminal His6-tagged protein. Purification

involved a heat step, in which incubation for 40 minutes at 80 ºC denatured a large

fraction of the E. coli proteins, followed by Ni2+ affinity chromatography and size

exclusion chromatography, as described (13).

Two site-specific mutations, L65M and L80M, were designed and introduced to facilitate

structure determination by multiwavelength anomalous diffraction (MAD) methods using

the selenomethionine (SeMet)-substituted protein. The single mutants were individually

made and tested for expression and crystallization, followed by the double mutant

L65M/L80M. Mutagenesis was performed with the QuikChange site-directed

mutagenesis kit (Stratagene). The double mutant was stable at 80 ºC suggesting that the

structure was not significantly destabilised by these mutations. The plasmid encoding the

double mutant was transformed into the methionine auxotroph E. coli strain DL41(DE3)

and grown in LeMaster medium with SeMet as the only methionine source. The SeMet-

substituted double mutant protein (SeMet-PAE2754_MM) was then purified as above.

Both native PAE2754 and SeMet-PAE2754_MM were crystallized as described (13) and

flash-cooled for data collection by soaking in cryoprotectant (mother liquor plus 10%

glycerol) immediately prior to placement in a stream of cold N2 gas at 110 K.

Structure Determination and Refinement

Native PAE2754 X-ray diffraction data to 2.5 Å resolution were collected at the National

Synchrotron Light Source (NSLS), Brookhaven, on beamline X8C (λ=1.0000 Å). MAD

data at two wavelengths were collected for SeMet-PAE2754_MM at the Stanford

Synchrotron Radiation Laboratory (SSRL), beamline 9-1. The data were processed using

DENZO and SCALEPACK (14). Data collection and refinement statistics are given in

Table 2. The structure of PAE2754_MM was determined by single-wavelength anomalous diffraction

(SAD) using a single SeMet dataset at λ=0.9794 Å. This is not where the anomalous

differences are maximised for Se, but was necessary because the “remote” wavelength

data set proved to be of poor quality due to crystal decay (data were initially collected at

two wavelengths in accordance with (15)). Using SOLVE (16,17), a total of 17 out of a

possible 24 Se sites (from the 8 monomers in the crystal asymmetric unit) were located,

based on anomalous differences. These gave initial phases to 2.8 Å with a figure of merit

of 0.18 and a Z-score of 20.6. The phases were improved using maximum likelihood

density modification via RESOLVE (16). Successful improvement of the phases was

dependent on user-defined non-crystallographic symmetry elements based on the initial

Se positions.

Much of the core structure (approximately 74 residues per monomer) was built

automatically with RESOLVE (16) and TEXTAL (18,19), and the rest of the structure

was built manually, during twelve cycles of model building with O (20) and refinement

with CNS (21). Due to the relatively low resolution of the data, non-crystallographic

symmetry (NCS) was included as a restraint in all refinement cycles and weighted

experimental phases from RESOLVE were also included at all stages. The final

PAE2754_MM structure was then used as a molecular replacement model for the native

data and this structure was completed using 5 cycles of model building and refinement.

Refinement statistics for both models are given in Table 2.

[Table 2]

In vitro exonuclease assays

An 18-nucleotide (nt) primer (5'-CGCGCCGTTGCTATCTCC-3') was annealed to a 54-nt

primer (5'-ATTGAGAAATTCACGGCGNNKATANNKNNKGTTNNKGGAGA

TAGCAACGGCGCG-3'; N=T,G,A,C; K=T,G) to form double stranded DNA with a

36-nt, 5'-3' single stranded, randomised overhang. 200 pM of DNA was then mixed with

200 pM of PAE2754 in 20 mM NaCl and 10 mM MgCl2 (MgCl2 was omitted from the

negative control) and incubated at 37 ºC. The reaction was stopped at different time

points by the addition of formamide gel loading buffer (80% v/v formamide, 10 mM

EDTA), followed by freezing at -20 ºC. Samples were run on a 20% polyacrylamide-urea

denaturing minigel and visualised using ethidium bromide staining. Samples were also

prepared in the same way using MnCl2 as the metal ion source.
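The oligonucleotide design described above can be checked computationally. The following is a minimal Python sketch, not part of the original protocol, which confirms that the short primer is the reverse complement of the 3' end of the long oligonucleotide, reports the length of the resulting single-stranded 5' overhang, and counts the distinct template sequences implied by the degenerate N and K positions. The function names are hypothetical and chosen for illustration only.

```python
# Degenerate base codes as defined in the text: N = T,G,A,C and K = T,G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

PRIMER = "CGCGCCGTTGCTATCTCC"
TEMPLATE = "ATTGAGAAATTCACGGCGNNKATANNKNNKGTTNNKGGAGATAGCAACGGCGCG"


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def duplex_and_overhang(primer, template):
    """Return (duplex length, overhang length) when the primer anneals
    to the 3' end of the template; raise if it does not anneal there."""
    site = reverse_complement(primer)
    if not template.endswith(site):
        raise ValueError("primer does not anneal to the 3' end of the template")
    return len(site), len(template) - len(site)


def library_size(template):
    """Number of distinct sequences encoded by a degenerate template."""
    size = 1
    for base in template:
        size *= len(IUPAC[base])
    return size


duplex_len, overhang_len = duplex_and_overhang(PRIMER, TEMPLATE)
print(duplex_len, overhang_len, library_size(TEMPLATE))
```

With the sequences as given, the duplex region is 18 base pairs, the single-stranded overhang is 36 nucleotides, and the four NNK triplets give 4^8 x 2^4 = 1,048,576 possible template sequences. Only 12 of the 36 overhang positions are degenerate.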

RESULTS

Crystal structure of PAE2754

The crystal structure of the protein encoded by the open reading frame PAE2754 from

P. aerophilum was solved at 2.8 Å resolution, as a selenomethionine (SeMet)-labelled

double mutant, incorporating two L→M mutations, that was constructed to facilitate

phasing by MAD methods. This derivative structure was then refined to give a final R-

factor of 0.226 (Rfree = 0.279). The native structure was then solved by molecular

replacement and refined at 2.5 Å resolution to a final R-factor of 0.250 (Rfree = 0.305).

The resulting model has good stereochemistry (Table 2) with 92% of residues in the most

favoured region of the Ramachandran plot.
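For reference, the R-factors quoted above follow the conventional crystallographic definition, given here as standard background rather than as a detail taken from the refinement itself. The symbols F_obs and F_calc denote the observed and model-derived structure factor amplitudes for each reflection hkl.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```

Rfree is the same quantity evaluated only over a test set of reflections that is excluded from refinement, which is why it is higher than the working R-factor for both models.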

The PAE2754 monomer forms a single domain in which the 133-residue polypeptide is

folded as an α/β/α stack, with a central twisted parallel β-sheet of five short strands

(Figure 1). The strand order is 32145 and the twist of the sheet is such that the outer

strands, which involve just three residues each, are oriented at 160º with respect to each

other. Between strand 2 and strand 3, helices α2 and α3 pack in an antiparallel manner to

form a long protrusion that extends orthogonally from the α/β/α stack. Hydrophobic

cores above and below the central β-sheet stabilise the α/β/α stack and a third

hydrophobic mini-core is formed below the stack by the orthogonal packing of helices α4

and α5. These hydrophobic cores are highly populated by Ala, Leu and Val residues,

which comprise no less than 40% of the PAE2754 sequence.
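The compositional statement above is a simple residue count. As a minimal sketch, assuming a one-letter amino acid sequence as input, the fraction of Ala, Leu and Val residues could be computed as follows. The sequence shown is a hypothetical toy example, not the PAE2754 sequence, and the function name is an illustrative choice.

```python
from collections import Counter


def residue_fraction(sequence, residues="ALV"):
    """Fraction of positions in a one-letter protein sequence that are
    occupied by any of the given residue types."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return sum(counts[r] for r in residues) / len(seq)


# Hypothetical toy sequence, used only to illustrate the calculation.
TOY_SEQUENCE = "MALVKALDEV"
print(residue_fraction(TOY_SEQUENCE))
```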

[Figure 1]

Dynamic light scattering measurements show that PAE2754 forms a tetramer in solution,

and accordingly the 12 monomers in the crystal asymmetric unit are organised as three

tetramers (8 monomers in the asymmetric unit are organised as two tetramers for

PAE2754_MM). The tetramer is best described as a dimer of dimers (Figure 2). An

extensive dimer interface is formed by the two-fold related packing of a nearly

continuous region of sequence between residues 32 and 94, spanning helices α2, α3 and

α4. This interface buries 1440 Å2 of surface area (19% of the total monomer surface) and

is dominated by hydrophobic interactions, marked by a striking interdigitation of many

large hydrophobic sidechains. There are just six hydrogen bonds between the two protein

chains, all centred around the stacked histidine aromatic rings at the centre of the

interface.

[Figure 2]

The tetramer is formed by the association of two dimers via a relatively small interface,

in which the C-terminus of helix α2 from one monomer contacts the N-terminus of helix

α6 from another. Stabilising interactions at this interface are modest, and include a salt

bridge between Arg48 from one monomer and Asp110 and Glu112 from another. The

backbone carbonyl group from Arg48 also hydrogen bonds across the interface to the

amide group of Arg111. Although only 450 Å2 of surface area per monomer is buried on

formation of this interface, the cooperative association of two dimers buries in total

4x450=1800 Å2. There is also a diagonally stabilising interaction across the tetramer

whereby a chloride ion linearly coordinates two arginine residues. The chloride lies at

the centre of a sphere of charged side-chains which include Arg48 (chain A), Glu112 and

Lys116 (chain B), Glu112 and Lys116 (chain C), and Arg48 (chain D).

Residues that are conserved across COG4113 (Figures 2 and 3) are clustered in a pocket

formed at the C-terminal end of the β-sheet and the N-termini of helices α2 and α6. This

arrangement brings together four conserved acidic residues (Asp8, Glu38, Asp92 and

Asp110 in PAE2754) that point into the pocket and create a highly negatively charged

hole. Two other conserved residues on either side of Asp110, Thr108 and Leu112, also

flank the acidic pocket with Thr108 being hydrogen bonded to Asp8. In the PAE2754

dimer (Figure 2) the two acidic pockets are approximately 20 Å apart, and are separated

by an intriguing structure formed by the one remaining fully-conserved residue, Tyr91;

the two Tyr91 side chains lie adjacent at the dimer interface, with their aromatic rings

parallel and 6 Å apart. Upon formation of the tetramer, the four active site pockets lie in

the interior of a tunnel with restricted access via two openings on opposite sides of the

tetramer (Figure 2). Adjacent lysine and glutamic acid residues (Lys45, Glu46) from each

monomer flank the entrances to the tunnel.
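The mapping of conserved alignment positions onto residue numbers of a reference structure, as used above, can be sketched in a few lines of Python. This is an illustrative outline under simple assumptions (strict identity in every sequence, gaps written as "-"), not the procedure used to prepare Figures 2 and 3. The toy alignment and the function name are hypothetical.

```python
def conserved_positions(alignment, reference_index=0):
    """Return [(residue number in the reference, residue)] for alignment
    columns in which every sequence carries the same non-gap residue."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    reference = alignment[reference_index]
    conserved = []
    ref_number = 0
    for col, ref_res in enumerate(reference):
        if ref_res == "-":
            continue  # gap in the reference: no residue number to report
        ref_number += 1
        if all(seq[col] == ref_res for seq in alignment):
            conserved.append((ref_number, ref_res))
    return conserved


# Hypothetical toy alignment; the first sequence plays the role of the
# structure whose residue numbering is reported.
TOY_ALIGNMENT = [
    "MKD-LTVD",
    "MRDALSVD",
    "MHD-ITAD",
]
print(conserved_positions(TOY_ALIGNMENT))
```

In practice a less strict criterion, such as allowing Ser/Thr or Lys/Arg substitutions at a position, would be needed to recover the "almost fully conserved" residues discussed in the text.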

[Figure 3]

Structural comparisons

A DALI search (22) using the monomer structure as a search model gave no significant

matches to structures in the Protein Data Bank (PDB). The best structural match was to a

porcine D-amino acid oxidase (DAO) (23) with a Z-score of 3.3, a root-mean-square

difference (RMSD) in atomic positions of 3.9 Å for 97 Cα positions, and 8% sequence

identity. The topological match between PAE2754 and DAO in the overlaid region was

relatively good, with matches for all the major elements of secondary structure in

PAE2754 except α4, although DAO does have two large insertions of 90 amino acids

(between α2 and α3) and 110 amino acids (between β4 and β5). DAO has an FAD (flavin

adenine dinucleotide) cofactor whose nucleotide component binds into the hole where the

active site is hypothesised for PAE2754. This appeared to support the PIN domain

annotation of “possible nucleotide binding protein” from the major databases and was

also consistent with the RNase hypothesis of Clissold and Ponting (10). A second match

to the ADP binding domain of trimethylamine dehydrogenase (24), where the nucleotide

was similarly orientated, added weight to this hypothesis.
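The RMSD values quoted for these comparisons are root-mean-square distances between equivalent Cα atoms after superposition. The following minimal sketch shows only that final calculation, assuming the two coordinate lists are already superimposed and paired. It does not reproduce the DALI alignment or the superposition step, and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two equally long lists of
    (x, y, z) coordinates that are already superimposed and paired."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates: three atoms, each displaced by 3 Å along z.
SET_A = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
SET_B = [(0.0, 0.0, 3.0), (3.8, 0.0, 3.0), (7.6, 0.0, 3.0)]
print(rmsd(SET_A, SET_B))
```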

Perhaps more significant, however, was the presence in the top ten DALI structural

matches of the T4 RNase H structure (25). This also has low DALI scores (Z=2.8, RMSD

= 3.6 Å over 84 amino acids and 10% sequence identity) and the topological matches

between PAE2754 and T4 RNase H were significantly poorer than for DAO, with no

matches for α1, α2, α4 or α7 of PAE2754. What was striking, however, was the

observation that residues that are conserved across COG4113, including the acidic

residues at the putative active site, aligned structurally with similar residues in T4

RNase H that are involved in Mg2+ binding and catalysis (Figure 4). Furthermore, these

residues are also conserved across a large family of related prokaryotic exonucleases.

This led us to test Mg2+ binding and DNase activity in vitro.

Thus, while the fold of PAE2754 is most closely related to domains that bind ADP or

FAD, suggesting nucleotide binding as part of its core function, sequence conservation at

the active site predicts that PAE2754 (along with the many other PIN domains) belongs

to the T4 RNase H family of exonucleases (26). Further, once the superposition of T4

RNase H on to the PAE2754 structure could be established, it became clear that other

members of the exonuclease family, including the exonuclease domain of Taq DNA

polymerase (27,28) and the Flap endonuclease-1 (FEN-1) (29,30), which has both endo

and exo-nuclease activities, are also structurally related to PAE2754 and, by extension, to

other PIN domains.

In vitro tests for exonuclease activity

Exonuclease assays were carried out using synthetic DNA primers designed to give a

long 5'-3' single stranded overhang. A time course incubation clearly shows PAE2754 has

a Mg2+-dependent exonuclease activity (see Figure 5). This experiment was repeated with

Mn2+ in place of Mg2+ with equivalent results (data not shown), but no activity was seen

in the absence of a suitable divalent cation. These initial tests show that the cleavage of

single stranded DNA by PAE2754 is slow and requires equimolar amounts of DNA,

Mg2+ or Mn2+ and protein to provide catalysis. The sluggish reaction may be the result of

the non-optimal substrate and/or the non-optimal temperature of the assay – we presume

that the optimal temperature for this enzyme is 95-100 ºC. Assays to determine substrate

specificity and the optimal temperature are the subject of ongoing work.

[Figure 5]

Putative active site

The four conserved acidic residues in each monomer, Asp8, Glu38, Asp92 and Asp110

are clustered together in a surface pocket, facing into the tunnel through the centre of the

tetramer. Two of these residues, Asp8 and Asp110 are at the N-termini of helices α1 and

α6 respectively, and their carboxylate groups are fixed in place by typical helix N-cap

hydrogen bonds (with Ala11 NH and Tyr113 NH). This is reminiscent of the first Mg2+

site in the P. furiosus FEN-1 endo/exonuclease (29), where both the Asp residues that

directly coordinate the Mg2+ ion are fixed at helix N-termini. It seems likely, by analogy,

that the Mg2+ site in PAE2754 may be similarly pre-organised, with Asp8 and Asp110

directly coordinating the metal. Asp92 could also coordinate a metal ion bound in this

way, either directly, or indirectly via a water molecule, but Glu38 is more remote (6 Å

away). If two Mg2+ ions are bound, as is the case in T4 ribonuclease H, the flap

exonucleases and many other exo- and endonucleases and polymerases, it is likely that

Glu38 would participate in binding the second Mg2+ ion. If this is so, the distance

between the two Mg2+ ions will be somewhere between the ~4 Å seen in the Klenow

fragment of DNA polymerase (31) and the ~8 Å seen in the T5 5’-exonuclease (32).

The invariant threonine residue, Thr108, is a candidate for involvement in catalysis, but is

fairly well buried, hydrogen bonded to Asp8 Oδ2 and Asp110 NH, and it is difficult to

see how it can play any direct role. Two other hydroxyl-containing residues, Ser10 and

Thr89, which are almost fully conserved, are also adjacent to the metal site where they

are fully exposed in the central tunnel and could play a role in catalysis or binding. Two

other features of the active site region seem likely to be important. First, the pairwise,

parallel, stacking (Figure 2) of the aromatic rings of the conserved Tyr91 residues on the

inner surface of the central tunnel suggests that these residues could participate in

substrate binding by stacking on either side of a nucleotide base. Aromatic residues have

previously been found to perform such a function in, for example, single stranded RNA

and DNA binding domains (33,34). Second, side chains of Lys45 from each monomer

project into the tunnel and could be involved in binding to nucleic acid phosphate groups.

This residue is almost fully conserved as Lys or Arg.

The active site is only accessible from inside the tunnel through the tetramer, implying

that nucleic acid substrates must thread through it. The opening to the tunnel is an

oval with dimensions of approximately 10 x 14 Å, too small for double

stranded DNA or RNA, but consistent with the protein cleaving overhanging single-

stranded nucleic acids or flap structures, as is the case for flap endonucleases.
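The size argument above can be made explicit with a very simple geometric check. This sketch treats the substrate as a rigid cylinder entering end-on, which is a deliberate simplification. The duplex diameter of about 20 Å is a textbook approximation for B-form DNA and is an assumption here, not a value measured in this work.

```python
def fits_through_oval(diameter, axis_a, axis_b):
    """A rigid cylinder entering end-on can pass an oval opening only if
    its diameter does not exceed the shorter axis of the opening."""
    return diameter <= min(axis_a, axis_b)


TUNNEL_OPENING = (10.0, 14.0)   # Å, approximate dimensions from the structure
B_DNA_DIAMETER = 20.0           # Å, assumed textbook value for a B-form duplex

print(fits_through_oval(B_DNA_DIAMETER, *TUNNEL_OPENING))
```

The check returns False for the duplex, in line with the conclusion that only single-stranded regions could reach the active sites.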

DISCUSSION

PIN domains, of which PAE2754 is the first to be structurally characterised, appear to be

highly abundant and to have a deep lineage through all three kingdoms of life. Thus,

Clissold and Ponting (10) identified 95 PIN domain sequences from a 90% non-

redundant database using HMM sequence searching methods, Makarova and colleagues

find 122 PIN sequences (35), from all three kingdoms, and Pfam annotates 345 proteins

as containing a PIN domain. No fewer than ten related COGs bear the annotation

“Predicted nucleic acid-binding protein, contains PIN domain”. It is a measure of the

increasing power of bioinformatics approaches that Clissold and Ponting were able to

cluster the PIN domains with exonucleases and hypothesise their role in RNAi and

nonsense mediated RNA degradation when just 5 of ~135 sequence positions are

conserved across the domain.

Our structure of PAE2754 places these conserved residues close together in

3-dimensional space. Together with our experimental evidence for nuclease activity, and

the demonstrated structural homology with known exonucleases such as

T4 ribonuclease H and the flap exonucleases, it further provides compelling evidence that

PIN domains do indeed play a role in DNA and/or RNA editing processes through

Mg2+-dependent exonuclease activity.

Multiple sequence alignments of PIN domains based on COGs or Pfam show three

principal features: (i) three conserved aspartic acid residues, one near the N-terminus of

the protein and two clustered near the C-terminus; (ii) a conserved threonine or serine

adjacent to the last conserved aspartic acid, forming a T/SxD motif in which the threonine

or serine possibly plays a catalytic role; (iii) a well conserved acidic residue (either Asp

or Glu) in the centre of the sequence. The acidic residues are clustered in such a way that

they could support the coordination of either one or two Mg2+ ions. If two Mg2+ ions are

bound, as is commonly the case in polymerases and nucleases (25), and has been

proposed as essential for catalysis (31), it is probable that one will be bound to Asp8,

Asp92 and Asp110, which are in close proximity, and the second to Glu38. The former

site may be of higher affinity given that the side chains of Asp8 and Asp110 are fixed at

the N-termini of α-helices. On the other hand, a direct and essential role for Thr108 in

catalysis seems less likely, given its relatively buried location.
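The T/SxD motif described in feature (ii) is easy to locate in a protein sequence with a regular expression. The sketch below is illustrative only: the sequence is a hypothetical toy example and the function name is an assumption. In PAE2754 the corresponding residues would be Thr108 and Asp110.

```python
import re

# Thr or Ser, any residue, then Asp. The lookahead allows overlapping
# occurrences of the motif to be reported as well.
TSXD_MOTIF = re.compile(r"(?=([TS].D))")


def find_tsxd(sequence):
    """Return [(1-based position of the Thr/Ser, matched tripeptide)]."""
    return [(m.start() + 1, m.group(1))
            for m in TSXD_MOTIF.finditer(sequence.upper())]


# Hypothetical toy sequence, used only to illustrate the search.
TOY_SEQUENCE = "MAADTLDGSKDLL"
print(find_tsxd(TOY_SEQUENCE))
```

A motif this short will occur by chance in many proteins, so a match is only meaningful in the context of an aligned PIN-domain family.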

The ways in which PIN domains associate into oligomers, or are combined with other

domains or other proteins is likely to determine the types of editing in which they are

involved and the types of substrates on which they act. Thus, the PAE2754 structure is

tetrameric, both in solution (as shown by dynamic light scattering and gel filtration) and

in the crystal, and the tunnel through the centre of the tetramer provides quite restricted

access to the active site. Our preliminary modelling suggested that only single stranded

nucleic acids, or single stranded overhangs from duplex structures, could serve as

substrates. This hypothesis is supported by our in vitro assays, using double stranded

DNA with a single-stranded overhang as a template, although we have not demonstrated

any specificity in either substrate or the direction of DNA cleavage and further functional

studies clearly need to be carried out. We also note that the FEN1 endonuclease from

Pyrococcus horikoshii forms a dimer that is topologically similar to the PAE2754

tetramer, with the two active sites inside the dimer and access via a hole to the exterior.

In contrast, a second example of what is clearly a PIN domain structure has recently been

deposited in the Protein Data Bank (ID code 1O4W, deposited by the Joint Centre for

Structural Genomics (San Diego)). This protein shares 23% sequence identity with

PAE2754, and forms a very similar monomer in which 101 residues match with an

RMSD of 3.0 Å. This A. fulgidus PIN domain forms a dimer in which the monomers are

tethered by a stretch of 10 amino acids at the C-terminus of the protein, thus separating

the monomer active sites by 42 Å and creating a very different molecular environment

from that of the four active sites in the PAE2754 tetramer.

An intriguing feature of the phylogenetic distribution of PIN domains is that they seem to

be amplified in a number of species, sometimes to a remarkable extent. These include

Archaeoglobus fulgidus, Pyrococcus horikoshii, and Methanococcus jannaschii, all of

which are thermophilic euryarchaeota. Among mesophilic bacteria, PIN domains also

seem to be extraordinarily amplified in Mycobacterium tuberculosis. COG1848, proteins

from which have been predicted to be part of a new DNA repair system (11), includes no

fewer than 14 PIN-domain proteins from M. tuberculosis. Additionally, three other

COGs are found to include a further 12 M. tuberculosis PIN-domain proteins between

them. This presents the intriguing question as to why M. tuberculosis would have 26 PIN

domains as part of its DNA or RNA editing machinery.

This augmentation of the exonuclease family and the accumulation of PIN-domain

proteins in certain species raise intriguing questions as to the cellular function of the

PIN-domains. In thermophilic archaea, it is reasonable to expect that there would be

additional suites of DNA and RNA editing and repair mechanisms due to the elevated

levels of oligonucleotide modification and damage caused by the high temperatures.

Indeed, a predicted new DNA repair operon encompassing ~20 genes in archaea contains

a PIN-domain protein (11,12). However, the augmentation of the exonucleases in

mesophiles is unexpected. M. tuberculosis is the most extreme example of the currently

sequenced organisms, with its 26 PIN-domain proteins, all of which contain the acidic

quartet of Mg2+ binding residues and the adjacent serine or threonine. This suggests that a

large retinue of exo- or endonuclease enzymes is present in this pathogenic bacterium.

There has been speculation about the functional relevance of the presence or absence of

DNA repair genes in M. tuberculosis. It has been suggested that an absence of repair

enzymes might be beneficial under conditions of stress or therapeutic treatment during

long periods of stationary phase as is the case for M. tuberculosis (36,37). On the other

hand, the correlation between those species lacking the mutS gene and thus assumed to be

mismatch repair deficient, and the expansion, in the same species, of the PIN-domain

proteins may not be coincidental. We conclude that these PIN-domain proteins are very

likely substitutes for a number of exo- and endonucleases, with roles in DNA repair that

are apparently missing from the M. tuberculosis genome and from the genomes of

various extremophile species (12).

The second possible role for these nucleases is as a defence arsenal designed to neutralise

phage via DNA and/or RNA degradation. It is the case that these functions have been

adapted in eukaryotic PIN domains to degrade RNA via the RNAi and NMD pathways

(10). Hence in the eukaryotes it seems that editing and repair functions for PIN domains

have likely been adapted to degradation pathways.

ACKNOWLEDGEMENTS

We thank Alexi Murzin for helpful discussions surrounding PIN domains and their

functional and structural classification. We thank Li-Wei Hung for the data collection at

the National Synchrotron Light Source, Brookhaven, under the auspices of the

International Mycobacterium tuberculosis Structural Genomics Consortium. We thank

Clyde Smith for MAD data collection at the SSRL. This work was supported by funding

from the Marsden Fund and the Health Research Council of New Zealand.

REFERENCES

1. Marcotte, E. M., Pellegrini, M., Thompson, M. J., Yeates, T. O., and Eisenberg,

D. (1999) Nature 402, 83-86

2. von Mering, C., Huynen, M., Jaeggi, D., Schmidt, S., Bork, P., and Snel, B.

(2003) Nucleic Acids Res. 31, 258-261

3. Goulding, C. W., Apostol, M., Anderson, D. H., Gill, H. S., Smith, C. V., Kuo, M.

R., Yang, J. K., Waldo, G. S., Suh, S. W., Chauhan, R., Kale, A., Bachhawat, N.,

Mande, S. C., Johnston, J. M., Lott, J. S., Baker, E. N., Arcus, V. L., Leys, D.,

McLean, K. J., Munro, A. W., Berendzen, J., Sharma, V., Park, M. S., Eisenberg,

D., Sacchettini, J., Alber, T., Rupp, B., Jacobs, W., Jr., and Terwilliger, T. C.

(2002) Curr. Drug Targets - Infect. Disord. 2, 121-141

4. Stevens, R. C., Yokoyama, S., and Wilson, I. A. (2001) Science 294, 89-92

5. Adams, M. W., Dailey, H. A., DeLucas, L. J., Luo, M., Prestegard, J. H., Rose, J.

P., and Wang, B. C. (2003) Acc. Chem. Res. 36, 191-198

6. Teichmann, S. A., Murzin, A. G., and Chothia, C. (2001) Curr. Opin. Struct. Biol.

11, 354-363

7. Fitz-Gibbon, S. T., Ladner, H., Kim, U. J., Stetter, K. O., Simon, M. I., and

Miller, J. H. (2002) Proc. Natl. Acad. Sci. U.S.A. 99, 984-989

8. Wall, D., and Kaiser, D. (1999) Mol. Microbiol. 32, 1-10

9. Noguchi, E., Hayashi, N., Azuma, Y., Seki, T., Nakamura, M., Nakashima, N.,

Yanagida, M., He, X., Mueller, U., Sazer, S., and Nishimoto, T. (1996) EMBO J.

15, 5595-5605

10. Clissold, P. M., and Ponting, C. P. (2000) Curr. Biol. 10, R888-890

11. Makarova, K. S., Aravind, L., Grishin, N. V., Rogozin, I. B., and Koonin, E. V.

(2002) Nucleic Acids Res. 30, 482-496

12. Makarova, K. S., Aravind, L., Galperin, M. Y., Grishin, N. V., Tatusov, R. L.,

Wolf, Y. I., and Koonin, E. V. (1999) Genome Res. 9, 608-628

13. Arcus, V. L., Bäckbro, K., Roos, A., and Baker, E. N. (2003) Acta Crystallogr.

Sect. D, submitted for publication.

14. Otwinowski, Z., and Minor, W. (1997) Methods Enzymol. 276, 307-326

15. Gonzalez, A. (2003) Acta Crystallogr. Sect. D 59, 315-322

16. Terwilliger, T. C. (2002) Acta Crystallogr. Sect. D 58, 1937-1940

17. Terwilliger, T. C. (2001) Acta Crystallogr. Sect. D 57, 1755-1762

18. Holton, T., Ioerger, T. R., Christopher, J. A., and Sacchettini, J. C. (2000) Acta

Crystallogr. Sect. D 56, 722-734

19. Ioerger, T. R., and Sacchettini, J. C. (2002) Acta Crystallogr. Sect. D 58, 2043-

2054

20. Jones, T. A., Zou, J. Y., Cowan, S. W., and Kjeldgaard, M. (1991) Acta

Crystallogr. Sect. A 47, 110-119

21. Brunger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-

Kunstleve, R. W., Jiang, J. S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R.

J., Rice, L. M., Simonson, T., and Warren, G. L. (1998) Acta Crystallogr. Sect. D

54, 905-921

22. Holm, L., and Sander, C. (1995) Trends in Biochem. Sci. 20, 478-480

23. Miura, R., Setoyama, C., Nishina, Y., Shiga, K., Mizutani, H., Miyahara, I., and

Hirotsu, K. (1997) J. Biochem. 122, 825-833

24. Barber, M. J., Neame, P. J., Lim, L. W., White, S., and Matthews, F. S. (1992) J.

Biol. Chem. 267, 6611-6619

25. Mueser, T. C., Nossal, N. G., and Hyde, C. C. (1996) Cell 85, 1101-1112

26. Lo Conte, L., Ailey, B., Hubbard, T. J., Brenner, S. E., Murzin, A. G., and

Chothia, C. (2000) Nucleic Acids Res. 28, 257-259

27. Eom, S. H., Wang, J., and Steitz, T. A. (1996) Nature 382, 278-281

28. Kim, Y., Eom, S. H., Wang, J., Lee, D. S., Suh, S. W., and Steitz, T. A. (1995)

Nature 376, 612-616

29. Hosfield, D. J., Mol, C. D., Shen, B., and Tainer, J. A. (1998) Cell 95, 135-146

30. Hwang, K. Y., Baek, K., Kim, H. Y., and Cho, Y. (1998) Nature Struct. Biol. 5,

707-713

31. Beese, L. S., and Steitz, T. A. (1991) EMBO J. 10, 25-33

32. Ceska, T. A., Sayers, J. R., Stier, G., and Suck, D. (1996) Nature 382, 90-93

33. Draper, D. E. (1999) J. Mol. Biol. 293, 255-270

34. Mitton-Fry, R. M., Anderson, E. M., Hughes, T. R., Lundblad, V., and Wuttke, D.

S. (2002) Science 296, 145-147

35. Makarova, K. S., and Koonin, E. V. (2003) Genome Biol. 4, 115.111-115.117

36. Karunakaran, P., and Davies, J. (2000) J. Bacteriol. 182, 3331-3335

37. Mizrahi, V., and Andersen, S. J. (1998) Mol. Microbiol. 29, 1331-1339

38. DeLano, W. L. (2002) The PyMOL Molecular Graphics System, DeLano

Scientific, San Carlos, CA, USA

39. Nicholls, A., Sharp, K., and Honig, B. (1991) Proteins Struct. Funct. Genet. 11,

281-296

40. Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F., and Higgins, D. G.

(1997) Nucleic Acids Res. 25, 4876-4882

FOOTNOTES

The atomic coordinates and structure factors for PAE2754 (code 1v8p) and

PAE2754_MM (code 1v8o) have been deposited in the Protein Data Bank, Research

Collaboratory for Structural Bioinformatics, Rutgers University, New Brunswick, NJ

(http://www.rcsb.org/pdb/).

Abbreviations

COG, cluster of orthologous groups; NMD, nonsense mediated degradation; PAE2754,

protein encoded by open reading frame number 2754 from Pyrobaculum aerophilum;

MAD, multiwavelength anomalous diffraction; SeMet, seleno-methionine; SAD, single

wavelength anomalous diffraction; NCS, non-crystallographic symmetry; DAO, D-amino

acid oxidase; RMSD, root-mean-square deviation.

FIGURE LEGENDS

FIG. 1. Structure and topology of the PAE2754 monomer. A stereo ribbon diagram

showing the organisation of secondary structure elements for PAE2754. Alpha-helices

are labelled sequentially (with the exception of helix 6 whose label is omitted for clarity)

and the N- and C-termini are also labelled. The central, twisted, parallel β-sheet is shown

in green, with the strand order 32145 proceeding perpendicular to the page from front to

back. This figure, along with Figures 2 and 4, was drawn using PyMOL (38).

FIG. 2. Oligomeric state for PAE2754 showing conserved residues. A. View of the

dimer showing the residues that are conserved across COG4113 (also shown in the

alignment in Figure 3). For chain A three of the residues are labelled with their one-letter

amino acid code. B. Surface depiction (in the same orientation as A) of the PAE2754

dimer showing electrostatic charges at the surface using GRASP. C. Orthogonal views of

the tetramer surface showing the tunnel inside which the putative active site resides. This

figure was drawn using PyMOL (38) and GRASP (39).

FIG. 3. Sequence and structural alignments for PIN domain proteins. A. Selection of

protein sequences from COG4113 aligned using the program ClustalX (40). Seven

positions are highlighted in red identifying the near-strict conservation at these positions

across this cluster. Sequences are named according to their genome and open reading

frame numbers in accordance with Table 1. B. Sequences from five structures aligned

based on structural homology using DALI. Secondary structure elements for PAE2754

are shown above this alignment and the five conserved positions are highlighted in red.

Numbers in parentheses within the sequences indicate insertions—the number of residues

in a loop in each structure which do not structurally align amongst the group. Numbers in

parentheses at the C-termini of the sequences indicate the number of amino acids in the

full-length proteins.

FIG. 4. Structural comparison of the PAE2754 PIN domain with T4 RNase H.

A. A stereo view of the structures of PAE2754 (grey) and T4 RNase H (blue) aligned

according to the superposition matrix from DALI. This produces a close alignment of the

central β-sheet of the two proteins, though other secondary structural elements match less

well. PAE2754 helices are labelled according to Figure 2 and the positions of the two

Mg2+ ions in RNase H are shown as blue spheres. B. An expanded view, in stereo, of the

active site showing just the secondary structure elements for PAE2754 and the matching

residues from each of the two structures along with the two Mg2+ ions from the T4 RNase

H structure. Putative active site residues for PAE2754 are shown in grey and active site

residues for T4 RNase H are shown in blue. For clarity, just three of the five conserved

positions are labelled (PAE2754 sequence and numbering). The remaining unlabelled

residues are D92 and T108.

FIG. 5. In vitro exonuclease activity of PAE2754. A polyacrylamide/urea denaturing gel

showing DNA stained by ethidium bromide (see text). Lane A shows the 54 bp

oligonucleotide alone. Lanes B, C, D, E, F and G show 1, 2, 3, 4, 5 and 19 hr incubations,

respectively, of annealed oligos (54 bp + 18 bp) with PAE2754 and MgCl2 at 37 °C. Lane

H shows a 19 hr incubation of annealed oligos (54 bp + 18 bp) with PAE2754 at 37 °C

in the absence of MgCl2.

TABLES

Table 1. Taxonomy and makeup of COG4113

Taxa                  Organism                    Proteins

Alpha proteobacteria  Sinorhizobium meliloti      SMA0545

Actinobacteria        Mycobacterium tuberculosis  Rv0065, Rv0549, Rv0960, Rv1720

Cyanobacteria         Synechocystis sp.           SLL1225

                      Nostoc sp.                  ALL5132

Crenarchaeota         Pyrobaculum aerophilum      PAE0151, PAE0285, PAE0337, PAE2754

                      Sulfolobus solfataricus     SSO0798, SSO1243, SSO1493, SSO1786,
                                                  SSO1914, SSO1922, SSO1970

Euryarchaeota         Pyrococcus horikoshii       PH0098, PH0389

Table 2. Data collection, structure solution and refinement statistics.

Parameters                       Native                 SeMet

A. Crystal data
Space group                      P212121                P21
Cell axial lengths (Å)           60.6, 165.2, 203.4     56.4, 193.3, 60.5
     angles (°)                  90, 90, 90             90, 94.6, 90

B. Data collection
Resolution (Å)                   50-2.50 (2.59-2.50)    40-2.75 (2.85-2.75)
Measured reflections             879180                 286429
Unique reflections               69531                  32550
Completeness (%)                 97.5 (91.4)            96.1 (72.7)
Mosaicity                        0.42                   0.39
Rmerge (%)a                      6.4 (41.8)             10.3 (31.9)
I/σI                             21.9 (3.2)             10.7 (1.9)

C. Phasing, SOLVE
Resolution (Å)                                          40-2.8
Rano b                                                  7.5
Sites                                                   17
Mean FOM                                                0.18
Z-score (σ)c                                            20.6

D. Refinement
Resolution (Å)                   50-2.5 (2.66-2.50)     40-2.8 (2.98-2.80)
R                                25.0 (33.5)            22.6 (32.7)
Rfree                            30.5 (40.4)            27.9 (35.3)
Molecules/asymmetric unit        12 (3 tetramers)       8 (2 tetramers)
Protein atoms                    12,658                 8,424
Water molecules                  100                    65
rms deviation
     bond lengths (Å)            0.007                  0.007
     bond angles                 1.3                    1.3
Average B values (Å2)
     Protein                     44.8                   31.2
     Water                       40.7                   26.2

a Rmerge = Σ|Iobs - <I>|/ΣIobs
b Rano = Σ|F(+) - F(-)|/Σ[F(+) + F(-)]
c (3).
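The two footnote definitions in Table 2 (Rmerge and Rano) can be illustrated with a minimal Python sketch. This is not the code used by the data-processing or phasing programs cited in the paper; the function names, the input layout (a list of (hkl, intensity) pairs and a list of (F(+), F(-)) pairs) and the toy numbers are all assumptions made for illustration.

```python
from collections import defaultdict


def r_merge(observations):
    """Rmerge = sum|Iobs - <I>| / sum(Iobs).

    `observations` is an iterable of (hkl, intensity) pairs (hypothetical
    layout); <I> is the mean intensity of all observations sharing one hkl.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def r_ano(friedel_pairs):
    """Rano = sum|F(+) - F(-)| / sum[F(+) + F(-)], as in the table footnote.

    `friedel_pairs` is an iterable of (F_plus, F_minus) amplitude pairs.
    """
    pairs = list(friedel_pairs)
    numerator = sum(abs(f_plus - f_minus) for f_plus, f_minus in pairs)
    denominator = sum(f_plus + f_minus for f_plus, f_minus in pairs)
    return numerator / denominator


# Toy data, invented purely to exercise the formulas.
toy_observations = [
    ((1, 0, 0), 100.0),
    ((1, 0, 0), 110.0),
    ((0, 1, 0), 50.0),
    ((0, 1, 0), 50.0),
]
toy_pairs = [(10.0, 9.0), (20.0, 21.0)]

if __name__ == "__main__":
    print(f"Rmerge = {100 * r_merge(toy_observations):.1f} %")
    print(f"Rano   = {100 * r_ano(toy_pairs):.1f} %")
```

Both statistics are reported as percentages in Table 2, hence the factor of 100 in the printout.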

Figure 3. Arcus et al.

A

PAE0151    1 ------MKLVVDASAIAALYVPEERSEQ----AERAVSQAQELHTLDLAAYEVANDLWKHARRGLLREDEASNMLEELWEFFKALKVHS
Rv0065     1 -----MDECVVDAAAVVDALAGKGASAI----VLRGLLKESISNAPHLLDAEVGHALRRAVLSDEISEEQAR-AALDALPYLIDNRYP-
Rv0960     1 -------MIVVDASAALAALLNDGQ--------ARQLIAAERLHVPHLVDSEIASGLRRLAQRDRLGAADGR-RALQTWRRLAVTRYP-
SSO0798    1 ---MKGNGFLFDASALYPLLDYIDK------------IDVKKIYILTLTFYEVGNAIWKEYYIHKKVKDPIT-LSMLFNDLLRRFNVV-
SSO1243    1 ---MKDKEFLLDASALYSLLDYVDK------------VDVKKIHVLTLTFYEVGNVIWKEYYIHKKVKDPIT-LSRLFYKLMRKFNVI-
SMA0545    1 -----METLVADASIAIKWVVEEEGTDS-----AVELRSRFRFAAPELLIPECANILWKKVQRGELSRDEAV-LAAKLLERSGIDFVS-
ALL5132    1 MREDVTRVLCLDTSVWIPYLVPEVYQSQAVTLVTEALSLNIRLVAPAFAWAEVGSVLRKKTRMGVITAEEAL-GFFEDFCELPIDYIEE
SLL1225    1 MTNQTSFTICIDSNFIVRLLVGYYEETIYLEMWNKWCNANTKIVAPDLINYEVTNVLWRLNKTNQINYTQAQIALTESFN-LGIELYSN

PAE0151   80 YAEVLKDAFALALKHGVTVYDAAYVALAEKIGGKLLTLDRQLAEKFPALVTP----------------- 131
Rv0065    79 HSPRLIEYT-WQLRHNVTFYDALYVALATALDVPLLTGDSRLAAAPGLPCEIKLVR------------- 133
Rv0960    73 VVG-LFERI-WEIRANLSAYDASYVALAEALNCALVTADLRLSDTGQAQCPITVVPR------------ 127
SSO0798   73 EDPPLDKVMKVAIDKGLTYYDASYVYVAESLGLTLVSNNRELIRKAN-AITLEELIKGV---------- 130
SSO1243   73 EDSPLEGVMRIAIERGLTYYNASYAYVAESLGLILVSNDKELIRKAN-AISLKDLIKSM---------- 130
SMA0545   78 MTGLLEEATNLSIVLSHPAYDCTYLIAAQRTGSRFVTADMRLLRIVSERAPGEIARLCVSLPDARNDAH 146
ALL5132   89 EAIRLRSWEIAEQYGLLTLYDAAFLACAEMTSAEFWTADAALVKQVIPRPSYLREIGEI---------- 147
SLL1225   89 SELHQDALAIAEKFQLSAAYDVHYLALAEKMQIDFYTCDKKLFNSVQQNFPRIKLVIANSS-------- 149

B

PAE2754    1 MPVEYLVDASALYAL--AAHYDKWIK--------HREKLAILHLTIYEAGNALWKEARLGR---VDWAAASRHLKKVLSSFKVL-
AF0591    10 KVRCAVVDTNVLMYVYLNKADVVGQLREF-----GFSRFLITASVKRELEKLEMSLR--------GKEKVAARFALKLLEHFEV-
T4RNaseH  12 KEGICLIDFSQIALSTALVNFPDKEKINLSMV-RHLI(17)KIVLCIDNAKSGYWRRDFAYYYKKTW------DWEGYFESSHKV
TAQ       10 PKGRVLLVDGHHLAYRTFHALKGLTTSRGEPVQAVYGF(13)AVIVVFDARAP---------------------TPEDFPRQLALI
FEN1      17 EDLKGKKVAIDGMNALYQFLTSIRLPLRNRKGEI---TSA(18)TPIWVFDGEPPKLKEK(22)EDFEEAAKYAKRVSYLTPKMVENC

PAE2754   72 EDPPLDEVLRVAVERGLTFYDASYAYVAESSGLVLVTQDRELLAKTK----GAIDVETLLVRLAAQ---- 133
AF0591    81 VETE-------------SEGDPSLIEAAEKYGCILITNDKELKRKAKQRGIPVGYLKEDKRVFVELL--- 135
T4RNaseH 112 IDELKAYMPYIVMDIDKYEADDHIAVLV(8)KILIISSDGDFTQLHKYPNVKQWSPMHKKWVKIGS---- 184 (305)
TAQ      100 KE-LVDLLGLARLEVPGYEADDVLASLA(8)EVRILTADKDLYQLLS-DRIHVLHPE-GYLITPAWLWEK 171 (832)
FEN1     136 KYLLSLM--GIPYVEAPSEGEAQASYMAKKGDVWVVSQDYDALLYGAPRVVRNLTTTKEMPELIELNEVL 204 (315)
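The near-strict conservation highlighted in the Figure 3A alignment can be checked column by column. The following Python sketch is an illustration only, not the procedure used by ClustalX or by the authors: the function names and the 0.9 threshold are assumptions, and the input is a seven-column window (alignment columns 9 to 15 of the first block) copied from the eight COG4113 sequences shown in the figure.

```python
from collections import Counter

GAP = "-"

# Seven-column window (alignment columns 9-15) copied from Figure 3A.
FRAGMENTS = {
    "PAE0151": "LVVDASA",
    "Rv0065": "CVVDAAA",
    "Rv0960": "IVVDASA",
    "SSO0798": "FLFDASA",
    "SSO1243": "FLLDASA",
    "SMA0545": "LVADASI",
    "ALL5132": "LCLDTSV",
    "SLL1225": "ICIDSNF",
}


def column_conservation(seqs):
    """Return (most common residue, fraction of rows) for each column."""
    rows = list(seqs)
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*rows):
        residues = [c for c in column if c != GAP]
        if not residues:
            # Column made up entirely of gaps: nothing is conserved.
            result.append((GAP, 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        # The fraction is taken over all rows, so gaps count against it.
        result.append((residue, count / len(column)))
    return result


def conserved_positions(seqs, threshold=0.9):
    """Indices (0-based) of columns whose top residue reaches `threshold`."""
    return [
        index
        for index, (_, fraction) in enumerate(column_conservation(seqs))
        if fraction >= threshold
    ]


if __name__ == "__main__":
    for index, (residue, fraction) in enumerate(
        column_conservation(FRAGMENTS.values())
    ):
        print(index, residue, f"{fraction:.2f}")
```

In this window only the aspartate column is identical in all eight sequences; the neighbouring alanine and serine columns are each shared by six of the eight.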

Vickery L. Arcus, Kristina Bäckbro, Annette Roos, Emma L. Daniel and Edward N. Baker

Distant structural homology leads to the functional characterisation of an archaeal

PIN-domain as an exonuclease

J. Biol. Chem., published online January 20, 2004

Access the most updated version of this article at doi: 10.1074/jbc.M313833200
