Chapter 2: PRNP and PrP

In this Chapter, I first discuss features of the prion protein gene PRNP and of vertebrate

prion proteins. Then I describe background and logic of my strategy for discovery of

PRNP homologues. Finally, I outline arguments for the Kangaroo Genome Project and

the reasons why I used the tammar wallaby PRNP in comparative genomic analysis.

2.1 Vertebrate Prion Proteins

The set of known vertebrate proteins comprising the prion protein family consists of 99

members, from fish to mammals. There are 79 mammalian PrPs (78 eutherian and 1

marsupial), 14 bird PrPs, 2 reptile PrPs, 1 amphibian PrP and 4 fish PrP homologue

sequences available in the NCBI Entrez Protein database (Figure 2.1).

There is conservation of the general sequence features among the vertebrate PrPs. Prion

proteins can be provisionally divided into four regions with distinct amino acid

composition: the basic region (region 1), the repeats or low-complexity sequence

(region 2), the hydrophobic region (region 3), and the C-terminal region (region 4).

However, the PrP regions from different vertebrate classes exhibit differences in their

primary sequences (Chapter 6.3).

Most notably, the PrPs show conservation in the middle hydrophobic sequence, in the

presence of one disulfide bond and two N-glycosylation sites in the C-terminal domain,

and in the presence of the N- and C-terminal signal sequences for extracellular export

and attachment of a GPI anchor. On the other hand, the N-terminal repeat region is

variable, both in repeat motif length and sequence, and is entirely absent in frog PrP.

The species barrier in prion transmission is determined in part by the sequence

similarity between host PrPC and exogenous PrPSc (Chapter 1.2). Variability and

conservation among prion protein sequences are therefore important because of the risk

of prion transmission.

Figure 2.1. Overall structures of PrP, stPrP, PrP-like and Sho proteins showing: S, signal sequence; B, basic region; H, hydrophobic region; R/PGH, PGH-rich repeats; R/GH, GH-rich repeats; B,R/RG, RG-rich basic repeats; B,R, basic repeats; N, N-glycosylation site; S-S, disulfide bridge; GPI, glycophosphatidylinositol anchor; GY and GYH, GY- and GYH-rich regions. Regions and attachment positions are approximately to scale. Numbers indicate the first residue of each section, and last one of each protein. A, Mammalian, avian and reptilian PrPs; numbers refer to human; additional N site for avian in italics. B, Xenopus laevis PrP. C, Fugu, Tetraodon and salmon stPrP; numbers refer to PrP-461. D, Fugu and Tetraodon PrP-like; numbers refer to Fugu. E, Zebrafish PrP-like. F, Zebrafish and Fugu Sho (Chapter 4); the arrow indicates insertion region in Fugu; numbers refer to zebrafish. G, Mammalian Sho (Chapter 4); numbers refer to human.

The discovery and characteristics of mammalian PrPs are described in Chapter 1.2.

Here I will summarize the history of discovery of PrPs in other species, and also outline

major analyses of PrPs.

Harris et al. (1991) isolated a cDNA coding for the chicken PrP. This protein showed

33% identity with mouse PrP, but the middle hydrophobic sequence, glycosylation sites

(with the third site unique to birds), disulphide bridge and GPI-anchor were conserved.

The proximal repeats of bird and mammalian PrP, however, showed marked differences

(Chapter 6.3). The chicken PRNP mRNA levels increased during postnatal development

in brain in parallel with the levels of choline acetyltransferase mRNA.

To understand better the species barrier that determines prion transmission between

humans and other primates, Schätzl et al. (1995) compared human PrP with 25 monkey and ape

PrPs. The most prominent difference in this selection of PrPs was in the number of

proximal repeats: whereas one fewer repeat was detected in the orang-utan, African

green monkey and spider monkey PrP, one additional repeat was found in the squirrel

monkey PrP relative to the human PrP. Variations in residues 90-130 (PrPC-PrPSc

interface; Chapter 1.2.5) could influence human prion transmission to apes. However,

this analysis also indicated that residues outside this region are involved in the

species barrier. Differences in approximately one third of amino acid residues in PrP

were observed when the bovine, sheep, mink, rat, mouse, Armenian hamster, Chinese

hamster and Syrian hamster PrP were included in the alignment. Genomic organization

of the PRNP gene was identical in all the species (Chapter 2.2).

Windl et al. (1995) sequenced the first marsupial PrP. This sequence revealed overall

conservation of mammalian prion protein (80% identity) but there were differences in

the composition of proximal repeats (Chapter 6.3).

Wopfner et al. (1999) expanded the number of known mammalian PrPs to a total of 46,

and the number of avian PrPs to a total of 9, and then analysed the regions of PrP that

control the species barrier. Structural regions (Chapter 1.3) were conserved among

mammals, as were functional positions including the two glycosylation sites, two

cysteines and a serine residue (amino acid 231 in human) that is the attachment for a

GPI-anchor. Minor differences in PrPs could strongly affect disease transmission:

only two residues differ between the dog and cat PrP, and between the

ferret and mink PrP. However, whereas dog and ferret are resistant, cat and mink are

susceptible to prion infection. PrPs were also highly conserved between bird species

(roughly 90% identity), but avian PrPs showed only 30% overall identity with mammalian

PrPs. The only PrP region invariably conserved between mammals and birds was the

middle hydrophobic sequence bordered by residues 110-128 in human PrP (Chapter

1.2.5). The proximal repeats of avian and mammalian PrPs are different, perhaps

reflecting different evolutionary pressures.
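
The identity percentages quoted in this section are pairwise comparisons of aligned sequences. The following is a minimal sketch of one common way to compute such a figure, assuming the two sequences have already been aligned with gaps written as "-" and that gapped columns are excluded from the denominator. The cited studies may have used a different convention, and the two toy strings are hypothetical, not real PrP sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment for illustration only, not real PrP sequences.
toy_a = "MKVLA-GHWT"
toy_b = "MKILAAGH-T"
identity = percent_identity(toy_a, toy_b)  # 7 matches over 8 ungapped columns
```

Because the choice of denominator (ungapped columns, full alignment length, or the shorter sequence) changes the result, identity values from different studies are only roughly comparable.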

Van Rheede et al. (2003) studied molecular evolution of the mammalian (eutherian)

prion protein. In order to include representatives of major clades from all 18 eutherian

orders in the analysis, they sequenced 26 new eutherian PrPs. Glycosylation sites,

disulphide bridge, hydrophobic region, elements of secondary structure and signal

peptides are all conserved among the eutherian PrPs (Figure 2.1). The repeat number in

eutherian PrPs varies from as few as two in squirrel (also shown in the lemur PrP; Gilch

et al., 2000) to seven in gymnure and leaf-nosed bat. Deviations from the repeat

consensus sequence were observed, as well as repeat homogenization. Not all the

histidines in repeats implicated in copper binding (Chapter 2.5.2) are conserved

eutherian-wide. Expansion and contraction of repeats is a frequent mutational process in

the eutherian PRNP. I show in Chapter 6.3 that this also holds for the marsupial

mammals.

Simonic et al. (2000) cloned a cDNA encoded by the turtle PRNP. The 270-residue

protein showed 40% identity with mammalian and 58% identity with avian PrPs. Ten

tandem hexarepeats were found in the N-terminal part of the protein, whose

composition differed from that of the repeats in birds and mammals (Chapter 6). Homology

modelling of the turtle PrPC C-terminal region suggested that turtle PrP could generate

the same fold as mammalian PrP.

Strumbo et al. (2001) reported the 216-residue sequence of X. laevis PrP (Figure

2.1). This amphibian PrP showed more identity with avian and turtle PrPs (more than

44%) than with mammalian PrP (about 28%). The major surprise was a lack of the

repeats in the N-terminal part of the protein. The conserved hydrophobic sequence was

four residues shorter than in other PrPs.

Suzuki et al. (2002) reported a new gene in three fish species, encoding a protein with

similarities to PrP (Figure 2.1). First, a cDNA was cloned in Fugu rubripes coding for a

protein of 180 amino acids. This protein was named PrP-like, because it shared the

conserved middle hydrophobic sequence and other features of PrP, including its basic

nature and the predicted N-terminal signal sequence and GPI-anchor attachment. As in

mammalian PRNP, the complete coding region lies within a single exon (Chapter 2.2).

However, the Fugu PrP-like lacked the repeats, disulfide bridge and glycosylation sites

of other vertebrate PrPs, and had a different C-terminal domain. Two other fish PrP-like

sequences were discovered in Tetraodon nigroviridis and zebrafish.

Two more fish genes were later reported to encode proteins with structural features

similar to PrP (Rivera-Milla et al., 2003; Oidtmann et al., 2003). Firstly, Rivera-Milla et

al. (2003) reported a cDNA from Fugu rubripes encoding a protein of 461 amino acids

(PrP461) that contained the conserved hydrophobic region and a C-terminal domain

similar to those in other vertebrate PrPs, including the disulfide bridge and N-

glycosylation sites, one of which is conserved with other PrPs. However, it had a

greatly expanded repeat region. Sequence similarity between the Fugu PrP461 and

mammalian PrPs was 22%. The same Fugu rubripes protein, independently discovered

by Oidtmann et al. (2003), was named stPrP-1. It has a different length (450 amino acids)

due to the inclusion of an extra small (30 bp) intron in its ORF (Chapter 2.2). A

605-amino-acid orthologue from Atlantic salmon Salmo salar, longer because of an

expanded repeat region, was also described by Oidtmann et al. (2003). A homologous

stPrP-1 from Tetraodon was found in public genomic data (Rivera-Milla et al., 2003;

Oidtmann et al., 2003).

Secondly, Oidtmann et al. (2003) reported a cDNA encoded by a third related gene in

Fugu, which they named stPrP-2. This stPrP-2 is closely related to stPrP-1 and has the same

sequence features: a hydrophobic region that is disrupted by charged residues, a C-

terminal domain with the disulfide bridge and three N-glycosylation sites, and an

expanded repeat region. It was estimated that the Fugu stPrP-1, Fugu stPrP-2, salmon

stPrP-1 and Fugu PrP-like show 24.8, 21.3, 17.7 and 16.3% identity, respectively, with the human PrP

27-30 (residues 90-230).

In addition to the problem of transmission of prions among mammals and to humans in

the recent BSE crisis, the findings of PrP homologues in fish raise new issues: the

possibility of spread of prions to farmed fish (e.g. from meat and bone meal feedstuff

derived from farm animals), and vice versa. Oidtmann et al. (2003) indicated that it

seems unlikely that fish could accumulate mammalian prions, but this possibility should

not be excluded, as factors contributing to the species barrier are not fully understood.

2.2 PRNP and Its Homologues

The mammalian prion protein gene and its fish homologues have similar characteristics,

including the exon/intron structure and a complete ORF within the 3’ terminal exon.

2.2.1 Mammalian PRNP

There is only a single PRNP gene in the mammalian genome. Interest in the structure

and regulation of this gene has been intense because it dictates both the host PrPC amino

acid sequence and the level of its expression. These features determine genetic

resistance or predisposition to the prion diseases (Prusiner and Scott, 1997; Prusiner, 1998).

The first PRNP gene studied was that of Syrian hamster (Basler et al., 1986). Analysis

showed that it has two exons (56-82 bp and 2 kb), separated by one intron (10 kb) and

that the entire ORF resides within the larger exon. Multiple transcription start sites were

observed within 25 bp in the upstream promoter. The promoter region contained three

Sp-1-binding sites but no TATA box, features of a housekeeping gene that are in tune

with the ubiquitous expression of PRNP (Chapter 2.3). Li and Bolton (1997) showed

that there is another non-coding exon (99 bp) within the intron. In different brain

regions, the transcript containing all three exons was expressed at 30-50% of the level

of transcript containing only exons 1 and 3. Worth noting here is that the first full-

length PrP sequence was translated from the Syrian hamster PRNP DNA sequence

(Chapter 1.2.2).

Human PRNP has the same gene structure as Syrian hamster PRNP (Puckett et al.,

1991), with a short proximal exon coding for a 5’ untranslated region of mRNA (136

bp), a single intron (13 kb) and a distal exon (2.3 kb) containing the ORF. Although no

trace of exon 2 was found in human cDNAs, the gene contains an exon 2-like

sequence (Lee et al., 1998). The proximal promoter is GC-rich, typical of housekeeping

genes. I analyse human PRNP characteristics in Chapter 6.

Lee et al. (1998) compared the human PRNP with the mouse and ovine PRNPs. The

mouse Prnp (21 kb) encompasses three exons. The short exons 1 (47 bp) and 2 (98 bp)

encode the 5’ UTR (Westaway et al., 1994a), and the complete ORF lies within the

exon 3 (2 kb). The splice donor and acceptor sites flanking the exon 2 were different

from consensus sequences, suggesting that splicing may not be obligatory. Two mouse

Prnp alleles that determine different incubation times after prion infection, Prnpa and

Prnpb, have different lengths (approximately 6 kb difference in the second intron) but

this difference does not affect incubation times. There are multiple transcription start

sites across the 25 bp promoter region. The promoters of both Prnpa and Prnpb contain

binding sites for the Sp1 and AP-1 transcription factors. There are four motifs 250 bp

proximal to the transcription start site: CTTTCATTTTCTC, CCATTAt/cGTAACG,

TAAAGATGATTTTTA and TCAGGGAG. These are conserved in the mouse, Syrian

hamster, sheep and human promoters, but their functional significance is unclear.
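
The four conserved motifs listed above can be located in a promoter sequence with a simple pattern scan. The sketch below is an illustration only: it assumes that "t/c" in the second motif means either T or C at that position, it searches the given strand only, and the motif labels and the toy sequence are hypothetical (the toy sequence was built for this example and is not a real promoter).

```python
import re

# The four motifs as listed in the text; "t/c" is read as "T or C" (assumption).
# The labels motif_1..motif_4 are hypothetical names used only in this sketch.
MOTIFS = {
    "motif_1": "CTTTCATTTTCTC",
    "motif_2": "CCATTA[TC]GTAACG",
    "motif_3": "TAAAGATGATTTTTA",
    "motif_4": "TCAGGGAG",
}


def find_motifs(sequence):
    """Return {motif label: [0-based start positions]} on the given strand."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A zero-width lookahead reports overlapping occurrences as well.
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", seq)]
    return hits


# Hypothetical toy sequence for illustration, not a real PRNP promoter.
toy = "GGCTTTCATTTTCTCAACCATTACGTAACGTTTCAGGGAGCC"
hits = find_motifs(toy)
```

A scan of this kind only reports exact matches; it says nothing about whether a motif is functional, which is the open question noted in the text.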

The 20 kb sheep PRNP gene also has three exons (52, 98, 4028 bp) (Westaway et al.,

1994b). The coding exon 3 is longer than those in other PRNPs. There is neither a Sp1-

nor an AP-1-binding site in the promoter, but there is an AP-2-binding site.

Hills et al. (2001) reported the full genomic sequence of the 20 kb bovine PRNP (Figure

2.2). The gene structure is the same as that of the sheep PRNP, with three exons (53, 98

and 4092 bp) and two introns (2442 and 13552 bp).
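
As a quick check of these figures, the exon and intron sizes can be laid end to end to recover the overall gene length and the length of the spliced transcript. This is a minimal sketch that assumes the segments are contiguous in the order exon 1, intron 1, exon 2, intron 2, exon 3, and that coordinates are counted from the first base of exon 1; both are assumptions made for illustration.

```python
# Bovine PRNP segment sizes in bp, in 5'->3' order, as quoted in the text.
segments = [
    ("exon 1", 53),
    ("intron 1", 2442),
    ("exon 2", 98),
    ("intron 2", 13552),
    ("exon 3", 4092),
]


def layout(segs):
    """Return (name, start, end) tuples using 1-based inclusive coordinates."""
    coords = []
    pos = 1
    for name, length in segs:
        coords.append((name, pos, pos + length - 1))
        pos += length
    return coords


coords = layout(segments)
gene_length = sum(length for _, length in segments)  # exons plus introns
spliced_length = sum(
    length for name, length in segments if name.startswith("exon")
)  # exons only, without any poly(A) tail
```

The segments sum to 20,237 bp, in line with the quoted size of about 20 kb, and the three exons together give a spliced length of 4,243 bp.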

Comparative genomic analysis of the human, mouse and sheep PRNP showed that the

genes accumulate transposable elements extensively and independently (Lee et al.,

1998). The content of transposable elements was estimated to be 40% in human and

mouse, and 57% in sheep. The 6 kb difference between mouse Prnpa and Prnpb is due

to insertion of an intracisternal A-particle transposable element into intron 2. The

three-species comparisons identified conserved non-coding sequences in intron 1

and in the 3’ UTR of the terminal exon (Chapter 6.5). The longer terminal exon

in the bovine and sheep genes is due to integration of the Bov-B, Bov-tA and

Mariner transposable elements in the 3’ UTR.

Mammalian PRNP lies adjacent to one or two related genes (Chapter 5). The gene

immediately distal to PRNP in eutherian mammals is PRND, encoding the doppel

(Dpl) protein (Chapter 2.2.2), which is thought to have arisen by a duplication of PRNP

(Mastrangelo and Westaway, 2001). The next gene adjacent to PRND, detected so far

only in humans and not present in mouse, is the PRNT gene (Makrinou et al., 2002), which

seems to be a pseudogene that arose from a duplication of PRND (Chapter 5.5). Further

distal to the PRNT in human, and to the Prnd in mouse, are the RASSF2 encoding Ras

association domain family 2 protein, and the SLC23A1 encoding solute carrier family

23 member 1 protein, conserved in both human and mouse genomes (Chapter 5.5).

The PRNP gene is located on human chromosome 20p13, and in syntenic regions on

mouse chromosome 2F3, rat chromosome 3q36, dog chromosome 24 (Ensembl), bovine

chromosome 13q17, river buffalo chromosome 14q15, sheep chromosome 13q15, goat

chromosome 13q15 (Iannuzzi et al., 1998) and chicken chromosome 22 (Ensembl).

In summary, these analyses have identified, as conserved features of eutherian PRNP

promoters, their GC richness and a lack of TATA box typical of housekeeping genes.

There are some differences in gene structure and regulation of gene expression between

Figure 2.2: Structure of the mammalian PRNP and fish PrP-like genes. (A) Typical mammalian PRNP has two short noncoding exons (E1 and E2), two introns, and the complete ORF within longer terminal exon (E3). E2 is missing in the human PRNP. The sizes of exons and introns correspond to the bovine PRNP. (B) Fugu PrP-like contains one non-coding exon (E1) and the complete ORF is within terminal exon (E2). The two rulers indicate size in kb. Exons are depicted by black rectangles. ORF is shown as white rectangle.

species. PRNP genes contain either two or three exons. There is a single transcription

start site in PRNP, except in rodents, which have multiple transcription start sites.

Whereas the hamster, human, bovine and mouse promoters contain Sp1- and AP-1-

binding sites, the sheep promoter does not but instead includes an AP-2-binding site.

2.2.2 PRND: A Mammalian Paralogue of the PRNP Gene

The first mammalian paralogue of the gene encoding the prion protein was discovered in

mouse by sequencing the genomic DNA 16 kb downstream of the Prnp gene (Moore et

al., 1999). Prnd encodes a GPI-anchored glycoprotein of 179 amino acids dubbed

“doppel” (Dpl; double in German) showing roughly 25% identity with mammalian PrP.

However, Dpl contains neither the middle hydrophobic section of the PrP critical for its

function, nor the proximal repeats (Chapter 1.2). PRND is 27 kb distal to PRNP in

human. The human and rat Dpl are 76% and 90% identical with mouse Dpl. Unlike

Prnp, the Prnd was expressed minimally in the adult mouse brain, and highly in testis.

Expression of Prnd was upregulated in the brains of Prnp0/0 mouse lines Ngsk Prnp0/0 and

Rcm Prnp0/0 that exhibit ataxia and neurodegeneration (Chapter 2.5.1).

The solution structure of recombinant mouse Dpl (amino acids 26-157) was very similar

to that of PrPC (Chapter 1.3), despite limited sequence homology (Figure 2.3) (Mo et

al., 2001). A globular domain contained three helices and little β-structure. Two

disulfide bonds were found, one between Cys-109 and Cys-143 and the other between

Cys-95 and Cys-148. Regions of secondary structure occurred roughly at the same

positions in both proteins, but differences include a kink in helix αB, shorter helix αC,

shorter β-strands and different orientation of the β-sheet in Dpl.

Prnd knockout mice develop normally (Behrens et al., 2002). Sterility was found in

male but not in female Prnd-deficient mice. The spermatids from Prnd knockout males

were immobile and malformed, their number was reduced and they were unable to

fertilize oocytes in vitro. Acrosomal defects observed in the Prnd knockout sperm

could account for infertility, perhaps due to inability of sperm to cross the zona pellucida.

Transformation of the round spermatids into testicular spermatozoa was also abnormal,

Figure 2.3: Comparison of the backbone topology of recombinant mouse Dpl and PrP. αA, α helix A; αB, α helix B; αC, α helix C (copied from Mo et al., 2001).

as well as regional separation of the spermiogenic differentiation stages. Thus Prnd is

implicated in male gametogenesis. PrPC, although expressed in testis, could not

compensate for the loss of Dpl, indicating that Prnp and Prnd have non-redundant

functions, at least in the male reproductive tract. Indeed, the Prnp/Prnd double

knockout mice showed no additional phenotype (Chapter 2.5.1).

The proximity of PRNP and PRND, and sequence and structural similarities of their

products, indicate that the genes arose by tandem duplication. After duplication of an

ancestral gene, the two genes (duplicates) evolved distinct and unrelated functions

(divergent evolution). Although the two proteins retain similar architectures with

slightly different topologies, their diverged amino acid compositions dictate different

functions.

2.2.3 Fish PRNP Homologues

Features of the fish PRNP homologues were first defined in the Fugu genome. Suzuki

et al. (2002) reported the PrP-like gene and analyzed its structure and its local genomic

environment. The gene structure resembles that of mammalian PRNP: a short exon 1

(39 bp) and a long exon 2 (932 bp) harbouring the complete ORF are separated by an

intron (1.5 kb) (Figure 2.2). The Fugu PrP-like transcript was expressed in skin, eyes

and brain, an expression pattern different from that of mammalian PRNP (Chapter 2.3). It

was noted that the PrP-like resides in the same genomic region as mammalian PRNPs,

proximal to RASSF2 and SLC23A1 in both Fugu and mammals. Suzuki et al. placed the

PrP-like between these two genes, suggesting an evolutionary relationship between the

fish PrP-like and tetrapod PRNP genes (Chapter 5.5).

Oidtmann et al. (2003) provided more details about the PRNP-related genes in Fugu.

They determined that stPrP-2 lies 2 kb proximal to PrP-like, and RASSF2 and SLC23A1

were distal (Chapter 5.5). This fish genomic region did not contain the PRND, which is

reported only in mammals.

The other PRNP homologue, stPrP-1, was found in a different genomic context. It

contains a small intron within the ORF, unlike other members of the PRNP gene family

that had been described (Oidtmann et al., 2003). Rivera-Milla et al. (2003)

demonstrated expression of the PrP461 (stPrP-1) transcript in brain and liver. Oidtmann

et al. (2003) showed that stPrP-1, but not stPrP-2, mRNA is expressed in brain in Fugu.

Salmon stPrP-1 transcript is expressed ubiquitously (muscle, liver, skin, gills, kidney,

spleen, heart, brain) but most prominently in brain, an expression pattern similar to

PRNP expression in mammals (Chapter 2.3).

Suzuki et al. (2002) made the initial suggestion of the evolutionary link between the fish

PrP-like and tetrapod PRNP based on their similar protein sequence features

(extracellular, GPI-anchored proteins with repeats and middle hydrophobic region) and

shared context (proximity to the RASSF2 and SLC23A1 genes). However, Oidtmann et al.

(2003) considered that the stPrP-2 had a closer evolutionary relationship with tetrapod

PrPs because its C-terminal region has more similarity to mammalian PrP than does the

PrP-like. Of all the fish PrP homologues, stPrP-1 showed highest homology with other

PrPs although the gene is located in a different genomic context. I analyse these

competing hypotheses in Chapter 5.

2.3 Expression of PRNP and PrPC

Mammalian PRNP is a housekeeping gene and is expressed in a heterogeneous set of

cells. This was first demonstrated by Oesch et al. (1985) who cloned a partial cDNA

coding for Syrian hamster PrP. The mRNA levels were the same in both normal and

prion-infected brain. Transcription of mRNA was shown in a range of other tissues:

heart, lung, pancreas, liver, spleen, testis and kidney.

Regulated expression of PRNP during Syrian hamster brain development was

demonstrated by Northern analysis (McKinley et al., 1987). A low level of the mRNA

was found one to ten days after birth, rising to a maximum between day 10 and day

20 after birth, and remaining constant throughout life. PrPC expression increased from a

low at day 2 to a maximum at day 10 after birth. These changes correlate with

morphological changes occurring during mammalian brain development, including

neuron differentiation and increase in the rates of synaptogenesis and myelination

which occur after postnatal day six, suggesting PRNP’s involvement in neuronal

maturation (Chapters 2.5.2 and 6.5).

Caughey et al. (1988) found that PRNP is expressed in normal and scrapie-infected

mouse and hamster brains, liver and spleen. In situ hybridisation (Brown et al., 1990)

revealed PRNP expression in the neurons and non-neuronal cells of mouse brain

(ependymal cells, choroid plexus epithelium, astrocytes, pericytes, endothelial cells and

meninges). Transcription was found also in the microglia cells, alveolar lining and

septal interstitial pulmonary cells and myocardium, but not in spleen.

PRNP is also expressed in a number of the cell lines from mouse (epithelial cell line

C127, neuroblastoma Neuro 2A cells, erythroid cell line AA60, embryo fibroblast B6-

3T3 cells, B cell lymphoma cell line 1593), Syrian hamster (ovary-derived CCL61

cells), human (astrocytoma HTB14 cells, neuroblastoma HTB10 cells) and rat (glioma-

derived C6Bu3 cells). No PRNP mRNA was found in the mouse myeloid cell lines

5402 (differentiated) and 7320 (undifferentiated), nor in the human T cell lymphoma

cell line MBL-2 (Caughey et al., 1988).

Human lymphocytes and lymphoid cell lines (but not erythrocytes or granulocytes)

transcribe PRNP and express PrPC (Cashman et al., 1990). After activation of T

lymphocytes, abundance of PrPC on the cell surface increased. Polyclonal antibodies to

PrPC suppressed concanavalin A-induced activation of lymphocytes, indicating that the

PrPC may participate in activation of T lymphocytes (Chapter 2.5.2).

Manson et al. (1992) studied PRNP expression during embryonic mouse development

using in situ hybridisation. Transcripts were found by 13.5 or 16.5 days throughout the

developing brain and spinal cord, and also in the peripheral nervous system (ganglia and

nerve trunks of the sympathetic nervous system and neural cells of sensory organs). At

this stage PRNP expression was also detected in the differentiating non-neuronal cells

of dental lamina and kidney. In extra-embryonic tissue, the PRNP transcripts were

found in the maternal cells of the placenta, and in the amnion, umbilical cord and

mesodermal layer of yolk-sac.

The distribution of tissues expressing PrPC was studied in Syrian hamster (Bendheim et

al., 1992). Immunohistochemical analysis localized PrPC in brain to the neurons and

surrounding neuropil in the hippocampus, septal, caudate and thalamic nuclei, dorsal

root ganglia and dorsal root axons. PrPC was most concentrated within the hippocampus

including the CA1, CA3, CA4 subfields, fimbria, pyramidal cells, dentate formation and

the intervening neuropil. Cortex, fornix, caudate, thalamus, brainstem and spinal cord

expressed less PrPC. In non-neuronal tissues, the circulating leukocytes, heart, myocardium,

lung (bronchial epithelium), stomach (parietal and glandular neuroepithelial cells),

intestines, spleen, testis and ovary all expressed PrPC.

Askanas et al. (1993) demonstrated that PrPC is concentrated at the postsynaptic domain

of human normal neuromuscular junctions (NMJ). At the NMJ, molecular compositions

of the extracellular matrix and immediately postsynaptic cytoplasmic domain are

different from those in the nonsynaptic region of the muscle fibre.

Ford et al. (2002a) developed antibodies recognising PrPC in glutaraldehyde-fixed tissue

and studied PrPC expression in the brain. PrPC expression was predominantly neural.

The GABA-immunoreactive neurones showed the highest levels of expression.

Dopaminergic neurones and glia, on the other hand, showed no PrPC expression.

However, all the neurones expressed PRNP mRNA, indicating the importance of

posttranscriptional control of mRNA activity (Chapter 6.5).

PrPC is expressed in a heterogeneous set of mouse tissues outside brain (Ford et al.,

2002b), including peripheral nerves and Schwann cells, sympathetic ganglia and nerves,

parasympathetic and enteric nervous system, antigen presenting and processing cells,

populations of lymphocytes and the neuroendocrine system. A good correlation

between mRNA and protein was found outside brain.

Barmada et al. (2004) generated transgenic mice in which the PRNP promoter drives

expression of a fusion protein PrP-EGFP (enhanced green fluorescent protein). PrP-

EGFP was expressed within synapse-rich regions in brain. In the hippocampus,

fluorescence was found in the synapse-rich layers such as the strata oriens, radiatum,

lacunosum-moleculare and lucidum, alveus, subiculum, fimbria and hilus. PrP-EGFP

was found throughout the neocortex. In the cerebellum, fluorescence was detected at

high levels in the molecular layer, and at lower levels in the granule cell layer and white

matter.

Morel et al. (2004) analysed expression of PrPC in normal human intestinal tissues. PrPC

was expressed in enterocytes, the dominant cell population of the intestinal epithelium,

and also in the vascular epithelia. The enterocytic cell line Caco-2/TC7 also expresses

PrPC.

2.4 Cell Biological Features of PrPC

The metabolism of PrPC determines both its normal role and its contribution to prion

disease pathogenesis.

Mammalian PrPC is a membrane protein that cycles constitutively between the cell

membrane and early endosomes (reviewed in Harris, 2003). The biosynthetic pathway

of PrPC is similar to that of other secreted and membrane proteins. It is first synthesized

in the endoplasmic reticulum (ER), then post-translationally modified (cleavage of

signal peptides, N-linked glycosylation and addition of the GPI-anchor) in the ER and

Golgi, and finally it reaches the cell surface. The PrPC molecules cycle constitutively

through the cell with a transit time of approximately 1 hr: the t1/2 for internalisation and

the t1/2 for return to the cell surface are both roughly 20 min with the protein being

equally divided between the two compartments (Shyng et al., 1993). Most of the

molecules are recycled intact to the cell surface but a small percentage is proteolytically

cleaved in the middle of the protein. Roughly 10-30% of the membrane-anchored

molecules are released into the extracellular milieu. The t1/2 for degradation of PrPC in

lysosomes is 3-6 hrs (Taraboulos et al., 1992).
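
The cycling figures quoted above can be checked with a minimal two-compartment sketch. It assumes simple first-order exchange between the cell surface and the endosomal pool, which is an illustrative simplification and not the analysis of Shyng et al. (1993); the function and variable names are hypothetical. With both half-times set to 20 min the two rate constants are equal, so half of the protein sits at the surface at steady state, and the mean time for one full cycle is 2 x 20/ln 2, or about 58 min, in line with the transit time of approximately 1 hr.

```python
import math


def rate_constant(half_time_min):
    """First-order rate constant (per minute) from a half-time in minutes."""
    return math.log(2) / half_time_min


def surface_fraction(k_in, k_out):
    """Steady-state fraction at the cell surface for surface <-> endosome exchange."""
    # At steady state k_in * surface = k_out * internal, with surface + internal = 1.
    return k_out / (k_in + k_out)


def cycle_time_min(k_in, k_out):
    """Mean time for one full cycle: mean residence at the surface plus inside the cell."""
    return 1.0 / k_in + 1.0 / k_out


# Half-times quoted in the text (both roughly 20 min); first-order kinetics is an assumption.
k_in = rate_constant(20.0)   # internalisation
k_out = rate_constant(20.0)  # return to the cell surface

print(f"surface fraction: {surface_fraction(k_in, k_out):.2f}")
print(f"mean cycle time: {cycle_time_min(k_in, k_out):.1f} min")
```

Unequal half-times would shift the steady-state distribution towards the compartment with the slower exit, so the equal division reported in the text is what this simple model predicts only when the two half-times match.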

Most of the protein resides in membrane “rafts”, detergent-resistant domains enriched in

sphingolipids that are foci for signal transduction events. Internalisation of PrPC may

occur via clathrin-coated vesicles and is mediated by the N-terminal part of the protein

or, alternatively, through a caveolae-mediated endosomal pathway. Binding of copper

stimulates the endocytosis.

Peters et al. (2003) analysed PrPC trafficking using cryoimmunogold electron

microscopy. They found that PrPC was enriched in the caveolae, stable membrane

microdomains (“rafts”) that mediate key cell processes such as signal transduction,

anchored by the actin cytoskeleton, and enriched in caveolin, cholesterol and

glycosphingolipids. PrPC was delivered to the late endosomes/lysosomes via

nonclassical, caveolae-containing early endocytic structures (“caveosomes”). The GPI-

anchored proteins may cycle between the cell surface and trans Golgi network via this

pathway and inhibitors of such endocytosis may be of therapeutic interest.

Early studies of PrPC localization gave conflicting results: some found it predominantly in the

soma with only minor signal in the neuropil (Bendheim et al., 1992; Ford et al., 2002b), whereas

others found PrP to be predominant in the neuropil. It was also reported to be predominant

in the synaptosomal plasma membrane, with no presence in the synaptic vesicles or

cytosol (Herms et al., 1999).

Mironov et al. (2003) investigated ultrastructural localization of PrPC in the mouse

hippocampus cornu ammonis 1 (CA1) and dentate gyrus areas. They demonstrated

ubiquitous cell distribution of the extracellular PrPC. Consistent with its GPI-anchored

membrane association, this suggests that it diffuses along the cell membrane. PrPC was

associated predominantly with the neuropil and had the same concentration within the

synaptic specializations and perisynaptically. It was present with the same concentration

in the presynaptic and postsynaptic membranes and within the synapse, but no PrP was

found in the synaptic vesicles. Besides PrPC associated with the biosynthetic and

endocytic membranous structures, a cytosolic PrP was also identified in subpopulations

of unknown neurons in the hippocampus, neocortex and thalamus (CPrP cells). This

cytosolic PrP could be a novel PrP entity, with structure and function different from the

extracellular PrPC.

Barmada et al. (2004) confirmed the existence of cytosolic PrP. Further, they showed

that the PrP-EGFP (Chapter 2.3) is localized primarily along axons and in presynaptic

terminals. This distribution is consistent with retrograde and anterograde transport of

PrP along axons (Moya et al., 2004) and with preferential sorting of some GPI-anchored

proteins in neurons to their axonal surface. There was less PrP-EGFP on dendrites in the

hippocampus and cerebellum.

In enterocytes, PrPC is localized in raft microdomains as well (Morel et al., 2004;

Chapter 2.3). Further, it was mainly concentrated in the lateral membrane, associated

with the junctional complexes. This localization was dependent on cell-cell contacts

(Chapter 2.5.2). PrPC was not found on the apical membrane.

There are three topological forms of PrP known: the extracellular GPI-anchored form

(PrPC) comprising roughly 50% of total PrP, and two transmembrane entities (CtmPrP

and NtmPrP) spanning the cell membrane in opposite orientations and comprising about

10% and 40% of total PrP (Hegde et al., 1998). Two adjacent regions act in concert to

generate transmembrane entities: TM1 (A113-S135 in human PrP) and STE (stop

transfer effector; L104-M112 in human PrP). Aberrant regulation of PrP biogenesis and

topology may cause neurodegeneration. CtmPrP caused severe neurodegeneration in

mice, and is a key component in the GSS disease pathway caused by the A117V

mutation. NtmPrP could have a normal role.

Both PrP isoforms have two variably occupied glycosylation sites, Asn181 and Asn197

in human PrP (reviewed in Rudd et al., 2002). More than 50 glycans occupied either or

both sites in PrPC. The PrPSc from the scrapie-infected hamster brain also had glycans at

these sites, but contained more tri- and tetra-antennary glycan complexes. The

glycans stabilise the folded part of PrPC, so altering its sugars could have functional

consequences. For example, the PrP transformation occurs more readily if PrPC is

unglycosylated. The oligosaccharides are also required for the intracellular trafficking

of PrPC (Chapter 1.2.6). Further, they are large in comparison with PrPC itself. Simulations

of molecular dynamics (Zuegg and Gready, 2000) showed that the folded domain of

PrPC is stabilized by an indirect effect of glycosylation, and that the glycans change the

surface charge to a negative electrostatic field which could inhibit the association of

PrPC with the membrane.

Both PrP isoforms also contain a phosphatidylinositol glycolipid that attaches them to

the outer leaflet of the cell membrane (Stahl et al., 1988). In fact, all vertebrate homologues

of PrP were predicted to contain this GPI-anchor (Chapter 2.1). It is readily cleaved by

the bacterial enzyme PI-PLC (Chapter 1.2.7), releasing PrPC from the cell membrane.

Simulations of the molecular dynamics indicated that the GPI-anchor is flexible and

maintains the protein 9-13 Å from the cell membrane (Zuegg and Gready, 2000). In

general, GPI-anchored proteins are involved in signal transduction and cell activation

(e.g. acetylcholinesterase in synaptic cleft) and they show rapid locomotion (Medof et

al., 1996). They may be promiscuous and reincorporate into membranes in trans,

remaining fully functional (protein “painting”). Such intermembrane transfer of the

mouse GPI-anchored complement restriction factors from erythrocytes to the epithelium

was shown to occur in vivo under physiological conditions (Kooyman et al., 1995). By

analogy with these observations, this feature of GPI-linked proteins may enable

spreading of prions from neuron to neuron.

Indeed, Liu et al. (2002) demonstrated that PrPC could be transferred from cell to cell by

a GPI-dependent process in vitro. This process is tightly regulated, as it occurred only

after either donor or recipient cells, or both, were activated by the protein kinase C

(Chapter 2.5.2) activator phorbol 12-myristate 13-acetate (PMA). The transfer was also

dependent on direct cell to cell contact.

Exosomes are membrane vesicles released into the extracellular milieu. Follicular

dendritic cells, which are implicated in the peripheral prion disease pathogenesis

(Chapter 1.2.9), release and exchange exosomes with other cells (Fevrier and Raposo,

2004). Exosomes are released after exocytic fusion of multivesicular endosomes, and

could act as carriers for intercellular exchange of PrPC and PrPSc (Fevrier et al., 2004).

A fraction of infectious PrPSc was released from the scrapie-infected Mov and Rov cells

in association with exosomes. Native PrPC is released in the same manner. Protein

composition of the PrP-carrying exosomes was evaluated by mass spectrometry.

Among others, proteins involved in adhesion, membrane fusion and exosome

biogenesis were found, indicating that the PrP-carrying vesicles are bona fide

exosomes. Exosomes are a newly discovered mode of intercellular communication.

They are released by many cell types including B cells and intestinal epithelial cells.

They are enriched in cell-type specific proteins (e.g. MHC I and II in B cells), in

ubiquitous proteins involved in biosynthesis of exosomes and their adhesion to target

cells, in membrane raft components, and in GPI-anchored proteins. This finding is in

agreement with the result of Peters et al. (2003): one fate of the caveosomes is exocytic

fusion and release into the extracellular environment.

Yedidia et al. (2001) showed that roughly 10% of the newly synthesized PrPC is

degraded by the ERAD-proteasome pathway, which is responsible for clearing of

misfolded proteins (Chapter 1.4.1). During this process, PrP molecules are translocated

to the cytosol, unglycosylated, ubiquitinated and degraded.

Ubiquitous expression indicates a functional contribution of PRNP to many cell types.

Glycosylated, GPI-anchored extracellular PrPC diffuses along the cell membrane. It

resides in the cell membrane foci that mediate signal transduction, cycles constitutively

and is degraded by the lysosomes.

2.5 Normal Function of PRNP

The normal function of the prion protein gene remains elusive, and a number of hypotheses

have been proposed.

2.5.1 Prnp Knock-Out Mice

Prnp knock-out mice were constructed to illuminate the normal function of Prnp.

Prion protein gene knock-out mice conservatively generated by disrupting the Prnp

ORF have no obvious phenotype (Bueler et al., 1992; Manson et al., 1994; Weissmann

and Flechsig, 2003). No major anatomical abnormalities, infertility, difference in

immunological status, learning or behavioural changes were found.

A more radical Prnp knock-out (which, as well as disrupting the ORF, also included

removal of the splice acceptor site of exon 3) produced ataxia and loss of

Purkinje cells later in life (Sakaguchi et al., 1996). However, this phenotype was a

consequence of the up-regulation of the Prnd gene and its high, non-physiological

expression in brain (Chapter 2.2.2).

There are several explanations for the lack of phenotype in the conservative Prnp

knock-out mice. The knock-out phenotype could be so subtle that a selective

disadvantage may emerge only after many generations, for example as a consequence of

stressful conditions. Alternatively, functional redundancy, or compensation for its

loss by other molecule(s), may mask the loss of the Prnp gene. Another possibility is

that the protein may have recently lost its function (Bueler et al., 1992). Finally, the

knock-out phenotype may not be apparent in laboratory settings.

However, there could be more subtle phenotypic changes in the Prnp knock-out mice.

Collinge et al. (1994) reported that the CA1 hippocampal slices from Prnp knock-out

mice show weakened GABAA receptor-mediated fast inhibition and impaired long-term

potentiation. However, other laboratories could not confirm this observation (Lasmezas,

2003). Colling et al. (1997) reported aberrant mossy fibers in the Prnp0/0 mice

hippocampus CA2 and dentate gyrus regions, similar to morphological abnormalities

following epileptic seizures.

Mice devoid of Prnp exhibited alterations in circadian activity rhythms and sleep

(Tobler et al., 1996) indicating involvement of Prnp in regulation of sleep. Period

lengths of the circadian activity rhythms were longer in the null mice than in wild type.

Next, the Prnp0/0 mice were less active in the first half of the dark period. The null mice

also showed different non-rapid eye movement (non-REM) sleep, waking distribution in the

dark and sleep fragmentation. These phenotypes were rescued by re-introduction of the

Prnp gene. Evaluation of behavioural parameters in the Prnp0/0 mice showed normal

fear-motivated memory, anxiety and exploratory behaviour but slightly increased

locomotor activity (Roesler et al., 1999).

Results consistent with the mild knock-out phenotype were produced by using a

tetracycline-controlled transactivator to repress PrPC expression in adult mice. Tremblay

et al. (1998) found no deleterious effects. After administration of doxycycline (an

analogue of tetracycline) to adult mice, expression of PrPC in brain was repressed by

90% after seven days of treatment; when doxycycline was withdrawn, it took seven

days for PrPC expression to return to its normal level. Doxycycline-treated mice were

not susceptible to exogenous prions. The absence of systemic or CNS dysfunction upon

PrPC repression also argues in favour of redundancy between PrPC and other

molecule(s) as no developmental compensation and adaptation was possible using this

experimental system.

Using the Cre-loxP system to knock out the Prnp gene in 9-week-old mice, Mallucci et al.

(2002) found that mice remained healthy and showed no evidence of

neurodegeneration. However, a significant reduction of afterhyperpolarization in the

CA1 cells was found, indicating that the PrPC may modulate neuronal excitability by

affecting afterhyperpolarization. Bypassing developmental compensatory mechanisms

induced no detrimental effect, suggesting once again functional redundancy between

Prnp and other gene(s).

Coitinho et al. (2003) studied behavioural parameters in 3- and 9-month-old Prnp0/0

mice. Behavioural parameters were also compared after administration of anti-PrPC

antibodies into the CA1 region of the dorsal hippocampus in normal 3- and 9-month-old

rats. Memory performance normally declines with aging, starting at the age of 9-12

months in rodents. No difference from normal mice was observed in the 3-month-old

Prnp0/0 mice. On the other hand, impairment of both short- and long-term memory was

observed in the 9-month-old Prnp0/0 mice when compared with normal mice. This was

also the case for 9-month-old rats that received anti-PrPC antibodies when

compared with normal rats. Decreased locomotor activity during observation in an open

field was found in the 9-month-old Prnp0/0 mice. Normal anxiety was found in both

age groups of Prnp0/0 mice. These observations may be explained by the impairment (or

modification) of PrPC physiological functions in the hippocampus of adult Prnp0/0 mice.

The Prnd gene (Chapter 2.2.2) is dispensable for prion disease pathogenesis. Its normal

function must encompass reproduction, since male Prnd knock-outs are infertile.

Overexpression of Prnd in the brain causes neurodegeneration that can be rescued by

the expression of Prnp. Mice in which both paralogues, Prnp and Prnd, were

inactivated showed no additional new phenotype (Genoud et al., 2004). Double knock-

out mice had no morphologic or immunologic abnormalities apart from infertility of

male mice. This analysis showed that there is no functional redundancy between Prnp

and Prnd genes. Therefore, functional redundancy is likely to exist between Prnp and

its other homologue(s) (Chapters 4-6).

The homologue(s) of Prnp with redundant function are unknown. Shadow of prion

protein (SPRN) is the only such human candidate gene at present (Chapters 4-7).

2.5.2 Hypotheses about the Function of PRNP

Many hypotheses have been proposed for PRNP function, including its involvement in

copper transport, copper buffering, redox signalling, neuroprotection, cell-cell

interactions, lymphocyte activation, nucleic acid metabolism and signal

transduction. Here I briefly outline eight hypotheses and describe in full a ninth,

which is supported by my work (Chapter 6).

2.5.2.1 PrPC Transports Copper

The endocytic pathway of PrPC could suggest a role in uptake or in efflux of an

extracellular ligand (Harris, 2003). Mammalian PrPC binds copper cooperatively at five

to six sites in a low micromolar range (total copper concentration is 16-20 µM in blood,

0.5-2.5 µM in cerebrospinal fluid and 15 µM in synapse) and in a pH-dependent manner

(optimal at physiological pH) (Brown et al., 1997). The residues involved in copper

binding are histidines that reside in the proximal octarepeats, and histidines in the C-

terminal domain (His96, His111 or His140 in human PrP; Chapter 2.1). Deletion of the

proximal repeats in chicken PrPC also affected copper binding (Pauly and Harris, 1998).

Prnp0/0 mice showed reduced copper content in the membrane-enriched brain and liver

extracts and increased content of serum copper. Tenfold reductions of copper content

were also found in the synaptosomal and endosome-enriched brain fractions, indicating

that the PrPC-deficient cell membranes are also deficient in copper. Further, a reduction

in the activity of copper/zinc superoxide dismutase (SOD-1) and altered

electrophysiological responses in the presence of excess copper were observed in PrPC-deficient

cells.

Thus, PrPC is a cuproprotein whose low-affinity copper binding may

allow exchange of copper with other molecules. In this, PrPC may be similar to the

proteins implicated in pathogenesis of Parkinson’s disease (monoamine oxidase),

Alzheimer’s disease (amyloid precursor protein APP) and familial amyotrophic lateral

sclerosis (SOD-1), which are also cuproproteins.

It is unclear how copper and PrPC could be functionally related. Bound copper may

serve as a cofactor for enzymatic activity of PrPC, PrPC may act as a sink for chelation

of extracellular copper ions, or PrPC may act as a carrier protein for copper uptake and

delivery to intracellular targets. Pauly and Harris (1998) showed that copper rapidly and

reversibly stimulates endocytosis of PrPC from the cell surface. Incubation of N2a

mouse neuroblastoma cells expressing either mouse or chicken PrPC with excess CuSO4

(200 µM, 500 µM) rapidly stimulated internalisation of both PrPCs. The removal of

metal reversed the PrPC distribution.

Two models for the role of PrPC in copper trafficking were hypothesized (Harris, 2003).

Firstly, PrPC could serve as a receptor for uptake of copper ions from the extracellular

milieu. It could bind copper on the plasma membrane via the proximal repeats and

deliver it by endocytosis to the acidic endosomal compartments, where copper ions

dissociate at low pH and are then transported to the cytoplasm. PrPC could then return

to the cell surface to begin a new cycle. Alternatively, PrPC could facilitate cellular

efflux of copper via the secretory pathway by binding copper ions in the Golgi

compartments.

2.5.2.2 PrPC Buffers Copper from the Synapse

As PrPC is concentrated at the synapse both presynaptically and postsynaptically,

copper binding may have an anti-oxidant effect that is important for synaptic

homeostasis (reviewed by Brown, 2001). At the cellular level, PrP-deficient cells are

more susceptible to oxidative damage and toxicity, and show increased sensitivity to

various kinds of stresses, implying a protective role of PrPC. The synaptic release of

copper may increase its local concentration to as much as 250 µM. This copper is usually

bound to peptides or amino acids and must be taken up rapidly by the neurones. Excess

copper can catalyse interconversion of various reactive oxygen species, or even

generate hydroxyl radicals from water. Sequestering it from the synapse is therefore

important to protect the cell from oxidative damage. PrPC-deficient cells do take up

copper, but to a lesser extent than PrPC-containing cells.

Brown et al. (2001) showed that the protection of cells against oxidative stress by PrPC

is proportional to the amount of copper it binds. Both purified PrPC and recombinant

PrP exhibited superoxide-dismutase-like activity in a formazan formation assay. The

SOD-like activity increased with the number of copper molecules incorporated, and it

depended on the copper concentration. This suggested that copper binding facilitates

changes in the secondary structure of the protein. The SOD-like activity was inhibited

when PrPs were incubated with the PrP106-126 (Chapter 1.2.5). Increased resistance to

oxidative stress was also shown for cells grown in excess copper, but not when PrPC

was stripped away using PI-PLC. Expression of PrPC with bound copper boosted

cellular resistance to oxidative stress.

Cui et al. (2003) investigated which regions of prion protein are required for the SOD-

like activity. The repeats and hydrophobic region (Chapter 2.1) are indispensable for

this activity, and the C-terminus is also important.

Several studies argue against a copper-transporting role for PrPC; for example, the in

vitro study of Rachidi et al. (2003) indicated that the PrPC was not involved in delivery

of copper at physiological concentrations (1.6 µM).

2.5.2.3 PrPC Contributes to Redox Signalling

Another suggestion is that PrPC could be a copper-sensitive stress-sensor, which is able

to initiate signal transduction cascades. After sensing stimuli such as copper and/or free

radicals, PrPC could trigger intracellular calcium signals that contribute to modulation of

synaptic transmission and maintenance of neuronal integrity (reviewed by Vassallo and

Herms, 2003). PrPC may efficiently buffer copper at the synapse in order to maintain

copper concentrations in the presynaptic cytosol and protect synapses from oxidative

insult. These complementary activities should also contribute to the preservation of

neuronal electrophysiology. Copper may be transported back to the cell by other

transporters present on the outer side of the cell membrane.

Some features of PrPC, like its neuroprotective effect against oxidative stress, suggest

that it is involved in free radical pathways, as these overlap with systems controlling

homeostasis of redox-active metals such as copper. One scenario is that PrPC acts as a

modulator of calcium flux in response to copper because copper enables redox

signalling and triggers responses. Thus copper will bind to PrPC after its concentration

increases, enabling it to participate in the redox reactions (such as SOD-like activity), in

turn triggering membrane kinases and activating Ca2+-mediated signalling cascades.

Therefore PrPC may act as a sensor for strong copper/reactive oxygen species (ROS)

stimuli and by generating a signal through redox chemistry it may turn on Ca2+-

mediated signalling.

2.5.2.4 PrPC Has a Neuroprotective Role

Several lines of evidence implicate PrPC in the prevention of apoptotic cell death.

Using the yeast two-hybrid system, Kurschner and Morgan (1995) identified Bcl-2 as a

binding partner of PrPC. Bcl-2 specifically suppresses apoptosis in a number of cell

types and it can bind proteins from the same and from other protein families. A peptide

comprising the C-terminal 183 amino acids of mouse PrPC (residues 72-254) interacted

with the Bcl-2 region that contains the BH2 domain (residues 174-236). By this

association, PrPC could sequester Bcl-2 from its intracellular organelle pools, and the

depletion of Bcl-2 pools during prion disease and accumulation of PrPSc may contribute

to apoptosis.

Kuwahara et al. (1999) established hippocampal cell lines from Prnp0/0 and Prnp+/+

mice. A stress insult (serum removal from the cell culture) caused apoptosis in the

Prnp0/0 cells but not in the Prnp+/+ cells. Transduction of the Prnp0/0 cells with either

PrPC- or Bcl-2-coding constructs prevented apoptosis of the cells under the serum-free

conditions. Prnp0/0 cells had shorter neurites than Prnp+/+ cells, but this was also

abrogated by the expression of PrPC in Prnp0/0 cells. This study strongly indicated the

involvement of PrPC in prevention of cell death.

Human PrPC protected neurons against apoptosis mediated by the Bax protein (Bounhar

et al., 2001). Inhibition of apoptosis depended on the proximal octarepeats but not on

the GPI-anchor. Bax is not pro-apoptotic unless it is induced by insult or

overexpression. However, overexpression of both Bax and PrPC prevented apoptosis in

the human primary neurons. Conversely, an antisense PrPC cDNA potentiated the effect

of Bax overexpression. Trafficking of PrPC past the cis-Golgi was required for

neuroprotection. The PrP mutations D178N (FFI) and T183A prevented the protective

effect of PrPC. Thus, PrPC could be a strong natural neuroprotector.

Activation of PrPC in vitro induced neuroprotection (Chiarini et al., 2002). An

immunogenic PrPC-binding peptide (PrR; Martins et al., 1997) that binds the mouse

PrPC between residues 113-128 activated the cAMP/protein kinase A (PKA) and ERK

pathways, partially preventing apoptosis in retinal explants from neonatal rats or

neonatal mice, but not from Prnp0/0 mice. Incubation of cells with PrR increased the

intracellular levels of cAMP, activity of PKA and activation of ERKs. Addition of the

PrP106-126 peptide disrupted interactions between PrPC and PrR, blocking the

neuroprotective effect. Inhibitors of PKA, but not of ERKs, blocked neuroprotection,

suggesting involvement of a cAMP/PKA-dependent pathway in the PrPC-mediated

neuroprotection. Further, antibodies to PrPC that increased cAMP also increased the

neuroprotective effect, indicating that the activation of PrPC transduces neuroprotective

signals through a cAMP/PKA-dependent pathway and affects sensitivity to induced

apoptosis.

A cDNA microarray analysis was used to determine which genes are over- or under-

expressed in a human breast cancer cell line resistant to the cytotoxic action of tumor

necrosis factor α (TNF) (Diarra-Mehrpour et al., 2004). Seventeen-fold overexpression

of PRNP mRNA and also overexpression of PrPC was found in a TNF-resistant clone.

Furthermore, overexpression of PrPC was able to convert TNF-sensitive cells into TNF-

resistant cells. The protective effect of PrPC on tumor cells could be a consequence of

its interaction with laminin 2 and activation of the PI3K/Akt pathway.

2.5.2.5 PrPC Mediates Intercellular Contacts

PrPC binds molecules in the extracellular matrix and on the cell membrane that mediate cell-

cell interactions.

The 37-kDa laminin receptor precursor (LRP) was identified as an interacting partner of

PrPC using the yeast two-hybrid system in S. cerevisiae (Rieger et al., 1997). PrPC binds

the same domain of LRP (residues 161-180) as does laminin. Laminin is a glycoprotein

involved in cell attachment, differentiation, movement and growth. The interaction

between PrPC and LRP was confirmed by re-transformation and by co-transfection in

insect (Sf9) and mammalian (COS-7) cells. The LRP level was higher in scrapie-

infected N2a cells and in brains of scrapie-infected mice and hamsters. The LRP,

located on the cell surface, binds elastin and laminin and mediates their action. The two

extracellular proteins, PrPC and LRP, may interact on the cell surface.

Interaction between laminin and PrPC was also shown (Graner et al., 2000) by the

specific and saturable fashion in which PrPC bound laminin. In brain, laminin promotes

neuronal differentiation, migration of neurons, neuronal regeneration and also acts anti-

apoptotically. These effects are mediated by the cell membrane receptors because

laminins are major components of the extracellular matrix. For example, the interaction

between laminin and amyloid precursor protein promotes neurite outgrowth. The PrPC-

laminin interaction was also involved in neuritogenesis induced by NGF and laminin in

PC-12 cells, suggesting a role for PrPC in neuronal plasticity. Supporting this hypothesis

are the observations that anti-PrPC antibodies inhibited neuritogenesis and that NGF

treatment of the PC-12 cells increased PrPC expression by 25%. Laminin is a large (800

kDa), heterotrimeric molecule with many known isoforms. PrPC bound preferentially to

the well-conserved γ-1 chain C-terminal domain of laminin that stimulates neurite

outgrowth. Neuritogenesis stimulated by the γ-1 chain was abrogated in the Prnp0/0

cells. PrPC may therefore act as a laminin receptor.

In the caveolae-like membrane microdomains (“rafts”), PrPC was identified as a part of

protein complexes together with three splice variants of the neural cell-adhesion

molecule (N-CAM) (Schmitt-Ulms et al., 2001). The N-CAMs belong to the

immunoglobulin superfamily and they mediate cell-cell interaction by triggering

cytosolic signals. The PrPC-N-CAM interaction occurred through amino-acid side

chains. The interacting face of PrPC, its N-terminal part, the first helix and the adjacent

loop, bound the β-strands C and C’ within two adjacent N-CAM fibronectin type III

modules. The partners may associate early during their joint passage in the secretory

pathway. Knock-out mice lacking N-CAM were susceptible to prions, indicating that N-

CAM is not the protein X (Chapter 1.2.4). However, the PrPC/N-CAM association may

be involved as an alternative signalling route from PrPC to Fyn tyrosine kinase (see

below).

In order to identify proteins that reside near PrPC in the cell, Schmitt-Ulms et al. (2004)

used time-controlled transcardiac perfusion cross-linking (tcTPC), a method that combines

transcardiac perfusion and mild formaldehyde cross-linking. More than 20 proteins

were identified; most of these were either integral membrane proteins or proteins that

reside near the cell membrane. Some of the proteins were components of the secretory

pathway. Of twenty proteins, six are GPI-anchored proteins (PrPC, N-CAM 1, N-CAM

2, myelin-associated glycoprotein, contactin-1, limbic system-associated membrane

protein), and two were previously identified partners of PrPC, chaperone BiP (Chapter

1.4.1), and APP-like proteins. Most of these twenty proteins are involved in cell

adhesion and neuritic outgrowth. Although it is possible that not all identified proteins

are genuine interacting partners of PrPC, this analysis confirmed that PrPC is embedded

within the specialized membrane microdomains (“rafts”) together with a defined subset

of other GPI-attached molecules.

Morel et al. (2004) showed co-localization of PrPC and Src kinase at the junctional

complexes on the lateral membrane of enterocytes. A pool of Src also co-precipitated

with anti-PrPC antibodies and vice versa. Thus, PrPC could play a role in intercellular

signalling and/or sensing of neighboring cells, through an interaction with Src kinases

(Fyn tyrosine kinase is a member of Src family; see below).

2.5.2.6 PrPC is Involved in Lymphocyte Activation

Evidence that PrPC is involved in the activation of T cells includes the observation that

PrPC is expressed at high levels in T cells, B cells, monocytes and dendritic cells (Li et

al., 2001). The composition of N-linked glycans on PrPC from these cells is different

from those on PrPC from brain or neuroblastoma cells. The level of PrPC expressed on

the surface of T cells increased as a consequence of cellular activation (Chapter 2.3).

The memory T cells express more PrPC than naïve T cells. Anti-PrPC antibodies

inhibited the proliferation of T cells in vitro. Thus PrPC may be involved in the

activation of T cells.

There is a strict association between the PrPC and Fyn in the lymphoblastoid T cells

(Mattei et al., 2004). PrPC clustered within the glycosphingolipid-enriched membrane

microdomains (GEMs) where it strongly interacted with the GM1 and GM3

gangliosides. The GM3 is the main constituent of GEMs where it modulates signal

transduction. The protein tyrosine kinase ZAP-70 was also found to interact with PrPC

after T cell activation mediated by CD28 and CD3. ZAP-70 has a key role in the GEM-

associated signalling pathways leading to T cell activation. PrPC could be a component

of the signalling complex leading to T cell activation.

Finally, after hypothermal stimulation of the human lymphocyte cell line Jurkat E6.1,

PrPC co-localized with the CD3 and GM1 in the lipid rafts (Wurm et al., 2004).

Thus, PrPC could be involved in activation of T cells.

2.5.2.7 PrPC Participates in Nucleic Acid Metabolism

PrPC has nucleocapsid protein-like properties (Gabus et al., 2001). Human PrPC

mimicked the chaperone properties of the HIV-1 nucleocapsid protein NCp7 by actively

assisting the annealing of complementary nucleic acid strands, viral RNA dimerization,

hybridisation of replication primer tRNALys to the HIV-1 5’-primer binding site

sequence and initiation of reverse transcription by reverse transcriptase. The

transmembrane or the cytoplasmic PrP entities (Chapter 2.4) could interact with cellular

and/or viral nucleic acids.

2.5.2.8 PrPs are Memory Molecules

Alternative PrP conformations other than PrPSc could exist (Tompa and Friedrich,

1998). The self-sustaining autocatalytic propagation of these states may determine the

normal PrP function. A kinetic model was proposed in which PrP forms a bi-stable

molecular switch that can structurally encode and stably store information. Such a

mechanism could control a range of physiological processes, including the formation of

memory. The mechanism for long-term synaptic stabilization mediated by the neuronal

isoform of CPEB from sea hare shows similarities with this model (Chapter 1.5.2).

2.5.2.9 PrPC is a Signal Transduction Protein

The final hypothesis I will discuss is that PrPC could be a signal transduction protein.

This is supported by findings of interactions between PrPC and proteins involved in

signal transduction. Antibody cross-linking of PrPC in the mouse 1C11 neuronal cells

triggers activation of the Fyn tyrosine kinase (Mouillet-Richard et al., 2000). In the

mouse hippocampus, Fyn contributes to the molecular mechanisms for induction of

long-term potentiation (a long-lasting enhancement of synaptic transmission thought to

be the cellular basis for learning and memory) (Kojima et al., 1997). The 1C11 cell line

is a neuroectodermal progenitor that, depending on the inducers, differentiates into

either 1C11*/5-HT serotonergic cells or 1C11**/NE noradrenergic cells. PrPC is expressed in

both progenitor and differentiated cells. PrPC cross-linking did not trigger a response in

the progenitor cells. However, dephosphorylation of the Fyn tyrosine kinase and an

increase in its kinase activity were found 10 min after ligation of the anti-PrPC antibodies

1A8 and SAF61 in the differentiated cells. Progenitor and differentiated cells have

similar amounts of PrPC, but the signalling competence involving PrPC depended on the

differentiation and full acquisition of neuron-associated functions. In the differentiated

cells, PrPC co-immunoprecipitated with caveolin-1. Antibodies against caveolin-1

inhibited the PrPC-mediated activation of Fyn, indicating involvement of caveolin-1 in

the coupling of PrPC with Fyn. The physiological extracellular signal leading to the

activation of PrPC is unknown. Although PrPC was abundant in both cell bodies and

neurite extensions, Fyn activation was mostly due to the neuritic PrPC. Thus, PrPC may

be involved in modulation of neuronal functions.

The recombinant bovine PrP (residues 25-242) interacts with the catalytic α/α’ subunits

of protein kinase CK2 (Meggio et al., 2000), a pleiotropic protein kinase that is

abundant in brain. CK2 phosphorylates more than 200 substrates, most of which are

involved in signal transduction and gene expression. The association between CK2 and

PrP induced CK2 phosphotransferase activity. Both N-terminal and C-terminal parts of

recombinant PrP were involved in this activation, but the N-terminus was more

important for activation. The CK2 is extracellular and could contact PrPC on the outer

side of the cell membrane leading to stabilization of the active conformation of CK2.

Recombinant mouse PrP (residues 23-231) was used as a bait to screen a mouse brain

cDNA expression library in the yeast two-hybrid system (Spielhaupter and Schätzl,

2001) leading to identification of the neuronal phosphoprotein synapsin Ib, adaptor

protein Grb2 and the uncharacterized prion interactor Pint1 as potential partners of PrPC.

These interactions were confirmed by co-immunoprecipitation assays. Synapsin Ib and

Grb2 interacted with both the N- and C-terminal parts of PrP, but Pint1 interacted with

the C-terminal part only. PrPC co-fractionated with synapsin Ib and Grb2 in microsomal

preparations, indicating that these proteins interact in the intracellular, presumably

Golgi, vesicles. Pint1 is a newly discovered protein, with homologues in human and C.

elegans. Synapsins reversibly attach synaptic vesicles to the cytoskeleton, and regulate

their release, so the interaction between synapsin Ib and PrPC may contribute to the

regulation of cell-cell contact and extracellular signalling. Grb2 is an adaptor involved

in intracellular signal transduction, which links signals coming from extracellular

proteins to their intracellular effectors. Interactions between PrPC and these proteins

involved in signal transduction suggest a role for PrPC in signal transduction.

When PrPC was stimulated with various anti-PrPC antibodies in the 1C11 progenitor and

differentiated cells, neurohypothalamic GT1-7 cells and T lymphoid BW5147 cells

(Schneider et al., 2004), it triggered production of the NADPH oxidase-dependent

reactive oxygen species (ROS), and phosphorylation of the extracellular signal-regulated

kinases 1 and 2 (ERK 1 and 2), two MAP kinases. PrPC activation led to

phosphorylation of the p47PHOX subunit of NADPH oxidase, a substrate of the protein kinase C.

Inhibition of NADPH oxidase with diphenyleneiodonium (DPI) abolished ROS

production following PrPC activation, indicating involvement of NADPH oxidase ROS

production in the PrPC-mediated signalling. ROS act as chemical mediators in many

signalling processes such as regulation of transcription factors and activation of kinases,

including the MAPK family. After PrPC activation the ERKs, but not the other

MAPKs, c-Jun NH2-terminal kinase or p38MAPK, were phosphorylated (activated). In

the neuronal context, ERKs are modulators of long-term synaptic facilitation (synaptic

plasticity): they activate the CREB-1-mediated gene transcription (Martin et al., 1997;

Si et al., 2003a; Chapter 6.5). Phosphorylation of ERKs is regulated by ROS production

in the 1C11 progenitor, GT1-7 and BW5147 cells, although the GT1-7 and BW5147

cells lack caveolin-1. In the differentiated 1C11 cells, but not in the other cells, both

ROS production and phosphorylation of ERKs were specifically controlled by the

activation of Fyn tyrosine kinase. Thus, PrPC contributes to signalling networks in

neuronal, neuroendocrine and lymphoid cells (Figure 2.4).

Using Affymetrix oligonucleotide microarrays, Mody et al. (2001) analysed patterns of

gene expression in the developing mouse hippocampus. Of 11000 genes, 1926 showed

dynamic changes across the five timepoints denoting major developmental events.

These were the embryonic day 16 (E16) corresponding to the proliferation of neurons,

and the postnatal days 1, 7, 16 and 30 (P1, P7, P16, P30) corresponding to the

outgrowth and differentiation of neurons (P1, P7), formation of synapses (P16) and

maturation of synaptic function (P30). Genes showed 16 different expression patterns

(c0 - c15) of four major types: type I showing overall age-dependent down-regulation

(c0, c1, c5), type II showing general age-dependent up-regulation to peak levels at P16

or P30 (c10, c11, c14, c15), type III showing peak expression at either P1 or P7 (c4, c8,

c9, c12, c13) and type IV showing minimal expression at either P1 or P7 (c2, c3, c6).

This clustering correlated with the major developmental changes. For instance, the c1

genes highly expressed at the E16 were switched off after birth. The Prnp gene

belonged in the type II c15 cluster, showing the highest expression at P30, when the

hippocampal synapses become more active and begin to exhibit increased synaptic

plasticity. The other genes that shared the expression profile with Prnp were related to

the maturation of synaptic function, including the genes involved in synaptic function,

signal transduction, control of transcription and translation, glucose and oxidative

metabolism and membrane regulation of ionic concentration. Of particular note here is

that the genes encoding PKC subunit βII and MEK protein kinase, which are involved

in the PrPC-induced signalling (Figure 2.4), clustered together with Prnp within the c15.

This clustering of the PRNP gene and genes involved in its signalling pathway with genes

contributing to mature synaptic function indicates the involvement of PRNP in synaptic

plasticity (Chapter 6).

Figure 2.4: Model of the proposed PrPC-associated signalling pathways. 1C11, progenitor neuroectodermal cells; GT1-7, hypothalamic cell line; BW5147, T lymphocyte cell line; 1C11*/5-HT, differentiated serotonergic cells; 1C11**/NE, differentiated noradrenergic cells; PKC, protein kinase C; NOx, NADPH oxidase; MEK, MEK1 and MEK2 kinases; ERK, ERK1 and ERK2 kinases; Cav, caveolin 1b; Fyn, Fyn tyrosine kinase; Shc Grb-2 Ras Raf, Shc-Grb2/SOS-Ras-Raf signalling cascade (modified from Schneider et al., 2004).

Comparative genomics is a strategy to understand gene function. I used this approach to

analyse the elusive function of PRNP gene in Chapter 6. My analysis supports best the

signal transduction hypothesis.

2.6 Genomes: Digging Out the Gems

A major impetus for sequencing of the human genome was its potential for discovery of

new human genes related to known disease-associated genes. Study of such genes may

shed light on the function of their disease-causing counterparts, reveal the basis for

related diseases, uncover potential drug targets and provide new insights into disease

pathogenesis mechanisms. Further, genomic sequence allows rapid discovery of

paralogues of the classic drug target proteins in silico. There are also numerous similar

applications to basic physiology and cell biology.

As well as the human genome, there are 167 genomes completely sequenced by now,

including only five other vertebrates, mouse, rat, Fugu, chicken and chimp (Genome

News Network; Table 2.1). By 19 August, more than 30 genomes were sequenced this

year (Genome News Network). Genomic sequences are deposited in public biological

databases, and comparison of genomes is a strategy to discover new genes, define gene

regulatory elements and understand genome evolution and gene function.

2.6.1 The Human Genome

The human genome provides evidence of our evolutionary history (Lander et al., 2001).

Clues about human development, physiology and evolution are all encrypted within the

2.9 Gb of DNA. Basic features of the broad genome landscape are the gene content,

distribution of GC content and CpG islands, distribution of repeats and recombination

rate.

Table 2.1: 167 sequenced genomes (Genome News Network, 24 August 2004)

Aeropyrum pernix, Agrobacterium tumefaciens, Anabaena, Anopheles gambiae, Apis mellifera, Aquifex aeolicus, Arabidopsis thaliana, Archaeoglobus fulgidus, Ashbya gossypii, Bacillus anthracis, Bacillus cereus, Bacillus halodurans, Bacillus subtilis, Bacteroides thetaiotaomicron, Bartonella henselae, Bartonella quintana, Bdellovibrio bacteriovorus, Bifidobacterium longum, Blochmannia floridanus, Bordetella bronchiseptica, Bordetella parapertussis, Bordetella pertussis, Borrelia burgdorferi, Bradyrhizobium japonicum, Brucella melitensis, Brucella suis, Buchnera aphidicola, Caenorhabditis briggsae, Caenorhabditis elegans, Campylobacter jejuni, Candida glabrata, Caulobacter crescentus, Chlamydia muridarum, Chlamydia trachomatis, Chlamydophila caviae, Chlamydophila pneumoniae, Chlorobium tepidum, Chromobacterium violaceum, Ciona intestinalis, Clostridium acetobutylicum, Clostridium perfringens, Clostridium tetani, Corynebacterium diphtheriae, Corynebacterium efficiens, Coxiella burnetii, Cyanidioschyzon merolae, Debaryomyces hansenii, Deinococcus radiodurans, Desulfovibrio vulgaris, Drosophila melanogaster, Encephalitozoon cuniculi, Enterococcus faecalis, Erwinia carotovora, Escherichia coli, Fugu rubripes, Fusobacterium nucleatum, Gallus gallus, Geobacter sulfurreducens, Gloeobacter violaceus, Guillardia theta, Haemophilus ducreyi, Haemophilus influenzae, Halobacterium, Helicobacter hepaticus, Helicobacter pylori, Homo sapiens, Kluyveromyces waltii, Lactobacillus johnsonii, Lactobacillus plantarum, Lactococcus lactis, Leptospira interrogans, Listeria innocua, Listeria monocytogenes, Magnaporthe grisea, Mesorhizobium loti, Methanobacterium thermoautotrophicum, Methanococcoides burtonii, Methanococcus jannaschii, Methanococcus maripaludis, Methanogenium frigidum, Methanopyrus kandleri, Methanosarcina acetivorans, Methanosarcina mazei, Mus musculus, Mycobacterium bovis, Mycobacterium leprae, Mycobacterium paratuberculosis, Mycobacterium tuberculosis, Mycoplasma gallisepticum, Mycoplasma genitalium, Mycoplasma mycoides, Mycoplasma penetrans, Mycoplasma pneumoniae, Mycoplasma pulmonis, Mycoplasma mobile, Nanoarchaeum equitans, Neisseria meningitidis, Neurospora crassa, Nitrosomonas europaea, Oceanobacillus iheyensis, Onion yellows phytoplasma, Oryza sativa, Pan troglodytes, Pasteurella multocida, Phanerochaete chrysosporium, Photorhabdus luminescens, Picrophilus torridus, Plasmodium falciparum, Plasmodium yoelii yoelii, Porphyromonas gingivalis, Prochlorococcus marinus, Protochlamydia amoebophila, Pseudomonas aeruginosa, Pseudomonas putida, Pseudomonas syringae, Pyrobaculum aerophilum, Pyrococcus abyssi, Pyrococcus furiosus, Pyrococcus horikoshii, Pyrolobus fumarii, Ralstonia solanacearum, Rattus norvegicus, Rhodopirellula baltica, Rhodopseudomonas palustris, Rickettsia conorii, Rickettsia prowazekii, Rickettsia siberica, Saccharomyces cerevisiae, Saccharopolyspora erythraea, Salmonella enterica, Salmonella typhimurium, Schizosaccharomyces pombe, Shewanella oneidensis, Shigella flexneri, Sinorhizobium meliloti, Staphylococcus aureus, Staphylococcus epidermidis, Streptococcus agalactiae, Streptococcus mutans, Streptococcus pneumoniae, Streptococcus pyogenes, Streptomyces avermitilis, Streptomyces coelicolor, Sulfolobus solfataricus, Sulfolobus tokodaii, Synechococcus, Synechocystis, Thermoanaerobacter tengcongensis, Thermoplasma acidophilum, Thermoplasma volcanium, Thermosynechococcus elongatus, Thermotoga maritima, Thermus thermophilus, Treponema denticola, Treponema pallidum, Tropheryma whipplei, Ureaplasma urealyticum, Vibrio cholerae, Vibrio parahaemolyticus, Vibrio vulnificus, Wigglesworthia glossinidia, Wolbachia pipientis, Wolinella succinogenes, Xanthomonas axonopodis, Xanthomonas campestris, Xylella fastidiosa, Yarrowia lipolytica, Yersinia pestis

The six vertebrate genomes are Fugu rubripes, Gallus gallus, Homo sapiens, Mus musculus, Pan troglodytes and Rattus norvegicus.

2.6.1.1 Gene and Protein Content

The early estimates of the gene number varied from 30000 to 100000 genes. Yet an

average human gene is complex. The mean exon number per gene is 8.8, and the mean

size of internal exons is 145 bp. An average gene extends across 27 kb. The mean size

of introns is 3365 bp, and mean sizes of 3’UTR and 5’UTR are 770 and 300 bp

respectively. The mean size of coding sequence is 1340 bp, translating into a protein of

447 amino acids. It was estimated that approximately 35% of human genes are

alternatively spliced, and there are on average 3 distinct transcripts per gene. The gene

density ranges from 6.4 genes/Mb (chromosome Y) to 26.8 genes/Mb (chromosome

19).
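
The mean values quoted above can be cross-checked with simple arithmetic. The sketch below is only illustrative: the function names are my own, and adding the mean UTR and coding lengths assumes that the averages refer to comparable sets of genes.

```python
# Mean values quoted in the text for an average human gene.
CDS_BP = 1340         # mean coding sequence length
UTR5_BP = 300         # mean 5'UTR length
UTR3_BP = 770         # mean 3'UTR length
GENE_SPAN_BP = 27000  # mean genomic extent of a gene


def protein_length(cds_bp):
    """Approximate number of amino acids encoded (three bases per codon)."""
    return round(cds_bp / 3)


def mature_mrna_length(cds_bp, utr5_bp, utr3_bp):
    """Length of the spliced transcript: both UTRs plus the coding sequence."""
    return cds_bp + utr5_bp + utr3_bp


def exonic_fraction(mrna_bp, gene_span_bp):
    """Share of the genomic span that survives splicing."""
    return mrna_bp / gene_span_bp


mrna = mature_mrna_length(CDS_BP, UTR5_BP, UTR3_BP)
print(protein_length(CDS_BP))                          # 447
print(mrna)                                            # 2410
print(f"{exonic_fraction(mrna, GENE_SPAN_BP):.1%}")    # 8.9%
```

Under these assumptions, less than a tenth of an average gene's genomic extent is exonic, which fits the description of the average human gene as complex.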

Protein-coding genes in the human genome were predicted from three lines of evidence:

direct evidence of transcription (mRNA, EST), indirect evidence (homology to

previously identified genes and proteins) and ab initio prediction using software that

recognizes the functional signals in genes. The ab initio gene prediction methods

predict correctly about 70% of individual exons and 20% of individual genes in human.

The gene prediction strategy used as a first step the Ensembl prediction system, starting

with the ab initio prediction (Genscan program) and confirmation of gene predictions

by assessing similarity with known proteins, mRNAs, ESTs and protein motifs from any

organism. The protein matches were then extended using the GeneWise program. This

system yielded 35500 gene and 44860 transcript predictions. Frequent mistakes with

this system are fragmentation, merging and overlapping of genes.

In the second step, the Genie program predictions were combined with the Ensembl

gene predictions. Genie starts with the mRNA and EST matches, and then employs the

Hidden Markov Model statistics for ab initio prediction to extend these matches in both

3’ and 5’ directions. This strategy yielded fewer fragmented genes than the Ensembl

system, merging 15437 Ensembl gene predictions into 9526 clusters.

In the final step, known genes from the RefSeq, SWISSPROT and TrEMBL databases

were incorporated into the results, producing a final estimate of 31000 coding genes in

the human genome, only twice as many as in worm or fly. This includes about 15000

known genes, and about 17000 gene predictions, which are a collection of anonymous

genes and a fantastic resource for targeted gene discovery (Chapter 4). This estimate

leads to calculations that, on average, 1.5% of the human genome is coding sequence.
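
The coding fraction can be approximated from figures already given in this section. This is a rough sketch, not the published calculation: it multiplies the gene estimate by the mean coding length and divides by the 2.9 Gb genome size, and the function name is hypothetical.

```python
GENE_COUNT = 31000        # final estimate of coding genes
MEAN_CDS_BP = 1340        # mean coding sequence per gene
GENOME_SIZE_BP = 2.9e9    # human genome size


def coding_fraction(gene_count, mean_cds_bp, genome_size_bp):
    """Fraction of the genome occupied by coding sequence."""
    return gene_count * mean_cds_bp / genome_size_bp


print(f"{coding_fraction(GENE_COUNT, MEAN_CDS_BP, GENOME_SIZE_BP):.2%}")  # 1.43%
```

The result is close to the 1.5% quoted in the text, so the figures are mutually consistent.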

There are also several thousands of non-coding genes in the human genome (tRNAs,

rRNAs, spliceosomal RNAs, telomeric RNAs, snoRNAs, microRNAs, siRNAs, and other

non-coding genes of unknown function). Overall 30% of the genome would be

transcribed.

The full set of known human proteins is more complex than those in invertebrates due

to presence of vertebrate-specific protein domains and motifs: 7% of the InterPro

families are vertebrate-specific representing 70 protein families and 24 domain families.

Vertebrates have arranged pre-existing protein components into a richer collection of

domain architectures. Specifically, the human genome contains more genes, domains,

protein families, paralogues, multidomain proteins with multiple functions and domain

architectures, in comparison with worm and fly.

2.6.1.2 GC Content and CpG Islands

There are GC-rich and GC-poor regions in the human genome. The genome-wide GC

content average is 41%, ranging from 36-47.1% on a large scale (> 10Mb), and from

33.1-59% on a smaller scale. There is strong positive correlation between the GC

content and gene density. The human genome contains 28890 CpG islands, which are

mostly short genomic regions (more than 75% are shorter than 850 bp) with high GC

content (60-70%) that are associated with 5’ ends of genes.
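
GC content, and the CpG enrichment used to recognise CpG islands, are simple to compute from sequence. The sketch below is a minimal illustration with hypothetical function names. The observed/expected CpG ratio is a commonly used criterion that is not described in the text, and no detection thresholds are assumed here.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed CpG count divided by the count expected from C and G frequencies."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def gc_windows(seq, window, step):
    """GC content in successive windows, to show regional variation."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


print(gc_content("ATGC"))          # 0.5
print(cpg_obs_exp("CGCGCG"))       # 2.0
print(gc_windows("AAGG", 2, 2))    # [0.0, 1.0]
```

Applied with windows of different sizes, `gc_windows` would reproduce the kind of large-scale and small-scale ranges quoted above.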

2.6.1.3 Repeat Content

Repeat sequences account for more than 50% of the human genome. The repeats are

evidence of evolutionary events and forces that have shaped the genome. As passive

entities, they represent markers for studies of mutation and selection. As active entities,

they have reshaped the genome by causing rearrangements, forming new genes, reshuffling

existing genes and modulating GC content.

Transposable elements comprise 45% of the genome. The currently recognized long

interspersed elements (LINEs), short interspersed elements (SINEs), long terminal

repeat (LTR) retrotransposons and DNA transposons comprise 20%, 13%, 8% and 3%,

respectively, of the human genome. Overall activity of these transposons has declined

over the past 35-50 million years, with the possible exception of the 61 LINEs with

intact ORFs. There is a remarkable variation in the repeat content across the genome,

ranging from less than 2% across the four 100 kb homeobox gene clusters to 89%

across 525 kb of the X chromosome in region Xp11. The absence of repeats in a

genomic region may indicate many cis-regulatory elements that cannot be interrupted

by insertions.

Simple sequence repeats (SSR) are perfect or imperfect tandem repeats of a particular

k-mer. Microsatellites have a short k (1-13 bp) and minisatellites have a longer k (14-500

bp). Simple sequence repeats arise by DNA polymerase slippage and comprise 3%

of the genome, with a frequency of one SSR per 2 kb.
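
The definition above can be sketched as a small search for perfect tandem repeats of a k-mer. This is an illustration only: the function names and the minimum copy number are assumptions, and imperfect repeats, which the definition also covers, are not detected.

```python
import re


def find_ssrs(seq, k, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats of a k-mer.

    min_copies is an arbitrary illustrative threshold, not a value from the text.
    A unit such as "AA" is reported as a 2-mer although it is a homopolymer run.
    """
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // k))
    return hits


def classify_unit(k):
    """Classify a repeat by unit length, using the ranges given in the text."""
    if 1 <= k <= 13:
        return "microsatellite"
    if 14 <= k <= 500:
        return "minisatellite"
    return "other"


print(find_ssrs("TTCACACACAGG", 2))   # [(2, 'CA', 4)]
print(classify_unit(2))               # microsatellite
```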

Segmental duplications of parts of the genomic sequence (1-200 kb) occur as

interchromosomal duplications when segments are distributed to nonhomologous

chromosomes, and as intrachromosomal duplications when duplications occur

within a particular chromosome. These regions comprise 3.3% of the genome.

Chromosomal regions near centromeres and telomeres consist almost entirely of

interchromosomal duplicated segments.

2.6.1.4 Recombination Rate

The overall occurrence of single nucleotide polymorphisms (SNP) is roughly 1 in 1900

bp. Recombination rate varies across the genome. In general, recombination rate is

higher in the distal regions of chromosomes (20 Mb from telomere) and on the shorter

chromosome arms, promoting at least one crossover per chromosome arm per meiosis.

Recombination is suppressed near the centromeres.

2.6.1.5 Quality Assessment of the Human Genome Sequence

World standards for the human genome sequence fidelity state that there should be fewer

than one base pair error per 10000 DNA bases (99.99% accuracy), and that the

sequence should be without gaps. Schmutz et al. (2004) performed a detailed evaluation

of a sample of 34 Mb of the human DNA reference sequence. Accuracy of the sequence

was above 99.99%, with an overall error rate of 1/73369. There was 1 significant error (a

single error that causes 50 contiguous base pairs to be incorrect) per 2630005 base pairs.
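
The relationship between an error rate and percentage accuracy can be checked directly. The sketch below uses hypothetical function names and assumes that errors are counted per base.

```python
def accuracy_percent(bases_per_error):
    """Percentage accuracy for a rate of one error per `bases_per_error` bases."""
    return 100.0 * (1.0 - 1.0 / bases_per_error)


def meets_standard(bases_per_error, max_errors_per_10kb=1.0):
    """True if there is less than one error per 10000 bases."""
    return 10000 / bases_per_error < max_errors_per_10kb


print(f"{accuracy_percent(73369):.4f}")   # 99.9986
print(meets_standard(73369))              # True
```

An error rate of 1/73369 thus corresponds to roughly 0.14 errors per 10000 bases, well inside the standard.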

2.6.2 The Mouse Genome

The mouse is a key experimental tool for biomedical research (Waterston et al., 2002). The

mouse genome is also important for comparative genomics, since roughly 75 million

years of independent evolution separate the human and mouse genomes, which now

differ by nearly one substitution in every two nucleotides.

The mouse genome (2.6 Gb) is 14% smaller than the human genome (Table 2.2), due to

a higher deletion rate in mouse. The mouse has a higher overall GC content (42%) and a

tighter GC distribution. There are fewer CpG islands in the mouse genome (15500) than

in human (28890).

Only 37.5% of the mouse genome can be recognized as transposon-derived, compared

with 45% of the human genome. This is due to a higher nucleotide substitution rate that

makes ancient repeat sequences difficult to recognize. The neutral substitution rate in

mouse (4.5 x 10^-9 per year) is twice that of human (2.2 x 10^-9 per year), perhaps

determined by population size, body size or generation time. The depth of the human

repeat analysis (150-200 million years), therefore, is better than that of the mouse repeat

analysis (100-120 million years). Lineage-specific repeats account for 32.4% of the

mouse genome compared with 24.4% in human. The rate of transposition is constant in

mouse although it has declined in humans. There are 3000 individual LINEs, four SINE

lineages and three LTR lineages that are potentially active in the mouse genome. The

LINEs are biased toward AT-rich, and the SINEs toward GC-rich, genome regions.

Table 2.2: Vertebrate genomes in numbers

                    Human              Mouse              Dog                  Rat                Fugu
Genome size         2.9 Gb             2.6 Gb             2.4 Gb               2.7 Gb             365 Mb
Gene number         31000              31000              NA                   31000              39000
Gene density        6.4-26.8/Mb        NA                 NA                   NA                 1/10.9 kb
Average gene size   27 kb              NA                 NA                   NA                 NA
GC content          41%                42%                NA                   43%                44.1-53.5%
Transposons         45%                37.5%              31%                  40%                2.7%
Substitution rate   2.2 x 10^-9/year   4.5 x 10^-9/year   (2.2 x 10^-9/year)   4.9 x 10^-9/year   NA
SNP frequency       1/1900 bp          1/600 bp           1/1500 bp            NA                 NA
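
The neutral substitution rates quoted above can be related to the human-mouse divergence mentioned at the start of this section. The sketch below is a back-of-the-envelope check under strong assumptions: constant rates in each lineage, 75 million years of independent evolution, and no correction for multiple substitutions at the same site. The function name is hypothetical.

```python
HUMAN_RATE = 2.2e-9        # substitutions per site per year
MOUSE_RATE = 4.5e-9        # substitutions per site per year
DIVERGENCE_YEARS = 75e6    # time since the common ancestor


def expected_divergence(rate_a, rate_b, years):
    """Substitutions per site accumulated along both lineages."""
    return (rate_a + rate_b) * years


d = expected_divergence(HUMAN_RATE, MOUSE_RATE, DIVERGENCE_YEARS)
print(f"{d:.2f} substitutions per site")   # 0.50 substitutions per site
```

About 0.5 substitutions per site matches the statement that the two genomes differ by nearly one substitution per two nucleotides.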

The SNP frequency is 1 per 500-700 bp in mouse. Mouse has roughly four-fold more

short SSRs (1-5 bp unit) than human.

Both the mouse and human genome have about 30000 protein-coding genes. About

80% of mouse genes have a single identifiable orthologue in human. Less than one percent of

genes are unique to each genome. At the nucleotide level, the two genomes can be

aligned across 40% of their lengths. These sequences are the orthologous sequences

from the common ancestor that remained in both lineages. Over 90% of the human and

mouse genomes can be partitioned into the regions of conserved synteny (orthologous

gene loci on the same chromosome in two species regardless of gene order and presence

of intervening genes). In these genomic regions, gene order from the most recent

ancestor has been conserved in both species.

Approximately 5% of the mammalian genome is under purifying selection, more than its

coding potential (1.5%). This suggests that the UTRs (1%), regulatory elements, non-

protein-coding genes and chromosomal structural elements are under functional

selection as well.

The mammalian genome is evolving in a non-uniform manner. There is a substantial

variation across the genome in all three forces that shape the genome: nucleotide

substitution, deletion and insertion. Neutral substitution rate is correlated with

recombination rate genome-wide.

Two general mechanisms guide protein invention in eukaryotes. First, domains can be

combined to form new architectures, and second, gene families may expand in a

lineage-specific manner. In the mouse lineage, many local gene family expansions have

occurred. Such examples include genes involved in reproduction, immunity,

development and olfaction.

Two-genome comparison between human and mouse allowed estimation of rate of

protein evolution. Measures of protein sequence evolution are the percentage of identity

and the ratio between the rates of non-synonymous (KA) mutations per non-synonymous

site and synonymous (KS) mutations per synonymous site (in general, the KA / KS ratio

<1 indicates purifying selection, the KA / KS ratio =1 indicates neutral evolution, and the

KA / KS >1 indicates positive selection). For the 12845 pairs of mouse-human 1:1

orthologues, the median amino acid identity was 78.5%, and the median KA / KS ratio

was 0.115. The major determinant of the KA / KS ratio was variation in KA. The KS

clustered tightly around 0.6 synonymous substitutions per synonymous site, indicating a

similar neutral substitution rate among all proteins. Domains are under greater selective

pressure than protein regions not containing domains, and catalytic domains are under

greater selective pressure than non-catalytic domains. Finally, domains in the secreted

class are typically under less purifying selection than are either nuclear or cytoplasmic

domains. Protein domain families involved in the immunity and gene transcription

showed the highest median KA / KS ratio.
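
As a rough illustration of how the KA / KS thresholds above are read, the following Python sketch classifies a ratio into the three regimes. The function name `classify_selection` and the `tolerance` band around 1 are assumptions introduced here for illustration; they are not taken from the mouse genome analysis. Multiplying the median ratio by the typical KS gives only a back-of-envelope estimate of KA.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.05) -> str:
    """Interpret a KA/KS ratio using the thresholds given in the text.

    `tolerance` is an illustrative assumption: ratios within this distance
    of 1 are reported as neutral, since an exact ratio of 1 is rarely observed.
    """
    if ks <= 0:
        raise ValueError("KS must be positive to form a ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return "neutral evolution"
    return "purifying selection" if ratio < 1.0 else "positive selection"


# Values quoted above: median KA/KS of 0.115, with KS close to 0.6.
# The implied KA (0.115 * 0.6) is a rough estimate only.
median_ks = 0.6
median_ka = 0.115 * median_ks
print(classify_selection(median_ka, median_ks))
```

With the median values quoted above, the ratio falls well below 1, which is consistent with most mouse-human orthologues evolving under purifying selection.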

2.6.3 The Rat Genome

Rat is a tool in experimental medicine and drug discovery (Gibbs et al., 2004). It is

separated from mouse by 12-24 million years, and from human by about 75 million

years. This third mammalian genome sequenced allowed three-way comparisons to

resolve new details of mammalian evolution.

The rat genome (2.7 Gb) is smaller than human but bigger than mouse (Table 2.2). The

difference between rodents is due to a different repeat content and to a different

proportion of segmental duplications.

The number of genes encoded by the rat genome is similar to that in mouse and human

(about 30000). Most genes (90%) have had no deletion or duplication since the last

common ancestor. The intronic structures have been conserved as well. Coding density

is about 1.7%. There are 435 tRNA genes, and 454 other known non-coding RNA genes


defined in the rat genome. There are 15975 CpG islands in the rat genome. The GC

content is 0.35% enriched in comparison with mouse (43%) due to a higher rate of A to

G transitions over T to C transitions. There is also an excess of the G+T over C+A on

the coding strand (strand asymmetry).

In the protein-coding sequences there is an overall excess of small deletions over

insertions. Based on the three-species comparisons, the rates of indel accumulation in

nuclear, accumulated/secreted, mitochondrial, cytoplasmic proteins, enzymes and

ligand-binding proteins are 4 × 10⁻⁴, 3.9 × 10⁻⁴, 3.1 × 10⁻⁴, 2.4 × 10⁻⁴, 2.1 × 10⁻⁴ and

1.4 × 10⁻⁴, respectively. Whereas the transmembrane protein regions were the most refractory to indel

accumulation, the low-complexity protein regions were three times enriched in indels.

Almost all human disease-associated genes have 1:1 orthologues in the rat genome and

are unlikely to be diverged, duplicated or lost. However, their rates of synonymous

substitution are higher than those of remaining genes. Some rat-specific genes arose

through expansion of gene families, including the genes encoding pheromones,

immunity-related proteins and proteins involved in chemosensation and detoxification.

About 3% of the genome is in the large segmental duplications, associated primarily

with the pericentromeric and subtelomeric regions. These regions harbour many

recently expanded gene families. Intrachromosomal duplications occur three times more

frequently than the interchromosomal duplications in rat.

Roughly 40% of the rat genome aligns with human and mouse and this fraction contains

the vast majority of exons and regulatory elements. A portion of this eutherian core

makes 5-6% of the genome that is under selective constraint. About 28% of the rat

genome aligns only with mouse. This fraction contains rodent-specific repeats (40%),

and the rest may be single-copy DNA deleted in the human lineage.

Nearly one third (29%) of the rat genome aligns with neither human nor mouse. Half of this

sequence consists of rat-specific repeats, and about a third consists of

rodent-specific repeats deleted in mouse.


There were 250 genome rearrangements in the rodent lineage since the evolutionary split

between rodents and human. The neutral substitution rate appears to be three times

higher in rodents than in human, with that in rat 5-10% higher than in mouse.

Microdeletions occur at a two-fold higher rate in rodents than in human. There is a

correlation between the local rate of microinsertions, microdeletions, transposable

element insertions and nucleotide substitutions in the rat genome.

Males show a two-fold excess of nucleotide substitutions and of small indels (<50 bp),

mirroring the ratio of the numbers of cell divisions between the male and female

germlines.

About 40% of the rat genome is derived from transposable elements. The LINEs

comprise 22% of the genome, with the L1 family still active. Two SINE families, B2

and ID, are also active, as well as all three classes of LTR retroviral elements. The DNA

transposons are inactive.

2.6.4 The Fugu rubripes Genome

The tiger pufferfish, Fugu rubripes, has the smallest vertebrate genome but its gene

repertoire is similar to mammals (Venkatesh et al., 2000). Thus it could be a useful

reference genome for gene discovery and discovery of conserved regulatory elements.

Although the compact genome of the tiger pufferfish spans only 365 Mb

(Aparicio et al., 2002), the numbers of protein-coding genes in human and Fugu are

comparable (Table 2.2). A total of 31059 genes were predicted in the Fugu genome, with the

upper bound of gene loci expected to reach 38000-40000. Genes were predicted mostly

using the homology evidence due to unavailability of cDNA.

Only 2.7% of the genome matched interspersed repeats but this is probably a significant

underestimate due to incompleteness of the Fugu repeats database. Rapid deletion of

nonfunctional sequences may be the mechanism accounting for the repeat structure in


Fugu. On the other hand, transposable elements in Fugu appear to be very active. At

least 40 families of transposable elements have accumulated fewer than 5% of

substitutions, indicating that they may be active.

The compactness of the Fugu genome is due to reduction in the size of introns and

intergenic regions. Roughly 75% of introns are <425 bp in length, but the number of

introns, 161536, is very similar to that in human. Both gain and loss of introns were

observed in the Fugu lineage. The presence of “giant” genes was also noted in Fugu.

The average gene density was estimated to be one gene locus per 10.9 kb. Gene loci

occupy one third of the genome.

Variation in GC content was much lower in Fugu (44.1-53.5%) than

in human.

With windows of 1, 0.5 and >1 kb, roughly 0.15, 1.3 and 5% of the Fugu genome

contained duplicated segments, indicating that the large duplications are not a recent

feature of the Fugu genome. However, evidence for ancient duplications comes from

the existence of paralogous segments.

Most human peptides (75%) have some match in Fugu. About 6000 Fugu proteins

have no match in human. There is a general human-Fugu concordance between the

predicted protein classifications. Exceptions include an excess of the potassium channel

subunits and kinases in Fugu, and an excess of the C2H2 zinc finger proteins in human.

Olfactory receptors show a clear expansion of different families in Fugu.

Many short genomic segments are conserved between human and Fugu after separation

by 450 million years of independent evolution. However, scrambling of the gene order,

depending on the chromosome length, was also often found in human-Fugu

comparisons.


2.6.5 The Dog Genome

Dog is an attractive choice for genetic comparisons as the characteristics of about 300

breeds are maintained by restricting gene flow between breeds. The dog genome

was sequenced at 1.5-fold coverage, consisting of 6.22 million reads and

covering about 50% of the 4.8 Gb diploid genome (Kirkness et al., 2003). This limited

depth of sequencing permits some initial analyses.

The dog genome is estimated to be 2.3-2.4 Gb (Table 2.2). The 6.22 million reads were

merged into 522011 contigs with mean span of 8.6 kb and random sequence coverage

of about 77%.

Roughly 31% of the sequence is repeat-derived (compared with 45% in human and 38% in mouse), but the

dog repeat libraries may not be as complete as those for human and mouse. The

substitution levels were similar in dog and human.

The dog-human alignments covered 18473 genes. Dog appears to have much larger

complements than human of olfactory receptor genes, and genes involved in peptide

metabolism.

The SNP frequency in dog was estimated to be about 1/1500 bp.

Many sequences in the dog genome differed in the presence or absence of a SINE

insertion, and such polymorphisms were verified in a number of dog breeds.

Approximately 7% of the 23000 SINE_cF elements are dimorphic in the sequenced

poodle, and these are a valuable resource for phylogenetic studies. This kind of gene

dimorphism may cause dramatic phenotypic effects (e.g. induction of canine

narcolepsy), contributing to the phenotypic diversity among dog breeds.

At present, besides the published human, mouse, rat, dog and Fugu genome analyses,

sequences of the chimpanzee and chicken genomes are also available (Ensembl).

Sequencing and assembly of the Tetraodon and zebrafish genomes are near completion


(Genoscope; Ensembl). Further, sequencing of the cow, pig and Brazilian opossum

genomes is underway (NCBI). The National Human Genome Research Institute

(NHGRI, USA) approved funding for projects to sequence the genomes of the African

savannah elephant, the European common shrew, the guinea pig, two species of

hedgehog, the nine-banded armadillo, the rabbit, the cat, and the orang-utan (Genome

News Network). These vertebrate genomes are a priceless resource for discovery of genes

and definition of gene regulatory sequences.

2.6.6 Annotation of Genomic Sequences

Automatic genome annotation is a major strategy to annotate genomic sequences

(Chapter 2.6.1). However, this is a work in progress. The main problems are mistakes

arising from automatic genome annotation, and the inability of current programs to

predict UTRs and non-protein-coding genes. Further, the collections of transcripts and

ESTs are limited.

Guigo et al. (2003) developed a two-stage multi-exon gene prediction procedure that

exploits the availability of human and mouse genomic sequences. The first stage is to

run gene-prediction programs (TWINSCAN, SGP2) that utilize genome alignment in

combination with detection of statistical patterns in DNA. In the second stage,

multiexon genes predicted in human and mouse are compared. Gene prediction is

retained only if the predicted proteins in both species align, with at least one predicted

intron at the same location. A total of 1019 additional new genes were predicted using

this method. The reliability of these gene predictions was 76%, as tested by RT-PCR

and direct sequencing of a single exon pair from a sample of the gene predictions.

Analysis of gene expression patterns indicated that this gene prediction system could be

particularly sensitive to genes with tissue-restricted expression.
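
The second-stage filter described above can be sketched as a simple predicate. This is a minimal illustration under stated assumptions: the protein alignment is taken as already computed elsewhere and passed in as a flag, and intron positions are assumed to be expressed in shared alignment coordinates. The function name `retain_prediction` is hypothetical and is not part of TWINSCAN or SGP2.

```python
def retain_prediction(proteins_align: bool,
                      human_introns: set,
                      mouse_introns: set) -> bool:
    """Keep a human-mouse gene prediction pair only if both conditions hold.

    proteins_align: whether the predicted proteins align (computed elsewhere).
    human_introns, mouse_introns: intron positions in alignment coordinates
    (an assumption made for this sketch).
    """
    shared_introns = set(human_introns) & set(mouse_introns)
    return bool(proteins_align) and len(shared_introns) >= 1


# Hypothetical example: one intron (position 245) is shared, so the pair is kept.
print(retain_prediction(True, {120, 245}, {245, 310}))
```

Requiring a shared intron position in addition to protein similarity is what lets the procedure discard matches that lack a conserved exon structure.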

There are still transcripts and ESTs that are missing from the human collections. Ota et

al. (2004) sequenced 21243 full-length human cDNAs, of which 14490 were unique.

Roughly half of these were protein-coding cDNAs (5416). Of these, 1999 clusters had

not been predicted by computational methods. The distribution of GC content in this


category has a peak at 58%, suggesting that there may be a bias against GC-rich

transcripts in the current protein-coding gene predictions. The remaining cDNAs

contained no ORF, corresponding to the non-protein-coding genes.

Manual curation is at present the ultimate way to annotate genomic sequence. For

example, The Vertebrate Genome Annotation database (VEGA;

http://vega.sanger.ac.uk/) is a central repository for manual annotation of different

vertebrate finished genome sequences. Expert manual annotators have to correct

mistakes arising from automatic gene prediction by effectively integrating the ab initio

gene predictions, direct evidence, homology-based evidence and comparison across

multiple genomes.

This strategy can also be used in a targeted fashion to search for genes of interest in

multiple genomes and compile supporting evidence. Using this strategy to search

genomic databases for the predicted PRNP paralogues, I discovered a new human

PRNP paralogue dubbed Shadow of prion protein gene (SPRN). I compiled direct

evidence, ab initio gene predictions and homology-based evidence for this gene in

mammals and fish (Chapter 4).

2.7 Comparative Genomic Analysis

With the availability of many genomic sequences, it is now possible to decipher

information that is encrypted within the DNA strands. Comparative genomic analysis is

emerging as a major strategy to understand genomes.

Functional sequences tend to evolve more slowly than non-functional sequences (Frazer

et al., 2002). By comparing genomic sequences it is therefore possible to identify

conserved, functional sequences (coding and non-coding) against non-conserved, non-

functional background noise. The depth of comparative analysis depends on the

evolutionary distance between sequences in comparison. For instance, human-mouse

(75 million years) comparisons revealed many conserved coding and non-coding

regions, but it was not possible to discriminate which non-coding conserved regions are


indeed functional (Waterston et al., 2002). When more species are included in

comparisons (e.g. human, mouse and cow), non-coding sequences conserved in all

species are more likely to be functional. At the other extreme are comparisons between

human and fish (450 million years). These will primarily reveal conserved coding

sequences, but conserved regulatory sequences could be also found.

Computational tools have been developed to enable comparison and analysis of

genomic sequences (Frazer et al., 2003). There are two basic types of programs for

alignment of long genomic sequences: global and local. Global alignments are designed

to produce an optimal similarity score over the entire lengths of sequences compared. I

used the global alignment tool VISTA in my work (Mayor et al., 2000; Chapter 3). The

VISTA server implements the AVID algorithm, which first finds maximal exact

matches between two sequences using a suffix tree, and then identifies the best anchor

points based on the length of the exact matches and the similarity of their flanking

regions. Local alignments, on the other hand, are computed to produce optimal

similarity scores between the subregions of sequences. I used the PipMaker program for

local alignments in my analyses (Schwartz et al., 2000; Chapter 3). The underlying

algorithm BLASTZ is a gapped BLAST program that starts by finding short, exact

matches and then extends those matches to alignments that include gaps.
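
The seed-and-extend idea shared by these programs can be sketched as follows. This is a deliberately simplified illustration, not the AVID or BLASTZ implementation: it finds seeds with a k-mer index rather than a suffix tree, and it extends them without gaps into maximal exact matches. The function name, the seed length `k` and the toy sequences are assumptions.

```python
from collections import defaultdict


def maximal_exact_matches(a: str, b: str, k: int = 4):
    """Return sorted (start_a, start_b, length) for exact matches of length >= k.

    Seeds are shared k-mers; each seed is extended in both directions,
    without gaps, until the sequences disagree or an end is reached.
    """
    # Index every k-mer of the first sequence by its start position.
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)

    matches = set()
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            start_a, start_b = i, j
            # Extend the seed to the left while the bases agree.
            while start_a > 0 and start_b > 0 and a[start_a - 1] == b[start_b - 1]:
                start_a -= 1
                start_b -= 1
            end_a, end_b = i + k, j + k
            # Extend the seed to the right while the bases agree.
            while end_a < len(a) and end_b < len(b) and a[end_a] == b[end_b]:
                end_a += 1
                end_b += 1
            # Seeds inside the same match collapse to one entry in the set.
            matches.add((start_a, start_b, end_a - start_a))
    return sorted(matches)


print(maximal_exact_matches("TTACGTACGGA", "CCACGTACGTT"))
```

A global aligner would then choose a consistent chain of such matches as anchors, whereas a local aligner would extend each match into a gapped alignment of its own.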

Kellis et al. (2003) compared genomes of four yeast species (S. cerevisiae, S.

paradoxus, S. mikatae, S. bayanus) that diverged over 5-20 million years. This

comparative genomic analysis allowed gene identification and determination of gene

structure. Gene regulatory elements in genes were also found. Furthermore, genes and

genome regions that exhibit fast or slow evolutionary changes were identified.

Thomas et al. (2003) compared a 1.8 Mb region of human chromosome 7 harbouring

10 genes with its orthologous genomic regions from 12 species. Human, chimpanzee,

baboon, cat, dog, cow, pig, rat, mouse, chicken, Fugu, Tetraodon and zebrafish,

spanning 450 million years of evolution, were included in this analysis. These

sequences showed conservation that reflected both functional constraints and neutral

sequence entropy. The small genomic regions (average 58 bp) conserved across these


sequences called multi-species conserved sequences (MCS) are candidate regions for

functional roles. About 2% of MCSs comprise ancestral repeats and 32% represent

coding sequences or UTRs. The remaining 68% of MCSs are outside known exons, and

almost none correspond to currently known regulatory elements. Many of the conserved

non-coding genomic sequences identified by this strategy were previously not

detectable in pairwise sequence comparisons. The human-fish comparisons detected

conservation largely confined to coding sequences, but almost a third of human coding

exons did not align with fish. Eliminating chimp and baboon did not affect the

specificity of the MCS detection, but eliminating non-human primates, chicken and fish

reduced the MCS number by 17%. Chicken sequence alone detected 40% of MCS bases

(94% of the coding but only 29% of the non-coding sequences).
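
To illustrate the general idea of picking out short conserved blocks from a multi-species alignment, the sketch below reports runs of alignment columns that are identical in every sequence. This is a much cruder criterion than the scoring used by Thomas et al. (2003), and the function name, the `min_length` threshold and the toy sequences are all assumptions.

```python
def conserved_runs(aligned, min_length=5):
    """Return half-open (start, end) column ranges identical in every sequence.

    `aligned` is a list of equal-length aligned sequences; '-' marks a gap,
    and any column containing a gap is treated as not conserved.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be non-empty and of equal aligned length")

    runs, start = [], None
    for col, column in enumerate(zip(*aligned)):
        identical = len(set(column)) == 1 and column[0] != "-"
        if identical and start is None:
            start = col                      # a conserved run begins
        elif not identical and start is not None:
            if col - start >= min_length:    # keep only sufficiently long runs
                runs.append((start, col))
            start = None
    # Close a run that reaches the end of the alignment.
    if start is not None and len(aligned[0]) - start >= min_length:
        runs.append((start, len(aligned[0])))
    return runs


toy_alignment = ["ACGTACGTTT", "ACGTACGATT", "ACGTACG-TT"]
print(conserved_runs(toy_alignment, min_length=5))
```

Adding more sequences to the alignment can only shorten or remove such runs, which is one way to see why multi-species comparison filters out regions that are similar in a single pair by chance.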

I used the public genomic data for mammals (human, mouse, rat) and fish (zebrafish,

Fugu, Tetraodon) as a basis for gene discovery and comparative genomic analysis by

which I determined evolutionary trajectories of the PRNP and SPRN genes (Chapter 5).

2.8 The Tammar Wallaby: an Alternative Mammalian Experimental Model and Kangaroo Genome Project

The number of vertebrate genomic sequences available limits the depth of comparative

genomic analysis. O’Brien et al. (2001) discussed current limitations for comparative

genomics and listed mammalian species that are a priority for sequencing. The criteria

for sequencing priority include phylogeny, relevance to understanding human biology

or medicine, economic importance, genomic characteristics, developmental features and

species diversity among mammalian orders. Of 4600-4800 mammal species, all but 270

are eutherian (“placental”) mammals. The eighteen eutherian orders cluster into four

principal clades. Human, mouse and rat all cluster in clade III. There is therefore a

need to sequence representatives from the other three clades. Livestock, together with

cat and dog, cluster in clade IV. Representatives of the remaining clade II

(sloths, anteaters, armadillos) and clade I (Afrotheria) should also be considered, as well

as marsupials and monotremes.


Graves and Westerman (2002) presented the case for a Kangaroo Genome Project.

Marsupials, found only in Australasia and the Americas, are mammals since they bear

fur and suckle their young with milk. Yet, independent evolution over 180 million years

of separation from their eutherian relatives has sculpted different (but not inferior)

mammals with quite distinct characteristics.

Three distantly related marsupial species have been of major experimental interest:

tammar wallaby (Macropus eugenii), fat-tailed dunnart (Sminthopsis crassicaudata) and

Brazilian opossum (Monodelphis domestica) (Graves and Westerman, 2002).

All mammals (Figure 2.5) are equally related to birds and reptiles (about 310 million

years of separation), and fish (roughly 450 million years). Marsupials (Metatheria) and

Eutheria diverged about 180 million years ago. These therian mammals diverged from

the egg-laying monotremes (Prototheria) roughly 210 million years ago.

Early marsupials radiated in the Americas more than 65 million years ago, and during

the time of the supercontinent Gondwana they colonized Antarctica and Australia. After

separation of the Americas and Australia 38-84 million years ago, Australian marsupials

evolved separately. The oldest fossils found in Australia are dated 55 million years ago.

The evolutionary distance between tammar wallaby (Australia) and Brazilian opossum

(South America) mirrors that of human and mouse (75 million years).

The marsupial genome is roughly the same size as eutherian genomes, but it is usually

divided into fewer, larger chromosomes. A basal 2n=14 karyotype found in all

marsupial superfamilies represents the ancestral diploid marsupial karyotype. The

diploid karyotype of tammar wallaby contains 16 chromosomes, and the diploid

Brazilian opossum karyotype has 18 chromosomes.

Comparative gene mapping has been used to study the relationships between the

mammalian genomes. The experiments comparing human and other eutherians showed,

for instance, that the X chromosome content is mainly conserved among all eutherian

mammals. However, most genes on the short arm of human X are autosomal


Figure 2.5: Evolutionary relationship among vertebrates (Graves and Westerman, 2002). My, million years.


(chromosome 5) in tammar wallaby, as well as in monotremes, implying that they were

added onto the eutherian X after divergence of marsupials. The relatively recent

evolutionary origin of this region could explain why a large number of genes on

the short arm of the human X escape X inactivation. The marsupial X is subject to

inactivation, but the mechanism seems to be simpler than that in eutherians, and may be

ancestral. Several genes involved in eutherian sex determination have been isolated and

analysed in marsupials.

The depth of comparative genomics depends on the richness and evolutionary span

across the species being compared. As marsupials on the evolutionary scale fill the huge

gap between eutherians (which radiated roughly 105 million years ago) and the bird/reptile

branch (which diverged about 310 million years ago), this lineage makes a logical

choice for sequencing. Being at this mid evolutionary distance from human, the highest

promise for such an alternative mammalian experimental system is in identification of

conserved genes and of conserved regulatory sequences.

The potential of such analyses of the kangaroo genome was discussed by Wakefield

and Graves (2003). Sequencing of the kangaroo genome will provide a new dimension

to comparative genomic analysis, as inferred from the contribution of the Australian

tammar wallaby (Macropus eugenii) to biology, genetics and genomics.

Comparison of the XPCT gene between human, mouse and tammar wallaby suggested a

high ratio of conservation signal to random noise. This reduced noise level could be

particularly useful for identification of gene regulatory regions.

The kangaroo genome project (Figure 2.6; http://kangaroo.genome.org.au) is an

international project to achieve draft-quality sequencing of the tammar wallaby genome.

The project includes mapping of the tammar genome, sequencing of DNA and analysis

of gene expression. Initial funding for the project was approved in March 2004.

I outline some major discoveries that have emerged from the mammalian-wide

comparisons, arguing in favour of the kangaroo genome project.


Figure 2.6: Kangaroo genome project logo.


2.8.1 The Mammalian Testis-Determining Gene

A testis-determining gene is encoded by the Y chromosome in mammals. This gene

defines maleness, inducing the development of testis from the undifferentiated gonad.

The Y-borne zinc-finger (ZFY) gene was an early candidate for the testis-determining

gene. It maps to the eutherian Y and also has a homologue (ZFX) on the short arm of

the human X. Using human ZFY as a probe to hybridise to tammar wallaby and

fat-tailed dunnart chromosome spreads, Sinclair et al. (1988) surprisingly found that it

mapped to neither Y nor X. In marsupials, ZFY is autosomal, indicating that it is not

the primary mammalian sex-determining gene.

2.8.2 Discovery of New Human Genes

It was proposed that there are two classes of Y-chromosome associated genes: single

copy genes present on both Y and X and widely expressed, and multicopy Y-specific

genes expressed in testis. It was thought that one such testis-specific gene was the

human RBMY (for RNA-binding motif gene, Y chromosome). RBMY genes were

reported to have no X homologue in eutherians. However, Delbridge et al. (1999) first

found that RBMY has a homologue on the marsupial X and subsequently

demonstrated a homologue on the human X chromosome as well, by cloning, sequencing and fluorescent

in situ hybridisation. Thus the new human locus, RBMX, was found after comparison

between marsupials and human. This human gene is now being investigated for a role in

mental retardation, since its position on the human X falls within a deletion interval

containing several X-linked mental retardation genes.

2.8.3 Detection of Regulatory Elements

Chapman et al. (2003) used marsupial sequence for phylogenetic footprinting (Chapter

6.6). A BAC clone from stripe-faced dunnart (Sminthopsis macroura) was isolated

harbouring the lymphoblastic leukemia-1 (LYL1) gene. LYL1 is a member of the stem-

cell leukemia gene family identified on the basis of translocations in T cell acute


leukemia. By aligning the LYL1 promoter between human, mouse and dunnart,

Chapman et al. found conserved putative transcription factor-binding sites.

I therefore isolated and characterized the prion protein gene from tammar wallaby. In

comparative genomic analysis that included also the PRNPs from four eutherian species

(human, mouse, bovine, ovine), I identified mammalian-wide conserved gene regions

and potential regulatory elements. I discussed these findings with respect to current

hypotheses about the function of PRNP (Chapter 6). This study showed the utility of

marsupial sequence in the analysis of a human disease-related gene.

2.9 The Present Study

The original aim of this study was to analyse the evolution and function of the prion protein

gene. Elucidation of its normal function is essential for better understanding of its role

in prion diseases, and for development of strategies for therapy and prevention of prion

diseases.

This project gained another dimension when I discovered the new human SPRN gene and

defined a new family of vertebrate Shadoo proteins (Chapter 4).

I then analysed the evolution of the PRNP and SPRN genes and showed different evolutionary

trajectories for these two mammalian genes. The more conserved evolution of the SPRN

gene indicates that it has a more prominent, and perhaps more important, function than

PRNP, suggesting that it could substitute for the loss of PRNP in knock-out mice

(Chapter 5).

Finally, PRNP gene comparisons across the eutherian-marsupial distance enabled me to

identify conserved gene regions that represent potential regulatory elements. I compared this

information with the hypotheses on the normal function of PRNP and concluded that my

analysis best supports the signal transduction hypothesis (Chapter 6).
