Chapter 5 Evolution of PRNP and SPRN
5.1 Introduction
What are the evolutionary trajectories of the SPRN and PRNP genes?
I analysed the evolution of the PRNP and SPRN genes by comparing mammals and fish.
This analysis was based on the genomic sequence information freely available in public
databases. In addition to human, I used genomic sequences from mouse, rat, Fugu rubripes,
Tetraodon nigroviridis and zebrafish for the vertebrate-wide cross-species comparisons
(Chapter 3.1.4).
I compared the genomic regions harbouring mammalian PRNP and its fish homologues,
and those harbouring mammalian and fish SPRN genes, together with their adjacent
genes. To assess orthology between fish and mammalian genes, I used both homology
and non-homology criteria (Eisen and Wu 2002; Gilligan et al. 2002): in addition to
assessing gene similarity (homology criterion), I tested whether local rearrangements
had occurred in the genomic regions (non-homology criterion).
Further, I analysed the features of the mammalian PRNP, PRND, PRNT, and SPRN genes in
detail. Using the human, mouse, and Fugu SPRN sequences, I performed phylogenetic
footprinting to define conserved regions that are potential regulatory elements.
In public databases, I found novel fish genes related to PrP: stPrP-2 in Tetraodon
and stPrP-3 in zebrafish. I cloned the Tetraodon SPRN ORF encoding Sho. In Fugu,
Tetraodon, carp, and zebrafish, I also identified in silico a paralogue of SPRN (SPRNB)
encoding the related protein Shadoo2 (Sho2).
Dr. Lars Jermiin (University of Sydney, Australia) conducted the phylogenetic analysis of
the vertebrate PrP and Sho protein families. Dr. Jill Gready and Prof. Jenny Graves
constructed a model to rationalize the evolution of the vertebrate PRNP and SPRN gene
families.
This analysis indicates that the SPRN and PRNP genes followed different evolutionary pathways.
5.2 Discovery of SPRNB in Zebrafish, Fugu, Tetraodon, and Carp
Using the zebrafish sprn ORF sequence as the search query and the BLASTN program
available from the Ensembl (v14.2.1) interactive web service, I discovered the zebrafish
genomic contig ctg10456, which contains the Sho2 coding sequence. I found the Fugu
genomic sequence containing SPRNB (scaffold_96) by using the zebrafish sprnb coding
sequence as the search query and the BLASTN program available from the Ensembl
(v12.2.1) web genome browser. I identified the Tetraodon genomic contig
(FS_CONTIG_41464_1) containing the SPRNB coding sequence by using the Fugu SPRNB
ORF sequence and the BLASTN search tool of the Genoscope database. The EST
(CA964511) containing the ORF of carp SPRNB was detected by a BLASTN search
of the NCBI est_others database, using the zebrafish sprnb coding sequence as the search
query.
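The searches above were run through the Ensembl, Genoscope and NCBI web interfaces. For anyone repeating them with a local BLASTN installation, a minimal sketch of picking the best-scoring subject per query is shown below. It assumes the standard 12-column tabular output of BLAST+ (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the function names and the example hit lines are hypothetical, not results from this study.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def parse_outfmt6(lines: List[str]) -> List[Hit]:
    """Parse BLAST tabular (-outfmt 6) lines with the default 12 columns."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), float(f[10]), float(f[11])))
    return hits


def best_hit_per_query(hits: List[Hit]) -> Dict[str, Hit]:
    """Keep the highest-bitscore hit per query (ties broken by lower E-value)."""
    best: Dict[str, Hit] = {}
    for h in hits:
        cur = best.get(h.query)
        if cur is None or (h.bitscore, -h.evalue) > (cur.bitscore, -cur.evalue):
            best[h.query] = h
    return best


# Hypothetical tabular output: one query against three genomic contigs.
example = [
    "query1\tcontigA\t82.5\t400\t70\t0\t1\t400\t1001\t1400\t1e-80\t305.0",
    "query1\tcontigB\t75.0\t120\t30\t0\t50\t169\t200\t319\t2e-10\t65.3",
    "query1\tcontigC\t90.1\t150\t15\t0\t10\t159\t500\t649\t3e-40\t170.0",
]
print(best_hit_per_query(parse_outfmt6(example))["query1"].subject)
```

Ranking by bit score rather than percent identity favours long, moderately similar matches (such as a whole ORF against a divergent fish orthologue) over short, near-identical fragments.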
5.3 Discovery of New Fish stPrP- and PrP-like-Coding Genes
5.3.1 Discovery of Tetraodon PrP-like and stPrP-2
I identified the genomic contig FS_CONTIG_4238_2 harbouring the Tetraodon PrP-like-
and stPrP-2-coding genes by using the sequence of the Tetraodon PrP-like ORF (Suzuki
et al. 2002) as the search query and the BLASTN program provided by the Genoscope web
service.
5.3.2 Discovery of Zebrafish stPrP-3 and stPrP-1
To identify the zebrafish stPrP-1-coding gene, I used the nucleotide sequence of the Fugu
stPrP-1 ORF as the search query and the BLASTN web service of the Ensembl zebrafish
interactive genome database (v22.3b.1). I detected the coding sequence of the gene on
contig ctg30140. The genomic sequence containing the zebrafish stPrP-3 gene (assembly_234,
NA3274.1, v14.2.1) was identified by using the Fugu stPrP-2 coding sequence as the search
query and the BLASTN web service of the Ensembl zebrafish genome database, as
above.
5.4 Detection of Genomic Contexts
5.4.1 Detection of PRNP Genomic Context in Human, Mouse and Rat
The PRNP genes in human (chr20p13: 4614996 - 4630236 bp), mouse (chr2F3:
132911892 - 132940089 bp), and rat (chr3Q36: 112889678 - 112890442 bp) were
found by keyword search of the Ensembl human (v12.31.1), mouse (v12.3.1), and rat
(v11.2.1) genome databases, respectively. The local genomic environment and the
annotation of the genomic sequence are also available from the interactive web genome
browser.
5.4.2 Detection of Genomic Contexts of Fugu stPrP-1 and stPrP-2
I identified the stPrP-1- and stPrP-2-coding gene sequences and their local genomic contexts
in the Fugu interactive Ensembl genome browser (v12.2.1) by using the sequences of
their ORFs (AY141106 and AY188583, respectively; NCBI) as search queries and the
server’s BLASTN (Altschul et al. 1990) search tool. The local genomic environment
and the annotation of the genomic sequence were available from this web service. The genes
encoding stPrP-1 and stPrP-2 are located on genomic scaffold_96 and scaffold_155,
respectively.
5.4.3 Detection of Genomic Context of Zebrafish stPrP-1 and stPrP-3
The local genomic environments of both genes were available from the interactive zebrafish
Ensembl web genome browser.
5.4.4 Detection and Assembly of Genomic Context of Tetraodon PrP-like and stPrP-2
By using the terminal 200 bp of FS_CONTIG_4238_2 harbouring the PrP-like- and
stPrP-2-coding genes as search query and local BLASTN tool I identified in the
Genoscope database overlapping genomic clone FS_CONTIG_4238_1. In addition, I
found two more overlapping clones (FS_CONTIG_24895_1 and
FS_CONTIG_31286_1) by using the same strategy. Sequences of these contigs were
merged into a virtual contig (Tetraodon virtual contig 1) of length 22249 bp. I verified
this assembly using the PiPMaker program (Chapter 3.1.4) and alignment to the
orthologous Fugu genomic sequence (Chapter 5.5.3). The sequence was also annotated
by using the NIX interactive web tool (Chapter 3.1.2).
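The contig walk described above (query with the terminal 200 bp, find an overlapping clone, merge) can be illustrated with a toy merge of ordered sequences at their longest exact suffix-prefix overlap. This is a minimal sketch under simplifying assumptions: real clone ends can carry sequencing errors, which is why the assembly was checked by alignment to Fugu rather than trusting exact overlaps. The function names, the minimum overlap and the toy sequences are hypothetical.

```python
def terminal_query(seq: str, n: int = 200) -> str:
    """Return the last n bases of a contig, used as the next search query."""
    return seq[-n:]


def merge_by_overlap(left: str, right: str, min_overlap: int = 20) -> str:
    """Join two sequences at the longest exact suffix(left)/prefix(right) overlap.

    Raises ValueError if no overlap of at least min_overlap bases exists.
    """
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:  # longest overlap is tried first
            return left + right[k:]
    raise ValueError("no sufficient overlap")


def walk(contigs: list, min_overlap: int = 20) -> str:
    """Merge an ordered list of overlapping contigs into one virtual contig."""
    virtual = contigs[0]
    for nxt in contigs[1:]:
        virtual = merge_by_overlap(virtual, nxt, min_overlap)
    return virtual


a, b, c = "ACGTACGGATCC", "GATCCTTAG", "TTAGCCA"  # toy clones, already ordered
print(walk([a, b, c], min_overlap=4))
```

The toy call merges the three clones into a single 19-bp sequence; with real contigs the same loop would run with a much larger minimum overlap.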
5.4.5 Detection of Genomic Context of SPRN and SPRNB
Local genomic contexts of the SPRN gene (human, mouse, rat, Fugu, zebrafish) and
SPRNB gene (Fugu, zebrafish) were immediately evident in the Ensembl interactive
genome browsers.
5.4.6 Detection and Assembly of Genomic Context of Tetraodon SPRN
I used the terminal 200 bp of the genomic contig FS_CONTIG_4144_1, which harbours
the SPRN gene, as the search query and the BLASTN program on the Genoscope website
to identify the overlapping clone FS_CONTIG_31029_1. Using the same strategy, I
identified the next overlapping clone, FS_CONTIG_37429_1. I assembled these sequences
into a virtual contig of 19029 bp (Tetraodon virtual contig 2; I used 10 kb of sequence
containing the SPRN gene from this virtual contig in further analyses). I verified this
assembly by using the PiPMaker program (Chapter 3.1.4) to align it to the orthologous
Fugu genomic sequence (Chapter 5.5.3), and I annotated it by using the NIX interactive
web tool (Chapter 3.1.2).
5.5 Comparative Genomic Analysis
Sequence comparison is a multistep process (Frazer et al. 2003). In order to align the
genomic sequences and identify conserved regions, I first found genes of interest in
public genomic sequences from mammals and fish (Chapters 2.6 and 4.2), and defined
the local genomic contexts.
5.5.1 Genomic Sequences Containing PRNP in Mammals and Homologous Genes in Fish
In Figure 5.1 I present the order of genes and their relative orientation in the local
genomic regions containing the mammalian PRNP gene and fish genes encoding stPrP-
1, stPrP-2, stPrP-3, and PrP-like proteins. In mammals, the genes adjacent to PRNP and
PRND are RASSF2 and SLC23A1. The PRNT gene is present in humans but not rodents
(Figure 5.1A-B; see also below). In pufferfish, the genes encoding stPrP-2 and PrP-like
are also adjacent to the Rassf2- and Slc23a1-coding genes (Figure 5.1C).
In contrast, the stPrP-1-coding genes from Fugu and zebrafish are located in different
genomic environments (Figure 5.1D-E). In the Fugu genome, stPrP-1 is flanked distally
by the TA-PPC2 (T-cell activation protein phosphatase 2C) and EPI-64 (epi64 protein)
genes. Proximally, it is arranged head-to-head with SPRNB, a paralogue of SPRN.
SPRNB, in turn, is flanked by KCNIP3 (calsenilin). In the zebrafish genome, stPrP-1 is
also adjacent to sprnb in head-to-head orientation. However, these two genes are flanked
by genes that differ from those in Fugu, rassf2 (encoding Rassf2) proximally and slk
(encoding STE20-like kinase) distally, indicating a different genomic environment. The
proximity of stPrP-1 and SPRNB in fish is suggestive of an evolutionary relationship
between the two genes. Further, the presence of the Rassf2-coding gene adjacent to
sprnb is of interest, as the RASSF2 homologue is also adjacent to mammalian
PRND/PRNT.
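"Head-to-head" here means two adjacent genes on opposite strands whose 5' ends face each other across the intergenic region. A minimal sketch of classifying the relative orientation of two adjacent genes from annotation coordinates follows; the `Gene` record and the example coordinates are hypothetical and do not correspond to an Ensembl API type or to real gene positions.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # lower genomic coordinate
    end: int     # higher genomic coordinate
    strand: str  # "+" or "-"


def relative_orientation(g1: Gene, g2: Gene) -> str:
    """Classify two adjacent, non-overlapping genes by transcriptional orientation."""
    if g1.start > g2.start:
        g1, g2 = g2, g1  # order the pair by genomic position
    if g1.strand == g2.strand:
        return "tandem (head-to-tail)"
    if g1.strand == "-" and g2.strand == "+":
        return "head-to-head (divergent)"  # 5' ends face each other
    return "tail-to-tail (convergent)"     # 3' ends face each other


left = Gene("geneA", 1000, 3000, "-")   # hypothetical coordinates
right = Gene("geneB", 5000, 7000, "+")
print(relative_orientation(left, right))
```

Because the function first orders the pair by position, the result does not depend on the order in which the two genes are passed.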
Figure 5.1: Overview of the genomic contexts of the PRNP gene in mammals, and of the stPrP-1-, stPrP-2-, stPrP-3- and PrP-like-coding genes in fish. The figure is approximately to scale, as shown by the rulers. PRND, doppel gene; PRNT, PRNT gene; RASSF2, Ras association domain family 2 gene; SLC23A1, solute carrier family 23, member 1 gene; KCNIP3, calsenilin gene; SPRNB, Shadoo2 gene; stPrP-1, stPrP-1 gene; TA-PPC2, T-cell activation protein phosphatase 2C gene; EPI-64, Epi64 protein gene; rassf2, Ras association domain family 2 gene; slk, STE20-like kinase gene. For the ruler under B, gene sizes and intergenic distances refer to mouse; for the rulers under C and D, gene sizes and intergenic distances refer to Fugu. Genomic coordinates roughly correspond to those used for the cross-species VISTA analysis (Chapter 3.1.4).
I found another stPrP-like gene (stPrP-3) in zebrafish in the genomic contig NA3274.1
(Ensembl). Its proximal flanking gene is unknown, while its distal flanking genes code
for green cone photoreceptor (gcp) and annexin a6 (anxa6) (not shown).
5.5.2 Genomic Sequences Containing SPRN in Mammals and Fish
The genomic environment of the SPRN gene (Chapter 4.6), updated with the Tetraodon
data, is summarized in Figure 5.2. In both mammals and the two pufferfish, the genes
adjacent to SPRN are those encoding the GTP-binding protein (GTP) and amine oxidase
(AO); in zebrafish, the most proximal adjacent gene is the long-chain fatty-acyl
elongase-coding gene (fae) rather than the AO.
5.5.3 Annotation of Tetraodon Genomic Sequences Containing stPrP-2 and SPRN
To verify the assembly of Tetraodon virtual contig 1, which contains the stPrP-2-coding gene
and its neighbouring genes (Chapter 5.4.4), and to assess its validity for the comparative
genomic analysis, I aligned it with its orthologous Fugu genomic sequence using the
PiPMaker program (Chapter 3.1.4), which is able to compare both complete and
incomplete sequences. The PiP plot of the Fugu and Tetraodon genomic sequences is
given in Figure 5.3A, and its dot plot in Figure 5.3B. Among the four genes in this
genomic fragment, the exon-intron structure is known from comparison of the genomic
and cDNA sequences only for the PrP-like-coding gene. The single-exon ORF of the
stPrP-2-coding gene and GenScan predictions for Rassf2- and Slc23a1-coding gene
exons are also shown. The PiP plot indicates high conservation in predicted exons for
all genes (above 70% identity).
Second, using the PiPMaker program (Chapter 3.1.4) I aligned 10 kb of the Tetraodon
virtual contig 2, containing SPRN and adjacent genes, with its orthologous Fugu
sequence. The PiP plot is shown in Figure 5.3C, and the dot plot is in Figure 5.3D. The
GenScan exon predictions for the single-exon SPRN ORF and for genes encoding amine
oxidase and GTP-binding protein are shown. The PiP plot shows high conservation in the
exons of all three genes (above 70% identity).
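The "above 70% identity" statements refer to the percent identity shown in the PiP plots. A minimal sketch of that calculation, assuming (as in PiP plots) that identity is reported separately for each gap-free segment of a pairwise alignment, and that the alignment is available as two equal-length strings with "-" for gaps; the function name and the toy alignment are hypothetical, not Fugu or Tetraodon sequence.

```python
from typing import List, Tuple


def gap_free_segments(a: str, b: str) -> List[Tuple[int, int, float]]:
    """Split a pairwise alignment into gap-free blocks.

    Returns (start, end, percent identity) per block; start/end are 0-based
    alignment column indices, end exclusive.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    segments, start = [], None
    for i in range(len(a) + 1):
        gap = i == len(a) or a[i] == "-" or b[i] == "-"
        if not gap and start is None:
            start = i  # a new gap-free block begins
        elif gap and start is not None:
            same = sum(x == y for x, y in zip(a[start:i], b[start:i]))
            segments.append((start, i, 100.0 * same / (i - start)))
            start = None
    return segments


aligned_a = "ACGTACGT--ACGT"  # toy alignment
aligned_b = "ACGAACGTTTACCT"
for start, end, pid in gap_free_segments(aligned_a, aligned_b):
    print(start, end, f"{pid:.1f}%", "above 70%" if pid > 70 else "")
```

Gap columns end a segment instead of being scored, so insertions lower the number of plotted segments rather than their identity.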
Figure 5.2: Overview of the local genomic contexts of the SPRN gene in mammals and fish. AO, Amine oxidase gene; GTP, GTP-binding protein gene; FAE, long-chain fatty-acyl elongase gene. Figure is approximately to scale as shown by rulers. For ruler under D., gene sizes and intergenic distances refer to Fugu. Genomic coordinates roughly correspond to those used for the cross-species Vista analysis (Chapter 3.1.4).
Figure 5.3: PipMaker percent identity (PiP) and dot plots of the genomic fragments containing the stPrP-2- and PrP-like-coding genes and the SPRN gene in Fugu and Tetraodon. (A) PiP plot of the region containing the stPrP-2- and PrP-like-coding genes; (B) corresponding dot plot. (C) PiP plot of the region containing SPRN; (D) corresponding dot plot. The Fugu sequence is shown along the x-axis in A-D. Percentage identity (50-100%) is shown on the y-axis in A and C, and the Tetraodon sequence on the y-axis in B and D. The location of exons and the directionality of genes are shown as black (coding) and grey (UTR) boxes, and horizontal arrows, respectively. Exons are numbered. Short dark grey and white boxes denote CpG islands with a ratio of at least 0.75 and of 0.6-0.75, respectively. RASSF2, Ras association domain family 2 gene; SLC23A1, solute carrier family 23, member 1 gene; AO, amine oxidase gene; GTP, GTP-binding protein gene.
5.5.4 Annotation of PRNP and SPRN
Annotation of genomic DNA sequence comprises analysis of gene order and relative
transcriptional orientation, gene structure, gene density, distribution of repeat elements,
and distribution of CpG islands. For the human and mouse genomes, this information
could be compiled from the interactive Ensembl genome browsers. However, the rat,
Fugu and zebrafish genome annotations (Ensembl) are much less comprehensive
because the number of transcript libraries from these species is limited. Further, the
depth of fish transposable element analysis is less than that of primates and rodents
(Aparicio et al. 2002). In the following section, I annotate the human and mouse PRNP,
PRND, PRNT and SPRN genes, and their local genomic contexts. Analyses of the rat
genes and genomic regions were included where possible.
5.5.4.1 Gene Structure, Gene Features, Gene Density and CpG Islands
I found that gene density and GC content are higher in the SPRN genomic environment
than in the PRNP environment (the genomic environments described in this chapter
correspond to those presented in Figures 5.1 and 5.2 but also include the proximal
intergenic sequences of PRNP and SPRN). The human SPRN genomic context contains
three genes in 51425 bp and is 50.66% GC, compared with five genes in 380074 bp and
45.02% GC for the human PRNP local genomic environment. The same holds for rodents:
the mouse SPRN genomic environment contains three genes in 36116 bp (GC level:
48.36%) and the rat SPRN environment three genes in 34399 bp (GC level: 47.69%),
compared with four genes in the mouse PRNP local genomic context of 241524 bp (GC
level: 45.70%) and in the rat PRNP local genomic context of 218689 bp (GC level:
46.22%).
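The comparison above can be restated as genes per 100 kb. The short sketch below does this arithmetic with the gene counts, region lengths and GC levels given in the text, and adds a helper for computing GC content from a raw sequence; treating N and other ambiguity codes as excluded from the denominator is an assumption of the helper, not necessarily how the reported GC levels were obtained.

```python
# Gene counts, region lengths (bp) and GC levels (%) as reported in the text.
regions = {
    ("human", "SPRN"): (3, 51425, 50.66),
    ("human", "PRNP"): (5, 380074, 45.02),
    ("mouse", "SPRN"): (3, 36116, 48.36),
    ("mouse", "PRNP"): (4, 241524, 45.70),
    ("rat", "SPRN"): (3, 34399, 47.69),
    ("rat", "PRNP"): (4, 218689, 46.22),
}


def genes_per_100kb(n_genes: int, length_bp: int) -> float:
    """Gene density normalised to 100 kb of genomic sequence."""
    return n_genes * 100_000 / length_bp


def gc_percent(seq: str) -> float:
    """GC content of a raw sequence, ignoring N and other ambiguity codes."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


for (species, gene), (n, length, gc) in regions.items():
    density = genes_per_100kb(n, length)
    print(f"{species:5s} {gene}: {density:.2f} genes/100 kb, GC {gc}%")
```

For human this gives about 5.8 genes per 100 kb around SPRN versus 1.3 around PRNP; in mouse and rat the SPRN regions are likewise roughly five times denser than the PRNP regions.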
Gene structure, gene size, GC content and features of exons and introns of the human
and mouse PRNP, PRND, PRNT, and SPRN genes are summarized in Table 5.1.
Table 5.1: Features of the PRNP, PRND, and SPRN genes in human and mouse, and of the PRNT gene in human
Species | Gene / feature | Length (bp) | GC (%)
Human | PRNP (a) | 15241 | 41.8
 | Exon 1 | 159 | 77.99
 | Intron | 12697 | 40.89
 | Exon 2* | 2380 | 44.16
Human | PRND (b) | 6549 | 46.37
 | Exon 1 | 60 | 65
 | Intron 1 | 2528 | 47.31
 | Exon 2 | 611 | 60.72
 | Intron 2 | 1510 | 43.44
 | Exon 3* | 1840 | 42.12
Human | PRNT (c) | 9387 | 43.26
 | Exon 1 | 69 | 59.42
 | Intron | 7514 | 42.09
 | Exon 2* | 1804 | 47.51
Human | SPRN (d) | 3913 | 66.06
 | Exon 1 | 101 | 78.22
 | Intron | 779 | 70.86
 | Exon 2* | 3033 | 64.42
Mouse | Prnp (e) | 28198 | 44.85
 | Exon 1 | 47 | 65.95
 | Intron 1 | 2191 | 44.45
 | Exon 2 | 98 | 45.92
 | Intron 2 | 23854 | 33.50
 | Exon 3* | 2008 | 49.25
Mouse | Prnd (f) | 5269 | 47.33
 | Exon 1 | 54 | 57.41
 | Intron | 2093 | 46.49
 | Exon 2 | 3122 | 47.73
Mouse | Sprn (g) | 2203 | 57.69
 | Exon 1 | 148 | 61.74
 | Intron | 876 | 56.62
 | Exon 2* | 1178 | 57.98
* denotes coding exon. Transcripts used in analysis: a, OTTHUMT00020000691 (Ensembl human v12.31.1); b, OTTHUMT00020000599 (Ensembl human v12.31.1); c, OTTHUMT00020000595 (Ensembl human v12.31.1); d, BC040198 (NCBI); e, ENSMUST00000040877 (Ensembl mouse v12.3.1); f, AF192384 (NCBI); g, C630041J07 (FANTOM).
In all four genes, PRNP, PRND, PRNT, and SPRN, the ORF is contained within a single
coding exon. Gene length correlates inversely with GC content, and GC content is
generally higher in exons than in introns. The GC content of human and mouse SPRN is
higher than that of PRNP, PRND, and PRNT.
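The inverse relation between gene length and GC content can be checked with a Spearman rank correlation on the whole-gene values in Table 5.1. A minimal sketch is shown below; it uses the no-ties formula, an assumption that holds for these values, and the variable and function names are illustrative.

```python
# Whole-gene length (bp) and GC content (%) from Table 5.1.
table_5_1 = {
    "human": {"PRNP": (15241, 41.8), "PRND": (6549, 46.37),
              "PRNT": (9387, 43.26), "SPRN": (3913, 66.06)},
    "mouse": {"Prnp": (28198, 44.85), "Prnd": (5269, 47.33),
              "Sprn": (2203, 57.69)},
}


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_rho(x, y):
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n(n^2 - 1)), without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


for species, genes in table_5_1.items():
    lengths = [v[0] for v in genes.values()]
    gcs = [v[1] for v in genes.values()]
    print(species, spearman_rho(lengths, gcs))
```

Both species give rho = -1, and pooling all seven genes gives about -0.86. With only three or four genes per species this restates the trend; it is not a significance test.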
CpG islands are genomic fragments of exceptionally high GC content and CpG-dinucleotide
frequency, typically a few hundred bp long. CpG islands are associated with the 5’ ends of
housekeeping genes and, being unmethylated, they are protected from the elevated
mutation rate of methylated CpG dinucleotides. I determined the distribution of CpG
islands in the mammalian genomic sequences by using the cpgplot program (Chapter
3.1.2). Results for the human and mouse PRNP, PRND, PRNT and SPRN genes are shown
in Table 5.2.
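To show how such islands are called, a minimal sliding-window sketch follows. The thresholds (100 bp window, observed/expected CpG ratio of at least 0.6, G+C of at least 50%, island length of at least 200 bp) are assumptions modelled on the Gardiner-Garden and Frommer criteria that cpgplot uses by default; this is an illustration, not the cpgplot implementation, and the function names are hypothetical.

```python
def window_stats(window: str):
    """Return (G+C percent, observed/expected CpG ratio) for one window.

    obs/exp = count(CG) * L / (count(C) * count(G)), L = window length.
    """
    w = window.upper()
    c, g, cg = w.count("C"), w.count("G"), w.count("CG")
    gc_pct = 100.0 * (c + g) / len(w)
    oe = cg * len(w) / (c * g) if c and g else 0.0
    return gc_pct, oe


def cpg_islands(seq: str, window: int = 100, min_oe: float = 0.6,
                min_gc: float = 50.0, min_len: int = 200):
    """Merge consecutive qualifying windows (1-bp step) into islands.

    Returns 0-based (start, end) pairs, end exclusive, for runs of at
    least min_len bp.
    """
    islands, run_start, run_end = [], None, None
    for i in range(len(seq) - window + 1):
        gc_pct, oe = window_stats(seq[i:i + window])
        if gc_pct >= min_gc and oe >= min_oe:
            if run_start is None:
                run_start = i  # first qualifying window of a new run
            run_end = i + window
        elif run_start is not None:
            if run_end - run_start >= min_len:
                islands.append((run_start, run_end))
            run_start = None
    if run_start is not None and run_end - run_start >= min_len:
        islands.append((run_start, run_end))
    return islands


toy = "AT" * 150 + "CG" * 150 + "AT" * 150  # 300-bp CpG-rich core
print(cpg_islands(toy))
```

In the toy sequence the called island extends about 50 bp beyond each side of the 300-bp CpG-rich core, because windows that are only half CpG-rich still pass the thresholds.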
5.5.4.2 Distribution of Transposable Elements in PRNP and SPRN
The RepeatMasker program (Chapter 3.1.2) detects transposable elements that inserted up
to 150-200 and 100-120 million years ago in human and mouse, respectively (Chapter
2.6). The transposon content of the PRNP, PRND, PRNT and SPRN genes is shown in
Table 5.3. Whereas SPRN contains no transposable elements, PRNP, PRND and PRNT
have accumulated repeats, which cover 9-46% of their length.
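The class columns of Table 5.3 should add up to the total, and the totals can be converted into approximate base pairs. A small consistency check is sketched below, assuming the percentages refer to the whole gene lengths listed in the table; because the percentages are rounded, the base-pair values are approximate, and the names are illustrative.

```python
# Gene length (bp) and % of gene covered by SINE, LINE, LTR and DNA elements,
# followed by the reported total %, as listed in Table 5.3.
TABLE_5_3 = {
    ("human", "PRNP"): (15241, 4.6, 40.7, 0.0, 0.9, 46.2),
    ("human", "PRND"): (6549, 18.5, 5.6, 0.0, 0.0, 24.1),
    ("human", "PRNT"): (9387, 21.4, 4.1, 3.8, 4.8, 34.1),
    ("human", "SPRN"): (3912, 0.0, 0.0, 0.0, 0.0, 0.0),
    ("mouse", "Prnp"): (28198, 6.0, 3.6, 25.3, 0.4, 35.3),
    ("mouse", "Prnd"): (5269, 5.1, 0.0, 0.0, 3.9, 9.0),
    ("mouse", "Sprn"): (2203, 0.0, 0.0, 0.0, 0.0, 0.0),
}


def repeat_bp(length_bp: int, percent: float) -> int:
    """Approximate number of bases covered by repeats."""
    return round(length_bp * percent / 100)


for (species, gene), row in TABLE_5_3.items():
    length, sine_pct, line_pct, ltr_pct, dna_pct, total = row
    # The class percentages should add up to the reported total (to rounding).
    assert abs(sine_pct + line_pct + ltr_pct + dna_pct - total) < 0.05, gene
    print(f"{species} {gene}: ~{repeat_bp(length, total)} of {length} bp in repeats")
```

For example, about 7.0 kb (7041 bp) of the 15.2-kb human PRNP gene is covered by repeats, mostly LINEs, whereas none of SPRN is.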
Analysis of the distribution of interspersed repeat elements in the local genomic
environments of the human, mouse, and rat PRNP and SPRN genes indicated no major
differences (not shown).
5.5.5 Cross-Species Comparisons
After I analysed genes and genomic sequences, I aligned the genomic sequences from
mammals and fish. Global alignments are particularly useful for detecting conserved
regions in long contiguous sequences (Frazer et al. 2003). I used the VISTA global
alignment tool (Chapter 3.1.4) for this purpose.
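What makes an alignment "global" is that both sequences are aligned end to end, rather than only their best-matching parts. The textbook Needleman-Wunsch dynamic programme sketched below illustrates this on short strings; the linear gap penalty and the scores (match +1, mismatch -1, gap -2) are assumptions. The aligner behind VISTA uses anchoring heuristics to handle sequences tens of kilobases long and is not reproduced here.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, so both sequences are fully used.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("GATTACA", "GATACA"))
```

The memory and time cost grows with the product of the two sequence lengths, which is why genome-scale global aligners first anchor the alignment on short exact matches.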
Table 5.2: CpG island distribution in the human and mouse PRNP, PRND, PRNT, and SPRN genes
Species | Gene | Length (bp) | Begin (a) | End (a)
Human | PRNP | 1144 | -207 | 937
Human | PRND | - | - | -
Human | PRNT | - | - | -
Human | SPRN | 284 | -12 | 272
Mouse | Prnp | 317 | -191 | 126
Mouse | Prnd | - | - | -
Mouse | Sprn | 195 | -1 | 194
a, Begin and end denote genomic sequence positions relative to the transcription start.
Table 5.3: Summary of transposable element content in the human and mouse PRNP, PRND, PRNT, and SPRN genes
Species | Gene | Length (bp) | SINE % | LINE % | LTR % | DNA % | Total %
Human | PRNP | 15241 | 4.6 | 40.7 | 0 | 0.9 | 46.2
Human | PRND | 6549 | 18.5 | 5.6 | 0 | 0 | 24.1
Human | PRNT | 9387 | 21.4 | 4.1 | 3.8 | 4.8 | 34.1
Human | SPRN | 3912 | 0 | 0 | 0 | 0 | 0
Mouse | Prnp | 28198 | 6 | 3.6 | 25.3 | 0.4 | 35.3
Mouse | Prnd | 5269 | 5.1 | 0 | 0 | 3.9 | 9
Mouse | Sprn | 2203 | 0 | 0 | 0 | 0 | 0
5.5.5.1 Comparisons of PRNP Genomic Region
A VISTA plot of the PRNP genomic region is shown in Figure 5.4. There is
conservation in both coding and noncoding sequences of the PRNP, PRND, RASSF2,
and SLC23A1 genes between human and rodents. Conservation is evident in the exons,
and there are a few highly conserved regions in the introns as well.
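The VISTA plots in this chapter mark sequences as conserved when they meet a cut-off of 50% identity over 50 bp. A simple sliding-window scan over an existing pairwise alignment illustrates how such a cut-off turns an alignment into conserved regions. This sketch assumes the alignment is given as two equal-length gapped strings and measures windows in alignment columns; VISTA measures windows along the reference sequence and differs in detail, so this is not its implementation, and the names are hypothetical.

```python
def conserved_regions(a: str, b: str, window: int = 50, min_pct: float = 50.0):
    """Return (start, end) column ranges covered by windows of `window`
    alignment columns whose percent identity is >= min_pct.

    Gap columns count as mismatches; end is exclusive.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    match = [1 if x == y and x != "-" else 0 for x, y in zip(a, b)]
    covered = [False] * len(a)
    s = sum(match[:window])
    for i in range(len(a) - window + 1):
        if i > 0:  # slide one column: add the new column, drop the old one
            s += match[i + window - 1] - match[i - 1]
        if 100.0 * s / window >= min_pct:
            for k in range(i, i + window):
                covered[k] = True
    regions, start = [], None
    for k, flag in enumerate(covered + [False]):
        if flag and start is None:
            start = k
        elif not flag and start is not None:
            regions.append((start, k))
            start = None
    return regions


ref = "A" * 100 + "C" * 100   # toy alignment: first 100 columns identical
other = "A" * 100 + "G" * 100
print(conserved_regions(ref, other))
```

Because every column of a qualifying window is marked, the called region in this toy case extends 25 columns past the identical block; the window size therefore limits how precisely conserved boundaries can be placed.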
The VISTA plot indicates that this conservation does not extend to the fish stPrP-2- and
PrP-like-coding genes, which do not align with the human sequence. In contrast, the
adjacent fish Rassf2- and Slc23a1-coding genes align with human RASSF2 and SLC23A1
in the coding exons. There is also some evidence that rearrangements in the local
genomic sequences occurred after the divergence of mammals and fish: whereas the
PRND gene (and the PRNT gene in human) exists in mammals but not in fish, the
PrP-like-coding gene is present in fish only.
I observed that conservation with rodents in the human PRNT gene region differs from
that of other genes shown in the plot. There is almost no conservation between rat and
human (none at all in exons), and conservation between human and mouse appears
poor. To test this observation I aligned human and mouse genomic sequence regions
between the PRND and RASSF2 genes using the PipMaker program (Chapter 3.1.4).
The results presented in Figures 5.5A and 5.5B agree with the Vista results in showing
lack of conservation of PRNT gene exons between human and mouse.
5.5.5.2 Comparisons of SPRN Genomic Region
A VISTA plot of the mammalian and fish genomic regions containing the SPRN and
adjacent genes shows conservation in all three genes (Figure 5.6). The coding exon
sequence of SPRN aligns in all five pairwise alignments.
Figure 5.4: VISTA plot showing peaks of similarity in pairwise sequence alignments between: 1, human vs. mouse; 2, human vs. rat; 3, human vs. Fugu; and 4, human vs. Tetraodon. PRNP, prion protein gene; PRND, doppel gene; PRNT, PRNT gene; RASSF2, Ras association domain family 2 gene; SLC23A1, solute carrier family 23, member 1 gene. Peaks are shown relative to their position in the reference (human) sequence (horizontal axis), and their percent identities (30-100%) are indicated on the vertical axis. For the reference sequence, the direction of gene transcription is indicated by a horizontal arrow, blue rectangles denote coding exons, and light blue rectangles indicate 5’ and 3’ untranslated regions. CNS, conserved non-coding sequence. Conservation of CNS (pink), UTR (light blue), and coding (blue) sequences meeting the experimental cut-off (50% identity over 50 bp) is indicated.
Figure 5.5: Analysis of human-mouse conservation in the genomic region of the human PRNT gene. (A) PipMaker plot of the human and mouse genomic sequence between the PRND and RASSF2 genes. The human sequence is along the horizontal axis; percentage identity (50-100%) is on the vertical axis. The location of exons and the directionality of genes are shown as black (coding) and grey (UTR) boxes, and horizontal arrows, respectively. Other icons show repeats (LINE1, grey pointed boxes; LINE2, black pointed boxes; LTR, dark grey pointed boxes; SINEs other than MIR, light grey triangles; MIR, black triangles; other repeats, dark grey triangles). Short dark grey and white boxes denote CpG islands with a ratio of at least 0.75 and of 0.6-0.75, respectively. (B) Corresponding dot plot.
Figure 5.6: VISTA plot showing peaks of similarity in pairwise sequence alignments between: 1, human vs. mouse; 2, human vs. rat; 3, human vs. Fugu; 4, human vs. Tetraodon; and 5, human vs. zebrafish. SPRN, SPRN gene; GTP, GTP-binding protein gene; AO, amine oxidase gene. Peaks are shown relative to their position in the reference (human) sequence (horizontal axis), and their percent identities (30-100%) are indicated on the vertical axis. For the reference sequence, the direction of gene transcription is indicated by a horizontal arrow, blue rectangles denote coding exons, and light blue rectangles indicate 5’ and 3’ untranslated regions. CNS, conserved non-coding sequence. Conservation of CNS (pink), UTR (light blue), and coding (blue) sequences meeting the experimental cut-off (50% identity over 50 bp) is indicated.
The coding exons of the GTP-binding protein-coding gene are generally conserved
between mammals and fish. The large gap in the alignment (~24.5-39 kb in the human
sequence) is due to the insertion of LINE elements in human only (two complete
elements in antisense orientation and two truncated human LINE/L1 elements). The
distal end of the GTP-binding protein-coding gene overlaps with the distal end of the
SPRN gene in human. There are four polyadenylation signals in the human GTP-binding
protein-coding gene, resulting in alternative transcripts that differ in the length of the
non-coding part of its 3' terminal exon. Each of the four signals differs at one position
from the canonical consensus polyadenylation signals AATAAA and ATTAAA. The first
(41227 - 41232 bp; AK095872, BC00409, BC000920 transcripts), second (41262 - 41267
bp; BC026725 transcript), and third (41321 - 41326 bp; BC035721 transcript) signals
have the sequence GTTAAA. The fourth and most distal signal of the GTP-binding
protein-coding gene, which overlaps with the 3' end of the SPRN gene, is AATCAA
(42068 - 42072 bp; cDNAs AK074976, NM_138384). The single polyadenylation signal
of SPRN has the canonical consensus sequence AATAAA (41449 - 41454 bp; BC040198
transcript).
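The classification above (canonical, or one position away from AATAAA/ATTAAA) can be reproduced with a simple hexamer scan. The sketch below assumes the sequence is given 5' to 3' on the transcribed strand (a minus-strand gene would first need reverse-complementing) and reports 1-based, inclusive coordinates in the style used in the text; the function names and the toy sequence are hypothetical.

```python
CANONICAL = ("AATAAA", "ATTAAA")


def hamming(x: str, y: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(p != q for p, q in zip(x, y))


def scan_pas(seq: str, max_mismatch: int = 1):
    """Report (start, end, hexamer, distance) for every hexamer within
    max_mismatch substitutions of a canonical polyadenylation signal.

    Coordinates are 1-based and inclusive; only the given strand is scanned.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        d = min(hamming(hexamer, c) for c in CANONICAL)
        if d <= max_mismatch:
            hits.append((i + 1, i + 6, hexamer, d))
    return hits


print(hamming("GTTAAA", "ATTAAA"), hamming("AATCAA", "AATAAA"))
print(scan_pas("CCGTTAAACC"))
```

A one-mismatch scan is permissive and reports many A-rich hexamers in real sequence, so candidate signals are only meaningful when tied to mapped cDNA 3' ends, as they are for the transcripts listed above.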
The exons of the third gene, encoding amine oxidase, are conserved between mammals
and pufferfish but not zebrafish; in the zebrafish sequence, the third gene instead encodes
long-chain fatty-acyl elongase (Figure 5.2).
5.5.6 Phylogenetic Footprinting of SPRN
Next, I searched SPRN for sequences conserved across vertebrates, which are potential
regulatory elements. I used the program Footprinter (Chapter 3.1.5), which reports sets
of conserved motifs while taking into account a phylogenetic tree relating the input
species.
I identified 16 conserved motifs upstream of the SPRN ORF, in the intron, exon 1, and
upstream promoter (Table 5.4). In human and mouse, five motifs were detected in the
upstream promoter, one in exon 1, and ten in the intron (Figure 5.7). Some motifs
are duplicated.
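The score column of Table 5.4 is a parsimony score, that is, the minimum number of substitutions a motif requires on the species tree. As an illustration only, the sketch below computes the Fitch small-parsimony count for ungapped, equal-length motif instances on the tree ((human, mouse), Fugu). It ignores gaps, motif loss and Footprinter's own motif search and scoring details, and the motif instances in the example are hypothetical, not the actual species sequences.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small-parsimony pass for one alignment column.

    tree: nested 2-tuples of leaf names; states: leaf name -> base.
    Returns (candidate state set at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one substitution


def motif_parsimony(tree: Tree, motifs: Dict[str, str]) -> int:
    """Sum Fitch costs over the columns of equal-length, ungapped motif instances."""
    length = len(next(iter(motifs.values())))
    return sum(fitch(tree, {sp: m[i] for sp, m in motifs.items()})[1]
               for i in range(length))


TREE = (("human", "mouse"), "fugu")
# Hypothetical instances: identical in all three species, then one Fugu change.
print(motif_parsimony(TREE, {"human": "TGGAGA", "mouse": "TGGAGA", "fugu": "TGGAGA"}))
print(motif_parsimony(TREE, {"human": "GAGAGCCA", "mouse": "GAGAGCCA", "fugu": "GATAGCCA"}))
```

With only three species, the cost per column is simply the number of distinct bases minus one, so the scores of 0, 1 and 2 in Table 5.4 correspond to no, one or two substitutions over the whole motif.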
Table 5.4: Conserved motifs in the human, mouse, and Fugu SPRN genes identified by phylogenetic footprinting
Motif (length in bp) a, b, c | Localization in human / mouse SPRN | Coordinates relative to ORF, human / mouse / Fugu | Score g
1. TGGAGA (6) | P / P | -1290 / -1301 / -1143 | 0
2. GCTGAG (6) | E1 / P | -890 / -1049 / -886 | 0
3. GCTGAA (6) | I / I | -658 / -574 / -776 | 0
4. TCCAGA (6) d | I / I | -466 / -434 / -303 | 0
   TCCAGA (6) | | - / -543 / - |
5. GAACCC (6) | I / I | -242 / -217 / -253 | 0
   GAACCC (6) | | - / - / -218 |
6. ATCTCC (6) | I / I | -58 / -276 / -63 | 0
7. CTTTCC (6) | I / I | -36 / -292 / -100 | 0
8. GAgAGCCA (8) | P / P | -1411 / -1479 / -1453 | 1
   AGAGCCA (7) | | - / - / -1224 |
9. TGaAACAA (8) | P / P | -1346 / -1262 / -1500 | 1
10. GGAGGCcT (8) | P / P | -933 / -1011 / -927 | 1
   GGAGGC (6) | | -1027 / - / - |
11. GGAGGcTG (8) | I / I | -782 / -826 / -780 | 1
   AGGCTG (6) | | - / -517 / -508 |
   GATGCtG (7) | | -732 / -863 / -836 |
12. CCAgCCAG (8) | I / I | -668 / -786 / -759 | 1
13. CAGGCCTaA (9) | I / I | -202 / -388 / -209 | 1
   AGGCCTGA (8) | | -304 / - / - |
   CCAGGCCT (8) | | -161 / - / - |
   GTCCTAA (7) | | - / -76 / - |
14. GcGTGCAgAG (10) e | I / I | -361 / -423 / -291 | 2
   TGCAGA (6) | | - / - / -319 |
   TGCACA (6) | | -261 / -54 / - |
15. TGgGGCTaGT (10) | P / P | -1237 / -1230 / -1351 | 2
   TGTGGCT (7) | | - / -1102 / - |
16. CCCCttCAGGCCT (13) f | I / I | -166 / -394 / -215 | 2
   CCCc/gAGGCTT (10) | | -344 / - / -560 |
   CCCCCAGGCC (10) | | -206 / - / - |
   CCCAGGC (7) | | -488, -520 / - / - |
Localization corresponds to position in the human and mouse SPRN intron (I), exon 1 (E1), and upstream promoter (P). Indented rows list motif duplications; "-", no coordinate given. a Capital letters, conserved position in motif; lower-case letters, variable position in motif. b Motif duplications were identified manually after inspection of results. c TRANSFAC matrix sequences identifying potential transcription factor-binding sites were included in the MatInspector analysis; a motif had to be detected in both the human and mouse sequences. d Motif denotes part of the V$NBRE.01 matrix sequence for Nurr1. e Motif denotes part of the V$ATF6.02 matrix sequence for Activating transcription factor 6. f Motif denotes part of the V$MAZR.01 matrix sequence for MYC-associated zinc finger protein related transcription factor. g Parsimony score of the main motif.
Figure 5.7: Potential regulatory motifs in the human SPRN and mouse Sprn genes identified by phylogenetic footprinting. Motifs 4, 14 and 16, labelled by *, denote potential Nurr1, ATF6 (Activating Transcription Factor 6) and MAZR (MYC-Associated Zinc-finger-protein-Related) transcription factor-binding sites, respectively.
I next used the MatInspector program (Chapter 3.1.5) to check whether any of the motifs
detected in human and mouse match known transcription factor-binding sites.
MatInspector predicted 155 and 159 likely transcription factor-binding sites in the human
and mouse sequences, respectively (69 and 82 in the intron, 11 and 14 in exon 1, and 75
and 62 in the upstream promoter; not shown).
However, only three of the motifs detected by phylogenetic footprinting correspond to
predicted transcription factor-binding sites. All three motifs were detected in the human
and mouse SPRN intron and are present in the same relative order.
Motif 4 corresponds to a potential binding site for the nuclear receptor transcription
factor Nurr1. Motif 14 corresponds to a potential Activating Transcription Factor 6
(ATF6)-binding site. Part of motif 16 corresponds to a potential binding site for the
MYC-Associated Zinc-finger-protein-Related transcription factor (MAZR).
5.5.7 PrP and Sho Protein Families: from Fish to Mammals
The fish PrP homologues are PrP-like, stPrP-1, stPrP-2 and stPrP-3, and the additional
fish Sho homologue is Sho2. I expanded the dataset of fish proteins that belong to the
PrP and Sho families.
First, in Tetraodon virtual contig 1 I found an ORF (BN000527; EMBL) encoding the
Tetraodon stPrP-2, a protein of 395 amino acids. Second, I conceptually translated the
zebrafish stPrP-3 ORF (BN000526; EMBL) from the Ensembl genomic contig NA3274.1
into a protein of 561 amino acids.
Using the Tetraodon virtual contig 2 sequence to design PCR primers, I cloned and
sequenced the Tetraodon SPRN ORF (AJ717305; EMBL; Chapter 3.2.1) and deduced a
155-residue protein, thus adding a third fish Sho sequence to those for zebrafish
(AJ490525, EMBL) and Fugu (BN000521, EMBL).
From sequences deposited in public databases, I also discovered a new Sho-related
protein, the Sho paralogue Shadoo2 (Sho2). I deduced Sho2 protein sequences of 150,
150 and 135 amino acids from the genomic information for the Fugu SPRNB ORF
(BN000522, EMBL), the Tetraodon SPRNB ORF (BN000525, EMBL) and the zebrafish
SPRNB ORF (BN000523, EMBL), respectively. The carp Sho2 ORF (BN000524, EMBL)
was conceptually translated from the EST CA964511 (NCBI) to give a protein of 145
amino acids. Together with the three fish Sho sequences above and the human, mouse
and rat Sho proteins (BN000518, BN000519 and BN000520; EMBL), this new protein
family now has 10 members.
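Conceptual translation, as used above for the EST-derived and genomic ORFs, reads the ORF codon by codon with the genetic code until the first stop codon. A minimal sketch follows, assuming the ORF is given 5' to 3' on the coding strand, starts at its first codon and uses the standard genetic code (NCBI translation table 1). It does not handle introns, so it applies directly only to single-exon ORFs such as those of SPRN and SPRNB, or to cDNA/EST sequence; the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(orf: str) -> str:
    """Translate an ORF from its first codon, stopping at the first stop codon."""
    orf = orf.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[pos:pos + 3], "X")  # X for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_orf("ATGGCCTAA"))
```

With the stop codon included in the ORF, a protein of n residues corresponds to an ORF of 3n + 3 nucleotides; the 155-residue Tetraodon Sho, for example, corresponds to a 468-nt ORF.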
The model for the general structures of these expanded sets of Sho- and PrP-related
proteins defines four protein regions (Chapters 2.1 and 4.4): the basic region 1, the
repeat or low complexity sequence region 2, the hydrophobic region 3, and the C-
terminal region 4 (Figure 5.8).
5.5.8 Phylogenetic Analysis
Dr. Lars Jermiin (University of Sydney, Australia) conducted the phylogenetic analysis of
the two protein families using the ProtML program of the MOLPHY package (Adachi and
Hasegawa, 1996). The trees are shown in Figures 5.9 and 5.10.
The analysis of the PrP-related sequence set identified a single most likely tree, shown
in Figure 5.9, and 281 near-optimal trees, none of which differed significantly from the
most likely tree. The most likely tree groups the human, chicken, turtle and frog PrP
sequences together to the exclusion of all fish sequences. The total tree length is 5.65,
implying that every site in the alignment (not shown) has changed on average 5.65
times. This implies that the tree must be interpreted with some caution, as do several
low local bootstrap probability (LBP) and relative likelihood score (RLS) values.
However, the fact that only 282 "good" trees (i.e., the most likely tree and the
near-optimal trees) were found out of 34,459,425 possible trees lends some confidence
to the result.
Figure 5.8: Overall structures of PrPs, PrP-related proteins from fish (stPrP-2, stPrP-1, stPrP-3, PrP-like) and Sho proteins. Numbers indicate the first residue of each section, and the last residue of each protein. S, signal sequence; B, basic region; R/PGH, PGH-rich repeats; H, hydrophobic region; N, N-glycosylation site; S-S, disulphide bond; GPI, glycosylphosphatidylinositol anchor; B,R, basic repeats; GY, GY-rich region; B,R/RG, RG-rich repeats. A, F, numbers refer to human; E, numbers refer to Fugu; G, H, numbers refer to zebrafish.
Figure 5.9: The most likely tree based on phylogenetic analysis of the PrP protein family. ChPrP, chicken PrP; TuPrP, turtle PrP; XePrP, Xenopus laevis PrP; HuPrP, human PrP; FustPrP-2, Fugu stPrP-2; TestPrP-2, Tetraodon stPrP-2; FustPrP-1, Fugu stPrP-1; ZestPrP-3, zebrafish stPrP-3; ZePrP-like, zebrafish PrP-like; TePrP-like, Tetraodon PrP-like; FuPrP-like, Fugu PrP-like. Local bootstrap probabilities (LBP) are listed above the edges and relative likelihood scores (RLS) are listed below the edges. The scale bar at the bottom corresponds to 1.0 substitution per site.
Figure 5.10: The most likely tree (and the consensus tree) based on phylogenetic analysis of the Sho protein family. CaSho2, carp Sho2; ZeSho2, zebrafish Sho2; FuSho2, Fugu Sho2; TeSho2, Tetraodon Sho2; MoSho, mouse Sho; RaSho, rat Sho; HuSho, human Sho; ZeSho, zebrafish Sho; FuSho, Fugu Sho; TeSho, Tetraodon Sho. Local bootstrap probabilities (LBP) are listed above the edges and relative likelihood scores (RLS) are listed below the edges. The scale bar at the bottom corresponds to 1.0 substitution per site.
The analysis of the Shadoo protein family identified a single most likely tree (Figure
5.10) and 48 near-optimal trees, none of which differed significantly from the most
likely tree. In the most likely tree, Shos and Sho2s lie on two separate branches. The
total tree length is 4.77 substitutions per site, implying that each site in the alignment
(not shown) has changed on average 4.77 times. Again, the tree must be interpreted with
caution, as also reflected in some low LBP and RLS values. However, again only 49
"good" trees (i.e., the most likely tree and the near-optimal trees) were found out of
2,027,025 possible trees, which lends the result more credibility.
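The totals of 34,459,425 and 2,027,025 possible trees match the standard count of unrooted, fully bifurcating topologies for n labelled taxa, (2n − 5)!!, with n = 11 PrP-family sequences (Figure 5.9) and n = 10 Sho-family sequences (Figure 5.10). The minimal Python sketch below, assuming that this is how the totals were obtained, reproduces them and expresses the "good" trees as a fraction of tree space; the function name is illustrative only.

```python
from math import prod


def num_unrooted_trees(n_taxa: int) -> int:
    """Count unrooted, fully bifurcating topologies for n_taxa labelled leaves: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("at least 3 taxa are needed")
    # (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5)
    return prod(range(1, 2 * n_taxa - 4, 2))


# Taxon counts taken from the labels in Figures 5.9 (PrP family) and 5.10 (Sho family);
# "good" = most likely tree plus near-optimal trees, as reported in the text.
for family, n_taxa, good in [("PrP family", 11, 282), ("Sho family", 10, 49)]:
    total = num_unrooted_trees(n_taxa)
    print(f"{family}: {good} good trees out of {total:,} ({good / total:.1e} of tree space)")
```

The loop prints totals of 34,459,425 and 2,027,025. Because the count grows super-exponentially with the number of taxa, a few hundred near-optimal trees occupy a tiny slice of tree space; this ratio does not, by itself, measure support for any particular branch, which is what the LBP and RLS values address.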
The discoveries of the novel genes and proteins permitted a number of analyses, ranging
from genome comparisons to phylogenetic studies. I combine all these analyses to
define evolutionary trajectories of the PRNP and SPRN genes.
5.6 Discussion
I first discuss features of the genomic sequences and the outcome of the cross-species
analyses. Second, I comment on the characteristics of the new protein set and on the
phylogenetic analysis. Finally, I infer the evolution of the PRNP and SPRN genes.
5.6.1 Genomic Sequences Containing PRNP and SPRN in Mammals and Related Genes in Fish
My analysis indicates that the local genomic regions containing the PRNP and SPRN
genes have evolved differently.
The mammalian and fish genomic regions containing PRNP and its homologues differ.
Neither PRND nor PRNT genes were detected in fish, nor were PrP-like-coding genes
found in mammals. The stPrP-2 gene shares its position, and its orientation relative to
the adjacent genes, with mammalian PRNP, suggesting an evolutionary relationship.
Yet the phylogenetic analysis (Chapter 5.5.8) indicated that stPrP-2 and the fish gene
encoding stPrP-1 arose from a common ancestral gene, and that this gene duplication
occurred after the evolutionary separation of fish and mammals. The mammalian and
fish sequences have also diverged beyond recognition in the comparative genomic
analysis (Chapter 5.5.5). I therefore conclude that mammalian PRNP and fish stPrP-2
are not orthologous.
On the other hand, gene order and relative orientation in the genomic context of SPRN
are conserved between fish and mammals. The SPRN genes also aligned in the
comparative genomic analysis (Chapter 5.5.5), and the encoded proteins clustered
together in the phylogenetic tree (Chapter 5.5.8). These observations indicate orthology
between mammalian and fish SPRN.
5.6.2 Annotation of Tetraodon Genomic Sequences
Because the Tetraodon sequence reads (Genoscope) were assembled into two virtual
contigs, it was important to test and verify these assemblies.
The dot plots (Figure 5.3B and D) indicate that both Tetraodon virtual contig 1 and
Tetraodon virtual contig 2 were assembled in an order consistent with the orthologous
Fugu genomic sequences and are valid for comparative analysis.
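A dot plot places a point wherever a short word from one sequence matches a word in the other; a correctly ordered assembly yields one unbroken diagonal against its orthologous sequence, while misjoined or swapped pieces show up as offset diagonal segments. The Python sketch below illustrates this check under simplifying assumptions: exact word matches of a hypothetical length k = 12, one strand only, and a colinearity score defined here as the fraction of matches lying on the longest chain that increases in both sequences. It is not the program used to produce Figure 5.3.

```python
import random
from bisect import bisect_left


def dot_plot_hits(seq_a: str, seq_b: str, k: int = 12) -> list[tuple[int, int]]:
    """Return (i, j) for every exact k-mer match between seq_a[i:i+k] and seq_b[j:j+k]."""
    index: dict[str, list[int]] = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j)
    return [(i, j) for i in range(len(seq_a) - k + 1) for j in index.get(seq_a[i:i + k], [])]


def colinear_fraction(hits: list[tuple[int, int]]) -> float:
    """Fraction of hits on the longest chain strictly increasing in both i and j."""
    if not hits:
        return 0.0
    # Sort by i, breaking ties by descending j so two hits with the same i cannot both be chained.
    js = [j for _, j in sorted(hits, key=lambda h: (h[0], -h[1]))]
    tails: list[int] = []  # tails[m] = smallest last j of any increasing chain of length m + 1
    for j in js:
        pos = bisect_left(tails, j)
        if pos == len(tails):
            tails.append(j)
        else:
            tails[pos] = j
    return len(tails) / len(hits)


# Toy data only: a random "reference" and two assemblies of it.
rng = random.Random(1)
reference = "".join(rng.choices("ACGT", k=600))
good_assembly = reference
misjoined = reference[300:] + reference[:300]  # halves joined in the wrong order

print(colinear_fraction(dot_plot_hits(good_assembly, reference)))  # close to 1.0
print(colinear_fraction(dot_plot_hits(misjoined, reference)))      # close to 0.5
```

In the misjoined case each half still matches the reference perfectly, but the two diagonal segments cannot be chained together, so only about half of the matches are colinear.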
In the Pip-plot showing the alignment of Tetraodon virtual contig 2 with the Fugu
sequence (Figure 5.3), conserved sequences proximal (~4.5 kb) and distal (~6 kb) to the
GTP-binding-protein-coding gene may denote exons not recognised by GenScan
prediction (Chapter 2.6.6). Indeed, eight out of ten exons of the human GTP-binding-
protein-coding gene aligned between human and fish in my cross-species comparisons
(Figure 5.6), but GenScan predicted only three exons.
Thus, it is possible to assemble Tetraodon genomic sequence reads correctly into larger
contigs and use them in cross-species analysis.
5.6.3 Annotation of PRNP and SPRN Genes
There are both similarities and differences between the mammalian PRNP and SPRN
genes.
The PRNP and SPRN genes have similar structures. Both have one or two non-coding
exons, one or two introns and a 3’-terminal exon harbouring the complete ORF.
However, the GC content of mammalian SPRN is much higher than that of PRNP.
Mammalian GC content is known to vary genome-wide at different scales (Chapter 2.6).
Because mammalian genomic DNA tends to evolve towards higher AT content, the high
GC content of SPRN may indicate stronger selective pressure acting on the gene.
The promoters of both PRNP and SPRN are associated with CpG islands. By contrast, I
found no CpG islands in the PRND or PRNT gene promoters, consistent with
Comincini et al. (2001) and Makrinou et al. (2002). In this respect, PRNP resembles
SPRN more closely than it resembles PRND.
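GC content and CpG-island status can be screened with a sliding window. The Python sketch below assumes the widely used thresholds of a window of at least 200 bp, GC content of at least 50% and an observed/expected CpG ratio of at least 0.6; the text does not state which criteria or program were applied here, so these values, the window step and the function names are assumptions for illustration.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: n(CG) * length / (n(C) * n(G))."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the full dinucleotide count.
    return seq.count("CG") * len(seq) / (n_c * n_g)


def cpg_island_windows(seq: str, window: int = 200, step: int = 50,
                       min_gc: float = 0.5, min_oe: float = 0.6) -> list[int]:
    """Start positions of windows that pass both CpG-island thresholds (assumed values)."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        if gc_content(win) >= min_gc and cpg_obs_exp(win) >= min_oe:
            hits.append(start)
    return hits


# Toy sequence: 200 bp of AT-rich DNA followed by 200 bp of CpG-rich DNA.
toy = "AT" * 100 + "CG" * 100
print(round(gc_content(toy), 2))  # 0.5
print(cpg_island_windows(toy))    # [100, 150, 200]
```

Only windows reaching into the CpG-rich half pass; a promoter lacking a CpG island, as reported for PRND and PRNT, would return no windows.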
I found striking differences in transposable-element content between PRNP and SPRN.
The PRNP gene has been expanding independently in all lineages since the mammalian
radiation through insertion of numerous transposable elements (Lee et al., 1998 and
Chapter 6.4.3). By contrast, the short, GC-rich SPRN is completely devoid of
transposable elements, again suggesting stronger selective pressure acting on the
gene.
The frequency of fixation of transposable elements is known to vary genome-wide
(Chapter 2.6). There is also a strong correlation between divergence in non-coding
DNA and the amount of repetitive DNA (Chiaromonte et al., 2001): “flexible” genomic
regions accumulate many changes while “rigid” regions accumulate fewer. Rigidity of
sequence may reflect strong selection on a large number of gene regulatory elements
(Lander et al. 2001) or, alternatively, may be determined by the local genomic mutation
rate (Chiaromonte et al. 2001). SPRN is the only gene in its local genomic environment
that lacks repeats, suggesting that strong selection acting on the gene has prevented
integration of transposable elements. Conversely, the flexibility of PRNP and its
“promiscuity” in accepting repeat insertions suggest a more relaxed evolutionary history.
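Comparisons of repeat content such as this one usually reduce to the fraction of a region covered by annotated repeats. As a minimal Python sketch, assuming repeat annotations are available as 0-based, half-open (start, end) intervals (for example from a repeat-masking tool), overlapping intervals are merged before their lengths are summed so that nested or overlapping elements are not double-counted. The function name and the intervals in the example are invented, not PRNP or SPRN data.

```python
def repeat_coverage(region_length: int, repeats: list[tuple[int, int]]) -> float:
    """Fraction of a region covered by repeat intervals (0-based, half-open, may overlap)."""
    if region_length <= 0:
        return 0.0
    # Clip intervals to the region and drop empty ones.
    clipped = sorted((max(0, s), min(region_length, e)) for s, e in repeats)
    clipped = [(s, e) for s, e in clipped if e > s]
    covered = 0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / region_length


# Invented example: a 10 kb region with three partly overlapping repeat annotations.
print(repeat_coverage(10_000, [(0, 1_000), (500, 2_000), (5_000, 5_500)]))  # 0.25
```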
Thus, the features of the SPRN gene suggest that it evolves more conservatively than
PRNP, an interpretation supported by the other analyses.
5.6.4 Cross-Species Comparisons
Orthologous genes can usually be aligned and recognised by comparative genomic
analysis. Between evolutionarily more distant species, such as mammals and fish, it is
primarily the coding regions that are recognisable (Frazer et al. 2003). However, where
rapid nucleotide divergence, indels, or gene loss or acquisition have occurred, even
coding sequences cannot readily be aligned (Kellis et al. 2003). The analysis of
Thomas et al. (2003) showed that almost one third of human coding sequences did not
align with the corresponding fish sequences.
The coding exons of human PRNP and of fish stPrP-2 and PrP-like did not align
(Figure 5.4), indicating that their sequences have diverged beyond detectable
conservation. There is also evidence of rearrangements in the local genomic
environments since the divergence of mammals and fish. Thus, neither the homology nor
the non-homology criteria for gene orthology are fulfilled between human (mammalian)
PRNP and its fish homologues.
On the other hand, the SPRN gene aligned between mammals and fish (Figure 5.6),
satisfying the homology criteria for gene orthology. Conserved contiguity between
mammals and fish indicates that no rearrangement occurred in this genomic fragment
after the evolutionary divergence of fish and mammals about 450 million years ago
(non-homology criteria for gene orthology). Thus, the SPRN gene is likely to be
orthologous between mammals and fish, a conclusion also supported by the
phylogenetic analysis of the Sho protein family.
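The non-homology criterion used here, conserved contiguity, amounts to asking whether the same genes occur in the same order and relative orientation in both species, allowing for the whole block being read from the opposite strand. A minimal Python sketch of that comparison follows; the function name, gene names and strands in the example are hypothetical placeholders, not the actual SPRN neighbourhood shown in Figure 5.6.

```python
Gene = tuple[str, str]  # (gene name, strand "+" or "-")


def contiguity_conserved(block_a: list[Gene], block_b: list[Gene]) -> bool:
    """True if block_b has block_a's genes in the same order and relative orientation,
    either directly or with the whole block inverted (reversed order, flipped strands)."""
    if block_a == block_b:
        return True
    flip = {"+": "-", "-": "+"}
    inverted_b = [(name, flip[strand]) for name, strand in reversed(block_b)]
    return block_a == inverted_b


# Hypothetical blocks: the second species carries the same block on the opposite strand.
species_1 = [("geneA", "+"), ("SPRN", "+"), ("geneB", "-")]
species_2 = [("geneB", "+"), ("SPRN", "-"), ("geneA", "-")]
shuffled = [("SPRN", "+"), ("geneA", "+"), ("geneB", "-")]

print(contiguity_conserved(species_1, species_2))  # True
print(contiguity_conserved(species_1, shuffled))   # False
```

Real comparisons must also tolerate missing annotations and lineage-specific insertions, which this sketch deliberately ignores.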
The functional significance of the overlap between the 3’UTRs of SPRN and the
GTP-binding-protein-coding gene is unclear. A few other examples of such antiparallel
overlap of the untranslated exons of functional genes have been reported (Miyajima et
al. 1989; Batshake and Sundelin 1996; Dan et al. 2002). Untranslated gene fragments
may contain regulatory sequences that affect mRNA stability and translation efficiency
(Beaudoing and Gautheret 2001; Chapter 6.5).
PRNT is not conserved between human and rodents. When I translated exon 2 of the
human PRNT gene (Figure 5.5), I detected two potential ORFs encoding proteins of 60
and 94 residues, as well as several smaller ORFs (not shown). Makrinou et al. (2002)
reported that PRNT, like PRND, is transcribed exclusively in testis, and noted 50%
similarity and 42% identity between the potential 94-residue protein and human Dpl.
The presence of disrupted ORFs suggests that human PRNT is a pseudogene that
originated from a duplication of PRND. Pseudogenes are remnants of duplicate genes
arising either from tandem duplication or from retrotransposition. Although their
sequences resemble the coding regions of known proteins, they have acquired many
stop codons or frameshifts, so that they no longer code for a full-length protein. They
are usually not transcribed, but may occasionally be resurrected (Harrison and
Gerstein 2002) or acquire additional functions, such as a specific regulatory role
(Hirotsune et al. 2003). In the intron of the PRNT pseudogene I also detected a
processed pseudogene sequence with high homology to the mRNA for
isopentenyl-diphosphate delta isomerase 1, disrupted by an Alu insertion. It is likely
that the PRNT pseudogene appeared in the human lineage after the evolutionary split
from rodents, but it is also possible that the PRND duplication is more ancient and that
PRNT survives as a pseudogene in other mammalian lineages as well as in human, but
has been deleted in rodents.
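The ORF search on PRNT exon 2 can be illustrated with a minimal Python sketch that scans the three forward reading frames for ATG-initiated ORFs ending at a stop codon and reports their length in residues. The exon sequence is not reproduced here, so the example runs on a synthetic sequence and does not reproduce the 60- and 94-residue ORFs; the minimum length of 30 residues, the forward-strand-only scan and the function name are assumptions, and ORFs that run off the end of the sequence are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_residues: int = 30) -> list[tuple[int, int, int]]:
    """Return (frame, start, residues) for ATG...stop ORFs in the three forward frames.
    residues counts codons from the ATG up to, but not including, the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first in-frame ATG opens an ORF
            elif start is not None and codon in STOP_CODONS:
                residues = (pos - start) // 3
                if residues >= min_residues:
                    orfs.append((frame, start, residues))
                start = None  # look for the next ATG after this stop
    return orfs


# Synthetic test sequence: two junk bases, then a 60-codon ORF and a TAA stop.
synthetic = "CC" + "ATG" + "GCT" * 59 + "TAA"
print(find_orfs(synthetic))  # [(2, 2, 60)]
```

A premature stop codon or frameshift in a pseudogene shows up here as several short ORFs in place of one long one, which is the pattern described for PRNT.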
The cross-species comparison indicates that there is no orthology between the human
PRNP gene and its fish homologues. However, the SPRN genes of human and fish
could be orthologous.
5.6.5 Phylogenetic Footprinting of SPRN
The set of motifs found by phylogenetic footprinting of the human, mouse and Fugu
SPRN (Table 5.4 and Figure 5.7) represents candidate regulatory regions, among which
the motifs predicted to bind ATF6, Nurr1 and MAZR are conserved in human, mouse
and Fugu.
The whole set of conserved motifs may contain false positives, but the FootPrinter
program may also miss motifs present in a single species, shorter motifs present in
multiple species, motifs containing indels, motifs that fail to meet statistical
significance, and dimers with variable internal sequences (Blanchette and Tompa 2002).
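FootPrinter scores candidate motifs by parsimony along the species tree and tolerates mismatches (Blanchette and Tompa 2002). As a much simpler stand-in that only conveys the underlying idea of cross-species motif conservation, the Python sketch below lists exact k-mers present, on either strand, in every input sequence. It is not FootPrinter's algorithm; the choice of k = 8, the function names and the toy sequences are all assumptions, and the shared 8-mer in the example is arbitrary rather than a known binding site.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(seqs: dict[str, str], k: int = 8) -> set[str]:
    """Exact k-mers found in every sequence, searching both strands of each."""
    shared = None
    for seq in seqs.values():
        seq = seq.upper()
        both_strands = kmers(seq, k) | kmers(reverse_complement(seq), k)
        shared = both_strands if shared is None else shared & both_strands
    return shared or set()


# Toy promoter fragments; the common 8-mer TGACGTCA is arbitrary.
toy = {
    "human": "AATGACGTCATT",
    "mouse": "CCTGACGTCAGG",
    "fugu": "GGGTGACGTCAC",
}
print("TGACGTCA" in shared_kmers(toy))  # True
```

Requiring exact matches in every species is stricter than FootPrinter, so this approach would miss diverged sites that a parsimony-based search can still recover.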
The SPRN intron is devoid of transposable elements, suggesting that selection acts
against integration of transposons in the intron and that the intron may be important
for regulating gene activity.
Nurr1 is expressed in mammalian brain and plays an important role in the coordinated
neuroendocrine regulation of the activity of the hypothalamic/pituitary/adrenal axis
(Murphy and Conneely 1997). Its expression can also be induced in peripheral tissues.
In brain, it is critical for the development of dopaminergic neurons, activating tyrosine
hydroxylase transcription in a cell-context-dependent manner (Kim et al. 2003).
Aberrations in the dopaminergic system are associated with Parkinson's disease and
schizophrenia.
ATF6 is a member of the basic leucine-zipper family of transcription factors. It is
strongly induced during the endoplasmic reticulum (ER) stress response, which
activates transcription of genes encoding molecular chaperones and folding enzymes
located in the ER (Chapter 1.4.1). Some genes in this pathway are directly activated by
ATF6. Upstream of ATF6 in the ER-stress response is IRE1 (Wang et al. 2000). The
ER-stress response pathway is involved in the pathogenesis of familial Alzheimer's
disease (FAD). For instance, the FAD-linked PS1 mutants attenuate
autophosphorylation of IRE1 and lead to impaired induction of the ER-stress response.
These mutants also attenuate the ATF6 signalling pathway (Kudo et al. 2002).
Remarkably, much evidence indicates that chaperones are involved in the pathogenic
transformation of PrP (Chapter 1.2.4).
MAZR interacts with Bach2, a B-cell and neuron-specific transcription repressor
(Kobayashi et al. 2000).
The most exciting outcome of this analysis is the possibility that SPRN is involved in
the ER-stress response (Chapter 1.4.1). Chaperones and enzymes involved in the
ER-stress response give misfolded proteins a second chance to fold properly, and they
are also involved in the pathogenesis of protein-folding diseases. Genetic evidence
supports the involvement of a molecular chaperone, termed protein X, in prion disease
pathogenesis (Chapter 1.2.4). Can one speculate that Sho may, in fact, be protein X
(Chapter 7)?
5.6.6 New Members of PrP and Sho Protein Families
The new collection of PrPs, Shos and their homologues permits novel insights into the
evolution of their sequences.
The most characteristic feature of all mammalian and fish members of the PrP and Sho
protein families is a stretch of hydrophobic amino acids in the middle of the protein
(Figure 5.8), which is essential for both the function and pathogenic transformation of
mammalian PrP (Prusiner 1998).
Fish proteins from the stPrP set (FustPrP-2, 424 residues; TestPrP-2, 395; FustPrP-1,
461; ZestPrP-3, 561) are much longer than tetrapod PrPs (frog, 216 residues; turtle, 270;
chicken, 273; human, 253) or fish PrP-likes (~170-190 residues), and show sequence
heterogeneity in the repeat region and in the C-terminal region (Figure 5.8).
The fish proteins of the Sho family are all of similar length (FuSho, 146 residues;
TeSho, 149; ZeSho, 131; FuSho2, 150; TeSho2, 150; ZeSho2, 136; CaSho2, 145). There
is an insertion in the basic repeats of Fugu and Tetraodon Sho that is not present in
other Shos. The Sho2 sequence in this region, however, differs from that in Shos,
lacking the basic region N-terminal to the hydrophobic region. The C-terminal regions
of fish Shos and Sho2s are quite divergent; this is also the region most divergent
between fish and tetrapod Shos.
Thus, the Sho protein family is better conserved between mammals and fish than the
PrP protein family.
5.6.7 Phylogenetic Analysis
The aim of the phylogenetic analysis was to examine the evolutionary relationships
among the members of the two protein families.
The human, chicken, turtle and frog PrP sequences cluster together to the exclusion of
all fish sequences (Figure 5.9). Although the inferred tree places human PrP closer to
frog than to birds and reptiles, the most likely tree is not significantly different from
trees consistent with the current view of tetrapod evolution (i.e., with HuPrP and XePrP
swapped). The amino-acid sequence divergence between the fish PrP-related proteins
and the PrPs of higher vertebrates suggests that these proteins are not orthologous. The
clustering pattern of the stPrPs also indicates that the genes encoding these proteins
were duplicated in fish after the evolutionary split from tetrapods.
The Shos and Sho2s lie on two separate branches of the tree (Figure 5.10), implying
that the two genes duplicated before the divergence of fish from tetrapods. Most
importantly, the fish Shos cluster with their mammalian homologues rather than with
their fish Sho2 paralogues. This clustering pattern strongly suggests orthology between
mammalian and fish Shadoos.
It is well established that whole-genome and/or individual gene duplications occurred
in several fish lineages, so that many duplicated fish genes have only one homologue in
mammals (Taylor et al. 2003; Aparicio et al. 2002). Two possible fates of duplicated
genes have been proposed. The classical model of neofunctionalization predicts that one
of the duplicate loci retains its original function, while the other duplicate is fixed only if
rare beneficial mutations occur (Ohno 1970). This model could fit current knowledge of
Shos and Sho2s, as they cluster separately in the phylogenetic analysis. The alternative
model proposes that both duplicates are preserved through subfunctionalization,
whereby the proteins encoded by the duplicates complement each other functionally
(Force et al. 1999). This model may fit the fish stPrPs, whose sequences are more
similar to each other than to those of tetrapod PrPs and which group separately in the
phylogenetic study.
5.6.8 Hypothetical Model for Evolution of PRNP- and SPRN-gene families
Dr. Gready and Prof. Graves constructed a hypothetical model (Figure 5.11) proposing
that the ancestral gene leading to all the PRNP-related and SPRN-related genes was
SPRN-like. First, an ancient pre-vertebrate duplication produced SPRNA and SPRNB
within an environment that may have contained the AO- and GTP-binding-protein-
coding genes proximally, and the Rassf2- and Slc23a1-coding genes distally. The model
then proposes physical separation of the SPRNA and SPRNB genes by translocation of
half of the gene cluster to another chromosome. The subsequent history of the two
branches suggests that the genomic environment containing the SPRNB gene was highly
recombinogenic, while that containing the SPRNA gene was stable, leading to the
currently known fish and mammalian SPRN orthologues in the same genomic context.
The model predicts that orthologous SPRN genes will be found in the same genomic
context in other tetrapod lineages. A duplication of the SPRNB gene is then proposed,
still before the divergence of fish from tetrapods, to produce the SPRNB1 and SPRNB2
protogenes.
Acquisition of additional sequence to form the complete C-terminal domain at this
stage is necessary to explain subsequent steps of gene evolution. The C-terminal domain
sequences of genes evolved from SPRNB1 and SPRNB2 were later truncated or
replaced, leading to the PrP-like and SPRNB genes in fish. The same happened
independently, and recently, to the duplicated PRNT gene in human. SPRNB2
translocated in the fish lineage but was deleted in tetrapods, since no genes descended
from SPRNB2 have been found in mammals.
Figure 5.11: An evolutionary model for the origin of genes encoding Sho- and PrP-related proteins.
After divergence of fish from tetrapods, the model proposes independent duplications of
these protogenes in the two lineages. In the tetrapod lineage a gene duplication of
SPRNB1 produced a gene cluster containing PRNP and PRND genes, between the
Rassf2- and Slc23a1-coding genes. It is not known at what stage the SPRNB2 gene was
deleted in tetrapod evolution, or whether it has simply diverged beyond levels currently
detectable in mammals.
In the fish branch, the model proposes initial duplications of the SPRNB1 protogene to
produce the stPrP-2- and PrP-like-coding genes, and of the SPRNB2 protogene to
produce SPRNB and the stPrP-1-coding gene. As depicted in Figure 5.11, these gene
clusters are already separated. If the separation occurred after these duplications,
translocation of the fragment containing SPRNB and the stPrP-1-coding gene might
more conveniently explain the apparently different genomic contexts observed in Fugu
and zebrafish.
This model suggests that the PRNP- and SPRN-gene families evolved from the same
gene.
5.7 Conclusion: Evolvability of PRNP and SPRN
My comparative genomic analysis, together with the complementary phylogenetic
analysis, showed different evolutionary trajectories for PRNP and SPRN. On the one
hand, the dispensable mammalian PRNP appears to be under relaxed evolutionary
constraints: it did not align with its fish homologues, local rearrangements occurred
between fish and mammals, it accumulates transposable elements extensively, and the
sequences of proteins belonging to the PrP family vary. In contrast, the GC-rich
mammalian SPRN aligned with its fish homologues in comparative genomic analysis,
harbours no transposable elements, shows conserved contiguity between fish and
mammals, and encodes conserved proteins. This evolutionary dialectic therefore
indicates that the SPRN gene is more conserved than PRNP, suggesting that it may
have a more prominent function than PRNP.
An interplay between conservation and change enables the perpetuation of life: whereas
maintenance of organization requires conservation, variation allows adaptation
(Radman et al. 1999). While PRNP and its homologues, under weaker evolutionary
constraints, may have adapted to different roles in different vertebrate lineages, the
conserved SPRN, under stronger selective pressure, may have retained a basic,
vertebrate-wide conserved function. Many lines of evidence indicate redundancy
between the dispensable PRNP and another gene; perhaps this gene is the more
conserved SPRN. Finally, the pathogenic potential of PRNP could have evolved as a
consequence of relaxed evolutionary constraints.
With two such different players now in hand, new avenues for research open up:
comparing the two genes offers a way to understand their functions better (Chapter 7).