The Author 2007. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]
Title Page
Computational Analysis of RNA Editing Sites in Plant Mitochondrial Genomes Reveals
Similar Information Content and a Sporadic Distribution of Editing Sites.
R. Michael Mulligan¹*, Kenneth LC Chang²** and Chia Ching Chou²**
¹Department of Developmental and Cell Biology
²Department of Information and Computer Science
University of California
Irvine, CA 92697-2300
Submission for a Research Article
*To whom correspondence should be addressed:
R. Michael Mulligan
Department of Developmental and Cell Biology
University of California
Irvine, CA 92697-2300
Voice: 949-824-8433
Fax: 949-824-4709
Email: [email protected]
Running Title: Computational Analysis of RNA Editing Sites
Key words: RNA Editing, evolution, plant mitochondria
Date Received:
Date Accepted:
**These authors contributed equally to the manuscript.
MBE Advance Access published June 24, 2007
ABSTRACT
A computational analysis of RNA editing sites was performed on protein-coding
sequences of plant mitochondrial genomes from Arabidopsis thaliana, Beta vulgaris,
Brassica napus, and Oryza sativa. The distribution of nucleotides around edited and
unedited cytidines was compared in 41-nucleotide segments comprising 1,481 edited
cytidines and 21,390 unedited cytidines from the four genomes. The distribution of
nucleotides was examined in 1-, 2-, and 3-nucleotide windows by comparison of nucleotide
frequency ratios and relative entropy. The relative entropy analyses indicate that
information is encoded in the nucleotide sequences of the 5’ flank (-18 to -14, -13 to
-10, -6 to -4, -2/-1) and in the immediate 3’ flanking nucleotide (+1), and these regions
may be important in editing site recognition. The relative entropy was large when two- or
three-nucleotide windows were analyzed, suggesting that several contiguous nucleotides
may be involved in editing site recognition. RNA editing sites were frequently preceded
by two pyrimidines or AU and followed by a guanosine (HYCG) in the monocot and
dicot mitochondrial genomes, and were rarely preceded by two purines. Analysis of
chloroplast editing sites from a dicot, Nicotiana tabacum, and a monocot, Zea mays,
revealed a similar distribution of nucleotides around editing sites (HYCA). The
similarity of this motif around editing sites in monocots and dicots in both mitochondria
and chloroplasts suggests a common mechanistic basis for this motif in
these different organelle and phylogenetic systems. The preferred sequence distribution
around RNA editing sites may have an important impact on the acquisition of editing
sites in evolution because the immediate sequence context of a cytidine residue may
render a cytidine editable or uneditable, and consequently determine whether a T to C
mutation at a specific position may be corrected by RNA editing. The distribution of
editing sites in many protein-coding sequences is shown to be non-random with editing
sites clustered in groups separated by regions with no editing sites. The sporadic
distribution of editing sites could result from a mechanism of editing site loss by gene
conversion utilizing edited sequence information, possibly through an edited cDNA
intermediate.
Key Words: RNA Editing, Relative Entropy, Gene Conversion, Copy Correction, Non-
random Distribution, Evolution of Editing, Editing Site Recognition, Retroconversion,
Gene Transfer
Introduction
RNA editing is a post-transcriptional process that changes the nucleotide sequence of
RNAs. C-to-U editing occurs in the organelles of vascular plants and changes the coding
information in mRNAs. In higher plants, specific cytidine residues are converted to
uridine residues in chloroplast and in mitochondrial transcripts, and this process
frequently re-specifies the codon to direct the incorporation of a non-synonymous amino
acid residue (Covello and Gray 1989; Gualberto et al. 1989; Hiesel et al. 1989). The
amino acid specified by the edited codon is typically the evolutionarily conserved amino
acid at that position, and the unedited codon would code for a radical amino acid
substitution.
Several higher plant chloroplast genomes have been sequenced and analysed for
editing, and generally have about 30 C-to-U editing sites (Maier et al. 1995; Sugiura
1995; Schmitz-Linneweber et al. 2002; Kugita et al. 2003a; Kugita et al. 2003b; Tillich et
al. 2005). The complete Arabidopsis thaliana, Brassica napus, Beta vulgaris and Oryza
sativa mitochondrial genomes have been sequenced and analysed for RNA editing, and
these genomes encode 441, 427, 357, and 491 C-to-U editing sites, respectively (Giege
and Brennicke 1999; Kubo et al. 2000; Notsu et al. 2002; Handa 2003; Mower 2005).
Thus, the number of nucleotide changes directed by RNA editing is much greater in
mitochondria than in chloroplasts, although the editing process is generally thought to be
similar in these organelles (Maier et al. 1996; Mulligan 2004).
The plant organellar editing complexes must specifically recognize ~30 editing
sites in chloroplasts and about 400 editing sites in plant mitochondria. Analysis of three
editing sites in transgenic tobacco chloroplasts by 5’ and 3’ deletion led to the broad
conclusion that recognition elements exist largely in the 5’ flanking region with some
sequence requirements in the 3’ region (Chaudhuri, Carrer and Maliga 1995; Bock,
Hermann and Kossel 1996; Chaudhuri and Maliga 1996; Bock, Hermann and Fuchs
1997; Reed and Hanson 1997; Hermann and Bock 1999; Reed, Peeters and Hanson 2001;
Chateigner-Boutin and Hanson 2002; Chateigner-Boutin and Hanson 2003). A detailed
analysis of the petB and psbE editing sites in Nicotiana tabacum chloroplasts has
identified the -20 to +10 region as important for editing site conversion (Miyamoto,
Obokata and Sugiura 2002), and mutations at nucleotides -11 to -1, +2 to +4, and +8/9
were deleterious to in vitro editing of psbE RNAs (Hayes and Hanson 2007). RNA
editing site recognition in chloroplasts appears to occur through trans-acting factors that
recognize several editing sites with similar cis elements (Chateigner-Boutin and Hanson
2002; Chateigner-Boutin and Hanson 2003). The groups of editing sites are referred to as
editing site clusters and share common sequence motifs that are frequently composed of
three or four nucleotides. Recently, pentatricopeptide repeat (PPR) proteins have been recognized
as a large class of organellar RNA binding proteins that are required for RNA editing and
other RNA processing reactions (Kotera, Tasaka and Shikanai 2005; Schmitz-
Linneweber et al. 2006).
Previous computational analyses of sequences around editing sites examined the
distribution of single nucleotides in close proximity to RNA editing
sites in plant mitochondrial genomes and compared it with small subsets of unedited
cytidines (Giege and Brennicke 1999; Cummings and Myers 2004). An analysis of the
Arabidopsis mitochondrial genome compared nucleotide frequencies from -17 to +7 in
sequences around all known edited cytidines and 30 randomly selected unedited
cytidines. This study reported a high incidence of pyrimidines at positions -2 and -1, a
low incidence of guanines at position -1, and other unexpected nucleotide frequencies at
-5 and -17 (Giege and Brennicke 1999). A second computational analysis of plant
mitochondrial editing sites analysed editing sites from the Oryza, Arabidopsis, and
Brassica mitochondrial genomes and compared them to a subset of randomly selected
non-edited cytidines with the same codon position frequencies (Cummings and Myers
2004). This study detected the pyrimidine bias that exists at position -1, and reported a
correlation of the free energy of folding of the 41-nucleotide RNA segments centered on
an edited or unedited cytidine.
In this study we present a comprehensive analysis of edited and unedited cytidines
in the protein-coding sequences of four mitochondrial genomes. In order to evaluate
possible higher-order patterns in the nucleotide distribution, our analyses include the
distribution of single nucleotides, dinucleotides, and trinucleotides around edited and
unedited cytidines in the Arabidopsis, Beta, Brassica and Oryza mitochondrial genomes.
The relative entropy profiles comparing the sequences flanking edited and unedited cytidines
are very similar in these genomes, suggesting that the same regions are utilized in editing
site recognition in mitochondria of monocots and dicots. Analysis of information content
suggests that several groups of two or three contiguous nucleotides may be utilized in editing site
recognition. Comparison of the RNA sequences immediately adjacent to chloroplast and
mitochondrial editing sites with those adjacent to unedited cytidines suggests that a similar
YYCR sequence is enriched around editing sites in both organelle systems in monocots and
dicots, and that the immediate sequence context of a cytidine residue may be a critical factor in
whether a cytidine is editable. In addition, the distribution of editing sites within
individual coding sequences was analysed, and editing sites are frequently distributed
non-randomly. Evolutionary mechanisms that may result in a sporadic distribution of RNA
editing sites are discussed.
Materials and Methods
DNA Sequence Data
A comprehensive analysis of RNA editing sites in mitochondrial genomes has been
reported for Arabidopsis thaliana, Brassica napus, Beta vulgaris, and Oryza sativa
(Giege and Brennicke 1999; Kubo et al. 2000; Notsu et al. 2002; Handa 2003; Mower
and Palmer 2006). DNA sequences and editing site locations were obtained from
GenBank accession numbers NC_001284, AP006444, BA000009 with DQ381444-DQ381465,
and BA000029, respectively. GenBank genome entries were converted into FASTA-formatted
text files of all known protein-coding sequences, in which each edited cytidine is
represented as an uppercase C and each unedited cytidine as a lowercase c. Thus, editing
sites are represented as the unedited nucleotide and are considered to be cytidines in these
analyses. These files are available in the supplemental information. Entries were limited
to protein-coding sequences longer than 100 nucleotides, with no introns or untranslated
regions. In addition, small
ORFs, uncharacterized ORFs, and small exons were eliminated from the database.
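The case encoding makes the database easy to parse. The following is a minimal sketch, assuming plain FASTA files with DNA letters, an uppercase C for each edited cytidine and a lowercase c for each unedited one. The function names and the `min_length` parameter are hypothetical illustrations, not the authors' Java programs.

```python
def parse_case_encoded_fasta(text, min_length=100):
    """Return {header: sequence} for entries longer than `min_length` nucleotides.

    Case is preserved: uppercase 'C' marks an edited cytidine and
    lowercase 'c' an unedited one (hypothetical helper).
    """
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    # Keep only entries longer than the length cutoff.
    return {h: s for h, s in records.items() if len(s) > min_length}


def cytidine_sites(seq):
    """Split 0-based cytidine positions into edited ('C') and unedited ('c')."""
    edited = [i for i, b in enumerate(seq) if b == "C"]
    unedited = [i for i, b in enumerate(seq) if b == "c"]
    return edited, unedited
```

Because the edited/unedited distinction lives only in letter case, downstream steps can recover the unedited genomic sequence with a single `lower()` call.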
Computational Analyses
Computer programs were written and compiled with Dr Java (version 1.4). The
nucleotide distribution around all edited and all unedited cytidines in the database was
analyzed in a sliding window of one, two, or three nucleotides. Each FASTA entry in the
genome file was scanned for an edited C or an unedited c, and every time a cytidine was
encountered, a sequence was written to an array of edited or unedited sequences. Thus,
the sequences flanking all edited or unedited cytidines in the database are aligned in a
matrix. The size of the region to be analyzed was specified as an input to the program,
and was typically 20 or 50 nucleotides on each side of a cytidine (i.e., a 41- or 101-nucleotide
sequence was written to the matrix). Cytidines in a FASTA entry with fewer than the
specified number of flanking nucleotides in either the 5’ or 3’ direction were ignored; thus
cytidines in the first and last 20 (or 50) nucleotides of each coding sequence were excluded
from analysis.
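The construction of the aligned matrices can be sketched as follows. This assumes the case encoding described above and a symmetric flank; each window is lowercased so that neighbouring editing sites are treated as cytidines, as stated in the text. The function name is hypothetical.

```python
def flanking_windows(seq, flank=20):
    """Collect (2*flank + 1)-nt windows centred on every cytidine.

    Uppercase 'C' = edited, lowercase 'c' = unedited. Cytidines with fewer than
    `flank` nucleotides on either side are skipped, so they never serve as centres.
    """
    edited, unedited = [], []
    for i in range(flank, len(seq) - flank):
        if seq[i] in "Cc":
            window = seq[i - flank : i + flank + 1].lower()
            (edited if seq[i] == "C" else unedited).append(window)
    return edited, unedited
```

With `flank=20` every window is 41 nucleotides long and the cytidine sits at index 20, so position -1 corresponds to index 19 and +1 to index 21.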
The arrays represent the alignment of all RNA sequence surrounding edited or
unedited cytidines, and were analyzed for the distribution of nucleotide sequences by
scanning one, two, or three nucleotide windows and calculating the number of times each
sequence was encountered in a specific position relative to a cytidine. As an example of
the output, Table 1 shows the distribution of dinucleotides around Arabidopsis
mitochondrial editing sites and unedited cytidines in the -2/-1 window. The frequency
with which each dinucleotide occurs adjacent to an edited or unedited cytidine (P or Q,
respectively) is the number of times that the dinucleotide is observed divided by the total
number of edited or unedited cytidines. The ratio of a dinucleotide's frequency around
edited cytidines to its frequency around unedited cytidines is defined as the selectivity
ratio (P/Q). Thus, a sequence with a selectivity ratio of 1 has the same relative frequency
around edited and unedited cytidines, while a sequence with a selectivity ratio greater
than 1 is more frequently present around an editing site. At each window position, relative
entropy was calculated as the Kullback-Leibler distance $d = \sum_{k=1}^{4^n} P_k \log(P_k/Q_k)$,
where n is the window size (1, 2, or 3 nucleotides) and the sum runs over all 4^n possible
nucleotide combinations in the window.
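The frequency, selectivity ratio, and relative entropy calculations at one window position can be sketched as below. Assumptions not stated in the text: windows are equal-length lowercase DNA strings, n-mers containing letters other than a, c, g, t are ignored, the logarithm is base 2, terms with P_k = 0 contribute nothing, and a tiny floor replaces Q_k = 0 to avoid division by zero. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def kmer_frequencies(windows, start, n):
    """Frequency of each n-mer that begins at column `start` of the aligned windows."""
    counts = Counter(w[start:start + n] for w in windows)
    return {"".join(k): counts["".join(k)] / len(windows)
            for k in product("acgt", repeat=n)}


def selectivity_ratios(edited, unedited, start, n):
    """P/Q for every n-mer observed at least once around unedited cytidines."""
    P = kmer_frequencies(edited, start, n)
    Q = kmer_frequencies(unedited, start, n)
    return {k: P[k] / Q[k] for k in P if Q[k] > 0}


def relative_entropy(edited, unedited, start, n, floor=1e-9):
    """Kullback-Leibler distance d = sum_k P_k log2(P_k / Q_k) over the 4**n n-mers."""
    P = kmer_frequencies(edited, start, n)
    Q = kmer_frequencies(unedited, start, n)
    return sum(p * math.log2(p / max(Q[k], floor)) for k, p in P.items() if p > 0)
```

For the -2/-1 dinucleotide window in a 41-nucleotide alignment (cytidine at index 20), `start` would be 18 and `n` would be 2; sliding `start` across the alignment produces a relative entropy profile.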
Random Editing Site Assignment
Random editing site assignment was used to compare the results of the mitochondrial
database with a random distribution of editing sites. The random editing site assignment
program scanned each FASTA formatted entry in the database and determined the
number and codon position of each of the editing sites. The program then randomly
selected a cytidine in the same codon position to be assigned as an editing site. Thus, the
random editing site assignment program maintained the number and codon position of
editing sites in a coding sequence. Statistics such as mean, standard deviation, variance,
and confidence intervals were determined from 1000 genome files with randomly
assigned editing sites.
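A sketch of random editing site assignment for a single coding sequence follows. It assumes that the sequence begins at the first nucleotide of a codon, and that cytidines are drawn without replacement. The text does not say which sampling scheme was used, but drawing without replacement guarantees that the number of sites is preserved. The function name is hypothetical.

```python
import random


def randomize_editing_sites(seq, rng):
    """Reassign editing sites at random, preserving their count per codon position.

    `seq` is case-encoded (uppercase 'C' = edited). For each codon position, the same
    number of cytidines (edited or not) at that position is drawn without replacement
    and marked as edited; all other letters are returned in lowercase.
    """
    lower = seq.lower()
    out = list(lower)
    for frame in range(3):  # frame 0, 1, 2 = codon positions 1, 2, 3
        n_edited = sum(1 for i in range(frame, len(seq), 3) if seq[i] == "C")
        candidates = [i for i in range(frame, len(seq), 3) if lower[i] == "c"]
        for i in rng.sample(candidates, n_edited):
            out[i] = "C"
    return "".join(out)
```

Applying this to every coding sequence 1000 times, for example with `random.Random(seed)` for reproducibility, and recomputing the statistic of interest on each randomized genome gives the null distribution from which means, standard deviations, and confidence intervals can be taken.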
Results
Nucleotide Distribution Around Edited and Unedited Cytidines
The distribution of nucleotides around edited and unedited cytidines was analyzed by
calculation of relative entropy to determine where information content existed within
these sequences. Figure 1 shows the relative entropy of edited and unedited cytidines for
Arabidopsis and Oryza mitochondrial coding sequences. The analysis was performed on
the 40 or 100 nucleotides flanking (20 or 50 on each side of) edited and unedited
cytidines. The 5% confidence interval for the relative entropy of each mitochondrial
genome was determined by 1000 iterations of random assignment of RNA editing sites
and calculation of the mean and standard deviation of the relative entropy values. The
Beta vulgaris and Brassica napus mitochondrial genomes were also analyzed, but are
provided in the supplemental material to improve figure clarity.
Figure 1 shows the relative entropy for the analysis of a one-nucleotide window
over the entire 101-nucleotide segment. The relative entropy is extremely high in the
immediate vicinity of the editing site (nucleotides -2, -1, +1) and several peaks are
observed that exceed the 5% confidence interval in the -20 to +8 nucleotide region.
Figure 1B shows an expanded view of the -20 to +8 nucleotide region, and the relative
entropies of the Arabidopsis and Oryza mitochondrial genomes are very similar in this
region. The relative entropy of the two nucleotides immediately upstream of an editing
site is very large suggesting great importance of these nucleotides in editing site
recognition. In addition, the coincidence and magnitude of peaks in the relative entropy
profiles are very similar, suggesting similar regions are involved in editing site
recognition. Thus, the information content is very similar around RNA editing sites in
the dicot (Arabidopsis) and the monocot (Oryza) genomes. These taxa are thought to
have diverged about 150 MY ago (Chaw et al. 2004), and these results suggest that
similar editing site recognition mechanisms are utilized in these mitochondrial systems.
Analysis of the relative entropy around editing sites in two and three nucleotide
windows resulted in some important differences. For example, nucleotide position -5
shows a peak in relative entropy over the adjacent nucleotides when analyzed as a single
nucleotide (Fig. 1B). Uridines are enriched at the -5 position, and their selectivity ratio is
very high (Fig. 2); however, the selectivity ratio of C at position -5 is unremarkable, as
are the entropy and the distribution of mononucleotides at the -6 and -4 positions. When
dinucleotides are analyzed, the entropy analysis shows a broad peak that includes the
dinucleotides at -6/-5 and -5/-4 (Fig. 1C); CU and CC are enriched at -6/-5, and UA
and CG are enriched at -5/-4 (Fig. 2). Finally, when trinucleotides are
analyzed, a large peak in the entropy profile is evident at trinucleotide -6/-5/-4 (Fig. 1D),
and CUA and CCG are enriched at these positions (Fig. 2). The trinucleotide CCG has a
greater selectivity ratio than CUA, even though CUA contains the highly enriched U at position -5.
Thus, analysis of multiple adjacent nucleotides reveals that combinations of nucleotides
are enriched around the RNA editing sites that are not evident when single nucleotides
are analyzed. These results suggest that multiple contiguous nucleotides are recognized
by the editing apparatus, and that distinct combinations of nucleotides in regions with
high relative entropy may exist in the cis element of RNA editing sites.
Similar changes in the relative entropy profile are evident in the -12 to -10 region
and the -18 to -15 region. In summary, the relative entropy of nucleotide sequences
around RNA editing sites suggests that the greatest information is present immediately
upstream of editing sites (-2/-1), and additional information is present in the -18 to -14,
-13 to -10, -6 to -4, and +1/+2 regions.
The distribution of nucleotides is similar around plant mitochondrial and
chloroplast editing sites. Table 1 shows the distribution of dinucleotides in the -2/-1
window around plant mitochondrial editing sites. Panel A shows the number of times
each dinucleotide occupies the -2/-1 window upstream of an edited or an unedited
cytidine. P and Q, the frequencies with which a dinucleotide is upstream of an edited or
unedited cytidine, are calculated as the number of times that a specific dinucleotide is present
divided by the total number of edited or unedited cytidines. The ratio of these
frequencies (P/Q) is the selectivity ratio that expresses the relative frequency of a specific
dinucleotide around an edited or unedited cytidine.
The selectivity ratios for editing sites in the Arabidopsis, Beta, and Oryza
genomes are compared in columns 6, 7, and 8, respectively (Table 1). The selectivity
ratios are very similar for editing sites in the three genomes with about half of the
dinucleotides rarely observed upstream of an editing site (UA, UG, CA, GG, CG, AA,
GA, AG). These dinucleotides have very low selectivity ratios, and include all eight
dinucleotides with a purine in the -1 position. The dinucleotides with consistently high
selectivity ratios (UU, CU, UC, AU) are pyrimidine-pyrimidine or AU combinations.
Figure 3 compares the selectivity ratios for Arabidopsis with the Beta or Oryza
editing sites in the -2/-1 and +1/+2 windows (Fig. 3A and B respectively). Each point
represents the selectivity ratios for one of the sixteen dinucleotides. About half of the
dinucleotides exhibit low selectivity ratios in all three species, and as a result are
clustered near the origin. In addition, dinucleotides with high selectivity ratios in
Arabidopsis generally have high selectivity ratios in Beta and Oryza, and linear
regression of the selectivity ratios shows lines with slopes of nearly one and intercepts
very close to zero. The coefficient of determination (r²) between Arabidopsis and Oryza
selectivity ratios is 0.90, indicating a very strong correlation.
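The slope, intercept, and coefficient of determination reported for Figure 3 correspond to an ordinary least-squares fit of one species' sixteen dinucleotide selectivity ratios against another's. A dependency-free sketch is shown below; the function name is hypothetical. For simple linear regression, r² equals the squared Pearson correlation, which is the quantity the function returns.

```python
def fit_line(x, y):
    """Least-squares slope, intercept and r**2 for paired selectivity ratios."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared
```

If x and y are the sixteen ratios from two species, a slope near 1, an intercept near 0, and a high r² correspond to the pattern described for the -2/-1 window.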
Figure 3B compares the selectivity ratios for Arabidopsis with the Beta or Oryza
editing sites in the +1/+2 window. The selectivity ratios also exhibit a strong correlation
in the +1/+2 window, with a coefficient of determination (r²) of 0.68 between Arabidopsis and
Oryza editing sites. In contrast to the -2/-1 window, none of the selectivity ratios is very
small, indicating that dinucleotides downstream of the editing site are not discriminated
against as strongly as those in the upstream position. The dinucleotides with high selectivity
ratios in the +1/+2 position are GG and GU.
The distribution of nucleotides around RNA editing sites in chloroplasts is very
similar to plant mitochondria. Table 1B and Figure 4 show selectivity ratios for the
distribution of dinucleotides in the -2/-1 window for Arabidopsis mitochondria compared
to editing sites in tobacco and maize chloroplast genomes. Eight of the dinucleotides are
clustered near the origin and are rarely observed upstream of editing sites in mitochondria
or in chloroplasts (AG, GA, AA, CG, GG, CA, UG, UA), and several dinucleotides are
frequently detected upstream of editing sites (UU, AU, CC) in both organelles.
Regression analysis of these values gives a coefficient of determination (r²) of 0.65
between the Arabidopsis mitochondrial and tobacco chloroplast editing sites and 0.73 between
the Arabidopsis mitochondrial and Zea chloroplast editing sites, indicating a moderate to
strong correlation. The similarity of nucleotide distribution around editing sites in dicots
and monocots and in both chloroplast and mitochondria suggests that common features
are necessary for editing site conversion in these diverse taxa and organelle systems.
Effect of Codon Position of an Editing Site
The distribution of editing sites in plant mitochondrial genomes is typically about
35%:55%:10% in the first, second and third positions of the codon (Giege and
Brennicke 1999); thus the second codon position is overrepresented and the third codon
position is underrepresented among edited cytidines, and the sequence context of editing sites
may be influenced by codon position.
In order to directly assess the effect of codon position, relative entropy was
separately analyzed in a one nucleotide window for editing sites in the first, second, or
third position, and compared to unedited cytidines from the same codon position (Fig. 5).
If entropy values were strongly influenced by codon position, then the peaks and troughs
in the entropy profile would be expected to be displaced by one nucleotide. However, the
entropy profiles in the 5’ flanking region are quite similar, especially for the first and
second positions, which exhibit peaks at -1, -5, and -8/-9 with intervening troughs. In the
-10 to -20 region, most peaks coincide, with a few differences; however, a consistent
single-nucleotide displacement of the profile is not evident. The analysis of a small number of
editing sites from the third codon position resulted in much larger fluctuations in the
entropy values, but showed similar trends. This analysis suggests that although codon
position has some influence on the entropy values, information is similarly
embedded in the 5’ flanking region of editing sites irrespective of position in the codon.
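Stratifying by codon position only changes how cytidines are grouped before the entropy calculation. The sketch below assumes that each coding sequence starts at the first nucleotide of its first codon, so that the nucleotide at 0-based index i lies at codon position i % 3 + 1. The function name is hypothetical.

```python
def windows_by_codon_position(seq, flank=20):
    """Group flanking windows of edited/unedited cytidines by codon position (1, 2 or 3)."""
    groups = {pos: {"edited": [], "unedited": []} for pos in (1, 2, 3)}
    for i in range(flank, len(seq) - flank):
        if seq[i] in "Cc":
            key = "edited" if seq[i] == "C" else "unedited"
            # Lowercase the window so neighbouring editing sites count as cytidines.
            groups[i % 3 + 1][key].append(seq[i - flank : i + flank + 1].lower())
    return groups
```

Each codon-position group can then be passed separately to a relative entropy calculation, so that edited cytidines are compared only with unedited cytidines at the same codon position.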
Some codon position effects are evident, especially in the nucleotides
immediately downstream of an editing site. In both the Oryza and Arabidopsis genomes,
the downstream region exhibited a peak at the +1 nucleotide for editing sites in the
second position, and at the +2 nucleotide for editing sites in the first position. This
position represents the first downstream wobble (third codon) position, where synonymous
mutations may allow optimization of the editing site for efficient editing; such optimization
would increase the relative entropy at these positions.
Editing Sites are Sporadically Distributed in Some Genes
Some coding sequences exhibited an unusual distribution of editing sites that appeared to
be clustered in groups and separated by gaps that lacked editing sites. In order to
systematically examine the distribution of RNA editing sites within individual coding
sequences, the interval between editing sites was calculated for all coding sequences
greater than 500 nucleotides that included at least three editing sites.
The variance of the intervals between editing sites for an individual coding
sequence was determined as a measurement of the distribution of editing site intervals
relative to the mean interval size, and was compared with the variances obtained from 1000
versions of the same coding sequence with randomly assigned editing sites. Table 2 shows p
values for the analysis of coding sequences in the three genomes; 31%, 45%, and 35% of the
coding sequences analyzed from each genome exhibited a non-random distribution of editing
sites, with p values less than 0.05. A random distribution of editing sites would be expected
to yield a p value below 0.05 for only 5% of the coding sequences, that is, about once among
the ~20 coding sequences analyzed from each genome. These results demonstrate that an
unexpectedly large fraction of plant mitochondrial coding sequences exhibit a non-
random distribution of editing sites. The distribution of editing sites for several coding
sequences with small p values is graphically presented in Figure 6.
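The interval-variance test can be sketched as follows. Several choices here are assumptions not stated in the text. Intervals are taken as differences between consecutive 0-based editing site positions, the population variance is used, and the test is one-sided because clustering should inflate the variance. The p value is the fraction of randomized sequences whose variance is at least the observed one. The randomization preserves the number of editing sites per codon position, as described in Methods. Function names are hypothetical.

```python
import random
from statistics import pvariance


def interval_variance(positions):
    """Population variance of the gaps between consecutive editing sites."""
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return pvariance(gaps)


def clustering_p_value(seq, n_iter=1000, seed=0):
    """One-sided empirical p value for excess variance of editing site intervals.

    `seq` is a case-encoded coding sequence (uppercase 'C' = edited) with at least
    three editing sites, assumed to start at the first position of a codon.
    """
    rng = random.Random(seed)
    edited = [i for i, b in enumerate(seq) if b == "C"]
    observed = interval_variance(edited)
    lower = seq.lower()
    # Per codon position: every cytidine is a candidate; keep the edited count fixed.
    candidates = [[i for i in range(f, len(seq), 3) if lower[i] == "c"] for f in range(3)]
    n_edited = [sum(1 for i in edited if i % 3 == f) for f in range(3)]
    hits = 0
    for _ in range(n_iter):
        sites = []
        for f in range(3):
            sites.extend(rng.sample(candidates[f], n_edited[f]))
        if interval_variance(sites) >= observed:
            hits += 1
    return hits / n_iter
```

Evenly spaced sites give a variance of zero, whereas two tight clusters separated by a long gap give a large variance that random reassignment rarely reaches. That second case is the situation a small p value flags.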
Discussion
Editing Site Sequence Context
Analysis of the information content around RNA editing sites in plant mitochondrial
transcripts suggests that groups of nucleotides in specific regions are important in editing
site recognition (Figure 7). The relative entropy immediately upstream and downstream
of an editing site is large and suggests that these regions are critical for editing site
recognition. Based on these results, it would appear that the simple motif “HYCGK”
represents a sequence that is likely to be edited in plant mitochondria. These
observations extend earlier studies based on single nucleotide analyses that concluded
that editing sites are frequently preceded by pyrimidines and rarely preceded by a
guanine (Maier et al. 1996; Giege and Brennicke 1999; Cummings and Myers 2004).
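The following regular-expression sketch shows how cytidines in an IUPAC-defined context such as HYCGK (H = A, C or U; Y = C or U; K = G or U) can be located in a transcript. Because HYCGK describes a context that is enriched around editing sites rather than a validated predictor, matches should be read as candidate cytidines, not predicted editing sites. The function name, the IUPAC table layout, and reporting the middle C (offset 2) are illustrative assumptions.

```python
import re

# IUPAC nucleotide codes expressed over the RNA alphabet (T is read as U).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "[AG]", "Y": "[CU]", "K": "[GU]", "M": "[AC]",
    "S": "[CG]", "W": "[AU]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def motif_sites(seq, motif="HYCGK", c_offset=2):
    """0-based positions of the motif's target C in every (overlapping) match."""
    rna = seq.upper().replace("T", "U")
    body = "".join(IUPAC[ch] for ch in motif.upper())
    pattern = re.compile(f"(?={body})")  # zero-width lookahead allows overlapping matches
    return [m.start() + c_offset for m in pattern.finditer(rna)]
```

Other IUPAC motifs mentioned in this section, such as YUCCU, can be passed through the same function with an appropriate `c_offset`, or scanned at a fixed offset from each cytidine.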
The distribution of nucleotides in the immediate 5’ flanking region of editing sites
in monocot and dicot mitochondria was shown to be remarkably similar (Table 1A) and
the selectivity ratios exhibit a strong correlation between monocots and dicots (Figure
3A). The distribution of dinucleotides in the -2/-1 window at chloroplast editing sites of
a dicot (Nicotiana tabacum) and a monocot (Zea mays) is very similar to the distribution
of dinucleotides observed in plant mitochondria (Table 1B, Figure 4). Thus, a similar
distribution of nucleotides immediately upstream of RNA editing sites in monocots and
dicots and in both mitochondria and chloroplasts suggests that similar molecular systems
are involved and that a preferred editing site sequence context is shared among these
organisms and organelles. However, some differences are noted between the
mitochondrial and chloroplast systems: in chloroplast editing sites, the dinucleotide CU
is not prevalent in the -2/-1 window and the +1 nucleotide is more typically an A. These
may represent differences that distinguish the editing machinery in these two systems.
Monocots and dicots diverged 150 MY ago (Chaw et al. 2004), and chloroplast
trans-acting specificity factors are proposed to change rapidly in evolution (Schmitz-
Linneweber et al. 2001). In principle, a trans-acting RNA binding factor would be
expected to be able to recognize virtually any sequence. Thus, the similarity of editing site
context that is maintained across diverse taxa and different organelles may reflect
common features in the mechanisms of editing. The sequence similarity around RNA
editing sites as well as the strong selection for and against nucleotides immediately
adjacent to editing sites in these disparate systems suggests that some cytidines may exist
in an “editable” context while other cytidines may exist in an “uneditable” context. Thus,
the immediate sequence context of a nucleotide may have an important impact on
whether a T-to-C mutation could be corrected by editing, with important consequences for how RNA
editing sites are acquired in evolution (Covello and Gray 1993).
Analysis of relative entropy suggests that information around RNA editing sites
exists in the -18 to -14, -13 to -10, -6 to -4, -2 to -1, and +1 to +2 regions. These regions
would be expected to be recognized by the editing machinery, and Figure 7 shows the
combinations of nucleotides most frequently observed in these positions based on
selectivity ratios. The relative entropy in larger nucleotide windows showed
large increases over the relative entropy of randomly assigned editing sites, suggesting that
several contiguous nucleotides, rather than individual nucleotides at specific positions,
may be important in editing site recognition. These results are consistent with the editing
site cluster model that proposes that groups of editing sites are recognized by the same
trans-acting factor (Chateigner-Boutin and Hanson 2002).
The groups of trinucleotides indicated in the -18 to -14 region overlap, and
suggest that a larger series of nucleotides may be important in editing site recognition.
For example, YUC, UCC, and CCU are frequently encountered at positions -18/-16,
-17/-15, and -16/-14, respectively (Figure 7). The pentanucleotide YUCCU is present at
nucleotide -18 to -14 in eleven editing sites of the 376 rice editing sites analyzed,
suggesting that this may represent a portion of an editing site recognition sequence.
Other 4 and 5 nucleotide combinations were noted in the Arabidopsis genome such as
YUACA (-18/-14) that may be important editing site recognition motifs.
Editing Site Recognition
RNA editing site recognition and conversion have been analyzed with an in vitro editing
system and by electroporation of intact mitochondria. Deletion of 5’ and 3’ sequences
suggests that nucleotide sequences from -20 to +10 are required for editing site
conversion (Takenaka, Neuwirt and Brennicke 2004; Neuwirt et al. 2005). These studies
examined an atp9 editing site and showed that five pentanucleotide regions
between -25 and -1 ranged from highly important to critical for editing site conversion,
while sequences from -35 to -25 and from +1 onward were much less important. In addition,
the +1 nucleotide itself was shown to be extremely important for editing site conversion, and
nucleotide deletions or insertions at -2 also strongly affected conversion. These results
suggest that spacing between cis-elements may be important, a conclusion supported by the
entropy analyses in this study.
The cis-acting sequences of the two editing sites in wheat cox2 are proposed to be
present within -16 to +6 nucleotides of the editing site (Farre et al. 2001; Choury et al.
2004). Single-nucleotide mutagenesis within the 23-nucleotide region of editing site C77
of the cox2 transcript demonstrated that residues at -11, -10, -9, -6, -2, and -1 were
critical for effective editing site conversion, while editing site C259 showed a similar
trend with critical residues at -12, -11, +1, +3, and +4. These positions correspond well
with regions identified as potentially important in editing site recognition in this study.
While the analysis of individual editing sites provides detailed information about an
individual editing site, this study has analyzed all editing sites of entire genomes, and
provides statistical information about the features of a “typical” editing site.
Editing Sites Distribution
A statistical analysis of the intervals between RNA editing sites demonstrated that coding
sequences frequently include groups of editing sites that are separated by gaps with no
editing sites. The mechanism of editing site acquisition is proposed to involve T to C
mutation in the genome that can be corrected by C-to-U editing (Covello and Gray 1993;
Schmitz-Linneweber et al. 2001; Schmitz-Linneweber et al. 2002; Tillich et al. 2005).
Conversely, the simplest mechanism of editing site loss would be a C to T mutation at an
edited C, such that the edited cytidine was lost from the genome. These mechanisms
would be predicted to occur randomly within a gene; however, the distribution of editing
sites is frequently non-random with large gaps between clustered RNA editing sites.
The molecular process that results in the sporadic distribution of editing sites in
these genes is open for speculation. The distribution of editing sites in the matR coding
sequence has been previously noted to correspond to regions that encode the reverse
transcriptase and maturase domains of this protein (Thomson et al. 1994; Begu et al.
1998), and the distribution of editing sites could be related to maintenance of these
functions. Alternatively, targeting of one region of a transcript by the editing
apparatus might facilitate editing of additional T to C mutations in that region, and
consequently editing sites might tend to be acquired in groups.
Figure 8 illustrates a possible mechanism for the generation of the sporadic
distribution of editing sites. Loss of RNA editing sites from relatively large regions of a
coding sequence could occur through retroconversion that would remove adjacent editing
sites by replacement with the edited sequence information. This process would
presumably require conversion of edited mRNA to cDNA through reverse transcription,
and there is limited evidence for reverse transcriptase activity in plant mitochondria
(Wahleithner, MacFarlane and Wolstenholme 1990; Begu et al. 1998; Farre and Araya
1999). Recombination or gene conversion could integrate the edited information into the
genome, and this process is thought to happen readily in plant mitochondria (Knoop
2004).
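The mechanism proposed above is that a single conversion event removes every editing site in a contiguous tract. A toy simulation can illustrate this. Everything in the sketch below is hypothetical: the site positions, the tract boundaries and the helper names are invented for illustration and are not taken from any of the genomes analyzed here.

```python
def convert_tract(sites, start, end):
    """Editing sites left after a hypothetical gene-conversion tract [start, end).

    Inside the tract, the genomic C at each edited position is replaced by the
    T carried on the edited cDNA, so those positions no longer need editing.
    """
    return [s for s in sites if not start <= s < end]


def largest_gap(sites):
    """Largest distance between consecutive editing sites."""
    ordered = sorted(sites)
    return max(b - a for a, b in zip(ordered, ordered[1:]))


# Hypothetical 1000-nt coding sequence with an editing site every 50 nt.
sites = list(range(25, 1000, 50))
after = convert_tract(sites, 300, 800)
print(len(sites), largest_gap(sites))  # 20 50
print(len(after), largest_gap(after))  # 10 550
```

One conversion tract turns a uniform pattern into two clusters separated by a long unedited stretch. Random point mutations at individual sites would not be expected to produce this kind of gap.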
A number of individual cases of editing site loss in plant mitochondrial transcripts may
have occurred through retroconversion. Editing in cox3 and rps13 transcripts was completely or nearly
eliminated in the Iridaceae and Amaryllidaceae, yet these transcripts include numerous
editing sites in related dicots (Lopez, Picardi and Quagliariello 2007). A similar example
involved the loss of editing sites from cox1 in several gymnosperm taxa, yet cox1 is
heavily edited with 25 to 34 editing sites in related species (Lu, Szmidt and Wang 1998).
Two compelling examples involve the loss of introns and the adjacent editing sites in the
Caryophyllales and the Asterales. The nad4 gene lost an intron in the Caryophyllales,
and editing sites were eliminated near the newly created exon-exon boundary (Itchoda et
al. 2002), while the nad4 gene of Lactuca lost two introns as well as the RNA editing
sites in that region of the gene (Geiss, Abbas and Makaroff 1994). The simultaneous loss
of both introns and the adjacent editing sites strongly suggests that the process involved
recombination with a spliced and edited intermediate.
Numerous examples of gene transfer from the mitochondrial to the nuclear
genome demonstrate that the nuclear forms of these transferred genes have lost
mitochondrial introns and editing sites (Nugent and Palmer 1991; Kadowaki et al. 1996).
Mitochondrial gene transfer to the nucleus would require RNA-mediated transfer through
a cDNA intermediate, or wholesale loss of editing sites and introns in the mitochondrial
genome prior to DNA-mediated gene transfer (Henze and Martin 2001). Thus, RNA
editing represents an obstacle to gene transfer to the nucleus, and may contribute to the
retention of genes in plant mitochondrial genomes. Loss of editing sites in the
mitochondrial genome would facilitate DNA-mediated gene transfer to the nucleus
(Henze and Martin 2001). Both DNA-mediated and RNA-mediated gene transfer would
be facilitated by mechanisms related to retroconversion, either through removal of editing sites
prior to DNA-mediated gene transfer or through production of a cDNA for
integration into the nuclear genome.
While individual examples of editing site and intron loss suggest that
retroconversion has occurred in specific taxonomic groups, the statistical analysis of the
distribution of editing sites in mitochondrial genomes demonstrates that numerous genes
exhibit a sporadic distribution of editing sites. Taken together, these observations suggest
that loss of editing sites has occurred periodically and may have important consequences
in the evolution of plant mitochondrial genomes.
Supplemental Information
Text files of DNA sequence information for the coding sequences of the Arabidopsis,
Beta, and Oryza mitochondrial genomes are provided. The files are in fasta format and
edited cytidines are represented as an upper case “C”. Figure 1E-H shows relative
entropy analyses for editing sites in the Brassica and Beta genomes. A detailed table of
trinucleotides and selectivity ratios around RNA editing sites is provided as a supplement
to Figure 7.
Acknowledgements
The authors are grateful to Dr. Brandon Gaut for assistance with statistical analyses,
experimental design, and thoughtful discussion. Kenneth LC Chang and Chia Ching
Chou each contributed equally to this work. Ms. Nam Nguyen provided excellent
technical assistance.
References
Begu D, Mercado A, Farre JC, Moenne A, Holuigue L, Araya A, Jordana X. 1998. Editing status of mat-r transcripts in mitochondria from two plant species: C-to-U changes occur in putative functional RT and maturase domains. Curr Genet 33:420-428.
Bock R, Hermann M, Fuchs M. 1997. Identification of critical nucleotide positions for plastid RNA editing site recognition. RNA 3:1194-1200.
Bock R, Hermann M, Kossel H. 1996. In vivo dissection of cis-acting determinants for plastid RNA editing. EMBO J 15:5052-5059.
Chateigner-Boutin AL, Hanson MR. 2002. Cross-competition in transgenic chloroplasts expressing single editing sites reveals shared cis elements. Mol Cell Biol 22:8448-8456.
Chateigner-Boutin AL, Hanson MR. 2003. Developmental co-variation of RNA editing extent of plastid editing sites exhibiting similar cis-elements. Nucleic Acids Res 31:2586-2594.
Chaudhuri S, Carrer H, Maliga P. 1995. Site-specific factor involved in the editing of the psbL mRNA in tobacco plastids. EMBO J 14:2951-2957.
Chaudhuri S, Maliga P. 1996. Sequences directing C to U editing of the plastid psbL mRNA are located within a 22 nucleotide segment spanning the editing site. EMBO J 15:5958-5964.
Chaw SM, Chang CC, Chen HL, Li WH. 2004. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J Mol Evol 58:424-441.
Choury D, Farre JC, Jordana X, Araya A. 2004. Different patterns in the recognition of editing sites in plant mitochondria. Nucleic Acids Res 32:6397-6406.
Covello PS, Gray MW. 1989. RNA editing in plant mitochondria. Nature 341:662-666.
Covello PS, Gray MW. 1993. On the evolution of RNA editing. Trends Genet 9:265-268.
Cummings MP, Myers DS. 2004. Simple statistical models predict C-to-U edited sites in plant mitochondrial RNA. BMC Bioinformatics 5:132.
Farre JC, Araya A. 1999. The mat-r open reading frame is transcribed from a non-canonical promoter and contains an internal promoter to co-transcribe exons nad1e and nad5III in wheat mitochondria. Plant Mol Biol 40:959-967.
Farre JC, Leon G, Jordana X, Araya A. 2001. cis Recognition elements in plant mitochondrion RNA editing. Mol Cell Biol 21:6731-6737.
Geiss KT, Abbas GM, Makaroff CA. 1994. Intron loss from the NADH dehydrogenase subunit 4 gene of lettuce mitochondrial DNA: evidence for homologous recombination of a cDNA intermediate. Mol Gen Genet 243:97-105.
Giege P, Brennicke A. 1999. RNA editing in Arabidopsis mitochondria effects 441 C to U changes in ORFs. Proc Natl Acad Sci U S A 96:15324-15329.
Gualberto JM, Lamattina L, Bonnard G, Weil JH, Grienenberger JM. 1989. RNA editing in wheat mitochondria results in the conservation of protein sequences. Nature 341:660-662.
Handa H. 2003. The complete nucleotide sequence and RNA editing content of the mitochondrial genome of rapeseed (Brassica napus L.): comparative analysis of the mitochondrial genomes of rapeseed and Arabidopsis thaliana. Nucleic Acids Res 31:5907-5916.
Hayes ML, Hanson MR. 2007. Identification of a sequence motif critical for editing of a tobacco chloroplast transcript. RNA 13:281-288.
Henze K, Martin W. 2001. How do mitochondrial genes get into the nucleus? Trends Genet 17:383-387.
Hermann M, Bock R. 1999. Transfer of plastid RNA-editing activity to novel sites suggests a critical role for spacing in editing-site recognition. Proc Natl Acad Sci U S A 96:4856-4861.
Hiesel R, Wissinger B, Schuster W, Brennicke A. 1989. RNA editing in plant mitochondria. Science 246:1632-1634.
Itchoda N, Nishizawa S, Nagano H, Kubo T, Mikami T. 2002. The sugar beet mitochondrial nad4 gene: an intron loss and its phylogenetic implication in the Caryophyllales. Theor Appl Genet 104:209-213.
Kadowaki K, Kubo N, Ozawa K, Hirai A. 1996. Targeting presequence acquisition after mitochondrial gene transfer to the nucleus occurs by duplication of existing targeting signals. EMBO J 15:6652-6661.
Knoop V. 2004. The mitochondrial DNA of land plants: peculiarities in phylogenetic perspective. Curr Genet 46:123-139.
Kotera E, Tasaka M, Shikanai T. 2005. A pentatricopeptide repeat protein is essential for RNA editing in chloroplasts. Nature 433:326-330.
Kubo T, Nishizawa S, Sugawara A, Itchoda N, Estiati A, Mikami T. 2000. The complete nucleotide sequence of the mitochondrial genome of sugar beet (Beta vulgaris L.) reveals a novel gene for tRNA(Cys)(GCA). Nucleic Acids Res 28:2571-2576.
Kugita M, Kaneko A, Yamamoto Y, Takeya Y, Matsumoto T, Yoshinaga K. 2003a. The complete nucleotide sequence of the hornwort (Anthoceros formosae) chloroplast genome: insight into the earliest land plants. Nucleic Acids Res 31:716-721.
Kugita M, Yamamoto Y, Fujikawa T, Matsumoto T, Yoshinaga K. 2003b. RNA editing in hornwort chloroplasts makes more than half the genes functional. Nucleic Acids Res 31:2417-2423.
Lopez L, Picardi E, Quagliariello C. 2007. RNA editing has been lost in the mitochondrial cox3 and rps13 mRNAs in Asparagales. Biochimie 89:159-167.
Lu MZ, Szmidt AE, Wang XR. 1998. RNA editing in gymnosperms and its impact on the evolution of the mitochondrial coxI gene. Plant Mol Biol 37:225-234.
Maier RM, Neckermann K, Igloi GL, Kossel H. 1995. Complete sequence of the maize chloroplast genome: gene content, hotspots of divergence and fine tuning of genetic information by transcript editing. J Mol Biol 251:614-628.
Maier RM, Zeltz P, Kossel H, Bonnard G, Gualberto JM, Grienenberger JM. 1996. RNA editing in plant mitochondria and chloroplasts. Plant Mol Biol 32:343-365.
Miyamoto T, Obokata J, Sugiura M. 2002. Recognition of RNA editing sites is directed by unique proteins in chloroplasts: biochemical identification of cis-acting elements and trans-acting factors involved in RNA editing in tobacco and pea chloroplasts. Mol Cell Biol 22:6726-6734.
Mower JP. 2005. PREP-Mt: predictive RNA editor for plant mitochondrial genes. BMC Bioinformatics 6:96.
Mower JP, Palmer JD. 2006. Patterns of partial RNA editing in mitochondrial genes of Beta vulgaris. Mol Genet Genomics 276:285-293.
Mulligan RM. 2004. RNA Editing in Plant Organelles. Pp. 239-260 in H. Daniel, and C. Chase, eds. Molecular Biology and Biotechnology of Plant Organelles. Springer, Dordrecht, NL.
Neuwirt J, Takenaka M, van der Merwe JA, Brennicke A. 2005. An in vitro RNA editing system from cauliflower mitochondria: editing site recognition parameters can vary in different plant species. RNA 11:1563-1570.
Notsu Y, Masood S, Nishikawa T, Kubo N, Akiduki G, Nakazono M, Hirai A, Kadowaki K. 2002. The complete sequence of the rice (Oryza sativa L.) mitochondrial genome: frequent DNA sequence acquisition and loss during the evolution of flowering plants. Mol Genet Genomics 268:434-445.
Nugent JM, Palmer JD. 1991. RNA-mediated transfer of the gene coxII from the mitochondrion to the nucleus during flowering plant evolution. Cell 66:473-481.
Reed ML, Hanson MR. 1997. A heterologous maize rpoB editing site is recognized by transgenic tobacco chloroplasts. Mol Cell Biol 17:6948-6952.
Reed ML, Peeters NM, Hanson MR. 2001. A single alteration 20 nt 5' to an editing target inhibits chloroplast RNA editing in vivo. Nucleic Acids Res 29:1507-1513.
Schmitz-Linneweber C, Regel R, Du TG, Hupfer H, Herrmann RG, Maier RM. 2002. The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation. Mol Biol Evol 19:1602-1612.
Schmitz-Linneweber C, Tillich M, Herrmann RG, Maier RM. 2001. Heterologous, splicing-dependent RNA editing in chloroplasts: allotetraploidy provides trans-factors. EMBO J 20:4874-4883.
Schmitz-Linneweber C, Williams-Carrier RE, Williams-Voelker PM, Kroeger TS, Vichas A, Barkan A. 2006. A pentatricopeptide repeat protein facilitates the trans-splicing of the maize chloroplast rps12 pre-mRNA. Plant Cell 18:2650-2663.
Sugiura M. 1995. The chloroplast genome. Essays Biochem 30:49-57.
Takenaka M, Neuwirt J, Brennicke A. 2004. Complex cis-elements determine an RNA editing site in pea mitochondria. Nucleic Acids Res 32:4137-4144.
Thomson MC, Macfarlane JL, Beagley CT, Wolstenholme DR. 1994. RNA editing of mat-r transcripts in maize and soybean increases similarity of the encoded protein to fungal and bryophyte group II intron maturases: evidence that mat-r encodes a functional protein. Nucleic Acids Res 22:5745-5752.
Tillich M, Funk HT, Schmitz-Linneweber C, Poltnigg P, Sabater B, Martin M, Maier RM. 2005. Editing of plastid RNA in Arabidopsis thaliana ecotypes. Plant J 43:708-715.
Wahleithner JA, MacFarlane JL, Wolstenholme DR. 1990. A sequence encoding a maturase-related protein in a group II intron of a plant mitochondrial nad1 gene. Proc Natl Acad Sci U S A 87:548-552.
Table 1. Distribution of dinucleotides in the -2/-1 window in mitochondrial (A) and chloroplast (B) editing sites.
A.                           Arabidopsis                            Betaᵃ    Oryzaᵃ
dinucleotide   # edited   # unedited   P       Q       P/Q     P/Q      P/Q
UU             117        754          0.290   0.126   2.29    2.13     2.26
CU             54         400          0.134   0.067   1.99    2.40     2.35
UC             60         457          0.149   0.077   1.94    2.21     2.38
AU             61         489          0.151   0.082   1.84    1.60     1.50
AC             28         325          0.069   0.054   1.27    1.07     0.57
CC             23         295          0.057   0.049   1.15    1.46     1.37
GU             20         304          0.050   0.051   0.97    0.86     1.23
GC             15         276          0.037   0.046   0.80    0.75     0.47
UG             8          386          0.020   0.065   0.31    0.18     0.28
UA             8          413          0.020   0.069   0.29    0.45     0.30
CG             2          229          0.005   0.038   0.13    0.16     0.07
AA             3          381          0.007   0.064   0.12    0.04     0.04
CA             2          280          0.005   0.047   0.11    0.06     0.22
GA             1          285          0.002   0.048   0.05    0.07     0.06
GG             1          302          0.002   0.051   0.05    0.00     0.16
AG             1          389          0.002   0.065   0.04    0.00     0.00
Sum            404        5965

B.
               Mitochondria   Chloroplast
dinucleotide   At mt P/Q      Nt ct P/Q   Zm ct P/Q
UU             2.29           3.10        2.64
CU             1.99           0.64        1.52
UC             1.94           1.96        1.14
AU             1.84           2.26        2.23
AC             1.27           0.66        0.79
CC             1.15           2.50        1.84
GU             0.97           0           2.02
GC             0.80           0           0
UG             0.31           0           0
UA             0.29           0           0.52
CG             0.13           0           0
AA             0.12           0           0
CA             0.11           0           0
GA             0.05           0           0
GG             0.05           0           0
AG             0.04           0           0
a The number of edited and unedited cytidines analyzed in the Arabidopsis, Beta, and Oryza genomes is 404/5965; 332/5161; and 376/4683, respectively.
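The P, Q and P/Q columns of Table 1 follow directly from the counts:

- P is the fraction of edited cytidines preceded by a given -2/-1 dinucleotide (for UU, 117/404 = 0.290).
- Q is the corresponding fraction for unedited cytidines (754/5965 = 0.126).
- The selectivity ratio is P/Q.

The Python sketch below recomputes the Arabidopsis ratios from the Table 1A counts. It also includes a hypothetical helper that tallies the dinucleotides directly from a sequence. That helper relies on the supplemental-file convention that edited cytidines are written as an upper-case "C". As an additional assumption, it treats all other bases as lower case. Both function names are illustrative.

```python
from collections import Counter


def upstream_dinucleotide_counts(seq):
    """Tally the -2/-1 dinucleotide 5' of each edited and unedited cytidine.

    Assumes edited cytidines are an upper-case "C" and all other bases are
    lower case; flanking bases are reported as RNA (T -> U), as in Table 1.
    """
    edited, unedited = Counter(), Counter()
    for i in range(2, len(seq)):
        if seq[i] == "C":
            target = edited
        elif seq[i] == "c":
            target = unedited
        else:
            continue
        target[seq[i - 2:i].upper().replace("T", "U")] += 1
    return edited, unedited


def selectivity_ratios(edited, unedited):
    """Return {dinucleotide: (P, Q, P/Q)} from edited and unedited tallies."""
    n_edited = sum(edited.values())
    n_unedited = sum(unedited.values())
    table = {}
    for dinuc, count in unedited.items():
        p = edited.get(dinuc, 0) / n_edited
        q = count / n_unedited
        table[dinuc] = (p, q, p / q)
    return table


# Arabidopsis (# edited, # unedited) counts from Table 1A.
AT = {"UU": (117, 754), "CU": (54, 400), "UC": (60, 457), "AU": (61, 489),
      "AC": (28, 325), "CC": (23, 295), "GU": (20, 304), "GC": (15, 276),
      "UG": (8, 386), "UA": (8, 413), "CG": (2, 229), "AA": (3, 381),
      "CA": (2, 280), "GA": (1, 285), "GG": (1, 302), "AG": (1, 389)}
ratios = selectivity_ratios(Counter({k: e for k, (e, _) in AT.items()}),
                            Counter({k: u for k, (_, u) in AT.items()}))
p, q, pq = ratios["UU"]
print(round(p, 3), round(q, 3), round(pq, 2))  # 0.29 0.126 2.29
```

The same two functions apply unchanged to any window. Only the slice taken in `upstream_dinucleotide_counts` would move, for example to the +1/+2 positions.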
Table 2. Distribution of RNA editing sites in mitochondrial genesᵇ.
Gene                       p value   Gene                p value   Gene               p value
Atmt matR                  0.002     Bvmt nad6           0.002     Osmt cox1          0.001
Atmt atp1                  0.006     Bvmt matR           0.003     Osmt nad6          0.001
Atmt rps4                  0.015     Bvmt ccmFn          0.010     Osmt atp6          0.008
Atmt nad5 Ex2              0.017     Bvmt ccmFc Ex1      0.014     Osmt rps2          0.011
Atmt ccmFc Ex1 (ccb452)    0.019     Bvmt nad5 Ex2       0.027     Osmt rps4          0.032
Atmt nad6                  0.037     Bvmt ccmB           0.033     Osmt ccmFn         0.040
Atmt ccmB (ccb206)         0.039     Bvmt atp1           0.039
                                     Bvmt rpl5           0.044
                                     Bvmt atp4           0.045
Atmt rpl5                  0.083     Bvmt rps4           0.063     Osmt ccmFc Ex1     0.196
Atmt rps3 Ex2              0.165     Bvmt ccmFc Ex2      0.122     Osmt nad9          0.222
Atmt ccmC (ccb256)         0.181     Bvmt nad4 Ex2       0.297     Osmt atp4          0.236
Atmt tatC                  0.188     Bvmt cox3           0.365     Osmt tatC          0.327
Atmt rpl16                 0.256     Bvmt nad9           0.455     Osmt rpl16         0.454
Atmt ccmFc Ex2 (ccb452)    0.298     Bvmt ccmC           0.537     Osmt nad2 Ex4      0.527
Atmt ccmFn (ccb382)        0.318     Bvmt atp6           0.616     Osmt ccmC          0.569
Atmt atp4                  0.365     Bvmt nad2 Ex4       0.713     Osmt ccmB          0.607
Atmt cox2 Ex1              0.460     Bvmt rps3           0.716     Osmt cob           0.607
Atmt nad2 Ex4              0.559     Bvmt cob            0.778     Osmt rps1          0.674
Atmt ccb203 (ccmFN2)       0.562     Bvmt mttB (tatC)    0.833     Osmt ccmFc Ex2     0.952
Atmt cox3                  0.597
Atmt cob                   0.818
Atmt nad4 Ex2              0.864
Atmt nad9                  0.947
b Protein coding sequences or exons greater than 500 nucleotides and with three or more editing sites were evaluated for editing site distribution. The observed variance for the intervals between editing sites was compared to the variance of 1000 trials of random editing site assignment.
Figure Legends.
Figure 1. Nucleotide sequences around RNA editing sites in monocot and dicot
mitochondria have similar entropy profiles. Relative entropy of the nucleotide
distribution is plotted for the 50 nucleotides flanking RNA editing sites (panel A) and for the
-20 to +8 region analyzed in 1, 2, or 3 nucleotide windows (panels B, C, D) for the
Arabidopsis and Oryza mitochondrial genomes. Random editing site assignment was used to
produce randomly edited mitochondrial genome files, and relative entropy analysis of 1000
random assignments was used to determine a mean relative entropy value and a 5%
confidence interval. The Brassica napus and Beta vulgaris mitochondrial genomes were
also analyzed, and these results are provided in the supplemental information.
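Relative entropy (Kullback-Leibler divergence) quantifies how far the nucleotide distribution at one position around edited cytidines departs from a reference distribution. The sketch below is an illustration under assumptions and not the authors' pipeline. Its choices are as follows:

- The reference is taken to be the distribution around unedited cytidines.
- It computes D(P||Q) = Σ p log2(p/q), in bits, for single-nucleotide windows.
- A pseudocount of 0.5 (a hypothetical choice) is added to avoid taking the logarithm of zero.
- The random-assignment baseline is built by reshuffling the edited/unedited labels and reporting the per-position mean and 95th percentile.

Windows are assumed to be equal-length upper-case RNA strings, and all function names are illustrative.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGU"


def relative_entropy(p_counts, q_counts, pseudocount=0.5):
    """D(P || Q) in bits between two nucleotide Counters.

    A pseudocount is added to every base in both tables (an assumption made
    here so that no frequency is zero).
    """
    n_p = sum(p_counts[a] for a in ALPHABET) + pseudocount * len(ALPHABET)
    n_q = sum(q_counts[a] for a in ALPHABET) + pseudocount * len(ALPHABET)
    d = 0.0
    for a in ALPHABET:
        p = (p_counts[a] + pseudocount) / n_p
        q = (q_counts[a] + pseudocount) / n_q
        d += p * math.log2(p / q)
    return d


def entropy_profile(edited_windows, unedited_windows):
    """Relative entropy at each column of equal-length windows."""
    profile = []
    for j in range(len(edited_windows[0])):
        p = Counter(w[j] for w in edited_windows)
        q = Counter(w[j] for w in unedited_windows)
        profile.append(relative_entropy(p, q))
    return profile


def random_baseline(edited_windows, unedited_windows, trials=1000, seed=0):
    """Per-column mean and 95th percentile of D after shuffling site labels."""
    rng = random.Random(seed)
    pooled = list(edited_windows) + list(unedited_windows)
    k = len(edited_windows)
    runs = []
    for _ in range(trials):
        rng.shuffle(pooled)
        runs.append(entropy_profile(pooled[:k], pooled[k:]))
    means, upper = [], []
    for column in zip(*runs):
        ordered = sorted(column)
        means.append(sum(ordered) / trials)
        upper.append(ordered[max(0, math.ceil(0.95 * trials) - 1)])
    return means, upper


# Toy two-column windows: position -1 followed by the cytidine itself.
edited = ["UC", "UC", "UC"]
unedited = ["AC", "GC", "CC", "UC"]
profile = entropy_profile(edited, unedited)
print(profile[0] > profile[1])  # True: the -1 column is far more informative
```

Two- and three-nucleotide windows follow the same pattern. The alphabet would become the 16 dinucleotides or 64 trinucleotides, and each column would be replaced by a slice of that width.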
Figure 2. Selectivity ratios for Arabidopsis editing sites suggest that multiple
contiguous nucleotides are important in editing site recognition in the -6 to -4
region. Selectivity ratios for mono-, di- and trinucleotides in the -6 to -4 region are
shown in the top, middle and bottom of the figure, respectively. The selectivity ratio for uridine at -5
is very high; however, the distributions of C at -5 and of the other mononucleotides at -4 and -6
are not notable. Selectivity ratios for dinucleotides show that CU and CC are enriched at
-6/-5 and that UA and CG are enriched at -5/-4 around editing sites. The trinucleotide
CCG has a greater selectivity ratio than CUA, even though CUA includes the highly enriched U at
position -5.
Figure 3. Selectivity ratios of dinucleotides near RNA editing sites are similar in
Arabidopsis, Beta, and Oryza mitochondrial genomes. A. Selectivity ratios (P/Q)
for dinucleotides in the -2/-1 window upstream of edited and unedited cytidines in
Arabidopsis are plotted against the corresponding values in Oryza or Beta. Thus,
each point represents the selectivity ratio for a specific dinucleotide, and a
large number of values are clustered near the origin. Regression analysis of the Oryza
(Os) and the Arabidopsis (At) selectivity ratios gives an equation of y = 1.003x - 0.03
with a coefficient of determination of r² = 0.90. Regression analysis of the Beta (Bv) and
the Arabidopsis (At) selectivity ratios gives an equation of y = 1.04x - 0.02 with a
coefficient of determination of r² = 0.96. B. Selectivity ratios (P/Q) for dinucleotides
in the +1/+2 window in Arabidopsis are plotted against the corresponding values in
Oryza or Beta. Regression analysis of the Oryza (Os) and the Arabidopsis (At)
selectivity ratios gives an equation of y = 1.03x - 0.03 with a coefficient of determination
of r² = 0.69. Regression analysis of the Beta (Bv) and the Arabidopsis (At) selectivity
ratios gives an equation of y = 1.27x - 0.24 with a coefficient of determination of r² =
0.84.
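The regression lines in Figures 3 and 4 fit one set of selectivity ratios against another, presumably by ordinary least squares. The coefficient of determination for a simple linear fit is the squared Pearson correlation. The Python sketch below uses a hypothetical helper name to compute such a fit. Applied to the rounded -2/-1 ratios for Arabidopsis (x) and Beta (y) in Table 1A, it gives a slope near 1.04, an intercept near -0.03 and r² near 0.95. These values are close to, but not identical with, those in the Figure 3 legend, possibly because Table 1 reports rounded ratios.

```python
def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus r^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2


# -2/-1 selectivity ratios (P/Q) from Table 1A, in table order UU ... AG.
at = [2.29, 1.99, 1.94, 1.84, 1.27, 1.15, 0.97, 0.80,
      0.31, 0.29, 0.13, 0.12, 0.11, 0.05, 0.05, 0.04]
bv = [2.13, 2.40, 2.21, 1.60, 1.07, 1.46, 0.86, 0.75,
      0.18, 0.45, 0.16, 0.04, 0.06, 0.07, 0.00, 0.00]
slope, intercept, r2 = linear_fit(at, bv)
print(f"y = {slope:.2f}x {intercept:+.2f}, r^2 = {r2:.2f}")
```

Because many dinucleotides have ratios near zero, a few enriched dinucleotides such as UU, CU, UC and AU largely determine the slope and r².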
Figure 4. Selectivity ratios for dinucleotides upstream of RNA editing sites are
similar in mitochondrial and chloroplast genomes. Selectivity ratios (P/Q) for
dinucleotides in the -2/-1 window upstream of edited and unedited cytidines in the
Arabidopsis mitochondrial genome (X axis) are plotted against those in the Nicotiana (Nt)
or Zea mays (Zm) chloroplast genome (Y axis). Thus, each point represents the selectivity
ratio for a specific dinucleotide, and a large number of values are clustered near the origin.
Regression analysis of the Nicotiana chloroplast (Nt ct) and the Arabidopsis
mitochondrial (At mt) selectivity ratios gives an equation of y = 1.07x - 0.19 with a
coefficient of determination of r² = 0.65. Regression analysis of the Zea mays chloroplast
(Zm ct) and the Arabidopsis mitochondrial (At mt) selectivity ratios gives an equation of
y = 1.00x - 0.03 with a coefficient of determination of r² = 0.73.
Figure 5. The Effect of Codon Position on Relative Entropy. Relative entropy was
determined in a one nucleotide window for editing sites in the first, second or third codon
position (CPA1, 2, and 3) in the Arabidopsis (A) and Oryza (B) genomes. The
number of editing sites analyzed in the first, second, and third codon position was 103,
163, and 26 in the Arabidopsis genome and 122, 171, and 32 in the Oryza genome,
respectively.
Figure 6. Editing Sites are Sporadically Distributed in Plant Mitochondrial Genes.
The positions of editing sites in coding sequences that exhibit a non-random distribution
of RNA editing sites are illustrated on a line graph. Each RNA editing site is
shown as a vertical line on a line representing the length of the coding sequence. The
average size of the largest gap for the genes that exhibited p values less than 0.05 was
533, 545, and 559 nucleotides in the Arabidopsis, Beta, and Oryza genomes, respectively.
Figure 7. Model of RNA Editing Site Recognition. A model for the interaction of the
editing apparatus with an editing substrate is shown. The edited cytidine is shown as a
bolded C, and regions where relative entropy is high are shown as an upper case N.
Groups of nucleotides that are frequently present in these positions are shown under the
RNA sequence. The groups of di- and tri-nucleotides noted in this figure show high
selectivity ratios that exceed the 5% confidence interval derived from the mean and standard
deviation of the selectivity ratios determined from randomly assigned editing sites.
Trinucleotides marked with a single asterisk were significant only in the monocot, Oryza,
and trinucleotides marked with two asterisks were significant only in Arabidopsis.
Nucleotides with no asterisk were significant in both taxa.
Figure 8. Gene Conversion Model of Editing Site Loss Resulting in the Clustered
Distribution of RNA Editing Sites. Gene conversion events in the mitochondrial
genome that utilized cDNA sequence derived from edited mRNA would convert the
cytidines at editing sites to thymidines that would not require editing. This process
would eliminate editing sites within the region that experienced gene conversion,
creating stretches of coding sequence with no editing sites and leaving clusters of
editing sites within regions that had not experienced gene conversion.