University of Iowa
Iowa Research Online
Theses and Dissertations
2013
Mechanisms Of MicroRNA evolution, regulation and function: computational insight, biological evaluation and practical application
Ryan Michael Spengler
University of Iowa
Copyright 2013 Ryan Spengler
This dissertation is available at Iowa Research Online: http://ir.uiowa.edu/etd/2636
Follow this and additional works at: http://ir.uiowa.edu/etd
Part of the Cell Biology Commons
Recommended Citation
Spengler, Ryan Michael. "Mechanisms Of MicroRNA evolution, regulation and function: computational insight, biological evaluation and practical application." PhD (Doctor of Philosophy) thesis, University of Iowa, 2013. http://ir.uiowa.edu/etd/2636.
MECHANISMS OF MICRORNA EVOLUTION, REGULATION AND FUNCTION:
COMPUTATIONAL INSIGHT, BIOLOGICAL EVALUATION
AND PRACTICAL APPLICATION
by
Ryan Michael Spengler
An Abstract
Of a thesis submitted in partial fulfillment of the requirements for the Doctor of
Philosophy degree in Molecular and Cellular Biology in the Graduate College of
The University of Iowa
May 2013
Thesis Supervisor: Professor Beverly L. Davidson
ABSTRACT
MicroRNAs (miRNAs) are an abundant and diverse class of small, non-protein-coding
RNAs that guide the post-transcriptional repression of messenger RNA (mRNA)
targets in a sequence-specific manner. Hundreds, if not thousands, of distinct miRNA
sequences have been described, each of which has the potential to regulate a large number of
mRNAs. Over the last decade, miRNAs have been ascribed roles in nearly every biological
process in which they have been tested. More recently, interest has grown in understanding
how individual miRNAs evolved and how they are regulated. In this work, we demonstrate
that transposable elements (TEs) are a source of novel miRNA genes and miRNA target sites. We
find that primate-specific miRNA binding sites were gained through the transposition of Alu
elements. We also find that remnants of Mammalian-wide Interspersed Repeat (MIR) transposition, which
occurred early in mammalian evolution, provide highly conserved functional miRNA binding
sites in the human genome. We also provide data supporting the idea that long non-coding RNAs
(lncRNAs) can serve as a novel miRNA binding substrate that, rather than being repressed by the
miRNA, inhibits the miRNA. As such, lncRNAs are proposed to function as
endogenous miRNA “sponges,” competing for miRNA binding and reducing miRNA-mediated
repression of protein-coding mRNA targets. We also explored how adenosine-to-inosine (A-to-I)
editing of mRNA 3’UTRs can dynamically change miRNA binding sites. These studies, together with
knowledge gained from the regulatory activity of endogenous and exogenously added miRNAs,
provided a platform for algorithm development that can be used in the rational design of
artificial RNAi triggers with improved target specificity. The cumulative results from our
studies identify, and in some cases clarify, important mechanisms for the emergence of miRNAs
and miRNA binding sites on large (over eons) and small (developmental) time scales, and help
translate these gene silencing processes into practical applications.
Abstract Approved: ____________________________________ Thesis Supervisor
____________________________________ Title and Department
____________________________________ Date
MECHANISMS OF MICRORNA EVOLUTION, REGULATION AND FUNCTION:
COMPUTATIONAL INSIGHT, BIOLOGICAL EVALUATION
AND PRACTICAL APPLICATION
by
Ryan Michael Spengler
A thesis submitted in partial fulfillment of the requirements for the Doctor of
Philosophy degree in Molecular and Cellular Biology in the Graduate College of
The University of Iowa
May 2013
Thesis Supervisor: Professor Beverly L. Davidson
Graduate College
The University of Iowa
Iowa City, Iowa
CERTIFICATE OF APPROVAL
_______________________
PH.D. THESIS
_______________
This is to certify that the Ph.D. thesis of
Ryan Michael Spengler
has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Molecular and Cellular Biology at the May 2013 graduation.
Thesis Committee: ___________________________________ Beverly L. Davidson, Thesis Supervisor
___________________________________ Adam Dupuy
___________________________________ John Logsdon
___________________________________ Andrew Russo
___________________________________ Yi Xing
ACKNOWLEDGMENTS
First of all, I would like to thank my mentor, Dr. Bev Davidson, for her guidance,
motivation and most of all, patience. I also must acknowledge all the members of the
Davidson Lab, both past and present, who have been an invaluable source of knowledge,
discussion and guidance over the years. In particular, I would like to thank Ryan
Boudreau and Alex Mas Monteys with whom I have closely collaborated and who are
responsible for some of the work presented in this manuscript.
I also thank Dr. Anton McCaffrey, my first research mentor, who took the time to
train me in the basics of molecular biology techniques. He encouraged me to think
outside the box, and guided me as I learned to interpret data and manage my own
research projects.
I owe special thanks to the entire faculty of the Biology Department
at Augustana College, who first taught me to think about science and encouraged me to
explore my own interests. Dr. Kristin Douglas and Dr. Dara Wegman-Geedey were
particularly amazing mentors who guided me in my own research. Dr. Douglas deserves
a special acknowledgement as she first introduced me to microRNAs, which sparked my
interest in the subject and led me to follow that interest in my graduate research.
Finally, words cannot truly describe my appreciation for the love and support my
family has given me over the years. My wife, Erin, most of all has been a vital source of
encouragement and the fact that I am writing this manuscript is in large part due to her
always being there for me.
Thank you all.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER
1. INTRODUCTION
    miRNAs: biogenesis
    miRNAs: mechanism of action
    miRNAs: transcriptional and co-transcriptional control
    miRNAs: post-transcriptional control
    miRNAs: changing the mature miRNA sequence modulates target profiles
    miRNAs: changing mRNA sequence modulates target profiles
    Long noncoding RNAs
    Exogenous RNAi
    Exogenous RNAi: implementation and design
    Objectives
    Summary
    Published work
2. TRANSPOSABLE ELEMENTS CREATE FUNCTIONAL MICRORNAS AND MICRORNA TARGET SITES
    Abstract
    Introduction
    Methods
        Data retrieval and parsing
            3’UTR analyses
            3’UTR TEs
        miRNA target prediction and TE annotation
            TargetScan MRE predictions
            Local position to global coordinate conversion
            Intersection of MRE and TE genomic coordinates
            Alu-MRE positional enrichment relative to Alu consensus
            Generating unique MRE coordinates
        TE annotations of miRNA genes
        Detailed positional analysis of Alu-MREs
        Microarray analysis
        Cloning 3’UTR reporters
        Cloning endogenous microRNAs
        Cell culture and transfections
        Luciferase assays
        RT-qPCR
            microRNA
    Results
        MiRNAs have predicted binding sites in 3’UTR-resident TE sequences
        Let-7 directly regulates genes through conserved, MIR-element-derived target sites
        miRNAs with high Alu-MRE frequency target specific regions in the Alu
        miR-24 directly regulates transcripts through Alu-derived target sites
        Proliferation of Alu and B1 SINEs resulted in the convergent acquisition of miRNA targets in their respective primate and murine lineages
        Potentially-active Alu loci contain miRNA binding motifs
        MiRNAs are processed from TE sequences and regulate target genes containing homologous elements
        Functional validation of Alu-derived miRNAs
    Discussion
3. LONG INTERGENIC NON-CODING RNAS ARE A POTENTIAL SOURCE OF ENDOGENOUS MICRORNA “SPONGES”
    Abstract
    Introduction
    Methods
        Data sources
        Prediction and analysis of MRE content in lncRNAs
        RNA isolation and RT-PCR
        Ago immunoprecipitation
    Results
        Abundant MRE content is evident in many mouse lncRNAs
        Expression pattern of lncRNA, PSMI16
            Adult mouse brain
            Developing mouse at e14.5
            Other adult mouse tissues and cell lines
        PSMI16 associates with Ago2
        “Modular” exon structure and differential MRE inclusion in PSMI16 alternative isoforms
    Discussion
4. SISPOTR: A TOOL FOR DESIGNING HIGHLY SPECIFIC AND POTENT SIRNAS FOR HUMAN AND MOUSE
    Abstract
    Introduction
    Methods
        Dataset and Sequence Retrieval
        Formulating POTS
            Dataset selection
            Establishing weighted probability of repression (PR) values and POTS calculation
        Tissue-specific POTS analysis
        Validating siSPOTR
            Efficacy
            Ranking off-targeting potential
            Suppression signatures
        SiRNA Design Tool Comparison
        Genome-wide shRNA coverage analysis and prospective library generation and comparison
    Results
        Low off-targeting siRNAs maintain potency
        Design of effective low off-targeting potential siRNAs
            Strand-biasing
            GC-content
            Seed specificity
            SiSPOTR design example
        Validation of siSPOTR algorithm: efficacy and specificity
            Efficacy
            Off-targeting potential
        Comparison of siSPOTR to other algorithms
        Prospective applications to expressed RNAi and genome-wide RNAi libraries
        SiSPOTR Online Tool
    Discussion
        Consideration of Seed Pairing Stability
        The Utility of siSPOTR
5. FINAL DISCUSSION
    Competitive Endogenous RNAs
    Off-targeting and RNAi design
    Emerging technologies in the study of miRNA biology
APPENDIX
    ADENOSINE DEAMINATION IN HUMAN TRANSCRIPTS GENERATES NOVEL MICRORNA BINDING SITES
        Abstract
        Introduction
        Results
            Adenosine deamination creates miRNA complementarities
            MiR-513 and miR-769-3p/-450b-3p specifically target deamination sites
            MiR-513 and miR-769-3p repress deaminated sequences
            MiR-769-3p represses DFFA expression specifically in cells that deaminate the DFFA 3’ UTR
        Discussion
        Materials and Methods
            Informatics evaluation of ADAR deamination sites
            Vector construction
            Luciferase assays
            Western blotting
REFERENCES
LIST OF TABLES
Table 2-1. miRNAs have predicted MREs in potentially-active Alus in the human genome
Table 3-1. Putative lncRNA “sponges” and MRE frequency for conserved miRNAs
Table 4-1. Comparison of siRNA design tools
Table 4-2. The effect of seed position 8 on off-targeting potential by site frequency
Table 4-3. The effect of seed position 8 on off-targeting potential by POTS
Table A-1. MiRNA seed matches markedly enriched by adenosine deamination
Table A-2. A-to-I editing occurs predominantly in noncoding regions of expressed sequences
LIST OF FIGURES
Figure 1-1. Canonical microRNA Biogenesis Pathway (Davis-Dusenbery & Hata, 2010)
Figure 1-2. Anatomy of lncRNA loci (Adapted from Rinn & Chang, 2012)
Figure 2-1. TE family composition of putative TE-MREs in human 3’UTRs
Figure 2-2. TE-MRE composition and unbiased gene function analysis reveal strong functional connections between let-7 and MIR-derived MREs
Figure 2-3. Genome browser views for let-7 MIR-derived MREs in (A) MYO1F and (B) E2F6
Figure 2-4. Let-7 regulates 3’UTRs containing MIR-derived MREs
Figure 2-5. TE-MRE compositions for (A) miR-24-3p and (B) miR-122 show a prominent Alu fraction
Figure 2-6. Most frequent Alu-MRE sequences map to distinct positions relative to the Alu consensus
Figure 2-7. Alu-derived MREs respond to miR-24 overexpression
Figure 2-8. Microarray datasets measuring response to miRNA overexpression to assess functional response of Alu-derived targets on a global scale
Figure 2-9. The fraction of down-regulated genes with Alu-derived MREs is in proportion to their overall prevalence
Figure 2-10. Functional miR-24 MREs are independently created in rodent and primate clades due to lineage-specific but homologous TE families
Figure 2-11. miR-28 is derived from a LINE2c retrotransposon, is highly conserved and regulates transcripts with LINE2-embedded MRE sequences
Figure 2-12. Alu-derived miR-1285-1 is effectively processed and mediates knockdown of genes with Alu-MREs
Figure 2-13. Pol III intronic promoters drive intronic miRNA expression
Figure 3-14. Proposed mechanism for microRNA competitive inhibition by endogenous long non-coding RNA “sponges”
Figure 3-15. Distribution of MRE frequency in predicted miRNA/lncRNA pairs
Figure 3-16. PSMI16 (NR_015505) in situ hybridization reveals strong regional expression in adult mouse brain
Figure 3-17. Strong regional expression of PSMI16 is seen in the developing mouse (14.5 DPC) by in situ hybridization
Figure 3-18. PSMI16 expression by RT-PCR in (A) adult mouse tissues and (B) cell lines
Figure 3-19. PSMI16 associates with Ago proteins in mouse neural progenitor cells
Figure 3-20. Differential MRE incorporation in alternative PSMI16 isoforms
Figure 4-1. Diagram of on- and off-target silencing by siRNAs
Figure 4-2. Effect of siRNA off-targeting potential on gene silencing capacity
Figure 4-3. Formulation and distribution of POTS (potential off-targeting score)
Figure 4-4. Correlation of POTS ranks across tissues
Figure 4-5. Workflow schematic for designing siRNAs targeting human PPIB using the siSPOTR algorithm
Figure 4-6. Validation of siSPOTR: efficacy and off-targeting
Figure 4-7. Spearman rank correlation of final POTS values
Figure 4-8. Effect of POTS on off-targeting from hairpin-based RNAi expression vectors
Figure 4-9. Comparison of off-targeting potentials among shRNA libraries
Figure A-1. ADARs deaminate adenosine to inosine, potentially altering miRNA complementarities
Figure A-2. A-to-I edits frequently create miR-513 and miR-769-3p / -450b-3p complementarities
Figure A-3. miR-513 and miR-769-3p target MAIDs but not the corresponding unedited sequence
Figure A-4. Endogenous MAIDs are targets for miR-513 and miR-769-3p repression
Figure A-5. MiR-769 selectively represses DFFA protein
LIST OF ABBREVIATIONS
ADAR Adenosine Deaminases that Act on RNA
AGO Argonaute
A-to-I Adenosine to Inosine
BEND3 Brain-derived Endothelial Cells
C Conserved
cDNA Complementary DNA
ceRNA Competing Endogenous RNA
CHST6 Carbohydrate (N-acetylglucosamine 6-O) Sulfotransferase 6
CISD2 CDGSH Iron Sulfur Domain 2
CTRL Control
DFFA DNA Fragmentation Factor Alpha
DGCR8 DiGeorge Syndrome Critical Region Gene 8
DMEM Dulbecco's Modified Eagle Medium
DNA Deoxyribonucleic Acid
DPC Days Post Coitum
EIF2S3 Eukaryotic Translation Initiation Factor 2, Subunit 3 gamma
ESC Embryonic Stem Cell
EST Expressed Sequence Tags
EXP5 Exportin-5
F11R Platelet F11 Receptor
FBS Fetal Bovine Serum
GAPDH Glyceraldehyde-3-Phosphate Dehydrogenase
GAS5 Growth Arrest-Specific 5
HE High Efficacy
HEK293 Human Embryonic Kidney 293 Cells
HITS-CLIP High-Throughput Sequencing of RNA Isolated by Crosslinking Immunoprecipitation
hsa Homo sapiens (Human)
IP Immunoprecipitation
ISH In situ Hybridization
kb Kilobase
LE Low Efficacy
lincRNAs Long Intergenic Non-coding RNAs
LINE1 (L1) Long Interspersed Nuclear Element 1
LINE2 (L2) Long Interspersed Nuclear Element 2
MAID miRNA Associating If Deaminated
MAP3K9 Mitogen-Activated Protein 3-Kinase 9
miRNA microRNA
MIRs Mammalian-wide Interspersed Repeats
mmu Mus musculus (Mouse)
MRE miRNA recognition element
mRNA Messenger RNA
N2A Neuro 2A Cells
NC Non-Conserved
ng Nanogram
nM Nanomolar
nt Nucleotide
PBMC Peripheral Blood Mononuclear Cell
PCDHB1 Protocadherin Beta 11
PolII RNA Polymerase II
POTS Potential Off-Targeting Score
PPIB Peptidylprolyl Isomerase B (Cyclophilin B)
PR Probability of Repression
pre-miRNA Precursor microRNA
pri-miRNA Primary microRNA
PSMI16 Putative Sponge for miRNA-16
PTEN Phosphatase and Tensin Homolog
Ptr Pan troglodytes (Chimpanzee)
REST RE1-Silencing Transcription Factor
RISC RNA-induced Silencing Complex
RNA Ribonucleic Acid
RNP Ribonucleoprotein
RT-PCR Reverse Transcription Polymerase Chain Reaction
SEMA3F Semaphorin-3F
SFXN2 Sideroflexin 2
shRNA Small (Short) Hairpin RNA
siRNA Small (Short) Interfering RNA
siSPOTR siRNA Seed Potential of Off-Target Reduction
SLC12A8 Solute Carrier Family 12, member 8
T4-PNK T4-Polynucleotide Kinase
TA Target Abundance
TEs Transposable Elements
TRC The RNAi Consortium
TU Transcription Unit
UBXN2B Ubiquitin Regulatory X domain-containing protein 2B
µl Microliter
CHAPTER 1
INTRODUCTION
miRNAs: biogenesis
Arguably the most extensively studied of the endogenous small RNA families,
miRNAs are distinguished by the characteristic stem-loop structure of their precursor
transcripts. Approximately 50% of human miRNAs are clustered with one or more other
miRNAs and are believed to be co-transcribed with them as a single polycistronic RNA
Polymerase II (PolII) transcript (Griffiths-Jones et al, 2006). Nearly the same fraction are
hosted within, and co-transcribed as part of, a protein-coding messenger RNA (mRNA)
transcription unit (TU). Classically, these primary miRNA (pri-miRNA) transcripts are initially
clipped out of the nascent TU by the microprocessor complex, composed of the RNase
III enzyme DROSHA (Han, 2004) and its essential cofactor, DiGeorge syndrome critical
region gene 8 (DGCR8) (Gregory, 2004; Han, 2006). DGCR8 is believed to bind to a
single-stranded portion of the pri-miRNA located at the base of the double-stranded stem,
opposite the loop. Guided by DGCR8, DROSHA cleaves the pri-miRNA ~11 nucleotides
(nt) into the stem, releasing a precursor miRNA (pre-miRNA) hairpin product comprised
of the stem and loop (Han, 2006). RNA splicing and pri-miRNA processing appear to be
tightly coordinated in the case of intron-resident miRNAs, with evidence pointing to
miRNA “cropping” preceding intron removal; cropping does not appear to impact the
splicing process itself (Kim & Kim, 2007). By contrast, mirtrons are a distinct
miRNA subgroup that depends directly on splicing activity. They are derived from the
intron lariat formed during splicing and therefore bypass the DROSHA cleavage step
(Berezikov et al, 2007; Okamura et al, 2007).
The pre-miRNA intermediate is shuttled out of the nucleus through the Exportin-5
(EXP5) nuclear transport receptor in cooperation with a Ran GTPase cofactor (Bohnsack
et al, 2004; Lund et al, 2004; Yi et al, 2005). EXP5 recognizes the double-stranded stem
of the pre-miRNA, along with the characteristic 2 nt 3’ overhang left by RNase III
enzymes such as DROSHA. Once the pre-miRNA is released into the cytoplasm, a second
RNase III enzyme, DICER1, binds it and cleaves off the loop, yielding a ~22 nt double-stranded
RNA with 2 nt 3’ overhangs on both ends (Macrae, 2006; Saito et al, 2005).
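To make the geometry of the two RNase III cleavages concrete, the following is a minimal Python sketch on an idealized pri-miRNA whose stem is a perfect Watson-Crick duplex with no bulges, mismatches or G:U pairs (real hairpins are rarely this regular). It places the DROSHA cut ~11 nt into the stem and leaves a 2 nt 3’ overhang, as described above, then excises a duplex with 2 nt 3’ overhangs on both ends. The 22 nt strand length, the function name `process_hairpin` and the example sequences are hypothetical choices made for illustration, not parameters taken from the cited studies.

```python
"""Toy model of canonical DROSHA/DICER processing on an idealized hairpin.

Assumptions (hypothetical, for illustration only): the stem is a perfect
Watson-Crick duplex, DROSHA cuts DROSHA_OFFSET nt into the stem, both RNase III
cuts leave OVERHANG-nt 3' overhangs, and each mature strand is DUPLEX_LEN nt.
"""

COMPLEMENT = str.maketrans("ACGU", "UGCA")

DROSHA_OFFSET = 11  # nt from the basal junction to the DROSHA cut site
OVERHANG = 2        # 3' overhang left by each RNase III cleavage
DUPLEX_LEN = 22     # assumed length of each strand in the DICER product


def revcomp(seq: str) -> str:
    """Reverse complement of an RNA sequence (A, C, G, U only)."""
    return seq.translate(COMPLEMENT)[::-1]


def process_hairpin(flank5: str, stem5: str, loop: str, flank3: str):
    """Return (pri, pre, strand_5p, strand_3p) for a perfectly paired hairpin."""
    n = len(stem5)
    if n < DROSHA_OFFSET + DUPLEX_LEN:
        raise ValueError("stem too short for this toy model")
    stem3 = revcomp(stem5)  # stem3[j] pairs with stem5[n - 1 - j]
    pri = flank5 + stem5 + loop + stem3 + flank3

    # DROSHA: the 5' arm of the pre-miRNA starts DROSHA_OFFSET nt into the stem;
    # the 3' arm runs OVERHANG nt past the partner of that first base.
    pre_5arm = stem5[DROSHA_OFFSET:]
    pre_3arm = stem3[: n - DROSHA_OFFSET + OVERHANG]
    pre = pre_5arm + loop + pre_3arm

    # DICER: take DUPLEX_LEN nt from each end of the pre-miRNA, removing the
    # loop and leaving a duplex with OVERHANG-nt 3' overhangs on both ends.
    strand_5p = pre_5arm[:DUPLEX_LEN]
    strand_3p = pre_3arm[len(pre_3arm) - DUPLEX_LEN:]
    return pri, pre, strand_5p, strand_3p


if __name__ == "__main__":
    stem = "AUGGC" * 7  # hypothetical 35 bp stem
    pri, pre, s5, s3 = process_hairpin("GGGAAA", stem, "GAAUAUC", "UUUCCC")
    print("pre-miRNA:", pre)
    print("5p strand:", s5)
    print("3p strand:", s3)
    # Bases 1-20 of the two strands pair; the last 2 nt of each are 3' overhangs.
    print(revcomp(s3[:20]) == s5[:20])
```

Tracking indices on an explicit `stem3[j]`/`stem5[n - 1 - j]` pairing keeps the overhang arithmetic checkable; a real analysis would instead start from a predicted secondary structure.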
miRNAs: mechanism of action
In a process that is not well understood, DICER1 facilitates loading of the
miRNA duplex into an AGO protein at the heart of the RNA-induced Silencing Complex
(RISC). In mammals, miRNAs are loaded into any one of four AGO proteins (AGO 1-4).
The distinct roles of each AGO complex are still under active investigation,
but it is known that AGO 1 and 2 are the primary isoforms in mammals and that AGO 2,
but not 1, 3 or 4, has catalytic “slicer” activity (Azuma-Mukai, 2008). Regardless of the
isoform, only one of the two miRNA strands is ultimately incorporated into RISC as the
antisense, or “guide” strand. The sense, or “passenger” strand is degraded after either
being cleaved by AGO slicer activity, or separated by an as-yet unidentified RNA
helicase.
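The passage above does not specify how one strand is chosen as the guide. One widely cited heuristic, not tested in this section, is thermodynamic asymmetry: the strand whose 5’ end sits at the less stably paired end of the duplex tends to be loaded as the guide. The minimal sketch below uses the number of G or C bases in the first four nucleotides of each strand as a crude stand-in for 5’-end pairing stability. It assumes perfect Watson-Crick pairing in that window and ignores nearest-neighbor energies, mismatches and 5’ nucleotide identity, all of which matter in practice. The function names, window size and example strands are hypothetical.

```python
from typing import Optional


def five_prime_gc(strand: str, window: int = 4) -> int:
    """Count G/C bases in the first `window` nt of a strand.

    With perfect Watson-Crick pairing, this equals the number of G:C pairs
    closing that end of the duplex, a rough proxy for its stability.
    """
    return sum(base in "GC" for base in strand[:window])


def predict_guide(strand_a: str, strand_b: str, window: int = 4) -> Optional[str]:
    """Return the strand with the less G/C-rich (less stable) 5' end, or None on a tie."""
    gc_a = five_prime_gc(strand_a, window)
    gc_b = five_prime_gc(strand_b, window)
    if gc_a < gc_b:
        return strand_a
    if gc_b < gc_a:
        return strand_b
    return None  # no asymmetry by this crude measure


if __name__ == "__main__":
    # Hypothetical strands: 0 G/C in the first strand's 5' window, 4 in the second's.
    print(predict_guide("UAUUGCAG", "CGGCAAUA"))
```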
Once loaded into RISC, the mature miRNA imparts target specificity to the
silencing complex. In animals, miRNAs primarily interact with the 3’UTRs of
protein-coding transcripts, binding to partially complementary motifs encoded in the mRNA. As
few as 6-7 nucleotides of complementarity between a miRNA recognition element
(MRE) in a target mRNA and the vital “seed” region of the miRNA (nt 2-7/8) is often
sufficient to impart RISC activity, usually resulting in a reduction of protein output from
that transcript through transcript destabilization or translation inhibition. With so few
base pairs mediating target interactions, hundreds or perhaps thousands of transcripts may
contain potential binding motifs. Also, the importance of the seed region is such that a
single base change, or shift in processing, can change the seed sequence and therefore,
3
drastically change the potential target profile. Because of these features, many
mechanisms have evolved to control the expression level and processing of the miRNA.
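A back-of-envelope calculation shows why so few base pairs imply so many potential targets. The sketch below assumes a uniform, independent base composition (real 3’UTRs are AU-rich, so actual frequencies vary by motif), and the UTR count and length are illustrative placeholders rather than annotation statistics.

```python
def expected_matches(utr_length: int, site_length: int) -> float:
    """Expected chance occurrences of one fixed k-mer in a random sequence.

    Assumes the four bases are equally likely and independent at every position.
    """
    windows = max(utr_length - site_length + 1, 0)  # number of possible k-mer start positions
    return windows / 4 ** site_length


# Hypothetical transcriptome: 20,000 3'UTRs of 1,000 nt each (illustrative only).
n_utrs, utr_len = 20_000, 1_000
for k in (6, 7, 8):
    per_utr = expected_matches(utr_len, k)
    print(f"{k}-nt site: {per_utr:.4f} per UTR, ~{per_utr * n_utrs:,.0f} across all UTRs")
```

Under these assumptions a given 7-nt site is expected about once every 4^7 = 16,384 nt, which already amounts to on the order of a thousand chance matches across the illustrative set.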
miRNAs: transcriptional and co-transcriptional control
The repressive effect of a miRNA generally scales with the concentration
of the mature miRNA. Since miRNA biogenesis involves several sequential and
interdependent processes, each step represents a potential point of regulation. For
example, most miRNAs are transcribed by RNA Polymerase II (PolII), the same
polymerase that transcribes protein-coding genes, and many of the same mechanisms of
transcriptional control apply. In fact, intronic miRNAs may rely directly on the
transcriptional control of the host gene for their own expression. This suggests that the
same transcription factors and chromatin modifiers that regulate mRNA expression at the
transcriptional level can also regulate miRNA levels. Analogous
transcriptional regulation has been observed with intergenic PolII-transcribed miRNA
TUs as well. For example, our lab previously showed that the RE1-silencing transcription
factor (REST) can repress expression of neuron-enriched miR-9 (Packer et al, 2008).
REST is classically known as a transcription factor that suppresses neuronal genes in
non-neuronal cells. Furthermore, in several documented cases including the example of
REST and miR-9, the transcription factor is reciprocally regulated by the miRNA, setting
up feedback or feed-forward loops (Bracken, 2008; Packer et al, 2008).
Interestingly, some intronic miRNAs have discordant expression with their host
gene in certain cellular contexts. One possible explanation for this, which I described
earlier, is that promoter elements located in the intron upstream of the miRNA sequence
can independently drive transcription of the miRNA (Monteys et al, 2010). Another not
mutually-exclusive explanation for discordant expression is that pri-miRNA processing is
altered through microprocessor regulation. For example, SMAD proteins, which
classically serve in TGFβ signal transduction, bind to sequences in the loops of some
miRNAs, including miR-21 and miR-199a (Davis et al, 2008). This interaction promotes
DROSHA cleavage, increasing levels of that miRNA. Several examples of
microprocessor inhibition have also been demonstrated. For example, Adenosine
Deaminases Acting on RNA (ADARs) catalyze the conversion of adenosine to inosine
residues in some double-stranded RNA substrates, and a subset of miRNAs show
reduced processing efficiency when deaminated at particular residues (Yang, 2006).
miRNAs: post-transcriptional control
As mentioned earlier, pre-miRNAs are exported from the nucleus to be processed
further by DICER1. Control of miRNA export can occur by modifying the capacity to
interact with EXP5 or altering the Ran GTPase cycle. This would be one mechanism to
regulate miRNAs on a global level. Another means of miRNA regulation is at the level of
DICER1 processing. One well-documented example involves the interaction
between the protein LIN28A and the miRNA let-7. LIN28A binds to the loop of the let-7
precursor and inhibits both DROSHA and DICER1 processing. LIN28A can also recruit
the terminal uridylyltransferase TUT4, which catalyzes the non-templated addition
of a poly-uridine tail to the 3’ end of pre-let-7 (Piskounova et al, 2011). This results in
rapid let-7 turnover. As with the REST/miR-9 example presented above, the LIN28A
3’UTR contains let-7 binding sites, setting up an auto-regulatory feedback loop (Newman et
al, 2008). This interaction results in an inversely correlated expression pattern of mature
let-7 and LIN28A during development. Interestingly, high levels of let-7 precursor can be
detected throughout development, but the mature sequence only appears in terminally-
differentiated cells once LIN28A expression drops.
miRNAs: changing the mature miRNA sequence modulates target profiles
Once a mature miRNA is produced, most evidence indicates that it mediates
target repression by binding to 6-8 nt in mRNA 3’UTRs that are complementary to the
seed sequence (positions 2-8). In most cases, even a single mismatch within the 6-nt
“core” (positions 2-7) completely disrupts miRNA binding (Lewis et al, 2005).
Additionally, many target sites with little or no pairing outside of the seed respond to
the miRNA, suggesting that the seed may be not only necessary but also sufficient for
miRNA function. The limited role of the flanking miRNA sequence is supported by
crystal structures of AGO family proteins showing that the 3’ end of the miRNA is
sequestered away from the region that pairs with the target (Schirle & MacRae, 2012).
Together, these observations support a critical role for the seed region in
determining the target specificity of a miRNA. This also suggests that regulatory
mechanisms that alter the miRNA seed sequence could cause global changes in the
miRNA’s target profile. For example, as mentioned above, ADAR enzymes catalyze A-to-I
editing of some miRNA sequences, which can alter the efficiency of their processing.
In addition, because inosine base-pairs like guanine, editing a single nucleotide of the
seed could yield a very different target profile.
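To make the seed-editing argument concrete, the sketch below treats inosine as guanine (its base-pairing equivalent) and compares the 7mer-m8 target site before and after a single A-to-I change. The editing position in let-7a is hypothetical and chosen only for illustration; it is not a reported editing event.

```python
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site(mirna: str) -> str:
    """7mer-m8 target site (mRNA 5'->3') pairing with miRNA positions 2-8."""
    return reverse_complement(mirna[1:8])


def edit_a_to_i(mirna: str, position: int) -> str:
    """Apply A-to-I editing at a 1-based position, writing inosine as G.

    Inosine pairs like guanosine, so G serves as its stand-in for base pairing.
    """
    index = position - 1
    if mirna[index] != "A":
        raise ValueError(f"position {position} is {mirna[index]!r}, not A")
    return mirna[:index] + "G" + mirna[index + 1:]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # human let-7a-5p
edited = edit_a_to_i(let7a, 3)     # hypothetical editing event at seed position 3
print(seed_site(let7a), "->", seed_site(edited))
```

A single edited base changes which 7-nt motif the miRNA recognizes, so transcripts carrying the original site would no longer be seed-matched, while transcripts carrying the new site would be.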
Another potential means of altering the miRNA seed emerged from large-scale
sequencing of endogenous small RNAs. These studies revealed that many miRNAs have
variable 5’ and 3’ ends, because DROSHA and DICER1 processing is not always perfectly
precise. This creates isomiRs with 5’ or 3’ termini shifted by one or more bases. While
most of these shifts were found at the 3’ end (leaving the seed unaffected), some miRNAs
had variable 5’ ends as well, although, to date, no clear example demonstrating a
functional role for a 5’ isomiR has been shown. That said, this phenomenon is an
important consideration when designing exogenous RNAi triggers, as described in
Chapter 4. In short, some artificial sequences show sequence-dependent cellular
toxicity, largely due to widespread seed-mediated dysregulation of unintended
transcripts. These so-called “off-target effects” can be mitigated by rationally designing
sequences with a low propensity for off-targeting. However, misprocessing of the
exogenous sequence could yield a seed-sequence variant with a different off-targeting
potential. RNAi trigger design should therefore consider the off-targeting potential of
seed sequences arising from processing shifts.
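The effect of a 5’ processing shift on the seed can be sketched directly: moving the 5’ cleavage site by one nucleotide re-registers positions 2-8. The precursor fragment and the mature start index below are hypothetical inputs, not an annotated locus.

```python
def isomir_seeds(hairpin: str, mature_start: int, shifts=(-1, 0, 1)) -> dict:
    """Seed (positions 2-8) obtained if the 5' cleavage site moves by `shift` nt.

    mature_start is the 0-based index of the annotated mature 5' end in the hairpin.
    Shifts that would run past either end of the supplied sequence are skipped.
    """
    seeds = {}
    for shift in shifts:
        start = mature_start + shift
        if start < 0 or start + 8 > len(hairpin):
            continue
        seeds[shift] = hairpin[start + 1:start + 8]
    return seeds


hairpin = "AAUGAGGUAGUAGGUUGUAUAGUUUU"  # hypothetical precursor fragment
print(isomir_seeds(hairpin, mature_start=2))
```

Each shift yields a different 7-nt seed, so a candidate trigger could be screened for off-targeting under each plausible register, not only the intended one.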
Finally, miRNA target profiles can also be modulated globally through
strand selection. Most miRNAs are predominantly processed into a single mature
isoform, with a strong bias towards loading only one of the two strands of the miRNA
duplex. Until very recently, miRNA naming conventions allowed the non-dominant arm
to be designated the star (*) strand. However, examples continue to accumulate
demonstrating specific conditions under which the star strand plays an important, if not
predominant role.
miRNAs: changing mRNA sequence modulates target profiles
Individual miRNAs potentially regulate hundreds of target transcripts.
Consequently, mechanisms to regulate miRNA activity on a global level are best aimed at
directly regulating the miRNA, as outlined above. However, regulating miRNA activity
at the level of the target transcript would provide a mechanism to control which
transcripts are bound by the miRNA. This process has been observed on a global scale in
the context of some cancer cell lines and other dividing cell types. In these settings,
widespread shortening of 3’UTRs was observed, due to alternative splicing or
polyadenylation site choice. This resulted in loss of miRNA binding sites, concomitant
transcript stabilization and increased protein production. Differential expression between
long and short isoforms of the target mRNAs coincided with differential inclusion of
miRNA binding sites.
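The consequence of 3’UTR shortening for miRNA targeting can be illustrated by counting seed-match sites in a long isoform and in the same UTR truncated at a proximal cleavage site. The toy sequence and the cleavage position are hypothetical; in practice the proximal site would come from poly(A) annotations.

```python
def count_site(utr: str, site: str) -> int:
    """Count occurrences of a site motif in a 3'UTR, allowing overlaps."""
    k = len(site)
    return sum(1 for i in range(len(utr) - k + 1) if utr[i:i + k] == site)


def sites_lost_by_shortening(utr: str, proximal_end: int, site: str) -> int:
    """Sites in the long isoform that are absent once the UTR ends at proximal_end.

    proximal_end is a hypothetical 0-based cleavage position; the short
    isoform is taken to be utr[:proximal_end].
    """
    return count_site(utr, site) - count_site(utr[:proximal_end], site)


# Toy long isoform carrying two let-7 7mer-m8 sites (CUACCUC).
long_utr = "AAACUACCUCAA" + "G" * 20 + "CUACCUCA" + "UUUU"
print(count_site(long_utr, "CUACCUC"), sites_lost_by_shortening(long_utr, 15, "CUACCUC"))
```

Only the distal site is lost here, which mirrors the observation that short isoforms escape repression mediated by sites located downstream of the proximal poly(A) site.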
Long noncoding RNAs
Long non-coding RNAs (lncRNAs), like their small RNA counterparts, are a
recently discovered class of RNAs, broadly defined by their lack of coding
potential (an Open Reading Frame of <100 amino acids) and long (>200 nt) transcript
size. This rather nondescript and arbitrary definition serves mainly to distinguish these
transcripts from the ever-growing catalog of small RNAs. Some of the classically
described small nuclear RNA splicing factors could fall within this definition, but the
lncRNA literature tends to focus on the more recently discovered groups.
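The operational definition above (>200 nt, longest ORF <100 amino acids) can be written as a simple filter. This is a minimal sketch assuming only AUG-initiated ORFs that end in a stop codon on the transcript's own strand are counted; real lncRNA pipelines also weigh conservation and other coding-potential scores.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def longest_orf_aa(rna: str) -> int:
    """Length (amino acids) of the longest AUG-to-stop ORF on the given strand.

    ORFs that run off the 3' end without a stop codon are ignored.
    """
    rna = rna.upper().replace("T", "U")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)  # codons from AUG up to, not including, the stop
                start = None
    return best


def looks_like_lncrna(rna: str, min_length: int = 200, max_orf_aa: int = 100) -> bool:
    """Apply the length and ORF-size thresholds of the lncRNA definition."""
    return len(rna) > min_length and longest_orf_aa(rna) < max_orf_aa
```

For example, `looks_like_lncrna("C" * 250)` is true (long, no ORF), whereas a transcript carrying a 151-codon ORF fails the ORF-size criterion.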
The names used for lncRNA subgroups are somewhat informative.
Nomenclature typically describes the position and orientation of lncRNAs
relative to nearby protein-coding TUs (Figure 1-2). While the descriptions do not imply
function, most of the data regarding transcripts overlapping protein-coding genes support
a cis-regulatory role. That is, transcripts overlapping protein-coding TUs tend to be
involved in the transcriptional or epigenetic regulation of the protein-coding genes they
overlap.
This contrasts with the class of long intergenic non-coding RNAs, which by
definition lack nearby protein-coding transcripts. These transcripts may act as RNA
scaffolds for proteins involved in transcriptional or epigenetic control. Structurally, they
resemble protein-coding mRNAs: they are often spliced, capped and polyadenylated.
Some have limited protein-coding potential, but coding is unlikely to be their
predominant function based on conservation and ORF size.
Exogenous RNAi
As mentioned earlier, endogenous miRNAs typically mediate post-transcriptional
repression of mRNAs by pairing to partially complementary sequence motifs within
target 3’UTRs. Although relatively rare in mammalian cells, fully complementary pairing
between the mature miRNA and its target mRNA can trigger AGO2-mediated “slicer”
activity. Exogenous RNAi triggers are artificial sequences designed to engage the
miRNA pathway and induce AGO2-mediated cleavage of a desired target mRNA.
With the advent of this technology, specific repression of nearly any gene of interest can
be achieved without relying on specific drugs or on technically more difficult genetic
knockouts. Therapeutically, RNAi triggers have been used to silence viral genes,
toxic products of dominantly inherited genes, and disease-modifying proteins.
Exogenous RNAi: implementation and design
Artificial RNAi triggers can be chemically synthesized double-stranded RNAs
(dsRNAs) or expressed hairpin transcripts. Chemically synthesized oligonucleotides are
usually ~21-nt short (small) interfering RNAs (siRNAs). Because siRNAs structurally
resemble the dsRNA products loaded into AGO proteins after DICER1 cleavage, siRNAs
bypass DROSHA and DICER1 processing. SiRNAs are commonly delivered by lipid-based
transfection in vitro, although efficient means of delivering these molecules in vivo
remain an area of active research.
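The structural resemblance between siRNAs and DICER1 products can be made explicit: a 21-nt guide fully complementary to a 21-nt target window, paired with a passenger strand so that both strands carry 2-nt 3’ overhangs around a 19-bp duplex. The target window is hypothetical, and the "UU" overhang is an assumed, commonly used choice rather than the design used in this work.

```python
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def sirna_duplex(target_window: str, overhang: str = "UU") -> tuple:
    """Guide and passenger strands of a 21-nt siRNA with 2-nt 3' overhangs.

    target_window: 21 nt of the target mRNA (5'->3'). The guide is fully
    complementary to it. Guide positions 1-19 pair with target positions 3-21,
    so the passenger is those 19 nt plus a 2-nt 3' overhang (assumed "UU").
    """
    if len(target_window) != 21:
        raise ValueError("expected a 21-nt target window")
    guide = reverse_complement(target_window)
    passenger = target_window[2:] + overhang
    return guide, passenger


guide, passenger = sirna_duplex("GCAAGCUGACCCUGAAGUUCA")  # hypothetical target window
print(guide, passenger)
```

Guide nucleotides 20-21 and passenger nucleotides 20-21 are the unpaired 3’ overhangs, matching the ends that DICER1 leaves on a natural miRNA duplex.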
Expressed RNAi constructs are either short hairpin RNAs (shRNAs) or miRNA shuttles.
ShRNAs structurally resemble the pre-miRNA hairpins produced by DROSHA processing
and are typically driven by RNA Polymerase III (Pol III) promoters, such as U6 and H1.
Studies from our lab and others showed that shRNAs can be relatively poor substrates for
DICER1 and can cause a buildup of precursor RNA with consequent cellular toxicity.
Better substrates were obtained by cloning the artificial sequences into the context of an
endogenous miRNA. For example, the miR-30-based shuttles commonly used in our lab
express an exogenous dsRNA stem in the context of the endogenous miR-30 sequence,
along with a portion of its 5’ and 3’ flanking sequence, generating a sequence that more
closely resembles a natural pri-miRNA processed by DROSHA in the nucleus. In either
case, the ultimate goal is to introduce a sequence into the endogenous miRNA pathway to
load the desired ~21-nt guide strand, antisense to the target gene of interest. Because, as
mentioned above, exogenous sequences enter the miRNA pathway and can impart
miRNA-like seed interactions, incorrect processing or loading of the passenger strand
may not only decrease efficacy of target silencing, but also increase the off-target effects
imparted by the sequence. Design techniques used to minimize these undesired effects
are described in more detail in Chapter 4.
Objectives
In the work described here, I address questions about miRNA interactions with mRNA
targets. In Chapter 2, I provide computational and functional evidence addressing
whether Transposable Elements (TEs) are involved in the evolution and function of
human miRNAs. More specifically, are miRNAs processed from TEs functional, and do
they regulate mRNAs containing miRNA Recognition Elements (MREs) embedded
within homologous sequences? Also, do highly conserved, non-TE-derived miRNAs
functionally target 3’UTR TEs, gaining novel MREs through lineage-specific TE
proliferation events? The work presented in the Appendix asks whether
Adenosine-to-Inosine RNA editing of a subset of TE (Alu)-derived MREs dynamically
modifies miRNA binding. Single base changes resulting from this RNA editing could
create, destroy or even switch miRNA binding sites.
While miRNA binding sites are believed to reside primarily in the 3’UTRs of
protein-coding mRNAs, I also address whether long noncoding RNAs (lncRNAs),
which are by definition “untranslated,” are also bound by miRNAs. Specifically, I
propose that some lncRNAs act as endogenous miRNA “sponges” owing to their
numerous (often >10) MREs for a specific miRNA family. In this way, I suggest that
some lncRNAs compete with mRNA 3’UTRs for miRNA binding, thereby regulating
miRNA activity.
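A first-pass screen for sponge candidates can be sketched as counting seed matches per transcript and keeping those above the ">10 MREs" level mentioned above. The transcript names and sequences are hypothetical, and only 7mer-m8 matches are counted here for simplicity.

```python
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def count_seed_matches(transcript: str, mirna: str) -> int:
    """Count 7mer-m8 sites (pairing with miRNA positions 2-8), allowing overlaps."""
    site = reverse_complement(mirna[1:8])
    count, start = 0, transcript.find(site)
    while start != -1:
        count += 1
        start = transcript.find(site, start + 1)
    return count


def sponge_candidates(transcripts: dict, mirna: str, min_sites: int = 11) -> list:
    """Names of transcripts carrying at least min_sites seed matches (i.e. >10 by default)."""
    return sorted(name for name, seq in transcripts.items()
                  if count_seed_matches(seq, mirna) >= min_sites)


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
lncrnas = {"lnc_A": "CUACCUCAAA" * 12, "lnc_B": "CUACCUC" + "G" * 50}  # hypothetical
print(sponge_candidates(lncrnas, let7a))
```

Site counts alone do not establish sponge activity; they only prioritize transcripts for the functional testing the chapter describes.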
Finally, in Chapter 4, I take knowledge gleaned from studies of miRNA
interactions with targets and ask whether miRNA targeting rules can guide the rational
design of highly specific artificial RNAi triggers. These exogenous molecules are known
to enter the endogenous miRNA pathway, where they are processed and mediate cleavage
and degradation of the intended mRNA target. However, the artificial sequences can also
behave like miRNAs, leading to unintended repression of mRNAs through seed-mediated
3’UTR interactions. By designing antisense sequences whose seeds match very few MRE
sites, can I limit the off-target impact of the RNAi sequences and reduce false-positive
functional changes and cellular toxicity?
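One simple way to express this design question in code is to rank candidate guides by how often their seed matches occur across a set of 3’UTRs, preferring guides with few matches. This is a hypothetical sketch, not the scoring used in Chapter 4; two known miRNA sequences stand in for candidate guides, and the UTR set is a toy example.

```python
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def count_occurrences(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def seed_match_count(guide: str, utrs: list) -> int:
    """Total 7mer-m8 seed matches for a guide across all supplied 3'UTRs."""
    site = reverse_complement(guide[1:8])
    return sum(count_occurrences(utr, site) for utr in utrs)


def rank_guides(guides: list, utrs: list) -> list:
    """Order guides from fewest to most seed matches (lower = less off-target potential)."""
    return sorted(guides, key=lambda g: seed_match_count(g, utrs))


let7a = "UGAGGUAGUAGGUUGUAUAGUU"     # example input only
mir125b = "UCCCUGAGACCCUAACUUGUGA"   # example input only
toy_utrs = ["CUACCUCAA" * 3, "AAAA"]  # hypothetical 3'UTR set
print(rank_guides([let7a, mir125b], toy_utrs))
```

A genome-scale version would use the full 3’UTR set and could weigh site types differently; the ordering logic stays the same.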
Summary
Endogenous miRNAs classically regulate the expression of protein-coding
transcripts through interactions with short, often evolutionarily-conserved, sequence
motifs in 3’UTRs. However, non-conserved MREs outnumber conserved ones ~10:1 and
a significant proportion of these may respond to miRNA-mediated silencing (Farh et al,
2005). Cellular mechanisms have evolved to control the promiscuity of these sequences
to some extent, but the ability to recognize novel binding sites plays an important role in
the evolution of miRNA function. Understanding these interactions not only aids in the
discovery of novel roles for endogenous miRNAs, but also improves our understanding
of the side-effects associated with exogenous RNAi triggers and improves the design of
future RNAi therapeutics.
Published work
With the exception of Figure 2-10, Chapter 2 is adapted from a manuscript in
preparation for peer-reviewed publication, for which RMS is the primary author and
designed all experiments in conjunction with his mentor, Beverly Davidson (BLD). The
work presented in Figure 2-10 was designed and performed by RMS, but was adapted
from (Monteys et al, 2010), published in RNA. Chapter 3 represents a work in progress
that will be submitted for peer-reviewed publication pending results from further
functional assays; RMS designed all studies and will be the sole primary author. Chapter 4
is adapted from (Boudreau et al, 2013), published in Nucleic Acids Research, where RMS
was an equally contributing primary author. A complementary manuscript was published
in Molecular Therapy in 2011, in which RMS was a second contributing author, but data
from that work are not presented directly here (Boudreau et al, 2011). The data presented
in the Appendix are adapted from a publication in Human Molecular Genetics, in which
RMS is a contributing author (Borchert et al, 2009). Work contributed by authors other
than RMS is indicated in the methods or figure legends.
Figure 1-1. Canonical microRNA Biogenesis Pathway (Davis-Dusenbery & Hata, 2010).
Figure 1-2. Anatomy of lncRNA loci. (Adapted from Rinn & Chang, 2012)
Owing to the largely unknown function of most lncRNAs, they are often classified according to their location and orientation in relation to nearby protein-coding genes. Gene-proximal lncRNAs often act in cis, regulating expression from the protein-coding gene. Antisense transcripts initiate transcription within or 3’ of the protein-coding gene and are transcribed in the opposite direction
CHAPTER 2
TRANSPOSABLE ELEMENTS CREATE FUNCTIONAL
MICRORNAS AND MICRORNA TARGET SITES
Abstract
Transposable Elements (TEs) account for nearly one-half of the sequence content
of the human genome. De novo germline transposition into regulatory or coding
sequences of protein-coding genes causes several heritable disorders. Yet TEs are
prevalent in and around protein-coding genes, prompting inquiry into their possible
regulatory functions. Computational studies have revealed miRNA genes and miRNA
Recognition Elements (MREs) residing within TE sequences, but little evidence exists to
support a functional role for these sequences. In this work, I functionally validate miRNAs
and MREs derived from the most prevalent TE families, including evolutionarily ancient
LINE2 and MIR retrotransposons as well as primate-specific Alu elements.
Introduction
Transposable Elements (TEs or transposons) mobilize and reintegrate within a
host organism’s genome, and different TE classes have diverse structural features,
transposition mechanisms and evolutionary origins. Some elements mobilize via "copy
and paste" mechanisms and others through "cut and paste." Retrotransposons (Type I)
replicate through an RNA intermediate that serves as a template for RNA-dependent
DNA polymerase (a.k.a. reverse transcriptase) activity; the resulting DNA copy is then
integrated into the host genome. Analogous mechanisms are essential in the life cycle of
infectious retroviruses like Human Immunodeficiency Virus (HIV) and Human T-Cell
Leukemia Virus (HTLV). Non-infectious Endogenous Retroviruses (ERVs) are the
predominant members of the Long Terminal Repeat (LTR)-containing subclass of
retrotransposons.
Non-LTR-containing retrotransposons, including Long and Short Interspersed Nuclear
Elements (LINEs and SINEs, respectively), are the most abundant TE class in the human
genome, accounting for more than 30% of the total DNA content. Additionally, non-LTR
LINE1, Alu and SVA elements are the only TE families that remain active. DNA (Type
II) transposons encode proteins that excise the TE and facilitate reintegration elsewhere.
Although no DNA elements are active in the human genome at present, evolutionary
analysis of human DNA element sequences, accounting for 3% of the total genome
content, revealed that primate genomes had abundant DNA transposition until ~37
million years ago (MYA). Together, TEs mobilizing through both mechanisms have
modified the human genome as well as the genomes of most organisms across all
domains of life, and some continue to do so.
Irrespective of the mechanism, transposition of "active" elements is potentially
mutagenic as TE excision (Type II) or integration (Type I and II) can directly disrupt the
sequence or expression of protein-coding genes. Examples of de novo germline insertions
of active Alu, LINE1 and SVA elements have been documented in more than 60 human diseases
including β-thalassemia, hemophilia and cystic fibrosis. Additionally, high copy numbers
of Alu and LINE1 elements, both active and inactive, can cause somatic genome
instability and cancer.
In spite of their mutagenic potential, TEs are often the predominant contributor
to a genome's sequence content. In fact, ~80% of the 17-gigabase-pair (Gb) genome
of bread wheat (Triticum aestivum), which is more than 4.5 times the size of the human
genome, is TE-derived. Conservative estimates place the TE content of the human
genome at ~45%. More recent estimates using an improved TE prediction algorithm
suggest that this value is closer to 65-70%.
Mechanisms to protect against potentially deleterious transposition events have
evolved in plants and animals, including RNA interference (RNAi). In the mammalian
germline, where heritable mutations can accumulate, PIWI-interacting RNAs (piRNAs)
and some endogenous siRNAs (endo-siRNAs) are loaded into Argonaute-family proteins
(PIWI and AGO2, respectively) and guide silencing complexes to complementary TE
sequences. Intriguingly, computational observations from our lab and others reported that
miRNAs can be processed from TE-derived genomic loci (Borchert et al, 2006;
Piriyapongsa & Jordan, 2007; Piriyapongsa et al, 2007; Smalheiser & Torvik, 2005;
Smalheiser & Torvik, 2006a). Although miRNAs have no reported role in TE defense,
the canonical targets for miRNA regulation, mRNA 3'UTRs, are often littered with TE
sequences. This led to the hypothesis that TE-derived miRNAs may target homologous
TEs in 3'UTRs, thus imparting miRNA-mediated regulation in a TE-dependent manner,
similar in principle to endo-siRNA- and piRNA-mediated TE repression.
Although computational data supporting this hypothesis are abundant, functional
validation of these interactions remains rare. To date, the only validated examples are
LINE2-derived miR-28-5p and miR-151, which bind to the 3'UTRs of CXCR5 and LYPD3,
respectively, through non-canonical miRNA recognition elements (MREs), causing
endonucleolytic cleavage of the target substrate. In this work, I show that miR-28
regulates several target mRNAs through LINE2 elements in their 3’UTRs. Unlike the
LINE1 and Alu elements described above, LINE2s are inactive remnants of an ancient
wave of transposition early in placental mammal evolutionary history. The LINE2-derived
targets show sequence conservation across the extant species that share this common
ancestor.
I also demonstrate the impact of mammalian-wide and primate-specific TEs on
target gene interactions for many miRNA families. Specifically, I show that let-7
regulates several genes through MIR-embedded target sites in human cells, that these
interactions are highly conserved, and that they represent adaptive changes to let-7
function held over from the evolution of placental mammals. I also demonstrate
primate-specific additions to miR-24 function resulting from the presence of a match to
its seed sequence in the Alu consensus. Finally, I provide evidence for the functionality
of primate-specific, Alu-derived miR-1285, and show that it regulates transcripts through
MREs contained within homologous Alu elements.
Methods
Data retrieval and parsing
Unless stated otherwise, all genomic data were obtained from the UCSC Genome
Browser using the Table Browser utility.
3’UTR analyses
Human (GRCh37/hg19) and mouse (NCBI37/mm9) 3’UTR sequences, genomic
coordinates and accession numbers were downloaded from the RefSeq annotations track.
"RefSeq Genes" was selected under the "Genes and Gene Prediction Tracks" group. To
return only data for protein-coding mRNAs, the filter option was applied to the "refGene"
table, changing the "name" field filter so that "name" does match "NM*", thus selecting
RefSeq accession numbers beginning with the NM prefix. To extract the 3’UTR
sequence data, "sequence" was selected as the output format. Parsing and reformatting of
the sequence data was performed using the Galaxy web server tools. After selecting "Get
Output", 3’UTR data were specifically extracted by choosing "genomic" in the window
asking for sequence type. In the next screen, all but the 3’UTR box were unchecked. To
facilitate accurate conversion between local and global MRE coordinates in downstream
analyses, the radio button was chosen to output one FASTA record per region.
To prepare the 3’UTRs for TargetScan input, Galaxy tools were used to convert
FASTA to Tabular format, which was then manipulated in Excel. Genomic coordinates
(chromosome, start, end and strand) and RefSeq IDs are provided in the FASTA header,
and this information was combined to serve as the ID column in the required TargetScan
format. The species IDs (human = 9606; mouse = 10090) were added before saving as a
text file.
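The Galaxy/Excel reformatting above can also be scripted. The sketch below assumes Table Browser FASTA headers of the form `>hg19_refGene_NM_000014 range=chr12:...-... 5'pad=0 3'pad=0 strand=- repeatMasking=none` (the example header and its coordinates are hypothetical), and it assumes the TargetScan UTR input is three tab-separated columns (ID, taxonomy ID, sequence). The `accession|chrom:start-end:strand` ID layout is an illustrative choice, not the one used in this work.

```python
import re

# Assumed header layout: accession, then range=chrom:start-end, then strand=+/-.
HEADER_RE = re.compile(r"(NM_\d+)\S*\s+range=([^:\s]+):(\d+)-(\d+).*?strand=([+-])")


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def targetscan_rows(fasta_lines, species_id):
    """Build tab-separated (ID, species ID, sequence) rows with coordinates embedded in the ID."""
    for header, seq in read_fasta(fasta_lines):
        match = HEADER_RE.search(header)
        if match is None:
            continue  # skip records whose header does not follow the assumed layout
        accession, chrom, start, end, strand = match.groups()
        utr_id = f"{accession}|{chrom}:{start}-{end}:{strand}"
        yield f"{utr_id}\t{species_id}\t{seq.upper()}"
```

Keeping the coordinates inside the ID means every TargetScan output line still carries what is needed for the local-to-genomic conversion described below.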
3’UTR TEs
The UCSC Genome Table Browser was used to extract RepeatMasker Track
annotations from the human and mouse assemblies. The "RepeatMasker" track was
selected from the "Variation and Repeats" group. "All fields from selected table" was
chosen as the output format and the data sent to the Galaxy server. Unless otherwise
indicated, simple repeats, low complexity regions and other non-TE repeats were filtered
out of the dataset.
In Galaxy, the RepeatMasker output was converted to "Interval", indicating the
proper assembly and coordinate information. To select 3’UTR TEs from the full genome
list, the 3’UTR and TE coordinates were joined on genomic coordinates using the tool of
that name, requiring at least 6 bp of overlap (the minimum seed match size).
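The filtering and join steps can be sketched outside Galaxy as follows. The set of non-TE `repClass` labels is an assumption based on the classes that appear in the UCSC RepeatMasker table; intervals follow the UCSC 0-based, half-open convention, and the naive all-pairs loop stands in for Galaxy's join tool.

```python
from typing import NamedTuple

# Assumed repClass labels for non-TE annotations in the UCSC RepeatMasker table.
NON_TE_CLASSES = {"Simple_repeat", "Low_complexity", "Satellite", "rRNA", "tRNA",
                  "snRNA", "scRNA", "srpRNA", "Unknown"}


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive (UCSC half-open convention)
    name: str


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals (0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def utr_tes(utrs, repeats, min_overlap=6):
    """Pair each 3'UTR with the TEs it overlaps by at least min_overlap bp.

    repeats: iterable of (Interval, repClass) tuples. The all-pairs loop is
    fine for a sketch but too slow genome-wide.
    """
    tes = [iv for iv, rep_class in repeats if rep_class not in NON_TE_CLASSES]
    return [(u.name, t.name) for u in utrs for t in tes if overlap_bp(u, t) >= min_overlap]
```

With half-open intervals the overlap is simply `min(end) - max(start)`, which avoids the off-by-one adjustments needed when mixing 0- and 1-based coordinates.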
miRNA target prediction and TE annotation
TargetScan MRE predictions
The TargetScan 5.1 Perl script was downloaded from the Human TargetScan
website and default parameters were used to predict seed binding sites in 3’UTRs
(Grimson et al, 2007). At the time of writing, TargetScan 6.2 is the most recent release,
and the new code allows prediction of additional MRE site types. The default site-type
settings in the targetscan_60.pl code (7mer-A1, 7mer-M8 and 8mer-1A) are identical to
those in TargetScan 5.1, except that 8mer-1A is labeled as 8mer. Unique miRNA seed
family IDs and seed sequences were taken from the miR_Family_Info.txt file provided
on the TargetScan website.
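The site types named above can be illustrated with a small classifier. This is not TargetScan's code; it is a sketch of the standard definitions: a 6-nt core pairing with miRNA positions 2-7, extended by a match to position 8 (7mer-m8), an A opposite position 1 (7mer-A1), or both (8mer). The 6mer category is reported for completeness even though it is not among the default site types listed above.

```python
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_sites(mirna: str, utr: str) -> list:
    """Return (1-based site start, site type) for each seed match in a 3'UTR."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = reverse_complement(mirna[1:7])  # 6-nt core pairing with miRNA positions 2-7
    pos8 = COMPLEMENT[mirna[7]]            # target base that pairs with miRNA position 8
    sites = []
    i = utr.find(core)
    while i != -1:
        m8 = i > 0 and utr[i - 1] == pos8            # match to position 8 lies 5' of the core
        a1 = i + 6 < len(utr) and utr[i + 6] == "A"  # A opposite position 1 lies 3' of the core
        if m8 and a1:
            site_type = "8mer"
        elif m8:
            site_type = "7mer-m8"
        elif a1:
            site_type = "7mer-A1"
        else:
            site_type = "6mer"
        site_start = i - 1 if m8 else i
        sites.append((site_start + 1, site_type))  # position 1 = first base of the 3'UTR
        i = utr.find(core, i + 1)
    return sites


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
print(find_sites(let7a, "GGCUACCUCAGG"))
```

Classifying each 6-nt core once, rather than searching separately for every site type, avoids counting an 8mer again as its constituent 7mers.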
Local position to global coordinate conversion
The conversion of local positions to global genomic coordinates was a common
task and several methods were employed depending on the number of records and the
operating system (Windows or Linux) being used. The method outlined below describes
the use of Excel for working with the text files; however, Excel 2007, 2010 and 2013 are
limited to just over 1 million rows per worksheet. Any other methods employed carried
out the same math operations, as shown below.
The TargetScan output file lists all miRNA and target gene pairs, along with the
MRE site-type and position information. Target site positions are relative to the
beginning of the 3’UTR target, with the first base as position 1. To convert to genomic
positions, the 3’UTR genomic coordinates were extracted from the target name column
(see "3’UTRs" under Data Retrieval in the methods above) and combined with the local
position information. Note that UCSC coordinates use a 0-based system, in which the
first base is numbered 0. The adjustments are shown in the formulae below. These
calculations were all done in Microsoft Excel.
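For a 3’UTR on the "+" strand:
$$\mathrm{MRE}_{\mathrm{start}} = \mathrm{UTR}_{\mathrm{start}} + p_{\mathrm{start}} - 1, \qquad \mathrm{MRE}_{\mathrm{end}} = \mathrm{UTR}_{\mathrm{start}} + p_{\mathrm{end}}$$
For a 3’UTR on the "−" strand:
$$\mathrm{MRE}_{\mathrm{start}} = \mathrm{UTR}_{\mathrm{end}} - p_{\mathrm{end}}, \qquad \mathrm{MRE}_{\mathrm{end}} = \mathrm{UTR}_{\mathrm{end}} - p_{\mathrm{start}} + 1$$
where UTR_start and UTR_end are the 0-based (UCSC) 3’UTR coordinates, and p_start and p_end are the 1-based TargetScan site positions within the 3’UTR.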
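The local-to-global conversion can be sketched as a small function. It assumes 3’UTR coordinates in the UCSC 0-based, half-open convention and 1-based, inclusive site positions counted from the 5’ end of the transcript (TargetScan's convention as described above); on the "−" strand the transcript's 5’ end sits at the UTR's genomic end.

```python
def local_to_genomic(utr_start: int, utr_end: int, strand: str,
                     site_start: int, site_end: int) -> tuple:
    """Convert 1-based, inclusive site positions within a 3'UTR to
    0-based, half-open genomic coordinates (UCSC convention)."""
    if strand == "+":
        # Local position 1 corresponds to genomic base utr_start.
        return utr_start + site_start - 1, utr_start + site_end
    if strand == "-":
        # Local position 1 corresponds to genomic base utr_end - 1.
        return utr_end - site_end, utr_end - site_start + 1
    raise ValueError(f"unexpected strand {strand!r}")


# A 7-nt site at the first positions of a hypothetical 3'UTR spanning [1000, 1100):
print(local_to_genomic(1000, 1100, "+", 1, 7), local_to_genomic(1000, 1100, "-", 1, 7))
```

In both orientations the returned interval spans `site_end - site_start + 1` bases, a quick sanity check when the same arithmetic is done in a spreadsheet.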
Intersection of MRE and TE genomic coordinates
MRE genomic coordinates and TargetScan output information were uploaded to
the Galaxy public server to use the available coordinate-based functions (Blankenberg et
al, 2010). The uploaded file was converted to "interval" format, using the calculated
MRE genomic coordinates for the "START" and "END" fields. MRE and RepeatMasker
tables were joined using the "Join the intervals of two datasets side-by-side" function and
setting the "min overlap (bp)" field to 1 and the "return" field to "all records of first field"
(assuming the MRE table is input as the first dataset). MREs not overlapping a TE are
returned with the TE information filled with a null value. Because 3'UTR information
was already present in the MRE table, the original RepeatMasker output from the UCSC
genome browser was used for this intersection.
Galaxy tools were used to join overlapping target site and RepeatMasker TE
coordinates. Analysis of miRNA target site frequency in TE families was subsequently
performed in Microsoft Excel. Because target prediction already accounted for transcript
orientation, overlaps were allowed on either strand. TEs were labeled as being transcribed
in the "+" or "−" orientation: if the TE and the 3’UTR/MRE are on the same strand
(+/+ or −/−), the TE is transcribed in the "+" orientation; if they are on opposite strands
(+/− or −/+), it is transcribed in the "−" orientation.
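The intersection and orientation labeling can be sketched as a left join: every MRE is kept, MREs without an overlapping TE receive null TE fields, and each overlapping TE produces one row labeled by its orientation relative to the MRE. The dictionary field names are hypothetical, and coordinates are assumed 0-based and half-open.

```python
def annotate_mres(mres, tes, min_overlap=1):
    """Left-join MREs to overlapping TEs and label TE orientation relative to the MRE.

    mres, tes: lists of dicts with hypothetical keys "chrom", "start", "end",
    "strand" (TEs also carry "name"). MREs without a TE get None in the TE fields.
    """
    rows = []
    for mre in mres:
        hits = [te for te in tes
                if te["chrom"] == mre["chrom"]
                and min(mre["end"], te["end"]) - max(mre["start"], te["start"]) >= min_overlap]
        if not hits:
            rows.append({**mre, "te": None, "te_orientation": None})
        for te in hits:
            # Same strand as the transcript/MRE -> "+", opposite strand -> "-".
            orientation = "+" if te["strand"] == mre["strand"] else "-"
            rows.append({**mre, "te": te["name"], "te_orientation": orientation})
    return rows
```

Because TargetScan already scanned each 3’UTR in transcript orientation, the TE strand is compared only to decide the label, never to filter hits.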
Alu-MRE positional enrichment relative to Alu consensus
In the formulas presented in the previous section, MRE positions within the host
3'UTR were converted to MRE genomic coordinates. Given that MRE and 3'UTR
genomic coordinates are both known, one can easily revert back to the MRE position
relative to the host 3'UTR. The same concept is used to determine the MRE position
within the host 3'UTR Alu, simply substituting the Alu genomic coordinates in
place of the 3'UTR coordinates. Given that both are on the same, 0-based, coordinate
system, no adjustment is needed. Importantly, the local coordinates described above
represent the MRE position relative to the 5' end of the 3'UTR. MRE positions within
the Alu were calculated using rearranged versions of the equations presented above, with
some slight modifications. Here, global coordinates were converted back to local
coordinates, except that the local positions are calculated relative to the Alu rather than
the 3'UTR.
First, the MRE position relative to the 3'UTR Alu was calculated using the
formulae that follow. Note that no base adjustment is needed here because both sets of
coordinates are 0-based.
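For an Alu on the "+" strand:
$$p_{\mathrm{start}} = \mathrm{MRE}_{\mathrm{start}} - \mathrm{Alu}_{\mathrm{start}} + 1, \qquad p_{\mathrm{end}} = \mathrm{MRE}_{\mathrm{end}} - \mathrm{Alu}_{\mathrm{start}}$$
For an Alu on the "−" strand:
$$p_{\mathrm{start}} = \mathrm{Alu}_{\mathrm{end}} - \mathrm{MRE}_{\mathrm{end}} + 1, \qquad p_{\mathrm{end}} = \mathrm{Alu}_{\mathrm{end}} - \mathrm{MRE}_{\mathrm{start}}$$
where p_start and p_end are the 1-based MRE positions measured from the 5' end of the Alu.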
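The reverse conversion, from genomic coordinates back to positions measured from the 5’ end of a feature such as a 3’UTR Alu, can be sketched as below. It assumes 0-based, half-open genomic coordinates for both the MRE and the feature and returns 1-based, inclusive positions oriented by the feature's own strand. MREs that only partially overlap the Alu yield positions outside 1 to the Alu length, which a downstream step would need to handle.

```python
def genomic_to_local(feature_start: int, feature_end: int, feature_strand: str,
                     mre_start: int, mre_end: int) -> tuple:
    """1-based, inclusive MRE positions measured from the 5' end of a feature.

    All genomic coordinates are 0-based and half-open (UCSC convention).
    """
    if feature_strand == "+":
        return mre_start - feature_start + 1, mre_end - feature_start
    if feature_strand == "-":
        # The feature's 5' end is at its genomic end on the minus strand.
        return feature_end - mre_end + 1, feature_end - mre_start
    raise ValueError(f"unexpected strand {feature_strand!r}")


# A 7-nt MRE at each genomic edge of a hypothetical Alu spanning [1000, 1300):
print(genomic_to_local(1000, 1300, "+", 1000, 1007), genomic_to_local(1000, 1300, "-", 1293, 1300))
```

These positions are relative to the genomic Alu copy; mapping them onto the Alu consensus additionally requires the RepeatMasker alignment, because insertions and deletions shift positions between the copy and the consensus.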
Generating unique MRE coordinates
When using unique miRNA seed families and sequences in the initial target
prediction, MRE coordinate redundancy (i.e. same chromosome, start, end and strand)
results from overlapping mRNA isoforms with distinct accession numbers. With TE-
MRE predictions, a second source is the partial overlap of RepeatMasker annotations.
Here, redundant MRE sites were collapsed, with the exception of those resulting from TE
overlaps. Two non-programmatic methods were used, as described below.
Unique TE-MREs: Microsoft Excel
TE-MRE coordinates were filtered from the full MRE table using the Galaxy text
filtering function. The table was then exported and opened in Excel. After the whole
table is selected, "Remove Duplicates" is chosen. Fields containing replicate information
based on the stated definition are left checked. Specifically, boxes remain checked for the
MRE genomic coordinates (Chromosome, start, end and strand) and minimal necessary
TE information (start, end, strand and TE name).
Unique TE-MREs: Galaxy
The Galaxy server has several coordinate-based filtering functions that are far
simpler than the steps described here; however, at present, they output only the unique
coordinates. Thus, to remove redundant coordinates in Galaxy without losing associated
information, the redundant fields and any other TE or MRE information (but no 3'UTR
information) are concatenated into one new field, separated by a specified delimiter. The
desired delimiter symbol is added to the table in the last column using the "Add column"
function. A symbol/character is chosen that is not
present in any of the fields. If the unique coordinates were to be analyzed further using
Galaxy, a delimiter is chosen based upon those available in the "Convert delimiters to
tab" function.
After the symbol is added, the "Merge columns together" function is used to combine all of the replicate fields into one column, with the fields separated by the delimiter symbol. The concatenated information is added as a new column in the output and is then used as the "group by" parameter of the "Group" function. With no operations added to this function, the output contains a single column holding the unique concatenated strings. Finally, the delimiter symbol is converted back to tabs using the "Convert delimiters to tab" function.
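A minimal sketch of the same concatenate, group and split approach, assuming each row is a tuple of string fields. The function name is hypothetical, and this version keeps rows in first-seen order:

```python
def unique_by_concatenation(rows, delimiter="|"):
    """Mimic the Galaxy steps: merge fields into one delimited string,
    keep one copy of each distinct string, then split back into columns."""
    # The delimiter must not occur in any field, or the split would be ambiguous.
    for row in rows:
        for field in row:
            if delimiter in field:
                raise ValueError(f"delimiter {delimiter!r} occurs in field {field!r}")
    merged = [delimiter.join(row) for row in rows]       # "Merge columns together"
    grouped = list(dict.fromkeys(merged))                 # "Group" with no operations
    return [tuple(s.split(delimiter)) for s in grouped]   # "Convert delimiters to tab"
```

The up-front check enforces the rule that the delimiter must not appear in any field.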
TE annotations of miRNA genes
Genomic coordinates and sequence data for human miRNA hairpins were obtained from the miRBase FTP repository (Version 15). These sequences represent the pre-miR plus some flanking sequence 5' and 3' of the Drosha cleavage site, but are not intended to represent the full pri-miRNA. These sequences and coordinates were sufficient to determine TE overlap with the mature miRNA. TE and miRNA coordinates were intersected using the "Join on Genomic Coordinates" function in Galaxy. The Excel string-matching function SEARCH was used to locate the mature miRNA within the pri/pre-miRNA sequences, providing local positional coordinates. The intersected miRNA and TE genomic coordinates allowed calculation of the start and end positions of the TE relative to the pre-miRNA, numbering the first (5') nucleotide of the pre-miR as 1. Finally, the combined local positions of the mature miRNA and TEs allowed calculation of the percent overlap of the two features. miRNAs for which the TE completely overlapped the miRNA seed sequence (positions 2-8) were considered "TE-derived" for the purposes of this study.
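The positional bookkeeping behind the "TE-derived" call can be sketched as follows. The sketch assumes 1-based, inclusive local coordinates with hairpin nucleotide 1 at the 5' end. It also takes "percent overlap" to mean the percentage of mature miRNA nucleotides covered by the TE, which is an interpretation rather than something the text states. Function names are hypothetical:

```python
def mature_position(hairpin, mature):
    """1-based, inclusive start/end of the mature miRNA within the hairpin,
    analogous to Excel's SEARCH (1-based and case-insensitive)."""
    idx = hairpin.upper().find(mature.upper())
    if idx < 0:
        raise ValueError("mature sequence not found in hairpin")
    return idx + 1, idx + len(mature)


def te_overlap(hairpin, mature, te_start, te_end):
    """te_start/te_end: 1-based, inclusive TE positions relative to the hairpin.
    Returns (percent of mature nucleotides covered by the TE,
             True if the TE covers all of seed positions 2-8)."""
    m_start, m_end = mature_position(hairpin, mature)
    covered = max(0, min(m_end, te_end) - max(m_start, te_start) + 1)
    percent = 100.0 * covered / len(mature)
    # Seed positions 2-8 of the mature miRNA, in hairpin coordinates.
    seed_start, seed_end = m_start + 1, m_start + 7
    te_derived = te_start <= seed_start and seed_end <= te_end
    return percent, te_derived
```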
Detailed positional analysis of Alu-MREs
Genomic coordinates for Alu-derived MREs were extracted, along with
RepeatMasker track annotations for the associated TE. These annotations provide a
summary of a sequence alignment between the Alu and the corresponding Alu family
consensus. The MRE position relative to the genomic Alu start position was first
calculated and then adjusted according to the alignment start/stop positions.
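A simplified sketch of the adjustment step. It assumes the RepeatMasker summary gives the consensus position aligned to the first base of the genomic Alu and that the alignment is ungapped. Indels relative to the consensus would require walking the full alignment and are ignored here. Parameter names are hypothetical:

```python
def to_consensus(local_start, local_end, cons_aln_start):
    """Shift a 0-based, half-open MRE position within a genomic Alu (read on
    the Alu's own strand) onto the Alu consensus.

    cons_aln_start: 1-based consensus position aligned to the Alu's first base
    (hypothetical input taken from the RepeatMasker alignment summary).
    Assumption: ungapped alignment, so positions shift by a constant offset.
    Returns 1-based, inclusive consensus coordinates.
    """
    return cons_aln_start + local_start, cons_aln_start + local_end - 1
```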
Microarray analysis
With the exception of the miR-24 experiment, preprocessed microarray fold-change values were obtained from the Supplementary Data 4 table (Garcia et al, 2011). The original data are available from NCBI GEO. Data series GSE8501 contains the experimental data for miR-122 (GSM210901), miR-128 (GSM210903) and miR-132 (GSM210904). GSE2075 contains data for miR-1 (GSM37599).
Experimental data for miR-24 are available in data series GSE17828 (Lal et al, 2009). The GEO2R tool available at the NCBI website was used to analyze the miR-24 data. Log2 fold-changes were obtained by comparing "miR-24 HepG2" samples with "Control HepG2" samples. For genes with multiple probes, the median log2 fold-change across probes was used.
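The probe-to-gene summarization can be sketched as follows, assuming a list of (gene symbol, log2 fold-change) pairs exported from GEO2R. The function name is hypothetical:

```python
from collections import defaultdict
from statistics import median


def summarize_by_gene(probe_rows):
    """probe_rows: iterable of (gene_symbol, log2_fold_change), one per probe.
    Returns {gene: median log2 fold-change across that gene's probes}."""
    by_gene = defaultdict(list)
    for gene, lfc in probe_rows:
        by_gene[gene].append(lfc)
    return {gene: median(values) for gene, values in by_gene.items()}
```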
Cloning 3’UTR reporters
All 3’UTR reporter constructs were based on the psiCHECK-2 (Promega) dual-luciferase system, with the 3’UTR of interest cloned into the XhoI/NotI cloning site 3’ of the Renilla luciferase stop codon and 5’ of the SV40 poly-A signal. 3’UTR sequences were cloned from genomic DNA isolated from HEK293 (human) or BEND3 (mouse) cell lines using Qiagen DNA extraction kits. PCR primers with appropriate restriction sites were designed to flank the longest RefSeq-annotated isoform containing the TE/MRE of interest. Phusion® High-Fidelity DNA Polymerase (New England Biolabs) was used for PCR amplification. Standard cloning protocols were then followed to restriction-digest and ligate the vector and inserts. Correct insert sequence and orientation were confirmed by both analytical restriction digests and direct Sanger sequencing.
Artificial miRNA target sites were all based on two tandem copies of the reverse-
complemented mature miRNA sequence of interest, separated by a short linker sequence
containing an AgeI restriction site to facilitate downstream screening. The sequence was
modified to introduce mismatches near the center of each site and in any locations where
other miRNAs had potential seed pairing. The resulting sequences were ordered as pairs of synthetic DNA oligonucleotides (IDT) that, when annealed, formed the artificial sites flanked by 5’ XhoI and 3’ NotI half-sites. T4 Polynucleotide Kinase (T4-PNK) was used to phosphorylate the 5’ ends of the annealed pairs, which then served as the insert for the downstream cloning protocol described above.
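The site design can be sketched in Python as an illustration under stated assumptions, not the exact sequences ordered. The mismatch positions (opposite miRNA nucleotides 10 and 11) and the bare AgeI linker (ACCGGT) are hypothetical choices. The screen for other miRNAs' seed matches and the XhoI/NotI half-sites are omitted:

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def reverse_complement(seq):
    """DNA reverse complement of an RNA or DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def target_site(mirna, mismatch_positions=(10, 11)):
    """Perfect-match DNA site with mismatches opposite the given 1-based miRNA
    positions (hypothetical defaults). Each mismatch replaces the site base
    with its complement, so it faces an identical base (e.g. G:G, C:C)."""
    site = list(reverse_complement(mirna))
    n = len(mirna)
    for p in mismatch_positions:
        i = n - p  # index in the site that pairs with miRNA position p
        site[i] = COMPLEMENT[site[i]]
    return "".join(site)


def tandem_insert(mirna, linker="ACCGGT"):
    """Two copies of the site separated by a linker carrying an AgeI site."""
    site = target_site(mirna)
    return site + linker + site
```

For let-7a (UGAGGUAGUAGGUUGUAUAGUU), the perfect site is AACTATACAACCTACTACCTCA, and the centrally mismatched site is AACTATACAACGAACTACCTCA.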
Cloning endogenous microRNAs
Endogenous miRNAs were PCR-amplified from human genomic DNA (HEK293 cells) using primers designed to flank the 5’ and 3’ ends of the annotated hairpin by at least 200 bp on each end. PCR products were subcloned into pCR-Blunt II-TOPO plasmids using standard protocols. After sequence verification, the TOPO plasmids served as templates for a second PCR using primers nested within the original insert and carrying XhoI and SalI restriction sites, producing a product containing the miRNA hairpin ±200 bp. Standard cloning protocols were then followed to clone the insert into the CMV promoter-driven expression plasmid (pFB-AAV-miRNA-pA). Cloning of endogenous miR-566 for comparison with the syntenic loci of other primates and mice was carried out in the same manner, but only the human sequence was subsequently cloned into the CMV expression plasmid. Primers were designed against flanking sequences conserved in primates, allowing the same primer set to be used in all reactions.
Cell culture and transfections
HEK293 and HeLa cells were cultured in DMEM (10% FBS) without antibiotics. Approximately 24 hours prior to transfection, cells were seeded onto 24-well plates. Transfections were performed in triplicate, using 5 ng of luciferase reporter per reaction. Artificial miRNA mimics (Pre-miRs™) or Anti-miRs™ (Ambion®) were transfected using Lipofectamine 2000 in Opti-MEM, at final concentrations ranging from 0 to 50 nM. All reactions were balanced with a negative control (NC#1) such that the final concentration of the combined oligonucleotides equaled that of the highest dose of the test miRNA. Medium was completely removed from the cells before adding the transfection complexes, which were combined with an equal volume of DMEM (10% FBS) just before plating. Cells co-transfected with miRNA mimics and luciferase reporters were harvested 24 hours later. For all other conditions, cells were harvested 36-48 hours post-transfection.
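The balancing rule is simple arithmetic: the NC#1 concentration in each reaction is the highest test dose minus that reaction's test dose. A minimal sketch, with a hypothetical function name:

```python
def balance_doses(test_doses_nM):
    """For each test-miRNA dose, pair it with the negative-control (NC#1)
    concentration that brings total oligonucleotide to the highest test dose."""
    top = max(test_doses_nM)
    return [(dose, top - dose) for dose in test_doses_nM]
```

For a 0, 25 and 50 nM series, the reactions receive 50, 25 and 0 nM of NC#1, respectively.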
Luciferase assays
Luciferase assays were carried out with the Dual-Luciferase Assay Kit (Promega) according to the standard protocol. Briefly, transfected cells on a 24-well plate were lysed by removing the medium and adding 100 µl of 1x Passive Lysis Buffer (PLB) to each well. Cells were rocked gently for 15 minutes. Lysate (10 µl) from each well was transferred to the bottom of an opaque, flat-bottom 96-well plate. Luciferase substrates for firefly (1x Luciferase Assay Reagent II) and Renilla (1x Stop & Glo) luciferase were prepared as indicated in the manual. A Glomax 96-well plate injector/reader (Promega) was used to inject 50 µl of each substrate sequentially, reading for 2 seconds after each injection.
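Because the 3'UTR is cloned downstream of Renilla luciferase, a common way to express these readings is the Renilla/firefly ratio normalized to the 0 nM condition, with firefly serving as the internal transfection control. The protocol above does not spell out this normalization, so the sketch below is an assumption about the calculation. It does match the "percent of 0 nM" presentation used in Figure 2-4. The function name is hypothetical:

```python
from statistics import mean, stdev


def percent_of_zero_dose(renilla, firefly, renilla_0, firefly_0):
    """Replicate-level Renilla/firefly ratios for one condition, expressed as a
    percent of the mean ratio at 0 nM. Returns (mean, SD) across replicates.

    Assumption: Renilla carries the 3'UTR and firefly is the internal control.
    """
    baseline = mean([r / f for r, f in zip(renilla_0, firefly_0)])
    percents = [100.0 * (r / f) / baseline for r, f in zip(renilla, firefly)]
    return mean(percents), stdev(percents)
```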
RT-qPCR
For both mRNA and miRNA RT-qPCR, RNA was isolated from cells using TRIzol (Invitrogen). The protocol for fixed cell culture presented in the TRIzol manual was followed, except that GlycoBlue (Applied Biosystems) was added as a carrier prior to the isopropanol precipitation. RNA pellets were air-dried and resuspended in RNase-free water. RNA concentrations were measured using a NanoDrop spectrophotometer (Thermo Scientific).
microRNA
TaqMan® gene expression assays (Applied Biosystems) were used to analyze
miRNA expression, using miRNA-specific RT and qPCR assays. mRNAs were reverse-
transcribed using the High Capacity cDNA Reverse Transcription kit (Applied
Biosystems) with random primers. SYBR green primers were designed for target mRNAs
using PrimerQuest (IDT) and used in the qPCR reactions.
Results
MiRNAs have predicted binding sites in 3’UTR-resident TE sequences
Post-transcriptional regulation by miRNAs is classically mediated by interactions between a miRNA and partially complementary miRNA recognition elements (MREs) located within mRNA 3’UTRs. In the human genome, more than 4,000 protein-coding transcripts co-transcribe TE sequences as part of their 3’UTR. To predict the potential impact of TE sequences on the target repertoire of individual miRNA families, I predicted and analyzed miRNA target site frequencies in 3’UTR-embedded TE sequences. Approximately 60% of all TE-derived sites were found in Alu (~35%), LINE1 (~12%) and MIR (~11%) elements, consistent with their high genomic prevalence (Figure 2-1A). On average, 5-10% of a miRNA’s total predicted target repertoire lies within TE-derived MREs, although the fraction exceeds 50% for some miRNA families (Figure 2-1B). The distribution of target sites among TE families was similar across miRNAs, with most miRNAs showing a strong bias toward a particular group. For more than 85% of miRNAs, L1 and Alu retrotransposons formed the predominant class, again trending with their high genome-wide frequencies (Figure 2-1B).
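The intersection and tally behind these percentages can be sketched with a naive half-open interval overlap test. The tuple layouts and function names are hypothetical. The quadratic scan is for illustration only, since the analysis itself intersected coordinates with the UCSC RepeatMasker track:

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def te_family_fractions(mres, tes):
    """mres: list of (chrom, start, end); tes: list of (chrom, start, end, family).
    Each MRE is counted once per TE family it overlaps; MREs overlapping no TE
    are ignored. Returns {family: percent of all family counts}."""
    counts = Counter()
    for chrom, start, end in mres:
        families = {fam for c, s, e, fam in tes
                    if c == chrom and overlaps(start, end, s, e)}
        counts.update(families)
    total = sum(counts.values())
    return {fam: 100.0 * n / total for fam, n in counts.items()} if total else {}
```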
While the seed-based MRE prediction method demonstrates widespread potential for miRNA regulation through TE target sites, studies of miRNA targeting efficacy suggest that, given no other information, only a small fraction of predicted targets are likely to be functional (Grimson et al, 2007). Although knowledge of local sequence and structural features has improved the prediction of silencing activity, target site conservation remains among the most predictive factors (Friedman et al, 2009).
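Canonical seed matching can be sketched as a string search for the reverse complement of miRNA nucleotides 2-8 (7mer-m8), optionally followed by an A opposite position 1 (8mer), or for nucleotides 2-7 followed by that A (7mer-A1). This is a simplified illustration of the canonical site types used by TargetScan-style predictions, not TargetScan itself, which additionally weighs conservation and context. Function names are hypothetical:

```python
def revcomp_rna(seq):
    """RNA reverse complement."""
    pair = {"A": "U", "C": "G", "G": "C", "U": "A"}
    return "".join(pair[b] for b in reversed(seq))


def seed_sites(mirna, utr):
    """Return sorted (0-based start, site_type) for canonical seed matches in a
    3'UTR (both RNA, 5'->3'). Each site gets its most stringent type."""
    m8 = revcomp_rna(mirna[1:8])  # pairs with miRNA positions 2-8
    m7 = m8[1:]                   # pairs with miRNA positions 2-7
    found = {}  # keyed by the UTR index that pairs with miRNA position 2
    for i in range(len(utr)):
        if utr.startswith(m7 + "A", i):
            found[i + 5] = (i, "7mer-A1")
    for i in range(len(utr)):
        if utr.startswith(m8 + "A", i):
            found[i + 6] = (i, "8mer")      # supersedes a 7mer-A1 at the same site
        elif utr.startswith(m8, i):
            found[i + 6] = (i, "7mer-m8")
    return sorted(found.values())
```

For let-7a, the 2-8 match is CUACCUC, so an 8mer site reads CUACCUCA in the mRNA.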
Let-7 directly regulates genes through conserved, MIR-element-derived target sites
The conservation parameter used by many target prediction programs improves the predictive power of the algorithm and highlights events that may have served an adaptive purpose during evolution. To determine if TE-MREs display strong sequence conservation, target sites predicted in MIR retrotransposons were evaluated. For this, I performed gene functional enrichment analysis using the ToppFun algorithm on the ~1200 human genes harboring at least one 3’UTR-embedded MIR element. ToppFun incorporates several miRNA target prediction programs in addition to other common classification schemes (Chen et al, 2009). The ToppFun output contained highly significant enrichment for let-7 target genes based on the TargetScan, PITA (TOP classification), PicTar and mirSVR (conserved_highEffect-0.5 class) algorithms (Figure 2-2; bottom). In agreement with this, I found that ~40% (192) of let-7 TE-MRE sites were predicted within MIRs (Figure 2-2; top-left). I also found several examples where the MRE showed a high degree of conservation restricted primarily to the miRNA binding site. For instance, for Myosin 1F (MYO1F) and E2F transcription factor 6 (E2F6), not only is local conservation of the let-7 binding site evident, but it is also the only let-7 site predicted in that 3’UTR (Figure 2-3).
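ToppFun's exact statistics are not described here. The sketch below assumes a one-sided hypergeometric test followed by Bonferroni correction, a common choice for gene-set enrichment. Argument and function names are hypothetical:

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): probability of drawing at least k category genes when n genes
    are drawn from N, of which K belong to the category (e.g. let-7 targets)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def bonferroni(p, m):
    """Bonferroni-adjusted p-value for m tested categories, capped at 1."""
    return min(1.0, p * m)
```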
To test whether MIR-embedded target sites impart let-7-mediated regulation, I cloned the 3’UTRs of MYO1F, E2F6, MYC-binding protein (MYCBP), and major facilitator superfamily domain-containing protein 4 (MFSD4) into luciferase reporters. When these were co-transfected with a let-7a mimic, there was a dose-dependent reduction in luciferase activity (Figure 2-4A). No significant response was observed with the psiCHECK vector control. Conversely, inhibition of endogenous let-7a in HeLa cells using an Anti-miR™ increased luciferase reporter activity to varying degrees depending on the target (Figure 2-4B).
miRNAs with high Alu-MRE frequency target specific regions in the Alu
Up to 30% of all predicted targets for well-conserved miRNAs were contained within Alus. In the cases of miR-24 and miR-122, more than 80% of potential TE-derived target sites (1948 and 1402 Alu targets, respectively) were predicted within Alus (Figure 2-5A, B). I reasoned that these unusually high target site frequencies (as compared to other TEs) were due to miRNAs targeting regions with little sequence divergence from the parent Alu. If so, MREs for these miRNAs would map to specific positions within the Alu consensus sequence. To test this, I used the genomic coordinates of each MRE and its host Alu to calculate the relative position of the MRE within the Alu. These local positions were then adjusted to be relative to the consensus Alu sequence, using RepeatMasker annotations of the alignment between each Alu and its consensus. From this, I found that the majority of sites for miR-24 and miR-122, as well as other miRNAs, fell within very narrow regions of the Alu consensus (Figure 2-6). This suggests that the local sequence and structural context of the MREs may be similar among these target mRNAs.
miR-24 directly regulates transcripts through Alu-derived target sites
I next tested if the Alu-derived sites create functional platforms for miRNA regulation. For this, candidate Alu-derived targets for miR-24 were selected that had higher average conservation scores in the MRE site than in the flanking Alu sequence (e.g. MAP3K9, Figure 2-7). The 3’UTRs from five candidate genes (Platelet F11 Receptor (F11R), Carbohydrate (N-acetylglucosamine 6-O) Sulfotransferase 6 (CHST6), Protocadherin beta 11 (PCDHB1), Eukaryotic Translation Initiation Factor 2, Subunit 3 gamma (EIF2S3), and Mitogen-Activated Protein 3-Kinase 9 (MAP3K9)) were cloned into a luciferase reporter for functional validation. One let-7a MIR-derived target gene, MFSD4, also had an Alu-derived miR-24 site and was also tested. Dose-dependent luciferase reduction was observed in response to a miR-24 mimic for EIF2S3 and MAP3K9, as well as for the artificial miR-24 target control, miR-24_2xT (Figure 2-7; bottom). At the doses used (1 and 10 nM), no significant knockdown was observed for the other constructs tested or for the psiCHECK no-target control (data not shown and Figure 2-7). Above 10 nM, miR-24 caused non-specific changes in both firefly and Renilla luciferase in the negative control reporter, preventing accurate interpretation of the 3’UTR reporter data at these higher doses (data not shown). Blocking endogenous miR-24 with antisense Anti-miR™ inhibitors resulted in a dose-dependent increase in luciferase activity only for the artificial target positive control, consistent with the low validation rate seen in the overexpression experiments (not shown). If the chosen candidate genes are a representative sample of miR-24 Alu-derived targets, the fraction of Alu-derived sites that confer miR-24 responsiveness is low.
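The candidate-selection criterion for miR-24 (higher mean conservation within the MRE than in the flanking Alu) can be sketched as follows. The sketch assumes per-base scores, such as phyloP values, extracted for one Alu as a Python list. The function name is hypothetical:

```python
from statistics import mean


def mre_more_conserved(scores, mre_start, mre_end):
    """scores: list of per-base conservation values across one Alu, indexed
    from 0; the MRE occupies [mre_start, mre_end). Returns True if the mean
    score inside the MRE exceeds the mean of the flanking Alu bases."""
    mre = scores[mre_start:mre_end]
    flank = scores[:mre_start] + scores[mre_end:]
    if not mre or not flank:
        raise ValueError("need both MRE and flanking positions")
    return mean(mre) > mean(flank)
```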
However, because miR-24 predominantly targets a specific region within the Alu sequence, it may not be representative of an Alu sequence’s general capacity to allow for miRNA-mediated regulation. To test this on a global scale, and to determine if the responses vary with different miRNAs, I analyzed publicly available microarray data that measured mRNA expression changes in response to overexpression of various miRNAs. Genes were annotated according to whether or not they contained i) a 3’UTR Alu, ii) an Alu-derived target site, or iii) a canonical (non-TE-derived) target site for the miRNA in question. Analysis of the cumulative distribution functions for all groups revealed significant repression relative to genes lacking 3’UTR-resident Alus and target sites, but not to the degree seen in genes with canonical sites. For example, compare the cumulative fraction plots for miR-122 and miR-24 (Figure 2-8 and Figure 2-9). Interestingly, in spite of the generally weaker knockdown of Alu-derived targets, they represented 20-30% of the down-regulated target list (Figure 2-9). Also, the two candidates (MAP3K9 and EIF2S3) that responded to miR-24 overexpression in the luciferase reporter assay were also repressed according to the microarray data, consistent with the reporter results (Figure 2-7).
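The cumulative-distribution comparison can be sketched with empirical CDFs and the two-sample Kolmogorov-Smirnov statistic. The text does not name the test behind "significant repression", so the KS statistic here is an assumption, and only the statistic is computed, not its p-value. Function names are hypothetical:

```python
from bisect import bisect_right


def ecdf(sample):
    """Empirical CDF of a sample: x -> fraction of values <= x."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect_right(data, x) / n


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: the largest vertical gap between the
    empirical CDFs of a and b (evaluated at every observed value)."""
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))
```

Applied to log2 fold-changes, a target group whose CDF lies to the left of the no-site group (lower fold-changes) with a large D would indicate repression.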
Proliferation of Alu and B1 SINEs resulted in the convergent acquisition of miRNA targets in their respective primate and murine lineages
To test the impact of recently evolved, lineage-specific TEs, as well as to improve the prediction of functionally relevant sites, I repeated the target prediction analysis using mouse 3’UTR-resident TE sequences. I also searched for the convergent acquisition of TE-derived target sites to determine if murine and primate orthologs had independently gained regulatory sites for the same miRNA (Figure 2-10). For this, I gathered coordinates for mouse 3’UTR-resident TE sequences and used the liftOver utility on the Galaxy web server to convert them to the corresponding human coordinates. Mouse 3’UTR sequences overlapping TEs with no mappable human counterpart were then selected, as were human 3’UTR sequences overlapping TEs with no mappable mouse counterpart. Target sites were predicted using miRNAs present in both species. I focused on 3’UTRs with single sites that showed sequence conservation in all species where the TE insertion was present. I functionally tested human and mouse Solute Carrier Family 12, member 8 (SLC12A8), Sideroflexin 2 (SFXN2), UBX domain-containing protein 2B (UBXN2B) and CDGSH Iron Sulfur Domain 2 (CISD2), using 3’UTR reporters. The 3’UTR of chimpanzee SFXN2 was also cloned, because its miR-24 seed match carries a single-base mutation. Both mouse and human SFXN2 and SLC12A8 showed significant repression when co-expressed with 15 or 30 nM of miR-24 mimic (Figure 2-10). No significant response was seen with UBXN2B, CISD2 or chimpanzee SFXN2 (ptrSFXN2; Figure 2-10).
Potentially-active Alu loci contain miRNA binding motifs
Alu and LINE1 elements represent two of the most abundant TEs in the human genome, as well as the two most frequent sources of predicted TE-derived target sites. Because both TE families still show evidence of active retrotransposition in humans, they remain potential sources of novel miRNA binding sites. A recent study assessed the mobilization activity of 89 full-length Alu sequences and found 124 key positions that were 100% conserved in active elements (Bennett et al, 2008). I predicted miRNA binding sites in the ~12,000 Alus in the human genome that retained these 124 features, hypothesizing that these would be the most likely sources of new Alu-derived sites. As expected, many of the miRNAs with a high frequency of Alu-derived 3’UTR sites, including miR-24 and miR-122, had a high frequency of sites in the potentially active Alu sequences (Table 2-1). Surprisingly, however, MREs for several miRNAs were present in well over 90% of the potentially active sequences. For example, in the case of miR-150, over 98% of the potentially active Alus contain an MRE. This suggests that novel Alu insertions into 3’UTRs may have a higher likelihood of carrying MREs for a subset of miRNA families.
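The filtering and tallying steps can be sketched as below. The sketch assumes each Alu is given as a string already projected onto consensus coordinates (insertions relative to the consensus removed). It also assumes the key positions are a mapping from 1-based consensus position to required base. Any positions used with it here are placeholders, not the 124 positions from Bennett et al (2008). Function names are hypothetical:

```python
def retains_key_positions(aligned_seq, key_positions):
    """aligned_seq: Alu projected onto the consensus, so consensus position p
    is string index p - 1 (assumption). key_positions: {position: base}.
    True if every key position is present and carries the required base."""
    return all(
        pos <= len(aligned_seq) and aligned_seq[pos - 1].upper() == base
        for pos, base in key_positions.items()
    )


def fraction_with_site(seqs, site):
    """Fraction of sequences containing an exact match to a site string."""
    return sum(site in s.upper() for s in seqs) / len(seqs)
```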
MiRNAs are processed from TE sequences and regulate target genes containing homologous elements
In the examples presented thus far, the miRNA’s origin precedes that of the corresponding target sites. Although several mechanisms may generate novel miRNA genes, I focused on TE sequence processing as a source of miRNAs. I hypothesized that these miRNAs would inherently gain functional targets through homologous TE sequences resident in 3’UTRs. In this scenario, a new miRNA gene would have an active source of novel target sites. To test the functionality of these interactions, I specifically focused on miRNAs where the sequence alignment to the TE overlaps the seed of the miRNA guide strand. To this end, I generated a list of all human miRNAs with any detectable TE homology and then selected sequences where the TE annotation completely overlapped the proposed seed sequence.
While most miRNAs with TE homology are of relatively recent origin, one
notable example of broad conservation is miR-28. Annotation of this locus shows that
tandem inverted copies of the 3’ end of an L2c retrotransposon formed the 5’ and 3’ arms
of the miRNA precursor (Figure 2-11). Both L2c sequences have a similar level of
divergence from the L2c consensus (23%, 19.4%), suggesting that both insertions
occurred around the same time. A recently published study suggested that miR-28-5p
binds to and drives endonucleolytic cleavage of LYPD3, interacting with the transcript
through a novel "centered-seed" site within a homologous L2 element (Shin et al, 2010).
In line with this result, luciferase reporters containing the 3’UTR of LYPD3
demonstrated dose-dependent repression in response to a miR-28-5p mimic. Similar
responses were observed using 3’UTR reporters for E2F6, within which miR-28-5p is
also predicted to bind through an L2 sequence (Figure 2-11).
Functional validation of Alu-derived miRNAs
While miR-28 demonstrates the functionality of a TE-derived miRNA, its high level of conservation across eutherian mammals makes it a rare example among the TE-derived miRNAs. To test if primate-specific TE-derived miRNAs are functional, I used Alu-derived miR-566 and miR-1285 as case studies. At the start of this study, miR-566 and miR-619 were the only human miRNAs with Alu homology, and only for miR-566 did the Alu encompass the mature miRNA sequence (Figure 2-13A). This miRNA was initially annotated in a study characterizing the miRNA contingent of colorectal cells (Cummins et al, 2006). Using a common stem-loop PCR method, I detected significant expression of miR-566 in PBMC and HEK293 cells. Furthermore, I could express the miR-566 genomic locus in the context of an otherwise promoterless cloning plasmid (Figure 2-13B,C,D). However, miR-566-encoding plasmids were unable to reduce luciferase reporter expression (not shown), while a commercially available miR-566 Pre-miR™ was functional. I suggest that the increase in expression measured by stem-loop PCR, in the absence of a functional miRNA, could be due to non-specific priming. Indeed, Northern blots detected the miR-566 precursor, but not a ~20 nt band corresponding to the mature sequence (not shown).
While conflicting evidence was found for Alu-derived miR-566, as high-throughput sequencing technologies matured and were used in miRNA discovery studies, additional Alu-derived sequences were annotated as miRNAs. Among the putative Alu-miRNAs, miR-1285 had subjectively more promising deep sequencing support than miR-566, based on sequencing data compiled by miRBase and deepBase (Kozomara & Griffiths-Jones, 2011; Yang et al, 2010). A recent study that sought to functionally validate all known mouse miRNAs did so primarily by cloning each miRNA along with ~100 flanking nucleotides into an expression vector. The authors expressed these constructs in vitro and then sequenced the small RNA fractions to determine which yielded a functional mature miRNA (Chiang et al, 2010). These same constructs were also tested for the ability to silence MRE-containing reporters, and the data were related back to the sequencing results. Similarly, to test the functionality of miR-1285, I cloned the pre-miRNAs of the two hsa-miR-1285 loci (miR-1285-1 and miR-1285-2), with 200 bp of flanking sequence, into an expression plasmid (Figure 2-12A). Interestingly, miR-1285-1 and -2 share a common seed sequence and homology to Alu elements, but differ notably in secondary structure. To test whether functional miRNAs could arise from the genomic fragments, I transfected HEK293 cells with 0, 200 or 400 ng of either miR-1285-1 or miR-1285-2, balanced with a control plasmid lacking a miRNA. A similar expression plasmid was also generated for hsa-miR-24-1 to serve as a positive control for a valid miRNA. All plasmids were co-transfected with their corresponding artificial targets as described previously. miR-1285-1, but not miR-1285-2, reduced expression of the artificial reporter (Figure 2-12B). This suggests that the miR-1285-2 locus does not produce a functional miRNA.
I also tested whether predicted targets for miR-1285 responded to miR-1285 overexpression. The majority of target sites predicted for miR-1285 were located in Alus (Figure 2-12C), and so candidates containing Alu-derived sites were tested. Overexpression of miR-1285 reduced expression of EIF2S3, CHST6 and CBFA2T2 in a dose-dependent manner. Attempts to block miR-1285 activity using Anti-miRs™ were ineffective, even for the artificial target. While this could mean that endogenous miR-1285 is non-functional in the cell lines tested, I was also unable to block the effects of miR-1285 overexpression with the Anti-miR™ (Figure 2-12D). I hypothesize that this could be due to nonspecific binding of the Anti-miR™ to the multitude of Alu sequences present in other transcripts at any given time. In any case, the overexpression data and the proper processing of miR-1285 in the context of the genomic locus support miR-1285 as a functional Alu-derived miRNA. Furthermore, while varying functional results were observed for the TE-derived miRNAs tested, the data from miR-28 and miR-1285 show that these miRNAs deserve further study.
Discussion
In this work, I demonstrate that the most prevalent TE families in the human
genome, namely Alu, MIR and LINE2 elements, provide a functional platform for
miRNA-mediated regulation when resident in mRNA 3’UTRs. I also found that, while
the majority of TE-MREs in human 3’UTRs reside in primate-specific L1 and Alu
elements, sequence conservation was also seen in the MIR-derived let-7 MREs.
Further inquiry into the extent of Alu-MRE function will likely benefit from high-throughput approaches for measuring gene expression changes after modulating miRNA levels, such as the microarray experiments presented here. The low degree of sequence divergence among 3’UTR-resident Alus leads to a preponderance of predicted MREs for some miRNAs. As a function of their limited divergence from parental Alu sequences, distinct miRNA binding sites cluster in specific regions of the Alu primary sequence (Figure 2-6). Although, on average, Alu-MREs were less potent than canonical (non-TE-derived) sites, evidence from the array data and our luciferase results shows that some are, indeed, functional.
One possible reason for the lower validation rate of Alu-MREs is that Alus can associate with Signal Recognition Particle (SRP) proteins through specific domains (Bovia et al, 1997; Chang et al, 1996; Hsu et al, 1995). If SRP binding occurs in the context of a 3’UTR-resident Alu, MREs may be shielded, limiting miRNA access. In a previous study, in vitro-transcribed Chloramphenicol Acetyltransferase (CAT) mRNAs carrying artificial sense-oriented Alus in their 5’ or 3’UTRs were bound by the SRP complex (Hsu et al, 1995). If SRP binding blocks miRNA association, miRNAs predominantly targeting the antisense Alu would be less affected. This hypothesis could be tested directly by querying microarray datasets. Genes could be categorized based on the presence or absence of a 3’UTR Alu, whether the Alu is transcribed in the sense or antisense orientation, and whether the 3’UTR Alu has a predicted MRE. One could also test genes with known 3’UTR-Alu-SRP interactions to assess the impact of the SRP on miRNA-mediated silencing. Additionally, candidate genes could be predicted computationally by searching for sequence features indicative of SRP binding. One predictor of SRP binding is active transposition (Bennett et al, 2008). Because the SRP binding domains G25C and G159C in the AluYa5 subfamily are important for transposition activity, genes with Alus retaining these features could be predicted along with the associated MREs. Their direct association and the effects on miRNA-mediated repression could be assessed by first immunoprecipitating SRP9 or SRP14, followed by RT-PCR of candidate transcripts to confirm the SRP-Alu interaction. The validated 3’UTRs could subsequently be cloned into luciferase reporters and site-directed mutagenesis performed to confirm their role in SRP binding. The candidates emerging from this secondary screen would then be used to determine if miRNA-mediated repression was affected by the presence or absence of SRP, or by the intactness of the G25 and G159 sites. Conversely, 3’UTR Alus lacking these sites could be engineered to acquire SRP association sites, and miRNA knockdown efficiency measured.
On a larger scale, if such interactions are shown to be important for miRNA recognition, a Cross-Linking and Immunoprecipitation followed by sequencing (CLIP-seq) method could be developed for the Alu-binding SRP proteins, SRP9 or SRP14. One likely complication in this experiment would be the predominance of the 7SL noncoding RNA, which is the canonical SRP binding partner, or of free cytoplasmic Alu RNAs. However, the molecular weight difference between Alu/7SL-bound and mRNA-bound SRP complexes could help resolve this, since CLIP protocols often include a size-selection step.
Curiously, of the ~1.2 million Alu copies present in the human genome, fewer than 20 are expected to produce functional miRNAs. Moreover, the functionality of most of these Alu-derived miRNAs is untested. In this work, I found that miR-566 and one of the two miR-1285 loci did not produce a functional miRNA. However, LINE-derived miR-28 is both well conserved and functional, and miR-1285-1 did demonstrate effective processing and silencing efficacy. These data show that some TE-derived miRNAs are, indeed, functional, but that Alu-derived miRNAs and other miRNAs with low apparent sequence conservation deserve closer scrutiny. Ideally, these studies should include a combination of Northern blot and reporter-based assays before concluding that a bona fide miRNA emerges from the locus in question.
Further complication in identifying true Alu-derived miRNAs comes from a recent study demonstrating that DICER1 degrades Alu RNAs (Kaneko et al, 2011), which suggests that some Alu-derived small RNAs are DICER-dependent degradation products rather than miRNAs. These results emphasize the importance of functional experiments, such as those presented here, for validating Alu-derived miRNAs. While miR-1285 and miR-566 each produced small RNAs, only miR-1285 was capable of silencing a luciferase reporter in a sequence-dependent manner. In miRNA discovery studies, the proposed loci should be examined closely to ensure that they meet the criteria outlined by Chiang et al (Chiang et al, 2010). The experimental design used for testing miR-566 and miR-1285 was adapted from the Chiang et al study, although those authors additionally performed high-throughput sequencing on small RNA fractions extracted from cells expressing the miRNA expression constructs. From these data, they found that the miRNAs most likely to validate functionally were those producing a predominant mature sequence with a homogeneous 5’ end and a passenger strand with a 2 nt 3’ overhang. The miRBase repository is actively incorporating small RNA high-throughput sequencing reads, and using evidence such as that proposed in the Chiang et al study, to improve the accuracy of miRNA identification and remove dubious annotations (Kozomara & Griffiths-Jones, 2011).
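One of the Chiang et al criteria, a homogeneous 5' end, can be checked from read start positions as sketched below. The function name is hypothetical, and any cutoff for "homogeneous" would be a user choice not specified here:

```python
from collections import Counter


def five_prime_homogeneity(read_starts):
    """read_starts: hairpin coordinate of the 5' end of each small-RNA read
    mapped to one arm. Returns (dominant start, fraction of reads using it)."""
    if not read_starts:
        raise ValueError("no reads supplied")
    counts = Counter(read_starts)
    start, n = counts.most_common(1)[0]
    return start, n / len(read_starts)
```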
In summary, I find evidence that some TE-derived miRNAs and miRNA binding sites are both conserved and functional. I also show that some sequences with low sequence conservation do respond to miRNA expression, with evidence from both reporters and global transcript expression profiles. Together, our data support a role for TEs in the evolution of human miRNA interactions and suggest that novel miRNA functions may continue to arise as active transposition persists.
Figure 2-1. TE family composition of putative TE-MREs in human 3’UTRs.
TargetScan was used to predict miRNA target sites in RefSeq-annotated human 3’UTRs, using the human miRNA family seed file provided with TargetScan. TE-MREs were annotated by intersecting unique MRE coordinates with the RepeatMasker track annotations at the UCSC Genome Browser. (A) The TE family annotation from RepeatMasker was used to classify all human TE-MREs, and the percent contribution of the five most prevalent families is shown, representing >87% of all predicted TE-derived target sites. Primate-specific Alu and L1 retrotransposons make up more than half of the sites; the more ancient L2 and MIR elements constitute ~20% of the sites. “Other” represents all other transposable elements; simple repeats and other low-complexity repeats were not considered in this analysis. (B) Seed families were selected for which Alu-, L1- or MIR-derived MREs were the most frequent TE-MRE. Seed families were binned according to the fraction of TE-MREs comprised by the majority TE group. The histograms for the three TEs are overlaid, so every bar should be read as starting at zero. For example, the histogram shows that for ~60 miRNAs, L1s represent ~25% of the predicted MRE sites. Alus showed a bimodal distribution because, for many miRNAs, Alus represented more than half of the predicted TE-MREs.
Figure 2-2. TE-MRE composition and unbiased gene function analysis reveal strong functional connections between let-7 and MIR-derived MREs.
(Top) The TE family composition of predicted human let-7 MREs revealed that over 40% are of MIR origin. This was the highest proportion of MIR-derived sites for any miRNA in the dataset.
Figure 2-2. Continued. The 192 transcripts with MIR-MREs represent just over 10% of the ~1800 human genes with embedded MIR elements. (Bottom) Gene names for the ~1800 genes were analyzed using ToppFun to identify functional categories enriched among these genes. Statistical significance is presented as p-values adjusted using Bonferroni correction. Let-7 had the most significant p-values of any functional category, including the non-miRNA categories not shown. Furthermore, it was the only miRNA with significant results for more than two of the prediction methods. The MRE prediction methods and any additional information are color-coded. For mirSVR: C = Conserved, NC = Non-conserved, HE = High Efficacy (predicted), LE = Low Efficacy (predicted).
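ToppFun reports enrichment p-values that have been adjusted for multiple testing. The sketch below assumes enrichment is scored with a one-sided hypergeometric test. Under that assumption, it shows how such a p-value and its Bonferroni adjustment are computed. The function names and arguments are illustrative only and are not ToppFun's interface.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n annotated, N drawn).

    Example reading: M background genes, n of them in a category,
    N genes in the query list, k query genes found in the category.
    """
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)

def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

Bonferroni correction is the most conservative of the common adjustments. A category survives only if its raw p-value is below the threshold divided by the number of categories tested.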
Figure 2-3. Genome browser views for let-7 MIR-derived MREs in (A) MYO1F and (B) E2F6.
Let-7 MREs (yellow box) overlapping a MIR element (red boxes) annotated by RepeatMasker. MYO1F and E2F6 are two candidates where: i) no let-7 MRE is present in the 3’UTR aside from the MIR-MRE shown, and ii) PhyloP conservation scores (Mammal Cons track) showed (subjectively) strong conservation coincident with the binding site.
Figure 2-4. Let-7 regulates 3’UTRs containing MIR-derived MREs.
(A) 3’UTRs of MYCBP, MFSD4, E2F6 and MYO1F, each containing a single MIR-derived let-7 MRE, were cloned into dual-luciferase reporters. These were co-transfected into HEK293 cells with low doses (0.1, 1.0 nM) of an artificial let-7 mimic (Pre-miR™). Reactions were balanced to 1.0 nM with a non-targeting Pre-miR™ (ctrl). After 24 hours, luciferase activity was repressed for all four reporters and for the let7_2xT positive control, but not for the negative control reporter (CTRL). Luciferase activity is plotted as a percent of the activity observed with the 0 nM let-7 Pre-miR™ dose. (B) Luciferase reporters were then co-transfected with a let-7a AntimiR inhibitor (0, 25, 50 nM) into HeLa cells, which express high levels of endogenous let-7a. 48 hours later, all 3’UTR reporters, but not the CTRL reporter, showed increased activity relative to the 0 nM AntimiR dose. N=3; Error bars = SD. * = p ≤ 0.05 (Student’s t-test; two-tailed).
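The normalization described in this caption, and the two-tailed Student's t-test used for significance, can be sketched as follows. Two points are assumptions for illustration: which luciferase carries the 3’UTR and which serves as the internal control, and the function names themselves. The sketch computes only the pooled-variance t statistic and its degrees of freedom. In practice, scipy.stats.ttest_ind, whose default assumes equal variances, would return the matching two-tailed p-value.

```python
from math import sqrt
from statistics import mean, variance

def normalized_activity(reporter, control, baseline_ratios):
    """Reporter/control luciferase ratios as a percent of the mean 0 nM ratio."""
    base = mean(baseline_ratios)
    return [100.0 * r / c / base for r, c in zip(reporter, control)]

def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```

Dividing by the control luciferase first corrects for differences in transfection efficiency between wells. Expressing the result relative to the 0 nM dose then puts every reporter on the same percent scale.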
Figure 2-5. TE-MRE compositions for (A) miR-24-3p and (B) miR-122 show a prominent Alu fraction.
TE-derived MREs were predicted for each miRNA. RepeatMasker track annotations were used to tabulate TE family frequencies. The most highly represented families are shown, with all other families grouped into the “Other” category. These results showed that Alus represent over 80% of the TE-derived MREs for these miRNAs.
Figure 2-6. Most frequent Alu-MRE sequences map to distinct positions relative to the Alu consensus.
The position within the Alu consensus sequence of each miR-125-3p, miR-24 or miR-122 MRE is graphed above. MRE positions were normalized across all Alus by calculating positions relative to the Alu consensus sequence (see Methods). For each miRNA, the high MRE frequency is restricted to a narrow window 5-10 bp wide. This suggests that little sequence divergence has occurred among Alus in these regions. It also suggests that these miRNAs encounter similar local sequence and structural contexts when binding to Alu-derived sites in different mRNA targets.
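Normalizing MRE positions to the Alu consensus amounts to a coordinate transform from genomic space to consensus space. The sketch below assumes an ungapped alignment between each genomic Alu copy and the consensus. Real copies violate this assumption wherever they carry insertions or deletions, and the procedure in Methods may differ. The argument names, including the hypothetical cons_start (the consensus coordinate aligned to the element's 5’ end), are illustrative.

```python
from collections import Counter

def consensus_position(mre_start, mre_end, elem_start, elem_end, cons_start, strand):
    """Consensus coordinate of an MRE's 5' end, read in the element's orientation.

    All genomic coordinates are half-open [start, end). Assumes no indels
    between the genomic copy and the consensus.
    """
    if strand == "+":
        return cons_start + (mre_start - elem_start)
    # Minus strand: the element's 5' end is its highest genomic base (elem_end - 1),
    # and the MRE's 5' end in element orientation is mre_end - 1.
    return cons_start + (elem_end - mre_end)

def position_histogram(positions, bin_size=5):
    """Count consensus positions in fixed-width bins (keyed by bin start)."""
    return Counter((p // bin_size) * bin_size for p in positions)
```

A narrow peak in such a histogram corresponds to the 5-10 bp windows described in the caption.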
Figure 2-7. Alu-derived MREs respond to miR-24 overexpression.
Transcripts with Alu-derived MREs showing evidence of local sequence conservation were functionally tested. (Top) The Primate Conservation track shows a rise in conservation score coincident with the miR-24 MRE overlapping the AluSp family sequence. (Bottom) Luciferase reporters expressing EIF2S3 and MAP3K9 3’UTRs were co-transfected into HEK293 cells with Pre-miR™ miR-24 mimics (0, 1, 10 nM Pre-miR™ doses). 24 hours later, luciferase assays revealed that EIF2S3 and MAP3K9 reporter expression was reduced in response to miR-24, while the negative controls (ctrl) were not. N=4; Error bars = SD. * = p ≤ 0.05 (Student’s t-test; two-tailed).
Figure 2-8. Microarray datasets measuring the response to miRNA overexpression were used to assess the functional response of Alu-derived targets on a global scale.
Gene annotations from arrays were intersected with TE-MRE predictions for the corresponding miRNAs. (Top) Genes were grouped according to whether they had a canonical (non-TE-derived) MRE, an Alu MRE or no MRE for the indicated miRNA family, and the empirical cumulative distributions were plotted. Canonical and Alu MRE-containing transcripts were shifted to the left of the non-target set. This demonstrates a larger fraction of down-regulated transcripts in these groups than among non-targets (Alu MRE sites, Bottom).
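The empirical cumulative distributions compared in this figure can be computed as in the sketch below, which assumes the array values are log2 fold changes. A curve shifted to the left means a larger fraction of genes sits at or below any given fold change. The two-sample Kolmogorov-Smirnov statistic is included as one common way to quantify such a shift. It is not necessarily the test used here; scipy.stats.ks_2samp would supply a p-value.

```python
from bisect import bisect_right

def ecdf(values):
    """Return F(x) = fraction of values less than or equal to x."""
    xs = sorted(values)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n

def ks_statistic(a, b):
    """Largest vertical distance between the empirical CDFs of a and b."""
    fa, fb = ecdf(a), ecdf(b)
    # Both step functions only change at observed values, so checking those suffices.
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))
```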
Figure 2-9. The fraction of down-regulated genes with Alu-derived MREs is in proportion to their overall prevalence.
Cumulative distribution plots showed greater knockdown of genes with canonical or Alu-derived MREs compared to background (no MRE). In the case of miR-122 (left) and miR-24 (right), Alu-derived target sites account for between 20 and 30 percent of all MRE-specific knockdown.
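The exact calculation behind the 20-30 percent figure is not given here. One hypothetical way to express "MRE-specific knockdown" has two steps. First, count the down-regulated genes in each MRE group beyond what the background (no-MRE) rate would predict. Second, report the Alu group's share of that excess, as in the sketch below. The log2 fold-change threshold of -0.3 is an arbitrary placeholder.

```python
def alu_share_of_knockdown(fold_changes, threshold=-0.3):
    """Alu group's share of down-regulated genes in excess of background.

    fold_changes: dict with keys "canonical", "alu" and "none", each mapping to
    a list of log2 fold changes. Hypothetical attribution scheme for illustration.
    """
    def n_down(values):
        return sum(1 for v in values if v < threshold)

    background = fold_changes["none"]
    bg_rate = n_down(background) / len(background)
    excess = {}
    for group in ("canonical", "alu"):
        values = fold_changes[group]
        # Observed down-regulated genes minus those expected at the background rate.
        excess[group] = max(0.0, n_down(values) - bg_rate * len(values))
    total = excess["canonical"] + excess["alu"]
    return excess["alu"] / total if total > 0 else float("nan")
```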
Figure 2-10. Functional miR-24 MREs are independently created in rodent and primate clades due to lineage-specific, but homologous TE families.
Target sites were predicted in human and mouse 3’UTR TEs, limiting to TE integrations specific to each lineage (Top). Homologous human and mouse genes were then paired and searched for lineage-specific TE-derived MREs for the same miRNA. For miR-24, most of these sites resulted from transposition of B1 SINE elements, which, like Alus in primates, arose from a 7SL RNA ancestral sequence. Candidates were selected that had an Alu-derived miR-24 site in the human 3’UTR and a B1-derived site in the mouse 3’UTR. The chosen candidates, SFXN2, SLC12A8 and UBXN2B, additionally had binding sites that were conserved in species where the insertion was present. (Bottom) Luciferase reporters expressing candidate 3’UTRs were co-transfected with miR-24 Pre-miRs™. For both human and mouse constructs, SFXN2 and SLC12A8 reporters showed reduced expression after miR-24 treatment compared to control-treated cells. UBXN2B showed no response. Chimpanzee SFXN2 had a single base change that disrupted the predicted binding site, and a reporter of this 3’UTR did not respond to miR-24 addition. N=3; Error bars = SD. * = p ≤ 0.05 (Student’s t-test; two-tailed).
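The pairing step, which finds homologous genes that gained a site for the same miRNA through different lineage-specific TEs, reduces to a set intersection. The input layout and function name below are assumptions for illustration; only the Alu/B1 pairing comes from the text.

```python
def convergent_sites(human_sites, mouse_sites, orthologs, human_te="Alu", mouse_te="B1"):
    """Find homolog pairs with TE-derived sites for the same miRNA in both species.

    human_sites / mouse_sites: dict gene -> set of (miRNA, TE family) tuples for
    lineage-specific TE-MREs. orthologs: dict human gene -> mouse gene.
    Returns {(human_gene, mouse_gene): set of shared miRNAs}.
    """
    result = {}
    for human_gene, mouse_gene in orthologs.items():
        h = {mir for mir, te in human_sites.get(human_gene, set()) if te == human_te}
        m = {mir for mir, te in mouse_sites.get(mouse_gene, set()) if te == mouse_te}
        shared = h & m
        if shared:
            result[(human_gene, mouse_gene)] = shared
    return result
```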
Table 2-1. miRNAs have predicted MREs in potentially-active Alus in the human genome
miRNA Family Alu (+) Alu (-)
miR-150 98.1% 0.0%
miR-129/129-5p 97.7% 0.0%
miR-590/590-3p 95.6% 0.0%
miR-106/302 95.6% 0.0%
miR-520gh 95.6% 0.3%
miR-17-5p/20/93.mr/106/519.d 95.5% 0.0%
miR-411 95.3% 0.0%
miR-512-3p/1186 94.0% 0.0%
miR-483/483-5p 86.0% 3.8%
miR-1234 83.2% 15.3%
miR-1307 75.6% 0.1%
miR-122 72.4% 0.3%
miR-139-3p 70.9% 0.6%
miR-720.h 67.5% 0.0%
miR-575 61.1% 10.4%
miR-1281 56.4% 0.0%
miR-709/1827 7.7% 97.1%
miR-24 0.6% 96.6%
miR-940 0.2% 95.8%
miR-485/485-5p 0.0% 93.2%
miR-548c-3p 1.8% 91.8%
miR-290-5p/292-5p/371-5p 2.6% 91.6%
miR-661 6.5% 73.7%
miR-566 0.2% 71.6%
miR-766 0.2% 70.6%
miR-508-5p 0.0% 68.0%
miR-1273 22.9% 66.3%
miR-663 0.0% 66.3%
Annotations and sequences of potentially active Alus were taken from Supplemental Table 3 (Bennett et al, 2008).
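The table does not define its (+) and (-) columns. One plausible reading is that they give the percentage of potentially active Alus carrying an MRE on the Alu sense and antisense strands, respectively. Under that assumption, the values could be computed as in this sketch. The sketch looks only for a 7mer-m8 seed match (the reverse complement of miRNA nucleotides 2-8), a simplification of the site types TargetScan considers.

```python
# Complement table covering DNA (T) and RNA (U) input; U pairs with A.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")

def revcomp(seq):
    """DNA reverse complement of a DNA or RNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def seed_site(mirna):
    """7mer-m8 target site: reverse complement of miRNA nucleotides 2-8."""
    return revcomp(mirna[1:8])

def strand_hit_percentages(mirna, alus):
    """Percent of Alu sequences with a site on the given (+) and opposite (-) strand."""
    site = seed_site(mirna)
    plus = sum(site in alu.upper() for alu in alus)
    minus = sum(site in revcomp(alu) for alu in alus)
    return 100.0 * plus / len(alus), 100.0 * minus / len(alus)
```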
Figure 2-11. miR-28 is derived from a LINE2c retrotransposon, is highly conserved and regulates transcripts with LINE2-embedded MRE sequences.
(A) LINE2-derived miR-28 is a conserved TE-derived microRNA. (B) Homologous LINE2-derived MREs make up the largest proportion of TE-derived targets. (C) LYPD3 has a LINE2c-embedded miR-28 MRE that shows strong conservation localized around the site. (D) 3’UTR luciferase reporters for LYPD3 and E2F6, both of which have predicted L2-derived target sites, are potently repressed in response to miR-28-5p overexpression. N=4; Error bars = SD. * = p ≤ 0.05 (Student’s t-test; two-tailed).
Figure 2-12. Alu-derived miR-1285-1 is effectively processed and mediates knockdown of genes with Alu-MREs.
(A) Two genomic loci, miR-1285-1 and miR-1285-2, are annotated for Alu-derived miR-1285. Both loci, along with ~100 bp of flanking sequence, were cloned into an expression plasmid. (B) MiR-1285-1, but not miR-1285-2, repressed the expression of luciferase reporters with 3’UTR-resident target sites (miR1285-2xT). Mutating the seed sequence abolished this activity, indicating that miR-1285-1 is a functional miRNA. (C) Alu-derived miR-1285 has MREs predicted in homologous Alu sequences. (D) Coexpression of the miR-1285 mimic with predicted target 3’UTR reporters led to a reduction in luciferase activity. (E) Anti-miRs™ did not affect miR-1285 activity, likely due to a dilution effect from other transcripts with Alu-derived sequences. N=3; Error bars = SD. * = p ≤ 0.05 (ANOVA; Tukey’s post hoc).
Figure 2-13. Pol III intronic promoters drive intronic miRNA expression.
(A) Representative diagram of the miR-566 genomic locus in several species. (B) Diagram of gHsa-miR-566 and gMmus-Sema3F constructs. The gHsa-miR-566 construct contains the intronic sequence of SEMA3F harboring the primate-specific Alu-derived miR-566 sequence. The gMmus-Sema3F construct contains the equivalent intronic sequence of Sema3F from mouse, which is devoid of intronic miRNA sequence. (C) MiR-566 expression in HEK293 cells. MiR-566 expression was detected by QPCR in HEK293 cells after transfection with gHsa-miR-566 but not in cells transfected with gMmus-Sema3F. MiR-566 levels were normalized to 18S expression and compared to cells transfected with gMmus-Sema3F. Data are mean ± SEM. *, P<0.05, n=4. (D) MiR-566 is expressed independently of Sema3F. MiR-566 and Sema3F expression were determined in HEK293 cells and PBMCs. Data show expression of both miR-566 and Sema3F in HEK293 cells, while PBMCs express only miR-566 and not the host gene. MiR-566 and Sema3F levels were normalized to 18S expression. Data are mean ± SEM. *, P<0.05, n=4.
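The caption states that miR-566 levels were normalized to 18S and compared to the gMmus-Sema3F control, but it does not give the calculation. The comparative Ct (2^-ΔΔCt) method is one common way to make this comparison. It is sketched below under the assumption of ~100% amplification efficiency, meaning one doubling per cycle. The function name is hypothetical.

```python
from statistics import mean

def relative_expression(target_ct, ref_ct, calib_target_ct, calib_ref_ct, efficiency=2.0):
    """Fold change of a target vs. a calibrator sample by the comparative Ct method.

    target_ct / ref_ct: replicate Ct values for the target (e.g. a miRNA) and the
    reference (e.g. 18S) in the test sample; calib_*: the same in the calibrator.
    efficiency: fold amplification per cycle (2.0 assumes perfect doubling).
    """
    delta_sample = mean(target_ct) - mean(ref_ct)
    delta_calib = mean(calib_target_ct) - mean(calib_ref_ct)
    return efficiency ** -(delta_sample - delta_calib)
```

This calculation needs a measurable Ct in the calibrator. Where a target is undetectable in the control sample, as reported for gMmus-Sema3F, a fold change cannot be computed this way without an imputed ceiling Ct.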
CHAPTER 3
LONG INTERGENIC NON-CODING RNAS ARE A POTENTIAL
SOURCE OF ENDOGENOUS MICRORNA “SPONGES”
Abstract
MicroRNAs (miRNAs) classically bind to the 3’ Untranslated Regions (3’UTRs)
of protein-coding genes, playing important roles in diverse cellular processes. Exploring
the function of individual miRNAs has relied on molecular tools that reduce a miRNA’s
expression or activity. One effective method uses constructs expressing a reporter gene
with a 3’UTR containing several miRNA binding sequences. These miRNA “sponges”
compete with endogenous targets for miRNA binding. Similarly, I find that some
endogenous long non-coding RNAs (lncRNAs) contain numerous binding sites for a
miRNA. Therefore, I propose that these lncRNAs function as endogenous miRNA
“sponges,” regulating the activity of one or more miRNAs through competitive
inhibition.
Introduction
Long, non-coding RNAs (lncRNAs) are an enigmatic class of novel RNA species
roughly defined as being larger than 200bp and having no evidence for coding potential
(Rinn & Chang, 2012). Although the name and definition are rather nondescript and
arbitrary, in the few years since their discovery, several distinct groups have emerged and
are generally defined according to their position and orientation in relation to nearby
protein-coding genes (Figure 1-2). For lncRNAs falling within or proximal to protein-
coding genes, this classification proved somewhat useful as many are thought to act in
cis, regulating expression of the neighboring transcripts. Recently, interest has grown in
understanding the function of long intergenic non-coding RNAs (lincRNAs). These
long non-coding RNAs have been implicated in the coordination of epigenetic
processes. Most long non-coding RNAs are restricted to the nucleus, supporting their role
in epigenetic and transcriptional control, but some are predominantly cytoplasmic and are
capped, spliced and polyadenylated like mRNAs. Because microRNAs (miRNAs)
classically regulate protein-coding mRNAs by binding to non-coding 3’UTRs, I
hypothesized that some mRNA-like lncRNAs may be substrates for miRNA binding.
One of the general mechanisms by which lncRNAs can regulate transcriptional or
epigenetic states is by acting as decoys for protein regulatory factors (Wang & Chang,
2011). For example, growth arrest-specific 5 (Gas5) is an lncRNA that forms a structure
mimicking a DNA glucocorticoid response element (GRE). In this way, Gas5 competes
for binding with the DNA binding domain of the glucocorticoid receptor (Kino et al,
2010).
In this study, I propose a mechanism by which lncRNAs can act as decoys for
Argonaute (Ago)-bound miRNAs. Binding between a miRNA and mRNA classically
occurs through the 3’UTR of the mRNA as the coding region is typically a less-effective
substrate (Garcia et al, 2011). LncRNAs are effectively “UTRs”, thereby providing, at
least theoretically, large non-coding platforms for miRNA binding. An lncRNA carrying
multiple miRNA recognition elements (MREs) could sequester miRNAs and thereby
impede miRNA:mRNA interactions (Figure 3-14). To find lncRNAs that could act as miRNA
“sponges” of biological relevance, I predicted MREs in lncRNAs that are expressed at
high levels in mouse Embryonic Stem Cells (ESCs) by microarray, or in particular
regions of the mouse brain by in situ hybridization (ISH). From this, I found many
candidate lncRNAs with more than 10, and some with as many as 40, MREs for a single
miRNA family. I characterize one such lncRNA with 23 binding sites for the miR-15/16
family in its longest annotated isoform. Interestingly, I find evidence for alternative
lncRNA isoforms, formed by alternative splicing or transcription start site choice, that
remove many of the predicted miR-15/16 binding sites and could thereby regulate the
degree to which the miRNA is sequestered.
Together, our findings suggest a mechanism for lncRNA-mediated regulation of miRNA
activity.
Methods
Data sources
Accession numbers for the mouse “Brain” and “ESC” long non-coding RNAs
were taken from the supplementary data of two previous studies (Dinger et al, 2008;
Mercer et al, 2008). Using these accession numbers, sequences were obtained from the
UCSC Genome Browser (mm9). Mouse miRNA data, including seed sequences,
conservation level, and seed family annotations, were obtained from the TargetScan
website (http://www.targetscan.org/) (Grimson et al, 2007).
In situ hybridization (ISH) data for PSMI16 in adult mouse brain were obtained
from the Allen Brain Atlas (http://mouse.brain-map.org/) (Ng et al, 2009). PSMI16 data
series were found using its accession number from the Riken database, 6720401G13Rik.
The same search term was used to obtain ISH data for the e14.5 mouse embryo, available
from the Eurexpress transcriptome atlas (http://www.eurexpress.org/ee/) (Diez-Roux et
al, 2011).
Prediction and analysis of MRE content in lncRNAs
Target sites for mouse miRNAs were predicted in lncRNA sequences from both
datasets independently using the standalone Perl implementation of the TargetScan 5.1
algorithm (Lewis et al, 2005). For the purposes of representing the distribution of MRE
frequency, only one representative of a miRNA seed family was used. Additionally, for
lncRNAs with multiple isoforms, the sequence with the highest MRE frequency for a
given miRNA was represented.
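The two tabulation rules above can be sketched as follows. The first rule keeps one representative per seed family. The second, for multi-isoform lncRNAs, keeps the isoform with the most sites for that family. The input is assumed to be one (gene, isoform, seed family) record per predicted site, which would first have to be extracted from the TargetScan output. The function names are hypothetical, and the 10-site cutoff mirrors the threshold used in the Results.

```python
from collections import Counter, defaultdict

def max_mre_per_gene(site_records):
    """Map (gene, seed_family) -> highest site count over that gene's isoforms.

    site_records: iterable of (gene_id, isoform_id, seed_family), one per site.
    Keying on the seed family collapses miRNAs that share a seed.
    """
    per_isoform = Counter(site_records)
    best = defaultdict(int)
    for (gene, _isoform, family), n in per_isoform.items():
        best[(gene, family)] = max(best[(gene, family)], n)
    return dict(best)

def genes_with_many_sites(best, min_sites=10):
    """lncRNAs with at least min_sites MREs for one or more seed families."""
    return {gene for (gene, _family), n in best.items() if n >= min_sites}
```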
RNA isolation and RT-PCR
Tissues were harvested from wild-type C57BL/6 mice after deep anesthesia with
isoflurane and sacrifice by cervical dislocation. Tissues were immediately placed
in ~300 µl RNAlater (Life Technologies) and stored at 4°C overnight.
Total RNA was extracted using TRIzol reagent (Life Technologies) according to
the manufacturer’s protocol. After removing the RNAlater, ~1 ml of TRIzol was added
to the mouse tissues, which were homogenized on ice using a micropestle. RNA
samples were quantified by spectrophotometry, and 1.0 μg of total RNA was treated for
1.5 hr with DNase I to remove genomic DNA contamination (DNA-free kit, Ambion®).
Unless otherwise indicated, cDNA was generated from ~500 ng of the RNA using the High
Capacity cDNA Reverse Transcription Kit with random primers (Life Technologies).
Ago immunoprecipitation
Immunoprecipitations were performed using Dynabeads (Invitrogen). Beads were
prepared according to the manufacturer’s protocol, binding either the Ago or IgG control
antibodies. NPC or HEK293 cells were lysed using RIPA buffer supplemented with RNase and
protease inhibitors. The immunoprecipitations were also performed according to the
manufacturer’s protocol, with the following details. Cell lysates were incubated with the
Dynabeads for two hours at 4°C. After incubation, the samples were placed on the magnet
to separate the bound beads, and ~100 µl of the supernatant was retained as input. Three
washes were then performed with lysis buffer. After the final wash, the beads were
separated, the buffer was removed, and 1 ml of TRIzol was added directly to the beads and
to the reserved supernatant. RNA was isolated as above. RT-PCR was performed to detect
PSMI16 as above, except that 200 ng of RNA was used because of low RNA yield in the IP.
Results
Abundant MRE content is evident in many mouse lncRNAs
To predict the extent of miRNA binding to lncRNAs, I evaluated MRE content in
a set of lncRNA sequences with previously-defined expression patterns in mouse brain or
embryonic stem cells (ESCs). ~460,000 combined seed matches were found in 849 and
945 sequences, representing confidently-expressed lncRNAs in the “brain” or ESC
datasets, respectively. Because the average lncRNA is expressed at lower levels than a
typical mRNA, I hypothesized that, to compete effectively for miRNA binding, lncRNA
decoys would need numerous binding sites for the miRNA they regulate. To uncover
candidate interaction pairs, I tabulated MRE frequency for all predicted miRNA:lncRNA
interactions. As a control, target prediction was repeated on all sequences after
performing a randomized dinucleotide shuffle. 148 brain and 63 ESC-expressed lncRNAs
had at least 10 MREs for one or more miRNAs (Figure 3-15). By contrast, only three
such events were predicted in the scrambled control datasets. Many well-conserved
miRNAs, including pro-oncogenic miR-27, the tumor-suppressive and developmentally-
important miR-15/-16 and miR-302 families, and brain-enriched miR-128 and miR-338
had at least 10 sites predicted in one or more lncRNAs (Table 3-1). Interestingly, many
lncRNAs had numerous target sites predicted for several different miRNAs. For example,
the lncRNA NR_015505 (BC066100, in Table 3-1) has 11 MREs for miR-338, 12 for
miR-302 and 23 for the miR-15/16 family, suggesting a potential to coordinately regulate
multiple miRNAs. However, because the miR-15/16 family has nearly twice the MRE
content of any other miRNA in this transcript, I hypothesized that these miRNAs would
be the most likely candidates for competitive inhibition. Therefore, for the purposes of
this study and for simplicity, I refer to NR_015505 throughout the text as Putative
Sponge for miRNA-16 (PSMI16).
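The dinucleotide-shuffled control described above keeps the count of every dinucleotide while scrambling the sequence. Because seed matches are short, this control retains local composition effects such as CpG depletion. The text does not name the shuffling tool used. The sketch below therefore implements one standard exact method, the Altschul-Erickson shuffle, carried out as a random Eulerian path through the dinucleotide graph. As in that algorithm, the first and last bases stay fixed.

```python
import random
from collections import Counter, defaultdict

def dinucleotide_counts(seq):
    """Count every overlapping 2-mer in seq."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))

def _reaches_root(v, root, last_edge):
    """Follow chosen exit edges from v; True if they reach root without a cycle."""
    seen = set()
    while v != root:
        if v in seen:
            return False
        seen.add(v)
        v = last_edge[v]
    return True

def dinuc_shuffle(seq, rng=None):
    """Random permutation of seq with identical dinucleotide counts."""
    rng = rng or random.Random()
    if len(seq) <= 2:
        return seq
    # Directed multigraph: one edge a -> b for each dinucleotide "ab" in seq.
    succ = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        succ[a].append(b)
    first, last = seq[0], seq[-1]
    # Choose a final exit edge for every vertex except the last base, retrying
    # until these edges form a tree that drains into `last`.
    while True:
        last_edge = {v: rng.choice(outs) for v, outs in succ.items() if v != last}
        if all(_reaches_root(v, last, last_edge) for v in last_edge):
            break
    # Shuffle the remaining edges at each vertex; the exit edge is used last.
    order = {}
    for v, outs in succ.items():
        outs = list(outs)
        if v in last_edge:
            outs.remove(last_edge[v])
            rng.shuffle(outs)
            outs.append(last_edge[v])
        else:
            rng.shuffle(outs)
        order[v] = outs
    # Walk the resulting Eulerian path from the original first base.
    used = defaultdict(int)
    result, v = [first], first
    for _ in range(len(seq) - 1):
        nxt = order[v][used[v]]
        used[v] += 1
        result.append(nxt)
        v = nxt
    return "".join(result)
```

Saving the tree edges for last is what guarantees that the walk never strands unused edges. A naive shuffle of independent dinucleotides would not reassemble into a valid sequence.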
Expression pattern of lncRNA, PSMI16
Adult mouse brain
PSMI16 was one of only two lncRNAs that had more than 10 sites for any
conserved miRNA and was expressed in both the brain and ESC datasets. The “Brain”
lncRNA dataset was derived from lncRNAs with ISH data available from the Allen Brain
Atlas. The ISH data for NR_015505 revealed the highest expression levels in regions
within the cerebellum and hippocampus (Figure 3-16). Closer inspection of the
hippocampal expression pattern revealed that expression was restricted to the granule
cell layer of the dentate gyrus and the pyramidal layer of Fields CA1, 2 and 3. Similar
regional restriction was seen in the cerebellum, where high expression levels were
observed only in the periphery of the granular layer of the cerebellar cortex. While I
hypothesized that high MRE content would help overcome the low expression levels
often observed for lncRNAs, these data show that in certain regions PSMI16 is expressed
at high levels, in addition to carrying 23 MREs for miR-16.
Developing mouse at e14.5
Because PSMI16 was also expressed in the ESC dataset, I was interested to see
whether the lncRNA was expressed in the developing mouse. An ISH data series showing
PSMI16 expression in a mouse embryo 14.5 days post coitum (DPC) was found in the
Eurexpress Transcriptome Atlas database (Diez-Roux et al, 2011). These data show strong
regional expression, with moderate to high expression in the nervous system (Figure 3-17).
Other adult mouse tissues and cell lines
To characterize PSMI16 further, I tested its expression in adult mouse tissues
using semi-quantitative RT-PCR. Expression was detected in all tissues tested, with
particularly high levels in colon, thymus, lung, pineal gland and ovaries (Figure 3-18).
These data show that the lncRNA is expressed in many embryonic and adult tissues. I
also tested several mouse cell lines for PSMI16 expression, including brain-derived
endothelial (BEND3), neuroblastoma (N2A), and neural progenitor (NPC) cells. N2As
showed the lowest level of expression, and so NPC and BEND3 cells were used for further
studies (Figure 3-18). In these latter experiments, Oligo-dT primers were used in the RT
reactions, creating a cDNA library of polyadenylated transcripts. I was able to amplify
PSMI16 from Oligo-dT libraries not only with the same primer set as before, but also with
a set amplifying a near full-length product. This suggests that PSMI16 is a
polyadenylated transcript.
PSMI16 associates with Ago2
Although PSMI16 was expressed at high levels in many biological settings, in
order to function as a miRNA target decoy, it should associate with miRNA-containing
complexes. Specifically, a target decoy should associate with RISC, within which
miRNAs are bound by Ago proteins. To test whether PSMI16 associates with this
complex, I performed RNA immunoprecipitation (RIP) using an antibody for Ago2 on
NPC cell lysates, which had appreciable levels of PSMI16 and miR-16 (Figure 3-18 and
Sarah Fineberg, unpublished data). Because PSMI16 is rodent-specific, human HEK293
cells were used as a negative control. RNA was extracted from bound and unbound
fractions, and RT-PCR for PSMI16 was performed. As expected, HEK293s showed no
detectable PSMI16 expression (Figure 3-19). In NPCs, PSMI16 was detected in both the Ago
and IgG control supernatants, confirming its expression in these cells and its integrity
through the course of the experiment. In the IP samples, however, a single specific band
was seen only in the Ago-bound IP fraction of mouse NPCs, demonstrating that PSMI16
associates with Argonaute proteins in NPCs.
“Modular” exon structure and differential MRE inclusion in PSMI16 alternative isoforms
Annotations available at the UCSC Genome Browser indicate that PSMI16 is a ~5.7 kb
transcript with 20 exons and several alternative isoforms formed from differential
splicing, promoter use, or 3’ end choice. The structure of the primary isoform,
drawn with exons roughly to scale in Figure 3-20, reveals that the predicted target sites
for miR-16 are present in 13 of the exons within the first (5’) two-thirds of the transcript.
Interestingly, many of the exons are near-identical copies of one another, as demonstrated
by a multiple sequence alignment of the miR-16 binding sites with ~18 flanking bases
(Figure 3-20; bottom). Remarkably, in addition to the full-length transcript, a novel short
isoform was amplified in BEND3 cells that excluded as many as 12 predicted binding
sites (Figure 3-20). In addition, another annotated variant (AK030946) skips exons 4-13,
leaving only 4 miR-16 MREs (Figure 3-20). These data suggest an intriguing mechanism by which the
repetitive MRE-containing exons could serve as “Modular” units, allowing fine control
over MRE frequency and, consequently, miRNA repression levels.
Discussion
During the course of these experiments, five highly-publicized articles were
published in quick succession demonstrating various biological systems where this
miRNA “sponging” mechanism plays an important role (Cesana et al, 2011; Karreth et
al, 2011; Poliseno et al, 2010; Sumazin et al, 2011; Tay et al, 2011). These transcripts
were termed competing endogenous RNAs (ceRNAs). The first study
demonstrated that a PTEN pseudogene had no coding potential, but retained many of the
same miRNA binding sites as its protein-coding counterpart (Poliseno et al, 2010).
Compared with the example proposed in this chapter, the pseudogene ceRNA
would likely have a more precise impact on PTEN levels, because it would compete for
multiple miRNAs, all of which bind the PTEN transcript. Interestingly, the same group
later reported that protein-coding PTEN functions as a ceRNA in a coding-independent
manner, adding to the complexity of this system (Tay et al, 2011). Finally, an mRNA-like
non-coding RNA, like PSMI16, was shown to function as a ceRNA during muscle
differentiation (Cesana et al, 2011). That study showed that linc-MD1 “sponges” miR-133 and
miR-135, which themselves regulate transcription factors that activate muscle-specific
gene expression. Together, these data add a great deal of complexity to an already-
complex system of post-transcriptional gene regulation.
PSMI16 may still prove to be an interesting case study, since regulation of its
alternative isoforms adds an intriguing layer of complexity to the ceRNA story. However,
the linchpin in the PSMI16 story was finding a measurable indication of cellular
responses to miR-16 expression, a point that ultimately remained elusive. I proposed
measuring levels of previously-established miR-16 target genes after manipulating the
levels of PSMI16 or altering the accessibility of its MRE sites. However, I was unable to
validate any previously-described miR-16 targets in BEND3 cells. Overexpression of
miR-16 using Pre-miR™ mimics and inhibition with Anti-miRs™ had no effect on the
levels of the five genes tested (not shown). Artificial reporters for miR-16 showed that
the Pre-miR™ and Anti-miR™ treatments were working properly, suggesting that the
target genes tested are not responsive to miR-16 in these cells. Therefore, any response
observed from manipulating PSMI16 levels would likely be non-specific in this setting.
Figure 3-14. Proposed mechanism for microRNA competitive inhibition by endogenous long non-coding RNA “sponges”.
(Bottom right) In a typical setting, a pri-miRNA is transcribed in the nucleus and processed sequentially by the RNase III enzymes Drosha and Dicer. The mature guide miRNA is then loaded into an Ago protein (Ago2 is depicted). (Top right) The miRNA guides the Ago-containing RISC machinery to complementary binding sites in the 3’UTR of a protein-coding mRNA. This reduces protein output through transcript destabilization or translational inhibition. (Top left) A proposed long non-coding RNA (lncRNA) “sponge” is transcribed in the nucleus, then possibly capped, spliced and polyadenylated before being exported to the cytoplasm, where the miRNP complexes are located. An lncRNA with numerous miRNA binding sites is proposed as a means to effectively compete for miRNA binding. (Bottom center) With miRNP complexes sequestered on the lncRNA, translation of target mRNAs resumes.
Figure 3-15. Distribution of MRE frequency in predicted miRNA/lncRNA pairs.
Modified histograms summarizing the frequency of binding sites predicted for all possible miRNA (492) x lncRNA interactions in the “Brain” (1665) or “ESC” (1333) datasets. Control lncRNA datasets (dotted lines) were generated by a randomized dinucleotide shuffling of each sequence.
Table 3-1. Putative lncRNA “sponges” and MRE frequency for conserved miRNAs
lncRNA accession     Coordinates (mm9)             (No. of MREs) conserved miRNA¹
AK077064             chrY:30704983-30714563(+)²    (25) miR-590-3p; (15) miR-145, miR-186, miR-205, miR-24, miR-344d, miR-488; (12) miR-340-5p; (10) miR-129-5p, miR-136, miR-155, miR-214, miR-28, miR-339-5p, miR-376c, miR-433, miR-539-5p, miR-544-3p, miR-592
BC066100 (PSMI16)    chrX:47916301-47988243(-)     (23) miR-16; (12) miR-302c; (11) miR-199a-5p, miR-338-3p
AK036570             chr3:127039311-127201919(-)   (10) miR-296-3p; (9) miR-376b; (8) miR-486
AK148461             chr4:109074884-109078859(-)   (10) miR-140; (8) miR-142-3p
AK145034             chr2:18609149-18611941(-)     (10) miR-149; (9) miR-544-3p
AK028839             chr8:120205521-120217098(+)   (14) miR-342-3p; (11) miR-377
AK141020             chr2:173950538-174009419(-)   (10) miR-495
AK034303             chr8:96953714-96958182(-)     (8) miR-495
AK031731             chr2:75538560-75542038(-)     (9) miR-495
AK048599             chr13:59873240-59886735(-)    (8) miR-146a
AK038923             chr5:142577068-142579235(+)   (12) miR-16
BC030475             chr7:116094278-116105049(+)   (8) miR-16
AK083005             chr11:32626600-32630288(+)    (8) miR-18a
AK054418             chr8:86105567-86109838(-)     (8) miR-214
AK017143             chr5:22888125-22939143(-)     (23) miR-361
AK133305             chr4:39345077-39397282(-)     (13) miR-378
DQ127229             chr8:93352735-93578407(+)     (10) miR-544-3p
BC098197             chr18:89461815-89466349(-)    (10) miR-590-3p
AK046284             chr7:66541346-66544482(-)     (9) miR-590-3p
¹ “Broadly-Conserved” or “Conserved” based on TargetScan miRNA family annotations
² Coordinates are from the mm10 assembly (GRCm38.p2)
Figure 3-16. PSMI16 (NR_015505) In situ hybridization reveals strong regional expression in adult mouse brain.
In situ data for the adult mouse brain were obtained from the Allen Brain Atlas (Ng et al, 2009), and the images below were modified by Ryan Spengler. PSMI16 is listed under its Riken dataset ID, 6720401G13Rik. Sagittal and coronal section data series are available. Expression level filters were applied, and images are shown for (Top Left) a coronal section (position 222) and (Top Right) a sagittal section (position 36). The highest expression levels (Bottom; orange structures) were seen in the Hippocampal Formation (HPF, green structures), specifically in the granule cell layer of the dentate gyrus and the pyramidal layer of Fields CA1, 2 and 3. (Bottom right) High expression was also seen in the granule cell layer of the cerebellar cortex (CBX, yellow structures).
Figure 3-17. Strong regional expression of PSMI16 is seen in the developing mouse (14.5 DPC) by in situ hybridization.
In situ expression strength values (left) and images (right) are taken from the Eurexpress website (Diez-Roux et al, 2011). Expression strength is provided by the database as a numeric depiction of subjective assessments of signal intensity in the regions falling under the anatomical system categories shown on the graph. Moderate to high expression is seen in the nervous system, as in the adult mouse (Figure 3-16). (Right) In situ hybridization of PSMI16 is shown in a sagittal section of the mouse embryo. Select areas annotated as being of “High” expression are indicated on the image.
Figure 3-18. PSMI16 expression by RT-PCR in (A) adult mouse tissues and (B) cell lines.
(A) RT-PCR was performed on several tissues from the adult mouse using random primers for the RT step and specific primers to detect PSMI16. The ~100 bp product was detectable in all tissues tested, with particularly high levels seen in the thymus, lung, pineal gland, ovary and kidney. (B) NPC, BEND3 and N2A cells were tested for expression of PSMI16. Oligo-dT primers were used for the RT step to test whether PSMI16 is likely polyadenylated. Expression was highest in NPC and BEND3 cells, as detected by the primer set used in A (lower bands, bottom right). A nearly full-length product was also detected in NPC and BEND3 cells (top bands). Both bands suggest that PSMI16 is polyadenylated.
Figure 3-19. PSMI16 associates with Ago proteins in mouse neural progenitor cells.
An Ago2 antibody was used to immunoprecipitate Ago2 and bound RNAs from mouse NPCs and human HEK293 cells (negative control). RNA was purified and reverse transcribed from bound (IP) and unbound (Supernatant) samples. PCR was performed on the cDNAs (30 cycles) using a primer set for PSMI16 or a β-actin control. A specific band was seen in the Ago IP fraction of the NPCs and not the HEK293s, showing that endogenous PSMI16 associates with Ago2 in these cells.
Figure 3-20. Differential MRE incorporation in alternative PSMI16 isoforms.
(Top) 23 miR-16 MREs are predicted in the PSMI16 reference sequence (NR_015505), spread across 13 of 20 exons. Alternative inclusion of MRE-containing exons is apparent in annotated isoforms, including AK030946 (shown above), which incorporates only 4 sites. The novel short isoform we cloned incorporates 12. (Bottom) Multiple sequence alignment of MRE-containing exons reveals high sequence similarity.
CHAPTER 4
SISPOTR: A TOOL FOR DESIGNING HIGHLY SPECIFIC AND
POTENT SIRNAS FOR HUMAN AND MOUSE
Abstract
RNA interference (RNAi) serves as a powerful and widely-used gene silencing
tool for basic biological research and is being developed as a therapeutic avenue to
suppress disease-causing genes. However, the specificity and safety of RNAi strategies
remains under scrutiny because small inhibitory RNAs (siRNAs) induce off-target
silencing. Currently, the tools available for designing siRNAs are biased towards efficacy
as opposed to specificity. Prior work from our laboratory and others’ supports the
potential to design highly specific siRNAs by limiting the promiscuity of their seed
sequences (positions 2-8 of the small RNA), the primary determinant of off-targeting.
Here, a bioinformatic approach to predict off-targeting potentials was established using
publicly available siRNA data from more than 50 microarray experiments. With this,
we developed a specificity-focused siRNA design algorithm and accompanying online
tool which, upon validation, identifies candidate sequences with minimal off-targeting
potentials and potent silencing capacities. This tool offers researchers unique
functionality and output compared to currently available siRNA design programs.
Furthermore, this approach can greatly improve genome-wide RNAi libraries and, most
notably, provides the only broadly applicable means to limit off-targeting from RNAi
expression vectors.
Introduction
RNAi is mediated by small RNAs (~21 nucleotides) which are loaded into the
RNA Induced Silencing Complex (RISC), generating a functional complex capable of
base-pairing with and repressing target transcripts (Provost et al, 2002). Scientists have
devised strategies to co-opt the cellular RNAi machinery to silence virtually any gene of
interest using siRNAs, which may be chemically synthesized or expressed in the context
of stem-loop RNAs [e.g. short-hairpin RNAs (shRNAs)]. RNAi tools are vital for
functional genomics studies which enrich our understanding of basic biological
processes. In addition, RNAi-based therapeutics exhibit exciting potential to treat
numerous human ailments by suppressing disease-associated genes (Davidson &
McCray, 2011). However, the utility of RNAi is appreciably limited by our ability to
design siRNAs that are both potent and specific. Considerable evidence indicates that
siRNAs bind to and regulate unintended mRNAs, an effect known as off-target
silencing (Chi et al, 2003; Jackson et al, 2003; Semizarov et al, 2003). Although
most siRNA design algorithms use BLAST to identify off-target transcripts with
near-perfect complementarity, off-targeting primarily occurs when the seed region
(nucleotides 2-8 of the small RNA) pairs with sequences within the 3’UTRs of unintended
mRNAs, inducing translational repression and transcript destabilization similar to
canonical microRNA-based silencing (Guo et al, 2010; Jackson et al, 2006; Lewis et al,
2003). Notably, short stretches of complementarity – as little as 6 bp – may be sufficient
to initiate off-target silencing (Birmingham et al, 2006) (Figure 4-1A).
Numerous reports indicate that seed-based off-targeting generates false positives
in RNAi screens and dictates the toxicity potential of siRNAs (Anderson et al, 2008;
Fedorov et al, 2006; Ma et al, 2006; Schultz et al, 2011). Anderson et al. reported that
the extent of siRNA off-targeting correlates with the frequency of seed complements
(hexamers) present in the 3’UTRome (Figure 4-1B) (Anderson et al, 2008). Upon evaluating
subsets of siRNAs with differing off-targeting potential (low, medium and high; based on
3’UTR hexamer distributions), they found that the low subset had significantly diminished
microarray off-target signatures and fewer adverse effects on cell viability than the other
subsets. These findings established the importance of considering seed complement
hexamer frequencies as a key criterion for designing highly specific siRNAs, and some
siRNA design algorithms have since incorporated seed-specificity guidelines
(Birmingham et al, 2007; Jackson & Linsley, 2010; Naito et al, 2004). However, these
algorithms remain strongly biased toward silencing efficacy, and because numerous
potency-based filters are applied ahead of specificity guidelines, few candidate siRNAs
with low off-targeting potential seeds emerge. This is reflected in recent literature and
genome-wide RNAi libraries, in which only 10% of siRNAs fall into the low off-targeting
range previously established by Anderson et al. (Boudreau et al, 2011; Moffat et al,
2006). While potency-based design is rational, it predicts only a fraction of the functional
siRNAs for a given target transcript, and in many instances highly functional siRNAs fail
to satisfy several design rules.
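The hexamer-frequency criterion of Anderson et al. can be illustrated with a minimal Python sketch. It assumes that the seed complement is the reverse complement of guide (antisense) positions 2-7 and that 3’UTRs are supplied as a dictionary of gene symbol to sequence; the function names are hypothetical, and this is not the code used in the original study.

```python
# Hypothetical helpers illustrating the 3'UTR hexamer-frequency criterion.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def seed_complement_hexamer(guide: str) -> str:
    """Return the 3'UTR hexamer (DNA letters) pairing with guide positions 2-7."""
    seed = guide.upper()[1:7]  # 1-based positions 2-7
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def count_hexamer_offtargets(guide: str, utrs: dict) -> int:
    """Count transcripts whose 3'UTR contains at least one seed-complement hexamer."""
    hexamer = seed_complement_hexamer(guide)
    return sum(
        1 for utr in utrs.values() if hexamer in utr.upper().replace("U", "T")
    )


if __name__ == "__main__":
    utrs = {"GENE1": "AACTACCTCAA", "GENE2": "GGGGGGGGGG", "GENE3": "TTTACCTCTT"}
    print(count_hexamer_offtargets("UGAGGUAGUAGGUUGUAUAGU", utrs))
```

Applied to all RefSeq 3’UTRs, this count is the kind of "potential off-targets" value used in the Results, where low off-targeting potential is taken as fewer than 2000 such transcripts.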
In recent work from our laboratory, we aimed to improve the safety profile of
therapeutic RNAi by designing hairpin-based vectors containing siRNAs with low off-
targeting potentials (Boudreau et al, 2011). We implemented a design scheme which
focuses on seed specificity yet promotes efficacy. This approach proved successful in
identifying therapeutic sequences which effectively silence target gene expression, induce
minimal off-targeting and are well-tolerated in mouse and non-human primate brains
(McBride et al, 2011). These promising results prompted us to extend the utility of this
approach by developing a user-friendly tool to facilitate the selection of low
off-targeting potential siRNAs for broader application in therapeutic development and basic
biological research. Here, we describe a specificity-biased design algorithm that
employs an improved means of scoring off-targeting potential, and demonstrate its
effectiveness and unique functionality in comparison to current publicly available tools.
Methods
Dataset and Sequence Retrieval
Pre-processed microarray datasets, annotations and sequences were obtained from
previously published supplementary materials (Garcia et al, 2011). This represents a
compilation of microarray data from seven earlier reports describing gene expression
changes in siRNA- or miRNA-treated HeLa cells.
TargetScan 6.0 was used to determine the frequencies of seed complement binding
sites (i.e. 6-mer, 7A1, 7m8 and 8-mer) for all 16,384 possible heptamers (corresponding
to positions 2-8 of the small RNA) in each RefSeq 3’UTR sequence (Garcia et al, 2011).
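The four seed site types counted here can be illustrated with a simplified Python sketch. It is not TargetScan's implementation: it assumes the standard definitions (6-mer: pairing to positions 2-7; 7A1: 6-mer plus an A opposite position 1; 7m8: pairing to positions 2-8; 8-mer: 7m8 plus that A), ignores offset 6-mers and conservation, and uses hypothetical function names.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp(seq: str) -> str:
    """Reverse complement in DNA letters (U is treated as T)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def classify_sites(heptamer: str, utr: str) -> Counter:
    """Count 8mer, 7mer-m8, 7mer-A1 and 6mer sites for one seed heptamer.

    heptamer: small-RNA positions 2-8, written 5'->3'.
    utr: 3'UTR sequence, written 5'->3'.
    Each site is counted once, as its most potent type.
    """
    utr = utr.upper().replace("U", "T")
    core = revcomp(heptamer[:6])  # target hexamer pairing with positions 2-7
    pos8 = revcomp(heptamer[6])   # target base pairing with position 8
    counts = Counter()
    start = utr.find(core)
    while start != -1:
        # Position 8 pairs with the base just 5' of the core;
        # the A1 base lies just 3' of the core.
        m8 = start > 0 and utr[start - 1] == pos8
        a1 = start + 6 < len(utr) and utr[start + 6] == "A"
        if m8 and a1:
            counts["8mer"] += 1
        elif m8:
            counts["7mer-m8"] += 1
        elif a1:
            counts["7mer-A1"] += 1
        else:
            counts["6mer"] += 1
        start = utr.find(core, start + 1)
    return counts
```

For example, the heptamer GAGGUAG (positions 2-8 of let-7a) yields an 8-mer for the target CTACCTCA. Because every site type contains the same 6-mer core, scanning for the core and then inspecting the flanking bases classifies each site exactly once.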
Human (GRCh37/hg19) and mouse (NCBI37/mm9) 3’UTR sequences, and
corresponding gene symbols and accession numbers were obtained from the UCSC Table
Browser (http://genome.ucsc.edu/) using RefSeq annotations (Fujita et al, 2011;
Karolchik et al, 2004; Kent et al, 2002; Lander et al, 2001; Pruitt et al, 2005).
Formulating POTS
Dataset selection
Expression data for endogenous microRNAs were excluded from the training and
validation sets, as several publications have suggested avoiding these seed sequences in
RNAi sequence design (Garcia et al, 2011; Wang et al, 2009). The GSE5814 dataset was also
excluded because 77 of its experiments tested siRNAs with the same seed sequence.
Strand-biasing analyses were performed to determine whether the sense or antisense strand
induced detectable off-targeting in each experiment. Pairwise t-tests were performed
comparing genes with at least one 7-mer or 8-mer site (≥1 8mer, 7M8 or 7A1) for either the
sense or antisense strand seed sequence to genes with no predicted 3’UTR target site of
any type, including 6mer sites. Experiments exhibiting highly significant repression mediated
by the sense strand (one-tailed; P≤6E-5) and little to no evidence of repression mediated by
the antisense strand (P>0.05) were removed from further analyses. Of the remaining studies,
the Dharmacon2008 dataset qualitatively showed the most diversity in seed off-targeting
potential, and it was set aside for downstream validation.
Establishing weighted probability of repression (PR) values
and POTS calculation
Following the dataset filtering described above, 53 microarray datasets from
three independent studies (Dharmacon2006, GSE5291 and GSE5769) were used as
training data to establish POTS. For each microarray dataset, transcripts with a single
predicted 3’UTR seed binding site for either the sense or antisense strand of the given
siRNA were considered; both strands were included to account for possible loading of the sense
strand, which may also mediate off-targeting. Transcripts with multiple target sites (8mer,
7M8, 7A1 or 6mer) for either strand were ignored so that the silencing potential for single
sites for each site type could be determined. Background data for each microarray
consisted of the remaining transcripts with no predicted 3’UTR seed binding sites for
either siRNA strand. Transcripts containing seed binding sites were parsed into groups
based on seed site type, and cumulative distributions of gene expression values were
generated for each transcript set.
PR values were calculated as a measure of the increased probability of repression
imparted by the presence of the single seed binding sites, relative to background
expectations. Statistical analyses were first performed on the datasets collectively to
identify the log2 fold-change value corresponding to the most significant divergence of
repressive potentials across all site types. For this, the data were analyzed at discrete
intervals (0.05 log2 fold-change increments), comparing the mean differences in
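cumulative fractions (paired-samples t-test) for each site type set relative to the
respective background values across all experiments. Fisher’s method was used to
summarize p-values at each interval. The most significant interval (-0.3 log2 fold change;
χ²=176.4; df=8; P<6E-34) was used to calculate PR values, where

$$PR_{s} = CF_{s}(-0.3) - CF_{\mathrm{bg}}(-0.3)$$

and CF_s(-0.3) and CF_bg(-0.3) are the cumulative fractions of single-site transcripts of site type s and of background transcripts, respectively, with a log2 fold change ≤ -0.3.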
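The PR definition and the Fisher's-method summary in this paragraph can be sketched in Python as follows. The sketch assumes per-experiment lists of log2 fold changes for single-site transcripts and for background transcripts; the function names are hypothetical, the paired t-tests themselves are not reproduced, and the chi-square tail probability uses the closed form for even degrees of freedom (Fisher's method always gives df = 2k) rather than a statistics library.

```python
import math


def cumulative_fraction(log2_fc, threshold=-0.3):
    """Fraction of transcripts with log2 fold change <= threshold."""
    return sum(1 for v in log2_fc if v <= threshold) / len(log2_fc)


def probability_of_repression(site_fc, background_fc, threshold=-0.3):
    """PR for one site type in one experiment: site minus background fraction."""
    return cumulative_fraction(site_fc, threshold) - cumulative_fraction(
        background_fc, threshold
    )


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


def fisher_combine(pvalues):
    """Fisher's method: X2 = -2 * sum(ln p), compared to chi-square with df = 2k."""
    x2 = -2.0 * sum(math.log(p) for p in pvalues)
    df = 2 * len(pvalues)
    return x2, df, chi2_sf_even_df(x2, df)
```

With four site types, df = 8, and chi2_sf_even_df(176.4, 8) evaluates to roughly 5.9E-34, in line with the P<6E-34 reported for the -0.3 interval.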
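These PR values were multiplied by the seed binding site frequencies (N) for each
site type in the 3’UTRome and summed to compute a weighted Potential Off-Targeting
Score (POTS) using the following equation:

$$\mathrm{POTS} = \sum_{s \in \{\mathrm{8mer},\,\mathrm{7M8},\,\mathrm{7A1},\,\mathrm{6mer}\}} N_{s} \times PR_{s}$$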
To generate the final POTS used in the siSPOTR tool, PR values were calculated
for both the validation and training datasets, and the median values served as the final PR
values. In addition, 8mer, 7M8, 7A1 and 6mer site counts for all 16,384 heptamers were
calculated from TargetScan 6.0 (Garcia et al, 2011) predictions based on human and
mouse RefSeq-annotated 3’UTRs.
Tissue-specific POTS analysis
Expression profiles from 177 human cell lines and tissues based on the
U133A/GNF1H gene atlas were obtained from the BioGPS FTP site (http://biogps.org)
(Su et al, 2004; Wu et al, 2009). For each dataset, genes with median expression values
of greater than 100 for their corresponding probe sets were considered to be expressed. A
tissue-specific POTS (tsPOTS) was calculated for each tissue, as described above, but
limiting the 3’UTRs to expressed genes when calculating site type frequencies. Spearman
correlations were computed to evaluate variability in the rank order of seed sequences by
tsPOTS, as compared to POTS calculated from all human 3’UTRs.
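As an illustration of the tsPOTS calculation and the rank comparison, the following Python sketch assumes that per-gene site counts for a heptamer and per-tissue median expression values are already available. The PR weights are those reported in the Results of this chapter (applied as fractions), the expression cut-off follows the "greater than 100" rule above, and all function names are hypothetical. Spearman's rho is computed as the Pearson correlation of average ranks.

```python
import math

# PR weights reported in the Results (initial training set), as fractions.
PR = {"8mer": 0.1458, "7mer-m8": 0.0768, "7mer-A1": 0.0656, "6mer": 0.0364}


def ts_pots(site_counts, expression, min_expression=100.0, pr=PR):
    """Tissue-specific POTS for one heptamer.

    site_counts: gene -> {site type: number of sites in that gene's 3'UTR}
    expression: gene -> median expression value in the tissue
    Genes at or below min_expression (or missing) count as not expressed.
    """
    total = 0.0
    for gene, counts in site_counts.items():
        if expression.get(gene, 0.0) > min_expression:
            total += sum(counts.get(site, 0) * w for site, w in pr.items())
    return total


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

In this setup, spearman would be applied to the list of organism-wide POTS values and the list of tsPOTS values for the same heptamers, once per tissue.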
Validating siSPOTR
Efficacy
The 2431 siRNAs in the Huesken Dataset were filtered stepwise according to the
siSPOTR design scheme (i.e. strand-biasing, GC-content and POTS rank). For a
comparison of efficacy, we used siDesign Center (Dharmacon), a highly utilized siRNA
design tool which focuses primarily on potency. Target gene coding sequences were
obtained using the Genbank Accessions provided in the Huesken siRNA Dataset and
were used as input sequences into the siDesign Center tool for siRNA design using
default settings. The top ten hits by siDesign Center were considered the top candidates
and were intersected with the Huesken siRNA dataset. Gene silencing efficacies for
overlapping siRNAs were recorded and plotted.
Ranking off-targeting potential
To evaluate the ability of the PR values to estimate the relative extent of off-
targeting, POTS values were calculated for the validation set, using the median value for
each site type determined from the training set. Target site frequencies were calculated as
described above, using human RefSeq 3’UTR sequences for transcripts present on the
array. POTS values were determined as the sum-product of the 8mer, 7M8, 7A1 and
6mer site frequencies and their respective PR values.
Cumulative distribution plots for gene expression values were generated by
parsing the transcripts by site type, without restricting the analysis to transcripts with
single sites. The number of down-regulated transcripts over background was calculated as
described above, subtracting the background fraction at the same point. Seeds were ranked
according to these values and compared to the rank order of their estimated POTS
values using Spearman rank correlations. Visual inspection of the correlation plot
showed seven qualitatively distinct outliers in the right tail of the POTS distribution (red
dots, Figure 4-6D). Spearman’s rank correlation coefficients and p-values were calculated
with and without these samples included.
Suppression signatures
Microarray data for the validation datasets were processed on a per-target-gene
basis (i.e. GAPDH, PPIB, and No Target groups) to distinguish off-targeting from gene
expression changes resulting from on-target silencing. The microarray data for each
group were evaluated to identify genes that were down-regulated by more than three
standard deviations from the mean, across the datasets, for a given gene. These gene lists
and accompanying gene expression values were imported into Partek Genomics Suite
(Partek GS, Saint Louis, MO) and used to perform hierarchical clustering by row
(columns were ordered by increasing POTS), allowing visualization of the suppression
signatures as heatmaps. Heatmaps were partitioned to separate low POTS and high
POTS siRNAs for each group. Suppression signature size was assessed qualitatively as
the area of the broadest dark blue region in each lane, and these values were plotted on a
common x-axis.
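One possible reading of the three-standard-deviation rule is sketched below in Python. The interpretation (a per-gene mean and sample standard deviation computed across the arrays of one target-gene group, with a gene flagged for an array when its value falls more than three standard deviations below that mean) is an assumption, as is the input layout; the function name is hypothetical.

```python
from statistics import mean, stdev


def suppression_signatures(fold_changes, n_sd=3.0):
    """Flag genes lying more than n_sd standard deviations below their own mean.

    fold_changes: gene -> list of log2 fold changes, one per siRNA array in a
    target-gene group (all lists share the same length and array order).
    Returns one set of flagged genes per array.
    """
    n_arrays = len(next(iter(fold_changes.values())))
    signatures = [set() for _ in range(n_arrays)]
    for gene, values in fold_changes.items():
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue  # a gene that never changes cannot be an outlier
        for j, value in enumerate(values):
            if value < mu - n_sd * sd:
                signatures[j].add(gene)
    return signatures
```

The size of each returned set gives a simple numeric counterpart to the qualitative heatmap-area measure described above. Note that with few arrays per group, a single value can only lie a limited number of standard deviations from the mean, so the group size constrains what this rule can detect.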
SiRNA Design Tool Comparison
We obtained RefSeq coding sequences for the sixteen therapeutically relevant
gene targets (Table 4-1). These sequences were used as input at each of the indicated
siRNA tool websites [siDesign Center (Dharmacon,
http://www.dharmacon.com/designcenter/DesignCenterPage.aspx), siRNA Target Finder
(Genscript, https://www.genscript.com/ssl-bin/app/rnai), DSIR (Commissariat à l'Energie
Atomique; France, http://biodev.cea.fr/DSIR/DSIR.html), and Applied Biosystems SVM
siRNA Design Tool (http://www5.appliedbiosystems.com/tools/siDesign/)] (Birmingham
et al, 2007; Vert et al, 2006; Wang et al, 2009). These websites were selected for this
comparison because they are among the few potency-based design tools that
consider seed-based off-targeting. In each case, the optional parameters were adjusted to
match our design scheme (e.g. 20-70% GC content). At siDesign Center, output siRNAs
for each of the sixteen targets were sorted by “Low Freq Seed” to identify
candidates with low off-targeting potential among the top hits. For each target, up to 50
siRNAs were obtained for POTS analysis. At siRNA Target Finder, the Machine
Learning option was used along with the Off-target filter (human, organ=house, seed
size=7, and Functional alignment option). Antiviral and Tradeoff options were
deselected, and the output siRNAs (up to 10 per target gene) were used for POTS
analysis. At DSIR, the default options were used and POTS for all candidates [ranging
from 4 to 517 siRNAs per target gene (RTP801 and APOB, respectively)] were
determined. For the Applied Biosystems siRNA Design Tool, sequences were uploaded
and siRNAs obtained. For all siRNAs evaluated in these analyses, POTS were
determined using positions 2-8 of the antisense strand.
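The tally behind this comparison can be sketched as follows, assuming a precomputed lookup table from seed heptamer (RNA letters) to POTS. The names and input layout are hypothetical; the four-siRNA quota and the POTS<50 cut-off are the values used in the Results.

```python
import math


def antisense_seed(antisense: str) -> str:
    """Positions 2-8 (1-based) of the antisense strand, in RNA letters."""
    return antisense.upper().replace("T", "U")[1:8]


def genes_meeting_quota(candidates, pots_table, cutoff=50, quota=4):
    """Genes with at least `quota` candidate siRNAs whose POTS is below `cutoff`.

    candidates: gene -> list of antisense sequences returned by a design tool.
    pots_table: seed heptamer -> POTS; seeds missing from the table never pass.
    """
    passing = set()
    for gene, antisense_list in candidates.items():
        low = sum(
            1
            for a in antisense_list
            if pots_table.get(antisense_seed(a), math.inf) < cutoff
        )
        if low >= quota:
            passing.add(gene)
    return passing
```

Running this once per tool, over the sixteen input genes, gives the per-tool counts of genes for which at least four low-POTS candidates were returned.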
Genome-wide shRNA coverage analysis and prospective
library generation and comparison
The EMBOSS Splitter tool on the Galaxy web server (http://galaxyproject.org/)
was used to generate a list of candidate siRNAs for all human RefSeq 5’-UTR, CDS and
3’UTR sequences using a 21-nt sliding window with a 1-nt offset (Blankenberg et al, 2010;
Giardine et al, 2005; Goecks et al, 2010). Candidate siRNAs were filtered to promote
antisense strand loading, retaining target sequences with the following pattern:
NN[G/C]₃₋₄N₅₋₁₉[A/T/C]₂₀₋₂₁ (subscripts denote target-site positions) (Birmingham et al, 2007;
Khvorova et al, 2003; Matveeva et al, 2007; Schwarz et al, 2003). Sequences falling outside
a 20-70% G/C content range were removed.
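A minimal Python sketch of this window-and-filter step is shown below, written as plain string processing rather than with EMBOSS or Galaxy. The regular expression encodes the positional pattern above (positions 3-4 G/C, positions 5-19 any base, positions 20-21 A/T/C); the function names are hypothetical.

```python
import re

# Positions 3-4 G/C, positions 5-19 any base, positions 20-21 A/T/C.
STRAND_BIAS = re.compile(r"^..[GC]{2}.{15}[ATC]{2}$")


def gc_fraction(site: str) -> float:
    """Fraction of G or C bases in a site."""
    return sum(base in "GC" for base in site) / len(site)


def candidate_sites(target: str, length=21, gc_min=0.20, gc_max=0.70):
    """Slide a 21-nt window (1-nt offset) and keep sites passing both filters.

    Returns (0-based start, site) tuples in DNA letters.
    """
    target = target.upper().replace("U", "T")
    sites = []
    for start in range(len(target) - length + 1):
        site = target[start:start + length]
        if STRAND_BIAS.match(site) and gc_min <= gc_fraction(site) <= gc_max:
            sites.append((start, site))
    return sites
```

A target of length L yields L - 20 windows before filtering; only windows matching both the strand-biasing pattern and the 20-70% G/C range are returned.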
POTS values were obtained for the remaining sequences and were used to rank-order
candidate siRNAs for each transcript. As in previous publications and currently
available RNAi libraries, candidates with near-perfect complementarity (0 or 1 mismatch)
across an 18-nt core (antisense strand positions 2-19) were removed (Birmingham et al, 2007;
Moffat et al, 2006). For comparison with the RNAi Consortium human shRNA
library (Broad Institute, MIT) (Moffat et al, 2006) and for coverage analysis, sequences
corresponding to the 5’-UTR through the first 30 nt of the coding region were also
removed. Candidate sites were grouped by Gene Symbol and duplicates removed,
noting sequences found in multiple transcript isoforms or with more than one site in the
same transcript. A prospective shRNA library was generated by applying an additional
filter to eliminate sequences with “TTTT” or “AAAA” motifs, allowing for compatibility
with Pol-III expression-based systems. For each dataset, up to 10 candidates with the
lowest POTS were included per gene.
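The final library-assembly filters can be sketched in Python as follows. The sketch assumes a caller-supplied function that maps each 21-nt target site to its POTS (for example, a lookup of the heptamer returned by seed_of_target); all names are hypothetical, and the near-perfect-match and 5’-UTR filters described above are not reproduced.

```python
# DNA -> complementary RNA letters.
COMPLEMENT = str.maketrans("ACGT", "UGCA")


def seed_of_target(site: str) -> str:
    """Antisense positions 2-8 (RNA letters) for a 21-nt sense target site.

    The antisense strand is the reverse complement of the site, so its
    positions 2-8 pair with site positions 14-20.
    """
    return site.upper()[13:20].translate(COMPLEMENT)[::-1]


def pol3_compatible(site: str) -> bool:
    """Reject TTTT (a Pol III termination signal) and AAAA, which would
    place a TTTT run in the antisense half of an expressed hairpin."""
    return "TTTT" not in site and "AAAA" not in site


def build_library(candidates_by_gene, pots_of, per_gene=10):
    """Keep up to `per_gene` unique, Pol III-compatible sites with the lowest POTS."""
    library = {}
    for gene, sites in candidates_by_gene.items():
        unique = {s.upper() for s in sites if pol3_compatible(s.upper())}
        # Ties in POTS are broken alphabetically so the output is deterministic.
        library[gene] = sorted(unique, key=lambda s: (pots_of(s), s))[:per_gene]
    return library
```

Deduplicating with a set before sorting mirrors the removal of duplicate sequences found in multiple transcript isoforms.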
For off-target comparison and coverage analysis with the RNAi Consortium
shRNA library (one of the few with sequence information), POTS values were assigned
based on positions 2-8 of the reported antisense strand. POTS values were binned for each
dataset for POTS distribution comparison. ShRNA coverage analysis is reported based
only on the genes included in the TRC dataset.
Results
Low off-targeting siRNAs maintain potency
We first assessed whether siRNAs with low off-targeting potential have the
capacity for potent silencing, since a diminished efficacy could explain their
underrepresentation in the literature. Upon evaluation of 2431 randomly designed
siRNAs described by Huesken et al. (henceforth referred to as the Huesken siRNA
dataset) (Huesken et al, 2005), we found that low off-targeting potential siRNAs (i.e.
those having fewer than 2000 potential off-targets based on 3’UTR seed complement
hexamer distributions) exhibit silencing efficiencies comparable to those of the remaining
sequences (~66% and 69% knockdown, respectively; Figure 4-2), with 1 in 4 siRNAs
achieving >80% silencing, a commonly accepted threshold for potency. These results
indicate that low off-targeting potential does not preclude siRNAs from being functional,
suggesting that a siRNA design scheme weighted towards seed specificity would be
capable of generating potent sequences.
Design of effective low off-targeting potential siRNAs
We thus developed a siRNA design algorithm termed siSPOTR (siRNA Seed
Potential of Off-Target Reduction) which incorporates the most prominent determinants
of siRNA efficacy while focusing mainly on seed specificity. For a given target sequence,
all possible 21-mer siRNAs are filtered based on strand-loading and GC-content and then
rank-ordered based on seed specificity.
Strand-biasing
First, siRNAs are selected to promote faithful loading of the antisense strand to
mitigate potential off-targeting mediated by the sense strand. This is achieved using
conventional siRNA design methodology based on duplex thermodynamic stability, with
strong G-C binding at the 5’ end (2 bp) of the sense strand and weak A/G-U binding at
the opposing end (2 bp; Figure 4-1A) (Khvorova et al, 2003; Schwarz et al, 2003), with target
sites corresponding to NN[G/C]₃₋₄N₅₋₁₉[A/T/C]₂₀[A/T/C/G]₂₁. Notably, this differential
stability represents the most significant attribute promoting siRNA efficacy, therefore
encouraging potency in addition to specificity (i.e. preventing off-targeting from the
sense strand) (Birmingham et al, 2007; Matveeva et al, 2007). To satisfy this criterion,
weak G-U wobble pairing at the 3’ end of the target site can be introduced by converting
cytosines into uridines. We allow sense strand modifications at positions 20 and 21 (i.e.
positions 1 and 2 of the antisense strand, respectively), while only permitting antisense
strand modification at position 1. Previously published data indicate that the first
position of the antisense strand does not influence targeting efficacy (Miller et al, 2004),
and the ability to make these base conversions increases the number of potential target
sites passing this strand-biasing filter.
GC-content
Next, putative target sequences are filtered based on GC-content, another strong
determinant of siRNA potency (Birmingham et al, 2007; Matveeva et al, 2007). A range
of 30-65% GC is considered optimal for identifying effective siRNAs and is generally
used among potency-based siRNA design algorithms. To improve our yield of siRNAs
with a potential for high specificity, we allow a broader range of 20-70% GC content.
Our evaluation of the Huesken siRNA dataset indicates that siRNAs within this range
retain a suitable potential for efficient silencing (>80% knockdown for roughly 1 in 4
randomly designed siRNAs) (Huesken et al, 2005) (data not shown).
Seed specificity
Finally, we rank candidate siRNAs by scoring seed specificity using a weighted
system (POTS: potential off-targeting score) that was formulated based on miRNA target
recognition paradigms and siRNA off-targeting data derived from siRNA microarray
studies (>50 unique siRNAs individually tested in HeLa cells). Off-targeting among these
datasets follows the well-characterized miRNA-based hierarchy of silencing potential
based on seed site type (Figure 4-3A) (Lewis et al, 2005); the presence of 8-mers within
transcript 3’UTRs confers a notably higher potential for down-regulation relative to the
intermediate 7m8 and 7A1 sites, while 6-mer sites impart the least repressive potential
over baseline transcripts (i.e. no sites). Statistical analyses performed on the datasets
collectively revealed that the most significant divergence of the repressive potentials
among all site types occurs at ≤ -0.3 log2 fold change (P<0.001, Figure 4-3B). We next
established a weighted probability of repression (PR) (i.e. the likelihood of ≥0.3 log2
fold-change down-regulation relative to baseline) for each site type by evaluating the
siRNA experiments individually to control for the observed baseline variability among
these datasets. The resulting PR values [8-mer (14.58%), 7m8 (7.68%), 7A1 (6.56%), and
6-mer (3.64%)] were calculated using the median values for each site type across the
datasets. These PR values were then incorporated into the POTS formula which
integrates both seed site type and frequency parameters. Previous reports have established
that the potential for a miRNA to down-regulate a transcript depends not only on seed
site types, but also the frequencies of these sites within a target 3’UTR (Doench & Sharp;
Grimson et al, 2007; Nielsen et al, 2007). Grimson et al. reported that multiple miRNA
seed sites in a single 3’UTR primarily act in an independent and non-cooperative manner
(e.g. two 8-mers impart twice the repressive potential relative to a single 8-mer). Our
evaluation of the siRNA microarray experiments corroborated these results (data not
shown), and thus, the POTS equation was formulated accordingly to provide an accurate
estimation of off-targeting potentials.
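$$\mathrm{POTS} = \sum_{s} N_{s} \times PR_{s}$$

where s ranges over the four seed site types (8-mer, 7m8, 7A1 and 6-mer), N = frequency of the site type in the 3’UTRome, and PR = probability of repression.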
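The POTS sum-product can be written as a few lines of Python. This sketch assumes the reported PR percentages are applied as fractions (14.58% as 0.1458) and uses the initial PR values listed above, not the retrained values adopted later in this chapter; the site counts in the example are hypothetical.

```python
# PR weights (as fractions) from the initial training set described above.
PR = {"8mer": 0.1458, "7mer-m8": 0.0768, "7mer-A1": 0.0656, "6mer": 0.0364}


def pots(site_counts, pr=PR):
    """POTS = sum over site types of N (3'UTRome site count) x PR."""
    return sum(site_counts.get(site, 0) * weight for site, weight in pr.items())


if __name__ == "__main__":
    # Hypothetical 3'UTRome site counts for one heptamer.
    print(round(pots({"8mer": 10, "7mer-m8": 20, "7mer-A1": 20, "6mer": 100}), 3))
```

Because each site contributes independently, doubling the number of 8-mer sites doubles their contribution, matching the non-cooperative model adopted above.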
We next calculated POTS for all 16,384 possible heptamers [note: the heptamer
corresponding to positions 2-8 of an siRNA/miRNA determines all of its possible seed
site type sequences (Figure 4-3A)] using transcriptome-wide human 3’UTR sequences
and observed a broad, non-uniform distribution of POTS, ranging from 5 to 5095 (Figure
4-3C). Not surprisingly, the highest scores were among heptamer sequences relevant to
polyadenylation (e.g. AAAAAAA), whereas low POTS heptamers contain CpG
dinucleotide motifs, which are relatively rare within mammalian genomes. The POTS=50
value is highlighted; it represents an approximate but relevant cut-off that is used
henceforth for demonstrative purposes throughout this chapter. This value is
noteworthy because all 14 of the previously validated low off-targeting potential siRNAs
tested by Anderson et al. have POTS<50 (Anderson et al, 2008). Furthermore, our
evaluation of 750 siRNAs and accompanying in vitro cytotoxicity data support POTS<50
as a conservative cut-off associated with an improved likelihood of tolerability (data not
shown) (Fedorov et al, 2006). The siSPOTR specificity feature serves primarily to rank
the off-targeting potential of siRNAs, and a firm cut-off for POTS values does not exist,
much like for siRNA efficacy scores provided by potency-based siRNA design
algorithms.
The importance of weighting seed site types is particularly evident in cases where
seeds sharing the same core hexamer vary greatly in the number of genes containing the
more potent 7- and 8-mer sites. For example, the seeds CGCGATa and CGCGATc each
have 302 potential off-target transcripts (based on 3’UTR hexamer counts) but
respectively have 40 and 201 transcript 3’UTRs with 7- or 8-mer sites. This 5-fold
difference creates a considerable disparity in the off-targeting potentials of these seeds,
resulting in a two-fold difference in their POTS values (Table 4-2, Table 4-3). This
illustrates the importance of considering position 8 which dictates the sequence of the
most potent seed site types (i.e. 7m8 and 8mer). We calculated the mean site type
frequencies for all possible heptamers binned by POTS value, revealing a roughly 5- to
10-fold reduction in the more potent site types for low POTS heptamers relative to those
with medium-to-high POTS (e.g. for 8mers, mean values of ~45 compared to >350,
respectively).
Finally, as a means to further refine our prediction of off-targeting potentials, we
considered the degree to which POTS is influenced by variation in gene expression
across tissues. For this, transcriptional profiling data from 177 different human
cell lines and tissues (BioGPS) were used to calculate tissue-specific POTS for all
possible heptamers. Although gene expression patterns vary greatly across tissues, POTS
ranks for each heptamer correlate strongly (r²>0.95; Figure 4-4). These data indicate that
organism-wide application of POTS is suitable.
SiSPOTR design example
We provide a step-wise example illustrating the use of siSPOTR for designing
siRNAs targeting the human PPIB coding sequence (CDS; Figure 4-5). The 648-nt target
sequence is first divided up to produce all 631 possible 21-mer siRNA target sites, and
the strand-biasing and GC-content filters described above are applied prior to
determining POTS values for the resulting siRNAs. In this example, among the 113
PPIB-targeted siRNAs which satisfy the strand-biasing and GC-content criteria, seven are
represented in the siRNA validation datasets described below, allowing visualization of
the measured off-targeting associated with their respective POTS values of 25, 29, 40,
407, 410, 510 and 560 (Figure 4-6).
Validation of siSPOTR algorithm:
efficacy and specificity
Efficacy
We gauged the capacity of siSPOTR to identify potent siRNA sequences among
the siRNAs in the Huesken dataset (Figure 4-6A). The siRNAs satisfying the strand-
biasing and GC-content criteria were rank ordered by POTS (low to high), yielding seven
siRNAs with POTS<50. Here, this relatively low number results from fewer sequences
passing the strand-biasing filter, since the capacity for introducing duplex instability
using G-U base-pairs, as described above, is not applicable to these pre-existing siRNAs.
Surprisingly, these seven siRNAs each had >80% silencing efficacy, with a mean
comparable to that of siRNAs within the database that were identified among the top hits
generated by siDesign Center (Dharmacon), a widely-used siRNA design website.
Although siDesign Center yields more hits within this dataset, only two of these
siRNAs have a POTS<50. Consequently, siSPOTR identified five siRNAs not among the
siDesign Center hits (Figure 4-6A, Venn diagram), highlighting the unique output
potential of the siSPOTR algorithm.
Off-targeting potential
We next evaluated the predictive power of POTS to estimate the extent of off-
target gene silencing observed among microarray experiments for 40 unique siRNAs
targeting GAPDH, PPIB, or “No Target”. These 40 experiments were selected because
the siRNAs encompass a broad range of POTS with relatively equal representation across
low, medium and high scores. To improve our ability to discern sequence-specific off-
targeting from gene expression changes associated with on-target silencing, the datasets
were grouped by target gene prior to calculating differential gene expression and
establishing “suppression signatures” for each siRNA. Furthermore, each of these 40
siRNAs exhibits greater than 85% silencing efficacy, reducing the potential for detecting
gene expression changes due to varying degrees of on-target silencing within groups. In
support of the POTS approach, our analyses of these datasets reveal smaller
sequence-specific “suppression signatures” among the low off-targeting potential siRNAs
(POTS<50), relative to siRNAs with higher POTS (Figure 4-6B). Notably, 13 of 28
higher POTS siRNAs produced greater “suppression signatures” than the largest one
observed among the low POTS siRNAs (Figure 4-6C). Importantly, our
analyses (data not shown) and previously published data indicate that these “suppression
signatures” consist of down-regulated transcripts enriched for 3’UTR seed
binding motifs, suggesting that most are likely to be direct siRNA off-targets (Burchard
et al, 2009; Jackson et al, 2006).
The prospect of using POTS to accurately rank off-targeting potentials among
these 40 siRNAs was also assessed. Spearman rank correlation of the POTS scores and
numbers of down-regulated off-targets observed for each siRNA indicated a positive
correlation of modest significance (Figure 4-6D, dotted line, P = 0.05). As depicted by
this plot, a few higher POTS siRNAs have low numbers of off-targets (red dots);
however, none of the low POTS siRNAs showed high numbers of off-targets. Indeed,
removing the overt outliers among the higher POTS siRNAs produces a highly
significant correlation (solid line, P < 1E-8), providing further evidence that POTS is a
reliable predictor of siRNA off-targeting potentials. These data, in conjunction with the
efficacy validation, establish the robust capability of siSPOTR to identify highly specific
and effective siRNAs.
Finally, we reasoned that training on more datasets (i.e. combining the training
and validation sets described above) could generate a more accurate POTS for ranking
siRNA off-targeting potentials. As expected, the Spearman rank correlation of POTS
scores and numbers of down-regulated off-targets observed for each siRNA showed even
greater significance (Figure 4-7). These improved POTS values are used henceforth.
Comparison of siSPOTR to other algorithms
We subsequently compared the abilities of our design strategy and other
publicly available algorithms, particularly those that incorporate seed specificity
parameters, to identify siRNAs with low off-targeting potential seeds (i.e. low POTS).
The coding sequences of 16 therapeutically-relevant genes (of varying sizes; comprising
in total ~50 kb) were used as input, and the number of candidate siRNAs with POTS<50
was determined for each algorithm. Our design scheme identified more low off-targeting
potential siRNAs [at least four siRNAs (a typical starting number for initial efficacy
screening) for all 16 of the input genes] relative to the other algorithms, which failed to
generate at least four siRNAs with POTS<50 for at least 8 of the 16 genes (Table 4-1).
This observation emphasizes a considerable limitation of current siRNA design tools that
are strongly biased towards potency, highlighting the unique functionality that siSPOTR
provides to researchers seeking siRNAs with low off-targeting potentials.
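The tally behind Table 4-1 can be reproduced from its per-gene counts. The Python sketch below uses the counts transcribed from Table 4-1 (None marks the N/A entries) and recomputes each tool's total and the number of genes with at least four candidates with POTS<50. It does not re-run any of the design tools, and the names are illustrative.

```python
# Per-gene counts of candidate siRNAs with POTS<50, transcribed from Table 4-1.
# None marks genes the online tool could not process (N/A in the table).
GENES = ["SNCA", "SOD1", "RTP801", "TOR1a", "SCA3", "VEGF", "MYC", "BACE1",
         "KRT6a", "SCA1", "SCA7", "EGFR1", "BCR-Abl", "SCA2", "HTT", "APOB"]
COUNTS = {
    "siSPOTR":   [4, 4, 19, 14, 6, 22, 31, 18, 23, 42, 35, 47, 83, 42, 82, 66],
    "siDesign":  [0, 1, 5, 3, 4, 4, 7, 0, 0, 2, 6, 5, 7, 2, 3, 1],
    "Genscript": [0, 0, 1, 6, 2, 4, 2, 2, 1, 1, 3, 3, 2, 2, None, None],
    "DSIR":      [0, 0, 0, 6, 3, 1, 4, 0, 2, 3, 7, 13, 7, 13, 8, 14],
    "AppBio":    [0, 0, 0, 1, 0, 2, 3, 0, 0, 1, 2, 2, 2, 1, None, None],
}


def summarize(counts, min_per_gene=4):
    """Total candidates and number of genes with at least `min_per_gene` candidates."""
    processed = [c for c in counts if c is not None]
    return sum(processed), sum(1 for c in processed if c >= min_per_gene)


if __name__ == "__main__":
    for tool, counts in COUNTS.items():
        total, covered = summarize(counts)
        print(f"{tool}: {total} candidates; {covered} of {len(GENES)} genes with >=4")
```

Running it reproduces the "Total" and "At least 4 siRNAs?" rows of the table.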
Prospective applications to expressed RNAi and genome-wide RNAi libraries
The siSPOTR algorithm provides an attractive approach for limiting off-targeting
from hairpin-based RNAi expression systems, which, unlike synthetic siRNAs, are not amenable to
chemical modifications that may reduce seed-based off-targeting (Bramsen et al, 2010;
Jackson et al, 2006; Vaish et al, 2011). Recently, we published microarray data showing that
RNAi vectors expressing siRNAs with low off-targeting potentials (based on 3’UTR
hexamer frequencies) show reduced off-targeting relative to sequences with more
promiscuous seeds (Boudreau et al, 2011). To ascertain whether POTS can be a reliable
indicator of off-targeting from expressed RNAi, we evaluated the association of POTS
with off-targeting for the expressed RNAi sequences tested in this previous study (eight
constructs with POTS ranging from 11 to 653). Hierarchical clustering of differentially
expressed genes (N=827, P<0.0001) among the various RNAi sequences reveals that the
clustering distance relative to the control (i.e. promoter-only vector) increases in
agreement with rising POTS values (Figure 4-8), indicating that low POTS RNAi
sequences induce fewer gene expression changes than sequences with higher
POTS values. These data substantiate the utility of siSPOTR for improving the specificity
of RNAi expression vectors.
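One simple way to quantify how far each construct's expression profile lies from the control, in the spirit of the clustering distances in Figure 4-8, is a correlation distance (1 - Pearson r) to the U6 promoter-only profile. The Python sketch below assumes per-construct mean log2 expression values over the same set of differentially expressed genes; the metric, names and toy values are assumptions, not the clustering procedure used for Figure 4-8.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def distance_from_control(profiles, control="U6"):
    """Correlation distance (1 - Pearson r) from each construct's profile to the control."""
    reference = profiles[control]
    return {
        name: 1 - pearson(values, reference)
        for name, values in profiles.items()
        if name != control
    }


if __name__ == "__main__":
    # Toy mean log2 expression values over four genes (hypothetical numbers).
    toy = {
        "U6": [0.0, 1.0, 2.0, 3.0],
        "lowPOTS": [0.1, 1.0, 2.1, 2.9],
        "highPOTS": [3.0, 1.0, 0.5, 0.0],
    }
    d = distance_from_control(toy)
    print(sorted(d, key=d.get))  # constructs ordered from closest to farthest
```

Ordering constructs by this distance and comparing the order with their POTS values gives a rough numeric counterpart to the visual trend in the dendrogram.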
Next, we investigated the feasibility of generating a genome-wide shRNA library
using this algorithm. Genome-wide RNAi screens are broadly used to discover genes
implicated in biological pathways and phenotypes; however, these screens can be plagued
by off-target effects producing false leads (Ma et al, 2006; Schultz et al, 2011). Although
bioinformatic approaches show some practicality for distinguishing off-targets from bona
fide targets (Sigoillot et al, 2012; Zhang et al), careful attention to sequence selection
may greatly reduce off-targeting among libraries. There are currently several RNAi
libraries available in synthetic siRNA or expressed forms (e.g. shRNAs). Here, we
demonstrate the potential of our siRNA design scheme to generate genome-wide RNAi
libraries with high specificity (based on POTS and BLAST, see methods). Our
prospective shRNA library (“Low POTS”) consists of 235,121 sequences (up to 10
shRNAs per target gene; median POTS = 37) and provides at least 4 shRNAs with POTS<50
for more than 78% of all RefSeq mRNAs (Figure 4-9). These sequences have reduced
(nearly 10-fold) off-targeting potential relative to those offered in a publicly available
shRNA library [178,265 sequences; median POTS = 322; The RNAi Consortium (TRC)],
which covers 0.70% of RefSeq mRNAs with at least 4 shRNAs having POTS<50. A
histogram of the POTS distributions for each of these libraries reveals an evident
disparity, with >90% of the sequences having improved POTS relative to the TRC library,
which followed a near-random distribution mirroring POTS for all possible heptamers.
For genome-wide siRNA design, the “low POTS” library coverage is even broader (data
not shown), providing an additional means to enhance specificity in combination with
chemical modifications to the seed (Bramsen et al, 2010; Jackson et al, 2006; Vaish et
al, 2011).
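The library-level statistics quoted above (up to 10 shRNAs per gene, median POTS, and the fraction of genes with at least four candidates below POTS 50) can be sketched as follows. This minimal Python example assumes candidate sequences with precomputed POTS values are already available per gene as (sequence, POTS) pairs; it omits the BLAST-based specificity filtering mentioned above, and the function names are hypothetical.

```python
from statistics import median


def build_library(candidates_by_gene, max_per_gene=10):
    """Keep the `max_per_gene` lowest-POTS (sequence, POTS) candidates for each gene."""
    return {
        gene: sorted(candidates, key=lambda c: c[1])[:max_per_gene]
        for gene, candidates in candidates_by_gene.items()
    }


def library_stats(library, pots_cutoff=50, min_per_gene=4):
    """Library size, median POTS, and fraction of genes with enough low-POTS hairpins."""
    all_pots = [pots for candidates in library.values() for _, pots in candidates]
    covered = sum(
        1
        for candidates in library.values()
        if sum(1 for _, pots in candidates if pots < pots_cutoff) >= min_per_gene
    )
    return {
        "size": len(all_pots),
        "median_pots": median(all_pots),
        "coverage": covered / len(library),
    }
```

Applying `library_stats` to an existing library (e.g. the TRC sequences with their POTS values) instead of one built by `build_library` would give the corresponding comparison figures.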
SiSPOTR Online Tool
Based on these observations, we developed an online tool employing the
siSPOTR algorithm to assist users with designing RNAi sequences with low off-targeting
potential for application in human and mouse (https://sispotr.icts.uiowa.edu). The
siSPOTR tool searches user-defined target sequences for siRNAs that pass strand-biasing
and GC% filters and outputs candidate siRNAs rank-ordered by POTS from lowest to
highest. For convenience, the sequences are ready-to-order with the necessary nucleotide
substitutions made to the sense strand to promote proper strand-loading. In addition,
DNA oligonucleotide sequences for generating corresponding shRNAs are supplied to
assist users with generating RNAi expression vectors. The output also provides detailed
off-targeting information for each siRNA including i) the number of 3’UTRs containing
each seed site type, ii) the putative off-target transcripts, and iii) counts of each seed site
type on a per transcript basis. The siSPOTR tool also alerts the user if the siRNA seed
sequence matches that of a known miRNA, as such an instance may confound
experimental results given the regulatory roles miRNAs play in numerous biological
processes and pathways. Furthermore, recognizing the ease of purchasing pre-validated
siRNAs and shRNAs, we provide an accompanying online tool which allows users to
input siRNA sequences to obtain POTS values and the detailed off-targeting information
described above. These tools will provide researchers with a dependable means to
minimize and evaluate off-targeting concerns associated with RNAi experiments.
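The filtering and ranking steps described for the online tool can be sketched in Python as below. This is an illustrative approximation, not the siSPOTR source: the strand-bias rule is reduced to the terminal base pairs (G/C at the sense 5' end; A/U, or a C convertible to U to form a G:U wobble, at the sense 3' end, as in Figure 4-5), the 30-64% GC window is an assumed placeholder, and `pots_table` and `mirna_seeds` stand in for a precomputed heptamer-to-POTS lookup and a set of known miRNA seeds.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna):
    """Reverse complement of an RNA string (5'->3' in, 5'->3' out)."""
    return rna.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G or C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def design_candidates(target, pots_table, mirna_seeds, length=19, gc_range=(0.30, 0.64)):
    """Enumerate siRNAs against `target`, filter them, and rank by POTS (lowest first)."""
    target = target.upper().replace("T", "U")
    hits = []
    for start in range(len(target) - length + 1):
        site = target[start:start + length]
        # Simplified strand-bias rule (assumption): G/C at the sense 5' end and
        # A/U, or a C that can be changed to U, at the sense 3' end.
        if site[0] not in "GC" or site[-1] not in "AUC":
            continue
        if not gc_range[0] <= gc_fraction(site) <= gc_range[1]:
            continue
        antisense = revcomp(site)  # guide strand, fully complementary to the target
        # Swap a terminal C for U on the sense strand to create a weak G:U pair.
        sense = site[:-1] + "U" if site[-1] == "C" else site
        seed = antisense[1:8]  # guide nucleotides 2-8
        hits.append({
            "start": start,
            "sense": sense,
            "antisense": antisense,
            "seed": seed,
            "pots": pots_table.get(seed, float("inf")),  # unknown seeds rank last
            "mirna_seed_match": seed in mirna_seeds,
        })
    return sorted(hits, key=lambda h: (h["pots"], h["start"]))
```

Flagging miRNA seed matches rather than discarding them leaves the decision to the user, which matches the alert-style behavior described above.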
Discussion
Consideration of Seed Pairing Stability
A recent report from the Bartel laboratory evaluated the impact of seed-pairing
stability (SPS) and target abundance (TA; levels of potential binding sites in the cellular
transcriptome) on seed-mediated silencing by small RNAs (miRNAs and siRNAs)
(Garcia et al, 2011). Their data support that seeds with weak SPS inherently have higher
TA, and that both factors limit seed-based silencing potency, presumably from weaker
binding and a dilution effect associated with the increased number of targets. In contrast
to the siSPOTR approach, the authors propose that designing siRNAs with weak SPS and
high TA seeds may minimize off-targeting potential. While the potency of such seeds
may be low on average, the possibility of repressing considerably more off-targets exists.
A comparison of the low POTS approach to the weak SPS strategy may be warranted.
When accounting for repressive potentials in addition to the numbers of predicted off-
targets, it is likely that siRNAs having weak SPS would consistently have higher numbers
of off-targets expected to be down-regulated, relative to low POTS siRNAs. Nevertheless,
SPS merits consideration in siRNA design, and we have added SPS values to
the siSPOTR output, so that users may avoid higher SPS seeds among siRNAs with
comparable POTS values.
The Utility of siSPOTR
Off-target effects (e.g. false discovery rates and toxicity) pose a problem for gene
silencing technologies, particularly for RNAi therapeutics, thus supporting the need for
developing a user-friendly tool to assist researchers in designing siRNAs which are
highly specific and efficacious. Here, and in prior work from our laboratory and others’,
we demonstrate that focusing on seed specificity in siRNA design may mitigate off-
targeting by 5- to 10-fold, as supported by predictive analyses and transcriptional
profiling data from RNAi studies (Anderson et al, 2008; Boudreau et al, 2011). Unlike
other siRNA design strategies, siSPOTR yields numerous candidate sequences with low
off-targeting potentials, providing a broad and attractive approach towards alleviating
off-target concerns. Other means to address off-targeting have been previously described.
For example, in basic biological research, scientists may employ “same seed” controls
(i.e. containing the same seed sequence as the experimental siRNA, but central
mismatches to prevent silencing of the target of interest) to discern on-target versus off-
target effects (Boudreau et al, 2011). Furthermore, studies indicate that off-targeting
from synthetic siRNAs can be reduced by chemical modifications or by using lower doses
(Bramsen et al, 2010; Caffrey et al, 2011; Jackson et al, 2006; Vaish et al, 2011; Wang
et al, 2009); however, specificity could be enhanced further by employing seeds with low
POTS. By contrast, for expressed RNAi forms (e.g. shRNAs), our approach provides the
only broadly applicable methodology to limit off-targeting potential. Although sequence-
specific effects on hairpin expression, stability, and processing may also contribute to off-
targeting potential, our data support that POTS provides a good predictor of off-targeting
for RNAi expression vectors. This is important particularly since dosing from RNAi
expression vectors cannot be as readily controlled, and shRNA-induced toxicities have
been reported by several groups (Boudreau et al, 2009a; Grimm et al, 2006 ; Martin et al,
2011; McBride et al, 2008). Given the extensive use of RNAi expression systems in the
laboratory and in therapeutic development, siSPOTR will serve as a valuable tool to the
research community.
SiSPOTR can easily be used in conjunction with other siRNA design algorithms
(e.g. those weighted towards efficacy) to query their outputs for off-targeting potentials
and information. For instance, one can use Applied Biosystems’ hyperfunctional (i.e.
highly potent) siRNA design tool to identify hyperfunctional candidate sequences which
can subsequently be input into the siSPOTR tool to retrieve their POTS values (Wang et
al, 2009). This combined approach aims to ascertain siRNAs with a highly desirable
balance of potency and low off-targeting potential, providing an attractive means to
identify therapeutic siRNAs for disease-relevant targets, particularly larger genes which
have numerous low POTS siRNAs available (Table 4-1).
SiSPOTR allows users to query the identities of predicted seed-based off-target
transcripts as a means to avoid silencing potentially important cellular genes (e.g. those involved in
cell cycle and viability). Off-target identity is an important contributor to the overall
detrimental effects caused by disrupting gene networks, and the resulting tolerability for a
given siRNA. However, declaring a predicted off-target to be important remains difficult
due to a dependence on numerous variables [e.g. experimental system (i.e. cell type),
duration and extent of knockdown, identities of other off-targets (e.g. a two-hit model),
etc.]. Nevertheless, although researchers should consider the identities of predicted off-
targets, it stands to reason that minimizing the off-targeting potential of the siRNA seed
will inherently reduce the likelihood of unintentionally silencing important genes and
further limit downstream events associated with cascading gene networks.
Finally, siSPOTR currently supports RNAi sequence design for human and mouse
experimental systems only; however, all low POTS heptamers contain CpG motifs, which are
consistently sparse throughout mammalian genomes. Furthermore, the rankings of
heptamers by POTS in mouse and human are significantly correlated (r2>0.938, plot
not shown), suggesting that siSPOTR is likely applicable to other mammalian species.
Figure 4-1. Diagram of on- and off-target silencing by siRNAs.
(A) Cartoon depicting a siRNA duplex designed to exhibit proper strand-biasing [i.e. strong G-C (blue) and weak A/G-U (red) binding at the respective 5’ and 3’ ends of the sense strand] and contain a low off-targeting potential seed (green highlight). Upon loading into RISC, the antisense strand may direct on-target silencing (intended) and off-target silencing (unintended). (B) Schematic highlighting the relationship between the frequencies of seed complement binding sites in the 3’UTRome and the off-targeting potential for siRNAs. Contributed by Ryan Boudreau.
Figure 4-2. Effect of siRNA off-targeting potential on gene silencing capacity.
A siRNA database composed of 2431 randomly designed siRNAs (targeting 31 unique mRNAs) and accompanying silencing data (Huesken et al, 2005) was used to determine whether low off-targeting potential siRNAs (i.e. those having <2000 potential off-targets based on seed complement hexamer distributions in human RefSeq 3’UTRs; blue) have similar capacities for gene silencing relative to the remaining 2068 siRNAs (mid-to-high off-targeting potentials; red). Roughly 1 in 4 of the low off-targeting potential siRNAs achieved >80% silencing (a commonly accepted threshold for potency), and overall their average efficiencies were comparable to the remaining siRNAs (~66% and 69% knockdown, respectively; dotted lines). (Contributed by Ryan Spengler).
Figure 4-3. Formulation and distribution of POTS (potential off-targeting score).
(A) Illustration of seed site types, with seed sequences highlighted in green. The adenosine corresponding to position 1 is highlighted in yellow and represents a defining feature for the 7A1 and 8mer binding site types. (B) The effect of seed site type on off-target silencing was determined using data from 54 microarray experiments testing unique siRNAs in HeLa cells. Cumulative distribution plots for gene expression values are shown for transcripts grouped by the binding site type present. Only transcripts containing single sites of a given type were considered. ***Student t-test indicated that the most significant divergence of the repressive potentials among these site types occurs at ≤ -0.3 Log2 fold-change (P<0.001). (C) Schematic illustrating how POTS is calculated using seed site type frequency and probability of repression (PR) values, shown above each respective site type. (D) The distribution of POTS scores (based on human 3’UTR sequences) for all possible 16,384 heptamers is plotted. POTS<50 is highlighted to indicate a relevant cut-off which is employed for purposes of this manuscript (refer to ‘Results’ section for further information regarding the relevance of this value). (Panels A and C contributed by Ryan Boudreau; Panels B and D contributed by Ryan Spengler).
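Figure 4-3C describes POTS as combining seed site type frequencies with probability of repression (PR) values. A minimal Python sketch of one such combination is shown below, under stated assumptions: each 3'UTR is counted once under its strongest site (8mer > 7M8 > 7A1 > 6mer), and the score is taken as the PR-weighted sum of those counts. The exact counting scheme and the PR values of Figure 4-3C are not reproduced here; the PR numbers are placeholders supplied by the caller, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")
SITE_ORDER = ("8mer", "7M8", "7A1", "6mer")  # strongest first


def revcomp(rna):
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def site_patterns(seed):
    """Target-site strings (RNA, 5'->3') for a guide seed given as nucleotides 2-8."""
    match_2_8 = revcomp(seed)      # pairs with guide nt 2-8
    match_2_7 = revcomp(seed[:6])  # pairs with guide nt 2-7
    return {
        "8mer": match_2_8 + "A",  # plus an A opposite guide nt 1
        "7M8": match_2_8,
        "7A1": match_2_7 + "A",
        "6mer": match_2_7,
    }


def best_site(utr, patterns):
    """Strongest site type present in a 3'UTR, or None."""
    for site_type in SITE_ORDER:
        if patterns[site_type] in utr:
            return site_type
    return None


def pots_sketch(seed, utrs, pr_values):
    """PR-weighted count of 3'UTRs, each assigned to its strongest site type (assumption)."""
    patterns = site_patterns(seed)
    counts = dict.fromkeys(SITE_ORDER, 0)
    for utr in utrs:
        site_type = best_site(utr, patterns)
        if site_type is not None:
            counts[site_type] += 1
    score = sum(counts[t] * pr_values.get(t, 0.0) for t in SITE_ORDER)
    return score, counts
```

Scoring all 16,384 heptamers this way against a full 3'UTR set would yield a distribution analogous to panel D, though the actual values depend on the real PR weights.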
Figure 4-4. Correlation of POTS ranks across tissues.
Tissue-specific POTS values for 177 human cell lines and tissues (BioGPS) were calculated based on genes expressed (median of probeset expression values ≥100) in those tissues. (A) POTS values calculated using all 3’UTR sequences (Overall POTS) were correlated with those calculated by the 177 expression profiles (Spearman rank correlation). The histogram and box plot show the variation of correlation coefficients (r2) for each pairwise comparison (error bars = 2-98th percentile). (B) The scatter-plot shows the correlation of Overall POTS scores with the tissue-specific POTS distributions with the worst calculated correlations (r2 0.9982-0.9986).
Figure 4-5. Workflow schematic for designing siRNAs targeting human PPIB using the siSPOTR algorithm.
All possible 631 siRNAs targeting the human PPIB coding sequence (CDS) were filtered based on strand biasing [i.e. strong G-C (blue) and weak A/G-U (red) binding at the respective 5’ and 3’ ends of the sense strand] and GC-content, and the number of siRNAs passing each criterion is provided. Note: the asterisk denotes a cytosine base in the 3’ end of the target site; this base can be converted to a uridine to produce a weak G:U base-pairing in the resulting siRNA duplex. The heptamer seed sequence used for POTS determination is highlighted. (Contributed by Ryan Boudreau).
Figure 4-6. Validation of siSPOTR: efficacy and off-targeting.
(A) SiRNA efficacy was evaluated using a database of 2431 randomly designed siRNAs with accompanying silencing data. The number of siRNAs passing each stage of our stepwise filtering process is indicated along with the number of potent sequences among them (i.e. those with >80% silencing efficacy). *siDesign Center (Dharmacon) was used for comparison by inputting the relevant target gene sequences into the online tool (N=29) and intersecting the top ten hits for each gene with the 2431 siRNAs. The box and whiskers plot shows the max and min gene silencing values (whiskers) and the upper and lower quartiles (box). The accompanying Venn diagram shows that siSPOTR identified five unique and effective sequences not present among the siDesign Top Hits. (B-D) Microarray data from experiments testing 40 unique siRNAs were used to assess the reliability of POTS as an indicator for off-targeting potential. (B) Heatmaps representing sequence-specific gene “suppression signatures” unique to each siRNA were generated using hierarchical clustering of significantly down-regulated genes (>3 standard deviations from the mean) among the datasets on a per target gene basis (i.e. GAPDH, PPIB and No Target), and columns were ordered and parsed by POTS for each group.
Figure 4-6. Continued. (C) A qualitative representation of “suppression signature” size (i.e. sum of dark blue regions) for each column is shown. The red dotted line marks the largest “suppression signature” among the siRNAs with POTS<50. (D) Spearman rank correlation of the POTS scores and numbers of down-regulated off-targets (i.e. transcripts with 3’UTRs containing 7- and 8-mer seed binding sites and ≤ -0.3 Log2 fold-change) observed for each siRNA is plotted. Linear regression lines, including correlation coefficients and p-values, for all data points (dotted line) and black dots (solid line) are provided. Red dots represent overt outliers. (Panels A, B and C contributed by Ryan Boudreau; Panels A and D contributed by Ryan Spengler).
Figure 4-7. Spearman rank correlation of final POTS values.
Spearman rank correlation of the POTS scores and numbers of down-regulated off-targets (i.e. transcripts with 3’UTRs containing 7- and 8-mer seed binding sites and ≤ -0.3 Log2 fold-change) observed for each siRNA is plotted. Data consist of the training and validation groups combined. Linear regression lines, including correlation coefficients and p-values, for all data points (dotted line) and black dots (solid line) are provided. Red dots represent overt outliers.
Figure 4-8. Effect of POTS on off-targeting from hairpin-based RNAi expression vectors.
HEK293 cells were transfected with U6 promoter-only or U6-driven hairpin-based RNAi expression plasmids (n = 4 for each treatment), and RNA was harvested 72 h later for microarray analysis. Two-way ANOVA was performed to detect differentially expressed genes among the treatment groups. Hierarchical clustering of differentially expressed genes (P < 0.0001, 827 genes) was performed to visualize the relationships among the treatment groups. Notably, all of the low POTS sequences (green) exhibit gene expression profiles that are more closely related to the U6 control, as compared to the remaining sequences which have medium (yellow) to high (red) POTS values. (Contributed by Ryan Boudreau).
Table 4-1. Comparison of siRNA design tools.
Number of candidate siRNAs with POTS<50** identified by each tool
Gene       CDS (nt)   siSPOTR   siDesign   Genscript   DSIR   AppBio
SNCA            423         4          0           0      0        0
SOD1            465         4          1           0      0        0
RTP801          699        19          5           1      0        0
TOR1a           999        14          3           6      6        1
SCA3           1086         6          4           2      3        0
VEGF           1239        22          4           4      1        2
MYC            1365        31          7           2      4        3
BACE1          1506        18          0           2      0        0
KRT6a          1695        23          0           1      2        0
SCA1           2448        42          2           1      3        1
SCA7           2679        35          6           3      7        2
EGFR1          3633        47          5           3     13        2
BCR-Abl        3816        83          7           2      7        2
SCA2           3942        42          2           2     13        1
HTT            9435        82          3         N/A      8      N/A
APOB          13692        66          1         N/A     14      N/A
Total         49122       538         50          29     81       14
At least 4 siRNAs?     16 of 16    7 of 16     2 of 16   8 of 16   0 of 16
** POTS<50 serves as a relevant cut-off for purposes of this manuscript (refer to ‘Results’ section for further information regarding the relevance of this value).
N/A indicates that the online tool was unable to process transcripts of this length.
Contributed by Ryan Boudreau.
Table 4-2. The effect of seed position 8 on off-targeting potential by site frequency.
All possible 7mer (nt 2-8) seed sequences were grouped according to their common core 6mer (nt 2-7). The number of 3’UTRs containing any 6mer binding motif was counted. The number of these putative targets containing at least one 8mer, 7M8 or 7A1 site, given the variant base at position 8, was also tallied. The ratio between the maximum and minimum number of genes among the four heptamers was then calculated for each group. The 10 groups with the highest ratios are shown in the table.
Columns A, C, G and T: # 3'UTRs with an 8mer, 7M8 or 7A1 binding site given N at seed position 8
Seed nt 2-8   # 3'UTRs with 6mer seed binding site (nt 2-7)      A      C      G      T   Max/Min
CGCGATN                                                 302     47     93    201     40      5.03
TCGCGCN                                                 343     97     49    220     66      4.49
TCGAACN                                                 852    564    134    225    234      4.21
ATCGCGN                                                 288     46     65    182     44      4.14
AATCGCN                                                 954    596    149    268    298      4.00
ATCCGCN                                                 864    228    137    529    211      3.86
CGATTCN                                                1070    675    187    343    329      3.61
AGGCGTN                                                1754    397   1194    431    332      3.60
ACCGCGN                                                 536    110    152    295     84      3.51
AGCCGAN                                                1483    272    945    319    307      3.47
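The counting procedure in the Table 4-2 legend can be sketched in Python as below, assuming the 3'UTRs are supplied as RNA strings and the core 6mer is written as guide nucleotides 2-7 (with U in place of T). The function name and toy inputs are illustrative. Note that a 7A1 site does not involve position 8, so it contributes equally to all four position-8 variants.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna):
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def position8_effect(core6, utrs):
    """Count 3'UTRs with a 6mer site for `core6` (guide nt 2-7) and, for each base
    at guide position 8, the subset with at least one 8mer, 7M8 or 7A1 site."""
    match_2_7 = revcomp(core6)
    with_6mer = [u for u in utrs if match_2_7 in u]
    by_base = {}
    for base in "ACGU":
        match_2_8 = revcomp(core6 + base)
        sites = (match_2_8 + "A", match_2_8, match_2_7 + "A")  # 8mer, 7M8, 7A1
        by_base[base] = sum(1 for u in with_6mer if any(s in u for s in sites))
    low, high = min(by_base.values()), max(by_base.values())
    ratio = high / low if low else float("inf")  # Max/Min column
    return len(with_6mer), by_base, ratio
```

Running this over every core 6mer and sorting by the returned ratio would produce a ranking like the one shown in Table 4-2.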
Table 4-3. The effect of seed position 8 on off-targeting potential by POTS
As in Table 4-2, except that the POTS values for each core 6mer sequence given A, C, G or T at position 8 are provided. The ratio between the maximum and minimum POTS in each seed group is also shown.
Columns A, C, G and T: POTS value given N at seed position 8
Seed nt 2-8   # 3'UTRs with 6mer seed binding site (nt 2-7)      A      C      G      T   Max/Min
GATTACN                                                3430    217    170    190    347      2.04
CGCGATN                                                 302     10     12     19     10      1.95
TCGAACN                                                 852     52     27     33     34      1.93
ACACACN                                                5932    620    543    588   1044      1.92
AATCCCN                                                4697    341    274    324    525      1.92
ATATACN                                                5092    419    308    356    580      1.88
ATGTACN                                                4083    262    199    233    372      1.87
CGATTCN                                                1070     64     34     45     45      1.86
GTAATCN                                                3134    265    209    387    269      1.85
AGGCGTN                                                1754     69    120     71     65      1.85
Figure 4-9. Comparison of off-targeting potentials among shRNA libraries.
A histogram and complementing table presenting the POTS distributions and genome-wide coverage of shRNA library sequences are shown for our “Low POTS” library (green) and the TRC library (red). The POTS distribution of all possible heptamers (blue) serves as a reference. The range encompassing 90% of all sequences for each shRNA library is indicated. Yellow highlights intersect to emphasize the coverage disparities at a key point; POTS<50 provides a conservative cut-off for low off-targeting potential, and at least 4 siRNAs are desired for a given gene when generating a library or performing initial efficacy screening.
CHAPTER V
FINAL DISCUSSION
Competitive Endogenous RNAs
Experimental manipulation of miRNA activity has long relied on the ability to
block or sequester miRNA binding through the use of synthetic antagomirs and expressed
miRNA sponges. These molecular tools showed that, at least in principle, miRNA
activity can be regulated in a competitive manner. In Chapter 3, lncRNAs were proposed
as endogenous miRNA “sponges,” serving as endogenous analogs to the artificial
inhibitory tools. As described briefly in that chapter, recently published functional
evidence suggests that competitive endogenous RNAs (ceRNAs) take on many forms,
including long intergenic noncoding RNAs (lincRNAs) similar to PSMI16, pseudogenes,
and even protein-coding mRNAs (Cesana et al, 2011; Hansen et al, 2013; Karreth et al,
2011; Poliseno et al, 2010; Sumazin et al, 2011; Tay et al, 2011). Based upon these
reports, “ceRNA” describes a functionally diverse array of RNA classes, much in the
same way that “RNAi” describes a general process mediated by miRNA, endo-siRNA,
piRNA and the like. Future work will likely involve functional characterization of more
ceRNA:miRNA interactions, along with the physiological or pathophysiological
pathways in which they function. Additionally, other RNA species, such as the recently-
reported circRNAs (Hansen et al, 2013), may also be ceRNAs.
The observation that pseudogenes like PTENP1 function as ceRNAs adds another
connection between transposons and miRNAs. PTENP1 is an example of a “processed”
pseudogene. Processed pseudogenes are created when a mature, spliced transcript is
reverse transcribed and integrated into the genome by retrotransposon- or retrovirus-
encoded proteins. For example, PTENP1 formed when a LINE1 element mobilized and
reverse transcribed a fully processed copy of the PTEN gene. In the Poliseno et al. study,
many of the conserved MRE sites (e.g. miR-19, -20, -21, -26 and -214) from the PTEN
3’UTR were still intact in PTENP1, thus imparting its ceRNA activity. Interestingly,
PTENP1 is present only in apes, as no syntenic locus is found in rhesus (Old World
Monkey), marmoset (New World Monkey) or mouse genomes. This exemplifies how
primate-specific transposition activity can alter the activity of conserved miRNAs.
The fact that PTENP1 retains many MREs from the parent PTEN transcript also
reveals an important nuance differentiating pseudogene ceRNAs from other ceRNAs.
Most mRNAs, like PTEN, are coordinately regulated by multiple miRNAs, and a
pseudogene could compete for them. This means that pseudogene ceRNAs would likely
have the most potent effect on the expression levels of the parent gene and any other gene
bound by the same set of miRNAs. On the other hand, lincRNA ceRNAs like PSMI16
have numerous binding sites for a given miRNA. I would hypothesize that lincRNAs
would globally impact the targets of a miRNA family, whereas pseudogenes would
regulate its parent gene more specifically.
Off-targeting and RNAi design
We took advantage of mRNA transcript degradation by miRNA-like interactions
to detect off-target effects from exogenous RNAi triggers after their delivery to cells and
tissues. We found that the extent of miRNA-mediated changes to cellular expression profiles
was robust, and in some cases, these broad transcriptional perturbations caused cell
toxicity. It stands to reason, then, that rational design of RNAi triggers with low off-
targeting potential would reduce the probability of generalized transcriptional
disturbances and subsequent toxicity.
Although in general we can reduce off-targeting probability with our siSPOTR
algorithm, we also found that some low off-targeting potential sequences induced toxicity
in vivo. This suggests that not all off-targeting can be avoided, and that empirical testing
of RNAi triggers is required to assess their overall safety. Future research to further
improve predictions of RNAi specificity would benefit from closer analysis of sequences
deemed toxic in the literature. We are currently working to find ways to “switch” the off-
targeting profile of exogenous, artificial miRNA triggers found to be toxic. We have
found that given an antisense RNA with a “low POTS” seed that induces unintended
toxicity, single base changes to the seed sequence change the off-target profile. Assuming
that at least one of the original sequence’s off-targets is problematic when
suppressed, switching to another low POTS seed should avoid most, if not all, of the original
off-target genes. To test this, we are currently working with an artificial miRNA that
effectively silences expression of Huntingtin (HTT), but which induces behavioral
deficits in wild-type C57BL/6 mice. Because similar constructs targeting HTT have been
tested in nearly identical experimental settings, achieving comparable levels of HTT
repression (Boudreau et al, 2009b; McBride et al, 2008), we hypothesize that sequence-
specific off-target effects are causing this phenotype. So far, experiments performed by
Alex Mas Monteys, using constructs I designed to alter seed sequences while retaining
potency, reveal that directed single base mutations in the toxic miRNA’s seed preserve
HTT silencing efficacy in vitro. Bioinformatic target site predictions indicate that very
few seed-mediated targets overlap between the toxic trigger and the modified ones. The
next step is to inject the original or modified sequences into C57BL/6 mice as before and
see whether the mutations correct the toxic phenotype. Notably, if toxicity persists, this is
likely due to hitting other target transcripts whose expression level must be maintained at
or near 100% for cell viability.
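The comparison of predicted seed-mediated targets between an original and a seed-modified trigger can be sketched as below. This Python example is a simplification, assuming a predicted target is any 3'UTR containing a 7M8 or 7A1 site (every 8mer contains both); the function names and toy sequences are hypothetical, and real predictions would draw on full 3'UTR annotations.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna):
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def predicted_targets(seed, utrs):
    """Names of 3'UTRs carrying a 7M8 or 7A1 site for `seed` (guide nt 2-8)."""
    sites = (revcomp(seed), revcomp(seed[:6]) + "A")
    return {name for name, seq in utrs.items() if any(s in seq for s in sites)}


def target_overlap(original_seed, modified_seed, utrs):
    """Shared targets, plus the fraction of the original targets that are retained."""
    original = predicted_targets(original_seed, utrs)
    modified = predicted_targets(modified_seed, utrs)
    shared = original & modified
    retained = len(shared) / len(original) if original else 0.0
    return shared, retained
```

A low retained fraction for a seed variant that still silences HTT would support testing it as a replacement for the toxic sequence.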
If the single base mutations prove effective in mitigating the toxic phenotypes that
manifest from seed-mediated off-target effects, it follows that the same changes could be
made to reduce the off-target potential of high POTS sequences. As discussed in Chapter
4, commercial suppliers of pre-designed RNAi sequences focus on designing the most
potent sequences for their customers. Also mentioned in Chapter 4, based on the relative
rarity of low POTS seeds, most of the sequences designed for potency and not for seed
specificity will likely have high off-targeting potential. However, we found that all low
POTS seed sequences contained at least one “CG” dinucleotide. This dinucleotide is
known to be relatively infrequent in mammalian genomes. On average, every additional
“CG” dinucleotide in a 7-mer motif results in a 10-fold reduction in 7-mer frequency
(Garcia et al, 2011). Therefore, we expect that if we start with a highly potent RNAi
sequence with relatively high off-target potential, a single base change in the seed to
introduce a “CG” dinucleotide could greatly reduce its off-targeting potential. If these
mutations have minimal impact on silencing efficacy, as we have seen with the HTT
sequences thus far, we could greatly increase our ability to design low off-targeting
sequences, and perhaps even increase our stringency in screening for potency.
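The proposed strategy of introducing a CG dinucleotide through a single seed substitution can be enumerated as below. Because the reverse complement of CG is CG, a CG in the guide seed implies a CG in every fully matched target site. This Python sketch lists single-base changes to a seed (guide nt 2-8) that raise its CG count; the candidates would still need POTS lookup and efficacy testing, and the function names are illustrative.

```python
def cg_count(seq):
    """Number of CG dinucleotides in a sequence."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def cg_introducing_variants(seed):
    """Single-base substitutions of `seed` (guide nt 2-8) that increase its CG count.

    Returns (guide_position, original_base, new_base, mutant_seed) tuples.
    """
    baseline = cg_count(seed)
    variants = []
    for i, original in enumerate(seed):
        for base in "ACGU":
            if base == original:
                continue
            mutant = seed[:i] + base + seed[i + 1:]
            if cg_count(mutant) > baseline:
                variants.append((i + 2, original, base, mutant))  # seed index 0 is guide nt 2
    return variants
```

For example, `cg_introducing_variants("GAUAGCA")` yields two candidates: GAUCGCA (A to C at guide position 5) and GAUAGCG (A to G at guide position 8).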
Emerging technologies in the study of miRNA biology
The data presented in this work, as well as many of the cited publications, have
revealed that the mechanisms underlying miRNA biogenesis and function are far more
complex than represented in the canonical pathways outlined in Chapter 1. Integrating
these newer pathways and determining the relative breadth of each to various biological
systems will be important tasks in the future. For example, Ago HITS-CLIP and similar
technologies will be essential for verifying to what extent and in which biological settings
TE-derived or lncRNA-resident MREs are actually occupied by Ago complexes. Based
on the “off-targeting” phenomenon observed with exogenous RNAi triggers, it is clear
that the RNAi machinery can be pushed to silence biologically irrelevant targets at
sufficient doses. HITS-CLIP will provide a better picture of what is actually engaged by
RISC machinery under physiological conditions.
On the other hand, the physiological role of the Ago-bound complexes also
remains an open question. Our current understanding of miRNA function is largely based
upon perturbations of individual miRNA levels in cell culture models. Less clear is what
function miRNAs play in a relatively static setting of terminally-differentiated cells. Even
in comparing miRNA binding profiles in normal versus disease states, the question will
remain as to which changes are causative and which are reactionary to the disease state. I
believe that in order to effectively use and interpret HITS-CLIP to study these kinds of
questions, we should first understand how Ago binding profiles relate to gene expression
changes in acute disease settings. For example, what happens to Ago binding profiles
during acute ischemia brought on by stroke or myocardial infarction? Furthermore, how
does the response differ in these settings in which very different mRNA and miRNA
profiles are intrinsically present? Following a common theme in biology, it seems likely
that some miRNAs will be involved in an immediate reactionary phase, followed by
another group guiding a return to homeostasis. Among the most interesting findings will
be determining to what extent the concentration of the miRNA or the targets influence the
activities of one another, given that lncRNAs, pseudogenes and mRNAs appear to
compete for miRNA binding.
Alternatively, in a setting such as B-cell chronic lymphocytic leukemia (B-CLL),
where the miR-15/16 family is deleted in nearly half of all cases (Calin, 2002),
HITS-CLIP and other high-throughput techniques would help uncover which
physiological changes result from the loss of miR-15 and -16, and which come from the
resulting void being filled by the miRNAs that remain. As expected, many validated
miR-15/16 targets are upregulated in response to the chromosomal deletion. However, the
sudden disappearance of such a highly expressed miRNA would also likely increase the
effective silencing capacity of the remaining miRNAs. In a simplistic setting, given a loss
of the miR-15/16 family with no net change in expression of the miRNA machinery or of
other mature miRNAs, more Ago proteins would be free to engage the remaining miRNAs.
The extent to which these miRNAs contribute to the observed gene expression changes
remains an open question. Ago HITS-CLIP could be performed to compare B-CLL cells
carrying the miR-15/16 deletion with B-CLL cells lacking the deletion, or with normal B
cells. The Ago binding profile should show a complete loss of the miR-15/16-dependent
peaks. If the remaining miRNAs do indeed have increased binding potential when the
miR-15/16 locus is deleted, then there should be a concomitant increase in the number or
height of peaks associated with these remaining miRNAs. Performing RNA-seq on total
RNA from these cells will also be important for comparative HITS-CLIP, to account for
peak changes that are due to changes in mRNA expression levels.
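The “simplistic setting” above can be made concrete with a toy model, offered only as an illustration under strong assumptions that are not part of the original argument: Ago is limiting and fully loaded, and it is partitioned among mature miRNAs in proportion to their abundances $m_i$. Removing one family $k$ (here, miR-15/16) that makes up a fraction $f$ of the pool then raises every other miRNA's Ago occupancy by the same factor:

```latex
A_j = A_{\mathrm{tot}}\,\frac{m_j}{\sum_i m_i},
\qquad
f = \frac{m_k}{\sum_i m_i}
\;\Longrightarrow\;
A_j' = A_{\mathrm{tot}}\,\frac{m_j}{\sum_i m_i - m_k} = \frac{A_j}{1 - f}.
```

For a hypothetical $f = 0.2$, each remaining miRNA would gain $1/(1 - 0.2) = 1.25$-fold occupancy, a 25% increase, before any change in target abundance is taken into account.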
What has become quite clear over the past several years is that a close partnership
between computational and molecular biologists is essential for truly understanding the
function of these small non-coding RNAs. No miRNA or miRNA:target interaction exists
in a vacuum, and microarray, RNA-seq and HITS-CLIP techniques will help to delineate
some of the more complex interactions. At the same time, the role of the biologist becomes
all the more important: presenting a setting and biological question in which these
techniques can be effectively employed, correctly interpreted and ultimately validated.
As the miRNA field moves forward, largely guided by high-throughput
sequencing technology, researchers should approach the roles that miRNAs play with a
degree of naivety. Reading through a 2004 review in Cell entitled “MicroRNAs: genomics,
biogenesis, mechanism, and function” (Bartel, 2004), it is apparent that the prior
assumptions guiding current research in these areas have changed very little in the nearly
a decade that has passed since the review’s publication. Although such assumptions are not
necessarily invalid, following them indiscriminately has relegated many important
observations to the status of puzzling curiosities. Without strict a priori assumptions,
careful interpretation of the information gleaned from the new technologies mentioned
above could illuminate the importance of intriguing observations such as miRNAs
upregulating gene expression (Vasudevan et al, 2007), “isomiR” production (Guo & Lu,
2010; Martí et al, 2010), or even apparent loading into Ago and functional silencing
mediated by miRNA precursors (Tan et al, 2009). In general, miRNA studies primarily
report miRNA-mediated repression of target genes containing 3’ UTR MREs, based on the
most commonly annotated miRNA isoform. What remains unclear is whether this reflects a
research bias, with researchers choosing to study only the canonical interactions, or
whether the non-canonical pathways are actually rare occurrences in nature. Ultimately,
the coming years should prove exciting for the miRNA field, and it has been a privilege
to play some small part in contributing to knowledge and discourse in this area.
APPENDIX
ADENOSINE DEAMINATION IN HUMAN TRANSCRIPTS
GENERATES NOVEL MICRORNA BINDING SITES
Abstract
Animals regulate gene expression at multiple levels, contributing to the
complexity of the proteome. Among these regulatory events are post-transcriptional gene
silencing, mediated by small noncoding RNAs (e.g., microRNAs), and adenosine-to-
inosine (A-to-I) editing, generated by Adenosine Deaminases that Act on double-stranded
RNA (ADAR). Recent data suggest that these regulatory processes are connected at a
fundamental level. A-to-I editing can affect Drosha processing or directly alter the
microRNA (miRNA) sequences responsible for mRNA targeting. Here, we analyzed the
previously reported adenosine deaminations occurring in human cDNAs, and asked if
there was a relationship between A-to-I editing events in the mRNA 3’ untranslated
regions (UTRs) and mRNA::miRNA binding. We find significant correlations between
A-to-I editing and changes in miRNA complementarities. In all, over 3,000 of the 12,723
distinct adenosine deaminations assessed were found to form 7-mer complementarities
(known as seed matches) to a subset of human miRNAs. In 200 of the ESTs, we also
noted editing within a specific 13-nucleotide motif. Strikingly, deamination of this motif
simultaneously creates seed matches to three (otherwise unrelated) miRNAs. Our results
suggest the creation of miRNA regulatory sites as a novel function for ADAR activity.
Consequently, many miRNA target sites may only be identifiable through examining
expressed sequences.
Introduction
A-to-I RNA editing catalyzed by dsRNA-specific ADAR refers to the conversion
of adenosine to inosine in double-stranded (ds) or stem-loop regions of precursor mRNAs
(Bass, 2002). Experimental evidence demonstrates that, whether found in a codon,
anticodon or mature miRNA, inosine, like guanine, preferentially base-pairs with
cytosine (Yoshida et al, 1968). Several characterized examples of amino acid changes
created by adenosine deamination show that ADARs can regulate gene expression by
directing the synthesis of distinct proteins from a single open reading frame (Bass, 2002;
Burns et al, 1997). Recent work by Li et al. confirms that editing events occur at a much
higher frequency within noncoding regions (Li et al, 2009). Comparisons of human EST
and genomic sequences have identified thousands of distinct ADAR deaminations
occurring in many different genes (Levanon et al). Possible functions for editing events
include altered splicing, RNA localization, nuclear retention, mRNA stability and
translational efficiency (reviewed in (Chen & Carmichael, 2008)). Interestingly, most
editing sites occur in Alu elements (Athanasiadis et al, 2004; Hundley et al, 2008; Kim et
al; Levanon et al, 2004), the majority of which are in UTRs (Hundley et al, 2008;
Levanon et al).
Experimental evidence suggests that miRNA-mediated post-transcriptional gene
silencing and A-to-I editing are interrelated (Kawahara et al, 2007a; Kawahara et al,
2007b; Luciano et al, 2004; Scadden, 2005). MiRNA transcripts have been found to
undergo ADAR deamination, with editing affecting Drosha processing, Dicer processing
or mRNA targeting (Kawahara et al, 2008; Kawahara et al, 2007b; Yang et al, 2006).
Work by Kawahara and colleagues showed that ADAR deamination of the seed region of
miR-376 alters the gene set regulated by the edited versus the unedited miRNA
(Kawahara et al, 2007b). In this work, we asked if A-to-I editing of the target mRNA,
rather than the miRNA, could impact mRNA::miRNA binding by creating seed matches.
We examined the previously reported 12,723 distinct ADAR editing sites (Levanon et al,
2004) and found that A-to-I editing creates perfect complementarities to human miRNA seeds.
Results
Adenosine deamination creates miRNA complementarities
ADAR-mediated conversion of adenosine to inosine allows inosine:cytosine
pairing because inosine is chemically similar and functionally equivalent to guanosine
(Figure A-1A). The preferential base pairing of inosine with cytosine, a well-established
means of regulating RNA:RNA interactions by altering sequence complementarity, was
described several decades ago in codon:anticodon interactions (Yoshida et al, 1968).
More recently, direct ADAR deamination of a miRNA (miR-376) was found to alter
miRNA target selection (Kawahara et al, 2007b). Over 12,000 A-to-I editing sites have
been identified in human mRNAs, with nearly 90% of these occurring in UTRs
(Athanasiadis et al, 2004; Kim et al, 2004; Levanon et al, 2004). Because 3’ UTRs are
widely accepted as the predominant site of miRNA:mRNA association, we asked whether
deamination of 3’ UTR A-to-I editing sites (Levanon et al, 2004) significantly altered
their complementarity to currently annotated human miRNAs.
Although miRNAs are generally ~21-22 nt in length, their association with target
mRNAs is typically mediated through a seven base pair (bp) interaction involving
nucleotides 2-8 (5’ to 3’) of the mature miRNA (Lai, 2002). This 7 nt sequence constitutes
the miRNA “seed”, and its reverse complement in a target mRNA constitutes a “seed
match” (Lewis et al, 2005). Using a simple 7 nt seed scan of the 100 nt flanking each side
(5’ and 3’) of the 12,723 distinct deamination sites (Levanon et al, 2004), we identified
miRNA seed matches that were created or lost. All sites were screened once with a
central adenosine (unedited state; lost sites) and once with a central guanosine (edited
state; created sites) (Figure A-1B). Using this approach, we identified seed matches to 30
miRNA families that were significantly enriched (p ≤ 1.8 × 10⁻⁵) in sequences bearing a
central G (Table A-1 and Table A-2).
Strikingly, over 3,000 of the 12,723 sites form perfect miRNA seed complements if
deaminated. We term these miRNA associating if deaminated (MAID) sites, and find
that most are localized to the 3’ UTR (Table A-2). While editing can also destroy sites
(not shown), we focus here on MAIDs and their ability to confer miRNA-mediated
regulation.
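The created/lost seed scan described above can be sketched in a few lines of Python. This is a minimal illustration, not the in-house Perl script used in the study: the helper names are hypothetical, inosine is read as G (as in the scan), and the two seeds are the miR-513a-5p and miR-769-3p seeds listed in Table A-1 (5’ to 3’, DNA letters). The demo sequence is the unedited form of the 12 nt MAID core, with the deaminated adenosine at 0-based position 5.

```python
"""Sketch: 7-mer seed scan comparing unedited (A) and edited (I read as G) sequences."""

COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA letters (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def seed_match(seed: str) -> str:
    """Target site (5'->3') pairing with a 7 nt seed (miRNA nts 2-8, given 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(to_rna(seed)))


def find_seed_matches(target: str, seed: str) -> list[int]:
    """0-based start positions of every perfect seed match in the target."""
    target, site = to_rna(target), seed_match(seed)
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]


def deaminate(seq: str, pos: int) -> str:
    """Model A-to-I editing at pos; inosine pairs like guanosine, so write it as G."""
    seq = to_rna(seq)
    if seq[pos] != "A":
        raise ValueError(f"position {pos} is {seq[pos]!r}, not an adenosine")
    return seq[:pos] + "G" + seq[pos + 1:]


def created_matches(seq: str, edit_pos: int, seeds: dict[str, str]) -> dict[str, list[int]]:
    """Seed matches present only after editing (candidate MAIDs), keyed by miRNA name."""
    unedited, edited = to_rna(seq), deaminate(seq, edit_pos)
    result = {}
    for name, seed in seeds.items():
        before = set(find_seed_matches(unedited, seed))
        gained = [i for i in find_seed_matches(edited, seed) if i not in before]
        if gained:
            result[name] = gained
    return result


SEEDS = {"miR-513a-5p": "TCACAGG", "miR-769-3p": "TGGGATC"}  # seeds from Table A-1

if __name__ == "__main__":
    maid_core = "CCUGUAAUCCCA"  # unedited form of 5' CCUGUIAUCCCA 3'
    print(created_matches(maid_core, 5, SEEDS))
```

Here a single deamination creates both seed matches at once. Lost sites can be found symmetrically by keeping matches present in the unedited sequence but absent after editing.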
MiR-513 and miR-769-3p/-450b-3p specifically target deamination sites
We first examined the greatest outliers, miR-513 and miR-769-3p/-450b-3p, in
greater detail. In the 12,723-sequence dataset representing unedited sequences, the average
number of seed matches to miR-769-3p/-450b-3p at any 7 nt position was 0.79 (max = 4).
This contrasts strongly with the 252 miR-769-3p/-450b-3p seed matches unique to the
edited 3’ UTR dataset (Table A-1 and Figure A-2A). Similarly, the average number of seed
matches to miR-513 at any position in the unedited 3’ UTR flanking sequences was 0.63
(max = 4), versus 257 matches created in the edited sequences. Therefore, for these
mRNAs, miR-513 and miR-769-3p/-450b-3p preferentially target deaminated sequences.
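Per-position summaries like those above (the mean and maximum number of seed matches starting at each 7 nt window across a set of equal-length sequences, as plotted in Figure A-2A) can be computed as follows. This is an illustrative sketch: the function name and the toy input are hypothetical, and the sequences are assumed to be equal-length.

```python
"""Sketch: count seed-match starts at each window position across equal-length sequences."""
from statistics import mean

COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def seed_match(seed: str) -> str:
    """Target site (5'->3') complementary to a 7 nt seed given 5'->3' (DNA or RNA letters)."""
    rna = seed.upper().replace("T", "U")
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def positional_profile(seqs: list[str], seed: str) -> list[int]:
    """For each start position, count how many sequences carry a seed match there."""
    site = seed_match(seed)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    counts = [0] * (length - len(site) + 1)
    for s in seqs:
        s = s.upper().replace("T", "U")
        for i in range(len(counts)):
            if s[i:i + len(site)] == site:
                counts[i] += 1
    return counts


if __name__ == "__main__":
    toy = ["CCUGUGAUCCCA", "CCUGUAAUCCCA", "AACCUGUGAUUU"]  # hypothetical toy input
    profile = positional_profile(toy, "TCACAGG")  # miR-513a-5p seed (Table A-1)
    print(profile, round(mean(profile), 2), max(profile))
```

Computed separately for the unedited and edited versions of each site, the profile separates background matches, spread thinly across positions, from matches concentrated over the edit site.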
Upon closer examination, we found that ~190 of the matches to the miR-513 seed
(3’ GGACACU 5’) and the miR-769-3p/-450b-3p seed (3’ CUAGGGU 5’) were created by
a single deamination within a common 12 nt motif (5’ CCUGUIAUCCCA 3’) (Figure A-
2B). Finding an invariant guanine immediately 3’ of these 12 nt, and allowing a single
GU wobble by permitting either an adenosine or a guanine (R) immediately 3’ of the
deamination site, extended the miR-513/-769-3p/-450b-3p MAID to 5’ CCUGUIRUCCCAG 3’.
In total, the simple scanning approach identified 288 distinct sites within this 13 nt motif
that, when edited, form seed matches to miR-513 and miR-769-3p/-450b-3p (Figure A-2C).
Thus, MAIDs containing miR-513 and miR-769-3p/-450b-3p seed matches are
significantly enriched in a subset of the deamination sites originally identified by
Levanon and colleagues (Levanon et al, 2004; not shown). Of note, this result was
reproduced using the standalone TargetScan program (Lewis et al, 2005) without
considering conservation of seed matches as a ranking criterion.
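The 13 nt MAID consensus uses IUPAC notation (R = A or G) plus I for the edited adenosine. One hypothetical way to search sequences for it is to translate the consensus into a regular expression, reading I as G for edited (cDNA/EST) sequences or as the genomic A for unedited ones. The translation table covers only the symbols used in this motif, and the function names and demo sequences are illustrative.

```python
"""Sketch: search RNA sequences for the MAID consensus 5' CCUGUIRUCCCAG 3'."""
import re

# Only the symbols needed for this motif; R is the IUPAC code for A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "[AG]"}


def motif_regex(motif: str, edited: bool) -> re.Pattern:
    """Compile the motif, reading inosine (I) as G if edited, else as the genomic A."""
    parts = [("G" if edited else "A") if b == "I" else IUPAC[b] for b in motif.upper()]
    # A zero-width lookahead also reports overlapping occurrences.
    return re.compile("(?=" + "".join(parts) + ")")


def find_motif(seq: str, motif: str = "CCUGUIRUCCCAG", edited: bool = True) -> list[int]:
    """0-based start positions of the motif in seq (DNA T is converted to U)."""
    seq = seq.upper().replace("T", "U")
    return [m.start() for m in motif_regex(motif, edited).finditer(seq)]


if __name__ == "__main__":
    genomic = "AACCUGUAGUCCCAGAA"  # hypothetical site; R = G (GU wobble variant)
    est = "AACCUGUGGUCCCAGAA"      # the same site with the adenosine read as G
    print(find_motif(genomic, edited=False), find_motif(est), find_motif(genomic))
```

The same site is found in its genomic form only under the unedited reading and in its EST form only under the edited reading, which is why MAIDs are invisible to a purely genomic seed scan.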
MiR-513 and miR-769-3p repress deaminated sequences
To test whether MAID sequences could serve as miR-513 and/or miR-769-3p/-450b-3p
targets, we constructed a series of luciferase reporters carrying either unedited sequences
or ‘edited’ 13 bp MAIDs specific to miR-513/miR-769-3p/-450b-3p downstream of Renilla
luciferase (Figure A-3A). When co-transfected with the reporters, miR-513 expression
vectors repressed TGAT (edited) target activity by ~50% and TGGT target activity by
~40% (Figure A-3B). Replacing miR-513 with miR-769 in similar experiments resulted in
~70% and ~60% reductions in Renilla luciferase activity, respectively (Figure A-3B).
Importantly, activity of the ‘unedited’ reporter construct TAAT was not affected by
either miRNA (similar to control, data not shown). We next repeated these experiments
with pooled miRNA expression vectors and either miRNA-specific or control miRNA
inhibitors (Anti-miRs™) (Krutzfeldt et al, 2005). As shown in Figure A-3B, activity was
restored to near-normal levels in the presence of specific miR-513 and miR-769-3p
inhibitors. These data demonstrate that MAIDs can be specifically repressed by miR-513
and miR-769-3p.
To test whether MAIDs can confer miRNA regulation on endogenous mRNAs, we
examined the 3’ UTR of DFFA (DNA fragmentation factor alpha, also referred to as
ICAD). DFFA was selected for three reasons: first, the presence of nine 5’
CCUGUIRUCCCAG 3’ motifs within its 3’ UTR (Figure A-4A, B) (Hubbard et al,
2007); second, the prevalence of ESTs within the NCBI dataset showing ADAR activity in
the 3’ UTR (Wheeler et al, 2007); and third, our own sequencing confirmation of DFFA
deamination in cDNAs cloned from neuroblastoma (NB7) but not HEK 293 cells (Figure
A-4C). Together, these characteristics present DFFA as a possible target for MAID
regulation.
Two luciferase reporters were constructed to evaluate the ability of miR-513
and/or miR-769-3p to repress DFFA (Figure A-4D). These vectors encode Renilla
luciferase harboring either the edited DFFA 3’ UTR cloned from NB7 cells (DFFA-E) or
the unedited DFFA 3’ UTR cloned from HEK 293 cells (DFFA-U). The DFFA-U and
DFFA-E constructs differ by a single adenosine deamination (293_2 vs. NB7_2; Figure
A-4C). As shown in Figure A-4E, miR-513 and miR-769-3p co-transfection repressed
DFFA-E reporter activity by ~30% and ~60%, respectively. In contrast, DFFA-U reporter
expression was not repressed by either miRNA (similar to control, data not shown;
Figure A-4E). Experiments with pooled miRNA expression vectors and either miRNA-
specific or control miRNA inhibitors (Anti-miRs™) confirmed that the targeting activity
was specific to the edited state: for DFFA-E, but not DFFA-U, Renilla activity was
significantly increased (to near-normal levels) in the presence of Anti-miR-513 and
Anti-miR-769-3p (Figure A-4E). These experiments indicate that miR-513 and
miR-769-3p can regulate the DFFA mRNA 3’ UTR in an adenosine deamination-dependent
manner.
MiR-769-3p represses DFFA expression specifically in cells that deaminate the DFFA 3’ UTR
Having confirmed that miR-513 and miR-769-3p selectively repress deaminated
DFFA 3’ UTRs in our reporter assays, we next tested whether they could repress
endogenous DFFA protein expression. Because miR-513 and miR-769-3p were either
undetectable or expressed at very low levels in NB7 cells (data not shown and Figure
A-5A), we used overexpression plasmids. MiR-769-3p overexpression in NB7 cells
resulted in a ~60% reduction in DFFA, whereas miR-769-3p overexpression in HEK 293
cells did not reduce DFFA (Figure A-5B, C). Co-transfection of Anti-miR-769-3p, but not
a control Anti-miR™, abrogated this repression (Figure A-5B). To ensure that these
findings were the consequence of miRNA expression, we repeated these experiments
using commercially synthesized miRNA precursor RNAs and found that these were
similarly capable of silencing endogenous DFFA in NB7 cells (data not shown). The cell
line-specific reduction of endogenous DFFA supports the hypothesis that miR-769-3p can
regulate deaminated DFFA 3’ UTRs.
Discussion
In this work, we demonstrate that A-to-I editing can create miRNA target sites
and that these sites are functional in vitro. On the surface, our findings appear to
contradict a recent hypothesis report indicating no correlation between miRNA target
sites and A-to-I editing (Liang & Landweber, 2007); however, our data are largely in
agreement, and the apparent conflict can be explained by the scope of our study. The
prior work by Liang and Landweber (Liang & Landweber, 2007) searched for a
relationship between ADAR deaminations and 73 conserved miRNA families. In
contrast, we evaluated all human miRNAs. By extending the study beyond conserved
miRNAs, we identify 325 miRNA families with complementarities enriched at
deamination sites. Notably, the majority of the miRNAs identified are primate-specific and
would, therefore, not be identified in an analysis of evolutionarily conserved miRNAs. In
more recent work, Li and coworkers also found no enrichment of editing sites in miRNA
target genes when sites within Alu elements were excluded (Li et al).
While the simple scanning approach we used identified a number of 7 bp miRNA
seed matches that were formed upon editing, our results are likely an underestimate.
Our approach does not consider the formation (or loss) of A1 sites, defined by
TargetScan 5.0 as a 6 bp perfect match (to nts 2-7 of the miRNA) with an adenosine
immediately 3’ of the seed match. Similarly, our scan would not identify sites with an
edited adenosine at position 1 of an 8mer seed match. Despite these possible differences
in the absolute numbers of MAIDs created or miRNA target sites lost upon editing,
analysis with TargetScan 5.0 yielded the same overall trends for the transcripts profiled.
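To make the site types mentioned above concrete, the sketch below classifies each seed-complementary site using the conventions as described here: a 6 nt core pairing with miRNA nts 2-7, extended by a match to nt 8 on the target's 5’ side and/or an adenosine opposite nt 1 on its 3’ side. It is an illustrative reimplementation under those assumptions, not TargetScan code, and the function names are hypothetical.

```python
"""Sketch: classify seed sites as 8mer, 7mer-m8, 7mer-A1 or 6mer."""

COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def revcomp(seq: str) -> str:
    """Reverse complement in RNA letters (accepts DNA or RNA input)."""
    rna = seq.upper().replace("T", "U")
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def classify_sites(target: str, seed: str) -> list[tuple[int, str]]:
    """Return (core_start, site_type) for every 6 nt core match in the target.

    seed: miRNA nts 2-8, 5'->3'. In the target (5'->3'), the nt 8 match lies
    immediately 5' of the core and the A1 position immediately 3' of it.
    """
    target = target.upper().replace("T", "U")
    m8_site = revcomp(seed)            # 7 nt; pairs with miRNA nts 8..2
    core, nt8 = m8_site[1:], m8_site[0]
    sites = []
    for i in range(len(target) - len(core) + 1):
        if target[i:i + len(core)] != core:
            continue
        has_m8 = i > 0 and target[i - 1] == nt8
        has_a1 = i + len(core) < len(target) and target[i + len(core)] == "A"
        if has_m8 and has_a1:
            kind = "8mer"
        elif has_m8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((i, kind))
    return sites


if __name__ == "__main__":
    seed_769 = "TGGGATC"  # miR-769-3p seed from Table A-1
    for label, seq in [("unedited", "CCUGUAAUCCCAG"), ("edited", "CCUGUGAUCCCAG")]:
        print(label, classify_sites(seq, seed_769))
```

In this example the unedited MAID already carries a 6mer for miR-769-3p; the deaminated position supplies the nt 8 match, upgrading it to a 7mer-m8 site.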
Alu sequences in transcripts are notable for harboring human miRNA seed
complementarities (Lehnert et al, 2009; Smalheiser & Torvik, 2006a) and we extend that
complementarity to A-to-I edited sites. The MAIDs identified in our work overlap with
adenosines that are edited more frequently than those at other positions, as quantified by
Kim et al (Supplementary Fig. 1; Kim et al, 2004). Interestingly, the sequences
surrounding and containing these sites are highly conserved among Alu families (Ray &
Batzer, 2005). Together with our experimental and informatics analyses, these findings
suggest that one outcome of primate mRNA 3’ UTR adenosine deamination is the
modulation of miRNA target sites.
Editing within 3’ UTR-resident Alu elements can result in nuclear retention of the
transcripts (Chen et al, 2008); however, 3’ UTR-edited mRNAs can also be associated
with polysomes (Hundley et al, 2008), suggesting that not all edited transcripts are
retained in the nucleus. Furthermore, a more recent bioinformatics study demonstrated
that deletion of sequences between paired Alus can occur, potentially removing miRNA
target sites (Osenberg et al, 2009). Relevant to our work, Osenberg and coworkers did not
detect cleavage of human DFFA 3’ UTRs, suggesting that our predicted sites remain
intact in the final transcript. In future work, it will be interesting to compare the
sequence isoforms of DFFA and other edited 3’ UTRs, and their cytoplasmic expression
(or nuclear retention), between cells of different origin, cells in variable states, and
different tissues.
Finally, although miRNA:target interactions have been catalogued using
bioinformatic approaches, predicted targets still require empirical validation. Factors
such as GU base pairing, local secondary structure, target accessibility, and position
effects due to nucleotide composition clearly complicate accurate target prediction
(Smalheiser & Torvik, 2006b). Here, we demonstrate that some miRNA target sequences
are not detectable by querying the genomic sequence alone. For the miRNAs we identify
as complementary to deaminated sequences, as well as for miRNAs potentially targeting
other post-transcriptionally edited sites, accurate target identification may only be
possible by evaluating expressed sequences directly. More recent data showing editing
events across tissues of an individual (Li et al, 2009), and deep sequencing revealing rare
editing events in the developing brain (Wahlstedt et al, 2009), will be useful for further
analyzing the extent to which editing modulates miRNA target sites.
Materials and Methods
Informatics evaluation of ADAR deamination sites
The full, publicly available Compugen deamination dataset (12,723 characterized
human deamination sites, each with 200 nt of flanking sequence: 100 nt 5’ and 100 nt 3’)
was obtained from www.cgen.com/research/Publications. Perfect seed matches to all
currently annotated human miRNAs were identified in each of the Compugen sequences
using either an in-house Perl script, which counted seed match occurrences and recorded
the position of each within the individual 201 nt Compugen sequences (as illustrated in
Figure A-1), or an in-house Excel-based analysis, as well as the standalone version of
TargetScan (TargetScan 5.0). Each 7-mer was queried for perfect identity to the reverse
complement of the 7 nt seed sequences of human miRNAs (miRBase 13.0). Statistical
significance of individual miRNA seed occurrence was determined by Fisher's exact test.
Significance of A-to-I editing site and MAID occurrence in human cDNA databases was
determined by a chi-square test.
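For readers who want to reproduce the style of test named above, a one-sided Fisher's exact test on a 2×2 table can be written with exact integer arithmetic. The methods do not state which 2×2 table was used, so the layout in this sketch (seed match present or absent, in edited versus unedited sequences) and the one-sided enrichment alternative are assumptions; the sketch is not expected to reproduce the p-values in Table A-1 exactly.

```python
"""Sketch: one-sided Fisher's exact test (enrichment) on a 2x2 contingency table."""
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """P(X >= a) under the hypergeometric null for the table

                    match   no match
        edited        a        b
        unedited      c        d
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    numerator = sum(comb(col1, k) * comb(n - col1, row1 - k)
                    for k in range(a, min(row1, col1) + 1))
    # Exact big-integer ratio; Python's int / int rounds correctly to a float.
    return numerator / comb(n, row1)


if __name__ == "__main__":
    # Hypothetical layout using the miR-513a-5p counts from Table A-1
    # (257 edited vs 0 unedited sequences with a match, out of 12,723 each).
    n_sites = 12_723
    print(fisher_exact_greater(257, n_sites - 257, 0, n_sites))
```

Exact integers avoid the underflow that naive floating-point factorials would hit at these sample sizes.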
Vector construction
Unless otherwise indicated, PCR amplifications were performed in 40 μl reactions
at standard concentrations (1.5 mM MgCl2, 0.2 mM dNTP, 1× Biolase PCR buffer, 0.5
U Taq (Biolase USA, Inc., Randolph, MA), 0.5 µM each primer) using standard cycling
parameters (94 °C for 3 min; 30 cycles of 94 °C for 30 s, 55 °C for 30 s and 72 °C for
60 s; 72 °C for 3 min), and products were cloned into Topo TA PCR 2.1 (Invitrogen,
Carlsbad, CA). RT-PCRs were performed at 55 °C using RetroScript III Reverse
Transcriptase (Invitrogen, Carlsbad, CA) and oligo-dT 20mers. UTR amplifications were
cloned into Topo TA PCR 2.1 and sequenced. Antisense reporters (TAAT, TAGT, TGAT,
TGGT, Consensus-A and Consensus-G) were constructed by oligonucleotide primer
extension (amplifications performed as above, except that the cycle number was
decreased to 25 and extensions to 10 s) with primers containing 5’ Xho-I and 3’ Spe-I
restriction enzyme sites. Following digestion, amplicons were ligated into the Renilla
luciferase 3’ UTR of the psiCheck2 vector (Promega, Madison, WI), which had been
linearized with Xho-I and Spe-I and then incubated with Antarctic phosphatase (NEB,
Ipswich, MA). The presence of an independently transcribed firefly luciferase in these
reporters allowed normalization for transfection efficiency.
Luciferase assays
HEK 293 cells were cultured in DMEM (10% FBS and 1% PS) in 12-well plates. At
90% confluency, cells were transfected following the Lipofectamine 2000 (Invitrogen,
Carlsbad, CA) protocol. As indicated, luciferase assays (n = 3) were performed on HEK
293 lysates following co-transfection of psiCheck2 (Promega, Madison, WI) luciferase
reporters with Alu promoter expression vectors (pAL-513, pAL-769-3p or pAL-1-control;
Borchert et al, 2006) and/or Anti-miRs (Ambion®), following the manufacturer's
recommended guidelines. At 35 h, the existing medium was replaced with 1 ml of fresh
medium. At 36 h, cells were scraped from well bottoms and transferred to 1.5 ml
Eppendorf tubes. Tubes were centrifuged at 2000 RCF for 3 min, followed by supernatant
aspiration and cell resuspension in 300 μl of PBS. Cells were lysed by three freeze-thaw
cycles, and debris was removed by centrifugation at 3000 RCF for 3 min. 50 μl of
supernatant was transferred to a 96-well MicroLite plate (MTX Lab Systems, Vienna,
VA), and firefly and Renilla luciferase activities were measured using the Dual-GLO
Luciferase® Reporter System (Promega, Madison, WI) and a 96-well plate luminometer
(Dynex, Worthing, West Sussex, UK). Relative light units (RLUs) were calculated as the
ratio of Renilla to firefly luciferase RLUs and normalized to mock.
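The normalization described above (Renilla/firefly ratio, expressed relative to mock) is simple arithmetic; the sketch below makes it explicit for triplicate wells and also reports percent repression, the quantity quoted in the Results. Averaging the per-well ratios before dividing by the mock mean is an assumption about how replicates were combined, and the RLU readings in the example are made up for illustration.

```python
"""Sketch: dual-luciferase normalization (Renilla / firefly, relative to mock)."""
from statistics import mean


def normalized_activity(renilla: list[float], firefly: list[float]) -> float:
    """Mean per-well Renilla/firefly ratio; firefly corrects for transfection efficiency."""
    if len(renilla) != len(firefly):
        raise ValueError("need one firefly reading per Renilla reading")
    return mean(r / f for r, f in zip(renilla, firefly))


def percent_of_mock(sample: float, mock: float) -> float:
    """Normalized activity expressed as a percentage of the mock-transfected control."""
    return 100.0 * sample / mock


if __name__ == "__main__":
    # Hypothetical RLU readings from n = 3 wells per condition.
    mock = normalized_activity([9000, 11000, 10000], [1000, 1100, 1000])
    treated = normalized_activity([5000, 4500, 5500], [1000, 900, 1100])
    pct = percent_of_mock(treated, mock)
    print(f"{pct:.1f}% of mock, {100 - pct:.1f}% repression")
```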
Western blotting
Cells were cultured and transfected as described for luciferase assays. At 36 h,
the existing medium was replaced with 100 μl of lysis buffer containing protease inhibitors and
incubated for 15 min at 4°C after which cells were scraped from well bottoms and
transferred to 1.5 ml Eppendorf tubes. Proteins were electrophoresed through an 8%
SDS–polyacrylamide gel (BioRad, Hercules, CA) and transferred to Immobilon-P PVDF
membranes (Millipore, Billerica, MA). Membranes were blocked for 1 h in 2% (w/v)
nonfat milk in phosphate-buffered saline containing 0.05% Tween 20, washed, and
incubated with primary antibody overnight at 4°C at the following dilutions: DFFA –
1:3000 (ab16258, Abcam, Cambridge, MA) and β-catenin – 1:6000 (ab2982, Abcam).
Membranes were washed and incubated with goat anti-rabbit peroxidase-conjugated
secondary antibody (111-035-144, Jackson ImmunoResearch, West Grove, PA).
Immunoreactive bands were visualized with ECL Plus (Amersham, Piscataway, NJ) and
quantified using a Fluorochem densitometer (Alpha Innotech Corp., San Leandro, CA).
Table A-1. MiRNA seed matches markedly enriched by adenosine deamination
miRNA                  Seed      Expected¹ / Unedited / Deaminated   p-value
miR-513a-5p²           TCACAGG   0.6 / 0 / 257                       5.8E-79
miR-769-3p,-450b-3p    TGGGATC   0.6 / 4 / 252                       4.5E-70
miR-140-3p             ACCACAG   1.1 / 0 / 135                       6.4E-29
miR-340*               CCGTCTC   5.9 / 9 / 132                       7.3E-28
miR-129-5p             TTTTTGC   1.2 / 11 / 105                      8.9E-21
miR-1207-3p            CAGCTGG   1.6 / 4 / 97                        1.3E-19
miR-222*               TCAGTAG   1.3 / 6 / 95                        2.8E-18
miR-936                CAGTAGA   1.0 / 5 / 94                        3.7E-18
miR-30a*,d*,e*         TTTCAGT   1.1 / 10 / 93                       4.3E-18
miR-646                AGCAGCT   1.1 / 2 / 87                        6.4E-16
miR-412                CTTCACC   0.6 / 5 / 83                        1.2E-15
miR-330-5p,-326        CTCTGGG   0.6 / 4 / 78                        6.3E-13
miR-629*               TTCTCCC   1.2 / 4 / 74                        5.8E-12
miR-34a*               AATCAGC   0.9 / 0 / 61                        8.4E-10
miR-519e               AAGTGCC   0.8 / 3 / 59                        8.1E-9
miR-548c-3p            AAAAATC   2.0 / 0 / 59                        8.1E-9
miR-325                CTAGTAG   0.5 / 5 / 57                        7.7E-8
miR-371-3p             AGTGCCG   0.2 / 0 / 57                        7.7E-8
miR-630                GTATTCT   0.4 / 4 / 56                        8.3E-8
miR-1281               CGCCTCC   3.6 / 0 / 52                        1.1E-7
miR-1229               TCTCACC   1.0 / 3 / 40                        5.6E-6
miR-28-5p,-708         AGGAGCT   0.7 / 1 / 35                        5.4E-5
¹ Expected numbers are based on seed match occurrence in the 200 nt flanking each adenosine deamination.
² miR-513 and miR-769-3p are often (~80%) complementary to the same deaminated sequences.
³ The two miR-518 family seeds, AAAGCGC and AAGCGCT, are complementary to the same twenty-six deaminated sequences.
Table A-2. A-to-I editing occurs predominantly in noncoding regions of expressed sequences
                               Number of edited       % of total edited   % of edited sequences
                               sequences              sequences           with miR seed matches
Total edited sequences         12,723                 100.00%
Total successfully mapped¹     8,014 (99.75%/0.25%)   62.99%
Total MAIDs overall            3,058                  24.04%              100%
Total MAIDs mapped²            1,918 (99.64%/0.36%)   15.08%              62.72%
Total lost sequences overall   2,538                  19.95%              100.00%
Total lost sequences mapped    1,605 (99.75%/0.25%)   12.61%              63.24%
¹ Because previous mapping data for these sequences were not available, the 12,723 available editing sequences (201 nt) were mapped to human transcripts obtained from the UCSC Table Browser (RefSeq hg18) using megablast (arguments: -W 196; -S 1; -F F; -p 100). Due to the highly repetitive nature of the sequences used for this analysis, positive identification required 100% identity and 100% coverage of the editing sequence. Using these criteria, ~63% of the 12,723 sequences could be mapped.
² Sequences with miRNA sites created (MAIDs) or lost were queried for mapping to coding or noncoding regions. Of the sequences mapped to transcripts using our methods, the vast majority fell within noncoding regions (% noncoding/% coding presented in the first numerical column).
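The mapping criterion in footnote 1 of Table A-2 (100% identity over 100% of each 201 nt query) amounts to a simple filter on tabular BLAST output. This sketch assumes the standard 12-column tabular layout (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score); the function name and the example rows, including the transcript identifiers, are hypothetical.

```python
"""Sketch: keep only full-length, 100%-identity hits from 12-column tabular BLAST output."""
import csv
import io


def perfect_full_length_hits(lines, query_len: int = 201) -> dict[str, set[str]]:
    """Map each query id to the subject ids it matches with 100% identity over its full length."""
    hits: dict[str, set[str]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        qid, sid, pident, length = row[0], row[1], float(row[2]), int(row[3])
        mismatches, gaps = int(row[4]), int(row[5])
        qstart, qend = int(row[6]), int(row[7])
        full_cover = length == query_len and {qstart, qend} == {1, query_len}
        if pident == 100.0 and mismatches == 0 and gaps == 0 and full_cover:
            hits.setdefault(qid, set()).add(sid)
    return hits


if __name__ == "__main__":
    demo = io.StringIO(
        "site1\ttx1\t100.00\t201\t0\t0\t1\t201\t500\t700\t1e-100\t398\n"
        "site2\ttx2\t100.00\t180\t0\t0\t1\t180\t10\t189\t1e-90\t357\n"
    )
    print(perfect_full_length_hits(demo))
```

A query counts as mapped if it has at least one surviving hit, so the number of mapped sites is simply the number of keys in the returned dictionary.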
Figure A-1. ADARs deaminate adenosine to inosine, potentially altering miRNA complementarities.
(A) A cartoon depicting adenosine, deaminated adenosine (inosine), and guanine. In some tRNAs, inosine routinely serves as a member of the anticodon, where it is recognized as a guanine. (B) A characterized deamination site occurring in the 3’ UTR of DNA Fragmentation Factor α (DFFA) is shown in both an edited and unedited state. In this work, each 7 nt sequence (red) occurring within 100 nt of > 12,000 distinct deamination sites (blue) was screened against all annotated human miRNA seed sequences. The miR-513 seed (yellow) illustrates how target mRNA deamination can mediate miRNA binding. Contributed by Glen Borchert.
Figure A-2. A-to-I edits frequently create miR-513 and miR-769-3p / -450b-3p complementarities.
(A) 12,719 unique EST sequences (www.cgen.com), each consisting of a central A-to-I deamination and 100 nt flanks (i.e. n100 (A or I) n100), were screened for complementarity to human miRNAs. All human miRNA seed matches were identified within the individual 201 nt sequences originally identified as an A-to-I transition by Compugen (statistical significance is addressed in Table A-1). The top two panels represent all miR-769-3p (and miR-450b-3p) seed matches occurring at each position in both the unedited (left) and edited (right) states. The lower panels represent all miR-513 seed matches occurring in unedited (left) and edited (right) states. (B) A cartoon of miR-513 and miR-769-3p / -450b-3p complementarities to a MiRNA Associating If Deaminated (MAID) site in both unedited (left) and edited (right) states is shown. Perfect seed matches to miR-769-3p / -450b-3p (blue) and miR-513 (yellow) are significantly enriched in sequences containing characterized deaminations (red). Vertical lines indicate complementary base pairing. (C) Venn diagram depicting the overlap between miR-513 and miR-769-3p / -450b-3p target sites matching the full MAID motif (CCUGUIRUCCCAG). Importantly, nearly 100 additional sequences are identified by allowing a single GU wobble immediately 3’ of the deamination. Original analysis by Glen Borchert. Reanalysis and editing by Ryan Spengler.
Figure A-3. miR-513 and miR-769-3p target MAIDs but not the corresponding unedited sequence.
(A) A diagram shows hairpin expression vectors and MAID reporter constructs. The pAL-513 and pAL-769-3p expression vectors carry miR-513 and miR-769-3p hairpins downstream of the miR-517 Pol-III promoter. TAAT, TGAT, and TGGT reporters contain 3 tandem copies of the 13 bp MAID sequence in the 3’ UTR of Renilla luciferase for testing activity in the unedited (TAAT) or edited (TGAT, TGGT) states. Guanines mimicking A-to-I edits are bolded and underscored. (B) Renilla luciferase activity (normalized to firefly luciferase and presented as percent of mock-transfected control) following co-transfection of miR-513, miR-769-3p, pooled miR-513 and miR-769-3p inhibitors and/or control miRNA inhibitor with the indicated reporters into HEK 293 cells (n = 3). *, p < 0.005. Contributed by Glen Borchert and Brian Gilmore.
Figure A-4. Endogenous MAIDs are targets for miR-513 and miR-769-3p repression.
(A) A cartoon depicts the DFFA 3’UTR and the localization of nine distinct MAIDs (lines above the 3’UTR). (B) Alignment of the nine DFFA 3’UTR MAIDs commonly deaminated in ESTs. MAID sequences are shaded. Four MAIDs contain GU wobbles relative to the consensus (bold). (C) Alignment of DFFA_1 sequences from independent DFFA clones isolated from HEK 293 cells and NB7 cells. DFFA_1 was deaminated in NB7s (bold) but not in HEK 293s. RT reactions were performed using a thermostable reverse transcriptase. (D) A diagram of the DFFA 3’UTR reporter constructs. In DFFA-Edited (-E) and DFFA-Unedited (-U), the Renilla 3’ UTRs are the cloned DFFA 3’UTRs from NB7 and HEK 293 cells, respectively. DFFA-E nucleotides differing from DFFA-U are bolded and underscored (compare NB7_2 and 293_1, detailed in panel C). (E) Luciferase assays performed as in Figure A-3B, except with the reporter constructs illustrated (n = 3). *, p < 0.005. Contributed by Glen Borchert and Brian Gilmore.
Figure A-5. MiR-769 selectively represses DFFA protein.
(A) Relative miR-769-3p RNA levels in HEK 293, A549, HT1080 and NB7 cell lines, as determined by quantitative PCR. MiR-513 was not detected in these cell lines. (B) MiR-769-3p overexpression reduces DFFA levels specifically in NB7 cells. Endogenous DFFA protein levels in NB7 and HEK 293 cells were determined by western blot densitometry. The ratio of DFFA levels in NB7 / HEK 293 is shown. (C) Western blot analysis of endogenous DFFA in HEK 293 and NB7 cell lysates following transfection of miR-769 as indicated. Representative blots for DFFA and β-catenin (loading control) are shown. Relative DFFA levels were calculated as band intensity ratios of DFFA to β-catenin and normalized to mock (leftmost bar in each graph). 400: 400 ng miR-769 expression vector; 200: 200 ng miR-769 expression vector; 100: 100 ng miR-769 expression vector. Contributed by Glen Borchert and Brian Gilmore.
REFERENCES
Anderson EM, Birmingham A, Baskerville S, Reynolds A, Maksimova E, Leake D, Fedorov Y, Karpilow J, Khvorova A (2008) Experimental validation of the importance of seed complement frequency to siRNA specificity. RNA 14: 853-861
Athanasiadis A, Rich A, Maas S (2004) Widespread A-to-I RNA editing of Alu-containing mRNAs in the human transcriptome. PLoS Biol 2: e391
Azuma-Mukai A (2008) Characterization of endogenous human Argonautes and their miRNA partners in RNA silencing. Proc Natl Acad Sci USA 105: 7964-7969
Bartel DP (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116: 281-297
Bass BL (2002) RNA editing by adenosine deaminases that act on RNA. Annu Rev Biochem 71: 817-846
Bennett EA, Keller H, Mills RE, Schmidt S, Moran JV, Weichenrieder O, Devine SE (2008) Active Alu retrotransposons in the human genome. Genome Res 18: 1875-1883
Berezikov E, Chung WJ, Willis J, Cuppen E, Lai EC (2007) Mammalian mirtron genes. Mol Cell 28: 328-336
Birmingham A, Anderson E, Sullivan K, Reynolds A, Boese Q, Leake D, Karpilow J, Khvorova A (2007) A protocol for designing siRNAs with high functionality and specificity. Nat Protoc 2: 2068-2078
Birmingham A, Anderson EM, Reynolds A, Ilsley-Tyree D, Leake D, Fedorov Y, Baskerville S, Maksimova E, Robinson K, Karpilow J, Marshall WS, Khvorova A (2006) 3' UTR seed matches, but not overall identity, are associated with RNAi off-targets. Nat Methods 3: 199-204
Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J (2010) Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol Chapter 19: Unit 19.10.1-21
Bohnsack MT, Czaplinski K, Gorlich D (2004) Exportin 5 is a RanGTP-dependent dsRNA-binding protein that mediates nuclear export of pre-miRNAs. RNA 10: 185-191
Borchert G, Gilmore B, Spengler R, Xing Y, Lanier W, Bhattacharya D, Davidson B (2009) Adenosine deamination in human transcripts generates novel microRNA binding sites. Hum Mol Genet 18: 4801-4807
Borchert GM, Lanier W, Davidson BL (2006) RNA polymerase III transcribes human microRNAs. Nat Struct Mol Biol 13: 1097-1101
Boudreau RL, Martins I, Davidson BL (2009a) Artificial microRNAs as siRNA shuttles: improved safety as compared to shRNAs in vitro and in vivo. Mol Ther 17: 169-175
Boudreau RL, McBride JL, Martins I, Shen S, Xing Y, Carter BJ, Davidson BL (2009b) Nonallele-specific silencing of mutant and wild-type huntingtin demonstrates therapeutic efficacy in Huntington's disease mice. Mol Ther 17: 1053-1063
Boudreau RL, Spengler RM, Davidson BL (2011) Rational design of therapeutic siRNAs: minimizing off-targeting potential to improve the safety of RNAi therapy for Huntington's disease. Mol Ther 19: 2169-2177
Boudreau RL, Spengler RM, Hylock RH, Kusenda BJ, Davis HA, Eichmann DA, Davidson BL (2013) siSPOTR: a tool for designing highly specific and potent siRNAs for human and mouse. Nucleic Acids Res 41
Bovia F, Wolff N, Ryser S, Strub K (1997) The SRP9/14 subunit of the human signal recognition particle binds to a variety of Alu-like RNAs and with higher affinity than its mouse homolog. Nucleic Acids Res 25: 318-326
Bracken CP (2008) A double-negative feedback loop between ZEB1-SIP1 and the microRNA-200 family regulates epithelial-mesenchymal transition. Cancer Res 68: 7846-7854
Bramsen JB, Pakula MM, Hansen TB, Bus C, Langkjaer N, Odadzic D, Smicius R, Wengel SL, Chattopadhyaya J, Engels JW, Herdewijn P, Wengel J, Kjems J (2010) A screen of chemical modifications identifies position-specific modification by UNA to most potently reduce siRNA off-target effects. Nucleic Acids Res 38: 5761-5773
Burchard J, Jackson AL, Malkov V, Needham RH, Tan Y, Bartz SR, Dai H, Sachs AB, Linsley PS (2009) MicroRNA-like off-target transcript regulation by siRNAs is species specific. RNA 15: 308-315
Burns CM, Chu H, Rueter SM, Hutchinson LK, Canton H, Sanders-Bush E, Emeson RB (1997) Regulation of serotonin-2C receptor G-protein coupling by RNA editing. Nature 387: 303-308
Caffrey DR, Zhao J, Song Z, Schaffer ME, Haney SA, Subramanian RR, Seymour AB, Hughes JD (2011) siRNA off-target effects can be reduced at concentrations that match their individual potency. PLoS One 6: e21503
Calin GA (2002) Frequent deletions and down-regulation of micro- RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci USA 99: 15524-15529
Cesana M, Cacchiarelli D, Legnini I, Santini T, Sthandier O, Chinappi M, Tramontano A, Bozzoni I (2011) A long noncoding RNA controls muscle differentiation by functioning as a competing endogenous RNA. Cell 147: 358-369
Chang DY, Hsu K, Maraia RJ (1996) Monomeric scAlu and nascent dimeric Alu RNAs induced by adenovirus are assembled into SRP9/14-containing RNPs in HeLa cells. Nucleic Acids Res 24: 4165-4170
Chen J, Bardes EE, Aronow BJ, Jegga AG (2009) ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res 37: W305-311
Chen LL, Carmichael GG (2008) Gene regulation by SINES and inosines: biological consequences of A-to-I editing of Alu element inverted repeats. Cell Cycle 7: 3294-3301
Chen LL, DeCerbo JN, Carmichael GG (2008) Alu element-mediated gene silencing. EMBO J 27: 1694-1705
Chi JT, Chang HY, Wang NN, Chang DS, Dunphy N, Brown PO (2003) Genomewide view of gene silencing by small interfering RNAs. Proc Natl Acad Sci U S A 100: 6343-6346
Chiang HR, Schoenfeld LW, Ruby JG, Auyeung VC, Spies N, Baek D, Johnston WK, Russ C, Luo S, Babiarz JE, Blelloch R, Schroth GP, Nusbaum C, Bartel DP (2010) Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev 24: 992-1009
Cummins JM, He Y, Leary RJ, Pagliarini R, Diaz LA, Jr., Sjoblom T, Barad O, Bentwich Z, Szafranska AE, Labourier E, Raymond CK, Roberts BS, Juhl H, Kinzler KW, Vogelstein B, Velculescu VE (2006) The colorectal microRNAome. Proc Natl Acad Sci U S A 103: 3687-3692
Davidson BL, McCray PB, Jr. (2011) Current prospects for RNA interference-based therapies. Nat Rev Genet 12: 329-340
Davis BN, Hilyard AC, Lagna G, Hata A (2008) SMAD proteins control DROSHA-mediated microRNA maturation. Nature 454: 56-61
Davis-Dusenbery BN, Hata A (2010) Mechanisms of control of microRNA biogenesis. J Biochem 148: 381-392
Diez-Roux G, Banfi S, Sultan M, Geffers L, Anand S, Rozado D, Magen A, Canidio E, Pagani M, Peluso I, Lin-Marq N, Koch M, Bilio M, Cantiello I, Verde R, De Masi C, Bianchi SA, Cicchini J, Perroud E, Mehmeti S, Dagand E, Schrinner S, Nürnberger A, Schmidt K, Metz K, Zwingmann C, Brieske N, Springer C, Hernandez AM, Herzog S, Grabbe F, Sieverding C, Fischer B, Schrader K, Brockmeyer M, Dettmer S, Helbig C, Alunni V, Battaini MA, Mura C, Henrichsen CN, Garcia-Lopez R, Echevarria D, Puelles E, Garcia-Calero E, Kruse S, Uhr M, Kauck C, Feng G, Milyaev N, Ong CK, Kumar L, Lam M, Semple CA, Gyenesei A, Mundlos S, Radelof U, Lehrach H, Sarmientos P, Reymond A, Davidson DR, Dollé P, Antonarakis SE, Yaspo ML, Martinez S, Baldock RA, Eichele G, Ballabio A (2011) A high-resolution anatomical atlas of the transcriptome in the mouse embryo. PLoS Biol 9: e1000582
Dinger ME, Amaral PP, Mercer TR, Pang KC, Bruce SJ, Gardiner BB, Askarian-Amiri ME, Ru K, Soldà G, Simons C, Sunkin SM, Crowe ML, Grimmond SM, Perkins AC, Mattick JS (2008) Long noncoding RNAs in mouse embryonic stem cell pluripotency and differentiation. Genome Res 18: 1433-1445
Doench JG, Sharp PA (2004) Specificity of microRNA target selection in translational repression. Genes Dev 18: 504-511
Farh KK, Grimson A, Jan C, Lewis BP, Johnston WK, Lim LP, Burge CB, Bartel DP (2005) The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science 310: 1817-1821
Fedorov Y, Anderson EM, Birmingham A, Reynolds A, Karpilow J, Robinson K, Leake D, Marshall WS, Khvorova A (2006) Off-target effects by siRNA can induce toxic phenotype. RNA 12: 1188-1196
Friedman RC, Farh KK, Burge CB, Bartel DP (2009) Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19: 92-105
Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, Goldman M, Barber GP, Clawson H, Coelho A, Diekhans M, Dreszer TR, Giardine BM, Harte RA, Hillman-Jackson J, Hsu F, Kirkup V, Kuhn RM, Learned K, Li CH, Meyer LR, Pohl A, Raney BJ, Rosenbloom KR, Smith KE, Haussler D, Kent WJ (2011) The UCSC Genome Browser database: update 2011. Nucleic Acids Res 39: D876-882
Garcia DM, Baek D, Shin C, Bell GW, Grimson A, Bartel DP (2011) Weak seed-pairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs. Nat Struct Mol Biol 18: 1139-1146
Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A (2005) Galaxy: a platform for interactive large-scale genome analysis. Genome Res 15: 1451-1455
Goecks J, Nekrutenko A, Taylor J (2010) Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 11: R86
Gregory RI (2004) The Microprocessor complex mediates the genesis of microRNAs. Nature 432: 235-240
Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ (2006) miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34: D140-144
Grimm D, Streetz KL, Jopling CL, Storm TA, Pandey K, Davis CR, Marion P, Salazar F, Kay MA (2006) Fatality in mice due to oversaturation of cellular microRNA/short hairpin RNA pathways. Nature 441: 537-541
Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP (2007) MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27: 91-105
Guo H, Ingolia NT, Weissman JS, Bartel DP (2010) Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature 466: 835-840
Guo L, Lu Z (2010) Global expression analysis of miRNA gene cluster and family based on isomiRs from deep sequencing data. Comput Biol Chem 34: 165-171
Han J (2004) The Drosha-DGCR8 complex in primary microRNA processing. Genes Dev 18: 3016-3027
Han J (2006) Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 125: 887-901
Hansen TB, Jensen TI, Clausen BH, Bramsen JB, Finsen B, Damgaard CK, Kjems J (2013) Natural RNA circles function as efficient microRNA sponges. Nature 495: 384-388
Hsu K, Chang DY, Maraia RJ (1995) Human signal recognition particle (SRP) Alu-associated protein also binds Alu interspersed repeat sequence RNAs. Characterization of human SRP9. J Biol Chem 270: 10179-10186
Hubbard TJ, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Herrero J, Holland R, Howe K, Howe K, Johnson N, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Melsopp C, Megy K, Meidl P, Ouverdin B, Parker A, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Severin J, Slater G, Smedley D, Spudich G, Trevanion S, Vilella A, Vogel J, White S, Wood M, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Flicek P, Kasprzyk A, Proctor G, Searle S, Smith J, Ureta-Vidal A, Birney E (2007) Ensembl 2007. Nucleic Acids Res 35: D610-617
Huesken D, Lange J, Mickanin C, Weiler J, Asselbergs F, Warner J, Meloon B, Engel S, Rosenberg A, Cohen D, Labow M, Reinhardt M, Natt F, Hall J (2005) Design of a genome-wide siRNA library using an artificial neural network. Nat Biotechnol 23: 995-1001
Hundley HA, Krauchuk AA, Bass BL (2008) C. elegans and H. sapiens mRNAs with edited 3' UTRs are present on polysomes. RNA 14: 2050-2060
Jackson AL, Bartz SR, Schelter J, Kobayashi SV, Burchard J, Mao M, Li B, Cavet G, Linsley PS (2003) Expression profiling reveals off-target gene regulation by RNAi. Nat Biotechnol 21: 635-637
Jackson AL, Burchard J, Leake D, Reynolds A, Schelter J, Guo J, Johnson JM, Lim L, Karpilow J, Nichols K, Marshall W, Khvorova A, Linsley PS (2006) Position-specific chemical modification of siRNAs reduces "off-target" transcript silencing. RNA 12: 1197-1205
Jackson AL, Burchard J, Schelter J, Chau BN, Cleary M, Lim L, Linsley PS (2006) Widespread siRNA "off-target" transcript silencing mediated by seed region sequence complementarity. RNA 12: 1179-1187
Jackson AL, Linsley PS (2010) Recognizing and avoiding siRNA off-target effects for target identification and therapeutic application. Nat Rev Drug Discov 9: 57-67
Kaneko H, Dridi S, Tarallo V, Gelfand BD, Fowler BJ, Cho WG, Kleinman ME, Ponicsan SL, Hauswirth WW, Chiodo VA, Karikó K, Yoo JW, Lee DK, Hadziahmetovic M, Song Y, Misra S, Chaudhuri G, Buaas FW, Braun RE, Hinton DR, Zhang Q, Grossniklaus HE, Provis JM, Madigan MC, Milam AH, Justice NL, Albuquerque RJ, Blandford AD, Bogdanovich S, Hirano Y, Witta J, Fuchs E, Littman DR, Ambati BK, Rudin CM, Chong MM, Provost P, Kugel JF, Goodrich JA, Dunaief JL, Baffi JZ, Ambati J (2011) DICER1 deficit induces Alu RNA toxicity in age-related macular degeneration. Nature 471: 325-330
Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ (2004) The UCSC Table Browser data retrieval tool. Nucleic Acids Res 32: D493-496
Karreth FA, Tay Y, Perna D, Ala U, Tan SM, Rust AG, DeNicola G, Webster KA, Weiss D, Perez-Mancera PA, Krauthammer M, Halaban R, Provero P, Adams DJ, Tuveson DA, Pandolfi PP (2011) In vivo identification of tumor-suppressive PTEN ceRNAs in an oncogenic BRAF-induced mouse model of melanoma. Cell 147: 382-395
Kawahara Y, Megraw M, Kreider E, Iizasa H, Valente L, Hatzigeorgiou AG, Nishikura K (2008) Frequency and fate of microRNA editing in human brain. Nucleic Acids Res 36: 5270-5280
Kawahara Y, Zinshteyn B, Chendrimada TP, Shiekhattar R, Nishikura K (2007a) RNA editing of the microRNA-151 precursor blocks cleavage by the Dicer-TRBP complex. EMBO Rep 8: 763-769
Kawahara Y, Zinshteyn B, Sethupathy P, Iizasa H, Hatzigeorgiou AG, Nishikura K (2007b) Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science 315: 1137-1140
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D (2002) The human genome browser at UCSC. Genome Res 12: 996-1006
Khvorova A, Reynolds A, Jayasena SD (2003) Functional siRNAs and miRNAs exhibit strand bias. Cell 115: 209-216
Kim DD, Kim TT, Walsh T, Kobayashi Y, Matise TC, Buyske S, Gabriel A (2004) Widespread RNA editing of embedded alu elements in the human transcriptome. Genome Res 14: 1719-1725
Kim YK, Kim VN (2007) Processing of intronic microRNAs. EMBO J 26: 775-783
Kino T, Hurt DE, Ichijo T, Nader N, Chrousos GP (2010) Noncoding RNA gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor. Sci Signal 3: ra8
Kozomara A, Griffiths-Jones S (2011) miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res 39: D152-157
Krutzfeldt J, Rajewsky N, Braich R, Rajeev KG, Tuschl T, Manoharan M, Stoffel M (2005) Silencing of microRNAs in vivo with 'antagomirs'. Nature 438: 685-689
Lai EC (2002) Micro RNAs are complementary to 3' UTR sequence motifs that mediate negative post-transcriptional regulation. Nat Genet 30: 363-364
Lal A, Navarro F, Maher CA, Maliszewski LE, Yan N, O'Day E, Chowdhury D, Dykxhoorn DM, Tsai P, Hofmann O, Becker KG, Gorospe M, Hide W, Lieberman J (2009) miR-24 Inhibits cell proliferation by targeting E2F2, MYC, and other cell-cycle genes via binding to "seedless" 3'UTR microRNA recognition elements. Mol Cell 35: 610-625
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blocker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowski J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ (2001) Initial sequencing and analysis of the human genome. Nature 409: 860-921
Lehnert S, Van Loo P, Thilakarathne PJ, Marynen P, Verbeke G, Schuit FC (2009) Evidence for co-evolution between human microRNAs and Alu-repeats. PLoS ONE 4: e4456
Levanon EY, Eisenberg E, Yelin R, Nemzer S, Hallegger M, Shemesh R, Fligelman ZY, Shoshan A, Pollock SR, Sztybel D, Olshansky M, Rechavi G, Jantsch MF (2004) Systematic identification of abundant A-to-I editing sites in the human transcriptome. Nat Biotechnol 22: 1001-1005
Lewis BP, Burge CB, Bartel DP (2005) Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120: 15-20
Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB (2003) Prediction of mammalian microRNA targets. Cell 115: 787-798
Li JB, Levanon EY, Yoon JK, Aach J, Xie B, Leproust E, Zhang K, Gao Y, Church GM (2009) Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing. Science 324: 1210-1213
Liang H, Landweber LF (2007) Hypothesis: RNA editing of microRNA target sites in humans? RNA 13: 463-467
Luciano DJ, Mirsky H, Vendetti NJ, Maas S (2004) RNA editing of a miRNA precursor. RNA 10: 1174-1177
Lund E, Guttinger S, Calado A, Dahlberg JE, Kutay U (2004) Nuclear export of microRNA precursors. Science 303: 95-98
Ma Y, Creanga A, Lum L, Beachy PA (2006) Prevalence of off-target effects in Drosophila RNA interference screens. Nature 443: 359-363
Macrae IJ (2006) Structural basis for double-stranded RNA processing by Dicer. Science 311: 195-198
Martin JN, Wolken N, Brown T, Dauer WT, Ehrlich ME, Gonzalez-Alegre P (2011) Lethal toxicity caused by expression of shRNA in the mouse striatum: implications for therapeutic design. Gene Ther
Martí E, Pantano L, Bañez-Coronel M, Llorens F, Miñones-Moyano E, Porta S, Sumoy L, Ferrer I, Estivill X (2010) A myriad of miRNA variants in control and Huntington's disease brain regions detected by massively parallel sequencing. Nucleic Acids Res 38: 7219-7235
Matveeva O, Nechipurenko Y, Rossi L, Moore B, Saetrom P, Ogurtsov AY, Atkins JF, Shabalina SA (2007) Comparison of approaches for rational siRNA design leading to a new efficient and transparent method. Nucleic Acids Res 35: e63
McBride JL, Boudreau RL, Harper SQ, Staber PD, Monteys AM, Martins I, Gilmore BL, Burstein H, Peluso RW, Polisky B, Carter BJ, Davidson BL (2008) Artificial miRNAs mitigate shRNA-mediated toxicity in the brain: Implications for the therapeutic development of RNAi. Proc Natl Acad Sci U S A 105: 5868-5873
McBride JL, Pitzer MR, Boudreau RL, Dufour B, Hobbs T, Ojeda SR, Davidson BL (2011) Preclinical safety of RNAi-mediated HTT suppression in the rhesus macaque as a potential therapy for Huntington's disease. Mol Ther 19: 2152-2162
Mercer TR, Dinger ME, Sunkin SM, Mehler MF, Mattick JS (2008) Specific expression of long noncoding RNAs in the mouse brain. Proc Natl Acad Sci U S A 105: 716-721
Miller V, Gouvion C, Davidson B, Paulson H (2004) Targeting Alzheimer's disease genes with RNA interference: an efficient strategy for silencing mutant allele. Nucleic Acids Res 32: 661-668
Moffat J, Grueneberg DA, Yang X, Kim SY, Kloepfer AM, Hinkle G, Piqani B, Eisenhaure TM, Luo B, Grenier JK, Carpenter AE, Foo SY, Stewart SA, Stockwell BR, Hacohen N, Hahn WC, Lander ES, Sabatini DM, Root DE (2006) A lentiviral RNAi library for human and mouse genes applied to an arrayed viral high-content screen. Cell 124: 1283-1298
Monteys AM, Spengler RM, Wan J, Tecedor L, Lennox KA, Xing Y, Davidson BL (2010) Structure and activity of putative intronic miRNA promoters. RNA 16: 495-505
Naito Y, Yamada T, Ui-Tei K, Morishita S, Saigo K (2004) siDirect: highly effective, target-specific siRNA design software for mammalian RNA interference. Nucleic Acids Res 32: W124-129
Newman MA, Thomson JM, Hammond SM (2008) Lin-28 interaction with the let-7 precursor loop mediates regulated microRNA processing. RNA 14: 1539-1549
Ng L, Bernard A, Lau C, Overly CC, Dong HW, Kuan C, Pathak S, Sunkin SM, Dang C, Bohland JW, Bokil H, Mitra PP, Puelles L, Hohmann J, Anderson DJ, Lein ES, Jones AR, Hawrylycz M (2009) An anatomic gene expression atlas of the adult mouse brain. Nat Neurosci 12: 356-362
Nielsen CB, Shomron N, Sandberg R, Hornstein E, Kitzman J, Burge CB (2007) Determinants of targeting by endogenous and exogenous microRNAs and siRNAs. RNA 13: 1894-1910
Okamura K, Hagen JW, Duan H, Tyler DM, Lai EC (2007) The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130: 89-100
Osenberg S, Dominissini D, Rechavi G, Eisenberg E (2009) Widespread cleavage of A-to-I hyperediting substrates. RNA 15: 1632-1639
Packer AN, Xing Y, Harper SQ, Jones L, Davidson BL (2008) The bifunctional microRNA miR-9/miR-9* regulates REST and CoREST and is downregulated in Huntington's disease. J Neurosci 28: 14341-14346
Piriyapongsa J, Jordan IK (2007) A family of human microRNA genes from miniature inverted-repeat transposable elements. PLoS One 2: e203
Piriyapongsa J, Marino-Ramirez L, Jordan IK (2007) Origin and evolution of human microRNAs from transposable elements. Genetics 176: 1323-1337
Piskounova E, Polytarchou C, Thornton JE, LaPierre RJ, Pothoulakis C, Hagan JP, Iliopoulos D, Gregory RI (2011) Lin28A and Lin28B inhibit let-7 microRNA biogenesis by distinct mechanisms. Cell 147: 1066-1079
Poliseno L, Salmena L, Zhang J, Carver B, Haveman WJ, Pandolfi PP (2010) A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 465: 1033-1038
Provost P, Dishart D, Doucet J, Frendewey D, Samuelsson B, Radmark O (2002) Ribonuclease activity and RNA binding of recombinant human Dicer. EMBO J 21: 5864-5874
Pruitt KD, Tatusova T, Maglott DR (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33: D501-504
Ray DA, Batzer MA (2005) Tracking Alu evolution in New World primates. BMC Evol Biol 5: 51
Rinn JL, Chang HY (2012) Genome regulation by long noncoding RNAs. Annu Rev Biochem 81: 145-166
Saito K, Ishizuka A, Siomi H, Siomi MC (2005) Processing of pre-microRNAs by the Dicer-1-Loquacious complex in Drosophila cells. PLoS Biol 3: e235
Scadden AD (2005) The RISC subunit Tudor-SN binds to hyper-edited double-stranded RNA and promotes its cleavage. Nat Struct Mol Biol 12: 489-496
Schirle NT, MacRae IJ (2012) The crystal structure of human Argonaute2. Science 336: 1037-1040
Schultz N, Marenstein DR, De Angelis DA, Wang WQ, Nelander S, Jacobsen A, Marks DS, Massague J, Sander C (2011) Off-target effects dominate a large-scale RNAi screen for modulators of the TGF-beta pathway and reveal microRNA regulation of TGFBR2. Silence 2: 3
Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD (2003) Asymmetry in the assembly of the RNAi enzyme complex. Cell 115: 199-208
Semizarov D, Frost L, Sarthy A, Kroeger P, Halbert DN, Fesik SW (2003) Specificity of short interfering RNA determined through gene expression signatures. Proc Natl Acad Sci U S A 100: 6347-6352
Shin C, Nam JW, Farh KK, Chiang HR, Shkumatava A, Bartel DP (2010) Expanding the microRNA targeting code: functional sites with centered pairing. Mol Cell 38: 789-802
Sigoillot FD, Lyman S, Huckins JF, Adamson B, Chung E, Quattrochi B, King RW (2012) A bioinformatics method identifies prominent off-targeted transcripts in RNAi screens. Nat Methods 9: 363-366
Smalheiser NR, Torvik VI (2005) Mammalian microRNAs derived from genomic repeats. Trends Genet 21: 322-326
Smalheiser NR, Torvik VI (2006a) Alu elements within human mRNAs are probable microRNA targets. Trends Genet 22: 532-536
Smalheiser NR, Torvik VI (2006b) Complications in mammalian microRNA target prediction. Methods Mol Biol 342: 115-127
Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A 101: 6062-6067
Sumazin P, Yang X, Chiu HS, Chung WJ, Iyer A, Llobet-Navas D, Rajbhandari P, Bansal M, Guarnieri P, Silva J, Califano A (2011) An extensive microRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma. Cell 147: 370-381
Tan GS, Garchow BG, Liu X, Yeung J, Morris JP, Cuellar TL, McManus MT, Kiriakidou M (2009) Expanded RNA-binding activities of mammalian Argonaute 2. Nucleic Acids Res 37: 7533-7545
Tay Y, Kats L, Salmena L, Weiss D, Tan SM, Ala U, Karreth F, Poliseno L, Provero P, Di Cunto F, Lieberman J, Rigoutsos I, Pandolfi PP (2011) Coding-independent regulation of the tumor suppressor PTEN by competing endogenous mRNAs. Cell 147: 344-357
Vaish N, Chen F, Seth S, Fosnaugh K, Liu Y, Adami R, Brown T, Chen Y, Harvie P, Johns R, Severson G, Granger B, Charmley P, Houston M, Templin MV, Polisky B (2011) Improved specificity of gene silencing by siRNAs containing unlocked nucleobase analogs. Nucleic Acids Res 39: 1823-1832
Vasudevan S, Tong Y, Steitz JA (2007) Switching from repression to activation: microRNAs can up-regulate translation. Science 318: 1931-1934
Vert JP, Foveau N, Lajaunie C, Vandenbrouck Y (2006) An accurate and interpretable model for siRNA efficacy prediction. BMC Bioinformatics 7: 520
Wahlstedt H, Daniel C, Enstero M, Ohman M (2009) Large-scale mRNA sequencing determines global regulation of RNA editing during brain development. Genome Res 19: 978-986
Wang KC, Chang HY (2011) Molecular mechanisms of long noncoding RNAs. Mol Cell 43: 904-914
Wang X, Wang X, Varma RK, Beauchamp L, Magdaleno S, Sendera TJ (2009) Selection of hyperfunctional siRNAs with improved potency and specificity. Nucleic Acids Res 37: e152
Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, Geer LY, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Ostell J, Miller V, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, Tatusova TA, Wagner L, Yaschenko E (2007) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 35: D5-12
Wu C, Orozco C, Boyer J, Leglise M, Goodale J, Batalov S, Hodge CL, Haase J, Janes J, Huss JW, 3rd, Su AI (2009) BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources. Genome Biol 10: R130
Yang JH, Shao P, Zhou H, Chen YQ, Qu LH (2010) deepBase: a database for deeply annotating and mining deep sequencing data. Nucleic Acids Res 38: D123-130
Yang W, Chendrimada TP, Wang Q, Higuchi M, Seeburg PH, Shiekhattar R, Nishikura K (2006) Modulation of microRNA processing and expression through RNA editing by ADAR deaminases. Nat Struct Mol Biol 13: 13-21
Yi R, Doehle BP, Qin Y, Macara IG, Cullen BR (2005) Overexpression of exportin 5 enhances RNA interference mediated by short hairpin RNAs and microRNAs. RNA 11: 220-226
Yoshida M, Kaziro Y, Ukita T (1968) The modification of nucleosides and nucleotides. X. Evidence for the important role of inosine residue in codon recognition of yeast alanine tRNA. Biochim Biophys Acta 166: 646-655
Zhang XD, Santini F, Lacson R, Marine SD, Wu Q, Benetti L, Yang R, McCampbell A, Berger JP, Toolan DM, Stec EM, Holder DJ, Soper KA, Heyse JF, Ferrer M (2011) cSSMD: assessing collective activity for addressing off-target effects in genome-scale RNA interference screens. Bioinformatics 27: 2775-2781