Mechanisms Of MicroRNA evolution, regulation and function

University of Iowa

Iowa Research Online

Theses and Dissertations

2013

Mechanisms Of MicroRNA evolution, regulation and function: computational insight, biological evaluation and practical application

Ryan Michael Spengler

University of Iowa

Copyright 2013 Ryan Spengler

This dissertation is available at Iowa Research Online: http://ir.uiowa.edu/etd/2636


Part of the Cell Biology Commons

Recommended Citation

Spengler, Ryan Michael. "Mechanisms Of MicroRNA evolution, regulation and function: computational insight, biological evaluation and practical application." PhD (Doctor of Philosophy) thesis, University of Iowa, 2013. http://ir.uiowa.edu/etd/2636.


MECHANISMS OF MICRORNA EVOLUTION, REGULATION AND FUNCTION:

COMPUTATIONAL INSIGHT, BIOLOGICAL EVALUATION

AND PRACTICAL APPLICATION

by

Ryan Michael Spengler

An Abstract

Of a thesis submitted in partial fulfillment of the requirements for the Doctor of

Philosophy degree in Molecular and Cellular Biology in the Graduate College of

The University of Iowa

May 2013

Thesis Supervisor: Professor Beverly L. Davidson


ABSTRACT

MicroRNAs (miRNAs) are an abundant and diverse class of small, non-protein-coding RNAs that guide the post-transcriptional repression of messenger RNA (mRNA) targets in a sequence-specific manner. Hundreds, if not thousands, of distinct miRNA sequences have been described, each of which has the potential to regulate a large number of mRNAs. Over the last decade, miRNAs have been ascribed roles in nearly every biological process in which they have been tested. More recently, interest has grown in understanding how individual miRNAs evolved and how they are regulated. In this work, we demonstrate that transposable elements (TEs) are a source of novel miRNA genes and miRNA target sites. We find that primate-specific miRNA binding sites were gained through the transposition of Alu elements. We also find that remnants of Mammalian-wide Interspersed Repeat (MIR) transposition, which occurred early in mammalian evolution, provide highly conserved, functional miRNA binding sites in the human genome. We also provide data supporting the idea that long non-coding RNAs (lncRNAs) can serve as a novel miRNA binding substrate which, rather than being repressed as a miRNA target, inhibits the miRNA itself. As such, lncRNAs are proposed to function as endogenous miRNA “sponges,” competing for miRNA binding and reducing miRNA-mediated repression of protein-coding mRNA targets. We also explored how dynamic changes to miRNA binding sites can arise through adenosine-to-inosine (A-to-I) editing of the 3’ UTRs of mRNA targets. These works, together with knowledge gained from the regulatory activity of endogenous and exogenously added miRNAs, provided a platform for developing an algorithm that can be used in the rational design of artificial RNAi triggers with improved target specificity. The cumulative results from our studies identify, and in some cases clarify, important mechanisms for the emergence of miRNAs and miRNA binding sites on both large (over eons) and small (developmental) time scales, and help translate these gene-silencing processes into practical application.




MECHANISMS OF MICRORNA EVOLUTION, REGULATION AND FUNCTION:

COMPUTATIONAL INSIGHT, BIOLOGICAL EVALUATION

AND PRACTICAL APPLICATION

by

Ryan Michael Spengler

A thesis submitted in partial fulfillment of the requirements for the Doctor of

Philosophy degree in Molecular and Cellular Biology in the Graduate College of

The University of Iowa

May 2013

Thesis Supervisor: Professor Beverly L. Davidson

Graduate College The University of Iowa

Iowa City, Iowa

CERTIFICATE OF APPROVAL

PH.D. THESIS

This is to certify that the Ph.D. thesis of

Ryan Michael Spengler

has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Molecular and Cellular Biology at the May 2013 graduation.

Thesis Committee: Beverly L. Davidson, Thesis Supervisor

Adam Dupuy

John Logsdon

Andrew Russo

Yi Xing


ACKNOWLEDGMENTS

First of all, I would like to thank my mentor, Dr. Bev Davidson, for her guidance,

motivation and most of all, patience. I also must acknowledge all the members of the

Davidson Lab, both past and present, who have been an invaluable source of knowledge,

discussion and guidance over the years. In particular, I would like to thank Ryan

Boudreau and Alex Mas Monteys with whom I have closely collaborated and who are

responsible for some of the work presented in this manuscript.

I also thank Dr. Anton McCaffrey, my first research mentor, who took the time to

train me in the basics of molecular biology techniques. He encouraged me to think

outside the box, and guided me as I learned to interpret data and manage my own

research projects.

I owe special thanks to the entire faculty of the Biology Department

at Augustana College, who first taught me to think about science and encouraged me to

explore my own interests. Dr. Kristin Douglas and Dr. Dara Wegman-Geedey were

particularly amazing mentors who guided me in my own research. Dr. Douglas deserves

a special acknowledgement as she first introduced me to microRNAs, which sparked my

interest in the subject and led me to follow that interest in my graduate research.

Finally, words cannot truly describe my appreciation for the love and support my

family has given me over the years. My wife, Erin, most of all has been a vital source of

encouragement and the fact that I am writing this manuscript is in large part due to her

always being there for me.

Thank you all.


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER

1. INTRODUCTION
    miRNAs: biogenesis
    miRNAs: mechanism of action
    miRNAs: transcriptional and co-transcriptional control
    miRNAs: post-transcriptional control
    miRNAs: changing the mature miRNA sequence modulates target profiles
    miRNAs: changing mRNA sequence modulates target profiles
    Long noncoding RNAs
    Exogenous RNAi
    Exogenous RNAi: implementation and design
    Objectives
    Summary
    Published work

2. TRANSPOSABLE ELEMENTS CREATE FUNCTIONAL MICRORNAS AND MICRORNA TARGET SITES
    Abstract
    Introduction
    Methods
        Data retrieval and parsing
            3’UTR analyses
            3’UTR TEs
        miRNA target prediction and TE annotation
            TargetScan MRE predictions
            Local position to global coordinate conversion
            Intersection of MRE and TE genomic coordinates
            Alu-MRE positional enrichment relative to Alu consensus
            Generating unique MRE coordinates
        TE annotations of miRNA genes
        Detailed positional analysis of Alu-MREs
        Microarray analysis
        Cloning 3’UTR reporters
        Cloning endogenous microRNAs
        Cell culture and transfections
        Luciferase assays
        RT-qPCR
            microRNA
    Results
        MiRNAs have predicted binding sites in 3’UTR-resident TE sequences
        Let-7 directly regulates genes through conserved, MIR-element-derived target sites
        miRNAs with high Alu-MRE frequency target specific regions in the Alu
        miR-24 directly regulates transcripts through Alu-derived target sites
        Proliferation of Alu and B1 SINEs resulted in the convergent acquisition of miRNA targets in their respective primate and murine lineages
        Potentially-active Alu loci contain miRNA binding motifs
        MiRNAs are processed from TE sequences and regulate target genes containing homologous elements
        Functional validation of Alu-derived miRNAs
    Discussion

3. LONG INTERGENIC NON-CODING RNAS ARE A POTENTIAL SOURCE OF ENDOGENOUS MICRORNA “SPONGES”
    Abstract
    Introduction
    Methods
        Data sources
        Prediction and analysis of MRE content in lncRNAs
        RNA isolation and RT-PCR
        Ago immunoprecipitation
    Results
        Abundant MRE content is evident in many mouse lncRNAs
        Expression pattern of lncRNA, PSMI16
            Adult mouse brain
            Developing mouse at e14.5
            Other adult mouse tissues and cell lines
        PSMI16 associates with Ago2
        “Modular” exon structure and differential MRE inclusion in PSMI16 alternative isoforms
    Discussion

4. SISPOTR: A TOOL FOR DESIGNING HIGHLY SPECIFIC AND POTENT SIRNAS FOR HUMAN AND MOUSE
    Abstract
    Introduction
    Methods
        Dataset and Sequence Retrieval
        Formulating POTS
            Dataset selection
            Establishing weighted probability of repression (PR) values and POTS calculation
        Tissue-specific POTS analysis
        Validating siSPOTR
            Efficacy
            Ranking off-targeting potential
            Suppression signatures
        SiRNA Design Tool Comparison
        Genome-wide shRNA coverage analysis and prospective library generation and comparison
    Results
        Low off-targeting siRNAs maintain potency
        Design of effective low off-targeting potential siRNAs
            Strand-biasing
            GC-content
            Seed specificity
            SiSPOTR design example
        Validation of siSPOTR algorithm: efficacy and specificity
            Efficacy
            Off-targeting potential
        Comparison of siSPOTR to other algorithms
        Prospective applications to expressed RNAi and genome-wide RNAi libraries
        SiSPOTR Online Tool
    Discussion
        Consideration of Seed Pairing Stability
        The Utility of siSPOTR

5. FINAL DISCUSSION
    Competitive Endogenous RNAs
    Off-targeting and RNAi design
    Emerging technologies in the study of miRNA biology

APPENDIX
    ADENOSINE DEAMINATION IN HUMAN TRANSCRIPTS GENERATES NOVEL MICRORNA BINDING SITES
        Abstract
        Introduction
        Results
            Adenosine deamination creates miRNA complementarities
            MiR-513 and miR-769-3p/-450b-3p specifically target deamination sites
            MiR-513 and miR-769-3p repress deaminated sequences
            MiR-769-3p represses DFFA expression specifically in cells that deaminate the DFFA 3’ UTR
        Discussion
        Materials and Methods
            Informatics evaluation of ADAR deamination sites
            Vector construction
            Luciferase assays
            Western blotting

REFERENCES


LIST OF TABLES

Table 2-1. miRNAs have predicted MREs in potentially-active Alus in the human genome

Table 3-1. Putative lncRNA “sponges” and MRE frequency for conserved miRNAs

Table 4-1. Comparison of siRNA design tools.

Table 4-2. The effect of seed position 8 on off-targeting potential by site frequency

Table 4-3. The effect of seed position 8 on off-targeting potential by POTS

Table A-1. MiRNA seed matches markedly enriched by adenosine deamination

Table A-2. A-to-I editing occurs predominantly in noncoding regions of expressed sequences


LIST OF FIGURES

Figure 1-1. Canonical microRNA Biogenesis Pathway (Davis-Dusenbery & Hata, 2010).

Figure 1-2. Anatomy of lncRNA loci (Adapted from Rinn & Chang, 2012)

Figure 2-1. TE family composition of putative TE-MREs in human 3’UTRs.

Figure 2-2. TE-MRE composition and unbiased gene function analysis reveal strong functional connections between let-7 and MIR-derived MREs.

Figure 2-3. Genome browser views for let-7 MIR-derived MREs in (A) MYO1F and (B) E2F6.

Figure 2-4. Let-7 regulates 3’UTRs containing MIR-derived MREs.

Figure 2-5. TE-MRE compositions for (A) miR-24-3p and (B) miR-122 show a prominent Alu fraction.

Figure 2-6. Most frequent Alu-MRE sequences map to distinct positions relative to the Alu consensus.

Figure 2-7. Alu-derived MREs respond to miR-24 overexpression.

Figure 2-8. Microarray datasets measuring response to miRNA overexpression to assess functional response of Alu-derived targets on a global scale

Figure 2-9. The fraction of down-regulated genes with Alu-derived MREs is in proportion to their overall prevalence.

Figure 2-10. Functional miR-24 MREs are independently created in rodent and primate clades due to lineage-specific, but homologous TE families.

Figure 2-11. miR-28 is derived from an LINE2c retrotransposon, is highly conserved and regulates transcripts with LINE2-embedded MRE sequences.

Figure 2-12. Alu-derived miR-1285-1 is effectively processed and mediates knockdown of genes with Alu-MREs.

Figure 2-13. Pol III intronic promoters drive intronic miRNA expression.

Figure 3-14. Proposed mechanism for microRNA competitive inhibition by endogenous long non-coding RNA “sponges”.

Figure 3-15. Distribution of MRE frequency in predicted miRNA/lncRNA pairs.

Figure 3-16. PSMI16 (NR_015505) In situ hybridization reveals strong regional expression in adult mouse brain.

Figure 3-17. Strong regional expression of PSMI16 is seen in the developing mouse (14.5 DPC) by in situ hybridization.

Figure 3-18. PSMI16 expression by RT-PCR in (A) adult mouse tissues and (B) cell lines.

Figure 3-19. PSMI16 associates with Ago proteins in mouse neural progenitor cells.

Figure 3-20. Differential MRE incorporation in alternative PSMI16 isoforms.

Figure 4-1. Diagram of on- and off-target silencing by siRNAs.

Figure 4-2. Effect of siRNA off-targeting potential on gene silencing capacity.

Figure 4-3. Formulation and distribution of POTS (potential off-targeting score).

Figure 4-4. Correlation of POTS ranks across tissues.

Figure 4-5. Workflow schematic for designing siRNAs targeting human PPIB using the siSPOTR algorithm.

Figure 4-6. Validation of siSPOTR: efficacy and off-targeting.

Figure 4-7. Spearman rank correlation of final POTS values.

Figure 4-8. Effect of POTS on off-targeting from hairpin-based RNAi expression vectors.

Figure 4-9. Comparison of off-targeting potentials among shRNA libraries.

Figure A-1. ADARs deaminate adenosine to inosine, potentially altering miRNA complementarities.

Figure A-2. A-to-I edits frequently create miR-513 and miR-769-3p / -450b-3p complementarities.

Figure A-3. miR-513 and miR-769-3p target MAIDs but not the corresponding unedited sequence.

Figure A-4. Endogenous MAIDs are targets for miR-513 and miR-769-3p repression.

Figure A-5. MiR-769 selectively represses DFFA protein.


LIST OF ABBREVIATIONS

ADAR Adenosine Deaminases that Act on RNA

AGO Argonaute

A-to-I Adenosine to Inosine

BEND3 Brain-derived Endothelial Cells

C Conserved

cDNA Complementary DNA

ceRNA Competing Endogenous RNA

CHST6 Carbohydrate (N-acetylglucosamine 6-O) Sulfotransferase 6

CISD2 CDGSH Iron Sulfur Domain 2

CTRL Control

DFFA DNA Fragmentation Factor Alpha

DGCR8 DiGeorge Syndrome Critical Region Gene 8

DMEM Dulbecco's Modified Eagle Medium

DNA Deoxyribonucleic Acid

DPC Days Post Coitum

EIF2S3 Eukaryotic Translation Initiation Factor 2, Subunit 3 gamma

ESC Embryonic Stem Cell

EST Expressed Sequence Tags

EXP5 Exportin-5

F11R Platelet F11 Receptor

FBS Fetal Bovine Serum

GAPDH Glyceraldehyde-3-Phosphate Dehydrogenase

GAS5 Growth Arrest-Specific 5

HE High Efficacy

HEK293 Human Embryonic Kidney 293 Cells

HITS-CLIP High-Throughput Sequencing of RNA Isolated by Crosslinking Immunoprecipitation

hsa Homo sapiens (Human)

IP Immunoprecipitation

ISH In situ Hybridization

kb Kilobase


LE Low Efficacy

lincRNAs Long Intergenic Non-coding RNAs

LINE1 (L1) Long Interspersed Nuclear Element 1

LINE2 (L2) Long Interspersed Nuclear Element 2

MAID miRNA Associating If Deaminated

MAP3K9 Mitogen-Activated Protein Kinase Kinase Kinase 9

miRNA microRNA

MIRs Mammalian-wide Interspersed Repeats

mmu Mus musculus (Mouse)

MRE miRNA recognition element

mRNA Messenger RNA

N2A Neuro 2A Cells

NC Non-Conserved

ng Nanogram

nM Nanomolar

nt Nucleotide

PBMC Peripheral Blood Mononuclear Cell

PCDHB1 Protocadherin Beta 11

PolII RNA Polymerase II

POTS Potential Off-Targeting Score

PPIB Peptidylprolyl Isomerase B (Cyclophilin B)

PR Probability of Repression

pre-miRNA Precursor microRNA

pri-miRNA Primary microRNA

PSMI16 Putative Sponge for miRNA-16

PTEN Phosphatase and Tensin Homolog

Ptr Pan troglodytes (Chimpanzee)

REST RE1-Silencing Transcription Factor

RISC RNA-induced Silencing Complex

RNA Ribonucleic Acid

RNP Ribonucleoprotein

RT-PCR Reverse Transcription Polymerase Chain Reaction


SEMA3F Semaphorin-3F

SFXN2 Sideroflexin 2

shRNA Small (Short) Hairpin RNA

siRNA Small (Short) Interfering RNA

siSPOTR siRNA Seed Potential of Off-Target Reduction

SLC12A8 Solute Carrier Family 12, member 8

T4-PNK T4-Polynucleotide Kinase

TA Target Abundance

TEs Transposable Elements

TRC The RNAi Consortium

TU Transcription Unit

UBXN2B Ubiquitin Regulatory X domain-containing protein 2B

µl Microliter


CHAPTER 1

INTRODUCTION

miRNAs: biogenesis

Arguably the most extensively studied of the endogenous small RNA families, miRNAs are distinguished by the characteristic stem-loop structure of their precursor transcripts. Approximately 50% of human miRNAs are clustered with one or more other miRNAs and are believed to be co-transcribed as a single polycistronic RNA Polymerase II (PolII) transcript (Griffiths-Jones et al, 2006). Nearly the same fraction are hosted within, and co-transcribed as part of, a protein-coding messenger RNA (mRNA) transcription unit (TU). Classically, these primary miRNA (pri-miRNA) transcripts are first clipped out of the nascent TU by the Microprocessor complex, composed of the RNase III enzyme DROSHA (Han, 2004) and its essential cofactor, DiGeorge syndrome critical region gene 8 (DGCR8) (Gregory, 2004; Han, 2006). DGCR8 is believed to bind a single-stranded portion of the pri-miRNA located at the base of the double-stranded stem, opposite the loop. Guided by DGCR8, DROSHA cleaves the pri-miRNA ~11 nucleotides (nt) into the stem, releasing a precursor miRNA (pre-miRNA) hairpin consisting of the remaining stem and the loop (Han, 2006). RNA splicing and pri-miRNA processing appear to be tightly coordinated for intron-resident miRNAs, with evidence pointing to miRNA “cropping” preceding intron removal; cropping does not appear to affect the splicing process itself (Kim & Kim, 2007). In contrast, mirtrons are a distinct miRNA subgroup that depends directly on splicing. They are produced from the excised intron lariat, which after debranching folds into a pre-miRNA-like hairpin, and therefore bypass the DROSHA cleavage step entirely (Berezikov et al, 2007; Okamura et al, 2007).
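Whether a miRNA counts as "clustered" depends on the distance threshold used to group neighboring loci. The sketch below is a hypothetical illustration, not the procedure behind the ~50% estimate cited above: it groups same-strand loci whose gap is at most an assumed 10 kb (the `max_gap` parameter), and the `Locus` record, function names and coordinates are invented for illustration.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Locus:
    """Hypothetical miRNA locus record (1-based, inclusive coordinates)."""
    name: str
    chrom: str
    strand: str  # "+" or "-"
    start: int
    end: int


def cluster_loci(loci: list[Locus], max_gap: int = 10_000) -> list[list[Locus]]:
    """Group same-strand loci separated by <= max_gap bp (assumed threshold).

    Clusters grow by single linkage: a locus joins the current cluster if it
    starts within max_gap of the furthest end seen so far in that cluster.
    """
    ordered = sorted(loci, key=lambda loc: (loc.chrom, loc.strand, loc.start))
    clusters: list[list[Locus]] = []
    for _, group in groupby(ordered, key=lambda loc: (loc.chrom, loc.strand)):
        current: list[Locus] = []
        cluster_end = 0
        for loc in group:
            if current and loc.start - cluster_end <= max_gap:
                current.append(loc)
                cluster_end = max(cluster_end, loc.end)
            else:
                if current:
                    clusters.append(current)
                current = [loc]
                cluster_end = loc.end
        if current:
            clusters.append(current)
    return clusters


def fraction_clustered(loci: list[Locus], max_gap: int = 10_000) -> float:
    """Fraction of loci that share a cluster with at least one other locus."""
    clusters = cluster_loci(loci, max_gap)
    return sum(len(c) for c in clusters if len(c) > 1) / len(loci)


if __name__ == "__main__":
    # Invented coordinates for illustration only.
    demo = [
        Locus("mir-a", "chr1", "+", 1_000, 1_080),
        Locus("mir-b", "chr1", "+", 3_000, 3_080),    # ~1.9 kb away: same cluster
        Locus("mir-c", "chr1", "-", 3_500, 3_580),    # opposite strand: separate
        Locus("mir-d", "chr1", "+", 50_000, 50_080),  # too far: separate
    ]
    print([[loc.name for loc in c] for c in cluster_loci(demo)])
    print(fraction_clustered(demo))  # 0.5
```

Because clustering is single-linkage, a chain of closely spaced loci can form one cluster even when its ends are far apart; whether such a group is truly a co-transcribed polycistron still has to be checked against transcript evidence.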

The pre-miRNA intermediate is shuttled out of the nucleus by the nuclear transport receptor Exportin-5 (EXP5) in cooperation with its cofactor, the GTP-bound form of the Ran GTPase (Bohnsack et al, 2004; Lund et al, 2004; Yi et al, 2005). EXP5 recognizes the double-stranded stem of the pre-miRNA, along with the characteristic 2-nt 3’ overhang left by RNase III enzymes such as DROSHA. Once the pre-miRNA is released into the cytoplasm, a second RNase III enzyme, DICER1, binds it and cleaves off the loop, yielding a short (~20-bp) double-stranded RNA with 2-nt 3’ overhangs on both ends, so that each strand is ~22 nt long (Macrae, 2006; Saito et al, 2005).
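The overhang geometry described above can be made concrete with a toy model. This is a minimal sketch under simplifying assumptions that are not from the thesis: the stem is perfectly base-paired (real pre-miRNAs contain bulges and mismatches), DROSHA has left a 2-nt 3’ overhang, and DICER1 releases strands of a fixed length `k` (22 nt here). The helper names are hypothetical, and only the first 22 nt of the example stem reproduce human let-7a-5p; the stem extension and loop are invented.

```python
_COMP = str.maketrans("ACGU", "UGCA")


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return rna.translate(_COMP)[::-1]


def ideal_pre_mirna(stem_5p: str, loop: str, overhang: str = "UU") -> str:
    """Hypothetical, perfectly paired pre-miRNA carrying a 2-nt 3' overhang.

    With hairpin length n, nucleotide i on the 5' arm pairs with n - 3 - i,
    because the last two 3' nucleotides are left unpaired.
    """
    if len(overhang) != 2:
        raise ValueError("a DROSHA-style 2-nt overhang is assumed")
    return stem_5p + loop + revcomp(stem_5p) + overhang


def dicer_duplex(pre_mirna: str, k: int = 22) -> tuple[str, str]:
    """Return (5p strand, 3p strand) after an idealized DICER1 cut.

    Taking k nt from each end leaves k - 2 paired bases and a 2-nt 3'
    overhang on both ends of the duplex.
    """
    return pre_mirna[:k], pre_mirna[-k:]


if __name__ == "__main__":
    pre = ideal_pre_mirna("UGAGGUAGUAGGUUGUAUAGUUAGCA", "GAUACUAUAC")
    five_p, three_p = dicer_duplex(pre)
    print(five_p)                                # UGAGGUAGUAGGUUGUAUAGUU
    print(revcomp(three_p[:20]) == five_p[:20])  # True: 20 paired bases
    print(five_p[-2:], three_p[-2:])             # UU UU (unpaired 3' overhangs)
```

In this idealized model the two strands are simply the first and last `k` nucleotides of the hairpin; in real precursors, mismatches and variable DICER1 cut sites shift these boundaries.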

miRNAs: mechanism of action

In a process that is not well understood, DICER1 facilitates loading of the miRNA duplex into an AGO protein, the core of the RNA-induced Silencing Complex (RISC). In mammals, miRNAs can be loaded into any one of four AGO proteins (AGO1-4). The distinct roles of each AGO complex remain under active investigation, but AGO1 and AGO2 are known to be the predominant isoforms in mammals, and AGO2, but not AGO1, AGO3 or AGO4, has catalytic “slicer” activity (Azuma-Mukai, 2008). Regardless of the isoform, only one of the two miRNA strands is ultimately retained in RISC as the antisense, or “guide,” strand. The sense, or “passenger,” strand is degraded after being either cleaved by AGO2 slicer activity or unwound by an as-yet unidentified RNA helicase.

Once loaded into RISC, the mature miRNA imparts target specificity to the silencing complex. In animals, miRNAs primarily interact with the 3’UTRs of protein-coding transcripts, binding to partially complementary motifs in the mRNA. As few as 6-7 nucleotides of complementarity between a miRNA recognition element (MRE) in a target mRNA and the critical “seed” region of the miRNA (nt 2-7 or 2-8) are often sufficient to direct RISC activity, usually reducing protein output from that transcript through transcript destabilization or translational inhibition. With so few base pairs mediating target interactions, hundreds or perhaps thousands of transcripts may contain potential binding motifs. Moreover, the seed is so important that a single base change, or a shift in processing, can change the seed sequence and therefore drastically change the potential target profile. Because of these features, many mechanisms have evolved to control the expression level and processing of miRNAs.

miRNAs: transcriptional and co-transcriptional control

The repressive effect of a miRNA is generally directly related to the concentration of the mature miRNA. Since miRNA biogenesis involves several sequential and interdependent processes, each step represents a potential point of regulation. For example, most miRNAs are transcribed by RNA Polymerase II (PolII), the same polymerase that transcribes protein-coding genes, and many of the same mechanisms of transcriptional control apply. In fact, intronic miRNAs may rely directly on the transcriptional control of the host gene for their own expression. This suggests that the transcription factors and chromatin modifiers regulating mRNA expression can, in turn, regulate miRNA levels. Analogous transcriptional regulation has been observed for intergenic PolII-transcribed miRNA TUs as well. For example, our lab previously showed that the RE1-silencing transcription factor (REST), classically known for suppressing neuronal genes in non-neuronal cells, can repress expression of neuron-enriched miR-9 (Packer et al, 2008). Furthermore, in several documented cases, including REST and miR-9, the transcription factor is reciprocally regulated by the miRNA, setting up feedback or feed-forward loops (Bracken, 2008; Packer et al, 2008).

Interestingly, some intronic miRNAs show expression discordant with that of their host gene in certain cellular contexts. One possible explanation, described earlier, is that promoter elements located in the intron upstream of the miRNA sequence can independently drive its transcription (Monteys et al, 2010). Another, not mutually exclusive, explanation is that pri-miRNA processing is altered through regulation of the microprocessor. For example, SMAD proteins, which classically serve in TGFβ signal transduction, bind to sequences in the loops of some miRNAs, including miR-21 and miR-199a (Davis et al, 2008). This interaction promotes DROSHA cleavage, increasing levels of those miRNAs. Several examples of microprocessor inhibition have also been demonstrated. For example, adenosine deaminases acting on RNA (ADARs) catalyze the conversion of adenosine to inosine residues in some double-stranded RNA substrates, and a subset of miRNAs show reduced processing efficiency when deaminated at particular residues (Yang, 2006).

miRNAs: post-transcriptional control

As mentioned earlier, pre-miRNAs are exported from the nucleus to be processed further by DICER1. Export can be controlled by modifying pre-miRNA interactions with EXP5 or by altering the Ran GTPase cycle, which would be one mechanism to regulate miRNAs on a global level. Another means of miRNA regulation is at the level of DICER1 processing. One well-documented example involves the interaction between the protein LIN28A and the miRNA let-7. LIN28A binds to the loop of let-7 and inhibits both DROSHA and DICER1 processing. LIN28A can also recruit the terminal uridylyltransferase TUT4, which catalyzes the non-templated addition of uridine residues to the 3’ end of the let-7 precursor (Piskounova et al, 2011), resulting in rapid let-7 turnover. As with the REST/miR-9 example presented above, the LIN28A transcript carries let-7 binding sites in its 3’UTR, setting up an auto-regulatory feedback loop (Newman et al, 2008). This interaction results in inversely correlated expression of mature let-7 and LIN28A during development. Interestingly, high levels of let-7 precursor can be detected throughout development, but the mature sequence appears only in terminally differentiated cells, once LIN28A expression drops.

miRNAs: changing the mature miRNA sequence modulates target profiles

Once a mature miRNA is produced, most evidence indicates that it mediates target repression by binding to 6-8 nt in mRNA 3’UTRs that are complementary to the seed sequence (positions 2-8). In most cases, even a single mismatch within the 6 nt “core” (positions 2-7) completely disrupts miRNA binding (Lewis et al, 2005). Additionally, many target sites with little or no pairing outside of the seed respond to the miRNA, suggesting that the seed is not only necessary for miRNA function but may also be sufficient. The limited role of the flanking miRNA sequence is supported by crystal structures of AGO family proteins showing that the 3’ end of the miRNA is flipped out of the miRNA-target binding pocket (Schirle & MacRae, 2012).

Together, these observations support a critical role for the seed region in determining the target specificity of a miRNA. They also suggest that regulatory mechanisms that alter the miRNA seed sequence could cause global changes in the miRNA’s target profile. For example, as mentioned above, ADAR enzymes catalyze A-to-I editing of some miRNA sequences, which can alter the efficiency of their processing. Moreover, because inosine is functionally equivalent to guanine in base-pairing interactions, editing of a single nucleotide in the seed could result in a very different target profile.
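The base-pairing consequence of seed editing can be sketched in a few lines. This minimal Python example assumes, as stated above, that inosine pairs like guanine (that is, with cytidine); it writes an edited adenosine as "I", extracts the seed (nt 2-8) and derives the complementary 7-nt motif that would be sought in target 3’UTRs. The let-7a sequence and the edited position are illustrative inputs only; no claim is made that this position is edited in vivo.

```python
# Minimal sketch: effect of a single A-to-I edit on the seed-matched target motif.
# Assumption: inosine (I) base-pairs like guanine, i.e., with cytidine.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G", "I": "C"}


def edit_a_to_i(mirna: str, position: int) -> str:
    """Return the miRNA with the adenosine at 1-based `position` written as inosine (I)."""
    idx = position - 1
    if mirna[idx] != "A":
        raise ValueError(f"position {position} is {mirna[idx]!r}, not A")
    return mirna[:idx] + "I" + mirna[idx + 1:]


def seed(mirna: str) -> str:
    """Seed region: nucleotides 2-8 (1-based, inclusive)."""
    return mirna[1:8]


def seed_site(mirna: str) -> str:
    """7-nt mRNA motif (5'->3') complementary to seed nucleotides 2-8."""
    return "".join(PAIR[base] for base in reversed(seed(mirna)))


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"   # illustrative input
    edited = edit_a_to_i(let7a, 3)     # hypothetical edit at seed position 3
    print(seed_site(let7a), seed_site(edited))
```

Running it prints `CUACCUC CUACCCC`: the single edit turns one U of the target motif into a C, so the edited miRNA would seek a different set of sites.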

Another potential means of altering the miRNA seed emerged from large-scale sequencing of endogenous small RNAs. These studies revealed that many miRNAs have variable 5’ and 3’ ends, as DROSHA and DICER1 processing is not always precise. This creates isomiRs with 5’ or 3’ termini shifted by one or more bases. While the majority of these shifts were found at the 3’ end (leaving the seed unaffected), some miRNAs had variable 5’ ends as well, although, to date, no clear example demonstrating a functional role for a 5’ isomiR has been reported. That being said, this phenomenon is an important consideration when designing exogenous RNAi triggers, as described in Chapter 4. In short, some artificial sequences cause sequence-dependent cellular toxicity, largely due to widespread seed-mediated dysregulation of unintended transcripts. These so-called “off-target effects” can be mitigated by rationally designing sequences with a low propensity for off-targeting. However, misprocessing of the exogenous sequence could yield a seed sequence variant, altering its off-targeting potential. RNAi trigger design should therefore also consider the off-targeting potential of the alternative seeds created by processing shifts.
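To illustrate how processing shifts generate alternative seeds, the hypothetical sketch below takes the precursor arm surrounding a mature sequence, moves the 5’ cleavage site by up to one nucleotide in either direction and reports the seed (nt 2-8) of each resulting isomiR. The upstream "C" prepended to let-7a is an invented flanking base used only for illustration; in a design pipeline, each alternative seed could then be screened for off-targeting potential in the same way as the intended seed.

```python
def shifted_seeds(arm: str, canonical_start: int, max_shift: int = 1) -> dict:
    """Map each 5' shift (negative = extended, positive = trimmed) to its seed (nt 2-8).

    `arm` is the precursor arm sequence; `canonical_start` is the 0-based index
    of the canonical mature 5' end within it.
    """
    seeds = {}
    for shift in range(-max_shift, max_shift + 1):
        start = canonical_start + shift
        if start < 0 or start + 8 > len(arm):
            continue  # the shifted isoform would run off the available sequence
        seeds[shift] = arm[start + 1:start + 8]
    return seeds


if __name__ == "__main__":
    arm = "C" + "UGAGGUAGUAGGUUGUAUAGUU"  # hypothetical upstream base + let-7a
    for shift, s in shifted_seeds(arm, canonical_start=1).items():
        print(shift, s)
```

With these inputs the canonical seed is `GAGGUAG`, while a one-nucleotide 5’ extension or trimming yields `UGAGGUA` or `AGGUAGU`, respectively: every shifted isoform carries an entirely different seed.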

Finally, miRNA target profiles can also be modulated globally through strand selection. Most miRNAs are predominantly processed into a single mature isoform, reflecting a strong bias towards loading only one of the two strands of the miRNA duplex. Until very recently, miRNA naming conventions designated the non-dominant arm as the star (*) strand. However, accumulating examples demonstrate specific conditions under which the star strand plays an important, if not predominant, role.

miRNAs: changing mRNA sequence modulates target profiles

Individual miRNAs potentially regulate hundreds of target transcripts.

Consequently, mechanisms to regulate miRNA activity on a global level are best aimed at

directly regulating the miRNA, as outlined above. However, regulating miRNA activity

at the level of the target transcript would provide a mechanism to control which

transcripts are bound by the miRNA. This process has been observed on a global scale in

the context of some cancer cell lines and other dividing cell types. In these settings,

widespread shortening of 3’UTRs was observed, due to alternative splicing or

polyadenylation site choice. This resulted in loss of miRNA binding sites, concomitant

transcript stabilization and increased protein production. Differential expression between

long and short isoforms of the target mRNAs coincided with differential inclusion of

miRNA binding sites.


Long noncoding RNAs

Long non-coding RNAs (lncRNAs), like their small RNA counterparts, are a recently discovered class of RNAs, broadly defined by their lack of coding potential (no open reading frame longer than 100 amino acids) and their long (>200 nt) transcript size. This rather nondescript and arbitrary definition serves to distinguish these novel transcripts from the ever-growing small RNA world. Some of the classically described small nuclear RNA splicing factors could fall within this definition, but lncRNA terminology tends to focus on more recently discovered groups.

The nomenclature used for lncRNA subgroups is somewhat informative, typically describing the position and orientation of lncRNAs relative to nearby protein-coding TUs (Figure 1-2). While these descriptions do not imply function, most of the data regarding transcripts that overlap protein-coding genes support a cis-regulatory role. That is, transcripts overlapping protein-coding TUs tend to be involved in the transcriptional or epigenetic regulation of the protein-coding genes they overlap.

This contrasts with long intergenic non-coding RNAs, which by definition do not overlap protein-coding transcripts. These transcripts may act as RNA scaffolds for proteins involved in transcriptional or epigenetic control. They resemble protein-coding mRNAs in that they are often spliced, capped and polyadenylated. Some do have limited protein-coding potential, but based on conservation and ORF size, translation is unlikely to be their predominant function.

Exogenous RNAi

As mentioned earlier, endogenous miRNAs typically mediate post-transcriptional repression of mRNAs by pairing to partially complementary sequence motifs within target 3’UTRs. Although relatively rare in mammalian cells, fully complementary pairing between the mature miRNA and its target mRNA can trigger AGO2-mediated “slicer” activity. Exogenous RNAi triggers are artificial sequences designed to engage the miRNA pathway and induce AGO2-mediated cleavage of a desired target mRNA. With the advent of this technology, specific repression of nearly any gene of interest can be achieved without depending on specific drugs or on technically more difficult genetic knockouts. Therapeutically, RNAi triggers have been used to silence viral genes, dominantly inherited toxic gene products and disease-modifying proteins.

Exogenous RNAi: implementation and design

Artificial RNAi triggers can be chemically synthesized double-stranded RNAs (dsRNAs) or expressed hairpin transcripts. Chemically synthesized oligonucleotides are usually ~21 nt short (small) interfering RNAs (siRNAs). Because siRNAs structurally resemble the dsRNA products loaded into AGO proteins after DICER1 cleavage, they bypass DROSHA and DICER1 processing. In vitro, siRNAs are commonly delivered by lipid-based transfection, although efficient means of delivering these molecules in vivo remain an area of active research.

Expressed RNAi constructs are either short hairpin RNAs (shRNAs) or miRNA shuttles. The shRNAs structurally resemble the pre-miRNA hairpins produced by DROSHA processing and are typically driven by RNA Polymerase III (Pol III) promoters, such as U6 and H1. Studies from our lab and others showed that shRNAs can be relatively poor DICER1 substrates and can cause a buildup of precursor RNA with consequent cellular toxicity. Better substrates were obtained by cloning the artificial sequences into the context of an endogenous miRNA. For example, the miR-30-based shuttles commonly used in our lab express an exogenous dsRNA stem in the context of the endogenous miR-30 scaffold, along with a portion of its 5’ and 3’ flanking sequence, generating a transcript that more closely resembles a natural pri-miRNA processed by DROSHA in the nucleus. In any case, the ultimate goal is to introduce a sequence into the endogenous miRNA pathway that loads the desired ~21 nt guide strand, antisense to the target gene of interest. Because exogenous sequences enter the miRNA pathway and can impart miRNA-like seed interactions, as mentioned above, incorrect processing or loading of the passenger strand may not only decrease the efficacy of target silencing but also increase the off-target effects imparted by the sequence. Design techniques used to minimize these undesired effects are described in more detail in Chapter 4.
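As a minimal sketch of the antisense relationship described above, the Python example below derives a guide strand fully complementary to a 21-nt target site and extracts the guide's seed (nt 2-8), the region through which miRNA-like off-target interactions would occur. The target sequence is arbitrary, and the sketch deliberately ignores features of real designs, such as 19-bp duplexes with 2-nt 3’ overhangs and the thermodynamic strand asymmetry that influences which strand is loaded.

```python
# Minimal sketch: guide (antisense) strand and seed for a chosen mRNA target site.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def guide_from_target(target_site: str) -> str:
    """Guide strand (5'->3') fully complementary to an mRNA target site given 5'->3'.

    DNA input (T) is accepted and treated as RNA (U).
    """
    rna = target_site.upper().replace("T", "U")
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def guide_seed(guide: str) -> str:
    """Seed of the guide strand: nucleotides 2-8 (1-based, inclusive)."""
    return guide[1:8]


if __name__ == "__main__":
    target = "GCAAGCUGACCCUGAAGUUCA"  # arbitrary 21-nt example target site
    guide = guide_from_target(target)
    print(guide, guide_seed(guide))
```

For this input the guide is `UGAACUUCAGGGUCAGCUUGC` and its seed is `GAACUUC`; it is this seed, rather than the full 21-nt sequence, that would drive any miRNA-like off-target repression.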

Objectives

The work described here addresses questions about miRNA interactions with mRNA targets. In Chapter 2, I provide computational and functional evidence addressing whether Transposable Elements (TEs) are involved in the evolution and function of human miRNAs. More specifically, are miRNAs processed from TEs functional, and do they regulate mRNAs containing miRNA Recognition Elements (MREs) embedded within homologous sequences? Also, do highly conserved, non-TE-derived miRNAs functionally target 3’UTR TEs, gaining novel MREs via lineage-specific TE proliferation events? The work presented in the Appendix addresses whether adenosine-to-inosine RNA editing of a subset of TE (Alu)-derived MREs dynamically modifies miRNA binding. Single base changes resulting from this RNA editing could create, destroy or even switch miRNA binding sites.

While miRNA binding sites are believed to reside primarily in the 3’UTRs of protein-coding mRNAs, I also address whether long noncoding RNAs (lncRNAs), which are by definition “untranslated,” are also bound by miRNAs. Specifically, I propose that some lncRNAs are endogenous miRNA “sponges,” owing to their numerous (often >10) MREs for a specific miRNA family. In this way, some lncRNAs may compete with mRNA 3’UTRs for miRNA binding, regulating miRNAs in a competitive manner.

Finally, in Chapter 4, I take knowledge gleaned from studies of miRNA interactions with targets and ask whether miRNA targeting rules can guide the rational design of highly specific artificial RNAi triggers. These exogenous molecules are known to enter the endogenous miRNA pathway, where they are processed and mediate cleavage and degradation of the intended mRNA target. However, the artificial sequences can also behave like miRNAs, unintentionally repressing mRNAs through seed-mediated 3’UTR interactions. By designing antisense sequences whose seeds match very few MRE sites, can I limit the off-target impact of RNAi sequences and reduce false-positive functional changes and cellular toxicity?
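One simple way to operationalize “very few MRE sites” is to count, for each candidate guide, how often the 7mer-m8 motif complementary to its seed occurs across a set of 3’UTRs, and to prefer guides with low counts. The sketch below does that and nothing more: it counts only 7mer-m8 matches (8mer sites contain one; 7mer-A1 sites are not counted), the UTR identifiers and sequences are hypothetical toy data, and it is not presented as the Chapter 4 design procedure.

```python
# Minimal sketch: rank candidate guides by the number of seed-matched (7mer-m8)
# sites they would have across a set of 3'UTRs (fewer = lower off-target potential).
DNA_PAIR = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def seed_match_7mer(guide: str) -> str:
    """DNA motif (5'->3') complementary to guide nucleotides 2-8."""
    return "".join(DNA_PAIR[base] for base in reversed(guide.upper()[1:8]))


def count_seed_sites(guide: str, utrs: dict) -> int:
    """Count (possibly overlapping) 7mer-m8 matches in all UTR sequences."""
    motif = seed_match_7mer(guide)
    total = 0
    for seq in utrs.values():
        seq = seq.upper().replace("U", "T")
        hit = seq.find(motif)
        while hit != -1:
            total += 1
            hit = seq.find(motif, hit + 1)
    return total


def rank_guides(guides: list, utrs: dict) -> list:
    """Return guides ordered from fewest to most predicted seed sites."""
    return sorted(guides, key=lambda g: count_seed_sites(g, utrs))


if __name__ == "__main__":
    toy_utrs = {"UTR_A": "AACTACCTCAAACTACCTCG", "UTR_B": "GGGAAGTTCGG"}  # hypothetical
    candidates = ["UGAGGUAGUAGGUUGUAUAGUU", "UGAACUUCAGGGUCAGCUUGC"]
    for g in rank_guides(candidates, toy_utrs):
        print(g, count_seed_sites(g, toy_utrs))
```

In the toy data the second candidate has one seed match and the first has two, so the second is ranked first; a genome-scale version would run the same count over all annotated 3’UTRs.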

Summary

Endogenous miRNAs classically regulate the expression of protein-coding

transcripts through interactions with short, often evolutionarily-conserved, sequence

motifs in 3’UTRs. However, non-conserved MREs outnumber conserved ones ~10:1 and

a significant proportion of these may respond to miRNA-mediated silencing (Farh et al,

2005). Cellular mechanisms have evolved to control the promiscuity of these sequences

to some extent, but the ability to recognize novel binding sites plays an important role in

the evolution of miRNA function. Understanding these interactions not only aids in the

discovery of novel roles for endogenous miRNAs, but also improves our understanding

of the side-effects associated with exogenous RNAi triggers and improves the design of

future RNAi therapeutics.

Published work

With the exception of Figure 2-10, Chapter 2 is adapted from a work in preparation for peer-reviewed publication, for which RMS is the primary author and designed all experiments in conjunction with his mentor, Beverly Davidson (BLD). The work presented in Figure 2-10 was designed and performed by RMS but was adapted from Monteys et al (2010), published in RNA. Chapter 3 represents a work in progress that will be submitted for peer-reviewed publication pending results from further functional assays; RMS designed all studies and will be the sole primary author. Chapter 4 is adapted from Boudreau et al (2013), published in Nucleic Acids Research, where RMS was an equally contributing primary author. A complementary manuscript was published in Molecular Therapy in 2011, where RMS was second author, but no data from that work are presented directly (Boudreau et al, 2011). The data presented in the Appendix are adapted from a publication in Human Molecular Genetics, where RMS is a contributing author (Borchert et al, 2009). Work contributed by authors other than RMS is indicated in the methods or figure legends.


Figure 1-1. Canonical microRNA Biogenesis Pathway (Davis-Dusenbery & Hata, 2010).


Figure 1-2. Anatomy of lncRNA loci. (Adapted from Rinn & Chang, 2012)

Owing to the largely unknown function of most lncRNAs, they are often classified according to their location and orientation relative to nearby protein-coding genes. Gene-proximal lncRNAs often act in cis, regulating expression of the protein-coding gene. Antisense transcripts initiate transcription within or 3’ of the protein-coding gene and are transcribed in the opposite direction.


CHAPTER 2

TRANSPOSABLE ELEMENTS CREATE FUNCTIONAL

MICRORNAS AND MICRORNA TARGET SITES

Abstract

Transposable Elements (TEs) account for nearly one-half of the sequence content

in the human genome. De novo germline transposition into regulatory or coding

sequences of protein-coding genes causes several heritable disorders. However, TEs are

prevalent in and around protein-coding genes, sparking inquiry into possible regulatory

function. Computational studies revealed miRNA genes and miRNA Recognition Elements (MREs) residing within TE sequences, but little evidence exists to support a functional role for these sequences. In this work, I functionally validate miRNAs and MREs derived

from the most prevalent TE families, including evolutionarily ancient LINE2 and MIR

retrotransposons as well as primate-specific Alu elements.

Introduction

Transposable Elements (TEs or transposons) mobilize and reintegrate within a host organism’s genome, and different TE classes have diverse structural features, transposition mechanisms and evolutionary origins. Some elements mobilize via “copy and paste” mechanisms and others through “cut and paste.” Retrotransposons (Type I) replicate through an RNA intermediate that serves as a template for RNA-dependent DNA polymerase (a.k.a. reverse transcriptase) activity; the resulting DNA copy then integrates into the host genome. Analogous mechanisms are essential in the life cycle of infectious retroviruses such as Human Immunodeficiency Virus (HIV) and Human T-Cell Leukemia Virus (HTLV). Non-infectious Endogenous Retroviruses (ERVs) are the predominant members of the Long Terminal Repeat (LTR)-containing subclass of retrotransposons. Non-LTR retrotransposons, including Long and Short Interspersed Nuclear Elements (LINEs and SINEs, respectively), are the most abundant TE class in the human genome, accounting for more than 30% of the total DNA content. Additionally, the non-LTR LINE1, Alu and SVA elements are the only TE families that remain active in humans. DNA (Type II) transposons encode proteins that excise the TE and facilitate its reintegration elsewhere. Although no DNA elements are active in the human genome at present, evolutionary analysis of human DNA element sequences, which account for 3% of the total genome content, revealed that primate genomes had abundant DNA transposition until ~37 million years ago (MYA). Together, TEs mobilizing through both mechanisms have modified the human genome, as well as the genomes of most organisms across all domains of life, and some continue to do so.

Irrespective of the mechanism, transposition of "active" elements is potentially

mutagenic as TE excision (Type II) or integration (Type I and II) can directly disrupt the

sequence or expression of protein-coding genes. Examples of de novo germline insertions

of active Alu, LINE1 and SVA elements are evident in more than 60 human diseases

including β-thalassemia, hemophilia and cystic fibrosis. Additionally, high copy numbers

of Alu and LINE1 elements, both active and inactive, can cause somatic genome

instability and cancer.

In spite of their mutagenic potential, TEs are commonly the predominant contributor to a genome's sequence content. In fact, ~80% of the 17 gigabase-pair (Gb) bread wheat genome (Triticum aestivum), which is more than 4.5 times the size of the human genome, is TE-derived. Conservative estimates place the TE content of the human genome at ~45%, whereas more recent estimates using an improved TE prediction algorithm suggest that this value is closer to 65-70%.

Mechanisms to protect against potentially deleterious transposition events have evolved in plants and animals, including RNA interference (RNAi). In the mammalian germline, where heritable mutations can accumulate, PIWI-interacting RNAs (piRNAs) and some endogenous siRNAs (endo-siRNAs) are loaded into Argonaute-family proteins (PIWI and AGO2, respectively) and guide silencing complexes to complementary TE sequences. Intriguingly, computational observations from our lab and others reported that miRNAs can be processed from TE-derived genomic loci (Borchert et al, 2006; Piriyapongsa & Jordan, 2007; Piriyapongsa et al, 2007; Smalheiser & Torvik, 2005; Smalheiser & Torvik, 2006a). Although miRNAs have no reported role in TE defense, the canonical targets of miRNA regulation, mRNA 3'UTRs, are often littered with TE sequences. This led to the hypothesis that TE-derived miRNAs may target homologous TEs in 3'UTRs, thus imparting miRNA-mediated regulation in a TE-dependent manner, similar in principle to endo-siRNA- and piRNA-mediated TE repression.

Although computational data supporting this hypothesis are abundant, functional validation of these interactions remains rare. To date, only LINE2-derived miR-28-5p and miR-151 have been shown to bind the 3'UTRs of CXCR5 and LYPD3, respectively, through non-canonical miRNA recognition elements (MREs), causing endonucleolytic cleavage of the target substrate. In this work, I show that miR-28 regulates the expression of several target mRNAs through LINE2 elements in their 3’UTRs. Unlike the LINE1 and Alu elements described above, LINE2s are inactive remnants of an ancient period of transposition early in placental mammal evolution. The LINE2-derived targets show sequence conservation across all extant species descended from that common ancestor.

I also demonstrate the impact of mammalian-wide and primate-specific TEs on

target gene interactions for many miRNA families. Specifically, I show that let-7

regulates several genes through MIR-embedded target sites in human cells, that these

interactions are highly conserved, and that they represent adaptive changes to let-7

function held over from the evolution of placental mammals. I also demonstrate primate-

specific additions to miR-24 function as a result of its seed sequence being present in the

Alu consensus. Finally, I provide evidence for the functionality of primate-specific, Alu-

derived miR-1285, and show that it regulates transcripts through MREs contained within

homologous Alu elements.


Methods

Data retrieval and parsing

Unless stated otherwise, all genomic data were obtained from the UCSC Genome

Browser using the Table Browser utility.

3’UTR analyses

Human (GRCh37/hg19) and mouse (NCBI37/mm9) 3’UTR sequences, genomic coordinates and accession numbers were downloaded from the RefSeq annotations track. "RefSeq Genes" was selected under the "Genes and Gene Prediction Tracks" group. To return only data for protein-coding mRNAs, the filter option was applied to the "refGene" table, setting the "name" field filter so that "name" does match "NM*", thus selecting RefSeq accession numbers beginning with the NM prefix. To extract the 3’UTR sequence data, "sequence" was selected as the output format. After selecting "Get Output", 3’UTR data were specifically extracted by choosing "genomic" in the window asking for sequence type. In the next screen, all boxes except 3’UTR were unchecked. To facilitate accurate conversion between local and global MRE coordinates in downstream analyses, the radio button was chosen to output one FASTA record per region. Parsing and reformatting of the sequence data were performed using the Galaxy web server tools.

To prepare the 3’UTRs for TargetScan input, Galaxy tools were used to convert FASTA to Tabular format, which was then manipulated in Excel. Genomic coordinates (chromosome, start, end and strand) and RefSeq IDs are provided in the FASTA header, and this information was combined to serve as the ID column in the required TargetScan format. The species IDs (human = 9606; mouse = 10090) were added before saving as a text file.
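The header-to-ID step can also be scripted. The sketch below is a hypothetical Python alternative to the Galaxy/Excel route: it parses FASTA records, pulls the accession, range and strand out of each header, and writes tab-separated rows of ID, species ID and sequence, the three-column layout of TargetScan's UTR input. The header pattern (e.g. `range=chr12:...-... strand=-`) and the `chrom:start:end:strand|accession` ID format are assumptions to be checked against an actual export. Coordinates are copied verbatim, so whether the `range=` start is 0- or 1-based must be confirmed before the coordinate conversions that follow.

```python
import re

# Assumed UCSC-style header, e.g.
# >hg19_refGene_NM_000014 range=chr12:9220260-9220419 5'pad=0 3'pad=0 strand=- repeatMasking=none
HEADER_RE = re.compile(
    r"(?P<acc>NM_\d+)\S*\s+range=(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)"
    r".*?strand=(?P<strand>[+-])"
)


def parse_fasta(text: str):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def targetscan_rows(fasta_text: str, species_id: int) -> list:
    """Build 'ID<TAB>species<TAB>sequence' rows; the ID format is hypothetical."""
    rows = []
    for header, seq in parse_fasta(fasta_text):
        m = HEADER_RE.search(header)
        if m is None:
            raise ValueError(f"unrecognized header: {header}")
        utr_id = f"{m['chrom']}:{m['start']}:{m['end']}:{m['strand']}|{m['acc']}"
        rows.append(f"{utr_id}\t{species_id}\t{seq.upper()}")
    return rows


if __name__ == "__main__":
    demo = (">hg19_refGene_NM_000014 range=chr12:9220260-9220419 "  # made-up coordinates
            "5'pad=0 3'pad=0 strand=- repeatMasking=none\nacgt\nACGT\n")
    print("\n".join(targetscan_rows(demo, 9606)))  # 9606 = human
```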


3’UTR TEs

The UCSC Genome Table Browser was used to extract RepeatMasker Track

annotations from the human and mouse assemblies. The "RepeatMasker" track was

selected from the "Variation and Repeats" group. "All fields from selected table" was

chosen as the output format and the data sent to the Galaxy server. Unless otherwise

indicated, simple repeats, low complexity regions and other non-TE repeats were filtered

out of the dataset.

In Galaxy, the RepeatMasker output was converted to "Interval" format, specifying the appropriate assembly and coordinate columns. To select 3’UTR TEs from the genome-wide list, 3’UTR and TE intervals were joined on genomic coordinates using the Galaxy "Join" tool, requiring at least 6 bp of overlap (the minimum seed match size).
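The overlap criterion can be stated precisely for UCSC-style 0-based, half-open intervals: two intervals on the same chromosome share min(end) - max(start) bases. The brute-force Python sketch below applies the 6 bp rule to pair 3’UTRs with TEs. It is an illustrative stand-in for the Galaxy join, uses hypothetical feature names and scales quadratically, so genome-wide data would call for a sorted or indexed interval method.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (UCSC/BED convention)
    end: int    # 0-based, exclusive
    name: str


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals (0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def join_min_overlap(utrs, tes, min_overlap: int = 6) -> list:
    """Pair every 3'UTR with every TE sharing at least `min_overlap` bases."""
    pairs = []
    for utr in utrs:
        for te in tes:
            shared = overlap_bp(utr, te)
            if shared >= min_overlap:
                pairs.append((utr.name, te.name, shared))
    return pairs


if __name__ == "__main__":
    utrs = [Interval("chr1", 100, 200, "NM_hypothetical")]
    tes = [Interval("chr1", 195, 300, "TE_a"),  # 5 bp overlap: excluded
           Interval("chr1", 190, 300, "TE_b"),  # 10 bp overlap: kept
           Interval("chr2", 100, 200, "TE_c")]  # other chromosome: excluded
    print(join_min_overlap(utrs, tes))
```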

miRNA target prediction and TE annotation

TargetScan MRE predictions

The TargetScan 5.1 Perl script was downloaded from the TargetScan Human website, and default parameters were used to predict seed binding sites in 3’UTRs (Grimson et al, 2007). At the time of writing, TargetScan 6.2 is the most recent release, and its code allows prediction of additional MRE site types. The default site-type settings in the targetscan_60.pl code (7mer-A1, 7mer-M8 and 8mer-1A) are identical to those in TargetScan 5.1, except that 8mer-1A is labeled as 8mer. Unique miRNA seed family IDs and seed sequences were taken from the miR_Family_Info.txt file provided on the TargetScan website.
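The three default site types can be written down directly from the seed. The hypothetical Python sketch below is not TargetScan's code and ignores conservation and context scoring; it simply searches a 3’UTR for an 8mer (match to miRNA nt 2-8 plus an A opposite nt 1), a 7mer-M8 (match to nt 2-8) or a 7mer-A1 (match to nt 2-7 plus an A opposite nt 1), reporting 1-based, inclusive UTR positions in the same convention as TargetScan output. Site labels follow the naming used in this section.

```python
_DNA_COMP = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def revcomp_dna(seq: str) -> str:
    """Reverse complement in the DNA alphabet (U in the input pairs like T)."""
    return "".join(_DNA_COMP[b] for b in reversed(seq.upper()))


def find_sites(mirna: str, utr: str) -> list:
    """Return (start, end, site_type) tuples, 1-based inclusive, for seed matches in `utr`."""
    utr = utr.upper().replace("U", "T")
    s8 = revcomp_dna(mirna[1:8]) + "A"  # pairs with nt 2-8, plus A opposite nt 1
    s7m8 = s8[:7]                       # pairs with nt 2-8
    s7a1 = s8[1:]                       # pairs with nt 2-7, plus A opposite nt 1
    sites = []
    for i in range(len(utr)):
        if utr[i:i + 8] == s8:
            sites.append((i + 1, i + 8, "8mer"))
        elif utr[i:i + 7] == s7m8:
            sites.append((i + 1, i + 7, "7mer-M8"))
        # A 7mer-A1 preceded by the nt-8 match is part of an 8mer already reported.
        if utr[i:i + 7] == s7a1 and (i == 0 or utr[i - 1] != s8[0]):
            sites.append((i + 1, i + 7, "7mer-A1"))
    return sites


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"
    for utr in ("GGCTACCTCAGG", "GGCTACCTCGG", "GGTACCTCAGG"):
        print(utr, find_sites(let7a, utr))
```

The three toy UTRs contain, respectively, one 8mer, one 7mer-M8 and one 7mer-A1 site for let-7a, each starting at UTR position 3.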

Local position to global coordinate conversion

The conversion of local positions to global genomic coordinates was a common task, and several methods were employed depending on the number of records and the operating system (Windows or Linux) being used. The method outlined below describes the use of Excel for working with the text files; however, Excel 2007, 2010 and 2013 are limited to 1,048,576 rows per worksheet. All other methods employed carried out the same mathematical operations, as shown below.

The TargetScan output file lists all miRNA and target gene pairs, along with the MRE site type and position information. Target site positions are relative to the beginning of the 3’UTR, with the first base as position 1. To convert to genomic positions, the 3’UTR genomic coordinates were extracted from the target name column (see "3’UTR analyses" under Data retrieval and parsing, above) and combined with the local position information. Importantly, UCSC coordinates are 0-based, meaning that the first base is 0. The adjustments are shown in the formulae below. These calculations were all done in Microsoft Excel.

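For a 3’UTR with 0-based genomic coordinates $[U_{start}, U_{end})$ and an MRE at 1-based local positions $p_{start}$ through $p_{end}$, the 0-based MRE genomic coordinates $[G_{start}, G_{end})$ are:

$$\text{(+) strand:}\quad G_{start} = U_{start} + p_{start} - 1, \qquad G_{end} = U_{start} + p_{end}$$

$$\text{(-) strand:}\quad G_{start} = U_{end} - p_{end}, \qquad G_{end} = U_{end} - p_{start} + 1$$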
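The same arithmetic can be expressed as a small function. This Python sketch assumes, as described above, 0-based, half-open UCSC coordinates for the 3’UTR and 1-based, inclusive TargetScan positions for the site, and returns 0-based, half-open genomic coordinates; the function name is illustrative. On the minus strand, local position 1 is the last genomic base of the 3’UTR, which is why the roles of start and end swap.

```python
def local_to_genomic(utr_start: int, utr_end: int, strand: str,
                     site_start: int, site_end: int) -> tuple:
    """Convert 1-based, inclusive positions within a 3'UTR into 0-based,
    half-open genomic coordinates, given the UTR's 0-based, half-open span."""
    if not 1 <= site_start <= site_end <= utr_end - utr_start:
        raise ValueError("site positions fall outside the 3'UTR")
    if strand == "+":
        return utr_start + site_start - 1, utr_start + site_end
    if strand == "-":
        # Local position 1 is the last genomic base of a minus-strand UTR.
        return utr_end - site_end, utr_end - site_start + 1
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


if __name__ == "__main__":
    print(local_to_genomic(1000, 1100, "+", 1, 7))  # (1000, 1007)
    print(local_to_genomic(1000, 1100, "-", 1, 7))  # (1093, 1100)
```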

Intersection of MRE and TE genomic coordinates

MRE genomic coordinates and TargetScan output information were uploaded to

the Galaxy public server to use the available coordinate-based functions (Blankenberg et

al, 2010). The uploaded file was converted to "interval" format, using the calculated

MRE genomic coordinates for the "START" and "END" fields. MRE and RepeatMasker

tables were joined using the "Join the intervals of two datasets side-by-side" function and

setting the "min overlap (bp)" field to 1 and the "return" field to "all records of first field"

(assuming the MRE table is input as the first dataset). MREs not overlapping a TE are

returned with the TE information filled with a null value. Because 3'UTR information

was already present in the MRE table, the original RepeatMasker output from the UCSC

genome browser was used for this intersection.


Galaxy tools were used to join overlapping target site and RepeatMasker TE coordinates, and the frequency of miRNA target sites in each TE family was subsequently analyzed in Microsoft Excel. Because target prediction already accounted for transcript orientation, overlaps were allowed on either strand. Each TE was labeled as transcribed in the "+" or "−" orientation: if the TE and the 3’UTR/MRE are on the same strand (+/+ or −/−), the TE is transcribed in the "+" orientation; if they are on opposite strands (+/− or −/+), it is transcribed in the "−" orientation.
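For readers working outside Galaxy, the left join and orientation labeling described above can be sketched as follows. The example assumes 0-based, half-open coordinates, ignores strand when testing overlap (as described above), keeps every MRE (filling TE fields with None when nothing overlaps, mirroring the null values returned by the join) and labels a TE "+" when it lies on the same strand as the MRE. Feature names are hypothetical.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"
    name: str


def te_orientation(mre_strand: str, te_strand: str) -> str:
    """'+' if the TE is transcribed in the same orientation as the 3'UTR/MRE, else '-'."""
    return "+" if mre_strand == te_strand else "-"


def annotate_mres(mres, tes, min_overlap: int = 1) -> list:
    """Left join: one row per MRE-TE overlap, or (MRE, None, None) when no TE overlaps."""
    rows = []
    for mre in mres:
        hits = [te for te in tes
                if te.chrom == mre.chrom
                and min(mre.end, te.end) - max(mre.start, te.start) >= min_overlap]
        if not hits:
            rows.append((mre.name, None, None))
        for te in hits:
            rows.append((mre.name, te.name, te_orientation(mre.strand, te.strand)))
    return rows


if __name__ == "__main__":
    mres = [Feature("chr1", 100, 107, "+", "site_1"),
            Feature("chr1", 1000, 1007, "+", "site_2")]
    tes = [Feature("chr1", 50, 400, "-", "TE_x")]
    print(annotate_mres(mres, tes))
```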

Alu-MRE positional enrichment relative to Alu consensus

In the formulae presented in the previous section, MRE positions within the host 3'UTR were converted to MRE genomic coordinates. Given that the MRE and 3'UTR genomic coordinates are both known, one can easily revert to the MRE position relative to the host 3'UTR. The same concept was used to determine the MRE position within the host 3'UTR Alu element, simply substituting the Alu genomic coordinates in place of the 3'UTR coordinates. MRE positions within the Alu were calculated using rearranged versions of the equations presented above, with slight modifications: global coordinates were converted back to local coordinates, except that the local positions are calculated relative to the 5' end of the Alu rather than the 3'UTR.

First, the MRE position relative to the 3'UTR Alu was calculated using the

formulae that follow. Note that no base adjustment is needed here because both sets of

coordinates are 0-based.

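For an Alu element with 0-based genomic coordinates $[A_{start}, A_{end})$ and an MRE with 0-based genomic coordinates $[G_{start}, G_{end})$, the MRE occupies 1-based positions $q_{start}$ through $q_{end}$ of the Alu:

$$\text{(+) Alu:}\quad q_{start} = G_{start} - A_{start} + 1, \qquad q_{end} = G_{end} - A_{start}$$

$$\text{(-) Alu:}\quad q_{start} = A_{end} - G_{end} + 1, \qquad q_{end} = A_{end} - G_{start}$$

The +1 terms only express the result as 1-based positions; no conversion between coordinate systems is involved.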
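The inverse conversion can be sketched the same way. This hypothetical Python function assumes 0-based, half-open genomic coordinates for both the MRE and the Alu, and returns 1-based positions read 5’ to 3’ along the Alu's own strand. Two caveats: it rejects MREs that only partially overlap the Alu (the join described earlier admits 1 bp overlaps, so such cases need an explicit policy), and it gives positions within the genomic Alu copy, not alignment-corrected positions in the Alu consensus, which would require RepeatMasker's repeat-coordinate fields (repStart/repEnd) and is not attempted here.

```python
def genomic_to_local(feat_start: int, feat_end: int, strand: str,
                     site_start: int, site_end: int) -> tuple:
    """Convert a 0-based, half-open genomic site into 1-based, inclusive positions
    within a feature (e.g. an Alu) read 5'->3' on the feature's strand."""
    if not feat_start <= site_start < site_end <= feat_end:
        raise ValueError("site must lie entirely within the feature")
    if strand == "+":
        return site_start - feat_start + 1, site_end - feat_start
    if strand == "-":
        return feat_end - site_end + 1, feat_end - site_start
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


if __name__ == "__main__":
    print(genomic_to_local(5000, 5300, "+", 5010, 5017))  # (11, 17)
    print(genomic_to_local(5000, 5300, "-", 5293, 5300))  # (1, 7)
```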


Generating unique MRE coordinates

When unique miRNA seed families and sequences are used in the initial target prediction, MRE coordinate redundancy (i.e., identical chromosome, start, end and strand) results from overlapping mRNA isoforms with distinct accession numbers. For TE-MRE predictions, a second source of redundancy is partially overlapping RepeatMasker annotations. Here, redundant MRE sites were collapsed, except for those resulting from TE overlaps. Two non-programmatic methods were used, as described below.

Unique TE-MREs: Microsoft Excel

TE-MRE coordinates were filtered from the full MRE table using the Galaxy text filtering function. The table was then exported and opened in Excel. After the whole table is selected, "Remove Duplicates" is applied, leaving checked only the fields that define a replicate according to the definition above: the MRE genomic coordinates (chromosome, start, end and strand) and the minimal necessary TE information (start, end, strand and TE name).
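For readers who prefer a scripted route, the Excel step amounts to keeping the first row for each combination of the checked fields. The Python below is a minimal sketch that assumes the table has been read into dictionaries; the column names in `KEY_FIELDS` are hypothetical placeholders for the checked columns, not the headers of the original table.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical column names standing in for the fields left checked in Excel.
KEY_FIELDS = ("chrom", "start", "end", "strand",
              "te_start", "te_end", "te_strand", "te_name")


def remove_duplicates(rows: Iterable[Dict[str, str]],
                      key_fields: Sequence[str] = KEY_FIELDS) -> List[Dict[str, str]]:
    """Keep the first row for each distinct combination of key fields,
    analogous to Excel's "Remove Duplicates" with only those columns checked."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Rows that differ only in transcript accession collapse to one, while rows arising from overlapping TE annotations remain separate because the TE fields are part of the key.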

Unique TE-MREs: Galaxy

The Galaxy server has several coordinate-based filtering functions that are far simpler than the steps described here; however, at present, they output only the unique coordinates. Thus, to remove redundant coordinates in Galaxy without losing information, the fields that define redundancy, together with any other TE or MRE information (but no 3'UTR information), are concatenated into one new field, separated by a specified delimiter. The delimiter symbol is first added to the table as the last column using the "Add column" function. The symbol or character chosen must not be present in any of the fields. If the unique coordinates are to be analyzed further in Galaxy, the delimiter should be one of those available in the "Convert delimiters to tab" function.


After the symbol is added, the "Merge columns together" function is used to combine all of the replicate fields into one column, with each field separated by the delimiter symbol. The concatenated information is added as a new output column and is then used as the "group by" parameter of the "Group" function. With no operations added, the output of this function contains a single column holding each distinct concatenated value once. Finally, the delimiter symbol is converted to tabs using the "Convert delimiters to tab" function, restoring the individual fields.
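The same concatenate, group and split idea can be sketched outside Galaxy. The Python below is a hypothetical emulation, not Galaxy code; the `|` default delimiter is an assumption and, as in the Galaxy workflow, must not occur in any field.

```python
from typing import Dict, Iterable, List, Sequence


def unique_records(rows: Iterable[Dict[str, str]],
                   fields: Sequence[str],
                   delim: str = "|") -> List[str]:
    """Emulate the Galaxy workflow described above:
    1. "Add column" + "Merge columns together": join the chosen fields with a
       delimiter that occurs in none of them;
    2. "Group" on the merged column with no operations: keep each distinct
       merged value once (sorted here for reproducibility);
    3. "Convert delimiters to tab": turn the delimiter back into tabs."""
    merged = []
    for row in rows:
        values = [str(row[f]) for f in fields]
        if any(delim in v for v in values):
            raise ValueError(f"delimiter {delim!r} occurs in a field value")
        merged.append(delim.join(values))
    return [value.replace(delim, "\t") for value in sorted(set(merged))]
```

Each returned string is one tab-separated record, so the output can be written directly as a tabular file.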

TE annotations of miRNA genes

Genomic coordinates and sequence data for human miRNA hairpins were obtained from the miRBase FTP repository (Version 15). These sequences represent the pre-miR plus some flanking sequence 5' and 3' of the Drosha cleavage site, but are not intended to represent the full pri-miRNA. They were nonetheless sufficient to determine TE overlap with the mature miRNA. TE and miRNA coordinates were intersected using the "Join on Genomic Coordinates" function in Galaxy. The Excel string-matching function "SEARCH" was used to locate the mature miRNA within the pri/pre-miRNA sequences, providing local positional coordinates. The intersected miRNA and TE genomic coordinates allowed calculation of the start and end positions of the TE relative to the pre-miRNA, numbering the first (5') nucleotide of the pre-miR as 1. Finally, the combined local positions of the mature miRNA and TEs allowed calculation of the percent overlap of the two features. miRNAs for which the TE completely overlapped the seed sequence (positions 2-8) were considered "TE-derived" for the purposes of this study.
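The local-coordinate logic of this step can be sketched in Python. Like Excel's SEARCH, the sketch finds the first case-insensitive occurrence of the mature sequence and reports a 1-based start; it assumes the TE start and end have already been expressed relative to the pre-miRNA 5' end (position 1), as described above. The function names are hypothetical.

```python
from typing import Tuple


def search_1based(haystack: str, needle: str) -> int:
    """1-based position of the first case-insensitive match
    (cf. Excel SEARCH, without its wildcard support)."""
    idx = haystack.upper().find(needle.upper())
    if idx < 0:
        raise ValueError("mature sequence not found in hairpin")
    return idx + 1


def overlap_length(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Length of the overlap between two 1-based, closed intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def te_overlap(hairpin: str, mature: str,
               te_start: int, te_end: int) -> Tuple[float, bool]:
    """Return (percent of the mature miRNA covered by the TE, TE-derived flag).

    The miRNA is flagged TE-derived when the TE covers the whole seed,
    i.e. mature positions 2-8."""
    m_start = search_1based(hairpin, mature)
    m_end = m_start + len(mature) - 1
    percent = 100.0 * overlap_length(m_start, m_end, te_start, te_end) / len(mature)
    seed_start, seed_end = m_start + 1, m_start + 7  # mature nt 2-8 in hairpin coordinates
    return percent, te_start <= seed_start and seed_end <= te_end
```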

Detailed positional analysis of Alu-MREs

Genomic coordinates for Alu-derived MREs were extracted, along with

RepeatMasker track annotations for the associated TE. These annotations provide a

summary of a sequence alignment between the Alu and the corresponding Alu family


consensus. The MRE position relative to the genomic Alu start position was first

calculated and then adjusted according to the alignment start/stop positions.
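One way to picture this adjustment is sketched below. The first function is the simplest reading: add the consensus position at which the alignment begins to the MRE offset, ignoring insertions and deletions. The second is an alternative, gap-aware mapping that assumes the aligned genomic and consensus strings (with '-' for gaps) are available; that is an assumption, not a description of the procedure used here. Both functions are hypothetical and assume offsets are already in Alu (5' to 3') orientation.

```python
from typing import Dict, Optional


def offset_to_consensus(local_pos: int, cons_aln_start: int) -> int:
    """Approximate consensus position: shift the MRE offset (0-based, from the
    start of the aligned Alu segment) by the 0-based consensus position where
    the alignment begins. Indels between copy and consensus are ignored."""
    return cons_aln_start + local_pos


def gapped_to_consensus(genomic_aln: str, consensus_aln: str,
                        cons_aln_start: int = 0) -> Dict[int, Optional[int]]:
    """Map each 0-based offset in the genomic Alu copy to a consensus offset
    by walking a pairwise alignment column by column. Bases inserted in the
    genomic copy (gap in the consensus) map to None."""
    if len(genomic_aln) != len(consensus_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, Optional[int]] = {}
    g, c = 0, cons_aln_start
    for g_base, c_base in zip(genomic_aln, consensus_aln):
        if g_base != "-":
            mapping[g] = c if c_base != "-" else None
            g += 1
        if c_base != "-":
            c += 1
    return mapping
```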

Microarray analysis

With the exception of the miR-24 experiment, preprocessed microarray fold-change values were obtained from the Supplementary Data 4 table in Garcia et al. (Garcia et al, 2011). The original data are available from NCBI GEO. Data series GSE8501 contains the experimental data for miR-122 (GSM210901), miR-128 (GSM210903) and miR-132 (GSM210904). GSE2075 contains data for miR-1 (GSM37599).

Experimental data for miR-24 are available in data series GSE17828 (Lal et al, 2009). The GEO2R tool available at the NCBI website was used to analyze the miR-24 data. Log2 fold-changes were obtained by comparing "miR-24 HepG2" samples with "Control HepG2" samples. For genes with multiple probes, the median log2 fold-change was used.
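The probe-to-gene summarization can be sketched as follows, assuming a list of (gene symbol, log2 fold-change) pairs such as might be parsed from a GEO2R results table; the parsing step and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def median_by_gene(probes: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Collapse probe-level log2 fold-changes to one value per gene (median).

    `probes` yields (gene_symbol, log2_fold_change) pairs; probes without a
    gene symbol are skipped."""
    by_gene: Dict[str, List[float]] = defaultdict(list)
    for gene, lfc in probes:
        if gene:
            by_gene[gene].append(lfc)
    return {gene: median(values) for gene, values in by_gene.items()}
```

With an even number of probes, `statistics.median` averages the two middle values.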

Cloning 3’UTR reporters

All 3’UTR reporter constructs were based on the psiCHECK-2 (Promega) dual-luciferase system, with the 3’UTR of interest cloned into the XhoI/NotI cloning site 3’ of the Renilla luciferase stop codon and 5’ of the SV40 poly-A signal. 3’UTR sequences were amplified from genomic DNA isolated from HEK293 (human) or BEND3 (mouse) cell lines using Qiagen DNA extraction kits. PCR primers with appropriate restriction sites were designed to flank the longest RefSeq-annotated isoform containing the TE/MRE of interest. Phusion® High-Fidelity DNA Polymerase (New England Biolabs) was used for PCR amplification. Standard cloning protocols were then followed to restriction-digest and ligate the vector and inserts. Proper insert sequence and orientation were confirmed by both analytical restriction digests and direct Sanger sequencing.


Artificial miRNA target sites were all based on two tandem copies of the reverse-complemented mature miRNA sequence of interest, separated by a short linker sequence containing an AgeI restriction site to facilitate downstream screening. The sequence was modified to introduce mismatches near the center of each site and at any locations where other miRNAs had potential seed pairing. The resulting sequences were ordered as pairs of synthetic DNA oligonucleotides (IDT) that, when annealed, formed the artificial sites with 5’ XhoI and 3’ NotI half-sites. T4 polynucleotide kinase (T4-PNK) was used to phosphorylate the 5’ ends of the annealed pairs, which then served as the insert for the downstream cloning protocol described above.
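As a sketch of the oligo design, assuming perfect complementarity before the central mismatches are introduced, the Python below builds both strands for a given mature miRNA. The let-7a sequence (hsa-let-7a-5p) is used only as an illustration; the default linker and the function names are hypothetical.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTU", "TGCAA")
AGEI_SITE = "ACCGGT"


def revcomp_dna(seq: str) -> str:
    """Reverse complement as DNA; RNA input is accepted (U pairs with A)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tandem_site_oligos(mature_rna: str, linker: str = "AACCGGTA") -> Tuple[str, str]:
    """Return (top, bottom) oligos encoding two perfect target sites for
    `mature_rna`, separated by an AgeI-containing linker, with ends that
    anneal into an XhoI (5'-TCGA) and a NotI (5'-GGCC) overhang.

    Central mismatches and removal of off-target seed matches, described
    in the text, are not modeled here."""
    if AGEI_SITE not in linker.upper():
        raise ValueError("linker must contain an AgeI site (ACCGGT)")
    site = revcomp_dna(mature_rna)
    insert = site + linker.upper() + site
    # TCGA overhang plus G: re-forms CTCGAG (XhoI) after ligation.
    # Trailing GC: re-forms GCGGCCGC (NotI) after ligation.
    top = "TCGAG" + insert + "GC"
    # Bottom strand: 5'-GGCC overhang, then the complement of top minus its overhang.
    bottom = "GGCC" + revcomp_dna(top[4:])
    return top, bottom


top, bottom = tandem_site_oligos("UGAGGUAGUAGGUUGUAUAGUU")  # hsa-let-7a-5p
print(top)
print(bottom)
```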

Cloning endogenous microRNAs

Endogenous miRNAs were PCR-amplified from human genomic DNA (HEK293 cells) using primers designed to flank the 5’ and 3’ ends of the annotated hairpin by at least 200 bp on each end. PCR products were subcloned into pCR-Blunt II-TOPO plasmids using standard protocols. After sequence verification, the TOPO plasmids served as templates for a second PCR using primers nested within the original insert and containing XhoI and SalI restriction sites, producing a product containing the miRNA hairpin ±200 bp. Standard cloning protocols were then followed to clone the insert into the CMV promoter-driven expression plasmid (pFB-AAV-miRNA-pA).

Endogenous miR-566 and the syntenic loci of other primates and mice were cloned in the same manner for comparison, but only the human sequence was subsequently cloned into the CMV expression plasmid. Primers were designed against flanking sequences conserved in primates, allowing the same primer set to be used in all reactions.

Cell culture and transfections

HEK293 and HeLa cells were cultured in DMEM (10% FBS) without antibiotics. Approximately 24 hours prior to transfection, cells were seeded onto 24-well plates. Transfections were performed in triplicate, using 5 ng luciferase reporter per reaction. Synthetic miRNA mimics (Pre-miRs™) or Anti-miRs™ (Ambion®) were transfected using Lipofectamine 2000 in Opti-MEM, at final concentrations ranging between 0 and 50 nM. All reactions were balanced with a negative control (NC#1) such that the final concentration of the combined oligonucleotides equaled that of the highest dose of the test miRNA. Medium was completely removed from cells before the transfection complexes were added; the complexes were combined with an equal volume of DMEM (10% FBS) just before plating. Cells co-transfected with miRNA mimics and luciferase reporters were harvested 24 hours later. For all other conditions, cells were harvested 36-48 hours post-transfection.
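The balancing rule amounts to simple arithmetic: each test dose is topped up with NC#1 to the highest test dose. A minimal Python sketch; the dose series in the usage lines is illustrative (within the 0-50 nM range stated above), not the exact series used in every experiment, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def balanced_doses(test_doses_nM: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair each test-miRNA dose with the negative-control (NC#1) dose that
    brings the total oligonucleotide concentration up to the highest test dose."""
    top = max(test_doses_nM)
    return [(dose, top - dose) for dose in test_doses_nM]


for test, control in balanced_doses([0, 1, 10, 50]):
    print(f"{test} nM test miRNA + {control} nM NC#1 = {test + control} nM total")
```

Holding the total oligonucleotide dose constant keeps non-specific effects of transfected RNA comparable across conditions.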

Luciferase assays

Luciferase assays were carried out using the Dual-Luciferase Assay Kit (Promega) according to the standard protocol. Briefly, transfected cells on a 24-well plate were lysed by removing the medium and adding 100 µl of 1x Passive Lysis Buffer (PLB) to each well, followed by gentle rocking for 15 minutes. From each well, 10 µl of lysate was added to the bottom of an opaque, flat-bottom 96-well plate. Luciferase substrates for firefly (1x Luciferase Assay Reagent II) and Renilla (1x Stop & Glo) were prepared as indicated in the manual. A GloMax 96-well plate injector/reader (Promega) was used to inject 50 µl of each substrate sequentially, reading for 2 seconds after each injection.
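Because the 3’UTR is cloned downstream of Renilla luciferase in psiCHECK-2, with firefly luciferase expressed from the same plasmid as an internal control, a common way to analyze these readings is the Renilla/firefly ratio normalized to a reference condition. The section does not state the exact normalization, so the Python below is a sketch under that assumption; the condition labels and the function name are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def relative_activity(wells: Dict[str, List[Tuple[float, float]]],
                      reference: str) -> Dict[str, float]:
    """Mean Renilla/firefly ratio per condition, divided by the mean ratio of
    the reference condition (e.g. 0 nM miRNA), so the reference equals 1.0.

    `wells` maps a condition label to (renilla, firefly) readings, one pair
    per replicate well."""
    ratios = {cond: [ren / fire for ren, fire in reps]
              for cond, reps in wells.items()}
    baseline = mean(ratios[reference])
    return {cond: mean(vals) / baseline for cond, vals in ratios.items()}
```

Dividing by firefly first corrects each well for transfection efficiency and cell number before conditions are compared.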

RT-qPCR

For both mRNA and miRNA RT-qPCR, RNA was isolated from cells using TRIzol (Invitrogen). The protocol for fixed cell culture presented in the TRIzol manual was followed, except that GlycoBlue (Applied Biosystems) was added as a carrier prior to isopropanol precipitation. RNA pellets were air-dried and resuspended in RNase-free water. RNA concentrations were measured using a NanoDrop spectrophotometer (Thermo Scientific).


microRNA

TaqMan® gene expression assays (Applied Biosystems) were used to analyze miRNA expression, using miRNA-specific RT and qPCR assays. mRNAs were reverse-transcribed using the High Capacity cDNA Reverse Transcription kit (Applied Biosystems) with random primers. SYBR Green primers for target mRNAs were designed using PrimerQuest (IDT) and used in the qPCR reactions.

Results

MiRNAs have predicted binding sites in 3’UTR-resident TE sequences

Post-transcriptional regulation by miRNAs is classically mediated through base pairing between a miRNA and partially complementary miRNA recognition elements (MREs) located within mRNA 3’UTRs. In the human genome, more than 4,000 protein-coding transcripts co-transcribe TE sequences as part of their 3’UTR. To predict the potential impact of TE sequences on the target repertoire of individual miRNA families, I predicted and analyzed miRNA target site frequencies in 3’UTR-embedded TE sequences.

Approximately 60% of all TE-derived sites were found in Alu (~35%), LINE1 (~12%) and MIR (~11%) elements, consistent with their high genomic prevalence (Figure 2-1A). On average, 5-10% of a miRNA’s total predicted target repertoire lies within TE-derived MREs, although this fraction reaches 50% or more for some miRNA families (Figure 2-1B). The distribution of target sites across TE families followed a similar pattern for most miRNAs, with a strong bias toward a particular TE group. For more than 85% of miRNAs, L1 and Alu retrotransposons formed the predominant class, again trending with their high genome-wide frequencies (Figure 2-1B).

While seed-based MRE prediction demonstrates widespread potential for miRNA regulation through TE target sites, studies of miRNA targeting efficacy suggest that, absent other information, only a small fraction of predicted targets are likely to be functional (Grimson et al, 2007). Although knowledge of local sequence and structural features has improved the prediction of silencing activity, target site conservation remains among the most predictive factors (Friedman et al, 2009).

Let-7 directly regulates genes through conserved, MIR-element-derived target sites

The conservation parameter used by many target prediction programs improves their predictive power and highlights interactions that may have served an adaptive purpose during evolution. To determine whether TE-MREs display strong sequence conservation, target sites predicted in MIR retrotransposons were evaluated. For this, I performed gene functional enrichment analysis using the ToppFun algorithm on the ~1200 human genes harboring at least one 3’UTR-embedded MIR element. ToppFun incorporates several miRNA target prediction programs in addition to other common classification schemes (Chen et al, 2009). The ToppFun output contained highly significant enrichment for let-7 target genes based on the TargetScan, PITA (TOP classification), PicTar and miRSVR (conserved_highEffect-0.5 class) algorithms (Figure 2-2, bottom). In agreement with this, I found that ~40% (192) of let-7 TE-MRE sites were predicted within MIRs (Figure 2-2, top-left). I also found several examples where a high degree of conservation was restricted primarily to the miRNA binding site. For instance, in Myosin 1F (MYO1F) and E2F transcription factor 6 (E2F6), not only is local conservation of the let-7 binding site evident, but it is also the only let-7 site predicted in that 3’UTR (Figure 2-3).

To test whether MIR-embedded target sites impart let-7-mediated regulation, I cloned the 3’UTRs of MYO1F, E2F6, MYC-binding protein (MYCBP), and major facilitator superfamily domain-containing protein 4 (MFSD4) into luciferase reporters. When these were co-transfected with a let-7a mimic, luciferase activity decreased in a dose-dependent manner (Figure 2-4A). No significant response was observed with the psiCHECK vector control. Conversely, inhibition of endogenous let-7a in HeLa cells using an Anti-miR™ increased luciferase reporter activity to varying degrees depending on the target (Figure 2-4B).

miRNAs with high Alu-MRE frequency target specific regions in the Alu

Up to 30% of all predicted targets for well-conserved miRNAs were contained within Alus. In the cases of miR-24 and miR-122, more than 80% of potential TE-derived target sites (1948 and 1402 Alu targets, respectively) were predicted within Alus (Figure 2-5A, B). I reasoned that these unusually high target site frequencies (compared with other TEs) were due to miRNAs targeting regions with little sequence divergence from the parent Alu. If so, MREs for these miRNAs would map to specific positions within the Alu consensus sequence. To test this, I used the genomic coordinates of each MRE and its host Alu to calculate the relative position of the MRE within the Alu. These local positions were then adjusted to be relative to the Alu consensus sequence, using RepeatMasker annotations of the alignment between each Alu and its consensus. The majority of sites for miR-24 and miR-122, as well as for other miRNAs, fell within very narrow regions of the Alu consensus (Figure 2-6). This suggests that the local sequence and structural context of the MREs may be similar among these target mRNAs.

miR-24 directly regulates transcripts through Alu-derived target sites

I next tested whether the Alu-derived sites create functional platforms for miRNA regulation. For this, candidate Alu-derived targets for miR-24 were selected that had higher average conservation scores in the MRE than in the flanking Alu sequence (e.g., MAP3K9, Figure 2-7). The 3’UTRs of five candidate genes were cloned into a luciferase reporter for functional validation: Platelet F11 Receptor (F11R); Carbohydrate (N-acetylglucosamine 6-O) Sulfotransferase 6 (CHST6); Protocadherin beta 11 (PCDHB11); Eukaryotic Translation Initiation Factor 2, Subunit 3 gamma (EIF2S3); and Mitogen-Activated Protein 3-Kinase 9 (MAP3K9). One let-7a MIR-derived target gene, MFSD4, also had an Alu-derived miR-24 site and was tested as well. Dose-dependent luciferase reduction in response to a miR-24 mimic was observed for EIF2S3 and MAP3K9, as well as for the artificial miR-24 target control, miR-24_2xT (Figure 2-7, bottom). At the doses used (1 and 10 nM), no significant knockdown was observed for the other constructs tested or for the psiCHECK no-target control (data not shown and Figure 2-7). Above that concentration, miR-24 caused non-specific changes in both firefly and Renilla luciferase in the negative control reporter, preventing accurate interpretation of the 3’UTR reporter data at these higher doses (data not shown). Blocking endogenous miR-24 with antisense Anti-miR™ inhibitors resulted in a dose-dependent increase in luciferase activity only in the artificial target positive control, consistent with the low validation rate seen in the overexpression experiments (not shown). If the chosen candidate genes are a representative sample of miR-24 Alu-derived targets, then only a small fraction of Alu-derived sites confer miR-24 responsiveness.

However, because miR-24 predominantly targets a specific region within the Alu sequence, it may not be representative of the general capacity of Alu sequences to support miRNA-mediated regulation. To test this on a global scale, and to determine whether responses vary among miRNAs, I analyzed publicly available microarray data that measured mRNA expression changes in response to overexpression of various miRNAs. Genes were annotated according to whether or not they contained i) a 3’UTR Alu, ii) an Alu-derived target site, or iii) a canonical (non-TE-derived) target site for the miRNA in question. Comparison of the cumulative distribution functions of all groups revealed that genes with Alu-derived sites were significantly repressed relative to genes lacking 3’UTR-resident Alus and target sites, but not to the degree seen for genes with canonical sites. For example, compare the cumulative fraction plots for miR-122 and miR-24 (Figure 2-8 and Figure 2-9). Interestingly, despite their generally weaker knockdown, Alu-derived targets represented 20-30% of the down-regulated target list (Figure 2-9). Also, the two candidates (MAP3K9 and EIF2S3) that responded to miR-24 overexpression in the luciferase reporter assays were also repressed according to the microarray data, supporting the reporter results (Figure 2-7).
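The cumulative-fraction comparison can be sketched as follows: build an empirical CDF of log2 fold-changes for each gene group and take the largest vertical distance between two curves, which is the two-sample Kolmogorov-Smirnov statistic. The section does not name the statistical test used, so treating this as a KS comparison is an assumption; `scipy.stats.ks_2samp` computes the same statistic together with a p-value. Function names are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def ecdf(values: Sequence[float]) -> Callable[[float], float]:
    """Empirical CDF: fraction of values <= x."""
    xs = sorted(values)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest vertical gap between the empirical CDFs of a and b.
    For step functions the supremum is reached at one of the pooled data points."""
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))
```

In a plot of these curves, a leftward shift for genes with Alu-derived sites relative to genes lacking 3’UTR Alus indicates repression, and a larger statistic for the canonical-site group would reflect its stronger shift.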

Proliferation of Alu and B1 SINEs resulted in the convergent acquisition of miRNA targets in their respective primate and murine lineages

To test the impact of recently evolved, lineage-specific TEs, as well as to improve the prediction of functionally relevant sites, I repeated the target prediction analysis using mouse 3’UTR-resident TE sequences. I also searched for the convergent acquisition of TE-derived target sites, to determine whether murine and primate orthologs had independently gained regulatory sites for the same miRNA (Figure 2-10). For this, I gathered coordinates for mouse 3’UTR-resident TE sequences and used the "lift-over" utility on the Galaxy web server to convert them to the corresponding human coordinates. Mouse 3’UTR sequences overlapping TEs with no mappable human counterpart were then selected, as were human 3’UTR sequences overlapping TEs with no mappable mouse counterpart. Target sites were predicted using miRNAs present in both species. I focused on 3’UTRs with single sites that demonstrated sequence conservation in all species where the TE insertion was present. I functionally tested human and mouse Solute Carrier Family 12, member 8 (SLC12A8), Sideroflexin 2 (SFXN2), UBX domain-containing protein 2B (UBXN2B) and CDGSH Iron Sulfur Domain 2 (CISD2) using 3’UTR reporters. The 3’UTR of chimpanzee SFXN2 (ptrSFXN2) was also cloned, because its miR-24 seed match carries a single-base mutation. Both mouse and human SFXN2 and SLC12A8 showed significant repression when co-expressed with 15 or 30 nM of miR-24 mimic (Figure 2-10). No significant response was seen with UBXN2B, CISD2 or ptrSFXN2 (Figure 2-10).

Potentially-active Alu loci contain miRNA binding motifs

Alu and LINE1 elements are two of the most abundant TEs in the human genome, as well as the two most frequent sources of predicted TE-derived target sites. Because both TE families still show evidence of active retrotransposition in humans, they remain potential sources of novel miRNA binding sites. A recent study assessed the mobilization activity of 89 full-length Alu sequences and found 124 key positions that were 100% conserved in active elements (Bennett et al, 2008). I predicted miRNA binding sites in the ~12,000 Alus in the human genome that retained these 124 features, hypothesizing that these would be the most likely sources of new Alu-derived sites. As expected, many of the miRNAs with a high frequency of Alu-derived 3’UTR sites, including miR-24 and miR-122, also had a high frequency of sites in the potentially active Alu sequences (Table 2-1). Surprisingly, however, MREs for several miRNAs were present in well over 90% of the potentially active sequences. For example, over 98% of the potentially active Alus contain an MRE for miR-150. This suggests that novel Alu insertions into 3’UTRs may be especially likely to carry MREs for a subset of miRNA families.
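Counting MRE-containing Alus reduces to scanning each sequence for a seed match. The Python below is a simplified sketch that checks only the given strand for 7mer-m8 and 7mer-A1 matches; it is not the TargetScan-based pipeline used in this chapter, and the function names are hypothetical.

```python
from typing import Dict, Iterable

COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp_dna(seq: str) -> str:
    """Reverse complement as DNA; U in RNA input is complemented to A."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def seed_sites(mirna: str) -> Dict[str, str]:
    """Target-strand (DNA) sequences of two 7-nt seed-match types:
    7mer-m8 pairs with miRNA nt 2-8; 7mer-A1 pairs with nt 2-7 and is
    followed by an A. An 8mer site contains both."""
    return {
        "7mer-m8": revcomp_dna(mirna[1:8]),
        "7mer-A1": revcomp_dna(mirna[1:7]) + "A",
    }


def has_seed_match(target: str, mirna: str) -> bool:
    """True if the target (DNA or RNA) contains either 7mer site."""
    target = target.upper().replace("U", "T")
    return any(site in target for site in seed_sites(mirna).values())


def fraction_with_site(targets: Iterable[str], mirna: str) -> float:
    """Fraction of target sequences carrying at least one seed match."""
    targets = list(targets)
    if not targets:
        raise ValueError("no target sequences given")
    return sum(has_seed_match(t, mirna) for t in targets) / len(targets)
```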

MiRNAs are processed from TE sequences and regulate target genes containing homologous elements

In the examples presented thus far, the miRNA predates the corresponding target sites. Although several mechanisms may generate novel miRNA genes, I focused on TE sequences as a source of miRNAs. I hypothesized that such miRNAs would inherently gain functional targets through homologous TE sequences resident in 3’UTRs; in this scenario, a new miRNA gene would have a ready source of novel target sites. To test the functionality of these interactions, I focused on miRNAs whose alignment to the TE overlaps the seed of the miRNA guide strand. I therefore generated a list of all human miRNAs with any detectable TE homology and selected those in which the TE annotation completely overlapped the proposed seed sequence.

While most miRNAs with TE homology are of relatively recent origin, one notable example of broad conservation is miR-28. Annotation of this locus shows that tandem inverted copies of the 3’ end of an L2c retrotransposon formed the 5’ and 3’ arms of the miRNA precursor (Figure 2-11). Both L2c sequences have a similar level of divergence from the L2c consensus (23% and 19.4%), suggesting that both insertions occurred around the same time. A recently published study suggested that miR-28-5p binds to and drives endonucleolytic cleavage of LYPD3, interacting with the transcript through a novel "centered-seed" site within a homologous L2 element (Shin et al, 2010). In line with this result, luciferase reporters containing the 3’UTR of LYPD3 demonstrated dose-dependent repression in response to a miR-28-5p mimic. Similar responses were observed using 3’UTR reporters for E2F6, in which miR-28-5p is also predicted to bind through an L2 sequence (Figure 2-11).

Functional validation of Alu-derived miRNAs

While miR-28 demonstrates the functionality of a TE-derived miRNA, its high level of conservation across eutherian mammals makes it a rare example among TE-derived miRNAs. To test whether primate-specific TE-derived miRNAs are functional, I used the Alu-derived miR-566 and miR-1285 as case studies. At the start of this study, miR-566 and miR-619 were the only human miRNAs with Alu homology, and only in miR-566 did the Alu encompass the mature miRNA sequence (Figure 2-13A). This miRNA was initially annotated in a study characterizing the miRNA complement of colorectal cells (Cummins et al, 2006). Using a common stem-loop RT-PCR method, I detected significant expression of miR-566 in PBMC and HEK293 cells. Furthermore, I could express the miR-566 genomic locus in the context of an otherwise promoterless cloning plasmid (Figure 2-13B,C,D). However, miR-566-encoding plasmids were unable to reduce luciferase reporter expression (not shown), while a commercially available miR-566 Pre-miR™ was functional. I suggest that the increase in signal measured by stem-loop RT-PCR, in the absence of a functional miRNA, could be due to non-specific priming. Indeed, northern blots detected the miR-566 precursor, but not a ~20 nt band corresponding to the mature sequence (not shown).

Although conflicting evidence was found for Alu-derived miR-566, additional Alu-derived sequences were annotated as miRNAs as high-throughput sequencing technologies matured and were applied in miRNA discovery studies. Among the putative Alu-miRNAs, miR-1285 had subjectively more promising deep sequencing support than miR-566, based on data collected by miRBase and deepBase (Kozomara & Griffiths-Jones, 2011; Yang et al, 2010). A recent study that sought to functionally validate all known mouse miRNAs cloned each miRNA along with ~100 flanking nucleotides into an expression vector, expressed these constructs in vitro, and then sequenced the small RNA fractions to determine which yielded a functional mature miRNA (Chiang et al, 2010). The same constructs were also tested for the ability to silence MRE-containing reporters, and these data were related back to the sequencing results. Similarly, to test the functionality of miR-1285, I cloned the pre-miRNAs of the two hsa-miR-1285 loci (miR-1285-1 and miR-1285-2), with 200 bp of flanking sequence, into an expression plasmid (Figure 2-12A). Interestingly, miR-1285-1 and -2 share a common seed sequence and homology to Alu elements, but differ notably in secondary structure. To test whether functional miRNAs could arise from the genomic fragments, I transfected HEK293 cells with 0, 200 or 400 ng of either the miR-1285-1 or the miR-1285-2 plasmid, balanced with a control plasmid lacking a miRNA. A similar expression plasmid was also generated for hsa-miR-24-1 to serve as a positive control for a valid miRNA. All plasmids were co-transfected with their corresponding artificial targets as described previously. Notably, miR-1285-1, but not miR-1285-2, reduced expression of the artificial reporter (Figure 2-12B). This suggests that the miR-1285-2 locus does not produce a functional miRNA.

I also tested whether predicted targets for miR-1285 responded to miR-1285 overexpression. The majority of target sites predicted for miR-1285 were located in Alus (Figure 2-12C), so candidates containing Alu-derived sites were tested. Overexpression of miR-1285 reduced expression of EIF2S3, CHST6 and CBFA2T2 in a dose-dependent manner. Attempts to block miR-1285 activity using Anti-miRs™ were ineffective, even for the artificial target. While this could mean that endogenous miR-1285 is non-functional in the cell lines tested, I was also unable to inhibit the effects of miR-1285 overexpression with the Anti-miR™ (Figure 2-12D). I hypothesize that this could be due to nonspecific binding of the Anti-miR™ to the multitude of Alu sequences present in other transcripts at any given time. In any case, the overexpression data and the proper processing of miR-1285 in the context of its genomic locus support miR-1285 as a functional Alu-derived miRNA. Furthermore, while the TE-derived miRNAs tested gave varying functional results, the data from miR-28 and miR-1285 show that these miRNAs deserve further study.

Discussion

In this work, I demonstrate that several of the most prevalent TE families in the human genome, namely Alu, MIR and LINE2 elements, provide a functional platform for miRNA-mediated regulation when resident in mRNA 3’UTRs. I also found that, while the majority of TE-MREs in human 3’UTRs reside in primate-specific L1 and Alu elements, sequence conservation was also seen in the MIR-derived let-7 MREs.

Further inquiry into the extent of Alu-MRE function will undoubtedly benefit from high-throughput measurements of gene expression changes after modulating miRNA levels, such as the microarray experiments presented here. The low degree of sequence divergence among 3’UTR-resident Alus leads to a preponderance of predicted MREs for some miRNAs. As a consequence of this limited divergence from parental Alu sequences, binding sites for distinct miRNAs cluster in specific regions of the Alu primary sequence (Figure 2-6). Although, on average, Alu-MREs were less potent than canonical (non-TE-derived) sites, evidence from the array data and our luciferase results shows that some are, indeed, functional.

One possible reason for the lower validation rate of Alu-MREs is that Alus can associate with Signal Recognition Particle (SRP) proteins through specific domains (Bovia et al, 1997; Chang et al, 1996; Hsu et al, 1995). If SRP binding occurs in the context of a 3’UTR-resident Alu, MREs may be shielded from miRNA access. In a previous study, in vitro-transcribed Chloramphenicol Acetyltransferase (CAT) mRNAs carrying artificial 5’ or 3’UTR Alus in the sense orientation were bound by the SRP complex (Hsu et al, 1995). If SRP binding blocks miRNA association, miRNAs predominantly targeting the antisense Alu would be less affected. This hypothesis could be tested directly by querying microarray datasets, categorizing genes by the presence or absence of a 3’UTR Alu, by whether the Alu is transcribed in the sense or antisense orientation, and by whether the 3’UTR Alu has a predicted MRE. One could also test genes with known 3’UTR Alu-SRP interactions to assess the impact of the SRP on miRNA-mediated silencing. Additionally, candidate genes could be predicted computationally by searching for sequence features indicative of SRP binding. SRP binding is linked to active transposition (Bennett et al, 2008): because the SRP binding domains G25C and G159C in the AluYa5 subfamily are important for transposition activity, genes with Alus retaining these features could be predicted along with the associated MREs. Their direct association, and its effects on miRNA-mediated repression, could be assessed by first immunoprecipitating SRP9 or SRP14, followed by RT-PCR of candidate transcripts to confirm the SRP-Alu interaction. The validated 3’UTRs could subsequently be cloned into luciferase reporters and site-directed mutagenesis performed to confirm their role in SRP binding. The candidates emerging from this secondary screen would then be used to determine whether miRNA-mediated repression is affected by the presence or absence of SRP, or by the intactness of the G25 and G159 sites. Conversely, 3’UTR Alus lacking these sites could be engineered to gain SRP association sites, and miRNA knockdown efficiency measured.

On a larger scale, if such interactions prove important for miRNA recognition, a cross-linking immunoprecipitation and sequencing (CLIP-seq) method could be developed for the Alu-binding SRP proteins, SRP9 or SRP14. One likely complication in this experiment would be the predominance of 7SL noncoding RNA, the canonical SRP binding partner, or of free cytoplasmic Alu RNAs. However, the molecular weight difference between Alu/7SL-bound and mRNA-bound SRP complexes could help resolve this, since CLIP protocols often include a size-selection step.

Curiously, of the ~1.2 million Alu copies present in the human genome, fewer than 20 are expected to produce functional miRNAs. Moreover, the functionality of most of these Alu-derived miRNAs is untested. In this work, I found that miR-566 and one of the two miR-1285 loci did not produce a functional miRNA. However, LINE-derived miR-28 is both well conserved and functional, and miR-1285-1 demonstrated effective processing and silencing efficacy. These data show that some TE-derived miRNAs are indeed functional, but that Alu-derived miRNAs, and other miRNAs with low apparent sequence conservation, deserve closer scrutiny. Ideally, such studies should combine northern blot and reporter-based assays before concluding that a bona fide miRNA emerges from the locus in question.


Further complication in identifying true Alu-derived miRNAs comes from a recent study demonstrating that DICER1 degrades Alu RNAs (Kaneko et al, 2011), which suggests that some Alu-derived small RNAs may be DICER1-dependent degradation products rather than miRNAs. These results emphasize the importance of functional experiments, such as those presented here, for validating Alu-derived miRNAs. While miR-1285 and miR-566 each produced small RNAs, only miR-1285 was capable of silencing a luciferase reporter in a sequence-dependent manner. For miRNA discovery studies, proposed loci should be examined closely to ensure that they meet the criteria outlined by Chiang et al (2010). The experimental design used for testing miR-566 and miR-1285 was based on the Chiang et al study, although those authors additionally performed high-throughput sequencing on small RNA fractions extracted from cells expressing the miRNA expression constructs. From these data, they found that the miRNAs most likely to validate functionally were those producing a predominant mature sequence with a homogeneous 5’ end and a passenger strand with a 2 nt 3’ overhang. The miRBase repository is actively incorporating small RNA high-throughput sequencing reads and using evidence such as that proposed in the Chiang et al study to improve the accuracy of miRNA identification and remove dubious annotations (Kozomara & Griffiths-Jones, 2011).
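The 5’-end homogeneity criterion described by Chiang et al can be expressed as a simple statistic over small RNA reads. The Python sketch below assumes that reads have already been aligned to one arm of a candidate hairpin and reduced to their 5’ start coordinates; the function names and the 0.8 cutoff are hypothetical choices for illustration, not values taken from Chiang et al (2010).

```python
from collections import Counter


def five_prime_homogeneity(read_starts):
    """Return the fraction of reads that share the most common 5' start.

    read_starts: iterable of integer 5' start coordinates, one per read,
    for reads mapped to one arm of a candidate hairpin (hypothetical input).
    """
    counts = Counter(read_starts)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    _, dominant = counts.most_common(1)[0]
    return dominant / total


def has_homogeneous_5p_end(read_starts, min_fraction=0.8):
    # min_fraction is an illustrative, hypothetical cutoff
    return five_prime_homogeneity(read_starts) >= min_fraction


# Example: 90 reads start at position 101, 8 at 102 and 2 at 100
reads = [101] * 90 + [102] * 8 + [100] * 2
print(five_prime_homogeneity(reads))  # 0.9
print(has_homogeneous_5p_end(reads))  # True
```

A fuller check would also examine the passenger strand for the 2 nt 3’ overhang, which requires the hairpin's base-pairing map and is omitted here.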

In summary, I find evidence that some TE-derived miRNAs and miRNA binding sites are both conserved and functional. I also show that some sites with low sequence conservation do respond to miRNA expression, with evidence from both reporter assays and global transcript expression profiles. Together, these data support a role for TEs in the evolution of human miRNA interactions and suggest that novel miRNA functions may continue to arise as active transposition persists.


Figure 2-1. TE family composition of putative TE-MREs in human 3’UTRs.

TargetScan was used to predict miRNA target sites in RefSeq-annotated human 3’UTRs, using the human miRNA seed family file provided by TargetScan. TE-MREs were annotated by intersecting unique MRE coordinates with the RepeatMasker track annotations at the UCSC Genome Browser. (A) The RepeatMasker TE family annotation was used to classify all human TE-MREs, and the percent contribution of the five most prevalent families is shown; together these represent >87% of all predicted TE-derived target sites. Primate-specific Alu and L1 retrotransposons make up more than half of the sites, whereas the more ancient L2 and MIR elements constitute ~20%. “Other” represents all other transposable elements; simple repeats and other low-complexity repeats were not considered in this analysis. (B) Seed families were selected for which Alu-, L1- or MIR-derived MREs were the most frequent TE-MRE, and were binned according to the fraction of TE-MREs comprised by the majority TE group. The histograms for the three TEs are overlaid, so every bar should be read as starting at zero. For example, the histogram shows that for ~60 miRNAs, L1s represent ~25% of the predicted MRE sites. Alus showed a bimodal distribution because, for many miRNAs, Alus represented more than half of the predicted TE-MREs.
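The TE-MRE annotation step amounts to an interval intersection between predicted MRE coordinates and RepeatMasker intervals, followed by a tally per TE family. The Python sketch below is a minimal stand-in for that step (in practice a tool such as bedtools or the UCSC Table Browser would do the intersection); it assumes half-open coordinates and assigns each MRE to the repeat it overlaps most, which is an assumption rather than the rule used in the original analysis. All names and coordinates are hypothetical.

```python
from collections import Counter, defaultdict


def overlap_len(a_start, a_end, b_start, b_end):
    # Overlap length of half-open intervals [a_start, a_end) and [b_start, b_end)
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def annotate_te_mres(mres, repeats):
    """Map each MRE id to the TE family of the repeat it overlaps most.

    mres: list of (chrom, start, end, mre_id)
    repeats: list of (chrom, start, end, te_family), e.g. from a RepeatMasker track
    MREs that overlap no repeat are left out (treated as non-TE sites).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, family in repeats:
        by_chrom[chrom].append((start, end, family))

    calls = {}
    for chrom, start, end, mre_id in mres:
        best_family, best_len = None, 0
        # Linear scan for clarity; real pipelines use sorted or indexed intervals
        for r_start, r_end, family in by_chrom.get(chrom, []):
            ov = overlap_len(start, end, r_start, r_end)
            if ov > best_len:
                best_family, best_len = family, ov
        if best_family is not None:
            calls[mre_id] = best_family
    return calls


def family_percentages(calls):
    # Percent contribution of each TE family among TE-derived MREs
    counts = Counter(calls.values())
    total = sum(counts.values())
    return {family: 100.0 * n / total for family, n in counts.most_common()}


mres = [("chr1", 100, 108, "m1"), ("chr1", 500, 508, "m2"),
        ("chr2", 50, 58, "m3"), ("chr1", 900, 908, "m4")]
repeats = [("chr1", 90, 400, "Alu"), ("chr1", 480, 700, "MIR"),
           ("chr2", 0, 55, "L1")]
calls = annotate_te_mres(mres, repeats)
print(calls)  # {'m1': 'Alu', 'm2': 'MIR', 'm3': 'L1'}
print(family_percentages(calls))
```

The "m4" site overlaps no repeat and is therefore excluded from the TE-MRE set, just as non-TE sites are excluded from the family percentages in panel A.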


Figure 2-2. TE-MRE composition and unbiased gene function analysis reveal strong functional connections between let-7 and MIR-derived MREs.

The TE family composition of predicted human let-7 MREs revealed that over 40% are of MIR origin. (Top) This was the highest proportion of MIR-derived sites for any miRNA in the dataset.


Figure 2-2. Continued. The 192 transcripts with MIR-MREs represent just over 10% of the ~1800 human genes with embedded MIR elements. (Bottom) Gene names for the ~1800 genes were analyzed using ToppFun to identify enriched functional categories. Statistical significance is presented as Bonferroni-adjusted p-values. Let-7 had the most significant p-values of any functional category, including the non-miRNA categories not shown. Furthermore, it was the only miRNA with significant results for more than two of the prediction methods. The MRE prediction methods and any additional information are color-coded. For mirSVR: C = conserved; NC = non-conserved; HE = high predicted efficacy; LE = low predicted efficacy.
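Gene-set enrichment statistics of the kind reported by ToppFun are typically based on a hypergeometric test with multiple-testing correction. The Python sketch below shows that calculation in its simplest form, a one-sided hypergeometric p-value followed by Bonferroni adjustment; it is not ToppFun's implementation, and the numbers in the example are toy values rather than data from this figure.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: genes in the background; successes: genes in a category;
    draws: genes in the query list; k: query genes found in the category.
    """
    upper = min(successes, draws)
    numerator = sum(comb(successes, i) * comb(population - successes, draws - i)
                    for i in range(k, upper + 1))
    return numerator / comb(population, draws)


def bonferroni(pvalues):
    # Multiply each p-value by the number of tests, capping at 1
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Toy numbers only: 20 background genes, 5 in a category, 4 in the query list
p = hypergeom_sf(3, 20, 5, 4)
print(round(p, 4))  # 0.032
print(bonferroni([p, 0.2, 0.5]))
```

Bonferroni adjustment is conservative: with many categories tested, only strong enrichments such as the let-7 signal described above remain significant.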


Figure 2-3. Genome browser views for let-7 MIR-derived MREs in (A) MYO1F and (B) E2F6.

Let-7 MREs (yellow box) overlapping a MIR element (red boxes) annotated by RepeatMasker. MYO1F and E2F6 are two candidates where: i) no let-7 MRE is present in the 3’UTR aside from the MIR-MRE shown, and ii) PhyloP conservation scores (Mammal Cons track) showed (subjectively) strong conservation coincident with the binding site.


Figure 2-4. Let-7 regulates 3’UTRs containing MIR-derived MREs.

(A) 3’UTRs of MYCBP, MFSD4, E2F6 and MYO1F, each containing a single MIR-derived let-7 MRE, were cloned into dual-luciferase reporters and co-transfected into HEK293 cells with low doses (0.1, 1.0 nM) of an artificial let-7 mimic (Pre-miR™). Reactions were balanced to 1.0 nM with a non-targeting Pre-miR™ (ctrl). Repression of luciferase activity was observed after 24 hours with all four reporters, as well as with the let7_2xT positive control, but not with the negative control reporter (CTRL). Luciferase activity is plotted as a percent of the activity observed at the 0 nM let-7 Pre-miR™ dose. (B) Luciferase reporters were then co-transfected with a let-7a AntimiR inhibitor (0, 25, 50 nM) into HeLa cells, which express high levels of endogenous let-7a. After 48 hours, all 3’UTR reporters, but not the CTRL reporter, showed increased activity relative to the 0 nM AntimiR dose. N=3; Error bars = SD. * = p ≤ 0.05 (Student’s t-test; two-tailed).
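The normalization described in this caption (activity as a percent of the 0 nM dose, compared by a two-tailed Student’s t-test) can be sketched as follows. The sketch assumes, as is usual for dual-luciferase assays, that in each well the 3’UTR-bearing luciferase is divided by the co-expressed control luciferase; the function names and readings are hypothetical. Only the t statistic is computed with the standard library, and it is compared with the tabulated two-tailed critical value for 4 degrees of freedom (N = 3 per group).

```python
from math import sqrt
from statistics import mean, stdev


def ratios(reporter, control):
    # Per-well ratio of 3'UTR reporter luciferase to control luciferase
    return [r / c for r, c in zip(reporter, control)]


def percent_of_baseline(treated, baseline):
    # Express each treated ratio as a percent of the mean 0 nM ratio
    base = mean(baseline)
    return [100.0 * x / base for x in treated]


def student_t(a, b):
    # Two-sample Student's t statistic with pooled variance (equal variances assumed)
    na, nb = len(a), len(b)
    pooled = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


T_CRIT_DF4 = 2.776  # two-tailed critical t for alpha = 0.05 with df = 4

# Hypothetical triplicate readings (reporter, control) at 0 nM and 1 nM mimic
zero_nm = ratios([1000, 1100, 900], [1000, 1000, 1000])
one_nm = ratios([500, 600, 550], [1000, 1000, 1000])
print(percent_of_baseline(one_nm, zero_nm))
t = student_t(zero_nm, one_nm)
print(abs(t) > T_CRIT_DF4)  # True
```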


Figure 2-5. TE-MRE compositions for (A) miR-24-3p and (B) miR-122 show a prominent Alu fraction.

TE-derived MREs were predicted for each miRNA, and RepeatMasker track annotations were used to tabulate TE family frequencies. The most represented families are shown, with all other families grouped into the “Other” category. Alus represent over 80% of the TE-derived MREs for these miRNAs.


Figure 2-6. Most frequent Alu-MRE sequences map to distinct positions relative to the Alu consensus.

The position within the Alu consensus sequence of each miR-125-3p, miR-24 or miR-122 MRE is plotted. MRE positions were normalized across all Alus by calculating positions relative to the Alu consensus sequence (see Methods). For each miRNA, the high MRE frequency is restricted to a narrow window 5-10 bp wide. This suggests that little sequence divergence has occurred among Alus in these regions. It also suggests that these miRNAs encounter similar local sequence and structural contexts when binding to Alu-derived sites in different mRNA targets.
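The Methods describe how MRE positions were normalized to the Alu consensus; that procedure is not reproduced in this section. As an illustration only, the Python sketch below maps a genomic base inside an annotated repeat to a consensus coordinate using the repeat's aligned genomic and consensus spans, ignoring insertions and deletions in the repeat-to-consensus alignment (a simplifying assumption), and then finds the most populated window. The dictionary fields are hypothetical, not RepeatMasker's own column names.

```python
from collections import Counter


def to_consensus(pos, element):
    """Map a 0-based genomic position inside a repeat to a 0-based consensus position.

    element: dict with genomic 'start'/'end' (half-open), 'strand', and the aligned
    consensus span 'cons_start'/'cons_end' (half-open). Indels are ignored.
    """
    offset = pos - element["start"]
    if element["strand"] == "+":
        return element["cons_start"] + offset
    # Minus-strand copies run through the consensus in reverse
    return element["cons_end"] - 1 - offset


def peak_window(positions, width=10):
    # Start of the consensus window of the given width holding the most MREs
    bins = Counter((p // width) * width for p in positions)
    return bins.most_common(1)[0][0]


plus = {"start": 1000, "end": 1300, "strand": "+", "cons_start": 0, "cons_end": 300}
minus = {"start": 5000, "end": 5300, "strand": "-", "cons_start": 0, "cons_end": 300}
print(to_consensus(1150, plus))   # 150
print(to_consensus(5149, minus))  # 150
print(peak_window([150, 152, 155, 40, 151]))  # 150
```

Fixed-width bins can split a peak that straddles a bin edge; a sliding window would avoid this at slightly higher cost.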


Figure 2-7. Alu-derived MREs respond to miR-24 overexpression.

Transcripts with Alu-derived MREs showing evidence of local sequence conservation were functionally tested. (Top) The Primate Conservation track shows a rise in conservation score coincident with the miR-24 MRE overlapping the AluSp family sequence. (Bottom) Luciferase reporters expressing EIF2S3 and MAP3K9 3’UTRs were co-transfected into HEK293 cells with miR-24 Pre-miR™ mimics (0, 1, 10 nM). After 24 hours, luciferase assays revealed that EIF2S3 and MAP3K9 reporter expression was reduced in response to miR-24, whereas that of the negative controls (ctrl) was not. N=4; Error bars = SD. * = p ≤ 0.05 (Student’s t-test; two-tailed).


Figure 2-8. Microarray datasets measuring the response to miRNA overexpression were used to assess the functional response of Alu-derived targets on a global scale.

Gene annotations from arrays were intersected with TE-MRE predictions for the corresponding miRNAs. (Top) Genes were grouped according to whether they had a canonical (non-TE-derived) MRE, an Alu MRE or no MRE for the indicated miRNA family, and the empirical cumulative distributions were plotted. Canonical and Alu MRE-containing transcripts were shifted to the left of the no-MRE set, demonstrating a larger fraction of down-regulated transcripts in these groups than in the no-MRE set (Bottom).
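A leftward shift of a cumulative distribution can be summarized by how far the target group's empirical CDF of fold changes rises above that of the no-MRE background. The Python sketch below computes that maximum vertical gap, the one-sided Kolmogorov-Smirnov statistic; whether the original analysis used this statistic is not stated, and the input values are hypothetical log2 fold changes.

```python
from bisect import bisect_right


def ecdf(values):
    # Return F(x) = fraction of values <= x
    xs = sorted(values)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def left_shift(group, background):
    """Largest amount by which the group's ECDF exceeds the background ECDF.

    A positive value means that, at some fold-change cutoff, a larger fraction
    of the group than of the background is down-regulated.
    """
    f_group, f_background = ecdf(group), ecdf(background)
    # Both ECDFs are step functions, so checking the observed values suffices
    cutoffs = sorted(set(group) | set(background))
    return max(f_group(x) - f_background(x) for x in cutoffs)


alu_mre = [-1.0, -0.5, 0.0]  # hypothetical log2 fold changes
no_mre = [-0.2, 0.0, 0.3]
print(left_shift(alu_mre, no_mre))
```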


Figure 2-9. The fraction of down-regulated genes with Alu-derived MREs is in proportion to their overall prevalence.

Cumulative distribution plots showed greater knockdown of genes with canonical or Alu-derived MREs compared to background (no MRE). For miR-122 (left) and miR-24 (right), between 20 and 30 percent of all MRE-specific knockdown can be attributed to the presence of Alu-derived target sites.


Figure 2-10. Functional miR-24 MREs are independently created in rodent and primate clades due to lineage-specific, but homologous TE families.

Target sites were predicted in human and mouse 3’UTR TEs, limited to TE integrations specific to each lineage (Top). Homologous genes were then paired to find lineage-specific TE-derived MREs for the same miRNA. For miR-24, most of these sites resulted from transposition of B1 SINE elements, which, like primate Alus, arose from a 7SL RNA ancestral sequence. Candidates were selected that had an Alu-derived miR-24 site in the human 3’UTR and a B1-derived site in the mouse 3’UTR. The chosen candidates, SFXN2, SLC12A8 and UBXN2B, additionally had binding sites that were conserved in species where the insertion was present. (Bottom) Luciferase reporters expressing candidate 3’UTRs were co-transfected with miR-24 Pre-miRs™. SFXN2 and SLC12A8 reporters showed reduced expression after miR-24 treatment compared to control-treated cells for both human and mouse constructs. UBXN2B showed no response. Chimpanzee SFXN2 had a single base change that disrupted the predicted binding site, and a reporter carrying this 3’UTR did not respond to miR-24 addition. N=3; Error bars = SD. * = p ≤ 0.05 (Student’s t-test; two-tailed).


Table 2-1. miRNAs have predicted MREs in potentially-active Alus in the human genome

miRNA Family Alu (+) Alu (-)

miR-150 98.1% 0.0%

miR-129/129-5p 97.7% 0.0%

miR-590/590-3p 95.6% 0.0%

miR-106/302 95.6% 0.0%

miR-520gh 95.6% 0.3%

miR-17-5p/20/93.mr/106/519.d 95.5% 0.0%

miR-411 95.3% 0.0%

miR-512-3p/1186 94.0% 0.0%

miR-483/483-5p 86.0% 3.8%

miR-1234 83.2% 15.3%

miR-1307 75.6% 0.1%

miR-122 72.4% 0.3%

miR-139-3p 70.9% 0.6%

miR-720.h 67.5% 0.0%

miR-575 61.1% 10.4%

miR-1281 56.4% 0.0%

miR-709/1827 7.7% 97.1%

miR-24 0.6% 96.6%

miR-940 0.2% 95.8%

miR-485/485-5p 0.0% 93.2%

miR-548c-3p 1.8% 91.8%

miR-290-5p/292-5p/371-5p 2.6% 91.6%

miR-661 6.5% 73.7%

miR-566 0.2% 71.6%

miR-766 0.2% 70.6%

miR-508-5p 0.0% 68.0%

miR-1273 22.9% 66.3%

miR-663 0.0% 66.3%

Annotations and sequences of potentially active Alus were taken from Supplemental Table 3 (Bennett et al, 2008).


Figure 2-11. miR-28 is derived from a LINE2c retrotransposon, is highly conserved and regulates transcripts with LINE2-embedded MRE sequences.

(A) LINE2-derived miR-28 is a conserved TE-derived microRNA. (B) Homologous LINE2-derived MREs make up the largest proportion of TE-derived targets. (C) LYPD3 has a LINE2c-embedded miR-28 MRE that shows strong conservation localized around the site. (D) 3’UTR luciferase reporters for LYPD3 and E2F6, both of which have predicted L2-derived target sites, are potently repressed in response to miR-28-5p overexpression. N=4; Error bars = SD. * = p ≤ 0.05 (Student’s t-test; two-tailed).


Figure 2-12. Alu-derived miR-1285-1 is effectively processed and mediates knockdown of genes with Alu-MREs.

(A) Two genomic loci, miR-1285-1 and miR-1285-2, are annotated for Alu-derived miR-1285. Each locus, along with ~100 bp of flanking sequence, was cloned into an expression plasmid. (B) miR-1285-1, but not miR-1285-2, repressed the expression of luciferase reporters with 3’UTR-resident target sites (miR1285-2xT). Mutating the seed sequence abolished this activity, indicating that miR-1285-1 is a functional miRNA. (C) Alu-derived miR-1285 has MREs predicted in homologous Alu sequences. (D) Coexpression of the miR-1285 mimic with predicted target 3’UTR reporters led to a reduction in luciferase activity. (E) Anti-miRs™ did not affect miR-1285 activity, likely because of a dilution effect from other transcripts carrying Alu-derived sequences. N=3; Error bars = SD. * = p ≤ 0.05 (ANOVA; Tukey’s post hoc).


Figure 2-13. Pol III intronic promoters drive intronic miRNA expression.

(A) Representative diagram of the miR-566 genomic locus in several species. (B) Diagram of the gHsa-miR-566 and gMmus-Sema3F constructs. The gHsa-miR-566 construct contains the intronic sequence of SEMA3F harboring the primate-specific, Alu-derived miR-566 sequence. The gMmus-Sema3F construct contains the equivalent intronic sequence of mouse Sema3F, which is devoid of intronic miRNA sequence. (C) miR-566 expression in HEK293 cells. miR-566 was detected by qPCR in HEK293 cells after transfection with gHsa-miR-566 but not in cells transfected with gMmus-Sema3F. miR-566 levels were normalized to 18S expression and compared to cells transfected with gMmus-Sema3F. Data are mean ± SEM. *, P<0.05, n=4. (D) miR-566 is expressed independently of Sema3F. miR-566 and Sema3F expression were determined in HEK293 cells and PBMCs. Both miR-566 and Sema3F are expressed in HEK293 cells, while PBMCs express only miR-566 and not the host gene. miR-566 and Sema3F levels were normalized to 18S expression. Data are mean ± SEM. *, P<0.05, n=4.
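The caption states that miR-566 levels were normalized to 18S and compared with gMmus-Sema3F-transfected cells. One common way to do this is the comparative Ct (2^-ΔΔCt) method; the Python sketch below assumes that method and 100% amplification efficiency (a factor of 2 per cycle), neither of which is specified here, and the Ct values are hypothetical. A target that is truly undetected in the control has no Ct, so a real analysis needs an explicit rule, such as a cycle cutoff, for that case.

```python
def fold_change_ddct(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl, base=2.0):
    """Relative expression by the comparative Ct method.

    ct_target/ct_ref: Ct of the miRNA and of the reference (e.g. 18S) in the test sample
    ct_target_ctrl/ct_ref_ctrl: the same Ct values in the control sample
    base: fold amplification per cycle (2.0 assumes 100% efficiency)
    """
    delta_sample = ct_target - ct_ref
    delta_control = ct_target_ctrl - ct_ref_ctrl
    return base ** -(delta_sample - delta_control)


# Hypothetical Ct values: the miRNA crosses threshold 5 cycles earlier in the test sample
print(fold_change_ddct(25.0, 10.0, 30.0, 10.0))  # 32.0
```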


CHAPTER 3

LONG INTERGENIC NON-CODING RNAS ARE A POTENTIAL

SOURCE OF ENDOGENOUS MICRORNA “SPONGES”

Abstract

MicroRNAs (miRNAs) classically bind to the 3’ untranslated regions (3’UTRs) of protein-coding genes, playing important roles in diverse cellular processes. Exploring the function of individual miRNAs has relied on molecular tools that reduce a miRNA’s expression or activity. One effective method uses constructs expressing a reporter gene with a 3’UTR containing several miRNA binding sequences. These miRNA “sponges” compete with endogenous targets for miRNA binding. Similarly, I find that some endogenous long non-coding RNAs (lncRNAs) contain numerous binding sites for a single miRNA. Therefore, I propose that these lncRNAs function as endogenous miRNA “sponges,” regulating the activity of one or more miRNAs through competitive inhibition.

Introduction

Long non-coding RNAs (lncRNAs) are an enigmatic class of novel RNA species roughly defined as being longer than 200 nucleotides and having no evidence of coding potential (Rinn & Chang, 2012). Although the name and definition are rather nondescript and arbitrary, in the few years since their discovery several distinct groups have emerged, generally defined by their position and orientation relative to nearby protein-coding genes (Figure 1-2). For lncRNAs falling within or near protein-coding genes, this classification has proved somewhat useful, as many are thought to act in cis, regulating expression of the neighboring transcripts. Recently, interest has grown in understanding the function of long intergenic non-coding RNAs (lincRNAs), which have been implicated in the coordination of epigenetic processes. Most long non-coding RNAs are restricted to the nucleus, consistent with a role in epigenetic and transcriptional control, but some are predominantly cytoplasmic and are capped, spliced and polyadenylated like mRNAs. Because microRNAs (miRNAs) classically regulate protein-coding mRNAs by binding to non-coding 3’UTRs, I hypothesized that some mRNA-like lncRNAs may be substrates for miRNA binding.

One of the general mechanisms by which lncRNAs can regulate transcriptional or epigenetic states is by acting as decoys for protein regulatory factors (Wang & Chang, 2011). For example, growth arrest-specific 5 (Gas5) is an lncRNA that forms a structure mimicking a DNA glucocorticoid response element (GRE). In this way, Gas5 competes with DNA for binding to the DNA-binding domain of the glucocorticoid receptor (Kino et al, 2010).

In this study, I propose a mechanism by which lncRNAs can act as decoys for Argonaute (Ago)-bound miRNAs. Binding between a miRNA and an mRNA classically occurs through the mRNA’s 3’UTR, as the coding region is typically a less effective substrate (Garcia et al, 2011). LncRNAs are effectively “UTRs” and thereby provide, at least theoretically, large non-coding platforms for miRNA binding. An lncRNA carrying multiple miRNA recognition elements (MREs) could sequester miRNAs and thereby impede miRNA:mRNA interactions (Figure 3-14). To find lncRNAs that could act as miRNA “sponges” of biological relevance, I predicted MREs in lncRNAs expressed at high levels in mouse embryonic stem cells (ESCs), as measured by microarray, or in particular regions of the mouse brain, as measured by in situ hybridization (ISH). From this analysis, I found many candidate lncRNAs with more than 10, and some with as many as 40, MREs for a single miRNA family. I characterize one such lncRNA, which has 23 binding sites for the miR-15/16 family in its longest annotated isoform. Interestingly, I find evidence for alternative isoforms of this lncRNA, formed by alternative splicing or transcription start site choice, that remove many of the predicted miR-15/16 binding sites and could thereby regulate the degree to which the miRNA is sequestered. Together, these findings suggest a mechanism for lncRNA-mediated regulation of miRNA activity.
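The decoy argument can be made quantitative with a simple equilibrium model. The Python sketch below assumes independent, single-step binding of one miRNA species to pools of identical sites, each pool described by a total site concentration and a dissociation constant, and solves the mass balance for free miRNA by bisection. All parameter values are illustrative assumptions, not measurements; the example only shows how an lncRNA present at a tenth of a target's abundance, but carrying 23 sites, can lower target-site occupancy.

```python
def free_mirna(total_mirna, site_pools):
    """Solve R + sum(S * R / (Kd + R)) = total_mirna for free miRNA R.

    site_pools: list of (total_site_concentration, kd) pairs, kd > 0, same units.
    The left-hand side increases monotonically in R, so bisection on
    [0, total_mirna] converges to the unique root.
    """
    def excess(r):
        return r + sum(s * r / (kd + r) for s, kd in site_pools) - total_mirna

    lo, hi = 0.0, total_mirna
    for _ in range(200):  # fixed number of halvings; ample for double precision
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def occupancy(free, kd):
    # Fraction of a site pool bound by miRNA at the given free concentration
    return free / (kd + free)


# Illustrative values in nM (assumptions, not measurements)
MIRNA = 1.0
TARGET = (1.0, 0.1)        # 1 nM of target sites, Kd 0.1 nM
SPONGE = (0.1 * 23, 0.1)   # 0.1 nM lncRNA x 23 sites each, Kd 0.1 nM

r_alone = free_mirna(MIRNA, [TARGET])
r_sponge = free_mirna(MIRNA, [TARGET, SPONGE])
print(round(occupancy(r_alone, 0.1), 2), round(occupancy(r_sponge, 0.1), 2))  # 0.73 0.29
```

With these assumed numbers, target-site occupancy falls from about 0.73 to about 0.29, because the low-abundance lncRNA contributes more than twice as many sites as the target itself. Real systems add site-specific affinities, multiple miRNAs and miRNA turnover, none of which this sketch models.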

Methods

Data sources

Accession numbers for the mouse “Brain” and “ESC” long non-coding RNAs were taken from the supplementary data of previously published studies (Dinger et al, 2008; Mercer et al, 2008). Using these accession numbers, sequences were obtained from the UCSC Genome Browser (mm9). Mouse miRNA data, including seed sequences, conservation level and seed family annotations, were obtained from the TargetScan website (http://www.targetscan.org/) (Grimson et al, 2007).

In situ hybridization (ISH) data for PSMI16 in the adult mouse brain were obtained from the Allen Brain Atlas (http://mouse.brain-map.org/) (Ng et al, 2009). PSMI16 data series were found using its Riken database identifier, 6720401G13Rik. The same search term was used to obtain ISH data for the e14.5 mouse embryo, available from the Eurexpress transcriptome atlas (http://www.eurexpress.org/ee/) (Diez-Roux et al, 2011).

Prediction and analysis of MRE content in lncRNAs

Target sites for mouse miRNAs were predicted independently in the lncRNA sequences from each dataset using the standalone Perl implementation of the TargetScan 5.1 algorithm (Lewis et al, 2005). To represent the distribution of MRE frequency, only one representative of each miRNA seed family was used. Additionally, for lncRNAs with multiple isoforms, the isoform with the highest MRE frequency for a given miRNA was used.
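The per-isoform MRE tallies described above can be illustrated with a simplified seed-match counter. The Python sketch below is not TargetScan 5.1: it counts only canonical 8mer, 7mer-m8 and 7mer-A1 sites (pairing to miRNA nucleotides 2-7, plus nucleotide 8 and/or an A opposite position 1), ignores context and conservation scoring, and its function names are hypothetical. Because these sites depend only on the seed (nucleotides 2-8), members of a seed family yield identical counts, which is why one representative per family suffices.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna):
    return rna.translate(COMPLEMENT)[::-1]


def count_seed_sites(mirna, target):
    """Count canonical seed sites for a mature miRNA (5'->3') in a target (5'->3').

    The core is the reverse complement of miRNA nt 2-7; a site is scored as
    7mer-m8 if the base 5' of the core also pairs with nt 8, 7mer-A1 if an A
    follows the core, and 8mer if both hold. Bare 6mers are not counted.
    """
    mirna = mirna.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    core = revcomp(mirna[1:7])
    m8_base = revcomp(mirna[7])
    counts = {"8mer": 0, "7mer-m8": 0, "7mer-A1": 0}
    i = target.find(core)
    while i != -1:
        has_m8 = i > 0 and target[i - 1] == m8_base
        has_a1 = i + 6 < len(target) and target[i + 6] == "A"
        if has_m8 and has_a1:
            counts["8mer"] += 1
        elif has_m8:
            counts["7mer-m8"] += 1
        elif has_a1:
            counts["7mer-A1"] += 1
        i = target.find(core, i + 6)  # step past this core to avoid double counting
    return counts


def best_isoform_total(mirna, isoforms):
    # Mirror the Methods choice: keep the isoform with the most sites for this miRNA
    return max(sum(count_seed_sites(mirna, seq).values()) for seq in isoforms)


MIR16 = "UAGCAGCACGUAAAUAUUGGCG"  # mature miR-16-5p
utr = "AAAUGCUGCUAAAAGCUGCUACCCUGCUGCUCCGCUGCUGG"
print(count_seed_sites(MIR16, utr))  # {'8mer': 1, '7mer-m8': 1, '7mer-A1': 1}
```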


RNA isolation and RT-PCR

Tissues were harvested from wild-type C57BL/6 mice that had been deeply anesthetized with isoflurane and sacrificed by cervical dislocation. Tissues were immediately placed in ~300 µl RNAlater (Life Technologies) and stored at 4°C overnight.

Total RNA was extracted using TRIzol reagent (Life Technologies) according to the manufacturer’s protocol. After the RNAlater was removed, ~1 ml of TRIzol was added to the mouse tissues, which were homogenized on ice using a micropestle. RNA samples were quantified by spectrophotometry, and 1.0 μg of total RNA was treated with DNase I for 1.5 hr to remove genomic DNA contamination (DNA-free kit, Ambion®). Unless otherwise indicated, cDNA was generated from ~500 ng of the RNA using the High Capacity cDNA Reverse Transcription Kit with random primers (Life Technologies).

Ago immunoprecipitation

Immunoprecipitations were performed using Dynabeads (Invitrogen). Beads were prepared according to the manufacturer’s protocol, binding either the Ago or the IgG control antibody. NPC or HEK293 cells were lysed in RIPA buffer supplemented with RNase and protease inhibitors. The immunoprecipitations also followed the manufacturer’s protocol, with the following details. Cell lysates were incubated with the Dynabeads for two hours at 4°C. After incubation, the samples were placed on the magnet to separate the bound beads, and ~100 µl of the supernatant was retained as input. Three washes were then performed with lysis buffer. After the final wash, the beads were separated, the buffer was removed, and 1 ml of TRIzol was added directly to the beads and to the reserved supernatant. RNA was isolated as above. RT-PCR was performed to detect PSMI16 as above, except that 200 ng of RNA was used because of the low RNA yield from the IP.


Results

Abundant MRE content is evident in many mouse lncRNAs

To predict the extent of miRNA binding to lncRNAs, I evaluated MRE content in a set of lncRNA sequences with previously defined expression patterns in mouse brain or embryonic stem cells (ESCs). In total, ~460,000 seed matches were found in 849 and 945 sequences representing confidently expressed lncRNAs in the “brain” and ESC datasets, respectively. Because the average lncRNA is expressed at lower levels than a typical mRNA, I hypothesized that, to compete effectively for miRNA binding, lncRNA decoys would need numerous binding sites for the miRNAs they regulate. To uncover candidate interaction pairs, I tabulated MRE frequency for all predicted miRNA:lncRNA interactions. As a control, target prediction was repeated on all sequences after a randomized dinucleotide shuffle. Of the expressed lncRNAs, 148 brain and 63 ESC-expressed lncRNAs had at least 10 MREs for one or more miRNAs (Figure 3-15). By contrast, only three such events were predicted in the shuffled control datasets. Many well-conserved miRNAs, including the pro-oncogenic miR-27, the tumor-suppressive and developmentally important miR-15/16 and miR-302 families, and the brain-enriched miR-128 and miR-338, had at least 10 sites predicted in one or more lncRNAs (Table 3-1). Interestingly, many lncRNAs had numerous target sites predicted for several different miRNAs. For example, the lncRNA NR_015505 (BC066100 in Table 3-1) has 11 MREs for miR-338, 12 for miR-302 and 23 for the miR-15/16 family, suggesting a potential to coordinately regulate multiple miRNAs. However, because the miR-15/16 family has nearly twice as many MREs as any other miRNA in this transcript, I hypothesized that these miRNAs would be the most likely candidates for competitive inhibition. Therefore, for simplicity, I refer to NR_015505 throughout the text as Putative Sponge for miRNA-16 (PSMI16).
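A dinucleotide-shuffled control keeps each sequence's length, base composition and dinucleotide counts while scrambling longer-range sequence structure, including any clustering of seed matches. The shuffling tool used for these controls is not named here; the Python sketch below implements one standard method, the Altschul-Erickson Eulerian-walk shuffle, and is offered as an illustration under that assumption.

```python
import random
from collections import Counter, defaultdict


def dinucleotide_shuffle(seq, rng=random):
    """Shuffle seq while preserving its exact dinucleotide counts (Altschul-Erickson).

    The sequence is treated as a walk through a graph whose vertices are letters
    and whose edges are the observed dinucleotides. A random Eulerian walk with
    the same first and last letters yields the shuffled sequence.
    """
    if len(seq) < 3:
        return seq
    edges = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    first, last = seq[0], seq[-1]

    # 1. Choose a random final exit edge for every vertex except `last`, retrying
    #    until these edges form a tree that leads every vertex to `last`.
    while True:
        final_edge = {v: rng.choice(out) for v, out in edges.items() if v != last}
        if all(_reaches(v, final_edge, last) for v in final_edge):
            break

    # 2. Randomly order the remaining exits of each vertex; the final edge goes last.
    order = {}
    for v, out in edges.items():
        rest = list(out)
        if v in final_edge:
            rest.remove(final_edge[v])
        rng.shuffle(rest)
        order[v] = rest + ([final_edge[v]] if v in final_edge else [])

    # 3. Walk from the first letter, consuming each vertex's exits in order.
    result, pos, v = [first], defaultdict(int), first
    for _ in range(len(seq) - 1):
        nxt = order[v][pos[v]]
        pos[v] += 1
        result.append(nxt)
        v = nxt
    return "".join(result)


def _reaches(v, final_edge, last):
    # Follow final edges from v; revisiting a vertex means the choice forms a cycle
    seen = set()
    while v != last:
        if v in seen:
            return False
        seen.add(v)
        v = final_edge[v]
    return True


def dinucleotides(s):
    return Counter(zip(s, s[1:]))


s = "AUGCUGCUAAGCUGCUAUUGCCGA"
shuffled = dinucleotide_shuffle(s, random.Random(0))
print(dinucleotides(shuffled) == dinucleotides(s))  # True
```

A plain mononucleotide shuffle would also destroy dinucleotide biases such as CpG depletion, which would inflate or deflate seed-match counts for reasons unrelated to miRNA targeting; preserving dinucleotides makes the control more conservative.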


Expression pattern of lncRNA PSMI16

Adult mouse brain

PSMI16 was one of only two lncRNAs that had more than 10 sites for a conserved miRNA and were expressed in both the brain and ESC datasets. The “Brain” lncRNA dataset was derived from lncRNAs with ISH data available in the Allen Brain Atlas. The ISH data for NR_015505 revealed the highest expression levels in regions within the cerebellum and hippocampus (Figure 3-16). Closer inspection of the hippocampal expression pattern revealed that expression was restricted to the granule cell layer of the dentate gyrus and the pyramidal layer of Fields CA1, 2 and 3. Similar regional restriction was seen in the cerebellum, where high expression levels were observed only in the periphery of the granular layer of the cerebellar cortex. While I had hypothesized that high MRE content would help overcome the low expression levels often observed for lncRNAs, these data show that, in certain regions, PSMI16 may be expressed at high levels in addition to carrying 23 MREs for miR-16.

Developing mouse at e14.5

Because PSMI16 was also expressed in the ESC dataset, I asked whether the lncRNA was expressed in the developing mouse. An ISH data series showing PSMI16 expression in a mouse embryo 14.5 days post coitum (DPC) was found in the Eurexpress Transcriptome Atlas database (Diez-Roux et al, 2011) and revealed moderate to high expression in the nervous system (Figure 3-17).

Other adult mouse tissues and cell lines

To characterize PSMI16 further, I tested its expression in adult mouse tissues using semi-quantitative RT-PCR. Expression was detected in all tissues tested, with particularly high levels in colon, thymus, lung, pineal gland and ovaries (Figure 3-18). These data show that the lncRNA is expressed in many embryonic and adult tissues. I also tested several mouse cell lines for PSMI16 expression, including brain-derived endothelial (BEND3), neuroblastoma (N2A), and neural progenitor (NPC) cells. N2As showed the lowest level of expression, so NPC and BEND3 cells were used for further studies (Figure 3-18). In these latter experiments, oligo(dT) primers were used in the RT reactions, creating a cDNA library of polyadenylated transcripts. I was able to amplify PSMI16 from oligo(dT) libraries using not only the same primer set as before, but also a set amplifying a near full-length product. This suggests that PSMI16 is a polyadenylated transcript.

PSMI16 associates with Ago2

Although PSMI16 was expressed at high levels in many biological settings, to function as a miRNA target decoy it must also associate with miRNA-containing complexes. Specifically, a target decoy should associate with RISC, within which miRNAs are bound by Ago proteins. To test whether PSMI16 associates with this complex, I performed RNA immunoprecipitation (RIP) using an antibody against Ago2 on lysates of NPCs, which have appreciable levels of PSMI16 and miR-16 (Figure 3-18 and Sarah Fineberg, unpublished data). Because PSMI16 is rodent-specific, human HEK293 cells were used as a negative control. RNA was extracted from bound and unbound fractions, and RT-PCR for PSMI16 was performed. As expected, HEK293 cells showed no detectable PSMI16 expression (Figure 3-19). In NPCs, PSMI16 was detected in both the Ago and IgG control supernatants, confirming expression of PSMI16 in these cells and its integrity throughout the experiment. In the IP samples, however, a single specific band was seen only in the Ago-bound fraction from mouse NPCs, demonstrating that PSMI16 associates with Argonaute proteins in NPCs.

“Modular” exon structure and differential MRE inclusion in PSMI16 alternative isoforms

Based on annotations available at the UCSC Genome Browser, PSMI16 is a ~5.7 kb transcript with 20 exons and several alternative isoforms formed by differential splicing, promoter use, or 3’ end choice. The structure of the primary isoform, drawn with exons roughly to scale in Figure 3-20, reveals that the predicted target sites for miR-16 are present in 13 of the exons within the first (5’) two-thirds of the transcript. Interestingly, many of the exons are near-identical copies of one another, as demonstrated by a multiple sequence alignment of the miR-16 binding sites with ~18 flanking bases (Figure 3-20; bottom). Remarkably, in addition to the full-length transcript, a novel short isoform that excluded as many as 12 predicted binding sites was amplified from BEND3 cells (Figure 3-20). In addition, a different variant (AK030946) skips exons 4-13, leaving only 4 miR-16 MREs (Figure 3-20). These data suggest an intriguing mechanism by which the repetitive MRE-containing exons could serve as “modular” units, allowing fine control over MRE frequency and, consequently, over the level of miRNA repression.

Discussion

During the course of these experiments, five high-profile articles were published in quick succession demonstrating various biological systems in which this miRNA “sponging” mechanism plays an important role (Cesana et al, 2011; Karreth et al, 2011; Poliseno et al, 2010; Sumazin et al, 2011; Tay et al, 2011). These transcripts were termed competing endogenous RNAs (ceRNAs). The first study

demonstrated that a PTEN pseudogene had no coding potential, but retained many of the

same miRNA binding sites as its protein-coding counterpart (Poliseno et al, 2010). Compared with the example proposed in this chapter, the pseudogene ceRNA would likely have a more targeted impact on PTEN levels, because the multiple miRNAs it competes for all bind the PTEN transcript. Interestingly, the same group later reported that protein-coding PTEN functions as a ceRNA in a coding-independent manner, adding to the complexity of this system (Tay et al, 2011). Finally, an mRNA-like non-coding RNA similar to PSMI16 was shown to function as a ceRNA during muscle differentiation (Cesana et al, 2011). The authors showed that linc-MD1 “sponges” miR-133 and


miR-135, which themselves regulate transcription factors that activate muscle-specific gene expression. Together, these studies add a great deal of complexity to an already complex system of post-transcriptional gene regulation.

PSMI16 may still prove to be an interesting case study, since regulation of its alternative isoforms adds an intriguing layer of complexity to the ceRNA story. However, the linchpin of the PSMI16 story was finding a measurable cellular response to miR-16 expression, which ultimately remained elusive. I proposed measuring levels of previously established miR-16 target genes after manipulating the levels of PSMI16 or altering the accessibility of its MREs. However, I was unable to validate any previously described miR-16 targets in BEND3 cells: overexpression of miR-16 using Pre-miR™ mimics and inhibition with Anti-miRs™ had no effect on the levels of the five genes tested (not shown). Artificial reporters for miR-16 showed that the Pre-miR™ and Anti-miR™ treatments were working properly, suggesting that the target genes tested are not responsive to miR-16 in these cells. Therefore, any response observed after manipulating PSMI16 levels would likely have been non-specific in this setting.


Figure 3-14. Proposed mechanism for microRNA competitive inhibition by endogenous long non-coding RNA “sponges”.

(Bottom right) In a typical setting, a pri-miRNA is transcribed in the nucleus and processed sequentially by the RNase III enzymes Drosha and Dicer, and the mature guide miRNA is loaded into an Ago protein (Ago2 is depicted). (Top right) The miRNA guides the Ago-containing RISC machinery to complementary binding sites in the 3’UTR of a protein-coding mRNA, leading to reduced protein output through transcript destabilization or translational inhibition. (Top left) A proposed long non-coding RNA (lncRNA) “sponge” is transcribed in the nucleus, then possibly capped, spliced and polyadenylated before being exported to the cytoplasm, where the miRNP complexes are located. An lncRNA with numerous miRNA binding sites is proposed as a means to compete effectively for miRNA binding. (Bottom center) With miRNP complexes sequestered on the lncRNA, translation of target mRNAs resumes.


Figure 3-15. Distribution of MRE frequency in predicted miRNA/lncRNA pairs.

Modified histograms summarizing the frequency of binding sites predicted for all possible miRNA (492) x lncRNA interactions in the “Brain” (1665) or “ESC” (1333) datasets. Control lncRNA datasets (dotted lines) were generated by a randomized dinucleotide shuffling of each sequence.


Table 3-1. Putative lncRNA “sponges” and MRE frequency for conserved miRNAs

lncRNA accession Coordinates (mm9) (#MRE) Conserved miRNA¹

AK077064 chrY:30704983-30714563(+)² (25) miR-590-3p; (15) miR-145, miR-186, miR-205, miR-24, miR-344d, miR-488; (12) miR-340-5p; (10) miR-129-5p, miR-136, miR-155, miR-214, miR-28, miR-339-5p, miR-376c, miR-433, miR-539-5p, miR-544-3p, miR-592

BC066100 (PSMI16) chrX:47916301-47988243(-) (23) miR-16; (12) miR-302c; (11) miR-199a-5p, miR-338-3p

AK036570 chr3:127039311-127201919(-) (10) miR-296-3p; (9) miR-376b; (8) miR-486

AK148461 chr4:109074884-109078859(-) (10) miR-140; (8) miR-142-3p

AK145034 chr2:18609149-18611941(-) (10) miR-149; (9) miR-544-3p

AK028839 chr8:120205521-120217098(+) (14) miR-342-3p; (11) miR-377

AK141020 chr2:173950538-174009419(-) (10) miR-495

AK034303 chr8:96953714-96958182(-) (8) miR-495

AK031731 chr2:75538560-75542038(-) (9) miR-495

AK048599 chr13:59873240-59886735(-) (8) miR-146a

AK038923 chr5:142577068-142579235(+) (12) miR-16

BC030475 chr7:116094278-116105049(+) (8) miR-16

AK083005 chr11:32626600-32630288(+) (8) miR-18a

AK054418 chr8:86105567-86109838(-) (8) miR-214

AK017143 chr5:22888125-22939143(-) (23) miR-361

AK133305 chr4:39345077-39397282(-) (13) miR-378

DQ127229 chr8:93352735-93578407(+) (10) miR-544-3p

BC098197 chr18:89461815-89466349(-) (10) miR-590-3p

AK046284 chr7:66541346-66544482(-) (9) miR-590-3p

1“Broadly-Conserved” or “Conserved” based on TargetScan miRNA family Annotations

2Coordinates are from mm10 assembly (GRCm38.p2)


Figure 3-16. PSMI16 (NR_015505) in situ hybridization reveals strong regional expression in adult mouse brain.

In situ data were obtained from the Allen Brain Atlas for the adult mouse brain (Ng et al, 2009), and the images below were modified by Ryan Spengler. PSMI16 is listed under its RIKEN dataset ID, 6720401G13Rik. Sagittal and coronal section data series are available. Expression level filters were applied, and images are shown for both (Top Left) a coronal section (position 222) and (Top Right) a sagittal section (position 36). The highest expression levels (Bottom; orange structures) were seen in the Hippocampal Formation (HPF, green structures), specifically in the granule cell layer of the dentate gyrus and the pyramidal layer of fields CA1, CA2 and CA3. (Bottom right) High expression was also seen in the granule cell layer of the cerebellar cortex (CBX, yellow structures).


Figure 3-17. Strong regional expression of PSMI16 is seen in the developing mouse (14.5 DPC) by in situ hybridization.

In situ expression strength values (left) and images (right) are taken from the Eurexpress website (Diez-Roux et al, 2011). Expression strength is provided by the database and represents a numeric depiction of subjective assessments of signal intensity in the regions falling under the anatomical system categories shown on the graph. Moderate to high expression is seen in the nervous system, as in the adult mouse (Figure 3-3). (Right) In situ hybridization of PSMI16 is shown in a sagittal section of the mouse embryo. Select areas annotated as being of “High” expression are indicated on the image.


Figure 3-18. PSMI16 expression by RT-PCR in (A) adult mouse tissues and (B) cell lines.

(A) RT-PCR was performed on several tissues from the adult mouse using random primers for the RT step and specific primers to detect PSMI16. The ~100-bp product was detectable in all tissues tested, with particularly high levels in the thymus, lung, pineal gland, ovary and kidney. (B) NPC, BEND3 and N2A cells were tested for expression of PSMI16. Oligo-dT primers were used for the RT step to test whether PSMI16 is likely polyadenylated. Expression was highest in NPC and BEND3 cells, as detected by the primer set used in A (lower bands, bottom right). A nearly full-length product was also detected in NPC and BEND3 cells (top bands). Detection of both products from oligo-dT-primed cDNA suggests that PSMI16 is polyadenylated.


Figure 3-19. PSMI16 associates with Ago proteins in mouse neural progenitor cells.

An Ago2 antibody was used to immunoprecipitate (IP) Ago2 and bound RNAs from mouse NPCs and human HEK293 cells (negative control). RNA was purified and reverse transcribed from bound (IP) and unbound (supernatant) samples. PCR was performed on the cDNAs (30 cycles) using a primer set for PSMI16 or a β-actin control. A specific band was seen in the Ago2 IP fraction from NPCs but not from HEK293 cells, showing that endogenous PSMI16 associates with Ago2 in these cells.


Figure 3-20. Differential MRE incorporation in alternative PSMI16 isoforms.

(Top) Twenty-three miR-16 MREs are predicted in the PSMI16 reference sequence (NR_015505), spread across 13 of its 20 exons. Alternative inclusion of MRE-containing exons is apparent in annotated isoforms, including AK030946 (shown above), which incorporates only 4 sites. The novel short isoform that we cloned incorporates 12. (Bottom) Multiple sequence alignment of MRE-containing exons reveals high sequence similarity.


CHAPTER 4

SISPOTR: A TOOL FOR DESIGNING HIGHLY SPECIFIC AND POTENT SIRNAS FOR HUMAN AND MOUSE

Abstract

RNA interference (RNAi) serves as a powerful and widely used gene-silencing tool for basic biological research and is being developed as a therapeutic avenue to suppress disease-causing genes. However, the specificity and safety of RNAi strategies remain under scrutiny because small interfering RNAs (siRNAs) induce off-target silencing. Currently, the tools available for designing siRNAs are biased towards efficacy as opposed to specificity. Prior work from our laboratory and others supports the potential to design highly specific siRNAs by limiting the promiscuity of their seed sequences (positions 2-8 of the small RNA), the primary determinant of off-targeting. Here, a bioinformatic approach to predict off-targeting potentials was established using publicly available siRNA data from more than 50 microarray experiments. With this, we developed a specificity-focused siRNA design algorithm and accompanying online tool which, upon validation, identifies candidate sequences with minimal off-targeting potentials and potent silencing capacities. This tool offers researchers unique functionality and output compared to currently available siRNA design programs. Furthermore, this approach can greatly improve genome-wide RNAi libraries and, most notably, provides the only broadly applicable means to limit off-targeting from RNAi expression vectors.

Introduction

RNAi is mediated by small RNAs (~21 nucleotides) that are loaded into the RNA-Induced Silencing Complex (RISC), generating a functional complex capable of base-pairing with and repressing target transcripts (Provost et al, 2002). Scientists have devised strategies to co-opt the cellular RNAi machinery to silence virtually any gene of interest using siRNAs, which may be chemically synthesized or expressed in the context of stem-loop RNAs [e.g. short-hairpin RNAs (shRNAs)]. RNAi tools are vital for functional genomics studies that enrich our understanding of basic biological processes. In addition, RNAi-based therapeutics exhibit exciting potential to treat numerous human ailments by suppressing disease-associated genes (Davidson & McCray, 2011). However, the utility of RNAi is appreciably limited by our ability to design siRNAs that are both potent and specific. There is considerable evidence that siRNAs bind to and regulate unintended mRNAs, an effect known as off-target silencing (Chi et al, 2003; Jackson et al, 2003; Semizarov et al, 2003). Although most siRNA design algorithms use BLAST to identify off-target transcripts with near-perfect complementarity, off-targeting primarily occurs when the seed region (nucleotides 2-8 of the small RNA) pairs with sequences within the 3’UTRs of unintended mRNAs, thus inducing translational repression and transcript destabilization, similar to canonical microRNA-based silencing (Guo et al, 2010; Jackson et al, 2006; Lewis et al, 2003). Notably, short stretches of complementarity, as little as 6 bp, may be sufficient to initiate off-target silencing (Birmingham et al, 2006) (Figure 4-1A).

Numerous reports indicate that seed-based off-targeting generates false positives in RNAi screens and dictates the toxicity potential of siRNAs (Anderson et al, 2008; Fedorov et al, 2006; Ma et al, 2006; Schultz et al, 2011). Anderson et al. reported that the extent of siRNA off-targeting correlates with the frequency of seed complements (hexamers) present in the 3’UTRome (Figure 4-1B) (Anderson et al, 2008). Upon evaluating subsets of siRNAs with differing off-targeting potential (low, medium and high, based on 3’UTR hexamer distributions), they found that the low subset had significantly diminished microarray off-target signatures and fewer adverse effects on cell viability than the other subsets. These findings established seed complement hexamer frequency as a key criterion for designing highly specific siRNAs, and some siRNA design algorithms have since incorporated seed-specificity guidelines (Birmingham et al, 2007; Jackson & Linsley, 2010; Naito et al, 2004). However, these algorithms remain strongly biased toward silencing efficacy, and because numerous potency-based filters are applied ahead of specificity guidelines, few candidate siRNAs with low off-targeting potential seeds emerge. This is reflected in recent literature and genome-wide RNAi libraries, where only 10% of siRNAs fall into the low off-targeting range previously established by Anderson et al. (Boudreau et al, 2011; Moffat et al, 2006). While potency-based design is rational, only a fraction of the functional siRNAs for a given target transcript are predicted, and in many instances, highly functional siRNAs do not satisfy several design rules.

In recent work from our laboratory, we aimed to improve the safety profile of therapeutic RNAi by designing hairpin-based vectors containing siRNAs with low off-targeting potentials (Boudreau et al, 2011). We implemented a design scheme that focuses on seed specificity yet promotes efficacy. This approach proved successful in identifying therapeutic sequences that effectively silence target gene expression, induce minimal off-targeting and are well tolerated in mouse and non-human primate brains (McBride et al, 2011). These promising results prompted us to extend the utility of this approach by developing a user-friendly tool to facilitate the selection of low off-targeting potential siRNAs for broader application in therapeutic development and basic biological research. Here, we describe a specificity-biased design algorithm that employs an improved means to score off-targeting potentials, and we demonstrate its effectiveness and unique functionality in comparison to currently available public tools.

Methods

Dataset and Sequence Retrieval

Pre-processed microarray datasets, annotations and sequences were obtained from previously published supplementary materials (Garcia et al, 2011). This represents a compilation of microarray data from seven earlier reports describing gene expression changes in siRNA- or miRNA-treated HeLa cells.

TargetScan 6.0 was used to determine the frequencies of seed complement binding sites (i.e. 6-mer, 7A1, 7m8 and 8-mer sites) for all 16,384 possible heptamers (corresponding to positions 2-8 of the small RNA) in each RefSeq 3’UTR sequence (Garcia et al, 2011).
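The mapping from a seed heptamer to the four site types can be made concrete with a short sketch. The Python below is our own illustration, not TargetScan's code: it derives the 8mer, 7m8, 7A1 and 6mer target-site sequences (mRNA 5'→3') for a heptamer spanning small-RNA positions 2-8, then tallies each 6mer core match in one 3'UTR under its most stringent site type. The function names are hypothetical, overlapping matches are each counted, and TargetScan 6.0's own counting rules may differ in detail.

```python
from itertools import product

# Watson-Crick complements for RNA
COMP = {"A": "U", "C": "G", "G": "C", "U": "A"}


def revcomp(rna: str) -> str:
    """Reverse complement of an RNA string."""
    return "".join(COMP[b] for b in reversed(rna))


def site_sequences(heptamer: str) -> dict:
    """Target-site sequences (mRNA 5'->3') for a seed heptamer = small-RNA positions 2-8."""
    h = heptamer.upper().replace("T", "U")
    m8 = revcomp(h)        # pairs with positions 2-8
    core = revcomp(h[:6])  # pairs with positions 2-7
    return {
        "8mer": m8 + "A",  # 2-8 match plus an A opposite position 1
        "7m8": m8,
        "7A1": core + "A",
        "6mer": core,
    }


def count_site_types(heptamer: str, utr: str) -> dict:
    """Assign every 6mer core match in a 3'UTR to its most stringent site type."""
    sites = site_sequences(heptamer)
    core = sites["6mer"]
    m8_base = sites["7m8"][0]  # mRNA base that pairs with seed position 8
    utr = utr.upper().replace("T", "U")
    counts = {"8mer": 0, "7m8": 0, "7A1": 0, "6mer": 0}
    start = utr.find(core)
    while start != -1:
        has_m8 = start > 0 and utr[start - 1] == m8_base
        has_a1 = start + 6 < len(utr) and utr[start + 6] == "A"
        if has_m8 and has_a1:
            counts["8mer"] += 1
        elif has_m8:
            counts["7m8"] += 1
        elif has_a1:
            counts["7A1"] += 1
        else:
            counts["6mer"] += 1
        start = utr.find(core, start + 1)  # overlapping matches are also counted
    return counts


# All 4**7 = 16,384 possible seed heptamers
ALL_HEPTAMERS = ["".join(p) for p in product("ACGU", repeat=7)]
```

For the let-7 seed heptamer GAGGUAG, `site_sequences` returns CUACCUCA as the 8mer site, and `count_site_types("GAGGUAG", "CUACCUCAGGGUACCUCGGG")` reports one 8mer and one 6mer.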

Human (GRCh37/hg19) and mouse (NCBI37/mm9) 3’UTR sequences and the corresponding gene symbols and accession numbers were obtained from the UCSC Table Browser (http://genome.ucsc.edu/) using RefSeq annotations (Fujita et al, 2011; Karolchik et al, 2004; Kent et al, 2002; Lander et al, 2001; Pruitt et al, 2005).

Formulating POTS

Dataset selection

Expression data for endogenous microRNAs were excluded from the training and validation sets, because several publications have suggested avoiding these seed sequences in RNAi sequence design (Garcia et al, 2011; Wang et al, 2009). The GSE5814 dataset was also excluded, because 77 of its experiments tested siRNAs with the same seed sequence. Strand-biasing analyses were performed to determine whether the sense or antisense strand induced detectable off-targeting in each experiment. Pairwise T-tests were performed comparing genes with at least one 7mer site (≥1 8mer, 7M8 or 7A1 site) for either the sense or antisense strand seed sequence with genes having no predicted 3’UTR target site of any type, including 6mer sites. Experiments exhibiting highly significant repression mediated by the sense strand (one-tailed; P≤6E-5) and little to no evidence of antisense-mediated repression (P>0.05) were removed from further analyses. Of the remaining studies, the Dharmacon2008 dataset qualitatively showed the most diversity in seed off-targeting potential, and it was set aside for downstream validation.


Establishing weighted probability of repression (PR) values and POTS calculation

Following the dataset filtering described above, 53 microarray datasets from three independent studies (Dharmacon2006, GSE5291 and GSE5769) were used as training data to establish POTS. For each microarray dataset, transcripts with a single predicted 3’UTR seed binding site for either the sense or antisense strand of the given siRNA were considered; the sense strand was included to account for its possible loading into RISC, which may also mediate off-targeting. Transcripts with multiple target sites (8mer, 7M8, 7A1 or 6mer) for either strand were ignored so that the silencing potential of a single site of each type could be determined. Background data for each microarray consisted of the remaining transcripts with no predicted 3’UTR seed binding sites for either siRNA strand. Transcripts containing seed binding sites were parsed into groups based on seed site type, and cumulative distributions of gene expression values were generated for each transcript set.
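A minimal sketch of the transcript grouping described above follows. It assumes a hypothetical input layout: a dictionary of log2 fold-changes per transcript for one experiment and, per transcript, the number of sites of each type summed over the sense- and antisense-strand seeds. Transcripts with exactly one site enter the group for that site type, transcripts with none form the background, and transcripts with more than one site are dropped; a helper returns the empirical cumulative fraction at a chosen fold-change.

```python
from bisect import bisect_right

SITE_TYPES = ("8mer", "7m8", "7A1", "6mer")


def group_single_site_transcripts(fold_changes, site_counts):
    """
    fold_changes: {transcript_id: log2 fold-change} for one microarray experiment.
    site_counts:  {transcript_id: {site_type: count}}, counts summed over the sense-
                  and antisense-strand seeds (hypothetical layout; missing = no sites).
    Returns {site_type: [log2FC, ...], "background": [log2FC, ...]}.
    """
    groups = {t: [] for t in SITE_TYPES}
    groups["background"] = []
    for tx, lfc in fold_changes.items():
        counts = site_counts.get(tx, {})
        total = sum(counts.values())
        if total == 0:
            groups["background"].append(lfc)  # no site for either strand
        elif total == 1:
            site_type = next(t for t, n in counts.items() if n == 1)
            groups[site_type].append(lfc)     # exactly one site, of one type
        # transcripts with two or more sites are ignored
    return groups


def cumulative_fraction(values, x):
    """Empirical cumulative fraction of values that are <= x (values must be non-empty)."""
    ordered = sorted(values)
    return bisect_right(ordered, x) / len(ordered)
```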

PR values were calculated as a measure of the increased probability of repression imparted by the presence of a single seed binding site, relative to background expectations. Statistical analyses were first performed on the datasets collectively to identify the log2 fold-change value corresponding to the most significant divergence of repressive potentials across all site types. For this, the data were analyzed at discrete intervals (0.05 log2 fold-change increments), comparing the mean differences in cumulative fractions (paired-samples T-test) for each site type set relative to the respective background values across all experiments. Fisher’s method was used to summarize p-values at each interval. The most significant interval (log2 fold-change = -0.3; χ² = 176.4; df = 8; P < 6E-34) was used to calculate PR values for each site type s as

$$PR_s = F_s(-0.3) - F_{\mathrm{bg}}(-0.3)$$

where F_s(-0.3) and F_bg(-0.3) are the cumulative fractions of single-site transcripts of type s and of background transcripts, respectively, with log2 fold-change ≤ -0.3.
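The PR calculation and the p-value summary can be sketched as follows. This is an illustration under stated assumptions, not the original analysis code: `probability_of_repression` implements PR as the difference between the cumulative fraction of single-site transcripts and that of background transcripts at log2 fold-change ≤ -0.3 (our reading of the PR definition in the text), `final_pr` takes the median across experiments, and `fisher_combined` applies Fisher's method, X² = -2 Σ ln pᵢ with 2k degrees of freedom, using the closed-form chi-square upper tail that exists for even degrees of freedom. The interval scan and paired T-tests themselves are omitted, and all function names are hypothetical.

```python
import math
from bisect import bisect_right
from statistics import median

THRESHOLD = -0.3  # log2 fold-change at the most significant interval


def fraction_repressed(values, x=THRESHOLD):
    """Fraction of log2 fold-changes at or below x (values must be non-empty)."""
    ordered = sorted(values)
    return bisect_right(ordered, x) / len(ordered)


def probability_of_repression(site_lfcs, background_lfcs, x=THRESHOLD):
    """PR for one site type in one experiment: F_site(x) - F_background(x)."""
    return fraction_repressed(site_lfcs, x) - fraction_repressed(background_lfcs, x)


def final_pr(pr_by_experiment):
    """Median PR per site type; input is {site_type: [PR from each experiment]}."""
    return {site: median(values) for site, values in pr_by_experiment.items()}


def fisher_combined(p_values):
    """Fisher's method: X2 = -2 * sum(ln p_i), df = 2k; returns (X2, df, P)."""
    k = len(p_values)
    x2 = -2.0 * sum(math.log(p) for p in p_values)
    half = x2 / 2.0
    # Exact chi-square upper tail for even df = 2k: exp(-x/2) * sum_{i<k} (x/2)**i / i!
    term = total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return x2, 2 * k, math.exp(-half) * total
```

Because df = 2k, the reported df = 8 corresponds to four p-values being combined at each interval.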


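These PR values were multiplied by the seed binding site frequencies (N) for each site type in the 3’UTRome and summed to compute a weighted Potential Off-Targeting Score (POTS) using the following equation:

$$\mathrm{POTS} = \sum_{s \in \{\mathrm{8mer},\,\mathrm{7m8},\,\mathrm{7A1},\,\mathrm{6mer}\}} N_s \cdot PR_s$$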

To generate the final POTS used in the siSPOTR tool, PR values were calculated for both the validation and training datasets, and the median value for each site type served as the final PR value. In addition, 8mer, 7M8, 7A1 and 6mer site counts for all 16,384 heptamers were calculated from TargetScan 6.0 (Garcia et al, 2011) predictions based on human and mouse RefSeq-annotated 3’UTRs.
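The sum-product that defines POTS is simple to compute. The sketch below assumes that PR values enter the score as fractions (e.g. 0.1458 for the 8mer) rather than percentages, uses the median PR values reported in the Results, and takes hypothetical site counts; a real calculation would use the TargetScan-derived 3’UTRome site counts for the heptamer in question.

```python
# Median PR values reported in the Results, expressed as fractions (assumption)
PR = {"8mer": 0.1458, "7m8": 0.0768, "7A1": 0.0656, "6mer": 0.0364}


def pots(site_frequencies, pr=PR):
    """POTS = sum over site types of N_s * PR_s (missing site types count as 0)."""
    return sum(site_frequencies.get(site, 0) * weight for site, weight in pr.items())


# Hypothetical 3'UTRome site counts for one heptamer
example_counts = {"8mer": 10, "7m8": 20, "7A1": 30, "6mer": 100}
score = pots(example_counts)
```

For these hypothetical counts the score is about 8.6 (1.458 + 1.536 + 1.968 + 3.64), which shows how 6mer sites can dominate the total by sheer number despite their low individual weight.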

Tissue-specific POTS analysis

Expression profiles from 177 human cell lines and tissues based on the U133A/GNF1H gene atlas were obtained from the BioGPS FTP site (http://biogps.org) (Su et al, 2004; Wu et al, 2009). For each dataset, genes whose corresponding probe sets had median expression values greater than 100 were considered to be expressed. A tissue-specific POTS (tsPOTS) was calculated for each tissue as described above, but limiting the 3’UTRs to expressed genes when calculating site type frequencies. Spearman correlations were performed to evaluate variability in the rank-order of seed sequences by tsPOTS, as compared to POTS calculated based on all human 3’UTRs.
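The tissue-specific comparison reduces to two steps: restricting 3’UTRs to genes called expressed in a tissue, and comparing heptamer rank orders with a Spearman correlation. The sketch below implements both with the Python standard library only; the data layouts and function names are hypothetical, and the correlation is computed as a Pearson correlation of average ranks, the usual tie-aware definition of Spearman's ρ.

```python
import math
from statistics import median


def expressed_genes(expression, cutoff=100):
    """Genes whose median probe-set value exceeds the cut-off; input is {gene: [values]}."""
    return {gene for gene, values in expression.items() if median(values) > cutoff}


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

In the tsPOTS setting, `x` would hold POTS values for the 16,384 heptamers computed from all 3’UTRs and `y` the values computed from one tissue's expressed genes, in the same heptamer order.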

Validating siSPOTR

Efficacy

The 2431 siRNAs in the Huesken dataset were filtered stepwise according to the siSPOTR design scheme (i.e. strand-biasing, GC-content and POTS rank). For a comparison of efficacy, we used siDesign Center (Dharmacon), a highly utilized siRNA design tool that focuses primarily on potency. Target gene coding sequences were obtained using the GenBank accessions provided in the Huesken siRNA dataset and were used as input sequences for siRNA design in the siDesign Center tool using default settings. The top ten hits from siDesign Center were considered the top candidates and were intersected with the Huesken siRNA dataset. Gene silencing efficacies for overlapping siRNAs were recorded and plotted.

Ranking off-targeting potential

To evaluate the ability of the PR values to estimate the relative extent of off-targeting, POTS values were calculated for the validation set, using the median value for each site type determined from the training set. Target site frequencies were calculated as described above, using human RefSeq 3’UTR sequences for transcripts present on the array. POTS values were determined as the sum-product of the 8mer, 7M8, 7A1 and 6mer site frequencies and their respective PR values.

Cumulative distribution plots of gene expression values were generated by parsing the transcripts by site type, without restricting the analysis to transcripts with single sites. The number of down-regulated transcripts over background was calculated as described above, by subtracting the background fraction at the same point. Seeds were ranked according to these values and compared to the rank-order of their estimated POTS values using Spearman rank correlations. Visual inspection of the correlation plot showed seven qualitatively distinct outliers in the right tail of the POTS distribution (red dots, Figure 5D). Spearman’s rank correlation coefficients and p-values were calculated with and without these samples included.

Suppression signatures

Microarray data for the validation datasets were processed on a per-target-gene basis (i.e. GAPDH, PPIB and No Target groups) to discern off-targeting from gene expression changes resulting from on-target silencing. The microarray data for each group were evaluated to identify genes that were down-regulated by more than three standard deviations from that gene’s mean across the datasets. These gene lists and accompanying gene expression values were imported into Partek Genomics Suite (Partek GS, Saint Louis, MO) and used to perform hierarchical clustering by row (columns were ordered by increasing POTS), allowing visualization of the suppression signatures as heatmaps. Heatmaps were partitioned to separate low-POTS and high-POTS siRNAs for each group. Suppression signature size was assessed qualitatively as the area of the broadest dark blue regions in each lane and plotted on a common x-axis.
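The three-standard-deviation criterion can be sketched as below, under our reading of the text: for each gene, the mean and sample standard deviation of its log2 fold-change are taken across the experiments in one target group, and an experiment's suppression signature collects the genes falling more than three standard deviations below that mean. The input layout and function name are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption.

```python
from statistics import mean, stdev


def suppression_signatures(log2fc, n_sd=3.0):
    """
    log2fc: {gene: [log2 fold-change in experiment 0, 1, ...]} for one target group.
    Returns {experiment_index: set of genes more than n_sd sample SDs below the gene's mean}.
    """
    signatures = {}
    for gene, values in log2fc.items():
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue  # constant gene: no outliers possible
        for i, value in enumerate(values):
            if (value - mu) / sd < -n_sd:
                signatures.setdefault(i, set()).add(gene)
    return signatures
```

One practical consequence: with n experiments, a single value can lie at most (n - 1)/sqrt(n) sample standard deviations from the mean, so this 3-SD rule can flag a gene only when a group contains at least 11 experiments.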

SiRNA Design Tool Comparison

We obtained RefSeq coding sequences for the sixteen therapeutically relevant gene targets (Table 1). These sequences were used as input at each of the indicated siRNA tool websites [siDesign Center (Dharmacon, http://www.dharmacon.com/designcenter/DesignCenterPage.aspx), siRNA Target Finder (Genscript, https://www.genscript.com/ssl-bin/app/rnai), DSIR (Commissariat à l'Energie Atomique, France, http://biodev.cea.fr/DSIR/DSIR.html) and the Applied Biosystems SVM siRNA Design Tool (http://www5.appliedbiosystems.com/tools/siDesign/)] (Birmingham et al, 2007; Vert et al, 2006; Wang et al, 2009). These websites were selected for this comparison because they are among the few potency-based design tools that consider seed-based off-targeting. In each case, the optional parameters were adjusted to match our design scheme (e.g. 20-70% GC-content). At siDesign Center, output siRNAs for each of the sixteen targets were sorted by “Low Freq Seed” to identify candidates with low off-targeting potential among their top hits. For each target, up to 50 siRNAs were obtained for POTS analysis. At siRNA Target Finder, the Machine Learning option was used along with the Off-target filter (human, organ=house, seed size=7, and Functional alignment option). The Antiviral and Tradeoff options were deselected, and the output siRNAs (up to 10 per target gene) were used for POTS analysis. At DSIR, the default options were used, and POTS values were determined for all candidates [ranging from 4 to 517 siRNAs per target gene (RTP801 and APOB, respectively)]. For the Applied Biosystems siRNA Design Tool, sequences were uploaded and siRNAs obtained. For all siRNAs evaluated in these analyses, POTS values were determined using positions 2-8 of the antisense strand.

Genome-wide shRNA coverage analysis and prospective library generation and comparison

The EMBOSS Splitter tool on the Galaxy web server (http://galaxyproject.org/) was used to generate a list of candidate siRNAs for all human RefSeq 5’-UTR, CDS and 3’UTR sequences using a 21-nt sliding window with a 1-nt offset (Blankenberg et al, 2010; Giardine et al, 2005; Goecks et al, 2010). Candidate siRNAs were filtered to promote antisense strand loading, retaining target sequences with the pattern NN[G/C]₃₋₄N₅₋₁₉[A/T/C]₂₀₋₂₁, where subscripts denote positions within the 21-nt target site (Birmingham et al, 2007; Khvorova et al, 2003; Matveeva et al, 2007; Schwarz et al, 2003). Sequences falling outside a 20-70% G/C content range were removed.
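The window-and-filter step can be sketched in a few lines of Python. This stands in for the EMBOSS Splitter run rather than reproducing it, and it assumes that positions are 1-based within the 21-nt sense-orientation target site, that GC-content is computed over the full 21 nt, and that the 20% and 70% bounds are inclusive; none of these details is specified in the text. The function names are hypothetical.

```python
import re

# 21-nt sense-orientation target site: any 2 nt, G/C at positions 3-4,
# any nt at positions 5-19, A/T/C at positions 20-21
LIBRARY_PATTERN = re.compile(r"[ACGT]{2}[GC]{2}[ACGT]{15}[ACT]{2}")


def gc_fraction(seq):
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def candidate_sites(transcript, size=21, gc_min=0.20, gc_max=0.70):
    """Yield (0-based start, 21-mer) for windows passing the pattern and GC range."""
    seq = transcript.upper().replace("U", "T")
    for start in range(len(seq) - size + 1):  # 1-nt offset sliding window
        window = seq[start:start + size]
        if LIBRARY_PATTERN.fullmatch(window) and gc_min <= gc_fraction(window) <= gc_max:
            yield start, window
```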

POTS values were obtained for the remaining sequences and used to rank-order candidate siRNAs for each transcript. As in previous publications and currently available RNAi libraries, candidates with near-perfect binding (0 or 1 mismatch) across an 18-nt core (antisense strand positions 2-19) were removed (Birmingham et al, 2007; Moffat et al, 2006). For purposes of comparison to the RNAi Consortium human shRNA library (Broad Institute, MIT) (Moffat et al, 2006) and coverage analysis, sequences corresponding to the 5’-UTR through the first 30 nt of the coding region were also removed. Candidate sites were grouped by gene symbol and duplicate values removed, noting sequences found in multiple transcript isoforms or with more than one site in the same transcript. A prospective shRNA library was generated by applying an additional filter to eliminate sequences with “TTTT” or “AAAA” motifs, allowing for compatibility with Pol III expression-based systems. For each dataset, up to 10 candidates with the lowest POTS were included per gene.
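Library assembly from the filtered candidates can be sketched as follows. The sketch assumes a precomputed POTS lookup keyed by antisense positions 2-8, and an antisense strand that is the full reverse complement of the 21-nt target site (so antisense position 1 pairs with target position 21); the near-perfect-match and 5’-UTR filters are taken as already applied, and all names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_seed(target):
    """Positions 2-8 of the antisense strand (reverse complement of the 21-nt target)."""
    antisense = target.translate(COMPLEMENT)[::-1]
    return antisense[1:8]


def select_library(candidates, pots_by_seed, per_gene=10):
    """
    candidates:   iterable of (gene_symbol, 21-nt target site) that passed earlier filters.
    pots_by_seed: {antisense positions 2-8 (DNA letters): POTS} (hypothetical lookup).
    Returns {gene: [(POTS, target), ...]} holding up to per_gene lowest-POTS sites.
    """
    by_gene = {}
    for gene, target in candidates:
        if "TTTT" in target or "AAAA" in target:
            continue  # Pol III compatibility filter (a T run on either hairpin strand)
        hit = (pots_by_seed[antisense_seed(target)], target)
        by_gene.setdefault(gene, set()).add(hit)  # the set drops duplicate sites
    return {gene: sorted(hits)[:per_gene] for gene, hits in by_gene.items()}
```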

For off-target comparison and coverage analysis with the RNAi Consortium (TRC) shRNA library (one of the few with sequence information), POTS values were assigned based on positions 2-8 of the reported antisense strand. POTS values were binned for each dataset to compare POTS distributions. ShRNA coverage analysis is reported based only on the genes included in the TRC dataset.

Results

Low off-targeting siRNAs maintain potency

We first assessed whether siRNAs with low off-targeting potential have the capacity for potent silencing, since diminished efficacy could explain their underrepresentation in the literature. Upon evaluation of 2431 randomly designed siRNAs described by Huesken et al. (henceforth referred to as the Huesken siRNA dataset) (Huesken et al, 2005), we found that low off-targeting potential siRNAs (i.e. those having fewer than 2000 potential off-targets based on 3’UTR seed complement hexamer distributions) exhibit silencing efficiencies comparable to those of the remaining sequences (~66% and 69% knockdown, respectively; Figure 4-2), with 1 in 4 siRNAs achieving >80% silencing, a commonly accepted threshold for potency. These results indicate that low off-targeting potential does not preclude siRNAs from being functional, suggesting that a siRNA design scheme weighted towards seed specificity would be capable of generating potent sequences.

Design of effective low off-targeting potential siRNAs

We thus developed a siRNA design algorithm termed siSPOTR (siRNA Seed Potential of Off-Target Reduction), which incorporates the most prominent determinants of siRNA efficacy while focusing mainly on seed specificity. For a given target sequence, all possible 21-mer siRNAs are filtered based on strand loading and GC-content and then rank-ordered based on seed specificity.

Strand-biasing

First, siRNAs are selected to promote faithful loading of the antisense strand, mitigating potential off-targeting mediated by the sense strand. This is achieved using conventional siRNA design methodology based on duplex thermodynamic stability, with strong G-C binding at the 5’ end (2 bp) of the sense strand and weak A-U or G-U binding at the opposing end (2 bp; Figure 1A) (Khvorova et al, 2003; Schwarz et al, 2003), with target sites corresponding to NN[G/C]₃₋₄N₅₋₁₉[A/T/C]₂₀[A/T/C/G]₂₁. Notably, this differential stability represents the most significant attribute promoting siRNA efficacy, therefore encouraging potency in addition to specificity (i.e. preventing off-targeting from the sense strand) (Birmingham et al, 2007; Matveeva et al, 2007). To satisfy this criterion, weak G-U wobble pairing at the 3’ end of the target site can be introduced by converting cytosines into uridines. We allow sense strand modifications at positions 20 and 21 (i.e. opposite positions 2 and 1 of the antisense strand, respectively), while only permitting antisense strand modification at position 1. Previously published data support that the first position of the antisense strand does not influence targeting efficacy (Miller et al, 2004), and the ability to make these base conversions increases the number of potential target sites passing this strand-biasing filter.
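The strand-biasing rule and its permitted C-to-U conversions can be illustrated with a small sketch. It assumes that antisense position 1 pairs with target position 21: a C at target position 20 or 21 is converted to U on the sense strand to give a U·G wobble, and a G at target position 21 is handled by converting antisense position 1 from C to U. The function name is hypothetical, and the actual siSPOTR implementation may differ.

```python
import re

# Target site (sense orientation, 1-based): G/C at 3-4, A/T/C at 20, any base at 21
STRAND_BIAS = re.compile(r"[ACGT]{2}[GC]{2}[ACGT]{15}[ACT][ACGT]")
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def design_duplex(target21):
    """Return (sense, antisense) RNA strands with wobble conversions, or None if filtered."""
    target = target21.upper().replace("U", "T")
    if not STRAND_BIAS.fullmatch(target):
        return None
    sense = list(target.replace("T", "U"))
    # Antisense = reverse complement; antisense position 1 pairs with target position 21
    antisense = list("".join(sense).translate(RNA_COMPLEMENT)[::-1])
    # Sense-strand C->U at positions 20 and 21 creates a U.G wobble with the antisense G
    for pos in (20, 21):
        if sense[pos - 1] == "C":
            sense[pos - 1] = "U"
    # A G at target position 21 faces C at antisense position 1; C->U gives a G.U wobble
    if target[20] == "G":
        antisense[0] = "U"
    return "".join(sense), "".join(antisense)
```

The pattern here follows the Results (any base at position 21), which is more permissive than the [A/T/C] at positions 20-21 used for the genome-wide library in the Methods.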

GC-content

Next, putative target sequences are filtered based on GC-content, another strong determinant of siRNA potency (Birmingham et al, 2007; Matveeva et al, 2007). A range of 30-65% GC is considered optimal for identifying effective siRNAs and is generally used among potency-based siRNA design algorithms. To improve our yield of siRNAs with potential for high specificity, we allow a broader range of 20-70% GC content. Our evaluation of the Huesken siRNA dataset indicates that siRNAs within this range retain a suitable potential for efficient (>80%) silencing, achieved by roughly 1 in 4 randomly designed siRNAs (Huesken et al, 2005) (data not shown).

Seed specificity

Finally, we rank candidate siRNAs by scoring seed specificity using a weighted system (POTS: potential off-targeting score) that was formulated based on miRNA target recognition paradigms and off-targeting data derived from siRNA microarray studies (>50 unique siRNAs individually tested in HeLa cells). Off-targeting among these datasets follows the well-characterized miRNA-based hierarchy of silencing potential based on seed site type (Figure 4-3A) (Lewis et al, 2005): the presence of 8-mers within transcript 3’UTRs confers a notably higher potential for down-regulation than the intermediate 7m8 and 7A1 sites, while 6-mer sites impart the least repressive potential over baseline transcripts (i.e. no sites). Statistical analyses performed on the datasets collectively revealed that the most significant divergence of the repressive potentials among all site types occurs at ≤ -0.3 log2 fold-change (P<0.001, Figure 4-3B). We next established a weighted probability of repression (PR) (i.e. the likelihood of ≥ 0.3 log2 fold-change down-regulation relative to baseline) for each site type by evaluating the siRNA experiments individually to control for the observed baseline variability among these datasets. The resulting PR values [8-mer (14.58%), 7m8 (7.68%), 7A1 (6.56%) and 6-mer (3.64%)] were calculated using the median values for each site type across the datasets. These PR values were then incorporated into the POTS formula, which integrates both seed site type and frequency parameters. Previous reports have established that the potential for a miRNA to down-regulate a transcript depends not only on seed site types, but also on the frequencies of these sites within a target 3’UTR (Doench & Sharp; Grimson et al, 2007; Nielsen et al, 2007). Grimson et al. reported that multiple miRNA seed sites in a single 3’UTR primarily act in an independent, non-cooperative manner (e.g. two 8-mers impart twice the repressive potential of a single 8-mer). Our evaluation of the siRNA microarray experiments corroborated these results (data not shown), and the POTS equation was therefore formulated additively to provide an accurate estimation of off-targeting potentials.

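$$\mathrm{POTS} = \sum_{s \in \{\mathrm{8mer},\,\mathrm{7m8},\,\mathrm{7A1},\,\mathrm{6mer}\}} N_s \cdot PR_s$$

where N = frequency of the site type in the 3’UTRome and PR = probability of repression.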

We next calculated POTS for all 16,384 possible heptamers [note: the heptamer corresponding to positions 2-8 of a siRNA/miRNA determines the sequences of all seed site types (Figure 4-3A)] using transcriptome-wide human 3’UTR sequences and observed a broad, non-uniform distribution of POTS, ranging from 5 to 5095 (Figure 4-3C). Not surprisingly, the highest scores were among heptamer sequences relevant to polyadenylation (e.g. AAAAAAA), whereas low-POTS heptamers contain CpG dinucleotide motifs, which are relatively rare within mammalian genomes. The POTS=50 value is highlighted, representing an estimated but relevant cut-off that is used henceforth for demonstrative purposes throughout this manuscript. This value is noteworthy since all 14 of the previously validated low off-targeting potential siRNAs tested by Anderson et al. have POTS<50 (Anderson et al, 2008). Furthermore, our evaluation of 750 siRNAs and accompanying in vitro cytotoxicity data supports POTS<50 as a conservative cut-off associated with an improved likelihood of tolerability (data not shown) (Fedorov et al, 2006). The siSPOTR specificity feature serves primarily to rank the off-targeting potential of siRNAs, and, much like the efficacy scores provided by potency-based siRNA design algorithms, POTS values have no firm cut-off.

The importance of weighting seed site types is evident particularly in cases where seeds sharing the same core hexamer vary greatly in the number of genes containing the more potent 7- and 8-mer sites. For example, the seeds CGCGATa and CGCGATc each have 302 potential off-target transcripts (based on 3’UTR hexamer counts) but have 40 and 201 transcript 3’UTRs with 7- or 8-mer sites, respectively. This 5-fold difference creates a considerable disparity in the off-targeting potentials of these seeds, resulting in a two-fold difference in their POTS values (Table 4-2, Table 4-3). This illustrates the importance of considering position 8, which dictates the sequence of the most potent seed site types (i.e. 7m8 and 8mer). We calculated the mean site type frequencies for all possible heptamers binned by POTS value, revealing a nearly 5- to 10-fold reduction in the more potent site types for low-POTS heptamers relative to those with medium-to-high POTS (e.g. for 8mers, mean values of ~45 compared to >350, respectively).

Finally, as a means to further refine our prediction of off-targeting potentials, we considered the degree to which POTS is influenced by variation in gene expression across tissues. For this, transcriptional profiling data from 177 different human cell lines and tissues (BioGPS) were used to calculate tissue-specific POTS for all possible heptamers. Although gene expression patterns vary greatly across tissues, POTS ranks for each heptamer correlate strongly (r²>0.95; Figure 4-4). These data indicate that organism-wide application of POTS is suitable.

SiSPOTR design example

We provide a step-wise example illustrating the use of siSPOTR for designing siRNAs targeting the human PPIB coding sequence (CDS; Figure 4-5). The 648-nt target sequence is first divided to produce all 631 possible 21-mer siRNA target sites, and the strand-biasing and GC-content filters described above are applied prior to determining POTS values for the resulting siRNAs. In this example, among the 113 PPIB-targeted siRNAs that satisfy the strand-biasing and GC-content criteria, seven are represented in the siRNA validation datasets described below, allowing visualization of the measured off-targeting associated with their respective POTS values of 25, 29, 40, 407, 410, 510 and 560 (Figure 4-6).


Validation of siSPOTR algorithm: efficacy and specificity

Efficacy

We gauged the capacity of siSPOTR to identify potent siRNA sequences among the siRNAs in the Huesken dataset (Figure 4-6A). The siRNAs satisfying the strand-biasing and GC-content criteria were rank-ordered by POTS (low to high), yielding seven siRNAs with POTS<50. This relatively low number results from fewer sequences passing the strand-biasing filter, since the option of introducing duplex instability with G-U base pairs, as described above, does not apply to these pre-existing siRNAs. Surprisingly, all seven of these siRNAs had >80% silencing efficacy, with a mean comparable to that of siRNAs within the database that were identified among the top hits generated by siDesign Center (Dharmacon), a widely used siRNA design website. Although siDesign Center yields more hits within this database, only two of these siRNAs have POTS<50. Indeed, siSPOTR identified five siRNAs not among the siDesign Center hits (Figure 4-6A, Venn diagram), highlighting the unique output potential of the siSPOTR algorithm.

Off-targeting potential

We next evaluated how well POTS estimates the extent of off-target gene silencing observed in microarray experiments for 40 unique siRNAs targeting GAPDH, PPIB, or “No Target”. These 40 experiments were selected because the siRNAs span a broad range of POTS, with roughly equal representation of low, medium and high scores. To better discern sequence-specific off-targeting from gene expression changes associated with on-target silencing, the datasets were grouped by target gene before calculating differential gene expression and establishing “suppression signatures” for each siRNA. Furthermore, each of these 40 siRNAs exhibits greater than 85% silencing efficacy, reducing the potential for detecting gene expression changes due to varying degrees of on-target silencing within groups. In support of the POTS approach, our analyses of these datasets reveal smaller sequence-specific “suppression signatures” among the low off-targeting potential siRNAs (POTS<50) than among siRNAs with higher POTS (Figure 4-6B). Notably, 13 of the 28 higher POTS siRNAs produced larger “suppression signatures” than the largest one observed among the low POTS siRNAs (Figure 4-6C). Importantly, our analyses (data not shown) and previously published data indicate that these “suppression signatures” consist of down-regulated transcripts enriched for 3’UTR seed binding motifs, suggesting that most are likely direct siRNA off-targets (Burchard et al, 2009; Jackson et al, 2006).
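The two off-target readouts used in this section can be sketched as follows. The first is a per-siRNA “suppression signature” of transcripts falling more than three standard deviations below the mean change (Figure 4-6B). The second is the count of down-regulated off-targets, i.e. transcripts at ≤ -0.3 log2 fold-change whose 3’UTRs carry a 7mer or 8mer seed site (Figure 4-6D). The input dictionaries, the use of the population standard deviation, and the function names are assumptions for illustration.

```python
from __future__ import annotations

import statistics

SEVEN_OR_EIGHT = {"8mer", "7M8", "7A1"}  # 7mer and 8mer seed site types


def suppression_signature(log2fc: dict[str, float], n_sd: float = 3.0) -> set[str]:
    """Transcripts lying more than n_sd (population) SDs below the mean log2 change."""
    values = list(log2fc.values())
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return {tx for tx, fc in log2fc.items() if fc < mu - n_sd * sd}


def count_offtargets(log2fc: dict[str, float],
                     site_types: dict[str, set[str]],
                     threshold: float = -0.3) -> int:
    """Down-regulated transcripts (log2FC <= threshold) with a 7mer/8mer 3'UTR site."""
    return sum(1 for tx, fc in log2fc.items()
               if fc <= threshold and site_types.get(tx, set()) & SEVEN_OR_EIGHT)
```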

We also assessed whether POTS could accurately rank off-targeting potential among these 40 siRNAs. Spearman rank correlation of the POTS scores and the numbers of down-regulated off-targets observed for each siRNA indicated a positive correlation of modest significance (Figure 4-6D, dotted line, P = 0.05). As the plot shows, a few higher POTS siRNAs have low numbers of off-targets (red dots); however, none of the low POTS siRNAs showed high numbers of off-targets. Indeed, removing the overt outliers among the higher POTS siRNAs produces a highly significant correlation (solid line, P < 1E-8), providing further evidence that POTS is a reliable predictor of siRNA off-targeting potential. These data, together with the efficacy validation, demonstrate the capability of siSPOTR to identify highly specific and effective siRNAs.
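Spearman’s rank correlation, used here to compare POTS with observed off-target counts, is the Pearson correlation of the two variables’ ranks, with tied values given their average rank. The self-contained sketch below computes the coefficient only. Significance testing and the outlier removal described above are separate steps not shown, and the function names are illustrative.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Because only ranks enter the calculation, a monotonic but nonlinear relationship between POTS and off-target counts still yields a coefficient of 1.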

Finally, we reasoned that training on more datasets (i.e. combining the training

and validation sets described above) could generate a more accurate POTS for ranking

siRNA off-targeting potentials. As expected, the Spearman rank correlation of POTS

scores and numbers of down-regulated off-targets observed for each siRNA showed even

greater significance (Figure 4-7). These improved POTS values are used henceforth.


Comparison of siSPOTR to other algorithms

We subsequently compared the ability of our design strategy and of other publicly available algorithms, particularly those that incorporate seed specificity parameters, to identify siRNAs with low off-targeting potential seeds (i.e. low POTS). The coding sequences of 16 therapeutically relevant genes (of varying sizes; ~50 kb in total) were used as input, and the number of candidate siRNAs with POTS<50 was determined for each algorithm. Our design scheme identified more low off-targeting potential siRNAs than the other algorithms, returning at least four such siRNAs (a typical starting number for initial efficacy screening) for all 16 input genes, whereas each of the other algorithms failed to return four siRNAs with POTS<50 for at least 8 of the 16 genes (Table 4-1). This observation reveals a considerable limitation of current siRNA design tools, which are strongly biased towards potency, and highlights the unique functionality that siSPOTR provides to researchers seeking siRNAs with low off-targeting potential.
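The summary rows of Table 4-1 follow directly from the per-gene counts. The short check below transcribes the table (with None for N/A) and recomputes the totals and the number of genes for which each tool returned at least four siRNAs with POTS<50.

```python
# Candidate siRNAs with POTS<50 per gene, transcribed from Table 4-1 in row order
# (SNCA ... APOB). None marks genes a tool could not process (N/A).
TABLE_4_1 = {
    "siSPOTR":   [4, 4, 19, 14, 6, 22, 31, 18, 23, 42, 35, 47, 83, 42, 82, 66],
    "siDesign":  [0, 1, 5, 3, 4, 4, 7, 0, 0, 2, 6, 5, 7, 2, 3, 1],
    "Genscript": [0, 0, 1, 6, 2, 4, 2, 2, 1, 1, 3, 3, 2, 2, None, None],
    "DSIR":      [0, 0, 0, 6, 3, 1, 4, 0, 2, 3, 7, 13, 7, 13, 8, 14],
    "AppBio":    [0, 0, 0, 1, 0, 2, 3, 0, 0, 1, 2, 2, 2, 1, None, None],
}


def total(counts):
    """Sum of per-gene counts, skipping N/A entries."""
    return sum(c for c in counts if c is not None)


def genes_with_at_least(counts, k=4):
    """Number of genes for which a tool returned at least k low-POTS siRNAs."""
    return sum(1 for c in counts if c is not None and c >= k)


for tool, counts in TABLE_4_1.items():
    print(f"{tool}: total={total(counts)}, "
          f">=4 for {genes_with_at_least(counts)} of {len(counts)} genes")
```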

Prospective applications to expressed RNAi and genome-wide RNAi libraries

The siSPOTR algorithm provides an attractive approach for limiting off-targeting from hairpin-based RNAi expression systems, which, unlike synthetic siRNAs, are not amenable to chemical modifications that may reduce seed-based off-targeting (Bramsen et al, 2010; Jackson et al, 2006; Vaish et al, 2011). We recently published microarray data indicating that RNAi vectors expressing siRNAs with low off-targeting potential (based on 3’UTR hexamer frequencies) show reduced off-targeting relative to sequences with more promiscuous seeds (Boudreau et al, 2011). To ascertain whether POTS is a reliable indicator of off-targeting from expressed RNAi, we evaluated the association of POTS with off-targeting for the expressed RNAi sequences tested in this previous study (eight constructs with POTS ranging from 11 to 653). Hierarchical clustering of differentially expressed genes (N=827, P<0.0001) among the various RNAi sequences reveals that the clustering distance relative to the control (i.e. promoter-only vector) increases with rising POTS values (Figure 4-8), indicating that low POTS RNAi sequences induce fewer gene expression changes than sequences with higher POTS values. These data substantiate the utility of siSPOTR for improving the specificity of RNAi expression vectors.

Next, we investigated the feasibility of generating a genome-wide shRNA library using this algorithm. Genome-wide RNAi screens are broadly used to discover genes implicated in biological pathways and phenotypes; however, these screens can be plagued by off-target effects that produce false leads (Ma et al, 2006; Schultz et al, 2011). Although bioinformatic approaches show some practicality for distinguishing off-targets from bona fide targets (Sigoillot et al, 2012; Zhang et al), careful attention to sequence selection may greatly reduce off-targeting in libraries. Several RNAi libraries are currently available in synthetic siRNA or expressed (e.g. shRNA) forms. Here, we demonstrate the potential of our siRNA design scheme to generate genome-wide RNAi libraries with high specificity (based on POTS and BLAST; see Methods). Our prospective shRNA library (“Low POTS”) consists of 235,121 sequences (up to 10 shRNAs per target gene; median POTS = 37) and provides at least 4 shRNAs with POTS<50 for more than 78% of all RefSeq mRNAs (Figure 4-9). These sequences have nearly 10-fold lower off-targeting potential than those offered in a publicly available shRNA library [178,265 sequences; median POTS = 322; The RNAi Consortium (TRC)], which covers only 0.70% of RefSeq mRNAs with at least 4 shRNAs having POTS<50. A histogram of the POTS distributions for the two libraries reveals an evident disparity, with >90% of the sequences having improved (lower) POTS relative to the TRC library, whose near-random distribution mirrors POTS for all possible heptamers. For genome-wide siRNA design, coverage by the “Low POTS” library is even broader (data not shown), providing an additional means to enhance specificity in combination with chemical modifications to the seed (Bramsen et al, 2010; Jackson et al, 2006; Vaish et al, 2011).
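Library assembly and the coverage statistic quoted above can be sketched as follows, assuming candidates are already filtered and scored per gene. Each gene keeps at most ten lowest-POTS sequences, and a gene counts as covered if at least four retained sequences have POTS<50. The input format and function names are hypothetical, and the BLAST screen mentioned above is not modelled. For the medians reported here, 322/37 ≈ 8.7, consistent with the “nearly 10-fold” difference.

```python
from __future__ import annotations

from statistics import median


def build_library(candidates: dict[str, list[tuple[str, float]]],
                  per_gene: int = 10) -> dict[str, list[tuple[str, float]]]:
    """Keep up to `per_gene` lowest-POTS (sequence, POTS) pairs for each gene."""
    return {gene: sorted(pairs, key=lambda sp: sp[1])[:per_gene]
            for gene, pairs in candidates.items()}


def coverage(library, cutoff=50.0, min_per_gene=4):
    """Fraction of genes with at least `min_per_gene` sequences below `cutoff`."""
    covered = sum(1 for pairs in library.values()
                  if sum(1 for _, pots in pairs if pots < cutoff) >= min_per_gene)
    return covered / len(library)


def median_pots(library):
    """Median POTS over every sequence retained in the library."""
    return median(pots for pairs in library.values() for _, pots in pairs)
```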

SiSPOTR Online Tool

Based on these observations, we developed an online tool employing the siSPOTR algorithm to assist users with designing RNAi sequences with low off-targeting potential for application in human and mouse (https://sispotr.icts.uiowa.edu). The siSPOTR tool searches user-defined target sequences for siRNAs that pass the strand-biasing and GC% filters and outputs candidate siRNAs rank-ordered by POTS from lowest to highest. For convenience, the sequences are ready to order, with the nucleotide substitutions needed to promote proper strand loading already made to the sense strand. In addition, DNA oligonucleotide sequences for generating the corresponding shRNAs are supplied to assist users with constructing RNAi expression vectors. The output also provides detailed off-targeting information for each siRNA, including i) the number of 3’UTRs containing each seed site type, ii) the putative off-target transcripts, and iii) counts of each seed site type on a per-transcript basis. The siSPOTR tool also alerts the user if the siRNA seed sequence matches that of a known miRNA, as such a match may confound experimental results given the regulatory roles miRNAs play in numerous biological processes and pathways. Furthermore, because pre-validated siRNAs and shRNAs are easy to purchase, we provide an accompanying online tool that allows users to input siRNA sequences and obtain POTS values and the detailed off-targeting information described above. These tools provide researchers with a dependable means to minimize and evaluate off-targeting concerns associated with RNAi experiments.
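The miRNA-seed alert amounts to comparing nucleotides 2-8 of the siRNA antisense strand with the same positions of known mature miRNAs. A minimal sketch follows. The two-entry dictionary (human miR-21-5p and let-7a-5p) is a tiny illustrative excerpt standing in for a full miRNA database, and the function names are hypothetical.

```python
from __future__ import annotations

# Illustrative excerpt only; a real check would load every mature miRNA sequence.
KNOWN_MIRNAS = {
    "hsa-miR-21-5p": "UAGCUUAUCAGACUGAUGUUGA",
    "hsa-let-7a-5p": "UGAGGUAGUAGGUUGUAUAGUU",
}


def seed(small_rna: str) -> str:
    """Heptamer seed (nucleotides 2-8) of a guide strand, as RNA."""
    return small_rna.upper().replace("T", "U")[1:8]


def mirna_seed_matches(antisense: str,
                       mirnas: dict[str, str] = KNOWN_MIRNAS) -> list[str]:
    """Names of miRNAs sharing the antisense strand's nt 2-8 seed."""
    s = seed(antisense)
    return sorted(name for name, mature in mirnas.items() if seed(mature) == s)
```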


Discussion

Consideration of Seed Pairing Stability

A recent report from the Bartel laboratory evaluated the impact of seed-pairing stability (SPS) and target abundance (TA; the level of potential binding sites in the cellular transcriptome) on seed-mediated silencing by small RNAs (miRNAs and siRNAs) (Garcia et al, 2011). Their data indicate that seeds with weak SPS inherently have higher TA, and that both factors limit seed-based silencing potency, presumably through weaker binding and a dilution effect associated with the larger number of targets. In contrast to the siSPOTR approach, the authors propose that designing siRNAs with weak SPS and high TA seeds may minimize off-targeting potential. While the potency of such seeds may be low on average, they could still repress considerably more off-targets. A direct comparison of the low POTS approach with the weak SPS strategy may therefore be warranted. When repressive potential is considered in addition to the number of predicted off-targets, siRNAs with weak SPS would likely have consistently more off-targets expected to be down-regulated than low POTS siRNAs. Nevertheless, SPS merits consideration in siRNA design, and we have added SPS values to the siSPOTR output so that users may avoid higher SPS seeds among siRNAs with comparable POTS values.

The Utility of siSPOTR

Off-target effects (e.g. false discoveries and toxicity) pose a problem for gene silencing technologies, particularly for RNAi therapeutics, and motivate the development of a user-friendly tool to assist researchers in designing siRNAs that are highly specific and efficacious. Here, and in prior work from our laboratory and others, we demonstrate that focusing on seed specificity in siRNA design may mitigate off-targeting by 5- to 10-fold, as supported by predictive analyses and transcriptional profiling data from RNAi studies (Anderson et al, 2008; Boudreau et al, 2011). Unlike other siRNA design strategies, siSPOTR yields numerous candidate sequences with low off-targeting potential, providing a broad and attractive approach to alleviating off-target concerns. Other means of addressing off-targeting have been described previously. For example, in basic research, scientists may employ “same seed” controls (i.e. siRNAs containing the same seed sequence as the experimental siRNA but with central mismatches that prevent silencing of the target of interest) to discern on-target from off-target effects (Boudreau et al, 2011). Furthermore, off-targeting from synthetic siRNAs can be reduced by chemical modifications or lower doses (Bramsen et al, 2010; Caffrey et al, 2011; Jackson et al, 2006; Vaish et al, 2011; Wang et al, 2009); however, specificity could be enhanced further by employing seeds with low POTS. By contrast, for expressed RNAi forms (e.g. shRNAs), our approach provides the only broadly applicable methodology to limit off-targeting potential. Although sequence-specific effects on hairpin expression, stability, and processing may also contribute to off-targeting potential, our data indicate that POTS is a good predictor of off-targeting for RNAi expression vectors. This is particularly important because dosing from RNAi expression vectors cannot be as readily controlled, and shRNA-induced toxicities have been reported by several groups (Boudreau et al, 2009a; Grimm et al, 2006; Martin et al, 2011; McBride et al, 2008). Given the extensive use of RNAi expression systems in the laboratory and in therapeutic development, siSPOTR should serve as a valuable tool for the research community.

SiSPOTR can easily be used in conjunction with other siRNA design algorithms (e.g. those weighted towards efficacy) to annotate their outputs with off-targeting potential and related information. For instance, one can use Applied Biosystems’ hyperfunctional (i.e. highly potent) siRNA design tool to identify hyperfunctional candidate sequences, which can then be input into the siSPOTR tool to retrieve their POTS values (Wang et al, 2009). This combined approach seeks siRNAs that balance potency with low off-targeting potential, offering an attractive means to identify therapeutic siRNAs for disease-relevant targets, particularly larger genes, which have numerous low POTS siRNAs available (Table 4-1).
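The combined workflow described above can be expressed as a simple filter that preserves the potency ordering of an external tool while discarding candidates above the POTS cut-off. The inputs are hypothetical. One is a potency-ranked list of candidate IDs, such as one exported from an efficacy-focused design tool. The other is a mapping of IDs to POTS values, such as those retrieved from siSPOTR.

```python
from __future__ import annotations


def potent_and_specific(potency_ranked: list[str],
                        pots: dict[str, float],
                        cutoff: float = 50.0) -> list[tuple[str, float]]:
    """Keep candidates with known POTS below `cutoff`, in their original potency order."""
    return [(sid, pots[sid]) for sid in potency_ranked
            if sid in pots and pots[sid] < cutoff]
```

Keeping the external ranking, rather than re-sorting by POTS, reflects the division of labour described above: the first tool judges potency, and POTS acts as a specificity gate.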

SiSPOTR allows users to query the identities of predicted seed-based off-target transcripts as a means to avoid silencing potentially important cellular genes (e.g. those involved in cell cycle and viability). Off-target identity is an important determinant of the overall detrimental effects caused by disrupting gene networks, and thus of the tolerability of a given siRNA. However, declaring a predicted off-target to be important remains difficult because it depends on numerous variables [e.g. experimental system (i.e. cell type), duration and extent of knockdown, identities of other off-targets (e.g. a two-hit model), etc.]. Thus, although researchers should consider the identities of predicted off-targets, it stands to reason that minimizing the off-targeting potential of the siRNA seed will inherently reduce the likelihood of unintentionally silencing important genes and further limit downstream events associated with cascading gene networks.

Finally, siSPOTR currently supports RNAi sequence design for human and mouse experimental systems. However, all low POTS heptamers contain CpG motifs, which are consistently sparse throughout mammalian genomes, and the POTS rankings of heptamers in mouse and human are significantly correlated (r2>0.938, plot not shown). These observations suggest that siSPOTR is likely applicable to other mammalian species.


Figure 4-1. Diagram of on- and off-target silencing by siRNAs.

(A) Cartoon depicting a siRNA duplex designed to exhibit proper strand-biasing [i.e. strong G-C (blue) and weak A/G-U (red) binding at the respective 5’ and 3’ ends of the sense strand] and contain a low off-targeting potential seed (green highlight). Upon loading into RISC, the antisense strand may direct on-target silencing (intended) and off-target silencing (unintended). (B) Schematic highlighting the relationship between the frequencies of seed complement binding sites in the 3’UTRome and the off-targeting potential for siRNAs. Contributed by Ryan Boudreau.


Figure 4-2. Effect of siRNA off-targeting potential on gene silencing capacity.

A siRNA database composed of 2431 randomly designed siRNAs (targeting 31 unique mRNAs) and accompanying silencing data (Huesken et al, 2005) was used to determine whether low off-targeting potential siRNAs (i.e. those having <2000 potential off-targets based on seed complement hexamer distributions in human RefSeq 3’UTRs; blue) have gene silencing capacities similar to those of the remaining 2068 siRNAs (mid-to-high off-targeting potentials; red). Roughly 1 in 4 of the low off-targeting potential siRNAs achieved >80% silencing (a commonly accepted threshold for potency), and overall their average efficiencies were comparable to those of the remaining siRNAs (~66% and 69% knockdown, respectively; dotted lines). (Contributed by Ryan Spengler).


Figure 4-3. Formulation and distribution of POTS (potential off-targeting score).

(A) Illustration of seed site types, with seed sequences highlighted in green. The adenosine corresponding to position 1 is highlighted in yellow and represents a defining feature of the 7A1 and 8mer binding site types. (B) The effect of seed site type on off-target silencing was determined using data from 54 microarray experiments testing unique siRNAs in HeLa cells. Cumulative distribution plots of gene expression values are shown for transcripts grouped by the binding site type present. Only transcripts containing single sites of a given type were considered. ***Student’s t-test indicated that the most significant divergence of the repressive potentials among these site types occurs at ≤ -0.3 Log2 fold-change (P<0.001). (C) Schematic illustrating how POTS is calculated using seed site type frequency and probability of repression (PR) values, shown above each respective site type. (D) The distribution of POTS scores, based on human 3’UTR sequences, for all 16,384 possible heptamers is plotted. POTS<50 is highlighted to indicate a relevant cut-off employed for the purposes of this manuscript (refer to the ‘Results’ section for further information regarding the relevance of this value). (Panels A and C contributed by Ryan Boudreau; Panels B and D contributed by Ryan Spengler).


Figure 4-4. Correlation of POTS ranks across tissues.

Tissue-specific POTS values for 177 human cell lines and tissues (BioGPS) were calculated based on genes expressed (median of probeset expression values ≥100) in those tissues. (A) POTS values calculated using all 3’UTR sequences (Overall POTS) were correlated with those calculated by the 177 expression profiles (Spearman rank correlation). The histogram and box plot show the variation of correlation coefficients (r2) for each pairwise comparison (error bars = 2-98th percentile). (B) The scatter-plot shows the correlation of Overall POTS scores with the tissue-specific POTS distributions with the worst calculated correlations (r2 0.9982-0.9986).


Figure 4-5. Workflow schematic for designing siRNAs targeting human PPIB using the siSPOTR algorithm.

All 631 possible siRNAs targeting the human PPIB coding sequence (CDS) were filtered based on strand biasing [i.e. strong G-C (blue) and weak A/G-U (red) binding at the respective 5’ and 3’ ends of the sense strand] and GC-content, and the number of siRNAs passing each criterion is provided. Note: the asterisk denotes a cytosine base at the 3’ end of the target site; this base can be converted to a uridine to produce a weak G:U base pair in the resulting siRNA duplex. The heptamer seed sequence used for POTS determination is highlighted. (Contributed by Ryan Boudreau).


Figure 4-6. Validation of siSPOTR: efficacy and off-targeting.

(A) SiRNA efficacy was evaluated using a database of 2431 randomly designed siRNAs with accompanying silencing data. The number of siRNAs passing each stage of our stepwise filtering process is indicated, along with the number of potent sequences among them (i.e. those with >80% silencing efficacy). *siDesign Center (Dharmacon) was used for comparison by inputting the relevant target gene sequences into the online tool (N=29) and intersecting the top ten hits for each gene with the 2431 siRNAs. The box-and-whiskers plot shows the maximum and minimum gene silencing values (whiskers) and the upper and lower quartiles (box). The accompanying Venn diagram shows that siSPOTR identified five unique and effective sequences not present among the siDesign top hits. (B-D) Microarray data from experiments testing 40 unique siRNAs were used to assess the reliability of POTS as an indicator of off-targeting potential. (B) Heatmaps representing sequence-specific gene “suppression signatures” unique to each siRNA were generated using hierarchical clustering of significantly down-regulated genes (>3 standard deviations from the mean) among the datasets on a per-target-gene basis (i.e. GAPDH, PPIB and No Target), and columns were ordered and parsed by POTS for each group.


Figure 4-6. Continued. (C) A qualitative representation of “suppression signature” size (i.e. sum of dark blue regions) for each column is shown. The red dotted line marks the largest “suppression signature” among the siRNAs with POTS<50. (D) Spearman rank correlation of the POTS scores and numbers of down-regulated off-targets (i.e. transcripts with 3’UTRs containing 7- and 8-mer seed binding sites and ≤ -0.3 Log2 fold-change) observed for each siRNA is plotted. Linear regression lines, including correlation coefficients and p-values, for all data points (dotted line) and black dots (solid line) are provided. Red dots represent overt outliers. (Panels A, B and C contributed by Ryan Boudreau; Panels A and D contributed by Ryan Spengler).


Figure 4-7. Spearman rank correlation of final POTS values.

Spearman rank correlation of the POTS scores and numbers of down-regulated off-targets (i.e. transcripts with 3’UTRs containing 7- and 8-mer seed binding sites and ≤ -0.3 Log2 fold-change) observed for each siRNA is plotted. Data consist of the training and validation groups combined. Linear regression lines, including correlation coefficients and p-values, for all data points (dotted line) and black dots (solid line) are provided. Red dots represent overt outliers.


Figure 4-8. Effect of POTS on off-targeting from hairpin-based RNAi expression vectors.

HEK293 cells were transfected with U6 promoter-only or U6-driven hairpin-based RNAi expression plasmids (n = 4 for each treatment), and RNA was harvested 72 h later for microarray analysis. Two-way ANOVA was performed to detect differentially expressed genes among the treatment groups. Hierarchical clustering of differentially expressed genes (P < 0.0001, 827 genes) was performed to visualize the relationships among the treatment groups. Notably, all of the low POTS sequences (green) exhibit gene expression profiles that are more closely related to the U6 control, as compared to the remaining sequences which have medium (yellow) to high (red) POTS values. (Contributed by Ryan Boudreau).


Table 4-1. Comparison of siRNA design tools.

Values are numbers of candidate siRNAs with POTS<50** for each gene.
Gene       CDS (nt)  siSPOTR  siDesign  Genscript  DSIR  AppBio
SNCA            423        4         0          0     0       0
SOD1            465        4         1          0     0       0
RTP801          699       19         5          1     0       0
TOR1a           999       14         3          6     6       1
SCA3           1086        6         4          2     3       0
VEGF           1239       22         4          4     1       2
MYC            1365       31         7          2     4       3
BACE1          1506       18         0          2     0       0
KRT6a          1695       23         0          1     2       0
SCA1           2448       42         2          1     3       1
SCA7           2679       35         6          3     7       2
EGFR1          3633       47         5          3    13       2
BCR-Abl        3816       83         7          2     7       2
SCA2           3942       42         2          2    13       1
HTT            9435       82         3        N/A     8     N/A
APOB          13692       66         1        N/A    14     N/A
Total         49122      538        50         29    81      14
At least 4 siRNAs with POTS<50: siSPOTR 16 of 16; siDesign 7 of 16; Genscript 2 of 16; DSIR 8 of 16; AppBio 0 of 16

** POTS<50 serves as a relevant cut-off for purposes of this manuscript (refer to ‘Results’ section for further information regarding the relevance of this value).

N/A indicates that the online tool was unable to process transcripts of this length.

Contributed by Ryan Boudreau.


Table 4-2. The effect of seed position 8 on off-targeting potential by site frequency.

All possible 7mer (nt 2-8) seed sequences were grouped according to their common core 6mer (nt 2-7). The number of 3’UTRs containing any 6mer binding motif was counted, and the number of these putative targets containing at least one 8mer, 7M8 or 7A1 site, given each variant base at position 8, was also tallied. The ratio between the maximum and minimum numbers of genes among the four heptamers was then calculated for each group. The 10 groups with the highest ratios are shown.

6mer = # 3’UTRs with a 6mer seed binding site (nt 2-7); A, C, G, T = # 3’UTRs with an 8mer, 7M8 or 7A1 binding site given that base at seed position 8 (N).
Seed (nt 2-8)   6mer      A      C      G      T  Max/Min
CGCGATN          302     47     93    201     40     5.03
TCGCGCN          343     97     49    220     66     4.49
TCGAACN          852    564    134    225    234     4.21
ATCGCGN          288     46     65    182     44     4.14
AATCGCN          954    596    149    268    298     4.00
ATCCGCN          864    228    137    529    211     3.86
CGATTCN         1070    675    187    343    329     3.61
AGGCGTN         1754    397   1194    431    332     3.60
ACCGCGN          536    110    152    295     84     3.51
AGCCGAN         1483    272    945    319    307     3.47
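The Max/Min column in Table 4-2 (and Table 4-3) can be reproduced by grouping heptamer seeds by their shared nt 2-7 core and comparing the four position-8 variants. The sketch below uses the CGCGATN row as input (reported as 5.03 after rounding); the function names are illustrative.

```python
from __future__ import annotations

from collections import defaultdict


def group_by_core(values_by_heptamer: dict[str, float]) -> dict[str, dict[str, float]]:
    """Group seeds written as nt 2-8 by their 6mer core (nt 2-7); inner key is nt 8."""
    groups: dict[str, dict[str, float]] = defaultdict(dict)
    for heptamer, value in values_by_heptamer.items():
        groups[heptamer[:6]][heptamer[6]] = value
    return dict(groups)


def max_min_ratio(by_position8: dict[str, float]) -> float:
    """Ratio of the largest to the smallest value among the position-8 variants."""
    values = list(by_position8.values())
    return max(values) / min(values)


# CGCGATN row of Table 4-2: 3'UTRs with an 8mer, 7M8 or 7A1 site per position-8 base.
row = {"CGCGATA": 47, "CGCGATC": 93, "CGCGATG": 201, "CGCGATT": 40}
print(max_min_ratio(group_by_core(row)["CGCGAT"]))
```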


Table 4-3. The effect of seed position 8 on off-targeting potential by POTS

The same as Table 4-2, except here the POTS values for the core 6mer sequence given A, T, G or C at position 8 are provided. The ratio between the maximum and minimum POTS in each seed group is provided.

6mer = # 3’UTRs with a 6mer seed binding site (nt 2-7); A, C, G, T = POTS value given that base at seed position 8 (N).
Seed (nt 2-8)   6mer      A      C      G      T  Max/Min
GATTACN         3430    217    170    190    347     2.04
CGCGATN          302     10     12     19     10     1.95
TCGAACN          852     52     27     33     34     1.93
ACACACN         5932    620    543    588   1044     1.92
AATCCCN         4697    341    274    324    525     1.92
ATATACN         5092    419    308    356    580     1.88
ATGTACN         4083    262    199    233    372     1.87
CGATTCN         1070     64     34     45     45     1.86
GTAATCN         3134    265    209    387    269     1.85
AGGCGTN         1754     69    120     71     65     1.85


Figure 4-9. Comparison of off-targeting potentials among shRNA libraries.

A histogram and complementing table presenting the POTS distributions and genome-wide coverage of shRNA library sequences are shown for our “Low POTS” library (green) and the TRC library (red). The POTS distribution of all possible heptamers (blue) serves as a reference. The range encompassing 90% of all sequences for each shRNA library is indicated. Yellow highlights intersect to emphasize the coverage disparities at a key point; POTS<50 provides a conservative cut-off for low off-targeting potential, and at least 4 siRNAs are desired for a given gene when generating a library or performing initial efficacy screening.


CHAPTER V

FINAL DISCUSSION

Competitive Endogenous RNAs

Experimental manipulation of miRNA activity has long relied on blocking or sequestering miRNAs with synthetic antagomirs and expressed miRNA sponges. These molecular tools showed that, at least in principle, miRNA activity can be regulated competitively. In Chapter 3, lncRNAs were proposed as miRNA “sponges,” serving as endogenous analogs of these artificial inhibitory tools. As described briefly in that chapter, recently published functional evidence suggests that competitive endogenous RNAs (ceRNAs) take many forms, including long intergenic noncoding RNAs (lincRNAs) similar to PSMI16, pseudogenes, and even protein-coding mRNAs (Cesana et al, 2011; Hansen et al, 2013; Karreth et al, 2011; Poliseno et al, 2010; Sumazin et al, 2011; Tay et al, 2011). Based on these reports, “ceRNA” describes a functionally diverse array of RNA classes, much as “RNAi” describes a general process mediated by miRNAs, endo-siRNAs, piRNAs and the like. Future work will likely involve functional characterization of more ceRNA:miRNA interactions, along with the physiological or pathophysiological pathways in which they function. Additionally, other RNA species, such as the recently reported circRNAs (Hansen et al, 2013), may also act as ceRNAs.

The observation that pseudogenes such as PTENP1 function as ceRNAs adds another connection between transposons and miRNAs. PTENP1 is an example of a “processed” pseudogene. Processed pseudogenes are created when a mature, spliced transcript is reverse transcribed and integrated into the genome by retrotransposon- or retrovirus-encoded proteins. For example, PTENP1 formed when a LINE1 element mobilized and reverse transcribed a fully processed PTEN transcript. In the Poliseno et al. study, many of the conserved MREs from the PTEN 3’UTR (e.g. for miR-19, -20, -21, -26 and -214) were still intact in PTENP1, conferring its ceRNA activity. Interestingly, PTENP1 is present only in apes, as no syntenic locus is found in the rhesus (Old World monkey), marmoset (New World monkey) or mouse genomes. This exemplifies how primate-specific transposition activity can alter the activity of conserved miRNAs.

The fact that PTENP1 retains many MREs from the parent PTEN transcript also reveals an important nuance differentiating pseudogene ceRNAs from other ceRNAs. Most mRNAs, like PTEN, are coordinately regulated by multiple miRNAs, and a pseudogene retaining the parent 3’UTR could compete for all of them. Pseudogene ceRNAs would therefore likely have the most potent effect on the expression of the parent gene and of any other gene bound by the same set of miRNAs. In contrast, lincRNA ceRNAs like PSMI16 have numerous binding sites for a given miRNA. I hypothesize that lincRNA ceRNAs globally affect the targets of a single miRNA family, whereas pseudogene ceRNAs more specifically regulate their parent genes.

Off-targeting and RNAi design

We took advantage of mRNA degradation through miRNA-like interactions to detect off-target effects of exogenous RNAi triggers after their delivery to cells and tissues. We found that miRNA-like changes to cellular expression profiles were extensive and, in some cases, that these broad transcriptional perturbations caused cell toxicity. It stands to reason, then, that rationally designing RNAi triggers with low off-targeting potential would reduce the probability of generalized transcriptional disturbances and subsequent toxicity.

Although our siSPOTR algorithm can generally reduce the probability of off-targeting, we also found that some low off-targeting potential sequences induced toxicity in vivo. This suggests that not all off-targeting can be avoided, and that empirical testing of RNAi triggers is required to assess their overall safety. Future efforts to improve predictions of RNAi specificity would benefit from closer analysis of sequences reported as toxic in the literature. We are currently working on ways to “switch” the off-targeting profile of exogenous artificial miRNA triggers found to be toxic. We have found that, for an antisense RNA with a “low POTS” seed that induces unintended toxicity, single-base changes to the seed sequence alter the off-target profile. Assuming that at least one of the original sequence’s off-targets is problematic when suppressed, switching to another low POTS seed should avoid most, if not all, of the original off-target genes. To test this, we are working with an artificial miRNA that effectively silences Huntingtin (HTT) expression but induces behavioral deficits in wild-type C57BL/6 mice. Because similar constructs targeting HTT have been tested in nearly identical experimental settings and achieved comparable levels of HTT repression (Boudreau et al, 2009b; McBride et al, 2008), we hypothesize that sequence-specific off-target effects cause this phenotype. So far, experiments performed by Alex Mas Monteys, using constructs I designed to alter seed sequences while retaining potency, reveal that directed single-base mutations in the toxic miRNA’s seed preserve HTT silencing efficacy in vitro. Bioinformatic target site predictions indicate that very few seed-mediated targets overlap between the toxic trigger and the modified ones. The next step is to inject the original or modified sequences into C57BL/6 mice as before and determine whether the mutations correct the toxic phenotype. Notably, if toxicity persists, it would likely reflect silencing of other transcripts whose expression must be maintained at or near 100% for cell viability.
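The target-overlap comparison mentioned above can be illustrated with a minimal sketch. Predicted targets are taken here as genes whose 3’UTR contains the 7M8 site (the reverse complement of seed nt 2-8), and overlap is summarized with the Jaccard index. This site definition is simplified relative to the full site-type model, and the seeds (written in the DNA alphabet), toy 3’UTRs and function names are hypothetical.

```python
from __future__ import annotations

COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMP)[::-1]


def predicted_targets(seed7: str, utrs: dict[str, str]) -> set[str]:
    """Genes whose 3'UTR contains the 7M8 site complementary to seed nt 2-8."""
    site = revcomp(seed7)
    return {gene for gene, utr in utrs.items() if site in utr}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared fraction of the union; 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

A Jaccard value near zero between an original seed and a single-base variant corresponds to the “very few overlapping targets” situation described in the text.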

If the single base mutations prove effective in mitigating the toxic phenotypes that

manifest from seed-mediated off-target effects, it follows that the same changes could be

made to reduce the off-target potential of high POTS sequences. As discussed in Chapter

4, commercial suppliers of pre-designed RNAi sequences focus on designing the most

potent sequences for their customers. As also noted in Chapter 4, because low POTS seeds are relatively rare, most sequences designed for potency rather than for seed specificity will likely have high off-targeting potential. However, we found that all low


POTS seed sequences contained at least one “CG” dinucleotide. This dinucleotide is

known to be relatively infrequent in mammalian genomes. On average, every additional

“CG” dinucleotide in a 7-mer motif results in a 10-fold reduction in 7-mer frequency (Garcia et al, 2011). Therefore, we expect that if we start with a highly potent RNAi sequence with relatively high off-target potential, a single base change in the seed that introduces a “CG” dinucleotide could greatly reduce its off-targeting potential. If these

mutations have minimal impact on silencing efficacy, as we have seen with the HTT

sequences thus far, we could greatly increase our ability to design low off-targeting

sequences, and perhaps even increase our stringency in screening for potency.
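The CG-substitution strategy can be made concrete with a minimal Python sketch. It enumerates single-base substitutions of a 7-nt seed (miRNA nucleotides 2-8) that add a CG dinucleotide and attaches a rough relative-frequency estimate based on the ~10-fold-per-CG figure cited above (Garcia et al, 2011). The function names are hypothetical, the miR-513a-5p seed from Table A-1 is used only as a convenient input, and the code cannot judge whether a variant retains silencing potency, which still has to be tested empirically. Because the reverse complement of CG is CG, a CG added to the guide seed also appears in every seed match.

```python
def cpg_count(seq: str) -> int:
    """Number of CG dinucleotides in a DNA/RNA sequence (U is read as T)."""
    s = seq.upper().replace("U", "T")
    return sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")


def cpg_introducing_variants(seed: str) -> list[tuple[int, str, str, str, float]]:
    """Single-base substitutions of a 7-nt seed that increase its CG count.

    Returns (miRNA position, original base, new base, variant seed,
    estimated relative seed-match frequency). The frequency estimate is a
    rough rule of thumb (assumption): ~10-fold fewer matches per added CG.
    """
    s = seed.upper().replace("U", "T")
    if len(s) != 7:
        raise ValueError("expected a 7-nt seed (miRNA nucleotides 2-8)")
    base_cpg = cpg_count(s)
    variants = []
    for i, original in enumerate(s):
        for new in "ACGT":
            if new == original:
                continue
            variant = s[:i] + new + s[i + 1:]
            extra = cpg_count(variant) - base_cpg
            if extra > 0:
                # i + 2 converts the 0-based seed index to miRNA position 2-8.
                variants.append((i + 2, original, new, variant, 10.0 ** -extra))
    return variants


for pos, old, new, variant, rel in cpg_introducing_variants("TCACAGG"):
    print(f"position {pos}: {old}->{new}  {variant}  ~{rel:g}x original frequency")
```

Each candidate would then be screened for retained knockdown, as was done for the HTT sequences.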

Emerging technologies in the study of miRNA biology

The data presented in this work, as well as many of the cited publications, have revealed that the mechanisms underlying miRNA biogenesis and function are far more complex than represented in the canonical pathways outlined in Chapter 1. Integrating these newer pathways and determining how broadly each operates across biological systems will be important tasks in the future. For example, Ago HITS-CLIP and similar

technologies will be essential for verifying to what extent and in which biological settings

TE-derived or lncRNA-resident MREs are actually occupied by Ago complexes. Based

on the “off-targeting” phenomenon observed with exogenous RNAi triggers, it is clear

that the RNAi machinery can be pushed to silence biologically irrelevant targets at sufficient doses. HITS-CLIP will provide a better picture of what is actually engaged by

RISC machinery under physiological conditions.

On the other hand, the physiological role of the Ago-bound complexes also

remains an open question. Our current understanding of miRNA function is largely based

upon perturbations of individual miRNA levels in cell culture models. Less clear is what

function miRNAs play in a relatively static setting of terminally-differentiated cells. Even

in comparing miRNA binding profiles in normal versus disease states, the question will


remain as to which changes are causative and which are reactions to the disease state. I

believe that in order to effectively use and interpret HITS-CLIP to study these kinds of

questions, we should first understand how Ago binding profiles relate to gene expression

changes in acute disease settings. For example, what happens to Ago binding profiles

during acute ischemia brought on by stroke or myocardial infarction? Furthermore, how

does the response differ between these settings, in which very different mRNA and miRNA

profiles are intrinsically present? Following a common theme in biology, it seems likely

that some miRNAs will be involved in an immediate reactionary phase, followed by

another group guiding a return to homeostasis. Among the most interesting findings will

be determining to what extent the concentration of the miRNA or the targets influence the

activities of one another, given that lncRNAs, pseudogenes and mRNAs appear to

compete for miRNA binding.

Alternatively, given a setting such as B-cell chronic lymphocytic leukemia (B-CLL), where the miR-15/16 family is deleted in nearly half of all cases (Calin et al, 2002),

HITS-CLIP and other high-throughput techniques would help uncover which

physiological changes result from loss of miR-15 and -16, and which come from the

resulting void filled by the miRNAs that remain. As expected, many validated targets for

miR-15/16 are upregulated in response to the chromosomal deletion. However, the

sudden disappearance of such a highly-expressed miRNA would also likely increase the

effective silencing capacity of the remaining miRNAs. In a simplistic setting, given a loss

of the miR-15/16 family with no net change in expression of miRNA machinery or other

mature miRNAs, more Ago proteins would be free to engage the remaining miRNAs.

The extent to which these miRNAs contribute to the observed gene expression changes

remains an open question. Ago HITS-CLIP could be performed to compare B-CLL cells carrying the miR-15/16 deletion with B-CLL cells lacking the deletion or with normal B cells. Analysis of the Ago binding profile should show a complete loss of the miR-15/16-dependent peaks. If the remaining miRNAs do indeed have increased binding potential


with the miR-15/16 locus deleted, then there should be a concomitant increase in peaks or

peak height associated with these remaining miRNAs. Performing RNA-seq on the total

RNA in these cells will also be important for comparative HITS-CLIP to account for

peak changes due to changes in mRNA expression levels.
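The RNA-seq correction described above can be illustrated with a minimal Python sketch. For each Ago peak, CLIP signal is divided by the abundance of its host transcript before the two conditions are compared, so that a peak whose host mRNA simply doubles is not mistaken for increased Ago occupancy. This is one simple normalization chosen for illustration, not a method used in this work; the function name, the pseudocount, and the assumption that both inputs are already library-size-normalized (e.g., reads per million) are all hypothetical.

```python
import math


def occupancy_log2_change(clip_ref: float, clip_test: float,
                          rna_ref: float, rna_test: float,
                          pseudocount: float = 1.0) -> float:
    """log2 change in Ago CLIP signal per unit host-transcript abundance.

    Inputs are assumed (hypothetically) to be library-size-normalized counts
    for one peak (CLIP) and its host transcript (RNA-seq) in a reference
    condition (e.g., B-CLL without the miR-15/16 deletion) and a test
    condition (e.g., B-CLL with the deletion). The pseudocount avoids
    division by zero for peaks or transcripts with no reads.
    """
    ref = (clip_ref + pseudocount) / (rna_ref + pseudocount)
    test = (clip_test + pseudocount) / (rna_test + pseudocount)
    return math.log2(test / ref)


# CLIP signal doubles while the host mRNA is unchanged: +1.0 (more occupancy).
print(occupancy_log2_change(9, 19, 9, 9))
# CLIP signal doubles only because the host mRNA doubled: 0.0.
print(occupancy_log2_change(9, 19, 9, 19))
```

Peaks belonging to the remaining miRNAs with consistently positive values would support the redistribution model; changes that vanish after normalization would point to altered mRNA levels instead.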

What has become quite clear over the past several years is that a close partnership

between computational and molecular biologists is essential for truly understanding the

function of these small non-coding RNAs. No miRNA or miRNA:target interaction exists

in a vacuum, and microarray, RNA-seq and HITS-CLIP techniques will help to delineate

some of the more complex interactions. At the same time, the biologist’s role becomes all the more important: to frame the settings and biological questions in which these techniques can be effectively employed, correctly interpreted and ultimately validated.

As the miRNA field moves forward, largely guided by high-throughput

sequencing technology, researchers should approach the roles that miRNAs play with a degree of naivety. Reading through a 2004 review in Cell, entitled “MicroRNAs: genomics,

biogenesis, mechanism and function” (Bartel, 2004), it is apparent that the assumptions guiding current research in these areas have changed very little in the nearly ten years since the review’s publication. Although such assumptions are not necessarily invalid, following them indiscriminately has relegated many important observations to the status of puzzling curiosities. Setting aside strict a priori assumptions, careful interpretation of the information gleaned from the new technologies mentioned above could illuminate the importance of intriguing observations such as miRNAs up-regulating gene expression (Vasudevan et al, 2007), “isomiR” production (Guo & Lu,

2010; Martí et al, 2010), or even apparent loading into Ago and functional silencing

mediated by miRNA precursors (Tan et al, 2009). In general, miRNA studies primarily

report miRNA-mediated repression of target genes containing 3’UTR MREs based on the

most commonly-annotated miRNA isoform. What remains unclear is to what extent the

research is biased due to researchers only choosing to study the canonical interactions, or


whether the non-canonical pathways are actually rare occurrences in nature. Ultimately,

the coming years should prove exciting for the miRNA field, and it has been a privilege

to play some small part in contributing to knowledge and discourse in this area.


APPENDIX

ADENOSINE DEAMINATION IN HUMAN TRANSCRIPTS

GENERATES NOVEL MICRORNA BINDING SITES

Abstract

Animals regulate gene expression at multiple levels, contributing to the

complexity of the proteome. Among these regulatory events are post-transcriptional gene

silencing, mediated by small noncoding RNAs (e.g., microRNAs), and adenosine-to-inosine (A-to-I) editing, catalyzed by Adenosine Deaminases that Act on double-stranded

RNA (ADAR). Recent data suggest that these regulatory processes are connected at a

fundamental level. A-to-I editing can affect Drosha processing or directly alter the

microRNA (miRNA) sequences responsible for mRNA targeting. Here, we analyzed the

previously reported adenosine deaminations occurring in human cDNAs, and asked if

there was a relationship between A-to-I editing events in the mRNA 3’ untranslated

regions (UTRs) and mRNA::miRNA binding. We find significant correlations between

A-to-I editing and changes in miRNA complementarities. In all, over 3,000 of the 12,723

distinct adenosine deaminations assessed were found to form 7-mer complementarities

(known as seed matches) to a subset of human miRNAs. In 200 of the ESTs, we also

noted editing within a specific 13 nucleotide motif. Strikingly, deamination of this motif

simultaneously creates seed matches to three (otherwise unrelated) miRNAs. Our results

suggest the creation of miRNA regulatory sites as a novel function for ADAR activity.

Consequently, many miRNA target sites may only be identifiable through examining

expressed sequences.


Introduction

A-to-I RNA editing catalyzed by dsRNA-specific ADAR refers to the conversion

of adenosine to inosine in double-stranded (ds) or stem-loop regions of precursor mRNAs

(Bass, 2002). Experimental evidence demonstrates that, whether found in a codon,

anticodon or mature miRNA, inosine, like guanine, preferentially base-pairs with

cytosine (Yoshida et al, 1968). Several characterized examples of amino acid changes

created by adenosine deamination show that ADARs can regulate gene expression by

directing the synthesis of distinct proteins from a single open reading frame (Bass, 2002;

Burns et al, 1997). Recent work by Li et al. confirms that editing events occur at a much

higher frequency within noncoding regions (Li et al, 2009). Comparisons of human EST

and genomic sequences have identified thousands of distinct ADAR deaminations

occurring in many different genes (Levanon et al). Possible functions for editing events

include altered splicing, RNA localization, nuclear retention, mRNA stability and

translational efficiency (reviewed in Chen & Carmichael, 2008). Interestingly, most editing sites occur in Alu elements (Athanasiadis et al, 2004; Hundley et al, 2008; Kim et al; Levanon et al, 2004), the majority of which are in UTRs (Hundley et al, 2008; Levanon et al).

Experimental evidence suggests that miRNA-mediated post-transcriptional gene

silencing and A-to-I editing are interrelated (Kawahara et al, 2007a; Kawahara et al,

2007b; Luciano et al, 2004; Scadden, 2005). MiRNA transcripts have been found to undergo ADAR deamination, with editing affecting Drosha processing, Dicer processing or mRNA targeting (Kawahara et al, 2008; Kawahara et al, 2007b; Yang et al, 2006).

Work by Kawahara and colleagues showed that ADAR deamination of the seed region of

miR-376 alters the gene set regulated by the edited versus the unedited miRNA

(Kawahara et al, 2007b). In this work, we asked if A-to-I editing of the target mRNA,

rather than the miRNA, could impact mRNA::miRNA binding by creating seed matches.


We examined the previously reported 12,723 distinct ADAR editing sites (Levanon et al,

2004) and found that A-to-I editing creates perfect complementarities to human miRNA seeds.

Results

Adenosine deamination creates miRNA complementarities

ADAR-mediated conversion of adenosine to inosine allows inosine:cytosine pairing because inosine is chemically similar to, and functionally equivalent to, guanosine (Figure A-1A). The preferential base pairing of inosine with cytosine, a well-established means of regulating RNA:RNA interactions by altering sequence complementarity, was described several decades ago in codon:anticodon interactions (Yoshida et al). More recently, the direct ADAR deamination of a miRNA (miR-376) was found to

alter miRNA target selection (Kawahara et al, 2007b). Over 12,000 A-to-I editing sites

have been identified in human mRNAs, with nearly 90% of these occurring in UTRs (Athanasiadis et al, 2004; Kim et al, 2004; Levanon et al, 2004). Because 3’ UTRs are widely accepted as the predominant site of miRNA:mRNA association, we asked whether deamination of 3’ UTR A-to-I editing sites (Levanon et al, 2004) significantly altered

their complementarity to currently annotated human miRNAs.

Although miRNAs are generally ~21-22 nt in length, their association with target mRNAs is typically mediated through a seven base pair (bp) interaction involving nucleotides 2-8 (5’ to 3’) of the mature miRNA (Lai, 2002). This 7 nt sequence constitutes the miRNA “seed,” and its reverse complement in a target mRNA is termed a “seed match” (Lewis et al, 2005). Using a simple 7 bp seed scan of the 100 nt 5’ and 3’ of each of the 12,723 distinct deamination sites (Levanon et al, 2004), we identified miRNA seed matches that were created or lost. All sites were screened once with a central adenosine (unedited; identifies lost matches) and once with a central guanosine (edited; identifies created matches) (Figure A-1B). Using this approach, we

identified seed matches to 30 miRNA families that were significantly enriched (p ≤


1.8 × 10⁻⁵) in sequences bearing a central G position (Table A-1 and Table A-2).
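The edited-versus-unedited seed scan can be sketched in a few lines of Python. This is a minimal illustration, not the in-house Perl script used for the analysis: seeds are written 5’ to 3’ in DNA letters as in Table A-1, the central adenosine is replaced by G to model inosine, and a seed match counts as created by editing if it appears only in the edited sequence. The helper names are hypothetical. Applied to the unedited 12 nt motif discussed in the following section (CCUGUAAUCCCA), a single A-to-G change creates matches to both the miR-513a-5p seed (TCACAGG) and the miR-769-3p seed (TGGGATC).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq: str) -> str:
    """Upper-case DNA letters; U is read as T and inosine (I) as G."""
    return seq.upper().replace("U", "T").replace("I", "G")


def seed_match(seed: str) -> str:
    """Target-site sequence (5'->3') complementary to a 7-nt miRNA seed."""
    return to_dna(seed).translate(_COMPLEMENT)[::-1]


def seed_match_positions(seq: str, seed: str) -> set[int]:
    """0-based start positions of perfect 7-mer seed matches in seq."""
    s, site = to_dna(seq), seed_match(seed)
    return {i for i in range(len(s) - len(site) + 1) if s[i:i + len(site)] == site}


def matches_created_by_editing(seq: str, center: int, seed: str) -> list[int]:
    """Seed-match starts present only after A-to-I (read as G) at `center`."""
    s = to_dna(seq)
    if s[center] != "A":
        raise ValueError("expected an unedited adenosine at the edit position")
    edited = s[:center] + "G" + s[center + 1:]
    return sorted(seed_match_positions(edited, seed) - seed_match_positions(s, seed))


motif = "CCUGUAAUCCCA"  # unedited 12 nt motif; the edited A is at index 5
for name, seed in [("miR-513a-5p", "TCACAGG"), ("miR-769-3p", "TGGGATC")]:
    print(name, matches_created_by_editing(motif, 5, seed))
```

For the 201 nt Compugen sequences the edit position would be index 100; swapping the two sets in the final subtraction gives the seed matches lost upon editing.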

Strikingly, over 3,000 of the 12,723 sites form perfect miRNA seed complements if

deaminated. We termed these miRNA associating if deaminated (MAID) sites and found that most are localized to the 3’ UTR (Table A-2). While editing can also destroy sites

(not shown), we focus here on MAIDs and their ability to confer miRNA-mediated

regulation.

MiR-513 and miR-769-3p/-450b-3p specifically target deamination sites

We first examined the two strongest outliers, miR-513 and miR-769-3p/-450b-3p, in more detail. In the 12,723-sequence dataset representing unedited sequences, the average number of seed matches to miR-769-3p/-450b-3p at any 7 nt position was 0.79 (max = 4). This contrasts strongly with the 252 miR-769-3p/-450b-3p seed matches unique to the edited 3’

UTR dataset (Table A-1 and Figure A-2A). Similarly, the average number of seed

matches to miR-513 at any position was 0.63 (max = 4) vs. 257 when comparing the

unedited to the edited 3’ UTR flanking sequence and edit site. Therefore, for these

mRNAs, miR-513 and miR-769-3p/-450b-3p preferentially target deaminated sequences.

Upon closer examination, we found that ~190 of the matches to the miR-513 seed

(3’ GGACACU 5’) and miR-769-3p/-450b-3p seed (3’ CUAGGGU 5’) were created by

a single deamination within a common 12 nt motif (5’ CCUGUIAUCCCA 3’) (Figure A-2B). Finding an invariant guanine immediately 3’ of these 12 nt, and allowing for a single GU wobble at an adenosine or guanine immediately 3’ to the deamination site, extended the miR-513/-769-3p/-450b-3p MAID to 5’ CCUGUIRUCCCAG 3’. Thus, the simple scanning approach identified 288 distinct sites within this 13 nt motif that, when edited, form seed matches to miR-513 and miR-769-3p/-450b-3p (Figure A-2C).

Thus, MAIDs containing miR-513 and miR-769-3p/-450b-3p seed matches are

significantly enriched in a subset of the deamination sites originally identified by


Levanon et al. (Levanon et al, 2004) (not shown). Of note, this result was reproduced using the standalone TargetScan program (Lewis et al, 2005) without using conservation of seed matches as a ranking criterion.
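Searching a transcript for the extended 13 nt MAID motif (5’ CCUGUIRUCCCAG 3’, with R = A or G) reduces to a regular-expression scan of the unedited sequence for CCTGTARTCCCAG. The sketch below is a hypothetical helper, not the pipeline used here; it reports the 0-based index of the editable adenosine (offset 5 within the motif) and uses a zero-width lookahead so that overlapping occurrences are not skipped. The example sequence is synthetic.

```python
import re

# Unedited form of the 13 nt MAID motif: the editable A sits at offset 5,
# followed by R (A or G), as in 5' CCUGUIRUCCCAG 3'.
MAID_MOTIF = re.compile(r"(?=CCTGTA[AG]TCCCAG)")
EDIT_OFFSET = 5


def find_maid_edit_sites(seq: str) -> list[int]:
    """0-based positions of the editable adenosine in each MAID motif."""
    dna = seq.upper().replace("U", "T")
    return [m.start() + EDIT_OFFSET for m in MAID_MOTIF.finditer(dna)]


example = "GGGCCUGUAAUCCCAGUUCCUGUAGUCCCAG"  # synthetic test sequence
print(find_maid_edit_sites(example))
```

A scan of this kind over a 3’ UTR sequence would count motif copies such as the nine reported for DFFA, although whether each copy is actually edited can only be established from expressed sequences.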

MiR-513 and miR-769-3p repress deaminated sequences

To test if MAID sequences could serve as miR-513 and/or miR-769-3p/-450b-3p

targets, we constructed a series of luciferase reporters carrying either unedited sequences or ‘edited’ 13 bp MAIDs specific to miR-513/miR-769-3p/-450b-3p downstream of Renilla luciferase (Figure A-3A). MiR-513 expression vectors repressed TGAT (edited) target activity by ~50% and TGGT target activity by ~40% when co-transfected with the reporters (Figure A-3B). Replacing miR-513 with miR-769-3p in similar experiments

resulted in ~70% and ~60% reductions in Renilla luciferase activity, respectively (Figure

A-3B). Importantly, reporter activity from the ‘unedited’ reporter construct TAAT was

not affected when transfected with either miRNA (similar to control, data not shown).

We next repeated these experiments with pooled miRNA expression vectors and either

miRNA-specific or control miRNA inhibitors (Anti-miRs™) (Krutzfeldt et al, 2005). As

shown in Figure A-3B, activity was restored to near normal levels in the presence of

specific miR-513 and miR-769-3p inhibitors. These data demonstrate that MAIDs can be

specifically repressed by miR-513 and miR-769-3p.

To test whether MAIDs can confer miRNA regulation to endogenous mRNAs, we

examined the 3’ UTR of DFFA (DNA fragmentation factor alpha, also referred to as

ICAD). DFFA was selected for three reasons: one, the presence of nine 5’

CCUGUIRUCCCAG 3’ motifs within the 3’ UTR (Figure A-4A, B) (Hubbard et al,

2007); two, the prevalence of ESTs within the NCBI dataset showing ADAR activity in

the 3’ UTR (Wheeler et al, 2007); and three, our own sequencing confirmation of DFFA


deamination in cDNAs cloned from neuroblastoma (NB7) vs. HEK 293 cells (Figure A-4C).

Together, these characteristics present DFFA as a possible target for MAID regulation.

Two luciferase reporters were constructed to evaluate the ability of miR-513

and/or miR-769-3p to repress DFFA (Figure A-4D). These vectors encode Renilla luciferase

harboring either the DFFA edited 3’ UTR cloned from NB7 cells (DFFA-E), or the

unedited DFFA 3’ UTR cloned from HEK 293 cells (DFFA-U). DFFA-U and DFFA-E

constructs differ by a single adenosine deamination (293_2 vs. NB7_2; Figure A-4C). As

shown in Figure A-4E, miR-513 and miR-769-3p co-transfection repressed DFFA-E reporter activity by ~30% and ~60%, respectively. In contrast, DFFA-U reporter expression was

not repressed when transfected with either miRNA (similar to control, data not shown;

Figure A-4E). Experiments with pooled miRNA expression vectors and either miRNA-

specific or control miRNA inhibitors (Anti-miRs™) confirmed miRNA targeting activity

specific to the edited state. For DFFA-E, but not DFFA-U, Renilla activity was

significantly increased (to near normal levels) in the presence of Anti-miR-513 and Anti-

miR-769-3p (Figure A-4E). These experiments indicate that miR-513 and miR-769-3p

can regulate the DFFA mRNA 3’ UTR in an adenosine deamination-dependent manner.

MiR-769-3p represses DFFA expression specifically in cells that deaminate the DFFA 3’ UTR

Having confirmed that miR-513 and miR-769-3p selectively repress deaminated

DFFA 3’ UTRs within the context of our reporter assays, we next tested whether they

could repress endogenous DFFA protein expression. Because miR-513 and miR-769-3p

were either undetectable or expressed at very low levels in NB7 cells (data not shown and

Figure A-5A), we used overexpression plasmids. MiR-769-3p overexpression in NB7 cells resulted in a ~60% reduction in DFFA, a reduction not seen with miR-769-3p overexpression in HEK 293 cells (Figure A-5B, C). Co-transfection of Anti-miR-769-3p, but not a control

Anti-miR™, abrogated this repression (Figure A-5B). To ensure that these findings were the


consequence of miRNA expression, we repeated these experiments using commercially

synthesized miRNA precursor RNAs and found these similarly capable of silencing

endogenous DFFA levels in NB7 cells (data not shown). The cell line-specific reduction

of endogenous DFFA supports the hypothesis that miR-769-3p can regulate the

deaminated DFFA 3’ UTRs.

Discussion

In this work, we demonstrate that A-to-I editing can create miRNA target sites

and that these sites are functional in vitro. On the surface, our findings appear to

contradict a recent hypothesis article reporting no correlation between miRNA target

sites and A-to-I editing (Liang & Landweber, 2007); however, our data are largely in

agreement, and the apparent conflict can be explained by the scope of our study. The

prior work by Liang and Landweber (Liang & Landweber, 2007) searched for a

relationship between ADAR deaminations and 73 conserved miRNA families. In

contrast, we evaluated all human miRNAs. By extending the study beyond conserved

miRNAs, we identify 325 miRNA families with complementarities enriched at

deamination sites. Notably, the majority of the miRNAs identified are primate-specific and

would, therefore, not be identified in an analysis of evolutionarily conserved miRNAs. In

more recent work, Li and coworkers also found no enrichment of editing sites in miRNA

target genes when sites within Alu elements were excluded (Li et al).

While the simple scanning approach we used identified a number of 7 bp miRNA

seed matches that were formed upon editing, our results are likely an underestimation.

Our approach does not consider the formation (or loss) of 7mer-A1 sites, which are defined by TargetScan5.0 as a 6 bp perfect match to nucleotides 2-7 of the miRNA with an adenosine immediately 3’ of the seed match. Similarly, our scan would not identify sites with an

edited adenosine in position 1 of an 8mer seed match. Even with these possible

differences in absolute numbers of created MAIDs or miRNA-targeting sites lost upon


editing, analyses with TargetScan5.0 showed the same overall trends for the transcripts profiled.
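The site types that a 7-mer scan misses can be made explicit with a small classifier built on TargetScan’s published site definitions (8mer: match to miRNA nucleotides 2-8 plus an A opposite nucleotide 1; 7mer-m8: match to nucleotides 2-8; 7mer-A1: match to nucleotides 2-7 plus that A; 6mer: match to nucleotides 2-7 only). This is a minimal Python sketch, not TargetScan itself; the function name and the toy miRNA in the usage lines are hypothetical. The usage lines show the case mentioned above: editing the A1 adenosine of an 8mer downgrades it to a 7mer-m8 site, a change a 7-mer scan cannot see.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def _revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def classify_sites(target: str, mirna: str) -> list[tuple[int, str]]:
    """Seed-match sites in target, labeled with TargetScan-style site types.

    Returns (start of the 6-nt core match, site type). Inputs may use U or T;
    inosine (I) in the target is read as G.
    """
    t = target.upper().replace("U", "T").replace("I", "G")
    m = mirna.upper().replace("U", "T")
    core = _revcomp(m[1:7])   # pairs with miRNA nucleotides 2-7
    m8 = _revcomp(m[7])       # target base that pairs with miRNA nucleotide 8
    sites = []
    for i in range(len(t) - 5):
        if t[i:i + 6] != core:
            continue
        has_m8 = i >= 1 and t[i - 1] == m8          # base 5' of the core
        has_a1 = i + 6 < len(t) and t[i + 6] == "A"  # base opposite nucleotide 1
        if has_m8 and has_a1:
            kind = "8mer"
        elif has_m8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((i, kind))
    return sites


toy_mirna = "UUCACAGGAGCU"  # hypothetical miRNA; nucleotides 2-8 = UCACAGG
print(classify_sites("CCUGUGAA", toy_mirna))  # unedited A1 adenosine
print(classify_sites("CCUGUGAI", toy_mirna))  # A1 adenosine edited to I
```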

Alu sequences in transcripts are notable for harboring human miRNA seed

complementarities (Lehnert et al, 2009; Smalheiser & Torvik, 2006a) and we extend that

complementarity to A-to-I edited sites. The MAIDs identified in our work overlap with

adenosines that are edited more frequently than those at other positions, as quantified by

Kim et al (Kim et al, 2004; Supplementary Fig. 1). Interestingly, the sequences

surrounding and containing these sites are highly conserved among Alu families (Ray &

Batzer, 2005). Together with our experimental and informatics analyses, these findings

suggest that one outcome of primate mRNA 3’ UTR adenosine deamination is the

modulation of miRNA target sites.

Editing occurring within 3’ UTR-resident Alu elements can result in nuclear

retention of the transcripts (Chen et al, 2008); however, 3’ UTR-edited mRNAs can also

be associated with polysomes (Hundley et al, 2008), suggesting that not all edited

transcripts are retained in the nucleus. Furthermore, a more recent bioinformatics study

demonstrated that deletion of sequences between paired Alus can occur, potentially

removing miRNA target sites (Osenberg et al, 2009). Relevant to our work, Osenberg and

coworkers did not detect cleavage of human DFFA 3’ UTRs, suggesting that our

predicted sites remain intact in the final transcript. In future work, it will be interesting to determine which sequence isoforms of DFFA and other edited 3’ UTRs are expressed in the cytoplasm (or retained in the nucleus) in cells of different origin, in cells in variable states, or across different tissues.

Finally, although miRNA:target interactions have been catalogued using

bioinformatic approaches, predicted targets still require empirical validation. Factors

such as GU base-pairing, local secondary structure, target accessibility, and position

effects due to nucleotide composition clearly complicate accurate target prediction

(Smalheiser & Torvik, 2006b). Here, we demonstrate that some miRNA target sequences


are not detectable by querying only the genomic sequence. For the miRNAs we identify

as complementary to deaminated sequences, as well as miRNAs potentially targeting

other post-transcriptional editing events, accurate target identification may only be

possible through evaluating expressed sequences directly. More recent data comparing editing events across the tissues of an individual (Li et al, 2009), and deep sequencing that reveals rare editing events in the developing brain (Wahlstedt et al, 2009), will be useful for further analyzing the extent to which editing modulates miRNA target sites.

Materials and Methods

Informatics evaluation of ADAR deamination sites

The full, publicly available Compugen deamination dataset (12,723 characterized human deamination sites, each with 200 nt of flanking sequence: 100 nt 5’ and 100 nt 3’) was obtained from www.cgen.com/research/Publications. Perfect seed matches to all currently annotated human miRNAs were identified in each of the Compugen sequences using either an in-house Perl script, which counted seed match occurrences and recorded the position of each within the individual 201 nt Compugen sequences (as illustrated in Figure A-1), or an in-house Excel-based analysis, as well as the standalone version of TargetScan (TargetScan 5.0). Each 7-mer sequence was queried for perfect identity to the reverse complement of the 7 nt seed sequence of each human miRNA (miRBase 13.0). Statistical significance of individual miRNA seed occurrence was determined by Fisher’s exact test. Significance of A-to-I editing site and MAID occurrence in human cDNA databases was determined by chi-square test.
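Fisher’s exact test for a single miRNA can be computed from the hypergeometric distribution using only the Python standard library. The 2 × 2 layout below (edited vs. unedited sequence sets, each split by presence or absence of a seed match) and the one-sided alternative are assumptions made for illustration; the exact contingency tables behind Table A-1 are not specified here. In practice, `scipy.stats.fisher_exact` with `alternative="greater"` computes the same one-sided p-value.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for enrichment in the first row.

    Table layout (hypothetical for this illustration):
                      seed match   no seed match
        edited            a              b
        unedited          c              d
    Returns P(X >= a), with X hypergeometric and all margins held fixed.
    """
    n_total = a + b + c + d
    row1 = a + b          # size of the edited set
    col1 = a + c          # total sequences carrying a seed match
    denom = comb(n_total, row1)
    upper = min(row1, col1)
    numer = sum(comb(col1, k) * comb(n_total - col1, row1 - k)
                for k in range(a, upper + 1))
    return numer / denom  # exact integer arithmetic, one final float division


# Toy example: 3 of 3 edited vs. 0 of 3 unedited sequences carry a match.
print(fisher_exact_greater(3, 0, 0, 3))
```

Exact integer arithmetic avoids the underflow that a naive product of floating-point probabilities can hit for p-values as small as those in Table A-1.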

Vector construction

Unless otherwise indicated, PCR amplifications were performed in 40 μl reactions

at standard concentrations (1.5 mM MgCl2, 0.2 mM dNTP, 1× Biolase PCR buffer, 0.5

U Taq (Biolase USA, Inc., Randolph, MA), 0.5 µM each primer) and using standard


cycling parameters (94 °C - 3 min, (94 °C - 30s, 55 °C - 30s, 72 °C - 60s) 30 cycles, 72

°C - 3 min) then cloned into Topo TA PCR 2.1 (Invitrogen, Carlsbad, CA). RT-PCRs

were performed at 55 °C using RetroScript III Reverse Transcriptase (Invitrogen,

Carlsbad, CA) and oligo-dT 20mers. UTR amplifications were cloned into Topo TA PCR

2.1 and sequenced. Antisense reporters (TAAT, TAGT, TGAT, TGGT, Consensus-A and

Consensus-G) were constructed by oligonucleotide primer extension (amplifications

performed as above except cycle number was decreased to 25 and extensions to 10s) with

primers containing 5’ Xho-I and 3’ Spe-I restriction enzyme sites. Following digestion, amplicons were ligated into the Renilla luciferase 3’ UTR of the psiCheck2 (Promega, Madison, WI) vector, which had been linearized with Xho-I and Spe-I and then incubated with Antarctic

phosphatase (NEB, Ipswich, MA). The presence of an independently transcribed firefly

luciferase in these reporters allowed normalization for transfection efficiency.

Luciferase assays

HEK 293s were cultured in DMEM (10% FBS and 1% PS) in 12-well plates. At

90% confluency, cells were transfected following the Lipofectamine 2000 (Invitrogen,

Carlsbad, CA) protocol. As indicated, luciferase assays (n = 3) were performed on HEK

293 lysates following cotransfections of psiCheck2 (Promega, Madison, WI) luciferase

reporters with Alu promoter expression vectors (pAL-513, pAL-769-3p or pAL-1-control; Borchert et al, 2006) and/or Anti-miRs (Ambion®) following the manufacturer’s recommended guidelines. At 35 h, existing media was replaced with 1 ml fresh media. At

36 h, cells were scraped from well bottoms and transferred to 1.5 ml Eppendorf tubes.

Eppendorfs were centrifuged at 2000 RCF for 3 min, followed by supernatant aspiration

and cell resuspension in 300 μl of PBS. Cells were lysed by three freeze thaws and debris

removed by centrifuging at 3000 RCF for 3 min. 50 μl of supernatant was transferred to a

96-well MicroLite plate (MTX Lab Systems, Vienna, VA) then firefly and Renilla

luciferase activities measured using the Dual-GLO Luciferase® Reporter System


(Promega, Madison, WI) and a 96-well plate luminometer (Dynex, Worthing, West

Sussex, UK). RLUs were calculated as the quotient of Renilla luciferase/firefly luciferase

RLU and normalized to mock.
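The normalization described above (Renilla divided by firefly luciferase in each well, expressed relative to mock-transfected wells) is simple arithmetic; a minimal Python sketch follows. Reporting the mean and standard deviation of the per-replicate percentages is an assumption made here for illustration, and the readings in the usage lines are invented numbers, not data from this work.

```python
from statistics import mean, stdev


def percent_of_mock(renilla, firefly, mock_renilla, mock_firefly):
    """Renilla/firefly ratios expressed as percent of the mean mock ratio.

    Each argument is a list of raw luminometer readings, one per replicate
    well (n = 3 in the assays above). Returns (mean %, standard deviation %).
    """
    mock_ratio = mean([r / f for r, f in zip(mock_renilla, mock_firefly)])
    pct = [100.0 * (r / f) / mock_ratio for r, f in zip(renilla, firefly)]
    return mean(pct), stdev(pct)


# Invented readings: reporter + miRNA wells vs. mock-transfected wells.
m, sd = percent_of_mock([52, 48, 50], [100, 100, 100],
                        [100, 98, 102], [100, 98, 102])
print(f"{m:.1f}% of mock (SD {sd:.1f}); repression = {100 - m:.1f}%")
```

Dividing by firefly within each well before comparing to mock removes well-to-well differences in transfection efficiency, which is why the firefly gene is carried on the same psiCheck2 plasmid.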

Western blotting

Cells were cultured and transfected as described for luciferase assays. At 36 h,

existing media was replaced with 100 μl of lysis buffer containing protease inhibitors and

incubated for 15 min at 4°C after which cells were scraped from well bottoms and

transferred to 1.5 ml Eppendorf tubes. Proteins were electrophoresed through an 8%

SDS–polyacrylamide gel (BioRad, Hercules, CA) and transferred to Immobilon-P PVDF

membranes (Millipore, Billerica, MA). Membranes were blocked for 1 h in 2% (w/v)

nonfat milk in phosphate-buffered saline containing 0.05% Tween 20, washed, and

incubated with primary antibody overnight at 4°C using the following dilutions: DFFA –

1:3000 (ab16258, Abcam, Cambridge, MA) and β-catenin – 1:6000 (ab2982, Abcam).

Membranes were washed and incubated with goat anti-rabbit peroxidase-conjugated

secondary antibody (111-035-144, Jackson ImmunoResearch, West Grove, PA).

Immunoreactive bands were visualized with ECL Plus (Amersham, Piscataway, NJ) and

quantified using a Fluorochem densitometer (Alpha Innotech Corp., San Leandro, CA).


Table A-1. MiRNA seed matches markedly enriched by adenosine deamination

miRNA | Seed | Expected¹ | Unedited | Deaminated | p-value
miR-513a-5p² | TCACAGG | 0.6 | 0 | 257 | 5.8E-79
miR-769-3p, -450b-3p² | TGGGATC | 0.6 | 4 | 252 | 4.5E-70
miR-140-3p | ACCACAG | 1.1 | 0 | 135 | 6.4E-29
miR-340* | CCGTCTC | 5.9 | 9 | 132 | 7.3E-28
miR-129-5p | TTTTTGC | 1.2 | 11 | 105 | 8.9E-21
miR-1207-3p | CAGCTGG | 1.6 | 4 | 97 | 1.3E-19
miR-222* | TCAGTAG | 1.3 | 6 | 95 | 2.8E-18
miR-936 | CAGTAGA | 1.0 | 5 | 94 | 3.7E-18
miR-30a*, d*, e* | TTTCAGT | 1.1 | 10 | 93 | 4.3E-18
miR-646 | AGCAGCT | 1.1 | 2 | 87 | 6.4E-16
miR-412 | CTTCACC | 0.6 | 5 | 83 | 1.2E-15
miR-330-5p, -326 | CTCTGGG | 0.6 | 4 | 78 | 6.3E-13
miR-629* | TTCTCCC | 1.2 | 4 | 74 | 5.8E-12
miR-34a* | AATCAGC | 0.9 | 0 | 61 | 8.4E-10
miR-519e | AAGTGCC | 0.8 | 3 | 59 | 8.1E-9
miR-548c-3p | AAAAATC | 2.0 | 0 | 59 | 8.1E-9
miR-325 | CTAGTAG | 0.5 | 5 | 57 | 7.7E-8
miR-371-3p | AGTGCCG | 0.2 | 0 | 57 | 7.7E-8
miR-630 | GTATTCT | 0.4 | 4 | 56 | 8.3E-8
miR-1281 | CGCCTCC | 3.6 | 0 | 52 | 1.1E-7
miR-1229 | TCTCACC | 1.0 | 3 | 40 | 5.6E-6
miR-28-5p, -708 | AGGAGCT | 0.7 | 1 | 35 | 5.4E-5

¹ Expected numbers are based on seed match occurrence in the 200 nt flanking each adenosine deamination.

² miR-513 and miR-769-3p are often (~80%) complementary to the same deaminated sequences.

³ The two miR-518 family seeds, AAAGCGC and AAGCGCT, are complementary to the same twenty-six deaminated sequences.


Table A-2. A-to-I editing occurs predominantly in noncoding regions of expressed sequences

Category | Number of edited sequences | % of total edited sequences | % of edited sequences with miR seed matches
Total edited sequences | 12723 | 100.00% |
Total successfully mapped¹ | 8014 (99.75%/0.25%) | 62.99% |
Total MAIDs, overall | 3058 | 24.04% | 100%
Total MAIDs, mapped² | 1918 (99.64%/0.36%) | 15.08% | 62.72%
Total lost sequences, overall | 2358 | 19.95% | 100.00%
Total lost sequences, mapped | 1605 (99.75%/0.25%) | 12.61% | 63.24%

¹ Because previous mapping data for these sequences were not available, the 12,723 available editing sequences (201 nt) were mapped to human transcripts obtained from the UCSC Table Browser (RefSeq, hg18) using megablast (arguments: -W 196; -S 1; -F F; -p 100). Due to the highly repetitive nature of the sequences used for this analysis, positive identification required 100% identity and 100% coverage of the editing sequence. Using these criteria, ~63% of the 12,723 sequences could be mapped.

² Sequences with miRNA sites created (MAIDs) or lost were queried for mapping to coding or noncoding regions. Of the sequences mapped to transcripts using our methods, the vast majority fell within noncoding regions (%noncoding/%coding is given in parentheses in the first numerical column).


Figure A-1. ADARs deaminate adenosine to inosine, potentially altering miRNA complementarities.

(A) A cartoon depicting adenosine, deaminated adenosine (inosine), and guanine. In some tRNAs, inosine routinely serves as a member of the anticodon, where it is recognized as a guanine. (B) A characterized deamination site occurring in the 3’ UTR of DNA Fragmentation Factor α (DFFA) is shown in both an edited and an unedited state. In this work, each 7 nt sequence (red) occurring within 100 nt of more than 12,000 distinct deamination sites (blue) was screened against all annotated human miRNA seed sequences. The miR-513 seed (yellow) illustrates how target mRNA deamination can mediate miRNA binding. Contributed by Glen Borchert.


Figure A-2. A-to-I edits frequently create miR-513 and miR-769-3p / -450b-3p complementarities.

(A) 12,719 unique EST sequences (www.cgen.com), each consisting of a central A-to-I deamination with 100-nt flanks (i.e., n100 (A or I) n100), were screened for complementarity to human miRNAs. All human miRNA seed matches were identified within the individual 201-nt sequences originally identified by Compugen as containing an A-to-I transition (statistical significance is addressed in Table 2). The top two panels show all miR-769-3p (and miR-450b-3p) seed matches at each position in the unedited (left) and edited (right) states. The lower panels show all miR-513 seed matches in the unedited (left) and edited (right) states. (B) A cartoon of miR-513 and miR-769-3p / -450b-3p complementarity to a MiRNA Associating If Deaminated (MAID) site in the unedited (left) and edited (right) states. Perfect seed matches to miR-769-3p / -450b-3p (blue) and miR-513 (yellow) are significantly enriched in sequences containing characterized deaminations (red). Vertical lines indicate complementary base pairing. (C) Venn diagram depicting the overlap between miR-513 and miR-769-3p / -450b-3p target sites matching the full MAID motif, CCUGUIRUCCCAG (I, inosine; R, purine). Importantly, nearly 100 additional sequences are identified by allowing a single GU wobble immediately 3’ of the deamination. Original analysis by Glen Borchert. Reanalysis and editing by Ryan Spengler.
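Once inosine is written as G (as it appears in cDNA) and the IUPAC code R is expanded to A or G, the MAID consensus from panel C becomes an ordinary regular expression. The sketch below is an illustration under stated assumptions rather than the search used here: the helper names are hypothetical, it scans DNA-alphabet sequence on the given strand only, and it reports non-overlapping matches.

```python
import re

# IUPAC codes used in the motif, plus I (inosine), which reverse transcriptase reads as G.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]", "I": "G"}


def motif_to_regex(motif):
    """Compile an RNA/IUPAC motif into a regex over DNA-alphabet (cDNA-like) sequence."""
    return re.compile("".join(CODES[base] for base in motif.upper()))


MAID = motif_to_regex("CCUGUIRUCCCAG")  # pattern: CCTGTG[AG]TCCCAG


def find_maids(seq):
    """(0-based start, matched text) for each non-overlapping MAID on the given strand."""
    return [(m.start(), m.group()) for m in MAID.finditer(seq.upper())]


for s in ("CCTGTAATCCCAG", "CCTGTGATCCCAG", "CCTGTGGTCCCAG", "AAACCTGTGATCCCAGTTT"):
    print(s, find_maids(s))
```

The unedited TAAT form does not match, whereas the TGAT and TGGT forms used in the Figure A-3 reporters both do.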

Figure A-3. miR-513 and miR-769-3p target MAIDs but not the corresponding unedited sequence.

(A) A diagram of the hairpin expression vectors and MAID reporter constructs. The pAL-513 and pAL-769-3p expression vectors carry miR-513 and miR-769-3p hairpins downstream of the miR-517 Pol III promoter. The TAAT, TGAT, and TGGT reporters contain three tandem copies of the 13-bp MAID sequence in the 3’ UTR of Renilla luciferase, for testing activity in the unedited (TAAT) or edited (TGAT, TGGT) states. Guanines mimicking A-to-I edits are bolded and underscored. (B) Renilla luciferase activity (normalized to firefly luciferase and presented as percent of the mock-transfected control) following co-transfection of the indicated reporters into HEK 293 cells (n = 3) with miR-513, miR-769-3p, pooled miR-513 and miR-769-3p inhibitors, and/or a control miRNA inhibitor. *, p < 0.005. Contributed by Glen Borchert and Brian Gilmore.
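The normalization described in panel B (Renilla divided by firefly, then expressed as a percentage of the mock-transfected control) reduces to the arithmetic below. The readings are invented, and taking the mean of the mock ratios as the 100% reference is an assumption about how replicates were combined.

```python
from statistics import mean, stdev


def normalized(renilla, firefly):
    """Per-well Renilla/firefly ratios; firefly serves as the transfection control."""
    if len(renilla) != len(firefly):
        raise ValueError("need one firefly reading per Renilla reading")
    return [r / f for r, f in zip(renilla, firefly)]


def percent_of_mock(renilla, firefly, mock_renilla, mock_firefly):
    """Each normalized reading as a percentage of the mean mock-transfected ratio."""
    reference = mean(normalized(mock_renilla, mock_firefly))
    return [100.0 * x / reference for x in normalized(renilla, firefly)]


# Invented triplicate readings (n = 3), for illustration only.
mock_r, mock_f = [1000.0, 1100.0, 900.0], [500.0, 550.0, 450.0]
treated_r, treated_f = [400.0, 500.0, 300.0], [500.0, 500.0, 500.0]
values = percent_of_mock(treated_r, treated_f, mock_r, mock_f)
print([round(v, 1) for v in values], round(mean(values), 1), round(stdev(values), 1))
# [40.0, 50.0, 30.0] 40.0 10.0
```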

Figure A-4. Endogenous MAIDs are targets for miR-513 and miR-769-3p repression.

(A) A cartoon of the DFFA 3’ UTR showing the locations of nine distinct MAIDs (lines above the 3’ UTR). (B) Alignment of the nine DFFA 3’ UTR MAIDs commonly deaminated in ESTs. MAID sequences are shaded. Four MAIDs contain GU wobbles relative to the consensus (bold). (C) Alignment of DFFA_1 sequences from independent DFFA clones isolated from HEK 293 and NB7 cells. DFFA_1 was deaminated in NB7 cells (bold) but not in HEK 293 cells. RT reactions were performed using a thermostable reverse transcriptase. (D) A diagram of the DFFA 3’ UTR reporter constructs. In DFFA-Edited (-E) and DFFA-Unedited (-U), the Renilla 3’ UTRs are the DFFA 3’ UTRs cloned from NB7 and HEK 293 cells, respectively. DFFA-E nucleotides differing from DFFA-U are bolded and underscored (compare NB7_2 and 293_1 in panel C). (E) Luciferase assays performed as in Figure A-3B, except with the reporter constructs illustrated (n = 3). *, p < 0.005. Contributed by Glen Borchert and Brian Gilmore.
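Panel C detects editing by comparing cDNA clones: because reverse transcriptase reads inosine as guanosine, an edited adenosine appears as an A-to-G difference relative to an unedited sequence. A minimal sketch for two pre-aligned, equal-length sequences (gaps written as '-') follows; the function names and sequences are illustrative and are not the DFFA clones.

```python
def a_to_g_sites(reference, clone):
    """1-based alignment columns where the reference has A and the clone has G."""
    if len(reference) != len(clone):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = zip(reference.upper(), clone.upper())
    return [i + 1 for i, (r, c) in enumerate(pairs) if r == "A" and c == "G"]


def other_mismatches(reference, clone):
    """Columns that differ for any other reason (gaps ignored), e.g. errors or SNPs."""
    if len(reference) != len(clone):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = zip(reference.upper(), clone.upper())
    return [i + 1 for i, (r, c) in enumerate(pairs)
            if r != c and "-" not in (r, c) and not (r == "A" and c == "G")]


unedited = "CCTGTAATCCCAG"  # toy unedited reference, not a DFFA clone
print(a_to_g_sites(unedited, "CCTGTGATCCCAG"))      # [6]
print(a_to_g_sites(unedited, "CCTGTGGTCCCAG"))      # [6, 7]
print(other_mismatches(unedited, "CCTGTAATCCTAG"))  # [11]
```

If a clone was sequenced from the opposite strand, the same edit would instead appear as a T-to-C difference, so both sequences should be oriented to the transcript before comparison.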

Figure A-5. MiR-769 selectively represses DFFA protein.

(A) Relative miR-769-3p RNA levels in HEK 293, A549, HT1080 and NB7 cell lines, as determined by quantitative PCR. MiR-513 was not detected in these cell lines. (B) MiR-769-3p over-expression reduces DFFA levels specifically in NB7 cells. Endogenous DFFA protein levels in NB7 and HEK 293 cells were determined by western blot densitometry; the ratio of DFFA levels in NB7 to HEK 293 cells is shown. (C) Western blot analysis of endogenous DFFA in HEK 293 and NB7 cell lysates following transfection of miR-769 as indicated. Representative blots for DFFA and β-catenin (loading control) are shown. Relative DFFA levels were calculated as the ratio of DFFA to β-catenin band intensity and normalized to mock (leftmost bar in each graph). 400, 200 and 100 denote 400 ng, 200 ng and 100 ng of miR-769 expression vector, respectively. Contributed by Glen Borchert and Brian Gilmore.

REFERENCES

Anderson EM, Birmingham A, Baskerville S, Reynolds A, Maksimova E, Leake D, Fedorov Y, Karpilow J, Khvorova A (2008) Experimental validation of the importance of seed complement frequency to siRNA specificity. RNA 14: 853-861

Athanasiadis A, Rich A, Maas S (2004) Widespread A-to-I RNA editing of Alu-containing mRNAs in the human transcriptome. PLoS Biol 2: e391

Azuma-Mukai A (2008) Characterization of endogenous human Argonautes and their miRNA partners in RNA silencing. Proc Natl Acad Sci U S A 105: 7964-7969

Bartel DP (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116: 281-297

Bass BL (2002) RNA editing by adenosine deaminases that act on RNA. Annu Rev Biochem 71: 817-846

Bennett EA, Keller H, Mills RE, Schmidt S, Moran JV, Weichenrieder O, Devine SE (2008) Active Alu retrotransposons in the human genome. Genome Res 18: 1875-1883

Berezikov E, Chung WJ, Willis J, Cuppen E, Lai EC (2007) Mammalian mirtron genes. Mol Cell 28: 328-336

Birmingham A, Anderson E, Sullivan K, Reynolds A, Boese Q, Leake D, Karpilow J, Khvorova A (2007) A protocol for designing siRNAs with high functionality and specificity. Nat Protoc 2: 2068-2078

Birmingham A, Anderson EM, Reynolds A, Ilsley-Tyree D, Leake D, Fedorov Y, Baskerville S, Maksimova E, Robinson K, Karpilow J, Marshall WS, Khvorova A (2006) 3' UTR seed matches, but not overall identity, are associated with RNAi off-targets. Nat Methods 3: 199-204

Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J (2010) Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol Chapter 19: Unit 19.10.1-21

Bohnsack MT, Czaplinski K, Gorlich D (2004) Exportin 5 is a RanGTP-dependent dsRNA-binding protein that mediates nuclear export of pre-miRNAs. RNA 10: 185-191

Borchert G, Gilmore B, Spengler R, Xing Y, Lanier W, Bhattacharya D, Davidson B (2009) Adenosine deamination in human transcripts generates novel microRNA binding sites. Hum Mol Genet 18: 4801-4807

Borchert GM, Lanier W, Davidson BL (2006) RNA polymerase III transcribes human microRNAs. Nat Struct Mol Biol 13: 1097-1101

Boudreau RL, Martins I, Davidson BL (2009a) Artificial MicroRNAs as siRNA Shuttles: Improved Safety as Compared to shRNAs In vitro and In vivo. Mol Ther 17: 169-175

Boudreau RL, McBride JL, Martins I, Shen S, Xing Y, Carter BJ, Davidson BL (2009b) Nonallele-specific silencing of mutant and wild-type huntingtin demonstrates therapeutic efficacy in Huntington's disease mice. Mol Ther 17: 1053-1063

Boudreau RL, Spengler RM, Davidson BL (2011) Rational Design of Therapeutic siRNAs: Minimizing Off-targeting Potential to Improve the Safety of RNAi Therapy for Huntington's Disease. Mol Ther 19: 2169-2177

Boudreau RL, Spengler RM, Hylock RH, Kusenda BJ, Davis HA, Eichmann DA, Davidson BL (2013) siSPOTR: a tool for designing highly specific and potent siRNAs for human and mouse. Nucleic Acids Res 41

Bovia F, Wolff N, Ryser S, Strub K (1997) The SRP9/14 subunit of the human signal recognition particle binds to a variety of Alu-like RNAs and with higher affinity than its mouse homolog. Nucleic Acids Res 25: 318-326

Bracken CP (2008) A double-negative feedback loop between ZEB1-SIP1 and the microRNA-200 family regulates epithelial-mesenchymal transition. Cancer Res 68: 7846-7854

Bramsen JB, Pakula MM, Hansen TB, Bus C, Langkjaer N, Odadzic D, Smicius R, Wengel SL, Chattopadhyaya J, Engels JW, Herdewijn P, Wengel J, Kjems J (2010) A screen of chemical modifications identifies position-specific modification by UNA to most potently reduce siRNA off-target effects. Nucleic Acids Res 38: 5761-5773

Burchard J, Jackson AL, Malkov V, Needham RH, Tan Y, Bartz SR, Dai H, Sachs AB, Linsley PS (2009) MicroRNA-like off-target transcript regulation by siRNAs is species specific. RNA 15: 308-315

Burns CM, Chu H, Rueter SM, Hutchinson LK, Canton H, Sanders-Bush E, Emeson RB (1997) Regulation of serotonin-2C receptor G-protein coupling by RNA editing. Nature 387: 303-308

Caffrey DR, Zhao J, Song Z, Schaffer ME, Haney SA, Subramanian RR, Seymour AB, Hughes JD (2011) siRNA off-target effects can be reduced at concentrations that match their individual potency. PLoS One 6: e21503

Calin GA (2002) Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci U S A 99: 15524-15529

Cesana M, Cacchiarelli D, Legnini I, Santini T, Sthandier O, Chinappi M, Tramontano A, Bozzoni I (2011) A long noncoding RNA controls muscle differentiation by functioning as a competing endogenous RNA. Cell 147: 358-369

Chang DY, Hsu K, Maraia RJ (1996) Monomeric scAlu and nascent dimeric Alu RNAs induced by adenovirus are assembled into SRP9/14-containing RNPs in HeLa cells. Nucleic Acids Res 24: 4165-4170

Chen J, Bardes EE, Aronow BJ, Jegga AG (2009) ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res 37: W305-311

Chen LL, Carmichael GG (2008) Gene regulation by SINES and inosines: biological consequences of A-to-I editing of Alu element inverted repeats. Cell Cycle 7: 3294-3301

Chen LL, DeCerbo JN, Carmichael GG (2008) Alu element-mediated gene silencing. EMBO J 27: 1694-1705

Chi JT, Chang HY, Wang NN, Chang DS, Dunphy N, Brown PO (2003) Genomewide view of gene silencing by small interfering RNAs. Proc Natl Acad Sci U S A 100: 6343-6346

Chiang HR, Schoenfeld LW, Ruby JG, Auyeung VC, Spies N, Baek D, Johnston WK, Russ C, Luo S, Babiarz JE, Blelloch R, Schroth GP, Nusbaum C, Bartel DP (2010) Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev 24: 992-1009

Cummins JM, He Y, Leary RJ, Pagliarini R, Diaz LA, Jr., Sjoblom T, Barad O, Bentwich Z, Szafranska AE, Labourier E, Raymond CK, Roberts BS, Juhl H, Kinzler KW, Vogelstein B, Velculescu VE (2006) The colorectal microRNAome. Proc Natl Acad Sci U S A 103: 3687-3692

Davidson BL, McCray PB, Jr. (2011) Current prospects for RNA interference-based therapies. Nat Rev Genet 12: 329-340

Davis BN, Hilyard AC, Lagna G, Hata A (2008) SMAD proteins control DROSHA-mediated microRNA maturation. Nature 454: 56-61

Davis-Dusenbery BN, Hata A (2010) Mechanisms of control of microRNA biogenesis. J Biochem 148: 381-392

Diez-Roux G, Banfi S, Sultan M, Geffers L, Anand S, Rozado D, Magen A, Canidio E, Pagani M, Peluso I, Lin-Marq N, Koch M, Bilio M, Cantiello I, Verde R, De Masi C, Bianchi SA, Cicchini J, Perroud E, Mehmeti S, Dagand E, Schrinner S, Nürnberger A, Schmidt K, Metz K, Zwingmann C, Brieske N, Springer C, Hernandez AM, Herzog S, Grabbe F, Sieverding C, Fischer B, Schrader K, Brockmeyer M, Dettmer S, Helbig C, Alunni V, Battaini MA, Mura C, Henrichsen CN, Garcia-Lopez R, Echevarria D, Puelles E, Garcia-Calero E, Kruse S, Uhr M, Kauck C, Feng G, Milyaev N, Ong CK, Kumar L, Lam M, Semple CA, Gyenesei A, Mundlos S, Radelof U, Lehrach H, Sarmientos P, Reymond A, Davidson DR, Dollé P, Antonarakis SE, Yaspo ML, Martinez S, Baldock RA, Eichele G, Ballabio A (2011) A high-resolution anatomical atlas of the transcriptome in the mouse embryo. PLoS Biol 9: e1000582

Dinger ME, Amaral PP, Mercer TR, Pang KC, Bruce SJ, Gardiner BB, Askarian-Amiri ME, Ru K, Soldà G, Simons C, Sunkin SM, Crowe ML, Grimmond SM, Perkins AC, Mattick JS (2008) Long noncoding RNAs in mouse embryonic stem cell pluripotency and differentiation. Genome Res 18: 1433-1445

Doench JG, Sharp PA (2004) Specificity of microRNA target selection in translational repression. Genes Dev 18: 504-511

Farh KK, Grimson A, Jan C, Lewis BP, Johnston WK, Lim LP, Burge CB, Bartel DP (2005) The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science 310: 1817-1821

Fedorov Y, Anderson EM, Birmingham A, Reynolds A, Karpilow J, Robinson K, Leake D, Marshall WS, Khvorova A (2006) Off-target effects by siRNA can induce toxic phenotype. RNA 12: 1188-1196

Friedman RC, Farh KK, Burge CB, Bartel DP (2009) Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19: 92-105

Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, Goldman M, Barber GP, Clawson H, Coelho A, Diekhans M, Dreszer TR, Giardine BM, Harte RA, Hillman-Jackson J, Hsu F, Kirkup V, Kuhn RM, Learned K, Li CH, Meyer LR, Pohl A, Raney BJ, Rosenbloom KR, Smith KE, Haussler D, Kent WJ (2011) The UCSC Genome Browser database: update 2011. Nucleic Acids Res 39: D876-882

Garcia DM, Baek D, Shin C, Bell GW, Grimson A, Bartel DP (2011) Weak seed-pairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs. Nat Struct Mol Biol 18: 1139-1146

Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A (2005) Galaxy: a platform for interactive large-scale genome analysis. Genome Res 15: 1451-1455

Goecks J, Nekrutenko A, Taylor J (2010) Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 11: R86

Gregory RI (2004) The Microprocessor complex mediates the genesis of microRNAs. Nature 432: 235-240

Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ (2006) miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34: D140-144

Grimm D, Streetz KL, Jopling CL, Storm TA, Pandey K, Davis CR, Marion P, Salazar F, Kay MA (2006) Fatality in mice due to oversaturation of cellular microRNA/short hairpin RNA pathways. Nature 441: 537-541

Grimson A, Farh KK, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP (2007) MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27: 91-105

Guo H, Ingolia NT, Weissman JS, Bartel DP (2010) Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature 466: 835-840

Guo L, Lu Z (2010) Global expression analysis of miRNA gene cluster and family based on isomiRs from deep sequencing data. Comput Biol Chem 34: 165-171

Han J (2004) The Drosha-DGCR8 complex in primary microRNA processing. Genes Dev 18: 3016-3027

Han J (2006) Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 125: 887-901

Hansen TB, Jensen TI, Clausen BH, Bramsen JB, Finsen B, Damgaard CK, Kjems J (2013) Natural RNA circles function as efficient microRNA sponges. Nature 495: 384-388

Hsu K, Chang DY, Maraia RJ (1995) Human signal recognition particle (SRP) Alu-associated protein also binds Alu interspersed repeat sequence RNAs. Characterization of human SRP9. J Biol Chem 270: 10179-10186

Hubbard TJ, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Herrero J, Holland R, Howe K, Howe K, Johnson N, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Melsopp C, Megy K, Meidl P, Ouverdin B, Parker A, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Severin J, Slater G, Smedley D, Spudich G, Trevanion S, Vilella A, Vogel J, White S, Wood M, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Flicek P, Kasprzyk A, Proctor G, Searle S, Smith J, Ureta-Vidal A, Birney E (2007) Ensembl 2007. Nucleic Acids Res 35: D610-617

Huesken D, Lange J, Mickanin C, Weiler J, Asselbergs F, Warner J, Meloon B, Engel S, Rosenberg A, Cohen D, Labow M, Reinhardt M, Natt F, Hall J (2005) Design of a genome-wide siRNA library using an artificial neural network. Nat Biotechnol 23: 995-1001

Hundley HA, Krauchuk AA, Bass BL (2008) C. elegans and H. sapiens mRNAs with edited 3' UTRs are present on polysomes. RNA 14: 2050-2060

Jackson AL, Bartz SR, Schelter J, Kobayashi SV, Burchard J, Mao M, Li B, Cavet G, Linsley PS (2003) Expression profiling reveals off-target gene regulation by RNAi. Nat Biotechnol 21: 635-637

Jackson AL, Burchard J, Leake D, Reynolds A, Schelter J, Guo J, Johnson JM, Lim L, Karpilow J, Nichols K, Marshall W, Khvorova A, Linsley PS (2006) Position-specific chemical modification of siRNAs reduces "off-target" transcript silencing. RNA 12: 1197-1205

Jackson AL, Burchard J, Schelter J, Chau BN, Cleary M, Lim L, Linsley PS (2006) Widespread siRNA "off-target" transcript silencing mediated by seed region sequence complementarity. RNA 12: 1179-1187

Jackson AL, Linsley PS (2010) Recognizing and avoiding siRNA off-target effects for target identification and therapeutic application. Nat Rev Drug Discov 9: 57-67

Kaneko H, Dridi S, Tarallo V, Gelfand BD, Fowler BJ, Cho WG, Kleinman ME, Ponicsan SL, Hauswirth WW, Chiodo VA, Karikó K, Yoo JW, Lee DK, Hadziahmetovic M, Song Y, Misra S, Chaudhuri G, Buaas FW, Braun RE, Hinton DR, Zhang Q, Grossniklaus HE, Provis JM, Madigan MC, Milam AH, Justice NL, Albuquerque RJ, Blandford AD, Bogdanovich S, Hirano Y, Witta J, Fuchs E, Littman DR, Ambati BK, Rudin CM, Chong MM, Provost P, Kugel JF, Goodrich JA, Dunaief JL, Baffi JZ, Ambati J (2011) DICER1 deficit induces Alu RNA toxicity in age-related macular degeneration. Nature 471: 325-330

Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ (2004) The UCSC Table Browser data retrieval tool. Nucleic Acids Res 32: D493-496

Karreth FA, Tay Y, Perna D, Ala U, Tan SM, Rust AG, DeNicola G, Webster KA, Weiss D, Perez-Mancera PA, Krauthammer M, Halaban R, Provero P, Adams DJ, Tuveson DA, Pandolfi PP (2011) In vivo identification of tumor-suppressive PTEN ceRNAs in an oncogenic BRAF-induced mouse model of melanoma. Cell 147: 382-395

Kawahara Y, Megraw M, Kreider E, Iizasa H, Valente L, Hatzigeorgiou AG, Nishikura K (2008) Frequency and fate of microRNA editing in human brain. Nucleic Acids Res 36: 5270-5280

Kawahara Y, Zinshteyn B, Chendrimada TP, Shiekhattar R, Nishikura K (2007a) RNA editing of the microRNA-151 precursor blocks cleavage by the Dicer-TRBP complex. EMBO Rep 8: 763-769

Kawahara Y, Zinshteyn B, Sethupathy P, Iizasa H, Hatzigeorgiou AG, Nishikura K (2007b) Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science 315: 1137-1140

Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D (2002) The human genome browser at UCSC. Genome Res 12: 996-1006

Khvorova A, Reynolds A, Jayasena SD (2003) Functional siRNAs and miRNAs Exhibit Strand Bias. Cell 115: 209-216

Kim DD, Kim TT, Walsh T, Kobayashi Y, Matise TC, Buyske S, Gabriel A (2004) Widespread RNA editing of embedded alu elements in the human transcriptome. Genome Res 14: 1719-1725

Kim YK, Kim VN (2007) Processing of intronic microRNAs. EMBO J 26: 775-783

Kino T, Hurt DE, Ichijo T, Nader N, Chrousos GP (2010) Noncoding RNA gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor. Sci Signal 3: ra8

Kozomara A, Griffiths-Jones S (2011) miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res 39: D152-157

Krutzfeldt J, Rajewsky N, Braich R, Rajeev KG, Tuschl T, Manoharan M, Stoffel M (2005) Silencing of microRNAs in vivo with 'antagomirs'. Nature 438: 685-689

Lai EC (2002) Micro RNAs are complementary to 3' UTR sequence motifs that mediate negative post-transcriptional regulation. Nat Genet 30: 363-364

Lal A, Navarro F, Maher CA, Maliszewski LE, Yan N, O'Day E, Chowdhury D, Dykxhoorn DM, Tsai P, Hofmann O, Becker KG, Gorospe M, Hide W, Lieberman J (2009) miR-24 Inhibits cell proliferation by targeting E2F2, MYC, and other cell-cycle genes via binding to "seedless" 3'UTR microRNA recognition elements. Mol Cell 35: 610-625

Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blocker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, 
Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowski J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ (2001) Initial sequencing and analysis of the human genome. Nature 409: 860-921

Lehnert S, Van Loo P, Thilakarathne PJ, Marynen P, Verbeke G, Schuit FC (2009) Evidence for co-evolution between human microRNAs and Alu-repeats. PLoS ONE 4: e4456

Levanon EY, Eisenberg E, Yelin R, Nemzer S, Hallegger M, Shemesh R, Fligelman ZY, Shoshan A, Pollock SR, Sztybel D, Olshansky M, Rechavi G, Jantsch MF (2004) Systematic identification of abundant A-to-I editing sites in the human transcriptome. Nat Biotechnol 22: 1001-1005

Lewis BP, Burge CB, Bartel DP (2005) Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120: 15-20

Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB (2003) Prediction of mammalian microRNA targets. Cell 115: 787-798

Li JB, Levanon EY, Yoon JK, Aach J, Xie B, Leproust E, Zhang K, Gao Y, Church GM (2009) Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing. Science 324: 1210-1213

Liang H, Landweber LF (2007) Hypothesis: RNA editing of microRNA target sites in humans? RNA 13: 463-467

Luciano DJ, Mirsky H, Vendetti NJ, Maas S (2004) RNA editing of a miRNA precursor. RNA 10: 1174-1177

Lund E, Guttinger S, Calado A, Dahlberg JE, Kutay U (2004) Nuclear export of microRNA precursors. Science 303: 95-98

Ma Y, Creanga A, Lum L, Beachy PA (2006) Prevalence of off-target effects in Drosophila RNA interference screens. Nature 443: 359-363

Macrae IJ (2006) Structural basis for double-stranded RNA processing by Dicer. Science 311: 195-198

Martin JN, Wolken N, Brown T, Dauer WT, Ehrlich ME, Gonzalez-Alegre P (2011) Lethal toxicity caused by expression of shRNA in the mouse striatum: implications for therapeutic design. Gene Ther

Martí E, Pantano L, Bañez-Coronel M, Llorens F, Miñones-Moyano E, Porta S, Sumoy L, Ferrer I, Estivill X (2010) A myriad of miRNA variants in control and Huntington's disease brain regions detected by massively parallel sequencing. Nucleic Acids Res 38: 7219-7235

Matveeva O, Nechipurenko Y, Rossi L, Moore B, Saetrom P, Ogurtsov AY, Atkins JF, Shabalina SA (2007) Comparison of approaches for rational siRNA design leading to a new efficient and transparent method. Nucleic Acids Res 35: e63

McBride JL, Boudreau RL, Harper SQ, Staber PD, Monteys AM, Martins I, Gilmore BL, Burstein H, Peluso RW, Polisky B, Carter BJ, Davidson BL (2008) Artificial miRNAs mitigate shRNA-mediated toxicity in the brain: Implications for the therapeutic development of RNAi. Proc Natl Acad Sci U S A 105: 5868-5873

McBride JL, Pitzer MR, Boudreau RL, Dufour B, Hobbs T, Ojeda SR, Davidson BL (2011) Preclinical safety of RNAi-mediated HTT suppression in the rhesus macaque as a potential therapy for Huntington's disease. Mol Ther 19: 2152-2162

Mercer TR, Dinger ME, Sunkin SM, Mehler MF, Mattick JS (2008) Specific expression of long noncoding RNAs in the mouse brain. Proc Natl Acad Sci U S A 105: 716-721

Miller V, Gouvion C, Davidson B, Paulson H (2004) Targeting Alzheimer's disease genes with RNA interference: an efficient strategy for silencing mutant allele. Nucleic Acids Res 32: 661-668

Moffat J, Grueneberg DA, Yang X, Kim SY, Kloepfer AM, Hinkle G, Piqani B, Eisenhaure TM, Luo B, Grenier JK, Carpenter AE, Foo SY, Stewart SA, Stockwell BR, Hacohen N, Hahn WC, Lander ES, Sabatini DM, Root DE (2006) A lentiviral RNAi library for human and mouse genes applied to an arrayed viral high-content screen. Cell 124: 1283-1298

Monteys AM, Spengler RM, Wan J, Tecedor L, Lennox KA, Xing Y, Davidson BL (2010) Structure and activity of putative intronic miRNA promoters. RNA 16: 495-505

Naito Y, Yamada T, Ui-Tei K, Morishita S, Saigo K (2004) siDirect: highly effective, target-specific siRNA design software for mammalian RNA interference. Nucleic Acids Res 32: W124-129

Newman MA, Thomson JM, Hammond SM (2008) Lin-28 interaction with the let-7 precursor loop mediates regulated microRNA processing. RNA 14: 1539-1549

Ng L, Bernard A, Lau C, Overly CC, Dong HW, Kuan C, Pathak S, Sunkin SM, Dang C, Bohland JW, Bokil H, Mitra PP, Puelles L, Hohmann J, Anderson DJ, Lein ES, Jones AR, Hawrylycz M (2009) An anatomic gene expression atlas of the adult mouse brain. Nat Neurosci 12: 356-362

Nielsen CB, Shomron N, Sandberg R, Hornstein E, Kitzman J, Burge CB (2007) Determinants of targeting by endogenous and exogenous microRNAs and siRNAs. RNA 13: 1894-1910

Okamura K, Hagen JW, Duan H, Tyler DM, Lai EC (2007) The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130: 89-100

Osenberg S, Dominissini D, Rechavi G, Eisenberg E (2009) Widespread cleavage of A-to-I hyperediting substrates. RNA 15: 1632-1639

Packer AN, Xing Y, Harper SQ, Jones L, Davidson BL (2008) The bifunctional microRNA miR-9/miR-9* regulates REST and CoREST and is downregulated in Huntington's disease. J Neurosci 28: 14341-14346

Piriyapongsa J, Jordan IK (2007) A family of human microRNA genes from miniature inverted-repeat transposable elements. PLoS One 2: e203

Piriyapongsa J, Marino-Ramirez L, Jordan IK (2007) Origin and evolution of human microRNAs from transposable elements. Genetics 176: 1323-1337

Piskounova E, Polytarchou C, Thornton JE, LaPierre RJ, Pothoulakis C, Hagan JP, Iliopoulos D, Gregory RI (2011) Lin28A and Lin28B inhibit let-7 microRNA biogenesis by distinct mechanisms. Cell 147: 1066-1079

Poliseno L, Salmena L, Zhang J, Carver B, Haveman WJ, Pandolfi PP (2010) A coding-independent function of gene and pseudogene mRNAs regulates tumour biology. Nature 465: 1033-1038

Provost P, Dishart D, Doucet J, Frendewey D, Samuelsson B, Radmark O (2002) Ribonuclease activity and RNA binding of recombinant human Dicer. EMBO J 21: 5864-5874

Pruitt KD, Tatusova T, Maglott DR (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33: D501-504

Ray DA, Batzer MA (2005) Tracking Alu evolution in New World primates. BMC Evol Biol 5: 51

Rinn JL, Chang HY (2012) Genome regulation by long noncoding RNAs. Annu Rev Biochem 81: 145-166

Saito K, Ishizuka A, Siomi H, Siomi MC (2005) Processing of pre-microRNAs by the Dicer-1-Loquacious complex in Drosophila cells. PLoS Biol 3: e235

Scadden AD (2005) The RISC subunit Tudor-SN binds to hyper-edited double-stranded RNA and promotes its cleavage. Nat Struct Mol Biol 12: 489-496

Schirle NT, MacRae IJ (2012) The crystal structure of human Argonaute2. Science 336: 1037-1040

Schultz N, Marenstein DR, De Angelis DA, Wang WQ, Nelander S, Jacobsen A, Marks DS, Massague J, Sander C (2011) Off-target effects dominate a large-scale RNAi screen for modulators of the TGF-beta pathway and reveal microRNA regulation of TGFBR2. Silence 2: 3

Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD (2003) Asymmetry in the assembly of the RNAi enzyme complex. Cell 115: 199-208

Semizarov D, Frost L, Sarthy A, Kroeger P, Halbert DN, Fesik SW (2003) Specificity of short interfering RNA determined through gene expression signatures. Proc Natl Acad Sci U S A 100: 6347-6352

Shin C, Nam JW, Farh KK, Chiang HR, Shkumatava A, Bartel DP (2010) Expanding the microRNA targeting code: functional sites with centered pairing. Mol Cell 38: 789-802

Sigoillot FD, Lyman S, Huckins JF, Adamson B, Chung E, Quattrochi B, King RW (2012) A bioinformatics method identifies prominent off-targeted transcripts in RNAi screens. Nat Methods 9: 363-366

Smalheiser NR, Torvik VI (2005) Mammalian microRNAs derived from genomic repeats. Trends Genet 21: 322-326

Smalheiser NR, Torvik VI (2006a) Alu elements within human mRNAs are probable microRNA targets. Trends Genet 22: 532-536

Smalheiser NR, Torvik VI (2006b) Complications in mammalian microRNA target prediction. Methods Mol Biol 342: 115-127

Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A 101: 6062-6067

Sumazin P, Yang X, Chiu HS, Chung WJ, Iyer A, Llobet-Navas D, Rajbhandari P, Bansal M, Guarnieri P, Silva J, Califano A (2011) An extensive microRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma. Cell 147: 370-381

Tan GS, Garchow BG, Liu X, Yeung J, Morris JP, Cuellar TL, McManus MT, Kiriakidou M (2009) Expanded RNA-binding activities of mammalian Argonaute 2. Nucleic Acids Res 37: 7533-7545

Tay Y, Kats L, Salmena L, Weiss D, Tan SM, Ala U, Karreth F, Poliseno L, Provero P, Di Cunto F, Lieberman J, Rigoutsos I, Pandolfi PP (2011) Coding-independent regulation of the tumor suppressor PTEN by competing endogenous mRNAs. Cell 147: 344-357

Vaish N, Chen F, Seth S, Fosnaugh K, Liu Y, Adami R, Brown T, Chen Y, Harvie P, Johns R, Severson G, Granger B, Charmley P, Houston M, Templin MV, Polisky B (2011) Improved specificity of gene silencing by siRNAs containing unlocked nucleobase analogs. Nucleic Acids Res 39: 1823-1832

Vasudevan S, Tong Y, Steitz JA (2007) Switching from repression to activation: microRNAs can up-regulate translation. Science 318: 1931-1934

Vert JP, Foveau N, Lajaunie C, Vandenbrouck Y (2006) An accurate and interpretable model for siRNA efficacy prediction. BMC Bioinformatics 7: 520

Wahlstedt H, Daniel C, Enstero M, Ohman M (2009) Large-scale mRNA sequencing determines global regulation of RNA editing during brain development. Genome Res 19: 978-986

Wang KC, Chang HY (2011) Molecular mechanisms of long noncoding RNAs. Mol Cell 43: 904-914

Wang X, Wang X, Varma RK, Beauchamp L, Magdaleno S, Sendera TJ (2009) Selection of hyperfunctional siRNAs with improved potency and specificity. Nucleic Acids Res 37: e152

Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, Geer LY, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Ostell J, Miller V, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, Tatusova TA, Wagner L, Yaschenko E (2007) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 35: D5-12

Wu C, Orozco C, Boyer J, Leglise M, Goodale J, Batalov S, Hodge CL, Haase J, Janes J, Huss JW, 3rd, Su AI (2009) BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources. Genome Biol 10: R130

Yang JH, Shao P, Zhou H, Chen YQ, Qu LH (2010) deepBase: a database for deeply annotating and mining deep sequencing data. Nucleic Acids Res 38: D123-130

Yang W, Chendrimada TP, Wang Q, Higuchi M, Seeburg PH, Shiekhattar R, Nishikura K (2006) Modulation of microRNA processing and expression through RNA editing by ADAR deaminases. Nat Struct Mol Biol 13: 13-21

Yi R, Doehle BP, Qin Y, Macara IG, Cullen BR (2005) Overexpression of exportin 5 enhances RNA interference mediated by short hairpin RNAs and microRNAs. RNA 11: 220-226

Yoshida M, Kaziro Y, Ukita T (1968) The modification of nucleosides and nucleotides. X. Evidence for the important role of inosine residue in codon recognition of yeast alanine tRNA. Biochim Biophys Acta 166: 646-655

Zhang XD, Santini F, Lacson R, Marine SD, Wu Q, Benetti L, Yang R, McCampbell A, Berger JP, Toolan DM, Stec EM, Holder DJ, Soper KA, Heyse JF, Ferrer M (2011) cSSMD: assessing collective activity for addressing off-target effects in genome-scale RNA interference screens. Bioinformatics 27: 2775-2781