STRUCTURAL AND FUNCTIONAL CHARACTERIZATION OF

NONCODING RNAS IN MAMMALIAN CELLS

by

Sungyul Lee

A dissertation submitted to Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy

Baltimore, Maryland

December, 2015

© 2015 Sungyul Lee

All Rights Reserved

Abstract

Francis Crick proposed the central dogma of molecular biology more than half a century ago, focusing on the role of RNA as a messenger that delivers genetic information from DNA to protein. However, it is now clear that RNA is a major player in every aspect of biological processes, as much as protein is, through its noncoding functions. While early studies of RNA biology mostly centered on abundant and constitutive noncoding RNAs in the ribosome, spliceosome, transcriptional machinery, and telomere, recent studies are shifting their focus toward less abundant, dynamically regulated noncoding RNAs that are specific to tissues or developmental stages, such as microRNAs (miRNAs) and long noncoding RNAs (lncRNAs). With the advent of new analytic tools and massive amounts of sequencing data, unexpected discoveries continue to reveal how our genome is written and read inside the cell.

MicroRNAs are ~22 nt small RNAs that guide RISC proteins to their target genes through base complementarity, thereby achieving posttranscriptional gene repression. The mechanism of repression is almost universal in animals, but how miRNA expression is regulated remains one of the big questions in the field. To facilitate investigation of miRNA expression control in mammals, we annotated primary miRNA transcripts genome-wide in mouse and human. We undertook this endeavor to provide the most comprehensive picture of miRNA transcription across the human and mouse genomes, the lack of which is a major bottleneck in elucidating the mechanisms that control miRNA abundance. To do this, we had to overcome three obstacles. First, we expressed a dominant-negative DROSHA mutant to suppress efficient hairpin cropping by the Microprocessor, thereby enriching unprocessed primary transcripts for sequencing. Second, we used a panel of human and mouse cell lines of diverse origin to increase coverage of miRNAs that are expressed tissue-specifically. Lastly, we collaborated with Steven Salzberg’s lab to employ a recently developed assembly algorithm, StringTie, which outperforms other existing assembly tools for this application. Together, these approaches uncovered unanticipated features and new potential regulatory mechanisms, including links between pri-miRNAs and distant mRNAs, and alternative splicing and alternative promoter usage that can produce transcripts carrying subsets of the miRNAs encoded by polycistronic clusters. These results provide a valuable resource for the study of mammalian miRNA regulation.

Another emerging class of regulatory noncoding RNA is long noncoding RNA (lncRNA). Although the current human genome annotation predicts nearly as many lncRNA genes as protein-coding genes, the question remains how many of them indeed play integral parts in diverse biological functions. Unlike miRNAs, lncRNAs act through mechanisms that are largely unique to each case, making it difficult to predict their function from primary sequence. One of the very few ways to find their functions is to examine the phenotype at the cellular or organismal level after genetic ablation. By screening for lncRNAs that are induced after DNA damage, we identified NORAD, whose high conservation in mammals, high abundance, and association with an interesting biological cue (i.e., induction after DNA damage) suggested that it is functional. Surprisingly, cells in which NORAD was inactivated showed increased levels of numerical and structural chromosomal instability. We found that this transcript harbors an unusually high number of PUMILIO binding motifs, allowing it to sequester this RNA binding protein (RBP) and thereby suppress its repressive activity on its targets. PUMILIO targets include factors important for the DNA damage response, DNA repair, and mitosis. Overexpression of PUMILIO also suppressed these target genes and phenocopied NORAD knockout cells. I also generated a knockout mouse for Norad, the clear mouse ortholog of NORAD, using CRISPR/Cas9 technology. It will be very interesting to see whether this animal shows the same phenotype, and possibly other phenotypes that we could not observe due to the simplicity of cultured cells. Altogether, this study reveals a novel mechanism for maintaining genomic stability through sequestration of PUMILIO by a lncRNA, NORAD.

Advisor: Joshua T. Mendell, M.D., Ph.D.

Readers: Ben Ho Park, M.D., Ph.D. and Haig H. Kazazian, M.D., Ph.D.

Preface

The dissertation work written in this book only partially reflects what I was given by, and the support I received from, the wonderful people and institutions around me. Without their help and influence, this work could never have materialized. First and foremost, I’m immensely grateful to my mentor Joshua Mendell, and I’m truly indebted to him for his scientific acumen and critical thinking. His enthusiasm for the unknown and pursuit of perfection always inspired me and motivated my scientific creativity. I believe his influence and legacy will remain with me throughout my future career. My thesis committee members, Haig Kazazian and Ben Ho Park, provided me with valuable guidance throughout my thesis work. I could continue to grow professionally only through our annual meetings, with their constructive criticism and their solutions to the problems I faced each time. I also thank my colleagues in the Mendell lab. In particular, Tsung-Cheng Chang taught me so many useful experimental techniques, and Florian Kopp was always there with me to discuss and perform exciting work together. I thank my graduate program, Pathobiology at Johns Hopkins, for giving me administrative and financial support. Lab manager Ana Doughty was the most helpful person I have ever met, and the Molecular Biology department at UT Southwestern enabled me to continue my work in Dallas. I thank our excellent collaborators, including the Steven Salzberg lab, the Yang Xie lab, and the Hongtao Yu lab. Finally, I can’t finish my acknowledgements without saying thank you to my family. My parents, Se-il and Young-sook, instilled in me their appreciation of hard work and thankfulness for everything happening around me. My son Shihoo, of whom I am so proud, is the energy that always drives me and the best motivation of my life. This dissertation is dedicated to Jung Hee, mother of my son and my wife, who shares every sorrow and joy of my life with me.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
Chapter 2: Genome-wide annotation of microRNA primary transcript structures
    Introduction
    Results
    Discussion
    Materials and Methods
Chapter 3: Characterization and loss of function study of a human long noncoding RNA induced by DNA damage, NORAD
    Introduction
    Results
    Discussion
    Materials and Methods
Chapter 4: Mechanism of chromosome instability in NORAD depleted cells
    Introduction
    Results
    Discussion
    Materials and Methods
Chapter 5: Generation of Norad knockout mouse using CRISPR/Cas9 genome editing system
    Introduction
    Results
    Discussion
    Materials and Methods
Chapter 6: Future directions
Appendix
References
Curriculum Vitae

List of Tables

Table 2.1 Conserved miRNAs encoded by newly annotated pri-miRNAs
Table 2.2 Evaluation of the performance of four transcriptome assembly programs on pri-miRNAs that are annotated in Refseq
Table 2.3 RNAseq mapping statistics
Table 2.4 Novel potential regulatory mechanisms for conserved human non-protein coding pri-miRNAs
Table 2.5 Novel potential regulatory mechanisms for conserved mouse non-protein coding pri-miRNAs
Table 2.6 Primer sequences for mutagenesis
Table 2.7 Primer sequences for real-time RT-PCR
Table 2.8 Primer sequences for RACE in Fig 2.11
Table 2.9 Primer sequences for RACE in Fig 2.17
Table 2.10 Primer sequences for RACE in Fig 2.20
Table 2.11 Primer sequences for RT-PCR
Table 2.7 Transfection methods
Table 3.1 TALEN RVDs and target sequences for NORAD
Table 3.2 Primers used to amplify homology arms for NORAD LSL knock-in
Table 3.3 Primers used for genotyping genome edited single cell derived clones
Table 3.4 siRNA target sequences
Table 3.5 Primers used to generate northern blot probe
Table 3.6 Primers used for 3’ RACE
Table 4.1 PUM target genes that are downregulated in NORAD−/− cells and required for genomic stability
Table 4.2 Primers used for in vitro transcription for NORAD affinity purification
Table 4.3 Oligos for cloning sgRNA into CRISPR/Cas9 plasmids
Table 4.4 TA cloning of PUM CRISPR/Cas9 targeted alleles
Table 4.5 siRNA target sequence of PUM
Table 4.6 qPCR primers
Table 5.1 Oligos used for CRISPR/Cas9 plasmid construction
Table 5.2 Primers used for T7 Endonuclease I cleavage assay
Table 5.3 Primers used for in vitro transcription of sgRNA and Cas9 mRNA

List of Figures

Figure 2.1 Overview of the organization and existing annotation of conserved human miRNA genes
Figure 2.2 DROSHA inhibition enables capturing primary microRNA transcripts
Figure 2.3 DROSHA inhibition facilitates pri-miRNA assembly
Figure 2.4 RT-PCR validation of newly assembled primary transcripts encoding human miR-221 and miR-222
Figure 2.5 Overview of the experimental workflow used to generate pri-miRNA assemblies
Figure 2.6 General characteristics of human and mouse pri-miRNAs
Figure 2.7 Examples of evolutionarily conserved pri-miRNAs
Figure 2.8 RT-PCR validation of newly assembled primary transcripts encoding human and mouse miR-101-1
Figure 2.9 RT-PCR validation of newly assembled primary transcripts encoding mouse miR-101-1
Figure 2.10 Classification of newly annotated miRNA genes
Figure 2.11 5’ and 3’ RACE analysis of newly assembled primary transcripts encoding human miR-30a and miR-30c-2
Figure 2.12 RT-PCR validation of newly assembled primary transcripts encoding human miR-30a and miR-30c-2
Figure 2.13 RT-PCR validation of newly assembled primary transcripts encoding human miR-505
Figure 2.14 Additional examples of human miRNAs that are transcribed as extensions of annotated protein-coding genes
Figure 2.15 RT-PCR validation of the newly assembled primary transcript encoding human miR-99b, let-7e, and miR-125a
Figure 2.16 Examples of newly-identified miRNA regulatory mechanisms
Figure 2.17 5’ RACE analysis of primary transcripts encoding human let-7a-3 and let-7b
Figure 2.18 RT-PCR validation of newly assembled primary transcripts encoding human let-7a-3 and let-7b
Figure 2.19 Host genes for miRNA cluster
Figure 2.20 5’ RACE analysis of primary transcripts encoding human miR-100, let-7a-2, and miR-125b-1
Figure 2.21 RT-PCR validation of newly assembled primary transcripts encoding human miR-100, let-7a-2, and miR-125b-1
Figure 2.22 miRNA biogenesis can be affected by alternative splicing
Figure 2.23 RT-PCR validation of primary transcripts encoding human miR-205
Figure 3.1 Evolutionary conservation of mammalian noncoding RNA, NORAD
Figure 3.2 NORAD expression in human tissues
Figure 3.3 NORAD is induced by DNA damage and expressed abundantly in multiple human cell lines
Figure 3.4 NORAD shows very low coding potential as determined by codon substitution frequency
Figure 3.5 Genome editing to inactivate NORAD and validation of edited alleles by Southern blot
Figure 3.6 Validation of NORAD targeting in HCT116 cells
Figure 3.7 DNA damage-induced G1 and G2 checkpoints are grossly intact in NORAD−/− HCT116 cells
Figure 3.8 Genetic inactivation of NORAD results in chromosomal instability in human cells
Figure 3.9 Chromosome instability can be measured by interphase DNA FISH for statistical analyses
Figure 3.10 Time-lapse image of mitotic defects in NORAD−/− HCT116 cells
Figure 3.11 Non-recurrent de novo chromosomal rearrangements in NORAD−/− clones
Figure 3.12 Inactivation of NORAD in nontransformed BJ-5ta cells results in chromosomal instability
Figure 3.13 TALEN-mediated genome editing is not a general cause of chromosomal instability
Figure 3.14 NORAD knock-down using siRNA shows similar phenotype as TALEN-mediated NORAD inactivation
Figure 3.15 Cre-induced de-repression of NORAD rescues chromosomal instability
Figure 3.16 Tetraploidy is a stable state in NORAD−/− cells whereas diploid cells lacking NORAD generate new tetraploid subclones
Figure 4.1 NORAD is localized predominantly to the cytoplasm
Figure 4.2 Domain structure of NORAD
Figure 4.3 NORAD interacts with PUMILIO proteins
Figure 4.4 PAR-CLIP identifies NORAD as a major PUM2 target
Figure 4.5 NORAD and Norad pseudogenes in human and mouse genomes
Figure 4.6 PUM2 PAR-CLIP reveals NORAD as the most preferred PUM2 binding transcript
Figure 4.7 Conserved 15 PUMILIO binding sites in NORAD
Figure 4.8 PUM2 PAR-CLIP reads clusters on predicted PRE consensus motifs of NORAD
Figure 4.9 Measurement of the number of PUM1 and PUM2 protein molecules per HCT116 cell
Figure 4.10 PUM2 targets are down-regulated in NORAD−/− cells
Figure 4.11 PUMILIO overexpression phenocopies both the molecular and phenotypic consequences of NORAD inactivation
Figure 4.12 PUMILIO knockout masks the phenotype of NORAD inactivation
Figure 4.13 PUMILIO knockdown rescues phenotype of NORAD inactivation
Figure 4.14 Genes required for the maintenance of chromosomal stability are repressed in NORAD−/− and PUM1/2-overexpressing cells
Figure 4.15 Genes required for the maintenance of chromosomal stability are repressed in NORAD−/− and PUM1/2-overexpressing cells
Figure 4.16 A novel NORAD-PUMILIO axis that regulates genomic stability
Figure 5.1 Two flanking gRNAs were designed to generate Norad deletion allele
Figure 5.2 Assessment of CRISPR/Cas9 activity in mouse ES cells
Figure 5.3 Injectable form of RNAs into one-cell mouse embryo
Figure 6.1 Graphical summary of NORAD function

Chapter 1: Introduction

Early studies of RNA biology

After the initial demonstrations that DNA is the genetic material (Avery et al., 1944; Hershey and Chase, 1952), the “messenger” function of RNA in protein synthesis was proposed (Jacob and Monod, 1961), embodying a fundamental concept of molecular biology: the central dogma (Crick, 1970). Yet this simple picture of genetic information flow has been challenged many times by continued discoveries of RNA species that differ from messenger RNA (mRNA) (Cech and Steitz, 2014). In the early days, heterogeneous nuclear RNA (hnRNA) was isolated from HeLa cell nuclei (Warner et al., 1966), and some fractions were later found to be dissociated from polyribosomes and not to contribute to mRNA (Salditt-Georgieff et al., 1981; Salditt-Georgieff and Darnell, 1982). One might have supposed that these non-ribosome-bound RNAs had some noncoding function, until they turned out to be transient precursors of mRNA prior to splicing (Berget et al., 1977; Chow et al., 1977). However, there are now overwhelming numbers of examples showing that a significant portion of the RNA molecules in cells are bona fide noncoding transcripts.

Rather than merely serving as a scaffold for the protein components of the ribosome, ribosomal RNA (rRNA) has been shown to have catalytic functions in protein synthesis (Dahlberg, 1989), while transfer RNA (tRNA) serves as an adapter that bridges mRNA codons and amino acids (Hoagland et al., 1958). In nucleoli, small nucleolar RNAs (snoRNAs) were identified (Zieve and Penman, 1976) and later found to use base-pairing to guide small nucleolar ribonucleoproteins (snoRNPs) to rRNA and other types of RNA for chemical modification and processing (Kiss-Laszlo et al., 1996; Ni et al., 1997), which are important steps in ribosome biogenesis. Since the report of highly abundant U-rich small RNAs in HeLa cells (Weinberg and Penman, 1968), a rich literature has accumulated describing how U-rich small nuclear RNAs (U snRNAs) function in splicing by base-pairing with splice sites and inducing catalytic activity in the spliceosome (Busch et al., 1982). At the tips of linear eukaryotic chromosomes, the ribonucleoprotein (RNP) telomerase maintains telomere length by synthesizing telomere repeats (Greider and Blackburn, 1989), and the RNA component of this RNP (TR, TER, or TERC) functions as a “flexible scaffold” that brings together the accessory proteins required for telomerase reverse transcriptase (TERT) activity (Zappulla and Cech, 2004). 7SK RNA is also known to function as a scaffold for the protein components required for another important biological process, the elongation phase of Pol II transcription. This highly structured RNA binds Hexim1 and LARP7 and regulates the P-TEFb elongation factor (Yik et al., 2003).

Noncoding functions of RNA are not limited to the cell nucleus. 7SL RNA scaffolds the formation of the signal recognition particle (SRP), which enables translocation of nascent proteins across the endoplasmic reticulum (ER) membrane (Walter and Blobel, 1982). This RNA component is known to stabilize the SRP complex and enhance the interaction between SRP and the SRP receptor (Doudna and Batey, 2004). More recently, small RNAs in the cytoplasm that regulate gene expression post-transcriptionally were discovered (Lee et al., 1993; Wightman et al., 1993). Rather than carrying out constitutive cellular functions such as mRNA production and maturation, protein synthesis and transport, and telomere maintenance, these tiny RNA species fine-tune the levels of mRNAs. Their expression patterns are usually specific to tissues and/or developmental stages, which explains why it took so long in the history of RNA biology for their existence and mechanisms of action to be revealed.

Discovery of microRNA and functions in human physiology and disease

The phenomenon of RNA interference (RNAi) was first hinted at by transgene experiments in plants (Napoli et al., 1990), and Andrew Fire and Craig Mello later discovered that double-stranded RNA is the agent responsible for this sequence-specific gene silencing effect (Fire et al., 1998). In the meantime, two independent groups, led by Victor Ambros and Gary Ruvkun, found that a 22-nucleotide (nt) small RNA encoded by lin-4 regulates lin-14 posttranscriptionally to control developmental timing in the nematode worm C. elegans (Lee et al., 1993; Wightman et al., 1993). However, because lin-4 lacked sequence homologs in other animals, these groundbreaking findings were not fully appreciated until the discovery of the 21 nt RNA let-7 (Reinhart et al., 2000), which is deeply conserved across bilaterian animals (Pasquinelli et al., 2000), suggesting that posttranscriptional gene silencing (PTGS) mediated by such small RNAs might be a general gene regulatory mechanism (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001) that arose very early in evolution. Collectively classified as microRNAs (miRNAs), these small RNAs were further found to be conserved in animals, plants, fungi and protozoa (Bartel, 2004).

Animal miRNAs are transcribed by RNA polymerase II as primary transcripts (pri-miRNAs) (Lee et al., 2004; Cai et al., 2004), and their biogenesis involves two steps of endonucleolytic processing (Lee et al., 2002). The initial transcript, with its characteristic hairpin structure, is cropped co-transcriptionally in the nucleus by a protein complex called the Microprocessor, which includes the RNase III-type endonuclease Drosha and DGCR8 (Lee et al., 2003). The resulting ~70 nt precursor microRNA (pre-miRNA) is then exported to the cytoplasm by exportin 5 (XPO5) in a RanGTP-dependent manner (Yi et al., 2003; Bohnsack et al., 2004). This intermediate precursor is subsequently processed by another RNase III protein, Dicer (Ketting et al., 2001; Knight and Bass, 2001), which cleaves it into a ~22 nt small dsRNA. One of the two strands, called the guide RNA, is preferentially selected and loaded onto Argonaute (Ago) proteins, which are the catalytic components of the RNA-induced silencing complex, or RISC (Hammond et al., 2000). RISC uses the sequence complementarity of the guide RNA to recognize target sequences in the 3’ UTR of mRNAs (Bartel, 2009), leading to destabilization of the target (Guo et al., 2010), mostly through deadenylation (Wu et al., 2006; Giraldez et al., 2006).
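
The target recognition step described above can be illustrated with a minimal sketch. It assumes a common simplification that is not spelled out in the text, namely that recognition is approximated by perfect Watson-Crick pairing between the mRNA and the "seed" of the guide RNA (nucleotides 2 to 8). The function names and the toy 3’ UTR are hypothetical; the let-7a guide sequence is given as it is commonly annotated.

```python
# Minimal sketch: locate candidate miRNA seed-match sites in a 3' UTR.
# Assumption (not from the text): targeting is approximated by perfect
# Watson-Crick pairing to the seed, i.e. guide nucleotides 2-8.
from typing import List

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site(guide: str) -> str:
    """Return the mRNA sequence (5'->3') that pairs with guide nt 2-8."""
    seed = guide[1:8]  # 0-based slice covering nucleotides 2..8
    return reverse_complement(seed)


def find_seed_matches(guide: str, utr: str) -> List[int]:
    """Return the 0-based start position of every seed match in the UTR."""
    site = seed_site(guide)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


# let-7a guide strand as commonly annotated; the UTR is a made-up toy sequence.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
TOY_UTR = "AAACUACCUCAGGGCUACCUCUU"

print(seed_site(LET7A))                    # CUACCUC
print(find_seed_matches(LET7A, TOY_UTR))   # [3, 14]
```

Real target prediction weighs additional features such as site context and conservation; the sketch only shows why a short stretch of complementarity is enough to nominate a candidate site.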

Initially discovered in the context of animal development (Lee et al., 1993; Wightman et al., 1993), gene regulation by miRNAs was later found to be important for diverse other biological processes and human diseases. For example, the miR-15a/16-1 cluster was found to be frequently deleted in B-cell chronic lymphocytic leukemia (B-CLL) patient samples (Calin et al., 2002). Many subsequent reports suggested that miRNA profiling can be used for the diagnosis, stratification, and prognosis of cancer (Calin et al., 2005; Calin and Croce, 2006; Lu et al., 2005), and that miRNAs can even be used as a therapeutic measure (Chivukula and Hollands, 2012). miRNA dysfunction is also known to be linked to cardiovascular disease and genetic disorders in humans (Mendell and Olson, 2012).

Now that it has become evident that these small RNAs are integral components of human physiology and disease, an immediate question arises: how is the expression of each miRNA regulated in particular spatiotemporal settings? To address this question, we first need to know how the genes encoding miRNAs are structured in our genome and wired into transcriptional and posttranscriptional regulatory networks, which has yet to be studied carefully and systematically. Our lab and others have invested great effort to demonstrate that miRNAs are functionally integrated into the oncogenic or tumor-suppressive signaling circuitry of well-established transcription factors such as Myc and p53 (He et al., 2007; O'Donnell et al., 2005; Chang et al., 2008). However, without a comprehensive map describing the configurations in which these genes are embedded and transcribed, such studies cannot be accelerated any further. Therefore, chapter 2 of this dissertation aims to provide a valuable resource in the form of a genome-wide annotation of miRNA primary transcripts, and to classify each type of transcript, enabling further research in the field.

Long noncoding RNAs transcribed in the human genome

The human genome carries nearly three billion bases of information, but only a tiny fraction, less than 2%, is known to be protein coding (Lander et al., 2001; Consortium et al., 2007). However, recent genome-wide interrogations of the mammalian transcriptome, enabled by genome tiling arrays and next-generation sequencing (NGS) technology, revealed that transcription is pervasive in genomes (Bertone et al., 2004; Carninci et al., 2005; Djebali et al., 2012), implying that thousands of noncoding transcripts are actively generated in at least some tissues and cell types. The exploration of the human transcriptome has paved the way for the discovery of a variety of new noncoding RNA classes and their multiple biological functions, revolutionizing our view of the role of the non-protein coding space in the human genome (Cech and Steitz, 2014). One of these emerging types of RNA is the class of long noncoding RNAs (lncRNAs), a heterogeneous group of transcripts defined by a length of more than 200 nucleotides and by the lack of any obvious open reading frame (ORF) (Guttman et al., 2013).
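As a rough illustration of the working definition above (longer than 200 nucleotides, no obvious ORF), the following Python sketch screens a single transcript sequence. The 100-codon ORF cutoff and all function names are assumptions made for this example only; the text does not specify a threshold, and real annotation pipelines rely on additional criteria.

```python
# Illustrative sketch only. The ORF cutoff (100 codons) is an assumed value,
# not a threshold taken from the dissertation.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Longest ATG-to-stop ORF on the forward strand, in codons (stop excluded)."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None                   # close it and keep scanning
    return best


def looks_like_lncrna(seq: str, min_length: int = 200, max_orf_codons: int = 100) -> bool:
    """True if the transcript is longer than min_length and has no long ORF."""
    return len(seq) > min_length and longest_orf_codons(seq) < max_orf_codons
```

Only the forward strand of the transcript is scanned, because a transcript sequence is already strand-specific.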

Unveiling the roles of lncRNAs in physiology, including developmental processes, epigenetic regulation, tissue differentiation and homeostasis (Pauli et al., 2011; Ulitsky et al., 2011; Fatica and Bozzoni, 2014), as well as in pathophysiology, including cancer and neurological disorders (Wapinski and Chang, 2011; Iyer et al., 2015; Faghihi et al., 2008; Ziats and Rennert, 2013), has contributed to the growing appreciation of their importance in diverse aspects of biology. There have been many attempts to comprehensively identify lncRNAs in the human genome, and the reported numbers, all in the many thousands, vary depending on the method used for transcript construction (i.e., cDNAs, tiling arrays, or RNA-seq), the criteria used to assess coding potential (CSF, ORF length, or Pfam), and the types of cell lines or tissue panels tested (Ulitsky and Bartel, 2013). The current version of GENCODE (Ver 22) estimates 15,900 lncRNA genes (http://www.gencodegenes.org/) (Harrow et al., 2012), and a recent meta-analysis of the human transcriptome predicted an even higher and surprising number of 58,648 (Iyer et al., 2015), which represents more than twice the number of protein coding genes. However, the exact number of lncRNAs in the human genome is still under debate, and the biological role and functionality of the overwhelming majority of these transcripts remain largely elusive.

Functional lncRNAs in mammals

Compared to other known noncoding RNA classes, lncRNAs stand out due to their

enormous diversity in terms of their evolutionary conservation, expression level,

molecular function, and genomic and cellular localization (Hung et al., 2014; Ulitsky and

Bartel, 2013). In the nucleus, lncRNAs such as XIST, HOTAIR and HOTTIP are known

to regulate gene expression at the transcriptional level by associating with chromatin

remodeling complexes in cis or trans. Other types of nuclear lncRNAs include Firre and

PCGEM1, which modify three-dimensional nuclear architecture by mediating the

formation of interchromosomal domains or enhancer-promoter interactions (Rinn and

Chang, 2012; Quinodoz and Guttman, 2014; Bonasio and Shiekhattar, 2014).

Collectively, many nuclear lncRNAs have been reported to influence the genome (Sabin

et al., 2013). On the other hand, cytoplasmic lncRNAs post-transcriptionally regulate

gene expression by base pairing to their target mRNAs (Yoon et al., 2013; Fatica and

Bozzoni, 2014). For instance, BACE1-AS and TINCR stabilize their target mRNAs

(Faghihi et al., 2008; Kretz et al., 2013), whereas 1/2sbsRNA facilitates target mRNA

degradation (Gong and Maquat, 2011). Interestingly, lincRNA-p21 is known to repress

translation of target genes in the cytoplasm (Yoon et al., 2012) while also having cis-

regulatory activity in the nucleus (Dimitrova et al., 2014).

Although an expanding number of lncRNAs have been identified in recent years and evidence for their involvement in human diseases is rapidly growing, studies on lncRNAs are still in their infancy. As yet, there have been only a few extensive

genetic studies that provide strong evidence for the biological relevance of a small

number of lncRNAs. There are still doubts about the functionality of many lncRNAs due

to their relatively low abundance as compared to protein coding genes (Cabili et al.,

2011) and due to their marginal sequence conservation through evolution (Ulitsky and

Bartel, 2013), suggesting that many, if not most, of them might be by-products of

promiscuous Pol II transcription (Schultes et al., 2005; Struhl, 2007). Therefore, it is

critical to rigorously study each potential lncRNA of interest with loss-of-function

experiments followed by a thorough identification of the underlying mechanism to prove

its biological function and significance.

In chapters 3 and 4, we describe the characterization and functional dissection of a poorly described lncRNA, which we termed NORAD. Unlike many other lncRNAs, NORAD is expressed as abundantly as several housekeeping genes, with a ubiquitous expression pattern across multiple tissues, high sequence homology in mammals, and conserved synteny, implying an important biological role. Interestingly, NORAD loss-of-function results in increased structural and numerical aneuploidy. We show that NORAD harbors an unusually high number of PREs and binds with high affinity to PUMILIO, suggesting that NORAD can sequester the cellular pool of PUMILIO proteins. Accordingly, PUMILIO overexpression phenocopies the CIN phenotype caused by NORAD loss-of-function, suggesting the following model: loss of NORAD leads to hyperactivity of PUMILIO and consequently to the suppression of PUMILIO-regulated CIN suppressor genes, which renders cells susceptible to chromosome segregation errors. Our findings reveal a new genetic axis important for the maintenance of chromosomal stability, in which a novel lncRNA modulates the activity of a key regulatory protein of mRNA expression.

Chapter 2: Genome-wide annotation of microRNA primary

transcript structures

Introduction

microRNAs (miRNAs) are a broad class of ~18-24 nucleotide RNA molecules that play a

critical role in regulating gene expression in diverse physiologic settings and diseases by

negatively regulating the translation and stability of target messenger RNAs (mRNAs)

(Bartel, 2009). Over the past decade, significant progress has been made in identifying

miRNA targets and dissecting the mechanisms through which they are regulated by

miRNA-directed protein complexes (Gurtan and Sharp, 2013; Pasquinelli, 2012).

However, much less is known about how miRNA expression is regulated (Winter et al.,

2009; Schanen and Li, 2011). Through examination of mature miRNA levels, it is well

established that miRNA abundance is tightly controlled during development and across

tissues (Chiang et al., 2010; Landgraf et al., 2007). Moreover, dysregulated expression

of specific miRNAs plays a causative role in a number of human diseases, including

cancer and cardiovascular disease (Di Leva et al., 2014; Olson, 2014). Indeed, key

transcription factors and signaling pathways have been shown to strongly regulate

miRNA expression under diverse physiologic and pathophysiologic conditions

(Lotterman et al., 2008). Nevertheless, a major bottleneck in the dissection of the

mechanisms through which these pathways control miRNA levels has been our

incomplete understanding of miRNA gene structures.

miRNAs are initially transcribed by RNA polymerase II as long primary transcripts (pri-

miRNAs) that can extend hundreds of kilobases in length (Lee et al., 2004; Cai et al.,

2004). The mature miRNA sequences are located in introns or exons of pri-miRNAs,

within regions that fold into imperfect hairpin structures (Rodriguez et al., 2004). The

RNA-binding protein DGCR8 and the RNase III enzyme DROSHA together recognize

and cleave the hairpins, generating ~60-80 nucleotide precursors (pre-miRNAs) that are

subsequently exported to the cytoplasm where they are processed into mature miRNAs

by DICER. Once loaded into the Argonaute family of RNA-binding proteins, miRNAs

select mRNA targets for repression (Ha and Kim, 2014). While a subset of miRNAs are

hosted in well characterized protein-coding genes, the majority of pri-miRNAs are

transcribed as poorly-characterized noncoding transcripts (Rodriguez et al., 2004). Due

to the nature of rapid and efficient DROSHA/DGCR8 processing, the abundance of pri-

miRNAs is very low at steady-state. Therefore, elucidation of pri-miRNA structure has

remained a significant challenge. A further understanding of the organization of miRNA

transcription units will likely reveal new transcriptional and post-transcriptional regulatory

mechanisms that influence miRNA biogenesis and potentially uncover new opportunities

to manipulate miRNA expression for experimental or therapeutic applications.

Previous studies have systematically identified genomic locations of the promoters and

transcription start sites (TSSs) of miRNAs by integrating chromatin signatures such as H3K4me3 histone modifications, nucleosome position, cap analysis of gene expression

(CAGE) tags, and high-throughput TSS sequencing (TSS-Seq) (Chien et al., 2011;

Ozsolak et al., 2008; Georgakilas et al., 2014; Xiao et al., 2014; Marsico et al., 2013;

Megraw et al., 2009; Marson et al., 2008). Nevertheless, while providing valuable

information regarding the boundaries of miRNA transcription units, these approaches do

not provide annotation of the often complex splicing patterns of miRNA primary

transcripts and thus provide an incomplete picture of miRNA gene structure. Moreover,

miRNA promoters that are located at great distances from the mature miRNA sequence

are not easily associated with a given miRNA transcription unit and alternative promoter

usage can be difficult to discern. Finally, without an understanding of the structure of the

pri-miRNA itself, it is impossible to determine whether miRNAs encoded by polycistronic

clusters are always co-transcribed or whether transcripts carrying subsets of the

clustered miRNAs are produced through use of alternative promoters, polyadenylation

sites, or even through alternative splicing.

In recent years, high-throughput RNA sequencing (RNA-seq) has emerged as a

powerful tool for transcriptome reconstruction (Martin and Wang, 2011; McGettigan,

2013). Unfortunately, due to their low abundance, pri-miRNAs are poorly represented in

standard RNA-seq datasets, thus preventing comprehensive annotation of their

structures using existing methodologies. To overcome this limitation, we developed a

highly effective experimental and computational approach that allows genome-wide

mapping of miRNA primary transcript structures. By performing deep RNA-seq in cells

expressing a dominant negative DROSHA mutant protein, we demonstrated dramatic

enrichment of intact pri-miRNAs, resulting in much greater coverage of these transcripts

compared to standard RNA-seq. This strategy permitted the reconstruction of pri-

miRNA structures in a high-throughput manner. We applied this approach to human and

mouse cell lines of diverse origins, thereby significantly improving the existing annotation

of mammalian miRNA genes. These new assemblies revealed new regulatory

mechanisms for many miRNAs, including previously unknown connections between pri-

miRNAs and distant protein coding genes, alternative pri-miRNA splicing, and pri-miRNA

transcripts that produce subsets of miRNAs encoded by polycistronic clusters. This new

genome-wide map of pri-miRNA structure provides a valuable resource for investigating

the mechanisms that control miRNA expression in normal physiology and disease.

Results

Pri-miRNAs are poorly represented in standard RNA-seq datasets

In order to globally reconstruct pri-miRNA structures, we first examined existing RNA-

seq datasets to determine whether they could be used for this purpose. The Illumina

BodyMap 2.0 represents a collection of RNA-seq datasets generated from 16 human

tissues, each sequenced very deeply (~80 million 50 bp paired-end reads per sample)

(www.ebi.ac.uk/arrayexpress; ArrayExpress ID: E-MTAB-513). As described in greater

detail below, we determined that StringTie, a transcriptome assembler that we recently

described (Pertea et al., 2015), outperforms other existing assembly algorithms for pri-

miRNA reconstruction. We therefore employed StringTie to assess pri-miRNA assembly

using Illumina BodyMap data.

Although assemblies were attempted for all human pri-miRNAs, the quality and extent of

pri-miRNA reconstruction was assessed by examining a well-annotated set of miRNAs

that are highly conserved among mammals (Chiang et al., 2010). Non-conserved

human miRNAs were excluded from this performance analysis since these are

frequently expressed at low levels and there is no current consensus regarding which of

these represent bona fide miRNAs as opposed to non-functional RNAs that spuriously

enter the miRNA processing pathway (Chiang et al., 2010; Kozomara and Griffiths-

Jones, 2014). 295 human miRNAs, produced from 183 transcription units, are classified

as conserved among mammals (Figure 2.1).

Figure 2.1 Overview of the organization and existing annotation of conserved human miRNA genes

Of these 183 transcription units, 80 represent well-annotated protein coding genes,

whereas the remaining 103 are intergenic. While the structures of 29 of these intergenic

pri-miRNAs are annotated in RefSeq, the majority (74 of 103) have no existing

annotation. Assembly of all 16 BodyMap datasets using StringTie, which comprised the analysis of over 1.2 × 10^9 reads, resulted in the assembly of only 11 additional novel pri-miRNA structures covering the set of conserved miRNAs (Table 2.1). These results

indicate that standard RNA-seq libraries are inadequate for transcriptome-wide

reconstruction of pri-miRNA structures.

Table 2.1 Conserved miRNAs encoded by newly annotated pri-miRNAs

Entries are listed per class for each data source (Illumina BodyMap 2.0, human cell lines, mouse cell lines). Footnote markers (a, b, c) follow the entry names.

Class I: Independent noncoding transcription units

Illumina BodyMap 2.0: miR-23a/24-2/27ab, miR-101-1/3671, miR-141/200cb, miR-142a, miR-193b/365ab, miR-219-2a, miR-223c

Human cell lines: let-7a-1/7f-1/7d, let-7i, miR-10b, miR-23a/24-2/27ab, miR-29c/29b-2, miR-30a/30c-2b, miR-30b/30db, miR-34a, miR-92bb, miR-101-1/3671, miR-129-2, miR-130a, miR-130b/301b, miR-132/212b, miR-138-1, miR-141/200cb, miR-144/451a/4732b, miR-146a/3142b, miR-148ab, miR-187b, miR-192/194-2b, miR-193b/365ab, miR-194-1/215, miR-200a/200b/429, miR-221/222, miR-302a/302b/302c/302d/367

Mouse cell lines: let-7a-1/7f-1/7d, let-7i, miR-7a-2, miR-17/18a/19a/20a/19b-1/92a-1, miR-31, miR-129-1a, miR-129-2, miR-130a, miR-133b/206a, miR-137, miR-138-1, miR-138-2a, miR-142a, miR-150a, miR-155, miR-191/425, miR-194-1/215, miR-199a-1a, miR-219-2a, miR-221/222, miR-302a/302b/302c/302d/367, miR-384, miR-670a, miR-3074-1a

Class II: Extension of existing protein-coding transcripts

Illumina BodyMap 2.0: miR-21, miR-505

Human cell lines: miR-7-2, miR-21, miR-34b/34c, miR-181c/181db, miR-196a-1, miR-219-1, miR-324, miR-505

Mouse cell lines: miR-10a, miR-34b/34c, miR-196a-1, miR-196a-2a, miR-196b, miR-200a/200b/429, miR-219-1, miR-320a, miR-324, miR-331a, miR-345a, miR-505

Class III: Extension of existing non-coding transcripts

Illumina BodyMap 2.0: miR-29a/29b-1, miR-370

Human cell lines: let-7e/miR-99b/miR-125a, miR-9-3, miR-29a/29b-1, miR-296/298, miR-370

Mouse cell lines: let-7e/miR-99b/miR-125a, miR-9-2, miR-18b/19b-2/20b/92a-2/106a/363a, miR-29a/29b-1, miR-296/298

a: not mapped in human cell lines; b: not mapped in mouse cell lines; c: not mapped in human and mouse cell lines

DROSHA inhibition facilitates pri-miRNA assembly

During miRNA biogenesis, pri-miRNAs are first processed in the nucleus by the

microprocessor complex composed of DROSHA and DGCR8. We reasoned that the

low steady-state abundance of pri-miRNAs, and their poor representation in standard

RNA-seq libraries, is most likely due to their rapid degradation following microprocessor-

mediated cleavage. Therefore, we hypothesized that slowed or disrupted

DROSHA/DGCR8 activity may result in an enrichment of pri-miRNAs in RNA-seq

libraries and thereby facilitate pri-miRNA assembly. To test this concept, a trans-

dominant negative DROSHA mutant protein (TN-DROSHA) containing inactivating

mutations in critical residues in the catalytic RNase IIIa and IIIb domains (Heo et al.,

2008) was ectopically expressed in HEK293T cells, and nuclear RNA was analyzed by

quantitative real time PCR (qRT-PCR). Amplicons spanning pre-miRNA hairpins in the

primary transcripts that encode the miR-15a/16-1 and miR-17-92 clusters (DLEU2 and

MIR17HG, respectively) were strongly enriched following TN-DROSHA expression,

indicating efficient inhibition of microprocessor activity (Figure 2.2). Importantly, distant

regions of these pri-miRNAs that do not span the pre-miRNA hairpins also showed

significant enrichment, suggesting that the entire pri-miRNA was stabilized.

Figure 2.2 DROSHA inhibition enables capturing primary microRNA transcripts

qPCR analysis of pri-miRNA abundance in HEK293T cells with or without expression of TN-DROSHA. The assayed transcripts DLEU2 and MIR17HG are depicted in the upper panel with green arrows indicating the location of primers. qPCR results are shown in the lower panel with error bars representing standard deviations derived from three independent measurements.

Next, we subjected the same nuclear RNA from TN-DROSHA expressing HEK293T

cells to Illumina RNA sequencing to test its suitability for transcriptome-wide pri-miRNA

assembly. After generating a very deep RNA-seq dataset (193,346,087 100bp paired-

end reads), we evaluated several transcriptome assemblers, such as StringTie, Cufflinks

(Trapnell et al., 2010), IsoLasso (Li et al., 2011), and Scripture (Guttman et al., 2010), to

assess their performance for this application (Table 2.2). By evaluating the assembly of

pri-miRNAs that are annotated in RefSeq, we found that StringTie correctly assembled

the highest number of pri-miRNA transcripts in considerably less time than the other

assemblers. We therefore used StringTie for all subsequent pri-miRNA assembly

experiments.

Table 2.2 Evaluation of the performance of four transcriptome assembly programs on pri-miRNAs that are annotated in RefSeq (see note)

| Program | Number of predicted pri-miRNA transcripts matching the RefSeq annotation | Number of RefSeq pri-miRNAs for which at least one transcript was assembled correctly by the program | Running time (hours:minutes:seconds) |
| --- | --- | --- | --- |
| StringTie | 561 | 467 | 1:13:23 |
| Cufflinks | 378 | 337 | 21:01:08 |
| IsoLasso | 90 | 82 | 14:36:04 |
| Scripture | 293 | 200 | 65:57:32 |

Note: There are 788 RefSeq genes (1,836 transcripts) that overlap 876 miRNAs annotated in miRBase release 20 (out of 1,871 total miRNAs).
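To make the comparison in Table 2.2 easier to read, the short Python sketch below converts the running times to seconds and expresses the recovered RefSeq pri-miRNAs as a fraction. Using the 788 RefSeq genes from the table note as the denominator is an assumption of this example, since the table does not state it explicitly; the variable and function names are likewise illustrative.

```python
# Values transcribed from Table 2.2:
# (matching transcripts, RefSeq pri-miRNAs recovered, running time h:m:s)
TABLE_2_2 = {
    "StringTie": (561, 467, "1:13:23"),
    "Cufflinks": (378, 337, "21:01:08"),
    "IsoLasso": (90, 82, "14:36:04"),
    "Scripture": (293, 200, "65:57:32"),
}

# Assumed denominator: the 788 RefSeq genes mentioned in the table note.
REFSEQ_MIRNA_GENES = 788


def to_seconds(hms: str) -> int:
    """Convert an 'hours:minutes:seconds' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def summarize(table):
    """Return one record per program, best recovery first."""
    rows = []
    for program, (matching, recovered, runtime) in table.items():
        rows.append({
            "program": program,
            "matching_transcripts": matching,
            "recovered_fraction": recovered / REFSEQ_MIRNA_GENES,
            "runtime_s": to_seconds(runtime),
        })
    return sorted(rows, key=lambda r: r["recovered_fraction"], reverse=True)


summary = summarize(TABLE_2_2)
for row in summary:
    print(f"{row['program']:<10} {row['recovered_fraction']:.1%} "
          f"{row['runtime_s'] / 3600:.1f} h")
```

Under that assumption, StringTie ranks first on both recovery and running time, in line with the conclusion drawn in the text.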

When RNA-seq data from TN-DROSHA expressing HEK293T cells were used, pri-

miRNA assembly was dramatically improved compared to results obtained using the

Illumina BodyMap. From this single cell line, 24/74 conserved intergenic pri-miRNAs

that lack existing annotation were assembled. When combined with RefSeq annotation,

53/103 conserved intergenic pri-miRNAs in total were defined, essentially doubling the

available annotation of conserved non-protein coding pri-miRNAs. Reads mapping to

miRNA loci were highly enriched for those that span splice sites, allowing reconstruction

of multi-exonic pri-miRNA structures. Illustrative of these improved assemblies, 3 multi-

exonic transcripts that encode miR-221 and miR-222 were reconstructed using RNA-seq

data generated from TN-DROSHA-expressing HEK293T cells, while few reads mapping

to these transcripts were present in Illumina BodyMap data (Figure 2.3).

Figure 2.3 DROSHA inhibition facilitates pri-miRNA assembly

Visualization of RNA-seq data from Illumina Human BodyMap 2.0 (kidney and liver) and TN-DROSHA-transfected HEK293T cells. The Integrative Genomics Viewer (IGV) was used to visualize mapped read alignments. Segments of reads that are aligned to the genome are shown in grey, while blue lines represent spliced sequences. StringTie assembled transcripts produced from this locus are shown at the bottom of the panel. Plots representing H3K4Me3 histone marks and evolutionary conservation were generated using the UCSC Genome Browser (human genome GRCh37/hg19 assembly). The y-axes for UCSC Genome Browser tracks shown in this and all other figures represent the default vertical viewing range settings.

These transcript assemblies were validated by confirming the predicted exon-exon

junctions using reverse-transcriptase PCR (RT-PCR) with primers near the 5' and 3'

ends of the transcripts (Figure 2.4). Notably, although the 5' ends of these transcripts

are ~25-100 kb upstream of the MIR221 and MIR222 sequences, analysis of ENCODE

chromatin immunoprecipitation sequencing (ChIP-seq) data (Ernst et al., 2011) revealed

precise co-localization with H3K4me3 promoter marks (Figure 2.3), supporting the

correct identification of these transcription start sites. These results demonstrate that

inhibition of microprocessor activity by expression of TN-DROSHA greatly improves pri-

miRNA assembly in RNA-seq data.

Figure 2.4 RT-PCR validation of newly assembled primary transcripts encoding human miR-221 and miR-222.

Green arrows indicate the location of primers. RT-PCR results are shown below the transcript alignments with PCR products corresponding to the assembled transcripts highlighted with red arrowheads. Identities of all PCR products were verified by DNA sequencing. A nonspecific PCR product is indicated with an asterisk.

Genome-wide annotation of pri-miRNAs

Having established an experimental and computational strategy suitable for pri-miRNA

reconstruction, we next sought to apply this approach to generate a genome-wide map

of human and mouse pri-miRNA structures. Since miRNA expression is often cell-type

and tissue specific (Olive et al., 2015), we selected for analysis a panel of 8 human cell

lines (A-172, A-673, HCT116, HEK293T, HepG2, MCF-7, NCCIT, and primary

fibroblasts) and 6 mouse cell lines (C2C12, CT-26, Hepa1-6, Neuro-2a, mouse

embryonic fibroblasts (MEF), and E14TG2a embryonic stem cells) derived from a

diverse array of cell-types. Transfection conditions were optimized for each cell line and

TN-DROSHA was introduced, followed by RNA-seq and StringTie transcriptome

reconstruction (Figure 2.5). On average, approximately 180 million 100bp paired-end

reads were generated per sample (Table 2.3).

Figure 2.5 Overview of the experimental workflow used to generate pri-miRNA assemblies.

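Table 2.3 RNA-seq mapping statistics

| Species | Cell type | Read count | Mapping frequency |
| --- | --- | --- | --- |
| Human | A172 | 184,705,740 | 92.50% |
| Human | A673 | 174,578,382 | 93.40% |
| Human | Fibroblast | 142,718,780 | 92.20% |
| Human | HCT116 | 150,638,560 | 90.20% |
| Human | HEK293 | 193,346,087 | 86.70% |
| Human | HepG2 | 221,060,288 | 91.10% |
| Human | MCF7 | 160,067,256 | 91.00% |
| Human | NCCIT | 165,209,310 | 93.30% |
| Mouse | C2C12 | 163,248,130 | 92.50% |
| Mouse | CT-26 | 215,970,827 | 90.90% |
| Mouse | E14TG2a | 211,111,824 | 91.00% |
| Mouse | Hepa1-6 | 150,418,313 | 93.40% |
| Mouse | MEF | 200,927,640 | 90.20% |
| Mouse | Neuro-2a | 193,572,149 | 89.50% |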
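The figure of approximately 180 million reads per sample quoted in the text can be checked directly against the read counts in Table 2.3. The following Python sketch does so; the dictionary layout and variable names are choices made for this example.

```python
# Read counts transcribed from Table 2.3.
READ_COUNTS = {
    "human": {
        "A172": 184_705_740, "A673": 174_578_382, "Fibroblast": 142_718_780,
        "HCT116": 150_638_560, "HEK293": 193_346_087, "HepG2": 221_060_288,
        "MCF7": 160_067_256, "NCCIT": 165_209_310,
    },
    "mouse": {
        "C2C12": 163_248_130, "CT-26": 215_970_827, "E14TG2a": 211_111_824,
        "Hepa1-6": 150_418_313, "MEF": 200_927_640, "Neuro-2a": 193_572_149,
    },
}

# Mean depth over all 14 libraries, and separately per species.
all_counts = [n for samples in READ_COUNTS.values() for n in samples.values()]
mean_reads = sum(all_counts) / len(all_counts)
mean_by_species = {
    species: sum(samples.values()) / len(samples)
    for species, samples in READ_COUNTS.items()
}

print(f"mean over {len(all_counts)} samples: {mean_reads / 1e6:.1f} million reads")
for species, mean in mean_by_species.items():
    print(f"{species}: {mean / 1e6:.1f} million reads")
```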

Using these data, pri-miRNA assemblies were provided for 1291/1871 (69%) of human

miRNAs and 888/1181 (75%) of mouse miRNAs that are annotated in miRBase version

20. This includes assemblies for 594 human and 425 mouse miRNAs that are not

hosted by annotated protein-coding genes. As mentioned above, non-conserved

intergenic miRNAs are generally very low in abundance and consensus is lacking

regarding which of these represent true miRNA genes. Therefore, to more accurately

assess the quality of these pri-miRNA assemblies, we focused on the pri-miRNA

transcripts that encode the set of 295 human and 297 mouse miRNAs that are

conserved among mammals (Chiang et al., 2010), which represents a more reliable set

of bona fide miRNAs. 38% (39 of 103) of human and 39% (41 of 104) of mouse

conserved non-protein coding pri-miRNAs were successfully reconstructed in at least

one cell line (Figure 2.6). When combined with existing RefSeq data, annotation for

66% and 59% of conserved intergenic miRNA genes was provided in total for human

and mouse, respectively.
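The percentages in this paragraph follow from the stated counts, as the short Python sketch below shows. For the combined human figure it assumes that the 39 newly reconstructed pri-miRNAs do not overlap the 29 already annotated in RefSeq (the count given earlier in this chapter); that assumption reproduces the reported 66%. The corresponding RefSeq count for mouse is not stated in the text, so the mouse total is not recomputed here.

```python
def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)


# All annotated miRNAs in miRBase version 20 with a pri-miRNA assembly.
human_all = pct(1291, 1871)
mouse_all = pct(888, 1181)

# Conserved non-protein coding pri-miRNAs reconstructed in at least one cell line.
human_new = pct(39, 103)
mouse_new = pct(41, 104)

# Human total including RefSeq. Assumption: the 29 RefSeq-annotated
# pri-miRNAs are disjoint from the 39 reconstructed in this study.
HUMAN_REFSEQ_ANNOTATED = 29
human_combined = pct(39 + HUMAN_REFSEQ_ANNOTATED, 103)

print(human_all, mouse_all, human_new, mouse_new, human_combined)
```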

Figure 2.6 General characteristics of human and mouse pri-miRNAs

(A) Proportion of conserved non-protein coding human and mouse pri-miRNAs annotated in this study or in RefSeq in at least one cell type.

(B, C) Intronic or exonic locations of conserved miRNAs transcribed within protein coding (B) or non-protein coding genes (C).

General characteristics and conservation of pri-miRNAs

Using these improved pri-miRNA maps, we examined the characteristics that typify

miRNA-encoding genes. As expected, of the conserved miRNAs that are hosted within

protein-coding genes, a large majority of pre-miRNA hairpins are located in introns (75%

in human and 83% in mouse, Figure 2.6B). For conserved intergenic miRNAs, the

frequency of intronic miRNAs drops to approximately 40% with the remainder in exons

or regions that may be intronic or exonic due to alternative splicing (Figure 2.6C). In

some cases, intergenic miRNAs are hosted in unspliced noncoding RNAs (6% in human

and 8% in mouse).
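The location categories used above (intronic, exonic, either depending on alternative splicing, or hosted in an unspliced transcript) can be expressed as a simple interval test. The Python sketch below is one possible formulation under stated assumptions: coordinates are half-open intervals on a single chromosome and strand, every supplied isoform spans the hairpin, and the function name and decision rules are illustrative rather than the procedure actually used in this study.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_hairpin(hairpin: Interval, isoforms: List[List[Interval]]) -> str:
    """Classify a pre-miRNA hairpin relative to the exons of its host isoforms.

    Each isoform is a list of exon intervals and is assumed to span the hairpin.
    """
    if not isoforms:
        raise ValueError("at least one host isoform is required")
    if all(len(exons) == 1 for exons in isoforms):
        return "unspliced host"
    in_exon = [any(overlaps(hairpin, exon) for exon in exons) for exons in isoforms]
    if all(in_exon):
        return "exonic"
    if not any(in_exon):
        return "intronic"
    return "intronic or exonic"  # depends on which isoform is produced


# Toy coordinates, invented for illustration.
hairpin = (500, 570)
skipping_isoform = [(0, 100), (900, 1000)]
including_isoform = [(0, 100), (450, 600), (900, 1000)]
print(classify_hairpin(hairpin, [skipping_isoform, including_isoform]))
```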

In cases where orthologous human and mouse intergenic pri-miRNAs were assembled,

we frequently observed conservation of the organization of these miRNA-encoding loci.

The locations of pri-miRNA promoters were particularly highly conserved, with the 5'

ends of these transcripts almost always mapping to orthologous regions in the human

and mouse genomes when pri-miRNA assemblies were available for both species.

Representative examples of conserved pri-miRNAs are shown in Figure 2.7.

Figure 2.7 Examples of evolutionarily conserved pri-miRNAs

(A) Genomic loci encoding human and mouse miR-101-1. StringTie assembled transcripts, as well as H3K4me3 marks, CpG islands, and conservation tracks from the UCSC Genome Browser (hg19 and mm10), are shown.

(B) Genomic loci encoding human and mouse miR-324 as in panel A. The RefSeq protein coding transcript DLG4 is shown in blue.

For instance, we identified two distinct pri-miRNAs encoding human miR-101-1, each of which uses a different transcription start site located approximately 9 kb upstream of the miRNA (Figure 2.7A). The presence of CpG islands and H3K4me3 histone marks near the transcript 5' ends supports these assemblies. Likewise, two transcription start sites were mapped to a GC-rich region 9 kb upstream of the sequence that encodes mouse miR-101a (Figure 2.7A). Both the human and mouse pri-miRNA transcripts are composed of 2 exons, with the miRNA located in exon 2. These transcript structures were confirmed by RT-PCR (Figures 2.8, 2.9). Human and mouse miR-324 are also representative of miRNAs encoded by transcription units with conserved organization and, as discussed in greater detail below, represent a class of pri-miRNAs that are transcribed as 5' extensions of annotated protein coding genes (Figure 2.7B).

Figure 2.8 RT-PCR validation of newly assembled primary transcripts encoding human and mouse miR-101-1.

Green arrows indicate the location of primers. RT-PCR results are shown below the transcript alignments with PCR products corresponding to the assembled transcripts highlighted with red arrowheads. Identities of all PCR products were verified by DNA sequencing.

Figure 2.9 RT-PCR validation of newly assembled primary transcripts encoding mouse miR-101-1

Green arrows indicate the location of primers. RT-PCR results are shown below the transcript alignments with PCR products corresponding to the assembled transcripts highlighted with red arrowheads. Identities of all PCR products were verified by DNA sequencing.

Classification of miRNA gene structures

Examination of miRNAs that are not hosted within protein coding genes revealed that

their primary transcripts could be catalogued into 3 broad classes (Table 2.1), each

described below and illustrated in Figure 2.10.

Class I: Independent noncoding transcription units

Approximately 60-70% of newly-defined noncoding pri-miRNAs that host conserved

miRNAs do not overlap any existing annotated genes and likely represent independent

transcription units (Table 2.1). For example, MIR30A and MIR30C-2 are intergenic

miRNA genes with no existing annotation of their primary transcripts (Figure 2.10A).

Our assemblies revealed two putative overlapping pri-miRNAs that initiate and terminate

at distinct sites. The 5' ends of both transcripts co-localize with ENCODE H3K4me3 ChIP-seq signals and were validated using 5' rapid amplification of cDNA ends (RACE) (Figure 2.11). 3' RACE was used to confirm the distal termini of the transcripts while

RT-PCR verified their exonic structure (Figures 2.11, 2.12). Although it is generally

assumed that clustered miRNAs such as these are always co-transcribed, it is

noteworthy that use of the upstream promoter produces a transcript that encodes miR-

30a but not miR-30c-2. These results suggest that production of miR-30a is uncoupled

from miR-30c-2 in some settings. As discussed further below, we found additional

examples of pri-miRNA transcripts that produce subsets of clustered miRNAs.
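Whether a given transcript can produce all members of a miRNA cluster reduces to asking which hairpins lie inside the transcribed region. The Python sketch below illustrates that check. All coordinates are invented toy values on a single strand, and the second transcript is purely hypothetical, included only to contrast with a transcript that carries a subset of the cluster; none of these numbers describe the real MIR30A/MIR30C-2 locus.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates


def encoded_mirnas(transcript_span: Interval, hairpins: Dict[str, Interval]) -> List[str]:
    """Names of hairpins that lie entirely within the transcribed region."""
    start, end = transcript_span
    return sorted(
        name for name, (h_start, h_end) in hairpins.items()
        if start <= h_start and h_end <= end
    )


# Toy hairpin positions (invented for illustration only).
hairpins = {"miR-30a": (5_000, 5_070), "miR-30c-2": (30_000, 30_070)}

# Toy transcripts: one that terminates before the second hairpin,
# and one hypothetical transcript that spans both hairpins.
transcripts = {
    "upstream_promoter_transcript": (0, 12_000),
    "hypothetical_long_transcript": (3_000, 35_000),
}

for name, span in transcripts.items():
    print(name, encoded_mirnas(span, hairpins))
```

The check uses the full transcribed span rather than exons alone, because hairpins located in introns are still present in the primary transcript.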

Figure 2.10 Classification of newly annotated miRNA genes

(A) Class I pri-miRNAs, represented by the transcripts that encode miR-30a and miR-30c-2, are independent noncoding transcription units with no existing annotation.

(B) Class II pri-miRNAs, represented by the transcript that encodes miR-505, are extensions of annotated protein coding transcripts. The RefSeq protein coding transcript ATP11C is shown in blue.

(C) Class III pri-miRNAs, represented by the transcript that encodes miR-99b, let-7e, and miR-125a, are extensions of annotated noncoding transcripts. The RefSeq noncoding transcript SPACA6P is shown in blue.

Figure 2.11 5' and 3' RACE analysis of newly assembled primary transcripts encoding human miR-30a and miR-30c-2

The upper panel summarizes the overall transcript structures while the lower panel shows primer locations (green arrows) with red ticks indicating the end of each individual sequenced RACE clone. Putative polyadenylation signals are shown in blue.

Figure 2.12 RT-PCR validation of newly assembled primary transcripts encoding human miR-30a and miR-30c-2.

Green arrows indicate the location of primers. RT-PCR results are shown below the transcript alignments with PCR products corresponding to the assembled transcripts highlighted with red arrowheads. Identities of all PCR products were verified by DNA sequencing. The two PCR products generated with primer pair 556/557 result from alternative splicing.

Class II: Extended protein-coding transcripts

In addition to completely independent transcription units, we unexpectedly observed that

several pri-miRNAs are produced as extended isoforms of annotated protein coding

genes (Table 2.1 and Figure 2.10B). This configuration is illustrated by MIR505, which

is located ~100 kb upstream of the gene that encodes the ATP11C protein.

Remarkably, we observed that the predominant promoter that drives ATP11C

transcription is located upstream of MIR505, with the miRNA hairpin located within intron

1 of the extended transcript. Indeed, ENCODE H3K4me3 ChIP-seq signal is

significantly higher at the extended transcript 5' end compared to the RefSeq annotated

ATP11C promoter. RT-PCR confirmed the existence of the extended miRNA-hosting

transcript (Figure 2.13). Additional examples of similarly organized pri-miRNAs

encoding miR-181c/181d and miR-219-1 are provided in Figure 2.14.

Figure 2.13 RT-PCR validation of newly assembled primary transcripts encoding human miR-505

Green arrows indicate the location of primers. RT-PCR results are shown below the transcript alignments with PCR products corresponding to the assembled transcripts highlighted with red arrowheads. Identities of all PCR products were verified by DNA sequencing. The two PCR products represent alternatively spliced isoforms.

Figure 2.14 Additional examples of human miRNAs that are transcribed as extensions of annotated protein-coding genes

Class III: Extended annotated noncoding transcripts

The third class of pri-miRNAs that we observed comprises transcripts that overlap annotated

RefSeq noncoding RNAs. This type of transcript is exemplified by the pri-miRNA that

encodes miR-99b, let-7e, and miR-125a (Figure 2.10C). These miRNAs are located

immediately upstream of an annotated noncoding RNA, SPACA6P. In our assemblies, a

longer transcript that encompasses both the miRNAs and SPACA6P was detected. RT-

PCR confirmed the transcript structure predicted by our data (Figure 2.15). It is likely

that the existing annotation of SPACA6P actually represents the 3' cleavage product of

the MIR99B/MIRLET7E/MIR125A pri-miRNA that is produced by DROSHA processing,

since the 5' end of SPACA6P is immediately adjacent to the 3' end of the pre-miR-125a

hairpin. We speculate that this class of pri-miRNAs is largely composed of transcripts

that are incompletely annotated in RefSeq.
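
The adjacency argument above can be expressed as a simple coordinate check. The following is a minimal sketch only: the coordinates are made-up placeholders rather than values from our assemblies, and `max_gap` is an assumed tolerance, not a parameter used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A stranded genomic interval with 1-based inclusive coordinates."""
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def five_prime(self) -> int:
        return self.start if self.strand == "+" else self.end

    @property
    def three_prime(self) -> int:
        return self.end if self.strand == "+" else self.start


def looks_like_cleavage_product(hairpin: Interval, transcript: Interval,
                                max_gap: int = 10) -> bool:
    """True if the transcript 5' end lies just downstream of the hairpin 3' end."""
    if hairpin.strand != transcript.strand:
        return False
    if hairpin.strand == "+":
        gap = transcript.five_prime - hairpin.three_prime
    else:
        gap = hairpin.three_prime - transcript.five_prime
    return 0 < gap <= max_gap


# Hypothetical coordinates, for illustration only.
hairpin = Interval(start=1_000, end=1_085, strand="+")
adjacent = Interval(start=1_087, end=4_000, strand="+")
distant = Interval(start=9_000, end=12_000, strand="+")
print(looks_like_cleavage_product(hairpin, adjacent))
print(looks_like_cleavage_product(hairpin, distant))
```

A check of this kind could flag other annotated transcripts whose 5' ends abut a pre-miRNA hairpin, although any candidate would still require experimental validation.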

Figure 2.15 RT-PCR validation of the newly assembled primary transcript encoding human miR-99b, let-7e, and miR-125a

Green arrows indicate the location of primers. RT-PCR results are shown below the transcript alignments with the PCR product corresponding to the assembled transcript highlighted with a red arrowhead. The identity of the PCR product was verified by DNA sequencing.

Pri-miRNA structures reveal novel regulatory mechanisms

Inspection of pri-miRNA gene structure using our assemblies uncovered new potential

regulatory mechanisms that likely influence the production of specific miRNAs. These

mechanisms include alternative promoters, partially-transcribed miRNA clusters, and

alternative splicing, each discussed in turn below and summarized in Tables 2.4 and

2.5.

Table 2.4 Novel potential regulatory mechanisms for conserved human non-protein coding pri-miRNAs

Encoded human miRNA(s) | Multiple promoters | Partial production of cluster | miRNA spans splice site

let-7a-1/let-7f-1/let-7d Yes

let-7a-3/let-7b Yes

let-7c/miR-99a/miR-125b-2 Yes

miR-9-2 Yes

miR-9-3 Yes

miR-15a/miR-16-1 Yes

miR-17/miR-18a/miR-19a/miR-20a/miR-19b-1/miR-92a-1 Yes

miR-22 Yes

miR-23a/miR-24-2/miR-27a Yes

miR-29a/miR-29b-1 Yes

miR-30b/miR-30d Yes

miR-31 Yes

miR-101-1/miR-3671 Yes

miR-130a Yes

miR-135a-2/miR-1251 Yes

miR-135b Yes

miR-137/miR-2682 Yes

miR-181c/miR-181d Yes

miR-193b/miR-365a Yes

miR-195/miR-497 Yes

miR-221/miR-222 Yes

miR-675 Yes

let-7a-2/miR-100/miR-125b-1 Yes Yes

miR-30a/miR-30c-2 Yes Yes

miR-374a/miR-374b/miR-421/miR-545 Yes Yes

miR-132/miR-212 Yes

miR-130b/miR-301b Yes (miR-130b)

miR-199a-2/miR-214 Yes (miR-199a-2)

miR-202 Yes

miR-205 Yes

Table 2.5 Novel potential regulatory mechanisms for conserved mouse non-protein coding pri-miRNAs

Encoded mouse miRNA(s) | Multiple promoters | Partial production of cluster | miRNA spans splice site

let-7b/let-7c-2 Yes

miR-15a/miR-16-1 Yes

miR-17/miR-18a/miR-19a/miR-20a/miR-19b-1/miR-92a-1 Yes

miR-29a/miR-29b-1 Yes

miR-31 Yes

miR-101a Yes

miR-196b Yes

miR-221/miR-222 Yes

miR-345 Yes

miR-374/miR-421 Yes

let-7a-2/miR-100/miR-125b-1 Yes Yes

miR-670 Yes

Alternative promoters

Perhaps unsurprisingly given the incomplete existing annotation of pri-miRNA genes, our

assemblies frequently identified alternative promoters that drive miRNA expression in

different cell types. This phenomenon is exemplified by the gene that encodes let-7a-3

and let-7b. This pri-miRNA, annotated in RefSeq as MIRLET7BHG, initiates 27 kb

upstream of the miRNA sequences, in a region rich in H3K4me3-modified histones

(Figure 2.16). We observed two additional transcription start sites further upstream,

also associated with H3K4me3. These transcript structures and 5' ends were validated

by RT-PCR and RACE (Figures 2.17, 2.18). While all cell lines tested used the most

upstream promoter, the alternative downstream transcription start sites were

differentially utilized in a cell-line specific manner. These results suggest that these

distinct promoters may be differentially regulated. Of the 103 human intergenic

conserved miRNA transcription units, we documented that at least 25 have multiple

alternative promoters (Table 2.4), indicating that this is a very common mode of miRNA

regulation.
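
One way to count alternative promoters from a set of assembled transcripts is to cluster their transcription start sites (TSSs). The sketch below is illustrative and is not the procedure used to build Table 2.4: the 500 nt clustering window, the transcription unit names, and all coordinates are assumptions chosen for the example. The final line simply restates the proportion given above.

```python
def count_promoters(tss_positions, window=500):
    """Cluster TSSs; a site within `window` nt of the previous site shares its promoter."""
    clusters = []
    for pos in sorted(tss_positions):
        if clusters and pos - clusters[-1][-1] <= window:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return len(clusters)


# Hypothetical TSS coordinates per miRNA transcription unit (not real data).
assembled_tss = {
    "unit_A": [10_000, 10_120, 25_400, 37_050],
    "unit_B": [5_000, 5_030],
    "unit_C": [80_000],
}
multi_promoter_units = {
    name for name, tss in assembled_tss.items() if count_promoters(tss) > 1
}
print(sorted(multi_promoter_units))

# Proportion stated in the text: 25 of 103 transcription units.
print(f"{25 / 103:.1%}")
```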

Figure 2.16 Examples of newly-identified miRNA regulatory mechanisms

(A) Pri-miRNA genes frequently utilize multiple alternative promoters, as exemplified by the transcript that encodes let-7a-3 and let-7b. The RefSeq noncoding transcript MIRLET7BHG is shown in blue.

Figure 2.17 5' RACE analysis of primary transcripts encoding human let-7a-3 and let-7b

The upper panel summarizes the overall transcript structures while the lower panel shows primer locations (green arrows) with red ticks indicating the end of each individual sequenced RACE clone.

Figure 2.18 RT-PCR validation of newly assembled primary transcripts encoding human let-7a-3 and let-7b

Green arrows indicate the location of primers. RT-PCR results are shown to the left of the transcript alignments with PCR products corresponding to the assembled transcripts highlighted with red arrowheads. Identities of all PCR products were verified by DNA sequencing. The multiple PCR products generated with each primer pair represent alternatively spliced isoforms.

Transcription of subsets of clustered miRNAs

Many miRNA sequences are clustered in the genome and it is generally assumed that

miRNAs that are located within approximately 50 kb of one another are co-transcribed

as polycistronic transcripts (Baskerville and Bartel, 2005; Liang et al., 2007).

Unexpectedly, we observed multiple examples of pri-miRNA transcripts that encode

subsets of clustered miRNAs (Tables 2.4, 2.5). The transcripts that host miR-30a and

miR-30c-2, described above (Figure 2.10), represent examples of this phenomenon.

Another interesting example is the miRNA cluster that encodes miR-100, let-7a-2, and

miR-125b-1. Notably, the clustering of these miRNAs and even their order in the cluster

is conserved between mammals and Drosophila, suggesting that their coordinated

regulation has been subject to strong evolutionary selection (Roush and Slack, 2008).

Our assemblies confirmed the existence of a previously annotated RefSeq transcript,

MIR100HG, that encompasses all three human miRNAs in the cluster (Figure 2.19).

Figure 2.19 Host genes for miRNA cluster

Pri-miRNAs may host subsets of clustered miRNAs, as illustrated by transcripts that encode miR-100, let-7a-2, and miR-125b-1. The RefSeq noncoding transcript MIR100HG is shown in blue.

The 5' end of this pri-miRNA is supported by H3K4me3 data. In addition, we identified three

additional alternative transcription start sites also corroborated by H3K4me3 histone

modifications. Use of the most downstream promoter produces a transcript that

encodes only miR-125b-1. RT-PCR and 5' RACE confirmed the accuracy of all of these

pri-miRNA transcript assemblies (Figures 2.20, 2.21). These findings demonstrate that

production of individual miRNAs in polycistronic clusters can be uncoupled through the

use of alternative promoters.
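
The uncoupling described above follows directly from transcript coordinates: a transcript can only give rise to the hairpins that it spans. The sketch below illustrates this under simplifying assumptions. All coordinates are hypothetical, the locus is modeled on the plus strand for simplicity, and the "internal" start site is invented for the example; only the order of the hairpins follows the text.

```python
def encoded_mirnas(tss, tes, hairpins):
    """Return the hairpins fully contained in a plus-strand transcript spanning tss..tes."""
    return [name for name, (start, end) in hairpins.items()
            if tss <= start and end <= tes]


# Hypothetical plus-strand coordinates; only the hairpin order follows the text.
hairpins = {
    "miR-100": (20_000, 20_080),
    "let-7a-2": (26_000, 26_072),
    "miR-125b-1": (70_000, 70_088),
}
transcript_end = 75_000
alternative_tss = {"upstream": 1_000, "internal": 23_000, "downstream": 60_000}

for name, tss in alternative_tss.items():
    print(name, encoded_mirnas(tss, transcript_end, hairpins))
```

In this toy model, only the transcript initiating at the most downstream start site is restricted to miR-125b-1, mirroring the configuration reported for the human cluster.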

Figure 2.20 5' RACE analysis of primary transcripts encoding human miR-100, let-7a-2, and miR-125b-1

The upper panel summarizes the overall transcript structures while the lower panel shows primer locations (green arrows) with red ticks indicating the end of each individual sequenced RACE clone.

Figure 2.21 RT-PCR validation of newly assembled primary transcripts encoding human miR-100, let-7a-2, and miR-125b-1

Green arrows indicate the location of primers. RT-PCR results are shown below the transcript alignments with PCR products corresponding to the assembled transcripts highlighted with red arrowheads. Identities of all PCR products were verified by DNA sequencing.

Alternative splicing

A previous analysis of existing expressed sequence tags (ESTs) and mRNAs revealed a

class of pre-miRNA sequences that span intron-exon junctions such that splicing

prevents processing of these miRNA hairpins by the microprocessor complex (Melamed

et al., 2013). We were able to confirm the existence of pri-miRNAs with this

configuration using our assemblies (Table 2.4). For example, the pre-miR-205 hairpin

spans the splice donor site immediately upstream of the final exon of an annotated pri-

miRNA, MIR205HG (Figure 2.22). Use of this splice site disrupts the pre-miR-205

sequence and is thus mutually exclusive with production of the mature miRNA.

Interestingly, we found alternatively spliced isoforms that utilize a distinct 3' terminal

exon, placing the pre-miRNA hairpin within an intron, a location permissive for miRNA

processing. RT-PCR confirmed the use of both alternative terminal exons (Figure 2.23).

These observations lend further support for the regulation of miRNA biogenesis by

alternative splicing.
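
The relationship between a hairpin and the exon structure of each isoform can be classified programmatically. The following sketch assumes plus-strand, 1-based inclusive coordinates and uses invented exon positions; it is an illustration of the logic, not the analysis performed here.

```python
def classify_hairpin(hairpin, exons):
    """Classify a hairpin (start, end) against one isoform's sorted exon list.

    Returns "exonic", "intronic", "spans splice site", or "outside transcript".
    """
    h_start, h_end = hairpin
    # A hairpin not fully contained in the transcript span cannot be processed from it.
    if h_start < exons[0][0] or h_end > exons[-1][1]:
        return "outside transcript"
    for e_start, e_end in exons:
        if e_start <= h_start and h_end <= e_end:
            return "exonic"
        # Partial overlap means an internal exon boundary lies inside the hairpin.
        if h_start <= e_end and e_start <= h_end:
            return "spans splice site"
    return "intronic"


# Hypothetical coordinates, for illustration only.
hairpin = (1_950, 2_030)
isoform_annotated_terminal_exon = [(100, 300), (1_000, 1_200), (2_000, 2_500)]
isoform_alternative_terminal_exon = [(100, 300), (1_000, 1_200), (3_000, 3_600)]
print(classify_hairpin(hairpin, isoform_annotated_terminal_exon))
print(classify_hairpin(hairpin, isoform_alternative_terminal_exon))
```

In this toy example, the same hairpin spans an exon boundary in one isoform and is intronic in the other, which is the arrangement described for miR-205.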

Figure 2.22 miRNA biogenesis can be affected by alternative splicing

miRNAs may span splice sites and thereby may be regulated by alternative splicing. The pri-miRNA that encodes miR-205 is shown as a representative example of this configuration. The RefSeq noncoding transcript MIR205HG is shown in blue.

Figure 2.23 RT-PCR validation of primary transcripts encoding human miR-205

Green arrows indicate the location of primers. RT-PCR results are shown below the transcript alignments with PCR products corresponding to the assembled transcripts highlighted with red arrowheads. Identities of all PCR products were verified by DNA sequencing.

Discussion

Investigation of miRNA functions in numerous biological settings has advanced our

understanding of the roles of miRNAs in development and disease and the downstream

targets that they regulate (Vidigal and Ventura, 2015). On the other hand, considerably

less is known about the pathways that govern miRNA biogenesis at transcriptional and

post-transcriptional levels. Elucidation of such miRNA regulatory mechanisms has been

hindered by the poor annotation of pri-miRNA gene structures. Indeed, a frequent

misperception is that miRNA promoters are located in the genomic sequence

immediately adjacent to pre-miRNA hairpins when, in fact, these promoters are often

located tens to hundreds of kilobases upstream (Chang et al., 2007; Cai et al., 2004).

Clearly, dissection of cis- and trans-regulation of miRNA transcription requires an

accurate description of the relevant transcription units. Putative post-transcriptional

regulatory mechanisms may also be overlooked without an understanding of the splicing

patterns or polyadenylation sites of pri-miRNA transcripts. In light of these limitations,

we set out to establish a resource of miRNA gene structures that could be easily

accessed by investigators in the field in order to improve the study of miRNA regulation.

Herein, we describe a novel experimental and computational approach that we

developed to achieve this goal.

Having demonstrated that comprehensive pri-miRNA annotation cannot be easily

accomplished using existing RNA-seq data, we devised a multi-step strategy to enable

genome-wide pri-miRNA reconstruction. First, a dominant negative DROSHA protein

that globally impairs pri-miRNA processing is expressed, thereby stabilizing pri-miRNA

transcripts and dramatically improving their coverage in RNA-seq libraries. Next,

StringTie, an advanced transcriptome assembler that is capable of accurately

reconstructing pri-miRNAs, is employed. Since miRNA expression is often cell-type

specific, we applied this strategy to a panel of human and mouse cell lines of diverse

origins, thereby successfully annotating ~70% of pri-miRNAs in these species. We

anticipate that near complete assembly of annotated miRNAs is possible by applying this

approach to additional cell types.

Multiple lines of evidence support the accuracy of the new pri-miRNA annotations

provided here. First, the 5' ends of the assembled transcripts are frequently located

within regions enriched in H3K4me3 histone marks and CpG islands, features that are

associated with RNA polymerase II promoters (Mikkelsen et al., 2007). Moreover, we

extensively validated new pri-miRNA assemblies using 5' and 3' RACE as well as

RT-PCR, demonstrating strong concordance between predicted and actual pri-miRNA

structures. Additionally, mature miRNAs are highly conserved and we reasoned that

their gene structures would tend to be conserved as well. Indeed, in cases where

orthologous pri-miRNAs were annotated in human and mouse, we frequently found

similar gene structures and promoter locations. Overall, these findings support the

reliability of these new pri-miRNA assemblies.
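
The first line of evidence, overlap between assembled 5' ends and H3K4me3-enriched regions, can be quantified with a simple interval lookup. This sketch assumes that peaks are supplied as non-overlapping (start, end) tuples on a single chromosome; the function name and all coordinates are hypothetical.

```python
from bisect import bisect_right


def fraction_in_peaks(tss_positions, peaks):
    """Fraction of TSSs inside any peak; `peaks` are non-overlapping (start, end) tuples."""
    peaks = sorted(peaks)
    starts = [start for start, _ in peaks]
    hits = 0
    for pos in tss_positions:
        i = bisect_right(starts, pos) - 1  # last peak starting at or before pos
        if i >= 0 and pos <= peaks[i][1]:
            hits += 1
    return hits / len(tss_positions)


# Hypothetical peaks and TSSs, for illustration only.
h3k4me3_peaks = [(100, 200), (500, 650)]
assembled_tss = [150, 300, 640, 700]
print(fraction_in_peaks(assembled_tss, h3k4me3_peaks))
```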

This new map of pri-miRNA structure has revealed previously unrecognized potential

regulatory mechanisms for many miRNAs. In particular, we found that alternative

promoter usage is a frequent feature of miRNA genes, underscoring the need for a

thorough understanding of a given miRNA transcription unit to fully dissect its cis- and

trans-regulation. Unexpectedly, we also found several examples of pri-miRNAs that are

contiguous with downstream protein-coding genes, suggesting possible coordinated

expression. In light of these findings, it will be interesting to investigate whether the

miRNAs and proteins encoded by these linked transcripts function within or control

common cellular or developmental pathways. In addition, analysis of pri-miRNAs

spanning polycistronic clusters revealed that these miRNAs are not always co-

transcribed, even in cases where the clustered organization is deeply conserved, such

as the miRNA cluster that encodes miR-100, let-7a-2, and miR-125b-1. These results

indicate that expression of these apparently linked miRNAs may be uncoupled in some

settings. Finally, our data confirm previous analyses that identified miRNAs that span

splice-sites (Melamed et al., 2013), supporting a role for alternative splicing in regulating

the expression of specific miRNAs.

In summary, our results highlight the importance of precise annotation of miRNA gene

structures, provide assemblies for a large majority of human and mouse pri-miRNAs,

and offer an experimental framework for further reconstruction of the remaining pri-

miRNAs yet-to-be described. We anticipate that these annotations will be highly

valuable for ongoing efforts to dissect mechanisms of miRNA regulation in diverse

biological settings.

Materials and methods

Cell culture

E14TG2a embryonic stem cells were cultured in GMEM with 1% nonessential amino

acids, β-mercaptoethanol, and leukemia inhibitory factor. A-172, A-673, C2C12,

HEK293T, Hepa1-6, MCF-7, and MEF cell lines were cultured in DMEM. CT-26 and

NCCIT cells were cultured in RPMI 1640. HCT116 cells were cultured in McCoy's 5A.

HepG2, human primary fibroblasts, and Neuro-2a cells were cultured in EMEM. All media

were supplemented with 10% fetal bovine serum (FBS) and Antibiotic-Antimycotic.

Plasmids

To generate pcDNA5/FLAG-HA-DGCR8, FLAG-HA-DGCR8 was amplified from

pFLAG/HA-DGCR8 (Landthaler et al., 2004) and cloned into the HindIII site of

pcDNA5/FRT (Life Technologies). To construct the TN-DROSHA expression plasmid,

E1045Q and E1222Q mutations were introduced into pcDNA3.1/V5-His-DROSHA

(Rakheja et al., 2014) using the QuikChange Lightning Site-Directed Mutagenesis kit

(Stratagene). This plasmid also carries synonymous mutations at codons T438-L444

that render it resistant to commonly used siRNAs. Primer sequences for mutagenesis

are provided in Table 2.6.

Table 2.6 Primer sequences for mutagenesis

Mutation | Forward primer sequence (5'-3') | Reverse primer sequence (5'-3')
E1045Q | AAGCGTTAATAGGAGCTGTTTACTTGGAGGGAAG | GAAAACAATTGGCCATTGCATGTCGAAGGTCCG
E1222Q | AATCATTTATTGCAGCGCTGTACATTGATAAGGATTTGGAATATG | GCAAAAGGTCCGCCAAGGTCTTGGTGCGAAG
Synonymous mutation T438-L444 | GAGAGATCTGTATGACAAATTTGAGGAGGAGTTGGGGAGC | AATCGTGATGTTCCAACCACTGTAGAATCTCCCACCTG

Table 2.7 Primer sequences for real-time RT-PCR

Gene/Amplicon | Forward primer sequence (5'-3') | Reverse primer sequence (5'-3')
Human DLEU2 | TGCATTGGAACATGACATGAG | AAGAATTGCTGAGCTAAGTAGAGGTC
Human C13orf25 | GGCCTCCGGTCGTAGTAAAG | GCAGTTAGGTCCACGTGTATGA
Human pri-miR-15a | TAGGCGCGAATGTGTGTTTA | TGCTATCATAAGAGCTATGAATAAAAAG
Human pri-miR-16-1 | CTTTTTATTCATAGCTCTTATGATAGC | TCAATAAAACTGAAAACACATTAGTAACA
Human pri-miR-17 | CACCTTGTAAAACTGAAGATTGTGA | CCTGCACTTTAAAGCCCAACT
Human pri-miR-18a | AGGGCCTGCTGATGTTGAGT | AACACCTATATACTTGCTTGGCTTG
18S rRNA | GTAACCCGTTGAACCCCATT | CCATCCAATCGGTAGTAGCG

Table 2.8 Primer sequences for RACE in Fig 2.11

Amplicon | Forward primer sequence (5'-3') | Reverse primer sequence (5'-3')
5'RACE from exon A | CGACTGGAGCACGAGGACACTGA | CGCTCGCCTGACAGCTGATG
Nested 5'RACE from exon A | GGACACTGACATGGACTGAAGGAGTA | GCAGGAGGAGGAGGGGAGAA
3'RACE from exon B | ATCCCTCCCTGTCACACACG | GCTGTCAACGATACGCTACGTAACG
Nested 3'RACE from exon B | GATGGGTGGTCGCTTACCTGTG | CGCTACGTAACGGCATGACAGTG
5'RACE from exon C | CGACTGGAGCACGAGGACACTGA | TGCTCTAAAGTCTGCTCCCAGAGAGG
Nested 5'RACE from exon C | GGACACTGACATGGACTGAAGGAGTA | CTGCTCCCAGAGAGGACTTGT
3'RACE from exon D | TGGCGCCACTTTCCTGAGAT | GCTGTCAACGATACGCTACGTAACG
Nested 3'RACE from exon D | ACTTCCAGCCAGTTTGGGTCA | CGCTACGTAACGGCATGACAGTG

Table 2.9 Primer sequences for RACE in Fig 2.17

Amplicon | Forward primer sequence (5'-3') | Reverse primer sequence (5'-3')
5'RACE from exon A | CGACTGGAGCACGAGGACACTGA | CCACACGCACCTCCTGGTTG
Nested 5'RACE from exon A | GGACACTGACATGGACTGAAGGAGTA | TGTCTTGGTTCTGTCTGTCTGATG
5'RACE from exon D | CGACTGGAGCACGAGGACACTGA | AAACCTGCTTCCATCTTGTTAGGC
Nested 5'RACE from exon D | GGACACTGACATGGACTGAAGGAGTA | GGCTAATATCTTCAAATCATCCACACG
5'RACE from exon E | CGACTGGAGCACGAGGACACTGA | GTGGCACCATCCCGAGCAAG
Nested 5'RACE from exon E | GGACACTGACATGGACTGAAGGAGTA | AGAGCTCTCAGTGCGCTAGG

Table 2.10 Primer sequences for RACE in Fig 2.20

Amplicon | Forward primer sequence (5'-3') | Reverse primer sequence (5'-3')
5'RACE from exon A | CGACTGGAGCACGAGGACACTGA | AGGCCCTCAGCTAGCGGTCTG
Nested 5'RACE from exon A | GGACACTGACATGGACTGAAGGAGTA | GGTCTGAGTCCTGGGTTCCAAA
5'RACE from exon B | CGACTGGAGCACGAGGACACTGA | CGGAGGATGGAGGCGTCTTCT
Nested 5'RACE from exon B | GGACACTGACATGGACTGAAGGAGTA | CCAAAGCCAGGAAGTGAAAATGA
5'RACE from exon C | CGACTGGAGCACGAGGACACTGA | AAATGCGGCCACACGGACTTT
Nested 5'RACE from exon C | GGACACTGACATGGACTGAAGGAGTA | GGCCACACGGACTTTGAAGG

Table 2.11 Primer sequences for RT-PCR

Primer name Primer sequence (5'-3')

507 GAGTAGGCGCGTGGAGTC

508 TCTTGCACGATCAAAATAGGG

509 GCCACATGTGATAGATGACCA

510 GGGTGATCCTTTGCCTTCT

511 CAGGCAGGACGAGAGAAAGA

513 TGCAATGTAAGCTTCTGTTTCC

514 GGGGAGAGGATGGAGAGC

515 TCATTTTCTCCGCAGCATC

517 CGAGCTCAGTTATGGCACAC

518 GGGAGTCTAAGGGCAGCAG

519 TGCTGCTGCTGCTGCTAC

520 TAGCGGGAAGAACAAAGGAA

521 GGGACGCTGGAGTCTGG

522 TTCTGGTGGCTGCATTACTCT

523 GGAGAGAGGAAGAGCGGAGT

524 AAAGGCGCTTCTTTTCACCT

525 CCTGTCAGTCACCGTGTCC

552 AAGAGGGTGAGCGTTTGGA

553 CCAGGGACGTCATTTTCACT

554 CCCTTCAAAGTCCGTGTGG

555 GGTGGCTAGGTGACAGGAGA

556 GGGTGACTTTCTCGACTCGT

557 CTGGCCCATGTCTCTCTGTT

559 CAAGACATCTGAGGGGCAAC

560 GCAGAGGAGGTGTCTTCAGG

561 CACTAGTGTCTCCCCTGCTTC

563 CAGCCTAGCGCACTGAGAG

565 GTCCTCTCTGGGAGCAGACTT

566 TTTGAACCATGAATTCCACCT

575 TCTTTGGACAAAATTGAGAAGAACT

RNA preparation

Cells were co-transfected with pcDNA3.1/V5-His-TN-DROSHA and pcDNA5/FLAG-HA-

DGCR8 under optimized conditions (Table 2.7), and harvested 48 h after transfection.

To isolate nuclear RNA, cells were lysed on ice for 5 min in 10 mM Tris-HCl pH 7.5, 10

mM NaCl, 0.2 mM EDTA, 0.05% NP-40, and nuclei were spun at 2,500 × g for 3 min and

then resuspended in QIAzol for RNA isolation using miRNeasy kit with DNase I digestion

according to the manufacturer’s instructions (Qiagen).

RT-PCR, qPCR, and RACE

RNA was reverse-transcribed using the QuantiTect Reverse Transcription Kit (Qiagen)

prior to PCR amplification. qPCR was performed using an ABI 7900HT Sequence

Detection System with the SYBR Green PCR core reagent kit (Life Technologies).

Eukaryotic 18S rRNA endogenous control (Life Technologies) was used as an internal

standard. RACE was performed using the GeneRacer kit (Life Technologies). Primer

sequences are provided in Tables 2.7-2.11.

RNA-seq library preparation and sequencing

RNA-seq libraries were generated using the Illumina TruSeq RNA Sample Preparation

Kit v2 according to the manufacturer’s protocol, and sequenced in one lane of a HiSeq

2000 using the 100 bp paired-end protocol.

Table 2.7 Transfection methods

Cell line | Transfection reagent | Molar ratio of plasmids transfected*
A-172 | Cell Line Kit V; Program U-029; Nucleofector 2b (Amaxa) | 3:1
A-673 | Cell Line Kit V; Program X-001; Nucleofector 2b (Amaxa) | 3:1
C2C12 | Cell Line Kit V; Program B-032; Nucleofector 2b (Amaxa) | 3:1
CT-26 | Cell Line Kit SE; Program CM-137; 4D-Nucleofector (Amaxa) | 3:1
E14TG2a embryonic stem cells | Xfect (Clontech) | 3:1
HCT116 | FuGENE HD (Promega) | 2:1
HEK293T | FuGENE HD (Promega) | 1:1
Hepa1-6 | Cell Line Kit SF; Program EH-100; 4D-Nucleofector (Amaxa) | 3:1
HepG2 | FuGENE HD (Promega) | 3:1
Human primary fibroblasts | FuGENE HD (Promega) | 3:1
MCF-7 | Lipofectamine LTX (Life Technologies) | 4:1
MEF | MEF Kit 2; Program T-020; Nucleofector 2b (Amaxa) | 3:1
Neuro-2a | Cell Line Kit V; Program T-024; Nucleofector 2b (Amaxa) | 3:1
NCCIT | FuGENE HD (Promega) | 4:1

*Molar ratio of pcDNA3.1/V5-His-TN-DROSHA to pcDNA5/FLAG-HA-DGCR8

Alignment of reads and transcriptome assembly

Reads with a length shorter than 25 nucleotides were first filtered and discarded using

fqtrim (http://ccb.jhu.edu/software/fqtrim/index.shtml). The remaining reads were aligned

to the human (hg19) or mouse (mm10) reference genome using TopHat2 (Kim et al.,

2013). The alignments were assembled using StringTie-v0.97 (Pertea et al., 2015).

fqtrim command line:

fqtrim -A -p 5 -l 25 -o trimmed.fq.gz R1.fastq.gz,R2.fastq.gz

tophat command line:

tophat2 -p 10 -o tophat -G known_genes.gff3 --transcriptome-index=./tindex --library-type fr-firststrand hg19 R1.trimmed.fq.gz R2.trimmed.fq.gz >& run.tophat

stringtie command line:

stringtie accepted_hits.bam -p 10 -S -g 0 -f 0.1 -o accepted_hits.gtf
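
For scripted reuse, the three command lines above can be assembled as argument lists. The sketch below only builds and prints the commands, using exactly the flags and file names listed above; the function name `build_pipeline` and its `threads` and `genome` parameters are our own illustrative choices, and the shell redirection of the TopHat log (`>& run.tophat`) is omitted because it is handled by the shell rather than by the program.

```python
import shlex


def build_pipeline(threads=10, genome="hg19"):
    """Return the three commands as argument lists, mirroring the flags listed above."""
    fqtrim = ["fqtrim", "-A", "-p", "5", "-l", "25", "-o", "trimmed.fq.gz",
              "R1.fastq.gz,R2.fastq.gz"]
    tophat = ["tophat2", "-p", str(threads), "-o", "tophat",
              "-G", "known_genes.gff3", "--transcriptome-index=./tindex",
              "--library-type", "fr-firststrand", genome,
              "R1.trimmed.fq.gz", "R2.trimmed.fq.gz"]
    stringtie = ["stringtie", "accepted_hits.bam", "-p", str(threads),
                 "-S", "-g", "0", "-f", "0.1", "-o", "accepted_hits.gtf"]
    return [fqtrim, tophat, stringtie]


for cmd in build_pipeline():
    print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` on a system where the tools are installed; mouse samples would use `genome="mm10"` with a matching annotation file.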

Data access

The RNA-seq datasets from this study have been submitted to the NCBI Sequence

Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra) under accession number

SRP057660. Human and mouse assemblies are available in the Supplementary Data,

and can also be viewed in the UCSC Genome Browser using the following link:

http://www4.utsouthwestern.edu/mendell-lab/resources.html

Chapter 3: Characterization and loss of function study of a human long noncoding RNA induced by DNA damage, NORAD

Introduction

A large body of evidence has demonstrated that eukaryotic genomes are extensively

transcribed outside of protein-coding genes (Djebali et al., 2012). Among non-protein

coding transcripts are a class referred to as long noncoding RNAs, or lncRNAs, which

have attracted significant attention due to their emerging functions in development and

disease (Fatica and Bozzoni, 2014; Li and Chang, 2014). lncRNAs represent a

heterogeneous family of RNAs that are defined by a length of greater than 200

nucleotides and by the lack of any detectable open reading frame (ORF). The exact

number of lncRNAs encoded in the human genome is a matter of debate, but most

estimates place the number in the tens of thousands (Ulitsky and Bartel, 2013; Iyer et

al., 2015). The biological roles and molecular functions of the overwhelming majority of

these transcripts remain unexplored or elusive.

Compared to other known noncoding RNA classes, lncRNAs stand out due to their

enormous diversity with respect to evolutionary conservation, expression level,

molecular function, and cellular localization (Ulitsky and Bartel, 2013). In the nucleus,

lncRNAs such as XIST, HOTAIR, and HOTTIP have been shown to regulate gene

expression at the transcriptional level in cis or trans by associating with and directing the

activity of chromatin remodeling complexes (Rinn and Chang, 2012). Other types of

nuclear lncRNAs organize subnuclear structure, including Firre, which mediates

interchromosomal interactions (Hacisuleyman et al., 2014), and NEAT1, which is

essential for paraspeckle formation (Clemson et al., 2009). Cytoplasmic lncRNAs, in

contrast, have been shown to post-transcriptionally regulate gene expression by base

pairing with target mRNAs. lncRNA:mRNA interactions can result in target stabilization,

as is the case for the noncoding RNAs BACE1-AS and TINCR (Faghihi et al., 2008;

Kretz et al., 2013), whereas other noncoding RNAs such as 1/2-sbsRNAs can trigger target

mRNA degradation (Gong and Maquat, 2011). LncRNAs may also modulate the activity

of interacting proteins in the cytoplasm (Liu et al., 2015; Kino et al., 2010). Despite

these well-characterized examples, studies of lncRNA function are still at an early stage.

Due to their generally low abundance and modest evolutionary conservation relative to

protein-coding genes (Cabili et al., 2011; Ulitsky and Bartel, 2013), it has been

suggested that a large fraction of lncRNAs represent products of promiscuous

transcription rather than independently functional RNAs (Struhl, 2007). To resolve this

issue, detailed functional studies, including the use of genetic loss-of-function

approaches, are needed to establish the biological role and molecular activity of putative

lncRNAs of interest.

Results

Characterization of NORAD, an abundant, conserved human lncRNA

This study was initiated in an attempt to identify human lncRNAs that regulate the DNA

damage response. To this end, we examined a set of previously identified mouse

lncRNAs that are induced after doxorubicin treatment in a p53-dependent manner in

murine embryonic fibroblasts (Guttman et al., 2009). Among these transcripts, we were

particularly interested in a poorly-characterized 4.9 kilobase (kb) unspliced lncRNA,

annotated as 2900097C17Rik, that exhibits a high degree of evolutionary conservation

in mammals (Figure 3.1A). A clear ortholog of this transcript, with 65% nucleotide

identity to 2900097C17Rik (Figure 3.1B), is expressed from the syntenic location in the

human genome (Figure 3.1C). Annotated in RefSeq as LINC00657, this 5.3 kb lncRNA

is highly expressed in human cell lines based on ENCODE RNA-seq data (Figure 3.1A)

and is ubiquitously expressed in human tissues according to Illumina BodyMap 2.0 data

(Figure 3.2A). Like the mouse ortholog, the human transcript has features of an RNA

polymerase II transcription unit, including an enrichment of H3K4me3 modified histones

at the transcription start site (Figure 3.1A) and a canonical polyadenylation signal at the

3' end, use of which was confirmed by 3' rapid amplification of cDNA ends (RACE)

(Figure 3.2B).

Figure 3.1 Evolutionary conservation of mammalian noncoding RNA, NORAD

(A) Schematic representation of NORAD (annotated in RefSeq as LINC00657) with associated UCSC Genome Browser tracks depicting mammalian conservation (PhastCons) as well as ENCODE RNA-seq and H3K4me3 ChIP-seq coverage in human cell lines (Rosenbloom et al., 2013).

(B) Sequence identity between human NORAD and mouse Norad (annotated in RefSeq as 2900097C17Rik). The two sequences were aligned using BLAST (bl2seq) (Altschul et al., 1990), and the percentage of identical nucleotides in each aligned segment is indicated.

(C) Conserved synteny between human and mouse. The syntenic locations of the NORAD and Norad loci were obtained from Ensembl (Cunningham et al., 2015) http://useast.ensembl.org/Homo_sapiens/Location/Genome

Figure 3.2 NORAD expression in human tissues

(A) Illumina BodyMap 2.0 1×75 bp RNA-seq data were downloaded from The Galaxy Project (https://usegalaxy.org/library/index) and aligned to hg19 using TopHat2 (Trapnell et al., 2009), and FPKM values were calculated using Cufflinks (Trapnell et al., 2010).

(B) Major polyadenylation site at the 3' end of NORAD identified by 3' RACE.

To determine whether the regulation of this lncRNA is conserved between human and

mouse, we examined its expression after doxorubicin treatment in the human colon

cancer cell line HCT116 and a derivative cell line in which p53 was inactivated by

homologous recombination (Bunz et al., 1998). As in mouse, the human transcript is

induced after DNA damage in a p53-dependent manner (Figure 3.3A). We therefore

named this lncRNA Noncoding RNA Activated by DNA Damage, or NORAD. Despite its

p53-dependent induction, we were unable to identify an obvious p53 binding site in the

vicinity of the NORAD promoter nor was one identified in a recent p53 ChIP-seq study

performed in this cell line (Sanchez et al., 2014). Therefore, it is likely that the regulation

of NORAD by p53 is indirect.

NORAD is easily detectable as a discrete transcript of the expected size by northern

blotting (Figure 3.3B). To quantitatively assess its abundance, we determined the

absolute copy number of NORAD in a panel of human cell lines with or without

doxorubicin treatment. These experiments revealed that NORAD is present at ~300-

1400 copies per cell, similar in abundance to highly expressed mRNA transcripts such

as ACTB (Islam et al., 2011) (Figure 3.3C).

Figure 3.3 NORAD is induced by DNA damage and expressed abundantly in multiple human cell lines

(A) qRT-PCR analysis of NORAD expression relative to 18S rRNA in p53+/+ and p53−/− HCT116 cells with or without treatment with 1 μM doxorubicin for 24 hours. For this and all subsequent qPCR figures, error bars represent standard deviations from 3 independent measurements.

(B) Northern blot analysis of NORAD expression in total RNA from HCT116 cells.

(C) Absolute quantification of NORAD transcript copy number per cell, determined by qRT-PCR in a panel of human cell lines with or without doxorubicin treatment for 24 hours.

Because annotated lncRNAs may encode conserved peptides (Anderson et al., 2015;

Bazzini et al., 2014), we examined the coding potential of NORAD using PhyloCSF,

which has been widely used to discriminate protein coding from noncoding transcripts

based on their evolutionary signatures (Lin et al., 2011). This analysis confirmed the

absence of a detectable conserved open reading frame (ORF) in the NORAD transcript,

with the highest scoring ORF receiving a codon substitution frequency (CSF) value

similar to other well characterized lncRNAs such as NEAT1 and XIST (Figure 3.4).

NORAD also lacks the potential to encode any recognizable protein domains, based on

a BLASTX analysis of all possible reading frames throughout the transcript. These

results support a noncoding function for NORAD. Having established NORAD as a

highly conserved, ubiquitously expressed, abundant lncRNA,

we set out to investigate its functions in human cells.
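
As a rough complement to the analyses above, the reading frames of a transcript can be enumerated directly. The sketch below is a naive ORF scan over the three sense-strand frames; it is neither PhyloCSF nor BLASTX, it ignores evolutionary signatures entirely, and the input sequence is an arbitrary toy string rather than NORAD sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Longest ATG-to-stop ORF, in codons (stop excluded), over the three forward frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        length = None  # None until an ATG opens a reading frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if length is None:
                if codon == "ATG":
                    length = 1
            elif codon in STOP_CODONS:
                best = max(best, length)
                length = None
            else:
                length += 1
    return best


# Toy sequence, for illustration only.
print(longest_orf("CCATGAAATTTTAGGG"))
```

ORFs that run off the end of the sequence without a stop codon are not counted in this sketch.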

Figure 3.4 NORAD shows very low coding potential as determined by codon substitution frequency

Maximum CSF scores of NORAD as well as other known coding and noncoding RNAs determined by analysis with PhyloCSF (Lin et al., 2011).

NORAD loss-of-function results in chromosomal instability

To elucidate potential functions of NORAD, we designed 3 pairs of transcription

activator-like effector nucleases (TALENs) that target within the first 300 nucleotides of

the lncRNA to facilitate the homology-directed insertion of a transcriptional stop element

and puromycin-resistance cassette flanked by loxP sites (Figure 3.5A). Initially, this

approach was used to inactivate NORAD in HCT116 cells, a stably diploid human cell

line that has been extensively used to study the p53 pathway and the human DNA

damage response (Jallepalli et al., 2001; Bunz et al., 1998). All 3 TALEN pairs produced

correctly targeted subclones with high efficiency after puromycin selection, with 124/147

clones exhibiting heterozygous insertions at the NORAD locus and 15/147 clones

exhibiting homozygous insertions. Correct targeting in representative clones generated

with each TALEN pair was confirmed by Southern blotting (Figure 3.5B). Targeted

clones exhibited the expected loss of NORAD expression, as assessed by northern

blotting and quantitative RT-PCR (qRT-PCR) (Figures 3.6A, 3.6B).

Figure 3.5 Genome editing to inactivate NORAD and validation of edited alleles by Southern blot

(A) NORAD was inactivated in human cell lines by designing custom TALEN pairs (represented as scissors) to cleave within the first 300 nucleotides of the gene, thereby stimulating the homology-directed insertion of a puromycin resistance cassette (PuroR) followed by 4 tandem polyadenylation signals (STOP). The presence of loxP sites (green triangles) allows excision of the STOP cassette upon expression of Cre recombinase.

(B) Schematic showing 7 kb SphI restriction fragment created by correct NORAD targeting and its detection by Southern blot in knockout clones.

Figure 3.6 Validation of NORAD targeting in HCT116 cells

(A) Northern blot analysis of NORAD in HCT116 clones of the indicated genotypes.

(B) qRT-PCR analysis of NORAD expression relative to 18S rRNA in targeted HCT116 clones of the indicated genotypes.

Because NORAD is upregulated after DNA damage, we first determined whether the

DNA damage-activated cell cycle checkpoints are intact in NORAD−/− cells. HCT116

cells undergo a well-documented p53-dependent G1 and G2 arrest after treatment with

doxorubicin or ionizing radiation (Bunz et al., 1998). We observed no consistent defect

in either the G1 or G2 checkpoints in independent NORAD−/− clones (Figure 3.7),

indicating that NORAD is not required for these aspects of the DNA damage response.

These analyses rely upon flow cytometric measurements of DNA content as cells

progress through the cell cycle. Unexpectedly, these assays revealed that 2/15

NORAD−/− clones appeared to have stably tetraploid DNA content (Figure 3.8A). These

findings were confirmed by examining metaphase chromosome spreads. Wild-type

HCT116 cells uniformly had 45 chromosomes, consistent with the reported karyotype

(Masramon et al., 2000) (Figure 3.8B). In contrast, tetraploid NORAD−/− cells had

variable chromosome numbers, with DNA content approaching 4N. As described below,

our subsequent experiments have demonstrated that the spontaneous generation of

tetraploid HCT116 subclones is exceedingly rare and we have never observed stable

tetraploidization of these cells without NORAD inactivation, despite the analysis of over

100 subclones produced after control manipulations. Even apparently diploid NORAD−/−

clones displayed a range of chromosome numbers (Figure 3.8B), suggesting that this

karyotypically-stable cell line had adopted a chromosomal instability (CIN) phenotype,

defined as the frequent loss or gain of whole chromosomes (Geigl et al., 2008).

Figure 3.7 DNA damage-induced G1 and G2 checkpoints are grossly intact in NORAD−/− HCT116 cells

(A) The G1 checkpoint was assessed by treating cells with 1 µM doxorubicin for 24 hours and subsequently measuring DNA content by propidium iodide staining and flow cytometry. The fraction of doxorubicin-treated cells in G1 is plotted in the graph. p53−/− cells, which lack an effective G1 checkpoint, exit G1 after DNA damage, whereas NORAD−/− cells accumulate in this cell cycle phase.

(B) The G2 checkpoint was assessed by treating cells as in panel A and measuring the fraction of mitotic cells by phospho-histone H3 S10 (pH3) staining, which is plotted in the graph. Unlike p53−/− cells, which lack an effective G2 checkpoint, NORAD−/− cells fail to enter M phase after DNA damage.

Figure 3.8 Genetic inactivation of NORAD results in chromosomal instability in human cells

(A) Flow cytometry histograms showing DNA content, as measured by propidium iodide staining, in representative diploid and tetraploid NORAD−/− HCT116 clones.

(B) Metaphase spreads of wild-type HCT116 cells and representative tetraploid and diploid NORAD−/− clones. The number in the lower right corner of each image shows the number of chromosomes present. Abnormal chromosome numbers are indicated in red.

Human cancer cells frequently exhibit CIN, which is believed to contribute to

tumorigenesis by driving gain- and loss-of-function of oncogenes and tumor

suppressors, respectively (Rajagopalan et al., 2003). How cancer cells acquire a CIN

phenotype is a major unresolved question and the role of lncRNAs in this process is

poorly understood. We therefore wished to quantitatively assess whether loss of

NORAD induces CIN by employing an established fluorescence in situ hybridization

(FISH) assay in which centromere probes are used to label marker chromosomes which

can then be scored in hundreds of interphase cells (Jallepalli et al., 2001). Assaying

chromosomes 7 and 20 with this approach verified that wild-type HCT116 cells exhibit a

low rate of chromosomal gain or loss (Figure 3.9). In contrast, up to 25% of NORAD−/−

cells displayed gain or loss of one of these chromosomes, confirming the presence of a

CIN phenotype. Importantly, since only 2 chromosomes were assayed in these

experiments, these measurements likely represent a significant underestimate of the

rate of aneuploidy in NORAD−/− cells. In addition, live cell imaging documented a high

rate of mitotic errors, including anaphase bridges and mitotic slippage, in NORAD−/−

clones (Figure 3.10).

We further characterized this phenotype by karyotyping representative NORAD−/−

clones, which revealed the presence of non-recurrent de novo structural chromosomal

rearrangements (Figure 3.11). Thus, inactivation of NORAD results in numerical and

structural aneuploidy. These findings were documented in 3 independent NORAD−/−

clones generated with 3 different TALEN pairs, strongly suggesting that this phenotype

is specifically due to NORAD loss-of-function rather than an off-target effect of TALEN-

mediated genome editing.

Figure 3.9 Chromosome instability can be measured by interphase DNA FISH for statistical analyses

(A) Representative images of chromosome 7 and 20 FISH in NORAD+/+ and NORAD−/− HCT116 cells. White arrowheads highlight cells with chromosome loss or gain.

(B) NORAD−/− cells exhibit significantly elevated levels of aneuploidy. At least 100 interphase nuclei in each of 3 independent knockout clones were assayed for chromosome 7 and 20 using DNA FISH and the frequency of cells exhibiting a non-modal chromosome number was scored. **p<0.005, chi-square test.
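
The scoring described in panel B can be sketched in code. The following is an illustrative outline, not the analysis used for this figure: it scores a nucleus as aneuploid when its FISH signal count differs from the modal count, then compares two genotypes with a Pearson chi-square test on a 2×2 table. The per-nucleus counts are hypothetical placeholders, and the absence of a continuity correction is an assumption, since the exact form of the test is not specified here.

```python
import math
from collections import Counter


def score_nonmodal(signal_counts):
    """Return (modal count, nuclei differing from the mode, nuclei scored)."""
    mode, _ = Counter(signal_counts).most_common(1)[0]
    n_nonmodal = sum(1 for c in signal_counts if c != mode)
    return mode, n_nonmodal, len(signal_counts)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p-value) for 1 degree of freedom. Assumes that no
    row or column total is zero.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical signal counts for one marker chromosome in 100 nuclei each.
wild_type = [2] * 97 + [1] * 2 + [3] * 1
knockout = [2] * 85 + [1] * 8 + [3] * 7

_, wt_bad, wt_n = score_nonmodal(wild_type)
_, ko_bad, ko_n = score_nonmodal(knockout)
stat, p_value = chi_square_2x2(wt_bad, wt_n - wt_bad, ko_bad, ko_n - ko_bad)
print(f"non-modal: WT {wt_bad}/{wt_n}, KO {ko_bad}/{ko_n}, chi2={stat:.2f}, p={p_value:.4f}")
```

Because only two marker chromosomes are scored, a calculation of this kind reports a lower bound on the fraction of aneuploid cells.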

Figure 3.10 Time-lapse imaging of mitotic defects in NORAD−/− HCT116 cells

(A) Representative time-lapse images of mitoses in NORAD+/+ and NORAD−/− HCT116 cells. Time stamp indicates minutes elapsed.

(B, C) Quantification of the percentage of mitoses exhibiting the indicated mitotic errors in time-lapse imaging experiments. Values represent the average of 3 independent experiments with 39-100 mitoses imaged per genotype per experiment. Error bars represent standard deviations. *p<0.05; **p<0.01, Student’s t-test.

Figure 3.11 Non-recurrent de novo chromosomal rearrangements in NORAD−/− clones

Parental HCT116 and representative NORAD−/− clones were karyotyped by Giemsa-trypsin-Wright staining of metaphase spreads. As reported, HCT116 cells harbored 3 major chromosomal rearrangements involving chromosomes 10, 16 and 18 (black arrowheads) (Abdel-Rahman et al., 2001; Bunz et al., 2002). Non-recurrent de novo rearrangements in NORAD−/− clones are indicated by red arrowheads.

To determine whether the regulation of genomic stability by NORAD is unique to

HCT116 cells, we again used TALEN-stimulated homologous recombination to introduce

the transcriptional stop cassette into the NORAD locus in BJ-5ta cells, a telomerase-

immortalized, non-transformed diploid fibroblast cell line (Figure 3.12A-B). Targeting

was much less efficient in this cell line, with 2/393 clones harboring homozygous

insertions in NORAD. Although these NORAD−/− BJ-5ta cells were grossly diploid by

flow cytometric analysis of DNA content (data not shown), they exhibited significantly

elevated levels of aneuploidy, as determined by centromere FISH and quantification of

chromosomes 7 and 20 (Figure 3.12C). Thus, the regulation of chromosomal stability

by NORAD occurs in both transformed and non-transformed human cell lines.

Figure 3.12 Inactivation of NORAD in nontransformed BJ-5ta cells results in chromosomal instability

(A) PCR genotyping of BJ-5ta clones with homozygous targeting of AAVS1 or NORAD.

(B) qRT-PCR analysis of NORAD expression relative to 18S rRNA in targeted BJ-5ta clones of the indicated genotypes.

(C) Cells of the indicated genotypes were assayed for aneuploidy using chromosome 7/20 FISH as in Figure 3.9. 100 nuclei were scored per clone. P value calculated by chi-square test.

Chromosomal instability is specifically due to NORAD loss-of-function

Given that the effects of TALEN-mediated genome editing on chromosomal stability in

human cell lines have not been extensively examined, we performed a series of

experiments to confirm that the CIN phenotype that we observed in NORAD−/− cells was

specifically due to loss of this lncRNA rather than a general consequence of genome

manipulation with TALENs. First, we obtained a published TALEN pair that targets the

AAVS1/PPP1R12C locus (Sanjana et al., 2012) and used it to generate clones with

homozygous insertions of a puromycin resistance cassette at this site. Quantification of

chromosomes 7 and 20 documented normal chromosome numbers in targeted HCT116

and BJ-5ta cells (Figures 3.13, 3.12C). HCT116 cells transfected with these TALENs

were further subcloned and ploidy was examined using flow cytometry. 0/70 analyzed

clones acquired tetraploid DNA content (data not shown). Thus, neither CIN nor

tetraploidy is a general property of cells that have undergone TALEN-mediated genome

editing.

Figure 3.13 TALEN-mediated genome editing is not a general cause of chromosomal instability

Insertion of a puromycin resistance cassette at the AAVS1/PPP1R12C locus was performed using a published TALEN pair (Hockemeyer et al., 2009; Sanjana et al., 2012) and the frequency of aneuploidy in homozygous targeted HCT116 clones was assessed using DNA FISH as in Figure 3.9B. n.s., not significant (chi-square test)

Next, we depleted the NORAD transcript using 2 distinct siRNAs to recapitulate the

NORAD-deficient state using an unrelated method (Figure 3.14). After 12 days of

subsequent growth, populations of NORAD knockdown cells were assessed for

chromosome content by FISH. As observed following TALEN-mediated inactivation of

NORAD, knockdown of this transcript resulted in significantly elevated chromosomal

instability (Figure 3.14B). Subclones of control or NORAD knockdown cells were then

produced, revealing infrequent but reproducible de novo generation of tetraploid lines

derived specifically from cells transfected with NORAD-targeting siRNAs (Figure 3.14C).

Figure 3.14 NORAD knockdown using siRNA produces a phenotype similar to that of TALEN-mediated NORAD inactivation

(A) qRT-PCR analysis of NORAD expression, relative to 18S rRNA, in HCT116 cells 48 hours after transfection with control (siNT) or NORAD-targeting siRNAs.

(B) Chromosomal instability in siRNA-transfected HCT116 cells 12 days after siRNA transfection, assayed as in Figure 3.9B. At least 200 nuclei were scored per condition. P value calculated by chi-square test.

(C) Flow cytometry histograms showing DNA content, as measured by propidium iodide staining, in representative HCT116 subclones generated after transfection with the indicated siRNAs.

Lastly, we took advantage of our targeting strategy in NORAD−/− cells, which allowed

excision of the transcriptional stop cassette by Cre recombinase. As expected,

adenoviral delivery of Cre resulted in restoration of NORAD expression (Figure 3.15A).

10 subclones were generated from NORAD−/− cells with or without Cre expression and

chromosome content was assessed by centromere FISH (Figure 3.15B). Cells with

rescued NORAD expression exhibited significantly lower levels of aneuploidy. These

findings confirmed that NORAD is essential for the maintenance of genomic stability in

human cells.

Figure 3.15 Cre-induced de-repression of NORAD rescues chromosomal instability

(A) qRT-PCR analysis of NORAD expression in NORAD+/+ and NORAD−/− HCT116 cells with or without adenovirus-Cre infection.

(B) Subclones generated from untreated or adenovirus-Cre infected NORAD−/− HCT116 cells were scored for aneuploidy as in Figure 3.9B. P value calculated by Student’s t-test.

NORAD directly regulates both ploidy and chromosomal stability

It has been proposed that in some cancer cells, CIN results from whole genome

duplication events that produce a transient tetraploid state that subsequently resolves

into an unstable pseudo-diploid state (Ganem et al., 2007). Therefore, since we

recovered both tetraploid and diploid NORAD−/− clones that each exhibited CIN, it was

unclear whether loss of NORAD primarily causes tetraploidization which then results in

CIN as a secondary consequence of this event, or whether NORAD directly regulates

both ploidy and chromosomal stability. The fact that CIN can be rescued by NORAD

reactivation in diploid knockout cells (Figure 3.15) supports the latter possibility. If CIN

were due to a prior, now resolved, tetraploid state, restoration of NORAD should no

longer have the capacity to revert genomic instability in diploid cells. Furthermore, if the

CIN phenotype of NORAD−/− cells is solely a secondary consequence of polyploidization,

tetraploid knockout cells should revert to a diploid state at a measurable frequency.

However, analysis of 32 subclones derived from tetraploid NORAD−/− cells demonstrated

that these cells do not detectably revert to diploidy (Figure 3.16A). In contrast,

approximately 10% of subclones of diploid NORAD−/− cells gain tetraploid DNA content

(Figure 3.16B). These results support a primary role for NORAD in regulating both

ploidy and chromosomal stability in diploid cells.

Figure 3.16 Tetraploidy is a stable state in NORAD−/− cells whereas diploid cells lacking NORAD generate new tetraploid subclones

(A) Flow cytometry histograms showing DNA content, as measured by propidium iodide staining, in a tetraploid NORAD−/− HCT116 clone and a representative subclone derived from it. All 32 examined subclones retained tetraploid DNA content.

(B) 24 subclones derived from a diploid NORAD−/− HCT116 clone were examined as in panel A. 2/24 subclones gained tetraploid DNA content.

Discussion

As opposed to many other lncRNAs (Ponting et al., 2009; Khalil et al., 2009), NORAD

stands out due to its high conservation in mammals and abundant and ubiquitous

expression in various cell types and tissues. Although the initial identification of this

transcript in our doxorubicin screen suggested a role in the p53-dependent DNA

damage response, the induction after DNA damage that we observed here is likely to be

indirect, based on a recently published ChIP-seq study (Sanchez et al., 2014) and the

absence of a p53 binding site near the NORAD promoter. Its exact role in the DNA

damage response pathway will be an important question that has to be addressed in the

future. Additionally, it was recently reported that NORAD (LINC00657) is induced by

hypoxia in human endothelial cells (Michalik et al., 2014), suggesting broader roles for

NORAD in cellular stress responses. How NORAD regulation influences the functional

outputs of these and other stress response pathways represents an important area for

future research.

While the regulation of NORAD expression remains elusive, our genetic loss-of-function studies

identified a clear and interesting function of NORAD: the regulation of genomic stability.

The fidelity of chromosome segregation during cell division must be maintained at a high

level to ensure the accurate transmission of genetic information to daughter cells as well

as to avoid severe pathologic consequences. CIN, a phenotype characterized by the

frequent gain or loss of chromosomes during mitosis, is a hallmark of cancer cells

(Hanahan and Weinberg, 2011; Kops et al., 2005) and is a key mechanism that

contributes to gain- and loss-of-function of oncogenes and tumor suppressors.

Accordingly, most solid tumors show rapidly evolving structural and numerical

aneuploidy (Albertson et al., 2003; Gerlinger et al., 2012), which is often associated with

poor patient prognosis (Carter et al., 2006). Therefore, the mechanisms through which

cells maintain accurate chromosome transmission and how this process goes awry in

cancer have been the subject of decades of intensive research. Various mechanisms

are known to contribute to chromosomal instability, including defects in the mitotic

checkpoint (Kops et al., 2005), deficiencies in sister chromatid cohesion (Manning et al.,

2014), spindle abnormalities (Cimini, 2008), the presence of supernumerary

centrosomes (Ganem et al., 2009), and replication stress (Burrell et al., 2013).

More recently, noncoding RNAs, including numerous miRNAs as well as some lncRNAs

such as PANDA, ANRIL, and lincRNA-p21, have been reported to be involved in the DNA damage

response and thus in maintaining genome integrity (reviewed in Wan et al., 2014). In a

recent study, the noncoding RNA CCAT2 has been demonstrated to be upregulated in

microsatellite-stable colon cancer and to promote tumorigenesis and chromosomal

instability by activating MYC and WNT signaling (Ling et al., 2013). However, whether

lncRNAs can be an integral part of this essential cellular process has been largely

unexplored. The discovery of a CIN phenotype in NORAD−/− cells therefore provides valuable

evidence that not only proteins involved in the mitotic checkpoint but also long noncoding

RNAs play important roles in this biological process, adding another regulatory layer.

Since defects in the maintenance of genome integrity are implicated in multiple complex

diseases, developmental defects, aging, and almost all types of cancer (Iourov et al.,

2010; Kops et al., 2005; Zeman and Cimprich, 2014), it will be of great interest and

importance to investigate the phenotypes caused by NORAD loss-of-function in

animal models.

Materials and Methods

TALENs and targeting constructs for genome editing

3 pairs of TALENs targeting NORAD were designed using ZiFit Targeter v4.1 (Sander et

al., 2010) and constructed using the Restriction Enzyme And Ligation (REAL) assembly

method (Sander et al., 2011) with Addgene Kit #1000000017. Sequences of target

genomic DNA (gDNA) and TALEN RVDs are provided in Table 3.1. To construct donor

templates for homologous recombination (HR), homology arms were amplified from

gDNA (primers in Table 3.2) and cloned into Lox-Stop-Lox TOPO (Addgene plasmid

#11584) (Jackson et al., 2001) using the In-Fusion HD cloning Kit (Clontech). A

previously described TALEN pair targeting the AAVS1/PPP1R12C locus (Sanjana et al.,

2012) and an AAVS1/PPP1R12C targeting construct (Hockemeyer et al., 2009) were

obtained from Addgene (hAAVS1 1L TALEN, Plasmid #35431; hAAVS1 1R TALEN,

Plasmid #35432; AAVS1 hPGK-PuroR-pA donor, Plasmid #22072).

Table 3.1 TALEN RVDs and target sequences for NORAD

TALEN RVDs (see note 3)

TALEN pair | Arm | RVDs
TALEN1 | left | NG HD HD NN NN NG HD HD NN NN HD NI NN NI NN
TALEN1 | right | NN NN NI NN NN NI NN HD NN NN NN HD NG NN HD NN NG NG HD NG
TALEN2 | left | HD HD NI NN NN HD HD HD NG HD HD NN NN HD HD HD HD NN
TALEN2 | right | HD NN NN HD HD NG NN NG HD HD HD NN NN NN NN HD HD
TALEN3 | left | NN NI NI HD NG NN NN NN NN NN NN HD HD HD HD
TALEN3 | right | NI NG HD NG NN HD NI NN NN NN HD NI NN NI NN

Target sequences on NORAD (see note 3)

TALEN pair | Target sequence 5' to 3'
TALEN1 target | T TCCGGTCCGGCAGAG atcgcggagagacgc AGAACGCAGCCCGCTCCTCC A
TALEN2 target | T CCAGGCCCTCCGGCCCCG ggccggcgggtgaactgggg GGCCCCGGGACAGGCCG A
TALEN3 target | T GAACTGGGGGGCCCC gggacaggccgagcc CTCTGCCCTGCAGAT A

3 Red and blue sequences represent the left and right TALEN targets, respectively; grey sequences (lowercase) are spacers.
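
The relationship between the RVD rows and the target sequences in Table 3.1 can be checked with a short sketch. It assumes the commonly used RVD code (NI for A, HD for C, NN for G, NG for T; NN can also recognize A, which is ignored here) and assumes that the right-arm TALEN reads the opposite strand, so its RVDs are compared with the reverse complement of the right target site. The function names are hypothetical; the sequences are those listed for TALEN1.

```python
# Simplified RVD-to-nucleotide code (assumption: NN is treated as G only).
RVD_TO_BASE = {"NI": "A", "HD": "C", "NN": "G", "NG": "T"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def decode_rvds(rvds):
    """Translate a space-separated RVD string into the DNA sequence it targets."""
    return "".join(RVD_TO_BASE[rvd] for rvd in rvds.split())


def reverse_complement(seq):
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# TALEN1 entries copied from Table 3.1 (target sites exclude the flanking T/A).
talen1_left_rvds = "NG HD HD NN NN NG HD HD NN NN HD NI NN NI NN"
talen1_right_rvds = "NN NN NI NN NN NI NN HD NN NN NN HD NG NN HD NN NG NG HD NG"
talen1_left_site = "TCCGGTCCGGCAGAG"
talen1_right_site = "AGAACGCAGCCCGCTCCTCC"

left_matches = decode_rvds(talen1_left_rvds) == talen1_left_site
right_matches = decode_rvds(talen1_right_rvds) == reverse_complement(talen1_right_site)
print(left_matches, right_matches)
```

The same check can be applied to the TALEN2 and TALEN3 rows by substituting their RVD strings and target sites.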

Table 3.2 Primers used to amplify homology arms for NORAD LSL knock-in

Primer name | Description | Sequence 5' to 3' (see note 4)
LSL 3ACD rev | reverse primer for all 3 right homology arms for NORAD TALENs | CTCGATCGAGGTCGAAGAGGGTGGTGGGCATTT
LSL 3A fwd | forward primer for right homology arm for NORAD TALEN1 | ACGAAGTTATGTCGAGACGCAGAACGCAGCCCG
LSL 3C fwd | forward primer for right homology arm for NORAD TALEN2 | ACGAAGTTATGTCGAGAGCCCTCTGCCCTGCAG
LSL 3D fwd | forward primer for right homology arm for NORAD TALEN3 | ACGAAGTTATGTCGACCTCTCTTTCCCACCCCA
LSL 5A rev | reverse primer for left homology arm for NORAD TALEN1 | ACGAAGTTATGTCGATCTCCGCGATCTCTGCCG
LSL 5C rev | reverse primer for left homology arm for NORAD TALEN2 | ACGAAGTTATGTCGAGGCCTGTCCCGGGGCCCC
LSL 5D rev | reverse primer for left homology arm for NORAD TALEN3 | ACGAAGTTATGTCGATTCGCTGCGGCTTCAAGG
LSL 5ACD fwd | forward primer for all 3 left homology arms for NORAD TALENs | AGCGGCCGCTGTCGAAAATGAAATATTGGAGTCTTCT

4 Red and blue nucleotides are complementary to the vector sequence for the In-Fusion reaction.

Cell culture, transfection, and adenovirus transduction

HCT116 and BJ-5ta cells were obtained from ATCC and cultured in McCoy’s 5a or

a 4:1 mixture of DMEM and Medium 199, respectively, supplemented with 10% FBS and

1X Antibiotic-Antimycotic (Life Technologies). HCT116 cells were transfected with

Fugene HD (Promega). 10 µg DNA and 30 µL of the transfection reagent were used per

10 cm dish. For BJ-5ta, 4×10⁶ cells were suspended in 100 µL nucleofector solution SE

with 3 µg DNA and transferred to 100 µL cuvettes, followed by nucleofection using a 4D-

Nucleofector System (Lonza) with program EN-150. For genome editing experiments,

plasmids were mixed at a molar ratio of Left-TALEN:Right-TALEN:HR-donor = 1:1:8.

Transfected cells were then selected with 1 µg/mL puromycin for at least 7 days and

surviving cells were plated in 96 well plates at single cell density. Genomic DNA was

isolated from single-cell clones with the DNeasy kit (Qiagen) and genotyped by PCR

with primers provided in Table 3.3. Ad-Cre was obtained from the UT Southwestern

Vector Core and cells were transduced with an MOI of 200 for 2 days. siRNAs

(sequences in Table 3.4) were transfected using DharmaFECT 2 (GE Healthcare).
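
Converting the 1:1:8 molar ratio described above into plasmid masses requires the plasmid sizes, which are not stated here. The sketch below shows the arithmetic under the assumption that plasmid mass is proportional to length in base pairs. The sizes used are hypothetical placeholders, as is the function name.

```python
def masses_for_molar_ratio(total_ug, plasmids):
    """Split a total DNA mass among plasmids to satisfy a molar ratio.

    plasmids maps a name to (molar ratio part, size in bp). Mass per mole is
    taken as proportional to plasmid length, so each plasmid receives a share
    of the total mass proportional to ratio * size.
    """
    weights = {name: ratio * size for name, (ratio, size) in plasmids.items()}
    total_weight = sum(weights.values())
    return {name: total_ug * w / total_weight for name, w in weights.items()}


# Hypothetical plasmid sizes (bp); substitute the real construct lengths.
plasmids = {
    "left_TALEN": (1, 7000),
    "right_TALEN": (1, 7000),
    "HR_donor": (8, 6000),
}

mix = masses_for_molar_ratio(10.0, plasmids)  # 10 µg total, as used per 10 cm dish
for name, mass in mix.items():
    print(f"{name}: {mass:.2f} µg")
```

With these placeholder sizes, most of the 10 µg is donor template, which reflects the 8-fold molar excess of the donor over each TALEN plasmid.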

Table 3.3 Primers used for genotyping genome edited single cell derived clones

Primer name | Description | Sequence 5' to 3'
NORAD HR 5' fwd | forward primer outside of left homology arms for NORAD lox-Stop-lox knock-in allele | CTCTCCCGCACTGCAGTTCA
NORAD LOC HR 5' rev | reverse primer inside PuroR cassette | AGGGCCAGCTCATTCCTCCC
NORAD HR 3' fwd | forward primer inside STOP cassette | GAATTCCGCAAGCTAGCCAC
NORAD HR 3' rev | reverse primer outside of right homology arms for NORAD lox-Stop-lox knock-in allele | ACGTGGACGTATCGCTTCCA
AAVS1 fwd | forward primer outside of left homology arm for AAVS1 locus knock-in allele | CTCTCCTGAGTCCGGACCACTTTG
AAVS1 rev | reverse primer for untargeted WT AAVS1 allele | CAAGCTCTCCCTCCCAGGAT
AAVS1 TA rev | reverse primer inside PuroR cassette | CACAAGGGTAGCGGCGAAGAT

Table 3.4 siRNA target sequences

Sequence name | Description | Target sequence 5' to 3'
siNon-Target | Negative control siRNA from Dharmacon | GCGCGATAGCGCGAATATA
siNORAD-1 | siRNA sequence targeting 829..847 of NORAD | TAGCCCTTCTAGATGGAAA
siNORAD-2 | siRNA sequence targeting 177..195 of NORAD | CCACTGGCTGTGCCCAGAC

RNA isolation, qPCR, and northern blotting

Total RNA was extracted from cultured cells with Trizol (Invitrogen) or the RNeasy kit

(Qiagen) and contaminating gDNA was digested with RNase-free DNase (Qiagen). For

qRT-PCR experiments, the Taqman One-Step RT-PCR Master Mix (Life Technologies)

was used with a custom NORAD Taqman assay or a commercial 18S rRNA Taqman

assay (Life Technologies). For all other qPCR assays, RNA was reverse-transcribed

with SuperScript III (Invitrogen) and Power SYBR Green PCR Master Mix (Life

Technologies) was used. Primers and probes used for qPCR are provided in Table S5.

To measure NORAD copies per cell, NORAD was first amplified from HCT116 cDNA

and cloned into pcDNA3.1. This plasmid was then used to generate a standard curve

for absolute quantification of NORAD abundance in defined numbers of cells. For

northern blotting, 20 µg total RNA was separated on a 0.7% denaturing agarose gel

containing formaldehyde and transferred to Hybond N+ membranes. The NORAD probe

was PCR amplified with primers provided in Table 3.5 and radiolabeled using the

Random Primed DNA Labeling Kit (Roche).
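
The standard-curve approach to absolute quantification described above can be sketched as follows. This is an illustrative outline, not the analysis code used here: it fits Ct against log10 of input plasmid copies by least squares, inverts the fit for an unknown sample, and divides by the number of cell equivalents in the reaction. All numerical values are hypothetical, and the sketch ignores losses during RNA extraction and reverse transcription.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copies in a reaction."""
    return 10 ** ((ct - intercept) / slope)


def copies_per_cell(ct, slope, intercept, cell_equivalents):
    """Estimated transcript copies per cell for a reaction of known cell input."""
    return copies_from_ct(ct, slope, intercept) / cell_equivalents


# Hypothetical 10-fold plasmid dilution series and measured Ct values.
standard_copies = [1e3, 1e4, 1e5, 1e6, 1e7]
standard_ct = [30.00, 26.68, 23.36, 20.04, 16.72]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
estimate = copies_per_cell(23.36, slope, intercept, cell_equivalents=1000)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, copies/cell={estimate:.0f}")
```

A slope near -3.32 corresponds to approximately one doubling per cycle, which is a common check on the quality of a dilution series.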

Southern blotting

Genomic DNA was isolated using DNeasy (Qiagen) and digested with SphI. 10 µg of

digested DNA was electrophoresed on a 0.7% agarose gel and transferred to Hybond

N+ membrane (Amersham). The probe was generated by purifying the 483 bp BsaI-

HindIII fragment of Lox-Stop-Lox TOPO (Addgene plasmid #11584) (Jackson et al.,

2001), which was then radiolabeled using the Random Primed DNA Labeling Kit (Roche).

Table 3.5 Primers used to generate northern blot probe

Primer name | Description | Sequence 5' to 3'
Northern1 fwd | forward primer for northern blot probe: amplicon 47..837 of NORAD | CTCCTCCAGGGCCCTCCAG
Northern 1 rev | reverse primer for northern blot probe: amplicon 47..837 of NORAD | GAAGGGCTAGATGTGACAAATGTTT

Time-lapse imaging

Cells were grown on NUNC chambered coverglasses (Thermo). To visualize DNA in

HCT116 cells, a cell permeable Hoechst dye (33342; Invitrogen) was used at 25-50

ng/mL. Time-lapse fluorescence images were collected every 5 minutes for 24-48 hours

using a Leica inverted microscope equipped with an environmental chamber that

controls temperature and CO2, a 63X oil-objective, an Evolve 512 Delta EMCCD

camera, and Metamorph software (MDS Analytical Technologies).

DNA FISH and Karyotyping

Chromosome enumeration probes for Chromosome 7 (CHR7-10-GR) and chromosome

20 (CHR20-10-RE) were purchased from Empire Genomics. For interphase DNA FISH,

cells were harvested with trypsin, washed with PBS, and incubated in hypotonic solution

(0.4% KCl) for 10 minutes. Cells were then resuspended in fixation buffer (3:1 mix of

methanol:glacial acetic acid) and spread on slides pre-treated with 1M HCl for 24 hours,

then 70% EtOH for 24 hours and stored in distilled water. For analyzing metaphase

spreads, cells were treated with 1 µg/mL colcemid (Roche) for 30 minutes, harvested

and fixed as described above, and spread on slides in a climate-controlled hood, set at

25°C and 40% humidity. DNA FISH hybridizations and karyotype analyses were

performed by the Veripath Cytogenetics laboratory at UT Southwestern.

Flow cytometry

Assessment of DNA content by propidium iodide staining and flow cytometry was

performed as previously described (Hwang et al., 2007). For phospho-Histone H3

(Ser10) staining, trypsinized cells were fixed in 4% formaldehyde for 10 min, washed

with PBS, and incubated with 100 µL incubation buffer (1% BSA and 0.1% Triton X-100

in PBS) with antibody (9701, Cell Signaling) diluted at 1:50 followed by staining with goat

anti-rabbit antibody conjugated to AlexaFluor 488 (Life Technologies).

Prediction of coding potential with PhyloCSF

A Multiz alignment of 46 vertebrates aligned to GRCh37/hg19 for CENPB, JUND, UBC,

ERBB2, NEAT1, XIST, and NORAD (LINC00657) in multiple alignment format (MAF)

and BED files containing strand-specific genomic coordinates for the exons in each gene

were downloaded from the UCSC Table Browser and uploaded to Galaxy

(https://usegalaxy.org/) (Blankenberg et al., 2010). These files were used with the ‘Stitch

MAF blocks’ followed by ‘Concatenate FASTA alignment by species’ functions of Galaxy

to generate FASTA alignments for each gene in the 29 mammals specified by the

PhyloCSF phylogeny (http://mlin.github.io/PhyloCSF/29mammals.nh.png). PhyloCSF

(Lin et al., 2011) was run with the resulting FASTA file using the following parameters:

[--orf=ATGStop --frames=3 --removeRefGaps --aa --allScores].
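
PhyloCSF scores multi-species alignments with phylogenetic codon models, and the sketch below does not reproduce that scoring. It only illustrates, on a single sequence, the kind of candidate set implied by an ATG-to-stop search across three forward reading frames, which is our reading of the --orf=ATGStop and --frames=3 options. The function name and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """List ATG-to-stop ORFs in the three forward frames of a sequence.

    Returns tuples of (frame, start, end, orf) with 0-based start, exclusive
    end, and the stop codon included in orf. In each frame, an ORF begins at
    the first ATG after the previous stop; min_codons counts codons before
    the stop codon.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3, seq[start:i + 3]))
                start = None
    return orfs


toy_sequence = "CCATGAAATAGGATGCCCTGA"  # hypothetical example, not NORAD
toy_orfs = find_orfs(toy_sequence)
for frame, start, end, orf in toy_orfs:
    print(frame, start, end, orf)
```

In the actual analysis each candidate ORF is scored across the aligned species, and the maximum score per transcript is the value reported in Figure 3.4.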

3' RACE

3' RACE was performed using the GeneRacer kit (Life Technologies) and primers listed

in Table 3.6.

Table 3.6 Primers used for 3’ RACE

Primer name | Description | Sequence 5' to 3'
NORAD 3' RACE 1 | forward primer for NORAD 3' RACE | TCCCATAAAATTGGATGTTGTGCCTA
NORAD 3' RACE 2 | Nested forward primer for NORAD 3' RACE | TGTGAATGACTTTGTTCTTTGCTTGTG

Chapter 4: Mechanism of chromosome instability in NORAD-depleted cells

Introduction

Pumilio-Fem3-binding factor (PUF) proteins represent a deeply conserved family of RNA

binding proteins that act as negative regulators of gene expression (Wickens et al.,

2002). PUF proteins bind with high specificity to sequences in the 3' UTRs of target

mRNAs through their PUMILIO homology domains (Zamore et al., 1997) and stimulate

deadenylation and decapping, resulting in accelerated turnover and decreased

translation (Miller and Olivas, 2011). There are two human and mouse PUF proteins,

PUMILIO1 (PUM1) and PUMILIO2 (PUM2), that bind to target transcripts containing an

eight nucleotide sequence (UGUANAUA), referred to as the PUMILIO response element

(PRE). Many mammalian PUM targets have been identified using high-throughput

approaches (Chen et al., 2012; Galgano et al., 2008; Hafner et al., 2010; Morris et al.,

2008), revealing diverse functions for these proteins in germline homeostasis (Chen et

al., 2012; Spassov and Jurecic, 2003), cell cycle control (Kedde et al., 2010; Miles et al.,

2012) and neuronal activity and function (Driscoll et al., 2013; Vessey et al., 2010).

Notably, Pum1 haploinsufficiency in mice has recently been reported to result in

spinocerebellar ataxia type 1 (SCA1)-like neurodegeneration due to increased levels of

the PUM-target Ataxin1 (Gennarino et al., 2015), demonstrating that PUM dosage must

be precisely controlled in vivo to avoid significant pathologic consequences.

Nevertheless, the mechanisms through which PUM activity is regulated remain

unknown.
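
The PRE consensus given above (UGUANAUA) lends itself to a simple sequence scan. The following is a minimal sketch, assuming an exact match to the eight-nucleotide consensus with any base at the N position. The function name and the toy sequence are hypothetical, and a match found this way is only a candidate site, not evidence of PUM binding.

```python
import re

# Lookahead allows overlapping matches to be reported; N is any of A, C, G, U.
PRE_PATTERN = re.compile(r"(?=(UGUA[ACGU]AUA))")


def find_pres(seq):
    """Return (0-based position, matched 8-mer) for each PRE consensus match.

    DNA input is accepted: T is converted to U before scanning.
    """
    rna = seq.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in PRE_PATTERN.finditer(rna)]


toy_transcript = "AAUGUAAAUAGGTGTATATACC"  # hypothetical example, not NORAD
pre_sites = find_pres(toy_transcript)
print(pre_sites)
```

Counting such matches along a transcript gives a rough estimate of how many PUM binding sites it could present.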

Here we describe the unexpected finding that a poorly characterized mammalian

lncRNA, which we termed NORAD, functions as a major regulator of PUM activity in

human cells. This lncRNA initially came to our attention due to its induction after DNA

damage, its strong evolutionary conservation, and its ubiquitous, abundant expression in

human tissues and cell lines. Surprisingly, inactivation of NORAD using a genome

editing approach resulted in chromosomal instability and dramatic aneuploidy in

previously karyotypically-stable human cell lines. Identification of NORAD-interacting

proteins revealed that this lncRNA functions as a multivalent binding platform for PUM

proteins. With at least 15 conserved PREs, NORAD has the capacity to sequester a

significant fraction of the total cellular pool of PUM1 and PUM2. We further showed for

the first time that PUM proteins regulate a large set of target transcripts that play a

critical role in maintaining the fidelity of chromosome transmission including key factors

necessary for mitosis, DNA replication, and DNA repair. In the absence of NORAD,

PUM hyperactivity leads to repression of these targets, resulting in genomic instability.

These findings have revealed a lncRNA-dependent mechanism that regulates a highly

dosage-sensitive family of RNA binding proteins, uncovering a new post-transcriptional

regulatory axis that maintains genomic stability in mammalian cells.

Results

NORAD is a cytoplasmic multivalent PUMILIO binding platform

To begin to elucidate the mechanism through which NORAD regulates genomic stability,

we first examined its subcellular localization. Fractionation revealed that NORAD is

nearly exclusively cytoplasmic, with a subcellular distribution comparable to ACTB

mRNA and clearly distinct from the established nuclear lncRNA NEAT1 (Figure 4.1A).

Cytoplasmic localization of NORAD was confirmed by single molecule RNA FISH

(Figure 4.1B). These findings suggest that NORAD interacts with a factor in the

cytoplasm through which it regulates faithful chromosome transmission.


Figure 4.1 NORAD is localized predominantly to the cytoplasm

(A) qRT-PCR analysis of NORAD and cytoplasmic (ACTB) and nuclear (NEAT1) control transcripts in subcellular fractions of HCT116 cells.

(B) Representative NORAD single-molecule RNA FISH images of HCT116 cells of the indicated genotypes.


We next carefully examined the sequence of NORAD to determine if any obvious

domain architecture could be discerned that might provide clues regarding its molecular

function. Alignment of the NORAD sequence to itself using the BLAST algorithm

uncovered a repetitive ~400 nucleotide domain that recurs 5 times in the transcript

(Figure 4.2). We termed this sequence the NORAD domain (ND1-ND5). Notably, a

large fraction of the conserved sequence within NORAD is encompassed within these

repetitive regions. We hypothesized that the NORAD domain represents a binding

platform through which this lncRNA is able to assemble a multivalent ribonucleoprotein

(RNP) complex.


Figure 4.2 Domain structure of NORAD

(A) Dot plot of nucleotide identity generated by aligning the NORAD sequence to itself using BLAST (discontinuous megablast; http://blast.ncbi.nlm.nih.gov/). This alignment revealed multiple repetitive regions within the NORAD sequence.

(B) Schematic of the NORAD transcript, showing the locations of the repetitive regions, termed NORAD domains (ND1-5). The mammalian conservation plot was obtained from the UCSC Genome Browser (PhastCons track).

(C) NORAD fragments used for in vitro transcription and RNA pull-down experiments.


To identify components of this putative NORAD RNP, we synthesized 7 biotinylated

RNA fragments encompassing each NORAD domain as well as the 5′ and 3′ segments

of the transcript (Figure 4.2C). These 7 fragments, along with corresponding antisense

sequences, were used to recover associated proteins in HCT116 lysates, which were

subsequently identified by mass spectrometry (Figure 4.3A). Candidate interactors

were filtered for those that were detectable above background in all five NORAD domain

pull downs with at least 5-fold enrichment compared to each corresponding antisense

pull down. Only a single protein, PUMILIO 2 (PUM2), fulfilled these criteria (Figure

4.3B). We confirmed the binding of PUM2 to all five NORAD domains as well as the 5′

end of NORAD using western blotting (Figure 4.3C). Western blotting also revealed

detectable interaction of NORAD with the related protein PUMILIO 1 (PUM1). We

further assessed PUM1/2-NORAD interactions by immunoprecipitating endogenous

PUM proteins, which confirmed highly significant enrichment of endogenous NORAD

(Figure 4.3D). Consistent with these data, both NORAD (Figure 4.1) and

PUM1/PUM2 (Morris et al., 2008; Ponten et al., 2008; Narita et al., 2014) are

predominantly localized to the cytoplasm.
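
The filtering criteria described above (signal above background in all five NORAD domain pull-downs, with at least 5-fold enrichment over each matching antisense pull-down) can be sketched as follows. This is only an illustration: the table layout, the function names, the background threshold of zero, and all numbers are assumptions, not the quantification actually used in this study.

```python
# Hypothetical layout: protein -> {fragment: (sense_signal, antisense_signal)}.
DOMAINS = ["ND1", "ND2", "ND3", "ND4", "ND5"]


def passes_filter(signals, min_fold=5.0, background=0.0):
    """True if the protein is above background in every NORAD-domain pull-down
    and enriched at least `min_fold` over the matching antisense pull-down."""
    for domain in DOMAINS:
        sense, antisense = signals[domain]
        if sense <= background:
            return False
        # An antisense signal of zero is treated as unbounded enrichment.
        if antisense > 0 and sense / antisense < min_fold:
            return False
    return True


def filter_candidates(table, min_fold=5.0, background=0.0):
    """Return the sorted names of proteins that satisfy the filter."""
    return sorted(
        name for name, signals in table.items()
        if passes_filter(signals, min_fold, background)
    )


# Made-up values for illustration only.
example = {
    "PUM2": {d: (40.0, 2.0) for d in DOMAINS},
    # Fails: only 2-fold enriched in ND3.
    "PROTEIN_X": {**{d: (30.0, 2.0) for d in DOMAINS}, "ND3": (6.0, 3.0)},
    # Fails: not detected in ND5.
    "PROTEIN_Y": {**{d: (30.0, 1.0) for d in DOMAINS}, "ND5": (0.0, 0.0)},
}
print(filter_candidates(example))
```

Requiring the criteria to hold in every domain, rather than on average, is what makes the filter stringent: a protein that binds only one repeat is excluded.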


Figure 4.3 NORAD interacts with PUMILIO proteins

(A) Experimental scheme to identify NORAD interacting proteins. Biotinylated NORAD fragments were synthesized by in vitro transcription and associated proteins were recovered from cell lysates, eluted with RNase A digestion, and identified using mass spectrometry.

(B) Plot of the normalized spectral index statistic (Trudgian et al., 2011), derived from mass spectrometry data, providing a quantitative estimate of PUM2 abundance in each sense and antisense NORAD fragment pull-down.

(C) Western blot analysis of PUM1 and PUM2 in sense (S) and antisense (AS) NORAD fragment pull-downs. GAPDH served as a negative control.

(D) NORAD or ACTB transcripts were assessed by qRT-PCR in endogenous PUM1, PUM2, or negative control IgG immunoprecipitates from HCT116 cells. Fold enrichment over IgG signal plotted.


Because the NORAD-PUM1/2 interactions were discovered through in vitro binding

experiments in cell extracts and validated through RNA immunoprecipitation (RIP)

experiments, both of which allow re-association of RNAs and proteins in cell lysates (Mili

and Steitz, 2004), we took advantage of a previously generated photoactivatable

ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP) dataset

generated with human PUM2 (Hafner et al., 2010). Through the covalent crosslinking of

RNA binding proteins and target RNAs prior to cell lysis, PAR-CLIP detects specific RNA

binding events that occur in intact cells. A total of 7523 PUM2 binding sites, occurring in ~3000

transcripts, were identified in this experiment. Remarkably, a site in NORAD, within

ND4, was ranked 11th out of all PUM2 binding sites, based on its representation in

PUM2 PAR-CLIP sequencing libraries (Figure 4.4). Four additional PUM2 binding sites

in NORAD were also identified. These findings corroborate our in vitro binding and RIP

data, providing strong evidence for endogenous interactions between NORAD and

PUMILIO proteins.


Figure 4.4 PAR-CLIP identifies NORAD as a major PUM2 target

(A) Histogram of the number of sequence tags per PUM2 PAR-CLIP cluster (Hafner et al., 2010). Red lines show NORAD CLIP clusters. Data obtained from http://www.mirz.unibas.ch/restricted/clipdata/RESULTS/PUM2/PUM2.html.

(B) Locations of the PUM2 PAR-CLIP clusters in the NORAD transcript. Numbers above each cluster represent the ranking based on the number of sequence tags per cluster (cluster 1 was the most frequently crosslinked site in NORAD).


However, we noted the presence of a large number of NORAD pseudogenes (Figure 4.5),

which likely confounded the mapping of sequencing reads in the Hafner et al. study.

Notably, at least four of these putative pseudogenes (on chromosomes 6, 9, 12, and X) are

nearly full-length, with greater than 93% sequence identity to NORAD over at least 4.2 kb.

Several of these homologous sequences have features of processed pseudogenes,

including target site duplications and terminal poly(A) sequences (Kazazian, 2014).

Nevertheless, analysis of Illumina BodyMap 2.0 RNA-seq data from 16 human tissues

revealed little evidence of transcription of most of these loci (data not shown), with the

notable exception of a nearly full-length NORAD-related sequence on chromosome 6,

which is annotated in RefSeq as HCG11. However, even HCG11, which shows the

highest detectable expression of any NORAD-related sequence, has an average FPKM

of 2.0 ± 1.3 in BodyMap data compared to an average FPKM of 31.8 ± 16.8 for NORAD.

Accordingly, use of sequence-specific TaqMan assays demonstrated that HCG11

abundance is >200-fold lower than NORAD abundance in HCT116 cells (data not

shown). Thus, at present, there is no evidence that any of these NORAD-related

sequences are functional in human cells, although it remains possible that some may

perform a PUMILIO sequestering function in specific tissues or cell types.

Importantly, our Southern blot strategy (Figure 3.5) confirms that these NORAD-related

sequences did not confound our genome editing approach to inactivate NORAD

expression since all analyzed NORAD−/− clones have single copy insertions of the lox-

STOP-lox cassette at the desired site.


Figure 4.5 NORAD and Norad pseudogenes in human and mouse genomes

(A) BLAT alignment of NORAD to the human genome (GRCh37/hg19; http://genome.ucsc.edu/cgi-bin/hgBlat) revealed 43 genomic loci that exhibit 84-98% identity to NORAD over at least a 100 bp span.

(B, C) Matched distribution of human (B) and mouse (C) pseudogenes with high sequence identity.


We therefore reanalyzed these data, first extracting all reads that map to NORAD prior

to transcriptome-wide mapping. Remarkably, this revealed that NORAD was the most

highly represented PUM2 CLIP target by a large margin (Figure 4.6A). To complement

these data, we performed PAR-CLIP on endogenous PUM2 in NORAD+/+ and NORAD−/−

HCT116 cells. Recovery of PUM2 was less efficient in this experiment than in the prior

study, which used heterologous expression of epitope-tagged PUM2, resulting in less

comprehensive transcriptome-wide PUM2 target identification. Nevertheless, NORAD

was again the most highly represented target of endogenous PUM2 (Figure 4.6B) and,

as expected, was not detected in NORAD−/− cells, demonstrating that the NORAD

pseudogenes do not confound our modified mapping approach.
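
The logic of the modified mapping approach, in which reads matching NORAD are claimed before transcriptome-wide mapping, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: exact substring matching stands in for a real read aligner, and the sequences, names, and the rule of dropping multi-mapping reads are hypothetical rather than the pipeline used in this study.

```python
from collections import Counter


def assign_reads(reads, references, priority=None):
    """Count reads per reference by exact substring match (toy aligner).

    If `priority` names a reference, reads matching it are claimed first;
    every other read is counted only when it matches exactly one reference.
    """
    counts = Counter()
    for read in reads:
        if priority is not None and read in references[priority]:
            counts[priority] += 1
            continue
        hits = [name for name, seq in references.items() if read in seq]
        if len(hits) == 1:  # ambiguous (multi-mapping) reads are dropped
            counts[hits[0]] += 1
    return counts


# Toy sequences: the "pseudogene" differs from "NORAD" at a single position.
references = {
    "NORAD": "ACGTACGTTGCA",
    "PSEUDOGENE": "ACGTACGTTGCC",
    "OTHER": "GGGGCCCCAAAA",
}
reads = ["ACGTACG", "GTTGCA", "CCCCAA"]

single_pass = assign_reads(reads, references)
two_pass = assign_reads(reads, references, priority="NORAD")
print(single_pass["NORAD"], two_pass["NORAD"])
```

In this toy case the read shared with the pseudogene is discarded as ambiguous in the single pass but credited to NORAD in the two-pass scheme, which is how near-identical pseudogenes can depress the apparent read count of the parent transcript.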


Figure 4.6 PUM2 PAR-CLIP reveals NORAD as the most preferred PUM2 binding transcript

(A, B) Histogram of the total number of CLIP reads per PUM2 target transcript in PAR-CLIP data generated with FLAG-PUM2 (Hafner et al., 2010) (A) or endogenous PUM2 (B). Number of NORAD CLIP reads shown in red text in parentheses.


Human PUM1 and PUM2 represent members of the deeply conserved PUF family of

RNA binding proteins that negatively regulate the stability and translation of target

mRNAs to which they bind (Wickens et al., 2002). PUF proteins are known for highly

specific binding to target RNAs, with human PUM1 and PUM2 exhibiting a strong

preference to bind to the PUMILIO response element (PRE) sequence UGUANAUA

(Galgano et al., 2008; Wang et al., 2002). This element is expected to occur 1 time in

approximately 16 kb of random sequence. Strikingly, there are 15 conserved sequences

perfectly matching the PRE in the 5.3 kb NORAD transcript, with the large majority

clustering in or near the NORAD domains (Figure 4.7). This is in stark contrast to other

PUM-bound transcripts, 90% of which have 2 or fewer PREs (Galgano et al., 2008).

Analysis of CLIP cluster distribution on NORAD confirmed the binding of endogenous

PUM2 to 7/15 PREs and heterologously-expressed PUM2 to 15/15 PREs (Figure 4.8).

Thus, in vitro binding data, RIP and PAR-CLIP interaction studies, and the

identification of a large number of conserved PREs together provide compelling evidence

for multivalent interaction of PUMILIO proteins with NORAD and indicate that NORAD is

the preferred PUM2 target transcript in human cells.
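
The expected frequency of the PRE and a simple motif scan can be sketched as below. The sketch assumes a random sequence with equal base frequencies, so that the 7 fixed positions of UGUANAUA give one match per 4^7 = 16,384 positions (~16 kb). The function names and the toy input sequence are hypothetical; the scan is not the conservation-aware analysis used to produce Figure 4.7.

```python
import re

# Lookahead pattern so that overlapping matches, if any, are all reported.
PRE_PATTERN = re.compile(r"(?=(UGUA[ACGU]AUA))")


def find_pres(sequence):
    """Return (0-based start, matched 8-mer) for each PRE in an RNA or DNA string."""
    rna = sequence.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in PRE_PATTERN.finditer(rna)]


def expected_spacing(fixed_positions=7):
    """Average distance between matches in random sequence (equal base usage)."""
    return 4 ** fixed_positions


def expected_count(length_nt, motif_length=8, fixed_positions=7):
    """Expected number of matches in a random sequence of the given length."""
    return (length_nt - motif_length + 1) / 4 ** fixed_positions


# Toy sequence containing two PREs (one written with T instead of U).
toy = "GGUGUAAAUACCTGTATATAGG"
print(find_pres(toy))
print(expected_spacing())
print(round(expected_count(5300), 2))
```

Under these assumptions, a random 5.3 kb sequence would be expected to contain well under one PRE, which puts the 15 conserved PREs in NORAD in perspective.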


Figure 4.7 Fifteen conserved PUMILIO binding sites in NORAD

Location, sequence, and conservation of PREs in NORAD. ND, NORAD domain. Red lines indicate conserved PRE sequences on NORAD.


Figure 4.8 PUM2 PAR-CLIP reads cluster on predicted PRE consensus motifs of NORAD

Location and read depth of endogenous PUM2 (upper) or FLAG-PUM2 (lower) PAR-CLIP clusters mapped to NORAD. Black bars, clusters overlapping PREs; gray bars, non-PRE clusters.


NORAD acts as a negative regulator of PUMILIO activity

Our prior measurements of NORAD transcript copy number revealed ~500-1000 copies

per cell in HCT116 (Figure 3.3C). With 15 PREs per transcript, NORAD therefore has

the capacity to bind ~7,500-15,000 PUMILIO protein molecules per cell. Based on these

estimates, we hypothesized that NORAD sequesters a large fraction of the pool of

PUMILIO proteins, thus negatively regulating their ability to repress target mRNAs. To

further test the plausibility of this model, we determined the number of PUM1 and PUM2

protein molecules per HCT116 cell by purifying recombinantly-expressed PUM1/2, which

were then used to generate standard curves for quantitative western blotting. These

measurements documented an average of ~15,000 PUM1 and ~2,000 PUM2 proteins

per cell (Figure 4.9). Thus, NORAD has the potential to sequester a significant fraction

of PUMILIO proteins in this cell line.
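
The arithmetic behind these estimates can be written out explicitly. The sketch below uses only the numbers stated above and assumes, as an upper bound, that every PRE can be occupied by one PUMILIO protein and that PUM1 and PUM2 compete for the same sites; competition from mRNA targets and binding affinities are ignored. The names are hypothetical.

```python
PRES_PER_TRANSCRIPT = 15
NORAD_COPIES_PER_CELL = (500, 1000)  # lower and upper estimates
PUM1_PER_CELL = 15_000
PUM2_PER_CELL = 2_000


def binding_capacity(copies, sites=PRES_PER_TRANSCRIPT):
    """Maximum number of PUMILIO molecules that NORAD could bind per cell."""
    return copies * sites


def max_fraction_sequestered(copies, pum_total=PUM1_PER_CELL + PUM2_PER_CELL):
    """Upper bound on the fraction of the PUM1 + PUM2 pool bound by NORAD."""
    return min(1.0, binding_capacity(copies) / pum_total)


for copies in NORAD_COPIES_PER_CELL:
    print(copies, binding_capacity(copies), round(max_fraction_sequestered(copies), 2))
```

With these inputs the binding capacity is 7,500 to 15,000 sites, against a combined pool of roughly 17,000 PUM1 and PUM2 molecules.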


Figure 4.9 Measurement of the number of PUM1 and PUM2 protein molecules per HCT116 cell

Purified recombinantly-expressed PUM1 and PUM2 were used to generate standard curves to estimate the mass of PUM1 or PUM2 in a given quantity of HCT116 lysate corresponding to a known number of cells. Western blot signals were quantified using a C-DiGit scanner (LI-COR). Quantification summarized in tables below blots.


This sequestration model makes at least 3 key predictions: First, in NORAD−/− cells,

PUM1/2 should be hyperactive resulting in relative repression of PUM1/2 targets;

second, PUM1 and/or PUM2 overexpression should phenocopy NORAD loss-of-

function; and third, depletion of PUM1/2 should suppress the NORAD loss-of-function

phenotype.

To test these predictions, we first performed RNA-seq on NORAD+/+ and NORAD−/−

HCT116 cells. Consistent with PUMILIO hyperactivity, PUM2 CLIP targets were

statistically-significantly downregulated in NORAD−/− cells (Figure 4.10A). Significant

downregulation of these targets was also confirmed by Gene Set Enrichment Analysis

(GSEA) (Subramanian et al., 2005) (Figure 4.10B).
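
The comparison of cumulative distributions rests on the two-sample Kolmogorov–Smirnov statistic, the largest vertical gap between the two empirical distribution functions. A minimal pure-Python sketch is given below; the fold-change values are invented for illustration, and the sketch computes only the statistic, not the P value (in practice a statistics library such as SciPy's scipy.stats.ks_2samp would be used for that).

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Invented log2 fold changes (knockout vs. wild type), for illustration only.
pum2_targets = [-0.6, -0.4, -0.3, -0.2, 0.0]
non_targets = [-0.1, 0.0, 0.1, 0.2, 0.3]
print(ks_statistic(pum2_targets, non_targets))
```

A leftward shift of the target curve relative to the non-target curve, as in Figure 4.10A, corresponds to a large value of this statistic.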


Figure 4.10 PUM2 targets are down-regulated in NORAD−/− cells

(A) Cumulative distribution plots depicting behavior of PUM2 CLIP targets, as defined in Hafner et al. and this study, versus non-PUM2-targets in the indicated RNA-seq experiments. P value calculated by Kolmogorov–Smirnov test demonstrates significant repression of PUM2 targets in all tested datasets. (B) GSEA using RNA-seq data from NORAD−/− cells demonstrates significant downregulation of custom genesets consisting of 591 genes containing the top 1,000 PUM2 PAR-CLIP clusters identified by Hafner et al. (Hafner et al., 2010) (upper) or the 463 PUM2 PAR-CLIP targets identified in the present study (lower). NES, normalized enrichment score; FDR, false discovery rate.


We next generated HCT116 cell lines with stable overexpression of PUM1 or PUM2

(Figure 4.11A). Importantly, NORAD expression was unchanged in these cells (Figure

4.11B). RNA-seq confirmed the expected downregulation of PUM2 PAR-CLIP targets

(Figure 4.11 C,D). Furthermore, PUM1 or PUM2 overexpression produced a gene

expression signature that was similar to that observed upon NORAD inactivation, with

genes that were down- or upregulated in NORAD−/− cells showing a similar pattern of

expression in PUM1/2 overexpressing cells (Figure 4.11 E,F). Accordingly, PUM2 and,

to a lesser extent, PUM1 overexpression was sufficient to induce significant levels of

aneuploidy (Figure 4.11 G). Thus, PUMILIO overexpression phenocopies both the

molecular and phenotypic consequences of NORAD inactivation.


Figure 4.11 PUMILIO overexpression phenocopies both the molecular and phenotypic consequences of NORAD inactivation.

(A) Western blot of PUM1 and PUM2 in overexpressing HCT116 clones. Irrelevant lanes were removed from blots where indicated with vertical lines.

(B) qRT-PCR analysis of NORAD expression relative to 18S rRNA in PUM1- or PUM2-overexpressing HCT116 cells. n.s., not significant relative to control (GFP) cells (Student’s t-test).

(C, D) Cumulative distribution plots depicting behavior of PUM2 CLIP targets, as defined in Hafner et al. and this study, versus non-PUM2-targets in the indicated RNA-seq experiments. P value calculated by Kolmogorov–Smirnov test demonstrates significant repression of PUM2 targets in all tested datasets.

(E, F) GSEA using RNA-seq data from cells overexpressing PUM1 (upper) or PUM2 (lower) demonstrates significant downregulation of a custom geneset consisting of 331 genes that are downregulated in NORAD−/− cells at least 2-fold with an adjusted p value ≤ 0.01 (left) or upregulation of a custom geneset consisting of 304 genes that are upregulated in NORAD−/− cells at least 2-fold with an adjusted p value ≤ 0.01 (right).

(G) PUM1 and PUM2 overexpressing clones were assayed for aneuploidy using chromosome 7/20 FISH as in Figure 2E-F. At least 200 nuclei were scored per clone. n.s., not significant; *p<0.05; **p<0.005; ***p<0.0005, chi-square test.


Lastly, we used two approaches to deplete PUM1/2 in NORAD−/− cells. First,

CRISPR/Cas9-mediated genome editing was used to inactivate PUM1, PUM2, or both

(Figure 4.12A), followed by TALEN-mediated insertion of the transcriptional stop

cassette at the NORAD locus. Individual knockout of PUM1 or PUM2 resulted in partial

suppression of CIN in NORAD−/− cells, consistent with functional redundancy of these

proteins (Figure 4.12B). Unexpectedly, double knockout of PUM1 and PUM2 led to

measurable aneuploidy (Figure 4.12C). Together with our finding that PUM1 or PUM2

overexpression also results in aneuploidy (Figure 4.11G), these results suggest that

precise control of PUM1/2 levels is necessary to maintain genomic stability. Importantly,

knockout of NORAD in the PUM1−/−; PUM2−/− background did not result in a further

increase in CIN (Figure 4.12C).

Finally, we demonstrated that siRNA-mediated depletion of PUM1/2 in NORAD−/− cells

(Figure 4.13A) significantly reduces the frequency of mitotic errors, as documented by

time-lapse imaging (Figure 4.13B, C). These data establish a critical role for PUMILIO

proteins downstream of NORAD in the maintenance of genomic stability.


Figure 4.12 PUMILIO knockout masks the phenotype of NORAD inactivation.

(A) Western blot of PUM1 and PUM2 in representative single or double knockout HCT116 clones. Irrelevant lanes were removed from blots where indicated with vertical lines. (B, C) Cells of the indicated genotypes were assayed for aneuploidy. *p<0.05, Student’s t-test, comparing NORAD−/−; PUM1+/+; PUM2+/+ to NORAD−/−; PUM1−/−; PUM2+/+ or NORAD−/−; PUM1+/+; PUM2−/−.


Figure 4.13 PUMILIO knockdown rescues the phenotype of NORAD inactivation.

(A) Western blot of PUM1 and PUM2 in HCT116 cells following transfection with a control siRNA (siNonTarget) or 2 independent sets of PUM1/PUM2-targeting siRNAs.

(B,C) Quantification of the percentage of mitoses exhibiting the indicated mitotic errors in time-lapse imaging experiments after transfection with control siRNA (siNT) or two distinct sets of siRNAs targeting PUM1 and PUM2. Values represent the average of 3 independent experiments with 85-200 mitoses imaged per condition per experiment. Error bars represent standard deviations. *p<0.05; **p<0.01, Student’s t-test.


PUMILIO proteins repress key mitotic, DNA repair, and DNA replication factors

Finally, to determine why PUMILIO hyperactivity results in CIN, we further examined the

expression of PUM2 PAR-CLIP targets in our RNA-seq data from NORAD−/− cells.

Among the 1303 genes that are statistically-significantly downregulated in NORAD−/−

cells are 193 PUM2 targets (Figure 4.14A). These targets are significantly enriched for

regulators of the cell cycle, mitosis, DNA repair, and DNA replication (Figure 4.14B).

Notably, individual knockout or knockdown of many of the PUM2 targets that are

downregulated in NORAD−/− cells has previously been shown to be sufficient to induce

genomic instability, including core components of the cohesin complex (e.g. SMC1A,

SMC3, and ESCO2), centromere components (e.g. CENPJ), and key factors necessary

for DNA repair and replication (e.g. PARP1, PARP2, EXO1, BARD1, MCM4, and MCM8)

(summarized in Table 4.1). We validated the downregulation of a large set of these

transcripts with qRT-PCR in NORAD−/− cells, as well as in cells that overexpress PUM1

or PUM2 (Figure 4.15). This coordinated downregulation of a broad set of targets that

are necessary to maintain genomic stability would be expected to strongly impair

accurate chromosome transmission, as observed upon NORAD inactivation and

PUMILIO overexpression.


Figure 4.14 Genes required for the maintenance of chromosomal stability are repressed in NORAD−/− and PUM1/2-overexpressing cells

(A) Venn diagram showing overlap of genes that are significantly downregulated in NORAD−/− HCT116 cells (adjusted p value ≤ 0.05; see Table S2) and PUM2 PAR-CLIP targets identified in Hafner et al. and this study.

(B) Gene ontology analysis of the 174 PUM2 PAR-CLIP targets that are downregulated in NORAD−/− cells, demonstrating enrichment of genes involved in mitosis, the cell cycle, DNA replication, and DNA repair.


Figure 4.15 Genes required for the maintenance of chromosomal stability are repressed in NORAD−/− and PUM1/2-overexpressing cells

(A) qRT-PCR validation of PUM2 PAR-CLIP targets that have a known role in the maintenance of genomic stability (see Table 4.1) and were downregulated in NORAD−/− cells according to RNA-seq. Gene expression was normalized to 18S rRNA. All genes shown were significantly downregulated in NORAD−/− cells (p≤0.05, Student’s t-test).

(B) qRT-PCR demonstrating expression of genes from panel A that are significantly downregulated in both PUM1- and PUM2-overexpressing HCT116 cells (p≤0.05, Student’s t-test).

(C) Expression of genes that are downregulated in NORAD−/− cells (see panel A) was assessed by qRT-PCR in PUM1- and PUM2-overexpressing cells. Graph shows genes that are significantly repressed by PUM1 (p≤0.05; Student’s t-test) but not PUM2. Gene expression was normalized to 18S rRNA.


Table 4.1 PUM target genes that are downregulated in NORAD−/− cells and required for genomic stability

Gene | Category | Notes | References
ESCO2 | Cohesin | Cohesin acetyltransferase; Esco2 knockout in MEFs causes severe chromosome segregation defects. | (Whelan et al., 2012)
SMC1A | Cohesin | Component of the cohesin complex; SMC1A knockdown in HCT116 causes CIN. | (Barber et al., 2008)
SMC3 | Cohesin | Component of the cohesin complex; SMC3 knockdown in HCT116 causes CIN. | (Barber et al., 2008)
CENPJ | Centromere | Centromere protein; Cenpj haploinsufficiency in MEFs causes genomic instability. | (McIntyre et al., 2012)
EXO1 | DNA repair | Exonuclease; Exo1 deficiency causes chromosomal aberrations in MEFs. | (Schaetzlein et al., 2013)
RBMX | DNA repair | RNA binding protein; RBMX knockdown causes premature chromatid separation and aberrant mitosis; RBMX is involved in DNA repair. | (Adamson et al., 2012; Matsunaga et al., 2012)
PARP2 | DNA repair | Poly(ADP-ribose) polymerase; key DNA repair factor; Parp2 knockout in MEFs causes chromosome mis-segregation upon treatment with an alkylating agent. | (De Vos et al., 2012; Menissier de Murcia et al., 2003)
NET1 | DNA repair | Guanine nucleotide exchange factor; NET1 depletion results in aberrant chromosome congression and separation. | (Menon et al., 2013)
PARP1 | DNA repair | Poly(ADP-ribose) polymerase; key DNA repair factor; Parp1 knockout in MEFs causes chromosomal instability. | (De Vos et al., 2012; Samper et al., 2001)
RBBP8 | DNA repair | Retinoblastoma binding protein (also known as CTIP); RBBP8 is important for DNA double strand break repair and homologous recombination; RBBP8 knockdown causes genomic instability. | (Terasawa et al., 2014; Wang et al., 2014)
BARD1 | DNA repair | BRCA1 associated protein; Bard1-/-;p53-/- mouse embryos display chromosomal abnormalities; reconstitution of BARD1 in Bard1-/- cancer cells reduces chromosomal aberrations. | (Laufer et al., 2007; McCarthy et al., 2003)
MCM8 | Replication | Minichromosome maintenance complex component; Mcm8 knockout causes genomic instability in MEFs. | (Lutzmann et al., 2012)
WDHD1 | Replication | Also known as CTF4 (chromosome transmission fidelity 4); CTF4 coordinates DNA unwinding and polymerase activity during replication. | (Kang et al., 2013)
MCM4 | Replication | Minichromosome maintenance complex component; hypomorphic Mcm4 allele causes genomic instability in mice. | (Shima et al., 2007)
MASTL | Mitosis | Microtubule-associated serine/threonine kinase-like; MASTL knockdown causes mitotic defects and chromosomal abnormalities. | (Burgess et al., 2010; Voets and Wolthuis, 2010)
PRC1 | Mitosis | Protein regulator of cytokinesis; PRC1 is required for cytokinesis and is involved in proper chromosome segregation. | (Jiang et al., 1998; Liu et al., 2009)
LMNB2 | Mitosis | Nuclear lamin; LMNB2 knockdown in HCT116 causes CIN; LMNB2 is downregulated in CIN-type colon cancer cell lines. | (Kuga et al., 2014)
LIN9 | Other | Subunit of the DREAM complex; Lin9 knockout causes chromosomal instability in MEFs. | (Hauser et al., 2012)
SLBP | Other | Stem-loop binding protein; interacts with histone mRNA 3' ends; Slbp mutant flies exhibit genomic instability. | (Salzler et al., 2009)
HMGB1 | Other | High-mobility group box family member; Hmgb1 knockout in MEFs results in CIN. | (Giavara et al., 2005)
DNMT1 | Other | DNA methyltransferase; DNMT1 knockout in HCT116 causes CIN. | (Karpf and Matsui, 2005)


Discussion

Although thousands of lncRNAs have been identified, the molecular functions of the vast

majority remain unknown. Here we report the initial functional characterization of a

highly conserved lncRNA that we termed NORAD, which is broadly and abundantly

expressed in mammalian cells and tissues. Our studies of this lncRNA have yielded

several important and unexpected findings. First, as demonstrated in the previous

chapter, inactivation of NORAD is sufficient to produce a chromosomal instability (CIN)

phenotype in previously karyotypically-stable cell lines. To our knowledge, these results

provide the first demonstration of an essential role for a lncRNA in the maintenance of

chromosomal stability in mammalian cells. Second, we show that NORAD preserves

genomic stability by acting as a multivalent binding platform for the PUMILIO family of

RNA binding proteins. Due to its high abundance and multitude of PUMILIO binding

sites, NORAD is able to sequester a significant fraction of the total cellular pool of

PUMILIO proteins, thereby greatly limiting their ability to repress target mRNAs. Among

PUMILIO targets are a large set of factors that are critical for mitosis, DNA repair, and

DNA replication whose excessive repression in the absence of NORAD perturbs

accurate chromosome segregation and can induce tetraploidization (Figure 4.16). The

elucidation of this novel lncRNA:PUMILIO regulatory interaction has expanded our

understanding of lncRNA functions and has uncovered a heretofore-unknown role for

PUMILIO proteins in the regulation of genomic stability in mammals.


Figure 4.16 A novel NORAD-PUMILIO axis that regulates genomic stability

Due to its abundance and multitude of PUMILIO binding sites, NORAD acts as a potent negative regulator of PUMILIO activity. In the absence of this lncRNA, PUMILIO is released to hyperactively repress a program of genes necessary to maintain chromosomal stability and a euploid state, including key factors required for mitosis, DNA replication, and DNA repair.


Our discovery that NORAD sequesters PUMILIO proteins contributes to an emerging

concept that a major class of lncRNAs function as molecular decoys. For example,

noncoding transcripts that sequester microRNAs (miRNAs), referred to as competing

endogenous RNAs (ceRNAs), have been proposed to act as broad regulators of gene

expression (Salmena et al., 2011). lncRNAs that inhibit proteins through competitive

binding have also been reported, such as GAS5 and the glucocorticoid receptor (Kino et

al., 2010), GADD7 and TDP-43 (Liu et al., 2012), and PANDA and NF-YA (Hung et al.,

2011). Nevertheless, due to the generally low abundance of lncRNAs and the frequent

promiscuity of protein-RNA interactions, the extent to which lncRNAs function through

this mechanism has been heavily debated. Importantly, several features of NORAD

distinguish it from the majority of lncRNAs and strongly support its function as a bona

fide molecular decoy. First, NORAD is unusually abundant with expression in the range

of ~500-1000 copies per cell in human cell lines, comparable to abundant housekeeping

transcripts such as ACTB. Moreover, the presence of at least 15 PUMILIO response

elements (PREs) per NORAD transcript further amplifies, by more than an order of

magnitude, the number of competitive binding sites provided by this lncRNA. Indeed,

careful measurements of the number of PUM1 and PUM2 protein molecules per cell

revealed that NORAD has the potential to sequester 50-100% of the total PUMILIO

protein pool in HCT116 cells. Finally, it is noteworthy that unlike many RNA binding

proteins that interact with loosely defined consensus sequences, PUMILIO proteins are

known for their exquisite specificity (Wang et al., 2002). Thus, NORAD provides an

optimized binding platform that would be expected to efficiently assemble a multivalent

PUMILIO RNP, thereby greatly reducing the availability of PUMILIO proteins to act upon

mRNA targets.


These results also establish a novel role for PUMILIO proteins as important regulators of

genomic stability. Our finding that PUMILIO proteins repress a program of genes whose

expression is necessary to maintain chromosomal stability reveals a previously

unrecognized pathway to CIN. Prominent among PUMILIO targets are many genes that

function in DNA replication and repair as well as key mitotic factors. Previous studies

have demonstrated that individual knockdown or knockout of a large number of these

genes is sufficient to produce a CIN phenotype (summarized in Table 4.1). It is

therefore highly plausible that the coordinated repression of these targets under

conditions of PUMILIO hyperactivity would produce a state of severe genomic instability,

as observed upon NORAD loss-of-function. Importantly, it is presently unclear whether

PUMILIO hyperactivity contributes to CIN in human cancer cells since abnormal

expression or activity of PUMILIO or NORAD has not been reported in human tumors.

Nevertheless, in light of our findings, a more thorough examination of this pathway in

cancer is merited.

These findings contribute to a growing appreciation that the activity of PUMILIO proteins

must be kept within a narrow range to maintain homeostasis in mammals. For

example, Pum1 haploinsufficiency results in neurodegeneration in mice due to

upregulation of the PUMILIO target Ataxin1 (Gennarino et al., 2015). Our data

document that hyperactivity of PUM1 or PUM2 also has deleterious consequences.

Nevertheless, little is known regarding how PUMILIO activity is regulated. The

emergence of NORAD in mammals provides a robust mechanism to buffer PUMILIO

activity and maintain it within tolerable limits. A major unresolved question, however, is

whether NORAD functions primarily as a static buffer or whether its levels are modulated

in order to further titrate PUMILIO activity under certain conditions. Importantly, since


each NORAD transcript has the capacity to bind at least 15 PUMILIO protein molecules,

even small changes in NORAD levels can profoundly influence PUMILIO availability.

For example, NORAD initially came to our attention due to its modest induction after

DNA damage (~2 fold; see Figure 3.3). Yet this small increase generates ~7000

additional PREs, representing sufficient binding sites to sequester nearly half of the total

pool of PUMILIO proteins in HCT116 cells. Notably, since transcripts encoding several

key DNA repair factors are PUM1/PUM2 targets (Figure 4.14), upregulation of NORAD

and a concomitant enhancement of PUMILIO sequestration would be expected to de-

repress these targets, thereby augmenting cellular DNA repair capacity.

In summary, characterization of the noncoding RNA NORAD has revealed a potent

molecular decoy for PUMILIO proteins, uncovering an unexpected mechanism through

which the activity of these highly dosage-sensitive post-transcriptional regulators is

controlled in mammalian cells. These results have also established the existence of a

newly-defined PUMILIO regulon that includes a program of genes whose expression is

essential for the maintenance of genomic stability. Since chromosomal instability, as

observed upon NORAD inactivation and consequent PUMILIO hyperactivity, can

produce developmental defects, accelerated aging, cancer, and other pathologies

(Iourov et al., 2010; Kops et al., 2005; Zeman and Cimprich, 2014), examination of

NORAD regulation and activity in normal physiology and disease will be of great interest.

Materials and Methods

Subcellular fractionation

Cytoplasmic, nuclear soluble, and chromatin-associated fractions were generated as

described previously (Cabianca et al., 2012). Briefly, cells were harvested by

trypsinization and lysed in RLN1 solution (50 mM Tris-HCl pH 8.0, 140 mM NaCl, 1.5

mM MgCl2, 0.5% NP-40, 2 mM VRC) on ice for 5 min. After centrifugation, the

supernatant was collected as the cytoplasmic fraction while the pellet was further

extracted with RLN2 solution (50 mM Tris-HCl pH 8.0, 500 mM NaCl, 1.5 mM MgCl2,

0.5% NP-40, 2 mM VRC). Further centrifugation yielded the nuclear-soluble fraction as

supernatant and chromatin-associated fraction as pellet. RNA was extracted from

fractions with Trizol (Life Technologies).

RNA FISH

A Stellaris single molecule FISH probe for NORAD was designed using the Stellaris

Probe designer (https://www.biosearchtech.com/stellarisdesigner/). Each probe consists

of a pool of 48 oligonucleotides, each labeled with CAL Fluor Red 610. Cells were

grown on Nunc Lab-Tek II CC2 chambered slides (Thermo) and fixed with 4%

formaldehyde for 10 min. Fixed cells were permeabilized in 70% EtOH for 1 hour and

dehydrated for 2 min each in 70%, 80%, 95%, and 100% EtOH, then air-dried. Slides

were washed in PBS with 0.1% Tween 20 and hybridized at 37°C overnight in 100 µL

hybridization buffer (100 mg/mL dextran sulfate, 10% formamide in 2X SSC) containing

125 nM probe per 22 mm × 22 mm surface under a coverglass sealed with rubber

cement. Slides were washed with 10% formamide in 2X SSC and mounted with

ProLong Gold Antifade with DAPI (Molecular Probes).

NORAD affinity purification and mass spectrometry

NORAD fragments were amplified with primers containing T7 and SP6 promoter

sequences (Table 4.2) and used as templates for the MEGAscript T7/SP6 Transcription

Kit (Ambion) with the Biotin RNA labeling mix (Roche). In vitro transcribed RNA was

treated with DNase I and purified with the RNeasy kit (Qiagen). 30 pmol purified

biotinylated RNA was heated to 90°C in 60 µL RNA structure buffer (10 mM Tris-Cl pH

7.0, 0.1 M KCl, 10 mM MgCl2) for 2 minutes then put on ice for 2 minutes. 2×10⁷ cells

were harvested by scraping and snap-frozen before resuspension in 1.2 mL lysis buffer

[150 mM NaCl, 50 mM Tris-Cl pH 7.5, 0.5% Triton X-100, 1 mM PMSF, 1x protease

inhibitor cocktail (Roche), and 100 U/ml of SUPERase-IN (Ambion)]. Lysates were

sonicated using a Bioruptor (Diagenode) for 10 min with 30 sec on/off cycles and pre-

cleared with 50 µL washed streptavidin C1 Dynabeads (Invitrogen) at 4°C for 1 hour. 30

pmol biotinylated RNA was then added to pre-cleared lysates and rotated at 4°C for 2

hours, followed by addition of 50 µL streptavidin C1 Dynabeads and further rotation for 1

hour. Beads were washed 6 times with lysis buffer at 4°C and proteins were eluted by

incubating in RNase A buffer (50 mM Tris-Cl pH 7.5, 150 mM NaCl, 100 µg/mL RNase

A) for 35 minutes at 37°C. Eluted proteins were subjected to label-free quantification

using mass spectrometry and SINQ spectral index analysis (Trudgian et al., 2011) at the

UT Southwestern Proteomics core. Proteins detectable in at least 1 sense NORAD

fragment pull-down with ≥5 spectral counts were included in subsequent analyses.
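The inclusion rule in the last sentence can be sketched as a simple filter. This is a minimal illustration, not the pipeline used here: the function name, the bait labels, and the toy spectral counts are all hypothetical.

```python
def filter_proteins(counts, sense_baits, min_count=5):
    """Keep proteins with >= min_count spectral counts in at least one sense pull-down.

    counts: {protein: {bait_name: spectral_count}} (hypothetical layout)
    sense_baits: names of the sense NORAD fragment pull-downs
    """
    return sorted(
        protein
        for protein, by_bait in counts.items()
        if any(by_bait.get(bait, 0) >= min_count for bait in sense_baits)
    )


# Toy data (invented for illustration only).
toy_counts = {
    "PROT_A": {"ND4_sense": 12, "ND4_antisense": 0},
    "PROT_B": {"ND4_sense": 3, "ND4_antisense": 9},
    "PROT_C": {"ND1_sense": 5},
}
kept = filter_proteins(toy_counts, sense_baits=["ND1_sense", "ND4_sense"])
print(kept)
```

Counts from antisense pull-downs are ignored by this filter, so PROT_B is excluded even though it has 9 counts in the antisense sample.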

Table 4.2 Primers used for in vitro transcription for NORAD affinity purification

Primer name  Description  Sequence 5' to 3' (footnote 5)

T7S6-5endF  forward primer to amplify 5p for in vitro transcription: amplicon 1..813 of NORAD  TAATACGACTCACTATAGGGAGAAGTTCCGGTCCGGCAGAGAT

T7S6-5endR  reverse primer to amplify 5p for in vitro transcription: amplicon 1..813 of NORAD  ATTTAGGTGACACTATAGAAGGGTTCTATTAAAAGGTTGGGGTGGAG

T7S6-ND1F  forward primer to amplify ND1 for in vitro transcription: amplicon 704..1322 of NORAD  TAATACGACTCACTATAGGGAGACCACCCTCTGGGAAGATTTACTG

T7S6-ND1R  reverse primer to amplify ND1 for in vitro transcription: amplicon 704..1322 of NORAD  ATTTAGGTGACACTATAGAAGGGAACAGGTGATTTGGCCATTCCCC

T7S6-ND2F  forward primer to amplify ND2 for in vitro transcription: amplicon 1290..1914 of NORAD  TAATACGACTCACTATAGGGAGATGGCCAAATCACCTGTT

T7S6-ND2R  reverse primer to amplify ND2 for in vitro transcription: amplicon 1290..1914 of NORAD  ATTTAGGTGACACTATAGAAGGGTATAGACATTACTATACTGTTCAC

T7S6-ND3F  forward primer to amplify ND3 for in vitro transcription: amplicon 1882..2569 of NORAD  TAATACGACTCACTATAGGGAGAGCCACCTTTGTGAACAGTAT

T7S6-ND3R  reverse primer to amplify ND3 for in vitro transcription: amplicon 1882..2569 of NORAD  ATTTAGGTGACACTATAGAAGGGAATGGCAAAACACCATTTGCAATT

T7S6-ND4F  forward primer to amplify ND4 for in vitro transcription: amplicon 2494..3156 of NORAD  TAATACGACTCACTATAGGGAGAAATGCTGTTTGGAAGTGGAAT

T7S6-ND4R  reverse primer to amplify ND4 for in vitro transcription: amplicon 2494..3156 of NORAD  ATTTAGGTGACACTATAGAAGGGGCACAAATATCAAAATGGGTA

T7S6-ND5F  forward primer to amplify ND5 for in vitro transcription: amplicon 3133..3775 of NORAD  TAATACGACTCACTATAGGGAGACAGTACCCATTTTGATATTTGTGC

T7S6-ND5R  reverse primer to amplify ND5 for in vitro transcription: amplicon 3133..3775 of NORAD  ATTTAGGTGACACTATAGAAGGGAAGATGGGGTTTCACCATGTTGG

T7S6-3endF  forward primer to amplify 3p for in vitro transcription: amplicon 3951..5287 of NORAD  TAATACGACTCACTATAGGGAGAGTGCACAATGTAGGTTAACAGTA

T7S6-3endR  reverse primer to amplify 3p for in vitro transcription: amplicon 3951..5287 of NORAD  ATTTAGGTGACACTATAGAAGGGGGAAATTGAAAAACACAAGCAAA

5 Red and blue sequences represent the T7 and SP6 promoters, respectively.
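As a quick consistency check on the amplicon coordinates in Table 4.2, the fragment lengths and the overlaps between neighboring fragments can be computed as in the sketch below. It assumes the `start..end` coordinates are 1-based and inclusive, which the table does not state; the helper names are hypothetical.

```python
# Fragment coordinates copied from Table 4.2; assumed 1-based and inclusive.
FRAGMENTS = [
    ("5p", 1, 813),
    ("ND1", 704, 1322),
    ("ND2", 1290, 1914),
    ("ND3", 1882, 2569),
    ("ND4", 2494, 3156),
    ("ND5", 3133, 3775),
    ("3p", 3951, 5287),
]


def length(start, end):
    """Amplicon length for inclusive coordinates."""
    return end - start + 1


def overlap(prev, nxt):
    """Positive: nucleotides shared by two fragments; negative: size of the gap."""
    return prev[2] - nxt[1] + 1


lengths = {name: length(start, end) for name, start, end in FRAGMENTS}
overlaps = {
    f"{a[0]}/{b[0]}": overlap(a, b) for a, b in zip(FRAGMENTS, FRAGMENTS[1:])
}
for name, value in lengths.items():
    print(f"{name}: {value} nt")
for pair, value in overlaps.items():
    print(f"{pair}: {value:+d} nt")
```

Under this reading, adjacent fragments from 5p through ND5 overlap, while positions 3776 to 3950 fall between ND5 and 3p and are not covered by any listed amplicon.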

Immunoprecipitation and antibodies

For PUM immunoprecipitation, 1×10⁷ cells were resuspended in 1 mL Polysome Lysis

Buffer (PLB; 15 mM Tris-Cl pH 7.4, 300 mM NaCl, 15 mM MgCl2, 1% Triton X-100, 1

mM DTT, 100 U/ml SUPERase-IN, 1 mM PMSF, 1X Roche protease inhibitor cocktail)

and incubated on ice for 30 min. Lysates were pre-cleared with washed Protein G

magnetic beads (Novex) at 4°C for 30 minutes. 10 µg of PUM1 antibody (sc-135049,

Santa Cruz), PUM2 antibody (sc-31535, Santa Cruz), rabbit IgG (sc-2027, Santa Cruz),

or goat IgG (sc-2028, Santa Cruz) were incubated with 200 µL Protein G magnetic

beads in PBS with 0.02% Tween-20 for 30 min at room temperature and added to the

pre-cleared lysates, followed by rotation at 4°C for 4 hours and 3 washes in PLB on ice.

10% of beads were resuspended in Laemmli buffer for western blotting and RNA was

isolated from the remaining beads using Trizol. Antibodies used for western blotting

were PUM1 (ab92545, Abcam), PUM2 (ab92390, Abcam), α-Tubulin (T9026, Sigma),

and GAPDH (2118, Cell Signaling).

RNA-seq and analysis

RNA-seq libraries were prepared using the TruSeq Stranded Total RNA with Ribo-Zero

Human/Mouse/Rat Sample Preparation kit (Illumina) and sequenced using the 100 bp

paired-end protocol on an Illumina HiSeq 2000 in the McDermott Center Next

Generation Sequencing Core at UT Southwestern. For comparing NORAD+/+ and

NORAD−/− HCT116 cells, 3 biological replicates per genotype were sequenced with an

average paired-read depth of 52×10⁶. For PUM overexpression experiments, 3

replicates of GFP-expressing HCT116 cells (negative control) and 2 independent PUM1-

or PUM2-overexpressing clones (2 replicates each) were sequenced. An average of

27×10⁶ paired-reads were generated per sample. Quality assessment of the RNA-seq

data was performed with NGS-QC-Toolkit (Patel and Jain, 2012). Reads with mean

Phred quality scores of less than 20 were removed from further analysis. Filtered reads

were then aligned to the human reference genome (hg19) using TopHat2 (v2.0.10) (Kim

et al., 2013) with library type setting ‘fr-firststrand’ and other parameters set to default.

Differential gene expression analysis was performed using the R package edgeR

(v1.10.1) (Robinson et al., 2010) following a published protocol (Anders et al., 2013).

Gene ontology analysis was performed using DAVID (http://david.abcc.ncifcrf.gov)

(Huang et al., 2007).
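The read-level quality filter described above (discarding reads whose mean Phred score is below 20) can be illustrated with a short sketch. This is not the NGS-QC-Toolkit implementation; it assumes Phred+33 encoded FASTQ quality strings, and the function names are hypothetical.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def keep_read(quality, threshold=20.0, offset=33):
    """True if the read passes, i.e. its mean Phred score is not below the threshold."""
    return mean_phred(quality, offset) >= threshold


# Toy quality strings: 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2.
for quality in ("IIII", "5555", "####"):
    print(quality, mean_phred(quality), keep_read(quality))
```

A read with a mean score of exactly 20 is kept here, matching the wording "less than 20 were removed".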

Recombinant PUMILIO protein purification

Human PUM1 and PUM2 UltimateORF clones (Life Technologies) were subcloned into

destination vector pDEST17 (Life Technologies) using Gateway LR Clonase II Enzyme

mix (Life Technologies) for expression of 6×histidine-tagged recombinant proteins.

Plasmids were transformed into Rosetta 2(DE3)pLysS competent cells (Novagen) and

recombinant proteins were induced with 0.2 mM IPTG at 20°C. Bacteria were lysed in 8

M urea lysis buffer (100 mM NaH2PO4, 10 mM Tris-Cl, 8 M urea, pH 8.0) and bound

proteins were recovered on Ni-NTA agarose resin, washed with lysis buffer at pH 6.3,

and eluted with 250 mM imidazole at pH 4.5. The concentration of purified proteins was

determined by electrophoresis alongside a serial dilution of BSA standards (Pierce) with

Coomassie staining.

Time-lapse imaging

Mitotic cells were recorded and evaluated as described in Chapter 2. Briefly, cells were

grown on NUNC chambered coverglasses (Thermo). To visualize DNA in HCT116 cells,

a cell permeable Hoechst dye (33342; Invitrogen) was used at 25-50 ng/mL. Time-lapse

fluorescence images were collected every 5 minutes for 24-48 hours using a Leica

inverted microscope equipped with an environmental chamber that controls temperature

and CO2, a 63X oil-objective, an Evolve 512 Delta EMCCD camera, and Metamorph

software (MDS Analytical Technologies).

PAR-CLIP

PAR-CLIP was performed essentially as described in (Spitzer et al., 2014). Briefly,

HCT116 cells and isogenic NORAD−/− cells were grown to ~80% confluence at which

point 4-thiouridine (Sigma) was added to the media at a final concentration of 100 µM.

After 18 hours, 4-thiouridine-labeled cells were washed with cold PBS and crosslinked

using 365 nm UV with 150 mJ/cm² total energy in a Spectrolinker XL-1500 (Spectroline).

A total of ~720 million cells (thirty-six 150 mm dishes) per CLIP condition were collected and

resuspended in NP-40 lysis buffer (50 mM HEPES-KOH, pH 7.5, 150 mM KCl, 2 mM

EDTA-NaOH, pH 8.0, 1 mM NaF, 0.5% NP-40 substitute, 0.5 mM DTT, and Complete

EDTA-free protease inhibitor cocktail). After centrifugation, the soluble fraction was

filtered through a 5 µm syringe filter and incubated with 1 U/

min. 100 µg PUM2 antibody (K-14, sc-31535, Santa Cruz) was conjugated to Protein G

magnetic beads and incubated with RNase-treated lysate at 4ºC for 4 hours. Bead-

bound PUM2 RNP complexes were washed with IP wash buffer (50 mM HEPES-KOH,

pH 7.5, 300 mM KCl, 0.05% NP-40 substitute, 0.5 mM DTT and Complete EDTA-free

protease inhibitor cocktail) followed by an additional RNase T1 treatment (1 U/µL at

22°C for 15 min). Beads were further washed with high-salt wash buffer (50 mM

HEPES-KOH, pH 7.5, 500 mM KCl, 0.05% NP-40 substitute, 0.5 mM DTT and Complete

EDTA-free protease inhibitor cocktail) and the 5' ends of PUM2-bound RNAs were

labeled with 32P using calf intestinal alkaline phosphatase followed by T4 PNK and

[γ-32P]-ATP. 100 µM unlabeled ATP was added after radiolabeling to ensure all RNA

species were 5'-phosphorylated. Labeled RNP complexes were eluted from beads by

boiling in 1X SDS-PAGE loading buffer (62.5 mM Tris HCl pH 6.8, 1.5% SDS, 8.3%

Glycerol, 0.005% Bromophenol blue) and resolved on an SDS-PAGE gel. After

autoradiography, bands corresponding to the PUM2 RNP size (~120 kDa) were excised

and electro-eluted using D-tube dialyzer tubes (Millipore) in MOPS-SDS running buffer.

Eluted samples were then digested with 1.2 mg/mL Proteinase K (Sigma) at 55°C for 30

min. RNA was isolated using phenol/chloroform extraction followed by ethanol

precipitation. Sequencing libraries were constructed using the TruSeq Small RNA

Library Preparation Kit (Illumina). Sequencing was performed on a NextSeq 500

(Illumina).

Quality assessment of the CLIP-Seq data was done using NGS-QC-Toolkit (Patel and

Jain, 2012). Reads with mean Phred quality scores of less than 20 were removed from

further analysis. Cutadapt (v1.2.1) (Martin, 2011) was used to remove the sequencing

adapters using default settings and all reads 15 nt or longer were aligned to repeat-

masked NORAD and the human transcriptome (Ensembl GRCh37.75) in two steps:

First, all reads were aligned to NORAD using Bowtie (v1.0.0) (Langmead et al., 2009),

requiring unique mapping within NORAD and allowing up to 1 mismatch (-v 1 -m 1).

Then the rest of the reads were aligned to the transcriptome using Bowtie with the

settings (-a -m 1). CLIP crosslinking sites were identified as follows: 1) All transcriptome

coordinates were converted to genomic coordinates and all reads with unique genomic

location were kept; 2) PCR duplicates were removed; 3) Reads with at least 1 nt overlap

were clustered; 4) All clusters with at least 5 reads and at least 1 T to C mutation were

defined as CLIP clusters.
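Steps 2 to 4 of the cluster definition above can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not the code used for the analysis: reads are represented by a hypothetical `Read` record in genomic coordinates, PCR duplicates are assumed to be reads with identical records, and the T-to-C status of each read is assumed to be precomputed.

```python
from collections import namedtuple

# Hypothetical read record; coordinates are genomic, half-open [start, end).
Read = namedtuple("Read", "chrom strand start end has_tc")


def call_clusters(reads, min_reads=5, min_tc_reads=1):
    """Group overlapping reads and keep clusters passing both thresholds."""
    # Assumption: PCR duplicates are reads with identical records.
    unique = sorted(set(reads), key=lambda r: (r.chrom, r.strand, r.start, r.end))

    clusters = []
    current, current_end = [], None
    for read in unique:
        same_locus = (
            current
            and read.chrom == current[0].chrom
            and read.strand == current[0].strand
            and read.start < current_end  # at least 1 nt overlap
        )
        if same_locus:
            current.append(read)
            current_end = max(current_end, read.end)
        else:
            if current:
                clusters.append(current)
            current, current_end = [read], read.end
    if current:
        clusters.append(current)

    return [
        c for c in clusters
        if len(c) >= min_reads and sum(r.has_tc for r in c) >= min_tc_reads
    ]


# Toy input: five overlapping reads (one with a T-to-C mismatch), one exact
# duplicate, and a second locus with only four reads.
example_reads = (
    [Read("chr1", "+", 100 + i, 130 + i, i == 0) for i in range(5)]
    + [Read("chr1", "+", 100, 130, True)]
    + [Read("chr2", "-", 10 + i, 40 + i, True) for i in range(4)]
)
clusters = call_clusters(example_reads)
print(len(clusters), [len(c) for c in clusters])
```

In the toy input only the chr1 locus qualifies: the duplicate is collapsed, leaving exactly five reads, while the chr2 locus has four reads and is dropped.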

PUMILIO overexpression

Human PUM1 and PUM2 UltimateORF clones (Life Technologies), or eGFP as a

negative control, were subcloned into pLX302 (Addgene plasmid #25896) (Yang et al.,

2011) using Gateway LR Clonase II Enzyme mix (Invitrogen). The resulting lentiviral

backbones were packaged in HEK293T cells by co-transfection with psPAX2 and

pMD2.G (Addgene plasmids #12260 and #12259). Viral supernatants were passed

through a 0.45 micron filter and used to transduce HCT116 cells in the presence of 8

µg/mL polybrene (EMD Millipore). Beginning 48 hours after transduction, cells were

selected with 1 µg/mL puromycin for at least 7 days and single cell-derived clones were

screened for PUM expression by western blot.

Generation of PUM1 and PUM2 knockout cells

PUM1−/− and PUM2−/− cells were generated using the CRISPR/Cas9 system to introduce

frameshift mutations in exons upstream of the sequence encoding PUMILIO homology

domains (PUM-HD), which are essential for target binding. To generate PUM1 and

PUM2 individual knockouts, single guide RNAs (sgRNAs) targeting exon 7 for PUM1

and exon 8 for PUM2 were designed (Table 4.3) and cloned into pX459 (Addgene

plasmid, #48139) followed by transfection into HCT116 and puromycin selection. Single

cell clones were screened by western blotting using PUM antibodies (Abcam ab92545

for PUM1 and ab92390 for PUM2) and validated by sequencing of mutant alleles after

amplification and TA cloning of CRISPR/Cas9 target sites, using primer pairs provided in

Table 4.4. To generate PUM1−/−; PUM2−/− double knockout cells, pX458 (Addgene

plasmid, #48138) expressing the sgRNA used for single PUM1 knockout was

transfected into PUM2−/− cells followed by FACS sorting of GFP+ cells and single cell

cloning. Screening of double knockout cells was performed by western blotting and sequencing

as described above. Finally, to knock out NORAD in PUM1−/−, PUM2−/−, and PUM1−/−;

PUM2−/− cells, TALEN-mediated homologous recombination (HR) was performed using a

modified lox-STOP-lox cassette carrying a hygromycin resistance cassette instead of a

puromycin resistance cassette.

PUM1/PUM2 knockdown experiments

ON-TARGETplus siRNAs targeting human PUM1 (9696) and PUM2

(23369) were purchased from GE Dharmacon and tested to identify those that yielded

the most efficient knockdown. Two siRNAs for PUM1 and two siRNAs for PUM2 were

selected (target sequences provided in Table 4.5). HCT116 cells were transfected once

per day for 3 consecutive days. 5 days after the first transfection, cells were plated on

chambered coverglasses and mitoses were recorded by time-lapse imaging as

described above.

qPCR validation of PUM target genes repressed in NORAD−/− cells

Primers are provided in Table 4.6.

Table 4.3 Oligos for cloning sgRNA into CRISPR/Cas9 plasmids

Primer name  Description  Sequence 5' to 3' (footnote 6)

PUM1 sgRNA fwd  single guide RNA sequence insert for CRISPR/Cas9 targeting of PUM1  CACCGCAGCAAGCGCATTAGGTCTT

PUM1 sgRNA rev  single guide RNA sequence insert for CRISPR/Cas9 targeting of PUM1  AAACAAGACCTAATGCGCTTGCTGC

PUM2 sgRNA fwd  single guide RNA sequence insert for CRISPR/Cas9 targeting of PUM2  CACCGGCGTCCTCTTACTCCCAATC

PUM2 sgRNA rev  single guide RNA sequence insert for CRISPR/Cas9 targeting of PUM2  AAACGATTGGGAGTAAGAGGACGCC

6 Red sequences are 5' overhangs for cloning into CRISPR/Cas9 plasmids (pX458 and pX459).
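The oligo pairs in Table 4.3 follow a pattern that can be reproduced with the sketch below: a `CACC` overhang plus a `G` on the forward oligo, and an `AAAC` overhang, the reverse complement of the guide, and a closing `C` on the reverse oligo. The 20-nt guide sequences passed in are inferred by stripping those flanks from the table entries, which is an assumption, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def sgrna_oligos(guide, prepend_g=True):
    """Return (forward, reverse) oligos for annealing and ligation.

    prepend_g=True adds a G after the CACC overhang (and a matching C at the
    3' end of the reverse oligo), the pattern seen in Table 4.3.
    """
    guide = guide.upper()
    if prepend_g:
        return "CACCG" + guide, "AAAC" + revcomp(guide) + "C"
    return "CACC" + guide, "AAAC" + revcomp(guide)


# Guides inferred from Table 4.3 (assumption: 20 nt following the CACCG prefix).
pum1_fwd, pum1_rev = sgrna_oligos("CAGCAAGCGCATTAGGTCTT")
pum2_fwd, pum2_rev = sgrna_oligos("GCGTCCTCTTACTCCCAATC")
print(pum1_fwd, pum1_rev)
print(pum2_fwd, pum2_rev)
```

Regenerating each reverse oligo from its forward partner in this way is a convenient check against transcription errors in an oligo table.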

Table 4.4 TA cloning of PUM CRISPR/Cas9 targeted alleles

Sequence name  Description  Sequence 5' to 3'

PUM1 TA fwd  Amplicon of PUM1 CRISPR/Cas9 target site for TA cloning  TCCCATGGGAATGAAGTAGAGTGT

PUM1 TA rev  Amplicon of PUM1 CRISPR/Cas9 target site for TA cloning  AACTGGACAAAAGGAAGAGGCC

PUM2 TA fwd  Amplicon of PUM2 CRISPR/Cas9 target site for TA cloning  AAAAATATCCAAAGGCTGTTTGTAA

PUM2 TA rev  Amplicon of PUM2 CRISPR/Cas9 target site for TA cloning  TAGGCAAGATTTTAAATACAGTTTGATT

Table 4.5 siRNA target sequence of PUM

Sequence name  Description  Target sequence 5' to 3'

siNon-Target  Negative control siRNA from Dharmacon  GCGCGATAGCGCGAATATA

siPUM1-1 (Set1)  siRNA sequence targeting 696..714 of PUM1 (NM_001020658)  GGTCAGAGTTTCCATGTGA

siPUM1-2 (Set2)  siRNA sequence targeting 3528..3546 of PUM1 (NM_001020658)  CGGAAGATCGTCATGCATA

siPUM2-1 (Set1)  siRNA sequence targeting 714..732 of PUM2 (NM_001282752.1)  CTGAAGTAGTTGAGCGCTT

siPUM2-2 (Set2)  siRNA sequence targeting 4965..4983 of PUM2 (NM_001282752.1)  AGACATAACAGTAACACGA

Table 4.6 qPCR primers

Primer name  Description  Sequence 5' to 3'

NEAT1 fwd forward qPCR primer for NEAT1 AGGCAGGGAGAGGTAGAAGG

NEAT1 rev reverse qPCR primer for NEAT1 TGGCATGGACAAGTTGAAGA

PUM1 fwd forward qPCR primer for PUM1 CCGGGCGATTCCTGTCTAA

PUM1 rev reverse qPCR primer for PUM1 CCTTTGTCGTTTTCATCACTGTCT

PUM2 fwd forward qPCR primer for PUM2 GGGAGCTTCTCACCATTCAATG

PUM2 rev reverse qPCR primer for PUM2 CCATGAAAACCCTGTCCAGATC

SMC3 F forward qPCR primer for SMC3 AGGATTTGGAAGACACTGAAGC

SMC3 R reverse qPCR primer for SMC3 TCATTAAGATCCTGGTCCAGTTTA

SMC1A F forward qPCR primer for SMC1A CGGTGATCTGTGTGAGGATCT

SMC1A R reverse qPCR primer for SMC1A TTCTGCTGCAGTGTGTTCATC

HMGB1 F forward qPCR primer for HMGB1 CATTGAGCTCCATAGAGACAGC

HMGB1 R reverse qPCR primer for HMGB1 GGATCTCCTTTGCCCATGT

LIN9 F forward qPCR primer for LIN9 CAAAGTTTTGCATAAAGTTCAACAGT

LIN9 R reverse qPCR primer for LIN9 CGTCTCATATCTGTTGGCTGAT

MCM8 F forward qPCR primer for MCM8 CCAGGCCTAGGAAAAAGTCA

MCM8 R reverse qPCR primer for MCM8 GAGGTGGTCGTGGTGTTACC

EXO1 F forward qPCR primer for EXO1 CTTTCTCAGTGCTCTAGTAAGGACTCT

EXO1 R reverse qPCR primer for EXO1 TGGAGGTCTGGTCACTTTGA

MCM4 F forward qPCR primer for MCM4 TGTTTGCTCACAATGATCTCG

MCM4 R reverse qPCR primer for MCM4 CGAATAGGCACAGCTCGATA

DNMT1 F forward qPCR primer for DNMT1 GAGGCCTTCACGTTCAACA

DNMT1 R reverse qPCR primer for DNMT1 CTGGGTACAGGTCCTCATCC

SLBP F forward qPCR primer for SLBP CCCTAAACCCCGTTCCAG

SLBP R reverse qPCR primer for SLBP TCATTGATGAGGAGTTTCCTTTT

ESCO2 F forward qPCR primer for ESCO2 AACCCTGAAGATGAAATGCAG

ESCO2 R reverse qPCR primer for ESCO2 CCCATCCCAAAACTCTGCTA

PARP1 F forward qPCR primer for PARP1 TCTTTGATGTGGAAAGTATGAAGAA

PARP1 R reverse qPCR primer for PARP1 GGCATCTTCTGAAGGTCGAT

PARP2 F forward qPCR primer for PARP2 ACCAAGAAAGCCCCACTTG

PARP2 R reverse qPCR primer for PARP2 AGCCCGAATACAATCCTCAA

BARD1 F forward qPCR primer for BARD1 CATTCTGAGAGAGCCTGTGTGTT

BARD1 R reverse qPCR primer for BARD1 TCCAATGCAGTCACTTACACAAT

CENPJ F forward qPCR primer for CENPJ AAAGAAGAAAACCGTAACCATCC

CENPJ R reverse qPCR primer for CENPJ GTTCTGTCACTTTCTCCCAACA

LMNB2 F forward qPCR primer for LMNB2 GGCTCCTGCTCAAGATCTCA

LMNB2 R reverse qPCR primer for LMNB2 GACTCGTACAGCGCCTTGAT

MASTL F forward qPCR primer for MASTL CAGTCCCAAATGGGAAAAAG

MASTL R reverse qPCR primer for MASTL CAACTGCATTCCAACTCATCA

RBBP8 F forward qPCR primer for RBBP8 CTTGGGCACACGTGTAAGG

RBBP8 R reverse qPCR primer for RBBP8 AATGTAGCGGAATCGGTGTC

WDHD1 F forward qPCR primer for WDHD1 ACATCCTAGAAGATGATGAAAACTCA

WDHD1 R reverse qPCR primer for WDHD1 TTGTGAATGCTGCCTTCTTG

PRC1 F forward qPCR primer for PRC1 TTTACAAACCGAGGAGGAAATC

PRC1 R reverse qPCR primer for PRC1 TCGTGCCTTCAACTCTTCTTC

NET1 F forward qPCR primer for NET1 AGAATCGAAGCGAGCAAAGT

NET1 R reverse qPCR primer for NET1 CCAAGATGTCTTGAAACAGGAA

RBMX F forward qPCR primer for RBMX CAGTTCGCAGTAGCAGTGGA

RBMX R reverse qPCR primer for RBMX TCGAGGTGGACCTCCATAAC

Chapter 5: Generation of Norad knockout mouse using

CRISPR/Cas9 genome editing system

Introduction

At least three lines of evidence support the hypothesis that an annotated lncRNA,

2900097C17Rik, is the functional ortholog of human NORAD. First, as in the case of

NORAD, many paralogs of 2900097C17Rik can be found in the mouse genome, but only

2900097C17Rik shows conserved synteny with NORAD (Figures 3.1 and 4.5). Second, a

previous study of mouse Pumilio targets identified 2900097C17Rik as a Pum-

interacting transcript (Chen et al., 2012). Third, as expected from its sequence similarity to

the human ortholog, 2900097C17Rik harbors 15 PREs as potential binding sites for

Pumilio proteins. For these reasons, we hypothesized that the annotated transcript

2900097C17Rik is a functional mouse ortholog of NORAD; we therefore named it

Norad and decided to generate mice with genetic ablation of this allele.

Results

Flanking CRISPR/Cas9 for Norad deletion allele

To generate a whole-gene deletion allele of Norad, two single guide RNAs (sgRNAs) flanking

Norad were designed (Figure 5.1A). Each sgRNA targets a site within 1 kb of either

end of the allele (Figure 5.1B), and a successful non-homologous end joining (NHEJ)

product will generate a 6.7 kb deletion allele. To test whether the designed gRNAs induce

genome editing events in mouse embryonic stem cells, the gRNAs were cloned into a Cas9

expression vector (pX330) and transfected into E14TG2a cells using a highly efficient

transfection reagent, Xfect (Clontech), which shows >70% transfection efficiency with a

GFP control (Figure 5.2A, B). Using this transfection condition, cells were transfected

with CRISPR/Cas9, followed by genomic DNA isolation and T7 Endonuclease I mismatch

cleavage assay (Mashal et al., 1995). Cleavage products were detected (Figure 5.2C) at

the sizes expected from Figure 5.1B.

Figure 5.1 Two flanking gRNAs were designed to generate a Norad deletion allele

(A) Schematic representation of the deletion strategy in the mouse genome. Two flanking gRNAs are simultaneously injected into mouse zygotes to induce non-homologous end joining, which excises the Norad allele from the genome.

(B) Genomic locations of two gRNAs and primers used for assessing CRISPR/Cas9 activity are indicated.

Figure 5.2 Assessment of CRISPR/Cas9 activity in mouse ES cells

(A) Fluorescence microscope image of mouse ES cells transfected with GFP plasmid. (B) Flow cytometry of GFP-transfected cells shows that more than 72% of cells are transfected and express GFP.

(C) T7E1 cleavage assay to assess the presence of mutations at targeted genomic loci after CRISPR/Cas9 expression. Cleavage products from T7 Endonuclease I are shown at the sizes predicted in Figure 5.1B.

In vitro transcription and RNA modification for zygotic injection

Genetically engineered mice provide useful information about particular gene functions

at the organismal level (Capecchi, 2005). However, traditional methods of generating

knockout mice take a long time (>1 year) and considerable effort, and are sometimes impossible

when embryonic stem cells (ES cells) are not available for the animal of interest. Recently, the Rudolf

Jaenisch group demonstrated high-efficiency, multiplexed genome modification

by co-injection of single guide RNAs (sgRNAs) and Cas9 mRNA directly into one-cell

mouse embryos (Yang et al., 2013; Wang et al., 2013), and this new application of CRISPR/Cas9

technology created a major breakthrough in animal studies (Hsu et al., 2014). To

apply the same methodology to generate Norad knockout mice, we synthesized sgRNAs

from the pX330 constructs tested in Figure 5.2. Along with the sgRNAs, Cas9 mRNA was

synthesized from the same plasmids, followed by 5' capping and polyadenylation for

efficient translation inside the mouse zygote (Weill et al., 2012). The size and quantity of

the prepared RNAs were verified on denaturing gels (Figure 5.3), and the RNAs were

submitted to the Transgenic core for one-cell mouse embryo injection.

Figure 5.3 Injectable RNAs for one-cell mouse embryo injection

sgRNAs targeting the intergenic sequences flanking Norad and poly-adenylated Cas9

mRNA were synthesized in vitro using T7 polymerase. After RNA purification, the size of

each RNA component was validated on a denaturing urea gel (A) or an agarose gel (B).

sgRNAs are ~100 nt and capped Cas9 mRNA is 4.3 kb. Poly-A tailed mRNA is shown

at around 6 kb.

Discussion

We initially designed conditional alleles, which can be generated by co-injecting single-

stranded DNA targeting constructs containing loxP sites flanked by homology arms on

either side that serve as donor templates for homologous recombination

(Yang et al., 2013). However, this strategy requires very high targeting efficiency,

since all 4 targeting sites need to be recombined simultaneously at the one-cell stage.

Through screening of mice derived from injected embryos, we found some founders carrying

both loxP insertions on one chromosome copy, but none in which both alleles carried the

insertions (data not shown). Instead, we found deletion events as described in

Figure 5.1A at some frequency and could cross these founders to generate knockout

mice. It will be of great interest to determine whether these mice also show chromosomal

instability and, if so, what the physiologic consequences are.

Injection of RNA into one-cell embryos for genome editing is an innovative

method that is now being applied to other animals, including monkeys (Niu et

al., 2014). However, a major caveat of zygotic RNA injection is that

inefficient translation of Cas9 mRNA can lead to genetic mosaicism,

as we observed through genotyping of our mice (data not shown). As suggested

elsewhere (Hsu et al., 2014), injecting Cas9 protein pre-loaded with

sgRNA, which can presumably act more efficiently at the true one-cell stage, might improve

this gene targeting technology.

Materials and Methods

CRISPR/Cas9 sgRNA design and cloning into expression vector

CRISPR/Cas9 target sites were selected using the web-based design tool provided by the

Feng Zhang laboratory (http://crispr.mit.edu/). Genomic DNA sequences flanking the Norad

allele were searched for target sequences with a high “quality score” and minimal off-

target sites. Six sgRNAs targeting the left side (5' flanking) and two sgRNAs targeting the right

side (3' flanking) were cloned into the BbsI site of pX330, a bicistronic expression

vector encoding Cas9 and sgRNA (Cong et al., 2013), using the oligos provided in

Table 5.1, and the best-performing sgRNAs were identified by T7 Endonuclease I

cleavage assay (Mashal et al., 1995).

Mouse ES cell culture and transfection

E14TG2a embryonic stem cells were cultured in GMEM with 1% nonessential amino

acids, β-mercaptoethanol, and leukemia inhibitory factor (LIF). Transfection was

performed with Xfect (Clontech) according to the manufacturer’s instructions. Briefly,

500,000 ES cells were plated on a 6-well plate coated with 0.2% gelatin 5 hours before

transfection. 5 µg DNA was mixed with 2.5 µL Xfect polymer in 200 µL reaction buffer.

After 10 minutes of incubation, these nanoparticle solutions were added to the ES cells.

Table 5.1 Oligos used for CRISPR/Cas9 plasmid construction

Primer name  Description  Sequence 5' to 3' (footnote 7)

Norad R1 fwd  single guide RNA sequence insert for CRISPR/Cas9 targeting of Norad R1  CACCTGGCCTGGGTTAGATGTACC

Norad R1 rev  single guide RNA sequence insert for CRISPR/Cas9 targeting of Norad R1  AAACGGTACATCTAACCCAGGCCA

Norad L3 fwd  single guide RNA sequence insert for CRISPR/Cas9 targeting of Norad L3  CACCGGCAACACTATCCTTGGGCC

Norad L3 rev  single guide RNA sequence insert for CRISPR/Cas9 targeting of Norad L3  AAACGGCCCAAGGATAGTGTTGCC

7 Red sequences are 5' overhangs for cloning into CRISPR/Cas9 plasmids (pX330).

Genomic DNA isolation and T7 Endonuclease I cleavage assay

Two days after transfection, cells were harvested for genomic DNA isolation using the

DNeasy Blood and Tissue Kit (Qiagen) according to the manufacturer’s instructions. This

gDNA was used as the template for PCR amplification of either side of the Norad allele using

the primers provided in Table 5.2. PCR products were purified with the QIAquick PCR

Purification Kit (Qiagen) and quantified with a NanoDrop 2000 (Thermo). 200 ng DNA was

suspended in 1x NEB2 buffer (New England Biolabs), denatured at 95°C for 5

minutes, cooled from 95°C to 85°C at a ramp rate of -2°C/sec, and then slowly

annealed from 85°C to 25°C at a ramp rate of -0.1°C/sec, allowing formation of

heteroduplexes between wild-type and mutant DNA. 10 units of T7 Endonuclease I (NEB) were

added and incubated at 37°C for 15 minutes. The reaction was stopped by adding 2 µL of

0.25 M EDTA and the products were run on EtBr-stained agarose gels.
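The duration of the denaturation and re-annealing program follows directly from the stated temperatures and ramp rates, as in this short sketch. It assumes the ramps are linear and ignores any instrument lag; the function and variable names are hypothetical.

```python
def ramp_seconds(start_c, end_c, rate_c_per_s):
    """Time needed to move between two temperatures at a constant ramp rate."""
    return abs(end_c - start_c) / abs(rate_c_per_s)


# Steps as described in the text (temperatures in degrees C, rates in degrees C/sec).
denature_s = 5 * 60
fast_ramp_s = ramp_seconds(95, 85, -2.0)
slow_ramp_s = ramp_seconds(85, 25, -0.1)
total_min = (denature_s + fast_ramp_s + slow_ramp_s) / 60
print(
    f"fast ramp: {fast_ramp_s:.0f} s, "
    f"slow ramp: {slow_ramp_s:.0f} s, "
    f"total: {total_min:.1f} min"
)
```

Under these assumptions the slow annealing step accounts for roughly two thirds of the program, about 10 of the 15 minutes.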

Table 5.2 Primers used for T7 Endonuclease I cleavage assay

Primer name  Description  Sequence 5' to 3'

Norad L fwd  Left arm PCR amplicon for T7EI assay  GCATTGTACTTTGGAACCATAA

Norad L rev  Left arm PCR amplicon for T7EI assay  AGAGTGTGTGTAAAGAGCCT

Norad R fwd  Right arm PCR amplicon for T7EI assay  ACTTTGTTCTTGCTTTCTTGTTT

Norad R rev  Right arm PCR amplicon for T7EI assay  CCTGCGCCACCCAGAGAAGC

In vitro transcription and RNA purification for one-cell embryo injection

DNA templates for sgRNA and Cas9 mRNA transcription were PCR-amplified from pX330

vectors carrying the guide RNA inserts using the primers provided in Table 5.3. DNA products at

the expected sizes were gel extracted and purified using the QIAquick Gel Extraction Kit

(Qiagen) and quantified by NanoDrop. Cas9 mRNA was transcribed from 200 ng of template using the

mMESSAGE mMACHINE T7 Ultra Kit (Ambion) according to the manufacturer’s instructions,

followed by DNase I digestion for 15 min. After a further poly-adenylation reaction using

the same kit, the injectable form of the mRNA was purified using the MEGAclear kit (Ambion). For

in vitro transcription of sgRNA, 500 ng of the purified DNA templates above were transcribed

using the MEGAshortscript kit (Ambion) according to the manufacturer’s instructions. DNase I-

treated RNAs were further purified using the MEGAclear kit (Ambion). All RNAs were

provided to the Transgenic core at UT Southwestern Medical Center and injected into

one-cell embryos by Mylinh Nguyen.

Table 5.3 Primers used for in vitro transcription of sgRNA and Cas9 mRNA

Primer name  Description  Sequence 5' to 3' (footnote 8)

Norad R1 IVT  Norad R1 sgRNA forward primer with T7 promoter  ttaatacgactcactatagGTGGCCTGGGTTAGATGTACC

Norad L3 IVT  Norad L3 sgRNA forward primer with T7 promoter  ttaatacgactcactatagGGCAACACTATCCTTGGGCC

CommonR  Common reverse primer for all sgRNAs  AAAAGCACCGACTCGGTGCC

Cas9 IVT fwd  Cas9 forward primer with T7 promoter  taatacgactcactatagGGAGAATGGACTATAAGGACCACGAC

Cas9 R  Reverse primer for Cas9  GCGAGCTCTAGGAATTCTTAC

8 Red sequences indicate T7 promoter

Chapter 6: Future directions

Transcriptome of primary miRNA in mammalian cells

MicroRNA (miRNA) expression is dynamically regulated during development, across

tissues, and in various human diseases. While a subset of miRNAs are hosted in

protein-coding genes, the majority of pri-miRNAs are transcribed as poorly-characterized

noncoding transcripts. Due to the efficiency of DROSHA processing, the abundance of

pri-miRNAs is very low at steady-state. Therefore, elucidation of pri-miRNA structure

has remained a significant challenge. To address this problem, we developed an

experimental and computational approach that allows rapid transcriptome-wide mapping

of pri-miRNA structures. By performing deep RNA-seq in cells expressing a dominant-

negative DROSHA mutant protein, we demonstrated dramatic enrichment of intact pri-

miRNAs, resulting in greater coverage of these transcripts compared to standard RNA-

seq.

Although we used the best tools and materials currently available, there are several reasons why we may still have missed important pieces of the puzzle, leaving our transcriptomic map incomplete or, in some cases, even distorted. For example, our lab reported that miRNA expression changes globally with cell density (Hwang et al., 2009), and the same may be true in three-dimensional cultures, not to mention in settings where cells interact with other cell types (e.g., immune cells or stem cells in their niche). Two-dimensional cell culture was the best option available to us because it readily yields large amounts of material under conditions in which DROSHA activity is suppressed. If technical advances allow, transcriptome studies in more physiologically relevant cultures or tissues might provide more accurate data. Another point worth reconsidering is the use of the DROSHA mutant. Clearly, introducing this Microprocessor inhibitor enhanced our mapping coverage. However, this approach assumes that processing of primary miRNA transcripts is largely DROSHA-dependent. miRNAs that are cropped from their primary transcripts more efficiently, and independently of DROSHA activity, could therefore easily have been missed in our mapping effort. Extending this criticism, one could also imagine that some noncoding RNAs are actively transcribed but have eluded discovery because of their peculiar biogenesis and the lack of sequencing methods capable of capturing such RNA species.

Rush for more functional long noncoding RNAs

The fidelity of chromosome segregation must be maintained at a high level to ensure the

accurate transmission of genetic information as well as to avoid severe pathologic

consequences. Chromosomal instability (CIN), a phenotype characterized by the

frequent gain or loss of chromosomes during mitosis, is a hallmark of cancer cells and is

a key mechanism that contributes to gain- and loss-of-function of oncogenes and tumor

suppressors. Long noncoding RNAs (lncRNAs) have emerged as regulators of diverse

biological processes, yet their roles in the maintenance of genomic stability remain

poorly understood. In a screen for human lncRNAs that are regulated by DNA damage,

we identified a poorly characterized noncoding transcript that we termed Noncoding

RNA Activated by DNA Damage (NORAD) that is essential for the maintenance of

genomic stability in human cells.

In Chapter 3, we showed that NORAD is a broadly expressed, highly abundant, and conserved mammalian lncRNA. Inactivation of NORAD in human cells triggers dramatic aneuploidy. Furthermore, in Chapter 4, we demonstrated that NORAD functions as a potent molecular decoy for PUMILIO proteins, which repress a program of genes necessary to maintain genomic stability (Figure 6.1). This functional and mechanistic study would not have been possible without the serendipitous discovery of the phenotype of NORAD−/− cells.

Figure 6.1 Graphical summary of NORAD function

NORAD is a highly conserved and abundant long noncoding RNA that is broadly expressed in mammalian tissues. NORAD functions as a potent molecular decoy for PUMILIO proteins, which normally bind to, and trigger decay of, messenger RNAs. In the absence of NORAD, PUMILIO hyperactivity results in repression of a large program of genes that are essential for normal mitosis, DNA repair, and DNA replication. This causes dramatic aneuploidy in previously karyotypically normal human cells.

It is becoming something of a cliché that transcription of the genome is pervasive and that transcripts from previously overlooked “junk DNA” might encode functional noncoding RNAs (Guttman and Rinn, 2012). The possible functionality of these unexplored and currently uncharacterized transcripts is supported by multiple lines of evidence, including their regulated patterns of expression (Cawley et al., 2004). While a growing body of literature is beginning to elucidate some of their functions, the number of lncRNAs that remain unstudied seems overwhelming. Two major impediments slow the systematic discovery of biologically and physiologically relevant noncoding RNAs. First, most noncoding RNAs are likely to be resistant to the mutagens used in traditional genetic screens, because out-of-frame or nonsense mutations may be meaningless in a transcript that does not encode a protein. Second, it is difficult to choose a screening readout, that is, to decide which phenotype to examine after applying a genetic perturbation (Willingham et al., 2005). Given the diversity of possible mechanisms and the lack of predictive tools, screening lncRNAs by their association with particular biological responses is currently among the few options for an initial approach (Guttman et al., 2009), and it has been demonstrated in multiple studies (Huarte et al., 2010; Hung et al., 2011). As data accumulate from such “guilt-by-association” studies, more general themes will emerge and accelerate further discovery. In the meantime, these investigations should be accompanied by careful examination of causality.

Era of redefining regulatory RNAs

Since the 1950s, after molecular biologists discovered messenger RNAs, the broad assumption that most genetic information is enacted by proteins may have led us far astray in our understanding of genetic programs in multicellular life forms (Morris and Mattick, 2014). Beyond their fundamental and constitutive roles in translation and transcription, noncoding RNAs are emerging as major players that act as epigenetic regulators, chromosomal organizers, transcriptional controllers, and fine-tuning titrators of gene expression. For example, at least 20% of human genes are under miRNA regulation (Xie et al., 2005), and the literature describing novel regulatory functions of lncRNAs is ever expanding.

Because their mechanistic designs differ immensely from those of proteins, regulatory noncoding RNAs also need to be studied in different ways. For instance, many in vivo studies have shown that deletion of individual miRNAs often leads to subtle or no phenotypic consequences (Mendell and Olson, 2012; Vidigal and Ventura, 2015), and there are only a handful of functional demonstrations of lncRNAs using transgenic or knockout animals (Li and Chang, 2014). One explanation could be a high level of redundancy among noncoding RNAs, possibly due to the existence of functional homologs without substantial sequence homology. In contrast to proteins, there are few studies of the domain structure of functional noncoding RNAs or of their general catalytic mechanisms. Protein folding and enzymology have a long history of research, whereas the study of RNA secondary structure is still in its infancy.

Another emerging consensus in this field is that miRNAs buffer gene expression against internal and external perturbations and cellular stresses (Mendell and Olson, 2012; Vidigal and Ventura, 2015), and this concept might be extended to lncRNAs. Multicellular organisms that encounter diverse external influences and challenges may have evolved complicated networks of fine-tuning regulatory nodes, as opposed to simple binary switches, such that overt changes in phenotype become apparent only when a challenge exceeds a threshold. This hypothesis leaves biologists with a lot of homework, and yet again we may all have to admit that we do not know much more than what we know now, and that we do not even realize what we do not know.

Appendix

Chapter 2 was published in Genome Research in 2015 (Chang et al., 2015).

Chang, T.C., Pertea, M., Lee, S., Salzberg, S.L., and Mendell, J.T. (2015). Genome-wide annotation of microRNA primary transcript structures reveals novel regulatory mechanisms. Genome Res 25, 1401-1409.

Chapters 3 and 4 are in press and will be published in Cell in 2016.

Lee, S., Kopp, F., Chang, T.C., Sataluri, A., Chen, B., Sivakumar, S., Yu, H., Xie, Y., and Mendell, J.T. (2016). Noncoding RNA NORAD regulates genomic stability by sequestering PUMILIO proteins. Cell 164, 1-12. In press.

References

Adamson, B., Smogorzewska, A., Sigoillot, F.D., King, R.W., and Elledge, S.J. (2012). A genome-wide homologous recombination screen identifies the RNA-binding protein RBMX as a component of the DNA-damage response. Nat. Cell Biol. 14, 318-328.

Albertson, D.G., Collins, C., McCormick, F., and Gray, J.W. (2003). Chromosome aberrations in solid tumors. Nat. Genet. 34, 369-376.

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990). Basic local alignment search tool. J Mol Biol 215, 403-410.

Anders, S., McCarthy, D.J., Chen, Y., Okoniewski, M., Smyth, G.K., Huber, W., and Robinson, M.D. (2013). Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat Protoc 8, 1765-1786.

Anderson, D.M., Anderson, K.M., Chang, C.L., Makarewich, C.A., Nelson, B.R., McAnally, J.R., Kasaragod, P., Shelton, J.M., Liou, J., Bassel-Duby, R., et al. (2015). A micropeptide encoded by a putative long noncoding RNA regulates muscle performance. Cell 160, 595-606.

Avery, O.T., Macleod, C.M., and McCarty, M. (1944). Studies on the Chemical Nature of the Substance Inducing Transformation of Pneumococcal Types: Induction of Transformation by a Desoxyribonucleic Acid Fraction Isolated from Pneumococcus Type III. J Exp Med 79, 137-158.

Barber, T.D., McManus, K., Yuen, K.W., Reis, M., Parmigiani, G., Shen, D., Barrett, I., Nouhi, Y., Spencer, F., Markowitz, S., et al. (2008). Chromatid cohesion defects may underlie chromosome instability in human colorectal cancers. Proc. Natl. Acad. Sci. U. S. A. 105, 3443-3448.

Bartel, D.P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-297.

Bartel, D.P. (2009). MicroRNAs: target recognition and regulatory functions. Cell 136, 215-233.

Baskerville, S., and Bartel, D.P. (2005). Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA 11, 241-247.

Bazzini, A.A., Johnstone, T.G., Christiano, R., Mackowiak, S.D., Obermayer, B., Fleming, E.S., Vejnar, C.E., Lee, M.T., Rajewsky, N., Walther, T.C., et al. (2014). Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation. EMBO J. 33, 981-993.

Berget, S.M., Moore, C., and Sharp, P.A. (1977). Spliced segments at the 5' terminus of adenovirus 2 late mRNA. Proc Natl Acad Sci U S A 74, 3171-3175.

Bertone, P., Stolc, V., Royce, T.E., Rozowsky, J.S., Urban, A.E., Zhu, X., Rinn, J.L., Tongprasit, W., Samanta, M., Weissman, S., et al. (2004). Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242-2246.

Blankenberg, D., Von Kuster, G., Coraor, N., Ananda, G., Lazarus, R., Mangan, M., Nekrutenko, A., and Taylor, J. (2010). Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol Chapter 19, Unit 19 10 11-21.

Bohnsack, M.T., Czaplinski, K., and Gorlich, D. (2004). Exportin 5 is a RanGTP-dependent dsRNA-binding protein that mediates nuclear export of pre-miRNAs. RNA 10, 185-191.

Bonasio, R., and Shiekhattar, R. (2014). Regulation of transcription by long noncoding RNAs. Annu Rev Genet 48, 433-455.

Bunz, F., Dutriaux, A., Lengauer, C., Waldman, T., Zhou, S., Brown, J.P., Sedivy, J.M., Kinzler, K.W., and Vogelstein, B. (1998). Requirement for p53 and p21 to sustain G2 arrest after DNA damage. Science 282, 1497-1501.

Burgess, A., Vigneron, S., Brioudes, E., Labbe, J.C., Lorca, T., and Castro, A. (2010). Loss of human Greatwall results in G2 arrest and multiple mitotic defects due to deregulation of the cyclin B-Cdc2/PP2A balance. Proc. Natl. Acad. Sci. U. S. A. 107, 12564-12569.

Burrell, R.A., McClelland, S.E., Endesfelder, D., Groth, P., Weller, M.C., Shaikh, N., Domingo, E., Kanu, N., Dewhurst, S.M., Gronroos, E., et al. (2013). Replication stress links structural and numerical cancer chromosomal instability. Nature 494, 492-496.

Busch, H., Reddy, R., Rothblum, L., and Choi, Y.C. (1982). SnRNAs, SnRNPs, and RNA processing. Annu Rev Biochem 51, 617-654.

Cabianca, D.S., Casa, V., Bodega, B., Xynos, A., Ginelli, E., Tanaka, Y., and Gabellini, D. (2012). A long ncRNA links copy number variation to a polycomb/trithorax epigenetic switch in FSHD muscular dystrophy. Cell 149, 819-831.

Cabili, M.N., Trapnell, C., Goff, L., Koziol, M., Tazon-Vega, B., Regev, A., and Rinn, J.L. (2011). Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 25, 1915-1927.

Cai, X., Hagedorn, C.H., and Cullen, B.R. (2004). Human microRNAs are processed from capped, polyadenylated transcripts that can also function as mRNAs. RNA 10, 1957-1966.

Calin, G.A., and Croce, C.M. (2006). MicroRNA signatures in human cancers. Nat Rev Cancer 6, 857-866.

Calin, G.A., Dumitru, C.D., Shimizu, M., Bichi, R., Zupo, S., Noch, E., Aldler, H., Rattan, S., Keating, M., Rai, K., et al. (2002). Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci U S A 99, 15524-15529.

Calin, G.A., Ferracin, M., Cimmino, A., Di Leva, G., Shimizu, M., Wojcik, S.E., Iorio, M.V., Visone, R., Sever, N.I., Fabbri, M., et al. (2005). A MicroRNA signature associated with prognosis and progression in chronic lymphocytic leukemia. N Engl J Med 353, 1793-1801.

Capecchi, M.R. (2005). Gene targeting in mice: functional analysis of the mammalian genome for the twenty-first century. Nat Rev Genet 6, 507-512.

Carninci, P., Kasukawa, T., Katayama, S., Gough, J., Frith, M.C., Maeda, N., Oyama, R., Ravasi, T., Lenhard, B., Wells, C., et al. (2005). The transcriptional landscape of the mammalian genome. Science 309, 1559-1563.

Carter, S.L., Eklund, A.C., Kohane, I.S., Harris, L.N., and Szallasi, Z. (2006). A signature of chromosomal instability inferred from gene expression profiles predicts clinical outcome in multiple human cancers. Nat. Genet. 38, 1043-1048.

Cawley, S., Bekiranov, S., Ng, H.H., Kapranov, P., Sekinger, E.A., Kampa, D., Piccolboni, A., Sementchenko, V., Cheng, J., Williams, A.J., et al. (2004). Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116, 499-509.

Cech, T.R., and Steitz, J.A. (2014). The noncoding RNA revolution-trashing old rules to forge new ones. Cell 157, 77-94.

Chang, T.C., Pertea, M., Lee, S., Salzberg, S.L., and Mendell, J.T. (2015). Genome-wide annotation of microRNA primary transcript structures reveals novel regulatory mechanisms. Genome Res 25, 1401-1409.

Chang, T.C., Wentzel, E.A., Kent, O.A., Ramachandran, K., Mullendore, M., Lee, K.H., Feldmann, G., Yamakuchi, M., Ferlito, M., Lowenstein, C.J., et al. (2007). Transactivation of miR-34a by p53 broadly influences gene expression and promotes apoptosis. Mol Cell 26, 745-752.

Chang, T.C., Yu, D., Lee, Y.S., Wentzel, E.A., Arking, D.E., West, K.M., Dang, C.V., Thomas-Tikhonenko, A., and Mendell, J.T. (2008). Widespread microRNA repression by Myc contributes to tumorigenesis. Nat Genet 40, 43-50.

Chen, D., Zheng, W., Lin, A., Uyhazi, K., Zhao, H., and Lin, H. (2012). Pumilio 1 suppresses multiple activators of p53 to safeguard spermatogenesis. Curr. Biol. 22, 420-425.

Chiang, H.R., Schoenfeld, L.W., Ruby, J.G., Auyeung, V.C., Spies, N., Baek, D., Johnston, W.K., Russ, C., Luo, S., Babiarz, J.E., et al. (2010). Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev 24, 992-1009.

Chien, C.H., Sun, Y.M., Chang, W.C., Chiang-Hsieh, P.Y., Lee, T.Y., Tsai, W.C., Horng, J.T., Tsou, A.P., and Huang, H.D. (2011). Identifying transcriptional start sites of human microRNAs based on high-throughput sequencing data. Nucleic Acids Res 39, 9345-9356.

Chivukula, K.K., and Hollands, C. (2012). Human Acellular Dermal Matrix for Neonates with Complex Abdominal Wall Defects: Short- and Long-Term Outcomes. American Surgeon 78, E346-E348.

Chow, L.T., Gelinas, R.E., Broker, T.R., and Roberts, R.J. (1977). An amazing sequence arrangement at the 5' ends of adenovirus 2 messenger RNA. Cell 12, 1-8.

Cimini, D. (2008). Merotelic kinetochore orientation, aneuploidy, and cancer. Biochim. Biophys. Acta 1786, 32-40.

Clemson, C.M., Hutchinson, J.N., Sara, S.A., Ensminger, A.W., Fox, A.H., Chess, A., and Lawrence, J.B. (2009). An architectural role for a nuclear noncoding RNA: NEAT1 RNA is essential for the structure of paraspeckles. Mol. Cell 33, 717-726.

Consortium, E.P., Birney, E., Stamatoyannopoulos, J.A., Dutta, A., Guigo, R., Gingeras, T.R., Margulies, E.H., Weng, Z., Snyder, M., Dermitzakis, E.T., et al. (2007). Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799-816.

Crick, F. (1970). Central dogma of molecular biology. Nature 227, 561-563.

Cunningham, F., Amode, M.R., Barrell, D., Beal, K., Billis, K., Brent, S., Carvalho-Silva, D., Clapham, P., Coates, G., Fitzgerald, S., et al. (2015). Ensembl 2015. Nucleic Acids Res 43, D662-669.

Dahlberg, A.E. (1989). The functional role of ribosomal RNA in protein synthesis. Cell 57, 525-529.

De Vos, M., Schreiber, V., and Dantzer, F. (2012). The diverse roles and clinical relevance of PARPs in DNA damage repair: current state of the art. Biochem Pharmacol 84, 137-146.

Di Leva, G., Garofalo, M., and Croce, C.M. (2014). MicroRNAs in cancer. Annu Rev Pathol 9, 287-314.

Dimitrova, N., Zamudio, J.R., Jong, R.M., Soukup, D., Resnick, R., Sarma, K., Ward, A.J., Raj, A., Lee, J.T., Sharp, P.A., et al. (2014). LincRNA-p21 activates p21 in cis to promote Polycomb target gene expression and to enforce the G1/S checkpoint. Mol. Cell 54, 777-790.

Djebali, S., Davis, C.A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., Tanzer, A., Lagarde, J., Lin, W., Schlesinger, F., et al. (2012). Landscape of transcription in human cells. Nature 489, 101-108.

Doudna, J.A., and Batey, R.T. (2004). Structural insights into the signal recognition particle. Annu Rev Biochem 73, 539-557.

Driscoll, H.E., Muraro, N.I., He, M., and Baines, R.A. (2013). Pumilio-2 regulates translation of Nav1.6 to mediate homeostasis of membrane excitability. J. Neurosci. 33, 9644-9654.

Ernst, J., Kheradpour, P., Mikkelsen, T.S., Shoresh, N., Ward, L.D., Epstein, C.B., Zhang, X., Wang, L., Issner, R., Coyne, M., et al. (2011). Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43-49.

Faghihi, M.A., Modarresi, F., Khalil, A.M., Wood, D.E., Sahagan, B.G., Morgan, T.E., Finch, C.E., St Laurent, G., 3rd, Kenny, P.J., and Wahlestedt, C. (2008). Expression of a noncoding RNA is elevated in Alzheimer's disease and drives rapid feed-forward regulation of beta-secretase. Nat. Med. 14, 723-730.

Fatica, A., and Bozzoni, I. (2014). Long non-coding RNAs: new players in cell differentiation and development. Nat Rev Genet 15, 7-21.

Fire, A., Xu, S., Montgomery, M.K., Kostas, S.A., Driver, S.E., and Mello, C.C. (1998). Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391, 806-811.

Galgano, A., Forrer, M., Jaskiewicz, L., Kanitz, A., Zavolan, M., and Gerber, A.P. (2008). Comparative analysis of mRNA targets for human PUF-family proteins suggests extensive interaction with the miRNA regulatory system. PLoS One 3, e3164.

Ganem, N.J., Godinho, S.A., and Pellman, D. (2009). A mechanism linking extra centrosomes to chromosomal instability. Nature 460, 278-282.

Ganem, N.J., Storchova, Z., and Pellman, D. (2007). Tetraploidy, aneuploidy and cancer. Curr. Opin. Genet. Dev. 17, 157-162.

Geigl, J.B., Obenauf, A.C., Schwarzbraun, T., and Speicher, M.R. (2008). Defining 'chromosomal instability'. Trends Genet. 24, 64-69.

Gennarino, V.A., Singh, R.K., White, J.J., De Maio, A., Han, K., Kim, J.Y., Jafar-Nejad, P., di Ronza, A., Kang, H., Sayegh, L.S., et al. (2015). Pumilio1 haploinsufficiency leads to SCA1-like neurodegeneration by increasing wild-type Ataxin1 levels. Cell 160, 1087-1098.

Georgakilas, G., Vlachos, I.S., Paraskevopoulou, M.D., Yang, P., Zhang, Y., Economides, A.N., and Hatzigeorgiou, A.G. (2014). microTSS: accurate microRNA transcription start site identification reveals a significant number of divergent pri-miRNAs. Nat Commun 5, 5700.

Gerlinger, M., Rowan, A.J., Horswell, S., Larkin, J., Endesfelder, D., Gronroos, E., Martinez, P., Matthews, N., Stewart, A., Tarpey, P., et al. (2012). Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 366, 883-892.

Giavara, S., Kosmidou, E., Hande, M.P., Bianchi, M.E., Morgan, A., d'Adda di Fagagna, F., and Jackson, S.P. (2005). Yeast Nhp6A/B and mammalian Hmgb1 facilitate the maintenance of genome stability. Curr. Biol. 15, 68-72.

Giraldez, A.J., Mishima, Y., Rihel, J., Grocock, R.J., Van Dongen, S., Inoue, K., Enright, A.J., and Schier, A.F. (2006). Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs. Science 312, 75-79.

Gong, C., and Maquat, L.E. (2011). lncRNAs transactivate STAU1-mediated mRNA decay by duplexing with 3' UTRs via Alu elements. Nature 470, 284-288.

Greider, C.W., and Blackburn, E.H. (1989). A telomeric sequence in the RNA of Tetrahymena telomerase required for telomere repeat synthesis. Nature 337, 331-337.

Guo, H., Ingolia, N.T., Weissman, J.S., and Bartel, D.P. (2010). Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature 466, 835-840.

Gurtan, A.M., and Sharp, P.A. (2013). The role of miRNAs in regulating gene expression networks. J Mol Biol 425, 3582-3600.

Guttman, M., Amit, I., Garber, M., French, C., Lin, M.F., Feldser, D., Huarte, M., Zuk, O., Carey, B.W., Cassady, J.P., et al. (2009). Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223-227.

Guttman, M., Garber, M., Levin, J.Z., Donaghey, J., Robinson, J., Adiconis, X., Fan, L., Koziol, M.J., Gnirke, A., Nusbaum, C., et al. (2010). Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol 28, 503-510.

Guttman, M., and Rinn, J.L. (2012). Modular regulatory principles of large non-coding RNAs. Nature 482, 339-346.

Guttman, M., Russell, P., Ingolia, N.T., Weissman, J.S., and Lander, E.S. (2013). Ribosome profiling provides evidence that large noncoding RNAs do not encode proteins. Cell 154, 240-251.

Ha, M., and Kim, V.N. (2014). Regulation of microRNA biogenesis. Nat Rev Mol Cell Biol 15, 509-524.

Hacisuleyman, E., Goff, L.A., Trapnell, C., Williams, A., Henao-Mejia, J., Sun, L., McClanahan, P., Hendrickson, D.G., Sauvageau, M., Kelley, D.R., et al. (2014). Topological organization of multichromosomal regions by the long intergenic noncoding RNA Firre. Nat. Struct. Mol. Biol. 21, 198-206.

Hafner, M., Landthaler, M., Burger, L., Khorshid, M., Hausser, J., Berninger, P., Rothballer, A., Ascano, M., Jr., Jungkamp, A.C., Munschauer, M., et al. (2010). Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129-141.

Hammond, S.M., Bernstein, E., Beach, D., and Hannon, G.J. (2000). An RNA-directed nuclease mediates post-transcriptional gene silencing in Drosophila cells. Nature 404, 293-296.

Hanahan, D., and Weinberg, R.A. (2011). Hallmarks of cancer: the next generation. Cell 144, 646-674.

Harrow, J., Frankish, A., Gonzalez, J.M., Tapanari, E., Diekhans, M., Kokocinski, F., Aken, B.L., Barrell, D., Zadissa, A., Searle, S., et al. (2012). GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760-1774.

Hauser, S., Ulrich, T., Wurster, S., Schmitt, K., Reichert, N., and Gaubatz, S. (2012). Loss of LIN9, a member of the DREAM complex, cooperates with SV40 large T antigen to induce genomic instability and anchorage-independent growth. Oncogene 31, 1859-1868.

He, L., He, X., Lim, L.P., de Stanchina, E., Xuan, Z., Liang, Y., Xue, W., Zender, L., Magnus, J., Ridzon, D., et al. (2007). A microRNA component of the p53 tumour suppressor network. Nature 447, 1130-1134.

Heo, I., Joo, C., Cho, J., Ha, M., Han, J., and Kim, V.N. (2008). Lin28 mediates the terminal uridylation of let-7 precursor MicroRNA. Mol Cell 32, 276-284.

Hershey, A.D., and Chase, M. (1952). Independent functions of viral protein and nucleic acid in growth of bacteriophage. J Gen Physiol 36, 39-56.

Hoagland, M.B., Stephenson, M.L., Scott, J.F., Hecht, L.I., and Zamecnik, P.C. (1958). A soluble ribonucleic acid intermediate in protein synthesis. J Biol Chem 231, 241-257.

Hockemeyer, D., Soldner, F., Beard, C., Gao, Q., Mitalipova, M., DeKelver, R.C., Katibah, G.E., Amora, R., Boydston, E.A., Zeitler, B., et al. (2009). Efficient targeting of expressed and silent genes in human ESCs and iPSCs using zinc-finger nucleases. Nat. Biotechnol. 27, 851-857.

Hsu, P.D., Lander, E.S., and Zhang, F. (2014). Development and applications of CRISPR-Cas9 for genome engineering. Cell 157, 1262-1278.

Huang, D.W., Sherman, B.T., Tan, Q., Kir, J., Liu, D., Bryant, D., Guo, Y., Stephens, R., Baseler, M.W., Lane, H.C., et al. (2007). DAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res 35, W169-175.

Huarte, M., Guttman, M., Feldser, D., Garber, M., Koziol, M.J., Kenzelmann-Broz, D., Khalil, A.M., Zuk, O., Amit, I., Rabani, M., et al. (2010). A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response. Cell 142, 409-419.

Hung, C.L., Wang, L.Y., Yu, Y.L., Chen, H.W., Srivastava, S., Petrovics, G., and Kung, H.J. (2014). A long noncoding RNA connects c-Myc to tumor metabolism. Proc Natl Acad Sci U S A 111, 18697-18702.

Hung, T., Wang, Y., Lin, M.F., Koegel, A.K., Kotake, Y., Grant, G.D., Horlings, H.M., Shah, N., Umbricht, C., Wang, P., et al. (2011). Extensive and coordinated transcription of noncoding RNAs within cell-cycle promoters. Nat Genet 43, 621-629.

Hwang, H.W., Wentzel, E.A., and Mendell, J.T. (2007). A hexanucleotide element directs microRNA nuclear import. Science 315, 97-100.

Hwang, H.W., Wentzel, E.A., and Mendell, J.T. (2009). Cell-cell contact globally activates microRNA biogenesis. Proc Natl Acad Sci U S A 106, 7016-7021.

Iourov, I.Y., Vorsanova, S.G., and Yurov, Y.B. (2010). Somatic genome variations in health and disease. Curr Genomics 11, 387-396.

Islam, S., Kjallquist, U., Moliner, A., Zajac, P., Fan, J.B., Lonnerberg, P., and Linnarsson, S. (2011). Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res. 21, 1160-1167.

Iyer, M.K., Niknafs, Y.S., Malik, R., Singhal, U., Sahu, A., Hosono, Y., Barrette, T.R., Prensner, J.R., Evans, J.R., Zhao, S., et al. (2015). The landscape of long noncoding RNAs in the human transcriptome. Nat. Genet. 47, 199-208.

Jackson, E.L., Willis, N., Mercer, K., Bronson, R.T., Crowley, D., Montoya, R., Jacks, T., and Tuveson, D.A. (2001). Analysis of lung tumor initiation and progression using conditional expression of oncogenic K-ras. Genes Dev 15, 3243-3248.

Jacob, F., and Monod, J. (1961). Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol 3, 318-356.

Jallepalli, P.V., Waizenegger, I.C., Bunz, F., Langer, S., Speicher, M.R., Peters, J.M., Kinzler, K.W., Vogelstein, B., and Lengauer, C. (2001). Securin is required for chromosomal stability in human cells. Cell 105, 445-457.

Jiang, W., Jimenez, G., Wells, N.J., Hope, T.J., Wahl, G.M., Hunter, T., and Fukunaga, R. (1998). PRC1: a human mitotic spindle-associated CDK substrate protein required for cytokinesis. Mol. Cell 2, 877-885.

Kang, Y.H., Farina, A., Bermudez, V.P., Tappin, I., Du, F., Galal, W.C., and Hurwitz, J. (2013). Interaction between human Ctf4 and the Cdc45/Mcm2-7/GINS (CMG) replicative helicase. Proc. Natl. Acad. Sci. U. S. A. 110, 19760-19765.

Karpf, A.R., and Matsui, S. (2005). Genetic disruption of cytosine DNA methyltransferase enzymes induces chromosomal instability in human cancer cells. Cancer Res. 65, 8635-8639.

Kazazian, H.H., Jr. (2014). Processed pseudogene insertions in somatic cells. Mob DNA 5, 20.

Kedde, M., van Kouwenhove, M., Zwart, W., Oude Vrielink, J.A., Elkon, R., and Agami, R. (2010). A Pumilio-induced RNA structure switch in p27-3' UTR controls miR-221 and miR-222 accessibility. Nat. Cell Biol. 12, 1014-1020.

Ketting, R.F., Fischer, S.E., Bernstein, E., Sijen, T., Hannon, G.J., and Plasterk, R.H. (2001). Dicer functions in RNA interference and in synthesis of small RNA involved in developmental timing in C. elegans. Genes Dev 15, 2654-2659.

Khalil, A.M., Guttman, M., Huarte, M., Garber, M., Raj, A., Rivea Morales, D., Thomas, K., Presser, A., Bernstein, B.E., van Oudenaarden, A., et al. (2009). Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. U. S. A. 106, 11667-11672.

Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14, R36.

Kino, T., Hurt, D.E., Ichijo, T., Nader, N., and Chrousos, G.P. (2010). Noncoding RNA gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor. Science signaling 3, ra8.

Kiss-Laszlo, Z., Henry, Y., Bachellerie, J.P., Caizergues-Ferrer, M., and Kiss, T. (1996). Site-specific ribose methylation of preribosomal RNA: a novel function for small nucleolar RNAs. Cell 85, 1077-1088.

Knight, S.W., and Bass, B.L. (2001). A role for the RNase III enzyme DCR-1 in RNA interference and germ line development in Caenorhabditis elegans. Science 293, 2269-2271.

Kops, G.J., Weaver, B.A., and Cleveland, D.W. (2005). On the road to cancer: aneuploidy and the mitotic checkpoint. Nat. Rev. Cancer 5, 773-785.

Kozomara, A., and Griffiths-Jones, S. (2014). miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res 42, D68-73.

Kretz, M., Siprashvili, Z., Chu, C., Webster, D.E., Zehnder, A., Qu, K., Lee, C.S., Flockhart, R.J., Groff, A.F., Chow, J., et al. (2013). Control of somatic tissue differentiation by the long non-coding RNA TINCR. Nature 493, 231-235.

Kuga, T., Nie, H., Kazami, T., Satoh, M., Matsushita, K., Nomura, F., Maeshima, K., Nakayama, Y., and Tomonaga, T. (2014). Lamin B2 prevents chromosome instability by ensuring proper mitotic chromosome segregation. Oncogenesis 3, e94.

Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. (2001). Identification of novel genes coding for small expressed RNAs. Science 294, 853-858.

Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860-921.

Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice, A., Kamphorst, A.O., Landthaler, M., et al. (2007). A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129, 1401-1414.

Landthaler, M., Yalcin, A., and Tuschl, T. (2004). The human DiGeorge syndrome critical region gene 8 and Its D. melanogaster homolog are required for miRNA biogenesis. Curr Biol 14, 2162-2167.

Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862.

Laufer, M., Nandula, S.V., Modi, A.P., Wang, S., Jasin, M., Murty, V.V., Ludwig, T., and Baer, R. (2007). Structural requirements for the BARD1 tumor suppressor in chromosomal stability and homology-directed DNA repair. J. Biol. Chem. 282, 34325-34333.

Lee, R.C., and Ambros, V. (2001). An extensive class of small RNAs in Caenorhabditis elegans. Science 294, 862-864.

Lee, R.C., Feinbaum, R.L., and Ambros, V. (1993). The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75, 843-854.

Lee, Y., Ahn, C., Han, J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P., Radmark, O., Kim, S., et al. (2003). The nuclear RNase III Drosha initiates microRNA processing. Nature 425, 415-419.

Lee, Y., Jeon, K., Lee, J.T., Kim, S., and Kim, V.N. (2002). MicroRNA maturation: stepwise processing and subcellular localization. EMBO J 21, 4663-4670.

Lee, Y., Kim, M., Han, J., Yeom, K.H., Lee, S., Baek, S.H., and Kim, V.N. (2004). MicroRNA genes are transcribed by RNA polymerase II. EMBO J 23, 4051-4060.

Li, L., and Chang, H.Y. (2014). Physiological roles of long noncoding RNAs: insight from knockout mice. Trends Cell Biol. 24, 594-602.

Li, W., Feng, J., and Jiang, T. (2011). IsoLasso: a LASSO regression approach to RNA-Seq based transcriptome assembly. J Comput Biol 18, 1693-1707.

Liang, Y., Ridzon, D., Wong, L., and Chen, C. (2007). Characterization of microRNA expression profiles in normal human tissues. BMC Genomics 8, 166.

Lin, M.F., Jungreis, I., and Kellis, M. (2011). PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics 27, i275-282.

Ling, H., Spizzo, R., Atlasi, Y., Nicoloso, M., Shimizu, M., Redis, R.S., Nishida, N., Gafa, R., Song, J., Guo, Z., et al. (2013). CCAT2, a novel noncoding RNA mapping to 8q24, underlies metastatic progression and chromosomal instability in colon cancer. Genome Res. 23, 1446-1461.

Liu, B., Sun, L., Liu, Q., Gong, C., Yao, Y., Lv, X., Lin, L., Yao, H., Su, F., Li, D., et al. (2015). A cytoplasmic NF-kappaB interacting long noncoding RNA blocks IkappaB phosphorylation and suppresses breast cancer metastasis. Cancer Cell 27, 370-381.

Liu, J., Wang, Z., Jiang, K., Zhang, L., Zhao, L., Hua, S., Yan, F., Yang, Y., Wang, D., Fu, C., et al. (2009). PRC1 cooperates with CLASP1 to organize central spindle plasticity in mitosis. J Biol Chem 284, 23059-23071.

Liu, X., Li, D., Zhang, W., Guo, M., and Zhan, Q. (2012). Long non-coding RNA gadd7 interacts with TDP-43 and regulates Cdk6 mRNA decay. EMBO J. 31, 4415-4427.

Lotterman, C.D., Kent, O.A., and Mendell, J.T. (2008). Functional integration of microRNAs into oncogenic and tumor suppressor pathways. Cell Cycle 7, 2493-2499.

Lu, J., Getz, G., Miska, E.A., Alvarez-Saavedra, E., Lamb, J., Peck, D., Sweet-Cordero, A., Ebert, B.L., Mak, R.H., Ferrando, A.A., et al. (2005). MicroRNA expression profiles classify human cancers. Nature 435, 834-838.

Lutzmann, M., Grey, C., Traver, S., Ganier, O., Maya-Mendoza, A., Ranisavljevic, N., Bernex, F., Nishiyama, A., Montel, N., Gavois, E., et al. (2012). MCM8- and MCM9-deficient mice reveal gametogenesis defects and genome instability due to impaired homologous recombination. Mol. Cell 47, 523-534.

Manning, A.L., Yazinski, S.A., Nicolay, B., Bryll, A., Zou, L., and Dyson, N.J. (2014). Suppression of genome instability in pRB-deficient cells by enhancement of chromosome cohesion. Mol. Cell 53, 993-1004.

Marsico, A., Huska, M.R., Lasserre, J., Hu, H., Vucicevic, D., Musahl, A., Orom, U., and Vingron, M. (2013). PROmiRNA: a new miRNA promoter recognition method uncovers the complex regulation of intronic miRNAs. Genome Biol 14, R84.

Marson, A., Levine, S.S., Cole, M.F., Frampton, G.M., Brambrink, T., Johnstone, S., Guenther, M.G., Johnston, W.K., Wernig, M., Newman, J., et al. (2008). Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells. Cell 134, 521-533.

Martin, J.A., and Wang, Z. (2011). Next-generation transcriptome assembly. Nat Rev Genet 12, 671-682.

Mashal, R.D., Koontz, J., and Sklar, J. (1995). Detection of mutations by cleavage of DNA heteroduplexes with bacteriophage resolvases. Nat Genet 9, 177-183.

Masramon, L., Ribas, M., Cifuentes, P., Arribas, R., Garcia, F., Egozcue, J., Peinado, M.A., and Miro, R. (2000). Cytogenetic characterization of two colon cell lines by using conventional G-banding, comparative genomic hybridization, and whole chromosome painting. Cancer Genet. Cytogenet. 121, 17-21.

Matsunaga, S., Takata, H., Morimoto, A., Hayashihara, K., Higashi, T., Akatsuchi, K., Mizusawa, E., Yamakawa, M., Ashida, M., Matsunaga, T.M., et al. (2012). RBMX: a regulator for maintenance and centromeric protection of sister chromatid cohesion. Cell reports 1, 299-308.

McCarthy, E.E., Celebi, J.T., Baer, R., and Ludwig, T. (2003). Loss of Bard1, the heterodimeric partner of the Brca1 tumor suppressor, results in early embryonic lethality and chromosomal instability. Mol. Cell. Biol. 23, 5056-5063.

McGettigan, P.A. (2013). Transcriptomics in the RNA-seq era. Curr Opin Chem Biol 17, 4-11.

McIntyre, R.E., Lakshminarasimhan Chavali, P., Ismail, O., Carragher, D.M., Sanchez-Andrade, G., Forment, J.V., Fu, B., Del Castillo Velasco-Herrera, M., Edwards, A., van der Weyden, L., et al. (2012). Disruption of mouse Cenpj, a regulator of centriole biogenesis, phenocopies Seckel syndrome. PLoS genetics 8, e1003022.

Megraw, M., Pereira, F., Jensen, S.T., Ohler, U., and Hatzigeorgiou, A.G. (2009). A transcription factor affinity-based code for mammalian transcription initiation. Genome Res 19, 644-656.

Melamed, Z., Levy, A., Ashwal-Fluss, R., Lev-Maor, G., Mekahel, K., Atias, N., Gilad, S., Sharan, R., Levy, C., Kadener, S., et al. (2013). Alternative splicing regulates biogenesis of miRNAs located across exon-intron junctions. Mol Cell 50, 869-881.

Mendell, J.T., and Olson, E.N. (2012). MicroRNAs in stress signaling and human disease. Cell 148, 1172-1187.

Menissier de Murcia, J., Ricoul, M., Tartier, L., Niedergang, C., Huber, A., Dantzer, F., Schreiber, V., Ame, J.C., Dierich, A., LeMeur, M., et al. (2003). Functional interaction between PARP-1 and PARP-2 in chromosome stability and embryonic development in mouse. EMBO J. 22, 2255-2263.

Menon, S., Oh, W., Carr, H.S., and Frost, J.A. (2013). Rho GTPase-independent regulation of mitotic progression by the RhoGEF Net1. Mol. Biol. Cell 24, 2655-2667.

Michalik, K.M., You, X., Manavski, Y., Doddaballapur, A., Zornig, M., Braun, T., John, D., Ponomareva, Y., Chen, W., Uchida, S., et al. (2014). Long noncoding RNA MALAT1 regulates endothelial cell function and vessel growth. Circ. Res. 114, 1389-1397.

Mikkelsen, T.S., Ku, M., Jaffe, D.B., Issac, B., Lieberman, E., Giannoukos, G., Alvarez, P., Brockman, W., Kim, T.K., Koche, R.P., et al. (2007). Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553-560.

Miles, W.O., Tschop, K., Herr, A., Ji, J.Y., and Dyson, N.J. (2012). Pumilio facilitates miRNA regulation of the E2F3 oncogene. Genes Dev. 26, 356-368.

Mili, S., and Steitz, J.A. (2004). Evidence for reassociation of RNA-binding proteins after cell lysis: implications for the interpretation of immunoprecipitation analyses. RNA 10, 1692-1694.

Miller, M.A., and Olivas, W.M. (2011). Roles of Puf proteins in mRNA degradation and translation. Wiley interdisciplinary reviews. RNA 2, 471-492.

Morris, A.R., Mukherjee, N., and Keene, J.D. (2008). Ribonomic analysis of human Pum1 reveals cis-trans conservation across species despite evolution of diverse mRNA target sets. Mol. Cell. Biol. 28, 4093-4103.

Morris, K.V., and Mattick, J.S. (2014). The rise of regulatory RNA. Nat Rev Genet 15, 423-437.

Napoli, C., Lemieux, C., and Jorgensen, R. (1990). Introduction of a Chimeric Chalcone Synthase Gene into Petunia Results in Reversible Co-Suppression of Homologous Genes in trans. Plant Cell 2, 279-289.

Narita, R., Takahasi, K., Murakami, E., Hirano, E., Yamamoto, S.P., Yoneyama, M., Kato, H., and Fujita, T. (2014). A novel function of human Pumilio proteins in cytoplasmic sensing of viral infection. PLoS Pathog. 10, e1004417.

Ni, J., Tien, A.L., and Fournier, M.J. (1997). Small nucleolar RNAs direct site-specific synthesis of pseudouridine in ribosomal RNA. Cell 89, 565-573.

Niu, Y., Shen, B., Cui, Y., Chen, Y., Wang, J., Wang, L., Kang, Y., Zhao, X., Si, W., Li, W., et al. (2014). Generation of gene-modified cynomolgus monkey via Cas9/RNA-mediated gene targeting in one-cell embryos. Cell 156, 836-843.

O'Donnell, K.A., Wentzel, E.A., Zeller, K.I., Dang, C.V., and Mendell, J.T. (2005). c-Myc-regulated microRNAs modulate E2F1 expression. Nature 435, 839-843.

Olive, V., Minella, A.C., and He, L. (2015). Outside the coding genome, mammalian microRNAs confer structural and functional complexity. Sci Signal 8, re2.

Olson, E.N. (2014). MicroRNAs as therapeutic targets and biomarkers of cardiovascular disease. Sci Transl Med 6, 239ps233.

Ozsolak, F., Poling, L.L., Wang, Z., Liu, H., Liu, X.S., Roeder, R.G., Zhang, X., Song, J.S., and Fisher, D.E. (2008). Chromatin structure analyses identify miRNA promoters. Genes Dev 22, 3172-3183.

Pasquinelli, A.E. (2012). MicroRNAs and their targets: recognition, regulation and an emerging reciprocal relationship. Nat Rev Genet 13, 271-282.

Pasquinelli, A.E., Reinhart, B.J., Slack, F., Martindale, M.Q., Kuroda, M.I., Maller, B., Hayward, D.C., Ball, E.E., Degnan, B., Muller, P., et al. (2000). Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408, 86-89.

Patel, R.K., and Jain, M. (2012). NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS One 7, e30619.

Pauli, A., Rinn, J.L., and Schier, A.F. (2011). Non-coding RNAs as regulators of embryogenesis. Nature reviews. Genetics 12, 136-149.

Pertea, M., Pertea, G.M., Antonescu, C.M., Chang, T.C., Mendell, J.T., and Salzberg, S.L. (2015). StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33, 290-295.

Ponten, F., Jirstrom, K., and Uhlen, M. (2008). The Human Protein Atlas - a tool for pathology. J. Pathol. 216, 387-393.

Ponting, C.P., Oliver, P.L., and Reik, W. (2009). Evolution and functions of long noncoding RNAs. Cell 136, 629-641.

Quinodoz, S., and Guttman, M. (2014). Long noncoding RNAs: an emerging link between gene regulation and nuclear organization. Trends Cell Biol 24, 651-663.

Rajagopalan, H., Nowak, M.A., Vogelstein, B., and Lengauer, C. (2003). The significance of unstable chromosomes in colorectal cancer. Nat. Rev. Cancer 3, 695-701.

Rakheja, D., Chen, K.S., Liu, Y., Shukla, A.A., Schmid, V., Chang, T.C., Khokhar, S., Wickiser, J.E., Karandikar, N.J., Malter, J.S., et al. (2014). Somatic mutations in DROSHA and DICER1 impair microRNA biogenesis through distinct mechanisms in Wilms tumours. Nat Commun 2, 4802.

Reinhart, B.J., Slack, F.J., Basson, M., Pasquinelli, A.E., Bettinger, J.C., Rougvie, A.E., Horvitz, H.R., and Ruvkun, G. (2000). The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403, 901-906.

Rinn, J.L., and Chang, H.Y. (2012). Genome regulation by long noncoding RNAs. Annu. Rev. Biochem. 81, 145-166.

Robinson, M.D., McCarthy, D.J., and Smyth, G.K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140.

Rodriguez, A., Griffiths-Jones, S., Ashurst, J.L., and Bradley, A. (2004). Identification of mammalian microRNA host genes and transcription units. Genome Res 14, 1902-1910.

Rosenbloom, K.R., Sloan, C.A., Malladi, V.S., Dreszer, T.R., Learned, K., Kirkup, V.M., Wong, M.C., Maddren, M., Fang, R., Heitner, S.G., et al. (2013). ENCODE data in the UCSC Genome Browser: year 5 update. Nucleic Acids Res. 41, D56-63.

Roush, S., and Slack, F.J. (2008). The let-7 family of microRNAs. Trends Cell Biol 18, 505-516.

Sabin, L.R., Delas, M.J., and Hannon, G.J. (2013). Dogma derailed: the many influences of RNA on the genome. Mol. Cell 49, 783-794.

Salditt-Georgieff, M., and Darnell, J.E., Jr. (1982). Further evidence that the majority of primary nuclear RNA transcripts in mammalian cells do not contribute to mRNA. Mol Cell Biol 2, 701-707.

Salditt-Georgieff, M., Harpold, M.M., Wilson, M.C., and Darnell, J.E., Jr. (1981). Large heterogeneous nuclear ribonucleic acid has three times as many 5' caps as polyadenylic acid segments, and most caps do not enter polyribosomes. Mol Cell Biol 1, 179-187.

Salmena, L., Poliseno, L., Tay, Y., Kats, L., and Pandolfi, P.P. (2011). A ceRNA hypothesis: the Rosetta Stone of a hidden RNA language? Cell 146, 353-358.

Salzler, H.R., Davidson, J.M., Montgomery, N.D., and Duronio, R.J. (2009). Loss of the histone pre-mRNA processing factor stem-loop binding protein in Drosophila causes genomic instability and impaired cellular proliferation. PLoS One 4, e8168.

Samper, E., Goytisolo, F.A., Menissier-de Murcia, J., Gonzalez-Suarez, E., Cigudosa, J.C., de Murcia, G., and Blasco, M.A. (2001). Normal telomere length and chromosomal end capping in poly(ADP-ribose) polymerase-deficient mice and primary cells despite increased chromosomal instability. J. Cell Biol. 154, 49-60.

Sanchez, Y., Segura, V., Marin-Bejar, O., Athie, A., Marchese, F.P., Gonzalez, J., Bujanda, L., Guo, S., Matheu, A., and Huarte, M. (2014). Genome-wide analysis of the human p53 transcriptional network unveils a lncRNA tumour suppressor signature. Nature communications 5, 5812.

Sander, J.D., Cade, L., Khayter, C., Reyon, D., Peterson, R.T., Joung, J.K., and Yeh, J.R. (2011). Targeted gene disruption in somatic zebrafish cells using engineered TALENs. Nat. Biotechnol. 29, 697-698.

Sander, J.D., Maeder, M.L., Reyon, D., Voytas, D.F., Joung, J.K., and Dobbs, D. (2010). ZiFiT (Zinc Finger Targeter): an updated zinc finger engineering tool. Nucleic Acids Res. 38, W462-468.

Sanjana, N.E., Cong, L., Zhou, Y., Cunniff, M.M., Feng, G., and Zhang, F. (2012). A transcription activator-like effector toolbox for genome engineering. Nat. Protoc. 7, 171-192.

Schaetzlein, S., Chahwan, R., Avdievich, E., Roa, S., Wei, K., Eoff, R.L., Sellers, R.S., Clark, A.B., Kunkel, T.A., Scharff, M.D., et al. (2013). Mammalian Exo1 encodes both structural and catalytic functions that play distinct roles in essential biological processes. Proc. Natl. Acad. Sci. U. S. A. 110, E2470-2479.

Schanen, B.C., and Li, X. (2011). Transcriptional regulation of mammalian miRNA genes. Genomics 97, 1-6.

Schultes, E.A., Spasic, A., Mohanty, U., and Bartel, D.P. (2005). Compact and ordered collapse of randomly generated RNA sequences. Nat Struct Mol Biol 12, 1130-1136.

Shima, N., Alcaraz, A., Liachko, I., Buske, T.R., Andrews, C.A., Munroe, R.J., Hartford, S.A., Tye, B.K., and Schimenti, J.C. (2007). A viable allele of Mcm4 causes chromosome instability and mammary adenocarcinomas in mice. Nat. Genet. 39, 93-98.

Spassov, D.S., and Jurecic, R. (2003). The PUF family of RNA-binding proteins: does evolutionarily conserved structure equal conserved function? IUBMB life 55, 359-366.

Struhl, K. (2007). Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nat. Struct. Mol. Biol. 14, 103-105.

Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U. S. A. 102, 15545-15550.

Terasawa, M., Shinohara, A., and Shinohara, M. (2014). Canonical non-homologous end joining in mitosis induces genome instability and is suppressed by M-phase-specific phosphorylation of XRCC4. PLoS genetics 10, e1004563.

Trapnell, C., Pachter, L., and Salzberg, S.L. (2009). TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105-1111.

Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28, 511-515.

Ulitsky, I., and Bartel, D.P. (2013). lincRNAs: genomics, evolution, and mechanisms. Cell 154, 26-46.

Ulitsky, I., Shkumatava, A., Jan, C.H., Sive, H., and Bartel, D.P. (2011). Conserved function of lincRNAs in vertebrate embryonic development despite rapid sequence evolution. Cell 147, 1537-1550.

Vessey, J.P., Schoderboeck, L., Gingl, E., Luzi, E., Riefler, J., Di Leva, F., Karra, D., Thomas, S., Kiebler, M.A., and Macchi, P. (2010). Mammalian Pumilio 2 regulates dendrite morphogenesis and synaptic function. Proc. Natl. Acad. Sci. U. S. A. 107, 3222-3227.

Vidigal, J.A., and Ventura, A. (2015). The biological functions of miRNAs: lessons from in vivo studies. Trends Cell Biol 25, 137-147.

Voets, E., and Wolthuis, R.M. (2010). MASTL is the human orthologue of Greatwall kinase that facilitates mitotic entry, anaphase and cytokinesis. Cell cycle 9, 3591-3601.

Walter, P., and Blobel, G. (1982). Signal recognition particle contains a 7S RNA essential for protein translocation across the endoplasmic reticulum. Nature 299, 691-698.

Wan, G., Liu, Y., Han, C., Zhang, X., and Lu, X. (2014). Noncoding RNAs in DNA repair and genome integrity. Antioxid Redox Signal 20, 655-677.

Wang, H., Li, Y., Truong, L.N., Shi, L.Z., Hwang, P.Y., He, J., Do, J., Cho, M.J., Li, H., Negrete, A., et al. (2014). CtIP maintains stability at common fragile sites and inverted repeats by end resection-independent endonuclease activity. Mol. Cell 54, 1012-1021.

Wang, H., Yang, H., Shivalila, C.S., Dawlaty, M.M., Cheng, A.W., Zhang, F., and Jaenisch, R. (2013). One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering. Cell 153, 910-918.

Wang, X., McLachlan, J., Zamore, P.D., and Hall, T.M. (2002). Modular recognition of RNA by a human pumilio-homology domain. Cell 110, 501-512.

Wapinski, O., and Chang, H.Y. (2011). Long noncoding RNAs and human disease. Trends Cell Biol. 21, 354-361.

Warner, J.R., Soeiro, R., Birnboim, H.C., Girard, M., and Darnell, J.E. (1966). Rapidly labeled HeLa cell nuclear RNA. I. Identification by zone sedimentation of a heterogeneous fraction separate from ribosomal precursor RNA. J Mol Biol 19, 349-361.

Weill, L., Belloc, E., Bava, F.A., and Mendez, R. (2012). Translational control by changes in poly(A) tail length: recycling mRNAs. Nature Structural & Molecular Biology 19, 577-585.

Weinberg, R.A., and Penman, S. (1968). Small molecular weight monodisperse nuclear RNA. J Mol Biol 38, 289-304.

Whelan, G., Kreidl, E., Wutz, G., Egner, A., Peters, J.M., and Eichele, G. (2012). Cohesin acetyltransferase Esco2 is a cell viability factor and is required for cohesion in pericentric heterochromatin. EMBO J. 31, 71-82.

Wickens, M., Bernstein, D.S., Kimble, J., and Parker, R. (2002). A PUF family portrait: 3'UTR regulation as a way of life. Trends Genet. 18, 150-157.

Wightman, B., Ha, I., and Ruvkun, G. (1993). Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75, 855-862.

Willingham, A.T., Orth, A.P., Batalov, S., Peters, E.C., Wen, B.G., Aza-Blanc, P., Hogenesch, J.B., and Schultz, P.G. (2005). A strategy for probing the function of noncoding RNAs finds a repressor of NFAT. Science 309, 1570-1573.

Winter, J., Jung, S., Keller, S., Gregory, R.I., and Diederichs, S. (2009). Many roads to maturity: microRNA biogenesis pathways and their regulation. Nat Cell Biol 11, 228-234.

Wu, L., Fan, J., and Belasco, J.G. (2006). MicroRNAs direct rapid deadenylation of mRNA. Proc Natl Acad Sci U S A 103, 4034-4039.

Xiao, Y., Liu, T., Zhao, H., Li, X., Guan, J., Xu, C., Ping, Y., Fan, H., Wang, L., Zhao, T., et al. (2014). Integrating epigenetic marks for identification of transcriptionally active miRNAs. Genomics 104, 70-78.

Xie, X., Lu, J., Kulbokas, E.J., Golub, T.R., Mootha, V., Lindblad-Toh, K., Lander, E.S., and Kellis, M. (2005). Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 434, 338-345.

Yang, H., Wang, H., Shivalila, C.S., Cheng, A.W., Shi, L., and Jaenisch, R. (2013). One-step generation of mice carrying reporter and conditional alleles by CRISPR/Cas-mediated genome engineering. Cell 154, 1370-1379.

Yang, X., Boehm, J.S., Yang, X., Salehi-Ashtiani, K., Hao, T., Shen, Y., Lubonja, R., Thomas, S.R., Alkan, O., Bhimdi, T., et al. (2011). A public genome-scale lentiviral expression library of human ORFs. Nat Methods 8, 659-661.

Yi, R., Qin, Y., Macara, I.G., and Cullen, B.R. (2003). Exportin-5 mediates the nuclear export of pre-microRNAs and short hairpin RNAs. Genes Dev 17, 3011-3016.

Yik, J.H., Chen, R., Nishimura, R., Jennings, J.L., Link, A.J., and Zhou, Q. (2003). Inhibition of P-TEFb (CDK9/Cyclin T) kinase and RNA polymerase II transcription by the coordinated actions of HEXIM1 and 7SK snRNA. Mol Cell 12, 971-982.

Yoon, J.H., Abdelmohsen, K., and Gorospe, M. (2013). Posttranscriptional gene regulation by long noncoding RNA. J Mol Biol 425, 3723-3730.

Yoon, J.H., Abdelmohsen, K., Srikantan, S., Yang, X., Martindale, J.L., De, S., Huarte, M., Zhan, M., Becker, K.G., and Gorospe, M. (2012). LincRNA-p21 suppresses target mRNA translation. Mol. Cell 47, 648-655.

Zamore, P.D., Williamson, J.R., and Lehmann, R. (1997). The Pumilio protein binds RNA through a conserved domain that defines a new class of RNA-binding proteins. RNA 3, 1421-1433.

Zappulla, D.C., and Cech, T.R. (2004). Yeast telomerase RNA: a flexible scaffold for protein subunits. Proc Natl Acad Sci U S A 101, 10024-10029.

Zeman, M.K., and Cimprich, K.A. (2014). Causes and consequences of replication stress. Nat Cell Biol 16, 2-9.

Ziats, M.N., and Rennert, O.M. (2013). Aberrant expression of long noncoding RNAs in autistic brain. J. Mol. Neurosci. 49, 589-593.

Zieve, G., and Penman, S. (1976). Small RNA species of the HeLa cell: metabolism and subcellular localization. Cell 8, 19-31.

Curriculum Vitae

Sungyul Lee

6000 Harry Hines Blvd, NA6.200

Dallas, TX 75390

(214) 648-5185

[email protected]

Education

M.S. 2006 Seoul National University, College of Medicine (Seoul, South Korea)

Graduate Program of Molecular and Clinical Oncology, Supervisor: Jung Weon Lee

B.S. 2004 Korea University, College of Life Science and Biotechnology (Seoul, South Korea)

Dept. of Biotechnology and Genetic Engineering

Career

2011 ~ present: Visiting graduate student, Howard Hughes Medical Institute / UT Southwestern Medical Center (Dallas, TX), Supervisor: Joshua T. Mendell

2009 ~ present: Ph.D. candidate, Pathobiology program at Johns Hopkins University School of Medicine (Baltimore, MD), Supervisor: Joshua T. Mendell

2007 ~ 2009: Research Scientist, HanAll BioPharma, Co. Ltd. (Suwon, South Korea)

2006 ~ 2007: Research Scientist, Oscotec, Inc. (Chonan, South Korea)

Fulfilled the military service requirement of the Republic of Korea as Technical Research Personnel.

Publications

1. Lee, S., Kopp, F., Chang, T.C., Sataluri, A., Chen, B., Sivakumar, S., Yu, H., Xie, Y., Mendell, J.T. (2016) Noncoding RNA NORAD regulates genomic stability by sequestering PUMILIO proteins. Cell 164, 1-12. In press

2. Chang TC, Pertea M, Lee S, Salzberg SL, Mendell JT (2015) Genome-wide annotation of microRNA primary transcript structures reveals novel regulatory mechanisms. Genome Res 25, 1401-1409.

3. Choi S, Oh SR, Lee SA, Lee SY, Ahn K, Lee HK, Lee JW. (2008) Regulation of TM4SF5-mediated tumorigenesis through induction of cell detachment and death by tiarellic acid. Biochim Biophys Acta. 1783(9):1632-41.

4. Lee SY, Lee SA, Cho IH, Oh MA, Kang ES, Kim YB, Seo WD, Choi S, Nam JO, Tamamori-Adachi M, Kitajima S, Ye SK, Kim S, Hwang YJ, Kim IS, Park KH, Lee JW (2008) Tetraspanin TM4SF5 mediates loss of contact inhibition through epithelial-mesenchymal transition in human hepatocarcinoma. J Clin Invest. 118(4):1354-66.

5. Kim YB, Lee SY, Ye SK, Lee JW (2007) Epigenetic regulation of integrin-linked kinase expression depending on adhesion of gastric carcinoma cells. Am J Physiol Cell Physiol. 292(2):C857-66.

6. Lee SY, Kim YT, Lee MS, Kim YB, Chung E, Kim S, Lee JW (2006) Focal adhesion and actin organization by a cross-talk of TM4SF5 with integrin alpha2 are regulated by serum treatment. Exp Cell Res. 312(16):2983-99.

7. Lee MS, Kim YB, Lee SY, Kim JG, Kim SH, Ye SK, Lee JW (2006) Integrin signaling and cell spreading mediated by phorbol 12-myristate 13-acetate treatment. J Cell Biochem. 99(1):88-95.

8. Lee MS, Kim TY, Kim YB, Lee SY, Ko SG, Jong HS, Kim TY, Bang YJ, Lee JW (2005) The signaling network of transforming growth factor beta1, protein kinase C delta, and integrin underlies the spreading and invasiveness of gastric carcinoma cells. Mol Cell Biol. 25(16):6921-36.

First-author papers

9. Kim YB, Yu J, Lee SY, Lee MS, Ko SG, Ye SK, Jong HS, Kim TY, Bang YJ, Lee JW (2005) Cell adhesion status-dependent histone acetylation is regulated through intracellular contractility-related signaling activities. J Biol Chem. 280(31):28357-64.

10. Lee MS, Ko SG, Kim HP, Kim YB, Lee SY, Kim SG, Jong HS, Kim TY, Lee JW, Bang YJ (2004) Smad2 mediates Erk1/2 activation by TGF-beta1 in suspended, but not in adherent, gastric carcinoma cells. Int J Oncol. 24(5):1229-34.

Presentation and Meeting Abstracts

2015 Symposium abstract: Noncoding RNA NORAD regulates genomic stability in human cells. Innovations in Cancer Prevention and Research Conference (CPRIT), Austin, Texas USA

2015 Poster presentation: Regulation of chromosome stability by Noncoding RNA induced by DNA damage (NORAD) in human cells. Keystone Symposia, MicroRNAs and Noncoding RNAs in Cancer (E5), Keystone, Colorado USA

Honors and Scholarships

2013 Mogam Scholarship, Mogam Science Scholarship Foundation

2012~2013 Research Training Award from CPRIT (Cancer Prevention and Research Institute of Texas), Cancer Intervention and Prevention Discovery Program (RP101496)

2000~2003 Semester High Honors, Korea University

2001~2003 Chungsoo Scholarships, Chungsoo Scholarship Foundation