Cell Stem Cell, Volume 18 Supplemental Information Divergent lncRNAs Regulate Gene Expression and Lineage Differentiation in Pluripotent Cells Sai Luo, J. Yuyang Lu, Lichao Liu, Yafei Yin, Chunyan Chen, Xue Han, Bohou Wu, Ronggang Xu, Wei Liu, Pixi Yan, Wen Shao, Zhi Lu, Haitao Li, Jie Na, Fuchou Tang, Jianlong Wang, Yong E. Zhang, and Xiaohua Shen


Cell Stem Cell, Volume 18

Supplemental Information

Divergent lncRNAs Regulate Gene Expression

and Lineage Differentiation in Pluripotent Cells

Sai Luo, J. Yuyang Lu, Lichao Liu, Yafei Yin, Chunyan Chen, Xue Han, Bohou Wu, Ronggang Xu, Wei Liu, Pixi Yan, Wen Shao, Zhi Lu, Haitao Li, Jie Na, Fuchou Tang, Jianlong Wang, Yong E. Zhang, and Xiaohua Shen

INDEX OF SUPPLEMENTAL DATA

SUPPLEMENTAL FIGURES

Figure S1. Divergent lncRNAs correlate with regulatory functions in transcription and development,

and have earlier evolutionary origin. Related to Figure 1.

Figure S2. Prevalent transcriptional regulation by divergent lncRNAs. Related to Figures 2 and 3.

Figure S3. Loss-of-function analyses revealed a requirement for Evx1as in regulating EVX1

transcription. Related to Figure 4.

Figure S4. Overexpression analysis of Evx1as and EVX1. Related to Figure 4.

Figure S5. Mechanistic investigation of Evx1as function. Related to Figure 5.

Figure S6. Evx1as and EVX1 are required for mesendodermal differentiation. Related to Figures 6

and 7.

SUPPLEMENTAL TABLES

Table S1. A statistical summary of lncRNA/coding (r/c), coding/coding (c/c) and lncRNA/lncRNA

(r/r) gene pairs in defined biotypes across species. Related to Figure 1.

Table S2. List of lncRNAs located close to protein-coding genes in human (A) and mouse (B).

Related to Figure 1.

Table S3. List of protein-coding genes in defined biotypes in human (A) and mouse (B). Related to

Figure 1.

Table S4. The list of 168 conserved genes that neighbor divergent lncRNAs in human and mouse.

Related to Figure 1F.

Table S5. RNA-seq profiling of Fendrr knockdown and RA-induced differentiation of ESCs.

Related to Figures 2 and 3.

Table S6. RNA-seq profiling of day-4 differentiated ESCs depleted of or lacking Evx1as and EVX1.

Related to Figure 7.

Table S7. A list of primers, probes, sgRNAs, siRNAs, shRNAs and ChIRP probes. Related to

Figures 2, 3, 4, 5, 6 and 7.

Table S8. High-throughput sequencing datasets used in this study. Related to Figures 1, 2, 3, 4, 5

and 7.

SUPPLEMENTAL EXPERIMENTAL PROCEDURES

SUPPLEMENTAL REFERENCES

[Figure S1, panels A-G (graphic not reproduced). Recoverable labels: cumulative fraction versus Pearson correlation coefficient for lincRNAs; ratio of gene pairs, (# of pairs) / (total # of genes), per biotype (XH, XT, XI, XO, SD, SU and the corresponding c/c and r/r pairs) in human (hg19) and mouse (mm10); density versus Pearson correlation coefficient (c.c) for lncRNA/coding, coding/coding and lincRNA/coding pairs in human and mouse; GO of protein-coding genes neighboring XH lncRNAs (mouse), plotted as -log10(p value), with the terms nervous system development, cell differentiation, developmental process, embryonic development, organ morphogenesis, pattern specification process, transcription, positive regulation of transcription, sequence-specific DNA binding and transcription factor activity; ratio of total lncRNAs versus evolutionary age assignment for XH lncRNAs and lincRNAs.]

[Figure S1, phylogenetic trees (time scale in myr, numbered branches). Tree with human at the top: Human, Chimp, Orangutan, Gibbon, Rhesus, Marmoset, Rabbit, Mouse, Guinea Pig, Rat, Horse, Dog, Cow, Elephant, Opossum, Platypus, Chicken, Zebra finch, Lizard, X. tropicalis, Zebrafish, Stickleback, Tetraodon. Tree with mouse at the top: Mouse (mm10), Rat (rn5), Naked mole rat (hetGla2), Squirrel (speTri2), Human (hg19), Rhesus (rheMac3), Marmoset (calJac3), Dog (canFam3), Cow (bosTau7), Armadillo (dasNov3), Manatee (triMan1), Opossum (monDom5), Tasmanian devil (sarHar1), Platypus (ornAna1), Chicken (galGal4), Painted turtle (chrPic2), X. tropicalis (xenTro3), Fugu (fr3), Zebrafish (danRer7).]

[Figure S2, panels A-J (graphic not reproduced). Recoverable labels: Pearson correlation coefficient with Fendrr (c.c) versus distance (kb) between Fendrr and nearby protein-coding transcripts; relative expression after knockdown during RA-induced differentiation of ESCs in N2B27 towards neural / extraembryonic endoderm lineages (Ctrl versus shSox21as, shLhx1os, shIer2as, shZdhhc4as; Cacng6as/Cacng6, Plekhd1as/Plekhd1, Atrnas/Atrn, Graspas/Grasp, Rbm27as/Rbm27, Bcat2as/Bcat2); cancer cell (MCF7); relative expression of Gata6as and GATA6 at the 2-cell, morula and blastocyst stages.]

[Figure S2, two-cell injection schematic. RNA is injected into one cell of two-cell embryos, which are scored at the blastocyst stage for GATA6+, GFP+ and GATA6+;GFP+ cells. (i) Scramble siRNA & H2B-EGFP mRNA: (# of EGFP+;GATA6+ cells) / (total # of GATA6+ cells) = 50%. (ii) GATA6as siRNA & H2B-EGFP mRNA: (# of EGFP+;GATA6+ cells) / (total # of GATA6+ cells) < 50%.]

[Figure S2, coding/coding pairs shown in the knockdown panels, as extracted: VAT1/RND2, CENPQ/MUT, MED29/PAF1, RRAS/SCAF1, NEDD8/GMPR2, VPS35/ORC6, MAGT1/COX7B, HMMR/NUDCD2, TOP3A/SMCR8, ATP5J/GABPA, ATF5/NUP62, FUZ/MED25. The y axis of each bar chart is relative expression.]

[Figure S3, panels A-K (graphic not reproduced). Recoverable labels: Mouse and Human expression panels; CRISPRi RT-qPCR of Evx1as-pre, Evx1as, EVX1-pre, EVX1 and T with nc, c-sgRNA, d-sgRNA, e-sgRNA and f-sgRNA; knockout schematics for Evx1as (KO #1; KO #2,3,4) and EVX1 (KO #1; KO #2,3,4) with primers 1f, 1r, 2r, 3r, 4r and 5r, SexA1 sites (~7 kb) and the Southern probe; PCR genotyping with Outside and Inside primer pairs (1f/1r, 1f/5r, 1f/2r, 1f/3r, 1f/4r; 1.1 kb band); Southern blot of WT (~7 kb) and Evx1as KO clones #1-4; CRISPR-on relative expression of Evx1as and EVX1 with sgRNAs a, b and nc in WT and Evx1as-null cells.]

[Figure S4, panels A-G (graphic not reproduced). Recoverable labels: knock-in schematic with CAG promoter, PGK-hygro cassette, loxP sites, 5' HA and 3' HA, sgRNAs a and b, and +CRE; Sca1 and Nde1 sites with the Southern probe and fragment labels 13.8 kb (WT), 10.1 kb and 7 kb as extracted; Southern blot of Evx1as CAG KI and EVX1 CAG KI clones 1-5; relative expression to WT of NANOG, OCT4, TCL1 and T in WT, EVX1 KI and Evx1as KI cells; RNA tethering of Evx1as(l) or Hottip fused to sgRNA(a), sgRNA(b) or sgRNA(REX1), with EVX1 pre-mRNA, EVX1 mRNA and T mRNA as readouts; tethering of EVX1 RNA; REX1 expression for Evx1as(s), Evx1as(l), Evx1as(rs), Hottip and GFP.]

[Figure S5, panels A-M (graphic not reproduced). Recoverable labels: RNA FISH of Evx1as with DAPI and merge at D0 and D4; statistical analysis of RNA FISH as percentage of cells with none, one spot or two spots at D0 and D4 (n=12 and n=54 as extracted); fold enrichment to GAPDH; Evx1as ChIRP-seq at D0 and D4 with MACS peaks over HOXA11, Hoxa11as, HOXA13, Evx1as and EVX1 (10 kb scale); browser view (2 kb scale) with Evx1as ChIRP-seq (D0, D4), ribominus total RNA-seq (no LIF, D4; + and - strands), ChIP-seq in ESCs at D0 (SOX2, KLF4, ESRRB, CDK8, CDK9, RING1B, CBX7) and SMC1 ChIA-PET in ESCs at D0 (rep1 PETs, rep2 PETs, interaction seq tags); MED12 ChIP at nc and promoter regions at D0 and D4; RNA pull-down probed with anti-MED1 and anti-MED12.]

[Figure S6, panels A-H (graphic not reproduced). Standard curves, raw Ct values versus log10 of copy numbers: Evx1as, y = -3.9307x + 42.226, R² = 0.9964, primer efficiency 89.8%; EVX1, y = -4.178x + 44.665, R² = 0.9962, primer efficiency 86.8%. Other recoverable labels: enrichment plots for ME-high genes (WT (D4) versus Evx1as KD (D4)) and for XEN-high and NPC-high genes (WT (D4) versus Evx1as-null (D4)); trans-overexpression of Evx1as (no LIF, day 4), relative expression in PB-GFP versus PB-Evx1as cells.]

[Figure S6, heatmap. Columns: Evx1as KO, Evx1as KD, EVX1 KO, EVX1 KD (no LIF, day 4). Rows as extracted: MSGN1, HOXB1, WNT5A, T, EVX1, SNAIL1, TGFB1, BMP2, FGF3, MSX1, GSC, LHX1, CXCR4, EOMES, GATA6, SOX17, FOXA2, OTX2, ZIC2, ZIC5, SOX1, NRTN, NDRG2, SOX2, FOXD3, UTF1, PDZD4, TET1, NANOG, ERAS.]

[Figure S6, Venn diagram. Downregulated genes compared to WT (FPKM>1, fold change<-2, p<0.05) in Evx1as KO and EVX1 KO; counts as extracted: 500, 79, 75.]

[Figure S6, GO table.]

Term                                              p-value     Fold Enrichment
GO:0009888~tissue development                     2.38E-06    5.03
GO:0032502~developmental process                  2.04E-06    2.42
GO:0007275~multicellular organismal development   1.36E-06    2.55
GO:0009653~anatomical structure morphogenesis     5.33E-05    3.29
GO:0048522~positive regulation of cellular process 1.00E-04   2.95
GO:0009887~organ morphogenesis                    1.97E-04    4.25
GO:0007389~pattern specification process          2.04E-04    6.46
GO:0030154~cell differentiation                   2.28E-04    2.63
GO:0003002~regionalization                        3.02E-04    7.50
GO:0009790~embryonic development                  5.15E-04    3.77

SUPPLEMENTAL FIGURE LEGENDS

Figure S1. Divergent lncRNAs correlate with regulatory functions in transcription and development, and have

earlier evolutionary origin. Related to Figure 1.

(A) Expression correlation analysis of lincRNAs with their nearest ten neighbor genes across 23 human tissues.

(B) Ratios of corresponding gene pairs in the human and mouse genome. The biotypes without a suffix represent

neighboring lncRNA/coding gene pairs. Suffixes ‘c/c’ and ‘r/r’ represent nearby protein-coding/coding and

lncRNA/lncRNA pairs, respectively. ‘XIO’ and ‘SDU’ comprise antisense-inside/outside and

sense-downstream/upstream, respectively. The y axis shows ratios of gene pairs versus total numbers of lncRNAs (for

lncRNA/coding, lncRNA/lncRNA pairs) or protein-coding genes (for coding/coding pairs).

(C) Pearson coexpression correlation of gene pairs. Pairs of lincRNAs and the nearest protein-coding genes (black

curves) and coding/coding gene pairs (blue) serve as controls. Dotted lines are set at c.c = 0.7. Compared with

the two control sets, antisense lncRNA/coding pairs exhibit significantly higher positive

correlation (Wilcoxon p < 5×10⁻⁶ for XH in human and mouse, p < 1×10⁻⁴ for human XT, XI, XO and SD pairs, and

p < 2×10⁻⁷ for mouse XI pairs).

(D) GO analysis of protein-coding genes neighboring various biotypes of lncRNAs in mouse. Selected GO terms

(enrichment score > 1.5, p < 1×10⁻⁶) for XH lncRNAs are shown. Approximately 509 genes in mouse are related to

transcription and development.

(E) and (F) Vertebrate phylogenetic tree with human (E) or mouse (F) at the top. Branch numbers represent

evolutionary age assignments. Smaller numbers indicate older evolutionary origins. Species names and

corresponding genome assemblies are shown.

(G) Evolutionary age distributions of mouse lncRNAs. The origination time of each lncRNA was dated according to

the vertebrate phylogenetic tree in panel (F). To avoid bias caused by neighboring genes, sequences overlapping

with protein-coding exons were filtered out. The x axis shows the age assignment at which a lncRNA first appears.

The y axis shows the ratio of lncRNAs falling into a particular age assignment in the corresponding class of total

lncRNAs. Divergent XH lncRNAs exhibit a skewed distribution towards older or greater evolutionary ages (lower

numbers on the left) compared to lincRNAs. The mean evolutionary age of mouse divergent lncRNAs is

significantly older than that of lincRNAs (5.8 for XH versus 6.3 for lincRNAs; Wilcoxon p < 3.4×10⁻⁹).
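The coexpression measure used in panels (A) and (C) can be illustrated with a minimal sketch. It computes a Pearson correlation coefficient for one gene pair across tissues and the fraction of pairs above the c.c = 0.7 line. The function names and the expression values are hypothetical and are not taken from the study; the statistical tests reported in the legend are not reproduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one profile is constant
    return cov / sqrt(var_x * var_y)


def fraction_above(coeffs, cutoff=0.7):
    """Fraction of gene pairs whose coefficient exceeds the cutoff."""
    valid = [c for c in coeffs if c == c]  # NaN != NaN, so this drops NaN
    if not valid:
        return 0.0
    return sum(c > cutoff for c in valid) / len(valid)


# Hypothetical expression values across five tissues (illustration only)
lncrna_profile = [1.0, 2.0, 3.0, 4.0, 5.0]
coding_profile = [2.0, 4.0, 6.0, 8.0, 10.0]
r = pearson(lncrna_profile, coding_profile)
```

Applied to every lncRNA/coding pair of one biotype, the resulting coefficients give the distribution that the density curves summarize.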

Figure S2. Prevalent transcriptional regulation by divergent lncRNAs. Related to Figures 2 and 3.

(A) Co-expression of two divergent gene pairs by RT-qPCR analysis during ESC differentiation induced by LIF

withdrawal (day 0 to 6, D0 to D6). Error bars represent standard deviations of mean expression normalized to

GAPDH (n=3 biological replicates).

(B) RNAi knockdown (KD) of Evx1as led to attenuated activation of EVX1 in day 4-differentiated ESCs. To control

for possible off-target effects of RNAi, we expressed seven different Evx1as shRNAs by either retrovirus (#1-5)

or lentivirus (#6-7, also shown in Figure 2C) and observed consistent decreases in EVX1 mRNA upon Evx1as

depletion. ‘shCtrl’ is the scrambled shRNA control.

(C) Pearson correlation plot of coexpression of Fendrr and its nearby protein-coding genes (±500 kb) across 17 mouse

tissues and cell types.

(D) and (E) RNAi knockdown on day 2 of retinoic acid (RA)-induced differentiation in N2B27 medium towards neural

and extraembryonic endodermal lineages. In panel (D), knockdown of lncRNA Lhx1os, Sox21as, Zdhhc4as and

Ier2as caused downregulation of the corresponding divergent protein-coding gene, LHX1, SOX21, ZDHHC4 and

IER2, respectively. In panel (E), knockdown of lncRNA Cacng6as, Graspas, Plekhd1as, Bcat2as, Atrnas and

Rbm27as did not affect the corresponding divergent protein-coding gene.

(F) RNAi knockdown (KD) of Igf1ras (RP11-35O15.1) downregulated the expression of IGF1R in MCF7 cells. In

panels (D, E and F), the y axis represents relative expression normalized to GAPDH and the scramble control. Data

are shown as mean ± s.d. (n=4, including 2 independent knockdowns and 2 technical replicates for each knockdown).

*p < 0.05.

(G) Co-activation of Gata6as and GATA6 during mouse embryonic development. The y axis represents fold changes

to 2-cell stage embryos. n=3 biological replicates.

(H) Schematic representation of two-cell injection experiment (related to Figures 3E and 3F). Scramble siRNA control

(siCtrl) or siRNAs against Gata6as or GATA6 were mixed with H2B-GFP mRNA and injected into one cell of

mouse embryos at the two-cell stage. H2B-GFP expression marks cells injected with siRNAs. Microinjected

embryos were cultured until blastocyst stage around E3.75 ~ E4, fixed and stained with anti-GATA6 antibody. In

each injected embryo, GATA6-positive and/or GFP-positive cells were counted. In scramble controls, the ratio of

cells expressing both GFP and GATA6 (GFP+; GATA6+) versus total numbers of cells expressing GATA6 (GATA6+)

should be equal to 50% because only one cell in the two-cell embryo is injected with GFP mRNA together with

siRNA. However, Gata6as RNAi embryos have a ratio lower than 50% as GATA6 expression is attenuated upon

Gata6as depletion.

(I) Heatmap of the expression of 12 randomly selected divergent coding/coding pairs that we successfully knocked

down. Gene names in red indicate that mRNA gene knockdown affected the nearby mRNA, while the blue indicates

that mRNA gene knockdown failed to affect the nearby mRNA. Gene names in grey indicate genes that failed to

be knocked down by RNAi.

(J) The effects of knockdown of genes shown in panel (I) on their nearby gene expression. The y axis represents

relative mean expression normalized to GAPDH and the scramble shRNA (Ctrl) cells. Data are shown as mean ±

SD (n=2 technical repeats). ** indicates significant changes elicited by knockdown of a nearby mRNA gene (p <

0.05).
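The scoring rule described in panel (H) reduces to a single ratio per embryo. The sketch below is an illustration only: the function name and the cell counts are hypothetical and are not data from the study.

```python
def double_positive_ratio(n_gfp_gata6: int, n_gata6: int) -> float:
    """Ratio of GFP+;GATA6+ cells to all GATA6+ cells in one embryo."""
    if n_gata6 <= 0:
        raise ValueError("embryo has no GATA6+ cells to score")
    if not 0 <= n_gfp_gata6 <= n_gata6:
        raise ValueError("double-positive count must lie between 0 and the GATA6+ count")
    return n_gfp_gata6 / n_gata6


# Hypothetical counts (illustration only, not data from the study)
control_ratio = double_positive_ratio(6, 12)    # scramble siRNA: expected near 0.5
knockdown_ratio = double_positive_ratio(3, 11)  # Gata6as siRNA: expected below 0.5
```

A ratio near 0.5 is the expectation when the injected half of the embryo contributes equally to the GATA6+ population; a lower ratio indicates fewer GATA6+ cells among the injected lineage.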

Figure S3. Loss-of-function analyses revealed a requirement for Evx1as in regulating EVX1 transcription.

Related to Figure 4.

(A) and (B) Highly correlated coexpression of Evx1as and EVX1 in human (A) and mouse (B). Both genes are highly

expressed in mesendoderm and mesoderm cells during early development, and are activated during ESC

differentiation at days 2, 4 and 6 (D2, D4 and D6) induced by LIF withdrawal (-LIF).

(C) Schematic diagram of CRISPR inhibition (CRISPRi). To investigate the effect of transcription on nearby gene

regulation, we performed CRISPRi to inhibit elongation of Evx1as or EVX1 transcripts and assayed the effects on

the other gene’s expression during ESC differentiation. Relative locations of sgRNAs (c, d, e and f) are shown. The

sgRNA c and f target the non-template strand of Evx1as or EVX1, respectively. The sgRNAs d and e target the

template strand of Evx1as or EVX1, respectively.

(D) RT-qPCR analysis of CRISPRi on day-3 differentiation induced by LIF withdrawal. The y axis shows relative

expression normalized to GAPDH and the control cells. The sgRNA c targeting the non-template strand of Evx1as

efficiently inhibited Evx1as transcription and significantly downregulated both pre-mRNA and mRNA levels of

EVX1. In contrast, the sgRNA d targeting the same region on the template strand (overlapped 9-bp with the sgRNA

c) did not affect Evx1as or EVX1 expression. In addition, the sgRNA f targeting the non-template strand of EVX1

moderately decreased EVX1 transcription, but failed to affect Evx1as expression.

(E) Schematic diagram of Evx1as and EVX1 knockout strategies by CRISPR/Cas9. Two knockout strategies to delete

Evx1as. Knockout (KO) #1 was generated with the sgRNA pair B and C, while KOs #2, 3 and 4 were generated

with the sgRNA pair A and D. The relative positions of the probe used for Southern blot analysis (brown bar) and

PCR primers (green arrows) are indicated. For EVX1 knock-outs, the sgRNA E was used to mutate EVX1 (KO #1).

The sgRNA pair E and F was used for KO #2, 3 and 4. EVX1 knockout mutations were confirmed by PCR and

sequencing shown in panel (I).

(F) PCR genotyping analysis of Evx1as-null ESCs (KO clone #1). PCR with the ‘Outside’ primers (1f and 1r) generated

a ~1.1 kb band representing the deletion allele. The wild-type allele would generate a ~3.3 kb band which is too

long to be detected due to the short extension time used (1 min). The ‘Inside’ PCR primers (1f with 3r, 4r or 2r)

detected the WT allele (indicated by red arrows) but failed to amplify deletion alleles.

(G) PCR genotyping analysis of Evx1as-null ESCs (KO clones #2-4). PCR with the ‘Outside’ primers (1f and 5r)

generated a ~800 bp band representing the deletion allele. The wild-type allele would generate a ~5.2 kb band, but

this is too long to be detected due to the short extension time used (1 min). The ‘Inside’ primers (1f and 2r/3r)

detected the WT allele but not the deletion alleles.

(H) Southern blotting of Evx1as-null clones. Genomic DNAs were digested by SexA1. The Southern probe is located

upstream of the deletion regions. The expected fragment sizes are ~7 kb for wild-type, ~4.8 kb for KO clone #1

and ~2.5 kb for KO clones #2-4. KO #1 shows one Southern band with the expected size (~4.8 kb). KO clones #2,

3 and 4 show the expected deletion in one allele but show various deletions or mutations in the other allele,

indicating imprecise cutting of CRISPR/Cas9 at the sgRNA targeting sites. Nevertheless, all four KO clones

showed blocked activation of Evx1as and EVX1 during ESC differentiation in Figure 4F.

(I) Sequencing analysis of the four EVX1 knockout ESC clones. KO #1 has a 19-bp deletion in exon 2 of EVX1,

resulting in a frame shift and disrupted homeodomain of EVX1. KO #2 contains a 1-bp insertion in exon 2 and a

78-bp deletion in intron 2, resulting in a nonsense STOP codon. KO #3 contains a 57-bp sequence replacement and

insertion in exon 2 and an 8-bp deletion in intron 2. KO #4 contains a 223-bp deletion covering the splicing junction

of exon 2 and intron 2 of EVX1.

(J) Evx1as and EVX1 expression induced by CRISPR-on in wild-type (WT) and Evx1as-null ESCs. The y axis shows

relative expression normalized to GAPDH and the wild-type cells.

(K) Knockdown of EVX1 by lentivirus-mediated RNAi failed to affect Evx1as expression in day 4 of ESC

differentiation. n=4 replicates, including four independent knockdowns by two shRNAs against EVX1.

In panels (D and J-K), data are shown as mean ± SD (n=3 independent experiments unless otherwise indicated). *

indicates p < 0.05 compared to the control.

Figure S4. Overexpression analysis of Evx1as and EVX1. Related to Figure 4.

(A) Overexpression (OE) of Evx1as or EVX1 in trans by transposon-mediated random integration in ESCs had no

effect on the transcription of the other gene. Data are shown as mean ± s.d. (n=3, biological replicates). *p < 0.05

compared to WT ESCs.

(B) Schematic diagram of two-step generation of CAG-promoter knockin (KI) ESCs by CRISPR/Cas9. The sgRNAs

a and b, which target the corresponding insertion sites, were used to facilitate homologous recombination. CRE

recombinase was used to excise the PGK-hygromycin resistance gene cassette.

(C) Southern blot analysis of Evx1as and EVX1 CAG KI clones. Genomic DNA was digested with Sca1 and Nde1 and

hybridized with the probe shown in panel (B). The expected bands for wild-type, Evx1as CAG KI and EVX1 CAG

KI are ~13.8 kb, ~10.1 kb, and ~7.5 kb, respectively.

(D) Expression of pluripotency (NANOG, OCT4, TCL1) and mesendodermal (T) marker genes in knockin ESCs. Data

are shown as mean ± s.d. (n=4, biological replicates). The normal pluripotency program was observed in Evx1as

CAG KI ESCs, ruling out the possibility that a change in cellular state caused the change in EVX1 expression.

(E) The effect of tethering Evx1as transcripts (the long isoform) and HOTTIP to the Evx1as/EVX1 promoter region.

Relative positions of sgRNAs a and b fused to RNA are shown in Figure 4D. A sgRNA targeting a non-related

genomic sequence (the TSS of REX1) fused with RNA serves as the negative control. Tethering the long isoform

(l) of Evx1as RNA to the promoter of Evx1as/EVX1 significantly increased the levels of EVX1 pre-mRNAs and

mRNAs. In comparison, tethering the lncRNA HOTTIP known to be involved in transcription activation failed to

increase the levels of EVX1.

(F) The effect of tethering EVX1 transcripts to the Evx1as/EVX1 promoter region.

(G) RNA tethering had no effect on REX1 expression by RT-qPCR. As REX1 is highly expressed in pluripotent ESCs,

tethering Evx1as to its promoter cannot further enhance the transcription of REX1 because of its strong endogenous

promoter activity. In addition, Evx1as RNA transcripts specifically bind to its own locus and 3’ downstream regions

on chromatin, suggesting that its regulatory function may require specific genomic sequences or chromatin context.

The short (s) and long (l) isoforms, the reverse transcript of the short isoform (rs) of Evx1as, HOTTIP and GFP

were fused with the sgRNAs a, b and REX1. In panels (E-G), the y axis represents relative expression normalized

to the corresponding RNA transcripts tethered with the sgRNA(REX1). Data are shown as mean ± s.d. (n=3

independent transfection experiments).

Figure S5. Mechanistic investigation of Evx1as function. Related to Figure 5.

(A) Subcellular distribution of Evx1as and EVX1 transcripts. GAPDH, U1 and Xist RNAs serve as fractionation

controls for cytosolic, nuclear and chromatin fractions, respectively. Evx1as transcripts are detected in both the

cytoplasm and the nucleus, and nuclear Evx1as primarily binds to chromatin.

(B) RNA FISH of Evx1as RNA in undifferentiated (D0) and day 4 (D4)-differentiated ESCs. The big red box contains

an enlarged view of the small boxed area. Scale bar, 10 μm. The pattern of two nuclear signals of Evx1as was only

observed in day-4 differentiated ESCs, but not in undifferentiated ESCs with negligible expression of Evx1as,

indicating the specificity of the FISH probes to recognize Evx1as RNA instead of its DNA locus. Cytosolic Evx1as

transcripts may be diffuse in the cytoplasm, resulting in a low cytosolic concentration that is difficult to

detect by FISH. The results from RNA FISH and qPCR analysis are therefore not contradictory.

(C) A statistical summary of Evx1as RNA FISH shown in panel (B). Grey, dark blue and light blue boxes represent

ESCs with no detectable Evx1as signal, or with one or two Evx1as signal spots, respectively. Numbers of cells

analyzed are indicated.

(D) RT-qPCR analysis of RNA transcripts captured by ChIRP in day-4 differentiated ESCs. Evx1as probes specifically

pulled down Evx1as RNA, while EVX1 probes specifically pulled down EVX1 mRNA. GAPDH and T serve as

negative controls. The y axis shows fold enrichment normalized to GAPDH.

(E) Evx1as ChIRP-seq analysis in D0 and D4 differentiated ESCs. Peaks (p < 1×10⁻⁵) called by the MACS program are

shown by vertical bars in pink.

(F) The Evx1as/EVX1 locus in the genome browser. Tracks of Evx1as ChIRP-seq (zoom-in view) are shown at the top.

Sequencing tracks of ribominus total RNA-seq performed on the SOLiD sequencing platform are shown below.

No or few signal reads are detected beyond the TES of Evx1as in total or polyA enriched RNA-seq. In comparison,

the ChIRP-seq signals extend >20 kb downstream of the Evx1as locus, reflecting chromatin association of Evx1as

transcripts rather than the full extent of Evx1as RNA transcription.

Two biological replicates of SMC1 ChIA-PET indicate two PET peaks (that is, interaction hubs) at both the

promoter and potential enhancer of the EVX1/Evx1as. A high-confidence interaction between the potential

enhancer and promoter is identified by a ChIA-PET analysis program. Multiple sequencing tags that connect the

potential enhancer (boxed in green) and the promoter region of Evx1as/EVX1 are shown at the bottom.

(G) Southern blotting of two enhancer knockout ESC lines that were isolated independently. The region deleted in the

enhancer knockouts is boxed in green in panel (E).

(H) RT-qPCR analysis of Evx1as and EVX1 expression in enhancer knockout ESCs on day 4 of LIF withdrawal. n =

4, including 2 PCR repeats for 2 independent differentiation experiments.

(I) The effect of tethering Evx1as transcripts (short isoform) to the Evx1as/EVX1 enhancer. A sgRNA targeting a non-

related genomic sequence (the TSS of REX1) fused with Evx1as RNA serves as the negative control. n = 3

independent transfection experiments.

(J) ChIP-qPCR of MED12 at the Evx1as/EVX1 promoter in day-0 (D0) or day-4 (D4) differentiated wild-type ESCs.

The y axis shows fold enrichment relative to the IgG. ‘nc’ represents an unrelated genomic region (primers CSa).

(K) ChIP-qPCR of CTCF in wild-type (WT) and Evx1as-null ESCs in day-0 and day-4 differentiation.

(L) In vitro pull-down assay. In vitro transcribed, biotin-labeled Evx1as RNA (short and long isoforms) and GFP were

used to pull down Mediator proteins in cell lysates of day-4 differentiated ESCs. The long isoform (l) of Evx1as

as well as its reverse form (rl) captures both MED1 and MED12, whereas the short isoform (s) as well as its

reverse form (rs) interacts with MED1 but not MED12. We suspected that the reverse form of Evx1as transcripts

might form similar secondary structures that can be recognized by the Mediator in vitro, which may not serve as

a good control. In contrast, the biotin-labeled GFP control RNA failed to capture MED1 and MED12 in vitro.

(M) Knockdown of MED12 attenuates activation of Evx1as and EVX1 in day-4 differentiation induced by LIF

withdrawal. Two shRNAs against MED12 were used. ‘Ctrl’ is the scrambled shRNA control. All PCR data are

shown as mean ± SD (n=4, including 2 biological and 2 technical replicates unless otherwise indicated). *p < 0.05.

Figure S6. Evx1as and EVX1 are required for mesendodermal differentiation. Related to Figures 6 and 7.

(A) Standard curves for qPCR detection of Evx1as and EVX1 in single-cell analysis. Evx1as and EVX1 have similar

amplification efficiency. Purified PCR products of Evx1as and EVX1 with known concentration were used as the

template. n=3 replicates.

(B) Box plot of the numbers of RNA molecules in Evx1as- or EVX1-expressing cells on day-4 differentiation by direct

RT-qPCR without a pre-amplification step. A threshold of >2 RNA molecules for either gene was chosen for cells

expressing Evx1as or EVX1. In conditions omitting the amplification step, about 36.4 Evx1as or 763 EVX1

transcripts per cell are detected at the median level in day 4-differentiated ESCs. Because the range of transcript

numbers per cell was similar with or without amplification, data were pooled in Figures 6B-6D.

(C) Gene set enrichment analysis (GSEA) shows that Evx1as-null ESCs exhibit no significant global changes in the

expression of NPC and XEN genes compared to wild-type cells on day 4 of LIF withdrawal, suggesting that

Evx1as is specifically required for ME differentiation. NPC-high (198) and XEN-high genes (35) were previously

defined as the set of genes that are highly expressed in neural precursor cells and extraembryonic endoderm,

respectively (Table S6C). Normalized enrichment score (NES) and nominal p values are shown in each plot.

(D) ESCs depleted of Evx1as by RNAi knockdown (KD) on day 4 of LIF withdrawal show global downregulation of

ME genes by GSEA, which is consistent with Evx1as-null ESCs in Figure 7B. Normalized enrichment score (NES)

and nominal p values are shown.

(E) Heatmap of fold changes of representative genes shown in Figure 7D. RNA-seq analysis of knockout (KO) and

knockdown (KD) ESCs of Evx1as or EVX1 on day 4 of LIF withdrawal showed similar expression changes.

(F) A set of 79 genes are downregulated in both knockout mutants (fold-change >2 and p<0.05).

(G) GO analysis of the common downregulated genes (79) in both knockout mutants. These genes are significantly

enriched in development-related terms. Many of them are known regulators of mesodermal development,

including WNT5A, FGF18, TGFB2, SP8, TGFB1I1, HOXB1, ESX1, NKX2-9, WNT10B and EDN1.

(H) RT-qPCR analysis of differentiation marker genes in ESCs overexpressing Evx1as in trans by PB

transposon-mediated random integration. Panel (i) shows the fold upregulation of Evx1as in differentiated day-4 ESCs.

Panel (ii) shows no obvious changes in EVX1 and marker genes when ectopically overexpressing Evx1as. Data

are shown as mean ± s.d. (n=3, biological replicates). *p < 0.05 compared to the control ESCs expressing PB-GFP.
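The standard curves of panel (A) can be used to convert a raw Ct value into a copy number, as in the hedged sketch below. The slopes and intercepts are the trendline values printed in the Figure S6A panel. The efficiency formula is an assumption: expressing the per-cycle amplification factor, 10^(-1/slope), as a fraction of perfect doubling reproduces the percentages printed in the panel, but the source does not state how they were calculated. All function names are hypothetical.

```python
# Trendlines from the Figure S6A panel: Ct = slope * log10(copies) + intercept
STANDARD_CURVES = {
    "Evx1as": (-3.9307, 42.226),
    "EVX1": (-4.178, 44.665),
}


def amplification_factor(slope: float) -> float:
    """Fold increase in product per PCR cycle implied by a standard-curve slope."""
    return 10 ** (-1.0 / slope)


def efficiency_percent(slope: float) -> float:
    """Amplification factor relative to perfect doubling (assumed definition)."""
    return 100.0 * amplification_factor(slope) / 2.0


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate the copy number for a raw Ct."""
    return 10 ** ((ct - intercept) / slope)


slope, intercept = STANDARD_CURVES["Evx1as"]
evx1as_efficiency = round(efficiency_percent(slope), 1)
# A Ct two log10 units along the Evx1as curve corresponds to about 100 copies
example_copies = copies_from_ct(intercept + 2 * slope, slope, intercept)
```

Under this assumed definition the two trendlines give 89.8% and 86.8%, matching the values in the panel.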

SUPPLEMENTAL EXPERIMENTAL PROCEDURES

LncRNA annotation

Due to the continuously growing numbers of lncRNA genes that have been identified in recent years, we first

compiled a non-redundant yet comprehensive list of lncRNAs for both human and mouse. We used Cuffcompare

(Trapnell et al., 2012) to assemble all lncRNAs annotated in RefSeq, GENCODE, UCSC and Ensembl in mouse (mm10)

or human (hg19). We first extracted the set of noncoding genes (NR_*) from RefSeq (Pruitt et al., 2014) and used it as

the starting reference. We then used Cuffcompare to add in non-redundant noncoding transcripts annotated in

GENCODE (Harrow et al., 2012) to the starting set of RefSeq noncoding transcripts. Only noncoding transcripts with

class codes “i” (a transfrag falling entirely within a reference intron), “u” (unknown, intergenic transcript), and “x”

(exonic overlap with reference on the opposite strand) were kept and combined with the previous set to form a new

reference set of noncoding transcripts for subsequent comparison. This procedure was reiterated to add in non-redundant

annotations from UCSC (Karolchik et al., 2014) and Ensembl (Flicek et al., 2014). Finally, we removed all noncoding

transcripts shorter than 200 nt and then used Cuffcompare to filter out transcripts overlapping largely with known

protein-coding gene exons to obtain the final list of lncRNAs. Isoforms of lncRNA transcripts were combined to

generate non-redundant lists of lncRNA genes for both mouse and human. In total, we annotated 20,489 human

transcripts corresponding to 14,801 lncRNA genes, and 7,385 mouse transcripts corresponding to 6,240 lncRNA genes.
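
The selection criteria described above (Cuffcompare class codes "i", "u" and "x", and a minimum length of 200 nt) can be sketched in Python. This is an illustrative sketch only: the `Transcript` record and the function name are hypothetical, the two filters are applied in one pass for brevity although the text applies the length filter at the end, and the code does not reproduce Cuffcompare itself.

```python
from dataclasses import dataclass

# Class codes kept in the text: intronic ("i"), intergenic ("u"),
# and exonic overlap on the opposite strand ("x").
KEPT_CLASS_CODES = {"i", "u", "x"}
MIN_LENGTH_NT = 200  # transcripts shorter than this are removed


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record for one noncoding transcript."""
    transcript_id: str
    class_code: str  # class code assigned relative to the current reference set
    length: int      # spliced transcript length in nt


def select_novel_noncoding(candidates):
    """Keep candidates that are non-redundant with the reference and long enough."""
    return [
        t for t in candidates
        if t.class_code in KEPT_CLASS_CODES and t.length >= MIN_LENGTH_NT
    ]
```

In this sketch, the transcripts returned for one annotation source would be merged into the reference set before the next source is compared.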

LncRNA classification

We used 5 kb as the cutoff distance for classification. LncRNA transcripts located at least 5 kb away from the gene

body of a protein-coding gene were classified as intergenic lncRNAs (referred to as the ‘lincRNA’ biotype). LncRNAs

< 5kb from the nearest protein-coding gene were further subdivided into ‘antisense’ (XH, XT, XI and XO) or ‘sense’

(SD and SU) biotypes (Figure 1C) according to their position and transcriptional orientation relative to the protein-

coding gene.

An XH lncRNA is defined as antisense and head-to-head (or divergent) relative to the nearby protein-coding

transcript. In the XH biotype, the difference between the genomic coordinates of the two transcription start sites (TSSs)

must be less than 5 kb, and the genomic coordinates of both TSSs must fall within the range of the two transcription

end sites (TESs). An XT lncRNA is defined as antisense and tail-to-tail relative to the nearby protein-coding transcript.

In the XT biotype, the difference between genomic coordinates of the two TESs must be less than 5 kb, and the genomic

coordinates of both TESs must fall within the range of the two TSSs. For the XI biotype (antisense-inside), a lncRNA

transcript must fall within the nearby protein-coding gene locus. For the XO biotype (antisense-outside), a lncRNA

transcript must completely cover a protein-coding transcript. A sense lncRNA located downstream of or contained

within a protein-coding gene is defined as the SD biotype. A sense lncRNA located upstream of or covering a

protein-coding gene locus is defined as the SU biotype.

Transcript pairs were then consolidated into gene pairs to remove redundant pairing in a locus. Because a gene

may have several nearby genes within a 5 kb region, a lncRNA or protein-coding gene may fall into multiple biotypes.

Protein-coding gene pairs (‘c/c’) or lncRNA pairs (‘r/r’) were classified similarly except for “XIO” and “SDU” which

contain inside/outside or downstream/upstream, respectively.
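
The pairwise rules above can be sketched as a small Python classifier. This is a minimal sketch under stated assumptions: the `Tx` record and `classify_pair` function are hypothetical names, the order in which the antisense rules are tested (XI, XO, XH, XT) is an assumption because the text does not state a precedence, and partially overlapping sense pairs are left unclassified.

```python
from dataclasses import dataclass
from typing import Optional

CUTOFF_BP = 5_000  # 5 kb cutoff distance used for classification


@dataclass(frozen=True)
class Tx:
    """Hypothetical transcript record on one chromosome."""
    start: int   # leftmost genomic coordinate
    end: int     # rightmost genomic coordinate (start < end)
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        return self.start if self.strand == "+" else self.end

    @property
    def tes(self) -> int:
        return self.end if self.strand == "+" else self.start


def _between(x: int, a: int, b: int) -> bool:
    return min(a, b) <= x <= max(a, b)


def classify_pair(lnc: Tx, coding: Tx) -> Optional[str]:
    """Return XH, XT, XI, XO, SD or SU, or None if no rule applies."""
    gap = max(0, max(lnc.start, coding.start) - min(lnc.end, coding.end))
    if gap >= CUTOFF_BP:
        return None  # a lncRNA with no coding gene within 5 kb is a lincRNA
    inside = coding.start <= lnc.start and lnc.end <= coding.end
    covers = lnc.start <= coding.start and coding.end <= lnc.end
    if lnc.strand != coding.strand:  # antisense biotypes
        if inside:
            return "XI"
        if covers:
            return "XO"
        tss_pair = (lnc.tss, coding.tss)
        tes_pair = (lnc.tes, coding.tes)
        if abs(lnc.tss - coding.tss) < CUTOFF_BP and all(
            _between(t, *tes_pair) for t in tss_pair
        ):
            return "XH"  # head-to-head (divergent)
        if abs(lnc.tes - coding.tes) < CUTOFF_BP and all(
            _between(t, *tss_pair) for t in tes_pair
        ):
            return "XT"  # tail-to-tail
        return None
    # Sense biotypes
    if inside:
        return "SD"
    if covers:
        return "SU"
    if coding.strand == "+":
        downstream, upstream = lnc.start > coding.end, lnc.end < coding.start
    else:
        downstream, upstream = lnc.end < coding.start, lnc.start > coding.end
    if downstream:
        return "SD"
    if upstream:
        return "SU"
    return None
```

For example, a minus-strand lncRNA at 1,000-2,000 next to a plus-strand coding transcript at 2,500-5,000 has TSSs 500 bp apart that both lie between the two TESs, so the pair is classified as XH.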

Neighboring gene and simulation analysis

We calculated the fraction of protein-coding genes that are located in a defined genomic distance from a neighboring

lncRNA gene or coding gene, and compared the observed values to simulated distributions obtained by random

positioning (Figure 1B). We only kept the longest transcript for each gene for analysis of neighboring genes. To explore

a potential distance effect on the analysis, we used different distances ranging from 0 to 20 kb with a 1-kb step size for

neighbor definition. For simulation, we randomly rearranged gene loci for all genes in the human genome (3.3 × 10^9 bp)

ten times, and the average values are shown in Figure 1B.
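
The comparison between observed and randomly positioned genes can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, the genome is treated as a single coordinate axis, genes are (start, end) tuples, and randomly placed genes are allowed to overlap one another.

```python
import random


def gap(a, b):
    """Distance in bp between two (start, end) intervals; 0 if they overlap."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def fraction_with_neighbor(coding, lncrnas, max_dist):
    """Fraction of coding genes with at least one lncRNA within max_dist bp."""
    if not coding:
        return 0.0
    hits = sum(any(gap(c, l) <= max_dist for l in lncrnas) for c in coding)
    return hits / len(coding)


def shuffle_positions(genes, genome_size, rng):
    """Place each gene at a uniformly random position, preserving its length."""
    placed = []
    for start, end in genes:
        length = end - start
        new_start = rng.randrange(0, genome_size - length)
        placed.append((new_start, new_start + length))
    return placed


def simulated_fraction(coding, lncrnas, max_dist, genome_size, n_iter=10, seed=0):
    """Average neighbor fraction over n_iter random rearrangements."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_iter):
        total += fraction_with_neighbor(
            shuffle_positions(coding, genome_size, rng),
            shuffle_positions(lncrnas, genome_size, rng),
            max_dist,
        )
    return total / n_iter
```

Calling `fraction_with_neighbor` for each distance from 0 to 20,000 bp in 1,000-bp steps, alongside `simulated_fraction` with `n_iter=10`, would give the observed and simulated curves. The all-against-all distance check is quadratic and only suitable for small inputs.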

Coexpression correlation analysis

To reveal potential interactions between lncRNA genes and neighboring genes, we analyzed expression

correlations of each lncRNA with its nearest ten genes within a range of 500-kb distance by pairwise Pearson correlation

analysis across 23 human tissues (Figures 1A and 1D). For coexpression analysis in Figure S1C, Pearson correlation

coefficients (c.c) were calculated for each gene pair across 17 mouse and 23 human tissues shown in Table S8. Only

gene pairs with at least one gene that is expressed (cutoff for protein-coding genes: FPKM >5 and for lncRNA:

FPKM >1) in at least one tissue type were kept for this analysis. The closest protein-coding gene was identified for

each lincRNA (Table S2). These lincRNA/coding gene pairs were used as a negative control. Divergent XH

lncRNA/coding pairs show higher positive correlations than the control XHc/c pairs and the lincRNA/coding pairs in

both human and mouse (Wilcoxon p < 5 × 10^-6, Figure S1C).
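
The coexpression measure and the expression filter described above can be sketched in plain Python. The function names are hypothetical; the cutoffs (FPKM >5 for protein-coding genes, FPKM >1 for lncRNAs) are taken from the text, and each input is assumed to be a list of FPKM values in the same tissue order.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one gene has constant expression
    return sxy / math.sqrt(sxx * syy)


def pair_is_expressed(lnc_fpkm, coding_fpkm, lnc_cutoff=1.0, coding_cutoff=5.0):
    """True if at least one gene of the pair passes its cutoff in any tissue."""
    return max(lnc_fpkm) > lnc_cutoff or max(coding_fpkm) > coding_cutoff
```

Only pairs for which `pair_is_expressed` returns True would be passed to `pearson`.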

Gene Ontology (GO) and phenotype ontology analysis

GO analysis was performed using DAVID bioinformatics tools (Huang da et al., 2009). We considered a particular

GO term to be significantly enriched if it has an enrichment score larger than 1.5 and a p value less than 1 × 10^-6. A total

of 2,517 out of 2,714 protein-coding genes neighboring XH lncRNAs, 1,379 of 1,514 in XT, 2,154 of 2,325 in XI, 280

of 305 in XO, 1,369 of 1,509 in SD, 468 of 521 in SU, and 2,605 of 2,911 in XHc/c in human were annotated for GO

terms. GO analysis was performed similarly for mouse genes and genes that overlapped in both human and mouse.

We used GREAT (Genomic Regions Enrichment of Annotations Tool) (McLean et al., 2010) to analyze

mammalian phenotype ontology terms defined by Mouse Genome Informatics (MGI) on the above sets of

protein-coding genes associated with human lncRNA biotypes. Significantly enriched phenotype terms (hypergeometric

p value < 1 × 10^-6) were selected and plotted based on [-log10(p value)] in Figure 1G.
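
For readers unfamiliar with the hypergeometric test used for phenotype-term enrichment, a generic upper-tail calculation can be sketched as follows. This illustrates the statistic only and is not the GREAT implementation; the function and parameter names are hypothetical.

```python
from math import comb

P_CUTOFF = 1e-6  # significance cutoff stated in the text


def hypergeom_upper_tail(k, population, annotated, sample):
    """P(X >= k) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the term."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favorable / total


def is_enriched(k, population, annotated, sample, cutoff=P_CUTOFF):
    """True if the upper-tail p value is below the cutoff."""
    return hypergeom_upper_tail(k, population, annotated, sample) < cutoff
```

Here `k` is the number of genes in the query set that carry the term, `sample` is the size of the query set, and `population` is the size of the background gene set.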

Evolutionary age analysis

To gain perspective on the evolution of divergent lncRNAs, we dated lncRNA genes on

the vertebrate phylogenetic tree by following a previous strategy (Zhang et al., 2010). We filtered out sequences

overlapping with protein-coding exons to avoid bias caused by neighboring genes (Zhang et al., 2010). Out of all

vertebrate genome sequences targeted by the UCSC genome alignment pipeline, we chose a subset with relatively good

assembly quality (as revealed by larger contig N50s) as the outgroup species. For each gene of interest in human or

mouse, we inferred the phylogenetic distribution of its orthologs based on pair-wise syntenic genomic alignment from

the UCSC website and determined the time when this locus originated by following the parsimony rule. Since the

pipeline mainly depends on the chromosomal coordinates, overlapping exons between lncRNAs and coding genes were

removed first before orthology inference. Moreover, the pipeline works at the DNA sequence level and does not

consider whether the corresponding orthologous locus in each outgroup can be transcribed or not. Thus, the age

assignment represents a conservative or upper bound estimate of the time of origin of the gene of interest. In human,

divergent lncRNAs are more ancient, with a mean age of 4.79, than lincRNAs (mean age 5.73, Wilcoxon p < 2.2 × 10^-16)

and other lncRNA biotypes (mean ages ranging from 4.85 to 5.11). Similarly in mouse, divergent lncRNAs have a mean age of

5.77 compared with 6.29 for lincRNAs (Wilcoxon p < 3.4 × 10^-9) and 5.70-7.10 for other lncRNA biotypes.
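
The parsimony-based age assignment can be illustrated with a short sketch. The branch numbering here is an assumption made for illustration (0 for the most ancient branch, larger numbers for younger branches, consistent with smaller mean ages being described as more ancient); the function names are hypothetical and the details of the published dating pipeline (Zhang et al., 2010) are not reproduced.

```python
def assign_branch(branches_with_ortholog, youngest_branch):
    """Assign a locus to the most ancient branch at which a syntenic
    orthologous alignment is detected; if none is detected in any outgroup,
    assign it to the youngest (species-specific) branch."""
    hits = [b for b in branches_with_ortholog if 0 <= b < youngest_branch]
    return min(hits) if hits else youngest_branch


def mean_age(ages):
    """Mean branch assignment for a set of genes."""
    ages = list(ages)
    if not ages:
        raise ValueError("no genes to average")
    return sum(ages) / len(ages)
```

Because the assignment depends only on DNA-level alignment, a locus can be dated to a branch where it is not yet transcribed, which is why the text treats the result as an upper-bound estimate of the time of origin.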

ESC culture, differentiation and reprogramming

Wild-type (CJ9, 46C) and various knockout and knockin embryonic stem cells (ESCs) were cultured on gelatin-

coated plates in standard ESC medium consisting of DMEM (Cellgro) supplemented with 15% heat-inactivated fetal

bovine serum (Hyclone), 1% Glutamax (GIBCO), 1% Penicillin/Streptomycin (Cellgro), 1% nucleoside (Millipore),

0.1mM 2-mercaptoethanol (GIBCO), 1% MEM nonessential amino acids (Cellgro), and 1000U/ml recombinant

leukemia inhibitory factor (Millipore). Mesendodermal (ME) differentiation of ESCs was induced by LIF withdrawal.

ME cells were derived from ESCs carrying brachyury (T)-driven GFP and enriched by FACS sorting of day-6

differentiated ESCs positive for T-GFP expression (Shen et al., 2009; Shen et al., 2008). NPCs (neural precursor cells)

and NSCs (neural stem cells) were derived from 46C ESCs carrying a SOX1 promoter driven-GFP reporter (Conti et

al., 2005). For differentiation towards neural and extraembryonic endoderm lineages, 46C cells were plated in N2B27

medium supplemented with 2 µM of retinoic acid (RA) (Okada et al., 2004; Yin et al., 2015; Ying and Smith, 2003).

Differentiated cells were harvested at the indicated time points from day 0 to day 6 for gene expression analysis.

Reprogramming of pre-iPSCs to induced pluripotent stem cells (iPSCs) was performed as described previously

(Fidalgo et al., 2012; Theunissen et al., 2011). The pre-iPSCs were first infected with lentivirus expressing a scramble

shRNA control or shRNAs against Ccnyl1as and then co-transfected with pBASE transposase and a PiggyBac (PB)

transposon carrying a NANOG-expressing cassette. Transduced pre-iPSCs were selected by puromycin (for RNAi)

and hygromycin (for PB-mediated NANOG overexpression) for >4 days and seeded (1 × 10^5 cells per well) on a

six-well plate in ESC medium (serum plus LIF) for 4 days, and then switched to serum-free N2B27 medium supplemented

with LIF and 2i (dual inhibition of mitogen-activated protein kinase signaling [PD0325901, 1 µM] and glycogen

synthase kinase-3 (GSK3) [CHIR99021, 3 µM]). After 10 days in 2i/LIF medium, iPSC clones positive for OCT4-GFP

were picked, expanded and analyzed by RT-qPCR.

RNA interference (RNAi)

We randomly chose 41 divergent lncRNA genes among the list of lncRNAs that contain more than one exon and

are upregulated during RA-induced differentiation of ESCs with FPKM of a lncRNA >1 and FPKM of a paired protein-

coding gene >1. We constructed 3 shRNAs for each lncRNA and performed RNAi twice for a total of 123 shRNA

constructs. Only 16 lncRNAs were knocked down by at least one shRNA, and knockdown of 10 of them attenuated the

transcription of the nearby coding gene.

For divergent protein-coding/coding pairs, we randomly picked 24 genes in 12 pairs that are expressed (FPKM>1)

during RA differentiation. We used 3 shRNAs for each protein-coding gene and performed RNAi for a total of 72

shRNA constructs. We successfully knocked down 20 genes in 12 pairs by at least one shRNA. Among them, only 4

protein-coding genes in 3 divergent pairs appeared to have a positive effect on nearby gene transcription upon depletion.
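
The counts reported for the two RNAi screens can be summarized with a short calculation. The function name and the returned keys are hypothetical; all input numbers are taken from the text above.

```python
def screen_summary(n_targets, n_knocked_down, n_with_effect, shrnas_per_target=3):
    """Summarize an RNAi screen: construct count, knockdown success rate, and
    the fraction of successful knockdowns that reduced nearby gene expression."""
    return {
        "constructs": n_targets * shrnas_per_target,
        "knockdown_rate": n_knocked_down / n_targets,
        "effect_rate": n_with_effect / n_knocked_down,
    }


# Divergent lncRNAs: 41 targets, 16 knocked down, 10 affecting the nearby gene.
lncrna_screen = screen_summary(41, 16, 10)
# Divergent coding/coding pairs: 24 genes, 20 knocked down, 4 affecting the neighbor.
coding_screen = screen_summary(24, 20, 4)
```

With these inputs, 62.5% of successfully depleted divergent lncRNAs affected the neighboring gene, compared with 20% of successfully depleted protein-coding genes.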

RNAi was performed as described previously (Shen et al., 2008). For Evx1as knockdown, we used both retrovirus-

mediated (LUMPIG vector) (Wang et al., 2007) and lentivirus-mediated (pLKO vector) (Moffat et al., 2006) RNAi. To

achieve consistent knockdown with high efficiency, we subsequently used lentivirus-mediated RNAi (pLKO) for other

RNAi experiments except for Gata6as. Lentivirus was packaged and generated in 293T cells. Infected ESCs or MCF7

were selected by puromycin for 48 hours before harvesting for RNA analysis. For differentiation experiments, infected

ESCs were lifted 48 hours post-selection by puromycin and plated at low density in various differentiation culture media

without addition of puromycin. Error bars were based on different shRNAs or independent infection experiments

(n ≥ 3).

Microinjection of siRNAs into mouse zygotes was performed as described previously (Sharif et al., 2010). Mouse

zygotes or two-cell embryos from superovulated C57BL/6 females mated with CBA males were collected in fresh M2

medium. For RNA analysis shown in Figure 3D, 250 ng/µl of scramble siRNA control or siRNAs against Gata6as were

injected into one-cell embryos. To study the effect of Gata6as knockdown on the number of cells expressing GATA6

shown in Figures 3E and 3F, we co-injected 250 ng/µl of a scramble siRNA control or siRNAs against Gata6as or

GATA6 with H2B-GFP mRNA into one cell of two-cell embryos (Figure S2H). H2B-GFP expression marks the cells

that are injected with siRNAs. Five different siRNAs against Gata6as were separated into two siRNA pools for

injection. Microinjected zygotes (20-30 embryos per injection) were cultured in KSOM medium for 2-3 days.

Blastocysts were harvested for RNA analysis or fixed and stained with anti-GATA6 antibody (R&D Systems, AF1700).

For RT-qPCR of embryos, expression was normalized to TUB4 and error bars were calculated from four independent

microinjection experiments. Primers for shRNAs and siRNAs are listed in Table S7.

Antisense oligonucleotide (ASO) treatment

An alternative gene-silencing approach is based on treatment with locked nucleic acid-based anti-sense

oligonucleotide gapmers (ASOs), which induce RNA degradation by recruiting RNase H to their target RNAs in a

strand-specific manner (Wheeler et al., 2012). The mechanism is distinct from the RNAi approach, which involves

AGO/RISC complexes. As it is difficult to sustain the levels of transfected ASO necessary for efficient knockdown

during 4 days of ESC differentiation, we assayed the effect of ASO treatment of Evx1as while we artificially activated

Evx1as and EVX1 by CRISPR-on (Konermann et al., 2015). ASO treatment was performed as previously described

(Yin et al., 2012). We synthesized five antisense phosphorothioate-modified oligodeoxynucleotides (ASO) targeting

Evx1as RNA (BioSune, in Shanghai, China). In Figures 4E, 4G and 5D, five ASOs were mixed together for Evx1as

knockdown. The control ASO was the same as previously used (Yin et al., 2012). For ASO treatment in CRISPR-on

experiment, ESCs were first transfected with CRISPR-on components including dCas9-VP64, MS2-P65-HSF1 and

sgRNA (fused with MS2). After 12 hours of transfection, ESCs were transfected again with ASO (Lipofectamine 3000,

Life Technology). Transfected cells were then cultured for 2 days before harvest for RNA analysis.

CRISPR/Cas9-mediated knockout and knockin

Plasmids expressing Cas9 (Addgene ID 44758; ‘pST1374-N-NLS-flag-linker-Cas9’) and sgRNAs (‘pGL3-U6-

sgRNA’) were mixed in a 1:1 ratio and cotransfected into ESCs by Lipofectamine 2000 (Mali et al., 2013; Shen et al.,

2013; Zhou et al., 2014) (Life Technologies, #200059-61). For knockout, Cas9 and two sgRNAs flanking the genomic

regions to be deleted were cotransfected into ESCs. For knockin, considering that CRISPR/Cas9-mediated excision in

a targeted genomic region might enhance the rate of homologous recombination in ESCs, we cotransfected a targeting

vector for homologous recombination with Cas9, one sgRNA that targets the site of insertion and one sgRNA that

targets the vector for linearization. Targeting plasmids contain a CAG promoter sequence and a hygromycin selection

cassette (PGK-hygro) which is flanked by two loxP sites and can be excised by CRE recombinase. The CAG and PGK-

hygro cassettes are flanked by 5’ and 3’ homology arms (~1 kb in length) which allow homologous recombination and

precise insertion into targeting sites.

ESCs were selected by puromycin (for Cas9-expressing cells) or hygromycin (for knockin vectors) 12 hours post

transfection for 2 days and a portion of them were subjected to genomic DNA isolation and PCR analysis of deletion

or insertion alleles. ESCs transfected with Cas9 alone were used as a negative control for PCR analysis. If deletion or

insertion was detected in mixed cells, ESCs were then plated at low density in 10-cm plates and clones were picked in

6-7 days. Individual ESC clones (24-96 clones) were picked, expanded and analyzed by PCR genotyping and Southern

blotting.

CAG knockin ESCs were transfected with CRE-GFP plasmids to excise the PGK-hygro cassette. CRE-GFP

positive cells were FACS sorted and plated. Single ESC clones were picked, expanded and analyzed by PCR. Primers

and sgRNA sequences are listed in Table S7.

CRISPR-mediated activation (CRISPR-on), interference (CRISPRi)

CRISPR-on was performed as previously described (Konermann et al., 2015). Plasmids expressing dCas9-VP64

(Addgene #61425), MS2-P65-HSF1 (Addgene #61426) and sgRNA (fused with MS2) (Addgene #61427) were

cotransfected into ESCs by Lipofectamine 2000 (Life Technologies). ESCs were selected with hygromycin and

blasticidin for 2 days before RNA isolation and RT-qPCR analysis. All sgRNAs used for CRISPR-on are fused with an

MS2 aptamer.

CRISPRi was performed as previously described (Qi et al., 2013). It has been reported that cotransfection of

catalytically inactive CAS9 (dCAS9) with an sgRNA targeting the non-template DNA strand of a gene, but not the

template strand, can block elongation of transcripts (Qi et al., 2013; Rossi et al., 2015). Two sgRNAs were designed to

target the same DNA location which is ~52 bp downstream of Evx1as or EVX1 TSS, one targeting the non-template

strand which is expected to effectively inhibit transcription elongation of targeted genes and one targeting the template

strand, which is expected to have no effect on transcription elongation. After one day of ESC differentiation induced by LIF withdrawal,

sgRNAs were co-transfected with dCas9. Transfected cells were harvested after 2 days for RT-qPCR.

RNA tethering

RNA-tethering was performed as previously described (Shechner et al., 2015). RNA is fused with an sgRNA at

its 5’ end and a U1 3’box at its 3’end. The short (s) and long (l) isoforms of Evx1as RNA as well as a control expressing

the reverse sequence (rs) of the Evx1as short isoform, GFP and HOTTIP RNA were fused to sgRNAs targeting the

promoter, potential enhancer and a non-related region (the immediate upstream of the TSS of REX1). Then, the fusion

RNA-sgRNA constructs were co-transfected with dCas9 into ESCs for 2 days before harvest for RT-qPCR. The length

of RNA transcripts used in the tethering experiments is: 621 nt for Evx1as (s) or (rs); 2789 nt for Evx1as (l) or (rl);

2877 nt for EVX1; 1962 nt for the lncRNA HOTTIP; 720 nt for GFP RNA. Sequences for sgRNAs are listed in Table

S7.

Transposon-mediated overexpression

The full-length, long isoform of Evx1as in day-4 differentiated ESCs was cloned by 5’ and 3’ rapid amplification

of cDNA ends (RACE) (Takara, D315 and D314). The cDNA sequences of Evx1as and EVX1 were cloned into a

PiggyBac vector and placed downstream of the CAG promoter. Overexpression vectors were co-transfected with

pBASE transposase into ESCs. ESCs that stably express Evx1as or EVX1 through transposon-mediated random

insertion were selected by hygromycin and maintained for subsequent analyses. Primers for RACE are listed in Table

S7.

Single-cell and time-course expression analysis

Single-cell expression analysis was performed as described previously (Kurimoto et al., 2007; Tang et al., 2010).

Briefly, cells were digested into single-cell suspensions. Single cells were manually picked, lysed in reverse

transcription buffer and used immediately for cDNA synthesis with oligo-dT primers by SuperScript III (Life

Technologies). The cDNA samples were subjected to a pre-amplification step (20 cycles) or directly used for real-time

qPCR detection (Bio-Rad, SybrGreen mix). Evx1as (Evx1as-rtf/r) and EVX1 (EVX1-rtf/r) primers show similar PCR

efficiencies (Figure S6A). PCR detection of GAPDH was used as a loading control. Only cells with detectable GAPDH

expression were counted and analyzed. In total, data from 64–285 single cells per time point were collected from three

independent differentiation experiments induced by LIF withdrawal. The number of RNA molecules per cell was

calculated based on standard curves plotted using purified PCR products from Evx1as and EVX1 cDNAs as the template.

We estimated that 5 femtograms of Evx1as or EVX1 template (purified PCR product) contained ~1.2 × 10^4 molecules

(385 bp in length, MW ~2.5 × 10^5 g/mol) or ~1.5 × 10^4 molecules (306 bp in length, MW ~2.0 × 10^5 g/mol), respectively.

A threshold of 2 RNA molecules per cell was chosen. Median expression was calculated and considered biologically

meaningful only if a cell population contained at least 5 cells at each time point. Sporadic expression in 1-2 cells was not

counted. Because the range of transcript numbers per cell was similar with or without amplification (Figures 6D and

S6B), data from different experiments were pooled (Figures 6B-6D).
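
The copy-number estimate for the qPCR standards can be checked with a short calculation. This sketch assumes an average mass of about 650 g/mol per base pair of double-stranded DNA, a common approximation that is not stated in the text; the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
G_PER_MOL_PER_BP = 650.0   # assumed average mass of one dsDNA base pair


def molecules_in_mass(mass_g, length_bp):
    """Number of double-stranded DNA molecules of a given length in a given mass."""
    molecular_weight = length_bp * G_PER_MOL_PER_BP  # g/mol
    return mass_g / molecular_weight * AVOGADRO


evx1as_copies = molecules_in_mass(5e-15, 385)  # 5 fg of the 385-bp product
evx1_copies = molecules_in_mass(5e-15, 306)    # 5 fg of the 306-bp product
```

Under this assumption, 5 fg corresponds to roughly 1.2 × 10^4 copies of the 385-bp product and 1.5 × 10^4 copies of the 306-bp product, in line with the estimates used for the standard curves.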

Quantitative RT-qPCR analysis

Total RNA was treated with DNase I and reverse transcribed by SuperScript III (Life Technologies). To detect the

premature RNA of Evx1as and the pre-mRNA of EVX1, pairs of primers covering one exon-intron junction were used.

To detect the mature RNA or mRNA, pairs of primers covering two exons were used. Gene expression was normalized

to GAPDH except for the Gata6as RNAi experiment in which expression was normalized to TUB4. Error bars in

RT-PCR analysis represent standard deviations of mean expression relative to GAPDH or TUB4, or average fold changes

compared to the scrambled shRNA control. Primer sequences are listed in Table S7.
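
One common way to express normalization to a reference gene and fold change relative to a control is the comparative Ct calculation sketched below. The text does not state which calculation was used, so this is an assumption for illustration only; it also assumes approximately 100% amplification efficiency for both primer pairs, and the function names are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of a target relative to a reference gene (e.g. GAPDH),
    assuming the product doubles every cycle for both primer pairs."""
    return 2.0 ** (ct_reference - ct_target)


def fold_change(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of normalized expression in knockdown versus control cells."""
    return (
        relative_expression(ct_target_kd, ct_ref_kd)
        / relative_expression(ct_target_ctrl, ct_ref_ctrl)
    )
```

For example, a target that crosses the threshold one cycle later in knockdown cells than in control cells, with an unchanged reference Ct, has a fold change of 0.5.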

RNA-Seq and ChIP-seq data analysis

The RNA and DNA libraries were constructed by following Illumina library preparation protocols. High-

throughput sequencing was performed on a HiSeq2000 or 2500. Alignments of RNA-Seq data were performed using

TopHat v2.0.10 (Trapnell et al., 2012). Only those reads uniquely mapped to the reference genome were kept for further

analysis (TopHat parameter “-g 1”). Fragments Per Kilobase of exon model per Million mapped reads (FPKM) were

calculated by Cufflinks v2.1.1 (Trapnell et al., 2012) to represent expression levels of transcripts.
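
The FPKM unit can be illustrated with its basic definition. This sketch shows the arithmetic only; Cufflinks estimates transcript abundances with a statistical model that also handles multiple isoforms, which is not reproduced here, and the function name is hypothetical.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon model per million mapped fragments."""
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)
```

For example, 100 fragments mapped to a 2-kb transcript in a library of 10 million mapped fragments gives an FPKM of 5.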

The sets of genes highly expressed in ESCs (95 genes), mesendoderm (marked by brachyury (T)-driven GFP, ME-

high genes, 174) and neural precursor cells (marked by SOX1-driven GFP, NPC-high genes, 198) were selected as

previously described (Shen et al., 2009; Shen et al., 2008) (Table S6C). XEN-high genes (35) were enriched in

extraembryonic endoderm cells (Table S6C). Gene set enrichment analysis (GSEA) was performed as described

previously by comparing knockdown cells to the scramble shRNA control cells (Shen et al., 2009; Shen et al., 2008).

ChIP-seq reads were aligned to genome assemblies (hg19 or mm10) with no gaps by Bowtie2 v2.1.0 (Langmead

et al., 2009). Aligned files were further converted to bedgraph files with BEDTools (Quinlan and Hall, 2010). Average

ChIP-seq reads within 5-10 kb regions relative to TSSs were calculated and plotted in Figure 1H. All RNA-Seq and

ChIP-seq datasets used in this study are listed in Table S8.

Chromatin immunoprecipitation (ChIP), RNA immunoprecipitation (RIP) and Chromatin isolation by RNA

purification (ChIRP)

ChIP was performed as previously described (Shen et al., 2008) with antibodies for H3K4me3 (Abcam, ab8580),

H3K27ac (Abmart), MED12 (Bethyl labs,A300-774A), MED1 (Bethyl labs, A300-793A), CTCF (Millipore 07-729).

Fold enrichment was normalized to an unrelated genomic region (‘nc’, primers CSa) and the input. RIP was performed

as previously described (Yang et al., 2014). Fold enrichment was normalized to GAPDH.

ChIRP was performed as previously described (Yin et al., 2015). Briefly, 59-nt DNA probes were biotinylated by

terminal transferase (New England Biolabs). Sequential crosslinking was performed with the following steps: two

rounds of 300 mJ UV treatment on ice, treatment with 0.8% formaldehyde (FMA) for 10 minutes, treatment with 2 mM

dithiobis (succinimidyl propionate) (DSP) for 30 minutes, and treatment with 3.7% FMA for 10 minutes at room

temperature. Chromatin was sonicated to yield fragments of 2-5 kb in size. Hybridization, washing and elution steps

were performed as previously described; however, we added an additional stringent wash step (0.1x SSC). After elution

and reverse crosslinking, the DNA was subjected to qPCR analysis or sequencing library construction. The RNA was

also isolated for RT-qPCR analysis. To minimize non-specific targeting of ChIRP probes to chromatin DNA, we

performed Evx1as and EVX1 ChIRP in undifferentiated ESCs lacking both RNA transcripts. For ChIRP-Seq data analysis,

raw reads were uniquely mapped to the mouse genome (mm9) using Bowtie v.1.0.0 (Langmead et al., 2009). Positive

peaks were identified with the MACS program by comparing D4 to D0 samples with a p-value cutoff of 1 × 10^-5 (Zhang

et al., 2008). Fold enrichment of chromatin association of Evx1as or EVX1 was calculated by normalizing ChIRP signals

to undifferentiated ESCs and an unrelated genomic region (primers CSa) in order to minimize non-specific targeting of

ChIRP probes to chromatin. The probe and primer sequences are listed in Table S7.

Chromosome conformation capture (3C)

The 3C analysis was performed as previously described (Miele and Dekker, 2009). Briefly, 5 × 10^6 undifferentiated

or day-4 differentiated ESCs (wild-type or Evx1as-null ESCs) were crosslinked with 1% FMA at room temperature for

10 min. After nuclear extraction by Dounce homogenization, pellets were suspended in NEB buffer 3 with 0.1% SDS

at 65°C for 30 min and then quenched with 1% Triton X-100; 200 U of BglII and BclI were added to digest chromatin at

37°C overnight. After digestion, 1% SDS was added to the reaction to inactivate the enzymes at 65°C for 30 min. The

reaction mix was diluted 20-fold with DNA ligation buffer containing 1% Triton X-100, and 1/40 volume of

T4 DNA ligase (4000 U) was added for ligation at 16°C for 4 h. After reverse crosslinking overnight with proteinase K, DNA was isolated by

phenol/chloroform extraction. A genomic DNA sequence covering the Evx1as/EVX1 locus and nearby regions was

amplified by PCR to normalize PCR efficiency for 3C primers. The DNA control was subjected to BglII and BclI

digestion, ligation and DNA extraction. Interaction frequencies were calculated by dividing the normalized ratios in the

chromatin samples by the level in the DNA control.

Nuclear run-on

Nuclear run-on was performed as previously described (Patrone et al., 2000). About 5 × 10^7 ESCs were harvested

and washed with PBS for one run-on experiment. Cell pellets were lysed in 1 ml of lysis buffer (10 mM Tris-HCl, pH 7.4,

3 mM MgCl2, 10 mM NaCl, 150 mM sucrose and 0.5% NP-40) and incubated on ice for 5 min. After centrifugation at

250g, the pellet was washed with lysis buffer without NP-40 and resuspended in 100 µl nuclear storage buffer (50

mM Tris-HCl, pH 8.3, 40% glycerol, 5 mM MgCl2 and 0.1 mM EDTA). An equal volume of 2X transcription buffer (200

mM KCl, 20 mM Tris-HCl, pH8.0, 5 mM MgCl2, 4 mM DTT, 4 mM each of ATP, GTP and CTP, 200 mM sucrose and

20% glycerol) was added to the nuclei, followed by 8 µl biotin-16-UTP (10 mM, Roche). After incubation at

29°C for 30 min, RNA was extracted by Trizol. About 50 µl of M280 Dynabeads were washed with PBS, resuspended

in 50 µl of 2X binding buffer (10 mM Tris-HCl, pH 7.5, 1 mM EDTA and 2 M NaCl), and then mixed with an equal

volume of purified RNA. After incubation at RT for 4 hours, beads were washed twice with 2X SSC plus 15%

formamide and once with 2X SSC, and finally resuspended in 30 µl RNase-free water for RT-qPCR. RT-qPCR primers used to

detect premature RNA of Evx1as and the pre-mRNA of EVX1 were designed to cover one exon-intron junction, that is,

one primer is located in the intron and one in an adjacent exon.

Northern and southern blot analysis

Wild-type ESCs at days 0 and 4 of differentiation induced by LIF withdrawal were harvested for total RNA

isolation by Trizol (Life Technologies). The polyA+ RNA fractions were enriched using a Dynabeads mRNA

purification kit (Life Technologies). About 1 µg of polyA+ RNA was loaded per lane for northern blot analysis. For

southern blot analysis, 5-10 µg of digested genomic DNA from wild-type or mutant ESCs was loaded per lane.

Digoxigenin-labeled antisense RNA probe or DNA probes were used for northern or southern blotting, respectively.

For northern blot detection of Evx1as, the probe was a ~360-nt DIG-UTP labeled RNA probe that was in vitro

transcribed by MaxiScript T3 kit (Ambion, AM1316). Southern probes were DIG-dUTP-labeled DNA, ~800 bp in

length. Hybridization was done overnight at 68°C or 42°C for northern or southern blotting, respectively. After

washing, membranes were stained with anti-DIG-AP (Roche) and exposed to X-ray film. Primers for northern and

southern probes are listed in Table S8.

RNA fluorescence in situ hybridization (FISH)

FISH was performed according to the manufacturer’s protocol of Stellaris FISH probes (www.biosearchtech.com).

A total of 48 probes labeled with Quasar570 (Cy3 replacement) were used to target Evx1as transcripts. ESCs were

plated on cover glasses and differentiated for 4 days after LIF withdrawal. Cells were washed with PBS and fixed by

3.7% formaldehyde for 10 min at room temperature. They were then washed with PBS and permeabilized with 70% ethanol at

4°C for at least an hour. After permeabilization, cells were washed with washing buffer (2X SSC containing 10%

formamide) and hybridized with 250 nM probes (2X SSC with 100 mg/mL dextran sulfate and 10% formamide) in a

dark humidified chamber at 37°C for 4 hours. Cells were then washed twice and stained with 5 ng/mL DAPI in washing buffer at

37°C for 30 min. Finally, coverslips were mounted with Fluoromount-G (Southern Biotech).

Subcellular fractionation

Subcellular fractionation was performed as described previously (Bhatt et al., 2012). Briefly, ESCs were

resuspended in cold cytoplasmic lysis buffer (containing 0.15% NP-40), laid onto cold sucrose buffer, and then

centrifuged to separate the cytoplasmic fraction from the nuclei. The nuclear pellet was resuspended in nuclear lysis

buffer (containing 0.5M urea, 0.5% NP40) and centrifuged. The soluble nucleoplasm fraction was extracted, then the

chromatin pellet was resuspended in PBS. RNA was extracted by TRIzol reagent (Life Technologies).

RNA pull-down

RNA pull-down was performed as described previously (Carla et al., 2013). Briefly, biotinylated RNAs were

transcribed in vitro by T7 (sense RNA) or T3 (reverse RNA) polymerase according to the manufacturer’s protocol

(Ambion, AM1354 and AM1316). Biotin-16-UTPs (Roche) were added as 5% of UTP in the reactions. About 2 µg of

biotinylated RNAs were heated for 10 min at 95°C, and then cooled down to room temperature in RNA structure buffer

(10 mM Tris pH 7, 0.1 M KCl, 10 mM MgCl2). About 5 × 10^7 day-4 differentiated ESCs were used for each RNA

pull-down experiment. Cells were resuspended in 2 ml PBS, and 8 ml nuclear isolation buffer (10 mM Tris pH 7.5, 5

mM MgCl2, 1% Triton-X100) was added, followed by 20 min incubation on ice. Nuclei were pelleted by centrifugation at 2,500g

for 15 min and resuspended in 1 ml RIP buffer (25 mM Tris pH 7, 0.15 M KCl, 0.5 mM DTT, 0.5% NP-40, 1 mM PMSF,

cocktail and RNaseOut). After 20 strokes of homogenization with a Dounce homogenizer and 10 min centrifugation at

13,000 rpm, the supernatant was collected as nuclear extract. Nuclear extract was pre-cleared with 30 µl M280 beads

(Life Technologies) and 20 µg yeast RNA for 1 hour at 4°C, and then incubated with 2 µg biotinylated RNA at 4°C

overnight, followed by addition of 40 µl equilibrated M280 beads for an additional 3 hours at 4°C. After 4x10 min washes

with RIP buffer, proteins bound to RNA were eluted in 2% SDS sample buffer by heating at 95°C for 10 min, and then

analyzed by western blot.

SUPPLEMENTAL REFERENCES

Bhatt, D.M., Pandya-Jones, A., Tong, A.J., Barozzi, I., Lissner, M.M., Natoli, G., Black, D.L., and Smale, S.T. (2012).

Transcript dynamics of proinflammatory genes revealed by sequence analysis of subcellular RNA fractions. Cell 150,

279-290.

Conti, L., Pollard, S.M., Gorba, T., Reitano, E., Toselli, M., Biella, G., Sun, Y., Sanzone, S., Ying, Q.L., Cattaneo, E.,

et al. (2005). Niche-independent symmetrical self-renewal of a mammalian tissue stem cell. PLoS biology 3, e283.

Fidalgo, M., Faiola, F., Pereira, C.F., Ding, J., Saunders, A., Gingold, J., Schaniel, C., Lemischka, I.R., Silva, J.C., and

Wang, J. (2012). Zfp281 mediates Nanog autorepression through recruitment of the NuRD complex and inhibits somatic

cell reprogramming. Proceedings of the National Academy of Sciences of the United States of America 109, 16202-

16207.

Flicek, P., Amode, M.R., Barrell, D., Beal, K., Billis, K., Brent, S., Carvalho-Silva, D., Clapham, P., Coates, G.,

Fitzgerald, S., et al. (2014). Ensembl 2014. Nucleic acids research 42, D749-755.

Harrow, J., Frankish, A., Gonzalez, J.M., Tapanari, E., Diekhans, M., Kokocinski, F., Aken, B.L., Barrell, D., Zadissa,

A., Searle, S., et al. (2012). GENCODE: the reference human genome annotation for The ENCODE Project. Genome

research 22, 1760-1774.

Huang da, W., Sherman, B.T., and Lempicki, R.A. (2009). Systematic and integrative analysis of large gene lists using

DAVID bioinformatics resources. Nat Protoc 4, 44-57.

Karolchik, D., Barber, G.P., Casper, J., Clawson, H., Cline, M.S., Diekhans, M., Dreszer, T.R., Fujita, P.A., Guruvadoo,

L., Haeussler, M., et al. (2014). The UCSC Genome Browser database: 2014 update. Nucleic acids research 42, D764-

770.

Kurimoto, K., Yabuta, Y., Ohinata, Y., and Saitou, M. (2007). Global single-cell cDNA amplification to provide a

template for representative high-density oligonucleotide microarray analysis. Nat Protoc 2, 739-752.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short

DNA sequences to the human genome. Genome biology 10, R25.

Mali, P., Esvelt, K.M., and Church, G.M. (2013). Cas9 as a versatile tool for engineering biology. Nature methods 10,

957-963.

McLean, C.Y., Bristor, D., Hiller, M., Clarke, S.L., Schaar, B.T., Lowe, C.B., Wenger, A.M., and Bejerano, G. (2010).

GREAT improves functional interpretation of cis-regulatory regions. Nature biotechnology 28, 495-501.

Miele, A., and Dekker, J. (2009). Mapping cis- and trans- chromatin interaction networks using chromosome

conformation capture (3C). Methods in molecular biology (Clifton, NJ) 464, 105-121.

Moffat, J., Grueneberg, D.A., Yang, X., Kim, S.Y., Kloepfer, A.M., Hinkle, G., Piqani, B., Eisenhaure, T.M., Luo, B.,

Grenier, J.K., et al. (2006). A lentiviral RNAi library for human and mouse genes applied to an arrayed viral high-

content screen. Cell 124, 1283-1298.

Okada, A., Aoki, Y., Kushima, K., Kurihara, H., Bialer, M., and Fujiwara, M. (2004). Polycomb homologs are involved

in teratogenicity of valproic acid in mice. Birth defects research 70, 870-879.

Pruitt, K.D., Brown, G.R., Hiatt, S.M., Thibaud-Nissen, F., Astashyn, A., Ermolaeva, O., Farrell, C.M., Hart, J.,

Landrum, M.J., McGarvey, K.M., et al. (2014). RefSeq: an update on mammalian reference sequences. Nucleic acids

research 42, D756-763.

Qi, L.S., Larson, M.H., Gilbert, L.A., Doudna, J.A., Weissman, J.S., Arkin, A.P., and Lim, W.A. (2013). Repurposing

CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152, 1173-1183.

Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features.

Bioinformatics 26, 841-842.

Sharif, B., Na, J., Lykke-Hartmann, K., McLaughlin, S.H., Laue, E., Glover, D.M., and Zernicka-Goetz, M. (2010). The

chromosome passenger complex is required for fidelity of chromosome transmission and cytokinesis in meiosis of

mouse oocytes. Journal of cell science 123, 4292-4300.

Shen, B., Zhang, J., Wu, H., Wang, J., Ma, K., Li, Z., Zhang, X., Zhang, P., and Huang, X. (2013). Generation of gene-

modified mice via Cas9/RNA-mediated gene targeting. Cell research 23, 720-723.

Shen, X., Liu, Y., Hsu, Y.J., Fujiwara, Y., Kim, J., Mao, X., Yuan, G.C., and Orkin, S.H. (2008). EZH1 Mediates

Methylation on Histone H3 Lysine 27 and Complements EZH2 in Maintaining Stem Cell Identity and Executing

Pluripotency. Mol Cell 32, 491-502.

Theunissen, T.W., van Oosten, A.L., Castelo-Branco, G., Hall, J., Smith, A., and Silva, J.C. (2011). Nanog overcomes

reprogramming barriers and induces pluripotency in minimal conditions. Curr Biol 21, 65-71.

Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L., and

Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and

Cufflinks. Nature Protocols 7, 562-578.

Wang, J., Theunissen, T.W., and Orkin, S.H. (2007). Site-directed, virus-free, and inducible RNAi in embryonic stem

cells. Proceedings of the National Academy of Sciences of the United States of America 104, 20850-20855.

Yang, Y.W., Flynn, R.A., Chen, Y., Qu, K., Wan, B., Wang, K.C., Lei, M., and Chang, H.Y. (2014). Essential role of

lncRNA binding for WDR5 maintenance of active chromatin and embryonic stem cell pluripotency. eLife 3, e02046.

Yin, Y., Yan, P., Lu, J., Song, G., Zhu, Y., Li, Z., Zhao, Y., Shen, B., Huang, X., Zhu, H., et al. (2015). Opposing Roles

for the lncRNA Haunt and Its Genomic Locus in Regulating HOXA Gene Activation during Embryonic Stem Cell

Differentiation. Cell stem cell 16, 504-516.

Ying, Q.L., and Smith, A.G. (2003). Defined conditions for neural commitment and differentiation. Methods in

enzymology 365, 327-341.

Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E., Nusbaum, C., Myers, R.M., Brown, M.,

Li, W., et al. (2008). Model-based analysis of ChIP-Seq (MACS). Genome biology 9, R137.

Zhang, Y.E., Vibranovski, M.D., Landback, P., Marais, G.A., and Long, M. (2010). Chromosomal redistribution of

male-biased genes in mammalian evolution with two bursts of gene gain on the X chromosome. PLoS biology 8.

Zhou, J., Wang, J., Shen, B., Chen, L., Su, Y., Yang, J., Zhang, W., Tian, X., and Huang, X. (2014). Dual sgRNAs

facilitate CRISPR/Cas9-mediated mouse genome targeting. The FEBS journal 281, 1717-1725.