

Reprogramming cell fate with a genome-scale library of artificial transcription factors

Asuka Eguchi a,b, Matthew J. Wleklinski b, Mackenzie C. Spurgat b, Evan A. Heiderscheit b, Anna S. Kropornicka b,c, Catherine K. Vu b, Devesh Bhimsaria b,d, Scott A. Swanson e, Ron Stewart e, Parameswaran Ramanathan d, Timothy J. Kamp f, Igor Slukvin g,h, James A. Thomson e,i,j, James R. Dutton k, and Aseem Z. Ansari b,j,1

a Cellular and Molecular Biology Training Program, University of Wisconsin–Madison, Madison, WI 53706; b Department of Biochemistry, University of Wisconsin–Madison, Madison, WI 53706; c Genetics Training Program, University of Wisconsin–Madison, Madison, WI 53706; d Department of Electrical and Computer Engineering, University of Wisconsin–Madison, Madison, WI 53706; e Morgridge Institute for Research, Madison, WI 53715; f Department of Medicine, University of Wisconsin School of Medicine and Public Health, Madison, WI 53792; g Wisconsin National Primate Research Center, University of Wisconsin–Madison, Madison, WI 53715; h Department of Pathology and Laboratory Medicine, University of Wisconsin–Madison, Madison, WI 53715; i Department of Cell and Regenerative Biology, University of Wisconsin School of Medicine and Public Health, Madison, WI 53706; j The Genome Center of Wisconsin, University of Wisconsin–Madison, Madison, WI 53706; and k Stem Cell Institute, University of Minnesota, Minneapolis, MN 55455

Edited by Michael R. Green, University of Massachusetts Medical School, Worcester, MA, and approved November 14, 2016 (received for review July 7, 2016)

Artificial transcription factors (ATFs) are precision-tailored molecules designed to bind DNA and regulate transcription in a preprogrammed manner. Libraries of ATFs enable the high-throughput screening of gene networks that trigger cell fate decisions or phenotypic changes. We developed a genome-scale library of ATFs that display an engineered interaction domain (ID) to enable cooperative assembly and synergistic gene expression at targeted sites. We used this ATF library to screen for key regulators of the pluripotency network and discovered three combinations of ATFs capable of inducing pluripotency without exogenous expression of Oct4 (POU domain, class 5, TF 1). Cognate site identification, global transcriptional profiling, and identification of ATF binding sites reveal that the ATFs do not directly target Oct4; instead, they target distinct nodes that converge to stimulate the endogenous pluripotency network. This forward genetic approach enables cell type conversions without a priori knowledge of potential key regulators and reveals unanticipated gene network dynamics that drive cell fate choices.

artificial transcription factor | genome-scale library | cell fate | reprogramming | gene regulatory networks

Expression of certain transcription factors (TFs) can profoundly alter gene regulatory dynamics of a cell to the extent that the cell may transition to a completely different state. For example, the TFs Oct4 (POU domain, class 5, TF 1), Sox2 [SRY (sex-determining region Y)-box 2], Klf4 (Kruppel-like factor 4), and c-Myc (myelocytomatosis oncogene) have been widely used to reprogram somatic cells to an induced pluripotent stem (iPS) cell state (1–3). Similarly, other TF combinations can reprogram somatic cells to adopt specific cell states, such as myocytes, cardiomyocytes, neurons, and hepatocytes (4–7). However, state-of-the-art methods to find regulators of cell fate conversions rely on trial and error and empirical exploration of a small subset of combinations of different transcriptional regulators (8). Such efforts are highly constrained by the number of combinations that can be tested and are labor intensive and cost prohibitive. Conventional approaches often rely on the assumption that the factors that maintain a particular cell state are the same factors that reprogram gene networks to drive cell fate conversion, an assumption that may not necessarily be correct, especially when the intended conversion does not occur naturally during development. Moreover, TFs function in a specific cellular milieu and trigger appropriate gene expression in response to specific cues that might not occur in the cellular systems where they are being tested. The epigenetic landscape and heterochromatic regions of the cell may also present barriers to accessibility to key regulatory regions (9). To overcome such barriers to cell fate conversions, we developed a library of artificial transcription factors (ATFs) that stimulate transcriptional circuits independently of the original cell state.

ATFs are DNA-binding molecules designed to control gene expression in a predetermined manner (10). Rather than taking the conventional approach of testing candidate factors curated from studying embryonic development or differential expression analysis, unbiased screening of a genome-scale ATF library can be a highly effective and orthogonal approach to sample thousands of sites in parallel and activate cell fate-defining transcriptional networks. Use of a library also yields ATFs that can access genomic loci without having to first identify accessible regions upstream of desired target genes. Because ATFs do not rely on endogenously expressed cofactors and are not restrained by feedback circuits that limit the function of ectopically expressed natural factors, they can serve as powerful agents to perturb the homeostatic state of any cell type. The target genes of specific ATFs that evoke changes in cell states can enable the unbiased identification of gene regulatory networks that govern cell fate conversion.

TFs are modular by nature, and each domain can be tailored to create ATFs that target and regulate genes and networks in a preprogrammed manner (11–14). The DNA-binding domain (DBD) confers sequence specificity in targeting genomic loci. The effector domain provides the ATF with function, be it transcriptional activation, repression, or modification of chromatin. Importantly, an interaction domain (ID) can be incorporated in the design such that

Significance

The ability to convert cells into desired cell types enables tissue engineering, disease modeling, and regenerative medicine; however, methods to generate desired cell types remain difficult, uncertain, and laborious. We developed a strategy to screen gene regulatory elements on a genome scale to discover paths that trigger cell fate changes. The proteins used in this study cooperatively bind DNA and activate genes in a synergistic manner. Subsequent identification of transcriptional networks does not depend on prior knowledge of specific regulators important in the biological system being tested. This powerful forward genetic approach enables direct cell state conversions as well as other challenging manipulations of cell fate.

Author contributions: A.E. and A.Z.A. designed research; A.E., M.J.W., M.C.S., E.A.H., and C.K.V. performed research; P.R., T.J.K., I.S., J.A.T., and J.R.D. contributed new reagents/analytic tools; A.E., A.S.K., D.B., S.A.S., and R.S. analyzed data; and A.E., J.R.D., and A.Z.A. wrote the paper.

Conflict of interest statement: A.Z.A. is the sole member of VistaMotif, LLC, and founder of the nonprofit WINStep Forward.

This article is a PNAS Direct Submission.

Data deposition: The RNA-seq and ChIP-seq data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession no. GSE89221).

1 To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1611142114/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1611142114 PNAS | Published online December 5, 2016 | E8257–E8266

CELL BIOLOGY | PNAS PLUS


the ATF can interact with other factors in the cell (10). Principles of cooperative assembly and synergistic activation were integrated in the design of our genome-scale ATF library (15, 16).

We used the following three criteria to choose among an array of DBD scaffolds: (i) ability to target multiple nodes of a gene network, (ii) regulatory potency of resulting ATFs, and (iii) efficiency of delivery into cells. Applying these criteria to the repurposed nuclease-inactivated CRISPR/Cas9 system, TAL-effectors, programmable small-molecule polyamides, and zinc fingers, we concluded that the zinc finger scaffold, engineered to enable cooperative and combinatorial assembly on DNA, would be the most effective DBD scaffold for the creation of an ATF library designed to trigger cell fate conversions (see Materials and Methods for details on choice of DBD).

To demonstrate unbiased ability to change cell identity, we used our ATF library to screen for factors that induced pluripotency in mouse embryonic fibroblasts (MEFs) without exogenous delivery of Oct4. We created a library of 2.62 × 10^6 ATFs that encompass five times the number of factors as the entire sequence space of all 10-bp binding sites on double-stranded DNA (524,809 unique sequences). RNA-sequencing (RNA-seq) data, epigenetic landscapes, and comprehensive ATF binding profiles by cognate site identification (CSI) sequencing (CSI-seq) and ChIP-sequencing (ChIP-seq) were analyzed to determine the key nodes through which ATFs activate the pluripotency network. Surprisingly, bioinformatic analysis reveals that the ATFs take fibroblasts through a different path to pluripotency than exogenously expressed Oct4. We demonstrate that this forward genetic approach enables the pursuit of elusive cell fate conversions in an unbiased manner.
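The library-coverage arithmetic quoted above can be checked with a short sketch. It assumes one common counting convention, in which a 10-bp site and its reverse complement are treated as the same site on double-stranded DNA; the function names are illustrative and not taken from the paper. Under that convention the closed-form count for k = 10 is 524,800, which is close to but not identical with the 524,809 quoted in the text. The authors' exact counting convention is not stated in this section, so the coverage ratio below uses the quoted figure.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def distinct_duplex_sites(k: int) -> int:
    """Closed-form count of k-bp duplex sites, merging each sequence with
    its reverse complement (assumed convention, not the paper's stated one)."""
    total = 4 ** k
    # Only even-length sequences can equal their own reverse complement.
    self_complementary = 4 ** (k // 2) if k % 2 == 0 else 0
    return (total + self_complementary) // 2


def count_by_enumeration(k: int) -> int:
    """Brute-force check of the closed form; practical only for small k."""
    sites = {
        min(s, reverse_complement(s))
        for s in ("".join(p) for p in product("ACGT", repeat=k))
    }
    return len(sites)


library_complexity = 2.62e6   # ATFs in the library, as quoted in the text
quoted_sequence_space = 524_809  # unique 10-bp sites, as quoted in the text

coverage = library_complexity / quoted_sequence_space
print(distinct_duplex_sites(10))  # closed-form count under the assumed convention
print(round(coverage, 2))         # fold coverage of the quoted sequence space
```

The ratio comes out at roughly five, consistent with the "five times" coverage stated in the text.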

Results

ATF Architecture. To determine the best architecture for a zinc finger ATF library, we tested the impact of each modular domain on the level of induction. The zinc finger backbone is derived from human EGR1/ZIF268 (early growth response 1), a well-studied scaffold for zinc finger ATFs (17–19). We fused VP64, a tetrameric repeat of the 11-aa activation region of VP16, a potent transactivation domain from the herpes simplex virus, to the C terminus of the zinc fingers (Fig. 1A) (20). Although a variety of zinc finger-based libraries have been described before (21–27), a distinguishing and important feature of our ATF design is inclusion of a 15-aa peptide that serves as an ID, allowing dimerization of the ATF with another ATF through the hydrophobic surface of the first zinc finger of EGR1 (28). The inclusion of the ID in our ATF design adds a layer of control to the ATF library by allowing the ATFs to harness cooperative binding and synergistic activation (16).

Because we wanted to create an ATF library of high complexity while minimizing nonessential modules, it was necessary

Fig. 1. ATF designed with three zinc fingers, activation domain, and interaction domain (ID) to maximize transcriptional effect. (A) Architecture of the ATF. From N to C terminus, the ATF consists of an ID, three zinc fingers, a nuclear localization signal (NLS) from EGR1, and a VP64 activation domain. (B) The three-zinc finger ATF induces expression 329-fold over the mock control, whereas the two-zinc finger ATF induces expression twofold in a luciferase assay performed in HEK293 cells. The DBD comes from either the first two or all three zinc fingers of EGR1. Each ATF is also comprised of an ID, NLS, and VP64 (n = 4, P < 0.01 by one-way ANOVA with post hoc Tukey test). (C) RNA-seq results in HEK293 cells show that the three-zinc finger ATF with an ID up-regulates expression of the greatest number of genes compared with the mock-transfected control. All ATFs in this assay have an NLS and VP64 (n = 1; P < 0.0005). (D) Among the up-regulated genes, 18 genes are expressed 50-fold or more by the three-zinc finger ATF with an ID (n = 1; P < 0.0005). (E) Design of the ATF library. The residues that confer specificity (−1, 2, 3, and 6 positions) were randomized to amino acids represented by VNN codons, where V is A, C, or G. A library with a complexity of 2.6 × 10^6 ATFs was created. Sequencing of 100 ATFs from the library shows representation of all 16 amino acids. X in the amino acid sequence represents any residue.


to determine the minimum number of zinc fingers required to have a transcriptional effect. Toward this end, we compared a two-zinc finger ATF with a three-zinc finger ATF in a luciferase assay with a palindromic cognate site for EGR1 upstream of the luciferase reporter. The two-zinc finger ATF could only activate the luciferase reporter 2-fold over background, whereas the three-zinc finger ATF was capable of activating 329-fold over background (Fig. 1B). Incorporation of the ID further increased activation of the three-zinc finger ATF almost by an order of magnitude (SI Appendix, Fig. S3A).

To determine how the ATFs impact genome-wide transcription, we performed RNA-seq in human cells expressing one of the four ATFs with different architectures. The ATFs either had the first two or all three zinc fingers of EGR1 as the DBD, with and without the ID. Compared with the mock-transfected control, the two-zinc finger ATFs had little impact on altering the transcriptional profile (Fig. 1 C and D, and SI Appendix, Fig. S3 B and C). On the other hand, the three-zinc finger ATF with the ID altered the expression of 104 transcripts (100 up-regulated, 4 down-regulated) (Fig. 1D and SI Appendix, Fig. S3C). Most of the genes were up-regulated compared with the mock-transfected control, and the repressed genes could be attributed to indirect effects of the ATFs. As the three-zinc finger ATF with the ID was capable of binding as a monomer or dimer, most of the subset of genes up-regulated by the three-zinc finger ATF without the ID could also be activated by the three-zinc finger ATF with the ID, and the ID increased the level of induction for a subset of genes (SI Appendix, Fig. S3B).

Genome-Scale ATF Library Design. Taking all these results into consideration, a scaffold that includes from N to C terminus: an ID, three zinc fingers derived from EGR1, NLS from EGR1, VP64 activation domain, and a 3× hemagglutinin (HA) tag was used to construct the ATF scaffold. The library of different DNA sequence-targeting ATFs was created by incorporating VNN codons, where V is A, C, or G, at the DNA recognition residues (−1, 2, 3, and 6). Use of VNN codons prevents incorporation of premature stop codons within the ORF and permits the incorporation of 16 different amino acids (Fig. 1E). The library was cloned into a second-generation lentiviral system to ensure efficient delivery to mammalian cells. The ATF is driven by the constitutively active EF-1α promoter, which resists silencing in mammalian cells compared with other constitutive promoters (Fig. 2A) (29). The sequence space for all 10-bp sequences on double-stranded DNA is 524,809 different sequence permutations. We created an ATF library with a complexity of 2.62 × 10^6, five times the targeted sequence space. Sanger sequencing of 100 clones confirmed success of our design with incorporation of all 16 aa at each recognition residue, suggesting diverse representation in the library (Fig. 1E and SI Appendix, Table S7).
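The two properties attributed to VNN codons (no stop codons, 16 encodable amino acids) follow directly from the standard genetic code and can be verified with a minimal sketch. The table below is the standard genetic code written in TCAG order; the variable names are illustrative only and are not part of the authors' methods.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...;
# '*' marks the three stop codons (TAA, TAG, TGA).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# VNN: first position is A, C, or G (never T); the other two are any base.
vnn_codons = ["".join(c) for c in product("ACG", BASES, BASES)]

encoded = {CODON_TABLE[c] for c in vnn_codons}
excluded = set(AMINO_ACIDS) - {"*"} - encoded

print(len(vnn_codons))   # number of VNN codons
print("*" in encoded)    # whether any VNN codon is a stop codon
print(sorted(encoded))   # amino acids reachable with VNN codons
print(sorted(excluded))  # amino acids that cannot be encoded
```

Because all three stop codons begin with T, excluding T at the first position removes them, along with the four amino acids (Phe, Tyr, Cys, Trp) whose codons all begin with T.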

ATF Library Activates the Pluripotency Network. We asked whether ATFs in the library could replace the key regulator of pluripotency, Oct4, in the mixture of TFs that triggers the pluripotency network, Oct4, Sox2, Klf4, and c-Myc (Oct4+SKM). To test a library capable of sampling thousands of sites in the genome, it was necessary to have a robust readout of positive phenotypes (Fig. 2A). Toward this end, we used MEFs isolated from a transgenic mouse line that allows lineage tracing of endogenous Oct4 transcription (Fig. 2B) (30). In these cells, tamoxifen-inducible Cre recombinase (mER-Cre-mER) is expressed when the endogenous pluripotency-associated gene, Oct4, is transcribed. In the presence of 4-hydroxytamoxifen, the recombinase removes Tomato from the ROSA locus, and transmembrane-bound GFP is expressed. Consequently, Tomato+GFP– MEFs become Tomato–GFP+ cells after endogenous Oct4 is activated, and GFP expression is maintained in all their cell progeny (Fig. 2B).

The ATF library was transduced in MEFs (multiplicity of infection of 3) with Sox2, Klf4, and c-Myc (SKM). As a positive control, we delivered Oct4+SKM to MEFs (SI Appendix, Fig. S4A). To account for reprogramming events induced by SKM alone, we delivered lentivirus lacking an ORF in place of the ATF (Empty+SKM). The lentivirus with the “empty” ORF accounted for any false-positive events that could arise from lentiviral delivery or integration of a strong constitutive promoter at relevant genomic sites. We also included a control with the ATF library alone, as well as untreated MEFs. Although the widely used pluripotency-inducing combination of Oct4+SKM resulted in Tomato–GFP+ expression in 0.033% cells, the ATF library+SKM induced reporter activation in 0.229% of the cells (Fig. 2C). No Tomato–GFP+ cells were observed for untreated MEFs and cells treated with ATF library alone or Empty+SKM. Although the percentage of cells with Oct4 activated in ATF-treated cells was higher than in Oct4+SKM cells, this higher percentage of Tomato–GFP+ cells can be attributed to more cell death in the ATF-treated cells. Tomato–GFP+ cells bearing members of the ATF library+SKM were isolated as single cells for further analysis. Of the ATF library+SKM cells, a small fraction (0.8%) of cells were Tomato+GFP+. Because the half-life of Tomato fluorescent protein is ∼24 h, there is a period after Oct4 is activated when the cells are double positive. We sorted these cells separately to determine whether the ATFs expressed in the double-positive cells were different from those expressed in the Tomato–GFP+ cells.
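The double-positive window can be illustrated with a rough decay calculation. This is a sketch under simplifying assumptions that are not stated in the paper: Tomato is lost by simple first-order decay with the ∼24-h half-life quoted above, no new Tomato is made once the cassette is excised, and dilution by cell division is ignored. The function name is hypothetical.

```python
TOMATO_HALF_LIFE_H = 24.0  # approximate half-life quoted in the text


def tomato_fraction_remaining(hours_since_excision: float,
                              half_life_h: float = TOMATO_HALF_LIFE_H) -> float:
    """Fraction of the original Tomato signal left after excision,
    assuming first-order decay and no new synthesis (illustrative model)."""
    return 0.5 ** (hours_since_excision / half_life_h)


for hours in (0, 24, 48, 72, 96):
    print(hours, round(tomato_fraction_remaining(hours), 4))
```

Under these assumptions, residual Tomato stays above 10% of its starting level for roughly three days after recombination, which is consistent with some recently converted cells still scoring as Tomato+GFP+ at the time of sorting.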

Single-Cell Retrieval of Active ATF Combinations. Because different combinations of ATFs can potentially act in concert to activate the pluripotency network, we identified the ATFs from individual single cells to capture ATF combinations that activate endogenous Oct4 transcription and induce GFP expression. Preliminary evaluation of iPS colonies derived from the screen with mixed combinations of ATFs showed high levels of expression for endogenous pluripotency genes (SI Appendix, Fig. S4C). Eleven different cells that were GFP+ were isolated and subjected to two-step nested PCR of genomic DNA. Sequencing of the PCR products revealed 11 unique combinations of ATFs (Fig. 2D). The range of ATFs varied between 2 and 10 ATFs within a single cell. ZFATF1 (light blue) and ZFATF2 (orange) appeared in most of the combinations; however, an additional ATF was necessary for reproducible conversion to a bona fide iPS cell state (SI Appendix, Table S8). One ATF from cells 4–11 had a frameshift mutation near the N terminus, resulting in a 163-aa protein product that does not code for a zinc finger protein (ZFATF5). Only the ID as well as the first 19 aa of the first zinc finger remained intact (SI Appendix, Fig. S4B).

All ATF combinations identified in the screen for endogenous Oct4 expression were revalidated to determine whether they were true positives. Among the 11 ATF combinations, C2, C3, and C4 reproducibly generate colonies of iPS cells when expressed with Sox2, Klf4, and c-Myc (Fig. 2E and SI Appendix, Fig. S4E and Table S8). Interestingly, C4 was identified from the Tomato+GFP+ cells, in which Oct4 was activated before the Tomato signal dissipated. During the validation step, MEFs expressing C4+SKM became iPS cells ∼28 d later than the iPS cells generated by the other ATF combinations or Oct4+SKM. The doubling times for the iPS cells generated with C2+SKM or C3+SKM were comparable to that of iPS cells generated with Oct4+SKM; however, the doubling time for iPS cells expressing C4+SKM was slightly longer (SI Appendix, Fig. S4D). ATF-induced iPS cells demonstrated capacity for self-renewal and have been cultured beyond 65 passages.

Cells Generated with Different ATF Combinations Are Pluripotent. ATF-induced Oct4+ cells were further characterized for markers of pluripotency. Immunofluorescence was performed to confirm


Fig. 2. ATF library activates pluripotency network. (A) Genetic screen with an ATF library. (1) The ATF library was cloned into a second-generation lentiviral vector. The screen is performed in cells with a robust change in phenotype or a lineage-specific reporter. (2) Positive outcomes are isolated as single cells, such that combinations of ATFs, if any, can be captured. (3) Integrated ATFs are identified from single cells. (4) Identified ATFs are retested for validation. Once validated, downstream experiments can be performed to identify ATF target genes. (B) Testing the ATF library in mouse embryonic fibroblasts (MEFs) isolated from a transgenic mouse line that allows lineage tracing of endogenous Oct4 transcription. Upon induction of endogenous Oct4, tamoxifen-inducible Cre recombinase (mER-Cre-mER) is coexpressed. The recombinase removes Tomato from the ROSA locus, and transmembrane-bound GFP is expressed. Consequently, Tomato+GFP– MEFs become Tomato–GFP+, and GFP expression is maintained in all their cell progeny. (C) Flow cytometry results at day 15 after introduction of TFs. Tomato–GFP+ MEFs transduced with the ATF library+SKM were isolated as single cells for further analysis. MEFs treated with Oct4+SKM (positive control) and untreated MEFs (negative control) were used for comparison. Double-positive (Tomato+GFP+; Q2) cells were also collected. Percentages are displayed under the quadrant number. (D) ATFs were identified from 11 single cells by two-step nested PCR of genomic DNA. Unique ATFs are depicted with a different color. One ATF had a frameshift mutation shortly after the ID, coding for a protein that does not have a zinc finger structure. A few ATFs, notably the light blue and orange ATFs, are expressed in most of the cells analyzed. Three ATFs are made up of two fingers (light blue, red, and pink). All cells except number 4 were collected as Tomato–GFP+ cells. Cell 4 was Tomato+GFP+ at day 15. (E) Three combinations of ATFs (C2, C3, and C4) successfully induced pluripotency with SKM. Micrographs of MEFs transduced with Empty+SKM are Tomato+GFP–. iPS cells generated with ATFs+SKM are similar to those generated with Oct4+SKM and are Tomato–GFP+. Two ATFs in each combination are the same (light blue and orange). (Scale bar, 100 μm.)


expression of pluripotency markers, OCT4, SOX2, and NANOG (Fig. 3A and SI Appendix, Fig. S5A). The capacity for ATF-derived iPS cells to differentiate into all three germ layers was assessed by formation of teratomas (Fig. 3B) and embryoid bodies (EBs) (SI Appendix, Fig. S5 B–E). Immunocytochemistry of myosin light polypeptide 2 (mesoderm), forkhead box A2 (endoderm), and βIII tubulin (ectoderm) confirmed expression of germ layer markers at the protein level, and RT-qPCR of T, Nkx2.5, and Kdr (mesoderm); Ttr, Afp, and FoxA2 (endoderm); Nes, Nefl, and Sox17 (ectoderm) confirmed differentiation into all three germ layers at the transcriptional level. Beating cardiomyocytes were also observed from embryoid outgrowths from EBs derived from ATF-induced iPS cells (Movie S1).

From morphological and select gene marker analysis, we expanded our validation to global transcriptome analysis of the ATF-treated iPS cells. Comparison of genome-wide transcriptional profiles showed that the transcriptomes of ATF-induced iPS cells cluster tightly with mouse ES cells as well as iPS cells generated with Oct4+SKM (Fig. 3C and SI Appendix, Fig. S6A). RNA profiles of cells at early stages of reprogramming clustered with MEFs and the Empty+SKM control.

ATF-induced iPS cells show an up-regulation of pluripotency markers and a down-regulation of fibroblast markers (Fig. 3D). Using the 853 genes that make up the fibroblast gene regulatory network (GRN) and the 705 genes that make up the pluripotency GRN from CellNet (31), we compared the expression profiles of ATF-induced iPS cells to those of other pluripotent cells and MEFs (Fig. 3 E and F). Our genome-wide analysis indicates that the profiles of our ATF-induced iPS cells are highly correlated with profiles of pluripotent cells generated using exogenous Oct4. It is important to note that, at early stages of reprogramming, ATF-treated cells have a remarkably different profile compared with Oct4+SKM-treated cells (Fig. 3F). These differences suggest that regulators beyond those characterized in the GRNs of CellNet guide cells to pluripotency. Once the cells are fully reprogrammed to the pluripotent state, global transcriptome profiles show that ATF-induced iPS cells share more similarity among themselves than with Oct4+SKM or ES cells (Fig. 3C and SI Appendix, Fig. S6 B and C). ATF expression at early stages of reprogramming was readily detectable by RNA-seq; however, once converted to iPS cells, the lentiviral elements controlling the expression of ATFs are silenced, a further confirmation that the cells were fully reprogrammed. Once reprogrammed, the converted cells maintain the iPS cell state even in the absence of regulators that triggered the initial regulatory nodes that led to cell fate conversion.
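The profile comparisons above can be illustrated with a minimal sketch that scales each gene as log2(ratio relative to mean), the scale used for the heat maps in Fig. 3, and then computes a Pearson correlation between two samples. The expression values, sample names, and pseudocount below are hypothetical and chosen only for illustration; this is not the analysis pipeline used in the study.

```python
import math

def log2_ratio_to_mean(expr, pseudocount=1.0):
    """Scale each gene as log2(value / mean of that gene across samples).

    expr maps gene -> {sample: expression value}. The pseudocount is an
    assumption added here to avoid taking the log of zero.
    """
    scaled = {}
    for gene, by_sample in expr.items():
        shifted = {s: v + pseudocount for s, v in by_sample.items()}
        mean = sum(shifted.values()) / len(shifted)
        scaled[gene] = {s: math.log2(v / mean) for s, v in shifted.items()}
    return scaled

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def sample_correlation(scaled, s1, s2):
    """Correlate two samples across all genes in the scaled table."""
    genes = sorted(scaled)
    return pearson([scaled[g][s1] for g in genes],
                   [scaled[g][s2] for g in genes])

# Hypothetical expression values (arbitrary units), for illustration only.
EXPR = {
    "Nanog":  {"MEF": 0.0,   "ATF_iPS": 90.0, "Oct4_iPS": 110.0},
    "Sox2":   {"MEF": 1.0,   "ATF_iPS": 60.0, "Oct4_iPS": 50.0},
    "Thy1":   {"MEF": 200.0, "ATF_iPS": 4.0,  "Oct4_iPS": 6.0},
    "Col1a1": {"MEF": 500.0, "ATF_iPS": 10.0, "Oct4_iPS": 8.0},
}

SCALED = log2_ratio_to_mean(EXPR)
R_IPS = sample_correlation(SCALED, "ATF_iPS", "Oct4_iPS")
R_MEF = sample_correlation(SCALED, "ATF_iPS", "MEF")
```

With these toy values the two iPS samples correlate strongly with each other and negatively with MEFs, which is the qualitative pattern the clustering in Fig. 3C describes.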

Signature Epigenetic Landscapes at ATF-Activated Pluripotency Genes. The genome-wide chromatin modification landscapes in ATF-induced iPS cells were compared with iPS cells generated with Oct4+SKM. Specifically, ChIP-seq was performed on histone 3 lysine 27 acetylation (H3K27ac), the marker delineating active promoters and superenhancers that define cell identity (32, 33), and histone 3 lysine 9 trimethylation (H3K9me3), the marker that is strongly correlated to repressed regions of the genome that are bound by heterochromatin protein 1 (34). For the analysis of the epigenetic changes induced by the ATFs, ChIP-seq peaks were identified using MACS2 after alignment to the mouse genome with Bowtie2 (35, 36). Analysis of aggregate patterns of the active and repressive histone marks by deepTools package shows strong H3K27ac peaks upstream of the gene and in the gene body for expressed genes, whereas broad H3K9me3 peaks marked genes that were not expressed (SI Appendix, Fig. S7 C and D) (37).

Fig. 3. iPS cells generated with ATFs are pluripotent. (A) Immunofluorescence staining of C2+SKM iPS colonies with OCT4, SOX2, and NANOG. (Scale bars, 100 μm.) (B) Teratoma assay results show differentiation into mesoderm, endoderm, and ectoderm. (Scale bar, 100 μm.) (C) iPS cells generated with ATFs cluster with mouse ES cells and iPS cells generated with Oct4+SKM. Samples marked early are MEFs transduced with the indicated factors before conversion into iPS cells between days 18 and 27 (n = 3 or 4). (D) A heat map of fibroblast and pluripotency markers of iPS cells generated with ATFs shows down-regulation of fibroblast genes and up-regulation of pluripotency genes. Scale displays differential expression log2(ratio relative to mean). (E) A heat map of 853 genes from the CellNet fibroblast gene regulatory network (GRN) shows that iPS cells generated with ATFs obtain transcriptional profiles similar to those of other pluripotent cells (Oct4+SKM iPS cells and ES cells). Scale displays differential expression log2(ratio relative to mean). (F) A heat map of 705 genes from the CellNet pluripotency GRN shows that iPS cells generated with ATFs have expression profiles similar to those of other pluripotent cells. Scale displays differential expression log2(ratio relative to mean). (G) H3K27ac marks, specifying active regions of chromatin, appear in a common set of genes for Oct4+SKM iPS cells, C2+SKM iPS cells, and C3+SKM iPS cells. H3K27ac peaks were annotated to genes with Homer to create Venn diagrams for genes. ChIP enrichment among treatments was determined to be significant at an FDR < 0.1 by DiffBind (n = 2). (H) H3K9me3 marks, specifying repressed regions of chromatin, appear in a common set of genes for Oct4+SKM iPS cells, C2+SKM iPS cells, and C3+SKM iPS cells. H3K9me3 peaks were annotated to genes with Homer to create Venn diagrams for genes. ChIP enrichment among treatments was determined to be significant at an FDR < 0.1 by DiffBind (n = 2).

Eguchi et al. PNAS | Published online December 5, 2016 | E8261


The differentially marked histone modifications were determined by using DiffBind (38). Peaks with a false-discovery rate (FDR) < 0.1 were categorized into unique and overlapping sets for iPS cells generated with ATF combinations+SKM or Oct4+SKM. Remarkably, despite differences in transcriptome profiles during early stages (Fig. 3F), pluripotent cells shared similar sets of peaks for H3K27ac regardless of whether they were generated with ATFs or with natural factors (Fig. 3G). Likewise, repressive H3K9me3 peaks were similar for ATF-induced iPS cells and Oct4-induced iPS cells, although there was greater overlap for Oct4+SKM and C3+SKM (Fig. 3H). Taken together, the active H3K27ac marks and the repressive H3K9me3 marks confirm that the ATFs+SKM induced significant remodeling of the chromatin structure in MEFs to a state that distinguishes them as pluripotent cells.
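The categorization of peaks into unique and overlapping sets can be sketched as a simple set partition. The sketch below assumes peaks have already been annotated to genes and carry an FDR value; the function names, gene lists, and FDR values are hypothetical, and the code does not reproduce DiffBind or Homer.

```python
def filter_by_fdr(peaks, threshold=0.1):
    """Keep genes whose annotated peak has FDR strictly below the threshold.

    peaks is an iterable of (gene, fdr) pairs.
    """
    return {gene for gene, fdr in peaks if fdr < threshold}

def venn_categories(gene_sets):
    """Partition genes by exactly which treatments contain them.

    gene_sets maps treatment name -> set of genes. The result maps a
    frozenset of treatment names -> genes found in exactly those treatments,
    i.e. one entry per non-empty region of the Venn diagram.
    """
    regions = {}
    all_genes = set().union(*gene_sets.values())
    for gene in all_genes:
        members = frozenset(
            name for name, genes in gene_sets.items() if gene in genes
        )
        regions.setdefault(members, set()).add(gene)
    return regions

# Hypothetical gene sets, for illustration only.
EXAMPLE_SETS = {
    "Oct4+SKM": {"Nanog", "Sall4", "Zfp42"},
    "C2+SKM": {"Nanog", "Sall4", "Lin28a"},
    "C3+SKM": {"Nanog", "Zfp42"},
}
EXAMPLE_REGIONS = venn_categories(EXAMPLE_SETS)
```

Each key of the returned dictionary corresponds to one region of a Venn diagram such as those in Fig. 3 G and H, so the size of each region can be read off with `len`.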

DNA Sequence Specificity Landscapes of Pluripotency-Inducing ATFs. DNA targets of ATFs were examined by CSI, a method that captures the comprehensive binding profile of a DNA-binding factor (39–41). CSI enables the discovery of sequence specificity across all possible randomized 25-bp sequence permutations (Fig. 4A) (41). In brief, a DNA-binding protein is incubated with DNA sequences, protein–DNA complexes are isolated, and the bound DNA sequences are PCR amplified for the next round of selection. Two to three rounds of enrichment efficiently capture the broad spectrum of DNA sequence preferences of a given protein or ATF. The enriched cognate sites are identified via massively parallel high-throughput sequencing. The comprehensive binding site specificity of the factor is displayed as specificity and energy landscape (SEL) (Fig. 4 A and B) (40, 41).

The top-five 10-bp binding sites for each ATF of C2 were used to identify target genes. Binding sites within ±1 kb of the transcriptional start site (TSS) were considered in the analysis. We chose this stringent window based on ATF design principles (42), the tendency of sequence-specific TFs to exhibit a peak −300 bp relative to the TSS (43), and evidence that the predictive power of TF binding on gene regulation drops significantly when the binding sites examined are beyond 2 kb from the TSS (44, 45). Gene set enrichment analysis of all target genes with Enrichr (46) showed an overrepresentation of genes found in PluriNetWork for

Fig. 4. ATFs target pluripotency genes. (A) Cognate site identification (CSI). This method of determining the sequence specificity of DNA-binding factors involves incubating the ATF with randomized permutations of 25-bp sequences. The ATF–DNA complexes are captured with an HA antibody, and bound DNA is PCR amplified for the next round of selection. Three rounds of selection are performed, and all three rounds are multiplexed and sequenced to obtain specificity and energy landscapes (SELs) and position weight matrices (PWMs). (B) SELs display the comprehensive binding preferences based on a chosen seed motif. The height of the peak is associated with affinity. Sequences that are 1–2 bp longer than the seed are arranged in concentric rings. Each ring outward from the 0-mismatch ring displays sequences to the corresponding number of mismatches. Within the mismatch rings, sequences are arranged by position of the mismatch, and then alphabetically. (C) SELs of four ATFs validated in this study. The first two ATFs (light blue and orange) appear in all three combinations. Each SEL displayed shows data after three rounds of enrichment. (D) Gene set enrichment analysis of genes with a CSI score of >20 within ±1 kb of the TSS shows overrepresentation of genes from PluriNetWork for Mus musculus. The top-100–scoring 10-bp motifs for each ATF from C2 were used for this analysis. Binding sites were annotated to genes with Homer. Genes with a sum CSI score of >20 (equivalent to six or more ATF binding sites) were analyzed with Enrichr (WikiPathways). The combined score from Enrichr is calculated by multiplying the log of the P value by the z score (deviation from the expected rank). The adjusted P value corresponds to the Benjamini–Hochberg corrected P value.


Mus musculus (SI Appendix, Fig. S7E) (47). In addition to high-affinity sites, a wide range of binding sites with moderate to lower affinities are used to regulate the expression of different genes (48, 49). Therefore, rather than relying solely on single consensus motifs or position weight matrices, the top-100–scoring 10-bp binding sites for each ATF were used to identify target genes (Fig. 4C and Dataset S1). Analysis with Enrichr for 2,897 genes with a sum CSI score of 20 or greater (in other words, six or more medium- to high-affinity ATF binding sites within ±1 kb of the TSS) showed an overrepresentation of PluriNetWork genes (Fig. 4D) (47). This analysis was also performed for C3 and C4, and similar results were obtained, suggesting that the different ATF combinations activate similar nodes of the pluripotency network. To our surprise, Oct4 was not the primary target, as it ranks 4,788 for the sum CSI score for C2, 3,929 for C3, and 5,225 for C4. Other regulators in the pluripotency circuitry with significantly higher sum CSI scores would serve as primary targets of the ATFs, and these genes, subsequently, trigger the activation of endogenous Oct4.
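The idea of ranking genes by a sum CSI score can be sketched as follows: scan the sequence window around each TSS on both strands, add up the scores of every matching 10-bp site, and sort genes by the total. The motif scores, window sequences, and gene names below are hypothetical, the windows are far shorter than ±1 kb, and the scoring is a simplified assumption rather than the study's exact procedure.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def sum_csi_score(window_seq, motif_scores):
    """Sum the scores of all motif matches on both strands of a window.

    motif_scores maps a k-mer (all the same length) to a score.
    Returns (total score, number of sites). A palindromic motif would be
    counted once per strand in this simplified version.
    """
    k = len(next(iter(motif_scores)))
    total, sites = 0.0, 0
    for strand in (window_seq, reverse_complement(window_seq)):
        for i in range(len(strand) - k + 1):
            score = motif_scores.get(strand[i:i + k])
            if score is not None:
                total += score
                sites += 1
    return total, sites

def rank_genes(windows, motif_scores):
    """Return genes sorted by descending sum score, with their totals."""
    totals = {g: sum_csi_score(s, motif_scores)[0] for g, s in windows.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical 10-bp motifs and scores, for illustration only.
MOTIF_SCORES = {"GCGTGGGCGT": 4.0, "GGGGCGGGGC": 3.5}

# Hypothetical (very short) TSS windows; real windows would span +/-1 kb.
WINDOWS = {
    "GeneA": "TT" + "GCGTGGGCGT" + "AAAA" + "GGGGCGGGGC" + "TT",
    "GeneB": "ACGCCCACGC" + "TTTT",   # motif present on the reverse strand
    "GeneC": "ATATATATATATAT",
}

RANKING = rank_genes(WINDOWS, MOTIF_SCORES)
```

A threshold such as the sum score of 20 used in the text could then be applied to `RANKING`, and the rank of any gene of interest read off by its position in the list.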

ATF-Triggered Networks. To map the ATF target genes in the pluripotency network, CSI data were integrated with RNA-seq data. Genes expressed more than twofold higher in ATF-induced iPS cells than in Empty+SKM cells were filtered for those having ATF binding sites within ±1 kb of the TSS using the top-five–scoring motifs from CSI (Fig. 5A). Two other pairwise comparisons (C2/C3/C4+SKM at the early stage versus iPS stage and C2/C3/C4+SKM iPS cells versus Oct4+SKM iPS cells) were included for differential expression analysis. Of the genes up-regulated greater than twofold that also bear ATF binding sites, 17 were implicated in inducing pluripotency from previous studies (Fig. 5B and Dataset S2) (47, 50–57). To determine the direct ATF targets, we performed ChIP-seq on C2+SKM early cells before complete conversion to iPS cells and consequent silencing of the exogenously delivered ATFs. ChIP-seq of ATFs tagged with HA suggests binding of the ATFs to these targets (Fig. 5C and SI Appendix, Fig. S7A). Presence of H3K27ac peaks at the ATF target genes also indicates that they are actively expressed (SI Appendix, Fig. S7B).

A gene regulatory network based on the CSI results and differential expression data was built using information from the literature and the STRING database (Fig. 6A) (58). Greater ATF occupancy near the TSS for the 17 predicted targets in the gene regulatory network suggests they are direct targets (Fig. 6B). A comparison of direct OCT4 targets with the ATF target genes reveals striking differences (Fig. 6C) (47). These differences suggest that ATFs activate the pluripotency network through different nodes than the exogenously expressed Oct4. Although the primary targets may differ at the outset, the eventual iPS cells show remarkable convergence in the transcriptome profiles and epigenetic landscapes.
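The integration of expression and binding site data described in this section amounts to intersecting two filters. A minimal sketch, assuming differential expression is available as a log2 fold change with a P value per gene and that genes with binding sites near the TSS have already been listed, might look like this. The thresholds follow the text and the Fig. 5 legend (greater than twofold, P < 0.05); the gene names and numbers are hypothetical.

```python
import math

def candidate_targets(diff_expr, genes_with_sites, min_fold=2.0, max_p=0.05):
    """Genes up-regulated beyond min_fold (P < max_p) that also have ATF sites.

    diff_expr maps gene -> (log2 fold change, P value).
    genes_with_sites is the set of genes with an ATF binding site within
    the chosen TSS window (computed elsewhere).
    """
    min_log2 = math.log2(min_fold)
    return sorted(
        gene
        for gene, (log2_fc, p_value) in diff_expr.items()
        if log2_fc > min_log2 and p_value < max_p and gene in genes_with_sites
    )

# Hypothetical inputs, for illustration only.
DIFF_EXPR = {
    "GeneA": (3.2, 0.001),   # up-regulated, significant, has sites
    "GeneB": (2.5, 0.20),    # up-regulated but not significant
    "GeneC": (0.4, 0.01),    # significant but below twofold
    "GeneD": (4.0, 0.002),   # up-regulated, significant, no sites
}
GENES_WITH_SITES = {"GeneA", "GeneB", "GeneC"}

TARGETS = candidate_targets(DIFF_EXPR, GENES_WITH_SITES)
```

Genes passing both filters are only candidates; in the study, direct binding was then examined by ChIP-seq of the HA-tagged ATFs.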

Discussion

Zinc finger, TAL effector, and CRISPR/Cas9 libraries have been tested for loss-of-function phenotypes, acquisition of resistance to a drug, or up-regulation of specific genes (21, 22, 27, 59–63); however, this study reports a gain-of-function screen with a genome-scale ATF library to reprogram fibroblasts to iPS cells, a feat that requires drastic transcriptional and epigenetic changes. Rather than engineering ATFs to target unique sites in the genome, we chose the zinc finger DBD primarily because of its ability to target a range of different DNA sites as well as the ability to interact with methylated and heterochromatic DNA.

Fig. 5. ATFs activate key regulators of the pluripotency network. (A) Workflow for determining ATF target genes for C2+SKM. Three pairwise comparisons were made: (1) C2+SKM iPS cells vs. Empty+SKM cells, (2) C2+SKM early cells vs. Empty+SKM cells, (3) C2+SKM iPS cells vs. Oct4+SKM iPS cells. Genes up-regulated greater than twofold (P < 0.05) in the cells transduced with ATFs with ATF binding sites within a ±1-kb window of the TSS were determined to be potential targets. Binding sites were identified by using the top-five–scoring 10-bp motifs from CSI. These target genes were used to build the network in Fig. 6A with information from the literature and the STRING database. (B) Differentially expressed pluripotency genes with ATF binding sites within ±1 kb of the transcriptional start site (TSS). ATF binding sites were derived from the top-five–scoring 10-bp motifs from CSI for C2+SKM. (C) ChIP-seq signal for HA tag on ATFs for five predicted targets in Fig. 5B. Additional ChIP-seq traces are in SI Appendix, Fig. S5A. Traces display total reads for C2+SKM and Empty+SKM cells at an intermediate stage before reprogramming to a pluripotent state.


Furthermore, we deliberately engineered in a synthetic protein–protein interaction module that endows the ATFs with a unique ability to sample binding sites as cooperative dimers and thereby stimulate target gene expression to a greater extent due to transcriptional synergy (15). Based on design principles, we generated and screened a zinc finger ATF library of high complexity, not previously tested in mammalian cells. Furthermore, the capacity for the ATFs to cooperatively bind target genes provides the library with a unique feature to sample a larger set of binding sites cooperatively and activate to a greater extent due to synergy. Conventional zinc finger libraries consisted of ATFs created by shuffling a limited number of zinc finger units, previously characterized to bind specific triplets of nucleotides (64). The library used in this study uses a much larger repertoire of residues, incorporating 16 of the 20 possible amino acids in the recognition residues, greatly expanding the target space of the ATFs. Our design is consistent with a survey of natural zinc fingers found in eukaryotes, which finds that nature uses all amino acids in the recognition residues (65).

Application of the ATF library in this gain-of-function screen demonstrates that zinc finger ATFs can perturb the transcriptional profile of a cell to levels that are sufficiently robust to induce a dramatic phenotypic change. Like natural TFs, the ATFs in this study target 9- to 10-bp sites and can bind cooperatively as homodimers and heterodimers to target a diverse range of regulatory elements simultaneously. As each ATF in the library will have unique sequence preferences and varying degrees of affinity for DNA, a wide range of outcomes can be elicited upon introduction of these ATFs. More importantly, unlike natural TFs, ATFs do not necessarily rely on endogenous partner proteins, and thus, transcriptional networks can be stimulated from any homeostatic state. Moreover, rather than first identifying regulatory regions that might permit ATF binding, the use of a library permits the identification of molecules that can bind at functionally relevant loci to regulate the expression of critical genes that drive desired phenotypes or cell states.

Because we are using a large library of ATFs, capable of sampling thousands of genes in parallel, there is a potential for our ATFs to activate endogenous Oct4 directly; however, in our analysis, we find that the ATFs seem to activate Oct4 indirectly. It is interesting to note that two ATFs that recur in different combinations (ZFATF1 and ZFATF2) target sites that are prevalent in CpG islands. These ATFs could potentially amplify the output of transcriptional networks that are stimulated by combinations of pluripotency factors working alongside the exogenously expressed ATFs and SKM. Intriguingly, during the early stages of conversion, cells expressing ATFs+SKM exhibited a different transcriptome profile from those expressing Oct4+SKM; however, in the final iPS cell state, the expression profiles of all of the pluripotent cell types were similar at the molecular level. Although the functional pluripotency of the ATF-derived iPS cells is described, direct comparison with teratomas or EBs from OSKM-derived iPS cells remains to be studied. The differences in molecular signatures at the early stage suggest that the MEFs take different dedifferentiation routes to the same pluripotent state. Thus, this unbiased forward genetic approach can reveal unanticipated network linkages and identify ways to interface with a cell fate-defining gene regulatory network. More remarkably, our results show that continued expression of ATFs is not required to maintain a stable cell state once homeostatic systems are set.

In addition to providing a means to perform a forward genetic screen, we demonstrate that our ATF library is a powerful resource for anyone interested in inducing cell fate conversion in the absence of a priori knowledge of natural TFs that might govern the desired cell type or phenotype. Furthermore, we describe a strategy toward identifying cell fate-defining transcriptional networks. By integrating expression data with in vitro binding site data, we were able to identify the nodes of the transcriptional network implicated in the induction of pluripotency. This technology enables the pursuit of elusive cell phenotypes or direct conversions, considered challenging to achieve by conventional methods. In summary, this study provides compelling support of our design principles and demonstrates that our ATF library can be used in a gain-of-function screen for complex cell fate conversions.

Materials and Methods

Design: Choice of DBD Scaffold. In designing an ATF library for cell fate conversions, we first focused on choosing a DNA-binding scaffold that would be most conducive for performing a forward genetic screen. As noted above, the following criteria were considered.

i) Ability to target multiple nodes. Although nuclease-inactivated CRISPR/Cas9 (66) and TAL-effectors (67) require at least 10–20 bp to bind DNA, zinc fingers can be designed to target a wider range of sequences. The breadth of binding sites of zinc fingers is an important advantage. A zinc finger ATF can have thousands of binding sites in the genome, much like natural TFs, and perturb the transcriptome on a genomic scale. By contrast, highly specific DBDs such as TAL-effectors or RNA-guided CRISPR/Cas systems that target single genomic loci are less likely to perturb multiple nodes in a network and alter the homeostatic state to induce a change in cell fate.

ii) Regulatory potency. Among the DBDs, zinc fingers have the unique ability to target both active and silenced regions of the genome, a feature important for cell fate conversions. Certain naturally occurring families of C2H2 zinc fingers bind methylated DNA and heterochromatin (65, 68), making zinc fingers well suited for activating epigenetically silenced regions unlike TAL-effectors, which are sensitive to DNA methylation (61, 69). Although CRISPR/Cas9 systems can activate silenced genes, they show limited ability to increase the expression of genes that are already expressed at moderate or high levels (63, 70). Additionally, compared with zinc fingers and TAL-effectors that can up-regulate genes to biologically relevant levels (42, 71, 72), the magnitude of transcriptional change induced by nuclease-inactivated CRISPR/Cas9 systems with a single guide is not as robust (73). Recent modifications to the CRISPR/Cas9

Fig. 6. Transcriptional networks activated by ATFs and Oct4. (A) Nodes of the pluripotency network activated by the ATFs of C2. The size of the node reflects level of expression in C2+SKM iPS cells. (B) ChIP read density in C2+SKM early cells for HA-tagged ATFs for 17 direct target genes (Left) and 15 indirect target genes (Right) for nodes in Fig. 6A. Traces represent coverage across a window of −2 to +1.5 kb relative to the TSS. (C) Direct targets of Oct4 (47). The size of the node reflects level of expression in Oct4+SKM iPS cells.


system have improved their impact on the level of expression of target genes; however, these modifications come at the expense of increasing their size (70) or introducing additional effector molecules that can be recruited by Cas9 (74, 75) or the guide RNA (63, 76).

iii) Efficient delivery. Because cell fate conversions occur at low frequencies, efficient delivery of the ATFs to cells is critical for genome-scale gain-of-function screens. Each zinc finger unit is ∼30 aa, and three zinc fingers (90 aa) can be linked in tandem to target a 9- to 10-bp site (77). Compared with Cas9 (∼1,380 aa) or TAL-effectors (∼520 aa), designed to target a binding site of the same size, zinc fingers are much smaller and can be efficiently delivered to mammalian cells. Programmable molecules such as polyamides may provide a smaller ATF; however, the rules governing polyamide cell permeability are still not well understood, making delivery problematic (78). In brief, the zinc finger DBD emerges as an optimal scaffold for design of complex ATF libraries that can trigger cell fate-defining gene networks.

Zinc Finger ATF Library. The scaffold of the ATF comprises, from N to C terminus: a 15-aa ID, the DNA binding domain of human EGR1, NLS from EGR1, VP64, and 3× HA tag. The ATF library was created by amplifying oligos with VNN codons at the −1, 2, 3, and 6 positions relative to the recognition helix of each zinc finger. The ATF library was cloned into the second-generation pSIN vector by ligation-independent cloning as described in SI Appendix, SI Materials and Methods.
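The choice of VNN codons can be checked with a short enumeration. In IUPAC notation, V stands for A, C, or G and N for any base, so a VNN codon never begins with T. The sketch below expands VNN against the standard genetic code and lists which amino acids are encoded; the per-finger count at the end assumes the four randomized positions vary independently, which is an assumption of this illustration rather than a statement about the realized library.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# IUPAC symbols needed here: V = not T, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "ACG", "N": "ACGT"}

def expand(degenerate_codon):
    """List every concrete codon matching a degenerate IUPAC codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon))]

VNN_CODONS = expand("VNN")
ENCODED = {CODON_TABLE[c] for c in VNN_CODONS}
MISSING = set(AMINO) - {"*"} - ENCODED

# Variants per finger if the four randomized positions vary independently.
VARIANTS_PER_FINGER = len(ENCODED) ** 4
```

The enumeration gives 48 codons encoding 16 amino acids with no stop codon; the four residues that cannot be encoded (Phe, Tyr, Cys, and Trp) are those whose codons all begin with T. This agrees with the 16 of 20 amino acids mentioned in the Discussion.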

Cell Culture. Oct4:CremER-Cre-mER;mTmG MEFs were grown in DMEM supplemented with 10% (vol/vol) FBS on plates coated with 0.1% gelatin. Mouse E14T ES cells and iPS cells were grown in knockout DMEM supplemented with 15% (vol/vol) FBS, 1% nonessential amino acids, 2 mM L-glutamine, 1 × 10³ units/mL leukemia inhibitory factor, 1 mM sodium pyruvate, and 100 μM β-mercaptoethanol. During reprogramming of Oct4:CremER-Cre-mER;mTmG MEFs, 4-hydroxytamoxifen was added at 100 nM concentration. Cells were maintained in a humidified 37 °C incubator with 5% CO2. Additional details are in SI Appendix, SI Materials and Methods.

Luciferase Assay. The palindromic EGR1 binding site 5′-GCG-TGG-GCG-CGC-CCC-CGC-3′ was cloned upstream of the luciferase gene in the pGL3 basic vector (Promega). Luciferase assay (Promega; E4030) was performed according to the manufacturer’s guidelines. Additional details are in SI Appendix, SI Materials and Methods.

Retrovirus production. Oct4, Sox2, Klf4, and c-Myc were packaged into retrovirus with Plat-E cells as described in ref. 79.

Lentivirus production. ATFs and the empty control were packaged into lentivirus with HEK293FT cells using a second-generation lentiviral system. Details are described in SI Appendix, SI Materials and Methods.

Identification of ATFs from Single Cells. Cells with a positive phenotype for Oct4 lineage tracing activation were isolated as single cells into a 96-well plate. Nested two-step PCR is described in SI Appendix, SI Materials and Methods.

EB Formation. For EB formation, pluripotent cells were seeded into ultralow-adhesion dishes at a concentration of 1 × 10⁵ cells/mL in knockout DMEM supplemented with 15% (vol/vol) FBS, 1% nonessential amino acids, 2 mM L-glutamine, 1 mM sodium pyruvate, and 100 μM β-mercaptoethanol. Media was changed the day after seeding and every 2 d thereafter. EBs were collected on days 7, 11, and 14 for quantitative RT-PCR (RT-qPCR) and immunofluorescence. Cells were maintained in a humidified 37 °C incubator with 5% CO2.

Immunofluorescence. EBs were plated on poly-L-lysine on day 14 to culture EB outgrowths. iPS cells were plated on glass slides coated with 0.1% gelatin for immunofluorescence. Antibody sources and dilutions are described in SI Appendix, SI Materials and Methods.

RT-qPCR. RNA was extracted from cells with RNeasy Mini Kit (Qiagen; 74104). RNA was converted into cDNA with SuperScript III First-Strand Synthesis System (Thermo Fisher; 18080051). qPCR was performed with Bullseye EvaGreen qPCR Mix with low ROX (Midwest Scientific; BEQPCR-LR). Primer sets are listed in SI Appendix, SI Materials and Methods.

ChIP. For ChIP, 5 × 10⁶ cells were fixed in 1.5% (vol/vol) formaldehyde for 15 min. Harvested cells were flash frozen, and then sonicated and lysed. Lysates were precleared and immunoprecipitated overnight with H3K27ac antibody (Abcam; ab4729), H3K9me3 antibody (Abcam; ab8898), or HA antibody (Abcam; ab9110) at 4 °C. Immunoprecipitated histone marks were purified with protein G magnetic beads (Life Technologies; 10004D) and a series of five washes. Cross-links of protein–DNA complexes were reversed by incubating at 65 °C for 6 h. Eluted DNA was treated with RNase A and Proteinase K. Additional details are in SI Appendix, SI Materials and Methods.

RNA-Seq Analysis. Reads were aligned with Bowtie2, version 2.2.5, to either the human genome hg19 (HEK293) or mouse genome mm10 (MEFs or cells derived from MEFs). Counts were quantified with Cufflinks, and differential expression was determined by Cuffdiff (80).

ChIP-Seq Analysis. Reads were aligned to the mouse genome mm10 with Bowtie2, version 2.2.5. Output sam files were converted to bam files, sorted, and indexed with Samtools 1.3. H3K27ac, H3K9me3, and HA peaks were called with MACS2 2.2.1. Differential peak signals were determined by DiffBind 1.16.2. ChIP peaks were visualized with Integrative Genomics Viewer (IGV). Additional details are in SI Appendix, SI Materials and Methods.
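The sequence of steps in the ChIP-seq analysis (alignment, conversion to bam, sorting, indexing, and peak calling) can be sketched as a list of command lines assembled in Python. The file names, index prefix, thread count, the assumption of single-end reads, and the use of the MACS2 `--broad` option for H3K9me3 are all choices made for this illustration; the parameters actually used are given in SI Appendix and may differ. The function only builds the commands and does not run them.

```python
import shlex

def chipseq_commands(sample, fastq, index_prefix,
                     control_bam=None, broad=False, threads=4):
    """Build command lines for one single-end ChIP-seq sample.

    Returns a list of argument lists in execution order:
    Bowtie2 alignment, Samtools view/sort/index, and MACS2 callpeak.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    sorted_bam = f"{sample}.sorted.bam"

    commands = [
        # Align reads to the genome index (e.g. a hypothetical mm10 index).
        ["bowtie2", "-p", str(threads), "-x", index_prefix,
         "-U", fastq, "-S", sam],
        # Convert sam to bam, then sort and index the bam file.
        ["samtools", "view", "-b", "-o", bam, sam],
        ["samtools", "sort", "-o", sorted_bam, bam],
        ["samtools", "index", sorted_bam],
    ]

    # Call peaks against the mouse effective genome size ("mm").
    macs2 = ["macs2", "callpeak", "-t", sorted_bam,
             "-f", "BAM", "-g", "mm", "-n", sample]
    if control_bam:
        macs2 += ["-c", control_bam]
    if broad:
        # Assumption: broad peak mode for diffuse marks such as H3K9me3.
        macs2.append("--broad")
    commands.append(macs2)
    return commands

# Hypothetical sample and file names, for illustration only.
EXAMPLE_COMMANDS = chipseq_commands(
    "C2_SKM_H3K9me3", "C2_SKM_H3K9me3.fastq.gz", "mm10_index",
    control_bam="C2_SKM_input.sorted.bam", broad=True,
)
EXAMPLE_SCRIPT = "\n".join(shlex.join(cmd) for cmd in EXAMPLE_COMMANDS)
```

Each argument list could be passed to `subprocess.run(cmd, check=True)` on a system where the tools are installed; printing `EXAMPLE_SCRIPT` shows the equivalent shell commands for inspection first.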

CSI. CSI was performed by incubating cell lysates containing zinc finger ATFs with randomized permutations of 25-bp sequences. The ATF–DNA complexes were immunoprecipitated with HA magnetic beads. Three rounds of enrichment were performed, and all three rounds of enrichment were sequenced. Experimental details and bioinformatic analysis are described in SI Appendix, SI Materials and Methods.
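One simple way to think about the sequencing readout of an enrichment experiment of this kind is to count every 10-bp window in the selected 25-bp reads and compare its frequency with that in an unselected background pool. The sketch below does this with a pseudocounted frequency ratio. The ratio is an assumption made for illustration and is not the CSI score used in the study (see SI Appendix for the actual analysis); the reads are invented.

```python
from collections import Counter

def kmer_counts(reads, k=10):
    """Count every overlapping k-mer across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def enrichment(selected_reads, background_reads, k=10, pseudocount=1.0):
    """Ratio of k-mer frequency in selected vs. background reads.

    The pseudocount keeps k-mers absent from the background finite.
    Only k-mers observed in the selected reads are scored.
    """
    sel = kmer_counts(selected_reads, k)
    bg = kmer_counts(background_reads, k)
    sel_total = sum(sel.values()) + pseudocount
    bg_total = sum(bg.values()) + pseudocount
    return {
        kmer: ((n + pseudocount) / sel_total)
              / ((bg.get(kmer, 0) + pseudocount) / bg_total)
        for kmer, n in sel.items()
    }

# Hypothetical 25-bp reads; the selected pool shares one planted 10-mer.
MOTIF = "GCGTGGGCGT"
SELECTED = [
    "ACTAGTC" + MOTIF + "TTCAGGAC",
    "TGACCTAGGATCA" + MOTIF + "CA",
    "CT" + MOTIF + "AGTCCATGGATAC",
]
BACKGROUND = [
    "ACGATCGTACGTTAGCCTAGGATCA",
    "TTGACCATGCAAGTCTGACGTATCG",
    "CAGTTCAGGATCCATGACTAGCTAA",
]

SCORES = enrichment(SELECTED, BACKGROUND)
TOP_KMER = max(SCORES, key=SCORES.get)
```

In this toy example the planted 10-mer is the most enriched window because it recurs in every selected read and is absent from the background. Real data would require many more reads and a statistical treatment of the counts.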

ACKNOWLEDGMENTS. We thank Sandra Tseng and Graham Erwin for help with ChIP-seq analysis. We thank José Rodríguez-Martínez for CSI analysis and Christina Shafer for help with RNA-seq analysis. We also acknowledge Mitchell Probasco for help with flow cytometry, Jennifer Bolin for help with Illumina sequencing, and Bret Duffin for the teratoma assay. We thank Laura Vanderploeg for help with figure graphics. We are also grateful to Judith Kimble, Sushmita Roy, Garrett Lee, and Fang Wan for helpful discussions. This work was supported by the NIH Grant HL099773, W. M. Keck Medical Research Award, and Progenitor Cell Biology Consortium Jump-Start Award 5U01HL099997-05 (Subaward 101330A). A.E. was supported by the Morgridge Biotechnology Wisconsin Distinguished Fellowship Award and the Stem Cell and Regenerative Medicine Training Award. A.S.K. was supported by Genomic Sciences Training Program Grant 5T32HG002760. D.B. was supported by a National Science Foundation–Nanoscale Science and Engineering Center grant.

1. Takahashi K, Yamanaka S (2006) Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126(4):663–676.

2. Takahashi K, et al. (2007) Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell 131(5):861–872.

3. Yu J, et al. (2007) Induced pluripotent stem cell lines derived from human somatic cells. Science 318(5858):1917–1920.

4. Davis RL, Weintraub H, Lassar AB (1987) Expression of a single transfected cDNA converts fibroblasts to myoblasts. Cell 51(6):987–1000.

5. Ieda M, et al. (2010) Direct reprogramming of fibroblasts into functional cardiomyocytes by defined factors. Cell 142(3):375–386.

6. Vierbuchen T, et al. (2010) Direct conversion of fibroblasts to functional neurons by defined factors. Nature 463(7284):1035–1041.

7. Sekiya S, Suzuki A (2011) Direct conversion of mouse fibroblasts to hepatocyte-like cells by defined factors. Nature 475(7356):390–393.

8. Cohen DE, Melton D (2011) Turning straw into gold: Directing cell fate for regenerative medicine. Nat Rev Genet 12(4):243–252.

9. Becker JS, Nicetto D, Zaret KS (2016) H3K9me3-dependent heterochromatin: Barrier to cell fate changes. Trends Genet 32(1):29–41.

10. Eguchi A, Lee GO, Wan F, Erwin GS, Ansari AZ (2014) Controlling gene networks and cell fate with precision-targeted DNA-binding proteins and small-molecule-based genome readers. Biochem J 462(3):397–413.

11. Ansari AZ, Mapp AK (2002) Modular design of artificial transcription factors. Curr Opin Chem Biol 6(6):765–772.

12. Mapp AK, Ansari AZ, Ptashne M, Dervan PB (2000) Activation of gene expression by small molecule transcription factors. Proc Natl Acad Sci USA 97(8):3930–3935.

13. Mapp AK, Ansari AZ (2007) A TAD further: Exogenous control of gene activation. ACS Chem Biol 2(1):62–75.

14. Ansari AZ (2007) Chemical crosshairs on the central dogma. Nat Chem Biol 3(1):2–7.

15. Ptashne M (2004) A Genetic Switch (Cold Spring Harbor Lab Press, Cold Spring Harbor, NY), 3rd Ed.

16. Moretti R, Ansari AZ (2008) Expanding the specificity of DNA targeting by harnessing cooperative assembly. Biochimie 90(7):1015–1025.

17. Wolfe SA, Nekludova L, Pabo CO (2000) DNA recognition by Cys2His2 zinc finger proteins. Annu Rev Biophys Biomol Struct 29:183–212.

18. Segal DJ, Barbas CF (2001) Custom DNA-binding proteins come of age: Polydactyl zinc-finger proteins. Curr Opin Biotechnol 12(6):632–637.


19. Klug A (2010) The discovery of zinc fingers and their applications in gene regulation and genome manipulation. Annu Rev Biochem 79:213–231.

20. Triezenberg SJ, Kingsbury RC, McKnight SL (1988) Functional dissection of VP16, the trans-activator of herpes simplex virus immediate early gene expression. Genes Dev 2(6):718–729.

21. Bae K-H, et al. (2003) Human zinc fingers as building blocks in the construction of artificial transcription factors. Nat Biotechnol 21(3):275–280.

22. Blancafort P, Magnenat L, Barbas CF (2003) Scanning the human genome with combinatorial transcription factor libraries. Nat Biotechnol 21(3):269–274.

23. Carroll D, Morton JJ, Beumer KJ, Segal DJ (2006) Design, construction and in vitro testing of zinc finger nucleases. Nat Protoc 1(3):1329–1341.

24. Greisman HA, Pabo CO (1997) A general strategy for selecting high-affinity zinc finger proteins for diverse DNA target sites. Science 275(5300):657–661.

25. Lee J, et al. (2011) Induction of stable drug resistance in human breast cancer cells using a combinatorial zinc finger transcription factor library. PLoS One 6(7):e21112.

26. Park K-S, et al. (2003) Phenotypic alteration of eukaryotic cells using randomized libraries of artificial transcription factors. Nat Biotechnol 21(10):1208–1214.

27. Tschulena U, Peterson KR, Gonzalez B, Fedosyuk H, Barbas CF (2009) Positive selection of DNA-protein interactions in mammalian cells through phenotypic coupling with retrovirus production. Nat Struct Mol Biol 16(11):1195–1199.

28. Wang BS, Grant RA, Pabo CO (2001) Selected peptide extension contacts hydrophobic patch on neighboring zinc finger and mediates dimerization on DNA. Nat Struct Biol 8(7):589–593.

29. Teschendorf C, Warrington KH, Siemann DW, Muzyczka N (2002) Comparison of the EF-1 alpha and the CMV promoter for engineering stable tumor cell lines using recombinant adeno-associated virus. Anticancer Res 22(6A):3325–3330.

30. Greder LV, et al. (2012) Analysis of endogenous Oct4 activation during induced pluripotent stem cell reprogramming using an inducible Oct4 lineage label. Stem Cells 30(11):2596–2601.

31. Cahan P, et al. (2014) CellNet: Network biology applied to stem cell engineering. Cell 158(4):903–915.

32. Whyte WA, et al. (2013) Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell 153(2):307–319.

33. Hnisz D, et al. (2013) Super-enhancers in the control of cell identity and disease. Cell 155(4):934–947.

34. Schultz DC, Ayyanathan K, Negorev D, Maul GG, Rauscher FJ (2002) SETDB1: A novel KAP-1-associated histone H3, lysine 9-specific methyltransferase that contributes to HP1-mediated silencing of euchromatic genes by KRAB zinc-finger proteins. Genes Dev 16(8):919–932.

35. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9(4):357–359.

36. Zhang Y, et al. (2008) Model-based analysis of ChIP-Seq (MACS). Genome Biol 9(9):R137.

37. Ramírez F, Dündar F, Diehl S, Grüning BA, Manke T (2014) deepTools: A flexible platform for exploring deep-sequencing data. Nucleic Acids Res 42(Web server issue):W187–W191.

38. Ross-Innes CS, et al. (2012) Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature 481(7381):389–393.

39. Warren CL, et al. (2006) Defining the sequence-recognition profile of DNA-binding molecules. Proc Natl Acad Sci USA 103(4):867–872.

40. Carlson CD, et al. (2010) Specificity landscapes of DNA binding molecules elucidate biological function. Proc Natl Acad Sci USA 107(10):4544–4549.

41. Tietjen JR, Donato LJ, Bhimisaria D, Ansari AZ (2011) Sequence-specificity and energy landscapes of DNA-binding molecules. Methods Enzymol 497:3–30.

42. Rebar EJ, et al. (2002) Induction of angiogenesis in a mouse model using engineered transcription factors. Nat Med 8(12):1427–1432.

43. Koudritsky M, Domany E (2008) Positional distribution of human transcription factor binding sites. Nucleic Acids Res 36(21):6795–6805.

44. Cheng C, Gerstein M (2012) Modeling the relative relationship of transcription factor binding and histone modifications to gene expression levels in mouse embryonic stem cells. Nucleic Acids Res 40(2):553–568.

45. Whitfield TW, et al. (2012) Functional analysis of transcription factor binding sites in human promoters. Genome Biol 13(9):R50.

46. Chen EY, et al. (2013) Enrichr: Interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics 14(1):128.

47. Som A, et al. (2010) The PluriNetWork: An electronic representation of the network underlying pluripotency in mouse, and its applications. PLoS One 5(12):e15165.

48. Farley EK, et al. (2015) Suboptimization of developmental enhancers. Science 350(6258):325–328.

49. Crocker J, et al. (2015) Low affinity binding site clusters confer hox specificity and regulatory robustness. Cell 160(1-2):191–203.

50. Sharov AA, et al. (2008) Identification of Pou5f1, Sox2, and Nanog downstream target genes with statistical confidence by applying a novel algorithm to time course microarray and genome-wide chromatin immunoprecipitation data. BMC Genomics 9(1):269.

51. Marson A, et al. (2008) Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells. Cell 134(3):521–533.

52. Heng J-CD, et al. (2010) The nuclear receptor Nr5a2 can replace Oct4 in the reprogramming of murine somatic cells to pluripotent cells. Cell Stem Cell 6(2):167–174.

53. Buganim Y, et al. (2012) Single-cell expression analyses during cellular reprogramming reveal an early stochastic and a late hierarchic phase. Cell 150(6):1209–1222.

54. Shu J, et al. (2013) Induction of pluripotency in mouse somatic cells with lineage specifiers. Cell 153(5):963–975.

55. Lujan E, et al. (2015) Early reprogramming regulators identified by prospective isolation and mass cytometry. Nature 521(7552):352–356.

56. Krentz AD, et al. (2013) Interaction between DMRT1 function and genetic background modulates signaling and pluripotency to control tumor susceptibility in the fetal germ line. Dev Biol 377(1):67–78.

57. Kim J, Chu J, Shen X, Wang J, Orkin SH (2008) An extended transcriptional network for pluripotency of embryonic stem cells. Cell 132(6):1049–1061.

58. Szklarczyk D, et al. (2015) STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43(Database issue):D447–D452.

59. Zhou Y, et al. (2014) High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells. Nature 509(7501):487–491.

60. Wang T, Wei JJ, Sabatini DM, Lander ES (2014) Genetic screens in human cells using the CRISPR-Cas9 system. Science 343(6166):80–84.

61. Kim Y, et al. (2013) A library of TAL effector nucleases spanning the human genome. Nat Biotechnol 31(3):251–258.

62. Li Y, Ehrhardt K, Zhang MQ, Bleris L (2014) Assembly and validation of versatile transcription activator-like effector libraries. Sci Rep 4:4857.

63. Konermann S, et al. (2015) Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature 517(7536):583–588.

64. Gonzalez B, et al. (2010) Modular system for the construction of zinc-finger libraries and proteins. Nat Protoc 5(4):791–810.

65. Najafabadi HS, et al. (2015) C2H2 zinc finger proteins greatly expand the human regulatory lexicon. Nat Biotechnol 33(5):555–562.

66. Jinek M, et al. (2012) A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337(6096):816–821.

67. Boch J, et al. (2009) Breaking the code of DNA binding specificity of TAL-type III effectors. Science 326(5959):1509–1512.

68. Filion GJP, et al. (2006) A family of human zinc finger proteins that bind methylated DNA and repress transcription. Mol Cell Biol 26(1):169–181.

69. Valton J, et al. (2012) Overcoming transcription activator-like effector (TALE) DNA binding domain sensitivity to cytosine methylation. J Biol Chem 287(46):38427–38432.

70. Chavez A, et al. (2015) Highly efficient Cas9-mediated transcriptional programming. Nat Methods 12(4):326–328.

71. Gao X, et al. (2013) Reprogramming to pluripotency using designer TALE transcription factors targeting enhancers. Stem Cell Rep 1(2):183–197.

72. Bailus BJ, et al. (2016) Protein delivery of an artificial transcription factor restores widespread Ube3a expression in an Angelman syndrome mouse brain. Mol Ther 24(3):548–555.

73. Esvelt KM, et al. (2013) Orthogonal Cas9 proteins for RNA-guided gene regulation and editing. Nat Methods 10(11):1116–1121.

74. Gilbert LA, et al. (2014) Genome-scale CRISPR-mediated control of gene repression and activation. Cell 159(3):647–661.

75. Tanenbaum ME, Gilbert LA, Qi LS, Weissman JS, Vale RD (2014) A protein-tagging system for signal amplification in gene expression and fluorescence imaging. Cell 159(3):635–646.

76. Zalatan JG, et al. (2015) Engineering complex synthetic transcriptional programs with CRISPR RNA scaffolds. Cell 160(1-2):339–350.

77. Pabo CO, Sauer RT (1992) Transcription factors: Structural families and principles of DNA recognition. Annu Rev Biochem 61:1053–1095.

78. Edelson BS, et al. (2004) Influence of structural variation on nuclear localization of DNA-binding polyamide-fluorophore conjugates. Nucleic Acids Res 32(9):2802–2818.

79. Takahashi K, Okita K, Nakagawa M, Yamanaka S (2007) Induction of pluripotent stem cells from fibroblast cultures. Nat Protoc 2(12):3081–3089.

80. Trapnell C, et al. (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7(3):562–578.

E8266 | www.pnas.org/cgi/doi/10.1073/pnas.1611142114 Eguchi et al.
