
Genome-wide chromatin maps derived from limited numbers of hematopoietic progenitors


NATURE METHODS | VOL.7 NO.8 | AUGUST 2010 | 615

BRIEF COMMUNICATIONS

Mazhar Adli¹⁻³, Jiang Zhu¹⁻³ & Bradley E Bernstein¹⁻³

¹Howard Hughes Medical Institute, Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA. ²Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. ³Center for Systems Biology and Center for Cancer Research, Massachusetts General Hospital, Boston, Massachusetts, USA. Correspondence should be addressed to B.E.B. ([email protected]).

Received 26 March; accepted 28 May; published online 11 July 2010; doi:10.1038/nmeth.1478

Current methods for whole-genome mapping of protein-DNA interactions, performed by coupling chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq), require large amounts of starting materials, which precludes their application to rare cell types. Here we combine a high-sensitivity ChIP assay with a new library preparation procedure to map histone modifications in as few as 10,000 cells. We used the technique to characterize mouse hematopoietic progenitors and thereby gain insight into their developmental program.

Chromatin immunoprecipitation (ChIP) coupled with high-throughput sequencing (ChIP-seq) has become a standard tool for whole-genome mapping of histone modifications, transcription factors and chromatin-associated proteins¹. However, the technique is plagued by inefficiencies at ChIP and sequencing steps that necessitate large amounts of starting materials, typically on the order of millions or tens of millions of cells. ChIP assays yield small amounts of DNA, whose availability for downstream assays is additionally reduced by DNA damage introduced during fixation and fragmentation. Although locus-specific analysis has been achieved for small cell numbers by coupling ChIP with PCR or promoter arrays²⁻⁵, such assays do not achieve the comprehensiveness afforded by DNA sequencing approaches. Sequencing, however, requires relatively large amounts of DNA, in part because library preparation involves an inefficient ligation step that requires double-stranded DNA with intact ends. Direct sequencing of DNA without need for ligation or amplification⁶ can address some of these inefficiencies and be used to interrogate small quantities of immunoprecipitated DNA. However, neither standard nor direct ChIP-seq approaches have so far been successfully applied to analyze small numbers of cells. We therefore developed a ChIP-seq assay for limited numbers of cells using the Illumina Genome Analyzer, the dominant short-read sequencing platform.

First, we optimized the ChIP assay for small samples. We identified reproducible conditions for shearing small amounts of chromatin and minimized background resulting from nonspecific pulldown by reducing sample volumes and titrating quantities of antibody and beads (Supplementary Fig. 1 and Online Methods). These improvements enabled us to effectively enrich for histone H3 Lys4 trimethylation (H3K4me3) in a ChIP performed with 10,000 mouse embryonic stem cells (ESCs), as judged by quantitative PCR (qPCR).

However, the modified ChIP procedure yielded scarce quantities of DNA, below the detection limit of fluorometry, and thus remained incompatible with standard ChIP-seq protocols, which require several nanograms of immunoprecipitated DNA. We estimated that an H3K4me3 ChIP performed on 10,000 cells yielded ~10–50 pg of DNA and sought to develop a library preparation procedure compatible with these amounts. We initially pursued a strategy that coupled a random primer–based amplification procedure widely used in microarray studies⁷ with standard Illumina library preparation. But when we sequenced the resulting libraries and integrated the reads into density maps, we observed many false positive peaks, a conspicuous absence of signal over (G+C)-rich regions and high numbers of unalignable reads that likely reflected primer artifacts.

To overcome these problems, we implemented several strategies (Fig. 1a). First, we tested modified random primers with

Figure 1 panels (graphics not reproduced). (a) Step 1, primer extension (primer 1); step 2, PCR (primer 2, 10–15 cycles); step 3, restriction (BciVI digestion) and Illumina adapter ligation. (b) Bar chart of relative fold enrichment (y axis, 0–200) at Sox2, Egf4, Six1, Gapdh and Chr18 for small-cell-number ChIP-seq (100,000, 50,000 and 10,000 cells) and standard ChIP-seq (20 million cells).

Figure 1 | Preparation of sequencing library from small amounts of immunoprecipitated DNA. (a) Schematic of ChIP-seq library preparation for small cell numbers. Universal PCR sequence is indicated in orange, restriction site in gray and random 9-mer in blue. (b) qPCR data for H3K4me3 ChIP-seq libraries from indicated cell numbers showing relative enrichments at four gene promoters with H3K4me3 in ESCs. Data were normalized against an intergenic control region on chromosome 18 (chr18). Error bars, s.e.m. based on three experimental replicates.

© 2010 Nature America, Inc. All rights reserved.


bulky chemical groups or that formed secondary structures that prevent self-annealing. By using a random primer hairpin structure and implementing an exonuclease digestion step after priming, we could amplify small quantities of immunoprecipitated DNA while minimizing the amount of nonspecific product (Fig. 1a, Supplementary Fig. 2 and Online Methods). Second, we identified additives, cycling conditions and a polymerase that enabled faithful amplification of immunoprecipitated DNA, maintaining representation of (G+C)-rich sequences (Fig. 1b and Online Methods). Finally, we digested the amplified DNA at a BciVI site introduced near the ends of the immunoprecipitated fragments to yield double-stranded products with 3′ adenine overhangs that could be ligated directly to Illumina adapters for sequencing.

We validated this protocol by analyzing the sequencing libraries by qPCR because we could not evaluate the scarce starting ChIP samples. This qPCR procedure quantified short amplicons corresponding to positive- and negative-control genomic sites whose chromatin states are relatively invariant across cell types (Fig. 1b, Supplementary Fig. 1 and Online Methods).

We deep sequenced a library prepared from an H3K4me3 ChIP performed on 10,000 ESCs. We obtained roughly ten million 36-base reads. Initial alignments indicated that the first nine bases of the sequencing reads had higher mismatch rates, possibly owing to imperfect hybridization of random primers. Therefore, we performed subsequent alignments using bases 10–36 of each read. Seven million of the ten million sequenced reads could be aligned to the mouse reference genome, as is typical for ChIP-seq experiments. We processed aligned reads into a density profile using standard procedures⁸. Visual analysis of the ChIP-seq map generated from 10,000 ESCs suggested good concordance to a standard H3K4me3 ChIP-seq map generated from roughly ten million ESCs (Fig. 2a), with striking peaks at a majority of (G+C)-rich promoters. A more quantitative analysis revealed roughly 93% overlap between a set of 11,193 H3K4me3-enriched promoters identified in the small-cell-number dataset and 12,079 H3K4me3-enriched

Figure 2 | Validation of small-cell-number ChIP-seq maps. (a) H3K4me3 profiles and RefSeq gene positions across a 50-kb region on chromosome 2. (b) Overlap between H3K4me3-marked promoters identified using the two ChIP-seq methods. (c) Receiver operating characteristic curve shows true positive and false positive rates for H3K4me3 small-cell-number ChIP-seq data, evaluated for 1-kb windows across the genome, using the standard H3K4me3 ChIP-seq dataset as a gold standard. Red line indicates x = y. (d) Chromatin maps from 20,000 LSK cells in a biological replicate. Contour plots depict correlation between methylation in promoter regions (H3K4me3 and H3K27me3) or within genes (H3K36me3) and mRNA expression in LSK cells. Each data point corresponds to a single gene. (e) Contour plots depict correlations between LSK cell replicates in gene promoters and within genes. (f) H3K4me3, H3K27me3 and H3K36me3 signal tracks for the independent LSK cell replicates for the Hoxa cluster and the Flt3 locus. The y axis of each profile represents the estimated number of immunoprecipitated fragments at each position normalized by the total number of reads in a given dataset. RefSeq gene annotations are also shown.

Figure 2 panels (graphics not reproduced). (a) Tracks: H3K4me3 standard ChIP-seq (20 million ESCs), H3K4me3 small-cell-number ChIP-seq (10,000 cells) and RefSeq genes. (b) Venn diagram: standard ChIP-seq (n = 12,079) and small-cell-number ChIP-seq (n = 11,193), with 10,416 promoters shared (93%), 1,663 found only in the standard set and 777 found only in the small-cell-number set. (c) Axes: false positive rate (x) and true positive rate (y), each from 0 to 1.0. (d) Gene expression plotted against H3K4me3, H3K27me3 and H3K36me3 signal. (e) ChIP-seq signal in LSK cells, replicate 1 versus replicate 2 (log10), for H3K4me3, H3K27me3 and H3K36me3 at promoters and within genes; the correlation coefficients shown range from r = 0.84 to r = 0.92. (f) LSK 1 and LSK 2 tracks for each modification over the Hoxa cluster and the Flt3 and Pan3 region.


promoters identified from the standard dataset (Fig. 2b). Considering the standard ChIP-seq dataset as a gold standard, we estimate that the small-cell-number ChIP-seq dataset had a sensitivity of ~80% at a specificity of ~90% (Fig. 2c and Online Methods).
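The overlap percentage quoted above can be recomputed from the promoter counts in the text and the 10,416 shared promoters shown in Fig. 2b. The following is a minimal sketch; the function and variable names are illustrative and not taken from the paper. The second function gives the textbook definitions of sensitivity and specificity, which the paper evaluates on 1-kb genomic windows rather than on promoter counts, so it is shown only with made-up window counts.

```python
# Counts reported in the text and in Fig. 2b.
SMALL_CELL_TOTAL = 11_193   # promoters called in the 10,000-cell dataset
STANDARD_TOTAL = 12_079     # promoters called in the standard dataset
SHARED = 10_416             # promoters called in both


def overlap_fraction(shared, total):
    """Fraction of one promoter set that is also present in the other set."""
    return shared / total


def sensitivity_specificity(tp, fp, tn, fn):
    """Standard definitions, treating the reference dataset as truth."""
    sensitivity = tp / (tp + fn)   # true positives recovered
    specificity = tn / (tn + fp)   # true negatives correctly left uncalled
    return sensitivity, specificity


small_only = SMALL_CELL_TOTAL - SHARED      # called only with 10,000 cells
standard_only = STANDARD_TOTAL - SHARED     # called only with the standard assay
frac_of_small = overlap_fraction(SHARED, SMALL_CELL_TOTAL)
frac_of_standard = overlap_fraction(SHARED, STANDARD_TOTAL)

print(f"small-cell-only promoters: {small_only}")
print(f"standard-only promoters:   {standard_only}")
print(f"shared / small-cell set:   {frac_of_small:.1%}")
print(f"shared / standard set:     {frac_of_standard:.1%}")

# Hypothetical window counts, chosen only to illustrate the two definitions.
print(sensitivity_specificity(tp=80, fp=10, tn=90, fn=20))
```

The 93% figure corresponds to the shared promoters as a fraction of the small-cell-number set; expressed against the standard set the fraction is lower, which is why it matters which denominator is used.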

Next, we profiled chromatin in hematopoietic stem and progenitor cells isolated from mouse bone marrow and enriched for the immunophenotype lineage⁻, Sca-1⁺ and c-kit⁺ (LSK) (Supplementary Fig. 3 and Online Methods). Using roughly 20,000 cells per assay, we profiled H3K4me3 and two other histone H3 modifications, Lys27 trimethylation (H3K27me3) and Lys36 trimethylation (H3K36me3). These experiments yielded 13–19 million reads, of which roughly 75% could be aligned to the mouse reference genome. Several lines of evidence suggest that the resulting chromatin maps accurately reflect the true patterns of histone modification in the LSK population. H3K4me3 and H3K36me3 localize to gene promoters and gene bodies, respectively, consistent with known roles for these marks in transcriptional initiation and elongation⁹. Integration of published gene expression profiles for LSK cells¹⁰ confirmed expected positive (H3K4me3 and H3K36me3) and negative (H3K27me3) correlations between the modifications and the transcriptional status of corresponding genes (Fig. 2d). Finally, we confirmed the reproducibility of the method by profiling all three modifications in a second LSK population (Fig. 2e,f).

The LSK cell chromatin profiles revealed several interesting findings of potential relevance to hematopoietic development. First, we observed relatively few large H3K4me3 regions or ‘domains’ (>10 kilobases (kb)) that were distinct from the sharp peaks observed at most promoters (Fig. 3a). In ESCs, the largest H3K4me3 domains correspond to pluripotency genes such as Pou5f1 (also known as Oct4) (ref. 8). In the hematopoietic progenitors, the largest H3K4me3 domains coincide with known hematopoietic regulators, including Hoxb4, Hoxa7, Hoxa9, Runx1, Meis1, Ikzf2 and Tal1 (Fig. 3a–c and Supplementary Table 1).

To investigate the importance of the H3K4me3 domains, we examined chromatin maps for ESCs⁸. Roughly one-third of loci with H3K4me3 domains in LSK cells carry both H3K4me3 and H3K27me3 in ESCs. Such ‘bivalent domains’ have been correlated with developmental loci that are silent but poised for activation at later development stages¹¹,¹². Reasoning that this subset of loci could illuminate genes with critical functions in hematopoietic progenitors, we collated a list of genes marked by bivalent domains in ESCs and H3K4me3 domains in LSK cells. This list included 40 genes, 30 of which encode transcription factors or other developmental regulators with previously described functions in hematopoiesis (Supplementary Table 2).

Although initially described in ESCs, bivalent domains are also present in multipotent cells, including hematopoietic progenitors⁸,¹³,¹⁴. The LSK cell chromatin maps contained roughly 1,700 promoters with detectable H3K27me3 and H3K4me3. A majority of these promoters were also bivalent in ESCs, but they had variable patterns in CD4⁺ T cells, a differentiated progeny of LSK cells¹⁵ (Fig. 3d and Supplementary Fig. 4).

Whereas essentially all promoters with H3K27me3 in ESCs also had high H3K4me3, H3K27me3-marked promoters in LSK cells had a wide range of H3K4me3 levels. We explored the importance of the bivalent state in an in vivo setting, specifically asking whether the extent of H3K4me3 predicts the likelihood that a given promoter will be activated in differentiated progeny. We integrated our chromatin data with a compendium of gene expression profiles for differentiated hematopoietic cells¹⁵,¹⁶. We found that the extent of H3K4me3 at a given H3K27me3-marked promoter in LSK cells significantly correlated with the number of differentiated cell types in which the corresponding transcript could be detected (P < 10⁻⁵; Fig. 3e). Accordingly, master regulators of hematopoietic lineages, such as Pax5 and Ebf1, had high levels of H3K27me3 and H3K4me3 in LSK cells (Fig. 3d

Figure 3 panels (graphics not reproduced). (a) H3K4me3 tracks over Meis1 and Actr6 (50-kb scale, with 5-kb insets). (b) Distribution of H3K4me3 domain size (kb, log10), with labels for Hoxa7, Hoxa9, Meis1, Hlf, Hoxb4, Nfe2, Ikzf2, Lmo2, Runx1 and Tal1. (c) H3K4me3, H3K27me3 and H3K36me3 tracks in ESCs and LSK cells over Zfp608 and Zeb2 (20-kb scale). (d) H3K4me3 and H3K27me3 tracks in ESCs, LSK cells and T cells over Sox3 and Lef1. (e) Promoters ordered by H3K4me3 signal in LSK cells, from H3K27me3 only to high H3K4me3 (P < 10⁻⁵), with example genes Lef1, Pax5, Ebf1, Gata6, Sox3, Myod1, Pax3 and Pax7; expression columns for CD4⁺ Th1, CD4⁺ Th17, CD4⁺ Th2, CD4⁺ nTreg, CD4⁺ iTreg, CD4⁺ naive, CD8⁺ activated, CD8⁺ naive, NK cell, B cell, erythrocytes, monocyte and granulocytes; number of cell types (2–8) plotted alongside.

Figure 3 | Chromatin domains at developmental regulators in hematopoietic progenitors. (a) LSK cell H3K4me3 profiles for Meis1 and Actr6 genes. Insets show expanded images of these profiles over the transcription start sites of these genes. (b) Size distribution of H3K4me3 intervals associated with gene promoters in LSK cells; the high end of the size distribution was enriched for indicated hematopoietic regulators. Green line indicates domain size of 10 kb. (c,d) ChIP-seq signal tracks for ESCs and LSK cells across Zfp608 and Zeb2 (c) and Lef1 and Sox3 (d). (e) Heat map of genes with H3K27me3 and variable levels of H3K4me3 in LSK cells, depicting expression levels of the indicated genes in different hematopoietic cell types (blue indicates low expression; red indicates high expression). Th, helper T cells; nTreg, natural regulatory T cells; and iTreg, inducible regulatory T cells. The number of cell types in which a gene is expressed is plotted on the right.


and Supplementary Fig. 5). In contrast, master regulators that function in nonhematopoietic lineages, such as Myod1 and Sox3, are enriched for H3K27me3 but not for H3K4me3 (Fig. 3d and Supplementary Fig. 5). Thus, the presence of bivalent chromatin in hematopoietic progenitors was associated with an increased likelihood of transcriptional induction during differentiation. Further study is needed to determine the causality of these chromatin structures and their relationship to transcriptional priming and to the developmental potential of progenitor cells¹⁷.

This ChIP-seq approach should be generally applicable for characterizing chromatin landscapes in biologically and clinically important cell models that have been inaccessible owing to inadequate sample size. Going forward, additional reductions in cell numbers may be achievable by careful optimization of ChIP conditions. The library preparation scheme and accompanying validation procedure may also be useful for profiling transcription factors from relatively smaller samples. However, this application is likely to prove challenging owing to variability among factors and affinity reagents and the exceedingly small DNA yields associated with such experiments.

METHODS
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturemethods/.

Accession codes. Gene Expression Omnibus (GEO): GSE22075.

Note: Supplementary information is available on the Nature Methods website.

ACKNOWLEDGMENTS
We thank D. Flowers and I. Bernstein for expertise, reagents and assistance in isolation of hematopoietic cells, N. Shoresh and T. Mikkelsen for computational assistance and B. Knoechel, E. Mendenhall, A. Goren and A. Chi for constructive discussions and critical reading of the manuscript. This research was supported by funds from the Starr Cancer Consortium, a Charles E. Culpeper Scholarship, the US National Human Genome Research Institute, the National Institutes of Health Roadmap for Epigenomics and the National Heart, Lung, and Blood Institute.

AUTHOR CONTRIBUTIONS
M.A. and B.E.B. designed the method. M.A. performed the experiments. M.A., J.Z. and B.E.B. analyzed the data and wrote the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturemethods/.

Published online at http://www.nature.com/naturemethods/. Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Park, P.J. Nat. Rev. Genet. 10, 669–680 (2009).
2. Acevedo, L.G. et al. Biotechniques 43, 791–797 (2007).
3. O’Neill, L.P., VerMilyea, M.D. & Turner, B.M. Nat. Genet. 38, 835–841 (2006).
4. Attema, J.L. et al. Proc. Natl. Acad. Sci. USA 104, 12371–12376 (2007).
5. Dahl, J.A. & Collas, P. Nucleic Acids Res. 36, e15 (2008).
6. Goren, A. et al. Nat. Methods 7, 47–49 (2010).
7. Lieb, J.D., Liu, X., Botstein, D. & Brown, P.O. Nat. Genet. 28, 327–334 (2001).
8. Mikkelsen, T.S. et al. Nature 448, 553–560 (2007).
9. Li, B., Carey, M. & Workman, J.L. Cell 128, 707–719 (2007).
10. Tothova, Z. et al. Cell 128, 325–339 (2007).
11. Azuara, V. et al. Nat. Cell Biol. 8, 532–538 (2006).
12. Bernstein, B.E. et al. Cell 125, 315–326 (2006).
13. Cui, K. et al. Cell Stem Cell 4, 80–93 (2009).
14. Weishaupt, H., Sigvardsson, M. & Attema, J.L. Blood 115, 247–256 (2010).
15. Wei, G. et al. Immunity 30, 155–167 (2009).
16. Chambers, S.M. et al. Cell Stem Cell 1, 578–591 (2007).
17. Bottardi, S., Ghiam, A.F., Bergeron, F. & Milot, E. Cell Cycle 6, 1035–1039 (2007).


ONLINE METHODS
Cell acquisition. Mouse V6.5 ESCs were grown for two passages on 0.2% gelatin-coated plates containing DMEM supplemented with 15% fetal calf serum, 0.01% leukemia inhibitory factor, 1% penicillin-streptomycin, 1% Glutamax (Invitrogen), 1% non-essential amino acids and 0.001% 2-mercaptoethanol in 5% CO₂ at 37 °C. Hematopoietic stem and progenitor (LSK) cells¹⁸,¹⁹ were enriched from mouse bone marrow by first depleting cells expressing lineage markers (CD2, CD3, CD5, CD8a, CD45R, CD11b, GR-1 and TER-119) with Dynal beads (Invitrogen), followed by fluorescence-activated cell sorting (FACS) for cells expressing Sca-1 and c-Kit on Vantage or Aria cell sorters (Becton Dickinson) as described previously²⁰. Postsort analysis confirmed that over 90% of the cells expressed the LSK phenotype (Supplementary Fig. 3). Two distinct biological replicates of LSK cells, enriched from independent mouse bone marrow isolates, were profiled.

High-sensitivity ChIP assay. Approximately 10,000 ESCs, counted with a hemocytometer, or 20,000 FACS-separated LSK cells were used for each ChIP assay. Cells were cross-linked with 1% formaldehyde in 1 ml PBS at room temperature (25 °C) for 10 min. After quenching with glycine for 5 min, the cells were washed twice with ice-cold PBS with 10% serum (addition of serum facilitates cell recovery during washes). Cells were collected after each wash by centrifugation at 2,500g for 3 min. Cell pellets were resuspended in 100 μl of lysis buffer containing 1% SDS, 10 mM EDTA and 50 mM Tris-HCl (pH 8.1), flash frozen and stored at −80 °C. Frozen pellets were thawed on ice for 10 min to allow lysis. Lysate was then diluted with 400 μl of ChIP dilution buffer containing 0.01% SDS, 1.1% Triton X-100, 1.2 mM EDTA and 16.7 mM Tris-HCl (pH 8.1). Dilution of lysis buffer ensures sufficient volume during sonication and prevents excessive lysis. Cells were sonicated under conditions optimized to yield soluble chromatin fragments in a size range of 200 to 700 base pairs. Chromatin from 10,000 ESCs was sonicated for 115 s using a Branson 250 sonifier at 40% power amplitude (pulses: 0.7 s on and 1.3 s off). Chromatin from 20,000 LSK cells was sonicated for 150 s with the same settings. To avoid loss of materials, we proceeded with immunoprecipitation directly, without intervening centrifugation. Immunoprecipitation was carried out overnight in 1 ml of ChIP dilution buffer supplemented with protease inhibitor cocktails (Roche). We scaled down antibody quantities for the mini-ChIP assays as follows: 0.25 μg H3K4me3 (Millipore; 07473); 0.5 μg H3K27me3 (Millipore; 07449); and 0.25 μg H3K36me3 (Abcam; 9050). Excess antibody increased background in these assays, presumably owing to nonspecific pulldown. After overnight incubation, samples were rotated with ~10 μl of prewashed protein A–Sepharose (bed volume) at 4 °C for 2 h. The beads were then collected by brief centrifugation at 1,000g, keeping the unbound fraction to confirm fragmentation by gel electrophoresis. Next, the beads were successively washed twice with 700 μl of each of the following buffers at 4 °C: low-salt immune complex wash buffer (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl (pH 8.1) and 150 mM NaCl); LiCl wash buffer (0.25 M LiCl, 1% NP40, 1% deoxycholate, 1 mM EDTA and 10 mM Tris-HCl (pH 8.1)); and TE (10 mM Tris-HCl and 1 mM EDTA (pH 8.0)). In each case, beads were collected on filter columns (Costar SPIN-X) to maximize recovery. Bound chromatin was then eluted from the beads twice in 125 μl of 0.2% SDS and 0.1 M NaHCO₃ supplemented with fresh 5 mM DTT by incubation at 65 °C for 10 min followed by centrifugation at 16,000g for 1 min. The eluted chromatin and the ‘input’ sample were then incubated at 65 °C for 5 h before digestion with proteinase K at 37 °C for 2 h. Immunoprecipitated DNA was then extracted once using phenol, chloroform and isoamyl alcohol, and precipitated with ethanol and glycogen. Precipitated DNA was washed once with 70% ethanol and incubated with RNase A at 37 °C for 30 min.
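The dilution of the lysate before sonication (100 μl of lysis buffer plus 400 μl of ChIP dilution buffer) sets the detergent and buffer concentrations of the sonicated sample. As a rough sketch, the final concentrations in the 500-μl mixture can be estimated by volume-weighted averaging. The helper name is illustrative, and the calculation assumes that the cell pellet contributes negligible volume and that Triton X-100 is absent from the lysis buffer, as its stated recipe suggests.

```python
def mix_concentration(vol_a, conc_a, vol_b, conc_b):
    """Concentration after combining two solutions (same units for both inputs)."""
    return (vol_a * conc_a + vol_b * conc_b) / (vol_a + vol_b)


LYSIS_UL = 100.0      # lysis buffer volume from the protocol
DILUTION_UL = 400.0   # ChIP dilution buffer volume from the protocol

# (lysis buffer value, dilution buffer value) as given in the text
sds_pct = mix_concentration(LYSIS_UL, 1.0, DILUTION_UL, 0.01)
triton_pct = mix_concentration(LYSIS_UL, 0.0, DILUTION_UL, 1.1)   # assumes no Triton in lysis buffer
edta_mm = mix_concentration(LYSIS_UL, 10.0, DILUTION_UL, 1.2)
tris_mm = mix_concentration(LYSIS_UL, 50.0, DILUTION_UL, 16.7)

print(f"SDS:          {sds_pct:.3f} %")
print(f"Triton X-100: {triton_pct:.2f} %")
print(f"EDTA:         {edta_mm:.2f} mM")
print(f"Tris-HCl:     {tris_mm:.2f} mM")
```

Under these assumptions the SDS concentration falls roughly fivefold, from 1% to about 0.2%, before sonication.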

Small-cell-number ChIP-seq library preparation. The library method described here consists of three main steps. In step 1, ChIP DNA (estimated at 10–50 pg) was primed with a universal primer for four cycles (primer 1, 5′-GACATGTATCCGGATGTNNNNNNNNN-3′) to create template DNA with common sequences incorporated at the ends. Primer 1 contains a common sequence for PCR, a restriction site for BciVI and a random 9-mer at the 3′ end. Priming was achieved using Sequenase V2.0 (US Biochemical), a polymerase with strand displacement capability but no 3′ to 5′ exonuclease activity, as described previously⁷. First, we incubated 7 μl of chromatin immunoprecipitated DNA with 1 μl of primer 1 (2 mM stock) and 2 μl of Sequenase buffer at 98 °C briefly, and then annealed at 8 °C for 5 min. We then added 5.5 μl of Sequenase enzyme mix containing 1.5 μl of 3 nM dNTPs, 0.75 μl of 0.1 M DTT, 1.5 μl of 500 mg ml⁻¹ BSA and 0.3 μl of 13 U μl⁻¹ Sequenase. The temperature was gradually increased to 37 °C and incubated for 8 min. The whole cycle was then repeated with the addition of 1.2 μl of diluted Sequenase (1:4) instead of the enzyme mix. After four cycles of priming, excess primer 1 was cleaned up by exonuclease and alkaline phosphatase treatment as follows: 3 μl of ExoSAP-IT reagent (USB) was added to 15 μl of step 1 reaction and incubated for 15 min at 37 °C. The enzymes were then heat inactivated at 85 °C for 15 min.
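The layout of primer 1 described above can be checked directly from the two primer sequences given in the text. The sketch below splits primer 1 into the common PCR sequence (identical to primer 2), the restriction site and the random 9-mer. The recognition sequence GTATCC for BciVI is an assumption taken from enzyme supplier catalogs rather than from the paper, and the function name is illustrative.

```python
PRIMER_1 = "GACATGTATCCGGATGTNNNNNNNNN"   # from the text, 5' to 3'
PRIMER_2 = "GACATGTATCCGGATGT"            # from the text, 5' to 3'
BCIVI_SITE = "GTATCC"                     # assumption: recognition sequence per supplier catalogs


def dissect_primer(primer, universal, site):
    """Split a primer into universal part, restriction site position and random tail."""
    if not primer.startswith(universal):
        raise ValueError("primer does not begin with the universal sequence")
    site_start = universal.find(site)
    if site_start < 0:
        raise ValueError("restriction site not found in the universal sequence")
    site_end = site_start + len(site)
    return {
        "universal": universal,
        "site_start": site_start,                 # 0-based index of the site
        "spacer": universal[site_end:],           # bases between site and random tail
        "random_tail": primer[len(universal):],   # degenerate 3' end
    }


parts = dissect_primer(PRIMER_1, PRIMER_2, BCIVI_SITE)
print(parts)
```

With the sequences as given, primer 1 is primer 2 followed by nine degenerate bases, and six bases separate the assumed recognition site from the random 9-mer.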

In step 2, product from the priming reaction was PCR amplified using primer 2 (5′-GACATGTATCCGGATGT-3′). The template DNA from step 1 was diluted by addition of 43 μl of water. Four parallel PCRs (50 μl total volume each) were set up with 15 μl of template DNA from step 1, 1 μl of primer 2 (5 mM stock), 1.5 μl of Phusion DNA polymerase (NEB), 10 μl of high GC buffer (NEB) and 1% DMSO. The amplification consisted of 15 cycles of denaturation (98 °C for 30 s), annealing (40 °C for 30 s and 50 °C for 30 s) and extension (72 °C for 1 min).
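The volumes in step 2 can be cross-checked with simple bookkeeping. The sketch below assumes that the diluted template pool is the 15-μl step 1 reaction plus the 3 μl of ExoSAP-IT reagent plus 43 μl of water; the text does not state this explicitly. It also reports the theoretical ceiling on amplification after 15 cycles, assuming perfect doubling in every cycle, which real PCR does not reach.

```python
# Volumes in microlitres, as stated in the protocol.
STEP1_REACTION_UL = 15.0
EXOSAP_UL = 3.0
WATER_UL = 43.0
TEMPLATE_PER_PCR_UL = 15.0
N_PARALLEL_PCRS = 4
PCR_CYCLES = 15

# Assumption: the ExoSAP-treated reaction (15 + 3 ul) is what gets diluted with water.
template_pool_ul = STEP1_REACTION_UL + EXOSAP_UL + WATER_UL
template_used_ul = TEMPLATE_PER_PCR_UL * N_PARALLEL_PCRS
leftover_ul = template_pool_ul - template_used_ul


def max_fold_amplification(cycles):
    """Upper bound on fold amplification, assuming perfect doubling per cycle."""
    return 2 ** cycles


max_fold = max_fold_amplification(PCR_CYCLES)

print(f"template pool: {template_pool_ul} ul, used: {template_used_ul} ul, "
      f"left over: {leftover_ul} ul")
print(f"theoretical maximum amplification after {PCR_CYCLES} cycles: {max_fold}-fold")
```

Under this reading, the 43 μl of water brings the pool to just over the 60 μl needed for four reactions of 15 μl each.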

In step 3, PCR product was column purified using the MinElute reaction cleanup kit (Qiagen) and digested overnight with BciVI (NEB) to yield 3′ adenine overhangs at each end. Digested fragments were ligated to Illumina adapters using the Quick Ligation kit (NEB). This approach eliminated several standard library preparation steps, considerably reducing material loss. Ligation products were amplified for 18 cycles using standard Illumina primers and then size selected on 2% agarose, retaining 275–600-base-pair (bp) fragments. The resulting libraries were hybridized to flow cells, subjected to cluster amplification and sequenced on the Illumina Genome Analyzer using standard procedures.

nature methods doi:10.1038/nmeth.1478

Quality control and library validation. We developed a qPCR-based quality-control strategy to assess whether sequencing libraries maintained representation with minimal production of nonspecific product. For this purpose, we designed a series of primers that yield short amplicons (~80–100 bp). Primers were designed to genomic sites representing negative control regions that, to our knowledge, have not been found to be modified in any cell type studied to date, as well as several positive control regions of varying G+C sequence content that are frequently enriched for H3K4me3, H3K27me3 or H3K36me3 (Supplementary Table 3). Faithful libraries tend to support amplification of modified and unmodified sites, with the former detected at substantially higher levels. In contrast, poor-quality libraries may have equal representation of modified and unmodified sites, reflecting a failed ChIP assay, or very low levels for all examined sites, owing to failed library preparation. Library prepared from control input sample was used to assess relative enrichment levels.
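The logic of this quality-control step can be sketched in Python. The sketch converts qPCR threshold cycles (Ct) into levels relative to the input library, assuming perfect doubling per cycle, and then applies the three outcomes described above. The Ct-based formula, the thresholds `min_ratio` and `min_level`, and the function names are illustrative assumptions and are not values from this study.

```python
def relative_level(ct_library, ct_input):
    """Signal in the ChIP library relative to the input library.

    Assumes 100% amplification efficiency (one doubling per cycle).
    """
    return 2.0 ** (ct_input - ct_library)


def assess_library(positive_levels, negative_levels,
                   min_ratio=5.0, min_level=0.05):
    """Classify a library from relative levels at control sites.

    min_ratio and min_level are hypothetical thresholds for illustration.
    """
    all_levels = list(positive_levels) + list(negative_levels)
    if max(all_levels) < min_level:
        # Very low signal at every site examined.
        return "failed library preparation"
    mean_pos = sum(positive_levels) / len(positive_levels)
    mean_neg = sum(negative_levels) / len(negative_levels)
    if mean_pos < min_ratio * mean_neg:
        # Modified and unmodified sites are represented about equally.
        return "failed ChIP"
    return "faithful"
```

For example, a positive site detected three cycles earlier in the ChIP library than in the input library would have a relative level of 8 under the doubling assumption.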

Processing of small-cell-number ChIP-seq sequencing reads. Illumina reads were postprocessed and aligned to the reference genome (UCSC, mm8) from the tenth base using MAQ (mapping and assembly with quality) with default parameters except that reads with more than 10 hits in the genome were excluded. Reads that mapped to more than one position in the genome were then filtered out. Multiple reads mapping to the same position were only counted once to remove potential bias from PCR. Aligned reads were extended by 300 bases to roughly approximate the average size of sequenced fragments, and a 25-bp resolution density map was established by counting the number of fragments that overlap each position. Positions in the density map where less than 50% of the flanking 200 bp are alignable were masked as repetitive and discarded from subsequent analysis. We collated a set of 19,149 RefSeq genes (US National Center for Biotechnology Information, 13 May 2009 update). For each gene, we calculated promoter signals for H3K4me3 and H3K27me3 using mean signal density over a 4-kb interval centered on the transcription start site. We calculated H3K36me3 signals using mean signal density over the gene body. H3K4me3-enriched intervals were called by a window-scan procedure. We first established a background signal model by randomly moving each read to an alignable position on the same chromosome. Then, for each 1-kb window sliding across the genome by 25-bp steps, we calculated a nominal P value by comparison with the random background. Windows with nominal P < 10−5 were identified as enriched, and enriched windows separated by gaps of size < 1 kb were merged. ChIP-seq data for mouse ESCs8 and CD4+ T cells15 were retrieved from previous studies. Datasets can be accessed at http://www.broadinstitute.org/cgi-bin/seq_platform/chipseq/shared_portal/clone/Nano_ChIP_Seq.py.
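Several of these steps can be illustrated with a minimal Python sketch: collapsing duplicate reads, extending reads to 300 bases, binning at 25-bp resolution, averaging over a 4-kb promoter interval and merging enriched windows separated by less than 1 kb. This is not the pipeline used in the study. The read representation (chromosome, 0-based 5′ position, strand), the function names and the in-memory dictionary are assumptions, and alignability masking and the P-value calculation are omitted.

```python
from collections import defaultdict


def density_map(reads, extension=300, bin_size=25):
    """Count extended fragments overlapping each bin.

    reads: iterable of (chrom, five_prime_pos, strand) tuples.
    Reads at the same position and strand are counted once.
    """
    density = defaultdict(int)
    for chrom, pos, strand in set(reads):
        if strand == "+":
            start, end = pos, pos + extension
        else:
            start, end = max(0, pos - extension + 1), pos + 1
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            density[(chrom, b)] += 1
    return density


def promoter_signal(density, chrom, tss, width=4000, bin_size=25):
    """Mean density over an interval of the given width centered on the TSS."""
    start = max(0, tss - width // 2)
    end = tss + width // 2
    bins = range(start // bin_size, (end - 1) // bin_size + 1)
    return sum(density.get((chrom, b), 0) for b in bins) / len(bins)


def merge_windows(windows, max_gap=1000):
    """Merge (start, end) windows whose separating gap is below max_gap."""
    merged = []
    for start, end in sorted(windows):
        if merged and start - merged[-1][1] < max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(w) for w in merged]


# Toy example: two identical reads collapse to one fragment.
toy_reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 250, "+")]
toy_density = density_map(toy_reads)
toy_merged = merge_windows([(0, 1000), (1500, 2500), (4000, 5000)])
```

In the toy example, the first two windows are separated by 500 bp and are merged, whereas the third lies 1.5 kb away and stays separate.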

Comparison of standard ChIP-seq and small-cell-number ChIP-seq data for ESCs. We collated a set of 11,193 H3K4me3-enriched promoters in the small-cell-number ChIP-seq dataset and 12,079 enriched promoters in a standard ChIP-seq dataset based on density distributions. In addition, we evaluated the genome-wide correspondence between the respective datasets by calculating the mean H3K4me3 ChIP-seq densities for promoter regions and nonoverlapping 1-kb windows across the genome. The top 12,000 1-kb windows in a standard H3K4me3 ChIP-seq dataset were defined as ‘positives’. Considering these positives as a ‘gold standard’, we then calculated true positive rates and false positive rates for the H3K4me3 small-cell-number ChIP-seq dataset at varying cutoff values for receiver operating characteristic (ROC) curve analysis (Fig. 2c and Supplementary Figs. 6 and 7).
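The ROC calculation can be sketched in Python as follows. The top-scoring windows of a reference dataset are taken as positives, and the test dataset is then swept from its highest to its lowest density, recording the false and true positive rates at each cutoff. The dictionaries keyed by window identifier, the function names and the toy values are assumptions for illustration. Ties in density are not treated specially here.

```python
def top_windows(reference_scores, n):
    """Identifiers of the n highest-scoring windows in the reference data."""
    ranked = sorted(reference_scores, key=reference_scores.get, reverse=True)
    return set(ranked[:n])


def roc_points(test_scores, positives):
    """(false positive rate, true positive rate) at each successive cutoff.

    Assumes every positive window also has a score in test_scores.
    """
    n_pos = len(positives)
    n_neg = len(test_scores) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for window in sorted(test_scores, key=test_scores.get, reverse=True):
        if window in positives:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (fpr, tpr) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data: four windows, two of which are reference positives.
reference = {"w1": 50.0, "w2": 3.0, "w3": 40.0, "w4": 1.0}
test = {"w1": 9.0, "w2": 7.0, "w3": 5.0, "w4": 1.0}
positives = top_windows(reference, 2)
curve = roc_points(test, positives)
auc = area_under_curve(curve)
```

In the study, the positives were the top 12,000 1-kb windows of the standard dataset. The toy example uses two positives only to keep the arithmetic easy to follow.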

Expression data for LSK cells. Gene expression profiles for LSK cells and other differentiated hematopoietic cells were retrieved from published datasets10,15,16. The original CEL files were processed with the GCRMA (GC robust multiarray average) package and quantile normalized.
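The quantile normalization step can be illustrated with a small pure-Python sketch. It replaces each value with the mean, across arrays, of the values sharing its rank, so that all arrays end up with the same distribution. This sketch does not reproduce GCRMA background correction or probe summarization. The function name and toy values are assumptions, and tied values are ranked in index order rather than averaged.

```python
def quantile_normalize(columns):
    """Quantile normalize equal-length lists, one list per array."""
    n = len(columns[0])
    n_arrays = len(columns)
    sorted_cols = [sorted(col) for col in columns]
    # Mean across arrays of the values at each rank.
    rank_means = [sum(col[i] for col in sorted_cols) / n_arrays
                  for i in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy data: two arrays with three probes each.
toy_arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
toy_normalized = quantile_normalize(toy_arrays)
```

After normalization, both toy arrays contain the same set of values, with each probe keeping its original rank within its array.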

18. Uchida, N., Aguila, H.L., Fleming, W.H., Jerabek, L. & Weissman, I.L. Blood 83, 3758–3779 (1994).

19. Morrison, S.J. & Weissman, I.L. Immunity 1, 661–673 (1994).

20. Varnum-Finney, B., Dallas, M.H., Kato, K. & Bernstein, I.D. Blood 111, 2615–2620 (2008).
