Spring 2017 – Epigenetics and Systems Biology
Discussion Session (Epigenetics and Disease Etiology)
Michael K. Skinner – Biol 476/576
Week 13 (April 6)

Epigenetics and Disease Etiology

Primary Papers
1. Godfrey, et al. (2007) Pediatr Res. 61(5 Pt 2):5R-10R.
2. Gjoneska, et al. (2015) Nature 518:365
3. Stirzaker, et al. (2015) Nature Comm 6:5899

Discussion

Student 33 – Ref #1 above
• What is the mismatch concept?
• How does epigenetics apply to the hypothesis?
• What mechanism is involved in the developmental origins of disease?

Student 34 – Ref #2 above
• What epigenetic technology and marks were examined?
• What epigenetic signals were identified?
• What link between immunology and Alzheimer’s was identified?

Student 36 – Ref #3 above
• What is triple negative breast cancer?
• What are methylation clusters?
• What was the prognostic value of the epigenetic signature?



Epigenetic Mechanisms and the Mismatch Concept of the Developmental Origins of Health and Disease

KEITH M. GODFREY, KAREN A. LILLYCROP, GRAHAM C. BURDGE, PETER D. GLUCKMAN, AND MARK A. HANSON

Centre for Developmental Origins of Health and Disease [K.M.G., K.A.L., G.C.B., M.A.H.], University of Southampton, Southampton SO16 5YA, United Kingdom; Liggins Institute and National Research Centre for Growth and Development [P.D.G.], University of Auckland, Private Bag 92019, Auckland, New Zealand

ABSTRACT: There is now considerable evidence that elements of the heritable or familial component of disease susceptibility are transmitted by nongenomic means, and that environmental influences acting during early development shape disease risk in later life. The underlying mechanisms are thought to involve epigenetic modifications in nonimprinted genes induced by aspects of the developmental environment, which modify gene expression without altering DNA sequences. These changes result in life-long alterations in gene expression. Such nongenomic tuning of phenotype through developmental plasticity has adaptive value because it attempts to match an individual’s responses to the environment predicted to be experienced. When the responses are mismatched, disease risk increases. An example of such mismatch is that arising either from inaccurate nutritional cues from the mother or placenta before birth, or from rapid environmental change through improved socioeconomic conditions, which contribute substantially to the increasing prevalence of type-2 diabetes, obesity, and cardiovascular disease. Recent evidence suggests that the effects can be transmitted to more than the immediately succeeding generation, through female and perhaps male lines. Future research into epigenetic processes may permit us to develop intervention strategies. (Pediatr Res 61: 5R–10R, 2007)

Epidemiologic studies have demonstrated a robust association between small size at birth and during infancy, and a greater risk of chronic disease including coronary heart disease, hypertension, stroke, type 2 diabetes, and osteoporosis in later life (1). It is now accepted that the associations do not reflect confounding by adult environmental risk factors such as smoking or socioeconomic status, and the original observations from the Southampton group have been extensively replicated worldwide (1). A recent meta-analysis of 18 studies reported that the relative risk of adult coronary heart disease was 0.84 for each 1 kg increase in birth weight (2). This value is likely to substantially underestimate the developmental influence as there is much experimental evidence that the prenatal environment can induce long-term cardiovascular effects without necessarily affecting size at birth (3). Moreover, profound effects have now been demonstrated if there is a “mismatch” between the early, developmental environment and the subsequent environment in childhood and adult life (4). These and other observations have resulted in wide recognition that the “Developmental Origins of Health and Disease” has major public health implications worldwide. For example, a recent World Health Organization Technical Consultation concluded, “The global burden of death, disability, and loss of human capital as a result of impaired fetal development is huge and affects both developed and developing countries” (5). The report advocates a move away from a focus simply on low birth weight, to broader considerations of maternal well-being, and achieving the optimal environment for the fetus to maximize its potential for a full and healthy life.
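The per-kilogram relative risk quoted above can be turned into a rough estimate for other birth weight differences. The sketch below assumes, purely for illustration, that the relative risk scales multiplicatively (log-linearly) with birth weight; the text reports only the figure of 0.84 per 1 kg, and the function name `relative_risk` is hypothetical.

```python
# Illustrative only: assumes the relative risk (RR) of adult coronary heart
# disease scales log-linearly with birth weight, which the source does not state.
RR_PER_KG = 0.84  # RR per 1 kg increase in birth weight, as quoted in the text (2)


def relative_risk(delta_kg: float, rr_per_kg: float = RR_PER_KG) -> float:
    """Return the RR for a birth weight difference of delta_kg kilograms."""
    return rr_per_kg ** delta_kg


if __name__ == "__main__":
    # Negative differences (lighter babies) give RR above 1 under this assumption.
    for delta in (-1.0, -0.5, 0.5, 1.0):
        print(f"{delta:+.1f} kg -> RR {relative_risk(delta):.2f}")
```

Under this assumption a difference of zero gives a relative risk of exactly 1, and lighter birth weights map to values above 1. As the text notes, size at birth is likely to understate the developmental influence, so such numbers are at best a lower bound on the effect.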

In parallel with the epidemiologic observations, animal studies have demonstrated the importance of epigenetic changes in mediating the effects on adult phenotype and physiology arising from perturbations of the developmental environment, including maternal diet (6,7), uterine blood flow (8), and maternal nursing behavior (9). The role of epigenetic processes in the early stages of some forms of cancer is well established (10), but we are only now starting to appreciate that epigenetic processes also have major implications for our understanding of evolutionary mechanisms and for human development, reproduction, and degenerative disease. The effects on the offspring of epigenetic changes during development in animals mimic aspects of human disease, for example, metabolic disease, impaired renal function, or exaggerated stress responses, and a coherent theory for a role of epigenetic mechanisms in the developmental origins of later chronic disease is emerging. This is the subject of this review.

MISMATCH, DEVELOPMENTAL PLASTICITY, AND EPIGENETICS

Steep temporal trends in the incidence rates of cardiovascular disease in many populations suggest that the epidemiologic associations are unlikely to have arisen exclusively through the pleiotropic effects of genes that influence both fetal growth and later cardiovascular risk. In contrast, the effects are now viewed as the result of the phenotype established by the interaction between genes and the developmental environment using the processes of developmental plasticity (11–13). As in other species, developmental plasticity attempts to “tune” gene expression to produce a phenotype best suited to the predicted later environment (14). When the resulting phenotype is matched to its environment, the organism will remain healthy. When there is a mismatch, the individual’s ability to respond to environmental challenges may be inadequate and risk of disease increases. Thus, the degree of the mismatch determines the individual’s susceptibility to chronic disease (4).

Received November 14, 2006; accepted January 9, 2007.

Correspondence: Mark A. Hanson, Ph.D., Centre for Developmental Origins of Health and Disease, Princess Anne Hospital (Mailpoint 887), Coxford Road, Southampton SO16 5YA, UK; e-mail: [email protected]

M.A.H. is supported by the British Heart Foundation.

DOI: 10.1203/pdr.0b013e318045bedb

Abbreviations: Dnmt, DNA methyltransferase; CpG, cytosine and guanine adjacent to each other in the genome, linked by a phosphodiester bond; GR, glucocorticoid receptor; HDAC, histone deacetylase; HMT, histone methyltransferase; PPAR, peroxisome proliferator-activated receptor; RNA Pol, RNA polymerase; TF, transcription factor

0031-3998/07/6105-0005R. PEDIATRIC RESEARCH Vol. 61, No. 5, Pt 2, 2007. Copyright © 2007 International Pediatric Research Foundation, Inc. Printed in U.S.A.

The degree of mismatch can by definition be increased by either poorer environmental conditions during development, or richer conditions later, or both (4). Unbalanced maternal diet, body composition, or disease can perturb the former; the rapid increase in energy-dense foods and reduced physical activity levels associated with a western lifestyle will increase the degree of mismatch via the latter (Fig. 1). Such changes are of considerable importance in developing societies going through rapid socioeconomic transitions. In this review, we focus on the epigenetic components of such inherited risk of disease, while noting that other, nongenomic mechanisms also operate to alter risk of disease in subsequent generations, e.g. the passage of cultural risk factors such as smoking.
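A toy calculation can make this definition of mismatch concrete. The sketch below assumes that each environment can be scored on a single hypothetical "richness" scale from 0 (poor) to 1 (rich), and that mismatch is the amount by which the later environment exceeds the developmental one. Neither the scale nor the function `mismatch_degree` comes from the source; they only reproduce the stated direction of the effect.

```python
def mismatch_degree(developmental_richness: float, later_richness: float) -> float:
    """Toy measure: how far the later environment exceeds the developmental one.

    Both inputs are hypothetical scores in [0, 1]. The result is 0 when the
    later environment is no richer than the developmental environment.
    """
    for value in (developmental_richness, later_richness):
        if not 0.0 <= value <= 1.0:
            raise ValueError("richness scores must lie in [0, 1]")
    return max(0.0, later_richness - developmental_richness)


if __name__ == "__main__":
    # Poorer development or a richer later life both raise the score.
    print(mismatch_degree(0.5, 0.8))
    print(mismatch_degree(0.2, 0.8))
    print(mismatch_degree(0.2, 0.9))
```

With this measure, lowering the developmental score or raising the later score each increases the mismatch, and doing both increases it further, which is the pattern the paragraph describes.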

The processes of phenotypic induction through developmental plasticity produce integrated changes in a range of organs via epigenetic processes. They establish a life-course strategy for meeting the demands of the predicted later environment (15). This explains why an impaired early environment produces a range of effects (alterations in cardiovascular and metabolic homeostasis, growth and body composition, cognitive and behavioral development, reproductive function, repair processes and longevity), some of which are associated with increased risk of cardiovascular and metabolic disease, “precocious” puberty, osteoporosis, and some forms of cancer. Understanding the underlying epigenetic processes thus holds the key to understanding the underlying pathophysiology and to developing approaches to early diagnosis, prevention and treatment of these diseases.

EPIGENETIC PROCESSES – DEFINITION AND MECHANISMS

The term “epigenetic” was coined by Waddington (16) to refer to the ways in which the developmental environment can influence the mature phenotype. His work and that of others (17) on developmental plasticity stemmed from observations that environmental influences during development could induce alternative phenotypes from a genotype, some of the clearest examples being polyphenisms in insects (18). Such processes can, however, also induce a gradation of phenotypes, constituting a population reaction norm (19). Waddington showed in Drosophila melanogaster that wing vein pattern could be affected by heat shock treatment of the pupae (20). Breeding individuals with these environmentally induced changes led eventually to a stable population exhibiting the phenotype without the environmental stimulus. Waddington termed this “genetic assimilation.” Such work, largely overlooked by proponents of the modern synthesis of genetic and evolutionary biology (21), demonstrates a dynamic interaction between the genome and the environment during the plastic phase of development, producing effects that can be heritable (11) in terms of an environmental cue acting in one generation having effects that are manifest in subsequent generations.

The term “epigenetic” is now used to refer to structural changes to genes that do not alter the nucleotide sequence, with epigenetic inheritance being defined as biologic processes that regulate mitotically or meiotically heritable changes in gene expression without altering the DNA sequence (22). Of particular relevance are methylation of specific CpG dinucleotides in gene promoters and alterations in DNA packaging arising from chemical modifications of the chromatin histone core around which DNA wraps (Fig. 2). The modifications include acetylation, methylation, ubiquitination, and phosphorylation. Such epigenetic inheritance systems (23) can be random with respect to the environment and have been termed “epimutations” (24), or specific epigenetic changes can be induced by the environment (25) (Fig. 3).
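For readers who work with sequence data, the notion of CpG dinucleotides in a promoter can be made concrete with a short sketch. The functions below are hypothetical helpers, not taken from the source: they locate CpG sites in a DNA string read 5' to 3' and compute the fraction of those sites marked as methylated in a caller-supplied set of positions. The example sequence is invented.

```python
def cpg_positions(sequence: str) -> list[int]:
    """Return 0-based indices of the C in every CpG dinucleotide (5'->3')."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def methylation_level(sequence: str, methylated: set[int]) -> float:
    """Fraction of CpG sites whose C index appears in `methylated`."""
    sites = cpg_positions(sequence)
    if not sites:
        return 0.0  # no CpG sites, so nothing can be methylated
    return sum(1 for i in sites if i in methylated) / len(sites)


if __name__ == "__main__":
    promoter = "ACGTCGCG"  # invented example sequence
    print(cpg_positions(promoter))
    print(methylation_level(promoter, {1, 6}))
```

Real methylation data come from assays such as bisulfite sequencing; the set of methylated positions here simply stands in for such measurements.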

Epigenetic mechanisms are widely implicated in cancer (10). Promoter methylation is important for asymmetrical silencing of imprinted genes (26) and retrotransposons (27,28). However, epigenetic mechanisms also play a critical role in a range of developmental processes. With the exception of imprinted genes, widespread removal of epigenetic marks occurs following fertilization when maternal and paternal genomes undergo extensive demethylation to ensure pluripotency of the developing zygote. This is followed by de novo methylation just before implantation (29,30). About 70% of CpGs are methylated, mainly in repressive heterochromatin regions and in repetitive sequences such as retrotransposable elements (31).

Figure 1. The mismatch concept emphasizes that the degree of disparity between the environment experienced during development and that experienced later influences the risk of disease. During the period of developmental plasticity in prenatal and early postnatal life, epigenetic processes are thought to alter gene expression to produce phenotypic attributes best suited to the environment in which the individual predicts that it will live, based on environmental cues transmitted via the mother. Greater mismatch gives greater risk of disease from unpredicted excessive richness (high calorie density food, sedentary lifestyle) of the environment. Thus, risk is greater with poorer developmental environment (A vs B), and with socioeconomic transitions to an affluent western lifestyle. Adapted from Gluckman PD et al. 2007 Am J Hum Biol 19:1–19 © 2006 Wiley-Liss, Inc., with permission.


DNA methylation also plays a key role in cell differentiation by silencing the expression of specific genes during the development and differentiation of individual tissues. For example, the expression of the homeobox gene Oct-4, a key regulator of cellular pluripotency in the early embryo, is permanently silenced by hypermethylation of its promoter around E6.5 in the mouse (32), whereas HoxA5 and HoxB5, which are required for later stages of development, are not methylated and silenced until early postnatal life (33). For some genes there also appear to be gradations of promoter demethylation associated with developmental changes in the role of the gene product. The δ-crystallin II and phosphoenolpyruvate carboxykinase promoters are methylated in the early embryo but undergo progressive demethylation during fetal development, and are hypomethylated compared with the embryo and expressed in the adult (34,35). Thus, changes in methylation that are associated with cell differentiation and functional changes are established at different times during development of the embryo. The pattern of DNA methylation is copied during mitosis by Dnmt-1 activity. This provides an “epigenetic memory” of patterns of gene regulation, and hence cell function, which is established during development and which is passed to the adult (29). This immediately suggests a mechanism by which the environment may induce stable changes to cell function that persist in the adult organism, and by which environmental challenges at different times during development may produce different phenotypic outcomes and, in humans, differential risk of disease.

Genomic imprinting represents a special case of epigenetic regulation of genes (36). Through imprinting (which bears no relation to the term for behavioral conditioning defined by Lorenz), heritable patterns of gene expression are induced without changes in the sequence of genomic DNA through the silencing of one set of alleles dependent on its parental gender origin. Disease resulting from imprinting disorders is well recognized, e.g. Beckwith-Wiedemann syndrome. Although this disorder is rare, its incidence is increased in offspring conceived by assisted reproductive techniques (37). Imprinting is most frequently mediated by allele-specific DNA methylation, although imprinted alleles may differ in other ways.

Regulation of gene expression by small noncoding regulatory RNAs is a newly emerging epigenetic mechanism (25). These microRNAs have been shown not only to modulate the stability and translation of mRNAs, but also to induce gene silencing through the induction of gene methylation and alterations in chromatin structure. However, whether early life environmental challenges such as maternal nutritional constraints can alter the expression or formation of these microRNAs in the offspring has yet to be discovered, and the precise role that these microRNAs play in the developmental origins of adult disease remains to be determined.

EVIDENCE FOR NONGENOMIC INHERITANCE IN HUMANS

Human studies have provided a number of lines of evidence suggesting transgenerational nongenomic inheritance, although it is inevitably difficult to define the relative contributions of genetic, epigenetic, and common environmental or learned behavioral factors. For example, patterns of smoking, diet, and exercise can affect risk across more than one generation (38) by several mechanisms. Strong evidence for transgenerational nongenomic inheritance exists for dietary and endocrine exposures. Records from Overkalix in northern Sweden for individuals born in 1890, 1905, and 1920 have shown that diabetes mortality increased in men if the paternal grandfather was exposed to abundant nutrition during his prepubertal growth period (39), an effect later extended to paternal grandmother/granddaughter pairs and transmitted in a gender-specific fashion (40). During the 1944/1945 famine in the Netherlands, previously adequately nourished women were subjected to low caloric intake and associated environmental stress. Pregnant women exposed to famine in late pregnancy gave birth to smaller babies (41) who had an increased risk of later insulin resistance (42). Famine exposure at different stages of gestation was variously associated with an increased risk of obesity, dyslipidemia, and coronary heart disease, and F2 offspring of females exposed in the first trimester in utero did not have the expected increase in birth weight with increasing birth order (41). Exposure of pregnant women to diethylstilbestrol led to a marked increase in reproductive abnormalities and uterine fibroids (43), an earlier menopause (44), and breast (45) and rare genital tract cancers in their children, and there is evidence of third-generational effects transmitted through the maternal line (46).

Figure 2. Epigenetic silencing of transcription. When CpG dinucleotides are unmethylated in the promoter, RNA Pol and TF can bind to specific nucleotide sequences and the coding region (exon) is transcribed. Methylation of CpGs by the activity of Dnmt enables recruitment of methyl CpG binding protein-2 (MeCP2), which in turn recruits HDAC/HMT to form an enzyme complex bound to the gene promoter. The MeCP2/HDAC/HMT complex removes acetyl groups from histones and catalyses di- and tri-methylation of specific lysine residues which causes the DNA to condense. This prevents access of RNA polymerase and transcription factors to DNA and so converts transcriptionally active euchromatin to inactive heterochromatin. Thus, the overall effect of DNA and histone methylation is to induce long-term silencing of transcription.

Figure 3. Developmental plasticity declines and exposure to environmental challenges increases with age. Epigenetic processes are induced by cues from the developmental environment. They play a role in determining the phenotype of the offspring as part of a life-course strategy to match it to its environment. If not appropriately matched, the risk of later disease is increased.

EVIDENCE OF EPIGENETIC MECHANISMS IN ANIMALS

Animal research has given us new insights into developmental plasticity and epigenetics. First, it is clear that such epigenetic effects during development produce graded changes in the expression of a range of genes, in addition to those that produce parent-specific effects mediated via imprinted genes. Feeding a reduced protein diet to pregnant rats induces permanent changes in gene expression in the offspring; GR and PPARα expression is increased in the liver (7,47), whereas expression of the enzyme that inactivates corticosteroids, 11β-hydroxysteroid dehydrogenase type II, is reduced in liver, lung, kidney, and brain (48). In the liver, increased GR and PPARα expression is due to hypomethylation of their respective promoters (7). The PPARα promoter is also hypomethylated in the heart (49). In contrast, there was no difference in methylation of the PPARγ1 promoter in the liver, which suggests that the changes in epigenetic regulation induced by the maternal reduced protein diet were gene-specific (7). Graded silencing of the retrotransposon IAP element that regulates the agouti phenotype has been shown in the offspring of mice fed diets with different amounts of folic acid during pregnancy (6). Nondietary factors also induce altered epigenetic regulation of genes. Epigenetic changes in the methylation of renal p53 are produced by uterine blood flow restriction and are associated with reduced nephron number (8), which may precede the development of hypertension (50). Variations in maternal behavior also lead to epigenetic changes; in rats, maternal care of the pups influences methylation of the estrogen receptor-alpha1b (51) and the hippocampal GR1₇ promoters (9), the latter resulting in changes in hypothalamic-pituitary-adrenal axis stress responses (9). This study illustrates that epigenetic changes can be induced in later stages of developmental plasticity, such as during early postnatal life.

The levels of methylation of CpG bases in the genome are controlled in part by the activity of Dnmts. The developmental effects observed in the rat are not produced by changes in the expression of Dnmt-3a or b, or in the activity of methyl binding domain protein-2 (52), revealing that they are not produced by changes in the demethylation/remethylation processes that occur soon after fertilization (29,30). In contrast, they are accompanied by decreased Dnmt-1 expression (52). This suggests a mechanism by which down-regulation of Dnmt-1 expression during early development leads to a progressive loss of epigenetic memory and an altered adult phenotype. This accords with the effects of nutritional or endocrine challenges during early gestation in altering growth of organs such as the heart and liver (53,54) and producing later effects on cardiovascular and metabolic control (55–57). In the rat, the epigenetic effects appear to be dependent on 1-carbon metabolism. Supplementation of the reduced protein diet with folic acid during pregnancy prevents cardiovascular changes in the offspring (58) and normalizes the changes in GR and PPARα promoter methylation and gene expression (7) and in Dnmt-1 binding and expression (52). Induction of elevated blood pressure or endothelial dysfunction in the offspring is also prevented by maternal supplementation with glycine, but not with alanine or urea (59,60), supporting the concept that methyl group provision is important.
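The proposed link between reduced Dnmt-1 activity and a progressive loss of epigenetic memory can be illustrated with a simple calculation. The sketch below assumes that each methylated CpG is copied at mitosis with a fixed probability (a hypothetical `fidelity` parameter standing in for Dnmt-1 activity) and that no de novo methylation occurs. The numbers are illustrative and are not measurements from the studies cited.

```python
def methylated_fraction(initial: float, fidelity: float, divisions: int) -> float:
    """Expected fraction of CpGs still methylated after a number of mitoses.

    initial   -- starting methylated fraction (0 to 1)
    fidelity  -- assumed per-division probability that a mark is copied
    divisions -- number of cell divisions
    """
    if not (0.0 <= initial <= 1.0 and 0.0 <= fidelity <= 1.0):
        raise ValueError("initial and fidelity must lie in [0, 1]")
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    # Each division retains a mark with probability `fidelity`, so the
    # expected retained fraction decays geometrically.
    return initial * fidelity ** divisions


if __name__ == "__main__":
    # Compare a high-fidelity case with a hypothetical reduced-Dnmt-1 case.
    for fidelity in (0.99, 0.90):
        trace = [round(methylated_fraction(0.7, fidelity, n), 3) for n in (0, 10, 20)]
        print(fidelity, trace)
```

Because the loss compounds with every division, even a modest drop in copying fidelity early in development, when many divisions lie ahead, would be expected to erode the pattern far more than the same drop later on.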

Furthermore, recent data show that the effects of both glucocorticoid treatment and the reduced protein diet in pregnancy can be passed to the second generation without further nutritional or endocrine manipulation (58,61,62). Feeding rats a reduced protein diet during pregnancy in the F0 generation induces hypomethylation of the PPARα and GR promoters in the livers of both the F1 and F2 male offspring (62). This shows that transmission of induced phenotypes between generations involves altered epigenetic regulation of specific genes.

ADAPTIVE VALUE OF NONGENOMIC INHERITANCE

The increasing evidence for nongenomic inheritance and particularly epigenetic inheritance raises the question of why the processes underpinning it have been preserved through evolution. Natural selection is generally viewed as a process by which a species and its environment become well matched. Developmental plasticity utilizes environmental cues to adjust individual phenotype to the current and predicted environment (13,63). These processes of developmental plasticity leading to nongenomic inheritance may have evolved to enhance fitness during shorter-term environmental shifts than Darwinian selection can necessarily cope with, and/or to ensure a greater match to a variable environment than selection alone can generate. In addition, developmental plasticity enables the induction of a wider range of phenotypes, permitting survival in a broader range of environments. Such strategies may have been important in the evolution of mammalian generalist species (64). Theoretical models demonstrate the circumstances under which fitness is enhanced if parents transmit information about the environment to their progeny. Factors to consider include the fidelity of the transmission of environmental cues, the degree of predictability of environmental conditions, and the costs of incorrect prediction (63,65–67).
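The factors listed above can be combined in a minimal expected-value sketch. It assumes a plastic offspring that always follows the parental cue, a single `cue_accuracy` that lumps together transmission fidelity and environmental predictability, and hypothetical payoffs (`benefit_match`, `cost_mismatch`, `fixed_payoff`). It is not one of the theoretical models cited (63,65–67), only an illustration of how cue accuracy trades off against the cost of incorrect prediction.

```python
def expected_fitness_plastic(cue_accuracy: float, benefit_match: float,
                             cost_mismatch: float) -> float:
    """Expected payoff when development follows a cue that is right with
    probability cue_accuracy (all payoffs are hypothetical units)."""
    if not 0.0 <= cue_accuracy <= 1.0:
        raise ValueError("cue_accuracy must lie in [0, 1]")
    return cue_accuracy * benefit_match - (1.0 - cue_accuracy) * cost_mismatch


def break_even_accuracy(benefit_match: float, cost_mismatch: float,
                        fixed_payoff: float = 0.0) -> float:
    """Cue accuracy above which following the cue beats a fixed phenotype
    that earns fixed_payoff regardless of the environment."""
    total = benefit_match + cost_mismatch
    if total <= 0:
        raise ValueError("benefit_match + cost_mismatch must be positive")
    # Solve p*b - (1 - p)*c = f for p.
    return (fixed_payoff + cost_mismatch) / total


if __name__ == "__main__":
    threshold = break_even_accuracy(benefit_match=1.0, cost_mismatch=3.0)
    print(f"plasticity pays off above cue accuracy {threshold:.2f}")
```

In this sketch, the higher the cost of a wrong prediction relative to the benefit of a correct one, the more reliable the cue must be before plasticity is favored, which is consistent with the qualitative factors the paragraph lists.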

RELEVANCE OF EPIGENETIC PROCESSES TO THE RISK OF ADULT DISEASE

We now live much longer than our hominine ancestors. Thus, mechanisms selected for their advantage in our earlier evolution may no longer be advantageous or may be advantageous in the young and disadvantageous in the elderly. There are limits to the environment that the fetus can sense and use to adjust its development (68). Nongenomic epigenetic processes of transmitting environmental information between generations evolved to assist our evolution as we moved across changing environments. They may also have served to buffer critical aspects of our development, especially the vulnerable period of weaning in infancy, against short-term environmental changes occurring between generations (69). Such processes were not designed to deal with the massive mismatch between the generally constrained fetal environment and the modern postnatal environment of high energy intake and low energy expenditure (4), and disease risk is amplified by a greater mismatch between the prenatally predicted and actual adult environments. As a result, societies in rapid economic transition are particularly vulnerable (70–73). Epigenetic and other nongenomic inheritance processes may have conferred survival advantage on evolving hominids; they now exacerbate risk of disease for several successive generations and play a major part in the current epidemics of metabolic and cardiovascular disease (14,73). Additionally, the possibility is now being explored that exposure to xenobiotics such as endocrine disruptors may have multigenerational effects through female and male lines by actions on similar epigenetic mechanisms (74).

Lastly, returning to our starting point of population studies, we must note that there is increasing evidence for the effects of maternal obesity and gestational diabetes as risk factors for later metabolic and cardiovascular disease in the offspring (75,76), a concept again supported by experimental studies in animals (77). These effects contribute to the rising, transgenerationally transmitted incidence of such disease in both developed and developing societies. The extent to which such risk of disease operates by epigenetic processes is not known.

CONCLUSION

Epigenetic changes provide a “memory” of developmental plastic responses to early environment. Their effects may only become manifest later in life, e.g. in terms of altered responses to environmental challenges. If the epigenetic change has occurred in part of the genome where gene expression is controlled by a transcription factor, then the consequences of the change will not become manifest until the transcription factor operates. There is additional potential for epigenetic marks to change throughout life as shown by recent studies on monozygotic twins (78), and there is some evidence for inheritance of tissue-specific DNA methylation patterns (79). It is now important to conduct further research to determine the specific role of epigenetic processes in the development of risk of cardiovascular and metabolic disease or other sequelae.

Acknowledgments. The authors thank Dr. Alan Beedle for his assistance with the literature search and editing of the manuscript.

REFERENCES

1. Godfrey KM 2006 The “Developmental Origins” hypothesis: epidemiology. In: Gluckman PD, Hanson MA (eds) Developmental Origins of Health and Disease: A Biomedical Perspective. Cambridge University Press, pp 6–32

2. Huxley R, Owen C, Whincup P, Cook D, Rich-Edwards J, Smith G 2007 Is birthweight a risk factor for coronary heart disease in later life? J Epidemiol (in press)

3. Hanson MA, Gluckman PD 2005 Developmental processes and the induction of cardiovascular function: conceptual aspects. J Physiol 565:27–34

4. Gluckman PD, Hanson MA 2006 Mismatch; How Our World No Longer Fits Our Bodies. Oxford University Press, Oxford

5. Promoting Optimal Fetal Development: Report of a Technical Consultation. World Health Organization. Available at: http://www.who.int/nutrition/topics/fetal_dev_report_EN.pdf

6. Waterland RA, Jirtle RL 2003 Transposable elements: targets for early nutritional effects on epigenetic gene regulation. Mol Cell Biol 23:5293–5300

7. Lillycrop KA, Phillips ES, Jackson AA, Hanson MA, Burdge GC 2005 Dietary protein restriction of pregnant rats induces and folic acid supplementation prevents epigenetic modification of hepatic gene expression in the offspring. J Nutr 135:1382–1386

8. Pham TD, MacLennan NK, Chiu CT, Laksana GS, Hsu JL, Lane RH 2003 Uteroplacental insufficiency increases apoptosis and alters p53 gene methylation in the full-term IUGR rat kidney. Am J Physiol Regul Integr Comp Physiol 285:R962–R970

9. Weaver IC, Cervoni N, Champagne FA, D’Alessio AC, Sharma S, Seckl JR, Dymov S, Szyf M, Meaney MJ 2004 Epigenetic programming by maternal behavior. Nat Neurosci 7:847–854

10. Laird PW 2005 Cancer epigenetics. Hum Mol Genet 14:R65–R76

11. West-Eberhard MJ 2003 Developmental Plasticity and Evolution. Oxford University Press, New York

12. Bateson P, Barker D, Clutton-Brock T, Deb D, Foley RA, Gluckman P, Godfrey K, Kirkwood T, Mirazon Lahr M, Macnamara J, Metcalfe NB, Monaghan P, Spencer HG, Sultan SE 2004 Developmental plasticity and human health. Nature 430:419–421

13. Gluckman PD, Hanson MA, Spencer HG, Bateson P 2005 Environmental influences during development and their later consequences for health and disease: implications for the interpretation of empirical studies. Proc Biol Sci 272:671–677

14. Gluckman PD, Hanson MA 2004 Living with the past: evolution, development and patterns of disease. Science 305:1733–1736

15. Gluckman PD, Hanson MA, Beedle AS 2007 Early life events and their consequences for later disease; a life history and evolutionary perspective. Am J Hum Biol 19:1–19

16. Van Speybroeck L 2002 From epigenesis to epigenetics. The case of C.H. Waddington. Ann N Y Acad Sci 981:61–81

17. Schmalhausen I 1949 Factors of Evolution; the Theory of Stabilizing Selection. Blakiston, McGraw-Hill, New York

18. Applebaum SW, Heifetz Y 1999 Density-dependent physiological phase in insects. Annu Rev Entomol 44:317–341

19. Schlichting CD, Pigliucci M 1998 Phenotypic evolution: a reaction norm perspective. Sinauer Associates Inc., Sunderland, MA

20. Waddington CH 1957 The Strategy of the Genes: A Discussion of Some Aspects of Theoretical Biology. Macmillan, New York

21. Mayr E 2001 What Evolution Is. Basic Books, New York

22. Jablonka E, Lamb MJ 2005 Evolution in Four Dimensions: Genetic, Epigenetic, Behavioral and Symbolic Variation in the History of Life. MIT Press, Cambridge, MA

23. Maynard Smith J 1990 Models of a Dual Inheritance System. J Theor Biol 143:41–53

24. Holliday R 1991 Mutations and Epimutations in Mammalian Cells. Mutat Res 250:351–363

25. Gluckman PD, Hanson MA, Beedle AS 2007 Non-genomic transgenerational inheritance of disease risk. Bioessays 29:145–154

26. Li E, Beard C, Jaenisch R 1993 Role for DNA methylation in genomic imprinting. Nature 366:362–365

27. Walsh CP, Chaillet JR, Bestor TH 1998 Transcription of IAP endogenous retroviruses is constrained by cytosine methylation. Nat Genet 20:116–117

28. Waterland RA, Jirtle RL 2003 Transposable elements: targets for early nutritional effects on epigenetic gene regulation. Mol Cell Biol 23:5293–5300

29. Bird A 2002 DNA methylation patterns and epigenetic memory. Genes Dev 16:6–21

30. Reik W, Dean W, Walter J 2001 Epigenetic reprogramming in mammalian development. Science 293:1089–1093

31. Yoder JA, Soman NS, Verdine GL, Bestor TH 1997 DNA (cytosine-5)-methyltransferases in mouse cells and tissues. Studies with a mechanism-based probe. J Mol Biol 270:385–395

32. Gidekel S, Bergman Y 2002 A unique developmental pattern of Oct-3/4 DNA methylation is controlled by a cis-demodification element. J Biol Chem 277:34521–34530

33. Hershko AY, Kafri T, Fainsod A, Razin A 2003 Methylation of HoxA5 and HoxB5 and its relevance to expression during mouse development. Gene 302:65–72

34. Grainger RM, Hazard-Leonards RM, Samaha F, Hougan LM, Lesk MR, ThomsenGH 1983 Is hypomethylation linked to activation of delta-crystallin genes duringlens development? Nature 306:88–91

35. Benvenisty N, Mencher D, Meyuhas O, Razin A, Reshef L 1985 Sequential changesin DNA methylation patterns of the rat phosphoenolpyruvate carboxykinase geneduring development. Proc Natl Acad Sci U S A 82:267–271

9REPIGENETICS, MISMATCH, AND DOHAD

Page 7: Spring 2017 – Epigenetics and Systems Biology Discussion ...Spring 2017 – Epigenetics and Systems Biology Discussion Session (Epigenetics and Disease Etiology) Michael K. Skinner

36. Reik W, Walter J 2001 Genomic imprinting: parental influence on the genome. NatRev Genet 2:21–32

37. Arnaud P, Feil R 2005 Epigenetic deregulation of genomic imprinting in humandisorders and following assisted reproduction. Birth Defects Res C Embryo Today75:81–97

38. Brook JS, Whiteman M, Brook DW 1999 Transmission of risk factors across threegenerations. Psychol Rep 85:227–241

39. Kaati G, Bygren LO, Edvinsson S 2002 Cardiovascular and diabetes mortalitydetermined by nutrition during parents’ and grandparents’ slow growth period. EurJ Hum Genet 10:682–688

40. Pembrey ME, Bygren LO, Kaati G, Edvinsson S, Northstone K, Sjostrom M,Golding J, ALSPAC Study Team 2006 Sex-specific, male-line transgenerationalresponses in humans. Eur J Hum Genet 14:159–166.

41. Lumey LH, Stein AD 1997 Offspring birth weights after maternal intrauterineundernutrition: a comparison within sibships. Am J Epidemiol 146:810–819

42. Painter RC, Roseboom TJ, Bleker OP 2005 Prenatal exposure to the Dutch famineand disease in later life: an overview. Reprod Toxicol 20:345–352

43. Baird DD, Newbold R 2005 Prenatal diethylstilbestrol (DES) exposure is associatedwith uterine leiomyoma development. Reprod Toxicol 20:81–84

44. Hatch EE, Troisi R, Wise LA, Hyer M, Palmer JR, Titus-Ernstoff L, Strohsnitter W,Kaufman R, Adam E, Noller KL, Herbst AL, Robboy S, Hartge P, Hoover RN 2006Age at natural menopause in women exposed to diethylstilbestrol in utero. Am JEpidemiol 164:682–688

45. Palmer JR, Wise LA, Hatch EE, Troisi R, Titus-Ernstoff L, Strohsnitter W, KaufmanR, Herbst AL, Noller KL, Hyer M, Hoover RN 2006 Prenatal diethylstilbestrolexposure and risk of breast cancer. Cancer Epidemiol Biomarkers Prev 15:1509–1514

46. Brouwers MM, Feitz WF, Roelofs LA, Kiemeney LA, de Gier RP, Roeleveld N2006 Hypospadias: a transgenerational effect of diethylstilbestrol? Hum Reprod21:666–669

47. Burdge GC, Phillips ES, Dunn RL, Jackson AA, Lillycrop KA 2004 Effect ofreduced maternal protein consumption during pregnancy in the rat on plasma lipidconcentrations and expression of peroxisomal proliferator–activated receptors in theliver and adipose tissue of the offspring. Nutr Res 24:639–646

48. Bertram C, Trowern AR, Copin N, Jackson AA, Whorwood CB 2001 The maternaldiet during pregnancy programs altered expression of the glucocorticoid receptor andtype 2 11beta-hydroxysteroid dehydrogenase: potential molecular mechanisms un-derlying the programming of hypertension in utero. Endocrinology 142:2841–2853

49. Burdge GC, Hanson MA, Slater-Jefferies JL, Lillycrop KA 2007 Epigenetic regu-lation of transcription: a mechanism for inducing variations in phenotype (fetalprogramming) by differences in nutrition during early life? Br J Nutr (in press)

50. Woods LL, Ingelfinger JR, Nyengaard JR, Rasch R 2001 Maternal protein restrictionsuppresses the newborn renin-angiotensin system and programs adult hypertensionin rats. Pediatr Res 49:460–467

51. Champagne FA, Weaver IC, Diorio J, Dymov S, Szyf M, Meaney MJ 2006 Maternalcare associated with methylation of the estrogen receptor-alpha1b promoter andestrogen receptor-alpha expression in the medial preoptic area of female offspring.Endocrinology 147:2909–2915

52. Lillycrop KA, Slater-Jefferies JL, Hanson MA, Godfrey KM, Jackson AA, BurdgeGC 2007 Induction of altered epigenetic regulation of the hepatic glucocorticoidreceptor in the offspring of rats fed a protein-restricted diet during pregnancysuggests that reduced DNA methyltransferase-1 expression is involved in impairedDNA methylation and changes in histone modifications. Br J Nutr (in press)

53. Han HC, Austin KJ, Nathanielsz PW, Ford SP, Nijland MJ, Hansen TR 2004Maternal nutrient restriction alters gene expression in the ovine fetal heart. J Physiol558:111–121

54. Oliver MH, Hawkins P, Harding JE 2005 Periconceptional undernutrition altersgrowth trajectory and metabolic and endocrine responses to fasting in late-gestationfetal sheep. Pediatr Res 57:591–598

55. McMillen IC, Robinson JS 2005 Developmental origins of the metabolic syndrome:prediction, plasticity, and programming. Physiol Rev 85:571–633

56. Gardner DS, Tingey K, Van Bon BW, Ozanne SE, Wilson V, Dandrea J, Keisler DH,Stephenson T, Symonds ME 2005 Programming of glucose-insulin metabolism in

adult sheep after maternal undernutrition. Am J Physiol Regul Integr Comp Physiol289:R947–R954

57. Poore KR, Cleal JK, Newman JP, Boullin JP, Noakes D, Hanson MA, Green LR2007 Nutritional challenges during development induce sex-specific changes inglucose homeostasis in the adult sheep. Am J Physiol Endocrinol Metab 292:E32–E39

58. Torrens C, Brawley L, Anthony FW, Dance CS, Dunn R, Jackson AA, Poston L,Hanson MA 2006 Folate supplementation during pregnancy improves offspringcardiovascular dysfunction induced by protein restriction. Hypertension 47:982–987

59. Jackson AA, Dunn RL, Marchand MC, Langley-Evans SC 2002 Increased systolicblood pressure in rats induced by a maternal low-protein diet is reversed by dietarysupplementation with glycine. Clin Sci (Lond) 103:633–639

60. Brawley L, Torrens C, Anthony FW, Itoh S, Wheeler T, Jackson AA, Clough GF,Poston L, Hanson MA 2004 Glycine rectifies vascular dysfunction induced bydietary protein imbalance during pregnancy. J Physiol 554:497–504

61. Drake AJ, Walker BR, Seckl JR 2005 Intergenerational consequences of fetalprogramming by in utero exposure to glucocorticoids in rats. Am J Physiol RegulIntegr Comp Physiol 288:R34–R38

62. Burdge GC, Slater-Jefferies J, Torrens C, Phillips ES, Hanson MA, Lillycrop KA,2007 Dietary protein restriction of pregnant rats in the F0 generation induces alteredmethylation of hepatic gene promoters in the adult male offspring in the F1 and F2generations. Br J Nutr 97:435–439

63. Gluckman PD, Hanson MA, Spencer HG 2005 Predictive adaptive responses andhuman evolution. Trends Ecol Evol 20:527–533

64. Lister AM 2004 The impact of Quaternary Ice Ages on mammalian evolution. PhilosTrans R Soc Lond B Biol Sci 359:221–241

65. Jablonka E, Oborny B, Molnar I, Kisdi E, Hofbauer J, Czaran T 1995 The adaptiveadvantage of phenotypic memory in changing environments. Philos Trans R SocLond B Biol Sci 350:133–141

66. Moran NA 1992 The evolutionary maintenance of alternative phenotypes. Am Nat139:971–989

67. Sultan SE, Spencer HG 2002 Metapopulation structure favors plasticity over localadaptation. Am Nat 160:271–283

68. Gluckman PD, Hanson MA 2004 Maternal constraint of fetal growth and itsconsequences. Semin Fetal Neonatal Med 9:419–425

69. Kuzawa CW 1998 Adipose tissue in human infancy and childhood: an evolutionaryperspective. Am J Phys Anthropol 27:177–209

70. Popkin BM 2001 Nutrition in transition: the changing global nutrition challenge.Asia Pac J Clin Nutr 10:S13–S18

71. Bhargava SK, Sachdev HS, Fall CH, Osmond C, Lakshmy R, Barker DJ, Biswas SK,Ramji S, Prabhakaran D, Reddy KS 2004 Relation of serial changes in childhoodbody-mass index to impaired glucose tolerance in young adulthood. N Engl J Med350:865–875

72. Prentice AM, Moore SE 2005 Early programming of adult diseases in resource poorcountries. Arch Dis Child 90:429–432

73. Gluckman PD, Hanson MA 2004 The developmental origins of the metabolicsyndrome. Trends Endocrinol Metab 15:183–187

74. Anway MD, Cupp AS, Uzumcu M, Skinner MK 2005 Epigenetic transgenerationalactions of endocrine disruptors and male fertility. Science 308:1466–1469

75. Forsen T, Eriksson JG, Tuomilehto J, Teramo K, Osmond C, Barker DJ 1997Mother’s weight in pregnancy and coronary heart disease in a cohort of Finnish men:follow up study. BMJ 315:837–840

76. Silverman BL, Purdy LP, Metzger BE 1996 The intrauterine environment: implica-tions for the offspring of diabetic mothers. Diabetes Rev 4:21–35

77. Aerts L, Van Assche FA 2006 Animal evidence for the transgenerational develop-ment of diabetes mellitus. Int J Biochem Cell Biol 38:894–903

78. Fraga MF, Ballestar E, Paz MF, Ropero S, Setien F, Ballestar ML, Heine-Suner D,Cigudosa JC, Urioste M, Benitez J, Boix-Chornet M, Sanchez-Aguilera A, Ling C,Carlsson E, Poulsen P, Vaag A, Stephan Z, Spector TD, Wu YZ, Plass C, EstellerM 2005 Epigenetic differences arise during the lifetime of monozygotic twins. ProcNatl Acad Sci U S A 102:10604–10609

79. Silva AJ, White R 1988 Inheritance of allelic blueprints for methylation patterns.Cell 54:145–152

10R GODFREY ET AL.

Page 8: Spring 2017 – Epigenetics and Systems Biology Discussion ...Spring 2017 – Epigenetics and Systems Biology Discussion Session (Epigenetics and Disease Etiology) Michael K. Skinner

LETTER OPEN doi:10.1038/nature14252

Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer’s disease

Elizabeta Gjoneska¹,²*, Andreas R. Pfenning²,³*, Hansruedi Mathys¹, Gerald Quon²,³, Anshul Kundaje²,³,⁴, Li-Huei Tsai¹,²† & Manolis Kellis²,³†

Alzheimer’s disease (AD) is a severe¹ age-related neurodegenerative disorder characterized by accumulation of amyloid-β plaques and neurofibrillary tangles, synaptic and neuronal loss, and cognitive decline. Several genes have been implicated in AD, but chromatin state alterations during neurodegeneration remain uncharacterized. Here we profile transcriptional and chromatin state dynamics across early and late pathology in the hippocampus of an inducible mouse model of AD-like neurodegeneration. We find a coordinated downregulation of synaptic plasticity genes and regulatory regions, and upregulation of immune response genes and regulatory regions, which are targeted by factors that belong to the ETS family of transcriptional regulators, including PU.1. Human regions orthologous to increasing-level enhancers show immune-cell-specific enhancer signatures as well as immune cell expression quantitative trait loci, while decreasing-level enhancer orthologues show fetal-brain-specific enhancer activity. Notably, AD-associated genetic variants are specifically enriched in increasing-level enhancer orthologues, implicating immune processes in AD predisposition. Indeed, increasing enhancers overlap known AD loci lacking protein-altering variants, and implicate additional loci that do not reach genome-wide significance. Our results reveal new insights into the mechanisms of neurodegeneration and establish the mouse as a useful model for functional studies of AD regulatory regions.

Gene expression²,³ and genetic variation⁴ studies suggest gene-regulatory changes may underlie AD, but regulatory epigenetic alterations during neurodegeneration remain uncharacterized, given the inaccessible nature of human brain samples. To address this need, we profiled transcriptional and epigenomic changes during neurodegeneration in the hippocampus of the CK-p25 mouse model of AD⁵⁻⁷ and CK littermate controls at both early and late stages of neurodegeneration (2 weeks and 6 weeks after p25 induction). CK-p25 mice, in which accumulation of the Cdk5 activator protein p25 is inducible, exhibit DNA damage, aberrant gene expression and increased amyloid-β levels at early stages⁷, followed by neuronal and synaptic loss and cognitive impairment at late stages⁵,⁶.

For transcriptome analysis, we used RNA sequencing to quantify gene expression changes for 13,836 ENSEMBL genes (see Methods, Extended Data Fig. 1a and Supplementary Table 1). We found 2,815 upregulated genes and 2,310 downregulated genes in the CK-p25 AD mouse model as compared to CK littermate controls (at q < 0.01; Supplementary Table 1), which we classified into transient (2 weeks only), late-onset (6 weeks only) and consistent (both) expression classes (Fig. 1a, Extended Data Fig. 4a and Supplementary Table 1). These showed distinct functional enrichments (Fig. 1a and Supplementary Table 2), with transient-increase genes enriched in cell cycle functions (P < 10⁻⁹²), consistent-increase genes enriched in immune (P < 10⁻¹⁰) and stimulus-response (P < 10⁻⁴) functions, and consistent- and late-decrease genes enriched in synaptic and learning functions (P < 10⁻¹²).
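The three temporal classes described above can be illustrated with a minimal Python sketch. It assumes each gene has already been called as significantly up, down, or unchanged (q < 0.01) at 2 weeks and at 6 weeks. The function name, the input encoding and the "discordant" label for genes that change in opposite directions are assumptions made for illustration, not part of the published analysis.

```python
# Hypothetical helper: map per-time-point calls to a temporal class.
_DIRECTION_WORD = {"up": "increase", "down": "decrease"}


def temporal_class(early, late):
    """Assign a temporal class from the direction of change at each time point.

    early, late: "up", "down", or None when the gene is not significantly
    changed at 2 weeks or at 6 weeks, respectively.
    """
    if early is None and late is None:
        return "unchanged"
    if late is None:
        # Changed at 2 weeks only.
        return "transient " + _DIRECTION_WORD[early]
    if early is None:
        # Changed at 6 weeks only.
        return "late " + _DIRECTION_WORD[late]
    if early == late:
        # Changed in the same direction at both time points.
        return "consistent " + _DIRECTION_WORD[early]
    # Opposite directions are not discussed in the text; this label is an
    # assumption of the sketch.
    return "discordant"
```

For example, a gene called "up" at 2 weeks and unchanged at 6 weeks is labelled "transient increase" under these assumptions.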

These coordinated neuronal and immune changes are consistent with the pathophysiology of AD² and probably reflect both cell-type-specific expression changes and changes in cell composition. Indeed, comparison with expression in microglia⁸ (the resident immune cells of the brain) shows that both the cell type composition (P = 2.7 × 10⁻⁴) and microglia-specific activation (P = 2.9 × 10⁻⁶) significantly contribute to the gene expression changes (see Methods). Additionally, reverse transcription followed by quantitative PCR (RT–qPCR) of increased-level genes in purified CD11b⁺ CD45low microglia populations confirms cell-type-specific activation for five of the seven microglia-specific genes tested (Extended Data Fig. 2).

Confirming the biological relevance of our mouse model for human AD, the observed changes in gene expression in mouse, especially for the consistent and late classes, agreed with gene expression differences between 22 patients with AD and 9 controls in human post-mortem laser capture microdissected hippocampal grey matter² (Fig. 1b). The enriched Gene Ontology classes also agreed between mouse and human, with higher immune gene expression and lower neuronal gene expression in patients with AD (Fig. 1c).

For epigenome analysis, we used chromatin immunoprecipitation sequencing (ChIP-seq) to profile seven chromatin marks⁹: histone 3 Lys 4 trimethylation (H3K4me3; associated primarily with active promoters); H3K4me1 (enhancers); H3K27 acetylation (H3K27ac; enhancer/promoter activation); H3K27me3 (Polycomb repression); H3K36me3 and H4K20me1 (transcription); and H3K9me3 (heterochromatin) (Extended Data Fig. 1a). We used ChromHMM (http://compbio.mit.edu/ChromHMM/) to learn a chromatin state model (Methods and Extended Data Fig. 3a) defined by recurrent combinations of histone modifications, consisting of promoters, enhancers, transcribed, bivalent, repressed, heterochromatin and low-signal states (Extended Data Fig. 3a). We defined 57,840 active promoters using H3K4me3 peaks within promoter chromatin states, and 151,447 active enhancer regions using H3K27ac peaks within enhancer chromatin states (Extended Data Fig. 1a, Supplementary Table 3 and Methods).
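Selecting peaks that fall within a given chromatin state, as described above, amounts to an interval-overlap filter. The Python sketch below shows one possible version. The tuple layouts, the function names and the rule that any one-base overlap counts are assumptions for illustration; the authors' actual criteria are given in their Methods and are not reproduced here.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def peaks_in_state(peaks, segments, state):
    """Keep peaks that overlap at least one segment annotated with `state`.

    peaks:    list of (chrom, start, end)
    segments: list of (chrom, start, end, state_label), e.g. a chromatin
              state segmentation; the labels used here are hypothetical.
    """
    wanted = [seg for seg in segments if seg[3] == state]
    return [
        peak
        for peak in peaks
        if any(
            peak[0] == seg[0] and overlaps(peak[1], peak[2], seg[1], seg[2])
            for seg in wanted
        )
    ]


# Toy data: two H3K4me3-like peaks and a three-segment annotation.
example_peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
example_segments = [
    ("chr1", 0, 150, "promoter"),
    ("chr1", 150, 700, "enhancer"),
    ("chr2", 500, 600, "promoter"),
]
active_promoters = peaks_in_state(example_peaks, example_segments, "promoter")
```

This linear scan is adequate for a toy example; for a few hundred thousand peaks, a sorted or indexed interval structure would be the more practical choice.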

We mapped orthologous genes between mouse and human using ENSEMBL one-to-one orthologues (see Methods). We also mapped orthologous noncoding regions using multiple mammalian sequence alignments, mapping each mouse peak to its best human match (see Methods). We found matches for 90% of promoter regions, 84% of enhancers, 74% of Polycomb-repressed regions and 33% of heterochromatin regions (Supplementary Table 3). Comparing our mouse chromatin states to human hippocampus chromatin states¹⁰, we found significant epigenomic conservation at orthologous noncoding regions (Extended Data Fig. 3b), consistent with recent results¹¹.


*These authors contributed equally to this work. †These authors jointly supervised this work.

¹The Picower Institute for Learning and Memory, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. ²Broad Institute of Harvard University and Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA. ³Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. ⁴Department of Genetics, Department of Computer Science, Stanford University, Stanford, California 94305, USA.

19 February 2015 | Vol 518 | Nature | 365

©2015 Macmillan Publishers Limited. All rights reserved


We quantified epigenomic changes in promoter regions using relative differences in H3K4me3 levels resulting in 3,667 increased-level and 5,056 decreased-level peaks (q < 0.01; Extended Data Fig. 4b and Supplementary Table 3), which we classified into transient, consistent and late-stage, as for gene expression changes. For enhancer regions, we used relative levels of H3K27ac, resulting in 2,456 increased-level and 2,154 decreased-level peaks (Extended Data Fig. 4c and Supplementary Table 3). Only a very small number of peaks showed differences in Polycomb-repressed and heterochromatin regions, leading us to focus on enhancer and promoter changes for the remaining analyses (Extended Data Fig. 4d, e and Supplementary Table 3).

Genes flanking increased- and decreased-level regulatory regions (see Methods) showed consistent gene expression changes for both promoter and enhancer regions (Extended Data Fig. 5), and were consistently enriched in immune and stimulus-response functions for increased-level enhancers and promoters, and in synapse and learning-associated functions for decreased-level enhancers and promoters (Fig. 1d, e), consistent with our Gene Ontology results of changing gene expression levels.

Increased- and decreased-level regulatory regions showed distinct regulatory motif enrichments (Fig. 1f, g). Increased-level peaks were enriched in NFκB, E2F, PPARG, IRF and PU.1 (ref. 12) transcription factor motifs for both enhancers and promoters, consistent with immune regulator targeting. Decreased-level peaks in enhancers were enriched for DNA-binding RFX motifs, and peaks in promoters were enriched for zinc-finger ZIC motifs, two known neurodevelopmental regulators¹³,¹⁴.

Consistent with the observed motif enrichments, increased-level enhancers and promoters showed in vivo binding of PU.1 in mouse embryos¹⁵,¹⁶ (Fig. 1h, i). Only increased-level promoters were bound in macrophages and BV-2 microglial-like cells¹⁷⁻¹⁹ that are both implicated in AD²⁰, while both increased- and decreased-level promoters were bound in several immune cell lineages (Fig. 1h). The PU.1 regulator itself (encoded by the SPI1 gene) showed increased expression and enhancer levels (Extended Data Fig. 1b), possibly contributing to immune enhancer and promoter upregulation, consistent with roles for PU.1, ETS-1 and other ETS family members in microglia activation and proliferation during neurodegeneration²¹,²². By contrast, neuronal function regulators were not enriched in increased-level enhancers (except for a weak enrichment of fetal brain CREB; Fig. 1i), consistent with primarily immune and inflammatory function of these regions.

Decreased-level enhancers and promoters were targeted by different regulators, suggesting distinct regulatory programs. Decreased-level promoters were preferentially bound by CREB and SRF (P < 10⁻²¹ and P < 10⁻¹⁶), two known regulators of neuronal activity in cortical neurons²³, and decreased-level enhancers were preferentially bound by CBP (P_hypergeometric = 5.4 × 10⁻²⁰), a known co-activator for neuronal activity¹⁶ (Fig. 1h, i). Surprisingly, p300-bound regions¹⁵ did not show any enrichment, suggesting distinct roles for CBP and p300, despite a general association with enhancers for both. The distinct neuronal and immune targeting of decreased-level and increased-level regulatory regions provides a mechanistic basis for the expression differences observed for neuronal and immune genes, and suggests potential therapeutic targets for reversing observed alterations during neurodegeneration.
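The hypergeometric P value quoted above for CBP binding is an over-representation test. As a rough illustration, the Python sketch below computes the upper-tail probability of seeing at least k bound regions among a set of changing regions, given how many regions are bound in the whole background. The function and argument names and the toy numbers are assumptions; the background set the authors used is defined in their Methods.

```python
from math import comb


def hypergeom_enrichment_p(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: total number of background regions
    successes:  background regions bound by the factor
    draws:      regions in the class being tested (e.g. decreased-level)
    k:          regions in that class that are bound
    """
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    # Exact integer arithmetic until the final division.
    return favourable / comb(population, draws)


# Toy example: 10 regions, 5 bound; all 5 tested regions are bound.
p_toy = hypergeom_enrichment_p(5, 10, 5, 5)
```

Because the sums are exact integers, very small tail probabilities are not lost to rounding before the final division.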

On the basis of chromatin state annotations in 127 human cell types and tissues¹⁰ (Fig. 3a and Supplementary Table 4), regions orthologous to increased-level enhancers in mouse showed immune cell enhancer activity in human (P < 10⁻¹⁰⁰), while orthologues of decreased-level enhancers in mouse showed fetal brain tissue enhancer activity in human (P < 10⁻⁸ consistent; P < 10⁻¹⁷ late-stage; Fig. 2a and Supplementary Table 4). Adult brain tissues (including hippocampus) were not as strongly enriched, suggesting changes are biased towards neuronal plasticity. These results are consistent with decreased neuronal plasticity, and increased microglial activation and proliferation during AD progression²⁴.

To verify whether the increased-level putative enhancer regions were indeed functional, we used a luciferase reporter assay to evaluate their ability to drive in vitro gene expression in immortalized murine microglial (BV-2) and neuroblastoma (N2a) cell lines. Eight of the nine increased-level human orthologues tested were indeed able to drive in vitro reporter expression. Two of these, BIN1 and ZNF710, were active in both cell types, while the remaining six showed a BV-2-cell-specific increase in luciferase expression (Fig. 2b and Supplementary Table 5), confirming both functional conservation and tissue specificity of increased-level enhancer regions implicated by our mouse model of AD.

Human orthologues of increased-level enhancers were also enriched for expression quantitative trait loci (eQTLs) in CD4⁺ T cells and CD14⁺ monocytes²⁵,²⁶ (Extended Data Fig. 6 and Supplementary Table 6), indicating that they contain driver mutations controlling immune cell regulatory programs. The enrichment was strongest for CD14⁺ monocytes (Extended Data Fig. 6), which also showed the highest enhancer enrichment and is consistent with the observed inflammatory response Gene Ontology category.

Figure 1 | Conserved gene expression changes between mouse and human AD are associated with immune and neuronal functions. a, Six distinct temporal classes of differentially expressed genes are denoted; transient (early) increase (pink) or decrease (light blue), consistent increase (red) or decrease (blue), and late (6 week) increase (dark red) or decrease (navy blue). Expression is shown relative to the mean of three replicates at 2-week control (CK) mice. Shown are the most significant distinct biological process Gene Ontology (GO) categories in each class of differentially regulated genes (asterisk denotes enrichment of hypergeometric P < 0.01). Grey boxes indicate no overlapping genes. b, T-statistic identifying the bias of each differentially regulated class of genes in AD cases relative to controls; negative t denotes lower expression in AD, positive t denotes higher expression in AD. c, Enrichment of Gene Ontology categories for differentially expressed genes between AD cases and controls in human². d, e, Enrichment of each Gene Ontology category examined in the gene expression analysis was calculated for H3K4me3 promoters (pro.; red) (d) and H3K27ac enhancers (enh.; yellow) (e). Asterisk denotes categories with a binomial P < 0.01. f, g, Enrichment of regulatory motifs within changing promoters (top) (f) and enhancers (bottom) (g) in the mouse AD model. h, i, Overlap of changing promoters (top) (h) and enhancers (bottom) (i) with regions shown to be bound by immune (orange) and neuronal (purple) transcriptional factors (TF) and co-factors profiled using ChIP-seq in mouse immune and neuronal tissues¹⁵⁻¹⁹.

To test whether the implicated regulatory regions are causal, we examined their enrichment for AD-associated variants from genome-wide association studies (GWAS). Genetic variants associated with AD in a meta-analysis of ~74,000 individuals⁴ were enriched in increased-level enhancer orthologues (Fig. 2c) (4.4-fold enrichment, binomial P = 1.2 × 10⁻¹⁰ at GWAS cutoff P < 0.001; 9.7-fold enrichment, binomial P < 3.7 × 10⁻⁶ at GWAS cutoff P < 10⁻⁵). By contrast, decreased-level enhancer orthologues were surprisingly not enriched (0.61-fold), suggesting a causal role specifically for immune-related processes. Promoter regions were only weakly enriched, strongly implicating distal enhancers in mediating AD predisposition (Extended Data Fig. 7).
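A fold enrichment paired with a binomial P value, as reported above, can be sketched as follows. The sketch assumes a simple model in which each of `total` GWAS variants independently lands in the enhancer set with probability `background_fraction`. That model, the function names and the toy numbers are illustrative assumptions and may differ from the procedure in the authors' Methods.

```python
from math import comb


def fold_enrichment(hits, total, background_fraction):
    """Observed fraction of variants in the region set, relative to the expected fraction."""
    return (hits / total) / background_fraction


def binomial_upper_tail(hits, total, background_fraction):
    """P(X >= hits) for X ~ Binomial(total, background_fraction)."""
    p = background_fraction
    return sum(
        comb(total, i) * p**i * (1 - p) ** (total - i)
        for i in range(hits, total + 1)
    )


# Toy example: 10 of 10 variants fall in regions covering half the background.
toy_fold = fold_enrichment(10, 10, 0.5)
toy_p = binomial_upper_tail(10, 10, 0.5)
```

With real variant counts the direct sum stays exact enough for moderate `total`; for very large counts a log-space or library implementation would be the safer choice.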

Across diverse cell types and tissues, we found concordance between the enrichment for AD GWAS single nucleotide polymorphisms (SNPs) and the enrichment for increased-level enhancer orthologues (R² = 0.49; Fig. 2d, Extended Data Fig. 8a, left and Supplementary Table 4), with CD14⁺ immune cells being the most enriched in both, followed by other immune cell types, and with fetal brain enhancers showing the smallest enrichment in both. By contrast, decreasing enhancer orthologues showed a very weak correlation (R² < 0.08) (Fig. 2e, Extended Data Fig. 8b, right and Supplementary Table 4). The increased-level enhancer orthologue enrichment for AD GWAS SNPs persisted both within CD14⁺ enhancers (3.0-fold enrichment, binomial P = 1.3 × 10⁻⁵) and outside CD14⁺ enhancers (3.4-fold, P = 0.005), suggesting it is not solely a feature of CD14⁺ cell type enrichment (see Methods).
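The R² values above summarize how well one enrichment score tracks the other across cell types; the Figure 2 caption states that they are based on Pearson correlation. A minimal Python sketch of that quantity follows. The function name and the toy inputs are assumptions, and the per-tissue enrichment scores themselves are not reproduced here.

```python
from math import fsum


def pearson_r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Toy inputs: a perfectly linear pair and a weakly related pair.
r2_linear = pearson_r_squared([1, 2, 3], [2, 4, 6])
r2_weak = pearson_r_squared([1, 2, 3], [1, 3, 2])
```

In the setting described above, `xs` would hold one enrichment score per cell type or tissue and `ys` the matching GWAS SNP enrichment score.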

These results are consistent with enhanced microglial expression of CD14 in brains of animal models of AD, and a regulatory role of the CD14 receptor in microglial inflammatory response, which modulates amyloid-β deposition²⁴. Thus, the enrichment of AD-associated variants in CD14⁺ primary immune cells, but not neuronal cells, indicates that AD genetic predisposition is primarily associated with immune function, while decrease in neuronal plasticity may be affected primarily by non-genetic effects, such as diet, education, physical activity and age, which are thought to lead to epigenetic changes related to cognitive reserve²⁷.

We next used the epigenomic annotations of increased-level enhancer orthologues to gain insights into AD-associated loci (Supplementary Table 7). Among the 20 genome-wide significant AD-associated loci⁴, 11 contain no protein-altering SNPs in linkage disequilibrium (LD), indicating they may have noncoding roles. Of these, five localize within increased-level enhancer orthologues, including two well-established GWAS loci (PICALM and BIN1), and three loci (INPP5D, CELF1 (also containing the SPI1 gene) and PTK2B) only recently recognized as significant by combining all AD cohorts.

For INPP5D (Fig. 3a), a known regulator of inflammation28, the mostsignificant variants localize within an increased-level enhancer ortholo-gue, which also shows CD141 enhancer activity. In the CELF1 locus

0

3

6

9

!"

Enrichm

ent

in A

D-a

sso

cia

ted

SN

Ps

(–lo

g10(P

valu

e))

Category of enhancer change

b c d e

0

10

20

30

100

200

300

400

500

Fo

ld d

iffe

rence in lum

inescence

BV-2 (microglia model)

N2a (neuronal model)

Enhancer region

0

1

2

3

4

Enrichm

ent

in A

D-a

sso

cia

ted

SN

Ps

(–lo

g10(P

valu

e))

0

1

2

3

4

0 50 100 150 200

R2 = 0.49

Enrichment in changing enhancers in AD mouse model (–log10(P value))

0 2 4 6 8

Decreasing consistent

Roadmap class

Immune

Adult brain

Fetal brain

Other

R2 = 0.05

Increasing consistent

a

Immune/blood

cell types

0

1

2

3

Ad

ult

bra

in

Fo

ld e

nrichm

ent

of

chang

ing

mo

use A

D

mo

del enhancers

in h

um

an c

ells

/tis

sues

Roadmap Epigenomics cell type or tissue

Feta

l b

rain

Other cell types/tissues

P < 1 × 10–142 Peripheral blood mononuclear cells

P < 1 × 10–204 CD14 primary cells

Hippocampus

Fetal brain female

Consistent decreaseTransient decreaseTransient increase Consistent increase Late increase

Late decrease

Ctrl

SPI1

(PU.1

)ZN

F710

#1ZN

F710

#2IN

PP5D

MVB

12DO

PEY2

ABCA1BI

N1

Figure 2 | AD GWAS loci are preferentially enriched in increasing enhancer orthologues with immune function. a, Enrichment (y axis) of changing mouse AD enhancer orthologues, with a focus on the consistently increasing (red) category of enhancers, in 127 cell and tissue types profiled by the Roadmap Epigenomics Consortium (ref. 10) (columns). Roadmap samples are grouped into fetal brain (purple), adult brain (green), immune/blood cell types (orange) and all other (grey). b, Cell-type-specific fold luciferase reporter expression change relative to control (ctrl) for selected increasing enhancer regions in BV-2 microglia (orange) versus N2a neurons (purple) (n = 3, *P < 0.05, two-tailed t-test). c, Enrichment of AD-associated SNPs (y axis, binomial P value) in human regions orthologous to the mouse enhancers. d, e, Enrichment of AD-associated SNPs (y axis, permutation P value) in tissue-specific enhancer annotations from the Roadmap Epigenomics Consortium (points), relative to their enrichment for consistently increasing (d) and consistently decreasing (e) orthologous enhancer regions in the mouse AD model (x axis, hypergeometric P value). Linear regression trend lines and R², based on Pearson correlation, are shown.


(Fig. 3b) a large region of association spans several genes, but the strongest genetic signal (P = 2 × 10⁻⁶) localizes upstream of SPI1 (PU.1), and specifically within an increased-level enhancer orthologue that is also active in immune cells. We confirmed that the AD-associated C–T substitution, rs1377416, in the SPI1 enhancer leads to increased in vitro enhancer activity in murine BV-2 microglia cells using a luciferase reporter assay (Fig. 3d). In addition, the AD-associated SNP rs55876153 near SPI1, which overlaps an increased-level mouse enhancer orthologue, is in strong linkage disequilibrium (LD = 0.89, see Methods) with a known SPI1 eQTL, rs10838698 (ref. 25), even though it did not significantly alter enhancer activity in the luciferase assay.

Outside known GWAS loci, an additional 22 weakly associated regions (3.9-fold, P < 4.9 × 10⁻⁷) contain variants within increased-level enhancer orthologues (Supplementary Table 7), of which 17 lack protein-altering variants in linkage disequilibrium (R² < 0.4), providing strong candidates for directed experiments. One such example is ABCA1 (P = 6.9 × 10⁻⁵; Fig. 3c), a paralogue of AD-associated ABCA7 that encodes a glial-expressed transporter that influences APOE metabolism in the central nervous system (ref. 29). The region lacks protein-altering variants and all five SNPs in the cluster of association lie specifically within an increased-enhancer orthologue, which is also active in CD14+ immune cells and, to a lesser extent, in human hippocampus and fetal brain.

Overall, our study revealed contrasting changes in immune and neuronal genes and regulatory regions during AD-like neurodegeneration in mouse, strong human–mouse conservation of gene expression and epigenomic signatures, and enrichment of AD-associated loci in increased-level enhancer orthologues in human. While immune genes are known to be among the most significant genetic loci associated with AD, the depletion of neuronal promoters and enhancers is particularly notable for a cognitive disorder with well-established environmental and experiential factors that include diet, exercise, education and age. These results are consistent with a model in which increased immune susceptibility to environmental factors during ageing and cognitive decline is mediated by interactions between genetically driven immune cell dysregulation and environmentally driven epigenomic alteration in neuronal cells.

Our study also illustrates the power of model organisms for the study of human disease progression, especially for disorders affecting inaccessible tissues for which only post-mortem samples are available in human. We find that molecular changes in both genes and regulatory regions are highly conserved between human AD and CK-p25 neurodegeneration, enabling detailed studies of the molecular signatures associated with disease progression across diverse environmental conditions, in a variety of brain regions and cell types, and in response to therapeutic agents before or after disease onset.

Lastly, our results indicate specific therapeutic targets for AD, including putative causal nucleotides lying in increased-level enhancer orthologues that may be targeted by CRISPR/Cas9 genome editing (ref. 30), and trans-acting regulators. In particular, the transcription factor PU.1 is implicated as a therapeutic target by its genetic association with AD, as well as the enrichment of the PU.1 motif and the PU.1 in vivo binding sites at increased-level regulatory regions during mouse neurodegeneration. The conservation of neuronal and immune regulatory circuitry between mouse and human suggests that CK-p25 mice may offer a powerful model for studying the gene-regulatory and cognitive effects of such interventions.

[Figure 3 graphic omitted. Recoverable labels: a–c, genome browser views with tracks for AD GWAS P value (−log10(P)), enhancers in AD mouse model, chromatin states in human tissues (immune (CD14), hippocampus, fetal brain) and gene annotation (RefSeq); labelled genes include NGEF, NEU2, INPP5D, CELF1, SPI1 and ABCA1; scale bars of 300 kb, 150 kb and 1.16 Mb. Keys: human chromatin state (genic enhancer, enhancer, promoter, transcribed, heterochromatin, Polycomb repression) and mouse enhancer change (consistent decrease, transient decrease, transient increase, consistent increase, late increase, late decrease, no change). d, fold difference in luminescence for Ctrl, SPI1, SPI1 rs1377416 and SPI1 rs55876153 constructs, annotated with ** and NS.]

Figure 3 | Increasing enhancer orthologues help interpret AD-associated non-coding loci. a–c, Overlap of disease-associated SNPs (top) with increasing enhancers (second row, red) and immune enhancers in human (CD14+ primary cells) is shown for genome-wide significant (INPP5D and CELF1 (containing the SPI1 gene); a and b) and below-significance (ABCA1; c) AD GWAS loci. Roadmap chromatin state annotations are shown for immune cells (CD14+ primary; E029), hippocampus (E071) and fetal brain (E81), with colours as shown in the key. Light red highlight denotes increasing enhancer regions tested in luciferase assay. kb, kilobases; Mb, megabases. d, AD-associated SNP rs1377416 amplifies in vitro luciferase activity of putative enhancer region 38,313–37,359 base pairs (bp) upstream of SPI1 (PU.1) gene in BV-2 cells. n = 3, P < 0.0001, one-way analysis of variance (ANOVA); **P < 0.01, Tukey’s multiple comparison post-hoc test. NS, not significant.


Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.

Received 7 January 2014; accepted 22 January 2015.

1. Alzheimer’s Association. 2014 Alzheimer’s disease facts and figures. Alzheimers Dement. 10, e47–e92 (2014).

2. Blalock, E. M., Buechel, H. M., Popovic, J., Geddes, J. W. & Landfield, P. W. Microarray analyses of laser-captured hippocampus reveal distinct gray and white matter signatures associated with incipient Alzheimer’s disease. J. Chem. Neuroanat. 42, 118–126 (2011).

3. Zhang, B. et al. Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease. Cell 153, 707–720 (2013).

4. Lambert, J. C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nature Genet. 45, 1452–1458 (2013).

5. Cruz, J. C., Tseng, H.-C., Goldman, J. A., Shih, H. & Tsai, L.-H. Aberrant Cdk5 activation by p25 triggers pathological events leading to neurodegeneration and neurofibrillary tangles. Neuron 40, 471–483 (2003).

6. Fischer, A., Sananbenesi, F., Pang, P. T., Lu, B. & Tsai, L.-H. Opposing roles of transient and prolonged expression of p25 in synaptic plasticity and hippocampus-dependent memory. Neuron 48, 825–838 (2005).

7. Cruz, J. C. et al. p25/cyclin-dependent kinase 5 induces production and intraneuronal accumulation of amyloid beta in vivo. J. Neurosci. 26, 10536–10541 (2006).

8. Orre, M. et al. Isolation of glia from Alzheimer’s mice reveals inflammation and dysfunction. Neurobiol. Aging 35, 2746–2760 (2014).

9. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

10. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature http://dx.doi.org/nature14248 (this issue).

11. Cheng, Y. et al. Principles of regulatory information conservation between mouse and human. Nature 515, 371–375 (2014).

12. Gallant, S. & Gilkeson, G. ETS transcription factors and regulation of immunity. Arch. Immunol. Ther. Exp. (Warsz.) 54, 149–163 (2006).

13. Creyghton, M. P. et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc. Natl Acad. Sci. USA 107, 21931–21936 (2010).

14. Aruga, J. The role of Zic genes in neural development. Mol. Cell. Neurosci. 26, 205–221 (2004).

15. Visel, A. et al. A high-resolution enhancer atlas of the developing telencephalon. Cell 152, 895–908 (2013).

16. Kim, T. K. et al. Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182–187 (2010).

17. May, G. et al. Dynamic analysis of gene expression and genome-wide transcription factor binding during lineage specification of multipotent progenitors. Cell Stem Cell 13, 754–768 (2013).

18. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).

19. Crotti, A. et al. Mutant Huntingtin promotes autonomous microglia activation via myeloid lineage-determining factors. Nature Neurosci. 17, 513–521 (2014).

20. Prinz, M. & Priller, J. Microglia and brain macrophages in the molecular age: from origin to neuropsychiatric disease. Nature Rev. Neurosci. 15, 300–312 (2014).

21. Gomez-Nicola, D., Fransen, N. L., Suzzi, S. & Perry, V. H. Regulation of microglial proliferation during chronic neurodegeneration. J. Neurosci. 33, 2481–2493 (2013).

22. Jantaratnotai, N. et al. Upregulation and expression patterns of the angiogenic transcription factor Ets-1 in Alzheimer’s disease brain. J. Alzheimers Dis. 37, 367–377 (2013).

23. Lyons, M. R. & West, A. E. Mechanisms of specificity in neuronal activity-regulated gene transcription. Prog. Neurobiol. 94, 259–295 (2011).

24. Reed-Geaghan, E. G., Reed, Q. W., Cramer, P. E. & Landreth, G. E. Deletion of CD14 attenuates Alzheimer’s disease pathology by influencing the brain’s inflammatory milieu. J. Neurosci. 30, 15369–15373 (2010).

25. Fairfax, B. P. et al. Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression. Science 343, 1246949 (2014).

26. Raj, T. et al. Polarization of the effects of autoimmune and neurodegenerative risk alleles in leukocytes. Science 344, 519–523 (2014).

27. Stern, Y. Cognitive reserve in ageing and Alzheimer’s disease. Lancet Neurol. 11, 1006–1012 (2012).

28. Lam, P. Y., Yoo, S. K., Green, J. M. & Huttenlocher, A. The SH2-domain-containing inositol 5-phosphatase (SHIP) limits the motility of neutrophils and their recruitment to wounds in zebrafish. J. Cell Sci. 125, 4973–4978 (2012).

29. Krimbou, L. et al. Molecular interactions between apoE and ABCA1: impact on apoE lipidation. J. Lipid Res. 45, 839–848 (2004).

30. Ran, F. A. et al. Genome engineering using the CRISPR-Cas9 system. Nature Protocols 8, 2281–2308 (2013).

Supplementary Information is available in the online version of the paper.

Acknowledgements We thank A. Mungenast for critical reading and editing of the manuscript and discussions about the project, M. Taylor for mouse colony maintenance, and X. Zhang, R. Issner, H. Whitton and C. Epstein for technical assistance with ChIP-seq library preparation. We thank P. Kheradpour for the transcription factor binding site motif scan of the mouse genome. This work was partially supported by the Belfer Neurodegeneration Consortium funding and NIH/NINDS/NIA (RO1NS078839) to L.-H.T., Early Postdoc Mobility fellowship from the Swiss National Science Foundation (P2BSP3_151885) to H.M., and NIH/NHGRI (R01HG004037-07 and RC1HG005334) to M.K.

Author Contributions This study was designed by E.G., A.R.P., A.K., M.K. and L.-H.T., and directed and coordinated by M.K. and L.-H.T. E.G. initiated, planned and performed the experimental work. A.R.P. performed computational analysis to characterize differential gene expression and histone mark levels, identify orthologous human regions and enriched transcription factor binding sites, and compare regulatory regions to human AD meta-analysis data. A.K. contributed to the computational analysis by generating mouse chromatin states and the quantification and control of ChIP datasets. H.M. helped with isolation and gene expression analysis of specific cell type populations. G.Q. performed the permutation test comparing human Roadmap enhancers to AD GWAS SNPs. The manuscript was written by E.G., A.R.P., L.-H.T. and M.K., and commented on by all authors.

Author Information All data are available from the NCBI Gene Expression Omnibus (GEO) database under accession number GSE65159, the NIH Roadmap (http://www.roadmapepigenomics.org/data) and NCBI Epigenomics portal (http://www.ncbi.nlm.nih.gov/epigenomics). Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests. Readers are welcome to comment on the online version of the paper. Correspondence and requests for materials should be addressed to L.-H.T. ([email protected]) or M.K. ([email protected]).

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported licence. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons licence, users will need to obtain permission from the licence holder to reproduce the material. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-sa/3.0


METHODS

Animals. All mouse work was approved by the Committee on Animal Care of the Division of Comparative Medicine at MIT. Adult (3-month-old) female double-transgenic CK-p25 (ref. 5) mice and their respective control littermates were used for the experiments. Brain tissue was collected at either 2 or 6 weeks after p25 induction. Upon dissection tissue was flash-frozen in liquid nitrogen. No animals were excluded from the study and no randomization or blinding was required.

Chromatin immunoprecipitation. Mouse hippocampus was collected immediately after euthanasia. Chromatin immunoprecipitation was then performed as described in Broad ChIP protocol (http://www.roadmapepigenomics.org/protocols/type/experimental/). In brief, tissues were minced and crosslinked in 1% formaldehyde (Thermo Scientific) for 15 min at room temperature and quenched with glycine for 5 min (Sigma). The samples were homogenized in cell lysis buffer containing proteinase inhibitors (complete, Roche) and chromatin was then fragmented to a size range of ~200–500 bp using a Branson 250 digital sonifier. Solubilized chromatin was then diluted and incubated with ~1 µg antibody at 4 °C overnight. Immune complexes were captured with Protein-A-sepharose beads, washed and eluted. Enriched chromatin was then subjected to crosslink reversal and proteinase K digestion at 65 °C, phenol–chloroform extraction and ethanol precipitation. Isolated ChIP DNA was resuspended and quantified using the Qubit assay (Invitrogen). H3K4me1 (Abcam, ab8895), H3K4me3 (Millipore, 07-473), H3K9me3 (Abcam, ab8898), H3K27me3 (Millipore, 07-449), H3K27ac (Abcam, ab4729), H3K36me3 (Abcam, ab9050) and H4K20me1 (Abcam, ab9051) were used to immunoprecipitate endogenous proteins.

ChIP-seq high-throughput sequencing, read mapping and quality control. Sequencing libraries were prepared from ~1–5 ng ChIP (or input) DNA as described previously (ref. 31). Gel electrophoresis was used to retain library fragments between 300 and 600 bp. Before sequencing, libraries were quantified using Qubit (Invitrogen) and quality-controlled using Agilent’s Bioanalyzer. The 36-bp single-end sequencing was performed using the Illumina HiSeq 2000 platform according to standard operating procedures. For each histone modification, five biological replicate data sets were produced with corresponding whole-cell extract controls, except for H3K4me3, H4K20me1 and H3K27me3 in the 2-week control (CK) sample, where a sufficient number of reads for coverage was obtained from four biological replicates. Reads were mapped to the mm9 reference mouse genome with MAQ v0.7.1-9 using default parameters (ref. 32). Reads mapping to multiple locations were discarded. Duplicates were marked and filtered using PICARD (http://picard.sourceforge.net/). After filtering, roughly 55–60 million unique reads were obtained for each histone modification in each condition (~9–12 million reads per replicate) and ~110–145 million reads in total for the whole-cell extract controls in each condition. All replicate data sets passed quality control according to ENCODE ChIP-seq data standards, based on read quality, read mapping statistics, library complexity and strand cross-correlation analysis (to measure signal-to-noise ratios) (ref. 33).

RNA sequencing. Mouse brains were homogenized and total RNA was extracted using Trizol reagent (Ambion). Total RNA was quality-controlled using Agilent’s Bioanalyzer and prepared for sequencing using Illumina’s TruSeq Stranded Total RNA Sample Preparation Kit with Ribo-Zero. High-throughput sequencing was performed on an Illumina HiSeq 2000 platform. Roughly 15 million 76-bp paired-end reads were generated for each data set. Sequence reads were aligned to the mouse mm9 genome with Bowtie. On the basis of the reproducibility of the results (Fig. 2a), three replicate biological data sets were produced for each condition.
A small number of replicates suffice for RNA sequencing (RNA-seq) studies (ref. 34) and we were able to detect large-scale changes in read counts in coherent gene ontology categories, with similarities to human AD (Fig. 2c, d). Therefore, we decided that additional replicates were not necessary.

Peak calling and signal coverage tracks for ChIP-seq data. For each histone modification in each condition, mapped reads were pooled across ChIP-seq replicates and regions of enrichment (peaks) were identified for the pooled ChIP-seq data set relative to the pooled control using the MACS2 peak caller (version 2.0.10.20130712; ref. 35) (https://github.com/taoliu/MACS/) with a relaxed P value threshold of 0.01. For each histone modification, overlapping peaks (at least 1 bp overlap) were merged across all conditions to obtain a non-redundant master list of regions of enrichment. Master lists of broad domains of enrichment for the diffuse marks H3K27me3, H3K9me3, H3K36me3 and H4K20me1 were obtained by also merging peaks across conditions that were within 1 kb of each other. Genome-wide signal coverage tracks representing per-base fold enrichment and the likelihood ratio of ChIP relative to control were also computed using MACS2.

Learning combinatorial chromatin states. We used ChromHMM to learn combinatorial chromatin states jointly across all four conditions (ref. 36). ChromHMM was trained using all seven chromatin marks in virtual concatenation mode across all conditions. Reads from replicate data sets were pooled before learning states. The ChromHMM parameters used are as follows: reads were shifted in the 5′ to 3′ direction by 100 bp; for each ChIP-seq data set, read counts were computed in non-overlapping 200-bp bins across the entire genome; each bin was discretized into two levels, 1 indicating enrichment, and 0 indicating no enrichment. The binarization was performed by comparing ChIP-seq read counts to corresponding whole-cell extract control read counts within each bin and using a Poisson P value threshold of 1 × 10⁻⁴ (the default discretization threshold in ChromHMM). We trained several models with the number of states ranging from 12 to 23. We decided to use a 14-state model for all further analyses as it captured all the key interactions between the chromatin marks, and larger numbers of states did not capture significantly new interactions. To assign biologically meaningful mnemonics to the states, we used the ChromHMM package to compute the overlap and neighbourhood enrichments of each state relative to coordinates of known gene annotations. The trained model was then used to compute the posterior probability of each state for each genomic bin in each condition. The regions were labelled using the state with the maximum posterior probability. The chromatin state models and browser tracks can be downloaded from http://www.broadinstitute.org/~anshul/projects/liz/segmentation/results/S14/webpage_14.html.

Differential analysis and visualization. We used the DESeq2 method, which models read count statistics from replicates across multiple conditions, to identify differentially expressed genes and regions of enrichment of histone marks (ref. 37). Our procedures are consistent with the standards for ChIP-seq and RNA-seq analysis determined by rigorous benchmarking as a part of the ENCODE project (ref. 33). The minimal recommended depth for sufficient sensitivity of peak detection for histone marks for the human or mouse genome is ~20 million mapped reads (ref. 33). However, owing to the limited amount of starting material obtained from a single mouse, we obtained ~10 million unique mapped reads from each biological replicate. Directly using read counts from the original replicates would result in significant loss of power to detect differential events. To improve sensitivity, for each histone mark in each condition, we pooled mapped reads from all replicates and created a pair of pseudo-replicates with equal numbers of reads (~30 million) by randomly subsampling (without replacement) from the pool. Reads were then extended to the predominant fragment length. Extended-read counts were computed within all regions in the master peak list of a histone mark for all pseudo-replicates in all conditions and the table of counts was used as input to DESeq2. The raw data are available online (NCBI GEO GSE65159).
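The pseudo-replicate step can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `make_pseudo_replicates`, the toy read identifiers and the fixed seed are all hypothetical, and the sketch assumes that the two pseudo-replicates are drawn so that they do not share reads, which the text does not state explicitly.

```python
import random


def make_pseudo_replicates(pooled_reads, reads_per_replicate, seed=0):
    """Split a pool of mapped reads into two equal-sized pseudo-replicates.

    Draws 2 * reads_per_replicate reads without replacement and halves the
    draw, so the two pseudo-replicates never share a read (an assumption).
    """
    needed = 2 * reads_per_replicate
    if needed > len(pooled_reads):
        raise ValueError("pool is too small for two pseudo-replicates of this size")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    drawn = rng.sample(pooled_reads, needed)
    return drawn[:reads_per_replicate], drawn[reads_per_replicate:]


# Toy pool: five biological replicates of 10 "reads" each, as (replicate, index).
pool = [(rep, i) for rep in range(5) for i in range(10)]
pseudo_a, pseudo_b = make_pseudo_replicates(pool, reads_per_replicate=20)
```

In a real pipeline the pool would hold alignment records rather than tuples, and the subsampling would normally be done by a streaming tool instead of holding all reads in memory.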

For RNA-seq data, the numbers of reads overlapping ENSEMBL gene models (ref. 38) were determined by HT-Seq (http://www-huber.embl.de/users/anders/HTSeq/). The raw data are available online (NCBI GEO GSE65159). To ensure that the genes we chose were sufficiently quantifiable, we removed every gene where fewer than 20 reads were found across all samples. The resulting set of genes is found in Supplementary Table 1.

IGV (ref. 39) was used to visualize the histone marks, gene expression, chromatin state and AD GWAS data relative to the RefSeq gene model. Gene expression levels shown are raw read density. Levels of histone marks plotted are the log-likelihood ratio of ChIP signal relative to whole-cell extract control.

Within the DESeq2 framework of generalized linear models, we used a combination of different models to determine the significantly regulated genes and significantly regulated histone mark levels. We compared the set of all 2-week and 6-week controls to the three following groups: (1) the 2-week CK-p25 samples; (2) the 6-week CK-p25 samples; (3) a group containing both the 2-week and 6-week samples. The first two tests identified changes that might be 2-week or 6-week specific. The third test identified changes that might be too subtle to detect at any one time point alone. In each case, the most basic equation (count ~ CKp25 status) was used, but for a subset of samples. A stringent threshold of q < 0.01 (Benjamini–Hochberg) was used to determine significantly changing gene expression levels and histone mark levels. Next, to determine the temporal bias of gene expression levels and histone marks we built another model (count ~ time), which compared the 2-week and 6-week CK-p25 samples. Levels considered likely to change (q < 0.5) were categorized as transient (2-week bias) or late-stage (6-week bias). The results of the RNA-seq analysis are found in Supplementary Table 1, while the results of the histone mark analysis are in Supplementary Table 2.
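For readers unfamiliar with the Benjamini–Hochberg correction behind the q < 0.01 threshold, the adjustment can be sketched in plain Python. This is an illustrative stand-alone implementation, not the routine inside DESeq2; the function name `benjamini_hochberg` and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (q values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that q values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical P values for five genes or peaks.
p_values = [0.0001, 0.004, 0.03, 0.2, 0.0005]
q_values = benjamini_hochberg(p_values)

# Indices passing the stringent threshold used in the text (q < 0.01).
significant = [i for i, q in enumerate(q_values) if q < 0.01]
```

The same q values could then be compared against the looser q < 0.5 cut-off that the text applies to the 2-week versus 6-week model.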

For the histone modifications, we defined promoters using H3K4me3 peaks labelled with the promoter state annotation under any of the conditions (CK-p25 or control, and 2 or 6 weeks). We define enhancers based on peaks of H3K27ac labelled by the enhancer chromatin state. We define Polycomb-repressed regions based on H3K27me3 peaks labelled by the Polycomb-repressed chromatin state. Our definitions are consistent with known roles of these histone modifications (ref. 40). Defining the boundaries of the regulatory regions using the peaks of the relevant histone modifications, and not the chromatin states, maximizes our power to detect changes in histone mark levels.

Pathway and Gene Ontology analysis for the gene expression data were then generated through the use of DAVID (refs 41, 42). We present the most significant biological process gene ontology category result as well as a subset of non-redundant, less significant categories that still pass our significance threshold (q < 0.01). For the regulatory regions, GREAT (with default parameters) was used to find the fold enrichment in the same Gene Ontology categories (ref. 43).


Statistical framework for comparing CK-p25 changing genes and regulatory regions to other data sets. A common theme throughout the analysis is the characterization of regulatory regions that change in the CK-p25 mouse model. The most stringent control for this characterization is genes or regions of the same type that do not change in CK-p25. Owing to the six categories of direction (increasing and decreasing) and temporal pattern (transient, consistent and late-stage), we chose a discrete statistical framework as opposed to trying to define a ranking across these different conditions. To measure the overlap between these discrete categories and other discrete data sets, we could use either a hypergeometric P value or a binomial P value. For every test in the material described below, we computed both significance values and obtained consistent results, with only minor differences in exact P value. In general, we chose the hypergeometric test, which is the most direct way to look at overlap of annotated regions. As opposed to the overlap of the CK-p25 mouse categories with other ChIP-seq peaks, the overlap with transcription factor binding site motifs or SNPs can be thought of as sampling with replacement, which lends itself to the binomial P value. No power analysis was done to estimate sample size.

Comparison of histone marks and gene expression. As described above, DESeq2 was used to determine the log fold change in expression at 2 and 6 weeks in CK-p25 mice relative to control. Each enhancer and promoter was mapped to the closest ENSEMBL gene model based on distance to transcription start. For each category of histone mark direction and temporal pattern, we examined the enrichment of each category of CK-p25 gene expression change relative to unchanging genes. The significance of the enrichment is calculated using a hypergeometric test.

Identification of orthologous human regions. The promoter (H3K4me3 peaks annotated as transcription start site by chromatin state), enhancer (H3K27ac peaks annotated as enhancer by chromatin state) and Polycomb-repressed regions (H3K27me3 peaks annotated as Polycomb-repressed by chromatin state) were mapped to the human genome. BED files representing the coordinates of these peaks in mm9 were mapped to mm10 using liftover (ref. 44). Those peaks were then mapped to the human genome using the UCSC multiple alignment chain files (http://hgdownload.soe.ucsc.edu/goldenPath/mm10/multiz60way/) (ref. 45). More specifically, the alignments that overlap the mouse peak and include hg19 were extracted. We calculated the human–mouse pairwise alignment for each multiple alignment using the ‘globalms’ function of biopython (http://biopython.org/, version 1.59; python version 2.71). The highest scoring pairwise alignment formed the base of the orthologous region in human. This region was extended on either side using lower scoring multiple alignments. The orthologous region in hg19 was required to be greater than 30 bp and no more than twice the length of the region in mouse. The mean conservation was examined using the PHASTCons score across placental mammals (ref. 46) based on the same 60-way multiple sequence alignment. The mapped enhancer regions were annotated with their chromatin state in human hippocampus, and across all 127 cell types and tissues, using BEDTools (ref. 47). The information from human tissues was collected according to protocols described in more detail in the companion publication as a part of the Roadmap Epigenomics project (ref. 10) (http://www.roadmapepigenomics.org/). The protocols are approved by the NIH and no sequence information from identifiable subjects is provided.

Computational analysis of cell type proportion. To estimate computationally the relative composition of the neural and immune cell types we compared the changing expression patterns in our data set to a set of established cell-type-specific markers (refs 48–50). This analysis shows that it is indeed likely that cell type composition is changing in the CK-p25 mouse model, consistent with a known decrease in number of neurons and astrogliosis at 6 weeks (ref. 5). In summary, a transient enrichment of monocyte-specific transcripts was observed at 2 weeks, a consistent enrichment of microglial-specific transcripts was observed at 2 and 6 weeks, while astrocyte-, oligodendrocyte- and endothelial-specific markers were primarily increased at 6 weeks (Extended Data Fig. 9a, b). We could also detect a signature of neuronal loss, primarily at 6 weeks as well (Extended Data Fig. 9a, b). On the basis of these results alone, it is possible that changes in cell type composition are contributing to some of the differences we observe in our mouse model.

We also compared our data to a published study of microglial activation in another mouse model of AD8, to separate computationally the changes that are probably due to cell type proportion from the changes due to activation within cells. If the changes in our mouse model were primarily due to cell type proportion, then the increase we observed in the CK-p25 mice should be proportional to the expression level of those genes in microglia. If the changes we observed were primarily due to activation, then the changes we observe in the CK-p25 mouse should be proportional to the amount of activation found during neurodegeneration8. Using the genes with published gene expression changes during activation8, we modelled these two possibilities as a linear regression problem and examined the relative significance of both hypotheses in the R programming language: CK-p25 log fold change ~ microglial expression + microglial activation log fold change. We found that the changes in the CK-p25 mice were significantly related to the changes in cell activation (P = 2.9 × 10^-6) as well as the changes in cell type proportion (P = 2.7 × 10^-4), suggesting that both cell activation and composition changes occur.

Comparison of gene expression in mouse model and human AD. To examine the relationship between AD in the mouse model and human, we mapped each 1–1 orthologous gene from mouse to human in ENSEMBL (http://www.biomart.org/)51. For each category of expression change in mouse, we examined how that set of genes behaved in human AD cases relative to controls in whole hippocampus52 as well as laser capture microdissected hippocampal grey matter2. To make this comparison we first downloaded both data sets from GEO (GSE1297 and GSE28146), applied a variance stabilization normalization, and then used limma53 to find the log fold change in expression of all cases relative to controls. For each category of mouse gene expression, we calculated a P value based on a t-test for the bias of genes to increase or decrease in human AD relative to control. Because the original study52 had more confounders owing to changes in grey/white matter proportion, we focused our analysis on the 22 cases and 9 controls from the laser capture samples2.

Enrichment of cofactors and transcription factors. Peaks representing both neural15,16 and immune17–19 enhancers or transcription factor binding were used to annotate the H3K27ac enhancers and H3K4me3 promoters. We used a hypergeometric test to evaluate whether or not these external annotations were enriched in the set of increased-level or decreased-level enhancers relative to the enhancers whose levels do not change. This same procedure was used to look at the enrichment of the CK-p25 enhancer orthologues in Roadmap Epigenome data. In this case, only enhancers that map to human are taken to be the background.
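The hypergeometric enrichment test described above can be illustrated with a minimal sketch. This is not the authors' code: the function name, argument names and the toy counts are assumptions made for illustration, and the background is taken to be the changing plus unchanging enhancers together.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_selected, n_overlap):
    """Upper-tail hypergeometric P value, P(X >= n_overlap).

    n_background: all enhancers considered (hypothetical background set)
    n_annotated:  background enhancers carrying the external annotation
    n_selected:   enhancers in the category of interest (e.g. increased-level)
    n_overlap:    selected enhancers that carry the annotation
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers only: 20 enhancers, 5 annotated, 4 selected, 2 of them annotated.
p_value = hypergeom_enrichment_p(20, 5, 4, 2)
print(f"P(X >= 2) = {p_value:.4f}")
```

Exact integer binomial coefficients keep the sketch dependency-free; for genome-scale counts a statistics library would normally be used instead.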

The putative binding sites based on transcription factor binding site motifs were identified independent of conservation and have been previously published54. The transcription factor binding sites were further clustered based on similarity55. The least significant of two statistical tests was used as a stringent measure of binding site enrichment. (1) The real transcription factor binding site motifs in the category of interest were compared to shuffled control motifs that preserved nucleotide content. (2) The real transcription factor binding site motifs in the category of interest were compared to the real motifs in enhancers that are stable in the CK-p25 mice. To estimate the significance for test (1), we use a binomial P value because the length distribution is different for changing regulatory regions compared to unchanging. Then we estimate the probability of finding a site per base pair. To estimate the significance for test (2), we use a hypergeometric test. After identifying significant transcription factor binding sites in categories of regulatory regions, we collapsed the results into clusters of almost identical motifs, representing families. The group members can be found in a companion manuscript10 as well as online (http://www.broadinstitute.org/~pouyak/motifs-table/).

Luciferase reporter assay. A total of 14 oligonucleotide gBlocks (IDT), ranging from 500 to 1,000 nucleotides in length and corresponding to 10 enhancer regions, were synthesized. Each gBlock contained a constant 5′-GCTAGCCTCGAGGAT and 3′-ATCAAGATCTGGCCT region, for direct cloning into an EcoRV (NEB) linearized minimal promoter firefly luciferase vector pGL4.23[luc2/minP] (Promega). The resulting reporter constructs were verified by DNA sequencing. BV-2 cells were provided by B. Yankner. N2a cells were purchased from the American Type Culture Collection and maintained following their protocols. In brief, cells were grown in RPMI-1640 and DMEM respectively, supplemented with 10% FBS and 1% penicillin/streptomycin, and split 1:10 every 3 days. Cells were seeded into 24-well plates 1 day before transfection. Transfections into BV-2 and N2a cells were performed with 1 μg of a pGL4.23 plasmid and 200 ng of Renilla luciferase construct pGL4.74[Rluc/TK] (Promega). Luciferase activities were measured 24 h after transfection using the Dual-Glo Luciferase Assay (Promega) and an EnVision 2103 Multilabel Plate Reader (PerkinElmer) and normalized to Renilla luciferase activity. All assays were performed in triplicate.

Microglia isolation. The 2-week-induced CK-p25 mice and age-matched controls were perfused with 50 ml PBS to wash away blood and minimize macrophage contamination in the brains. Hippocampal tissue was collected immediately after perfusion and a single-cell suspension was prepared as described previously56. FACS was then used to purify CD11b+ CD45low microglia cells using allophycocyanin (APC)-conjugated CD11b mouse clone M1/70.15.11.5 (Miltenyi Biotec, 130-098-088) and phycoerythrin (PE)-conjugated CD45 antibody (BD Pharmingen, 553081). Cells were collected directly into RNA lysis buffer (Qiagen, 74104).

cDNA synthesis and qPCR. Total RNA was extracted using the RNeasy Mini kit (Qiagen, 74104) according to the manufacturer's instructions. RNA concentration and purity were determined using Agilent's Bioanalyzer, and RNA was reverse transcribed using the iScript cDNA Synthesis Kit (Biorad, 170-8891). For gene expression analysis, cDNA from three biological replicates was quantitatively amplified on a thermal cycler (BioRad) using SYBR green (Biorad) and gene-specific primers (Supplementary Table 8). The comparative Ct method57 was used to examine differences in gene expression. Values were normalized to expression levels of Cd11b (also known as Itgam). Three technical replicates were used for each gene.
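The comparative Ct calculation referred to above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name and all Ct values are hypothetical, technical replicates are simply averaged, and the standard assumption of roughly 100% amplification efficiency (a doubling per cycle) is made.

```python
from statistics import mean


def fold_change_ddct(target_treated, ref_treated, target_control, ref_control):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    Each argument is a list of technical-replicate Ct values.
    The reference gene plays the role of Cd11b in the text above.
    """
    # Normalize the target gene to the reference gene within each condition.
    delta_treated = mean(target_treated) - mean(ref_treated)
    delta_control = mean(target_control) - mean(ref_control)
    # Compare the normalized values between conditions.
    ddct = delta_treated - delta_control
    return 2.0 ** (-ddct)


# Hypothetical Ct values, three technical replicates per measurement.
fold = fold_change_ddct(
    target_treated=[22.0, 22.0, 22.0],
    ref_treated=[18.0, 18.0, 18.0],
    target_control=[25.0, 25.0, 25.0],
    ref_control=[19.0, 19.0, 19.0],
)
print(fold)  # 4.0: the target is four-fold higher in the treated sample
```

A lower Ct means more template, so a negative ΔΔCt corresponds to upregulation relative to the control.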


eQTL analysis. The human regions orthologous to mouse enhancers that change in the CK-p25 mouse were compared to controls for their enrichment in overlap with regulatory SNPs from published eQTL studies in immune cell types under a variety of conditions25,26. Because the eQTLs were processed separately, we applied our own threshold (P < 1 × 10^-4). We then calculated enrichment of human orthologues of different categories of CK-p25 enhancers relative to stable regions and used a binomial P value to estimate the significance.

Enrichment of AD GWAS SNPs in Roadmap enhancers. The enrichment of AD GWAS SNPs that map to Roadmap enhancer regions is calculated on the basis of permutations of SNPs. In brief, SNPs were permuted 1,000,000 times preserving distance to gene, minor allele frequency, and number of SNPs in LD. The 1000 Genomes Project database was used as the reference for this information.

Comparison of regulatory regions to AD meta-analysis. The enrichment of CK-p25 human enhancer orthologues in AD was calculated by comparing the number of changing regions that overlap SNPs4 to the number of unchanging regions that overlap SNPs. We calculate the significance using a binomial P value, in which the probability of success in the changing enhancers is based on the frequency in the unchanging enhancers. The results for the consistently increasing enhancers were slightly more significant when using a hypergeometric test instead of the binomial. To test whether the enrichment of increasing enhancer orthologous regions was due to the overlap with CD14+ cell enhancers, we repeated the above enrichment procedure within the set of CK-p25 enhancer orthologues that also overlap CD14+ cell enhancers. The enrichment using this control was still significant (3.0-fold enrichment, binomial P = 1.3 × 10^-5). AD GWAS SNPs that were in mouse enhancer orthologues were expanded using an LD of 0.8 and then tested for potential coding SNPs58 or eQTLs (Supplementary Table 7).
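The binomial enrichment calculation described above, in which the success probability for the changing enhancers is taken from the frequency in the unchanging enhancers, can be sketched as below. The function name and the toy counts are assumptions for illustration and are not values from the study.

```python
from math import exp, lgamma, log


def binomial_enrichment(changing_hits, changing_total, stable_hits, stable_total):
    """Fold enrichment and upper-tail binomial P value, P(X >= changing_hits).

    The per-region success probability p0 is estimated from the stable
    (unchanging) regions, as in the Methods text above.
    """
    p0 = stable_hits / stable_total
    if not 0.0 < p0 < 1.0:
        raise ValueError("stable regions must give a frequency strictly between 0 and 1")
    fold = (changing_hits / changing_total) / p0

    def log_pmf(k):
        # log of C(n, k) * p0^k * (1 - p0)^(n - k), computed in log space
        # so that large counts do not overflow.
        n = changing_total
        return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p0) + (n - k) * log(1.0 - p0))

    p_value = sum(exp(log_pmf(k)) for k in range(changing_hits, changing_total + 1))
    return fold, min(p_value, 1.0)


# Toy numbers only: 3 of 100 changing regions overlap a SNP,
# against 10 of 1,000 stable regions.
fold, p_value = binomial_enrichment(3, 100, 10, 1000)
print(f"fold enrichment = {fold:.1f}, binomial P = {p_value:.3f}")
```

Working in log space is a design choice for numerical stability; a statistics library's binomial survival function would serve the same purpose.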

31. Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011).

32. Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008).

33. Landt, S. G. et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 22, 1813–1831 (2012).

34. Anders, S. et al. Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nature Protocols 8, 1765–1786 (2013).

35. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).

36. Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nature Methods 9, 215–216 (2012).

37. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).

38. Flicek, P. et al. Ensembl 2013. Nucleic Acids Res. 41, D48–D55 (2013).

39. Thorvaldsdottir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform. 14, 178–192 (2013).

40. Hoffman, M. M. et al. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res. 41, 827–841 (2013).

41. Huang da, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols 4, 44–57 (2009).

42. Huang da, W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1–13 (2009).

43. McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nature Biotechnol. 28, 495–501 (2010).

44. Hinrichs, A. S. et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 34, D590–D598 (2006).

45. Blanchette, M. et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14, 708–715 (2004).

46. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).

47. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

48. Zhang, Y. et al. An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex. J. Neurosci. 34, 11929–11947 (2014).

49. Hickman, S. E. et al. The microglial sensome revealed by direct RNA sequencing. Nature Neurosci. 16, 1896–1905 (2013).

50. Butovsky, O. et al. Identification of a unique TGF-β-dependent molecular and functional signature in microglia. Nature Neurosci. 17, 131–143 (2014).

51. Vilella, A. J. et al. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 19, 327–335 (2009).

52. Blalock, E. M. et al. Incipient Alzheimer’s disease: microarray correlation analyses reveal major transcriptional and tumor suppressor responses. Proc. Natl Acad. Sci. USA 101, 2173–2178 (2004).

53. Smyth, G. K., Michaud, J. & Scott, H. S. Use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics 21, 2067–2075 (2005).

54. Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482 (2011).

55. Kheradpour, P. & Kellis, M. Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments. Nucleic Acids Res. 42, 2976–2987 (2014).

56. Guez-Barber, D. et al. FACS purification of immunolabeled cell types from adult rat brain. J. Neurosci. Methods 203, 10–18 (2012).

57. Livak, K. J. & Schmittgen, T. D. Analysis of relative gene expression data using real-time quantitative PCR and the 2^-ΔΔCT method. Methods 25, 402–408 (2001).

58. Ward, L. D. & Kellis, M. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res. 40, D930–D934 (2012).


[Extended Data Figure 1 image. Panel a: CK-p25 and control (CK) mice at 2 wk and 6 wk of induction, with pathology in CK-p25 including Aβ increase, neuron loss, synapse loss, cognitive impairment and astrogliosis; hippocampus dissected; RNA isolation and ChIP for H3K27ac, H3K4me1, H3K4me3, H3K36me3, H4K20me1, H3K27me3 and H3K9me3; ChIP sequencing, single-end 36 nt, ~60 million reads/mark, 12 multiplexed DNA libraries; RNA sequencing, paired-end 76 nt, ~15 million reads/sample. Totals: active promoters 57,840; active enhancers 151,447; transcribed genes 13,836. Panel b: RNA-seq, chromatin state and histone mark tracks across a 42 kb window at the SPI1/PU.1 locus for the AD model (CK-p25) and control (CK) at the late (6 weeks) time point, with increasing promoter, increasing enhancer and increasing gene body regions marked.]

Extended Data Figure 1 | Epigenomic and transcriptomic profiling of a mouse model of AD. a, Experimental design and progression pathology in the CK-p25 mice. b, Gene expression and histone modification levels at the SPI1 locus at 6 weeks of inducible p25 overexpression. Profiled are histone marks associated with repression (blue); histone marks associated with enhancers (orange); histone marks associated with promoters (red); histone marks associated with gene bodies (green); RNA-seq (black).


[Extended Data Figure 2 image. Bar chart of fold change relative to CK (y axis) for Actb, Cd11b, Cd14, Aif1, Cx3cr1, Spi1, Inpp5d, Cst7 and Clec7a (x axis) in CK-p25 and CK mice, with significance marked by asterisks or NS.]

Extended Data Figure 2 | Differential microglia-specific gene expression changes in the CK-p25 mice. RT–qPCR of selected microglia markers and immune response genes shows upregulation of gene expression in fluorescence activated cell (FAC)-sorted CD11b+ CD45low microglia from 2-week-induced CK-p25 mice (red bars) relative to respective controls (black bars). Actb (β-actin) was used as a negative control. Values were normalized to Cd11b expression (n = 3, *P < 0.05, two-tailed t-test). NS, non-significant.


[Extended Data Figure 3 image. Panel a: heatmap of mouse chromatin states 1–14 (TssA, TssU, TssD, Tx, Tx3p, Gene, EnhG, Enh1, Enh2, Bival, ReprPC, Het, LowG, LowI; grouped as promoters, transcribed, enhancers, bivalent, Polycomb repressed, heterochromatin and low signal) against the seven profiled histone marks (H3K36me3, H4K20me1, H3K4me1, H3K27ac, H3K4me3, H3K27me3, H3K9me3). Panel b: heatmap of relative enrichment of mouse regulatory regions (rows: H3K4me3 in promoters, H3K27ac in enhancers, H3K27me3 in ReprPC) against human hippocampus chromatin state (columns: TssA, EnhG, ReprPC, ReprPCWk, LowG, Enh), colour scale 1 to −1, with the number of overlapping regions printed in each cell.]

Extended Data Figure 3 | Chromatin state conservation. a, Combinatorial patterns of the seven histone modifications profiled were used to define promoter (1–3; A, active; D, downstream; U, upstream), gene body (4–6; tx, transcribed; 3P, 3 prime), enhancer (7–9; G, genic; 1 = strong, 2 = weak), bivalent (10), repressed Polycomb (11), heterochromatin (12), and low signal (13–14) chromatin states. Darker blue indicates a higher enrichment of the measured histone mark (x axis) to be found in a particular state (y axis). b, Promoter, enhancer and repressed chromatin states in mouse hippocampus (rows), as profiled in this study, align to matching chromatin states in human (columns), as profiled by the Roadmap Epigenomics Consortium10. Shading indicates enrichment relative to human chromatin state abundance (columns). The number of regions overlapping is shown in each cell of the heatmap.


[Extended Data Figure 4 image. Five heatmaps of log2 change relative to 2 wk control, each with columns for animal model (Control, CK-p25) and time (2 and 6 weeks): a, RNA-Seq at gene bodies; b, H3K4me3 at promoters; c, H3K27ac at enhancers; d, H3K27me3 at Polycomb; e, H3K9me3 at heterochromatin. Rows are the six classes of change, with the number of regions in each class printed beside the heatmap. For panel a the counts are: consistent increase 740; late-stage increase 2802; transient decrease 316; consistent decrease 941; late-stage decrease 3,799; transient increase 125.]

Extended Data Figure 4 | Differential gene expression and histone mark levels at regulatory regions in CK-p25 mice. a–e, Shown are six distinct classes of differentially modified regions: transient (early) increase (pink) or decrease (light blue), consistent increase (red) or decrease (blue), and late (6-week) increase (dark red) or decrease (navy blue). The heatmap shows the log fold change relative to 2-week controls for gene expression (a), H3K4me3 peaks at ‘TSS’ (transcription start site) chromatin state (b), H3K27ac peaks at enhancer chromatin state (c), H3K27me3 peaks overlapping the Polycomb repressed chromatin state (d), and H3K9me3 peaks overlapping the heterochromatin chromatin state (e). Numbers denote peaks falling into each category.


[Extended Data Figure 5 image. Three heatmaps (a, Promoter; b, Enhancer; c, Polycomb repressed) of enrichment of relative overlap, colour scale −3 to 3. Both axes list the six classes of change (consistent decrease, transient decrease, transient increase, consistent increase, late increase, late decrease): category of gene expression on one axis and category of regulatory region change on the other. Asterisks mark significant cells.]

Extended Data Figure 5 | Relationship between changes of gene expression and regulatory regions in CK-p25 mice. a–c, For each class of gene expression change in the CK-p25 model (x axis), overlap with different histone modifications is shown (y axis) for H3K4me3 at promoters (a), H3K27ac at enhancers (b), and H3K27me3 at Polycomb repressed regions (c). Histone modifications were mapped to the nearest transcription start site (Supplementary Table 3) to show the enrichment of the changing regulatory regions relative to those that are stable in CK-p25. The significance is calculated based on the hypergeometric P value of the overlap.


[Extended Data Figure 6 image. Bar chart titled "Regulatory region overlap with immune eQTLs". y axis: depletion/enrichment (−log10(P value)), 0 to 10. Groups: monocytes (Fairfax et al), monocytes (Raj et al), CD4+ cells (Raj et al). Bars are coloured by class of enhancer change: consistent decrease, transient decrease, transient increase, consistent increase, late increase, late decrease.]

Extended Data Figure 6 | Enrichment of immune cell eQTLs in increasing mouse enhancers. Enrichment of eQTL SNP (y axis; −log10(binomial P < 10^-4)) in monocytes and CD4+ (refs 25, 26) is compared to the orthologous regions of CK-p25-affected enhancers relative to enhancers whose levels do not change.


[Extended Data Figure 7 image. Bar chart for promoters. y axis: enrichment in AD-associated SNPs (−log10(P value)), 0 to 9. x axis: category of regulatory region change (consistent decrease, transient decrease, transient increase, consistent increase, late increase, late decrease).]

Extended Data Figure 7 | Weak enrichment of AD GWAS SNPs at differential CK-p25 promoters. Enrichment of AD-associated SNPs (y axis, binomial P value) in human regions orthologous to different classes of mouse promoters.


[Extended Data Figure 8 image. Six scatter plots, one per category of mouse enhancer change (consistent decrease, transient decrease, transient increase, consistent increase, late increase, late decrease), arranged in panels a and b. x axis: enrichment in changing enhancers in AD mouse model (−log10(P value)). y axis: enrichment in AD-associated SNPs (−log10(P value)), 0 to 4. Points are coloured by Roadmap class: immune, adult brain, fetal brain, other. R2 values printed on the plots, in extraction order: 0.49, 0.05, 0.51, 0.07, 0.29, 0.07.]

Extended Data Figure 8 | Enrichment of tissue-specific enhancer annotations from the Roadmap Epigenomics Consortium for AD-associated SNPs and mouse enhancers. a, b, Enrichment of AD-associated SNPs (y axis, permutation P value) in tissue-specific enhancer annotations from the Roadmap Epigenomics Consortium (points), relative to their enrichment for increased-level (a) and decreased-level (b) (colours of different classes along y axis) of orthologous enhancer regions in the mouse AD model (x axis, hypergeometric P value). Linear regression trend line and R2 based on Pearson correlation is shown.


[Extended Data Figure 9 image. Panel a: heatmap of cell type enrichment, colour scale 4 to −4, with cells labelled by −log10(P value). Columns: monocytes, microglia, astrocytes, endothelial, oligodendrocytes, macrophages, neurons. Rows: the six classes of gene expression change (consistent decrease, transient decrease, transient increase, consistent increase, late increase, late decrease). Panel b: schematic of expression change (− to +) at 2 wk and 6 wk for neurons, monocytes, microglia, and a combined group of astrocytes, oligodendrocytes, macrophages and endothelial cells.]

Extended Data Figure 9 | Cell type composition. a, For each class of gene expression change (x axis), shown is the enrichment of cell-type-specific gene markers from published data sets48–50. The macrophage and monocyte categories are computed relative to microglia49,50. The enrichment is calculated relative to the genes that do not change in expression level in the CK-p25 mice. Cells in the heatmap are labelled based on the −log10(P value) (hypergeometric t-test). Cases where no genes overlapped are shown in grey. b, Summary of a, showing the inferred change in cell type composition across time.


ARTICLE

Received 11 Jul 2014 | Accepted 17 Nov 2014 | Published 2 Feb 2015

Methylome sequencing in triple-negative breast cancer reveals distinct methylation clusters with prognostic value

Clare Stirzaker1,2,*, Elena Zotenko1,2,*, Jenny Z. Song1, Wenjia Qu1, Shalima S. Nair1,2, Warwick J. Locke1,2, Andrew Stone1,2, Nicola J. Armstong1,3, Mark D. Robinson1,4, Alexander Dobrovic5, Kelly A. Avery-Kiejda6, Kate M. Peters7, Juliet D. French7,8, Sandra Stein9, Darren J. Korbie10, Matt Trau7,10, John F. Forbes11, Rodney J. Scott6,12, Melissa A. Brown7, Glenn D. Francis9,10 & Susan J. Clark1,2

Epigenetic alterations in the cancer methylome are common in breast cancer and provide novel options for tumour stratification. Here, we perform whole-genome methylation capture sequencing on small amounts of DNA isolated from formalin-fixed, paraffin-embedded tissue from triple-negative breast cancer (TNBC) and matched normal samples. We identify differentially methylated regions (DMRs) enriched with promoters associated with transcription factor binding sites and DNA hypersensitive sites. Importantly, we stratify TNBCs into three distinct methylation clusters associated with better or worse prognosis and identify 17 DMRs that show a strong association with overall survival, including DMRs located in the Wilms tumour 1 (WT1) gene, bi-directional-promoter and antisense WT1-AS. Our data reveal that coordinated hypermethylation can occur in oestrogen receptor-negative disease, and that characterizing the epigenetic framework provides a potential signature to stratify TNBCs. Together, our findings demonstrate the feasibility of profiling the cancer methylome with limited archival tissue to identify regulatory regions associated with cancer.

DOI: 10.1038/ncomms6899

1 Epigenetics Group, Cancer Division, Garvan Institute of Medical Research, Sydney, New South Wales 2010, Australia. 2 St Vincent’s Clinical School, University of NSW, Sydney, New South Wales 2010, Australia. 3 School of Mathematics and Statistics, University of Sydney, Sydney, New South Wales 2006, Australia. 4 Swiss Institute of Bioinformatics and Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland. 5 Translational Genomics & Epigenomics Laboratory, Olivia Newton-John Cancer Research Institute, Melbourne, Victoria 3084, Australia. 6 School of Biomedical Sciences and Pharmacy, Faculty of Health, University of Newcastle, Newcastle, New South Wales 2308, Australia. 7 School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland 4072, Australia. 8 Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, Queensland 4029, Australia. 9 Pathology Queensland, Princess Alexandra Hospital, Brisbane, Queensland 4102, Australia. 10 Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland 4072, Australia. 11 School of Medicine and Public Health, Faculty of Health and Medicine, University of Newcastle, Newcastle, New South Wales 2305, Australia. 12 Division of Molecular Medicine, Hunter Area Pathology Service and the Hunter Medical Research Institute, John Hunter Hospital, Newcastle, New South Wales 2305, Australia. * These authors contributed equally to this work. Correspondence and requests for materials should be addressed to S.J.C. (email: [email protected]).

NATURE COMMUNICATIONS | 6:5899 | DOI: 10.1038/ncomms6899 | www.nature.com/naturecommunications 1

© 2015 Macmillan Publishers Limited. All rights reserved.


Triple-negative breast cancers (TNBCs) comprise a heterogeneous group of cancers with varying prognoses, presenting a challenge for effective clinical management. TNBC is clinically defined by the absence of oestrogen receptor (ER) and progesterone receptor expression, and neither overexpression nor amplification of human epidermal growth factor receptor 2 (HER2)1,2. TNBC represents ~15–20% of all newly diagnosed breast cancer cases and is generally associated with high risk of disease recurrence and shorter overall survival compared with non-TNBC3. Broadly, TNBC patients can be categorized into two distinct groups: those that succumb to their disease within three to five years regardless of treatment and those that remain disease-free to the extent that their overall survival exceeds that of non-TNBC patients (that is, approximately >8 to 10 years post diagnosis)4,5. Currently, methods by which TNBC patients are stratified into high- and low-risk subgroups remain limited to staging by clinicopathological factors such as tumour size, level of invasiveness and lymph node infiltration. However, unlike other breast cancer subtypes, TNBC outcome is less closely related to stage6. Thus, there is a clear need to identify a robust method by which TNBC patients can be stratified by prognosis, to enable more informed disease management.

Current efforts to stratify early breast cancer prognosis have primarily focused on multi-gene expression signatures and all have received varying degrees of acceptance7. In addition to multi-gene expression assays, DNA methylation signatures are being assessed as potential molecular biomarkers of cancer8. A number of studies have documented aberrant methylation events in breast carcinogenesis and identified specific DNA methylation biomarkers that have significant diagnostic and prognostic potential9–12. Several studies have also identified DNA methylation signatures that can distinguish between breast cancer subtypes13–16, and others that may be predictive of treatment response17–19.

Despite growing interest in the prognostic significance of DNA methylation in breast cancer, there have been no studies specifically investigating the DNA methylation profile of human TNBC and its association with disease outcome. Here we carry out genome-wide DNA methylation profiling of formalin-fixed paraffin-embedded (FFPE) triple-negative clinical DNA samples, using affinity capture of methylated DNA with recombinant methyl-CpG binding domain of MBD2 protein, followed by next generation sequencing (MBDCap-Seq)20,21. This high-resolution technique allows for genome-wide methylation analysis of CpG rich DNA22,23. Using MBDCap-Seq, we identify regional methylation profiles specific to TNBC, which we validate using methylation data extracted from TCGA breast cancer cohort13. Importantly, we also report the first potential prognostic methylation signature of survival, specific for TNBC that now warrants further study in larger cohorts.

Results

Genome coverage of MBDCap-Seq. To delineate regions assayable with MBDCap-Seq, we first profiled fully methylated (CpG methyltransferase SssI-treated blood sample) DNA. Computational analysis of SssI MBDCap-Seq revealed that MBDCap-Seq can robustly assess the methylation status of 230,655 regions spanning a total of 116 Mbp, comprising 5,012,633 CpG dinucleotides, or ~18% of the total number of CpG sites in the human genome (see Methods; Supplementary Fig. 1a). The assayed CpG sites span 91% of all CpG islands; 84% CpG island shores; 72% RefSeq promoters; 38% introns and 31% exons. We next compared coverage of MBDCap-Seq with the Illumina HumanMethylation450K (HM450K) array (Supplementary Fig. 1b) and found that MBDCap-Seq interrogates an additional 4,740,327 CpG sites as compared with the high-density HM450K array.

A major advantage of the MBDCap-Seq method is the ability to interrogate regional blocks of hypermethylation, that is, methylation spanning consecutive CpG sites, which commonly occurs in cancer. We compared regional MBDCap-Seq coverage to coverage of HM450K arrays (Supplementary Fig. 1a) and found that while MBDCap-Seq and HM450K arrays have similar regional coverage of CpG islands (91 versus 81%) and RefSeq promoters (71 versus 83%), MBDCap-Seq regional coverage of shores (77 versus 28%), enhancers (12 versus 2%) and insulators (11 versus 1%) is much greater, highlighting the potential advantage of MBDCap-Seq in screening novel functional regions of the cancer methylome.

To determine if MBDCap-Seq can also provide accurate methylation analysis from FFPET DNA, we compared DNA methylation profiles from DNA isolated from fresh frozen (FF) and FFPET of matching tumour and lymph node samples. We show that MBDCap-Seq from FFPET provides equivalent methylation to FF DNA (Pearson Correlation Coefficient of 0.95 and 0.86, respectively) (Supplementary Fig. 2a) and that MBDCap-Seq and HM450K array performed on the same FFPET tumour and lymph node DNA show high concordance (0.79 and 0.77, respectively) (Supplementary Fig. 2b–d). We also show that there are regions uniquely covered by MBDCap-Seq, for example, at enhancers and insulator elements (Supplementary Fig. 2e,f).
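The concordance figures quoted above are Pearson correlation coefficients between paired methylation profiles. As a minimal sketch of that calculation, assuming each profile is simply a list of per-region signal values (the two short lists below are hypothetical and are not data from the study):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical normalized signal for the same six regions in two preparations.
ff_signal = [0.10, 0.80, 0.35, 0.95, 0.05, 0.60]
ffpet_signal = [0.12, 0.75, 0.40, 0.90, 0.10, 0.55]
r = pearson(ff_signal, ffpet_signal)
```

A value of r close to 1 indicates that the two preparations rank and scale the regions similarly; it does not by itself show that absolute methylation levels agree.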

Differentially methylated regions in TNBCs. To identify differentially methylated regions (DMRs) in TNBCs, we first profiled FFPET DNA using MBDCap-Seq from a discovery cohort of 19 Grade 3 TNBC tumours and six matched normal samples (Supplementary Table 1) and analysed the data with a novel computational pipeline for comparative statistical analysis of MBDCap-Seq samples (see Methods; Supplementary Fig. 3). We identified 822 hypermethylated and 43 hypomethylated statistically significant DMRs (FDR < 0.05), harbouring 64,005 and 623 CpG sites, respectively, compared with matched normal samples (see Fig. 1a,b; Supplementary Data 1) and validated sample-specific differential methylation using Sequenom DNA methylation analysis (Supplementary Fig. 4). Next, we determined the genomic location of the DMRs and found that CpG islands, CpG island shores and promoters are significantly overrepresented (hypergeometric test; P value < 0.0001) in the 822 hypermethylated regions and underrepresented in the 43 regions of hypomethylation (Fig. 1c; see Methods). Notably, ChromHMM-annotated HMEC promoters24 and polycomb repressed regions were also significantly enriched (hypergeometric test; P value ≪ 0.001) for gain of methylation in the breast cancer samples. Finally, we validated example DMRs in an independent cohort of 31 TNBCs and 15 normal breast samples and a panel of cell lines (Supplementary Table 2). We performed Sequenom methylation analysis on five of the 822 hypermethylated regions spanning the CpG island promoters of NPY, FERD3L, HMX2, SATB2 and C9orf125 (Supplementary Fig. 5). The levels of methylation detected in the normal samples were uniformly low, whereas the five DMRs showed striking hypermethylation in the TNBCs (Fig. 1d) and 24 breast cancer cell lines (Fig. 1e).

Functional characterization of hypermethylated genes. To predict the potential functional significance of the 822 DMRs identified in the TNBC, we first determined which regions overlapped with promoters and genes and found that the DMRs were associated with 513 RefSeq promoters, which corresponded to 308 genes (Supplementary Data 2). We used the DAVID functional annotation tool25 to annotate this set of genes. Visualization of statistically significantly (FDR < 0.05) overrepresented gene sets revealed two largely non-overlapping groups of genes26 (see Methods; Supplementary Fig. 6; Supplementary Table 3). One group is annotated with keywords ‘DNA-BINDING’, ‘TRANSCRIPTION’, ‘TRANSCRIPTION REGULATION’, ‘HOMEOBOX’, ‘DEVELOPMENTAL PROTEIN’ and ‘DIFFERENTIATIONS’ and contains approximately 100 genes, mostly transcription factors, such as BARHL2, DLX6, OTX2, RUNX1T1 and TAC1. The second group is annotated with keywords ‘SIGNAL’, ‘CELL MEMBRANE’, ‘TRANSDUCER’, ‘GLYCOPROTEIN’ and ‘G-PROTEIN COUPLED RECEPTOR’ and contains genes involved in signalling pathways such as ADRB3, GHSR, NPY and ROBO3.

[Figure 1 panels: (a, b) heatmaps of hypermethylated and hypomethylated regions in normal and cancer samples (colour key 0 to 2); (c) genomic distribution of DMRs, observed/expected, across CpG islands and shores, RefSeq transcripts (promoters, exons, introns, intergenic) and ChromHMM annotations (enhancers, insulators, promoters, polycomb); (d) clinical sample validation, methylation (%) for FERD3L, C9orf125, HMX2, NPY and SATB2, normal (n = 15) versus tumour (n = 33); (e) cell line validation, methylation (%) for the same five regions, normal (n = 3) versus cancer (n = 24); (f) observed/expected ratio by expression change (up, down); (g) Venn diagram of hypermethylated, downregulated and mutated genes.]

Figure 1 | MBDCap-seq identifies DMRs in discovery cohort. A heatmap showing methylation profile of 822 hypermethylated (a) and 43 hypomethylated regions (b) across a cohort of 19 tumour and six matched normal samples in the discovery cohort. Columns are samples and rows are regions. The level of methylation (number of reads normalized with respect to fully methylated sample) is represented by a colour scale: blue for low levels and red for high levels of methylation. (c) A bar plot showing association of DMRs across functional/regulatory regions of the genome: (i) CpG islands and shores, (ii) RefSeq transcripts and (iii) Broad ChromHMM HMEC annotation. The height of the bars represents the level of enrichment measured as a ratio between the frequency of hypermethylated (pink) or hypomethylated (blue) regions overlapping a functional element over the expected frequency if such overlaps were to occur at random in the genome. Statistically significant enrichments (P value < 0.05; hypergeometric test) are marked with an asterisk. (d) Sequenom validation of five hypermethylated regions (FERD3L, C9orf125, HMX2, NPY and SATB2) is shown for an independent cohort of TNBC samples (normal n = 15; tumour n = 33) and (e) a panel of breast cancer cell lines (normal n = 3; cancer n = 24). For each region, box plots displaying the distribution of methylation levels are shown in grey/blue for normal/tumour samples/cell lines. (f) A bar plot showing enrichment of genes with promoter hypermethylation in sets of genes that are up-/downregulated in the TCGA cohort of TNBC tumours as compared with matched normal samples. The height of the bars represents the level of enrichment measured as a ratio between the observed number of up-/downregulated genes with promoter hypermethylation to the expected number of such genes. (g) A Venn diagram showing overlap between genes with promoter hypermethylation, genes downregulated in TCGA TNBC cohort (hypergeometric test; FC 1.73; P value ≪ 0.001) and genes with two or more mutations (hypergeometric test; FC 1.92; P value ≪ 0.001) in TCGA breast cancer cohort.

To determine whether promoter hypermethylation was potentially involved in gene silencing, we examined TCGA expression data for the 308 genes affected by promoter hypermethylation (see Methods for the analysis of TCGA expression data for TNBC samples; 89 tumour and eight matched normal samples). We found that genes with promoter hypermethylation are enriched in downregulated genes (71 out of 245 genes, for which expression data are available, are downregulated; hypergeometric test; FC 1.73; P value ≪ 0.001) and are depleted in upregulated genes (28 out of 245 genes are upregulated; hypergeometric test; FC 0.53; P value ≪ 0.001) (Fig. 1f).
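The enrichment statements in this section pair a fold change (observed overlap divided by the overlap expected by chance) with a hypergeometric test. The following is a minimal sketch of both quantities, assuming a simple gene-set overlap model; the function names and the toy numbers are illustrative assumptions and do not reproduce the study's background gene counts, which are given in its Methods.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when `draws` items are taken without replacement from
    `population` items, of which `successes` carry the property of interest."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total

def fold_enrichment(k, population, successes, draws):
    """Observed overlap divided by the overlap expected at random."""
    expected = draws * successes / population
    return k / expected

# Toy example (hypothetical): 20 genes in total, 5 downregulated,
# 4 hypermethylated, 2 of which are also downregulated.
p_value = hypergeom_upper_tail(2, 20, 5, 4)
enrichment = fold_enrichment(2, 20, 5, 4)
```

Depletion, as reported for upregulated genes, would use the lower tail P(X <= k) instead; that can be obtained as 1 minus the upper tail evaluated at k + 1.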

To identify potential driver events, we overlapped the 308 hypermethylated genes with genes recurrently mutated in breast cancer in TCGA13 (Fig. 1g). We found that out of 308 genes with promoter methylation, 51 are mutated (hypergeometric test; FC of 1.92; P value ≪ 0.001) and 12 (C9orf125, COL14A1, ENPP2, ERG2, PLD5, ROBO3, RUNX1T1, SEMA5A, TBX18, TSHZ3, ZBTB16 and ZNF208) are both mutated and downregulated. Of these, both ROBO3 and SEMA5A are part of the axon guidance pathway recently implicated in tumour initiation and progression27,28. Interestingly, promoter hypermethylation affects, in total, seven members of the axon guidance pathway (CRMP1, GDNF, GFRA1, MYL9, ROBO1, ROBO3 and SEMA5A) with four members (GFRA1, MYL9, ROBO3 and SEMA5A) downregulated.

Differentially methylated regions specific to TNBCs. We next asked if any of the 822 DMRs were also found in ER− or ER+ breast cancer. We used TCGA breast cancer methylation cohort, which comprises HM450K data for 354 ER+ and 105 ER− breast tumours (73 of which are TNBCs) and 83 normal breast samples (see Methods for the analysis of TCGA methylation data). Of the 822 DMRs identified in the MBDCap-seq methylation discovery set, 770 DMRs are interrogated by a total of 4,987 HM450K probes from the TCGA data set. We found that while the majority of these probes are not methylated in normal breast tissue, they were hypermethylated to various degrees in both ER+ and ER− breast cancers (Fig. 2a). Both ER+ and ER− subtypes also contained samples with minimal methylation across all probes, as well as those that displayed extensive methylation more representative of a CpG island methylator phenotype (CIMP)29.

Next, we asked whether any of the DMRs were TNBC specific. Out of 4,987 HM450K probes, we found that 5% (282/4,987) were significantly hypermethylated in TNBCs (t-test; mean differential (diff) methylation > 10%; P value < 0.05) compared with the ER+ tumours and the rest of the ER− tumours. Using methylation values of 282 TNBC-specific probes, we were able to classify tumour samples in the TCGA HM450K cohort into TNBCs and non-TNBCs with sensitivity of 0.72, specificity of 0.94 and AUC of 0.90 (Fig. 2b). From the 282 TNBC-specific probes, we identified 36 TNBC-specific regions (harbouring at least three 450K TNBC-specific probes) that primarily overlap promoters and/or gene bodies (Supplementary Table 4; Supplementary Fig. 7). The regions predominantly overlap genes encoding zinc fingers and transcription factors and intergenic regions that are commonly marked by polycomb in HMECs. Two such TNBC-specific regions are located in the promoters of the genes encoding zinc finger proteins ZNF154 and ZNF671 on chromosome 19 (Fig. 2c). Both promoters have low methylation levels in normal breast and increased levels of methylation in TNBC samples as compared with ER+ cancer. The distribution of expression values mirrors the methylation status, with normal samples showing the highest levels of expression and TNBC tumours showing the lowest levels of expression (Fig. 2d), suggesting silencing by methylation of both ZNF154 and ZNF671 in TNBC tumours.
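Sensitivity, specificity and AUC summarize how well a score separates TNBC from non-TNBC samples. The sketch below illustrates only these evaluation metrics on a hypothetical toy set of labels and scores; it does not implement the Partial Least Squares classifier used in the study, and the 0.5 threshold is an assumption made for illustration.

```python
def sensitivity_specificity(labels, predictions):
    """labels and predictions are sequences of 0/1 (1 = TNBC)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    tn = sum(1 for y, p in pairs if not y and not p)
    fp = sum(1 for y, p in pairs if not y and p)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 3 TNBC and 5 non-TNBC samples with classifier scores.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

sens, spec = sensitivity_specificity(labels, predictions)
auc = roc_auc(labels, scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes performance across all thresholds, which is why a classifier can show a high AUC alongside a more modest sensitivity.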

DNA methylation profile can stratify TNBCs. To identify DMRs that potentially stratify TNBCs, we used unsupervised cluster analysis on methylation of the 4,987 HM450K probes and identified three distinct groups of TNBC tumours from the TCGA data sets (Fig. 3a; see Methods). Survival analysis revealed that the largely hypomethylated cluster (blue cluster) was associated with better prognosis as compared with the other two more highly methylated clusters (orange and red clusters) (Fig. 3b). In particular, the medium methylated cluster (orange cluster) comprises samples with the worst prognosis (Cox proportional hazards model; hazard ratio = 8.64; P value = 0.005) as compared with the good prognosis TNBC cluster (blue cluster). Moreover, there was no association between the induced clusters and survival for ER+ or non-TNBC samples (Supplementary Fig. 8).
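The cluster survival curves in Fig. 3b are Kaplan–Meier plots. As a minimal sketch of how such a curve is estimated for one group, assuming follow-up times in months and an event flag (1 = death observed, 0 = censored), with hypothetical toy data; the Cox model that yields the hazard ratio is not covered here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with >= 1 death.

    times:  follow-up durations (for example, months)
    events: 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up data for six patients in one cluster.
curve = kaplan_meier([5, 8, 8, 12, 20, 30], [1, 1, 0, 1, 0, 1])
```

Patients censored at a given time are still counted as at risk at that time and only leave the risk set afterwards, which is the usual convention for this estimator.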

Next, we determined to what extent regional methylation can stratify TNBCs into good and bad prognosis groups. Survival analysis identified 190 probes that were associated with survival in TCGA TNBC samples (Cox proportional hazards model; P value < 0.05) in both univariate and multivariate analyses (see Methods). We observed regional association (at least three concordantly located survival probes) for 17 regions; 14 genomic regions with poor survival and three genomic regions for good survival (Table 1; Supplementary Fig. 9). Each of the individual Kaplan–Meier plots for individual CpG sites in each region showed excellent survival separation, highlighting the potential value as prognostic biomarkers (Fig. 3c–e; Supplementary Fig. 9). The genomic location of these regions varies, with four regions located in a promoter (SLC6A3, C6orf174, WT1-AS and ZNF254), seven in the gene body only (DMRTA2, LHX8, WT1, WT1-AS, HOXB13, ECEL1, SOX2-OT) and five in intergenic regions (Table 1). Interestingly, with the exception of the region encoded by chr10: 102,409,068-102,409,766, all prognostic regions overlap DNase1 hypersensitive sites (ENCODE) and are marked with a polycomb signature in HMEC cells and many contain numerous conserved transcription factor binding sites (TRANSFAC30) (Table 1). Furthermore, we show that the average level of methylation of CpG sites in the 17 potential prognostic regions is higher in the two poor survival groups and is lower in the normal and low-risk groups (Supplementary Fig. 10).
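The regional criterion above (at least three concordantly located survival probes) can be illustrated with a small grouping routine. This is a sketch under stated assumptions: probes are taken to be (chromosome, position, direction) tuples, and the `max_gap` distance used to decide that probes are neighbours is a hypothetical parameter; in the study, probes are assigned to regions through the DMRs they overlap, as described in its Methods.

```python
def _flush(run, regions, min_probes):
    """Record a run of neighbouring probes if it is long enough."""
    if len(run) >= min_probes:
        regions.append((run[0][0], run[0][1], run[-1][1], run[0][2], len(run)))

def call_regions(probes, max_gap=1000, min_probes=3):
    """Group survival-associated probes into candidate regions.

    probes: iterable of (chrom, position, direction), where direction is
            'poor' or 'good' (the sign of the survival association).
    Returns a list of (chrom, start, end, direction, n_probes).
    """
    regions = []
    current = []
    for probe in sorted(probes, key=lambda p: (p[0], p[1])):
        if current and (
            probe[0] != current[-1][0]                # different chromosome
            or probe[1] - current[-1][1] > max_gap    # too far apart
            or probe[2] != current[-1][2]             # discordant direction
        ):
            _flush(current, regions, min_probes)
            current = []
        current.append(probe)
    _flush(current, regions, min_probes)
    return regions

# Hypothetical probe positions, chosen only to exercise the grouping logic.
example = [
    ("chr11", 100, "poor"), ("chr11", 400, "poor"), ("chr11", 900, "poor"),
    ("chr11", 5000, "good"), ("chr11", 5200, "good"),
    ("chr17", 50, "poor"),
]
regions = call_regions(example)
```

Requiring several neighbouring probes to agree in direction reduces the chance that a single noisy probe is reported as a prognostic marker.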

A striking example of regional hypermethylation across consecutive CpG probes that shows statistical significance as a prognostic marker of survival is provided by the DMRs spanning the bi-directional promoter and gene bodies of the WT1 gene and its antisense counterpart, WT1-AS (Fig. 3f). Wilms tumour protein (WT1) is a zinc finger transcription factor overexpressed in several tumour types including breast31. We observe an association between high level of methylation in chr11-11623 and chr11-1210, regions spanning the gene bodies of WT1 and WT1-AS, respectively, and poor survival in TCGA TNBC cohort (Fig. 3f). Moreover, increased levels of methylation in these regions are also associated with increased expression of WT1 (chr11-11623) and WT1-AS (chr11-1210) in TNBC patients (Supplementary Fig. 11). Conversely, we observe that TNBC patients with high methylation in chr11-4047, a region spanning the bi-directional promoter of WT1 and WT1-AS, survive longer than TNBC patients with low methylation in this region.

Discussion

The prognostic stratification of TNBC patients remains one of the most significant challenges in breast cancer research. While current efforts have primarily focused on the development of multi-gene expression classifiers to inform patient outcome, here we demonstrate the significant prognostic potential of DNA methylation biomarkers for the stratification of TNBCs. We performed genome-wide DNA methylation profiling on TNBC, identified novel regions of differential methylation, and validated regions specific for TNBC using TCGA methylation data as an independent cohort. Strikingly, unsupervised cluster analysis of DMRs stratified TNBC patients into populations of high, medium or low-risk disease outcome. In addition, using both univariate and multivariate Cox proportional hazard models, we identified 17 DMRs significantly associated with TNBC patient survival (P-value < 0.05). Critically, our classifiers paralleled the biologically relevant time-dependent pattern of patient outcome, whereby TNBC patients are most vulnerable to disease-associated death within the first five years following diagnosis, highlighting their potential use as a valuable prognostic application.

[Figure 2 panels: (a) heatmap of methylation beta values (0.0 to 1.0) for the 4,987 DMR probes across normal tissue (n = 83), ER− tumours (n = 105) and ER+ tumours (n = 354); (b) ROC analysis, sensitivity versus specificity, AUC: 0.90 (0.85–0.96); (c) β-values of HM450 probes in regions chr19-31544 and chr19-6109 (ZNF154, ZNF671; 5 kb scale, DMR and CpG island tracks) for normal tissue, ER+ and other ER− tumours, and TNBC tumours; (d) expression in normal, ER+ and TNBC samples for ZNF671 (mean diff = −0.85; P value = 5.50e−21) and ZNF154 (mean diff = −0.51; P value = 9.04e−10).]

Figure 2 | Methylation profile of candidate DMRs in the TCGA breast cancer cohort. (a) A heatmap showing methylation profile of TCGA breast cancer samples across 4,987 HM450K probes overlapping hypermethylated DMRs identified in the discovery cohort. Rows are probes and columns are TCGA breast cancer samples profiled on HM450K: 83 normal, 105 ER− tumour and 354 ER+ tumour samples. (b) A classifier (Partial Least Squares model) based on methylation values of 282 TNBC-specific probes assigns TCGA HM450K tumour samples into TNBC and non-TNBC with high accuracy; ROC analysis yields AUC of 0.9; assigning samples to highest scoring class (TNBC or non-TNBC) yields sensitivity of 0.72 and specificity of 0.94. The TCGA HM450K cohort was randomly split into training set (TNBC n = 37; non-TNBC n = 193) and testing set (TNBC n = 36; non-TNBC n = 193). The model was trained on training set and prediction accuracy assessed on testing set. (c) Box plots showing the distribution of methylation levels for two adjacent regions on chromosome 19 in TCGA normal (n = 83), TNBC tumour (n = 73), and other breast tumour samples (n = 354). These two regions, which span the promoters of ZNF154 and the adjacent ZNF671 gene, are hypermethylated in the discovery cohort and exhibit regional TNBC-specific hypermethylation in TCGA cohort, i.e. they are more heavily methylated in TNBC tumours as compared with normal and other tumour subtypes, as shown in the box plots (t-test; mean diff > 0.1; P value < 0.05). (d) Box plots showing the distribution of expression levels of ZNF154 and ZNF671 genes in TCGA normal (n = 92), TNBC tumour (n = 119) and ER+ tumour (n = 588) samples. The difference in expression of TNBC tumours versus ER+ is significant for both genes (t-test; ZNF154 mean diff = −0.51; P value = 9.04e−10; ZNF671 mean diff = −0.85; P value = 5.50e−21).


[Figure 3 panels: (a) heatmap of TCGA TNBC tumour samples across HM450K probes overlapping hypermethylated DMRs (β key 0.0 to 1.0); (b) cluster survival, probability versus overall survival (months); (c) intergenic region chr10-13741 (ChromHMM tracks: repressed, weak enhancer; methylation = poor survival); (d) HOXB13 gene body, chr17-22033 (methylation = poor survival); (e) ZNF254 promoter, chr19-36571 (methylation = improved survival); (f) WT1 gene body, chr11-11623 (methylation = poor survival), WT1/WT1-AS promoter, chr11-4047 (methylation = improved survival) and WT1-AS gene body, chr11-1210 (methylation = poor survival). Each region panel shows DMR, HM450 probe, CpG island and gene tracks, a heatmap of normal and tumour samples, and per-probe Kaplan–Meier plots (probability versus overall survival in months, high versus low methylation) annotated with hazard ratios and P values.]


Table 1 | Summary of 17 DMRs associated with overall survival in TCGA TNBC.

Chr | Start | End | RefSeq location | CpG island | CpG shore | DNase hypersensitive site | Conserved transcription factor binding sites (Z score cutoff = 2.33) | ChromHMM HMEC polycomb | Prognostic probes | Gene function/location

Poor prognosis
chr1 | 50,658,646 | 50,659,783 | DMRTA2* | Yes | No | Yes | GRα | Yes | 3 | Doublesex- and Mab-3-related transcription factor
chr1 | 75,368,128 | 75,368,976 | LHX8* | Yes | Yes | Yes | NA | Yes | 3 | Homeobox protein Lhx8
chr10 | 102,409,068 | 102,409,766 | | Yes | Yes | No | STAT1β, NK-κB, CREB, NF-Y, CEBPα | Yes | 5 | Intergenic region, ChromHMM polycomb marked
chr11 | 32,404,535 | 32,407,465 | WT1* | Yes | Yes | Yes | EGR1, EGR2, EGR2, NF1, LMO2, RFX1, MIF1, CREB, cJUN, ATF, ATF2 | Yes | 5 | Wilms tumour protein, transcription factor
chr11 | 32,416,010 | 32,417,947 | WT1-AS* | Yes | Yes | Yes | NA | Yes | 4 | Wilms tumour protein, antisense transcript
chr13 | 27,398,788 | 27,401,867 | | Yes | Yes | Yes | NA | Yes | 4 | Intergenic region, ChromHMM polycomb marked
chr14 | 56,330,541 | 56,332,135 | | Yes | Yes | Yes | USF1, MAX1, c-MYC | Yes | 3 | Intergenic region, ChromHMM polycomb
chr17 | 44,159,065 | 44,159,578 | HOXB13* | Yes | Yes | Yes | HSF1, HSF2 | Yes | 3 | Homeobox gene family, transcription factor
chr2 | 233,058,433 | 233,060,592 | ECEL1* | Yes | Yes | Yes | NRSF, PAX2, STAT5A, YY1, AHR, GATA2, AP2 | Yes | 3 | Zinc-containing type II integral-membrane protein
chr3 | 182,923,564 | 182,924,686 | SOX2-OT* | No | No | Yes | NA | Yes | 4 | Non-protein coding RNA gene
chr5 | 1,498,811 | 1,499,696 | SLC6A3† | Yes | Yes | Yes | NA | Yes | 3 | Neurotransmitter reporter
chr6 | 27,620,848 | 27,621,582 | | No | No | Yes | NA | No | 3 | ChromHMM promoter marked, intergenic region
chr6 | 127,881,341 | 127,882,455 | C6orf174*,† | No | Yes | Yes | STAT5A, FOXC1 | Yes | 6 | Chromosome 6 open reading frame SOGA3 protein coding region
chr7 | 121,726,837 | 121,728,266 | | Yes | Yes | Yes | CHX10 | Yes | 4 | ChromHMM polycomb marked

Good prognosis
chr11 | 32,413,697 | 32,415,714 | WT1/WT1-AS† | Yes | Yes | Yes | E47, AP4, c-MYC, ARNT | Yes | 5 | Bi-directional promoter of WT1/WT1-AS transcription factor
chr19 | 24,061,637 | 24,062,272 | ZNF254*,† | No | No | Yes | NA | No | 4 | Zinc finger protein, transcriptional regulation
chr22 | 44,641,414 | 44,642,542 | | Yes | Yes | Yes | NA | Yes | 3 | Intergenic region, ChromHMM promoter

DMR, differentially methylated region; NA, not available; TNBC, triple-negative breast cancer. *Gene body. †Promoter.

Figure 3 | Methylation profile stratifies TNBC tumours into survival subgroups. (a) Unsupervised clustering with 4,987 HM450K probes overlapping 822 hypermethylated DMRs identified in the discovery cohort separates TCGA TNBC tumours (n = 73) into three main clusters. The heatmap shows the methylation profile of TCGA TNBC tumours and cluster dendrogram. The three clusters are colour-coded with the red cluster exhibiting the highest methylation (TNBC.high), the blue cluster exhibiting the lowest methylation (TNBC.low) and the orange cluster exhibiting an intermediate level of methylation (TNBC.medium). β; methylation beta value. (b) A Kaplan–Meier plot showing survival curves for the patients in the three clusters defined in a. In addition, individual regions of hypermethylation in the discovery cohort overlap with survival-associated probes in the TCGA cohort, including (c) intergenic loci, (d) intragenic loci (for example, the HOXB13 gene body) and (e) promoter associated loci (for example, ZNF254 promoter). (f) Association with survival for three adjacent regions (chr11-11623, chr11-4047 and chr11-1210) spanning the WT1/WT1-AS locus is shown. These three regions are hypermethylated in the discovery cohort and overlap several probes showing statistically significant association with overall survival in both univariate and multivariate analyses. For each region, the methylation profile of TCGA TNBC tumour (n = 73) and adjacent normal samples (n = 9) across overlapping survival probes is shown as a heatmap. The Kaplan–Meier plots for each of the overlapping survival probes are shown as well with corresponding hazard ratios and P values from Cox proportional hazards model; values in parentheses correspond to multivariate analysis. HR, hazard ratio.


The DNA methylation aberrations we identified in the TNBC samples follow specific patterns common to many cancer types32. For instance, hypermethylation events were localized to CpG islands and shores, while hypomethylation occurred globally across intragenic regions32. We observed a strong co-localization of the hypermethylated regions with H3K27me3 marked (polycomb repressed) regions in HMEC cells, supporting the finding that many polycomb-regulated genes are predisposed to aberrant methylation in cancer33. We identified 308 genes affected by promoter hypermethylation and functional analysis revealed significant enrichment of genes and transcription factors involved in development and differentiation, as well as DNA binding, homeobox proteins and transcriptional regulation. Hypermethylation of homeobox genes has been previously reported in breast cancer and associated with disease progression and poor patient prognosis15,16,34,35. Genes encoding glycoproteins were also enriched in the functional analysis. A significant function of glycoproteins is that of directing immune response36. This is particularly poignant since several gene expression modules associated with immune response have been used to predict TNBC patient outcome37–41. Many of the aberrant cancer promoter hypermethylation events affect genes already silenced in the tissue of origin and therefore considered to be passenger events that do not actively contribute to cancer initiation or progression42. To identify potential driver methylation events, we highlighted genes that were both downregulated in TNBC tumours and recurrently mutated in breast cancer. Twelve methylated genes were identified as both mutated and downregulated, including ROBO3 and SEMA5A that are a part of the axon guidance pathway, recently implicated in tumour initiation and progression27. In total, promoter hypermethylation affects seven members of the axon guidance pathway. Although the mechanism by which axon guidance drives cancer progression is not completely understood, our data support a potential causal role for DNA methylation for many of these family members in TNBCs.

Using an independent TNBC cohort from the TCGA data, we validated 36 TNBC DMRs comprising 20 genes. Strikingly, four of the 20 genes encoded zinc finger proteins (ZNFs). Individual ZNFs and even some clusters of ZNF genes have been found hypermethylated and silenced in several tumour types43–46. In addition, methylation of other ZNF genes has potential prognostic value in prostate and bladder cancer47,48. Although the mechanisms by which aberrant ZNF expression facilitates oncogenesis are not completely understood, ZNFs are included in two independently derived, TNBC specific, multi-gene expression classifiers (TN45 and Buck 14)38,39.

Recent studies have identified non-TNBC as more heavily methylated compared with TNBC16. In our study, we found that a distinct population of both ER+ and ER− tumours is associated with extensive methylation across the DMRs, more representative of a CpG island methylator phenotype (CIMP)29. Interestingly, a previous report describes the breast-CIMP (B-CIMP) group comprising solely ER+ tumours16; however, our results show that coordinated hypermethylation can also occur in ER− disease. We also identified three distinct methylation clusters of TNBC tumours based on our DMRs. The largely hypomethylated profile was associated with better survival within the first five years post diagnosis compared with the more heavily methylated subtypes. Interestingly, the medium methylated cluster was associated with the worst survival. Proof of concept that methylation can be used to stratify breast cancer subtypes was recently demonstrated by TCGA, where DNA methylation data were used to classify breast cancer into five distinct subtypes; however, each of the five methylation groups was represented by multiple tumour subtypes and the relationship between methylation and prognosis was not explored13. Here, we also identified 17 individual DMRs capable of stratifying TNBC patients into good and poor prognosis groups. Notably, these regions predominantly overlap with DNaseI hypersensitive regions and contain conserved transcription factor binding sites, highlighting their potentially significant role in transcriptional regulation. Of the genes listed, many, including WT1, WT1-AS, DMRTA1 and HOXB13, have been previously identified as hypermethylated in numerous cancer subtypes including breast cancer49–52, although associations with patient prognosis were not defined in these studies.

Finally, three ‘survival’ DMRs span the bi-directional promoter and gene bodies of the WT1 gene and its antisense counterpart WT1-AS. WT1 is an extensively studied transcription factor essential for normal development of the urogenital system and deregulated across many cancer types31. In breast cancer, high mRNA levels of WT1 were reported to be associated with poor patient survival53 and positive modulation of expression of WT1 by its antisense transcript WT1-AS54,55. Our observed patterns of methylation and survival support an extensive body of evidence on the tight epigenetic transcriptional regulation of WT1 and its role in breast cancer prognosis. More specifically, high levels of methylation across regions spanning gene bodies of WT1 and WT1-AS genes correlate with elevated levels of expression and poor survival, whereas hypermethylation spanning the bi-directional promoter is associated with decreased WT1 and WT1-AS expression and improved survival.

Cumulatively, the work presented here highlights the prognostic potential of DNA methylation in TNBC. We identified individual potential biomarkers of patient outcome as well as providing the first evidence to suggest that DNA methylation could be used to stratify TNBC subtypes associated with distinct prognostic profiles. Both observations warrant further clinical investigation in larger independent cohorts as these signatures may in the future provide valuable tools in the management of TNBC.

Methods

Breast cancer tissue samples. Human tissue samples representing normal and tumour breast from fresh frozen and formalin-fixed paraffin-embedded tissue were obtained for this study. Only samples that were classified as triple-negative Grade 3 ductal adenocarcinomas (Supplementary Table 1) were included. The study protocol was approved by the Hunter New England Human Research Ethics Committee (NSW HREC Reference No: HREC/09/HNE/153), Newcastle, New South Wales, Australia and the Princess Alexandra Hospital Human Research Ethics Committee (PAH HREC) (Research Protocol: 2007/165), Brisbane, Queensland.

DNA isolation from formalin-fixed paraffin-embedded material. DNA isolation from microdissected formalin-fixed paraffin-embedded tissue was performed using the Gentra Puregene Genomic DNA purification tissue kit according to the manufacturer’s instructions (Qiagen). 5 × 1 mm cores or 5 × 10 µm full-faced sections were used for each extraction. The de-paraffinization step was carried out as follows: the paraffin samples were cut into small pieces, 500 µl xylene was added and incubated at 55 °C for 5 min, and the tissue was pelleted at 16,000g for 3 min, discarding the xylene. After repeating this step, 500 µl 100% EtOH was added for 5 min at room temperature with constant mixing and the tissue collected by centrifugation at 16,000g for 3 min. The EtOH step was repeated and the tissue pellet dried for 10 min. Then, 300 µl of cell lysis solution was added and the tube incubated at 70 °C for 10 min, followed by the addition of 20 µl Proteinase K (20 mg ml−1) to each sample and vortexing for 20 s and incubation in a 55 °C block overnight with constant vortexing. The following day, a further 10 µl Proteinase K was added, vortexed for 20 s and further incubated at 55 °C until the samples appeared clear. Then, 1 µl RNase A solution (100 mg ml−1) was added, mixed by inverting 25 times and incubated at 37 °C for 1 h. The sample was placed on ice to quickly cool it. Then 100 µl protein precipitation solution was added to the cell lysates, vortexed for 20 s, incubated on ice for 5 min and centrifuged at full speed for 5 min at 4 °C to pellet the protein precipitate. The supernatant containing the DNA was carefully removed into a clean microcentrifuge tube. The DNA was precipitated with 300 µl 100% isopropanol, and 2 µl glycogen (20 mg ml−1) was added if low yield was expected (<1 µg). The solutions were mixed by inversion (50 times) followed by centrifugation for 10 min at 4 °C. The DNA pellet was washed with 70% EtOH, air-dried and dissolved in 20 µl H2O. To dissolve the pellet, it was incubated for 1 h at 65 °C with constant vortexing.

Enrichment of methylated DNA by MBDCap. The MethylMiner Methylated DNA Enrichment Kit (Invitrogen) was used to isolate methylated DNA from 500 ng to 1 µg of genomic FFPET DNA sonicated to 100–500 bp. MBD-Biotin Protein (3.5 µg) was coupled to 10 µl of Dynabeads M-280 Streptavidin according to the manufacturer’s instructions. The MBD-magnetic bead conjugates were washed three times and re-suspended in one volume of 1× Bind/Wash buffer. The capture reaction was performed by the addition of 500 ng to 1 µg sonicated DNA to the MBD-magnetic beads on a rotating mixer for 1 h at room temperature. All capture reactions were done in duplicate. The beads were washed three times with 1× Bind/Wash buffer. The bound methylated DNA was eluted as a single fraction with a single high-salt elution buffer (2,000 mM NaCl). Each fraction was concentrated by ethanol precipitation using 1 µl glycogen (20 mg ml−1), 1/10 volume of 3 M sodium acetate, pH 5.2 and two sample volumes of 100% ethanol and re-suspended in 60 µl H2O. Enrichment of methylated DNA after capture was previously assessed by quantitative PCR of control genes of known methylation status; namely EN1 (heavily methylated) and GAPDH (unmethylated)22.

Preparation of MBDCap-Seq libraries and Illumina sequencing. Ten nanograms of MBDCap-enriched DNA was prepared for Illumina sequencing using the Illumina ChIP-Seq DNA sample prep kit (IP-102-1001) according to the manufacturer’s instructions. The library preparation was analysed on an Agilent High Sensitivity DNA 1000 Chip. Each sample was sequenced on one lane of the GAIIx.

Computational analysis of MBDCap-Seq data. Sequenced reads were aligned to the hg18 version of the human genome with bowtie. Reads with more than three mismatches and reads mapping to multiple positions were removed. Finally, multiple reads mapping to exactly the same genomic coordinate were eliminated and only one read was retained for downstream analysis. Alignment statistics for samples used in this study are given in Supplementary Table 5. The MBDCap-Seq platform was previously shown to interrogate CpG dense regions of the genome23. To accurately delineate regions of the genome assayable by MBDCap-Seq, we used a fully methylated sample (SssI blood sample) to guide us to the genomic regions attracting sequenced tags. More specifically, we applied the findPeaks peak calling utility from the HOMER suite of programs56 to the fully methylated sample (with parameter settings of -style histone -size 300 -minDist 300 -tagThreshold 18) to identify 230,655 regions covering ~116 Mbp of the genome. We interchangeably refer to these regions as regions of interest or SssI regions. For each MBDCap-Seq sample to be analysed, we computed the number of sequenced tags overlapping SssI regions, which resulted in a table of counts where columns are samples and rows are SssI regions. We used the edgeR Bioconductor package57 (http://www.bioconductor.org/packages/release/bioc/html/edgeR.html) to model the distribution of reads between normal (n = 6) and tumour (n = 19) groups of samples in the discovery cohort. Since the edgeR package does not support modelling of paired and unpaired data simultaneously, we performed two separate analyses, a paired analysis with six normal/tumour pairs and an unpaired analysis with all the samples, and then intersected the results. We found 822 hypermethylated and 43 hypomethylated regions at an FDR threshold of 0.05 in both paired and unpaired analyses.
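The intersection of the paired and unpaired analyses can be illustrated with a minimal Python sketch. It is not the authors' edgeR workflow: it assumes each analysis has already produced one raw P value per SssI region, and it assumes Benjamini-Hochberg adjustment (the text states only an FDR threshold of 0.05). The function names and the toy region identifiers are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_regions(pvalue_by_region, fdr=0.05):
    """Regions whose adjusted p-value is at or below the FDR threshold."""
    regions = list(pvalue_by_region)
    adjusted = benjamini_hochberg([pvalue_by_region[r] for r in regions])
    return {r for r, q in zip(regions, adjusted) if q <= fdr}


# Toy p-values (made up for illustration), keyed by hypothetical region IDs.
paired = {"r1": 0.001, "r2": 0.02, "r3": 0.5, "r4": 0.04}
unpaired = {"r1": 0.0005, "r2": 0.3, "r3": 0.6, "r4": 0.01}

# A region is retained only if it is significant in both analyses.
consensus = significant_regions(paired) & significant_regions(unpaired)
```

Requiring significance in both analyses is conservative: a region called in only one of the two designs is dropped.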

Clustering of MBDCap-Seq data. The number of reads mapping to a particular region of genome depends not just on the average level of methylation in the region, but also on other factors, such as density of methylated CpG nucleotides. To compare MBDCap-Seq readout to other more quantitative technologies such as HM450K and Sequenom, we used fully methylated MBDCap-Seq sample to normalize MBDCap-Seq readouts for samples in the discovery cohort. More specifically, let $X_i$ be the number of tags overlapping region $i$ and $N$ be the total number of tags overlapping SssI regions in the sample to be normalized and $Y_i$ and $M$ be the corresponding numbers in the control sample. Then, the normalized number of tags overlapping the region $i$ is given by

\[
\log\left(\frac{X_i}{N}\cdot\frac{M}{Y_i + 1}\right) \qquad (1)
\]

We used normalized tag counts for heatmap visualization in Fig. 1, for comparison with HM450K in Supplementary Fig. 2, and for comparison with Sequenom in Supplementary Fig. 4.
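As a rough illustration of the normalization in equation (1), the following Python sketch computes the normalized tag count for one region. The function name is hypothetical, the natural logarithm is an assumption (the text does not state the base), and returning negative infinity for a region with no tags is also an assumption, since the formula applies a pseudocount only to the control count.

```python
import math


def normalized_tag_count(x_i, n_total, y_i, m_total):
    """Equation (1): log((X_i / N) * (M / (Y_i + 1))).

    x_i     -- tags overlapping region i in the sample to be normalized
    n_total -- total tags overlapping SssI regions in that sample
    y_i     -- tags overlapping region i in the fully methylated control
    m_total -- total tags overlapping SssI regions in the control
    """
    if x_i == 0:
        # Assumption: the formula has no pseudocount on X_i, so an empty
        # region is reported as -inf rather than raising a math error.
        return float("-inf")
    return math.log((x_i / n_total) * (m_total / (y_i + 1)))


# A region with the same depth-scaled signal as the control scores about 0.
example = normalized_tag_count(x_i=10, n_total=1000, y_i=9, m_total=1000)
```

Values near zero indicate a region that attracts about as many tags, relative to library size, as the fully methylated control; more negative values indicate lower methylation signal.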

Functional annotations of the genome. CpG island annotation for hg18 was obtained from UCSC genome browser. The location of CpG island shores was derived from CpG islands by taking ±2 Kb flanking regions and removing any overlaps with CpG islands. RefSeq transcript annotation for hg18 was obtained from UCSC genome browser. Promoters were defined as +2,000/−100 bp around transcription start site. Intergenic regions were defined as regions complementing transcript regions extended to ±2 Kb around the transcripts. HMEC ChromHMM annotations for hg18 were downloaded from ENCODE. The original annotation partitions the HMEC genome into 15 functional states (see Fig. 1b in ref. 24). In Fig. 1c and Supplementary Fig. 1B, for brevity, we collapsed the three original promoter states into one promoter state and the four original enhancer states into one enhancer state.
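The shore definition above (±2 Kb flanks with island overlaps removed) can be sketched in Python as follows. This is an illustrative reading of the description, not the authors' code: intervals are assumed to be half-open (start, end) pairs on a single chromosome, overlapping shores are merged as an added convenience, and all function names are hypothetical.

```python
def subtract(interval, blockers):
    """Remove every blocker from one half-open (start, end) interval."""
    pieces = [interval]
    for b_start, b_end in blockers:
        remaining = []
        for start, end in pieces:
            if b_end <= start or b_start >= end:
                remaining.append((start, end))  # no overlap, keep as is
                continue
            if start < b_start:
                remaining.append((start, b_start))  # part left of blocker
            if b_end < end:
                remaining.append((b_end, end))  # part right of blocker
        pieces = remaining
    return pieces


def cpg_shores(islands, flank=2000):
    """Flanks of each island minus any island overlap, merged and sorted."""
    shores = []
    for start, end in islands:
        for lo, hi in ((max(0, start - flank), start), (end, end + flank)):
            if lo < hi:
                shores.extend(subtract((lo, hi), islands))
    merged = []
    for start, end in sorted(shores):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Two toy islands 1 kb apart: the gap between them is shore, not island.
toy_shores = cpg_shores([(5000, 6000), (7000, 7500)])
```

In the toy example the downstream flank of the first island runs into the second island, so that island is cut out of the flank before the shores are merged.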

Enrichment analysis statistical methods. For the enrichment analysis of hypermethylated regions, we used the hypergeometric test to assess the enrichment of various functional annotations of the genome in the set of differentially methylated regions. For a given functional annotation represented by a set of genomic regions, the fraction of SssI regions (regions assayable by MBDCap-Seq) overlapping the functional annotation was compared with the fraction of hyper-/hypomethylated regions overlapping the functional annotation using the hypergeometric distribution. For the enrichment analysis of genes affected by promoter hypermethylation, first, we used the DAVID functional annotation tool25 to carry out analysis against gene sets defined by SP_PIR_KEYWORDS annotation. Second, we used the hypergeometric test to assess the enrichment of additional gene sets in the set of genes affected by promoter hypermethylation26. In both the analyses, the set of 15,643 RefSeq genes with promoters overlapping SssI regions was used as a background.
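A one-sided hypergeometric enrichment test of the kind described above can be sketched with the Python standard library alone. The function and argument names are hypothetical, and the upper-tail convention (probability of observing at least the seen overlap) is an assumption about how enrichment was scored.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, drawn_annotated):
    """Upper-tail probability P(X >= drawn_annotated).

    population      -- background size (e.g. all SssI regions)
    annotated       -- background items overlapping the annotation
    drawn           -- size of the tested set (e.g. hypermethylated regions)
    drawn_annotated -- tested items overlapping the annotation
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # math.comb returns 0 when k > n, so impossible terms vanish.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(drawn_annotated, upper + 1)
    )
    return tail / total


# Toy numbers: 4 of 10 background items annotated, 2 of 3 drawn items hit.
p_toy = hypergeom_enrichment_p(10, 4, 3, 2)
```

Exact integer arithmetic avoids the underflow that a floating-point product of binomial terms can suffer for backgrounds as large as the 230,655 SssI regions.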

Sequenom quantitative massARRAY methylation analysis. Sequenom MassARRAY methylation analysis was performed according to Coolen et al.58 Briefly, 500 ng of FFPET clinical sample and cell line DNA (Supplementary Table 2) was extracted and bisulphite treated using the standard bisulphite protocol59. As controls for the methylation analysis, whole-genome amplified DNA (0% methylated) and M.SssI-treated DNA (100% methylated) were bisulphite treated in parallel. The primers were designed using the EpiDesigner BETA software from Sequenom (see Supplementary Table 6 for sequences). Each reverse primer has a T7-promoter tag (5′-CAGTAATACGACTCACTATAGGGAGAAGGCT-3′) and each forward primer has a 10-mer tag (5′-AGGAAGAGAG-3′). On testing these primers on bisulphite-treated DNA, all the primers gave specific PCR products at a Tm of 60 °C. To check for potential PCR bias towards methylated or non-methylated sequences, we used serological DNA (Millipore) as a 100% methylated control and whole-genome amplified human blood DNA as a 0% methylated control. The PCRs were optimized and performed in triplicate using the conditions: 95 °C for 2 min, 45 cycles of 95 °C for 40 s, 60 °C for 1 min and 72 °C for 1 min 30 s and final extension at 72 °C for 5 min. After PCR amplification, the triplicates were pooled and a shrimp alkaline phosphatase treatment was performed using 5 µl of the PCR product as template. Then, 2 µl of the shrimp alkaline phosphatase-treated PCR product was taken and subjected to in vitro transcription and RNaseA Cleavage for the T-cleavage reaction. The samples were purified by resin treatment and spotted on a 384-well SpectroCHIP by a MassARRAY Nanodispenser. This was followed by spectral acquisition on a MassARRAY Analyser Compact matrix-assisted laser desorption/ionization time-of-flight mass spectrometer. The results were then analysed by the EpiTYPER software V 1.0, which gives quantitative methylation levels for individual CpG sites. The average methylation ratio was calculated by averaging the ratios obtained from each CpG site. For the Sequenom validation, sample sizes were determined for a two sample t-test with a two-sided alpha of 0.01, assuming five regions were to be investigated. Assuming the difference in average methylation levels is 0.25 (tumour: s.d. = 0.2, normal: s.d. = 0.05), to have 90% power to establish a significant difference between tumour and normal samples, 15 samples per group were required. The calculations are based on preliminary data from the lab on methylation levels in breast cancer and normal samples (unpublished).

Acquisition of TCGA data. Throughout the paper, we used several molecular data sets from TCGA breast cancer (BRCA) cohort. Clinical annotation of samples was obtained from the marker TCGA BRCA publication (Supplementary Table 1 in ref. 13; Supplementary Table 7). Raw HM450K methylation data (Level 1) was obtained from TCGA data portal in January 2012. Methylation data spanned 67 normal and 354 tumour ER+ samples, 16 normal and 105 tumour ER− samples and nine normal and 73 tumour TNBC samples. Processed array expression data (Level 3) was obtained from TCGA data portal in March 2012. Expression data spanned 52 normal and 406 tumour ER+ samples, nine normal and 118 tumour ER− samples and eight normal and 89 tumour TNBC samples. Processed RNA-Seq expression data (Level 3) were obtained from TCGA data portal in December 2012. Expression data spanned 73 normal and 588 tumour ER+ samples, 19 normal and 174 tumour ER− samples and 12 normal and 119 tumour TNBC samples. Summary of TCGA BRCA mutation data was obtained from COSMIC database (http://cancer.sanger.ac.uk/cosmic/study/overview?study_id=414). The summary lists mutations in gene coding regions across patients including both synonymous and non-synonymous amino-acid substitutions. We consider a gene as mutated if it appears at least two times in the list (Supplementary Table 8).
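The rule for calling a gene mutated (it appears at least two times in the mutation summary) amounts to a simple count. The Python sketch below assumes the summary has been reduced to one gene symbol per listed mutation; the function name and the placeholder gene labels are hypothetical.

```python
from collections import Counter


def recurrently_mutated(gene_per_mutation, min_count=2):
    """Genes that appear at least min_count times in the mutation list."""
    counts = Counter(gene_per_mutation)
    return {gene for gene, n in counts.items() if n >= min_count}


# Placeholder labels, one entry per listed mutation (not real data).
toy_list = ["GENE_A", "GENE_B", "GENE_A", "GENE_C", "GENE_B", "GENE_D"]
mutated = recurrently_mutated(toy_list)
```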

Analysis of HM450K methylation data. The raw HM450K data were pre-processed and background normalized with Bioconductor minfi package using preprocessIllumina(..., bg.correct = TRUE, normalize = ‘controls’, reference = 1) command; resulting M-values were used for statistical analyses60 and beta-values for heatmap visualizations and clustering. To identify TNBC-specific HM450K probes, we carried out t-test comparison between TNBC (n = 73) and non-TNBC (n = 386) tumours. This analysis resulted in 282 probes having adj. P value < 0.05 and estimated mean difference of methylation between TNBC and non-TNBC tumours of at least 10%; these probes were declared as TNBC specific. Regions overlapping three or more TNBC-specific probes were declared as TNBC specific. For TNBC-specific signature, we trained a Partial Least Squares model as implemented in the caret R package61,62 to classify tumours into TNBC and non-TNBC based on the methylation values of 282 TNBC-specific probes. The tumour samples in the TCGA HM450K cohort were randomly partitioned into equal-size training/testing sets. The model parameters were derived from training set and then applied to make predictions on the testing set. The performance of the model was assessed using test set predictions.
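Because M-values were used for statistics and beta-values for visualization, it may help to see how the two scales relate. The Python sketch below uses the standard logit relationship M = log2(beta / (1 − beta)); the function names and the clamping constant that keeps beta away from exactly 0 or 1 are assumptions for illustration, not part of the minfi pipeline described above.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a beta-value in [0, 1] to an M-value (log2 of the odds)."""
    # Assumption: clamp to avoid log of zero or division by zero.
    clamped = min(max(beta, eps), 1.0 - eps)
    return math.log2(clamped / (1.0 - clamped))


def m_to_beta(m_value):
    """Convert an M-value back to a beta-value."""
    odds = 2.0 ** m_value
    return odds / (odds + 1.0)


# A half-methylated probe sits at M = 0; beta = 0.8 maps to M = 2.
m_half = beta_to_m(0.5)
m_high = beta_to_m(0.8)
```

The M scale stretches the extremes of the beta scale, which is one reason it tends to be preferred for t-test style comparisons.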

Analysis of expression data. Differential expression analysis between normal (n = 8) and tumour (n = 89) TNBC samples was carried out with Bioconductor limma package. Since only a subset of tumour samples had paired adjacent normal samples, patient data were treated as random effect using limma’s duplicateCorrelation(y) function. This analysis resulted in 3,017 downregulated and 3,407 upregulated genes with adj. P value < 0.05 out of 17,655 genes on the array. In Fig. 1f,g, we only considered genes with SssI regions in their promoter regions, reducing the number of downregulated, upregulated and total genes to 2,119, 2,722 and 15,543, respectively. We used log-transformed RNA-Seq expression values to highlight the relationship between methylation and expression for a number of candidate regions in Fig. 2c and Supplementary Fig. 11.

Survival analysis. TNBC tumour samples in TCGA HM450K cohort (n = 73) were clustered on the basis of methylation beta-values of 4,987 HM450K probes overlapping the 822 hypermethylated regions. We applied consensus clustering algorithm63 as implemented in Bioconductor ConsensusClusterPlus package to the 4,987 × 73 methylation matrix with parameters maxK = 4, reps = 1000, pItem = 0.8, pFeature = 0.8, clusterAlg = ‘km’, distance = ‘euclidean’. We used SVD decomposition to reduce the dimension of the methylation matrix to $\mathbb{R}^{10}$ before clustering. We chose the three-cluster configuration for downstream survival analysis.

Survival analysis was carried out using Cox proportional hazards model as implemented in R survival package against overall survival data (Supplementary Table 7). Survival analysis of cluster data was carried out with cluster membership as an explanatory variable. The BRCA TNBC cohort consists of 73 patients with HM450K methylation data and 12 events. Survival analysis of individual probes (4,987 probes overlapping 822 hypermethylated DMRs) was carried out with probe methylation status as explanatory variable (univariate analysis) and age, stage and probe methylation status (multivariate analysis). Methylation status was represented by a binary variable, high (higher than the median beta-value for the probe) and low (smaller or equal to the median beta-value for the probe). Stage was derived from AJCC stage in the clinical annotation of samples. Due to moderate size of the cohort, we reduced the number of values of the stage variable to two by collapsing stages I, IA, IB, II, IIA and IIB into one state and stages III, IIIA, IIIB, IIIC and IV into one state. This resulted in 190 probes with methylation status significantly (P value < 0.05 in both univariate and multivariate analyses) associated with overall survival in TCGA TNBC patients. Regional aggregation of survival probes identified 17 hypermethylated DMRs overlapping three or more survival probes. Fourteen regions were associated with poor prognosis (these regions overlapped probes for which high methylation corresponded to lower probability of survival), and three regions were associated with good prognosis.
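The covariate preparation and regional aggregation described above can be sketched in Python. This covers only the data-shaping steps, not the Cox model itself; the function names, the "early"/"late" labels for the two collapsed stage states and the probe-to-region mapping are hypothetical.

```python
from statistics import median

EARLY_STAGES = {"I", "IA", "IB", "II", "IIA", "IIB"}
LATE_STAGES = {"III", "IIIA", "IIIB", "IIIC", "IV"}


def dichotomize(beta_values):
    """Label each sample high (> median) or low (<= median) for one probe."""
    cut = median(beta_values)
    return ["high" if b > cut else "low" for b in beta_values]


def collapse_stage(ajcc_stage):
    """Collapse an AJCC stage string into one of two states."""
    label = ajcc_stage.upper().replace("STAGE", "").strip()
    if label in EARLY_STAGES:
        return "early"
    if label in LATE_STAGES:
        return "late"
    return None  # unknown or missing stage


def survival_regions(probe_to_region, survival_probes, min_probes=3):
    """Regions overlapping at least min_probes survival-associated probes."""
    counts = {}
    for probe in survival_probes:
        region = probe_to_region.get(probe)
        if region is not None:
            counts[region] = counts.get(region, 0) + 1
    return {region for region, n in counts.items() if n >= min_probes}


status = dichotomize([0.1, 0.5, 0.9, 0.5])
```

Ties with the median fall into the low group, matching the "smaller or equal" wording of the text.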

References

1. Perou, C. M. et al. Molecular portraits of human breast tumours. Nature 406, 747–752 (2000).

2. Perou, C. M. Molecular stratification of triple-negative breast cancers. Oncologist 16(Suppl 1), 61–70 (2011).

3. Blows, F. M. et al. Subtyping of breast cancer by immunohistochemistry to investigate a relationship between subtype and short and long term survival: a collaborative analysis of data for 10,159 cases from 12 studies. PLoS Med. 7, e1000279 (2010).

4. Dent, R. et al. Triple-negative breast cancer: clinical features and patterns of recurrence. Clin. Cancer Res. 13, 4429–4434 (2007).

5. Jatoi, I., Anderson, W. F., Jeong, J.-H. & Redmond, C. K. Breast cancer adjuvant therapy: time to consider its time-dependent effects. J. Clin. Oncol. 29, 2301–2304 (2011).

6. Park, Y. H. et al. Clinical relevance of TNM staging system according to breast cancer subtypes. Ann. Oncol. 22, 1554–1560 (2011).

7. Reis-Filho, J. S. & Pusztai, L. Gene expression profiling in breast cancer: classification, prognostication, and prediction. Lancet 378, 1812–1823 (2011).

8. Laird, P. W. The power and the promise of DNA methylation markers. Nat. Rev. Cancer 3, 253–266 (2003).

9. Szyf, M. DNA methylation signatures for breast cancer classification and prognosis. Genome Med. 4, 26 (2012).

10. Xu, Z. et al. Epigenome-wide association study of breast cancer using prospectively collected sister study samples. J. Natl Cancer Inst. 105, 694–700 (2013).

11. Chimonidou, M. et al. CST6 promoter methylation in circulating cell-free DNA of breast cancer patients. Clin. Biochem. 46, 235–240 (2013).

12. Snell, C., Krypuy, M., Wong, E. M., Loughrey, M. B. & Dobrovic, A. BRCA1 promoter methylation in peripheral blood DNA of mutation negative familial breast cancer patients with a BRCA1 tumour phenotype. Breast Cancer Res. 10, 12 (2008).

13. TCGA. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).

14. Holm, K. et al. Molecular subtypes of breast cancer are associated with characteristic DNA methylation patterns. Breast Cancer Res. 12, R36 (2010).

15. Fackler, M. J. et al. Genome-wide methylation analysis identifies genes specific to breast cancer hormone receptor status and risk of recurrence. Cancer Res. 71, 6195–6207 (2011).

16. Fang, F. et al. Breast cancer methylomes establish an epigenomic foundation for metastasis. Sci. Transl. Med. 3, 75ra25 (2011).

17. Cho, Y. H. et al. Prognostic significance of gene-specific promoter hypermethylation in breast cancer patients. Breast Cancer Res. Treat. 131, 197–205 (2012).

18. Stone, A. et al. BCL-2 hypermethylation is a potential biomarker of sensitivity to antimitotic chemotherapy in endocrine-resistant breast cancer. Mol. Cancer Ther. 12, 1874–1885 (2013).

19. Stefansson, O. A., Villanueva, A., Vidal, A., Marti, L. & Esteller, M. BRCA1 epigenetic inactivation predicts sensitivity to platinum-based chemotherapy in breast and ovarian cancer. Epigenetics 7, 1225–1229 (2012).

20. Serre, D., Lee, B. H. & Ting, A. H. MBD-isolated Genome Sequencing provides a high-throughput and comprehensive survey of DNA methylation in the human genome. Nucleic Acids Res. 38, 391–399 (2010).

21. Rauch, T. & Pfeifer, G. P. Methylated-CpG island recovery assay: a new technique for the rapid detection of methylated-CpG islands in cancer. Lab. Invest. 85, 1172–1180 (2005).

22. Nair, S. S. et al. Comparison of methyl-DNA immunoprecipitation (MeDIP) and methyl-CpG binding domain (MBD) protein capture for genome-wide DNA methylation analysis reveal CpG sequence coverage bias. Epigenetics 6, 34–44 (2011).

23. Robinson, M. D. et al. Evaluation of affinity-based genome-wide DNA methylation data: effects of CpG density, amplification bias, and copy number variation. Genome Res. 20, 1719–1729 (2010).

24. Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011).

25. Huang da, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).

26. Merico, D., Isserlin, R., Stueker, O., Emili, A. & Bader, G. D. Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PLoS ONE 5, e13984 (2010).

27. Mehlen, P., Delloye-Bourgeois, C. & Chedotal, A. Novel roles for Slits and netrins: axon guidance cues as anticancer targets? Nat. Rev. Cancer 11, 188–197 (2011).

28. Neufeld, G. & Kessler, O. The semaphorins: versatile regulators of tumour progression and tumour angiogenesis. Nat. Rev. Cancer 8, 632–645 (2008).

29. Hughes, L. A. E. et al. The CpG island methylator phenotype: what’s in a name? Cancer Res. 73, 5858–5868 (2013).

30. Matys, V. et al. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 34, D108–D110 (2006).

31. Yang, L., Han, Y., Suarez Saiz, F. & Minden, M. D. A tumor suppressor and oncogene: the WT1 story. Leukemia 21, 868–876 (2007).

32. Shen, H. & Laird, P. W. Interplay between the cancer genome and epigenome. Cell 153, 38–55 (2013).

33. Cedar, H. & Bergman, Y. Linking DNA methylation and histone modification: patterns and paradigms. Nat. Rev. Genet. 10, 295–304 (2009).

34. Tommasi, S., Karm, D. L., Wu, X., Yen, Y. & Pfeifer, G. P. Methylation of homeobox genes is a frequent and early epigenetic event in breast cancer. Breast Cancer Res. 11, R14 (2009).

35. Pilato, B. et al. HOX gene methylation status analysis in patients with hereditary breast cancer. J. Hum. Genet. 58, 51–53 (2013).

36. Rudd, P. M., Elliott, T., Cresswell, P., Wilson, I. A. & Dwek, R. A. Glycosylation and the immune system. Science 291, 2370–2376 (2001).

37. Teschendorff, A., Miremadi, A., Pinder, S., Ellis, I. & Caldas, C. An immune response gene expression module identifies a good prognosis subtype in estrogen receptor negative breast cancer. Genome Biol. 8, R157 (2007).

38. Yau, C. et al. A multigene predictor of metastatic outcome in early stage hormone receptor-negative and triple-negative breast cancer. Breast Cancer Res. 12, R85 (2010).

39. Kuo, W. H. et al. Molecular characteristics and metastasis predictor genes of triple-negative breast cancer: a clinical study of triple-negative breast carcinomas. PLoS ONE 7, e45831 (2012).

40. Rody, A. et al. A clinically relevant gene signature in triple negative and basal-like breast cancer. Breast Cancer Res. 13, R97 (2011).

41. Hallett, R. M., Dvorkin-Gheva, A., Bane, A. & Hassell, J. A. A gene signature for predicting outcome in patients with basal-like breast cancer. Sci. Rep. 2, 227 (2012).

42. Sproul, D. et al. Transcriptionally repressed genes become aberrantly methylated and distinguish tumors of different lineages in breast cancer. Proc. Natl Acad. Sci. USA 108, 4364–4369 (2011).

43. Cheng, Y. et al. KRAB zinc finger protein ZNF382 is a proapoptotic tumor suppressor that represses multiple oncogenes and is commonly silenced in multiple carcinomas. Cancer Res. 70, 6516–6526 (2010).

44. Lleras, R. A. et al. Hypermethylation of a cluster of Kruppel-Type zinc finger protein genes on chromosome 19q13 in oropharyngeal squamous cell carcinoma. Am. J. Pathol. 178, 1965–1974 (2011).

45. Huang, R.-L. et al. Methylomic analysis identifies frequent DNA methylation of zinc finger protein 582 (ZNF582) in cervical neoplasms. PLoS ONE 7, e41060 (2012).

46. Severson, P. L., Tokar, E. J., Vrba, L., Waalkes, M. P. & Futscher, B. W. Coordinate H3K9 and DNA methylation silencing of ZNFs in toxicant-induced malignant transformation. Epigenetics 8, 1080–1088 (2013).

47. Vanaja, D. K., Cheville, J. C., Iturria, S. J. & Young, C. Y. F. Transcriptional silencing of zinc finger protein 185 identified by expression profiling is associated with prostate cancer progression. Cancer Res. 63, 3877–3882 (2003).

48. Reinert, T. et al. Diagnosis of bladder cancer recurrence based on urinary levels of EOMES, HOXA9, POU4F2, TWIST1, VIM, and ZNF154 hypermethylation. PLoS ONE 7, e46297 (2012).

49. Rodriguez, B. A. et al. Epigenetic repression of the estrogen-regulated Homeobox B13 gene in breast cancer. Carcinogenesis 29, 1459–1465 (2008).

50. Bruno, P. et al. WT1 CpG islands methylation in human lung cancer: a pilot study. Biochem. Biophys. Res. Commun. 426, 306–309 (2012).

51. Ghoshal, K. et al. HOXB13, a target of DNMT3B, is methylated at an upstream CpG island, and functions as a tumor suppressor in primary colorectal tumors. PLoS ONE 5, e10338 (2010).

52. Okuda, H. et al. Epigenetic inactivation of the candidate tumor suppressor gene HOXB13 in human renal cell carcinoma. Oncogene 25, 1733–1742 (2006).

53. Miyoshi, Y. et al. High expression of Wilms’ tumor suppressor gene predicts poor prognosis in breast cancer patients. Clin. Cancer Res. 8, 1167–1171 (2002).

54. Moorwood, K. et al. Antisense WT1 transcription parallels sense mRNA and protein expression in fetal kidney and can elevate protein levels in vitro. J. Pathol. 185, 352–359 (1998).

55. Dallosso, A. R. et al. Alternately spliced WT1 antisense transcripts interact with WT1 sense RNA and show epigenetic and splicing defects in cancer. RNA 13, 2287–2299 (2007).

56. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).

57. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).

58. Coolen, M. W., Statham, A. L., Gardiner-Garden, M. & Clark, S. J. Genomic profiling of CpG methylation and allelic specificity using quantitative high-throughput mass spectrometry: critical evaluation and improvements. Nucleic Acids Res. 35, e119 (2007).

59. Clark, S. J., Harrison, J., Paul, C. L. & Frommer, M. High sensitivity mapping of methylated cytosines. Nucleic Acids Res. 22, 2990–2997 (1994).

60. Du, P. et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics 11, 587 (2010).

61. Kuhn, M. Building predictive models in R using the caret Package. J. Stat. Softw. 28, 1–26 (2008).

62. Mevik, B.-H. & Wehrens, R. The pls Package: principal component and partial least squares regression in R. J. Stat. Softw. 18, 1–24 (2007).

63. Monti, S., Tamayo, P., Mesirov, J. & Golub, T. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach. Learn. 52, 91–118 (2003).

Acknowledgements

We thank the Ramaciotti Centre, University of New South Wales (Sydney, Australia) for genome sequencing. This work is supported by the National Breast Cancer Foundation (NBCF) program and project grants and National Health and Medical Research Council (NHMRC1029579) project grant and NHMRC Fellowship to S.J.C. M.A.B. is supported by Cancer Council Queensland and University of Queensland. J.D.F. is supported by a fellowship from the National Breast Cancer Foundation (NBCF) Australia.

Author contributions

S.J.C., M.T., A.D., J.F.F., R.J.S., M.A.B. and G.D.F. were involved in the overall study design. C.S. and J.Z.S. were involved in the development of methodology. Acquisition of data was done by C.S., E.Z., J.Z.S., S.S.N. and W.J.L. Clinical samples and/or preparation of DNA were provided by K.A.A.-K., K.M.P., W.Q., S.S., G.D.F. and J.D.F. Analysis and interpretation of the data (for example, statistical analysis, biostatistics, computational analysis) was done by N.J.A., M.D.R., W.J.L., E.Z. and A.S. Writing, figures, review of the manuscript were done by E.Z., W.J.L., A.S., C.S. and S.J.C. Conception and study supervision was done by S.J.C. and C.S.

Additional information

Accession codes: Methylation sequence data have been deposited in GenBank/EMBL/DDBJ under the accession code GSE58020.

Supplementary Information accompanies this paper at http://www.nature.com/naturecommunications

Competing financial interests: The authors declare no competing financial interests.

Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/

How to cite this article: Stirzaker, C. et al. Methylome sequencing in triple-negative breast cancer reveals distinct methylation clusters with prognostic value. Nat. Commun. 6:5899 doi: 10.1038/ncomms6899 (2015).
